Birth Death Code [closed] - r

My question:
At an institute for experimental mathematics there is a computer that helps solve problems.
Problems arrive at the computer according to a Poisson process with intensity lambda per hour.
The time to solve each problem can be seen as exponentially distributed with parameter mu.
In our world we have four different states: S = (0, 1, 2, 3).
State 0 = no problems have arrived at the computer.
State 1 = the computer is solving 1 question.
State 2 = the computer is solving 1 question + 1 in queue.
State 3 = the computer is solving 1 question + 2 in queue.
If a question arrives while we are in state 3, the sender gets an error message and tries again later. The institution has decided that at most 5% of the senders should get this error message.
To decide who should have access to the computer, there are 3 different proposals:
Only the professors are allowed to send questions (lambda = 2, mu = 10)
Professors and students are allowed to send questions (lambda = 6, mu = 10)
Anyone is allowed to send questions (lambda = 10, mu = 10)
We should investigate which of the 3 proposals fill up the computer no more than 5% of the time.
There are two things I need help with.
First thing:
To solve the question I've been given the code skeleton below. Could someone briefly explain to me the purpose of each paragraph of code where I have written "#?"?
Second thing:
In two places I have written "...", and there I need help filling in some code.
bd_process <- function(lambda, mu, initial_state = 0, steps = 100) {
  time_now <- 0
  state_now <- initial_state
  time <- 0
  state <- initial_state

  for (i in 1:steps) {
    if (state_now == 3) {
      lambda_now <- 0
    } else {
      lambda_now <- lambda
    }
    if (state_now == 0) {
      mu_now <- 0
    } else {
      mu_now <- mu
    }

    #?
    time_to_transition <- ...

    #?
    if (...) {
      state_now <- state_now - 1
    } else {
      state_now <- state_now + 1
    }

    #?
    time_now <- time_now + time_to_transition

    #?
    time <- c(time, time_now)

    #?
    state <- c(state, state_now) # WHAT DOES THIS VECTOR CONSIST OF?
  }

  list(time = time, state = state)
}

The code appears to be written with an implicit assumption that the interarrival and service distributions are memoryless, i.e., exponential (or geometric in discrete time). Without memorylessness it would be invalid to turn off processing by setting the rates to zero.
With the memoryless property, you can figure out time_to_transition as the waiting time of the superposition of the two Poisson processes, and determine whether the event was an arrival or a departure by randomizing proportionally to the ratio of one rate to the combined rate. Zeroing one of the rates is also valid, because when you un-zero it the time that elapsed while the rate was zero doesn't matter, again due to the memoryless property.
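Putting that into code, here is a minimal sketch of how the two "..." blanks might be filled in, assuming the exponential model described above (my reading of the exercise, not an official solution): the superposed process fires at rate lambda_now + mu_now, so the waiting time is exponential with that rate, and the event is a departure with probability mu_now / (lambda_now + mu_now).

# Time to the next event: the superposition of the two processes
# fires at rate lambda_now + mu_now
time_to_transition <- rexp(1, rate = lambda_now + mu_now)

# Departure (service completion) with probability mu_now / (lambda_now + mu_now),
# otherwise an arrival
if (runif(1) < mu_now / (lambda_now + mu_now)) {
  state_now <- state_now - 1
} else {
  state_now <- state_now + 1
}

Note this handles the boundary states correctly: in state 0, mu_now = 0 so the event is always an arrival, and in state 3, lambda_now = 0 so it is always a departure. For the 5% criterion you can also cross-check the simulation against the stationary distribution of this birth-death chain: with rho = lambda/mu, the long-run fraction of time spent in state 3 is rho^3 / (1 + rho + rho^2 + rho^3).

# Analytic blocking probability for the capacity-3 queue
# (a cross-check, not part of the given skeleton)
blocking <- function(lambda, mu) {
  rho <- lambda / mu
  rho^3 / sum(rho^(0:3))
}
blocking(2, 10)   # ~0.0064, professors only
blocking(6, 10)   # ~0.0993, professors + students
blocking(10, 10)  # 0.25, anyone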

Having trouble solving simulation

I got a question related to probability theory and I tried to solve it by simulating it in R. However, I ran into a problem as the while loop does not seem to break.
The question is asking: How many people are needed such that there is at least a 70% chance that one of them is born on the last day of December?
Here is my code:
prob <- 0
people <- 1
while (prob <= 0.7) {
  people <- people + 1 # start the iteration with 2 people in the room and add 1 every iteration
  birthday <- sample(365, size = people, replace = TRUE)
  prob <- length(which(birthday == 365)) / people
}
return(prob)
My guess is that it can never hit 70%, so the while loop never breaks. Am I right? If so, did I interpret the question wrongly?
I did not want to post this on stats.stackexchange.com because I thought this is more related to code than to the math itself, but I will move it if necessary. Thanks.
This is a case where an analytical solution based on probability is easier and more accurate than trying to simulate. I agree with Harshvardhan that your formulation is solving the wrong problem.
The probability of having at least one person in a pool of n have their birthday on a particular target date is 1-P{all n miss the target date}. This probability is at least 0.7 when P{all n miss the target date} < 0.3. The probability of each individual missing the target is assumed to be P{miss} = 1-1/365 (365 days per year, all birthdates equally likely). If the individual birthdays are independent, then P{all n miss the target date} = P{miss}^n.
I am not an R programmer, but the following Ruby should translate pretty easily:
# Use rationals to avoid cumulative float errors.
# Makes it slower but accurate.
P_MISS_TARGET = 1 - 1/365r
p_all_miss = P_MISS_TARGET
threshold = 3r / 10 # seeking P{all miss target} < 0.3
n = 1
while p_all_miss > threshold
  p_all_miss *= P_MISS_TARGET
  n += 1
end
puts "With #{n} people, the probability all miss is #{p_all_miss.to_f}"
which produces:
With 439 people, the probability all miss is 0.29987476838793214
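Since the question is tagged R, here is a hedged translation of the Ruby above (my addition, not pjs's code). R has no built-in rationals, but plain doubles are fine here because the threshold crossing at n = 439 is nowhere near the accumulated float error.

p_miss_target <- 1 - 1/365
p_all_miss <- p_miss_target
n <- 1
while (p_all_miss > 0.3) { # seeking P{all miss target} < 0.3
  p_all_miss <- p_all_miss * p_miss_target
  n <- n + 1
}
cat("With", n, "people, the probability all miss is", p_all_miss, "\n")

# Or in closed form: solve (364/365)^n < 0.3 for n
ceiling(log(0.3) / log(364/365))
# [1] 439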
Addendum
I got curious, since my answer differs from the accepted one, so I wrote a small simulation. Again, I think it's straightforward enough to understand even though it's not in R:
require 'quickstats' # Stats "gem" available from rubygems.org
def trial
  n = 1
  # Keep adding people to the count until one of them hits the target
  n += 1 while rand(1..365) != 365
  return n
end

def quantile(percentile = 0.7, number_of_trials = 1_000)
  # Create an array containing results from the specified number of trials.
  # Defaults to 1000 trials
  counts = Array.new(number_of_trials) { trial }
  # Sort the array and determine the empirical target percentile.
  # Defaults to the 70th percentile
  return counts.sort[(percentile * number_of_trials).to_i]
end
# Tally the statistics of 100 quantiles and report results,
# including margin of error, formatted to 3 decimal places.
stats = QuickStats.new
100.times { stats.new_obs(quantile) }
puts "#{"%.3f" % stats.avg}+/-#{"%.3f" % (1.96*stats.std_err)}"
Five runs produce outputs such as:
440.120+/-3.336
440.650+/-3.495
435.820+/-3.558
439.500+/-3.738
442.290+/-3.909
which is strongly consistent with the analytical result derived earlier and differs significantly from the other responders' answers.
Note that on my machine the simulation takes roughly 40 times longer than the analytical calculation, is more complex, and introduces uncertainty. To increase the precision you would need larger sample sizes, and thus longer run times. Given these considerations, I would reiterate my advice to go for the direct solution in this case.
Indeed, your probability will (almost) never reach 0.7, because length(which(birthday == 365)) / people is the fraction of people born on day 365, which hovers around 1/365. As people grows, more people will have birthday = 365 in absolute numbers, but the fraction will not climb toward 0.7.
Furthermore, to calculate a probability for a given number of people, you should draw many samples and then calculate the probability. Here is a way to achieve that:
N = 450 # max. number of people to try
probs = array(numeric(), N) # empty array to store the probabilities found

# try all people counts in the range 1:N
for (people in 1:N) {
  # draw 200 samples to estimate the probability
  samples = 200
  successes = 0
  for (i in 1:samples) {
    birthday <- sample(365, size = people, replace = TRUE)
    total_last_day <- sum(birthday == 365)
    if (total_last_day >= 1) {
      successes <- successes + 1
    }
  }
  # store the estimated probability in the array
  probs[people] = successes / samples
}

# output those people counts that achieved a probability > 0.7
which(probs > 0.7)
As this is a simulation, the result depends on the run. Increasing the number of samples would make the result more stable.
You are solving the wrong problem. The question is, "How many people are needed such that there is at least a 70% chance that one of them is born on the last day of December?". What you are finding now is "How many people are needed such that 70% have their birthdays on the last day of December?". The answer to the second question is close to zero. But the first one is much simpler.
Replace prob <- length(which(birthday == 365)) / people with check = any(birthday == 365) in your logic, because at least one of the people has to be born on Dec 31. Then you will be able to find whether that number of people yields at least one person born on Dec 31.
After that, you will have to rerun the simulation multiple times to generate an empirical probability distribution (a kind of Monte Carlo approach). Only then can you check the probability.
Simulation Code
people_count = function(i)
{
  set.seed(i)
  for (people in 1:10000)
  {
    birthday = sample(365, size = people, replace = TRUE)
    check = any(birthday == 365)
    if (check == TRUE)
    {
      pf = people
      break
    }
  }
  return(pf)
}
The people_count() function returns the number of people required so that at least one of them was born on Dec 31. Then I reran the simulation 10,000 times.
# Number of simulations
nsim = 10000

library(magrittr) # provides the %>% pipe used below
l = lapply(1:nsim, people_count) %>%
  unlist()
Let's see the distribution of the number of people required.
To find the actual probability, I'll use cumsum().
> cdf = cumsum(l/nsim)
> which(cdf>0.7)[1]
[1] 292
So, on average, you would need 292 people to have more than a 70% chance.
In addition to #pjs's answer, I would like to provide one myself, written in R. I attempted to solve this question by simulation rather than an analytical approach, and I am sharing it in case it is helpful for someone else with the same problem. It's not that well written, but the idea is there:
# create a function which will check if anyone is born on the last day
last_day <- function(x) {
  birthdays <- sample(365, size = x, replace = TRUE) # randomly draw everyone's birthday
  if (length(which(birthdays == 365)) >= 1) {
    TRUE # return TRUE if one or more people are born on the last day
  } else {
    FALSE
  }
}

# find out how many people are needed to get 70%
people <- 0 # set number of people to zero
prob <- 0 # set prob to zero
while (prob <= 0.7) { # loop does not stop until it hits 70%
  people <- people + 1 # increase the number of people every iteration
  prob <- mean(replicate(10000, last_day(people))) # run last_day 10000 times to estimate the probability
}
print(people)
last_day() only returns TRUE or FALSE, so I run last_day() 10000 times per iteration of the loop to find out how many of those 10000 runs have one or more people born on the last day (this gives the probability). I then keep the loop running until the probability is 70% or more, and print the number of people.
The answer I get from running the loop once is 440, which is quite close to the answer provided by #pjs.

Coding a simple loop for a sliding window [closed]

I have the following problem. I have a time series of 2659 observations. I need to perform a statistical test over a sliding window of length 256, and each time I want to extract the p-value from the test and gather these p-values into a time series vector. As the threshold for this test (the runs test) I want a moving average that moves along with the data and the rolling window. Here is my attempt (in R):
x<- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
library(randtests)
for(i in 1:2404){
  runs <- runs.test(x[i:i+255], threshold = mean(x[i:i+255]))
  ret[i] <- runs$p.value
}
The index starts at 1 but stops at 2404 because each window spans 256 observations: the first window goes from 1 to 256, the second from 2 to 257, and the last one ends at 255 + 2404 = 2659. I hope I have made my problem clear; I do not understand why it does not work. Of course, I then need to plot the result so that all the p-values appear over time. I hope you can help me.
PS: Please, set a seed if you propose an example so that I can reproduce your results.
Use rollapplyr with the indicated function.
library(zoo)
pv <- function(xx) runs.test(xx, threshold = mean(xx))$p.value
out <- rollapplyr(x, 256, pv, fill = NA)
Note
library(randtests)
set.seed(123)
x <- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
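To get the plot over time that the question asks for, one option (my addition, not part of the original answer) is simply:

plot(out, xlab = "Time", ylab = "Runs test p-value")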
Two changes to your existing code should make it work:
set.seed(0)
x <- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
library(randtests)
ret <- rep(NA, length(x))
for(i in 1:2404){
  runs <- runs.test(x[i:(i+255)], threshold = mean(x[i:(i+255)]))
  ret[i] <- runs$p.value
}
The first change is to initialize the ret variable before the loop: ret <- rep(NA, length(x)).
The second change is to add the parentheses, i.e. x[i:(i+255)]. If you write x[i:i+255], you get a single value, x[i+255], because : has higher precedence than +, so i:i+255 evaluates as (i:i) + 255.
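A quick demonstration of the precedence pitfall (my addition):

i <- 1
i:i+255     # evaluates as (i:i) + 255, a single index: 256
i:(i+255)   # the intended window: 1 2 3 ... 256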

Basic insurance risk model - R

I'm currently working on R code for a basic insurance risk process.
It is part of an assignment, and I'm struggling with 3 things:
The question is in the image attached to the original post.
1) The arrival times are specified as a homogeneous Poisson process. I was wondering if my generation of E is the right way to address this.
2) I'm not sure how to express a summation in R.
3) I'm also struggling with how to work the index j into the code effectively. Am I using it the right way? I'm still new to R. My code seems wrong to me; I was just wondering if someone can point me in the right direction.
a = 10
n = 100
t = 0
l = 10
for(t in 1:n){
  E <- rexp(n, l)
  if(t + E < t){
    t = t + E
  } else {
    if(t + E >= t){
      for(j in 1:t)
        R = a + 5*t - sum(X[j])
    }
  }
}
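For what it's worth, here is a hedged sketch (not from the original thread) of one common way to simulate the classical risk process R(t) = a + c*t - sum of the X_j with j <= N(t), using premium rate c = 5 as in the code above: draw exponential interarrival times one at a time for the homogeneous Poisson arrivals (point 1), and let cumsum() express the summation over claims (points 2 and 3). The exponential claim sizes are purely an illustrative assumption, since the actual claim distribution is given in the image.

set.seed(1)
a <- 10       # initial capital
premium <- 5  # premium income per unit time, matching the 5*t above
l <- 10       # Poisson arrival intensity
horizon <- 10 # simulate up to this time

times <- numeric(0)  # claim arrival times
claims <- numeric(0) # claim sizes X_j
t <- 0
repeat {
  t <- t + rexp(1, rate = l) # exponential interarrival time
  if (t > horizon) break
  times <- c(times, t)
  claims <- c(claims, rexp(1, rate = 1)) # illustrative claim distribution
}

# Reserve at each claim time: R(t) = a + premium*t - sum of claims so far
R <- a + premium * times - cumsum(claims)
plot(times, R, type = "s", xlab = "time", ylab = "reserve")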

Is there an R package for solving -1/0/1 knapsack? [closed]

Is there an efficient R package for dealing with the following problem?
I have a set of numerical observations (N in the order of thousands) ranging from minus one million to plus one million. Given a target value and a round-off accuracy, is there a linear combination with weights -1 (subtract) / 0 (leave out) / 1 (add up) such that the sum equals the target value within rounding error, and can it also report the weights?
Here is the genetic algorithm I referenced, modified to your case; for an explanation of the algorithm, see my answer there. There may be (and certainly are) ways to solve your issue with less code, but I had this solution on the shelf already and adapting it was simple. The required input is a data.frame with a column value and a column weights, which can start out all zero:
value weights
1 45 0
2 33 0
3 47 0
4 65 0
5 12 0
6 43 0
7 5 0
... ... ...
The algorithm will then find a set of weights from the set c(-1,0,1) such that the value of
abs(target_value - sum(final_solution$value*final_solution$weights))
is minimized.
There is definitely still room for improvement; for example, the weights are currently set completely at random, so the expected weighted sum of an initial solution is always 0. If the target_value is very large, it would be better to give 1 a higher initial probability than -1, to converge to an optimal solution faster.
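As an aside, a hypothetical biased initializer along those lines could look like this (my sketch, not part of the original answer; randomize_solution() is defined in the code below):

randomize_solution_biased <- function(solution, p = c(0.1, 0.4, 0.5)) {
  # draw weights from c(-1, 0, 1) with probabilities p, here biased toward +1
  solution$weights = sample(c(-1, 0, 1), nrow(solution), replace = TRUE, prob = p)
  return(solution)
}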
It seems to work very well for this case: with 100000 objects and a target value of 12000, it finds an optimal solution within a fraction of a second.
Code:
### PARAMETERS -------------------------------------------
n_population = 100          # the number of solutions in a population
n_iterations = 100          # the number of iterations
n_offspring_per_iter = 80   # number of offspring to create per iteration
frac_perm_init = 0.25       # fraction of columns to change from the default solution when creating initial solutions
early_stopping_rounds = 100 # stop if the score has not improved for this many iterations

### SAMPLE DATA -------------------------------------------------
n_objects = 100000
datain = data.frame(value = round(runif(n_objects, 0, 100)), weights = 0)
target_value = 12000

### ALL OUR PREDEFINED FUNCTIONS ----------------------------------
# Score a solution
# The score is the absolute distance between the weighted sum and the target value
score_solution <- function(solution, target_value)
{
  abs(target_value - sum(solution$value * solution$weights))
}

# Merge solutions
# Get approx. 50% of the weights from solution1, and the remaining weights from solution2
merge_solutions <- function(solution1, solution2)
{
  solution1$weights = ifelse(runif(nrow(solution1), 0, 1) > 0.5, solution1$weights, solution2$weights)
  return(solution1)
}

# Randomize solution
# Create an initial solution
randomize_solution <- function(solution)
{
  solution$weights = sample(c(-1, 0, 1), nrow(solution), replace = T)
  return(solution)
}

# Sort a population based on scores
sort_pop <- function(population)
{
  return(population[order(sapply(population, function(x) {x[['score']]}), decreasing = F)])
}

# Return the scores of a population
pop_scores <- function(population)
{
  sapply(population, function(x) {x[['score']]})
}
### RUN SCRIPT -------------------------------
# starting score
print(paste0('Starting score: ', score_solution(datain, target_value)))

# Create the initial population
population = vector('list', n_population)
for (i in 1:n_population)
{
  # create initial solutions by making changes to the initial solution
  solution = randomize_solution(datain)
  score = score_solution(solution, target_value)
  population[[i]] = list('solution' = solution, 'score' = score)
}
population = sort_pop(population)
score_per_iteration <- score_solution(datain, target_value)

# Run the algorithm
for (i in 1:n_iterations)
{
  cat(paste0('\n---- Iteration ', i, ' -----\n'))

  # create some random perturbations in the population
  for (j in 1:10)
  {
    sol_to_change = sample(2:n_population, 1)
    new_solution <- randomize_solution(population[[sol_to_change]][['solution']])
    new_score <- score_solution(new_solution, target_value)
    population[[sol_to_change]] <- list('solution' = new_solution, 'score' = new_score)
  }

  # Create offspring; first determine which solutions to combine, i.e. the
  # probability that a solution will be selected to create offspring (with some smoothing)
  probs = sapply(population, function(x) {x[['score']]})
  if (max(probs) == min(probs)) {stop('No diversity in population left')}
  probs = 1 - (probs - min(probs)) / (max(probs) - min(probs)) + 0.2

  # create combinations
  solutions_to_combine = lapply(1:n_offspring_per_iter, function(y) {
    sample(seq(length(population)), 2, prob = probs)})
  for (j in 1:n_offspring_per_iter)
  {
    new_solution <- merge_solutions(population[[solutions_to_combine[[j]][1]]][['solution']],
                                    population[[solutions_to_combine[[j]][2]]][['solution']])
    new_score <- score_solution(new_solution, target_value)
    population[[length(population) + 1]] <- list('solution' = new_solution, 'score' = new_score)
  }
  population = sort_pop(population)
  population = population[1:n_population]
  print(paste0('Best score: ', population[[1]][['score']]))
  score_per_iteration = c(score_per_iteration, population[[1]][['score']])
  if (i > early_stopping_rounds + 1)
  {
    # stop if the best score has not improved over the last early_stopping_rounds iterations
    if (score_per_iteration[[i]] == score_per_iteration[[i - early_stopping_rounds]])
    {
      stop(paste0("Score not improved in the past ", early_stopping_rounds, " rounds. Halting algorithm."))
    }
  }
}
plot(x = seq(0, length(score_per_iteration) - 1), y = score_per_iteration, xlab = 'iteration', ylab = 'score')
final_solution = population[[1]][['solution']]
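A hypothetical usage check (my addition): after the run you can verify how close the best solution came to the target and pull out the observations that were actually used.

# weighted sum of the best solution, to compare with target_value
sum(final_solution$value * final_solution$weights)
# the observations used, with their signs
head(final_solution[final_solution$weights != 0, ])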

Why is this causing my program to crash? [closed]

I have created the following user-defined function, which generates random variables using the acceptance-rejection method. However, whenever it is called, my program runs on and on and eventually crashes, or I have to force quit. I have gone through it several times. Any ideas as to what could be causing this?
I am aware that this may not be the best way to have written this (yesterday was the first time I used R), so any extra tips are a bonus!
acceptReject <- function(){
  Z <- 0
  Y <- c(0, 0)
  while(Y[2] < .5*(Y[1]-1)**2){
    U <- runif(2, 0, 1)
    Y <- log(U)
  }
  Z <- Y[1]
  U <- runif(1, 0, 1)
  if(U <= .5){
    Z <- abs(Z)
  } else {
    Z <- -abs(Z)
  }
  Z
}
You have an infinite loop.
If you assume that Y ~ log([0,1]) (mathematically), that means it always ranges between log(0) and log(1), i.e., -Inf and 0, respectively. (Bottom line: it is always less than or equal to zero.)
Now let's look at your loop condition: .5*(Y[1]-1)**2. If you know the domain of Y is c(-Inf, 0), then the range of this formula is
.5*(c(-Inf,0)-1)**2
# [1] Inf 0.5
(This is always greater than or equal to 0.5.)
Since Y is always <= 0 and the formula is always >= 0.5, your loop condition will mathematically always be true. Infinite loop.
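For reference, here is a hedged sketch of what the standard exponential-based accept-reject sampler for a normal variate usually looks like (my reading of the intended algorithm, not a confirmed fix from the thread): the classical formulation draws Y1, Y2 ~ Exp(1), i.e. -log(U), and accepts when Y2 >= (Y1 - 1)^2 / 2, so flipping the sign of log(U) is what lets the loop terminate.

acceptReject <- function() {
  repeat {
    Y <- -log(runif(2))                  # two Exp(1) variates
    if (Y[2] >= .5*(Y[1] - 1)**2) break  # acceptance test is now reachable
  }
  Z <- Y[1]                              # accepted half-normal magnitude
  if (runif(1) > .5) Z <- -Z             # random sign gives a standard normal
  Z
}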
