Running the airplane probability simulation in R
I'm trying to run a simulation in R but I'm quite stuck. The simulation has to do with a variation of the Airplane Probability problem.
This is the scenario: a small 100-seat theatre is putting on a play and assigns a random seat number (from 1–100) to each ticketed guest right before they walk in. There are 36 guests in total, who normally sit in their assigned seats. If their seat is occupied for some reason, they choose another seat at random. An actor who is part of the play disrupts this by picking one of the 100 seats at random, possibly taking a ticketed guest's assigned seat.
I want to run this in R to answer two questions: What is the probability that the last person is in the wrong seat? And on average, approximately how many people will sit in the wrong seat?
Could someone help me with this? I added my code below; it attempts to find the probability that the last person is in the wrong seat. I think there are errors in my code and I would love some suggestions/help to make it better!
# Probregister stores one probability estimate per batch;
# Register stores the per-trial outcome ("Yes"/"No") within a batch.
Probregister <- c()

for(batch in 1:100) {                     # repeat the whole experiment to get a distribution of estimates
  Register <- c()

  for(trial in 1:1000) {                  # one trial = one complete seating of the theatre
    Seat <- sample(1:100, 36)             # Seat[i] is the seat assigned to guest i (guests enter in order 1 to 36)
    Actualperson <- rep(0, 100)           # Actualperson[s] records who actually occupies seat s (0 = empty)

    Actualperson[sample(1:100, 1)] <- 37  # the actor (labelled 37) picks one of the 100 seats at random

    # Cycle the decision procedure given in the question for guests 1 to 36.
    for(i in 1:36) {
      if (Actualperson[Seat[i]] == 0) {
        Actualperson[Seat[i]] <- i        # assigned seat is empty, so guest i sits in it
      } else {
        # Assigned seat is unavailable: pick one of the remaining empty seats at random.
        empty <- which(Actualperson == 0)
        Actualperson[empty[sample(length(empty), 1)]] <- i
      }
    }                                     # end of the seating procedure for one trial

    if (Actualperson[Seat[36]] == 36) {
      Register <- c(Register, "Yes")      # guest 36 ended up in their assigned seat
    } else {
      Register <- c(Register, "No")       # guest 36 ended up in the wrong seat
    }
  }

  Probability <- mean(Register == "No")   # estimated probability that the last guest is in the wrong seat
  Probregister <- c(Probregister, Probability)
}
Probsummary <- summary(Probregister)
plot(density(Probregister), col="red")
abline(v = Probsummary[3], col="blue")
This is a simulation I perform. Here p is the fraction of the 100 seats that the actor removes. You may pass n directly instead and drop the n <- floor(100 * p) line in the function.
func <- function(p){
  x <- 1:100                          # seat ids
  y <- 1:36                           # the 36 persons' seats; person i owns seat i (the labels don't matter)
  correct <- rep(NA, 36)              # records whether each person ends up in their own seat
  fin_passenger_dummy <- rep(NA, 36)  # same information, but in order of entry, so element 36 is the last person
  n <- floor(100 * p)                 # number of seats that the actor removes
  yy <- sample(y, 36)                 # random order in which the persons enter
  actor <- sample(x, n)               # ids of the removed seats
  seats <- setdiff(x, actor)          # ids of the remaining seats
  for (i in 1:36){
    if (yy[i] %in% seats){
      correct[yy[i]] <- TRUE          # person yy[i] finds their own seat free and sits in it
      fin_passenger_dummy[i] <- TRUE
      seats <- setdiff(seats, yy[i])  # update the remaining seats
    } else{
      y_sad <- sample(seats, 1)       # their seat is gone: choose one of the remaining seats at random
      correct[yy[i]] <- FALSE
      fin_passenger_dummy[i] <- FALSE
      seats <- setdiff(seats, y_sad)
    }
  }
  return(list(total = correct, final = fin_passenger_dummy[36]))  # final: did the last person to enter sit correctly?
}
To get the probability that the last person is in the wrong seat, replicate this function enough times and take the mean of $final. Note that $final is TRUE when the last person to enter ends up in their own seat, so the wrong-seat probability is one minus this mean. For example, letting p = 0.3 means the actor removes 30 seats:
dummy <- c()
for (i in 1:1000){
  dummy <- c(dummy, func(0.3)$final)
}
mean(dummy)
[1] 0.532
so the probability that the last person is in the wrong seat is about 1 - 0.532 = 0.468.
And to get "On average, approximately how many people will sit in the wrong seat",
dummy <- c()
for (i in 1:1000){
dummy <- c(dummy, sum(func(0.3)$total))
}
mean(dummy)
[1] 11.7015
will do.
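As a side note, growing dummy inside a for loop is slow in R; replicate() expresses the same Monte Carlo loop more idiomatically. A minimal sketch using the same func and p as above (results will vary from run to run since no seed is set):
final_correct <- replicate(1000, func(0.3)$final)        # TRUE when the last entrant sits correctly
1 - mean(final_correct)                                  # probability the last person is in the wrong seat
wrong_counts <- replicate(1000, sum(!func(0.3)$total))   # people not in their own seat, per run
mean(wrong_counts)                                       # average number in the wrong seat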
If you need more description of the code, please let me know.
I'll refer to anyone who has to pick a seat at random as a "floater". The first floater is the actor. If the actor takes someone's seat, that person becomes the next floater, and so on. A few observations to speed things up:
The actual seat positions/ordering don't matter, only the order in which the guests enter the theatre
Each unoccupied ticketed seat has an equal probability that the floater will sit in it
The probability that the floater will sit in another guest's seat is the number of unoccupied ticketed seats divided by the number of unoccupied seats
The actor picks one of the 100 seats and only 36 of them are ticketed, so an expected 64% of the simulation replications result in all guests sitting in their ticketed seats. These replications don't need to be simulated at all; we need only simulate the remaining ones, and rbinom tells us how many there are.
This lets us run the simulation recursively:
TheatreRec <- function(tickets, seats) {
# recursive function for simulating the theatre seating problem
# Inputs:
# tickets: the number of ticketed guests yet to be seated
# seats: the number of unoccupied seats remaining
# Output: an integer vector of length 2:
# 1: number of guests in the wrong seat
# 2: whether the last seated guest sits in the wrong seat (0 or 1)
# the floater sits in a random unoccupied seat
floater <- sample(seats, 1)
if (floater > tickets) {
# the floater didn't take anyone's seat
return(c(0L, 0L))
} else if (floater < tickets){
# the floater took one of the guests' seats, but not the last guest's seat
return(c(1L, 0L) + TheatreRec(tickets - floater, seats - floater))
} else {
# the floater took the last guest's seat
return(c(1L, 1L))
}
}
# create a vectorized version of TheatreRec
TheatreRecVec <- Vectorize(TheatreRec)
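For a single replication the arguments map directly onto the scenario. For example, if the actor happened to take the 5th guest's seat (in order of entry), the rest of that replication could be simulated with the call below; the displaced guest themselves is counted separately, as in the full run that follows:
# actor took the 5th guest's seat: 36 - 5 = 31 ticketed guests still to seat,
# and 100 - 5 = 95 seats still unoccupied when that guest starts floating
TheatreRec(36L - 5L, 100L - 5L)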
I'll run a million replications. In an expected 36% of them, the actor will sit in one of the guests' seats. For these replications, use the sample function to simulate whose seat the actor takes (in order of entry into the theatre), then complete the simulation with TheatreRecVec, which returns its results column-wise. Note that for all of these replications, the first floater (after the actor) needs to be added to the results of TheatreRecVec.
floater <- sample(36, rbinom(1, 1e6, 0.36), replace = TRUE)
(results <- setNames(
  rowSums(rbind(1L, floater == 36) + TheatreRecVec(36L - floater, 100L - floater))/1e6,
  c("Avg. in wrong seat", "P(last guest in wrong seat)")
))
Avg. in wrong seat P(last guest in wrong seat)
          0.442944                    0.015423
EDIT to compare the simulation to the exact solutions:
To check the simulation, we can use the exact value for the expected number of guests who end up in the wrong seat:
digamma(s + 1) - digamma(s - g + 1)
where s is the number of seats in the theatre, and g is the number of ticketed guests.
> digamma(100 + 1) - digamma(100 - 36 + 1)
[1] 0.4434866
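Since digamma(n + 1) equals the n-th harmonic number minus Euler's constant, this difference is just a partial harmonic sum, which you can check directly:
> sum(1/((100 - 36 + 1):100))   # 1/65 + 1/66 + ... + 1/100
[1] 0.4434866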
The probability that the last guest ends up in the wrong seat is simply 1/(s - g + 1)
> 1/(100 - 36 + 1)
[1] 0.01538462
These match pretty closely with the simulation results above.
Related
Average time in a restaurant in R
I am visiting a restaurant that has a menu with N dishes. Every time I visit the restaurant I pick one dish at random. I am wondering: what is the average time until I have tasted all N dishes? I think the number of dishes that I have tasted after n visits is a Markov chain with transition probabilities p_{k,k+1} = (N-k)/N and p_{k,k} = k/N for k = 0, 1, 2, ..., N. I want to simulate this process in R. Doing so (I need help here), given that the restaurant has 100 dishes, I did:
nits <- 1000  # simulate the problem 1000 times
count <- 0
N = 100       # number of dishes
for (i in 1:nits){
  x <- 1:N
  while(length(x) > 0){
    x <- x[x != sample(x=x, size=1)]  # pick one dish at random that I have not tasted
    count <- count + 1/nits
  }
}
count
I want some help because my mathematical result is that the average time is N*log(N), and the code above produces different results.
You have 2 issues.
It's always a red flag when you loop over i but don't use i inside the loop. Set up a structure to hold the results of every iteration:
results = integer(length = nits)
...
for (i in 1:nits){
  ...
  while(length(x) > 0){
    ...
  }
  results[i] <- count
}
Your text says "pick one dish at random". Your code says "pick one dish at random that I have not tasted". If you always pick a dish you have not tasted, then the problem is trivial: it will take N visits. Let's adjust your code to pick one dish at random whether you have tasted it or not:
nits <- 1000  # simulate the problem 1000 times
results = integer(length = nits)
N = 100       # number of dishes
for (i in 1:nits){
  dishes = 1:N
  tasted = rep(0, N)
  count = 0
  while(sum(tasted) < N){
    tasted[sample(dishes, size = 1)] = 1
    count = count + 1
  }
  results[i] = count
}
results
Looking at the results, I think you may have made a math error:
100 * log(100)
# [1] 460.517
mean(results)
# [1] 518.302
You can read more about this problem on Wikipedia: Coupon Collector's Problem. Using the result there, the simulation is doing quite well:
100 * log(100) + .577 * 100 + 0.5
# [1] 518.717
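For a tighter check, the exact expectation from the Coupon Collector's Problem is N times the N-th harmonic number, which can be computed directly:
100 * sum(1 / (1:100))   # N * (1 + 1/2 + ... + 1/N)
# [1] 518.7378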
Randomly generating numbers in R with condition for the total sum AND with restrictions for specific members of the generated vector
I am looking to randomly generate a vector of numbers in R with a specific sum, but which also has a restriction on some specific members of the generated vector, e.g. that the 4th number (say, in a vector of 5) cannot exceed 50. I am doing this within a for loop with millions of iterations in order to simulate election vote changes, where I am adding votes to one party and taking them away from the other parties with equal probability. However, my issue is that in many iterations votes turn out to be negative, which is illogical. I have figured out how to do the "sums up to X" part from other answers here, and I have made a workaround for the second restriction as follows:
library(data.table)   # for data.table(), copy() and := syntax

parties <- data.table(party = c("red", "green", "blue", "brown", "yellow"),
                      votes = c(657, 359, 250, 80, 7))
votes_to_reallocate <- 350
immune_party <- "green"

parties_simulation <- copy(parties)
parties_simulation[party != immune_party,
  votes := votes - as.vector(rmultinom(1, size = votes_to_reallocate, prob = rep(1, nrow(parties) - 1)))]

# Most likely there are negative votes for the last party, perhaps even the last two.
# The while loop is supposed to correct this.
while (any(parties_simulation[, votes] < 0)) {
  negative_parties <- parties_simulation[votes < 0, party]
  for (i in seq_along(negative_parties)) {
    votes_to_correct <- parties_simulation[party == negative_parties[i], abs(votes)]
    parties_to_change <- parties_simulation[party != immune_party & !party %in% negative_parties, .N]
    parties_simulation[party != immune_party & !party %in% negative_parties,
      votes := votes - as.vector(rmultinom(1, size = votes_to_correct, prob = rep(1, parties_to_change)))]
    parties_simulation[party == negative_parties[i], votes := votes + votes_to_correct]
  }
}
However, this seems to be a huge bottleneck, as each simulation has to be corrected by the while loop. I am curious whether there is a solution that would generate the random numbers with the restriction already imposed (for instance, generate 4 random numbers adding up to 350, with the fourth number not exceeding 7). If not, perhaps there is a more efficient way to solve this?
Maybe I'm missing something, but would this work:
const_rng <- function(n, const, total){
  consts <- sapply(const, function(x) sample(1:x, 1))
  rest <- rmultinom(1, total - sum(consts), prob = rep(1/(n - length(consts)), n - length(consts)))
  res <- rep(NA, n)
  res[as.numeric(names(const))] <- consts
  res[-as.numeric(names(const))] <- rest
  return(res)
}

out <- const_rng(5, const = c("4" = 7), 350)
out
# [1] 90 76 88  5 91
sum(out)
# [1] 350
First, it draws the constrained values from the integers 1:const. Then it draws the remainder (total minus the sum of the constrained draws) from a multinomial distribution, giving each other outcome equal probability. The const argument is specified by a vector where the name is the observation number to be constrained and the value is the upper bound of the draw. For example, const = c("4" = 7) means constrain the fourth element to be between 1 and 7.
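A quick, illustrative sanity check (my own addition, reusing the same call as above) that the constraints hold over repeated draws:
# draw 1,000 vectors and confirm the total sum and the 4th-element bound
draws <- replicate(1000, const_rng(5, const = c("4" = 7), 350))
all(colSums(draws) == 350)   # every draw sums to 350
range(draws[4, ])            # the 4th element stays within 1..7 by construction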
Running random operations over vector, conditional
I am doing some modelling and wish to simulate randomness. I have a total number of runs, run_times, which is 5 in this example. A vector run_lengths holds the possible run lengths: if a run length of 3 is chosen, 1's are printed 3 times. sample_data is a vector of 1's and 0's. A run of 1's should be started at randomly chosen positions where sample_data == 1 (not at every 1, only at randomly picked ones), and the operation may only fire a total of run_times (5) times. There are a few moving parts for sure. I am tackling the problem in this manner: I can select a run length at random with sample(run_lengths, 1). I am unsure how to select positions of sample_data at random, and I'm trying to keep a counter in order to stay under run_times:
run_lengths <- c(2,4,5,6,7,8,1)
run_times <- 5
sample_data <- c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0)

# Randomly select 1's from sample_data; when a 1 is found, randomly print 1's along a run_length
# Only do this a certain number of times (run_times)
# Pick run_lengths at random == sample(run_lengths, 1)
# Pick sample_data 1's at random, how to randomly select????
count <- 0    # keep track of how many random run_lengths have been applied
res <- NULL
while (length(res) < length(sample_data)) {
  if (sample_data[length(res)+1] == 1 & count < run_times) {   # not sure how to pick sample_data == 1 at random?
    res <- c(res, rep(1, sample(run_lengths, 1)))   # if signal == 1 (randomly) then rep a random run_length
    count <- count + 1                              # count how many random run_lengths have been applied
  } else {
    res <- c(res, 0)                                # if the condition is not true, we print 0 instead of 1
  }
}
res <- res[1:length(sample_data)]
res
I have completed it maybe 60%? I'm not sure what the best approach is for choosing random 1's from sample_data. Also, I'm not sure how to keep the number of applied run_lengths under the run_times maximum. I am attempting to keep a count for when the condition was true; once it is exceeded, any further true conditions should be ignored.
OK, time to put down some code, though I'm still not sure if it's right or not:
sample_data <- c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0)

# take the indices of sample_data where the value == 1
i <- which(sample_data %in% 1)
# now shuffle them all, no replacement - random positions of the 1s
p <- sample(i, length(i), replace = FALSE)
print(sample_data[p[1]])
print(sample_data[p[2]])
print(sample_data[p[3]])
...
Is this what you want?
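If the goal is to go one step further and actually write the runs into a result vector, here is a minimal sketch of one interpretation, assuming each run starts at a randomly chosen 1-position and the number of runs is capped at run_times (this builds on the run_lengths, run_times and sample_data vectors defined in the question):
# pick at most run_times random positions where sample_data == 1
ones <- which(sample_data == 1)
starts <- sample(ones, min(run_times, length(ones)))

res <- rep(0, length(sample_data))
for (s in starts) {
  len <- sample(run_lengths, 1)                 # random run length for this start
  res[s:min(s + len - 1, length(res))] <- 1     # write the run, clipped at the end of the vector
}
res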
Simulate experiment with R Programming
I am new to R programming and need to simulate the following experiment in R. A poker hand consists of 5 cards dealt from a conventional pack of 52 cards, the order of the cards not being important. Find the probability that a given hand has at least one king and at least one queen. I know how to find the probability of at least one king, but not of at least one king and at least one queen. My code for at least one king is:
deck <- rep(1:13, each = 4)   # here J = 11, Q = 12, K = 13
nhands <- 1000
Xk <- c(rep(-1, nhands))
for (i in 1:nhands){
  hand <- sample(deck, 5, replace = FALSE)
  numberofK <- 0
  for (j in 1:5){   # count Kings
    if (hand[j] == 13){
      numberofK <- numberofK + 1
    }
  }
  #print(numberofK)
  Xk[i] <- numberofK
  #print(hand)
}
table(Xk) / nhands
Can anyone please help me with coding the required simulation? Thanks
The probability of a 5-card hand containing at least one king or one queen can be simulated as follows, where sample(deck, 5) gives a hand of 5 cards, any(c(12, 13) %in% ...) checks whether a King or Queen is in the hand, and sum counts how many times that happens within the 100,000 simulations:
set.seed(10)
sum(sapply(1:100000, function(i) {
  any(c(12, 13) %in% sample(deck, 5))
})) / 100000
# [1] 0.58365
Theoretically, the probability of such a case would be:
(choose(52, 5) - choose(44, 5)) / choose(52, 5)
# [1] 0.5821375
which are pretty close. On the other hand, if it is indeed and, meaning at least one king and one queen, the simulation gives:
set.seed(10)
sum(sapply(1:100000, function(i) {
  all(c(12, 13) %in% sample(deck, 5))
})) / 100000
# [1] 0.09932
And theoretically:
(choose(52, 5) - choose(44, 5) - (2 * (choose(48, 5) - choose(44, 5)))) / choose(52, 5)
# [1] 0.1001785
and the numbers match closely.
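The theoretical value for the and case can also be written via inclusion-exclusion, 1 - P(no king) - P(no queen) + P(no king and no queen), which is algebraically the same expression and gives the same number:
1 - 2 * choose(48, 5) / choose(52, 5) + choose(44, 5) / choose(52, 5)
# [1] 0.1001785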
For a growing data feed in R, how can two time lengths be calculated based on "time to peak" and "time back to baseline"?
How can the following be accomplished with R?
Connect a constantly changing data source (e.g. https://goo.gl/XCM6yG) to R,
Measure the time once prices start to rise consistently from the initial baseline range to the peak (represented by the green horizontal line),
Measure the time from the peak back to the baseline range (the teal line).
Note: "departure from the baseline range" (unless there is a better mathematical way) is defined as at least the most recent 5 prices all being more than 3 standard deviations above the mean of the latest 200 prices.
This is a really vague question with an unknown use case, but here we go.
Monitoring in what way? The length? That's what I did (1). Once the vector has over 200 values we can take the mean, so we need a control flow for that part (2). I added some noise which basically forces the behaviour you want to detect (ifelse(i %in% 996:1000, 100, 0) means: if the iterator is in 996 to 1000, add 100 to the random normal I generated). We set a counter and check whether each value is above 3 sd of the rolling mean; if so, we record the time (3). At each input of the data, check if the current value is the max value (4). That last part is trickier, since we would really have to look at the trend; it is beyond the scope of my assistance and up to you to figure out, since I don't fully understand the use case.
vec <- vecmean <- val5 <- c()
counter <- 0
for(i in 1:1000){
  vec[i] <- rnorm(1) + ifelse(i %in% 996:1000, 100, 0)
  Sys.sleep(.001)   # change to 1 second
  # 1
  cat('The vector has', length(vec), 'values within...\n')
  # 2
  if(length(vec) > 200){
    vecmean <- c(vecmean, mean(vec[(i-200):i]))
    cat('The mean of the last 200 observations is ',
        format(vecmean[length(vecmean)], digits = 2), '\n')
    # 3
    upr <- vecmean[length(vecmean)] + 3*sd(vec)
    if(vec[i] > upr){
      counter <- counter + 1
    } else{
      counter <- 0
    }
    if(counter > 4){
      cat('Last 5 values greater than 3 sd above the rolling mean!\n')
      val5 <- Sys.time()
      cat("Timestamp:", as.character(val5), '\n')
    }
  }
  # 4
  theMax <- max(vec)
  if(vec[i] == theMax & !is.null(val5)){
    valMax <- Sys.time()
    valDiff <- valMax - val5
    cat('The time difference between the first flag and the second is', as.character(valDiff), '\n')
  }
}