I am visiting a restaurant that has a menu with N dishes. Every time I visit the restaurant I pick one dish at random. What is the average number of visits until I have tasted all N dishes?
I think that the number of dishes that I have tasted after n visits in the restaurant is a Markov chain with transition probabilities:
p_{k,k+1} = (N-k)/N
and
p_{k,k} = k/N
for k =0,1,2,...,N
I want to simulate this process in R.
Here is my attempt (this is where I need help), assuming the restaurant has 100 dishes:
nits <- 1000 #simulate the problem 1000 times
count <- 0
N = 100 # number of dishes
for (i in 1:nits){
x <- 1:N
while(length(x) > 0){
x <- x[x != sample(x=x,size=1)] # pick one dish at random that I have not tasted
count <- count + 1/nits
}
}
count
I would like some help because my mathematical result is that the average time is N*log(N), and the code above produces different results.
You have 2 issues.
It's always a red flag when you loop over i, but don't use i inside the loop. Set up a structure to hold the results of every iteration:
results = integer(length = nits)
...
for (i in 1:nits){
...
while(length(x) > 0){
...
}
results[i] <- count
}
Your text says
pick one dish at random
Your code says
pick one dish at random that I have not tasted
If you always pick a dish you have not tasted, then the problem is trivial: it will take N visits. Let's adjust your code to pick one dish at random whether you have tasted it or not:
nits <- 1000 #simulate the problem 1000 times
results = integer(length = nits)
N = 100 # number of dishes
for (i in 1:nits){
dishes = 1:N
tasted = rep(0, N)
count = 0
while(sum(tasted) < N){
tasted[sample(dishes, size = 1)] = 1
count = count + 1
}
results[i] = count
}
results
Looking at the results, I think you may have made a math error:
100 * log(100)
# [1] 460.517
mean(results)
# [1] 518.302
You can read more about this problem on Wikipedia: Coupon Collector's Problem. Using the result there, the simulation is doing quite well:
100 * log(100) + .577 * 100 + 0.5
# [1] 518.717
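As a cross-check on the approximation, the exact expectation can be computed directly from the Markov chain in the question. This is a minimal sketch, assuming every dish is equally likely on each visit; the variable name expected_visits is mine. From k tasted dishes the chain moves up with probability (N-k)/N, so the wait for the next new dish has mean N/(N-k), and the total is the sum of those waits.

```r
# Exact expected number of visits for N equally likely dishes.
# From k tasted dishes, the wait for a new one is geometric with mean N / (N - k).
N <- 100
expected_visits <- sum(N / (N - 0:(N - 1)))   # equals N * (1 + 1/2 + ... + 1/N)
expected_visits
# [1] 518.7378
```

N*log(N) is only the leading term of this sum, which is why it undershoots the simulated mean.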
I'm trying to run a simulation in R but I'm quite stuck. The simulation is a variation of the Airplane Probability problem.
This is the scenario: A small 100 seat theatre is conducting a play, and assigns a random seat number (from 1–100) to the ticketed guests right before they walk in. There are 36 guests in total, who usually sit in their assigned seats. If their seats are occupied for some reason, they choose another seat at random. An actor who is part of the play messes this up by picking a seat out of the 100 seats randomly, possibly taking a ticketed audience's numbered seat.
I want to try and run this on R and try and answer the questions: What is the probability that the last person is in the wrong seat? and On average, approximately how many people will sit in the wrong seat?
Could someone help me with this? I added my code below and what I attempt to do is find the probability that the last person is in the wrong seat... I think there are errors in my code and I would love some suggestions/help to make it better!
#Following are the 2 empty vectors which we will use later to store some probabilities and other stuff
Probregister <- c()
Register <- c()
Person <- c(1:36) #this vector creates 100 people standing in que from 1 to 100.
Seat <- sample(1:100, 100) #this vector allots each one of them a seat randomly. These are the assigned seats.
Actualseats <- c(1:100) #these are 100 empty seats in the theatre yet to be filled. Each entry is a seat no.
Actualperson <- rep(0,36) #This is an empty vector. Here we will know who is actually occupying the given Actualseat.
Data <- data.frame(Person, Seat, Actualseats, Actualperson)
Data$Actualperson[sample(1:100,1)] <- 1 #Selecting any seat from 100 empty seats in the theatre.
#this next loop cycles the decision procedure given in question from 2nd person to 36th person.
for(i in 2:36) {
if (Data$Actualperson[Data$Seat[i]] == 0) {
Data$Actualperson[Data$Seat[i]] <- i #If the seat assigned to ith person is empty then the person sits in it.
} else {
#This next line is very crucial piece and read it carefully.
#First square bracket selects only those seats which are empty. ie. Actualperson = 0
#Second square bracket randomly chooses 1 seat from these empty seats for ith person to sit in.
Data$Actualperson[which(Data$Actualperson == 0)][sample(1:length(Data$Actualperson[which(Data$Actualperson == 0)]), 1)] <- i #If their assigned seat is unavailable then they select randomly from remaining empty seats.
}
} #Here the loop ends for one trial. T
if(Data$Actualperson[Data$Seat[36]] == 36) {
Register <- append(Register, "Yes", after = length(Register)) #if 36th person is sitting in his alloted seat then add "Yes" to the Register.
} else {
Register <- append(Register, "No", after = length(Register)) #if 36th person is not sitting in his alloted seat then add "No" to the Register.
}
}
Probability <- length(Register[which(Register=="Yes")])/length(Register)
Probregister <- append(Probregister, Probability, after = length(Probregister))
}
Probsummary <- summary(Probregister)
plot(density(Probregister), col="red")
abline(v = Probsummary[3], col="blue")
This is the simulation I ran. Here p is the proportion of the 100 seats that the actor removes. If you prefer, pass the number of seats n directly and delete the n <- floor(100 * p) line in the function.
func <- function(p){
x <- c(1:100) #stands for seats
y <- c(1:36) #stands for 36 person's seats, consider it as 1~36 cause it doesn't matter
correct <- rep(NA, 36) #dummy vector to record if person seat on correct seat
fin_passenger_dummy <- rep(NA,36) #dummy vector to record final passenger seat on correct seat
n <- floor(100 * p) #number of seats that an actor remove
yy <- sample(y, 36) #order of persons
actor <- sample(x, n) #id's of removed seats
seats <- setdiff(x, actor) #id's of remained seats
for (i in 1:36){
if (yy[i] %in% seats){
correct[yy[i]] <- TRUE #append that yy[i] seat on his seat
fin_passenger_dummy[i] <- TRUE #append that yy[i] seat on his seat
seats <- setdiff(seats, yy[i]) #update remaining seats
} else{
y_sad <- sample(seats, 1) #randomly choose seat to seat
correct[yy[i]] <- FALSE
fin_passenger_dummy[i] <- FALSE
seats <- setdiff(seats, y_sad)
}
}
return(list(total = correct, final = tail(fin_passenger_dummy, 1))) #tail() is base R; last() would need dplyr or data.table
}
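To estimate the probability for the last person, replicate this function enough times and take the mean of $final. Note that $final is TRUE when the last person sits in their own seat, so the wrong-seat probability is one minus this mean. For example, letting p = 0.3 means the actor removes 30 seats,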
dummy <- c()
for (i in 1:1000){
dummy <- c(dummy, func(0.3)$final)
}
mean(dummy)
[1] 0.532
And to get "On average, approximately how many people will sit in the wrong seat" (note that $total is TRUE for people in their own seat, so subtract the result from 36),
dummy <- c()
for (i in 1:1000){
dummy <- c(dummy, sum(func(0.3)$total))
}
mean(dummy)
[1] 11.7015
will do.
If you need more explanation of the code, please let me know.
I'll refer to anyone that sits in a random unticketed seat as a "floater". The first floater is the actor. If the actor takes someone's seat, that person becomes the floater, etc. A few observations to speed things up:
The actual seats positions/ordering doesn't matter, only the order with which the guests enter the theatre
Each unoccupied ticketed seat has an equal probability that the floater will sit in it
The probability that the floater will sit in another guest's seat is the number of unoccupied ticketed seats divided by the number of unoccupied seats
An expected 64% of the simulation replications result in all guests sitting in their ticketed seats. The other replications don't need to be simulated. We need only simulate the number of replications that require simulation (via rbinom).
This lets us run the simulation recursively:
TheatreRec <- function(tickets, seats) {
# recursive function for simulating the theatre seating problem
# Inputs:
# tickets: the number of ticketed guests yet to be seated
# seats: the number of unoccupied seats remaining
# Output: an integer vector of length 2:
# 1: number of guests in the wrong seat
# 2: whether the last seated guest sits in the wrong seat (0 or 1)
# the floater sits in a random unoccupied seat
floater <- sample(seats, 1)
if (floater > tickets) {
# the floater didn't take anyone's seat
return(c(0L, 0L))
} else if (floater < tickets){
# the floater took one of the guests' seats, but not the last guest's seat
return(c(1L, 0L) + TheatreRec(tickets - floater, seats - floater))
} else {
# the floater took the last guest's seat
return(c(1L, 1L))
}
}
# create a vectorized version of TheatreRec
TheatreRecVec <- Vectorize(TheatreRec)
I'll run a million replications. For an expected 36% of the replications, the actor will sit in one of the guests' seats. For these replications, use the sample function to simulate whose seat the actor takes (in order of entry into the theatre). Then complete the simulation with TheatreRecVec, which gives the results column-wise. Note that for all of these replications, the first floater (after the actor) needs to be added to the results of TheatreRecVec.
floater <- sample(36, rbinom(1, 1e6, 0.36), replace = TRUE)
(results <- setNames(rowSums(rbind(1L, floater == 36) + TheatreRecVec(36L - floater, 100L - floater))/1e6, c("Avg. in wrong seat", "P(last guest in wrong seat)")))
Avg. in wrong seat P(last guest in wrong seat)
0.442944 0.015423
EDIT to compare the simulation to the exact solutions:
To check the simulation, we can use the exact value for the expected number of guests who end up in the wrong seat:
digamma(s + 1) - digamma(s - g + 1)
where s is the number of seats in the theatre, and g is the number of ticketed guests.
> digamma(100 + 1) - digamma(100 - 36 + 1)
[1] 0.4434866
The probability that the last guest ends up in the wrong seat is simply 1/(s - g + 1)
> 1/(100 - 36 + 1)
[1] 0.01538462
These match pretty closely with the simulation results above.
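For readers less familiar with digamma(), the exact expectation above can also be written as a plain sum of reciprocals, since a difference of digamma values at integer arguments is a partial harmonic sum. A small sketch, using the same s and g as above (the names via_digamma and via_sum are mine):

```r
s <- 100  # seats in the theatre
g <- 36   # ticketed guests

via_digamma <- digamma(s + 1) - digamma(s - g + 1)
via_sum     <- sum(1 / ((s - g + 1):s))   # 1/65 + 1/66 + ... + 1/100

all.equal(via_digamma, via_sum)
# [1] TRUE
```

Note that the first term of the sum, 1/(s - g + 1), is the same quantity as the last-guest probability given above.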
How can the following be accomplished with R?
Connect a constantly changing data source (e.g. https://goo.gl/XCM6yG) into R,
Measure time once prices start to rise consistently from initial baseline range to peak (represented by the green horizontal line),
Measure time from peak back to baseline range (the teal line)
Note: "Departure from baseline range" (unless there is a better mathematical way) defined as at least the most recent 5 prices all being over 3 standard deviations above the mean of the latest 200 prices
This is a really vague question with an unknown use case, but here we go.
Monitoring in what way? The length? That's what I did.
Once the vector has more than 200 values we can take the mean, so we need control flow for that part.
I added some noise to force the behavior you want to detect (ifelse(i %in% 996:1000, 100, 0) means: if the iterator is in 996 to 1000, add 100 to the random normal I generated). We keep a counter and check whether each value is more than 3 sd above the mean; once that has happened five times in a row, we record the time.
At each input of the data, check whether the current value is the max value. This is trickier, since we would have to look at the trend, and it is beyond the scope of my assistance.
Up to you to figure out, since I don't really understand.
vec <- vecmean <- val5 <- c()
counter <- 0
for(i in 1:1000){
vec[i] <- rnorm(1) + ifelse(i %in% 996:1000, 100, 0)
Sys.sleep(.001) # change to 1 second
#1
cat('The vector has',length(vec),'values within...\n')
#2
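if(length(vec)>200){
window <- vec[(i-199):i] # latest 200 observations ((i-200):i would be 201)
vecmean <- c(vecmean, mean(window))
cat('The mean of the last 200 observations is ',
format(vecmean[length(vecmean)], digits =2),'\n')
#3
upr <- vecmean[length(vecmean)] + 3*sd(window) # sd of the same window, not of all data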
if(vec[i] > upr){
counter <- counter + 1
} else{
counter <- 0
}
if(counter > 4){
cat('Last 5 values greater than 3sd above the rolling mean!\n')
val5 <- Sys.time()
cat("Timestamp:",as.character(val5),'\n')
}
}
# 4
theMax <- max(vec)
if(vec[i] == theMax && !is.null(val5) ){
valMax <- Sys.time()
valDiff <- valMax - val5
cat('The time difference between the first flag and second is', as.character(valDiff),'\n')
}
}
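The "departure from baseline" rule from the question can also be written as a small stand-alone function, which is easier to test than the streaming loop. This is only a sketch under one reading of the rule: the baseline is taken to be the 200 values immediately before the latest 5, so the spike itself does not inflate the mean and sd. The function name first_departure and its arguments are hypothetical, not part of any package.

```r
# Hypothetical helper: first index at which the latest `run` values all exceed
# mean + k * sd of the `window` values immediately before them.
first_departure <- function(prices, window = 200, run = 5, k = 3) {
  n <- length(prices)
  if (n < window + run) return(NA_integer_)
  for (i in (window + run):n) {
    baseline <- prices[(i - run - window + 1):(i - run)]
    recent   <- prices[(i - run + 1):i]
    if (all(recent > mean(baseline) + k * sd(baseline))) return(i)
  }
  NA_integer_
}

# Deterministic toy series: values alternate between -1 and 1, then jump to 100
prices <- rep(c(-1, 1), 500)
prices[996:1000] <- 100
first_departure(prices)
# [1] 1000
```

With real prices you would record Sys.time() (or the timestamp of the observation) at the returned index rather than the index itself.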
I'm new to R, so most of my code is most likely wrong. However, I was wondering how to use a while() loop within a for() loop. I'm trying to simulate rolling a pair of dice several times. If the total is 2, 3, 7, 11, or 12, I stop. If the total is 4, 5, 6, 8, 9, or 10, I keep rolling the dice until either the initial total or a 7 appears. I'm trying to find the average number of rolls it takes to end the game.
count = 0
x = NULL
for (i in 1:10) {
x[i] = c(sample(1:6,1) +sample(1:6,1))
if(x[i] == c(2||3||7||11||12)) {
if(TRUE) {count = count +1}
} else { while(x[i] == c(4||5||6||8||9||10)) {
x[i +1] = c(sample(1:6,1)+sample(1:6,1))
if(x[i+1] == c(x[i]||7)) {
if(TRUE){count = count + x[i+1]}
}
}
}
}
print(count)
I think there are a few issues with your logic. I'm not quite sure what you're trying to do in your code, but this is my interpretation of your description of the problem. It only runs a single round of your game, but it should work if you embed it in a for loop. Just don't reset count or the random-number seed inside your loop; then count will give you the total number of rolls, and you can divide by the number of rounds to get the average.
Setup:
count = 0
sscore <- c(2,3,7,11,12)
set.seed(101)
debug = TRUE
Running a single round:
x = sample(1:6,1) +sample(1:6,1) ## initial roll
count = count + 1
if (x %in% sscore) {
## don't need to do anything if we hit,
## as the roll has already been counted
if (debug) cat("hit",x,"\n")
} else {
## initialize while loop -- try one more time
y = c(sample(1:6,1)+sample(1:6,1))
count = count + 1
if (debug) cat("initial",x,"next",y,"\n")
while(!(y %in% c(x,7))) {
y = c(sample(1:6,1)+sample(1:6,1))
count = count+1
if (debug) cat("keep trying",y,"\n")
} ## end while
} ## end if-not-hit
print(count)
I tried embedding this in a for loop and got a mean of 3.453 for 1000 rounds, close to @PawelP's answer.
PS I hope this isn't homework, as I prefer not to answer homework questions ...
EDIT: I had a bug - forgot to remove if negation. Now the below seems to be 100% true to your description of the problem.
This is my implementation of the game you've described. It calculates the average number of rolls needed to end the game over TOTAL_GAMES games.
TOTAL_GAMES = 1000
counts = rep(0, TOTAL_GAMES)
x = NULL
for (i in 1:TOTAL_GAMES) {
x_start = c(sample(1:6,1) +sample(1:6,1))
counts[i] = counts[i] + 1
x = x_start
if(x %in% c(2, 3, 7, 11, 12)){
next
}
repeat {
x = c(sample(1:6,1)+sample(1:6,1))
counts[i] = counts[i] + 1
if(x %in% c(x_start, 7)){
break
}
}
}
print(mean(counts))
It seems that the average number of rolls is around 3.38
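The simulated value can be checked against the exact expectation. This is a sketch assuming two fair dice and the rules as described in the question; the variable names are mine. The first roll always counts once. If it sets a point, every further roll ends the game with probability P(point) + P(7), so the number of extra rolls is geometric with mean 1 / (P(point) + P(7)).

```r
# Probability of each total 2..12 with two fair dice
p <- c(1:6, 5:1) / 36
names(p) <- 2:12

points <- as.character(c(4, 5, 6, 8, 9, 10))

# One initial roll, plus the expected extra rolls weighted by
# the chance of landing on each point
expected_rolls <- 1 + sum(p[points] / (p[points] + p["7"]))
expected_rolls
# [1] 3.375758
```

This equals 557/165, so a simulated average of about 3.38 is right where it should be.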
Here's one approach to this question - I made a function that runs a single trial, and another function which conducts a variable number of these trials and returns the cumulative average.
## Single trial
rollDice <- function(){
init <- sample(1:6,1)+sample(1:6,1)
rolls <- 1
if( init %in% c(2,3,7,11,12) ){
return(1)
} else {
Flag <- TRUE
while( Flag ){
roll <- sample(1:6,1)+sample(1:6,1)
rolls <- rolls + 1
if( roll %in% c(init,7) ){
Flag <- FALSE
}
rolls
}
}
return(rolls)
}
## Multiple trials
simAvg <- function(nsim = 100){
x <- replicate(nsim,rollDice())
Reduce("+",x)/nsim
}
##
## Testing
nTrial <- seq(1,1000,25)
Results <- sapply(nTrial, function(X){ simAvg(X) })
##
## Plot over varying number of simulations
plot(x=nTrial,y=Results,pch=20)
As @Ben Bolker pointed out, you had a couple of errors in how you used ||, which is understandable for someone new to R. Also, you'll probably hear it a thousand times, but for and while loops are pretty inefficient in R so you generally want to avoid them if possible. In the case of the while loop in the above rollDice() function, it probably isn't a big deal because the probability of the loop executing a large number of times is very low. I used the functions Reduce and replicate to serve the role of a for loop in the second function. Good question though, it was fun to work on.
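To make the || problem concrete, here is a small sketch of what an expression like x[i] == c(2||3||7||11||12) from the question actually evaluates to, next to the %in% form used in the answers. The variable names are just for illustration.

```r
x <- 7
stop_totals <- c(2, 3, 7, 11, 12)

# `||` treats every nonzero number as TRUE, so the whole chain is just TRUE
chained <- (2 || 3 || 7 || 11 || 12)
chained          # TRUE

# Comparing a dice total with TRUE coerces TRUE to 1, and 7 != 1
x == chained     # FALSE

# Set membership is what the game rule needs
x %in% stop_totals   # TRUE
```

Since a dice total is never 1, the comparison against the chained expression is always FALSE, which is why the original if() branch never fired.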
I have a working solution to my problem, but I will not be able to use it because it is so slow (my calculations predict that the whole simulation will take 2-3 years!). Thus I am looking for a better (faster) solution. This is (in essence) the code I am working with:
N=4
x <-NULL
for (i in 1:N) { #first loop
v <-sample(0:1, 1000000, 1/2) #generate data
v <-as.data.frame(v) #convert to dataframe
v$t <-rep(1:2, each=250) #group
v$p <-rep(1:2000, each=500) #p.number
# second loop
for (j in 1:2000) { #second loop
#count rle for group 1 for each pnumber
x <- rbind(x, table(rle(v$v[v$t==1&v$p==j])))
#count rle for group 2 for each pnumber
x <- rbind(x, table(rle(v$v[v$t==2&v$p==j])))
} #end second loop
} #end first loop
#total rle counts for both group 1 & 2
y <-aggregate(x, list(as.numeric(rownames(x))), sum)
In words: The code generates a coin-flip simulation (v). A group factor is generated (1 & 2). A p.number factor is generated (1:2000). The run lengths are recorded for each p.number (1:2000) for both groups 1 & group 2 (each p.number has runs in both groups). After N loops (the first loop), the total run lengths are presented as a table (aggregate) (that is, the run lengths for each group, for each p.number, over N loops as a total).
I need the first loop because the data that I am working with comes in individual files (so I'm loading the file, calculating various statistics etc and then loading the next file and doing the same). I am much less attached to the second loop, but can't figure out how to replace it with something faster.
What can be done to the second loop to make it (hopefully, a lot) faster?
You are committing the cardinal sin of growing an object within a for() loop in R. Don't (I repeat don't) do this. Allocate sufficient storage for x at the beginning and then fill in x as you go.
x <- matrix(nrow = N * (2000 * 2), ncol = ??)
Then in the inner loop
x[ii, ] <- table(rle(....))
where ii is a loop counter that you initialise to 1 before the first loop and increment within the second loop:
x <- matrix(nrow = N * (2000 * 2), ncol = ??)
ii <- 1
for(i in 1:N) {
.... # stuff here
for(j in 1:2000) {
.... # stuff here
x[ii, ] <- table(rle(....))
## increment ii
ii <- ii + 1
x[ii, ] <- table(rle(....))
## increment ii
ii <- ii + 1
} ## end inner loop
} ## end outer loop
Also note that you are reusing index i in both for() loops, which will not work. i is just a normal R object, and so both for() loops will be overwriting it as they progress. Use j for the second loop as I did above.
Try that simple optimisation first and see if that will allow the real simulation to complete in an acceptable amount of time. If not, come back with a new Q showing the latest code and we can think about other optimisations. The optimisation above is simple to do, optimising table() and rle() might take a lot more work. Noting that, you might look at the tabulate() function which does the heavy lifting in table(), which might be one avenue for optimising that particular step.
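One way the preallocation and the tabulate() suggestion could fit together is sketched below. It sidesteps the unknown column count by fixing a maximum run length up front, which is my assumption and not something from the question: tabulate() ignores values above nbins, so runs longer than max_run would be silently dropped. The names n_rows and max_run and the sizes used are hypothetical.

```r
set.seed(1)
n_rows  <- 8    # hypothetical number of (file, group, p.number) chunks
max_run <- 20   # hypothetical cap on run length; longer runs are ignored by tabulate()

# Allocate the full result once, then fill it row by row
x <- matrix(0L, nrow = n_rows, ncol = max_run)
for (ii in seq_len(n_rows)) {
  flips <- sample(0:1, 250, replace = TRUE)
  # column k of row ii = number of runs of length k in this chunk
  x[ii, ] <- tabulate(rle(flips)$lengths, nbins = max_run)
}

colSums(x)   # total number of runs of each length across all chunks
```

Unlike table(rle(...)), this counts runs of 0s and 1s together; keeping them apart would need one such row (or matrix) per value.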
If you just want to run rle and table for each combination of the values of v$t and v$p separately, there is no need for the second loop. It is much faster in this way:
values <- v$v + v$t * 10 + v$p * 100
runlength <- rle(values)
runlength$values <- runlength$values %% 2
x <- table(runlength)
y <- aggregate(unclass(x), list(as.numeric(rownames(x))), sum)
The whole code will look like this. If N is as low as 4, the growing object x will not be a severe problem. But generally I agree with @GavinSimpson, that it is not a good programming technique.
N=4
x <-NULL
for (i in 1:N) { #first loop
v <-sample(0:1, 1000000, 1/2) #generate data
v <-as.data.frame(v) #convert to dataframe
v$t <-rep(1:2, each=250) #group
v$p <-rep(1:2000, each=500) #p.number
values <- v$v + N * 10 + v$t * 100 + v$p * 1000
runlength <- rle(values)
runlength$values <- runlength$values %% 2
x <- rbind(x, table(runlength))
} #end first loop
y <-aggregate(x, list(as.numeric(rownames(x))), sum) #tota
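To see why folding the grouping columns into the values works, here is a tiny sketch on made-up data (the vectors v and group are illustrative only). A plain rle() would merge a run that crosses a group boundary; adding an even, group-specific offset breaks the run at the boundary, and %% 2 recovers the original 0/1 value afterwards.

```r
v     <- c(1, 1, 1, 1, 0, 0)
group <- c(1, 1, 2, 2, 2, 2)

rle(v)$lengths        # 4 2 : the run of 1s spans both groups

key <- v + group * 10 # 11 11 21 21 20 20
r <- rle(key)
r$lengths             # 2 2 2 : the run is split at the group boundary
r$values %% 2         # 1 1 0 : original coin values recovered
```

The offsets must be even for the %% 2 step to work, which holds for the multipliers 10, 100 and 1000 used above.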
I want to simulate different poker hands. Through painful trial and error, I got the ranks, suits, the deck and a function to draw any given number of cards as:
suits <- c("spd","hrt","dimd","clbs")
ranks <- c(1:10,"J","Q","K")
deck <- as.matrix(expand.grid('rank' = ranks, 'suit' = suits))
draw <- function (n) deck[sample(nrow(deck), n), ]
draw(5) # Drawing 5 cards from the deck...
Output:
rank suit
[1,] "4" "dimd"
[2,] "6" "dimd"
[3,] "8" "spd"
[4,] "K" "hrt"
[5,] "8" "clbs"
Now I want to find out through simulation the probability of getting different hands. I did come up with some possible loops with a counter for the number of successes but I am stuck.
Here is an example... Let me try to figure out how many full houses I get in 1000 simulations. Since a full house is defined as "three matching cards of one rank and two matching cards of another rank", I figured that the key part of the function would be to have a boolean within an if statement that takes advantage of the R function unique()==2, meaning 2 unique ranks - with 5 cards dealt, 2 unique ranks could be a full house (another possibility is four-of-a-kind with any other rank).
iterations <- 1000
counter <- 0
for (i in iterations){
s <- draw(5)
if(length(unique(s[,1])) == 2) counter <- counter + 1
}
counter
Output: [1] 0
I have tried multiple other things, including counter[i] <- 1 for successful cases, and with the idea of running a sum(counter) at the end, but all without getting the loop to work.
In your code you have:
for(i in 1000) {
print(i)
} # 1000
It would only print once, because 1000 is a vector of length one, so i takes the single value 1000. Loop over 1:iterations instead.
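A quick way to convince yourself of the difference is to count how many times each loop body runs. This is a small sketch; the counter names are mine.

```r
iterations <- 1000

n_single <- 0
for (i in iterations) n_single <- n_single + 1          # iterates over c(1000)

n_full <- 0
for (i in seq_len(iterations)) n_full <- n_full + 1     # iterates over 1, 2, ..., 1000

c(n_single, n_full)
# [1]    1 1000
```

seq_len(iterations) behaves like 1:iterations here, and it also does the right thing (zero iterations) if iterations is ever 0.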
Here's an alternative approach using rle.
iterations <- 10000
draws <- list()
for (i in 1:iterations){
s <- draw(5)
draws[[i]] <- all(rle(sort(s[,1]))$lengths %in% c(2,3))
if(draws[[i]]) {
print(s)
}
}
summary(unlist(draws))
Using a data frame as follows, it seems to produce the result you are looking for:
suits <- c("spd","hrt","dimd","clbs")
ranks <- c(1:10,"J","Q","K")
deck <- as.data.frame(expand.grid('rank' = ranks, 'suit' = suits))
draw <- function (n) deck[sample(nrow(deck), n), ]
counter <- 0;
for (i in 1:1000) {
df <- draw(5);
counter <- counter + (length(unique(df$rank)) == 2)
}
counter
[1] 156
suits <- c("spd","hrt","dimd","clbs")
ranks <- c(1:10,"J","Q","K")
deck <- as.data.frame(expand.grid('rank' = ranks, 'suit' = suits))
draw <- function (n) deck[sample(nrow(deck), n), ]
iterations <- 1000
counter <- 0
for (i in 1:iterations) {
hand <- draw(5)
rank_table <- table(hand[, 1])
if (length(names(rank_table)) == 2 && min(rank_table) > 1) counter <- counter + 1
# could have four of a rank, one of another;
# need to ensure two of a rank, three of another
}
counter
[1] 1
This result is not far from what is expected: http://www.math.hawaii.edu/~ramsey/Probability/PokerHands.html
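For reference, the exact full-house probability for a standard 52-card deck can be computed with choose(), which gives a baseline for judging any of the simulated counts above. The variable names are mine.

```r
# Full house: pick the rank of the triple and 3 of its 4 suits,
# then the rank of the pair and 2 of its 4 suits
full_houses  <- 13 * choose(4, 3) * 12 * choose(4, 2)   # 3744 hands
p_full_house <- full_houses / choose(52, 5)             # 3744 / 2598960

p_full_house * 1000   # expected count in 1000 hands, about 1.44
```

So with 1000 draws you should expect one or two full houses, and far more iterations are needed for a stable estimate.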