Using R as a game simulator - r

I am trying to simulate a simple game where you spin a spinner labeled 1-5 and advance that many spaces until you pass the finish line (spot 50). I am a bit new to R and have been working on this for a while, searching for answers. When I run the code below, it doesn't add the numbers in sequence; it returns a list of my 50 random spins and their values. How do I get it to add the spins on top of each other, then stop once the total is >= 50?
SpacesOnSpinner <- seq(1, 5, by = 1)
N <- 50
L1 <- integer(N)
for (i in 1:N){
  takeaspin <- sample(SpacesOnSpinner, 1, replace = TRUE)
  L1[i] <- L1[i] + takeaspin
}

This is a good use-case for replicate. I'm not sure if you have to use a for loop, but you could do this instead (replicate is a loop too):
SpacesOnSpinner<-(seq(1,5,by=1))
N<-10
cumsum( replicate( N , sample(SpacesOnSpinner,1,replace=TRUE) ) )
#[1] 5 10 14 19 22 25 27 29 30 33
However, since you have a condition you want to break on, perhaps the other answer with a while condition is exactly what you need in this case (people will tell you while loops are bad in R, but they have their uses). Using this method, you can see how many spins it took to get past 50 with a simple subset afterwards (you will not know in advance how many spins it will take, but at most it will be 50!):
N<-50
x <- cumsum( replicate( N , sample(5,1) ) )
# running total after each spin, up to (but not including) 50
x[ x < 50 ]
#[1] 5 6 7 8 12 16 21 24 25 29 33 34 36 38 39 41 42 44 45 49
# number of spins before the total reaches 50
length(x[x < 50])
#[1] 20

Here is another interesting way to simulate your game, using a recursive function.
spin <- function(outcomes = 1:5, start = 0L, end = 50L)
  if (start <= end)
    c(got <- sample(outcomes, 1), Recall(outcomes, start + got, end))
spin()
# [1] 5 4 4 5 1 5 3 2 3 4 4 1 5 4 3
Although elegant, it won't be as fast as an improved version of @Simon's solution that makes a single call to sample, as suggested by @Viktor:
spin <- function(outcomes = 1:5, end = 50L) {
  max.spins <- ceiling(end / min(outcomes))
  x <- sample(outcomes, max.spins, replace = TRUE)
  head(x, match(TRUE, cumsum(x) >= end))
}
spin()
# [1] 3 5 2 3 5 2 2 5 1 2 1 5 5 5 2 4
For your ultimate goal (finding the probability of one person being in the lead for the entire game), it is debatable whether while will be more efficient or not: a while loop is certainly slower, but you may benefit from the possibility of exiting early as soon as the lead switches from one player to the other. Both approaches are worth testing.
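As a rough illustration of that early-exit idea (my own sketch, not from either answer; it assumes "in the lead" means player 1 is strictly ahead after every pair of spins):
lead_entire_game <- function(outcomes = 1:5, end = 50) {
  p1 <- p2 <- 0
  while (p1 < end && p2 < end) {
    p1 <- p1 + sample(outcomes, 1)
    p2 <- p2 + sample(outcomes, 1)
    if (p1 <= p2) return(FALSE)  # bail out as soon as player 1 is not ahead
  }
  TRUE
}
# estimate the probability over many simulated games
mean(replicate(10000, lead_entire_game()))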

You can use a while statement and a variable total to keep track of the sum (note the condition is total < 50, so the loop stops as soon as the total reaches or passes 50):
total <- 0
while (total < 50){
  takeaspin <- sample(SpacesOnSpinner, 1, replace = TRUE)
  total <- takeaspin + total
}
print(total)
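If you also want to know how many spins it took, a small extension (my own addition, not part of the original answer) is to count inside the same loop:
total <- 0
spins <- 0
while (total < 50){
  total <- total + sample(SpacesOnSpinner, 1, replace = TRUE)
  spins <- spins + 1  # count each spin as it happens
}
c(total = total, spins = spins)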

Related

How to simulate a martingale process problem in R?

100 people are watching a show at a theater. At the end of the show they all go to the coat check to collect their coats. The attendant hands the coats back completely at random. The people who receive their own coat leave; the others hand the coat back, and the attendant again hands out the remaining coats at random. The process ends when every customer has their own coat back.
I want to simulate this martingale process in R in order to find the expected time until the process ends.
But I don't know how. Any help?
Something like:
# 100 customers
x = seq(1,100,by=1);x
# random sample from x
y = sample(x,100,replace=FALSE)
x==y
# for the next iteration, exclude those that are TRUE and run it again until everyone is TRUE
The expected time is how many iterations were needed.
Or something like this:
n = 100
X = seq(1,100,by=1)
martingale = rep(NA,n)
iterations = 0
accept = 0
while (X != n) {
  iterations = iterations + 1
  y = sample(1:100,100,replace=FALSE)
  if (X = y){
    accept = accept + 1
    X = X+1
    martingale[X] = y
  }
}
accept
iterations
One way to do this is as follows (using 10 people as an example; the print statement is unnecessary and just shows what happens in each iteration):
set.seed(0)
x <- 1:10
count <- 0
while(length(x) > 0){
  x <- x[x != sample(x)]
  print(x)
  count <- count + 1
}
# [1] 1 2 3 4 5 6 7 9 10
# [1] 3 4 5 6 7 9
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 6
#
count
# [1] 10
For each step in the loop, it removes the values of x where the customers have been randomly allocated their coat, until there are none left.
To use this code to get the expected time taken for 100 people, you could extend it to:
set.seed(0)
nits <- 1000 #simulate the problem 1000 times
count <- 0
for (i in 1:nits){
  x <- 1:100
  while(length(x) > 0){
    x <- x[x != sample(x)]
    count <- count + 1/nits
  }
}
count
# [1] 99.901
I hypothesise, without proof, that the expected time for n people is n iterations - it seems pretty close when I tried it with 50, 100 or 200 people.
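Wrapping the same simulation in a helper makes it easy to try a few group sizes (my own quick check, not a proof):
expected_rounds <- function(n, nits = 2000) {
  mean(replicate(nits, {
    x <- 1:n
    count <- 0
    while (length(x) > 0){
      x <- x[x != sample(x)]  # drop everyone who got their own coat back
      count <- count + 1
    }
    count
  }))
}
sapply(c(10, 50, 100), expected_rounds)
# the estimates should come out close to 10, 50 and 100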
I didn't follow your discussion above and I'm not entirely sure if this is what you want, but my rationale was as follows:
You have N people and queue them.
In the first round the first person has a chance of 1/N to get their clothes right.
At this point you have two options: either person 1 gets their clothes right or not.
If person 1 gets their clothes right, then person 2 has a chance of 1/(N-1) to get their clothes right. If person 1 didn't get the correct clothes, person 1 remains in the pool (at the end), and person 2 also has a 1/N probability to get their clothes right.
You continue to assign these probabilities until all N persons have seen the clerk once. Then you sort out those who have the right clothes and repeat from step 1 until everyone has their clothes right.
For simulation purposes, you'd of course repeat the whole thing 1000 or 10000 times.
If I understand you correctly, you are interested in the number of iterations, i.e. how often the clerk has to go through the whole queue (or what remains of it) until everyone has their clothes.
library(tidyverse)
people <- 100
results <- data.frame(people = 1:people,
                      iterations = NA)
counter <- 0
finished <- 0
while (finished < people)
{
  loop_people <- results %>%
    filter(is.na(iterations)) %>%
    pull(people)
  loop_prob <- 1/length(loop_people)
  loop_correct <- 0
  for (i in 1:length(loop_people))
  {
    correct_clothes_i <- sample(c(0,1), size = 1, prob = c(1-loop_prob, loop_prob))
    if (correct_clothes_i == 1)
    {
      results[loop_people[i], 2] <- counter + 1
      loop_correct <- loop_correct + 1
      loop_prob <- 1/(length(loop_people) - loop_correct)
    }
  }
  counter <- counter + 1
  finished <- length(which(!is.na(results$iterations)))
}
max(results$iterations)
[1] 86
head(results)
people iterations
1 1 7
2 2 42
3 3 86
4 4 67
5 5 2
6 6 9
The results$iterations column contains the iteration number where each person has gotten their clothes right, thus max(results$iterations) gives you the total number of loops.
I have no proof, but empirically and intuitively the number of required iterations should approach N.

How to speed up this function (for n parameters) in R?

I have this function:
col <- 0
rres <- data.frame(matrix(nrow=nrow(ind),ncol=length(lt)))
gig <- NULL
> lt
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
> delta.1
[1] 5 7 9 10 12 15 17 20 22 26 29 34 39 46 54 68 96 138 138
> f.bio
function(x,y,a,b,l,k,m)
{
  for (t in 1:nrow(y)){
    for (i in 1:length(lt)){
      for (j in 1:delta.1[i]){
        ifelse(t+j-1 > nrow(x),
               gig[j] <- NA,
               gig[j] <- x[t+j-1,i] *
                 (a*(l-(((l-(lt[i]+1))/(exp(-k*((j-1)/12))))))^b) *
                 exp(m[(1+j),i]*(j-1)))
      }
      rres[t,i] <- sum(gig, na.rm = TRUE)
    }
    result <- apply(rres, 1, function(x) sum(x)/1000000)
  }
  return(result)
}
which is applied to some biological data; the call is:
f.bio(ind,eff,a_all,b_all,Linf,K_coef,mort)
where the arguments are:
> dim(ind)
[1] 1356 19
> dim(eff)
[1] 1356 1
a_all = 0.004
b_all= 3
Linf= 19.4
K_coef = 0.57
> dim(mort)
[1] 110 19
ind, eff, and mort are data.frames.
Now, my question is: is it possible to apply this function across n parameter values without excessive machine time?
By n parameters I mean a distribution of values for each parameter, for example:
set.seed(1)
a_all_v <- round(sort(rnorm(40,a_all,0.00034)),5) #40 values!!
and so on for the 4 parameters: a_all, b_all, K_coef, Linf.
I wrote this code with a loop (in the loop I combine a_all with b_all, and Linf with K_coef):
col <- 0
for (m1 in 1:length(a_all_v)){
  a_all <- a_all_v[m1]
  b_all <- b_all_v[m1]
  for (m2 in 1:length(Linf_v)){
    Linf <- Linf_v[m2]
    K_coef <- k_coef_v[m2]
    col <- col + 1
    res.temp <- f.bio(ind,eff,a_all,b_all,Linf,K_coef,mort)
    res.2[,col] <- res.temp
  }
}
where res.2 is:
res.2 <- data.frame(matrix(nrow=1356,ncol=1600)) # 1600 = 40*40 (number of values in each parameter distribution)
This loop takes a huge amount of machine time (many days on my PC). For this reason, is there some package or approach (like Monte Carlo or bootstrap methods) that could change my code structure and run the function over a good number of parameter combinations in a reasonable time, if that is possible?
If you keep your current setup with for loops, you need to start preallocating your output objects. For example, you start with an empty gig (NULL) and fill it iteratively. The way you do it right now, gig has to be rebuilt over and over as the analysis progresses, and reallocation of memory is a very expensive operation. Simply making gig as large as it needs to be up front and then doing the assignments will speed up your code tremendously.
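A toy illustration of why growing an object in a loop hurts (my own example, not your f.bio; it uses explicit c() appending to make the effect obvious):
n <- 5e4
grow <- function() { g <- NULL; for (j in 1:n) g <- c(g, j * 2); g }         # g is copied in full on every iteration
prealloc <- function() { g <- numeric(n); for (j in 1:n) g[j] <- j * 2; g }  # memory allocated once up front
system.time(grow())      # much slower
system.time(prealloc())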
Even better is to avoid solving your problem via for loops (which are notoriously slow, even with preallocation) and instead use either:
Vectorisation and matrix calculations. These will be orders of magnitude faster (see the sketch below).
dplyr or data.table. If used smartly, these will also be much faster, but vectorisation is probably faster still.
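To make the vectorisation concrete, here is an untested sketch of f.bio with the innermost j loop replaced by vectorised arithmetic (my own rewrite of the code in the question, reusing the same global lt and delta.1; the name f.bio.vec is made up, and it rebuilds the intermediate values from scratch each time, so check it against your data before trusting it):
f.bio.vec <- function(x, y, a, b, l, k, m) {
  rres <- matrix(0, nrow = nrow(y), ncol = length(lt))
  for (t in 1:nrow(y)) {
    for (i in 1:length(lt)) {
      j    <- 1:delta.1[i]
      rows <- t + j - 1
      ok   <- rows <= nrow(x)   # time steps running past the end of x contribute nothing
      rres[t, i] <- sum(x[rows[ok], i] *
                          a * (l - (l - (lt[i] + 1)) / exp(-k * (j[ok] - 1) / 12))^b *
                          exp(m[1 + j[ok], i] * (j[ok] - 1)),
                        na.rm = TRUE)
    }
  }
  rowSums(rres) / 1000000
}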

Create master list adding iterations of values in another list with known interval in R

I have a solution that works, but would appreciate ideas to improve the code to avoid using loops if possible.
I have a list of values; it is read in from a csv file, but takes the form
startingvalues = c(1, 7, 20, 32, 47)
I want to create a new list that reads in each of these starting values and adds the next 2 (or 7, or 15, etc.) numbers, then moves on to the next starting value. For the above example this would be
newlist = c(1,2,3,7,8,9,20,21,22,32,33,34,47,48,49)
I have code that works, but I suspect there is a more elegant way to do this. I am not particularly worried about speed but would like to avoid the loop if there is a better way to do this.
newlist = c() # initialise an empty list
for (i in 1:length(startingvalues)){
  list1 = seq(startingvalues[i], startingvalues[i]+2, by = 1)
  newlist = c(newlist,list1)
}
Any suggestions to improve my coding would be appreciated. This may be the best way to do this, however I suspect it isn't.
How about something like this
extend <- function(x,y) unlist(lapply(x, seq.int, length.out=y+1))
extend(startingvalues, 2)
# [1] 1 2 3 7 8 9 20 21 22 32 33 34 47 48 49
The first parameter is the vector of numbers and the second is how far you want to extend each number. We just use lapply for the iteration and unlist the result at the end. This is better than appending at each iteration, which is not very efficient.
Here's another alternative
extend <- function(x,y) c(outer(0:y, x, `+`))
The outer() call builds a matrix, but we coerce it back to a vector with c().
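Called the same way, it gives the same result as the lapply version:
extend(startingvalues, 2)
# [1] 1 2 3 7 8 9 20 21 22 32 33 34 47 48 49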
We can use rep with + to get the expected output
unique(sort(startingvalues + rep(0:2, each = length(startingvalues))))
#[1] 1 2 3 7 8 9 20 21 22 32 33 34 47 48 49
Or, as @thelatemail mentioned, replicating 'startingvalues' and making use of recycling would be better, since the sort can then be avoided:
s1 <- 0:2
rep(startingvalues, each=length(s1)) + s1
#[1] 1 2 3 7 8 9 20 21 22 32 33 34 47 48 49

Parallel processing for multiple nested for loops

I am trying to run simulation scenarios which in turn should provide me with the best scenario for a given date, back-tested over a couple of months. The input for a specific scenario has 4 input variables, each of which can be in 5 states (625 permutations). The flow of the model is as follows:
Simulate 625 scenarios to get each of their profit
Rank each of the scenarios according to their profit
Repeat the process through a 1-day expanding window for the last 2 months starting on the 1st Dec 2015 - creating a time series of ranks for each of the 625 scenarios
The unfortunate result of this is 5 nested for loops, which can take extremely long to run. I had a look at the foreach package, but I am concerned about how the combining of the outputs will work in my scenario.
The current code that I am using works as follows. First I create the possible states of each of the inputs, along with the window:
a<-seq(as.Date("2015-12-01", "%Y-%m-%d"),as.Date(Sys.Date()-1, "%Y-%m-%d"),by="day")
#input variables
b<-seq(1,5,1)
c<-seq(1,5,1)
d<-seq(1,5,1)
e<-seq(1,5,1)
set.seed(3142)
tot_results<-NULL
Next the nested for loops proceed to run through the simulations for me.
for(i in 1:length(a))
{
  cat(paste0("\n","Current estimation date: ", a[i]),";iteration:",i," \n")
  #subset data for backtesting
  dataset_calc<-dataset[which(dataset$Date<=a[i]),]
  p=1
  results<-data.frame(rep(NA,625))
  for(j in 1:length(b))
  {
    for(k in 1:length(c))
    {
      for(l in 1:length(d))
      {
        for(m in 1:length(e))
        {
          if(i==1)
          {
            #create a unique ID to merge onto later
            unique_ID<-paste0(replicate(1, paste(sample(LETTERS, 5, replace=TRUE), collapse="")),round(runif(n=1,min=1,max=1000000)))
          }
          #Run profit calculation
          post_sim_results<-profit_calc(dataset_calc, param1=e[m],param2=d[l],param3=c[k],param4=b[j])
          #Extract the final profit amount
          profit<-round(post_sim_results[nrow(post_sim_results),],2)
          results[p,]<-data.frame(unique_ID,profit)
          p=p+1
        }
      }
    }
  }
  #extract the ranks for all scenarios
  rank<-rank(results$profit)
  #bind the ranks for the expanding window
  if(i==1)
  {
    tot_results<-data.frame(ID=results[,1],rank)
  }else{
    tot_results<-cbind(tot_results,rank)
  }
  suppressMessages(gc())
}
My biggest concern is the binding of the results given that the outer loop's actions are dependent on the output of the inner loops.
Any advice on how to proceed would be greatly appreciated.
So I think that you can vectorize most of this, which should give a big reduction in run time.
Currently, you use for-loops (5, to be exact) to create every combination of values, and then run the values one by one through profit_calc (a function that is not specified). Ideally, you'd just take all possible combinations in one go and push them through profit_calc in one single operation.
-- Rationale --
a <- 1:10
b <- 1:10
d <- rep(NA,10)
for (i in seq(a)) d[i] <- a[i] * b[i]
d
# [1] 1 4 9 16 25 36 49 64 81 100
Since * also works on vectors, we can rewrite this to:
a <- 1:10
b <- 1:10
d <- a*b
d
# [1] 1 4 9 16 25 36 49 64 81 100
While it may save us only one line of code, it actually reduces the problem from 10 steps to 1 step.
-- Application --
So how does that apply to your code? Well, given that we can vectorize profit_calc, you can basically generate a data frame where each row is every possible combination of your parameters. We can do this with expand.grid:
foo <- expand.grid(b,c,d,e)
head(foo)
# Var1 Var2 Var3 Var4
# 1 1 1 1 1
# 2 2 1 1 1
# 3 3 1 1 1
# 4 4 1 1 1
# 5 5 1 1 1
# 6 1 2 1 1
Let's say we have a formula... (a - b) * (c + d)... Then it would work like:
bar <- (foo[,1] - foo[,2]) * (foo[,3] + foo[,4])
head(bar)
# [1] 0 2 4 6 8 -2
So basically, try to find a way to replace for-loops with vectorised operations. If you cannot vectorise something, try looking into apply instead, as that can also save you some time in most cases. If your code is running too slowly, ideally first see whether you can write a more efficient script. You may also be interested in the microbenchmark package, or ?system.time.
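Since the question also asked about foreach and how the outputs get combined, here is a small self-contained sketch of that pattern (not part of the answer above; toy_profit and the sizes are made up, only the foreach/doParallel usage is real). Each iteration returns one vector of ranks and .combine = cbind binds them into a matrix, which removes the manual if(i==1)/cbind bookkeeping:
library(foreach)
library(doParallel)
registerDoParallel(cores = 2)
dates <- 1:5                                              # stand-in for the expanding window of dates
params <- expand.grid(b = 1:5, c = 1:5, d = 1:5, e = 1:5) # the 625 scenarios
toy_profit <- function(day, p) sum(p) * day + rnorm(1)    # placeholder for profit_calc
rank_mat <- foreach(day = dates, .combine = cbind) %dopar% {
  profits <- apply(params, 1, function(p) toy_profit(day, p))
  rank(profits)                                           # the returned vector becomes one column
}
stopImplicitCluster()
dim(rank_mat)  # 625 scenarios x 5 dates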

help with rle command

I'm having some trouble with an rle command that is designed to find the point at which participants reach 8 contiguous ones in a row.
For example, if:
x <- c(0,1,0,1,1,1,1,1,1,1,1,1)
I want it to return a value of 11.
Thanks to DWin, I've been using this piece of code:
which( rle(x2)$values==1 & rle(x2)$lengths >= 8)
sum(rle(x)$lengths[ 1:(min(which(rle(x)$lengths >= 8))-1) ]) + 8
I've been using this code successfully to process my data. However, I noticed that it made a mistake when processing one of my data files.
For example, if
x <- c(1,1,1,1,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
the code returns 19, which is the point at which eight contiguous zeros in a row is reached. I'm not sure what is going wrong or how to fix it.
Thanks in advance for your help.
Will
You need to paste the first line of code in its entirety into the second:
sum(rle(x)$lengths[ 1:(min(which( rle(x2)$values==1 & rle(x2)$lengths >= 8))-1) ]) + 8
[1] 39
However, here is another approach, using the function filter. This yields the same result in what I consider to be much more readable code:
which(filter(x2, rep(1/8, 8), sides=1) == 1)[1]
[1] 39
The filter function when used in this way essentially computes a moving average over a block of 8 values in the vector. I then return the position of the first value where the moving average equals 1.
In the basic programming course I teach, I advise students to give proper names to subresults, and to inspect these subresults:
lengthOfrepeatsOfAnything <- rle(x)$lengths
# 4 2 5 11 2 2 3 2 17
whichRepeatsAreOfOnes <- rle(x)$values==1
# TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE (runs 1, 3, 5, 7, 9 are runs of ones)
repeatsOfOnesLength <- lengthOfrepeatsOfAnything * whichRepeatsAreOfOnes # TRUE = 1, FALSE = 0
# 4 0 5 0 2 0 3 0 17
whichRepeatOfOneAreLongerThanEight <- which(repeatsOfOnesLength >= 8)
# 9
result <- NA
if(length(whichRepeatOfOneAreLongerThanEight)>0){
  firstRepeatOfOneAreLongerThanEight <- whichRepeatOfOneAreLongerThanEight[1]
  # 9
  if(firstRepeatOfOneAreLongerThanEight==1){
    result <- 8
  }
  else{
    repeatsBeforeFirstEightOnes <- 1:(firstRepeatOfOneAreLongerThanEight-1)
    # 1 2 3 4 5 6 7 8
    lengthsOfRepeatsBeforeFirstEightOnes <- lengthOfrepeatsOfAnything[repeatsBeforeFirstEightOnes]
    # 4 2 5 11 2 2 3 2
    result <- sum(lengthsOfRepeatsBeforeFirstEightOnes) + 8
  }
}
I know it doesn't look as dandy as a one-line solution, but it helps to make things clear and to pick up errors... Besides: what if you look back at this code in 4 months? Which one will be easier to understand again?
My advice would be to break the code up into simpler pieces. As suggested by @Nick, you want to write code which can be easily debugged, and modular coding allows you to do that.
# find runs of 0s and 1s
run_01 = rle(x)
# find run of 1's with length >=8
run_1 = with(run_01, which(values == 1 & lengths >=8))
# find starting position of run_1
start_pos = sum(run_01$lengths[1:(run_1 - 1)])
# add 8 to it
end_pos = start_pos + 8
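Running these pieces on the longer x from the question gives the same answer as the other approaches:
run_01 <- rle(x)
run_1 <- with(run_01, which(values == 1 & lengths >= 8))  # the 9th run
start_pos <- sum(run_01$lengths[1:(run_1 - 1)])           # 31 values come before it
start_pos + 8
# [1] 39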
