Using the sample function - r

I want to simulate rolling dice using the sample function as shown below, but my call to sample does not work:
n <- 6^48  # n is the total number of outcomes for tossing two dice 24 times
sample(n, 2, replace = TRUE, prob = NULL)

The code below rolls a single die 24 times by sampling from the six faces rather than from the 6^48 combined outcomes. For two dice, run it twice (once per die).
sample(c(1,2,3,4,5,6),24,replace = TRUE,prob = NULL)
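A minimal sketch of the two-dice case, assuming the goal is 24 tosses of a pair of dice. A likely reason the original call fails is that 6^48 (about 2.2e37) is far larger than any population size sample() can index, so it is easier to draw the individual faces. The names rolls and double_six and the fixed seed are illustrative assumptions, not part of the original post.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# 24 tosses of two dice: one row per toss, one column per die
rolls <- matrix(sample(1:6, 2 * 24, replace = TRUE), ncol = 2)

# Did at least one double six show up in the 24 tosses?
double_six <- any(rolls[, 1] == 6 & rolls[, 2] == 6)
```

Wrapping the last two statements in replicate() would give a Monte Carlo estimate of how often a double six appears in 24 tosses.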

Related

Monte Carlo for 3 or more consecutive faces

I wrote this code to check for 3 or more consecutive identical faces in a simulation of 500000 iterations, each with five rolls of a fair die. I think it is on the right track, but I am missing something. I keep getting a missing value error:
nrep = 500000
count = 0
for (i in 1:nrep) {
  roll = sample(6, 5)
  print(roll)
  if (roll[i] == roll[i+1] & roll[i+1] == roll[i+2]) count = count + 1
}
print(count)
Please advise on a correction using base R only.
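Adding to my comment: the missing value error arises because roll[i] is indexed by the loop counter i, which quickly runs past the five elements of roll, so the comparison returns NA. Also, sample(6, 5) draws without replacement, so no face can ever repeat; you need replace = TRUE. You can then use the function rle() to compute the lengths and values of runs of equal values in a vector, something like the following: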
nrep = 500000
count = 0
for (i in 1:nrep) {
  roll = sample(6, 5, replace = TRUE)
  roll_rle = rle(roll)
  if (any(roll_rle$lengths >= 3)) {
    print(roll)
    count = count + 1
  }
}
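The same rle() idea can be sketched without an explicit counter, using replicate() to collect one TRUE/FALSE per simulated set of five rolls. This is only an illustration in base R; the names has_run and p_hat, the seed, and the smaller nrep are assumptions chosen to keep the run short.

```r
set.seed(42)   # assumption: fixed seed for reproducibility
nrep <- 10000  # smaller than the 500000 above, to keep the sketch quick

# One logical per iteration: TRUE if any run of equal faces has length >= 3
has_run <- replicate(nrep, {
  roll <- sample(6, 5, replace = TRUE)
  any(rle(roll)$lengths >= 3)
})

p_hat <- mean(has_run)  # estimated probability of 3 or more consecutive faces
```

As a sanity check, counting the 7776 possible sequences by hand gives 576 with such a run, so p_hat should land near 576/7776, roughly 0.074.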

Tossing 3 fair coins in R

X = # of heads showing when three coins are tossed.
Find P(X=1), and E(X).
Say, I want to solve this problem using sample(), and replicate() functions in R even though there is a function called rbinom().
My attempt:
noOfCoinTosses = 3;
noOfExperiments = 5;
mySamples <- replicate(noOfExperiments, {
  mySamples <- sample(c("H", "T"), noOfCoinTosses, replace = T, prob = c(0.5, 0.5))
})
headCount = length(which(mySamples=="H"))
probOfCoinToss <- headCount / noOfExperiments # 1.6
meanOfCoinToss = ??
Am I on the right track regarding P(X)? If yes, how can I find E(X)?
mySamples stores one experiment per column, so you have to count the heads in each column. P(X = 1) is then estimated by the number of experiments with exactly one head divided by the number of experiments, and E(X) by the average head count across experiments:
noOfCoinTosses = 3;
noOfExperiments = 5;
mySamples <- replicate(noOfExperiments,
  sample(c("H", "T"), noOfCoinTosses, replace = TRUE, prob = c(0.5, 0.5)))
headCount <- apply(mySamples, 2, function(x) sum(x == "H"))  # heads per experiment
probOfCoinToss <- sum(headCount == 1) / noOfExperiments  # estimate of P(X = 1)
meanOfCoinToss <- mean(headCount)  # estimate of E(X)
With only 5 experiments these estimates are very noisy; increase noOfExperiments to get values that settle down.
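A short sketch of the same experiment with many more repetitions, so the estimates can be compared with the theoretical values for three fair coins, P(X = 1) = 3/8 and E(X) = 1.5. The variable names and the seed are illustrative assumptions.

```r
set.seed(123)           # assumption: fixed seed for reproducibility
n_experiments <- 10000  # many more experiments than the 5 used above

# Number of heads in each experiment of three fair coin tosses
head_count <- replicate(
  n_experiments,
  sum(sample(c("H", "T"), 3, replace = TRUE) == "H")
)

p_one_head <- mean(head_count == 1)  # estimate of P(X = 1), theory: 3/8
mean_heads <- mean(head_count)       # estimate of E(X), theory: 1.5
```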

Select a sample at random and use it to generate 1000 bootstrap samples

I would like to generate 1000 samples of size 25 from a standard normal distribution, calculate the variance of each one, and create a histogram. I have the following:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
Then I would like to randomly select one sample from those 1000 samples and take 1000 bootstraps from that sample. Then calculate the variance of each and plot a histogram. So far, I have:
sub.sample = sample(samples, 1)
This is where I'm stuck. I know a for loop is needed for bootstrapping here, so I have:
rep.boot2 <- numeric(lengths(sub.sample))
for (i in 1:lengths(sub.sample)) {
  index2 <- sample(1:1000, size = 25, replace = TRUE)
  a.boot <- sub.sample[index2, ]
  rep.boot2[i] <- var(a.boot)[1, 2]
}
but running the above produces an "incorrect number of dimensions" error. Which part is causing the error?
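I can see 2 problems here. One is that you are subsetting sub.sample with two-dimensional indexing, as you would a matrix, but it is actually a list of length 1 and has no dimensions. That is what triggers the "incorrect number of dimensions" error: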
a.boot <- sub.sample[index2, ]
To fix this, you can change
sub.sample = sample(samples, 1)
to
sub.sample = as.vector(unlist(sample(samples, 1)))
The second problem is that you are generating a sample of 25 indexes from between 1 and 1000
index2 <- sample(1:1000, size = 25, replace = TRUE)
but then you try to extract these indexes from a vector with a length of only 25. So you will end up with mostly NA values in a.boot.
If I understand what you want to do correctly then this should work:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
sub.sample = as.vector(unlist(sample(samples, 1)))
rep.boot2 <- numeric(1000)  # numeric vector, so hist() can use it directly
for (i in 1:1000) {
  index2 <- sample(1:25, size = 25, replace = TRUE)
  a.boot <- sub.sample[index2]
  rep.boot2[i] <- var(a.boot)
}
hist(rep.boot2)
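The bootstrap loop can also be sketched with replicate(), since resampling a vector with replacement is exactly what sample(x, replace = TRUE) does. Here x stands in for the one randomly chosen sample of size 25; the names x and boot_vars and the seed are assumptions for illustration.

```r
set.seed(2024)  # assumption: fixed seed for reproducibility

x <- rnorm(25, 0, 1)  # stand-in for the single sample picked from the 1000

# 1000 bootstrap resamples of x, keeping the variance of each one
boot_vars <- replicate(1000, var(sample(x, replace = TRUE)))
```

Calling hist(boot_vars) then gives the histogram of bootstrap variances.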

Using sample(..., replace = FALSE) in a tidy Monte Carlo simulation using crossing

I am working through Digital Dice by Paul Nahin to teach myself Monte Carlo simulations. I am converting the Matlab code in the book to R code on the first pass, then replacing for-loops with tidy versions on the second pass.
Edit: Here is what I am looking to model:
Imagine that you face a pop quiz: a list of the 24 Presidents of the 19th century and another list of their terms in office, but scrambled.
The object is to match each President with the correct term.
You get one guess for each President.
On average, how many do you guess correctly?
Here is the R code using for-loops:
m <- 24
total_correct <- 0
n <- 10000
for (i in 1:n) {
  correct <- 0
  term <- sample(m, replace = TRUE)
  for (j in 1:m) {
    if (term[j] == j) {
      correct <- correct + 1
    }
  }
  total_correct = total_correct + correct
}
total_correct <- total_correct / n
print(total_correct)
This runs (though I admit it gives the wrong answer). The next step is to tidy-fy it; this is my attempt:
crossing(trials = 1:10,
         m = 1:24) %>%
  mutate(guess = sample.int(24, n(), replace = F), result = m == guess) %>%
  summarise(score = sum(result) / n())
However, I get an error message reading
Error in sample.int(x, size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
I understand what's going on: the n() call in the mutate() statement returns 240. Sampling 240 from a population of 24 with replace = FALSE is nonsensical, hence the error message.
How do I get the mutate() statement to receive a size of 24 on each iteration (or trial)?
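One possible approach, sketched under the assumption that dplyr and tidyr are installed, is to group by trials before calling mutate(). Inside a grouped mutate(), n() is the group size (24), so sample.int() draws a fresh permutation for each trial. The column names correct, score and mean_score and the number of trials are illustrative choices.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed for reproducibility

result <- crossing(trials = 1:2000, m = 1:24) %>%
  group_by(trials) %>%                       # n() is now 24 within each trial
  mutate(guess = sample.int(24, n(), replace = FALSE),
         correct = m == guess) %>%
  summarise(score = sum(correct), .groups = "drop") %>%  # matches per trial
  summarise(mean_score = mean(score))        # average matches over all trials
```

Because each trial is a random permutation of 24 items, the expected number of correct matches is 1, so result$mean_score should come out close to 1.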

R - Dealing with zeros in randomized subsamples

I've run into a little problem simulating the throw of dice. Basically I'm doing this to get familiar with loops and their output.
The intention is to simulate the throw of two dice as follows:
R = 100
d6 = c(1:6)
d = 60
DICE = NULL
for (i in 1:R)
{
  i <- as.factor((sample(d6, size=d, replace = T)) + (sample(d6, size=d, replace = T)))
  j <- summary(i)
  DICE = rbind(DICE, j)
}
head(DICE)
HIS = colMeans(DICE)
boxplot(DICE)
title(main= "Result 2d6", ylab= "Throws", xlab="")
relHIS = (HIS / sum(HIS))*100
relHIS
Problems occur if the count in one category is 0 (that sum did not occur in the sample). If this happens by chance in a subsample, one or more of the categories (sums 2-12) are missing from its summary. This causes problems ("number of columns of result is not a multiple of vector length (arg 2)") when the subsamples are combined.
I'm sure there is a really simple solution for this, by defining everything beforehand...
Thanks for your help!
Here are some fixes:
R = 100
d6 = c(1:6)
d = 60
DICE = matrix(nrow = R, ncol = 11) #pre-allocate
colnames(DICE) <- 2:12
for (i in 1:R)
{
  sim <- ordered(sample(d6, size = d, replace = TRUE) + sample(d6, size = d, replace = TRUE),
                 levels = 2:12) #define the factor levels
  sumsim <- table(sim)
  DICE[i, ] <- sumsim #sub-assign
}
head(DICE)
HIS = colMeans(DICE)
boxplot(DICE)
title(main= "Result 2d6", ylab= "Throws", xlab="")
prop.table(HIS) * 100
Always pre-allocate your result data structure. Growing it in a loop is terribly slow and you know how big it needs to be. Also, don't use the same symbol for the iteration variable and something else.
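The key point in the fix is that a factor with explicitly declared levels keeps a zero count for sums that never occur. A small sketch of that behaviour, followed by a loop-free variant of the simulation; the names counts and DICE2 and the seed are assumptions for illustration.

```r
# A factor with fixed levels 2..12 reports 0 for sums that did not occur
counts <- table(factor(c(2, 3, 3), levels = 2:12))

set.seed(7)  # assumption: fixed seed for reproducibility
R <- 100
d <- 60

# Each replicate returns 11 counts (sums 2..12); transpose to one row per replicate
DICE2 <- t(replicate(R, {
  total <- sample(1:6, d, replace = TRUE) + sample(1:6, d, replace = TRUE)
  as.integer(table(factor(total, levels = 2:12)))
}))
colnames(DICE2) <- 2:12
```

Every row of DICE2 has all 11 columns regardless of which sums appeared, so colMeans(DICE2) and boxplot(DICE2) work as in the loop version.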
Omit as.factor() in the seventh line of your code.
