Tossing 3 fair coins in R

X = # of heads showing when three coins are tossed.
Find P(X=1), and E(X).
Say I want to solve this problem using the sample() and replicate() functions in R, even though there is a function called rbinom().
My attempt:
noOfCoinTosses = 3;
noOfExperiments = 5;
mySamples <-replicate(noOfExperiments,
{mySamples <- sample(c("H", "T"), noOfCoinTosses, replace = T, prob=c(0.5, 0.5))
})
headCount = length(which(mySamples=="H"))
probOfCoinToss <- headCount / noOfExperiments # 1.6
meanOfCoinToss = ??
Am I on the right track regarding the P(X)? If yes, how can I find E(X)?

mySamples stores one experiment per column, so you have to count the heads in each column. P(X=1) is then estimated by the fraction of experiments with exactly one head, and E(X) by the average head count across experiments:
noOfCoinTosses = 3;
noOfExperiments = 5;
mySamples <-replicate(noOfExperiments,
{mySamples <- sample(c("H", "T"), noOfCoinTosses, replace = T, prob=c(0.5, 0.5))
})
headCount <- apply(mySamples,2, function(x) length(which(x=="H")))
probOfCoinToss <- length(which(headCount==1)) / noOfExperiments
meanOfCoinToss <- mean(headCount)
With only 5 experiments both estimates are very noisy. Increase noOfExperiments to get values close to the exact P(X=1) = 3/8 and E(X) = 1.5.
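As a rough illustration of how the estimates settle down with more experiments, here is a minimal sketch. The seed, the value 10000 and the names tosses, probOneHead and meanHeads are assumptions chosen for the example, not part of the original post.

```r
set.seed(123)                 # arbitrary seed, only for reproducibility
noOfCoinTosses  <- 3
noOfExperiments <- 10000      # assumed value; more experiments give tighter estimates

# One column per experiment, one row per coin
tosses <- replicate(noOfExperiments,
                    sample(c("H", "T"), noOfCoinTosses, replace = TRUE))

headCount   <- colSums(tosses == "H")   # X for each experiment
probOneHead <- mean(headCount == 1)     # estimate of P(X = 1), exact value 3/8
meanHeads   <- mean(headCount)          # estimate of E(X), exact value 1.5
```

colSums() on the logical matrix does the same per-column counting as the apply() call, just more compactly.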

Related

How do I find the minimum using which.min function within the function I built?

I would like to build a function that lets me import any dataset (e.g. Dataset1), which I will use both as my chosen numbers and as the source of my sample numbers, compute the difference (Difference = (chosen number - sample number)^2), and then find the average. With that, I would then like to find which chosen number from Dataset1 gives the minimum average (this will be the output of my function). Note that my chosen numbers could come from Dataset1 or from outside Dataset1, including any real number.
Dataset1 <- c(12739, 172392, 16477, 14738, 12223, 15473, 18999, 12278)
simulation.model <- function(import.dataset){
test.df <- data.frame(matrix(ncol = 4, nrow = 1000))
colnames(test.df) <- c("Dataset","R_Sample","Payment","Mean_Payment" )
sample.numbers <- sample(c(import.dataset, size = 1000, replace = FALSE))
f <- 0
for (i in dataset)
test.df[i,2] <- sample(c(dataset, size = 1000, replace = FALSE))
test.df[i,3] <-(dataset[i] - sample.numbers)^2
test.df[i,4] <- mean(insert.dataset[i]$Payment)
choice <- test.df[which.min(test.df$MeanPayment)]$Dataset
return(c(chosen = choice))
}
new.simulation.function(Dataset1)
I am getting an error for my function - any help will be appreciated!
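One possible reading of the goal, sketched under stated assumptions: the sample numbers are drawn from the dataset with replacement (1000 draws without replacement are impossible from 8 values), every dataset value is tried as the chosen number, and the value with the smallest mean squared difference is returned. The argument n.samples and the internal names are hypothetical.

```r
Dataset1 <- c(12739, 172392, 16477, 14738, 12223, 15473, 18999, 12278)

simulation.model <- function(import.dataset, n.samples = 1000) {
  # Assumption: sample with replacement, drawn once and reused for every candidate
  sample.numbers <- sample(import.dataset, size = n.samples, replace = TRUE)

  # Mean squared difference between each candidate and the sample numbers
  mean.payment <- sapply(import.dataset,
                         function(chosen) mean((chosen - sample.numbers)^2))

  results <- data.frame(Dataset = import.dataset, Mean_Payment = mean.payment)

  # which.min() gives the row index of the smallest mean
  results$Dataset[which.min(results$Mean_Payment)]
}

set.seed(1)   # arbitrary seed for reproducibility
chosen <- simulation.model(Dataset1)
```

To allow chosen numbers from outside the dataset, the candidates could be passed as a separate argument instead of reusing import.dataset.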

Two Random Numbers Without Repeating

I'm looking to make a set of two random numbers (e.g., [1,2], [3,12]) with the first number between 1-12, and the second between 1-4. I know how to sample the two numbers independently using:
sample(1:12, 1, replace = T)
sample(1:4, 1, replace = T)
but don't know how to create a system to determine if the pairing of the two numbers has already been rolled, and if so, roll again. Any tips!?
Thanks :)
While this doesn't scale happily (in case you need large-scale simulation), you can do this:
set.seed(42)
di2 <- sample(setdiff(1:4, di1 <- sample(1:12, size = 1)), size = 1)
c(di1, di2)
# [1] 1 2
The inner (di1) assignment takes the first number from 1:12, so far so good.
We then remove di1 from 1:4 with setdiff() so that the second sampling only has candidates that are not equal to di1;
The outer (di2) assignment samples from 1:4 without di1 if it was within 1-4.
While not an authoritative proof of correctness, here is a quick empirical check:
rand <- replicate(100000, local({ di2 <- sample(setdiff(1:4, di1 <- sample(1:12, size=1)), size = 1); c(di1, di2); }))
dim(rand)
# [1] 2 100000
any(rand[1,] == rand[2,])
# [1] FALSE
Are you looking for something like:
library(tidyverse)
expand.grid(1:12,1:4) %>%
as.data.frame() %>%
slice_sample (n = 5, replace = FALSE)
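If the aim is literally "roll again when this pairing has already come up", a base R sketch could keep a record of the pairs seen so far. The function name roll_unique_pairs and its arguments are hypothetical, and this is only one way to do it.

```r
roll_unique_pairs <- function(n, first_max = 12, second_max = 4) {
  # More unique pairs than exist would make the loop run forever
  stopifnot(n <= first_max * second_max)

  seen  <- character(0)                          # keys of pairs already rolled
  pairs <- matrix(NA_integer_, nrow = n, ncol = 2)
  k <- 0
  while (k < n) {
    roll <- c(sample.int(first_max, 1), sample.int(second_max, 1))
    key  <- paste(roll, collapse = "-")
    if (!(key %in% seen)) {                      # new pairing: keep it
      k <- k + 1
      seen <- c(seen, key)
      pairs[k, ] <- roll
    }                                            # otherwise roll again
  }
  pairs
}

set.seed(1)   # arbitrary seed for reproducibility
rolls <- roll_unique_pairs(10)
```

With only 48 possible pairs, sampling rows of the full grid without replacement gives the same result more directly; the loop mainly mirrors the re-roll idea in the question.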

Select a sample at random and use it to generate 1000 bootstrap samples

I would like to generate 1000 samples of size 25 from a standard normal distribution, calculate the variance of each one, and create a histogram. I have the following:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
Then I would like to randomly select one sample from those 1000 samples and take 1000 bootstraps from that sample. Then calculate the variance of each and plot a histogram. So far, I have:
sub.sample = sample(samples, 1)
Then this is where I'm stuck, I know a for loop is needed for bootstrapping here so I have:
rep.boot2 <- numeric(lengths(sub.sample))
for (i in 1:lengths(sub.sample)) {
index2 <- sample(1:1000, size = 25, replace = TRUE)
a.boot <- sub.sample[index2, ]
rep.boot2[i] <- var(a.boot)[1, 2]
}
but running the above produces an "incorrect number of dimensions" error. Which part is causing the error?
I can see 2 problems here. One is that you are trying to subset sub.sample as you would a matrix, but it is actually a list of length 1.
a.boot <- sub.sample[index2, ]
To fix this, you can change
sub.sample = sample(samples, 1)
to
sub.sample = as.vector(unlist(sample(samples, 1)))
The second problem is that you are generating a sample of 25 indexes from between 1 and 1000
index2 <- sample(1:1000, size = 25, replace = TRUE)
but then you try to extract these indexes from a vector with a length of only 25. So you will end up with mostly NA values in a.boot.
If I understand what you want to do correctly then this should work:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
sub.sample = as.vector(unlist(sample(samples, 1)))
rep.boot2 <- numeric(1000)
for (i in 1:1000) {
index2 <- sample(1:25, size = 25, replace = TRUE)
a.boot <- sub.sample[index2]
rep.boot2[i] <- var(a.boot)
}
hist(rep.boot2)
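The same bootstrap can also be written without an explicit loop. This is a minimal sketch, assuming the goal is 1000 resamples of size 25 from one randomly chosen sample; the seed and the name boot.vars are arbitrary choices for the example.

```r
set.seed(2024)   # arbitrary seed for reproducibility
samples <- replicate(1000, rnorm(25, 0, 1), simplify = FALSE)

# [[ ]] extracts the numeric vector itself rather than a list of length 1
sub.sample <- samples[[sample.int(length(samples), 1)]]

# Each replicate resamples the 25 values with replacement and takes the variance
boot.vars <- replicate(1000, var(sample(sub.sample, replace = TRUE)))
```

Calling hist(boot.vars) afterwards draws the histogram of the bootstrap variances.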

How to fix a function that is supposed to count the number of Heads over 100 trials?

I am trying to write a function that simulates a classic coin-flipping situation, where the probability of obtaining heads equals that of obtaining tails, and counts the observed Heads over 100 trials.
The code that I have tried to write, on the other hand, does not do it but returns either 0 or 1:
myCoinFlips <- function(prob = 0.5)
{
nFlips <- 100
for(i in 1:length(nFlips)) {
count <- 0
result <- sample(c("Heads", "Tails"), 1, replace= TRUE, prob =
c(prob, 1-prob))
if (result[i] == "Heads") {
count <- count + 1
}
}
return(count)
}
How can I fix this code so that it displays the number of Heads out of 100 trials?
You have several problems that are hidden by your bad indentation.
count <- 0 should be outside of your loop.
You are using result as both a scalar and a vector.
Change result[i] to result
nFlips is a scalar, so length(nFlips) is 1. You need for(i in 1:nFlips)
Please indent your code properly and it will help you.
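Putting those fixes together, the loop version might look like the sketch below. The name myCoinFlipsFixed is hypothetical, and turning nFlips into an argument is an extra assumption beyond the fixes listed.

```r
myCoinFlipsFixed <- function(prob = 0.5, nFlips = 100) {
  count <- 0                          # initialise once, outside the loop
  for (i in 1:nFlips) {               # loop over the flips, not over length(nFlips)
    result <- sample(c("Heads", "Tails"), 1, prob = c(prob, 1 - prob))
    if (result == "Heads") {          # result is a single value, so no indexing
      count <- count + 1
    }
  }
  count
}

set.seed(1)   # arbitrary seed for reproducibility
nHeads <- myCoinFlipsFixed()
```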
@G5W has already shown where the main code syntax issues are; to expand on that, you should consider improving the algorithm itself.
For example, simulating a single coin flip 100 times (sample(c("Heads", "Tails"), 1)) is the same (but a lot slower) as simulating 100 coin flips once (sample(c("Heads", "Tails"), 100, replace = T)).
If you do sample once from c("Heads", "Tails") you don't need replace = T, because you only draw one from a total of two elements.
I recommend making nFlips a function argument to easily allow changing that parameter.
So an improved function myCoinFlips2 could look like this
myCoinFlips2 <- function(prob = 0.5, nFlips = 100) {
return(sum(sample(
c("Heads", "Tails"), nFlips, replace = T, prob = c(prob, 1 - prob)) == "Heads"))
}
or since you're not storing the series of "Heads" and "Tails" just sum TRUE and FALSE
myCoinFlips3 <- function(prob = 0.5, nFlips = 100) {
return(sum(sample(
c(TRUE, FALSE), nFlips, replace = T, prob = c(prob, 1 - prob))))
}
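For comparison, the number of heads in nFlips independent flips follows a binomial distribution, so a single rbinom() draw gives the same kind of result. This is only a sketch; the name myCoinFlipsBinom is hypothetical.

```r
myCoinFlipsBinom <- function(prob = 0.5, nFlips = 100) {
  # One draw from Binomial(nFlips, prob) is the count of heads
  rbinom(1, size = nFlips, prob = prob)
}

set.seed(1)   # arbitrary seed for reproducibility
nHeadsBinom <- myCoinFlipsBinom()
```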
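To demonstrate how significant the performance increase can be when sampling nFlips coin flips once as opposed to sampling a single coin flip nFlips times, we can run a microbenchmark for nFlips = 10^6 coin flips (this assumes the loop-based myCoinFlips has been corrected as described above and given an nFlips argument)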
library(microbenchmark)
res <- microbenchmark(
myCoinFlips = myCoinFlips(nFlips = 10^6),
myCoinFlips2 = myCoinFlips2(nFlips = 10^6),
myCoinFlips3 = myCoinFlips3(nFlips = 10^6),
times = 10
)
#Unit: milliseconds
# expr min lq mean median uq max
# myCoinFlips 7118.30839 7379.53689 7646.05646 7722.00763 7840.07345 8235.95764
# myCoinFlips2 40.00080 41.51453 48.08246 47.16093 50.62694 65.41062
# myCoinFlips3 23.47758 25.20427 27.55469 26.36489 30.88384 32.17406
library(ggplot2)
autoplot(res)
Notice how myCoinFlips is more than 2 orders of magnitude slower than the other two methods.

R Find sum of x or fewer rows

I have a function that ranks a variable based on # of occurrences.
rankTab <- function (x)
{
tab1 <- data.frame(table(x))
tab1 <- tab1[order(-tab1$Freq), ]
tab1
}
I'd like to run this across a data.frame with multiple columns and figure out a rough measure of cardinality by saying for each column, what % of values are covered by the 5 most frequently occurring values. Something like this:
df$top_5_val_pct <- round(sapply(x, function(x) sum(rankTab(x)[1:max(5,nrow(x)),'Freq']) / length(x)), 4)
My problem is when there are < 5 values, I'm getting an NA as there aren't 5 rows to sum. I've tried using min and max but can't figure out how to get 5 or fewer rows. Any suggestions?
I'm having a hard time parsing the code you're using to accomplish this, but going simply off of "what % of values are covered by the 5 most frequently occurring values" I'd do something like this:
sortTab <- function(x,n){
t <- sort(table(x))
sum(tail(t,n)) / sum(t)
}
sapply(mtcars,sortTab,n = 2)
where in this example, I'm finding the proportion covered by the two most common values.
How about changing the sum() call to add na.rm = TRUE:
sum(rankTab(x)[1:5, "Freq"], na.rm = TRUE)
giving
df <- data.frame(A = sample(letters[1:4], 20, replace = TRUE),
B = sample(letters[1:4], 20, replace = TRUE))
round(sapply(df, function(x) sum(rankTab(x)[1:5, "Freq"], na.rm = TRUE) / length(x)), 4)
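Another way to get "5 or fewer" rows is head(), which returns as many elements as exist when there are fewer than n. A minimal sketch, where the helper name top_n_share is hypothetical:

```r
top_n_share <- function(x, n = 5) {
  # Frequencies sorted from most to least common (table() drops NA by default)
  freq <- sort(as.vector(table(x)), decreasing = TRUE)
  # head() silently returns fewer than n values when fewer exist
  sum(head(freq, n)) / length(x)
}

set.seed(1)   # arbitrary seed for reproducibility
df <- data.frame(A = sample(letters[1:4], 20, replace = TRUE),
                 B = sample(letters[1:10], 20, replace = TRUE))
shares <- round(sapply(df, top_n_share, n = 5), 4)
```

Because length(x) still counts NA values while table() does not, columns with missing values will report a share below 1 even when they have few distinct values.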
