How do I run a for loop so it generates repeated samples of n observations? - r

I first generated random data from a Gamma distribution using the following code
data <- rgamma(9, shape=32, scale=1/4)
I proceeded to generate a single sample of 9 observations from the population.
sample(data, 9)
I'm trying to run a for loop in R so that I can repeatedly generate samples of 9 observations and save the mean of each sample into a new vector. I want to do this 500,000 times, sampling with replacement. After the for loop I want to use the resulting means as a null distribution. (I am very new to R, so any suggestions or help are greatly appreciated.)
Here is the code I have tried for the for loop:
v <- 500000
Storage <- numeric(9)
for (i in v) {
Storage[i] <- mean(i)
}

The easiest way to do this is like this...
means <- replicate(500000, mean(rgamma(9, shape=32, scale=1/4)))
This will generate 9 gamma variates, take the mean, and repeat the process 500,000 times, storing the result in the vector means. Definitely no need for a for loop!
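If a loop is still wanted, for instance as a learning exercise, the attempt in the question can be repaired along these lines. This is only a sketch: it assumes the goal is to resample the 9 observed values with replacement, and it uses fewer iterations than the 500,000 in the question so that it runs quickly. The names n_sims and sample_means are illustrative, not from the original post.

```r
set.seed(1)                                   # assumed seed, only for reproducibility
data <- rgamma(9, shape = 32, scale = 1/4)    # the observed sample from the question

n_sims <- 10000                               # raise to 500000 for the full run
sample_means <- numeric(n_sims)               # preallocate one slot per iteration

for (i in seq_len(n_sims)) {                  # loop over 1..n_sims, not over the single value n_sims
  resample <- sample(data, size = 9, replace = TRUE)
  sample_means[i] <- mean(resample)           # store the mean of this resample
}

summary(sample_means)
```

The loop in the question ran only once because `for (i in v)` iterates over the elements of v, and v holds a single number; `mean(i)` then averaged the loop counter rather than a sample.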

Use replicate to create the vectors, then compute the means with the fast colMeans.
set.seed(2023)
data <- rgamma(9, shape=32, scale=1/4)
v <- 500000L
Storage <- replicate(v, sample(data, 9, TRUE))
mean_Storage <- colMeans(Storage)
hist(mean_Storage, freq = FALSE)
Created on 2023-02-03 with reprex v2.0.2
Or maybe you want to sample from a Gamma distribution.
set.seed(2023)
v <- 500000L
Storage <- replicate(v, rgamma(9, shape=32, scale=1/4))
mean_Storage <- colMeans(Storage)
hist(mean_Storage, freq = FALSE)

Related

The R package BosonSampling keeps running without result

I tried to generate boson sampling data using the R package BosonSampling. I know it takes a long time to generate samples for larger values of n and m, so I tried smaller values, but I still didn't get any output from the code. I don't know what the problem is.
The documentation is available in the link:
https://cran.r-project.org/web/packages/BosonSampling/index.html
the code from documentation:
library('BosonSampling')
library('Rcpp')
set.seed(7)
n <- 10
m <- 20
A <- randomUnitary(m)[,1:n]
valueList <- bosonSampler(A, sampleSize=10, perm = FALSE)$values
valueList

How can I automate creation of a list of vectors containing simulated data from a known distribution, using a "for loop" in R?

First stack exchange post so please bear with me. I'm trying to automate the creation of a list, and the list will be made up of many empty vectors of various, known lengths. The empty vectors will then be filled with simulated data. How can I automate creation of this list using a for loop in R?
In this simplified example, fish have been caught by casting a net 4 times, and their abundance is given in the vector "abundance" (from counting the number of total fish in each net). We don't have individual fish weights, just the mean weight of all fish each net, so I need to simulate their weights from a lognormal distribution. So, I'm then looking to fill those empty vectors for each net, each with a length equal to the number of fish caught in that net, with weight data simulated from a lognormal distribution with a known mean and standard deviation.
A simplified example of my code:
abundance <- c(5, 10, 9, 20)
net1 <- rep(NA, abundance[1])
net2 <- rep(NA, abundance[2])
net3 <- rep(NA, abundance[3])
net4 <- rep(NA, abundance[4])
simulated_weights <- list(net1, net2, net3, net4)
#meanlog vector for each net
weight_per_net
#meansd vector for each net
sd_per_net
for (i in 1:4) {
simulated_weights[[i]] <- rlnorm(n = abundance[i], meanlog = weight_per_net[i], sd = sd_per_net[i])
print(simulated_weights)
}
Could anyone please help me automate this so that I don't have to write out each net vector (e.g. net1) by hand, and then also write out all the net names in the list() function? There are far more nets than 4 so it would be extremely time consuming and inefficient to do it this way. I've tried several things from other posts like paste0(), other for loops, as.list(c()), all to no avail.
Thanks!
HM
Turns out you don't need the net1, net2, etc variables at all. You can just do
abundance <- c(5, 10, 9, 20)
simulated_weights <- lapply(abundance, function(x) rep(NA, x))
The lapply function will return the list you need by calling the function once for each value of abundance.
We could also create 'simulated_weights' with split and rep:
simulated_weights <- split(rep(rep(NA, length(abundance)), abundance),
rep(seq_along(abundance), abundance))
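Since the empty vectors are only placeholders, the simulated weights can also be generated directly, one list element per net. A minimal sketch using Map follows; the values in weight_per_net and sd_per_net are made up for illustration, because the question does not show them.

```r
set.seed(42)                                   # assumed seed for reproducibility
abundance      <- c(5, 10, 9, 20)
weight_per_net <- c(1.2, 0.8, 1.0, 1.5)        # hypothetical meanlog values
sd_per_net     <- c(0.3, 0.2, 0.25, 0.4)       # hypothetical sdlog values

# Map calls the function once per net, pairing the three vectors element by element
simulated_weights <- Map(
  function(n, m, s) rlnorm(n = n, meanlog = m, sdlog = s),
  abundance, weight_per_net, sd_per_net
)
names(simulated_weights) <- paste0("net", seq_along(abundance))

lengths(simulated_weights)                     # number of simulated fish per net
```

This scales to any number of nets, since nothing is written out by hand beyond the three input vectors.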

Simulation in R, for loop

I am trying to run a simulation 10 times in R, but I have not figured out how to do it. The code is shown below; you can run it in R straight away. When I run it, it gives me 5 values of "w" as output. I think this is only one simulation, but what I actually want is 10 different simulations of those 5 numbers.
I know I will need to write a for loop for this, but I could not work out how. Could anyone help, please?
# simulate 10 times
# try N = 10, for loop?
# initial values W0 and E
W0=1000
E= 1000
data = c(-0.02343731, 0.045509474 ,0.076144158,0.09234636,0.0398257)
constant = exp(cumsum(data))
exp.cum = cumsum(1/constant)
w=constant*(W0 - exp.cum)- E
w
You'll want to generate new values of data in each simulation. Do this within the curly brackets that follow the for statement. Then, before closing the curly brackets, be sure to save your output in the appropriate position of an object, such as a vector. For a simple example:
W0=1000
E= 1000
n_per_sim <- 5
num_sims <- 10
set.seed(12345) #seed is necessary for reproducibility
sim_output_1 <- rep(NA, times = num_sims) #This creates a vector of 10 NA values
for (sim_number in 1:num_sims){ #this starts your for loop
data <- rnorm(n=n_per_sim, mean=10, sd=2) #generate your data
average <- mean(data)
sim_output_1[sim_number] <- average #this is where you store your output for each simulation
}
sim_output_1 #Now you can see the average from each simulation
Note that if you want to save five values from each simulation, you can use a matrix instead of a vector, as shown here:
matrix_output <- matrix(NA, ncol=n_per_sim, nrow=num_sims) #This creates a 10x5 matrix
for (sim_number in 1:num_sims){ #this starts your for loop
data <- rnorm(n=n_per_sim, mean=10, sd=2) #generate your data
constant = exp(cumsum(data))
exp.cum = cumsum(1/constant)
w=constant*(W0 - exp.cum)- E
matrix_output[sim_number, ] <- w #this is where you store your output for each simulation
}
matrix_output #Now you can see the five w values from each simulation, one row per simulation
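The same 10 x 5 matrix can also be built without an explicit loop. The sketch below wraps one simulation in a function and repeats it with replicate. The function name simulate_w is chosen here for illustration, and the rnorm parameters (mean 0.05, sd 0.05) are an assumption picked to resemble the small values in the question's data vector; they are not from the original post.

```r
W0 <- 1000
E  <- 1000
n_per_sim <- 5
num_sims  <- 10

# One simulation: draw fresh data, then apply the formula from the question
simulate_w <- function() {
  data     <- rnorm(n = n_per_sim, mean = 0.05, sd = 0.05)  # assumed parameters
  constant <- exp(cumsum(data))
  exp.cum  <- cumsum(1 / constant)
  constant * (W0 - exp.cum) - E
}

set.seed(12345)
# replicate returns a 5 x 10 matrix (one column per simulation); t() makes it 10 x 5
matrix_output <- t(replicate(num_sims, simulate_w()))
dim(matrix_output)
```

Keeping one simulation inside its own function makes it easy to test a single run before repeating it.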

Row sampling in R

I use the example data to ask the question.
set.seed(1)
X <- data.frame(matrix(rnorm(200), nrow=20))
I want to select 10 random rows each time, without replacement, and run a multiple regression. I tried
hi=X[sample(1:20,10),]
MR1<-lm(X10~., data=hi)
R1<-summary(MR1)$r.squared #extract the R squared
Is it possible to create 25 such datasets, sampling 10 rows each time? In the end, I would like to store the sampled datasets, fit a multiple regression to each, and extract the R-squared values from the 25 models.
You could use lapply:
set.seed(1)
X <- data.frame(matrix(rnorm(200), nrow=20))
n <- 25
res <- lapply(1:n,
function(i) {
samples <- sample(1:20,10)
hi=X[samples,]
MR1<-lm(X10~., data=hi) # fit on the sampled rows, not the full data
R1<-summary(MR1)$r.squared
return(list(Samples=samples,Hi=hi,MR1=MR1,R1=R1))
})
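Once the list is built, the 25 R-squared values can be collected into a single numeric vector. A short self-contained sketch, assuming each model is fitted on the sampled rows; the names fit and r_squared are illustrative.

```r
set.seed(1)
X <- data.frame(matrix(rnorm(200), nrow = 20))
n <- 25

res <- lapply(seq_len(n), function(i) {
  samples <- sample(nrow(X), 10)           # 10 distinct row indices
  hi      <- X[samples, ]
  fit     <- lm(X10 ~ ., data = hi)        # fit on the subsample
  list(Samples = samples, Hi = hi, MR1 = fit, R1 = summary(fit)$r.squared)
})

# Pull the R-squared out of every list element
r_squared <- vapply(res, function(x) x$R1, numeric(1))
summary(r_squared)
```

One caveat: with 10 rows and 9 predictors plus an intercept, each model has no residual degrees of freedom, so the R-squared values should come out at essentially 1. A smaller formula, such as X10 ~ X1 + X2, would give more informative values.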

Random sample from given bivariate discrete distribution

Suppose I have a bivariate discrete distribution, i.e. a table of probability values P(X=i,Y=j), for i=1,...n and j=1,...m. How do I generate a random sample (X_k,Y_k), k=1,...N from such distribution? Maybe there is a ready R function like:
sample(100,prob=biprob)
where biprob is 2 dimensional matrix?
One intuitive way to sample is the following. Suppose we have a data.frame
dt=data.frame(X=x,Y=y,P=pij)
Where x and y come from
expand.grid(x=1:n,y=1:m)
and pij are the P(X=i,Y=j).
Then we get our sample (Xs,Ys) of size N, the following way:
set.seed(1000)
Xs <- sample(dt$X,size=N,prob=dt$P)
set.seed(1000)
Ys <- sample(dt$Y,size=N,prob=dt$P)
I use set.seed() to simulate the "bivariateness". Intuitively I should get something similar to what I need, but I am not sure that this is the correct way. Hence the question :)
Another way is to use Gibbs sampling, marginal distributions are easy to compute.
I tried googling, but nothing really relevant came up.
You are almost there. Assuming you have the data frame dt with the x, y, and pij values, just sample the rows!
dt <- expand.grid(X=1:3, Y=1:2)
dt$p <- runif(6)
dt$p <- dt$p / sum(dt$p) # get fake probabilities
idx <- sample(1:nrow(dt), size=8, replace=TRUE, prob=dt$p)
sampled.x <- dt$X[idx]
sampled.y <- dt$Y[idx]
It's not clear to me why you should care that it is bivariate. The probabilities sum to one and the outcomes are discrete, so you are just sampling from a categorical distribution. The only difference is that you are indexing the observations using rows and columns rather than a single position. This is just notation.
In R, you can therefore easily sample from your distribution by reshaping your data and sampling from a categorical distribution. Sampling from a categorical can be done using rmultinom and using which to select the index, or, as Aniko suggests, using sample to sample the rows of the reshaped data. Some bookkeeping can take care of your exact case.
Here's a solution:
library(reshape)
# Reshape data to long format.
data <- matrix(data = c(.25,.5,.1,.4), nrow=2, ncol=2)
pmatrix <- melt(data)
# Sample categorical n times.
rcat <- function(n, pmatrix) {
rows <- which(rmultinom(n,1,pmatrix$value)==1, arr.ind=TRUE)[,'row']
indices <- pmatrix[rows, c('X1','X2')]
colnames(indices) <- c('i','j')
rownames(indices) <- seq(1,nrow(indices))
return(indices)
}
rcat(3,pmatrix)
This returns 3 random draws from your matrix, reporting the i and j of the rows and columns:
i j
1 1 1
2 2 2
3 2 2
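The same idea can be written in base R without the reshape package. The sketch below samples linear cell positions with sample and converts them back to (i, j) pairs with arrayInd. The function name rbivariate and the example matrix P are assumptions made here for illustration.

```r
# P[i, j] = P(X = i, Y = j); entries are non-negative and sum to 1
P <- matrix(c(0.2, 0.3, 0.1, 0.4), nrow = 2, ncol = 2)

rbivariate <- function(n, P) {
  # Draw n cell positions (column-major order) with probability proportional to P
  cells <- sample(length(P), size = n, replace = TRUE, prob = as.vector(P))
  # Convert linear positions back to row/column indices
  ij <- arrayInd(cells, dim(P))
  colnames(ij) <- c("i", "j")
  ij
}

set.seed(1)
draws <- rbivariate(10000, P)
# Empirical joint frequencies, which should be close to P
freq <- table(draws[, "i"], draws[, "j"]) / nrow(draws)
round(freq, 2)
```

Comparing the empirical frequency table with P is a quick sanity check that the draws follow the intended joint distribution.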
