Repeat iteration in a for loop in R

I am trying to generate a for loop that will repeat a sequence of the following:
sample(x = 1:14, size = 10, replace = TRUE, prob = c(1/4,1/4,1/4,1/4)
I want it to repeat 5000 times. So far, I include the above as the body of the loop and added
for (i in seq_along[1:5000]){
at the beginning but I am getting an error message saying
Error in seq_along[1:10000] : object of type 'builtin' is not subsettable

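We need replicate. Note that prob must supply one probability per element of x, so x = 1:4 is assumed here (with x = 1:14 and four probabilities, sample() stops with an error):
out <- replicate(5000, sample(x = 1:4, size = 10, replace = TRUE, prob = c(1/4,1/4,1/4,1/4)), simplify = FALSE)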

There are a few issues here.
@MartinGal noted the syntax issues with seq_along and the missing ). Note that you can use seq(n) or 1:n to define the number of iterations.
You are not storing the sampled vectors anywhere, so the for loop will run the code but you won't capture the output.
You have x = 1:14 but you only have 4 prob values, which suggests you intended x = 1:4 (either that or you are 10 prob values short).
Here's one way to address these issues using a for loop.
n <- 5
s <- 10
xmax <- 4
p <- 1/4
out <- matrix(nrow = n, ncol = s, byrow = TRUE)
set.seed(1L)
for (i in seq(n)) {
out[i, ] <- sample(x = seq(xmax), size = s, replace = TRUE, prob = rep(p, xmax))
}

As andrew reece notes in his comment, it looks like you want x = 1:4. Depending on what you want to do with your result, you could generate all of the realizations at once, since you are sampling with replacement, and then store the result in a matrix with 5000 rows of 10 realizations per row. So:
x <- sample(1:4, size = 5000 * 10, replace = TRUE, prob = c(1/4,1/4,1/4,1/4))
result <- matrix(x, nrow = 5000)
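The answers above return either a list (replicate with simplify = FALSE) or a matrix. As a minimal sketch, assuming x = 1:4 was intended, the list form can be bound into the same 5000 x 10 layout with do.call(rbind, ...). The names out and out_mat are only illustrative.

```r
set.seed(1L)

# One list element per repetition, each a vector of 10 draws from 1:4.
# The four probabilities are equal, so prob could also be left out.
out <- replicate(5000,
                 sample(x = 1:4, size = 10, replace = TRUE, prob = rep(1/4, 4)),
                 simplify = FALSE)

# Stack the 5000 vectors as rows of a matrix
out_mat <- do.call(rbind, out)

dim(out_mat)                          # rows = repetitions, columns = draws
round(prop.table(table(out_mat)), 3)  # each value should appear about 25% of the time
```

Whether the list or the matrix is more convenient depends on what is done with the draws afterwards; row-wise summaries such as rowMeans(out_mat) are easier on the matrix.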

Related

Sample draw in sapply without replacement

How does one draw a sample within a sapply function without replacement? Consider the following MWE below. What I am trying to achieve is for a number in idDRAW to receive a letter from chrSMPL (given the sample size of chrSMPL). Whether a number from idDRAW receives a letter is determined by the respective probabilities, risk factors and categories. This is calculated in the sapply function and stored in tmp.
The issue is sample replacement, leading to a number being named with a letter more than once. How can one avoid replacement whilst still using the sapply function? I have tried to adjust the code from this question (Alternative for sample) to suit my needs, but no luck. Thanks in advance.
set.seed(3)
chr<- LETTERS[1:8]
chrSMPL<- sample(chr, size = 30, replace = TRUE)
idDRAW<- sort(sample(1:100, size = 70, replace = FALSE))
p_mat<- matrix(runif(16, min = 0, max = 0.15), ncol = 2); rownames(p_mat) <- chr ## probability matrix
r_mat <- matrix(rep(c(0.8, 1.2), each = length(chr)), ncol = 2); rownames(r_mat) <- chr ## risk factor matrix
r_cat<- sample(1:2, 70, replace = TRUE) ## risk categories
# find number from `idDRAW` to be named a letter:
Out<- sapply(chrSMPL, function(x){
tmp<- p_mat[x, 1] * r_mat[x, r_cat]
sample(idDRAW, 1, prob = tmp)
})
> sort(Out)[1:3]
G B B
5 5 5
I managed with an alternative solution using a for loop as seen below. If anyone can offer suggestions on how the desired result can be achieved without using a for loop it would be greatly appreciated.
set.seed(3)
Out <- c()
for(i in 1:length(chrSMPL)){
tmp <- p_mat[chrSMPL[i], 1] * r_mat[chrSMPL[i], r_cat]
Out <- c(Out, sample(idDRAW, 1, prob = tmp))
rm <- which(idDRAW == Out[i])
idDRAW <- idDRAW[-rm]
r_cat <- r_cat[-rm]
}
names(Out) <- chrSMPL
sort(Out)[1:3]
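One possible way to drop the explicit for loop, offered only as a sketch: keep a logical mask of the ids that are still available and update it from inside the function with <<-. This is still sequential under the hood, because each draw depends on the previous ones. The names available, cand, w and pick are hypothetical helpers, and sample.int() is used so that the draw stays well defined even if only one candidate remains.

```r
set.seed(3)
chr     <- LETTERS[1:8]
chrSMPL <- sample(chr, size = 30, replace = TRUE)
idDRAW  <- sort(sample(1:100, size = 70, replace = FALSE))
p_mat   <- matrix(runif(16, min = 0, max = 0.15), ncol = 2,
                  dimnames = list(chr, NULL))   # probability matrix
r_mat   <- matrix(rep(c(0.8, 1.2), each = length(chr)), ncol = 2,
                  dimnames = list(chr, NULL))   # risk factor matrix
r_cat   <- sample(1:2, 70, replace = TRUE)      # risk categories

# Mask of ids that have not been drawn yet (hypothetical helper)
available <- rep(TRUE, length(idDRAW))

Out <- vapply(chrSMPL, function(x) {
  cand <- which(available)                      # positions still in the pool
  w    <- p_mat[x, 1] * r_mat[x, r_cat[cand]]   # weights for the remaining ids
  pick <- cand[sample.int(length(cand), 1, prob = w)]
  available[pick] <<- FALSE                     # remove the drawn id from the pool
  idDRAW[pick]
}, integer(1))

anyDuplicated(Out)  # 0 means no id was assigned twice
```

Because the random number stream is consumed differently, the result will not match the for loop draw for draw, even with the same seed.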

Select a sample at random and use it to generate 1000 bootstrap samples

I would like to generate 1000 samples of size 25 from a standard normal distribution, calculate the variance of each one, and create a histogram. I have the following:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
Then I would like to randomly select one sample from those 1000 samples and take 1000 bootstraps from that sample. Then calculate the variance of each and plot a histogram. So far, I have:
sub.sample = sample(samples, 1)
Then this is where I'm stuck, I know a for loop is needed for bootstrapping here so I have:
rep.boot2 <- numeric(lengths(sub.sample))
for (i in 1:lengths(sub.sample)) {
index2 <- sample(1:1000, size = 25, replace = TRUE)
a.boot <- sub.sample[index2, ]
rep.boot2[i] <- var(a.boot)[1, 2]
}
but running the above produces an "incorrect number of dimensions" error. Which part is causing the error?
I can see two problems here. One is that you are trying to subset sub.sample with [index2, ], as you would a two-dimensional object, but it is actually a list of length 1. This is what triggers the "incorrect number of dimensions" error:
a.boot <- sub.sample[index2, ]
To fix this, you can change
sub.sample = sample(samples, 1)
to
sub.sample = as.vector(unlist(sample(samples, 1)))
The second problem is that you are generating a sample of 25 indexes from between 1 and 1000
index2 <- sample(1:1000, size = 25, replace = TRUE)
but then you try to extract these indexes from a sample that has only 25 elements. So you will end up with mostly NA values in a.boot.
If I understand what you want to do correctly then this should work:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
sub.sample = as.vector(unlist(sample(samples, 1)))
rep.boot2 <- numeric(1000)  # preallocate one slot per bootstrap replicate
for (i in 1:1000) {
index2 <- sample(1:25, size = 25, replace = TRUE)
a.boot <- sub.sample[index2]
rep.boot2[i] <- var(a.boot)
}
hist(rep.boot2)
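The same bootstrap can also be written without an explicit loop. The following is a minimal sketch, assuming the goal is 1000 resamples of size 25 drawn with replacement from one randomly chosen sample; boot.vars is an illustrative name.

```r
set.seed(42)

# 1000 samples of size 25 from a standard normal distribution
samples <- replicate(1000, rnorm(25, 0, 1), simplify = FALSE)

# Pick one sample; [[ ]] returns the numeric vector itself, not a list of length 1
sub.sample <- samples[[sample.int(length(samples), 1)]]

# Resample the 25 values with replacement 1000 times and take each variance
boot.vars <- replicate(1000, var(sample(sub.sample, replace = TRUE)))

hist(boot.vars, main = "Bootstrap variances", xlab = "variance")
```

Calling sample() on the vector with replace = TRUE and no size resamples all 25 values, so no index vector is needed.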

Most efficient way to use values of Matrix as row indices to lookup values in another matrix in R

I am repeatedly drawing large matrices with random values in a Monte Carlo simulation. As I explore a large parameter space, the simulation will most likely run for several days, so I am trying to find the most efficient way to shave off as much time as possible. Consider the following code with a 500x18 matrix as an example.
U = matrix(sample.int(500, size = 500*18, replace = TRUE), nrow = 500, ncol = 18)
X = matrix(nrow= 500, ncol = 18)
Marginals = matrix(runif(500*18, min = 0, max = 1),500,18)
for (i in 1:18){
for (k in 1:500){
X[k,i] = Marginals[U[k,i],i]
}
}
The randomly drawn values in U serve as the row index into Marginals, while the column index is the column of U the value came from.
I know that loops are usually not the R way. Is there a more efficient way, e.g. using apply, here?
Following Yogo's suggestion, the more efficient code can make do without the k loop:
U = matrix(sample.int(500, size = 500*18, replace = TRUE), nrow = 500, ncol = 18)
X = matrix(nrow= 500, ncol = 18)
Marginals = matrix(runif(500*18, min = 0, max = 1),500,18)
for (i in 1:18){
X[, i] <- Marginals[U[, i], i]
}
You can speed up by calculating column by column:
for (i in 1:18) X[, i] <- Marginals[U[, i], i]
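Ultimately, the following is equivalent in distribution to your code (it draws fresh uniform values instead of reusing Marginals and U, so the numbers themselves will differ):
X <- replicate(18, sample(runif(500), replace = TRUE))
(this will not be much faster than my first variant, but the code is more compact)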
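If Marginals and U already exist and must be reused, the remaining column loop can also be replaced by indexing with a two-column matrix of (row, column) pairs, which is standard R matrix indexing. A minimal sketch, with idx, X_loop and X_vec as illustrative names:

```r
set.seed(1)
n <- 500
m <- 18
U <- matrix(sample.int(n, size = n * m, replace = TRUE), nrow = n, ncol = m)
Marginals <- matrix(runif(n * m, min = 0, max = 1), nrow = n, ncol = m)

# Reference result: column-by-column lookup
X_loop <- matrix(nrow = n, ncol = m)
for (i in seq_len(m)) X_loop[, i] <- Marginals[U[, i], i]

# Each row of idx is one (row, column) pair; both vectors are in column-major order
idx <- cbind(as.vector(U), as.vector(col(U)))
X_vec <- matrix(Marginals[idx], nrow = n, ncol = m)

identical(X_loop, X_vec)
```

Whether this is faster than the column loop for a given matrix size is worth timing, for example with system.time(), before committing to it in a long simulation.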

Generating n samples using a vector

How can I generate 1000 samples of size 8 from a vector containing 20 elements in R? How can I turn a single sample into a thousand?
Please help.
If X is the vector containing your 20 elements, then you can use:
sample(X, 8, replace = TRUE, prob = NULL)
Loop this statement 1000 times as below:
Results <- matrix(, nrow = 1000, ncol = 8)
X=1:20
for (i in 1:1000){
Results[i, ]<-sample(X,8,replace=TRUE,prob=NULL)
}
Each row of the matrix called Results should now hold one of your 1000 samples.
I think using the *apply family is better than using a for loop, as R is vectorized.
Below is code that can even run on multiple cores:
X=1:20
# on linux
library(parallel)
library(magrittr)
mclapply(rep(list(X), 1000), sample, 8, replace = TRUE, prob = NULL) %>%
simplify2array
# on windows
cl <- makeCluster(detectCores()) # type = "MPI" / type = "PSOCK"
parLapply(cl, rep(list(X), 1000), sample, 8, replace = TRUE, prob = NULL) %>%
simplify2array
stopCluster(cl)
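For a task this small, a single-core one-liner is probably enough. As a sketch, replicate() returns an 8 x 1000 matrix (one sample per column), so t() is used to get one sample per row, matching the Results matrix from the loop version above:

```r
set.seed(1)
X <- 1:20

# replicate() puts each sample in a column; transpose so each row is one sample
Results <- t(replicate(1000, sample(X, 8, replace = TRUE)))

dim(Results)   # 1000 rows (samples), 8 columns (draws)
head(Results, 3)
```

Set replace = FALSE instead if the 8 elements within a sample must be distinct.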

Fill elements of a list without looping

I am trying not to use a for loop to assign values to the elements of a list.
Here, I create an empty list, give it a length of 20 and name each of the 20 elements.
mylist <- list()
length(mylist) <- 20
names(mylist) <- paste0("element", 1:20)
I want each element of mylist to contain samples drawn from a pool of randomly generated numbers denoted as x:
x <- runif(100, 0, 1)
I tried the following codes, which do not get to the desired result:
mylist[[]] <- sample(x = x, size = 20, replace = TRUE) # Gives an error
mylist[[1:length(mylist)]] <- sample(x = x, size = 20, replace = TRUE) # Does not give the desired result
mylist[1:length(mylist)] <- sample(x = x, size = 20, replace = TRUE) # Gives the same undesired result as the previous line of code
mylist[] <- sample(x = x, size = 20, replace = TRUE) # Gives the same undesired result as the previous line of code
P.S. As explained above, the desired result is a list of 20 elements, each containing 20 numeric values. I can do it using a for loop, but I would like to become a better R user and use vectorized operations as much as possible.
Thank you for your help.
Maybe replicate is what you're looking for.
mylist <- replicate(20, sample(x = x, size = 20, replace = TRUE), simplify=FALSE)
names(mylist) <- paste0("element", 1:20)  # paste0() has no sep argument, so sep = "" is not needed
Note that there is no need to first create a list, replicate will do it for you.
Since you're using replace=TRUE, you could also generate all 400 values at once and then split them up. If you were doing this many times, this would probably be faster than replicate. For only 20 times, the speed difference will hardly matter, and the code using replicate is perhaps easier to read and understand, so it might be preferred for that reason.
foo <- sample(x = x, size = 20*20, replace = TRUE)
mylist <- split(foo, rep(1:20, each=20))
Alternatively, you could split them by converting to a data frame first. Not sure which would be faster.
mylist <- as.list(as.data.frame(matrix(foo, ncol=20)))
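The split() variant can also carry the desired element names directly. A minimal sketch, assuming the names element1 to element20 from the question; nms and grp are illustrative names. The grouping factor is given explicit levels because split() otherwise orders the groups alphabetically (element1, element10, element11, ...).

```r
set.seed(1)
x   <- runif(100, 0, 1)
nms <- paste0("element", 1:20)

# Draw all 400 values at once
foo <- sample(x = x, size = 20 * 20, replace = TRUE)

# Explicit levels keep the list in element1, element2, ... order
grp    <- factor(rep(nms, each = 20), levels = nms)
mylist <- split(foo, grp)

str(mylist[1:2])
```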

Resources