How can I generate 1000 samples of size 8 from a vector containing 20 elements in R? In other words, how can I repeat a single sample a thousand times?
Please help.
If X is the vector containing your 20 elements, then you can use:
sample(X, 8, replace = TRUE, prob = NULL)
Loop this statement 1000 times as below:
X <- 1:20
# preallocate a 1000 x 8 matrix; row i will hold the i-th sample
Results <- matrix(NA_integer_, nrow = 1000, ncol = 8)
for (i in 1:1000) {
  Results[i, ] <- sample(X, 8, replace = TRUE, prob = NULL)
}
Each row in the matrix called Results now holds one of your 1000 samples.
I think using the *apply family is better than using a for loop, as R is vectorized.
Below is code that even works on multiple cores:
X=1:20
# on Linux/macOS (mclapply relies on forking)
library(parallel)
library(magrittr)
mclapply(rep(list(X), 1000), sample, 8, replace = TRUE, prob = NULL) %>%
simplify2array
# on Windows (a cluster also works on other platforms)
cl <- makeCluster(detectCores()) # type = "MPI" / type = "PSOCK"
parLapply(cl, rep(list(X), 1000), sample, 8, replace = TRUE, prob = NULL) %>%
simplify2array
stopCluster(cl)
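For a job this small, a single-process version is probably enough. The following is a minimal sketch, assuming X = 1:20 as above; the seed is arbitrary. replicate() returns one sample per column, so t() is used to get one sample per row. Because the draws are with replacement and equally likely, drawing all 8000 values in one call and reshaping should be an equivalent alternative.

```r
X <- 1:20
set.seed(123)  # arbitrary seed, only for reproducibility

# replicate() returns an 8 x 1000 matrix (one sample per column);
# transpose so that each row is one sample of size 8
Results <- t(replicate(1000, sample(X, 8, replace = TRUE)))

# Alternative (valid only because replace = TRUE): draw everything at once
Results2 <- matrix(sample(X, 1000 * 8, replace = TRUE), nrow = 1000)

dim(Results)   # 1000 8
```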
I am trying to generate a for loop that will repeat a sequence of the following:
sample(x = 1:14, size = 10, replace = TRUE, prob = c(1/4,1/4,1/4,1/4)
I want it to repeat 5000 times. So far, I include the above as the body of the loop and added
for (i in seq_along[1:5000]){
at the beginning but I am getting an error message saying
Error in seq_along[1:10000] : object of type 'builtin' is not subsettable
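We need replicate. Note that prob must have one entry per element of x, so x = 1:4 is used here to match the four probabilities:
out <- replicate(5000, sample(x = 1:4, size = 10, replace = TRUE, prob = c(1/4,1/4,1/4,1/4)), simplify = FALSE)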
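To see what replicate() returns, here is a small sketch. It assumes x = 1:4 was intended, so that prob has one entry per element of x, and it uses 5 repetitions instead of 5000 to keep the objects small. The names out_list and out_mat are only for illustration.

```r
set.seed(1)  # arbitrary seed
p <- rep(1/4, 4)

# simplify = FALSE: a list with one sampled vector per repetition
out_list <- replicate(5, sample(x = 1:4, size = 10, replace = TRUE, prob = p),
                      simplify = FALSE)
length(out_list)    # 5
lengths(out_list)   # 10 10 10 10 10

# default simplify = TRUE: a 10 x 5 matrix, one repetition per column
out_mat <- replicate(5, sample(x = 1:4, size = 10, replace = TRUE, prob = p))
dim(out_mat)        # 10 5
```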
There are a few issues here.
@MartinGal noted the syntax issues with seq_along and the missing ). Note that you can use seq(n) or 1:n to define the number of iterations.
You are not storing the sampled vectors anywhere, so the for loop will run the code but you won't capture the output.
You have x = 1:14 but you only have 4 prob values, which suggests you intended x = 1:4 (either that or you are 10 prob values short).
Here's one way to address these issues using a for loop.
n <- 5
s <- 10
xmax <- 4
p <- 1/4
out <- matrix(nrow = n, ncol = s, byrow = TRUE)
set.seed(1L)
for (i in seq(n)) {
out[i, ] <- sample(x = seq(xmax), size = s, replace = TRUE, prob = rep(p, xmax))
}
As andrew reece notes in his comment, it looks like you want x = 1:4. Depending on what you want to do with your result, you could generate all of the realizations at once, since you are sampling with replacement, and then store the result in a matrix with 5000 rows of 10 realizations per row. So:
x <- sample(1:4, size = 5000 * 10, replace = TRUE, prob = c(1/4,1/4,1/4,1/4))
result <- matrix(x, nrow = 5000)
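A quick sanity check of the single-call approach, as a sketch with an arbitrary seed: the matrix should have 5000 rows and 10 columns, and each of the four values should appear in roughly a quarter of the cells. The name freq is only for illustration.

```r
set.seed(1)  # arbitrary seed
x <- sample(1:4, size = 5000 * 10, replace = TRUE, prob = rep(1/4, 4))
result <- matrix(x, nrow = 5000)

dim(result)                         # 5000 10
freq <- prop.table(table(result))   # share of cells holding each value
round(freq, 3)
```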
How does one draw a sample within a sapply function without replacement? Consider the following MWE below. What I am trying to achieve is for a number in idDRAW to receive a letter from chrSMPL (given the sample size of chrSMPL). Whether a number from idDRAW receives a letter is determined by the respective probabilities, risk factors and categories. This is calculated in the sapply function and stored in tmp.
The issue is sample replacement, leading to a number being named with a letter more than once. How can one avoid replacement whilst still using the sapply function? I have tried to adjust the code from this question (Alternative for sample) to suit my needs, but no luck. Thanks in advance.
set.seed(3)
chr<- LETTERS[1:8]
chrSMPL<- sample(chr, size = 30, replace = TRUE)
idDRAW<- sort(sample(1:100, size = 70, replace = FALSE))
p_mat<- matrix(runif(16, min = 0, max = 0.15), ncol = 2); rownames(p_mat) <- chr ## probability matrix
r_mat <- matrix(rep(c(0.8, 1.2), each = length(chr)), ncol = 2); rownames(r_mat) <- chr ## risk factor matrix
r_cat<- sample(1:2, 70, replace = TRUE) ## risk categories
# find number from `idDRAW` to be named a letter:
Out<- sapply(chrSMPL, function(x){
tmp<- p_mat[x, 1] * r_mat[x, r_cat]
sample(idDRAW, 1, prob = tmp)
})
> sort(Out)[1:3]
G B B
5 5 5
I managed with an alternative solution using a for loop as seen below. If anyone can offer suggestions on how the desired result can be achieved without using a for loop it would be greatly appreciated.
set.seed(3)
Out <- c()
for(i in 1:length(chrSMPL)){
tmp <- p_mat[chrSMPL[i], 1] * r_mat[chrSMPL[i], r_cat]
Out <- c(Out, sample(idDRAW, 1, prob = tmp))
rm <- which(idDRAW == Out[i])
idDRAW <- idDRAW[-rm]
r_cat <- r_cat[-rm]
}
names(Out) <- chrSMPL
sort(Out)[1:3]
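Each draw here depends on which ids have already been used, so some form of sequential iteration seems hard to avoid. The sketch below is still a loop, but it keeps idDRAW and r_cat intact and preallocates Out: used ids get weight zero through a logical mask. The names available, w and pick are hypothetical helpers, not part of the original code.

```r
set.seed(3)
chr     <- LETTERS[1:8]
chrSMPL <- sample(chr, size = 30, replace = TRUE)
idDRAW  <- sort(sample(1:100, size = 70, replace = FALSE))
p_mat   <- matrix(runif(16, min = 0, max = 0.15), ncol = 2,
                  dimnames = list(chr, NULL))   # probability matrix
r_mat   <- matrix(rep(c(0.8, 1.2), each = length(chr)), ncol = 2,
                  dimnames = list(chr, NULL))   # risk factor matrix
r_cat   <- sample(1:2, 70, replace = TRUE)      # risk categories

available <- rep(TRUE, length(idDRAW))  # TRUE while an id is still unassigned
Out <- integer(length(chrSMPL))         # preallocated result

for (i in seq_along(chrSMPL)) {
  w <- p_mat[chrSMPL[i], 1] * r_mat[chrSMPL[i], r_cat]
  w[!available] <- 0                    # ids already used cannot be drawn again
  pick <- sample.int(length(idDRAW), 1, prob = w)
  Out[i] <- idDRAW[pick]
  available[pick] <- FALSE
}
names(Out) <- chrSMPL

anyDuplicated(Out)   # 0, i.e. no id received more than one letter
```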
I would like to generate 1000 samples of size 25 from a standard normal distribution, calculate the variance of each one, and create a histogram. I have the following:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
Then I would like to randomly select one sample from those 1000 samples and take 1000 bootstraps from that sample. Then calculate the variance of each and plot a histogram. So far, I have:
sub.sample = sample(samples, 1)
This is where I'm stuck. I know a for loop is needed for bootstrapping here, so I have:
rep.boot2 <- numeric(lengths(sub.sample))
for (i in 1:lengths(sub.sample)) {
index2 <- sample(1:1000, size = 25, replace = TRUE)
a.boot <- sub.sample[index2, ]
rep.boot2[i] <- var(a.boot)[1, 2]
}
but running the above produces an "incorrect number of dimensions" error. Which part is causing the error?
I can see 2 problems here. One is that you are trying to subset sub.sample with two indices, as you would a matrix or data frame, but it is actually a list of length 1.
a.boot <- sub.sample[index2, ]
To fix this, you can change
sub.sample = sample(samples, 1)
to
sub.sample = as.vector(unlist(sample(samples, 1)))
The second problem is that you are generating a sample of 25 indexes from between 1 and 1000
index2 <- sample(1:1000, size = 25, replace = TRUE)
but then you try to extract these indexes from a list with a length of only 25. So you will end up with mostly NA values in a.boot.
If I understand what you want to do correctly then this should work:
samples = replicate(1000, rnorm(25,0,1), simplify=FALSE)
hist(sapply(samples, var))
sub.sample = as.vector(unlist(sample(samples, 1)))
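rep.boot2 <- numeric(1000)  # numeric vector, so it can be passed to hist()
for (i in 1:1000) {
  # resample the 25 positions with replacement
  index2 <- sample(1:25, size = 25, replace = TRUE)
  a.boot <- sub.sample[index2]
  rep.boot2[i] <- var(a.boot)
}
hist(rep.boot2)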
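The bootstrap loop can also be written without explicit indexing. This is a minimal sketch, assuming sub.sample is a plain numeric vector of length 25; here it is simulated directly with rnorm() rather than picked from the 1000 samples, and the name boot.vars is only for illustration.

```r
set.seed(2024)  # arbitrary seed
sub.sample <- rnorm(25)   # stands in for the one sample picked from the 1000

# each replication resamples the 25 values with replacement
# and records the variance of the resample
boot.vars <- replicate(1000, var(sample(sub.sample, replace = TRUE)))

length(boot.vars)   # 1000
```

Calling hist(boot.vars) afterwards would draw the histogram of the bootstrap variances.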
I am simulating dice throws, and would like to save the output in a single object, but cannot find a way to do so. I tried looking here, here, and here, but they do not seem to answer my question.
Here is my attempt to assign the result of a 20 x 3 trial to an object:
set.seed(1)
Twenty = for(i in 1:20){
trials = sample.int(6, 3, replace = TRUE)
print(trials)
i = i+1
}
print(Twenty)
What I do not understand is why I cannot recall the function after it is run?
I also tried using return instead of print in the function:
Twenty = for(i in 1:20){
trials = sample.int(6, 3, replace = TRUE)
return(trials)
i = i+1
}
print(Twenty)
or creating an empty matrix first:
mat = matrix(0, nrow = 20, ncol = 3)
mat
for(i in 1:20){
mat[i] = sample.int(6, 3, replace = TRUE)
print(mat)
i = i+1
}
but they seem to be worse (as I do not even get to see the trials).
Thanks for any hints.
There are several things wrong with your attempts:
1) A loop is not a function nor an object in R, so it doesn't make sense to assign a loop to a variable
2) When you have a loop for(i in 1:20), the loop will increment i so it doesn't make sense to add i = i + 1.
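Both points can be checked directly. In this small sketch (the name res is only for illustration), the value of a for loop is NULL, and changing i inside the body does not alter how many times the loop runs.

```r
# 1) Assigning a loop to a name just stores NULL
res <- for (i in 1:3) {
  i * 2
}
is.null(res)   # TRUE

# 2) i is reset from 1:3 at the start of every iteration
for (i in 1:3) {
  i <- i + 10  # affects only the rest of this iteration
}
i              # 13: the loop still ran exactly three times
```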
Your last attempt implemented correctly would look like this:
mat <- matrix(0, nrow = 20, ncol = 3)
for(i in 1:20){
mat[i, ] = sample.int(6, 3, replace = TRUE)
}
print(mat)
I personally would simply do
matrix(sample.int(6, 20 * 3, replace = TRUE), nrow = 20)
(since all draws are independent and with replacement, it doesn't matter if you make 3 draws 20 times or simply 60 draws)
In most programming languages, one does not assign a for loop to an object, because a loop is not a function and returns nothing useful. Loops are used to act iteratively on existing objects. However, R provides the apply family, which collects the output of each iteration into an object of the same length as the input.
Consider lapply (list apply) for list output or sapply (simplified apply) for matrix output:
# LIST OUTPUT
Twenty <- lapply(1:20, function(x) sample.int(6, 3, replace = TRUE))
# MATRIX OUTPUT
Twenty <- sapply(1:20, function(x) sample.int(6, 3, replace = TRUE))
And to see your trials, simply print out the object
print(Twenty)
But since you never use the iterator variable, x, consider replicate, a wrapper around sapply whose simplify argument controls whether it returns a matrix or a list. It takes a number of repetitions and an expression, so no input sequence or function is needed:
# MATRIX OUTPUT (DEFAULT)
Twenty <- replicate(20, sample.int(6, 3, replace = TRUE))
# LIST OUTPUT
Twenty <- replicate(20, sample.int(6, 3, replace = TRUE), simplify = FALSE)
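One detail worth checking is the orientation of the matrix output. In this sketch (arbitrary seed; Twenty_rows is a name used only here), each trial ends up in a column, so t() is needed to get the 20 x 3 layout with one trial per row.

```r
set.seed(1)  # arbitrary seed
Twenty <- replicate(20, sample.int(6, 3, replace = TRUE))
dim(Twenty)        # 3 20: each column is one trial of three dice

Twenty_rows <- t(Twenty)
dim(Twenty_rows)   # 20 3: each row is one trial
```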
You can use list:
Twenty=list()
for(i in 1:20){
Twenty[[i]] = sample.int(6, 3, replace = TRUE)
}
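If a matrix is wanted afterwards, the list can be bound row by row. A minimal sketch, with the list preallocated and an arbitrary seed; the name mat is only for illustration.

```r
set.seed(1)  # arbitrary seed
Twenty <- vector("list", 20)   # preallocate 20 empty slots
for (i in seq_len(20)) {
  Twenty[[i]] <- sample.int(6, 3, replace = TRUE)
}

# stack the 20 vectors of length 3 into a 20 x 3 matrix
mat <- do.call(rbind, Twenty)
dim(mat)   # 20 3
```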
I'd like to sample a vector x of length 7 with replacement, and to do that 10 separate times. I've tried something like the following but can't get the resulting 7x10 output I'm looking for. This produces a 1x7 vector, but I can't figure out how to get the other 9 vectors.
x <- runif(7, 0, 1)
for(i in 1:10){
samp <- sample(x, size = length(x), replace = T)
}
This is a very convenient way to do this:
replicate(10,sample(x,length(x),replace = TRUE))
Since you seem to want to sample with replacement, you can just get the 7*10 samples at once (which is more efficient for large sizes):
x <- runif(7)
n <- 10
xn <- length(x)
matrix(x[sample.int(xn, xn*n, replace=TRUE)], nrow=xn)
# Or slightly shorter:
matrix(sample(x, length(x)*n, replace=TRUE), ncol=n)
The second version uses sample directly, but there are some issues with that: if x is a numeric of length 1, bad things happen. sample.int is safer.
x <- c(pi, -pi)
sample(x, 5, replace=T) # OK
x <- pi
sample(x, 5, replace=T) # OOPS, interpreted as 1:3 instead of pi...
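One way around this pitfall is a small wrapper that always samples positions rather than values. This is a sketch; the helper name resample is a choice made here (the help page for sample describes a similar helper, as far as I recall).

```r
# always sample indices, so a length-1 x is never expanded to 1:x
resample <- function(x, ...) x[sample.int(length(x), ...)]

x <- pi
resample(x, 5, replace = TRUE)   # five copies of pi

x <- c(pi, -pi)
resample(x, 5, replace = TRUE)   # a mix of pi and -pi
```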
Looks like you got a suitable answer, but here's an approach that's similar to your first attempt. The difference is that we define samp with the appropriate dimensions, and then iteratively index into that object and fill it one row at a time:
samp <- matrix(NA, ncol = 7, nrow = 10)
for(i in 1:10){
samp[i,] <- sample(x, size = length(x), replace = T)
}