Generate m matrices of different lengths using the foreach command in R

Take this simple worked example with dummy data:
ab <- c(1:500)
cd <- sample(1:100, 500, replace = T)
ef <- sample(1:10, 500, replace = T)
df1 <- data.frame(ab, cd, ef)
m <- 4
Now I want to use the foreach command to generate m matrices.
Each matrix should have a different number of rows, chosen like this:
#size1 <- sample(50:60, 1)
#indices <- sample(1:500, size1)
#df2 <- df1[indices,]
I am not sure how to generate the different matrices with the foreach command. This is my attempt:
Result = foreach(i=1:m,.combine=matrix(df2)) %do%{
size1 <- sample(50:60, 1)
indices <- sample(1:500, size1)
df2 <- df1[indices,]
}

By default, foreach returns its results as a list. The following code produces a list of matrices of different dimensions.
library(foreach)
Result <- foreach(i=1:5) %do%{
# randomly select number of rows and columns
random.rows <- sample(1:5, 1)
random.columns <- sample(1:5, 1)
# generate matrix out of this
matrix(sample(1:100, random.rows*random.columns), random.rows)
}
The Result object is a list of length 5 whose matrices vary in size between 1x1 and 5x5.
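Applied to the data frame from the question, the same idea might look like the sketch below. It assumes the foreach package is installed, drops the .combine argument so that the default list is returned, and uses as.matrix() to turn each sampled subset into a matrix. The set.seed() call is only there to make the run repeatable.

```r
library(foreach)

set.seed(42)  # assumption: fixed seed, only for repeatability
df1 <- data.frame(ab = 1:500,
                  cd = sample(1:100, 500, replace = TRUE),
                  ef = sample(1:10, 500, replace = TRUE))
m <- 4

# One list element per iteration; each element is a matrix with 50 to 60 rows
Result <- foreach(i = seq_len(m)) %do% {
  size1 <- sample(50:60, 1)                     # random number of rows
  indices <- sample(seq_len(nrow(df1)), size1)  # random rows, no repeats
  as.matrix(df1[indices, ])                     # value of the block is what gets stored
}

sapply(Result, nrow)  # row count of each matrix
```

The value of the last expression inside the braces is what foreach collects, so the assignment to df2 in the original attempt is not needed.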

Related

How to run a function on individual columns instead of a data frame?

Hello everyone. I have two data frames and am trying to do bootstrapping with Script1 below, which takes the number of rows from each full data frame. Instead of using the row count of the entire data frame, I want to split out each column as its own data frame, remove the zero values, take the row count of what remains, and then do the bootstrapping as in Script1. In Script2 I create the individual data frames in a for loop, but as I am new to R I am unsure how to add the Script1 logic to it efficiently.
Please suggest how to do this. Below are Script1, which runs, and Script2, where I subset each column into an individual data frame.
Script1
set.seed(2)
m1 <- matrix(sample(c(0, 1:10), 100, replace = TRUE), 10)
m2 <- matrix(sample(c(0, 1:5), 50, replace = TRUE), 5)
m1 <- as.data.frame(m1)
m2 <- as.data.frame(m2)
nboot <- 1e3
n_m1 <- nrow(m1); n_m2 <- nrow(m2)
temp<- c()
for (j in seq_len(nboot)) {
boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
value <- colSums(m2)/colSums(m1[boot,])
temp <- rbind(temp, value)
}
boot_data<- apply(temp, 2, median)
Script2
for (i in colnames(m1)){
m1_subset=(m1[m1[[i]] > 0, ])
m1_subset=m1_subset[i]
m2_subset=m2[m2[[i]] >0, ]
m2_subset=m2_subset[i]
num_m1 <- nrow(m1_subset); n_m2 <- nrow(m2_subset)# after this wanted add above script changing input
}
If I understand correctly, you want to do the sampling and calculation on each column individually, after removing the 0 values. I modified your code to work on a single vector instead of a data frame (i.e., using length() instead of nrow() and sum() instead of colSums()). I also suggest creating the empty matrix for your results ahead of time and filling it in; it will be faster.
temp <- matrix(nrow = nboot, ncol = ncol(m1))
for (i in seq_along(m1)){
m1_subset = m1[m1[,i] > 0, i]
m2_subset = m2[m2[,i] > 0, i]
n_m1 <- length(m1_subset); n_m2 <- length(m2_subset)
for (j in seq_len(nboot)) {
boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
temp[j, i] <- sum(m2_subset)/sum(m1_subset[boot])
}
}
boot_data <- apply(temp, 2, median)
boot_data <- setNames(data.frame(t(boot_data)), names(m1))
boot_data
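For comparison, the per-column logic can also be wrapped in a small function and applied to matching columns of m1 and m2 with mapply. This is only a sketch of an alternative to the nested loop above, not a change to it. boot_median is a hypothetical helper name, and the code assumes both data frames have the same number of columns in the same order.

```r
set.seed(2)
m1 <- as.data.frame(matrix(sample(c(0, 1:10), 100, replace = TRUE), 10))
m2 <- as.data.frame(matrix(sample(c(0, 1:5), 50, replace = TRUE), 5))
nboot <- 1e3

# Hypothetical helper: bootstrap median of sum(x2) / sum(resampled x1)
# for one pair of columns, after dropping zeros from both.
boot_median <- function(x1, x2, nboot) {
  x1 <- x1[x1 > 0]
  x2 <- x2[x2 > 0]
  ratios <- replicate(nboot, {
    # sample positions rather than values, so a length-1 x1 is handled safely
    idx <- sample.int(length(x1), size = length(x2), replace = TRUE)
    sum(x2) / sum(x1[idx])
  })
  median(ratios)
}

# mapply walks over the columns of m1 and m2 in parallel
boot_data <- mapply(boot_median, m1, m2, MoreArgs = list(nboot = nboot))
boot_data
```

The result is a named numeric vector with one median per column, named after the columns of m1.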

Is it possible to generate a sample of 10 from a population and repeat that sampling 1000 times?

I am trying to randomly sample 10 individuals from a population and repeat 1000 times. Is this possible? Here is my code so far and I am not quite sure if I am on the right track. I keep receiving the error "number of items to replace is not a multiple of replacement length".
Here is my code:
B<-1000
for (i in 1:B){
FR3_Acropora_Sample[i]<-(sample(FR3_Acropora$Ratio,size=10,replace=TRUE))
}
Consider replicate (wrapper to sapply):
# MATRIX
sample_matrix <- replicate(B, sample(FR3_Acropora$Ratio, size=10, replace=TRUE))
# LIST
sample_list <- replicate(B, sample(FR3_Acropora$Ratio, size=10, replace=TRUE),
simplify = FALSE)
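As a quick check of what the matrix form gives you, here is a sketch on made-up data. The FR3_Acropora data frame below is a hypothetical stand-in, since the real one is not shown in the question. Each column of the result is one sample of 10, which is why assigning a sample into a single element FR3_Acropora_Sample[i] fails with the "not a multiple of replacement length" error.

```r
set.seed(1)
# Hypothetical stand-in for the real data frame from the question
FR3_Acropora <- data.frame(Ratio = runif(40))
B <- 1000

# 10 rows (one per sampled individual), B columns (one per repetition)
sample_matrix <- replicate(B, sample(FR3_Acropora$Ratio, size = 10, replace = TRUE))
dim(sample_matrix)

# Example of a per-sample statistic: the mean of each of the B samples
sample_means <- colMeans(sample_matrix)
length(sample_means)
```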
I believe you can accomplish this as follows. I create a sample dataset of numbers 1 through 50 - you'll skip this step of course. I initialize a vector of lists with a length of 100. I loop from 1 to 100 and choose a random sample to assign to each empty space in my vector. I can then access any sample with sampleList[[x]] where x is any number 1 to 100.
x <- c(1:50)
sampleList <- vector(mode="list", length=100)
for (i in 1:100) {
sampleList[[i]] = sample(x, size = 10, replace = TRUE)
}
Using your variable names, this would look like:
B<-1000
FR3_Acropora_Sample <- vector(mode="list", length=B)
for (i in 1:B){
FR3_Acropora_Sample[[i]]=sample(FR3_Acropora$Ratio,size=10,replace=TRUE)
}

Eliminate for-loop through functional programming

I would like to take input from a data frame with a systemNames variable and a popNum variable and use it to generate a named list of vectors whose elements are random numbers (1-6)*5, i.e. drawn from (5, 10, 15, 20, 25, 30), where each vector's length equals the popNum of that system.
The following code works:
## Data
#Create a vector of integers
popNum <- c (2,5,3,9)
#Create corresponding names
systemNames <- c("Ruie", "Regina", "Roupe", "Efate")
# form up into a rectangular data frame
dataSource <- cbind.data.frame(systemNames,popNum )
## Create and Fill the List
#initialise the list
availableCargoes <- vector( mode = "list", length = nrow(dataSource))
#name the list
names(availableCargoes) <- dataSource$systemNames
#fill the list
for (loopCounter in 1:nrow(dataSource)) {
availableCargoes[[loopCounter]] <- sample.int( n = 6,
size = dataSource$popNum[loopCounter],
replace = TRUE) * 5
}
How can I get rid of the for-loop through something from the apply family or the purrr package? The problem I am having a hard time resolving is what is the X that the lapply runs the sample.int over? How do I pass the vector of popNum as an argument to control the size of the resulting vectors?
Use lapply to loop directly through dataSource$popNum.
Note that I set the RNG seed to make the results reproducible.
set.seed(1234)
for (loopCounter in 1:nrow(dataSource)) {
availableCargoes[[loopCounter]] <- sample.int( n = 6,
size = dataSource$popNum[loopCounter],
replace = TRUE) * 5
}
set.seed(1234)
ac <- lapply(dataSource$popNum, function(s)
sample.int(n = 6, size = s, replace = TRUE)*5)
names(ac) <- dataSource$systemNames
ac
identical(availableCargoes, ac)
#[1] TRUE
sapply version
## Data
#Create a vector of integers
popNum <- c (2,5,3,9)
#Create corresponding names
systemNames <- c("Ruie", "Regina", "Roupe", "Efate")
# form up into a rectangular data frame
dataSource <- cbind.data.frame(systemNames,popNum )
## Create and Fill the List
#initialise the list
availableCargoes <- vector( mode = "list", length = nrow(dataSource))
#name the list
names(availableCargoes) <- dataSource$systemNames
#fill the list
availableCargoes <- sapply(as.character(dataSource$systemNames),function(sysname){
sample.int( n = 6,
size = dataSource$popNum[dataSource$systemNames==sysname],
replace = TRUE) * 5
},USE.NAMES=T,simplify = F)
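A slightly shorter variant, offered as a sketch rather than a replacement for either answer: lapply keeps the names of the vector it iterates over, so naming popNum first with setNames() gives a named list in one step. The data frame is rebuilt here with data.frame() only to keep the example self-contained.

```r
dataSource <- data.frame(
  systemNames = c("Ruie", "Regina", "Roupe", "Efate"),
  popNum      = c(2, 5, 3, 9)
)

set.seed(1234)  # only for reproducibility
# Name the sizes by system, then draw one vector per size;
# lapply carries the names over to the resulting list.
availableCargoes <- lapply(
  setNames(dataSource$popNum, dataSource$systemNames),
  function(s) sample.int(n = 6, size = s, replace = TRUE) * 5
)

lengths(availableCargoes)  # one length per system, matching popNum
```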

Create a matrix from a list consisting of unequal matrices for individual bootstraps

I tried to create a matrix from a list which consists of N unequal matrices...
The reason to do this is to make R individual bootstrap samples.
In the example below you can find e.g. 2 companies, where we have 1 with 10 & 1 with just 5 observations.
Data:
set.seed(7)
Time <- c(10,5)
xv <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2);
y <- matrix( c(rnorm(10,5,2), rnorm(5,20,1)));
z <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2)
# create data frame of input variables which helps
# to conduct the rowise bootstrapping
data <- data.frame (y = y, xv = xv, z = z);
rows <- dim(data)[1];
cols <- dim(data)[2];
# create the index to sample from the different panels
cumTime <- c(0, cumsum (Time));
index <- findInterval (seq (1:rows), cumTime, left.open = TRUE);
# draw R individual bootstrap samples
bootList <- replicate(n = 5, list(), simplify=F);
bootList <- lapply (bootList, function(x) by (data, INDICES = index, FUN = function(x) dplyr::sample_n (tbl = x, size = dim(x)[1], replace = T)));
---------- UNLISTING ---------
Currently, I am trying to do it like this, which is incorrect:
Example for just 1 entry of the list:
matrix(unlist(bootList[[1]], recursive = T), ncol = cols)
The desired output is just
bootList[[1]]
as a matrix.
Do you have an idea how to do this & if possible reasonably efficient?
The matrices are then processed in unfortunately slow MLE estimations...
I found a solution for you. From what I gather, you have a data frame containing all observations of all companies, which may have different panel lengths, and as a result you would like a bootstrap sample for each company of the same size as its original panel length.
You merely have to add a company indicator:
data$company = c(rep(1, 10), rep(2, 5)) # this could even be a factor.
L1 = split(data, data$company)
L2 = lapply(L1, FUN = function(s) s[sample(x = 1:nrow(s), size = nrow(s), replace = TRUE),] )
Stop here if you would like to keep separate bootstrap samples, e.g. in case you want to estimate each company separately. Otherwise, bind them back together:
bootdata = do.call(rbind, L2)
Best wishes,
Tim
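To get several bootstrap replicates as matrices, as the question asks, the split/resample/rbind steps above can be wrapped in a function and repeated with replicate. This is a minimal sketch under a few assumptions: boot_once is a hypothetical helper name, the data frame is a simplified stand-in with made-up columns, and all columns are numeric so that as.matrix() yields a numeric matrix.

```r
set.seed(7)
Time <- c(10, 5)  # panel length per company, as in the question

# Simplified stand-in data: 15 rows, company indicator built from Time
data <- data.frame(y = rnorm(15), x1 = rnorm(15), x2 = rnorm(15))
data$company <- rep(seq_along(Time), times = Time)

# Hypothetical helper: one bootstrap replicate, resampled within each company
boot_once <- function(d) {
  parts <- split(d, d$company)
  resampled <- lapply(parts, function(s) {
    s[sample.int(nrow(s), size = nrow(s), replace = TRUE), , drop = FALSE]
  })
  as.matrix(do.call(rbind, resampled))
}

# A list of 5 matrices, each with the same dimensions as the original data
bootList <- replicate(5, boot_once(data), simplify = FALSE)
dim(bootList[[1]])
```

Every replicate keeps 10 rows for the first company and 5 for the second, because the resampling happens within each split.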

Creating empty rows on matrix of various size

I am trying to build a matrix from the following data. Is there any way to add empty rows so that the columns of different lengths all fit into a matrix of the same size?
#Generating original data
n <- c(12,24)
mu <- c(6.573,6.5)
sigma <- sqrt(0.25)
Diseased.Data <- round(rnorm(n[1],mu[1],sigma),4)
Healthy.Data <- round(rnorm(n[2],mu[2],sigma),4)
g <- c(2,3,4)
cstar.pool <- (mu[1]+mu[2])/2
#generating pooled data
for(i in 1:3){
assign(paste("pool.dis.data",i,sep = ""),replicate(n[1]/g[i],mean(sample(Diseased.Data,g[i]))))
assign(paste("pool.hel.data",i,sep = ""),replicate(n[2]/g[i],mean(sample(Healthy.Data,g[i]))))
}
#generating the pooled diseased data matrix
dis.mat1<- matrix(data = pool.dis.data1,length(pool.dis.data1),1)
dis.mat2 <- matrix(data = pool.dis.data2,length(pool.dis.data2),1)
dis.mat3 <- matrix(data = pool.dis.data3,length(pool.dis.data3),1)
dis.mat2 <- rbind(dis.mat2,NA)
dis.mat2 <- rbind(dis.mat2,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.matrix <- matrix(NA, max(length(pool.dis.data1),length(pool.dis.data2),length(pool.dis.data3)),3)
dis.matrix[,1] <- cbind(dis.mat1)
dis.matrix[,2] <- cbind(dis.mat2)
dis.matrix[,3] <- cbind(dis.mat3)
I'd say your best bet is to start out with an empty matrix of the size you need. You can tell matrix to specify the dimensions on creation like so:
new <- matrix( data = NA, nrow = 10, ncol = 20 )
So you just need to create a value for each dimension, based on your input data:
num.rows <- max( length(n), length(mu), ... )
num.columns <- [ I'd just enter a numeric value here ]
new <- matrix( data = NA, nrow = num.rows, ncol = num.columns )
Then you can fill the columns as needed, making sure to leave any excess empty. For example:
new[(1:length(n)),3] <- n
The "1:length(n)" part there will tell R to stop filling the column once the values you've given it have been entered. Otherwise R will continue filling, and you'll get repeated values, which I'm guessing you don't want.
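Here is a small sketch of that approach on vectors of unequal length. The pooled list and its values are made up for illustration (lengths 6, 4 and 3, like the pooled data in the question), and the `length<-` one-liner at the end is an alternative that was not part of the answer.

```r
# Made-up vectors of unequal length, standing in for the pooled data
pooled <- list(
  g2 = c(6.1, 6.4, 6.9, 6.6, 6.2, 6.8),
  g3 = c(6.5, 6.3, 6.7, 6.6),
  g4 = c(6.4, 6.6, 6.5)
)

# Empty matrix sized to the longest vector
n.rows <- max(lengths(pooled))
padded <- matrix(NA_real_, nrow = n.rows, ncol = length(pooled),
                 dimnames = list(NULL, names(pooled)))

# Fill only the first length(x) rows of each column; the rest stay NA
for (j in seq_along(pooled)) {
  padded[seq_along(pooled[[j]]), j] <- pooled[[j]]
}
padded

# Alternative: extending a vector's length pads it with NA
padded2 <- sapply(pooled, `length<-`, n.rows)
```

Indexing the rows with seq_along() limits the assignment to the rows that have data, which avoids the recycling mentioned above.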

Resources