Eliminate for-loop through functional programming - r

I would like to take input from a data frame with a systemNames variable and a popNum variable and use it to generate a named list of vectors. The elements of each vector are random numbers (1-6)*5, i.e. drawn from (5, 10, 15, 20, 25, 30), and the length of each vector equals the popNum of its system.
The following code works:
## Data
#Create a vector of integers
popNum <- c (2,5,3,9)
#Create corresponding names
systemNames <- c("Ruie", "Regina", "Roupe", "Efate")
# form up into a rectangular data frame
dataSource <- cbind.data.frame(systemNames,popNum )
## Create and Fill the List
#initialise the list
availableCargoes <- vector( mode = "list", length = nrow(dataSource))
#name the list
names(availableCargoes) <- dataSource$systemNames
#fill the list
for (loopCounter in 1:nrow(dataSource)) {
  availableCargoes[[loopCounter]] <- sample.int(n = 6,
                                                size = dataSource$popNum[loopCounter],
                                                replace = TRUE) * 5
}
How can I get rid of the for-loop through something from the apply family or the purrr package? The problem I am having a hard time resolving is what is the X that the lapply runs the sample.int over? How do I pass the vector of popNum as an argument to control the size of the resulting vectors?

Use lapply to loop directly through dataSource$popNum.
Note that I set the RNG seed to make the results reproducible.
set.seed(1234)
for (loopCounter in 1:nrow(dataSource)) {
  availableCargoes[[loopCounter]] <- sample.int(n = 6,
                                                size = dataSource$popNum[loopCounter],
                                                replace = TRUE) * 5
}
set.seed(1234)
ac <- lapply(dataSource$popNum, function(s)
  sample.int(n = 6, size = s, replace = TRUE) * 5)
names(ac) <- dataSource$systemNames
ac
identical(availableCargoes, ac)
#[1] TRUE
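As a side note, here is a slightly more compact sketch (not part of the answer above). lapply keeps the names of its input, so naming the sizes first with setNames should give the named list in one step. The object name ac2 is only illustrative.

```r
# Same data as in the question
popNum <- c(2, 5, 3, 9)
systemNames <- c("Ruie", "Regina", "Roupe", "Efate")

set.seed(1234)
# setNames() attaches the system names to the sizes;
# lapply() carries those names over to the resulting list
ac2 <- lapply(setNames(popNum, systemNames), function(s)
  sample.int(n = 6, size = s, replace = TRUE) * 5)

names(ac2)    # the four system names
lengths(ac2)  # one vector length per system, matching popNum
```

Because the calls to sample.int happen in the same order as in the loop, the same seed should give the same draws.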

sapply version
## Data
#Create a vector of integers
popNum <- c (2,5,3,9)
#Create corresponding names
systemNames <- c("Ruie", "Regina", "Roupe", "Efate")
# form up into a rectangular data frame
dataSource <- cbind.data.frame(systemNames,popNum )
## Create and Fill the List
#initialise the list
availableCargoes <- vector( mode = "list", length = nrow(dataSource))
#name the list
names(availableCargoes) <- dataSource$systemNames
#fill the list
availableCargoes <- sapply(as.character(dataSource$systemNames), function(sysname) {
  sample.int(n = 6,
             size = dataSource$popNum[dataSource$systemNames == sysname],
             replace = TRUE) * 5
}, USE.NAMES = TRUE, simplify = FALSE)

Related

Create a matrix from a list consisting of unequal matrices for individual bootstraps

I tried to create a matrix from a list which consists of N unequal matrices...
The reason to do this is to make R individual bootstrap samples.
The example below has 2 companies, one with 10 observations and one with just 5.
Data:
set.seed(7)
Time <- c(10,5)
xv <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2);
y <- matrix( c(rnorm(10,5,2), rnorm(5,20,1)));
z <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2)
# create data frame of input variables which helps
# to conduct the rowise bootstrapping
data <- data.frame (y = y, xv = xv, z = z);
rows <- dim(data)[1];
cols <- dim(data)[2];
# create the index to sample from the different panels
cumTime <- c(0, cumsum (Time));
index <- findInterval (seq (1:rows), cumTime, left.open = TRUE);
# draw R individual bootstrap samples
# replicate() takes the number of repetitions as n; n = 5 stands in for R here
bootList <- replicate(n = 5, list(), simplify = FALSE);
bootList <- lapply(bootList, function(b)
  by(data, INDICES = index, FUN = function(x)
    dplyr::sample_n(tbl = x, size = nrow(x), replace = TRUE)));
---------- UNLISTING ---------
Currently, I try to do it, incorrectly, like this:
Example for just 1 entry of the list:
matrix(unlist(bootList[[1]], recursive = T), ncol = cols)
The desired output is just
bootList[[1]]
as a matrix.
Do you have an idea how to do this & if possible reasonably efficient?
The matrices are then processed in unfortunately slow MLE estimations...
I found a solution for you. From what I gather, you have a data frame containing all observations of all companies, which may have different panel lengths. As a result you would like a bootstrap sample for each company of the same size as the original panel length.
You merely have to add a company indicator:
data$company = c(rep(1, 10), rep(2, 5)) # this could even be a factor.
L1 = split(data, data$company)
L2 = lapply(L1, FUN = function(s) s[sample(x = 1:nrow(s), size = nrow(s), replace = TRUE),] )
Stop here if you would like to keep separate bootstrap samples, e.g. if you want to estimate separately.
bootdata = do.call(rbind, L2)
Best wishes,
Tim
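If each bootstrap sample is needed as a matrix, as the question asks, one possible sketch is to wrap the split / sample / rbind steps in a helper and convert with as.matrix. The helper boot_once, the toy data dat and the grouping vector company are hypothetical names, and the sketch assumes all columns are numeric.

```r
set.seed(7)
panel_len <- c(10, 5)  # toy panel lengths, as in the question
dat <- data.frame(y = rnorm(15), x1 = rnorm(15), x2 = rnorm(15))
company <- rep(seq_along(panel_len), times = panel_len)

# Resample rows with replacement within each company,
# then stack the pieces and return a numeric matrix
boot_once <- function(d, g) {
  parts <- lapply(split(d, g), function(s)
    s[sample.int(nrow(s), replace = TRUE), , drop = FALSE])
  as.matrix(do.call(rbind, parts))
}

# 5 bootstrap samples, each a 15 x 3 matrix
bootList <- replicate(5, boot_once(dat, company), simplify = FALSE)
dim(bootList[[1]])
```

Each matrix keeps the original panel lengths, so the first 10 rows come from company 1 and the last 5 from company 2.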

Call a function and provide output in a matrix

I have a function in R which I call
RS1 = t(cbind(Data[,18], Data[,20]))
RS2 = t(cbind(Data[,19], Data[,21]))
p = t(Data[23:24])
rand_x <- function (p, x) {
n.goods <- dim (p)[1]
n.obs <- dim (p)[2]
xRC = NaN*matrix(1, n.goods, n.obs)
for(i in 1:n.obs) {
xRC[1,i] <- RS1[1,i] + RS1[2,i]
xRC[2,i] <- RS2[1,i] + RS2[2,i]
}
result <- xRC
return(result)
}
Given these two inputs, this function generates a 2x50 matrix with some random numbers. I want to call rand_x 1000 times, obtain 1000 matrices, and then bind the results into one final matrix. I have tried to write a loop for this but I am still struggling. Any help will be much appreciated.
If you intend to add each element of column 18 to 20 (that is what your code does), try using rowSums().
Try:
xRC <- rbind(
  rowSums(Data[, c(18, 20)]),
  rowSums(Data[, c(19, 21)])
)
The output will be a matrix.
I do not see where randomness appears in your function, though. If you just want a 2x50 matrix with random numbers, you may want to use:
xRC <- matrix (rnorm(50*2), 2) # for standard-normal generated numbers
xRC <- matrix (sample(1:100, replace = T, size = 100), 2) # for numbers between 1 and 100, uniformly distributed
To do this 1000 times, try:
for (i in 1:1000) {
  # assign the result, otherwise the bound rows are discarded
  xRC <- rbind(xRC,
               rowSums(Data[, c(18, 20)]),
               rowSums(Data[, c(19, 21)])
  )
}
# or if you just want to generate random numbers, performance is way faster when you use:
xRC <- matrix(rnorm(1000 * 2 * 50), ncol = 50)
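For the general "call a function 1000 times and bind the results" pattern, a minimal sketch could look like the following. rand_x_sketch is a hypothetical stand-in for a function that returns a random 2x50 matrix; it is not the rand_x from the question.

```r
# Hypothetical stand-in: returns an n.goods x n.obs matrix of uniform draws
rand_x_sketch <- function(n.goods = 2, n.obs = 50) {
  matrix(runif(n.goods * n.obs), nrow = n.goods, ncol = n.obs)
}

set.seed(1)
# Collect the 1000 matrices in a list
draws <- replicate(1000, rand_x_sketch(), simplify = FALSE)

# Stack them row-wise into one 2000 x 50 matrix
stacked <- do.call(rbind, draws)

# Or keep them apart as a 2 x 50 x 1000 array
arr <- array(unlist(draws), dim = c(2, 50, 1000))
```

Building the list first and binding once at the end avoids growing a matrix inside a loop, which tends to get slow as the number of repetitions increases.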

Creating empty rows on matrix of various size

I am trying to generate a matrix from the following data. Is there any way to create empty rows so that the columns of the matrix all have the same length?
#Generating original data
n <- c(12,24)
mu <- c(6.573,6.5)
sigma <- sqrt(0.25)
Diseased.Data <- round(rnorm(n[1],mu[1],sigma),4)
Healthy.Data <- round(rnorm(n[2],mu[2],sigma),4)
g <- c(2,3,4)
cstar.pool <- (mu[1]+mu[2])/2
#generating pooled data
for(i in 1:3){
assign(paste("pool.dis.data",i,sep = ""),replicate(n[1]/g[i],mean(sample(Diseased.Data,g[i]))))
assign(paste("pool.hel.data",i,sep = ""),replicate(n[2]/g[i],mean(sample(Healthy.Data,g[i]))))
}
#generating the pooled diseased data matrix
dis.mat1<- matrix(data = pool.dis.data1,length(pool.dis.data1),1)
dis.mat2 <- matrix(data = pool.dis.data2,length(pool.dis.data2),1)
dis.mat3 <- matrix(data = pool.dis.data3,length(pool.dis.data3),1)
dis.mat2 <- rbind(dis.mat2,NA)
dis.mat2 <- rbind(dis.mat2,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.matrix <- matrix(NA, max(length(pool.dis.data1),length(pool.dis.data2),length(pool.dis.data3)),3)
dis.matrix[,1] <- cbind(dis.mat1)
dis.matrix[,2] <- cbind(dis.mat2)
dis.matrix[,3] <- cbind(dis.mat3)
I'd say your best bet is to start out with an empty matrix of the size you need. You can tell matrix to specify the dimensions on creation like so:
new <- matrix( data = NA, nrow = 10, ncol = 20 )
So you just need to create a value for each dimension, based on your input data:
num.rows <- max( length(n), length(mu), ... )
num.columns <- [ I'd just enter a numeric value here ]
new <- matrix( data = NA, nrow = num.rows, ncol = num.columns )
Then you can fill the columns as needed, making sure to leave any excess empty. For example:
new[(1:length(n)),3] <- n
The "1:length(n)" part there will tell R to stop filling the column once the values you've given it have been entered. Otherwise R will continue filling, and you'll get repeated values, which I'm guessing you don't want.
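An alternative sketch for the same goal, assuming the vectors of unequal length are first collected in a list: extending a vector with length<- pads it with NA, so every vector can be stretched to the longest length and bound into a matrix. The list cols and its values are made up for illustration.

```r
# Hypothetical vectors of unequal length
cols <- list(a = c(1.1, 2.2, 3.3, 4.4),
             b = c(5.5, 6.6),
             c = 7.7)

n_max <- max(lengths(cols))

# Assigning a longer length pads each vector with NA at the end
padded <- sapply(cols, function(v) {
  length(v) <- n_max
  v
})

padded  # a 4 x 3 matrix, short columns filled with NA
```

This avoids the repeated rbind(..., NA) calls and does not require knowing the final dimensions in advance.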

MHSMM package R input data format with multiple variables

My problem is similar to the following question: the problem of R-input Format
I have tried the code from that link and revised some parts to suit my data. My data looks like the following:
I want my data to be turned into a data frame with 4 variable vectors. The code I have revised is:
formatMhsmm <- function(data){
nb.sequences = nrow(data)
nb.variables = ncol(data)
data_df <- data.frame(matrix(unlist(data), ncol = 4, byrow = TRUE))
# iterate over these in loops
rows <- 1: nb.sequences
# build vector with id value
id = numeric(length = nb.sequences)
for( i in rows)
{
id[i] = data_df[i,2]
}
# build vector with time value
time = numeric (length = nb.sequences)
for( i in rows)
{
time[i] = data_df[i,3]
}
# build vector with observation values
sequences = numeric(length = nb.sequences)
for(i in rows)
{
sequences[i] = data_df[i, 4]
}
data.df = data.frame(id,time,sequences)
# creation of hsmm data object need for training
N <- as.numeric(table(data.df$id))
train <- list(x = data.df$sequences, N = N)
class(train) <- "hsmm.data"
return(train)
}
library(mhsmm)
dataset <- read.csv("location.csv", header = TRUE)
train <- formatMhsmm(dataset)
print(train)
The output observations are not the data from the 4th column; they are a list like (4, 8, 12,...,396, 1, 1, ..., 56, 192,...,6550, 68, NA, NA,...). It has picked up a quarter of the data from each column. Why is that?
Thank you very much!!!!
Why don't you simply count your observations by id and create the hsmm.data object directly? Supposing your data frame is called "data", we have:
N <- as.numeric(table(data$id))
train <- list(x=data$location, N = N)
class(train) <- "hsmm.data"
Extracted from http://www.jstatsoft.org/v39/i04/paper
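As for why the original function picked up a quarter of each column, a likely explanation is the matrix(unlist(data), ncol = 4, byrow = TRUE) step. unlist() on a data frame concatenates the columns one after another, and byrow = TRUE then lays those values out row by row, so each matrix column ends up holding every 4th value of the original columns. A small sketch with a made-up data frame df:

```r
# Toy data frame with 4 columns
df <- data.frame(a = 1:4, b = 11:14, c = 21:24, d = 31:34)

# unlist() gives a1..a4, b1..b4, c1..c4, d1..d4;
# byrow = TRUE puts a1..a4 in row 1, b1..b4 in row 2, and so on
scrambled <- matrix(unlist(df), ncol = 4, byrow = TRUE)
scrambled[, 4]  # the 4th value of each original column, not column d

# Filling column by column (the default) keeps the columns intact
intact <- matrix(unlist(df), ncol = 4)
intact[, 4]     # the original column d
```

Dropping byrow = TRUE, or indexing the data frame columns directly as in the answer above, avoids the scrambling.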

Filling in a matrix from a list of row,column,value

I have a data frame that contains a list of row positions, column positions and values, like so:
combs <- as.data.frame(t(combn(1:10,2)))
colnames(combs) <- c('row','column')
combs$value <- rnorm(nrow(combs))
I would like to fill in a matrix with these values such that every value appears in the matrix in exactly the position specified by row and column. I suppose I could do this manually
mat <- matrix(nrow=10,ncol=10)
for(i in 1:nrow(combs)) {
mat[combs[i,'row'],combs[i,'column']] <- combs[i,'value']
}
But surely there is a more elegant way to accomplish this in R?
Like this:
mat <- matrix(nrow = 10, ncol = 10)
mat[cbind(combs$row, combs$column)] <- combs$value
You could also consider building a sparse matrix using the Matrix package:
library(Matrix)
mat <- sparseMatrix(i = combs$row, j = combs$column, x = combs$value,
dims = c(10, 10))
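To see what the matrix-indexing form does, here is a tiny sketch with a made-up triplet table trip. Indexing with a two-column matrix treats each row as one (row, column) position, so all values are assigned in a single vectorised step.

```r
# Hypothetical (row, column, value) triplets
trip <- data.frame(row    = c(1, 2, 3),
                   column = c(2, 3, 1),
                   value  = c(10, 20, 30))

mat <- matrix(NA_real_, nrow = 3, ncol = 3)

# Each row of the index matrix addresses exactly one cell
mat[cbind(trip$row, trip$column)] <- trip$value
mat
```

Cells that are not listed in the triplets stay NA.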

Resources