Passing mclapply() a parameter from for (i in range)

I'm trying to do this:
nmf.sub <- function(n){
  ## the index is a permutation of the original matrix at a 0.8
  ## resampling proportion (doesn't really matter)
  sub.data.matrix <- data.matrix[, (index[n, ])]
  temp.result <- nmf(sub.data.matrix, rank = 2, seed = 12345) ## want to change 2 to i
  return(temp.result)
}
class.list <- list()
for (i in nmf.rank){ ## nmf.rank is 2:4
  ## resamp.iterations is 10, nmf.sub is defined above
  results.list <- mclapply(mc.cores = 16, 1:resamp.iterations, function(n) nmf.sub(n))
}
But instead of rank = 2 in the nmf call for temp.result, I want rank = i.
Any idea how I could pass that parameter in? Just passing it through mclapply as function(n, i) doesn't work.

You seemingly have two loops: one over i in nmf.rank and one over n in 1:resamp.iterations. Therefore, you need to pass both i and n to nmf.sub, e.g.:
nmf.sub <- function(n, i){
  ## the index is a permutation of the original matrix at a 0.8
  ## resampling proportion (doesn't really matter)
  sub.data.matrix <- data.matrix[, (index[n, ])]
  ## rank is now the parameter i instead of the hard-coded 2
  temp.result <- nmf(sub.data.matrix, rank = i, seed = 12345)
  return(temp.result)
}
resamp.iterations <- 10
nmf.rank <- 2:4
res <- lapply(nmf.rank, function(i){
  mclapply(mc.cores = 16, 1:resamp.iterations,
           function(n) nmf.sub(n, i))
})
## then you can flatten/reshape res
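For instance, a minimal sketch of that flattening step in base R (the rank labels are just illustrative):
names(res) <- paste0("rank", nmf.rank)     ## label the outer list by rank
res.flat <- unlist(res, recursive = FALSE) ## one flat list of length(nmf.rank) * resamp.iterations fits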
Regarding your comment (below) about efficiency: the bulk of the numerical work happens inside nmf(), so the loop is properly set up, in the sense that each process/core gets a numerically intensive job. To speed up the computation further, you might consider seeding each fit with the previously computed result instead of the fixed seed 12345 (unless that seed is mandatory for some reason related to your problem). In the following example I get a 30-40% reduction in execution time:
library(NMF)
RNGkind("L'Ecuyer-CMRG") ## always use this RNG when using mclapply()
nr <- 19
nc <- 2e2
set.seed(123)
data.matrix <- matrix(rexp(nc*nr), nr, nc)
resamp.iterations <- 10
nmf.rank <- 2:4
index <- t(sapply(1:resamp.iterations, function(n) sample.int(nc, nc*0.8)))
nmf.sub <- function(n, i){
  sub.data.matrix <- data.matrix[, index[n, ]]
  temp.result <- nmf(sub.data.matrix, rank = i, seed = 12345)
  return(temp.result)
}
## version 1
system.time({
  res <- lapply(nmf.rank, function(i){
    mclapply(mc.cores = 16, 1:resamp.iterations,
             function(n) nmf.sub(n, i))
  })
})
## version 2: swap internal and external loops
system.time({
  res <- mclapply(mc.cores = 16, 1:resamp.iterations, function(n){
    res2 <- nmf(data.matrix[, index[n, ]], rank = 2, seed = 12345)
    res3 <- nmf(data.matrix[, index[n, ]], rank = 3, seed = 12345)
    res4 <- nmf(data.matrix[, index[n, ]], rank = 4, seed = 12345)
    list(res2, res3, res4)
  })
})
## version 3: use previous calculation as starting point
## ==> 30-40% reduction in computing time
system.time({
  res <- mclapply(mc.cores = 16, 1:resamp.iterations, function(n){
    res2 <- nmf(data.matrix[, index[n, ]], rank = 2, seed = 12345)
    res3 <- nmf(data.matrix[, index[n, ]], rank = 3, seed = res2)
    res4 <- nmf(data.matrix[, index[n, ]], rank = 4, seed = res3)
    list(res2, res3, res4)
  })
})
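To check that warm-starting does not hurt the quality of the fits, one could compare the final objective values of a cold-started and a warm-started fit. A minimal sketch, assuming the deviance() accessor that the NMF package provides for fitted models:
res2 <- nmf(data.matrix[, index[1, ]], rank = 2, seed = 12345)
cold <- nmf(data.matrix[, index[1, ]], rank = 3, seed = 12345)
warm <- nmf(data.matrix[, index[1, ]], rank = 3, seed = res2) ## warm start from the rank-2 fit
c(cold = deviance(cold), warm = deviance(warm)) ## similar values = no loss in fit quality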

Related

R: performance issues when computing mutual information matrix with NAs

I realized that computing mutual information on a data frame with NAs using R's infotheo package does not yield errors but incorrect results. The problem is described in more detail here, but while I now have a mathematically correct solution, which only removes pairwise incomplete cases instead of dropping incomplete rows across all columns, the performance for large data sets is catastrophic. I guess it is the nested for loop that causes the long compute times; does anyone have an idea how to improve the performance of the code below?
library(infotheo)
v1 <- c(1,2,3,4,5,NA,NA,NA,NA,NA)
v2 <- c(1,NA,3,NA,5,NA,7,NA,9,NA)
v3 <- c(NA,2,3,NA,NA,6,7,NA,7,NA)
v4 <- c(NA,NA,NA,NA,NA,6,7,8,9,10)
df <- cbind.data.frame(v1,v2,v3,v4)
ColPairMap <- function(df){
  t <- data.frame(matrix(ncol = ncol(df), nrow = ncol(df)))
  colnames(t) <- colnames(df)
  rownames(t) <- colnames(df)
  for (j in 1:ncol(df)) {
    for (i in 1:ncol(df)) {
      if (nrow(df[complete.cases(df[,c(i,j)]),]) > 0) {
        t[j,i] <- natstobits(mutinformation(df[complete.cases(df[,c(i,j)]),j],
                                            df[complete.cases(df[,c(i,j)]),i]))
      } else {
        t[j,i] <- 0
      }
    }
  }
  return(t)
}
ColPairMap(df)
Thanks in advance!
Twice the speed: since mutual information is symmetric, it is enough to compute the upper triangle and mirror it.
ColPairMap2 <- function(df){
  t <- matrix(0, ncol = ncol(df), nrow = ncol(df),
              dimnames = list(colnames(df), colnames(df)))
  df <- as.matrix(df)
  for (j in 1:ncol(df)) {
    for (i in j:ncol(df)) { # upper triangle only, MI is symmetric
      compl_cases <- complete.cases(df[, c(i, j)])
      if (sum(compl_cases) > 0) {
        t[j,i] <- natstobits(mutinformation(df[compl_cases, j],
                                            df[compl_cases, i]))
      }
    }
  }
  lt <- lower.tri(t)
  t[lt] <- t[lt] + t(t)[lt] # mirror into the lower triangle
  t
}
all(ColPairMap(df) == ColPairMap2(df))
#[1] TRUE
Test the speed.
library(microbenchmark)
mb <- microbenchmark(
  f1 = ColPairMap(df),
  f2 = ColPairMap2(df)
)
print(mb, order = "median", unit = "relative")
#Unit: relative
# expr      min      lq     mean   median       uq      max neval cld
#   f2 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000   100  a
#   f1 2.035973 2.01852 1.907398 2.008894 2.108486 0.569771   100   b
I found a tweak which does not help for toy data sets like df above, but for real-world data sets, especially when executed on proper hardware, I have seen it reduce a 2.5 hr compute time to 14 min!
The code below is a complete copy-and-pasteable example that incorporates Rui's solution using a nested for loop and, building on the same idea, adds a solution using a nested foreach loop that parallelizes the task over 75% of the available cores.
You can control the size of the data set, and consequently the compute time, by adjusting n.
library(foreach)
library(parallel)
library(doParallel)
library(infotheo)
n <- 500 # controls the data size (a 4n x n matrix); the larger, the more compute time is required
df <- discretize(matrix(rnorm(4*n*n, n, n/10), ncol = n))
## pairwise complete mutual information via nested for loop ##
start_for <- Sys.time()
ColPairMap <- function(df){
  t <- data.frame(matrix(ncol = ncol(df), nrow = ncol(df)))
  colnames(t) <- colnames(df)
  rownames(t) <- colnames(df)
  for (j in 1:ncol(df)) {
    for (i in 1:ncol(df)) {
      if (nrow(df[complete.cases(df[,c(i,j)]),]) > 0) {
        t[j,i] <- natstobits(mutinformation(df[complete.cases(df[,c(i,j)]),j],
                                            df[complete.cases(df[,c(i,j)]),i]))
      } else {
        t[j,i] <- 0
      }
    }
  }
  return(t)
}
ColPairMap(df)
end_for <- Sys.time()
end_for - start_for
## pairwise complete mutual information via nested foreach loop ##
start_foreach <- Sys.time()
ncl <- max(2, floor(detectCores()*0.75)) # number of cores
clst <- makeCluster(ncl, type = "PSOCK") # create cluster (PSOCK works on all platforms)
#e <- new.env() # new environment to export libraries to cores
#e$libs <- .libPaths()
#clusterExport(clst, "libs", envir=e) # export required packages to all cores
#clusterEvalQ(clst, .libPaths(libs)) # export required packages to all cores
clusterEvalQ(clst, { # export required packages to all cores
  library(infotheo)
})
registerDoParallel(cl = clst) # register cluster
# parallelized nested loop computing pairwise complete MI between all columns
t <- foreach (j = 1:ncol(df), .combine = "c") %:%
  foreach (i = j:ncol(df), .combine = "c", .packages = "infotheo") %dopar% {
    compl_cases <- complete.cases(df[, c(i, j)])
    if (sum(compl_cases) > 0) {
      natstobits(mutinformation(df[compl_cases, ][, j], df[compl_cases, ][, i]))
    } else {
      0
    }
  }
# set up empty matrix for MI values
RCA_MI_Matrix <- matrix(0, ncol = ncol(df), nrow = ncol(df),
                        dimnames = list(colnames(df), colnames(df)))
# fill lower triangle with MI values from the nested loop
RCA_MI_Matrix[lower.tri(RCA_MI_Matrix, diag = TRUE)] <- t
# mirror lower triangle of the matrix into the upper one
RCA_MI_Matrix[upper.tri(RCA_MI_Matrix)] <- t(RCA_MI_Matrix)[upper.tri(RCA_MI_Matrix)]
end_foreach <- Sys.time()
end_foreach - start_foreach
stopCluster(cl = clst) # stop cluster
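As a sanity check, the parallel result can be compared with the serial one; a sketch reusing the objects above (note the serial call is slow at n = 500):
all(as.matrix(ColPairMap(df)) == RCA_MI_Matrix) ## should be TRUE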

Submitting a package to CRAN: set.seed inside a function is not accepted, but I don't know how to avoid it

I am submitting a package to CRAN that identifies breaks in a time series; for this, some functions of the package run Monte Carlo simulations. In order to guarantee the same result for the same input, I set a seed inside the functions that perform the Monte Carlo simulations. The CRAN moderator told me: "Please do not set a seed to a specific number within a function."
The problem is how to achieve the same result with the same input if no seed is set. Here is an example to illustrate the problem: function2 sets a seed internally, so its result is always the same (compare max2 and max4), whereas function1 does the same computation without setting a seed, and its result varies.
x <- c(1:100)
#Function without set.seed
function1 <- function(x, simulations = 100){
  mn <- mean(x)
  sd <- sd(x)
  max_vect <- vector(mode = 'double', length = simulations)
  for(i in 1:simulations){
    x_aux <- rnorm(n = length(x), mean = mn, sd = sd)
    max_vect[i] <- max(x_aux)
  }
  return(mean(max_vect))
}
#Function that calls set.seed
function2 <- function(x, simulations = 100){
  mn <- mean(x)
  sd <- sd(x)
  max_vect <- vector(mode = 'double', length = simulations)
  set.seed(1234)
  for(i in 1:simulations){
    x_aux <- rnorm(n = length(x), mean = mn, sd = sd)
    max_vect[i] <- max(x_aux)
  }
  return(mean(max_vect))
}
max1 <- function1(x)
max2 <- function2(x)
max3 <- function1(x)
max4 <- function2(x)
Agree with the comments. Do this:
myFunction <- function(x, y, z, seed = NULL) {
  if (length(seed)) set.seed(seed)
  # the function guts
}
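For example, applied to function2 from the question, a sketch of the same pattern:
function2 <- function(x, simulations = 100, seed = NULL){
  if (length(seed)) set.seed(seed) # only seed when the caller asks for it
  mn <- mean(x)
  s <- sd(x)
  max_vect <- vector(mode = 'double', length = simulations)
  for(i in 1:simulations){
    x_aux <- rnorm(n = length(x), mean = mn, sd = s)
    max_vect[i] <- max(x_aux)
  }
  mean(max_vect)
}
max2 <- function2(x, seed = 1234) # reproducible
max4 <- function2(x, seed = 1234) # identical to max2
max1 <- function2(x)              # seed untouched, result varies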

Speeding up linear model fitting on complete pairwise observations in large sparse matrix in R

I have a numeric data.frame df with 134946 rows x 1938 columns.
99.82% of the data are NA.
For each pair of (distinct) columns "P1" and "P2", I need to find which rows have non-NA values for both and then do some operations on those rows (linear model).
I wrote a script that does this, but it seems quite slow.
This post seems to discuss a related task, but I can't immediately see if or how it can be adapted to my case.
Borrowing the example from that post:
set.seed(54321)
nr = 1000
nc = 900
dat = matrix(runif(nr*nc), nrow = nr)
rownames(dat) = paste(1:nr)
colnames(dat) = paste("time", 1:nc)
dat[sample(nr*nc, nr*nc*0.9)] = NA
df <- as.data.frame(dat)
df_ps <- names(df)
N_ps <- length(df_ps)
My script is:
tic = proc.time()
out <- do.call(rbind, sapply(1:(N_ps-1), function(i) {
  if (i/10 == floor(i/10)) {
    cat("\ni = ", i, "\n")
    toc = proc.time()
    show(toc - tic)
  }
  do.call(rbind, sapply((i+1):N_ps, function(j) {
    w <- which(complete.cases(df[,i], df[,j]))
    N <- length(w)
    if (N >= 5) {
      xw <- df[w,i]
      yw <- df[w,j]
      if ((diff(range(xw)) != 0) & (diff(range(yw)) != 0)) {
        s <- summary(lm(yw ~ xw))
        o <- c(i, j, N, s$adj.r.squared,
               s$coefficients[2], s$coefficients[4], s$coefficients[8],
               s$coefficients[1], s$coefficients[3], s$coefficients[7])
      } else {
        o <- c(i, j, N, rep(NA,7))
      }
    } else {o <- NULL}
    return(o)
  }, simplify = F))
}, simplify = F))
toc = proc.time()
show(toc - tic)
This takes about 10 minutes on my machine.
You can imagine what happens when I need to handle a much larger (although more sparse) data matrix. I never managed to finish the calculation.
Question: do you think this could be done more efficiently?
The thing is, I don't know which operations take the most time (subsetting of df, in which case I would remove duplicated subsetting? appending matrix data, in which case I would fill a flat vector and convert it to a matrix at the end? ...).
Thanks!
EDIT following up from minem's post
As shown by minem, the speed of this calculation depends strongly on how the linear regression parameters are calculated, so changing that part was the single most important thing to do.
My own further trials showed that: 1) it was essential to use sapply in combination with do.call(rbind, ...), rather than any flat vector, to store the data (I am still not sure why; I might make a separate post about this); 2) on the original matrix I am working on, which is much sparser and has a much larger nrow/ncol ratio than the one in this example, using the information on the x vector available at the start of each i iteration to reduce the y vector at the start of each j iteration increased the speed by several orders of magnitude, even compared with minem's original script, which was already much better than mine above.
I suppose the advantage comes from filtering out many rows a priori, thus avoiding costly xna & yna operations on very long vectors.
The modified script is the following:
set.seed(54321)
nr = 1000
nc = 900
dat = matrix(runif(nr*nc), nrow = nr)
rownames(dat) = paste(1:nr)
colnames(dat) = paste("time", 1:nc)
dat[sample(nr*nc, nr*nc*0.90)] = NA
df <- as.data.frame(dat)
df_ps <- names(df)
N_ps <- length(df_ps)
tic = proc.time()
naIds <- lapply(df, function(x) !is.na(x))
dl <- as.list(df)
rl <- sapply(1:(N_ps - 1), function(i) {
  if ((i-1)/10 == floor((i-1)/10)) {
    cat("\ni = ", i, "\n")
    toc = proc.time()
    show(toc - tic)
  }
  x <- dl[[i]]
  xna <- which(naIds[[i]])
  rl2 <- sapply((i + 1):N_ps, function(j) {
    y <- dl[[j]][xna]
    yna <- which(naIds[[j]][xna])
    w <- xna[yna]
    N <- length(w)
    if (N >= 5) {
      xw <- x[w]
      yw <- y[yna]
      if ((min(xw) != max(xw)) && (min(yw) != max(yw))) {
        # extracts from lm/lm.fit/summary.lm functions
        X <- cbind(1L, xw)
        m <- .lm.fit(X, yw)
        # calculate adj.r.squared
        fitted <- yw - m$residuals
        rss <- sum(m$residuals^2)
        mss <- sum((fitted - mean(fitted))^2)
        n <- length(m$residuals)
        rdf <- n - m$rank
        # rdf <- df.residual
        r.squared <- mss/(mss + rss)
        adj.r.squared <- 1 - (1 - r.squared) * ((n - 1L)/rdf)
        # calculate se & pvals
        p1 <- 1L:m$rank
        Qr <- m$qr
        R <- chol2inv(Qr[p1, p1, drop = FALSE])
        resvar <- rss/rdf
        se <- sqrt(diag(R) * resvar)
        est <- m$coefficients[m$pivot[p1]]
        tval <- est/se
        pvals <- 2 * pt(abs(tval), rdf, lower.tail = FALSE)
        res <- c(m$coefficients[2], se[2], pvals[2],
                 m$coefficients[1], se[1], pvals[1])
        o <- c(i, j, N, adj.r.squared, res)
      } else {
        o <- c(i, j, N, rep(NA,7))
      }
    } else {o <- NULL}
    return(o)
  }, simplify = F)
  do.call(rbind, rl2)
}, simplify = F)
out2 <- do.call(rbind, rl)
toc = proc.time()
show(toc - tic)
E.g. try with nr=100000; nc=100.
I should probably mention that I tried using indices, i.e.:
naIds <- lapply(df, function(x) which(!is.na(x)))
and then obviously generating w by intersection:
w <- intersect(xna,yna)
N <- length(w)
This however is slower than the above.
The largest bottleneck is the lm function, because it performs a lot of checks and additional calculations that you do not necessarily need. So I extracted only the needed parts.
I got this to run in +/- 18 seconds.
set.seed(54321)
nr = 1000
nc = 900
dat = matrix(runif(nr*nc), nrow = nr)
rownames(dat) = paste(1:nr)
colnames(dat) = paste("time", 1:nc)
dat[sample(nr*nc, nr*nc*0.9)] = NA
df <- as.data.frame(dat)
df_ps <- names(df)
N_ps <- length(df_ps)
tic = proc.time()
naIds <- lapply(df, function(x) !is.na(x)) # outside loop
dl <- as.list(df) # sub-setting list elements is faster than columns
rl <- sapply(1:(N_ps - 1), function(i) {
  x <- dl[[i]]
  xna <- naIds[[i]] # relevant logical vector if not empty elements
  rl2 <- sapply((i + 1):N_ps, function(j) {
    y <- dl[[j]]
    yna <- naIds[[j]]
    w <- xna & yna
    N <- sum(w)
    if (N >= 5) {
      xw <- x[w]
      yw <- y[w]
      if ((min(xw) != max(xw)) && (min(yw) != max(yw))) { # faster than diff(range())
        # extracts from lm/lm.fit/summary.lm functions
        X <- cbind(1L, xw)
        m <- .lm.fit(X, yw)
        # calculate adj.r.squared
        fitted <- yw - m$residuals
        rss <- sum(m$residuals^2)
        mss <- sum((fitted - mean(fitted))^2)
        n <- length(m$residuals)
        rdf <- n - m$rank
        # rdf <- df.residual
        r.squared <- mss/(mss + rss)
        adj.r.squared <- 1 - (1 - r.squared) * ((n - 1L)/rdf)
        # calculate se & pvals
        p1 <- 1L:m$rank
        Qr <- m$qr
        R <- chol2inv(Qr[p1, p1, drop = FALSE])
        resvar <- rss/rdf
        se <- sqrt(diag(R) * resvar)
        est <- m$coefficients[m$pivot[p1]]
        tval <- est/se
        pvals <- 2 * pt(abs(tval), rdf, lower.tail = FALSE)
        res <- c(m$coefficients[2], se[2], pvals[2],
                 m$coefficients[1], se[1], pvals[1])
        o <- c(i, j, N, adj.r.squared, res)
      } else {
        o <- c(i, j, N, rep(NA, 7)) # 10 columns, matching the success branch
      }
    } else {o <- NULL}
    return(o)
  }, simplify = F)
  do.call(rbind, rl2)
}, simplify = F)
out2 <- do.call(rbind, rl)
toc = proc.time()
show(toc - tic)
#   user  system elapsed
#  17.94    0.11   18.44
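As a quick sanity check, the extracted statistics can be compared with a full lm()/summary() fit; a sketch reusing the objects above, assuming the first column pair has at least 5 complete cases so that it occupies row 1 of out2:
w <- naIds[[1]] & naIds[[2]]
s <- summary(lm(dl[[2]][w] ~ dl[[1]][w]))
all.equal(unname(out2[1, 4]), s$adj.r.squared) # row 1 of out2 is the pair (1, 2); should be TRUE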

Manual simulation of Markov Chain in R (3)

I have tried to improve my previous code so that I can incorporate conditional probability.
Source Code
states <- c(1, 2)
alpha <- c(1, 1)/2
mat <- matrix(c(0.5, 0.5,
                0,   1), nrow = 2, ncol = 2, byrow = TRUE)
# this function calculates the next state, given the present state.
# X = present state
# pMat = probability matrix
nextX <- function(X, pMat)
{
  #set.seed(1)
  probVec <- vector() # initialize vector
  if(X == states[1]) # if the present state is 1
  {
    probVec <- pMat[1,] # take the 1st row
  }
  if(X == states[2]) # if the present state is 2
  {
    probVec <- pMat[2,] # take the 2nd row
  }
  return(sample(states, 1, replace=TRUE, prob=probVec)) # draw the next state
}
# this function simulates n1 steps (returning n1+1 states)
steps <- function(alpha1, mat1, n1)
{
  vec <- vector(mode="numeric", length = n1+1) # initialize an empty vector
  X <- sample(states, 1, replace=TRUE, prob=alpha1) # initial state
  vec[1] <- X
  for (i in 2:(n1+1))
  {
    X <- nextX(X, mat1)
    vec[i] <- X
  }
  return(vec)
}
# this function repeats the simulation n1 times.
# steps(alpha1=alpha, mat1=mat, n1=5)
simulate <- function(alpha1, mat1, n1)
{
  mattt <- matrix(nrow=n1, ncol=6, byrow=T) # 6 columns: initial state + 5 steps
  for (i in 1:(n1))
  {
    temp <- steps(alpha1, mat1, 5)
    mattt[i,] <- temp
  }
  return(mattt)
}
Execution
I created this function so that it can handle any conditional probability:
prob <- function(simMat, fromStep, toStep, fromState, toState)
{
  mean(simMat[toStep+1, simMat[fromStep+1, ]==fromState]==toState)
}
sim <- simulate(alpha, mat, 10)
p <- prob(sim, 0, 1, 1, 1) # P(X1=1 | X0=1)
p
Output
NaN
Why is this source code giving NaN?
How can I correct it?
I didn't inspect the rest of your code, but it seems that only prob has a mistake: you are mixing up rows with columns; instead it should be
prob <- function(simMat, fromStep, toStep, fromState, toState)
  mean(simMat[simMat[, fromStep + 1] == fromState, toStep + 1] == toState)
Then NaN still remains a valid possibility for the following reason. We are looking at a conditional probability P(X1=1|X0=1) which, by definition, is well defined only when P(X0=1)>0. The same holds with sample estimates: if there are no cases where X0=1, then the "denominator" in the mean inside of prob is zero. Thus, it cannot and should not be fixed (i.e., returning 0 in those cases would be wrong).
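A minimal sketch of that diagnosis, reusing sim from the question (rows are simulated paths, column 1 holds X0):
sum(sim[, 1] == 1) # paths starting in state 1; if 0, prob() returns NaN
sim <- simulate(alpha, mat, 1000) # more paths make an empty conditioning event unlikely
prob(sim, 0, 1, 1, 1) # should now be close to mat[1, 1] = 0.5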

How can I save randomly generated train and test datasets?

I am using a for loop to generate 100 different train and test sets.
What I want to do now is save these 100 different train and test sets so that I can look at, e.g., iteration 17.
This code shows my program with the for loop and the division into train and test set:
result_df <- matrix(ncol = 3, nrow = 100)
colnames(result_df) <- c("Acc", "Sens", "Spec")
for (g in 1:100)
{
  # Divide into Train and test set
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]
  # REST OF MY CODE
  # Calculate some statistics
  overall <- cm$overall
  overall.accuracy <- format(overall['Accuracy'] * 100, nsmall = 2, digits = 2)
  overall.sensitivity <- format(cm$byClass['Sensitivity'] * 100, nsmall = 2, digits = 2)
  overall.specificity <- format(cm$byClass['Specificity'] * 100, nsmall = 2, digits = 2)
  result_df[g,1] <- overall.accuracy
  result_df[g,2] <- overall.sensitivity
  result_df[g,3] <- overall.specificity
}
How can I do this?
You could do the following, for example, saving each train and test set as an element in a list:
result_df <- matrix(ncol = 3, nrow = 100)
colnames(result_df) <- c("Acc", "Sens", "Spec")
testlist <- list()
trainlist <- list()
for (g in 1:100)
{
  # Divide into Train and test set
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]
  trainlist[[g]] <- train
  testlist[[g]] <- test
}
EDIT
To retrieve the 7th element of these lists you could use trainlist[[7]]
You can save those to CSV files using the following method (formatting the timestamp without spaces or colons keeps the file names portable):
write.csv(train, file = paste0("train-", format(Sys.time(), "%Y-%m-%d-%H%M%S"), ".csv"))
write.csv(test, file = paste0("test-", format(Sys.time(), "%Y-%m-%d-%H%M%S"), ".csv"))
One option could be to save the row indexes of your partitions, rather than saving all the datasets, and then select the row indexes for the iteration you're interested in.
The caret package has a function called createDataPartition, which will do this for you:
library(caret)
df <- data.frame(col1 = rnorm(100), col2 = rnorm(100))
# create 100 partitions
train.idxs <- createDataPartition(1:nrow(df), times = 100, p = 0.8)
for(i in 1:length(train.idxs)) {
  # create train and test sets
  idx <- train.idxs[[i]]
  train.df <- df[idx, ]
  test.df <- df[-idx, ]
  # calculate statistics ...
  result_df[i,1] <- overall.accuracy
  result_df[i,2] <- overall.sensitivity
  result_df[i,3] <- overall.specificity
}
# check the datasets for the nth partition
# train set
df[train.idxs[[n]], ]
# test set
df[-train.idxs[[n]], ]
Put your code in a function and use lapply():
result_df <- matrix(ncol = 3, nrow = 100)
colnames(result_df) <- c("Acc", "Sens", "Spec")
SIMg <- function(g) {
  # Divide into Train and test set
  smp_size <- floor(0.8 * nrow(mydata1))
  train_ind <- sample(seq_len(nrow(mydata1)), size = smp_size)
  train <- mydata1[train_ind, ]
  test <- mydata1[-train_ind, ]
  # REST OF THE CODE
  return(list(train = train, test = test, ...))
}
L <- lapply(1:100, SIMg)
The resulting list L has 100 elements, each element a list containing the two dataframes and your results for one simulation run.
To get separate lists trainlist and testlist you can do:
trainlist <- lapply(L, '[[', "train")
testlist <- lapply(L, '[[', "test")
