I have been trying to use an R function called ipsi, which takes the arguments (a, y, id, time, x.trt, x.out, delta.seq, nsplits). Originally, the components of the arguments were in one dataframe (except for delta.seq and nsplits, which are coded later), but my understanding is that I needed to put them in separate lists and, in the case of x.trt and x.out, matrices. This function is very easy to run on one of each argument, but since I multiply imputed the dataframe 30 times before splitting it up into the elements that ipsi takes as arguments, I now want to iterate over the set of elements 30 times, as if there were 30 dataframes. Additionally, I want to parallelize the calls to make full use of my computing power.
I have just expanded the npcausal example:
n <- 500
T <- 4
n_imp <- 30  # number of imputed datasets

# each argument becomes a list holding 30 copies of the example data
time <- rep(list(rep(1:T, n)), n_imp)
id <- rep(list(rep(1:n, rep(T, n))), n_imp)
x.trt <- rep(list(matrix(rnorm(n * T * 5), nrow = n * T)), n_imp)
x.out <- rep(list(matrix(rnorm(n * T * 5), nrow = n * T)), n_imp)
a <- rep(list(rbinom(n * T, 1, .5)), n_imp)
y <- rep(list(rnorm(mean = 1, n)), n_imp)
d.seq <- rep(list(seq(0.1, 5, length.out = 10)), n_imp)
set.seed(500, kind = "L'Ecuyer-CMRG")
numcores <- future::availableCores()
cl <- parallel::makeCluster(numcores)
parallel::clusterEvalQ(cl, library(dplyr))
parallel::clusterEvalQ(cl, library(npcausal))
parallel::clusterExport(cl, "d.seq", envir = environment())
parallel::clusterEvalQ(cl, d.seq <- d.seq)
new_element <- parallel::parLapply(cl = cl, for(i in 1:30){
npcausal::ipsi(a = a[[i]],
y = y[[i]],
id = id[[i]],
time = time[[i]],
x.out = x.out[[i]],
x.trt = x.trt[[i]],
delta.seq = d.seq[[i]],
nsplits = 10)
})
This actually runs, but at the end of the process it gives me an error saying that the FUN was missing. I knew that already, but I have no FUN to call besides ipsi. Thanks for any help you can provide.
My suggestion is to first figure out how to do it with a regular base-R *apply function, without worrying about parallelization. I suspect you can use mapply() for this, so something like this (untested):
res <- mapply(
  a, y, id, time, x.out, x.trt, d.seq,
  FUN = function(a_i, y_i, id_i, time_i, x.out_i, x.trt_i, d.seq_i) {
    npcausal::ipsi(a = a_i, y = y_i, id = id_i, time = time_i,
                   x.out = x.out_i, x.trt = x.trt_i, delta.seq = d.seq_i,
                   nsplits = 10)
  },
  SIMPLIFY = FALSE  # keep one ipsi result per list element
)
Once you have figured that part out, you can start thinking about parallelization.
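To try the pattern in isolation, here is a minimal sketch that swaps ipsi for a cheap stand-in. fit_one is a hypothetical placeholder (not part of npcausal), and the three small inputs are made up. It also hints at why the parLapply() call complained: parLapply(cl, X, fun) expects a list X and a function fun, so a for loop passed in the X position is simply evaluated (its value is NULL) and fun is left missing.

```r
# fit_one() is a hypothetical stand-in for npcausal::ipsi(): it takes one
# element from each list and returns a small list of summaries
fit_one <- function(a_i, y_i, delta_i) {
  list(mean_a = mean(a_i), mean_y = mean(y_i), n_delta = length(delta_i))
}

set.seed(1)
n_imp <- 3  # pretend there are 3 imputed datasets
a <- replicate(n_imp, rbinom(20, 1, 0.5), simplify = FALSE)
y <- replicate(n_imp, rnorm(20, mean = 1), simplify = FALSE)
d.seq <- rep(list(seq(0.1, 5, length.out = 10)), n_imp)

# mapply() walks the lists in lockstep: call i receives
# a[[i]], y[[i]] and d.seq[[i]]
res <- mapply(a, y, d.seq, FUN = fit_one, SIMPLIFY = FALSE)

length(res)       # one result per imputed dataset
res[[2]]$n_delta  # a field of the second result
```

With SIMPLIFY = FALSE the result stays a plain list, so res[[i]] is whatever the function returned for dataset i.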
(Disclaimer: I'm the author) If you get an mapply() solution to work, the simplest option would be to replace it as-is with future_mapply() from the future.apply package. That will parallelize on your local machine if you set plan(multisession).
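As a rough sketch of that swap, assuming the future and future.apply packages are installed: slow_fit is a hypothetical stand-in for ipsi, the inputs are made up, and the worker count of 2 is arbitrary. The requireNamespace() guard is only there so the snippet still runs where the packages are missing. future.seed = TRUE asks for parallel-safe random numbers, which matters if the real function draws random numbers.

```r
# slow_fit() is a hypothetical stand-in for the real per-dataset call
slow_fit <- function(a_i, y_i) {
  mean(y_i[a_i == 1])
}

a <- list(c(1, 0, 1, 1), c(0, 1, 1, 0), c(1, 1, 0, 0))
y <- list(c(2, 4, 6, 8), c(1, 3, 5, 7), c(10, 20, 30, 40))

# sequential version with base R
res_seq <- mapply(a, y, FUN = slow_fit, SIMPLIFY = FALSE)

# parallel version: same call shape, different function
res_par <- NULL
if (requireNamespace("future.apply", quietly = TRUE)) {
  old_plan <- future::plan(future::multisession, workers = 2)
  res_par <- future.apply::future_mapply(
    a, y, FUN = slow_fit, SIMPLIFY = FALSE, future.seed = TRUE
  )
  future::plan(old_plan)  # restore whatever plan was active before
}
```

Because the call shape is the same, the sequential mapply() version can be debugged first and switched over afterwards.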
I know I can use expand.grid for this, but I am trying to learn actual programming. My goal is to take what I have below and use a recursion to get all 2^n binary sequences of length n.
I can do this for n = 1, but I don't understand how I would use the same function in a recursive way to get the answer for higher dimensions.
Here is for n = 1:
binseq <- function(n){
binmat <- matrix(nrow = 2^n, ncol = n)
r <- 0 #row counter
for (i in 0:1) {
r <- r + 1
binmat[r,] <- i
}
return(binmat)
}
I know I probably have to use cbind in the return statement. My intuition says the return statement should be something like cbind(binseq(n-1), binseq(n)). But, honestly, I'm completely lost at this point.
The desired output should basically recursively produce this for n = 3:
binmat <- matrix(nrow = 8, ncol = 3)
r <- 0 # current row of binmat
for (i in 0:1) {
for (j in 0:1) {
for (k in 0:1) {
r <- r + 1
binmat[r,] <- c(i, j, k)}
}
}
binmat
The result should just be a matrix like binmat, filled recursively.
I quickly wrote this function to generate all N^K permutations of length K for given N characters. Hope it will be useful.
gen_perm <- function(str=c(""), lst=5, levels = c("0", "1", "2")){
if (nchar(str) == lst){
cat(str, "\n")
return(invisible(NULL))
}
for (i in levels){
gen_perm(str = paste0(str,i), lst=lst, levels=levels)
}
}
# sample call
gen_perm(lst = 3, levels = c("x", "T", "a"))
I will return to your problem when I get more time.
UPDATE
I modified the code above to work for your problem. Note that the matrix being populated lives in the global environment. The function also uses the tmp variable to pass rows to the global environment. This was the easiest way for me to solve the problem. Perhaps, there are other ways.
levels <- c(0,1)
nc <- 3
m <- matrix(numeric(0), ncol = nc)
gen_perm <- function(row=numeric(), lst=nc, levels = c(0, 1)){ # a default of levels = levels would refer to itself and fail
if (length(row) == lst){
assign("tmp", row, .GlobalEnv)
with(.GlobalEnv, {m <- rbind(m, tmp); rownames(m) <- NULL})
return(invisible(NULL))
}
for (i in levels){
gen_perm(row=c(row,i), lst=lst, levels=levels)
}
}
gen_perm(lst=nc, levels=levels)
UPDATE 2
To get the expected output you provided, run
m <- matrix(numeric(0), ncol = 3)
gen_perm(lst = 3, levels = c(0,1))
m
Here levels specifies the set of values to draw from (binary in our case), m is an empty matrix to fill, gen_perm generates the rows and appends them to m, and lst is the length of each permutation (it must match the number of columns of m).
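For comparison, here is a sketch of a version that returns the matrix directly instead of filling one in the global environment. It is not the code from the answer above. The idea is that all sequences of length n are the sequences of length n - 1 with a 0 in front, stacked on top of the same sequences with a 1 in front, which is close to the cbind intuition in the question.

```r
binseq <- function(n) {
  stopifnot(n >= 1)
  # base case: the two sequences of length 1
  if (n == 1) {
    return(matrix(0:1, ncol = 1))
  }
  # recursive case: prefix every shorter sequence with 0, then with 1
  prev <- binseq(n - 1)
  rbind(cbind(0L, prev), cbind(1L, prev))
}

binseq(3)
```

The rows come out in the same order as the nested for loops in the question, because the first column changes slowest.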
(Very) amateur coder and statistician working on a problem in R.
I have four integer lists: A, B, C, D.
A <- 1:133
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)
I want R to generate all of the permutations from picking 1 item from each of these lists (I know this code will take forever to run), and then take the mean of each of those permutations. So, for instance, [1, 100, 200, 400] -> 175.25.
Ideally, I would end up with a list of all of these means.
Any ideas?
Here's how I'd do this for a smaller but similar problem:
A <- 1:13
B <- 1:26
C <- 1:26
D <- c(1:13, 27:40)
mymat <- expand.grid(A, B, C, D)
names(mymat) <- c("A", "B", "C", "D")
mymat <- as.matrix(mymat)
mymeans <- rowSums(mymat)/4
You'll probably crash R if you just up all the indices, but you could probably set up a loop, something like this (not tested):
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)
for(A in 1:133) {
mymat <- expand.grid(A, B, C, D)
names(mymat) <- c("A", "B", "C", "D")
mymat <- as.matrix(mymat)
mymeans <- rowSums(mymat)/4
write.table(mymat, file = paste("matrix", A, "txt", sep = "."))
write.table(mymeans, file = paste("means", A, "txt", sep = "."))
rm(mymat, mymeans)
}
to get them all. That still might be too big, in which case you could use a nested loop, or loop over D instead (since it's the biggest).
Alternatively,
n <- 1e7
A <- sample(133, size = n, replace= TRUE)
B <- sample(266, size = n, replace= TRUE)
C <- sample(266, size = n, replace= TRUE)
D <- sample(x = c(1:133, 267:400), size = n, replace= TRUE)
mymeans <- (A+B+C+D)/4
will give you a large sample of the means and take no time at all.
hist(mymeans)
Even creating a vector of means as large as your set of permutations will use up all of your memory. You will have to split this into smaller problems; look up how to write objects to a file (for example, for Excel) and how to remove objects from memory (both are covered on SO).
As for the code to do this, I've tried to keep it as simple as possible so that it's easy to 'grow' your knowledge:
#this is how to create vectors of sequential integers in R
a <- c(1:33)
b <- c(1:33)
c <- c(1:33)
d <- c(1:33,267:300)
#this is how to create an empty vector
means <- rep(NA,length(a)*length(b)*length(c)*length(d))
#set up for a loop
i <- 1
#how you run a loop to perform this operation
for(j in 1:length(a)){
for(k in 1:length(b)){
for(l in 1:length(c)){
for(m in 1:length(d)){
y <- c(a[j],b[k],c[l],d[m])
means[i] <- mean(y)
i <- i+1
}
}
}
}
#and to graph your output
hist(means, col='brown')
#lets put a mean line through the histogram
abline(v=mean(means), col='white', lwd=2)
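If only the distribution of the means is needed, rather than one entry per combination, another option is to count how many combinations produce each possible sum and divide the sums by 4 afterwards. This is a sketch of a different technique from the answers above, not a drop-in replacement. count_sums is a hypothetical helper name, and it assumes every value is a positive integer.

```r
# count_sums() returns a vector where element s + 1 is the number of ways
# to reach the sum s by picking one value from each set
count_sums <- function(...) {
  sets <- list(...)
  counts <- 1  # before picking anything: the sum 0, reached in one way
  for (s in sets) {
    new_counts <- numeric(length(counts) + max(s))
    for (v in s) {
      # picking v shifts every reachable sum up by v
      idx <- seq_along(counts) + v
      new_counts[idx] <- new_counts[idx] + counts
    }
    counts <- new_counts
  }
  counts
}

A <- 1:133
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)

counts <- count_sums(A, B, C, D)
sums <- seq_along(counts) - 1  # the sum that each position stands for
keep <- counts > 0
means <- sums[keep] / 4        # every distinct mean
freq <- counts[keep]           # how many combinations give that mean

sum(freq)                      # total number of combinations
sum(means * freq) / sum(freq)  # overall mean of all the means
```

The means and freq vectors only have about a thousand entries each, so something like plot(means, freq, type = "h") shows the same shape as the histogram without holding every combination in memory.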