splitting a list and writing multiple files with R - plyr?

I'm struggling with how to write multiple files, one for each row of the input matrix, after some calculations. The code that I'm using now looks like this:
akl <- function(dii) {
ddi <- as.matrix(dii)
m <- rowMeans(ddi)
M <- mean(m) # mean(ddi) == mean(m)
r <- sweep(ddi, 1, m)
b <- sweep(r, 2, m)
return(b + M)
}
require(plyr)
akl.list <- llply(1:nrow(aa), function(i) {
akl(dist(aa[i, ]))
})
The akl.list that I create becomes too large for a large input matrix, and I cannot hold it in RAM. My idea is to write each matrix obtained in the llply loop to its own file. Is there an easy way to do that?
Thank you!
gibbi

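You can use l_ply, since you only want the looping and not the collected result:
l_ply(seq_len(nrow(aa)), function(i) {
a <- akl(dist(aa[i, ]))
write.table(a, file = paste0("akl_", i, ".txt")) ## save each matrix to its own file (the file name is only an example)
}, .progress = 'text' ## to show progress (optional)
)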
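A plain base R loop is another way to keep memory use flat, since each result is written out and then discarded. The following is only a sketch under assumptions: the toy matrix aa, the output directory out_dir, and the file naming scheme are made up for illustration, and saveRDS is swapped in for write.table because it round-trips the matrix exactly.

```r
# akl() as defined in the question.
akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  M <- mean(m)
  r <- sweep(ddi, 1, m)
  b <- sweep(r, 2, m)
  b + M
}

# Toy input standing in for `aa` (assumption: a numeric matrix).
set.seed(42)
aa <- matrix(rnorm(20), nrow = 4)

# Hypothetical output directory inside the session's temp dir.
out_dir <- file.path(tempdir(), "akl_out")
dir.create(out_dir, showWarnings = FALSE)

for (i in seq_len(nrow(aa))) {
  a <- akl(dist(aa[i, ]))
  # One binary file per row; nothing accumulates in memory.
  saveRDS(a, file.path(out_dir, sprintf("akl_%03d.rds", i)))
}

# Read a single result back only when it is needed.
first <- readRDS(file.path(out_dir, "akl_001.rds"))
```

Zero-padding the index with sprintf keeps the files in numeric order when they are listed later.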

Related

Putting user-defined functions on a list in a for loop

I have problems storing user-defined functions in an R list when they are added in a for loop.
I have to define some segment-specific functions based on some parameters, so I create the functions and put them in a list while looping through the segments with a for loop. The problem is that I get the same function at every position of the resulting list.
The code looks like this:
n <- 100
segmenty <- 1:n
segment_functions <- list()
for (i in segmenty){
segment_functions[[i]] <- function(){return(i)}
}
When I run the code, what I get is the same function (the last one created in the loop) for all indexes:
## for all k
segment_functions[[k]]()
[1] 100
There is no problem when I put the functions on list manually e.g.
segment_functions[[1]] <- function(){return(1)}
segment_functions[[2]] <- function(){return(2)}
segment_functions[[3]] <- function(){return(3)}
works just fine.
I honestly have no idea what's wrong. Could you help?

You need to use the force function to ensure that the evaluation of i is done during the assignment into the list:
n <- 100
segmenty <- 1:n
segment_functions <- list()
f <- function(i) { force(i); function() return(i) }
for (i in segmenty){
segment_functions[[i]] <- f(i)
}

I'd use lapply and capture i in a closure of the wrapper:
segment_functions <- lapply(1:100, function(i) function() i)
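To see why the loop version misbehaves, it may help to compare it with a variant that gives every function its own environment. This is a small illustrative sketch, not part of the original answers; the names fs_loop, fs_local and k are made up. Functions created directly in the loop all look up the same i when they are finally called, whereas local() copies the current value into a fresh environment.

```r
# Functions created directly in a for loop all share one variable `i`.
fs_loop <- list()
for (i in 1:3) {
  fs_loop[[i]] <- function() i
}
# By the time the functions are called, the loop has finished and i is 3.
loop_values <- vapply(fs_loop, function(f) f(), integer(1))

# local() evaluates its body in a new environment for each iteration.
fs_local <- list()
for (i in 1:3) {
  fs_local[[i]] <- local({
    k <- i          # copy the current value into the new environment
    function() k    # the closure remembers this environment, not the loop's
  })
}
local_values <- vapply(fs_local, function(f) f(), integer(1))
```

Here loop_values ends up as three copies of the last index, while local_values keeps 1, 2, 3.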

Sum of a list in r with Rmpfr package

I would like to calculate the sum of a list in which every element is displayed with the text 'mpfr1'.
I have the following code:
library(Rmpfr)
###### Central generalized Cofactorial #####
CgenC <- function(n,k,sigma){
i <- 0:k
B <- (-1)^i *choose(k,i)*pochMpfr((-i)*sigma, n)
CgenC <- sum(B)*1/(factorial(k))
return(CgenC)
}
#helpfunction
hfun <- function(d,n,k,sigma,gamma){
G <- choose(n,d)*CgenC(d,k,sigma)*pochMpfr(-(gamma), n-d)
return(G)
}
NCgenC <- function(n,k,sigma,gamma){
s <- k:n
E <- sapply(s,hfun,n=n,k=k,sigma=sigma,gamma=gamma)
NCgenC <- sum(E)
return(NCgenC)
}
There is probably a better way than using the helper function, but I am new to programming.
After that, E looks like:
[[1]]
'mpfr1' 58841424144769590802398501576045837205738093472425577422395207203116722951724046.536490917270720694092667549937401176716842829058677975377719677884520463728524151109618958095938032339354615748413119902638793141568563928308460798662023212923799608873996653084247997838235897625734493429191314427738979926169378387781579463813022200592226827918068083534511231048810054263460272712110560165030802860741581172618182744896043856500473312651443547811890843652043285582186302263502988917786388502716155013416710062301717698574066162109375
[[2]]
'mpfr1' 82871275202593779087122604776157996409481088998845801228666944089795992655102472.642373430793692374125929134125303485357643757531745742076715050078321471532137607446428722005372960997898005609266610074855955157895396722193218695989672456706464857213288691436880384233954098045336023019010212578667162712025383678754362300085837501918162183612037828788745867290227442574306429529728922577115900369140687795454923720717749187763638945864480587157831212835424687567575261647966653471636794832585905467382683540344896738333277759879099376438826851654084748588502407073974609375
...etc.
Thus, R cannot compute the sum, since E is a list and every element is shown as 'mpfr1'.
I hope that I have been clear. Can somebody tell me how to calculate the sum?

Use the Rmpfr::mpfr2array function to turn your list of mpfr numbers into an array of mpfr numbers, then use the native R sum function.
library(Rmpfr)
vect = rep(0, 5)
for(i in 1:5){
vect[i] = mpfr(x=10^-(10*i), precBits=100)
}
# My vector has just turned to a list
vect
# Sum of list is an error
sum(vect)
# Turn it to a vector
converted_vect = Rmpfr::mpfr2array(vect, dim = length(vect))
converted_vect
# Now my sum and prod work fine and the precision is not lost
sum(converted_vect)
prod(converted_vect)
The function mpfr2array is not supposed to be called by the user; it is an internal tool of the package. However, it is one way to solve the problem.

R: How to put a variable in a name

I have 50 files to read in R and I created this loop to help me. I would like to know if it is possible to do something like this in R.
How can I write it properly in R?
library(foreign)
for(i in 1:50 ){
tpi <- read.dbf('toto_%i%')
}
Help please.

We can do this using lapply
lst <- lapply(1:50, function(i) read.dbf(paste0("toto_", i)))

You want to use the function paste (here its variant paste0, which adds no separator). As written, your loop will overwrite tpi every time it increments, so you will want to use a list to store the data.
toto = list()
for(i in 1:50)
{
toto[[i]] = read.dbf(paste0("toto_", i)) # [[i]] stores the whole data frame as element i
}
A shortcut using lapply gets the same results:
toto = lapply(1:50, function(x) read.dbf(paste0("toto_", x)))
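The file names themselves can be built in one vectorised call before anything is read. A minimal sketch, assuming the files carry a .dbf extension (the question does not show one) and live in the working directory:

```r
# Build all 50 names at once; the ".dbf" extension is an assumption.
files <- sprintf("toto_%d.dbf", 1:50)

# paste0 is vectorised too and produces the same names.
files_alt <- paste0("toto_", 1:50, ".dbf")

# Keep only the names that actually exist on disk before reading them.
present <- files[file.exists(files)]
```

The vector present can then be handed to lapply with read.dbf, as in the answers above, so that a missing file does not abort the whole run.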

Store results of a for-loop in an object or matrix

I have the following problem:
I use a for loop in R to get specific data from a matrix.
My code is as follows.
for(i in 1:100){
T <- as.Date(as.mondate (STARTLISTING)+i)
DELIST <- (subset(datensatz_Start_End.frame, TIME <= T))[,1]
write.table(DELIST, file = paste("tab", i, ".csv"), sep="," )
print(DELIST)
}
Using print, R prints the data.
Using write.table, R writes the data into separate files.
My aim is to aggregate the results from the for loop in one matrix (one row for each 'i').
Unfortunately, I can't get it to work.
Sorry, I'm a real noob in R.
for(i in 1:100)
{
T <- as.Date(as.mondate (STARTLISTING)+i)
DELIST <- (subset(datensatz_Start_End.frame, TIME <= T))[,1]
assign(paste('b',i,sep=''),DELIST)
}
This delivers 100 objects, which contain my results.
But what I need is one matrix/data frame with 100 columns, or one list.
Any ideas?
Hey!
Since I'm not allowed to edit my own answers, here is my (simple) solution:
DELIST <- vector("list",100)
for(i in 1:100)
{
T <- as.Date(as.mondate (STARTLISTING)+i)
DELIST[[i]] <- as.character((subset(datensatz_Start_End.frame, TIME <= T))[,1])
}
DELIST[[99]] ## it is possible to request the relevant companies for every 'i'
Thx to everyone!
George

If you want a list, you can use lapply instead of a loop:
LL <- lapply(1:100,
function(i) {
T <- as.Date(as.mondate (STARTLISTING)+i)
DELIST <- (subset(datensatz_Start_End.frame, TIME <= T))[,1]
DELIST ## the last value is returned and becomes element i of LL
}
)
After that you can rbind the results together using do.call:
result <- do.call(rbind, LL)
Or, if the elements of LL are data frames and you are confident that they all have the same columns, you can use the more efficient rbindlist from the data.table package:
result <- rbindlist(LL)

Check out the rbind function. You can start with an empty DELIST.DF and append each row to it inside the loop:
DELIST.DF <- NULL
for(i in 1:100){
T <- as.Date(as.mondate (STARTLISTING)+i)
DELIST <- (subset(datensatz_Start_End.frame, TIME <= T))[,1]
DELIST.DF <- rbind(DELIST.DF, DELIST)
write.table(DELIST, file = paste("tab", i, ".csv"), sep="," )
print(DELIST)
}
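One caveat with rbind is that it recycles shorter vectors (with a warning) when the per-iteration results differ in length. A hedged sketch of an alternative, using made-up toy data in place of the subset() results: collect everything in a preallocated list, then pad to a common length if a matrix with one row per i is really needed.

```r
# Toy stand-in for the loop body: result i has i elements (hypothetical data).
res <- vector("list", 5)
for (i in seq_along(res)) {
  res[[i]] <- letters[seq_len(i)]
}

# Pad every result with NA up to the longest one.
len <- max(lengths(res))
padded <- vapply(
  res,
  function(x) c(x, rep(NA_character_, len - length(x))),
  character(len)
)

# vapply returns one column per result; transpose to get one row per i.
mat <- t(padded)
```

The list res is usually the more natural container here; mat is only worth building when downstream code expects a rectangular object.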

Parallelize and speed up R code to read in many files

I have code that works perfectly for my purpose (it finds the files matching a specific pattern, reads the matrix within each file, and computes something for each pair of files; the final output is a square matrix with one row and one column per file) and looks like this:
m<- 100
output<- matrix(0, m, m)
lista<- list.files(pattern = "q")
listan<- as.matrix(lista)
n <- nrow(listan)
for (i in 1:n) {
AA <- read.table((listan[i,]), header = FALSE)
A<- as.matrix(AA)
dVarX <- sqrt(mean(A * A))
for (j in i:n) {
BB <- read.table ((listan[j,]), header = FALSE)
B<- as.matrix(BB)
V <- sqrt (dVarX * (sqrt(mean(B * B))))
output[i,j] <- (sqrt(mean(A * B))) / V
}
}
My problem is that it takes a lot of time (I have about 5000 matrices, which means 5000x5000 loop iterations).
I would like to parallelize, but I need some help!
Waiting for your kind suggestions!
Thank you in advance!
Gab

The bottleneck is likely reading from disk. Running code in parallel isn't guaranteed to make things faster. In this case, multiple processes attempting to read from the same disk at the same time is likely to be even slower than a single process.
Since your matrices are being written by another R process, you really should save them in R's binary format. You're reading every matrix once and only once, so the only way to make your program faster is to make reading from disk faster.
Here's an example that shows you how much faster it could be:
# make some random data and write it to disk
set.seed(21)
for(i in 0:9) {
m <- matrix(runif(700*700), 700, 700)
f <- paste0("f",i)
write(m, f, 700) # text format
saveRDS(m, paste0(f,".rds")) # binary format
}
# initialize two output objects
m <- 10
o1 <- o2 <- matrix(NA, m, m)
# get list of file names
files <- list.files(pattern="^f[[:digit:]]+$")
n <- length(files)
First, let's run your code using scan, which is already a lot faster than your current solution with read.table.
system.time({
for (i in 1:n) {
A <- scan(files[i],quiet=TRUE)
for (j in i:n) {
B <- scan(files[j],quiet=TRUE)
o1[i,j] <- sqrt(mean(A*B)) / sqrt(sqrt(mean(A*A)) * sqrt(mean(B*B)))
}
}
})
# user system elapsed
# 31.37 0.78 32.58
Now, let's re-run that code using the files saved in R's binary format:
system.time({
for (i in 1:n) {
fA <- paste0(files[i],".rds")
A <- readRDS(fA)
for (j in i:n) {
fB <- paste0(files[j],".rds")
B <- readRDS(fB)
o2[i,j] <- sqrt(mean(A*B)) / sqrt(sqrt(mean(A*A)) * sqrt(mean(B*B)))
}
}
})
# user system elapsed
# 2.42 0.39 2.92
So the binary format is ~10x faster! And the output is the same:
all.equal(o1,o2)
# [1] TRUE
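A further, smaller saving is possible because the inner loop recomputes sqrt(mean(B*B)) for the same file on every pass. The sketch below is an assumption-laden illustration rather than part of the original answer: it uses three small random matrices in a temporary directory and caches each file's norm in a first pass, so the pairwise loop only has to compute the cross term.

```r
# Toy data: three small matrices saved in R's binary format (hypothetical files).
set.seed(1)
mat_dir <- tempfile("mats")
dir.create(mat_dir)
files <- file.path(mat_dir, paste0("f", 1:3, ".rds"))
for (f in files) {
  saveRDS(matrix(runif(16), 4, 4), f)
}
n <- length(files)

# Pass 1: read each file once and keep only the scalar sqrt(mean(A * A)).
norms <- vapply(files, function(f) {
  A <- readRDS(f)
  sqrt(mean(A * A))
}, numeric(1))

# Pass 2: the pairwise loop reuses the cached norms.
out <- matrix(NA_real_, n, n)
for (i in seq_len(n)) {
  A <- readRDS(files[i])
  for (j in i:n) {
    B <- readRDS(files[j])
    out[i, j] <- sqrt(mean(A * B)) / sqrt(norms[i] * norms[j])
  }
}
```

The cached vector holds one number per file, so it stays tiny even for thousands of matrices.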
