Making a list of lists in R

Why, in R, is
e = list(a, b, c, d)
different from:
e=list(a,b)
e=list(e,c)
e=list(e,d)
?
The second approach can easily be used in a for loop, but it produces a different result. Since I create one object per iteration, I can't use the first approach. Any hints?

If you absolutely want to use this approach, note that list(e, c) nests the existing list e as a single element rather than appending c. To append, wrap the new element in list() and concatenate with c():
# Make up some data
a <- 1:3; b <- 4:5; c <- 6:10; d <- 11:17
# Build up the lists
e0 <- list(a, b, c, d)
e <- list(a, b)
e <- c(e, list(c))
e <- c(e, list(d))
# Compare the two
identical(e0, e) # TRUE
In a real-life case, however, instead of using a loop, you would probably be better off using a function from the *apply family, such as lapply(), which returns a list of outputs directly.
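For illustration, a minimal sketch of both styles, using a hypothetical make_item() helper to stand in for the per-iteration work:
make_item <- function(i) seq_len(i)  # hypothetical stand-in for the real work

# Growing the list inside a for loop, one element per iteration
e <- list()
for (i in 1:4) {
  e <- c(e, list(make_item(i)))
}

# The same result with lapply(), no manual growing required
e2 <- lapply(1:4, make_item)
identical(e, e2)  # TRUE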

Related

Performing a function's criterion within another

This is fairly difficult for me to explain; however, I wish to use a function that's within another function, one that can be applied to the variables.
For example, given that the function looks similar to this:
test <- function(a, b, c){
a <- ...
b <- ...
c <- ...
}
#used likeso:
test(a = ..., b = ..., c = ...)
Is it possible to use it likeso:
test(a = ..., c(b = ...))
And what's an example of how this function would look? I'm looking for a function like this because I'm trying to index a function within a function, which can be used like the second call above.
I know that this can be achieved with two separate functions; however, I'm asking whether it's possible with one function, whilst having another function indexed within it.
Use do.call like this:
test <- function(a, b, c) a + b + c
do.call("test", list(a = 1, b = 2, c = 3))
## [1] 6
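Since do.call() takes its arguments as a single list, you can also splice a pre-built bundle of arguments into a call, which is close to the test(a = ..., c(b = ...)) form asked about. A minimal sketch, reusing the test() defined above:
# bundle some arguments ahead of time, then combine with the rest via c()
bundle <- list(b = 2, c = 3)
do.call("test", c(list(a = 1), bundle))
## [1] 6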

Extracting objects defined within a function in R

I'm writing a function in R, and I want to be able to access the different objects created inside the function. I've got a simple example of the problem I'm talking about (not the real code, obviously).
example <- function(a, b){
  c <- a + b
  d <- a * b
  e <- a / b
  e
}
a <- 10
b <- 20
output <- example(a,b)
str(output)
output$c
My goal is for the last line to show the value of c defined in the function. In this code the only thing saved in output is the returned value, e.
I've tried changing the local and global environments, using <<- etc. That doesn't solve the problem though. Any help would be appreciated.
We can return multiple outputs in a list and then extract the list elements:
example <- function(a, b){
  c <- a + b
  d <- a * b
  e <- a / b
  list(c = c, d = d, e = e)
}
a <- 10
b <- 20
output <- example(a,b)[['c']]
output
#[1] 30
example(a,b)[['d']]
#[1] 200
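If you need several of the outputs, it is usually cheaper to call the function once, keep the whole list, and extract from it; this also makes the output$c form from the question work:
output <- example(a, b)   # call once, keep everything
output$c
#[1] 30
output$d
#[1] 200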

using separate matrices in a loop and saving the results in a data frame

I have a few matrices that all have the same number of rows and columns, and their dimnames are exactly the same too. I read them in like this, for example:
a<-read.csv("a.txt",row.names = 1,header=T,sep="\t")
b<-read.csv("b.txt",row.names = 1,header=T,sep="\t")
c<-read.csv("c.txt",row.names = 1,header=T,sep="\t")
d<-read.csv("d.txt",row.names = 1,header=T,sep="\t")
e<-read.csv("e.txt",row.names = 1,header=T,sep="\t")
Now I want to get the similarity index between a & b, a & c, ..., b & c, ..., c & d, and d & e using this code:
library(igraph)
library(BiRewire)
jaccard.index <- birewire.similarity(a, b)
Then I want to save the results as a data frame, for example like this:
mat1 mat2 simil.index
a b 0.9142
a c 0.8126
a d 0.5066
b e 0.9526
I don't know how I can use these separate matrices in a loop and save the results like that. Can anyone help me with this problem?
Prepare a function to compute the pairwise similarities:
myfun <- function(x, y) {
  # look up each matrix by its name, then compute the similarity
  birewire.similarity(eval(parse(text = x)), eval(parse(text = y)))
}
Build the possible combinations (you said your matrices are named as the first 5 letters of the alphabet, but you can put any names in place of letters[1:5]):
myletters <- combn(letters[1:5], 2)
Build the data frame by column-binding the combinations with the results of applying the function to each combination:
data.frame(t(myletters),
           simil.index = mapply(myfun, myletters[1, ], myletters[2, ]))
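The first two columns of that data frame come out named X1 and X2; to match the layout shown in the question, you can rename them afterwards (a small assumed finishing step):
res <- data.frame(t(myletters),
                  simil.index = mapply(myfun, myletters[1, ], myletters[2, ]))
names(res)[1:2] <- c("mat1", "mat2")
res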

naming and rbinding all elements (dfs, matrices, vectors, etc.) from lists of lists

I have a list with lists like this:
# output from a package function
A <- list(a = matrix(1:9, ncol = 3),
          b = matrix(1:8, ncol = 4),
          c = 1.5)
B <- list(a = matrix(11:19, ncol = 3),
          b = matrix(11:18, ncol = 4),
          c = 2.5)
# list with all outputs (from loaded=lapply(filenames, function(x) get(load(x))) )
superlist <- list(A, B)
What I would like to do is first add the name of each list item (A, B) to all second order list elements. For example B would become:
B <- list(a = cbind(matrix(11:19, ncol = 3), c("B","B","B")),
          b = cbind(matrix(11:18, ncol = 4), c("B","B")),
          c = c(2.5, "B"))
Then, the aim is to rbind all matrices, values or dataframes (a,b,c) with the same name together, so that I would have:
superlist <- list(a = rbind(cbind(matrix(1:9, ncol = 3), c("A","A","A")),
                            cbind(matrix(11:19, ncol = 3), c("B","B","B"))),
                  b = rbind(cbind(matrix(1:8, ncol = 4), c("A","A")),
                            cbind(matrix(11:18, ncol = 4), c("B","B"))),
                  c = rbind(c(1.5, "A"), c(2.5, "B")))
For the rbinding, the best I got is this (from rbind all dataframes in a list of lists by name):
do.call("rbind",lapply(superlist ,function(x) x[["a"]]))
However, it only does it for one list element (and I have more than 20). I know that I can write a loop, but as I will be using the function often I would like to know how to do this nicer.
I am aware that there are multiple questions asked about this, but none of them has exactly the same problem (for example, some only have data frames as list of list elements, sometimes all of the same size). So although certain questions provided a bit of help, none truly gave me enough information to resolve my problem.
rbind dataframes in a list of lists # groups by the second list, not the first
Convert a list of data frames into one data frame # only one list
rbinding a list of lists of dataframes based on nested order
...
Thank you
I think you can utilize the function proposed in this answer. It reverses the list structure, i.e. it groups by the inner list. An example:
# output from a package function
A <- list(a = matrix(1:9, ncol = 3),
          b = matrix(1:8, ncol = 4),
          c = 1.5)
B <- list(a = matrix(11:19, ncol = 3),
          b = matrix(11:18, ncol = 4),
          c = 2.5)
# list with all outputs (from loaded=lapply(filenames, function(x) get(load(x))) )
superlist <- list(A, B)
################### your code above ##############
## the function from the linked answer
fun <- function(ll) {
  # collect the union of inner names across all lists
  nms <- unique(unlist(lapply(ll, function(X) names(X))))
  # align every list to the same names (missing entries become NULL)
  ll <- lapply(ll, function(X) setNames(X[nms], nms))
  # transpose: group by inner name instead of by outer list
  ll <- apply(do.call(rbind, ll), 2, as.list)
  # drop the NULL placeholders
  lapply(ll, function(X) X[!sapply(X, is.null)])
}
## apply the function to your list
insideout <- fun(superlist)
## rbind the components together
lapply(insideout, function(x) do.call(rbind, x))
Is this what you intended to do?
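For the first part of the question, tagging every element with the name of its source list, here is one hedged sketch; it assumes superlist carries the outer names (e.g. superlist <- list(A = A, B = B)), which the list() call above does not set:
# append each outer name as an extra column/entry on every inner element
tagged <- lapply(names(superlist), function(nm)
  lapply(superlist[[nm]], function(el) cbind(el, nm)))
names(tagged) <- names(superlist)
fun() and the final rbind step can then be applied to tagged as above.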

Divide et impera on a data frame in R

As we all know, R isn't the most efficient platform for running large analyses.
If I had a large data frame containing three parameters:
GROUP X Y
A 1 2
A 2 2
A 2 3
...
B 1 1
B 2 3
B 1 4
...
millions of rows
and I wanted to run a computation on each group (e.g. compute Pearson's r on X,Y) and store the results in a new data frame, I can do it like this:
df = loadDataFrameFrom( someFile )
results = data.frame()
for ( g in unique( df$GROUP ) ){
  gdf <- subset( df, df$GROUP == g )
  partialRes <- slowStuff( gdf$X, gdf$Y )
  results = rbind( results, data.frame( GROUP = g, RES = partialRes ) )
}
# results contains all the results here.
useResults(results)
The obvious problem is that this is VERY slow, even on powerful multi-core machine.
My question is: is it possible to parallelise this computation, having for example a separate thread for each group or a block of groups?
Is there a clean R pattern to solve this simple divide et impera problem?
Thanks,
Mulone
First off, R is not necessarily slow. Its speed depends largely on using it correctly, just like any language. There are a few things that can speed up your code without altering much: preallocate your results data.frame before you begin; use a list and matrix or vector construct instead of a data.frame; switch to data.table; the list goes on, but The R Inferno is an excellent place to start.
Also, take a look here. It provides a good summary on how to take advantage of multi-core machines.
The "clean R pattern" was succinctly solved by Hadley Wickam with his plyr package and specifically ddply:
library(plyr)
library(doMC)
registerDoMC()
ddply(df, .(GROUP), your.function, .parallel=TRUE)
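For the correlation example from the question, your.function could be filled in with a small hedged sketch like this (cor() stands in for slowStuff()):
ddply(df, .(GROUP), function(d) data.frame(RES = cor(d$X, d$Y)), .parallel = TRUE)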
However, it is not necessarily fast. Alternatively, you can use something like:
library(parallel)
mclapply(unique(df$GROUP), function(x, df) ...)
Or finally, you can use the foreach package:
foreach(g = unique(df$GROUP), ...) %dopar% {
  your.analysis
}
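To make the foreach version concrete, a minimal hedged sketch using the Pearson correlation from the question (cor() stands in for slowStuff(), and a parallel backend such as the doMC one registered above is assumed):
library(foreach)
results <- foreach(g = unique(df$GROUP), .combine = rbind) %dopar% {
  gdf <- df[df$GROUP == g, ]
  data.frame(GROUP = g, RES = cor(gdf$X, gdf$Y))
}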
To back up my comment: 10 million rows, 26 groups, done in under 3 seconds on a single-core 3.3GHz CPU, using only base R. No parallelization needed.
> set.seed(21)
> x <- data.frame(GROUP=sample(LETTERS,1e7,TRUE),X=runif(1e7),Y=runif(1e7))
> system.time( y <- do.call(rbind, lapply(split(x,x$GROUP),
+ function(d) data.frame(GROUP=d$GROUP[1],cor=cor(d$X,d$Y)))) )
user system elapsed
2.37 0.56 2.94
> y
GROUP cor
A A 2.311493e-03
B B -1.020239e-03
C C -1.735044e-03
D D 1.355110e-03
E E -8.027199e-04
F F 8.234086e-04
G G 2.337217e-04
H H -5.861781e-04
I I 7.799191e-04
J J 1.063772e-04
K K 7.174137e-04
L L 4.151059e-04
M M 4.440694e-04
N N 2.568411e-03
O O -3.827366e-04
P P -1.239380e-03
Q Q -1.057020e-03
R R 1.079676e-03
S S -1.819232e-03
T T -3.577533e-04
U U -1.084114e-03
V V 6.686503e-05
W W -1.631912e-03
X X 8.668508e-04
Y Y -6.460281e-04
Z Z 1.614978e-03
By the way, parallelization will only help if your slowStuff function is the bottleneck. Your use of rbind in a loop is likely the bottleneck, unless you do something similar in slowStuff.
I think your slowness is partly due to writing non-idiomatic R code. The following gives you the correlations per group (I used the mtcars data set and divided it by the cyl group) and does it pretty fast:
by(mtcars, mtcars$cyl, cor)
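Applied to the data frame from the question, a hedged equivalent that returns just the per-group X-Y correlation would be:
by(df, df$GROUP, function(d) cor(d$X, d$Y))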
