elementwise combination of two lists in R

Say I have two lists:
list.a <- as.list(c("a", "b", "c"))
list.b <- as.list(c("d", "e", "f"))
I would like to combine these lists recursively, such that the result would be a list of combined elements as a vector like the following:
[[1]]
[1] a d
[[2]]
[1] a e
[[3]]
[1] a f
[[4]]
[1] b d
and so on. I feel like I'm missing something relatively simple here. Any help?
Cheers.

expand.grid(list.a, list.b) gives you the desired result in a data.frame. This tends to be the most useful format for working with data in R. However, you could get the exact structure you ask for (save the ordering) with a call to apply and lapply:
result.df <- expand.grid(list.a, list.b)
result.list <- lapply(apply(result.df, 1, identity), unlist)
If you want this list ordered by the first element:
result.list <- result.list[order(sapply(result.list, head, 1))]

You want mapply (if by "recursively" you mean "in parallel"):
mapply(c, list.a, list.b, SIMPLIFY=FALSE)
Or maybe this is more what you want:
unlist(lapply(list.a, function(a) lapply(list.b, function (b) c(a, b))), recursive=FALSE)
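To make the difference between those two calls concrete, here is a small sketch that runs both on the question's lists and compares the shapes of the results. The names parallel and crossed are mine, not from the answer.

```r
list.a <- as.list(c("a", "b", "c"))
list.b <- as.list(c("d", "e", "f"))

# Parallel: pair the i-th element of list.a with the i-th element of list.b
parallel <- mapply(c, list.a, list.b, SIMPLIFY = FALSE)

# Crossed: pair every element of list.a with every element of list.b
crossed <- unlist(
  lapply(list.a, function(a) lapply(list.b, function(b) c(a, b))),
  recursive = FALSE
)

length(parallel)  # 3 pairs
length(crossed)   # 9 pairs, ordered by the element taken from list.a
```

The crossed version is the one that matches the output asked for in the question (a d, a e, a f, b d, ...).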

Surprised nobody has mentioned this simple one liner:
as.list(outer(list.a,list.b, paste))
[[1]]
[1] "a d"
[[2]]
[1] "b d"
[[3]]
[1] "c d"
[[4]]
[1] "a e"

This gets you what you are looking for:
unlist(lapply(list.a, function(X) {
lapply(list.b, function(Y) {
c(X, Y)
})
}), recursive=FALSE)

Here is a function you can pass lists to in order to expand them:
expand.list <- function(...){
lapply(as.data.frame(t((expand.grid(...)))),c, recursive = TRUE, use.names = FALSE)}
expand.list(list.a, list.b)

Here is a somewhat brute force approach that will, given they are the same dimensions, append list.b to list.a recursively using the append function.
# CREATE LIST OBJECTS
list.a <- as.list(c("a", "b", "c"))
list.b <- as.list(c("d", "e", "f"))
# CREATE AN EMPTY LIST TO POPULATE
list.ab <- list()
# DOUBLE LOOP TO CREATE RECURSIVE COMBINATIONS USING append
ct <- 0
for (i in seq_along(list.a)) {
  for (j in seq_along(list.b)) {
    ct <- ct + 1
    list.ab[[ct]] <- append(list.a[[i]], list.b[[j]])
  }
}
# PRINT RESULTS
list.ab

Related

R Loop all vectors in a list between each other

I'm still a little green to R. So bear with me.
I have a list of vectors and I would like to compare each vector in the list with the others, and then tack any matching vector onto the end of the one it matches. I am looking for a robust, repeatable solution, regardless of the number of vectors in the list.
So if I have a list (lst) made of vectors:
lst <- list(c("a", "b"), c("b", "c"), c("e", "f"), c("c", "g"))
I want to get a list of vectors like this as a result:
[[1]]
[1] "a" "b" "c" "g"
[[2]]
[1] "e" "f"
So I've been able to make this work for a singular instance:
if(any(lst[[1]] %in% lst[[2]])){
c(lst[[1]], lst[[2]])
}
but now I'm trying to loop it over the entire list and this is what I have so far, but I'm a little stuck:
endmembers <- lapply(seq_along(lst), function(i,j){
x <- lst[[i]]
x2 <- lst[[j]]
if(any(x %in% x2)){
c(x, x2)
}
})
I would use a recursive function to stick all the components together, then remove list items that are contained within other items:
#### helper functions ----
# Recursive function to stick list items together
fun <- function(x, d) {
i <- which(sapply(d, function(y) y[1]) == tail(x, 1))
if (length(i) > 0) {
y <- d[[i[1]]]
x <- c(x, y[2:length(y)])
x <- fun(x, d)
}
x
}
# is vector inside another vector? - must be in the same sequence and order
inside <- function(x, y) {
if ( isTRUE(all.equal(x, y)) )
return(FALSE)
if ( length(x) > length(y) )
return(FALSE)
if ( !any(x %in% y))
return(FALSE)
!is.unsorted( sapply(x, function(a, b) which(a == b), b = y), strictly = TRUE )
}
#### analysis ----
# Stick vectors together if last == first
d <- lapply(lst, fun, d = lst)
# remove list items that are inside other list items - there might be a more
# elegant solution to this, I'm confused by it.
d[!apply(
sapply(d,
function(x, y) sapply(y, function(x, y) inside(x, y), y = x),
y = d),
1,
any)]
An easy option is using igraph
library(igraph)
u <- cluster_infomap(graph_from_data_frame(as.data.frame(do.call(rbind,lst))))
out <- split(u$names,u$membership)
which gives
> out
$`1`
[1] "a" "b" "c" "g"
$`2`
[1] "e" "f"
If you want base R solution with for loops, here is one version
out <- lst[1]
for (v in lst) {
flag <- 1
for (k in 1:length(out)) {
if (any(v %in% out[[k]])) {
out[[k]] <- union(out[[k]], v)
flag <- 0
break
}
}
if (flag) out[[length(out) + 1]] <- v
}
such that
> out
[[1]]
[1] "a" "b" "c" "g"
[[2]]
[1] "e" "f"
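One caveat with the loop above: it stops at the first group that shares an element with v, so a vector that bridges two existing groups leaves them unmerged. A minimal base R sketch that folds all matching groups together is below. merge_overlapping is a hypothetical helper name, and the order of the groups in the result differs from the loop above.

```r
merge_overlapping <- function(lst) {
  out <- list()
  for (v in lst) {
    # Which existing groups share at least one element with v?
    hit <- vapply(out, function(g) any(v %in% g), logical(1))
    # Fold every matching group and v into a single group
    merged <- unique(c(unlist(out[hit]), v))
    out <- c(out[!hit], list(merged))
  }
  out
}

lst <- list(c("a", "b"), c("b", "c"), c("e", "f"), c("c", "g"))
res <- merge_overlapping(lst)

# A vector that bridges two earlier groups merges them into one
bridged <- merge_overlapping(list(c("a", "b"), c("c", "d"), c("b", "c")))
```

Here res holds the two groups a b c g and e f, and bridged holds a single group with all four letters.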
In case anyone wants to know what I did, I followed the code from "merging sets which have even one element in common R", which was linked in a comment.
library(igraph)
m <- sapply(lst, function(x) sapply(lst, function(y) (any(x %in% y))))
#determine the groups of the graph constructed from m
groups <- groups(components(graph_from_adjacency_matrix(m)))
#Get the unique elements of each group
endmembers <- lapply(groups,function(x) sort(unique(unlist(lst[x]))))

Avoiding nested loops but iterate over 2 values using (l)apply?

I'd like to get better at writing elegant code in R, and am trying to avoid writing nested loops, but cannot figure out an (l)apply solution to my problem.
I have a set of paired files, each of which has two variables associated with them - a name and a number. The filenames are long, so I'd like to generate a vector of filenames that can then be accessed by my own custom downstream function for reading them into a dataframe, plotting, etc.
For example, the files look like:
5_simulationA.k 5_simulationA.b
10_simulationA.k 10_simulationA.b
5_simulationB.k 5_simulationB.b
10_simulationB.k 10_simulationB.b
The ".k" and ".b" files are mates of a pair and must stay together for downstream processing.
I could read in these files by writing a nested loop that would look something like,
K_files = c()
B_files = c()
for (i in c("A", "B", "C")){ # iterate over letter variable
for (n in c(5,10,15)){ #iterate over numbers of the files
k_filename = paste0(n, "_simulation", i, ".k")
b_filename = paste0(n, "_simulation", i, ".b")
K_files = c(K_files, k_filename)
B_files = c(B_files, b_filename)
}
}
This is of course very ugly and un-R-like. I would love to find a way to do this with the very powerful apply or lapply statements, or any other elegant solutions anyone might have. Thanks!
Base R function outer is meant for this kind of problem.
L <- c("A", "B", "C")
N <- c(5, 10, 15)
f <- function(i, n, e) paste0(n, "_simulation", i, e)
sapply(c(".k", ".b"), function(.e) outer(L, N, f, e = .e))
# .k .b
# [1,] "5_simulationA.k" "5_simulationA.b"
# [2,] "5_simulationB.k" "5_simulationB.b"
# [3,] "5_simulationC.k" "5_simulationC.b"
# [4,] "10_simulationA.k" "10_simulationA.b"
# [5,] "10_simulationB.k" "10_simulationB.b"
# [6,] "10_simulationC.k" "10_simulationC.b"
# [7,] "15_simulationA.k" "15_simulationA.b"
# [8,] "15_simulationB.k" "15_simulationB.b"
# [9,] "15_simulationC.k" "15_simulationC.b"
From OP's example output filenames, it looks like we want all combinations of n and i. expand.grid returns a dataframe of all combinations of ns and is. We could then use apply to loop through its row to generate the filenames:
i <- c("A", "B", "C")
n <- c(5, 10, 15)
combi <- expand.grid(n = n, i = i)
invisible(apply(combi, 1, function(x){
# apply() converts each row to character and pads the numbers (" 5"), so trim them
k_filename = paste0(trimws(x[1]), "_simulation", x[2], ".k")
b_filename = paste0(trimws(x[1]), "_simulation", x[2], ".b")
print(k_filename)
print(b_filename)
}))
Notice that I used invisible to suppress the output of apply, since we are only interested in the side effects (reading/writing files). Alternatively, we can use pwalk from purrr, which takes each column of the same expand.grid dataframe as input and creates the filenames silently:
library(dplyr)
library(purrr)
combi %>%
pwalk(~ {
k_filename = paste0(.x, "_simulation", .y, ".k")
b_filename = paste0(.x, "_simulation", .y, ".b")
print(k_filename)
print(b_filename)
})
Output:
[1] "5_simulationA.k"
[1] "5_simulationA.b"
[1] "10_simulationA.k"
[1] "10_simulationA.b"
[1] "15_simulationA.k"
[1] "15_simulationA.b"
[1] "5_simulationB.k"
[1] "5_simulationB.b"
[1] "10_simulationB.k"
[1] "10_simulationB.b"
[1] "15_simulationB.k"
[1] "15_simulationB.b"
[1] "5_simulationC.k"
[1] "5_simulationC.b"
[1] "10_simulationC.k"
[1] "10_simulationC.b"
[1] "15_simulationC.k"
[1] "15_simulationC.b"
library(tidyverse)
Type = c("A", "B", "C")
Index = c(5, 10, 15)
crossing(Type, Index) %>%
mutate(k_filename = map2_chr(Index, Type, ~paste(.x, "_simulation", .y, ".k", sep="")),
b_filename = map2_chr(Index, Type, ~paste(.x, "_simulation", .y, ".b", sep=""))) -> names
After that, you can access the k_filename or b_filename using pull
K_files <- names %>% pull(k_filename)
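Since paste0 is already vectorised, the row-wise loop can also be dropped entirely once the combinations exist. A sketch in base R, using only two letters and two numbers for brevity; sim_letters, sim_numbers, combos, stem and file_pairs are illustrative names, not from the answers above.

```r
sim_letters <- c("A", "B")
sim_numbers <- c(5, 10)

# One row per (number, letter) combination; keep letters as plain strings
combos <- expand.grid(n = sim_numbers, i = sim_letters, stringsAsFactors = FALSE)

# Build the shared part of each file name once, then add both extensions
stem <- paste0(combos$n, "_simulation", combos$i)
file_pairs <- data.frame(
  k = paste0(stem, ".k"),
  b = paste0(stem, ".b"),
  stringsAsFactors = FALSE
)

file_pairs$k
```

Keeping the two columns in one data frame means the ".k" and ".b" mates stay on the same row for downstream processing.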

reassign values in a list without looping

test <- list(a = list("first"= 1, "second" = 2),
b = list("first" = 3, "second" = 4))
In the list above, I would like to reassign the "first" elements to equal, let's say, five. This for loop works:
for(temp in c("a", "b")) {
test[[temp]]$first <- 5
}
Is there a way to do the same using a vectorized operation (lapply, etc)? The following extracts the values, but I can't get them reassigned:
lapply(test, "[[", "first")
Here is a vectorised one-liner using unlist and relist:
relist((function(x) ifelse(grepl("first",names(x)),5,x))(unlist(test)),test)
$a
$a$first
[1] 5
$a$second
[1] 2
$b
$b$first
[1] 5
$b$second
[1] 4
You can do it like this:
test <- lapply(test, function(x) {x$first <- 5; x})
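Another base R option, shown here only as a sketch, is modifyList from the utils package, which overwrites the named components of each sub-list and leaves the rest alone. updated is an illustrative name.

```r
test <- list(a = list(first = 1, second = 2),
             b = list(first = 3, second = 4))

# Replace the "first" component in every sub-list; other components are kept
updated <- lapply(test, utils::modifyList, val = list(first = 5))

str(updated)
```

This scales to several components at once, e.g. val = list(first = 5, second = 0).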

Why can't I assign to multiple variables using mapply/assign? [duplicate]

I want to assign multiple variables in a single line in R. Is it possible to do something like this?
values # initialize some vector of values
(a, b) = values[c(2,4)] # assign a and b to values at 2 and 4 indices of 'values'
Typically I want to assign about 5-6 variables in a single line, instead of having multiple lines. Is there an alternative?
I put together an R package zeallot to tackle this very problem. zeallot includes an operator (%<-%) for unpacking, multiple, and destructuring assignment. The LHS of the assignment expression is built using calls to c(). The RHS of the assignment expression may be any expression which returns or is a vector, list, nested list, data frame, character string, date object, or custom objects (assuming there is a destructure implementation).
Here is the initial question reworked using zeallot (latest version, 0.0.5).
library(zeallot)
values <- c(1, 2, 3, 4) # initialize a vector of values
c(a, b) %<-% values[c(2, 4)] # assign `a` and `b`
a
#[1] 2
b
#[1] 4
For more examples and information one can check out the package vignette.
There is a great answer on the Struggling Through Problems Blog
This is taken from there, with very minor modifications.
USING THE FOLLOWING THREE FUNCTIONS
(Plus one for allowing for lists of different sizes)
# Generic form
'%=%' = function(l, r, ...) UseMethod('%=%')
# Binary Operator
'%=%.lbunch' = function(l, r, ...) {
Envir = as.environment(-1)
if (length(r) > length(l))
warning("RHS has more args than LHS. Only first", length(l), "used.")
if (length(l) > length(r)) {
warning("LHS has more args than RHS. RHS will be repeated.")
r <- extendToMatch(r, l)
}
for (II in 1:length(l)) {
do.call('<-', list(l[[II]], r[[II]]), envir=Envir)
}
}
# Used if LHS is larger than RHS
extendToMatch <- function(source, destin) {
s <- length(source)
d <- length(destin)
# Assume that destin is a length when it is a single number and source is not
if(d==1 && s>1 && !is.null(as.numeric(destin)))
d <- destin
dif <- d - s
if (dif > 0) {
source <- rep(source, ceiling(d/s))[1:d]
}
return (source)
}
# Grouping the left hand side
g = function(...) {
List = as.list(substitute(list(...)))[-1L]
class(List) = 'lbunch'
return(List)
}
Then to execute:
Group the left hand side using the new function g()
The right hand side should be a vector or a list
Use the newly-created binary operator %=%
# Example Call; Note the use of g() AND `%=%`
# Right-hand side can be a list or vector
g(a, b, c) %=% list("hello", 123, list("apples, oranges"))
g(d, e, f) %=% 101:103
# Results:
> a
[1] "hello"
> b
[1] 123
> c
[[1]]
[1] "apples, oranges"
> d
[1] 101
> e
[1] 102
> f
[1] 103
Example using lists of different sizes:
Longer Left Hand Side
g(x, y, z) %=% list("first", "second")
# Warning message:
# In `%=%.lbunch`(g(x, y, z), list("first", "second")) :
# LHS has more args than RHS. RHS will be repeated.
> x
[1] "first"
> y
[1] "second"
> z
[1] "first"
Longer Right Hand Side
g(j, k) %=% list("first", "second", "third")
# Warning message:
# In `%=%.lbunch`(g(j, k), list("first", "second", "third")) :
# RHS has more args than LHS. Only first2used.
> j
[1] "first"
> k
[1] "second"
Consider using functionality included in base R.
For instance, create a 1 row dataframe (say V) and initialize your variables in it. Now you can assign to multiple variables at once V[,c("a", "b")] <- values[c(2, 4)], call each one by name (V$a), or use many of them at the same time (values[c(5, 6)] <- V[,c("a", "b")]).
If you get lazy and don't want to go around calling variables from the dataframe, you could attach(V) (though I personally don't ever do it).
# Initialize values
values <- 1:100
# V for variables
V <- data.frame(a=NA, b=NA, c=NA, d=NA, e=NA)
# Assign elements from a vector
V[, c("a", "b", "e")] = values[c(2,4, 8)]
# Also other class
V[, "d"] <- "R"
# Use your variables
V$a
V$b
V$c # OOps, NA
V$d
V$e
Here is my idea. The syntax is probably quite simple:
`%tin%` <- function(x, y) {
mapply(assign, as.character(substitute(x)[-1]), y,
MoreArgs = list(envir = parent.frame()))
invisible()
}
c(a, b) %tin% c(1, 2)
This gives:
> a
Error: object 'a' not found
> b
Error: object 'b' not found
> c(a, b) %tin% c(1, 2)
> a
[1] 1
> b
[1] 2
this is not well tested though.
A potentially dangerous (in as much as using assign is risky) option would be to Vectorize assign:
assignVec <- Vectorize("assign",c("x","value"))
#.GlobalEnv is probably not what one wants in general; see below.
assignVec(c('a','b'),c(0,4),envir = .GlobalEnv)
a b
0 4
> b
[1] 4
> a
[1] 0
Or I suppose you could vectorize it yourself manually with your own function using mapply that maybe uses a sensible default for the envir argument. For instance, Vectorize will return a function with the same environment properties of assign, which in this case is namespace:base, or you could just set envir = parent.env(environment(assignVec)).
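A minimal sketch of that hand-rolled version, assuming the variable names are passed as a character vector. assign_many and demo are hypothetical names; the envir default of parent.frame() makes the variables appear in whichever environment calls the function rather than in .GlobalEnv.

```r
assign_many <- function(var_names, values, envir = parent.frame()) {
  # Refuse silent recycling: exactly one value per name
  stopifnot(length(var_names) == length(values))
  # mapply calls assign(name, value, envir = envir) once per pair
  invisible(mapply(assign, var_names, values, MoreArgs = list(envir = envir)))
}

demo <- function() {
  assign_many(c("a", "b"), c(0, 4))
  a + b  # both variables now exist inside demo()
}
demo()
```

Because the default targets the caller's frame, a and b created inside demo() do not leak into the global environment.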
As others explained, there doesn't seem to be anything built in. ...but you could design a vassign function as follows:
vassign <- function(..., values, envir=parent.frame()) {
vars <- as.character(substitute(...()))
values <- rep(values, length.out=length(vars))
for(i in seq_along(vars)) {
assign(vars[[i]], values[[i]], envir)
}
}
# Then test it
vals <- 11:14
vassign(aa,bb,cc,dd, values=vals)
cc # 13
One thing to consider though is how to handle the cases where you e.g. specify 3 variables and 5 values or the other way around. Here I simply repeat (or truncate) the values to be of the same length as the variables. Maybe a warning would be prudent. But it allows the following:
vassign(aa,bb,cc,dd, values=0)
cc # 0
list2env(setNames(as.list(rep(2,5)), letters[1:5]), .GlobalEnv)
Served my purpose, i.e., assigning five 2s into first five letters.
Had a similar problem recently and here was my try using purrr::walk2
purrr::walk2(letters,1:26,assign,envir =parent.frame())
https://stat.ethz.ch/R-manual/R-devel/library/base/html/list2env.html:
list2env(
list(
a=1,
b=2:4,
c=rpois(10,10),
d=gl(3,4,labels=LETTERS[9:11])
),
envir=.GlobalEnv
)
If your only requirement is to have a single line of code, then how about:
> a<-values[2]; b<-values[4]
I'm afraid that the elegant solution you are looking for (like c(a, b) = c(2, 4)) unfortunately does not exist. But don't give up, I'm not sure! The nearest solution I can think of is this one:
attach(data.frame(a = 2, b = 4))
or if you are bothered with warnings, switch them off:
attach(data.frame(a = 2, b = 4), warn.conflicts = FALSE)
But I suppose you're not satisfied with this solution, I wouldn't be either...
R> values = c(1,2,3,4)
R> a <- values[2]; b <- values[3]; c <- values[4]
R> a
[1] 2
R> b
[1] 3
R> c
[1] 4
Another version with recursion:
let <- function(..., env = parent.frame()) {
f <- function(x, ..., i = 1) {
if(is.null(substitute(...))){
if(length(x) == 1)
x <- rep(x, i - 1);
stopifnot(length(x) == i - 1)
return(x);
}
val <- f(..., i = i + 1);
assign(deparse(substitute(x)), val[[i]], env = env);
return(val)
}
f(...)
}
example:
> let(a, b, 4:10)
[1] 4 5 6 7 8 9 10
> a
[1] 4
> b
[1] 5
> let(c, d, e, f, c(4, 3, 2, 1))
[1] 4 3 2 1
> c
[1] 4
> f
[1] 1
My version:
let <- function(x, value) {
mapply(
assign,
as.character(substitute(x)[-1]),
value,
MoreArgs = list(envir = parent.frame()))
invisible()
}
example:
> let(c(x, y), 1:2 + 3)
> x
[1] 4
> y
[1] 5
Combining some of the answers given here + a little bit of salt, how about this solution:
assignVec <- Vectorize("assign", c("x", "value"))
`%<<-%` <- function(x, value) invisible(assignVec(x, value, envir = .GlobalEnv))
c("a", "b") %<<-% c(2, 4)
a
## [1] 2
b
## [1] 4
I used this to add the R section here: http://rosettacode.org/wiki/Sort_three_variables#R
Caveat: It only works for assigning global variables (like <<-). If there is a better, more general solution, please tell me in the comments.
For a named list, use
list2env(mylist, environment())
For instance:
mylist <- list(foo = 1, bar = 2)
list2env(mylist, environment())
will add foo = 1, bar = 2 to the current environment, and override any object with those names. This is equivalent to
mylist <- list(foo = 1, bar = 2)
foo <- mylist$foo
bar <- mylist$bar
This works in a function, too:
f <- function(mylist) {
list2env(mylist, environment())
foo * bar
}
mylist <- list(foo = 1, bar = 2)
f(mylist)
However, it is good practice to name the elements you want to include in the current environment, lest you override another object, and so preferably write
list2env(mylist[c("foo", "bar")], environment())
Finally, if you want different names for the new imported objects, write:
list2env(`names<-`(mylist[c("foo", "bar")], c("foo2", "bar2")), environment())
which is equivalent to
foo2 <- mylist$foo
bar2 <- mylist$bar
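The renaming step can also be written with setNames, which some readers find easier to parse than calling `names<-` directly. A sketch that imports into a fresh environment instead of the current one, so nothing gets overwritten by accident; target is an illustrative name and baz is an extra element added for the demonstration.

```r
mylist <- list(foo = 1, bar = 2, baz = 3)

# Import only foo and bar, under new names, into a separate environment
target <- new.env()
list2env(setNames(mylist[c("foo", "bar")], c("foo2", "bar2")), envir = target)

ls(target)
```

Only the two selected elements are created in target; baz is left out.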

Efficient way to apply a function to a list of lists

I have 80 lists for the project in question. Each list is a list of length 1000. I'd like to run a function on each one (each of the 1000), and assign the results back to the original object. The total data is over 150 gigs so I want to make sure this is most efficient before running it on the actual data. Is this trivial example the best way to do what I need?
# my actual function is obviously more complicated.
# But let's say the goal is to keep 2/5 items in each list
trivial <- function(foo) {
keep <- c("S1", "S2")
foo[names(foo) %in% keep]
}
sublist <- replicate(5, as.list(1:5), simplify=FALSE)
names(sublist) <- paste0("S", 1:5)
eachlist <- replicate(5, sublist, simplify = F)
a1 <- a2 <- a3 <- a4 <- a5 <- eachlist
# To clarify the layout
length(a1)
[1] 5
> length(a1[[1]])
[1] 5
> names(a1[[1]])
[1] "S1" "S2" "S3" "S4" "S5"
# I need to drop S3-S5 from each of 5 sublists of a1.
# Now I'd like to repeat this for all 80 lists named a[0-9].
# all the objects have a pattern sometextNUMBER. This list is
# just the names of all the lists.
listz <- as.list(ls(pattern="[a-z][0-9]"))
> listz
[[1]]
[1] "a1"
[[2]]
[1] "a2"
[[3]]
[1] "a3"
[[4]]
[1] "a4"
[[5]]
[1] "a5"
# I don't need anything returned, just for a1-a80 updated such that
# in each sublist, 3 of 5 items are dropped.
# This works fine, but my concern now is just scaling this up.
library(plyr) # provides l_ply and llply
l_ply(listz, function(x){
assign(as.character(x), llply(get(x), trivial), envir = .GlobalEnv)
})
You could loop over the list of names, using substitute() and eval() to first construct and then execute the expressions you'd (not!) like to type individually at the command line:
objNames <- ls(pattern="[a-z][0-9]")
for(objName in objNames) {
expr <-
substitute({
OBJ <- lapply(OBJ, function(X) X[names(X) %in% c("S1", "S2")])
}, list(OBJ = as.symbol(objName)))
eval(expr)
}
This is a good use-case for rapply:
listz <- replicate(5, as.list(1:5), simplify=FALSE)
fun <- function(x) x*10
out <- rapply(listz, fun, how="replace")
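Note that rapply applies the function to the leaf values, whereas the question is about dropping components of each sub-list. If the 80 objects can be kept in one named list instead of separate global variables, nested lapply calls avoid assign and get altogether. A sketch under that assumption; all_lists and keep_s1_s2 are illustrative names, and only two of the lists are built here.

```r
# Keep only the S1 and S2 components of one sub-list
keep_s1_s2 <- function(foo) foo[names(foo) %in% c("S1", "S2")]

sublist <- replicate(5, as.list(1:5), simplify = FALSE)
names(sublist) <- paste0("S", 1:5)

# One named list holding what used to be the separate objects a1, a2, ...
all_lists <- list(
  a1 = replicate(5, sublist, simplify = FALSE),
  a2 = replicate(5, sublist, simplify = FALSE)
)

# Outer lapply walks the objects, inner lapply walks their sub-lists
all_lists <- lapply(all_lists, function(one_list) lapply(one_list, keep_s1_s2))

names(all_lists$a1[[1]])
```

Whether this is fast enough for data of the size mentioned in the question would need to be measured on a realistic sample.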
