Packing and unpacking elements from a list in R

I have two questions about using lists in R, and I would like to improve my naive solutions. I have seen questions on similar topics here, but the approaches described there did not help.
Q1:
MWE:
a <- c(1:5)
b <- "adf"
c <- array(rnorm(9), dim = c(3,3) )
Make a list, say with name "packedList", while preserving the names of
all variables.
Current solution: packedList <- list(a = a, b = b, c = c)
However, if the number of variables (three in the problem above, i.e. a, b, c) is
large (say, 20 variables), then my current solution may not be
the best.
This idea is useful when returning a large number of variables from
a function.
Q2:
MWE: Given packedList, extract variables a, b, c
I would like to extract all elements of the given list (i.e. packedList) into the environment while preserving their names. This is the reverse of task 1.
For example: given the variable packedList in the environment, I can define a, b, and c as follows:
a <- packedList$a
b <- packedList$b
c <- packedList$c
However, if the number of variables is very large, this becomes cumbersome.
After some searching on Google, I found one solution, but I am not sure it is the most elegant one either. It is shown below:
x <- packedList
for(i in 1:length(x)){
tempobj <- x[[i]]
eval(parse(text=paste(names(x)[[i]],"= tempobj")))
}

You are most likely looking for mget (Q1) and list2env (Q2).
Here's a small example:
ls() ## Starting with an empty workspace
# character(0)
## Create a few objects
a <- c(1:5)
b <- "adf"
c <- array(rnorm(9), dim = c(3,3))
ls() ## Three objects in your workspace
# [1] "a" "b" "c"
## Pack them all into a list
mylist <- mget(ls())
mylist
# $a
# [1] 1 2 3 4 5
#
# $b
# [1] "adf"
#
# $c
# [,1] [,2] [,3]
# [1,] 0.70647167 1.8662505 1.7941111
# [2,] -1.09570748 0.9505585 1.5194187
# [3,] -0.05225881 -1.4765127 -0.6091142
## Remove the original objects, keeping just the packed list
rm(a, b, c)
ls() ## only one object is there now
# [1] "mylist"
## Use `list2env` to recreate the objects
list2env(mylist, .GlobalEnv)
# <environment: R_GlobalEnv>
ls() ## The list and the other objects...
# [1] "a" "b" "c" "mylist"
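Since the question mentions returning many variables from a function, here is a minimal sketch of the same two calls used in that setting. The function name make_results and the environment unpacked are hypothetical, chosen only for illustration; passing envir = environment() just makes explicit that the function's own local variables are being collected.

```r
# Hypothetical function that creates several local objects and returns them all
make_results <- function() {
  a <- 1:5
  b <- "adf"
  c <- matrix(0, nrow = 3, ncol = 3)
  # ls() lists the function's local variables; mget() collects them by name
  mget(ls(), envir = environment())
}

packedList <- make_results()
names(packedList)

# Unpack into a fresh environment rather than the global workspace
unpacked <- list2env(packedList, envir = new.env())
get("b", envir = unpacked)
```

Unpacking into a separate environment is a design choice that avoids silently overwriting objects of the same name in the workspace; replace new.env() with .GlobalEnv to get the behaviour shown above.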

Related

Why does as.list() applied to a vector generate a list that is not treated the same as a list generated with list() in R?

Here is a very basic example that illustrates the differences in R
Given the following data frames:
a <- data.frame(l=c("object1", "object2"))
b <- data.frame(l=c("object3", "object4"))
Creating a vector for the names of the data frames:
vector <- c("a","b")
And then applying as.list()
list_of_vector <- as.list(vector)
If we try to loop over this:
lapply(list_of_vector, print)
The output is
[1] "a"
[1] "b"
[[1]]
[1] "a"
[[2]]
[1] "b"
Compared to just manually creating a list and then running the same loop:
straight_list <- list(a,b)
lapply(straight_list, print)
l
1 object1
2 object2
l
1 object3
2 object4
[[1]]
l
1 object1
2 object2
[[2]]
l
1 object3
2 object4
I would like to understand what makes as.list() different from list and how I would be able to convert a vector like the above to create the 2nd, rather than first output. Thanks in advance :)
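One possible way to see the difference, sketched under the assumption that the goal is a list of the data frames themselves: as.list() only splits the character vector into a list of strings, whereas mget() looks up the objects that those strings name. The variable names list_of_names and list_of_frames are made up for this example.

```r
a <- data.frame(l = c("object1", "object2"))
b <- data.frame(l = c("object3", "object4"))
vector <- c("a", "b")

# as.list() keeps the strings: a list holding "a" and "b"
list_of_names <- as.list(vector)

# mget() fetches the objects with those names: a named list of data frames
list_of_frames <- mget(vector, envir = environment())

str(list_of_names)
str(list_of_frames)
```

Unlike list(a, b), the result of mget() is named after the objects, so list_of_frames$a is the first data frame.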

Split dataframe columns into vectors in R

I have a dataframe as such:
Number <- c(1,2,3)
Number2 <- c(10,12,14)
Letter <- c("A","B","C")
df <- data.frame(Number,Number2,Letter)
I would like to split the df into its respective three columns, each one becoming a vector with the respective column name. In essence, the output should look exactly like the original three input vectors in the above example.
I have tried the split function and also using for loop, but without success.
Any ideas? Thank you.
We may use unclass, since a data.frame is a list with additional attributes. Unclassing removes the data.frame class and leaves the underlying list of columns
unclass(df)
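Another option is asplit with MARGIN specified as 2. Note that asplit first converts the data frame to a matrix, so with mixed column types like these every column comes back as character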
asplit(df, 2)
NOTE: Both of them return a named list. If we intend to create new objects in the global env, use list2env (not recommended though)
We can use c or as.list
> c(df)
$Number
[1] 1 2 3
$Number2
[1] 10 12 14
$Letter
[1] "A" "B" "C"
> as.list(df)
$Number
[1] 1 2 3
$Number2
[1] 10 12 14
$Letter
[1] "A" "B" "C"
Assuming you are trying to create these as vectors in the global environment, use list2env:
df <- data.frame(Number = c(1, 2, 3),
Number2 = c(10, 12, 14),
Letter = c("A", "B", "C"))
list2env(df, .GlobalEnv)
## <environment: R_GlobalEnv>
ls()
## [1] "df" "Letter" "Number" "Number2"
list2env is clearly the easiest way, but the same can be achieved with a for loop.
The "tricky" part is creating a new vector named after each column inside the for loop. If you just write
names(df[i]) <- input
a vector will not be created.
A workaround is to use paste to build a string containing the new vector's name and its contents, then use eval(parse(text = ...)) to evaluate that expression.
It may not be the most elegant solution, but it seems to work.
for (i in colnames(df)){
vector_name <- names(df[i])
expression_to_be_evaluated <- paste(vector_name, "<- df[[i]]")
eval(parse(text=expression_to_be_evaluated))
}
> Letter
[1] A B C
Levels: A B C
> Number
[1] 1 2 3
> Number2
[1] 10 12 14
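As an alternative to building code strings, the same loop can be sketched with assign(), which creates a variable from a name held in a string. The environment target is a hypothetical stand-in used here so the example does not write into the workspace; use .GlobalEnv instead to match the loop above.

```r
df <- data.frame(Number = c(1, 2, 3),
                 Number2 = c(10, 12, 14),
                 Letter = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

# Hypothetical target environment; use .GlobalEnv to create global variables
target <- new.env()

for (col in names(df)) {
  # assign() binds the column's values to a variable named after the column
  assign(col, df[[col]], envir = target)
}

ls(target)
get("Number2", envir = target)
```

This avoids eval(parse()) entirely, so column names never have to be pasted into code.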

Build a dataframe from pairwise combinations of list elements

I have a list named list. Its first 5 elements are:
[[1]]
[1] "#solarpanels" "#solar"
[[2]]
[1] "#Nuclear" "#Wind" "#solar"
[[3]]
[1] "#solar"
[[4]]
[1] "#steel" "#windenergy" "#solarenergy" "#carbonfootprint"
[[5]]
[1] "#solar" "#wind"
I would like to delete elements like [[3]] because they contain only one element. Moreover, I would like to build a dataframe containing all possible pairwise combinations within each element of the list. For example, a dataframe with two columns (e.g. the first named A, the second B) such as:
A B
"#solarpanels" "#solar"
"#Nuclear" "#Wind"
"#Nuclear" "#solar"
"#steel" "#windenergy"
"#steel" "#solarenergy"
"#steel" "#carbonfootprint"
"#windenergy" "#carbonfootprint"
"#windenergy" "#solarenergy"
"#solarenergy" "#carbonfootprint"
"#solar" "#wind"
I tried with (just for one element)
for (i in 1:(length(list[[4]])-1)) {
df$from = rep(list[[4]][i],length(list[[4]])-i)
df$to = list[[4]][(i+1):length(list[[4]])]
}
where
df=data.frame(A=character(),
B=character(),
stringsAsFactors=FALSE)
but I obtained
data.frame`(`*tmp*`, A, value = c("#steel", "#steel", :
replacement has 3 rows, data has 0
for i=1.
Your data first:
l = list(
c("#solarpanels", "#solar"),
c("#Nuclear", "#Wind", "#solar"),
"#solar",
c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint"),
c("#solar", "#wind")
)
Here's a two-liner version:
l = l[lengths(l) > 1L]
data.frame(do.call(rbind, unlist(lapply(l, combn, 2L, simplify = FALSE), recursive = FALSE)))
# X1 X2
# 1 #solarpanels #solar
# 2 #Nuclear #Wind
# 3 #Nuclear #solar
# 4 #Wind #solar
# 5 #steel #windenergy
# 6 #steel #solarenergy
# 7 #steel #carbonfootprint
# 8 #windenergy #solarenergy
# 9 #windenergy #carbonfootprint
# 10 #solarenergy #carbonfootprint
# 11 #solar #wind
More slowly, for clarity:
combn(x, k) returns every possible (unordered) subset of size k from x; what you're after is the pairs from each element of the list. By default, it returns this as a matrix with p = choose(length(x), k) columns, but that's not a helpful format for your use case; simplify = FALSE returns each subset as a new element of a list instead.
So lapply(l, combn, 2L, simplify = FALSE) will look something like:
# [[1]]
# [[1]][[1]]
# [1] "#solarpanels" "#solar"
#
#
# [[2]]
# [[2]][[1]]
# [1] "#Nuclear" "#Wind"
#
# [[2]][[2]]
# [1] "#Nuclear" "#solar"
(we have to filter the length-1 elements of l first, since it's an error to ask for 2 elements from a length-1 object, hence the first line)
The lapply(.) bit is the crux of your issue; the rest is just kludging the output (which already has all the correct data) into a data.frame format.
First, the lapply output is nested -- it's a list of lists. It's more uniform to have a list of length-2 vectors; unlist(., recursive=FALSE) accomplishes this by un-nesting the first level of lists (with recursive=TRUE, we'd wind up with one long vector and lose the paired structure; we could work with this, but I think that is a bit unnatural).
Next, we turn the list of length-2 vectors into a matrix (with an eye to the end goal -- a 2-column matrix is very easy to convert to a data.frame); list->matrix is done in base with do.call(rbind, .).
Finally we pass this to data.frame, et voila!
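The steps described above can also be written one at a time. This is only a sketch of the same approach; the intermediate names (keep, pairs_nested, pairs_flat, pair_matrix, pairs_df) are made up for illustration, and the columns are named A and B as requested in the question.

```r
l <- list(
  c("#solarpanels", "#solar"),
  c("#Nuclear", "#Wind", "#solar"),
  "#solar",
  c("#steel", "#windenergy", "#solarenergy", "#carbonfootprint"),
  c("#solar", "#wind")
)

# 1. Drop elements that cannot form a pair
keep <- l[lengths(l) > 1L]

# 2. All unordered pairs within each element (a list of lists)
pairs_nested <- lapply(keep, combn, 2L, simplify = FALSE)

# 3. Un-nest one level: a flat list of length-2 vectors
pairs_flat <- unlist(pairs_nested, recursive = FALSE)

# 4. Stack the pairs into a 2-column character matrix
pair_matrix <- do.call(rbind, pairs_flat)

# 5. Convert to a data frame with the requested column names
pairs_df <- data.frame(A = pair_matrix[, 1], B = pair_matrix[, 2],
                       stringsAsFactors = FALSE)
pairs_df
```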
In data.table, I would do it slightly more cleanly and in one command:
library(data.table)
setDT(transpose(
unlist(lapply(l[lengths(l) > 1L], combn, 2L, simplify = FALSE), recursive = FALSE)
))[]
Given you likely don't care much about intermediate output, this would also be a good place to use magrittr:
library(magrittr)
l[lengths(l) > 1L] %>%
lapply(combn, 2L, simplify = FALSE) %>%
unlist(recursive = FALSE) %>%
do.call(rbind, . ) %>%
data.frame
It's more readable, but in this case, it might be nice to see that data.frame is the end goal up-front, as the intent of the unlist & do.call steps might otherwise be obscure.

Storing unique values of each column (of a df) in list

It is straightforward to obtain the unique values of a column using unique. However, I am looking to do the same for multiple columns in a dataframe and store them in a list, all using base R. Importantly, it is not combinations I need but simply the unique values of each individual column. I currently have the below:
# dummy data
df = data.frame(a = LETTERS[1:4]
,b = 1:4)
# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols)
{
x = unique(i)
unique_values_by_col[[i]] = x
}
The problem comes when displaying unique_values_by_col, as it shows as empty. I believe the problem is that i is being passed to the loop as text, not as a variable.
Any help would be greatly appreciated. Thank you.
Why not avoid the for loop altogether using lapply:
lapply(df, unique)
Resulting in:
> $a
> [1] A B C D
> Levels: A B C D
> $b
> [1] 1 2 3 4
Alternatively, apply is designed to run a function over the rows or columns of its input:
apply(df,2,unique)
result:
> apply(df,2,unique)
a b
[1,] "A" "1"
[2,] "B" "2"
[3,] "C" "3"
[4,] "D" "4"
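Note that apply converts the data frame to a matrix first, so every value comes back as character, as in the output above. If you want a list, lapply returns one directly and may be the better choice.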
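A small sketch of how the two results differ in type, using the question's dummy data (with stringsAsFactors = FALSE added as an assumption so that column a stays character). The names by_lapply and by_apply are hypothetical.

```r
df <- data.frame(a = LETTERS[1:4], b = 1:4, stringsAsFactors = FALSE)

# lapply works column by column and keeps each column's own type
by_lapply <- lapply(df, unique)

# apply coerces the data frame to a matrix first, here a character matrix
by_apply <- apply(df, 2, unique)

class(by_lapply$b)   # the integer column is still integer
class(by_apply)      # a matrix rather than a list
typeof(by_apply)     # all values have become character
```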
Your for loop is almost right, just needs one fix to work:
# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols) {
x = unique(df[[i]])
unique_values_by_col[[i]] = x
}
unique_values_by_col
# $a
# [1] A B C D
# Levels: A B C D
#
# $b
# [1] 1 2 3 4
i is just a character string, the name of a column within df, so unique(i) doesn't make sense.
Anyhow, the most standard way for this task is lapply() as shown by demirev.
Could this be what you're trying to do?
Map(unique,df)
Result:
$a
[1] A B C D
Levels: A B C D
$b
[1] 1 2 3 4

How to get a named list element in R if the appearance of the element is conditional?

I want to include a list element c in a list L in R and name it C.
The example is as follows:
a=c(1,2,3)
b=c("a","b","c")
c=rnorm(3)
L<-list(A=a,
B=b,
C=c)
print(L)
## $A
## [1] 1 2 3
##
## $B
## [1] "a" "b" "c"
##
## $C
## [1] -2.2398424 0.9561929 -0.6172520
Now I want to introduce a condition on C, so it is only included in
the list if C.bool==T:
C.bool<-T
L<-list(A=a,
B=b,
if(C.bool) C=c)
print(L)
## $A
## [1] 1 2 3
##
## $B
## [1] "a" "b" "c"
##
## [[3]]
## [1] -2.2398424 0.9561929 -0.6172520
Now, however, the list element of c is not being named as specified in
the list statement. What's the trick here?
Edit: The intention is to only include the element in the list if the condition is met (no NULL should be included otherwise). Can this be done within the core definition of the list?
I don't know why you want to do it "without adding C outside the core definition of the list?" but if you're content with two lists in a single c then:
L <- c(list(A=a, B=b), if(C.bool) list(C=c))
If you really want one list but don't mind subsetting after creation then
L <- list(A=a, B=b, C=if(C.bool) c)[c(TRUE, TRUE, C.bool)]
(pace David Arenburg, isTRUE() omitted for brevity)
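The first suggestion works because an if without an else evaluates to NULL when the condition is FALSE, and c() drops NULL arguments. A minimal sketch, assuming a hypothetical helper build_list and a vector called c_vals (renamed from c only to keep the example easy to read):

```r
a <- c(1, 2, 3)
b <- c("a", "b", "c")
c_vals <- rnorm(3)

# Hypothetical helper wrapping the c(list(...), if (...) list(...)) idiom
build_list <- function(include_c) {
  c(list(A = a, B = b), if (include_c) list(C = c_vals))
}

names(build_list(TRUE))   # C is present and named
names(build_list(FALSE))  # C is absent, with no NULL placeholder
```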
You can try this if you want to keep the names. Note that when the condition is FALSE, the list still contains an element C, with value NULL:
L2 <-list(A=a,
B=b,
C = if (TRUE) c)
You can of course replace TRUE with the statement containing C.bool
You could place the if statement outside the core definition of the list, like this:
L <- list(A = a, B= b)
if (isTRUE(C.bool)) L$C <- c
#> L
#$A
#[1] 1 2 3
#
#$B
#[1] "a" "b" "c"
#
#$C
#[1] -0.7631459 0.7353929 -0.2085646
(Edit with isTRUE() owing to the comment by @DavidArenburg)
As a combination of the previous answers by @MamounBenghezal and @user20637
and the comment made by @DavidArenburg, I would suggest this generalized
version that does not depend on the length of the list:
L <- Filter(Negate(is.null),
x = list(A = a, B = b, C = if (isTRUE(C.bool)) c, D = "foo"))
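To illustrate how the Filter version behaves for different values of the flag, here is a sketch with a hypothetical wrapper build and placeholder values. isTRUE() also treats NA as "do not include".

```r
# Hypothetical wrapper around the Filter(Negate(is.null), ...) idiom
build <- function(flag) {
  Filter(Negate(is.null),
         list(A = 1:3,
              B = letters[1:3],
              C = if (isTRUE(flag)) "included",
              D = "foo"))
}

names(build(TRUE))   # all four elements are kept
names(build(FALSE))  # C is dropped; D keeps its name
names(build(NA))     # NA is not TRUE, so C is dropped as well
```

One consequence of this design is that any element that is NULL for other reasons is removed too, which may or may not be wanted.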
