combine list elements based on element names - r

How can I combine this list of vectors by element name?
L1 <- list(F01=c(1,2,3,4),F02=c(10,20,30),F01=c(5,6,7,8,9),F02=c(40,50))
So as to get:
results <- list(F01=c(1,2,3,4,5,6,7,8,9),F02=c(10,20,30,40,50))
I tried to apply the solution from "merge lists by elements names", but I can't figure out how to adapt it to my situation.

sapply(unique(names(L1)), function(x) unname(unlist(L1[names(L1)==x])), simplify=FALSE)
$F01
[1] 1 2 3 4 5 6 7 8 9
$F02
[1] 10 20 30 40 50

You can achieve the same result with the map function from purrr:
library(purrr)
map(unique(names(L1)), ~ flatten_dbl(L1[names(L1) == .x])) %>%
  set_names(unique(names(L1)))
The first line merges the elements that share a name, and the last line names the new list accordingly.
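For comparison, here is a base R sketch of the same grouping idea without purrr: flatten all values once, then split them by a name vector repeated to match each element's length. It assumes every element is an atomic vector of the same type. Note that split() orders the groups by sorted name rather than by first appearance.

```r
L1 <- list(F01 = c(1, 2, 3, 4), F02 = c(10, 20, 30),
           F01 = c(5, 6, 7, 8, 9), F02 = c(40, 50))

# One name per value: repeat each element's name by that element's length
grp <- rep(names(L1), lengths(L1))

# Flatten the values (dropping names) and regroup them by name
combined <- split(unlist(L1, use.names = FALSE), grp)
str(combined)
```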

Related

Select the first element from the first list, the second element from the second list, and so on, in a nested list

Let's say I have a list like this:
lst <- list(list(1,2,3),list(4,5,6),list(7,8,9))
I would then like to extract the elements 1, 5, and 9. How should I do that in an efficient manner? I've come across this post; Select first element of nested list, where it is suggested that one should use:
lapply(x, '[[', 1)
to select the first element of a nested list. I was wondering if something similar could be done in the situation described above?
You can use sapply over the indices of the list, with a function that subsets the list at each index, as below:
sapply(seq_along(lst), function(x) lst[[x]][[x]])
Also:
mapply('[[', lst, seq_along(lst))
[1] 1 5 9
The code below may work only for your specific example:
> unlist(diag(do.call(cbind,lst)))
[1] 1 5 9
Use
library(purrr)
imap_dbl(lst, ~ pluck(.x, .y))
[1] 1 5 9
Or more compactly
imap_dbl(lst, pluck)
[1] 1 5 9
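If you want to stay in base R and still get a type-checked numeric vector, vapply is one option. This is a minimal sketch, assuming each sublist has at least as many elements as its own position and that the selected elements are single numbers:

```r
lst <- list(list(1, 2, 3), list(4, 5, 6), list(7, 8, 9))

# For position i, take the i-th element of the i-th sublist;
# numeric(1) makes vapply fail loudly if a result is not a single number
diag_elems <- vapply(seq_along(lst), function(i) lst[[i]][[i]], numeric(1))
diag_elems
```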

R use of lapply() to populate and name one column in list of dataframes

After searching for some time, I cannot find a smooth R-esque solution.
I have a list of vectors that I want to convert to dataframes and add a column with the names of the vectors. I can't do this with cbind() and melt() to a single dataframe because the vectors have different numbers of rows.
Basic example would be:
list<-list(a=c(1,2,3),b=c(4,5,6,7))
var<-"group"
What I have come up with and works is:
list<-lapply(list, function(x) data.frame(num=x,grp=""))
for (j in 1:length(list)){
list[[j]][,2]<-names(list[j])
names(list[[j]])[2]<-var
}
But I am trying to better use lapply() and have cleaner coding practices. Right now I rely so heavily on for and if statements, which a lot of the base functions do already and much more efficiently than I can code at this point.
The pseudo code I would like is something like:
list<-lapply(list, function(x) data.frame(num=x,get(var)=names(x))
Is there a clean way to get this done?
Second closely related question, if I already have a list of dataframes, why is it so hard to reassign column values and names using lapply()?
So using something like:
list<-list(a=data.frame(num=c(1,2,3),grp=""),b=data.frame(num=c(4,5,6,7),grp=""))
var<-"group"
#pseudo code
list<-lapply(list, function(x) x[,2]<-names(x)) #populate second col with name of df[x]
list<-lapply(list, function(x) names[[x]][2]<-var) #set 2nd col name to 'var'
The first line of pseudo code throws an error about matching row lengths. Why does lapply() not just loop over and repeat names(x) like the same function on a single dataframe does in a for loop?
For the second line, as I understand it I can use setNames() to reassign all the column names, but how do I make this work for just one of the col names?
Many thanks for any ideas or pointing to other threads that cover this and helping me understand the behavior of lapply() in this context.
A full base R approach without using loops:
> l<-list(a=c(1,2,3),b=c(4,5,6,7))
> data.frame(grp=rep(names(l), lengths(l)), num=unlist(l), row.names = NULL)
grp num
1 a 1
2 a 2
3 a 3
4 b 4
5 b 5
6 b 6
7 b 7
Regarding your first (main) question, you can use the function enframe from the tibble package for this purpose:
library(tibble)
library(tidyr)
library(dplyr)
l<-list(a=c(1,2,3),b=c(4,5,6,7))
l %>%
enframe(name = "group", value="value") %>%
unnest(value) %>%
group_split(group)
Try this:
library(dplyr)
mylist <- list(a = c(1,2,3), b = c(4,5,6,7))
bind_rows(lapply(names(mylist), function(x) tibble(grp = x, num = mylist[[x]])))
# A tibble: 7 x 2
grp num
<chr> <dbl>
1 a 1
2 a 2
3 a 3
4 b 4
5 b 5
6 b 6
7 b 7
This is essentially an lapply-based solution where you iterate over the names of your list rather than over the list elements themselves. If you prefer to do everything in base R, note that the above is equivalent to
do.call(rbind, lapply(names(mylist), function(x) data.frame(grp = x, num = mylist[[x]], stringsAsFactors = FALSE)))
Having said that, tibbles, as a modern implementation of data frames, are preferred, as is bind_rows over the do.call(rbind, ...) construct.
As to the second question, note the following:
lapply(mylist, function(x) str(x))
num [1:3] 1 2 3
num [1:4] 4 5 6 7
....
lapply(mylist, function(x) names(x))
$a
NULL
$b
NULL
What you see here is that the function inside of lapply gets the elements of mylist. In this case, it gets to work with the numeric vector. As far as the function called inside lapply is concerned, this vector does not have any name. To highlight this, consider the following:
names(c(1,2,3))
NULL
Which is the same: the vector c(1,2,3) does not have a name attribute.
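Because the function inside lapply never sees the element's name, one way to get the data frames from the original question is to pass the names in alongside the values, for example with Map. This is only a sketch: the variable names vecs and dfs are mine, and var is the column name from the question.

```r
vecs <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7))
var <- "group"

# Map walks over the vectors and their names in parallel
dfs <- Map(function(x, nm) {
  df <- data.frame(num = x, grp = nm)  # nm is recycled to length(x)
  names(df)[2] <- var                  # rename only the second column
  df                                   # return the data frame itself
}, vecs, names(vecs))

dfs$b
```

The final df line matters for the second question too: an anonymous function returns the value of its last expression, so a function whose body is only an assignment returns the assigned value instead of the modified data frame.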

Insert i th vector number into data frame column name - R

This is likely a quick fix! I am trying to place the ith position of my vector into my data frame column name. I am trying to use paste0 to enter the ith number.
sma <- 2:20
> sma
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# Place i number from sma vector to data frame column name
spx.sma <- df$close.sma.paste0("n", sma[i])
Column name should read:
"close.sma.n2"
If I print
paste0("n", sma[i])
I obtain:
> paste0("n", sma[i])
[1] "n2"
So really, if I paste this into my data frame column name, then it should read:
close.sma.n2
What is the correct method to achieve this?
I get the error:
> spx.sma <- df$close.sma.paste0(".n", sma[i])
Error: attempt to apply non-function
You should treat the dataframe as a list: avoid the "$" operator and use [[]] instead, which accepts a computed name.
So:
spx.sma <- df[[paste0("close.sma.n", sma[i])]]
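The same [[ ]] indexing also works for creating the columns in the first place. This is a minimal sketch, assuming a data frame df with a numeric close column (the values here are made up) and a simple trailing moving average computed with stats::filter:

```r
df <- data.frame(close = c(10, 11, 12, 13, 14))  # made-up prices
sma <- 2:3

for (i in seq_along(sma)) {
  n <- sma[i]
  col <- paste0("close.sma.n", n)  # e.g. "close.sma.n2"
  # Trailing n-period mean; the first n - 1 rows are NA
  df[[col]] <- as.numeric(stats::filter(df$close, rep(1 / n, n), sides = 1))
}

# Read a column back by its computed name
spx.sma <- df[[paste0("close.sma.n", sma[1])]]
```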

Filtering List of Objects in R through lapply

How does an lapply call have to be structured to pull out specific objects by index? I have a list of lists. I now want to get the 2nd, 4th and 5th element of every inner list and put them into a data frame. I thought the easiest way would be to use lapply and simply get the entries like this:
list <- lapply(ll, function(x) { x[[2]]; x[[4]]; x[[5]] })
But that doesn't seem to work.
this will work:
ll <- list(as.list(1:10),
as.list(11:20),
as.list(21:30))
library(magrittr)
output1 <- ll %>% sapply(function(x){c(x[[2]],x[[4]],x[[5]])}) %>% t %>% as.data.frame
# or with base syntax:
output2 <- as.data.frame(t(sapply(ll,function(x){c(x[[2]],x[[4]],x[[5]])})))
# V1 V2 V3
# 1 2 4 5
# 2 12 14 15
# 3 22 24 25
Your function returns the result of its last expression, which in your case is `x[[5]]`. The two operations before it are lost.
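To see the difference, compare the original function body with one that combines the three values before returning them. A small sketch, rebuilding the same ll so it runs on its own:

```r
ll <- list(as.list(1:10), as.list(11:20), as.list(21:30))

# Only the value of the last expression, x[[5]], is returned
last_only <- lapply(ll, function(x) { x[[2]]; x[[4]]; x[[5]] })

# Combining the values with c() returns all three
all_three <- lapply(ll, function(x) c(x[[2]], x[[4]], x[[5]]))

last_only[[1]]
all_three[[1]]
```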
Not sure what you want this data.frame to look like, but you can extract the 2nd, 4th, and 5th elements with
lapply(ll, `[`, c(2,4,5))
and if you wanted to turn those into rows, you could do
do.call("rbind",lapply(ll, `[`, c(2,4,5)))
If you wanted them to become columns, you could do
data.frame(sapply(ll, `[`, c(2,4,5)))

Change multiple dataframes in a loop

I have, for example, these three datasets (in my case, there are many more, with a lot of variables):
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
To each data frame I want to add a variable resulting from a transformation of an existing variable in that data frame. I would like to do this with a loop. For example:
datasets <- c("data_frame1","data_frame2","data_frame3")
vars <- c("a","b","c")
for (i in datasets){
for (j in vars){
# here I need a code that create a new variable with transformed values
# I thought this would work, but it didn't...
get(i)$new_var <- log(get(i)[,j])
}
}
Do you have some valid suggestions about that?
Moreover, it would be great if it were also possible to assign the new column names (in this case new_var) from a character string, so I could create the new variables with another for loop nested in the other two.
I hope my explanation of the problem is not too tangled.
Thanks in advance.
You can put your dataframes in a list and use lapply to process them one by one, so there is no need for an explicit loop in this case.
For example, you can do this:
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
ll <- list(data_frame1,data_frame2,data_frame3)
lapply(ll,function(df){
df$log_a <- log(df$a) ## new column with the log a
df$tans_col <- df$a+df$b+df$c ## new column with sums of some columns or any other
## transformation
### .....
df
})
The first data frame becomes:
[[1]]
a b c log_a tans_col
1 1 3 4 0.0000000 8
2 5 6 4 1.6094379 15
3 3 1 1 1.0986123 5
4 3 5 9 1.0986123 17
5 2 5 2 0.6931472 9
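To also cover the part of the question about naming new columns from character strings, here is one possible sketch. It assumes a named list of data frames and builds each new column name with paste0; the log_ prefix is just an illustrative choice.

```r
data_frame1 <- data.frame(a = c(1, 5, 3, 3, 2), b = c(3, 6, 1, 5, 5), c = c(4, 4, 1, 9, 2))
data_frame2 <- data.frame(a = c(6, 0, 9, 1, 2), b = c(2, 7, 2, 2, 1), c = c(8, 4, 1, 9, 2))
ll <- list(data_frame1 = data_frame1, data_frame2 = data_frame2)
vars <- c("a", "b", "c")

ll <- lapply(ll, function(df) {
  for (v in vars) {
    # [[ ]] accepts a computed column name, unlike $
    df[[paste0("log_", v)]] <- log(df[[v]])
  }
  df  # return the modified copy
})

names(ll$data_frame1)
```

Keep in mind that lapply works on copies, so the result has to be assigned back (here to ll) for the new columns to persist.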
I had the same need and wanted to change also the columns in my actual list of dataframes.
I found a great method here (the purrr::map2 method in the question works for dataframes with different columns), followed by
list2env(list_of_dataframes ,.GlobalEnv)
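Note that list2env needs a named list, since the names become the variable names. A minimal sketch, using a fresh environment instead of .GlobalEnv so that nothing in the workspace gets overwritten (the list contents are made up):

```r
list_of_dataframes <- list(
  data_frame1 = data.frame(a = 1:2),
  data_frame2 = data.frame(a = 3:4)
)

# Each list element becomes a variable named after it
env <- new.env()
list2env(list_of_dataframes, envir = env)

ls(env)
```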
