R: Merge lists containing vectors and scalars - r

I have N lists that have some identical element names. Here is an MWE with two lists:
ls <- list()
ls[[1]] <- list("a"=1:2,
"b"=20,
"c"=numeric(0))
names(ls[[1]]$a) <- c("a1", "a2")
ls[[2]] <- list("a"=3:4,
"b"=30,
"c"=1:4,
"d"="f")
names(ls[[2]]$a) <- c("a1", "a2")
Is it possible to merge these into a resulting list lsRes, where lsRes has the following properties:
lsRes$a contains two elements, where the first is the named vector c(1,2) (with names c(a1, a2)) and the second is the named vector c(3,4) (with names c(a1, a2))
lsRes$b contains two elements, where the first is 20 and the second is 30
lsRes$c contains two elements, where the first is numeric(0) and the second is 1:4
lsRes$d contains two elements, where the first is NA and the second is "f"
I looked at this and this, but they describe different cases

Assuming that the output should also be a list, we collect the unique names across all the lists and then, in each list, add any names that are missing and set them to NA:
nm1 <- unique(unlist(sapply(ls, names)))
lsRes <- lapply(ls, function(x) {x[setdiff(nm1, names(x))] <- NA; x})
lengths(lsRes)
#[1] 4 4
If we need to have a list of 4 elements, then use transpose
library(purrr)
lsRes %>%
transpose
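As a rough base R sketch of the same idea without purrr: the padding step is the one from the answer above, while the final `lapply` over `nm1` is only an illustrative stand-in for `transpose`, and the name `padded` is introduced here for the example.

```r
# Example data from the question
ls <- list(
  list(a = c(a1 = 1L, a2 = 2L), b = 20, c = numeric(0)),
  list(a = c(a1 = 3L, a2 = 4L), b = 30, c = 1:4, d = "f")
)

# Union of all element names
nm1 <- unique(unlist(lapply(ls, names)))

# Pad every list so that it has all names, using NA for missing ones
padded <- lapply(ls, function(x) {
  x[setdiff(nm1, names(x))] <- NA
  x
})

# For each name, collect that element from every padded list
lsRes <- lapply(setNames(nm1, nm1), function(n) lapply(padded, `[[`, n))

str(lsRes$d)
```

Each element of `lsRes` is itself a list with one entry per input list, so `lsRes$c[[1]]` stays `numeric(0)` rather than being dropped.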

Related

turning lists of lists of lists into a dataframe

I have a set of lists whose names are stored in all_list.
all_list=c("LIST1","LIST2")
From these, I would like to create a data frame such that
LISTn$findings${Coli}$character is entered into the n'th column with rowname from LISTn$rowname.
DATA
LIST1=list()
LIST1[["findings"]]=list(s1a=list(character="a1",number=1,string="a1type",exp="great"),
s1b=list(number=2,string="b1type"),
in2a=list(character="c1",number=3,string="c1type"),
del3b=list(character="d1",number=4,string="d1type"))
LIST1[["rowname"]]="Row1"
LIST2=list()
LIST2[["findings"]]=list(s1a=list(character="a2",number=5,string="a2type",exp="great"),
s1b=list(character="b2",number=6,string="b2type"),
in2a=list(character="c2",number=7,string="c2type"),
del3b=list(character="d2",number=8,string="d2type"))
LIST2[["rowname"]]="Row2"
Please note that some characters are missing for which NA would suffice.
Desired output is this data frame:
     s1a s1b in2a del3b
Row1 a1  NA  c1   d1
Row2 a2  b2  c2   d2
There are about 1000 of these lists, so speed is a factor. Each list is about 50 MB after I load it through rjson::fromJSON(file=x).
The row and column names don't follow a particular pattern. They're names and attributes
We can use a couple of lapply/sapply combinations to loop over the nested list, extract the "character" element of each finding, and replace missing (NULL) values with NA:
do.call(rbind, lapply(mget(all_list), function(x)
sapply(lapply(x$findings, `[[`, "character"),
function(y) if (is.null(y)) NA_character_ else y)))
Alternatively, set all the names to a single value and then extract all of those elements:
do.call(rbind, lapply(mget(all_list), function(x) {
x1 <- setNames(x$findings, rep("Row", length(x$findings)) )
sapply(x1[names(x1)== "Row"], function(y)
pmin(NA, y$character[1], na.rm = TRUE)[1])}))
purrr has a handy function called map_chr that is built for these tasks:
library(purrr)
sapply(mget(all_list), function(x) purrr::map_chr(x$findings, "character", .default = NA_character_)) %>%
t %>%
data.frame
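For comparison, here is a minimal base R sketch that also takes the row names from each list's rowname element. It assumes every list has the same finding names in the same order; the helper `get_character` and the cut-down example data are made up for illustration.

```r
# Cut-down versions of the lists from the question
LIST1 <- list(
  findings = list(
    s1a  = list(character = "a1", number = 1),
    s1b  = list(number = 2),                     # no "character" entry
    in2a = list(character = "c1", number = 3)
  ),
  rowname = "Row1"
)
LIST2 <- list(
  findings = list(
    s1a  = list(character = "a2", number = 5),
    s1b  = list(character = "b2", number = 6),
    in2a = list(character = "c2", number = 7)
  ),
  rowname = "Row2"
)
all_lists <- list(LIST1, LIST2)

# Hypothetical helper: return the "character" entry, or NA if it is absent
get_character <- function(finding) {
  value <- finding[["character"]]
  if (is.null(value)) NA_character_ else value
}

# One named character vector per list, stacked row-wise
rows <- lapply(all_lists, function(x) vapply(x$findings, get_character, character(1)))
res <- do.call(rbind, rows)
rownames(res) <- vapply(all_lists, function(x) x$rowname, character(1))
res <- as.data.frame(res, stringsAsFactors = FALSE)
res
```

`vapply` is used instead of `sapply` so that a finding with an unexpected shape raises an error instead of silently changing the result type.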

combine multiple dataframes based on sequence of names

Say I have 30 dataframes, all named with a date from 01/01/2000 to 30/01/2000 in the format ddmmyy (code below):
Season <- seq(as.Date("2000-01-01"),as.Date("2000-01-30"),1)
Season <- format(Season,"%d%m%y")
for (s in Season) {
df <- data.frame(X=1:10, Y=1:10)
aa <- paste(s,"tests",s ,sep = "_")
assign(aa,df)
}
Each name, as you can see, has the word tests added to it. I want to combine (rbind?) the data.frames based on the date. In this case, combine the data.frames that contain the dates from 01-01-00 to 10-01-00.
I have the code below to combine all dataframes, but what if I only want to select the ones described above?
All_dfs <- do.call(rbind, eapply(.GlobalEnv,function(x) if(is.data.frame(x)) x))
Is it better to create a list first?
We can use mget to collect the data.frames named after 'Season' in a list and then rbind that list. As each object name is the "Season" value followed by "_tests_" and the same "Season" value again, we can build the names with paste0 and pass them to mget.
res <- do.call(rbind, mget( paste0(Season[1:10], "_tests_", Season[1:10])))
dim(res)
#[1] 100 2
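On the question of whether a list is better: one possible sketch is to keep the data.frames in a named list from the start and select by parsed date instead of by position. The names `dfs`, `dates`, and `keep` are made up for this example.

```r
Season <- format(seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by = 1), "%d%m%y")

# Build the data.frames directly in a named list instead of using assign()
dfs <- lapply(Season, function(s) data.frame(X = 1:10, Y = 1:10))
names(dfs) <- paste(Season, "tests", Season, sep = "_")

# Parse the leading ddmmyy part of each name back into a Date
dates <- as.Date(substr(names(dfs), 1, 6), format = "%d%m%y")

# Keep only the data.frames dated 1 to 10 January 2000 and stack them
keep <- dates >= as.Date("2000-01-01") & dates <= as.Date("2000-01-10")
res <- do.call(rbind, dfs[keep])
dim(res)
```

Selecting on `dates` rather than `Season[1:10]` keeps working if the objects are not created in date order.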

R: Merging lists of data frames

I'm a total noob at R and I've tried (and retried) to search for an answer to the following problem, but I've not been able to get any of the proposed solutions to do what I'm interested in.
I have two lists of named elements, with each element pointing to data frames with identical layouts:
(EDIT)
df1 <- data.frame(A=c(1,2,3),B=c("A","B","C"))
df2 <- data.frame(A=c(98,99),B=c("Y","Z"))
lst1 <- c(X=df1,Y=df2)
df3 <- data.frame(A=c(4,5),B=c("D","E"))
lst2 <- c(X=df3)
(EDIT 2)
So it seems like storing multiple data frames in a list is a bad idea, as it will convert the data frames to lists. So I'll go out looking for an alternative way to store a set of named data frames.
In general, the names of the elements in the two lists might overlap partially, completely, or not at all.
I'm looking for a way to merge the two lists into a single list:
<some-function-sequence>(lst1, lst2)
->
c(X=rbind(df1,df3),Y=df2)
-resulting in something like this:
[EDIT: Syntax changed to correctly reflect desired result (list-of-data frames)]
$X
A B
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
$Y
A B
1 98 Y
2 99 Z
I.e:
IF the lists contain identical element names, each pointing to a data frame, THEN I want to 'rbind' the rows from these two data frames and assign the resulting data frame to the same element name in the resulting list.
Otherwise the element names and data frames from both lists should just be copied into the resulting list.
I've tried the solutions from a number of discussions such as:
Can I combine a list of similar dataframes into a single dataframe?
Combine/merge lists by elements names
Simultaneously merge multiple data.frames in a list
Combine/merge lists by elements names (list in list)
Convert a list of data frames into one data frame
-but I've not been able to find the right solution. A general problem seems to be that the data frame ends up being converted into a list by the application of 'mapply/sapply/merge/...' - and usually also sliced and/or merged in ways which I am not interested in. :)
Any help with this will be much appreciated!
[SOLUTION]
The solution seems to be to change the use of c(...) when collecting data frames to list(...) after which the solution proposed by Pierre seems to give the desired result.
Here is a proposed solution using split and c to combine like terms. Please read the caveat at the bottom:
s <- split(c(lst1, lst2), names(c(lst1,lst2)))
lapply(s, function(lst) do.call(function(...) unname(c(...)), lst))
# $X.A
# [1] 1 2 3 4 5
#
# $X.B
# [1] "A" "B" "C" "D" "E"
#
# $Y.A
# [1] 98 99
#
# $Y.B
# [1] "Y" "Z"
This solution relies on the strings NOT being stored as factors. It will not throw an error otherwise, but the factors will be converted to numbers. Below I show how I transformed the data to remove factors. Let me know if you require factors:
df1 <- data.frame(A=c(1,2,3),B=c("A","B","C"), stringsAsFactors=FALSE)
df2 <- data.frame(A=c(98,99),B=c("Y","Z"), stringsAsFactors=FALSE)
lst1 <- c(X=df1,Y=df2)
df3 <- data.frame(A=c(4,5),B=c("D","E"), stringsAsFactors=FALSE)
lst2 <- c(X=df3)
If the data frames are collected with list(...) instead of c(...), so that each element is still a data frame, we can use:
lapply(split(c(lst1, lst2), names(c(lst1,lst2))), function(lst) do.call(rbind, lst))
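Put together as a self-contained sketch, with the lists built by `list(...)` as the question's [SOLUTION] note suggests. The names `combined` and `merged`, and the resetting of row names, are additions for this example.

```r
df1 <- data.frame(A = c(1, 2, 3), B = c("A", "B", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(A = c(98, 99), B = c("Y", "Z"), stringsAsFactors = FALSE)
df3 <- data.frame(A = c(4, 5), B = c("D", "E"), stringsAsFactors = FALSE)

# list(), not c(): c() would flatten each data frame into its columns
lst1 <- list(X = df1, Y = df2)
lst2 <- list(X = df3)

# Group the data frames by element name, then rbind within each group
combined <- c(lst1, lst2)
merged <- lapply(split(combined, names(combined)), function(dfs) {
  out <- do.call(rbind, unname(dfs))
  rownames(out) <- NULL   # plain 1..n row names
  out
})
merged
```

Names that occur in only one list pass through unchanged, because `rbind` of a single data frame returns that data frame.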
The following solution is probably not the most efficient way. However, if I got your problem right this should work ;)
# Example data
# Some vectors
a <- 1:5
b <- 3:7
c <- rep(5, 5)
d <- 5:1
# Some dataframes, data1 and data3 have identical column names
data1 <- data.frame(a, b)
data2 <- data.frame(c, b)
data3 <- data.frame(a, b)
data4 <- data.frame(c, d)
# 2 lists
list1 <- list(data1, data2)
list2 <- list(data3, data4)
# Loop that checks the column names and rbinds dataframes with the same column names
final_list <- list1
used_lists <- numeric()
for(i in seq_along(list1)) {
  for(j in seq_along(list2)) {
    # identical() also copes with dataframes that have different numbers of columns
    if(identical(colnames(list1[[i]]), colnames(list2[[j]]))) {
      final_list[[i]] <- rbind(list1[[i]], list2[[j]])
      used_lists <- c(used_lists, j)
    }
  }
}
# Adding the other dataframes, which did not have the same column names
for(i in seq_along(list2)) {
  if(!(i %in% used_lists)) {
    final_list[[length(final_list) + 1]] <- list2[[i]]
  }
}
# Final list, which includes all other lists
final_list

lapply: extract specific element

I have a list of subsets obtained through:
lapply(1:5, function(x) combn(5,x))
I would like to extract a specific vector from this list. For example, the 16th element of this list, which is (1,2,3). Any hints? Thanks.
The command produces all the non-empty subsets of (1,2,3,4,5), that is, 2^5 - 1 = 31 subsets, the 16th being (1,2,3). I want to know how to extract this by using its position (16th).
We could split each matrix into a list of column vectors (split), concatenate (c) the results to flatten the list, and then subset using the numeric index.
lst2 <- do.call(`c`,lapply(lst, function(x) split(x, col(x))))
lst2[[16]]
#[1] 1 2 3
Or, instead of splitting the matrix output, we could use the FUN argument of combn to create a list and then concatenate with c using do.call:
lst <- do.call(`c`,lapply(1:5, function(x) combn(5, x, FUN=list)))
lst[[16]]
#[1] 1 2 3
Or, instead of do.call(c, ..), we can use unlist with recursive=FALSE (contributed by @Marat Talipov):
lst <- unlist(lapply(1:5, function(x)
combn(5, x, FUN=list)), recursive=FALSE)
data
lst <- lapply(1:5, function(x) combn(5,x))
I would rather consider producing the right data instead of looping again on them :)
lst = Reduce('c', lapply(1:5, function(x) as.list(data.frame(combn(5,x)))))
lst[[16]]
#[1] 1 2 3
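As a quick check of the positions, here is a small sketch that uses combn's `simplify = FALSE` argument, which also returns a list per call. The names `subsets` and `pos` are made up for the example.

```r
# Flat list of all non-empty subsets of 1:5, ordered by size
subsets <- unlist(
  lapply(1:5, function(k) combn(5, k, simplify = FALSE)),
  recursive = FALSE
)

length(subsets)   # 5 + 10 + 10 + 5 + 1 subsets; the empty set is not generated
subsets[[16]]     # first subset of size 3

# Reverse lookup: at which position does the subset (1, 2, 3) sit?
pos <- which(vapply(subsets,
                    function(s) length(s) == 3 && all(s == 1:3),
                    logical(1)))
pos
```

The 16th entry is the first subset of size 3 because it is preceded by 5 subsets of size 1 and 10 of size 2.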

rename the columns name after cbind the data

merger <- cbind(as.character(Date),weather1$High,weather1$Low,weather1$Avg..High,weather1$Avg.Low,sale$Scanned.Movement[a])
After cbind-ing the data, the new DF automatically gets the column names V1, V2, ...
I want to rename the first column with
colnames(merger)[,1] <- "Date"
but this fails. And when I use merger$V1, I get:
Error in merger$V1 : $ operator is invalid for atomic vectors
You can also name columns directly in the cbind call, e.g.
cbind(date=c(0,1), high=c(2,3))
Output:
     date high
[1,]    0    2
[2,]    1    3
Try:
colnames(merger)[1] <- "Date"
Example
Here is a simple example:
a <- 1:10
b <- cbind(a, a, a)
colnames(b)
# change the first one
colnames(b)[1] <- "abc"
# change all colnames
colnames(b) <- c("aa", "bb", "cc")
You gave the following example in your question:
colnames(merger)[,1]<-"Date"
The problem is the comma: colnames() returns a vector, not a matrix, so the solution is:
colnames(merger)[1]<-"Date"
If you pass only vectors to cbind() it creates a matrix, not a dataframe. Read ?data.frame.
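A small sketch of that difference, using made-up vectors `day`, `high`, and `low` rather than the question's data:

```r
day  <- c("Mon", "Tue")
high <- c(30, 32)
low  <- c(20, 21)

# Only vectors go in, so cbind() returns a matrix; mixing character and
# numeric vectors coerces everything to character
m <- cbind(day, high, low)
is.matrix(m)
typeof(m)

# colnames() is a plain character vector, so index it with [1], not [, 1]
colnames(m)[1] <- "Date"
colnames(m)

# data.frame() keeps each column's type and supports the $ operator
df <- data.frame(Date = day, High = high, Low = low, stringsAsFactors = FALSE)
df$High
```

With the matrix `m`, `m$Date` fails with the same "$ operator is invalid for atomic vectors" error as in the question, while `m[, "Date"]` works.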
A way of producing a data.frame and being able to do this in one line is to coerce all matrices/data frames passed to cbind into a data.frame while setting the column names attribute using setNames:
a = matrix(rnorm(10), ncol = 2)
b = matrix(runif(10), ncol = 2)
cbind(setNames(data.frame(a), c('n1', 'n2')),
setNames(data.frame(b), c('u1', 'u2')))
which produces:
n1 n2 u1 u2
1 -0.2731750 0.5030773 0.01538194 0.3775269
2 0.5177542 0.6550924 0.04871646 0.4683186
3 -1.1419802 1.0896945 0.57212043 0.9317578
4 0.6965895 1.6973815 0.36124709 0.2882133
5 0.9062591 1.0625280 0.28034347 0.7517128
Unfortunately, there is no setColNames function analogous to setNames that sets the column names of a matrix or data frame and returns the object. However, there is nothing to stop you from adapting the code of setNames to produce one:
setColNames <- function (object = nm, nm) {
colnames(object) <- nm
object
}
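A brief usage sketch of such a helper. The function is redefined here so the snippet runs on its own, and the input values are made up.

```r
# Same idea as the helper above: set the column names and return the object
setColNames <- function(object, nm) {
  colnames(object) <- nm
  object
}

# Name the columns of a cbind() result in a single expression
m <- setColNames(cbind(1:2, 3:4), c("low", "high"))
colnames(m)

# It works for data frames as well
d <- setColNames(data.frame(1:2, 3:4), c("low", "high"))
names(d)
```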
See this answer, the magrittr package contains functions for this.
If you offer cbind a set of arguments all of which are vectors, you will get not a dataframe but a matrix, in this case an all-character matrix. The two have different features. You can get a dataframe if some of your arguments remain dataframes. Try:
merger <- cbind(Date =as.character(Date),
weather1[ , c("High", "Low", "Avg..High", "Avg.Low")] ,
ScnMov =sale$Scanned.Movement[a] )
It's easy: just add the name you want to use, in quotes, before the vector you are adding:
a_matrix <- cbind(b_matrix,'Name-Change'= c_vector)
