bind list of lists based on the lists location - r

I have some data similar to mainList below.
List of 2
$ :List of 3
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "1" "2" "3"
.. .. ..$ col2: chr [1:3] "a" "b" "c"
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "3" "7" "4"
.. .. ..$ col2: chr [1:3] "e" "d" "g"
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "2" "7" "4"
.. .. ..$ col2: chr [1:3] "l" "o" "i"
$ :List of 3
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "8" "3" "4"
.. .. ..$ col2: chr [1:3] "r" "t" "q"
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "7" "5" "2"
.. .. ..$ col2: chr [1:3] "h" "w" "p"
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "9" "3" "6"
.. .. ..$ col2: chr [1:3] "x" "y" "z"
I want to merge, or bind, the lists based on each list's position in the list of lists.
That is, I want to merge splt1 with splt11, then splt2 with splt22, and finally splt3 with splt33.
So it would take the first data frame from the first List of 3 and merge it with the first data frame from the second List of 3.
This does not give me what I want:
mainList %>%
map(., ~bind_rows(., .id = "split"))
It fails because all of the splits are merged into a single data frame, whereas I want them kept separate.
Data:
splt1 <- list(
data.frame(
col1 = c("1", "2", "3"),
col2 = c("a", "b", "c")
)
)
splt2 <- list(
data.frame(
col1 = c("3", "7", "4"),
col2 = c("e", "d", "g")
)
)
splt3 <- list(
data.frame(
col1 = c("2", "7", "4"),
col2 = c("l", "o", "i")
)
)
nestList1 <- list(
splt1,
splt2,
splt3
)
splt11 <- list(
data.frame(
col1 = c("8", "3", "4"),
col2 = c("r", "t", "q")
)
)
splt22 <- list(
data.frame(
col1 = c("7", "5", "2"),
col2 = c("h", "w", "p")
)
)
splt33 <- list(
data.frame(
col1 = c("9", "3", "6"),
col2 = c("x", "y", "z")
)
)
nestList2 <- list(
splt11,
splt22,
splt33
)
mainList <- list(
nestList1,
nestList2
)
EDIT:
Screenshot of the lists:
I am trying to bind together all of the splits, i.e.
split1 will contain the results from 08001, 08003, 08005 ... 0801501 for each of the lists in catalunya_madrid.
split2 will contain the same results 08001, 08003, 08005 ... 0801501
and so on.
EDIT2:
# Function to invert the list structure
invertListStructure <- function(ll) {
nms <- unique(unlist(lapply(ll, function(X) names(X))))
ll <- lapply(ll, function(X) setNames(X[nms], nms))
ll <- apply(do.call(rbind, ll), 2, as.list)
lapply(ll, function(X) X[!sapply(X, is.null)])
}
invertedList <- map(analysis, ~invertListStructure(.) %>%
map(., ~bind_rows(.x, .id = "MITMA")))

You can use purrr::transpose() to group list elements with the same location (i.e. the first element in list 1 with the first element in list 2 and list 3 and so on) for any number of lists. In your case, transpose will convert 592 lists of 216 into 216 lists of 592, each properly titled. With transpose, l[[x]][[y]] becomes l[[y]][[x]].
library(tidyverse)
mainList %>% purrr::transpose() %>%
map(function(x) {
flatten(x) %>% bind_rows(.id = 'id')
})
# $splt1
# id col1 col2
# 1 1 1 a
# 2 1 2 b
# 3 1 3 c
# 4 2 8 r
# 5 2 3 t
# 6 2 4 q
#
# $splt2
# id col1 col2
# 1 1 3 e
# 2 1 7 d
# 3 1 4 g
# 4 2 7 h
# 5 2 5 w
# 6 2 2 p
#
# $splt3
# id col1 col2
# 1 1 2 l
# 2 1 7 o
# 3 1 4 i
# 4 2 9 x
# 5 2 3 y
# 6 2 6 z
Note that you only need to flatten if the data.frame is in a list of length 1, by itself. If you have a list of data.frames (as opposed to a list of lists, each of which contains one data.frame, as in your example data), you can ignore the flatten() command and just bind the rows.
Your example dataset doesn't quite match your actual data, but if you make a list of two mainLists, it's closer. These types of operations are heavily dependent on the structure of the data, though, so I can't be sure this is what you need. All you need to do here is add a subscript.
mainList2 <- list(mainList, mainList) # First is Madrid, second is Valencia
# Operations are done on Madrid only
mainList2[[1]] %>%
transpose() %>%
map(function(x) {
flatten(x) %>% bind_rows(.id = 'id')
})
If you want to do this for both elements in mainList2, you can wrap the whole thing in map.
mainList2 %>% map(function(x) {
transpose(x) %>%
map(function(x) {
flatten(x) %>% bind_rows(.id = 'id')
})
})
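To make the index swap concrete, here is a minimal base R sketch of what transposing by position does. The helper name transpose_by_position is hypothetical (it is not part of purrr), and the sketch assumes every inner list has the same length.

```r
# Toy nested list: 2 outer elements, each holding 3 inner elements
l <- list(
  list("a1", "a2", "a3"),
  list("b1", "b2", "b3")
)

# Hypothetical helper: swap the two nesting levels by position.
# Assumes all inner lists have the same length as the first one.
transpose_by_position <- function(ll) {
  lapply(seq_along(ll[[1]]), function(j) {
    lapply(ll, function(inner) inner[[j]])
  })
}

tl <- transpose_by_position(l)
length(tl)      # 3 groups, one per inner position
tl[[3]][[2]]    # same element as l[[2]][[3]]
```

After the swap, each element of tl collects everything that sat at the same position across the outer lists, which is the grouping needed before binding rows.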

You can combine the pairs in the following way:
Map(rbind, unlist(mainList[[1]], recursive = FALSE),
unlist(mainList[[2]], recursive = FALSE))
Or using purrr you can also add an id column easily.
library(purrr)
map2(mainList[[1]] %>% flatten,
mainList[[2]] %>% flatten, dplyr::bind_rows, .id = 'id')
#[[1]]
# id col1 col2
#1 1 1 a
#2 1 2 b
#3 1 3 c
#4 2 8 r
#5 2 3 t
#6 2 4 q
#[[2]]
# id col1 col2
#1 1 3 e
#2 1 7 d
#3 1 4 g
#4 2 7 h
#5 2 5 w
#6 2 2 p
#[[3]]
# id col1 col2
#1 1 2 l
#2 1 7 o
#3 1 4 i
#4 2 9 x
#5 2 3 y
#6 2 6 z
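Both calls above pair up exactly two nested lists. If mainList held more than two, one possible base R generalisation is to hand all of them to Map via do.call. This is only a sketch: the helpers mk and bind_position are made-up names for illustration, and the demo data is smaller than in the question.

```r
# Hypothetical helper to build a "split": a list holding one data frame
mk <- function(v1, v2) list(data.frame(col1 = v1, col2 = v2))

# Three nested lists, each with two splits
mainList <- list(
  list(mk(c("1", "2"), c("a", "b")), mk(c("3", "7"), c("e", "d"))),
  list(mk(c("8", "3"), c("r", "t")), mk(c("7", "5"), c("h", "w"))),
  list(mk(c("0", "9"), c("m", "n")), mk(c("6", "6"), c("u", "v")))
)

# Receives the splits found at one position, unwraps the
# length-1 lists, and row-binds the data frames inside them
bind_position <- function(...) {
  dfs <- unlist(list(...), recursive = FALSE)
  do.call(rbind, dfs)
}

# Equivalent to Map(bind_position, mainList[[1]], mainList[[2]], mainList[[3]])
combined <- do.call(Map, c(list(f = bind_position), mainList))
length(combined)      # one data frame per split position
nrow(combined[[1]])   # rows from all three nested lists
```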


Combining two sets of variable in R

I am very new to R software and appreciate if you can provide some suggestions to combine variables (antibiotic) within common variable (antibiotic_date).
My original data looks like this (3X3 table);
id: 1
antibiotic: a, b, c
antibiotic_date: 2018-01-20, 2018-01-20, 2018-03-04
Is it possible to transform the above data to (3X table);
id: 1
antibiotic: a b, c
antibiotic_date: 2018-01-20, 2018-03-04
Thank you very much for your help.
Looks like you have
df
# id antibiotic antibiotic_date
# 1 1 a 2018-01-20
# 2 1 b 2018-01-20
# 3 1 c 2018-03-04
Use unique in aggregate.
(res1 <- aggregate(. ~ antibiotic_date, df, unique))
# antibiotic_date id antibiotic
# 1 2018-01-20 1 a, b
# 2 2018-03-04 1 c
Where
str(res1)
# 'data.frame': 2 obs. of 3 variables:
# $ antibiotic_date: chr "2018-01-20" "2018-03-04"
# $ id : chr "1" "1"
# $ antibiotic :List of 2
# ..$ : chr "a" "b"
# ..$ : chr "c"
If you need a string rather than a vector of length > 1 make it toString,
(res2 <- aggregate(. ~ antibiotic_date, df, \(x) toString(unique(x))))
# antibiotic_date id antibiotic
# 1 2018-01-20 1 a, b
# 2 2018-03-04 1 c
where:
str(res2)
# 'data.frame': 2 obs. of 3 variables:
# $ antibiotic_date: chr "2018-01-20" "2018-03-04"
# $ id : chr "1" "1"
# $ antibiotic : chr "a, b" "c"
Or paste,
(res3 <- aggregate(. ~ antibiotic_date, df, \(x) paste(unique(x), collapse=' ')))
# antibiotic_date id antibiotic
# 1 2018-01-20 1 a b
# 2 2018-03-04 1 c
where:
str(res3)
# 'data.frame': 2 obs. of 3 variables:
# $ antibiotic_date: chr "2018-01-20" "2018-03-04"
# $ id : chr "1" "1"
# $ antibiotic : chr "a b" "c"
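You can also sort the values before collapsing them, if needed, e.g. toString(sort(unique(x))). The sort has to go inside toString, because sorting the single collapsed string would have no effect.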
Data:
df <- structure(list(id = c(1, 1, 1), antibiotic = c("a", "b", "c"),
antibiotic_date = c("2018-01-20", "2018-01-20", "2018-03-04"
)), class = "data.frame", row.names = c(NA, -3L))
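As a small sketch of the sorting variant, the example below uses a reordered copy of the data (an assumption for illustration, so that the sort has a visible effect) and lists the grouping columns explicitly instead of using the dot.

```r
# Same shape as the data above, but with "b" recorded before "a"
df <- data.frame(
  id = c(1, 1, 1),
  antibiotic = c("b", "a", "c"),
  antibiotic_date = c("2018-01-20", "2018-01-20", "2018-03-04"),
  stringsAsFactors = FALSE
)

# Deduplicate, sort, then collapse to one string per date and id
res <- aggregate(
  antibiotic ~ antibiotic_date + id,
  data = df,
  FUN = function(x) toString(sort(unique(x)))
)
res$antibiotic
# [1] "a, b" "c"
```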

Bind_rows() error: "Argument 1 must have names" // Occurs after tidyverse update

After a global tidyverse update, I noticed a change of behaviour in my code, and after much research I have been unable to solve the issue. Basically, I need to convert a list of elements (including lists) to a data frame.
Here is a reprex:
Data
x <- list(
col1 = list("a", "b", "c", NA),
col2 = list(1, 2, 3, 4),
col3 = list("value1", "value2", "value1", c("value1", "value2")))
Expected behaviour and output (before tidyverse update):
x <- data.frame((sapply(x, c)))
x <- purrr::map_df(x, function(x) sapply(x, function(x) unlist(x))) %>% as.data.frame()
> x
# col1 col2 col3
# 1 a 1 value1
# 2 b 2 value2
# 3 c 3 value1
# 4 <NA> 4 value1, value2
> str(x)
# 'data.frame': 4 obs. of 3 variables:
# $ col1: chr "a" "b" "c" NA
# $ col2: num 1 2 3 4
# $ col3:List of 4
# ..$ : chr "value1"
# ..$ : chr "value2"
# ..$ : chr "value1"
# ..$ : chr "value1" "value2"
Problem encountered after update
x <- data.frame((sapply(x, c)))
x <- purrr::map_df(x, function(x) sapply(x, function(x) unlist(x)))
# Error: Argument 1 must have names.
# Run `rlang::last_error()` to see where the error occurred.
# In addition: Warning message:
# Outer names are only allowed for unnamed scalar atomic inputs
> rlang::last_error()
# <error/rlang_error>
# Argument 1 must have names.
# Backtrace:
# 1. purrr::map_df(x, function(x) sapply(x, function(x) unlist(x)))
# 2. dplyr::bind_rows(res, .id = .id)
# Run `rlang::last_trace()` to see the full context.
This error seems to be well known, and I have explored many options with the purrr::flatten_() family and others found on Stack Overflow, but I was not able to solve it.
Thank you for any help!
The first part of your attempt gives you a list for every column, irrespective of its length.
x <- data.frame((sapply(x, c)))
str(x)
#'data.frame': 4 obs. of 3 variables:
# $ col1:List of 4
# ..$ : chr "a"
# ..$ : chr "b"
# ..$ : chr "c"
# ..$ : logi NA
# $ col2:List of 4
# ..$ : num 1
# ..$ : num 2
# ..$ : num 3
# ..$ : num 4
# $ col3:List of 4
# ..$ : chr "value1"
# ..$ : chr "value2"
# ..$ : chr "value1"
# ..$ : chr "value1" "value2"
You can unlist the above for columns with only 1 element.
x[] <- lapply(x, function(p) if(max(lengths(p)) == 1) unlist(p) else p)
x
# col1 col2 col3
#1 a 1 value1
#2 b 2 value2
#3 c 3 value1
#4 <NA> 4 value1, value2
str(x)
#'data.frame': 4 obs. of 3 variables:
# $ col1: chr "a" "b" "c" NA
# $ col2: num 1 2 3 4
# $ col3:List of 4
# ..$ : chr "value1"
# ..$ : chr "value2"
# ..$ : chr "value1"
# ..$ : chr "value1" "value2"
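The two steps can also be combined into a single pass over the original list, skipping the sapply() call. This is a sketch in base R only, assuming every element of x has the same length; the variable name out is arbitrary.

```r
x <- list(
  col1 = list("a", "b", "c", NA),
  col2 = list(1, 2, 3, 4),
  col3 = list("value1", "value2", "value1", c("value1", "value2"))
)

# Start with an empty data frame that already has the right number of rows
out <- data.frame(row.names = seq_along(x[[1]]))

# Add each column: simplify to an atomic vector when every cell holds
# exactly one value, otherwise keep it as a list column
for (nm in names(x)) {
  col <- x[[nm]]
  out[[nm]] <- if (all(lengths(col) == 1)) unlist(col) else col
}

str(out)
```

Assigning with [[<- rather than passing the list to data.frame() is what keeps col3 as one list column instead of being expanded into several columns.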
One option utilizing dplyr, tibble and purrr could be:
imap_dfc(x, ~ tibble(!!.y := .x)) %>%
mutate(across(where(~ all(lengths(.) == 1)), ~ unlist(.)))
col1 col2 col3
<chr> <dbl> <list>
1 a 1 <chr [1]>
2 b 2 <chr [1]>
3 c 3 <chr [1]>
4 <NA> 4 <chr [2]>
Not a tidyverse solution, but it seems to work:
library( rlist )
as.data.frame( rlist::list.cbind( x ) )
# col1 col2 col3
# 1 a 1 value1
# 2 b 2 value2
# 3 c 3 value1
# 4 NA 4 value1, value2

Name columns with Dataframe name in a list of Dataframes

Objective: Change colname of dataframes in a list of dataframes to the name of each dataframe.
I have some issues when dealing with list and dataframes regarding its name. I have prepared this example to clarify. Hope it is not a mess.
Data:
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(A = 3, B = 3, C = 2)
dfList <- list(df1,df2)
Output:
> str(dfList)
List of 2
$ :'data.frame': 1 obs. of 3 variables:
..$ A: num 1
..$ B: num 2
..$ C: num 3
$ :'data.frame': 1 obs. of 3 variables:
..$ A: num 3
..$ B: num 3
..$ C: num 2
> names(dfList)
NULL
> names(dfList$df1)
NULL
> names(dfList$df2)
NULL
Manually Input names:
names(dfList) <- c("df1", "df2")
dfList <- lapply(dfList, setNames, c("A", "B", "C"))
Which yields:
> str(dfList)
List of 2
$ df1:'data.frame': 1 obs. of 3 variables:
..$ A: num 1
..$ B: num 2
..$ C: num 3
$ df2:'data.frame': 1 obs. of 3 variables:
..$ A: num 3
..$ B: num 3
..$ C: num 2
> names(dfList)
[1] "df1" "df2"
> names(dfList$df1)
[1] "A" "B" "C"
> names(dfList$df2)
[1] "A" "B" "C"
Desired Solution:
WishedList <- dfList
WishedList[[1]] <- setNames(WishedList[[1]], c("A", "B", "df1"))
WishedList[[2]] <- setNames(WishedList[[2]], c("A", "B", "df2"))
Output solution:
> str(WishedList)
List of 2
$ df1:'data.frame': 1 obs. of 3 variables:
..$ A : num 1
..$ B : num 2
..$ df1: num 3
$ df2:'data.frame': 1 obs. of 3 variables:
..$ A : num 3
..$ B : num 3
..$ df2: num 2
> names(WishedList)
[1] "df1" "df2"
> names(WishedList$df1)
[1] "A" "B" "df1"
> names(WishedList$df2)
[1] "A" "B" "df2"
MyTry:
TryList1 <- lapply(dfList, function(x) setNames(x, c("A", "B", quote(x))))
str(TryList1)
List of 2
$ df1:'data.frame': 1 obs. of 3 variables:
..$ A: num 1
..$ B: num 2
..$ x: num 3
$ df2:'data.frame': 1 obs. of 3 variables:
..$ A: num 3
..$ B: num 3
..$ x: num 2
Doubts:
1) Why, when creating the list, are the names of the dataframes and of their columns not included in it?
2) quote(x) works with a single dataframe. Why not in the list?
> df1 <- data.frame(A = 1, B = 2, C = 3)
> df1 <- setNames(df1, c("A", "B", quote(df1)))
> names(df1)
[1] "A" "B" "df1"
Thank you very much!
Here's a slightly different approach:
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(A = 3, B = 3, C = 2)
dfList <- list(df1,df2)
names(dfList) <- c("df1", "df2")
Map(function(df, dfn) {names(df)[3] <- dfn; df}, dfList, names(dfList))
#$df1
# A B df1
#1 1 2 3
#
#$df2
# A B df2
#1 3 3 2
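You could alternatively use setNames(df, c("A", "B", dfn)) inside the Map function.
A note on OP's trial: The documentation for quote states:
quote simply returns its argument.
That's why quote(x) inside lapply returns the symbol x, which is then coerced to the string "x" when it is used as a column name. The argument of the anonymous function is always called x, whatever the dataframe is called.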
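A short sketch of that behaviour, reusing df1 from the question (the names sym and renamed are only for illustration):

```r
df1 <- data.frame(A = 1, B = 2, C = 3)

# quote() captures the expression it is given, here the symbol df1
sym <- quote(df1)
is.symbol(sym)       # TRUE
as.character(sym)    # "df1"

# Inside lapply the function argument is named x, so quote(x)
# captures the symbol x, not the name of the list element
renamed <- lapply(
  list(df1 = df1),
  function(x) setNames(x, c("A", "B", quote(x)))
)
names(renamed$df1)   # "A" "B" "x"
```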
We can lapply() over names(dfList) instead of dfList:
lapply(names(dfList), function(dfn) {
df <- dfList[[dfn]]
names(df)[3] <- dfn
df
})
# [[1]]
# A B df1
# 1 1 2 3
#
# [[2]]
# A B df2
# 1 3 3 2
There's a convenience function in purrr that maps over a list and its names simultaneously:
library(purrr)
imap(dfList, ~ {
names(.x)[3] <- .y
.x
})
# $df1
# A B df1
# 1 1 2 3
#
# $df2
# A B df2
# 1 3 3 2
Or if you're after a short one-liner and don't mind hard-coding "A" and "B":
imap(dfList, ~ setNames(.x, c("A", "B", .y)))
(NB: Essentially those are just variations around Docendo discimus' answer).
Also, not your expected output but maybe of interest for you:
dplyr::bind_rows(dfList, .id = "origin")
# origin A B C
# 1 df1 1 2 3
# 2 df2 3 3 2
Or:
bind_rows(map(dfList, select, -C), .id = "C")
# C A B
# 1 df1 1 2
# 2 df2 3 3

Is there a way to define a subsequent set of data.frame in R?

If I have a data.frame like this:
X1 X2
1 1 A
2 2 A
3 3 B
4 4 B
5 5 A
6 6 A
7 7 B
8 8 B
9 9 A
10 10 A
My goal is to define a set of data.frame as:
y1<-data[1:2,]
y2<-data[3:4,]
y3<-data[5:6,] ##...etc. by a loop.
Therefore, ideally I would like to use (for instance) a for loop
for (i in 1:5){
y_i <- data[2*i:2*(i+1), ]
}
However, I cannot figure out how to define a subsequent set of data.frame such as y_i. Is there any method able to do this? Thanks in advance.
Use a list for y and generate a sequence for the indexing:
y <- lapply(seq(from=1, to=nrow(dat), by=2), function(i) {
dat[i:(i+1),]
})
str(y)
## List of 5
## $ :'data.frame': 2 obs. of 2 variables:
## ..$ X1: int [1:2] 1 2
## ..$ X2: chr [1:2] "A" "A"
## $ :'data.frame': 2 obs. of 2 variables:
## ..$ X1: int [1:2] 3 4
## ..$ X2: chr [1:2] "B" "B"
## $ :'data.frame': 2 obs. of 2 variables:
## ..$ X1: int [1:2] 5 6
## ..$ X2: chr [1:2] "A" "A"
## $ :'data.frame': 2 obs. of 2 variables:
## ..$ X1: int [1:2] 7 8
## ..$ X2: chr [1:2] "B" "B"
## $ :'data.frame': 2 obs. of 2 variables:
## ..$ X1: int [1:2] 9 10
## ..$ X2: chr [1:2] "A" "A"
If the split is based on adjacent values in the second column being the same:
lst <- split(df,with(df,cumsum(c(TRUE,X2[-1]!=X2[-nrow(df)]))))
If you need individual data.frame objects
list2env(setNames(lst, paste0('y', seq_along(lst))), envir=.GlobalEnv)
#<environment: R_GlobalEnv>
y1
# X1 X2
#1 1 A
#2 2 A
Or if it is only based on a fixed number 2
split(df,as.numeric(gl(nrow(df),2, nrow(df))))
data
df <- structure(list(X1 = 1:10, X2 = c("A", "A", "B", "B", "A", "A",
"B", "B", "A", "A")), .Names = c("X1", "X2"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
You can use assign. It will help you get the data frames you need with the naming convention you asked for.
for (i in 1:5){
assign(paste("y", i, sep="_"), data[(i*2-1):(i*2), ])
}
data <- data.frame(X1 = c(1:10), X2 = c("A", "A", "B", "B", "A", "A", "B", "B", "A", "A"))
# Parentheses are needed: ':' binds tighter than '*' and '-'
lapply(1:5, function (i) assign(paste("y", i, sep="_"), data[(2*i-1):(2*i), ], envir=.GlobalEnv))
This would also work. As 'Cancer' said, assign can be helpful in this situation.
I just changed the for loop to an lapply call.
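If the y_i naming is wanted but separate global objects are not, one option is a named list. The sketch below assumes fixed chunks of two rows; chunk_id is a made-up variable name.

```r
data <- data.frame(
  X1 = 1:10,
  X2 = c("A", "A", "B", "B", "A", "A", "B", "B", "A", "A")
)

# Rows 1-2 get id 1, rows 3-4 get id 2, and so on
chunk_id <- ceiling(seq_len(nrow(data)) / 2)

# One data frame per chunk, then name the elements y_1 ... y_5
y <- split(data, chunk_id)
names(y) <- paste0("y_", names(y))

y$y_3   # rows 5 and 6 of data
```

Each chunk is then reachable as y$y_1, y$y_2 and so on, and the whole set can still be processed at once with lapply(y, ...).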

creating a data.frame whose column will hold a list in each row [duplicate]

This question already has answers here:
Create a data.frame where a column is a list
(4 answers)
Closed 9 years ago.
I can't create a data frame with a column made of a collection of characters.
Is it not possible / should I stick with lists ?
>subsets <- c(list("a","d","e"),list("a","b","c","e"))
customerids <- c(1,1)
transactions <- data.frame(customerid = customerids,subset =subsets)
> str(transactions)
'data.frame': 2 obs. of 8 variables:
$ customerid : num 1 1
$ subset..a. : Factor w/ 1 level "a": 1 1
$ subset..d. : Factor w/ 1 level "d": 1 1
$ subset..e. : Factor w/ 1 level "e": 1 1
$ subset..a..1: Factor w/ 1 level "a": 1 1
$ subset..b. : Factor w/ 1 level "b": 1 1
$ subset..c. : Factor w/ 1 level "c": 1 1
$ subset..e..1: Factor w/ 1 level "e": 1 1
I think you've written subsets wrongly. If it is in fact this:
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"))
# [[1]]
# [1] "a" "d" "e"
# [[2]]
# [1] "a" "b" "c" "e"
And customerids is c(1,1), then you can have subsets as a list in a column of data.frame as the total number of rows will still be the same. You can do it as follows:
DF <- data.frame(id = customerids, value = I(subsets))
# id value
# 1 1 a, d, e
# 2 1 a, b, c, e
sapply(DF, class)
# id value
# "numeric" "AsIs"
Now you can access DF$value and perform operations as you would on a list.
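For instance, a few list-style operations on that column might look like the sketch below (the has_b column is a made-up example, not something from the question):

```r
customerids <- c(1, 1)
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"))

# I() stops data.frame() from expanding the list into separate columns
DF <- data.frame(id = customerids, value = I(subsets))

lengths(DF$value)   # number of items in each row's subset
DF$value[[2]]       # the character vector stored in row 2

# Derive an ordinary column from the list column
DF$has_b <- vapply(DF$value, function(v) "b" %in% v, logical(1))
DF$has_b
# [1] FALSE  TRUE
```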
Use data.table instead:
library(data.table)
# note the extra list here
subsets <- list(list("a","d","e"),list("a","b","c","e"))
customerids <- c(1,1)
transactions <- data.table(customerid = customerids, subset = subsets)
str(transactions)
#Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
# $ customerid: num 1 1
# $ subset :List of 2
# ..$ :List of 3
# .. ..$ : chr "a"
# .. ..$ : chr "d"
# .. ..$ : chr "e"
# ..$ :List of 4
# .. ..$ : chr "a"
# .. ..$ : chr "b"
# .. ..$ : chr "c"
# .. ..$ : chr "e"
# - attr(*, ".internal.selfref")=<externalptr>
transactions
# customerid subset
#1: 1 <list>
#2: 1 <list>
