I have a data.frame
'data.frame': 4 obs. of 2 variables:
$ name:List of 4
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
..$ : chr "d"
$ tvd :List of 4
..$ : num 0.149
..$ : num 0.188
..$ : num 0.161
..$ : num 0.187
structure(list(name = list("a", "b", "c",
"d"), tvd = list(0.148831029536996, 0.187699857380692,
0.161428147003292, 0.18652668961466)), .Names = c("name",
"tvd"), row.names = c(NA, -4L), class = "data.frame")
It appears that as.data.frame(lapply(z,unlist)) converts it to the usual
'data.frame': 4 obs. of 2 variables:
$ name: Factor w/ 4 levels "a",..: 4 1 2 3
$ tvd : num 0.149 0.188 0.161 0.187
However, I wonder if I could do better.
I create my ugly data frame like this:
as.data.frame(do.call(rbind,lapply(my.list, function (m)
list(name = ...,
tvd = ...))))
I wonder whether it is possible to modify this expression so that it produces a normal data frame.
It looks like you're just trying to tear down your original data then re-assemble it? If so, here are a few cool things to look at. Assume df is your data.
A data.frame is just a list in disguise. To see this, compare df[[1]] to df$name in your data. [[ is used for list indexing, as well as $. So we are actually viewing a list item when we use df$name on a data frame.
> is.data.frame(df) # df is a data frame
# [1] TRUE
> is.list(df) # and it's also a list
# [1] TRUE
> x <- as.list(df) # as.list() can be more useful than unlist() sometimes
# take a look at x here, it's a bit long
> (y <- do.call(cbind, x)) # reassemble to matrix form
# name tvd
# [1,] "a" 0.148831
# [2,] "b" 0.1876999
# [3,] "c" 0.1614281
# [4,] "d" 0.1865267
> as.data.frame(y) # back to df
# name tvd
# 1 a 0.148831
# 2 b 0.1876999
# 3 c 0.1614281
# 4 d 0.1865267
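One caveat about the round trip above: as far as I can tell, the cells stay wrapped in lists all the way through. A minimal sketch of flattening each column instead, assuming df is the structure() from the question (stringsAsFactors = FALSE is my addition, to keep name as character):

```r
# df rebuilt from the structure() call in the question
df <- structure(list(name = list("a", "b", "c", "d"),
                     tvd = list(0.148831029536996, 0.187699857380692,
                                0.161428147003292, 0.18652668961466)),
                row.names = c(NA, -4L), class = "data.frame")

is.list(df$name)   # TRUE: every cell is wrapped in a list

# unlist() each column so that every column becomes a plain atomic vector
flat <- as.data.frame(lapply(df, unlist), stringsAsFactors = FALSE)
sapply(flat, class)
```

After this, flat$name is a character vector and flat$tvd is numeric.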
I recommend doing
do.call(rbind,lapply(my.list, function (m)
data.frame(name = ...,
tvd = ...)))
rather than trying to convert a list of lists into a data.frame.
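To make that concrete, here is a small sketch. The contents of my.list are not shown in the question, so the list below and its fields id and score are hypothetical stand-ins:

```r
# Hypothetical input: each element stands in for one item of my.list
my.list <- list(
  list(id = "a", score = 0.149),
  list(id = "b", score = 0.188),
  list(id = "c", score = 0.161)
)

# Build a one-row data.frame per element, then stack the rows
rows <- lapply(my.list, function(m) {
  data.frame(name = m$id, tvd = m$score, stringsAsFactors = FALSE)
})
result <- do.call(rbind, rows)
str(result)
```

Because each piece is already a data.frame, rbind() yields ordinary atomic columns rather than list columns.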
I have some data similar to mainList below.
List of 2
$ :List of 3
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "1" "2" "3"
.. .. ..$ col2: chr [1:3] "a" "b" "c"
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "3" "7" "4"
.. .. ..$ col2: chr [1:3] "e" "d" "g"
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "2" "7" "4"
.. .. ..$ col2: chr [1:3] "l" "o" "i"
$ :List of 3
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "8" "3" "4"
.. .. ..$ col2: chr [1:3] "r" "t" "q"
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "7" "5" "2"
.. .. ..$ col2: chr [1:3] "h" "w" "p"
..$ :List of 1
.. ..$ :'data.frame': 3 obs. of 2 variables:
.. .. ..$ col1: chr [1:3] "9" "3" "6"
.. .. ..$ col2: chr [1:3] "x" "y" "z"
I want to merge (or bind) the lists based on each list's position within the list of lists.
That is, I want to merge splt1 with splt11, and then merge splt2 with splt22 and finally splt3 with splt33.
So it would take the first data frame from the first List of 3 and merge it with the first data frame from the second List of 3.
This does not give me what I want:
mainList %>%
map(., ~bind_rows(., .id = "split"))
because all of the splits are merged into a single data frame, whereas I want them kept separate.
Data:
splt1 <- list(
data.frame(
col1 = c("1", "2", "3"),
col2 = c("a", "b", "c")
)
)
splt2 <- list(
data.frame(
col1 = c("3", "7", "4"),
col2 = c("e", "d", "g")
)
)
splt3 <- list(
data.frame(
col1 = c("2", "7", "4"),
col2 = c("l", "o", "i")
)
)
nestList1 <- list(
splt1,
splt2,
splt3
)
splt11 <- list(
data.frame(
col1 = c("8", "3", "4"),
col2 = c("r", "t", "q")
)
)
splt22 <- list(
data.frame(
col1 = c("7", "5", "2"),
col2 = c("h", "w", "p")
)
)
splt33 <- list(
data.frame(
col1 = c("9", "3", "6"),
col2 = c("x", "y", "z")
)
)
nestList2 <- list(
splt11,
splt22,
splt33
)
mainList <- list(
nestList1,
nestList2
)
EDIT:
I am trying to bind together all of the splits, i.e.
split1 will contain the results from 08001, 08003, 08005 ... 0801501 for each of the lists in catalunya_madrid.
split2 will contain the same results 08001, 08003, 08005 ... 0801501
and so on.
EDIT2:
# Function to invert the list structure
invertListStructure <- function(ll) {
nms <- unique(unlist(lapply(ll, function(X) names(X))))
ll <- lapply(ll, function(X) setNames(X[nms], nms))
ll <- apply(do.call(rbind, ll), 2, as.list)
lapply(ll, function(X) X[!sapply(X, is.null)])
}
invertedList <- map(analysis, ~invertListStructure(.) %>%
map(., ~bind_rows(.x, .id = "MITMA")))
You can use purrr::transpose() to group list elements with the same location (i.e. the first element in list 1 with the first element in list 2 and list 3 and so on) for any number of lists. In your case, transpose will convert 592 lists of 216 into 216 lists of 592, each properly titled. With transpose, l[[x]][[y]] becomes l[[y]][[x]].
library(tidyverse)
mainList %>% purrr::transpose() %>%
map(function(x) {
flatten(x) %>% bind_rows(.id = 'id')
})
# $splt1
# id col1 col2
# 1 1 1 a
# 2 1 2 b
# 3 1 3 c
# 4 2 8 r
# 5 2 3 t
# 6 2 4 q
#
# $splt2
# id col1 col2
# 1 1 3 e
# 2 1 7 d
# 3 1 4 g
# 4 2 7 h
# 5 2 5 w
# 6 2 2 p
#
# $splt3
# id col1 col2
# 1 1 2 l
# 2 1 7 o
# 3 1 4 i
# 4 2 9 x
# 5 2 3 y
# 6 2 6 z
Note that you only need to flatten if the data.frame is in a list of length 1, by itself. If you have a list of data.frames (as opposed to a list of lists, each of which contains one data.frame, as in your example data), you can ignore the flatten() command and just bind the rows.
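If it helps to see the index swap without purrr, here is a base-R sketch of the same idea. The list l is a made-up stand-in for a list of lists that hold data.frames directly (so no flatten step is needed), and the lapply() transpose is my own substitute for purrr::transpose(), not the call used above:

```r
# Made-up stand-in: 2 outer lists, each holding 3 data frames directly
l <- list(
  list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3)),
  list(data.frame(x = 4), data.frame(x = 5), data.frame(x = 6))
)

# Base-R transpose: element j of the result collects the j-th item of every outer list
transposed <- lapply(seq_along(l[[1]]), function(j) lapply(l, `[[`, j))

# l[[x]][[y]] has become transposed[[y]][[x]]
identical(transposed[[3]][[2]], l[[2]][[3]])

# Bind the data frames that share a position
bound <- lapply(transposed, function(dfs) do.call(rbind, dfs))
bound[[1]]
```

This assumes every outer list has the same length as the first one.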
Your example dataset doesn't quite match your actual data, but if you make a list of two mainLists, it's closer. These types of operations are heavily dependent on the structure of the data, though, so I can't be sure this is what you need. All you need to do here is add a subscript.
mainList2 <- list(mainList, mainList) # First is Madrid, second is Valencia
# Operations are done on Madrid only
mainList2[[1]] %>%
transpose() %>%
map(function(x) {
flatten(x) %>% bind_rows(.id = 'id')
})
If you want to do this for both elements in mainList2, you can wrap the whole thing in map.
mainList2 %>% map(function(x) {
transpose(x) %>%
map(function(x) {
flatten(x) %>% bind_rows(.id = 'id')
})
})
You can combine the pairs in the following way:
Map(rbind, unlist(mainList[[1]], recursive = FALSE),
unlist(mainList[[2]], recursive = FALSE))
Or, using purrr, you can also add an id column easily:
library(purrr)
map2(mainList[[1]] %>% flatten,
mainList[[2]] %>% flatten, dplyr::bind_rows, .id = 'id')
#[[1]]
# id col1 col2
#1 1 1 a
#2 1 2 b
#3 1 3 c
#4 2 8 r
#5 2 3 t
#6 2 4 q
#[[2]]
# id col1 col2
#1 1 3 e
#2 1 7 d
#3 1 4 g
#4 2 7 h
#5 2 5 w
#6 2 2 p
#[[3]]
# id col1 col2
#1 1 2 l
#2 1 7 o
#3 1 4 i
#4 2 9 x
#5 2 3 y
#6 2 6 z
After a previous post regarding coercion of variables into their appropriate format, I realized that the problem is due to unlist():ing, which appears to kill off the object class of variables.
Consider a nested list (myList) of the following structure
> str(myList)
List of 2
$ lst1:List of 3
..$ var1: chr [1:4] "A" "B" "C" "D"
..$ var2: num [1:4] 1 2 3 4
..$ var3: Date[1:4], format: "1999-01-01" "2000-01-01" "2001-01-01" "2002-01-01"
$ lst2:List of 3
..$ var1: chr [1:4] "Q" "W" "E" "R"
..$ var2: num [1:4] 11 22 33 44
..$ var3: Date[1:4], format: "1999-01-02" "2000-01-03" "2001-01-04" "2002-01-05"
which contains different object types (character, numeric and Date) at the lowest level. I've been using
myNewLst <- lapply(myList, function(x) unlist(x,recursive=FALSE))
result <- do.call("rbind", myNewLst)
to get the desired structure of my resulting matrix. However, this yields a coercion into character for all variables, as seen here:
> str(result)
chr [1:2, 1:12] "A" "Q" "B" "W" "C" "E" "D" "R" "1" "11" "2" "22" "3" "33" "4" "44" "10592" "10593" "10957" "10959" "11323" "11326" ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "lst1" "lst2"
..$ : chr [1:12] "var11" "var12" "var13" "var14" ...
After reading a post on a similar issue, I've attempted to utilize do.call("c", x)
myNewLst <- lapply(myList, function(x) do.call("c", x))
result <- do.call("rbind", myNewLst)
Unfortunately, this also results in all variables being characters, as in my first attempt. So my question is: How do I unlist a nested list without losing the object class of my lower-level variables? Are there alternatives which will accomplish the desired result?
Reproducible code for myList:
myList <- list(
"lst1" = list(
"var1" = c("A","B","C","D"),
"var2" = c(1,2,3,4),
"var3" = c(as.Date('1999/01/01'),as.Date('2000/01/01'),as.Date('2001/01/01'),as.Date('2002/01/01'))
),
"lst2" = list(
"var1" = c("Q","W","E","R"),
"var2" = c(11,22,33,44),
"var3" = c(as.Date('1999/01/02'),as.Date('2000/01/03'),as.Date('2001/01/4'),as.Date('2002/01/05'))
)
)
You can use Reduce() or do.call() to combine all of them into one data frame. The code below should work:
Reduce(rbind,lapply(myList,data.frame,stringsAsFactors=FALSE))
var1 var2 var3
1 A 1 1999-01-01
2 B 2 2000-01-01
3 C 3 2001-01-01
4 D 4 2002-01-01
5 Q 11 1999-01-02
6 W 22 2000-01-03
7 E 33 2001-01-04
8 R 44 2002-01-05
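For what it's worth, here is a small sketch of why the data.frame route behaves differently from unlist(). The list x is a shortened, made-up stand-in for one element of myList:

```r
# Shortened stand-in for one element of myList
x <- list(var1 = c("A", "B"),
          var2 = c(1, 2),
          var3 = as.Date(c("1999-01-01", "2000-01-01")))

# unlist() must return one atomic vector, so everything is coerced to character
# (the Date class is dropped first, leaving day counts such as "10592")
flat <- unlist(x)
class(flat)

# A data.frame stores each column as its own vector, so each class survives
df <- data.frame(x, stringsAsFactors = FALSE)
sapply(df, class)
```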
Also the class is maintained:
mapply(class,Reduce(rbind,lapply(myList,data.frame,stringsAsFactors=FALSE)))
var1 var2 var3
"character" "numeric" "Date"
If your goal is to convert this list of lists into a single data frame, the following code should work:
result <- data.frame(var1 = unlist(lapply(myList, function(e) e[1]), use.names = FALSE),
var2 = unlist(lapply(myList, function(e) e[2]), use.names = FALSE),
var3 = as.Date(unlist(lapply(myList, function(e) e[3]), use.names = FALSE), origin = "1970-01-01"))
This gives:
> result
var1 var2 var3
1 A 1 1999-01-01
2 B 2 2000-01-01
3 C 3 2001-01-01
4 D 4 2002-01-01
5 Q 11 1999-01-02
6 W 22 2000-01-03
7 E 33 2001-01-04
8 R 44 2002-01-05
Of course, you could use a for-loop to make the code more succinct if there are multiple variables in each list.
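Something along these lines would generalise it to any number of variables. This is a sketch that uses lapply() rather than an explicit for-loop, and it assumes every sub-list has the same variable names as the first one. do.call(c, ...) is used instead of unlist() because c() keeps the Date class:

```r
# myList as defined in the question (dates written in compact form)
myList <- list(
  lst1 = list(var1 = c("A", "B", "C", "D"),
              var2 = c(1, 2, 3, 4),
              var3 = as.Date(c("1999-01-01", "2000-01-01", "2001-01-01", "2002-01-01"))),
  lst2 = list(var1 = c("Q", "W", "E", "R"),
              var2 = c(11, 22, 33, 44),
              var3 = as.Date(c("1999-01-02", "2000-01-03", "2001-01-04", "2002-01-05")))
)

vars <- names(myList[[1]])   # assumption: all sub-lists share these names

# For each variable, pull it out of every sub-list and concatenate with c()
cols <- lapply(setNames(vars, vars), function(v) {
  do.call(c, unname(lapply(myList, `[[`, v)))
})
result <- as.data.frame(cols, stringsAsFactors = FALSE)
sapply(result, class)
```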
I've got a list with different types in it. They are arranged in matrix form:
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2,3)
tmp
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] 1 2 3
That's the form I get it out of another more complex function.
Now I want to transpose it and convert to a data.frame. So I do the following:
data <- as.data.frame(t(tmp))
data
V1 V2
1 a 1
2 b 2
3 c 3
This looks great. But it's got the wrong structure:
str(data)
'data.frame': 3 obs. of 2 variables:
$ V1:List of 3
..$ : chr "a"
..$ : chr "b"
..$ : chr "c"
$ V2:List of 3
..$ : num 1
..$ : num 2
..$ : num 3
So how do I get rid of the extra level of lists?
This should do the trick:
df <- data.frame(lapply(data.frame(t(tmp)), unlist), stringsAsFactors=FALSE)
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ X1: chr "a" "b" "c"
# $ X2: num 1 2 3
The inner data.frame() call converts the list-matrix into a two-column data.frame. As the str() output in the question shows, both columns are still lists at this point.
lapply(..., unlist) strips away the extra list() layer, leaving one "character" vector and one "numeric" vector.
The outer data.frame() call converts the resulting list into the data.frame you're after. stringsAsFactors=FALSE keeps the "character" column from being converted to a factor, which was the default before R 4.0.0.
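An alternative that avoids the transpose: a minimal sketch, assuming the tmp from the question, that pulls each row out of the list-matrix and unlists it directly. V1 and V2 are names I picked, and this is my own variation rather than part of the approach above:

```r
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2, 3)

# tmp[1, ] and tmp[2, ] are lists of length 3; unlist() turns each into an atomic vector
df2 <- data.frame(V1 = unlist(tmp[1, ]),
                  V2 = unlist(tmp[2, ]),
                  stringsAsFactors = FALSE)
str(df2)
```

This only works when each row of tmp holds a single type, as it does here.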
Or this (note that unlist() coerces everything to character, so both columns end up as factors):
as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE))
You can inspect the result:
str(as.data.frame(matrix(unlist(tmp),ncol=2,byrow=TRUE)))
'data.frame': 3 obs. of 2 variables:
$ V1: Factor w/ 3 levels "a","b","c": 1 2 3
$ V2: Factor w/ 3 levels "1","2","3": 1 2 3
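If you go this route, the columns can be converted back afterwards. A minimal sketch, assuming the tmp from the question; type.convert() guesses each column's type from its strings, so the numbers come back as integer rather than double:

```r
tmp <- list('a', 1, 'b', 2, 'c', 3)
dim(tmp) <- c(2, 3)

# unlist() walks the list-matrix column by column: "a" "1" "b" "2" "c" "3"
m <- matrix(unlist(tmp), ncol = 2, byrow = TRUE)
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Re-guess the type of every column; as.is = TRUE keeps strings as character
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```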
I can't create a data frame with a column made of a collection of characters.
Is it not possible, or should I stick with lists?
subsets <- c(list("a","d","e"),list("a","b","c","e"))
customerids <- c(1,1)
transactions <- data.frame(customerid = customerids,subset =subsets)
str(transactions)
'data.frame': 2 obs. of 8 variables:
$ customerid : num 1 1
$ subset..a. : Factor w/ 1 level "a": 1 1
$ subset..d. : Factor w/ 1 level "d": 1 1
$ subset..e. : Factor w/ 1 level "e": 1 1
$ subset..a..1: Factor w/ 1 level "a": 1 1
$ subset..b. : Factor w/ 1 level "b": 1 1
$ subset..c. : Factor w/ 1 level "c": 1 1
$ subset..e..1: Factor w/ 1 level "e": 1 1
I think you've written subsets wrongly. If it is in fact this:
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"))
# [[1]]
# [1] "a" "d" "e"
# [[2]]
# [1] "a" "b" "c" "e"
and customerids is c(1,1), then you can store subsets as a list column of the data.frame, because the number of rows still matches. Wrap it in I() so that data.frame() keeps it as a single column:
DF <- data.frame(id = customerids, value = I(subsets))
# id value
# 1 1 a, d, e
# 2 1 a, b, c, e
sapply(DF, class)
# id value
# "numeric" "AsIs"
Now you can access DF$value and perform operations as you would on a list.
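For instance, a short sketch of working with the list column, assuming the DF built above. The has_b column is a made-up example of a per-row computation:

```r
customerids <- c(1, 1)
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"))
DF <- data.frame(id = customerids, value = I(subsets))

# Number of items in each row's subset
lengths(DF$value)

# Per-row test: does the subset contain "b"?
DF$has_b <- sapply(DF$value, function(s) "b" %in% s)

# Pull out a single row's subset as an ordinary character vector
DF$value[[2]]
```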
Use data.table instead:
library(data.table)
# note the extra list here
subsets <- list(list("a","d","e"),list("a","b","c","e"))
customerids <- c(1,1)
transactions <- data.table(customerid = customerids, subset = subsets)
str(transactions)
#Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
# $ customerid: num 1 1
# $ subset :List of 2
# ..$ :List of 3
# .. ..$ : chr "a"
# .. ..$ : chr "d"
# .. ..$ : chr "e"
# ..$ :List of 4
# .. ..$ : chr "a"
# .. ..$ : chr "b"
# .. ..$ : chr "c"
# .. ..$ : chr "e"
# - attr(*, ".internal.selfref")=<externalptr>
transactions
# customerid subset
#1: 1 <list>
#2: 1 <list>
Suppose I have an object called v, how do I find out its container type (a vector, a list, a matrix, etc.), without trying each of the is.vector(v), is.list(v) ... ?
There are three functions that will be helpful here: mode, str and class.
First, let's make some data:
nlist <- list(a=c(1,2,3), b=c("a", "b", "c"), c=matrix(rnorm(10),5))
ndata.frame <- data.frame(a=c("a", "b", "c"), b=1:3)
ncharvec <- c("a", "b", "c")
nnumvec <- c(1, 2, 3)
nintvec <- 1:3
So let's use the functions I mentioned above:
mode(nlist)
[1] "list"
str(nlist)
List of 3
$ a: num [1:3] 1 2 3
$ b: chr [1:3] "a" "b" "c"
$ c: num [1:5, 1:2] -0.9469 -0.0602 -0.3601 0.9594 -0.4348 ...
class(nlist)
[1] "list"
Now for the data frame:
mode(ndata.frame)
[1] "list"
This may surprise you, but a data frame is simply a list with a data.frame class attribute.
str(ndata.frame)
'data.frame': 3 obs. of 2 variables:
$ a: Factor w/ 3 levels "a","b","c": 1 2 3
$ b: int 1 2 3
class(ndata.frame)
[1] "data.frame"
Note that there are different modes of vectors:
mode(ncharvec)
[1] "character"
mode(nnumvec)
[1] "numeric"
mode(nintvec)
[1] "numeric"
Also see that although nnumvec and nintvec appear identical, they are quite different:
str(nnumvec)
num [1:3] 1 2 3
str(nintvec)
int [1:3] 1 2 3
class(nnumvec)
[1] "numeric"
class(nintvec)
[1] "integer"
Which of these pieces of information you want determines which function to use. str is generally a good function for looking at variables interactively, whereas the other two are more useful inside functions.
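If you want them side by side, a tiny helper along these lines might do. describe is a name I made up, and typeof() is an addition of mine that reports the internal storage type:

```r
# Hypothetical helper: report class, mode and storage type of an object in one go
describe <- function(v) {
  c(class = class(v)[1], mode = mode(v), typeof = typeof(v))
}

describe(1:3)                  # integer vector
describe(c(1, 2, 3))           # double vector
describe(data.frame(a = 1))    # data frame: class "data.frame", but mode "list"
describe(matrix(1:4, 2))       # matrix
```

Only the first element of class(v) is kept, so objects with several classes are reported by their most specific one.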