I have a list of some length (let's say 1000). Each element of the list is itself a list of length 2, and each element of that inner list is a data.table. The second element of each inner list might be an empty data.table.
I need to rbind() all the data.tables that are in the first position of each inner list (and likewise for the second position). I am currently doing the following:
DT1 = data.table()
DT2 = data.table()
for (i in seq_along(myList)) {
  DT1 = rbind(DT1, myList[[i]][[1]])
  DT2 = rbind(DT2, myList[[i]][[2]])
}
This works, but it is too slow. Is there a way I can avoid the for-loop?
Thank you in advance!
data.table has a dedicated, fast function for this: rbindlist
Cf: http://www.inside-r.org/packages/cran/data.table/docs/rbindlist
Edit:
Here is some example code:
library(data.table)
srcList = list(list(DT1 = data.table(X = 0), DT2 = NULL),
               list(DT1 = data.table(X = 2), DT2 = data.table(Y = 3)))
# first collect all the DT1s into one list
DT1.list = lapply(srcList, FUN = function(el) el$DT1)
rbindlist(DT1.list)
#    X
# 1: 0
# 2: 2
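Building on the rbindlist idea, here is a minimal sketch that handles both positions of each inner list, including an empty second element. The toy myList is made up for illustration, and fill = TRUE is only a cautious choice in case the column sets differ; adjust to your real data.

```r
library(data.table)

# Hypothetical stand-in for the real myList: each element holds two
# data.tables, and the second one may be empty.
myList <- list(
  list(data.table(X = 1:2), data.table()),
  list(data.table(X = 3:4), data.table(Y = 5))
)

# Extract the i-th element of every inner list, then bind once.
DT1 <- rbindlist(lapply(myList, `[[`, 1), fill = TRUE)
DT2 <- rbindlist(lapply(myList, `[[`, 2), fill = TRUE)
```

Binding once at the end avoids re-copying the accumulated table on every loop iteration.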
Do this:
do.call("rbind", lapply(df.list, "[[", 1)) # for first list element
# x y
# 1 1 10
# 2 2 20
# 3 3 30
# 4 4 40
# 5 5 50
# 6 6 60
do.call("rbind", lapply(df.list, "[[", 2)) # for second list element
# x y
# 1 1 30
# 2 2 40
# 3 3 50
# 4 4 70
# 5 5 80
# 6 6 90
DATA
df.list=list(list(structure(list(x = 1:3, y = c(10, 20, 30)), .Names = c("x",
"y"), row.names = c(NA, -3L), class = "data.frame"), structure(list(
x = 1:3, y = c(30, 40, 50)), .Names = c("x", "y"), row.names = c(NA,
-3L), class = "data.frame")), list(structure(list(x = 4:6, y = c(40,
50, 60)), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame"),
structure(list(x = 4:6, y = c(70, 80, 90)), .Names = c("x",
"y"), row.names = c(NA, -3L), class = "data.frame")))
# df.list
# [[1]]
# [[1]][[1]]
# x y
# 1 1 10
# 2 2 20
# 3 3 30
# [[1]][[2]]
# x y
# 1 1 30
# 2 2 40
# 3 3 50
# [[2]]
# [[2]][[1]]
# x y
# 1 4 40
# 2 5 50
# 3 6 60
# [[2]][[2]]
# x y
# 1 4 70
# 2 5 80
# 3 6 90
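If there are several positions to combine, the same do.call/lapply pattern can be wrapped in a small helper. This is only a sketch: bind_position is a made-up name and the toy input just mimics the shape of df.list.

```r
# Toy input with the same shape as df.list (values are made up).
toy <- list(
  list(data.frame(x = 1:2, y = c(10, 20)), data.frame(x = 1:2, y = c(30, 40))),
  list(data.frame(x = 3:4, y = c(50, 60)), data.frame(x = 3:4, y = c(70, 80)))
)

# bind_position() is a hypothetical helper name, not part of base R.
# It extracts element i from every inner list and row-binds the pieces once.
bind_position <- function(lst, i) do.call(rbind, lapply(lst, `[[`, i))

# One combined data frame per position.
combined <- lapply(1:2, function(i) bind_position(toy, i))
```

Growing a data frame inside a for loop copies the accumulated result on each iteration, which is why a single rbind call at the end tends to be faster.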
I want to merge the following 3 data frames and fill the missing values with -1. I think I should use the merge() function, but I don't know exactly how to do it.
> df1
Letter Values1
1 A 1
2 B 2
3 C 3
> df2
Letter Values2
1 A 0
2 C 5
3 D 9
> df3
Letter Values3
1 A -1
2 D 5
3 B -1
The desired output would be:
Letter Values1 Values2 Values3
1 A 1 0 -1
2 B 2 -1 -1 # fill missing values with -1
3 C 3 5 -1
4 D -1 9 5
code:
> dput(df1)
structure(list(Letter = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Values1 = c(1, 2, 3)), class = "data.frame", row.names = c(NA,
-3L))
> dput(df2)
structure(list(Letter = structure(1:3, .Label = c("A", "C", "D"
), class = "factor"), Values2 = c(0, 5, 9)), class = "data.frame", row.names = c(NA,
-3L))
> dput(df3)
structure(list(Letter = structure(c(1L, 3L, 2L), .Label = c("A",
"B", "D"), class = "factor"), Values3 = c(-1, 5, -1)), class = "data.frame", row.names = c(NA,
-3L))
You can put the data frames in a list and use merge with Reduce. Missing values in the resulting data frame can then be replaced with -1.
new_df <- Reduce(function(x, y) merge(x, y, all = TRUE), list(df1, df2, df3))
new_df[is.na(new_df)] <- -1
new_df
# Letter Values1 Values2 Values3
#1 A 1 0 -1
#2 B 2 -1 -1
#3 C 3 5 -1
#4 D -1 9 5
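A slightly more defensive variant, as a sketch: it names the join key explicitly and replaces NA only in numeric columns, so the key column is never touched. The data frames are rebuilt here with character Letter columns (an assumption; the question used factors) to keep the example self-contained.

```r
df1 <- data.frame(Letter = c("A", "B", "C"), Values1 = c(1, 2, 3), stringsAsFactors = FALSE)
df2 <- data.frame(Letter = c("A", "C", "D"), Values2 = c(0, 5, 9), stringsAsFactors = FALSE)
df3 <- data.frame(Letter = c("A", "D", "B"), Values3 = c(-1, 5, -1), stringsAsFactors = FALSE)

# Full outer join on an explicit key, applied pairwise across the list.
new_df <- Reduce(function(x, y) merge(x, y, by = "Letter", all = TRUE),
                 list(df1, df2, df3))

# Replace NA only in the numeric columns.
num_cols <- vapply(new_df, is.numeric, logical(1))
new_df[num_cols] <- lapply(new_df[num_cols], function(v) replace(v, is.na(v), -1))
```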
A tidyverse way with the same logic:
library(dplyr)
library(purrr)
library(tidyr)  # replace_na()
list(df1, df2, df3) %>%
  reduce(full_join, by = "Letter") %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, -1)))
Here's a dplyr solution:
library(dplyr)
library(tidyr)  # replace_na()
df1 %>%
  full_join(df2, by = "Letter") %>%
  full_join(df3, by = "Letter") %>%
  mutate_if(is.numeric, function(x) replace_na(x, -1))
output:
Letter Values1 Values2 Values3
<chr> <dbl> <dbl> <dbl>
1 A 1 0 -1
2 B 2 -1 -1
3 C 3 5 -1
4 D -1 9 5
I have a nested list of data frames, and my goal is to transpose each one and bind the results into a single table. How could I do this? My list is below:
$pri
$pri$x
a b
1 1 3
2 2 4
$pri$y
a b c
1 1 3 5
2 2 4 6
$sec
$sec$w
a b
1 7 9
2 8 10
$sec$z
a b c d
1 11 13 15 17
2 12 14 16 18
I am aiming for output like this:
"col1" "col2"
a ; 1 ; 2
b ; 3 ; 4
a ; 1 ; 2
b ; 3 ; 4
c ; 5 ; 6
a ; 7 ; 8
b ; 9 ; 10
a ; 11 ; 12
b ; 13 ; 14
c ; 15 ; 16
d ; 17 ; 18
library(purrr)
pri <-
list(
x = data.frame(a = 1:2, b = 3:4),
y = data.frame(a = 1:2, b = 3:4, c = 5:6)
)
sec <-
list(
w = data.frame(a = 7:8, b = 9:10),
z = data.frame(a = 11:12, b = 13:14, c = 15:16, d = 17:18)
)
list(pri = pri, sec = sec) %>% flatten() %>% map(t) %>% reduce(rbind)
#> [,1] [,2]
#> a 1 2
#> b 3 4
#> a 1 2
#> b 3 4
#> c 5 6
#> a 7 8
#> b 9 10
#> a 11 12
#> b 13 14
#> c 15 16
#> d 17 18
Created on 2020-03-12 by the reprex package (v0.3.0)
Assuming your data is given like this (according to your question):
frame_list <- list(pri = list(x = structure(list(a = 1:2, b = 3:4), class = "data.frame", row.names = c(NA,
-2L)), y = structure(list(a = 1:2, b = 3:4, c = 5:6), class = "data.frame", row.names = c(NA,
-2L))), sec = list(w = structure(list(a = 7:8, b = 9:10), class = "data.frame", row.names = c(NA,
-2L)), z = structure(list(a = 11:12, b = 13:14, c = 15:16, d = 17:18), class = "data.frame", row.names = c(NA,
-2L))))
then you can do:
df <- t(do.call('cbind', unlist(frame_list, recursive = FALSE)))
rownames(df) <- gsub('\\w+\\.\\w\\.', '', rownames(df))
Note: the output will be a matrix. If you need a data frame, you can convert it with data.frame, but this will change the row names by appending a number to each to make them unique.
Output:
[,1] [,2]
a 1 2
b 3 4
a 1 2
b 3 4
c 5 6
a 7 8
b 9 10
a 11 12
b 13 14
c 15 16
d 17 18
In case you want it into a dataframe, then you can do:
df <- data.frame(t(do.call('cbind', unlist(frame_list, recursive = FALSE))), stringsAsFactors = FALSE)
df$newcol <- gsub('\\w+\\.\\w\\.', '', rownames(df))
rownames(df) <- NULL
Output:
X1 X2 newcol
1 1 2 a
2 3 4 b
3 1 2 a
4 3 4 b
5 5 6 c
6 7 8 a
7 9 10 b
8 11 12 a
9 13 14 b
10 15 16 c
11 17 18 d
You could also solve your problem using base R functions as follows:
dfs <- list(pri = list(x = structure(list(a = 1:2, b = 3:4), class = "data.frame", row.names = c(NA,
-2L)), y = structure(list(a = 1:2, b = 3:4, c = 5:6), class = "data.frame", row.names = c(NA,
-2L))), sec = list(w = structure(list(a = 7:8, b = 9:10), class = "data.frame", row.names = c(NA,
-2L)), z = structure(list(a = 11:12, b = 13:14, c = 15:16, d = 17:18), class = "data.frame", row.names = c(NA,
-2L))))
t(Reduce(cbind, unlist(dfs, FALSE)))
# [,1] [,2]
# a 1 2
# b 3 4
# a 1 2
# b 3 4
# c 5 6
# a 7 8
# b 9 10
# a 11 12
# b 13 14
# c 15 16
# d 17 18
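If a data frame with the original column names kept in their own column is preferred over a matrix with duplicated row names, something along these lines might work. It is a sketch that assumes every data frame has exactly two rows, as in the question; to_long, col1 and col2 are made-up names.

```r
frame_list <- list(
  pri = list(x = data.frame(a = 1:2, b = 3:4),
             y = data.frame(a = 1:2, b = 3:4, c = 5:6)),
  sec = list(w = data.frame(a = 7:8, b = 9:10),
             z = data.frame(a = 11:12, b = 13:14, c = 15:16, d = 17:18))
)

# to_long() is a hypothetical helper: it transposes one data frame so that
# each input column becomes one output row, labelled by the column name.
to_long <- function(d) {
  m <- t(as.matrix(d))
  data.frame(name = rownames(m),
             col1 = unname(m[, 1]),
             col2 = unname(m[, 2]),
             stringsAsFactors = FALSE)
}

# Flatten one level of nesting, transpose each piece, then bind once.
flat <- unlist(frame_list, recursive = FALSE)
out <- do.call(rbind, lapply(flat, to_long))
rownames(out) <- NULL
```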
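I need to access a tibble that is nested (via nest()) inside another nested tibble.
library(dplyr)     # %>%, group_by(), mutate()
library(tidyr)     # nest()
library(reshape2)  # melt()
library(purrr)     # map()
x <- list( factory = c('a','b','c','d'), cost = c(21,30,44,100))
x <- as.data.frame(x)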
x <- x %>%
melt('cost','factory')
colnames(x) <- c('cost','client','type')
x <- x %>%
group_by(client)%>%
nest()
for (m in 1:4) {
if(m==1){
x$scene <- m
x2 <- x
}else{
x3 <- x
x3$scene <- m
x2 <- rbind(x2,x3)
}
}
x2 <- x2 %>%
group_by(scene) %>%
nest()
What I am trying to do is apply a function to the innermost tibbles, something like:
test <- function(df){
df$data %>%
mutate(increa = cost + 15)
}
x2$data%>%
map(test)
dput(x2) results in:
structure(list(scene = 1:4, data = list(structure(list(client =
structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100), type = c("a",
"b", "c", "d")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame")))), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), structure(list(client =
structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100), type = c("a",
"b", "c", "d")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame")))), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), structure(list(client =
structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100), type = c("a",
"b", "c", "d")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame")))), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), structure(list(client =
structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100), type = c("a",
"b", "c", "d")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame")))), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")))), row.names = c(NA, -4L), class =
c("tbl_df", "tbl", "data.frame"))
The expected result:
[[1]]
[[1]]$`factory`
[1] "a" "b" "c" "d"
[[1]]$cost
[1] 21 30 44 100
[[1]]$increa
[1] 36 45 59 115
[[2]]
[[2]]$`factory`
[1] "a" "b" "c" "d"
[[2]]$cost
[1] 21 30 44 100
[[2]]$increa
[1] 36 45 59 115
[[3]]
[[3]]$`factory`
[1] "a" "b" "c" "d"
[[3]]$cost
[1] 21 30 44 100
[[3]]$increa
[1] 36 45 59 115
[[4]]
[[4]]$`factory`
[1] "a" "b" "c" "d"
[[4]]$cost
[1] 21 30 44 100
[[4]]$increa
[1] 36 45 59 115
Could someone help me solve this issue?
ANSWER
This is the result that I was looking for:
map(x2$data, function(df) map(df$data, function(d) mutate(d, increa = cost + 15)))
To get your desired output, I think it is easier to first extract the level of information you want and then calculate the new column. If, on the other hand, you want to manipulate the data in place and preserve the nested structure, then a nested call of map and mutate is necessary.
library(tidyverse)
First solution - extract information and then calculate new column:
We can get to desired level of information with
map(x2$data, ~ .x$data)
# [[1]]
# [[1]][[1]]
# # A tibble: 4 x 2
# cost type
# <dbl> <chr>
# 1 21 a
# 2 30 b
# 3 44 c
# 4 100 d
#
#
# [[2]]
# [[2]][[1]]
# # A tibble: 4 x 2
# cost type
# <dbl> <chr>
# 1 21 a
# 2 30 b
# 3 44 c
# 4 100 d
#
# ...
As this is a nested list structure a second map is needed to calculate the new column. Here the mutate-function is applied to each of the nested data entries with the additional specification to create a new column inc.
map(x2$data, ~ map(.x$data, mutate, inc = cost + 15))
# [[1]]
# [[1]][[1]]
# # A tibble: 4 x 3
# cost type inc
# <dbl> <chr> <dbl>
# 1 21 a 36
# 2 30 b 45
# 3 44 c 59
# 4 100 d 115
#
#
# [[2]]
# [[2]][[1]]
# # A tibble: 4 x 3
# cost type inc
# <dbl> <chr> <dbl>
# 1 21 a 36
# 2 30 b 45
# 3 44 c 59
# 4 100 d 115
#
# ...
The same result would be obtained with an extra function test which takes a data.frame as input parameter and calculates the new column:
test <- function(df){
mutate(df, increa = cost + 15)
}
map(x2$data, ~ map(.x$data, test))
Second solution - Manipulate in place
If you however want to keep this nested structure, then we use mutate on the first data-column with map and again mutate and map:
x2_new <- x2 %>%
mutate(data = map(data, function(df1) mutate(df1, data = map(data, test))))
To verify that this worked we again extract the needed information as above:
map(x2_new$data, ~ .x$data)
# [[1]]
# [[1]][[1]]
# # A tibble: 4 x 3
# cost type increa
# <dbl> <chr> <dbl>
# 1 21 a 36
# 2 30 b 45
# 3 44 c 59
# 4 100 d 115
#
#
# [[2]]
# [[2]][[1]]
# # A tibble: 4 x 3
# cost type increa
# <dbl> <chr> <dbl>
# 1 21 a 36
# 2 30 b 45
# 3 44 c 59
# 4 100 d 115
#
# ...
Third solution - breaks the structure but keeps the information
This is my favourite solution as it turns the data into a tidy format and keeps all information:
x2 %>%
unnest(data) %>%
unnest(data) %>%
mutate(inc = cost + 15)
# A tibble: 16 x 5
# scene client cost type inc
# <int> <fct> <dbl> <chr> <dbl>
# 1 1 factory 21 a 36
# 2 1 factory 30 b 45
# 3 1 factory 44 c 59
# 4 1 factory 100 d 115
# 5 2 factory 21 a 36
# 6 2 factory 30 b 45
# 7 2 factory 44 c 59
# 8 2 factory 100 d 115
# 9 3 factory 21 a 36
# 10 3 factory 30 b 45
# 11 3 factory 44 c 59
# 12 3 factory 100 d 115
# 13 4 factory 21 a 36
# 14 4 factory 30 b 45
# 15 4 factory 44 c 59
# 16 4 factory 100 d 115
Data
generic_data <- structure(
list(client = structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100),
type = c("a", "b", "c", "d")),
row.names = c(NA, -4L),
class = c("tbl_df", "tbl", "data.frame")))),
row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))
x2 <- structure(
list(scene = 1:4,
data = list(generic_data, generic_data, generic_data, generic_data)),
row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
Based on your description, I think you're looking for map_depth.
From the documentation (https://purrr.tidyverse.org/reference/map_if.html):
map_depth(x, 2, fun) is equivalent to x <- map(x, ~ map(., fun))
which looks like the solution you settled on.
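To illustrate that equivalence, here is a small sketch on a made-up nested list (not the x2 object from the question). add_increase is a hypothetical helper that uses base transform() in place of mutate() so that only purrr is needed.

```r
library(purrr)

# Made-up nested list: two outer elements, each holding one data frame.
nested <- list(
  list(data.frame(cost = c(21, 30))),
  list(data.frame(cost = c(44, 100)))
)

# Base R stand-in for the mutate() call used in the question.
add_increase <- function(df) transform(df, increa = cost + 15)

# Apply the function to every element at depth 2, in two ways.
res_depth  <- map_depth(nested, 2, add_increase)
res_nested <- map(nested, ~ map(.x, add_increase))
```

For the x2 object in the question, the inner tibbles would first have to be pulled out (for example with map(x2$data, ~ .x$data)) before either form applies.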
I have the following list containing multiple data frames.
> dput(dfs)
structure(list(a = structure(list(x = 1:4, a = c(0.114304427057505,
0.202305722748861, 0.247671527322382, 0.897279736353084)), .Names = c("x",
"a"), row.names = c(NA, -4L), class = "data.frame"), b = structure(list(
x = 1:3, b = c(0.982652948237956, 0.694535500137135, 0.0617770322132856
)), .Names = c("x", "b"), row.names = c(NA, -3L), class = "data.frame"),
c = structure(list(x = 1:2, c = c(0.792271690675989, 0.997932326048613
)), .Names = c("x", "c"), row.names = c(NA, -2L), class = "data.frame")), .Names = c("a",
"b", "c"))
Here I want to change the first column name of each data frame.
> dfs
$a
x a
1 1 0.1143044
2 2 0.2023057
3 3 0.2476715
4 4 0.8972797
$b
x b
1 1 0.98265295
2 2 0.69453550
3 3 0.06177703
$c
x c
1 1 0.7922717
2 2 0.9979323
I am using the following function
> lapply(dfs,function(x){ names(x)[1] <- 'sec';x})
$a
sec a
1 1 0.1143044
2 2 0.2023057
3 3 0.2476715
4 4 0.8972797
$b
sec b
1 1 0.98265295
2 2 0.69453550
3 3 0.06177703
$c
sec c
1 1 0.7922717
2 2 0.9979323
It works, but when I print the original list again, the column names have not changed.
How do I assign the result back to the original list?
Thank you.
You have to assign the result of lapply to a variable, like this:
dfs <- lapply(dfs,function(x){
names(x)[1] <- 'sec'
return(x)
})
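The reason is that lapply returns a new list and leaves its input untouched. A short sketch with a made-up two-element list shows the difference:

```r
# Small made-up list of data frames.
dfs <- list(a = data.frame(x = 1:2, a = c(0.1, 0.2)),
            b = data.frame(x = 1:3, b = c(0.3, 0.4, 0.5)))

# lapply() works on copies; the renamed data frames live only in `renamed`.
renamed <- lapply(dfs, function(d) { names(d)[1] <- "sec"; d })
names(dfs$a)      # still "x" "a"
names(renamed$a)  # "sec" "a"

# Overwrite the original list to keep the change.
dfs <- renamed
```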
I am not sure if this question is too basic, but as I haven't found an answer despite searching Google for quite some time, I have to ask here.
Suppose I want to create a list out of data frames (df1 and df2), how can I use the name of the data frame as the list "index"(?) instead of numbers? I.e., how do I get [[df1]] instead of [[1]] and [[df2]] instead of [[2]]?
list(structure(list(a = 1:10, b = 1:10), .Names = c("a", "b"), row.names = c(NA,
-10L), class = "data.frame"), structure(list(b = 1:10, a = 1:10), .Names = c("b",
"a"), row.names = c(NA, -10L), class = "data.frame"))
OK, entirely different way to ask this question to hopefully make things clearer ;)
I have three data frames
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
I want to create a list out of them..
li <- list(weguihl, raeg, awezilf)
But now I have the problem that, without remembering the order of the data frames, I do not know which data frame is which in the list.
> li
[[1]]
a b
1 1 1
2 2 2
3 3 3
[[2]]
b a
1 1 1
2 2 2
3 3 3
[[3]]
a b
1 1 1
2 2 2
3 3 3
Thus I'd prefer this output
> li
[[weguihl]]
a b
1 1 1
2 2 2
3 3 3
[[raeg]]
b a
1 1 1
2 2 2
3 3 3
[[awezilf]]
a b
1 1 1
2 2 2
3 3 3
How do I get there?
You could potentially achieve this with mget on a clean global environment.
Something like this:
Clean the global environment (note that this removes all objects from your workspace)
rm(list = ls())
Your data frames
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
Running mget which will return a list of data frames by default
li <- mget(ls(), .GlobalEnv)
li
# $awezilf
# a b
# 1 1 1
# 2 2 2
# 3 3 3
#
# $raeg
# b a
# 1 1 1
# 2 2 2
# 3 3 3
#
# $weguihl
# a b
# 1 1 1
# 2 2 2
# 3 3 3
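If clearing the whole workspace is not an option, two alternatives keep the names without touching other objects. This is a sketch; df_names is a made-up helper variable, and the second option assumes the three objects exist in the environment where the code runs.

```r
weguihl <- data.frame(a = 1:3, b = 1:3)
raeg    <- data.frame(b = 1:3, a = 1:3)
awezilf <- data.frame(a = 1:3, b = 1:3)

# Option 1: name the elements when building the list.
li <- list(weguihl = weguihl, raeg = raeg, awezilf = awezilf)

# Option 2: look the objects up by name in the current environment.
df_names <- c("weguihl", "raeg", "awezilf")
li2 <- mget(df_names, envir = environment())
```

Either way the elements can then be accessed as li$raeg or li[["raeg"]], and the list keeps the order you specified rather than the alphabetical order returned by ls().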