R rbind a dataframe of dataframes


How is it possible to concatenate a data.frame that contains one or more data.frames among its columns? For example:
df <- data.frame(a=1:3)
df$df <- data.frame(a=1:3)
rbind( df, df)
Error in row.names<-.data.frame(*tmp*, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘2’, ‘3’
library(dplyr)
bind_rows(list(df,df))
Error: Argument 2 can't be a list containing data frames

The issue here seems to be not the data.frame nested within a data.frame, but the non-unique row names in the result. If you make sure the row names of the nested data.frames will still be unique after rbind, it works:
df1 <- data.frame(a=1:3)
df2 <- data.frame(a=1:3)
df1$df <- data.frame(a=1:3, row.names=letters[1:3])
df2$df <- data.frame(a=1:3, row.names=LETTERS[1:3])
> res <- rbind(df1, df2)
> res
a a
1 1 1
2 2 2
3 3 3
4 1 1
5 2 2
6 3 3
> res$df
a
a 1
b 2
c 3
A 1
B 2
C 3
The problem seems to be that rbind adjusts the rownames for the two data.frames being merged, but does not adjust the rownames for data.frames within data.frames.
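One way to sidestep the row-name clash is to bind the ordinary columns and the nested data.frame separately, then reattach the nested result. This is only a sketch, assuming both inputs share a single nested data.frame column whose name is passed in; the helper name bind_nested is hypothetical and not part of base R or dplyr.

```r
# Hypothetical helper: rbind two data frames that share one nested
# data.frame column, named by `col`.
bind_nested <- function(x, y, col) {
  outer_cols <- setdiff(names(x), col)
  # Bind the ordinary columns; rbind renumbers these row names itself.
  out <- rbind(x[outer_cols], y[outer_cols])
  # Bind the nested data frames on their own, then reattach the result.
  out[[col]] <- rbind(x[[col]], y[[col]])
  out
}

df <- data.frame(a = 1:3)
df$df <- data.frame(a = 1:3)

res <- bind_nested(df, df, "df")
str(res)
```

Because the nested data.frames are bound in their own rbind call, their automatic row names are renumbered like any other data.frame's.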

One option is to replicate the rows of df twice (or more) instead of rbind-ing it; this automatically creates non-duplicated row.names. Try this:
df[rep(seq_len(nrow(df)), 2), ]
# output
a a
1 1 1
2 2 2
3 3 3
1.1 1 1
2.1 2 2
3.1 3 3
The same process using dplyr gives plain sequential row.names:
library(dplyr)
df %>% slice(rep(row_number(), 2))
# output
a a
1 1 1
2 2 2
3 3 3
4 1 1
5 2 2
6 3 3

We may put the data frames in a list, then use mapply to handle the column types differently: stack for vectors and do.call(rbind) for data.frames.
L <- mget(ls(pattern="df\\.")) # or list(df.1, df.2, df.3)
res <- data.frame(a=stack(mapply(`[`, L, 1))[[1]])
res$df <- do.call(rbind, mapply(`[`, L, 2))
res
# a a
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
# 6 6 6
# 7 7 7
# 8 8 8
# 9 9 9
str(res)
# 'data.frame': 9 obs. of 2 variables:
# $ a : int 1 2 3 4 5 6 7 8 9
# $ df:'data.frame': 9 obs. of 1 variable:
# ..$ a: int 1 2 3 4 5 6 7 8 9
Data
df.1 <- structure(list(a = 1:3, df = structure(list(a = 1:3), class = "data.frame", row.names = c(NA,
-3L))), row.names = c(NA, -3L), class = "data.frame")
df.2 <- structure(list(a = 4:6, df = structure(list(a = 4:6), class = "data.frame", row.names = c(NA,
-3L))), row.names = c(NA, -3L), class = "data.frame")
df.3 <- structure(list(a = 7:9, df = structure(list(a = 7:9), class = "data.frame", row.names = c(NA,
-3L))), row.names = c(NA, -3L), class = "data.frame")

Related

how to separate 2 numbers within a column in R

In R, I want to separate numbers that are in the same column. My data appear like this:
id time
1 1,2
2 3,4
3 4,5,6
I want it to appear like this:
1 1
1 2
2 3
2 4
3 4
3 5
3 6
Though not shown, there are different iterations of time that vary depending on the id. For example:
4 1,6,7
5 1,3,6
6 1,4,5
7 1,3,5
8 2,3,4
There are 100 ids and the time column has different #s that vary in order as shown above.
Does anyone have advice to do this?

An option with separate_rows
library(dplyr)
library(tidyr)
df %>%
separate_rows(time, sep = "(?<=.)(?=.)", convert = TRUE)
# A tibble: 4 x 2
# id time
# <dbl> <int>
#1 1 1
#2 1 2
#3 2 3
#4 2 4
data
df <- structure(list(id = c(1, 2), time = c(12, 34)), class = "data.frame",
row.names = c(NA,
-2L))

Using tidyverse you could try the following. Make sure time is character type, and use strsplit to split it on the commas.
library(tidyverse)
df %>%
mutate(time = strsplit(as.character(time), ",")) %>%
unnest(cols = time)
Or you can just use separate_rows and indicate comma as separator:
df %>%
separate_rows(time, sep = ',')
Or in base R you could try this:
s <- strsplit(df$time, ',', fixed = TRUE)
# repeat each id once per split value, so id and time stay aligned
data.frame(id = rep(df$id, lengths(s)), time = unlist(s))
Output
# A tibble: 10 x 2
id time
<int> <chr>
1 1 1
2 1 2
3 2 3
4 2 4
5 3 4
6 3 5
7 3 6
8 4 1
9 4 6
10 4 7
Data
df <- structure(list(id = 1:4, time = c("1,2", "3,4", "4,5,6", "1,6,7"
)), class = "data.frame", row.names = c(NA, -4L))

How to use filter_at over a list of dataframes

I want to apply a filter_at over a list of dataframes. I can apply it to a single dataframe within this list like so:
dat_list[[1]] <- dat_list[[1]] %>% filter_at(vars(c("test", "x")), all_vars(!is.na(.)))
Here is the test dataset:
dat1 <- structure(list(id = 1:3, test = 4:6, x = 7:9), class = "data.frame", row.names = c(NA,-3L))
dat2 <- structure(list(id = 1:3, test = 4:6, x = 7:9), class = "data.frame", row.names = c(NA,-3L))
dat3 <- structure(list(id = 1:3, test = 4:6, x = 7:9), class = "data.frame", row.names = c(NA,-3L))
dat1[1,2] <- NA
dat1[1,3] <- NA
dat1[3,2] <- NA
dat1[3,3] <- NA
dat3[1,2] <- NA
dat3[1,3] <- NA
dat3[3,2] <- NA
dat3[3,3] <- NA
dat_list <- list(dat1, dat2, dat3)

Using tidyverse:
library(dplyr)
library(purrr)
dat_list2 <- map(dat_list, ~filter_at(., vars(c("test", "x")), all_vars(!is.na(.))))
dat_list2
#> [[1]]
#> id test x
#> 1 2 5 8
#>
#> [[2]]
#> id test x
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9
#>
#> [[3]]
#> id test x
#> 1 2 5 8
Created on 2020-07-08 by the reprex package (v0.3.0)

With dplyr 1.0.0, we can use filter with across
library(dplyr)#1.0.0
library(purrr)
dat_list %>%
map(~ .x %>% filter(across(c(test, x), ~ !is.na(.x))))
#[[1]]
# id test x
#1 2 5 8
#[[2]]
# id test x
#1 1 4 7
#2 2 5 8
#3 3 6 9
#[[3]]
# id test x
#1 2 5 8
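For comparison, the same filter can be sketched in base R with lapply and complete.cases. The helper name keep_complete and the cut-down two-element list are illustrative assumptions, not part of the original answers.

```r
# Smaller stand-in for the question's data: dat1 has NAs in rows 1 and 3.
dat1 <- data.frame(id = 1:3, test = c(NA, 5L, NA), x = c(NA, 8L, NA))
dat2 <- data.frame(id = 1:3, test = 4:6, x = 7:9)
dat_list <- list(dat1, dat2)

# Hypothetical helper: keep rows where none of `cols` is NA.
keep_complete <- function(d, cols) {
  d[complete.cases(d[cols]), , drop = FALSE]
}

dat_list2 <- lapply(dat_list, keep_complete, cols = c("test", "x"))
dat_list2
```

Unlike filter(), this keeps the original row names of the surviving rows.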

R: Merge two data frames based on value in column and return all values of both data frames

Let's say I have the following dfs
df1:
a b c d
1 2 3 4
4 3 3 4
9 7 3 4
df2:
a b c d
1 2 3 4
2 2 3 4
3 2 3 4
Now I want to merge both dfs conditional of column "a" to give me the following df
a b c d
1 2 3 4
4 3 3 4
9 7 3 4
2 2 3 4
3 2 3 4
In my dataset I tried using
merge <- merge(x = df1, y = df2, by = "a", all = TRUE)
However, while df1 has 50,000 entries and df2 has 100,000 entries, and there are definitely matching values in column a, the merged df has over one million entries. I do not understand this. As I understand it, there should be at most 150,000 entries in the merged df, and that only when no values in column a are equal between the two dfs.

I think what you want to do is not merge but rather rbind the two dataframes and remove the duplicated rows:
DATA:
df1 <- data.frame(a = c(1,4,9),
b = c(2,3,7),
c = c(3,3,3),
d = c(4,4,4))
df2 <- data.frame(a = c(1,2,3),
b = c(2,2,2),
c = c(3,3,3),
d = c(4,4,4))
SOLUTION:
Row-bind df1 and df2:
df3 <- rbind(df1, df2)
Remove the duplicate rows:
df3 <- df3[!duplicated(df3), ]
RESULT:
df3
a b c d
1 1 2 3 4
2 4 3 3 4
3 9 7 3 4
5 2 2 3 4
6 3 2 3 4
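On the row count in the question: merge pairs every matching row of x with every matching row of y, so repeated values in the key column multiply rows. That is one likely cause of the million-row result, assuming column a contains duplicates in both data frames (the question does not say). A small sketch with made-up data:

```r
# Made-up data: the key a = 1 appears twice in x and three times in y.
x <- data.frame(a = c(1, 1, 2), b = 1:3)
y <- data.frame(a = c(1, 1, 1, 3), c = 1:4)

m <- merge(x, y, by = "a", all = TRUE)

# a = 1 contributes 2 * 3 = 6 rows; a = 2 and a = 3 contribute one each.
nrow(m)
nrow(x) + nrow(y)
```

Here the merged result has more rows than the two inputs combined, which is the same effect at a smaller scale.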

With tidyverse, we can do bind_rows and distinct
library(dplyr)
bind_rows(df1, df2) %>%
distinct
data
df1 <- structure(list(a = c(1, 4, 9), b = c(2, 3, 7), c = c(3, 3, 3),
d = c(4, 4, 4)), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 3, 3),
d = c(4, 4, 4)), class = "data.frame", row.names = c(NA,
-3L))

This is also possible with union from dplyr:
dplyr::union(df1, df2)

Here is another base R solution using rbind + %in%
dfout <- rbind(df1,subset(df2,!a %in% df1$a))
such that
> rbind(df1,subset(df2,!a %in% df1$a))
a b c d
1 1 2 3 4
2 4 3 3 4
3 9 7 3 4
21 2 2 3 4
31 3 2 3 4

Expand rows with lists as observations

I have the following frame:
df <- structure(list(returns = list(c(1,2,3,4,5,6), c(7,8,9,10,11,12)), indexId = c("a", "b")), class = "data.frame", row.names = 1:2)
Is there an easy way to convert this into a normal data.frame so it appears as:
Choice ppl
1 a
2 a
3 a
4 a
5 a
6 a
7 b
8 b
9 b
10 b
11 b
12 b
I have a solution using For but I am looking for something simpler.
All help is much appreciated!

df <- structure(list(returns = list(c(1,2,3,4,5,6), c(7,8,9,10,11,12)),
indexId = c("a", "b")), class = "data.frame", row.names = 1:2)
library(tidyverse)
df %>% unnest(cols = returns)
# returns indexId
# 1 1 a
# 2 2 a
# 3 3 a
# 4 4 a
# 5 5 a
# 6 6 a
# 7 7 b
# 8 8 b
# 9 9 b
# 10 10 b
# 11 11 b
# 12 12 b

Or, in base R:
data.frame(choice = unlist(df$returns), ppl = rep(df$indexId, lengths(df$returns)))

Modify, extract, and concatenate list sub-elements into a data.frame in R with tidyverse

I'm trying to find an elegant way to work with list structures in R. In particular, in this case, I'd like to extract sub-elements from a list, modify them based on their associated data in that list, and concatenate them into a data frame. Perhaps easier with an example:
mystruct <- structure(list(dataset1 = structure(list(data1 = structure(list(
a = c(1, 2, 3), b = c(4, 5, 6)), .Names = c("a", "b"), row.names = c(NA,
-3L), class = "data.frame"), data2 = c("a", "b", "c", "d", "e"
)), .Names = c("data1", "data2")), dataset2 = structure(list(
data1 = structure(list(a = c(7, 8, 9), b = c(10, 11, 12)), .Names = c("a",
"b"), row.names = c(NA, -3L), class = "data.frame"), data2 = c("f",
"g", "h", "i", "j")), .Names = c("data1", "data2"))), .Names = c("dataset1",
"dataset2"))
I can concatenate data1 elements like this:
> mystruct %>% map_dfr(~.x$data1)
a b
1 1 4
2 2 5
3 3 6
4 7 10
5 8 11
6 9 12
But I would like to add a "dataset" column, which is populated by the name of the list element from whence the data was taken:
dataset a b
1 dataset1 1 4
2 dataset1 2 5
3 dataset1 3 6
4 dataset2 7 10
5 dataset2 8 11
6 dataset2 9 12
Is there a way to do this nicely with the tidyverse? I'd also be open to data.table solutions.
Thanks,
Allie

Provide an .id parameter to map_df, which will create a column giving the name of the list:
map_df(mystruct, 'data1', .id='dataset')
# dataset a b
#1 dataset1 1 4
#2 dataset1 2 5
#3 dataset1 3 6
#4 dataset2 7 10
#5 dataset2 8 11
#6 dataset2 9 12
Or map_dfr should work as well:
map_dfr(mystruct, 'data1', .id='dataset')

map_dfr has an .id argument:
mystruct %>% map_dfr(~ .x$data1, .id = "id")
giving:
id a b
1 dataset1 1 4
2 dataset1 2 5
3 dataset1 3 6
4 dataset2 7 10
5 dataset2 8 11
6 dataset2 9 12
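If purrr is not available, the same result can be sketched in base R by tagging each data1 with its list name before binding. The cut-down mystruct below mirrors the question's structure; the variable names parts and out are illustrative.

```r
# Cut-down version of the question's structure.
mystruct <- list(
  dataset1 = list(data1 = data.frame(a = c(1, 2, 3), b = c(4, 5, 6)),
                  data2 = letters[1:5]),
  dataset2 = list(data1 = data.frame(a = c(7, 8, 9), b = c(10, 11, 12)),
                  data2 = letters[6:10])
)

# Prepend the list element's name as a "dataset" column, one piece per element.
parts <- lapply(names(mystruct), function(nm) {
  data.frame(dataset = nm, mystruct[[nm]]$data1, stringsAsFactors = FALSE)
})

# Stack the pieces into a single data frame.
out <- do.call(rbind, parts)
out
```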

Restructure as a "tidy" table with list columns...
library(data.table)
tabstruct = rbindlist(lapply(mystruct, lapply, list), id = TRUE)
# .id data1 data2
# 1: dataset1 <data.frame> a,b,c,d,e
# 2: dataset2 <data.frame> f,g,h,i,j
Then "unnest" data1:
tabstruct[, rbindlist(setNames(data1, .id), id=TRUE)]
# .id a b
# 1: dataset1 1 4
# 2: dataset1 2 5
# 3: dataset1 3 6
# 4: dataset2 7 10
# 5: dataset2 8 11
# 6: dataset2 9 12
Or unnest data2:
tabstruct[, .(val = unlist(data2)), by=.id]
# .id val
# 1: dataset1 a
# 2: dataset1 b
# 3: dataset1 c
# 4: dataset1 d
# 5: dataset1 e
# 6: dataset2 f
# 7: dataset2 g
# 8: dataset2 h
# 9: dataset2 i
# 10: dataset2 j

Here is an option to do this on multiple datasets in the list
map(c('data1', 'data2'), ~
map2_df(mystruct, .x, ~ .x[[.y]], .id = 'id'))
#[[1]]
# id a b
#1 dataset1 1 4
#2 dataset1 2 5
#3 dataset1 3 6
#4 dataset2 7 10
#5 dataset2 8 11
#6 dataset2 9 12
#[[2]]
# A tibble: 5 x 3
# id dataset1 dataset2
# <chr> <chr> <chr>
#1 1 a f
#2 1 b g
#3 1 c h
#4 1 d i
#5 1 e j
