I would like to merge two data frames into one, with each cell becoming a vector or a list.
Columns have the same name in both dataframes. Some columns are made of numerical values that I want to keep as numerical values in the merged dataframe. Some columns are made of characters.
For example, I would like to go from these two data frames:
DF1 <- data.frame(
  xx = c(1:5),
  yy = c(2:6),
  zz = c("a","b","c","d","e"))
DF2 <- data.frame(
  xx = c(3:7),
  yy = c(5:9),
  zz = c("a","i","h","g","f"))
Which look like this:
DF1
xx  yy  zz
1   2   a
2   3   b
3   4   c
4   5   d
5   6   e

DF2
xx  yy  zz
3   5   a
4   6   i
5   7   h
6   8   g
7   9   f
To get a dataframe looking like this:
xx      yy      zz
c(1,3)  c(2,5)  c(a,a)
c(2,4)  c(3,6)  c(b,i)
c(3,5)  c(4,7)  c(c,h)
c(4,6)  c(5,8)  c(d,g)
c(5,7)  c(6,9)  c(e,f)
I have tried with paste() or str_c() but it always transforms my numerical values into char and it does not create a list or a vector like I want.
Do you know of any functions that could help me do that?
Using some tidyverse, you can invert the lists and then build it all back together.
library(purrr)
library(dplyr)
as_tibble(map2(DF1, DF2, ~ map(transpose(list(.x, .y)), unlist)))
This gets you your data frame of vectors.
# A tibble: 5 x 3
xx yy zz
<list> <list> <list>
1 <int [2]> <int [2]> <chr [2]>
2 <int [2]> <int [2]> <chr [2]>
3 <int [2]> <int [2]> <chr [2]>
4 <int [2]> <int [2]> <chr [2]>
5 <int [2]> <int [2]> <chr [2]>
Breaking this down...
1. transpose(list(.x, .y)) flips a paired list of columns inside-out, from a list of two vectors to a list of 5 elements (one for each row, each with two list elements in it).
2. map(transpose(list(.x, .y)), unlist) iterates over each of the 5 lists and unlists them back from a list of 2 to a vector of 2.
3. map2(DF1, DF2, ~ map(transpose(list(.x, .y)), unlist)) iterates over each column pair from DF1 and DF2 (i.e., xx, yy, zz), doing steps 1 and 2.
4. as_tibble(map2(DF1, DF2, ~ map(transpose(list(.x, .y)), unlist))) converts the list to a tibble (basically a data.frame).
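The breakdown above can be traced on a single column pair. The following is a minimal sketch using the xx columns from the question; the names `x`, `y`, `pairs`, `cells` and `cells_alt` are illustrative only. The last line uses `map2(x, y, c)`, an alternative that is not part of the answer above but should produce the same cells in one step.

```r
library(purrr)

x <- 1:5  # stands in for DF1$xx
y <- 3:7  # stands in for DF2$xx

# Step 1: turn a list of 2 vectors into a list of 5 pairs (one per row)
pairs <- transpose(list(x, y))
str(pairs[[1]])  # a list holding the two values for row 1

# Step 2: collapse each pair from a list of 2 into a vector of 2
cells <- map(pairs, unlist)
cells[[1]]

# Alternative (not from the answer): combine element-wise with c()
cells_alt <- map2(x, y, c)
cells_alt[[1]]
```

Because `unlist()` and `c()` both keep integer inputs as integers, the numeric columns stay numeric, which is what the question asks for.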
Another thing you can do is stack the data and then nest() it. You again need a few steps to do it. This would scale better because you could do this with more than 2 data frames.
library(dplyr)
library(tibble)
library(tidyr)
bind_rows(rowid_to_column(DF1),
          rowid_to_column(DF2)) %>%
  group_by(rowid) %>%
  nest(nest_data = -rowid) %>%
  unnest_wider(nest_data) %>%
  ungroup() %>%
  select(-rowid)
This also gets you your data frame of vectors.
# A tibble: 5 x 3
xx yy zz
<list> <list> <list>
1 <int [2]> <int [2]> <chr [2]>
2 <int [2]> <int [2]> <chr [2]>
3 <int [2]> <int [2]> <chr [2]>
4 <int [2]> <int [2]> <chr [2]>
5 <int [2]> <int [2]> <chr [2]>
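Since the stacking approach is said to scale beyond two data frames, here is a hedged sketch of that case. Note that it swaps the nest()/unnest_wider() pair for summarise(across(everything(), list)), which is a different technique from the code above and assumes dplyr 1.0 or later. The names `dfs` and `out` are hypothetical, and reusing DF2 as a third input is only for illustration.

```r
library(dplyr)
library(tibble)

DF1 <- data.frame(xx = 1:5, yy = 2:6, zz = c("a", "b", "c", "d", "e"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(xx = 3:7, yy = 5:9, zz = c("a", "i", "h", "g", "f"),
                  stringsAsFactors = FALSE)

# Any number of data frames with the same columns and row count
dfs <- list(DF1, DF2, DF2)

out <- dfs %>%
  lapply(rowid_to_column) %>%              # add a rowid column to each input
  bind_rows() %>%                          # stack them on top of each other
  group_by(rowid) %>%                      # one group per original row
  summarise(across(everything(), list)) %>% # wrap each group's values in a list
  select(-rowid)

out$xx[[1]]
out$zz[[2]]
```

Each cell then holds one value per input data frame, in the order the data frames were stacked.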
This gives you matrices in a list:
res <- setNames(
lapply( colnames(DF1), function(x) cbind(DF1[[x]], DF2[[x]]) ),
colnames(DF1) )
To convert the result into a data frame you can use this:
data.frame( sapply(
names(res), function(x){ sapply(
1:nrow(res$xx), function(y){ list(res[[x]][y,1:ncol(res$xx)]) }
) }
) )
xx yy zz
1 1, 3 2, 5 a, a
2 2, 4 3, 6 b, i
3 3, 5 4, 7 c, h
4 4, 6 5, 8 d, g
5 5, 7 6, 9 e, f
Put together in a function:
EDIT: Added functionality to apply any number of DFs
(against what the question demands, but seemed to be necessary)
morph <- function(...){
  abc <- list(...)
  res <- sapply( colnames(abc[[1]]), function(col) list(
    sapply( abc, function(dfr) dfr[[col]] ) ) )
  data.frame( sapply(
    names(res), function(x){ sapply(
      1:nrow(res[[1]]), function(y){ list(res[[x]][y,1:ncol(res[[1]])]) }
    ) }
  ) )
}
morph(DF1, DF2, DF2)
xx yy zz
1 1, 3, 3 2, 5, 5 a, a, a
2 2, 4, 4 3, 6, 6 b, i, i
3 3, 5, 5 4, 7, 7 c, h, h
4 4, 6, 6 5, 8, 8 d, g, g
5 5, 7, 7 6, 9, 9 e, f, f
Because your data mixes types, there is no straightforward answer. However, here is one possible solution that builds a nested list (note that it converts the numeric columns to character). Let me know if this is what you need:
library(BBmisc)
library(dplyr)
colvec <- c("xx2","yy2","zz2")
colnames(DF2) <- colvec
DF <- bind_cols(DF1,DF2)
cols.num <- c("xx","xx2","yy","yy2")
DF[cols.num] <- sapply(DF[cols.num],as.character)
DF <- DF[,c(1,4,2,5,3,6)]
xx <- convertRowsToList(DF[,1:2])
yy <- convertRowsToList(DF[,3:4])
zz <- convertRowsToList(DF[,5:6])
final_list <- list(xx,yy,zz)
Try the following base R option
> data.frame(Map(function(x, y) asplit(cbind(x, y), 1), DF1, DF2))
xx yy zz
1 1, 3 2, 5 a, a
2 2, 4 3, 6 b, i
3 3, 5 4, 7 c, h
4 4, 6 5, 8 d, g
5 5, 7 6, 9 e, f
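If plain vectors are wanted in each cell, a base R variant can combine the values with c() and assign the resulting lists as columns. This is only a sketch, not a drop-in replacement for the one-liner above; `cells` and `out` are illustrative names.

```r
DF1 <- data.frame(xx = 1:5, yy = 2:6, zz = c("a", "b", "c", "d", "e"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(xx = 3:7, yy = 5:9, zz = c("a", "i", "h", "g", "f"),
                  stringsAsFactors = FALSE)

# For each column pair, build a list of length nrow(DF1) holding c(x_i, y_i).
# unname() drops the element names that Map() adds for character input.
cells <- Map(function(x, y) unname(Map(c, x, y)), DF1, DF2)

# Start from a data frame with the right number of rows but no columns,
# then add each list as a list-column.
out <- DF1[0]
for (nm in names(cells)) out[[nm]] <- cells[[nm]]

str(out$xx[[1]])
str(out$zz[[2]])
```

The integer columns stay integer and the character column stays character, because c() is only ever applied to values of the same type.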
Sorry, I am new to R.
I need to get a data frame ready for export to JSON, but I am having trouble putting a variable back into its original c(1, 2, 3, ...) format. For example:
library(tidyr)
x<-tibble(x = 1:3, y = list(c(1,5), c(1,5,10), c(1,2,3,20)))
View(x)
This shows
1 1 c(1, 5)
2 2 c(1, 5, 10)
3 3 c(1, 2, 3, 20)
x1<-x %>% unnest(y)
x2<-x1 %>% nest(data=c(y))
View(x2)
This shows
1 1 1 variable
2 2 1 variable
3 3 1 variable
The desired format is c(...) rather than "1 variable", so that the data is ready for the JSON file:
1 1 c(1, 5)
2 2 c(1, 5, 10)
3 3 c(1, 2, 3, 20)
Please help
x$y is a list-column of doubles, whereas x2$data is a list-column of tibbles.
Use map() and unlist() to turn each tibble back into a vector of doubles.
library(tidyverse)
x2 %>%
mutate(data = map(data, unlist))
#> # A tibble: 3 x 2
#> x data
#> <int> <list>
#> 1 1 <dbl [2]>
#> 2 2 <dbl [3]>
#> 3 3 <dbl [4]>
Alternatively, instead of nesting, you can use summarise.
x1 %>%
group_by(x) %>%
summarise(data = list(y))
#> # A tibble: 3 x 2
#> x data
#> <int> <list>
#> 1 1 <dbl [2]>
#> 2 2 <dbl [3]>
#> 3 3 <dbl [4]>
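One detail that may matter before exporting: unlist() on a one-column tibble returns a named vector (y1, y2, ...). If those names are unwanted, the column can be pulled out directly instead. The sketch below is a small variation on the code above, not the original answer's method; `via_unlist` and `via_column` are illustrative names.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

x <- tibble(x = 1:3, y = list(c(1, 5), c(1, 5, 10), c(1, 2, 3, 20)))

# Reproduce the nested tibble from the question
x2 <- x %>% unnest(y) %>% nest(data = c(y))

# unlist() flattens each tibble but attaches names derived from the column
via_unlist <- x2 %>% mutate(data = map(data, unlist))
names(via_unlist$data[[1]])

# Extracting the y column gives a plain, unnamed numeric vector
via_column <- x2 %>% mutate(data = map(data, ~ .x$y))
via_column$data[[2]]
```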
I am trying to create a list column within a data frame, specifying the range using existing columns, something like:
# A tibble: 3 x 3
A B C
<dbl> <dbl> <list>
1 1 6 c(1, 2, 3, 4, 5, 6)
2 2 5 c(2, 3, 4, 5)
3 3 4 c(3, 4)
The catch is that it would need to be created as follows:
df %>% mutate(C = c(A:B))
I have a dataset containing integers entered as ranges, i.e. someone has entered "7 to 26". I've separated the ranges into two columns, A and B (or "start" and "end"), and was hoping to use c(A:B) to create a list, but using dplyr I keep getting:
Warning messages:
1: In a:b : numerical expression has 3 elements: only the first used
2: In a:b : numerical expression has 3 elements: only the first used
Which gives:
# A tibble: 3 x 3
A B C
<dbl> <dbl> <list>
1 1 6 list(1:6)
2 2 5 list(1:6)
3 3 4 list(1:6)
Has anyone had a similar issue and found a workaround?
You can use map2() in purrr
library(dplyr)
df %>%
mutate(C = purrr::map2(A, B, seq))
or do rowwise() before mutate()
df %>%
rowwise() %>%
mutate(C = list(A:B)) %>%
ungroup()
Both methods give
# # A tibble: 3 x 3
# A B C
# <int> <int> <list>
# 1 1 6 <int [6]>
# 2 2 5 <int [4]>
# 3 3 4 <int [2]>
Data
df <- tibble::tibble(A = 1:3, B = 6:4)
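The warning in the question arises because `:` is not vectorised: given two whole columns, it uses only the first element of each and builds a single sequence. The snippet below is a small sketch of that behaviour next to the map2() fix; `first_only` and the extra column `n` are illustrative additions, not part of the answer.

```r
library(dplyr)
library(purrr)

df <- tibble::tibble(A = 1:3, B = 6:4)

# `:` on whole columns only looks at A[1] and B[1] (and warns about it)
first_only <- suppressWarnings(df$A:df$B)
first_only

# map2() calls seq() once per row, so each row gets its own sequence
out <- df %>%
  mutate(C = map2(A, B, ~ seq(.x, .y)),
         n = lengths(C))  # number of values in each row's sequence
out$C[[2]]
out$n
```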
Suppose I have:
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 6, 8, 10)
my.list <- list(start = x, end = y) %>% as.data.frame()
I need to define a new variable that contains seq(start, end) (or start:end) for each row. For example, the first row should hold 1 2 and the third row 3 4 5 6.
Many thanks
We can use map2 to get the sequence of corresponding values of 'start', 'end' to create a list of vectors
library(dplyr)
library(purrr)
my.list %>%
mutate(new = map2(start, end, `:`))
# start end new
#1 1 2 1, 2
#2 2 3 2, 3
#3 3 6 3, 4, 5, 6
#4 4 8 4, 5, 6, 7, 8
#5 5 10 5, 6, 7, 8, 9, 10
Another option is rowwise
my.list %>%
rowwise %>%
mutate(new = list(start:end))
# A tibble: 5 x 3
# Rowwise:
# start end new
# <dbl> <dbl> <list>
#1 1 2 <int [2]>
#2 2 3 <int [2]>
#3 3 6 <int [4]>
#4 4 8 <int [5]>
#5 5 10 <int [6]>
Or with data.table, as @markus mentioned in the comments
library(data.table)
setDT(my.list)[, V3 := Map(`:`, start, end)]
Or with Map from base R
Map(`:`, my.list$start, my.list$end)
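If one row per value is ultimately needed rather than a list-column, the list can be expanded afterwards. This step is not part of the answer above; it is a hedged sketch that assumes tidyr is available, with `long` as an illustrative name.

```r
library(dplyr)
library(purrr)
library(tidyr)

my.list <- data.frame(start = c(1, 2, 3, 4, 5),
                      end   = c(2, 3, 6, 8, 10))

long <- my.list %>%
  mutate(new = map2(start, end, `:`)) %>%  # one sequence per row, as a list-column
  unnest(new)                              # one output row per value in the sequence

nrow(long)
long$new[5:8]  # the values generated from the third input row
```

The start and end columns are repeated for every value, so the original row can still be identified after unnesting.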
I have a data.frame with the following structure:
as_tibble(data2)
lamda meanlog sdlog freq freqsev
<dbl> <dbl> <dbl> <list> <list>
1 5 9 2 <int [4]> <list [4]>
2 2 10 2.1 <int [4]> <list [4]>
3 3 11 2.2 <int [4]> <list [4]>
where freqsev is a list of values of length freq, and freq itself is a list of values of length s, where s is the number of simulations.
library(tidyverse)
set.seed(123)
s <- 5
data <- data.frame(lamda = c(5, 2, 3), meanlog = c(9, 10, 11), sdlog = c(2, 2.1, 2.2))
data2 <- data %>% mutate(
freq = map(lamda, ~rpois(s, .x)),
freqsev = map(freq, ~map(.x, function(k) rlnorm(k, meanlog, sdlog)))
)
I would like to sum each vector in freqsev, producing a <dbl [4]> per row (where the [4] corresponds to s), i.e. one sum per simulation over the freq occurrences.
For example, for data2$freqsev[[1]][[1]] I would expect the sum of its values.
How can this be achieved? Thank you.
To be honest, this is a really complicated way of storing your data and you would probably be better off using unnest() after creating the freq column. However, you can get the sums of the freqsev vectors like this:
data2 <- data %>% mutate(
freq = map(lamda, ~rpois(s, .x)),
freqsev = map(freq, ~map(.x, function(k) rlnorm(k, meanlog, sdlog))),
freqsum = map(freqsev, ~map_dbl(.x, ~sum(.x)))
)
Because freqsev is a double-nested list, you also need to double-map the sum operation.
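To make the double map concrete, the first part of the sketch below applies it to a small hand-made nested list (the values are made up for illustration). The second part rests on an assumption: inside mutate(), meanlog and sdlog refer to the whole columns, so rlnorm() recycles all three values across the draws. If row-specific parameters were intended, pmap() can pass each row's values explicitly. The argument names `f`, `m` and `sl` are hypothetical.

```r
library(dplyr)
library(purrr)

# Part 1: outer map() walks the rows, inner map_dbl() sums each vector
freqsev <- list(
  list(c(1, 2), numeric(0), c(3, 4, 5)),
  list(10, c(20, 30), numeric(0))
)
freqsum <- map(freqsev, ~ map_dbl(.x, sum))
freqsum[[1]]  # an empty vector (a Poisson draw of 0) sums to 0

# Part 2 (assumption: each row should use its own meanlog and sdlog)
set.seed(123)
s <- 5
data <- data.frame(lamda = c(5, 2, 3), meanlog = c(9, 10, 11), sdlog = c(2, 2.1, 2.2))

data2 <- data %>% mutate(
  freq = map(lamda, ~ rpois(s, .x)),
  freqsev = pmap(
    list(freq, meanlog, sdlog),
    # f: the row's frequency vector; m, sl: that row's lognormal parameters
    function(f, m, sl) map(f, function(k) rlnorm(k, meanlog = m, sdlog = sl))
  ),
  freqsum = map(freqsev, ~ map_dbl(.x, sum))
)
lengths(data2$freqsum)  # one sum per simulation in every row
```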