purrr to replace split, apply, output nested column - r

I understand how to use split and lapply and then combine the list outputs back together using base R. I'm trying to understand the purrr way to do this. I can do it with base R and even with purrr, but since I seem to be duplicating the order variable, I'm guessing I'm doing it wrong. It feels clunky, so I don't think I get it.
What is the tidyverse approach to using info from data subsets to create a nested output column?
Base R approach to make nested column in a data frame
library(tidyverse)
set.seed(10)
# tibble() replaces the deprecated data_frame()
dat2 <- dat1 <- tibble(
  v1 = LETTERS[c(1, 1, 1, 1, 2, 2, 2, 2)],
  v2 = rep(1:4, 2),
  from = c(1, 3, 2, 1, 3, 5, 2, 1),
  to = c(1, 3, 2, 1, 3, 5, 2, 1) + sample(1:3, 8, TRUE)
)
# split into one-row pieces, add the sequence as a list-column, then recombine
dat1 <- split(dat1, dat1[c('v1', 'v2')]) %>%
  lapply(function(x){
    x$order <- list(seq(x$from, x$to))
    x
  }) %>%
  {do.call(rbind, .)}
dat1
unnest(dat1, order)
My purrr approach (what is the right way?)
dat2 %>%
  group_by(v1, v2) %>%
  nest() %>%
  mutate(order = purrr::map(data, ~ with(., seq(from, to)))) %>%
  select(-data)
Desired output
  v1       v2  from    to order
* <chr> <int> <dbl> <dbl> <list>
1 A         1     1     3 <int [3]>
2 B         1     3     4 <int [2]>
3 A         2     3     4 <int [2]>
4 B         2     5     6 <int [2]>
5 A         3     2     4 <int [3]>
6 B         3     2     3 <int [2]>
7 A         4     1     4 <int [4]>
8 B         4     1     2 <int [2]>

In this particular case it seems you're looking for:
mutate(dat2, order = map2(.x = from, .y = to, .f = seq))
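A minimal sketch of the same idea on a small made-up tibble (the values in `ranges` are illustrative, not the question's sampled data). `map2()` walks `from` and `to` in parallel, so no grouping or nesting is needed and the original columns stay in place:

```r
library(dplyr)
library(purrr)

# Hypothetical data: one range per row
ranges <- tibble(
  v1   = c("A", "A", "B"),
  from = c(1, 3, 2),
  to   = c(3, 4, 5)
)

# map2() calls seq(from[i], to[i]) for each row i and returns a list,
# which mutate() stores as a list-column
result <- ranges %>%
  mutate(order = map2(from, to, seq))

lengths(result$order)  # 3 2 4
```

Because nothing is nested away, there is no need for the `select(-data)` clean-up step from the question's version.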

Using the new, experimental, rap package:
remotes::install_github("romainfrancois/rap")
library(rap)
dat2 %>%
rap(order = ~ seq(from, to))

Related

Concatenate/merge dataframes in R into vector type cells

I would like to merge two data frames into one, with each cell becoming a vector or a list.
Columns have the same names in both data frames. Some columns hold numerical values that I want to keep as numerical values in the merged data frame. Some columns hold characters.
For example, starting from these two data frames:
DF1 <- data.frame(
  xx = c(1:5),
  yy = c(2:6),
  zz = c("a","b","c","d","e"))
DF2 <- data.frame(
  xx = c(3:7),
  yy = c(5:9),
  zz = c("a","i","h","g","f"))
Which look like this:
DF1
xx  yy  zz
1   2   a
2   3   b
3   4   c
4   5   d
5   6   e
DF2
xx  yy  zz
3   5   a
4   6   i
5   7   h
6   8   g
7   9   f
To get a dataframe looking like this:
xx      yy      zz
c(1,3)  c(2,5)  c(a,a)
c(2,4)  c(3,6)  c(b,i)
c(3,5)  c(4,7)  c(c,h)
c(4,6)  c(5,8)  c(d,g)
c(5,7)  c(6,9)  c(e,f)
I have tried paste() and str_c(), but they always convert my numerical values to character, and they do not create a list or a vector like I want.
Do you know of any functions that could help me do that?
Using some tidyverse, you can invert the lists and then build it all back together.
library(purrr)
library(dplyr)
as_tibble(map2(DF1, DF2, ~ map(transpose(list(.x, .y)), unlist)))
This gets you your data frame of vectors.
# A tibble: 5 x 3
xx yy zz
<list> <list> <list>
1 <int [2]> <int [2]> <chr [2]>
2 <int [2]> <int [2]> <chr [2]>
3 <int [2]> <int [2]> <chr [2]>
4 <int [2]> <int [2]> <chr [2]>
5 <int [2]> <int [2]> <chr [2]>
Breaking this down...
1. transpose(list(.x, .y)) flips a paired list of columns inside out: from a list of two vectors to a list of 5 elements (one for each row, each holding two list elements).
2. map(transpose(list(.x, .y)), unlist) iterates over each of the 5 lists and unlists it from a list of 2 back to a vector of 2.
3. map2(DF1, DF2, ~ map(transpose(list(.x, .y)), unlist)) iterates over each column pair from DF1 and DF2 (i.e., xx, yy, zz), doing steps 1 and 2.
4. as_tibble(map2(DF1, DF2, ~ map(transpose(list(.x, .y)), unlist))) converts the list to a tibble (basically a data.frame).
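To make steps 1 and 2 concrete, here is a small sketch that runs them on a single column pair by hand. The vectors `x` and `y` are stand-ins with the same values as DF1$xx and DF2$xx; the variable names `flipped` and `pairs` are only for illustration.

```r
library(purrr)

# One column pair, written out by hand (same values as DF1$xx and DF2$xx)
x <- 1:5
y <- 3:7

# Step 1: two vectors of length 5 become five lists of length 2
flipped <- transpose(list(x, y))
length(flipped)   # 5
flipped[[1]]      # a list holding 1 and 3

# Step 2: collapse each two-element list into a plain vector
pairs <- map(flipped, unlist)
pairs[[1]]        # 1 3
pairs[[5]]        # 5 7
```

Since each pair is built from a single column, the element type is preserved: integer pairs for xx and yy, character pairs for zz.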
Another thing you can do is stack the data and then nest() it. You again need a few steps to do it. This would scale better because you could do this with more than 2 data frames.
library(dplyr)
library(tibble)
library(tidyr)
bind_rows(rowid_to_column(DF1),
          rowid_to_column(DF2)) %>%
  group_by(rowid) %>%
  nest(nest_data = -rowid) %>%
  unnest_wider(nest_data) %>%
  ungroup() %>%
  select(-rowid)
This also gets you your data frame of vectors.
# A tibble: 5 x 3
xx yy zz
<list> <list> <list>
1 <int [2]> <int [2]> <chr [2]>
2 <int [2]> <int [2]> <chr [2]>
3 <int [2]> <int [2]> <chr [2]>
4 <int [2]> <int [2]> <chr [2]>
5 <int [2]> <int [2]> <chr [2]>
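As a sketch of how the stacking idea extends to more than two data frames, the example below adds a third, made-up data frame `DF3`. Note that it swaps the nest()/unnest_wider() pair for summarise(across(everything(), list)), a different technique that should give the same shape; it assumes dplyr 1.0 or later for across().

```r
library(dplyr)
library(tibble)

DF1 <- data.frame(xx = 1:5, yy = 2:6,
                  zz = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
DF2 <- data.frame(xx = 3:7, yy = 5:9,
                  zz = c("a", "i", "h", "g", "f"), stringsAsFactors = FALSE)
# DF3 is hypothetical, added only to show that the approach scales
DF3 <- data.frame(xx = 11:15, yy = 12:16,
                  zz = c("v", "w", "x", "y", "z"), stringsAsFactors = FALSE)

# Tag every row with its position, then stack all data frames
stacked <- bind_rows(lapply(list(DF1, DF2, DF3), rowid_to_column))

# Rows sharing a rowid are collapsed into one list-cell per column
merged <- stacked %>%
  group_by(rowid) %>%
  summarise(across(everything(), list)) %>%
  select(-rowid)

merged$xx[[1]]  # 1 3 11
merged$zz[[5]]  # "e" "f" "z"
```

Adding a fourth data frame only means adding it to the list passed to lapply(); the rest of the pipeline is unchanged.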
This gives you matrices in a list:
res <- setNames(
lapply( colnames(DF1), function(x) cbind(DF1[[x]], DF2[[x]]) ),
colnames(DF1) )
To convert the result into a data frame you can use this:
data.frame( sapply(
  names(res), function(x){ sapply(
    1:nrow(res$xx), function(y){ list(res[[x]][y,1:ncol(res$xx)]) }
  ) }
) )
xx yy zz
1 1, 3 2, 5 a, a
2 2, 4 3, 6 b, i
3 3, 5 4, 7 c, h
4 4, 6 5, 8 d, g
5 5, 7 6, 9 e, f
Put together in a function:
EDIT: Added functionality to apply any number of DFs
(against what the question demands, but seemed to be necessary)
morph <- function(...){
  abc <- list(...)
  res <- sapply( colnames(abc[[1]]), function(col) list(
    sapply( abc, function(dfr) dfr[[col]] ) ) )
  data.frame( sapply(
    names(res), function(x){ sapply(
      1:nrow(res[[1]]), function(y){ list(res[[x]][y,1:ncol(res[[1]])]) }
    ) }
  ) )
}
morph(DF1, DF2, DF2)
xx yy zz
1 1, 3, 3 2, 5, 5 a, a, a
2 2, 4, 4 3, 6, 6 b, i, i
3 3, 5, 5 4, 7, 7 c, h, h
4 4, 6, 6 5, 8, 8 d, g, g
5 5, 7, 7 6, 9, 9 e, f, f
As your data consists of different types, there is no straightforward answer. However, here is a solution that might do the trick by creating a nested list. Let me know if this is what you need:
library(BBmisc)
library(dplyr)
colvec <- c("xx2","yy2","zz2")
colnames(DF2) <- colvec
DF <- bind_cols(DF1,DF2)
cols.num <- c("xx","xx2","yy","yy2")
DF[cols.num] <- sapply(DF[cols.num],as.character)
DF <- DF[,c(1,4,2,5,3,6)]
xx <- convertRowsToList(DF[,1:2])
yy <- convertRowsToList(DF[,3:4])
zz <- convertRowsToList(DF[,5:6])
final_list <- list(xx,yy,zz)
Try the following base R option
> data.frame(Map(function(x, y) asplit(cbind(x, y), 1), DF1, DF2))
xx yy zz
1 1, 3 2, 5 a, a
2 2, 4 3, 6 b, i
3 3, 5 4, 7 c, h
4 4, 6 5, 8 d, g
5 5, 7 6, 9 e, f

Simplifying the list for nested data frame

Sorry, I am new to R.
I need to get a data frame ready for export to JSON, but I have trouble putting the variable back into its original format c(1,2,3,...). For example:
library(tidyr)
x<-tibble(x = 1:3, y = list(c(1,5), c(1,5,10), c(1,2,3,20)))
View(x)
This shows
1 1 c(1, 5)
2 2 c(1, 5, 10)
3 3 c(1, 2, 3, 20)
x1<-x %>% unnest(y)
x2<-x1 %>% nest(data=c(y))
View(x2)
This shows
1 1 1 variable
2 2 1 variable
3 3 1 variable
The desired format is c(...) rather than "1 variable", so that the data is ready for the JSON file:
1 1 c(1, 5)
2 2 c(1, 5, 10)
3 3 c(1, 2, 3, 20)
Please help
x$y is a list-column of doubles, whereas x2$data is a list-column of tibbles.
Use map and unlist to turn the tibbles into doubles.
library(tidyverse)
x2 %>%
  mutate(data = map(data, unlist))
#> # A tibble: 3 x 2
#> x data
#> <int> <list>
#> 1 1 <dbl [2]>
#> 2 2 <dbl [3]>
#> 3 3 <dbl [4]>
Alternatively, instead of nesting, you can use summarise.
x1 %>%
  group_by(x) %>%
  summarise(data = list(y))
#> # A tibble: 3 x 2
#> x data
#> <int> <list>
#> 1 1 <dbl [2]>
#> 2 2 <dbl [3]>
#> 3 3 <dbl [4]>
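One detail worth checking before exporting: unlist() on a one-column tibble returns a named vector (y1, y2, ...), which may or may not matter to the JSON writer you use. The sketch below pulls the y column out of each nested tibble by name instead, which keeps the vectors unnamed like the original x$y. The name `restored` is only for illustration.

```r
library(tidyr)
library(purrr)
library(dplyr)

x  <- tibble(x = 1:3, y = list(c(1, 5), c(1, 5, 10), c(1, 2, 3, 20)))
x2 <- x %>% unnest(y) %>% nest(data = c(y))

# unlist() keeps names derived from the column name
names(unlist(x2$data[[1]]))   # "y1" "y2"

# Extracting the column by name gives plain, unnamed doubles
restored <- x2 %>%
  mutate(y = map(data, "y")) %>%
  select(-data)

restored$y[[2]]               # 1 5 10
```

Passing a string to map() is purrr shorthand for extracting the element of that name from each list item, here the y column of each nested tibble.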

Create a list column with ranges set by existing columns

I am trying to create a list column within a data frame, specifying the range using existing columns, something like:
# A tibble: 3 x 3
A B C
<dbl> <dbl> <list>
1 1 6 c(1, 2, 3, 4, 5, 6)
2 2 5 c(2, 3, 4, 5)
3 3 4 c(3, 4)
The catch is that it would need to be created as follows:
df %>% mutate(C = c(A:B))
I have a dataset containing integers entered as ranges, i.e. someone has entered "7 to 26". I've separated the ranges into two columns, A & B (or "start" and "end"), and was hoping to use c(A:B) to create a list, but using dplyr I keep getting:
Warning messages:
1: In a:b : numerical expression has 3 elements: only the first used
2: In a:b : numerical expression has 3 elements: only the first used
Which gives:
# A tibble: 3 x 3
A B C
<dbl> <dbl> <list>
1 1 6 list(1:6)
2 2 5 list(1:6)
3 3 4 list(1:6)
Has anyone had a similar issue and found a workaround?
You can use map2() in purrr
library(dplyr)
df %>%
  mutate(C = purrr::map2(A, B, seq))
or do rowwise() before mutate()
df %>%
  rowwise() %>%
  mutate(C = list(A:B)) %>%
  ungroup()
Both methods give
# # A tibble: 3 x 3
# A B C
# <int> <int> <list>
# 1 1 6 <int [6]>
# 2 2 5 <int [4]>
# 3 3 4 <int [2]>
Data
df <- tibble::tibble(A = 1:3, B = 6:4)
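For context, a sketch of the whole path from text ranges to a list-column. The raw format ("7 to 26" in a column called `range`) is an assumption based on the question's description, and tidyr::separate() is one possible way to split it; the asker's actual splitting step is not shown in the question.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical raw input: ranges typed as text
raw <- tibble(range = c("7 to 26", "2 to 5", "3 to 4"))

expanded <- raw %>%
  # split on the literal " to " and convert the pieces to integers
  separate(range, into = c("A", "B"), sep = " to ", convert = TRUE) %>%
  # inside mutate(), A and B are whole columns, so pair them up row by row
  mutate(C = map2(A, B, seq))

lengths(expanded$C)  # 20 4 2
```

The original warning arises because `A:B` receives the full columns and `:` only uses their first elements; map2() hands it one pair of values at a time instead.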

How can I use the seq function on two columns in a dataframe?

Suppose I have:
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 6, 8, 10)
my.list <- list(start = x, end = y) %>% as.data.frame()
I need to define a new variable that contains seq(start, end) or start:end for each row. That is, I want the sequence of numbers across each row: for example, 1 2 for the first row and 3 4 5 6 for the third row.
Many thanks
We can use map2 to get the sequence of corresponding values of 'start', 'end' to create a list of vectors
library(dplyr)
library(purrr)
my.list %>%
  mutate(new = map2(start, end, `:`))
# start end new
#1 1 2 1, 2
#2 2 3 2, 3
#3 3 6 3, 4, 5, 6
#4 4 8 4, 5, 6, 7, 8
#5 5 10 5, 6, 7, 8, 9, 10
Another option is rowwise
my.list %>%
  rowwise() %>%
  mutate(new = list(start:end))
# A tibble: 5 x 3
# Rowwise:
# start end new
# <dbl> <dbl> <list>
#1 1 2 <int [2]>
#2 2 3 <int [2]>
#3 3 6 <int [4]>
#4 4 8 <int [5]>
#5 5 10 <int [6]>
Or with data.table, as @markus mentioned in the comments
library(data.table)
setDT(my.list)[, V3 := Map(`:`, start, end)]
Or with Map from base R
Map(`:`, my.list$start, my.list$end)
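One caveat applies to all of these variants: start:end and seq(start, end) count downwards when end is smaller than start, rather than returning an empty vector. If such rows can occur and should yield nothing, a guard is needed. A base R sketch follows, where `safe_range` is a hypothetical helper and not part of any package:

```r
# Hypothetical helper: ascending range, or an empty integer vector when end < start
safe_range <- function(start, end) {
  if (end < start) integer(0) else seq(start, end)
}

starts <- c(1, 4, 5)
ends   <- c(2, 8, 3)   # the last pair is deliberately reversed

plain   <- Map(`:`, starts, ends)
guarded <- Map(safe_range, starts, ends)

plain[[3]]    # counts down: 5 4 3
guarded[[3]]  # integer(0)
guarded[[2]]  # 4 5 6 7 8
```

Whether a reversed range should count down, be empty, or raise an error depends on the data, so treat the helper as one option rather than the required behaviour.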

R sum a twice nested list using purrr

I have a data.frame with the following dimensions:
Output:
as_tibble(data2)
lamda meanlog sdlog freq freqsev
<dbl> <dbl> <dbl> <list> <list>
1 5 9 2 <int [4]> <list [4]>
2 2 10 2.1 <int [4]> <list [4]>
3 3 11 2.2 <int [4]> <list [4]>
where freqsev is a list of values of length freq, and freq itself is a list of values of length s, where s is the number of simulations.
library(tidyverse)
set.seed(123)
s <- 5
data <- data.frame(lamda = c(5, 2, 3), meanlog = c(9, 10, 11), sdlog = c(2, 2.1, 2.2))
data2 <- data %>% mutate(
  freq = map(lamda, ~rpois(s, .x)),
  freqsev = map(freq, ~map(.x, function(k) rlnorm(k, meanlog, sdlog)))
)
I would like to sum freqsev, producing a <dbl [4]> for each row (where the [4] corresponds to the index of s), i.e. one sum over the freq occurrences for each simulation.
For example, for data2$freqsev[[1]][[1]] I would expect a single number: its sum.
How can this be achieved? Thank you.
To be honest, this is a really complicated way of storing your data and you would probably be better off using unnest() after creating the freq column. However, you can get the sums of the freqsev vectors like this:
data2 <- data %>% mutate(
  freq = map(lamda, ~rpois(s, .x)),
  freqsev = map(freq, ~map(.x, function(k) rlnorm(k, meanlog, sdlog))),
  freqsum = map(freqsev, ~map_dbl(.x, ~sum(.x)))
)
Because freqsev is a double-nested list, you also need to double-map the sum operation.
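The double map is easier to see on fixed numbers than on random draws. In this sketch, `freqsev` is a hand-written stand-in for the column (two rows, three simulations each), not the simulated data from the question:

```r
library(purrr)

# Hypothetical stand-in for a freqsev column:
# a list (rows) of lists (simulations) of numeric vectors (draws)
freqsev <- list(
  list(c(1, 2, 3), c(10, 20), numeric(0)),
  list(c(0.5, 0.5), 4, c(1, 1, 1))
)

# Outer map() walks the rows; inner map_dbl() sums each simulation's draws
freqsum <- map(freqsev, ~ map_dbl(.x, sum))

freqsum[[1]]  # 6 30 0
freqsum[[2]]  # 1 4 3
```

A simulation with zero occurrences contributes an empty vector, and sum() of an empty vector is 0, so those cases need no special handling.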
