I have a data.frame with the following dimensions:
Output:
as_tibble(data2)
lamda meanlog sdlog freq freqsev
<dbl> <dbl> <dbl> <list> <list>
1 5 9 2 <int [4]> <list [4]>
2 2 10 2.1 <int [4]> <list [4]>
3 3 11 2.2 <int [4]> <list [4]>
where freqsev is a list of values of length freq, and freq itself is a list of values of length s, where s is the number of simulations.
library(tidyverse)
set.seed(123)
s <- 4
data <- data.frame(lamda = c(5, 2, 3), meanlog = c(9, 10, 11), sdlog = c(2, 2.1, 2.2))
data2 <- data %>% mutate(
freq = map(lamda, ~rpois(s, .x)),
freqsev = map(freq, ~map(.x, function(k) rlnorm(k, meanlog, sdlog)))
)
I would like to sum freqsev within each simulation, producing a <dbl [4]> per row (one sum per simulation, indexed by s), i.e. a sum over the freq occurrences.
For example, for data2$freqsev[[1]][[1]] I would expect the sum of its values.
How can this be achieved? Thank you.
To be honest, this is a really complicated way of storing your data and you would probably be better off using unnest() after creating the freq column. However, you can get the sums of the freqsev vectors like this:
data2 <- data %>% mutate(
freq = map(lamda, ~rpois(s, .x)),
freqsev = map(freq, ~map(.x, function(k) rlnorm(k, meanlog, sdlog))),
freqsum = map(freqsev, ~map_dbl(.x, ~sum(.x)))
)
Because freqsev is a double-nested list, you also need to double-map the sum operation.
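To see what the double map does without the simulation around it, here is a minimal sketch on a small hand-made nested list (the name nested and its values are made up for illustration). The outer map walks over the rows, the inner map_dbl sums each vector and returns one double per vector:

```r
library(purrr)

# Hypothetical toy data: two "rows", each holding a list of numeric vectors
nested <- list(
  list(c(1, 2), c(3, 4, 5)),
  list(c(10), numeric(0))
)

# Outer map: one result per row; inner map_dbl: one sum per vector in that row
sums <- map(nested, ~ map_dbl(.x, sum))

sums[[1]]  # sums of c(1, 2) and c(3, 4, 5)
sums[[2]]  # an empty vector sums to 0
```

Note that an empty vector (a simulation with zero occurrences) simply contributes a sum of 0.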
Related
I am working on a project similar to Uber Eats. I want to create a new column in a data frame to calculate the sub-total of each order, but because the class of each column is "list", R does not let me do that. Do you know any way to do it?
Thank you
a = c(1,2,3)
b = 1:2
c = c(3,1)
P1 = c(12,13,4)
P2 = c(2,4)
P3 = c(12,1)
#My given dataframe will be:
Order | Price | Sub-total
a | P1 | sum(a*P1)
b | P2 | sum(b*P2)
c | P3 | sum(c*P3)
Expect output:
Subtotal = [50, 10, 37]
My goal is to compute a*P1, b*P2 and c*P3, and then the total sum of those products.
library(tidyverse)
orders <- list(
a = c(1,2,3),
b = 1:2,
c = c(3,1)
)
prices <- list(
P1 = c(12,13,4),
P2 = c(2,4),
P3 = c(12,1)
)
tibble(
Order = orders,
Price = prices
) %>%
mutate(
sub_total = Order %>% map2_dbl(Price, ~ sum(.x * .y))
)
#> # A tibble: 3 x 3
#> Order Price sub_total
#> <named list> <named list> <dbl>
#> 1 <dbl [3]> <dbl [3]> 50
#> 2 <int [2]> <dbl [2]> 10
#> 3 <dbl [2]> <dbl [2]> 37
Created on 2021-10-01 by the reprex package (v2.0.1)
First, store your Order and Price data in lists:
a = c(1,2,3)
b = 1:2
c = c(3,1)
P1 = c(12,13,4)
P2 = c(2,4)
P3 = c(12,1)
Order <- list(a, b, c)
Price <- list(P1, P2, P3)
Use a tibble so that you can easily set list columns.
Then using the tidyverse structure, map over the two list columns and apply your formula.
library(dplyr)
library(purrr)
df <- tibble(Order = Order, Price = Price)
df <- df %>%
mutate(Sub_total = map2_dbl(Order, Price, ~ sum( .x * .y)))
The result will be as you expected. You can see your original data stored as lists and then your sub-totals.
> df
# A tibble: 3 x 3
Order Price Sub_total
<list> <list> <dbl>
1 <dbl [3]> <dbl [3]> 50
2 <int [2]> <dbl [2]> 10
3 <dbl [2]> <dbl [2]> 37
The total sum would then be sum(df$Sub_total) which is 97.
Here is an option in base R
d1 <- data.frame(Order = I(list(a, b, c)), Price = I(list(P1, P2, P3)))
d1$Sub_total <- unlist(Map(`%*%`, d1$Order, d1$Price))
-output
> d1
Order Price Sub_total
1 1, 2, 3 12, 13, 4 50
2 1, 2 2, 4 10
3 3, 1 12, 1 37
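The reason `%*%` works here is that, for two plain vectors of equal length, it returns their inner product as a 1 x 1 matrix, which is the same number as sum(x * y). A small sketch that checks this against an explicit mapply version (the name c3 is only used here to avoid overwriting base::c; via_matmul and via_sum are illustrative names):

```r
# Same data as in the question; c3 stands in for the question's `c`
a  <- c(1, 2, 3)
b  <- 1:2
c3 <- c(3, 1)
P1 <- c(12, 13, 4)
P2 <- c(2, 4)
P3 <- c(12, 1)

Order <- list(a, b, c3)
Price <- list(P1, P2, P3)

# Inner product per pair: each element is a 1 x 1 matrix, unlist() flattens them
via_matmul <- unlist(Map(`%*%`, Order, Price))

# Explicit element-wise multiply and sum per pair
via_sum <- mapply(function(o, p) sum(o * p), Order, Price)

via_matmul
via_sum
```

Both give the same sub-totals; the mapply version is a bit more explicit about what is being computed.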
I'm trying to add the elements of an integer vector, which are nested in a two-level list.
I came up with this solution, but I think it's a little uncommon, so I am looking for another alternative:
df <- tibble(
a = list(list(c(1, 2), c(3, 4)), list(c(1, 2), c(3, 4)))
)
df %>%
mutate(
b = a %>% modify_depth(2, sum) %>% map(unlist)
)
which gives the output below, and it is the right result. But I would like a solution that relies more on map and less on modify.
# A tibble: 2 x 2
a b
<list> <list>
1 <list [2]> <dbl [2]>
2 <list [2]> <dbl [2]>
If we don't know the depth and the list elements have multiple depths or same depth, rrapply would be more general
library(dplyr)
library(purrr)
df %>%
mutate(b = rrapply::rrapply(a, f = sum) %>%
map(unlist))
-output
# A tibble: 2 x 2
# a b
# <list> <list>
#1 <list [2]> <dbl [2]>
#2 <list [2]> <dbl [2]>
Also, there is map_depth. According to ?map_depth:
map_depth() allows to apply .f to a specific depth level of a nested vector
It is used the same way as modify_depth:
df %>%
mutate(
b = a %>%
map_depth(2, sum) %>%
map(unlist) )
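If the depth is known to be exactly two, as in the question, a plain nested map is another option and avoids the extra map(unlist) step. This is only a sketch under that assumption; the name res is made up for illustration:

```r
library(tibble)
library(dplyr)
library(purrr)

df <- tibble(
  a = list(list(c(1, 2), c(3, 4)), list(c(1, 2), c(3, 4)))
)

# Outer map: one element per row of `a`;
# inner map_dbl: sum each vector and collect the sums into a double vector
res <- df %>%
  mutate(b = map(a, ~ map_dbl(.x, sum)))

res$b[[1]]
```

Because map_dbl already returns an atomic vector, column b ends up as a list of <dbl [2]> directly.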
Let's say I have the following (simplified) tibble containing a group and values in vectors:
set.seed(1)
(tb_vec <- tibble(group = factor(rep(c("A","B"), c(2,3))),
values = replicate(5, sample(3), simplify = FALSE)))
# A tibble: 5 x 2
group values
<fct> <list>
1 A <int [3]>
2 A <int [3]>
3 B <int [3]>
4 B <int [3]>
5 B <int [3]>
tb_vec[[1,2]]
[1] 1 3 2
I would like to summarize the values vectors per group by summing them (vectorized) and tried the following:
tb_vec %>% group_by(group) %>%
summarize(vec_sum = colSums(purrr::reduce(values, rbind)))
Error: Column vec_sum must be length 1 (a summary value), not 3
The error surprises me, because tibbles (the output format) can contain vectors as well.
My expected output would be the following summarized tibble:
# A tibble: 2 x 2
group vec_sum
<fct> <list>
1 A <dbl [3]>
2 B <dbl [3]>
Is there a tidyverse solution to accommodate the vector output of summarize? I want to avoid splitting the tibble, because then I lose the factor.
You just need to add list(.) within summarise in your solution, in order to be able to have a column with 2 elements, where each element is a vector of 3 values:
library(tidyverse)
set.seed(1)
(tb_vec <- tibble(group = factor(rep(c("A","B"), c(2,3))),
values = replicate(5, sample(3), simplify = FALSE)))
tb_vec %>%
group_by(group) %>%
summarize(vec_sum = list(colSums(purrr::reduce(values, rbind)))) -> res
res$vec_sum
# [[1]]
# [1] 2 4 6
#
# [[2]]
# [1] 6 5 7
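Since all vectors in a group have the same length, the rbind plus colSums step can also be replaced by reducing with `+`, which adds the vectors element-wise. A minimal sketch with fixed (non-random) values so the result is easy to check by hand; tb_fixed and res_fixed are made-up names for this example:

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical fixed data instead of sample(3), so the sums are predictable
tb_fixed <- tibble(
  group  = factor(rep(c("A", "B"), c(2, 3))),
  values = list(c(1, 2, 3), c(3, 2, 1),
                c(1, 1, 1), c(2, 2, 2), c(3, 3, 3))
)

res_fixed <- tb_fixed %>%
  group_by(group) %>%
  # reduce(values, `+`) adds the vectors of a group element-wise;
  # list() wraps the result so each group yields exactly one list-column cell
  summarize(vec_sum = list(reduce(values, `+`)))

res_fixed$vec_sum
```

The list() wrapper is still what makes the summary one value per group; only the way the vectors are added has changed.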
I would like to find the sum across occurrences and then the mean of those sums across simulations in the following:
library(tidyverse)
set.seed(123)
s <- 2
data <- data.frame(
lamda = c(5, 2, 3),
meanlog = c(9, 10, 11),
sdlog = c(2, 2.1, 2.2))
data2 <- data %>%
mutate(freq = map(lamda, ~rpois(s, .x)),
freqsev = map(freq, ~map(.x, function(k) rlnorm(k, meanlog, sdlog))))
I would like to take the sum of freqsev then the mean of the sum of freqsev over the simulation (s) dimension:
Any ideas on how this can be achieved? Thank you!
data3 <- data2 %>%
mutate(sum_freqsev = ???,
mean_sum_freqsev = ???)
Dimensions expected:
data2 is a data.frame with 3 rows (i.e. one per lamda).
sum_freqsev should be a list of <dbl [2]>, i.e. the sums of the entries in freqsev.
mean_sum_freqsev should be a single number, simply the mean of sum_freqsev per lamda.
We can use a nested map to find sum_freqsev and a single map to find mean_sum_freqsev:
library(tidyverse)
data3 <- data2 %>%
mutate(sum_freqsev = freqsev %>% map(~map_dbl(., sum)),
mean_sum_freqsev = sum_freqsev %>% map_dbl(mean),
percentile = freqsev %>% map(~map(., ~quantile(.x, c(.50, .90)))))
The inner map_dbl sums the entries of freqsev over each simulation and returns a vector of type double instead of a list with two elements.
mean_sum_freqsev is calculated by taking the mean of each list element (a vector) of sum_freqsev and returning a double.
Output:
> as_tibble(data3)
# A tibble: 3 x 8
lamda meanlog sdlog freq freqsev sum_freqsev mean_sum_freqsev percentile
<dbl> <dbl> <dbl> <list> <list> <list> <dbl> <list>
1 5 9 2 <int [2]> <list [2]> <dbl [2]> 1493880. <list [2]>
2 2 10 2.1 <int [2]> <list [2]> <dbl [2]> 623586. <list [2]>
3 3 11 2.2 <int [2]> <list [2]> <dbl [2]> 15219. <list [2]>
> data3 %>% pull(percentile)
[[1]]
[[1]][[1]]
50% 90%
24633.8 1832533.5
[[1]][[2]]
50% 90%
22461.18 114075.74
[[2]]
[[2]][[1]]
50% 90%
470808.0 845321.7
[[2]][[2]]
50% 90%
12539.82 202665.48
[[3]]
[[3]][[1]]
50% 90%
3906.931 10100.830
[[3]][[2]]
50% 90%
NA NA
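One detail in the simulation code may be worth double-checking. As far as I can tell, inside mutate() the names meanlog and sdlog in rlnorm(k, meanlog, sdlog) refer to the whole columns (length 3), not to the value in the current row, so rlnorm recycles all three parameter values across the k draws. If the intent is one meanlog/sdlog per row, a sketch with pmap could look like this (data_rowwise, f, m and sdl are names made up for the example):

```r
library(dplyr)
library(purrr)

# rlnorm recycles vector parameters: with sdlog = 0 the draws equal exp(meanlog)
recycled <- log(rlnorm(4, meanlog = c(9, 10, 11), sdlog = 0))
recycled  # the meanlog values are reused in turn across the four draws

set.seed(123)
s <- 2
data <- data.frame(
  lamda   = c(5, 2, 3),
  meanlog = c(9, 10, 11),
  sdlog   = c(2, 2.1, 2.2)
)

data_rowwise <- data %>%
  mutate(
    freq = map(lamda, ~ rpois(s, .x)),
    # pmap walks over freq, meanlog and sdlog in parallel, so m and sdl
    # are the scalars belonging to the current row
    freqsev = pmap(
      list(freq, meanlog, sdlog),
      function(f, m, sdl) map(f, ~ rlnorm(.x, m, sdl))
    )
  )

# Each simulation still has as many severities as its frequency
lengths(data_rowwise$freqsev[[1]])
```

The sum and mean steps from the answer work unchanged on this version, since the shape of freqsev is the same.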
I understand how to use split and lapply and then combine the list outputs back together using base R. I'm trying to understand the purrr way to do this. I can do it with base R and even with purrr, but since I seem to be duplicating the order variable, I'm guessing that I'm doing it wrong. It feels clunky, so I don't think I get it.
What is the tidyverse approach to using info from data subsets to create a nested output column?
Base R approach to make nested column in a data frame
library(tidyverse)
set.seed(10)
dat2 <- dat1 <- tibble(
v1 = LETTERS[c(1, 1, 1, 1, 2, 2, 2, 2)],
v2 = rep(1:4, 2),
from = c(1, 3, 2, 1, 3, 5, 2, 1),
to = c(1, 3, 2, 1, 3, 5, 2, 1) + sample(1:3, 8, TRUE)
)
dat1 <- split(dat1, dat1[c('v1', 'v2')]) %>%
lapply(function(x){
x$order <- list(seq(x$from, x$to))
x
}) %>%
{do.call(rbind, .)}
dat1
unnest(dat1, cols = order)
My purrr approach (what is the right way?)
dat2 %>%
group_by(v1, v2) %>%
nest() %>%
mutate(order = purrr::map(data, ~ with(., seq(from, to)))) %>%
select(-data)
Desired output
v1 v2 from to order
* <chr> <int> <dbl> <dbl> <list>
1 A 1 1 3 <int [3]>
2 B 1 3 4 <int [2]>
3 A 2 3 4 <int [2]>
4 B 2 5 6 <int [2]>
5 A 3 2 4 <int [3]>
6 B 3 2 3 <int [2]>
7 A 4 1 4 <int [4]>
8 B 4 1 2 <int [2]>
In this particular case it seems you're looking for:
mutate(dat2, order = map2(.x = from, .y = to, .f = seq))
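map2 walks over from and to in parallel and calls seq(from[i], to[i]) for each row, so no grouping or nesting is needed. A minimal sketch on a two-row toy tibble (toy, nested and long are made-up names), including the unnest step to get back to long format:

```r
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical two-row example
toy <- tibble(v1 = c("A", "B"), from = c(1, 3), to = c(3, 4))

# One sequence per row, stored in a list column
nested <- toy %>%
  mutate(order = map2(from, to, seq))

lengths(nested$order)  # number of elements in each row's sequence

# Expand the list column back into one row per sequence element
long <- unnest(nested, cols = order)
nrow(long)
```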
Using the new, experimental, rap package:
remotes::install_github("romainfrancois/rap")
library(rap)
dat2 %>%
rap(order = ~ seq(from, to))