Using dplyr in a for loop for multiple values - r

I have a data frame (df) like this. It has 82 SKUs, from M1 to M82.
SKU date sales
M1 2-jan 4
M2 2-jan 5
M1 3-jan 8
M82 3-jan 1
...
M82 31-dec 9
I want to filter each SKU separately and then group_by(date) and summarise(sales_perday = sum(sales)).
Something like this
for(i in SKU){
SKU_M[i] <- df %>% filter(SKU == SKU_M[i]) %>% group_by(date)
%>% summarise(sales_perday = sum(sales))
The expected output is 82 data frames, one per SKU.
I did this below for one SKU, but I want an easy way to do it for all 82.
M50 <- df %>% filter(SKU == 'M50') %>% group_by(date) %>% summarise(sales_perday = sum(sales))

You probably want to group by multiple columns:
library(tidyverse)
data <- tribble(
~SKU, ~date, ~sales,
"M1", "2-jan",4,
"M2", "2-jan",5,
"M1", "3-jan",8
)
# the concise way
data %>%
group_by(SKU, date) %>%
summarise(sales_perday = sum(sales))
#> `summarise()` has grouped output by 'SKU'. You can override using the `.groups`
#> argument.
#> # A tibble: 3 × 3
#> # Groups: SKU [2]
#> SKU date sales_perday
#> <chr> <chr> <dbl>
#> 1 M1 2-jan 4
#> 2 M1 3-jan 8
#> 3 M2 2-jan 5
# if one really wants to have multiple data frames
data %>%
group_by(SKU, date) %>%
summarise(sales_perday = sum(sales)) %>%
nest(data = -SKU) %>%
pull(data)
#> `summarise()` has grouped output by 'SKU'. You can override using the `.groups`
#> argument.
#> [[1]]
#> # A tibble: 2 × 2
#> date sales_perday
#> <chr> <dbl>
#> 1 2-jan 4
#> 2 3-jan 8
#>
#> [[2]]
#> # A tibble: 1 × 2
#> date sales_perday
#> <chr> <dbl>
#> 1 2-jan 5
Created on 2022-06-08 by the reprex package (v2.0.0)

Another option with split:
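# group by SKU as well as date, otherwise the SKU column is dropped before split()
df_summary <- df |>
group_by(SKU, date) |>
summarise(sales_perday = sum(sales), .groups = "drop")
split(df_summary, df_summary$SKU)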

If you really do want separate data frames, then after grouping by SKU and date, and then summarizing, use group_split() to partition by SKU.
library(tidyverse)
df <- tribble(
~SKU, ~date, ~sales,
"M1", "2-jan",4,
"M2", "2-jan",5,
"M1", "3-jan",8
)
df |>
group_by(SKU, date) |>
summarise(sales_perday = sum(sales)) |>
group_split()
#> `summarise()` has grouped output by 'SKU'. You can override using the `.groups`
#> argument.
#> <list_of<
#> tbl_df<
#> SKU : character
#> date : character
#> sales_perday: double
#> >
#> >[2]>
#> [[1]]
#> # A tibble: 2 × 3
#> SKU date sales_perday
#> <chr> <chr> <dbl>
#> 1 M1 2-jan 4
#> 2 M1 3-jan 8
#>
#> [[2]]
#> # A tibble: 1 × 3
#> SKU date sales_perday
#> <chr> <chr> <dbl>
#> 1 M2 2-jan 5
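If the goal is to look up one SKU's table by name (as with M50 in the question), the list can be named by SKU. A minimal sketch, assuming a small made-up data set in place of the real df; the names per_sku and by_sku are hypothetical:

```r
library(dplyr)

# Made-up sample data standing in for the real df (assumption)
df <- data.frame(
  SKU   = c("M1", "M2", "M1", "M1"),
  date  = c("2-jan", "2-jan", "3-jan", "3-jan"),
  sales = c(4, 5, 8, 2)
)

# One row per SKU and date, with grouping dropped afterwards
per_sku <- df %>%
  group_by(SKU, date) %>%
  summarise(sales_perday = sum(sales), .groups = "drop")

# split() returns a list whose names are the SKU values
by_sku <- split(per_sku, per_sku$SKU)

by_sku[["M1"]]  # daily totals for SKU M1 only
```

Keeping everything in one named list avoids creating 82 separate objects while still allowing access such as by_sku[["M50"]].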

Related

Accessing variable name in for loop in R?

I am trying to run a for loop in which I randomly subsample a dataset using the sample_n command. I also want to name each new subsampled data frame "df1", "df2", "df3", and so on, where the number corresponds to i in the for loop. I know the way I wrote this code is wrong, and I understand why I am getting the error. How can I combine "df" and i in the for loop so that it reads as df1, df2, etc.? Happy to clarify if needed. Thanks!
for (i in 1:9){ print(get(paste("df", i, sep=""))) = sub %>%
group_by(dietAandB) %>%
sample_n(1) }
Error in print(get(paste("df", i, sep = ""))) = sub %>% group_by(dietAandB) %>% :
target of assignment expands to non-language object
Instead of using get you could use assign.
Using some fake example data:
library(dplyr, warn.conflicts = FALSE)
sub <- data.frame(
dietAandB = LETTERS[1:2]
)
for (i in 1:2) {
assign(paste0("df", i), sub %>% group_by(dietAandB) %>% sample_n(1) |> ungroup())
}
df1
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
df2
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
But the more R-ish way to do this would be to use a list instead of creating single objects:
df <- list()
for (i in 1:2) {
df[[i]] <- sub %>% group_by(dietAandB) %>% sample_n(1) |> ungroup()
}
df
#> [[1]]
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
#>
#> [[2]]
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
Or, more concisely, use lapply instead of a for loop:
df <- lapply(1:2, function(x) sub %>% group_by(dietAandB) %>% sample_n(1) |> ungroup())
df
#> [[1]]
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
#>
#> [[2]]
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
It depends on the sample size, which is missing from your question. As an example, I used the mtcars dataset (32 rows) and drew three subsamples of size 20 from it:
library(dplyr)
for (i in 1:3) {
assign(paste0("df", i), sample_n(mtcars, 20))
}
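The list approach and the df1, df2, ... naming can be combined by naming the list elements. A sketch, assuming mtcars and a sample size of 20 as in the example above; slice_sample() is used here as the current dplyr replacement for sample_n(), and the name samples is hypothetical:

```r
library(dplyr)

set.seed(1)  # only to make the random draws repeatable

# Three subsamples of 20 rows each, stored in one list
samples <- lapply(1:3, function(i) slice_sample(mtcars, n = 20))

# Name the elements df1, df2, df3
names(samples) <- paste0("df", seq_along(samples))

samples[["df2"]]  # access a subsample by name
```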

Formatting dates when converting to list

I have a table like this:
ID Date Status
101 2020-09-14 1
102 2020-09-14 1
103 2020-09-14 1
104 2020-09-14 2
105 2020-09-14 2
106 2020-09-14 2
But want a table like this:
Status ID Date
1 101,102,103 2020-09-14, 2020-09-14, 2020-09-14
2 104,105,106 2020-09-14, 2020-09-14, 2020-09-14
Code that I'm currently using:
Note: the date is in yyyy-mm-dd format before running the code.
g1 <- df1 %>%
mutate(Date = as.Date(Date, format = '%Y%m%d')) %>%
group_by(status) %>%
summarise_at(c("ID", "Date"), list)
This seems to work, except that the dates in the new table are not in yyyy-mm-dd format. For example, 2021-06-10 is converted to 18788.
A possible solution:
library(tidyverse)
df %>%
group_by(Status) %>%
summarise(ID = str_c(ID, collapse = ","), Date = str_c(Date, collapse = ","))
#> # A tibble: 2 × 3
#> Status ID Date
#> <int> <chr> <chr>
#> 1 1 101,102,103 2020-09-14,2020-09-14,2020-09-14
#> 2 2 104,105,106 2020-09-14,2020-09-14,2020-09-14
A more succinct alternative:
library(tidyverse)
df %>%
group_by(Status) %>%
summarise(across(c(ID, Date), ~ str_c(.x, collapse = ",")))
#> # A tibble: 2 × 3
#> Status ID Date
#> <int> <chr> <chr>
#> 1 1 101,102,103 2020-09-14,2020-09-14,2020-09-14
#> 2 2 104,105,106 2020-09-14,2020-09-14,2020-09-14
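As background on the 18788 in the question: R stores a Date as the number of days since 1970-01-01, and that underlying number is what can surface when the Date class is lost. A small base R sketch of the relationship, and of converting to character before collapsing (the variable names are illustrative):

```r
d <- as.Date("2021-06-10")

# The number underneath a Date: days since 1970-01-01
as.numeric(d)
#> [1] 18788

# Going back from the number to a Date
as.Date(18788, origin = "1970-01-01")
#> [1] "2021-06-10"

# format() gives character strings, which keep the yyyy-mm-dd text
dates <- as.Date(c("2020-09-14", "2020-09-14"))
collapsed <- paste(format(dates, "%Y-%m-%d"), collapse = ",")
collapsed
#> [1] "2020-09-14,2020-09-14"
```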

R: What is the difference between dplyr::group_keys() and summarise()?

Suppose I group a data.frame() using dplyr::group_by(). Is there any scenario where passing this to group_keys() or summarise() would produce different results? I was surprised to see a group_keys() function.
library(dplyr)
df <- data.frame(x = rep(1:2, 10), y = rep(1:10,2))
df_grouped <- df %>% group_by(x,y)
# group_keys
df_grouped %>% group_keys()
# summarise
df_grouped %>% summarise()
summarise() without arguments will strip one level of grouping, returning
a grouped data frame if there are multiple grouping columns:
library(dplyr)
mtcars %>%
group_by(am, vs) %>%
summarise()
#> `summarise()` has grouped output by 'am'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 x 2
#> # Groups: am [2]
#> am vs
#> <dbl> <dbl>
#> 1 0 0
#> 2 0 1
#> 3 1 0
#> 4 1 1
group_keys() does not return a grouped data frame, and is more idiomatic
for the task:
mtcars %>%
group_by(am, vs) %>%
group_keys()
#> # A tibble: 4 x 2
#> am vs
#> <dbl> <dbl>
#> 1 0 0
#> 2 0 1
#> 3 1 0
#> 4 1 1
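The difference in grouping can be checked directly. A short sketch using is_grouped_df() and group_vars() from dplyr; the object names are illustrative:

```r
library(dplyr)

grouped <- mtcars %>% group_by(am, vs)

keys <- group_keys(grouped)

# "drop_last" is what summarise() does by default here; stating it
# explicitly also silences the message about grouped output
summ <- summarise(grouped, .groups = "drop_last")

is_grouped_df(keys)  # FALSE: group_keys() returns a plain tibble
group_vars(summ)     # "am": the summarise() result is still grouped
```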

I Want to merge 2 Data Frames by a fraction of row Names

I have the following data frames and I want to merge them.
The problem is that their row names are similar but not identical.
For example:
Row names of the first data frame:
Call.Dec.2018-Ask Price
Call.Dec.2017-Ask Price
Call.Dec.2015-Ask Price
Call.Dec.2013-Ask Price
Call.Dec.2019-Ask Price
Call.Dec.2029-Ask Price
Row names of the second data frame:
Call.Dec.2018-Strike Price
Call.Dec.2017-Strike Price
Call.Dec.2015-Strike Price
Call.Dec.2013-Strike Price
Call.Dec.2019-Strike Price
Call.Dec.2029-Strike Price
I know that there is a solution but I can't find it.
Thanks for helping
You can split the column into two and then join:
library(tidyverse)
tbl_a <- tibble::tribble(
~name,
"Call.Dec.2017-Ask Price",
"Call.Dec.2015-Ask Price",
"Call.Dec.2013-Ask Price",
"Call.Dec.2019-Ask Price",
"Call.Dec.2029-Ask Price"
) %>%
mutate(price = 1)
tbl_a
#> # A tibble: 5 × 2
#> name price
#> <chr> <dbl>
#> 1 Call.Dec.2017-Ask Price 1
#> 2 Call.Dec.2015-Ask Price 1
#> 3 Call.Dec.2013-Ask Price 1
#> 4 Call.Dec.2019-Ask Price 1
#> 5 Call.Dec.2029-Ask Price 1
tbl_b <- tibble::tribble(
~name,
"Call.Dec.2017-Strike Price",
"Call.Dec.2015-Strike Price",
"Call.Dec.2013-Strike Price",
"Call.Dec.2019-Strike Price",
"Call.Dec.2029-Strike Price"
) %>%
mutate(price = 2)
tbl_b
#> # A tibble: 5 × 2
#> name price
#> <chr> <dbl>
#> 1 Call.Dec.2017-Strike Price 2
#> 2 Call.Dec.2015-Strike Price 2
#> 3 Call.Dec.2013-Strike Price 2
#> 4 Call.Dec.2019-Strike Price 2
#> 5 Call.Dec.2029-Strike Price 2
list(tbl_a, tbl_b) %>%
map(~ .x %>% separate(name, into = c("name", "type"), sep = "-")) %>%
reduce(full_join) %>%
pivot_wider(names_from = type, values_from = price)
#> Joining, by = c("name", "type", "price")
#> # A tibble: 5 × 3
#> name `Ask Price` `Strike Price`
#> <chr> <dbl> <dbl>
#> 1 Call.Dec.2017 1 2
#> 2 Call.Dec.2015 1 2
#> 3 Call.Dec.2013 1 2
#> 4 Call.Dec.2019 1 2
#> 5 Call.Dec.2029 1 2
Created on 2021-09-25 by the reprex package (v2.0.1)
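If the labels really are row names rather than a column, a base R variant is to derive a shared key by stripping everything from the hyphen onward and then merge on it. A sketch with made-up prices; ask, strike, and key are hypothetical names:

```r
# Made-up values; only the row names follow the question's pattern
ask <- data.frame(
  ask = c(1.5, 2.5),
  row.names = c("Call.Dec.2018-Ask Price", "Call.Dec.2017-Ask Price")
)
strike <- data.frame(
  strike = c(100, 110),
  row.names = c("Call.Dec.2017-Strike Price", "Call.Dec.2018-Strike Price")
)

# Keep only the part before the first hyphen, e.g. "Call.Dec.2018"
ask$key    <- sub("-.*$", "", rownames(ask))
strike$key <- sub("-.*$", "", rownames(strike))

# Match rows on the shared key, regardless of their original order
merged <- merge(ask, strike, by = "key")
merged
```

merge() does an inner join by default, so a contract that appears in only one table would be dropped; all = TRUE keeps unmatched rows.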

How do you use dplyr::pull to convert a grouped column into vectors?

I have a tibble, df. I would like to group it and then use dplyr::pull to create vectors from the grouped data frame. I have provided a reprex below.
df is the base tibble. My desired output is reflected by df2; I just don't know how to get there programmatically. I tried to use pull to achieve this, but pull did not seem to recognize the group_by function and instead created a vector out of the whole column. Is what I'm trying to achieve possible with dplyr or base R? Note: new_col is supposed to be a vector created from the name column.
library(tidyverse)
library(reprex)
df <- tibble(group = c(1,1,1,1,2,2,2,3,3,3,3,3),
name = c('Jim','Deb','Bill','Ann','Joe','Jon','Jane','Jake','Sam','Gus','Trixy','Don'),
type = c(1,2,3,4,3,2,1,2,3,1,4,5))
df
#> # A tibble: 12 x 3
#> group name type
#> <dbl> <chr> <dbl>
#> 1 1 Jim 1
#> 2 1 Deb 2
#> 3 1 Bill 3
#> 4 1 Ann 4
#> 5 2 Joe 3
#> 6 2 Jon 2
#> 7 2 Jane 1
#> 8 3 Jake 2
#> 9 3 Sam 3
#> 10 3 Gus 1
#> 11 3 Trixy 4
#> 12 3 Don 5
# Desired Output - New Col is a column of vectors
df2 <- tibble(group=c(1,2,3),name=c("Jim","Jane","Gus"), type=c(1,1,1), new_col = c("'Jim','Deb','Bill','Ann'","'Joe','Jon','Jane'","'Jake','Sam','Gus','Trixy','Don'"))
df2
#> # A tibble: 3 x 4
#> group name type new_col
#> <dbl> <chr> <dbl> <chr>
#> 1 1 Jim 1 'Jim','Deb','Bill','Ann'
#> 2 2 Jane 1 'Joe','Jon','Jane'
#> 3 3 Gus 1 'Jake','Sam','Gus','Trixy','Don'
Created on 2020-11-14 by the reprex package (v0.3.0)
Maybe this is what you are looking for:
library(dplyr)
df <- tibble(group = c(1,1,1,1,2,2,2,3,3,3,3,3),
name = c('Jim','Deb','Bill','Ann','Joe','Jon','Jane','Jake','Sam','Gus','Trixy','Don'),
type = c(1,2,3,4,3,2,1,2,3,1,4,5))
df %>%
group_by(group) %>%
mutate(new_col = name, name = first(name, order_by = type), type = first(type, order_by = type)) %>%
group_by(name, type, .add = TRUE) %>%
summarise(new_col = paste(new_col, collapse = ","))
#> `summarise()` regrouping output by 'group', 'name' (override with `.groups` argument)
#> # A tibble: 3 x 4
#> # Groups: group, name [3]
#> group name type new_col
#> <dbl> <chr> <dbl> <chr>
#> 1 1 Jim 1 Jim,Deb,Bill,Ann
#> 2 2 Jane 1 Joe,Jon,Jane
#> 3 3 Gus 1 Jake,Sam,Gus,Trixy,Don
EDIT: If new_col should be a list of vectors, then you could use summarise(new_col = list(c(new_col))):
df %>%
group_by(group) %>%
mutate(new_col = name, name = first(name, order_by = type), type = first(type, order_by = type)) %>%
group_by(name, type, .add = TRUE) %>%
summarise(new_col = list(c(new_col)))
Another option would be to use tidyr::nest:
df %>%
group_by(group) %>%
mutate(new_col = name, name = first(name, order_by = type), type = first(type, order_by = type)) %>%
nest(new_col = new_col)
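On why pull() returned the whole column: it extracts a single column as one vector and does not split by group. For one vector per group, two sketches follow, one in base R and one with a list column; the small df here is a cut-down stand-in for the one above, and vecs and nested are hypothetical names:

```r
library(dplyr)

df <- data.frame(
  group = c(1, 1, 2),
  name  = c("Jim", "Deb", "Joe")
)

# Base R: a named list with one character vector per group
vecs <- split(df$name, df$group)
vecs[["1"]]

# dplyr: one row per group, with the vectors held in a list column
nested <- df %>%
  group_by(group) %>%
  summarise(names = list(name))
nested$names[[2]]
```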
