Using named list to generate cases for case_when - r

I am trying to figure out how to use case_when for groups stored in a list so modifying the list will modify the result of the case_when.
Here is a toy test case:
library(tidyverse)
info <- tibble(target = letters[seq(1, 10)])
groups <- list("A" = letters[seq(1, 10, by = 3)],
               "B" = letters[seq(2, 10, by = 3)],
               "C" = letters[seq(3, 10, by = 3)])
info %>% mutate(case_when(
  target %in% groups$A ~ names(groups)[1],
  target %in% groups$B ~ names(groups)[2],
  target %in% groups$C ~ names(groups)[3]
))
This gives the output I want but I want to generate the options in the case_when dynamically from the list. I imagine it would be something like this:
generate_cases <- function(x, i) {
### I have no idea what to do here...
}
cases <- groups %>% imap(generate_cases)
info %>% mutate(case_when(!!! cases))
I suspect the solution involves quo() and rlang::expr(), but I really can't figure out how to string them together.

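Here's one way using the purrr::imap function. Each list element is passed as .x and its name as .y; !! inlines both into a quoted two-sided formula, and !!! then splices the resulting list of formulas into case_when: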
cases <- imap(groups, ~quo(target %in% !!.x ~ !!.y))
info %>% mutate(case_when(
!!!cases
))
A better alternative might be to reshape your groups into a proper lookup table so you can do an efficient left join. One way would be:
info %>%
  left_join(stack(groups), by = c("target" = "values"))
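As a hedged variation on the lookup-table idea, the sketch below builds the table with tibble::enframe and tidyr::unnest instead of stack(), so the group label stays a character column. The column name group and the object names lookup and result are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(tibble)

info <- tibble(target = letters[seq(1, 10)])
groups <- list(A = letters[seq(1, 10, by = 3)],
               B = letters[seq(2, 10, by = 3)],
               C = letters[seq(3, 10, by = 3)])

# One row per (group, target) pair; "group" is a column name chosen here
lookup <- enframe(groups, name = "group", value = "target") %>%
  unnest(cols = target)

# Editing `groups` and rebuilding `lookup` changes the labels assigned below
result <- left_join(info, lookup, by = "target")
result
```

Targets that appear in no group would get NA in the group column, which mirrors what case_when returns when no condition matches.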

Related

Check if single column is equal to any multiple others

My question seems simple, but I just can't get it to work. I have a dataframe with multiple columns whose names start with coa, and another column p with values like A, D, F, and so on, which change according to the id.
All I found is how to do this matching with a fixed value, let's say "A", as below:
df <-df %>%
mutate(ly = any(str_detect(c_across(starts_with("coa")), "A")))
However, in my case, I want to compare to the column p specifically, where p changes, something like this:
df <-df %>%
mutate(ly = any(str_detect(c_across(starts_with("coa")), p)))
In this case, I get the error:
x no applicable method for 'type' applied to an object of class "factor"
Any thoughts? Thanks!
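To create the column, use if_any. The error occurs because p is a factor, while str_detect expects a character pattern, so convert p with as.character() first:
library(dplyr)
library(stringr)
df <- df %>%
  mutate(ly = if_any(starts_with("coa"), ~ str_detect(.x, as.character(p))))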
I think this is a good place to use dplyr::across. You can run vignette('colwise') for a more comprehensive guide, but the key point here is that we can mutate all columns starting with "coa" simultaneously using the function ==, and we can pass a second argument, p, to == through the ... argument of across. (Since dplyr 1.1.0, passing arguments through ... is deprecated in favour of a lambda such as ~ .x == p.)
library(dplyr)
df <- tibble(p = 1:10, coa1 = 1:10, coa2 = 11:20)
df %>%
mutate(across(.cols = starts_with('coa'), .fns = `==`, p))
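To collapse the comparison into the single ly column from the question, a minimal sketch is shown below. It assumes that an exact match against p is wanted (rather than a substring match) and that p is a factor, as the error message suggests; the toy values are invented for illustration.

```r
library(dplyr)

# Toy data: p is a factor, as in the question's error message
df <- tibble(
  p    = factor(c("A", "D", "F")),
  coa1 = c("A", "B", "C"),
  coa2 = c("X", "D", "Y")
)

# TRUE when any coa* column equals that row's value of p
out <- df %>%
  mutate(ly = if_any(starts_with("coa"), ~ .x == as.character(p)))
out
```

Because == is vectorised, each row of a coa column is compared with the same row of p, so no rowwise() step is needed.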

How to use for loop to create new data frames using i in the name of data frame in R

I'm relatively new to R and have been searching this forum for an example. While I have found some similar questions and answers, I still can't seem to get my code to work.
I would like to use a for loop in R to create a series of new data frames that incorporates the temporary value for i in the name of the new data frame. I have the following code in which I would like to create two new data frames: metrics_2013, and metrics_2014. I have some calculations (mutate and filter) to apply to the new dataframes, but I'm leaving that out for simplicity.
yearlist <- as.list(c(2013, 2014))
metrics_ <- data.frame(matrix(ncol = 4, nrow = 0))
for (i in yearlist) {
maxyear <- i
minyear <- maxyear - 7
metrics_[as.character(i)] <- mutatedata %>%
group_by(symbol) %>%
filter(year>=minyear & year<=maxyear) %>%
summarize(
avgroepercent = mean(roe,na.rm = TRUE),
avgrocpercent = mean(roc, na.rm = TRUE),
epsroc = (((last(eps))/(first(eps)))^(1/(maxyear-minyear))-1)
)
}
In this case both dataframes (for 2013 and 2014) will be equal to "data", as all I'm having trouble with at the moment is creating data frames with names based on the value of i. I believe it may have something to do with [] vs. [[]], or maybe I need to define metrics_[i] before the for loop. Any assistance is much appreciated!
Doing this slightly differently will be a lot simpler and make your life easier in the long run. Instead of having several variables with auto-generated names, have a list whose elements have auto-generated names. e.g.:
data <- data.frame(a=1:2)
metrics <- list()
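for (i in yearlist) {
  # [[ ]] stores the whole data frame as one list element;
  # single [ ] would store only its first column
  metrics[[as.character(i)]] <- data
}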
(there may be a better way to do this than with a loop, but that's another topic)
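As one possible loop-free version of the same idea, the sketch below builds the named list with lapply. The metrics_ prefix follows the names wanted in the question; the placeholder data frame is the one from the answer above.

```r
yearlist <- c(2013, 2014)
data <- data.frame(a = 1:2)

# One list element per year; here every element is the same placeholder
metrics <- lapply(yearlist, function(year) data)
names(metrics) <- paste0("metrics_", yearlist)

# Look up a single year's data frame by name
metrics[["metrics_2013"]]
```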
You can try:
library(dplyr)
yearlist <- c(2013, 2014)
lapply(yearlist, function(x) {
  maxyear <- x
  minyear <- maxyear - 7
  mutatedata %>%
    filter(year >= minyear & year <= maxyear) %>%
    group_by(symbol) %>%
    summarize(
      avgroepercent = mean(roe, na.rm = TRUE),
      avgrocpercent = mean(roc, na.rm = TRUE),
      epsroc = (((last(eps)) / (first(eps)))^(1 / (maxyear - minyear)) - 1)
    )
}) -> data
Here data is a list of data frames. If you want to create separate data frames, you can use list2env:
names(data) <- paste0('metrics_', yearlist)
list2env(data, .GlobalEnv)
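Since mutatedata is not shown in the question, here is a small self-contained sketch of the same pattern on invented toy data. The object names toy, results and metrics_env are assumptions made for illustration, and only one of the three summary columns is computed.

```r
library(dplyr)

# Toy stand-in for mutatedata (values invented for illustration)
toy <- data.frame(
  symbol = rep(c("X", "Y"), each = 3),
  year   = rep(2012:2014, times = 2),
  roe    = c(1, 2, 3, 4, 5, 6)
)
yearlist <- c(2013, 2014)

results <- lapply(yearlist, function(maxyear) {
  minyear <- maxyear - 7
  toy %>%
    filter(year >= minyear, year <= maxyear) %>%
    group_by(symbol) %>%
    summarize(avgroepercent = mean(roe, na.rm = TRUE), .groups = "drop")
})
names(results) <- paste0("metrics_", yearlist)

# Copy the list elements into a dedicated environment instead of .GlobalEnv
metrics_env <- new.env()
invisible(list2env(results, envir = metrics_env))
```

Using a separate environment keeps the generated objects out of the global workspace; passing .GlobalEnv instead reproduces the behaviour of the answer above.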

dplyr group_by loop through different columns

I have the following data;
I would like to create three different dataframes using the dplyr functions group_by and summarise. These would be df_Sex, df_AgeGroup and df_Type. For each of these columns I would like to perform the following operation:
df_Sex <- df %>% group_by(Sex) %>% summarise(Total = sum(Number))
Is there a way of using apply or lapply to pass the names of each of these three columns (Sex, AgeGrouping and Type) to create these three dataframes?
This will work, but it will create a list of data frames as your output:
library(dplyr)
### Create your data first
df <- data.frame(
  ID = rep(10250, 6),
  Sex = c(rep("Female", 3), rep("Male", 3)),
  Population = c(rep(3499, 3), rep(1163, 3)),
  AgeGrouping = c(rep("0-14", 3), rep("15-25", 3)),
  Type = c("Type1", "Type1", "Type2", "Type1", "Type1", "Type2"),
  Number = c(260, 100, 0, 122, 56, 0)
)
gr <- list("Sex", "AgeGrouping", "Type")
# across(all_of(i)) replaces the deprecated .dots argument of group_by
df_list <- lapply(gr, function(i) group_by(df, across(all_of(i))) %>% summarise(Total = sum(Number)))
Here's a way to do it:
f <- function(x) {
  df %>%
    group_by(!!x) %>%
    summarize(Total = sum(Number))
}
lapply(c(quo(Sex), quo(AgeGrouping), quo(Type)), f)
There might be a better way to do it, I haven't looked that much into tidyeval. I personally would prefer this:
library(data.table)
DT <- as.data.table(df)
lapply(c("Sex", "AgeGrouping", "Type"),
function(x) DT[, .(Total = sum(Number)), by = x])
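If the results should be retrievable under the names from the question, one option is to name the input vector before calling lapply, as in the sketch below. It assumes dplyr 1.0.0 or later for across(all_of()); the reduced toy data and the df_ prefix are illustrative.

```r
library(dplyr)

df <- data.frame(
  Sex    = c("Female", "Female", "Male"),
  Type   = c("Type1", "Type2", "Type1"),
  Number = c(260, 100, 122)
)

cols <- c("Sex", "Type")

# lapply keeps the names of its input, so the result is a named list
totals <- lapply(setNames(cols, paste0("df_", cols)), function(col) {
  df %>%
    group_by(across(all_of(col))) %>%
    summarise(Total = sum(Number), .groups = "drop")
})

totals$df_Sex
```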

Apply map function to grouped data frame with purrr

I am trying to apply a function that takes multiple inputs (columns, which vary depending on the problem at hand) to a list of data frames. I took the code below from this example: Map with Purrr multiple dataframes and have those modified dataframes as the output, and modified it to include another metric of my choosing ('choice'). The code, however, throws an error:
Error in .f(.x[[i]], ...) : unused argument (choice = "disp").
Ideally, I would like to create a grouped data frame (with group_by or split()) and apply a function over the different groups within it, but I have not been able to work this out. Hence I am looking at a list of data frames instead.
mtcars2 <- mtcars
#change one variable just to distinguish them
mtcars2$mpg <- mtcars2$mpg / 2
#create the list
dflist <- list(mtcars,mtcars2)
#then, a simple function example
my_fun <- function(x)
{x <- x %>%
summarise(`sum of mpg` = sum(mpg),
`sum of cyl` = sum(cyl),
`sum of choice` = sum(choice))}
#then, using map, this works and prints the desired results
list_results <- map(dflist,my_fun, choice= "disp")
Three things to fix the code above:
1. Add choice as an argument in your function.
2. Make your function have an output by removing x <-.
3. Use tidyeval to make the "choice" argument work.
The edited code thus looks like this:
my_fun <- function(x, choice) {
  x %>%
    summarise(`sum of mpg` = sum(mpg),
              `sum of cyl` = sum(cyl),
              `sum of choice` = sum(!!choice))
}
list_results <- map(dflist, my_fun, choice = quo(disp))
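If you would rather keep passing the column name as a string, as in the original call with choice = "disp", a hedged alternative is the .data pronoun. The sketch below assumes choice is always a single column name held in a character vector.

```r
library(dplyr)
library(purrr)

# choice is a column name given as a string, looked up via .data[[ ]]
my_fun <- function(x, choice) {
  x %>%
    summarise(`sum of mpg` = sum(mpg),
              `sum of cyl` = sum(cyl),
              `sum of choice` = sum(.data[[choice]]))
}

mtcars2 <- mtcars
mtcars2$mpg <- mtcars2$mpg / 2
dflist <- list(mtcars, mtcars2)

list_results <- map(dflist, my_fun, choice = "disp")
```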
If you want to stay within a dataframe/tibble, then using nest to create list-columns might help.
mtcars2$group <- sample(c("a", "b", "c"), 32, replace = TRUE)
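mtcars2 %>%
  as_tibble() %>%
  # naming the list-column (data = ...) avoids the warning that nest(-group) gives in tidyr 1.0.0 and later
  nest(data = -group) %>%
  mutate(out = map(data, my_fun, quo(disp))) %>%
  unnest(out)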
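The question also mentions split(). A minimal sketch of that route is shown below, assuming the groups are defined by the cyl column and using a hypothetical helper sum_choice that takes the column name as a string.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: choice is a column name given as a string
sum_choice <- function(x, choice) {
  summarise(x,
            `sum of mpg` = sum(mpg),
            `sum of choice` = sum(.data[[choice]]))
}

# One data frame per value of cyl, summarised and stacked back together
by_cyl <- split(mtcars, mtcars$cyl) %>%
  map(sum_choice, choice = "disp") %>%
  bind_rows(.id = "cyl")
by_cyl
```

split() names each piece after its group value, and bind_rows(.id = "cyl") turns those names back into a column.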

How can I simultaneously assign value to multiple new columns with R and dplyr?

Given
base <- data.frame( a = 1)
f <- function() c(2,3,4)
I am looking for a solution that would result in a function f being applied to each row of base data frame and the result would be appended to each row. Neither of the following works:
result <- base %>% rowwise() %>% mutate( c(b,c,d) = f() )
result <- base %>% rowwise() %>% mutate( (b,c,d) = f() )
result <- base %>% rowwise() %>% mutate( b,c,d = f() )
What is the correct syntax for this task?
This appears to be a similar problem (Assign multiple new variables on LHS in a single line in R) but I am specifically interested in solving this with functions from tidyverse.
I think the best you can do is use do() to modify the data.frame. Perhaps:
base %>% do(cbind(., setNames(as.list(f()), c("b","c","d"))))
It would probably be best if f() returned a list in the first place, with one element per column.
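With dplyr 1.0.0 or later there is another option worth sketching: when an unnamed argument to mutate() evaluates to a data frame, its columns are added to the result. The example below assumes that version and names the columns b, c and d as in the question.

```r
library(dplyr)

base <- data.frame(a = 1)
f <- function() c(2, 3, 4)

# The unnamed one-row data frame is unpacked into columns b, c and d
result <- base %>%
  rowwise() %>%
  mutate(as.data.frame(as.list(setNames(f(), c("b", "c", "d"))))) %>%
  ungroup()
result
```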
In case you're willing to do this without dplyr:
# starting data frame
base_frame <- data.frame(col_a = 1:10, col_b = 10:19)
# the function you want applied to a given column
add_to <- function(x) { x + 100 }
# run this function on your base data frame, specifying the column you want to apply the function to:
add_computed_col <- function(frame, funct, col_choice) {
frame[paste(floor(runif(1, min=0, max=10000)))] = lapply(frame[col_choice], funct)
return(frame)
}
Usage:
df <- add_computed_col(base_frame, add_to, 'col_a')
head(df)
And add as many columns as needed:
df_b <- add_computed_col(df, add_to, 'col_b')
head(df_b)
The new columns get random numeric names (generated with runif), so rename them afterwards.
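To avoid the renaming step, a hedged variant of the helper can take the new column name as an argument. The function add_named_col and its new_name parameter are hypothetical and not part of the answer above.

```r
# Hypothetical variant: the caller supplies the name of the new column
add_named_col <- function(frame, funct, col_choice, new_name) {
  frame[[new_name]] <- funct(frame[[col_choice]])
  frame
}

base_frame <- data.frame(col_a = 1:3, col_b = 10:12)
add_to <- function(x) { x + 100 }

df_named <- add_named_col(base_frame, add_to, "col_a", "col_a_plus_100")
head(df_named)
```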
