dplyr: how to pass strings to dplyr's mutate argument - r

I want to write a helper function that summarizes the percentage change for column A, B and C in one shot. I want to pass a string to the "mutate" argument of dplyr with the help of rlang. Unfortunately, I get an error saying that I have an unexpected ",". Could you please take a look? Thanks in advance!
library(rlang) #read text inputs and return vars
library(dplyr)
set.seed(10)
dat <- data.frame(A=rnorm(10,0,1),
B=rnorm(10,0,1),
C=rnorm(10,0,1),
D=2001:2010)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
#create new variable names
mutate_varNames <- paste0(target_Var_list,rep("_pct_chg = ",length(target_Var_list)))
#generate text for formula
mutate_formula <- lapply(target_Var_list,function(x){output <- paste0("(",x,"-lag(",x,"))/lag(",x,")");return(output)})
mutate_formula <- unlist(mutate_formula) #convert list to a vector
#generate arguments for mutate
mutate_args <<- paste0(mutate_varNames,collapse=",",mutate_formula)
#data manipulation
output <- input_data %>%
arrange(!!parse_quo(year_Var_name,env=caller_env())) %>%
mutate(!!parse_quo(mutate_args,env=caller_env()))
#output data frame
return(output)
}
# error: unexpected ','
calc_perct_chg(input_data =dat,
target_Var_list=list("A","B","C"),
year_Var_name="D")

I don't think it's a good idea to evaluate strings as code, and I think you are over-complicating this. With across() it becomes much simpler:
library(dplyr)
calc_perct_chg <- function(input_data,
                           target_Var_list,
                           year_Var_name){
  input_data %>%
    arrange(across(all_of(year_Var_name))) %>%
    mutate(across(all_of(target_Var_list), ~(.x - lag(.x))/lag(.x)))
}
calc_perct_chg(input_data = dat,
               target_Var_list = c("A","B","C"),
               year_Var_name = "D")
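Note that the version above overwrites A, B and C, whereas the question wanted new columns with a _pct_chg suffix. A minimal sketch of one way to keep the originals, assuming dplyr 1.0.0 or later (where across() has a .names argument); sorting with .data[[year_Var_name]] is my substitution for arrange(across(...)), not part of the original answer:

```r
library(dplyr)

set.seed(10)
dat <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = 2001:2010)

calc_perct_chg <- function(input_data, target_Var_list, year_Var_name) {
  input_data %>%
    # sort by the year column, whose name arrives as a string
    arrange(.data[[year_Var_name]]) %>%
    # compute the change for every target column and store it in a new
    # column named <original>_pct_chg instead of overwriting the original
    mutate(across(all_of(target_Var_list),
                  ~ (.x - lag(.x)) / lag(.x),
                  .names = "{.col}_pct_chg"))
}

res <- calc_perct_chg(dat, c("A", "B", "C"), "D")
names(res)
```

The first row of each new column is NA because lag() has no previous value there.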

Related

R function that selects certain columns from a dataframe

I am trying to figure out how to write a function in R that can select specific columns from a dataframe (df) for subsetting:
Essentially I have df with columns or colnames : count_A.x, count_B.x, count_C.x, count_A.y, count_B.y, count_C.y.
I would ideally like a function where I can select both "count_A.x" and "count_A.y" columns by simply specifying count_A in function argument.
I tried the following, e.g.:
pull_columns2 <- function(df,count_char){
df_subset<- df%>% select(,c(count_char.x, count_char.y))
}
Unfortunately, when I run the function as shown below, it rightly reports that column count_char.x does not exist; it does not "convert" count_char.x to count_A:
pull_columns2(df, count_A)
We can pass the name as a string and use contains():
pull_columns2 <- function(df,count_char){
df_subset<- df %>% select(contains(count_char))
df_subset
}
# then use it as follows
df %>% pull_columns2("count_A")
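Since contains() matches the string anywhere in a column name, a stricter variant may be preferable when names overlap. A small sketch under the assumption that the columns always carry the .x and .y suffixes described in the question; the function names pull_columns_prefix and pull_columns_exact and the sample data are hypothetical:

```r
library(dplyr)

# hypothetical data with the column layout described in the question
df <- data.frame(count_A.x = 1:3, count_B.x = 4:6,
                 count_A.y = 7:9, count_B.y = 10:12)

# match only names that begin with the given string
pull_columns_prefix <- function(df, count_char) {
  df %>% select(starts_with(count_char))
}

# build the two exact names and fail loudly if either is missing
pull_columns_exact <- function(df, count_char) {
  df %>% select(all_of(paste0(count_char, c(".x", ".y"))))
}

names(pull_columns_prefix(df, "count_A"))
names(pull_columns_exact(df, "count_A"))
```

Both calls select count_A.x and count_A.y here; all_of() raises an error when a requested column does not exist, which can help catch typos.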
Try this base R alternative, which matches the column names against a regular expression:
select_func = function(df, pattern){
return(df[colnames(df)[which(grepl(pattern, colnames(df)))]])
}
df = data.frame("aaa" = 1:10, "aab" = 1:10, "bb" = 1:10, "ca" = 1:10)
select_func(df,"b")

How to put a formula within a function in R?

I want to store a dplyr function/formula (e.g. filter(exercise=="Inadequate") or mutate(exercise="adequate")) in the variable_to_filter argument of my function. I have lots of variables that need to go through this function. How can I do that? I know the code below doesn't work, but I hope you can see the logic in what I'm trying to do.
exercise_inadequate<-(exercise=="Inadequate")
variable_to_mutate<-(mutate(exercise="adequate"))
difference_pe<-function(percent, variable_to_filter, variable_to_mutate){
filtered <- dataset %>% filter(variable_to_filter)
sampled <- sample_frac(filtered, percent/100)
sampled <- sampled %>% mutate(variable_to_mutate)
}
difference_pe(100, exercise_inadequate, exercise_adequate)
I would prefer passing the column name and the value separately to the function, because evaluating a string as a condition in a filter statement can get ugly.
library(dplyr)
library(rlang)
difference_pe<- function(dataset, percent, col, value) {
filtered <- dataset %>% filter({{col}} == value)
sampled <- sample_frac(filtered, percent/100)
return(sampled)
}
You can use this function as follows:
difference_pe(dataset, 100, exercise, "Inadequate")
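The question also wanted to change the value after sampling. A minimal sketch of how the same embracing approach could cover that step, assuming dplyr 1.0.0 or later; the sample data and the new_value parameter are hypothetical, and slice_sample() is swapped in for sample_frac(), which it supersedes:

```r
library(dplyr)

# hypothetical data standing in for the asker's dataset
dataset <- data.frame(
  id = 1:6,
  exercise = c("Inadequate", "Adequate", "Inadequate",
               "Inadequate", "Adequate", "Inadequate")
)

difference_pe <- function(dataset, percent, col, value, new_value) {
  dataset %>%
    filter({{ col }} == value) %>%          # keep rows matching the value
    slice_sample(prop = percent / 100) %>%  # sample the requested share
    mutate({{ col }} := new_value)          # overwrite the same column
}

res <- difference_pe(dataset, 100, exercise, "Inadequate", "Adequate")
res
```

The := operator is what allows the embraced column to appear on the left-hand side inside mutate().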
If for some reason the above is not possible and you need to pass the condition as a string, you can use rlang's parse_expr(), which is similar to eval(parse(text = ...)).
exercise_inadequate<- 'exercise=="Inadequate"'
difference_pe<- function(dataset, percent, variable_to_filter) {
filtered <- dataset %>% filter(eval(parse_expr(variable_to_filter)))
#filtered <- dataset %>% filter(eval(parse(text = variable_to_filter)))
sampled <- sample_frac(filtered, percent/100)
return(sampled)
}
difference_pe(dataset, 100, exercise_inadequate)
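As a variation on the string approach, the parsed expression can be injected with !! instead of wrapped in eval(). This is only a sketch with hypothetical sample data, and it carries the same caveat as any code built from strings: the string must come from a trusted source.

```r
library(dplyr)
library(rlang)

# hypothetical data standing in for the asker's dataset
dataset <- data.frame(exercise = c("Inadequate", "Adequate", "Inadequate"))

# turn the string into an unevaluated expression once...
cond <- parse_expr('exercise == "Inadequate"')

# ...and inject it so filter() evaluates it against the data columns
res <- dataset %>% filter(!!cond)
res
```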

removing and replacing observations with string package

I have two datasets that I'm trying to join. The column I am joining by does not match exactly between them. In the first file the column looks like this: 00:01:54:2145, with a leading 00: on every observation. I want to change all the observations in this column to this format: 01/54/2145.
I have tried several things with the stringr package, but can't get it to work.
df1 <- df %>%
str_replace_all("00:")
I'm getting this error, but don't think that's the only problem:
argument is not an atomic vector; coercing
Thank you
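library(stringr)
library(dplyr)
# str_replace() and str_replace_all() are already vectorized, so they take the
# column (a character vector) directly; piping the whole data frame into them,
# as in the question, is what produces the "not an atomic vector" message
my_conversion <- function(str) {
  str %>%
    str_replace("^00:", "") %>%  # drop the leading "00:"
    str_replace_all(":", "/")    # turn the remaining colons into slashes
}
df <- data.frame(
  a_column = 1:3, key_column = c("00:01:54:2145", "00:01:54:2145", "00:01:54:2145"))
df %>% mutate(key_column = my_conversion(key_column))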

Applying multiple functions to one column

I have four functions, clean, clean2, cleanFun, and trim. Currently I apply the functions to one column, like so.
library(tidyverse)
library(data.table)
py17$CE.Finding.Description <- clean(py17$CE.Finding.Description)
py17$CE.Finding.Description <- clean2(py17$CE.Finding.Description)
py17$CE.Finding.Description <- cleanFun(py17$CE.Finding.Description)
py17$CE.Finding.Description <- trim(py17$CE.Finding.Description)
This process does the trick but I have to copy and paste this multiple times, and I'd eventually like to expand this to multiple columns.
For now, I'd like to save time and add an apply function but I'm not sure how to create that apply function. I've tried creating this.
maxclean <- function(cleaner) {
c(clean(cleaner), clean2(cleaner), cleanFun(cleaner), trim(cleaner))
}
py17$CE.Finding.Description <- sapply(py17$CE.Finding.Description, maxclean)
After trying this I just get
Error in `$<-.data.frame`(`*tmp*`, CE.Finding.Description, value = c(NA, :
replacement has 4 rows, data has 4318
I do not get any errors doing this the long way. Where am I going wrong on this?
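Your maxclean function should take the same argument as the separate functions, in your case a vector, and then call each function one after another on the result of the previous one. Your version instead returns four separate results per element, so sapply() builds a 4-row matrix, which is why the assignment fails. Like this: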
maxclean <- function(x) {
x <- clean(x)
x <- clean2(x)
x <- cleanFun(x)
x <- trim(x)
return(x)
}
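To make the chaining concrete, here is a self-contained sketch. The four cleaning functions below are hypothetical stand-ins, since the originals were not posted; only the way they are chained reflects the answer above.

```r
# hypothetical stand-ins for the asker's functions
clean    <- function(x) tolower(x)               # lower-case everything
clean2   <- function(x) gsub("[0-9]", "", x)     # drop digits
cleanFun <- function(x) gsub("<[^>]+>", "", x)   # strip HTML-like tags
trim     <- function(x) trimws(x)                # remove outer whitespace

# each step receives the output of the previous one
maxclean <- function(x) {
  x <- clean(x)
  x <- clean2(x)
  x <- cleanFun(x)
  x <- trim(x)
  x
}

input <- c("  <b>Hello</b> 123 ", "World42")
res <- maxclean(input)
res
```

Because every step is vectorized, maxclean() returns a vector of the same length as its input and can be assigned straight back to the column without sapply().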
Apparently, the OP has created a cleaning pipeline where the output of one step is fed into the next step and the final result of the pipeline overwrites the original input.
The magrittr package has the freduce() function which applies one function after the other in the described way. Thus,
py17$CE.Finding.Description <- clean(py17$CE.Finding.Description)
py17$CE.Finding.Description <- clean2(py17$CE.Finding.Description)
py17$CE.Finding.Description <- cleanFun(py17$CE.Finding.Description)
py17$CE.Finding.Description <- trim(py17$CE.Finding.Description)
can be written as:
library(magrittr)
fcts <- list(clean, clean2, cleanFun, trim)
py17$CE.Finding.Description %<>% freduce(fcts)
which is a shortcut for
py17$CE.Finding.Description <- py17$CE.Finding.Description %>%
clean() %>%
clean2() %>%
cleanFun() %>%
trim()
Here, %>% is the magrittr forward-pipe operator and %<>% is the magrittr compound assignment pipe-operator which updates the left-hand side object with the resulting value.
Reproducible example
Using the mtcars dataset:
data(mtcars)
mycars <- mtcars
mycars$mpg %<>%
{. - mean(.)} %>%
abs() %>%
sqrt()
mycars
or
mycars <- mtcars
mycars$mpg %<>% freduce(list(function(.) {. - mean(.)}, abs, sqrt))
mycars
Applying on multiple columns
The OP has mentioned that they would eventually like to expand this to multiple columns.
This can be achieved by, e.g.,
mycars <- mtcars
fcts <- list(function(.) {. - mean(.)}, abs, sqrt)
mycars$mpg %<>% freduce(fcts)
mycars$disp %<>% freduce(fcts)
mycars
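Repeating the line per column gets tedious as the list of columns grows. One possible shortcut, sketched here with base R's lapply() and a hypothetical cols vector, applies the same function list to several columns at once:

```r
library(magrittr)

mycars <- mtcars
fcts <- list(function(x) x - mean(x), abs, sqrt)

# hypothetical choice of columns to transform
cols <- c("mpg", "disp")

# freduce(value, function_list) runs each column through the whole list
mycars[cols] <- lapply(mycars[cols], freduce, fcts)
head(mycars)
```

Columns not listed in cols are left untouched.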

use outside variable inside of rename() function in R

I'm new to R and have a problem.
I am trying to reformat some data, and in the process I would like to rename the columns of the new data set.
Here is how I have tried to do this:
First, the .csv file is read in, let's say case1_case2.csv.
Then the name of the .csv file is split into two parts,
and each part is assigned to a variable,
so it ends up like this:
xName=case1
yName=case2
After I have put my data into new columns, I would like to rename the columns to case1 and case2.
To do this I tried using the rename function in R, but instead of being renamed to case1 and case2, the columns get renamed to xName and yName.
Here is my code:
for ( n in 1:length(dirNames) ){
inFile <- read.csv(dirNames[n], header=TRUE, fileEncoding="UTF-8-BOM")
xName <- sub("_.*","",dirNames[n])
yName <- sub(".*[_]([^.]+)[.].*", "\\1", dirNames[n])
xValues <- inFile %>% select(which(str_detect(names(inFile), xName))) %>% stack() %>% rename( xName = values ) %>% subset( select = xName)
yValues <- inFile %>% select(which(!str_detect(names(inFile), xName))) %>% stack() %>% rename(yName = values, Organisms=ind)
finalForm <- cbind(xValues, yValues) %>% filter(complete.cases(.))
}
How can I make sure that the variables xName and yName are expanded inside the rename() function?
Thanks.
You didn't provide a reproducible example, so I'll just demonstrate the idea in general. The rename function is part of the dplyr package.
You need to "unquote" the variable that contains the string you want to use as the new column name. The unquote operator is !! and you'll need to use the special := assignment operator to make unquoting on the left hand side allowed.
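library(tidyverse)
df <- tibble(x = 1:3) # tibble() replaces the deprecated data_frame()
y <- "Foo"
df %>% rename(y = x) # Not what you want: the column is literally named "y"
# df %>% rename(!!y = x) # Syntax error, so left commented out: use := instead
df %>% rename(!!y := x) # Correct: the column is renamed to "Foo"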
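Applied to the stacking step from the question, the idea might look like the sketch below. The input data frame is hypothetical, since the asker's file was not provided; only xName and the stack() and rename() steps come from the question.

```r
library(dplyr)

# hypothetical stand-in for the file read from case1_case2.csv
inFile <- data.frame(case1_a = 1:2, case1_b = 3:4)
xName <- "case1"

xValues <- inFile %>%
  stack() %>%                    # yields the columns "values" and "ind"
  rename(!!xName := values) %>%  # new name is taken from the string in xName
  select(all_of(xName))          # keep only the renamed column

names(xValues)
```

all_of() is used in select() for the same reason: it tells dplyr to treat xName as a variable holding a column name rather than as a column called xName.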
