How to put a formula within a function in R?

I want to store a dplyr expression (e.g. filter(exercise=="Inadequate") or mutate(exercise="adequate")) and pass it to my function through arguments such as variable_to_filter. I have lots of variables that need to go through this function. How can I do that? I know the code below doesn't work, but I hope you can see the logic in what I'm trying to do.
exercise_inadequate<-(exercise=="Inadequate")
variable_to_mutate<-(mutate(exercise="adequate"))
difference_pe<-function(percent, variable_to_filter, variable_to_mutate){
filtered <- dataset %>% filter(variable_to_filter)
sampled <- sample_frac(filtered, percent/100)
sampled <- sampled %>% mutate(variable_to_mutate)
}
difference_pe(100, exercise_inadequate, exercise_adequate)

I would prefer to pass the column name and the value to the function separately, because evaluating a string as the condition in a filter statement gets ugly.
library(dplyr)
library(rlang)
difference_pe<- function(dataset, percent, col, value) {
filtered <- dataset %>% filter({{col}} == value)
sampled <- sample_frac(filtered, percent/100)
return(sampled)
}
You can call this function as:
difference_pe(dataset, 100, exercise, "Inadequate")
If for some reason that is not possible and you need to pass the condition as a string, you can use parse_expr from rlang, which is similar to eval(parse(text = ...)).
exercise_inadequate<- 'exercise=="Inadequate"'
difference_pe<- function(dataset, percent, variable_to_filter) {
filtered <- dataset %>% filter(eval(parse_expr(variable_to_filter)))
#filtered <- dataset %>% filter(eval(parse(text = variable_to_filter)))
sampled <- sample_frac(filtered, percent/100)
return(sampled)
}
difference_pe(dataset, 100, exercise_inadequate)
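The question also asks about the mutate step, which the functions above leave out. A minimal sketch of how the same column-plus-value approach might cover both steps follows. The example data, the argument names old_value and new_value, and the use of slice_sample (in place of sample_frac) are assumptions for illustration, not part of the original answer.

```r
library(dplyr)

# Hypothetical example data; the column name `exercise` follows the question.
dataset <- data.frame(
  id = 1:6,
  exercise = c("Inadequate", "adequate", "Inadequate",
               "Inadequate", "adequate", "Inadequate")
)

# Keep rows where `col` equals `old_value`, sample a percentage of them,
# then overwrite `col` with `new_value` in the sampled rows.
difference_pe <- function(dataset, percent, col, old_value, new_value) {
  dataset %>%
    filter({{ col }} == old_value) %>%
    slice_sample(prop = percent / 100) %>%
    mutate({{ col }} := new_value)  # := allows {{ }} on the left-hand side
}

result <- difference_pe(dataset, 100, exercise, "Inadequate", "adequate")
```

Passing the column unquoted and the values as ordinary arguments avoids parsing strings as code, at the cost of only supporting equality conditions.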

Related

dplyr: how to pass strings to dplyr's mutate argument

I want to write a helper function that summarizes the percentage change for column A, B and C in one shot. I want to pass a string to the "mutate" argument of dplyr with the help of rlang. Unfortunately, I get an error saying that I have an unexpected ",". Could you please take a look? Thanks in advance!
library(rlang) #read text inputs and return vars
library(dplyr)
set.seed(10)
dat <- data.frame(A=rnorm(10,0,1),
B=rnorm(10,0,1),
C=rnorm(10,0,1),
D=2001:2010)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
#create new variable names
mutate_varNames <- paste0(target_Var_list,rep("_pct_chg = ",length(target_Var_list)))
#generate text for formula
mutate_formula <- lapply(target_Var_list,function(x){output <- paste0("(",x,"-lag(",x,"))/lag(",x,")");return(output)})
mutate_formula <- unlist(mutate_formula) #convert list to a vector
#generate arguments for mutate
mutate_args <<- paste0(mutate_varNames,collapse=",",mutate_formula)
#data manipulation
output <- input_data %>%
arrange(!!parse_quo(year_Var_name,env=caller_env())) %>%
mutate(!!parse_quo(mutate_args,env=caller_env()))
#output data frame
return(output)
}
# error: unexpected ','
calc_perct_chg(input_data =dat,
target_Var_list=list("A","B","C"),
year_Var_name="D")
I don't think it's a good idea to evaluate strings as code, and I think you are over-complicating it. Using across, this should be easier.
library(dplyr)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
input_data %>%
arrange(across(all_of(year_Var_name))) %>%
mutate(across(all_of(target_Var_list), ~(.x - lag(.x))/lag(.x)))
}
calc_perct_chg(input_data = dat,
target_Var_list = c("A","B","C"),
year_Var_name = "D")
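Note that the across call above overwrites A, B and C, while the question built new names ending in _pct_chg. A sketch of one way to keep the original columns, assuming the .names argument of across and a small made-up data frame, could look like this:

```r
library(dplyr)

# Hypothetical data, deliberately not sorted by year
dat <- data.frame(A = c(1, 2, 4), B = c(10, 5, 5), D = c(2003, 2001, 2002))

calc_perct_chg <- function(input_data, target_vars, year_var) {
  input_data %>%
    arrange(across(all_of(year_var))) %>%
    mutate(across(all_of(target_vars),
                  ~ (.x - lag(.x)) / lag(.x),
                  .names = "{.col}_pct_chg"))  # adds A_pct_chg, B_pct_chg
}

result <- calc_perct_chg(dat, c("A", "B"), "D")
```

The first row of each new column is NA because lag has no previous value to compare against.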

How to drop columns that meet a certain pattern over a list of dataframes

I'm trying to drop columns that have the suffix .1, which indicates a repeated column name. This needs to work over a list of data frames.
I have written a function:
drop_duplicated_columns <- function (df) {
lapply(df, function(x) {
x <- x %>% select(-contains(".1"))
x
})
return(df)
}
However it is not working. Any ideas why?
One tidy way to solve this problem is to first write a function that works for one data.frame and then map that function over the list:
library(tidyverse)
drop_duplicated_columns <- function(df) {
df %>%
select(-contains(".1"))
}
Or even better
drop_duplicated_columns <- . %>%
select(-contains(".1"))
To use it in a pipe, combine it with map:
list_dfs <- list(mtcars,mtcars)
list_dfs %>%
map(drop_duplicated_columns)
If you just need one function, you can build a new pipeline from the code you tested above:
drop_duplicated_columns_list <- . %>%
map(drop_duplicated_columns)
list_dfs %>%
drop_duplicated_columns_list()
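As for why the function in the question has no effect: the result of lapply is never stored, and return(df) hands back the untouched input. A minimal corrected sketch is shown below. It swaps in ends_with(".1") because contains(".1") matches the literal text anywhere in a name (so a column like x.10 would be dropped too); the example data frames are made up.

```r
library(dplyr)

drop_duplicated_columns <- function(df_list) {
  # Return what lapply() produces instead of discarding it
  lapply(df_list, function(x) select(x, -ends_with(".1")))
}

# Hypothetical example data
list_dfs <- list(
  data.frame(a = 1:2, b = 3:4, a.1 = 5:6, check.names = FALSE),
  data.frame(x = 1:2, x.1 = 3:4, x.10 = 5:6, check.names = FALSE)
)

cleaned <- drop_duplicated_columns(list_dfs)
```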

R - Passing a string as a string in a user defined function R

I am trying to write a function which subsets a dataset containing a certain string.
Mock data:
library(stringr)
set.seed(1)
codedata <- data.frame(
Key = sample(1:10),
ReadCodePreferredTerm = sample(c("yes", "prefer", "Had refer"), 20, replace=TRUE)
)
User defined function:
findterms <- function(inputdata, variable, searchterm) {
outputdata <- inputdata[str_which(inputdata$variable, regex(searchterm, ignore_case=TRUE)), ]
return(outputdata)
}
I am expecting at least a couple of rows returned, but I get 0 when I run the following code:
findterms(codedata, ReadCodePreferredTerm, " refer") #the space in front of this word is deliberate
I realise I am trying to do something quite simple... but can't find out why it isn't working.
Note, the code works fine when not defined as a function:
referterms <- codedata[str_which(codedata$ReadCodePreferredTerm, regex(" refer", ignore_case=TRUE)), ]
You can do this simply with dplyr and stringr:
library(magrittr) # For the pipe (%>%)
library(dplyr)
library(stringr)
codedata %>%
dplyr::filter(str_detect(ReadCodePreferredTerm, '\\brefer\\b'))
You can also write your own function if you like. You will need rlang as well if you don't want to pass the variable name as a string. Something like this works:
library(rlang)
findterms <- function(df, variable, searchterm) {
variable <- enquo(variable)
return(
df %>%
dplyr::filter(str_detect(!!variable, str_interp('\\b${ searchterm }\\b')))
)
}
findterms(codedata, ReadCodePreferredTerm, 'refer')
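The original function returns 0 rows because inputdata$variable looks for a column literally named "variable", which is NULL here, so str_which finds nothing. If passing the column name as a string is acceptable, a base-R sketch that keeps the question's search term (leading space and case-insensitive match) might be as follows; the four-row data frame is a made-up stand-in for the mock data.

```r
library(stringr)

# Hypothetical, deterministic stand-in for the question's mock data
codedata <- data.frame(
  Key = 1:4,
  ReadCodePreferredTerm = c("yes", "prefer", "Had refer", "Was REFER"),
  stringsAsFactors = FALSE
)

findterms <- function(inputdata, variable, searchterm) {
  # `variable` is a column name given as a string; [[ ]] looks it up by value
  hits <- str_which(inputdata[[variable]], regex(searchterm, ignore_case = TRUE))
  inputdata[hits, ]
}

result <- findterms(codedata, "ReadCodePreferredTerm", " refer")
```

Here "prefer" is not matched because the search term starts with a space.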

Simple loop with subset and variable name assignment

I am currently learning R and I don't understand why this simple assignment does not work. I would like to subset by year using the filter function of the dplyr package. After several attempts, here is a reproducible example using the gapminder dataset.
I could use the subset function, lapply, or even an anonymous function to solve this problem, but here I just want to understand why this specific code is not working.
library(gapminder)
library(dplyr)
for (i in unique(gapminder$year)) {
paste0("gapminder", i) <- print(gapminder %>%
filter(year == i))
}
With or without print, same problem
It's because the left-hand side of your assignment is a call to a function (paste0), not a variable name, so R has nothing to assign to.
If you remove that part, it prints each filtered data frame:
library(gapminder)
library(dplyr)
for (i in unique(gapminder$year)) {
print(gapminder %>% filter(year == i))
}
You could assign each to a list, like so:
my_list <- list()
library(gapminder)
library(dplyr)
for (i in seq_along(unique(gapminder$year))) {
year_filter <- unique(gapminder$year)[i] # each iteration we get another year
my_list[[i]] <- gapminder %>% filter(year == year_filter)
cat(paste0("gapminder", year_filter, " ")) # use cat if you want to print at each iteration
}
paste0 just concatenates vectors after converting them to character; it does not create a variable.
Use the assign function to store the output under a constructed name.
for (i in unique(gapminder$year))
{
assign(paste0("gapminder", i),print(gapminder %>%filter(year == i)))
}
To retrieve a specific data frame later, use the get function:
out_i = get(paste0("gapminder", i))
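As an alternative to both the explicit loop and assign, base R's split can build a named list in one step. This is a sketch on a small made-up data frame standing in for gapminder; the gapminder prefix on the names is only there to mirror the question.

```r
# Hypothetical stand-in for the gapminder data
df <- data.frame(year = c(1952, 1952, 1957, 1962), pop = c(10, 20, 30, 40))

# One data frame per year, collected in a list named after the year values
by_year <- split(df, df$year)
names(by_year) <- paste0("gapminder", names(by_year))

# Access one year by name instead of calling get()
by_year[["gapminder1952"]]
```

Keeping the pieces in one list avoids filling the global environment with many separate objects.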

use outside variable inside of rename() function in R

I'm new to R and have a problem.
I am trying to reformat some data, and in the process I would like to rename the columns of the new data set.
Here is how I have tried to do this:
First the .csv file is read in, let's say case1_case2.csv.
Then the name of the .csv file is broken up into two parts.
Each part is assigned to a variable,
so it ends up being like this:
xName=case1
yName=case2
After I have put my data into new columns, I would like to rename the columns to case1 and case2.
To do this I tried using the rename function in R, but instead of being renamed to case1 and case2, the columns get renamed to xName and yName.
Here is my code:
for ( n in 1:length(dirNames) ){
inFile <- read.csv(dirNames[n], header=TRUE, fileEncoding="UTF-8-BOM")
xName <- sub("_.*","",dirNames[n])
yName <- sub(".*[_]([^.]+)[.].*", "\\1", dirNames[n])
xValues <- inFile %>% select(which(str_detect(names(inFile), xName))) %>% stack() %>% rename( xName = values ) %>% subset( select = xName)
yValues <- inFile %>% select(which(!str_detect(names(inFile), xName))) %>% stack() %>% rename(yName = values, Organisms=ind)
finalForm <- cbind(xValues, yValues) %>% filter(complete.cases(.))
}
How can I make sure that the variables xName and yName are expanded inside the rename() function?
Thanks.
You didn't provide a reproducible example, so I'll just demonstrate the idea in general. The rename function is part of the dplyr package.
You need to "unquote" the variable that contains the string you want to use as the new column name. The unquote operator is !!, and you need the special := assignment operator because R does not allow unquoting on the left-hand side of =.
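library(tidyverse)
df <- tibble(x = 1:3) # data_frame() is deprecated in favour of tibble()
y <- "Foo"
df %>% rename(y = x) # Not what you want: the column is named "y", so y must be unquoted
# df %>% rename(!!y = x) # Does not parse (kept commented out): use := instead of =
df %>% rename(!!y := x) # Correct: the column is renamed to "Foo"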
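If you would rather avoid !! and :=, one alternative is to pass rename a named character vector wrapped in all_of, where the names are the new column names and the values are the old ones. This sketch assumes a reasonably recent dplyr (1.0 or later), and the object names are only for illustration.

```r
library(dplyr)

df <- tibble(x = 1:3)
y <- "Foo"

# Named vector: name = new column name, value = existing column name
lookup <- setNames("x", y)

renamed <- df %>% rename(all_of(lookup))
names(renamed)
```

The same lookup vector can hold several pairs, which is handy when the new names come from file names as in the question.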

Resources