Select multiple data columns in function - r

How can I select multiple existing columns from a data frame inside a function that receives the column names through the triple-dot (...) parameter?
for example:
devTest <- function(data,...){
col = list(...)
innerTest <- function(...){
more = list(...)
data %>% select(more)
}
x <- innerTest({{col}})
x
}
devTest(mtcars,mpg, gear)
produces this error:
Error in devTest(mtcars, mpg, gear) : object 'mpg' not found

The main issue is that you need to defuse the arguments using enquos (since you want to pass column symbols rather than strings to devTest):
devTest <- function(data, ...) {
  col <- enquos(...)
  innerTest <- function(col) {
    data %>% select(!!!col)
  }
  innerTest(col)
}
devTest(mtcars, mpg, gear)
The duplicate list(...) calls are also unnecessary. list(...) evaluates its arguments, which is where the "object not found" error comes from. Instead, we can define innerTest to take a list of quosures directly and splice them into select() with the unquote-splice operator !!!.
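If innerTest only needs to hand the columns on to select(), a simpler variant is to forward the dots untouched, without enquos() or !!!. A minimal sketch, assuming dplyr is attached; devTest2 is a hypothetical name used for illustration, not something from the question:

```r
library(dplyr)

# devTest2 is a hypothetical name used for illustration only
devTest2 <- function(data, ...) {
  innerTest <- function(...) {
    # the dots are forwarded unevaluated, so select() sees mpg and gear as column names
    data %>% select(...)
  }
  innerTest(...)
}

res <- devTest2(mtcars, mpg, gear)
names(res)
```

Forwarding the dots is enough when they go straight into a tidyselect or data-masking function; enquos() is the tool to reach for when the captured columns need to be inspected or modified first.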

Related

Function containing dataframe and variable using lapply

I have two dataframes and a function, which works when I use it on a single variable.
library(tidyverse)
iris1<-iris
iris2<-iris
iris_fn<-function(df,species_type){
df1<-df%>%
filter((Species==species_type))
return(df1)}
new_df<-iris_fn(df=iris1, species_type="setosa")
I want to pass a vector of variables to the function with the expected output being a list of dataframes (3), one filtered to each variable, for which I have been experimenting using lapply:
variables<-c("setosa","versicolor","virginica")
new_df<-lapply(df=iris1, species_type="setosa", FUN= iris_fn)
The error message is Error in is.vector(X) : argument "X" is missing, with no default, which I don't understand because I have named the function and all of its arguments.
Can anyone suggest a solution to get the desired output? I essentially need a version of lapply or purrr function that will allow a dataframe and a vector as inputs.
lapply expects an argument called X as the main input. You could re-write it so that the function expects X instead of species_type e.g.
iris_fn <- function(df, X){
df1 <- df %>% filter((Species==X))
return(df1)
}
variables <- c("setosa", "versicolor", "virginica")
new_df <- lapply(X=variables, FUN=iris_fn, df=iris1)
EDIT:
Alternatively to avoid using X, you need the first argument of the function to match the lapply input e.g.
iris_fn <- function(species_type, df){
df1 <- df %>% filter((Species==species_type))
return(df1)
}
new_df <- lapply(variables, FUN=iris_fn, df=iris1)
Check out the split function for a convenient way to split a data.frame to a list e.g. split(iris, f=iris$Species)
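To make the split suggestion concrete, here is a small base R sketch (by_species is a name chosen for illustration). It returns a named list with one data frame per level of Species:

```r
# split() returns a list with one element per level of the grouping factor
by_species <- split(iris, f = iris$Species)

names(by_species)        # "setosa" "versicolor" "virginica"
nrow(by_species$setosa)  # 50
```

This avoids lapply entirely when the goal is simply one data frame per group.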
From ?lapply: the signature is lapply(X, FUN, ...). Because you named all of your arguments and none of them is X, there is nothing for lapply to iterate over and pass to the function as its first argument.
Try something like this:
library(dplyr)
iris1<-iris
# note the changed argument order
iris_fn<-function(species_type, df){
df1<-df%>%
filter((Species==species_type))
return(df1)}
variables<-c("setosa","versicolor","virginica")
new_df_list <-lapply(variables, iris_fn, df=iris1 )
Or with just an anonymous function:
new_df_list <-lapply(variables, \(x) filter(iris1, Species == x))
As you already use Tidyverse, perhaps with purrr::map() instead:
library(purrr)
new_df_list <- map(variables, ~ filter(iris1, Species == .x))
Created on 2022-11-14 with reprex v2.0.2

What is the cause of the 'not found' error in my user-defined function?

df <- tibble(wspid = c("text","text",1:9,NA),
PID = c("text","text",1:10))
#Function to export a single column inputted in the function argument
export_ids <- function(var) {
export_df <- df %>%
filter(!is.na(var)) %>%
select(var)
write_csv(export_df, "~/Downloads/final_ids.csv")
}
#Calling the function
export_ids(wspid)
I keep getting the same error:
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `!is.na(var)`.
x object 'wspid' not found
I suspect there's some issue with the scoping of the function but no matter what combinations I try-- such as defining the tibble within the function or referencing the tibble directly within the function (as in, df$var) I still get an error.
As we pass an unquoted variable, use {{ }} (the curly-curly operator) so that it is evaluated as a column of the data frame:
export_ids <- function(var) {
  export_df <- df %>%
    filter(!is.na({{var}})) %>%
    select({{var}})
  write_csv(export_df, "~/Downloads/final_ids.csv")
}
-testing
export_ids(wspid)
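To check the curly-curly behaviour without writing a file, the same pattern can be tried on a small tibble. This is a sketch under assumptions: ids and keep_ids are hypothetical names, and the data frame is passed in as an argument rather than read from the global environment.

```r
library(dplyr)

# ids and keep_ids are hypothetical names for illustration
ids <- tibble(wspid = c("a", NA, "c"), PID = c("1", "2", "3"))

keep_ids <- function(data, var) {
  data %>%
    filter(!is.na({{ var }})) %>%  # var is looked up as a column of data
    select({{ var }})
}

res <- keep_ids(ids, wspid)
res
```

The result keeps only the wspid column and drops the row where it is NA.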
I guess a better solution would be to simply use:
df <- tibble(wspid = c("text","text",1:9,NA),
PID = c("text","text",1:10))
#Function to export a single column inputted in the function argument
export_ids <- function(df, var) {
# keep the column as a one-column tibble, because write_csv needs a data frame
data <- df[var]
# now filter out NA values
data <- data[!is.na(data[[var]]), ]
write_csv(data, "~/Downloads/final_ids.csv")
}
#Calling the function
export_ids(df, "wspid")

How to find object name passed to function

I have a function which takes a dataframe and its columns and processes it in various ways (left out for simplicity). We can put in column names as arguments or transform columns directly inside function arguments (like here). I need to find out what object(s) are passed in the function.
Reproducible example:
df <- data.frame(x= 1:10, y=1:10)
myfun <- function(data, col){
col_new <- eval(substitute(col), data)
# magic part
object_name <- ...
# magic part
plot(col_new, main= object_name)
}
For instance, the expected output for myfun(data= df, x*x) is the plot plot(df$x*df$x, main= "x"). So the title is x, not x*x. What I have got so far is this:
myfun <- function(data, col){
colname <- tryCatch({eval(substitute(col))}, error= function(e) {geterrmessage()})
colname <- gsub("' not found", "", gsub("object '", "", colname))
plot(eval(substitute(col), data), main= colname)
}
This function gives the expected output, but there must be a more elegant way to find out which object the input refers to. The answer must use base R only.
Use substitute to get the expression passed as col and then use eval and all.vars to get the values and name.
myfun <- function(data, col){
s <- substitute(col)
plot(eval(s, data), main = all.vars(s), type = "o", ylab = "")
}
myfun(df, x * x)
Another possibility is to pass a one-sided formula.
myfun2 <- function(formula, data){
plot(eval(formula[[2]], data), main = all.vars(formula), type = "o", ylab = "")
}
myfun2(~ x * x, df)
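For reference, this is how all.vars behaves on a few unevaluated expressions, which is what makes the title come out as "x" rather than "x * x". It is a base R sketch and the variable names are illustrative:

```r
s1 <- quote(x * x)
v1 <- all.vars(s1)               # "x": each variable is listed once
d1 <- deparse(s1)                # "x * x": the full expression as text

v2 <- all.vars(quote(sum(x, y))) # "x" "y": function names are not included
v3 <- all.vars(~ log(x) + y)     # "x" "y": formulas work as well
```

Note that when the expression involves several columns, all.vars returns several names, so main would receive a character vector of length greater than one.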
The rlang package can be very powerful once you get the hang of it. Does something like this do what you want?
library(rlang)
myfun <- function (data, col){
.col <- enexpr(col)
unname(sapply(call_args(.col), as_string))
}
This gives you back the column name "wt".
myfun(mtcars, as.factor(wt))
# [1] "wt"
I am not sure of your use case, but this also works for multiple inputs.
myfun(mtcars, sum(x, y))
# [1] "x" "y"
And finally, it is possible you might not even need to do this, but rather store the expression and operate directly on the data. The tidyeval framework can help with that as well.

Is there a way to pass arguments with logical operators (!=, >,<) to a function?

create_c <- function(df, line_number = NA, prior_trt, line_name, biomarker, ...) {
if (!"data.frame" %in% class(df)) {
stop("First input must be dataframe")
}
# handle extra arguments
args <- enquos(...)
names(args) <- tolower(names(args))
# check for unknown argument - cols that do not exist in df
check_args_exist(df, args)
# argument to expression
ex_args <- unname(imap(args, function(expr, name) quo(!!sym(name) == !!expr)))
# special case arguments
if (!missing(line_number)) {
df <- df %>% filter(line_number %in% (!!line_number))
if (!missing(prior_trt)) {
df <- filter_arg(df. = df, arg = prior_trt, col = "prior_trt_", val = "y")
}
}
if (!missing(biomarker)) {
df <- filter_arg(df. = df, arg = biomarker, col = "has_", val = "positive")
}
if (!missing(line_name)) {
ln <- list()
if (!!str_detect(line_name[1], "or")) {
line_name <- str_split(line_name, " or ", simplify = TRUE)
}
for (i in 1:length(line_name)) {
ln[[i]] <- paste(tolower(sort(strsplit(line_name[i], "\\+")[[1]])), collapse = ",")
}
df <- df %>% filter(line_name %in% (ln))
}
df <- df %>%
group_by(patient_id) %>%
slice(which.min(line_number)) %>%
ungroup()
df <- df %>% filter(!!!ex_args)
invisible(df)
}
I have this function where I am basically filtering various columns based on parameters users pass. I want the users to be able to pass logical operators like >,<, != for some of the parameters. Right now my function is not able to handle any other operators besides '='. Is there a way to accomplish this?
create_c(df = bsl_all_nsclc,
line_number > 2)
create_c(df, biomarker != "positive")
Error in tolower(arg) : object 'biomarker' not found
Certainly there is a way: operators are regular functions in R, you can pass them around like any other function.
The only complication is that the operators are non-syntactic names so you can’t just pass them “as is”, this would confuse the parser. Instead, you need to wrap them in backticks, to make their use syntactically valid where a name would be expected:
filter_something = function (value, op) {
op(value, 13)
}
filter_something(cars$speed, `>`)
filter_something(cars$speed, `<`)
filter_something(cars$speed, `==`)
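If users would rather type the operator as a string, base R's match.fun can look it up. A small sketch, where compare_to and its threshold argument are hypothetical and not part of the question's code:

```r
# compare_to is a hypothetical helper for illustration
compare_to <- function(value, op, threshold) {
  f <- match.fun(op)  # accepts a function or the name of one as a string
  f(value, threshold)
}

r1 <- compare_to(c(10, 13, 15), ">", 13)   # FALSE FALSE  TRUE
r2 <- compare_to(c(10, 13, 15), "!=", 13)  #  TRUE FALSE  TRUE
r3 <- compare_to(c(10, 13, 15), `<=`, 13)  #  TRUE  TRUE FALSE
```

Both the string and the backtick form work, because match.fun passes an actual function through unchanged.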
And since R also supports non-standard evaluation of function arguments, you can also pass unevaluated expressions — this gets slightly more complicated, since you’d want to evaluate them in the correct context. ‘rlang’/‘dplyr’ uses data masking for this.
How exactly you need to apply this depends entirely on the context in which the expression is to be used. In many cases, you can simply dispatch them to the corresponding ‘dplyr’ functions, e.g.
filter_something2 = function (.data, expr) {
  .data %>%
    filter({{expr}})
}
filter_something2(cars, speed < 13)
The “secret sauce” here is the {{…}} syntax. This works because filter from ‘dplyr’ accepts unevaluated arguments and handles {{expr}} specially, treating it as (effectively) !!enquo(expr). That is, expr is first “defused”: instead of being evaluated, the name expr is replaced by the unevaluated expression it is bound to (speed < 13 in the above), together with the environment it came from. Next, this defused expression is unquoted: the wrapper is “peeled off” and the expression is inserted into the call, so that filter handles it as if it had been written filter(.data, speed < 13). In other words, the name expr is substituted with speed < 13 in the call expression.
For a more thorough explanation, please refer to the Programming with dplyr vignette.
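When the function should accept several conditions at once, as in the question, the dots can be forwarded to filter unchanged. This is only a sketch: filter_all is a hypothetical name, and it leaves out the special-case arguments of create_c.

```r
library(dplyr)

# filter_all is a hypothetical wrapper for illustration
filter_all <- function(.data, ...) {
  # each condition in ... is evaluated by filter() inside .data
  .data %>% filter(...)
}

res <- filter_all(cars, speed > 20, dist != 48)
res
```

Any comparison that filter itself accepts (>, <, !=, %in%, and so on) can be passed this way, and multiple conditions are combined with AND.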

dplyr exclude columns using the dot argument

How can we write a function that lets the user drop multiple columns using the ... argument, dplyr style?
E.g.
mydrop=function(x,...){function body}
mydrop(npk,N:K)
returns npk[,c("block","yield")].
Note that it is important that the ... argument is compatible with all the ?select_helpers functions.
Similar to @akrun's answer, but allowing for the N:K dplyr-style column selection in ... that the OP requested, as well as some error handling:
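mydrop <- function(x, ...) {
  # resolve the selection to column names; NULL if the selection fails
  # (current_vars() is no longer available in current dplyr, so it is avoided here)
  todrop <- tryCatch(names(select(x, ...)), error = function(e) NULL)
  if (length(todrop) > 0) {
    x %>% select(-all_of(todrop))
  } else x
}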
Perhaps we can use
mydrop <- function(x,...){
nm <- list(...)
if(length(nm)>0) {
x %>%
select(-one_of(unlist(nm)))
} else x
}
mydrop(npk, "N", "K")
Using a reproducible example
mydrop(mtcars, 'mpg', 'cyl')
mydrop(mtcars)
mydrop(mtcars, names(mtcars)[-1])
mydrop(mtcars, names(mtcars))
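With a recent dplyr (one that uses tidyselect 1.0.0 or later), the selection in ... can also be negated directly, which keeps the select helpers working. A minimal sketch, assuming that version; mydrop2 is a hypothetical name:

```r
library(dplyr)

# mydrop2 is a hypothetical name for illustration
mydrop2 <- function(x, ...) {
  # nothing to drop: return the data unchanged
  if (...length() == 0) return(x)
  # negate the whole selection, so ranges and helpers keep working
  x %>% select(!c(...))
}

n1 <- names(mydrop2(npk, N:K))               # "block" "yield"
n2 <- names(mydrop2(npk, starts_with("b")))  # "N" "P" "K" "yield"
```

Unlike the version that takes strings, this one accepts bare column names, ranges such as N:K, and the select helpers.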
