dplyr exclude columns using the dot argument - r

How can we write a function that lets the user drop multiple columns using the ... argument, dplyr style?
E.g.
mydrop=function(x,...){function body}
mydrop(npk,N:K)
returns npk[,c("block","yield")].
Note that it is important that the ... argument is compatible with all the ?select_helpers functions.

Similar to @akrun's answer, but allowing the dplyr-style column selection the OP asked for (e.g. N:K) through ..., plus some error handling:
library(dplyr)
mydrop <- function(x, ...) {
  # Resolve ... to column names; if the selection fails, drop nothing
  todrop <- tryCatch(
    x %>% select(...) %>% names(),
    error = function(e) character(0)
  )
  x %>% select(all_of(setdiff(names(x), todrop)))
}

Perhaps we can use
mydrop <- function(x,...){
nm <- list(...)
if(length(nm)>0) {
x %>%
select(-one_of(unlist(nm)))
} else x
}
mydrop(npk, "N", "K")
Using reproducible example
mydrop(mtcars, 'mpg', 'cyl')
mydrop(mtcars)
mydrop(mtcars, names(mtcars)[-1])
mydrop(mtcars, names(mtcars))
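For completeness, a shorter variant that forwards ... straight into a negated selection, so all the ?select_helpers keep working. This is only a sketch: it assumes dplyr >= 1.0.0 (where select() understands ! and c()), and mydrop2 is a made-up name.

```r
library(dplyr)

# mydrop2 is a hypothetical name; assumes dplyr >= 1.0.0
mydrop2 <- function(x, ...) {
  # c(...) collects whatever the caller selected; ! takes the complement
  select(x, !c(...))
}

names(mydrop2(npk, N:K))
names(mydrop2(npk, starts_with("y"), block))
```

The first call should leave block and yield, matching npk[,c("block","yield")] from the question.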

Related

Function containing dataframe and variable using lapply

I have two dataframes and a function, which works when I use it on a single variable.
library(tidyverse)
iris1<-iris
iris2<-iris
iris_fn<-function(df,species_type){
df1<-df%>%
filter((Species==species_type))
return(df1)}
new_df<-iris_fn(df=iris1, species_type="setosa")
I want to pass a vector of variables to the function with the expected output being a list of dataframes (3), one filtered to each variable, for which I have been experimenting using lapply:
variables<-c("setosa","versicolor","virginica")
new_df<-lapply(df=iris1, species_type="setosa", FUN= iris_fn)
The error message is Error in is.vector(X) : argument "X" is missing, with no default, which I don't understand because I have named the function and supplied its arguments.
Can anyone suggest a solution to get the desired output? I essentially need a version of lapply or purrr function that will allow a dataframe and a vector as inputs.
lapply expects an argument called X as the main input. You could re-write it so that the function expects X instead of species_type e.g.
iris_fn <- function(df, X){
df1 <- df %>% filter((Species==X))
return(df1)
}
variables <- c("setosa", "versicolor", "virginica")
new_df <- lapply(X=variables, FUN=iris_fn, df=iris1)
EDIT:
Alternatively to avoid using X, you need the first argument of the function to match the lapply input e.g.
iris_fn <- function(species_type, df){
df1 <- df %>% filter((Species==species_type))
return(df1)
}
new_df <- lapply(variables, FUN=iris_fn, df=iris1)
Check out the split function for a convenient way to split a data.frame to a list e.g. split(iris, f=iris$Species)
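A quick sketch of what split() returns here, using base R only. The variable name by_species is just for illustration.

```r
# One data frame per level of Species, named after the level
by_species <- split(iris, iris$Species)

names(by_species)
sapply(by_species, nrow)
```

Each list element is the subset of iris for one species, which is the same shape of result as the lapply() call above.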
From ?lapply: the signature is lapply(X, FUN, ...). Because all of your arguments are named and none of them is called X, lapply has nothing to iterate over and nothing to pass to the function as its first argument.
Try something like this:
library(dplyr)
iris1<-iris
# note the changed argument order
iris_fn<-function(species_type, df){
df1<-df%>%
filter((Species==species_type))
return(df1)}
variables<-c("setosa","versicolor","virginica")
new_df_list <-lapply(variables, iris_fn, df=iris1 )
Or with just an anonymous function:
new_df_list <-lapply(variables, \(x) filter(iris1, Species == x))
As you already use Tidyverse, perhaps with purrr::map() instead:
library(purrr)
new_df_list <- map(variables, ~ filter(iris1, Species == .x))
Created on 2022-11-14 with reprex v2.0.2

Select multiple data columns in function

How can I select multiple existing columns from a data frame when my function receives them through the triple-dots (...) parameter?
for example:
devTest <- function(data,...){
col = list(...)
innerTest <- function(...){
more = list(...)
data %>% select(more)
}
x <- innerTest({{col}})
x
}
devTest(mtcars,mpg, gear)
produces this error:
Error in devTest(mtcars, vs) : object 'vs' not found
The main issue is that you need to defuse the arguments using enquos (since you want to pass column symbols rather than strings to devTest):
devTest <- function(data, ...) {
col <- enquos(...)
innerTest <- function(col) {
data %>% select(!!!col)
}
innerTest(col)
}
devTest(mtcars,mpg, gear)
Other minor issues are the duplicate list(...) calls which are not necessary, as we can define innerTest to take a list of quosures directly (which we can then evaluate using the triple-bang operator !!!).
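If the inner function does nothing but select, it may be simpler to forward the dots unchanged rather than capture them. A minimal sketch of that alternative (devTest2 and innerTest2 are made-up names, not from the question):

```r
library(dplyr)

# devTest2 and innerTest2 are hypothetical names for this sketch
devTest2 <- function(data, ...) {
  innerTest2 <- function(...) {
    # ... is forwarded untouched, so select() sees mpg and gear as columns
    data %>% select(...)
  }
  innerTest2(...)
}

names(devTest2(mtcars, mpg, gear))
```

The enquos() / !!! version is still the one to use when the captured columns need to be inspected or modified before they reach select().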

Passing enquo expression to subfunction

This question is related to Passing variables to functions that use `enquo()`.
I have a higher function with arguments of a tibble (dat) and the columns of interest in dat (variables_of_interest_in_dat). Within that function, there is a call to another function to which I want to pass variables_of_interest_in_dat.
higher_function <- function(dat, variables_of_interest_in_dat){
variables_of_interest_in_dat <- enquos(variables_of_interest_in_dat)
lower_function(dat, ???variables_of_interest_in_dat???)
}
lower_function <- function(dat, variables_of_interest_in_dat){
variables_of_interest_in_dat <- enquos(variables_of_interest_in_dat)
dat %>%
select(!!!variables_of_interest_in_dat)
}
What is the recommended way to pass variables_of_interest_in_dat to lower_function?
I have tried lower_function(dat, !!!variables_of_interest_in_dat) but when I run higher_function(mtcars, cyl) this returns "Error: Can't use !!! at top level."
In the related post, the higher_function did not enquo the variables before passing them to lower function.
Thank you
Is this what you want?
library(tidyverse)
LF <- function(df,var){
newdf <- df %>% select({{var}})
return(newdf)
}
HF <- function(df,var){
LF(df,{{var}})
}
LF(mtcars,disp)
HF(mtcars,disp)
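The {{ }} (aka 'curly curly') operator replaces the enquo() plus !! pattern: {{ var }} is shorthand for !!enquo(var), so the argument can be forwarded from one function to the next without quoting it first.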
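Applied to the function names from the question, the same idea might look like the sketch below. It assumes rlang >= 0.4.0 (for {{ }}); several columns can be passed by wrapping them in c().

```r
library(dplyr)

lower_function <- function(dat, variables_of_interest_in_dat) {
  dat %>% select({{ variables_of_interest_in_dat }})
}

higher_function <- function(dat, variables_of_interest_in_dat) {
  # Forward the argument as-is; no enquos() needed at this level
  lower_function(dat, {{ variables_of_interest_in_dat }})
}

names(higher_function(mtcars, cyl))
names(higher_function(mtcars, c(cyl, disp)))
```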

How to evaluate empty quosure programmatically?

In my dataset, I have a few possible grouping variables a, b, c. How do I programmatically tell dplyr to not group by any variables?
For example:
granularity <- NA
if(isTRUE(granularity == 'all')){
# all group variables
group_variables <- quos(a, b, c)
}else if(isTRUE(granularity == 'no_c')){
# all except c
group_variables <- quos(a, b)
}else{
# no group variables
group_variables <- quo()
}
data_summary <- mydata %>%
group_by(!!! group_variables) %>%
summarise(
x_mean = mean(x)
)
This will run correctly if I set granularity to 'all' or 'no_c', but it fails when I assign group_variables to the empty quosure. Does anyone know how to make this work?
Edit: This question also applies to functions like select, so assume I wanted to run
data_select <- mydata %>%
select(!!! select_variables, d, e, f)
How do I set select_variables to sometimes be quos(a, b, c) or sometimes be empty?
Thanks!
Use group_variables <- NULL in that clause:
}else{
# no group variables
group_variables <- NULL
}
Also note the error and the long warning that the original code (with quo()) produces:
Error in grouped_df_impl(data, unname(vars), drop) :
Column `<empty>` is unknown
In addition: Warning message:
Unquoting language objects with `!!!` is soft-deprecated as of rlang 0.3.0.
Please use `!!` instead.
# Bad:
dplyr::select(data, !!!enquo(x))
# Good:
dplyr::select(data, !!enquo(x)) # Unquote single quosure
dplyr::select(data, !!!enquos(x)) # Splice list of quosures
You might want to consider not using packages with unstable APIs.
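Another option that should behave the same way is an empty quos(), which is a zero-length list of quosures, so !!! splices in nothing. A small sketch with made-up data (mydata and summarise_by are illustrative names; .groups assumes dplyr >= 1.0.0):

```r
library(dplyr)

# mydata is invented example data with the columns named in the question
mydata <- data.frame(
  a = c(1, 1, 2, 2),
  b = c("u", "v", "u", "v"),
  c = c(TRUE, FALSE, TRUE, FALSE),
  x = c(10, 20, 30, 40)
)

# summarise_by is a hypothetical helper wrapping the pipeline from the question
summarise_by <- function(group_variables) {
  mydata %>%
    group_by(!!!group_variables) %>%
    summarise(x_mean = mean(x), .groups = "drop")
}

summarise_by(quos())   # empty list of quosures: no grouping, one summary row
summarise_by(quos(a))  # grouped by a: one row per value of a
```

The same empty quos() can be spliced into select(!!!select_variables, d, e, f), where it simply contributes no columns.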

Error: cannot join on columns: index out of bounds [duplicate]

I am trying to perform an inner join of two tables using dplyr, and I think I'm getting tripped up by non-standard evaluation rules. When using the by=c("a" = "b") argument, everything works as expected when "a" and "b" are actual strings. Here's a toy example that works:
library(dplyr)
data(iris)
inner_join(iris, iris, by=c("Sepal.Length" = "Sepal.Width"))
But let's say I was putting inner_join in a function:
library(dplyr)
data(iris)
myfn <- function(xname, yname) {
data(iris)
inner_join(iris, iris, by=c(xname = yname))
}
myfn("Sepal.Length", "Sepal.Width")
This returns the following error:
Error: cannot join on columns 'xname' x 'Sepal.Width': index out of bounds
I suspect there is some fancy expression, deparsing, quoting, or unquoting that I could do to make this work, but I'm a bit murky on those details.
You can use
myfn <- function(xname, yname) {
data(iris)
inner_join(iris, iris, by=setNames(yname, xname))
}
The suggested syntax in the ?inner_join documentation of
by = c("a"="b") # same as by = c(a="b")
is slightly misleading because the two sides are not both ordinary character values: the expression actually creates a named character vector, in which "a" is the name and "b" is the value. Setting the name (left of the equals sign) dynamically is different from setting the value (right of it), because c(xname = yname) uses the literal name xname. You can use setNames() to set the names of the vector dynamically.
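The difference is easy to see by inspecting the names directly. A base R sketch, reusing the argument names from the question as plain variables:

```r
xname <- "Sepal.Length"
yname <- "Sepal.Width"

# c() uses the literal argument name, not the value stored in xname
names(c(xname = yname))

# setNames() evaluates both arguments, so the name comes from the variable
names(setNames(yname, xname))
```

The first vector is named "xname", which is why the join looks for a column called xname; the second is named after the contents of the variable.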
I like MrFlick's answer and fber's addendum, but I prefer structure. To me, setNames feels like something for the end of a pipe, not an on-the-fly constructor. On another note, both setNames and structure enable the use of variables in the function call.
myfn <- function(xnames, ynames) {
data(iris)
inner_join(iris, iris, by = structure(names = xnames, .Data = ynames))
}
x <- "Sepal.Length"
myfn(x, "Sepal.Width")
A named vector argument would run into problems here:
myfn <- function(byvars) {
data(iris)
inner_join(iris, iris, by = byvars)
}
x <- "Sepal.Length"
myfn(c(x = "Sepal.Width"))
You could solve that, though, by using setNames or structure in the function call.
I know I'm late to the party, but how about:
myfn <- function(byvar) {
data(iris)
inner_join(iris, iris, by=byvar)
}
This way you can do what you want with:
myfn(c("Sepal.Length"="Sepal.Width"))
I faced a nearly identical challenge to @Peter's, but needed to pass multiple different sets of by = join parameters at one time. I chose to use the map() function from the tidyverse package purrr.
This is the subset of the tidyverse that I used.
library(magrittr)
library(dplyr)
library(rlang)
library(purrr)
First, I adapted myfn to use map() for the case posted by Peter. 42's comment and Felipe Gerard's answer made it clear that the by argument can take a named vector. map() requires a list over which to iterate.
myfn_2 <- function(xname, yname) {
by_names <- list(setNames(nm = xname, yname ))
data(iris)
# map() returns a single-element list. We index to retrieve dataframe.
map( .x = by_names,
.f = ~inner_join(x = iris,
y = iris,
by = .x)) %>%
`[[`(1)
}
myfn_2("Sepal.Length", "Sepal.Width")
I found that I didn't need quo_name() / !! in building the function.
Then, I adapted the function to take a list of by parameters. For each by_i in by_grps, we could extend x and y to add named values on which to join.
by_grps <- list( by_1 = list(x = c("Sepal.Length"), y = c("Sepal.Width")),
by_2 = list(x = c("Sepal.Width"), y = c("Petal.Width"))
)
myfn_3 <- function(by_grps_list, nm_dataset) {
by_named_vectors_list <- lapply(by_grps_list,
function(by_grp) setNames(object = by_grp$y,
nm = by_grp$x))
map(.x = by_named_vectors_list,
.f = ~inner_join(nm_dataset, nm_dataset, by = .x))
}
myfn_3(by_grps, iris)

Resources