In R, how to pass a column as a parameter to strsplit?

What is the proper way to pass a column as a parameter to a str_split function and have it recognized as a column?
library(tidyverse)
library(lazyeval)
df = data.frame("x"=c("apple/pear","pear/banana/kiwi","orange/pear"))
f <- function(col) {
df %>%
select(col) %>%
transform(col = interp(strsplit(~v, "/"), v=as.name(col)) )
}
Currently this returns the error 'Error in strsplit(~v, "-") : non-character argument'.

We can use tidyverse functions instead of mixing base R with the tidyverse. separate_rows() from tidyr splits the column and reshapes the data into 'long' format (one row per piece). Inside the function, the curly-curly operator ({{ }}) forwards the unquoted argument so that separate_rows() evaluates it as a column of the data.
library(dplyr)
library(tidyr)
f1 <- function(data, col) {
data %>%
separate_rows({{col}}, sep="/")
}
f1(df, x)
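For comparison with the original strsplit() attempt, here is a base R sketch of the same reshaping. The helper name split_col() is hypothetical; it captures the unquoted column name with deparse(substitute()), splits with strsplit(), and repeats each row once per piece.

```r
# Hypothetical helper: a base R analogue of tidyr::separate_rows()
split_col <- function(data, col, sep = "/") {
  # Turn the unquoted argument (e.g. x) into the string "x"
  col_name <- deparse(substitute(col))
  # strsplit() needs a character vector; fixed = TRUE treats sep literally
  pieces <- strsplit(as.character(data[[col_name]]), sep, fixed = TRUE)
  # Repeat each row once for every piece it was split into
  out <- data[rep(seq_len(nrow(data)), lengths(pieces)), , drop = FALSE]
  out[[col_name]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

df <- data.frame(x = c("apple/pear", "pear/banana/kiwi", "orange/pear"))
res <- split_col(df, x)
res$x
```

The failing call in the question breaks earlier than this: strsplit(~v, "/") is evaluated before interp() ever runs, so strsplit() receives a formula rather than a character vector.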

Related

How to write custom pipe-friendly functions?

I'm trying to create pipe-friendly functions using magrittr.
For example, I tried to write a custom function to calculate the mean of a column:
library(magrittr)
custom_function <-
function(.data, x) {
mean(.data$x)
}
mtcars %>%
custom_function(mpg)
But I'm getting this error:
Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, :
object 'mpg' not found
Maybe my reference to the variable is not working. How do I fix this .data$x?
.data$x does not refer to the column whose name is held in the variable x; it refers to a column literally called "x". Use .data[[x]] to refer to the column whose name is the character string held in x, and call your function with the character string "mpg".
library(magrittr)
custom_function <- function(.data, x) mean(.data[[x]])
mtcars %>% custom_function("mpg")
## [1] 20.09062
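The difference between $ and [[ ]] can be seen on a toy data frame (d and its values are made up for illustration):

```r
# A small data frame with a column named "mpg" but none named "x"
d <- data.frame(mpg = c(21, 22.8, 18.7))
x <- "mpg"

d$x      # NULL: $ looks for a column literally called "x"
d[[x]]   # c(21, 22.8, 18.7): [[ ]] uses the value stored in x
```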
In base R, we can change $ to [[ and convert the unquoted column name to a character string with deparse(substitute()):
custom_function <- function(.data, x) {
mean(.data[[deparse(substitute(x))]])
}
Now we apply the function:
mtcars %>%
custom_function(mpg)
#[1] 20.09062
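The key step is that substitute(x) returns the unevaluated expression the caller typed, and deparse() turns it into a string. A minimal check, using a hypothetical helper arg_name():

```r
# Hypothetical helper: return the text of whatever was passed as x
arg_name <- function(x) deparse(substitute(x))

arg_name(mpg)    # "mpg"; mpg itself is never evaluated, so no "object not found"

custom_function <- function(.data, x) {
  # Look up the column whose name is the text of the argument
  mean(.data[[deparse(substitute(x))]])
}
custom_function(mtcars, mpg)
```

One caveat of this approach: substitute() only sees the expression written at the call site, so if custom_function() is called from inside another function that passes its own argument along, it captures that argument's name instead of the original column name.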
The issue with $ is that it looks for a column literally named 'x' instead of using the value passed in x. There is no such column, so .data$x returns NULL.
With the tidyverse, we can use the curly-curly operator ({{ }}) to evaluate the argument inside summarise(). Since we need only a single summary value, summarise() is enough; if we instead wanted to add a new column to the original dataset, we would use mutate(). After creating the summary column, pull() extracts it as a vector:
library(dplyr)
custom_function <- function(.data, x) {
.data %>%
summarise(out = mean({{x}})) %>%
pull(out)
}
mtcars %>%
custom_function(mpg)
#[1] 20.09062
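The summarise()/mutate() distinction above is about output shape: one value versus one value per row. A base R sketch of the mutate-style variant, assuming a hypothetical helper add_mean_col() and a new column named mean_<column>:

```r
# Hypothetical helper: add a column holding the mean of x to every row
add_mean_col <- function(.data, x) {
  col <- deparse(substitute(x))                         # unquoted name -> string
  .data[[paste0("mean_", col)]] <- mean(.data[[col]])   # length-1 value is recycled
  .data
}

res <- add_mean_col(mtcars, mpg)
head(res[, c("mpg", "mean_mpg")])
```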

How to use 2 variables in 1 argument in a R function with dplyr

I would like to create a function in R that groups a data frame by 2 variables passed in a single argument of my function.
My example:
library(dplyr)
myfunction <- function(vars=NULL) {
starwars %>%
group_by(!!sym(vars)) %>%
summarise(stat=mean(height,na.rm=T))
}
# It works
myfunction(vars="gender")
# It doesn't work
myfunction(vars=c("gender","sex"))
Many thanks in advance!
For multiple column names, you can use syms() with !!!:
library(dplyr)
library(rlang)
myfunction <- function(vars=NULL) {
starwars %>%
group_by(!!!syms(vars)) %>%
summarise(stat=mean(height, na.rm=TRUE))
}
However, across() combined with all_of() accepts a character vector of column names, so you can do this without any !!/!!! injection:
myfunction <- function(vars=NULL) {
starwars %>%
group_by(across(all_of(vars))) %>%
summarise(stat=mean(height, na.rm=TRUE))
}
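Base R's aggregate() also accepts grouping columns selected by a character vector, because data[vars] is a data frame (a list) of those columns. A sketch with mtcars instead of starwars; group_mean() is a hypothetical name:

```r
# Hypothetical helper: mean of `value` for each combination of the `vars` columns
group_mean <- function(data, vars, value) {
  # data[vars] and data[value] select columns by name, so vars may hold 1 or more names
  aggregate(data[value], by = data[vars], FUN = mean, na.rm = TRUE)
}

group_mean(mtcars, "cyl", "mpg")           # one grouping variable
group_mean(mtcars, c("cyl", "am"), "mpg")  # two grouping variables
```

One behavioural difference: aggregate() omits rows with missing values in any grouping column, whereas group_by() keeps NA as its own group.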

How to pass a filter statement as a function parameter in dplyr using quosure [duplicate]

This question already has answers here:
dplyr/rlang: parse_expr with multiple expressions
Using the dplyr package in R, I want to pass a filter statement as a parameter in a function. I don't know how to evaluate the statement as code instead of a string. When I try the code below, I get an error message. I'm assuming I need a quosure or something, but I don't fully grasp that concept.
data("PlantGrowth")
myfunc <- function(df, filter_statement) {
df %>%
filter(!!filter_statement)
}
myfunc(PlantGrowth, "group %in% c('trt1', 'trt2')")
> Error: Argument 2 filter condition does not evaluate to a logical vector
# Want to do the same as this:
# PlantGrowth %>%
# filter(group %in% c('trt1', 'trt2'))
You can use parse_expr() from rlang:
library(dplyr)
myfunc <- function(df, filter_statement) {
df %>% filter(eval(rlang::parse_expr(filter_statement)))
}
identical(myfunc(PlantGrowth, "group %in% c('trt1', 'trt2')"),
PlantGrowth %>% filter(group %in% c('trt1', 'trt2')))
#[1] TRUE
The same can be done with the infamous eval(parse(text = ...)):
myfunc <- function(df, filter_statement) {
df %>% filter(eval(parse(text = filter_statement)))
}
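The original filter(!!filter_statement) fails because the string is injected as a character constant, not as code. A base R sketch (str2lang() needs R >= 3.6.0; filter_rows() is a hypothetical name) parses the string into a call and evaluates it with the data frame's columns in scope:

```r
filter_statement <- "group %in% c('trt1', 'trt2')"

class(filter_statement)             # "character": just text
class(str2lang(filter_statement))   # "call": parsed code

# Hypothetical helper: columns of df are looked up first, then the caller's environment
filter_rows <- function(df, filter_statement) {
  keep <- eval(str2lang(filter_statement), envir = df, enclos = parent.frame())
  df[keep, , drop = FALSE]
}

res <- filter_rows(PlantGrowth, filter_statement)
table(res$group)
```

As with any eval/parse approach, this runs arbitrary code contained in the string, so it should only be used with trusted input.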

Mutating Columns with paste0

I'm looking to dynamically name columns. I need to duplicate variables with new names. Why isn't the new_sepal_length_2 variable the same as new_sepal_length? How can I fix this?
new_var = 'Sepal.Length'
iris %>% mutate(new_sepal_length = Sepal.Length,
new_sepal_length_2 = noquote(paste0(new_var)))
We can convert the string to a symbol with sym() and inject it with !!:
library(dplyr)
library(stringr)
iris %>%
mutate(new_sepal_length = str_c(!!rlang::sym(new_var), collapse=", "))
Another option is mutate_at(), which accepts strings in vars():
iris %>%
mutate_at(vars(new_var), list(new= ~ str_c(., collapse=", ")))
Or use paste():
iris %>%
mutate(new_sepal_length = paste(!!rlang::sym(new_var), collapse = ", "))
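paste0() or paste() by itself only returns the character string "Sepal.Length" (noquote() merely changes how it prints), so new_sepal_length_2 gets that text rather than the column's values. Converting the string to a symbol is what makes mutate() look up the column; the collapse argument of paste()/str_c() is only needed if you want all values joined into one string.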
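In base R the same duplication needs no symbol conversion, because [[ ]] takes the column name as a string on both sides of the assignment. A sketch on a copy of iris, with the new column name built by paste0() (the "_2" suffix is just an example):

```r
new_var <- "Sepal.Length"
d <- iris

paste0(new_var)   # "Sepal.Length": only a string, not the column

# Copy the column named by new_var under a dynamically built name
d[[paste0(new_var, "_2")]] <- d[[new_var]]
head(d[, c("Sepal.Length", "Sepal.Length_2")])
```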

Magrittr + lapply where the first argument isn't the LHS [duplicate]

This question already has answers here:
Use pipe without feeding first argument
I'd like to pass a data frame into lapply via %>%, but I need to be able to access the names of the columns, so my lapply arguments are something like this:
mydf %>%
lapply( 1:length(.), function(x) {
manipulate_df( mydf[x], using_column_names(names(mydf)[x] ) )
})
However, when I try that, I get the following error:
Error in match.fun(FUN) :
'1:length(.)' is not a function, character or symbol
As far as I can tell R and lapply don't like 1:length(.). I suppose a valid option is breaking the chain, but I'd like to learn how to do it properly.
Your issue here is that %>% inserts mydf as the first argument of lapply, so three arguments are passed and 1:length(.) ends up in the FUN position, which is why match.fun() complains. Wrapping the entire lapply expression in braces prevents this insertion behavior:
mydf %>%
{ lapply( 1:length(.), function(x) {
manipulate_df( mydf[x], using_column_names(names(mydf)[x] ) )
}) }
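The error can be reproduced without the pipe: passing the data frame explicitly is what %>% does, so 1:length(d) lands in lapply()'s FUN slot. A sketch with a made-up two-column data frame, followed by a call that iterates over indices instead (the paste() summary stands in for the question's placeholder functions):

```r
d <- data.frame(a = 1:3, b = 4:6)

# Same as d %>% lapply(1:length(.), ...): d becomes X, 1:length(d) becomes FUN
msg <- tryCatch(lapply(d, 1:length(d), function(i) i),
                error = conditionMessage)
msg   # "'1:length(d)' is not a function, character or symbol"

# Iterating over column indices gives access to both the column and its name
res <- lapply(seq_along(d), function(i) paste(names(d)[i], sum(d[[i]])))
```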
I think the prettiest fix would be to make a new function:
manipulate_whole_df = function(mydf)
lapply( 1:length(mydf), function(x)
manipulate_df( mydf[x], using_column_names(names(mydf)[x] ) ) )
mydf %>%
manipulate_whole_df
Or even
library(dplyr)
library(tidyr)
mydf %>%
gather(variable, value) %>%
group_by(variable) %>%
do(manipulate_df(.$value,
.$variable %>% first %>% using_column_names ) )
The function in lapply() only uses the column indexes/names and refers to mtcars in a way that does not depend on the iteration, so pipe the names instead:
names(mtcars) %>% lapply(function(x) mtcars[x])
or pass the data frame to the function as an extra argument of lapply():
names(mtcars) %>% lapply(function(x, df) df[x], df=mtcars)
or perhaps you don't really need to access the names but just the columns?
mtcars %>% lapply(function(x) sqrt(sum(x)))
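If the goal is just to have each column and its name together, base R's Map() can walk both in parallel and avoids index arithmetic. The paste() summary is a made-up stand-in for the question's placeholder functions:

```r
d <- data.frame(a = 1:3, b = 4:6)

# Map() calls the function with the i-th column and the i-th name together;
# the result is a list named after the columns of d
res <- Map(function(col, nm) paste(nm, sum(col)), d, names(d))
```

With magrittr, d %>% Map(f, ., names(.)) should behave the same (f being your function), since a dot used as a direct argument stops %>% from also inserting the LHS as the first argument.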
I think what you want is the following:
mydf %>% length %>% seq %>%
lapply(function(x) {
manipulate_df( mydf[x], using_column_names(names(mydf)[x] ) )
})
or you can use a lambda function:
mydf %>% {1:length(.)} %>%
lapply(function(x) {
manipulate_df( mydf[x], using_column_names(names(mydf)[x] ) )
})
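Both versions above build indices from the length, via seq(length(.)) or 1:length(.). One caveat: for a data frame with zero columns, both yield c(1, 0) rather than an empty sequence, whereas seq_along() yields integer(0). A quick check on made-up data:

```r
empty <- data.frame(a = 1:3)[0]   # a data frame with 3 rows and 0 columns

1:length(empty)      # c(1L, 0L): two bogus indices
seq(length(empty))   # c(1L, 0L): same problem
seq_along(empty)     # integer(0)

# With seq_along(), lapply() simply returns an empty list
res <- lapply(seq_along(empty), function(i) names(empty)[i])
```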
