Replacement of the dot function from plyr - r

How can I transform a vector of groups specified using the plyr function ., such as .(group, sex), into a vector of characters like c("group", "sex")?
We used the plyr approach to specify the groups in an older version of our R package. In the new version we want the user to specify the groups using a vector of strings, but we do not want to break previous code that used the dot approach.
Example of the old function:
library(plyr)
my_function_old <- function(df, grouping) {
ddply(df, grouping, summarize,
m = mean(mpg))
}
my_function_old(mtcars, .(cyl, vs))
Example of the new function:
library(dplyr)
my_function_new <- function(df, grouping) {
df %>%
group_by(!!!syms(grouping)) %>%
summarise(m = mean(mpg))
}
my_function_new(mtcars, c("cyl", "vs"))
In the new function the grouping should be specified using a vector of strings. I would like to check whether the user is using the old dot notation in the new function and in that case to transform the grouping variables specified with the dot to a vector of strings.

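Using enexpr, you can capture the unevaluated argument and drop the first element of the resulting call (the function name, c or .):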
library(dplyr)
my_function <- function(df, grouping) {
grouping <- as.character(enexpr(grouping))[-1]
df %>%
group_by(!!!syms(grouping)) %>%
summarise(m = mean(mpg))
}
my_function(mtcars, c("cyl", "vs")) # this works
my_function(mtcars, .(cyl, vs)) # this also works
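One caveat with the approach above: it relies on the argument being typed literally as c(...) or .(...). If the caller passes a variable that holds the strings, as.character(enexpr(grouping))[-1] returns an empty vector and the data is silently left ungrouped. A minimal sketch of a more defensive variant follows; the helper name as_group_names is hypothetical and not part of plyr or dplyr, and only the bare .(...) form is recognised (not plyr::.(...)).

```r
library(dplyr)

# Hypothetical helper (not part of plyr or dplyr): turn the captured
# argument into a character vector of column names.
as_group_names <- function(expr, env) {
  # Old style: an unevaluated call to `.`, e.g. .(cyl, vs)
  if (is.call(expr) && identical(expr[[1]], as.name("."))) {
    return(vapply(as.list(expr)[-1], deparse, character(1)))
  }
  # New style: anything that evaluates to a character vector,
  # e.g. c("cyl", "vs") or a variable holding such a vector
  eval(expr, env)
}

my_function <- function(df, grouping) {
  grouping <- as_group_names(substitute(grouping), parent.frame())
  df %>%
    group_by(!!!syms(grouping)) %>%
    summarise(m = mean(mpg))
}

g <- c("cyl", "vs")
res_var    <- my_function(mtcars, g)               # variable holding strings
res_string <- my_function(mtcars, c("cyl", "vs"))  # new string notation
res_dot    <- my_function(mtcars, .(cyl, vs))      # old dot notation
```

Because the .(...) call is never evaluated, this does not require plyr to be loaded.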

Related

Passing argument into function for group_by in dplyr [duplicate]

I am trying to use group_by within a function call in dplyr (R) and I am getting unexpected results. Here is an example of what I am trying to do:
df = data.frame(a = c(0,0,1,1), b = c(0,1,0,1), c = c(1,2,3,4))
result1 = df %>%
group_by(a,b) %>%
mutate(d = sum(c))
result1$d
myFunc <- function(df, var) {
output = df %>%
group_by(a,!!var) %>%
mutate(d = sum(c))
return(output)
}
result2 = myFunc(df,"b")
result2$d
result1$d yields [1,2,3,4] which is what I expected. result2$d yields [3,3,7,7] which I do not want, and I am not sure what is going on.
It works to have b (without quotes) as the function argument, and {{var}} in place of !!var. Unfortunately, in my case, my column names are in string format (but maybe there is a way to transform the string beforehand so that it will work with the {{}} notation?)
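In your version, !!var inserts the string "b" itself, so group_by() groups by a and a constant, which is the same as grouping by a alone. If you want to pass a character object that refers to a column of a data frame, you should use !!sym(var):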
myFunc <- function(df, var) {
output = df %>%
group_by(a, !!sym(var)) %>%
mutate(d = sum(c))
return(output)
}
myFunc(df, "b")
If you want to pass a data-masked argument, you should use {{ var }} or equivalently !!enquo(var):
myFunc <- function(df, var) {
output = df %>%
group_by(a, {{ var }}) %>%
mutate(d = sum(c))
return(output)
}
myFunc(df, b)
Note that I pass "b" and b respectively into the function in the two different cases.
If we want to use quoting and unquoting instead of curly-curly {{}}, then we should consider this basic procedure: https://tidyeval.tidyverse.org/dplyr.html
Creating a function around dplyr pipelines involves three steps: abstraction, quoting, and unquoting.
1. Abstraction step:
Here we identify the varying parts. In our case, that is var in group_by().
2. Quoting step:
Identify all the arguments where the user is allowed to refer to data frame columns directly.
The function can’t evaluate these arguments right away.
Instead they should be automatically quoted. Apply enquo() to these arguments.
3. Unquoting step:
Identify where these variables are passed to other quoting functions and unquote with !!.
In this case we pass var to group_by():
myFunc <- function(df, var) {
var <- enquo(var)
output = df %>%
group_by(a,!!var) %>%
mutate(d = sum(c))
return(output)
}
result2 = myFunc(df,b)
Output of result2$d:
[1] 1 2 3 4
Just as I post a question, I come across something that works...
myFunc <- function(df, var) {
output = df %>%
group_by_at(.vars = c("a",var)) %>%
mutate(d = sum(c))
return(output)
}
result2 = myFunc(df,"b")
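The scoped group_by_at() verb has since been superseded in dplyr. A sketch of the same function using across() with all_of(), assuming dplyr 1.0.0 or later:

```r
library(dplyr)

df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1), c = c(1, 2, 3, 4))

myFunc <- function(df, var) {
  df %>%
    # all_of() selects the columns named in the character vector
    group_by(across(all_of(c("a", var)))) %>%
    mutate(d = sum(c)) %>%
    ungroup()
}

result2 <- myFunc(df, "b")
result2$d
```

Here all_of() also raises an error if one of the strings does not match a column, rather than grouping by something unintended.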

Calculate mode for each column in dataframe using lapply dplyr

I'm trying to create a function that gets me the mode, or mode-X (the 2nd to Xth most common value), and the associated counts for each column in a data frame.
I can't figure out what I'm missing and I'm looking for some assistance. I believe it has to do with passing a variable into a dplyr function.
library(tidyverse)
myfunct_get_mode = function(x, rank=1){
mytable = dplyr::count(rlang::sym(x), sort = TRUE)
names(mytable)= c('variable','counts')
# return just the rank specified...such as mode or mode -1, etc
result = table %>% dplyr::slice(rlang::sym(rank))
return(result)
}
mtcars %>% lapply(. %>% (function(x) myfunct_get_mode(x, rank=2)))
There are some problems with your function:
Your function call is not doing what you think. Run mtcars %>% lapply(. %>% (function(x) print(x))) to see that x is an entire column of mtcars, not its name. To get the column names, apply the function to names(mtcars); you then also have to pass the data frame you are working on.
To use the symbol returned by rlang::sym(x) inside a dplyr verb, you need to unquote it with !!.
rank is a number, not a variable name, so there is no need for rlang::sym() here.
table should be mytable in the second-to-last line of your function.
So how could it work (although there are probably better ways):
myfunct_get_mode = function(df, x, rank=1){
mytable = count(df, !!rlang::sym(x), sort = TRUE)
names(mytable)= c('variable','counts')
# return just the rank specified...such as mode or mode -1, etc
result = mytable %>% slice(rank)
return(result)
}
names(mtcars) %>% lapply(function(x) myfunct_get_mode(mtcars, x, rank=2))
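If we need this in a list, we can use imap from purrr. Note that this version returns the top rank values for each column rather than only the rank-th one: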
f1 <- function(dat, rank = 1) {
purrr::imap(dat, ~
dat %>%
count(!! rlang::sym(.y)) %>%
rename_all(~ c('variable', 'counts')) %>%
arrange(desc(counts)) %>%
slice(seq_len(rank))) #%>%
#bind_cols - convert to a data.frame
}
f1(mtcars, 2)
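The commented-out hint above suggests converting the list to a data frame. One possible way to do that, sketched here with bind_rows() and its .id argument (the names modes and column are only illustrative), is:

```r
library(dplyr)
library(purrr)

f1 <- function(dat, rank = 1) {
  imap(dat, function(col, name) {
    dat %>%
      count(!!rlang::sym(name), sort = TRUE) %>%   # most frequent value first
      stats::setNames(c("variable", "counts")) %>%
      slice(seq_len(rank))                         # keep the top `rank` rows
  })
}

# Stack the per-column results; `column` records which column each row came from
modes <- bind_rows(f1(mtcars, 2), .id = "column")
head(modes)
```

This works for mtcars because every column is numeric; with mixed column types the variable column would need a common type first.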

Replacing group_by_ with group_by when the argument is a string in dplyr

I have some code that specifies a grouping variable as a string.
group_var <- "cyl"
My current code for using this grouping variable in a dplyr pipeline is:
mtcars %>%
group_by_(group_var) %>%
summarize(mean_mpg = mean(mpg))
My best guess as to how to replace the deprecated group_by_ function with group_by is:
mtcars %>%
group_by(!!as.name(group_var)) %>%
summarize(mean_mpg = mean(mpg))
This works but is not explicitly mentioned in the programming with dplyr vignette.
Is using !!as.name() the preferred way to replace group_by_() with group_by()?
Is this within a function? If not, I think the !!as.name() part is unnecessary and I would stick with the group_by_at(group_var) suggestion by @aosmith for simplicity's sake. Otherwise, I would set it up like so:
examplr <- function(data, group_var){
group_var <- as.name(group_var)
data %>%
group_by(!!group_var) %>%
summarize(mean_mpg = mean(mpg))
}
examplr(data = mtcars,
group_var = "cyl")
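Another option, assuming a dplyr version that provides the .data pronoun, is to subset .data with the string, which avoids the explicit conversion to a symbol. A minimal sketch of the same function written that way:

```r
library(dplyr)

examplr <- function(data, group_var) {
  data %>%
    # .data[[...]] looks the column up by its name given as a string
    group_by(.data[[group_var]]) %>%
    summarize(mean_mpg = mean(mpg))
}

res <- examplr(data = mtcars, group_var = "cyl")
res
```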

How to programmatically group a data_frame by each column name specified in a vector? [duplicate]

I'm writing a function where the user is asked to define one or more grouping variables in the function call. The data is then grouped using dplyr and it works as expected if there is only one grouping variable, but I haven't figured out how to do it with multiple grouping variables.
Example:
x <- c("cyl")
y <- c("cyl", "gear")
dots <- list(~cyl, ~gear)
library(dplyr)
library(lazyeval)
mtcars %>% group_by_(x) # groups by cyl
mtcars %>% group_by_(y) # groups only by cyl (not gear)
mtcars %>% group_by_(.dots = dots) # groups by cyl and gear, this is what I want.
I tried to turn y into the same as dots using:
mtcars %>% group_by_(.dots = interp(~var, var = list(y)))
#Error: is.call(expr) || is.name(expr) || is.atomic(expr) is not TRUE
How can I use a user-defined vector of more than one variable name (like y in the example) to group the data using dplyr?
(This question is somewhat related to this one but not answered there.)
No need for interp here, just use as.formula to convert the strings to formulas:
dots = sapply(y, . %>% {as.formula(paste0('~', .))})
mtcars %>% group_by_(.dots = dots)
The reason why your interp approach doesn’t work is that the expression gives you back the following:
~list(c("cyl", "gear"))
This is not what you want. You could, of course, sapply interp over y, which would be similar to using as.formula above:
dots1 = sapply(y, . %>% {interp(~var, var = .)})
But, in fact, you can also directly pass y:
mtcars %>% group_by_(.dots = y)
The dplyr vignette on non-standard evaluation goes into more detail and explains the difference between these approaches.
slice_rows() from the purrrlyr package (https://github.com/hadley/purrrlyr) groups a data.frame by taking a vector of column names (strings) or positions (integers):
y <- c("cyl", "gear")
mtcars_grp <- mtcars %>% purrrlyr::slice_rows(y)
class(mtcars_grp)
#> [1] "grouped_df" "tbl_df" "tbl" "data.frame"
group_vars(mtcars_grp)
#> [1] "cyl" "gear"
This is particularly useful now that group_by_() has been deprecated.
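Since group_by_() and lazyeval are deprecated, the same grouping by a character vector can also be written with current dplyr alone. A sketch of two equivalent spellings, assuming dplyr 1.0.0 or later for across():

```r
library(dplyr)

y <- c("cyl", "gear")

# Convert the strings to symbols and splice them into group_by()
grp_syms <- mtcars %>% group_by(!!!syms(y))

# Or select the grouping columns by name with tidyselect
grp_across <- mtcars %>% group_by(across(all_of(y)))

group_vars(grp_syms)
group_vars(grp_across)
```

Both calls group by cyl and gear, matching the group_by_(.dots = dots) result from the question.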

