Function writing passing column reference to group_by - r

I'm very new to R. I'm trying to define a function that groups a data set (group_by) and then computes summary statistics for each group (dplyr, summarise_each).
Without defining a function the following works:
sum_stat <- data %>%
  group_by(column) %>%
  summarise_each(funs(mean), var1:var20)
The following function form does not work:
sum_stat <- function(data, column){
  data %>%
    group_by(column) %>%
    summarise_each(funs(mean), var1:var20)
}
sum_stat(data, column)
The error message returned is:
Error: unknown column 'column'

This is the usual way you'd do this, passing the column name as a string:
foo <- function(data, column){
  data %>%
    group_by_(.dots = column) %>%
    summarise_each(funs(mean))
}
foo(mtcars,"cyl")
foo(mtcars,"gear")
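In dplyr 1.0.0 and later, group_by_() and summarise_each() are deprecated. A minimal sketch of the same idea with the embrace operator, assuming the column is passed as a bare name (mean_by is a hypothetical helper name, not part of the answer above):

```r
library(dplyr)

# Hypothetical helper: group by a bare column name and average every other column.
# Assumes dplyr >= 1.0.0 (for across()) and rlang >= 0.4.0 (for {{ }}).
mean_by <- function(data, column) {
  data %>%
    group_by({{ column }}) %>%
    summarise(across(everything(), mean), .groups = "drop")
}

res <- mean_by(mtcars, cyl)
res
```

Inside summarise(), everything() skips the grouping column, so only the remaining columns are averaged.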

Related

Changing factors order inside a function [duplicate]

I have been reading from this SO post on how to work with string references to variables in dplyr.
I would like to mutate an existing column based on string input:
var <- 'vs'
my_mtcars <- mtcars %>%
mutate(get(var) = factor(get(var)))
Error: unexpected '=' in:
"my_mtcars <- mtcars %>%
mutate(get(var) ="
Also tried:
my_mtcars <- mtcars %>%
mutate(!! rlang::sym(var) = factor(!! rlang::symget(var)))
This resulted in the exact same error message.
How can I do the following based on passing string 'vs' within var variable to mutate?
# works
my_mtcars <- mtcars %>%
mutate(vs = factor(vs))
This can be done with the := operator: unquote the string with !! on the left-hand side to set the column name, and on the right-hand side convert the string to a symbol with rlang::sym() and unquote it with !!:
library(dplyr)
my_mtcars <- mtcars %>%
mutate(!! var := factor(!! rlang::sym(var)))
class(my_mtcars$vs)
#[1] "factor"
Or, without thinking too much, use mutate_at, which accepts column names as strings inside vars() and applies the function of interest:
my_mtcars2 <- mtcars %>%
mutate_at(vars(var), factor)
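In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A minimal sketch of the same conversion, assuming dplyr >= 1.0.0 (my_mtcars3 is just an illustrative name):

```r
library(dplyr)

var <- "vs"

# all_of() tells tidyselect that `var` holds column names as strings.
my_mtcars3 <- mtcars %>%
  mutate(across(all_of(var), factor))

class(my_mtcars3$vs)
```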

How do I add checks to a function created using tidy eval framework?

Say I have created a function using the tidy eval framework -
library(tidyverse)
library(rlang)
my_function <- function(data, var){
  var_expr <- enquo(var)
  data %>%
    group_by(!!var_expr) %>%
    summarise(count = n()) %>%
    ungroup()
}
When I run the following function, I get the result below it
my_function(mtcars, cyl)
# A tibble: 3 x 2
    cyl count
  <dbl> <int>
1     4    11
2     6     7
3     8    14
How do I add the following checks to this function?
Check whether data is a data frame. If not, return the error "data should be a dataframe".
Check whether var is missing. If so, return the error "var is missing".
You can make the following modifications.
To check whether the input data has a particular class, inspect its class attribute. A data frame and a tibble both include data.frame in their class attribute.
The missing() function is normally used inside a function to check whether an argument was supplied, usually so that a default value can be generated when it was not (for example, the source of base::sample shows how it assigns a value to size when that argument is missing). In your case we can instead terminate the function with an error.
You can also use base::stop in place of rlang::abort, as pointed out by @akrun.
library(rlang)
my_function <- function(data, var){
  if(!"data.frame" %in% attr(data, "class")) {
    abort("data should be a data frame")
  }
  if(missing(var)) {
    abort("var is missing")
  }
  var_expr <- enquo(var)
  data %>%
    group_by(!!var_expr) %>%
    summarise(count = n()) %>%
    ungroup()
}
Special thanks to @27 ϕ 9 for bringing this valuable point to my attention: the error messages can also be customized in stopifnot, which is another way of checking input arguments (named conditions in stopifnot require R 4.0.0 or later):
my_function <- function(data, var){
  stopifnot("The input data is not of class data frame" = "data.frame" %in% attr(data, "class"),
            "var is missing" = !missing(var))
  var_expr <- enquo(var)
  data %>%
    group_by(!!var_expr) %>%
    summarise(count = n()) %>%
    ungroup()
}
Special thanks to @IceCreamToucan for presenting yet another option: using inherits() in lieu of attr(). If the class of the input data does not include data.frame, inherits() returns FALSE:
my_function <- function(data, var){
  if(!inherits(data, "data.frame")) {
    stop("data is not of class data.frame")
  }
  if(missing(var)) {
    stop("var is missing")
  }
  var_expr <- enquo(var)
  data %>%
    group_by(!!var_expr) %>%
    summarise(count = n()) %>%
    ungroup()
}
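To confirm that both checks fire, the errors can be captured with tryCatch. This is a minimal sketch that repeats the inherits() version of the function so it runs on its own; msg_data and msg_var are illustrative names:

```r
library(dplyr)

my_function <- function(data, var) {
  if (!inherits(data, "data.frame")) {
    stop("data is not of class data.frame")
  }
  if (missing(var)) {
    stop("var is missing")
  }
  var_expr <- enquo(var)
  data %>%
    group_by(!!var_expr) %>%
    summarise(count = n()) %>%
    ungroup()
}

# Capture the error message instead of halting the script.
msg_data <- tryCatch(my_function(1:10, cyl), error = function(e) conditionMessage(e))
msg_var  <- tryCatch(my_function(mtcars), error = function(e) conditionMessage(e))

msg_data
msg_var
```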

How to write a function that includes pipes using functions from the srvyr package?

I have survey data that I am trying to group by year and then calculate totals for certain variables. I need to do this about 20 times with different variables, so I am writing a function, but I can't get it to work properly even though the same code works fine outside the function.
This works fine:
mepsdsgn %>% group_by(YEAR) %>% summarise(tot_pri = survey_total(TOTPRV)) %>% select(YEAR, tot_pri)
When I try a function:
total_calc <- function(x) {
  mepsdsgn %>% group_by(YEAR) %>% summarise(total = survey_total(x)) %>% select(YEAR, total)
}
total_calc(TOTPRV)
I get this error: Error in stop_for_factor(x) : object 'TOTPRV' not found
Figured it out (the column name is now passed as a string):
total_fun <- function(x) {
  col = x
  mepsdsgn %>% group_by(YEAR) %>% summarise(total = survey_total(!!sym(col), na.rm = TRUE)) %>% select(YEAR, total)
}
There are a couple of things I'd suggest doing; see below.
# first try to make a working minimal example people can run in a new R session
library(magrittr)
library(dplyr)
dt <- data.frame(y=1:10, x=rep(letters[1:2], each=5))
# simple group and mean using the column names explicitly
dt %>% group_by(x) %>% summarise(mean(y))
# a bit of googling showed me you need to use group_by_at(vars("x")) to replicate
# using a string input
# in this function, add all arguments, so the data you use - dt & the column name - column.x
foo <- function(dt, column.x){
  dt %>% group_by_at(vars(column.x)) %>% summarise(mean(y))
}
# when running a function, you need to supply the name of the column as a string, e.g. "x" not x
foo(dt, column.x="x")
I don't use dplyr, so there may be a better way
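One alternative for string input is the .data pronoun, which looks a column up by name inside the data frame. A minimal sketch, assuming a reasonably recent dplyr (foo2 and mean_y are hypothetical names):

```r
library(dplyr)

dt <- data.frame(y = 1:10, x = rep(letters[1:2], each = 5))

# Hypothetical variant of foo(): .data[[column.x]] fetches the column
# whose name is stored in the string column.x.
foo2 <- function(dt, column.x) {
  dt %>%
    group_by(.data[[column.x]]) %>%
    summarise(mean_y = mean(y))
}

res <- foo2(dt, column.x = "x")
res
```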

Calculate mode for each column in dataframe using lapply dplyr

I'm trying to create a function that gets me the mode, or the "mode minus X" (the 2nd to Xth most common value), and the associated counts for each column in a data frame.
I can't figure out what I may be missing and I'm looking for some assistance? I believe it has to do with the passing in of a variable into dplyr function.
library(tidyverse)
myfunct_get_mode = function(x, rank=1){
  mytable = dplyr::count(rlang::sym(x), sort = TRUE)
  names(mytable)= c('variable','counts')
  # return just the rank specified...such as mode or mode -1, etc
  result = table %>% dplyr::slice(rlang::sym(rank))
  return(result)
}
mtcars %>% lapply(. %>% (function(x) myfunct_get_mode(x, rank=2)))
There are some problems with your function:
Your function call is not doing what you think. Run mtcars %>% lapply(. %>% (function(x) print(x))) to see that x is actually a whole column of mtcars, not its name. To get the column names, apply the function to names(mtcars). You then also have to pass in the data frame you're working on.
To evaluate the symbol returned by rlang::sym(x) inside a dplyr verb, put !! in front of it.
rank is not a variable name, so there is no need for rlang::sym there.
table should be mytable in the second-to-last line of your function.
Here is how it could work (although there are probably better ways):
myfunct_get_mode = function(df, x, rank=1){
  mytable = count(df, !!rlang::sym(x), sort = TRUE)
  names(mytable)= c('variable','counts')
  # return just the rank specified...such as mode or mode -1, etc
  result = mytable %>% slice(rank)
  return(result)
}
names(mtcars) %>% lapply(function(x) myfunct_get_mode(mtcars, x, rank=2))
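If we need this in a list, we can use purrr::imap, which passes each column name as .y. Note that slice(seq_len(rank)) keeps the top rank rows rather than only the row at that rank:
f1 <- function(dat, rank = 1) {
  purrr::imap(dat, ~
    dat %>%
      count(!! rlang::sym(.y)) %>%
      rename_all(~ c('variable', 'counts')) %>%
      arrange(desc(counts)) %>%
      slice(seq_len(rank))) #%>%
  #bind_cols - convert to a data.frame
}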
f1(mtcars, 2)
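For comparison, the same ranking can be done in base R by working on the column values directly, which avoids the symbol handling altogether. A minimal sketch, assuming a hypothetical helper nth_mode; when several values share the same count, which one lands at a given rank depends on the sort order:

```r
# Hypothetical helper: the value with the rank-th highest count in a vector.
nth_mode <- function(x, rank = 1) {
  counts <- sort(table(x), decreasing = TRUE)
  data.frame(
    variable = names(counts)[rank],
    counts = as.integer(counts[rank])
  )
}

# One small data frame per column of mtcars.
modes <- lapply(mtcars, nth_mode, rank = 2)
modes$cyl
```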

dplyr: How to use select and filter inside functions; (...) not working for arguments

I'm trying to build some functions for creating standard tables from a questionnaire, using dplyr for the data manipulation. This question was very helpful for the group_by function, passing arguments (in this case, the name of the variable I want to use to make the table) to (...), but that seems to break down when I try to pass the same arguments to other dplyr commands, specifically select and filter. The error message I get is: '...' used in an incorrect context.
Does anyone have any ideas on this? Thank you
For the sake of completeness (and any other hints - I'm very new to writing functions), here is the code I would like to use:
myTable <- function(x, ...) {
  df <-
    x %>%
    group_by(Var1, ...) %>%
    filter(!is.na(...) & ... != '') %>% # To remove missing values: Not working!
    summarise(value = n()) %>%
    group_by(Var1) %>%
    mutate(Tot = sum(value)) %>%
    group_by(Var1, ...) %>%
    summarise(num = sum(value), total = sum(Tot), proportion = num/total*100) %>%
    select(Var1, ..., proportion) # To select desired columns: Not working!
  tab <- dcast(df, Var1 ~ ..., value.var = 'proportion')
  tab[is.na(tab)] <- 0
  print(tab)
}
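One possible direction, sketched under assumptions rather than offered as a drop-in fix: if the function takes a single question column as a named argument instead of (...), the embrace operator works inside filter, count and select alike. Here prop_table, group, var and the toy survey data are all hypothetical, and the dcast reshaping step is left out:

```r
library(dplyr)

# Hypothetical simplified version: one grouping column and one question
# column, both passed as bare names. Assumes rlang >= 0.4.0 for {{ }}.
prop_table <- function(x, group, var) {
  x %>%
    filter(!is.na({{ var }}), {{ var }} != "") %>%  # drop missing and empty answers
    count({{ group }}, {{ var }}, name = "num") %>%
    group_by({{ group }}) %>%
    mutate(total = sum(num), proportion = num / total * 100) %>%
    ungroup() %>%
    select({{ group }}, {{ var }}, proportion)
}

# Toy data, invented for illustration only.
survey <- data.frame(
  Var1 = c("a", "a", "a", "b", "b"),
  Q1 = c("yes", "no", NA, "yes", ""),
  stringsAsFactors = FALSE
)

res <- prop_table(survey, Var1, Q1)
res
```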

Resources