Adapting a function to the tidyverse ecosystem - r

The function box_m from library(rstatix) currently requires that its first argument NOT include the grouping variable, and that its second argument be only the grouping variable. For example: box_m(d[-1], d$Group).
I'm trying to rewrite this function so that box_m2 would work like: box_m2(d, Group).
I have tried the following without success; is there a way to achieve my goal?
library(rstatix)
library(tidyverse)
d <- read.csv("https://raw.githubusercontent.com/rnorouzian/v/main/memory.csv")[-1]
box_m(d[-1], d$Group) # How the function currently works
# box_m2(d, Group) # How I would like the function to work
# My trial without success to achieve `box_m2`:
box_m2 <- function(data, group){
  dat <- dplyr::select(data, -vars(group))
  box_m(dat, one_of(group))
}
# New function
box_m2(d, Group)

You can write the function with the help of curly-curly ({{}}) operator.
library(rstatix)
library(dplyr)
box_m2 <- function(data, group){
  dat <- dplyr::select(data, -{{group}})
  box_m(dat, data %>% pull({{group}}))
}
identical(box_m(d[-1], d$Group), box_m2(d, Group))
#[1] TRUE

We could also convert to a symbol with ensym and evaluate with !!. That way, the function is flexible enough to accept either unquoted or quoted arguments.
library(rstatix)
library(dplyr)
box_m2 <- function(data, group){
  group <- ensym(group)
  dat <- dplyr::select(data, -!!group)
  box_m(dat, data %>% pull(!!group))
}
Testing:
identical(box_m(d[-1], d$Group), box_m2(d, Group))
#[1] TRUE
identical(box_m(d[-1], d$Group), box_m2(d, "Group"))
#[1] TRUE

Related

Automatize as.factor function on some variables via a for loop

I was struggling with iterating the as.factor() function over the following list of variables.
d$block <- as.factor(d$block)
d$CR <- as.factor(d$CR)
d$T1.ACC <- as.factor(d$T1.ACC)
d$T1.correct <- as.factor(d$T1.correct)
d$T1.response <- as.factor(d$T1.response)
I was wondering how to write a for loop that applies the as.factor function to such a list. I actually used the following code:
d %>%
  dplyr::select(block, CR, T1.ACC, T1.correct, T1.response) %>%
  lapply(., as.factor) %>%
  lapply(., is.factor)
But when I check the variables individually, the conversion doesn't seem to have worked.
> is.factor(d$block)
[1] FALSE
What could the problem be? How could I write a good for loop for this? Thanks for replying.
You could use dplyr's across function:
library(dplyr)
d <- d %>%
  mutate(across(c(block, CR, T1.ACC, T1.correct, T1.response), as.factor))
If you really want to use a for-loop, you could use
for (i in c("block", "CR", "T1.ACC", "T1.correct", "T1.response")){
  d[, i] <- as.factor(d[, i])
}
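As a side note on why the original attempt appeared to fail: lapply() returns a new list and never modifies d in place, so the converted columns were discarded. A minimal base-R sketch (my own illustration, not taken from the answers above) that assigns the result back:
cols <- c("block", "CR", "T1.ACC", "T1.correct", "T1.response")
# lapply() returns a list of converted columns; assign them back into d
d[cols] <- lapply(d[cols], as.factor)
# check that the conversion stuck
sapply(d[cols], is.factor)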

Replacement of the dot function from plyr

How can I transform a set of groups specified using the plyr dot function, such as .(group, sex), into a vector of characters like c("group", "sex")?
We used the plyr approach to specify the groups in an older version of our R package. In the new version we want the user to specify the groups using a vector of strings, but we do not want to break previous code that used the dot approach.
Example of the old function:
library(plyr)
my_function_old <- function(df, grouping) {
  ddply(df, grouping, summarize,
        m = mean(mpg))
}
my_function_old(mtcars, .(cyl, vs))
Example of the new function:
library(dplyr)
my_function_new <- function(df, grouping) {
  df %>%
    group_by(!!!syms(grouping)) %>%
    summarise(m = mean(mpg))
}
my_function_new(mtcars, c("cyl", "vs"))
In the new function the grouping should be specified using a vector of strings. I would like to check whether the user is using the old dot notation in the new function and in that case to transform the grouping variables specified with the dot to a vector of strings.
Using enexpr
library(dplyr)
my_function <- function(df, grouping) {
  grouping <- as.character(enexpr(grouping))[-1]
  df %>%
    group_by(!!!syms(grouping)) %>%
    summarise(m = mean(mpg))
}
my_function(mtcars, c("cyl", "vs")) # this works
my_function(mtcars, .(cyl, vs)) # this also works
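If you prefer to detect the old dot notation explicitly, as described in the question, rather than rely on dropping the first element of the captured call, something along these lines should also work. This is only a sketch: my_function2, the is.call() test, and the use of rlang::as_string() are my own additions, not part of the answer above.
library(dplyr)
library(rlang)
my_function2 <- function(df, grouping) {
  expr <- enexpr(grouping)
  if (is.call(expr) && identical(expr[[1]], as.name("."))) {
    # old plyr-style notation: convert .(cyl, vs) to c("cyl", "vs")
    grouping <- vapply(as.list(expr)[-1], as_string, character(1))
  }
  df %>%
    group_by(!!!syms(grouping)) %>%
    summarise(m = mean(mpg))
}
my_function2(mtcars, c("cyl", "vs")) # new string interface
my_function2(mtcars, .(cyl, vs)) # old dot interface still accepted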

Pass variables by name into a function that calls dplyr?

I'm trying to create a function that will take 2 variables from a dataset and map their distinct values side by side, after which it will write the output to a csv file. I'll be using dplyr's distinct function to get the unique values.
map_table <- function(df, var1, var2){
  df_distinct <- df %>% distinct(var1, var2)
  write.csv(df_distinct, 'var1.csv')
}
map_table(iris, Species, Petal.Width)
1) map_table(iris, Species, Petal.Width) doesn't produce what I want. It should produce 27 rows of data; instead I'm getting 150 rows.
2) How can I name the csv file after the input of var1?
So if var1 = 'Sepal.Length', the name of the file should be 'Sepal.Length.csv'
If you want to pass the column names without quotes, you need to use non-standard evaluation.
deparse(substitute()) will get you the name for the file output.
library(dplyr)
map_table <- function(df, var1, var2){
  file_name <- paste0(deparse(substitute(var1)), ".csv") # file name
  var1 <- enquo(var1) # non-standard eval
  var2 <- enquo(var2) # enquo() captures the expression passed, i.e. Species
  df_distinct <- df %>%
    distinct(!!var1, !!var2) # non-standard eval, !! tells dplyr to use Species
  write.csv(df_distinct, file = file_name)
}
map_table(iris, Species, Petal.Width)
You're trying to pass the columns as objects. Try passing their names instead and then use a select helper:
map_table <- function(df, var1, var2){
  df_distinct <- df %>%
    select(one_of(c(var1, var2))) %>%
    distinct()
  write.csv(df_distinct, 'var1.csv')
}
map_table(iris, 'Species', 'Petal.Width')
1) OK, the answer is to use distinct_ instead of distinct, and the variables being passed need to be quoted.
2) Use paste0() to build the file name and pass it to the file = argument:
map_table <- function(df, var1, var2){
  df_distinct <- df %>% distinct_(var1, var2)
  write.csv(df_distinct, file = paste0(var1, '.csv'))
}
map_table(iris, 'Species', 'Petal.Width')
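Note that distinct_() and the other underscore verbs are deprecated in current dplyr, and one_of() has been superseded by all_of(). A sketch of the same quoted-name interface in current dplyr (map_table2 is my own name for this variant, not from the answers above):
library(dplyr)
map_table2 <- function(df, var1, var2){
  df_distinct <- df %>%
    select(all_of(c(var1, var2))) %>% # all_of() selects columns named by a character vector
    distinct()
  write.csv(df_distinct, file = paste0(var1, ".csv"), row.names = FALSE)
}
map_table2(iris, "Species", "Petal.Width")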

R: Passing Column Names to Function w/ dplyr

I am trying to set up a function in R which prepares data in a specific format to be fed into a correlogram. When manipulating datasets I tend to use dplyr due to its clarity and ease of use, but I am running into problems trying to pass a dataset and specified column names into this function while using dplyr.
This is the setup, included here (in slightly abbreviated form) for clarity. I have not encountered any errors with it, and before posting this I confirmed corrData is set up properly:
library(corrplot)
library(tidyverse)
library(stringr)
table2a <- table2 %>%
  mutate(example_index = str_c(country, year, sep = "."))
Here is the actual function:
prepCorr <- function(dtable, x2, index2) {
  practice <- dtable %>%
    select(index2, x2) %>%
    mutate(count = 1) %>%
    complete(index2, x2)
  practice$count[is.na(practice$count)] <- 0
  practice <- spread(practice, key = x2, value = count)
  M <- cor(practice)
  return(M)
}
prepCorr(table2a, type, example_index)
Whenever I run this function I get:
Error in overscope_eval_next(overscope, expr) : object 'example_index' not found
I have also tried to take advantage of quosures to fix this, but I receive a different error when I do so. When I run the following modified code:
prepCorr <- function(dtable, x2, index2) {
  x2 <- enquo(x2)
  index2 <- enquo(index2)
  practice <- dtable %>%
    select(!!index2, !!x2) %>%
    mutate(count = 1) %>%
    complete(!!index2, !!x2)
  practice$count[is.na(practice$count)] <- 0
  practice <- spread(practice, key = !!x2, value = count)
  return(cor(practice))
}
prepCorr(table2a, type, example_index)
I get:
Error in !index2 : invalid argument type
What am I doing wrong here, and how can I fix this? For clarification, I believe I am using dplyr 0.7.
UPDATE: replaced old example with reproducible example.
Look at this example
library(dplyr)
myfun <- function(df, col1, col2) {
  col1 <- enquo(col1) # need to quote
  col2 <- enquo(col2)
  df1 <- df %>%
    select(!!col1, !!col2) # !! unquotes
  return(df1)
}
myfun(mtcars, cyl, gear)
You can learn more about NSE vs SE at this link.
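In more recent versions of dplyr and rlang, the same pattern can be written with the embrace operator {{ }}, which combines enquo() and !! in one step. This is just a sketch equivalent to the enquo() version above; myfun2 is my own name for it.
library(dplyr)
myfun2 <- function(df, col1, col2) {
  df %>%
    select({{ col1 }}, {{ col2 }}) # {{ }} quotes and unquotes in one step
}
myfun2(mtcars, cyl, gear)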

dplyr mutate, custom function and variable name as characters

It's still not fully clear to me how I can pass certain expressions to dplyr.
I'd like to use a user-defined function within mutate and be able to pass it column names as characters. I tried a few things with lazyeval's interp() without success.
See the dummy example below.
library(dplyr)
library(lazyeval)
# Define custom function
sumVar <- function(x, y) { x + y }
# Using bare column names (OK)
iris %>%
  mutate(newVar = sumVar(Petal.Length, Petal.Width))
# Using characters for column names (does not work)
iris %>%
  mutate_(newVar = sumVar('Petal.Length', 'Petal.Width'))
We can try
library(lazyeval)
library(dplyr)
res1 <- iris %>%
  mutate_(newVar = interp(~sumVar(x, y),
                          x = as.name("Petal.Length"),
                          y = as.name("Petal.Width")))
The OP's method
res2 <- iris %>%
  mutate(newVar = sumVar(Petal.Length, Petal.Width))
identical(res1, res2)
#[1] TRUE
Update
In the devel version of dplyr (soon to be released as 0.6.0 in April 2017), this can also be done with quosures:
varNames <- quos(Petal.Length, Petal.Width)
res3 <- iris %>%
  mutate(newVar = sumVar(!!!varNames))
quos() quotes the expressions, and inside mutate() we use !!! to unquote-splice the list for evaluation.
identical(res2, res3)
#[1] TRUE
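Since the original question passes the column names as character strings, they can also be converted to symbols with rlang::syms() and spliced the same way. This sketch is my own addition using current dplyr (where mutate_() is deprecated); charNames and res4 are made-up names, and sumVar is repeated from the question so the snippet runs on its own.
library(dplyr)
library(rlang)
sumVar <- function(x, y) { x + y } # custom function from the question
charNames <- c("Petal.Length", "Petal.Width")
res4 <- iris %>%
  mutate(newVar = sumVar(!!!syms(charNames))) # syms() turns the strings into symbols, !!! splices them
identical(res2, res4)
# should also be TRUE, matching the bare-name versions above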
