Changing factors order inside a function [duplicate] - r

I have been reading from this SO post on how to work with string references to variables in dplyr.
I would like to mutate an existing column based on string input:
var <- 'vs'
my_mtcars <- mtcars %>%
mutate(get(var) = factor(get(var)))
Error: unexpected '=' in:
"my_mtcars <- mtcars %>%
mutate(get(var) ="
Also tried:
my_mtcars <- mtcars %>%
mutate(!! rlang::sym(var) = factor(!! rlang::symget(var)))
This resulted in the exact same error message.
How can I do the following based on passing string 'vs' within var variable to mutate?
# works
my_mtcars <- mtcars %>%
mutate(vs = factor(vs))

This can be done with the := operator: unquote the string with !! on the left-hand side of the assignment, and on the right-hand side convert the string to a symbol with rlang::sym() and unquote that symbol.
library(dplyr)
my_mtcars <- mtcars %>%
mutate(!! var := factor(!! rlang::sym(var)))
class(my_mtcars$vs)
#[1] "factor"
Alternatively, use mutate_at, which accepts column names given as strings in vars and applies the function of interest:
my_mtcars2 <- mtcars %>%
mutate_at(vars(var), factor)
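As a minimal sketch, the same pattern can be wrapped in a small helper. The function name to_factor is hypothetical (it is not from the answer above), and the .data pronoun is used here as an alternative to rlang::sym() for reading a column by its string name.

```r
library(dplyr)

# Hypothetical helper: convert the column named by the string `col` to a factor.
to_factor <- function(data, col) {
  data %>%
    # `!!col :=` sets the output column name from the string;
    # `.data[[col]]` reads the existing column by name.
    mutate(!!col := factor(.data[[col]]))
}

my_mtcars3 <- to_factor(mtcars, "vs")
class(my_mtcars3$vs)
```

Because the column name is only ever handled as a string, the helper can be called in a loop or with lapply over several column names.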

Related

R - mutate with regex in a loop

I have a data frame in which every column consists of number followed by text, e.g. 533 234r/r.
The following code to get rid of the text works well:
my_data <- my_data %>%
mutate(column1 = str_extract(column1, '.+?(?=[a-z])'))
I would like to do it for multiple columns:
col_names <- names(my_data)
for (i in 1:length(col_names)) {
my_data <- my_data%>%
mutate(col_names[i] = str_extract(col_names[i], '.+?(?=[a-z])'))
}
But it returns an error:
Error: unexpected '=' in:
" my_data <- my_data %>%
mutate(col_names[i] ="
I think mutate_all() wouldn't work either, because str_extract() requires a column name as an argument.
If the column names are strings, convert each one to a symbol and unquote it (!!) on the right-hand side, and use := for the assignment:
library(dplyr)
library(stringr)
col_names <- names(my_data)
for (i in seq_along(col_names)) {
my_data <- my_data %>%
mutate(!! col_names[i] :=
str_extract(!!rlang::sym(col_names[i]), '.+?(?=[a-z])'))
}
In tidyverse, we could do this with across instead of looping with a for loop (dplyr version >= 1.0)
my_data <- my_data %>%
mutate(across(everything(), ~ str_extract(., '.+?(?=[a-z])')))
If the dplyr version is old, use mutate_all
my_data <- my_data %>%
mutate_all(~ str_extract(., '.+?(?=[a-z])'))
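To make the across version easier to try out, here is a small self-contained sketch. The sample data frame is hypothetical; it only follows the "number followed by text" shape described in the question.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data: a number (possibly containing spaces) followed by text.
my_data <- data.frame(
  column1 = c("533 234r/r", "12kg"),
  column2 = c("7 100m", "45x"),
  stringsAsFactors = FALSE
)

# Apply the same lazy match plus lookahead to every column:
# keep everything before the first lowercase letter.
cleaned <- my_data %>%
  mutate(across(everything(), ~ str_extract(.x, ".+?(?=[a-z])")))

cleaned
```

The extracted values are still character strings; turning them into numbers would need a further step, such as removing the embedded spaces first.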

Moving from mutate_all to across() in dplyr 1.0

With the new release of dplyr I am refactoring quite a lot of code and removing functions that are now retired or deprecated. I had a function that is as follows:
processingAggregatedLoad <- function (df) {
defined <- ls()
passed <- names(as.list(match.call())[-1])
if (any(!defined %in% passed)) {
stop(paste("Missing values for the following arguments:", paste(setdiff(defined, passed), collapse=", ")))
}
df_isolated_load <- df %>% select(matches("snsr_val")) %>% mutate(global_demand = rowSums(.)) # we get isolated load
df_isolated_load_qlty <- df %>% select(matches("qlty_good_ind")) # we get isolated quality
df_isolated_load_qlty <- df_isolated_load_qlty %>% mutate_all(~ factor(.), colnames(df_isolated_load_qlty)) %>%
mutate_each(funs(as.numeric(.)), colnames(df_isolated_load_qlty)) # we convert the qlty to factors and then to numeric
df_isolated_load_qlty[df_isolated_load_qlty[]==1] <- 1 # 1 is bad
df_isolated_load_qlty[df_isolated_load_qlty[]==2] <- 0 # 0 is good we mask to calculate the global index quality
df_isolated_load_qlty <- df_isolated_load_qlty %>% mutate(global_quality = rowSums(.)) %>% select(global_quality)
df <- bind_cols(df, df_isolated_load, df_isolated_load_qlty)
return(df)
}
Basically the function does as follows:
1. The function selects all of the values of a pivoted data frame and aggregates them.
2. The function selects the quality indicator (character) of a pivoted data frame.
3. I convert the quality characters to factors and then to numeric to get the 2 levels (1 or 2).
4. I replace the numeric values in each individual column with 0 or 1 depending on the level.
5. I sum the individual quality values by row: the result is 0 if all of the values are good; otherwise the global quality is bad.
The problem is that I am getting the following messages:
1: `funs()` is deprecated as of dplyr 0.8.0.
Please use a list of either functions or lambdas:
# Simple named list:
list(mean = mean, median = median)
# Auto named with `tibble::lst()`:
tibble::lst(mean, median)
# Using lambdas
list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
2: `mutate_each_()` is deprecated as of dplyr 0.7.0.
Please use `across()` instead.
I did multiple trials as for instance:
df_isolated_load_qlty %>% mutate(across(.fns = ~ as.factor(), .names = colnames(df_isolated_load_qlty)))
Error: Problem with `mutate()` input `..1`.
x All unnamed arguments must be length 1
ℹ Input `..1` is `across(.fns = ~as.factor(), .names = colnames(df_isolated_load_qlty))`.
But I am still a bit confused about the new dplyr syntax. Would someone be able to guide me a little bit around the right way of doing this?
mutate_each has long been deprecated and was replaced by mutate_all.
mutate_all is in turn replaced by across.
across uses everything() as the default for .cols, so when no columns are specified it behaves like mutate_all.
You can apply multiple functions in the same mutate call, so here factor and as.numeric can be applied together.
Considering all this, you can change your existing function to:
library(dplyr)
processingAggregatedLoad <- function (df) {
defined <- ls()
passed <- names(as.list(match.call())[-1])
if (any(!defined %in% passed)) {
stop(paste("Missing values for the following arguments:",
paste(setdiff(defined, passed), collapse=", ")))
}
df_isolated_load <- df %>%
select(matches("snsr_val")) %>%
mutate(global_demand = rowSums(.))
df_isolated_load_qlty <- df %>% select(matches("qlty_good_ind"))
df_isolated_load_qlty <- df_isolated_load_qlty %>%
mutate(across(.fns = ~as.numeric(factor(.))))
df_isolated_load_qlty[df_isolated_load_qlty ==1] <- 1
df_isolated_load_qlty[df_isolated_load_qlty==2] <- 0
df_isolated_load_qlty <- df_isolated_load_qlty %>%
mutate(global_quality = rowSums(.)) %>%
select(global_quality)
df <- bind_cols(df, df_isolated_load, df_isolated_load_qlty)
return(df)
}
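The conversion step on its own can be sketched as follows. The quality flags "N" and "Y" are an assumption for illustration, since the question does not show the actual values, and .cols is written out as everything() because newer dplyr releases may warn when it is omitted.

```r
library(dplyr)

# Hypothetical quality flags; the real values in the question are not shown.
qlty <- data.frame(
  a_qlty_good_ind = c("N", "Y", "Y"),
  b_qlty_good_ind = c("Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# factor() sorts the levels, so "N" becomes 1 and "Y" becomes 2 in each column.
qlty_num <- qlty %>%
  mutate(across(everything(), ~ as.numeric(factor(.x))))

qlty_num
```

One caveat with this approach: the codes depend on the sorted levels found in each column separately, so a column that contains only a single distinct value is coded entirely as 1.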

Calculate mode for each column in dataframe using lapply dplyr

I'm trying to create a function that essentially gets me the MODE, or MODE-X (the 2nd to Xth most common value), and the associated counts for each column in a data frame.
I can't figure out what I'm missing and I'm looking for some assistance. I believe it has to do with passing a variable into a dplyr function.
library(tidyverse)
myfunct_get_mode = function(x, rank=1){
mytable = dplyr::count(rlang::sym(x), sort = TRUE)
names(mytable)= c('variable','counts')
# return just the rank specified...such as mode or mode -1, etc
result = table %>% dplyr::slice(rlang::sym(rank))
return(result)
}
mtcars %>% lapply(. %>% (function(x) myfunct_get_mode(x, rank=2)))
There are some problems with your function:
Your function call is not doing what you think. Check with mtcars %>% lapply(. %>% (function(x) print(x))) that x is actually a whole column of mtcars. To get the column names, apply the function to names(mtcars). But then you also have to specify the data frame you're working on.
To evaluate the symbol returned by rlang::sym(x), you need to put !! in front of it.
rank is not a variable name, so there is no need for rlang::sym here.
table should be mytable in the line of your function that assigns result.
So how could it work (although there are probably better ways):
myfunct_get_mode = function(df, x, rank=1){
mytable = count(df, !!rlang::sym(x), sort = TRUE)
names(mytable)= c('variable','counts')
# return just the rank specified...such as mode or mode -1, etc
result = mytable %>% slice(rank)
return(result)
}
names(mtcars) %>% lapply(function(x) myfunct_get_mode(mtcars, x, rank=2))
If we need this in a list, we can use purrr::imap
f1 <- function(dat, rank = 1) {
purrr::imap(dat, ~
dat %>%
count(!! rlang::sym(.y)) %>%
rename_all(~ c('variable', 'counts')) %>%
arrange(desc(counts)) %>%
slice(seq_len(rank))) #%>%
#bind_cols - convert to a data.frame
}
f1(mtcars, 2)

Replacing group_by_ with group_by when the argument is a string in dplyr

I have some code that specifies a grouping variable as a string.
group_var <- "cyl"
My current code for using this grouping variable in a dplyr pipeline is:
mtcars %>%
group_by_(group_var) %>%
summarize(mean_mpg = mean(mpg))
My best guess as to how to replace the deprecated group_by_ function with group_by is:
mtcars %>%
group_by(!!as.name(group_var)) %>%
summarize(mean_mpg = mean(mpg))
This works but is not explicitly mentioned in the programming with dplyr vignette.
Is using !!as.name() the preferred way to replace group_by_() with group_by()?
Is this within a function? If not, I think the !!as.name() part is unnecessary, and I would stick with the group_by_at(group_var) suggestion by @aosmith for simplicity's sake. Otherwise, I would set it up like so:
examplr <- function(data, group_var){
group_var <- as.name(group_var)
data %>%
group_by(!!group_var) %>%
summarize(mean_mpg = mean(mpg))
}
examplr(data = mtcars,
group_var = "cyl")
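For a single column name held in a string, the .data pronoun is another option that avoids the explicit conversion to a symbol. This is a minimal sketch alongside the answer above, not a statement about which form is preferred.

```r
library(dplyr)

group_var <- "cyl"

# `.data[[group_var]]` looks the grouping column up by its string name.
res <- mtcars %>%
  group_by(.data[[group_var]]) %>%
  summarize(mean_mpg = mean(mpg))

res
```

The same expression works unchanged inside a function that takes the column name as a string argument.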

group_by by a vector of characters using tidy evaluation semantics

I used to do this with group_by_:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_(.dots = group_by) %>% summarise(gear = mean(gear))
but now group_by_ is deprecated. I don't know how to do it using the tidy evaluation framework.
New answer
With dplyr 1.0, you can now use selection helpers like all_of() inside across():
df |>
group_by(
across(all_of(my_vars))
)
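A runnable version of this pattern on mtcars might look like the following sketch. The variable name group_cols is hypothetical; it stands in for the character vector of grouping columns from the question.

```r
library(dplyr)

# Hypothetical name for the character vector of grouping columns.
group_cols <- c("cyl", "vs")

res <- mtcars %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(gear = mean(gear), .groups = "drop")

res
```

all_of() raises an error if any of the named columns is missing, which tends to surface typos in the character vector early.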
Old answer
Transform the character vector into a list of symbols and splice it in
df %>% group_by(!!!syms(group_by))
There is a group_by_at variant of group_by:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_at(group_by) %>% summarise(gear = mean(gear))
The above is a simplified version of the more general form:
mtcars %>% group_by_at(vars(one_of(group_by))) %>% summarise(gear = mean(gear))
Inside vars you can use any of the dplyr ways of selecting variables:
mtcars %>%
group_by_at(vars(
one_of(group_by) # columns from the predefined set
,starts_with("a") # add columns whose names start with "a"
,-hp # but omit this one
,vs # always include this one
,contains("_gr_") # and columns whose names contain "_gr_"
)) %>%
summarise(gear = mean(gear))
