Mutate with tidy evaluation


With mutate_(), I used to provide a list of new variables and the logic needed to create them:
library(dplyr)
library(rlang)
list_new_var <-
list(new_1 = "am * mpg",
new_2 = "cyl + disp")
mtcars %>%
mutate_(.dots = list_new_var) %>%
head()
Now I want to transition to tidy evaluation, and I am in the process of understanding the new methods.
How can I make this work? Is a function generally recommended for this type of situation?
f_mutate <- function(data, new) {
a <- expr(new)
b <- eval(new)
c <- syms(new)
d <- UQ(syms(new))
e <- UQS(syms(new))
f <- UQE(syms(new))
data %>%
mutate(f) %>%
head()
}
f_mutate(mtcars, new = list_new_var)

One option is to create the list with quote(), so that each expression is passed as an argument without being evaluated:
list_new_var <-list(
new_1 = quote(am * mpg),
new_2 = quote(cyl + disp)
)
and, within f_mutate(), use !!! to splice the expressions into mutate(), where they are evaluated:
f_mutate <- function(data, new) {
data %>%
mutate(!!! new)
}
Run the function:
f_mutate(mtcars, new = list_new_var) %>%
head
# mpg cyl disp hp drat wt qsec vs am gear carb new_1 new_2
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 21.0 166
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 21.0 166
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 22.8 112
#4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0.0 264
#5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0.0 368
#6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 0.0 231
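If you would rather keep the original character list, a minimal sketch (assuming a reasonably recent rlang) is to parse each string into an expression with rlang::parse_expr() and splice the resulting named list into mutate() with !!!. The helper name f_mutate_chr is hypothetical:

```r
library(dplyr)
library(rlang)

# The question's original specification, kept as strings
list_new_var <- list(new_1 = "am * mpg",
                     new_2 = "cyl + disp")

# Hypothetical helper: parse each string into an unevaluated expression.
# lapply() keeps the list names, which mutate() uses as the new column names.
f_mutate_chr <- function(data, new) {
  new_exprs <- lapply(new, rlang::parse_expr)
  data %>% mutate(!!!new_exprs)
}

res <- f_mutate_chr(mtcars, list_new_var)
head(res)
```

Parsing strings is convenient for migrating old mutate_() code, but quoting the expressions directly with quote() avoids a parse step and is easier to check.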

I don't think you need a function for this. You can write the expressions directly in mutate():
library(dplyr)
mtcars %>%
as_tibble() %>%
mutate(new_column1 = am * mpg,
new_column2 = cyl + disp) %>%
head()
Check out the first example here.

Related

warning message with mutate_at() in dplyr package

I received a warning message when using mutate_at() in dplyr package.
dt %>%
mutate_at(
c(5:43),
funs(pc = ./Population)
)
Warning message:
funs() is soft deprecated as of dplyr 0.8.0
Please use a list of either functions or lambdas:
# Simple named list:
list(mean = mean, median = median)
# Auto named with `tibble::lst()`:
tibble::lst(mean, median)
# Using lambdas
list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once per session.
Is there an alternative function?
How can I do this with data.table?

With recent versions of dplyr, use list() instead of funs():
library(dplyr)
dt %>%
mutate_at(5:43,
list(pc = ~ ./Population))
Reproducible example:
head(mtcars) %>%
mutate_at(4:5, list(pc = ~ ./wt))
# mpg cyl disp hp drat wt qsec vs am gear carb hp_pc drat_pc
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 41.98473 1.4885496
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 38.26087 1.3565217
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 40.08621 1.6594828
#4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 34.21462 0.9580093
#5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 50.87209 0.9156977
#6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 30.34682 0.7976879
The warning is a friendly deprecation notice and nothing to worry about.
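For reference, dplyr 1.0.0 and later supersede mutate_at() with across(). A sketch of the same reproducible example, assuming dplyr >= 1.0.0; the .names glue specification reproduces the hp_pc / drat_pc naming:

```r
library(dplyr)

# Assumes dplyr >= 1.0.0, where across() supersedes mutate_at().
# Columns 4:5 (hp, drat) are divided by wt; .names appends "_pc".
res <- head(mtcars) %>%
  mutate(across(4:5, ~ .x / wt, .names = "{.col}_pc"))
res
```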
In data.table, specify the columns in .SDcols:
library(data.table)
setDT(dt)[, paste0(names(dt)[5:43], "_pc") :=
lapply(.SD, function(x) x/Population), .SDcols = 5:43]
Or, using base R (with dt as a plain data.frame):
nm1 <- names(dt)[5:43]
dt[paste0(nm1, "_pc")] <- lapply(dt[nm1], `/`, dt[["Population"]])
Or directly, since dividing a data frame by a vector recycles the vector down each column:
dt[paste0(nm1, "_pc")] <- dt[nm1]/dt[["Population"]]
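A runnable version of the base R approach on head(mtcars), with wt standing in for Population and columns 4:5 standing in for 5:43 (both substitutions are assumptions for illustration):

```r
# Base R sketch on head(mtcars); wt stands in for Population (assumption)
df <- head(mtcars)
nm1 <- names(df)[4:5]  # hp, drat
# data frame / vector divides each selected column element-wise by wt
df[paste0(nm1, "_pc")] <- df[nm1] / df[["wt"]]
df
```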

Find row maximum across columns by using vector of column names in dplyr

I have a long list of column names in a character vector that refer to various medications. I like to keep that list at the top of my code to make it easy to edit and easy to reference the group of medications at various points in my script. I would like to take the row maximum across the medications using dplyr by feeding it the pre-defined vector of column names to find the maximum across. It seems like there is a simple fix but it is escaping me today...
I tried the code below but it returns one of the names in the list of column names.
I also tried various permutations using get(), select() and do.call() to try and make R read the character vector differently but I couldn't figure it out...
data(mtcars)
colnames <- c("vs", "am", "gear", "carb")
df <- mtcars %>%
rowwise() %>%
mutate(max = max(colnames))
EDIT: I'd like the maximum to be shown in a new column. For example, I'd like output like the following:
vs am gear carb MAX
0 1 4 4 4
0 1 4 4 4
1 1 4 1 4
1 0 3 1 3
0 0 3 2 3

You could also tidy the data by reshaping it to long format first, then finding the max and joining it back onto the original data. Note that gather_() is used here, with all names in quotes, so you can reference your vector. In this example the car name stands in for your drug, and ties for the max value are not handled (a tie produces more than one row for that car in the joined result).
library(dplyr)
library(tidyr)
colnames <- c("vs", "am", "gear", "carb")
df <- mtcars %>%
mutate(nms = row.names(mtcars))
#reshape to long format, then find each car's max value and keep only those rows
dfx <- tidyr::gather_(df, 'nms2','vals', colnames) %>%
group_by(nms) %>%
mutate(max = max(vals)) %>%
ungroup %>%
filter(max == vals)
#join back on to data with column name and max value
mt2 <- left_join(df,select(dfx, nms, vals,nms2),by='nms')
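With tidyr 1.0.0 or later, gather_() is superseded and the character vector can be passed to pivot_longer() through all_of(). A sketch of the same long-format idea that summarises one maximum per car, which also sidesteps the tie problem; the column name MAX mirrors the desired output:

```r
library(dplyr)
library(tidyr)

colnames <- c("vs", "am", "gear", "carb")
df <- mtcars %>%
  mutate(nms = row.names(mtcars))

# Assumes tidyr >= 1.0.0: all_of() accepts the character vector directly
row_max <- df %>%
  pivot_longer(cols = all_of(colnames), names_to = "nms2", values_to = "vals") %>%
  group_by(nms) %>%
  summarise(MAX = max(vals))

# One row per car in row_max, so the join cannot duplicate rows on ties
mt2 <- left_join(df, row_max, by = "nms")
```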
Using pmax() takes much less code: call it through do.call() to get the row-wise maximum:
df <- mtcars %>%
mutate(mx2 = do.call(pmax,mtcars[,colnames]))

Using c_across() with your initial rowwise() attempt appears to work:
mycols <- c("vs", "am", "gear", "carb")
df <- mtcars %>%
rowwise() %>%
mutate(row_max = max(c_across(all_of(mycols))))

It may not be the most dplyr answer, but you could always use apply inside mutate:
col_names <- c("vs", "am", "gear", "carb")
mtcars %>%
mutate(max_val = apply(., 1, function(x) max(x[col_names]))) %>%
head()
mpg cyl disp hp drat wt qsec vs am gear carb max_val2 max_val
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 4 4
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 3 3
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 3 3
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 3 3
Or, you could do something like this:
mtcars$max_val2 <- mtcars %>%
select(all_of(col_names)) %>%
transmute(apply(., 1, max)) %>%
pull()
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb max_val2
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 4
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 3
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 3
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 3

You can summarise a selected set of columns, or a vector of column names such as yours, using summarise_at() from dplyr:
data(mtcars)
colnames <- c("vs", "am", "gear", "carb")
df <- mtcars %>%
summarise_at(colnames, list(max))
vs am gear carb
1 1 1 5 8
You simply specify the columns first and the function second; in this case max. select_at(), mutate_at() and rename_at() use the same syntax. You use summarise_at() here because it keeps the specified columns rather than creating new ones. Note that this returns the maximum of each column, not the row-wise maximum asked for in the question.
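In dplyr 1.0.0 and later, summarise_at() is superseded by across(). A sketch of the same column-wise summary, assuming that version:

```r
library(dplyr)

colnames <- c("vs", "am", "gear", "carb")
# Assumes dplyr >= 1.0.0, where across() supersedes summarise_at()
res <- mtcars %>%
  summarise(across(all_of(colnames), max))
res
```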

write function with dplyr

I have this function
var_sup <- function(var1,var2)
{
df$RD <- ifelse(df[var1]>df[var2],1,0)
df$RD <- as.numeric(df$RD)
return(df)
}
I want to rewrite it with dplyr so that I can call it like this, without quotes:
var_sup(num, num2)
compare_sup <- function (var1,var2) {
# capture the argument without evaluating it
var1 <- quo_name(enquo(var1))
var2 <- quo_name(enquo(var2))
# construct the expression
df %>%
mutate(RD = ifelse(!!var1 > !!var2 ,1,0))
}
I tried that, but I get an error.
Thank you.

The following works for me:
library(dplyr)
compare_sup <- function (var1,var2) {
# capture the argument without evaluating it
var1 <- enquo(var1)
var2 <- enquo(var2)
# construct the expression
mtcars %>%
mutate(RD = ifelse(!!var1 > !!var2, 1, 0))
}
compare_sup(drat, wt) %>% head
# mpg cyl disp hp drat wt qsec vs am gear carb RD
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
#4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0
#5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0
#6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 0
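I basically removed quo_name() from the function (and used mtcars as the data set). quo_name() turns the captured quosure into a character string, so !!var1 > !!var2 compared two strings instead of two columns.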
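Since rlang 0.4.0, the enquo()/!! pair can be written with the embrace operator {{ }}. A sketch that also takes the data frame as an argument instead of hard-coding mtcars (this signature is an assumption, not the asker's) and replaces ifelse(cond, 1, 0) with as.numeric(cond):

```r
library(dplyr)

# Hypothetical signature: the data frame is passed in rather than hard-coded.
# {{ }} (rlang >= 0.4.0) captures and unquotes each bare column name.
compare_sup <- function(data, var1, var2) {
  data %>%
    mutate(RD = as.numeric({{ var1 }} > {{ var2 }}))
}

res <- compare_sup(mtcars, drat, wt)
head(res)
```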

Programming dplyr operations

Any idea how I can manipulate dplyr variables programmatically?
This works:
out = "new_var"
mtcars %>%
mutate(!!out := mpg/carb)
But I really need to be able to change the variables in the division. I thought I could do it like this:
out = "new_var"
numer = "mpg"
denom = "carb"
mtcars %>%
mutate(!!out := !! quo(numer/denom))
but no dice:
Error in mutate_impl(.data, dots) :
Evaluation error: non-numeric argument to binary operator.
The result should look like:
mpg cyl disp hp drat wt qsec vs am gear carb new_var
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5.250000
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5.250000
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 22.800000
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 21.400000
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 9.350000
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 18.100000
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 3.575000
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 12.200000
...
Any idea how this works?
SOLVED:
myFunction = function(df, col, col2, new_col) {
col <- enquo(col)
col2 <- enquo(col2)
new_col <- quo_name(enquo(new_col))
df %>%
mutate(!!new_col := (!!col)/(!!col2))
}
myFunction(mtcars, mpg, wt, mpg_based_new_col)

If you want to turn a character value into something you can unquote, use the rlang::sym() function (or the base as.name() function), which creates a symbol. For example:
out = "new_var"
numer = rlang::sym("mpg")
denom = rlang::sym("carb")
library(tidyverse)
mtcars %>%
mutate(!!out := (!!numer)/(!!denom))
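Note how each variable is unquoted separately rather than the entire expression. In your attempt, quo(numer/denom) captured the variables numer and denom themselves, which evaluate to the strings "mpg" and "carb", hence the "non-numeric argument to binary operator" error.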
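An alternative that keeps everything as strings is the .data pronoun, which looks up a column by name inside data-masking verbs such as mutate(). A sketch, assuming dplyr 0.7 or later:

```r
library(dplyr)

out <- "new_var"
numer <- "mpg"
denom <- "carb"

# .data[[ ]] looks up a column from a string; !!out := sets the new name
res <- mtcars %>%
  mutate(!!out := .data[[numer]] / .data[[denom]])
head(res)
```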

Is there a way to pass dplyr's `do` function a vector of additional arguments?

I was curious if there was a way to pass dplyr's do function a vector of additional arguments which would be applied to each group in turn? Consider, for example, if we wanted to group the mtcars dataset by its cyl variable and apply the head function to the resulting groups (one for 4, 6, and 8 respectively) with n = 1 for the 4 group, n = 2 for the 6 group, and n = 3 for the 8 group, combining the final results in a single dataframe.
I can accomplish this using mapply as follows:
temp <- mtcars %>%
split(mtcars$cyl) %>%
mapply(FUN = head, x = ., n = 1:3, SIMPLIFY = FALSE)
rbind(temp[[1]], temp[[2]], temp[[3]])
I was curious if there was an equivalent way of doing this with dplyr? I got as far as below, but was stymied as to how to pass head an additional argument representing the number of rows we would like to select:
# only selects first row of each group
mtcars %>%
group_by(cyl) %>%
do(data.frame(head(x = ., n = 1)))
# throws an error because n expects a single number
mtcars %>%
group_by(cyl) %>%
do(data.frame(head(x = ., n = 1:3)))

if we wanted to group the mtcars dataset by its cyl variable and apply the head function to the resulting groups (one for 4, 6, and 8 respectively) with n = 1 for the 4 group, n = 2 for the 6 group, and n = 3 for the 8 group
First, formalize this notion in a data.frame:
heads = data.frame(cyl=c(4,6,8), n = 1:3)
Then you can merge it in:
mtcars %>% left_join(heads, by = "cyl") %>% group_by(cyl) %>% slice(seq(first(n)))
# mpg cyl disp hp drat wt qsec vs am gear carb n
# (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (int)
# 1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1
# 2 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 2
# 3 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 2
# 4 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 3
# 5 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 3
# 6 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 3
I would also consider dodging extra parentheses with
... %>% slice(n %>% first %>% seq)
do() exists only as an escape hatch for when the other dplyr functions aren't up to the job, and should be avoided.
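The join approach can also be written by filtering on row_number() instead of slicing. A sketch that reuses the heads lookup table from this answer; note that filter() keeps mtcars' original row order rather than sorting by cyl:

```r
library(dplyr)

heads <- data.frame(cyl = c(4, 6, 8), n = 1:3)

# Keep the first n rows of each cyl group, where n comes from the join.
# filter() preserves mtcars' original row order instead of sorting by cyl.
res <- mtcars %>%
  left_join(heads, by = "cyl") %>%
  group_by(cyl) %>%
  filter(row_number() <= n) %>%
  ungroup()
res
```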

This is also possible without grouping at all:
mtcars %>% arrange(cyl) %>% slice(rep(c(0, which(diff(cyl)>0)), 1:3) + sequence(1:3))
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 2 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 3 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 4 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 5 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# 6 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
To answer your question about do() more directly: because of the way it is implemented (evaluating the expression in a loop over the subsets), one way to make your head function work is to have it increment a variable every time it is called.
## Define a function that increments a variable each time it is called
heads <- (function() { n <- 0; function(dat) { n <<- n+1; dat[1:n, ] } })()
mtcars %>% group_by(cyl) %>% do(heads(.))
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 2 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 3 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 4 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 5 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# 6 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3

Hmm, I bet there's a more elegant way to do this, but:
group_index =
mtcars %>%
group_by(cyl) %>%
group_indices
mtcars %>%
mutate(group_index = group_index) %>%
group_by(cyl) %>%
slice(group_index %>% first %>% seq)

Would something like this work? This solution is specific to the mtcars example, but something similar may work in your case.
It involves writing your own function with conditional statements based on the column you are grouping by:
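head_custom <- function(df, n){
# every row in a group has the same cyl, so test only the first value
# (if() errors on a condition of length > 1 in R >= 4.2)
if(df$cyl[1] == 4){
ans <- head(df, n[1])
}
if(df$cyl[1] == 6){
ans <- head(df, n[2])
}
if(df$cyl[1] == 8){
ans <- head(df, n[3])
}
return(ans)
}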
mtcars %>%
group_by(cyl) %>%
do(head_custom(., n = 1:3))
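A more general variant of the same idea, assuming dplyr 0.8.1 or later, uses group_modify(), which passes each group's rows as .x and its key as .y, so no hard-coded if() branches are needed. The named lookup vector n_by_cyl is hypothetical:

```r
library(dplyr)

# Hypothetical lookup: rows to keep for each value of cyl
n_by_cyl <- c("4" = 1, "6" = 2, "8" = 3)

# group_modify() calls the lambda once per group:
# .x holds the group's rows, .y a one-row tibble with its cyl value
res <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ head(.x, n_by_cyl[[as.character(.y$cyl)]])) %>%
  ungroup()
res
```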
