As part of a much larger project, I am trying to create a new column in a data.frame called "unique_id" based on the interaction of user-specified variables. In this use case, the number and names of the variables needed will vary quite a bit with each user, so this flexibility is important. Some data.frames, such as the toy one in my example, will even come with a "unique id" variable, but this is quite rare. I included one in my example to make my desired output clear.
Consider this toy data.frame:
mini_df <- data.frame(
lat = c(41.23,37.37,41.23,39.01,32.00),
lon = c(-120.79,-120.68,-120.79,-119.13,-120.00),
station_id = c(300,527,300,228,72)
)
Outside of a proper function, it is quite easy to do something like this:
library(dplyr)
out_of_function_test_df <- mini_df %>%
  mutate(id = interaction(lat, lon))
Which produces what I want, namely:
lat lon station_id id
1 41.23 -120.79 300 41.23.-120.79
2 37.37 -120.68 527 37.37.-120.68
3 41.23 -120.79 300 41.23.-120.79
4 39.01 -119.13 228 39.01.-119.13
5 32.00 -120.00 72 32.-120
I need this to work within a function in which the user specifies the interacting variables.
I have read many Stack Exchange posts about similar problems, but they differ from mine in important ways: they use verbs other than mutate, apply different functions, or do not handle multiple user-specified variables.
After reading these, trying many things, and reading this, the best I can come up with is the following:
create_unique_id <- function(df,
                             metadata_coords,
                             unique_id_coords) {
  df <- df %>%
    mutate_(id = interp(~interaction(args), # causes error with or without tilde
                        args = c("list", lapply(unique_id_coords, as.name))))
  return(df)
}
This produces an error:
Error in unique.default(x, nmax = nmax) :
unique() applies only to vectors
Here is the full traceback, if it is helpful:
22.
unique.default(x, nmax = nmax)
21.
unique(x, nmax = nmax)
20.
factor(x)
19.
as.factor(args[[i]])
18.
interaction(list("list", lat, lon))
17.
mutate_impl(.data, dots, caller_env())
16.
mutate.tbl_df(tbl_df(.data), ...)
15.
mutate(tbl_df(.data), ...)
14.
as.data.frame(mutate(tbl_df(.data), ...))
13.
mutate.data.frame(.data, !!!dots)
12.
mutate(.data, !!!dots)
11.
mutate_.data.frame(., id = interp(~interaction(args), args = c("list",
lapply(unique_id_coords, as.name))))
10.
mutate_(., id = interp(~interaction(args), args = c("list", lapply(unique_id_coords,
as.name))))
9.
function_list[[k]](value)
8.
withVisible(function_list[[k]](value))
7.
freduce(value, `_function_list`)
6.
`_fseq`(`_lhs`)
5.
eval(quote(`_fseq`(`_lhs`)), env, env)
4.
eval(quote(`_fseq`(`_lhs`)), env, env)
3.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2.
df %>% mutate_(id = interp(~interaction(args), args = c("list",
lapply(unique_id_coords, as.name))))
1.
create_unique_id(df = mini_df, metadata_coords = c("lat", "lon",
"station_id"), unique_id_coords = c("lat", "lon"))
I do not have nearly enough background knowledge to make sense of this traceback. I am confused because the issue seems to be deep within the interaction() function. interaction() calls unique() along the way (which makes sense), but unique() is what ends up failing. Somehow, the call to interaction() inside my function differs from the one outside it, but I am not sure how.
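A minimal sketch of a direct fix, assuming dplyr >= 1.0 and that unique_id_coords is a character vector of column names. Frame 18 of the traceback shows the root cause: c("list", lapply(...)) builds a list whose first element is the string "list", so interaction() received a single list("list", lat, lon) argument instead of the columns themselves. Converting the names with rlang::syms() and splicing them with !!! yields the call interaction(lat, lon) inside mutate(). The unused metadata_coords argument is dropped here.

```r
library(dplyr)

# Toy data from the question
mini_df <- data.frame(
  lat = c(41.23, 37.37, 41.23, 39.01, 32.00),
  lon = c(-120.79, -120.68, -120.79, -119.13, -120.00),
  station_id = c(300, 527, 300, 228, 72)
)

create_unique_id <- function(df, unique_id_coords) {
  # Turn the column names (strings) into symbols
  id_syms <- rlang::syms(unique_id_coords)
  # Splice them as separate arguments, e.g. interaction(lat, lon)
  df %>%
    mutate(id = interaction(!!!id_syms))
}

in_function_df <- create_unique_id(mini_df, c("lat", "lon"))
```

The id column should match out_of_function_test_df, and any number of names can be passed, e.g. c("lat", "lon", "station_id").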
Does this application of tidyr::unite() help?
library(dplyr)
create_unique_id <- function(df,
                             unique_id_coords) {
  df <- df %>%
    tidyr::unite("id", {{ unique_id_coords }}, remove = FALSE)
  return(df)
}
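If the column names arrive as a character vector (as unique_id_coords does in the question), wrapping them in all_of() may be clearer than {{ }}. This is a sketch assuming tidyr >= 1.1.0, where all_of() is available; create_unique_id_chr is a hypothetical name. Note that unite() pastes the values with sep = "_" and returns a character column, not a factor like interaction().

```r
library(tidyr)

mini_df <- data.frame(
  lat = c(41.23, 37.37, 41.23, 39.01, 32.00),
  lon = c(-120.79, -120.68, -120.79, -119.13, -120.00),
  station_id = c(300, 527, 300, 228, 72)
)

# Hypothetical helper: unique_id_coords is a character vector of column names
create_unique_id_chr <- function(df, unique_id_coords) {
  # all_of() selects exactly the columns named in the vector
  unite(df, "id", all_of(unique_id_coords), remove = FALSE)
}

ids <- create_unique_id_chr(mini_df, c("lat", "lon"))
```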
Result
create_unique_id(mtcars, unique_id_coords = c(wt, qsec)) %>% head()
mpg cyl disp hp drat id wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.62_16.46 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875_17.02 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.32_18.61 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215_19.44 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.44_17.02 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.46_20.22 3.460 20.22 1 0 3 1
Here is another option using the ellipsis (...):
library(dplyr)
library(rlang)
create_unique_id <- function(df, ...) {
  df %>%
    mutate(id = paste(!!!ensyms(...), sep = "_"))
}
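Because ensyms() accepts both bare names and strings, the same function can be called either way. A quick check on the question's mini_df, assuming dplyr and rlang are installed:

```r
library(dplyr)
library(rlang)

mini_df <- data.frame(
  lat = c(41.23, 37.37, 41.23, 39.01, 32.00),
  lon = c(-120.79, -120.68, -120.79, -119.13, -120.00),
  station_id = c(300, 527, 300, 228, 72)
)

create_unique_id <- function(df, ...) {
  # ensyms() turns each argument (bare name or string) into a symbol
  df %>%
    mutate(id = paste(!!!ensyms(...), sep = "_"))
}

# Bare column names
by_name <- create_unique_id(mini_df, lat, lon)
# The same columns given as strings
by_string <- create_unique_id(mini_df, "lat", "lon")
```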
Output
create_unique_id(mtcars, cyl, hp, vs) %>% head()
mpg cyl disp hp drat wt qsec vs am gear carb id
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 6_110_0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 6_110_0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 4_93_1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 6_110_1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 8_175_0
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 6_105_1
I am trying to create a copy of a column based on a variable: the new column's name is constant, but the column it copies changes. This is what I used to do:
library(dplyr)
x <- "mpg"
mtcars %>%
mutate_(Target = x)
This gave me the results I wanted.
However, when you run this, you now receive a warning:
Warning message:
mutate_() is deprecated.
Please use mutate() instead
It suggests looking at https://tidyeval.tidyverse.org/ for guidance; I've had a quick skim, but didn't spot this as a use case in the document. (It doesn't seem to cover the problem of converting existing code, but maybe I'm just not understanding it well enough?)
How do I move this code from mutate_() to mutate()?
You need to follow dplyr's non-standard (tidy) evaluation rules:
mtcars %>% mutate(Target = !!sym(x))
# mpg cyl disp hp drat wt qsec vs am gear carb Target
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 21.0
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 21.0
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 22.8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 21.4
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 18.7
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 18.1
...
Here sym takes a string as input and turns it into a symbol, which you then unquote using the bang-bang operator !!.
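Another option, assuming dplyr >= 0.7, is the .data pronoun, which looks up a column by a name stored in a string without any unquoting. The sketch below checks that it produces the same result as the sym() approach:

```r
library(dplyr)

x <- "mpg"

# Unquote a symbol built from the string
via_sym <- mtcars %>% mutate(Target = !!rlang::sym(x))
# Look the column up through the .data pronoun instead
via_pronoun <- mtcars %>% mutate(Target = .data[[x]])
```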
Also note that mutate_ has been deprecated.
We can use mutate_at, which also works for multiple columns:
library(dplyr)
mtcars %>%
  mutate_at(vars(all_of(x)), list(Target = ~ .))  # ~ . returns the selected column unchanged
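mutate_at() has since been superseded by across(). A rough equivalent, assuming dplyr >= 1.0.0, uses .names to give the copy a fixed name. With more than one selected column a constant name would collide, so a pattern such as "{.col}_Target" would be needed instead.

```r
library(dplyr)

x <- "mpg"

# Copy the column named in x into a new column called Target
out <- mtcars %>%
  mutate(across(all_of(x), ~ .x, .names = "Target"))
```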
You could use rlang::sym() or base R's get():
library(dplyr)
mtcars %>% mutate(Target = !!rlang::sym(x))
mtcars %>% mutate(Target = get(x))
You can also do it in base R:
mtcars$Target <- mtcars[[x]]  # x holds the column name as a string, e.g. "mpg"
I have this function
var_sup <- function(var1, var2)
{
  df$RD <- ifelse(df[var1] > df[var2], 1, 0)
  df$RD <- as.numeric(df$RD)
  return(df)
}
I want to rewrite it with dplyr so that I can call it like this, without quotes:
var_sup(num, num2)
compare_sup <- function(var1, var2) {
  # capture the argument without evaluating it
  var1 <- quo_name(enquo(var1))
  var2 <- quo_name(enquo(var2))
  # construct the expression
  df %>%
    mutate(RD = ifelse(!!var1 > !!var2, 1, 0))
}
I tried that, but I get an error.
Thank you!
The following works for me:
library(dplyr)
compare_sup <- function(var1, var2) {
  # capture the arguments without evaluating them
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  # construct the expression
  mtcars %>%
    mutate(RD = ifelse(!!var1 > !!var2, 1, 0))
}
compare_sup(drat, wt) %>% head
# mpg cyl disp hp drat wt qsec vs am gear carb RD
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
#4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0
#5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0
#6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 0
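I basically removed the quo_name() calls from the function (and used mtcars as the data set). quo_name() converts the captured expression into a plain string, so in your version !!var1 > !!var2 compared two strings rather than two columns.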
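A variant, assuming rlang >= 0.4.0, takes the data frame as an argument instead of hard-coding mtcars and uses the embrace operator {{ }}, which combines enquo() and !! in one step. as.numeric() on the logical comparison stands in for ifelse(..., 1, 0).

```r
library(dplyr)

compare_sup <- function(df, var1, var2) {
  # {{ }} captures each argument and unquotes it inside mutate()
  df %>%
    mutate(RD = as.numeric({{ var1 }} > {{ var2 }}))
}

res <- compare_sup(mtcars, drat, wt)
```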
Using mutate_() I used to provide a list of new variables and the logic needed to create them.
library(dplyr)
library(rlang)
list_new_var <-
list(new_1 = "am * mpg",
new_2 = "cyl + disp")
mtcars %>%
mutate_(.dots = list_new_var) %>%
head()
Now I want to transition to using tidy evaluation. I am in the process of understanding the new methods.
How can I make this work? Is a function generally recommended for this type of situation?
f_mutate <- function(data, new) {
a <- expr(new)
b <- eval(new)
c <- syms(new)
d <- UQ(syms(new))
e <- UQS(syms(new))
f <- UQE(syms(new))
data %>%
mutate(f) %>%
head()
}
f_mutate(mtcars, new = list_new_var)
One option is to build the list with quote(), so that each expression is stored without being evaluated:
list_new_var <-list(
new_1 = quote(am * mpg),
new_2 = quote(cyl + disp)
)
and, within f_mutate(), splice the list into mutate() with !!!:
f_mutate <- function(data, new) {
  data %>%
    mutate(!!!new)
}
Run the function:
f_mutate(mtcars, new = list_new_var) %>%
head
# mpg cyl disp hp drat wt qsec vs am gear carb new_1 new_2
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 21.0 166
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 21.0 166
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 22.8 112
#4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0.0 264
#5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0.0 368
#6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 0.0 231
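If the definitions must stay as strings (as in the original list_new_var), one option is to parse each string into an expression with rlang::parse_expr() and splice the resulting named list. This is a sketch that assumes the strings come from a trusted source, since the parsed code runs as written; f_mutate_chr and list_new_var_chr are hypothetical names.

```r
library(dplyr)
library(rlang)

# Definitions kept as strings, as in the question
list_new_var_chr <- list(new_1 = "am * mpg",
                         new_2 = "cyl + disp")

f_mutate_chr <- function(data, new) {
  # lapply() keeps the list names, which become the new column names
  new_exprs <- lapply(new, parse_expr)
  data %>%
    mutate(!!!new_exprs)
}

res <- f_mutate_chr(mtcars, list_new_var_chr)
```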
I don't think you need a function for this. I think you just need the following:
library(dplyr)
mtcars %>%
as_tibble() %>%
mutate(new_column1 = am * mpg,
new_column2 = cyl + disp) %>%
head()
Check out the first example here.
Any idea how I can manipulate dplyr variables programmatically?
This works:
out = "new_var"
mtcars %>%
mutate(!!out := mpg/carb)
But I really need to be able to adjust the variables in the division. I thought I could do it like this:
out = "new_var"
numer = "mpg"
denom = "carb"
mtcars %>%
mutate(!!out := !! quo(numer/denom))
but no dice:
Error in mutate_impl(.data, dots) :
Evaluation error: non-numeric argument to binary operator.
The result should look like this:
mpg cyl disp hp drat wt qsec vs am gear carb new_var
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5.250000
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5.250000
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 22.800000
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 21.400000
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 9.350000
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 18.100000
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 3.575000
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 12.200000
...
Any idea how this works?
SOLVED -------------------------------------------------
myFunction = function(df, col, col2, new_col) {
  col <- enquo(col)
  col2 <- enquo(col2)
  new_col <- quo_name(enquo(new_col))
  df %>%
    mutate(!!new_col := (!!col)/(!!col2))
}
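With newer tooling the same function can be written more compactly with the embrace operator, which also works inside a glue-style name on the left-hand side of :=. A sketch, assuming dplyr >= 1.0.0 with rlang >= 0.4.3:

```r
library(dplyr)

myFunction <- function(df, col, col2, new_col) {
  # "{{new_col}}" names the column after the bare argument passed in
  df %>%
    mutate("{{new_col}}" := {{ col }} / {{ col2 }})
}

res <- myFunction(mtcars, mpg, wt, mpg_based_new_col)
```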
myFunction(mtcars, mpg, wt, mpg_based_new_col)
If you want to turn a character value into a symbol that you can unquote, use rlang::sym() (or base R's as.name()). For example:
library(tidyverse)
out = "new_var"
numer = rlang::sym("mpg")
denom = rlang::sym("carb")
mtcars %>%
  mutate(!!out := (!!numer)/(!!denom))
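Note how we unquote each variable separately rather than the entire expression. In the original attempt, quo(numer/denom) captured the expression numer/denom itself, so it was evaluated as "mpg" / "carb", which is what triggered the non-numeric argument error.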
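When the new name and both operands all arrive as strings, a small wrapper can use the .data pronoun to look the columns up without building symbols. A sketch, assuming dplyr >= 0.7; divide_cols is a hypothetical helper name, and it assumes the data has no columns literally named numer or denom that could shadow the arguments.

```r
library(dplyr)

# Hypothetical helper: every argument is a column name given as a string
divide_cols <- function(df, out, numer, denom) {
  # !!out := names the new column; .data[[...]] fetches columns by name
  df %>%
    mutate(!!out := .data[[numer]] / .data[[denom]])
}

res <- divide_cols(mtcars, "new_var", "mpg", "carb")
```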