Pass equation as argument into function - r

I was told that passing equations as strings and evaluating them is bad practice. How can I write a function that takes an equation and evaluates it without using strings and without third-party packages?
This is my function:
replaceFormula <- function(df, column, formula){
  df[column] <- eval(parse(text=formula), df)
  return(df)
}
This is my use case:
set.seed(24)
dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
df <- as.data.frame(dataset)
replaceFormula(df, 'V5', 'V3+V4')
Update:
Is this also possible with conditions?
My example function:
replaceFactor <- function(df, column, condition, what){
  df[column] <- sapply(df[column], function(x) ifelse(eval(parse(text=condition), df), what, x))
  return(df)
}
My use case:
set.seed(24)
dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
df <- as.data.frame(dataset)
replaceFactor(df, 'V5', 'V1==1', 'GOOD')

It looks like you've built yourself a kludgy version of transform(), which is already part of base R:
> set.seed(24)
> dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
> df <- as.data.frame(dataset)
> transform(df, V5 = V3 + V4)
V1 V2 V3 V4 V5
1 1 5 3 5 8
2 1 1 2 1 3
3 4 4 4 NA NA
4 3 4 4 3 7
5 3 1 1 NA NA
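If a reusable function is still wanted, one base-R option is to capture the expression unevaluated with substitute() and evaluate it inside the data frame. The following is a minimal sketch rather than part of the original answer: the names replace_formula and replace_if and the toy data are made up for illustration, and it assumes the caller passes a bare (unquoted) expression.

```r
# Hypothetical helpers (names are illustrative, not from the original post).
replace_formula <- function(df, column, expr) {
  expr <- substitute(expr)                        # capture V3 + V4 without evaluating it
  df[[column]] <- eval(expr, df, parent.frame())  # look up names in df first, then in the caller
  df
}

replace_if <- function(df, column, condition, what) {
  hit <- eval(substitute(condition), df, parent.frame())
  hit <- !is.na(hit) & hit                        # treat NA conditions as "no match"
  df[[column]] <- ifelse(hit, what, df[[column]])
  df
}

# Toy data, assumed for the example only
toy <- data.frame(V1 = c(1, 4, 3), V3 = c(3, 2, 4), V4 = c(5, 1, NA))
replace_formula(toy, "V5", V3 + V4)
replace_if(toy, "V3", V1 == 1, "GOOD")
```

Because substitute() captures only what was typed at the immediate call site, wrapping these helpers inside another function needs extra care.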

We can pass the formula as a quosure and evaluate it by unquoting (!! or UQ) in the devel version of dplyr (or soon to be released 0.6.0)
library(dplyr)
replaceFormula <- function(dat, Col, form){
  Col <- quo_name(enquo(Col))
  dat %>%
    mutate(UQ(Col) := UQ(form))
}
replaceFormula(df, V5, quo(V3 + V4))
# V1 V2 V3 V4 V5
#1 1 5 3 5 8
#2 1 1 2 1 3
#3 4 4 4 NA NA
#4 3 4 4 3 7
#5 3 1 1 NA NA
Update
Based on the OP's comments, we can also pass an expression to evaluate and change the values based on that
replaceFormulaNew <- function(dat, Col, form, what){
  Col <- enquo(Col)
  ColN <- quo_name(Col)
  what <- quo_name(enquo(what))
  dat %>%
    mutate(UQ(ColN) := ifelse(UQ(form), what, UQ(Col)))
}
replaceFormulaNew(df, V5, quo(V1==1), GOOD)
# V1 V2 V3 V4 V5
#1 1 5 3 5 GOOD
#2 1 1 2 1 GOOD
#3 4 4 4 NA 4
#4 3 4 4 3 <NA>
#5 3 1 1 NA 1
replaceFormulaNew(df, V5, quo(V3 < V4), GOOD)
# V1 V2 V3 V4 V5
#1 1 5 3 5 GOOD
#2 1 1 2 1 3
#3 4 4 4 NA <NA>
#4 3 4 4 3 <NA>
#5 3 1 1 NA <NA>
enquo() takes the input argument and converts it to a quosure, while quo_name() converts it to a string, which mutate() then uses as the name of the column that receives the evaluated result.


Crossing .name_repair with duplicated column names

I would like to combine two data frames using crossing, but some columns have the same names. For those columns, I would like to add "_nameofdataframe" to the column name. Here are some reproducible data frames (dput below):
> df1
person V1 V2 V3
1 A 1 3 3
2 B 4 4 5
3 C 2 1 1
> df2
V2 V3
1 2 5
2 1 6
3 1 2
When I run the following code it will return duplicated column names:
library(tidyr)
crossing(df1, df2, .name_repair = "minimal")
#> # A tibble: 9 × 6
#> person V1 V2 V3 V2 V3
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A 1 3 3 1 2
#> 2 A 1 3 3 1 6
#> 3 A 1 3 3 2 5
#> 4 B 4 4 5 1 2
#> 5 B 4 4 5 1 6
#> 6 B 4 4 5 2 5
#> 7 C 2 1 1 1 2
#> 8 C 2 1 1 1 6
#> 9 C 2 1 1 2 5
As you can see, the duplicated column names are returned unchanged. My desired output should look like this:
person V1 V2_df1 V3_df1 V2_df2 V3_df2
1 A 1 3 3 1 2
2 A 1 3 3 1 6
3 A 1 3 3 2 5
4 B 4 4 5 1 2
5 B 4 4 5 1 6
6 B 4 4 5 2 5
7 C 2 1 1 1 2
8 C 2 1 1 1 6
9 C 2 1 1 2 5
So I was wondering if anyone knows a more automatic way to give the duplicated column names a name like in the desired output above with crossing?
dput of df1 and df2:
df1 <- structure(list(person = c("A", "B", "C"), V1 = c(1, 4, 2), V2 = c(3,
4, 1), V3 = c(3, 5, 1)), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(V2 = c(2, 1, 1), V3 = c(5, 6, 2)), class = "data.frame", row.names = c(NA,
-3L))
As you probably know, the .name_repair parameter can take a function. The problem is crossing() only passes that function one argument, a vector of the concatenated column names() of both data frames. So we can't easily pass the names of the data frame objects to it. It seems to me that there are two solutions:
Manually add the desired suffix to an anonymous function.
Create a wrapper function around crossing().
1. Manually add the desired suffix to an anonymous function
We can simply supply the suffix as a character vector to the anonymous function passed as .name_repair, e.g. suffix = c("_df1", "_df2").
crossing(
  df1,
  df2,
  .name_repair = \(x, suffix = c("_df1", "_df2")) {
    names_to_repair <- names(which(table(x) == 2))
    x[x %in% names_to_repair] <- paste0(
      x[x %in% names_to_repair],
      rep(
        suffix,
        each = length(unique(names_to_repair))
      )
    )
    x
  }
)
# person V1 V2_df1 V3_df1 V2_df2 V3_df2
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A 1 3 3 1 2
# 2 A 1 3 3 1 6
# 3 A 1 3 3 2 5
# 4 B 4 4 5 1 2
# 5 B 4 4 5 1 6
# 6 B 4 4 5 2 5
# 7 C 2 1 1 1 2
# 8 C 2 1 1 1 6
# 9 C 2 1 1 2 5
The disadvantage of this is that there is room for error when typing the suffix, and we might forget to update it if the data frames are renamed.
Also note that we are checking for names which appear twice. If one of your original data frames already has broken (duplicated) names then this function will also rename those columns. But I think it would be unwise to try to do any type of join if either data frame did not have unique column names.
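Since the repair function only ever sees one character vector of names, its logic can be tried on its own without tidyr. The sketch below is a base-R variant, not the code from the answer: repair_names is a hypothetical name, it tags each duplicated name by its order of occurrence instead of using rep(), and it assumes exactly two data frames in which no name appears more than once per data frame.

```r
# Hypothetical stand-alone repair function (illustrative name).
repair_names <- function(x, suffix = c("_df1", "_df2")) {
  is_dup <- x %in% x[duplicated(x)]                    # names that occur more than once
  occurrence <- ave(seq_along(x), x, FUN = seq_along)  # 1 for first occurrence, 2 for second
  x[is_dup] <- paste0(x[is_dup], suffix[occurrence[is_dup]])
  x
}

repair_names(c("person", "V1", "V2", "V3", "V2", "V3"))
# "person" "V1" "V2_df1" "V3_df1" "V2_df2" "V3_df2"
```

A function like this could then be supplied as .name_repair = repair_names.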
2. Create a wrapper function around crossing()
This might be more in the spirit of the tidyverse. The crossing() docs to which you linked state that crossing() is a wrapper around expand_grid(). The source for expand_grid() shows that it is basically a wrapper which uses map() to apply vctrs::vec_rep() to some inputs. So if we want to add another function to the call stack, there are two ways I can think of:
Using deparse(substitute())
crossing_fix_names <- function(df_1, df_2) {
  suffixes <- paste0(
    "_",
    c(deparse(substitute(df_1)), deparse(substitute(df_2)))
  )
  crossing(
    df_1,
    df_2,
    .name_repair = \(x, suffix = suffixes) {
      names_to_repair <- names(which(table(x) == 2))
      x[x %in% names_to_repair] <- paste0(
        x[x %in% names_to_repair],
        rep(
          suffix,
          each = length(unique(names_to_repair))
        )
      )
      x
    }
  )
}
# Output the same as above
crossing_fix_names(df1, df2)
The disadvantage of this is that deparse(substitute()) is ugly and can occasionally have surprising behaviour. The advantage is we do not need to remember to manually add the suffixes.
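To make the "surprising behaviour" concrete, here is a small base-R sketch (name_of and wrapper are hypothetical helpers, and df1 here is a stand-in data frame). It shows that deparse(substitute()) returns whatever text was written at the call site, which is not necessarily the name of a data frame.

```r
# Hypothetical helper: returns the text of the argument as the caller wrote it.
name_of <- function(x) deparse(substitute(x))

df1 <- data.frame(a = 1:3)

name_of(df1)           # "df1"
name_of(head(df1, 2))  # "head(df1, 2)": an expression, not a clean name

# Called through another function, it sees that function's argument name instead.
wrapper <- function(d) name_of(d)
wrapper(df1)           # "d"
```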
Using match.call()
crossing_fix_names2 <- function(df_1, df_2) {
  args <- as.list(match.call())
  suffixes <- paste0(
    "_",
    c(
      args$df_1,
      args$df_2
    )
  )
  crossing(
    df_1,
    df_2,
    .name_repair = \(x, suffix = suffixes) {
      names_to_repair <- names(which(table(x) == 2))
      x[x %in% names_to_repair] <- paste0(
        x[x %in% names_to_repair],
        rep(
          suffix,
          each = length(unique(names_to_repair))
        )
      )
      x
    }
  )
}
# Also the same output
crossing_fix_names2(df1, df2)
As this avoids the ugliness of deparse(substitute()) and we don't have to specify the suffix manually, I think this is probably the best approach.
Using the dput data, first test which column names of df1 also appear in df2:
colnames(df1) %in% colnames(df2)
[1] FALSE FALSE TRUE TRUE
rename
colnames(df2) <- paste0(colnames(df2), '_df2')
then cbind
cbind(df1,df2)
person V1 V2 V3 V2_df2 V3_df2
1 A 1 3 3 2 5
2 B 4 4 5 1 6
3 C 2 1 1 1 2
Not so elegant, but the columns remain easy to tell apart later.

Using purrr rowwise instead of apply() on whole row

I want to replace apply() (and its derivatives) with a purrr function.
I have a data.frame like this one:
> df
V1 V2 V3
1 NA 2 3
2 2 NA 3
3 3 3 NA
And I want to apply two functions row-wise, min(x, na.rm = T) and which.min(x), and return the results as a data frame.
If I know how many columns there are I can do e.g. this:
pmap_dfr(df, function(V1, V2, V3) {data.frame(min = pmin(V1, V2, V3, na.rm = T),
where = which.min(c(V1, V2, V3)))})
min where
1 2 2
2 2 1
3 3 1
How can I make pmap() or any other purrr function take the whole row as an argument, just like apply() does?
func <- function(x) {data.frame(min = min(x, na.rm = T), where = which.min(x))}
> Reduce(rbind, apply(df,1, func))
min where
V2 2 2
V1 2 1
V11 3 1
I probably just missed a feature or some trick. Thanks for your help.
Your solution will work for all columns if you use an ellipsis.
pmap_dfr(df, ~data.frame(min = min(..., na.rm = TRUE), where = which.min(c(...))))
min where
1 2 2
2 2 1
3 3 1
One possibility could be:
df %>%
mutate(min = invoke(pmin, na.rm = TRUE, .),
where = max.col(!is.na(-.)[, 1:length(.)], ties.method = "first"))
V1 V2 V3 min where
1 NA 2 3 2 2
2 2 NA 3 2 1
3 3 3 NA 3 1
Or if you want to keep just the last two columns:
df %>%
transmute(min = invoke(pmin, na.rm = TRUE, .),
where = max.col(!is.na(-.)[, 1:length(.)], ties.method = "first"))
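Note that max.col(!is.na(...)) locates the first non-missing column in each row, which happens to coincide with which.min() for this particular data. A base-R sketch that ranks by value instead is shown below; it is an alternative under the assumption that all columns are numeric, not the answer's own code.

```r
df <- data.frame(V1 = c(NA, 2, 3), V2 = c(2, NA, 3), V3 = c(3, 3, NA))
m <- as.matrix(df)

res <- data.frame(
  # pmin() across all columns at once, ignoring NA
  min = do.call(pmin, c(df, na.rm = TRUE)),
  # negate so the smallest value becomes the largest; NA is pushed to -Inf so it never wins
  where = max.col(-replace(m, is.na(m), Inf), ties.method = "first")
)
res
```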
Not a purrr-solution, but a data.table oneliner.
library(data.table)
dt <- fread("row V1 V2 V3
1 NA 2 3
2 2 NA 3
3 3 3 NA")
melt(dt, id.vars = "row")[ , .SD[which.min(value)], by = row]
row variable value
1: 1 V2 2
2: 2 V1 2
3: 3 V1 3

r: randomly assigning "1" or "2" in a vector based on double-occurrences in another vector

I constructed the code below. It should assign the value "1" or "2" to vector v2 if an element in vector v1 occurs twice. For example, "A" appears twice in v1, so in the respective rows v2 should read "1" once and "2" once.
The code mostly works, except that in some cases the same number is assigned to both rows of v2 when an element in v1 occurs twice, which should obviously not happen.
Can anybody help me with the issue? Thanks!
v1 <- c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K"))
v2 <- rep(3,length(v1))
df1 <- data.frame(v1,v2)
for (i in 1:length(df1$v1)) {
  if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==3) {
    df1$v2[i] <- sample(c(1,2),1,replace=TRUE)
  } else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==1) {
    df1$v2[i] <- 2
  } else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==2) {
    df1$v2[i] <- 1
  } else {
    df1$v2[i] <- 2
  }
}
I think that I have understood what you require and hopefully the below should do what you want, using dplyr. It will randomly assign integer values from 1 to n, where n is the number of occurrences of a given letter (note this is generalizable from your requirement of 2 occurrences).
library(dplyr)
df1 <- data.frame(v1 = c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K")))
df1 <- df1 %>%
  group_by(v1) %>%
  mutate(v2 = case_when(n() > 1 ~ sample(c(1:n()), n(), replace = FALSE),
                        TRUE ~ 1L))
v1 <- c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K"))
value = 1:length(v1)
v2 <- rep(3,length(v1))
df1 <- data.frame(v1,value,v2)
library(dplyr)
set.seed(9)
df1 %>%
  sample_frac(1) %>%              # shuffle rows
  group_by(v1) %>%                # for each v1 value
  mutate(v2 = row_number()) %>%   # count and flag occurrences
  ungroup() %>%                   # forget the grouping
  arrange(v1)                     # order by v1 (only for visualisation purposes)
# # A tibble: 18 x 3
# v1 value v2
# <fct> <int> <int>
# 1 A 1 1
# 2 A 2 2
# 3 B 4 1
# 4 B 3 2
# 5 C 5 1
# 6 C 6 2
# 7 D 7 1
# 8 D 8 2
# 9 E 9 1
#10 E 10 2
#11 F 12 1
#12 F 11 2
#13 G 14 1
#14 G 13 2
#15 H 15 1
#16 I 16 1
#17 J 17 1
#18 K 18 1
Using base R, I think you can arrive at what you want somewhat easily by using table and sequence in connection and manipulating the output.
Edit: After your comments, I now think I understand what you want.
res <- data.frame(v1, v2 = sequence(table(v1)), row.names = NULL)
res <- res[sample(1:nrow(res)), ] # Scramble data order
res <- res[order(res$v1), ] # Reorder by v1 column
# v1 v2
#1 A 1
#2 A 2
#3 B 1
#4 B 2
#5 C 1
#6 C 2
#7 D 2 # note 2 comes first here
#8 D 1
#9 E 1
#10 E 2
#11 F 1
#12 F 2
#13 G 1
#14 G 2
#15 H 1
#16 I 1
#17 J 1
#18 K 1
Edit2 "randomly" sorting before assigning:
df1 <- data.frame(v1)
df1[order(rank(v1, ties.method = "random")), "v2"] <- sequence(table(v1))
df1
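The same idea can be written in base R without reordering any rows: permute 1..n within each group of identical v1 values. This is a minimal sketch using ave() and sample(), offered as an alternative to the approaches above rather than a replacement for them.

```r
v1 <- c(rep(c("A", "B", "C", "D", "E", "F", "G"), rep(2, 7)), c("H", "I", "J", "K"))

set.seed(1)  # only to make the example reproducible
# For each group of equal v1 values, draw a random permutation of 1..n;
# a group of size one always gets 1.
v2 <- ave(seq_along(v1), v1, FUN = function(i) sample(length(i)))

df1 <- data.frame(v1, v2)
df1
```

Each letter that occurs twice receives one 1 and one 2 in random order, so two rows of the same letter can never get the same number.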

dplyr with string and NSE at same time

I like creating dplyr functions with character inputs, so I will be very happy with the new v0.6.0 coming up.
For fun and learning, with the current dplyr version 0.5.0.9004, I tried to make a flexible function that can take a character argument as well as an expression (NSE).
I did succeed, but can't this be done more elegantly?
d = data.frame(v1=1:2, v2=9:8)
fixquo <- function(x){
  # 'fixquo' expects an enquo-ed object, then sees if enquo-ing needs to be 'reversed'...
  if(!length(tryCatch({ls(get_env(x))}, error=function(e) "empty")))
    x = as.name(as.character(UQ(x))[2])
  x
}
Dtest <- function(d, x="v1", res="z"){
  x <- enquo(x) %>% fixquo()
  d %>% mutate(!!res := UQ(x))
}
Dtest(d)
Dtest(d, "v1")
Dtest(d, v1)
Dtest(d, v2)
It is not clear what the expected output is, as the OP's function returns the character string itself in some cases:
Dtest(d, "v1")
# v1 v2 z
#1 1 9 v1
#2 2 8 v1
#Warning message:
#In ls(get_env(x)) : ‘get_env(x)’ converted to character string
Here, we assume that the function should evaluate the input to get the values of the column 'v1':
DtestN <- function(dat, x, res = "z"){
  lst <- as.list(match.call())
  x <- if(is.character(lst$x)) {
    # parse_quosure() was renamed parse_quo() in later rlang releases
    rlang::parse_quosure(x, env = parent.frame())
  } else enquo(x)
  res <- if(!is.character(lst$res)) quo_name(enquo(res)) else res
  dat %>%
    mutate(UQ(res) := UQ(x))
}
DtestN(d, 'v1')
# v1 v2 z
#1 1 9 1
#2 2 8 2
DtestN(d, v1)
# v1 v2 z
#1 1 9 1
#2 2 8 2
DtestN(d, v1, z)
# v1 v2 z
#1 1 9 1
#2 2 8 2
DtestN(d, 'v1', z)
# v1 v2 z
#1 1 9 1
#2 2 8 2
Some more cases
DtestN(d, v1, new)
# v1 v2 new
#1 1 9 1
#2 2 8 2
DtestN(d, v1, 'new')
# v1 v2 new
#1 1 9 1
#2 2 8 2
DtestN(d, v2, 'new')
# v1 v2 new
#1 1 9 9
#2 2 8 8

Return a list in dplyr mutate()

I have a function in my real-world problem that returns a list. Is there any way to use this with dplyr's mutate()? This toy example doesn't work:
it = data.table(c("a","a","b","b","c"),c(1,2,3,4,5), c(2,3,4,2,2))
myfun = function(arg1,arg2) {
  temp1 = arg1 + arg2
  temp2 = arg1 - arg2
  list(temp1,temp2)
}
myfun(1,2)
it%.%mutate(new = myfun(V2,V3))
I see that it is cycling through the output of the function in the first "column" of the new variable, but do not understand why.
Thanks!
The idiomatic way to do this using data.table would be to use the := (assignment by reference) operator. Here's an illustration:
it[, c(paste0("V", 4:5)) := myfun(V2, V3)]
If you really want a list, why not:
as.list(it[, myfun(V2, V3)])
Alternatively, maybe this is what you want, but why don't you just use the data.table functionality:
it[, c(.SD, myfun(V2, V3))]
# V1 V2 V3 V4 V5
# 1: a 1 2 3 -1
# 2: a 2 3 5 -1
# 3: b 3 4 7 -1
# 4: b 4 2 6 2
# 5: c 5 2 7 3
Note that if myfun were to name its output, then the names would show up in the final result columns:
# V1 V2 V3 new.1 new.2
# 1: a 1 2 3 -1
# 2: a 2 3 5 -1
# 3: b 3 4 7 -1
# 4: b 4 2 6 2
# 5: c 5 2 7 3
Given the title to this question, I thought I'd post a tidyverse solution that uses dplyr::mutate. Note that myfun needs to output a data.frame to work.
library(tidyverse)
it = data.frame(
  v1 = c("a","a","b","b","c"),
  v2 = c(1,2,3,4,5),
  v3 = c(2,3,4,2,2))
myfun = function(arg1,arg2) {
  temp1 = arg1 + arg2
  temp2 = arg1 - arg2
  data.frame(temp1, temp2)
}
it %>%
  nest(data = c(v2, v3)) %>%
  mutate(out = map(data, ~myfun(.$v2, .$v3))) %>%
  unnest(cols = c(data, out))
#> # A tibble: 5 x 5
#> v1 v2 v3 temp1 temp2
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 a 1 2 3 -1
#> 2 a 2 3 5 -1
#> 3 b 3 4 7 -1
#> 4 b 4 2 6 2
#> 5 c 5 2 7 3
Created on 2020-02-04 by the reprex package (v0.3.0)
The mutate() function is designed to add new columns to an existing data frame. A data frame is a list of vectors of the same length, so each new column needs one element per row. Your function returns a list of two vectors rather than a single vector with one element per row, so its result cannot be added directly as one new column.
You can rewrite your function as two functions, each of which returns a vector. Then apply each of these separately using mutate() and it should work.
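A minimal sketch of the two-function idea follows. It uses base R's transform() in place of mutate() so that it runs without extra packages, and the helper names add_cols and sub_cols are invented for illustration. The last lines show a base-R alternative that keeps the list-returning function and assigns its two elements to two columns in one step.

```r
it <- data.frame(V1 = c("a", "a", "b", "b", "c"),
                 V2 = c(1, 2, 3, 4, 5),
                 V3 = c(2, 3, 4, 2, 2))

# Hypothetical helpers: one vector-returning function per new column
add_cols <- function(arg1, arg2) arg1 + arg2
sub_cols <- function(arg1, arg2) arg1 - arg2

out <- transform(it, V4 = add_cols(V2, V3), V5 = sub_cols(V2, V3))
out

# Alternative: keep the list-returning function and fill two columns at once
myfun <- function(arg1, arg2) list(arg1 + arg2, arg1 - arg2)
it[c("V4", "V5")] <- myfun(it$V2, it$V3)
it
```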
