Dplyr indirection / pipe doesn't work inside a closure

I have some code that uses dplyr indirection:
library(dplyr)
createGenerator <- function(data, column)
{
  values <- data %>% pull({{column}})
  function(n)
  {
    values %>% sample(n)
  }
}
df <- data.frame(x = 1:10, y = 1:10)
df %>% createGenerator(x)(1)
It gives me an error:
Error in pull(., { : object 'x' not found
However, if I don't create a closure, it works, as in the code below:
createGenerator <- function(data, column, n)
{
  values <- data %>% pull({{column}}) %>% sample(n)
}
But I need to be able to create a closure. What am I missing in the closure-creation code?

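The problem is with the pipes, but it comes from the pipe at the call site rather than from scoping inside the closure. magrittr inserts the left-hand side as the first argument of the outermost call on the right, so df %>% createGenerator(x)(1) is evaluated as createGenerator(x)(df, 1). createGenerator then receives x as data and no column, and forcing data inside pull() fails with object 'x' not found. Calling createGenerator(df, x)(1) directly, or piping as (df %>% createGenerator(x))(1), avoids this; both versions below call the function without a pipe at the call site.
Without the pipe inside the function (which I personally prefer, but that is a matter of taste):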
library(dplyr)
createGenerator <- function(data, column) {
  values <- pull(data, {{ column }})
  function(n) {
    sample(values, n)
  }
}
df <- data.frame(x = 1:10, y = 1:10)
createGenerator(df, x)(2)
#> [1] 4 5
Alternatively, create values within the enclosed function; then the pipe works.
createGenerator <- function(data, column) {
  function(n) {
    values <- data %>% pull({{column}})
    values %>% sample(n)
  }
}
createGenerator(df, x)(2)
#> [1] 7 5
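One thing worth checking is how the pipe at the call site is parsed. The following is a minimal sketch, not a definitive diagnosis; it assumes magrittr's rule that the left-hand side becomes the first argument of the outermost call on the right, and make_gen is a hypothetical stand-in for createGenerator from the question.

```r
library(dplyr)

# Hypothetical stand-in for createGenerator from the question
make_gen <- function(data, column) {
  values <- data %>% pull({{ column }})
  function(n) values %>% sample(n)
}

df <- data.frame(x = 1:10, y = 1:10)

# df %>% make_gen(x)(1) is treated as make_gen(x)(df, 1), so make_gen
# receives x as its data argument and the call fails; capture the message
piped <- tryCatch(
  df %>% make_gen(x)(1),
  error = function(e) conditionMessage(e)
)

# Parenthesising the piped part builds the closure first, then calls it
ok <- (df %>% make_gen(x))(3)
```

Here piped holds the error message instead of a sample, while ok holds three values drawn from column x.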

Related

How to write a function with an unspecified number of arguments where the arguments are column names

I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found "
I can use attach() and detach() within the function as a workaround:
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks
1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
## x y pasted
## 1 1 a 1 a
## 2 2 b 2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))
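To see why solution 1 works, it can help to look at what substitute() produces before eval() runs it. This is only an illustrative sketch; show_call is a hypothetical helper that returns the captured call instead of evaluating it.

```r
# Return the unevaluated call instead of evaluating it (illustration only)
show_call <- function(data, ...) {
  substitute(within(data, pasted <- paste(...)))
}

df <- data.frame(x = c(1, 2), y = c("a", "b"))

# The argument expressions replace data and ... inside the call
captured <- show_call(df, x, y)
print(captured)

# Evaluating the captured call lets within() find x and y as columns of df
res <- eval(captured)
```

Because the column names are spliced into the call as symbols, within() looks them up in the data frame rather than in the calling environment.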

Error with tidy select when feeding column names into purrr::map for user function

I have a long function that uses a dataframe column name as an input and am trying to apply it to several different column names without a new line of code each time. I am having issues with tidyselect within the function called by map. I believe the issue is related to defusing, but I cannot figure it out. A toy example using mtcars data is below.
This works correctly with map:
library(tidyverse)
sum_dplyr <- function(df, x) {
res <- df %>% summarise(mean = mean({{x}}, na.rm = TRUE))
return(res)
}
sum_dplyr(mtcars, disp)
map(names(mtcars), ~ sum_dplyr(mtcars, mtcars[[.]])) # all columns -> works fine
While this gives the error "Must subset columns with a valid subscript vector" when feeding the function through map:
library(tidyverse)
sel_dplyr <- function(df, x) {
res <- df %>% dplyr::select({{x}})
return(res)
}
sel_dplyr(mtcars, disp) # ok
map(names(mtcars), ~ sel_dplyr(mtcars, mtcars[[.]])) # all columns -> error
What am I missing here ? Many thanks !
It may be better to change the function so that it accepts both quoted and unquoted column names. With map, we are passing a character string, so instead of {{}} we can use ensym with !!:
sum_dplyr <- function(df, x) {
x <- rlang::ensym(x)
res <- df %>%
summarise(mean = mean(!!x, na.rm = TRUE))
return(res)
}
Similarly for sel_dplyr
sel_dplyr <- function(df, x) {
x <- rlang::ensym(x)
res <- df %>%
dplyr::select(!! x)
return(res)
}
and then test as
library(purrr)
library(dplyr)
map(names(mtcars), ~ sel_dplyr(mtcars, !!.x))
sel_dplyr(mtcars, carb)
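If the functions only ever need to accept column names as strings, another option is to use the .data pronoun and all_of(), both of which dplyr exports. The sketch below is one possible variant, not the approach from the answer above; sum_chr and sel_chr are hypothetical names. It also hints at why the original call presumably failed: mtcars[[.]] passes the column's values, which mean() accepts but select() tries to interpret as column positions.

```r
library(dplyr)
library(purrr)

# Hypothetical variants that take the column name as a string
sum_chr <- function(df, col) {
  df %>% summarise(mean = mean(.data[[col]], na.rm = TRUE))
}
sel_chr <- function(df, col) {
  df %>% select(all_of(col))
}

# Iterate over the column names themselves, not the column values
means <- map(names(mtcars), ~ sum_chr(mtcars, .x))
cols  <- map(names(mtcars), ~ sel_chr(mtcars, .x))
```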

Applying a Function to a Data Frame : lapply vs traditional way

I have this data frame in R:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
I also have this function:
some_function <- function(x,y) { return(x+y) }
Basically, I want to create a new column in the data frame based on "some_function". I thought I could do this with the "lapply" function in R:
data_frame$new_column <-lapply(c(data_frame$x, data_frame$y),some_function)
This does not work:
Error in `$<-.data.frame`(`*tmp*`, f, value = list()) :
replacement has 0 rows, data has 8281
I know how to do this in a more "clunky and traditional" way:
data_frame$new_column = x + y
But I would like to know how to do this using "lapply" - in the future, I will have much more complicated and longer functions that will be a pain to write out like I did above. Can someone show me how to do this using "lapply"?
Thank you!
When working within a data.frame you could use apply instead of lapply:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }
data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)
To apply a function over rows, set MARGIN = 1; to apply it over columns, set MARGIN = 2.
lapply, as the name suggests, is a list-apply. Since a data.frame is a list of columns, you can use lapply to compute over columns, but for rectangular data apply is often the easiest.
If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame, as in
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(row) { return(row[1]+row[2]) }
data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)
Final comment: functions written for a single pair of values often turn out to be perfectly vectorized. Probably the best way to call some_function is without any function of the apply family, as in
some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)
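If an element-by-element call in the lapply style is really needed, for example because a future function is not vectorized, mapply can walk two columns in parallel. This is a sketch under that assumption. Note that expand.grid names its columns Var1 and Var2, so data_frame$x and data_frame$y in the question are NULL, which is why lapply returned an empty list.

```r
x <- seq(1, 10, 0.1)
y <- seq(1, 10, 0.1)
data_frame <- expand.grid(x, y)  # columns are named Var1 and Var2

some_function <- function(x, y) x + y

# Call some_function once per row, pairing the two columns element by element
data_frame$new_column <- mapply(some_function, data_frame$Var1, data_frame$Var2)
```

For a function that is already vectorized, the direct call shown above remains simpler and faster.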

How to dynamically name files in lapply?

Here's some sample code:
dat <- data.frame(test1=runif(100,0,1),
test2=runif(100,0,1),
test3=runif(100,0,1))
variable_of_interest <- "test1"
dat_multi <- 3
test_save <- function(dat, x_seq) {
saveRDS(dat, file=paste0("data/", variable_of_interest,"_", x_seq,".RDS"))
}
test_func <- function(dat, dat_multi,x) {
res <- as.data.frame(as.matrix(dat*dat_multi))
dat_test <- rbind(melt(res$test1) %>% mutate(Var = "First_Testing_1"),
melt(res$test2) %>% mutate(Var = "Different_Testing_2"),
melt(res$test3) %>% mutate(Var = "Very_Different_Testing_3"))
test_save(dat_test,x)
}
x_seq <- 1:3
lapply(x_seq, function(x) test_func(dat, dat_multi, x))
Just looking to save files with the variable_of_interest and the iteration in the filename:
data/test1_1.RDS
data/test1_2.RDS
data/test1_3.RDS
The loop can be written as
lapply(x_seq, function(x) test_save(dat, x))
It is better to give the lambda argument a name that differs from any object already created in the global environment (such as x_seq). Also, if 'dat' is the same in every iteration, the lambda function only needs a single argument.
In the updated function, there are some issues in the code, i.e. $ is used for extraction on a matrix object; [ should be used instead. The function can be made more compact:
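library(reshape2)  # assumed source of melt()
test_func <- function(dat, dat_multi, x) {
  res <- as.matrix(dat * dat_multi)
  # a matrix carries column names, so set colnames() rather than names()
  colnames(res) <- paste0("Testing_", seq_len(ncol(res)))
  test_save(melt(res), x)
}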
lapply(x_seq, function(x) test_func(dat, dat_multi, x))
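The file-naming step on its own can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper save_iteration and writing to tempdir() instead of data/ so that it runs without an existing directory.

```r
# Hypothetical helper: build the file name from a prefix and the iteration index
save_iteration <- function(obj, prefix, i, dir = tempdir()) {
  path <- file.path(dir, paste0(prefix, "_", i, ".RDS"))
  saveRDS(obj, file = path)
  path  # return the path so the caller can see what was written
}

variable_of_interest <- "test1"
dat <- data.frame(test1 = runif(100), test2 = runif(100), test3 = runif(100))

# One file per iteration: test1_1.RDS, test1_2.RDS, test1_3.RDS
paths <- vapply(
  1:3,
  function(i) save_iteration(dat, variable_of_interest, i),
  character(1)
)
basename(paths)
```

Passing the prefix as an argument, rather than reading variable_of_interest from the global environment, keeps the helper independent of where it is called.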

programming with dplyr::arrange in dplyr v.0.7

I am trying to get my head around the new implementations in dplyr with respect to programming and non-standard evaluation. So the verb_ functions are replaced by applying enquo to the argument and then using !! in the regular verb function. Translating select from old to new works fine; the following functions give similar results:
select_old <- function(x, ...) {
vars <- as.character(match.call())[-(1:2)]
x %>% select(vars)
}
select_new <- function(x, ...) {
vars <- as.character(match.call())[-(1:2)]
vars_enq <- enquo(vars)
x %>% select(!!vars_enq)
}
However when I try to use arrange in the new programming style I'll get an error:
arrange_old <- function(x, ...) {
vars <- as.character(match.call())[-(1:2)]
x %>% arrange_(vars)
}
arrange_new <- function(x, ...){
vars <- as.character(match.call())[-(1:2)]
vars_enq <- enquo(vars)
x %>% arrange(!!vars_enq)
}
mtcars %>% arrange_new(cyl)
# Error in arrange_impl(.data, dots) :
# incorrect size (1) at position 1, expecting : 32
32 is obviously the number of rows of mtcars; the inner function of dplyr apparently expects a vector of this length. My questions are: why does the new programming style not translate for arrange, and how do I do it in the new style?
You are overthinking it. Use the appropriate function to deal with .... No need to use match.call at all (also not in the old versions, really).
arrange_new <- function(x, ...){
dots <- quos(...)
x %>% arrange(!!!dots)
}
Of course this function does the exact same as the normal arrange, but I guess you are just using this as an example.
You can write a select function in the same way.
The arrange_old should probably have looked something like:
arrange_old <- function(x, ...){
dots <- lazyeval::lazy_dots(...)
x %>% arrange_(.dots = dots)
}
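The remark that a select function can be written in the same way can be sketched like this, assuming the same quos() and !!! pattern as in arrange_new above (select_new here is a rewrite of the question's function, not code from the original answer):

```r
library(dplyr)

# Capture the dots as quosures and splice them into select()
select_new <- function(x, ...) {
  dots <- quos(...)
  x %>% select(!!!dots)
}

res <- mtcars %>% select_new(cyl, mpg)
head(res)
```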
You don't actually need rlang in this situation. This will work:
my_arrange <- function(x, ...) arrange(x, ...)
# test
DF <- data.frame(a = c(2, 2, 1, 1), b = 4:1)
DF %>% my_arrange(a, b)
