R - Creating function call within function using relational operator as variable - r

I am trying to write a function that will apply a user-specified binary operator (e.g. < ) to a raster object. To do so is fairly simple. For example:
selection <- raster::overlay(x = data, fun = function(x) {return(x < 2)})
My issue is that this code will run inside a function, to which I would like to pass both the binary operator and the criterion value (2 in the example above) as arguments. For example:
my.func <- function(data, binary_operator, value){
  # pseudocode: the operator should be substituted between x and value
  selection <- raster::overlay(x=data, fun=function(x) {x binary_operator value})
  return(selection)
}
I have tried to construct the function as a call without success.
my.func <- function(data, binary_operator, value){
  selection <- raster::overlay(x=data, fun=function(x) {call(sprintf("x %s %s", binary_operator, value))})
  return(selection)
}
Is there a way to construct the call of the second function using variables in the first function?
Thanks for your help.

Write your code like this:
my.func <- function(data, binary_operator, value){
  selection <- raster::overlay(x=data, fun=function(x) binary_operator(x, value))
  return(selection)
}
You need to call this as
my.func(data, `<`, 2)
(with backticks for quotes). If you want to allow "<" for the operator, you could use do.call:
my.func <- function(data, binary_operator, value){
  selection <- raster::overlay(x=data, fun=function(x)
    do.call(binary_operator, list(x, value)))
  return(selection)
}
This will work with either form of argument.
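The same idea can be tried without a raster object. The following is a minimal sketch on a plain numeric vector; `apply_operator` and `threshold` are hypothetical names, and base R's `match.fun()` is used here as an alternative to `do.call` for accepting either a function or its name as a string.

```r
# Hypothetical helper: apply a user-chosen comparison to a numeric vector.
apply_operator <- function(values, binary_operator, threshold) {
  # match.fun() returns a function unchanged and looks up a string such as "<"
  op <- match.fun(binary_operator)
  op(values, threshold)
}

x <- c(1, 2, 3)
apply_operator(x, `<`, 2)   # operator passed as a function
apply_operator(x, "<", 2)   # operator passed as a string
apply_operator(x, ">=", 2)  # any other binary comparison works the same way
```

Resolving the operator once with `match.fun()` also gives an early, readable error if the caller passes something that is not a function.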

The example is probably simpler than the real case, but in the example you use, it would be more direct to do:
selection <- data < 2

Related

Calling a character string into object names within a function

I've currently got a very lengthy and repetitive bit of code for data normalisation and inversion, ((x-min)/(max-min)*-1)+1, that I want to clean up a bit.
This is a small sample of what it currently looks like:
W3_E1_Norm_New <- W3_E1_Average %>%
  mutate(W3_E1_Norm_New = ((W3_E1_zoo-W3_E1_Min)/(W3_E1_Max-W3_E1_Min)*-1)+1)
W3_E2_Norm_New <- W3_E2_Average %>%
  mutate(W3_E2_Norm_New = ((W3_E2_zoo-W3_E2_Min)/(W3_E2_Max-W3_E2_Min)*-1)+1)
W3_E3_Norm_New <- W3_E3_Average %>%
  mutate(W3_E3_Norm_New = ((W3_E3_zoo-W3_E3_Min)/(W3_E3_Max-W3_E3_Min)*-1)+1)
Each 'W3_E1' refers to a sample ID, and at present each sample ID requires the two lines of code to be written out each time.
Ideally I'd like to write a function which can call a character string (Sample_IDs) into the names of each data frame, so something like
a_Norm_New
would return
W3_E1_Norm_New
then
W3_E2_Norm_New
etc.
Is there a way to write a function that could accomplish this?
Many thanks
I don't have your data but this should work. Define a function:
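my_fun <- function (x) {
  norm_new <- paste0(x, "_Norm_New")
  average <- paste0(x, "_Average")
  zoo_name <- paste0(x, "_zoo")
  min_name <- paste0(x, "_Min")
  max_name <- paste0(x, "_Max")
  # get() turns each name into its value: a column of the data frame if one
  # exists with that name, otherwise a variable in the calling environment
  df <- get(average) %>%
    mutate(new_norm = ((get(zoo_name) - get(min_name)) / (get(max_name) - get(min_name)) * -1) + 1)
  # assign() takes the name first, then the value; write to the global
  # environment so the result survives after the function returns
  assign(norm_new, df, envir = globalenv())
}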
Then run a for loop:
Sample_IDs <- c("W3_E1", "W3_E2", "W3_E3")
for (i in Sample_IDs) {
  my_fun(i)
}
With data.table, it is very easy to write functions that use quoted variable names (see a blog post I wrote on the subject).
Here, we paste the pattern of your column name everywhere with the sufx variable:
library(data.table)
normalize <- function(dt, sufx = "W3_E1"){
  df <- as.data.table(dt)
  # build the new column name from the suffix and look up each input column by name
  df[, (paste0(sufx,"_Norm_New")) := (
    (get(paste0(sufx,"_zoo")) - get(paste0(sufx,"_Min"))
    )/(
    get(paste0(sufx,"_Max")) - get(paste0(sufx,"_Min"))
    )*-1)+1]
}
The code is not easy to read because I wanted to show that this can be done in one expression, but you can easily make it more readable by storing the pasted names in intermediate variables.
In this solution, you use get to unquote your variable name.
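An alternative that avoids `get` and `assign` altogether is to keep the per-sample data frames in a named list and apply one function to each element. This is only a sketch under assumptions: the helper `invert_normalise`, the column name `zoo`, and the toy values are hypothetical, and the minimum and maximum are taken from the column itself rather than from separate `_Min`/`_Max` objects.

```r
# Inverted min-max normalisation: ((x - min) / (max - min) * -1) + 1
invert_normalise <- function(x, lo = min(x), hi = max(x)) {
  ((x - lo) / (hi - lo) * -1) + 1
}

# Hypothetical sample data, one data frame per sample ID
samples <- list(
  W3_E1 = data.frame(zoo = c(0, 5, 10)),
  W3_E2 = data.frame(zoo = c(2, 4, 6))
)

# Add the normalised column to every data frame; the list names act as sample IDs
normalised <- lapply(samples, function(df) {
  df$Norm_New <- invert_normalise(df$zoo)
  df
})

normalised$W3_E1$Norm_New
```

Each result is then reached by name, for example `normalised[["W3_E2"]]`, so no object names have to be pasted together.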

Object not found - nested function - R

I am still getting used to functions. I had a look at the documentation on environments, but I can't figure out how to solve the error. Let's see what I have tried so far:
I have a list of documents. Let's suppose it is "core":
library(dplyr)
table_1 <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
table_2 <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
core <- list(table_1, table_2)
Then, I have to run the function documents_ for each element of the list. This function gives some parameters to execute in another nested function:
documents_ <- function(i) {
  core_processed <- as.data.frame(core[[i]])
  x <- 1:nrow(core_processed)
  y <- 1:ncol(core_processed)
  temp <- sapply(x, function(x) mapply(calc_dens_,x,y))
  return(temp)
}
Inside that, there is the function calc_dens_, which is:
calc_dens_ <- function(x, y) {
  core_temp <- core_processed %>%
    filter(X2 == x & X3 == y)
  return(core_temp)
}
Then, to iterate over each element of the list, I tried without success:
calc <- lapply(c(1:2), function(i) documents_(i))
Error in eval(lhs, parent, parent) : object 'core_processed' not found
The calc_dens_ function doesn't see the core_processed object created in documents_ (an environment problem). Is there a way to solve this, or a better approach? My function is more complex than this, but the main elements are in this example. Thank you in advance.
As the other commenters have said, the problem is that you are referring to a variable, core_processed that is not in scope. You can make it a global variable, but it might be more sensible just to use it in a closure like this:
table_1 <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
table_2 <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
cores <- list(table_1, table_2)
documents_ <- function(core_processed) {
  x <- 1:nrow(core_processed)
  y <- 1:ncol(core_processed)
  calc_dens <- function(x, y) core_processed %>% filter(X2 == x & X3 == y)
  sapply(x, function(x) mapply(calc_dens, x, y))
}
calc <- lapply(cores, documents_)
If cores is a list of data frames, you do not need to use as.data.frame, and since you use lapply, there is no need to apply over indices and then index into the list. So the code I wrote here is simplified but does the same as your code.
I have to wonder, though, is this really what you want? The sapply over x and then mapply over x and y -- where x is the one from the sapply and not the list you built in documents_ -- looks mighty strange to me.
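The scoping rule behind the error can be reproduced in base R without dplyr. In this sketch all names (`helper_global`, `caller`, `local_value`) are hypothetical; it assumes no variable called `local_value` exists in the global environment.

```r
# A free variable is looked up where the function was DEFINED, not where it is called.
helper_global <- function() local_value   # defined at top level

caller <- function() {
  local_value <- 42
  # helper_global() cannot see caller()'s local_value, so this call errors
  outside <- tryCatch(helper_global(), error = function(e) "not found")
  # A function defined inside caller() closes over local_value
  helper_local <- function() local_value
  list(outside = outside, inside = helper_local())
}

result <- caller()
result$outside
result$inside
```

This is why moving `calc_dens` inside `documents_`, or passing the data frame as an explicit argument, makes the object visible.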

dplyr and overlapping variable names with surrounding environment

Let's say I have a (dplyr/tibble) data-frame/tbl constructed like so:
df <- data_frame(x = 1:10)
Now, I'd like to use this within a function that works with df via some dplyr verbs, like so:
myfun <- function(df, x) {
  x <- doSomeStuffTo(x)
  filter(df, x == x)
}
But this will always return the full df... I'm trying to figure out a way to implement scoping within a dplyr verb, something like:
filter_(df, ~x == x)
... which doesn't work, either. In some other languages, you might be able to achieve this via something like:
df.filter(this.x == x)
... where this refers to the df instance.
My only work-around so far is naming the function's variable like so:
myfun <- function(df, query_x) {
  query_x <- doSomeStuffTo(query_x)
  filter(df, x == query_x)
}
I suspect this is doable (without using a name like query_x) somehow with SE dplyr verbs (e.g. filter_), but I haven't stumbled upon the correct pattern yet. Anyone here have the answer?
To dynamically build different dplyr commands you typically use the standard evaluation versions of the functions (the ones with the underscores) and the lazyeval package. Here's how you could change your function
doSomeStuffTo <- function(x) {x+1}
myfun <- function(df, x) {
  x <- doSomeStuffTo(x)
  filter_(df, lazyeval::interp(~x == y, y=x))
}
df <- data_frame(x = 1:10)
myfun(df,3)
but even in the interp we can't have x==x because it's not clear which x you want to replace. Both filter(df, 3==x) and filter(df, x==3) work with dplyr. You can have constants or column names on either side of the equality.
If you use filter_ you can pass logical expressions via quote:
myfun <- function(df, t) {
  df$x <- 5*df$x
  filter_(df, t)
}
> myfun(df, t = quote(x < 25))
# A tibble: 4 x 1
      x
  <dbl>
1     5
2    10
3    15
4    20
I stumbled into the same issue. Instead of wrangling with even more complex evaluations, it's usually easier to just rename the function argument. Like this:
myfun <- function(df, x) {
  x_ <- doSomeStuffTo(x)
  filter(df, x == x_)
}
This solution is still dangerous because we might hit another variable called x_. One can be defensive about this by checking the variable names in df and making sure to pick one that isn't there. Or more lazily, one can use very implausible variable names. I often use stuff like _____temp.
Maybe the new dplyr 0.6.0 evaluation system will handle this better. See the notes about the new system, tidyeval.
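With tidy evaluation, the ambiguity can be resolved explicitly. The sketch below assumes a recent dplyr (1.0 or later) in which the `.data` and `.env` pronouns are available inside verbs; `doSomeStuffTo` is the stand-in function from the answer above, not a real library function.

```r
library(dplyr)

doSomeStuffTo <- function(x) x + 1  # stand-in used earlier in this thread

myfun <- function(df, x) {
  x <- doSomeStuffTo(x)
  # .data$x is the column of df; .env$x is the function's local variable
  filter(df, .data$x == .env$x)
}

df <- tibble(x = 1:10)
res <- myfun(df, 3)
res
```

Here the argument can keep the name `x` because each side of the comparison states where its `x` comes from.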

Not passing all optional arguments in apply

I am having a problem with the apply function passing on arguments to a function that does not need them. I understand that apply doesn't know what to do with the optional arguments and just passes them on to the function.
But anyhow, here is what I would like to do:
First I want to specify a list of functions that I would like to use.
functions <- list(length, sum)
Then I would like to create a function which apply these specified functions on a data set.
myFunc <- function(data, functions) {
  for (i in 1:length(functions)) print(apply(X=data, MARGIN=2, FUN=functions[[i]]))
}
This works fine.
data <- cbind(rnorm(100), rnorm(100))
myFunc(data, functions)
[1] 100 100
[1] -0.5758939 -5.1311173
But I would also like to use additional arguments for some functions, e.g.
power <- function(x, p) x^p
This doesn't work as I want it to. If I modify myFunc to:
myFunc <- function(data, functions, ...) {
  for (i in 1:length(functions)) print(apply(X=data, MARGIN=2, FUN=functions[[i]], ...))
}
functions as
functions <- list(length, sum, power)
and then try my function I get
myFunc(data, functions, p=2)
Error in FUN(newX[, i], ...) :
2 arguments passed to 'length' which requires 1
How may I solve this issue?
Sorry for the wall of text. Thank you!
You can use Curry from the functional package to fix the parameter you want, put the resulting function in the list of functions you want to apply, and finally iterate over this list:
library(functional)
power <- function(x, p) x^p
funcs = list(length, sum, Curry(power, p=2), Curry(power, p=3))
lapply(funcs, function(f) apply(data, 2 , f))
With your code you can use:
functions <- list(length, sum, Curry(power, p=2))
myFunc(data, functions)
I'd advocate using Colonel's Curry approach, but if you want to stick to base R you can always:
funcs <- list(length, sum, function(x) power(x, 2))
which is roughly what Curry ends up doing.
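Put together, the base R approach can be sketched as follows. The list names (`length`, `sum`, `square`) and the small matrix `d` are illustrative choices, not part of the original question.

```r
power <- function(x, p) x^p

# Fix p = 2 by wrapping power() in an anonymous function, so every
# function in the list takes exactly one argument
funcs <- list(
  length = length,
  sum    = sum,
  square = function(x) power(x, p = 2)
)

d <- matrix(1:6, nrow = 2)

# Apply each function to the columns of d
results <- lapply(funcs, function(f) apply(d, 2, f))
results$sum
results$square
```

Because the extra argument is bound before the function enters the list, `apply` never has to forward `...` to functions that do not accept it.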
One option is to pass the parameters in a list with the arguments needed for each function. You can add those parameters to the others needed for apply using c and then use do.call to call the function. Something like this. I also wrap all the output in a list here rather than using print; your usage may vary.
power <- function(x, p) x^p
myFunc <- function(data, functions, parameters) {
  lapply(seq_along(functions), function(i) {
    p0 <- list(X=data, MARGIN=2, FUN=functions[[i]])
    do.call(apply, c(p0, parameters[[i]]))
  })
}
d <- matrix(1:6, nrow=2)
functions <- list(length, sum, power)
# one entry per function: NULL for no extra arguments, a named list otherwise
parameters <- list(NULL, NULL, list(p=3))
myFunc(d, functions, parameters)
You can use the lazyeval package:
library(lazyeval)
my_evaluate <- function(data, expressions, ...) {
  lapply(expressions, function(e) {
    apply(data, MARGIN=2, FUN=function(x) {
      lazy_eval(e, c(list(x=x), list(...)))
    })
  })
}
And use it like this:
my_expressions <- lazy_dots(sum = sum(x), sumpow = sum(x^p), length_k = length(x)*k )
data <- cbind(rnorm(100), rnorm(100))
my_evaluate(data, my_expressions, p = 2, k = 2)

character string as function argument r

I'm working with dplyr and created code to compute new data that is plotted with ggplot.
I want to create a function with this code. It should take the name of a column of the data frame that is manipulated by dplyr. However, trying to work with column names does not work. Please consider the minimal example below:
df <- data.frame(A = seq(-5, 5, 1), B = seq(0,10,1))
library(dplyr)
foo <- function (x) {
  df %>%
    filter(x < 1)
}
foo(B)
Error in filter_impl(.data, dots(...), environment()) :
object 'B' not found
Is there any solution to use the name of a column as a function argument?
If you want to create a function which accepts the string "B" as an argument (as in your question's title):
foo_string <- function (x) {
  eval(substitute(df %>% filter(xx < 1),list(xx=as.name(x))))
}
foo_string("B")
If you want to create a function which captures B as an argument (as in dplyr):
foo_nse <- function (x) {
  # capture the argument without evaluating it
  x <- substitute(x)
  eval(substitute(df %>% filter(xx < 1),list(xx=x)))
}
foo_nse(B)
You can find more information in Advanced R
Edit
dplyr makes things easier in version 0.3. Functions with suffixes "_" accept a string or an expression as an argument
foo_string <- function (x) {
  # construct the string
  string <- paste(x,"< 1")
  # use filter_ instead of filter
  df %>% filter_(string)
}
foo_string("B")
foo_nse <- function (x) {
  # capture the argument without evaluating it
  x <- substitute(x)
  # construct the expression
  expression <- lazyeval::interp(quote(xx < 1), xx = x)
  # use filter_ instead of filter
  df %>% filter_(expression)
}
foo_nse(B)
You can find more information in this vignette
I remember a similar question which was answered by @Richard Scriven. I think you need to write something like this.
foo <- function(x,...)filter(x,...)
What @Richard Scriven mentioned was that you need to use ... here. If you type ?dplyr::filter, you will find the signature filter(.data, ...). I think you can replace .data with x or whatever name you like. If you want to pick up the rows of your df that have values smaller than 1 in B, it will look like this.
foo <- function (x,...) filter(x,...)
foo(df, B < 1)
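In current dplyr the underscore verbs have been superseded by tidy evaluation. The sketch below assumes dplyr 1.0 or later; `foo_string` and `foo_bare` are illustrative names, and both functions take the data frame as an explicit argument rather than relying on a global `df`.

```r
library(dplyr)

df <- data.frame(A = seq(-5, 5, 1), B = seq(0, 10, 1))

# Column name supplied as a character string: index the .data pronoun
foo_string <- function(data, col) {
  filter(data, .data[[col]] < 1)
}

# Column name supplied bare, as in dplyr itself: embrace it with {{ }}
foo_bare <- function(data, col) {
  filter(data, {{ col }} < 1)
}

res_string <- foo_string(df, "B")
res_bare <- foo_bare(df, B)
res_string
```

Both calls keep only the row where B is 0, so the two interfaces can be offered side by side if needed.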
