Formula evaluation with mutate() - r

Is there a way to make mutate() evaluate formulas in the (d)plyr package for R? I am thinking of situations where one has many variables such as count.a, count.b, ..., count.z and wants to create a new variable that sums them all. I can create a character string like "count.total = count.a + count.b + (...) + count.z", but how do I make mutate() evaluate it?

If you want expression input
library(dplyr)
df = data.frame(x = 1:10, y = 2:11)
f = function(df, s){
eval(substitute(mutate(df, z = s)))
}
f(df, x-y)
f(df, x+y)
If you want character input
g = function(df, s){
q = quote(mutate(df, z = s))
eval(parse(text=sub("s", s, deparse(q))))
}
g(df, "x-y")
g(df, "x+y")
You can also modify the functions to take the name of z as an input.
Expression input: f1 passes all extra parameters to mutate, f2 only passes one argument to mutate.
f1 = function(df, ...){
mutate(df, ...)
}
f1(df, a = x-y)
f2 = function(df, ...){
dots = substitute(alist(...))
var = names(dots)[2]
cal = as.call(list(quote(mutate), quote(df)))
cal[var] = dots[2]
eval(cal)
}
f2(df, a = x-y)
Again, if you want to use character input
g1 = function(df, s){
q = quote(mutate(df, z = s))
eval(parse(text=sub("z = s", s, deparse(q))))
}
g1(df, "a=x-y")
g1(df, "w=x+y")
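For the specific case in the question (summing count.a through count.z), here is a minimal base-R sketch. It assumes the columns share a common prefix, builds the formula string from the column names, and evaluates it with the data frame as the environment. The toy data frame and the names cols, formula_string and row_totals are illustrative and not from the original post.

```r
# Toy data: three columns sharing the "count." prefix (illustrative only)
df <- data.frame(count.a = 1:3, count.b = 4:6, count.c = 7:9)

# Build the string "count.a + count.b + count.c" from the column names
cols <- grep("^count\\.", names(df), value = TRUE)
formula_string <- paste(cols, collapse = " + ")

# Parse the string and evaluate it with the data frame's columns in scope
df$count.total <- eval(parse(text = formula_string), envir = df)

# For a plain sum, rowSums() gives the same numbers without any parsing
row_totals <- rowSums(df[cols])
```

Parsing is only worth it when the formula really arrives as text. eval(parse()) runs whatever code the string contains, so the string should come from a trusted source.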

Related

Subsetting in a second level R function

Function foo1 can subset a list by a requested variable (e.g., by = type == 1). Otherwise, foo1 will simply output the inputted list itself.
For my purposes, I need to use foo1 within a new function called foo2.
In my code below, my desired output is obtained like so: foo2(data = D, by = G[[1]]) ; foo2(data = D, by = G[[2]]) ; foo2(data = D, by = G[[3]]).
But, I wonder why when I loop over G using lapply, I get an error as shown below?
foo1 <- function(data, by){
L <- split(data, data$study.name) ; L[[1]] <- NULL
if(!missing(by)){
L <- lapply(L, function(x) do.call("subset", list(x, by)))
}
return(L)
}
foo2 <- function(data, by){
eval(substitute(foo1(data = data, by = by)))
}
## EXAMPLE OF USE:
D <- read.csv("https://raw.githubusercontent.com/izeh/i/master/k.csv", h = T) ## Data
G <- lapply(unique(na.omit(D$type)), function(i) bquote(type == .(i)))# all levels of `type`
foo2(data = D, by = G[[1]]) # Works fine without `lapply` :-)
lapply(1:3, function(i) foo2(data = D, by = G[[i]])) # Doesn't work with `lapply`! :-(
# Error in do.call("subset", list(x, by)) : object 'i' not found
Your foo2 function tries to evaluate the expression
foo1(data = D, by = G[[i]])
but it doesn't have i available. You need to evaluate G[[i]] in the anonymous function you're passing to lapply to get an expression defining the subset, and then evaluate that subset in foo2. I recommend naming that function instead of using an anonymous one; it makes debugging a lot easier.
Here's some recoding that appears to work:
Redefine foo2 to
foo2 <- function(data, by){
by <- eval(by, envir = data)
foo1(data = data, by = by)
}
and
foo3 <- function(i) {
expr <- G[[i]]
foo2(data = D, by = expr)
}
and then
lapply(1:3, foo3)
I'm not sure this does exactly what you want, but it should be close enough that you can fix it up.
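The scoping problem can be reproduced without the CSV file. In this sketch the names inner_fun, outer_bad, outer_ok and idx are hypothetical stand-ins for foo1, foo2 and i. The eval(substitute()) wrapper rebuilds the call with the unevaluated text G[[idx]] and evaluates it where idx is not visible, while a wrapper that passes the argument through normally lets R evaluate G[[idx]] in the caller's frame.

```r
G <- list(quote(type == 1), quote(type == 2))

inner_fun <- function(by) by            # stand-in for foo1: just forces `by`

# Mirrors the original foo2: the call becomes inner_fun(by = G[[idx]]),
# evaluated in outer_bad's frame, where `idx` does not exist
outer_bad <- function(by) eval(substitute(inner_fun(by = by)))

# Ordinary argument passing: the promise for `by` remembers the caller's frame
outer_ok <- function(by) inner_fun(by = by)

bad <- tryCatch(
  lapply(1:2, function(idx) outer_bad(by = G[[idx]])),
  error = function(e) conditionMessage(e)   # captures the "not found" message
)
ok <- lapply(1:2, function(idx) outer_ok(by = G[[idx]]))
```

Here bad holds the error message and ok holds the two subset expressions, which suggests that dropping eval(substitute()) from the wrapper may be enough when the expressions in G are already quoted.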
Instead of using lapply, here a for loop can be used
lst1 <- vector("list", length(G))
for(i in 1:3) lst1[[i]] <- foo2(data = D, by = G[[i]])
Checking:
identical(lst1[[2]], foo2(data = D, by = G[[2]]))
#[1] TRUE
identical(lst1[[3]], foo2(data = D, by = G[[3]]))
#[1] TRUE
For the lapply part, there seems to be a conflict between the i of the anonymous function and the lookup of G[[i]] that happens inside foo2. If we use a new variable, say 'j', and call foo1 directly,
lst2 <- lapply(1:3, function(j) foo1(data = D, by = G[[j]]))
it should work:
identical(lst2[[2]], lst1[[2]])
#[1] TRUE

Function scope: using results from high-level function in low-level function

I would like to use the calculations from a high-level (outer) function high_lvl_fun in a low-level (inner) function low_lvl_fun. The low-level function is an argument of the high-level one (I would like to use different functions with different sets of arguments). My reproducible example:
set.seed(101)
low_lvl_fun <- function(x, y){ # low-level (inner) function
sum((x-y)^2) # Sum of squared errors (no division by n, so not strictly a mean)
}
high_lvl_fun <- function(x, y = NULL, FUN, args){ # high level (outer) function
# Just some toy changes in y to check if the code works
if(length(y) == 0){
y <- rep(1, length(x))
}else{
y <- rep(2, length(x))
}
do.call(FUN, args = args) # Call of low_lvl_fun
}
The low-level function computes the sum of squared errors. The high-level function performs some operations on vector y and calls the low-level function. Declaring the arguments and calling the high-level function like this:
x <- rnorm(100)
high_lvl_fun(x, y = NULL, FUN = "low_lvl_fun", args = list(x, y))
results in such an error:
Error in do.call(FUN, args = args) : object 'y' not found
I understand that the low-level function assumes that the value of y is NULL (as declared in high-level function call), however, I don't know how to change the scope in which the low-level function searches for y.
The only solution I came up with would be to declare y in the global environment:
high_lvl_fun2 <- function(x, y = NULL, FUN, args){ # high level (outer) function
if(length(y) == 0){
y <<- rep(1, length(x))
}else{
y <<- rep(2, length(x))
}
do.call(FUN, args = args) # Call of low_lvl_fun
}
however, I would like to avoid modifying y in the global environment.
EDIT: (more details)
The low-level function can take arguments other than x and y. It may also require only x and other arguments, and not y, for example:
low_lvl_fun2 <- function(x){sd(x)/mean(x)}
The other important point is that the high-level and low-level functions can have arguments with the same names (as above, where both functions have arguments called x and y), and it would be good not to be forced to rewrite the low-level function. Unfortunately, the implementation suggested in the comments by @Andrea does not meet this condition, since matching two arguments with the same name throws an error:
high_lvl_fun <- function(x, y = NULL, FUN, ...){ # Function suggested by @Andrea
dots <- lazy_eval(lazy_dots(...))
# Just some toy changes in y to check if the code works
if(length(y) == 0){
y <- rep(1, length(x))
}else{
y <- rep(2, length(x))
}
args <- c(list(x , y) , dots)
do.call(FUN, args = args) # Call of low_lvl_fun
}
# Calling the low-level function at the beginning of the post
high_lvl_fun(x = 1:10, y = 2:11, FUN = "low_lvl_fun", x = x, y = y)
Error in high_lvl_fun(x = 1:10, y = 2:11, FUN = "low_lvl_fun", x = x,
: formal argument "x" matched by multiple actual arguments
Assuming that low_lvl_fun() takes x and y only, this should do the job:
high_lvl_fun <- function(x, y = NULL, FUN ){ # high level (outer) function
# Just some toy changes in y to check if the code works
if(length(y) == 0){
y <- rep(1, length(x))
}else{
y <- rep(2, length(x))
}
args <- list(x = x, y = y)
do.call(FUN, args = args) # Call of low_lvl_fun
}
As a more general solution, I would suggest using the ... argument:
require(lazyeval)
high_lvl_fun <- function(x, y = NULL, FUN, ...){ # high level (outer) function
dots <- lazy_eval(lazy_dots(...))
# Just some toy changes in y to check if the code works
y <- y+1
args <- c(list(x , y) , dots)
do.call(FUN, args = args) # Call of low_lvl_fun
}
# Ex 1
f <- function(x, y , z) {x+y+z}
high_lvl_fun (x = 1, y = 2, FUN = f, z = 3)
# Ex 2
g <- function(x, y , z, mean , sd) {
n <- x+y+z
sum(rnorm(n , mean , sd))
}
high_lvl_fun (x = 1, y = 2, FUN = g, z = 3, mean = 100, sd = 1)
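A base-R sketch of another way to meet the requirements from the question's edit (low-level functions that need x and y, x only, or further arguments), without lazyeval. The design is an assumption, not something proposed in the thread: the outer function offers its computed x and y, passes on only those that FUN declares in its formals, and takes any other arguments through a hypothetical extra list so that names cannot clash with the outer function's own arguments. The scaled_diff function is a made-up third example.

```r
high_lvl_fun <- function(x, y = NULL, FUN, extra = list()) {
  # Same toy change to y as in the question
  y <- if (length(y) == 0) rep(1, length(x)) else rep(2, length(x))

  FUN <- match.fun(FUN)                 # accepts a function or its name
  computed <- list(x = x, y = y)        # values owned by the outer function

  # Pass on only the computed values that FUN actually declares
  wanted <- intersect(names(formals(FUN)), names(computed))
  do.call(FUN, c(computed[wanted], extra))
}

low_lvl_fun  <- function(x, y) sum((x - y)^2)
low_lvl_fun2 <- function(x) sd(x) / mean(x)
scaled_diff  <- function(x, y, k) k * sum(x - y)   # hypothetical third example

res1 <- high_lvl_fun(1:4, FUN = low_lvl_fun)                         # uses x and y
res2 <- high_lvl_fun(1:4, FUN = low_lvl_fun2)                        # uses x only
res3 <- high_lvl_fun(1:4, y = 0, FUN = scaled_diff, extra = list(k = 10))
```

Because the low-level function receives the values computed inside high_lvl_fun, nothing has to be written to the global environment. This approach relies on formals(), so it would not work as written for primitive functions, whose formals are NULL.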

How can I use dplyr/magrittr's pipe inside functions in R?

I'm trying to write a function that takes a data frame and the name of a column as arguments. When I write the function with standard R syntax, I get the right result using eval and substitute, as recommended by @hadley in http://adv-r.had.co.nz/Computing-on-the-language.html
> df <- data.frame(y = 1:10)
> f <- function(data, x) {
+ out <- mean(eval(expr = substitute(x), envir = data))
+ return(out)
+ }
> f(data = df, x = y)
[1] 5.5
Now, when I try to write the same function using the %>% operator, it doesn't work:
> df <- data.frame(y = 1:10)
> f <- function(data, x) {
+ data %>%
+ eval(expr = substitute(x), envir = .) %>%
+ mean()
+ }
> f(data = df, x = y)
Error in eval(expr, envir, enclos) : object 'y' not found
>
How can I combine the piping operator with eval and substitute? It seems really tricky to me.
A workaround would be
f <- function(data, x) {
v <- substitute(x)
data %>%
eval(expr = v, envir = .) %>%
mean()
}
The problem is that the pipe functions (%>%) are creating another level of closure which interferes with the evaluation of substitute(x). You can see the difference with this example
df <- data.frame(y = 1:10)
f1 <- function(data, x) {
print(environment())
eval(expr = environment(), envir = data)
}
f2 <- function(data, x) {
print(environment())
data %>%
eval(expr = environment(), envir = .)
}
f1(data = df, x = y)
# <environment: 0x0000000006388638>
# <environment: 0x0000000006388638>
f2(data = df, x = y)
# <environment: 0x000000000638a4a8>
# <environment: 0x0000000005f91ae0>
Notice how the environments differ in the magrittr version. You want to take care of the substitute step as soon as possible when mucking about with non-standard evaluation.
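The effect of calling substitute() one level too deep can be imitated in base R, without any pipe. This is only a sketch of the mechanism: helper is a hypothetical stand-in for the extra function level described above, and the actual internals of %>% may differ between magrittr versions.

```r
df <- data.frame(y = 1:10)

# substitute() called directly in the function that received the argument
f_direct <- function(data, x) {
  expr <- substitute(x)                 # the symbol `y`
  mean(eval(expr, envir = data))
}

# substitute() called one level down, as happens inside a piped step
f_nested <- function(data, x) {
  helper <- function(.) substitute(x)   # `x` is not an argument of helper
  helper(data)                          # so the bare symbol `x` comes back
}

direct_result <- f_direct(df, y)
nested_result <- f_nested(df, y)
```

In f_nested the caller's expression y is never recovered: substitute() only replaces symbols bound in the frame where it is called, so it returns the symbol x unchanged. Capturing the expression first, as in the workaround, avoids this.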
I hope your use case is a bit more complex than your example, because it seems like
mean(df$y)
would be a much easier bit of code to read.
I've been trying to understand my problem.
First, I've written what I want with the summarise() function :
> library(dplyr)
> df <- data.frame(y = 1:10)
> summarise_(.data = df, mean = ~mean(y))
mean
1 5.5
Then I try to program my own function. I've found a solution which seems to work with the lazyeval package in this post. I use the lazy() and the interp() functions to write what I want.
The first possibility is here :
> library(lazyeval)
> f <- function(data, col) {
+ col <- lazy(col)
+ inter <- interp(~mean(x), x = col)
+ summarise_(.data = data, mean = inter)
+ }
> f(data = df, col = y)
mean
1 5.5
I can also use pipes :
> f <- function(data, col) {
+ col <- lazy(col)
+ inter <- interp(~mean(x), x = col)
+ data %>%
+ summarise_(.data = ., mean = inter)
+ }
>
> f(data = df, col = y)
mean
1 5.5
I would not use eval and substitute.
What follows is a simplified version of this great post suited to your question.
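library(dplyr)
df <- data.frame(y = 1:10)
f <- function(data, x) {
  x <- enquo(x)                          # capture the column expression passed by the caller
  data %>% summarise(mean = mean(!!x))   # use the data argument rather than the global df
}
f(data = df, x = y)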
There are two things happening here:
Transforming the column name with enquo()
Prefixing the column with !!
Please refer to the link for a more complicated example.

change argument names inside a function r

I'm trying to adjust the names of an argument inside a function. I want to create a procedure that takes the body of a function, looks for x, changes every x into x0, and then restores the function to what it was before. To provide an example:
f = function(x, y) -x^2 + x + -y^2 + y
# Take old names
form_old = names(formals(f))
# Make new names
form_new = paste0(form_old, 0)
# Give f new formals
formals(f) = setNames(vector("list", length(form_new)), form_new)
# Copy function body
bod = as.list(body(f))
for (i in 1:length(form_new)) {
bod = gsub(form_old[i], form_new[i], bod)
}
# return from list to call ?
body(f) = as.call(list(bod))
f(1, 1) # produces an error
So far, this code will change all variable names from x to x0 and from y to y0. However, the final output of bod is a character vector and not a call. How can I now change this back to a call?
Thanks in advance!
Surely there is a better way to do what you are trying to do that doesn't require modifying functions. That being said, you definitely don't want to replace variables with regular expressions; that could cause all sorts of problems. Generally, manipulating code as strings leads to trouble. For example, a function like tricky <- function(x, y) { tst <- "x + y"; -xx*x + yy*y }, where there are strings and overlapping variable names, will give the wrong results.
Here is a function that takes a recursive approach (Recall) to traverse the expression tree (recursion could be avoided using a 'stack' type structure, but it seems more difficult to me).
## Function to replace variables in function body
## expr is `body(f)`, keyvals is a lookup table for replacements
rep_vars <- function(expr, keyvals) {
if (!length(expr)) return()
for (i in seq_along(expr)) {
if (is.call(expr[[i]])) expr[[i]][-1L] <- Recall(expr[[i]][-1L], keyvals)
if (is.name(expr[[i]]) && deparse(expr[[i]]) %in% names(keyvals))
expr[[i]] <- as.name(keyvals[[deparse(expr[[i]])]])
}
return( expr )
}
## Test it
f <- function(x, y) -x^2 + x + -y^2 + y
newvals <- c('x'='x0', 'y'='y0') # named lookup vector
newbod <- rep_vars(body(f), newvals)
newbod
# -x0^2 + x0 + -y0^2 + y0
## Rename the formals, and update the body
formals(f) <- pairlist(x0=bquote(), y0=bquote())
body(f) <- newbod
## The new 'f'
f
# function (x0, y0)
# -x0^2 + x0 + -y0^2 + y0
f(2, 2)
# [1] -4
With a more difficult function, where you want to avoid modifying strings or the other variables named yy and xx for example,
tricky <- function(x, y) { tst <- "x + y"; -xx*x + yy*y }
formals(tricky) <- pairlist(x0=bquote(), y0=bquote())
body(tricky) <- rep_vars(body(tricky), newvals)
tricky
# function (x0, y0)
# {
# tst <- "x + y"
# -xx * x0 + yy * y0
# }
#
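A shorter base-R variant of the same idea, offered as a sketch rather than a replacement for rep_vars: substitute() can do the symbol replacement itself when it is called through do.call(), so that it receives the function body rather than the unevaluated expression body(f). The names replacements and new_body are illustrative.

```r
f <- function(x, y) -x^2 + x + -y^2 + y

# Map each old symbol to the symbol that should replace it
replacements <- list(x = as.name("x0"), y = as.name("y0"))

# do.call() evaluates body(f) first, so substitute() works on the body itself
new_body <- do.call(substitute, list(body(f), replacements))

# Rename the formals (alist() keeps them without default values),
# then install the rewritten body
formals(f) <- alist(x0 = , y0 = )
body(f) <- new_body

result <- f(2, 2)
```

Like rep_vars, this works on symbols rather than text, so string literals and longer names such as xx are left alone.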
There are a few ways to go here. Following your code, I would go with something like this:
f = function(x, y) -x^2 + x + -y^2 + y
# Take old names
form_old = names(formals(f))
# Make new names
form_new = paste0(form_old, 0)
deparse(body(f)) -> bod
for (i in 1:length(form_new)) {
bod = gsub(form_old[i], form_new[i], bod, fixed = TRUE)
}
formals(f) = setNames(vector("list", length(form_new)), form_new)
body(f) <- parse(text = bod)
f(1, 1)

Wrapping very long functions in RExcel VBA?

When you want to use R functions in VBA via RExcel, you have to use
RInterface.RRun "..."
Then, if you'd like to define your own R function, you can simply
RInterface.RRun "y <- function(x) { ... }"
If y consists of more than one command, you can separate the commands with ;, as you would in R itself.
But... what if your y function is very very long?
A 20 to 30 line R function is very hard to write this way in VBA, and there is a limit to the length of a VBA statement.
So: how may I wrap?
Here's an example of a quite long R function: can you show me how to put in VBA using RExcel?
bestIV <- function(dT, IVTS.t, Spot, r) {
b <- r
xout <- seq(0, max(T), dT)
sfm <- matrix(NA, nrow = length(K), ncol = length(xout))
for(i in 1:length(K)) {
sfm[i,] <- approx(x = T, y = IVTS.t[i,], xout = xout, rule = 2)$y
}
sfm[,1] <- sfm[,1] + sfm[,2] - sfm[,3]
rownames(sfm) <- K
colnames(sfm) <- xout
Option <- matrix(NA, nrow = length(K), ncol = length(xout))
for(i in 1:length(K)) {
for(j in 1:length(xout)) {
TypeFlag <- ifelse(K[i] < Spot, 'p', 'c')
Option[i,j] <- GBSOption(TypeFlag = TypeFlag, S = Spot, X = K[i],
Time = xout[j] / 365, r = r, b = b,
sigma = sfm[i,j] / 100)@price
}
}
rownames(Option) <- K
colnames(Option) <- xout
dP <- (cbind(0, -t(apply(X = Option, MARGIN = 1, FUN = diff))) / Option)[,-(1:2)]
dV <- dP / dT
min.V <- which(dV == min(dV), arr.ind = TRUE, useNames = TRUE)
Strike <- as.numeric(dimnames(min.V)[1])
Maturity <- as.numeric(unlist(dimnames(dV)[2]))[min.V[2]]
Days <- dT
Mat <- c(dV[which(dV == min(dV))], Strike, Maturity, Days)
names(Mat) <- c('Value', 'Strike', 'Maturity', 'Days')
return(Mat)
}
Thanks,
Put your R code in your spreadsheet (in a range of cells) and use this function instead:
RInterface.RunRCodeFromRange range
Executes the commands in range on a worksheet
(allows to use commands prepared for interactive execution with R to be run in macro code)
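A related way to keep long R code out of the VBA string, not taken from the answers here, is to store the function in a script file and send only a short source() command through RInterface.RRun. The R side of that idea is sketched below. The temporary file and the toy function are placeholders; in practice the path would point to a real .R file.

```r
# Write a multi-line function to a script file (a temp file here; in practice
# this would be a file kept next to the workbook)
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "y <- function(x, a, b) {",
  "  total <- x + a + b",
  "  return(total)",
  "}"
), script_path)

# This single short command is all that would need to go through RRun
source(script_path)

result <- y(1, 2, 3)
```

The function can then be as long as needed, since only the source() call has to fit into a VBA string.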
You are passing a string as an argument to a VBA function. Thus your question reduces to "how can I concatenate strings in VBA".
The answer is to use the concatenation operator &, like this:
"a" & "b"
Say you have an R function:
y <- function(x, a, b){
return(x)
}
Then you can do this in VBA:
' A trailing " _" continues the VBA statement on the next line
RInterface.RRun "y <- function(x, a, b) {" & _
    "return(x)" & _
    "}"
