What's the best way to do arithmetic on functions in R?

In math, when combining functions, you can indicate this using ordinary arithmetic operations, e.g.
u = 2*x
v = log(x)
and then simply
f = u + v
I do a lot of numerical work where you have to construct complicated math operations. It would be very helpful to be able to use a notation more like this one. For example, in R it might look like
f <- function.arithmetic('u+v', args=c('x'))
With some nonstandard evaluation, this might even be as simple as
f(x) %def% u + v
where u and v are already defined functions of x.
Is there a simple way to set up this syntax? For example, breaking down the expression and substituting u(x) and v(x) wherever they occur -- then doing an ordinary eval. (I would trust an existing parsing function more than some hack that I write. And I know that "parse" doesn't do this.)
Answers already suggested would work, but they seem more complicated than writing
f <- function(x) u(x) + v(x)
If the expression is more complicated, this notation starts getting harder to read. I want to set up something simpler, more easily readable, and closer to the above mathematical notation.

Here are a couple of approaches:
1) Ops/Math This could be done using S3 or S4. We illustrate S3 only.
Compose <- function(f, g) {
if (missing(g)) structure(f, class = c("functional", "function"))
else Compose(function(...) f(g(...)))
}
Ops.functional <- function(e1, e2) {
op <- match.fun(.Generic)
Compose( if (is.numeric(e1)) function(...) op(e1, e2(...))
else if (is.numeric(e2)) function(...) op(e1(...), e2)
else function(...) op(e1(...), e2(...)))
}
Math.functional <- function(x, ...) {
op <- match.fun(.Generic)
Compose(op, x)
}
Here are two examples:
u <- Compose(function(x) 2*x)
v <- Compose(log)
(u + v)(pi/2) # example 1
## [1] 3.593175
(exp(u*v) / (1 + u^2 + v^2)) (pi/2) # example 2
## [1] 0.3731149
Note that u could alternatively have been defined as u <- 2 * Compose(identity). In fact, we could define:
Id <- Compose(identity)
u <- 2*Id
v <- log(Id)
2) Define your own functions This is not much work: likely less than a page to define all the common functions. It could be done with %...% infix operators, but if you really want the infix route, (1) above seems preferable, so this approach keeps things simple. The following could be enhanced to treat numeric arguments as constant functions, as we did in (1).
Plus <- function(f, g) function(...) f(...) + g(...)
Plus(function(x) 2*x, log)(pi/2)
## [1] 3.593175
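The enhancement mentioned in (2), treating numbers as constant functions, might look like the minimal sketch below. The names as_fun, lift2 and Times are hypothetical and introduced only for illustration; they are not part of the answer above.

```r
# Hypothetical helper: wrap a number as a constant function,
# and pass anything else through match.fun().
as_fun <- function(f) {
  if (is.numeric(f)) {
    force(f)
    function(...) f
  } else {
    match.fun(f)
  }
}

# Hypothetical factory: lift a binary operator such as `+` or `*`
# so that it combines functions instead of numbers.
lift2 <- function(op) {
  function(f, g) {
    f <- as_fun(f)
    g <- as_fun(g)
    function(...) op(f(...), g(...))
  }
}

Plus  <- lift2(`+`)
Times <- lift2(`*`)

# x -> 2*x + log(x), with the constant 2 treated as a function
h <- Plus(Times(2, identity), log)
h(pi/2)
```

The last call should agree with Plus(function(x) 2*x, log)(pi/2) above, since both evaluate 2*x + log(x) at pi/2.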

This is G.Grothendieck's answer cast in the form of an infix "+"-operator:
`%+%` <- function(f1, f2) { function(x) {f1(x) +f2(x)} }
f <- cos %+% sin
f
#-----
function(x) {f1(x) +f2(x)}
<environment: 0x7ffccd7eeea8>
#-------
f(0)
#[1] 1
There are also examples of functional composition on the 'funprog' help page, which has to be pulled up via one of its function names, for instance ?Reduce. The operator can also be defined to accept additional arguments:
`%+%` <- function(f1, f2) { function(x,...) {f1(x,...) +f2(x,...)} }
f2 <- dt %+% dt
#-- testing---
> f2(0)
Error in f1(x, ...) : argument "df" is missing, with no default
> f2(0, 6)
[1] 0.7654655
> dt(0,6)
[1] 0.3827328
To see how this is handled internally you can recover the definition by examining the environment stored with the resulting closure:
> ls(envir=environment(f2) )
[1] "f1" "f2"
> environment(f2)$f1
function (x, df, ncp, log = FALSE)
{
if (missing(ncp))
.Call(C_dt, x, df, log)
else .Call(C_dnt, x, df, ncp, log)
}
<bytecode: 0x7ffcc63e8ff8>
<environment: namespace:stats>
The problem with your example is that you did not define the u and v in an R-functional manner. Presumably this would not be the situation with your proposed use case.
> u = function(x,...)2*x
> v = function(x,...) (log(x))
> f <- u %+% v
> f(4)
[1] 9.386294
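The substitution idea raised in the question (replace every u with u(x), then evaluate as usual) can also be sketched in base R with substitute(). The helper name fun_arith and its arguments are hypothetical, and the sketch assumes the listed names appear only as plain symbols in the expression, not already in call position such as u(x).

```r
# Hypothetical helper: rewrite symbols such as u into calls such as u(x),
# then wrap the rewritten expression in a one-argument function.
fun_arith <- function(expr, fun_names, arg = "x") {
  caller <- parent.frame()              # where u, v, ... will be looked up
  arg_sym <- as.name(arg)
  repl <- lapply(fun_names, function(nm) call(nm, arg_sym))
  names(repl) <- fun_names
  new_body <- do.call(substitute, list(expr, repl))

  fmls <- alist(x = )                   # one formal with no default
  names(fmls) <- arg

  f <- function() NULL
  formals(f) <- fmls
  body(f) <- new_body
  environment(f) <- caller
  f
}

u <- function(x) 2 * x
v <- function(x) log(x)

f <- fun_arith(quote(u + v), c("u", "v"))
body(f)      # the rewritten expression: u(x) + v(x)
f(4)
```

Because only the expression is rewritten, f picks up whatever u and v are visible in the calling environment when it is eventually called.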
Some of this style of programming may be supported by Hadley's lazyeval package:
> require(lazyeval)
Loading required package: lazyeval
> help(pac=lazyeval)
> lazy_eval(interp(~ u + v, u = sum(1:10), v=sum(1:5)) )
[1] 70
> x <- 1:10; y=1:5
> lazy_eval(interp(~ u + v, u = sum(x), v=sum(y)) )
[1] 70
> lazy_eval(interp(~ u / v, u = sum(x), v=sum(y)) )
[1] 3.666667
> lazy_eval(interp(~ log(u) / v, u = sum(x), v=sum(y)) )
[1] 0.2671555
But I have encountered some puzzles that I cannot really understand:
e2 <- ~ exp(u * v)/(1 + x)^2
lazy_eval(interp(e2, u = sum(x)/100, v=sum(y)/100) )
#[1] 0.271499668 0.120666519 0.067874917 0.043439947 0.030166630 0.022163238 0.016968729
#[8] 0.013407391 0.010859987 0.008975196
exp( sum(x)/100 +sum(y)/100 )/(1+x)^2
[1] 0.50343818 0.22375030 0.12585954 0.08055011 0.05593758 0.04109699 0.03146489 0.02486114
[9] 0.02013753 0.01664258
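One possible explanation for the mismatch, offered as a reading of the two snippets rather than as a statement about lazyeval: the formula e2 multiplies u and v, while the hand-written check adds sum(x)/100 and sum(y)/100. Evaluating both variants in base R should reproduce the two printed vectors:

```r
x <- 1:10
y <- 1:5
u <- sum(x) / 100   # 0.55
v <- sum(y) / 100   # 0.15

with_product <- exp(u * v) / (1 + x)^2   # what the formula e2 asks for
with_sum     <- exp(u + v) / (1 + x)^2   # what the manual check computes

with_product
with_sum
```

If that reading is right, lazy_eval and the direct computation agree once the same operator is used in both.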

Related

R: instrument function to capture all assignments

Given a regular R function f, I'd like to be able to create a new function f_debug that acts just like f, but lets me keep track of all the assignments to function-local variables that happened inside it.
For example:
f <- function(x, y) {
z <- x + y
df <- data.frame(z=z)
df
}
# This function doesn't work as intended - would like it to (in the case of `f` above)
# write out a list containing `z` and `df` to an RDS file
capturing <- function(func) {
e <- new.env()
altered <- function(...) {
parent <- parent.frame()
e <- something...(func, environment(), parent, etc., etc.)
result <- func(...)
saveRDS(as.list(e), 'foo.rds')
result
}
environment(func) <- e
altered
}
f_debug <- capturing(f)
I'm not sure whether the gap in my knowledge here is large or small. Does anyone have a solution?
Solution 1: Steal the function's code
Here's a solution that doesn't return a new function which captures intermediate calculations, but instead evaluates the given function's code internally. There are some limitations; for example, it probably only works with named arguments. Instead of storing the intermediate calculations as an RDS file, it attaches them as an attribute.
capturing <- function(fun, ...) {
fun <- match.fun(fun)
code <- body(fun)
parent <- environment(fun)
env <- new.env(parent = parent)
for (val in names(list(...))) {
env[[val]] <- list(...)[[val]]
}
result <- eval(code, envir = env, enclos = parent.frame())
attr(result, "intermediate") <- env
result
}
my_add <- function(x, y) {
z <- x+y
u <- x-y
w <- x*y
x + y
}
intermediates <- function(x) {
attr(x, "intermediate", exact = TRUE)
}
value <- capturing(my_add, x = 1, y = 7)
ls(envir = intermediates(value))
#> [1] "u" "w" "x" "y" "z"
intermediates(value)$x
#> [1] 1
# Created on 2022-02-08 by the reprex package (v2.0.1)
Solution 2: Modify the function's code
One weakness of this solution is that if the chosen function features a call to on.exit(add=FALSE), some additional work needs to be done to modify the function so the internal environment is captured. However, it does work when the function accepts ... arguments.
my_add <- function(x, y) {
z <- x+y
u <- x-y
w <- x*y
x + y
}
insert_capture <- function(code) {
# `<<-` assigns into the global environment if no variable of the given name is found
# while traveling up to the global environment. If you need this assignment to go elsewhere,
# I'd recommend passing in `assign()`. Of course, you could also modify the `on.exit()`
# to use saveRDS.
parse(text=append(deparse(code),
"on.exit(._last_capture <<- environment(), add = TRUE)",
after = 1L))
}
capturing2 <- function(fun) {
fun <- match.fun(fun)
code <- insert_capture(body(fun))
body(fun) <- code
fun
}
my_add2 <- capturing2(my_add)
my_add2(1, 7)
#> [1] 8
ls(envir = ._last_capture)
#> [1] "u" "w" "x" "y" "z"
._last_capture$u
#> [1] -6
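The saveRDS variant mentioned in the comments of insert_capture could be sketched as follows. The name capturing_rds and its file argument are hypothetical; the sketch assumes the wrapped function does not itself call on.exit() without add = TRUE, and that saving all local variables (arguments included) is acceptable.

```r
# Hypothetical variant: inject an on.exit() hook that writes the function's
# local variables to an RDS file when the call finishes.
capturing_rds <- function(fun, file) {
  fun <- match.fun(fun)
  hook <- bquote(
    on.exit(saveRDS(as.list(environment()), .(file)), add = TRUE)
  )
  body(fun) <- call("{", hook, body(fun))
  fun
}

f <- function(x, y) {
  z <- x + y
  df <- data.frame(z = z)
  df
}

path <- tempfile(fileext = ".rds")
f_debug <- capturing_rds(f, path)
result <- f_debug(1, 2)

saved <- readRDS(path)
sort(names(saved))   # local variables, including the arguments
saved$z
```

The original f is left untouched; only the returned copy carries the hook.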
What you are describing is already implemented in base R with utils::dump.frames, in an even more sophisticated way. It saves the frame (environment) associated with each call in the call stack to an object of class "dump.frames", which you can explore retroactively with utils::debugger as if you had actually run your code under a debugger.
capturing <- function(func, ...) {
cc <- as.call(c(quote(utils::dump.frames), list(...)))
cc <- call("on.exit", cc, add = TRUE)
body(func) <- call("{", cc, body(func))
func
}
capturing injects the call on.exit(utils::dump.frames(...), add = TRUE) into the body of func and returns the modified function.
Here, ... is a list of arguments to dump.frames:
dumpto, a character string giving the name to be used for the "dump.frames" object
to.file, a logical flag indicating whether the "dump.frames" object should be assigned in the global environment or save-ed to paste0(dumpto, ".rda") in the current working directory
include.GlobalEnv, a logical flag indicating whether the global environment should be saved as well
A quick example, which you should try yourself:
tmp <- tempfile()
dir.create(tmp)
cwd <- setwd(tmp)
f <- function(x, y) {
z <- x + y
z + 1
}
g <- capturing(f, dumpto = "zzz", to.file = TRUE)
h <- function(a, b) {
d <- g(a, b)
d + 1
}
h12 <- h(1, 2)
load("zzz.rda")
zzz
## $`h(1, 2)`
## <environment: 0x14c16cb58>
##
## $`#2: g(a, b)`
## <environment: 0x14c16ca40>
##
## attr(,"error.message")
## [1] ""
## attr(,"class")
## [1] "dump.frames"
ls(zzz[[1L]])
## [1] "a" "b"
ls(zzz[[2L]])
## [1] "z" "x" "y"
utils::debugger(zzz)
## Message: Available environments had calls:
## 1: h(1, 2)
## 2: #2: g(a, b)
##
## Enter an environment number, or 0 to exit
## Selection: 2
## Browsing in the environment with call:
## #2: g(a, b)
## Called from: debugger.look(ind)
## Browse[1]> ls()
## [1] "x" "y" "z"
## Browse[1]> x == 1 && y == 2 && z == x + y
## [1] TRUE
## Browse[1]> Q
setwd(cwd)
unlink(tmp, recursive = TRUE)
See ?browser if you are unfamiliar with R's environment browser.
My capturing function has the limitation that on.exit calls in the body of func must also use add = TRUE. If you have written func yourself, then it is not much of a limitation at all, and passing add = TRUE is a good habit anyway.
Ultimately, there is no completely safe way to inject code into functions, but, in an interactive setting, I would say that this level of "unsafety" is fine.
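The add = TRUE limitation can be seen in a small base R experiment. The function names and the events vector are made up for the demonstration; the first function mimics a body whose own on.exit call omits add = TRUE after a handler has been injected.

```r
events <- character()

# The function's own on.exit() omits add = TRUE, so it replaces
# the handler that was registered (injected) before it.
f_overwrite <- function() {
  on.exit(events <<- c(events, "injected"), add = TRUE)
  on.exit(events <<- c(events, "original"))
  invisible(NULL)
}

# With add = TRUE everywhere, both handlers run in registration order.
f_append <- function() {
  on.exit(events <<- c(events, "injected"), add = TRUE)
  on.exit(events <<- c(events, "original"), add = TRUE)
  invisible(NULL)
}

f_overwrite()
after_overwrite <- events

events <- character()
f_append()
after_append <- events

after_overwrite
after_append
```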

Can I avoid the `eval(parse())` defining a function with `polynomial()` in R?

I want to avoid using parse() in a function definition that contains a polynomial().
My polynomial is this:
library(polynom)
polynomial(c(1, 2))
# 1 + 2*x
I want to create a function which uses this polynomial expression as in:
my.function <- function(x) magic(polynomial(c(1, 2)))
where for magic(), I have tried various combinations of expression(), formula(), eval(), as.character(), etc... but nothing seems to work.
My only working solution is using eval(parse()):
eval(parse(text = paste0('poly_function <- function(x) ', polynomial(c(1, 2)))))
poly_function(x = 10)
# 21
Is there a better way to do what I want? Can I avoid the eval(parse())?
Like you, I thought that the polynomial function was returning an R expression, but we were both wrong. Reading the help index for package:polynom would have helped us both:
pol <- polynomial(c(1, 2))
str(pol)
#Class 'polynomial' num [1:2] 1 2
help(pac=polynom)
So user20650 is correct and:
> poly_function <- as.function(pol)
> poly_function(10)
[1] 21
This is how the authors (Venables, Hornik, Maechler) do it:
> getAnywhere(as.function.polynomial)
A single object matching ‘as.function.polynomial’ was found
It was found in the following places
registered S3 method for as.function from namespace polynom
namespace:polynom
with value
function (x, ...)
{
a <- rev(coef(x))
w <- as.name("w")
v <- as.name("x")
ex <- call("{", call("<-", w, 0))
for (i in seq_along(a)) {
ex[[i + 2]] <- call("<-", w, call("+", a[1], call("*",
v, w)))
a <- a[-1]
}
ex[[length(ex) + 1]] <- w
f <- function(x) NULL
body(f) <- ex
f
}
<environment: namespace:polynom>
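The same idea, building a function body out of call objects instead of parsing text, can be sketched without the polynom package. This is a simplified illustration of Horner's scheme, not the package's code; horner_function is a hypothetical name.

```r
# Hypothetical helper: given coefficients c0, c1, c2, ... build
# function(x) c0 + x * (c1 + x * (c2 + ...)) from call objects.
horner_function <- function(coefs) {
  ex <- 0
  for (a in rev(coefs)) {
    ex <- call("+", a, call("*", quote(x), ex))
  }
  f <- function(x) NULL
  body(f) <- ex
  f
}

p <- horner_function(c(1, 2))   # 1 + 2*x
body(p)
p(10)
p(c(0, 1, 2))                   # vectorised over x
```

No eval(parse()) is involved: the body is assembled with call() and installed with body<-.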
Since you mention in your comments that getAnywhere was new to you, it may also help to review the steps that lead up to using it. If you type a function name at the console prompt, you get the code, in this case:
> as.function
function (x, ...)
UseMethod("as.function")
<bytecode: 0x7f978bff5fc8>
<environment: namespace:base>
Which is rather unhelpful until you follow it up with:
> methods(as.function)
[1] as.function.default as.function.polynomial*
see '?methods' for accessing help and source code
The asterisk at the end of the polynomial version tells you that the code is not "exported", i.e. available at the console just by typing. So you need to pry it out of a loaded namespace with getAnywhere.
It seems like you could easily write your own function too
poly_function = function(x, p){
sum(sapply(1:length(p), function(i) p[i]*x^(i-1)))
}
# As 42- mentioned in comment to this answer,
# it appears that p can be either a vector or a polynomial
pol = polynomial(c(1, 2))
poly_function(x = 10, p = pol)
#[1] 21
#OR
poly_function(x = 10, p = c(1,2))
#[1] 21

R: preventing copies when passing a variable into a function

Hadley's new pryr package that shows the address of a variable is really great for profiling. I have found that whenever a variable is passed into a function, no matter what that function does, a copy of that variable is created. Furthermore, if the body of the function passes the variable into another function, another copy is generated. Here is a clear example
n = 100000
p = 100
bar = function(X) {
print(pryr::address(X))
}
foo = function(X) {
print(pryr::address(X))
bar(X)
}
X = matrix(rnorm(n*p), n, p)
print(pryr::address(X))
foo(X)
Which generates
> X = matrix(rnorm(n*p), n, p)
> print(pryr::address(X))
[1] "0x7f5f6ce0f010"
> foo(X)
[1] "0x92f6d70"
[1] "0x92f3650"
The address changes each time, despite the functions not doing anything. I'm confused by this behavior because I've heard R described as copy on write - so variables can be passed around but copies are only generated when a function wants to write into that variable. What is going on in these function calls?
For efficient R development, is it better to avoid writing multiple small functions and instead keep everything in one function? I have also found some discussion of Reference Classes, but I see very few R developers using them. Is there another efficient way to pass the variable that I am missing?
I'm not entirely certain, but address may point to the memory address of the pointer to the object. Take the following example.
library(pryr)
n <- 100000
p <- 500
X <- matrix(rep(1,n*p), n, p)
l <- list()
for(i in 1:10000) l[[i]] <- X
At this point, if each element of l were a copy of X, the size of l would be ~3.5 TB. Obviously this is not the case, as your computer would have started smoking. And yet the addresses are different.
sapply(l[1:10], function(x) address(x))
# [1] "0x1062c14e0" "0x1062c0f10" "0x1062bebc8" "0x10641e790" "0x10641dc28" "0x10641c640" "0x10641a800" "0x1064199c0"
# [9] "0x106417380" "0x106411d40"
pryr::address passes an unevaluated symbol to an internal function that returns its address in the parent.frame():
pryr::address
#function (x)
#{
# address2(check_name(substitute(x)), parent.frame())
#}
#<environment: namespace:pryr>
Wrapping the above function can lead to the address of a "promise" being returned instead. To illustrate, we can simulate pryr::address's functionality as:
ff = inline::cfunction(sig = c(x = "symbol", env = "environment"), body = '
SEXP xx = findVar(x, env);
Rprintf("%s at %p\\n", type2char(TYPEOF(xx)), xx);
if(TYPEOF(xx) == PROMSXP) {
SEXP pr = eval(PRCODE(xx), PRENV(xx));
Rprintf("\tvalue: %s at %p\\n", type2char(TYPEOF(pr)), pr);
}
return(R_NilValue);
')
wrap1 = function(x) ff(substitute(x), parent.frame())
where wrap1 is an equivalent of pryr::address.
Now:
x = 1:5
.Internal(inspect(x))
##256ba60 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 1,2,3,4,5
pryr::address(x)
#[1] "0x256ba60"
wrap1(x)
#integer at 0x0256ba60
#NULL
With further wrapping, we can see that a "promise" object is being constructed while the value is not copied:
wrap2 = function(x) wrap1(x)
wrap2(x)
#promise at 0x0793f1d4
# value: integer at 0x0256ba60
#NULL
wrap2(x)
#promise at 0x0793edc8
# value: integer at 0x0256ba60
#NULL
# wrap 'pryr::address' like your 'bar'
( function(x) pryr::address(x) )(x)
#[1] "0x7978a64"
( function(x) pryr::address(x) )(x)
#[1] "0x79797b8"
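The same conclusion can be checked from plain R with base tracemem(), which reports when a traced object is duplicated. This is only a sketch: tracemem() needs an R build with memory profiling, so it is guarded here, and the function names read_only and modify are made up for the example.

```r
X <- matrix(0, nrow = 1000, ncol = 100)

# tracemem() is only available when R was built with memory profiling.
can_trace <- isTRUE(unname(capabilities("profmem")))
if (can_trace) invisible(tracemem(X))

read_only <- function(M) sum(M)   # only reads its argument
modify <- function(M) {           # writes to its argument
  M[1, 1] <- 1
  M
}

total <- read_only(X)   # no duplication is expected to be reported here
Y <- modify(X)          # a duplication is expected to be reported here

if (can_trace) untracemem(X)

X[1, 1]   # the caller's matrix is unchanged
Y[1, 1]
```

Passing X down through any number of read-only wrappers should behave like read_only: new promise objects are created, but the matrix itself is only duplicated once something writes to it.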
You can use the profmem package (I'm the author) to see what memory allocations take place. It requires that your R session was built with "profmem" capabilities:
capabilities()["profmem"]
## profmem
## TRUE
Then, you can do something like this:
n <- 100000
p <- 100
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
object.size(X)
## 80000200 bytes
## No copies / no new objects
bar <- function(X) X
foo <- function(X) bar(X)
## One new object
bar2 <- function(X) 2*X
foo2 <- function(X) bar2(X)
profmem::profmem(foo(X))
## Rprofmem memory profiling of:
## foo(X)
##
## Memory allocations:
## bytes calls
## total 0
profmem::profmem(foo2(X))
## Rprofmem memory profiling of:
## foo2(X)
##
## Memory allocations:
## bytes calls
## 1 80000040 foo2() -> bar2()
## total 80000040

Recognizing language type arguments in function calls

I want to create an overloaded function that behaves differently given the arguments provided. For this, I need to check if the argument given is an existing object (e.g. data frame, list, integer) or an abstract formula (e.g. a + b, 2 * 4, y ~ x + y etc.). Below I paste what I would like it to recognize:
df <- data.frame(a, b)
f(df) # data.frame
f(data.frame(a, b)) # data frame
f(a + b) # expression
f("a + b") # character
f(2 * 2 + 7) # expression
f(I(2 * 2)) # integer
Is it possible to construct such a function? How? Unfortunately I wasn't able to find any references on the web or in the books on R programming I know.
The general way of overloading functions in R would be something like this:
f <- function(x) UseMethod("f")
f.default <- function(x) eval(substitute(x))
f.data.frame <- function(x) print("data frame")
It gives:
> f(df)
[1] "data frame"
> f(2 + 2)
[1] 4
> f(list(a, b))
[[1]]
[1] 1
[[2]]
[1] 2
So the problem with doing it like this is that I would have to name all the possible other data types rather than checking if x is an expression.
The same problem arises with:
f2 <- function(x) typeof(substitute(x))
because it evaluates function calls and expressions in the same manner:
> f2(2 + 2)
[1] "language"
> f2(df)
[1] "symbol"
> f2(data.frame(a, b))
[1] "language"
while I would like it to differentiate between list(a, b) and 2 + 2, because the first one is a list, and the second one is an expression.
I know that it would be easy with a classic R formula that is easily recognizable by R, but is it possible with different input?
Thanks!
This is the principle of object-oriented programming in R. You can learn more about it here:
https://www.stat.auckland.ac.nz/~stat782/downloads/08-Objects.pdf
http://brainimaging.waisman.wisc.edu/~perlman/R/A1%20Introduction%20to%20object-oriented%20programming.pdf
There are two types of objects in R: S3 and S4. S3 objects are easier to implement and more flexible. Their use is sufficient for what you want to do. You can use S3 generic functions.
I strongly advise you to learn more about these S3 and S4 classes, but to make it short, you can just look at the class of parameter you give to function f. This can be done thanks to function class.
You can separate your function f in different cases:
f <- function(a){
# inherits() is safer than class(a) == ..., because class() can return several values
if (inherits(a, 'data.frame')){
# do things...
}
else if (inherits(a, 'formula')){
# do things...
}
else if (is.integer(a)){
# do things...
}
else {
stop("Class not supported")
}
}
OK, it seems I made this more complicated than it had to be. The simple answer is just:
if (tryCatch(is.data.frame(x), error=function(z) FALSE)) {
# here do stuff with a data.frame
} else {
# here check the expression using some regular expressions etc.
}
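Combining the two ideas from this thread (look at the unevaluated argument with substitute(), then try to evaluate it) gives a sketch like the one below. The name classify_arg and the returned labels are hypothetical, and the I(2 * 2) case from the question is not handled specially: it is reported as an expression like any other call that does not produce a data frame.

```r
# Hypothetical classifier: inspect the unevaluated argument first,
# then fall back on the class of its value.
classify_arg <- function(x) {
  caller <- parent.frame()
  expr <- substitute(x)

  # A literal string such as "a + b"
  if (is.character(expr)) return("character")

  # Try to evaluate in the caller's environment; NULL if that fails
  value <- tryCatch(eval(expr, caller), error = function(e) NULL)

  if (is.data.frame(value)) return("data.frame")
  if (is.call(expr)) return("expression")
  class(value)[1]
}

a <- 1
b <- 2
df <- data.frame(a, b)

classify_arg(df)                 # a symbol bound to a data frame
classify_arg(data.frame(a, b))   # a call that evaluates to a data frame
classify_arg(a + b)              # any other call
classify_arg("a + b")            # a literal string
```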

Combining expressions in R

I am trying to combine multiple expressions in R into a single expression. Ideally, I would be able to do something like this:
g <- expression(exp(a[1]*x)/(1 + exp(a[1]*x)))
h <- expression(exp(a[2]*x)/(1 + exp(a[2]*x)))
c <- expression(g * h)
where a is a given vector of data and x is the only unknown (and it is the same unknown across all expressions). c would return
R> c
expression(exp(a[1]*x)/(1 + exp(a[1]*x)) * exp(a[2]*x)/(1 + exp(a[2]*x)))
Right now, when I do this I just get
R> c
expression(g * h)
I want to have an equation (image not preserved; source: lehrfeld.me) into which I could plug some vector a to obtain a function of x. What am I doing wrong here?
Don't use expressions, use functions.
From what I can decipher, the following will do what you want
# a function for a vector `x` and single value `a`
func <- function(x,a) { (exp(1)^(a*x)/(1 + exp(1)^(a*x))) }
# a function for a vector `x` and vector length 2 for `a`
foo <- function(x, a){func(x,a[1]) * func(x, a[2])}
# call the function to calculate what you want.
foo(x,a)
And if you want the expression associated with this so you can plot the text of the equation, the following will work
expr <- expression(exp(1)^(a*x)/(1 + exp(1)^(a*x)))
g <- do.call(substitute, list(as.list(expr)[[1]], env= list(a=3)))
h<- do.call(substitute, list(as.list(expr)[[1]], env= list(a=2)))
'%c%' <- function(a,b) bquote(.(a) %*% .(b))
fooExpr <- g %c% h
This is an old question, but surprisingly no easy answer has been given. As said in a comment, "R is not a symbolic algebra program"; however, R has all the necessary means for manipulating expressions. I have no idea how to do it with expressions (in the technical sense, see ?expression), but it is trivially easy with calls:
g <- quote(exp(a[1]*x)/(1 + exp(a[1]*x)))
h <- quote(exp(a[2]*x)/(1 + exp(a[2]*x)))
substitute(g*h, list(g=g, h=h))
# exp(a[1] * x)/(1 + exp(a[1] * x)) * (exp(a[2] * x)/(1 + exp(a[2] * x)))
There are probably easier ways to achieve what you want (maybe using functions) but this is the easiest way to merge two "calls" (i.e. expressions in the "colloquial" sense as defined by R wizards).
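To get from the merged call to "a function of x for a given vector a", which is what the question asks for, the call can be evaluated with eval() in a list of values. A minimal sketch; the names gh and prod_fun are made up here.

```r
g <- quote(exp(a[1]*x)/(1 + exp(a[1]*x)))
h <- quote(exp(a[2]*x)/(1 + exp(a[2]*x)))
gh <- substitute(g * h, list(g = g, h = h))

# Evaluate the merged call with x and a supplied as a list;
# exp(), `/` and so on are found through the enclosing environment.
prod_fun <- function(x, a) eval(gh, list(x = x, a = a))

prod_fun(0, a = c(2, 3))      # both factors are 1/2 at x = 0
prod_fun(1:3, a = c(2, 3))    # vectorised over x
```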
Creating the expression from other expressions is more straightforward (IMO) using rlang, than base R. Use the !! (bang-bang) to force evaluation of an object within an expression.
library(rlang)
a <- c(2, 3)
g <- expr(exp(!!a[1] * x) / (1 + exp(!!a[1] * x)))
h <- expr(exp(!!a[2] * x) / (1 + exp(!!a[2] * x)))
c <- expr(!!g * !!h)
c
#> exp(2 * x)/(1 + exp(2 * x)) * (exp(3 * x)/(1 + exp(3 * x)))
Created on 2020-03-21 by the reprex package (v0.3.0)
I think you may want a function, not an expression:
newfunc <- function(x) {
(exp(1)^(2*x)/(1 + exp(1)^(2*x))) *
(exp(1)^(3*x)/(1 + exp(1)^(3*x)))
}
a <- 1:10
newfunc(a)
[1] 0.8390245 0.9795856 0.9974043 0.9996585 0.9999543 0.9999938 0.9999992
[8] 0.9999999 1.0000000 1.0000000
If you want to chain together multiple functions explicitly, you could just do:
newfunc1 <- function(x) {
(exp(1)^(2*x)/(1 + exp(1)^(2*x)))
}
newfunc2 <- function(x) {
(exp(1)^(3*x)/(1 + exp(1)^(3*x)))
}
newfunc1(a) * newfunc2(a)
Keep in mind, as the help file at ?expression says:
‘Expression’ here is not being used in its colloquial sense, that
of mathematical expressions. Those are calls (see ‘call’) in R,
and an R expression vector is a list of calls, symbols etc, for
example as returned by ‘parse’.
You could define a binary function to combine expression objects in a slightly hacky way -- get their character representation, paste them with a *, then re-parse it:
"%c%" <- function(x, y) parse( text=paste(x, "*", y) )
gives the desired output when calling g %c% h, for example.
EDIT: Answer updated to correct previous error; thanks mnel!
