Symbolic derivatives on formulas - r

In R, I would like a way to take symbolic derivatives of the right hand side of formulas which may include interaction terms, squared terms, etc.
For example, I would like to be able to take the derivative of the right hand side of each of the following two [edit:three] formulas with respect to x:
y~x+I(x^2)
y~x:z
EDIT: y~x*z
I would like a function which, when each of the above three formulas are input, returns 1+2x, z, and 1+z, respectively.
I've tried the following:
f1<-y~x+I(x^2)
deriv(f1,"x")
## Error in deriv.formula(f1, "x") : Function 'I' is not in the derivatives table
f2<-y~x:z
deriv(f2,"x")
## Error in deriv.formula(f2, "x") : Function '`:`' is not in the derivatives table
Is there any way to force R to recognize I(x^2) (or, similarly, I(x*z), etc.) as x^2 (respectively, x*z), x:z as x*z (in the mathematical sense), and x*z (in the formula sense) as x+z+x*z (in the mathematical sense) for purposes of calculating the derivative?
Second, is there a way to take the output from deriv() and reshape it to look like the right hand side of a formula? In particular, I know that D() will alleviate this issue and generate output in the form I desire (though D() can't handle a formula as input), but what if I want to take derivatives with respect to multiple variables? I can work around this by applying D() over and over for each variable I'd like to take the derivative with respect to, but it would be nice to simply input a character string of all such variables and receive output suitable to be placed on the right hand side of a formula.
Thank you!

If you have a formula expression you can work with it using substitute():
substitute( x~x:z+x:y , list(`:`=as.name("*") ) )
x ~ x * z + x * y
And this will let you pass an expression object to substitute with it first being evaluated (which would otherwise not happen since substitute does not evaluate its first argument):
form1 <- expression(x ~ x : z + x : y)
rm(form2)
form2 <- do.call('substitute' , list(form , list(`:`=as.name("*") ) ))
form2
# expression(x ~ x * z + x * y)
This shows how to "reshape" the RHS so that y ~ x:z is handled like ~ x*z by extracting the RHS from its list structure where the tilde operator is being treated as a function and the LHS is the second element in (~ , <LHS>, <RHS>):
f2<-y~x:z
substar <- function(form) {
do.call('substitute' , list(form , list(`:`=as.name("*") ) )) }
f3 <- substar(f2)
deriv(f3[[3]],"x")
#----------------------
expression({
.value <- x * z
.grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
.grad[, "x"] <- z
attr(.value, "gradient") <- .grad
.value
})
If you want to work with expressions it may help to understand that they are organized like lists and that the operators are really Lisp-like functions:
> Z <- y~x+I(x^2)
> Z
y ~ x + I(x^2)
> Z[[1]]
`~`
> Z[[2]]
y
> Z[[3]]
x + I(x^2)
> Z[[3]][[1]]
`+`
> Z[[3]][[2]]
x
> Z[[3]][[3]]
I(x^2)
> Z[[3]][[3]][[1]]
I
> Z[[3]][[3]][[2]]
x^2
> Z[[3]][[3]][[2]][[1]]
`^`
If you want to see a function that will traverse an expression tree, the inimitable Gabor Grothendieck constructed one a few years ago in Rhelp: http://markmail.org/message/25lapzv54jc4wfwd?q=list:org%2Er-project%2Er-help+eval+substitute+expression

the help file of deriv (?deriv)says that expr argument in deriv function is a "
A expression or call or (except D) a formula with no lhs" . So you can't use left hand side of the equation in an expression.
On the second part of the question, if I correctly understood your question, you can do something like this: say your rhs is x^2+y^2 and you need to take partial derivative of this expression with x and y:
myexp <- expression((x^2) + (y^2))
D.sc.x <- D(myexp, "x")
> D.sc.x
2 * x
D.sc.y <- D(myexp, "y")
> D.sc.y
2 * y
In one line:
lapply(as.list(c("x","y")),function(a)D(myexp,a))
[[1]]
2 * x
[[2]]
2 * y

Related

(R) How can a bigger expression be written by combining shorter variable expressions?

Brief introduction, I have multiple data file that can be fitted by models based in mathematical equations that are a combination (by sum, mutiplication, etc...) of other shorter mathematical equations.
As an example I have three short equations:
expression1 <- exp(x)
expression2 <- exp(x^2)
expression3 <- exp(1/x)
Because each data file has it's own bigger expression that produce a better fit, I want to be able to generate these larger expressions as a combitaion of the shorter expressions.
What I want is to be able to write something like this:
expression1(1) + expression1(2) * expression2(3) + expression2(1)
And get:
x1 + x2 * x3^2 + 1/x1
Later I will use this larger equations to find the values x1, x2, x3 that better fits one data file.
1) Presumably you meant expression rather than exp. Making that change we define e1, e2, e3 and e. esub is a function which replaces a variable name with another in an expression. gsubfn in the package of the same name is like gsub except the second argument can be a function (possibly expressed using formula notation as we do here) which takes the capture groups in the pattern as arguments and replaces the entire pattern with the output of the function. We deparse e, use gsubfn and parse it back.
library(gsubfn)
e1 <- expression(x)
e2 <- expression(x^2)
e3 <- expression(1/x)
e <- expression(e1(1) + e1(2) * e2(3) + e2(1))
esub <- function(expr, env) do.call("substitute", list(expr, env))
g <- gsubfn("(\\w+)[(](\\w+)[)]",
~ deparse(esub(get(x)[[1]], list(x = as.name(paste0("x", y))))),
deparse(e))
parse(text = g)[[1]]
## expression(x1 + x2 * x3^2 + x1^2)
2) If it were desired to use strings instead of expressions it is even shorter:
library(gsubfn)
s1 <- "x"
s2 <- "x^2"
s3 <- "1/x"
s <- "s1(1) + s1(2) * s2(3) + s2(1)"
gsubfn("(\\w+)[(](\\w+)[)]", x + y ~ gsub("\\bx\\b", paste0("x", y), get(x)), s)
## [1] "x1 + x2 * x3^2 + x1^2"
exp is the exponential function, so your code doesn't do what you think it does. To get something like this to work probably needs the creation of a new S3 class with print and Ops methods:
make_exp <- function(expr) {
expr <- match.call()$expr
function(x) structure(list(val = do.call(substitute, list(expr))),
class = "ex")
}
print.ex <- function(x, ...) print(x$val)
Ops.ex <- function(e1, e2) structure(list(val = call(.Generic, e1$val, e2$val)),
class = "ex")
This allows
expression1 <- make_exp(x)
expression2 <- make_exp(x^2)
expression3 <- make_exp(1/x)
expression1(x1) + expression2(x2) * expression3(x3)
#> x1 + x2^2 * (1/x3)

Vectors with sigma notation (R)

I'm now learning R and have some difficulties while computing sigma notation. I know how to do the basic stuff like this:
summ <- 10:100
sum(summ^3 + 4 * summ^2)
But I don't know how to do the same operations with the values that differ from i (include not only i (ex: x and y)) or operations with two sigma notations in a row.
At the beginning I thought that it just requires to do the same as in the simple sigma notation with only i's
summ <- 1:10
sum((x^summ) / (y^summ))
But it shows an error that it is not a numeric argument.
Thank you in advance for your help.
For you second formula, you can define a function like below
f <- function(x,y,n) sum((x/y)**(1:n))
For you last formula, you can rewrite the expression as a product of two terms (you need a math transformation as the first step if you want to simplify the procedure), since i and j are independent
> sum((1:20)**2)*sum(1/(5+(1:10)**3))
[1] 886.0118
Otherwise, a straightforward translation from the formula could be using nested sapply
> sum(sapply(1:20,function(i) sapply(1:10, function(j) i**2/(5+j**3))))
[1] 886.0118
That's, basically, the answer to the first question with undefined variables x and y:
x <- readline(prompt = "Enter x: ")
y <- readline(prompt = "Enter y: ")
x <- as.integer(x)
y <- as.integer(y)
i = 1:10
answer <- sum((x^i) / (y^i))
answer

removing offset terms from a formula

R has a handy tool for manipulating formulas, update.formula(). This works nicely when you want to get something like "formula containing all terms in previous formula except x", e.g.
f1 <- z ~ a + b + c
(f2 <- update.formula(f1, . ~ . - c))
## z ~ a + b
However, this doesn't seem to work with offset terms:
f3 <- z ~ a + offset(b)
update(f3, . ~ . - offset(b))
## z ~ a + offset(b)
I've dug down as far as terms.formula, which ?update.formula references:
[after substituting, ...] The result is then simplified via ‘terms.formula(simplify = TRUE)’.
terms.formula(z ~ a + offset(b) - offset(b), simplify=TRUE)
## z ~ a + offset(b)
(i.e., this doesn't seem to remove offset(b) ...)
I know I can hack up a solution either by using deparse() and text-processing, or by processing the formula recursively to remove the term I don't want, but these solutions are ugly and/or annoying to implement. Either enlightenment as to why this doesn't work, or a reasonably compact solution, would be great ...
1) Recursion Recursively descend through the formula replacing offset(...) with offset and then remove offset using update. No string manipulation is done and although it does require a number of lines of code it's still fairly short and does remove single and multiple offset terms.
If there are multiple offsets one can preserve some of them by setting preserve so, for example, if preserve = 2 then the second offset is preserved and any others are removed. The default is to preserve none, i.e. remove them all.
no.offset <- function(x, preserve = NULL) {
k <- 0
proc <- function(x) {
if (length(x) == 1) return(x)
if (x[[1]] == as.name("offset") && !((k<<-k+1) %in% preserve)) return(x[[1]])
replace(x, -1, lapply(x[-1], proc))
}
update(proc(x), . ~ . - offset)
}
# tests
no.offset(z ~ a + offset(b))
## z ~ a
no.offset(z ~ a + offset(b) + offset(c))
## z ~ a
Note that if you don't need the preserve argument then the line
initializing k can be omitted and the if simplified to:
if (x[[1]] == as.name("offset")) return(x[[1]])
2) terms this neither uses string manipulation directly nor recursion. First get the terms object, zap its offset attribute and fix it using fixFormulaObject which we extract out of the guts of terms.formula. This could be made a bit less brittle by copying the source code of fixFormulaObject into your source and removing the eval line below. preserve acts as in (1).
no.offset2 <- function(x, preserve = NULL) {
tt <- terms(x)
attr(tt, "offset") <- if (length(preserve)) attr(tt, "offset")[preserve]
eval(body(terms.formula)[[2]]) # extract fixFormulaObject
f <- fixFormulaObject(tt)
environment(f) <- environment(x)
f
}
# tests
no.offset2(z ~ a + offset(b))
## z ~ a
no.offset2(z ~ a + offset(b) + offset(c))
## z ~ a
Note that if you don't need the preserve argument then the line that
zaps the offset attribute can be simplified to:
attr(tt, "offset") <- NULL
This seems to be by design. But a simple workaround is
offset2 = offset
f3 <- z ~ a + offset2(b)
update(f3, . ~ . - offset2(b))
# z ~ a
If you need the flexibility to accept formulae that do include offset(), for example if the formula is provided by a package user who may be unaware of the need to use offset2 in place of offset, then we should also add a line to change any instances of offset() in the incoming formula:
f3 <- z ~ a + offset(b)
f4 <- as.formula(gsub("offset\\(", "offset2(", deparse(f3)))
f4 <- update(f4, . ~ . - offset2(b))
# finally, just in case there are any references to offset2 remaining, we should revert them back to offset
f4 <- as.formula(gsub("offset2\\(", "offset(", deparse(f4)))
# z ~ a

Sum function in R

I want to compute a simple sum, but not from 1 to the value that I put in the sum function, instead I want it to sum like I would normally do in math, where I have an expression which has some variable, that I then change from 1:4, and then R is suppose to sum the expression values.
Like
y = function(x) x**2
sum(y(x),x=3:5) = 3^2+4^2+5^2
How do I do this in R?
You almost had it, just pass the 3:5 directly to y:
> y <- function(x) x**2
> sum(y(3:5))
[1] 50
You can create a custom function:
mysum <- function(f,vals) sum(f(vals))
mysum(y,3:5)
# [1] 50
While this is not standard in R, there are uses for passing function and arguments separately:
sapply(list(sqrt=sqrt,log=log,sin=sin),mysum,vals=1:3)
# sqrt log sin
# 4.146264 1.791759 1.891888
If your function doesn't accept a vector, then you'll need to use an apply function. In base R:
y <- function(x) x^2
sum(sapply(1:4, y))
or
sum(Vectorise(y)(1:4))
Assign the values to x beforehand and than sum the result of your function. So like this:
y = function(x) x^2
x = 3:5
sum(y(x))

Is there a better alternative than string manipulation to programmatically build formulas?

Everyone else's functions seem to take formula objects and then do dark magic to them somewhere deep inside and I'm jealous.
I'm writing a function that fits multiple models. Parts of the formulas for these models remain the same and part change from one model to the next. The clumsy way would be to have the user input the formula parts as character strings, do some character manipulation on them, and then use as.formula.
But before I go that route, I just want to make sure that I'm not overlooking some cleaner way of doing it that would allow the function to accept formulas in the standard R format (e.g. extracted from other formula-using objects).
I want something like...
> LHS <- y~1; RHS <- ~a+b; c(LHS,RHS);
y ~ a + b
> RHS2 <- ~c;
> c(LHS, RHS, RHS2);
y ~ a + b + c
or...
> LHS + RHS;
y ~ a + b
> LHS + RHS + RHS2;
y ~ a + b + c
...but unfortunately neither syntax works. Does anybody know if there is something that does? Thanks.
reformulate will do what you want.
reformulate(termlabels = c('x','z'), response = 'y')
## y ~ x + z
Or without an intercept
reformulate(termlabels = c('x','z'), response = 'y', intercept = FALSE)
## y ~ x + z - 1
Note that you cannot construct formulae with multiple reponses such as x+y ~z+b
reformulate(termlabels = c('x','y'), response = c('z','b'))
z ~ x + y
To extract the terms from an existing formula (given your example)
attr(terms(RHS), 'term.labels')
## [1] "a" "b"
To get the response is slightly different, a simple approach (for a single variable response).
as.character(LHS)[2]
## [1] 'y'
combine_formula <- function(LHS, RHS){
.terms <- lapply(RHS, terms)
new_terms <- unique(unlist(lapply(.terms, attr, which = 'term.labels')))
response <- as.character(LHS)[2]
reformulate(new_terms, response)
}
combine_formula(LHS, list(RHS, RHS2))
## y ~ a + b + c
## <environment: 0x577fb908>
I think it would be more sensible to specify the response as a character vector, something like
combine_formula2 <- function(response, RHS, intercept = TRUE){
.terms <- lapply(RHS, terms)
new_terms <- unique(unlist(lapply(.terms, attr, which = 'term.labels')))
response <- as.character(LHS)[2]
reformulate(new_terms, response, intercept)
}
combine_formula2('y', list(RHS, RHS2))
you could also define a + operator to work with formulae (update setting an new method for formula objects)
`+.formula` <- function(e1,e2){
.terms <- lapply(c(e1,e2), terms)
reformulate(unique(unlist(lapply(.terms, attr, which = 'term.labels'))))
}
RHS + RHS2
## ~a + b + c
You can also use update.formula using . judiciously
update(~a+b, y ~ .)
## y~a+b

Resources