removing offset terms from a formula - r

R has a handy tool for manipulating formulas, update.formula(). This works nicely when you want to get something like "formula containing all terms in previous formula except x", e.g.
f1 <- z ~ a + b + c
(f2 <- update.formula(f1, . ~ . - c))
## z ~ a + b
However, this doesn't seem to work with offset terms:
f3 <- z ~ a + offset(b)
update(f3, . ~ . - offset(b))
## z ~ a + offset(b)
I've dug down as far as terms.formula, which ?update.formula references:
[after substituting, ...] The result is then simplified via ‘terms.formula(simplify = TRUE)’.
terms.formula(z ~ a + offset(b) - offset(b), simplify=TRUE)
## z ~ a + offset(b)
(i.e., this doesn't seem to remove offset(b) ...)
I know I can hack up a solution either by using deparse() and text-processing, or by processing the formula recursively to remove the term I don't want, but these solutions are ugly and/or annoying to implement. Either enlightenment as to why this doesn't work, or a reasonably compact solution, would be great ...

1) Recursion. Recursively descend through the formula, replacing each offset(...) call with the bare symbol offset, and then remove offset using update. No string manipulation is done, and although it requires a number of lines of code it is still fairly short and removes both single and multiple offset terms.
If there are multiple offsets one can preserve some of them by setting preserve so, for example, if preserve = 2 then the second offset is preserved and any others are removed. The default is to preserve none, i.e. remove them all.
no.offset <- function(x, preserve = NULL) {
k <- 0
proc <- function(x) {
if (length(x) == 1) return(x)
if (x[[1]] == as.name("offset") && !((k<<-k+1) %in% preserve)) return(x[[1]])
replace(x, -1, lapply(x[-1], proc))
}
update(proc(x), . ~ . - offset)
}
# tests
no.offset(z ~ a + offset(b))
## z ~ a
no.offset(z ~ a + offset(b) + offset(c))
## z ~ a
Note that if you don't need the preserve argument then the line
initializing k can be omitted and the if simplified to:
if (x[[1]] == as.name("offset")) return(x[[1]])
2) terms. This uses neither string manipulation directly nor recursion. First get the terms object, zap its offset attribute, and then fix it up using fixFormulaObject, which we extract out of the guts of terms.formula. This could be made a bit less brittle by copying the source code of fixFormulaObject into your own source and removing the eval line below. preserve acts as in (1).
no.offset2 <- function(x, preserve = NULL) {
tt <- terms(x)
attr(tt, "offset") <- if (length(preserve)) attr(tt, "offset")[preserve]
eval(body(terms.formula)[[2]]) # extract fixFormulaObject
f <- fixFormulaObject(tt)
environment(f) <- environment(x)
f
}
# tests
no.offset2(z ~ a + offset(b))
## z ~ a
no.offset2(z ~ a + offset(b) + offset(c))
## z ~ a
Note that if you don't need the preserve argument then the line that
zaps the offset attribute can be simplified to:
attr(tt, "offset") <- NULL
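To see what (2) is zapping, it can help to inspect the terms object directly. The sketch below uses only terms() and attr() from base R. It shows that an offset is recorded in the "offset" attribute, as an index into the variables list, and does not appear among the term labels, which would be consistent with update() finding no ordinary term to subtract.

```r
tt <- terms(z ~ a + offset(b))

labs <- attr(tt, "term.labels")             # ordinary terms only
off  <- attr(tt, "offset")                  # position of offset(b) among the variables
vars <- as.list(attr(tt, "variables"))[-1]  # drop the leading `list` symbol

print(labs)
print(off)
print(vars[[off]])
```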

This seems to be by design. But a simple workaround is
offset2 = offset
f3 <- z ~ a + offset2(b)
update(f3, . ~ . - offset2(b))
# z ~ a
If you need the flexibility to accept formulae that do include offset(), for example if the formula is provided by a package user who may be unaware of the need to use offset2 in place of offset, then we should also add a line to change any instances of offset() in the incoming formula:
f3 <- z ~ a + offset(b)
f4 <- as.formula(gsub("offset\\(", "offset2(", deparse(f3)))
f4 <- update(f4, . ~ . - offset2(b))
# finally, just in case there are any references to offset2 remaining, we should revert them back to offset
f4 <- as.formula(gsub("offset2\\(", "offset(", deparse(f4)))
# z ~ a
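If the goal is simply to drop every offset, a variant that avoids the deparse()/gsub() round trip is to rebuild the formula from its term labels, since those do not include offsets. This is only a sketch: drop_offsets is a hypothetical helper name, it removes all offsets rather than a chosen one, and it assumes the formula has at least one non-offset term (reformulate() needs at least one term label).

```r
# Hypothetical helper: rebuild a formula without any of its offset() terms.
drop_offsets <- function(f) {
  tt <- terms(f)
  labs <- attr(tt, "term.labels")            # offsets are not listed here
  stopifnot(length(labs) > 0)                # assumption: some ordinary term remains
  resp <- if (length(f) == 3) f[[2]] else NULL
  out <- reformulate(labs, response = resp,
                     intercept = attr(tt, "intercept") == 1)
  environment(out) <- environment(f)         # keep the original environment
  out
}

drop_offsets(z ~ a + offset(b))
drop_offsets(z ~ a + offset(b) + offset(c) - 1)
```

Because the formula is rebuilt from expanded term labels, something like a*b comes back as a + b + a:b, which is equivalent but not textually identical.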

Related

Replace expressions in a source file from another source file in R

Hello I have the following problem
Say I have a file base.R
x <- 1
# comment
y <- Y ~ X1 +
X2
# comment 2
z <- function(x) {
x + 1
}
t <- z(x)
and another file override.R
x <- 2
y <- Y ~ X1 + X3
My goal would be to create another file new.R which is essentially base.R overridden by override.R
x <- 2
# comment
y <- Y ~ X1 + X3
# comment 2
z <- function(x) {
x + 1
}
t <- z(x)
Obviously if all expressions in base.R were one-liners I would be able to use sed, but unfortunately that's not the case.
Note that I only need it to work for assignments of the form lhs <- rhs, though ideally lhs = rhs would work as well.
EDIT: the above is a minimization of my actual problem
Sometimes a difficult problem is best made easier by redefining the problem itself. In the following we suggest a number of approaches, some of which have particularly simple implementations. In (7) we provide code that does what the question asks, but you may prefer to change the problem slightly and use the simpler code in one of the other solutions.
1) omit first few lines in base.R & concatenate files We will assume that override.R should override everything in base.R up to the last statement to be overridden in base.R. Comments in base.R after the overridden statements will be kept, as will any comments in override.R. Thus in the example, comment 2 will be kept and comment will be dropped, though it could be replicated in override.R if desired. That seems reasonable since you can't assume that a comment that applies to an assignment in base.R also applies in override.R.
Determine the number of statements n in override.R. Then parse base.R and find ix, the last line number before the first statement that is not to be overridden. Among the lines up to ix, find the last non-comment line number, mx. Now write out override.R followed by all but the first mx lines of base.R. In the code below replace stdout() with the desired name of the output file, e.g. "outfile.R".
library(utils)
n <- length(parse("override.R"))
g <- getParseData(parse("base.R"))
ix <- g$line1[grep("^0", g$parent)][n + 1] - 1
baseLines <- readLines("base.R")
is_comment <- grepl("^\\s*#", head(baseLines, ix))
mx <- max(which(!is_comment))
overrideLines <- readLines("override.R")
writeLines(c(overrideLines, tail(baseLines, -mx)), stdout())
giving:
x <- 2
y <- Y ~ X1 + X3
# comment 2
z <- function(x) {
x + 1
}
t <- z(x)
2) comment out rather than omit One alternative would be to comment out the overridden lines rather than omitting them. We can readily do that by replacing the writeLines statement with the statement below. This will allow one to see both the comments in base.R, if any, and the comments in override.R.
writeLines(c(overrideLines, sub("^", "# ", head(baseLines, mx)),
tail(baseLines, -mx)), stdout())
giving:
x <- 2
y <- Y ~ X1 + X3
# x <- 1
# # comment
# y <- Y ~ X1 +
# X2
# comment 2
z <- function(x) {
x + 1
}
t <- z(x)
3) separator If you control base.R then a simpler approach is to mark the end of the portion to be overridden. Suppose we put #--- on a line by itself in base.R between the portion to override and the rest. Then we have the following, which is simpler:
overrideLines <- readLines("override.R")
baseLines <- readLines("base.R")
ix <- grep("#---", baseLines)[1]
writeLines(c(overrideLines, tail(baseLines, -ix)), stdout())
4) exists or possibly, in base.R, check if x has already been defined and only define it if not. Ditto for y. Then it is just a matter of concatenating the two files or sourcing one after the other.
if (!exists("x")) x <- ...whatever...
if (!exists("y")) y <- ...whatever...
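As a hedged illustration of (4), the sketch below writes stand-ins for the two files into a temporary directory and sources them into a fresh environment. The file contents and the use of inherits = FALSE are assumptions added here: without inherits = FALSE, a name such as t would already "exist" because base R defines a function called t.

```r
# Hypothetical stand-ins for base.R and override.R in a temporary directory
dir <- tempfile("override-demo")
dir.create(dir)
base_file <- file.path(dir, "base.R")
override_file <- file.path(dir, "override.R")

writeLines(c(
  'if (!exists("x", inherits = FALSE)) x <- 1',
  'if (!exists("y", inherits = FALSE)) y <- Y ~ X1 + X2',
  'z <- function(x) x + 1',
  't <- z(x)'
), base_file)
writeLines('x <- 2', override_file)

# Source the overrides first, then the guarded defaults, into one environment
env <- new.env()
source(override_file, local = env)
source(base_file, local = env)

print(env$x)  # taken from override.R
print(env$y)  # default from base.R, since override.R did not set y
print(env$t)
```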
5) function Yet another possibility is to define a function whose defaults are the current values of x and y in base.R. Then we can call it as f() to get the defaults or specify them.
f <- function(x = ..., y = ...) { ...base.R code except x and y ...}
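A minimal sketch of (5), filling in the placeholders with the values from the question's base.R. The function name run_base and the choice to return a list are assumptions made for illustration.

```r
# Hypothetical wrapper around the body of base.R; defaults mirror base.R
run_base <- function(x = 1, y = Y ~ X1 + X2) {
  z <- function(x) {
    x + 1
  }
  t <- z(x)
  list(x = x, y = y, t = t)
}

defaults   <- run_base()                          # behaves like base.R alone
overridden <- run_base(x = 2, y = Y ~ X1 + X3)    # behaves like base.R + override.R
print(overridden$t)
```

With this design no file rewriting is needed at all: the overrides become ordinary function arguments.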
6) Omit definitions from base.R Perhaps the simplest alternative is just to omit the definitions from base.R and for each run have an override.R that is sourced first or concatenated.
7) Keep base.R comments This one does what the question asked but it is a bit complex and you may prefer one of the other solutions.
library(codetools)
library(utils)
baseLines <- readLines("base.R")
overrideLines <- readLines("override.R")
p_o <- parse("override.R")
g_o <- getParseData(p_o)
locals_o <- findLocalsList(p_o)
ipos <- sapply(locals_o, function(x) which(g_o$text == x)[1]-1)
DFo <- cbind(g_o[ipos, ], var = names(ipos))
p_b <- parse("base.R")
g_b <- getParseData(p_b)
ipos <- sapply(locals_o, function(x) which(g_b$text == x)[1]-1)
DFb <- data.frame(g_b[ipos, ], var = names(ipos), row.names = NULL)
o <- order(-DFb$line1)
DFb <- DFb[o, ]
newLines <- baseLines
for(i in 1:nrow(DFb)) {
j <- match(DFb$var[i], DFo$var)
newLines <- append(newLines,
overrideLines[DFo$line1[j]:DFo$line2[j]], DFb$line2[i])
newLines <- newLines[-(DFb$line1[i]:DFb$line2[i])]
}
writeLines(newLines, stdout())
giving:
x <- 2
# comment
y <- Y ~ X1 + X3
# comment 2
z <- function(x) {
x + 1
}
t <- z(x)
If you can accept comments being stripped, then this might suffice for you:
Starting with base.R:
x <- 1
# comment
y <- Y ~ X1 +
X2
# comment 2
z <- function(x) {
x + 1
}
t <- z(x)
and override.R:
x <- 2
y <- Y ~ X1 + X3
We can run:
base <- parse("base.R")
override <- parse("override.R")
base_assignment <-
sapply(base, function(z) as.character(z[[1]]) %in% c("<-", "="))
base_lhs <- mapply(function(assigned, z) as.character(z[[2]]),
base_assignment, base)
override_assignment <-
sapply(override, function(z) as.character(z[[1]]) %in% c("<-", "="))
override_lhs <- mapply(function(assigned, z) as.character(z[[2]]),
override_assignment, override)
matches <- match(base_lhs, override_lhs)
base[which(!is.na(matches))] <- override[na.omit(matches)]
writeLines(paste(do.call(c, lapply(base, deparse)), collapse = "\n"), "new.R")
and now we have new.R with
x <- 2
y <- Y ~ X1 + X3
z <- function(x) {
x + 1
}
t <- z(x)
For the sake of discussion, in order to retain comments we'd likely need to use getParseData:
1) iterate over $parent and $id so that our $line1 references can be combined, and store this reduced line1 in a new variable (since we'll need to remove the originals from getParseData(base));
2) find all references to $token == "SYMBOL" where there exists $token == "LEFT_ASSIGN" later in each expression. This starts to hobble it a little when we have "EQ_ASSIGN" or, more of a challenge, "RIGHT_ASSIGN" (since the presumed order of symbols changes);
3) use the object names found in step 2, i.e. those to which assignments occur, to compare between base/override processing;
4) replace the subset of each version's parsed frame;
5) find a way to recombine the resulting parsed frame into a source file.
I ran out of time trying to get this to work elegantly/robustly, so I offer it as an example of effort-required in order to retain comments.
I suggest that if your intent is to allow a single source file of overriding expressions, it makes sense to keep the base.R untouched (as in your question) and create a temporary new.R that is used and sourced and discarded, in which case its comments are tangential.
This would be very challenging with sed, but you could try using awk; this works with the example data:
awk 'BEGIN{FS="<-|="} NR==FNR{a[$1]=$0; next}; {if($1 in a){c=1} else if (/[#<=]/){c++}; if(c == 1){print a[$1]} else {print $0}}' override.R base.R
x <- 2
# comment
y <- Y ~ X1 + X3
# comment 2
z <- function(x) {
x + 1
}
t <- z(x)
Basically, if the LHS from override.R is found in base.R the 'counter' is set to 1, then if any of the "#<=" characters are encountered in the following lines the counter is incremented. Then all lines in base.R with counter == 1 are replaced with the corresponding line from override.R. I can't think of cases where this would fail, but I'd be interested to see if it holds up on more complicated examples.
Formatted with further explanation:
awk 'BEGIN{FS="<-|="} # set the field separator to either "<-" or "="
NR==FNR{a[$1]=$0; next} # load override.R into an array, key = 1st field (i.e. the LHS)
{
if($1 in a){c=1} # if the LHS from override.R is found, set "c" (counter) to 1
else if(/[#<=]/){c++} # if the line contains "#", "<" or "=", increment counter
if(c==1){print a[$1]} # if c equals 1 (i.e. the LHS is present) print the override.R line
else {print $0} # else print the base.R line
}' override.R base.R

(R) How can a bigger expression be written by combining shorter variable expressions?

A brief introduction: I have multiple data files that can be fitted by models based on mathematical equations that are a combination (by sum, multiplication, etc.) of other, shorter mathematical equations.
As an example I have three short equations:
expression1 <- exp(x)
expression2 <- exp(x^2)
expression3 <- exp(1/x)
Because each data file has its own bigger expression that produces a better fit, I want to be able to generate these larger expressions as a combination of the shorter expressions.
What I want is to be able to write something like this:
expression1(1) + expression1(2) * expression2(3) + expression2(1)
And get:
x1 + x2 * x3^2 + 1/x1
Later I will use these larger equations to find the values x1, x2, x3 that best fit one data file.
1) Presumably you meant expression rather than exp. Making that change we define e1, e2, e3 and e. esub is a function which replaces a variable name with another in an expression. gsubfn in the package of the same name is like gsub except the second argument can be a function (possibly expressed using formula notation as we do here) which takes the capture groups in the pattern as arguments and replaces the entire pattern with the output of the function. We deparse e, use gsubfn and parse it back.
library(gsubfn)
e1 <- expression(x)
e2 <- expression(x^2)
e3 <- expression(1/x)
e <- expression(e1(1) + e1(2) * e2(3) + e2(1))
esub <- function(expr, env) do.call("substitute", list(expr, env))
g <- gsubfn("(\\w+)[(](\\w+)[)]",
~ deparse(esub(get(x)[[1]], list(x = as.name(paste0("x", y))))),
deparse(e))
parse(text = g)[[1]]
## expression(x1 + x2 * x3^2 + x1^2)
2) If it were desired to use strings instead of expressions it is even shorter:
library(gsubfn)
s1 <- "x"
s2 <- "x^2"
s3 <- "1/x"
s <- "s1(1) + s1(2) * s2(3) + s2(1)"
gsubfn("(\\w+)[(](\\w+)[)]", x + y ~ gsub("\\bx\\b", paste0("x", y), get(x)), s)
## [1] "x1 + x2 * x3^2 + x1^2"
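For comparison, here is a base-R-only sketch of the same idea that skips the deparse/parse round trip by splicing calls together with bquote(). The helper name sub_x is hypothetical, and the pieces are stored with quote() rather than expression() to keep the splicing simple.

```r
e1 <- quote(x)
e2 <- quote(x^2)
e3 <- quote(1/x)

# Hypothetical helper: rename x to x<i> inside a quoted piece
sub_x <- function(expr, i) {
  do.call(substitute, list(expr, list(x = as.name(paste0("x", i)))))
}

# Splice the renamed pieces into the larger expression
big <- bquote(.(sub_x(e1, 1)) + .(sub_x(e1, 2)) * .(sub_x(e2, 3)) + .(sub_x(e2, 1)))
print(big)
```

The result is an unevaluated call, so it can be passed to eval() with a list of values for x1, x2 and x3.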
exp is the exponential function, so your code doesn't do what you think it does. To get something like this to work probably needs the creation of a new S3 class with print and Ops methods:
make_exp <- function(expr) {
expr <- match.call()$expr
function(x) structure(list(val = do.call(substitute, list(expr))),
class = "ex")
}
print.ex <- function(x, ...) print(x$val)
Ops.ex <- function(e1, e2) structure(list(val = call(.Generic, e1$val, e2$val)),
class = "ex")
This allows
expression1 <- make_exp(x)
expression2 <- make_exp(x^2)
expression3 <- make_exp(1/x)
expression1(x1) + expression2(x2) * expression3(x3)
#> x1 + x2^2 * (1/x3)

Dynamically update formula with vector under R 4.0.0 and higher

I would like to dynamically update a formula with a vector under R 4.0.0 or higher. Hence everything is the same as under this link - R: Dynamically update formula - but x is now a vector, and the R version is >= 4.0.
In short, I have a formula, e.g. y ~ 1, and would like to update it via the character scalar myvar1 or the character vector myvar2.
For character vectors of length greater than one, the suggested solution fails under R >= 4.0.0. In the release notes for 4.0.0, this is mentioned as a bug fix. The note states that "formula(x) with length(x) > 1 character vectors, is deprecated now. Such use has been rare, and has ‘worked’ as expected in some cases only. In other cases, wrong x have silently been truncated, not detecting previous errors." (https://cran.r-project.org/doc/manuals/r-patched/NEWS.html)
myvar1 <- "x1"
myvar2 <- c("x1", "x2")
update(y ~ 1 , paste(" ~ . +", myvar1))
# y ~ x1
update(y ~ 1 , paste(" ~ . +", myvar2))
# y ~ x1
# Warning message:
# Using formula(x) is deprecated when x is a character vector of length > 1.
# Consider formula(paste(x, collapse = " ")) instead.
So, how would I update a formula with a character vector in R 4.0.0 and higher?
What you need to do now is to manipulate your string so that you obtain: "~ . + x1 + x2".
myvar2 <- c("x1", "x2")
formula_update <- paste(
"~ . +",
paste(myvar2, collapse = " + ")
)
formula_update
[1] "~ . + x1 + x2"
update(y ~ 1, formula_update)
y ~ x1 + x2
Would you consider
reformulate(myvar1, response="y")
an acceptable substitute? (It doesn't do exactly the same thing but might work.)
You could also make your update-argument a formula ...
update(y~1, reformulate(c(".",myvar1)))
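The same pattern should carry over to the vector case from the question, since reformulate() accepts a character vector of term labels. A short sketch:

```r
myvar2 <- c("x1", "x2")

# reformulate() builds ~ . + x1 + x2, which update() then applies to y ~ 1
f <- update(y ~ 1, reformulate(c(".", myvar2)))
print(f)
```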

Symbolic derivatives on formulas

In R, I would like a way to take symbolic derivatives of the right hand side of formulas which may include interaction terms, squared terms, etc.
For example, I would like to be able to take the derivative of the right hand side of each of the following two [edit:three] formulas with respect to x:
y~x+I(x^2)
y~x:z
EDIT: y~x*z
I would like a function which, when each of the above three formulas are input, returns 1+2x, z, and 1+z, respectively.
I've tried the following:
f1<-y~x+I(x^2)
deriv(f1,"x")
## Error in deriv.formula(f1, "x") : Function 'I' is not in the derivatives table
f2<-y~x:z
deriv(f2,"x")
## Error in deriv.formula(f2, "x") : Function '`:`' is not in the derivatives table
Is there any way to force R to recognize I(x^2) (or, similarly, I(x*z), etc.) as x^2 (respectively, x*z), x:z as x*z (in the mathematical sense), and x*z (in the formula sense) as x+z+x*z (in the mathematical sense) for purposes of calculating the derivative?
Second, is there a way to take the output from deriv() and reshape it to look like the right hand side of a formula? In particular, I know that D() will alleviate this issue and generate output in the form I desire (though D() can't handle a formula as input), but what if I want to take derivatives with respect to multiple variables? I can work around this by applying D() over and over for each variable I'd like to take the derivative with respect to, but it would be nice to simply input a character string of all such variables and receive output suitable to be placed on the right hand side of a formula.
Thank you!
If you have a formula expression you can work with it using substitute():
substitute( x~x:z+x:y , list(`:`=as.name("*") ) )
x ~ x * z + x * y
And this will let you pass an expression object to substitute with it first being evaluated (which would otherwise not happen since substitute does not evaluate its first argument):
form1 <- expression(x ~ x : z + x : y)
# take the call stored in the expression vector and substitute within it
form2 <- do.call('substitute' , list(form1[[1]] , list(`:`=as.name("*") ) ))
form2
# x ~ x * z + x * y
This shows how to "reshape" the RHS so that y ~ x:z is handled like ~ x*z by extracting the RHS from its list structure where the tilde operator is being treated as a function and the LHS is the second element in (~ , <LHS>, <RHS>):
f2<-y~x:z
substar <- function(form) {
do.call('substitute' , list(form , list(`:`=as.name("*") ) )) }
f3 <- substar(f2)
deriv(f3[[3]],"x")
#----------------------
expression({
.value <- x * z
.grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
.grad[, "x"] <- z
attr(.value, "gradient") <- .grad
.value
})
If you want to work with expressions it may help to understand that they are organized like lists and that the operators are really Lisp-like functions:
> Z <- y~x+I(x^2)
> Z
y ~ x + I(x^2)
> Z[[1]]
`~`
> Z[[2]]
y
> Z[[3]]
x + I(x^2)
> Z[[3]][[1]]
`+`
> Z[[3]][[2]]
x
> Z[[3]][[3]]
I(x^2)
> Z[[3]][[3]][[1]]
I
> Z[[3]][[3]][[2]]
x^2
> Z[[3]][[3]][[2]][[1]]
`^`
If you want to see a function that will traverse an expression tree, the inimitable Gabor Grothendieck constructed one a few years ago in Rhelp: http://markmail.org/message/25lapzv54jc4wfwd?q=list:org%2Er-project%2Er-help+eval+substitute+expression
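A minimal sketch of such a traversal for the question at hand (this is not the function from the linked post, and rhs_to_math is a hypothetical name). It assumes only two rewrites are wanted: unwrap I(...) and treat `:` as multiplication. The formula-sense expansion of x*z into x + z + x:z is not handled.

```r
# Hypothetical helper: walk the call tree, unwrapping I() and mapping `:` to `*`
rhs_to_math <- function(e) {
  if (!is.call(e)) return(e)                        # symbols and constants
  if (identical(e[[1]], as.name("I"))) return(rhs_to_math(e[[2]]))
  if (identical(e[[1]], as.name(":"))) e[[1]] <- as.name("*")
  e[-1] <- lapply(e[-1], rhs_to_math)               # recurse into the arguments
  e
}

f1 <- y ~ x + I(x^2)
f2 <- y ~ x:z

d1 <- D(rhs_to_math(f1[[3]]), "x")
d2 <- D(rhs_to_math(f2[[3]]), "x")
print(d1)  # 1 + 2 * x
print(d2)  # z
```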
The help file of deriv (?deriv) says that the expr argument of the deriv function is "A expression or call or (except D) a formula with no lhs". So you can't use the left hand side of the equation in an expression.
On the second part of the question, if I correctly understood your question, you can do something like this: say your rhs is x^2+y^2 and you need to take partial derivative of this expression with x and y:
myexp <- expression((x^2) + (y^2))
D.sc.x <- D(myexp, "x")
> D.sc.x
2 * x
D.sc.y <- D(myexp, "y")
> D.sc.y
2 * y
In one line:
lapply(as.list(c("x","y")),function(a)D(myexp,a))
[[1]]
2 * x
[[2]]
2 * y

Is there a better alternative than string manipulation to programmatically build formulas?

Everyone else's functions seem to take formula objects and then do dark magic to them somewhere deep inside and I'm jealous.
I'm writing a function that fits multiple models. Parts of the formulas for these models remain the same and part change from one model to the next. The clumsy way would be to have the user input the formula parts as character strings, do some character manipulation on them, and then use as.formula.
But before I go that route, I just want to make sure that I'm not overlooking some cleaner way of doing it that would allow the function to accept formulas in the standard R format (e.g. extracted from other formula-using objects).
I want something like...
> LHS <- y~1; RHS <- ~a+b; c(LHS,RHS);
y ~ a + b
> RHS2 <- ~c;
> c(LHS, RHS, RHS2);
y ~ a + b + c
or...
> LHS + RHS;
y ~ a + b
> LHS + RHS + RHS2;
y ~ a + b + c
...but unfortunately neither syntax works. Does anybody know if there is something that does? Thanks.
reformulate will do what you want.
reformulate(termlabels = c('x','z'), response = 'y')
## y ~ x + z
Or without an intercept
reformulate(termlabels = c('x','z'), response = 'y', intercept = FALSE)
## y ~ x + z - 1
Note that you cannot construct formulae with multiple responses such as x+y ~z+b
reformulate(termlabels = c('x','y'), response = c('z','b'))
z ~ x + y
To extract the terms from an existing formula (given your example)
attr(terms(RHS), 'term.labels')
## [1] "a" "b"
To get the response is slightly different, a simple approach (for a single variable response).
as.character(LHS)[2]
## [1] "y"
combine_formula <- function(LHS, RHS){
.terms <- lapply(RHS, terms)
new_terms <- unique(unlist(lapply(.terms, attr, which = 'term.labels')))
response <- as.character(LHS)[2]
reformulate(new_terms, response)
}
combine_formula(LHS, list(RHS, RHS2))
## y ~ a + b + c
## <environment: 0x577fb908>
I think it would be more sensible to specify the response as a character vector, something like
combine_formula2 <- function(response, RHS, intercept = TRUE){
.terms <- lapply(RHS, terms)
new_terms <- unique(unlist(lapply(.terms, attr, which = 'term.labels')))
# use the response string passed in rather than re-deriving it from a global LHS
reformulate(new_terms, response, intercept)
}
combine_formula2('y', list(RHS, RHS2))
You could also define a + operator to work with formulae (update: this sets a new method for formula objects)
`+.formula` <- function(e1,e2){
.terms <- lapply(c(e1,e2), terms)
reformulate(unique(unlist(lapply(.terms, attr, which = 'term.labels'))))
}
RHS + RHS2
## ~a + b + c
You can also use update.formula using . judiciously
update(~a+b, y ~ .)
## y~a+b
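Building on the update() idea, the LHS/RHS/RHS2 example from the question can be sketched without any string handling by splicing each right-hand side into an update template. The helper name add_rhs is hypothetical, and the sketch assumes each piece is a formula whose last element is the part to append.

```r
LHS  <- y ~ 1
RHS  <- ~ a + b
RHS2 <- ~ c

# Hypothetical helper: append the right-hand side of `rhs` to formula `f`
add_rhs <- function(f, rhs) {
  piece <- rhs[[length(rhs)]]                 # RHS of a one- or two-sided formula
  update(f, eval(bquote(~ . + .(piece))))
}

combined <- Reduce(add_rhs, list(RHS, RHS2), LHS)
print(combined)
```

Reduce() makes it straightforward to pass any number of right-hand sides as a list.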

Resources