R substitute(), to substitute values in expression, is adding unnecessary quotes - r

I am trying to update a formula for a linear model in R, based on names of variables that I have stored in an array. I am using substitute() for that and the code is as follows.
var = 'a'
covar = c('b', 'c')
covar = paste(c(var, covar), collapse = ' + ')
formula = substitute(condition ~ (1|subject) + v, list(v = as.name(covar)))
print(formula)
Output
condition ~ (1 | subject) + `a + b + c`
How do I remove the extra `` around a + b + c?
If I don't concatenate with paste, then it works, but I need those extra variables...
var = 'a'
formula = substitute(condition ~ (1|subject) + v, list(v = as.name(var)))
print(formula)
Output
condition ~ (1 | subject) + a
Both var and covar are char type.
Another solution that lets iteratively change v in formula that could also work

Assume that v is a term by itself (which is the case in the question) and the inputs shown in the Note at the end. Then here are two approaches.
1) update Use reformulate to create the formula ~ . - v + a + b + c and update the input formula with it.
update(fo, reformulate(c(". - v", var, covar)))
## condition ~ (1 | subject) + a + b + c
2) getTerms Another approach is to decompose the formula into terms using getTerms from this post, remove v, append var and covar and reformulate it back into a formula:
reformulate(c(setdiff(sapply(getTerms(fo[[3]]), format), "v"), var, covar), fo[[2]])
## condition ~ (1 | subject) + a + b + c
Note
The inputs are assumed to be:
var <- 'a'
covar <- c('b', 'c')
fo <- condition ~ (1 | subject) + v

Maybe I misunderstood what you are doing, but the following seems to work:
form <- 'condition ~ (1|subject) + v'
var <- 'a'
covar <- c('b', 'c')
Then combine with paste and turn to formula directly:
covar <- paste(var, paste(covar, collapse=" + "), sep=" + ")
form <- formula(paste(form, covar, sep=" + "))
Output:
condition ~ (1 | subject) + v + a + b + c

Related

How to collapse vectors using paste to create function with both * and +

I have two vectors
s.null <- c("a1","a2","a3")
s.full <- c("a1","a2","a3","b","c","d")
I am comparing a null model to a full model, the null where I'm creating my null model like so:
null.model <- lm(as.formula(paste(colnames(data)[i], "~",
paste(s.null, collapse = "*"),
sep = ""
)),
data=data)
So that my model looks like
some_data_value ~ a1 * a2 *a3
ie the interaction between terms a1,a2,and a3 is accounted for in the null model.
I would like to do the same thing in the full model but have the covariates b, c, and d also present:
some_data_value ~ a1 * a2 *a3 + b +c + d
but I am unsure of how to use paste to accomplish this. If anyone could offer a solution as to how I could use paste to accomplish this I would be very grateful!
We can use either paste
paste('some_data_value ~ ', paste(s.full[1:3], collapse=" * "),
"+", paste(s.full[4:6], collapse = " + "))
Or in a slightly different way to construct the formula
paste('some_data_value ~ ', paste(matrix(s.full, ncol =2),
rep(c(' * ', ' + ', ''), c(2, 3, 1)), collapse=''))
or str_c
library(stringr)
glue::glue('some_data_value ~ {str_c(s.full[1:3], collapse = " * ")} + {str_c(s.full[4:6], collapse= " + ")}')
#some_data_value ~ a1 * a2 * a3 + b + c + d
Or may be use reformulate with update
update(reformulate(setdiff(s.full, s.null), 'some_data_value'), paste(". ~ . *", s.null))
Maybe you can use null model terms to construct full model formula
formula(paste('some_data_value ~', paste(s.null, collapse = " * "), '+',
paste(setdiff(s.full, s.null), collapse = " + ")))
#some_data_value ~ a1 * a2 * a3 + b + c + d

Subsetting formula objects in R

Given any formula object (e.g., f) below, I was wondering how I could separate tilda sign and everything after it and convert it into a formula object?
My desired output only in this example case is: ~ es.type+weeks as a formula object.
NOTE: f could be ANY formula, the above f is just an example. I'm looking for a general solution.
f <- formula(dint ~ es.type+weeks) # Only as an example
g <- paste0(f[[1]], f[[3]]) # No success
as.formula(g) # No success
You can just manipulate the formula directly:
f <- y ~ x1 + x2:x3
f[[2]] <- f[[3]]
f[[3]] <- NULL
identical(f, ~ x1 + x2:x3)
# TRUE
An option is to drop the terms based on the number of terms in the formula
g <- formula(drop.terms(terms(f), 3))
g
#~es.type + weeks
f1 <- formula(dint ~ es.type:weeks)
formula(drop.terms(terms(f1), 3))
#~es.type:weeks
It would be better to create a function to applied for different formulas
form1 <- function(form) {
i1 <- length(terms(form)) + 1
formula(drop.terms(terms(form), i1))
}
f1 <- formula(dint ~ es.type+weeks+dd)
f2 <- formula(dint ~ es.type+weeks)
form1(f1)
#~es.type + weeks + dd
form1(f2)
#~es.type + weeks
If we need to add a new term
update(form1(f2), ~time +.)
#~time + es.type + weeks

R substitute function prints c() concatenated into equation

I had an old function that worked like a charm:
lm_eqn = function(m) {
l <- list(a = format(coef(m)[1], digits = 2),
b = format(abs(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
eq <- substitute(italic(C)[i] == a + b %.% italic(I)[i]*","~~italic(r)^2~"="~r2,l)
as.character(as.expression(eq));
}
where m was an lm model. This would produce an equation like the following:
y = 0.3 + 4.4x, r = 0.67
which could then be used in a ggplot to show the model formula with its graph. The problem is that the same equation now incorporates uncalled for symbols:
y = c(0.3) + c(4.4)x, r=0.67
The concatenated c() is now included for each variable from the list I am accruing - and I don't know why. Does anyone know how to
a) prevent this, or
b) correct it?
Note: the problem seems to emerge in substitution, the output of eq is:
"italic(y) == c(`(Intercept)` = \"0.3\") + c(x = \"4.4\") %.% italic(x) * \",\" ~ ~italic(r)^2 ~ \"=\" ~ \"0.67\""
It looks like substitute's output includes the c() for the intercept and slope.
edit
m in this case is a generic lm element. For example
x <- c(5,3,6,8,2,6)
y <- c(2,6,3,7,4,9)
test.lm <- lm(y~x)
lm_eqn(test.lm)
[1] "italic(C)[i] == c(`(Intercept)` = \"3.3\") + c(x = \"0.37\") %.% italic(I)[i] * \",\" ~ ~italic(r)^2 ~ \"=\" ~ \"0.0969\""
You apparently need to unname the coef() values:
lm_eqn = function(m) {
l <- list(a = format(unname(coef(m))[1], digits = 2),
b = format(abs(unname(coef(m))[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
eq <- bquote( italic(C)[i] == .(l$a) + .(l$b) %.% italic(I)[i]*","~~italic(r)^2~"="~.(l$r2))
as.character(as.expression(eq));
}
I also think you need to clarify exactly what you are hoping to see. At the moment you are creating an expression vector with two elements and then you are converting that to a character. The fact that ggplot requires character values for its "expressions" makes it quite difficult to look at a character value and figure out what will be displayed, so you should probably expand your test code to include that manner in which this value will be delivered. (It's much easier to look at a real R expression.) I think there are mechanisms that allow unevaluated expressions to be passed to ggplot annotations and titles but they seem incredibly convoluted to my eyes.
Could also use substitute which requires specifying a list that has named elements.
lm_eqn = function(m) {
l <- list(a = format(unname(coef(m))[1], digits = 2),
b = format(abs(unname(coef(m))[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
eq <- substitute( italic(C)[i] == a + b %.% italic(I)[i]*","~~italic(r)^2 == r2, env=l) )
as.character(as.expression(eq));
}
lm_eqn(test.lm)
[1] "italic(C)[i] == \"3.3\" + \"0.37\" %.% italic(I)[i] * \",\" ~ ~italic(r)^2 == \"0.0969\""

Add a " ' " around any number of functions in a formula/string

The Simple Request:
I'd like to take formulas/strings similar to the following:
"A ~ 1 + B + C + L(diff(B), -k:k) + L(diff(C), -k:k)"
and change them to treat functions as character as follows:
"A ~ 1 + B + C + `L(diff(B), -k:k)` + `L(diff(C), -k:k)`"
There may be any number of "L(diff(___), -____:____)" in a string.
The Background:
This is so that I may take the output models made with dynlm and run them with functions that rely on "lm" objects only.
# package
library(dynlm)
# data
A <- as.ts(rnorm(20, 10, 2))
B <- as.ts(A + rnorm(20, 6, 2))
C <- as.ts(rnorm(20, 3, 1))
# lags/leads
k <- 1
# dynlm model
dyn.mod <- dynlm(A ~ 1 + B + C + L(diff(B), -k:k) + L(diff(C), -k:k))
# capture the formula and data
dyn.mod.call <- gsub(" ", "", paste(deparse(dyn.mod$call$formula), collapse = "")) # just in case formula is too long
dyn.mod.model <- dyn.mod$model # the matrix that was created from the call formula
# Do the following
lm(dyn.mod.call, data = dyn.mod.model) # Will not run obviously,
lm(A ~ 1 + B + C + `L(diff(B), -k:k)` + `L(diff(C), -k:k)`, data = dyn.mod.model) # will run
# how do I change
dyn.mod.call
# [1] "A ~ 1 + B + C + L(diff(B), -k:k) + L(diff(C), -k:k)"
# to ad " ` " around each dynlm "L()" function so the process is not manual?
Thanks for your help.
Note that we don't really want to replace all function calls since + is a function (and even ~ can be regarded as a function) but we only want to replace just certain ones. Suppose that the only function call that we want to process is L. Modify the second if appropriately according to what it is that is to be matched. The function shown works recursively. No packages are used.
enquote_L <- function(x) {
if (length(x) == 1) return(x)
if (x[[1]] == as.name("L")) return(as.name(format(x)))
replace(x, -1, lapply(x[-1], enquote_L))
}
s <- "A ~ 1 + B + C + L(diff(B), -k:k) + L(diff(C), -k:k)"
enquote_L(as.formula(s))
## A ~ 1 + B + C + `L(diff(B), -k:k)` + `L(diff(C), -k:k)`
ADDED
If there were a variety of functions and + and ~ were the only ones not to be processed then a variation might be to replace the second if with:
if (x[[1]] != as.name("+") && x[[1]] != as.name("~")) return(as.name(format(x)))
You can use string manipulation to change the formula.
x <- deparse(A ~ 1 + B + C + L(diff(B), -k:k) + L(diff(C), -k:k))
parts <- unlist(strsplit(x, " \\+ "))
parts <- c(parts[1:3], paste0("`", parts[4:5], "`"))
as.formula(paste(parts, collapse = " + "))

How to wrap RHS terms of a formula with a function

I can construct a formula that does what I desire starting with the character versions of terms in a formula, but I'm stumbling in starting with a formula object:
form1 <- Y ~ A + B
form1[-c(1,2)][[1]]
#A + B
Now how to build a formula object that looks like:
Y ~ poly(A, 2) + poly(B, 2) + poly(C, 2)
Or:
Y ~ pspline(A, 4) + pspline(B, 4) + pspline(C, 4)
Seems that it might involve a recursive walk along the RHS but I'm not getting progress. It just occurred to me that I might use
> attr( terms(form1), "term.labels")
[1] "A" "B"
And then use the as.formula(character-expr) approach, but I's sorly of like to see an lapply (RHS_form, somefunc) version of a polyize (or perhaps polymer?) function.
If I borrow some functions I originally wrote here, you could do something like this. First, the helper functions...
extract_rhs_symbols <- function(x) {
as.list(attr(delete.response(terms(x)), "variables"))[-1]
}
symbols_to_formula <- function(x) {
as.call(list(quote(`~`), x))
}
sum_symbols <- function(...) {
Reduce(function(a,b) bquote(.(a)+.(b)), do.call(`c`, list(...), quote=T))
}
transform_terms <- function(x, f) {
symbols_to_formula(sum_symbols(sapply(extract_rhs_symbols(x), function(x) do.call("substitute",list(f, list(x=x))))))
}
And then you can use
update(form1, transform_terms(form1, quote(poly(x, 2))))
# Y ~ poly(A, 2) + poly(B, 2)
update(form1, transform_terms(form1, quote(pspline(x, 4))))
# Y ~ pspline(A, 4) + pspline(B, 4)
There's a formula.tools package that provides various utility functions for working with formulas.
f <- y ~ a + b
rhs(f) # a + b
x <- get.vars(rhs(f)) # "a" "b"
r <- paste(sprintf("poly(%s, 4)", x), collapse=" + ") # "poly(a, 4) + poly(b, 4)"
rhs(f) <- parse(text=r)[[1]]
f # y ~ poly(a, 4) + poly(b, 4)

Resources