I have a data frame and a formula stored in variables:
> d <- data.frame(cls=1, foo=2, bar=3)
> f <- formula(cls ~ .)
I'd like to remove one variable from the RHS of this formula programmatically (in my code, the name of this variable would be passed somewhere as a string). I tried using update.formula:
> update(f, .~.-foo)
Error in terms.formula(tmp, simplify = TRUE) :
'.' in formula and no 'data' argument
Then I tried providing the data argument:
> update(f, .~.-foo, data=d)
Error in terms.formula(tmp, simplify = TRUE) :
'.' in formula and no 'data' argument
I know the above would work if the initial formula didn't have a dot on the right side:
> f <- formula(cls ~ foo + bar)
> update(f, .~.-foo)
cls ~ bar
How do I remove a variable from RHS of a formula if I can't ensure that RHS doesn't contain a dot?
Expand the dot against the data with terms() first, then update the result:
update(terms(f, data = d), . ~ . - foo)
# cls ~ bar
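Since the question says the variable name arrives as a string, the same idea can be sketched with the update formula built from that string. This is only a minimal sketch: drop_from_rhs and its drop_var argument are hypothetical names introduced here, not part of the original post.

```r
# Toy data and dotted formula from the question.
d <- data.frame(cls = 1, foo = 2, bar = 3)
f <- cls ~ .

# Hypothetical helper: expand the dot against `data`, then subtract the
# variable whose name is given as a string in `drop_var`.
drop_from_rhs <- function(f, data, drop_var) {
  expanded <- formula(terms(f, data = data))  # the dot becomes foo + bar
  update(expanded, as.formula(paste(". ~ . -", drop_var)))
}

f2 <- drop_from_rhs(f, d, "foo")
print(f2)
```

Expanding first matters because update() needs explicit terms to subtract from; once the dot is gone, the usual . ~ . - foo pattern applies.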
Related
I am trying to create a function that allows me to pass outcome and predictor variable names as strings into the lm() regression function. I have asked a version of this before, but I have since learned a new technique and would like to apply the same idea in this new format.
Here is the process
library(tidyverse)
# toy data
df <- tibble(f1 = factor(rep(letters[1:3], 5)),
             c1 = rnorm(15),
             out1 = rnorm(15))
# pass the relevant inputs into new objects like in a function
d <- df
outcome <- "out1"
predictors <- c("f1", "c1")
# now create the model formula to be entered into the model
form <- as.formula(
  paste(outcome,
        paste(predictors, collapse = " + "),
        sep = " ~ "))
# now pass the formula into the model
model <- eval(bquote(lm(.(form),
                        data = d)))
model
# Call:
# lm(formula = out1 ~ f1 + c1, data = d)
#
# Coefficients:
# (Intercept)          f1b          f1c           c1
#     0.16304     -0.01790     -0.32620     -0.07239
This all works nicely and gives an adaptable way of passing variables into lm(). But what if we want to apply special contrast coding to the factor variable? I tried:
model <- eval(bquote( lm(.(form),
data = d,
contrasts = list(predictors[1] = contr.treatment(3)) %>% setNames(predictors[1])) ))
But got this error
Error: unexpected '=' in:
" data = d,
contrasts = list(predictors[1] ="
Any help much appreciated.
Reducing this to the command generating the error:
list(predictors[1] = contr.treatment(3))
Results in:
Error: unexpected '=' in "list(predictors[1] ="
list() cannot be given a name this way: in a call, an argument name must be a literal symbol or string, so an expression such as predictors[1] on the left of = is a syntax error.
Your approach of using setNames() works, but needs to be wrapped around the list construction step itself.
setNames(list(contr.treatment(3)), predictors[1])
Output is a named list containing a contrast matrix:
$f1
  2 3
1 0 0
2 1 0
3 0 1
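Putting the pieces together, the named list can be built before the lm() call and passed in as the contrasts argument. The following is a minimal sketch under a few assumptions: it uses a plain data.frame instead of a tibble so that it runs in base R, and the seed and the name contr are arbitrary choices made here.

```r
set.seed(1)  # arbitrary seed so the toy data are reproducible
d <- data.frame(f1 = factor(rep(letters[1:3], 5)),
                c1 = rnorm(15),
                out1 = rnorm(15))
outcome <- "out1"
predictors <- c("f1", "c1")

# Build the model formula from strings, as in the question.
form <- as.formula(paste(outcome,
                         paste(predictors, collapse = " + "),
                         sep = " ~ "))

# Name the list element from a string instead of writing `f1 = ...` literally.
contr <- setNames(list(contr.treatment(3)), predictors[1])

# `contr` is an ordinary variable, so it can be passed straight through.
model <- eval(bquote(lm(.(form), data = d, contrasts = contr)))
print(names(contr))
```

Building the list outside the call keeps the bquote() expression simple and sidesteps the naming problem entirely.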
Given data and a variable name:
set.seed(1253)
dat = data.frame(x = c(1:4, NA), y = rnorm(5), extra = rnorm(5))
var = "extra"
I would like to create a model.frame with all three variables when only two are specified in the formula. This can be done by passing the extra variable through the dots argument:
model.frame(y ~ x, dat, var = extra)
#            y x      (var)
# 1  1.0447865 1  1.4039139
# 2  1.8088280 2 -0.1656416
# 3  0.9614491 3 -0.8215288
# 4 -1.6359538 4  1.0751587
However, I need to be able to add columns to a model.frame from a character string. My attempt:
model.frame(y ~ x, dat, var = var)
returns an error message:
Error in model.frame.default(y ~ x, dat, var = var) :
variable lengths differ (found for '(var)')
How can I add additional variables to a model.frame from a character vector of column names? Alternatively, is it possible to extend model.response and model.matrix with variables that are not present in the formula?
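One possible approach, offered as a sketch rather than a definitive answer: model.frame() evaluates its extra arguments inside the data, so the string first has to become a symbol. Here as.name() does the conversion and bquote() splices the symbol into the call; the object name mf is an assumption made for this example.

```r
set.seed(1253)
dat <- data.frame(x = c(1:4, NA), y = rnorm(5), extra = rnorm(5))
var <- "extra"

# as.name() turns the string into a symbol; bquote() splices it into the call,
# so model.frame() sees `var = extra` and looks `extra` up in `dat`.
mf <- eval(bquote(model.frame(y ~ x, dat, var = .(as.name(var)))))
print(names(mf))
```

The original attempt failed because var = var handed model.frame() the length-one string "extra" itself, whose length does not match the other columns.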
I have a function that takes a data.frame and returns a residualized version of it, with a chosen variable as the predictor.
residuals.DF = function(data, resid.var, suffix="") {
  lm_f = function(x) {
    x = residuals(lm(data=data, formula= x ~ eval(parse(text=resid.var))))
  }
  resid = data.frame(apply(data,2,lm_f))
  colnames(resid) = paste0(colnames(data),suffix)
  return(resid)
}
set.seed(31233)
df = data.frame(Age = c(1,3,6,7,3,8,4,3,2,6),
Var1 = c(19,45,76,34,83,34,85,34,27,32),
Var2 = round(rnorm(10)*100))
df.res = residuals.DF(df, "Age", ".test")
df.res
Age.test Var1.test Var2.test
1 -1.696753e-17 -25.1351351 -90.20582
2 -1.318443e-19 -0.8108108 31.91892
3 -5.397735e-18 27.6756757 84.10603
4 -5.927747e-18 -15.1621622 -105.83160
5 -3.807699e-18 37.1891892 -57.08108
6 -6.457759e-18 -16.0000000 -25.76923
7 5.117344e-17 38.3513514 -65.01871
8 -3.807699e-18 -11.8108108 35.91892
9 -3.277687e-18 -17.9729730 97.85655
10 -5.397735e-18 -16.3243243 94.10603
This works fine. However, I often need to use the eval(parse()) combination when working with variable inputs to lm(), so I decided to write a wrapper function:
#Wrapper function for convenience for evaluating strings
evalparse = function(string) {
eval(parse(text=string))
}
This works fine when used alone, e.g.:
> evalparse("5+5")
[1] 10
However, if one uses it in the above function, one gets:
> df.res = residuals.DF(df, "Age", ".test")
Error in eval(expr, envir, enclos) : object 'Age' not found
I figure this is because the wrapper evaluates the string in its own environment, where the chosen variable is missing. This does not happen with a bare eval(parse()) call, because that is evaluated in the lm() environment, where the chosen variable is available.
Is there some clever solution to this problem? A better way of using dynamic formulas in lm()? Otherwise I will have to keep typing eval(parse(text=object)).
Anytime you're trying to perform operations that modify the contents of a formula, you should use update because it is designed for this purpose.
In your case, you want to modify your function as follows:
residuals.DF = function(data, resid.var, suffix="") {
  lm_f = function(x) {
    x = residuals(lm(data=data, formula= update(x ~ 0, paste0("~",resid.var))))
  }
  resid = data.frame(apply(data,2,lm_f))
  colnames(resid) = paste0(colnames(data),suffix)
  return(resid)
}
Basically, update (or the update.formula method specifically) takes a formula as its first argument, and then allows for modifications based on its second argument. To get a handle on it, check out the following examples:
f <- y ~ x
f
# y ~ x
update(f, ~ z)
# y ~ z
update(f, x ~ y)
# x ~ y
update(f, "~ x + y")
# y ~ x + y
update(f, ~ . + z + w)
# y ~ x + z + w
x <- "x"
update(f, paste0("~",x))
# y ~ x
As you can see, the second argument can be a formula or character string containing one or more variables. This greatly simplifies the creation of a dynamically modified formula where you are only trying to change one part of the formula.
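As an alternative to update(), base R's reformulate() builds a complete formula from strings, which avoids eval(parse()) altogether. The sketch below is a variation on the function above, not the answer's own method: residualize is a hypothetical name, and unlike the original it skips the predictor column itself rather than returning near-zero residuals for it.

```r
# Data from the question.
df <- data.frame(Age  = c(1, 3, 6, 7, 3, 8, 4, 3, 2, 6),
                 Var1 = c(19, 45, 76, 34, 83, 34, 85, 34, 27, 32))

# Hypothetical helper: regress every other column on `resid.var` and
# return the residuals, one column per regressed variable.
residualize <- function(data, resid.var, suffix = "") {
  targets <- setdiff(names(data), resid.var)
  res <- lapply(targets, function(col) {
    f <- reformulate(resid.var, response = col)  # e.g. Var1 ~ Age
    unname(residuals(lm(f, data = data)))
  })
  names(res) <- paste0(targets, suffix)
  as.data.frame(res)
}

df.res <- residualize(df, "Age", ".test")
print(df.res)
```

Because both sides of the formula are column names, lm() finds everything in data and no environment tricks are needed.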
I get the following error when I try to predict using lmer
> predict(mm1, newdata = TEST)
Error in terms.formula(formula(x, fixed.only = TRUE)) :
'.' in formula and no 'data' argument
This is what my formula looks like
> formula(mm1)
log_bid_price ~ . - zip_cbsa_name + (1 | zip_cbsa_name)
I'm able to summarize the model, but I can't pass it to the predict function.
I would like to be able to automatically generate a formula given the columns of the predictor matrix and then pass that to lmer. How would I do that?
You might have more success building formula objects like so:
resp <- "log_bid_price"
reserve.coef <- c("zip_cbsa_name")
RHS <- setdiff(names(data), c(resp, reserve.coef))
f <- paste0(paste(resp, paste(RHS, collapse="+"), sep= "~"), " + (1 | zip_cbsa_name)")
mm1 <- lmer(as.formula(f), data = data)
For example:
paste0(paste("Y", paste(c("a", "b", "c"), collapse= "+"), sep="~"), "+ (1 | zip_cbsa_name)")
[1] "Y~a+b+c+ (1 | zip_cbsa_name)"
If you wish to do variable selection as part of model selection, you can iterate on this to produce your RHS object.
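Iterating could look roughly like the sketch below, which builds one formula per nested set of fixed effects. The data frame listings, the columns sqft and beds, and the helper build_formula are all invented for illustration; only log_bid_price and zip_cbsa_name come from the question. The lmer() call is left as a comment so that the sketch runs without lme4 installed.

```r
# Hypothetical stand-in for the poster's data.
listings <- data.frame(log_bid_price = c(1.2, 1.5, 1.1, 1.9),
                       sqft = c(900, 1200, 800, 1500),
                       beds = c(2, 3, 2, 4),
                       zip_cbsa_name = c("A", "A", "B", "B"))

resp <- "log_bid_price"
group <- "zip_cbsa_name"

# Paste the fixed effects and the random intercept into one formula.
build_formula <- function(fixed) {
  as.formula(paste0(resp, " ~ ", paste(fixed, collapse = " + "),
                    " + (1 | ", group, ")"))
}

candidates <- setdiff(names(listings), c(resp, group))

# One formula per nested set of predictors: sqft, then sqft + beds.
forms <- lapply(seq_along(candidates),
                function(k) build_formula(candidates[seq_len(k)]))

# Each element could then be fitted, e.g. lme4::lmer(forms[[2]], data = listings).
print(forms)
```

Because every formula names its variables explicitly, predict() no longer has to resolve a dot.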
ac0=c("WEIGHT","PREMENO","SMOKE") #this is the vector with names
ac1=glm(FRACTURE~PRIORFRAC+AGE+HEIGHT+MOMFRAC+RATERISK, data=glow, family='binomial')
ac2=update(ac1, formula.=~.+ac0[1])
The error is:
Error in model.frame.default(formula = FRACTURE ~ PRIORFRAC + AGE + HEIGHT + :
invalid type (list) for variable 'ac0[1]'
You could create the formula and then pass it to update:
mF <- formula(paste(". ~ . +", ac0[1]))
update(ac1, mF)
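The original call fails because a formula is not evaluated like ordinary code, so ac0[1] stays in the model as a literal term instead of being replaced by "WEIGHT". The same paste-then-update pattern extends to every name in the vector. Since the glow data are not available here, the sketch below uses simulated data; toy, base_fit, extra and fits are all hypothetical names.

```r
set.seed(42)  # arbitrary seed for the simulated stand-in data
toy <- data.frame(y = rbinom(100, 1, 0.5),
                  a = rnorm(100),
                  b = rnorm(100),
                  c = rnorm(100))

base_fit <- glm(y ~ a, data = toy, family = binomial)
extra <- c("b", "c")  # names of candidate variables, as strings

# Refit the base model once per candidate, adding that variable to the RHS.
fits <- lapply(extra, function(v) {
  update(base_fit, as.formula(paste(". ~ . +", v)))
})
names(fits) <- extra

print(lapply(fits, formula))
```

Each element of fits is a full glm object, so the refitted models can be compared with the usual tools such as AIC() or anova().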