Given any formula object (e.g., f below), how can I separate the tilde sign and everything after it, and convert that into a formula object?
My desired output for this example is: ~ es.type+weeks as a formula object.
NOTE: f could be ANY formula, the above f is just an example. I'm looking for a general solution.
f <- formula(dint ~ es.type+weeks) # Only as an example
g <- paste0(f[[1]], f[[3]]) # No success
as.formula(g) # No success
You can just manipulate the formula directly:
f <- y ~ x1 + x2:x3
f[[2]] <- f[[3]]
f[[3]] <- NULL
identical(f, ~ x1 + x2:x3)
# TRUE
Another option is drop.terms(). Given an index one past the last term it drops no terms, and because keep.response defaults to FALSE the response is removed:
g <- formula(drop.terms(terms(f), 3))
g
#~es.type + weeks
f1 <- formula(dint ~ es.type:weeks)
formula(drop.terms(terms(f1), 3))
#~es.type:weeks
It would be better to wrap this in a function that can be applied to different formulas:
form1 <- function(form) {
i1 <- length(attr(terms(form), "term.labels")) + 1  # one past the last term, so no term is dropped
formula(drop.terms(terms(form), i1))
}
f1 <- formula(dint ~ es.type+weeks+dd)
f2 <- formula(dint ~ es.type+weeks)
form1(f1)
#~es.type + weeks + dd
form1(f2)
#~es.type + weeks
If we need to add a new term
update(form1(f2), ~time +.)
#~time + es.type + weeks
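For comparison, base R's delete.response() expresses the same idea without any index arithmetic. This is a minimal sketch using only the stats package; the helper name rhs_formula is made up for illustration.

```r
# Hypothetical helper: keep only the right-hand side of a formula
rhs_formula <- function(form) {
  # delete.response() strips the response from a terms object;
  # formula() turns the result back into a one-sided formula
  formula(delete.response(terms(form)))
}

f <- dint ~ es.type + weeks
g <- rhs_formula(f)
g
```

The result can be passed to update() in the same way as the output of form1().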
Related
I am trying to update a formula for a linear model in R, based on names of variables that I have stored in an array. I am using substitute() for that and the code is as follows.
var = 'a'
covar = c('b', 'c')
covar = paste(c(var, covar), collapse = ' + ')
formula = substitute(condition ~ (1|subject) + v, list(v = as.name(covar)))
print(formula)
Output
condition ~ (1 | subject) + `a + b + c`
How do I remove the extra `` around a + b + c?
If I don't concatenate with paste, then it works, but I need those extra variables...
var = 'a'
formula = substitute(condition ~ (1|subject) + v, list(v = as.name(var)))
print(formula)
Output
condition ~ (1 | subject) + a
Both var and covar are char type.
A different solution that lets me iteratively change v in the formula would also work.
Assume that v is a term by itself (which is the case in the question) and the inputs shown in the Note at the end. Then here are two approaches.
1) update. Use reformulate to create the formula ~ . - v + a + b + c and update the input formula with it.
update(fo, reformulate(c(". - v", var, covar)))
## condition ~ (1 | subject) + a + b + c
2) getTerms. Another approach is to decompose the formula into terms using getTerms from this post, remove v, append var and covar, and reformulate the result back into a formula:
reformulate(c(setdiff(sapply(getTerms(fo[[3]]), format), "v"), var, covar), fo[[2]])
## condition ~ (1 | subject) + a + b + c
Note
The inputs are assumed to be:
var <- 'a'
covar <- c('b', 'c')
fo <- condition ~ (1 | subject) + v
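The getTerms helper is not reproduced in this answer. The sketch below is a hypothetical stand-in (named get_terms here, not the original function) that splits a right-hand side on top-level + calls, which is enough to run approach 2 on the inputs from the Note.

```r
# Hypothetical stand-in for the getTerms helper referenced above:
# recursively split an expression on top-level binary `+` calls
get_terms <- function(e) {
  if (is.call(e) && length(e) == 3 && identical(e[[1]], as.name("+"))) {
    c(get_terms(e[[2]]), get_terms(e[[3]]))
  } else {
    list(e)
  }
}

var <- "a"
covar <- c("b", "c")
fo <- condition ~ (1 | subject) + v

# Turn each term back into a string, drop v, append the new variables
term_strs <- vapply(get_terms(fo[[3]]),
                    function(e) paste(deparse(e), collapse = " "),
                    character(1))
new_fo <- reformulate(c(setdiff(term_strs, "v"), var, covar),
                      response = "condition")
new_fo
```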
Maybe I misunderstood what you are doing, but the following seems to work:
form <- 'condition ~ (1|subject) + v'
var <- 'a'
covar <- c('b', 'c')
Then combine with paste and turn to formula directly:
covar <- paste(var, paste(covar, collapse=" + "), sep=" + ")
form <- formula(paste(form, covar, sep=" + "))
Output:
condition ~ (1 | subject) + v + a + b + c
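The backticks in the question appear because as.name() turns the whole string "a + b + c" into a single symbol with that name. A minimal sketch of one way around it, assuming R 3.6.0 or later for str2lang() (parse(text = ...)[[1]] would be the older equivalent): parse the string into a call first, then substitute that call for v.

```r
var <- "a"
covar <- c("b", "c")

# Parse "a + b + c" into a call instead of a single oddly named symbol
rhs <- str2lang(paste(c(var, covar), collapse = " + "))

# substitute() returns an unevaluated call; eval() turns it into a formula
fo <- eval(substitute(condition ~ (1 | subject) + v, list(v = rhs)))
class(fo)
all.vars(fo)
```

Here v is replaced rather than kept, which matches the substitute() call in the question.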
I can construct a formula that does what I desire starting with the character versions of terms in a formula, but I'm stumbling in starting with a formula object:
form1 <- Y ~ A + B
form1[-c(1,2)][[1]]
#A + B
Now how to build a formula object that looks like:
Y ~ poly(A, 2) + poly(B, 2) + poly(C, 2)
Or:
Y ~ pspline(A, 4) + pspline(B, 4) + pspline(C, 4)
It seems that this might involve a recursive walk along the RHS, but I'm not making progress. It just occurred to me that I might use
> attr( terms(form1), "term.labels")
[1] "A" "B"
And then use the as.formula(character-expr) approach, but I'd sort of like to see an lapply(RHS_form, somefunc) version of a polyize (or perhaps polymer?) function.
If I borrow some functions I originally wrote here, you could do something like this. First, the helper functions...
extract_rhs_symbols <- function(x) {
as.list(attr(delete.response(terms(x)), "variables"))[-1]
}
symbols_to_formula <- function(x) {
as.call(list(quote(`~`), x))
}
sum_symbols <- function(...) {
Reduce(function(a, b) bquote(.(a) + .(b)), do.call(`c`, list(...), quote = TRUE))
}
transform_terms <- function(x, f) {
symbols_to_formula(sum_symbols(sapply(extract_rhs_symbols(x), function(x) do.call("substitute",list(f, list(x=x))))))
}
And then you can use
update(form1, transform_terms(form1, quote(poly(x, 2))))
# Y ~ poly(A, 2) + poly(B, 2)
update(form1, transform_terms(form1, quote(pspline(x, 4))))
# Y ~ pspline(A, 4) + pspline(B, 4)
There's a formula.tools package that provides various utility functions for working with formulas.
library(formula.tools)
f <- y ~ a + b
rhs(f) # a + b
x <- get.vars(rhs(f)) # "a" "b"
r <- paste(sprintf("poly(%s, 4)", x), collapse=" + ") # "poly(a, 4) + poly(b, 4)"
rhs(f) <- parse(text=r)[[1]]
f # y ~ poly(a, 4) + poly(b, 4)
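The term.labels idea mentioned in the question can also be written with base R alone. The following is a rough sketch, not a general solution: polyize and its template argument are names invented here, and it assumes every term label is a plain variable name.

```r
# Hypothetical helper: wrap every RHS term label in a template such as poly(%s, 2)
polyize <- function(form, template = "poly(%s, 2)") {
  labs <- attr(terms(form), "term.labels")
  out <- reformulate(sprintf(template, labs), response = form[[2]])
  environment(out) <- environment(form)  # keep the original formula's environment
  out
}

form1 <- Y ~ A + B
res_poly <- polyize(form1)
res_spline <- polyize(form1, "pspline(%s, 4)")
res_poly
res_spline
```

Interaction terms such as A:B would be pasted into the template verbatim, so they need separate handling.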
I have the following ANOVA in R which works great:
fit <- aov(dependent1 ~ X + Z + X*Z, data=dataset)
drop1(fit,~.,test="F")
"dependent1", "X", and "Z" are the column names.
I want to write a for loop over a number of dependent variables, and I tried this:
dependent_variables <- c("dependent1", "dependent2", "dependent3")
for (i in dependent_variables) {
fit <- aov(i ~ X + Z + X*Z, data=dataset)
drop1(fit,~.,test="F")
}
If I run this, I get an error message:
Error in model.frame.default(formula = i ~ X + Z + X * :
variable lengths differ (found for 'X')
Any idea what goes wrong here?
Example data (which may or may not fulfil the criteria for an ANOVA)
X <- rnorm(100)
Z <- rnorm(100)
dependent1 <- rnorm(100)
dependent2 <- rnorm(100)
dependent3 <- rnorm(100)
dataset <- cbind(data.frame(X, Z, dependent1, dependent2, dependent3))
The following script would work; you need to put in the column numbers of your dependent variables:
for (i in 3:5) {
fit <- aov(dataset[ , i] ~ X + Z + X*Z, data=dataset)
drop <- drop1(fit,~.,test="F")
print(fit)
print(drop)
}
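As for why the original loop fails: inside the formula, i is not replaced by the column name. The model looks for a variable literally called i, finds the length-1 character string, and complains that its length differs from that of X. One way to keep looping over names is to build the formula from the string. This is a minimal sketch using reformulate() on simulated data like the example above; the seed and the results list are additions for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
dataset <- data.frame(X = rnorm(100), Z = rnorm(100),
                      dependent1 = rnorm(100),
                      dependent2 = rnorm(100),
                      dependent3 = rnorm(100))

dependent_variables <- c("dependent1", "dependent2", "dependent3")

results <- lapply(dependent_variables, function(dv) {
  # Build e.g. dependent1 ~ X * Z from the column name
  fml <- reformulate("X * Z", response = dv)
  fit <- aov(fml, data = dataset)
  drop1(fit, ~ ., test = "F")
})
names(results) <- dependent_variables
results[["dependent1"]]
```

X * Z expands to X + Z + X:Z, so it is the same model as X + Z + X*Z.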
Why not loop through data instead of looping through names? Perhaps this is a bit clunkier than what you're trying to do.
Create data
dependent1 = runif(100);
dependent2 = runif(100);
dependent3 = runif(100);
dataset = data.frame(X=1:100, Z=rnorm(100))  # 100 draws, one per row
Run single ANOVA
fit = aov(dependent1 ~ X + Z + X*Z, data=dataset)
drop1(fit,~.,test="F")
cbind the dependents together and loop over them, storing results in list objects
d = cbind(dependent1, dependent2, dependent3)
fit = list(); drop = list()
for (i in 1:ncol(d)) {
fit[[i]] = aov(d[,i] ~ X + Z + X*Z, data=dataset)
drop[[i]] = drop1(fit[[i]],~.,test="F")
}
I'm trying to use neuralnet for prediction.
Create some X:
x <- cbind(seq(1, 50, 1), seq(51, 100, 1))
Create Y:
y <- x[,1]*x[,2]
Give them names:
colnames(x) <- c('x1', 'x2')
names(y) <- 'y'
Make data.frame:
dt <- data.frame(x, y)
And now I get an error:
model <- neuralnet(y~., dt, hidden=10, threshold=0.01)
error in terms.formula(formula) : '.' in formula and no 'data' argument
By contrast, the same formula works in lm (linear model).
As my comment states, this looks like a bug in the non-exported function neuralnet:::generate.initial.variables. As a work around, just build a long formula from the names of dt, excluding y, e.g.
n <- names(dt)
f <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))
f
## gives
> f
y ~ x1 + x2
## fit model using `f`
model <- neuralnet(f, data = dt, hidden=10, threshold=0.01)
> model
Call: neuralnet(formula = f, data = dt, hidden = 10, threshold = 0.01)
1 repetition was calculated.
Error Reached Threshold Steps
1 53975276.25 0.00857558698 1967
Offering a simpler alternative to the previous answer, you can create a formula from names of dt using reformulate():
f <- reformulate(setdiff(colnames(dt), "y"), response="y")
reformulate() doesn't require the use of paste() and automatically adds the terms together.
To expand a formula
f <- formula(terms(f, data= dt))
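where f is the formula and dt is the data. (The shorter call formula(dt, f) may look equivalent, but formula() dispatches on its first argument, so with a data frame there it builds the formula from the column names, taking the first column as the response, instead of expanding f. Check the result before relying on it.)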
For instance, the original formula could be:
f <- as.formula("y ~ .")
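A small self-contained sketch of the terms() expansion, rebuilding a data frame like the one in the question (column names x1, x2, y):

```r
dt <- data.frame(x1 = 1:50, x2 = 51:100)
dt$y <- dt$x1 * dt$x2

f <- as.formula("y ~ .")

# terms() needs `data` to know what the dot stands for
f_expanded <- formula(terms(f, data = dt))
f_expanded
attr(terms(f_expanded), "term.labels")
```

The expanded formula no longer contains a dot, so it can be handed to functions that do not resolve the dot themselves.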
While investigating some fundamentals of multiple regression, I decided to try and compare my manual efforts to those of the "effects" package, by John Fox. I've generated variables with some relationships, and want to get adjusted means for a factor when controlling for the influence of a continuous variable.
I have become stalled, however, as the effect function in the effects package returns an error "invalid type (builtin) for variable 'c'"
When I check the type of variable 'c' using typeof(c), I'm told it is of type double, as I constructed it to be.
What could be the cause of this error?
Is the variable 'c' being coerced for some reason to type 'builtin'?
Here is my code:
set.seed(1986)
y <- rnorm(100)
f <- sapply(y, function(x) if(x < 0) 1 else 2)
f.f <- as.factor(f)
set.seed(1987)
c <- rnorm(100, 0, .1) + y + f
an3 <- lm(y ~ f.f + c); summary(an3)
ef <- effect("f.f", an3)
c is not a good choice for a variable name. It's an extremely commonly used built-in function in R.
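The wording of the error fits this explanation. The sketch below only shows what R reports for the two objects that share the name c. That effect() ends up looking up c somewhere the user's vector is not visible is an assumption here, not something checked against the package source.

```r
c <- rnorm(5)               # user vector that masks the name `c`
type_user <- typeof(c)      # type of the user's vector
type_base <- typeof(base::c)  # type of the base function of the same name

rm(c)                       # remove the user vector again
type_after <- typeof(c)     # the name now resolves to base::c

type_user   # "double"
type_base   # "builtin"
type_after  # "builtin"
```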
Changing c to d works for me:
set.seed(1986)
y <- rnorm(100)
f <- sapply(y, function(x) if(x < 0) 1 else 2)
f.f <- as.factor(f)
set.seed(1987)
d <- rnorm(100, 0, .1) + y + f
an3 <- lm(y ~ f.f + d); summary(an3)
library(effects)
ef <- effect("f.f", an3)
ef
f.f effect
f.f
1 2
0.5504214 -0.3231941
Another option is to store the data in a data.frame; this has other benefits as well, especially if one is working with multiple data sets.
set.seed(1986)
d <- data.frame(y=rnorm(100))
d <- within(d, {
f <- sapply(y, function(x) if(x < 0) 1 else 2)
f.f <- as.factor(f)
set.seed(1987)
c <- rnorm(100, 0, .1) + y + f
})
library(effects)
an3 <- lm(y ~ f.f + c, data=d); summary(an3)
ef <- effect("f.f", an3)
ef
# f.f effect
# f.f
# 1 2
# 0.5504214 -0.3231941