How do I write this exponential model as a function in R?

For y = B0 + B1 * x, I can write it as lm(y ~ x). However, I am not sure how to write y = B0 * e^(B1 * x) as a model in R.
I have tried lm(log(y) ~ x), lm(y ~ exp(x)), lm(y ~ log(x)), and lm(log(y) ~ log(x)), but I am not sure which is correct. I get different results for each model.

The two ways that you can do this that are actually faithful to the original statistical model (Gaussian errors with constant variance) are:
glm(y ~ x, family = gaussian(link = "log"), data = ...)
(but you'll have to exponentiate the intercept parameter to recover b0) or
nls(y ~ b0*exp(b1*x), start = ..., data = ...)
(but you'll have to provide starting values in the form start = list(b0 = 1, b1 = 1), using some sensible values).
y = b0*exp(b1*x) implies log(y) = log(b0) + b1*x, but transforming the response variable in this way changes the statistical model (the errors become multiplicative on the original scale rather than additive with constant variance), so lm(log(y) ~ x, data = ...) will give you similar but not identical answers to the preceding two recipes.
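As a rough check of the three recipes, here is a minimal sketch on simulated data. The true values b0 = 2 and b1 = 0.5, the noise level, and all object names are assumptions made up for the illustration. The starting values for nls() are taken from the log-linear fit, which is one sensible choice among many.

```r
set.seed(1)
# Simulated data (assumed for illustration): y = 2 * exp(0.5 * x) + Gaussian noise
dd <- data.frame(x = seq(0, 5, length.out = 200))
dd$y <- 2 * exp(0.5 * dd$x) + rnorm(nrow(dd), sd = 0.2)

# Log-linear fit: changes the error model, but gives handy starting values
fit_lm <- lm(log(y) ~ x, data = dd)

# Gaussian errors with a log link; the intercept estimates log(b0)
fit_glm <- glm(y ~ x, family = gaussian(link = "log"), data = dd)

# Nonlinear least squares on the original scale
fit_nls <- nls(y ~ b0 * exp(b1 * x), data = dd,
               start = list(b0 = exp(coef(fit_lm)[[1]]), b1 = coef(fit_lm)[[2]]))

# Put b0 and b1 from each fit side by side
est <- rbind(
  glm = c(b0 = exp(coef(fit_glm)[[1]]), b1 = coef(fit_glm)[[2]]),
  nls = coef(fit_nls),
  lm  = c(b0 = exp(coef(fit_lm)[[1]]), b1 = coef(fit_lm)[[2]])
)
print(est)
```

The glm and nls rows should agree up to convergence tolerance, since both minimise the same sum of squares; the lm row should be close but not identical.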

Related

How can I apply interaction between two variable for multiple non-linear regression with GAM?

I have data with a response Y and two predictors X1 and X2 that are measured in different units, e.g. X1 = xxxx volts and X2 = xx hours.
To fit a regression model to this data, I used the code below.
MODEL <- gam(Y ~ s(X1) + s(X2), data = DATA, method = "REML")
It seems to work well, but I want to add an interaction between X1 and X2 to my model.
Could I do the multiple non-linear regression using code like:
MODEL <- gam(Y ~ s(X1) + s(X2) + ti(X1, X2), data = DATA, method = "REML")
or should I use a different formula for this?
The preferred way would be:
MODEL <- gam(Y ~ te(X1, X2), data = DATA, method = "REML")
as that entails selection of fewer smoothness parameters, so it is a slightly simpler version of the model you showed.
However, if you want to see whether the interaction is significant, or otherwise want to separate the main effects from the interaction, then yes, the model you showed is the way to go:
MODEL <- gam(Y ~ s(X1) + s(X2) + ti(X1, X2), data = DATA, method = "REML")
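To make the comparison concrete, here is a small sketch on simulated data using mgcv. The data-generating function, sample size, and the names m_te and m_ti are assumptions for illustration only; they are not from the question.

```r
library(mgcv)

set.seed(1)
n <- 300
# Simulated data (assumed): X1 in "volts", X2 in "hours", with a true interaction
DATA <- data.frame(X1 = runif(n, 0, 10), X2 = runif(n, 0, 5))
DATA$Y <- sin(DATA$X1) + 0.5 * DATA$X2 + 0.1 * DATA$X1 * DATA$X2 +
  rnorm(n, sd = 0.3)

# Single tensor-product smooth: main effects and interaction together
m_te <- gam(Y ~ te(X1, X2), data = DATA, method = "REML")

# Main effects plus a pure interaction term
m_ti <- gam(Y ~ s(X1) + s(X2) + ti(X1, X2), data = DATA, method = "REML")

# Number of smoothing parameters estimated in each model
c(te = length(m_te$sp), ti = length(m_ti$sp))

# Approximate significance of each smooth, including the ti() interaction
summary(m_ti)$s.table
```

The row for the ti() term in the summary table is where one would look for evidence of an interaction beyond the two main-effect smooths.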

force given coefficients in lm()

I am currently trying to fit a polynomial model to measurement data using lm().
fit_poly4 <- lm(y ~ poly(x, degree = 4, raw = T), weights = w)
with x as independent, y as dependent variable and w = 1/variance of the measurements.
I want to try a polynomial with given coefficients instead of the ones determined by R. Specifically I want my polynomial to be
y = -3,3583*x^4 + 43*x^3 - 191,14*x^2 + 328,2*x - 137,7
I tried to enter it as
fit_poly4 <- lm(y ~ 328.2*x-191.14*I(x^2)+43*I(x^3)-3.3583*I(x^4)-137.3,
weights = w)
but this just returns an error:
Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars
Is there a way to fix the coefficients in lm() at given values, and how would one do this?
I'm not sure why you want to do this, but you can use an offset term:
set.seed(101)
dd <- data.frame(x=rnorm(1000),y=rnorm(1000), w = rlnorm(1000))
fit_poly4 <- lm(y ~
-1 + offset(328.2*x-191.14*I(x^2)+43*I(x^3)-3.3583*I(x^4)-137.3),
data=dd,
weights = w)
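The -1 suppresses the usual intercept term, so the model has no free coefficients left to estimate and the fitted values are just the given polynomial.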
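A quick way to convince yourself of what the offset-only fit does is sketched below. The helper fixed_poly() is a hypothetical name introduced here, and the constant -137.3 follows the code above (the question's formula text says 137,7, so adjust as needed).

```r
set.seed(101)
dd <- data.frame(x = rnorm(100), y = rnorm(100), w = rlnorm(100))

# Hypothetical helper holding the fixed polynomial (constant as in the code above)
fixed_poly <- function(x) 328.2 * x - 191.14 * x^2 + 43 * x^3 - 3.3583 * x^4 - 137.3

# Offset-only model: nothing is estimated
fit_fixed <- lm(y ~ -1 + offset(fixed_poly(x)), data = dd, weights = w)

# No coefficients are reported
length(coef(fit_fixed))

# Residuals are simply the data minus the fixed polynomial
all.equal(unname(residuals(fit_fixed)), dd$y - fixed_poly(dd$x))

# Weighted residual sum of squares, e.g. for comparison with the free fit
sum(dd$w * residuals(fit_fixed)^2)
```

The weighted residual sum of squares can then be compared with that of the freely estimated polynomial to see how much worse the prescribed coefficients fit.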

Error in terms.formula(formula, data = data) : invalid power in formula

I'm trying to take a transformation of the dose predictor, here is my code:
mod = glm(colonies ~ (as.numeric(as.factor(dose)))^(m), data = salmonella, family = "poisson")
where "m" is the power I use. However, I got an error
> mod = glm(colonies ~ (as.numeric(as.factor(dose)))^(m), data = salmonella, family = "poisson")
Error in terms.formula(formula, data = data) : invalid power in formula
Does anyone know why?
Sorry for not being clear. Here my m is -0.18182 from an earlier calculation. I now understand I shouldn't use as.numeric(as.factor). But if the code is
mod = glm(colonies ~ (as.factor(dose))^(m), data = salmonella, family = "poisson")
The error is still here. It's weird because when I change m to 2, it works.
tl;dr: my best guess is that you should use I(...^m) to protect ^, i.e. to have R treat it as a numerical exponentiation operator.
I found salmonella in the faraway package, and can confirm your error. Indeed, it persists through a variety of simplifications.
m <- 1 ## same results with m <- 2L, etc.
mod = glm(colonies ~ (as.numeric(as.factor(dose)))^(m), data = salmonella, family = "poisson")
mod = glm(colonies ~ dose^(m), data = salmonella, family = "poisson")
mod = glm(colonies ~ dose^m, data = salmonella, family = "poisson")
mod = lm(colonies ~ dose^m, data = salmonella)
It looks like R's formula interface does not allow symbolic substitution in a power in a formula.
However, if what you are really trying to do is
1. convert dose to an evenly spaced integer value (0=1, 10=2, 33=3), and
2. use a power of that dose as a predictor in a GLM,
then using I() to specify that R should treat ^ as a numeric operator, not an interaction operator in a formula, is what you want:
ss <- transform(salmonella, numdose=as.numeric(as.factor(dose)))
mod = glm(colonies ~ I(numdose^m), data = ss, family = "poisson")
OTOH the picture shows that this isn't completely crazy (although also unnecessary):
library(ggplot2); theme_set(theme_bw())
m <- 2
ggplot(ss,aes(numdose,colonies))+
geom_point()+
geom_smooth(method="glm",method.args=list(family=poisson))+
geom_smooth(method="glm",method.args=list(family=poisson),
formula=y~I(x^m),colour="red")
ggsave("numdose.png")
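The difference between ^ as a formula operator and ^ as arithmetic can be checked without the faraway package. The sketch below uses the built-in mtcars data purely as a stand-in; the variable choice is an assumption, and m is set to the value mentioned in the question.

```r
m <- -0.18182  # non-integer power, as in the question

# Inside a formula, ^ means "crossing to that order": hp^2 expands to just hp
f_plain <- lm(mpg ~ hp^2, data = mtcars)

# Wrapped in I(), ^ is ordinary arithmetic and m may be any number
f_asis <- lm(mpg ~ I(hp^m), data = mtcars)

names(coef(f_plain))  # "(Intercept)" "hp"
names(coef(f_asis))   # "(Intercept)" "I(hp^m)"
```

The first model silently fits a plain linear term, which is why an unprotected power is easy to get wrong even when it does not raise an error.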
If this is the salmonella dataset from the 'faraway' package, then you should not use either as.factor or as.numeric on the dose value, since it's already numeric.
Converting to factor will seriously distort the meaning of "dose".
Furthermore, the proper way to do polynomial models in R is to use the poly function rather than forming quadratic terms by hand. If you insist on using "raw" quadratic terms, that would also be easier with poly, but as Ben has suggested, it can be done with the I function:
library(faraway)
m=2
mod = glm(colonies ~ I(dose^m), data = salmonella, family = "poisson")
Better, however, would be:
m=2; mod = glm(colonies ~ poly(dose, m), data = salmonella, family = "poisson")
This gives you both the linear and the quadratic terms, with the quadratic term computed as an orthogonal polynomial, which then lets you make proper inferences.
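A small sketch of what the orthogonal basis changes and what it does not. It uses mtcars as a stand-in data set (an assumption, so that it runs without faraway), with carb treated as a count response.

```r
# Raw and orthogonal quadratic fits (mtcars is used only as a stand-in)
fit_raw  <- glm(carb ~ poly(hp, 2, raw = TRUE), data = mtcars, family = poisson)
fit_orth <- glm(carb ~ poly(hp, 2), data = mtcars, family = poisson)

# Same fitted values: the two bases describe the same quadratic curve
all.equal(fitted(fit_raw), fitted(fit_orth), tolerance = 1e-6)

# The orthogonal columns are uncorrelated; the raw ones are strongly correlated
cor(poly(mtcars$hp, 2))[1, 2]
cor(poly(mtcars$hp, 2, raw = TRUE))[1, 2]
```

So the choice of basis does not affect predictions; it affects how the individual coefficients and their tests should be read.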

How do I code a piecewise mixed-model in lme in R?

I followed this example for running a piecewise mixed model using lmer, and it works very well. However, I am having trouble translating the model to lme because I need to deal with heteroscedasticity, and lmer doesn’t have that ability.
Code to reproduce the problem is here. I included details about the experimental design in the code if you think it’s necessary to answer the question.
Here is the model without the breakpoint:
linear <- lmer(mass ~ lat + (1 | pop/line), data = df)
And here is how I run it with the breakpoint:
bp = 30
b1 <- function(x, bp) ifelse(x < bp, x, 0)
b2 <- function(x, bp) ifelse(x < bp, 0, x)
breakpoint <- lmer(mass ~ b1(lat, bp) + b2(lat, bp) + (1 | pop/line), data = df)
The problem is that I have pretty severe heteroscedasticity. As far as I understand, that means I should be using lme from the nlme package. Here is the linear model in lme:
ctrl <- lmeControl(opt='optim')
linear2 <- lme(mass ~ lat , random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
And this is the breakpoint model that is, well, breaking:
breakpoint2 <- lme(mass ~ b1(lat, bp) + b2(lat, bp), random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
Here is the error message:
Error in model.frame.default(formula = ~pop + mass + lat + bp + line, : variable lengths differ (found for 'bp')
How can I translate this lovely breakpoint model from lmer to lme? Thank you!
Looks like lme doesn't like it when you use variables in your formula that aren't in the data.frame you are fitting your model on. One option would be to build your formula first then pass it to lme. For example
myform <- eval(substitute(mass ~ b1(lat, bp) + b2(lat, bp), list(bp=bp)))
breakpoint2 <- lme(myform, random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
The eval()/substitute() is just to swap out the bp in your formula with the value of the variable bp.
Or, if bp were always 30, you could just put that directly in the formula:
breakpoint2 <- lme(mass ~ b1(lat, 30) + b2(lat, 30), random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
and that would work as well.
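The effect of the substitution can be seen by listing the variables each formula refers to. The sketch below uses the Orthodont data shipped with nlme as a stand-in for the question's data, with a hypothetical breakpoint of 11 on age; both are assumptions made only so the example runs on its own.

```r
library(nlme)

# Piecewise helpers from the question
b1 <- function(x, bp) ifelse(x < bp, x, 0)
b2 <- function(x, bp) ifelse(x < bp, 0, x)

bp <- 11  # hypothetical breakpoint on age, chosen only for the illustration

# Before substitution the formula refers to bp as if it were a data variable
all.vars(distance ~ b1(age, bp) + b2(age, bp))

# After substitution bp is a literal number inside the formula
myform <- eval(substitute(distance ~ b1(age, bp) + b2(age, bp), list(bp = bp)))
all.vars(myform)

# The substituted formula can be passed to lme() as usual
fit_bp <- lme(myform, random = ~ 1 | Subject, data = Orthodont)
fixef(fit_bp)
```

Because bp no longer appears among the formula's variables, lme() does not try to pull it into the model frame alongside the columns of the data.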

Using linear regression (lm) in R caret, how do I force the intercept through 0? [duplicate]

I'm trying to use R caret to perform cross-validation of my linear regression models. In some cases I want to force the intercept through 0. I have tried the following, using the standard lm syntax:
regressControl <- trainControl(method="repeatedcv",
number = 4,
repeats = 5
)
regress <- train(y ~ 0 + x,
data = myData,
method = "lm",
trControl = regressControl)
Call:
lm(formula = .outcome ~ ., data = dat)
Coefficients:
(Intercept)            x
 -0.0009585    0.0033794
This syntax seems to work with the standard 'lm' function but not within the caret package. Any suggestions?
test <- lm(y ~ 0 + x,
data = myData)
Call:
lm(formula = y ~ 0 + x, data = myData)
Coefficients:
x
0.003079
You can take advantage of the tuneGrid parameter in caret::train.
regressControl <- trainControl(method="repeatedcv",
number = 4,
repeats = 5
)
regress <- train(mpg ~ hp,
data = mtcars,
method = "lm",
trControl = regressControl,
tuneGrid = expand.grid(intercept = FALSE))
Use getModelInfo("lm", regex = TRUE)[[1]]$param to see all the things you could have tweaked in tuneGrid (in the lm case, the only tuning parameter is the intercept). It's silly that you can't simply rely on formula syntax, but alas.
