I have performed best subset selection on a linear regression model using leaps::regsubsets. I then chose the model with 14 predictors, and coef(model, 14) gave me the following output:
structure(c(16.1303774392893, -0.0787496652705482, -0.104929454314886,
-1.22322411065346, 1.14718778105312, 0.75468065020279, 0.455617836039703,
0.521951041899427, 0.0124590834643436, -0.0002293804247409,
1.26667965342874e-07, 1.4002805624594e-06, -9.90560347112683e-07,
1.8809273394337e-06, 5.48249071436573e-07), .Names = c("(Intercept)", "X1",
"X2", "poly(X4, 2)1", "poly(X5, 2)1", "poly(X6, 2)2", "poly(X7, 2)2",
"poly(X9, 2)1", "X10", "X12", "X13", "X14", "X16", "X17", "X18"))
To get this model, I need to fit it with lm. As poly(X, 2)1 is linear and poly(X, 2)2 is quadratic, I did:
lm(X20 ~ X1 + X2 + X4 + X5 + I(X6 ^ 2) + I(X7 ^ 2) +
X9 + X10 + X12 + X13 + X14 + X16 + X17 + X18, df)
I think I know why the coefficients are different (see poly() in lm(): difference between raw vs. orthogonal), but why don't they give the same fitted values and adjusted R2?
Of course, using poly(X, 2)[, 2] in the formula gives complete consistency with the regsubsets output. But is it valid to use only the second column of an orthogonal polynomial and specify the model as follows?
lm(X20 ~ X1 + X2 + X4 + X5 + poly(X6, 2)[,2] + poly(X7, 2)[,2] +
X9 + X10 + X12 + X13 + X14 + X16 + X17 + X18, df)
Is there a more direct way to retrieve a single model from the regsubsets output than specifying the model by hand?
but why don't they give the same fitted values and adjusted R2?
Fitted values won't necessarily be the same if you don't use all columns from poly.
set.seed(0)
y <- runif(100)
x <- runif(100)
X <- poly(x, 3)
all.equal(lm(y ~ X)$fitted, lm(y ~ x + I(x ^ 2) + I(x ^ 3))$fitted)
#[1] TRUE
all.equal(lm(y ~ X[, 1:2])$fitted, lm(y ~ x + I(x ^ 2))$fitted)
#[1] TRUE
all.equal(lm(y ~ X - 1)$fitted, lm(y ~ x + I(x ^ 2) + I(x ^ 3) - 1)$fitted) ## no intercept
#[1] "Mean relative difference: 33.023"
all.equal(lm(y ~ X[, c(1, 3)])$fitted, lm(y ~ x + I(x ^ 3))$fitted)
#[1] "Mean relative difference: 0.03008166"
all.equal(lm(y ~ X[, c(2, 3)])$fitted, lm(y ~ I(x ^ 2) + I(x ^ 3))$fitted)
#[1] "Mean relative difference: 0.03297488"
We only have ~ 1 + poly(x, degree)[, 1:k] equivalent to ~ 1 + x + I(x ^ 2) + ... + I(x ^ k), for any k <= degree. (I explicitly write out the intercept to emphasize that we have to start from the polynomial of degree 0.)
(The reason is related to how an orthogonal polynomial is generated. See How `poly()` generates orthogonal polynomials? How to understand the "coefs" returned? for the details. Note that when doing a QR factorization X = QR, since R is an upper triangular matrix (not a diagonal matrix), Q[, ind] will not have the same column space as X[, ind] for an arbitrary subset ind, unless ind = 1:k.)
So I(x ^ 2) is not equivalent to poly(x, 2)[, 2], and you will hence get different fitted values and (adjusted) R2.
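A direct check of this claim, reusing the x and y simulated above:
all.equal(lm(y ~ poly(x, 2)[, 2])$fitted, lm(y ~ I(x ^ 2))$fitted)
## not TRUE: the single orthogonal column spans a different space than the raw power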
is it valid to use only the second column of an orthogonal polynomial and specify the model as follows?
It is really a bad idea for leaps (or generally any model-selection procedure) to drop columns from an orthogonal polynomial. An orthogonal polynomial is a factor-like term, whose significance is determined by an F-statistic (i.e., treating all columns as a whole), rather than by t-statistics on individual columns.
In fact, even for raw polynomials, it is not a good idea to omit any low-order term. For example, y ~ 1 + I(x ^ 2), which omits the linear term, is problematic: the model is not invariant to a linear shift. To see this, shift x to get x1:
shift <- runif(1) ## an arbitrary value; can be `mean(x)`
x1 <- x - shift
then y ~ 1 + I(x ^ 2) is not equivalent to y ~ 1 + I(x1 ^ 2), but y ~ 1 + x + I(x ^ 2) is still equivalent to y ~ 1 + x1 + I(x1 ^ 2).
all.equal(lm(y ~ 1 + I(x ^ 2))$fitted, lm(y ~ 1 + I(x1 ^ 2))$fitted)
#[1] "Mean relative difference: 0.02020984"
all.equal(lm(y ~ 1 + x + I(x ^ 2))$fitted, lm(y ~ 1 + x1 + I(x1 ^ 2))$fitted)
#[1] TRUE
I briefly mentioned the issue of dropping columns at R: How to or should I drop an insignificant orthogonal polynomial basis in a linear model?, but my examples here give you more insight.
Is there a more direct way to retrieve a single model from the regsubsets output than specifying the model by hand?
I don't know; at least I did not figure it out when answering this thread almost 2 years ago: Get all models from leaps regsubsets.
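That said, here is a hypothetical helper (a sketch, not part of leaps) that rebuilds the k-predictor model from the names that coef() reports, mapping e.g. "poly(X6, 2)2" to "poly(X6, 2)[, 2]" so the fit reproduces the regsubsets columns exactly (with all the caveats above about dropping poly columns):
refit_subset <- function(object, k, response, data) {
  labs <- names(coef(object, k))[-1]  # drop "(Intercept)"
  # address single poly columns explicitly: "poly(X6, 2)2" -> "poly(X6, 2)[, 2]"
  labs <- gsub("(poly\\([^)]+\\))([0-9]+)$", "\\1[, \\2]", labs)
  lm(reformulate(labs, response = response), data = data)
}
## e.g. refit_subset(best_subset_fit, 14, "X20", df)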
One remaining question though. Assuming that leaps returns poly(X, 2)2, I should definitely retain poly(X, 2)1 in my model. But what if only poly(X, 2)1 is returned by leaps? Can the higher-order term be dropped then?
There is no problem dropping higher-order terms (in this case, where you originally fitted a quadratic polynomial). As I said, we have equivalence for ind = 1:j, where j <= degree. But make sure you understand this. Take the following two examples.
If leaps drops poly(x, 5)3 and poly(x, 5)5, you can safely remove poly(x, 5)5 but are still advised to retain poly(x, 5)3. That is, instead of fitting a 5th-order polynomial, you fit a 4th-order one.
If leaps drops poly(x, 6)3 and poly(x, 6)5: since poly(x, 6)6 is not dropped, you are advised to drop no terms at all.
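A quick check of the first case, reusing the x and y from earlier: dropping only the top orthogonal column is the same as fitting a lower-degree polynomial.
all.equal(lm(y ~ poly(x, 5)[, 1:4])$fitted, lm(y ~ poly(x, 4))$fitted)
#[1] TRUE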
Related
In R formulas (e.g. for lm), what is the difference between y ~ 1, y ~ 0 and y ~ -1?
From the ?formula documentation:
The ‘-’ operator removes the specified terms, so that ‘(a+b+c)^2 -
a:b’ is identical to ‘a + b + c + b:c + a:c’. It can also be used to
remove the intercept term: when fitting a linear model ‘y ~ x - 1’
specifies a line through the origin. A model with no intercept
can be also specified as ‘y ~ x + 0’ or ‘y ~ 0 + x’.
So:
y ~ 1 includes an intercept
y ~ 0 does not include an intercept
y ~ -1 does not include an intercept
The last two are functionally equivalent.
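A minimal demonstration with arbitrary simulated data:
set.seed(42)
x <- rnorm(10); y <- 2 * x + rnorm(10)
all.equal(coef(lm(y ~ 0 + x)), coef(lm(y ~ x - 1)))  # both drop the intercept
#[1] TRUE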
I've tried to use the dredge function of the MuMIn package for a negative binomial generalized linear mixed model fitted with the package glmmTMB.
Because my full model failed to converge, I tried the workaround described here:
Dredge with the global model failing to converge
But when I use a simplified model and rewrite the formula in model$call$formula, dredge ignores this change and uses the simplified model instead of the full model.
Is there maybe another workaround for functions of the glmmTMB package?
Below some example code:
# The full_model does not converge
full_model <- glmmTMB(y ~ x1 * x2 * (x3 + x4 + x5 + x6) + (1|RE1/RE1.1/RE1.2) + (1|RE2), data = df, family = "nbinom2")
# The simple_model does converge
simple_model <- glmmTMB(y ~ x1 + x2 + x3 + x4 + x5 + x6 + (1|RE1/RE1.1/RE1.2) + (1|RE2), data = df, family = "nbinom2")
# Change formula in the model
simple_model$call$formula <- y ~ x1 * x2 * (x3 + x4 + x5 + x6) + (1|RE1/RE1.1/RE1.2) + (1|RE2)
# use dredge, but this ignores the changed formula
dredge(simple_model)
Thank you!
You would have to replace the elements of simple_model$modelInfo$allForm. These are three formulas: "formula", "ziformula", and "dispformula", but in the case of your model only the first one is used.
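With the example above, that would look like this (a sketch of the same replacement idea, but targeting the slot that dredge actually reads):
# replace the stored fixed-effects formula with the full-model formula
simple_model$modelInfo$allForm$formula <-
  y ~ x1 * x2 * (x3 + x4 + x5 + x6) + (1 | RE1/RE1.1/RE1.2) + (1 | RE2)
# dredge should now expand the full formula
dredge(simple_model)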
I want to estimate an equation such as:
Y = b0 + b1*(X1 - mean(X1)) + b2*(X2 - mean(X2)) + c*Z + d1*(X1 - mean(X1))*Z + d2*(X2 - mean(X2))*Z
(where mean() denotes the sample mean of a variable). Meaning, I want to automatically have interactions between Z and a demeaned version of each X. So far I just demean the variables manually beforehand and estimate:
lm(Y ~ .*Z, data= sdata)
This seems to be working, but I would rather use a solution that does not require manual demeaning beforehand, because I would also like to include the means of more complex terms, such as X1*X2 - mean(X1*X2).
Edit:
As requested, a working code sample. Note that in the actual application I have a large (and varying) number of X variables, so I don't want to use a hard-coded variant:
x1 <- runif(100)
x2 <- runif(100)
Z <- runif(100)
Y <- exp(x1) + exp(x2) + exp(Z)
## current way of estimating the first equation:
sdata <- data.frame(Y=Y,Z=Z,x1=x1-mean(x1),x2=x2-mean(x2))
lm(Y ~ .*Z, data= sdata)
## basically, I want the following terms, and their interactions with Z, to be used as well:
# X1^2 - mean(X1^2)
# X2^2 - mean(X2^2)
# X1*X2 - mean(X1*X2)
Edit 2:
Now, what I want to achieve is basically what
lm(Y ~ .^2*Z, data= sdata)
would do. However, given the prior demeaning, terms in there such as Z:x1:x2 would correspond to (x1 - mean(x1))*(x2 - mean(x2)), while what I want to have is x1*x2 - mean(x1*x2).
To show that scale works inside a formula:
lm(mpg ~ cyl + scale(disp*hp, scale=F), data=mtcars)
Call:
lm(formula = mpg ~ cyl + scale(disp * hp, scale = F), data = mtcars)
Coefficients:
(Intercept) cyl scale(disp * hp, scale = F)
3.312e+01 -2.105e+00 -4.642e-05
Now for comparison let's scale the interaction outside the formula:
mtcars$scaled_interaction <- with(mtcars, scale(disp*hp, scale=F))
lm(mpg ~ cyl + scaled_interaction, data=mtcars)
Call:
lm(formula = mpg ~ cyl + scaled_interaction, data = mtcars)
Coefficients:
(Intercept) cyl scaled_interaction
3.312e+01 -2.105e+00 -4.642e-05
At least in these examples, it seems as if scale inside formulae is working.
To provide a solution to your specific issue:
Alternative 1: Use formulae
# fit without Z
mod <- lm(Y ~ (.)^2, data= sdata[, names(sdata) != "Z" ])
vars <- attr(mod$terms, "term.labels")
vars <- gsub(":", "*", vars) # needed so that scale works later
vars <- paste0("scale(", vars, ", scale=F)")
newf <- as.formula(paste0("Y ~ ", paste0(vars, collapse = "+")))
# now interact with Z
f2 <- update.formula(newf, . ~ .*Z)
# This gives the following formula:
f2
Y ~ scale(x1, scale = F) + scale(x2, scale = F) + scale(x1*x2, scale = F) +
Z + scale(x1, scale = F):Z + scale(x2, scale = F):Z + scale(x1*x2, scale = F):Z
Alternative 2: Use Model Matrices
# again fit without Z and get model matrix
mod <- lm(Y ~ (.)^2, data= sdata[, names(sdata) != "Z" ])
modmat <- apply(model.matrix(mod), 2, function(x) scale(x, scale=F))
Here, all x's and the interactions are demeaned:
> head(modmat)
(Intercept) x1 x2 x1:x2
[1,] 0 0.1042908 -0.08989091 -0.01095459
[2,] 0 0.1611867 -0.32677059 -0.05425087
[3,] 0 0.2206845 0.29820499 0.06422944
[4,] 0 0.3462069 -0.15636463 -0.05571430
[5,] 0 0.3194451 -0.38668844 -0.12510551
[6,] 0 -0.4708222 -0.32502269 0.15144812
> round(colMeans(modmat), 2)
(Intercept) x1 x2 x1:x2
0 0 0 0
You can use the model matrix as follows:
modmat <- modmat[, -1] # remove intercept
lm(sdata$Y ~ modmat*sdata$Z)
It is not beautiful, but it should do the job with any number of explanatory variables. You can also add Y and Z to the matrix so that the output looks prettier, if that is a concern. Note that you can also create the model matrix directly without fitting a model; I took it from the fitted model since I had already fitted it for the first approach.
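For instance, the direct construction could look like this (a sketch; Y must be excluded as well, since the formula has no response):
modmat2 <- model.matrix(~ (.)^2, data = sdata[, !names(sdata) %in% c("Y", "Z")])
modmat2 <- scale(modmat2, scale = FALSE)[, -1]  # demean columns, drop the (now all-zero) intercept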
As a side note, this may not be implemented in a more straightforward fashion because it is difficult to imagine situations in which demeaning the interaction is more desirable than interacting the demeaned variables.
Comparing both approaches:
Here the output of both approaches for comparison. As you can see, apart from the coefficient names everything is identical.
> lm(sdata$Y ~ modmat*sdata$Z)
Call:
lm(formula = sdata$Y ~ modmat * sdata$Z)
Coefficients:
(Intercept) modmatx1 modmatx2 modmatx1:x2 sdata$Z
4.33105 1.56455 1.43979 -0.09206 1.72901
modmatx1:sdata$Z modmatx2:sdata$Z modmatx1:x2:sdata$Z
0.25332 0.38155 -0.66292
> lm(f2, data=sdata)
Call:
lm(formula = f2, data = sdata)
Coefficients:
(Intercept) scale(x1, scale = F) scale(x2, scale = F)
4.33105 1.56455 1.43979
scale(x1 * x2, scale = F) Z scale(x1, scale = F):Z
-0.09206 1.72901 0.25332
scale(x2, scale = F):Z scale(x1 * x2, scale = F):Z
0.38155 -0.66292
This question already has an answer here: How to add all variables its second degree in lm()?
I have a linear model with an R-squared of almost 0. I am writing a function with one parameter, n, which describes the degree of the power transformation to be applied.
If n = 3 the model becomes:
y = x1 + x2 + x1^2 + x2^2 + x1^3 + x2^3
How can I enter these terms into the model without having to write them out again and again?
You can use the function poly in the formula, like this:
set.seed(123)
dat <- data.frame(y=rnorm(10), x1=rnorm(10), x2=rnorm(10))
n <-3
fm <-lm(y ~ poly(x1, degree=n, raw=TRUE)+poly(x2, degree=n, raw=TRUE), data=dat)
summary(fm)
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.50796 0.81187 0.626 0.576
## poly(x1, degree = n, raw = TRUE)1 -0.54354 0.86195 -0.631 0.573
## poly(x1, degree = n, raw = TRUE)2 -0.66328 0.55169 -1.202 0.315
## poly(x1, degree = n, raw = TRUE)3 0.05989 0.35421 0.169 0.876
## poly(x2, degree = n, raw = TRUE)1 1.06890 1.00518 1.063 0.366
## poly(x2, degree = n, raw = TRUE)2 0.01655 0.76730 0.022 0.984
## poly(x2, degree = n, raw = TRUE)3 -1.18610 0.84214 -1.408 0.254
degree is of course the maximum degree of x1 and x2. raw=TRUE means the result is equivalent to x1 + I(x1^2) + ...; if raw=FALSE, the polynomials will be orthogonal.
Note that the number at the end of the names of the coefficients represent the degree of the associated polynomials.
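To check the raw=TRUE equivalence, reusing dat from above:
fm_raw <- lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + I(x2^3), data=dat)
all.equal(fitted(fm), fitted(fm_raw))
## [1] TRUE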
PS: you can use poly(x1, x2, degree=n, raw=TRUE) to write a similar formula which includes interactions.
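For n = 3, this basis contains all monomials x1^i * x2^j with 1 <= i + j <= 3 (a quick check, reusing dat and n):
ncol(poly(dat$x1, dat$x2, degree=n, raw=TRUE))
## [1] 9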
Using lm, I would like to fit the model:
y = b0 + b1*x1 + b2*x2 + b1*b2*x1*x2
My question is:
How can I specify that the coefficient of the interaction should equal the product of the coefficients of the main effects?
I've seen that to fix a coefficient at a specific value you can use offset() and I(), but I don't know how to specify a relationship between coefficients.
Here is a simple simulated dataset:
n <- 50 # Sample size
x1 <- rnorm(n, 1:n, 0.5) # Independent variable 1
x2 <- rnorm(n, 1:n, 0.5) # Independent variable 2
b0 <- 1
b1 <- 0.5
b2 <- 0.2
y <- b0 + b1*x1 + b2*x2 + b1*b2*x1*x2 + rnorm(n,0,0.1)
To fit Model 1: y = b0 + b1*x1 + b2*x2 + b3*x1*x2, I would use:
summary(lm(y~ x1 + x2 + x1:x2))
But how do I fit Model 2: y = b0 + b1*x1 + b2*x2 + b1*b2*x1*x2?
One of the main differences between the two models is the number of parameters to estimate. In Model 1, we estimate 4 parameters: b0 (intercept), b1 (slope of variable 1), b2 (slope of variable 2), and b3 (slope of the interaction between variables 1 and 2). In Model 2, we estimate 3 parameters: b0 (intercept), b1 (slope of variable 1, and part of the slope of the interaction), and b2 (slope of variable 2, and part of the slope of the interaction).
The reason why I want to do this is that when investigating whether there is a significant interaction between x1 & x2, model 2, y = b0 + b1*x1 + b2*x2 + b1*b2*x1*x2, can be a better null model than y = b0 + b1*x1 + b2*x2.
Many thanks!
Marie
Because of the constraint that you impose on the coefficients, the model you specify is not a linear model, so lm cannot be used to fit it. You would need non-linear regression, such as nls.
> summary(nls(y ~ b0 + b1*x1 + b2*x2 + b1*b2*x1*x2, start=list(b0=0, b1=1, b2=1)))
Formula: y ~ b0 + b1 * x1 + b2 * x2 + b1 * b2 * x1 * x2
Parameters:
Estimate Std. Error t value Pr(>|t|)
b0 0.987203 0.049713 19.86 <2e-16 ***
b1 0.494438 0.007803 63.37 <2e-16 ***
b2 0.202396 0.003359 60.25 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1121 on 47 degrees of freedom
Number of iterations to convergence: 5
Achieved convergence tolerance: 2.545e-06
You can really see that the model is non-linear when you re-write it as
> summary(nls(y ~ b0+(1+b1*x1)*(1+b2*x2)-1, start=list(b0=0, b1=1, b2=1)))
Formula: y ~ b0 + (1 + b1 * x1) * (1 + b2 * x2) - 1
Parameters:
Estimate Std. Error t value Pr(>|t|)
b0 0.987203 0.049713 19.86 <2e-16 ***
b1 0.494438 0.007803 63.37 <2e-16 ***
b2 0.202396 0.003359 60.25 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1121 on 47 degrees of freedom
Number of iterations to convergence: 5
Achieved convergence tolerance: 2.25e-06
Brian provides a way to fit the constrained model you specify, but if you're interested in whether the unconstrained model fits better than your constrained model, you can use the delta method to test that hypothesis.
# Let's make some fake data where the constrained model is true
n <- 100
b0 <- 2
b1 <- .2
b2 <- -1.3
b3 <- b1 * b2
sigma <- 1
x1 <- rnorm(n)
# make x1 and x2 correlated for giggles
x2 <- x1 + rnorm(n)
# Generate data according to the model
y <- b0 + b1*x1 + b2*x2 + b3*x1*x2 + rnorm(n, 0, sigma)
# Fit full model y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error
o <- lm(y ~ x1 + x2 + x1:x2)
# If we want to do a hypothesis test of Ho: b3 = b1*b2
# this is the same as Ho: b3 - b1*b2 = 0
library(msm)
# Get estimate of the difference specified in the null
est <- unname(coef(o)["x1:x2"] - coef(o)["x1"] * coef(o)["x2"])
# Use the delta method to get a standard error for
# this difference
standerr <- deltamethod(~ x4 - x3*x2, coef(o), vcov(o))
# Calculate a test statistic. We're relying on asymptotic
# arguments here so hopefully we have a decent sample size
z <- est/standerr
# Calculate p-value
pval <- 2 * pnorm(-abs(z))
pval
I explain what the delta method is used for and more on how to use it in R in this blog post.
Expanding on Brian's answer, you could alternatively do this by comparing the full model to the constrained model. However, you have to use nls to fit the full model as well, to be able to easily compare the two models.
o2 <- nls(y ~ b0 + b1*x1 + b2*x2 + b1*b2*x1*x2, start=list(b0=0, b1=1, b2=1))
o3 <- nls(y ~ b0 + b1*x1 + b2*x2 + b3*x1*x2, start = list(b0 = 0, b1 = 1, b2 = 1, b3 = 1))
anova(o2, o3)
There's no way to do what you're asking for in lm, and there's no reason for it to be able to do it. You run lm to get estimates of your coefficients. If you don't want to estimate a coefficient, then don't include that predictor in the model. You can use coef to extract the coefficients you want and multiply them out afterwards.
Note that leaving the interaction out is a different model and will produce a different b1 and b2. You could alternatively leave I(x1 * x2) in and not use the coefficient.
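A sketch of that suggestion, using the simulated x1, x2, and y from the question:
fit <- lm(y ~ x1 + x2)     # additive model; no interaction estimated
b <- coef(fit)
unname(b["x1"] * b["x2"])  # implied interaction coefficient b1*b2, computed afterwards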
As for why you want to do this: there's no good a priori justification that your constrained model actually fits better than the simple additive model. Having more free parameters necessarily means a model fits better, but you haven't added a parameter; you've added a constraint that, in the real world, could make it fit worse. In that case, would you consider it a better "baseline" for comparison against the model including the interaction?