emmip (emmeans) with quadratic term - r

Is it possible to plot with emmip the marginal (log odds) means from a geeglm model when you have a quadratic term? I have repeated measures data and the model fits better with a treatment x time squared term in addition to an interaction term with linear time.
I just want to be able to visualise the predicted curve in the data. If it's possible, I don't know how to specify it. I've tried:
mod3 <- geeglm(outcome ~ treatment*time + treatment*time_sq, data = dat, id = id, family = "binomial", corstr = "exchangeable")
mod3a.rg <- ref_grid(mod3, at = list(time = c(1,2,3,4,5,6), time_sq = c(1,4,9,16,25,36)))
emmip(mod3a.rg, treatment ~ time)

I don't think your mod3 is including your quadratic term correctly (hard to tell since you did not include reproducible code). This will let you include your squared term for time correctly:
mod3 <- geeglm(outcome ~ treatment*time + treatment*I(time^2), data = dat,
               id = id, family = "binomial", corstr = "exchangeable")
Then add plotit = TRUE to your call to emmip():
emmip(mod3a.rg, treatment ~ time, plotit = TRUE)
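Putting it together: with I(time^2) in the formula, emmeans can derive the squared term from time itself, so the reference grid only needs values for time. A minimal sketch, assuming the dat, outcome, treatment, time, and id objects from the question:
library(geepack)
library(emmeans)
# refit with the quadratic term computed from time via I()
mod3 <- geeglm(outcome ~ treatment*time + treatment*I(time^2),
               data = dat, id = id, family = "binomial",
               corstr = "exchangeable")
# the grid only needs time; I(time^2) is evaluated from it
mod3a.rg <- ref_grid(mod3, at = list(time = 1:6))
emmip(mod3a.rg, treatment ~ time, plotit = TRUE)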
Here's a simple reproducible example with the savings dataset from the faraway package for comparison:
data(savings, package = "faraway")
# fit model with a polynomial term
mod <- lm(sr ~ ddpi + I(ddpi^2), data = savings)
summary(mod)
The summary produces this output; note the additional coefficient for your quadratic term:
Call:
lm(formula = sr ~ ddpi + I(ddpi^2), data = savings)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.5601 -2.5612  0.5546  2.5735  7.8080 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.13038    1.43472   3.576 0.000821 ***
ddpi         1.75752    0.53772   3.268 0.002026 ** 
I(ddpi^2)   -0.09299    0.03612  -2.574 0.013262 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.079 on 47 degrees of freedom
Multiple R-squared:  0.205,  Adjusted R-squared:  0.1711
F-statistic: 6.059 on 2 and 47 DF,  p-value: 0.004559
If you don't enclose the quadratic term in I(), your summary will only include a term for ddpi.
mod2 <- lm(sr ~ ddpi + ddpi^2, data = savings)
summary(mod2)
produces the following summary, with a coefficient only for ddpi:
Call:
lm(formula = sr ~ ddpi + ddpi^2, data = savings)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.5535 -3.7349  0.9835  2.7720  9.3104 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.8830     1.0110   7.797 4.46e-10 ***
ddpi          0.4758     0.2146   2.217   0.0314 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.311 on 48 degrees of freedom
Multiple R-squared:  0.0929,  Adjusted R-squared:  0.074
F-statistic: 4.916 on 1 and 48 DF,  p-value: 0.03139
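Tying this back to the original question: because the model uses I(ddpi^2), a reference grid over ddpi alone is enough, and emmeans fills in the square itself. A hedged sketch with the savings model above (the grid values are arbitrary):
library(emmeans)
rg <- ref_grid(mod, at = list(ddpi = seq(0, 16, by = 2)))
emmip(rg, ~ ddpi)  # one-sided formula: no trace factor, ddpi on the x-axis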

How do I change predictors in linear regression in loop in R?
Below is an example along with the error. Can someone please fix it?
# sample data (the mpg dataset comes from ggplot2)
library(ggplot2)
mpg <- mpg
str(mpg)
# vector of predictors
predictors <- c("hwy", "cty")
# loop over predictors
for (predictor in predictors) {
  # fit linear regression
  model <- lm(formula = predictor ~ displ + cyl,
              data = mpg)
  # summary of model
  summary(model)
}
Error
Error in model.frame.default(formula = predictor ~ displ + cyl, data = mpg, :
variable lengths differ (found for 'displ')
We may use paste() or reformulate(). Also, as it is a for loop, create an object to store the output from summary():
sumry_model <- vector('list', length(predictors))
names(sumry_model) <- predictors
for (predictor in predictors) {
  # fit linear regression
  model <- lm(reformulate(c("displ", "cyl"), response = predictor),
              data = mpg)
  # with paste
  # model <- lm(formula = paste0(predictor, " ~ displ + cyl"), data = mpg)
  # summary of model
  sumry_model[[predictor]] <- summary(model)
}
Output:
> sumry_model
$hwy
Call:
lm(formula = reformulate(c("displ", "cyl"), response = predictor),
data = mpg)
Residuals:
Min 1Q Median 3Q Max
-7.5098 -2.1953 -0.2049 1.9023 14.9223
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.2162 1.0481 36.461 < 2e-16 ***
displ -1.9599 0.5194 -3.773 0.000205 ***
cyl -1.3537 0.4164 -3.251 0.001323 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.759 on 231 degrees of freedom
Multiple R-squared: 0.6049, Adjusted R-squared: 0.6014
F-statistic: 176.8 on 2 and 231 DF, p-value: < 2.2e-16
$cty
Call:
lm(formula = reformulate(c("displ", "cyl"), response = predictor),
data = mpg)
Residuals:
Min 1Q Median 3Q Max
-5.9276 -1.4750 -0.0891 1.0686 13.9261
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.2885 0.6876 41.139 < 2e-16 ***
displ -1.1979 0.3408 -3.515 0.000529 ***
cyl -1.2347 0.2732 -4.519 9.91e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.466 on 231 degrees of freedom
Multiple R-squared: 0.6671, Adjusted R-squared: 0.6642
F-statistic: 231.4 on 2 and 231 DF, p-value: < 2.2e-16
This may also be done as a multivariate response:
summary(lm(cbind(hwy, cty) ~ displ + cyl, data = mpg))
Or, if we want to use the predictors vector:
summary(lm(as.matrix(mpg[predictors]) ~ displ + cyl, data = mpg))
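For completeness, the same fits can be collected with lapply(), which returns the list of summaries directly rather than filling one in a loop (a small sketch using the same predictors vector):
sumry_model <- setNames(
  lapply(predictors, function(p) {
    summary(lm(reformulate(c("displ", "cyl"), response = p), data = mpg))
  }),
  predictors
)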

How to write two lag periods?

I have a question about R: I would like to use two lag periods instead of one in my model (please check my code below), but I don't know how to write it in R. Can someone please help?
Here below are details of my R code:
library(plm)
fixed = plm(sp ~ lag(debt) + lag(I(debt^2)) + outgp + gvex + vlimp + vlexp + bcour + infcpi,
            data = pdata, index = c("country", "year"), model = "within")
The lags must be on the variable debt.
This should give 2 lags on the debt variable.
library(plm)
fixed = plm(sp ~ lag(debt, k = 1:2) + lag(I(debt^2)) + outgp + gvex + vlimp + vlexp + bcour + infcpi,
            data = pdata, index = c("country", "year"), model = "within")
For example:
data("Grunfeld", package = "plm")
lags2mod <- plm(inv ~ lag(value, k=1:2) + capital, data = Grunfeld, model = "within")
summary(lags2mod)
Oneway (individual) effect Within Model
Call:
plm(formula = inv ~ lag(value, k = 1:2) + capital, data = Grunfeld,
model = "within")
Balanced Panel: n = 10, T = 18, N = 180
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-272.21434 -19.24168 0.42825 18.09930 260.85548
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
lag(value, k = 1:2)1 0.078234 0.015438 5.0677 1.059e-06 ***
lag(value, k = 1:2)2 -0.018754 0.016078 -1.1664 0.2451
capital 0.352658 0.021003 16.7910 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 2034500
Residual Sum of Squares: 617850
R-Squared: 0.69631
Adj. R-Squared: 0.67449
F-statistic: 127.633 on 3 and 167 DF, p-value: < 2.22e-16
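As a quick sanity check, the k = 1:2 shorthand should be equivalent to spelling the two lags out separately; a sketch against the same Grunfeld fit:
# same model with the lags listed individually
lags2sep <- plm(inv ~ lag(value, 1) + lag(value, 2) + capital,
                data = Grunfeld, model = "within")
# coefficient values match (names differ, so ignore attributes)
all.equal(coef(lags2mod), coef(lags2sep), check.attributes = FALSE)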

How to formulate a time period dummy variable in lm()

I am analysing whether the effects of x_t on y_t differ during and after a specific time period.
I am trying to regress the following model in R using lm():
y_t = b_0 + [b_1(1-D_t) + b_2 D_t]x_t
where D_t is a dummy variable with the value 1 over the time period and 0 otherwise.
Is it possible to use lm() for this formula?
observationNumber <- 1:80
obsFactor <- cut(observationNumber, breaks = c(0, 55, 81), right = FALSE)
fit <- lm(y ~ x * obsFactor)
For example:
y = runif(80)
x = rnorm(80) + c(rep(0,54), rep(1, 26))
fit <- lm(y ~ x * obsFactor)
summary(fit)
Call:
lm(formula = y ~ x * obsFactor)
Residuals:
Min 1Q Median 3Q Max
-0.48375 -0.29655 0.05957 0.22797 0.49617
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.50959 0.04253 11.983 <2e-16 ***
x -0.02492 0.04194 -0.594 0.554
obsFactor[55,81) -0.06357 0.09593 -0.663 0.510
x:obsFactor[55,81) 0.07120 0.07371 0.966 0.337
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3116 on 76 degrees of freedom
Multiple R-squared: 0.01303, Adjusted R-squared: -0.02593
F-statistic: 0.3345 on 3 and 76 DF, p-value: 0.8004
obsFactor[55,81) is zero if observationNumber < 55 and one if it is greater or equal, so its main-effect coefficient shifts the intercept during the period. x:obsFactor[55,81) is the product of the dummy and the variable $x_t$; its coefficient is the difference $b_2 - b_1$, so $b_2$ is the sum of the coefficients on x and x:obsFactor[55,81). The coefficient for $x_t$ is your $b_1$, and the intercept is your $b_0$.
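To recover the question's parameterization from the fit, add the two slope coefficients; a small sketch continuing from the fit above:
b0 <- coef(fit)["(Intercept)"]                           # b_0
b1 <- coef(fit)["x"]                                     # b_1, slope outside the period
b2 <- coef(fit)["x"] + coef(fit)["x:obsFactor[55,81)"]   # b_2, slope during the period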

Why I() "AsIs" is necessary when making a linear polynomial model in R?

I'm trying to understand the role of the base function I() in R when fitting a linear polynomial model or using the function poly(). When I calculate the model using
q + q^2
q + I(q^2)
poly(q, 2)
I get different answers.
Here is an example:
set.seed(20)
q <- seq(from=0, to=20, by=0.1)
y <- 500 + .1 * (q-5)^2
noise <- rnorm(length(q), mean=10, sd=80)
noisy.y <- y + noise
model3 <- lm(noisy.y ~ poly(q,2))
model1 <- lm(noisy.y ~ q + I(q^2))
model2 <- lm(noisy.y ~ q + q^2)
I(q^2)==I(q)^2
I(q^2)==q^2
summary(model1)
summary(model2)
summary(model3)
Here is the output:
> summary(model1)
Call:
lm(formula = noisy.y ~ q + I(q^2))
Residuals:
Min 1Q Median 3Q Max
-211.592 -50.609 4.742 61.983 165.792
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 489.3723 16.5982 29.483 <2e-16 ***
q 5.0560 3.8344 1.319 0.189
I(q^2) -0.1530 0.1856 -0.824 0.411
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 79.22 on 198 degrees of freedom
Multiple R-squared: 0.02451, Adjusted R-squared: 0.01466
F-statistic: 2.488 on 2 and 198 DF, p-value: 0.08568
> summary(model2)
Call:
lm(formula = noisy.y ~ q + q^2)
Residuals:
Min 1Q Median 3Q Max
-219.96 -54.42 3.30 61.06 170.79
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 499.5209 11.1252 44.900 <2e-16 ***
q 1.9961 0.9623 2.074 0.0393 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 79.16 on 199 degrees of freedom
Multiple R-squared: 0.02117, Adjusted R-squared: 0.01625
F-statistic: 4.303 on 1 and 199 DF, p-value: 0.03933
> summary(model3)
Call:
lm(formula = noisy.y ~ poly(q, 2))
Residuals:
Min 1Q Median 3Q Max
-211.592 -50.609 4.742 61.983 165.792
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 519.482 5.588 92.966 <2e-16 ***
poly(q, 2)1 164.202 79.222 2.073 0.0395 *
poly(q, 2)2 -65.314 79.222 -0.824 0.4107
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 79.22 on 198 degrees of freedom
Multiple R-squared: 0.02451, Adjusted R-squared: 0.01466
F-statistic: 2.488 on 2 and 198 DF, p-value: 0.08568
Why is I() necessary when fitting a polynomial model in R?
Also, is it normal that the poly() function doesn't give the same result as q + I(q^2)?
The formula syntax in R is described in the ?formula help page. The ^ symbol has not been given the usual meaning of multiplicative exponentiation. Rather, it's used for interactions between all terms at the base of the exponent. For example
y ~ (a+b)^2
is the same as
y ~ a + b + a:b
But if you do
y ~ a + b^2
y ~ a + b # same as above, no way to "interact" b with itself.
That caret would just include the b term, because there is no way to form an interaction of b with itself. So ^ and * inside formulas have nothing to do with multiplication, just like + doesn't really mean addition for variables in the usual sense.
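You can see the expansion directly by inspecting the terms of each formula:
# ^ expands to main effects plus interactions, not powers
attr(terms(y ~ (a + b)^2), "term.labels")   # "a" "b" "a:b"
attr(terms(y ~ a + b^2), "term.labels")     # "a" "b"
attr(terms(y ~ a + I(b^2)), "term.labels")  # "a" "I(b^2)"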
If you want the "usual" definition of ^2, you need to wrap the term in the "as is" function, I(). Otherwise the model is not fitting a squared term at all.
And the poly() function by default returns orthogonal polynomials, as described on its help page. This helps to reduce collinearity in the covariates. But if you don't want the orthogonal versions and just want the "raw" polynomial terms, then pass raw = TRUE to your poly() call. For example,
lm(noisy.y ~ poly(q, 2, raw = TRUE))
will return the same estimates as model1.
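A quick check, continuing the question's example:
model4 <- lm(noisy.y ~ poly(q, 2, raw = TRUE))
# raw polynomial coefficients match q + I(q^2)
all.equal(coef(model1), coef(model4), check.attributes = FALSE)
# the orthogonal version fits the same curve, just parameterized differently
all.equal(fitted(model1), fitted(model3))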

Extract data from partial least squares regression in R

I want to use partial least squares regression to find the most representative variables for predicting my data.
Here is my code:
library(pls)
potion<-read.table("potion-insomnie.txt",header=T)
potionTrain <- potion[1:182,]
potionTest <- potion[183:192,]
potion1 <- plsr(Sommeil ~ Aubepine + Bave + Poudre + Pavot, data = potionTrain, validation = "LOO")
Calling summary(lm(potion1)) gives me this output:
Call:
lm(formula = potion1)
Residuals:
Min 1Q Median 3Q Max
-14.9475 -5.3961 0.0056 5.2321 20.5847
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.63931 1.67955 22.410 < 2e-16 ***
Aubepine -0.28226 0.05195 -5.434 1.81e-07 ***
Bave -1.79894 0.26849 -6.700 2.68e-10 ***
Poudre 0.35420 0.72849 0.486 0.627
Pavot -0.47678 0.52027 -0.916 0.361
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.845 on 177 degrees of freedom
Multiple R-squared: 0.293, Adjusted R-squared: 0.277
F-statistic: 18.34 on 4 and 177 DF, p-value: 1.271e-12
I deduced that only the variables Aubepine and Bave are representative. So I refit the model with just these two variables:
potion1 <- plsr(Sommeil ~ Aubepine + Bave, data = potionTrain, validation = "LOO")
And I plot:
plot(potion1, ncomp = 2, asp = 1, line = TRUE)
Here is the plot of predicted vs measured values:
The problem is that I can see the regression line on the plot, but I cannot get its equation and R². Is that possible?
Also, is the first part the same as a multiple linear regression (ANOVA)?
pacman::p_load(pls)
data(mtcars)
potion <- mtcars
potionTrain <- potion[1:28,]
potionTest <- potion[29:32,]
potion1 <- plsr(mpg ~ cyl + disp + hp + drat, data = potionTrain, validation = "LOO")
coef(potion1) # coefficients
scores(potion1) # scores
## R^2:
R2(potion1, estimate = "train")
## cross-validated R^2:
R2(potion1)
## Both:
R2(potion1, estimate = "all")
