emmip (emmeans) with quadratic term - r

Is it possible to plot with emmip the marginal (log odds) means from a geeglm model when you have a quadratic term? I have repeated measures data and the model fits better with a treatment x time squared term in addition to an interaction term with linear time.
I just want to be able to visualise the predicted curve in the data. If it's possible, I don't know how to specify it. I've tried:
mod3 <- geeglm(outcome ~ treatment*time + treatment*time_sq, data = dat, id = id, family = "binomial", corstr = "exchangeable")
mod3a.rg <- ref_grid(mod3, at = list(time = c(1,2,3,4,5,6), time_sq = c(1,4,9,16,25,36)))
emmip(mod3a.rg, treatment ~ time)

I don't think your mod3 is including your quadratic term correctly (hard to tell since you did not include reproducible code). This will let you include your squared term for time correctly:
mod3 <- geeglm(outcome ~ treatment*time + treatment*I(time^2), data = dat,
               id = id, family = "binomial", corstr = "exchangeable")
Then add plotit = TRUE to your call to emmip():
emmip(mod3a.rg, treatment ~ time, plotit = TRUE)
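Putting it together: with I(time^2) in the formula, emmeans can derive the squared term from time itself, so the reference grid only needs values for time. A minimal sketch, assuming the dat, outcome, treatment, time, and id objects from the question:
library(geepack)
library(emmeans)
# refit with the quadratic term computed from time via I()
mod3 <- geeglm(outcome ~ treatment*time + treatment*I(time^2),
               data = dat, id = id, family = "binomial",
               corstr = "exchangeable")
# the grid only needs time; I(time^2) is evaluated from it
mod3a.rg <- ref_grid(mod3, at = list(time = 1:6))
emmip(mod3a.rg, treatment ~ time, plotit = TRUE)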
Here's a simple reproducible example with the savings dataset from the faraway package for comparison:
data(savings, package = "faraway")
# fit model with a polynomial term
mod <- lm(sr ~ ddpi + I(ddpi^2), data = savings)
summary(mod)
The summary produces this output; note the additional coefficient for your quadratic term:
Call:
lm(formula = sr ~ ddpi + I(ddpi^2), data = savings)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.5601 -2.5612  0.5546  2.5735  7.8080 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.13038    1.43472   3.576 0.000821 ***
ddpi         1.75752    0.53772   3.268 0.002026 ** 
I(ddpi^2)   -0.09299    0.03612  -2.574 0.013262 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.079 on 47 degrees of freedom
Multiple R-squared:  0.205,  Adjusted R-squared:  0.1711
F-statistic: 6.059 on 2 and 47 DF,  p-value: 0.004559
If you don't enclose the quadratic term in I(), your summary will only include a term for ddpi.
mod2 <- lm(sr ~ ddpi + ddpi^2, data = savings)
summary(mod2)
produces the following summary, with a coefficient only for ddpi:
Call:
lm(formula = sr ~ ddpi + ddpi^2, data = savings)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.5535 -3.7349  0.9835  2.7720  9.3104 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.8830     1.0110   7.797 4.46e-10 ***
ddpi          0.4758     0.2146   2.217   0.0314 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.311 on 48 degrees of freedom
Multiple R-squared:  0.0929,  Adjusted R-squared:  0.074
F-statistic: 4.916 on 1 and 48 DF,  p-value: 0.03139
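Tying this back to the original question: because the model uses I(ddpi^2), a reference grid over ddpi alone is enough, and emmeans fills in the square itself. A hedged sketch with the savings model above (the grid values are arbitrary):
library(emmeans)
rg <- ref_grid(mod, at = list(ddpi = seq(0, 16, by = 2)))
emmip(rg, ~ ddpi)  # one-sided formula: no trace factor, ddpi on the x-axis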

How do I change predictors in linear regression in loop in R?
Below is an example along with the error. Can someone please fix it?
# sample data (the mpg dataset comes from ggplot2)
library(ggplot2)
mpg <- mpg
str(mpg)
# vector of predictors
predictors <- c("hwy", "cty")
# loop over predictors
for (predictor in predictors) {
  # fit linear regression
  model <- lm(formula = predictor ~ displ + cyl,
              data = mpg)
  # summary of model
  summary(model)
}
Error
Error in model.frame.default(formula = predictor ~ displ + cyl, data = mpg, :
variable lengths differ (found for 'displ')
We may use paste() or reformulate(). Also, as it is a for loop, create an object to store the output from summary():
sumry_model <- vector('list', length(predictors))
names(sumry_model) <- predictors
for (predictor in predictors) {
  # fit linear regression
  model <- lm(reformulate(c("displ", "cyl"), response = predictor),
              data = mpg)
  # with paste
  # model <- lm(formula = paste0(predictor, " ~ displ + cyl"), data = mpg)
  # summary of model
  sumry_model[[predictor]] <- summary(model)
}
Output:
> sumry_model
$hwy
Call:
lm(formula = reformulate(c("displ", "cyl"), response = predictor),
data = mpg)
Residuals:
Min 1Q Median 3Q Max
-7.5098 -2.1953 -0.2049 1.9023 14.9223
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.2162 1.0481 36.461 < 2e-16 ***
displ -1.9599 0.5194 -3.773 0.000205 ***
cyl -1.3537 0.4164 -3.251 0.001323 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.759 on 231 degrees of freedom
Multiple R-squared: 0.6049, Adjusted R-squared: 0.6014
F-statistic: 176.8 on 2 and 231 DF, p-value: < 2.2e-16
$cty
Call:
lm(formula = reformulate(c("displ", "cyl"), response = predictor),
data = mpg)
Residuals:
Min 1Q Median 3Q Max
-5.9276 -1.4750 -0.0891 1.0686 13.9261
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.2885 0.6876 41.139 < 2e-16 ***
displ -1.1979 0.3408 -3.515 0.000529 ***
cyl -1.2347 0.2732 -4.519 9.91e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.466 on 231 degrees of freedom
Multiple R-squared: 0.6671, Adjusted R-squared: 0.6642
F-statistic: 231.4 on 2 and 231 DF, p-value: < 2.2e-16
This may also be done as a multivariate response:
summary(lm(cbind(hwy, cty) ~ displ + cyl, data = mpg))
Or, if we want to use the predictors vector:
summary(lm(as.matrix(mpg[predictors]) ~ displ + cyl, data = mpg))
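For completeness, the same fits can be collected with lapply(), which returns the list of summaries directly rather than filling one in a loop (a small sketch using the same predictors vector):
sumry_model <- setNames(
  lapply(predictors, function(p) {
    summary(lm(reformulate(c("displ", "cyl"), response = p), data = mpg))
  }),
  predictors
)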

How to write two lag periods?

I have a question about R: I would like to use two lag periods instead of one in my model (please check my code below), but I don't know how to write it in R. Can someone please help?
Here below are details of my R code:
library(plm)
fixed = plm(sp ~ lag(debt) + lag(I(debt^2)) + outgp + gvex + vlimp + vlexp + bcour + infcpi,
            data = pdata, index = c("country", "year"), model = "within")
The lags must be on the variable debt.
This should give 2 lags on the debt variable.
library(plm)
fixed = plm(sp ~ lag(debt, k = 1:2) + lag(I(debt^2)) + outgp + gvex + vlimp + vlexp + bcour + infcpi,
            data = pdata, index = c("country", "year"), model = "within")
For example:
data("Grunfeld", package = "plm")
lags2mod <- plm(inv ~ lag(value, k=1:2) + capital, data = Grunfeld, model = "within")
summary(lags2mod)
Oneway (individual) effect Within Model
Call:
plm(formula = inv ~ lag(value, k = 1:2) + capital, data = Grunfeld,
model = "within")
Balanced Panel: n = 10, T = 18, N = 180
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-272.21434 -19.24168 0.42825 18.09930 260.85548
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
lag(value, k = 1:2)1 0.078234 0.015438 5.0677 1.059e-06 ***
lag(value, k = 1:2)2 -0.018754 0.016078 -1.1664 0.2451
capital 0.352658 0.021003 16.7910 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 2034500
Residual Sum of Squares: 617850
R-Squared: 0.69631
Adj. R-Squared: 0.67449
F-statistic: 127.633 on 3 and 167 DF, p-value: < 2.22e-16
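As a quick sanity check, the k = 1:2 shorthand should be equivalent to spelling the two lags out separately; a sketch against the same Grunfeld fit:
# same model with the lags listed individually
lags2sep <- plm(inv ~ lag(value, 1) + lag(value, 2) + capital,
                data = Grunfeld, model = "within")
# coefficient values match (names differ, so ignore attributes)
all.equal(coef(lags2mod), coef(lags2sep), check.attributes = FALSE)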

How to formulate a time period dummy variable in lm()

I am analysing whether the effects of x_t on y_t differ during and after a specific time period.
I am trying to regress the following model in R using lm():
y_t = b_0 + [b_1(1-D_t) + b_2 D_t]x_t
where D_t is a dummy variable with the value 1 over the time period and 0 otherwise.
Is it possible to use lm() for this formula?
observationNumber <- 1:80
obsFactor <- cut(observationNumber, breaks = c(0, 55, 81), right = FALSE)
fit <- lm(y ~ x * obsFactor)
For example:
y = runif(80)
x = rnorm(80) + c(rep(0,54), rep(1, 26))
fit <- lm(y ~ x * obsFactor)
summary(fit)
Call:
lm(formula = y ~ x * obsFactor)
Residuals:
Min 1Q Median 3Q Max
-0.48375 -0.29655 0.05957 0.22797 0.49617
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.50959 0.04253 11.983 <2e-16 ***
x -0.02492 0.04194 -0.594 0.554
obsFactor[55,81) -0.06357 0.09593 -0.663 0.510
x:obsFactor[55,81) 0.07120 0.07371 0.966 0.337
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3116 on 76 degrees of freedom
Multiple R-squared: 0.01303, Adjusted R-squared: -0.02593
F-statistic: 0.3345 on 3 and 76 DF, p-value: 0.8004
obsFactor[55,81) is zero if observationNumber < 55 and one if it is greater or equal, so its main-effect coefficient shifts the intercept during the period. x:obsFactor[55,81) is the product of the dummy and the variable $x_t$; its coefficient is the difference $b_2 - b_1$, so $b_2$ is the sum of the coefficients on x and x:obsFactor[55,81). The coefficient for $x_t$ is your $b_1$, and the intercept is your $b_0$.
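To recover the question's parameterization from the fit, add the two slope coefficients; a small sketch continuing from the fit above:
b0 <- coef(fit)["(Intercept)"]                           # b_0
b1 <- coef(fit)["x"]                                     # b_1, slope outside the period
b2 <- coef(fit)["x"] + coef(fit)["x:obsFactor[55,81)"]   # b_2, slope during the period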

Why I() "AsIs" is necessary when making a linear polynomial model in R?

I'm trying to understand the role of the base function I() in R when fitting a linear polynomial model or using the function poly(). When I calculate the model using
q + q^2
q + I(q^2)
poly(q, 2)
I get different answers.
Here is an example:
set.seed(20)
q <- seq(from=0, to=20, by=0.1)
y <- 500 + .1 * (q-5)^2
noise <- rnorm(length(q), mean=10, sd=80)
noisy.y <- y + noise
model3 <- lm(noisy.y ~ poly(q,2))
model1 <- lm(noisy.y ~ q + I(q^2))
model2 <- lm(noisy.y ~ q + q^2)
I(q^2)==I(q)^2
I(q^2)==q^2
summary(model1)
summary(model2)
summary(model3)
Here is the output:
> summary(model1)
Call:
lm(formula = noisy.y ~ q + I(q^2))
Residuals:
Min 1Q Median 3Q Max
-211.592 -50.609 4.742 61.983 165.792
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 489.3723 16.5982 29.483 <2e-16 ***
q 5.0560 3.8344 1.319 0.189
I(q^2) -0.1530 0.1856 -0.824 0.411
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 79.22 on 198 degrees of freedom
Multiple R-squared: 0.02451, Adjusted R-squared: 0.01466
F-statistic: 2.488 on 2 and 198 DF, p-value: 0.08568
> summary(model2)
Call:
lm(formula = noisy.y ~ q + q^2)
Residuals:
Min 1Q Median 3Q Max
-219.96 -54.42 3.30 61.06 170.79
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 499.5209 11.1252 44.900 <2e-16 ***
q 1.9961 0.9623 2.074 0.0393 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 79.16 on 199 degrees of freedom
Multiple R-squared: 0.02117, Adjusted R-squared: 0.01625
F-statistic: 4.303 on 1 and 199 DF, p-value: 0.03933
> summary(model3)
Call:
lm(formula = noisy.y ~ poly(q, 2))
Residuals:
Min 1Q Median 3Q Max
-211.592 -50.609 4.742 61.983 165.792
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 519.482 5.588 92.966 <2e-16 ***
poly(q, 2)1 164.202 79.222 2.073 0.0395 *
poly(q, 2)2 -65.314 79.222 -0.824 0.4107
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 79.22 on 198 degrees of freedom
Multiple R-squared: 0.02451, Adjusted R-squared: 0.01466
F-statistic: 2.488 on 2 and 198 DF, p-value: 0.08568
Why is I() necessary when fitting a polynomial model in R?
Also, is it normal that the poly() function doesn't give the same result as q + I(q^2)?
The formula syntax in R is described in the ?formula help page. The ^ symbol has not been given the usual meaning of multiplicative exponentiation. Rather, it's used for interactions between all terms at the base of the exponent. For example
y ~ (a+b)^2
is the same as
y ~ a + b + a:b
But if you do
y ~ a + b^2
y ~ a + b # same as above, no way to "interact" b with itself.
That caret would just include the b term, because there is no way to form an interaction of b with itself. So ^ and * inside formulas have nothing to do with multiplication, just like + doesn't really mean addition for variables in the usual sense.
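You can see the expansion directly by inspecting the terms of each formula:
# ^ expands to main effects plus interactions, not powers
attr(terms(y ~ (a + b)^2), "term.labels")   # "a" "b" "a:b"
attr(terms(y ~ a + b^2), "term.labels")     # "a" "b"
attr(terms(y ~ a + I(b^2)), "term.labels")  # "a" "I(b^2)"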
If you want the "usual" definition of ^2, you need to wrap the term in the "as is" function, I(). Otherwise the model is not fitting a squared term at all.
And the poly() function by default returns orthogonal polynomials, as described on its help page. This helps to reduce collinearity in the covariates. But if you don't want the orthogonal versions and just want the "raw" polynomial terms, then pass raw = TRUE to your poly() call. For example,
lm(noisy.y ~ poly(q, 2, raw = TRUE))
will return the same estimates as model1.
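A quick check, continuing the question's example:
model4 <- lm(noisy.y ~ poly(q, 2, raw = TRUE))
# raw polynomial coefficients match q + I(q^2)
all.equal(coef(model1), coef(model4), check.attributes = FALSE)
# the orthogonal version fits the same curve, just parameterized differently
all.equal(fitted(model1), fitted(model3))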

Extract data from partial least squares regression in R

I want to use partial least squares regression to find the most representative variables for predicting my data.
Here is my code:
library(pls)
potion<-read.table("potion-insomnie.txt",header=T)
potionTrain <- potion[1:182,]
potionTest <- potion[183:192,]
potion1 <- plsr(Sommeil ~ Aubepine + Bave + Poudre + Pavot, data = potionTrain, validation = "LOO")
Calling summary(lm(potion1)) gives me this output:
Call:
lm(formula = potion1)
Residuals:
Min 1Q Median 3Q Max
-14.9475 -5.3961 0.0056 5.2321 20.5847
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.63931 1.67955 22.410 < 2e-16 ***
Aubepine -0.28226 0.05195 -5.434 1.81e-07 ***
Bave -1.79894 0.26849 -6.700 2.68e-10 ***
Poudre 0.35420 0.72849 0.486 0.627
Pavot -0.47678 0.52027 -0.916 0.361
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.845 on 177 degrees of freedom
Multiple R-squared: 0.293, Adjusted R-squared: 0.277
F-statistic: 18.34 on 4 and 177 DF, p-value: 1.271e-12
I deduced that only the variables Aubepine and Bave are representative. So I refit the model with just these two variables:
potion1 <- plsr(Sommeil ~ Aubepine + Bave, data = potionTrain, validation = "LOO")
And I plot:
plot(potion1, ncomp = 2, asp = 1, line = TRUE)
Here is the plot of predicted vs measured values:
The problem is that I can see the regression line on the plot, but I cannot get its equation and R². Is that possible?
Also, is the first part the same as a multiple linear regression (ANOVA)?
pacman::p_load(pls)
data(mtcars)
potion <- mtcars
potionTrain <- potion[1:28,]
potionTest <- potion[29:32,]
potion1 <- plsr(mpg ~ cyl + disp + hp + drat, data = potionTrain, validation = "LOO")
coef(potion1) # coefficients
scores(potion1) # scores
## R^2:
R2(potion1, estimate = "train")
## cross-validated R^2:
R2(potion1)
## Both:
R2(potion1, estimate = "all")
