Interpretation of main effects when interaction is present in gam - r

Consider a GAM model with the following structure:
gam(y ~ s(x1, by=x2) + x2 + s(x3)), where x1 and x3 are continuous variables and x2 is categorical. If I want to know the effect of x1 (in terms of deviance explained), I remove x1 from the model and compare the deviance explained (following this thread), like this:
model1 <- gam(y ~ s(x1, by=x2) + x2 + s(x3))
model2 <- gam(y ~ x2 + s(x3))
## deviance explained by x1:
summary(model1)$dev.expl-summary(model2)$dev.expl
But what if I want to know the effect of x2? I am not interested in the effect of x2 on x1; I just want to know the effect of x2 by itself. Could I do this:
model3 <- gam(y ~ s(x1, by=x2) + s(x3))
## deviance explained by x2:
summary(model1)$dev.expl-summary(model3)$dev.expl
I know that for linear models, if a significant interaction is present, one cannot remove the main effects of the variables in that interaction, even if they are not significant. Does the same apply here, in that I cannot know the effect of x2 on y independently of its effect on x1?

Yes, the same applies here. Whenever a variable is involved in any interaction, you cannot make statements about the effect of that variable on its own.
However, notice that the type of effect you are retrieving from explained deviance does not have the same interpretation as the usual one in linear models, where you say that a one-unit change in x2 corresponds to an increase of beta2 in the mean of y. They are two different kinds of effect. Hence, by removing only the x2 parametric term, you can still report the resulting increase in explained deviance, and that is interpretable. The only difference is that the interpretation is in terms of information loss, or reduction of uncertainty, which is absolutely fine.
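For concreteness, here is a minimal sketch of that comparison with mgcv (it assumes the data sit in a data frame dat and that x2 is coded as a factor; those details are not in the question):
library(mgcv)
## full model: per-level smooths of x1, parametric x2 main effect, smooth of x3
model1 <- gam(y ~ s(x1, by = x2) + x2 + s(x3), data = dat, method = "REML")
## the same model without the x2 parametric term
model3 <- gam(y ~ s(x1, by = x2) + s(x3), data = dat, method = "REML")
## gain in deviance explained attributable to the x2 main effect,
## read as an information gain rather than a unit-change effect
summary(model1)$dev.expl - summary(model3)$dev.expl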

Related

Correlation Syntax for R

Very basic question, it's my first time writing syntax in R. Trying to write basic correlation syntax. Hypothesis is as follows: X1 (Predictor variable) and X2 (latent predictor variable) will be positively associated with Y (outcome variable), over and above X3 (latent predictor variable). How can I write this in R?
Not sure what your statistics chops are, but note that the correlation as measured by the R-squared value will never decrease (and typically increases) as variables are added to your model. So, if these variables are stored in a data frame df,
model_full <- lm(Y ~ X1 + X2 + X3, data = df)
fits the full model. Use summary(model_full) to view summary statistics of the model.
model_reduced <- lm(Y ~ X3, data = df)
fits the reduced model. Here's where the more complicated stuff comes in. To test the value of X1 and X2, you probably want an F-test to test whether the coefficients on X1 and X2 are jointly statistically significantly different from zero (this is how I interpret 'above and beyond X3'). To compute that test, use
lmtest::waldtest(model_full, model_reduced, test = "F")
Hope this helps!
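If you prefer to stay in base R, the same nested-model comparison can be done with anova(), which for linear models gives the same F-test:
## partial F-test: does adding X1 and X2 improve on the X3-only model?
anova(model_reduced, model_full)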

How to find overall significance for main effects in a dummy interaction using anova()

I am running a Cox regression with two categorical variables (x1 and x2) and their interaction. I need to know the significance of the overall effect of x1, of x2, and of the interaction.
The overall effect of the interaction:
I know how to find out the overall effect of the interaction using anova():
library(survival)
fit_x1_x2 <- coxph(Surv(time, death) ~ x1 + x2 , data= df)
fit_full <- coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x1_x2, fit_full)
But how are we supposed to use anova() to find out the overall effect of x1 or x2? What I tried is this:
The overall effect of x1
fit_x2_ia <- coxph(Surv(time, death) ~ x2 + x1:x2, data= df)
fit_full <- coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x2_ia, fit_full)
The overall effect of x2
fit_x1_ia <- coxph(Surv(time, death) ~ x1 + x1:x2, data= df)
fit_full <- coxph(Surv(time, death) ~ x1 + x2 + x1:x2, data= df)
anova(fit_x1_ia, fit_full)
I am not sure whether this is how we are supposed to use anova(). The fact that the output shows zero degrees of freedom makes me sceptical. I am even more puzzled that both times, for the overall effects of x1 and x2, the test is reported as significant, although the log-likelihood values of the models are the same and the chi-square value is zero.
Here is the data I used
set.seed(1) # make it reproducible
df <- data.frame(x1= rnorm(1000), x2= rnorm(1000)) # generate data
df$death <- rbinom(1000,1, 1/(1+exp(-(1 + 2 * df$x1 + 3 * df$x2 + df$x1 * df$x2)))) # dead or not
library(tidyverse) # for cut_number() function
df$x1 <- cut_number(df$x1, 4); df$x2 <- cut_number(df$x2, 4) # make predictors to groups
df$time <- rnorm(1000); df$time[df$time<0] <- -df$time[df$time<0] # add survival times
The two models you have constructed for the "overall effect" really do not satisfy the statistical property of being hierarchical, i.e. properly nested. Specifically, if you look at the actual models that get constructed with that code, you should see that they are the same model with different labels for the two-way crossed effects. In both cases you have 15 estimated coefficients (hence a zero degrees-of-freedom difference), and you will note that the x1 parameter in the full model has the same coefficient as the x2[-3.2532,-0.6843):x1[-0.6973,-0.0347) parameter in the "reduced" model intended to isolate the x1 effect, namely 0.19729. The crossing operator is basically filling in the missing main-effect cells with interaction terms.
There really is little value in looking at interaction models without all of the main effects if you want to stay within the bounds of generally accepted statistical practice.
If you type:
fit_full
... you should get a summary of the model that has p-values for the x1 levels, the x2 levels, and the interaction levels. Because you chose to categorize each variable at four arbitrary cutpoints, you end up with a total of 15 parameter estimates. If instead you made no cuts and modeled the linear effects and the linear-by-linear interaction, you could get three p-values directly. I'm guessing there was a suspicion that the effects were not linear; if so, a cubic spline model might be more parsimonious and distort the biological reality less than discretization into 4 disjoint levels. If you think the effects might be non-linear but ordinal, there is an ordered version of factor-classed variables, but the results are generally confusing to the uninitiated.
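For illustration, a minimal sketch of that "no cuts" approach (it assumes copies of the original continuous predictors were kept before the cut_number() step, here called x1c and x2c; those names are not in the original code):
library(survival)
## linear effects plus a linear-by-linear interaction on the uncut predictors:
## this yields one p-value each for x1c, x2c and x1c:x2c
fit_lin <- coxph(Surv(time, death) ~ x1c * x2c, data = df)
summary(fit_lin)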
The answer from 42- is informative, but after reading it I still did not know how to determine the three p-values, or whether this is possible at all. So I talked to the professor of biostatistics at my university. His answer was quite simple, and I share it in case others have similar questions.
In this design it is not possible to determine the three p-values for the overall effects of x1, x2 and their interaction. If we want the p-values of the three overall effects, we need to keep the continuous variables as they are. Breaking the variables up into groups answers a different question, so we cannot test the hypotheses about the overall effects no matter which statistical model we use.

Interpreting Interaction Coefficients within Multiple Linear Regression Model

I am struggling with the interpretation of the coefficients within interaction models.
I am looking at the output of an interaction model with 2 binary (dummy) variables. I was just wondering how I interpret the:
- Intercept (is everything at 0)?
- The slope coefficients?
- The interaction coefficients?
In standard multiple linear regression, we talk about the change in y for a 1-unit change in x, holding everything else constant. How do we interpret this with interactions, especially since both my variables are dummies?
Hope this makes sense and thanks very much in advance.
How do we interpret this in interactions?
The meaning of the regression coefficients in a model with an interaction is not the same as in a linear regression without interaction, precisely because of the added interaction term(s).
A regression coefficient no longer indicates the change in the mean response for a unit increase in its predictor with the other predictor held constant at any given level; that interpretation is only valid after accounting for the level of the other predictor.
Ex:
A linear regression model with an interaction term:
E(Y) = B0 + B1X1 + B2X2 + B3X1X2
Interpretation:
It can be shown that the change in the mean response with a unit increase in X1 when X2 is held constant is:
B1 + B3X2
And, the change in the mean response with a unit increase in X2 when X1 is held
constant is:
B2 + B3X1
I was just wondering how I interpret the: - Intercept (is everything at 0)?
The intercept is the prediction from the regression model when all the predictors are at level zero.
The slope coefficients?
In the case of a model without an interaction term:
E(Y) = B0 + B1X1 + B2X2
The coefficients B1 and B2 indicate, respectively, how much higher (or lower) the response function is when X1 = 1 or X2 = 1 than when both dummies are zero.
Thus, B1 and B2 measure the differential effects of the dummy variables on the height of the response function i.e. E(Y).
You can verify that only the level (height) of the response function changes:
When X1 = 1 and X2 = 0.
E(Y) = B0 + B1
and, when X1 = 0 and X2 = 1,
E(Y) = B0 + B2
The interaction coefficients?
By interaction coefficients, I understand the regression coefficients of the model with the interaction term.
The model:
E(Y) = B0 + B1X1 + B2X2 + B3X1X2
When both X1 and X2 are 1, then the model becomes:
E(Y) = B0 + B1 + B2 + B3,
which translates to a further increase or decrease in the height of the response function.
You can create a more interesting example with a third, continuous predictor and explore its interaction with the dummies. In that case the slope of the regression line changes as well, not only the intercept, so the interpretation that one response function is simply a fixed amount higher (or lower) than the other at any given level of X1 and X2 no longer holds; the effect of the dummy predictors then becomes even more evident.
When interaction effects are present, the effect of the qualitative predictor (dummy variable) can be studied by comparing the regression functions within the scope of the model for the different classes of the dummy variable.
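A small simulated example may make this concrete (a sketch only; the data and coefficient values are made up for illustration):
set.seed(1)
d <- data.frame(X1 = rbinom(200, 1, 0.5), X2 = rbinom(200, 1, 0.5))
d$Y <- 1 + 0.5 * d$X1 + 0.8 * d$X2 + 1.2 * d$X1 * d$X2 + rnorm(200)
fit <- lm(Y ~ X1 * X2, data = d)
coef(fit)
## (Intercept): mean of Y when X1 = 0 and X2 = 0
## X1:          difference for X1 = 1 vs X1 = 0 when X2 = 0
## X2:          difference for X2 = 1 vs X2 = 0 when X1 = 0
## X1:X2:       additional shift when both dummies are 1, beyond B1 + B2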
Reference: Kutner et. al. Applied Linear Statistical Models

Scatter plot between two predictors X1 and X2

Given the following scatter plot between two predictors X1 and X2:
Is there a way to get the number of parameters of a linear model like that?
model <- lm(Y~X1+X2)
I would like to get the number 3 somehow (intercept + X1 + X2). I looked for something like this in the structures that lm, summary(model) and anova(model) return, but I didn't figure it out.
In case I don't get an answer, I'll stick with dim(model.matrix(model))[2]. Thank you.
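A few base-R ways to get that count (a quick sketch using the model object from the question):
length(coef(model))          # number of estimated coefficients, including the intercept (3 here)
summary(model)$df[1]         # the first element of df is the number of fitted parameters
dim(model.matrix(model))[2]  # the model-matrix approach mentioned above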
I was thinking that X1 and X2 are correlated. Collinearity will reduce the accuracy of the estimates of the regression coefficients.
Maybe the importance of either X1 or X2 may be masked due to the presence of collinearity?
Though both could still be correct.
Thank you!
In a linear model, to get a second beta you need your y variable to be predicted/explained by at least 2 independent variables. If you are predicting the outcome with only 1 variable, your linear model will only produce 1 beta.

Regression from error term to dependent variable (lavaan)

I want to test a structural equation model (SEM). There are 3 indicators, IV1 to IV3, that make up a latent construct LC. This construct should explain a dependent variable DV.
Now, assume that unique variance of the indicators will contribute additional explanation to the DV. Something like this:
IV1 ↖
IV2 ← LC → DV
IV3 ↙ ↑
↑ │
e3 ───────┘
In lavaan the error terms/residuals of IV3, e3, are usually not written:
model = '
# latent variables
LV =~ IV1 + IV2 + IV3
# regression
DV ~ LV
'
Further, the residual of IV3 must be split into a component that contributes to explaining DV, and a residual of that residual.
I do not want to explain DV directly by IV3, because it's my goal to show how much unique explanation IV3 can contribute to DV. I want to maximize the path IV3 → LC → DV, and then put the residual into the path IV3 → DV.
Question:
How do I put this down in a SEM?
Bonus question:
Does it make sense from a SEM perspective that each of the IVs has such a path to DV?
Side note:
What I already did was compute this traditionally, using a series of steps. I:
computed a stand-in for LC, the average of IV1 to IV3,
ran 3 regressions IVx → LC,
did a multiple regression of DV on the IVx residuals.
Removing the common variance seems to make one of the residuals superfluous, so the regression model cannot estimate each of the residuals, but skips the last one.
For your question:
How do I put this down in a SEM model? Is it possible at all?
The answer, I think, is yes--at least if I understand you correctly.
If what you want to do is predict an outcome using a latent variable and the unique variance of one of its indicators, this can be easily accomplished in lavaan. See example code below: the first example involves predicting an outcome from a latent variable alone, whereas the second example predicts the same outcome from the same latent variable as well as the unique variance of one of the indicators of that latent variable:
#Call lavaan and use HolzingerSwineford1939 data set
library(lavaan)
dat = HolzingerSwineford1939
#Model 1: x4 predicted by lv (visual)
model1 = '
visual =~ x1 + x2 + x3
x4 ~ visual
'
#Fit model 1 and get fit measures and r-squared estimates
fit1 <- cfa(model1, data = dat, std.lv = T)
summary(fit1, fit.measures = TRUE, rsquare=T)
#Model 2: x4 predicted by lv (visual) and residual of x3
model2 = '
visual =~ x1 + x2 + x3
x4 ~ visual + x3
'
#Fit model 2 and get fit measures and r-squared estimates
fit2 <- cfa(model2, data = dat, std.lv = T)
summary(fit2, fit.measures = TRUE,rsquare=T)
Notice that the R-squared for x4 (the hypothetical outcome) is much larger when predicted by both the latent variable onto which x3 loads, and x3's unique variance.
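If you only want the R-squared values rather than the full summaries, lavaan can return them directly (a small optional addition to the code above):
## compare the R-squared of x4 across the two models
lavInspect(fit1, "rsquare")
lavInspect(fit2, "rsquare")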
As for your second question:
Bonus question: Does that make sense? And even more: Does it make sense from a SEM view (theoretically it does) that each of the independent variables has such a path to DV?
It can make sense, in some cases, to specify such paths, but I would not do so in absentia of strong theory. For example, perhaps you think a variable is a weak, but theoretically important indicator of a greater latent variable--such as the experience of "awe" is for "positive affect". But perhaps your investigation isn't interested in the latent variable, per se--you are interested in the unique effects of awe for predicting something above and beyond its manifestation as a form of positive affect. You might therefore specify a regression pathway from the unique variance of awe to the outcome, in addition to the pathway from positive affect to the outcome.
But could/should you do this for each of your variables? Well, no, you couldn't. As you can see, this particular case only has one remaining degree of freedom, so the model is on the verge of being under-identified (and would be, if you specified the remaining two possible paths from the unique variances of x1 and x2 to the outcome of x4).
Moreover, I think many would be skeptical of your motivation for attempting to specify all these pathways. Modelling the pathway from the latent variable to the outcome allows you to speak to a broader process; what would you learn by modelling each and every pathway from unique variance to outcome? Sure, you might be able to say, "Well the remaining "stuff" in this variable predicts x4!"...but what could you say about the nature of that "stuff"--it's just isolated manifest variance. Instead, I think you would be on stronger theoretical ground to consider additional common factors that might underlie the remaining variance of your variables (e.g., method factors); this would add more conceptual specificity to your analyses.
