What does the y-axis "effect" mean after using gratia::draw for a GAM?

I have fitted a GAM using the "mgcv" package with family = inverse.gaussian(link = "identity") and I am really happy with the fit. After plotting the smooth terms using gratia::draw(GAM, residuals = TRUE) I am really confused by the y-axis. What does "effect" mean?
Any help would be very appreciated!
Thank you

Technically this should read "Partial effect" (and I'll be fixing this shortly). This is the smooth effect of the covariate on the response conditional upon the other estimated terms.
Most smooths in {mgcv} are subject to a sum-to-zero identifiability constraint (so we can include an intercept in the model, which is especially useful when we also have factor parametric terms in the model), so they are centred about 0. The 0 line then represents the overall mean of the response on the link scale (or the mean at the reference levels if factor parametric terms are involved in the model); negative values on the axis indicate covariate values where the effect reduces the response below that average, and positive values indicate covariate values where the response is increased above it, all conditional upon the other estimated model terms.
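To see the centring in practice, here is a minimal sketch. It uses simulated data and a Gaussian family purely for illustration; the names x, y and m are made up and are not the model from the question.

```r
library(mgcv)

set.seed(1)
n <- 200
x <- runif(n)
y <- 5 + sin(2 * pi * x) + rnorm(n, sd = 0.3)  # simulated response, overall level about 5
m <- gam(y ~ s(x), method = "REML")

# Partial effect of s(x) at the observed x values (link scale)
partial <- predict(m, type = "terms")[, 1]

mean(partial)  # numerically zero: the sum-to-zero constraint
coef(m)[1]     # the intercept carries the overall level of the response
```

The values in partial are what the "effect" axis shows: deviations from the intercept, not predictions of the response itself.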

Related

Output from Linear Mixed Models differs from Estimated Marginal Means

I have a query about the output statistics from linear mixed models (using the lmer function) relative to the output statistics for the estimated marginal means obtained from the same model.
Essentially, I am running an LMM comparing the within-subjects effect of different contexts (with "Negative" coded as the baseline) on enjoyment ratings. The LMM output suggests that the difference between negative and polite contexts is not significant, with a p-value of .35. See the screenshot below with the relevant line highlighted:
[Screenshot: LMM output]
However, when I then run the lsmeans function on the same model (with the Holm correction), the p-value for the comparison between Negative and Polite context categories is now .05, and all of the other statistics have changed too. Again, see the screenshot below with the relevant line highlighted:
[Screenshot: LSMeans output]
I'm probably being dense because my understanding of LMMs isn't hugely advanced, but I've tried to Google the reason for this and yet I can't seem to find out why? I don't think it has anything to do with the corrections because the smaller p-value is observed when the Holm correction is used. Therefore, I was wondering why this is the case, and which value I should report/stick with and why?
Thank you for your help!
Regression coefficients and marginal means are not one and the same. Once you learn these concepts it'll be easier to figure out which one is more informative and therefore which one you should report.
After we fit a regression by estimating its coefficients, we can predict the outcome y_i given the m input variables X_i = (X_i1, ..., X_im). If the inputs are informative about the outcome, the predicted y_i is different for different X_i. If we average the predictions y_i over examples with X_ij = x_j, we get the marginal mean of the outcome for the jth feature at the value x_j. It's crucial to keep track of which inputs are kept fixed (and at what values) and which inputs are averaged over (aka marginalized out).
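The averaging step can be sketched by hand. This is only an illustration under assumptions: the data are simulated, a plain lm stands in for the mixed model, and the names ctx, smile and the level labels are made up.

```r
# Simulated 2 x 2 design; factor and level names are hypothetical
set.seed(42)
d <- expand.grid(ctx = c("Negative", "Polite"),
                 smile = c("none", "reward"),
                 rep = 1:25)
d$y <- 3 + 1.0 * (d$ctx == "Polite") * (d$smile == "reward") + rnorm(nrow(d))
fit <- lm(y ~ ctx * smile, data = d)

# Reference grid: every combination of the two factors
grid <- expand.grid(ctx = levels(d$ctx), smile = levels(d$smile))
grid$pred <- predict(fit, newdata = grid)

# Marginal means of ctx: average the predictions over smile
mm <- tapply(grid$pred, grid$ctx, mean)
avg_diff <- unname(mm["Polite"] - mm["Negative"])

avg_diff                 # Polite - Negative, averaged over smile types
coef(fit)["ctxPolite"]   # Polite - Negative at the reference smile type only
```

The two numbers differ by half the interaction coefficient, which is exactly the "what is held fixed versus averaged over" distinction.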
In your case, contextCatPolite in the coefficients summary is the difference between Polite and Negative when smileType is set to its reference level (no reward, I'd guess). In the emmeans contrasts, Polite - Negative is the average difference over all smileTypes.
Interactions have a way of making interpretation more challenging and your model includes an interaction between smileType and contextCat. See Interaction analysis in emmeans.
To add to @dipetkov's answer, the coefficients in your LMM are based on treatment coding (sometimes called 'dummy' coding). With the interactions in the model, these coefficients are no longer "main effects" in the traditional sense of factorial ANOVA. For instance, if you have:
y = b_0 + b_1(X_1) + b_2(X_2) + b_3 (X_1 * X_2)
...b_1 is "the effect of X_1" only when X_2 = 0:
y = b_0 + b_1(X_1) + b_2(0) + b_3 (X_1 * 0)
y = b_0 + b_1(X_1)
Thus, as @dipetkov points out, 1.625 is not the difference between Negative and Polite on average across all other factors (which you get from emmeans). Instead, this coefficient is the difference between Negative and Polite specifically when smileType = 0, i.e. at its reference level.
If you use sum-to-zero (deviation) contrast coding for smileType instead of treatment coding, then the contextCat coefficient from the regression output would match the difference in estimated marginal means, because smileType = 0 would now correspond to the average across smile types. The coding scheme thus has a huge effect on the estimated values and statistical significance of regression coefficients, but it should not affect F-tests based on the reduction in deviance/variance (because no matter how you code it, a given variable explains the same amount of variance).
https://stats.oarc.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/
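Here is a small sketch of the coding point, under assumptions: simulated balanced data, an ordinary lm instead of lmer, and made-up names (ctx, smile). Only the coding of smile is changed between the two fits.

```r
set.seed(7)
d <- expand.grid(ctx = c("Negative", "Polite"),
                 smile = c("none", "reward"),
                 rep = 1:25)
d$y <- 3 + 0.5 * (d$ctx == "Polite") +
  1.5 * (d$ctx == "Polite") * (d$smile == "reward") + rnorm(nrow(d))

# Default treatment coding for both factors
fit_trt <- lm(y ~ ctx * smile, data = d)

# Same model, but smile uses sum-to-zero coding (+1 / -1)
fit_sum <- lm(y ~ ctx * smile, data = d, contrasts = list(smile = "contr.sum"))

# Cell means and the Polite - Negative difference within each smile type
cell <- tapply(d$y, list(d$ctx, d$smile), mean)
diffs <- cell["Polite", ] - cell["Negative", ]

coef(fit_trt)["ctxPolite"]  # matches diffs["none"]: difference at the reference level
coef(fit_sum)["ctxPolite"]  # matches mean(diffs): difference averaged over smile types

# The F-tests do not depend on the coding
anova(fit_trt)
anova(fit_sum)
```

In this balanced toy example the sum-coded coefficient is the same quantity emmeans would report as the averaged Polite - Negative contrast.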

What are the differences between directly plotting the fit function and plotting the predicted values (they have the same shape but different ranges)?

I am trying to learn gam() in R for a logistic regression using a spline on a predictor. The two methods of plotting in my code give the same shape but different ranges of the response on the logit scale; it seems like an intercept is missing in one. Both are supposed to be correct, so why do the ranges differ?
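library(ISLR)   # provides the Wage data
library(gam)
# Logistic GAM: is wage above 250, as a smooth function of age
gam.lr <- gam(I(wage > 250) ~ s(age), family = binomial(link = "logit"), data = Wage)
# Integer grid of ages spanning the observed range
agelims <- range(Wage$age)
age.grid <- seq(from = agelims[1], to = agelims[2])
# Predictions on the link (log-odds) scale
pred <- predict(gam.lr, newdata = list(age = age.grid), type = "link")
par(mfrow = c(2, 1))
plot(gam.lr)           # first method
plot(age.grid, pred)   # second method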
I expected that both of the methods would give the exact same plot. plot(gam.lr) plots the additive effect of each component, and since there is only one component here, it should give the predicted logit function. The predict method is also giving me estimates on the link scale. But the actual outputs are on different ranges. The minimum value of the first method is -4 while that of the second is less than -7.
The first plot is of the estimated smooth function s(age) only. Smooths are subject to identifiability constraints as in the basis expansion used to parametrise the smooth, there is a function or combination of functions that are entirely confounded with the intercept. As such, you can't fit the smooth and an intercept in the same model as you could subtract some value from the intercept and add it back to the smooth and you have the same fit but different coefficients. As you can add and subtract an infinity of values you have an infinite supply of models, which isn't helpful.
Hence identifiability constraints are applied to the basis expansions, and the one that is most useful is to ensure that the smooth sums to zero over the range of the covariate. This involves centering the smooth at 0, with the intercept then representing the overall mean of the response.
So, the first plot is of the smooth, subject to this sum to zero constraint, so it straddles 0. The intercept in this model is:
> coef(gam.lr)[1]
(Intercept)
-4.7175
If you add this to values in this plot, you get the values in the second plot, which is the application of the full model to the data you supplied, intercept + f(age).
This is all also happening on the link scale, the log odds scale, hence all the negative values.
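The same relationship can be checked numerically. The sketch below is an assumption-laden stand-in: it uses mgcv rather than the gam package from the question, and simulated data with made-up names (age, high), so the numbers will not match the Wage example.

```r
library(mgcv)

set.seed(123)
n <- 500
age <- runif(n, 18, 80)
# Simulated rare binary outcome with a hump-shaped dependence on age
high <- rbinom(n, 1, plogis(-3 + 2 * sin((age - 18) / 62 * pi)))
m <- gam(high ~ s(age), family = binomial, method = "REML")

grid <- data.frame(age = seq(18, 80, by = 1))
smooth_only <- predict(m, newdata = grid, type = "terms")[, 1]  # centred smooth, as in plot(m)
full_link   <- predict(m, newdata = grid, type = "link")        # intercept + s(age)

# Adding the intercept to the centred smooth recovers the link-scale predictions
rebuilt <- as.numeric(coef(m)["(Intercept)"] + smooth_only)
all.equal(as.numeric(full_link), rebuilt)
```

The two curves have the same shape; they differ only by the constant shift given by the intercept.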

Contrast plot for GAM using mgcv

When using the visreg package to visualise a GAM with a contrast plot, the confidence interval goes to zero at the inflection point when the graph is U-shaped:
# Load libraries
library(mgcv)
library(visreg)
# Synthetic data
df <- data.frame(a = -10:10, b = jitter((-10:10)^2, amount = 10))
# Fit GAM
res <- gam(b ~ s(a), data = df)
# Make contrast figure
visreg(res, type = "contrast")
This seems dodgy and doesn't happen when making a conditional plot (i.e., visreg(res, type = "conditional")), so instead I'm looking at the mgcv package to make the same plot. I can make a conditional plot using mgcv (e.g., plot.gam(res)), but I don't see the option to make a contrast plot. Is this possible with the mgcv package?
This is due to identifiability constraints imposed on the spline basis/bases used in the model. This is a sum-to-zero constraint and effectively removes an intercept-like basis function from the basis used for each smooth term so that these are not confounded with the model intercept. This allows the model to be identifiable, rather than having an infinity of solutions.
Using standard theory, the confidence interval has to tend to zero where it crosses zero on the y-axis (the centred effect usually, but here as shown it is on some transformed scale) as the constraint implies that at some point x, the effect is 0 and has 0 variance.
This is nonsense of course and recent research has investigated this problem. One solution provided by Simon Wood and colleagues employs extensions to Nychka's observation that, for the Gaussian case, the Bayesian credible interval for a smooth has good across-the-function interpretation as a frequentist confidence interval (so not pointwise, but not simultaneous either). Nychka's results (the coverage properties of the interval) fail in situations where the estimated smooth has squared bias that is not substantially less than the variance of the estimate; clearly this fails to be the case when the variance hits zero where the estimated smooth passes through zero effect as the bias is not actually quite zero at this point.
Marra and Wood (2012) have extended these results to the generalized model setting, basically estimating the confidence interval for one smooth by assuming that all the other terms in the model have had the identifiability constraints applied to them, but not the smooth of interest. This shifts the focus of inference from the smooth directly to the smooth + intercept. You can turn this on in plot.gam() with the argument seWithMean = TRUE.
I don't see an easy way to make visreg do this, however, although it is trivial to get back the information you want via predict.gam() with the options type = 'iterms', se.fit = TRUE. This returns, on the scale of the linear predictor, the contribution of each model smooth term plus standard errors that include the correction implied by seWithMean. You can then fiddle with this to your heart's content; adding on the model constant term (the estimate for the intercept), for example, should give you something close to the figure you show in your question.
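As a rough sketch of that workflow, assuming data simulated to resemble the question's (a seed and runif noise are used here instead of jitter, so the values are not the asker's):

```r
library(mgcv)

set.seed(2)
df <- data.frame(a = -10:10)
df$b <- df$a^2 + runif(nrow(df), -10, 10)  # U-shaped data, similar in spirit to the question
res <- gam(b ~ s(a), data = df)

p_terms  <- predict(res, type = "terms",  se.fit = TRUE)  # SE of the centred smooth alone
p_iterms <- predict(res, type = "iterms", se.fit = TRUE)  # SE including intercept uncertainty

# The partial effects are the same; only the standard errors differ
out <- data.frame(a         = df$a,
                  effect    = p_iterms$fit[, 1],
                  se_terms  = p_terms$se.fit[, 1],
                  se_iterms = p_iterms$se.fit[, 1])
# Shift by the intercept to get back onto the scale of the response
out$fitted <- out$effect + coef(res)[1]
out

# The same idea using the built-in plot method
plot(res, seWithMean = TRUE, shift = coef(res)[1])
```

From out you can build the interval yourself, for example fitted plus or minus 2 * se_iterms, and draw it with whatever plotting tools you prefer.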

Orange Canvas: Regression models gives the r^2 coefficients out of [-1,1] interval

I am using Orange Canvas with its regression methods to make some estimates about my data set. As I understand it, the regression coefficient r^2 must lie inside the interval [-1,1] to be statistically meaningful. But sometimes I get values such as -50,.. or 26,.. etc. So I am confused about that. How can I interpret such coefficients? Thank you all in advance.
From Wikipedia:
Important cases where the computational definition of R2 can yield negative values, depending on the definition used, arise where the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data, and where linear regression is conducted without including an intercept. Additionally, negative values of R2 may occur when fitting non-linear functions to data.
There is nothing in the definition of R² that theoretically prevents it from having arbitrarily negative values. I guess you can interpret -50 as even worse than -1. But with regard to R² = 26, I'm clueless.
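To make the negative case concrete, here is a tiny sketch. It assumes the common definition R² = 1 - SS_res / SS_tot; whether Orange uses exactly this definition is an assumption, and the helper name r_squared is made up.

```r
# R-squared as 1 - (residual sum of squares) / (total sum of squares)
r_squared <- function(y, pred) {
  ss_res <- sum((y - pred)^2)
  ss_tot <- sum((y - mean(y))^2)
  1 - ss_res / ss_tot
}

y <- c(1, 2, 3, 4, 5)
r_squared(y, y)                 # perfect predictions: 1
r_squared(y, rep(mean(y), 5))   # always predicting the mean: 0
r_squared(y, c(5, 4, 3, 2, 1))  # worse than predicting the mean: -3
```

Under this definition the value can be arbitrarily negative but never above 1, since SS_res cannot be negative.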

predict and multiplicative variables / interaction terms in probit regressions

I want to determine the marginal effects of each independent variable in a probit regression as follows:
predict the (base) probability with the mean of each variable
for each variable, predict the change in probability compared to the base probability if the variable takes the value of mean + 1x standard deviation of the variable
In one of my regressions, I have a multiplicative variable, as follows:
my_probit <- glm(a ~ b + c + I(b*c), family = binomial(link = "probit"), data=data)
Two questions:
When I determine the marginal effects using the approach above, will the value of the multiplicative term reflect the value of b or c taking the value mean + 1x standard deviation of the variable?
Same question, but with an interaction term (* and no I()) instead of a multiplicative term.
Many thanks
When interpreting the results of models involving interaction terms, the general rule is DO NOT interpret coefficients. The very presence of interactions means that the meaning of coefficients for terms will vary depending on the other variate values being used for prediction. The right way to go about looking at the results is to construct a "prediction grid", i.e. a set of values that are spaced across the range of interest (hopefully within the domain of data support). The two essential functions for this process are expand.grid and predict.
dgrid <- expand.grid(b = fivenum(data$b)[2:4], c = fivenum(data$c)[2:4])
# A grid with the lower hinges, medians and upper hinges of `b` and `c`.
predict(my_probit, newdata=dgrid)
You may want to have the predictions on a scale other than the default (which is to return the linear predictor), so perhaps this would be easier to interpret if it were:
predict(my_probit, newdata=dgrid, type ="response")
Be sure to read ?predict and ?predict.glm and work with some simple examples to make sure you are getting what you intended.
Predictions from models containing interactions (at least those involving 2 covariates) should be thought of as being surfaces or 2-d manifolds in three dimensions. (And for 3-covariate interactions as being iso-value envelopes.) The reason that non-interaction models can be decomposed into separate term "effects" is that the slopes of the planar prediction surfaces remain constant across all levels of input. Such is not the case with interactions, especially those with multiplicative and non-linear model structures. The graphical tools and insights that one picks up in a differential equations course can be productively applied here.
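A sketch of the prediction-grid approach applied to the question's setup, using simulated data (the coefficients and sample size below are made up). It fits the model both with I(b*c) and with b*c; in either case predict() rebuilds the product term from whatever values of b and c are in newdata.

```r
set.seed(99)
n <- 400
data <- data.frame(b = rnorm(n), c = rnorm(n))
data$a <- rbinom(n, 1, pnorm(-0.5 + 0.8 * data$b + 0.4 * data$c + 0.6 * data$b * data$c))

fit_I    <- glm(a ~ b + c + I(b * c), family = binomial(link = "probit"), data = data)
fit_star <- glm(a ~ b * c,            family = binomial(link = "probit"), data = data)

# b at its mean and at mean + 1 SD, crossed with three values of c
grid <- expand.grid(b = mean(data$b) + c(0, sd(data$b)),
                    c = fivenum(data$c)[2:4])
grid$p_I    <- predict(fit_I,    newdata = grid, type = "response")
grid$p_star <- predict(fit_star, newdata = grid, type = "response")
grid

# Change in predicted probability for b + 1 SD, separately at each value of c
change <- tapply(grid$p_I, grid$c, diff)
change
```

The two parameterisations give the same predictions, and the change in probability for a one-SD shift in b is different at each value of c, which is why a single "marginal effect of b" is not well defined here.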

Resources