When using the visreg package to visualise a GAM with a contrast plot, the confidence interval goes to zero at the inflection point when the graph is U-shaped:
# Load libraries
library(mgcv)
library(visreg)
# Synthetic data
df <- data.frame(a = -10:10, b = jitter((-10:10)^2, amount = 10))
# Fit GAM
res <- gam(b ~ s(a), data = df)
# Make contrast figure
visreg(res, type = "contrast")
This seems dodgy and doesn't happen when making a conditional plot (i.e., visreg(res, type = "conditional")), so instead I'm looking at the mgcv package to make the same plot. I can make a conditional plot using mgcv (e.g., plot.gam(res)), but I don't see the option to make a contrast plot. Is this possible with the mgcv package?
This is due to identifiability constraints imposed on the spline basis/bases used in the model. This is a sum-to-zero constraint and effectively removes an intercept-like basis function from the basis used for each smooth term so that these are not confounded with the model intercept. This allows the model to be identifiable, rather than having an infinity of solutions.
Using standard theory, the confidence interval has to tend to zero where it crosses zero on the y-axis (usually the centred effect, but here, as shown, it is on some transformed scale), as the constraint implies that at some point x the effect is 0 and has 0 variance.
This is nonsense of course and recent research has investigated this problem. One solution provided by Simon Wood and colleagues employs extensions to Nychka's observation that, for the Gaussian case, the Bayesian credible interval for a smooth has good across-the-function interpretation as a frequentist confidence interval (so not pointwise, but not simultaneous either). Nychka's results (the coverage properties of the interval) fail in situations where the estimated smooth has squared bias that is not substantially less than the variance of the estimate. That condition clearly breaks down where the variance hits zero as the estimated smooth passes through zero effect, because the bias is not actually zero at this point.
Marra and Wood (2012) have extended these results to the generalized model setting, basically estimating the confidence interval for one smooth by assuming that all the other terms in the model have had the identifiability constraints applied to them, but not the smooth of interest. This shifts the focus of inference from the smooth directly to the smooth + intercept. You can turn this on in plot.gam() with the argument seWithMean = TRUE.
I don't see an easy way to make visreg do this, although it is trivial to get back the information you want via predict.gam() with the options type = 'iterms', se.fit = TRUE. This returns, on the scale of the linear predictor, the contribution of each model smooth term plus standard errors that include the correction implied by seWithMean. You can then fiddle with this to your heart's content; adding on the model constant term (the estimate for the intercept), for example, should give you something close to the figure you show in your question.
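As a rough sketch of the two routes described above (plot.gam() with seWithMean = TRUE, and predict.gam() with type = 'iterms'), something like the following should work. The data mimic the question's example, but the seed, the use of runif() in place of jitter(), the prediction grid newd and the "fit plus or minus 2 standard errors" band are my own choices for illustration.

```r
library(mgcv)

set.seed(1)
# Synthetic U-shaped data, similar in spirit to the question's example
df <- data.frame(a = -10:10)
df$b <- df$a^2 + runif(nrow(df), min = -10, max = 10)

res <- gam(b ~ s(a), data = df, method = "REML")

# Route 1: partial effect plot whose interval includes intercept uncertainty
plot(res, seWithMean = TRUE, shade = TRUE)

# Route 2: extract the same information and build the band by hand
newd <- data.frame(a = seq(-10, 10, length.out = 200))  # hypothetical grid
p <- predict(res, newdata = newd, type = "iterms", se.fit = TRUE)

# Add the intercept to move from "smooth" to "smooth + intercept"
fit <- p$fit[, "s(a)"] + coef(res)[["(Intercept)"]]
se  <- p$se.fit[, "s(a)"]

# Approximate 95% interval on the linear predictor scale
ci <- data.frame(a = newd$a, fit = fit,
                 lower = fit - 2 * se, upper = fit + 2 * se)
head(ci)
```

The ci data frame can then be passed to whatever plotting code you prefer; the point is only that the interval no longer pinches to zero width.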
I am using spatstat to build point process models using the ppm function but I have problems in validation, when I use the residual plot parres to understand the effect of a covariate.
The model is composed of 1022 locations of bird occurrences (called ois.ppp), the habitat availability (a raster called FB0lin which has been normalized and log-transformed), the sampling effort (a raster called Nbdate, normalized too) and the accessibility of places (a raster called pAccess, normalized too) across the study area. The objective is to evaluate the fit of a Gibbs point process model with a Geyer process parameter, the habitat availability, the sampling effort and the accessibility. The eps argument was also used to create a set of dummy points chosen along a grid with a 100 x 100 m resolution.
The model used is :
mod.ois.echlin = ppm(ois.ppp, ~ FB0lin + Nbdate + pAccess, interaction = Geyer(r=401,sat=9), eps=100)
The Geyer parameters were identified using:
rs=expand.grid(r=seq(1,1001, by=50), sat=1:40)
term.interlin=profilepl(rs, Geyer, ois.ppp,~FB0lin+Nbdate+pAccess)
Then I use the parres function :
res.FB0.echlin=parres(mod.ois.echlin, covariate="FB0lin")
plot(res.FB0.echlin,main="FB0 LinCost", legend=FALSE)
The problem is that the fitted values do not seem to be optimal (see figure below). The fitted curve should have lower values, within the confidence interval, but it lies outside this interval, which probably affects the quality of the point process model.
My questions are then :
Have you ever seen such a result, and is it normal?
Is it possible to correct it?
Figure : Smoothed partial residuals - FB0lin
Any advice would be much appreciated.
The diagnostic is working correctly. It indicates that, as a function of the predictor variable FB0lin, the fitted model (represented by the dashed straight line) overestimates the true intensity of the model (represented by the thick black curve with grey confidence bands) by a constant amount. The linear relationship (linear dependence of the log intensity on the covariate) seems to be adequate, in the sense that you don't need to replace this linear relationship by a more complicated relationship (which is the main question for which the partial residuals are used). The diagnostic says that the model is adequate except that it is overestimating the log intensity by a constant amount, which means that it is overestimating the intensity by a constant factor. (This could be due to the way in which the other predictors Nbdate and pAccess are involved in the model, or it could be due to the choice of interpoint interaction. To investigate that, you need to try other tools as discussed in Chapter 11 of the spatstat book.)
I have fitted a GAM using the "mgcv" package with family = inverse.gaussian(link = "identity") and I am really happy with the fit. After plotting the smooth terms using gratia::draw(GAM, residuals = TRUE) I am really confused by the y-axis. What does "effect" mean?
Any help would be much appreciated!
Thank you
Technically this should read "Partial effect" (and I'll be fixing this shortly). This is the smooth effect of the covariate on the response conditional upon the other estimated terms.
Most smooths in {mgcv} are subject to a sum-to-zero identifiability constraint (so we can include an intercept in the model, which is especially useful when we have factor parametric terms in the model also), so they are centred about 0. The 0 line then means the overall mean (on the link scale) of the response (or the reference levels if factor parametric terms are involved in the model); negative values on the axis indicate where the effect of the covariate reduces the response below the average value, and positive values on the axis indicate those covariate values where the response is increased above the average. All conditional upon the other estimated model terms.
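A small check of the centring described above, as a sketch on simulated data (the data, seed and object names are made up for illustration, and a Gaussian identity-link model is used rather than the inverse Gaussian one from the question): the partial effect of a constrained smooth, evaluated at the observed covariate values, should average to essentially zero, so the 0 line on the plot corresponds to the intercept.

```r
library(mgcv)

set.seed(42)
n <- 200
x <- runif(n)
y <- 5 + sin(2 * pi * x) + rnorm(n, sd = 0.3)  # simulated response

m <- gam(y ~ s(x), method = "REML")

# Partial effect of s(x) at the observed data, on the link scale
partial <- predict(m, type = "terms")[, "s(x)"]

# Sum-to-zero constraint: the partial effect averages to (numerically) zero
mean(partial)

# For this Gaussian identity-link model the intercept matches the overall mean
c(intercept = coef(m)[["(Intercept)"]], mean_y = mean(y))
```

Values of partial below zero mark covariate values where the smooth pulls the fitted response below that overall level, and values above zero mark where it pushes it above.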
My Goal: I have an ordinal factor variable (5 levels) to which I would like to apply contrasts to test for a linear trend. However, the factor groups have heterogeneity of variance.
What I've done: Upon recommendation, I used lmRob() from the robust package to fit a robust linear model, then applied the contrasts.
# assign the codes for a linear contrast of 5 groups, save as object
contrast5 <- contr.poly(5)
# set contrast property of sf1 to contain the weights
contrasts(SCI$sf1) <- contrast5
# fit and save a robust model (exhaustive instead of subsampling)
robmod.sf1 <- lmRob(ICECAP_A ~ sf1, data = SCI, nrep = Exhaustive)
summary.lmRob(robmod.sf1)
My problem: I have since been reading that robust regression is more suited to address outliers, and not heterogeneity of variance. (bottom of https://stats.idre.ucla.edu/r/dae/robust-regression/_ ) This UCLA page (among others) suggests the sandwich package to get heteroskedastic-consistent (HC) standard errors (such as in https://thestatsgeek.com/2014/02/14/the-robust-sandwich-variance-estimator-for-linear-regression-using-r/ ).
But these examples use a series of functions/calls to generate output that gives you the HC that could be used to calculate confidence intervals, t-values, p-values etc.
My thinking is that if I use vcovHC(), I could get the HC std errors, but the HC std errors would not have been 'applied'/a property of the model, so I couldn't pass the model (with the HC errors) through a function to apply the contrasts that I ultimately want. I hope I am not conflating two separate concepts, but surely if a function addresses/down-weights outliers, that should at least somewhat address unequal variances as well?
Can anyone confirm whether my reasoning is sound (and thus that I should stay with lmRob())? Or suggest how I could just correct my standard errors and still apply the contrasts?
vcovHC is the right function for dealing with heteroscedasticity; HC stands for heteroscedasticity-consistent. It will not downweight outliers in the estimates of model effects, but it will lead to CIs and p-values that are calculated differently, to accommodate the impact of such outlying observations. lmRob does downweight outlying values but does not handle heteroscedasticity.
See more here:
https://stats.stackexchange.com/questions/50778/sandwich-estimator-intuition/50788#50788
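To illustrate how the contrasts and the HC standard errors fit together, here is a minimal sketch on simulated data (the data, group sizes and object names are hypothetical). The polynomial contrasts are set on the factor before fitting an ordinary lm(), and the HC3 covariance matrix is then computed by hand in base R so the example has no package dependencies. As far as I know, sandwich::vcovHC(fit, type = "HC3") should give the same matrix, and it can be passed to lmtest::coeftest() to get this kind of table directly.

```r
set.seed(123)
k <- 5        # number of ordinal groups
n_per <- 30   # observations per group

# Simulated data: linear trend in the means, unequal variances across groups
g <- factor(rep(1:k, each = n_per))
y <- 0.5 * as.numeric(g) + rnorm(k * n_per, sd = rep(1:k, each = n_per))
d <- data.frame(y = y, g = g)

# Polynomial (trend) contrasts, as in the question
contrasts(d$g) <- contr.poly(k)
fit <- lm(y ~ g, data = d)

# HC3 sandwich estimator: (X'X)^-1 X' diag(e^2 / (1 - h)^2) X (X'X)^-1
X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * (e / (1 - h)))
V_hc3 <- bread %*% meat %*% bread

# Coefficient table using the HC3 standard errors
est  <- coef(fit)
se   <- sqrt(diag(V_hc3))
tval <- est / se
pval <- 2 * pt(abs(tval), df = df.residual(fit), lower.tail = FALSE)
hc_table <- cbind(Estimate = est, HC3_SE = se, t = tval, p = pval)
round(hc_table, 4)
```

The row labelled g.L is the linear trend contrast; its estimate is the same as in summary(fit), and only the standard error, t and p values change.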
I am conducting an analysis of where on the landscape a predator encounters potential prey. My response data is binary with an Encounter location = 1 and a Random location = 0 and my independent variables are continuous but have been rescaled.
I originally used a GLM structure
glm_global <- glm(Encounter ~ Dist_water_cs+coverMN_cs+I(coverMN_cs^2)+
Prey_bio_stand_cs+Prey_freq_stand_cs+Dist_centre_cs,
data=Data_scaled, family=binomial)
but realized that this failed to account for potential spatial-autocorrelation in the data (a spline correlogram showed high residual correlation up to ~1000m).
Correlog_glm_global <- spline.correlog (x = Data_scaled[, "Y"],
y = Data_scaled[, "X"],
z = residuals(glm_global,
type = "pearson"), xmax = 1000)
I attempted to account for this by implementing a GLMM (in lme4) with the predator group as the random effect.
glmm_global <- glmer(Encounter ~ Dist_water_cs+coverMN_cs+I(coverMN_cs^2)+
Prey_bio_stand_cs+Prey_freq_stand_cs+Dist_centre_cs+(1|Group),
data=Data_scaled, family=binomial)
When comparing the AIC of the global GLMM (1144.7) to that of the global GLM (1149.2), I get a Delta AIC value >2, which suggests that the GLMM fits the data better. However, I am still getting essentially the same correlation in the residuals, as shown on the spline correlogram for the GLMM.
Correlog_glmm_global <- spline.correlog (x = Data_scaled[, "Y"],
y = Data_scaled[, "X"],
z = residuals(glmm_global,
type = "pearson"), xmax = 10000)
I also tried explicitly including the Lat*Long of all the locations as an independent variable but results are the same.
After reading up on options, I tried running Generalized Estimating Equations (GEEs) in “geepack” thinking this would allow me more flexibility with regards to explicitly defining the correlation structure (as in GLS models for normally distributed response data) instead of being limited to compound symmetry (which is what we get with GLMM). However I realized that my data still demanded the use of compound symmetry (or “exchangeable” in geepack) since I didn’t have temporal sequence in the data. When I ran the global model
gee_global <- geeglm(Encounter ~ Dist_water_cs+coverMN_cs+I(coverMN_cs^2)+
Prey_bio_stand_cs+Prey_freq_stand_cs+Dist_centre_cs,
id=Pride, corstr="exchangeable", data=Data_scaled, family=binomial)
(using scaled or unscaled data made no difference so this is with scaled data for consistency)
suddenly none of my covariates were significant. However, being a novice with GEE modelling I don’t know a) if this is a valid approach for this data or b) whether this has even accounted for the residual autocorrelation that has been evident throughout.
I would be most appreciative for some constructive feedback as to 1) which direction to go once I realized that the GLMM model (with predator group as a random effect) still showed spatially autocorrelated Pearson residuals (up to ~1000m), 2) if indeed GEE models make sense at this point and 3) if I have missed something in my GEE modelling. Many thanks.
Taking spatial autocorrelation into account in your model can be done in many ways. I will restrict my response to the main R packages that deal with random effects.
First, you could use the nlme package and specify a correlation structure for your residuals (many are available: corGaus, corLin, corSpher ...). You should try several of them and keep the best model. In this case the spatial autocorrelation is treated as continuous and is approximated by a global function.
Second, you could use the mgcv package and add a bivariate spline of the spatial coordinates to your model. This way you can capture a spatial pattern and even map it. Strictly speaking, this method doesn't model the spatial autocorrelation itself, but it may solve the problem. If space is discrete in your case, you could use a Markov random field smooth. This website has helpful examples: https://www.fromthebottomoftheheap.net
Third, you could use the brms package. This allows you to specify very complex models with other correlation structures for your residuals (CAR and SAR). The package uses a Bayesian approach.
I hope this helps. Good luck.
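For the second option, a minimal sketch with simulated data might look like the following. The variable names (X, Y, cover, Encounter), the basis dimension k = 30 and the simulated spatial pattern are assumptions for illustration only, not taken from your data.

```r
library(mgcv)

set.seed(7)
n <- 500

# Simulated locations (metres) and one scaled covariate
dat <- data.frame(X = runif(n, 0, 5000),
                  Y = runif(n, 0, 5000),
                  cover = rnorm(n))

# Smooth spatial pattern that the covariate alone cannot explain
spatial <- sin(dat$X / 800) + cos(dat$Y / 800)
dat$Encounter <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$cover + spatial))

# Binomial GAM with a bivariate smooth of the coordinates
m_sp <- gam(Encounter ~ cover + s(X, Y, k = 30),
            family = binomial, data = dat, method = "REML")
summary(m_sp)

# Pearson residuals, e.g. to feed into a spline correlogram afterwards
r <- residuals(m_sp, type = "pearson")
```

You would then re-run the correlogram on r to see whether the residual spatial structure has actually gone; if it has not, increasing k is one thing to try.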
I want to determine the marginal effects of each independent variable in a probit regression as follows:
predict the (base) probability with the mean of each variable
for each variable, predict the change in probability compared to the base probability if the variable takes the value of mean + 1x standard deviation of the variable
In one of my regressions, I have a multiplicative variable, as follows:
my_probit <- glm(a ~ b + c + I(b*c), family = binomial(link = "probit"), data=data)
Two questions:
When I determine the marginal effects using the approach above, will the value of the multiplicative term reflect the value of b or c taking the value mean + 1x standard deviation of the variable?
Same question, but with an interaction term (* and no I()) instead of a multiplicative term.
Many thanks
When interpreting the results of models involving interaction terms, the general rule is DO NOT interpret coefficients. The very presence of interactions means that the meaning of coefficients for terms will vary depending on the other variate values being used for prediction. The right way to go about looking at the results is to construct a "prediction grid", i.e. a set of values that are spaced across the range of interest (hopefully within the domain of data support). The two essential functions for this process are expand.grid and predict.
dgrid <- expand.grid(b=fivenum(data$b)[2:4], c=fivenum(data$c)[2:4])
# A grid with the lower hinges, medians and upper hinges of `b` and `c`.
predict(my_probit, newdata=dgrid)
You may want to have the predictions on a scale other than the default (which is to return the linear predictor), so perhaps this would be easier to interpret if it were:
predict(my_probit, newdata=dgrid, type ="response")
Be sure to read ?predict and ?predict.glm and work with some simple examples to make sure you are getting what you intended.
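As a sketch of how this plays out for the "mean, then mean + 1 SD" scheme in the question (simulated data, so the coefficients and seed are made up): predict() rebuilds the I(b*c) term from whatever values of b and c are in newdata, so shifting b alone changes the product term to (mean(b) + sd(b)) * mean(c). As far as I know, the same holds if the model is written with b*c instead of I(b*c).

```r
set.seed(99)
n <- 1000

# Simulated data with a genuine b-by-c interaction on the probit scale
data <- data.frame(b = rnorm(n, mean = 2, sd = 1),
                   c = rnorm(n, mean = 1, sd = 0.5))
eta <- -1 + 0.4 * data$b + 0.3 * data$c + 0.2 * data$b * data$c
data$a <- rbinom(n, 1, pnorm(eta))

my_probit <- glm(a ~ b + c + I(b*c),
                 family = binomial(link = "probit"), data = data)

# Base case: both variables at their means
base <- data.frame(b = mean(data$b), c = mean(data$c))
# Shift one variable at a time by one standard deviation
b_up <- transform(base, b = b + sd(data$b))
c_up <- transform(base, c = c + sd(data$c))

p_base <- predict(my_probit, newdata = base, type = "response")
p_b    <- predict(my_probit, newdata = b_up, type = "response")
p_c    <- predict(my_probit, newdata = c_up, type = "response")

# Change in predicted probability relative to the base case
c(base = unname(p_base),
  b_plus_sd = unname(p_b - p_base),
  c_plus_sd = unname(p_c - p_base))
```

Note that these changes depend on where the other variable is held, which is exactly why a grid of predictions is more informative than a single number per variable.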
Predictions from models containing interactions (at least those involving 2 covariates) should be thought of as being surfaces or 2-d manifolds in three dimensions. (And for 3-covariate interactions as being iso-value envelopes.) The reason that non-interaction models can be decomposed into separate term "effects" is that the slopes of the planar prediction surfaces remain constant across all levels of input. Such is not the case with interactions, especially those with multiplicative and non-linear model structures. The graphical tools and insights that one picks up in a differential equations course can be productively applied here.