What are the differences between directly plotting the fit function and plotting the predicted values(they have same shape but different ranges)? - r

I am trying to learn gam() in R for a logistic regression using spline on a predictor. The two methods of plotting in my code gives the same shape but different ranges of response in the logit scale, seems like an intercept is missing in one. Both are supposed to be correct but, why the differences in range?
library(ISLR)
attach(Wage)
library(gam)
gam.lr = gam(I(wage >250) ~ s(age), family = binomial(link = "logit"), data = Wage)
agelims = range(age)
age.grid = seq(from = agelims[1], to = agelims[2])
pred=predict(gam.lr, newdata = list(age = age.grid), type = "link")
par(mfrow = c(2,1))
plot(gam.lr)
plot(age.grid, pred)
I expected that both of the methods would give the exact same plot. plot(gam.lr) plots the additive effects of each component and since here there's only one so it is supposed to give the predicted logit function. The predict method is also giving me estimates in the link scale. But the actual outputs are on different ranges. The minimum value of the first method is -4 while that of the second is less than -7.

The first plot is of the estimated smooth function s(age) only. Smooths are subject to identifiability constraints as in the basis expansion used to parametrise the smooth, there is a function or combination of functions that are entirely confounded with the intercept. As such, you can't fit the smooth and an intercept in the same model as you could subtract some value from the intercept and add it back to the smooth and you have the same fit but different coefficients. As you can add and subtract an infinity of values you have an infinite supply of models, which isn't helpful.
Hence identifiability constraints are applied to the basis expansions, and the one that is most useful is to ensure that the smooth sums to zero over the range of the covariate. This involves centering the smooth at 0, with the intercept then representing the overall mean of the response.
So, the first plot is of the smooth, subject to this sum to zero constraint, so it straddles 0. The intercept in this model is:
> coef(gam.lr)[1]
(Intercept)
-4.7175
If you add this to values in this plot, you get the values in the second plot, which is the application of the full model to the data you supplied, intercept + f(age).
This is all also happening on the link scale, the log odds scale, hence all the negative values.

Related

LASSO-type regressions with non-negative continuous dependent variable (dependent var)

I am using "glmnet" package (in R) mostly to perform regularized linear regression.
However I am wondering if it can perform LASSO-type regressions with non-negative (integer) continuous (dependent) outcome variable.
I can use family = poisson, but the outcome variable is not specifically "count" variable. It is just a continuous variable with lower limit 0.
I aware of "lower.limits" function, but I guess it is for covariates (independent variables). (Please correct me if my understanding of this function not right.)
I look forward to hearing from you all! Thanks :-)
You are right that setting lower limit in glmnet is meant for covariates. Poisson will set a lower limit to zero because you exponentiate to get back the "counts".
Going along those lines, most likely it will work if you transform your response variable. One quick way is to take the log of your response variable, do the fit and transform it back, this will ensure that it's always positive. you have to deal with zeros
An alternative is a power transformation. There's a lot to think about and I can only try a two parameter box-cox with a dataset since you did not provide yours:
library(glmnet)
library(mlbench)
library(geoR)
data(BostonHousing)
data = BostonHousing
data$chas=as.numeric(data$chas)
# change it to min 0 and max 1
data$medv = (data$medv-min(data$medv))/diff(range(data$medv))
Then here I use a quick approximation via pca (without fitting all the variables) to get the suitable lambda1 and lambda2 :
bcfit = boxcoxfit(object = data[,14],
xmat = prcomp(data[,-14],scale=TRUE,center=TRUE)$x[,1:2],
lambda2=TRUE)
bcfit
Fitted parameters:
lambda lambda2 beta0 beta1 beta2 sigmasq
0.42696313 0.00001000 -0.83074178 -0.09876102 0.08970137 0.05655903
Convergence code returned by optim: 0
Check the lambda2, it is the one thats critical for deciding whether you get a negative value.. It should be rather small.
Create the functions to power transform:
bct = function(y,l1,l2){((y+l2)^l1 -1)/l1}
bctinverse = function(y,l1,l2){(y*l1+1)^(1/l1) -l2}
Now we transform the response:
data$medv_trans = bct(data$medv,bcfit$lambda[1],bcfit$lambda[2])
And fit glmnet:
fit = glmnet(x=as.matrix(data[,1:13]),y=data$medv_trans,nlambda=500)
Get predictions over all lambdas, and you can see there's no negative predictions once you transform back:
pred = predict(fit,as.matrix(data[,1:13]))
range(bctinverse(pred,bcfit$lambda[1],bcfit$lambda[2]))
[1] 0.006690685 0.918473356
And let's say we do a fit with cv:
fit = cv.glmnet(x=as.matrix(data[,1:13]),y=data$medv_trans)
pred = predict(fit,as.matrix(data[,1:13]))
pred_transformed = bctinverse(pred,bcfit$lambda[1],bcfit$lambda[2]
plot(data$medv,pred_transformed,xlab="orig response",ylab="predictions")

Test of second differences for average marginal effects in logistic regression

I have a question similar to the one here: Testing the difference between marginal effects calculated across factors. I used the same code to generate average marginal effects for two groups. The difference is that I am running a logistic rather than linear regression model. My average marginal effects are on the probability scale, so emmeans will not provide the correct contrast. Does anyone have any suggestions for how to test whether there is a significant difference in the average marginal effects between group 1 and group 2?
Thank you so much,
Ilana
It is a bit unclear what the issue really is, but I'll try. I'm supposing your logistic regression model was fitted using, say, glm:
mod <- glm(cbind(heads, tails) ~ treat, data = mydata, family = binomial())
If you then do
emm <- emmeans(mod, "treat")
emm ### marginal means
pairs(emm) ### differences
Your results will be presented on the logit scale.
If you want them on the probability scale, you can do
summary(emm, type = "response")
summary(pairs(emm), type = "response")
However, the latter will back-transform the differences of logits, thereby producing odds ratios.
If you actually want differences of probabilities rather than ratios of odds, use regrid(), which will construct a new grid of values after back-transforming (and hence it will forget the log transformation):
pairs(regrid(emm))
It seems possible that two or more factors are present and you want contrasts of contrasts on the probability scale. In that case, extend this idea by calling regrid() on the table of EMMs to put everything on the probability scale, then follow the analogous procedure used in the linked article.

Scenario development with GAM models

I'm working with a mgcv::gam model in R to generate predictions in which the relationship between time (year) and the outcome variable (out) varies. For example, in one scenario, I'd like to force time to affect the outcome variable in a linear manner, in another a marginally decreasing manner, and in another, I'd like to specify specific slopes of the time-outcome interaction. I'm unsure how to force the prediction to treat the interaction between time and the outcome variable in a specific manner:
res <- gam(out ~ s(time) + s(GEOID, bs='re'), data = df, method = "REML")
pred <- predict(gam, newdata = ndf, type="response", se=T)
There isn't an interaction betweentime and out; here time has a potentially non-linear effect on out.
Are we talking about trying to force certain shapes for the function of time? If so, you will need to estimate different models; use time if you want a linear effect:
res_lin <- gam(out ~ time + s(GEOID, bs='re'), data = df, method = "REML")
and look at shape constrained p splines to enforce montonicity or concave/convex relationships.
The scam package has these sorts of constraints and uses mgcv with GCV smoothness selection to fit the shape constrained models.
As for specifying a specific slope for the linear effect of time, I think you'll need to include time as an offset in the model. So say the slope you want is 0.5 I think you need to do + offset(I(0.5*time)) because an offset has by definition a coefficient of 1. I would double check this though as I might have messed up my thinking here.

Contrast plot for GAM using mgcv

When using the visreg package to visualise a GAM with a contrast plot, the confidence interval goes to zero at the inflection point when the graph is U-shaped:
# Load libraries
library(mgcv)
library(visreg)
# Synthetic data
df <- data.frame(a = -10:10, b = jitter((-10:10)^2, amount = 10))
# Fit GAM
res <- gam(b ~ s(a), data = df)
# Make contrast figure
visreg(res, type = "contrast")
This seems dodgy and doesn't happen when making a conditional plot (i.e., visreg(res, type = "conditional")), so instead I'm looking at the mgcv package to make the same plot. I can make a conditional plot using mgcv (e.g., plot.gam(res)), but I don't see the option to make a contrast plot. Is this possible with the mgcv package?
This is due to identifiability constraints imposed on the spline basis/bases used in the model. This is a sum-to-zero constraint and effectively removes an intercept-like basis function from the basis used for each smooth term so that these are not confounded with the model intercept. This allows the model to be identifiable, rather than having an infinity of solutions.
Using standard theory, the confidence interval has to tend to zero where it crosses zero on the y-axis (the centred effect usually, but here as shown it is on on some transformed scale) as the constraint implies that at some point x, the effect is 0 and has 0 variance.
This is nonsense of course and recent research has investigated this problem. One solution provided by Simon Wood and colleagues employs extensions to Nychka's observation that, for the Gaussian case, the Bayesian credible interval for a smooth has good across-the-function interpretation as a frequentist confidence interval (so not pointwise, but not simultaneous either). Nychka's results (the coverage properties of the interval) fail in situations where the estimated smooth has squared bias that is not substantially less than the variance of the estimate; clearly this fails to be the case when the variance hits zero where the estimated smooth passes through zero effect as the bias is not actually quite zero at this point.
Marra and Wood (2012) have extended these results to the generalized model setting, basically estimating the confidence interval for one smooth by assuming that all the other terms in the model have had the identifiability constraints applied to them, but not the smooth of interest. This shifts the focus of inference from the smooth directly to the smooth + intercept. You can turn this on in plot.gam() with the argument seWithMean = TRUE.
I don't see an easy way to make visreg do this however, although it is trivial to get back the information you want via predict.gam() with the options type = 'iterms', se.fit = TRUE. This returns, on the scale of the linear predictor, the contributions of each model smooth term plus the standard error that include the correction implied by seWithMean. You can then fiddle with this to your heart's content; adding on the model constant term (the estimate for the intercept) for example should provide you something close to the figure you show in your question.

Scale back linear regression coefficients in R from scaled and centered data

I'm fitting a linear model using OLS and have scaled my regressors with the function scale in R because of the different units of measure between variables. Then, I fit the model using the lm command and get the coefficients of the fitted model. As far as I know the coefficients of the fitted model are not in the same units of the original regressors variables and therefore must be scaled back before they can be interpreted. I have been searching for a direct way to do it by couldn't find anything. Does anyone know how to do that?
Please have a look to the code, could you please help me implementing what you proposed?
library(zoo)
filename="DataReg4.csv"
filepath=paste("C:/Reg/",filename, sep="")
separator=";"
readfile=read.zoo(filepath, sep=separator, header=T, format = "%m/%d/%Y", dec=".")
readfile=as.data.frame(readfile)
str(readfile)
DF=readfile
DF=as.data.frame(scale(DF))
fm=lm(USD_EUR~diff_int+GDP_US+Net.exports.Eur,data=DF)
summary(fm)
plot(fm)
I'm sorry this is the data.
http://www.mediafire.com/?hmcp7urt0ag8187
If you used the scale function with default arguments then your regressors will be centered (subtracting their mean) and divided by their standard deviations. You can interpret the coefficients without transforming them back to the original units:
Holding everything else constant, on average, a one standard deviation change in one of the regressors is associated with a change in the dependent variable corresponding to the coefficient of that regressor.
If you have included an intercept term in your model keep in mind that the interpretation of the intercept will change. The estimated intercept now represents the average level of the dependent variable when all of the regressors are at their average levels. This is a result of subtracting the mean from each variable.
To interpret the coefficients in non-standard deviation terms, just calculate the standard deviation of each regressor and multiple that by the coefficient.
To de-scale or back-transform regression coefficients from a regression done with scaled predictor variable(s) and non-scaled response variable the intercept and slope should be calculated as:
A = As - Bs*Xmean/sdx
B = Bs/sdx
thus the regression is,
Y = As - Bs*Xmean/sdx + Bs/sdx * X
where
As = intercept from the scaled regression
Bs = slope from the scaled regression
Xmean = the mean of the scaled predictor variable
sdx = the standard deviation of the predictor variable
This can be adjusted if Y was also scaled but it appears you decided not to do that ultimately with your dataset.
If I understand your description (that is unfortunately at the moment code-free), you are getting standardized regression coefficients for Y ~ As + Bs*Xs where all those "s" items are scaled variables. The coefficients then are the predicted change on a std deviation scale of Y associated with a change in X of one standard deviation of X. The scale function would have recorded the means and standard deviations in attributes for hte scaled object. If not, then you will have those estimates somewhere in your console log. The estimated change in dY for a change dX in X should be: dY*(1/sdY) = Bs*dX*(1/sdX). Predictions should be something along these lines:
Yest = As*(sdX) + Xmn + Bs*(Xs)*(sdX)
You probably should not have needed to standardize the Y values, and I'm hoping that you didn't because it makes dealing with the adjustment for the means of the X's easier. Put some code and example data in if you want implemented and checked answers. I think #DanielGerlance is correct in saying to multiply rather than divide by the SD's.

Resources