Interpretation of contour plots (mgcv)

When we plot a GAM fitted with the mgcv package using isotropic smoothers, we get a contour plot that looks something like this:
x axis: one predictor,
y axis: another predictor,
main title: the function s(x1, x2) (an isotropic smoother).
Suppose that this model also contains several other isotropic smoothers, for example:
y ~ s(x1, x2) + s(x3, x4) + s(x5, x6)
My questions are: when interpreting the contour plot for s(x1, x2), what happens to the other isotropic smoothers? Are they "fixed at their medians"? Can we interpret an s(x1, x2) plot on its own?

Because this model is additive in the functions, you can interpret the functions (the separate s() terms) separately, but not necessarily as separate effects of covariates on the response. In your case no covariate appears in more than one of the bivariate smooths, so you can also interpret each smooth as the effect of its covariates on the response, separately from the other smoothers.
All of the smooth functions are typically subject to a sum-to-zero constraint so that the model constant term (the intercept) is an identifiable parameter. As a result, the 0 line in each plot corresponds to the value of the model constant term (on the scale of the link function, i.e. the linear predictor).
The plots produced by plot.gam(model) are partial effects plots (partial plots): each shows one term's contribution to the linear predictor. You can essentially ignore the other terms if you want to understand the effect of that term on the response as a function of that term's covariates.
If the model has other terms that share one or more covariates with the term of interest, and you want to see how the response changes as you vary that term or covariate, then you should predict from the model over the range of the variables you are interested in, while holding the other variables at representative values, say their means or medians.
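A minimal sketch of the sum-to-zero constraint on simulated data (the data frame dat and its columns are hypothetical, chosen to mirror the question's s(x1, x2) + s(x3, x4) structure). predict.gam() with type = "terms" returns each term's partial effect at the data points; each column should sum to (numerically) zero over the data, and the intercept is stored in the "constant" attribute, so the terms plus that constant reproduce the linear predictor.

```r
library(mgcv)

set.seed(1)
n <- 300
# Hypothetical data: two non-overlapping pairs of covariates
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n), x4 = runif(n))
dat$y <- 3 + sin(2 * pi * dat$x1) * dat$x2 + (dat$x3 - dat$x4)^2 +
  rnorm(n, sd = 0.2)

m <- gam(y ~ s(x1, x2) + s(x3, x4), data = dat, method = "REML")

# One column per smooth: the partial effects that plot.gam() draws
tt <- predict(m, type = "terms")

# Sum-to-zero constraint: each column sums to ~0 over the data
print(colSums(tt))

# The intercept, i.e. where the 0 line of each partial plot sits
print(attr(tt, "constant"))

# Partial effects + intercept reproduce the linear predictor
eta <- rowSums(tt) + attr(tt, "constant")
print(all.equal(as.numeric(eta), as.numeric(predict(m, type = "link"))))
```

Because every smooth is centred over the data, each contour plot shows deviations from the intercept, which is why the other smooths can be ignored when reading one of them.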
For example, if you had
model <- gam(y ~ s(x, z) + s(x, v), data = foo, method = 'REML')
and you want to know how the response varied as a function of x only, you would fix z and v at representative values and then predict over a range of values for x:
newdf <- with(foo, expand.grid(x = seq(min(x), max(x), length.out = 100),
                               z = median(z),
                               v = median(v)))
newdf <- cbind(newdf, fit = predict(model, newdata = newdf, type = 'response'))
plot(fit ~ x, data = newdf, type = 'l')
Also see ?vis.gam in the mgcv package, which prepares plots like this and does the hard work for you.

Related

Get real predicted values from GLM

I am running a GLM for a linear regression, then I use predict to get the response on my test data, but the problem is that I am getting probabilities and I don't know how to convert those probabilities to real values.
mod <- glm(formula = stock_out_duration ~ lag_2_market_unres_dos + lag_2_percentage_bias_forecast_error + forecast, data = train_data_final, family = inverse.gaussian(link = "log"), maxit = 100)
summary(mod)
pred <- predict(mod, test_data, type = 'response')
table_mat <- table(test_data$stock_out_duration)
table_mat
As far as I'm aware, there isn't a magic function that does this for you given that you're using glm. As you've noted, what typically gets returned is the probabilities. You can convert the probabilities into predictions for the outcome of the underlying categories by choosing the outcome with the largest probability. I agree a one-line function for this would be nice though.
You can get this functionality if you use the glmnet package.
library(glmnet)
y = ifelse(rnorm(100) > 0, "red", "blue")
y = factor(y)
x = matrix(rnorm(200), ncol = 2) # glmnet needs a predictor matrix with at least 2 columns
fit = glmnet(x, y, family="binomial") # use family="multinomial" if there are more than 2 categories in your factor
yhat = predict(fit, newx=x, type="class", s=0)
yhat in the above will be a vector containing either "red" or "blue".
Note that type="class" is what returns the category labels in yhat. The s=0 argument requests predictions at a lambda penalty of zero on the coefficients. You indicated in the question that you were just doing ordinary regression without any ridge or lasso style penalty, so s=0 requests that in your predictions.
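The conversion described above (choosing the more probable outcome) can also be done in base R without glmnet. A minimal sketch for a two-class glm on simulated data (x and y are hypothetical): predict() with type = "response" applies the inverse link to the linear predictor, and thresholding at 0.5 picks the more probable class. The same inverse-link rule means that for the inverse.gaussian(link = "log") model in the question, type = "response" already returns exp() of the linear predictor, i.e. values on the scale of stock_out_duration rather than probabilities.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
# Hypothetical two-class outcome; factor levels sort alphabetically: "blue", "red"
y <- factor(ifelse(x + rnorm(n) > 0, "red", "blue"))

fit <- glm(y ~ x, family = binomial)

# type = "response" applies the inverse link: here P(y == "red"),
# because glm() treats the first factor level ("blue") as failure
p_red <- predict(fit, type = "response")
print(all.equal(as.numeric(p_red), plogis(as.numeric(predict(fit, type = "link")))))

# Pick the more probable class (threshold 0.5 for two classes)
yhat <- factor(ifelse(p_red > 0.5, "red", "blue"), levels = levels(y))
print(table(observed = y, predicted = yhat))
```

With more than two categories the same idea applies, taking the category with the largest predicted probability for each row.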

r loess: coefficients of global "parametric" terms

Is there a way to extract the coefficients of globally fitted terms in local regression modelling?
Maybe I misunderstand the role of globally fitted terms in the function loess, but what I would like is the following:
# baseline:
x <- sin(seq(0.2,0.6,length.out=100)*pi)
# noise:
x_noise <- rnorm(length(x),0,0.1)
# known structure:
x_1 <- sin(seq(5,20,length.out=100))
# signal:
y <- x + x_1*0.25 + x_noise
# fit loess model:
x_seq <- seq_along(x)
mod <- loess(y ~ x_seq + x_1,parametric="x_1")
The fit works well; however, how can I extract the estimated value of the globally fitted term x_1 (i.e. a value near 0.25 in the example above)?
Finally, I found a solution to my problem using the function gam from the package gam:
require(gam)
mod2 <- gam(y ~ lo(x_seq,span=0.75,degree=2) + x_1)
However, the fits from the two models are not exactly the same (which might be due to different control settings?)...
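If a penalized regression spline is acceptable in place of loess, mgcv::gam can fit the same kind of model (a smooth trend in x_seq plus a global linear term in x_1) and reports the x_1 coefficient directly. This swaps in a different smoother, so the estimate will not match loess or gam::gam exactly. The choice k = 5 is an assumption, made so that the trend smooth cannot easily absorb the oscillation in x_1, which is itself a deterministic function of x_seq.

```r
library(mgcv)

set.seed(123)
# Same simulated signal as in the question
x <- sin(seq(0.2, 0.6, length.out = 100) * pi)
x_noise <- rnorm(length(x), 0, 0.1)
x_1 <- sin(seq(5, 20, length.out = 100))
y <- x + x_1 * 0.25 + x_noise
x_seq <- seq_along(x)

# Smooth trend in x_seq plus a global (parametric) linear term in x_1
mod3 <- gam(y ~ s(x_seq, k = 5) + x_1, method = "REML")

# The parametric coefficient is returned by coef() alongside the spline coefficients
b_x1 <- coef(mod3)["x_1"]
print(b_x1)

# Estimate, standard error and test for the parametric terms
print(summary(mod3)$p.table)
```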

R: varying-coefficient GAMM models in mgcv - extracting 'by' variable coefficients?

I am creating a varying-coefficient GAMM using mgcv in R with a continuous 'by' variable, via the by argument of s(). However, I am having difficulty locating the parameter estimate of the effect of the 'by' variable. In this example we estimate the spatially dependent effect of temperature t on sole eggs (i.e. how the linear effect of temperature on sole eggs changes across space):
require(mgcv)
require(gamair)
data(sole)
b = gam(eggs ~ s(la,lo) + s(la,lo, by = t), data = sole)
We can then plot the predicted effects of s(la,lo, by = t) against the predictor t:
pred <- predict(b, type = "terms", se.fit = TRUE)
by.variable.prediction <- pred$fit[, 2]
plot(x= sole$t, y = by.variable.prediction)
However, I can't find a listing/function with the parameter estimates of the 'by' variable t for each sampling location. summary(), coef(), and predict() do not give you the parameter estimates.
Any help would be appreciated!
With a numeric by variable, the term s(la,lo, by = t) is a smooth function of latitude and longitude multiplied by t, so the coefficient of t at a given location is the value of that term when t is equal to 1. One way to get the coefficient/parameter estimate for t at each latitude and longitude is therefore to construct your own data frame with a range of latitude/longitude combinations and t = 1, and run predict.gam on that (rather than on the data used to fit the model, as you have done), keeping only the column for the by term. So:
preddf <- expand.grid(list(la=seq(min(sole$la), max(sole$la), length.out=100),
lo=seq(min(sole$lo), max(sole$lo), length.out=100),
t=1))
preddf$parameter <- predict(b, preddf, type = "terms")[, "s(la,lo):t"] # by term only; type = "response" would also add the intercept and s(la,lo)
And then if you want to visualize this coefficient over space, you could graph it with ggplot2.
library(ggplot2)
ggplot(preddf) +
geom_tile(aes(x=lo, y=la, fill=parameter))
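To check that the t = 1 approach recovers the varying coefficient, here is a minimal sketch on simulated data rather than the sole data (the covariates x and z and the functions f0 and f1 are hypothetical). The true coefficient of z is f1(x), and the extracted by-term column should track it closely. As documented in ?s, a smooth with a numeric by variable is not usually subject to a centring constraint, so its value at z = 1 is the coefficient itself rather than a deviation from a mean.

```r
library(mgcv)

set.seed(42)
n <- 400
x <- runif(n)
z <- runif(n, 0, 2)                 # hypothetical continuous 'by' covariate
f0 <- function(x) 2 * x^2           # baseline smooth
f1 <- function(x) sin(2 * pi * x)   # true varying coefficient of z
y <- f0(x) + f1(x) * z + rnorm(n, sd = 0.2)

m <- gam(y ~ s(x) + s(x, by = z), method = "REML")

# Evaluate only the by term at z = 1 over a grid of x
grid <- data.frame(x = seq(0, 1, length.out = 50), z = 1)
coef_z <- predict(m, newdata = grid, type = "terms")[, "s(x):z"]

print(head(cbind(grid, estimate = coef_z, truth = f1(grid$x))))
```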

`rms::ols()`: how to fit a model without intercept

I'd like to use the ols() (ordinary least squares) function from the rms package to fit a multiple linear regression without an intercept. Using lm() the syntax would be:
model <- lm(formula = z ~ 0 + x + y, data = myData)
where the 0 stops it from estimating an intercept, so only two coefficients are returned, one for x and one for y. How do I do this when using ols()?
Trying
model <- ols(formula = z ~ 0 + x + y, data = myData)
did not work, it still returns an intercept and a coefficient each for x and y.
Here is a link to a csv file
It has five columns; this example only uses the first three:
model <- ols(formula = CorrEn ~ intEn_anti_ncp + intEn_par_ncp, data = ccd)
Thanks!
rms::ols uses rms:::Design instead of model.frame.default. Design is called with the default of intercept = 1, so there is no (obvious) way to specify that there is no intercept. I assume there is a good reason for this, but you can try changing ols using trace.

constrain predictions within outer knots in mgcv::gam using R

I would like to fit a cubic spline using the gam function in the mgcv package in R. Furthermore, I would like to constrain predictions outside the training data (beyond the outer knots) to equal the value at the nearest knot. That is, no model prediction should be extrapolated outside the range of the training data. I know I can do this simply by excluding those points from the predict call and then setting them to the values at the min and max of the training data. However, is there a built-in method in gam to do this (just so it's a bit cleaner)?
Example code:
require(mgcv)
x = 10:90
y = x^2
mdl = gam(y ~ s(x, bs="cr"))
needed_x = 1:100
p = predict(mdl, newdata = list(x = needed_x)) # this also returns extrapolated values for x in 1:9 and 91:100
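The manual workaround described in the question can at least be kept short: clamp the new x values to the training range with pmin()/pmax() before calling predict(), so every prediction outside the data equals the prediction at the nearest end of the range. This is a sketch of that workaround, not a built-in gam option. It also assumes that with bs = "cr" and default knot placement the outer knots sit at the extremes of the data, so that "nearest end of the range" matches "nearest knot"; that is worth checking for your own fit.

```r
library(mgcv)

x <- 10:90
y <- x^2
mdl <- gam(y ~ s(x, bs = "cr"))

needed_x <- 1:100
# Clamp to the training range so nothing is extrapolated
x_clamped <- pmin(pmax(needed_x, min(x)), max(x))
p <- as.numeric(predict(mdl, newdata = data.frame(x = x_clamped)))

# Values below 10 repeat the prediction at x = 10; values above 90 repeat x = 90
print(head(p, 12))
```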
