Population-level prediction from bam {mgcv} - r

Using bam, I made a logistic mixed model with the following form:
PresAbs ~ s(Var 1) + s(Var 2) + ... + s(Var n) + s(RandomVar, bs = "re")
The RandomVar is a factor and I am not interested in the predictions for each of its level. How can I obtain population-level prediction, comparable to predict.lme?

One way is just exclude the random effect spline from the predictions.
Using the example from ?gam.models
library("mgcv")
dat <- gamSim(1,n=400,scale=2) ## simulate 4 term additive truth
## Now add some random effects to the simulation. Response is
## grouped into one of 20 groups by `fac' and each groups has a
## random effect added....
fac <- as.factor(sample(1:20,400,replace=TRUE))
dat$X <- model.matrix(~fac-1)
b <- rnorm(20)*.5
dat$y <- dat$y + dat$X%*%b
m1 <- gam(y ~ s(fac,bs="re")+s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="ML")
we want to exclude the term s(fac) as it is written in the output from
summary(m1)
For the observed data, population effects are
predict(m1, exclude = 's(fac)')
but you can supply newdata to generate predictions for other combinations of the covariates.

Related

test significance between models with emmeans

Let's say I have these two models
dat1 <- data.frame(x=factor(c(1,2,1,1,2,2)),y=c(2,5,2,1,7,9))
dat2 <- data.frame(x=factor(c(1,2,1,1,2,2)),y=c(3,3,4,3,4,2))
mod1 <- lm(y~x,data=dat1)
mod2 <- lm(y~x, data=dat2)
and calculate a t test between the levels of x in each model
t1 <- pairs(emmeans(mod1, ~x))
t2 <- pairs(emmeans(mod2, ~x))
How can I assess whether the two models are significantly different for this contrast using emmeans?
dat1$dataset <- "dat1"
dat2$dataset <- "dat2"
alldat <- rbind(dat1, dat2)
modsame <- lm(y ~ x, data = alldat)
moddiff <- lm(y ~ x * dataset, data = alldat)
anova(modsame, moddiff)
Don't try to use emmeans() to do this; that isn't its purpose. The anova() call above compares the two models: modsame presumes that the x effects are the same in each dataset; moddiff adds two terms, dataset which accounts for the change in overall mean, and x:dataset which accounts for the change in x effects.
The comparison between the two models comprises a joint test of both the dataset and the x:dataset effects -- it is an F test with 2 numerator d.f. -- not a t test.

Implement null distribution for gbm interaction strength

I am trying to determine which interactions in a gbm model are significant using the method described in Friedman and Popescu 2008 https://projecteuclid.org/euclid.aoas/1223908046. My gbm is a classification model with 9 different classes.
I'm struggling with how to translate Section 8.3 into code to run in R.
I think the overall process is to:
Train a version of the model with max.depth = 1
Simulate response data from this model
Train a new model on this data with max.depth the same as the real model
Get interaction strength for this model
Repeat steps 1-4 to create a null distribution of interaction strengths
The part that I am finding most confusing is implementing equations 48 and 49. (You will have to look at the linked article since I can't reproduce them here)
This is what I think I understand but please correct me if I'm wrong:
y_i is a new vector of the response that we will use to train a new model which will provide the null distribution of interaction statistics.
F_A(x_i) is the prediction from a version of the gbm model trained with max.depth = 1
b_i is a probability between 0 and 1 based on the prediction from the additive model F_A(x_i)
Questions
What is subscript i? Is it the number of iterations in the bootstrap?
How is each artificial data set different from the others?
Are we subbing the Pr(b_i = 1) into equation 48?
How can this be done with multinomial classification?
How would one implement this in R? Preferably using the gbm package.
Any ideas or references are welcome!
Overall, the process is an elegant way to neutralise the interaction effects in y by permuting/re-distributing the extra contribution of modelling on the interactions. The extra contribution could be captured by the margins between a full and an additive model.
What is subscript i? Is it the number of iterations in the bootstrap?
It is the index of samples. There are N samples in each iteration.
How is each artificial data set different from the others?
The predictors X are the same across data sets. The response values Y~ are different due to random permutation of margins in equation 47 and random realisation (for categorical outcomes only) in equation 48.
Are we subbing the Pr(b_i = 1) into equation 48?
Yes, if the outcome Y is binary.
How can this be done with multinomial classification?
One way is to randomly permute margins in the log-odds of each category. Followed by random realisation according to the probability from the additive model.
How would one implement this in R? Preferably using the gbm package.
I tried to implement it in-line with your overall process.
Firstly, a simulated training data set {X1,X2,Y} of size N=200 where Y has three categories (Y1,Y2,Y3) realised by the probabilities determined by X1, X2. The interaction part X1*X2 is in Y1, while the additive parts are in Y2,Y3.
set.seed(1)
N <- 200
X1 <- rnorm(N) # 2 predictors
X2 <- rnorm(N)
#log-odds for 3 categories
Y1 <- 2*X1*X2 + rnorm(N, sd=1/10) # interaction
Y2 <- X1^2 + rnorm(N, sd=1/10) #additive
Y3 <- X2^2 + rnorm(N, sd=1/10) #additive
Y <- rep(NA, N) # Multinomial outcome with 3 categories
for (i in 1:N)
{
prob <- 1 / (1 + exp(-c(Y1[i],Y2[i],Y3[i]))) #logistic regression
Y[i] <- which.max(rmultinom(1, 10000, prob=prob)) #realisation from prob
}
Y <- factor(Y)
levels(Y) <- c('Y1','Y2','Y3')
table(Y)
#Y1 Y2 Y3
#38 75 87
dat = data.frame(Y, X1, X2)
head(dat)
# Y X1 X2
# 2 -0.6264538 0.4094018
# 3 0.1836433 1.6888733
# 3 -0.8356286 1.5865884
# 2 1.5952808 -0.3309078
# 3 0.3295078 -2.2852355
# 3 -0.8204684 2.4976616
Train a full and an additive models with max.depth = 2 and 1 respectively.
library(gbm)
n.trees <- 100
F_full <- gbm(Y ~ ., data=dat, distribution='multinomial', n.trees=n.trees, cv.folds=3,
interaction.depth=2) # consider interactions
F_additive <- gbm(Y ~ ., data=dat, distribution='multinomial', n.trees=n.trees, cv.folds=3,
interaction.depth=1) # ignore interactions
#use improved prediction as interaction strength
interaction_strength_original <- min(F_additive$cv.error) - min(F_full$cv.error)
> 0.1937891
Simulate response data from this model.
#randomly permute margins (residuals) of log-odds to remove any interaction effects
margin <- predict(F_full, n.trees=gbm.perf(F_full, plot.it=FALSE), type='link')[,,1] -
predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link')[,,1]
margin <- apply(margin, 2, sample) #independent permutation for each category (Y1, Y2, Y3)
Y_art <- rep(NA, N) #response values of an artificial dataset
for (i in 1:N)
{
prob <- predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link',
newdata=dat[i,])
prob <- prob + margin[i,] # equation (47)
prob <- 1 / (1 + exp(-prob))
Y_art[i] <- which.max(rmultinom(1, 1000, prob=prob)) #Similar to random realisation in equation (49)
}
Y_art <- factor(Y_art)
levels(Y_art) = c('Y1','Y2','Y3')
table(Y_art)
#Y1 Y2 Y3
#21 88 91
Train a new model on this artificial data with max.depth (2) the same as the real model
F_full_art = gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
data=data.frame(Y_art, X1, X2),
interaction.depth=2)
F_additive_art = gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
data=data.frame(Y_art, X1, X2),
interaction.depth=1)
Get interaction strength for this model
interaction_strength_art = min(F_additive_art$cv.error) - min(F_full_art$cv.error)
> 0.01323959 # much smaller than interaction_strength_original in step 1.
Repeat steps 2-4 to create a null distribution of interaction strengths. As expected, the interaction effects are much lower (-0.0527 to 0.0421) in the neutralised data sets than in the original training data set (0.1938).
interaction_strength_art <- NULL
for (j in 1:10)
{
print(j)
interaction_strength_art <- c(interaction_strength_art, step_2_to_4())
}
summary(interaction_strength_art)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-0.052648 -0.019415 0.001124 -0.004310 0.012759 0.042058
interaction_strength_original
> 0.1937891

Individual terms in prediction of linear regression

I performed a regression analyses in R on some dataset and try to predict the contribution of each individual independent variable on the dependent variable for each row in the dataset.
So something like this:
set.seed(123)
y <- rnorm(10)
m <- data.frame(v1=rnorm(10), v2=rnorm(10), v3=rnorm(10))
regr <- lm(formula=y~v1+v2+v3, data=m)
summary(regr)
terms <- predict.lm(regr,m, type="terms")
In short: run a regression and use the predict function to calculate the terms of v1,v2 and v3 in dataset m. But I am having a hard time understanding what the predict function is calculating. I would expect it multiplies the coefficient of the regression result with the variable data. So something like this for v1:
coefficients(regr)[2]*m$v1
But that gives different results compared to the predict function.
Own calculation:
0.55293884 0.16253411 0.18103537 0.04999729 -0.25108302 0.80717945 0.22488764 -0.88835486 0.31681455 -0.21356803
And predict function calculation:
0.45870070 0.06829597 0.08679724 -0.04424084 -0.34532115 0.71294132 0.13064950 -0.98259299 0.22257641 -0.30780616
The prediciton function is of by 0.1 or so Also if you add all terms in the prediction function together with the constant it doesn’t add up to the total prediction (using type=”response”). What does the prediction function calculate here and how can I tell it to calculate what I did with coefficients(regr)[2]*m$v1?
All the following lines result in the same predictions:
# our computed predictions
coefficients(regr)[1] + coefficients(regr)[2]*m$v1 +
coefficients(regr)[3]*m$v2 + coefficients(regr)[4]*m$v3
# prediction using predict function
predict.lm(regr,m)
# prediction using terms matrix, note that we have to add the constant.
terms_predict = predict.lm(regr,m, type="terms")
terms_predict[,1]+terms_predict[,2]+terms_predict[,3]+attr(terms_predict,'constant')
You can read more about using type="terms" here.
The reason that your own calculation (coefficients(regr)[2]*m$v1) and the predict function calculation (terms_predict[,1]) are different is because the columns in the terms matrix are centered around the mean, so their mean becomes zero:
# this is equal to terms_predict[,1]
coefficients(regr)[2]*m$v1-mean(coefficients(regr)[2]*m$v1)
# indeed, all columns are centered; i.e. have a mean of 0.
round(sapply(as.data.frame(terms_predict),mean),10)
Hope this helps.
The function predict(...,type="terms") centers each variable by its mean. As a result, the output is a little difficult to interpret. Here's an alternative where each variable (constant, x1, and x2) is multiplied to its coefficient.
TLDR: pred_terms <- model.matrix(formula(mod$terms), testData) %*% diag(coef(mod))
library(tidyverse)
### simulate data
set.seed(123)
nobs <- 50
x1 <- cumsum(rnorm(nobs) + 3)
x2 <- cumsum(rnorm(nobs) * 3)
y <- 2 + 2*x1 -0.5*x2 + rnorm(nobs,0,50)
df <- data.frame(t=1:nobs, y=y, x1=x1, x2=x2)
train <- 1:round(0.7*nobs,0)
rm(x1, x2, y)
trainData <- df[train,]
testData <- df[-train,]
### linear model
mod <- lm(y ~ x1 + x2 , data=trainData)
summary(mod)
### predict test set
test_preds <- predict(mod, newdata=testData)
head(test_preds)
### contribution by predictor
test_contribution <- model.matrix(formula(mod$terms), testData) %*% diag(coef(mod))
colnames(test_contribution) <- names(coef(mod))
head(test_contribution)
all(round(apply(test_contribution, 1, sum),5) == round(test_preds,5)) ## should be true
### Visualize each contribution
test_contribution_df <- as.data.frame(test_contribution)
test_contribution_df$pred <- test_preds
test_contribution_df$t <- row.names(test_contribution_df)
test_contribution_df$actual <- df[-train,"y"]
test_contribution_df_long <- pivot_longer(test_contribution_df, -t, names_to="variable")
names(test_contribution_df_long)
ggplot(test_contribution_df_long, aes(x=t, y=value, group=variable, color=variable)) +
geom_line() +
theme_bw()

Bayesian Weighted Proportional-Odds Cumulative Logit model in JAGS

I am creating multilevel/mixed ordinal models in JAGS code, which are run through the “runjags” package interface in [R].
The dataset that I am analysing also includes a set of survey weights. These are important to include, in order to make the data representative of the population surveyed.
For a glm regression this can be achieved with the following code, which seeks to reduce the variance of extreme responses, bringing values closer to the expected distribution ("bias-variance tradeoff"):
y[i] ~ dnorm(fitted[i], (precision/weights[i]))
I would like to know the most appropriate way to include survey weights in a Proportional-Odds Cumulative Logit model.
Please see the JAGS code beneath for an example of the unweighted models that I have constructed so far. A working application for the displayed code and/or reference to a relevant citation would be most appreciated.
polr_model <- model{
## N = Number of observations
## ConditionCat = y-varaiable measured as 5-point Likert scale
## Categories = Number of y-variable categories (5)
## variable1 = numeric x-variable
## variable2 = numeric x-variable
## Geography = level-2 hierachical struture, in which observations are nested
## weights = survey weights (NOT CURRENTLY USED)
for(i in 1:N){
## Calculate categorical likelihood of the outcome:
ConditionCat[i] ~ dcat(prob[i,1:Categories])
## The odds is fixed between observation categories
## This is the same as any LR except that the intercept isnt included here:
odds[i] <- b1 * variable1[i] + b2 * variable2[i] +
randomeffect[Geography[i]]
## Each observation category is associated with a cumualtive probability called theta:
for(k in 1:(Categories-1)){
## Each observation category has its own intercept - note the minus sign here NOT plus:
logit(theta[i,k]) <- intercept[k] - odds[i]
}
## With the cumulative probability of the highest category equal to 1:
theta[i, Categories] <- 1
## Then calculate the non-cumulative probabilities from theta:
prob[i,1] <- theta[i,1]
for(k in 2:(Categories-1)){
prob[i,k] <- theta[i,k] - theta[i,(k-1)]
}
prob[i, Categories] <- 1 - theta[i,(Categories-1)]
}
## These lines give the prior distributions for the numeric fixed parameters to be estimated:
b1 ~ dnorm(0, 10^-6)
b2 ~ dnorm(0, 10^-6)
## Calculate the Categories-1 independent intercepts::
for(k in 1:(Categories-1)){
intercept[k] ~ dnorm(0, 10^-6)
}
## These lines give the prior distributions for the 'random' (hierachical/nested) parameters to be estimated:
for(iterator in 1:Nest1){
randomeffect[iterator] ~ dnorm(0, Geography_precision)}
Geography_precision ~ dgamma(0.001, 0.001)
#data# N, Nest1, Categories, ConditionCat, variable1, variable2, Geography
#monitor# intercept, b1, b2, randomeffect
#inits# Geography_precision, intercept, b1, b2
#modules# glm on, lecuyer on
}

r loess: coefficients of global "parametric" terms

Is there a way how I can extract coefficients of globally fitted terms in local regression modeling?
Maybe I do misunderstand the role of globally fitted terms in the function loess, but what I would like to have is the following:
# baseline:
x <- sin(seq(0.2,0.6,length.out=100)*pi)
# noise:
x_noise <- rnorm(length(x),0,0.1)
# known structure:
x_1 <- sin(seq(5,20,length.out=100))
# signal:
y <- x + x_1*0.25 + x_noise
# fit loess model:
x_seq <- seq_along(x)
mod <- loess(y ~ x_seq + x_1,parametric="x_1")
The fit is done perfectly, however, how can I extract the estimated value of the globally fitted term x_1 (i.e. some value near 0.25 for the example above)?
Finally, I found a solution to my problem using the function gam from the package gam:
require(gam)
mod2 <- gam(y ~ lo(x_seq,span=0.75,degree=2) + x_1)
However, the fits from the two models are not exactly the same (which might be due to different control settings?)...

Resources