When does brms assume parameters are distributed multivariate normal? - r

Suppose I fit a varying-intercepts model in brms per below.
library(brms)
# Example data in "long" format:
# 50 subjects each completing 2 trials. Outcomes were recorded 2x per subject
# per trial: once when treat = 1 and once when treat = 0.
trials <- 2
subjects <- 50
N <- trials * subjects
df <- data.frame(
outcome = rnorm(N),
treated = rep(0:1, N),
subject = rep(1:subjects, each = trials)
)
# Fit varying slopes and varying intercepts model
mod <- brm(outcome ~ treated + (1 | subject), df,
iter = 500, warmup = 490, chains = 1)
If we look at the brms documentation, it notes:
(...) group-level parameters u are assumed to come from a multivariate
normal distribution with mean zero and unknown covariance matrix Σ:
However, there is no such covariance matrix for the intercepts:
# empty
mod$cov_ranef
What is assumed to be multivariate normal by brms - i.e. it only when coefficients are hierarchically modeled as varying across groups that there is a covariance matrix? What if there's a predictor that is included that only varies by group but is not modelled hierarchically?

Related

Find the parameter estimates for each random term in a binomial GLMM (lme4)?

Does anyone know how to extract the parameter estimates of random term when using the (1 | …) syntax in a glmer model (including se, t ratio and p value)? I’m only able to access the average variance and std. deviance with the summary function.
Some background: I used cohort and period random terms (both factorized), where period = each survey year, and cohort = 8 birth cohorts. My model empty model looks like this :
glmer(pid ~ age + age2 + (1 | cohort) + (1| period)
There's a bit of a conceptual problem with what you are doing. The random effects do not have the same standing in statistical theory as the fixed effects. You are not really supposed to be making inferences on their estimates since you don't have a random sampling from their overall population. Hence you need to make some unteseted assumptions on their distribution. That said, there are apparently times when you might want to do it but with care that you are not making unsupportable claims. See: https://stats.stackexchange.com/questions/392314/interpretation-of-fixed-effect-coefficients-from-glms-and-glmms .
Dimitris Rizopoulosthen responded to a request for the possibility of getting "an average" of the random effects conditional on the fixed effects (rather the flipped version of mixed models inference). He offered a function in his GLMM package:
https://drizopoulos.github.io/GLMMadaptive/articles/Methods_MixMod.html#marginalized-coefficients
This is his example ......
install.packages("GLMMadaptive"); library(GLMMadaptive)
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we constuct a data frame with the design:
# everyone has a baseline measurment, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
#We continue by fitting the mixed effects logistic regression for y assuming random intercepts and random slopes for the random-effects part.
fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
family = binomial())
.... and then the call to his marginal_coefs function.
marginal_coefs(fm, std_errors=TRUE)
Estimate Std.Err z-value p-value
(Intercept) -1.6025 0.2906 -5.5154 < 1e-04
sexfemale -1.0975 0.3676 -2.9859 0.0028277
time 0.1766 0.0337 5.2346 < 1e-04
sexfemale:time 0.0508 0.0366 1.3864 0.1656167

Obtain predicted probabilities from rstanarm in ordinal regression

How can I generate the posterior probability distribution for each outcome for each predictor in an ordinal regression?
e.g.
what I am looking for is this:
library(rstanarm)
fit_f <- MASS::polr(tobgp ~ agegp, data = esoph)
predict(fit_f,newdata=data.frame(agegp=factor(levels(esoph$agegp))),type = "probs")
Now with rstanarm I do:
fit <- stan_polr(tobgp ~ agegp, data = esoph, method = "logit",
prior = R2(0.2, "mean"), init_r = 0.1, seed = 12345)
But how do I obtain the distribution for the individual outcomes/predictors?
I do get a distribution of probabilities using epred, but I don't understand for which outcome/predictor?
posterior_epred(fit, newdata=data.frame(agegp=factor(levels(esoph$agegp))))
The easiest way to do this in rstanarm is to use the posterior_predict function to obtain posterior predictions and then calculate the proportion of predictions that fall in each outcome category by observation. In code,
PPD <- posterior_predict(fit) # uses esoph
probs <- t(apply(PPD, MARGIN = 2, FUN = table) / nrow(PPD))
The matrix called probs has rows equal to the number of observations (in esoph) and columns equal to the number of categories in tobgp and each of its rows sums to 1.
head(probs)
0-9g/day 10-19 20-29 30+
1 0.26400 0.26250 0.22875 0.24475
2 0.25650 0.26750 0.23050 0.24550
3 0.25175 0.27975 0.22450 0.24400
4 0.25575 0.26000 0.24025 0.24400
5 0.26350 0.26625 0.23575 0.23450
6 0.28275 0.26025 0.21500 0.24200

Identifying lead/lags using multivariate regression analysis

I have three time-series variables (x,y,z) measured in 3 replicates. x and z are the independent variables. y is the dependent variable. t is the time variable. All the three variables follow diel variation, they increase during the day and decrease during the night. An example with a simulated dataset is below.
library(nlme)
library(tidyverse)
n <- 100
t <- seq(0,4*pi,,100)
a <- 3
b <- 2
c.unif <- runif(n)
amp <- 2
datalist = list()
for(i in 1:3){
y <- 3*sin(b*t)+rnorm(n)*2
x <- 2*sin(b*t+2.5)+rnorm(n)*2
z <- 4*sin(b*t-2.5)+rnorm(n)*2
data = as_tibble(cbind(y,x,z))%>%mutate(t = 1:100)%>% mutate(replicate = i)
datalist[[i]] <- data
}
df <- do.call(rbind,datalist)
ggplot(df)+
geom_line(aes(t,x),color='red')+geom_line(aes(t,y),color='blue')+
geom_line(aes(t,z),color = 'green')+facet_wrap(~replicate, nrow = 1)+theme_bw()
I can identify the lead/lag of y with respect to x and z individually. This can be done with ccf() function in r. For example
ccf(x,y)
ccf(z,y)
But I would like to do it in a multivariate regression approach. For example, nlme package and lme function indicates y and z are negatively affecting x
lme = lme(data = df, y~ x+ z , random=~1|replicate, correlation = corCAR1( form = ~ t| replicate))
It is impossible (in actual data) that x and z can negatively affect y.
I need the time-lead/lag and also I would like to get the standardized coefficient (t-value to compare the effect size), both from the same model.
Is there any multivariate model available that can give me the lead/lag and also give me regression coefficient?
We might be considering the " statistical significance of Cramer Rao estimation of a lower bound". In order to find Xbeta-Xinfinity, taking the expectation of Xbeta and an assumed mean neu; will yield a variable, neu^squared which can replace Xinfinity. Using the F test-likelihood ratio, the degrees of freedom is p2-p1 = n-p2.
Put it this way, the estimates are n=(-2neu^squared/neu^squared+n), phi t = y/Xbeta and Xbeta= (y-betazero)/a.
The point estimate is derived from y=aXbeta + b: , Xbeta. The time lead lag is phi t and the standardized coefficient is n. The regression generates the lower bound Xbeta, where t=beta.
Spectral analysis of the linear distribution indicates a point estimate beta zero = 0.27 which is a significant peak of
variability. Scaling Xbeta by Betazero would be an appropriate idea.

bootMer output of lme4 linear mixed model: correct structure? at what level am I bootstrapping?

I have a linear mixed model that describes the relation between a continuous outcome and the interaction between a binary grouping variable (subjects are 1 out of 2 genotypes) and a time variable (5 consecutive days). Every subject has repeated measures within a day. I have 47 subjects.
Some simulated data:
Genotype <- as.factor(append(rep("X", 230), rep("Y", 240)))
Day <- as.factor(rep(c(1:5), 94))
Subject <- as.factor(rep(c(1:47), times=1, each=10))
jitter <- rep(jitter(0.01*c(1:47)), times=1, each=10)
Outcome <- append(rnorm(230, mean=80, sd=16), rnorm(240, mean=85, sd=18)) + jitter
Data <- data.frame(Genotype, Day, Subject, Outcome)
The model is coded as:
Lmer <- lmer(Outcome ~ Genotype*Day + (1|Subject),
data = Data)
I want to perform bootstrapping. If I calculate bootstrapping results for a specific genotype on Day 1, I use this code:
new_dat <- subset(Lmer, Genotype=="X" & Day=="1")
b <- bootMer(Lmer, nsim=200,
FUN=function(x)predict(x, newdata=new_dat, re.form=NA))
b$t now gives me a matrix of 46 columns (=N-1) with each 200 identical observations.
My goal is to bootstrap on the subject level, should I then not have only 200 bootstrapping values (thus not 46*200 values)? Am I overlooking something in my code?

Implement null distribution for gbm interaction strength

I am trying to determine which interactions in a gbm model are significant using the method described in Friedman and Popescu 2008 https://projecteuclid.org/euclid.aoas/1223908046. My gbm is a classification model with 9 different classes.
I'm struggling with how to translate Section 8.3 into code to run in R.
I think the overall process is to:
Train a version of the model with max.depth = 1
Simulate response data from this model
Train a new model on this data with max.depth the same as the real model
Get interaction strength for this model
Repeat steps 1-4 to create a null distribution of interaction strengths
The part that I am finding most confusing is implementing equations 48 and 49. (You will have to look at the linked article since I can't reproduce them here)
This is what I think I understand but please correct me if I'm wrong:
y_i is a new vector of the response that we will use to train a new model which will provide the null distribution of interaction statistics.
F_A(x_i) is the prediction from a version of the gbm model trained with max.depth = 1
b_i is a probability between 0 and 1 based on the prediction from the additive model F_A(x_i)
Questions
What is subscript i? Is it the number of iterations in the bootstrap?
How is each artificial data set different from the others?
Are we subbing the Pr(b_i = 1) into equation 48?
How can this be done with multinomial classification?
How would one implement this in R? Preferably using the gbm package.
Any ideas or references are welcome!
Overall, the process is an elegant way to neutralise the interaction effects in y by permuting/re-distributing the extra contribution of modelling on the interactions. The extra contribution could be captured by the margins between a full and an additive model.
What is subscript i? Is it the number of iterations in the bootstrap?
It is the index of samples. There are N samples in each iteration.
How is each artificial data set different from the others?
The predictors X are the same across data sets. The response values Y~ are different due to random permutation of margins in equation 47 and random realisation (for categorical outcomes only) in equation 48.
Are we subbing the Pr(b_i = 1) into equation 48?
Yes, if the outcome Y is binary.
How can this be done with multinomial classification?
One way is to randomly permute margins in the log-odds of each category. Followed by random realisation according to the probability from the additive model.
How would one implement this in R? Preferably using the gbm package.
I tried to implement it in-line with your overall process.
Firstly, a simulated training data set {X1,X2,Y} of size N=200 where Y has three categories (Y1,Y2,Y3) realised by the probabilities determined by X1, X2. The interaction part X1*X2 is in Y1, while the additive parts are in Y2,Y3.
set.seed(1)
N <- 200
X1 <- rnorm(N) # 2 predictors
X2 <- rnorm(N)
#log-odds for 3 categories
Y1 <- 2*X1*X2 + rnorm(N, sd=1/10) # interaction
Y2 <- X1^2 + rnorm(N, sd=1/10) #additive
Y3 <- X2^2 + rnorm(N, sd=1/10) #additive
Y <- rep(NA, N) # Multinomial outcome with 3 categories
for (i in 1:N)
{
prob <- 1 / (1 + exp(-c(Y1[i],Y2[i],Y3[i]))) #logistic regression
Y[i] <- which.max(rmultinom(1, 10000, prob=prob)) #realisation from prob
}
Y <- factor(Y)
levels(Y) <- c('Y1','Y2','Y3')
table(Y)
#Y1 Y2 Y3
#38 75 87
dat = data.frame(Y, X1, X2)
head(dat)
# Y X1 X2
# 2 -0.6264538 0.4094018
# 3 0.1836433 1.6888733
# 3 -0.8356286 1.5865884
# 2 1.5952808 -0.3309078
# 3 0.3295078 -2.2852355
# 3 -0.8204684 2.4976616
Train a full and an additive models with max.depth = 2 and 1 respectively.
library(gbm)
n.trees <- 100
F_full <- gbm(Y ~ ., data=dat, distribution='multinomial', n.trees=n.trees, cv.folds=3,
interaction.depth=2) # consider interactions
F_additive <- gbm(Y ~ ., data=dat, distribution='multinomial', n.trees=n.trees, cv.folds=3,
interaction.depth=1) # ignore interactions
#use improved prediction as interaction strength
interaction_strength_original <- min(F_additive$cv.error) - min(F_full$cv.error)
> 0.1937891
Simulate response data from this model.
#randomly permute margins (residuals) of log-odds to remove any interaction effects
margin <- predict(F_full, n.trees=gbm.perf(F_full, plot.it=FALSE), type='link')[,,1] -
predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link')[,,1]
margin <- apply(margin, 2, sample) #independent permutation for each category (Y1, Y2, Y3)
Y_art <- rep(NA, N) #response values of an artificial dataset
for (i in 1:N)
{
prob <- predict(F_additive, n.trees=gbm.perf(F_additive, plot.it=FALSE), type='link',
newdata=dat[i,])
prob <- prob + margin[i,] # equation (47)
prob <- 1 / (1 + exp(-prob))
Y_art[i] <- which.max(rmultinom(1, 1000, prob=prob)) #Similar to random realisation in equation (49)
}
Y_art <- factor(Y_art)
levels(Y_art) = c('Y1','Y2','Y3')
table(Y_art)
#Y1 Y2 Y3
#21 88 91
Train a new model on this artificial data with max.depth (2) the same as the real model
F_full_art = gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
data=data.frame(Y_art, X1, X2),
interaction.depth=2)
F_additive_art = gbm(Y_art ~ ., distribution='multinomial', n.trees=n.trees, cv.folds=3,
data=data.frame(Y_art, X1, X2),
interaction.depth=1)
Get interaction strength for this model
interaction_strength_art = min(F_additive_art$cv.error) - min(F_full_art$cv.error)
> 0.01323959 # much smaller than interaction_strength_original in step 1.
Repeat steps 2-4 to create a null distribution of interaction strengths. As expected, the interaction effects are much lower (-0.0527 to 0.0421) in the neutralised data sets than in the original training data set (0.1938).
interaction_strength_art <- NULL
for (j in 1:10)
{
print(j)
interaction_strength_art <- c(interaction_strength_art, step_2_to_4())
}
summary(interaction_strength_art)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#-0.052648 -0.019415 0.001124 -0.004310 0.012759 0.042058
interaction_strength_original
> 0.1937891

Resources