Modeling a random-effects component using the lme4 package - r

I would like to estimate a generalized linear mixed-effects model using the glmer.nb function in R's lme4 package. I have panel data on various crime outcomes. My cross-sectional unit is the "precinct" (over 40 precincts), and I observe crime in those precincts over many months. I am evaluating an intervention that 'turns on/off' (dummy coded) over the month-years. I include "precinct" and "month" fixed effects (i.e., a full set of precinct and month dummies enters the model), and I have only one independent variable of interest. Of the two models below, the second, fitted with glmer.nb, is the one that returns an error.
library(MASS)  # provides glm.nb
library(lme4)  # provides glmer.nb
# How the two-way fixed-effects model is specified (works well)
model_fe <- glm.nb(crime_counts ~ as.factor(precinct) + as.factor(month_year) + policy, data = df)
# Modeling "precincts" as the random-effect (fails)
model_re <- glmer.nb(crime_counts ~ (1 | precinct) + as.factor(month_year) + policy, data = df)
The error returned is shown below:
failure to converge in 10000 evaluations
Model failed to converge with max|grad| = 0.00295777 (tol = 0.001, component 1)
iteration limit reached
In sum, I would like to specify precincts as the random intercept component. I also tried to specify precinct as a factor variable but that did not help. Any ideas? I am new to this package.
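The message says that the optimizer reached its evaluation limit and that the gradient check was slightly above tolerance. One adjustment that is often tried in this situation is to switch the optimizer and raise the evaluation limit through glmerControl. The sketch below does that on simulated data; the data frame df_sim, its effect sizes, and the chosen maxfun value are assumptions made for illustration, and there is no guarantee this resolves the warning on the real panel.

```r
library(lme4)

set.seed(123)
# Simulated stand-in for the panel: 10 precincts x 12 months (hypothetical data)
df_sim <- expand.grid(precinct = factor(1:10), month_year = factor(1:12))
precinct_effect <- rnorm(10, sd = 0.5)
# Policy switches on in months 7-12 for the first five precincts only
df_sim$policy <- as.integer(as.integer(df_sim$month_year) > 6 &
                            as.integer(df_sim$precinct) <= 5)
mu <- exp(2 + precinct_effect[as.integer(df_sim$precinct)] - 0.3 * df_sim$policy)
df_sim$crime_counts <- rnbinom(nrow(df_sim), mu = mu, size = 2)

# Random intercept for precinct, month dummies as fixed effects;
# bobyqa optimizer with a larger evaluation budget (assumed value)
model_re <- glmer.nb(
  crime_counts ~ policy + month_year + (1 | precinct),
  data = df_sim,
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5))
)
print(fixef(model_re)["policy"])
```

The control argument is passed through to glmer, so the same settings can be tried with the original formula and data.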

Related

Calculating ROC for panel data and Linear Probability Model

I have panel data on the external assets of 102 countries over ~20-40 years, depending on the country.
I tried predicting the probability of a financial crisis as a function of log(total_liabilities), to see whether an increase in foreign investment and other capital positions can help predict a crisis.
plm1 <- plm(crisis ~ log_total_liabilities + lag1_log_tot_lia + lag2_log_tot_lia + lag3_log_tot_lia
+ factor(year) + factor(country), data = dt2, index=c("year", "country"), model="pooling")
summary(plm1)
I started by estimating a plm model, regressing on my crisis dummy.
To assess the predictive ability, I wanted to generate a ROC curve and an AUC value from the regression:
# Plot of True Positive Rate Against the False Positive Rate
pred1 <- predict(plm1)
pred2 <- prediction(pred1,as.numeric(plm1$crisis))
plot(performance(pred2,"tpr","fpr"), las=0, main="plm1")
I get errors like:
"Error: not fitting arguments / variables" (translated from German) or "all arguments/variables need to have the same length" (translated from German).
Another approach to obtaining ROC values would start with
When changing to pred1 <- predict(plm1, dt2) (dt2 is my data frame, which also contains some variables I had not used in the plm1 regression), the error differs:
The format of predictions is invalid. It couldn't be coerced to a list.
Are PLMs simply not made for ROC calculations? And if so, how come the attached paper presents AUROC values for a linear probability model with fixed effects? (See the second-to-last row.)
And if not, what am I doing wrong?
I attached the screenshot of the paper and my dataset.
CSV file with dataset
Screenshot of paper with OLS AUROC value
AUC-ROC only works for binary classification problems. Because you used a fixed-effects regression, the predicted values from plm1 (pred1) are continuous.
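For reference, a ROC curve requires the observed outcome to be binary, while the scores being ranked may be continuous. The base-R sketch below computes a rank-based AUC from made-up labels and scores (not the asker's data); auc_rank is a hypothetical helper, not a function from plm or ROCR. The labels and scores must have the same length and be aligned row by row, which is worth checking when lagged regressors cause rows to be dropped.

```r
# Rank-based (Mann-Whitney) AUC for a binary outcome and continuous scores.
# auc_rank is a hypothetical helper written for this illustration.
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # average ranks are used for tied scores
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up example: observed crisis dummy and continuous fitted values
labels <- c(0, 0, 1, 0, 1, 1)
scores <- c(-0.10, 0.20, 0.35, 0.40, 0.80, 1.05)
auc_value <- auc_rank(labels, scores)
print(auc_value)
```

The value is the share of (crisis, no-crisis) pairs in which the crisis observation receives the higher score.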

Longitudinal analysis using sampling weights in R

I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm (for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight)
svyglm((post-pre) ~ group, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
However, for categorical variables, I don't know how to do the analyses. WeMix::mix() has a weights parameter, but I'm not sure whether it treats them as sampling weights. In any case, this function doesn't support the multinomial family.
So, to sum up: can you enlighten me on how to do a pre-post analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.
I give below some data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
id=rep(1:5,2),
time=c(rep("Pre",5),rep("Post",5)),
outcome1=sample(c("Yes","No"),10,replace=T),
outcome2=sample(c("Low","Medium","High"),10,replace=T),
outcome3=rnorm(10),
group=rep(sample(c("Man","Woman"),5,replace=T),2),
weight=rep(c(1,0.5,1.5,0.75,1.25),2)
)
data_wide <- dcast(data_long, id~time, value.var = c('outcome1','outcome2','outcome3','group','weight'))[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer and glmer with the variables used to calculate the weights as predictors. It turns out that glmer runs into a lot of problems (convergence, high eigenvalues...), so I took another look at @ThomasLumley's answer in this post and others (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r).
So, my question is now whether I can use participant IDs as clusters in svydesign
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        1.875e+01  1.000e+00  18.746   0.0339 *
groupWoman        -1.903e+01  1.536e+00 -12.394   0.0513 .
timePre            5.443e-09  5.443e-09   1.000   0.5000
groupWoman:timePre 2.877e-01  1.143e+00   0.252   0.8431
and still interpret groupWoman:timePre as differences in the average rate of change/improvement in the outcome over time between sex groups, as if I was using mixed models with participants as random effects.
Thank you once again!
A linear model with svyglm does not give the same parameter estimates as lme4::lmer. It does estimate the same parameters as lme4::lmer if the model is correctly specified, though.
Generalised linear models with svyglm or svy_vglm don't estimate the same parameters as lme4::glmer, as you note. However, they do estimate perfectly good regression parameters, and if you aren't specifically interested in the variance components or in estimating the realised random effects (BLUPs), I would recommend just using svyglm or svy_vglm.
Another option if you have non-survey software for random effects versions of the models is to use that. If you scale the weights to sum to the sample size and if all the clustering in the design is modelled by random effects in the model, you will get at least a reasonable approximation to valid inference. That's what I've seen recommended for Bayesian survey modelling, for example.
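The rescaling mentioned above, weights that sum to the sample size, is a one-line computation. A minimal sketch with made-up weights (the vector w is hypothetical and not taken from the question's data):

```r
# Hypothetical sampling weights for six respondents
w <- c(120, 80, 200, 150, 90, 160)
n <- length(w)

# Rescale so the weights sum to the sample size;
# the ratios between respondents' weights are unchanged
w_scaled <- w * n / sum(w)

print(round(w_scaled, 3))
print(sum(w_scaled))
```

The rescaled vector can then be supplied as the weights argument of the non-survey model, subject to the caveat above that the design's clustering is captured by the model's random effects.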

High GLMER dispersion parameters

I am running a glmer with a random effect for count data (x) and two categorical variables (y and z):
fullmodel<-glmer(x~y*z + (1|Replicate), family = poisson, data = Data)
However, when I look at the dispersion parameter:
> dispersion_glmer(fullmodel)
[1] 2.338742
It is way higher than 1. Does this mean my model is overdispersed? How do I correct it? I want to keep my random effect, but when I tried to swap the family to quasipoisson, it said you can't use it with glmer.
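A ratio well above 1 is usually read as a sign of overdispersion. Since glmer does not accept quasi-families, two options that are often suggested are a negative binomial fit (glmer.nb) or an observation-level random effect. The sketch below illustrates the latter on simulated data. The data frame d, the obs column, and the pearson_ratio helper are all hypothetical, and the ratio shown is a rough Pearson-based check, not the statistic that dispersion_glmer reports.

```r
library(lme4)

set.seed(1)
# Simulated overdispersed counts: 6 replicates, two 2-level factors (hypothetical)
d <- expand.grid(rep_in_cell = 1:5, y = c("a", "b"), z = c("c", "d"),
                 Replicate = factor(1:6))
d$x <- rnbinom(nrow(d), mu = 5, size = 1)
d$obs <- factor(seq_len(nrow(d)))  # one level per observation

# Rough check: sum of squared Pearson residuals over residual df
pearson_ratio <- function(m) {
  sum(residuals(m, type = "pearson")^2) / df.residual(m)
}

# Plain Poisson GLMM, as in the question
m_pois <- glmer(x ~ y * z + (1 | Replicate), family = poisson, data = d)
# Same model plus an observation-level random effect to absorb extra variation
m_olre <- glmer(x ~ y * z + (1 | Replicate) + (1 | obs), family = poisson, data = d)

ratio_pois <- pearson_ratio(m_pois)
ratio_olre <- pearson_ratio(m_olre)
print(c(poisson = ratio_pois, olre = ratio_olre))
```

The original random effect for Replicate is kept in both fits; only the extra (1 | obs) term differs.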

Application of a multi-way cluster-robust function in R

Hello (first timer here),
I would like to estimate a "two-way" cluster-robust variance-covariance matrix in R. I am using a particular canned routine from the "multiwayvcov" library. My question relates solely to the set-up of the cluster.vcov function in R. I have panel data of various crime outcomes. My cross-sectional unit is the "precinct" (over 40 precincts) and I observe crime in those precincts over several "months" (i.e., 24 months). I am evaluating an intervention that 'turns on' (dummy coded) for only a few months throughout the year.
I include "precinct" and "month" fixed effects (i.e., a full set of precinct and month dummies enter the model). I have only one independent variable I am assessing. I want to cluster on "both" dimensions but I am unsure how to set it up.
Do I estimate all the fixed effects with lm first? Or do I simply run a model regressing crime on the independent variable (excluding fixed effects) and then use cluster.vcov, i.e., ~ precinct + month_year?
This seems like it would provide the wrong standard errors, though. Right? I hope this was clear. Sorry for any confusion. See my setup below.
library(multiwayvcov)
library(lmtest)  # provides coeftest
model <- lm(crime ~ as.factor(precinct) + as.factor(month_year) + policy, data = DATASET_full)
boot_both <- cluster.vcov(model, ~ precinct + month_year)
coeftest(model, boot_both)
### What the documentation offers as an example
### https://cran.r-project.org/web/packages/multiwayvcov/multiwayvcov.pdf
library(lmtest)
data(petersen)
m1 <- lm(y ~ x, data = petersen)
### Double cluster by firm and year using a formula
vcov_both_formula <- cluster.vcov(m1, ~ firmid + year)
coeftest(m1, vcov_both_formula)
Is it appropriate to first estimate a model that ignores the fixed effects?
First, the answer: you should first estimate your lm model with the fixed effects included. This will give you asymptotically correct parameter estimates. The standard errors are incorrect because they are calculated from a vcov matrix that assumes iid errors.
To replace the iid covariance matrix with a cluster-robust vcov matrix, you can use cluster.vcov on the fitted model, i.e. my_new_vcov_matrix <- cluster.vcov(model, ~ precinct + month_year).
Then a recommendation: I warmly recommend the function felm from lfe for both multi-way fixed effects and cluster-robust standard errors.
The syntax is as follows:
library(multiwayvcov)
library(lfe)
data(petersen)
my_fe_model <- felm(y~x | firmid + year | 0 | firmid + year, data=petersen )
summary(my_fe_model)

Is there a way to extrapolate predicted data from lmer

I am using lmer to fit a multilevel polynomial regression model with several fixed effects (including subject-specific variables like age, short-term memory span, etc.) and two sets of random effects (Subject and Subject:Condition). Now I would like to predict data for a hypothetical subject with particular properties (age, short-term memory span, etc.). I fit the model (m) and created a new data frame (pred) that contains my hypothetical subject, but when I tried predict(m, pred) I got an error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "mer"
I know I could use the brute-force method of extracting fixed effects from my model and multiplying it all out, but is there a more elegant solution?
You can do this type of extrapolated prediction easily with the merTools package for R: http://www.github.com/jknowles/merTools
merTools includes a function called predictInterval which provides robust prediction capabilities for lmer and glmer fits. Specifically, you can use this function to predict extrapolated data, and to obtain prediction intervals that account for the variance in both the fixed and random effects, as well as the residual error of the model.
Here's a quick code example:
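library(lme4)      # lmer and the sleepstudy data
library(merTools)  # predictInterval
m1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)
# prediction intervals for the observed data
predOut <- predictInterval(m1, newdata = sleepstudy, n.sims = 100)
# extrapolated data: Days = 20 lies outside the observed range
extrapData <- sleepstudy[1:10,]
extrapData$Days <- 20
extrapPred <- predictInterval(m1, newdata = extrapData)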
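In current lme4 releases, fitted models have class merMod and come with their own predict method; the "mer" class in the error above belongs to older versions. The sketch below assumes a recent lme4 and uses the sleepstudy data rather than the asker's model. It predicts for a hypothetical new subject (the new_subject data frame and its "hypothetical" level are invented for illustration) and compares the result with the brute-force fixed-effects calculation.

```r
library(lme4)

m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# A subject that was not in the fitted data (hypothetical)
new_subject <- data.frame(Days = c(0, 5, 20), Subject = "hypothetical")

# re.form = NA ignores all random effects: population-level prediction
pred_fixed <- predict(m1, newdata = new_subject, re.form = NA)

# allow.new.levels = TRUE: an unseen Subject is given a random effect of zero
pred_new <- predict(m1, newdata = new_subject, allow.new.levels = TRUE)

# Brute-force equivalent: design matrix times fixed-effect estimates
X <- model.matrix(~ Days, data = new_subject)
pred_manual <- as.numeric(X %*% fixef(m1))

print(cbind(pred_fixed, pred_new, pred_manual))
```

All three columns agree here because a new subject has no estimated random effect; predictInterval adds the uncertainty around these point predictions.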
