Mixed models with mlogit in R - Random intercepts?

I want to fit a very simple mixed-effects model, with a couple of fixed effects and random intercepts (no random slopes), using the mlogit package in R. My categorical outcome variable has three levels, so I cannot use the lme4 package.
I have searched Google, Stack Overflow and CRAN, but I have not been able to find a good solution. Does anyone know how to do this with the mlogit package? Or are there similar alternatives in other R packages (or in SPSS, Stata or Minitab, or via packages in Python/Julia)?
The code below shows my data structure and the type of model I would like to fit. I know how to fit a fixed-effects-only model with mlogit (cf. fixed_model below); I just want to add random intercepts:
library(mlogit)
library(dfidx)
# Make variables:
Outcome = c("y","z","y","z","x","z","y","x","x","x","z",
"y","z","x","x","y","z","x", "x", "y")
Predictor = rep(c("M", "F"), 10)
RandomIntercept = rep(c("A", "B", "C", "D"), 5)
# Make data frame
df <- data.frame(Outcome, Predictor, RandomIntercept)
# Make mlogit-ready dataframe:
df_mlogit <- dfidx(df, choice = "Outcome", shape = "wide", id.var = "RandomIntercept")
# Display first observations:
head(df_mlogit)
# Make fixed-effect-only model:
fixed_model <- mlogit::mlogit(Outcome ~ 1 | Predictor, data = df_mlogit, reflevel = "x")
#Display results:
fixed_model
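# The kind of model I want, in lme4 syntax (illustrative only: glmer's
# binomial family cannot handle a three-level outcome, so this is commented out):
# dream_model <- lme4::glmer(Outcome ~ Predictor + (1 | RandomIntercept), data = df, family = "binomial")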

Related

Multinomial regression table using gtsummary; how to get rid of "NA" row?

I have run into the same issue as the user in a previous post: with multinom (nnet), tbl_regression (gtsummary) shows an extra "NA" row above the glance table.
The replies to the previous question asked for reproducible code, so here it is:
library(nnet)
library(gtsummary)
library(tidyverse)
# Create a sample data frame
set.seed(123)
df <- data.frame(
y = sample(c(0, 1, 2), 100, replace = TRUE),
x1 = rnorm(100),
x2 = rnorm(100),
x3 = rnorm(100)
)
# Fit a multinomial logistic regression model with nnet
model <- nnet::multinom(y ~ x1 + x2 + x3, data = df)
# Create a summary table with tbl_regression
model_tab <- tbl_regression(model,
exponentiate = TRUE) %>%
add_glance_table(c(nobs, AIC))
model_tab
I suspect the NA row has to do with tbl_regression producing an "empty model" for the NAs in the dependent variable. When I used Daniel Sjoberg's function to display a multinom model in wide format here, I noted that tbl_regression produced an additional empty model column for the value "NA" of my dependent variable, to the right of my table. I tried to use na.action = na.omit in multinom, to no avail.
So, perhaps tbl_regression is just too buggy for multinom models and I have to shift to another table-producing package. Nonetheless, if anyone has a clue how to avoid the NA issue, I would be happy to continue using the otherwise very useful gtsummary package.

Non-parametric bootstrapping to generate 95% Confidence Intervals for fixed effect coefficients calculated by a glmer with nested random effects

I have an R coding question.
This is my first time asking a question here, so apologies if I am unclear or do something wrong.
I am trying to use a Generalized Linear Mixed Model (GLMM) with Poisson error family to test for any significant effect on a count response variable by three separate dichotomous variables (AGE = ADULT or JUVENILE, SEX = MALE or FEMALE and MEDICATION = NEW or OLD) and an interaction between AGE and MEDICATION (AGE:MEDICATION).
There is some dependency in my data: it was collected from a total of 22 different sites (coded as a SITE vector with 33 distinct levels) over a total of 21 different years (coded as a YEAR vector with 21 distinct levels, and treated as a categorical variable). Unfortunately, not every SITE was sampled in every YEAR; some were sampled for more years than others.
The data is also quite sparse, in that I do not have a great number of measurements of the response variable (coded as COUNT and an integer vector) per SITE per YEAR.
My Poisson GLMM is constructed using the following code:
model <- glmer(data = mydata,
family = poisson(link = "log"),
formula = COUNT ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|SITE/YEAR),
offset = log(COUNT.SAMPLE.SIZE),
nAGQ = 0)
In order to try and obtain more reliable estimates for the fixed effect coefficients (particularly given the sparse nature of my data), I am trying to obtain 95% confidence intervals for the fixed effect coefficients through non-parametric bootstrapping.
I have come across the "glmmboot" package, which can be used to conduct non-parametric bootstrapping of GLMMs. I tried to run the bootstrap using the following code:
library(glmmboot)
bootstrap_model(base_model = model,
base_data = mydata,
resamples = 1000)
When I run this code, I receive the following message:
Performing case resampling (no random effects)
Naturally, though, my model does have random effects, namely (1|SITE/YEAR).
If I try to tell the function to resample from a specific block, by adding the "resample_specific_blocks" argument, i.e.:
library(glmmboot)
bootstrap_model(base_model = model,
base_data = mydata,
resamples = 1000,
resample_specific_blocks = "YEAR")
Then I get the following error message:
Performing block resampling, over SITE
Error: Invalid grouping factor specification, YEAR:SITE
I get a similar error message if I try to set 'resample_specific_blocks' to "SITE".
If I then try to set 'resample_specific_blocks' to "SITE:YEAR" or "SITE/YEAR", I get the following error message:
Error in bootstrap_model(base_model = model, base_data = mydata, resamples = 1000, :
No random columns from formula found in resample_specific_blocks
I have tried explicitly nesting YEAR within SITE and then adapting the model accordingly using the code:
mydata <- within(mydata, SAMPLE <- factor(SITE:YEAR))
model.refit <- glmer(data = mydata,
family = poisson(link = "log"),
formula = COUNT ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|SITE) + (1|SAMPLE),
offset = log(COUNT.SAMPLE.SIZE),
nAGQ = 0)
bootstrap_model(base_model = model.refit,
base_data = mydata,
resamples = 1000,
resample_specific_blocks = "SAMPLE")
But unfortunately I just get this error message:
Error: Invalid grouping factor specification, SITE
The same error message comes up if I set resample_specific_blocks argument to SITE, or if I just remove the resample_specific_blocks argument.
I believe that the case_bootstrap() function found in the lmeresampler package could potentially be another option, but when I look into the help for it it looks like I would need to create a function and I unfortunately have no experience with creating my own functions within R.
If anyone has any suggestions on how I can get the bootstrap_model() function in the glmmboot package to recognise the random effects in my model/dataframe, or any suggestions for alternative methods on conducting non-parametric bootstrapping to create 95% confidence intervals for the coefficients of the fixed effects in my model, it would be greatly appreciated! Many thanks in advance, and for reading such a lengthy question!
For reference, I include links to the RDocumentation and GitHub for the glmmboot package:
https://www.rdocumentation.org/packages/glmmboot/versions/0.6.0
https://github.com/ColmanHumphrey/glmmboot
The following code creates a reproducible example using the grouseticks data set from lme4:
#Load in required packages
library(tidyverse)
library(lme4)
library(glmmboot)
library(psych)
#Load in the grouseticks dataframe
data("grouseticks")
tibble(grouseticks)
#Create dummy vectors for SEX, AGE and MEDICATION
set.seed(1)
SEX <-sample(1:2, size = 403, replace = TRUE)
SEX <- as.factor(ifelse(SEX == 1, "MALE", "FEMALE"))
set.seed(2)
AGE <- sample(1:2, size = 403, replace = TRUE)
AGE <- as.factor(ifelse(AGE == 1, "ADULT", "JUVENILE"))
set.seed(3)
MEDICATION <- sample(1:2, size = 403, replace = TRUE)
MEDICATION <- as.factor(ifelse(MEDICATION == 1, "OLD", "NEW"))
grouseticks$SEX <- SEX
grouseticks$AGE <- AGE
grouseticks$MEDICATION <- MEDICATION
#Use the INDEX vector to create a vector of sample sizes per LOCATION
#per YEAR
grouseticks$INDEX <- 1
sample.sizes <- grouseticks %>%
group_by(LOCATION, YEAR) %>%
summarise(SAMPLE.SIZE = sum(INDEX))
#Combine the dataframes together into the dataframe to be used in the
#model
mydata <- left_join(grouseticks, sample.sizes, by = c("LOCATION", "YEAR"))
mydata$SAMPLE.SIZE <- as.integer(mydata$SAMPLE.SIZE)
#Create the Poisson GLMM model
model <- glmer(data = mydata,
family = poisson(link = "log"),
formula = TICKS ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|LOCATION/YEAR),
nAGQ = 0)
#Attempt non-parametric bootstrapping on the model to get 95%
#confidence intervals for the coefficients of the fixed effects
set.seed(1)
Model.bootstrap <- bootstrap_model(base_model = model,
base_data = mydata,
resamples = 1000)
Model.bootstrap
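If glmmboot keeps rejecting the grouping factors, one possible fallback is a hand-rolled cluster (case) bootstrap using only lme4 and base R. The sketch below is an illustration under stated assumptions, not the glmmboot algorithm. It uses a deliberately simplified model (YEAR as a fixed effect, a single random intercept for LOCATION) rather than the model above. It resamples whole LOCATION clusters with replacement, relabels each drawn cluster so that duplicates count as distinct groups, refits the model, and takes percentile intervals of the fixed effects. The helper name boot_fixef and the choice of 200 resamples are made up for the example.

```r
library(lme4)

data("grouseticks", package = "lme4")

# Simplified stand-in model (an assumption, not the model from the question)
fit <- glmer(TICKS ~ YEAR + (1 | LOCATION),
             family = poisson(link = "log"),
             data = grouseticks, nAGQ = 0)

locations <- levels(grouseticks$LOCATION)

# Hypothetical helper: one cluster-bootstrap replicate of the fixed effects
boot_fixef <- function() {
  # Draw whole clusters with replacement
  drawn <- sample(locations, length(locations), replace = TRUE)
  pieces <- lapply(seq_along(drawn), function(i) {
    block <- grouseticks[grouseticks$LOCATION == drawn[i], ]
    # Relabel so a cluster drawn twice is treated as two separate groups
    block$LOCATION <- paste(drawn[i], i, sep = "_")
    block
  })
  resampled <- do.call(rbind, pieces)
  resampled$LOCATION <- factor(resampled$LOCATION)
  fixef(glmer(formula(fit), family = poisson(link = "log"),
              data = resampled, nAGQ = 0))
}

set.seed(1)
boot <- replicate(200, boot_fixef())  # rows: coefficients, columns: resamples

# Percentile 95% confidence intervals, one row per fixed effect
ci <- t(apply(boot, 1, quantile, probs = c(0.025, 0.975)))
ci
```

For a nested structure such as (1|SITE/YEAR), the same idea would presumably resample at the top level (SITE) and keep each site's years together, but that extension is untested here.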

How to Properly Plot an SVM Model in R?

I am running an SVM model with 4 numerical columns and 1 column that is a factor. I am able to see a successful summary of the model, and the accuracy is perfect.
However, when trying to plot the model with 4 variables I get a result that does not look right, as the data points are not grouped by classification. Here is the code I've been using, if anyone could help that would be much appreciated. Also, let me know if the dataset is required for you to help me solve this issue.
library(e1071)  # provides svm() and its plot method
View(anthrokids)
anthrokids$Race <- as.factor(anthrokids$Race)
svm_model <- svm(formula = Race ~ ., data = anthrokids)
summary(svm_model)
svm_model$SV
plot(svm_model, data = anthrokids, Height~Weight,
slice = list(Age = 3, Sex = 4))
prediction = predict(svm_model, anthrokids)
table(Predicted = prediction, Actual = anthrokids$Race)

sjt.lmer displaying incorrect p-values

I've just noticed that sjt.lmer tables are displaying incorrect p-values, i.e., p-values that do not match the model summary. This appears to be a new-ish issue, as it worked fine last month.
Using the data and code provided in the package vignette:
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(lme4)
library(sjstats)
# load sample data
data(efc)
# prepare grouping variables
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- get_labels(efc$e15relat)
efc$care.level <- rec(efc$n4pstu, rec = "0=0;1=1;2=2;3:4=4",
val.labels = c("none", "I", "II", "III"))
# data frame for fitted model
mydf <- data.frame(
neg_c_7 = efc$neg_c_7,
sex = to_factor(efc$c161sex),
c12hour = efc$c12hour,
barthel = efc$barthtot,
education = to_factor(efc$c172code),
grp = efc$grp,
carelevel = to_factor(efc$care.level)
)
# fit sample models
fit1 <- lmer(neg_c_7 ~ sex + c12hour + barthel + (1 | grp), data = mydf)
summary(fit1)
p_value(fit1, p.kr = TRUE)
[Screenshot: model summary]
[Screenshot: p_value summary]
The sjt.lmer output does not show these p-values.
Note that the first summary comes from a model fitted with lmerTest, which computes p-values with df based on Satterthwaite approximation (see first line in output).
p_value(), however, with p.kr = TRUE, uses the Kenward-Roger approximation from package pbkrtest, which is a bit more conservative.
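To see how far the two approximations differ, both sets of p-values can be put side by side. The following is a minimal sketch, assuming the lmerTest and pbkrtest packages are installed. It uses the sleepstudy data shipped with lme4 as a stand-in rather than the efc data from the question.

```r
library(lmerTest)  # masks lme4::lmer so that summary() reports p-values

# sleepstudy ships with lme4; it is only a stand-in data set here
data("sleepstudy", package = "lme4")

fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Default in lmerTest: Satterthwaite degrees of freedom
p_satterthwaite <- coef(summary(fit))[, "Pr(>|t|)"]

# Kenward-Roger degrees of freedom (computed via the pbkrtest package)
p_kenward_roger <- coef(summary(fit, ddf = "Kenward-Roger"))[, "Pr(>|t|)"]

# Side-by-side comparison, one row per fixed effect
cbind(Satterthwaite = p_satterthwaite, KenwardRoger = p_kenward_roger)
```

If a table shows p-values that match neither column, the mismatch is probably not explained by the choice of approximation.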
Your output from sjt.lmer() seems to be messed up somehow, and I can't reproduce it with your example; my output looks ok.

Putting MCMCglmm objects in Anova() or Manova()

I am using the MCMCglmm package to perform mixed-model analysis. Suppose my data look like this:
df_test <- data.frame(ID = 1:10, a = 171:180, b = 71:80 + rnorm(10),
age = factor(c(rep("young", 3), rep("mid", 4), rep("old", 3)),
levels = c("young", "mid", "old")))
For linear models, I can easily run summary(Manova(lm(cbind(a, b) ~ age + 0, data = df_test))) to get the table and see the clear effect of age (please ignore the collinearity issue here).
However, what if I use MCMCglmm to account for the mixed effects of ID? Say the MCMCglmm object is called "mc_model": how can I use Manova or a similar method to look at the effects of multiple categorical variables, each with numerous levels?
Thanks!
