sjt.lmer displaying incorrect p-values

I've just noticed that sjt.lmer tables are displaying incorrect p-values, i.e., p-values that do not match the model summary. This appears to be a fairly new issue, as everything worked fine last month. I am using the data and code provided in the package vignette:
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(lme4)
library(sjstats)
# load sample data
data(efc)
# prepare grouping variables
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- get_labels(efc$e15relat)
efc$care.level <- rec(efc$n4pstu, rec = "0=0;1=1;2=2;3:4=4",
                      val.labels = c("none", "I", "II", "III"))
# data frame for fitted model
mydf <- data.frame(
  neg_c_7 = efc$neg_c_7,
  sex = to_factor(efc$c161sex),
  c12hour = efc$c12hour,
  barthel = efc$barthtot,
  education = to_factor(efc$c172code),
  grp = efc$grp,
  carelevel = to_factor(efc$care.level)
)
# fit sample models
fit1 <- lmer(neg_c_7 ~ sex + c12hour + barthel + (1 | grp), data = mydf)
summary(fit1)
p_value(fit1, p.kr = TRUE)
The sjt.lmer() table does not show the p-values from either of these summaries. [Screenshots of the model summary, the p_value() output, and the sjt.lmer() table omitted.]

Note that the first summary comes from a model fitted with lmerTest, which computes p-values with degrees of freedom based on the Satterthwaite approximation (see the first line of the output).
p_value(), however, with p.kr = TRUE, uses the Kenward-Roger approximation from package pbkrtest, which is a bit more conservative.
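To see the two side by side, the model can be refit with lmerTest (a minimal sketch, reusing mydf and fit1 from the question, and assuming the sjstats version of p_value() with the p.kr argument shown above):
library(lmerTest)
# Satterthwaite-based df and p-values (lmerTest's default summary)
fit1.satt <- lmerTest::lmer(neg_c_7 ~ sex + c12hour + barthel + (1 | grp), data = mydf)
summary(fit1.satt)
# Kenward-Roger p-values from sjstats, typically a bit more conservative
p_value(fit1, p.kr = TRUE)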
Your output from sjt.lmer() seems to be messed up somehow, and I can't reproduce it with your example; my output looks fine. [Table screenshot omitted.]

Related

Difference in linear regression codes

I am self-teaching R from "An Introduction to Statistical Learning: With Applications in R". I am sure I should get the same mean squared error (MSE) from both code chunks below, but I get drastically different results. Can someone please help me figure out why I am not getting the same MSE? It looks like my chunk (the first one) is wrong. Both chunks use the Auto data set and train on the same indices, yet my predictions and the book's predictions differ.
First Chunk (my code)
library(ISLR)   # for the Auto data set
library(dplyr)  # for anti_join()
set.seed(1)
train_index = sample(392, 196)
Auto$index = c(1:nrow(Auto))
train_df = Auto[train_index,]
test_df = anti_join(Auto, train_df, by="index")
attach(train_df)
lm.fit = lm(mpg ~ horsepower)
predictions = predict(lm.fit, horsepower = test_df$horsepower)
mean((test_df$mpg - predictions)^2)
Second Chunk (book's code - An Introduction to Statistical Learning: With Applications in R)
set.seed(1)
train = sample(392, 196)
lm.fit = lm(mpg ~ horsepower, data = Auto, subset = train)
attach(Auto)
mean((mpg - predict(lm.fit, Auto))[-train]^2)
In your code, you're not specifying the test data correctly in predict(). predict() takes a data frame of predictor values via the newdata argument; instead, you pass horsepower = test_df$horsepower, which is simply absorbed by ... and has no effect, so predict() returns the fitted values for the training data.
If you instead pass the whole test_df dataframe to newdata, you get the same result as the text.
library(ISLR)
library(dplyr)
set.seed(1)
# OP’s code with change to predict()
train_index = sample(392, 196)
Auto$index = c(1:nrow(Auto))
train_df = Auto[train_index,]
test_df = anti_join(Auto, train_df, by="index")
attach(train_df)
lm.fit = lm(mpg ~ horsepower)
predictions = predict(lm.fit, newdata = test_df)
mean((test_df$mpg - predictions)^2)
# 23.26601
# ISLR code
set.seed(1)
train = sample(392, 196)
lm.fit = lm(mpg ~ horsepower, data = Auto, subset = train)
attach(Auto)
mean((mpg - predict(lm.fit, Auto))[-train]^2)
# 23.26601

Reporting mgcv::gam summary with modelsummary

I'm attempting to report the model summary from mgcv::gam() using the modelsummary package. The flextable package provides a summary that is consistent with the summary() output in R and with what is often presented in publications: it separates the reporting of the fixed/parametric effects from that of the smooth terms.
Although flextable works well, I'd like to use modelsummary (mainly for its ability to output to gt, kable, etc.). My plan was to produce two separate tables, reporting the parametric and smooth terms separately (there might be a better way?). However, I get hung up trying to omit coefficients in modelsummary().
Flextable example:
library(mgcv)
library(flextable)
library(modelsummary)
dat <- gamSim(1, n = 4000, dist = "normal", scale = 2)
mod <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat)
flextable::as_flextable(mod)
My first step at getting the summary for parametric terms using modelsummary():
modelsummary(mod,
             estimate = "estimate",
             statistic = c("Std.Error" = "std.error",
                           "t-value" = "statistic",
                           "p-value" = "p.value"),
             shape = term ~ model + statistic,
             gof_map = NA)
I want to drop the smooth terms and include those in a different table or group, so I tried the coef_omit argument:
modelsummary(mod,
             estimate = "estimate",
             statistic = c("Std.Error" = "std.error",
                           "t-value" = "statistic",
                           "p-value" = "p.value"),
             coef_omit = "^(?!.*Intercept)", # this should retain the intercept term
             omit = ".*",
             shape = term ~ model + statistic,
             gof_map = NA)
Error in if (dat$part[i] == "estimates" && dat[[column]][i - 1] == dat[[column]][i]) { :
missing value where TRUE/FALSE needed
Interestingly, if I remove the shape argument and report the statistics in "long format", the error goes away. I might be approaching the formatting of this summary completely wrong and am open to suggestions.
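A possible workaround (a sketch only; I have not verified that it avoids the shape error in every modelsummary version) is to use coef_map instead of a look-ahead regex in coef_omit: coef_map keeps, renames, and orders only the listed coefficients, so the smooth terms are dropped implicitly:
modelsummary(mod,
             estimate = "estimate",
             statistic = c("Std.Error" = "std.error",
                           "t-value" = "statistic",
                           "p-value" = "p.value"),
             coef_map = c("(Intercept)" = "Intercept"), # keep only the parametric term(s)
             shape = term ~ model + statistic,
             gof_map = NA)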

Non-parametric bootstrapping to generate 95% Confidence Intervals for fixed effect coefficients calculated by a glmer with nested random effects

I have an R coding question.
This is my first time asking a question here, so apologies if I am unclear or do something wrong.
I am trying to use a generalized linear mixed model (GLMM) with a Poisson error family to test whether a count response variable is significantly affected by three separate dichotomous variables (AGE = ADULT or JUVENILE, SEX = MALE or FEMALE, and MEDICATION = NEW or OLD) and by an interaction between AGE and MEDICATION (AGE:MEDICATION).
There is some dependency in my data: it was collected from 22 different sites (coded as the factor SITE) over 21 different years (coded as the factor YEAR, with 21 distinct levels, and treated as categorical). Unfortunately, not every SITE was sampled in every YEAR, with some sites sampled in more years than others.
The data is also quite sparse, in that I do not have a great number of measurements of the response variable (coded as COUNT, an integer vector) per SITE per YEAR.
My Poisson GLMM is constructed using the following code:
model <- glmer(data = mydata,
               family = poisson(link = "log"),
               formula = COUNT ~ SEX + SEX:MEDICATION + AGE + AGE:SEX + MEDICATION + AGE:MEDICATION + (1|SITE/YEAR),
               offset = log(COUNT.SAMPLE.SIZE),
               nAGQ = 0)
In order to try and obtain more reliable estimates for the fixed effect coefficients (particularly given the sparse nature of my data), I am trying to obtain 95% confidence intervals for the fixed effect coefficients through non-parametric bootstrapping.
I have come across the glmmboot package, which can be used to conduct non-parametric bootstrapping of GLMMs. However, when I run the bootstrap with the following code:
library(glmmboot)
bootstrap_model(base_model = model,
                base_data = mydata,
                resamples = 1000)
I receive the following message:
Performing case resampling (no random effects)
Naturally, though, my model does have random effects, namely (1|SITE/YEAR).
If I try to tell the function to resample from a specific block by adding the resample_specific_blocks argument, i.e.:
library(glmmboot)
bootstrap_model(base_model = model,
                base_data = mydata,
                resamples = 1000,
                resample_specific_blocks = "YEAR")
Then I get the following error message:
Performing block resampling, over SITE
Error: Invalid grouping factor specification, YEAR:SITE
I get a similar error message if I try to set resample_specific_blocks to "SITE".
If I instead set resample_specific_blocks to "SITE:YEAR" or "SITE/YEAR", I get the following error message:
Error in bootstrap_model(base_model = model, base_data = mydata, resamples = 1000, :
No random columns from formula found in resample_specific_blocks
I have tried explicitly nesting YEAR within SITE and then adapting the model accordingly, using the following code:
mydata <- within(mydata, SAMPLE <- factor(SITE:YEAR))
model.refit <- glmer(data = mydata,
                     family = poisson(link = "log"),
                     formula = COUNT ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|SITE) + (1|SAMPLE),
                     offset = log(COUNT.SAMPLE.SIZE),
                     nAGQ = 0)
bootstrap_model(base_model = model.refit,
                base_data = mydata,
                resamples = 1000,
                resample_specific_blocks = "SAMPLE")
But unfortunately I just get this error message:
Error: Invalid grouping factor specification, SITE
The same error message comes up if I set resample_specific_blocks to "SITE", or if I remove the resample_specific_blocks argument altogether.
I believe the case_bootstrap() function in the lmeresampler package could be another option, but from its help page it looks like I would need to write a function of my own, and I unfortunately have no experience writing my own functions in R.
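From the lmeresampler documentation, I gather that something like the following might work (an untested sketch: the .f argument takes the function that extracts the statistics of interest, so perhaps lme4::fixef can be passed directly, and the resample flags, one per level of the hierarchy, would need checking):
library(lmeresampler)
# case (non-parametric) bootstrap of the fixed effects;
# resample flags: SITE level, YEAR-within-SITE level, observation level
boot.fit <- bootstrap(model,
                      .f = fixef,
                      type = "case",
                      B = 1000,
                      resample = c(TRUE, FALSE, FALSE))
# percentile 95% confidence intervals for the fixed-effect coefficients
confint(boot.fit, type = "perc", level = 0.95)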
If anyone has any suggestions on how to get the bootstrap_model() function in the glmmboot package to recognise the random effects in my model/data frame, or any suggestions for alternative ways of conducting non-parametric bootstrapping to create 95% confidence intervals for the fixed-effect coefficients in my model, that would be greatly appreciated! Many thanks in advance, and for reading such a lengthy question!
For reference, I include links to the RDocumentation and GitHub for the glmmboot package:
https://www.rdocumentation.org/packages/glmmboot/versions/0.6.0
https://github.com/ColmanHumphrey/glmmboot
The following code allows for the creation of a reproducible example using the data set lme4::grouseticks:
#Load in required packages
library(tidyverse)
library(lme4)
library(glmmboot)
library(psych)
#Load in the grouseticks dataframe
data("grouseticks")
tibble(grouseticks)
#Create dummy vectors for SEX, AGE and MEDICATION
set.seed(1)
SEX <- sample(1:2, size = 403, replace = TRUE)
SEX <- as.factor(ifelse(SEX == 1, "MALE", "FEMALE"))
set.seed(2)
AGE <- sample(1:2, size = 403, replace = TRUE)
AGE <- as.factor(ifelse(AGE == 1, "ADULT", "JUVENILE"))
set.seed(3)
MEDICATION <- sample(1:2, size = 403, replace = TRUE)
MEDICATION <- as.factor(ifelse(MEDICATION == 1, "OLD", "NEW"))
grouseticks$SEX <- SEX
grouseticks$AGE <- AGE
grouseticks$MEDICATION <- MEDICATION
#Use the INDEX vector to create a vector of sample sizes per LOCATION
#per YEAR
grouseticks$INDEX <- 1
sample.sizes <- grouseticks %>%
  group_by(LOCATION, YEAR) %>%
  summarise(SAMPLE.SIZE = sum(INDEX))
#Combine the dataframes together into the dataframe to be used in the
#model (joining on LOCATION and YEAR attaches the sample sizes)
mydata <- left_join(grouseticks, sample.sizes, by = c("LOCATION", "YEAR"))
mydata$SAMPLE.SIZE <- as.integer(mydata$SAMPLE.SIZE)
#Create the Poisson GLMM model (the offset mirrors the real model above)
model <- glmer(data = mydata,
               family = poisson(link = "log"),
               formula = TICKS ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|LOCATION/YEAR),
               offset = log(SAMPLE.SIZE),
               nAGQ = 0)
#Attempt non-parametric bootstrapping on the model to get 95%
#confidence intervals for the coefficients of the fixed effects
set.seed(1)
Model.bootstrap <- bootstrap_model(base_model = model,
                                   base_data = mydata,
                                   resamples = 1000)
Model.bootstrap

Mixed models with mlogit in R - Random intercepts?

I want to fit a very simple mixed-effects model, with a couple of fixed effects and random intercepts (no random slopes), using the mlogit package in R. My categorical outcome variable has three levels, so I cannot use the lme4 package.
However, I keep googling and stack-ing and CRAN-ing (?) about this, and nowhere can I find a good solution. Does anyone out there know how to do this with the mlogit package? Or are there similar alternatives in other R packages (or in SPSS, Stata or Minitab, or via packages in Python/Julia)?
See the code below for my data structure and the type of model I would like to fit (I know how to fit a fixed-effects-only model with mlogit, cf. fixed_model below; I just want to add random intercepts):
library(mlogit)
library(dfidx)
# Make variables:
Outcome = c("y","z","y","z","x","z","y","x","x","x","z",
            "y","z","x","x","y","z","x","x","y")
Predictor = rep(c("M", "F"), 10)
RandomIntercept = rep(c("A", "B", "C", "D"), 5)
# Make data frame
df <- data.frame(Outcome, Predictor, RandomIntercept)
# Make mlogit-ready dataframe:
df_mlogit <- dfidx(df, choice = "Outcome", shape = "wide", id.var = "RandomIntercept")
# Display first observations:
head(df_mlogit)
# Make fixed-effect-only model:
fixed_model <- mlogit::mlogit(Outcome ~ 1 | Predictor, data = df_mlogit, reflevel = "x")
#Display results:
fixed_model
# The kind of model I want, in lme4-syntax:
dream_model <- lme4::glmer(Outcome ~ Predictor + (1|RandomIntercept), data = df, family = "binomial")
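One avenue that might be worth exploring within mlogit itself is its mixed-logit machinery: random coefficients are requested via the rpar argument, and since the alternative-specific intercepts are ordinary coefficients, giving them normal distributions together with panel = TRUE (which ties the simulation draws to id.var, here RandomIntercept) amounts to random intercepts. This is a sketch only; the names in rpar must match names(coef(fixed_model)), and the toy data set above is far too small to estimate this sensibly:
mixed_model <- mlogit(Outcome ~ 1 | Predictor, data = df_mlogit,
                      reflevel = "x",
                      rpar = c("y:(intercept)" = "n",  # normal random intercept, alternative y
                               "z:(intercept)" = "n"), # normal random intercept, alternative z
                      panel = TRUE, # draws grouped by id.var (RandomIntercept)
                      R = 500,      # number of simulation draws
                      halton = NA)  # use Halton sequences for the draws
summary(mixed_model)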

Extract Model for Specific Factor

Say I've fit a model as follows: fit = lm(Y ~ X + Dummy1 + Dummy2)
How can I extract the regression for a specific dummy variable?
I'm hoping to do something like the following to plot all the regressions:
plot(...)
abline(extracted.lm.dummy1)
abline(extracted.lm.dummy2)
I would look into the sjPlot package; see the documentation for sjp.lm, which can be used to visualize linear models in various ways. The package also has some nice tools for tabular summaries of models.
An example:
library(sjPlot)
library(dplyr)
# add a second categorical variable to the iris dataset,
# then fit a linear model
set.seed(123)
fit <- iris %>%
  mutate(Category = factor(sample(c("A", "B"), 150, replace = TRUE))) %>%
  lm(Sepal.Length ~ Sepal.Width + Species + Category, data = .)
Different kinds of plots include a marginal effects plot (probably closest to what you want):
sjp.lm(fit, type = "eff", vars = c("Category", "Species"))
and a "forest plot" (beta coefficients plus confidence intervals):
sjp.lm(fit)
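If plain abline() calls are all that's needed, the level-specific lines can also be pulled straight from coef() (a sketch using the fit from above; coefficient names such as "CategoryB" depend on your factor levels, and each line holds the other factor at its reference level):
cf <- coef(fit)
plot(Sepal.Length ~ Sepal.Width, data = iris)
# reference level: Species setosa, Category A
abline(a = cf["(Intercept)"], b = cf["Sepal.Width"])
# shift the intercept by the dummy coefficient for Category B
abline(a = cf["(Intercept)"] + cf["CategoryB"], b = cf["Sepal.Width"], lty = 2)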
