Plot interaction effect in SEM model with observed variables in R

I am estimating an SEM model that has observed variables. I am using SEM to handle missing data using FIML. My model has an interaction term to test for moderation. Here is a toy example that illustrates the issue.
library(lavaan)
library(car)
library(dplyr)
data(starwars)
sw2 <- starwars %>% mutate(
male = Recode(sex, "'male' = 1; NA=NA; else = 0"),
human = Recode(species, "'Human' = 1; NA=NA; else = 0"),
maleXby = male * birth_year,
)
mod <- 'mass ~ height + human + male + birth_year + maleXby'
fit <- sem(mod, data = sw2, missing="fiml.x")
summary(fit)
What I want to do is plot the interaction term, like a margins plot, to visualize the interaction effect. But packages like interactions do not work with an object of class lavaan. How could I visualize this? Is there a package (like interactions) that makes this easier?

You could fit this model using lm(), but I think you want to be able to use FIML estimates, yes? In that case, you could use the emmeans package, which can work on lavaan-class objects if you have the semTools package loaded.
You didn't say which predictor was focal vs. moderator, but I assume you want to treat male as moderator because it is a grouping variable. The example below can be adapted by switching their roles in the pairs() function, as well as by selecting different birth_year levels at= which to probe the effect of male. When birth_year is the focal predictor, its linear effect will be the same regardless of which levels are chosen, so I chose the full range() below.
library(emmeans)
library(semTools)
## for ease of use, fit model using colon operator
mod <- 'mass ~ height + human + male + birth_year + male:birth_year'
fit <- sem(mod, data = sw2, missing = "fiml.x")
## calculate expected marginal means for multiple
## levels of male (1:0) and birth_year
BYrange <- range(sw2$birth_year, na.rm = TRUE)
em.mass <- emmeans(fit, specs = ~ birth_year | male,
at = list(male = 1:0, birth_year = BYrange),
# because SEMs can have multiple DVs:
lavaan.DV = "mass")
em.mass
## probe effect of year across sex
rbind(pairs(em.mass))
## plot effect of year across sex
emmip(em.mass, male ~ birth_year) # 2 lines in same plot
emmip(em.mass, ~ birth_year | male) # in separate panels

Related

Cannot fit multilevel ordinal logit model using clmm

I'm trying to fit a multilevel (random effects) ordered logit model using the ordinal package, but I keep running into this error:
Error in region:country1 : NA/NaN argument
Here's my simplified model. I'm regressing an indicator of happiness on a number of variables, including class, gender, age, etc. There are two nested levels: regions within countries.
library(ordinal)
# Set as factor
data$happiness <- as.factor(data$happiness)
# Remove NA
missing_country1 <- is.na(data$country1)
data <- data[!missing_country1, ]
missing_region <- is.na(data$region)
data <- data[!missing_region, ]
# Model
model1 <- clmm(happiness ~ age + gender + class + (1 | country1 / region),
data = data,
na.action = na.omit
)
I have removed all NA and NaN from both country1 and region.
Thanks,
Figured it out: it was because ordinal doesn't automatically convert the grouping variables to factor, so you need to do it manually.
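A minimal sketch of that fix, using simulated data because the original data set is not shown. The column names follow the question; the simulated values, the sample size, and the check that ordinal is installed are assumptions made for illustration only.

```r
# Simulated stand-in for the asker's data (all values are made up).
set.seed(42)
ci <- rep(1:5, each = 80)               # country index
ri <- rep(rep(1:4, each = 20), 5)       # region index within country
n  <- length(ci)
age <- round(runif(n, 18, 80))
country_eff <- rnorm(5, sd = 1)
region_eff  <- matrix(rnorm(20, sd = 0.5), nrow = 5, ncol = 4)
latent <- 0.02 * age + country_eff[ci] + region_eff[cbind(ci, ri)] + rlogis(n)

data <- data.frame(
  happiness = cut(latent,
                  breaks = c(-Inf, quantile(latent, c(0.25, 0.5, 0.75)), Inf),
                  labels = 1:4, ordered_result = TRUE),
  age       = age,
  country1  = LETTERS[ci],   # character, not factor
  region    = ri,            # integer, not factor
  stringsAsFactors = FALSE
)

# The fix: convert the grouping variables to factors before fitting.
data$country1 <- factor(data$country1)
data$region   <- factor(data$region)

# Fit only if the ordinal package is available.
if (requireNamespace("ordinal", quietly = TRUE)) {
  model1 <- ordinal::clmm(happiness ~ age + (1 | country1 / region),
                          data = data)
  print(model1)
}
```

With both grouping columns stored as factors, the nested term (1 | country1 / region) can be expanded into country and region-within-country groupings.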

Non-parametric bootstrapping to generate 95% Confidence Intervals for fixed effect coefficients calculated by a glmer with nested random effects

I have an R coding question.
This is my first time asking a question here, so apologies if I am unclear or do something wrong.
I am trying to use a Generalized Linear Mixed Model (GLMM) with Poisson error family to test for any significant effect on a count response variable by three separate dichotomous variables (AGE = ADULT or JUVENILE, SEX = MALE or FEMALE and MEDICATION = NEW or OLD) and an interaction between AGE and MEDICATION (AGE:MEDICATION).
There is some dependency in my data in that the data was collected from a total of 22 different sites (coded as SITE vector with 33 distinct levels), and over a total of 21 different years (coded as YEAR vector with 21 distinct levels, and treated as a categorical variable). Unfortunately, not every SITE was sampled in every YEAR, with some being sampled for a greater number of years than others.
The data is also quite sparse, in that I do not have a great number of measurements of the response variable (coded as COUNT and an integer vector) per SITE per YEAR.
My Poisson GLMM is constructed using the following code:
model <- glmer(data = mydata,
family = poisson(link = "log"),
formula = COUNT ~ SEX + SEX:MEDICATION + AGE + AGE:SEX + MEDICATION + AGE:MEDICATION + (1|SITE/YEAR),
offset = log(COUNT.SAMPLE.SIZE),
nAGQ = 0)
In order to try and obtain more reliable estimates for the fixed effect coefficients (particularly given the sparse nature of my data), I am trying to obtain 95% confidence intervals for the fixed effect coefficients through non-parametric bootstrapping.
I have come across the "glmmboot" package, which can be used to conduct non-parametric bootstrapping of GLMMs. However, I run into a problem when I try to run the non-parametric bootstrapping using the following code:
library(glmmboot)
bootstrap_model(base_model = model,
base_data = mydata,
resamples = 1000)
When I run this code, I receive the following message:
Performing case resampling (no random effects)
Naturally, though, my model does have random effects, namely (1|SITE/YEAR).
If I try to tell the function to resample from a specific block, by adding in the "resample_specific_blocks" argument, i.e.:
library(glmmboot)
bootstrap_model(base_model = model,
base_data = mydata,
resamples = 1000,
resample_specific_blocks = "YEAR")
Then I get the following error message:
Performing block resampling, over SITE
Error: Invalid grouping factor specification, YEAR:SITE
I get a similar error message if I try to set 'resample_specific_blocks' to "SITE".
If I then try to set 'resample_specific_blocks' to "SITE:YEAR" or "SITE/YEAR" I get the following error message:
Error in bootstrap_model(base_model = model, base_data = mydata, resamples = 1000, :
No random columns from formula found in resample_specific_blocks
I have tried explicitly nesting YEAR within SITE and then adapting the model accordingly using the code:
mydata <- within(mydata, SAMPLE <- factor(SITE:YEAR))
model.refit <- glmer(data = mydata,
family = poisson(link = "log"),
formula = COUNT ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|SITE) + (1|SAMPLE),
offset = log(COUNT.SAMPLE.SIZE),
nAGQ = 0)
bootstrap_model(base_model = model.refit,
base_data = mydata,
resamples = 1000,
resample_specific_blocks = "SAMPLE")
But unfortunately I just get this error message:
Error: Invalid grouping factor specification, SITE
The same error message comes up if I set resample_specific_blocks argument to SITE, or if I just remove the resample_specific_blocks argument.
I believe that the case_bootstrap() function found in the lmeresampler package could potentially be another option, but when I look into the help for it it looks like I would need to create a function and I unfortunately have no experience with creating my own functions within R.
If anyone has any suggestions on how I can get the bootstrap_model() function in the glmmboot package to recognise the random effects in my model/dataframe, or any suggestions for alternative methods on conducting non-parametric bootstrapping to create 95% confidence intervals for the coefficients of the fixed effects in my model, it would be greatly appreciated! Many thanks in advance, and for reading such a lengthy question!
For reference, I include links to the RDocumentation and GitHub for the glmmboot package:
https://www.rdocumentation.org/packages/glmmboot/versions/0.6.0
https://github.com/ColmanHumphrey/glmmboot
The following is code that will allow for creation of a reproducible example using the data set from lme4::grouseticks
#Load in required packages
library(tidyverse)
library(lme4)
library(glmmboot)
library(psych)
#Load in the grouseticks dataframe
data("grouseticks")
tibble(grouseticks)
#Create dummy vectors for SEX, AGE and MEDICATION
set.seed(1)
SEX <-sample(1:2, size = 403, replace = TRUE)
SEX <- as.factor(ifelse(SEX == 1, "MALE", "FEMALE"))
set.seed(2)
AGE <- sample(1:2, size = 403, replace = TRUE)
AGE <- as.factor(ifelse(AGE == 1, "ADULT", "JUVENILE"))
set.seed(3)
MEDICATION <- sample(1:2, size = 403, replace = TRUE)
MEDICATION <- as.factor(ifelse(MEDICATION == 1, "OLD", "NEW"))
grouseticks$SEX <- SEX
grouseticks$AGE <- AGE
grouseticks$MEDICATION <- MEDICATION
#Use the INDEX vector to create a vector of sample sizes per LOCATION
#per YEAR
grouseticks$INDEX <- 1
sample.sizes <- grouseticks %>%
group_by(LOCATION, YEAR) %>%
summarise(SAMPLE.SIZE = sum(INDEX))
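#Combine the dataframes together into the dataframe to be used in the
#model by joining the per-LOCATION, per-YEAR sample sizes onto grouseticks
mydata <- left_join(grouseticks, sample.sizes, by = c("LOCATION", "YEAR"))
mydata$SAMPLE.SIZE <- as.integer(mydata$SAMPLE.SIZE)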
#Create the Poisson GLMM model
model <- glmer(data = mydata,
family = poisson(link = "log"),
formula = TICKS ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|LOCATION/YEAR),
nAGQ = 0)
#Attempt non-parametric bootstrapping on the model to get 95%
#confidence intervals for the coefficients of the fixed effects
set.seed(1)
Model.bootstrap <- bootstrap_model(base_model = model,
base_data = mydata,
resamples = 1000)
Model.bootstrap
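One alternative that does not depend on glmmboot recognising the random effects is to write the cluster (case) bootstrap by hand. The idea is to resample whole LOCATION clusters with replacement, refit the model on each resample, and take percentile intervals of the fixed effects. The sketch below is not the glmmboot or lmeresampler API. The helper names fit_fun and cluster_boot, the simplified fixed-effect part (cHEIGHT only), the choice to resample at the LOCATION level, and B = 200 are all assumptions made for illustration.

```r
library(lme4)
data("grouseticks", package = "lme4")

# Hypothetical helper: refit the Poisson GLMM on a given data frame.
fit_fun <- function(d) {
  suppressMessages(
    glmer(TICKS ~ cHEIGHT + (1 | LOCATION / YEAR),
          data = d, family = poisson(link = "log"), nAGQ = 0)
  )
}

# Hypothetical helper: resample whole clusters with replacement and refit.
cluster_boot <- function(data, cluster, fit_fun, B = 200) {
  ref <- fixef(fit_fun(data))   # template for the coefficient vector
  rows_by_id <- split(seq_len(nrow(data)), data[[cluster]], drop = TRUE)
  replicate(B, {
    draw <- sample(names(rows_by_id), replace = TRUE)
    boot <- do.call(rbind, lapply(seq_along(draw), function(i) {
      d <- data[rows_by_id[[draw[i]]], , drop = FALSE]
      d[[cluster]] <- i         # relabel so repeated clusters stay distinct
      d
    }))
    boot[[cluster]] <- factor(boot[[cluster]])
    # If a refit fails, record NAs instead of stopping the whole loop.
    tryCatch(fixef(fit_fun(boot)), error = function(e) ref * NA_real_)
  })
}

set.seed(1)
B <- 200
boot_coefs <- cluster_boot(grouseticks, "LOCATION", fit_fun, B = B)

# Percentile 95% confidence intervals, one column per fixed effect.
ci <- apply(boot_coefs, 1, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
print(ci)
```

Resampling at the highest grouping level keeps each site's years together, which respects the nesting. Whether that is the right level for a given design is a judgment call, and more resamples than 200 would normally be used for reported intervals.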

Survminer - include subset of variables in plot

Let's say I want to plot the survival curves using a model of the lung data, that controls for sex and a median split of the age variable (I could also control linearly for age and that would make my problem even worse).
I would like to make a plot of this model only showing the stratification between the levels of the sex factor. If I do what seems to be the standard, however, I get 4 instead of two survival curves.
library(survival)
library(survminer)
library(dplyr) # provides %>% and mutate()
reg_lung <- lung %>% mutate(age_cat = ifelse(age > 63, "old", "young"))
lung_fit <- survfit(Surv(time, status) ~ age_cat + sex, data = reg_lung)
ggsurvplot(lung_fit, data = reg_lung)
[Image: resulting survival plot]
That is to say, I would like to see the difference sex makes while holding the influence of age fixed (either as factor old/young or linearly).
You can fit your model with coxph and define sex as strata:
lung_fit <- coxph(Surv(time, status) ~ age_cat + strata(sex), data = reg_lung)
ggsurvplot(survfit(lung_fit), data = reg_lung)

Extract Model for Specific Factor

Say I've fit a model as follows fit = lm(Y ~ X + Dummy1 + Dummy2)
How can I extract the regression for a specific dummy variable?
I'm hoping to do something like the following to plot all the regressions:
plot(...)
abline(extracted.lm.dummy1)
abline(extracted.lm.dummy2)
I would look into the sjPlot package. See the documentation for sjp.lm, which can be used to visualize linear models in various ways (in recent sjPlot releases, sjp.lm() has been superseded by plot_model()). The package also has some nice tools for tabular summaries of models.
An example:
library(sjPlot)
library(dplyr)
# add a second categorical variable to the iris dataset
# then generate a linear model
set.seed(123)
fit <- iris %>%
mutate(Category = factor(sample(c("A", "B"), 150, replace = TRUE))) %>%
lm(Sepal.Length ~ Sepal.Width + Species + Category, data = .)
Different kinds of plot include:
Marginal effects plot, probably closest to what you want
sjp.lm(fit, type = "eff", vars = c("Category", "Species"))
"Forest plot" (beta coefficients + confidence interval)
sjp.lm(fit)
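If the goal is only to draw one fitted line per dummy level with abline(), as in the question, the intercepts can also be assembled directly from coef(). This is a minimal base-R sketch, assuming an additive model with no interaction, so every level shares the same slope. The iris data stand in for the asker's Y, X and dummies.

```r
# Additive model: one common slope, one intercept shift per Species level.
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
b <- coef(fit)

# Intercept for the reference level, then reference + dummy coefficient.
intercepts <- c(
  setosa     = b[["(Intercept)"]],
  versicolor = b[["(Intercept)"]] + b[["Speciesversicolor"]],
  virginica  = b[["(Intercept)"]] + b[["Speciesvirginica"]]
)
slope <- b[["Sepal.Width"]]

# Scatter plot coloured by level, with one regression line per level.
plot(Sepal.Length ~ Sepal.Width, data = iris, col = as.integer(iris$Species))
for (i in seq_along(intercepts)) {
  abline(a = intercepts[[i]], b = slope, col = i)
}
```

If the model included an X-by-dummy interaction, the slope would also need a per-level adjustment in the same way as the intercept.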

How to reverse log transformation when presenting moderation effect from linear regression models in R?

## load the dataset
library(car)
library(texreg)
library(effects)
library(psych)
Prestige$lnincome <- log(Prestige$income)
PrestigeSubset <- Prestige[!rownames(Prestige) %in% c("MINISTERS","BABYSITTERS","NEWSBOYS"), ]
m1 <- lm(lnincome ~ prestige + women + education + type, data = PrestigeSubset)
m2 <- lm(lnincome ~ prestige*women + education + type, data = PrestigeSubset)
anova(m1, m2)
# Analysis of Variance Table
# Model 1: lnincome ~ prestige + women + education + type
# Model 2: lnincome ~ prestige * women + education + type
# Res.Df RSS Df Sum of Sq F Pr(>F)
# 1 91 4.3773
# 2 90 4.2661 1 0.11127 2.3473 0.129
plot(effect("prestige:women", m2), xlevels = list(women = c(0, 25, 50, 75, 97.51)), multiline = T)
I first fit a regression model m1, then another regression model with an interaction term (profession prestige and share of women in that profession). Comparing the two models shows whether the interaction is significant (this gives the same p-value as the interaction term in summary(m2)). The plot shows the effect of profession prestige on profession income at different levels of the share of women, using the Prestige dataset (for practice). The idea is that professions with fewer women would be rewarded less (to illustrate the gender equality issue).
The income (y axis) here is actually natural log transformed. But I want to present the original income. How would I do this?
In addition, the interaction term is actually not significant here, but when I used the original (untransformed) income it was significant. I know there is a moderation effect of "women" (share of women) on "prestige → income". Is there any way to resolve the inconsistency?
I've solved the first question. Adding some arguments to the effect() function does it:
plot(effect("prestige:women", m2, xlevels = list(women = c(0, 25, 50, 75, 97.51)),
transformation = list(link = log, inverse = exp)),
multiline = T, type = "response", add = T)
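The same back-transformation can be done by hand with predict() and exp(), which makes it explicit what the plot is showing. This is a minimal base-R sketch using the built-in mtcars data as a stand-in for Prestige. The variable choices (mpg, hp, wt) and the moderator levels are assumptions for illustration, not the asker's model.

```r
# Log-transformed outcome with an interaction, analogous to m2.
m <- lm(log(mpg) ~ hp * wt, data = mtcars)

# Grid: focal predictor over its range, moderator at three chosen levels.
wt_levels <- unname(quantile(mtcars$wt, c(0.1, 0.5, 0.9)))
grid <- expand.grid(
  hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50),
  wt = wt_levels
)

# Predict on the log scale, then back-transform with exp().
grid$log_fit <- predict(m, newdata = grid)
grid$mpg_fit <- exp(grid$log_fit)

# One line per moderator level, on the original (untransformed) scale.
plot(range(grid$hp), range(grid$mpg_fit), type = "n",
     xlab = "hp", ylab = "mpg (back-transformed)")
for (i in seq_along(wt_levels)) {
  sub <- grid[grid$wt == wt_levels[i], ]
  lines(sub$hp, sub$mpg_fit, col = i)
}
legend("topright", legend = round(wt_levels, 2), col = seq_along(wt_levels),
       lty = 1, title = "wt")
```

Note that exp() of a predicted log value estimates the median (geometric mean) of the outcome rather than its arithmetic mean, so the curves on the original scale will usually sit below those from a model fitted to the untransformed outcome.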
