Run HLM mediation in R

I am trying to run an HLM (multilevel) mediation analysis with the "mediation" package:
med.fit <- glmer(M ~ treat + control + (1|subject_id) ,family = binomial(link = "logit"), data = R1_data)
out.fit <- glmer(Y ~ M+ treat + control+ (1 + M|subject_id),family = binomial(link = "logit"), data = R1_data)
med.out <- mediate(med.fit, out.fit, treat = "treat", mediator = "M", sims = 1000)
I got this error message:
Error in `[.data.frame`(y.data, int.term.name[p]) : undefined columns selected
How can I solve this problem?
Here is the original data and code:
names(R1_data)
[1] "subject_id"
[3] "Presented_is_solvable"
[5] "JOS"
[17] "Answer_JOS"
[23] "Matrix_Z_score"
library(mediation)
library(lme4)
med.fit <- glmer(JOS ~ Matrix_Z_score + Presented_is_solvable + (1|subject_id) ,family = binomial(link = "logit"), data = R1_data)
out.fit <- glmer(Answer_JOS ~ JOS + Matrix_Z_score +Presented_is_solvable + (1 + JOS|subject_id),family = binomial(link = "logit"), data = R1_data)
med.out <- mediate(med.fit, out.fit, treat = "Matrix_Z_score", mediator = "JOS", sims = 1000)

I figured out that this happens when the treatment or mediator variable is stored as a factor in R. The mediate function cannot locate those variables by name in the fitted models, because the models display them as the variable name followed by the factor level.
The solution is to make sure those variables are stored as integers. For reference, look at how the variables are classified in the student data set that ships with the mediation package.
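As a minimal sketch of that check and conversion (the data frame d and its columns are made up for illustration and are not taken from R1_data), one could inspect the classes and convert factor columns as follows. Going through as.character keeps the original 0/1 values rather than the internal level codes.

```r
# Hypothetical toy data: treatment and mediator stored as factors
d <- data.frame(
  treat = factor(c("0", "1", "1", "0")),
  M     = factor(c("1", "0", "1", "1"))
)

# Inspect how each column is classified
sapply(d, class)

# Convert factor -> character -> integer so the original 0/1 values survive
# (as.integer() applied directly to a factor returns the level codes 1, 2, ...)
d$treat <- as.integer(as.character(d$treat))
d$M     <- as.integer(as.character(d$M))

# Both columns should now be reported as "integer"
sapply(d, class)
```

The models would then need to be refitted on the converted data before calling mediate again.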

Related

svyglm - how to code for a logistic regression model across all variables?

In R, to include all variables in a GLM you can simply use a dot (.), as shown in "How to succinctly write a formula with many variables from a data frame?",
for example:
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
However, I am struggling to do this with svydesign. I have many explanatory variables plus an ID variable and a weight variable, so first I create my survey design:
des <-svydesign(ids=~id, weights=~wt, data = df)
Then I try creating my binomial model using weights:
binom <- svyglm(y~.,design = des, family="binomial")
But I get the error:
Error in svyglm.survey.design(y ~ ., design = des, family = "binomial") :
all variables must be in design = argument
What am I doing wrong?
You typically wouldn't want to do this, because "all the variables" would include design metadata such as weights, cluster indicators, stratum indicators, etc.
You can use colnames to extract all the variable names from a design object and then reformulate, probably after subsetting the names, e.g. with the api example in the package:
> all_the_names <- colnames(dclus1)
> all_the_actual_variables <- all_the_names[c(2, 11:37)]
> reformulate(all_the_actual_variables,"y")
y ~ stype + pcttest + api00 + api99 + target + growth + sch.wide +
comp.imp + both + awards + meals + ell + yr.rnd + mobility +
acs.k3 + acs.46 + acs.core + pct.resp + not.hsg + hsg + some.col +
col.grad + grad.sch + avg.ed + full + emer + enroll + api.stu
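The same idea can be sketched without the survey package, using only base R. The column names below (id, wt, y, x1, x2, x3) are assumptions that mirror the question's description, not real data; the point is just to drop the design metadata by name before building the formula.

```r
# Hypothetical column names: id and wt are design metadata, y is the outcome
all_the_names <- c("id", "wt", "y", "x1", "x2", "x3")

# Keep only the actual predictors by removing metadata and the response
predictors <- setdiff(all_the_names, c("id", "wt", "y"))

# Build the model formula from the remaining names
f <- reformulate(predictors, response = "y")
print(f)

# With a real design object, the formula could then be passed on, e.g.
#   svyglm(f, design = des, family = quasibinomial())
# (not run here, since des is not available in this sketch)
```

Selecting by name rather than by position, as with setdiff here, tends to be less fragile if the column order of the data changes.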

Cross Validation Structural Equation Modeling

Not sure why it is difficult to find info on this topic.
I want to cross-validate (CV) my SEM model (N = 360). I've pulled 70% of the data into a training set and built the model, first on theory and then using modification indices. I also have a test data frame with the observed values (for well-being), and I want to use the model to predict those values. lavPredict only seems to predict values of latent variables. Perhaps I'm missing something, but this doesn't seem as straightforward as in lmer or basic linear regression. Does one just use the model fit indices from the test dataset? It seems one should be able to compare observed and predicted values in SEM.
I've included some data here: https://drive.google.com/file/d/1AX50DFNik30Qsyiyp6XnPMETNfVXK83r/view?usp=sharing
Here is the final model I arrived at using the training dataset. When I go to test it, I just get this:
Error in lavPredict(fit.latent.8, newdata = test) :
inherits(object, "lavaan") is not TRUE
Thanks much!
fit.latent.8 <- '#factor loadings; measurement model portion
pl =~ exercisescore + mindfulnessscore + promistscore
sl =~ family_support + friendshipcount + friendshipnet + sense_of_community
trauma =~ neglectscore + abusescore + exposure + family_support + age + sesscore
#regressions: structural model
wellbeing ~ age + gender + ethnicity + sesscore + resiliencescore + pl + emotionalsupportscore + trauma
resiliencescore ~ age + sesscore + emotionalsupportscore + pl
emotionalsupportscore ~ sl + gender
#Covariances
friendshipnet~~age
friendshipnet ~~ abusescore
'
train.1 <- sem(fit.latent.8, data = train, meanstructure = TRUE, std.lv = TRUE)
summary(train.1, fit.measures = TRUE,standardized = TRUE, rsquare = TRUE, estimates = FALSE)
modindices(train.1, sort. = TRUE, minimum.value = 10)
test.1 <- sem(fit.latent.8, data = test, meanstructure = TRUE, std.lv = TRUE)
summary(test.1, fit.measures = TRUE,standardized = TRUE, rsquare = TRUE, estimates = FALSE)

R: Undefined column error when using mediate in a user made function

I am running a series of mediation analyses using R's mediation package. Because the models are extremely similar to each other, I wrote a function where all that would change would be the mediating variable, the outcome variable, and the data set. The function is below:
library(mediation)
data("framing", package = "mediation")
covList <- list("age", "educ", "gender", "income")
meBrokenFunction <- function(MEDIATOR, OUTCOME, DATA) {
treatOnMed <- lm(DATA[[MEDIATOR]] ~ treat + age + educ + gender + income, data = DATA)
medOnOut <- glm(DATA[[OUTCOME]] ~ DATA[[MEDIATOR]] + treat + age + educ + gender + income, data = DATA, family = binomial("probit"))
expt <- mediate(treatOnMed, medOnOut, sims = 100,
treat = "treat", mediator = MEDIATOR,
covariates = covList, robustSE = TRUE)
expt
}
set.seed(2019)
test_first <- meBrokenFunction("emo", "cong_mesg", framing)
When I run this function I get the following error:
Error in `[.data.frame`(y.data, , mediator) : undefined columns selected
However if I run the code without using the function I wrote, everything works as intended.
test_treatOnMed <- lm(emo ~ treat + age + educ + gender + income,
data = framing)
test_treatOnOut <- glm(cong_mesg ~ treat + age + educ + gender + income,
data = framing, family = binomial("probit"))
test_medOnOut <- glm(cong_mesg ~ emo + treat + age + educ + gender + income,
data = framing, family = binomial("probit"))
test_second <- mediate(test_treatOnMed, test_medOnOut, sims = 100,
treat = "treat", mediator = "emo",
covariates = covList, robustSE = TRUE)
The error appears to be in the mediate function, specifically at mediator = MEDIATOR but I do not understand why it is not working or if I am approaching the problem incorrectly.
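In the formula, we may need paste instead of DATA[[MEDIATOR]]. With DATA[[MEDIATOR]], the model term is literally named DATA[[MEDIATOR]] rather than emo, so mediate cannot find a column called "emo" in the model data.
lm(paste(MEDIATOR, "~ treat + age + educ + gender + income"), data = DATA)
The same applies to the glm call.
Full code: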
meFixedFunction <- function(MEDIATOR, OUTCOME, DATA) {
treatOnMed <- lm(paste(MEDIATOR,
"~ treat + age + educ + gender + income"), data = DATA)
medOnOut <- glm(paste(OUTCOME, "~", MEDIATOR,
"+ treat + age + educ + gender + income"), data = DATA,
family = binomial("probit"))
expt <- mediate(treatOnMed, medOnOut, sims = 100,
treat = "treat", mediator = MEDIATOR,
covariates = covList, robustSE = TRUE)
expt
}
Testing:
set.seed(2019)
test_first <- meFixedFunction("emo", "cong_mesg", framing)
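To see why the formula matters, here is a small base R sketch that compares the term names produced by the two approaches. It uses the built-in mtcars data and two throwaway helper functions (fit_bad and fit_good are names invented for this illustration), so it does not depend on the mediation package.

```r
# Helper that indexes the data inside the formula (the problematic pattern)
fit_bad <- function(OUTCOME, PREDICTOR, DATA) {
  lm(DATA[[OUTCOME]] ~ DATA[[PREDICTOR]], data = DATA)
}

# Helper that builds the formula from the variable names (the paste pattern)
fit_good <- function(OUTCOME, PREDICTOR, DATA) {
  lm(as.formula(paste(OUTCOME, "~", PREDICTOR)), data = DATA)
}

bad  <- fit_bad("mpg", "wt", mtcars)
good <- fit_good("mpg", "wt", mtcars)

# Columns of the model frame: the first fit has no column named "wt"
names(model.frame(bad))
names(model.frame(good))

# Coefficient names show the same difference
names(coef(bad))
names(coef(good))
```

Any function that later looks up a variable by its name in the fitted model, as mediate does with the mediator argument, can only succeed with the second form.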

Visualize Multilevel Growth Model with nlme/ggplot2 vs lme4/ggplot2

I am trying to visualize the results of an nlme object without success. When I do so with an lmer object, the correct plot is created. My goal is to use nlme and visualize a fitted growth curve for each individual with ggplot2. The predict() function seems to work differently with nlme and lmer objects.
model:
#AR1 with REML
autoregressive <- lme(NPI ~ time,
data = data,
random = ~time|patient,
method = "REML",
na.action = "na.omit",
control = list(maxIter=5000, opt="optim"),
correlation = corAR1())
nlme visualization attempt:
data <- na.omit(data)
data$patient <- factor(data$patient,
levels = 1:23)
ggplot(data, aes(x=time, y=NPI, colour=factor(patient))) +
geom_point(size=1) +
#facet_wrap(~patient) +
geom_line(aes(y = predict(autoregressive,
level = 1)), size = 1)
when I use:
data$fit<-fitted(autoregressive, level = 1)
geom_line(aes(y = fitted(autoregressive), group = patient))
it returns the same fitted values for each individual and so ggplot produces the same growth curve for each. Running test <-data.frame(ranef(autoregressive, level=1)) returns varying intercepts and slopes by patient id. Interestingly, when I fit the model with lmer and run the below code it returns the correct plot. Why does predict() work differently with nlme and lmer objects?
timeREML <- lmer(NPI ~ time + (time | patient),
data = data,
REML=T, na.action=na.omit)
ggplot(data, aes(x = time, y = NPI, colour = factor(patient))) +
geom_point(size=3) +
#facet_wrap(~patient) +
geom_line(aes(y = predict(timeREML)))
In creating a reproducible example, I found that the error was not occurring in predict() nor in ggplot() but instead in the lme model.
Data:
###libraries
library(nlme)
library(tidyr)
library(ggplot2)
###example data
df <- data.frame(replicate(78, sample(seq(from = 0,
to = 100, by = 2), size = 25,
replace = F)))
##add id
df$id <- 1:nrow(df)
##rearrange cols
df <- df[c(79, 1:78)]
##sort columns
df[,2:79] <- lapply(df[,2:79], sort)
##long format
df <- gather(df, time, value, 2:79)
##convert time to numeric
df$time <- factor(df$time)
df$time <- as.numeric(df$time)
##order by id, time, value
df <- df[order(df$id, df$time),]
##order value
df$value <- sort(df$value)
Model 1 with no NA values fits successfully.
###model1
model1 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxIter=5000, opt="optim"),
correlation = corAR1(0, form=~time|id,
fixed=F))
Introducing an NA causes a coefficient-matrix invertibility error in model 1.
###model 1 with one NA value
df[3,3] <- NA
model1 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxIter=2000, opt="optim"),
correlation = corAR1(0, form=~time|id,
fixed=F))
But not in model 2, which has a simpler within-group AR(1) correlation structure.
###but not in model2
model2 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxIter=2000, opt="optim"),
correlation = corAR1(0, form = ~1 | id))
However, changing opt="optim" to opt="nlminb" fits model 1 successfully.
###however changing the opt to "nlminb", model 1 runs
model3 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxIter=2000, opt="nlminb"),
correlation = corAR1(0, form=~time|id,
fixed=F))
The code below visualizes model 3 (formerly model 1) successfully.
df <- na.omit(df)
ggplot(df, aes(x=time, y=value)) +
geom_point(aes(colour = factor(id))) +
#facet_wrap(~id) +
geom_line(aes(y = predict(model3, level = 0)), size = 1.3, colour = "black") +
geom_line(aes(y = predict(model3, level = 1), group = id, colour = factor(id)), size = 1)
Note that I am not exactly sure what changing the optimizer from "optim" to "nlminb" does and why it works.
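The two level values used in the plot can be illustrated with a small sketch on the Orthodont data that ships with nlme (this data set and the model below are stand-ins chosen for illustration, not the patient data from the question). With level = 0, predict returns population-level predictions from the fixed effects only; with level = 1, it adds each subject's random effects.

```r
library(nlme)

# Random intercept and slope for each subject, fitted on example data
fit <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

p0 <- predict(fit, level = 0)  # fixed effects only: same curve for everyone
p1 <- predict(fit, level = 1)  # fixed + random effects: one curve per subject

# Compare the predictions at a single age across all subjects
at_age_8 <- Orthodont$age == 8
n_unique_pop  <- length(unique(round(p0[at_age_8], 6)))
n_unique_subj <- length(unique(round(p1[at_age_8], 6)))

n_unique_pop   # a single shared value at the population level
n_unique_subj  # more than one value, because subjects differ
```

If level = 1 predictions come out identical for every individual, as described in the question, that suggests the random effects were estimated as essentially zero, which points back to the model fit rather than to predict or ggplot.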

R Predict using multiple models

I am new to R and trying to predict outcomes on a dataset using 4 different GLMs. I have tried running them as one large model, and while I do get results, the model doesn't converge properly and I end up with NAs. I therefore have four models:
model_team <- glm(mydata$OUT ~ TEAM + OPPONENT, family = "binomial",data = mydata )
model_conf <- glm(mydata$OUT ~ TCONF + OCONF, family = "binomial",data = mydata)
model_tstats <- glm(mydata$OUT ~ TPace + TORtg + TFTr + T3PAr + TTS. + TTRB. + TAST. + TSTL. + TBLK. + TeFG. + TTOV. + TORB. + TFT.FGA, family = "binomial",data = mydata)
model_ostats<- glm(mydata$OUT ~ OPace + OORtg + OFTr + O3PAr + OTS. + OTRB. + OAST. + OSTL. + OBLK. + OeFG. + OTOV. + OORB. + OFT.FGA, family = "binomial",data = mydata)
I then want to predict the outcomes using a different data set using the four models
predict(model_team, model_conf, model_tstats, model_ostats, fix, level = 0.95, type = "probs")
Is there a way to use all four models without joining them into one large model?
I don't really understand why you are trying to do what you are doing. I also don't have any example data that is a representation of the data you are working with. However, below is an example of how you could combine multiple GLMs into one using the resulting coefficients. Note that this will not work well if you have multicollinearity between the variables in your dataset.
# I used the iris dataset for my example
head(iris)
# Run several models
model1 <- glm(data = iris, Sepal.Length ~ Sepal.Width)
model2 <- glm(data = iris, Sepal.Length ~ Petal.Length)
model3 <- glm(data = iris, Sepal.Length ~ Petal.Width)
# Get combined intercept
# (mean() averages only its first argument, so wrap the values in c())
intercept <- mean(c(
coef(model1)['(Intercept)'],
coef(model2)['(Intercept)'],
coef(model3)['(Intercept)']))
# Extract coefficients as a 3 x 1 column matrix
coefs <- as.matrix(
c(coef(model1)[2],
coef(model2)[2],
coef(model3)[2]))
# Get the feature values for the predictions
ds <- as.matrix(iris[,c('Sepal.Width', 'Petal.Length', 'Petal.Width')])
# Linear algebra: Matrix-multiply values with coefficients
prediction <- ds %*% coefs + intercept
# Let's look at the results
plot(iris$Petal.Length, prediction)
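A different way to combine the models, shown here only as a sketch and not as the method described above, is to let each model make its own prediction on the new data and then average those predictions. This swaps in simple prediction averaging for the coefficient-combining step; the iris models are the same stand-ins as before, and iris itself plays the role of the new data set.

```r
# Fit the same three example models
model1 <- glm(Sepal.Length ~ Sepal.Width, data = iris)
model2 <- glm(Sepal.Length ~ Petal.Length, data = iris)
model3 <- glm(Sepal.Length ~ Petal.Width, data = iris)

models <- list(model1, model2, model3)

# One column of predictions per model, one row per observation
preds <- sapply(models, predict, newdata = iris, type = "response")

# Unweighted average across the three models
avg_pred <- rowMeans(preds)

head(avg_pred)
```

For binomial models such as those in the question, type = "response" would return probabilities, so the average would be an average probability. Whether equal weights are appropriate depends on the data, and this is not something the sketch settles.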
