I wrote a function to run several lavaan models at once, one per dataset (5 datasets in total), and the output gives me the 5 model summaries. However, I would like to extract one specific estimate from each of these models, because I am using them in a meta-analysis (and I have many more models).
Here is my code for running the model:
library(lavaan)

df_list <- list('Y1' = emo_dyn_1, 'Y2' = emo_dyn_2, 'Y3' = emo_dyn_3,
                'Y4' = emo_dyn_4, 'Y5' = emo_dyn_5)
model <- 'DepB ~ isdNA + imeanNA + sex + age'

fun <- function(emo_dyn) {
  fit <- sem(model,
             data = emo_dyn,
             estimator = "MLR",
             missing = "ml.x")
  summ <- summary(fit, standardized = TRUE)
  list(fit = fit, summary = summ)
}
results <- lapply(df_list,fun)
names(results) <- names(df_list)
results
And this is how I extract the coefficient: standardizedSolution() more or less turns the results into a data frame, from which I then extract the specific value. I am not sure that is the best option. It is about the standardized estimate of a specific path, but right now it is all copy and paste, and I am sure this can be done more easily; I just don't know how to write the loop.
emo_dyn_1_est <- standardizedSolution(results$Y1$fit) # standardized coefficients
emo_dyn_1_est_1 <- emo_dyn_1_est[1, 4]
emo_dyn_1_est_1

emo_dyn_2_est <- standardizedSolution(results$Y2$fit) # standardized coefficients
emo_dyn_2_est_2 <- emo_dyn_2_est[1, 4]
emo_dyn_2_est_2

emo_dyn_3_est <- standardizedSolution(results$Y3$fit) # standardized coefficients
emo_dyn_3_est_3 <- emo_dyn_3_est[1, 4]
emo_dyn_3_est_3

emo_dyn_4_est <- standardizedSolution(results$Y4$fit) # standardized coefficients
emo_dyn_4_est_4 <- emo_dyn_4_est[1, 4]
emo_dyn_4_est_4

emo_dyn_5_est <- standardizedSolution(results$Y5$fit) # standardized coefficients
emo_dyn_5_est_5 <- emo_dyn_5_est[1, 4]
emo_dyn_5_est_5
lavaan has the parameterEstimates function so you can do something like:
df_list <- list('Y1' = emo_dyn_1, 'Y2' = emo_dyn_2, 'Y3' = emo_dyn_3,
                'Y4' = emo_dyn_4, 'Y5' = emo_dyn_5)
model <- 'DepB ~ isdNA + imeanNA + sex + age'

fun <- function(emo_dyn) {
  fit <- sem(model,
             data = emo_dyn,
             estimator = "MLR",
             missing = "ml.x")
  fit
}
results <- lapply(df_list,fun)
names(results) <- names(df_list)
## Get a specific parameter
get_param <- function(fit, coef_pos) {
  ## "std.all" matches the est.std column that standardizedSolution() returns by default
  param <- parameterEstimates(fit, standardized = TRUE)[coef_pos, "std.all"]
  param
}
lapply(results, get_param, coef_pos = 1)
I made one change: in your lapply to get the results I only kept the model fit. If you want all the summaries you can just do lapply(results, summary). The get_param function assumes that you know the row position of the parameter you want in the parameter table.
If you want to keep your existing lapply for the results then something like this would work:
results_fit_only <- lapply(results, "[[", "fit")
lapply(results_fit_only, get_param, coef_pos = 1)
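To collect everything for the meta-analysis in one step, you could flatten the results into a single data frame; a minimal sketch along the same lines:

ests <- sapply(results_fit_only, get_param, coef_pos = 1)
data.frame(dataset = names(ests), est.std = unname(ests))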
I have a bit of an issue. I am trying to develop code that will let me do the following: 1) run a logistic regression analysis, 2) extract the estimates from the logistic regression analysis, and 3) use those estimates to create another logistic regression formula that I can use in a subsequent simulation of the original model. As I am relatively new to R, I understand I can extract these coefficients one by one through indexing, but it is difficult to scale this to models with different numbers of coefficients. I am wondering if there is a better way to extract the coefficients and set up the formula. Then I would have to develop the actual variables, and the development of these variables would have to be flexible enough for any number of variables and distributions. This appears to be easily done in Mplus (example 12.7 in the Mplus manual), but I haven't figured it out in R. Here is the code for as far as I have gotten:
# generate the data
set.seed(1)
gender <- sample(c(0,1), size = 100, replace = TRUE)
age <- round(runif(100, 18, 80))
xb <- -9 + 3.5*gender + 0.2*age
p <- 1/(1 + exp(-xb))
y <- rbinom(n = 100, size = 1, prob = p)
# grab the coefficients from the logistic regression model
matrix_coef <- summary(glm(y ~ gender + age, family = "binomial"))$coefficients
the_estimates <- matrix_coef[,1]
the_estimates
the_estimates[1]
the_estimates[2]
the_estimates[3]
I just cannot seem to figure out how to have R create the formula with the variables (x's) and the coefficients from the original model in a flexible manner that accommodates any number of variables and distributions. This is not a class assignment, but a necessary piece of the research I am producing. Any help will be greatly appreciated, and please treat this as a teaching moment. I really want to learn this.
I'm not 100% sure what your question is here.
If you want to simulate new data from the same model with the same predictor variables, you can use the simulate() method:
dd <- data.frame(y, gender, age)
## best practice when modeling in R: take the variables from a data frame
model <- glm(y ~ gender + age, data = dd, family = "binomial")
simulate(model)
You can create multiple replicates by specifying the nsim= argument (or you can simulate anew every time through a for() loop).
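For example, a quick sketch of the multi-replicate form, using the model fitted above:

sims <- simulate(model, nsim = 1000)  # data frame: one column (sim_1 ... sim_1000) per replicate
dim(sims)                             # 100 rows, 1000 columns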
If you want to simulate new data from a different set of predictor variables, you have to do a little bit more work (some model types in R have a newdata= argument, but not GLMs alas):
## simulate new model matrix (including intercept)
simdat <- cbind(1,
gender = rbinom(100, prob = 0.5, size = 1),
age = sample(18:80, size = 100, replace = TRUE))
## extract inverse-link function
invlink <- family(model)$linkinv
## sample new values
resp <- rbinom(n = 100, size = 1, prob = invlink(simdat %*% coef(model)))
If you want to do this later from coefficients that have been stored, substitute the retrieved coefficient vector for coef(model) in the code above.
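For instance, a minimal sketch assuming the coefficients were saved earlier under the hypothetical name stored_coefs:

stored_coefs <- coef(model)  # stand-in: in practice, retrieve the vector you stored
resp <- rbinom(n = 100, size = 1, prob = invlink(simdat %*% stored_coefs))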
If you want to flexibly construct formulas, reformulate() is your friend, although I don't see how it fits in here.
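For the record, a minimal illustration of reformulate(), reusing the variable names from the question:

form <- reformulate(c("gender", "age"), response = "y")
form
## y ~ gender + age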
If you want to (say) re-fit the model 1000 times to new responses simulated from the original model fit (same coefficients, same predictors: i.e. a parametric bootstrap), you can do something like this:
nsim <- 1000
res <- matrix(NA, ncol = length(coef(model)), nrow = nsim)
for (i in 1:nsim) {
  ## simulate() returns a list (in this case, of length 1);
  ## extract the response vector
  newresp <- simulate(model)[[1]]
  newfit <- update(model, newresp ~ .)
  res[i, ] <- coef(newfit)
}
You don't have to store coefficients: you can extract or compute whatever model summaries you like (change the number of columns of res accordingly).
Say your data matrix, including age and gender (or whatever predictors), is X. Then you can use X on the right-hand side of your glm formula, compute xb_hat <- X %*% the_estimates (or substitute any other data matrix for X, as long as it has the same columns), and plug xb_hat into whatever link function you want.
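A minimal sketch of that idea, reusing the gender/age data generated in the question (plogis() is the inverse-logit link):

X <- cbind(gender, age)              # predictors only, no intercept column
fit <- glm(y ~ X, family = "binomial")
xb_hat <- cbind(1, X) %*% coef(fit)  # prepend a column of 1s for the intercept
p_hat <- plogis(xb_hat)              # plug the linear predictor into the inverse link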
I am new to R and am trying to loop a mixed model across 90 columns in a dataset.
My dataset looks like the following one but has 90 predictors instead of 7, which I need to evaluate as fixed effects in consecutive models.
I then need to store the model output (coefficients and p values) so that I can finally construct a figure summarizing the effect sizes of each predictor. I am aware of the debate around p-value estimates for lme4 mixed models.
For example:
set.seed(101)
library(dplyr)  # for tibble(), %>% and arrange()

mydata <- tibble(id = rep(1:32, times = 25),
                 time = sample(1:800),
                 experiment = rep(1:4, times = 200),
                 Y = sample(1:800),
                 predictor_1 = runif(800),
                 predictor_2 = rnorm(800),
                 predictor_3 = sample(1:800),
                 predictor_4 = sample(1:800),
                 predictor_5 = 1:800,
                 predictor_6 = sample(1:800),
                 predictor_7 = runif(800)) %>%
  arrange(id, time)
The model to iterate across the N predictors is:
library(lme4)
library(lmerTest) # to obtain p values for lmer models
mixed.model <- lmer(Y ~ predictor_1 + time + (1|id) + (1|experiment), data = mydata)
summary(mixed.model)
My coding skills are far from being able to set up a loop that repeats the model across the N predictors in my dataset and stores the coefficients and p values in a dataframe.
I have been able to iterate across all the predictors fitting linear models instead of mixed models using lapply, but I have failed to apply this strategy to mixed models.
varlist <- names(mydata)[5:11]
lm_models <- lapply(varlist, function(x) {
  lm(substitute(Y ~ i, list(i = as.name(x))), data = mydata)
})
One option is to update the formula of a restricted model (without the predictor) in an lapply loop over the predictors, then summarize the resulting list and subset each coefficient matrix using a vectorized function.
library(lmerTest)

## restricted model without any predictor
mixed.model <- lmer(Y ~ time + (1|id) + (1|experiment), data = mydata)
preds <- grep('pred', names(mydata), value = TRUE)

## add one predictor at a time
fits <- lapply(preds, \(x) update(mixed.model, paste('. ~ . +', x)))

## row 3 of the coefficient matrix is the added predictor; keep estimate and p value
extract_coef_p <- Vectorize(\(x) x |> summary() |> coef() |> {\(.) .[3, c(1, 5)]}())

res <- `rownames<-`(t(extract_coef_p(fits)), preds)
res
# Estimate Pr(>|t|)
# predictor_1 -7.177579138 0.8002737
# predictor_2 -5.010342111 0.5377551
# predictor_3 -0.013030513 0.7126500
# predictor_4 -0.041702039 0.2383835
# predictor_5 -0.001437124 0.9676346
# predictor_6 0.005259293 0.8818644
# predictor_7 31.304496255 0.2511275
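Since the stated goal is a figure summarizing the effect sizes, res converts readily into plotting data; a minimal sketch, assuming ggplot2:

library(ggplot2)
res_df <- data.frame(predictor = rownames(res), res, check.names = FALSE)
ggplot(res_df, aes(x = Estimate, y = predictor)) + geom_point()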
I'm trying to compute AIC scores for several different models in a for loop.
I have created a for loop that computes the log-likelihood for each model. However, I am stuck on writing the lm() call so that it fits a model for each combination of my column LOGABUNDANCE with columns 4 to 11 of my dataframe.
This is the code I have used so far, but it gives me the same AIC score for every model.
# AIC score for every model
LL <- rep(NA, 10)
AIC <- rep(NA, 10)
for (i in 1:10) {
  mod <- lm(LOGABUNDANCE ~ ., data = butterfly)
  sigma <- as.numeric(summary(mod)[6])
  LL[i] <- sum(log(dnorm(butterfly$LOGABUNDANCE, predict(mod), sigma)))
  AIC[i] <- -2*LL[i] + 2*(2)
}
You get the same AIC for every model, because you create 10 equal models.
To make the code work, you need some way of changing the model in each iteration.
I can see two options:
Either subset the data at the start of each iteration so it only contains LOGABUNDANCE and one other variable (as suggested by @yacine-hajji in the comments), or
Create a vector of the variables you want to create models with, and use as.formula() together with paste0() to create a new formula for each iteration.
I think solution 2 is easier. Here is a working example of solution 2, using mtcars:
# AIC score for every model
LL <- rep(NA, 10)
AIC <- rep(NA, 10)

# Say I want to model all variables against `mpg`:
# create a vector of all variable names except mpg
variables <- names(mtcars)[-1]

for (i in 1:10) {
  # note how the formula is different in each iteration
  mod <- lm(
    as.formula(paste0("mpg ~ ", variables[i])),
    data = mtcars
  )
  sigma <- as.numeric(summary(mod)[6])
  LL[i] <- sum(log(dnorm(mtcars$mpg, predict(mod), sigma)))
  AIC[i] <- -2*LL[i] + 2*(2)
}
Output:
AIC
#> [1] 167.3716 168.2746 179.3039 188.8652 164.0947 202.6534 190.2124 194.5496
#> [9] 200.4291 197.2459
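As an aside, a more compact sketch of the same loop using R's built-in AIC() and reformulate(); note that the built-in AIC counts the residual standard deviation as an estimated parameter and uses the maximum-likelihood variance estimate, so its values will not match the hand-rolled formula exactly:

AICs <- sapply(variables, function(v) {
  AIC(lm(reformulate(v, response = "mpg"), data = mtcars))
})
AICs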
I am using modelsummary to display the results of several multinomial models, each pooled over 5 datasets with the mice::pool function. It works great, but I want to add the q-value / adjusted p-value for the false discovery rate.
I understand I need to create a tidy_custom.mipo function to add this statistic, but I can't get it to work.
Below is the code that builds the 'pool_univariate' list of mipo objects, which I then pass to modelsummary.
Any idea how to do that?
Thanks a lot!
# list of exposures (Cs() quotes its unquoted arguments; it comes from Hmisc)
exposures <- c(Cs(exposure1, exposure2, exposure3))

## model function: one multinomial model (nnet::multinom) per imputed dataset
models <- function(x) {
  lapply(imputed_data, function(y)
    multinom(as.formula(paste0("outcome ~ ", x)),
             data = y, model = TRUE))
}
## run models (pblapply is pbapply's lapply with a progress bar)
models_univariate <- pblapply(exposures, models)

## pool each set of fits across imputations
pool_univariate <- vector("list", length(exposures))
for (j in seq_along(exposures)) {
  pool_univariate[[j]] <- pool(models_univariate[[j]])
}
It is difficult to answer this question without a minimal working example. Here I give a simpler example than the original, for the linear regression context.
First, load the package and estimate a regression model:
library(modelsummary)
mod <- lm(mpg ~ hp + drat + vs + am, data = mtcars)
Second, since we want to summarize a model of class lm, we define a new method called tidy_custom.lm. This function takes a statistical model as input, and returns a data frame that conforms to the broom package specification, with one column called term and other columns containing matching statistics. In the current example, the data frame will include three new statistics (q.value, bonferroni and holm). These values are computed using R’s p.adjust function, which adjusts p values for multiple comparison:
tidy_custom.lm <- function(x, ...) {
  out <- broom::tidy(x)
  out$q.value <- p.adjust(out$p.value, n = 10, method = "fdr")
  out$bonferroni <- p.adjust(out$p.value, n = 10, method = "bonferroni")
  out$holm <- p.adjust(out$p.value, n = 10, method = "holm")
  return(out)
}
Now, we can call modelsummary with our lm model, and request the statistics:
modelsummary(mod, statistic = "q.value")
We can also compare different p values and label them nicely using glue strings:
modelsummary(mod,
             statistic = c(
               "p = {p.value}",
               "q = {q.value}",
               "p (Bonferroni) = {bonferroni}",
               "p (Holm) = {holm}"))
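Coming back to the original question: mice ships a tidy() method for pooled objects, so the same pattern should carry over to mipo models; an untested sketch:

tidy_custom.mipo <- function(x, ...) {
  out <- broom::tidy(x)  # dispatches to mice's tidy.mipo when mice is loaded
  out$q.value <- p.adjust(out$p.value, method = "fdr")
  out
}
modelsummary(pool_univariate, statistic = "q = {q.value}")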
The working data looks like:
set.seed(1234)
df <- data.frame(y = rnorm(1:30),
                 fac1 = as.factor(sample(c("A","B","C","D","E"), 30, replace = TRUE)),
                 fac2 = as.factor(sample(c("NY","NC","CA"), 30, replace = TRUE)),
                 x = rnorm(1:30))
The lmer model is fitted as:
library(lme4)
mixed <- lmer(y ~ x + (1|fac1) + (1|fac2), data = df)
I used bootMer to run the parametric bootstrap, and I can successfully obtain the coefficients (intercept) and SEs for the fixed and random effects:
mixed_boot_sum <- function(data) {
  s <- sigma(data)
  c(beta = getME(data, "fixef"), theta = getME(data, "theta"), sigma = s)
}
mixed_boot <- bootMer(mixed, FUN = mixed_boot_sum, nsim = 100,
                      type = "parametric", use.u = FALSE)
My first question is: how do I obtain the coefficients (slopes) of each individual level of the two random effects from the bootstrapping result mixed_boot?
I have no problem extracting the coefficients (slopes) from the mixed model by using the augment function from the broom package, see below:
library(broom)
mixed.coef <- augment(mixed, df)
However, it seems like broom can't deal with objects of class boot, so I can't use the above functions directly on mixed_boot.
I also tried to modify mixed_boot_sum by adding mmList (I thought this would be what I am looking for), but R complains:
Error in bootMer(mixed, FUN = mixed_boot_sum, nsim = 100, type = "parametric", :
bootMer currently only handles functions that return numeric vectors
Furthermore, is it possible to obtain CIs for both the fixed and random effects by specifying FUN accordingly?
I am now very confused about the correct specification of FUN to achieve what I need. Any help regarding my question would be greatly appreciated!
My first question is: how do I obtain the coefficients (slopes) of each individual level of the two random effects from the bootstrapping result mixed_boot?
I'm not sure what you mean by "coefficients (slopes) of each individual level". broom::augment(mixed, df) gives the predictions (residuals, etc.) for every observation. If you want the predicted coefficients at each level, I would try
mixed_boot_coefs <- function(fit) {
  unlist(coef(fit))
}
which for the original model gives
mixed_boot_coefs(mixed)
## fac1.(Intercept)1 fac1.(Intercept)2 fac1.(Intercept)3 fac1.(Intercept)4
## -0.4973925 -0.1210432 -0.3260958 0.2645979
## fac1.(Intercept)5 fac1.x1 fac1.x2 fac1.x3
## -0.6288728 0.2187408 0.2187408 0.2187408
## fac1.x4 fac1.x5 fac2.(Intercept)1 fac2.(Intercept)2
## 0.2187408 0.2187408 -0.2617613 -0.2617613
## ...
If you want the resulting object to be more clearly named you can use:
flatten <- function(cc) {
  setNames(unlist(cc),
           outer(rownames(cc), colnames(cc),
                 function(x, y) paste0(y, x)))
}
mixed_boot_coefs <- function(fit) {
  unlist(lapply(coef(fit), flatten))
}
When run through bootMer/confint/boot::boot.ci these functions will give confidence intervals for each of these values (note that all of the slopes facW.xZ are identical across groups because the model assumes random variation in the intercept only). In other words, whatever information you know how to extract from a fitted model (conditional modes/BLUPs [ranef], predicted intercepts and slopes for each level of the grouping variable [coef], parameter estimates [fixef, getME], random-effects variances [VarCorr], predictions under specific conditions [predict] ...) can be used in bootMer's FUN argument, as long as you can flatten its structure into a simple numeric vector.
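For instance, a minimal sketch of percentile CIs for the flattened per-level coefficients, via boot::boot.ci:

mixed_boot2 <- bootMer(mixed, FUN = mixed_boot_coefs, nsim = 100, type = "parametric")
library(boot)
boot.ci(mixed_boot2, index = 1, type = "perc")  # percentile CI for the first flattened value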