Path diagram in R

I am trying to plot a path diagram of a structural equation model (SEM) in R. The SEM was fitted with the lavaan package, and I was able to plot it using semPlot::semPaths(). The output is similar to [image].
I want a plot similar to [image], with estimates and p-values. Can anyone help me out?

My suggestion would be lavaanPlot (there is more about it on the author's personal website):
library(lavaan)
library(lavaanPlot)
# path model
model <- 'mpg ~ cyl + disp + hp
qsec ~ disp + hp + wt'
fit1 <- sem(model, data = mtcars)
labels1 <- list(mpg = "Miles Per Gallon", cyl = "Cylinders", disp = "Displacement", hp = "Horsepower", qsec = "Speed", wt = "Weight") #define labels
lavaanPlot(model = fit1, labels = labels1, coefs = TRUE, stand = TRUE, sig = 0.05) #standardized regression paths, showing only paths with p<= .05

Check this example; it might be helpful:
https://rstudio-pubs-static.s3.amazonaws.com/78926_5aa94ae32fae49f3a384ce885744ef4a.html

Related

Interpreting and plotting car::vif() with categorical variable

I am trying to use vif() from the car package to calculate VIF values after a regression based on this guide.
Without any categorical variables you get output that looks like this:
library(car)  # provides vif()
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
vif_values <- vif(model)
vif_values
#>     disp       hp       wt     drat
#> 8.209402 2.894373 5.096601 2.279547
barplot(vif_values, main = "VIF Values", horiz = TRUE, col = "steelblue")
abline(v = 5, lwd = 3, lty = 2)
However, the output changes if you add a categorical variable:
mtcars$cat <- sample(c("a", "b", "c"), size = nrow(mtcars), replace = TRUE)
model <- lm(mpg ~ disp + hp + wt + drat + cat, data = mtcars)
vif_values <- vif(model)
vif_values
#>          GVIF Df GVIF^(1/(2*Df))
#> disp 8.462128  1        2.908974
#> hp   3.235798  1        1.798832
#> wt   5.462287  1        2.337154
#> drat 2.555776  1        1.598679
#> cat  1.321969  2        1.072273
Two questions: 1. How do I interpret this different output? Is the GVIF equivalent to the numbers output in the first version? 2. How do I make a nice bar chart with this the way the guide shows?
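A minimal sketch for the second question, under stated assumptions. For terms with Df = 1 the GVIF equals the ordinary VIF. The GVIF^(1/(2*Df)) column is an adjustment that makes terms with different degrees of freedom comparable. A commonly cited suggestion is to square that column before applying the usual VIF rules of thumb, which is what the sketch does. To keep the snippet runnable without car, the matrix is rebuilt by hand from the output printed above. In practice it would come from vif(model). The name adj_vif is illustrative.

```r
# GVIF table copied from the printed output above
# (in a real session this would be: vif_values <- car::vif(model))
vif_values <- matrix(
  c(8.462128, 1, 2.908974,
    3.235798, 1, 1.798832,
    5.462287, 1, 2.337154,
    2.555776, 1, 1.598679,
    1.321969, 2, 1.072273),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("disp", "hp", "wt", "drat", "cat"),
                  c("GVIF", "Df", "GVIF^(1/(2*Df))"))
)

# Square the df-adjusted column so it sits on the same scale as an ordinary VIF
adj_vif <- vif_values[, "GVIF^(1/(2*Df))"]^2

# Same bar chart as in the guide, now with one bar per model term
barplot(adj_vif, main = "Squared GVIF^(1/(2*Df))", horiz = TRUE, col = "steelblue")
abline(v = 5, lwd = 3, lty = 2)
```

For the four numeric predictors the squared values reproduce the GVIF column, so only the bar for cat differs from a plain GVIF plot.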

persp add factor group in R

Following the margins vignette https://cran.r-project.org/web/packages/margins/vignettes/Introduction.html#Motivation I would like to know how to plot using persp after a logit containing a triple interaction.
Using persp() with what = "effect", only part of the interaction (drat and wt) is shown:
library(margins)
x1 <- lm(mpg ~ drat * wt * am, data = mtcars)
head(mtcars)
persp(x1, what = "effect")
However, I would like to see the same graph at am = 0 and at am = 1. I tried:
persp(x1,"drat","wt", at = list(am = 0:1), what = "effect")
But the same graph is produced. How can I see two graphs, one at am = 0 and one at am = 1, or at least two curves representing am = 0 and am = 1 in the same cube?
Thanks
It doesn't look like you can do it with the persp.glm() function in the margins package. You will probably have to do it "by hand".
data(mtcars)
mtcars$hihp <- as.numeric(mtcars$hp > quantile(mtcars$hp,.5))
x1 <- glm(hihp ~ drat * wt * am + disp + qsec, data = mtcars, family=binomial)
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
drat_s <- with(mtcars, seq(min(drat), max(drat),length=25))
wt_s <- with(mtcars, seq(min(wt), max(wt), length=25))
pred_fun <- function(x, y, am = 0) {
  tmp <- data.frame(drat = x, wt = y, am = am,
                    disp = mean(mtcars$disp, na.rm = TRUE),
                    qsec = mean(mtcars$qsec, na.rm = TRUE))
  predict(x1, newdata = tmp, type = "response")
}
p0 <- outer(drat_s, wt_s, pred_fun)
p1 <- outer(drat_s, wt_s, pred_fun, am = 1)
persp(drat_s, wt_s, p0, zlim = c(0, 1), theta = -80, col = rgb(.75, .75, .75, .75),
      xlab = "Axle Ratio",
      ylab = "Weight",
      zlab = "Predicted Probability")
par(new = TRUE)
persp(drat_s, wt_s, p1, zlim = c(0, 1), theta = -80, col = rgb(1, 0, 0, .75), xlab = "", ylab = "", zlab = "")
Created on 2022-05-16 by the reprex package (v2.0.1)
Edit: what if you add a factor to the model?
If we turn cyl into a factor and add it to the model, we also have to add it to the tmp data frame in pred_fun(). It must have the same properties there as it has in the data: it has to be a factor (holding a single value) with the same levels and labels as the one in the data. Here's an example:
data(mtcars)
mtcars$hihp <- as.numeric(mtcars$hp > quantile(mtcars$hp,.5))
mtcars$cyl <- factor(mtcars$cyl)
x1 <- glm(hihp ~ drat * wt * am + disp + qsec + cyl, data = mtcars, family=binomial)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
drat_s <- with(mtcars, seq(min(drat), max(drat),length=25))
wt_s <- with(mtcars, seq(min(wt), max(wt), length=25))
pred_fun <- function(x, y, am = 0) {
  tmp <- data.frame(drat = x, wt = y, am = am,
                    disp = mean(mtcars$disp, na.rm = TRUE),
                    qsec = mean(mtcars$qsec, na.rm = TRUE),
                    cyl = factor(2, levels = 1:3, labels = levels(mtcars$cyl)))
  predict(x1, newdata = tmp, type = "response")
}
p0 <- outer(drat_s, wt_s, pred_fun)
p1 <- outer(drat_s, wt_s, pred_fun, am = 1)
persp(drat_s, wt_s, p0, zlim = c(0, 1), theta = -80, col = rgb(.75, .75, .75, .75),
      xlab = "Axle Ratio",
      ylab = "Weight",
      zlab = "Predicted Probability")
par(new = TRUE)
persp(drat_s, wt_s, p1, zlim = c(0, 1), theta = -80, col = rgb(1, 0, 0, .75), xlab = "", ylab = "", zlab = "")
Created on 2022-06-06 by the reprex package (v2.0.1)
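The single-valued factor can also be built by label rather than by position, which some readers may find easier to follow. This is a small sketch using only base R. The names cyl_f, cyl_fixed and cyl_by_position are illustrative and do not appear in the answer above.

```r
# The factor as it enters the model: levels "4", "6", "8"
cyl_f <- factor(mtcars$cyl)

# A one-element factor fixed at 6 cylinders, chosen by its label
cyl_fixed <- factor("6", levels = levels(cyl_f))

# The positional form used in pred_fun(): second level, same labels
cyl_by_position <- factor(2, levels = 1:3, labels = levels(cyl_f))

# Both carry the full set of levels, which is what predict() needs
levels(cyl_fixed)
levels(cyl_by_position)
```

Either form could be dropped into the data.frame() call inside pred_fun(). The important part is that the level set matches the one used when the model was fitted.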

How to modify the plot when ggpredict() is used

The ggpredict() function is part of the ggeffects package. I would like to change the names of the coefficients in the following plot:
library(nlme)
library(ggeffects)
lme1 <- lme(mpg ~ cyl + disp + hp,
            random = ~1|disp, method = "ML", data = mtcars)
plot(ggpredict(lme1, terms = c("cyl", "disp", "hp[80,150,215]")))
The names of the x and y axes can be changed by adding labs(x = "Number of cylinders", y = "Miles/(US) gallon"):
plot(ggpredict(lme1, terms = c("cyl", "disp", "hp[80,150,215]"))) +
  labs(x = "Number of cylinders",
       y = "Miles/(US) gallon")
But how can the label of the third term, "hp", be changed to, e.g., "Gross horsepower"?
You could build the plot yourself and then use a custom labeller function. Another solution might be the ggggeffects package, which provides utilities for building your own plots.
The package maintainer has proposed two solutions, the first using ggggeffects and the second using plot() from ggeffects with a custom facet labeller:
library(ggggeffects)
#> Loading required package: ggeffects
#> Loading required package: ggplot2
library(nlme)
lme1 <- lme(mpg ~ cyl + disp + hp,
            random = ~1|disp, method = "ML", data = mtcars)
gge <- ggpredict(lme1, terms = c("cyl", "disp", "hp[80,150,215]"))
hp_labeller <- as_labeller(c("80" = "Gross horsepower = 80",
                             "150" = "Gross horsepower = 150",
                             "215" = "Gross horsepower = 215"))
autoplot(gge, labeller = labeller(hp = hp_labeller)) +
  geom_expected_line() +
  geom_CI_ribbon() +
  labs(x = "Number of cylinders",
       y = "Miles/(US) gallon")
library(ggeffects)
library(ggplot2)
library(nlme)
lme1 <- lme(mpg ~ cyl + disp + hp,
            random = ~1|disp, method = "ML", data = mtcars)
gge <- ggpredict(lme1, terms = c("cyl", "disp", "hp[80,150,215]"))
p <- plot(gge)
hp_labeller <- as_labeller(c("hp = 80" = "Gross horsepower = 80",
                             "hp = 150" = "Gross horsepower = 150",
                             "hp = 215" = "Gross horsepower = 215"))
p +
  facet_grid(cols = vars(facet),
             labeller = labeller(facet = hp_labeller))
Rename the column before plotting?
library(dplyr)
library(nlme)
library(ggeffects)
df <- mtcars
df <- df %>% rename(Horse_Power = hp)
lme1<- lme(mpg ~ cyl + disp + Horse_Power,
random = ~1|disp, method = "ML", data=df)
plot(ggpredict(lme1, terms = c("cyl", "disp", "Horse_Power[80,150,215]")))

ANOVA problems with RevoScaleR::rxGlm() in R

I build lots of GLMs, usually on large data sets with many model parameters. This means that base R's glm() function isn't really useful because it won't cope with the size/complexity, so I usually use RevoScaleR::rxGlm() instead.
However, I'd like to be able to do ANOVA tests on pairs of nested models, and I haven't found a way to do this with the model objects that rxGlm() creates, because R's anova() function won't work with them. RevoScaleR provides an as.glm() function which converts an rxGlm() object to a glm() object (sort of), but it doesn't work here.
For example:
library(dplyr)
data(mtcars)
# don't like having named rows
mtcars <- mtcars %>%
  mutate(veh_name = rownames(.)) %>%
  select(veh_name, everything())
# fit a GLM: mpg ~ everything else
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
              data = mtcars,
              family = gaussian(link = "identity"),
              trace = TRUE)
summary(glm_a1)
# fit another GLM where gear is removed
glm_a2 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + carb,
              data = mtcars,
              family = gaussian(link = "identity"),
              trace = TRUE)
summary(glm_a2)
# F test on difference
anova(glm_a1, glm_a2, test = "F")
works fine, but if instead I do:
library(dplyr)
data(mtcars)
# don't like having named rows
mtcars <- mtcars %>%
  mutate(veh_name = rownames(.)) %>%
  select(veh_name, everything())
glm_b1 <- rxGlm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
                data = mtcars,
                family = gaussian(link = "identity"),
                verbose = 1)
summary(glm_b1)
# fit another GLM where gear is removed
glm_b2 <- rxGlm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + carb,
                data = mtcars,
                family = gaussian(link = "identity"),
                verbose = 1)
summary(glm_b2)
# F test on difference
anova(as.glm(glm_b1), as.glm(glm_b2), test = "F")
I see the error message:
Error in qr.lm(object) : lm object does not have a proper 'qr'
component. Rank zero or should not have used lm(.., qr=FALSE)
The same problem cropped up in a previous SO post, "Error converting rxGlm to GLM", but it doesn't seem to have been solved.
Can anyone help, please? If as.glm() isn't going to help here, is there some other way? Could I write a custom function to do this (stretching my coding abilities to their limit, I suspect)?
Also, is SO the best forum, or would one of the other StackExchange forums be a better place to look for guidance?
Thank you.
Partial solution...
my_anova <- function(model_1, model_2, test_type)
{
  # only applies for nested GLMs. How do I test for this?
  cat("\n")
  if (test_type != "F")
  {
    cat("Invalid function call")
  }
  else
  {
    # display model formulae (taken from the arguments, not the globals glm_b1/glm_b2)
    cat("Model 1:", format(model_1$formula), "\n")
    cat("Model 2:", format(model_2$formula), "\n")
    if (model_1$df[2] < model_2$df[2]) # model 1 is big, model 2 is small
    {
      dev_s <- model_2$deviance
      df_s <- model_2$df[2]
      dev_b <- model_1$deviance
      df_b <- model_1$df[2]
    }
    else # model 2 is big, model 1 is small
    {
      dev_s <- model_1$deviance
      df_s <- model_1$df[2]
      dev_b <- model_2$deviance
      df_b <- model_2$df[2]
    }
    # named F_stat so that the built-in shorthand F (FALSE) is not masked
    F_stat <- (dev_s - dev_b) / ((df_s - df_b) * dev_b / df_b)
    # F tail probability
    # df of F: numerator: df_s - df_b
    # df of F: denominator: df_b
    p_value <- pf(F_stat, df_s - df_b, df_b, lower.tail = FALSE)
    cat("\n")
    cat("F: ", round(F_stat, 4), "\n")
    cat("Pr(>F):", round(p_value, 4))
  }
}
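One way to gain some confidence in the F formula used above is to apply it to ordinary glm() fits, where anova() is available for comparison. The sketch below does that in base R for the Gaussian models from the question. It uses deviance() and df.residual() rather than the $deviance and $df[2] fields of rxGlm objects, so it only checks the arithmetic, not the rxGlm object layout. All object names here are illustrative.

```r
# Nested Gaussian GLMs on the built-in mtcars data
fit_big <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
               data = mtcars, family = gaussian(link = "identity"))
fit_small <- update(fit_big, . ~ . - gear)

# Deviances and residual degrees of freedom of the two fits
dev_b <- deviance(fit_big)
df_b <- df.residual(fit_big)
dev_s <- deviance(fit_small)
df_s <- df.residual(fit_small)

# Same formula as in my_anova(): change in deviance per df,
# divided by the residual mean deviance of the bigger model
f_stat <- (dev_s - dev_b) / ((df_s - df_b) * dev_b / df_b)
p_val <- pf(f_stat, df_s - df_b, df_b, lower.tail = FALSE)

# Reference values from anova() for comparison
ref <- anova(fit_small, fit_big, test = "F")
all.equal(f_stat, ref$F[2])
all.equal(p_val, ref$`Pr(>F)`[2])
```

If both comparisons agree, the hand-rolled statistic matches anova() for this Gaussian case. Other families would need their own check.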

R logistic regression extracting coefficients in a loop: error with setting up loop

I'm trying to build a logistic regression model with 3 predictors, and I have a list of IDs for each predictor like below. (using mtcars dataset as an example)
var1 <- c("mpg", "cyl", "disp")
var2 <- c("mpg", "hp", "wt")
var3 <- c("drat", "wt", "gear", "carb")
I want to build multiple regression models with each of these IDs used. am is a fixed variable that I want to predict, so each of my model would look like:
mod1 <- glm(am ~ mpg + mpg + drat, data=mtcars, ...)
mod2 <- glm(am ~ mpg + mpg + wt, data=mtcars, ...)
mod3 <- glm(am ~ mpg + mpg + gear, data=mtcars, ...)
...
mod5 <- glm(am ~ mpg + hp + drat, data=mtcars, ...)
...
mod9 <- glm(am ~ mpg + wt + drat, data=mtcars, ...)
...
mod36 <- glm(am ~ disp + wt + carb, data=mtcars, ...)
So in this case it would be 3*3*4 = 36 models total. I'm trying to use apply like below.
coefs_mat <- expand.grid(var1, var2, var3)
mods = apply(coefs_mat, 1, function(row) {
glm(as.formula(am ~ row[1] + row[2] + row[3]), data = mtcars,
family = "binomial",control=list(maxit=20))
})
Edit: coefs_mat looks like below:
>coefs_mat
var1 var2 var3
1 mpg mpg drat
2 cyl mpg drat
3 disp mpg drat
4 mpg hp drat
...
36 disp wt carb
This gives the following error: "object of type 'closure' is not subsettable".
I searched for other Stackoverflow posts that had similar problems, and tried this instead:
mods = apply(coefs_mat, 1, function(row) {
glm(as.formula(paste("am~", row[1] + row[2] + row[3])), data = mtcars,
family = "binomial",control=list(maxit=20))
})
But this gave another error: "Error in row[1] + row[2] : non-numeric argument to binary operator". What's causing these errors in my code?
I solved this by using sprintf.
var1 <- c("mpg", "cyl", "disp")
var2 <- c("mpg", "hp", "wt")
var3 <- c("drat", "wt", "gear", "carb")
coefs_mat <- expand.grid(var1, var2, var3)
vars_comb <- apply(coefs_mat, 1, function(x) {
  paste(sort(x), collapse = "+")
})
formula_vec <- sprintf("am ~ %s", vars_comb)
glm_res <- lapply(formula_vec, function(x) {
  fit1 <- glm(x, data = mtcars, family = binomial("logit"))
  return(fit1)
})
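On the second error in the question: inside paste(), the expression row[1] + row[2] + row[3] tries to add character strings arithmetically, which is what "non-numeric argument to binary operator" reports. As an alternative to assembling strings with sprintf(), base R's reformulate() can build the formula objects directly. The following is a sketch rather than the original poster's method. The names grid, formulas and fits are illustrative, and unique() is an added assumption that drops repeated predictors such as mpg + mpg.

```r
var1 <- c("mpg", "cyl", "disp")
var2 <- c("mpg", "hp", "wt")
var3 <- c("drat", "wt", "gear", "carb")

# All 3 * 3 * 4 combinations, kept as character columns
grid <- expand.grid(var1, var2, var3, stringsAsFactors = FALSE)

# One formula per row; unique() removes duplicated predictor names
formulas <- lapply(seq_len(nrow(grid)), function(i) {
  predictors <- unique(unlist(grid[i, ], use.names = FALSE))
  reformulate(predictors, response = "am")
})

# Fit one logistic regression per formula
# (some fits may warn about separation or non-convergence on this small data set)
fits <- lapply(formulas, function(f) {
  glm(f, data = mtcars, family = binomial("logit"))
})

# Coefficients of each model, as a list of named numeric vectors
coefs <- lapply(fits, coef)
```

Because each element of formulas is a real formula object, it can be inspected with deparse() or passed to other modelling functions without further parsing.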
