Interpreting and plotting car::vif() with categorical variable - r

I am trying to use vif() from the car package to calculate VIF values after a regression based on this guide.
Without any categorical variables you get output that looks like this:
library(car)
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
vif_values <- vif(model)
vif_values
barplot(vif_values, main = "VIF Values", horiz = TRUE, col = "steelblue")
abline(v = 5, lwd = 3, lty = 2)
disp hp wt drat
8.209402 2.894373 5.096601 2.279547
However, the output changes if you add a categorical variable:
mtcars$cat <- sample(c("a", "b", "c"), size = nrow(mtcars), replace = TRUE)
model <- lm(mpg ~ disp + hp + wt + drat + cat, data = mtcars)
vif_values <- vif(model)
vif_values
GVIF Df GVIF^(1/(2*Df))
disp 8.462128 1 2.908974
hp 3.235798 1 1.798832
wt 5.462287 1 2.337154
drat 2.555776 1 1.598679
cat 1.321969 2 1.072273
Two questions:
1. How do I interpret this different output? Is the GVIF equivalent to the numbers output in the first version?
2. How do I make a nice bar chart with this, the way the guide shows?
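A minimal sketch of one way to read and plot this output, not a definitive answer. For a term with Df = 1 the GVIF is the ordinary VIF, so those rows read as before (the values differ slightly from the first run only because cat is now in the model). For terms with more than one Df, Fox and Monette suggest comparing GVIF^(1/(2*Df)) instead; squaring that column, which is the same as GVIF^(1/Df), puts every term on the usual VIF scale so the same cut-off line can be drawn. The matrix below is typed in from the output above so that the example runs without car; the name vif_scale is made up for illustration.

```r
# GVIF table copied from the vif() output shown in the question
gvif <- matrix(
  c(8.462128, 1, 2.908974,
    3.235798, 1, 1.798832,
    5.462287, 1, 2.337154,
    2.555776, 1, 1.598679,
    1.321969, 2, 1.072273),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("disp", "hp", "wt", "drat", "cat"),
                  c("GVIF", "Df", "GVIF^(1/(2*Df))"))
)

# GVIF^(1/Df) equals the squared third column; for Df = 1 it is the plain VIF
vif_scale <- gvif[, "GVIF"]^(1 / gvif[, "Df"])

# same plot as in the guide, now with one bar per model term
barplot(vif_scale, main = "VIF Values (GVIF^(1/Df))", horiz = TRUE, col = "steelblue")
abline(v = 5, lwd = 3, lty = 2)
```

With car loaded, replacing the hand-typed matrix by gvif <- vif(model) should give the same structure, since vif() returns a matrix with these three columns when any term has more than one Df.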

Related

persp add factor group in R

Following the margins vignette https://cran.r-project.org/web/packages/margins/vignettes/Introduction.html#Motivation I would like to know how to plot using persp after a logit containing a triple interaction.
Using only persp and effect only part of the interaction is shown (drat and wt)
library(margins)
x1 <- lm(mpg ~ drat * wt * am, data = mtcars)
head(mtcars)
persp(x1, what = "effect")
However I would like to see the same graph above but at am=0 and am=1. I tried:
persp(x1,"drat","wt", at = list(am = 0:1), what = "effect")
But the same graph is produced. How can I see two graphs, one at am=0 and one at am=1, or at least two surfaces representing am=0 and am=1 in the same cube?
Thanks
It doesn't look like you can do it with the persp.glm() function in the margins package. You will probably have to do it "by hand".
data(mtcars)
mtcars$hihp <- as.numeric(mtcars$hp > quantile(mtcars$hp,.5))
x1 <- glm(hihp ~ drat * wt * am + disp + qsec, data = mtcars, family=binomial)
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
drat_s <- with(mtcars, seq(min(drat), max(drat),length=25))
wt_s <- with(mtcars, seq(min(wt), max(wt), length=25))
pred_fun <- function(x,y, am=0){
tmp <- data.frame(drat = x, wt = y, am=am,
disp = mean(mtcars$disp, na.rm=TRUE),
qsec = mean(mtcars$qsec, na.rm=TRUE))
predict(x1, newdata=tmp, type="response")
}
p0 <- outer(drat_s, wt_s, pred_fun)
p1 <- outer(drat_s, wt_s, pred_fun, am=1)
persp(drat_s, wt_s, p0, zlim=c(0,1), theta=-80, col=rgb(.75,.75, .75, .75),
xlab = "Axle Ratio",
ylab="Weight",
zlab="Predicted Probability")
par(new=TRUE)
persp(drat_s, wt_s, p1, zlim=c(0,1), theta=-80, col=rgb(1,0,0,.75), xlab="", ylab="", zlab="")
Created on 2022-05-16 by the reprex package (v2.0.1)
Edit: what if you add a factor to the model?
If we turn cyl into a factor and add it to the model, we also have to add it to the tmp object in the pred_fun() function. It has to have the same properties there that it has in the data: it must be a factor (holding a single value) with the same levels and labels as the one in the data. Here's an example:
data(mtcars)
mtcars$hihp <- as.numeric(mtcars$hp > quantile(mtcars$hp,.5))
mtcars$cyl <- factor(mtcars$cyl)
x1 <- glm(hihp ~ drat * wt * am + disp + qsec + cyl, data = mtcars, family=binomial)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
drat_s <- with(mtcars, seq(min(drat), max(drat),length=25))
wt_s <- with(mtcars, seq(min(wt), max(wt), length=25))
pred_fun <- function(x,y, am=0){
tmp <- data.frame(drat = x, wt = y, am=am,
disp = mean(mtcars$disp, na.rm=TRUE),
qsec = mean(mtcars$qsec, na.rm=TRUE),
cyl = factor(2, levels=1:3, labels=levels(mtcars$cyl)))
predict(x1, newdata=tmp, type="response")
}
p0 <- outer(drat_s, wt_s, pred_fun)
p1 <- outer(drat_s, wt_s, pred_fun, am=1)
persp(drat_s, wt_s, p0, zlim=c(0,1), theta=-80, col=rgb(.75,.75, .75, .75),
xlab = "Axle Ratio",
ylab="Weight",
zlab="Predicted Probability")
par(new=TRUE)
persp(drat_s, wt_s, p1, zlim=c(0,1), theta=-80, col=rgb(1,0,0,.75), xlab="", ylab="", zlab="")
Created on 2022-06-06 by the reprex package (v2.0.1)
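If two separate graphs are preferred over overlaid surfaces, the same "by hand" approach can be laid out side by side. This is a rough sketch using the lm from the question and base graphics only; it plots predicted mpg rather than marginal effects, and the helper name surface_at is made up for illustration.

```r
# linear model with the triple interaction from the question
fit <- lm(mpg ~ drat * wt * am, data = mtcars)

# grid over the observed range of both continuous predictors
drat_s <- with(mtcars, seq(min(drat), max(drat), length.out = 25))
wt_s   <- with(mtcars, seq(min(wt), max(wt), length.out = 25))

# predicted surface with am held fixed at one value
surface_at <- function(am_val) {
  outer(drat_s, wt_s, function(x, y) {
    predict(fit, newdata = data.frame(drat = x, wt = y, am = am_val))
  })
}
z0 <- surface_at(0)
z1 <- surface_at(1)

# one panel per value of am, with a shared z range so heights are comparable
op <- par(mfrow = c(1, 2))
persp(drat_s, wt_s, z0, zlim = range(z0, z1), theta = -80, main = "am = 0",
      xlab = "Axle Ratio", ylab = "Weight", zlab = "Predicted mpg")
persp(drat_s, wt_s, z1, zlim = range(z0, z1), theta = -80, main = "am = 1",
      xlab = "Axle Ratio", ylab = "Weight", zlab = "Predicted mpg")
par(op)
```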

Impose Constraint on Intercept in Linear Regression Using R [duplicate]

This question already has answers here:
Force certain parameters to have positive coefficients in lm()
(3 answers)
Closed 1 year ago.
I have a linear regression of the form
Y = a + b1 * X1 + b2 * X2 + b3 * X4
I would like to constrain the intercept parameter a to be a >= 0
(i.e., a should be a non-negative value).
What are possible ways to do this in R? Specifically, I would be interested in solutions using the caret package.
Thank you for your answers.
A linear model.
m0 <- lm(wt ~ qsec + hp + disp, data = mtcars)
m0
#
# Call:
# lm(formula = wt ~ qsec + hp + disp, data = mtcars)
#
# Coefficients:
# (Intercept) qsec hp disp
# -2.450047 0.201713 0.003466 0.006755
Force the intercept to be zero.
m1 <- lm(wt ~ qsec + hp + disp - 1, data = mtcars)
m1
#
# Call:
# lm(formula = wt ~ qsec + hp + disp - 1, data = mtcars)
#
# Coefficients:
# qsec hp disp
# 0.0842281 0.0002622 0.0072967
You can use nls to apply limits to the parameters (in this case, a lower limit on the intercept).
m1n <- nls(wt ~ a + b1 * qsec + b2 * hp + b3 * disp,
data = mtcars,
start = list(a = 1, b1 = 1, b2 = 1, b3 = 1),
lower = c(0, -Inf, -Inf, -Inf), algorithm = "port")
m1n
# Nonlinear regression model
# model: wt ~ a + b1 * qsec + b2 * hp + b3 * disp
# data: mtcars
# a b1 b2 b3
# 0.0000000 0.0842281 0.0002622 0.0072967
# residual sum-of-squares: 4.926
#
# Algorithm "port", convergence message: relative convergence (4)
See here for other example solutions.
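The nls result above matches the no-intercept fit, and that is what one would expect: the least-squares objective is convex, so with a single bound on one parameter the constrained solution is either the unconstrained fit (if it already satisfies the bound) or a fit with that parameter pinned at the bound. A minimal sketch of that shortcut in plain lm(), with the object names fit_free and fit_nonneg made up for illustration:

```r
# unconstrained fit first
fit_free <- lm(wt ~ qsec + hp + disp, data = mtcars)

# if the intercept already satisfies a >= 0 keep it,
# otherwise the constrained optimum sits at a = 0, i.e. the no-intercept model
fit_nonneg <- if (coef(fit_free)[["(Intercept)"]] >= 0) {
  fit_free
} else {
  update(fit_free, . ~ . - 1)
}
coef(fit_nonneg)
```

This shortcut only covers one bounded coefficient; with bounds on several coefficients a bounded optimiser such as the nls "port" call above is the safer route.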

Path diagram in r

I am trying to plot a path diagram of a Structural Equation Model (SEM) in R. I was able to plot it using semPlot::semPaths(). The output is similar to [image not preserved]. The SEM was modeled using the lavaan package.
I want a plot similar to [image not preserved], with estimates and p-values. Can anyone help me out?
My suggestion would be lavaanPlot (see more of it on the author's personal website):
library(lavaan)
library(lavaanPlot)
# path model
model <- 'mpg ~ cyl + disp + hp
qsec ~ disp + hp + wt'
fit1 <- sem(model, data = mtcars)
labels1 <- list(mpg = "Miles Per Gallon", cyl = "Cylinders", disp = "Displacement", hp = "Horsepower", qsec = "Speed", wt = "Weight") #define labels
lavaanPlot(model = fit1, labels = labels1, coefs = TRUE, stand = TRUE, sig = 0.05) #standardized regression paths, showing only paths with p<= .05
Check this example; it might be helpful:
https://rstudio-pubs-static.s3.amazonaws.com/78926_5aa94ae32fae49f3a384ce885744ef4a.html

ANOVA problems with RevoScaleR::rxGlm() in R

I build lots of GLMs, usually on large data sets with many model parameters. Base R's glm() function isn't really useful here because it won't cope with the size/complexity, so I usually use RevoScaleR::rxGlm() instead.
However, I'd like to be able to do ANOVA tests on pairs of nested models, and I haven't found a way to do this with the model objects that rxGlm() creates, because R's anova() function won't work with them. RevoScaleR provides an as.glm() function which converts an rxGlm() object to a glm() object (sort of), but it doesn't work here.
For example:
library(dplyr)
data(mtcars)
# don't like having named rows
mtcars <- mtcars %>%
mutate(veh_name = rownames(.)) %>%
select(veh_name, everything())
# fit a GLM: mpg ~ everything else
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
summary(glm_a1)
# fit another GLM where gear is removed
glm_a2 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
summary(glm_a2)
# F test on difference
anova(glm_a1, glm_a2, test = "F")
works fine, but if instead I do:
library(dplyr)
library(RevoScaleR)
data(mtcars)
# don't like having named rows
mtcars <- mtcars %>%
mutate(veh_name = rownames(.)) %>%
select(veh_name, everything())
glm_b1 <- rxGlm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
verbose = 1)
summary(glm_b1)
# fit another GLM where gear is removed
glm_b2 <- rxGlm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + carb,
data = mtcars,
family = gaussian(link = "identity"),
verbose = 1)
summary(glm_b2)
# F test on difference
anova(as.glm(glm_b1), as.glm(glm_b2), test = "F")
I see the error message:
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
 Rank zero or should not have used lm(.., qr=FALSE).
The same problem cropped up on a previous SO posting: Error converting rxGlm to GLM but doesn't seem to have been solved.
Can anyone help please? if as.glm() isn't going to help here, is there some other way? Could I write a custom function to do this (stretching my coding abilities to their limit I suspect!)?
Also, is SO the best forum, or would one of the other StackExchange forums be a better place to look for guidance?
Thank you.
Partial solution...
my_anova <- function (model_1, model_2, test_type)
{
# only applies for nested GLMs. How do I test for this?
cat("\n")
if(test_type != "F")
{
cat("Invalid function call")
}
else
{
# display model formulae
cat("Model 1:", format(model_1$formula), "\n")
cat("Model 2:", format(model_2$formula), "\n")
if(test_type == "F")
{
if (model_1$df[2] < model_2$df[2]) # model 1 is big, model 2 is small
{
dev_s <- model_2$deviance
df_s <- model_2$df[2]
dev_b <- model_1$deviance
df_b <- model_1$df[2]
}
else # model 2 is big, model 1 is small
{
dev_s <- model_1$deviance
df_s <- model_1$df[2]
dev_b <- model_2$deviance
df_b <- model_2$df[2]
}
F <- (dev_s - dev_b) / ((df_s - df_b) * dev_b / df_b)
}
# still need to calculate the F tail probability however
# df of F: numerator: df_s - df_b
# df of F: denominator: df_b
F_test <- pf(F, df_s - df_b, df_b, lower.tail = FALSE)
cat("\n")
cat("F: ", round(F, 4), "\n")
cat("Pr(>F):", round(F_test, 4))
}
}
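One way to gain confidence in the F calculation above is to run the same arithmetic on ordinary glm() fits and compare it with anova(). The sketch below does that with base R only; f_from_deviance is a hypothetical helper, and it takes deviances and residual degrees of freedom as plain numbers, so whether rxGlm objects expose those values as model$deviance and model$df[2] (as the function above assumes) is not verified here.

```r
# F statistic and tail probability for two nested Gaussian GLMs,
# computed from deviances and residual degrees of freedom only
f_from_deviance <- function(dev_small, df_small, dev_big, df_big) {
  f_stat <- ((dev_small - dev_big) / (df_small - df_big)) / (dev_big / df_big)
  p_val  <- pf(f_stat, df_small - df_big, df_big, lower.tail = FALSE)
  c(F = f_stat, p = p_val)
}

# the two nested models from the question, fitted with base glm()
big   <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
             data = mtcars, family = gaussian(link = "identity"))
small <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + carb,
             data = mtcars, family = gaussian(link = "identity"))

res <- f_from_deviance(deviance(small), df.residual(small),
                       deviance(big), df.residual(big))
ref <- anova(small, big, test = "F")  # reference result from base R

res
ref
```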

R logistic regression extracting coefficients in a loop: error with setting up loop

I'm trying to build a logistic regression model with 3 predictors, and I have a list of IDs for each predictor like below. (using mtcars dataset as an example)
var1 <- c("mpg", "cyl", "disp")
var2 <- c("mpg", "hp", "wt")
var3 <- c("drat", "wt", "gear", "carb")
I want to build multiple regression models with each of these IDs used. am is a fixed variable that I want to predict, so each of my model would look like:
mod1 <- glm(am ~ mpg + mpg + drat, data=mtcars, ...)
mod2 <- glm(am ~ mpg + mpg + wt, data=mtcars, ...)
mod3 <- glm(am ~ mpg + mpg + gear, data=mtcars, ...)
...
mod5 <- glm(am ~ mpg + hp + drat, data=mtcars, ...)
...
mod9 <- glm(am ~ mpg + wt + drat, data=mtcars, ...)
...
mod36 <- glm(am ~ disp + wt + carb, data=mtcars, ...)
So in this case it would be 3*3*4 = 36 models total. I'm trying to use apply like below.
coefs_mat <- expand.grid(var1, var2, var3)
mods = apply(coefs_mat, 1, function(row) {
glm(as.formula(am ~ row[1] + row[2] + row[3]), data = mtcars,
family = "binomial",control=list(maxit=20))
})
Edit: coefs_mat looks like this:
> coefs_mat
Var1 Var2 Var3
1 mpg mpg drat
2 cyl mpg drat
3 disp mpg drat
4 mpg hp drat
...
36 disp wt carb
This gives the following error: "object of type 'closure' is not subsettable".
I searched for other Stackoverflow posts that had similar problems, and tried this instead:
mods = apply(coefs_mat, 1, function(row) {
glm(as.formula(paste("am~", row[1] + row[2] + row[3])), data = mtcars,
family = "binomial",control=list(maxit=20))
})
But this gave another error: "Error in row[1] + row[2] : non-numeric argument to binary operator". What's causing these errors in my code?
I solved this by using sprintf.
var1 <- c("mpg", "cyl", "disp")
var2 <- c("mpg", "hp", "wt")
var3 <- c("drat", "wt", "gear", "carb")
coefs_mat <- expand.grid(var1, var2, var3)
vars_comb <- apply(coefs_mat, 1, function(x){paste(sort(x), collapse = '+')})
formula_vec <- sprintf("am ~ %s", vars_comb)
glm_res <- lapply(formula_vec, function(x) {
fit1 <- glm(x, data = mtcars, family = binomial("logit"))
return(fit1)
})
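As for why the two attempts in the question fail: in as.formula(am ~ row[1] + row[2] + row[3]) the formula is never filled in with the strings, so glm() looks for an object literally called row and finds the base function row(), hence the "closure" error; in the second attempt, + is applied to character strings before paste() ever sees them. A hedged alternative to sprintf is reformulate() from stats, which builds the formula from a character vector. In this sketch duplicates such as mpg + mpg are collapsed with unique(), and warnings from glm() are suppressed only to keep the output short; the names grid, fits and coef_list are made up for illustration.

```r
var1 <- c("mpg", "cyl", "disp")
var2 <- c("mpg", "hp", "wt")
var3 <- c("drat", "wt", "gear", "carb")

# keep predictor names as character strings, not factors
grid <- expand.grid(var1, var2, var3, stringsAsFactors = FALSE)

fits <- lapply(seq_len(nrow(grid)), function(i) {
  # build am ~ <predictors> from the i-th row, dropping repeated names
  f <- reformulate(unique(unlist(grid[i, ])), response = "am")
  # small-sample logistic fits tend to warn about fitted probabilities of 0 or 1
  suppressWarnings(
    glm(f, data = mtcars, family = binomial, control = list(maxit = 20))
  )
})

# one coefficient vector per model
coef_list <- lapply(fits, coef)
length(coef_list)
```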
