persp add factor group in R

Following the margins vignette https://cran.r-project.org/web/packages/margins/vignettes/Introduction.html#Motivation I would like to know how to plot using persp after a logit containing a triple interaction.
Using persp with what = "effect", only part of the interaction is shown (drat and wt):
library(margins) # provides the persp() method for model objects
x1 <- lm(mpg ~ drat * wt * am, data = mtcars)
head(mtcars)
persp(x1, what = "effect")
However I would like to see the same graph above but at am=0 and am=1. I tried:
persp(x1,"drat","wt", at = list(am = 0:1), what = "effect")
But the same graph is produced. How can I see two graphs, one at am=0 and one at am=1, or at least two surfaces representing am=0 and am=1 in the same cube?
Thanks

It doesn't look like you can do it with the persp.glm() function in the margins package. You will probably have to do it "by hand".
data(mtcars)
mtcars$hihp <- as.numeric(mtcars$hp > quantile(mtcars$hp,.5))
x1 <- glm(hihp ~ drat * wt * am + disp + qsec, data = mtcars, family=binomial)
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
drat_s <- with(mtcars, seq(min(drat), max(drat),length=25))
wt_s <- with(mtcars, seq(min(wt), max(wt), length=25))
pred_fun <- function(x,y, am=0){
tmp <- data.frame(drat = x, wt = y, am=am,
disp = mean(mtcars$disp, na.rm=TRUE),
qsec = mean(mtcars$qsec, na.rm=TRUE))
predict(x1, newdata=tmp, type="response")
}
p0 <- outer(drat_s, wt_s, pred_fun)
p1 <- outer(drat_s, wt_s, pred_fun, am=1)
persp(drat_s, wt_s, p0, zlim=c(0,1), theta=-80, col=rgb(.75,.75, .75, .75),
xlab = "Axle Ratio",
ylab="Weight",
zlab="Predicted Probability")
par(new=TRUE)
persp(drat_s, wt_s, p1, zlim=c(0,1), theta=-80, col=rgb(1,0,0,.75), xlab="", ylab="", zlab="")
Created on 2022-05-16 by the reprex package (v2.0.1)
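If two separate graphs are preferred over overlaid surfaces, the same grid-prediction idea can be drawn in side-by-side panels. The following is only a sketch under assumptions: it uses the linear model from the question rather than the logit above, it plots predicted mpg rather than the marginal effect that what = "effect" shows, and the helper name surface_at is made up for illustration.

```r
# Sketch: one persp() panel per value of am (predicted values, not marginal effects).
fit <- lm(mpg ~ drat * wt * am, data = mtcars)

drat_s <- seq(min(mtcars$drat), max(mtcars$drat), length.out = 25)
wt_s   <- seq(min(mtcars$wt),   max(mtcars$wt),   length.out = 25)

# surface_at() is a hypothetical helper: predictions on the drat x wt grid at a fixed am.
surface_at <- function(am_value) {
  grid <- expand.grid(drat = drat_s, wt = wt_s)  # drat varies fastest
  grid$am <- am_value
  # column-major fill puts drat along rows and wt along columns, as persp() expects
  matrix(predict(fit, newdata = grid), nrow = length(drat_s), ncol = length(wt_s))
}

z0 <- surface_at(0)
z1 <- surface_at(1)
zlim <- range(z0, z1)  # shared z-axis so the two panels are comparable

op <- par(mfrow = c(1, 2))
persp(drat_s, wt_s, z0, zlim = zlim, theta = -80, main = "am = 0",
      xlab = "Axle Ratio", ylab = "Weight", zlab = "Predicted mpg")
persp(drat_s, wt_s, z1, zlim = zlim, theta = -80, main = "am = 1",
      xlab = "Axle Ratio", ylab = "Weight", zlab = "Predicted mpg")
par(op)
```

Sharing zlim across the panels is a design choice: without it each panel rescales its own z-axis and the two surfaces cannot be compared by eye.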
Edit: what if you add a factor to the model?
If we turn cyl into a factor and add it to the model, we also have to add it to the tmp object in the pred_fun() function. It has to have the same properties there that it has in the data: it must be a factor (holding a single value) with the same levels and labels as the one in the data. Here's an example:
data(mtcars)
mtcars$hihp <- as.numeric(mtcars$hp > quantile(mtcars$hp,.5))
mtcars$cyl <- factor(mtcars$cyl)
x1 <- glm(hihp ~ drat * wt * am + disp + qsec + cyl, data = mtcars, family=binomial)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
drat_s <- with(mtcars, seq(min(drat), max(drat),length=25))
wt_s <- with(mtcars, seq(min(wt), max(wt), length=25))
pred_fun <- function(x,y, am=0){
tmp <- data.frame(drat = x, wt = y, am=am,
disp = mean(mtcars$disp, na.rm=TRUE),
qsec = mean(mtcars$qsec, na.rm=TRUE),
cyl = factor(2, levels=1:3, labels=levels(mtcars$cyl)))
predict(x1, newdata=tmp, type="response")
}
p0 <- outer(drat_s, wt_s, pred_fun)
p1 <- outer(drat_s, wt_s, pred_fun, am=1)
persp(drat_s, wt_s, p0, zlim=c(0,1), theta=-80, col=rgb(.75,.75, .75, .75),
xlab = "Axle Ratio",
ylab="Weight",
zlab="Predicted Probability")
par(new=TRUE)
persp(drat_s, wt_s, p1, zlim=c(0,1), theta=-80, col=rgb(1,0,0,.75), xlab="", ylab="", zlab="")
Created on 2022-06-06 by the reprex package (v2.0.1)
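The single-valued factor in pred_fun() can also be written by naming the level instead of its position, which some may find easier to read. A small sketch; cyl_f, by_index and by_name are names introduced here for illustration:

```r
cyl_f <- factor(mtcars$cyl)  # levels "4", "6", "8"

# as in the code above: position 2 of 3, labelled with the data's levels
by_index <- factor(2, levels = 1:3, labels = levels(cyl_f))

# same value, selected by level name
by_name <- factor("6", levels = levels(cyl_f))

# both comparisons should agree if the two constructions are equivalent
as.character(by_index) == as.character(by_name)
identical(levels(by_index), levels(by_name))
```

Either form holds cyl at 6 cylinders; changing "6" to "4" or "8" would draw the surfaces for another group.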

Related

Interpreting and plotting car::vif() with categorical variable

I am trying to use vif() from the car package to calculate VIF values after a regression based on this guide.
Without any categorical variables you get output that looks like this:
library(car) # provides vif()
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
vif_values <- vif(model)
vif_values
barplot(vif_values, main = "VIF Values", horiz = TRUE, col = "steelblue")
abline(v = 5, lwd = 3, lty = 2)
    disp       hp       wt     drat
8.209402 2.894373 5.096601 2.279547
However, the output changes if you add a categorical variable:
mtcars$cat <- sample(c("a", "b", "c"), size = nrow(mtcars), replace = TRUE)
model <- lm(mpg ~ disp + hp + wt + drat + cat, data = mtcars)
vif_values <- vif(model)
vif_values
         GVIF Df GVIF^(1/(2*Df))
disp 8.462128  1        2.908974
hp   3.235798  1        1.798832
wt   5.462287  1        2.337154
drat 2.555776  1        1.598679
cat  1.321969  2        1.072273
Two questions: 1. How do I interpret this different output? Is the GVIF equivalent to the numbers output in the first version? 2. How do I make a nice bar chart with this the way the guide shows?
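One possible reading, offered as a sketch rather than a definitive answer: when any term has more than one degree of freedom, vif() in car returns a matrix of generalized VIFs. For terms with Df = 1 the GVIF column is the ordinary VIF. The GVIF^(1/(2*Df)) column is on a square-root scale, so squaring it gives a value that can be compared with the usual VIF rules of thumb. Assuming car is installed, a bar chart along the lines of the guide could be built from that column (set.seed(1), the copy dat and the name adj_vif are additions for illustration):

```r
library(car)

set.seed(1)  # make the random categorical variable reproducible
dat <- mtcars
dat$cat <- sample(c("a", "b", "c"), size = nrow(dat), replace = TRUE)

model <- lm(mpg ~ disp + hp + wt + drat + cat, data = dat)
gv <- vif(model)  # matrix with columns GVIF, Df, GVIF^(1/(2*Df))

# Square the adjusted column so it is comparable to an ordinary VIF
adj_vif <- gv[, "GVIF^(1/(2*Df))"]^2

barplot(adj_vif, main = "Adjusted GVIF (squared)", horiz = TRUE, col = "steelblue")
abline(v = 5, lwd = 3, lty = 2)  # same threshold line as in the guide
```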

Path diagram in r

I am trying to plot a path diagram of a Structural Equation Model (SEM) in R. I was able to plot it using semPlot::semPaths(). The SEM was modeled using the lavaan package.
I want a similar plot, but with estimates and p-values. Can anyone help me out?
My suggestion would be lavaanPlot (see more of it in the author's personal website):
library(lavaan)
library(lavaanPlot)
# path model
model <- 'mpg ~ cyl + disp + hp
qsec ~ disp + hp + wt'
fit1 <- sem(model, data = mtcars)
labels1 <- list(mpg = "Miles Per Gallon", cyl = "Cylinders", disp = "Displacement", hp = "Horsepower", qsec = "Speed", wt = "Weight") #define labels
lavaanPlot(model = fit1, labels = labels1, coefs = TRUE, stand = TRUE, sig = 0.05) #standardized regression paths, showing only paths with p<= .05
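If the numbers behind the diagram are needed as well, the estimates and p-values can be read off the fitted object with lavaan's parameterEstimates(). A minimal sketch, assuming the same model as above; the name est is introduced here:

```r
library(lavaan)

model <- 'mpg ~ cyl + disp + hp
          qsec ~ disp + hp + wt'
fit1 <- sem(model, data = mtcars)

# One row per parameter; keep only the regression paths (op == "~")
est <- parameterEstimates(fit1, standardized = TRUE)
est[est$op == "~", c("lhs", "op", "rhs", "est", "se", "pvalue", "std.all")]
```

The std.all column holds the standardized coefficients, which should correspond to what stand = TRUE draws on the plot.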
Check this example; it might be helpful:
https://rstudio-pubs-static.s3.amazonaws.com/78926_5aa94ae32fae49f3a384ce885744ef4a.html

R plot gam 3D surface to show also actual response values

I'm quite an R newbie and facing the following challenge.
I'll share my code here but applied to a different dataframe since I cannot share the original dataframe.
This is my code:
library(mgcv)
fit = gam( carb ~ te(cyl, hp, k=c(3,4)), data = mtcars)
plot(fit,rug=FALSE,pers=TRUE,theta=45,main="test")
Using my company's data, this generates a nice surface with the predicted values on the Z axis.
I would like to add the actual response values as red dots on the Z axis so that I can see where the predicted values under- or overestimate the actual response.
Would you know what parameter I should add to plot in order to do that?
Many thanks
As @李哲源 pointed out in the comments, you shouldn't use plot here, because it's not flexible enough. Here's a version based on the referenced question Rough thin-plate spline fitting (thin-plate spline interpolation) in R with mgcv.
# First, get the fit
library(mgcv)
fit <- gam( carb ~ te(cyl, hp, k=c(3,4)), data = mtcars)
# Now expand it to a grid so that persp will work
steps <- 30
cyl <- with(mtcars, seq(min(cyl), max(cyl), length = steps) )
hp <- with(mtcars, seq(min(hp), max(hp), length = steps) )
newdat <- expand.grid(cyl = cyl, hp = hp)
carb <- matrix(predict(fit, newdat), steps, steps)
# Now plot it
p <- persp(cyl, hp, carb, theta = 45, col = "yellow")
# To add the points, you need the same 3d transformation
obs <- with(mtcars, trans3d(cyl, hp, carb, p))
pred <- with(mtcars, trans3d(cyl, hp, fitted(fit), p))
points(obs, col = "red", pch = 16)
# Add segments to show where the points are in 3d
segments(obs$x, obs$y, pred$x, pred$y)
That produces the following plot:
You might not want to make predictions so far from the observed data. You can put NA values into carb to avoid that. This code does that:
exclude <- exclude.too.far(rep(cyl,steps),
rep(hp, rep(steps, steps)),
mtcars$cyl,
mtcars$hp, 0.15) # 0.15 chosen by trial and error
carb[exclude] <- NA
p <- persp(cyl, hp, carb, theta = 45, col = "yellow")
obs <- with(mtcars, trans3d(cyl, hp, carb, p))
pred <- with(mtcars, trans3d(cyl, hp, fitted(fit), p))
points(obs, col = "red", pch = 16)
segments(obs$x, obs$y, pred$x, pred$y)
That produces this plot:
Finally, you might want to use the rgl package to get a dynamic graph instead. After the same manipulations as above, use this code to do the plotting:
library(rgl)
persp3d(cyl, hp, carb, col="yellow", polygon_offset = 1)
surface3d(cyl, hp, carb, front = "lines", back = "lines")
with(mtcars, points3d(cyl, hp, carb, col = "red"))
with(mtcars, segments3d(rep(cyl, each = 2),
rep(hp, each = 2),
as.numeric(rbind(fitted(fit),
carb))))
Here's one possible view:
You can use the mouse to rotate this one if you want to see it from a different angle. One other advantage is that points that should be hidden by the surface really are hidden; in persp, they'll plot on top even if they should be behind it.
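To make under- and over-estimation easier to spot in the static persp version, the points and segments can be coloured by the sign of the residual. This is a sketch building on the code above, not part of the original answer; the names cyl_g, hp_g, z, res and pt_col, the red/blue colour choice, and the widened zlim are all additions made here.

```r
library(mgcv)

fit <- gam(carb ~ te(cyl, hp, k = c(3, 4)), data = mtcars)

# Prediction grid (names chosen here so they do not mask the mtcars columns)
steps <- 30
cyl_g <- seq(min(mtcars$cyl), max(mtcars$cyl), length.out = steps)
hp_g  <- seq(min(mtcars$hp),  max(mtcars$hp),  length.out = steps)
z <- matrix(predict(fit, expand.grid(cyl = cyl_g, hp = hp_g)), steps, steps)

# Widen zlim so observations above or below the surface stay inside the box
p <- persp(cyl_g, hp_g, z, theta = 45, col = "yellow",
           zlim = range(z, mtcars$carb),
           xlab = "cyl", ylab = "hp", zlab = "carb")

res <- mtcars$carb - fitted(fit)          # positive: the model under-predicts
pt_col <- ifelse(res > 0, "red", "blue")  # red = observed above fit, blue = below

# Project observed and fitted values with the same 3d transformation
obs  <- trans3d(mtcars$cyl, mtcars$hp, mtcars$carb, p)
pred <- trans3d(mtcars$cyl, mtcars$hp, fitted(fit), p)
segments(obs$x, obs$y, pred$x, pred$y, col = pt_col)
points(obs, col = pt_col, pch = 16)
```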

How to create prediction line for Quadratic Model

I am trying to create a quadratic prediction line for a quadratic model. I am using the Auto dataset (from the ISLR package). I had no trouble creating the prediction line for a linear model. However, the quadratic model yields crazy-looking lines. Here is my code.
# Linear Model
plot(Auto$horsepower, Auto$mpg,
main = "MPG versus Horsepower",
pch = 20)
lin_mod = lm(mpg ~ horsepower,
data = Auto)
lin_pred = predict(lin_mod)
lines(
Auto$horsepower, lin_pred,
col = "blue", lwd = 2
)
# The Quadratic model
Auto$horsepower2 = Auto$horsepower^2
quad_model = lm(mpg ~ horsepower2,
data = Auto)
quad_pred = predict(quad_model)
lines(
Auto$horsepower,
quad_pred,
col = "red", lwd = 2
)
I am 99% sure that the issue is the prediction function. Why can't I produce a neat looking quadratic prediction curve? The following code I tried does not work—could it be related?:
quad_pred = predict(quad_model, data.frame(horsepower = Auto$horsepower))
Thanks!
The issue is that the x-axis values aren't sorted. It wouldn't matter if it were a linear model, but it is noticeable with a polynomial one. I created a new sorted data set and it works fine:
library(ISLR) # To load data Auto
# Linear Model
plot(Auto$horsepower, Auto$mpg,
main = "MPG versus Horsepower",
pch = 20)
lin_mod = lm(mpg ~ horsepower,
data = Auto)
lin_pred = predict(lin_mod)
lines(
Auto$horsepower, lin_pred,
col = "blue", lwd = 2
)
# The Quadratic model
Auto$horsepower2 = Auto$horsepower^2
# Sorting Auto by horsepower2
Auto2 <- Auto[order(Auto$horsepower2), ]
quad_model = lm(mpg ~ horsepower2,
data = Auto2)
quad_pred = predict(quad_model)
lines(
Auto2$horsepower,
quad_pred,
col = "red", lwd = 2
)
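The same effect can be had without refitting on a sorted copy of the data: fit once, then reorder only when drawing. A self-contained sketch on mtcars (used here so it runs without ISLR). Note that, unlike the model above, it includes the linear term as well as the squared one, which is an assumption about what a "quadratic model" is meant to be.

```r
fit <- lm(mpg ~ hp + I(hp^2), data = mtcars)

ord <- order(mtcars$hp)  # row indices from lowest to highest hp

plot(mtcars$hp, mtcars$mpg, pch = 20,
     xlab = "Horsepower", ylab = "MPG")
# lines() connects points in the order given, so pass both vectors sorted by hp
lines(mtcars$hp[ord], fitted(fit)[ord], col = "red", lwd = 2)
```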
One option is to create the sequence of x-values for which you would like to plot the fitted lines. This can be useful if your data has a "gap" or if you wish to plot the fitted lines outside of the range of the x-variables.
# load dataset; if necessary run install.packages("ISLR")
data(Auto, package = "ISLR")
# since only 2 variables at issue, use short names
mpg <- Auto$mpg
hp <- Auto$horsepower
# fit linear and quadratic models
lmod <- lm(mpg ~ hp)
qmod <- lm(mpg ~ hp + I(hp^2))
# plot the data
plot(x=hp, y=mpg, pch=20)
# use predict() to find coordinates of points to plot
x_coords <- seq(from=floor(min(hp)), to=ceiling(max(hp)), by=1)
y_coords_lmod <- predict(lmod, newdata=data.frame(hp=x_coords))
y_coords_qmod <- predict(qmod, newdata=data.frame(hp=x_coords))
# alternatively, calculate this manually using the fitted coefficients
y_coords_lmod <- coef(lmod)[1] + coef(lmod)[2]*x_coords
y_coords_qmod <- coef(qmod)[1] + coef(qmod)[2]*x_coords + coef(qmod)[3]*x_coords^2
# add the fitted lines to the plot
points(x=x_coords, y=y_coords_lmod, type="l", col="blue")
points(x=x_coords, y=y_coords_qmod, type="l", col="red")
Alternatively, using ggplot2:
library(ggplot2)
data(Auto, package = "ISLR")
ggplot(Auto, aes(x = horsepower, y = mpg)) + geom_point() +
stat_smooth(method = "lm", formula = y ~ x, colour = "red") +
stat_smooth(method = "lm", formula = y ~ poly(x, 2), colour = "blue")

R print equation of linear regression on the plot itself

How do we print the equation of a line on a plot?
I have 2 independent variables and would like an equation like this:
y=mx1+bx2+c
where x1=cost, x2 =targeting
I can plot the best fit line, but how do I print the equation on the plot?
Maybe I can't print the 2 independent variables in one equation, but how do I do it for, say,
y=mx1+c at least?
Here is my code:
fit=lm(Signups ~ cost + targeting)
plot(cost, Signups, xlab="cost", ylab="Signups", main="Signups")
abline(lm(Signups ~ cost))
I tried to automate the output a bit:
fit <- lm(mpg ~ cyl + hp, data = mtcars)
summary(fit)
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.90833    2.19080  16.847  < 2e-16 ***
## cyl         -2.26469    0.57589  -3.933  0.00048 ***
## hp          -0.01912    0.01500  -1.275  0.21253
plot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
abline(coef(fit)[1:2])
## rounded coefficients for better output
cf <- round(coef(fit), 2)
## sign check to avoid having plus followed by minus for negative coefficients
eq <- paste0("mpg = ", cf[1],
ifelse(sign(cf[2])==1, " + ", " - "), abs(cf[2]), " cyl ",
ifelse(sign(cf[3])==1, " + ", " - "), abs(cf[3]), " hp")
## printing of the equation
mtext(eq, 3, line=-2)
Hope it helps,
alex
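The paste0() approach above hard-codes two predictors. As a sketch of how it might generalize to any number of terms, the hypothetical helper lm_equation() below (a name invented here, not from any package) builds the string from whatever coefficients the model has. It assumes the model has an intercept and at least one predictor.

```r
# lm_equation() is a made-up helper: formats a fitted lm as "y = b0 + b1 x1 - b2 x2 ..."
lm_equation <- function(fit, digits = 2) {
  cf <- round(coef(fit), digits)
  response <- all.vars(formula(fit))[1]  # name of the left-hand-side variable
  # one " + b x" or " - b x" chunk per non-intercept coefficient
  terms <- paste0(ifelse(cf[-1] < 0, " - ", " + "), abs(cf[-1]), " ", names(cf)[-1])
  paste0(response, " = ", cf[1], paste(terms, collapse = ""))
}

fit <- lm(mpg ~ cyl + hp, data = mtcars)
eq <- lm_equation(fit)
eq
# [1] "mpg = 36.91 - 2.26 cyl - 0.02 hp"

plot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
mtext(eq, 3, line = -2)
```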
You use ?text. In addition, you should not use abline(lm(Signups ~ cost)), as this is a different model (see my answer on CV here: Is there a difference between 'controling for' and 'ignoring' other variables in multiple regression). At any rate, consider:
set.seed(1)
Signups <- rnorm(20)
cost <- rnorm(20)
targeting <- rnorm(20)
fit <- lm(Signups ~ cost + targeting)
summary(fit)
# ...
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   0.1494     0.2072   0.721    0.481
# cost         -0.1516     0.2504  -0.605    0.553
# targeting     0.2894     0.2695   1.074    0.298
# ...
dev.new() # portable alternative to windows(), which only exists on Windows
plot(cost, Signups, xlab="cost", ylab="Signups", main="Signups")
abline(coef(fit)[1:2])
text(-2, -2, adj=c(0,0), labels="Signups = .15 -.15cost + .29targeting")
Here's a solution using tidyverse packages.
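The key is the broom package, which simplifies the process of extracting model data. For example:
library(dplyr)   # %>%, select(), mutate(), summarize(), pull()
library(broom)   # tidy()
library(ggplot2) # plotting further below
fit1 <- lm(mpg ~ cyl, data = mtcars)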
summary(fit1)
fit1 %>%
tidy() %>%
select(estimate, term)
Result
# A tibble: 2 x 2
  estimate term
     <dbl> <chr>
1    37.9  (Intercept)
2    -2.88 cyl
I wrote a function to extract and format the information using dplyr:
get_formula <- function(object) {
object %>%
tidy() %>%
mutate(
term = if_else(term == "(Intercept)", "", term),
sign = case_when(
term == "" ~ "",
estimate < 0 ~ "-",
estimate >= 0 ~ "+"
),
estimate = as.character(round(abs(estimate), digits = 2)),
term = if_else(term == "", paste(sign, estimate), paste(sign, estimate, term))
) %>%
summarize(terms = paste(term, collapse = " ")) %>%
pull(terms)
}
get_formula(fit1)
Result
[1] " 37.88 - 2.88 cyl"
Then use ggplot2 to plot the line and add a caption
mtcars %>%
ggplot(mapping = aes(x = cyl, y = mpg)) +
geom_point() +
geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
labs(
x = "Cylinders", y = "Miles per Gallon",
caption = paste("mpg =", get_formula(fit1))
)
Plot using geom_smooth()
This approach of plotting a line really only makes sense to visualize the relationship between two variables. As @Glen_b pointed out in the comments, the slope we get from modelling mpg as a function of cyl (-2.88) doesn't match the slope we get from modelling mpg as a function of cyl and other variables (-1.29). For example:
fit2 <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)
summary(fit2)
fit2 %>%
tidy() %>%
select(estimate, term)
Result
# A tibble: 5 x 2
  estimate term
     <dbl> <chr>
1  40.8    (Intercept)
2  -1.29   cyl
3   0.0116 disp
4  -3.85   wt
5  -0.0205 hp
That said, if you want to accurately plot the regression line for a model that includes variables that aren't included in the plot, use geom_abline() instead and get the slope and intercept using broom package functions. As far as I know, geom_smooth() formulas can't reference variables that aren't already mapped as aesthetics.
mtcars %>%
ggplot(mapping = aes(x = cyl, y = mpg)) +
geom_point() +
geom_abline(
slope = fit2 %>% tidy() %>% filter(term == "cyl") %>% pull(estimate),
intercept = fit2 %>% tidy() %>% filter(term == "(Intercept)") %>% pull(estimate),
color = "blue"
) +
labs(
x = "Cylinders", y = "Miles per Gallon",
caption = paste("mpg =", get_formula(fit2))
)
Plot using geom_abline()
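One caveat with the line above: using the raw intercept places the line where disp, wt and hp are all zero, which can fall far from the plotted points. A common alternative, sketched here in base graphics as an assumption about what is wanted, is to hold the other predictors at their sample means. The names others and adj_intercept are made up for this example.

```r
fit2 <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)
cf <- coef(fit2)

# Predictors that are in the model but not on the x-axis
others <- setdiff(names(cf), c("(Intercept)", "cyl"))

# Intercept of the mpg-vs-cyl line when the other predictors sit at their means
adj_intercept <- cf["(Intercept)"] + sum(cf[others] * colMeans(mtcars[others]))

plot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per Gallon")
abline(a = adj_intercept, b = cf["cyl"], col = "blue")
```

With this adjustment the line keeps the partial slope for cyl from the full model but passes through the point of means, so it sits among the data instead of above it. The same adj_intercept value could be passed to geom_abline(intercept = ...) in the ggplot2 version.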
