Adjusting x-axis in sjPlot::plot_model()

I want to graph an interaction effect between two variables on one outcome in R. While I can successfully produce a graph using sjPlot::plot_model(), the interaction plot does not resize when I adjust the x-axis values. Instead, the plotted lines always keep their original extent while the x- and y-axes adjust. Below is an example using the mtcars data in R.
library(sjPlot)
library(sjmisc)
library(ggplot2)
mtcars.df <- mtcars
fit <- lm(mpg ~ hp * disp, data = mtcars.df)
plot_model(fit, type = "pred", terms = c("hp", "disp"))
I can get a graph like this in my own code. However, when I attempt to alter the x- and y-axes as seen below, the grid expands, but the graph itself does not.
plot_model(fit, type = "pred", terms = c("hp", "disp"), axis.lim = list(c(0,150),c(0,200)))
Picture of successfully graphed interaction with wildly exaggerated adjustments to the axes. The graph does not extend but the grid does.
What code can I use to adjust both the lines of my interaction effect AND those of the grid? Adjusting post-hoc with
plot_model(fit, type = "pred", terms = c("hp", "disp"))+xlim(0,150)
creates the same issue.

By default, plot_model only plots interactions over the range of your original data. It's not difficult to do this directly in ggplot, though, by feeding whatever x values you want into predict:
library(ggplot2)
mtcars.df <- mtcars
fit <- lm(mpg ~ hp * disp, data = mtcars.df)
# disp levels are mean(disp) - sd(disp), mean(disp), and mean(disp) + sd(disp)
new_df <- expand.grid(hp = 0:300, disp = c(106.78, 230.72, 354.66))
predictions <- predict(fit, new_df, se.fit = TRUE)
new_df$mpg <- predictions$fit
new_df$upper <- new_df$mpg + 1.96 * predictions$se.fit
new_df$lower <- new_df$mpg - 1.96 * predictions$se.fit
new_df$disp <- factor(new_df$disp)
ggplot(new_df, aes(hp, mpg)) +
  geom_ribbon(aes(ymax = upper, ymin = lower, fill = disp), alpha = 0.3) +
  geom_line(aes(color = disp)) +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1")
Created on 2022-05-21 by the reprex package (v2.0.1)
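The three disp values in the grid above match the mean of disp minus one standard deviation, the mean, and the mean plus one standard deviation. As a minimal sketch (the names disp_levels and ci are my own, not from the answer), they can be computed rather than hard-coded, and predict can return the interval directly via interval = "confidence", which uses t quantiles instead of the 1.96 normal approximation:

```r
# Sketch: derive the moderator levels instead of hard-coding them.
fit <- lm(mpg ~ hp * disp, data = mtcars)

m <- mean(mtcars$disp)
s <- sd(mtcars$disp)
disp_levels <- round(c(m - s, m, m + s), 2)

# Prediction grid: hp from 0 to 300 in steps of 10, crossed with the disp levels
new_df <- expand.grid(hp = seq(0, 300, by = 10), disp = disp_levels)

# interval = "confidence" returns a matrix with columns fit, lwr, upr
ci <- predict(fit, newdata = new_df, interval = "confidence")
new_df <- cbind(new_df, ci)
head(new_df)
```

The resulting columns fit, lwr and upr can be used in place of mpg, lower and upper in the ggplot call.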

plot_model lets you choose the values plotted for a term by adding them in square brackets after the variable name, e.g. [min,max].
I think the easiest way would be the following:
plot_model(fit, type = "pred", terms = c("hp [0,300]", "disp"))
You can find more details here:
https://strengejacke.github.io/sjPlot/articles/plot_marginal_effects.html
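One caveat, as far as I understand the ggeffects term syntax that plot_model passes through: a comma-separated list in brackets requests exactly those values, while a colon requests a range. The sketch below assumes that reading and uses "hp [0:300]" so that predictions are computed along the whole axis rather than only at the two endpoints; please check it against the linked article for your installed version.

```r
library(sjPlot)

fit <- lm(mpg ~ hp * disp, data = mtcars)

# "hp [0:300]" asks for predictions at hp = 0, 1, ..., 300 (range syntax),
# so the confidence band is evaluated at every point, not interpolated
p <- plot_model(fit, type = "pred", terms = c("hp [0:300]", "disp"))
p
```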

Related

How to plot a subset of plot_model x-axis items in ggplot2 without filtering data before plotting?

I would like to use sjPlot::plot_model to generate some marginal effects plots that are later modified slightly. Specifically, I would like to run a regression with an ordered categorical predictor. I would then like to run plot_model on the regression object to generate a ggplot2 object but drop some elements of the categorical predictor for ease of visualization (for example, from 10 categories in the regression to 5 categories in the plot).
I know I can use ggeffects::ggpredict() to generate the underlying data used by plot_model but was hoping there was something simpler like passing an argument to scale_x_discrete().
This is the standard plot_model output. Is there a straightforward way to drop one of the x-axis elements like "6" but still plot "4" and "8"?
library(sjPlot)
mt <- mtcars
mt$cyl_fct <- as.factor(mt$cyl)
# automatic transmission vs number of cylinders
glm_out <- glm(am ~ cyl_fct, family = binomial, data = mt)
# plot_model works fine, but how do I show just 4 and 8 on the x-axis?
plot_model(glm_out, type = "eff", terms = "cyl_fct") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1))
# options like `breaks` and `limits` don't seem to do the trick
plot_model(glm_out, type = "eff", terms = "cyl_fct") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  scale_x_discrete(breaks = c("4", "8"), limits = c("4", "8"))
You can specify the terms that you want to plot in the plot_model function, e.g.
library(tidyverse)
library(sjPlot)
mt <- mtcars
mt$cyl_fct <- as.factor(mt$cyl)
# automatic transmission vs number of cylinders
glm_out <- glm(am ~ cyl_fct, family = binomial, data = mt)
# restrict the plotted levels to 4 and 8 via the terms argument
plot_model(glm_out, type = "eff", terms = "cyl_fct[4, 8]") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1))
#> Scale for 'y' is already present. Adding another scale for 'y', which will
#> replace the existing scale.
Created on 2021-07-21 by the reprex package (v2.0.0)
--
If this looks a little weird due to the "gap" where a term is supposed to be, you can then adjust the x axis scale to suit, e.g.
plot_model(glm_out, type = "eff", terms = "cyl_fct[4, 8]") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  scale_x_discrete(breaks = c(4, 8), limits = c(4, 8))

display nonsignificant coefficients with ggplot2 and legend display

I am trying to show different colors for coefficients that are significant and those that are not (p > 0.05). It would also be nice if someone has a way to show a legend or otherwise explain the colors.
Any ideas?
Sample code:
library(nycflights13)
library(dplyr)
library(dotwhisker)
library(MASS)
flights <- nycflights13::flights
flights <- sample_n(flights, 500)
m1 <- glm(formula = arr_delay ~ dep_time + origin + air_time + distance, data = flights)
#m1 <- glm(formula = arr_delay ~ ., data = flights)
m1 <- stepAIC(m1)
p <- dotwhisker::dwplot(m1)
z <- p +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_segment(aes(x = conf.low, y = term, xend = conf.high,
                   yend = term, col = p.value < 0.05)) +
  geom_point(aes(x = estimate, y = term, col = p.value < 0.05)) +
  xlab("standardized coefficient") +
  ylab("coefficient") +
  ggtitle("coefficients in the model and significance")
print(z)
Your code already kind of does what you want. The problem is that the object p produced by dwplot already has a geom_segment layer and a geom_point layer with a number of aesthetic mappings. Their colors are currently mapped to the variable model, which is just a factor level allowing for different colorings when comparing models side by side. It is possible to overwrite them though:
p$layers[[1]]$mapping[5] <- aes(color = p.value < 0.05)
p$layers[[2]]$mapping[4] <- aes(color = p.value < 0.05)
And you can change the legend label with
p$labels$colour <- "Significant"
By default, dwplot also hides the legend, but we can reset that with:
p$theme <- list()
So without adding any new geoms or creating the object z, we have:
p
Note that p is still a valid and internally consistent ggplot, so you can continue to style it as desired, for example:
p + theme_bw() + geom_vline(xintercept = 0, lty = 2)
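The layer indices used above depend on how dwplot happens to build its plot, so they may differ between dotwhisker versions. As a hedged alternative (not the method from this answer), the same kind of plot can be built from a plain coefficient table. The sketch below uses a small lm on mtcars as a stand-in for the flights model, and the column names are my own choice:

```r
library(ggplot2)

# Stand-in model on a built-in dataset (not the flights model from the question)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient table with estimates, p-values and 95% confidence limits
coefs <- as.data.frame(summary(fit)$coefficients)
names(coefs) <- c("estimate", "std.error", "statistic", "p.value")
coefs$term <- rownames(coefs)
ci <- confint(fit)                 # rows are in the same order as the coefficients
coefs$conf.low <- ci[, 1]
coefs$conf.high <- ci[, 2]
coefs$significant <- coefs$p.value < 0.05
coefs <- coefs[coefs$term != "(Intercept)", ]

# Colour is mapped to the significance flag, so the legend appears by default
p <- ggplot(coefs, aes(x = estimate, y = term, colour = significant)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_segment(aes(x = conf.low, xend = conf.high, yend = term)) +
  geom_point() +
  labs(x = "coefficient estimate", y = "term", colour = "p < 0.05")
p
```

Note that these are raw rather than standardized coefficients; the point is only that mapping colour to a logical column gives both the colouring and the legend without editing an existing plot object.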

How to add mean to grouped bwplot Lattice R

I have a grouped boxplot that shows for each category two boxes side by side (see code). Now I am interested in adding the mean for each category and box separately. I can calculate and visualize the mean for each category but not conditioned on the grouped variable "year". I tried to calculate the means for each year individually and add them separately, but that did not work.
data(mpg, package = "ggplot2")
library(latticeExtra)
tmp <- tapply(mpg$hwy, mpg$class, FUN = mean)
bwplot(class ~ hwy, data = mpg, groups = year,
       box.width = 1/3,
       panel = panel.superpose,
       panel.groups = function(x, y, ..., group.number) {
         panel.bwplot(x, y + (group.number - 1.5) / 3, ...)
         panel.points(tmp, seq(tmp), ...)
       }
)
Which produces the following plot:
The example is based on: Grouped horizontal boxplot with bwplot
Can someone show how to do this, if possible using Lattice graphics? All the plots in my master's thesis are based on it.
If you are willing to consider another option, you can try ggplot2. Here is the code; the red points mark the means:
library(ggplot2)
library(dplyr)
# Data
data(mpg, package = "ggplot2")
# Compute summary for points
Avg <- mpg %>% group_by(class, year) %>%
  summarise(Avg = mean(hwy))
# Plot
ggplot(data = mpg, aes(x = class, y = hwy, fill = factor(year))) +
  geom_boxplot(alpha = .25) +
  geom_point(data = Avg, aes(x = class, y = Avg, color = factor(year)),
             position = position_dodge(width = 0.9), show.legend = FALSE) +
  scale_color_manual(values = c('red', 'red')) +
  coord_flip() +
  labs(fill = 'Year') +
  theme_bw()
Output:
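Whichever plotting system is used, the piece missing from the question is a mean per class and per year rather than per class only. A minimal base R sketch of that step (the names means and means_long are my own) is:

```r
data(mpg, package = "ggplot2")

# Two-way table of means: rows are classes, columns are years
means <- tapply(mpg$hwy, list(class = mpg$class, year = mpg$year), mean)
means

# The same information in long format, one row per class/year combination
means_long <- aggregate(hwy ~ class + year, data = mpg, FUN = mean)
means_long
```

Each column of means could then be drawn for the matching group, using the same vertical offset that the question applies to the boxes.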

Using modelr::add_predictions for glm

I am trying to calculate the logistic regression prediction for a set of data using the tidyverse and modelr packages. Clearly I am doing something wrong in the add_predictions as I am not receiving the "response" of the logistic function as I would if I were using the 'predict' function in stats. This should be simple, but I can't figure it out and multiple searches yielded little.
library(tidyverse)
library(modelr)
options(na.action = na.warn)
library(ISLR)
d <- as_tibble(ISLR::Default)
model <- glm(default ~ balance, data = d, family = binomial)
grid <- d %>%
  data_grid(balance) %>%
  add_predictions(model)
ggplot(d, aes(x = balance)) +
  geom_point(aes(y = default)) +
  geom_line(data = grid, aes(y = pred))
predict.glm's type parameter defaults to "link", which add_predictions does not change by default, nor provide you with any way to change to the almost-certainly desired "response". (A GitHub issue exists; add your nice reprex on it if you like.) That said, it's not hard to just use predict directly within the tidyverse via dplyr::mutate.
Also note that ggplot is coercing default (a factor) to numeric in order to plot the line, which is fine, except that "No" and "Yes" are replaced by 1 and 2, while the probabilities returned by predict will be between 0 and 1. Explicitly coercing to numeric and subtracting one fixes the plot, though an extra scale_y_continuous call is required to fix the labels.
library(tidyverse)
library(modelr)
d <- as_tibble(ISLR::Default)
model <- glm(default ~ balance, data = d, family = binomial)
grid <- d %>%
  data_grid(balance) %>%
  mutate(pred = predict(model, newdata = ., type = 'response'))
ggplot(d, aes(x = balance)) +
  geom_point(aes(y = as.numeric(default) - 1)) +
  geom_line(data = grid, aes(y = pred)) +
  scale_y_continuous('default', breaks = 0:1, labels = levels(d$default))
Also note that if all you want is a plot, geom_smooth can calculate predictions directly for you:
ggplot(d, aes(balance, as.numeric(default) - 1)) +
  geom_point() +
  geom_smooth(method = 'glm', method.args = list(family = 'binomial')) +
  scale_y_continuous('default', breaks = 0:1, labels = levels(d$default))
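To make the "link" versus "response" distinction concrete, here is a small base R sketch. It uses a logistic model on mtcars as a stand-in for the ISLR data, so the variable names are assumptions for illustration only. For a binomial model with the default logit link, the response-scale predictions are the inverse logit (plogis) of the link-scale ones:

```r
# Stand-in logistic regression on a built-in dataset
model <- glm(am ~ wt, data = mtcars, family = binomial)

link <- predict(model)                     # default type = "link": log-odds
resp <- predict(model, type = "response")  # probabilities between 0 and 1

# Applying the inverse logit to the link-scale values recovers the probabilities
all.equal(unname(plogis(link)), unname(resp))
range(resp)
```

More recent modelr releases appear to accept a type argument in add_predictions, so it may be worth checking ?add_predictions in your installed version before falling back to predict.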

Plotting a number of regression lines in a single plot

How do I show 2 regression lines on the same plot?
Here are both models:
data(mtcars)
a <- lm(mpg ~ wt + hp, data = mtcars)
b <- lm(mpg ~ wt + hp + wt * hp, data = mtcars)
I plot wt on the x-axis, mpg on the y-axis and hp as the colour.
Here it is in base R:
cr <- colorRamp(c("yellow", "red"))
with(mtcars, {
  plot(wt, mpg, col = rgb(cr(hp / max(hp)), maxColorValue = 255),
       xlab = "Weight", ylab = "Miles per Gallon", pch = 20)
})
Also, please show how to accomplish this in ggplot2.
Here's the plot:
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(aes(col = hp))
p + scale_colour_gradientn(colours=c("green","black"))
Thanks in advance!
The documentation for geom_smooth practically tells you how to do this.
One can use the regression models to predict new values for y and then plot these on the same graph using geom_smooth().
Below is code for ggplot2 that produces what I think you want. The two lines overlap so much that it looks like only one line is plotted and I've set one linetype to dashed to demonstrate this.
I don't know how to achieve this in base R though.
data(mtcars)
library(ggplot2)
a <- lm(mpg~wt+hp, data = mtcars)
b <- lm(mpg~wt+hp+wt*hp, data = mtcars)
mtcars$pred.a <- predict(a)
mtcars$pred.b <- predict(b)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(col = hp)) +
  scale_colour_gradientn(colours = c("green", "black")) +
  geom_smooth(aes(x = wt, y = pred.a), method = "lm", colour = "black", fill = NA) +
  geom_smooth(aes(x = wt, y = pred.b), method = "lm", colour = "red", fill = NA, linetype = 4)
p
A base R solution:
a <- lm(mpg~wt+hp, data=mtcars)
b <- lm(mpg~wt+hp+wt*hp, data=mtcars)
wt <- mtcars[, "wt"]
idx <- sort(wt, index.return=TRUE)$ix
plot(mpg~wt, data=mtcars)
lines(wt[idx], predict(a)[idx], col="red")
lines(wt[idx], predict(b)[idx], col="blue")
However, it is not the best visualisation conceivable.
You are asking how to add a regression line, but your regression models produce a regression plane and a regression surface, both higher dimensional than a line. You can find a regression line by conditioning on a chosen value of hp, or show multiple lines for different values of hp.
Using base graphics you can use the Predict.Plot function in the TeachingDemos package to add prediction lines/curves to a plot for a fitted model (or 2). The interactive TkPredict function in the same package will let you interact with the plot to choose conditioning values, then will produce the call to Predict.Plot to create the current line. You can then combine the generated commands to include them on the same plot.
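The conditioning idea can also be sketched with plain predict, without any extra package. In the sketch below hp is held at its sample mean, which is an arbitrary choice for illustration, and the object name grid is my own:

```r
a <- lm(mpg ~ wt + hp, data = mtcars)
b <- lm(mpg ~ wt * hp, data = mtcars)   # same model as wt + hp + wt*hp

# Grid over wt with hp fixed at its mean (the conditioning value)
grid <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100),
  hp = mean(mtcars$hp)
)

plot(mpg ~ wt, data = mtcars, pch = 20,
     xlab = "Weight", ylab = "Miles per Gallon")
lines(grid$wt, predict(a, newdata = grid), col = "red")
lines(grid$wt, predict(b, newdata = grid), col = "blue", lty = 2)
legend("topright", legend = c("additive", "with interaction"),
       col = c("red", "blue"), lty = c(1, 2))
```

Repeating the two lines() calls with other hp values, for example the quartiles, shows how the interaction model's slope changes with hp while the additive model's lines stay parallel.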
