Graph GLM in ggplot2 where x variable is categorical - r

I need to graph the predicted probabilities of a logit regression in ggplot2. Essentially, I am trying to graph a glm by each treatment condition within the same graph. However, I am getting quite confused about how to do this seeing that my treat variable (i.e. the x I am interested in) is categorical.This means that when I try to graph the treatment effects using ggplot I just get a bunch of points at 0, 1, and 2 but no lines.
My question is... How could I graph the logit prediction lines in this case? Thanks in advance!
set.seed(96)
df <- data.frame(
vote = sample(0:1, 200, replace = T),
treat = sample(0:3, 200, replace = T))
glm_output <- glm(vote ~ as.factor(treat), data = df, family = binomial(link = "logit"))
predicted_vote <- predict(glm_output, newdata = df, type = "link", interval = "confidence", se = TRUE)
df <- cbind(df, data.frame(predicted_vote))

Since the explanatory variable treat is categorical, it will make more sense if you use boxplot instead like the following:
ggplot(df, aes(x = treat, y = predicted_prob)) +
geom_boxplot(aes(fill = factor(treat)), alpha = .2)
If you want to see the predicted probabilities by glm across different values of some of other explanatory variables you may try this:
ggplot(df, aes(x = treat, y = predicted_prob)) +
geom_boxplot(aes(fill = factor(treat)), alpha = .2) + facet_wrap(~gender)
# create age groups
df$age_group <- cut(df$age, breaks=seq(0,100,20))
ggplot(df, aes(x = treat, y = predicted_prob)) +
geom_boxplot(aes(fill = factor(treat)), alpha = .2) + facet_grid(age_group~gender)

Related

Plotting GAM in R: Setting custom x-axis limits?

Is there a way to set the x-axis limits when plotting the predicted fits for GAM models? More specifically, I'm fitting a smoother for each level of a factor using 'by = ', however, each factor level has a different range of values. Plotting the variable in ggplot results in an x-axis that automatically accommodates the different ranges of 'x'; however, after fitting a GAM (mgcv::gam()) the default behavior of plot.gam() appears to be predicting values across a shared x-axis limit.
The dummy data below has some continuous variable for 'x', but in my real data, 'x' is Time (year), and 'group' is sampling location. Because I did not collect data from each site across the same time range, I feel it is inappropriate to show a model fit in these empty years.
library(tidyverse)
library(mgcv)
library(gratia)
theme_set(theme_classic())
## simulate data with a grouping variable of three levels:
d = data.frame(group = rep(c('A','B','C'), each = 100),
x = c(seq(0,1,length=100),
seq(.2,1,length=100),
seq(0,.5,length=100))) %>%
mutate(y = sin(2*pi*x) + rnorm(100, sd=0.3),
group = as.factor(group))
## Look at data
ggplot(d, aes(x = x, y = y, colour = group))+
facet_wrap(~group)+
geom_point()+
geom_smooth()
Here is the raw data with loess smoother in ggplot:
## fit simple GAM with smoother for X
m1 = mgcv::gam(y ~ s(x, by = group), data = d)
## base R plot
par(mfrow = c(2,2), bty = 'l', las = 1, mai = c(.6,.6,.2,.1), mgp = c(2,.5,0))
plot(m1)
## Gavin's neat plotter
gratia::draw(m1)
Here is the predicted GAM fit that spans the same range (0,1) for all three groups:
Can I limit the prediction/plot to actual values of 'x'?
If you install the current development version (>= 0.6.0.9111) from GitHub, {gratia} will now do what you want, sort of. I added some functionality to smooth_estimates() that I had planned to add eventually but your post kicked it the top of the ToDo list and motivated me to add it now.
You can use smooth_estimates() to evaluate the smooths at the observed (or any user-supplied) data only and then a bit of ggplot() recreates most of the plot.
remotes::install_github("gavinsimpson/gratia")
library('mgcv')
library('gratia')
library('dplyr')
library('ggplot2')
d <- data.frame(group = rep(c('A','B','C'), each = 100),
x = c(seq(0,1,length=100),
seq(.2,1,length=100),
seq(0,.5,length=100))) %>%
mutate(y = sin(2*pi*x) + rnorm(100, sd=0.3),
group = as.factor(group))
m <- gam(y ~ group + s(x, by = group), data = d, method = 'REML')
sm <- smooth_estimates(m, data = d) %>%
add_confint()
ggplot(sm, aes(x = x, y = est, colour = group)) +
geom_ribbon(aes(ymin = lower_ci, ymax = upper_ci, colour = NULL, fill = group),
alpha = 0.2) +
geom_line() +
facet_wrap(~ group)

Boxplot not showing range

I have predicted values, via:
glm0 <- glm(use ~ as.factor(decision), data = decision_use, family = binomial(link = "logit"))
predicted_glm <- predict(glm0, newdata = decision_use, type = "response", interval = "confidence", se = TRUE)
predict <- predicted_glm$fit
predict <- predict + 1
head(predict)
1 2 3 4 5 6
0.3715847 0.3095335 0.3095335 0.3095335 0.3095335 0.5000000
Now when I plot a box plot using ggplot2,
ggplot(decision_use, aes(x = decision, y = predict)) +
geom_boxplot(aes(fill = factor(decision)), alpha = .2)
I get a box plot with one horizontal line per categorical variable. If you look at the predict data, it's same for each categorical variable, so makes sense.
But I want a box plot with the range. How can I get that? When I use "use" instead of predict, I get boxes stretching from end to end (1 to 0). So I suppose that's not it. Thank you in advance.
To clarify, predicted_glm includes se.fit values. I wonder how to incorporate those.
It doesn't really make sense to do a boxplot here. A boxplot shows the range and spread of a continuous variable within groups. Your dependent variable is binary, so the values are all 0 or 1. Since you are plotting predictions for each group, your plot would have just a single point representing the expected value (i.e. the probability) for each group.
The closest you can come is probably to plot the prediction with 95% confidence bars around it.
You haven't provided any sample data, so I'll make some up here:
set.seed(100)
df <- data.frame(outcome = rbinom(200, 1, c(0.1, 0.9)), var1 = rep(c("A", "B"), 100))
Now we'll create our model and get the prediction for each level of my predictor variable using the newdata parameter of predict. I'm going to specify type = "link" because I want the log odds, and I'm also going to specify se.fit = TRUE so I can get the standard error of these predictions:
mod <- glm(outcome ~ var1, data = df, family = binomial)
prediction <- predict(mod, list(var1 = c("A", "B")), se.fit = TRUE, type = "link")
Now I can work out the 95% confidence intervals for my predictions:
prediction$lower <- prediction$fit - prediction$se.fit * 1.96
prediction$upper <- prediction$fit + prediction$se.fit * 1.96
Finally, I transform the fit and confidence intervals from log odds into probabilities:
prediction <- lapply(prediction, function(logodds) exp(logodds)/(1 + exp(logodds)))
plotdf <- data.frame(Group = c("A", "B"), fit = prediction$fit,
upper = prediction$upper, lower = prediction$lower)
plotdf
#> Group fit upper lower
#> 1 A 0.13 0.2111260 0.07700412
#> 2 B 0.92 0.9594884 0.84811360
Now I am ready to plot. I will use geom_points for the probability estimates and geom_errorbars for the confidence intervals :
library(ggplot2)
ggplot(plotdf, aes(x = Group, y = fit, colour = Group)) +
geom_errorbar(aes(ymin = lower, ymax = upper), size = 2, width = 0.5) +
geom_point(size = 3, colour = "black") +
scale_y_continuous(limits = c(0, 1)) +
labs(title = "Probability estimate with 95% CI", y = "Probability")
Created on 2020-05-11 by the reprex package (v0.3.0)

How to visualise an interaction effect using ggplot with method = glmmTMB::glmmTMB and point weights

I am running an analysis in R on the effect of canopy cover (OverheadCover) and the number of carcasses placed on the same location (CarcassNumber) on the proportion of carrion eaten by birds (ProportionBirdsScavenging). The interaction effect OverheadCover * CarcassNumber is significant and I would like visualise this using ggplot like explained here: https://sebastiansauer.github.io/vis_interaction_effects/. I won't be using method = "lm" like in the example, but method = glmmTMB::glmmTMB. I've added the extra arguments formula = and method.args = to make sure R computes the smooth correctly.
This is how it should look, but I prefer the graph to be made with ggplot because then all my graphs will be in the same style.
glmm_interaction <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover * CarcassNumber + (1|Area), data = data_both, beta_family(link = "logit"), weights = pointWeight_scaled)
plot_model(glmm_interaction, type = "int", ci.lvl = 0.682) # conf. int. of 68.3% -> ± standard error
This is the code I'm trying to run, but I can't get it to work. It keeps giving me errors, like object 'pointWeight_scaled' not found. Anyone an idea what I'm doing wrong here?
qplot(x = OverheadCover, y = ProportionBirdsScavenging, color = CarcassNumber, data = data_both) +
geom_smooth(method = glmmTMB::glmmTMB,
formula = ProportionBirdsScavenging ~ OverheadCover * CarcassNumber,
method.args = list(data = data_both, beta_family(link = "logit"), weights = pointWeight_scaled))
I know that it might be easier to just individually run the models and plot them on the same graph. I've done that, and it works. However, my calculated standard errors are larger than the ones in the plot_model(), so I wanted to see how these standard errors look if R does all the work, hence my intention to plot it this way.
This is how it should look, but I prefer the graph to be made with ggplot
The plot returned by plot_model() is a ggplot-object, which you can modify as you like. You could also use the ggeffects-package, which returns the underlying data that can be used to create the plot. There are many examples in the vignettes, both on how to create own plots or how to modify plots returned by plot(), e.g. here or here.
Here is a toy example:
library(ggplot2)
library(ggeffects)
library(lme4)
#> Loading required package: Matrix
set.seed(123)
dat <- data.frame(
outcome = rbinom(n = 500, size = 1, prob = 0.25),
var_binom = as.factor(rbinom(n = 500, size = 1, prob = 0.3)),
var_cont = rnorm(n = 500, mean = 10, sd = 3),
group = sample(letters[1:4], size =500, replace = TRUE)
)
model <- glmer(
outcome ~ var_binom * poly(var_cont, 2) + (1 | group),
data = dat,
family = binomial(link = "logit")
)
predictions <- ggpredict(model, c("var_cont [all]", "var_binom"))
# plot-function from ggeffects
plot(predictions)
# self made ggplot
ggplot(
predictions,
aes(x = x, y = predicted, ymin = conf.low, ymax = conf.high, colour = group, fill = group)
) +
geom_line() +
geom_ribbon(alpha = .1, colour = NA) +
theme_minimal()
Created on 2020-02-06 by the reprex package (v0.3.0)

Plotting multiple lift curves

I am new to R and trying to learn. I am trying to plot lift curves of multiple classifiers in one graph. I can't figure out a way to do it. I know the below two classifiers are essentially the same but they both give different graphs and I just want to combine the two. Below is the code I tried. Could someone please point me in the right direction
fullmod = glm(Response ~ page_views_90d+win_visits+osx_visits+mc_1+mc_2+mc_3+mc_4+mc_5+mc_6+store_page+orders+orderlines+bookings+purchase, data=training, family=binomial)
summary(fullmod)
fullmod.results <- predict(fullmod, newdata = testing, type='response')
plotLift(fitted.results, test_data_full$class, cumulative = TRUE,col="orange", n.buckets = 5)
redmod1 = glm(Response ~ win_visits+osx_visits+mc_2+mc_4+mc_6+store_page+orders+orderlines+bookings+purchase, data=training, family=binomial)
redmod1.results <- predict(redmod1, newdata = testing, type = 'response')
plotLift(redmod1.results, test_data_full$class, cumulative = TRUE,col="orange", n.buckets = 5)
# Attempt to plot multiple classifiers
plotLift((redmod1.results, fullmod.results), test_data_full$class, cumulative = TRUE,col="orange", n.buckets = 5)
Here is a way to plot multiple lift curves using the caret library. But first some data:
set.seed(1)
for_lift <- data.frame(Class = factor(rep(1:2, each = 50)),
model1 = sort(runif(100), decreasing = TRUE),
model2 = runif(100),
model3 = runif(100))
Here the Class column is the real classes
model1 is the predicted probabilities by the first model and so on.
Now create a lift object from the data using:
library(caret)
lift_curve <- lift(Class ~ model1 + model2, data = for_lift)
and plot it
xyplot(lift_curve, auto.key = list(columns = 3))
If you would like to plot with ggplot:
library(ggplot2)
ggplot(lift_curve$data)+
geom_line(aes(CumTestedPct, CumEventPct, color = liftModelVar))+
xlab("% Samples tested")+
ylab("% Samples found")+
scale_color_discrete(guide = guide_legend(title = "method"))+
geom_polygon(data = data.frame(x = c(0, lift_curve$pct, 100, 0),
y = c(0, 100, 100, 0)),
aes(x = x, y = y), alpha = 0.1)

brms package in R smoother

I have this data frame in R:
x = rep(seq(-10,10,1),each=5)
y = rep(0,length(x) )
weights = sample( seq(1,20,1) ,length(x), replace = TRUE)
weights = weights/sum(weights)
groups = rep( letters[1:5], times =length(x)/5 )
and some data that looks like this:
library(ggplot2)
ggplot(data = dat, aes(x = x, y = y, color = group))+geom_point( aes(size = weights))+
ylab("outcome")+
xlab("predictor x1")+
geom_vline(xintercept = 0)+ geom_hline(yintercept = 0)
fit_brms = brm(y~ s(x)+(1|group), data = dat)
by_group = marginal_effects(fit_brms, conditions = data.frame(group = dat$group) ,
re_formula = NULL, method = "predict")
plot(by_group, ncol = 5, points = TRUE)
I'd like to make a hierarchical nonlinear model so that there is a different nonlinear fit for each group.
In brms I have the code below which is doing a spline fit on the x predictor with random intercepts on group the fitted line is the same for all groups. the difference is where the lines cross the y intercept. Is there a way to make the non-linear fit be different for each group's data points?
ON page 13 here : https://cran.r-project.org/web/packages/brms/vignettes/brms_multilevel.pdf
It states "As the smooth term itself cannot be modeled as varying by year in a multilevel manner,we add a basic varying intercept in an effort to account for variation between years"
So the spline will be the same for all groups it appears? The only difference in the plots is where the spline cross the y intercept. That seems very restrictive. Can this be modified to make the spline unique to each group?
Use the formula: y ~ s(x, by = group) + (1|group)

Resources