Plots with error bars using logistf object - r

I initially did a logistic model using the glm package but wanted to correct for separation so I have used the logistf function and I'm now trying to redo my plots. I'm unsure how to make a plot like the one below with a logistf object. A lot of packages don't seem to support it, I've tried using sjPlot packages' plot_model() function which plots a dot for the predicted probability but doesn't add the error bars as it does automatically with a glm object. How can I get round this? Is there perhaps another package that would make this easier or is there a way to manually add the error bars?
The code for the plot I wish to add error bars to is:
sjPlot::plot_model(lr3, type="int", mdrt.values = "meansd", show.values = TRUE, value.offset = .3)
The output of my model lr3 is:
logistf(formula = foodbank_cv ~ wave + ff_country + relevel(race_grp,
ref = "White") + sex_cv + age_r + relevel(numchildren,
ref = "None") + wave * ff_hcondhas + relevel(carer,
ref = "Not") + sempderived + wave * cd_ff_furlough +
log(ff_hours) + qual + num + relevel(keyworksector, ref = "Not keyworker") +
ca_clinvuln_dv + freemeals + ca_blbenefits1 + log(hhincome_week),
data = data, firth = TRUE, family = binomial(link = "logit"))
Model fitted by Penalized ML
Coefficients:
coef se(coef) lower 0.95 upper 0.95 Chisq p method
(Intercept) -5.237542354 0.46736532 -6.23016284 -4.30807241 Inf 0.000000e+00 2
wave5 -0.377956413 0.32598420 -1.07410577 0.28545651 1.232122e+00 2.669947e-01 2
wave7 -0.929934987 0.40813067 -1.84652632 -0.12926473 5.260388e+00 2.181615e-02 2
ff_country2 -0.118780142 0.33317501 -0.86893024 0.51197342 1.196576e-01 7.294061e-01 2
ff_country3 0.393456771 0.25097814 -0.15010616 0.88210537 2.077828e+00 1.494527e-01 2
ff_country4 -0.219066153 0.43493435 -1.23008781 0.57774984 2.481153e-01 6.184053e-01 2
relevel(race_grp, ref = "White")Asian or Asian British 0.882833792 0.22906054 0.39628625 1.33641305 1.183859e+01 5.801581e-04 2
relevel(race_grp, ref = "White")Black or Black British 1.759374627 0.27942672 1.16321835 2.29702048 2.678592e+01 2.272869e-07 2
relevel(race_grp, ref = "White")Mixed 1.786978145 0.27773294 1.19285979 2.32350705 2.763841e+01 1.462461e-07 2
relevel(race_grp, ref = "White")Other -0.345106379 1.38712570 -5.19048868 1.62733736 6.509258e-02 7.986208e-01 2
ff_hcondhas 0.691244774 0.26776923 0.14697164 1.25269746 6.228205e+00 1.257311e-02 2
Method: 1-Wald, 2-Profile penalized log-likelihood, 3-None
The code that I used to make the hunger and race plot. I did some manual editing to make it look nicer but this is what I ideally want my plot to look like:
plot_model(model12, type = "pred", terms = c("race_grp"), mdrt.values = "meansd", axis.textsize = .3, wrap.labels = 5)+ theme_sjplot2() + scale_color_sjplot("simply") + ggplot2::labs(title= "Predicted probabilities of Hunger", x= "Race", y="Percentage")

I have found a way to get around this issue, however, not with the logistf package. In case anyone in the future wants to know the answer to this question, my suggestion is that you use the brglm package. I have checked and the results from the brglm package are exactly the same as the logistf package. This is how I recreated the Hunger plot posted above:
hi2<- brglm(formula= hungry_cv~ wave + ff_country + race_grp + sex_cv + age_r + numchildren + wave*ff_hcondhas + carer + sempderived + wave*cd_ff_furlough + log(ff_hours) + qual + num + keyworksector + ca_clinvuln_dv + freemeals + ca_blbenefits1 + log(hhincome_week), data=data, family=binomial(logit), method = "brglm.fit", pl = TRUE)
racehunger<- plot_model(hi2, type = "pred", terms = c("race_grp"), mdrt.values = "meansd", axis.textsize = .3, wrap.labels = 5, show.values = TRUE)+ theme_sjplot2() + ggplot2::labs(title= "Predicted probabilities of Hunger", x= "Race", y="Percentage")
racehunger
png(file="racehunger.png", units="in", width=11, height=8.5, res=300)
print(racehunger)
dev.off()
The output of the code is:
I am personally very happy with the result.

Related

how to visualize the coefficients from different models in just one plot?

I have 2 different datasets. To each one i apply the same plm regression. I would like to know how can i visualize, in the same plot, the estimated coefficients of each model.
mainstream <- plm(log(sum_plays) ~ cancel_public_events + close_public_transport + internationaltravel + restrictions_on_gatherings + school_closing + stay_at_home_requirements + workplace_closing + new_cases_per_million + new_deaths_per_million +
data = top200, model = "within")
long_tail <- plm(log(sum_plays) ~ cancel_public_events + close_public_transport + internationaltravel + restrictions_on_gatherings + school_closing + stay_at_home_requirements + workplace_closing + new_cases_per_million + new_deaths_per_million +
data = bottom, model = "within")
I can make the plot for each individual model, however i want to have the info of this both plots in just one. Probably differentiate the coefficients by color (i.e coefficients from "mainstream" in red and the coefficients from "longtail" in blue)
a <- plot_model(long_tail, transform = NULL, show.values = TRUE, value.offset =.3, terms = c("workplace_closing" , "stay_at_home_requirements", "school_closing", "close_public_transport", "internationaltravel", "restrictions_on_gatherings", "cancel_public_events"), title = "Coefficients for Long-Tail Music Consumption")
b <- plot_model(mainstream, transform = NULL, show.values = TRUE, value.offset =.3, terms = c("workplace_closing" , "stay_at_home_requirements", "school_closing", "close_public_transport", "internationaltravel", "restrictions_on_gatherings", "cancel_public_events"), title = "Coefficients for Long-Tail Music Consumption")

Fitting two coefplot in one graph using par(mfrow()) method

I'm trying to arrange two coefplot objects into one graph via the par(mfrow(,)) method, but it didn't work out. What did I do wrong? Or is that coefplot just doesn't work this way? What will be alternative method?
I've referenced this earlier thread, but I tend to think that mine is a quite different issue.
# load the data
dat <- readRDS(url("https://www.dropbox.com/s/88h7hmiroalx3de/act.rds?dl=1"))
#fit two models
library(lmer4)
act1.fit <- glmer(act1 ~ os + education + marital + nat6 + nat5 + nat4 + nat3 + nat2 + nat1 +
(1 | region_id), data = action, family = binomial, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 10)
action2.fit <- glmer(act2 ~ os + education + marital + nat6 + nat5 + nat4 + nat3 + nat2 + nat1 +
(1 | region_id), data = action, family = binomial, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 10)
# plot the two model individually
library(coefplot)
# construct coefplot objects
coefplot:::buildModelCI(action1.fit)
coefplot:::buildModelCI(action2.fit)
coefplot(action2.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
# arrange two plots in one graph
par(mfrow=c(1,2))
coefplot(action1.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
coefplot(action2.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
# didn't work ???

Marginal effects / interaction plots for lfe felm regression object

I need to create an interaction / marginal effects plot for a fixed effects model including clustered standard errors generated using the lfe "felm" command.
I have already created a function that achieves this. However, before I start using it, I wanted to double-check whether this function is correctly specified. Please find the function and a reproducible example below.
library(lfe)
### defining function
felm_marginal_effects <- function(regression_model, data, treatment, moderator, treatment_translation, moderator_translation, dependent_variable_translation, alpha = 0.05, se = NULL) {
library(ggplot2)
library(ggthemes)
library(gridExtra)
### defining function to get average marginal effects
getmfx <- function(betas, data, treatment, moderator) {
betas[treatment] + betas[paste0(treatment, ":", moderator)] * data[, moderator]
}
### defining function to get marginal effects for specific levels of the treatment variable
getmfx_high_low <- function(betas, data, treatment, moderator, treatment_val) {
betas[treatment] * treatment_val + betas[paste0(treatment, ":", moderator)] * data[, moderator] * treatment_val
}
### Defining function to analytically derive standard error for marginal effects
getvarmfx <- function(my_vcov, data, treatment, moderator) {
my_vcov[treatment, treatment] + data[, moderator]^2 * my_vcov[paste0(treatment, ":", moderator), paste0(treatment, ":", moderator)] + 2 * data[, moderator] * my_vcov[treatment, paste0(treatment, ":", moderator)]
}
### constraining data to relevant variables
data <- data[, c(treatment, moderator)]
### getting marginal effects
data[, "marginal_effects"] <- getmfx(coef(regression_model), data, treatment, moderator)
### getting marginal effects for high and low cases of treatment variable
data[, "marginal_effects_treatment_low"] <- getmfx_high_low(coef(regression_model), data, treatment, moderator, quantile(data[,treatment], 0.05))
data[, "marginal_effects_treatment_high"] <- getmfx_high_low(coef(regression_model), data, treatment, moderator, quantile(data[,treatment], 0.95))
### getting robust SEs
if (is.null(se)) {
data$se <- getvarmfx(regression_model$vcv, data, treatment, moderator)
} else if (se == "clustered") {
data$se <- getvarmfx(regression_model$clustervcv, data, treatment, moderator)
} else if (se == "robust") {
data$se <- getvarmfx(regression_model$robustvcv, data, treatment, moderator)
}
### Getting CI bounds
data[, "ci_lower"] <- data[, "marginal_effects"] - abs(qt(alpha/2, regression_model$df, lower.tail = TRUE)) * sqrt(data$se)
data[, "ci_upper"] <- data[, "marginal_effects"] + abs(qt(alpha/2, regression_model$df, lower.tail = TRUE)) * sqrt(data$se)
### plotting marginal effects plot
p_1 <- ggplot(data, aes_string(x = moderator)) +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), fill = "grey70", alpha = 0.4) +
geom_line(aes(y = marginal_effects)) +
theme_fivethirtyeight() +
theme(plot.title = element_text(size = 11.5, hjust = 0.5), axis.title = element_text(size = 10)) +
geom_rug() +
xlab(moderator_translation) +
ylab(paste("Marginal effect of",treatment_translation,"on",dependent_variable_translation)) +
ggtitle("Average marginal effects")
p_2 <- ggplot(data, aes_string(x = moderator)) +
geom_line(aes(y = marginal_effects_treatment_high, color = paste0("High ",treatment_translation))) +
geom_line(aes(y = marginal_effects_treatment_low, color = paste0("Low ",treatment_translation))) +
theme_fivethirtyeight() +
theme(plot.title = element_text(size = 11.5, hjust = 0.5), axis.title = element_text(size = 10), axis.title.y = element_blank(), legend.justification = c(0.95, 0.95), legend.position = c(1, 1), legend.direction = "vertical") +
geom_rug() +
xlab(moderator_translation) +
ylab(paste("Marginal effect of",treatment_translation,"on",dependent_variable_translation)) +
ggtitle("Marginal effects at high / low levels of treatment") +
scale_color_manual(name = NULL, values = c(rgb(229, 93, 89, maxColorValue = 255), rgb(75, 180, 184, maxColorValue = 255)), labels=c(paste0("High ",treatment_translation), paste0("Low ",treatment_translation)))
### exporting plots as combined grob
return(grid.arrange(p_1, p_2, ncol = 2))
}
### example:
# example model (just for demonstration, fixed effects and cluster variables make little sense here)
model <- felm(mpg ~ cyl + am + cyl:am | carb | 0 | cyl, data = mtcars)
# creating marginal effects plot
felm_marginal_effects(regression_model = model, data = mtcars, treatment = "cyl", moderator = "am", treatment_translation = "Number of cylinders", moderator_translation = "Transmission", dependent_variable_translation = "Miles per (US) gallon")
The example output looks like this:
Happy for any advice on how to make this a better, "well-coded", fast function so that it's more useful for others afterwards. However, I'm mostly looking to confirm whether it's "correct" in the first place.
Additionally, I wanted to check back with the community regarding some remaining questions, particularly:
Can I use the standard errors I generated for the average marginal effects for the "high" and "low" treatment cases as well or do I need to generate different standard errors for these cases? If so how?
Instead of using the analytically derived standard errors, I could also calculate bootstrapped standard errors by creating many coefficient estimates based on repeated sub-samples of the data. How would I generate bootstrapped standard errors for the high / low case?
Is there something about fixed effects models or fixed effects models with clustered standard errors that make marginal effects plots or anything else I did in the code fundamentally inadmissible?
PS.: The above function and questions are kind of an extension of How to plot marginal effect of an interaction after felm() function

Warming : Rank deficient and missing values

I am running the following code:
ggplot(data= data_nickel_t, aes( x=index(data_nickel_t), y= log(ni_demand) )) +
scale_x_yearqtr(format = "%Y-%q", n = 14) +
geom_point() + stat_summary(fun.data=mean_cl_normal) +
geom_smooth(method='lm', aes(colour = "linear fit"), se= FALSE) +
geom_smooth(method='lm', formula = y ~ x + poly(x, 2), size = 1, aes(colour = "quadratic"), se= FALSE) +
geom_smooth(method='lm', formula = y ~ x + poly(x, 3), size = 1, aes(colour = "polynomial"), se= FALSE ) +
ggtitle("Global Refined Nickel Demand") +
xlab("Time") +
ylab("Thousand Metric Tons")
The code above produce a graph with three fitted lines but I get the following warning messages:
1: In predict.lm(model, newdata = data.frame(x = xseq), se.fit = se,
prediction from a rank-deficient fit may be misleading;
2: In predict.lm(model, newdata = data.frame(x = xseq), se.fit = se, :
prediction from a rank-deficient fit may be misleading;
3: Removed 94 rows containing missing values (geom_pointrange).
My first impression was collinearity between time trends variable in poly() function. I might estimate numerical model to check this further. As for the missing value issue, e.g this link explain the reasons for missing k rows. When I tried solutions suggested in that link, it does not work in my case, I still get the same error. I have 94 observations. I also don't have zeros in my data so no reason for log transformation to drop my values. I am still kind of new using r with time series any idea how I may fix the missing value warning?

ggplot GLM fitted curve without interaction

I want to add the fitted function from GLM on a ggplot. By default, it automatically create the plot with interaction. I am wondering, if I can plot the fitted function from the model without interaction. For example,
dta <- read.csv("http://www.ats.ucla.edu/stat/data/poisson_sim.csv")
dta <- within(dta, {
prog <- factor(prog, levels=1:3, labels=c("General", "Academic", "Vocational"))
id <- factor(id)
})
plt <- ggplot(dta, aes(math, num_awards, col = prog)) +
geom_point(size = 2) +
geom_smooth(method = "glm", , se = F,
method.args = list(family = "poisson"))
print(plt)
gives the plot with interaction,
However, I want the plot from the model,
`num_awards` = ß0 + ß1*`math` + ß2*`prog` + error
I tried to get this this way,
mod <- glm(num_awards ~ math + prog, data = dta, family = "poisson")
fun.gen <- function(awd) exp(mod$coef[1] + mod$coef[2] * awd)
fun.acd <- function(awd) exp(mod$coef[1] + mod$coef[2] * awd + mod$coef[3])
fun.voc <- function(awd) exp(mod$coef[1] + mod$coef[2] * awd + mod$coef[4])
ggplot(dta, aes(math, num_awards, col = prog)) +
geom_point() +
stat_function(fun = fun.gen, col = "red") +
stat_function(fun = fun.acd, col = "green") +
stat_function(fun = fun.voc, col = "blue") +
geom_smooth(method = "glm", se = F,
method.args = list(family = "poisson"), linetype = "dashed")
The output plot is
Is there any simple way in ggplot to do this efficiently?
Ben's idea of plotting predicted value of the response for specific model terms inspired me improving the type = "y.pc" option of the sjp.glm function. A new update is on GitHub, with version number 1.9.4-3.
Now you can plot predicted values for specific terms, one which is used along the x-axis, and a second one used as grouping factor:
sjp.glm(mod, type = "y.pc", vars = c("math", "prog"))
which gives you following plot:
The vars argument is needed in case your model has more than two terms, to specify the term for the x-axis-range and the term for the grouping.
You can also facet the groups:
sjp.glm(mod, type = "y.pc", vars = c("math", "prog"), show.ci = T, facet.grid = T)
There's no way that I know of to trick geom_smooth() into doing this, but you can do a little better than you've done. You still have to fit the model yourself and add the lines, but you can use the predict() method to generate the predictions and load them into a data frame with the same structure as the original data ...
mod <- glm(num_awards ~ math + prog, data = dta, family = "poisson")
## generate prediction frame
pframe <- with(dta,
expand.grid(math=seq(min(math),max(math),length=51),
prog=levels(prog)))
## add predicted values (on response scale) to prediction frame
pframe$num_awards <- predict(mod,newdata=pframe,type="response")
ggplot(dta, aes(math, num_awards, col = prog)) +
geom_point() +
geom_smooth(method = "glm", se = FALSE,
method.args = list(family = "poisson"), linetype = "dashed")+
geom_line(data=pframe) ## use prediction data here
## (inherits aesthetics etc. from main ggplot call)
(the only difference here is that the way I've done it the predictions span the full horizontal range for all groups, as if you had specified fullrange=TRUE in geom_smooth()).
In principle it seems as though the sjPlot package should be able to handle this sort of thing, but it looks like the relevant bit of code for doing this plot type is hard-coded to assume a binomial GLM ... oh well.
I'm not sure, but you wrote "without interaction" - maybe you are looking for effect plots? (If not, excuse me that I'm assuming something completely wrong...)
You can, for instance, use the effects package for this.
dta <- read.csv("http://www.ats.ucla.edu/stat/data/poisson_sim.csv")
dta <- within(dta, {
prog <- factor(prog, levels=1:3, labels=c("General", "Academic", "Vocational"))
id <- factor(id)
})
mod <- glm(num_awards ~ math + prog, data = dta, family = "poisson")
library(effects)
plot(allEffects(mod))
Another option would be the sjPlot package, as Ben suggested - however, the current version on CRAN only supports logistic regression models properly for effect plots. But in the current development version on GitHub I added support for various model families and link functions, so if you like, you can download that snapshot. The sjPlot package uses ggplot instead of lattice (which is used by the effects package, I think):
sjp.glm(mod, type = "eff", show.ci = T)
Or in non-faceted way:
sjp.glm(mod, type = "eff", facet.grid = F, show.ci = T)

Resources