How to visualize the coefficients from different models in just one plot? - r

I have two different datasets, and I fit the same plm regression to each one. How can I visualize the estimated coefficients of both models in the same plot?
mainstream <- plm(log(sum_plays) ~ cancel_public_events + close_public_transport + internationaltravel + restrictions_on_gatherings + school_closing + stay_at_home_requirements + workplace_closing + new_cases_per_million + new_deaths_per_million +
data = top200, model = "within")
long_tail <- plm(log(sum_plays) ~ cancel_public_events + close_public_transport + internationaltravel + restrictions_on_gatherings + school_closing + stay_at_home_requirements + workplace_closing + new_cases_per_million + new_deaths_per_million +
data = bottom, model = "within")
I can make the plot for each individual model, but I want the information from both plots in just one, ideally with the coefficients differentiated by color (i.e. coefficients from "mainstream" in red and coefficients from "long_tail" in blue).
a <- plot_model(long_tail, transform = NULL, show.values = TRUE, value.offset =.3, terms = c("workplace_closing" , "stay_at_home_requirements", "school_closing", "close_public_transport", "internationaltravel", "restrictions_on_gatherings", "cancel_public_events"), title = "Coefficients for Long-Tail Music Consumption")
b <- plot_model(mainstream, transform = NULL, show.values = TRUE, value.offset =.3, terms = c("workplace_closing" , "stay_at_home_requirements", "school_closing", "close_public_transport", "internationaltravel", "restrictions_on_gatherings", "cancel_public_events"), title = "Coefficients for Mainstream Music Consumption")
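One way to get both sets of coefficients into a single plot is to collect the estimates and confidence intervals of each model in one data frame, tag each row with the model it came from, and map that tag to colour. The sketch below is only an illustration under stated assumptions: two lm() fits on subsets of mtcars stand in for the two plm models (top200 and bottom are not available here), coef_table() is a hypothetical helper, and confint() is assumed to work for the fitted objects. sjPlot::plot_models() may be a more direct route if it supports your model class.

```r
library(ggplot2)

# Stand-in models: two lm() fits on different subsets of mtcars
# (assumed replacements for the `mainstream` and `long_tail` plm fits).
m_a <- lm(mpg ~ wt + hp + qsec, data = subset(mtcars, am == 0))
m_b <- lm(mpg ~ wt + hp + qsec, data = subset(mtcars, am == 1))

# Hypothetical helper: one row per coefficient with a 95% confidence interval.
coef_table <- function(model, label) {
  ci <- confint(model)
  data.frame(
    model     = label,
    term      = names(coef(model)),
    estimate  = unname(coef(model)),
    conf.low  = ci[, 1],
    conf.high = ci[, 2],
    row.names = NULL
  )
}

# Stack both tables and drop the intercept.
coefs <- rbind(coef_table(m_a, "mainstream"), coef_table(m_b, "long_tail"))
coefs <- subset(coefs, term != "(Intercept)")

# One panel, two colours; dodging keeps the two models from overlapping.
p <- ggplot(coefs, aes(x = term, y = estimate, colour = model)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high),
                  position = position_dodge(width = 0.5)) +
  scale_colour_manual(values = c(mainstream = "red", long_tail = "blue")) +
  coord_flip() +
  labs(x = NULL, y = "Estimate", colour = "Model")
print(p)
```

To restrict the plot to the policy variables, filter coefs on term before plotting, in the same way the terms argument is used in plot_model().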

Related

lmer and plot_coefs: add values for estimates

I need to plot the coefficient values of a linear model (lm). I use plot_coefs() to plot only selected variables, but plot_coefs() does not seem to allow adding the values of those estimates to the plot so that they show as numbers. Is there a way to do this?
library(jtools)
states <- as.data.frame(state.x77)
fit1 <- lm(Income ~ Frost + Illiteracy + Murder +
             Population + Area + `Life Exp` + `HS Grad`,
           data = states, weights = runif(50, 0.1, 3))
plot_summs(fit1,
           coefs = c("Frost Days" = "Frost", "% Illiterate" = "Illiteracy"),
           scale = TRUE)
This may get you started:
library(jtools)
library(ggplot2)
p <- plot_summs(fit1,
                coefs = c("Frost Days" = "Frost", "% Illiterate" = "Illiteracy"),
                scale = TRUE)
p +
  geom_label(aes(label = round(estimate))) +
  theme(legend.position = "none")
You get this:
If you want to be in better control of the final product, it would be easier to get the data using:
df <- broom::tidy(fit1, conf.int = TRUE)
Then, use ggplot().
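A minimal sketch of that second route, assuming the broom and ggplot2 packages are installed. The set.seed() call and the label column are additions for illustration, and the estimates here are on the original scale rather than the scaled ones produced by plot_summs(scale = TRUE):

```r
library(ggplot2)

set.seed(1)  # the weights are random in the question; fixed here for reproducibility
states <- as.data.frame(state.x77)
fit1 <- lm(Income ~ Frost + Illiteracy + Murder +
             Population + Area + `Life Exp` + `HS Grad`,
           data = states, weights = runif(50, 0.1, 3))

# Tidy coefficient table with 95% confidence intervals.
df <- broom::tidy(fit1, conf.int = TRUE)

# Keep only the selected terms and give them display labels.
df <- df[df$term %in% c("Frost", "Illiteracy"), ]
df$label <- unname(c(Frost = "Frost Days", Illiteracy = "% Illiterate")[df$term])

# Point estimates with intervals, plus the estimate printed as text.
p <- ggplot(df, aes(x = label, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
  geom_text(aes(label = round(estimate, 2)), vjust = -1) +
  coord_flip() +
  labs(x = NULL, y = "Estimate")
print(p)
```

Working from the tidy data frame means the rounding, position, and styling of the printed values are all under your control.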

Plots with error bars using logistf object

I initially fitted a logistic model with glm() but wanted to correct for separation, so I switched to the logistf() function and am now trying to redo my plots. I'm unsure how to make a plot like the one below from a logistf object. Many packages don't seem to support it. I've tried the sjPlot package's plot_model() function, which plots a dot for each predicted probability but doesn't add the error bars as it does automatically for a glm object. How can I get around this? Is there another package that would make this easier, or is there a way to add the error bars manually?
The code for the plot I wish to add error bars to is:
sjPlot::plot_model(lr3, type="int", mdrt.values = "meansd", show.values = TRUE, value.offset = .3)
The output of my model lr3 is:
logistf(formula = foodbank_cv ~ wave + ff_country + relevel(race_grp,
ref = "White") + sex_cv + age_r + relevel(numchildren,
ref = "None") + wave * ff_hcondhas + relevel(carer,
ref = "Not") + sempderived + wave * cd_ff_furlough +
log(ff_hours) + qual + num + relevel(keyworksector, ref = "Not keyworker") +
ca_clinvuln_dv + freemeals + ca_blbenefits1 + log(hhincome_week),
data = data, firth = TRUE, family = binomial(link = "logit"))
Model fitted by Penalized ML
Coefficients:
                                                                coef    se(coef)   lower 0.95   upper 0.95         Chisq             p  method
(Intercept)                                             -5.237542354  0.46736532  -6.23016284  -4.30807241           Inf  0.000000e+00       2
wave5                                                   -0.377956413  0.32598420  -1.07410577   0.28545651  1.232122e+00  2.669947e-01       2
wave7                                                   -0.929934987  0.40813067  -1.84652632  -0.12926473  5.260388e+00  2.181615e-02       2
ff_country2                                             -0.118780142  0.33317501  -0.86893024   0.51197342  1.196576e-01  7.294061e-01       2
ff_country3                                              0.393456771  0.25097814  -0.15010616   0.88210537  2.077828e+00  1.494527e-01       2
ff_country4                                             -0.219066153  0.43493435  -1.23008781   0.57774984  2.481153e-01  6.184053e-01       2
relevel(race_grp, ref = "White")Asian or Asian British   0.882833792  0.22906054   0.39628625   1.33641305  1.183859e+01  5.801581e-04       2
relevel(race_grp, ref = "White")Black or Black British   1.759374627  0.27942672   1.16321835   2.29702048  2.678592e+01  2.272869e-07       2
relevel(race_grp, ref = "White")Mixed                    1.786978145  0.27773294   1.19285979   2.32350705  2.763841e+01  1.462461e-07       2
relevel(race_grp, ref = "White")Other                   -0.345106379  1.38712570  -5.19048868   1.62733736  6.509258e-02  7.986208e-01       2
ff_hcondhas                                              0.691244774  0.26776923   0.14697164   1.25269746  6.228205e+00  1.257311e-02       2
Method: 1-Wald, 2-Profile penalized log-likelihood, 3-None
Below is the code that I used to make the hunger and race plot. I did some manual editing to make it look nicer, but this is what I ideally want my plot to look like:
plot_model(model12, type = "pred", terms = c("race_grp"), mdrt.values = "meansd", axis.textsize = .3, wrap.labels = 5)+ theme_sjplot2() + scale_color_sjplot("simply") + ggplot2::labs(title= "Predicted probabilities of Hunger", x= "Race", y="Percentage")
I have found a way around this issue, although not with the logistf package. In case anyone wants to know the answer in the future, my suggestion is to use the brglm package. I have checked, and the results from brglm are exactly the same as those from logistf. This is how I recreated the Hunger plot posted above:
library(brglm)
library(sjPlot)
hi2 <- brglm(formula = hungry_cv ~ wave + ff_country + race_grp + sex_cv + age_r + numchildren + wave*ff_hcondhas + carer + sempderived + wave*cd_ff_furlough + log(ff_hours) + qual + num + keyworksector + ca_clinvuln_dv + freemeals + ca_blbenefits1 + log(hhincome_week), data = data, family = binomial(logit), method = "brglm.fit", pl = TRUE)
racehunger<- plot_model(hi2, type = "pred", terms = c("race_grp"), mdrt.values = "meansd", axis.textsize = .3, wrap.labels = 5, show.values = TRUE)+ theme_sjplot2() + ggplot2::labs(title= "Predicted probabilities of Hunger", x= "Race", y="Percentage")
racehunger
png(file="racehunger.png", units="in", width=11, height=8.5, res=300)
print(racehunger)
dev.off()
The output of the code is:
I am personally very happy with the result.

Adding interaction terms to show final values in forest plots? (sjPlot and ggeffects)

I have the two graphs below: one shows the coefficients for property crime (the first level), and the second, to the right, shows the coefficients for violent crime. However, the coefficients on the right are only the differences between the property crime and violent crime coefficients, not the actual violent crime coefficients. Is there a way to add the property crime coefficients to the violent crime coefficients so that I get the actual violent crime coefficients?
Here is my model:
model <- lmer(log(COUNT + 1) ~ CRIME*( cent.log.pop + cent.log.pop.dens + per_capita_income*black + per_capita_income*white + no.grad.hs + prop5.17.pov + ma.plus + hs + ba + median_gross_rent+cent.EXP_STUDENT +unemployment_rate + asian + diff.dem) + officers + (year|PLACE_ID) + (1|COUNTY_ID) + (1|STATE), control = lmerControl(optimizer= "nloptwrap", calc.derivs = FALSE), na.action = 'na.omit', REML = FALSE, data = city.v.p)
plot_model(model, rm.terms = c("cent.log.pop", "CRIMECRIME_VIOLENT:cent.log.pop", "CRIMECRIME_VIOLENT:cent.log.pop.dens", "CRIMECRIME_VIOLENT:per_capita_income", "CRIMECRIME_VIOLENT:black", "CRIMECRIME_VIOLENT:white", "CRIMECRIME_VIOLENT:no.grad.hs", "CRIMECRIME_VIOLENT:prop5.17.pov", "CRIMECRIME_VIOLENT:ma.plus", "CRIMECRIME_VIOLENT:hs", "CRIMECRIME_VIOLENT:ba", "CRIMECRIME_VIOLENT:median_gross_rent", "CRIMECRIME_VIOLENT:cent.EXP_STUDENT", "CRIMECRIME_VIOLENT:unemployment_rate", "CRIMECRIME_VIOLENT:asian", "CRIMECRIME_VIOLENT:diff.dem", "CRIMECRIME_VIOLENT:per_capita_income:black", "CRIMECRIME_VIOLENT:per_capita_income:white", "CRIMECRIME_VIOLENT"), group.terms = c(1,1,2,3,3,1,2,1,3,1,1,3,1,2,3,3,3), axis.labels = labels, title = "9a: Effect Size: Property Crime", dot.size = .5,line.size = .2)...

Fitting two coefplot in one graph using par(mfrow()) method

I'm trying to arrange two coefplot objects in one graph via par(mfrow = c(1, 2)), but it didn't work. What did I do wrong? Or does coefplot just not work this way? What would be an alternative method?
I've referenced this earlier thread, but I think mine is quite a different issue.
# load the data
action <- readRDS(url("https://www.dropbox.com/s/88h7hmiroalx3de/act.rds?dl=1"))
# fit two models
library(lme4)
action1.fit <- glmer(act1 ~ os + education + marital + nat6 + nat5 + nat4 + nat3 + nat2 + nat1 +
    (1 | region_id), data = action, family = binomial, control = glmerControl(optimizer = "bobyqa"),
    nAGQ = 10)
action2.fit <- glmer(act2 ~ os + education + marital + nat6 + nat5 + nat4 + nat3 + nat2 + nat1 +
    (1 | region_id), data = action, family = binomial, control = glmerControl(optimizer = "bobyqa"),
    nAGQ = 10)
# plot the two model individually
library(coefplot)
# construct coefplot objects
coefplot:::buildModelCI(action1.fit)
coefplot:::buildModelCI(action2.fit)
coefplot(action2.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
# arrange two plots in one graph
par(mfrow=c(1,2))
coefplot(action1.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
coefplot(action2.fit, coefficients=c("nat1", "nat2", "nat3", "nat4", "nat5", "nat6"),
intercept = FALSE, color = "brown3")
# didn't work ???
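A likely reason is that par(mfrow = ...) only lays out base graphics, whereas coefplot() draws with ggplot2 and returns a ggplot object, so the setting is ignored. A minimal sketch of one alternative, assuming the gridExtra package is installed and using two lm() fits on mtcars as stand-ins for the glmer models (the data above is not reproduced here):

```r
library(coefplot)
library(gridExtra)

# Stand-in models (assumed replacements for action1.fit and action2.fit).
fit_a <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_b <- lm(disp ~ wt + hp + qsec, data = mtcars)

# Each call returns a ggplot object instead of drawing with base graphics.
p1 <- coefplot(fit_a, coefficients = c("wt", "hp"),
               intercept = FALSE, color = "brown3")
p2 <- coefplot(fit_b, coefficients = c("wt", "hp"),
               intercept = FALSE, color = "brown3")

# Place the two ggplot objects side by side on one page.
combined <- grid.arrange(p1, p2, ncol = 2)
```

If the goal is to overlay both models in a single panel rather than show two panels, coefplot::multiplot() may also be worth a look.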

Marginal effects / interaction plots for lfe felm regression object

I need to create an interaction / marginal effects plot for a fixed effects model including clustered standard errors generated using the lfe "felm" command.
I have already created a function that achieves this. However, before I start using it, I wanted to double-check whether this function is correctly specified. Please find the function and a reproducible example below.
library(lfe)
### defining function
felm_marginal_effects <- function(regression_model, data, treatment, moderator, treatment_translation, moderator_translation, dependent_variable_translation, alpha = 0.05, se = NULL) {
library(ggplot2)
library(ggthemes)
library(gridExtra)
### defining function to get average marginal effects
getmfx <- function(betas, data, treatment, moderator) {
betas[treatment] + betas[paste0(treatment, ":", moderator)] * data[, moderator]
}
### defining function to get marginal effects for specific levels of the treatment variable
getmfx_high_low <- function(betas, data, treatment, moderator, treatment_val) {
betas[treatment] * treatment_val + betas[paste0(treatment, ":", moderator)] * data[, moderator] * treatment_val
}
### Defining function to analytically derive standard error for marginal effects
getvarmfx <- function(my_vcov, data, treatment, moderator) {
my_vcov[treatment, treatment] + data[, moderator]^2 * my_vcov[paste0(treatment, ":", moderator), paste0(treatment, ":", moderator)] + 2 * data[, moderator] * my_vcov[treatment, paste0(treatment, ":", moderator)]
}
### constraining data to relevant variables
data <- data[, c(treatment, moderator)]
### getting marginal effects
data[, "marginal_effects"] <- getmfx(coef(regression_model), data, treatment, moderator)
### getting marginal effects for high and low cases of treatment variable
data[, "marginal_effects_treatment_low"] <- getmfx_high_low(coef(regression_model), data, treatment, moderator, quantile(data[,treatment], 0.05))
data[, "marginal_effects_treatment_high"] <- getmfx_high_low(coef(regression_model), data, treatment, moderator, quantile(data[,treatment], 0.95))
### getting robust SEs
if (is.null(se)) {
data$se <- getvarmfx(regression_model$vcv, data, treatment, moderator)
} else if (se == "clustered") {
data$se <- getvarmfx(regression_model$clustervcv, data, treatment, moderator)
} else if (se == "robust") {
data$se <- getvarmfx(regression_model$robustvcv, data, treatment, moderator)
}
### Getting CI bounds
data[, "ci_lower"] <- data[, "marginal_effects"] - abs(qt(alpha/2, regression_model$df, lower.tail = TRUE)) * sqrt(data$se)
data[, "ci_upper"] <- data[, "marginal_effects"] + abs(qt(alpha/2, regression_model$df, lower.tail = TRUE)) * sqrt(data$se)
### plotting marginal effects plot
p_1 <- ggplot(data, aes_string(x = moderator)) +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), fill = "grey70", alpha = 0.4) +
geom_line(aes(y = marginal_effects)) +
theme_fivethirtyeight() +
theme(plot.title = element_text(size = 11.5, hjust = 0.5), axis.title = element_text(size = 10)) +
geom_rug() +
xlab(moderator_translation) +
ylab(paste("Marginal effect of",treatment_translation,"on",dependent_variable_translation)) +
ggtitle("Average marginal effects")
p_2 <- ggplot(data, aes_string(x = moderator)) +
geom_line(aes(y = marginal_effects_treatment_high, color = paste0("High ",treatment_translation))) +
geom_line(aes(y = marginal_effects_treatment_low, color = paste0("Low ",treatment_translation))) +
theme_fivethirtyeight() +
theme(plot.title = element_text(size = 11.5, hjust = 0.5), axis.title = element_text(size = 10), axis.title.y = element_blank(), legend.justification = c(0.95, 0.95), legend.position = c(1, 1), legend.direction = "vertical") +
geom_rug() +
xlab(moderator_translation) +
ylab(paste("Marginal effect of",treatment_translation,"on",dependent_variable_translation)) +
ggtitle("Marginal effects at high / low levels of treatment") +
scale_color_manual(name = NULL, values = c(rgb(229, 93, 89, maxColorValue = 255), rgb(75, 180, 184, maxColorValue = 255)), labels=c(paste0("High ",treatment_translation), paste0("Low ",treatment_translation)))
### exporting plots as combined grob
return(grid.arrange(p_1, p_2, ncol = 2))
}
### example:
# example model (just for demonstration, fixed effects and cluster variables make little sense here)
model <- felm(mpg ~ cyl + am + cyl:am | carb | 0 | cyl, data = mtcars)
# creating marginal effects plot
felm_marginal_effects(regression_model = model, data = mtcars, treatment = "cyl", moderator = "am", treatment_translation = "Number of cylinders", moderator_translation = "Transmission", dependent_variable_translation = "Miles per (US) gallon")
The example output looks like this:
Happy for any advice on how to make this a better, "well-coded", fast function so that it's more useful for others afterwards. However, I'm mostly looking to confirm whether it's "correct" in the first place.
Additionally, I wanted to check back with the community regarding some remaining questions, particularly:
Can I use the standard errors I generated for the average marginal effects for the "high" and "low" treatment cases as well or do I need to generate different standard errors for these cases? If so how?
Instead of using the analytically derived standard errors, I could also calculate bootstrapped standard errors by creating many coefficient estimates based on repeated sub-samples of the data. How would I generate bootstrapped standard errors for the high / low case?
Is there something about fixed effects models or fixed effects models with clustered standard errors that make marginal effects plots or anything else I did in the code fundamentally inadmissible?
PS: The above function and questions are an extension of "How to plot marginal effect of an interaction after felm() function".
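On the correctness of getvarmfx(): with treatment coefficient b_t and interaction coefficient b_tm, the marginal effect at moderator value z is b_t + b_tm * z, and its variance is Var(b_t) + z^2 * Var(b_tm) + 2 * z * Cov(b_t, b_tm), which is the expression used in the function. A quick sanity check of that algebra, assuming a plain lm() fit on mtcars stands in for the felm model, compares the formula with the equivalent quadratic form c' V c for c = (1, z):

```r
# Stand-in model: lm() instead of felm(), same interaction structure.
fit <- lm(mpg ~ cyl + am + cyl:am, data = mtcars)
b <- coef(fit)
V <- vcov(fit)
z <- mtcars$am  # moderator values

# Marginal effect of cyl at each moderator value.
mfx <- b["cyl"] + b["cyl:am"] * z

# Variance of the marginal effect, written as in getvarmfx().
var_formula <- V["cyl", "cyl"] + z^2 * V["cyl:am", "cyl:am"] +
  2 * z * V["cyl", "cyl:am"]

# The same quantity as a quadratic form c' V c with c = (1, z).
idx <- c("cyl", "cyl:am")
var_quad <- sapply(z, function(zi) {
  cvec <- c(1, zi)
  drop(t(cvec) %*% V[idx, idx] %*% cvec)
})

# Both routes should agree up to floating-point error.
stopifnot(isTRUE(all.equal(var_formula, var_quad)))

# 95% interval using the residual degrees of freedom.
crit <- qt(0.975, df.residual(fit))
ci <- cbind(lower = mfx - crit * sqrt(var_formula),
            upper = mfx + crit * sqrt(var_formula))
head(ci)
```

Note that the function stores this variance in a column named se and only takes the square root when building the interval, so the bounds are consistent even though the column name is misleading. For the clustered case, the same check can be run with the clustered covariance matrix in place of vcov(fit).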
