I am trying to plot a dot-whisker plot of the confidence intervals for 4 different regression models.
The data is available here.
# first, import the data
Q1 <- read.table("~/Q1.txt", header=TRUE)
# Optionally, read in data directly from figshare.
# Q1 <- read.table("https://ndownloader.figshare.com/files/13283882?private_link=ace5b44bc12394a7c46d", header=TRUE)
library(dplyr)
#splitting into female and male
female<-Q1 %>%
filter(sex=="F")
male<-Q1 %>%
filter(sex=="M")
library(lme4)
#Female models
#poisson regression
ab_f_LBS= lmer(LBS ~ ft + grid + (1|byear), data = subset(female))
#negative binomial regression
ab_f_surv= glmer.nb(age ~ ft + grid + (1|byear), data = subset(female), control=glmerControl(tol=1e-6,optimizer="bobyqa",optCtrl=list(maxfun=1e19)))
#Male models
#poisson regression
ab_m_LBS= lmer(LBS ~ ft + grid + (1|byear), data = subset(male))
#negative binomial regression
ab_m_surv= glmer.nb(age ~ ft + grid + (1|byear), data = subset(male), control=glmerControl(tol=1e-6,optimizer="bobyqa",optCtrl=list(maxfun=1e19)))
I then want to only plot two of the variables (ft2 and gridSU) from each model.
ab_f_LBS <- tidy(ab_f_LBS) %>% filter(!grepl('sd_Observation.Residual', term)) %>% filter(!grepl('byear', group))
ab_m_LBS <- tidy(ab_m_LBS) %>% filter(!grepl('sd_Observation.Residual', term)) %>% filter(!grepl('byear', group))
ab_f_surv <- tidy(ab_f_surv) %>% filter(!grepl('sd_Observation.Residual', term)) %>% filter(!grepl('byear', group))
ab_m_surv <- tidy(ab_m_surv) %>% filter(!grepl('sd_Observation.Residual', term)) %>% filter(!grepl('byear', group))
I am then ready to make a dot-whisker plot.
#required packages
library(dotwhisker)
library(broom)
dwplot(list(ab_f_LBS, ab_m_LBS, ab_f_surv, ab_m_surv),
vline = geom_vline(xintercept = 0, colour = "black", linetype = 2),
dodge_size=0.2,
style="dotwhisker") %>% # plot line at zero _behind_ coefs
relabel_predictors(c(ft2= "Immigrants",
gridSU = "Grid (SU)")) +
theme_classic() +
xlab("Coefficient estimate (+/- CI)") +
ylab("") +
scale_color_manual(values=c("#000000", "#666666", "#999999", "#CCCCCC"),
labels = c("Female LRS", "Male LRS", "Female survival", "Male survival"),
name = "First generation models") +
theme(axis.title=element_text(size=10),
axis.text.x = element_text(size=10),
axis.text.y = element_text(size=12, angle=90, hjust=.5),
legend.position = c(0.7, 0.8),
legend.justification = c(0, 0),
legend.title=element_text(size=12),
legend.text=element_text(size=10),
legend.key = element_rect(size = 0.1),
legend.key.size = unit(0.5, "cm"))
I am encountering this problem:
Error message: Error in psych::describe(x, ...) : unused arguments (conf.int = TRUE, conf.int = TRUE). When I try with just one model (i.e. dwplot(ab_f_LBS)) it works, but as soon as I add another model I get this error message.
How can I plot the 4 regression models on the same dot-whisker plot?
Update
Results of traceback():
> traceback()
14: stop(gettextf("cannot coerce class \"%s\" to a data.frame", deparse(class(x))),
domain = NA)
13: as.data.frame.default(x)
12: as.data.frame(x)
11: tidy.default(x, conf.int = TRUE, ...)
10: broom::tidy(x, conf.int = TRUE, ...)
9: .f(.x[[i]], ...)
8: .Call(map_impl, environment(), ".x", ".f", "list")
7: map(.x, .f, ...)
6: purrr::map_dfr(x, .id = "model", function(x) {
broom::tidy(x, conf.int = TRUE, ...)
})
5: eval(lhs, parent, parent)
4: eval(lhs, parent, parent)
3: purrr::map_dfr(x, .id = "model", function(x) {
broom::tidy(x, conf.int = TRUE, ...)
}) %>% mutate(model = if_else(!is.na(suppressWarnings(as.numeric(model))),
paste("Model", model), model))
2: dw_tidy(x, by_2sd, ...)
1: dwplot(list(ab_f_LBS, ab_m_LBS, ab_f_surv, ab_m_surv), effects = "fixed",
by_2sd = FALSE)
Here is my session info:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dotwhisker_0.5.0 broom_0.5.0 broom.mixed_0.2.2
[4] glmmTMB_0.2.2.0 lme4_1.1-18-1 Matrix_1.2-14
[7] bindrcpp_0.2.2 forcats_0.3.0 stringr_1.3.1
[10] dplyr_0.7.6 purrr_0.2.5 readr_1.1.1
[13] tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0
[16] tidyverse_1.2.1 lubridate_1.7.4 devtools_1.13.6
loaded via a namespace (and not attached):
[1] ggstance_0.3.1 tidyselect_0.2.5 TMB_1.7.14 reshape2_1.4.3
[5] splines_3.5.1 haven_1.1.2 lattice_0.20-35 colorspace_1.3-2
[9] rlang_0.2.2 pillar_1.3.0 nloptr_1.2.1 glue_1.3.0
[13] withr_2.1.2 modelr_0.1.2 readxl_1.1.0 bindr_0.1.1
[17] plyr_1.8.4 munsell_0.5.0 gtable_0.2.0 cellranger_1.1.0
[21] rvest_0.3.2 coda_0.19-2 memoise_1.1.0 Rcpp_0.12.19
[25] scales_1.0.0 backports_1.1.2 jsonlite_1.5 hms_0.4.2
[29] digest_0.6.18 stringi_1.2.4 grid_3.5.1 cli_1.0.1
[33] tools_3.5.1 magrittr_1.5 lazyeval_0.2.1 crayon_1.3.4
[37] pkgconfig_2.0.2 MASS_7.3-50 xml2_1.2.0 assertthat_0.2.0
[41] minqa_1.2.4 httr_1.3.1 rstudioapi_0.8 R6_2.3.0
[45] nlme_3.1-137 compiler_3.5.1
I have a couple of comments/suggestions. (tl;dr is that you can streamline your modeling/graphic-creating process considerably ...)
Setup:
library(dplyr)
Q1 <- read.table("Q1.txt", header=TRUE)
library(lme4)
library(glmmTMB) ## use this for NB models
library(broom.mixed) ## CRAN version should be OK
library(dotwhisker) ## use devtools::install_github("fsolt/dotwhisker")
The model you have labeled as a "Poisson model" isn't one: lmer fits a linear mixed model, and its parameters won't be particularly comparable to those of an NB model.
I got a lot of warnings from glmer.nb, so I switched to glmmTMB for the negative binomial models.
#Female models
#poisson regression
ab_f_LBS= glmer(LBS ~ ft + grid + (1|byear),
family=poisson, data = subset(Q1,sex=="F"))
#negative binomial regression
ab_f_surv = glmmTMB(age ~ ft + grid + (1|byear),
data = subset(Q1, sex=="F"),
family=nbinom2)
#Male models
#poisson regression
ab_m_LBS= update(ab_f_LBS, data=subset(Q1, sex=="M"))
#negative binomial regression
ab_m_surv= update(ab_f_surv, data=subset(Q1, sex=="M"))
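One quick way to confirm the lmer-versus-Poisson distinction made above is to ask lme4 what kind of model each call actually fitted. This is only a minimal sketch on simulated data: the data frame d and its columns g, x and y are made up for illustration and are not taken from Q1.txt.

```r
library(lme4)

# Simulated count data (hypothetical, for illustration only)
set.seed(1)
d <- data.frame(g = factor(rep(1:10, each = 20)), x = rnorm(200))
group_effect <- rep(rnorm(10, sd = 0.3), each = 20)
d$y <- rpois(200, exp(0.5 + 0.3 * d$x + group_effect))

# lmer() always fits a linear (Gaussian) mixed model, whatever the response looks like
m_lin <- lmer(y ~ x + (1 | g), data = d)

# glmer() with family = poisson fits a Poisson GLMM
m_poi <- glmer(y ~ x + (1 | g), data = d, family = poisson)

isLMM(m_lin)          # TRUE: linear mixed model
isGLMM(m_poi)         # TRUE: generalized linear mixed model
family(m_poi)$family  # "poisson"
```

Coefficients from m_lin are on the response scale, while those from m_poi are on the log scale, which is why the two sets of estimates should not be read off the same axis without care.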
Now the plot:
dwplot(list(LBS_M=ab_m_LBS,LBS_F=ab_f_LBS,surv_m=ab_m_surv,surv_f=ab_f_surv),
effects="fixed",by_2sd=FALSE)+
geom_vline(xintercept=0,lty=2)
ggsave("dwplot1.png")
> sessionInfo()
R Under development (unstable) (2018-07-26 r75007)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_CA.UTF8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF8 LC_COLLATE=en_CA.UTF8
[5] LC_MONETARY=en_CA.UTF8 LC_MESSAGES=en_CA.UTF8
[7] LC_PAPER=en_CA.UTF8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 dotwhisker_0.5.0.9000 ggplot2_3.0.0
[4] broom.mixed_0.2.3 glmmTMB_0.2.2.0 lme4_1.1-18.9000
[7] Matrix_1.2-14 dplyr_0.7.6
loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 pillar_1.3.0 compiler_3.6.0 nloptr_1.2.1
[5] plyr_1.8.4 TMB_1.7.14 bindr_0.1.1 tools_3.6.0
[9] digest_0.6.18 ggstance_0.3.1 tibble_1.4.2 nlme_3.1-137
[13] gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.2 rlang_0.2.2
[17] coda_0.19-2 withr_2.1.2 stringr_1.3.1 grid_3.6.0
[21] tidyselect_0.2.5 glue_1.3.0 R6_2.3.0 minqa_1.2.4
[25] purrr_0.2.5 tidyr_0.8.1 reshape2_1.4.3 magrittr_1.5
[29] backports_1.1.2 scales_1.0.0 MASS_7.3-50 splines_3.6.0
[33] assertthat_0.2.0 colorspace_1.3-2 labeling_0.3 stringi_1.2.4
[37] lazyeval_0.2.1 munsell_0.5.0 broom_0.5.0 crayon_1.3.4
With help from this vignette. If you want to use tidy models, you'll need to create one data.frame with a model variable.
ab_f_LBS <- tidy(ab_f_LBS) %>%
filter(!grepl('sd_Observation.Residual', term)) %>%
filter(!grepl('byear', group)) %>%
mutate(model = "ab_f_LBS")
ab_m_LBS <- tidy(ab_m_LBS) %>%
filter(!grepl('sd_Observation.Residual', term)) %>%
filter(!grepl('byear', group)) %>%
mutate(model = "ab_m_LBS")
ab_f_surv <- tidy(ab_f_surv) %>%
filter(!grepl('sd_Observation.Residual', term)) %>%
filter(!grepl('byear', group)) %>%
mutate(model = "ab_f_surv")
ab_m_surv <- tidy(ab_m_surv) %>%
filter(!grepl('sd_Observation.Residual', term)) %>%
filter(!grepl('byear', group)) %>%
mutate(model = "ab_m_surv")
#required packages
library(dotwhisker)
library(broom)
tidy_mods <- bind_rows(ab_f_LBS, ab_m_LBS, ab_f_surv, ab_m_surv)
dwplot(tidy_mods,
vline = geom_vline(xintercept = 0, colour = "black", linetype = 2),
dodge_size=0.2,
style="dotwhisker") %>% # plot line at zero _behind_ coefs
relabel_predictors(c(ft2= "Immigrants",
gridSU = "Grid (SU)")) +
theme_classic() +
xlab("Coefficient estimate (+/- CI)") +
ylab("") +
scale_color_manual(values=c("#000000", "#666666", "#999999", "#CCCCCC"),
labels = c("Female LRS", "Male LRS", "Female survival", "Male survival"),
name = "First generation models") +
theme(axis.title=element_text(size=10),
axis.text.x = element_text(size=10),
axis.text.y = element_text(size=12, angle=90, hjust=.5),
legend.position = c(0.7, 0.8),
legend.justification = c(0, 0),
legend.title=element_text(size=12),
legend.text=element_text(size=10),
legend.key = element_rect(size = 0.1),
legend.key.size = unit(0.5, "cm"))
From what I've seen so far, and to quote the vignette:
one can change the shape of the point estimate instead of using
different colors.
So I'm not sure whether both shape and color can easily be changed together without digging a little further...
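The four near-identical tidy() pipelines above can probably be collapsed by letting dplyr::bind_rows() add the model column itself. The sketch below uses two small hand-written data frames as hypothetical stand-ins for tidy() output (the numbers are invented), so it runs without fitting any models; only the .id argument of bind_rows() is the point.

```r
library(dplyr)

# Hypothetical stand-ins for tidy(<model>) output; the values are made up
tidy_f <- data.frame(term      = c("ft2", "gridSU"),
                     estimate  = c(0.10, -0.20),
                     std.error = c(0.05, 0.08))
tidy_m <- data.frame(term      = c("ft2", "gridSU"),
                     estimate  = c(0.15, -0.10),
                     std.error = c(0.06, 0.07))

# A named list plus .id = "model" creates the model column from the list names
tidy_mods <- bind_rows(list(ab_f_LBS = tidy_f, ab_m_LBS = tidy_m), .id = "model")

tidy_mods
# With dotwhisker attached, this single data frame could then be passed on:
# dwplot(tidy_mods)
```

With real models, each list element would be the filtered tidy() result, and the list names become the legend entries.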
Am I missing something? The example in the ggalluvial package gives this error:
> library(ggalluvial)
> ggplot(as.data.frame(Titanic),
+ aes(weight = Freq,
+ axis1 = Class, axis2 = Sex, axis3 = Age,
+ fill = Survived)) +
+ geom_alluvium() +
+ scale_x_continuous(breaks = 1:3, labels = c("Class", "Sex", "Age"))
Error: Invalid column specification
UPDATE 2:
as per DanHall's request:
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggalluvial_0.6.0 ggthemes_3.4.0 alluvial_0.1-2 dplyr_0.5.0 purrr_0.2.2 readr_0.2.2 tidyr_0.6.1
[8] tibble_1.3.4 ggplot2_2.2.1 tidyverse_1.1.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 compiler_3.4.3 plyr_1.8.4 base64enc_0.1-3 forcats_0.2.0 tools_3.4.3 digest_0.6.12
[8] evaluate_0.10.1 jsonlite_1.5 lubridate_1.5.6 gtable_0.2.0 nlme_3.1-128 lattice_0.20-33 rlang_0.1.4
[15] psych_1.6.4 DBI_0.6 yaml_2.1.14 parallel_3.4.3 haven_1.0.0 stringr_1.2.0 httr_1.3.1
[22] knitr_1.19 xml2_1.1.1 hms_0.3 rprojroot_1.2 grid_3.4.3 R6_2.2.2 readxl_0.1.1
[29] rmarkdown_1.8 reshape2_1.4.2 modelr_0.1.0 magrittr_1.5 backports_1.1.1 htmltools_0.3.6 scales_0.5.0
[36] rsconnect_0.8.5 assertthat_0.1 mnormt_1.5-4 rvest_0.3.2 colorspace_1.3-2 labeling_0.3 stringi_1.1.6
[43] lazyeval_0.2.1 munsell_0.4.3 broom_0.4.1
See below, this code works on another machine. When something that is working for other people isn't working for you, it can be useful to run update.packages() and follow the instructions to update any outdated packages you may have installed. This turned out to be the solution here.
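Before updating everything, it can help to see which versions are actually installed and compare them with a session where the example works. A small sketch, assuming the packages of interest are the ones named in pkgs (adjust that vector as needed):

```r
# Packages involved in the failing example (edit as needed)
pkgs <- c("ggplot2", "ggalluvial", "dplyr")

# Keep only those that are installed, then look up their versions
have <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
versions <- vapply(have, function(p) as.character(utils::packageVersion(p)),
                   character(1))
print(versions)

# To see what is outdated, or to update without being prompted per package
# (both need a reachable CRAN mirror, so they are left commented out here):
# old.packages()
# update.packages(ask = FALSE)
```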
It works on my machine as is:
ggplot(as.data.frame(Titanic),
aes(weight = Freq,
axis1 = Class, axis2 = Sex, axis3 = Age,
fill = Survived)) +
geom_alluvium() +
scale_x_continuous(breaks = 1:3, labels = c("Class", "Sex", "Age"))
It also works when calling example(geom_alluvium, package = "ggalluvial").
Here's another usage example (from the vignette).
ggplot(as.data.frame(Titanic),
aes(weight = Freq,
axis1 = Survived, axis2 = Sex, axis3 = Class)) +
geom_alluvium(aes(fill = Class),
width = 0, knot.pos = 0, reverse = FALSE) +
guides(fill = FALSE) +
geom_stratum(width = 1/8, reverse = FALSE) +
geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
scale_x_continuous(breaks = 1:3, labels = c("Survived", "Sex", "Class")) +
coord_flip() +
ggtitle("Titanic survival by class and sex")
I'm trying to change the legend title on my ggplot. Here are two examples (partly from here); the first one is with the sf package, which is what I'm really using. The second one is without that package which seems to have the same problem.
With sf, what I want:
cities <- tibble::tribble(
~ lon, ~ lat, ~ name, ~ pop,
5.121420, 52.09074, "Utrecht", 311367,
6.566502, 53.21938, "Groningen", 189991,
4.895168, 52.37022, "Amsterdam", 779808
) %>% sf::st_as_sf(coords = c("lon", "lat"), crs = 4326)
lines_sfc <- sf::st_sfc(list(
sf::st_linestring(rbind(cities$geometry[[1]], cities$geometry[[2]])),
sf::st_linestring(rbind(cities$geometry[[2]], cities$geometry[[3]]))
))
lines <- sf::st_sf(
id = 1:2,
size = c(10,50),
geometry = lines_sfc,
crs = 4326
)
ggplot() +
geom_sf(aes(colour = pop, size=pop), data = cities)
which gives a nice legend with bad title:
Using this, I modified my script to change the legend:
ggplot() +
geom_sf(aes(colour = pop), data = cities) +
guides(colour=guide_legend(title="New color"))
which gives:
The legend isn't a gradient anymore, why?
If you don't have sf, the same happens with a geom_bar:
ggplot() +
geom_bar(aes(x=name, y=pop, colour = pop), stat="identity", data = cities)
gives:
while this:
ggplot() +
geom_bar(aes(x=name, y=pop, colour = pop), stat="identity", data = cities) +
guides(colour=guide_legend(title="New color"))
gives:
Is there a way to change only the title of the legend and not the whole thing?
my sessionInfo:
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sp_1.2-6 rgeos_0.3-26 ggplot2_2.2.1.9000 dplyr_0.7.4 sf_0.6-1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 pillar_1.0.1 compiler_3.4.3 git2r_0.19.0 plyr_1.8.4
[6] bindr_0.1 viridis_0.4.0 class_7.3-14 tools_3.4.3 digest_0.6.13
[11] viridisLite_0.2.0 memoise_1.1.0 tibble_1.4.1 gtable_0.2.0 lattice_0.20-35
[16] pkgconfig_2.0.1 rlang_0.1.6.9002 cli_1.0.0 DBI_0.7 rstudioapi_0.6
[21] rgdal_1.2-16 curl_2.8.1 bindrcpp_0.2 gridExtra_2.3 e1071_1.6-8
[26] withr_2.1.1.9000 httr_1.3.1 knitr_1.18 devtools_1.13.3 classInt_0.1-24
[31] grid_3.4.3 glue_1.1.1 R6_2.2.2 udunits2_0.13 magrittr_1.5
[36] scales_0.5.0.9000 RStudioShortKeys_0.1.0 units_0.5-1 assertthat_0.2.0 colorspace_1.3-2
[41] labeling_0.3 utf8_1.1.3 lazyeval_0.2.1 munsell_0.4.3 crayon_1.3.4
ggplot() +
geom_sf(aes(colour = pop, size=pop), data = cities) +
scale_color_continuous(name = 'newname')
You can call the colour scale directly and just specify its name. This changes only the title and leaves the gradient guide in place.
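The gradient disappears in the question because guide_legend() explicitly requests a key-style legend in place of the default colourbar for a continuous colour scale. Two other ways to change only the title are sketched here with geom_point() on a plain data frame, so that sf is not required; the object names base_plot, p1 and p2 are just for illustration.

```r
library(ggplot2)

# Same three cities as in the question, as a plain data frame (no sf needed)
cities <- data.frame(
  name = c("Utrecht", "Groningen", "Amsterdam"),
  pop  = c(311367, 189991, 779808)
)

base_plot <- ggplot(cities, aes(x = name, y = pop, colour = pop)) +
  geom_point(size = 4)

# Option 1: rename the title of the colour aesthetic only
p1 <- base_plot + labs(colour = "Population")

# Option 2: keep the colourbar guide explicitly and give it a title
p2 <- base_plot + guides(colour = guide_colourbar(title = "Population"))

print(p1)
print(p2)
```

Both variants should keep the continuous gradient, since neither replaces the colourbar with a discrete legend.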
I would like to add rolling medians to my data in ggplot2. Calculating the rolling median in the ggplot aes and in the data.frame itself do not produce similar results (see plots).
I am looking for a solution within ggplot2 that produces the same results as in the data.frame calculation. I know this can be done with ggseas::stat_rollapplyr, but would prefer a solution in base ggplot2.
Code:
library(ggplot2)
library(data.table)
library(zoo)
library(gridExtra)
# set up dummy data
set.seed(123)
x = data.table(
date = rep( seq(from = as.Date("2016-01-01"), to = as.Date("2016-04-01"), by = "day"), 2),
y = c(5 + runif(92), 6 + runif(92)),
label = c(rep("A", 92), rep("B", 92))
)
x[, `:=` (
roll = rollmedian(y, k = 15, fill = NA, align = "center")
), by = label]
# plots
theme_set(theme_bw())
p = ggplot(x) +
geom_line(aes(date, y), col = "lightgrey") +
facet_wrap(~label)
# within aes
p1 = p +
geom_line(aes(date, rollmedian(y, k = 15, fill = NA, align = "center"))) +
labs(title = "within aes")
# calculated in data.frame
p2 = p +
geom_line(aes(date, roll)) +
labs(title = "within data.frame")
grid.arrange(p1, p2)
sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=nl_NL.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=nl_NL.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=nl_NL.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] zoo_1.7-13 magrittr_1.5 data.table_1.9.7 ggplot2_2.1.0.9000
loaded via a namespace (and not attached):
[1] labeling_0.3 colorspace_1.2-6 scales_0.4.0 assertthat_0.1 plyr_1.8.4 rsconnect_0.4.3 tools_3.2.3 gtable_0.2.0 tibble_1.2 Rcpp_0.12.7 grid_3.2.3 munsell_0.4.3
[13] lattice_0.20-33
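A likely reason for the mismatch, as far as I understand ggplot2, is that expressions inside aes() are evaluated once on the whole layer data, before it is split into facets or groups, so the rolling window runs across the boundary between label A and label B. The sketch below reproduces that difference outside ggplot2 with the same dummy values; roll_all and roll_grp are illustrative names, and ave() stands in for the data.table by = label step.

```r
library(zoo)

# Same dummy values as in the question: 92 days for each of two labels
set.seed(123)
y     <- c(5 + runif(92), 6 + runif(92))
label <- rep(c("A", "B"), each = 92)

# What an expression inside aes() effectively sees: one long vector, A followed by B
roll_all <- rollmedian(y, k = 15, fill = NA, align = "center")

# What the grouped calculation does: one rolling median per label
roll_grp <- ave(y, label,
                FUN = function(v) rollmedian(v, k = 15, fill = NA, align = "center"))

sum(is.na(roll_all))  # 14: padding only at the two ends of the combined series
sum(is.na(roll_grp))  # 28: padding at both ends of each label's series
```

Under that assumption, matching the data.frame result means computing the rolling median per group before plotting, or writing a custom stat that does so per group.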
How can I make drop=TRUE work (so that the legend contains only the categories that exist in the subset) within scale_colour_discrete when using ggplot, while keeping a stable colour mapping for categories across different plots?
This question is linked to this one and especially this comment.
Reproducible code borrowed from one of the answers in the linked question:
set.seed(2014)
library(ggplot2)
dataset <- data.frame(category = rep(LETTERS[1:5], 100),
x = rnorm(500, mean = rep(1:5, 100)),
y = rnorm(500, mean = rep(1:5, 100)))
dataset$fCategory <- factor(dataset$category)
subdata <- subset(dataset, category %in% c("A", "D", "E"))
ggplot(dataset, aes(x = x, y = y, colour = fCategory)) + geom_point()
ggplot(subdata, aes(x = x, y = y, colour = fCategory)) + geom_point() +
scale_colour_discrete(drop=TRUE,limits = levels(dataset$fCategory))
Why does the drop=TRUE not work in the second plot? The legend still contains all categories.
Output from sessionInfo():
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_1.0.0
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 digest_0.6.8 grid_3.1.2 gtable_0.1.2 labeling_0.3
[6] MASS_7.3-35 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.3
[11] reshape2_1.4.1 scales_0.2.4 stringr_0.6.2 tools_3.1.2
This is either a misconception of what drop does (the help entry does not give much detail, unfortunately) or a bug. However, I'd recommend dropping drop altogether (pun intended) and setting both limits and breaks:
ggplot(subdata, aes(x = x, y = y, colour = fCategory)) + geom_point() +
scale_colour_discrete(limits = levels(dataset$fCategory),
breaks = unique(subdata$fCategory))
The colour set is consistent and the legend shows only the categories present in the subset.
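Another common way to get a stable mapping, offered here only as an alternative sketch and not as what drop was designed for, is to build a named colour vector once from the full set of levels and pass it to scale_colour_manual() in every plot. The name cat_cols is made up, and scales::hue_pal() is used on the assumption that the default ggplot2 hues are wanted.

```r
library(ggplot2)

set.seed(2014)
dataset <- data.frame(category = rep(LETTERS[1:5], 100),
                      x = rnorm(500, mean = rep(1:5, 100)),
                      y = rnorm(500, mean = rep(1:5, 100)))
dataset$fCategory <- factor(dataset$category)
subdata <- subset(dataset, category %in% c("A", "D", "E"))

# One colour per level of the full factor, named by level
cat_cols <- setNames(scales::hue_pal()(nlevels(dataset$fCategory)),
                     levels(dataset$fCategory))

# The same named vector is reused, so each category keeps its colour in both plots
p_all <- ggplot(dataset, aes(x = x, y = y, colour = fCategory)) +
  geom_point() +
  scale_colour_manual(values = cat_cols)

p_sub <- ggplot(subdata, aes(x = x, y = y, colour = fCategory)) +
  geom_point() +
  scale_colour_manual(values = cat_cols)
```

Because values are matched by name, the legend of p_sub should list only the levels that occur in subdata, with the same colours as in p_all.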