rolling median in ggplot2 - r

I would like to add rolling medians to my data in ggplot2. Calculating the rolling median inside the ggplot aes() does not give the same result as calculating it in the data.frame itself (see plots).
I am looking for a solution within ggplot2 that produces the same results as in the data.frame calculation. I know this can be done with ggseas::stat_rollapplyr, but would prefer a solution in base ggplot2.
Code:
library(ggplot2)
library(data.table)
library(zoo)
library(gridExtra)
# set up dummy data
set.seed(123)
x = data.table(
date = rep( seq(from = as.Date("2016-01-01"), to = as.Date("2016-04-01"), by = "day"), 2),
y = c(5 + runif(92), 6 + runif(92)),
label = c(rep("A", 92), rep("B", 92))
)
x[, `:=` (
roll = rollmedian(y, k = 15, fill = NA, align = "center")
), by = label]
# plots
theme_set(theme_bw())
p = ggplot(x) +
geom_line(aes(date, y), col = "lightgrey") +
facet_wrap(~label)
# within aes
p1 = p +
geom_line(aes(date, rollmedian(y, k = 15, fill = NA, align = "center"))) +
labs(title = "within aes")
# calculated in data.frame
p2 = p +
geom_line(aes(date, roll)) +
labs(title = "within data.frame")
grid.arrange(p1, p2)
sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=nl_NL.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=nl_NL.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=nl_NL.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] zoo_1.7-13 magrittr_1.5 data.table_1.9.7 ggplot2_2.1.0.9000
loaded via a namespace (and not attached):
[1] labeling_0.3 colorspace_1.2-6 scales_0.4.0 assertthat_0.1 plyr_1.8.4 rsconnect_0.4.3 tools_3.2.3 gtable_0.2.0 tibble_1.2 Rcpp_0.12.7 grid_3.2.3 munsell_0.4.3
[13] lattice_0.20-33
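A likely explanation, offered as a sketch rather than a confirmed answer: an expression inside aes() appears to be evaluated once on the whole layer data, before the rows are split into facets, so rollmedian() there runs straight across the A/B boundary, whereas the data.table call runs once per label. The difference can be reproduced without any plotting. The helper name roll and the use of base R's ave() are assumptions introduced for this illustration.

```r
library(zoo)

set.seed(123)
y     <- c(5 + runif(92), 6 + runif(92))
label <- rep(c("A", "B"), each = 92)

# Same rolling median as in the question, wrapped for reuse (hypothetical helper)
roll <- function(v) rollmedian(v, k = 15, fill = NA, align = "center")

# One pass over all 184 values: windows near row 92/93 mix A and B
whole <- roll(y)

# One pass per label, mirroring the data.table `by = label` call
grouped <- ave(y, label, FUN = roll)

# Rows where one version is NA-padded and the other is not
boundary_rows <- which(is.na(whole) != is.na(grouped))
print(boundary_rows)
```

Under that assumption, grouping explicitly inside the aesthetic, for example geom_line(aes(date, ave(y, label, FUN = roll))), should reproduce the data.frame result while staying within ggplot2 and zoo.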

Related

ggplot loses scale_color_manual when saving to png with ggsave

I am having a strange issue where saving a ggplot figure that I make does not maintain the colors I set using scale_color_manual. I have made a reproducible example (with some editing) using the mtcars dataset.
plot1 <- ggplot(data = mtcars %>% rownames_to_column("type") %>%
dplyr::filter(between(cyl, 6, 8)) %>%
dplyr::filter(between(gear, 4, 5))
) +
aes(y = wt, x = type) +
geom_boxplot(outlier.size = 0) +
geom_jitter(aes(color = factor(cyl), shape = factor(gear)), size = 10, position=position_jitter(width=.25, height=0)) +
#geom_smooth(method = lm, se = TRUE) +
scale_shape_manual(values=c("👧","👦"), name = "Gear", labels = c("4", "5")) + # I need 9 values (I for each ID)
scale_color_manual(values=c('red4', 'springgreen4'), name = "cyl", labels = c("4 cylinder", "5 cylinder")) +
# # geom_jitter(size=8, aes(shape=Sex, color=Sex), position = position_dodge(.4)) +
theme(legend.position = "top",
plot.title = element_text(hjust = 0.5) # Center the plot title
)
ggsave("images/review/mean_AllAgents_test.png",plot1, width=11, height=6.5, dpi=400)
The figure in the RStudio "Plots" pane has cyl colored in red and green shown below
Whereas the file saved using ggsave does not show these colors.
I have tried using the fix from this SO post. I also have tried using cowplot::save_plot. The colors do remain if I manually Export the figure from the "Plots" pane.
Does anyone know why this is occurring?
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] apastats_0.3 ggstatsplot_0.8.0 rstatix_0.7.0 hrbrthemes_0.8.0 gtsummary_1.4.2.9011 car_3.0-11
[7] carData_3.0-4 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6 purrr_0.3.4 readr_2.0.0
[13] tidyr_1.1.3 tibble_3.1.2 tidyverse_1.3.1 Rmisc_1.5 plyr_1.8.6 lattice_0.20-41
[19] ggplot2_3.3.5 rio_0.5.27 pacman_0.5.1
EDIT
I was asked to provide additional detail in my Preferences

ggalluvial (r package) example gives error

Am I missing something? The example in the ggalluvial package gives this error:
> library(ggalluvial)
> ggplot(as.data.frame(Titanic),
+ aes(weight = Freq,
+ axis1 = Class, axis2 = Sex, axis3 = Age,
+ fill = Survived)) +
+ geom_alluvium() +
+ scale_x_continuous(breaks = 1:3, labels = c("Class", "Sex", "Age"))
Error: Invalid column specification
UPDATE 2:
as per DanHall's request:
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggalluvial_0.6.0 ggthemes_3.4.0 alluvial_0.1-2 dplyr_0.5.0 purrr_0.2.2 readr_0.2.2 tidyr_0.6.1
[8] tibble_1.3.4 ggplot2_2.2.1 tidyverse_1.1.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 compiler_3.4.3 plyr_1.8.4 base64enc_0.1-3 forcats_0.2.0 tools_3.4.3 digest_0.6.12
[8] evaluate_0.10.1 jsonlite_1.5 lubridate_1.5.6 gtable_0.2.0 nlme_3.1-128 lattice_0.20-33 rlang_0.1.4
[15] psych_1.6.4 DBI_0.6 yaml_2.1.14 parallel_3.4.3 haven_1.0.0 stringr_1.2.0 httr_1.3.1
[22] knitr_1.19 xml2_1.1.1 hms_0.3 rprojroot_1.2 grid_3.4.3 R6_2.2.2 readxl_0.1.1
[29] rmarkdown_1.8 reshape2_1.4.2 modelr_0.1.0 magrittr_1.5 backports_1.1.1 htmltools_0.3.6 scales_0.5.0
[36] rsconnect_0.8.5 assertthat_0.1 mnormt_1.5-4 rvest_0.3.2 colorspace_1.3-2 labeling_0.3 stringi_1.1.6
[43] lazyeval_0.2.1 munsell_0.4.3 broom_0.4.1
See below, this code works on another machine. When something that is working for other people isn't working for you, it can be useful to run update.packages() and follow the instructions to update any outdated packages you may have installed. This turned out to be the solution here.
It works on my machine as is:
ggplot(as.data.frame(Titanic),
aes(weight = Freq,
axis1 = Class, axis2 = Sex, axis3 = Age,
fill = Survived)) +
geom_alluvium() +
scale_x_continuous(breaks = 1:3, labels = c("Class", "Sex", "Age"))
It also works when calling example(geom_alluvium, package = "ggalluvial").
Here's another usage example (from the vignette).
ggplot(as.data.frame(Titanic),
aes(weight = Freq,
axis1 = Survived, axis2 = Sex, axis3 = Class)) +
geom_alluvium(aes(fill = Class),
width = 0, knot.pos = 0, reverse = FALSE) +
guides(fill = FALSE) +
geom_stratum(width = 1/8, reverse = FALSE) +
geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
scale_x_continuous(breaks = 1:3, labels = c("Survived", "Sex", "Class")) +
coord_flip() +
ggtitle("Titanic survival by class and sex")

ggplot2 not resizing plot for datetime vline

When I add geom_hline()'s to a plot, the plot is resized to accommodate them. But when I add geom_vline()'s, the plot is not resized.
Why is this happening? How can I get the plot to resize?
MWE
library(ggplot2)
data <- data.frame(
time=c(
"2016-12-09T05:07:11Z", "2016-12-10T09:42:45Z", "2016-12-09T10:04:57Z",
"2016-12-09T02:19:04Z", "2016-12-11T17:43:02Z", "2016-12-11T05:40:48Z",
"2016-12-11T08:47:13Z", "2016-12-12T15:41:13Z"),
value=c(23.3, 8.1, 12.9, 12.7, 5.6, 3.9, 5.5, 27.8)
)
# Each contains 3 values: 1 within the domain/range of `data` and 2 on either side
vlines <- data.frame(time=c("2016-12-07T00:00:00Z", "2016-12-11T00:00:00Z", "2016-12-14T00:00:00Z"))
hlines <- data.frame(value=c(-20, 10, 50))
data$time <- strptime(as.character(data$time), "%Y-%m-%dT%H:%M:%S", tz="UTC")
vlines$time <- strptime(as.character(vlines$time), "%Y-%m-%dT%H:%M:%S", tz="UTC")
vlines$timeNum <- as.numeric(vlines$time)
p <- ggplot(data, aes(x=time, y=value)) + geom_line()
ggsave("mwe1.pdf", p)
p <- p +
geom_hline(data=hlines, aes(yintercept=value), color="red") +
geom_vline(data=vlines, aes(xintercept=timeNum), color="blue")
ggsave("mwe2.pdf", p)
mwe1.pdf
mwe2.pdf
Edit: sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.6
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] labeling_0.3 colorspace_1.3-2 scales_0.4.1 lazyeval_0.2.0
[5] plyr_1.8.4 tools_3.3.3 gtable_0.2.0 tibble_1.3.3
[9] Rcpp_0.12.12 grid_3.3.3 methods_3.3.3 rlang_0.1.1
[13] munsell_0.4.3
You can adjust the x-axis using scale_x_date. Add limits to it with as.Date(range(vlines$time)).
Here is my code (adapted from yours):
######################
# Generate input data
data <- data.frame(
time = c("2016-12-09T05:07:11Z", "2016-12-10T09:42:45Z", "2016-12-09T10:04:57Z",
"2016-12-09T02:19:04Z", "2016-12-11T17:43:02Z", "2016-12-11T05:40:48Z",
"2016-12-11T08:47:13Z", "2016-12-12T15:41:13Z"),
value = c(23.3, 8.1, 12.9, 12.7, 5.6, 3.9, 5.5, 27.8))
data$time <- strptime(as.character(data$time), "%Y-%m-%dT%H:%M:%S", tz = "UTC")
data$time <- as.Date(data$time, "%Y-%m-%dT%H:%M:%S")
vlines <- data.frame(time = c("2016-12-07T00:00:00Z",
"2016-12-11T00:00:00Z",
"2016-12-14T00:00:00Z"))
vlines$time <- strptime(as.character(vlines$time), "%Y-%m-%dT%H:%M:%S", tz = "UTC")
vlines$timeNum <- as.Date(vlines$time, "%Y-%m-%dT%H:%M:%S")
hlines <- data.frame(value = c(-20, 10, 50))
######################
# Plot your timeseries
library(ggplot2)
ggplot(data, aes(time, value)) +
geom_line() +
geom_hline(data = hlines, aes(yintercept = value), color = "red") +
geom_vline(data = vlines, aes(xintercept = timeNum), color = "blue") +
scale_x_date(limits = as.Date(range(vlines$time)))
Result:
PS: I had to tweak some time/date conversions in your code to make it work (the code you provided didn't run for me).
Used sessionInfo():
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.2.1.9000 prompt_1.0.0 colorout_1.1-2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 memuse_3.0-1 clisymbols_1.2.0 crayon_1.3.2
[5] grid_3.4.1 plyr_1.8.4 gtable_0.2.0 scales_0.5.0.9000
[9] rlang_0.1.2 lazyeval_0.2.0 labeling_0.3 munsell_0.4.3
[13] compiler_3.4.1 colorspace_1.3-2 tibble_1.3.4
@PoGibas's answer didn't quite work for me, but a slight modification of his approach did.
library(ggplot2)
data <- data.frame(
time=c(
"2016-12-09T05:07:11Z", "2016-12-10T09:42:45Z", "2016-12-09T10:04:57Z",
"2016-12-09T02:19:04Z", "2016-12-11T17:43:02Z", "2016-12-11T05:40:48Z",
"2016-12-11T08:47:13Z", "2016-12-12T15:41:13Z"),
value=c(23.3, 8.1, 12.9, 12.7, 5.6, 3.9, 5.5, 27.8)
)
# Each contains 3 values: 1 within the domain/range of `data` and 2 on either side
vlines <- data.frame(time=c("2016-12-07T00:00:00Z", "2016-12-11T00:00:00Z", "2016-12-14T00:00:00Z"))
hlines <- data.frame(value=c(-20, 10, 50))
data$time <- strptime(as.character(data$time), "%Y-%m-%dT%H:%M:%S", tz="UTC")
vlines$time <- strptime(as.character(vlines$time), "%Y-%m-%dT%H:%M:%S", tz="UTC")
vlines$timeNum <- as.numeric(vlines$time)
p <- ggplot(data, aes(x=time, y=value)) +
geom_line() +
geom_hline(data=hlines, aes(yintercept=value), color="red") +
geom_vline(data=vlines, aes(xintercept=timeNum), color="blue") +
scale_x_datetime(limits=as.POSIXct(range(vlines$time))) # add datetime limits
ggsave("mwe3.pdf", p)
MWE3
I'm leaving this question as unanswered for now because I still don't understand why this is necessary. With this approach, if I have to add several pieces to a plot, I have to maintain xmin/xmax as I go to ensure everything is visible. As this isn't necessary with the geom_hline()'s, I still think I'm missing something vital.
Edit: I'm accepting @PoGibas's answer. Seems like this is just how ggplot2 is right now.
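To avoid maintaining xmin/xmax by hand as more pieces are added, one option is to derive the limits from every datetime vector that goes into the plot. The following is a minimal sketch in base R; the helper name datetime_limits and the toy timestamps are assumptions made for illustration, not part of ggplot2.

```r
# Toy timestamps standing in for data$time and vlines$time
obs   <- as.POSIXct(c("2016-12-09 02:19:04", "2016-12-12 15:41:13"), tz = "UTC")
marks <- as.POSIXct(c("2016-12-07 00:00:00", "2016-12-14 00:00:00"), tz = "UTC")

# Hypothetical helper: overall range of any number of datetime vectors.
# as.POSIXct() also accepts the POSIXlt values that strptime() returns.
datetime_limits <- function(...) {
  range(do.call(c, lapply(list(...), as.POSIXct)))
}

lims <- datetime_limits(obs, marks)
print(lims)
```

The result could then be passed on as scale_x_datetime(limits = datetime_limits(data$time, vlines$time)), so that adding another set of lines only means adding another argument.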

Strange interaction between Alpha and legend

While plotting several ecdf curves that overlapped, I tried adjusting the alpha of the curves to improve visibility. While tinkering with the correct placement of alpha, I found the following.
library(ggplot2)
library(dplyr)
x <- data.frame(Var = rep(1:3, 10000)) %>%
mutate(Val = rnorm(10000)*Var,
Var = factor(Var)) %>%
arrange(Var, Val) %>%
group_by(Var) %>%
mutate(ecdf = ecdf(Val)(Val))
ggplot(x, aes(x=Val)) +
stat_ecdf(aes(color = Var), size = 1.25, alpha = .9)
This gives the lines the correct alpha, but makes the legend useless. (I'm only using alpha=.9 here to demonstrate that the legend colors completely disappear.) The workaround I've found is to add:
ggplot(x, aes(x=Val)) +
stat_ecdf(aes(color = Var), size = 1.35, alpha = .9) +
guides(color = guide_legend(override.aes= list(alpha = 1)))
So while I have a solution for my immediate problem, can someone explain why the first call to ggplot is messed up? Is this a bug? If it makes any difference, I believe this issue also exists when using geom_line (though a slightly different data.frame is needed).
Weird. Here's my sessionInfo(). I've also checked to see if there are any outdated packages.
sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Japanese_Japan.932 LC_CTYPE=Japanese_Japan.932 LC_MONETARY=Japanese_Japan.932
[4] LC_NUMERIC=C LC_TIME=Japanese_Japan.932
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 ggplot2_1.0.1 stringr_1.0.0 tidyr_0.2.0 dplyr_0.4.2
[6] data.table_1.9.4
loaded via a namespace (and not attached):
[1] Rcpp_0.11.6 magrittr_1.5 MASS_7.3-40 munsell_0.4.2 colorspace_1.2-6
[6] R6_2.0.1 plyr_1.8.3 tools_3.2.1 parallel_3.2.1 grid_3.2.1
[11] gtable_0.1.2 DBI_0.3.1 lazyeval_0.1.10 assertthat_0.1 digest_0.6.8
[16] reshape2_1.4.1 labeling_0.3 stringi_0.5-4 scales_0.2.5 chron_2.3-47
[21] proto_0.3-10
How are they different? What am I missing?
library(ggplot2)
library(dplyr)
library(gridExtra)
x <- data.frame(Var = rep(1:3, 10000)) %>%
mutate(Val = rnorm(10000)*Var,
Var = factor(Var)) %>%
arrange(Var, Val) %>%
group_by(Var) %>%
mutate(ecdf = ecdf(Val)(Val))
ggplot(x, aes(x=Val)) +
stat_ecdf(aes(color = Var), size = 1.25, alpha = .9) -> gg1
ggplot(x, aes(x=Val)) +
stat_ecdf(aes(color = Var), size = 1.35, alpha = .9) +
guides(color = guide_legend(override.aes= list(alpha = 1))) -> gg2
grid.arrange(gg1, gg2)

Stable mapping with ggplot2 scale_colour_discrete: drop does not work?

How can I make drop=TRUE work within scale_colour_discrete (so that the legend contains only the categories present in the subset) while keeping a stable colour mapping for categories across different plots?
This question is linked to this one and especially this comment.
Reproducible code borrowed from one of the answers in the linked question:
set.seed(2014)
library(ggplot2)
dataset <- data.frame(category = rep(LETTERS[1:5], 100),
x = rnorm(500, mean = rep(1:5, 100)),
y = rnorm(500, mean = rep(1:5, 100)))
dataset$fCategory <- factor(dataset$category)
subdata <- subset(dataset, category %in% c("A", "D", "E"))
ggplot(dataset, aes(x = x, y = y, colour = fCategory)) + geom_point()
ggplot(subdata, aes(x = x, y = y, colour = fCategory)) + geom_point() +
scale_colour_discrete(drop=TRUE,limits = levels(dataset$fCategory))
Why does the drop=TRUE not work in the second plot? The legend still contains all categories.
Output from sessionInfo():
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_1.0.0
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 digest_0.6.8 grid_3.1.2 gtable_0.1.2 labeling_0.3
[6] MASS_7.3-35 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.3
[11] reshape2_1.4.1 scales_0.2.4 stringr_0.6.2 tools_3.1.2
This is either a misconception of what drop does (the help entry does not give much detail, unfortunately) or a bug. However, I'd recommend dropping drop altogether (pun intended) and setting both limits and breaks:
ggplot(subdata, aes(x = x, y = y, colour = fCategory)) + geom_point() +
scale_colour_discrete(limits = levels(dataset$fCategory),
breaks = unique(subdata$fCategory))
The colour set is consistent, the legend is fine.
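An alternative sketch, not taken from the answer above: fix one colour per level in a named vector and pass it to scale_colour_manual(), which matches colours to levels by name. The names lv and pal are introduced here for illustration, and scales::hue_pal() is used on the assumption that it reproduces the default ggplot2 hues.

```r
library(ggplot2)

set.seed(2014)
dataset <- data.frame(category = rep(LETTERS[1:5], 100),
                      x = rnorm(500, mean = rep(1:5, 100)),
                      y = rnorm(500, mean = rep(1:5, 100)))
dataset$fCategory <- factor(dataset$category)
subdata <- subset(dataset, category %in% c("A", "D", "E"))

# One colour per level of the full data set, keyed by level name
lv  <- levels(dataset$fCategory)
pal <- setNames(scales::hue_pal()(length(lv)), lv)

# The subset picks up the colours of A, D and E by name
p <- ggplot(subdata, aes(x = x, y = y, colour = fCategory)) +
  geom_point() +
  scale_colour_manual(values = pal)
```

Because the mapping is by name, the same pal can be reused for any other subset. The legend should then list only the levels present, although this may depend on the ggplot2 version.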
