ggplot2: plot time series and multiple point forecasts on a quasi time axis - r

I have a problem plotting time series data and multiple point forecasts.
I would like to plot historical data and some point forecasts. The historical data should be linked by a line, and the point forecasts by an arrow, since the second forecasted value, say forecast_02, is actually a revision of forecast_01.
Libraries used:
library(ggplot2)
library(plyr)
library(dplyr)
library(stringr)
library(grid)
Here is my dummy data:
set.seed(1)
my_df <-
structure(list(values = c(-0.626453810742332, 0.183643324222082,
-0.835628612410047, 1.59528080213779, 0.329507771815361, -0.820468384118015,
0.487429052428485, 0.738324705129217, 0.575781351653492, -0.305388387156356
), c = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), time = c("2014-01-01",
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01",
"2014-07-01", "2014-08-01", "2014-09-01", "2014-10-01"), type_of_value = c("historical",
"historical", "historical", "historical", "historical", "historical",
"historical", "historical", "forecast_01", "forecast_02"), time_and_forecast = c("2014-01-01",
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01",
"2014-07-01", "2014-08-01", "forecast", "forecast")), .Names = c("values",
"c", "time", "type_of_value", "time_and_forecast"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
which looks like this:
Source: local data frame [10 x 5]
values c time type_of_value time_and_forecast
1 -0.6264538 a 2014-01-01 historical 2014-01-01
2 0.1836433 b 2014-02-01 historical 2014-02-01
3 -0.8356286 c 2014-03-01 historical 2014-03-01
4 1.5952808 d 2014-04-01 historical 2014-04-01
5 0.3295078 e 2014-05-01 historical 2014-05-01
6 -0.8204684 f 2014-06-01 historical 2014-06-01
7 0.4874291 g 2014-07-01 historical 2014-07-01
8 0.7383247 h 2014-08-01 historical 2014-08-01
9 0.5757814 i 2014-09-01 forecast_01 forecast
10 -0.3053884 j 2014-10-01 forecast_02 forecast
With the code below I almost managed to produce a plot that I wanted. However, I cannot get my historical data points to be linked by a line.
# my code for almost perfect chart
ggplot(data = my_df,
aes(x = time_and_forecast,
y = values,
color = type_of_value,
group = time_and_forecast)) +
geom_point(size = 5) +
geom_line(arrow = arrow()) +
theme_minimal()
Could you help me link the blue points with a line? Thank you.
# sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Slovenian_Slovenia.1250 LC_CTYPE=Slovenian_Slovenia.1250 LC_MONETARY=Slovenian_Slovenia.1250
[4] LC_NUMERIC=C LC_TIME=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.0.0 dplyr_0.4.1 plyr_1.8.3 ggplot2_1.0.1
loaded via a namespace (and not attached):
[1] Rcpp_0.11.6 assertthat_0.1 digest_0.6.8 MASS_7.3-40 R6_2.0.1 gtable_0.1.2
[7] DBI_0.3.1 magrittr_1.5 scales_0.2.4 stringi_0.4-1 lazyeval_0.1.10 reshape2_1.4.1
[13] labeling_0.3 proto_0.3-10 tools_3.2.0 munsell_0.4.2 parallel_3.2.0 colorspace_1.2-6

I think this will get what you want:
ggplot(data = my_df,
aes(x = time_and_forecast,
y = values,
color = type_of_value,
group = 1)) +
geom_point(size = 5) +
geom_line(data=my_df[my_df$type_of_value=='historical',]) +
geom_line(data=my_df[!my_df$type_of_value=='historical',], arrow=arrow()) +
theme_minimal()
ggplot tries to draw lines within your x categorical groups, but it fails because each group only has 1 value. If you specify that they should all be the same group with group = 1, it will draw the lines across groups. Since you wanted a line for the historical group and an arrow between the other two points, you can make two geom_line() calls on subsets of the dataframe with different arrow parameters. I don't know if there's a way to get ggplot to pick arrows automatically by group (like it does with color, linetype, etc).
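The grouping behaviour described above can be checked on a toy data frame. This is only a sketch: toy and its columns are made-up names, not the question's data. layer_data() shows which group ggplot2 assigns to each row.

```r
library(ggplot2)

# Hypothetical toy data with a discrete (character) x variable
toy <- data.frame(
  x = c("a", "b", "c", "d"),
  y = c(1, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Default grouping: each discrete x value becomes its own group,
# so every group holds a single row and no line segment can be drawn
p_default <- ggplot(toy, aes(x, y)) + geom_line()

# group = 1 puts all rows into one group, so the points get connected
p_single <- ggplot(toy, aes(x, y, group = 1)) + geom_line()

n_groups_default <- length(unique(layer_data(p_default)$group))
n_groups_single <- length(unique(layer_data(p_single)$group))
```

Comparing n_groups_default with n_groups_single shows why the line only appears once the group is forced to a constant.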

You may want to split up the datasets:
library(ggplot2)
library(grid)
df_hist <- subset(my_df, type_of_value == "historical")
df_forc <- subset(my_df, type_of_value != "historical")
ggplot() +
geom_line(data = df_hist, aes(x = time, y = values, group = 1, color = type_of_value)) +
geom_point(data = df_forc, aes(x = time, y = values, color = type_of_value), size = 5) +
geom_path(data = df_forc, aes(x = time, y = values, group = 1), arrow = arrow())
You could even add a shaded rectangle to further stress the forecasting region:
ggplot() +
geom_line(data = df_hist, aes(x = time, y = values, group = 1, color = type_of_value)) +
geom_point(data = df_forc, aes(x = time, y = values, color = type_of_value), size = 5) +
geom_path(data = df_forc, aes(x = time, y = values, group = 1), arrow = arrow()) +
annotate("rect", xmin = min(df_forc$time), xmax = max(df_forc$time),
ymin = -Inf, ymax = +Inf, alpha = 0.25, fill = "yellow")

Related

Is there a way to subset data in ggrepel with data inherited from the pipe? [duplicate]

I am trying to subset a layer of a plot where I am passing the data to ggplot through a pipe.
Here is an example:
library(dplyr)
library(ggplot2)
library(scales)
set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
as.Date("2015-12-31"), by = "month"), 2),
Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
Indicator = as.factor(rep(c(1, 2), each = 12)))
df_example %>%
group_by(Month) %>%
mutate(`Relative Value` = Value/sum(Value)) %>%
ungroup() %>%
ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) +
geom_bar(position = "fill", stat = "identity") +
theme_bw()+
scale_y_continuous(labels = percent_format()) +
geom_line(aes(x = Month, y = `Relative Value`))
This gives:
I would like only one of those lines to appear, which I would be able to do if something like this worked in the geom_line layer:
geom_line(subset = .(Indicator == 1), aes(x = Month, y = `Relative Value`))
Edit:
Session info:
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_0.3.0 lubridate_1.3.3 ggplot2_1.0.1 lazyeval_0.1.10 dplyr_0.4.3 RSQLite_1.0.0 readr_0.2.2
[8] RJDBC_0.2-5 DBI_0.3.1 rJava_0.9-7
loaded via a namespace (and not attached):
[1] Rcpp_0.12.2 knitr_1.11 magrittr_1.5 MASS_7.3-40 munsell_0.4.2 lattice_0.20-31
[7] colorspace_1.2-6 R6_2.1.1 stringr_1.0.0 plyr_1.8.3 tools_3.2.1 parallel_3.2.1
[13] grid_3.2.1 gtable_0.1.2 htmltools_0.2.6 yaml_2.1.13 assertthat_0.1 digest_0.6.8
[19] reshape2_1.4.1 memoise_0.2.1 rmarkdown_0.8.1 labeling_0.3 stringi_1.0-1 zoo_1.7-12
[25] proto_0.3-10
tl;dr: Pass the data to that layer as a function that subsets the plot's data according to your criteria.
According to ggplot2's documentation on layers, you have three options when passing the data to a new layer:
If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().
A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.
A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data.
The first two options are the most common ones, but the third is perfect for our needs when the data has been modified through pipes.
In your example, adding data = function(x) subset(x,Indicator == 1) to the geom_line does the trick:
library(dplyr)
library(ggplot2)
library(scales)
set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
as.Date("2015-12-31"), by = "month"), 2),
Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
Indicator = as.factor(rep(c(1, 2), each = 12)))
df_example %>%
group_by(Month) %>%
mutate(`Relative Value` = Value/sum(Value)) %>%
ungroup() %>%
ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) +
geom_bar(position = "fill", stat = "identity") +
theme_bw()+
scale_y_continuous(labels = percent_format()) +
geom_line(data = function(x) subset(x,Indicator == 1), aes(x = Month, y = `Relative Value`))
This is the resulting plot
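To see the function-as-data option in isolation, here is a minimal sketch on made-up data (toy, g and the other names are assumptions for illustration only). The point layer inherits all rows, while the line layer receives only the rows its function returns.

```r
library(ggplot2)

# Hypothetical toy data with two series, "a" and "b"
toy <- data.frame(
  x = rep(1:3, times = 2),
  y = c(1, 2, 3, 3, 2, 1),
  g = rep(c("a", "b"), each = 3),
  stringsAsFactors = FALSE
)

p <- ggplot(toy, aes(x, y, colour = g)) +
  geom_point() +
  # The function is called with the plot data and must return a data frame
  geom_line(data = function(d) d[d$g == "a", ])

n_point_rows <- nrow(layer_data(p, 1))  # rows seen by geom_point()
n_line_rows <- nrow(layer_data(p, 2))   # rows seen by geom_line()
```

Because the function is applied to whatever data the plot ends up with, it works the same whether that data came from a pipe or from a named object.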
library(dplyr)
library(ggplot2)
library(scales)
set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
as.Date("2015-12-31"), by = "month"), 2),
Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
Indicator = as.factor(rep(c(1, 2), each = 12)))
df_example %>%
group_by(Month) %>%
mutate(`Relative Value` = Value/sum(Value)) %>%
ungroup() %>%
ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) +
geom_bar(position = "fill", stat = "identity") +
theme_bw()+
scale_y_continuous(labels = percent_format()) +
geom_line(aes(x = Month, y = `Relative Value`,linetype=Indicator)) +
scale_linetype_manual(values=c("1"="solid","2"="blank"))
yields:
You might benefit from stat_subset(), a stat I made for my personal use that is available in metR: https://eliocamp.github.io/metR/articles/Visualization-tools.html#stat_subset
It has an aesthetic called subset that takes a logical expression and subsets the data accordingly.
library(dplyr)
library(ggplot2)
library(scales)
set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
as.Date("2015-12-31"), by = "month"), 2),
Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
Indicator = as.factor(rep(c(1, 2), each = 12)))
df_example %>%
group_by(Month) %>%
mutate(`Relative Value` = Value/sum(Value)) %>%
ungroup() %>%
ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) +
geom_bar(position = "fill", stat = "identity") +
theme_bw()+
scale_y_continuous(labels = percent_format()) +
metR::stat_subset(aes(x = Month, y = `Relative Value`, subset = Indicator == 1),
geom = "line")

ggplot2 both axis labels inside plot area

I would like to create a ggplot2 with both the y-axis and x-axis labels on the inside, i.e., facing inwards and placed inside the plot area.
This previous SO answer by Z.Lin solves it for the case of the y-axis, and I've got that working just fine. But extending that approach to both axes has me stumped. grobs is hard, I think.
So I attempted to start small, by adapting Z.Lin's code to work for the x-axis instead of the y-axis, but I have not been able to achieve even that. grobs is really complicated. My attempt (below) runs without errors/warnings until grid.draw(), where it crashes and burns (I think I'm misusing some args somewhere, but I can't identify which and at this point I'm just guessing).
# locate the grob that corresponds to x-axis labels
x.label.grob <- gp$grobs[[which(gp$layout$name == "axis-b")]]$children$axis
# remove x-axis labels from the plot, & shrink the space occupied by them
gp$grobs[[which(gp$layout$name == "axis-b")]] <- zeroGrob()
gp$widths[gp$layout$l[which(gp$layout$name == "axis-b")]] <- unit(0, "cm")
# create new gtable
new.x.label.grob <- gtable::gtable(widths = unit(1, "npc"))
# place axis ticks in the first row
new.x.label.grob <-
gtable::gtable_add_rows(
new.x.label.grob,
heights = x.label.grob[["heights"]][1])
new.x.label.grob <-
gtable::gtable_add_grob(
new.x.label.grob,
x.label.grob[["grobs"]][[1]],
t = 1, l = 1)
# place axis labels in the second row
new.x.label.grob <-
gtable::gtable_add_rows(
new.x.label.grob,
heights = x.label.grob[["heights"]][2])
new.x.label.grob <-
gtable::gtable_add_grob(
new.x.label.grob,
x.label.grob[["grobs"]][[2]],
t = 1, l = 2)
# add third row that takes up all the remaining space
new.x.label.grob <-
gtable::gtable_add_rows(
new.x.label.grob,
heights = unit(1, "null"))
gp <-
gtable::gtable_add_grob(
x = gp,
grobs = new.x.label.grob,
t = gp$layout$t[which(gp$layout$name == "panel")],
l = gp$layout$l[which(gp$layout$name == "panel")])
grid.draw(gp)
# Error in unit(widths, default.units) :
# 'x' and 'units' must have length > 0
I guess my question can be split into three semi-independent parts, where each subsequent question supersedes the earlier ones (so if you can answer a later question, there will be no need to bother with the earlier ones):
can anyone adapt the existing answer to the x-axis?
can anyone provide code in that vein to get both axes inside?
does anyone know of a neater way to achieve both axes inside for ggplot2?
Here's my MWE (mostly replicating Z.Lin's answer, but with new data):
library(dplyr)
library(magrittr)
library(ggplot2)
library(grid)
library(gtable)
library(errors)
df <- structure(list(
temperature = c(200, 300, 400, 500, 600, 700, 800, 900),
diameter =
structure(
c(13.54317, 10.32521, 10.23137, 17.90464, 29.98183, 55.65514, 101.60747, 147.3074),
id = "<environment>",
errors = c(1.24849, 0.46666, 0.36781, 0.48463, 0.94639, 1.61459, 6.98346, 12.18353),
class = "errors")),
row.names = c(NA, -8L),
class = "data.frame")
p <- ggplot() +
geom_smooth(data = df %>% filter(temperature >= 400),
aes(x = temperature, y = diameter),
method = "lm", formula = "y ~ x",
se = FALSE, fullrange = TRUE) +
# experimental errors as red ribbon (instead of errorbars)
geom_ribbon(data = df,
aes(x = temperature,
ymin = errors_min(diameter),
ymax = errors_max(diameter)),
fill = alpha("red", 0.2),
colour = alpha("red", 0.2)) +
geom_point(data = df,
aes(x = temperature, y = diameter),
size = 0.7) +
geom_line(data = df,
aes(x = temperature, y = diameter),
size = 0.15) +
scale_x_continuous(breaks = seq(200, 900, 200)) +
scale_y_log10(breaks = c(10, seq(30, 150, 30)),
labels = c("10", "30", "60", "90", "120", "150=d/nm")) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_text(hjust = 0))
# convert from ggplot to grob object
gp <- ggplotGrob(p)
y.label.grob <- gp$grobs[[which(gp$layout$name == "axis-l")]]$children$axis
gp$grobs[[which(gp$layout$name == "axis-l")]] <- zeroGrob()
gp$widths[gp$layout$l[which(gp$layout$name == "axis-l")]] <- unit(0, "cm")
new.y.label.grob <- gtable::gtable(heights = unit(1, "npc"))
new.y.label.grob <-
gtable::gtable_add_cols(
new.y.label.grob,
widths = y.label.grob[["widths"]][2])
new.y.label.grob <-
gtable::gtable_add_grob(
new.y.label.grob,
y.label.grob[["grobs"]][[2]],
t = 1, l = 1)
new.y.label.grob <-
gtable::gtable_add_cols(
new.y.label.grob,
widths = y.label.grob[["widths"]][1])
new.y.label.grob <-
gtable::gtable_add_grob(
new.y.label.grob,
y.label.grob[["grobs"]][[1]],
t = 1, l = 2)
new.y.label.grob <-
gtable::gtable_add_cols(
new.y.label.grob,
widths = unit(1, "null"))
gp <-
gtable::gtable_add_grob(
x = gp,
grobs = new.y.label.grob,
t = gp$layout$t[which(gp$layout$name == "panel")],
l = gp$layout$l[which(gp$layout$name == "panel")])
grid.draw(gp)
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] errors_0.3.4 gtable_0.3.0 ggplot2_3.3.2 magrittr_1.5 dplyr_1.0.2
loaded via a namespace (and not attached):
[1] rstudioapi_0.11 splines_3.6.2 tidyselect_1.1.0 munsell_0.5.0
[5] lattice_0.20-41 colorspace_1.4-1 R6_2.5.0 rlang_0.4.8
[9] tools_3.6.2 nlme_3.1-148 mgcv_1.8-31 withr_2.3.0
[13] ellipsis_0.3.1 digest_0.6.27 yaml_2.2.1 tibble_3.0.4
[17] lifecycle_0.2.0 crayon_1.3.4 Matrix_1.2-18 purrr_0.3.4
[21] farver_2.0.3 vctrs_0.3.4 glue_1.4.2 compiler_3.6.2
[25] pillar_1.4.6 generics_0.1.0 scales_1.1.1 pkgconfig_2.0.3
Rather than "freezing" the plot as a grob tree then hacking the grobs, I thought it might be useful to see how we could move the axes inside but keep the object as a ggplot. The way to do this is to write a function that takes your plot, extracts the necessary information, then builds axes and adds them as annotations.
The returned object is a normal ggplot, to which you can add layers, scales and modify themes as normal:
move_axes_inside <- function(p)
{
b <- ggplot_build(p)
x_breaks <- b$layout$panel_scales_x[[1]]$break_info()
y_breaks <- b$layout$panel_scales_y[[1]]$break_info()
x_range <- b$layout$panel_params[[1]]$x.range
y_range <- b$layout$panel_params[[1]]$y.range
y_breaks$major <- diff(y_breaks$range)/diff(y_range) * y_breaks$major +
(y_breaks$range[1] - y_range[1])/diff(y_range)
x_breaks$major <- diff(x_breaks$range)/diff(x_range) * x_breaks$major +
(x_breaks$range[1] - x_range[1])/diff(x_range)
y <- grid::yaxisGrob(at = y_breaks$major, label = y_breaks$labels, main = FALSE)
x <- grid::xaxisGrob(at = x_breaks$major, label = x_breaks$labels, main = FALSE)
p + annotation_custom(y, xmin = x_range[1], xmax = x_range[1]) +
annotation_custom(x, ymin = y_range[1], ymax = y_range[1]) +
theme(axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank())
}
So testing it with your plot we get:
p2 <- move_axes_inside(p)
p2
And we can change theme elements etc:
p2 + theme(panel.grid.major = element_line())
This would need a bit of development and testing to get it working with discrete axes and so on, but it should work for arbitrary continuous axes as-is.
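The rescaling step inside move_axes_inside() may be easier to follow with plain numbers. The sketch below assumes, as the function's arithmetic implies, that the major values from break_info() are fractions of the scale limits (its range), and that x.range is the slightly wider panel range after expansion. All numbers and the helper name to_panel_fraction are made up for illustration.

```r
# Hypothetical values: scale limits, expanded panel range, and breaks
scale_range <- c(200, 900)   # stands in for x_breaks$range
panel_range <- c(165, 935)   # stands in for x_range (limits plus expansion)
breaks <- c(200, 400, 600, 800)

# Break positions as fractions of the scale limits (assumed form of $major)
major <- (breaks - scale_range[1]) / diff(scale_range)

# Same arithmetic as in move_axes_inside(): convert fractions of the
# scale limits into fractions of the full panel width
to_panel_fraction <- function(major, scale_range, panel_range) {
  diff(scale_range) / diff(panel_range) * major +
    (scale_range[1] - panel_range[1]) / diff(panel_range)
}

npc <- to_panel_fraction(major, scale_range, panel_range)

# Direct computation from the break values, for comparison
direct <- (breaks - panel_range[1]) / diff(panel_range)
same_result <- isTRUE(all.equal(npc, direct))
```

The two routes agree, which is why the axis grobs line up with the data once they are placed with annotation_custom().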
In case anyone else happens to be looking for a way to make a compact plot using ggplot2, for example for placement inside a page margin, perhaps you'll be helped by the full code for a fairly publication-ready inside-the-margin plot, made possible by Allan Cameron's elegant approach in the answer above.
Placing a plot inside a page margin is usually not advisable, and depends on the available margin, the type of document, etc. In any case, it's probably smart to make the plot as clutter-free and streamlined as possible. That's why, in my case, I was looking for a way to keep as much of the plot as possible inside the panel's footprint, so to speak.
Enough background, here's the code:
library(dplyr)
library(magrittr)
library(ggplot2)
library(grid)
library(gtable)
library(errors)
theme_set(theme_grey())
move_axes_inside <- function(p) {
b <- ggplot_build(p)
x_breaks <- b$layout$panel_scales_x[[1]]$break_info()
y_breaks <- b$layout$panel_scales_y[[1]]$break_info()
x_range <- b$layout$panel_params[[1]]$x.range
y_range <- b$layout$panel_params[[1]]$y.range
y_breaks$major <-
diff(y_breaks$range) / diff(y_range) * y_breaks$major +
(y_breaks$range[1] - y_range[1]) / diff(y_range)
x_breaks$major <-
diff(x_breaks$range) / diff(x_range) * x_breaks$major +
(x_breaks$range[1] - x_range[1]) / diff(x_range)
y <-
grid::yaxisGrob(
at = y_breaks$major,
label = y_breaks$labels,
gp =
gpar(
lwd = 1, # line width of axis and tick marks
fontsize = 8,
cex = 0.8, # multiplier to font size
lineheight = 0.8), # tick mark length
main = FALSE)
x <-
grid::xaxisGrob(
at = x_breaks$major,
label = x_breaks$labels,
gp =
gpar(
lwd = 2, # draw axis with thicker line width
fontsize = 8,
cex = 0.8, # multiplier to font size
lineheight = 0.8), # tick mark length
main = FALSE)
p <-
p +
annotation_custom(
# draw y-axis, shifted slightly inwards (so that axis is inside panel.border)
grob = y,
xmin = x_range[1] + 0.01 * diff(x_range),
xmax = x_range[1] + 0.01 * diff(x_range)) +
annotation_custom(
grob = x,
ymin = y_range[1] + 0.01 * diff(y_range),
ymax = y_range[1] + 0.01 * diff(y_range)) +
theme(
axis.ticks = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.text.x = element_blank())
return(p)
}
p <- ggplot() +
geom_line(
stat = "smooth", method = lm, formula = "y ~ x",
se = FALSE, fullrange = TRUE,
data = df %>% filter(temperature >= 400),
aes(x = temperature, y = diameter),
colour = "blue", size = 2, alpha = 0.35) +
# experimental errors as red ribbon (instead of errorbars)
geom_ribbon(
data = df,
aes(x = temperature,
ymin = errors_min(diameter),
ymax = errors_max(diameter)),
fill = alpha("red", 0.25),
colour = NA) +
# data points excluded in linear fit
geom_point(
data = df %>% filter(temperature < 400),
aes(x = temperature, y = diameter),
# by default, shape=19 (filled circle)
# https://blog.albertkuo.me/post/point-shape-options-in-ggplot/
# I'd like a solid circle, so shape 16 it is
size = 1.2, shape = 16, colour = alpha("red", 0.25)) +
# data points included in linear fit
geom_point(
data = df %>% filter(temperature >= 400),
aes(x = temperature, y = diameter),
size = 1.2, shape = 16, colour = alpha("red", 0.45)) +
# I ended up putting the x-axis unit label on the outside because
# however I tried, it would not fit inside and I was not able to
# rotate the x-axis labels on the inside.
labs(x = "$T_\\mathrm{a}/\\si{\\celsius}$") +
scale_x_continuous(
breaks = seq(200, 900, 100),
# first element can't be an empty string - if it is, none of the labels print (weird bug?)
labels = c(" ", " ", "400", " ", "600", " ", "800", " ")) +
scale_y_log10(
breaks = c(10, 50, 90, 130),
labels = c("\\num{10}", "\\num{50}", "\\num{90}", "$\\num{130}=d/\\si{\\nm}$")) +
# note that we set some theme settings inside the move_axes_inside() function
theme(
# l = -1 was required to completely fill the space with plot panel
# b = 0 because we are making room for x-axis title on the outside
plot.margin = margin(t = 0, r = 0, b = 0, l = -1, "mm"),
# smaller text size in x-axis title, trying to conform with fontsize inside axis
# vjust moves the title closer to the x-axis line, value optimised optically
axis.title.x = element_text(size = 8 * 0.8, vjust = 2.0),
# grid lines just look busy in such a small plot
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
move_axes_inside(p)
Here's a screen-shot of the result, in a document compiled with knitr and LaTeX and with the plot inside \marginpar{}:

geom_col is not using stat_identity when values are rounded to whole numbers

I'm trying to use geom_col to chart columns for values in time series (annual and quarterly).
When I use the zoo package's yearqtr class for the x-axis values and round the y-axis values to whole numbers, geom_col appears not to use the default stat = 'identity' for determining the bar heights from the y-value of each observation. Instead it appears to switch to stat = 'count' and treat the rounded y-values as factors, counting the number of occurrences of each value (e.g., 3 observations have a rounded y-value of 11).
If I switch to geom_line, the graph is fine with quarterly x-axis values and rounded y-axis values.
library(zoo)
library(ggplot2)
Annual.Periods <- seq(to = 2020, by = 1, length.out = 8) # 8 years
Quarter.Periods <- as.yearqtr(seq(to = 2020, by = 0.25, length.out = 8)) # 8 Quarters
Values <- seq(to = 11, by = 0.25, length.out = 8)
Data.Annual.Real <- data.frame(X = Annual.Periods, Y = round(Values, 1))
Data.Annual.Whole <- data.frame(X = Annual.Periods, Y = round(Values, 0))
Data.Quarter.Real <- data.frame(X = Quarter.Periods, Y = round(Values, 1))
Data.Quarter.Whole <- data.frame(X = Quarter.Periods, Y = round(Values, 0))
ggplot(data = Data.Annual.Real, aes(X, Y)) + geom_col()
ggplot(data = Data.Annual.Whole, aes(X, Y)) + geom_col()
ggplot(data = Data.Quarter.Real, aes(X, Y)) + geom_col()
ggplot(data = Data.Quarter.Whole, aes(X, Y)) + geom_col() # appears to treat y-values as factors and use stat = 'count' to count occurrences (e.g., 3 observations have a rounded Value = 11)
ggplot(data = Data.Quarter.Whole, aes(X, Y)) + geom_line()
rstudioapi::versionInfo()
# $mode
# [1] "desktop"
#
# $version
# [1] ‘1.3.959’
#
# $release_name
# [1] "Middlemist Red"
sessionInfo()
# R version 4.0.0 (2020-04-24)
# Platform: x86_64-apple-darwin17.0 (64-bit)
# Running under: macOS Mojave 10.14.6
#
# Matrix products: default
# BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] ggplot2_3.3.1 zoo_1.8-8
ggplot2 tries to guess the orientation of geom_col(), that is, which variable serves as the base of the bars and which holds the values to represent. Apparently, without any decimal numbers in your Y variable it chooses Y as the base (it stays numeric though, with no conversion to factor) and sums up your quarters.
For cases like this you can tell geom_col() which variable to use as the base of the bars via the orientation argument:
ggplot(data = Data.Quarter.Whole, aes(X, Y)) + geom_col(orientation = "x")
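A quick way to confirm what orientation = "x" does, without needing zoo, is to inspect the built layer. This is a sketch on made-up data (toy is a hypothetical stand-in for Data.Quarter.Whole, with plain numeric quarters on x). It relies on the flipped_aes column that, as far as I know, recent ggplot2 versions include in the layer data for bars.

```r
library(ggplot2)

# Hypothetical stand-in: fractional x positions, whole-number y values
toy <- data.frame(
  x = seq(2018.25, 2020, by = 0.25),
  y = c(9, 10, 10, 10, 10, 11, 11, 11)
)

p <- ggplot(toy, aes(x, y)) + geom_col(orientation = "x")
built <- layer_data(p)

# With the orientation fixed, no bar is flipped ...
bars_are_vertical <- !any(built$flipped_aes)
# ... and each bar's height is the y value of its own row
heights_match <- isTRUE(all.equal(built$ymax, toy$y))
```

If heights_match is TRUE, the bars are drawn from the data values rather than from sums over a guessed base variable.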
EDIT: I have just seen that Roman answered it in the comments.

How to colour xlabs with the corresponding colour of its jitter in a geom_jitter?

I am trying to colour the xlabs with the same colour as the point they are labelling, but I am having some trouble.
Each jitter is coloured depending on a specified variable levels, and I want the same for the xlabs.
This is my code to plot the figure:
ggplot(coverage_data, aes(x=x_values, y=coverage_data$mean, fill=coverage_data$frecuency))+
geom_jitter(size=2.5, shape=21, stroke=1.5)+
scale_fill_manual(name = "frecuency", values =c("deepskyblue4", "gray67", "darkgoldenrod2", "springgreen4", "brown1", "white"))+
xlab("Id")+
ylab("max coverage")+
theme(axis.text.x=element_text(hjust=1, colour = 'black', size = 9))
If I declare colour (in theme(axis.text.x = element_text())) as a vector I get an error. Do you know how I can achieve that?
Passing a vector of colors generates a warning, but with ggplot2 3.3.0 (what I'm running) it does work.
Since you didn't share any data I've made some up:
frecuency <- rep(c("A", "B", "C", "D", "E", "F"), 10)
mean <- runif(60, 10, 20)
x_values <- runif(60, 1, 100)
coverage_data <- data.frame(frecuency, mean, x_values, stringsAsFactors = FALSE)
ggplot(coverage_data, aes(x= x_values, y= mean, fill= frecuency))+
geom_jitter(size=2.5, shape=21, stroke=1.5)+
scale_fill_manual(name = "frecuency", values =c("deepskyblue4", "gray67", "darkgoldenrod2", "springgreen4", "brown1", "white"))+
xlab("Id")+
ylab("max coverage")+
theme(axis.text.x=element_text(hjust=1, colour = c("black", "blue", "green", "yellow", "red"), size = 9))
Warning message: Vectorized input to element_text() is not
officially supported. Results may be unexpected or may change in
future versions of ggplot2.
sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
other attached packages:
[1] ggQC_0.0.31 readxl_1.3.1 forcats_0.5.0
[4] stringr_1.4.0 dplyr_0.8.3 purrr_0.3.3
[7] readr_1.3.1 tidyr_1.0.2 tibble_2.1.3
[10] ggplot2_3.3.0 tidyverse_1.3.0
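If the x-axis is discrete with one label per point, the colour vector can be derived from the data instead of being typed by hand. The sketch below makes that assumption explicitly (the question does not say what x_values holds); toy, id and pal are made-up names. It still relies on vectorised input to element_text(), which the warning above says is not officially supported.

```r
library(ggplot2)

# Hypothetical data: one point per id, each id belonging to a frecuency class
toy <- data.frame(
  id = c("s1", "s2", "s3", "s4"),
  mean = c(12, 15, 11, 18),
  frecuency = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Same named palette for the fill scale and for the axis labels
pal <- c(A = "deepskyblue4", B = "gray67", C = "darkgoldenrod2")

# A discrete axis shows the labels in sorted order, so look up each
# label's class in that order and translate it into a colour
axis_ids <- sort(unique(toy$id))
label_cols <- unname(pal[toy$frecuency[match(axis_ids, toy$id)]])

p <- ggplot(toy, aes(x = id, y = mean, fill = frecuency)) +
  geom_point(size = 2.5, shape = 21, stroke = 1.5) +
  scale_fill_manual(name = "frecuency", values = pal) +
  theme(axis.text.x = element_text(hjust = 1, colour = label_cols, size = 9))
```

Building label_cols from the same palette keeps the labels and points in sync if the data or colours change.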

R: round() can find object, sprintf() cannot, why?

I have a function that takes a dataframe and plots a number of columns from that data frame using ggplot2. The aes() function in ggplot2 takes a label argument and I want to use sprintf to format that argument - and this is something I have done many times before in other code. When I pass the format string to sprintf (in this case "%1.1f") it says "object not found". If I use the round() function and pass an argument to that function it can find it without problems. Same goes for format(). Apparently only sprintf() is unable to see the object.
At first I thought this was a lazy evaluation issue caused by calling the function rather than using the code inline, but using force() on the format string I pass to sprintf does not resolve the issue. I can work around this, but I would like to know why it happens. Of course, it may be something trivial that I have overlooked.
Q. Why does sprintf() not find the string object?
Code follows (edited and pruned for more minimal example)
require(gdata)
require(ggplot2)
require(scales)
require(gridExtra)
require(lubridate)
require(plyr)
require(reshape)
set.seed(12345)
# Create dummy time series data with year and month
monthsback <- 64
startdate <- as.Date(paste(year(now()),month(now()),"1",sep = "-")) - months(monthsback)
mydf <- data.frame(mydate = seq(as.Date(startdate), by = "month", length.out = monthsback), myvalue5 = runif(monthsback, min = 200, max = 300))
mydf$year <- as.numeric(format(as.Date(mydf$mydate), format="%Y"))
mydf$month <- as.numeric(format(as.Date(mydf$mydate), format="%m"))
getchart_highlight_value <- function(
plotdf,
digits_used = 1
)
{
force(digits_used)
#p <- ggplot(data = plotdf, aes(x = month(mydate, label = TRUE), y = year(mydate), fill = myvalue5, label = round(myvalue5, digits_used))) +
# note that the line below using sprintf() does not work, whereas the line above using round() is fine
p <- ggplot(data = plotdf, aes(x = month(mydate, label = TRUE), y = year(mydate), fill = myvalue5, label = sprintf(paste("%1.",digits_used,"f", sep = ""), myvalue5))) +
scale_x_date(labels = date_format("%Y"), breaks = date_breaks("years")) +
scale_y_reverse(breaks = 2007:2012, labels = 2007:2012, expand = c(0,0)) +
geom_tile() + geom_text(size = 4, colour = "black") +
scale_fill_gradient2(low = "blue", high = "red", limits = c(min(plotdf$myvalue5), max(plotdf$myvalue5)), midpoint = median(plotdf$myvalue5)) +
scale_x_discrete(expand = c(0,0)) +
opts(panel.grid.major = theme_blank()) +
opts(panel.background = theme_rect(fill = "transparent", colour = NA)) +
png(filename = "c:/sprintf_test.png", width = 700, height = 300, units = "px", res = NA)
print(p)
dev.off()
}
getchart_highlight_value (plotdf <- mydf,
digits_used <- 1)
Using Martin's minimal example (that is a minimal example; see also this question), you can make the code work by specifying the environment ggplot() should use. To do that, set the environment argument of ggplot(), e.g. like this:
require(ggplot2)
getchart_highlight_value <- function(df)
{
fmt <- "%1.1f"
ggplot(df, aes(x, x, label=sprintf(fmt, lbl)),
environment = environment()) +
geom_tile(bg="white") +
geom_text(size = 4, colour = "black")
}
df <- data.frame(x = 1:5, lbl = runif(5))
getchart_highlight_value (df)
The function environment() returns the current (local) environment, which is the environment created by the function getchart_highlight_value(). If you don't specify this, ggplot() will look in the global environment, and there the variable fmt is not defined.
Nothing to do with lazy evaluation, everything to do with selecting the right environment.
The code above produces the following plot:
Here's a minimal-er example
require(ggplot2)
getchart_highlight_value <- function(df)
{
fmt <- "%1.1f"
ggplot(df, aes(x, x, label=sprintf(fmt, lbl))) + geom_tile()
}
df <- data.frame(x = 1:5, lbl = runif(5))
getchart_highlight_value (df)
It fails with
> getchart_highlight_value (df)
Error in sprintf(fmt, lbl) : object 'fmt' not found
If I create fmt in the global environment then everything is fine; maybe this explains the 'sometimes it works' / 'it works for me' comments above.
> sessionInfo()
R version 2.15.0 Patched (2012-05-01 r59304)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_0.9.1
loaded via a namespace (and not attached):
[1] colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 grid_2.15.0
[5] labeling_0.1 MASS_7.3-18 memoise_0.1 munsell_0.3
[9] plyr_1.7.1 proto_0.3-9.2 RColorBrewer_1.0-5 reshape2_1.2.1
[13] scales_0.2.1 stringr_0.6
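The environment explanation in the answers above can be reproduced in base R without ggplot2. This is only an analogy, not ggplot2's actual implementation: label_in and make_labels are made-up helpers that evaluate the same expression with the data frame's columns first and a chosen environment as fallback. It assumes no variable called fmt exists in the global environment.

```r
df <- data.frame(x = 1:3, lbl = c(0.123, 0.456, 0.789))

# Evaluate sprintf(fmt, lbl): lbl comes from the data frame,
# anything else is looked up starting from the environment `env`
label_in <- function(data, env) {
  eval(quote(sprintf(fmt, lbl)), envir = data, enclos = env)
}

make_labels <- function(data) {
  fmt <- "%1.1f"  # exists only inside this function
  list(
    # Falling back to the function's own environment finds fmt
    local = label_in(data, environment()),
    # Falling back to the global environment does not, so keep the error
    global = tryCatch(label_in(data, globalenv()), error = function(e) e)
  )
}

res <- make_labels(df)
local_labels <- res$local
global_failed <- inherits(res$global, "error")
```

round() and format() worked in the original question because their arguments were found either in the data or in the global environment, while the format string passed to sprintf() lived only inside the function.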

Resources