How can I add an annotation to a faceted ggplot (with a log scale) outside the plot area

I'm looking to add some annotations (ideally a text and an arrow) to a faceted ggplot outside the plot area.
What's that, you say? Hasn't someone asked something similar here, here and here? Well yes. But none of them were trying to do this below an x-axis with a log scale.
With the exception of this amazing answer by @Z.Lin, but that involved a specific package and I'm looking for a more generic solution.
At first glance this would appear to be a very niche question, but for those of you familiar with forest plots this may pique some interest.
Firstly, some context... I'm interested in presenting the results of a coxph model using a forest plot in a publication. My goal here is to take the results of a model (literally a standalone coxph object) and use it to produce output that is customisable (gotta match the style guide) and helps translate the findings for an audience that might not be au fait with the technical details of hazard ratios. Hence the annotations and directional arrows.
Before you start dropping links to R packages/functions that could help do this... here are those that I've tried so far:
ggforestplot: this package produces lovely customisable forest plots (if you are using odds ratios), but it hard codes a geom_vline at zero, which doesn't help for HRs (where the reference line should sit at 1)
ggforest — this package is a nerd paradise of detail, but good luck a) editing the variable names and b) trying to theme it (I mentioned earlier that I'm working with a coxph object, what I didn't mention was that the varnames are ugly — they need to be changed for a punter to understand what we're trying to communicate)
finalfit offers a great workflow and its hr_plot kicks out some informative output, but it doesn't play nice if you've already got a coxph object and you just want to plot it
So... backstory out of the way. I've created my own framework for a forest plot below to which I'd love to add — in the space below the x-axis labels and the x-axis title — two annotations that help interpret the result. My current code struggles with:
repeating the code under each facet (this is something I'm trying to avoid)
mirroring the annotations on either side of the geom_vline with a log scale
Any advice anyone might have would be much appreciated... I've added a reproducible example below.
## LOAD REQUIRED PACKAGES
library(tidyverse)
library(survival)
library(broom)
library(ggforce)
library(ggplot2)
## PREP DATA
model_data <- lung %>%
mutate(inst_cat = case_when(
inst %% 2 == 0 ~ 2,
TRUE ~ 1)) %>%
mutate(pat.karno_cat = case_when(
pat.karno < 75 ~ 2,
TRUE ~ 1)) %>%
mutate(ph.karno_cat = case_when(
ph.karno < 75 ~ 2,
TRUE ~ 1)) %>%
mutate(wt.loss_cat = case_when(
wt.loss > 15 ~ 2,
TRUE ~ 1)) %>%
mutate(meal.cal_cat = case_when(
meal.cal > 900 ~ 2,
TRUE ~ 1))
coxph_model <- coxph(
Surv(time, status) ~
sex +
inst_cat +
wt.loss_cat +
meal.cal_cat +
pat.karno_cat +
ph.karno_cat,
data = model_data)
## PREP PLOT DATA
plot_data <- coxph_model %>%
broom::tidy(
exponentiate = TRUE,
conf.int = TRUE,
conf.level = 0.95) %>%
mutate(stat_sig = case_when(
p.value < 0.05 ~ "p < 0.05",
TRUE ~ "N.S.")) %>%
mutate(group = case_when(
term == "sex" ~ "gender",
term == "inst_cat" ~ "site",
term == "pat.karno_cat" ~ "outcomes",
term == "ph.karno_cat" ~ "outcomes",
term == "meal.cal_cat" ~ "outcomes",
term == "wt.loss_cat" ~ "outcomes"))
## PLOT FOREST PLOT
forest_plot <- plot_data %>%
ggplot() +
aes(
x = estimate,
y = term,
colour = stat_sig) +
geom_vline(
aes(xintercept = 1),
linetype = 2
) +
geom_point(
shape = 15,
size = 4
) +
geom_linerange(
xmin = (plot_data$conf.low),
xmax = (plot_data$conf.high)
) +
scale_colour_manual(
values = c(
"N.S." = "black",
"p < 0.05" = "red")
) +
annotate(
"text",
x = 0.45,
y = -0.2,
col="red",
label = "indicates y"
) +
annotate(
"text",
x = 1.5,
y = -0.2,
col="red",
label = "indicates y"
) +
labs(
y = "",
x = "Hazard ratio") +
coord_trans(x = "log10") +
scale_x_continuous(
breaks = scales::log_breaks(n = 7),
limits = c(0.1,10)) +
ggforce::facet_col(
facets = ~group,
scales = "free_y",
space = "free"
) +
theme(
legend.position = "bottom",
legend.title = element_blank(),
strip.text = element_text(hjust = 0),
axis.title.x = element_text(margin = margin(t = 25, r = 0, b = 0, l = 0))
)
Created on 2022-05-10 by the reprex package (v2.0.1)

I think I would use annotation_custom here. This requires standard coord_cartesian with clip = 'off', but it should be easy to re-jig your x axis to use scale_x_log10
plot_data %>%
ggplot() +
aes(
x = estimate,
y = term,
colour = stat_sig) +
geom_vline(
aes(xintercept = 1),
linetype = 2
) +
geom_point(
shape = 15,
size = 4
) +
geom_linerange(
xmin = (log10(plot_data$conf.low)),
xmax = (log10(plot_data$conf.high))
) +
scale_colour_manual(
values = c(
"N.S." = "black",
"p < 0.05" = "red")
) +
annotation_custom(
grid::textGrob(
x = unit(0.4, 'npc'),
y = unit(-7.5, 'mm'),
label = "indicates yada",
hjust = 0.5, vjust = 0.5, # justification is a textGrob() argument, not a gpar() setting
gp = grid::gpar(col = 'red'))
) +
annotation_custom(
grid::textGrob(
x = unit(0.6, 'npc'),
y = unit(-7.5, 'mm'),
label = "indicates bada",
hjust = 0.5, vjust = 0.5, # justification is a textGrob() argument, not a gpar() setting
gp = grid::gpar(col = 'blue'))
) +
annotation_custom(
grid::linesGrob(
x = unit(c(0.49, 0.25), 'npc'),
y = unit(c(-10, -10), 'mm'),
arrow = arrow(length = unit(3, 'mm')),
gp = grid::gpar(col = 'red'))
) +
annotation_custom(
grid::linesGrob(
x = unit(c(0.51, 0.75), 'npc'),
y = unit(c(-10, -10), 'mm'),
arrow = arrow(length = unit(3, 'mm')),
gp = grid::gpar(col = 'blue'))
) +
labs(
y = "",
x = "Hazard ratio") +
scale_x_log10(
breaks = c(0.1, 0.3, 1, 3, 10),
limits = c(0.1,10)) +
ggforce::facet_col(
facets = ~group,
scales = "free_y",
space = "free"
) +
coord_cartesian(clip = 'off') +
theme(
legend.position = "bottom",
legend.title = element_blank(),
strip.text = element_text(hjust = 0),
axis.title.x = element_text(margin = margin(t = 25, r = 0, b = 0, l = 0)),
panel.spacing.y = (unit(15, 'mm'))
)
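The npc positions above (0.4 and 0.6 for the labels, 0.49 to 0.25 and 0.51 to 0.75 for the arrows) mirror each other around 0.5. That lines up with the reference line at 1 only because the limits c(0.1, 10) are symmetric about 1 in log space. If your limits differ, something like the sketch below may help convert a hazard ratio into an npc position. Note that log_npc() is a hypothetical helper, not part of ggplot2 or grid, and it assumes the default continuous-scale expansion of 5% on each side.

```r
# Hypothetical helper: where does a data value fall across the panel width
# (0 = left edge, 1 = right edge) on a log10 x axis?
# Assumes the scale limits are given explicitly and that the default
# multiplicative expansion (5% per side) is in effect.
log_npc <- function(x, limits = c(0.1, 10), expand_mult = 0.05) {
  rng  <- log10(limits)                 # limits in log10 space
  span <- diff(rng)                     # width of the unexpanded range
  lo   <- rng[1] - expand_mult * span   # left panel edge after expansion
  (log10(x) - lo) / (span * (1 + 2 * expand_mult))
}

centre   <- log_npc(1)          # position of the reference line
mirrored <- log_npc(c(0.5, 2))  # reciprocal values sit equally far from the centre
centre
mirrored
```

Reciprocal hazard ratios (for example 0.5 and 2) are equidistant from 1 on a log axis, so their npc positions add up to twice the position of the reference line.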


SHAP Summary Plot for XGBoost model in R without displaying Mean Absolute SHAP value on the plot

I don't want to display the Mean Absolute Values on my SHAP Summary Plot in R. I want an output similar to the one produced in python. What line of code will help remove the mean absolute values from the summary plot in R?
I'm currently using this line of code:
shap.plot.summary.wrap1(xgb_model, X = x, top_n = 10)
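You can do this by slightly modifying the source code of shap.plot.summary() as below:
# setDT() and uniqueN() come from data.table; the plotting calls need ggplot2
library(data.table)
library(ggplot2)
shap.plot.summary.edited <- function(data_long,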
x_bound = NULL,
dilute = FALSE,
scientific = FALSE,
my_format = NULL){
if (scientific){label_format = "%.1e"} else {label_format = "%.3f"}
if (!is.null(my_format)) label_format <- my_format
# check number of observations
N_features <- setDT(data_long)[,uniqueN(variable)]
if (is.null(dilute)) dilute = FALSE
nrow_X <- nrow(data_long)/N_features # n per feature
if (dilute!=0){
# if nrow_X <= 10, no dilute happens
dilute <- ceiling(min(nrow_X/10, abs(as.numeric(dilute)))) # not allowed to dilute to fewer than 10 obs/feature
set.seed(1234)
data_long <- data_long[sample(nrow(data_long),
min(nrow(data_long)/dilute, nrow(data_long)/2))] # dilute
}
x_bound <- if (is.null(x_bound)) max(abs(data_long$value))*1.1 else as.numeric(abs(x_bound))
plot1 <- ggplot(data = data_long) +
coord_flip(ylim = c(-x_bound, x_bound)) +
geom_hline(yintercept = 0) + # the y-axis beneath
# sina plot:
ggforce::geom_sina(aes(x = variable, y = value, color = stdfvalue),
method = "counts", maxwidth = 0.7, alpha = 0.7) +
# print the mean absolute value:
#geom_text(data = unique(data_long[, c("variable", "mean_value")]),
# aes(x = variable, y=-Inf, label = sprintf(label_format, mean_value)),
# size = 3, alpha = 0.7,
# hjust = -0.2,
# fontface = "bold") + # bold
# # add a "SHAP" bar notation
# annotate("text", x = -Inf, y = -Inf, vjust = -0.2, hjust = 0, size = 3,
# label = expression(group("|", bar(SHAP), "|"))) +
scale_color_gradient(low="#FFCC33", high="#6600CC",
breaks=c(0,1), labels=c(" Low","High "),
guide = guide_colorbar(barwidth = 12, barheight = 0.3)) +
theme_bw() +
theme(axis.line.y = element_blank(),
axis.ticks.y = element_blank(), # remove axis line
legend.position="bottom",
legend.title=element_text(size=10),
legend.text=element_text(size=8),
axis.title.x= element_text(size = 10)) +
# reverse the order of features, from high to low
# also relabel the feature using `label.feature`
scale_x_discrete(limits = rev(levels(data_long$variable))#,
#labels = label.feature(rev(levels(data_long$variable)))
)+
labs(y = "SHAP value (impact on model output)", x = "", color = "Feature value ")
return(plot1)
}

Adding hatches or patterns to ggplot bars [duplicate]

This question already has an answer here:
How can I add hatches, stripes or another pattern or texture to a barplot in ggplot?
(1 answer)
Closed 1 year ago.
Suppose I want to show in a barplot the gene expression results (logFC) based on RNA-seq and q-PCR analysis. My dataset looks like that:
set.seed(42)
f1 <- expand.grid(
comp = LETTERS[1:3],
exp = c("qPCR", "RNA-seq"),
geneID = paste("Gene", 1:4)
)
f1$logfc <- rnorm(nrow(f1))
f1$SE <- runif(nrow(f1), min=0, max=1.5)
My R command line
p=ggplot(f1, aes(x=geneID, y=logfc, fill= comp,color=exp))+
geom_bar(stat="identity", position =position_dodge2(preserve="single"))+
theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1))
I have this output:
I want to add patterns or hatches to the bars corresponding to one of the variables (exp or comp), and to add upper error bars, as shown in the plot below:
Any help please?
Following the linked answer, it seems quite natural how to extend it to your case. In the example below, I'm using some dummy data structured like the head() data you gave, since the csv link gave me a 404.
library(ggplot2)
library(ggpattern)
#>
#> Attaching package: 'ggpattern'
#> The following objects are masked from 'package:ggplot2':
#>
#> flip_data, flipped_names, gg_dep, has_flipped_aes, remove_missing,
#> should_stop, waiver
# Setting up some dummy data
set.seed(42)
f1 <- expand.grid(
comp = LETTERS[1:3],
exp = c("qPCR", "RNA-seq"),
geneID = paste("Gene", 1:4)
)
f1$logfc <- rnorm(nrow(f1))
ggplot(f1, aes(x = geneID, y = logfc, fill = comp)) +
geom_col_pattern(
aes(pattern = exp),
colour = "black",
pattern_fill = "black",
pattern_angle = 45,
pattern_density = 0.1,
pattern_spacing = 0.01,
position = position_dodge2(preserve = 'single')
) +
scale_pattern_manual(
values = c("none", "stripe"),
guide = guide_legend(override.aes = list(fill = "grey70")) # <- make lighter
) +
scale_fill_discrete(
guide = guide_legend(override.aes = list(pattern = "none")) # <- hide pattern
)
Created on 2021-04-19 by the reprex package (v1.0.0)
EDIT: if you want to repeat the hatching in the fill legend, you can make an interaction() and then customise a manual fill scale.
ggplot(f1, aes(x = geneID, y = logfc)) +
geom_col_pattern(
aes(pattern = exp,
fill = interaction(exp, comp)), # <- make this an interaction
colour = "black",
pattern_fill = "black",
pattern_angle = 45,
pattern_density = 0.1,
pattern_spacing = 0.01,
position = position_dodge2(preserve = 'single')
) +
scale_pattern_manual(
values = c("none", "stripe"),
guide = guide_legend(override.aes = list(fill = "grey70")) # <- make lighter
) +
scale_fill_manual(
# Have 3 colours and repeat each twice
values = rep(scales::hue_pal()(3), each = 2),
# Extract the second name after the '.' from the `interaction()` call
labels = function(x) {
vapply(strsplit(x, "\\."), `[`, character(1), 2)
},
# Repeat the pattern over the guide
guide = guide_legend(
override.aes = list(pattern = rep(c("none", "stripe"), 3))
)
)
Created on 2021-04-19 by the reprex package (v1.0.0)
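To see why the manual fill scale repeats each colour twice, and why the label function keeps the second piece of each name, it can help to inspect the levels that interaction() produces. A small base R sketch using the same factor layout as the dummy data (strip_exp is just an illustrative name for the label function used in the scale):

```r
# Same factor layout as the dummy data above
f1 <- expand.grid(
  comp = LETTERS[1:3],
  exp = c("qPCR", "RNA-seq"),
  geneID = paste("Gene", 1:4)
)

# In interaction(), the first factor varies fastest, so the levels come
# as one (qPCR, RNA-seq) pair per value of comp
lv <- levels(interaction(f1$exp, f1$comp))
lv

# The label function from scale_fill_manual(): keep the part after the "."
strip_exp <- function(x) vapply(strsplit(x, "\\."), `[`, character(1), 2)
legend_labels <- strip_exp(lv)
legend_labels
```

Because the levels pair up per comp, rep(..., each = 2) gives both bars of a comp the same colour. Splitting on "." would need adjusting if your own level names contain a dot.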
EDIT2: Now with errorbars:
library(ggplot2)
library(ggpattern)
set.seed(42)
f1 <- expand.grid(
comp = LETTERS[1:3],
exp = c("qPCR", "RNA-seq"),
geneID = paste("Gene", 1:4)
)
f1$logfc <- rnorm(nrow(f1))
f1$SE <- runif(nrow(f1), min=0, max=1.5)
ggplot(f1, aes(x = geneID, y = logfc)) +
geom_col_pattern(
aes(pattern = exp,
fill = interaction(exp, comp)), # <- make this an interaction
colour = "black",
pattern_fill = "black",
pattern_angle = 45,
pattern_density = 0.1,
pattern_spacing = 0.01,
position = position_dodge2(preserve = 'single')
) +
geom_errorbar(
aes(
ymin = logfc,
ymax = logfc + sign(logfc) * SE,
group = interaction(geneID, comp, exp)
),
position = "dodge"
) +
scale_pattern_manual(
values = c("none", "stripe"),
guide = guide_legend(override.aes = list(fill = "grey70")) # <- make lighter
) +
scale_fill_manual(
# Have 3 colours and repeat each twice
values = rep(scales::hue_pal()(3), each = 2),
# Extract the second name after the '.' from the `interaction()` call
labels = function(x) {
vapply(strsplit(x, "\\."), `[`, character(1), 2)
},
# Repeat the pattern over the guide
guide = guide_legend(
override.aes = list(pattern = rep(c("none", "stripe"), 3))
)
)
Created on 2021-04-22 by the reprex package (v1.0.0)
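The sign(logfc) * SE term is what makes each error bar point away from zero: upwards for positive bars and downwards for negative ones. A tiny base R illustration with made-up numbers (not taken from the dummy data):

```r
# Made-up values, purely to illustrate the arithmetic
logfc <- c(1.2, -0.8, 0)
SE    <- c(0.3,  0.5, 0.2)

# Inner end of the error bar sits on the bar top; outer end moves away from zero
inner_end <- logfc
outer_end <- logfc + sign(logfc) * SE
data.frame(logfc, SE, inner_end, outer_end)
```

One consequence worth knowing: sign(0) is 0, so a bar of exactly zero height gets an error bar of zero length with this formula.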

I am trying to create an exponent instead of R^2

I am using ggplot2 to create a scatter plot of 2 variables. I want to have these printed out on the caption portion of ggplot:
linear regression equation
r2 value
p-value
I am using brackets, new lines and stored values to concatenate everything together. I have attempted using expression(), parse() and bquote() functions but it only prints out the variable name and not the stored values.
This is the graph I have now. Everything looks great other than the R^2 part. Brackets seem to cause a lot of problems but I want to keep them (looks better in my opinion). This is my ggplot script. I am only concerned about the caption section at the end.
Difficult to work with the code you have provided as an example (see comment re: reproducible example), but I had my students complete a similar exercise for their homework recently, and can provide an example which you can likely generalize from. My approach is to use the TeX() function from the latex2exp package.
A psychologist is interested in whether she can predict GPA in graduate school from students' earlier scores on the Graduate Record Exam (GRE).
Set up the Toy Data and Regression Model
GPA <- c(3.70,3.18,2.90,2.93,3.02,2.65,3.70,3.77,3.41,2.38,
3.54,3.12,3.21,3.35,2.60,3.25,3.48,2.74,2.90,3.28)
GRE <- c(637,562,520,624,500,500,700,680,655,525,
593,656,592,689,550,536,629,541,588,619)
gpa.gre <- data.frame(GPA, GRE)
mod <- lm(GPA ~ GRE, data = gpa.gre)
mod.sum <- summary(mod)
print(cofs <- round(mod$coefficients, digits = 4))
aY <- cofs[[1]]
bY <- cofs[[2]]
print(Rsqr <- round(cor(GPA,GRE)^2, digits = 2))
Generate the Plot
require(ggplot2)
require(latex2exp)
p <- ggplot(data = gpa.gre, aes(x = GRE, y = GPA)) +
geom_smooth(formula = 'y ~ x', color ="grey40", method = "lm",
linetype = 1, lwd = 0.80, se = TRUE, alpha = 0.20) +
geom_point(color = "grey10", size = 1) +
labs(y = "Grade Point Average", x = "GRE Score") +
coord_cartesian(ylim = c(2.28, 3.82), xlim = c(498, 702), clip = "off") +
scale_y_continuous(breaks = seq(2.30, 3.80, 0.25)) +
scale_x_continuous(breaks = seq(500, 700, 50)) +
theme_classic() +
theme(axis.title.x = element_text(margin = unit(c(3.5,0,0,0), "mm"), size = 11.5),
axis.title.y = element_text(margin = unit(c(0,3.5,0,0), "mm"), size = 11.5),
axis.text = element_text(size = 10),
plot.margin = unit(c(0.25,4,1,0.25), "cm"))
# Use TeX function to use LaTeX
str_note <- TeX("\\textit{Note. ***p} < .001")
str_eq <- TeX("$\\hat{\\textit{y}} = 0.4682 + 0.0045 \\textit{x}$")
str_rsq <- TeX("$\\textit{R}^2 = .54***$")
# Create annotations
p + annotate("text", x = 728, y = 3.70, label = str_eq, size = 3.5,
hjust = 0, na.rm = TRUE) +
annotate("text", x = 728, y = 3.57, label = str_rsq, size = 3.5,
hjust = 0, na.rm = TRUE) +
annotate("text", x = 490, y = 1.80, label = str_note, size = 3.5,
hjust = 0, na.rm = TRUE)
Get Result
ggsave(filename = '~/Documents/gregpa.png', # your favourite file path here
width = 5, # width of plot
height = 4, # height of plot
units = "in", # width and height are given in inches
dpi = 400) # resolution in dots per inch
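The strings above hard-code the coefficients. If you would rather have the stored values inserted for you, two options are sketched below: building the LaTeX string with sprintf() before handing it to TeX(), or using bquote() with .() to substitute values into a plotmath expression. The numbers are copied from the model output above so the sketch stands alone; in practice you would use aY, bY and Rsqr as computed earlier.

```r
# Values copied from the fitted model above so this runs on its own
aY   <- 0.4682  # intercept
bY   <- 0.0045  # slope
Rsqr <- 0.54    # R squared

# Option 1: build the LaTeX string with sprintf(); the result can then be
# passed to latex2exp::TeX() in place of the hard-coded string
eq_string <- sprintf("$\\hat{y} = %.4f + %.4f x$", aY, bY)
eq_string

# Option 2: plotmath via bquote(); .() substitutes the stored value,
# whereas a bare variable name would be printed literally
rsq_label <- bquote(italic(R)^2 == .(Rsqr))
rsq_label
```

An object built with bquote() like this can usually be passed straight to labs(), for example as the caption, without going through latex2exp at all.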

How to add a vertical blank space between straight and inverted geom_density() with ggplot2

I am trying to reproduce this kind of Figure, with two densities, a first one pointing upwards and a second one pointing downwards. I would also like to have some blank space between the two densities.
Here is the code I am currently using.
library(hrbrthemes)
library(tidyverse)
library(RWiener)
# generating data
df <- rwiener(n = 1e2, alpha = 2, tau = 0.3, beta = 0.5, delta = 0.5)
df %>%
ggplot(aes(x = q) ) +
geom_density(
data = . %>% filter(resp == "upper"),
aes(y = ..density..),
colour = "steelblue", fill = "steelblue",
outline.type = "upper", alpha = 0.8, adjust = 1, trim = TRUE
) +
geom_density(
data = . %>% filter(resp == "lower"),
aes(y = -..density..), colour = "orangered", fill = "orangered",
outline.type = "upper", alpha = 0.8, adjust = 1, trim = TRUE
) +
# stimulus onset
geom_vline(xintercept = 0, lty = 1, col = "grey") +
annotate(
geom = "text",
x = 0, y = 0,
# hjust = 0,
vjust = -1,
size = 3, angle = 90,
label = "stimulus onset"
) +
# aesthetics
theme_ipsum_rc(base_size = 12) +
theme(axis.text.y = element_blank() ) +
labs(x = "Reaction time (in seconds)", y = "") +
xlim(0, NA)
Which results in something like...
How could I add some vertical space between the two densities to reproduce the above Figure?
If you want to try without faceting, you're probably best to just plot the densities as polygons with adjusted y values according to your desired spacing:
s <- 0.25 # set to change size of the space
ud <- density(df$q[df$resp == "upper"])
ld <- density(df$q[df$resp == "lower"])
x <- c(ud$x[1], ud$x, ud$x[length(ud$x)],
ld$x[1], ld$x, ld$x[length(ld$x)])
y <- c(s, ud$y + s, s, -s, -ld$y - s, -s)
df2 <- data.frame(x = x, y = y,
resp = rep(c("upper", "lower"), each = length(ud$x) + 2))
df2 %>%
ggplot(aes(x = x, y = y, fill = resp, color = resp) ) +
geom_polygon(alpha = 0.8) +
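# name the values so the colours match the question regardless of factor order
scale_fill_manual(values = c(upper = "steelblue", lower = "orangered")) +
scale_color_manual(values = c(upper = "steelblue", lower = "orangered"), guide = guide_none()) +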
geom_vline(xintercept = 0, lty = 1, col = "grey") +
annotate(
geom = "text",
x = 0, y = 0,
# hjust = 0,
vjust = -1,
size = 3, angle = 90,
label = "stimulus onset"
) +
# aesthetics
theme_ipsum_rc(base_size = 12) +
theme(axis.text.y = element_blank() ) +
labs(x = "Reaction time (in seconds)", y = "")
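To convince yourself that s really controls the gap, you can check the offset y values directly. The sketch below uses gamma-distributed stand-in data rather than rwiener() output, so it runs without the RWiener package; the variable names are only for illustration.

```r
set.seed(1)
s <- 0.25  # half the desired gap, as in the answer above

# Stand-in reaction times (assumption: any positive-valued sample will do here)
upper_rt <- rgamma(60, shape = 4, rate = 5)
lower_rt <- rgamma(40, shape = 4, rate = 5)

ud <- density(upper_rt)
ld <- density(lower_rt)

# Same construction as above: close each polygon at its baseline (+s or -s)
upper_y <- c(s, ud$y + s, s)
lower_y <- c(-s, -ld$y - s, -s)

# Vertical distance between the two baselines
gap <- min(upper_y) - max(lower_y)
gap
```

The density estimates are never negative, so the upper polygon stays at or above s and the lower one at or below -s, leaving a blank band of height 2 * s between them.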
You can also try faceting:
set.seed(123)
q <- rbeta(100, 0.25, 1)
dens <- density(q) # estimate the density once and reuse it for both panels
df_dens <- data.frame(gr = 1,
x = dens$x,
y = dens$y)
df_dens <- rbind(df_dens,
data.frame(gr = 2,
x = dens$x,
y = -dens$y))
ggplot(df_dens, aes(x, y, fill = factor(gr))) +
scale_x_continuous(limits = c(0,1)) +
geom_area(show.legend = F) +
facet_wrap(~gr, nrow = 2, scales = "free_y") +
theme_minimal() +
theme(strip.background = element_blank(),
strip.text.x = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank())
The space between both plots can be increased by adding panel.spacing = unit(20, "mm") to theme(). Instead of facet_wrap you can also try facet_grid(gr~., scales = "free_y")

Complex Chart in R/ggplot with Proper Legend Display

This is my first question to StackExchange, and I've searched for answers that have been helpful, but haven't really gotten me to where I'd like to be.
This is a stacked bar chart, combined with a point chart, combined with a line.
Here's my code:
theme_set(theme_light())
library(lubridate)
FM <- as.Date('2018-02-01')
x.range <- c(FM - months(1) - days(1) - days(day(FM) - 1), FM - days(day(FM) - 1) + months(1))
x.ticks <- seq(x.range[1] + days(1), x.range[2], by = 2)
#populate example data
preds <- data.frame(FM = FM, DATE = seq(x.range[1] + days(1), x.range[2] - days(1), by = 1))
preds <- data.frame(preds, S_O = round(seq(1, 1000000, by = 1000000/nrow(preds))))
preds <- data.frame(preds, S = round(ifelse(month(preds$FM) == month(preds$DATE), day(preds$DATE) / 30.4, 0) * preds$S_O))
preds <- data.frame(preds, O = preds$S_O - preds$S)
preds <- data.frame(preds, pred_sales = round(1000000 + rnorm(nrow(preds), 0, 10000)))
preds$ma <- with(preds, stats::filter(pred_sales, rep(1/5, 5), sides = 1))
y.max <- ceiling(max(preds$pred_sales) / 5000) * 5000 + 15000
line.cols <- c(O = 'palegreen4', S = 'steelblue4',
P = 'maroon', MA = 'blue')
fill.cols <- c(O = 'palegreen3', S = 'steelblue3',
P = 'red')
p <- ggplot(data = preds,
mapping = aes(DATE, pred_sales))
p <- p +
geom_bar(data = reshape2::melt(preds[,c('DATE', 'S', 'O')], id.var = 'DATE'),
mapping = aes(DATE, value, group = 1, fill = variable, color = variable),
width = 1,
stat = 'identity',
alpha = 0.5) +
geom_point(mapping = aes(DATE, pred_sales, group = 2, fill = 'P', color = 'P'),
shape = 22, #square
alpha = 0.5,
size = 2.5) +
geom_line(data = preds[!is.na(preds$ma),],
mapping = aes(DATE, ma, group = 3, color = 'MA'),
alpha = 0.8,
size = 1) +
geom_text(mapping = aes(DATE, pred_sales, label = formatC(pred_sales / 1000, format = 'd', big.mark = ',')),
angle = 90,
size = 2.75,
hjust = 1.25,
vjust = 0.4) +
labs(title = sprintf('%s Sales Predictions - %s', 'Overall', format(FM, '%b %Y')),
x = 'Date',
y = 'Volume in MMlbs') +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 8),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
legend.title = element_blank(),
legend.position = 'bottom',
legend.text = element_text(size = 8),
legend.margin = margin(t = 0.25, unit = 'cm')) +
scale_x_date(breaks = x.ticks,
date_labels = '%b %e',
limits = x.range) +
scale_y_continuous(limits = c(0, y.max),
labels = function(x) { formatC(x / 1000, format='d', big.mark=',') }) +
scale_color_manual(values = line.cols,
breaks = c('MA'),
labels = c(MA = 'Mvg Avg (5)')) +
scale_fill_manual(values = fill.cols,
breaks = c('P', 'O', 'S'),
labels = c(O = 'Open Orders', S = 'Sales', P = 'Predictions'))
p
The chart it generates is this:
As you can see, the legend does a couple of funky things. It's close, but not quite there. I only want boxes with exterior borders for Predictions, Open Orders, and Sales, and only a blue line for the Mvg Avg (5).
Any advice would be appreciated.
Thanks!
Rather late, but if you are still interested to understand this problem, the following should work. Explanations are included as comments within the code:
library(dplyr)
preds %>%
# scale the values for ALL numeric columns in the dataset, before
# passing the dataset to ggplot()
mutate_if(is.numeric, ~./1000) %>%
# since x / y mappings are stated in the top level ggplot(), there's
# no need to repeat them in the subsequent layers UNLESS you want to
# override them
ggplot(mapping = aes(x = DATE, y = pred_sales)) +
# 1. use data = . to inherit the top level data frame, & modify it on
# the fly for this layer; this is neater as you are essentially
# using a single data source for the ggplot object.
# 2. geom_col() is a more succinct way to say geom_bar(stat = "identity")
# (I'm using tidyr rather than reshape package, since ggplot2 is a
# part of the tidyverse packages, & the two play together nicely)
geom_col(data = . %>%
select(S, O, DATE) %>%
tidyr::gather(variable, value, -DATE),
aes(y = value, fill = variable, color = variable),
width = 1, alpha = 0.5) +
# don't show legend for this layer (o/w the fill / color legend would
# include a square shape in the centre of each legend key)
geom_point(aes(fill = 'P', color = 'P'),
shape = 22, alpha = 0.5, size = 2.5, show.legend = FALSE) +
# use data = . %>% ... as above.
# since the fill / color aesthetic mappings from the geom_col layer would
# result in a border around all fill / color legends, avoid it all together
# here by hard coding the line color to "blue", & map its linetype instead
# to create a separate linetype-based legend later.
geom_line(data = . %>% na.omit(),
aes(y = ma, linetype = 'MA'),
color = "blue", alpha = 0.8, size = 1) +
# scales::comma is a more succinct alternative to formatC for this use case
geom_text(aes(label = scales::comma(pred_sales)),
angle = 90, size = 2.75, hjust = 1.25, vjust = 0.4) +
labs(title = sprintf('%s Sales Predictions - %s', 'Overall', format(FM, '%b %Y')),
x = 'Date',
y = 'Volume in MMlbs') +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 8),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
legend.title = element_blank(),
legend.position = 'bottom',
legend.text = element_text(size = 8),
legend.margin = margin(t = 0.25, unit = 'cm')) +
scale_x_date(breaks = x.ticks,
date_labels = '%b %e',
limits = x.range) +
# as above, scales::comma is more succinct
scale_y_continuous(limits = c(0, y.max / 1000),
labels = scales::comma) +
# specify the same breaks & labels for the manual fill / color scales, so that
# a single legend is created for both
scale_color_manual(values = line.cols,
breaks = c('P', 'O', 'S'),
labels = c(O = 'Open Orders', S = 'Sales', P = 'Predictions')) +
scale_fill_manual(values = fill.cols,
breaks = c('P', 'O', 'S'),
labels = c(O = 'Open Orders', S = 'Sales', P = 'Predictions')) +
# create a separate line-only legend using the linetype mapping, with
# value = 1 (i.e. unbroken line) & specified alpha / color to match the
# geom_line layer
scale_linetype_manual(values = 1,
labels = 'Mvg Avg (5)',
guide = guide_legend(override.aes = list(alpha = 1,
color = "blue")))
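The na.omit() in the geom_line() layer is there because the trailing 5-point moving average from the question has no value for the first four dates. A quick base R check on a toy series (not the sales data) shows the pattern:

```r
# Toy series standing in for pred_sales
x <- 1:10

# Same call as in the question: 5-point average over the current and previous 4 values
ma <- stats::filter(x, rep(1/5, 5), sides = 1)
ma_values <- as.numeric(ma)
ma_values

# The first four positions have no full window yet
n_missing <- sum(is.na(ma_values))
n_missing
```

Since ma is the only column with missing values here, dropping incomplete rows removes exactly those leading dates from the line layer and nothing else.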
