How to add legend to plot from CausalImpact package? - r

I see the other post here about this, but I'm relatively new to R so the answers weren't helpful to me. I'd really appreciate some more in-depth help with how to do this.
I've already made a plot using the commands from the Causal Impact package. In the package documentation, it clearly says that the plots are ggplot2 objects and can be customized the same way as any other object like that. I've successfully done that, adding titles and customizing colors. I need to add a legend (it's required at the journal I'm submitting to). Here is an example of what my graph currently looks like and the code I used to get there.
library(ggplot2)
devtools::install_github("google/CausalImpact")
library(CausalImpact)
## note that I took this example code from the package documentation up until I customize the plot
#create data
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x1 + rnorm(100)
y[71:100] <- y[71:100] + 10
data <- cbind(y, x1)
#causal impact analysis
> pre.period <- c(1, 70)
> post.period <- c(71, 100)
> impact <- CausalImpact(data, pre.period, post.period)
#graph
example<-plot(impact, c("original", "cumulative")) +
labs(
x = "Time",
y = "Clicks (Millions)",
title = "Figure. Analysis of click behavior after intervention.") +
theme(plot.title = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0),
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank()) # get rid of minor grid
In my head, the solution I'd like is to have a legend for each panel of the plot. The first legend (next to the 'original' panel) would show a solid line represents the observed data, the dotted line represents the estimated counterfactual, and the colored band represents the 95% CrI around the estimated counterfactual. The second legend (next to the 'cumulative' panel) would show the dotted line represents the estimated change in trend associated with the intervention and the colored band again represents the 95% CrI around the estimation. Maybe there's a better solution than that, but that's what I've thought of.
Here is a section of the underlying code that runs when you plot:
# Initialize plot
q <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q <- q + xlab("") + ylab("")
if (length(metrics) > 1) {
q <- q + facet_grid(metric ~ ., scales = "free_y")
}
# Add prediction intervals
q <- q + geom_ribbon(aes(ymin = lower, ymax = upper),
data, fill = "slategray2")
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
q <- q + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
# Add zero line to pointwise and cumulative plot
q <- q + geom_line(aes(y = baseline),
colour = "darkgrey", size = 0.8, linetype = "solid",
na.rm = TRUE)
# Add point predictions
q <- q + geom_line(aes(y = mean), data,
size = 0.6, colour = "darkblue", linetype = "dashed",
na.rm = TRUE)
# Add observed data
q <- q + geom_line(aes(y = response), size = 0.6, na.rm = TRUE)
return(q)
}
One of the answers in that older post here said that I'd have to adapt the pre-existing function to get a legend, and I don't really have the skills yet to see what I'd have to change or add. I thought that legends were supposed to be automatically added according to what's in the aes() bit of the ggplot code, so I'm a little confused why there isn't one in the first place. Can someone help me with this?

Here is an updated/edited version of an earlier solution in order to merge aesthetics into one legend. The requirement was to merge linetype and fill (ribbon color) into one legend.
In order to merge legends, the same aesthetics have to be used in the geoms and the scales have to account for the different variables, have the same name and the same labels. So geom_ribbon() needs to have a linetype in the aes() as well as fill, and the geom_line() needs to have a fill in the aes() as well as the linetype. One side effect of adding a linetype to geom_ribbon() is that you then get a line around both edges of the band. On the other hand, fill is not applicable to geom_line so you just get a warning message that the fill aesthetic will be ignored.
The way to address this is to apply a linetype of "blank" to the relevant value in scale_linetype_manual(). Similarly, we use "transparent" in scale_fill_manual() to avoid applying a color to the other elements of the scale.
What I didn't realize before working through this is that it is possible to create a legend for an aesthetic for values across multiple variables. The values just have to be mapped appropriately in the scale. So I truly learned something new putting this together.
CreateImpactPlot <- function(impact, metrics = c("original", "cumulative")) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# impact: \code{CausalImpact} results object returned by
# \code{CausalImpact()}.
# metrics: Which metrics to include in the plot. Can be any combination of
# "original", "pointwise", and "cumulative".
#
# Returns:
# A ggplot2 object that can be plotted using plot().
# Create data frame of: time, response, mean, lower, upper, metric
data <- CreateDataFrameForPlot(impact)
# Select metrics to display (and their order)
assert_that(is.vector(metrics))
metrics <- match.arg(metrics, several.ok = TRUE)
data <- data[data$metric %in% metrics, , drop = FALSE]
data$metric <- factor(data$metric, metrics)
# Make data longer
data_long <- data %>%
tidyr::pivot_longer(cols = c("baseline", "mean", "response"), names_to = "variable",
values_to = "value", values_drop_na = TRUE)
# Initialize plot
q1 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q1 <- q1 + xlab("") + ylab("")
q3 <- ggplot(data %>%
filter(metric == "cumulative") %>%
mutate(metric = factor(metric, levels = c("cumulative"))), aes(x = time)) + theme_bw(base_size = 15)
q3 <- q3 + xlab("") + ylab("")
# Add prediction intervals
q1 <- q1 + geom_ribbon(data = data %>%
filter(metric == "original") %>%
mutate(metric = factor(metric, levels = c("original"))), aes(x = time, ymin = lower, ymax = upper, fill = metric,
linetype = metric))
q3 <- q3 + geom_ribbon(data = data %>%
filter(metric == "cumulative") %>%
mutate(metric = factor(metric, levels = c("cumulative"))), aes(x = time, ymin = lower, ymax = upper, fill = metric))
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
q1 <- q1 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
q3 <- q3 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
# Add zero line to cumulative plot
# Add point predictions
# Add observed data
q1 <- q1 + geom_line(data = data_long %>% dplyr::filter(metric == "original"),
aes(x = time, y = value, linetype = variable, group = variable,
size = variable, fill = variable, color = variable),
na.rm = TRUE)+
scale_linetype_manual(name = "Legend", labels = c("mean"= "estimated counterfactual", "response" = "oberserved", "original" = "95% Crl counterfactual"),
values = c("dashed", "solid", "blank"), limits = c("mean", "response","original")) +
scale_fill_manual(name = "Legend", labels = c("mean"= "estimated counterfactual", "response" = "oberserved", "original" = "95% Crl counterfactual"),
values = c("transparent", "transparent","slategray2"), limits = c("mean", "response","original")) + #limits controls the order in the legend
scale_size_manual(values = c(0.6, 0.8, 0.5)) +
scale_color_manual(values = c("darkgray", "darkblue")) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.title.y = element_blank()) +
guides(size = "none", color = "none")+
facet_wrap(~metric[1], strip.position = "right", drop = TRUE) #use facet_wrap to generate the stip
q3 <- q3 + geom_line(data = data_long %>% dplyr::filter(metric == "cumulative"),
aes(x = time, y = value, linetype = variable, group = variable,
fill = variable),
na.rm = TRUE) +
scale_linetype_manual(name = "Legend", labels = c("mean"= "estimated trend change", "baseline" = "oberserved", "cumulative" = "95% Crl estimation"),
values = c("dashed", "solid", "blank"), limits = c("mean", "baseline","cumulative")) +
scale_fill_manual(name = "Legend", labels = c("mean"= "estimated trend change", "baseline" = "oberserved", "cumulative" = "95% Crl estimation"),
values = c("transparent", "transparent","slategray2"), limits = c("mean", "baseline","cumulative")) + #limits controls the order in the legend
theme(legend.position = "right", axis.title.y = element_blank())+
labs(x = "Time") +
facet_wrap(~metric, strip.position = "right", drop = TRUE) #use facet_wrap to generate the stip
g1 <- grid::textGrob("Clicks (Millions)", rot = 90, gp=gpar(fontsize = 15), x= 0.85)
wrap_elements(g1) | (q1/q3)
patchwork <- wrap_elements(g1) | (q1/q3)
q <- patchwork
return(q)
}
# To run the function
plot(impact, c("original", "cumulative")) +
plot_annotation(title = "Figure. Analysis of click behavior after intervention"
, theme = theme(plot.title = element_text(hjust = 0.5))) &
theme(
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank())

I rewrote the plot function. Instead of using facet_wrap(), I created individual plots with their own legends and used patchwork to group them together into a single plot. In order to run this you need to memory all of the source code including impact_analysis.R, impact_misc.R, impact_model.R, impact_inference.R and impact_plot.R with the exception of the CreateImpactPlot function which I recreated. So instead, run what I have below. You will also need to load ggplot2, tidyr, dplyr, and patchwork. This will only run for Original and Cumulative metrics. Though I revised to some extent for Pointwise, I did not want to do this as I didn't have an example to reproduce. I worked your theme preferences directly into the code in the function. You should be able to identify and change those elements now at your leisure. To be clear, the plots are q1 = original, q2 = pointwise, and q3 = cumulative. I don't see how to bring the confidence band into the legend as it is not part of aes(). Possibly could create a grob from scratch. I just referenced it in the title which you can change if it doesn't suit you. Hopefully this helps.
"cumulative")) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# impact: \code{CausalImpact} results object returned by
# \code{CausalImpact()}.
# metrics: Which metrics to include in the plot. Can be any combination of
# "original", "pointwise", and "cumulative".
#
# Returns:
# A ggplot2 object that can be plotted using plot().
# Create data frame of: time, response, mean, lower, upper, metric
data <- CreateDataFrameForPlot(impact)
# Select metrics to display (and their order)
assert_that(is.vector(metrics))
metrics <- match.arg(metrics, several.ok = TRUE)
data <- data[data$metric %in% metrics, , drop = FALSE]
data$metric <- factor(data$metric, metrics)
# Initialize plot
#q <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
#q <- q + xlab("") + ylab("")
#if (length(metrics) > 1) {
# q <- q + facet_grid(metric ~ ., scales = "free_y")
#}
q1 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q1 <- q1 + xlab("") + ylab("")
q2 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q2 <- q2 + xlab("") + ylab("")
q3 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q3 <- q3 + xlab("") + ylab("")
# Add prediction intervals
#q <- q + geom_ribbon(aes(ymin = lower, ymax = upper),
# data, fill = "slategray2")
q1 <- q1 + geom_ribbon(data = data %>% dplyr::filter(metric == "original"), aes(x = time, ymin = lower, ymax = upper),
fill = "slategray2")
q2 <- q2 + geom_ribbon(data = data %>% dplyr::filter(metric == "pointwise"), aes(x = time, ymin = lower, ymax = upper),
fill = "slategray2")
q3 <- q3 + geom_ribbon(data = data %>% dplyr::filter(metric == "cumulative"), aes(x = time, ymin = lower, ymax = upper),
fill = "slategray2")
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
#q <- q + geom_vline(xintercept = xintercept,
# colour = "darkgrey", size = 0.8, linetype = "dashed")
q1 <- q1 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dotted")
q2 <- q2 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dotted")
q3 <- q3 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dotted")
data_long <- data %>%
tidyr::pivot_longer(cols = c("baseline", "mean", "response"), names_to = "variable",
values_to = "value", values_drop_na = TRUE)
# Add zero line to pointwise and cumulative plot
#q <- q + geom_line(aes(y = baseline),
# colour = "darkgrey", size = 0.8, linetype = "solid",
# na.rm = TRUE)
q1 <- q1 + geom_line(data = data_long %>% dplyr::filter(metric == "original"),
aes(x = time, y = value, linetype = variable, group = variable,
size = variable),
na.rm = TRUE)+
scale_linetype_manual(guide = "Legend", labels = c("estimated counterfactual", "oberserved"),
values = c("dashed", "solid")) +
scale_size_manual(values = c(0.6, 0.8)) +
scale_color_manual(values = c("darkblue", "darkgrey")) +
theme(legend.position = "right") +
guides(linetype = guide_legend("Legend", nrow=2), size = "none", color = "none")+
labs(title = "Original", y = "Clicks (Millions)") +
theme(
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank())
#q2 <- q2 + geom_line(data = data_long %>% dplyr::filter(metric == "pointwise"),
# aes(x = time, y = value, linetype = Line, group = Line),
# na.rm = TRUE) +
# scale_linetype_manual(title = "Legend", labels = c("estimated counterfactual", "observed"),
# values = c("dashed", "solid")) +
# scale_size_manual(values = c(0.6, 0.8)) +
# scale_color_manual(values = c("darkblue", "darkgrey")) +
# theme(legend.position = "right") +
# guides(linetype = guide_legend("Legend", nrow=2), size = "none", color = "none")+
# labs(title = "Pointwise", y = "Clicks (Millions)")
q3 <- q3 + geom_line(data = data_long %>% dplyr::filter(metric == "cumulative"),
aes(x = time, y = value, linetype = variable, group = variable),
na.rm = TRUE) +
scale_linetype_manual(labels = c("observed", "estimated trend change"),
values = c("solid", "dashed")) +
theme(legend.position = "right")+
guides(linetype = guide_legend("Legend", nrow=2))+
labs(title = "Cumulative",x = "Time", y = "Clicks (Millions)")+
theme(
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank())
patchwork <- q1 / q3
q <- patchwork + plot_annotation(title = "Figure. Analysis of click behavior after intervention with
95% Confidence Interval")
# Add point predictions
#q <- q + geom_line(aes(y = mean), data,
# size = 0.6, colour = "darkblue", linetype = "dashed",
# na.rm = TRUE)
# Add observed data
#q <- q + geom_line(aes(y = response), size = 0.6, na.rm = TRUE)
return(q)
}
plot(impact, c("original", "cumulative"))

Here is a rebuild of the CreateImpactPlot() function that will work for all three metrics. The legends can be modified. I introduced more colors and linetypes so that the legends could be applicable across all the facets.
The base case looks like this:
plot(impact)
You will note that the labels in the legend for the ribbons and for the lines refer to the metrics. These are placeholder labels that you can then modify.
line_labels <- c("cumulative_mean" = "change in trend", "baseline" = "baseline", "original_mean" =
"estimated counterfactual", "original_response" = "observed")
plot(impact, c("original", "cumulative")) +
labs(
x = "Time",
y = "Clicks (Millions)",
title = "Figure. Analysis of click behavior after intervention.") +
theme(plot.title = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0),
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank()) + # get rid of minor grid
scale_fill_manual(name = "95% Crl", values = c("original" = "slategray2", "cumulative" = "darkseagreen"),
labels = c("original" = "counterfactual", "cumulative" = "estimation")) +
scale_linetype_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "dotted", "baseline" = "solid", "original_mean" =
"dotted", "original_response" = "solid")) +
scale_color_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "red", "baseline" = "darkgrey", "original_mean"= "darkblue", "original_response" = "goldenrod"))
The vector "line_labels" is where you define the text you want to appear in the Legend. You will note that I removed the pointwise related values as I am excluding the pointwise metric from the plot. The scale_linetype_manual and scale_color_manual have to have the Name and labels kept in synch in order to have a combined legend, otherwise you will have two separate legends. The scale_fill_manual is for the ribbons. For these scales, you can change the names, the labels and the values as you desire. You can copy the code out of the function, revise it, and add it to the plot call as shown above.
Here is the code for the revised function. In the example, everything should be run and "impact" generated from the CausalImpact package. Then all of the package code needs to be loaded into memory including impact_analysis.R, impact_misc.R, impact_model.R, impact_inference.R and impact_plot.R. Then load the code below.
CreateImpactPlot2 <- function(impact, metrics = c("original", "pointwise","cumulative")) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# impact: \code{CausalImpact} results object returned by
# \code{CausalImpact()}.
# metrics: Which metrics to include in the plot. Can be any combination of
# "original", "pointwise", and "cumulative".
#
# Returns:
# A ggplot2 object that can be plotted using plot().
# Create data frame of: time, response, mean, lower, upper, metric
data <- CreateDataFrameForPlot(impact)
# Select metrics to display (and their order)
assert_that(is.vector(metrics))
metrics <- match.arg(metrics, several.ok = TRUE)
data <- data[data$metric %in% metrics, , drop = FALSE]
data$metric <- factor(data$metric, metrics)
data_long <- data %>%
tidyr::pivot_longer(cols = c("baseline", "mean", "response"), names_to = "variable",
values_to = "value", values_drop_na = TRUE) %>%
mutate(variable2 = factor(ifelse(variable == "baseline", variable, paste0(metric,"_", variable))),
variable = factor(variable))
# Initialize plot
q <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q <- q + xlab("") + ylab("")
if (length(metrics) > 1) {
q <- q + facet_grid(metric ~ ., scales = "free_y")
}
#Add prediction intervals
q <- q + geom_ribbon(aes(x = time, ymin = lower, ymax = upper, fill = metric), data_long)
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
q <- q + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
# Add zero line to pointwise and cumulative plot
q <- q + geom_line(data = data_long %>% dplyr::filter(variable == "baseline"),
aes(x = time, y = value, linetype = variable2, group = variable2, size = variable2, color = variable2),
na.rm = TRUE)
# Add point predictions
q <- q + geom_line(data = data_long %>% dplyr::filter(variable == "mean"),
aes(x = time, y = value, linetype = variable2, group = variable2, size = variable2, color = variable2),
na.rm = TRUE)
# Add observed data
q <- q + geom_line(data = data_long %>% dplyr::filter(variable == "response"),
aes(x = time, y = value, linetype = variable2, group = variable2, size = variable2, color = variable2),
na.rm = TRUE)
#Add scales
line_labels <- c("cumulative_mean" = "cumulative_mean", "baseline" = "baseline", "original_mean" =
"original_mean", "original_response" = "original_response", "pointwise_mean"=
"pointwise_mean")
q <- q + scale_linetype_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "dotted", "baseline" = "solid", "original_mean" =
"dotted", "original_response" = "solid", "pointwise_mean"=
"solid")) +
scale_size_manual(values = c("cumulative_mean" = 0.6, "baseline" = 0.8, "original_mean"= 0.6, "original_response" = 0.5,
"pointwise_mean"= 0.6)) +
scale_color_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "red", "baseline" = "darkgrey", "original_mean"= "darkblue", "original_response" = "goldenrod",
"pointwise_mean"= "darkgreen")) +
scale_fill_manual(name = "95% Crl", values = c("original" = "slategray2", "pointwise" = "pink3", "cumulative" = "darkseagreen"),
labels = c("original" = "original", "pointwise" = "pointwise", "cumulative" = "cumulative")) +
guides(size = "none")
return(q)
}
plot.CausalImpact <- function(x, ...) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# x: A \code{CausalImpact} results object, as returned by
# \code{CausalImpact()}.
# ...: Can be used to specify \code{metrics}, which determines which panels
# to include in the plot. The argument \code{metrics} can be any
# combination of "original", "pointwise", "cumulative". Partial matches
# are allowed.
#
# Returns:
# A ggplot2 object that can be plotted using plot().
#
# Examples:
# \dontrun{
# impact <- CausalImpact(...)
#
# # Default plot:
# plot(impact)
#
# # Customized plot:
# impact.plot <- plot(impact) + ylab("Sales")
# plot(impact.plot)
# }
return(CreateImpactPlot2(x, ...))
}

Related

Ordering y axis by another variable in a ggolot bar plot

I have a swimlane plot which I want to order by a group variable. I was also wondering if it is possible to label the groups on the ggplot.
Here is the code to create the data set and plot the data
dataset <- data.frame(subject = c("1002", "1002", "1002", "1002", "10034","10034","10034","10034","10054","10054","10054","1003","1003","1003","1003"),
exdose = c(5,10,20,5,5,10,20,20,5,10,20,5,20,10,5),
p= c(1,2,3,4,1,2,3,4,1,2,3,1,2,3,4),
diff = c(3,3,9,7,3,3,4,5,3,5,6,3,5,6,7),
group =c("grp1","grp1","grp1","grp1","grp2","grp2","grp2","grp2","grp1","grp1","grp1","grp2","grp2","grp2","grp2")
)
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)), position = position_stack(reverse = TRUE))
I want the y axis order by group and I want a label on the side to label the groups if possible
you can see from the plot it is ordered by subject number but I want it ordered by group and some indicator of group.
I tried reorder but I was unsuccessful in getting the desired plot.
As Stefan points out, facets are probably the way to go here, but you can use them with subtle theme tweaks to make it look as though you have just added a grouping variable on the y axis:
library(tidyverse)
dataset %>%
mutate(group = factor(group),
subject = reorder(subject, as.numeric(group)),
exdose = factor(exdose)) %>%
ggplot(aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = exdose), color = "gray50",
position = position_stack(reverse = TRUE)) +
scale_y_discrete(expand = c(0.1, 0.4)) +
scale_fill_brewer(palette = "Set2") +
facet_grid(group ~ ., scales = "free_y", switch = "y") +
theme_minimal(base_size = 16) +
theme(strip.background = element_rect(color = "gray"),
strip.text = element_text(face = 2),
panel.spacing.y = unit(0, "mm"),
panel.background = element_rect(fill = "#f9f8f6", color = NA))

Adding a legend for vertical lines of histograms

I'm trying to put a legend for a graph I am creating. The idea is to compare the mean and medians of a skewed and symmetric distribution. This is what I currently have as the code however the
show.legend = TRUE
code doesn't do the job.
set.seed(19971222)
sym <- as.data.frame(cbind(c(1:500), rchisq(500, df = 2))) # generate 500 random numbers from a symetric distribution
colnames(sym) <- c("index", "rnum")
sym_mean <- mean(sym$rnum)
sym_med <- median(sym$rnum)
# get into a format that tidyverse likes
central_measures <- as.data.frame(cbind(sym_mean, sym_med))
colnames(central_measures) <- c("mean", "median")
sym %>% ggplot(aes(sym$rnum)) +
geom_histogram(binwidth = 0.4, fill = "steelblue", colour = "navy", alpha = 0.9) +
geom_vline(xintercept = sym_mean, colour = "red", show.legend = TRUE) +
geom_vline(xintercept = sym_med, colour = "yellow", show.legend = TRUE) +
labs(title = "Histogram of 500 Randomly Generated Numbers from the Chi-Squared Distribution",
x = "Value",
y = "Frequency") +
theme_minimal()
I just want to have a legend on the side saying that the red is the "Mean" and the yellow is the "Median".
Thank you!
Heyyy, sorry I got side-tracked for a bit and my first suggestions was a little off. Here is one way to accomplish your goal of adding a legend for your centrality measures.
# use this instead of central_measures
central_values <- data.frame(measurement = c("mean", "median"),
value = c(sym_mean, sym_med))
sym %>% ggplot(aes(sym$rnum)) +
geom_histogram(binwidth = 0.4, fill = "steelblue", colour = "navy", alpha = 0.9) +
geom_vline(data = central_values, aes(xintercept = value, color = measurement)) +
scale_color_manual(values = c("red", "orange"), name = NULL) +
labs(title = "Histogram of 500 Randomly Generated Numbers from the Chi-Squared Distribution",
x = "Value",
y = "Frequency") +
theme_minimal()
Let me know if you have any other troubles and sorry again for leading you astray with my comment!

using y-axis values to create secondary x-axis in ggplot2

I would like to create a dot plot with percentiles, which looks something like this-
Here is the ggplot2 code I used to create the dot plot. There are two things I'd like to change:
I can plot the percentile values on the y-axis but I want these
values on the x-axis (as shown in the graph above). Note that
the coordinates are flipped.
The axes don't display label for the
minimum value (for example the percentile axis labels start at 25
when they should start at 0 instead.)
# loading needed libraries
library(tidyverse)
library(ggstatsplot)
# creating dataframe with mean mileage per manufacturer
cty_mpg <- ggplot2::mpg %>%
dplyr::group_by(.data = ., manufacturer) %>%
dplyr::summarise(.data = ., mileage = mean(cty, na.rm = TRUE)) %>%
dplyr::rename(.data = ., make = manufacturer) %>%
dplyr::arrange(.data = ., mileage) %>%
dplyr::mutate(.data = ., make = factor(x = make, levels = .$make)) %>%
dplyr::mutate(
.data = .,
percent_rank = (trunc(rank(mileage)) / length(mileage)) * 100
) %>%
tibble::as_data_frame(x = .)
# plot
ggplot2::ggplot(data = cty_mpg, mapping = ggplot2::aes(x = make, y = mileage)) +
ggplot2::geom_point(col = "tomato2", size = 3) + # Draw points
ggplot2::geom_segment(
mapping = ggplot2::aes(
x = make,
xend = make,
y = min(mileage),
yend = max(mileage)
),
linetype = "dashed",
size = 0.1
) + # Draw dashed lines
ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(trans = ~(trunc(rank(.)) / length(.)) * 100, name = "percentile")) +
ggplot2::coord_flip() +
ggplot2::labs(
title = "City mileage by car manufacturer",
subtitle = "Dot plot",
caption = "source: mpg dataset in ggplot2"
) +
ggstatsplot::theme_ggstatsplot()
Created on 2018-08-17 by the reprex package (v0.2.0.9000).
I am not 100% sure to have understood what you really want, but below is my attempt to reproduce the first picture with mpg data:
require(ggplot2)
data <- aggregate(cty~manufacturer, mpg, FUN = mean)
data <- data.frame(data[order(data$cty), ], rank=1:nrow(data))
g <- ggplot(data, aes(y = rank, x = cty))
g <- g + geom_point(size = 2)
g <- g + scale_y_continuous(name = "Manufacturer", labels = data$manufacturer, breaks = data$rank,
sec.axis = dup_axis(name = element_blank(),
breaks = seq(1, nrow(data), (nrow(data)-1)/4),
labels = 25 * 0:4))
g <- g + scale_x_continuous(name = "Mileage", limits = c(10, 25),
sec.axis = dup_axis(name = element_blank()))
g <- g + theme_classic()
g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
print(g)
That produces:
data <- aggregate(cty~manufacturer, mpg, FUN = mean)
data <- data.frame(data[order(data$cty), ], rank=1:nrow(data))
These two lines generate the data for the graph. Basically we need the manufacturers, the mileage (average of cty by manufacturer) and the rank.
g <- g + scale_y_continuous(name = "Manufacturer", labels = data$manufacturer, breaks = data$rank,
sec.axis = dup_axis(name = element_blank(),
breaks = seq(1, nrow(data), (nrow(data)-1)/4),
labels = 25 * 0:4))
Note that here the scale is using rank and not the column manufacturer. To display the name of the manufacturers, you must use the labels property and you must force the breaks to be for every values (see property breaks).
The second y-axis is generated using the sec.axis property. This is very straight-forward using dup_axis that easily duplicate the axis. By replacing the labels and the breaks, you can display the %-value.
g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
The horizontal lines are just the major grid. This is much easier to manipulate than geom_segments in my opinion.
Regarding your question 1, you can flip the coordinates easily using coord_flip, with minor adjustments. Replace the following line:
g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted")
By the following two lines:
g <- g + coord_flip()
g <- g + theme(panel.grid.major.x = element_line(color = "black", linetype = "dotted"),
axis.text.x = element_text(angle = 90, hjust = 1))
Which produces:
Regarding your question 2, the problem is that the value 0% is outside the limits. You can solve this issue by changing the way you calculate the percentage (starting from zero and not from one), or you can extend the limit of your plot to include the value zero, but then no point will be associated to 0%.

Create legend with manual shapes and colours

I use bars and line to create my plot. The demo code is:
timestamp <- seq(as.Date('2010-01-01'),as.Date('2011-12-01'),by="1 mon")
data1 <- rnorm(length(timestamp), 3000, 30)
data2 <- rnorm(length(timestamp), 30, 3)
df <- data.frame(timestamp, data1, data2)
p <- ggplot()
p <- p + geom_histogram(data=df,aes(timestamp,data1),colour="black",stat="Identity",bindwidth=10)
p <- p + geom_line(data=df,aes(timestamp,y=data2*150),colour="red")
p <- p + scale_y_continuous(sec.axis = sec_axis(~./150, name = "data2"))
p <- p + scale_colour_manual(name="Parameter", labels=c("data1", "data2"), values = c('black', 'red'))
p <- p+ scale_shape_manual(name="Parameter", labels=c("data1", "data2"), values = c(15,95))
p
This results in a plot like this:
This figure does not have a legend. I followed this answer to create a customized legend but it is not working in my case. I want a square and line shape in my legend corresponding to bars and line. How can we get it?
I want legend as shown in below image:
For the type of data you want to display, geom_bar is a better fit then geom_histogram. When you to manipulate the appaerance of the legend(s), you need to place the colour = ... parts inside the aes. To get the desired result it probably best to use different types of legend for the line and the bars. In that way you are better able to change the appearance of the legends with guide_legend and override.aes.
A proposal for your problem:
ggplot(data = df) +
geom_bar(aes(x = timestamp, y = data1, colour = "black"),
stat = "Identity", fill = NA) +
geom_line(aes(x = timestamp, y = data2*150, linetype = "red"), colour = "red", size = 1) +
scale_y_continuous(sec.axis = sec_axis(~./150, name = "data2")) +
scale_linetype_manual(labels = "data2", values = "solid") +
scale_colour_manual(name = "Parameter\n", labels = "data1", values = "black") +
guides(colour = guide_legend(override.aes = list(colour = "black", size = 1),
order = 1),
linetype = guide_legend(title = NULL,
override.aes = list(linetype = "solid",
colour = "red",
size = 1),
order = 2)) +
theme_minimal() +
theme(legend.key = element_rect(fill = "white", colour = NA),
legend.spacing = unit(0, "lines"))
which gives:

How to merge legends for color and shape when geom_hline has a separate (additional) entry in the color legend?

I have the following code, which produces the following plot:
cols <- brewer.pal(n = 3, name = 'Dark2')
p4 <- ggplot(all.m, aes(x=xval, y=yval, colour = Approach, ymax = 0.95)) + theme_bw() +
geom_errorbar(aes(ymin= yval - se, ymax = yval + se), width=5, position=pd) +
geom_line(position=pd) +
geom_point(aes(shape=Approach, colour = Approach), size = 4) +
geom_hline(aes(yintercept = cp.best$slope, colour = "C2P"), show_guide = FALSE) +
scale_color_manual(name="Approach", breaks=c("C2P", "P2P", "CP2P"), values = cols[c(1,3,2)]) +
scale_y_continuous(breaks = seq(0.4, 0.95, 0.05), "Test AUROC") +
scale_x_continuous(breaks = seq(10, 150, by = 20), "# Number of Patient Samples in Training")
p4 <- p4 + theme(legend.direction = 'horizontal',
legend.position = 'top',
plot.margin = unit(c(5.1, 7, 4.5, 3.5)/2, "lines"),
text = element_text(size=15), axis.title.x=element_text(vjust=-1.5), axis.title.y=element_text(vjust=2))
p4 <- p4 + guides(colour=guide_legend(override.aes=list(shape=c(NA,17,16))))
p4
When I try show_guide = FALSE in geom_point, the shape of the point in the upper legend are all set to default solid circles.
How can I make the lower legend to disappear, without affecting the upper legend?
This is a solution, complete with reproducible data:
library("ggplot2")
library("grid")
library("RColorBrewer")
cp2p <- data.frame(xval = 10 * 2:15, yval = cumsum(c(0.55, rnorm(13, 0.01, 0.005))), Approach = "CP2P", stringsAsFactors = FALSE)
p2p <- data.frame(xval = 10 * 1:15, yval = cumsum(c(0.7, rnorm(14, 0.01, 0.005))), Approach = "P2P", stringsAsFactors = FALSE)
pd <- position_dodge(0.1)
cp.best <- list(slope = 0.65)
all.m <- rbind(p2p, cp2p)
all.m$Approach <- factor(all.m$Approach, levels = c("C2P", "P2P", "CP2P"))
all.m$se <- rnorm(29, 0.1, 0.02)
all.m[nrow(all.m) + 1, ] <- all.m[nrow(all.m) + 1, ] # Creates a new row filled with NAs
all.m$Approach[nrow(all.m)] <- "C2P"
cols <- brewer.pal(n = 3, name = 'Dark2')
p4 <- ggplot(all.m, aes(x=xval, y=yval, colour = Approach, ymax = 0.95)) + theme_bw() +
geom_errorbar(aes(ymin= yval - se, ymax = yval + se), width=5, position=pd) +
geom_line(position=pd) +
geom_point(aes(shape=Approach, colour = Approach), size = 4, na.rm = TRUE) +
geom_hline(aes(yintercept = cp.best$slope, colour = "C2P")) +
scale_color_manual(values = c(C2P = cols[1], P2P = cols[2], CP2P = cols[3])) +
scale_shape_manual(values = c(C2P = NA, P2P = 16, CP2P = 17)) +
scale_y_continuous(breaks = seq(0.4, 0.95, 0.05), "Test AUROC") +
scale_x_continuous(breaks = seq(10, 150, by = 20), "# Number of Patient Samples in Training")
p4 <- p4 + theme(legend.direction = 'horizontal',
legend.position = 'top',
plot.margin = unit(c(5.1, 7, 4.5, 3.5)/2, "lines"),
text = element_text(size=15), axis.title.x=element_text(vjust=-1.5), axis.title.y=element_text(vjust=2))
p4
The trick is to make sure that all of the desired levels of all.m$Approach appear in all.m, even if one of them gets dropped out of the graph. The warning about the omitted point is suppressed by the na.rm = TRUE argument to geom_point.
Short answer:
Just add a dummy geom_point layer (transparent points) where shape is mapped to the same level as in geom_hline.
geom_point(aes(shape = "int"), alpha = 0)
Longer answer:
Whenever possible, ggplot merges / combines legends of different aesthetics. For example, if colour and shape is mapped to the same variable, then the two legends are combined into one.
I illustrate this using simple data set with 'x', 'y' and a grouping variable 'grp' with two levels:
df <- data.frame(x = rep(1:2, 2), y = 1:4, grp = rep(c("a", "b"), each = 2))
First we map both color and shape to 'grp'
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4)
Fine, the legends for the aesthetics, color and shape, are merged into one.
Then we add a geom_hline. We want it to have a separate color from the geom_lines and to appear in the legend. Thus, we map color to a variable, i.e. put color inside aes of geom_hline. In this case we do not map the color to a variable in the data set, but to a constant. We may give the constant a desired name, so we don't need to rename the legend entries afterwards.
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int"))
Now two legends appears, one for the color aesthetics of geom_line and geom_hline, and one for the shape of the geom_points. The reason for this is that the "variable" which color is mapped to now contains three levels: the two levels of 'grp' in the original data, plus the level 'int' which was introduced in the geom_hline aes. Thus, the levels in the color scale differs from those in the shape scale, and by default ggplot can't merge the two scales into one legend.
How to combine the two legends?
One possibility is to introduce the same, additional level for shape as for color by using a dummy geom_point layer with transparent points (alpha = 0) so that the two aesthetics contains the same levels:
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int")) +
geom_point(aes(shape = "int"), alpha = 0) # <~~~~ a blank geom_point
Another possibility is to convert the original grouping variable to a factor, and add the "geom_hline level" to the original levels. Then use drop = FALSE in scale_shape_discrete to include "unused factor levels from the scale":
datadf$grp <- factor(df$grp, levels = c(unique(df$grp), "int"))
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int")) +
scale_shape_discrete(drop = FALSE)
Then, as you already know, you may use the guides function to "override" the shape aesthetics in the legend, and remove the shape from the geom_hline entry by setting it to NA:
guides(colour = guide_legend(override.aes = list(shape = c(16, 17, NA))))

Resources