Adding a legend for vertical lines of histograms

Adding a legend for vertical lines of histograms - r

I'm trying to put a legend for a graph I am creating. The idea is to compare the mean and medians of a skewed and symmetric distribution. This is what I currently have as the code however the
show.legend = TRUE
code doesn't do the job.
set.seed(19971222)
sym <- as.data.frame(cbind(c(1:500), rchisq(500, df = 2))) # generate 500 random numbers from a symetric distribution
colnames(sym) <- c("index", "rnum")
sym_mean <- mean(sym$rnum)
sym_med <- median(sym$rnum)
# get into a format that tidyverse likes
central_measures <- as.data.frame(cbind(sym_mean, sym_med))
colnames(central_measures) <- c("mean", "median")
sym %>% ggplot(aes(sym$rnum)) +
geom_histogram(binwidth = 0.4, fill = "steelblue", colour = "navy", alpha = 0.9) +
geom_vline(xintercept = sym_mean, colour = "red", show.legend = TRUE) +
geom_vline(xintercept = sym_med, colour = "yellow", show.legend = TRUE) +
labs(title = "Histogram of 500 Randomly Generated Numbers from the Chi-Squared Distribution",
x = "Value",
y = "Frequency") +
theme_minimal()
I just want to have a legend on the side saying that the red is the "Mean" and the yellow is the "Median".
Thank you!

Heyyy, sorry I got side-tracked for a bit and my first suggestions was a little off. Here is one way to accomplish your goal of adding a legend for your centrality measures.
# use this instead of central_measures
central_values <- data.frame(measurement = c("mean", "median"),
value = c(sym_mean, sym_med))
sym %>% ggplot(aes(sym$rnum)) +
geom_histogram(binwidth = 0.4, fill = "steelblue", colour = "navy", alpha = 0.9) +
geom_vline(data = central_values, aes(xintercept = value, color = measurement)) +
scale_color_manual(values = c("red", "orange"), name = NULL) +
labs(title = "Histogram of 500 Randomly Generated Numbers from the Chi-Squared Distribution",
x = "Value",
y = "Frequency") +
theme_minimal()
Let me know if you have any other troubles and sorry again for leading you astray with my comment!

Related

How to add legend to plot from CausalImpact package?

I see the other post here about this, but I'm relatively new to R so the answers weren't helpful to me. I'd really appreciate some more in-depth help with how to do this.
I've already made a plot using the commands from the Causal Impact package. In the package documentation, it clearly says that the plots are ggplot2 objects and can be customized the same way as any other object like that. I've successfully done that, adding titles and customizing colors. I need to add a legend (it's required at the journal I'm submitting to). Here is an example of what my graph currently looks like and the code I used to get there.
library(ggplot2)
devtools::install_github("google/CausalImpact")
library(CausalImpact)
## note that I took this example code from the package documentation up until I customize the plot
#create data
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x1 + rnorm(100)
y[71:100] <- y[71:100] + 10
data <- cbind(y, x1)
#causal impact analysis
> pre.period <- c(1, 70)
> post.period <- c(71, 100)
> impact <- CausalImpact(data, pre.period, post.period)
#graph
example<-plot(impact, c("original", "cumulative")) +
labs(
x = "Time",
y = "Clicks (Millions)",
title = "Figure. Analysis of click behavior after intervention.") +
theme(plot.title = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0),
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank()) # get rid of minor grid
In my head, the solution I'd like is to have a legend for each panel of the plot. The first legend (next to the 'original' panel) would show a solid line represents the observed data, the dotted line represents the estimated counterfactual, and the colored band represents the 95% CrI around the estimated counterfactual. The second legend (next to the 'cumulative' panel) would show the dotted line represents the estimated change in trend associated with the intervention and the colored band again represents the 95% CrI around the estimation. Maybe there's a better solution than that, but that's what I've thought of.
Here is a section of the underlying code that runs when you plot:
# Initialize plot
q <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q <- q + xlab("") + ylab("")
if (length(metrics) > 1) {
q <- q + facet_grid(metric ~ ., scales = "free_y")
}
# Add prediction intervals
q <- q + geom_ribbon(aes(ymin = lower, ymax = upper),
data, fill = "slategray2")
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
q <- q + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
# Add zero line to pointwise and cumulative plot
q <- q + geom_line(aes(y = baseline),
colour = "darkgrey", size = 0.8, linetype = "solid",
na.rm = TRUE)
# Add point predictions
q <- q + geom_line(aes(y = mean), data,
size = 0.6, colour = "darkblue", linetype = "dashed",
na.rm = TRUE)
# Add observed data
q <- q + geom_line(aes(y = response), size = 0.6, na.rm = TRUE)
return(q)
}
One of the answers in that older post here said that I'd have to adapt the pre-existing function to get a legend, and I don't really have the skills yet to see what I'd have to change or add. I thought that legends were supposed to be automatically added according to what's in the aes() bit of the ggplot code, so I'm a little confused why there isn't one in the first place. Can someone help me with this?

Here is an updated/edited version of an earlier solution in order to merge aesthetics into one legend. The requirement was to merge linetype and fill (ribbon color) into one legend.
In order to merge legends, the same aesthetics have to be used in the geoms and the scales have to account for the different variables, have the same name and the same labels. So geom_ribbon() needs to have a linetype in the aes() as well as fill, and the geom_line() needs to have a fill in the aes() as well as the linetype. One side effect of adding a linetype to geom_ribbon() is that you then get a line around both edges of the band. On the other hand, fill is not applicable to geom_line so you just get a warning message that the fill aesthetic will be ignored.
The way to address this is to apply a linetype of "blank" to the relevant value in scale_linetype_manual(). Similarly, we use "transparent" in scale_fill_manual() to avoid applying a color to the other elements of the scale.
What I didn't realize before working through this is that it is possible to create a legend for an aesthetic for values across multiple variables. The values just have to be mapped appropriately in the scale. So I truly learned something new putting this together.
CreateImpactPlot <- function(impact, metrics = c("original", "cumulative")) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# impact: \code{CausalImpact} results object returned by
# \code{CausalImpact()}.
# metrics: Which metrics to include in the plot. Can be any combination of
# "original", "pointwise", and "cumulative".
#
# Returns:
# A ggplot2 object that can be plotted using plot().
# Create data frame of: time, response, mean, lower, upper, metric
data <- CreateDataFrameForPlot(impact)
# Select metrics to display (and their order)
assert_that(is.vector(metrics))
metrics <- match.arg(metrics, several.ok = TRUE)
data <- data[data$metric %in% metrics, , drop = FALSE]
data$metric <- factor(data$metric, metrics)
# Make data longer
data_long <- data %>%
tidyr::pivot_longer(cols = c("baseline", "mean", "response"), names_to = "variable",
values_to = "value", values_drop_na = TRUE)
# Initialize plot
q1 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q1 <- q1 + xlab("") + ylab("")
q3 <- ggplot(data %>%
filter(metric == "cumulative") %>%
mutate(metric = factor(metric, levels = c("cumulative"))), aes(x = time)) + theme_bw(base_size = 15)
q3 <- q3 + xlab("") + ylab("")
# Add prediction intervals
q1 <- q1 + geom_ribbon(data = data %>%
filter(metric == "original") %>%
mutate(metric = factor(metric, levels = c("original"))), aes(x = time, ymin = lower, ymax = upper, fill = metric,
linetype = metric))
q3 <- q3 + geom_ribbon(data = data %>%
filter(metric == "cumulative") %>%
mutate(metric = factor(metric, levels = c("cumulative"))), aes(x = time, ymin = lower, ymax = upper, fill = metric))
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
q1 <- q1 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
q3 <- q3 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
# Add zero line to cumulative plot
# Add point predictions
# Add observed data
q1 <- q1 + geom_line(data = data_long %>% dplyr::filter(metric == "original"),
aes(x = time, y = value, linetype = variable, group = variable,
size = variable, fill = variable, color = variable),
na.rm = TRUE)+
scale_linetype_manual(name = "Legend", labels = c("mean"= "estimated counterfactual", "response" = "oberserved", "original" = "95% Crl counterfactual"),
values = c("dashed", "solid", "blank"), limits = c("mean", "response","original")) +
scale_fill_manual(name = "Legend", labels = c("mean"= "estimated counterfactual", "response" = "oberserved", "original" = "95% Crl counterfactual"),
values = c("transparent", "transparent","slategray2"), limits = c("mean", "response","original")) + #limits controls the order in the legend
scale_size_manual(values = c(0.6, 0.8, 0.5)) +
scale_color_manual(values = c("darkgray", "darkblue")) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.title.y = element_blank()) +
guides(size = "none", color = "none")+
facet_wrap(~metric[1], strip.position = "right", drop = TRUE) #use facet_wrap to generate the stip
q3 <- q3 + geom_line(data = data_long %>% dplyr::filter(metric == "cumulative"),
aes(x = time, y = value, linetype = variable, group = variable,
fill = variable),
na.rm = TRUE) +
scale_linetype_manual(name = "Legend", labels = c("mean"= "estimated trend change", "baseline" = "oberserved", "cumulative" = "95% Crl estimation"),
values = c("dashed", "solid", "blank"), limits = c("mean", "baseline","cumulative")) +
scale_fill_manual(name = "Legend", labels = c("mean"= "estimated trend change", "baseline" = "oberserved", "cumulative" = "95% Crl estimation"),
values = c("transparent", "transparent","slategray2"), limits = c("mean", "baseline","cumulative")) + #limits controls the order in the legend
theme(legend.position = "right", axis.title.y = element_blank())+
labs(x = "Time") +
facet_wrap(~metric, strip.position = "right", drop = TRUE) #use facet_wrap to generate the stip
g1 <- grid::textGrob("Clicks (Millions)", rot = 90, gp=gpar(fontsize = 15), x= 0.85)
wrap_elements(g1) | (q1/q3)
patchwork <- wrap_elements(g1) | (q1/q3)
q <- patchwork
return(q)
}
# To run the function
plot(impact, c("original", "cumulative")) +
plot_annotation(title = "Figure. Analysis of click behavior after intervention"
, theme = theme(plot.title = element_text(hjust = 0.5))) &
theme(
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank())

I rewrote the plot function. Instead of using facet_wrap(), I created individual plots with their own legends and used patchwork to group them together into a single plot. In order to run this you need to memory all of the source code including impact_analysis.R, impact_misc.R, impact_model.R, impact_inference.R and impact_plot.R with the exception of the CreateImpactPlot function which I recreated. So instead, run what I have below. You will also need to load ggplot2, tidyr, dplyr, and patchwork. This will only run for Original and Cumulative metrics. Though I revised to some extent for Pointwise, I did not want to do this as I didn't have an example to reproduce. I worked your theme preferences directly into the code in the function. You should be able to identify and change those elements now at your leisure. To be clear, the plots are q1 = original, q2 = pointwise, and q3 = cumulative. I don't see how to bring the confidence band into the legend as it is not part of aes(). Possibly could create a grob from scratch. I just referenced it in the title which you can change if it doesn't suit you. Hopefully this helps.
"cumulative")) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# impact: \code{CausalImpact} results object returned by
# \code{CausalImpact()}.
# metrics: Which metrics to include in the plot. Can be any combination of
# "original", "pointwise", and "cumulative".
#
# Returns:
# A ggplot2 object that can be plotted using plot().
# Create data frame of: time, response, mean, lower, upper, metric
data <- CreateDataFrameForPlot(impact)
# Select metrics to display (and their order)
assert_that(is.vector(metrics))
metrics <- match.arg(metrics, several.ok = TRUE)
data <- data[data$metric %in% metrics, , drop = FALSE]
data$metric <- factor(data$metric, metrics)
# Initialize plot
#q <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
#q <- q + xlab("") + ylab("")
#if (length(metrics) > 1) {
# q <- q + facet_grid(metric ~ ., scales = "free_y")
#}
q1 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q1 <- q1 + xlab("") + ylab("")
q2 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q2 <- q2 + xlab("") + ylab("")
q3 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q3 <- q3 + xlab("") + ylab("")
# Add prediction intervals
#q <- q + geom_ribbon(aes(ymin = lower, ymax = upper),
# data, fill = "slategray2")
q1 <- q1 + geom_ribbon(data = data %>% dplyr::filter(metric == "original"), aes(x = time, ymin = lower, ymax = upper),
fill = "slategray2")
q2 <- q2 + geom_ribbon(data = data %>% dplyr::filter(metric == "pointwise"), aes(x = time, ymin = lower, ymax = upper),
fill = "slategray2")
q3 <- q3 + geom_ribbon(data = data %>% dplyr::filter(metric == "cumulative"), aes(x = time, ymin = lower, ymax = upper),
fill = "slategray2")
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
#q <- q + geom_vline(xintercept = xintercept,
# colour = "darkgrey", size = 0.8, linetype = "dashed")
q1 <- q1 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dotted")
q2 <- q2 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dotted")
q3 <- q3 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dotted")
data_long <- data %>%
tidyr::pivot_longer(cols = c("baseline", "mean", "response"), names_to = "variable",
values_to = "value", values_drop_na = TRUE)
# Add zero line to pointwise and cumulative plot
#q <- q + geom_line(aes(y = baseline),
# colour = "darkgrey", size = 0.8, linetype = "solid",
# na.rm = TRUE)
q1 <- q1 + geom_line(data = data_long %>% dplyr::filter(metric == "original"),
aes(x = time, y = value, linetype = variable, group = variable,
size = variable),
na.rm = TRUE)+
scale_linetype_manual(guide = "Legend", labels = c("estimated counterfactual", "oberserved"),
values = c("dashed", "solid")) +
scale_size_manual(values = c(0.6, 0.8)) +
scale_color_manual(values = c("darkblue", "darkgrey")) +
theme(legend.position = "right") +
guides(linetype = guide_legend("Legend", nrow=2), size = "none", color = "none")+
labs(title = "Original", y = "Clicks (Millions)") +
theme(
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank())
#q2 <- q2 + geom_line(data = data_long %>% dplyr::filter(metric == "pointwise"),
# aes(x = time, y = value, linetype = Line, group = Line),
# na.rm = TRUE) +
# scale_linetype_manual(title = "Legend", labels = c("estimated counterfactual", "observed"),
# values = c("dashed", "solid")) +
# scale_size_manual(values = c(0.6, 0.8)) +
# scale_color_manual(values = c("darkblue", "darkgrey")) +
# theme(legend.position = "right") +
# guides(linetype = guide_legend("Legend", nrow=2), size = "none", color = "none")+
# labs(title = "Pointwise", y = "Clicks (Millions)")
q3 <- q3 + geom_line(data = data_long %>% dplyr::filter(metric == "cumulative"),
aes(x = time, y = value, linetype = variable, group = variable),
na.rm = TRUE) +
scale_linetype_manual(labels = c("observed", "estimated trend change"),
values = c("solid", "dashed")) +
theme(legend.position = "right")+
guides(linetype = guide_legend("Legend", nrow=2))+
labs(title = "Cumulative",x = "Time", y = "Clicks (Millions)")+
theme(
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank())
patchwork <- q1 / q3
q <- patchwork + plot_annotation(title = "Figure. Analysis of click behavior after intervention with
95% Confidence Interval")
# Add point predictions
#q <- q + geom_line(aes(y = mean), data,
# size = 0.6, colour = "darkblue", linetype = "dashed",
# na.rm = TRUE)
# Add observed data
#q <- q + geom_line(aes(y = response), size = 0.6, na.rm = TRUE)
return(q)
}
plot(impact, c("original", "cumulative"))

Here is a rebuild of the CreateImpactPlot() function that will work for all three metrics. The legends can be modified. I introduced more colors and linetypes so that the legends could be applicable across all the facets.
The base case looks like this:
plot(impact)
You will note that the labels in the legend for the ribbons and for the lines refer to the metrics. These are placeholder labels that you can then modify.
line_labels <- c("cumulative_mean" = "change in trend", "baseline" = "baseline", "original_mean" =
"estimated counterfactual", "original_response" = "observed")
plot(impact, c("original", "cumulative")) +
labs(
x = "Time",
y = "Clicks (Millions)",
title = "Figure. Analysis of click behavior after intervention.") +
theme(plot.title = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0),
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank()) + # get rid of minor grid
scale_fill_manual(name = "95% Crl", values = c("original" = "slategray2", "cumulative" = "darkseagreen"),
labels = c("original" = "counterfactual", "cumulative" = "estimation")) +
scale_linetype_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "dotted", "baseline" = "solid", "original_mean" =
"dotted", "original_response" = "solid")) +
scale_color_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "red", "baseline" = "darkgrey", "original_mean"= "darkblue", "original_response" = "goldenrod"))
The vector "line_labels" is where you define the text you want to appear in the Legend. You will note that I removed the pointwise related values as I am excluding the pointwise metric from the plot. The scale_linetype_manual and scale_color_manual have to have the Name and labels kept in synch in order to have a combined legend, otherwise you will have two separate legends. The scale_fill_manual is for the ribbons. For these scales, you can change the names, the labels and the values as you desire. You can copy the code out of the function, revise it, and add it to the plot call as shown above.
Here is the code for the revised function. In the example, everything should be run and "impact" generated from the CausalImpact package. Then all of the package code needs to be loaded into memory including impact_analysis.R, impact_misc.R, impact_model.R, impact_inference.R and impact_plot.R. Then load the code below.
CreateImpactPlot2 <- function(impact, metrics = c("original", "pointwise","cumulative")) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# impact: \code{CausalImpact} results object returned by
# \code{CausalImpact()}.
# metrics: Which metrics to include in the plot. Can be any combination of
# "original", "pointwise", and "cumulative".
#
# Returns:
# A ggplot2 object that can be plotted using plot().
# Create data frame of: time, response, mean, lower, upper, metric
data <- CreateDataFrameForPlot(impact)
# Select metrics to display (and their order)
assert_that(is.vector(metrics))
metrics <- match.arg(metrics, several.ok = TRUE)
data <- data[data$metric %in% metrics, , drop = FALSE]
data$metric <- factor(data$metric, metrics)
data_long <- data %>%
tidyr::pivot_longer(cols = c("baseline", "mean", "response"), names_to = "variable",
values_to = "value", values_drop_na = TRUE) %>%
mutate(variable2 = factor(ifelse(variable == "baseline", variable, paste0(metric,"_", variable))),
variable = factor(variable))
# Initialize plot
q <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q <- q + xlab("") + ylab("")
if (length(metrics) > 1) {
q <- q + facet_grid(metric ~ ., scales = "free_y")
}
#Add prediction intervals
q <- q + geom_ribbon(aes(x = time, ymin = lower, ymax = upper, fill = metric), data_long)
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
q <- q + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
# Add zero line to pointwise and cumulative plot
q <- q + geom_line(data = data_long %>% dplyr::filter(variable == "baseline"),
aes(x = time, y = value, linetype = variable2, group = variable2, size = variable2, color = variable2),
na.rm = TRUE)
# Add point predictions
q <- q + geom_line(data = data_long %>% dplyr::filter(variable == "mean"),
aes(x = time, y = value, linetype = variable2, group = variable2, size = variable2, color = variable2),
na.rm = TRUE)
# Add observed data
q <- q + geom_line(data = data_long %>% dplyr::filter(variable == "response"),
aes(x = time, y = value, linetype = variable2, group = variable2, size = variable2, color = variable2),
na.rm = TRUE)
#Add scales
line_labels <- c("cumulative_mean" = "cumulative_mean", "baseline" = "baseline", "original_mean" =
"original_mean", "original_response" = "original_response", "pointwise_mean"=
"pointwise_mean")
q <- q + scale_linetype_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "dotted", "baseline" = "solid", "original_mean" =
"dotted", "original_response" = "solid", "pointwise_mean"=
"solid")) +
scale_size_manual(values = c("cumulative_mean" = 0.6, "baseline" = 0.8, "original_mean"= 0.6, "original_response" = 0.5,
"pointwise_mean"= 0.6)) +
scale_color_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "red", "baseline" = "darkgrey", "original_mean"= "darkblue", "original_response" = "goldenrod",
"pointwise_mean"= "darkgreen")) +
scale_fill_manual(name = "95% Crl", values = c("original" = "slategray2", "pointwise" = "pink3", "cumulative" = "darkseagreen"),
labels = c("original" = "original", "pointwise" = "pointwise", "cumulative" = "cumulative")) +
guides(size = "none")
return(q)
}
plot.CausalImpact <- function(x, ...) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# x: A \code{CausalImpact} results object, as returned by
# \code{CausalImpact()}.
# ...: Can be used to specify \code{metrics}, which determines which panels
# to include in the plot. The argument \code{metrics} can be any
# combination of "original", "pointwise", "cumulative". Partial matches
# are allowed.
#
# Returns:
# A ggplot2 object that can be plotted using plot().
#
# Examples:
# \dontrun{
# impact <- CausalImpact(...)
#
# # Default plot:
# plot(impact)
#
# # Customized plot:
# impact.plot <- plot(impact) + ylab("Sales")
# plot(impact.plot)
# }
return(CreateImpactPlot2(x, ...))
}

adding custom ggplot legend to dashed lines and confidence bands

I'm having trouble setting a custom legend for confidence bands and dashed lines. This is my graph so far.
di<-matrix(ncol = 3,nrow = 5) %>% as.data.frame()
colnames(di)<-c('group','estimate','SE')
di<-di %>% mutate(group=1:5,
estimate=c(0.5,9.6,13,15,23.1),
SE=14)
ggplot(di, aes(x=group, y=estimate)) +
geom_point() +
geom_errorbar(width=.5, aes(ymin=estimate-(1.647*SE), ymax=estimate+(1.647*SE)), colour="black") +
xlab('Group') +
ylab('Treatment Effect') +
labs(title="GATE with confidence bands",
subtitle="Point estimates and confidence bands are derived using median of all splits") +
geom_hline(yintercept=c(7.83,22.55),
linetype="longdash",
col='darkred') +
geom_hline(yintercept=15.19,
linetype="longdash",
col='blue')
It looks like this:
However what I want it to look like is something like this, with the exact same legend:
Any advice on this?

This could be achieved like so:
As a general rule: If you want to have a legend you have to map something on aesthetics, e.g. move color=... into aes() for all four geoms
The desired color values can then be set via scale_color_manual
For the geom_hline we also have to pass yintercept as an aes() too. To this end these get something helper data frames with the desired values.
To fix the lines and shapes in the legend I make use of guide_legend's overide.aes to remove the undesired points in the legend as well as removing the line for the point. Additionally I set the number of rows for the legend to 2.
The labels and the order of the layers can be set via the labels and the breaks argument of scale_color_manual
Move the legend in the topleft and get rid of the background fill for the legend and the keys via theme options.
library(ggplot2)
di <- data.frame(
group = 1:5,
estimate = c(0.5, 9.6, 13, 15, 23.1),
SE = 14
)
labels <- c(point = "Point", error = "Error", blue = "Blue", darkred = "Red")
breaks <- c("blue", "darkred", "point", "error")
ggplot(di, aes(x = group, y = estimate)) +
geom_point(aes(color = "point"), size = 3) +
geom_errorbar(width = .5, aes(
ymin = estimate - (1.647 * SE),
ymax = estimate + (1.647 * SE),
color = "error"
)) +
scale_color_manual(values = c(
point = "black",
error = "black",
blue = "blue",
darkred = "darkred"
), labels = labels, breaks = breaks) +
labs(
title = "GATE with confidence bands",
subtitle = "Point estimates and confidence bands are derived using median of all splits",
x = "Group",
y = "Treatment Effect",
color = NULL, linetype = NULL, shape = NULL
) +
geom_hline(
data = data.frame(yintercept = c(7.83, 22.55)),
aes(yintercept = yintercept, color = "darkred"), linetype = "longdash"
) +
geom_hline(
data = data.frame(yintercept = 15.19),
aes(yintercept = yintercept, color = "blue"), linetype = "longdash"
) +
guides(color = guide_legend(override.aes = list(
shape = c(NA, NA, 16, NA),
linetype = c("longdash", "longdash", "blank", "solid")
), nrow = 2, byrow = TRUE)) +
theme(legend.position = c(0, 1),
legend.justification = c(0, 1),
legend.background = element_rect(fill = NA),
legend.key = element_rect(fill = NA))

GGplot with multiple/conditional scale_fill_manual for stat_density_ridges

Here's a question for ggplot experts...
My dataset has 432000 observations of 4 variables (one is numeric, the others are factors). Predictors has 6 levels, Estimate has 4 levels, Model has 2 levels. Value has a max of 2.6 and a min of -3. (I hope you can create data with that information.)
The plot set-up is a 4x6 faceted plot here's is a 2x3 example of the plot:
each row is a different level of a factor (Predictors)
each column a different level of another factor (Estimate)
there are two distributions within each mini-plot which is another factor (Model)
The goal is to plot:
the distributions in each column in a different color (blue, green, red, yellow) (according to Estimate)
within each mini-plot, the shade/hue of that color should be different (e.g., within the green column, repeat the order of colors according to Model)
fill the tails of two quantiles on each distribution of each mini-plot (as the tail lines in the picture indicate; color the tail from each line to the end of the tail in black/gray). The tails can be the same throughout the entire plot.
Here's an example of the code that I'm using. It doesn't plot the quantiles in a separate color:
pp <- ggplot(dd, aes(x=Value, y=as.factor(Model), fill=factor(Model))) +
stat_density_ridges(quantile_lines = TRUE, quantiles = c(0.05, 0.95), alpha = 0.95,vline_size = 0.5)+
scale_fill_manual(values = c("red", "white")) +
geom_vline(xintercept = 0, linetype="dashed", color = "black", size=0.5) +
facet_grid(Predictors~Estimate, scales = "free") + labs(x="Parameter value", y=" ") +
theme(text = element_text(size = 16)) + theme(axis.title=element_text(face="bold"), strip.text = element_text(
size = 16)) + theme(legend.position = "none")
To color the quantiles, you can swap fill=factor(Model) with fill=factor(..quantile..), but getting both "fills" in the same plot has been impossible thus far. Among many other things, I tried entering multiple factors into "fill", like this: fill=c(factor(Model), factor(Estimate), ..quantile..) , but it didn't work.
Any ideas?

I think from your description your data looks a bit like this (though I've limited it to 6000 rows):
set.seed(69)
Value <- rnorm(6000)
Predictors <- factor(rep(LETTERS[1:6], each = 1000))
Estimate <- factor(rep(rep(letters[1:4], each = 250), 6))
Model <- factor(rep(rep(c("Model1", "Model2"), each = 125), 24))
Value <- Value + rep(rnorm(6), each = 1000)
Value <- Value + rep(rep(rnorm(4), each = 250), 6)
Value <- Value + rep(rep(rnorm(2), each = 125), 24)
dd <- data.frame(Value, Predictors, Estimate, Model)
It sounds like most of what you want to do can be achieved by creating a new factor variable that is a conjunction of two other factors:
dd$fill_factor <- as.factor(paste0(Model, Estimate))
Which means that we should get close to the desired effect with minimal changes to your code:
library(ggplot2)
library(ggridges)
my_colors <- c("#0000FF", "#00FF00", "#FF0000", "#FFFF00")
ggplot(dd, aes(x = Value, y = Model, fill = fill_factor)) +
stat_density_ridges(quantile_lines = TRUE,
quantiles = c(0.05, 0.95),
alpha = 0.95,
vline_size = 0.5) +
scale_fill_manual(values = c(gsub("0", "6", my_colors),
gsub("F", "A", my_colors))) +
geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.5) +
facet_grid(Predictors ~ Estimate, scales = "free") +
labs(x = "Parameter value", y = " ") +
theme(text = element_text(size = 16),
axis.title = element_text(face = "bold"),
strip.text = element_text(size = 16),
legend.position = "none")

adding a label in geom_line in R

I have two very similar plots, which have two y-axis - a bar plot and a line plot:
code:
sec_plot <- ggplot(data, aes_string (x = year, group = 1)) +
geom_col(aes_string(y = frequency), fill = "orange", alpha = 0.5) +
geom_line(aes(y = severity))
However, there are no labels. I want to get a label for the barplot as well as a label for the line plot, something like:
How can I add the labels to the plot, if there is only pone single group? is there a way to specify this manually? Until know I have only found option where the labels can be added by specifying them in the aes
EXTENSION (added a posterior):
getSecPlot <- function(data, xvar, yvar, yvarsec, groupvar){
if ("agegroup" %in% xvar) xvar <- get("agegroup")
# data <- data[, startYear:= as.numeric(startYear)]
data <- data[!claims == 0][, ':=' (scaled = get(yvarsec) * max(get(yvar))/max(get(yvarsec)),
param = max(get(yvar))/max(get(yvarsec)))]
param <- data[1, param] # important, otherwise not found in ggplot
sec_plot <- ggplot(data, aes_string (x = xvar, group = groupvar)) +
geom_col(aes_string(y = yvar, fill = groupvar, alpha = 0.5), position = "dodge") +
geom_line(aes(y = scaled, color = gender)) +
scale_y_continuous(sec.axis = sec_axis(~./(param), name = paste0("average ", yvarsec),labels = function(x) format(x, big.mark = " ", scientific = FALSE))) +
labs(y = paste0("total ", yvar)) +
scale_alpha(guide = 'none') +
theme_pubclean() +
theme(legend.title=element_blank(), legend.background = element_rect(fill = "white"))
}
plot.ExposureYearly <- getSecPlot(freqSevDataAge, xvar = "agegroup", yvar = "exposure", yvarsec = "frequency", groupvar = "gender")
plot.ExposureYearly
How can the same be done on a plot where both the line plot as well as the bar plot are separated by gender?

Here is a possible solution. The method I used was to move the color and fill inside the aes and then use scale_*_identity to create and format the legends.
Also, I needed to add a scaling factor for severity axis since ggplot does not handle the secondary axis well.
data<-data.frame(year= 2000:2005, frequency=3:8, severity=as.integer(runif(6, 4000, 8000)))
library(ggplot2)
library(scales)
sec_plot <- ggplot(data, aes(x = year)) +
geom_col(aes(y = frequency, fill = "orange"), alpha = 0.6) +
geom_line(aes(y = severity/1000, color = "black")) +
scale_fill_identity(guide = "legend", label="Claim frequency (Number of paid claims per 100 Insured exposure)", name=NULL) +
scale_color_identity(guide = "legend", label="Claim Severity (Average insurance payment per claim)", name=NULL) +
theme(legend.position = "bottom") +
scale_y_continuous(sec.axis =sec_axis( ~ . *1, labels = label_dollar(scale=1000), name="Severity") ) + #formats the 2nd axis
guides(fill = guide_legend(order = 1), color = guide_legend(order = 2)) #control which scale plots first
sec_plot

ggplot2 legend with two different geom_point

I have the following ggplot graph with circles representing the observed data and the crosses the mean for each treatment :
d <- data.frame(Number = rnorm(12,100,20),
Treatment = rep(c("A","B","C", "D"), each = 3))
av <- aggregate(d["Number"], d["Treatment"], mean)
ggplot(data = d, aes(y = Number, x = Treatment)) +
geom_point(shape = 1, size = 6, color = "grey50") +
geom_point(data=av, shape = 4) +
theme_bw()
I would like to add a legend with the exact same symbols on top of the graphs but I'm a bit lost... I use aes to force the creation of legend and then try to modify it with manual scales but the result is not convincing. I would like to have one grey circle of size 6. That sounds also quite complicated for such a basic thing ... There is probably an easyier solution.
ggplot(data = d, aes(y = Number, x = Treatment)) +
geom_point(aes(shape = "1", size = "6", color = "grey50")) +
geom_point(data=av, aes(shape = "4")) +
theme_bw() +
scale_shape_manual(name = "", values = c(1,4), labels = c("observed values", "mean")) +
scale_size_manual(name = "", values = c(6,1), labels = c("observed values", "mean")) +
scale_color_manual(name = "", values = c("grey50","black"),
labels = c("observed values", "mean")) +
theme(legend.position = "top",
legend.key = element_rect(color = NA))
http://imagizer.imageshack.us/v2/320x240q90/842/4pgj.png

The ggplot2 way would be combining everything into a single data.frame like this:
av$Aggregated <- "mean"
d$Aggregated <- "observed value"
d <- rbind(d, av)
ggplot(data = d, aes(y = Number, x = Treatment,
shape=Aggregated, size=Aggregated, colour=Aggregated)) +
geom_point()
And than customize using manual scales and themes.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Adding a legend for vertical lines of histograms - r

Related

How to add legend to plot from CausalImpact package?

adding custom ggplot legend to dashed lines and confidence bands

GGplot with multiple/conditional scale_fill_manual for stat_density_ridges

adding a label in geom_line in R

ggplot2 legend with two different geom_point

Categories

Resources