I see the other post here about this, but I'm relatively new to R so the answers weren't helpful to me. I'd really appreciate some more in-depth help with how to do this.
I've already made a plot using the commands from the CausalImpact package. The package documentation says that the plots are ggplot2 objects and can be customized like any other ggplot2 object. I've done that successfully, adding titles and customizing colors. I now need to add a legend (the journal I'm submitting to requires one). Here is an example of what my graph currently looks like and the code I used to get there.
library(ggplot2)
devtools::install_github("google/CausalImpact")
library(CausalImpact)
## note that I took this example code from the package documentation up until I customize the plot
#create data
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x1 + rnorm(100)
y[71:100] <- y[71:100] + 10
data <- cbind(y, x1)
#causal impact analysis
pre.period <- c(1, 70)
post.period <- c(71, 100)
impact <- CausalImpact(data, pre.period, post.period)
#graph
example<-plot(impact, c("original", "cumulative")) +
labs(
x = "Time",
y = "Clicks (Millions)",
title = "Figure. Analysis of click behavior after intervention.") +
theme(plot.title = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0),
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank()) # get rid of minor grid
In my head, the solution I'd like is to have a legend for each panel of the plot. The first legend (next to the 'original' panel) would show that the solid line represents the observed data, the dotted line represents the estimated counterfactual, and the colored band represents the 95% CrI around the estimated counterfactual. The second legend (next to the 'cumulative' panel) would show that the dotted line represents the estimated change in trend associated with the intervention and that the colored band again represents the 95% CrI around the estimate. Maybe there's a better solution than that, but that's what I've thought of.
Here is a section of the underlying code that runs when you plot:
# Initialize plot
q <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q <- q + xlab("") + ylab("")
if (length(metrics) > 1) {
q <- q + facet_grid(metric ~ ., scales = "free_y")
}
# Add prediction intervals
q <- q + geom_ribbon(aes(ymin = lower, ymax = upper),
data, fill = "slategray2")
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
q <- q + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
# Add zero line to pointwise and cumulative plot
q <- q + geom_line(aes(y = baseline),
colour = "darkgrey", size = 0.8, linetype = "solid",
na.rm = TRUE)
# Add point predictions
q <- q + geom_line(aes(y = mean), data,
size = 0.6, colour = "darkblue", linetype = "dashed",
na.rm = TRUE)
# Add observed data
q <- q + geom_line(aes(y = response), size = 0.6, na.rm = TRUE)
return(q)
}
One of the answers in that older post here said that I'd have to adapt the pre-existing function to get a legend, and I don't really have the skills yet to see what I'd have to change or add. I thought that legends were supposed to be automatically added according to what's in the aes() bit of the ggplot code, so I'm a little confused why there isn't one in the first place. Can someone help me with this?
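On the question of why no legend shows up: the package code sets colour, fill and linetype as constants outside aes(), and ggplot2 only builds a legend for aesthetics that are mapped inside aes(). A minimal sketch of the difference, using made-up toy data (the data frame and the object names p_set and p_mapped are purely illustrative, not part of CausalImpact):

```r
library(ggplot2)

# Toy data (hypothetical, not produced by CausalImpact)
df <- data.frame(
  time   = rep(1:10, 2),
  value  = c(1:10, (1:10) * 1.5),
  series = rep(c("observed", "predicted"), each = 10)
)

# Constants set outside aes(): the lines are styled, but no scale is
# trained, so no legend is drawn
p_set <- ggplot(df, aes(time, value, group = series)) +
  geom_line(colour = "darkblue", linetype = "dashed")

# Mapping inside aes(): ggplot2 trains a linetype scale and draws a legend
p_mapped <- ggplot(df, aes(time, value, linetype = series)) +
  geom_line() +
  scale_linetype_manual(
    name   = "Legend",
    values = c(observed = "solid", predicted = "dashed")
  )
```

Printing p_set and p_mapped side by side should make the point: only the second one carries a legend, which is why the answers rework the plotting function so that the variables sit inside aes().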
Here is an updated/edited version of an earlier solution in order to merge aesthetics into one legend. The requirement was to merge linetype and fill (ribbon color) into one legend.
In order to merge legends, the same aesthetics have to be used in the geoms and the scales have to account for the different variables, have the same name and the same labels. So geom_ribbon() needs to have a linetype in the aes() as well as fill, and the geom_line() needs to have a fill in the aes() as well as the linetype. One side effect of adding a linetype to geom_ribbon() is that you then get a line around both edges of the band. On the other hand, fill is not applicable to geom_line so you just get a warning message that the fill aesthetic will be ignored.
The way to address this is to apply a linetype of "blank" to the relevant value in scale_linetype_manual(). Similarly, we use "transparent" in scale_fill_manual() to avoid applying a color to the other elements of the scale.
What I didn't realize before working through this is that it is possible to create a legend for an aesthetic for values across multiple variables. The values just have to be mapped appropriately in the scale. So I truly learned something new putting this together.
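Stripped of the CausalImpact specifics, the idea can be sketched on toy data. This is only an illustration under my own assumptions (the data frames band and lines_long, the key names and the labels are all made up), not the package's code:

```r
library(ggplot2)

set.seed(1)
# Toy series (hypothetical): an observed line, a prediction and a band
band <- data.frame(time = 1:20)
band$observed  <- cumsum(rnorm(20))
band$predicted <- band$observed + rnorm(20, sd = 0.3)
band$lower     <- band$predicted - 1
band$upper     <- band$predicted + 1

lines_long <- data.frame(
  time   = rep(band$time, 2),
  value  = c(band$observed, band$predicted),
  series = rep(c("observed", "predicted"), each = 20)
)

# One shared set of keys and labels for both scales
keys       <- c("observed", "predicted", "band")
key_labels <- c("observed", "estimated counterfactual", "95% interval")

p_merged <- ggplot() +
  geom_ribbon(data = band,
              aes(x = time, ymin = lower, ymax = upper,
                  fill = "band", linetype = "band")) +
  geom_line(data = lines_long,
            aes(x = time, y = value, linetype = series)) +
  # "blank" hides the line in the band's key
  scale_linetype_manual(name = "Legend", limits = keys, labels = key_labels,
                        values = c(observed = "solid", predicted = "dashed",
                                   band = "blank")) +
  # "transparent" hides the fill in the line keys
  scale_fill_manual(name = "Legend", limits = keys, labels = key_labels,
                    values = c(observed = "transparent",
                               predicted = "transparent",
                               band = "slategray2"))
```

Because both scales share the same name, limits and labels, ggplot2 should fold them into a single legend. If the two scales drift apart in any of those, expect two legends again.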
CreateImpactPlot <- function(impact, metrics = c("original", "cumulative")) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# impact: \code{CausalImpact} results object returned by
# \code{CausalImpact()}.
# metrics: Which metrics to include in the plot. Can be any combination of
# "original", "pointwise", and "cumulative".
#
# Returns:
# A ggplot2 object that can be plotted using plot().
# Create data frame of: time, response, mean, lower, upper, metric
data <- CreateDataFrameForPlot(impact)
# Select metrics to display (and their order)
assert_that(is.vector(metrics))
metrics <- match.arg(metrics, several.ok = TRUE)
data <- data[data$metric %in% metrics, , drop = FALSE]
data$metric <- factor(data$metric, metrics)
# Make data longer
data_long <- data %>%
tidyr::pivot_longer(cols = c("baseline", "mean", "response"), names_to = "variable",
values_to = "value", values_drop_na = TRUE)
# Initialize plot
q1 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q1 <- q1 + xlab("") + ylab("")
q3 <- ggplot(data %>%
filter(metric == "cumulative") %>%
mutate(metric = factor(metric, levels = c("cumulative"))), aes(x = time)) + theme_bw(base_size = 15)
q3 <- q3 + xlab("") + ylab("")
# Add prediction intervals
q1 <- q1 + geom_ribbon(data = data %>%
filter(metric == "original") %>%
mutate(metric = factor(metric, levels = c("original"))), aes(x = time, ymin = lower, ymax = upper, fill = metric,
linetype = metric))
q3 <- q3 + geom_ribbon(data = data %>%
filter(metric == "cumulative") %>%
mutate(metric = factor(metric, levels = c("cumulative"))), aes(x = time, ymin = lower, ymax = upper, fill = metric))
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
q1 <- q1 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
q3 <- q3 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
# Add zero line to cumulative plot
# Add point predictions
# Add observed data
q1 <- q1 + geom_line(data = data_long %>% dplyr::filter(metric == "original"),
aes(x = time, y = value, linetype = variable, group = variable,
size = variable, fill = variable, color = variable),
na.rm = TRUE)+
scale_linetype_manual(name = "Legend", labels = c("mean"= "estimated counterfactual", "response" = "observed", "original" = "95% CrI counterfactual"),
values = c("dashed", "solid", "blank"), limits = c("mean", "response","original")) +
scale_fill_manual(name = "Legend", labels = c("mean"= "estimated counterfactual", "response" = "observed", "original" = "95% CrI counterfactual"),
values = c("transparent", "transparent","slategray2"), limits = c("mean", "response","original")) + #limits controls the order in the legend
scale_size_manual(values = c(0.6, 0.8, 0.5)) +
scale_color_manual(values = c("darkgray", "darkblue")) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.title.y = element_blank()) +
guides(size = "none", color = "none")+
facet_wrap(~metric[1], strip.position = "right", drop = TRUE) #use facet_wrap to generate the strip
q3 <- q3 + geom_line(data = data_long %>% dplyr::filter(metric == "cumulative"),
aes(x = time, y = value, linetype = variable, group = variable,
fill = variable),
na.rm = TRUE) +
scale_linetype_manual(name = "Legend", labels = c("mean"= "estimated trend change", "baseline" = "observed", "cumulative" = "95% CrI estimation"),
values = c("dashed", "solid", "blank"), limits = c("mean", "baseline","cumulative")) +
scale_fill_manual(name = "Legend", labels = c("mean"= "estimated trend change", "baseline" = "observed", "cumulative" = "95% CrI estimation"),
values = c("transparent", "transparent","slategray2"), limits = c("mean", "baseline","cumulative")) + #limits controls the order in the legend
theme(legend.position = "right", axis.title.y = element_blank())+
labs(x = "Time") +
facet_wrap(~metric, strip.position = "right", drop = TRUE) #use facet_wrap to generate the strip
g1 <- grid::textGrob("Clicks (Millions)", rot = 90, gp = grid::gpar(fontsize = 15), x = 0.85)
# Place the shared y-axis label to the left of the stacked panels
q <- wrap_elements(g1) | (q1 / q3)
return(q)
}
# To run the function
plot(impact, c("original", "cumulative")) +
plot_annotation(title = "Figure. Analysis of click behavior after intervention"
, theme = theme(plot.title = element_text(hjust = 0.5))) &
theme(
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank())
I rewrote the plot function. Instead of using facet_wrap(), I created individual plots with their own legends and used patchwork to group them into a single plot. To run this, you need to load all of the package source code into memory, including impact_analysis.R, impact_misc.R, impact_model.R, impact_inference.R and impact_plot.R, with the exception of the CreateImpactPlot function, which I recreated; run the version below instead. You will also need to load ggplot2, tidyr, dplyr, and patchwork. This will only run for the original and cumulative metrics. I revised the code to some extent for pointwise, but I left it commented out because I didn't have an example to reproduce. I worked your theme preferences directly into the function, so you should be able to identify and change those elements at your leisure. To be clear, the plots are q1 = original, q2 = pointwise, and q3 = cumulative. I don't see how to bring the confidence band into the legend, as it is not part of aes(). It might be possible to create a grob from scratch. For now I just referenced it in the title, which you can change if it doesn't suit you. Hopefully this helps.
CreateImpactPlot <- function(impact, metrics = c("original", "pointwise", "cumulative")) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# impact: \code{CausalImpact} results object returned by
# \code{CausalImpact()}.
# metrics: Which metrics to include in the plot. Can be any combination of
# "original", "pointwise", and "cumulative".
#
# Returns:
# A ggplot2 object that can be plotted using plot().
# Create data frame of: time, response, mean, lower, upper, metric
data <- CreateDataFrameForPlot(impact)
# Select metrics to display (and their order)
assert_that(is.vector(metrics))
metrics <- match.arg(metrics, several.ok = TRUE)
data <- data[data$metric %in% metrics, , drop = FALSE]
data$metric <- factor(data$metric, metrics)
# Initialize plot
#q <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
#q <- q + xlab("") + ylab("")
#if (length(metrics) > 1) {
# q <- q + facet_grid(metric ~ ., scales = "free_y")
#}
q1 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q1 <- q1 + xlab("") + ylab("")
q2 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q2 <- q2 + xlab("") + ylab("")
q3 <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q3 <- q3 + xlab("") + ylab("")
# Add prediction intervals
#q <- q + geom_ribbon(aes(ymin = lower, ymax = upper),
# data, fill = "slategray2")
q1 <- q1 + geom_ribbon(data = data %>% dplyr::filter(metric == "original"), aes(x = time, ymin = lower, ymax = upper),
fill = "slategray2")
q2 <- q2 + geom_ribbon(data = data %>% dplyr::filter(metric == "pointwise"), aes(x = time, ymin = lower, ymax = upper),
fill = "slategray2")
q3 <- q3 + geom_ribbon(data = data %>% dplyr::filter(metric == "cumulative"), aes(x = time, ymin = lower, ymax = upper),
fill = "slategray2")
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
#q <- q + geom_vline(xintercept = xintercept,
# colour = "darkgrey", size = 0.8, linetype = "dashed")
q1 <- q1 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dotted")
q2 <- q2 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dotted")
q3 <- q3 + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dotted")
data_long <- data %>%
tidyr::pivot_longer(cols = c("baseline", "mean", "response"), names_to = "variable",
values_to = "value", values_drop_na = TRUE)
# Add zero line to pointwise and cumulative plot
#q <- q + geom_line(aes(y = baseline),
# colour = "darkgrey", size = 0.8, linetype = "solid",
# na.rm = TRUE)
q1 <- q1 + geom_line(data = data_long %>% dplyr::filter(metric == "original"),
aes(x = time, y = value, linetype = variable, group = variable,
size = variable),
na.rm = TRUE)+
scale_linetype_manual(name = "Legend", labels = c("estimated counterfactual", "observed"),
values = c("dashed", "solid")) +
scale_size_manual(values = c(0.6, 0.8)) +
scale_color_manual(values = c("darkblue", "darkgrey")) +
theme(legend.position = "right") +
guides(linetype = guide_legend("Legend", nrow=2), size = "none", color = "none")+
labs(title = "Original", y = "Clicks (Millions)") +
theme(
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank())
#q2 <- q2 + geom_line(data = data_long %>% dplyr::filter(metric == "pointwise"),
# aes(x = time, y = value, linetype = Line, group = Line),
# na.rm = TRUE) +
# scale_linetype_manual(title = "Legend", labels = c("estimated counterfactual", "observed"),
# values = c("dashed", "solid")) +
# scale_size_manual(values = c(0.6, 0.8)) +
# scale_color_manual(values = c("darkblue", "darkgrey")) +
# theme(legend.position = "right") +
# guides(linetype = guide_legend("Legend", nrow=2), size = "none", color = "none")+
# labs(title = "Pointwise", y = "Clicks (Millions)")
q3 <- q3 + geom_line(data = data_long %>% dplyr::filter(metric == "cumulative"),
aes(x = time, y = value, linetype = variable, group = variable),
na.rm = TRUE) +
scale_linetype_manual(labels = c("observed", "estimated trend change"),
values = c("solid", "dashed")) +
theme(legend.position = "right")+
guides(linetype = guide_legend("Legend", nrow=2))+
labs(title = "Cumulative",x = "Time", y = "Clicks (Millions)")+
theme(
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank())
patchwork <- q1 / q3
q <- patchwork + plot_annotation(title = "Figure. Analysis of click behavior after intervention with
95% Confidence Interval")
# Add point predictions
#q <- q + geom_line(aes(y = mean), data,
# size = 0.6, colour = "darkblue", linetype = "dashed",
# na.rm = TRUE)
# Add observed data
#q <- q + geom_line(aes(y = response), size = 0.6, na.rm = TRUE)
return(q)
}
plot(impact, c("original", "cumulative"))
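The patchwork part of the approach above can be tried in isolation. Here is a rough sketch on made-up data (the data frame and the names top, bottom and combined are mine, not from the package): each panel is an ordinary ggplot with its own legend, and patchwork's / operator stacks them.

```r
library(ggplot2)
library(patchwork)

set.seed(1)
# Toy data (hypothetical): an observed series, a prediction, a running sum
df <- data.frame(time = 1:30, observed = cumsum(rnorm(30)))
df$predicted  <- df$observed - 0.5
df$cum_effect <- cumsum(df$observed - df$predicted)

# Top panel with its own legend
top <- ggplot(df, aes(time)) +
  geom_line(aes(y = observed, linetype = "observed")) +
  geom_line(aes(y = predicted, linetype = "estimated counterfactual")) +
  scale_linetype_manual(
    name   = "Legend",
    values = c("observed" = "solid", "estimated counterfactual" = "dashed")
  ) +
  labs(title = "Original", x = NULL, y = NULL)

# Bottom panel with a different legend
bottom <- ggplot(df, aes(time)) +
  geom_line(aes(y = cum_effect, linetype = "estimated trend change")) +
  scale_linetype_manual(
    name   = "Legend",
    values = c("estimated trend change" = "dashed")
  ) +
  labs(title = "Cumulative", x = "Time", y = NULL)

# Stack the panels; each keeps its own legend on the right
combined <- top / bottom +
  plot_annotation(title = "Toy example with one legend per panel")
```

Since the panels are separate plots rather than facets, each one can carry whatever scales and labels it needs, at the cost of having to keep shared styling in sync by hand.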
Here is a rebuild of the CreateImpactPlot() function that will work for all three metrics. The legends can be modified. I introduced more colors and linetypes so that the legends could be applicable across all the facets.
The base case looks like this:
plot(impact)
You will note that the labels in the legend for the ribbons and for the lines refer to the metrics. These are placeholder labels that you can then modify.
line_labels <- c("cumulative_mean" = "change in trend", "baseline" = "baseline", "original_mean" =
"estimated counterfactual", "original_response" = "observed")
plot(impact, c("original", "cumulative")) +
labs(
x = "Time",
y = "Clicks (Millions)",
title = "Figure. Analysis of click behavior after intervention.") +
theme(plot.title = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0),
panel.background = element_rect(fill = "transparent"), # panel bg
plot.background = element_rect(fill = "transparent", color = NA), # plot bg
panel.grid.major = element_blank(), # get rid of major grid
panel.grid.minor = element_blank()) + # get rid of minor grid
scale_fill_manual(name = "95% CrI", values = c("original" = "slategray2", "cumulative" = "darkseagreen"),
labels = c("original" = "counterfactual", "cumulative" = "estimation")) +
scale_linetype_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "dotted", "baseline" = "solid", "original_mean" =
"dotted", "original_response" = "solid")) +
scale_color_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "red", "baseline" = "darkgrey", "original_mean"= "darkblue", "original_response" = "goldenrod"))
The vector "line_labels" is where you define the text you want to appear in the legend. Note that I removed the pointwise-related values, as I am excluding the pointwise metric from the plot. scale_linetype_manual and scale_color_manual must have the same name and labels in order to produce a combined legend; otherwise you will get two separate legends. scale_fill_manual is for the ribbons. For these scales, you can change the names, the labels and the values as you desire. You can copy the code out of the function, revise it, and add it to the plot call as shown above.
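To see the effect of keeping name and labels matched, here is a small sketch on toy data (the data frame and the names base, p_merged and p_split are illustrative assumptions, not taken from the function):

```r
library(ggplot2)

# Toy data (hypothetical) with two series keyed like the function's variable2
df <- data.frame(
  time   = rep(1:10, 2),
  value  = c((1:10)^1.2, 1:10),
  series = rep(c("original_mean", "original_response"), each = 10)
)

line_labels <- c("original_mean"     = "estimated counterfactual",
                 "original_response" = "observed")
line_colours <- c("original_mean" = "darkblue",
                  "original_response" = "goldenrod")
line_types   <- c("original_mean" = "dotted",
                  "original_response" = "solid")

base <- ggplot(df, aes(time, value, colour = series, linetype = series)) +
  geom_line()

# Same name and same labels on both scales: one combined legend
p_merged <- base +
  scale_colour_manual(name = "Legend", labels = line_labels,
                      values = line_colours) +
  scale_linetype_manual(name = "Legend", labels = line_labels,
                        values = line_types)

# Different names: ggplot2 draws one legend per scale
p_split <- base +
  scale_colour_manual(name = "Colour", labels = line_labels,
                      values = line_colours) +
  scale_linetype_manual(name = "Line type", labels = line_labels,
                        values = line_types)
```

Comparing the two printed plots should show one legend for p_merged and two for p_split, which is the behaviour to watch for when editing the scales in the plot call.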
Here is the code for the revised function. In the example, everything should be run and "impact" generated from the CausalImpact package. Then all of the package code needs to be loaded into memory including impact_analysis.R, impact_misc.R, impact_model.R, impact_inference.R and impact_plot.R. Then load the code below.
CreateImpactPlot2 <- function(impact, metrics = c("original", "pointwise","cumulative")) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# impact: \code{CausalImpact} results object returned by
# \code{CausalImpact()}.
# metrics: Which metrics to include in the plot. Can be any combination of
# "original", "pointwise", and "cumulative".
#
# Returns:
# A ggplot2 object that can be plotted using plot().
# Create data frame of: time, response, mean, lower, upper, metric
data <- CreateDataFrameForPlot(impact)
# Select metrics to display (and their order)
assert_that(is.vector(metrics))
metrics <- match.arg(metrics, several.ok = TRUE)
data <- data[data$metric %in% metrics, , drop = FALSE]
data$metric <- factor(data$metric, metrics)
data_long <- data %>%
tidyr::pivot_longer(cols = c("baseline", "mean", "response"), names_to = "variable",
values_to = "value", values_drop_na = TRUE) %>%
mutate(variable2 = factor(ifelse(variable == "baseline", variable, paste0(metric,"_", variable))),
variable = factor(variable))
# Initialize plot
q <- ggplot(data, aes(x = time)) + theme_bw(base_size = 15)
q <- q + xlab("") + ylab("")
if (length(metrics) > 1) {
q <- q + facet_grid(metric ~ ., scales = "free_y")
}
#Add prediction intervals
q <- q + geom_ribbon(aes(x = time, ymin = lower, ymax = upper, fill = metric), data_long)
# Add pre-period markers
xintercept <- CreatePeriodMarkers(impact$model$pre.period,
impact$model$post.period,
time(impact$series))
q <- q + geom_vline(xintercept = xintercept,
colour = "darkgrey", size = 0.8, linetype = "dashed")
# Add zero line to pointwise and cumulative plot
q <- q + geom_line(data = data_long %>% dplyr::filter(variable == "baseline"),
aes(x = time, y = value, linetype = variable2, group = variable2, size = variable2, color = variable2),
na.rm = TRUE)
# Add point predictions
q <- q + geom_line(data = data_long %>% dplyr::filter(variable == "mean"),
aes(x = time, y = value, linetype = variable2, group = variable2, size = variable2, color = variable2),
na.rm = TRUE)
# Add observed data
q <- q + geom_line(data = data_long %>% dplyr::filter(variable == "response"),
aes(x = time, y = value, linetype = variable2, group = variable2, size = variable2, color = variable2),
na.rm = TRUE)
#Add scales
line_labels <- c("cumulative_mean" = "cumulative_mean", "baseline" = "baseline", "original_mean" =
"original_mean", "original_response" = "original_response", "pointwise_mean"=
"pointwise_mean")
q <- q + scale_linetype_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "dotted", "baseline" = "solid", "original_mean" =
"dotted", "original_response" = "solid", "pointwise_mean"=
"solid")) +
scale_size_manual(values = c("cumulative_mean" = 0.6, "baseline" = 0.8, "original_mean"= 0.6, "original_response" = 0.5,
"pointwise_mean"= 0.6)) +
scale_color_manual(name = "Legend", labels = line_labels,
values = c("cumulative_mean" = "red", "baseline" = "darkgrey", "original_mean"= "darkblue", "original_response" = "goldenrod",
"pointwise_mean"= "darkgreen")) +
scale_fill_manual(name = "95% CrI", values = c("original" = "slategray2", "pointwise" = "pink3", "cumulative" = "darkseagreen"),
labels = c("original" = "original", "pointwise" = "pointwise", "cumulative" = "cumulative")) +
guides(size = "none")
return(q)
}
plot.CausalImpact <- function(x, ...) {
# Creates a plot of observed data and counterfactual predictions.
#
# Args:
# x: A \code{CausalImpact} results object, as returned by
# \code{CausalImpact()}.
# ...: Can be used to specify \code{metrics}, which determines which panels
# to include in the plot. The argument \code{metrics} can be any
# combination of "original", "pointwise", "cumulative". Partial matches
# are allowed.
#
# Returns:
# A ggplot2 object that can be plotted using plot().
#
# Examples:
# \dontrun{
# impact <- CausalImpact(...)
#
# # Default plot:
# plot(impact)
#
# # Customized plot:
# impact.plot <- plot(impact) + ylab("Sales")
# plot(impact.plot)
# }
return(CreateImpactPlot2(x, ...))
}
With ggplot2, I can create a violin plot with overlapping points, and paired points can be connected using geom_line().
library(datasets)
library(ggplot2)
library(dplyr)
iris_edit <- iris %>% group_by(Species) %>%
mutate(paired = seq(1:length(Species))) %>%
filter(Species %in% c("setosa","versicolor"))
ggplot(data = iris_edit,
mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violin() +
geom_line(mapping = aes(group = paired),
position = position_dodge(0.1),
alpha = 0.3) +
geom_point(mapping = aes(fill = Species, group = paired),
size = 1.5, shape = 21,
position = position_dodge(0.1)) +
theme_classic() +
theme(legend.position = "none",
axis.text.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.title.x = element_blank(),
axis.text.y = element_text(size = 10))
The see package includes the geom_violindot() function to plot a halved violin plot alongside its constituent points. I've found this function helpful when plotting a large number of points so that the violin is not obscured.
library(see)
ggplot(data = iris_edit,
mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violindot(dots_size = 0.8,
position_dots = position_dodge(0.1)) +
theme_classic() +
theme(legend.position = "none",
axis.text.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.title.x = element_blank(),
axis.text.y = element_text(size = 10))
Now, I would like to add geom_line() to geom_violindot() in order to connect paired points, as in the first image. Ideally, I would like the points to be inside and the violins to be outside so that the lines do not intersect the violins. geom_violindot() includes the flip argument, which takes a numeric vector specifying the geoms to be flipped.
ggplot(data = iris_edit,
mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violindot(dots_size = 0.8,
position_dots = position_dodge(0.1),
flip = c(1)) +
geom_line(mapping = aes(group = paired),
alpha = 0.3,
position = position_dodge(0.1)) +
theme_classic() +
theme(legend.position = "none",
axis.text.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.title.x = element_blank(),
axis.text.y = element_text(size = 10))
As you can see, invoking flip inverts the violin half, but not the corresponding points. The see documentation does not seem to address this.
Questions
How can you create a geom_violindot() plot with paired points, such that the points and the lines connecting them are "sandwiched" in between the violin halves? I suspect there is a solution that uses David Robinson's GeomFlatViolin function, though I haven't been able to figure it out.
In the last figure, note that the lines are askew relative to the points they connect. What position adjustment function should be supplied to the position_dots and position arguments so that the points and lines are properly aligned?
Not sure about using geom_violindot from the see package, but you could combine geom_half_violin and geom_half_dotplot from the gghalves package, subsetting the data to specify the orientation:
library(gghalves)
ggplot(data = iris_edit[iris_edit$Species == "setosa",],
mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_half_violin(side = "l") +
geom_half_dotplot(stackdir = "up") +
geom_half_violin(data = iris_edit[iris_edit$Species == "versicolor",],
aes(x = Species, y = Sepal.Length, fill = Species), side = "r")+
geom_half_dotplot(data = iris_edit[iris_edit$Species == "versicolor",],
aes(x = Species, y = Sepal.Length, fill = Species),stackdir = "down") +
geom_line(data = iris_edit, mapping = aes(group = paired),
alpha = 0.3)
As a note, the lines in the pairing won't align properly, because the dotplot bins each observation and then stacks the dots outward along the row. The paired lines only follow the x-value defined in aes(), not where each dot sits within its row.
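One possible way around the binning, sketched here with plain ggplot2 only, is to skip the dotplot and compute the x positions of the points yourself, so that points and lines share exactly the same coordinates. The column x_pt and the 0.1 offset are my own assumptions for illustration:

```r
library(ggplot2)
library(dplyr)

iris_pairs <- iris %>%
  group_by(Species) %>%
  mutate(paired = seq_along(Species)) %>%
  ungroup() %>%
  filter(Species %in% c("setosa", "versicolor")) %>%
  mutate(
    Species = droplevels(Species),
    # Hypothetical helper column: move each group 0.1 towards the centre
    x_pt = as.numeric(Species) + ifelse(Species == "setosa", 0.1, -0.1)
  )

# Points and lines both use x_pt, so the line ends sit on the points
p_pairs <- ggplot(iris_pairs, aes(x = x_pt, y = Sepal.Length)) +
  geom_line(aes(group = paired), alpha = 0.3) +
  geom_point(aes(fill = Species), shape = 21, size = 1.5) +
  scale_x_continuous(breaks = 1:2,
                     labels = levels(iris_pairs$Species),
                     limits = c(0.5, 2.5)) +
  labs(x = NULL) +
  theme_classic() +
  theme(legend.position = "none")
```

The half violins would then need to be layered at x = 1 and x = 2, facing outward. I have not tested that combination with gghalves, so treat this as a starting point rather than a finished answer.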
As per comment - this is not a direct answer to your question, but I believe that you might not get the most convincing visualisation when using the "slope graph" optic. This becomes quickly convoluted (so many dots/ lines overlapping) and the message gets lost.
To show change between paired observations (treatment 1 versus treatment 2), you can also (and I think: better) use a scatter plot. You can show each observation and the change becomes immediately clear. To make it more intuitive, you can add a line of equality.
I don't think you need to show the estimated distribution (left plot), but if you want to show this, you could make use of a two-dimensional density estimation, with geom_density2d (right plot)
library(tidyverse)
## patchwork only for demo purpose
library(patchwork)
iris_edit <- iris %>% group_by(Species) %>%
## use seq_along instead
mutate(paired = seq_along(Species)) %>%
filter(Species %in% c("setosa","versicolor")) %>%
## some more modifications
select(paired, Species, Sepal.Length) %>%
pivot_wider(names_from = Species, values_from = Sepal.Length)
lims <- c(0, 10)
p1 <-
ggplot(data = iris_edit, aes(setosa, versicolor)) +
geom_abline(intercept = 0, slope = 1, lty = 2) +
geom_point(alpha = .7, stroke = 0, size = 2) +
cowplot::theme_minimal_grid() +
coord_equal(xlim = lims, ylim = lims) +
labs(x = "Treatment 1", y = "Treatment 2")
p2 <-
ggplot(data = iris_edit, aes(setosa, versicolor)) +
geom_abline(intercept = 0, slope = 1, lty = 2) +
geom_density2d(color = "Grey") +
geom_point(alpha = .7, stroke = 0, size = 2) +
cowplot::theme_minimal_grid() +
coord_equal(xlim = lims, ylim = lims) +
labs(x = "Treatment 1", y = "Treatment 2")
p1+ p2
Created on 2021-12-18 by the reprex package (v2.0.1)
I'm trying to add a set of markers with text above the top of a faceted chart to indicate certain points of interest in the value of x. It's important that they appear in the right position left to right (as per the main scale), including when the overall ggplot changes size.
Something like this...
However, I'm struggling to:
place it in the right vertical position (above the facets). In my reprex below (a simplified version of the original), I tried using a value of the factor (Merc450 SLC), but this causes issues such as adding it to every facet, including facets it is not part of, and it doesn't actually go high enough. I also tried converting the factor to a number using as.integer, but this causes every facet to include all factor values, when they obviously shouldn't
apply it to the chart as a whole, not each facet
Note that in the full solution, the marker x values are independent of the main data.
I have tried using cowplot to draw it separately and overlay it, but that seems to:
affect the overall scale of the main plot, with the facet titles on the right being cropped
is not reliable in placing the markers at the exact location along the x scale
Any pointers welcome.
library(tidyverse)
mtcars2 <- rownames_to_column(mtcars, var = "car") %>%
mutate(make = stringr::word(car, 1)) %>%
filter(make >= "m" & make < "n")
markers <- data.frame(x = c(max(mtcars2$mpg), rep(runif(nrow(mtcars2), 1, max(mtcars2$mpg))), max(mtcars2$mpg))) %>%
mutate(name = paste0("marker # ", round(x)))
ggplot(mtcars2, aes()) +
# Main Plot
geom_tile(aes(x = mpg, y = car, fill = cyl), color = "white") +
# Add Markers
geom_point(data = markers, aes(x = x, y = "Merc450 SLC"), color = "red") +
# Marker Labels
geom_text(data = markers, aes(x = x, "Merc450 SLC",label = name), angle = 45, size = 2.5, hjust=0, nudge_x = -0.02, nudge_y = 0.15) +
facet_grid(make ~ ., scales = "free", space = "free") +
theme_minimal() +
theme(
# Facets
strip.background = element_rect(fill="Gray90", color = "white"),
panel.background = element_rect(fill="Gray95", color = "white"),
panel.spacing.y = unit(.7, "lines"),
plot.margin = margin(50, 20, 20, 20)
)
Perhaps draw two separate plots and assemble them together with patchwork:
library(patchwork)
p1 <- ggplot(markers, aes(x = x, y = 0)) +
geom_point(color = 'red') +
geom_text(aes(label = name),
angle = 45, size = 2.5, hjust=0, nudge_x = -0.02, nudge_y = 0.02) +
scale_y_continuous(limits = c(-0.01, 0.15), expand = c(0, 0)) +
theme_minimal() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank())
p2 <- ggplot(mtcars2, aes(x = mpg, y = car, fill = cyl)) +
geom_tile(color = "white") +
facet_grid(make ~ ., scales = "free", space = "free") +
theme_minimal() +
theme(
strip.background = element_rect(fill="Gray90", color = "white"),
panel.background = element_rect(fill="Gray95", color = "white"),
panel.spacing.y = unit(.7, "lines")
)
p1/p2 + plot_layout(heights = c(1, 9))
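One thing to watch with this approach: patchwork lines up the panel areas, but it does not make the two x scales agree, so the markers only land at the right x position if both plots cover the same x range. A sketch of one way to handle that, using small stand-in data (cars, marks and x_lim are hypothetical names, and the marker positions are invented):

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-ins for the question's mtcars2 and markers
cars <- data.frame(car = rownames(mtcars),
                   mpg = mtcars$mpg,
                   cyl = mtcars$cyl)[1:8, ]
marks <- data.frame(x = c(12, 18.5, 24))
marks$name <- paste0("marker # ", marks$x)

# One x range covering both data sets, with a little padding
x_lim <- range(c(cars$mpg, marks$x)) + c(-1, 1)

p_top <- ggplot(marks, aes(x, 0)) +
  geom_point(colour = "red") +
  geom_text(aes(label = name), angle = 45, hjust = 0,
            size = 2.5, nudge_y = 0.02) +
  # Same xlim as the main plot; clip = "off" lets labels overhang
  coord_cartesian(xlim = x_lim, ylim = c(-0.01, 0.15), clip = "off") +
  theme_void()

p_main <- ggplot(cars, aes(mpg, car, fill = cyl)) +
  geom_tile(colour = "white") +
  coord_cartesian(xlim = x_lim) +
  theme_minimal()

p_both <- p_top / p_main + plot_layout(heights = c(1, 9))
```

Both plots pass the same xlim to coord_cartesian() and keep the default expansion, so a marker at x = 18.5 should sit directly above 18.5 on the main axis however the figure is resized.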
This takes a small workaround: draw the markers and the main chart as separate plots, then use cowplot's alignment functions to align them on the same axis. Here is a solution:
library(tidyverse)
library(cowplot)
# define a common x-axis range to ensure that the plots are on the same scale
# This may not be needed, as cowplot's align_plots also adjusts the scale; however,
# I tend to do this extra step to be sure.
x_axis_common <- c(min(mtcars2$mpg, markers$x) * .8,
max(mtcars2$mpg, markers$x) * 1.1)
# Plot containing only the markers
plot_marker <- ggplot() +
geom_point(data = markers, aes(x = x, y = 0), color = "red") +
# Marker Labels
geom_text(data = markers, aes(x = x, y = 0,label = name),
angle = 45, size = 2.5, hjust=0, nudge_x = 0, nudge_y = 0.001) +
# using coord_cartesian to set the zone of plot for some scales
coord_cartesian(xlim = x_axis_common,
ylim = c(-0.005, 0.03), expand = FALSE) +
# using theme_nothing from cowplot, which removes all elements
# except the drawing
theme_nothing()
# main plot with facet
main_plot <- ggplot(mtcars2, aes()) +
# Main Plot
geom_tile(aes(x = mpg, y = car, fill = cyl), color = "white") +
coord_cartesian(xlim = x_axis_common, expand = FALSE) +
# Add Markers
facet_grid(make ~ ., scales = "free_y", space = "free") +
theme_minimal() +
theme(
# Facets
strip.background = element_rect(fill="Gray90", color = "white"),
panel.background = element_rect(fill="Gray95", color = "white"),
panel.spacing.y = unit(.7, "lines"),
plot.margin = margin(0, 20, 20, 20)
)
Then align the plots and draw them together using cowplot:
# align the plots together
temp <- align_plots(plot_marker, main_plot, axis = "rl",
align = "hv")
# plot them with plot_grid also from cowplot - using rel_heights for some
# adjustment
plot_grid(temp[[1]], temp[[2]], ncol = 1, rel_heights = c(1, 8))
Created on 2021-05-03 by the reprex package (v2.0.0)
I am making a Manhattan plot for 2 phenotypes, so I am melting the GWAS and GTEX columns of my data frame, which looks like this:
pos.end GWAS GTEX
1 16975756 0.71848040 2.82508e-05
2 16995937 0.02349431 4.54958e-11
3 17001098 0.04310933 1.93264e-20
4 17001135 0.04354486 8.52552e-21
5 17002964 0.02352996 1.84111e-15
6 17005677 0.01046168 2.09734e-11
...
The problem is that the GTEX values are much smaller than the GWAS values, so I need two y axes to represent them.
I am supposed to use something like this:
scale_y_continuous(sec.axis = sec_axis...
but I am unsure how to implement that in my case.
right now this is my code:
library(dplyr)
library(ggplot2)
library(tibble)
library(ggrepel)
snpsOfInterest = c("17091307")
tmp = read.table("nerve_both_manh", header=T)
tmp.tidy <- tmp %>%
tidyr::gather(key, value, -pos.end) %>%
mutate(is_highlight = ifelse(pos.end %in% snpsOfInterest, "yes", "no")) %>%
mutate(is_annotate = ifelse(-log10(value) > 5, "yes", "no"))
ggplot(tmp.tidy, aes(pos.end, -log10(value), color = key)) +
geom_point(data = subset(tmp.tidy, is_highlight == "yes"),
color = "purple", size = 2)+
geom_label_repel(data = subset(tmp.tidy, is_annotate == "yes"),
aes(label = pos.end), size = 2)
I need two y axes, one for GWAS and another for GTEX, because the GTEX values are much smaller than the GWAS values.
Here is what the plot from the code above looks like:
![two muppets][1]
UPDATE
I tried to use facet_zoom() from the ggforce library, but the result is still not good. How do I get just the zoomed GWAS values?
ggplot(tmp.tidy, aes(pos.end, -log10(value), color=key)) +
facet_zoom(xy = key == "GWAS")+
geom_point(data=subset(tmp.tidy, is_highlight=="yes"), color="purple", size=2)+
geom_label_repel( data=subset(tmp.tidy, is_annotate=="yes"), aes(label=pos.end), size=2)
![one muppet][1]
UPDATE
Per the suggestion below, I did:
ggplot(tmp.tidy) +
geom_count(aes(pos.end, -log10(value), color = key)) +
facet_wrap(~key, scales = "free") +
guides(size = FALSE) +
theme(
panel.background = element_rect(fill = "white", color = "grey90"),
panel.spacing = unit(2, "lines")
)
But I don't know how to integrate in this these two lines:
geom_point(data=subset(tmp.tidy, is_highlight=="yes"), color="purple", size=2)+
geom_label_repel( data=subset(tmp.tidy, is_annotate=="yes"), aes(label=pos.end), size=2)+
If I use it with the above code I am getting this error:
Error: geom_point requires the following missing aesthetics: x, y
I tried doing it like this but nothing happens:
ggplot(tmp.tidy) +
geom_count(aes(pos.end, -log10(value), color = key)) +
facet_wrap(~key, scales = "free") +
guides(size = FALSE) +
geom_point(data = subset(tmp.tidy, is_highlight == "yes"), aes(x = pos.end, y = -log10(value)),color = "purple", size = 2) +
geom_label_repel(data = subset(tmp.tidy, is_annotate == "yes"), aes(aes(x = pos.end, y = -log10(value), label = pos.end), size = 2)
theme(
panel.background = element_rect(fill = "white", color = "grey90"),
panel.spacing = unit(2, "lines")
)
I tried to approximate your problem with the diamonds dataset. Could you add an identifier in your data and then use facet_wrap() on it?
library(dplyr)
library(ggplot2)
df <-
diamonds %>%
slice(1:2000) %>%
filter(price < 400 | price > 3000) %>%
mutate(type = ifelse(price < 500 & row_number() < 100, "GWAS", "GTEX"))
ggplot(df) +
geom_count(aes(table, price, color = type)) +
facet_wrap(~type, scales = "free") +
guides(size = "none") +
theme(
panel.background = element_rect(fill = "white", color = "grey90"),
panel.spacing = unit(2, "lines")
)
To update your code per the conversation below you would use
geom_point(
data = subset(tmp.tidy, is_highlight == "yes"),
aes(x = pos.end, y = -log10(value)),
color = "purple", size = 2
) +
geom_label_repel(
data = subset(tmp.tidy, is_annotate == "yes"),
aes(x = pos.end, y = -log10(value), label = pos.end),
size = 2
)
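If you would rather go the secondary-axis route mentioned in the question, here is a minimal sketch of how `sec_axis()` is usually wired up. It uses only the six rows shown in the question. The scale factor `k` and the axis titles are my own choices, not something from the original code. The idea is to divide one series by `k` so that it fits the primary axis, then let the secondary axis multiply by `k` to label it in its own units.

```r
library(ggplot2)

# The six rows shown in the question
tmp <- data.frame(
  pos.end = c(16975756, 16995937, 17001098, 17001135, 17002964, 17005677),
  GWAS = c(0.71848040, 0.02349431, 0.04310933, 0.04354486, 0.02352996, 0.01046168),
  GTEX = c(2.82508e-05, 4.54958e-11, 1.93264e-20, 8.52552e-21, 1.84111e-15, 2.09734e-11)
)

gwas <- -log10(tmp$GWAS)
gtex <- -log10(tmp$GTEX)

# Assumed scale factor: maps the largest GTEX value onto the largest GWAS value
k <- max(gtex) / max(gwas)

p <- ggplot(tmp, aes(x = pos.end)) +
  # GWAS is drawn in its own units on the primary (left) axis
  geom_point(aes(y = -log10(GWAS), color = "GWAS")) +
  # GTEX is divided by k so it shares the primary axis range
  geom_point(aes(y = -log10(GTEX) / k, color = "GTEX")) +
  scale_y_continuous(
    name = "-log10(p), GWAS",
    # the secondary axis undoes the division so labels are in GTEX units
    sec.axis = sec_axis(~ . * k, name = "-log10(p), GTEX")
  ) +
  labs(x = "pos.end", color = NULL)

p
```

A secondary axis in ggplot2 is always a one-to-one transformation of the primary one, so this only relabels the same panel. The faceted version above keeps the two scales truly independent, which is often easier to read.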
I intend to make a dot plot somewhat like this:
But there's some issue with the code:
df = data.frame(x=runif(100))
df %>%
ggplot(aes(x )) +
geom_dotplot(binwidth =0.01, aes(fill = ..count..), stackdir = "centerwhole",dotsize=2, stackgroups = T, binpositions = "all")
How do I choose the bin width so that the dots don't overlap, the bins don't wrap into 2 columns, and the dots don't get truncated at the top and bottom?
And why is the y axis showing decimal values instead of counts? And how do I color the dots by x value? I tried fill = x and no color is shown.
The overlap is caused by dotsize > 1. As @Jimbuo said, the decimal values on the y axis are due to the internals of this geom. For the fill and color you can use the ..x.. computed variable:
Computed variables
x center of each bin, if binaxis is "x"
df = data.frame(x=runif(1000))
library(dplyr)
library(ggplot2)
df %>%
ggplot(aes(x, fill = ..x.., color = ..x..)) +
geom_dotplot(method = 'histodot',
binwidth = 0.01,
stackdir = "down",
stackgroups = T,
binpositions = "all") +
scale_fill_gradientn('', colours = c('#5185FB', '#9BCFFD', '#DFDFDF', '#FF0000'), labels = c(0, 1), breaks = c(0,1), guide = guide_legend('')) +
scale_color_gradientn(colours = c('#5185FB', '#9BCFFD', '#DFDFDF', '#FF0000'), labels = c(0, 1), breaks = c(0,1), guide = guide_legend('')) +
scale_y_continuous() +
scale_x_continuous('', position = 'top') +
# coord_equal(ratio = .25) +
theme_classic() +
theme(axis.line = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
aspect.ratio = .25,
legend.position = 'bottom',
legend.direction = 'vertical'
)
Created on 2018-05-18 by the reprex package (v0.2.0).
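On ggplot2 3.3.0 or later, the same computed variable can be referenced with `after_stat(x)`, which replaces the `..x..` notation. Below is a trimmed-down sketch of the plot above written that way. The two-colour gradient and the seed are arbitrary choices of mine, and the geom arguments are simply carried over from the code above.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- data.frame(x = runif(1000))

p <- ggplot(df, aes(x, fill = after_stat(x), color = after_stat(x))) +
  geom_dotplot(method = "histodot",
               binwidth = 0.01,
               stackdir = "down",
               stackgroups = TRUE,
               binpositions = "all") +
  # fill and outline colour both follow the bin centre
  scale_fill_gradient(low = "blue", high = "red") +
  scale_color_gradient(low = "blue", high = "red") +
  # the y values are not meaningful for this geom, so hide that axis
  scale_y_continuous(NULL, breaks = NULL) +
  scale_x_continuous(NULL, position = "top") +
  theme_classic() +
  theme(aspect.ratio = 0.25, legend.position = "none")

p
```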
First, from the help page ?geom_dotplot:
When binning along the x axis and stacking along the y axis, the
numbers on y axis are not meaningful, due to technical limitations of
ggplot2. You can hide the y axis, as in one of the examples, or
manually scale it to match the number of dots.
So you can try the following. Note that the coloring does not completely match the x axis.
library(tidyverse)
df %>%
ggplot(aes(x)) +
geom_dotplot(stackdir = "down",dotsize=0.8,
fill = colorRampPalette(c("blue", "white", "red"))(100)) +
scale_y_continuous(labels = c(0,10), breaks = c(0,-0.4)) +
scale_x_continuous(position = "top") +
theme_classic()
For the correct coloring, you have to calculate the bins by yourself using e.g. .bincode:
df %>%
mutate(gr=with(.,.bincode(x ,breaks = seq(0,1,1/30)))) %>%
mutate(gr2=factor(gr,levels = 1:30, labels = colorRampPalette(c("blue", "white", "red"))(30))) %>%
arrange(x) %>%
{ggplot(data=.,aes(x)) +
geom_dotplot(stackdir = "down",dotsize=0.8,
fill = .$gr2) +
scale_y_continuous(labels = c(0,10), breaks = c(0,-0.4)) +
scale_x_continuous(position = "top") +
theme_classic()}
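To make the binning step above easier to follow, here is a small base R sketch of what `.bincode()` returns and how the bin index is turned into a color. The three sample values are made up for illustration. The breaks and palette are the same as in the code above.

```r
# 30 equal-width bins on [0, 1], as in the code above
breaks  <- seq(0, 1, 1 / 30)
palette <- colorRampPalette(c("blue", "white", "red"))(30)

# Made-up sample values: one near each end and one in the middle
x <- c(0.01, 0.51, 0.99)

# .bincode() returns the integer index of the bin each value falls into
gr <- .bincode(x, breaks = breaks)

# Use the bin index to look up one color per value
cols <- palette[gr]

data.frame(x = x, bin = gr, color = cols)
```

Because each dot gets the color of its own bin, the gradient follows the x axis. The earlier version simply handed 100 colors to the dots in row order.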