ggplot2 custom legend with multiple geom overlays: guide_legend() confusion

I want to create a customized legend that distinguishes two plotted geoms by shape and color. I understand that guide_legend() should be involved, but in my legend both shapes are overlaid on top of each other in each legend entry. What is the right way to build the individual legend entries with distinct shapes and colors? Thank you.
library(dplyr)
library(ggplot2)
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
ggplot() +
geom_point(data = df, aes(x = year, y = annualNitrogen, fill="green"), shape=24, color="green", size = 4) +
geom_point(data = df, aes(x = year, y = annualPotassium, fill="blue"), color="blue", shape=21, size = 4) +
guides(fill = guide_legend(override.aes = list(color=c("green", "blue"))),
shape = guide_legend(override.aes = list(shape=c(21, 24)))
) +
scale_fill_manual(name = 'cumulative\nmaterial',
values = c("blue"="blue" , "green"="green" ),
labels = c("potassium" , "nitrogen") ) +
theme_bw() +
theme(legend.position="bottom")

Here it helps to reshape the data to "long" format, which is how ggplot2 is designed to be used when separate series are distinguished by the levels of a single variable.
This lets us map shape and color directly to that variable, rather than manually assigning values to several separately plotted series as in your question.
library(tidyverse)
df %>%
pivot_longer(-year, names_to = "element") %>%
ggplot(aes(x=year, y = value, fill = element, shape = element, color = element)) +
geom_point(size = 4)+
scale_color_manual(values = c("green", "blue"))
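If the goal is a single legend with the exact shapes and colors from the question (filled green triangles and filled blue circles), one possible sketch is to recode the series names before plotting and give every scale the same title, so that ggplot2 can merge them into one legend. The names `long`, `shapes`, `cols` and `legend_title` are made up for this illustration.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(year = 2010:2020,
                 annualNitrogen = seq(100, 200, 10),
                 annualPotassium = seq(500, 600, 10))

# Reshape to long format and turn "annualNitrogen" into "nitrogen", etc.
long <- pivot_longer(df, -year, names_to = "element", values_to = "value")
long$element <- tolower(sub("^annual", "", long$element))

# Named vectors tie each element to its own shape and color
shapes <- c(nitrogen = 24, potassium = 21)
cols <- c(nitrogen = "green", potassium = "blue")
legend_title <- "cumulative\nmaterial"

p <- ggplot(long, aes(x = year, y = value,
                      shape = element, color = element, fill = element)) +
  geom_point(size = 4) +
  scale_shape_manual(name = legend_title, values = shapes) +
  scale_color_manual(name = legend_title, values = cols) +
  scale_fill_manual(name = legend_title, values = cols) +
  theme_bw() +
  theme(legend.position = "bottom")
p
```

Because all three scales share the same title and the same breaks, the legend should show one entry per element instead of overlaying both shapes.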

Put your df into the long format that ggplot2 expects with tidyr::gather. You only need one geom_point for this, not a separate geom for each variable. You can then specify the shape and color in a single call to geom_point.
library(ggplot2)
library(tibble)
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
df <- tidyr::gather(df, key = 'variable', value='value', annualNitrogen, annualPotassium)
ggplot(df) +
geom_point(aes(x = year, y = value, shape = variable, color = variable)) +
scale_color_manual(
name = 'cumulative\nmaterial',
values = c(
"annualNitrogen" = "green",
"annualPotassium" = "blue"),
# unnamed labels are matched to the breaks by position (alphabetical order here)
labels = c("nitrogen", "potassium")) +
# hide the separate shape legend
guides(shape = "none")
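In current versions of tidyr, gather() is marked as superseded in favor of pivot_longer(). As a sketch, the reshaping step above could also be written as follows. `long_gather` and `long_pivot` are names chosen for this example, and the two results should contain the same rows in a different order.

```r
library(tidyr)

df <- data.frame(year = 2010:2020,
                 annualNitrogen = seq(100, 200, 10),
                 annualPotassium = seq(500, 600, 10))

# Older interface used in the answer above
long_gather <- gather(df, key = "variable", value = "value",
                      annualNitrogen, annualPotassium)

# Equivalent reshaping with pivot_longer()
long_pivot <- pivot_longer(df, cols = c(annualNitrogen, annualPotassium),
                           names_to = "variable", values_to = "value")

# Both have one row per combination of year and variable
nrow(long_gather)
nrow(long_pivot)
```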

Related

Join lines of different aesthetics in ggplot2

I'm trying to solve the following problem: how to join lines of different aesthetics/groups in ggplot and maintain the color of a certain group. It is best to see the examples below for a better understanding of what I mean.
Consider the following code:
set.seed(870123)
library(ggplot2)
example_data <- data.frame(Date = seq.Date(as.Date("2020-01-01"),length.out = 20,by = "month"),
Series = rnorm(20) + 1,
Grouping = c(rep(c("Group1"),10),rep(c("Group2"),10)))
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line(aes(color = Grouping)) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
Which generates the plot:
My problem is that ggplot won't join the lines where the grouping variable changes. I would like, for example, a red line joining the last blue point to the first red point. One way to solve this would be to not specify any color aesthetic for geom_line, as below, but this is NOT what I want:
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line() +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
The only way I am able to achieve what I want is with the really bad solution demonstrated below...
### Dirty Solution
temporary <- example_data[as.Date("2020-10-01") == example_data$Date,]
temporary$Grouping <- "Group2"
example_data <- rbind(example_data,temporary)
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line(aes(color = Grouping)) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
Plot generated, which is what I would like to achieve through a better solution:
Is there any way to achieve what I want with ggplot2? I wasn't able to find a better solution.
library(ggplot2); library(dplyr)
ggplot(example_data,aes(x = Date, y = Series)) +
# use lag to connect backwards, lead to connect forwards
geom_segment(aes(color = Grouping, xend = lag(Date), yend = lag(Series))) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red"))
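To see what the lag() call contributes, here is a small sketch with made-up numbers (the `pts` data frame is hypothetical). lag() shifts a column down by one row, so each row gets the previous point as its segment end and the segment takes the color of the later point. lead() shifts the other way, so the segment takes the color of the earlier point.

```r
library(dplyr)

pts <- data.frame(x = 1:4,
                  y = c(2, 5, 3, 4),
                  Grouping = c("Group1", "Group1", "Group2", "Group2"))

# Segment ends when connecting backwards: row i is joined to row i - 1
backwards <- mutate(pts, xend = lag(x), yend = lag(y))

# Segment ends when connecting forwards: row i is joined to row i + 1
forwards <- mutate(pts, xend = lead(x), yend = lead(y))

backwards
forwards
```

The first row of `backwards` and the last row of `forwards` have NA segment ends, which is why ggplot2 typically warns about one removed row when drawing the segments.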

How do I add a legend to identify vertical lines in ggplot?

I have a chart that shows mobile usage by operating system. I'd like to add vertical lines to identify when those operating systems were released. I'll go through the chart and then the code.
The chart -
The code -
dev %>%
group_by(os) %>%
mutate(monthly_change = prop - lag(prop)) %>%
ggplot(aes(month, monthly_change, color = os)) +
geom_line() +
geom_vline(xintercept = as.numeric(ymd("2013-10-01"))) +
geom_text(label = "KitKat", x = as.numeric(ymd("2013-10-01")) + 80, y = -.5)
Instead of adding the text in the plot, I'd like to create a legend to identify each of the lines. I'd like to give each of them its own color and then have a legend to identify each. Something like this -
Can I make my own custom legend like that?
1) Define a data frame that contains the line data and then use geom_vline with it. Note that BOD is a data frame that comes with R.
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
geom_vline(aes(xintercept = xintercept, color = Lines), line.data, size = 1) +
scale_colour_manual(values = line.data$color)
2) Alternatively, put the labels directly on the plot to avoid an extra legend, using the line.data data frame defined above. This also avoids ending up with multiple legends for the same aesthetic.
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
annotate("text", line.data$xintercept, max(BOD$demand), hjust = -.25,
label = line.data$Lines) +
geom_vline(aes(xintercept = xintercept), line.data, size = 1)
3) If the real problem is that you want two color legends then there are two packages that can help.
3a) ggnewscale: any geom using the color aesthetic that appears after invoking new_scale_color() gets its own scale.
library(ggnewscale)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
scale_colour_manual(values = c("red", "orange")) +
new_scale_color() +
# map colour to the Lines column; the manual scale below supplies the actual colours
geom_vline(aes(xintercept = xintercept, colour = Lines), line.data,
size = 1) +
scale_colour_manual(values = line.data$color)
3b) relayer: the experimental relayer package (only on GitHub) allows one to define two color aesthetics, say colour and colour2, and then have separate scales for each one.
library(dplyr)
library(relayer)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
geom_vline(aes(xintercept = xintercept, colour2 = Lines), line.data,
size = 1) %>% rename_geom_aes(new_aes = c("colour" = "colour2")) +
scale_colour_manual(aesthetics = "colour", values = c("red", "orange")) +
scale_colour_manual(aesthetics = "colour2", values = line.data$color)
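Option 1 above can be carried over to a date axis like the one in the question. The sketch below uses made-up data: `usage` and `releases` are hypothetical data frames, only the KitKat date is taken from the question, and the second release is invented for illustration.

```r
library(ggplot2)

set.seed(1)
usage <- data.frame(month = seq(as.Date("2013-01-01"), by = "month", length.out = 24),
                    monthly_change = rnorm(24, sd = 0.2))

# One row per vertical line (the second entry is a made-up release)
releases <- data.frame(release = c("KitKat", "hypothetical release"),
                       date = as.Date(c("2013-10-01", "2014-06-01")))

p <- ggplot(usage, aes(month, monthly_change)) +
  geom_line() +
  # mapping colour inside aes() is what creates the legend
  geom_vline(data = releases, aes(xintercept = date, colour = release)) +
  scale_colour_manual(name = "Release",
                      values = c("KitKat" = "red", "hypothetical release" = "blue"))
p
```

If the lines in the main series are already colored by another variable, the colour scale is shared, so one of the approaches in option 3 (or a different aesthetic such as linetype) would still be needed.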
You can definitely make your own custom legend, but it is a bit complicated, so I'll take you through it step-by-step with some fake data.
The fake data contains 100 samples from a normal distribution (monthly_change in your data), 5 groupings (similar to the os variable in your data) and a sequence of dates from an arbitrary starting point.
library(tidyverse)
library(lubridate)
y <- rnorm(100)
df <- tibble(y) %>%
mutate(os = factor(rep_len(1:5, 100)),
date = seq(from = ymd('2013-01-01'), by = 1, length.out = 100))
You already use the colour aes for your call to geom_line, so you will need to choose a different aes to map onto the calls to geom_vline. Here, I use linetype and a call to scale_linetype_manual to manually edit the linetype legend to how I want it.
ggplot(df, aes(x = date, y = y, colour = os)) +
geom_line() +
# set `xintercept` to your date and `linetype` to the name of the os which starts
# at that date in your `aes` call; set colour outside of the `aes`
geom_vline(aes(xintercept = min(date),
linetype = 'os 1'), colour = 'red') +
geom_vline(aes(xintercept = median(date),
linetype = 'os 2'), colour = 'blue') +
# in the call to `scale_linetype_manual`, `name` will be the legend title;
# set `values` to 1 for each os to force a solid vertical line;
# use `guide_legend` and `override.aes` to change the colour of the lines in the
# legend to match the colours in the calls to `geom_vline`
scale_linetype_manual(name = 'lines',
values = c('os 1' = 1,
'os 2' = 1),
guide = guide_legend(override.aes = list(colour = c('red',
'blue'))))
And there you go, a nice custom legend. Please do remember next time that if you provide your data, or a minimal reproducible example, we can better answer your question without having to generate fake data.

ggplot2 - using two different color scales for same fill in overlayed plots

This is very similar to the question asked here. However, in that situation the fill parameters for the two plots are different. In my situation the fill parameter is the same for both plots, but I want different color schemes.
I would like to manually change the color in the boxplots and the scatter plots (for example making the boxes white and the points colored).
Example:
require(dplyr)
require(ggplot2)
n<-4*3*10
myvalues<- rexp((n))
days <- ntile(rexp(n),4)
doses <- ntile(rexp(n), 3)
test <- data.frame(values =myvalues,
day = factor(days, levels = unique(days)),
dose = factor(doses, levels = unique(doses)))
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot( aes(fill = dose))+
geom_point( aes(fill = dose), alpha = 0.4,
position = position_jitterdodge())
produces a plot like this:
Using 'scale_fill_manual()' overwrites the aesthetic on both the boxplot and the scatterplot.
I have found a hack by adding 'colour' to geom_point and then when I use scale_fill_manual() the scatter point colors are not changed:
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(fill = dose), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = factor(test$dose)),
position = position_jitterdodge(jitter.width = 0.1))+
scale_fill_manual(values = c('white', 'white', 'white'))
Are there more efficient ways of getting the same result?
You can use group to set the different boxplots. No need to set the fill and then overwrite it:
ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(group = interaction(day, dose)), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = dose),
position = position_jitterdodge(jitter.width = 0.1))
And you should never use data$column inside aes - just use the bare column. Using data$column will work in simple cases, but will break whenever there are stat layers or facets.
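As a brief illustration of that advice, the sketch below maps a bare column name inside aes(), and uses the `.data` pronoun for the case where the column name is only available as a string. `plot_by()` is a hypothetical helper written for this example, and mtcars is a data set that ships with R.

```r
library(ggplot2)

cars <- transform(mtcars, cyl = factor(cyl))

# Bare column name: evaluated inside the layer's data, so it works with facets
p1 <- ggplot(cars, aes(wt, mpg)) +
  geom_point(aes(colour = cyl)) +
  facet_wrap(~am)

# Column name held in a string: use the .data pronoun instead of data$column
plot_by <- function(data, col) {
  ggplot(data, aes(wt, mpg)) +
    geom_point(aes(colour = .data[[col]])) +
    facet_wrap(~am)
}
p2 <- plot_by(cars, "cyl")
```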

ggplot2 bar chart with lines inside bars

I'm making a plot in which I have a 3x3 grid obtained from facet_wrap. Eight out of nine plots use geom_violin while the remaining plot is made using geom_bar. After finding some helpful answers here on the site, I got this all working. The problem is that when I use fill = "white", color = "black" for my bar chart, it draws these lines inside the bars.
Here is some example code and figures.
library(tidyverse)
n <- 100
tib <- tibble(value = c(rnorm(n, mean = 100, sd = 10), rbinom(n, size = 1, prob = (1:4)/4)),
variable = rep(c("IQ", "Sex"), each = n),
year = factor(rep(2012:2015, n/2)))
ggplot(tib, aes(x = year, y = value)) +
facet_wrap(~variable, scales = "free_y") +
geom_violin(data = filter(tib, variable == "IQ")) +
geom_bar(data = filter(tib, variable == "Sex"), stat = "identity",
color = "black", fill = "white")
Now to my question: how do I get rid of these lines inside the bars? I just want it to be white with black borders. I've been experimenting a lot with various configurations, and I can manage to get rid of the lines but at the expense of screwing the facet up. I'm fairly certain it's got to do with the stat, but I'm at a loss trying to fix it. Any suggestions?
I would suggest summarizing the data within the barplot:
ggplot(tib, aes(x = year, y = value)) +
facet_wrap(~variable, scales = "free_y") +
geom_violin(data = filter(tib, variable == "IQ")) +
geom_bar(data = tib %>%
group_by(year,variable) %>%
summarise(value=sum(value)) %>%
filter(variable == "Sex"),
stat = "identity",
color = "black",
fill = "white")
I'm not sure this is a good way to represent the data, with the y-axes of the different panels representing very different things, but accept that your example might not match your actual use case. Making separate plots and then using gridExtra::grid.arrange, or cowplot::plot_grid is probably a better solution.
But if you want to do this:
ggplot(tib, aes(x = year, y = value)) +
facet_wrap(~variable, scales = "free_y") +
geom_violin(data = filter(tib, variable == "IQ")) +
geom_col(data = filter(tib, variable == "Sex") %>%
group_by(year, variable) %>%
summarise(value = sum(value)),
fill = "white", colour = "black")
I used geom_col rather than geom_bar so that I don't need to set stat = "identity".
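The lines inside the bars come from the fact that, with stat = "identity", every row of the data is drawn as its own rectangle and the rectangles are stacked. A minimal sketch of the difference, reusing the simulated data from the question (`sex_raw` and `sex_sum` are names chosen here, and the seed is arbitrary):

```r
library(dplyr)

set.seed(42)
n <- 100
tib <- tibble(value = c(rnorm(n, mean = 100, sd = 10),
                        rbinom(n, size = 1, prob = (1:4)/4)),
              variable = rep(c("IQ", "Sex"), each = n),
              year = factor(rep(2012:2015, n/2)))

# Raw data: 100 rows, so each bar is a stack of 25 outlined rectangles
sex_raw <- filter(tib, variable == "Sex")
nrow(sex_raw)

# Summarised data: one row, and therefore one rectangle, per year
sex_sum <- sex_raw %>%
  group_by(year) %>%
  summarise(value = sum(value))
nrow(sex_sum)
```

The bar heights are the same in both cases, since the stacked values add up to the per-year sums.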

How to add a legend to the multiple histograms with ggplot?

I am trying to add a legend to the graph but it doesn't work.
Do you have any ideas ?
Here is my code :
ggplot(data =stats_201507_AF ) +
geom_histogram(aes(gross_ind),fill="dodgerblue3", show.legend =T,bins=25)+
geom_histogram(aes(net_ind),fill="springgreen4",show.legend = T,bins=25) +
geom_histogram(aes(tax_ind),fill="gold2",show.legend = T, bins=25) +
xlab("Indices")+
scale_colour_manual(values=c("dodgerblue3","springgreen4","gold2"))
I wanted a description for every histogram with a corresponding colour.
Thanks a lot in advance
If you don't want to reshape your data, just do this:
ggplot(iris) +
geom_histogram(aes(x = Sepal.Length, fill = "Sepal.Length"),
position = "identity", alpha = 0.5) +
geom_histogram(aes(x = Sepal.Width, fill = "Sepal.Width"),
position = "identity", alpha = 0.5) +
scale_fill_manual(values = c(Sepal.Length = "blue",
Sepal.Width = "red"))
The key is that you need to map something to fill inside aes. Of course, reshaping your data to long format (and actually having a column to map to fill as a result) is usually preferable.
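For comparison, a sketch of the long-format variant mentioned above, again using iris. `iris_long` and the column names `measurement` and `value` are chosen for this example, and bins = 25 simply mirrors the question.

```r
library(ggplot2)
library(tidyr)

# Stack the two measurement columns into a single `value` column
iris_long <- pivot_longer(iris, cols = c(Sepal.Length, Sepal.Width),
                          names_to = "measurement", values_to = "value")

# One geom_histogram call; the legend comes from mapping fill to `measurement`
p <- ggplot(iris_long, aes(x = value, fill = measurement)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 25) +
  scale_fill_manual(values = c(Sepal.Length = "blue", Sepal.Width = "red"))
p
```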
