How do I add a legend to identify vertical lines in ggplot? - r

I have a chart that shows mobile usage by operating system. I'd like to add vertical lines to identify when those operating systems were released. I'll go through the chart and then the code.
The chart -
The code -
dev %>%
group_by(os) %>%
mutate(monthly_change = prop - lag(prop)) %>%
ggplot(aes(month, monthly_change, color = os)) +
geom_line() +
geom_vline(xintercept = as.numeric(ymd("2013-10-01"))) +
geom_text(label = "KitKat", x = as.numeric(ymd("2013-10-01")) + 80, y = -.5)
Instead of adding the text in the plot, I'd like to create a legend to identify each of the lines. I'd like to give each of them its own color and then have a legend to identify each. Something like this -
Can I make my own custom legend like that?

1) Define a data frame that contains the line data and then use geom_vline with it. Note that BOD is a data frame that comes with R.
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
geom_vline(aes(xintercept = xintercept, color = Lines), line.data, size = 1) +
scale_colour_manual(values = line.data$color)
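One caveat with `scale_colour_manual(values = line.data$color)`: an unnamed `values` vector is matched to the legend levels in sorted order, so it only lines up here because "lower" happens to sort before "upper". Below is a minimal sketch that names the values instead, so each line keeps its colour regardless of level order. It reuses `BOD`, which ships with R, and assumes only ggplot2 is installed.

```r
library(ggplot2)

# One row per vertical line: position, legend label and desired colour
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
                        color = c("red", "blue"), stringsAsFactors = FALSE)

# Named vector (legend label -> colour), independent of level order
line_cols <- setNames(line.data$color, line.data$Lines)

p <- ggplot(BOD, aes(Time, demand)) +
  geom_point() +
  geom_vline(aes(xintercept = xintercept, colour = Lines), data = line.data) +
  scale_colour_manual(name = "Lines", values = line_cols)
p
```

Renaming or reordering rows in `line.data` then cannot silently swap the colours.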
2) Alternatively, put the labels directly on the plot to avoid an extra legend, reusing the line.data frame above. This also avoids ending up with multiple legends for the same aesthetic.
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
annotate("text", line.data$xintercept, max(BOD$demand), hjust = -.25,
label = line.data$Lines) +
geom_vline(aes(xintercept = xintercept), line.data, size = 1)
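Because `annotate()` takes plain vectors, the labels stay in sync with the lines only by hand. Here is a sketch of the same idea with `geom_text()` reading its positions and labels from `line.data`. The top-of-data `y` position and `hjust = -0.25` are carried over from the code above, and `inherit.aes = FALSE` stops the text layer from picking up the `Time`/`demand` mapping.

```r
library(ggplot2)

line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
                        stringsAsFactors = FALSE)

p <- ggplot(BOD, aes(Time, demand)) +
  geom_point() +
  geom_vline(aes(xintercept = xintercept), data = line.data) +
  # One label per row of line.data, all placed at the top of the y range
  geom_text(aes(x = xintercept, label = Lines), data = line.data,
            y = max(BOD$demand), hjust = -0.25, inherit.aes = FALSE)
p
```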
3) If the real problem is that you want two color legends, then there are two packages that can help.
3a) ggnewscale: any color geom added after new_scale_color() gets its own scale.
library(ggnewscale)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
scale_colour_manual(values = c("red", "orange")) +
new_scale_color() +
geom_vline(aes(xintercept = xintercept, colour = Lines), line.data,
size = 1) +
scale_colour_manual(values = line.data$color)
3b) relayer: the experimental relayer package (available only on GitHub) allows one to define two color aesthetics, say color and color2, and then have a separate scale for each one.
library(dplyr)
library(relayer)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
geom_vline(aes(xintercept = xintercept, colour2 = Lines), line.data,
size = 1) %>% rename_geom_aes(new_aes = c("colour" = "colour2")) +
scale_colour_manual(aesthetics = "colour", values = c("red", "orange")) +
scale_colour_manual(aesthetics = "colour2", values = line.data$color)

You can definitely make your own custom legend, but it is a bit complicated, so I'll take you through it step-by-step with some fake data.
The fake data contains 100 samples from a normal distribution (monthly_change in your data), 5 groups (similar to the os variable in your data) and a sequence of 100 consecutive days starting on 2013-01-01.
library(tidyverse)
library(lubridate)
y <- rnorm(100)
df <- tibble(y) %>%
mutate(os = factor(rep_len(1:5, 100)),
date = seq(from = ymd('2013-01-01'), by = 1, length.out = 100))
You already use the colour aes for your call to geom_line, so you will need to choose a different aes to map onto the calls to geom_vline. Here, I use linetype and a call to scale_linetype_manual to manually edit the linetype legend to how I want it.
ggplot(df, aes(x = date, y = y, colour = os)) +
geom_line() +
# set `xintercept` to your date and `linetype` to the name of the os which starts
# at that date in your `aes` call; set colour outside of the `aes`
geom_vline(aes(xintercept = min(date),
linetype = 'os 1'), colour = 'red') +
geom_vline(aes(xintercept = median(date),
linetype = 'os 2'), colour = 'blue') +
# in the call to `scale_linetype_manual`, `name` will be the legend title;
# set `values` to 1 for each os to force a solid vertical line;
# use `guide_legend` and `override.aes` to change the colour of the lines in the
# legend to match the colours in the calls to `geom_vline`
scale_linetype_manual(name = 'lines',
values = c('os 1' = 1,
'os 2' = 1),
guide = guide_legend(override.aes = list(colour = c('red',
'blue'))))
And there you go, a nice custom legend. Next time, please remember that if you can provide your data or a minimal reproducible example, we can answer your question better without having to generate fake data.
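Here is a variant that keeps the release dates in their own data frame instead of writing one `geom_vline()` call per release. It also draws each line once: a call like `aes(xintercept = min(date))` above uses the plot's data, so it repeats the same line for every row of `df`. This is a sketch under assumptions. `df` is fake data shaped like the question's, and `releases` is a hypothetical table in which only the KitKat date comes from the question (the second row is a placeholder). As in the answer above, linetype rather than colour is mapped on the vertical lines.

```r
library(ggplot2)

set.seed(1)
# Fake monthly data shaped like the question: one series per OS (hypothetical)
df <- data.frame(
  date = rep(seq(as.Date("2013-01-01"), by = "month", length.out = 24), 2),
  os = rep(c("os A", "os B"), each = 24),
  monthly_change = rnorm(48),
  stringsAsFactors = FALSE
)

# Hypothetical release table: only the KitKat date comes from the question
releases <- data.frame(
  date = as.Date(c("2013-10-01", "2014-06-01")),
  release = c("KitKat", "placeholder"),
  colour = c("red", "blue"),
  stringsAsFactors = FALSE
)

p <- ggplot(df, aes(date, monthly_change, colour = os)) +
  geom_line() +
  # linetype, not colour, is mapped here, so the os colour legend is untouched;
  # the line colours are fixed, one per row of `releases`
  geom_vline(aes(xintercept = date, linetype = release),
             data = releases, colour = releases$colour) +
  scale_linetype_manual(
    name = "releases",
    breaks = releases$release,
    values = setNames(rep("solid", nrow(releases)), releases$release),
    # recolour the legend keys to match the lines, in the order of `breaks`
    guide = guide_legend(override.aes = list(colour = releases$colour))
  )
p
```

Adding a release then means adding a row to `releases` rather than another layer.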

Related

Join lines of different aesthetics in ggplot2

I'm trying to solve the following problem: how to join lines of different aesthetics/groups in ggplot and maintain the color of a certain group. It is best to see the examples below for a better understanding of what I mean.
Consider the following code:
set.seed(870123)
library(ggplot2)
example_data <- data.frame(Date = seq.Date(as.Date("2020-01-01"),length.out = 20,by = "month"),
Series = rnorm(20) + 1,
Grouping = c(rep(c("Group1"),10),rep(c("Group2"),10)))
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line(aes(color = Grouping)) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
Which generates the plot:
My problem is that ggplot won't join the lines where the grouping variable changes. I would like, for example, a red line joining the last blue point to the first red point. One way to solve this would be to not specify any color aesthetic for the geom_line layer, as below, but this is NOT what I want:
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line() +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
The only way I am able to achieve what I want is with the really bad solution demonstrated below...
### Dirty Solution
temporary <- example_data[as.Date("2020-10-01") == example_data$Date,]
temporary$Grouping <- "Group2"
example_data <- rbind(example_data,temporary)
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line(aes(color = Grouping)) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
Plot generated, which is what I would like to achieve through a better solution:
Is there any way to achieve what I want with ggplot2? I wasn't able to find a better solution.
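library(ggplot2); library(dplyr)
# assumes the original 20-row example_data (before the rbind above), sorted by Date
ggplot(example_data,aes(x = Date, y = Series)) +
# join each point back to the previous row with lag (use lead to connect forwards);
# the first row has no predecessor, so ggplot drops that segment with a warning
geom_segment(aes(color = Grouping, xend = dplyr::lag(Date), yend = dplyr::lag(Series))) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red"))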
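The `lag()` trick draws one segment per point, from the previous point to the current one, coloured by the current point's group. The segment that crosses from Group1 to Group2 therefore takes Group2's colour. Below is a minimal sketch that builds those segments explicitly in base R (no dplyr). It assumes the rows are sorted by `Date` and uses the original 20-row `example_data`.

```r
library(ggplot2)

set.seed(870123)
example_data <- data.frame(
  Date = seq.Date(as.Date("2020-01-01"), length.out = 20, by = "month"),
  Series = rnorm(20) + 1,
  Grouping = rep(c("Group1", "Group2"), each = 10)
)

# Segment i runs from point i-1 to point i and takes point i's group
n <- nrow(example_data)
segs <- data.frame(
  x = example_data$Date[-n], y = example_data$Series[-n],
  xend = example_data$Date[-1], yend = example_data$Series[-1],
  Grouping = example_data$Grouping[-1]
)

p <- ggplot(example_data, aes(Date, Series, colour = Grouping)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend), data = segs) +
  geom_point() +
  scale_color_manual(values = c(Group1 = "blue", Group2 = "red"))
p
```

Building `segs` by hand also avoids the missing-value warning, because there is no first segment without a predecessor.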

How to change / specify the fill color for values that exceed the limits of a gradient bar?

In ggplot2/geom_tile, how can I change the fill color of values that exceed the scale limits?
As the image shows, region_4 and region_5 are outside the limits (1, 11), so they get the default grey fill. How can I make region_4 'darkblue' and region_5 'black'? Thanks!
library(tidyverse)
library(RColorBrewer)
tile_data <- data.frame(category=letters[1:5],
region=paste0('region_',1:5),
sales=c(1,2,5,0.1,300))
tile_data %>% ggplot(aes(x=category,
y=region,
fill=sales))+
geom_tile()+
scale_fill_gradientn(limits=c(1,11),
colors=brewer.pal(11,'Spectral'))+
theme_minimal()
If you want to keep the gradient scale and have two additional discrete values for off limits above and below, I think the easiest way would be to have separate fill scales for "in-limit" and "off-limit" values. This can be done with separate calls to geom_tile on subsets of your data and with packages such as {ggnewscale}.
I think it then would make sense to place the discrete "off-limits" at the respective extremes of your gradient color bar. You need then three geom_tile calls and three scale_fill calls, and you will need to specify the guide order within each scale_fill call. You will then need to play around with the legend margins, but it's not a big problem to make it look OK.
library(tidyverse)
library(RColorBrewer)
tile_data <- data.frame(
category = letters[1:5],
region = paste0("region_", 1:5),
sales = c(1, 2, 5, 0.1, 300)
)
ggplot(tile_data, aes(
x = category,
y = region,
fill = sales
)) +
geom_tile(data = filter(tile_data, sales <= 11 & sales >=1)) +
scale_fill_gradientn(NULL,
limits = c(1, 11),
colors = brewer.pal(11, "Spectral"),
guide = guide_colorbar(order = 2)
) +
ggnewscale::new_scale_fill() +
geom_tile(data = filter(tile_data, sales > 11), mapping = aes(fill = sales > 11)) +
scale_fill_manual("Sales", values = "black", labels = "> 11", guide = guide_legend(order = 1)) +
ggnewscale::new_scale_fill() +
geom_tile(data = filter(tile_data, sales < 1), mapping = aes(fill = sales < 1)) +
scale_fill_manual(NULL, values = "darkblue", labels = "< 1", guide = guide_legend(order = 3)) +
theme_minimal() +
theme(legend.spacing.y = unit(-6, "pt"),
legend.title = element_text(margin = margin(b = 10)))
Created on 2021-11-22 by the reprex package (v2.0.1)
You can try scales::squish: define the limits and pass squish as the out-of-bounds (oob) function of the scale:
p = tile_data %>% ggplot(aes(x=category,y=region,fill=sales))+ geom_tile()
p + scale_fill_gradientn(colors = brewer.pal(11,"Spectral"),
limits = c(1,11),oob=scales::squish)
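Keep in mind that squish does not give the out-of-range tiles their own colours. It clamps them onto the nearest limit, so region_4 (0.1) is drawn like 1 and region_5 (300) like 11. If a single shared colour for everything outside the limits is acceptable, the scale's `na.value` argument is a simpler alternative, because out-of-range values become `NA` by default. The sketch below compares the two. It swaps `brewer.pal()` for base R's `hcl.colors(11, "Spectral")` only to avoid the RColorBrewer dependency.

```r
library(ggplot2)

tile_data <- data.frame(category = letters[1:5],
                        region = paste0("region_", 1:5),
                        sales = c(1, 2, 5, 0.1, 300))
pal <- hcl.colors(11, "Spectral")

# squish clamps values into the limits: here they become 1, 2, 5, 1, 11
print(scales::squish(tile_data$sales, range = c(1, 11)))

base <- ggplot(tile_data, aes(category, region, fill = sales)) + geom_tile()

# Option 1: out-of-range values take the colour of the nearest limit
p_squish <- base + scale_fill_gradientn(colours = pal, limits = c(1, 11),
                                        oob = scales::squish)
# Option 2: out-of-range values become NA and share one na.value colour
p_na <- base + scale_fill_gradientn(colours = pal, limits = c(1, 11),
                                    na.value = "black")
```

Neither option separates "too low" from "too high"; for that, the ggnewscale answer above is needed.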

ggplot2 custom legend with multiple geom overlays: guide_legend() confusion

I want to create a customized legend that distinguishes two plotted geoms using appropriate shape and color. I see that guide_legend() should be involved, but my legend is presented with both shapes overlayed one on the other for both components of the legend. What is the right way to build these individual legend components using distinct shapes and colors? Thank you.
library(dplyr)
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
ggplot() +
geom_point(data = df, aes(x = year, y = annualNitrogen, fill="green"), shape=24, color="green", size = 4) +
geom_point(data = df, aes(x = year, y = annualPotassium, fill="blue"), color="blue", shape=21, size = 4) +
guides(fill = guide_legend(override.aes = list(color=c("green", "blue"))),
shape = guide_legend(override.aes = list(shape=c(21, 24)))
) +
scale_fill_manual(name = 'cumulative\nmaterial',
values = c("blue"="blue" , "green"="green" ),
labels = c("potassium" , "nitrogen") ) +
theme_bw() +
theme(legend.position="bottom")
Here it helps to transform to "long" format which is more in line with how ggplot is designed to be used when separating factor levels within a single time series.
This allows us to map shape and color directly, rather than having to manually assign different values to multiple plotted series, like you do in your question.
library(tidyverse)
df %>%
pivot_longer(-year, names_to = "element") %>%
ggplot(aes(x=year, y = value, fill = element, shape = element, color = element)) +
geom_point(size = 4)+
scale_color_manual(values = c("green", "blue"))
Put your df into the long format that ggplot likes with tidyr::gather (superseded by pivot_longer in current tidyr). You only need one geom_point for this, not separate geoms for separate variables; you can then map both shape and color to variable in that single call.
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
df <- tidyr::gather(df, key = 'variable', value='value', annualNitrogen, annualPotassium)
ggplot(df) +
geom_point(aes(x = year, y = value, shape = variable, color = variable)) +
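scale_color_manual(
name = 'cumulative\nmaterial',
values = c(
"annualPotassium" = "blue",
"annualNitrogen" = "green"),
# breaks fix the order, so each label lands on the right variable
breaks = c("annualPotassium", "annualNitrogen"),
labels = c("potassium" , "nitrogen")) +
guides(shape = "none")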
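Hiding the shape guide leaves a legend that shows only colours. ggplot2 instead merges the colour, fill and shape legends into one when the scales share the same name, breaks and labels, so each key then shows both colour and shape. Below is a sketch of that approach using the long-format data. The shape codes 24 and 21 and the colours are taken from the question, and passing `breaks` explicitly pins each label to the right variable.

```r
library(ggplot2)

df <- data.frame(year = rep(2010:2020, 2),
                 variable = rep(c("annualNitrogen", "annualPotassium"), each = 11),
                 value = c(seq(100, 200, 10), seq(500, 600, 10)),
                 stringsAsFactors = FALSE)

ttl <- "cumulative\nmaterial"
brks <- c("annualPotassium", "annualNitrogen")
lbls <- c("potassium", "nitrogen")
cols <- c(annualPotassium = "blue", annualNitrogen = "green")

p <- ggplot(df, aes(year, value, colour = variable, fill = variable,
                    shape = variable)) +
  geom_point(size = 4) +
  # identical name/breaks/labels in all three scales -> one merged legend
  scale_colour_manual(name = ttl, values = cols, breaks = brks, labels = lbls) +
  scale_fill_manual(name = ttl, values = cols, breaks = brks, labels = lbls) +
  scale_shape_manual(name = ttl, values = c(annualPotassium = 21, annualNitrogen = 24),
                     breaks = brks, labels = lbls) +
  theme_bw() +
  theme(legend.position = "bottom")
p
```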

ggplot generating two legends when only one is wanted

In R I'm trying to generate a plot where I apply unique colors, line types, transparencies, and line thicknesses by case grouping. As currently implemented, two legends are generated instead of one, and I can change the title of only one of them (the second). Presumably I've made a mistake; any help would be greatly appreciated.
Ultimately I want to generate a single legend and have the style changes and labeling changes take effect.
library(ggplot2)
temp_df <- data.frame(year = integer(50), value = numeric(50), case = character(50))
temp_df$year <- 1:50
temp_df$value <- runif(50)
temp_df$case <- "A"
df <- temp_df
temp_df$value <- runif(50)
temp_df$case <- "B"
df <- rbind(df, temp_df)
LineTypes <- c("solid", "dotted")
colors <- c("red", "black")
linealphas <- c(1, .8)
linesizes <- c(1, 2)
Plot <- ggplot(df, aes(x = year, y = value, group = case))+
geom_line(aes(linetype = case, color = case, size = case, alpha = case))+
scale_linetype_manual(values = LineTypes)+
scale_color_manual(values = colors)+
scale_y_continuous(limits = c(0, 1), labels = scales::percent)+
scale_alpha_manual(values = linealphas)+
scale_size_manual(values = linesizes)+
xlab("Year")+
ylab("Percentage%")+
labs(color = "Scenario")+
theme_minimal()
Plot
ggplot merges legends only when they have the same title (and the same labels). You can specify the legend title with the name argument in each scale:
ggplot(df, aes(x = year, y = value, group = case))+
geom_line(aes(linetype = case, color = case, size = case, alpha = case)) +
scale_linetype_manual(values = LineTypes, name = "Scenario")+
scale_color_manual(values = colors, name = "Scenario")+
scale_y_continuous(limits = c(0, 1), labels = scales::percent)+
scale_alpha_manual(values = linealphas, name = "Scenario")+
scale_size_manual(values = linesizes, name = "Scenario")+
xlab("Year")+
ylab("Percentage%")+
theme_minimal()
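You can get the same merge without repeating `name =` in every scale, because `labs()` accepts one title per aesthetic. The sketch below rebuilds the question's data with the same values. It maps `linewidth` rather than `size`, which assumes ggplot2 3.4.0 or later (where `linewidth` replaced `size` for lines).

```r
library(ggplot2)

set.seed(42)
df <- data.frame(year = rep(1:50, 2), value = runif(100),
                 case = rep(c("A", "B"), each = 50),
                 stringsAsFactors = FALSE)

p <- ggplot(df, aes(year, value, linetype = case, colour = case,
                    linewidth = case, alpha = case)) +
  geom_line() +
  scale_linetype_manual(values = c("solid", "dotted")) +
  scale_colour_manual(values = c("red", "black")) +
  scale_linewidth_manual(values = c(1, 2)) +
  scale_alpha_manual(values = c(1, 0.8)) +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent) +
  # the same title for every mapped aesthetic -> a single merged legend
  labs(x = "Year", y = "Percentage",
       colour = "Scenario", linetype = "Scenario",
       linewidth = "Scenario", alpha = "Scenario") +
  theme_minimal()
p
```

ggplot2 may warn that alpha is mapped to a discrete variable; the plot is still drawn.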
A coworker pointed out a resolution: the key was to remove the extra guides so that only one of the styles I had defined was used for the legend.
guides(size = "none")+
guides(alpha = "none")+
guides(linetype = "none")+
His explanation was that ggplot doesn't know that the mappings defining the plot's properties are related, so unless their legends share a title it draws a separate legend for each. Note that the remaining color legend then shows only the colors, not the line types, widths, or transparencies.
library(ggplot2)
temp_df<-data.frame(year=integer(50),value=numeric(50),case=character(50))
temp_df$year<-1:50
temp_df$value<-runif(50)
temp_df$case<-"A"
df<-temp_df
temp_df$value<-runif(50)
temp_df$case<-"B"
df<-rbind(df,temp_df)
LineTypes<-c("solid","dotted")
colors<-c("red","black")
linealphas<-c(1,.8)
linesizes<-c(1,2)
Plot<-ggplot(df,aes(x=year,y=value,group=case))+
geom_line(aes(linetype=case, color=case, size=case, alpha =case))+
scale_linetype_manual(values=LineTypes)+
scale_color_manual(values=colors)+
scale_y_continuous(limits=c(0,1),labels = scales::percent)+
scale_alpha_manual(values=linealphas)+
scale_size_manual(values=linesizes)+
xlab("Year")+
ylab("Percentage%")+
labs(color = "Scenario")+
guides(size = "none")+
guides(alpha = "none")+
guides(linetype = "none")+
theme_minimal()
Plot
Can't you just remove the line labs(color = "Scenario")? Then every legend keeps the default title "case", so ggplot merges them into one.
This is the plot that gets generated. Not sure if it's missing anything that you need.
The result seems fine to me:

Selecting entries in legend of ggplot in R

I'm creating a figure using ggplot. That figure has 27 lines that I want to show but not emphasize, and two lines, mean and weighted mean, that I want to emphasize. I would like only these last two lines to appear in the legend of the plot. Here is my code:
p_plot <- ggplot(data = dta, aes(x = date, y = premium, colour = State)) +
geom_line(show.legend = FALSE) +
scale_color_manual(values = c(rep("gray60", 27)))
p_plot <- p_plot + geom_line(aes(y = premium.m), colour = "blue", size = 1.25,
show.legend = TRUE) + geom_line(aes(y = premium.m.w), colour = "red",
size = 1.25, show.legend = TRUE) + ylab("Pe/pg")
p_plot
The show.legend = FALSE setting in the first geom_line (called show_guide in ggplot2 versions before 2.0.0) seems to be overridden by the show.legend = TRUE settings in the other two. How can I limit the legend entries in my figures to the lines "premium.m" and "premium.m.w"? Thank you.
I think this should answer your question: (the code's been slightly modified but the concept is the same)
dta <- data.frame(date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), "months"), 26),
premium = rnorm(12*26),
State = rep(letters, each = 12))
library(ggplot2)
p_plot <- ggplot(data = dta) +
geom_line(aes(x = date, y = premium, group = State), colour = "grey60")
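# Per-date summaries in their own data frame: values computed inside aes()
# must have one value per row of the layer's data
summary_data <- data.frame(date = sort(unique(dta$date)),
mean = as.numeric(tapply(dta$premium, dta$date, mean)),
median = as.numeric(tapply(dta$premium, dta$date, median)))
p_plot <- p_plot +
geom_line(data = summary_data, aes(x = date, y = mean, colour = "mean"), size = 1.25) +
geom_line(data = summary_data, aes(x = date, y = median, colour = "median"), size = 1.25) +
ylab("Pe/pg") + scale_color_discrete("stats")
p_plot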
However, this is just an (ugly) workaround and far from best practice for data visualisation, especially given what ggplot was designed for. I could provide a more elegant solution if you edit your question to add more details.
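One tidier layout, aimed at the question's actual pair of lines (mean and weighted mean), works like this. Compute the summaries into their own long data frame, draw the 26 background series with a fixed grey and no colour mapping, and map colour only in the summary layer, so only those two lines reach the legend. This is a sketch: the `weight` column is hypothetical, since the question does not say what the weighted mean is weighted by.

```r
library(ggplot2)

set.seed(1)
dates <- seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), "months")
dta <- data.frame(date = rep(dates, 26),
                  premium = rnorm(12 * 26),
                  State = rep(letters, each = 12),
                  weight = rep(runif(26), each = 12))  # hypothetical weights

# Long format: one row per date and statistic
by_date <- split(dta, dta$date)
summ <- data.frame(
  date = rep(as.Date(names(by_date)), 2),
  stat = rep(c("mean", "weighted mean"), each = length(by_date)),
  premium = unname(c(
    sapply(by_date, function(d) mean(d$premium)),
    sapply(by_date, function(d) weighted.mean(d$premium, d$weight))
  )),
  stringsAsFactors = FALSE
)

p <- ggplot(dta, aes(date, premium)) +
  # background lines: fixed colour, so they add nothing to the legend
  geom_line(aes(group = State), colour = "grey60") +
  # only this layer maps colour, so only these two lines appear in the legend
  geom_line(aes(colour = stat), data = summ) +
  scale_colour_manual(NULL, values = c("mean" = "blue", "weighted mean" = "red")) +
  ylab("Pe/pg")
p
```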
