Legend with data from two different columns [duplicate] - r

I plot a graph with two geom_point layers using the following code:
source("http://www.openintro.org/stat/data/arbuthnot.R")
library(ggplot2)
ggplot() +
geom_point(aes(x = year,y = boys),data=arbuthnot,colour = '#3399ff') +
geom_point(aes(x = year,y = girls),data=arbuthnot,shape = 17,colour = '#ff00ff') +
xlab(label = 'Year') +
ylab(label = 'Rate')
I simply want to know how to add a legend on the right side, with the same shapes and colors. The pink triangle should have the legend "woman" and the blue circle the legend "men". It seems quite simple, but after many trials I could not do it. (I'm a beginner with ggplot.)

If you rename the columns of the original data frame and then melt it into long format with reshape2::melt, it's much easier to handle in ggplot2. By specifying the color and shape aesthetics in the ggplot command, and specifying the scales for the colors and shapes manually, the legend will appear.
source("http://www.openintro.org/stat/data/arbuthnot.R")
library(ggplot2)
library(reshape2)
names(arbuthnot) <- c("Year", "Men", "Women")
arbuthnot.melt <- melt(arbuthnot, id.vars = 'Year', variable.name = 'Sex',
value.name = 'Rate')
ggplot(arbuthnot.melt, aes(x = Year, y = Rate, shape = Sex, color = Sex))+
geom_point() + scale_color_manual(values = c("Women" = '#ff00ff','Men' = '#3399ff')) +
scale_shape_manual(values = c('Women' = 17, 'Men' = 16))

This is the trick that I usually use. Add colour argument to the aes and use it as an indicator for the label names.
ggplot() +
geom_point(aes(x = year,y = boys, colour = 'Boys'),data=arbuthnot) +
geom_point(aes(x = year,y = girls, colour = 'Girls'),data=arbuthnot,shape = 17) +
xlab(label = 'Year') +
ylab(label = 'Rate')
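A possible extension of this trick, as a minimal sketch: map shape to the same constant strings as colour and add manual scales, so the legend shows the requested blue circle and pink triangle. The data frame toy and its values are made up for illustration and only stand in for arbuthnot.

```r
library(ggplot2)

# Hypothetical stand-in for the arbuthnot data (values are invented)
toy <- data.frame(
  year  = 1629:1633,
  boys  = c(5200, 4900, 4400, 5000, 5200),
  girls = c(4700, 4500, 4100, 4600, 4800)
)

# Constant strings inside aes() act as legend keys; using the same keys
# for colour and shape lets ggplot2 merge them into one legend.
p <- ggplot(toy, aes(x = year)) +
  geom_point(aes(y = boys,  colour = "Men",   shape = "Men")) +
  geom_point(aes(y = girls, colour = "Women", shape = "Women")) +
  scale_colour_manual(name = "Sex",
                      values = c(Men = "#3399ff", Women = "#ff00ff")) +
  scale_shape_manual(name = "Sex",
                     values = c(Men = 16, Women = 17)) +
  labs(x = "Year", y = "Rate")

print(p)
```

Because both scales share the name "Sex" and the same keys, a single legend should be drawn.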

Here is a way of doing this without using reshape2::melt. reshape2::melt works, but you can get into a bind if you want to add other things to the graph, such as line segments. The code below uses the original organization of the data. The key to getting a single legend is to make sure the name and labels arguments to scale_color_manual(...) and scale_shape_manual(...) are identical; otherwise you will get two legends.
source("http://www.openintro.org/stat/data/arbuthnot.R")
library(ggplot2)
library(reshape2)
ptheme <- theme (
axis.text = element_text(size = 9), # tick labels
axis.title = element_text(size = 9), # axis labels
axis.ticks = element_line(colour = "grey70", size = 0.25),
panel.background = element_rect(fill = "white", colour = NA),
panel.border = element_rect(fill = NA, colour = "grey70", size = 0.25),
panel.grid.major = element_line(colour = "grey85", size = 0.25),
panel.grid.minor = element_line(colour = "grey93", size = 0.125),
panel.spacing = unit(0 , "lines"), # panel.margin is deprecated in favour of panel.spacing
legend.justification = c(1, 0),
legend.position = c(1, 0.1),
legend.text = element_text(size = 8),
plot.margin = unit(c(0.1, 0.1, 0.1, 0.01), "npc") # c(top, right, bottom, left), values can be negative
)
cols <- c( "c1" = "#3399ff", "c2" = "#ff00ff" ) # c1 = boys (blue), c2 = girls (pink), as asked in the question
shapes <- c("s1" = 16, "s2" = 17)
p1 <- ggplot(data = arbuthnot, aes(x = year))
p1 <- p1 + geom_point(aes( y = boys, color = "c1", shape = "s1"))
p1 <- p1 + geom_point(aes( y = girls, color = "c2", shape = "s2"))
p1 <- p1 + labs( x = "Year", y = "Rate" )
p1 <- p1 + scale_color_manual(name = "Sex",
breaks = c("c1", "c2"),
values = cols,
labels = c("boys", "girls"))
p1 <- p1 + scale_shape_manual(name = "Sex",
breaks = c("s1", "s2"),
values = shapes,
labels = c("boys", "girls"))
p1 <- p1 + ptheme
print(p1)

Here is an answer based on the tidyverse package, where one can use the pipe, %>%, to chain functions together. This creates the plot in one continuous chain and avoids the need for temporary variables. More on the pipe can be found in the post "What does %>% function mean in R?".
As far as I know, legends in ggplot2 are only based on aesthetic variables. So to add a discrete legend one uses a category column, and change the aesthetics according to the category. In ggplot this is for example done by aes(color=category).
So to add two (or more) different variables of a data frame to the legends, one needs to transform the data frame such that we have a category column telling us which column (variable) is being plotted, and a second column that actually holds the value. The tidyr::gather function, that was also loaded by tidyverse, does exactly that.
Then one creates the legend by just specifying which aesthetics variables need to be different. In this example the code would look as follows:
source("http://www.openintro.org/stat/data/arbuthnot.R")
library(tidyverse)
arbuthnot %>%
rename(Year=year,Men=boys,Women=girls) %>%
gather(Men,Women,key = "Sex",value = "Rate") %>%
ggplot() +
geom_point(aes(x = Year, y=Rate, color=Sex, shape=Sex)) +
scale_color_manual(values = c("Men" = "#3399ff","Women"= "#ff00ff")) +
scale_shape_manual(values = c("Men" = 16, "Women" = 17))
Notice that the tidyverse package also automatically loads the ggplot2 package. An overview of the packages it installs can be found on the website tidyverse.org.
In the code above I also used the function dplyr::rename (also loaded by tidyverse) to first rename the columns to the wanted labels, since the legend automatically takes its labels from the category names.
There is a second way to rename the legend labels: specify them explicitly through the labels = argument of the scale_<aesthetic>_manual functions. For examples, see the legends cookbook. This is not recommended, though, since it gets messy quickly with more variables.
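As a side note, tidyr::gather is marked as superseded in current tidyr releases, and tidyr::pivot_longer is the suggested replacement. A minimal sketch of the same reshaping with pivot_longer follows; the data frame toy and its values are made up for illustration and only stand in for the renamed arbuthnot data.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the renamed arbuthnot data (values are invented)
toy <- data.frame(
  Year  = 1629:1633,
  Men   = c(5200, 4900, 4400, 5000, 5200),
  Women = c(4700, 4500, 4100, 4600, 4800)
)

# One row per Year/Sex combination: "Sex" holds the old column name,
# "Rate" holds the value that was stored in that column.
long <- pivot_longer(toy, cols = c(Men, Women),
                     names_to = "Sex", values_to = "Rate")

p <- ggplot(long, aes(x = Year, y = Rate, colour = Sex, shape = Sex)) +
  geom_point() +
  scale_colour_manual(values = c(Men = "#3399ff", Women = "#ff00ff")) +
  scale_shape_manual(values = c(Men = 16, Women = 17))

print(p)
```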

Related

ggplot2 - split one legend (two color scales) and delete another

I am having much trouble configuring plot legend in ggplot2. I have two data frames:
require(ggplot2)
mat <- rep(c("None", "wood", "steel"), each = 4)
feet = rep(seq(0,3), 3)
load = c(3:6, 4:7, 5:8)
soil <- data.frame(
feet = feet,
test = rep(1:3, each = 4),
load = c(0.1, 0.2, 0.3, 0.04,
0.5, 0.6, 0.7, 0.44,
0.8, 0.9, 1.0, 0.74)
)
dat <- rbind(
data.frame(
feet = feet,
mat = mat,
load = c(3:6, 4:7, 5:8),
SF = FALSE
),
data.frame(
feet = feet,
mat = mat,
load = c(6:9, 7:10, 8:11),
SF = TRUE
)
)
I would like a plot with a legend for dat$mat and a legend for soil$test:
myplot <- ggplot(dat, aes(x = load, y = feet)) +
geom_line(aes(color = mat, linetype = SF)) +
geom_path(dat = soil, aes(x = load, y = feet, color = factor(test)))
myplot
I don't want the legend named SF. Also, I would like to split the legend named mat into two legends, mat (values = "none", "wood", "steel") from the dat data.frame, and test (values = 1, 2, 3) from the soil data.frame.
I've tried theme(legend.position = "none"), and many other various combinations of code that would fill the page if I listed them all. Thanks for any assistance you can offer.
update - there is a much better option offered in this answer. I will leave this because hacking legends with a fake aesthetic might still be needed in certain cases.
As @M-M correctly said, ggplot doesn't want to draw two legends for one aesthetic.
I truly hope that you won't often need to do something like the following hack:
Make a fake aesthetic (I chose alpha), and define the color for each line manually.
Then change your legend keys using override.aes manually.
If you have more than this data to show, consider different ways of visualisation / data separation. A very good thing is facetting.
library(ggplot2)
library(dplyr)
ggplot(dat, aes(x = load, y = feet)) +
geom_line(aes(color = mat, linetype = SF)) +
geom_path(data = filter(soil, test == 1),
aes(x = load, y = feet, alpha = factor(test)), color = 'red') +
geom_path(data = filter(soil, test == 2),
aes(x = load, y = feet, alpha = factor(test)), color = 'brown') +
geom_path(data = filter(soil, test == 3),
aes(x = load, y = feet, alpha = factor(test)), color = 'green') +
scale_alpha_manual(values = c(rep(1,3))) +
scale_linetype(guide = "none") + # guide = FALSE is deprecated in recent ggplot2
guides( alpha = guide_legend(title = 'test',
override.aes = list(color = c('red','brown','green'))))
Or you can make two separate ggplots, then overlay one using cowplot:
library(cowplot) #cowplot_1.0.0
library(ggplot2)
myplot <- ggplot(dat, aes(x = load, y = feet)) +
geom_line(aes(color = mat, linetype = SF)) +
scale_linetype(guide = "none") +
lims(x = c(0,11), y = c(0,3)) +
theme(legend.justification = c(0, 1), # move the bottom legend up a bit
axis.text.x = element_blank(), # remove all the labels from the base plot
axis.text.y = element_blank(),
axis.title = element_blank())
myplot2 <- ggplot() +
geom_path(dat = soil, aes(x = load, y = feet, color = factor(test))) +
theme_half_open() +
lims(x = c(0,11), y = c(0,3))
aligned_plots <- align_plots(myplot, myplot2, align="hv", axis="tblr")
ggdraw(aligned_plots[[1]]) + draw_plot(aligned_plots[[2]])
Actually, there is a better option than my previous hack - I am sure this must have been around back then, but I was simply not aware of it. Adding a new scale is very easy with ggnewscale.
ggnewscale is currently to my knowledge the only package on CRAN that allows several (discrete!) scales for the same aesthetic. For continuous scales, there is now also ggh4x::scale_color/fill_multi. And there is also Claus Wilke's relayer package on GitHub.
I really like the ggnewscale package because it's super easy to use and works with literally all aesthetics.
ggplot(mapping = aes(x = load, y = feet)) +
geom_line(data = dat, aes(color = mat, linetype = SF)) +
scale_linetype(guide = "none") + # This is to delete the linetype legend
ggnewscale::new_scale_color() +
geom_path(data = soil, aes(x = load, y = feet, color = as.factor(test))) +
scale_color_manual("Test", values = c('red','brown','green'))

Label specific points in ggplot

I would like certain points I have created through ggplot to take labels at the side of the graph but I am not able to do that through my current code.
Ceplane1 is a matrix with two columns and 100 rows (it can hold any random numbers). I want to plot column 2 on the x-axis and column 1 on the y-axis, which I have done using the code below. Now I want to change the code so that the labels sit at the side of the graph and not in the plot area itself. Additionally, I want the axis labels in comma format. You can take result.table[1,1] and result.table[1,3] to be some numbers and suggest a solution.
ggplot(Ceplane1, aes(x = Ceplane1[,2], y = Ceplane1[,1])) +
geom_point(colour="blue")+geom_abline(slope = -results.table[5,1],intercept = 0,colour="darkred",size=1.25)+
geom_point(aes(mean(Ceplane1[,2]),mean(Ceplane1[,1])),colour="red")+
geom_point(aes(results.table[1,1],results.table[3,1],colour="darkred"))+ggtitle("CE-Plane: Drug A vs Drug P")+
xlab("QALY Difference")+ylab("Cost Difference")+xlim(-0.05,0.05)+ylim(-6000,6000)+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),plot.background = element_rect(fill = "white", colour = "black", size = 0.5))+
geom_vline(xintercept = 0,colour="black")+geom_hline(yintercept = 0,colour="black")+
geom_label(aes(mean(Ceplane1[,2]),mean(Ceplane1[,1])),label="mean")+
geom_label(aes(results.table[1,1],results.table[3,1]),label="Base ICER")
I want to put the label at the side of the graph and not on the points of the graph itself. Please suggest me a way to do that.
I think the best way is to add the mean and Base ICER points to your dataset. Then add a column for the legend and you will see them show up as matching in the chart and the legend:
library(ggplot2)
set.seed(1)
Ceplane1 <- data.frame(y = rnorm(100),
x = rnorm(100))
results.table <- data.frame(z = rnorm(100))
Ceplane1$Legend <- "Data"
meanPoint <- data.frame(y = mean(Ceplane1[,1]), x = mean(Ceplane1[,2]), Legend = "Mean")
basePoint <- data.frame(y = results.table[3,1], x = results.table[1,1], Legend = "Base ICER")
Ceplane1 <- rbind(Ceplane1, meanPoint)
Ceplane1 <- rbind(Ceplane1, basePoint)
ggplot(Ceplane1, aes(x = x, y = y, color = Legend)) +
geom_point() +
geom_abline(slope = -results.table[5,1],intercept = 0,colour="darkred",size=1.25) +
ggtitle("CE-Plane: Drug A vs Drug P")+ xlab("QALY Difference")+ylab("Cost Difference") +
xlim(-3,3) + ylim(-3,3) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),plot.background = element_rect(fill = "white", colour = "black", size = 0.5)) +
geom_vline(xintercept = 0,colour="black") +
geom_hline(yintercept = 0,colour="black")
Note that I changed the xlim and ylim to match the random data I created.
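The question also asks for comma-formatted axis labels. One way to get them, as a minimal sketch, is to pass scales::comma as the labels argument of the axis scale. The data frame ce, its column names, and its values are made up for illustration.

```r
library(ggplot2)

set.seed(1)
# Hypothetical cost-effectiveness data (invented values)
ce <- data.frame(
  qaly = rnorm(100, sd = 0.02),
  cost = rnorm(100, sd = 2000)
)

# scales::comma formats the axis breaks with thousands separators.
# Setting limits on the scale replaces the separate ylim() call.
p <- ggplot(ce, aes(x = qaly, y = cost)) +
  geom_point(colour = "blue") +
  scale_y_continuous(limits = c(-6000, 6000), labels = scales::comma) +
  labs(x = "QALY Difference", y = "Cost Difference")

print(p)
```

As with ylim(), points outside the limits are dropped with a warning.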

How do I add a legend to identify vertical lines in ggplot?

I have a chart that shows mobile usage by operating system. I'd like to add vertical lines to identify when those operating systems were released. I'll go through the chart and then the code.
The chart -
The code -
dev %>%
group_by(os) %>%
mutate(monthly_change = prop - lag(prop)) %>%
ggplot(aes(month, monthly_change, color = os)) +
geom_line() +
geom_vline(xintercept = as.numeric(ymd("2013-10-01"))) +
geom_text(label = "KitKat", x = as.numeric(ymd("2013-10-01")) + 80, y = -.5)
Instead of adding the text in the plot, I'd like to create a legend to identify each of the lines. I'd like to give each of them its own color and then have a legend to identify each. Something like this -
Can I make my own custom legend like that?
1) Define a data frame that contains the line data and then use geom_vline with it. Note that BOD is a data frame that comes with R.
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
geom_vline(aes(xintercept = xintercept, color = Lines), line.data, size = 1) +
scale_colour_manual(values = line.data$color)
2) Alternately put the labels right on the plot itself to avoid an extra legend. Using the line.data frame above. This also has the advantage of avoiding possible multiple legends with the same aesthetic.
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
annotate("text", line.data$xintercept, max(BOD$demand), hjust = -.25,
label = line.data$Lines) +
geom_vline(aes(xintercept = xintercept), line.data, size = 1)
3) If the real problem is that you want two color legends then there are two packages that can help.
3a) ggnewscale Any color geom that appears after invoking new_scale_color will get its own scale.
library(ggnewscale)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
scale_colour_manual(values = c("red", "orange")) +
new_scale_color() +
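# map colour to the Lines column (not to the colour strings) so that the
# legend labels and the drawn colours cannot get out of step
geom_vline(aes(xintercept = xintercept, colour = Lines), line.data,
size = 1) +
scale_colour_manual(values = setNames(line.data$color, line.data$Lines))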
3b) relayer The experimental relayer package (only on GitHub) allows one to define two color aesthetics, color and color2, say, and then have separate scales for each one.
library(dplyr)
library(relayer)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
geom_vline(aes(xintercept = xintercept, colour2 = Lines), line.data,
size = 1) %>% rename_geom_aes(new_aes = c("colour" = "colour2")) +
scale_colour_manual(aesthetics = "colour", values = c("red", "orange")) +
scale_colour_manual(aesthetics = "colour2", values = line.data$color)
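Option 1 can be adapted to a date axis like the one in the question. The sketch below assumes made-up usage data and a hypothetical releases data frame; only the KitKat date comes from the question, and the second release is invented for illustration. The line layer carries no colour mapping here, so a single colour scale is enough.

```r
library(ggplot2)

set.seed(42)
# Hypothetical monthly usage changes (invented values)
usage <- data.frame(
  month = seq(as.Date("2013-01-01"), by = "month", length.out = 24),
  monthly_change = rnorm(24, sd = 0.1)
)

# One row per vertical line; the second entry is illustrative only
releases <- data.frame(
  date    = as.Date(c("2013-10-01", "2014-11-01")),
  release = c("KitKat", "Lollipop")
)

# Mapping colour inside aes() is what makes the lines appear in the legend
p <- ggplot(usage, aes(x = month, y = monthly_change)) +
  geom_line() +
  geom_vline(data = releases, aes(xintercept = date, colour = release)) +
  scale_colour_manual(name = "Release",
                      values = c(KitKat = "red", Lollipop = "blue"))

print(p)
```

If the lines themselves are already coloured by os, the vertical lines need a second scale (options 3a and 3b) or a different aesthetic such as linetype.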
You can definitely make your own custom legend, but it is a bit complicated, so I'll take you through it step-by-step with some fake data.
The fake data contained 100 samples from a normal distribution (monthly_change for your data), 5 groupings (similar to the os variable in your data) and a sequence of dates from a random starting point.
library(tidyverse)
library(lubridate)
y <- rnorm(100)
df <- tibble(y) %>%
mutate(os = factor(rep_len(1:5, 100)),
date = seq(from = ymd('2013-01-01'), by = 1, length.out = 100))
You already use the colour aes for your call to geom_line, so you will need to choose a different aes to map onto the calls to geom_vline. Here, I use linetype and a call to scale_linetype_manual to manually edit the linetype legend to how I want it.
ggplot(df, aes(x = date, y = y, colour = os)) +
geom_line() +
# set `xintercept` to your date and `linetype` to the name of the os which starts
# at that date in your `aes` call; set colour outside of the `aes`
geom_vline(aes(xintercept = min(date),
linetype = 'os 1'), colour = 'red') +
geom_vline(aes(xintercept = median(date),
linetype = 'os 2'), colour = 'blue') +
# in the call to `scale_linetype_manual`, `name` will be the legend title;
# set `values` to 1 for each os to force a solid vertical line;
# use `guide_legend` and `override.aes` to change the colour of the lines in the
# legend to match the colours in the calls to `geom_vline`
scale_linetype_manual(name = 'lines',
values = c('os 1' = 1,
'os 2' = 1),
guide = guide_legend(override.aes = list(colour = c('red',
'blue'))))
And there you go, a nice custom legend. Please do remember next time that if you provide your data, or a minimal reproducible example, we can better answer your question without having to generate fake data.

ggplot generating two legends when only one is wanted

In R I'm trying to generate a plot where I want to apply unique colors, line types, transparencies, and line thicknesses by case grouping. As currently implemented, two legends are generated instead of one, and the second legend is the only one whose title I can change. Presumably I've made a mistake; any help would be greatly appreciated.
Ultimately I want to generate a single legend and have the style changes and labeling changes take effect.
library(ggplot2)
temp_df <- data.frame(year = integer(50), value = numeric(50), case = character(50))
temp_df$year <- 1:50
temp_df$value <- runif(50)
temp_df$case <- "A"
df <- temp_df
temp_df$value <- runif(50)
temp_df$case <- "B"
df <- rbind(df, temp_df)
LineTypes <- c("solid", "dotted")
colors <- c("red", "black")
linealphas <- c(1, .8)
linesizes <- c(1, 2)
Plot <- ggplot(df, aes(x = year, y = value, group = case))+
geom_line(aes(linetype = case, color = case, size = case, alpha = case))+
scale_linetype_manual(values = LineTypes)+
scale_color_manual(values = colors)+
scale_y_continuous(limits = c(0, 1), labels = scales::percent)+
scale_alpha_manual(values = linealphas)+
scale_size_manual(values = linesizes)+
xlab("Year")+
ylab("Percentage%")+
labs(color = "Scenario")+
theme_minimal()
Plot
If you want ggplot to merge the legends then they all have to have the same title. You can specify the legend title with the name argument in the scales:
ggplot(df, aes(x = year, y = value, group = case))+
geom_line(aes(linetype = case, color = case, size = case, alpha = case)) +
scale_linetype_manual(values = LineTypes, name = "Scenario")+
scale_color_manual(values = colors, name = "Scenario")+
scale_y_continuous(limits = c(0, 1), labels = scales::percent)+
scale_alpha_manual(values = linealphas, name = "Scenario")+
scale_size_manual(values = linesizes, name = "Scenario")+
xlab("Year")+
ylab("Percentage%")+
theme_minimal()
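The same title can also be set for several aesthetics at once through labs(), which may be shorter than repeating name = in every scale. A minimal sketch with made-up data, restricted to colour and linetype to keep it short:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data in the same shape as the question's df (invented values)
toy <- data.frame(
  year  = rep(1:50, times = 2),
  value = runif(100),
  case  = rep(c("A", "B"), each = 50)
)

# Both aesthetics map the same variable and get the same title,
# so ggplot2 can merge them into a single legend.
p <- ggplot(toy, aes(x = year, y = value, group = case)) +
  geom_line(aes(colour = case, linetype = case)) +
  scale_colour_manual(values = c(A = "red", B = "black")) +
  scale_linetype_manual(values = c(A = "solid", B = "dotted")) +
  labs(colour = "Scenario", linetype = "Scenario") +
  theme_minimal()

print(p)
```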
A coworker pointed out a resolution to me: the key was to remove the guides so that only one of the styles I had defined was used for the legend.
guides(size = "none")+
guides(alpha = "none")+
guides(linetype = "none")+
His explanation for this was that R doesn't recognize that the vector of factors defining the properties of the plot are necessarily related. As a result it will generate multiple legends when only one is desired.
library(ggplot2)
temp_df<-data.frame(year=integer(50),value=numeric(50),case=character(50))
temp_df$year<-1:50
temp_df$value<-runif(50)
temp_df$case<-"A"
df<-temp_df
temp_df$value<-runif(50)
temp_df$case<-"B"
df<-rbind(df,temp_df)
LineTypes<-c("solid","dotted")
colors<-c("red","black")
linealphas<-c(1,.8)
linesizes<-c(1,2)
Plot<-ggplot(df,aes(x=year,y=value,group=case))+
geom_line(aes(linetype=case, color=case, size=case, alpha =case))+
scale_linetype_manual(values=LineTypes)+
scale_color_manual(values=colors)+
scale_y_continuous(limits=c(0,1),labels = scales::percent)+
scale_alpha_manual(values=linealphas)+
scale_size_manual(values=linesizes)+
xlab("Year")+
ylab("Percentage%")+
labs(color = "Scenario")+
guides(size = "none")+
guides(alpha = "none")+
guides(linetype = "none")+
theme_minimal()
Plot
Can't you just remove the line "labs(color = "Scenario")"?
This is the plot that gets generated. Not sure if it's missing anything that you need.
The result seems fine to me:

Resources