R: Displaying a legend using ggplot2 with multiple data frames

I am using ggplot to plot multiple dataframes in one plot.
ggplot(data2_CY, aes(x = Date, y = AVERAGETOTALCOST)) + geom_point(color = 'blue') + geom_point(data = data2_CN, color = 'red') +
geom_point(data = data2_LY, color = 'cyan') + geom_point(data = data2_LN, color = 'magenta') +
labs(title = "Trends Over Time", x = "Time", y = "Average Total Cost") + scale_x_date(breaks = seq(as.Date("2015-01-01"), as.Date("2019-07-01"), by = "6 months"), date_labels = "%b\n%Y") +
scale_color_manual(values = c('C&Y' = 'blue', 'C&N' = 'red', 'L&Y' = 'cyan', 'L&N' = 'magenta'))
The code above returns a single plot with the points in the correct colors. However, the scale_color_manual() part of the code does not produce a legend. I am trying to display the legend to the right of the graph.
Thanks for the help!

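scale_color_manual() only controls colors that are mapped to a variable inside aes(). Colors set as fixed values outside aes(), as in geom_point(color = 'blue'), never produce a legend. My advice is to merge all these data frames into one, with a column that identifies which data frame each row came from, map color to that column, and plot everything from the single data frame.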
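A minimal sketch of the merge approach, assuming each of the four data frames has a Date and an AVERAGETOTALCOST column. The make_fake() helper and its values are hypothetical stand-ins for the real data. dplyr::bind_rows() stacks the frames, and its .id argument stores the argument names in a new column that color can be mapped to:

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-ins for data2_CY, data2_CN, data2_LY and data2_LN
dates <- seq(as.Date("2015-01-01"), as.Date("2019-07-01"), by = "6 months")
make_fake <- function(offset) {
  data.frame(Date = dates, AVERAGETOTALCOST = offset + seq_along(dates))
}
data2_CY <- make_fake(100)
data2_CN <- make_fake(110)
data2_LY <- make_fake(120)
data2_LN <- make_fake(130)

# Stack the frames; .id creates a "group" column holding the argument names
all_data <- bind_rows(
  "C&Y" = data2_CY, "C&N" = data2_CN, "L&Y" = data2_LY, "L&N" = data2_LN,
  .id = "group"
)

# color is now mapped inside aes(), so scale_color_manual() builds a legend
p <- ggplot(all_data, aes(x = Date, y = AVERAGETOTALCOST, color = group)) +
  geom_point() +
  labs(title = "Trends Over Time", x = "Time", y = "Average Total Cost", color = "Group") +
  scale_x_date(breaks = dates, date_labels = "%b\n%Y") +
  scale_color_manual(values = c("C&Y" = "blue", "C&N" = "red",
                                "L&Y" = "cyan", "L&N" = "magenta"))
print(p)
```

The legend is drawn on the right by default, so no theme() change is needed for that placement.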

Related

Add a legend in two geom_points

I want to add a legend next to my plot that shows which color represents each variable. How can I do that? Here's my code:
scatter = ggplot(vehicles, aes(x = seq(1,108))) +
geom_point(aes(y = RV),size = 2,color = "blue" ) +
geom_point(aes(y= BOP), size = 2, color = "red")
scatter
I get the plot, but without a legend naming the series.
It would help if you could provide the vehicles data as well, but I'm guessing this would work:
scatter = ggplot(vehicles, aes(x = seq(1,108))) +
geom_point(aes(y = RV, colour = "RV"),size = 2 ) +
geom_point(aes(y= BOP, colour = "BOP"), size = 2)
scatter
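Mapping colour to a constant string inside aes() makes each layer a level of one discrete colour scale, and that scale is what draws the legend. To choose the colours yourself, add scale_colour_manual() with named values. A minimal sketch, using a hypothetical random stand-in for the vehicles data (108 rows with RV and BOP columns):

```r
library(ggplot2)

# Hypothetical stand-in for `vehicles`: 108 rows with RV and BOP columns
set.seed(1)
vehicles <- data.frame(RV = rnorm(108), BOP = rnorm(108))

scatter <- ggplot(vehicles, aes(x = seq_len(nrow(vehicles)))) +
  # Each constant string becomes one legend entry
  geom_point(aes(y = RV, colour = "RV"), size = 2) +
  geom_point(aes(y = BOP, colour = "BOP"), size = 2) +
  # Named values tie each label to a colour regardless of level order
  scale_colour_manual(name = "Variable", values = c(RV = "blue", BOP = "red")) +
  labs(x = "Index", y = "Value")
print(scatter)
```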

adding a label in geom_line in R

I have two very similar plots, each with two y-axes and each combining a bar plot and a line plot:
code:
sec_plot <- ggplot(data, aes(x = year, group = 1)) +
geom_col(aes(y = frequency), fill = "orange", alpha = 0.5) +
geom_line(aes(y = severity))
However, there are no labels. I want a legend entry for the bar plot as well as one for the line plot.
How can I add these labels to the plot when there is only one single group? Is there a way to specify them manually? So far I have only found options where the labels are added by specifying them in the aes.
EXTENSION (added afterwards):
getSecPlot <- function(data, xvar, yvar, yvarsec, groupvar){
if ("agegroup" %in% xvar) xvar <- get("agegroup")
# data <- data[, startYear:= as.numeric(startYear)]
data <- data[!claims == 0][, ':=' (scaled = get(yvarsec) * max(get(yvar))/max(get(yvarsec)),
param = max(get(yvar))/max(get(yvarsec)))]
param <- data[1, param] # important, otherwise not found in ggplot
sec_plot <- ggplot(data, aes_string (x = xvar, group = groupvar)) +
geom_col(aes_string(y = yvar, fill = groupvar, alpha = 0.5), position = "dodge") +
geom_line(aes(y = scaled, color = gender)) +
scale_y_continuous(sec.axis = sec_axis(~./(param), name = paste0("average ", yvarsec),labels = function(x) format(x, big.mark = " ", scientific = FALSE))) +
labs(y = paste0("total ", yvar)) +
scale_alpha(guide = 'none') +
theme_pubclean() +
theme(legend.title=element_blank(), legend.background = element_rect(fill = "white"))
}
plot.ExposureYearly <- getSecPlot(freqSevDataAge, xvar = "agegroup", yvar = "exposure", yvarsec = "frequency", groupvar = "gender")
plot.ExposureYearly
How can the same be done on a plot where both the line plot as well as the bar plot are separated by gender?
Here is a possible solution. The method I used was to move the color and fill inside the aes and then use scale_*_identity to create and format the legends.
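I also needed to rescale severity (dividing it by 1000) so that it fits on the same axis as frequency, because in ggplot2 a secondary axis is only a relabelled transformation of the primary axis and cannot have an independent scale.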
data<-data.frame(year= 2000:2005, frequency=3:8, severity=as.integer(runif(6, 4000, 8000)))
library(ggplot2)
library(scales)
sec_plot <- ggplot(data, aes(x = year)) +
geom_col(aes(y = frequency, fill = "orange"), alpha = 0.6) +
geom_line(aes(y = severity/1000, color = "black")) +
scale_fill_identity(guide = "legend", labels = "Claim frequency (Number of paid claims per 100 Insured exposure)", name = NULL) +
scale_color_identity(guide = "legend", labels = "Claim Severity (Average insurance payment per claim)", name = NULL) +
theme(legend.position = "bottom") +
scale_y_continuous(sec.axis = sec_axis(~ . * 1, labels = label_dollar(scale = 1000), name = "Severity")) + # relabels the 2nd axis: 1 unit = $1000 of severity
guides(fill = guide_legend(order = 1), color = guide_legend(order = 2)) # controls which legend is drawn first
sec_plot
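Instead of a hard-coded 1000, the scale factor can be computed from the data, which is what the param variable in the extension above does. A minimal sketch on the same toy data: severity is multiplied by k so its maximum matches the maximum frequency, and sec_axis() divides by k again so the right axis shows real severity values. The legends here use named manual scales rather than identity scales, which is a swapped-in choice, not the answer's method:

```r
library(ggplot2)

set.seed(42)
data <- data.frame(year = 2000:2005, frequency = 3:8,
                   severity = as.integer(runif(6, 4000, 8000)))

# Factor that maps severity onto the range of the primary (frequency) axis
k <- max(data$frequency) / max(data$severity)

sec_plot <- ggplot(data, aes(x = year)) +
  geom_col(aes(y = frequency, fill = "Claim frequency"), alpha = 0.6) +
  # Draw severity in primary-axis units ...
  geom_line(aes(y = severity * k, colour = "Claim severity")) +
  # ... and undo the scaling on the secondary axis
  scale_y_continuous(name = "Frequency",
                     sec.axis = sec_axis(~ . / k, name = "Severity")) +
  scale_fill_manual(name = NULL, values = c("Claim frequency" = "orange")) +
  scale_colour_manual(name = NULL, values = c("Claim severity" = "black")) +
  theme(legend.position = "bottom")
print(sec_plot)
```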

How do I add a legend to identify vertical lines in ggplot?

I have a chart that shows mobile usage by operating system. I'd like to add vertical lines to identify when those operating systems were released. I'll go through the chart and then the code.
The chart: (image not available)
The code -
dev %>%
group_by(os) %>%
mutate(monthly_change = prop - lag(prop)) %>%
ggplot(aes(month, monthly_change, color = os)) +
geom_line() +
geom_vline(xintercept = as.numeric(ymd("2013-10-01"))) +
geom_text(label = "KitKat", x = as.numeric(ymd("2013-10-01")) + 80, y = -.5)
Instead of adding the text in the plot, I'd like to create a legend that identifies each of the lines, giving each line its own color.
Can I make my own custom legend like that?
1) Define a data frame that contains the line data and then use geom_vline with it. Note that BOD is a data frame that comes with R.
library(ggplot2)
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
geom_vline(aes(xintercept = xintercept, color = Lines), line.data, linewidth = 1) + # use size = 1 before ggplot2 3.4.0
scale_colour_manual(values = setNames(line.data$color, line.data$Lines)) # named: each label keeps its own color
2) Alternatively, put the labels right on the plot itself, using the line.data frame above, to avoid an extra legend. This also has the advantage of avoiding possible multiple legends for the same aesthetic.
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
annotate("text", line.data$xintercept, max(BOD$demand), hjust = -.25,
label = line.data$Lines) +
geom_vline(aes(xintercept = xintercept), line.data, linewidth = 1)
3) If the real problem is that you want two color legends, then there are two packages that can help.
3a) ggnewscale: any color geom that appears after new_scale_color() is called gets its own scale.
library(ggplot2)
library(ggnewscale)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
scale_colour_manual(values = c("red", "orange")) +
new_scale_color() + # color layers after this point use a second, independent scale
geom_vline(aes(xintercept = xintercept, colour = Lines), line.data,
linewidth = 1) +
scale_colour_manual(values = setNames(line.data$color, line.data$Lines))
3b) relayer: the experimental relayer package (available only on GitHub) allows one to define two colour aesthetics, say colour and colour2, and then have separate scales for each one.
library(dplyr)
library(relayer)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
geom_vline(aes(xintercept = xintercept, colour2 = line.data$color), line.data,
size = 1) %>% rename_geom_aes(new_aes = c("colour" = "colour2")) +
scale_colour_manual(aesthetics = "colour", values = c("red", "orange")) +
scale_colour_manual(aesthetics = "colour2", values = line.data$color)
You can definitely make your own custom legend, but it is a bit complicated, so I'll take you through it step-by-step with some fake data.
The fake data contains 100 samples from a standard normal distribution (standing in for monthly_change in your data), 5 groupings (similar to the os variable in your data) and a sequence of daily dates starting from 2013-01-01.
library(tidyverse)
library(lubridate)
y <- rnorm(100)
df <- tibble(y) %>%
mutate(os = factor(rep_len(1:5, 100)),
date = seq(from = ymd('2013-01-01'), by = 1, length.out = 100))
You already use the colour aes for your call to geom_line, so you will need to choose a different aes to map onto the calls to geom_vline. Here, I use linetype and a call to scale_linetype_manual to manually edit the linetype legend to how I want it.
ggplot(df, aes(x = date, y = y, colour = os)) +
geom_line() +
# set `xintercept` to your date and `linetype` to the name of the os which starts
# at that date in your `aes` call; set colour outside of the `aes`
geom_vline(aes(xintercept = min(date),
linetype = 'os 1'), colour = 'red') +
geom_vline(aes(xintercept = median(date),
linetype = 'os 2'), colour = 'blue') +
# in the call to `scale_linetype_manual`, `name` will be the legend title;
# set `values` to 1 for each os to force a solid vertical line;
# use `guide_legend` and `override.aes` to change the colour of the lines in the
# legend to match the colours in the calls to `geom_vline`
scale_linetype_manual(name = 'lines',
values = c('os 1' = 1,
'os 2' = 1),
guide = guide_legend(override.aes = list(colour = c('red',
'blue'))))
And there you go, a nice custom legend. Please do remember next time that if you can provide your data, or a minimally reproducible example, we can better answer your question without having to generate fake data.

Why do two legends appear when manually editing in ggplot2?

I want to plot two lines, one solid and another one dotted, both with different colors. I'm having trouble dealing with the legends for this plot. Take this example:
library(ggplot2)
library(reshape2)
df = data.frame(time = 0:127,
mean_clustered = rnorm(128),
mean_true = rnorm(128)
)
test_data_long <- melt(df, id="time") # convert to long format
p = ggplot(data=test_data_long,
aes(x=time, y=value, colour=variable)) +
geom_line(aes(linetype=variable)) +
labs(title = "", x = "Muestras", y = "Amplitud", color = "Spike promedio\n") +
scale_color_manual(labels = c("Hallado", "Real"), values = c("blue", "red")) +
xlim(0, 127)
print(p)
Two legends appear, and on top of that, neither of them is correct (the one with the right colors has the wrong line styles, and the one with the right line styles is wrong in every other respect).
Why is this happening and how can I get the right legend to appear?
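ggplot2 merges the legends of different aesthetics into one only when their scales have the same title and the same labels. In your code the colour scale gets the title "Spike promedio\n" and custom labels, while the linetype scale keeps the default title (variable) and the default labels, so two separate legends are drawn. Give both scales the same name and labels: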
library(ggplot2)
library(reshape2)
data.frame(
time = 0:127,
mean_clustered = rnorm(128),
mean_true = rnorm(128)
) -> xdf
test_data_long <- melt(xdf, id = "time")
ggplot(
data = test_data_long,
aes(x = time, y = value, colour = variable)
) +
geom_line(aes(linetype = variable)) +
scale_color_manual(
name = "Spike promedio\n", labels = c("Hallado", "Real"), values = c("blue", "red")
) +
scale_linetype(
name = "Spike promedio\n", labels = c("Hallado", "Real")
) +
labs(
x = "Muestras", y = "Amplitud", title = ""
) +
xlim(0, 127)
Might I also suggest adjusting the space below the legend title with theme() instead of embedding "\n" in the title:
ggplot(data = test_data_long, aes(x = time, y = value, colour = variable)) +
geom_line(aes(linetype = variable)) +
scale_x_continuous(name = "Muestras", limits = c(0, 127)) +
scale_y_continuous(name = "Amplitud") +
scale_color_manual(name = "Spike promedio", labels = c("Hallado", "Real"), values = c("blue", "red")) +
scale_linetype(name = "Spike promedio", labels = c("Hallado", "Real")) +
labs(title = "") +
theme(legend.title = element_text(margin = margin(b=15)))
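The question asks for one solid and one dotted line, while the default linetype scale gives solid and dashed. A minimal sketch that swaps scale_linetype() for scale_linetype_manual() and keeps the name and labels identical on both scales, so the single merged legend shows both the colours and the line styles (random data, as in the question):

```r
library(ggplot2)
library(reshape2)

set.seed(1)
xdf <- data.frame(time = 0:127, mean_clustered = rnorm(128), mean_true = rnorm(128))
test_data_long <- melt(xdf, id.vars = "time")

legend_labels <- c("Hallado", "Real")
p <- ggplot(test_data_long,
            aes(x = time, y = value, colour = variable, linetype = variable)) +
  geom_line() +
  # Identical name and labels on both scales -> one merged legend
  scale_colour_manual(name = "Spike promedio", labels = legend_labels,
                      values = c("blue", "red")) +
  scale_linetype_manual(name = "Spike promedio", labels = legend_labels,
                        values = c("solid", "dotted")) +
  labs(x = "Muestras", y = "Amplitud")
print(p)
```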

Second Y-Axis in ggplot (R) [duplicate]

This question already has answers here:
ggplot with 2 y axes on each side and different scales
I am trying to make a plot with ggplot in a Shiny app in R and I need to set a second Y-axis in it. This plot has two types of graphics: lines and bars. I would like to represent the bars (depth of precipitation) on the left, and the lines (flows) on the right.
My current code is:
output$plotRout <- renderPlot({
ggplot(totalRR(),aes(x=time)) +
geom_bar(aes(y=mm), stat = "identity",fill = "dodgerblue",color = "black") +
geom_bar(aes(y=NetRain), stat = "identity",fill = "Cyan",color = "black") +
geom_line(aes(y=DirRun, colour = "Direct Runoff"), stat = "identity",color = "Red") +
geom_line(aes(y=BF, colour = "Baseflow"), stat = "identity",color = "Darkorange", linetype = "longdash") +
scale_y_continuous("Rainfall (mm)", sec.axis = sec_axis(~.*10, name = "Flow (m3/s)")) +
xlab("Time (h)")
})
In the resulting plot, the left axis shows the flow values, which should be on the right, whereas the rainfall values (the bars) are not displayed.
How could I make this plot putting the values of the bars (rainfall) on the left and the second y-axis on the right showing the values of the lines (flows)?
Many thanks in advance.
Victor
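One solution would be to make the Flow axis your primary y axis. This involves 1) scaling the rainfall data by 10 so it fits on the flow axis and then 2) transforming the secondary axis by /10 to get back the correct numbers for the Rainfall axis. The flows themselves are already in primary-axis units, so they are not rescaled: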
ggplot(totalRR(),aes(x=time)) +
geom_bar(aes(y=10*mm), stat = "identity",fill = "dodgerblue",color = "black") +
geom_bar(aes(y=10*NetRain), stat = "identity",fill = "Cyan",color = "black") +
geom_line(aes(y=DirRun, colour = "Direct Runoff"), stat = "identity",color = "Red") +
geom_line(aes(y=BF, colour = "Baseflow"), stat = "identity",color = "Darkorange", linetype = "longdash") +
scale_y_continuous("Flow (m3/s)", sec.axis = sec_axis(~./10, name = "Rainfall (mm)")) +
xlab("Time (h)")
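If you would rather keep rainfall on the left and flows on the right, as the question asks, the scaling goes the other way: keep the bars unscaled, divide the flows by the factor, and multiply it back on the secondary axis. A minimal sketch, assuming a factor of 10 as in the question's sec_axis(~.*10) and using a hypothetical data frame rr in place of the Shiny reactive totalRR(). Mapping fill and colour to labels inside aes() (and not overriding them with fixed colours) also gives the bars and lines a legend:

```r
library(ggplot2)

# Hypothetical stand-in for totalRR(): hourly rainfall (mm) and flows (m3/s)
set.seed(7)
rr <- data.frame(time = 1:24, mm = round(runif(24, 0, 5), 1))
rr$NetRain <- rr$mm * 0.6
rr$DirRun  <- cumsum(rr$NetRain) * 0.8
rr$BF      <- seq(2, 6, length.out = 24)

k <- 10  # assumed ratio: 1 mm on the left axis = 10 m3/s on the right axis

p <- ggplot(rr, aes(x = time)) +
  # Rainfall stays in its own units on the primary (left) axis
  geom_col(aes(y = mm, fill = "Rainfall"), colour = "black") +
  geom_col(aes(y = NetRain, fill = "Net rainfall"), colour = "black") +
  # Flows are divided by k so they fit the rainfall axis ...
  geom_line(aes(y = DirRun / k, colour = "Direct Runoff")) +
  geom_line(aes(y = BF / k, colour = "Baseflow"), linetype = "longdash") +
  # ... and the secondary (right) axis multiplies by k to show real flows
  scale_y_continuous("Rainfall (mm)",
                     sec.axis = sec_axis(~ . * k, name = "Flow (m3/s)")) +
  scale_fill_manual(name = NULL,
                    values = c("Rainfall" = "dodgerblue", "Net rainfall" = "cyan")) +
  scale_colour_manual(name = NULL,
                      values = c("Direct Runoff" = "red", "Baseflow" = "darkorange")) +
  xlab("Time (h)")
print(p)
```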
