Join lines of different aesthetics in ggplot2 - r

I'm trying to solve the following problem: how to join lines of different aesthetics/groups in ggplot and maintain the color of a certain group. It is best to see the examples below for a better understanding of what I mean.
Consider the following code:
set.seed(870123)
library(ggplot2)
example_data <- data.frame(Date = seq.Date(as.Date("2020-01-01"),length.out = 20,by = "month"),
Series = rnorm(20) + 1,
Grouping = c(rep(c("Group1"),10),rep(c("Group2"),10)))
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line(aes(color = Grouping)) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
Which generates the plot:
My problem is that ggplot won't join the lines where the grouping variable changes. I would like, for example, a red line joining the last blue point to the first red point. One way to solve this would be to not specify any color aesthetic for the geom_line portion, as below, but this is NOT what I want:
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line() +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
The only way I am able to achieve what I want is with the really bad solution demonstrated below...
### Dirty Solution
temporary <- example_data[as.Date("2020-10-01") == example_data$Date,]
temporary$Grouping <- "Group2"
example_data <- rbind(example_data,temporary)
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line(aes(color = Grouping)) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
Plot generated, which is what I would like to achieve through a better solution:
Is there any way to achieve what I want with ggplot2? I wasn't able to find a better solution.

library(ggplot2); library(dplyr)
ggplot(example_data,aes(x = Date, y = Series)) +
# use lag to connect backwards, lead to connect forwards
geom_segment(aes(color = Grouping, xend = lag(Date), yend = lag(Series))) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red"))
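The lag-based answer can also be spelled out with base R only, which makes it easier to see which segment gets which color. This is a minimal sketch under the assumption that each segment should take the color of the later of its two points; the `segments` data frame and its column names are introduced here for illustration and are not part of the original answer.

```r
library(ggplot2)

set.seed(870123)
example_data <- data.frame(
  Date = seq.Date(as.Date("2020-01-01"), length.out = 20, by = "month"),
  Series = rnorm(20) + 1,
  Grouping = rep(c("Group1", "Group2"), each = 10)
)

# One row per segment: start at point i, end at point i + 1.
n <- nrow(example_data)
segments <- data.frame(
  x = example_data$Date[-n],
  y = example_data$Series[-n],
  xend = example_data$Date[-1],
  yend = example_data$Series[-1],
  Grouping = example_data$Grouping[-1]  # colour by the later point (assumption)
)

p <- ggplot(example_data, aes(x = Date, y = Series, colour = Grouping)) +
  geom_segment(data = segments, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_point() +
  scale_colour_manual(values = c(Group1 = "blue", Group2 = "red"))
print(p)
```

Because the segments are built explicitly, no row contains NA and the original data frame does not need a duplicated row. Taking `Grouping[-n]` instead would color the joining segment blue.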

Related

Reset color aesthetic when using multiple ggplot2 geom_crossbar arguments

I'm trying to control the color of two separate calls to geom_crossbar, using green for the first plot and blue for the second. However, the second geom_crossbar call triggers the warning Scale for 'fill' is already present:
Warning: Adding another scale for 'fill', which will replace the existing
scale.
Here's an example of my code:
my.data %>%
ggplot(aes(site, npp_nofert)) +
geom_crossbar(aes(ymin=npp_nofert-npp.sd_nofert,ymax=npp_nofert+npp.sd_nofert,
fatten=1.0,fill=period),position='dodge', alpha=0.5) +
scale_fill_brewer(palette="Greens") +
#labs(y=expression(paste("MMM %",Delta," (+/- 1",sigma,")")), x="", fill="", title="") + theme_bw() +
labs(y="",x="", fill="", title="") + theme_bw() +
theme(legend.key.size=unit(1.0,"cm"),legend.direction="horizontal",legend.position=c(0.3,0.05),
axis.text.x=element_blank(),axis.ticks.x=element_blank(),
plot.title=element_text(size=12,margin=margin(t=5,b=-20)), legend.spacing=unit(0,"cm"),
text = element_text(size=15)) +
new_scale_fill() +
geom_crossbar(aes(ymin=npp_fert-npp.sd_fert,ymax=npp_fert+npp.sd_fert, fatten=1.0,fill=period),
position='dodge',alpha=0.5) +
scale_fill_brewer(palette="Blues")
And example output:
Unfortunately, I cannot dput() the data as I do not have permission to do that.
How can I set the first plot to green and the second to blue? Also, I just noticed that alpha appears in the legend. How can I remove that?
Notes: For the 1980 to 1999 period there is only a single plot (i.e., no treatment), so there will be no overlaying plots for that period. The x axis represents study sites; I can fix the labels later.
The general way to go about this would be to use the ggnewscale package, which allows you to 'reset' an aesthetic at some point in the plotting process.
Since there is no data to use, I'll make up some dummy data that has a vague semblance to what you're showing above.
library(ggplot2)
library(ggnewscale)
df <- data.frame(
x = 1:5,
blue_low = 1:5,
blue_mid = 2:6,
blue_high = 3:7,
green_low = 0:4,
green_mid = 2:6,
green_high = 4:8
)
ggplot(df, aes(x = 1, group = x)) +
geom_crossbar(aes(ymin = green_low, y = green_mid, ymax = green_high,
fill = as.factor(x)),
position = "dodge", alpha = 0.5) +
scale_fill_brewer(palette = "Greens") +
new_scale_fill() + # Important to put this after you defined the first scale
geom_crossbar(aes(ymin = blue_low, y = blue_mid, ymax = blue_high,
fill = paste0(x, "_blue")), # paste to differentiate scale
position = "dodge", alpha = 0.5) +
scale_fill_brewer(palette = "Blues")
Created on 2020-06-18 by the reprex package (v0.3.0)
I'm sure it won't be too difficult to take the new_scale_fill() and put it in the correct position in your plotting code, which I think is after scale_fill_brewer(palette="Greens").
So I've decided that the approach I was using for the plot looks terrible. A better solution, IMO, is to use geom_crossbar with geom_pointrange.
Here's an example using the data that teubrand provided:
library(ggplot2)
library(ggnewscale)
df <- data.frame(
x = 1:5,
blue_low = 1:5,
blue_mid = 2:6,
blue_high = 3:7,
green_low = 0:4,
green_mid = 2:6,
green_high = 4:8
)
ggplot(df, aes(x = 1, group = x)) +
geom_crossbar(aes(ymin = green_low, y = green_mid, ymax = green_high,
fill = as.factor(x)),
position = "dodge", alpha = 0.8) +
scale_fill_brewer(palette = "Greens") +
new_scale_fill() + # Important to put this after you defined the first scale
geom_pointrange(aes(ymin = blue_low, y = blue_mid, ymax = blue_high,
fill = as.factor(x)),
position = position_dodge(width=0.9), color="gray30") +
scale_fill_brewer(palette = "Blues")

ggplot2 custom legend with multiple geom overlays: guide_legend() confusion

I want to create a customized legend that distinguishes two plotted geoms using appropriate shape and color. I see that guide_legend() should be involved, but my legend is presented with both shapes overlayed one on the other for both components of the legend. What is the right way to build these individual legend components using distinct shapes and colors? Thank you.
library(dplyr)
library(ggplot2)
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
ggplot() +
geom_point(data = df, aes(x = year, y = annualNitrogen, fill="green"), shape=24, color="green", size = 4) +
geom_point(data = df, aes(x = year, y = annualPotassium, fill="blue"), color="blue", shape=21, size = 4) +
guides(fill = guide_legend(override.aes = list(color=c("green", "blue"))),
shape = guide_legend(override.aes = list(shape=c(21, 24)))
) +
scale_fill_manual(name = 'cumulative\nmaterial',
values = c("blue"="blue" , "green"="green" ),
labels = c("potassium" , "nitrogen") ) +
theme_bw() +
theme(legend.position="bottom")
Here it helps to transform to "long" format which is more in line with how ggplot is designed to be used when separating factor levels within a single time series.
This allows us to map shape and color directly, rather than having to manually assign different values to multiple plotted series, like you do in your question.
library(tidyverse)
df %>%
pivot_longer(-year, names_to = "element") %>%
ggplot(aes(x=year, y = value, fill = element, shape = element, color = element)) +
geom_point(size = 4)+
scale_color_manual(values = c("green", "blue"))
Put your df into the long format that ggplot2 expects, here with tidyr::gather (superseded by pivot_longer in current tidyr, but still available). You only need one geom_point for this, not a separate geom per variable. You can then map both shape and color to the variable in a single call to geom_point.
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
df <- tidyr::gather(df, key = 'variable', value='value', annualNitrogen, annualPotassium)
ggplot(df) +
geom_point(aes(x = year, y = value, shape = variable, color = variable)) +
scale_color_manual(
name = 'cumulative\nmaterial',
values = c(
"annualPotassium" = "blue",
"annualNitrogen" = "green"),
labels = c("nitrogen", "potassium")) + # labels follow the alphabetical order of the levels
guides(shape = "none")
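Both answers rely on long-format data. To also keep the triangle and circle shapes from the question in a single merged legend, one option is to give the shape, fill, and colour scales the same name, breaks, and labels. The sketch below assumes that approach; the helper variables `legend_title`, `legend_breaks`, and `legend_labels` are names made up for this example.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(
  year = 2010:2020,
  annualNitrogen = seq(100, 200, 10),
  annualPotassium = seq(500, 600, 10)
)

# One row per year and element instead of one column per element.
long <- pivot_longer(df, -year, names_to = "element", values_to = "value")

# Shared legend settings (hypothetical helper variables).
legend_title <- "cumulative\nmaterial"
legend_breaks <- c("annualNitrogen", "annualPotassium")
legend_labels <- c("nitrogen", "potassium")
point_colours <- c(annualNitrogen = "green", annualPotassium = "blue")

p <- ggplot(long, aes(x = year, y = value, shape = element,
                      fill = element, colour = element)) +
  geom_point(size = 4) +
  scale_shape_manual(name = legend_title, breaks = legend_breaks,
                     labels = legend_labels,
                     values = c(annualNitrogen = 24, annualPotassium = 21)) +
  scale_fill_manual(name = legend_title, breaks = legend_breaks,
                    labels = legend_labels, values = point_colours) +
  scale_colour_manual(name = legend_title, breaks = legend_breaks,
                      labels = legend_labels, values = point_colours) +
  theme_bw() +
  theme(legend.position = "bottom")
print(p)
```

Passing `breaks` explicitly ties each label to a level by position, so the labels cannot silently end up on the wrong series.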

Adjusting rugplot in ggplot2

Below is the code for a graph I am making for an article I am working on. The plot shows the predicted probabilities along a range of values in my data set. Along the x-axis is a rug plot that shows the distribution of trade share values (I have provided the code and an image of the graph):
sitc8 <- ggplot() + geom_line(data=plotdat8, aes(x = lagsitc8100, y = PredictedProbabilityMean), size = 2, color="blue") +
geom_ribbon(data=plotdat8, aes(x = lagsitc8100, ymin = lowersd, ymax = uppersd),
fill = "grey50", alpha=.5) +
ylim(c(-0.75, 1.5)) +
geom_hline(yintercept=0) +
geom_rug(data=multi.sanctions.bust8.full@frame, aes(x=lagsitc8100), col="black", size=1.0, sides="b") +
xlab("SITC 8 Trade Share") +
ylab("Probability of Sanctions Busting") +
theme(panel.grid.major = element_line(colour = "gray", linetype = "dotted"), panel.grid.minor =
element_blank(), panel.background = element_blank())
My question is: is it possible to change the color of the lines of the rugplot of trade share in which the event I am modeling occurs? In other words, I would like to add red lines or red dots along those values of trade share when my event = 1.
Is this possible?
Sure. You'd just have to add a color argument within an aes() function call within geom_rug().
Here's some code to create a dummy data frame.
library(tidyverse)
set.seed(42)
dummy_data <- tibble(x_var = rnorm(100),
y_var = abs(rnorm(100)) * x_var) %>%
rownames_to_column(var = "temp_row") %>%
mutate(color_id = if_else(as.numeric(temp_row) <= 50,
"Type A",
"Type B"))
And here's a ggplot call where the color for geom_rug is mapped to a character column named color_id
ggplot(data = dummy_data, mapping = aes(x = x_var, y = y_var)) +
geom_smooth(method = "lm") +
geom_rug(mapping = aes(color = color_id), sides = "b")
Update:
Following OP's comment, here's an updated version. If it's a numeric vector of 0s and 1s, you have to tell ggplot to treat it as a dichotomous variable. You can do that by wrapping it in a call to factor() for instance.
For the color we can set that manually using scale_color_manual(). So the changes to the code are the following.
color_id is now a vector of 0s and 1s.
the color is now mapped to factor(color_id)
the color scale is determined using scale_color_manual
library(tidyverse)
set.seed(42)
dummy_data <- tibble(x_var = rnorm(100),
y_var = abs(rnorm(100)) * x_var) %>%
rownames_to_column(var = "temp_row") %>%
mutate(color_id = if_else(as.numeric(temp_row) <= 50,
0,
1))
ggplot(data = dummy_data, mapping = aes(x = x_var, y = y_var)) +
geom_smooth(method = "lm") +
geom_rug(mapping = aes(color = factor(color_id)), sides = "b") +
scale_color_manual(values = c("black", "red")) +
labs(color = "This takes two values")
Definitely possible. Here's an example using iris, and a dynamic condition in the rug. You could also do two rugs, if you chose.
library(tidyverse)
iris %>%
ggplot(aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
geom_rug(aes(color = Petal.Length >3), sides = "b")
# Second example, output not shown
iris %>%
ggplot(aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
geom_rug(data = subset(iris, Petal.Length > 3), color = "black", sides = "b") +
geom_rug(data = subset(iris, Petal.Length <= 3), color = "red", sides = "b")

How do I add a legend to identify vertical lines in ggplot?

I have a chart that shows mobile usage by operating system. I'd like to add vertical lines to identify when those operating systems were released. I'll go through the chart and then the code.
The chart -
The code -
dev %>%
group_by(os) %>%
mutate(monthly_change = prop - lag(prop)) %>%
ggplot(aes(month, monthly_change, color = os)) +
geom_line() +
geom_vline(xintercept = as.numeric(ymd("2013-10-01"))) +
geom_text(label = "KitKat", x = as.numeric(ymd("2013-10-01")) + 80, y = -.5)
Instead of adding the text in the plot, I'd like to create a legend to identify each of the lines. I'd like to give each of them its own color and then have a legend to identify each. Something like this -
Can I make my own custom legend like that?
1) Define a data frame that contains the line data and then use geom_vline with it. Note that BOD is a data frame that comes with R.
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
geom_vline(aes(xintercept = xintercept, color = Lines), line.data, size = 1) +
scale_colour_manual(values = line.data$color)
2) Alternately put the labels right on the plot itself to avoid an extra legend. Using the line.data frame above. This also has the advantage of avoiding possible multiple legends with the same aesthetic.
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
annotate("text", line.data$xintercept, max(BOD$demand), hjust = -.25,
label = line.data$Lines) +
geom_vline(aes(xintercept = xintercept), line.data, size = 1)
3) If the real problem is that you want two color legends then there are two packages that can help.
3a) ggnewscale Any color geom that appears after invoking new_scale_color will get its own scale.
library(ggnewscale)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
scale_colour_manual(values = c("red", "orange")) +
new_scale_color() +
geom_vline(aes(xintercept = xintercept, colour = Lines), line.data,
size = 1) +
scale_colour_manual(values = line.data$color)
3b) relayer The experimental relayer package (only on GitHub) allows one to define two color aesthetics, color and color2, say, and then have separate scales for each one.
library(dplyr)
library(relayer)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
geom_vline(aes(xintercept = xintercept, colour2 = line.data$color), line.data,
size = 1) %>% rename_geom_aes(new_aes = c("colour" = "colour2")) +
scale_colour_manual(aesthetics = "colour", values = c("red", "orange")) +
scale_colour_manual(aesthetics = "colour2", values = line.data$color)
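Approach 1 carries over to a date axis like the one in the question. The following is a rough sketch with made-up usage data. The `usage` and `releases` data frames are hypothetical, and the release dates are only illustrative placeholders. It also assumes no other layer maps colour, so the vertical lines own the colour legend.

```r
library(ggplot2)

set.seed(1)
# Made-up monthly series standing in for the real usage data.
usage <- data.frame(
  month = seq(as.Date("2013-01-01"), by = "month", length.out = 24),
  monthly_change = rnorm(24, sd = 0.2)
)

# One row per vertical line; the dates are illustrative only.
releases <- data.frame(
  release = c("KitKat", "Lollipop"),
  date = as.Date(c("2013-10-01", "2014-11-01"))
)

p <- ggplot(usage, aes(x = month, y = monthly_change)) +
  geom_line() +
  # Mapping colour inside aes() is what creates the legend entries.
  geom_vline(data = releases, aes(xintercept = date, colour = release)) +
  scale_colour_manual(name = "release",
                      values = c(KitKat = "red", Lollipop = "blue"))
print(p)
```

When the lines in the main layer already use colour, as in the question, the linetype or ggnewscale variants are the ones to reach for instead.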
You can definitely make your own custom legend, but it is a bit complicated, so I'll take you through it step-by-step with some fake data.
The fake data contained 100 samples from a normal distribution (monthly_change for your data), 5 groupings (similar to the os variable in your data) and a sequence of dates from a random starting point.
library(tidyverse)
library(lubridate)
y <- rnorm(100)
df <- tibble(y) %>%
mutate(os = factor(rep_len(1:5, 100)),
date = seq(from = ymd('2013-01-01'), by = 1, length.out = 100))
You already use the colour aes for your call to geom_line, so you will need to choose a different aes to map onto the calls to geom_vline. Here, I use linetype and a call to scale_linetype_manual to manually edit the linetype legend to how I want it.
ggplot(df, aes(x = date, y = y, colour = os)) +
geom_line() +
# set `xintercept` to your date and `linetype` to the name of the os which starts
# at that date in your `aes` call; set colour outside of the `aes`
geom_vline(aes(xintercept = min(date),
linetype = 'os 1'), colour = 'red') +
geom_vline(aes(xintercept = median(date),
linetype = 'os 2'), colour = 'blue') +
# in the call to `scale_linetype_manual`, `name` will be the legend title;
# set `values` to 1 for each os to force a solid vertical line;
# use `guide_legend` and `override.aes` to change the colour of the lines in the
# legend to match the colours in the calls to `geom_vline`
scale_linetype_manual(name = 'lines',
values = c('os 1' = 1,
'os 2' = 1),
guide = guide_legend(override.aes = list(colour = c('red',
'blue'))))
And there you go, a nice custom legend. Please do remember next time that if you can provide your data, or a minimally reproducible example, we can better answer your question without having to generate fake data.

Selecting entries in legend of ggplot in R

I'm creating a figure using ggplot. That figure has 27 lines that I want to show but not emphasize, and two lines, mean and weighted mean, that I want to emphasize. I would like only these last two lines to appear into the legend of the plot. Here is my code:
p_plot <- ggplot(data = dta, aes(x = date, y = premium, colour = State)) +
geom_line(, show_guide=FALSE) +
scale_color_manual(values=c(rep("gray60", 27)))
p_plot <- p_plot + geom_line(aes(y = premium.m), colour = "blue", size = 1.25,
show_guide=TRUE) + geom_line(aes(y = premium.m.w), colour = "red",
size = 1.25, show_guide=c(TRUE)) + ylab("Pe/pg")
p_plot
The show_guide = FALSE statement in the first geom_line seems to be overridden by the other show_guide=TRUE statements. How can I limit the number of entries in the legend of my figures to the lines "premium.m" and "premium.m.w"? Thank you.
I think this should answer your question: (the code's been slightly modified but the concept is the same)
dta <- data.frame(date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), "months"), 26),
premium = rnorm(12*26),
State = rep(letters, each = 12))
library(ggplot2)
p_plot <- ggplot(data = dta) +
geom_line(aes(x = date, y = premium, group = State), colour = "grey60")
p_plot <- p_plot + geom_line(aes(x = unique(date), y = as.numeric(tapply(premium, date, mean)), colour = "mean"),
size = 1.25) +
geom_line(aes(x = unique(date), y = as.numeric(tapply(premium, date, median)), colour = "median"),
size = 1.25) + ylab("Pe/pg") + scale_color_discrete("stats")
p_plot
However, this is just a (ugly) workaround and far from the best practice for data visualisation (especially for the purposes ggplot has been implemented for). Anyway, I could provide you with a more elegant solution if you edited your question adding more details.
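One possible tidier variant lets ggplot2 compute the summaries itself with stat_summary, so every aesthetic keeps the same length as the data (current ggplot2 versions may reject an aesthetic whose length differs from the data). This is a sketch rather than the answerer's method. It assumes ggplot2 3.3.0 or later for the `fun` argument, and the object name `p_stats` is made up for this example.

```r
library(ggplot2)

set.seed(1)
dta <- data.frame(
  date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), "months"), 26),
  premium = rnorm(12 * 26),
  State = rep(letters, each = 12)
)

p_stats <- ggplot(dta, aes(x = date, y = premium)) +
  # Colour is set, not mapped, so the state lines stay out of the legend.
  geom_line(aes(group = State), colour = "grey60") +
  # Constant strings inside aes() create the two legend entries.
  stat_summary(aes(colour = "mean"), fun = mean, geom = "line") +
  stat_summary(aes(colour = "median"), fun = median, geom = "line") +
  scale_colour_manual("stats", values = c(mean = "blue", median = "red")) +
  ylab("Pe/pg")
print(p_stats)
```

Only the two summary lines appear in the legend, which is what the question asked for.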
