ggplot2: Making changes to symbols in the legend - r

I'm having a problem making the symbols in the legend of my plot match those in the plot itself.
Suppose the data has four columns like this
data = data.frame(x = sample(1:10, 10, replace=TRUE), y = sample(1:10, 10, replace=TRUE),
Rank = sample(1:10, 10, replace = TRUE), Quantified = factor(sample(1:2, 10, replace = TRUE))
)
I would like points to be different sizes (distinguished by 'Rank') and represented by different symbols (crosses and open circles, distinguished by 'Quantified').
My code is
ggplot(data, aes(x = x, y = y)) +
geom_point(aes(size = Rank, shape = Quantified)) +
scale_shape_manual("Quantified", labels = c("Yes", "No"), values = c(1, 4)
)
The symbols in the plot are as I want them.
My problem is that I would like the circles in the top legend to be unfilled as they are in the plot.
I've tried a variety of commands in different parts of the code (e.g., fill = "white") but nothing seems to work quite right.
Any suggestions?

Now that I'm sure it's what you want:
library(ggplot2)
ggplot(data, aes(x = x, y = y)) +
geom_point(aes(size = Rank, shape = Quantified)) +
scale_shape_manual("Quantified", labels = c("Yes", "No"), values = c(1, 4)) +
guides(size = guide_legend(override.aes = list(shape = 1)))

Related

Why geom_line legend if show.legend = FALSE and why different colours

After executing the code here below, I was wondering:
1- Why "A.line" and "B.line" variables appear in the geom_point() legend.
2- why there are four colors in the legend.
I guess both answers are related, but I cannot tell what is going on.
I would like to have the legend just with "A.points" and "B.points".
I would also like the same colors in both lines and points (I guess this I can do manually).
Thanks in advance for your help.
Best,
David
data.frame(x = rep(1:2,2),
names.points = rep(c("A.point","B.point"), 2),
y.point = c(2, 4, 7, 9),
names.lines = rep(c("A.line","B.line"), each = 2),
y.line = c(3, 3, 8, 8)) %>%
ggplot() +
geom_point(aes(x = x, y = y.point, group = names.points, colour = names.points), size = 5) +
geom_line(aes(x = x, y = y.line, group = names.lines, colour = names.lines), show.legend = FALSE)
Legends are not tied to geoms but to scales: they display the categories (or the range of values) mapped to an aesthetic. Hence, you get four colors because you have four categories mapped to the color aesthetic. The geoms only determine what is drawn inside each legend key via the so-called key glyph, which is a point for geom_point and a line for geom_line. And show.legend = FALSE only means that the key glyph for geom_line is not drawn in the legend key, i.e. each legend key shows only a point but no line.
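One way to check this, as a rough sketch: build the plot and look at the colours each layer actually receives from the shared colour scale. The names df, p, point_cols and line_cols are illustrative and not part of the original code; layer_data() is the ggplot2 helper that returns the computed data for a layer.

```r
library(ggplot2)

df <- data.frame(
  x = rep(1:2, 2),
  names.points = rep(c("A.point", "B.point"), 2),
  y.point = c(2, 4, 7, 9),
  names.lines = rep(c("A.line", "B.line"), each = 2),
  y.line = c(3, 3, 8, 8)
)

p <- ggplot(df) +
  geom_point(aes(x = x, y = y.point, colour = names.points), size = 5) +
  geom_line(aes(x = x, y = y.line, group = names.lines, colour = names.lines),
            show.legend = FALSE)

# Colours assigned to each layer by the single colour scale
point_cols <- unique(layer_data(p, 1)$colour)
line_cols  <- unique(layer_data(p, 2)$colour)

# Number of distinct colours across both layers
n_colours <- length(union(point_cols, line_cols))
print(n_colours)
```

Both layers feed the same discrete colour scale, so the scale is trained on all four category names, regardless of show.legend.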
To remove the categories related to the lines from your legend use e.g. the breaks argument of scale_color_discrete instead.
library(ggplot2)
library(dplyr)
data.frame(
x = rep(1:2, 2),
names.points = rep(c("A.point", "B.point"), 2),
y.point = c(2, 4, 7, 9),
names.lines = rep(c("A.line", "B.line"), each = 2),
y.line = c(3, 3, 8, 8)
) %>%
ggplot() +
geom_point(aes(x = x, y = y.point, group = names.points, colour = names.points), size = 5) +
geom_line(aes(x = x, y = y.line, group = names.lines, colour = names.lines), show.legend = FALSE) +
scale_color_discrete(breaks = c("A.point", "B.point"))
UPDATE To fix your issue with the colors you could use a named color vector:
pal_col <- rep(c("darkblue","darkred"), 2)
names(pal_col) <- c("A.point", "B.point", "A.line", "B.line")
data.frame(
x = rep(1:2, 2),
names.points = rep(c("A.point", "B.point"), 2),
y.point = c(2, 4, 7, 9),
names.lines = rep(c("A.line", "B.line"), each = 2),
y.line = c(3, 3, 8, 8)
) %>%
ggplot() +
geom_point(aes(x = x, y = y.point, group = names.points, colour = names.points), size = 5) +
geom_line(aes(x = x, y = y.line, group = names.lines, colour = names.lines), show.legend = FALSE) +
scale_color_manual(breaks = c("A.point", "B.point"),
values = pal_col)

How to produce neat label positions in the ggplot2 line chart?

I have a line chart built using ggplot2. It looks like the following:
Lines are close to each other and data labels are overlapping. It is not convenient. It would be better if light red labels were below the line and green labels where there is room for them. Something of the sort:
This post is helpful. However, I do not know in advance for which line it would be better to put labels above and for which it would be better to keep them below. Therefore I am looking for a generic solution.
ggrepel does a great job of organizing labels, but I cannot figure out how to make it work in my case. I tried different parameters. Here is one of the simplest variants (not the best looking):
Questions:
Is there any way in R to make the chart look like the 2nd picture?
I think ggrepel computes the best label position taking into account the size of the chart. If I export the chart to PowerPoint, for example, the size of the PowerPoint chart might be different from the size used to get optimal data label positions. Is there any way to pass the size of the chart to ggrepel?
Here is the code I used to generate the data and charts:
library(ggplot2)
library(ggrepel)
set.seed(1)
x = rep(1:20, 3)
y = c(runif(20, 10, 11),
runif(20, 11, 12),
runif(20, 12, 13))
z = rep(c("a", "b", "c"), each = 20)
df = data.frame(x = x, y = y, z = z)
ggplot(data = df, aes(x = x, y = y, group = z, color = z)) +
geom_line() +
geom_text(aes(label = round(y, 1)), nudge_y = 1) +
ylim(c(0, 20))
ggplot(data = df, aes(x = x, y = y, group = z, color = z)) +
geom_line() +
geom_text_repel(aes(label = round(y, 1)), nudge_y = 1) +
ylim(c(0, 20))
Changing the theme to theme_bw() and removing the vertical gridlines with {ggExtra}'s removeGridX() gets the plot closer to your second image. I also increased the size of the lines, limited the axes, and changed geom_text_repel to geom_label_repel to improve readability.
library(ggplot2)
library(ggrepel)
library(ggExtra)
set.seed(1)
x = rep(1:20, 3)
y = c(runif(20, 10, 11),
runif(20, 11, 12),
runif(20, 12, 13))
z = rep(c("a", "b", "c"), each = 20)
df = data.frame(x = x, y = y, z = z)
ggplot(data = df, aes(x = x, y = y, group = z, color = z)) +
theme_bw() + removeGridX() +
geom_line(size = 2) +
geom_label_repel(aes(label = round(y, 1)),
nudge_y = 0.5,
point.size = NA,
segment.color = NA,
min.segment.length = 0.1,
key_glyph = draw_key_path) +
scale_x_continuous(breaks=seq(0,20,by=1)) +
scale_y_continuous(breaks = seq(0, 14, 2), limits = c(0, 14))
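On the second question (passing the chart size to ggrepel): as far as I know, ggrepel has no size argument. It positions the labels when the plot is drawn, for whatever device size is in use. A possible sketch, then, is to render the plot at the size of the target slide with ggsave(). The 10 x 5.625 inch dimensions, the dpi, and the output file name are assumptions chosen for illustration.

```r
library(ggplot2)
library(ggrepel)

set.seed(1)
df <- data.frame(
  x = rep(1:20, 3),
  y = c(runif(20, 10, 11), runif(20, 11, 12), runif(20, 12, 13)),
  z = rep(c("a", "b", "c"), each = 20)
)

p <- ggplot(df, aes(x = x, y = y, group = z, color = z)) +
  geom_line() +
  # seed makes the label layout repeatable between renders
  geom_label_repel(aes(label = round(y, 1)), seed = 1)

# Hypothetical output path and slide-sized dimensions (16:9, in inches)
out_file <- file.path(tempdir(), "labels_16x9.png")
ggsave(out_file, plot = p, width = 10, height = 5.625, units = "in", dpi = 150)
```

Inserting the saved image into the slide at the same aspect ratio should keep the label layout as rendered.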

Combined scatter and line ggplot with proper legend

I am trying to find a clear approach for combined scatter and line plots with ggplot2 that have an appropriate legend. The following works, in principle, but with warnings:
library("ggplot2")
library("dplyr")
## 2 data sets, one for the lines, one for the points
tbl <- tibble(
f = rep(letters[1:2], each = 10),
x = rep(1:10, 2),
y = c(1e-4 * exp(1:10), log(1:10))
)
obs <- tibble(
f = rep("c", 5),
x = seq(2, 10, 2),
y = log(seq(2, 10, 2)) + rnorm(5, sd = 0.1)
)
rbind(tbl, obs) %>%
ggplot(aes(x, y, color = f, linetype = f)) +
geom_line(show.legend = TRUE) +
geom_point(show.legend = TRUE, aes(shape = f), size = 3) +
scale_linetype_manual(values=c("solid", "solid", "blank")) +
scale_shape_manual(values=c(NA, NA, 16))
but I would like to get rid of warnings and to write something like:
scale_shape_manual(values=c("none", "none", "circle"))
Is there already a "none" or "empty" shape code? Several past answers have been suggested on SO, but I wonder if there is a recent canonical way.

Use position_jitterdodge without mapping aesthetic

I would like to produce a plot like the one obtained with the code below. However, I would like to dodge by "replicate", but without actually mapping an aesthetic (because I would like to assign fill and colors to other aesthetics).
dataset <- tibble(sample = rep(c("Sample1","Sample2","Sample3", "Sample4"), each = 25),
replicate = sample(x = c("A", "B"), size = 100, replace = TRUE),
value = rnorm(n = 100, mean = 0, sd = 10))
ggplot(data = dataset, aes(x = sample, y = value, fill = replicate)) +
geom_point(position = position_jitterdodge(jitter.width = 0.15, dodge.width = 0.75),
show.legend = F)
I had hoped that using group = replicate instead of fill = replicate would work, but it doesn't. I can imagine a workaround using, for example, alpha = replicate as an aesthetic and setting scale_alpha_manual(values = c(1, 1)) so that both replicates are drawn identically, but I don't find this solution ideal and would like to keep all aesthetics (other than x and y) available for further use.
ggplot(data = dataset, aes(x = sample, y = value, alpha = replicate)) +
geom_point(position = position_jitterdodge(jitter.width = 0.15, dodge.width = 0.75),
show.legend = F) +
scale_alpha_manual(values = c(1, 1))
The plot that I expect to get is:
I hope my question makes sense, any hint ?
Best,
Yvan
You could unite the sample and replicate columns and use that as the x-axis, injecting a 'Placeholder' value for spacing between samples.
library(tidyverse)
set.seed(20181101)
dataset <- tibble(sample = rep(c("Sample1","Sample2","Sample3", "Sample4"), each = 25),
replicate = sample(x = c("A", "B"), size = 100, replace = TRUE),
value = rnorm(n = 100, mean = 0, sd = 10))
dataset %>%
bind_rows({
#create a dummy placeholder to allow for spacing between samples
data.frame(sample = unique(dataset$sample),
replicate = rep("Placeholder", length(unique(dataset$sample))),
stringsAsFactors = FALSE)
}) %>%
#unite the sample & replicate columns, and use it as the new x-axis
unite(sample_replicate, sample, replicate, remove = FALSE) %>%
ggplot(aes(x = sample_replicate, y = value, color = replicate)) +
geom_jitter() +
#only have x-axis labels for each sample
scale_x_discrete(breaks = paste0("Sample", 1:length(unique(dataset$sample)), "_B"),
labels = paste0("Sample ", 1:length(unique(dataset$sample)))) +
labs(x = "Sample") +
#don't show the Placeholder value in the legend
scale_color_discrete(breaks = c("A", "B"))

Adding line plot with boxplot

Sample data
set.seed(123)
par(mfrow = c(1,2))
dat <- data.frame(years = rep(1980:2014, each = 8), x = sample(1000:2000, 35*8 ,replace = T))
boxplot(dat$x ~ dat$years, ylim = c(500, 4000))
I have another dataset that has a single value for some selected years
ref.dat <- data.frame(years = c(1991:1995, 2001:2008), x = sample(1000:2000, 13, replace = T))
plot(ref.dat$years, ref.dat$x, type = "b")
How can I add the line plot on top of the boxplot?
With ggplot2 you could do this:
library(ggplot2)
ggplot(dat, aes(x = years, y = x)) +
geom_boxplot(data = dat, aes(group = years)) +
geom_line(data = ref.dat, colour = "red") +
geom_point(data = ref.dat, colour = "red", shape = 1) +
coord_cartesian(ylim = c(500, 4000)) +
theme_bw()
The trick here is to figure out the x-axis on the boxplot. You have 35 boxes and they are plotted at the x-coordinates 1, 2, 3, ..., 35 - i.e. year - 1979. With that, you can add the line with lines as usual.
set.seed(123)
dat <- data.frame(years = rep(1980:2014, each = 8),
x = sample(1000:2000, 35*8 ,replace = T))
boxplot(dat$x ~ dat$years, ylim = c(500, 2500))
ref.dat <- data.frame(years = c(1991:1995, 2001:2008),
x = sample(1000:2000, 13, replace = T))
lines(ref.dat$years-1979, ref.dat$x, type = "b", pch=20)
The points were a bit hard to see, so I changed the point style to pch = 20. I also used a smaller range on the y-axis to leave less blank space.
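The offset of 1979 is specific to this data. A more general sketch, assuming the boxes are drawn in sorted order of the distinct years (the default for a numeric grouping variable), looks up each reference year's box position with match(). The name pos is illustrative.

```r
set.seed(123)
dat <- data.frame(years = rep(1980:2014, each = 8),
                  x = sample(1000:2000, 35 * 8, replace = TRUE))
ref.dat <- data.frame(years = c(1991:1995, 2001:2008),
                      x = sample(1000:2000, 13, replace = TRUE))

boxplot(x ~ years, data = dat, ylim = c(500, 2500))

# Box i is drawn at x = i, in sorted order of the distinct years,
# so match() gives the x-position of each reference year
pos <- match(ref.dat$years, sort(unique(dat$years)))
lines(pos, ref.dat$x, type = "b", pch = 20)
```

This also works when some years are missing from dat, where a fixed offset would place the line at the wrong boxes.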
