Legend with ggplot2 and geom_point - r

I am trying to create a legend when using ggplot2 and geom_point. I have a dataframe that looks like this
d <- data.frame(variable = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
value = rnorm(10, 3),
Schoolave = rnorm(10, 3),
districtave = rnorm(10, 3),
max = rnorm(10, 3),
min = rnorm(10, 3))
I want to make a plot that looks like this.
plot <- ggplot(data = d, aes(x = variable, y = value)) + geom_errorbar(ymax = d$max, ymin = d$min)
plot <- plot + coord_flip()
plot <- plot + geom_point(data = d, aes(x = variable, y = value),
shape = 1, size = 5)
plot <- plot + geom_point(data = d, aes(x = variable, y = districtave), shape = 0, size = 4)
plot <- plot + geom_point(data = d, aes(x = variable, y = Schoolave), shape = 2, size = 3)
plot <- plot + theme_bw() + theme(legend.position= "bottom")
plot
I would like a legend that tells the reader that the circle is your average score, the triangle is the average score for your school, and the square is the average for the district. I have looked for a way to do this but can't find one. Any help would be appreciated.

ggplot likes tidy data, which means you have to melt your data into long format.
library(reshape2)
d.points = melt(d[, c("variable", "value", "Schoolave", "districtave")],
id = "variable",
variable.name = "type")
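If it helps to see what melt() is doing here, the same long layout can be built by hand in base R. This is only a sketch: the seed and the d.long name are my own choices, not part of the answer.

```r
# Rebuild the wide data from the question (seed added for reproducibility)
set.seed(1)
d <- data.frame(variable = LETTERS[1:10],
                value = rnorm(10, 3),
                Schoolave = rnorm(10, 3),
                districtave = rnorm(10, 3))

# Columns that should become one "value" column plus a "type" label
cols <- c("value", "Schoolave", "districtave")

# Stack the three columns on top of each other and record where each row came from
d.long <- data.frame(
  variable = rep(d$variable, times = length(cols)),
  type     = factor(rep(cols, each = nrow(d)), levels = cols),
  value    = unlist(d[cols], use.names = FALSE)
)

str(d.long)    # 3 columns: variable, type, value
head(d.long)
```

Each original row now appears three times, once per type, which is what lets a single shape mapping generate the legend.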
Now your data has a single column that we'll map to shape, so the auto legend will work.
And there's no need to add to the same plot and save it every single line, just add it all at once.
And please don't specify the data$column inside of aes()! It will cause problems if you ever want to facet or do more advanced plots. You specify the data upfront, so you don't have to say it again inside aes(). And do use aes() for all your aesthetic mappings, like the min and max of the errorbar.
plot <- ggplot(data = d, aes(x = variable, y = value)) +
geom_errorbar(aes(ymax = max, ymin = min)) +
coord_flip() +
geom_point(data = d.points,
aes(x = variable, y = value, shape = type),
size = 5) +
scale_shape_manual(values = 1:3, name = "") +
theme_bw() +
theme(legend.position= "bottom")
plot
If you want nicer labels, you can specify them in the shape scale. On a horizontal legend, I like to add in a little white space, something like this:
scale_shape_manual(values = 1:3, name = "",
labels = c("Individual Average ", "School Average ", "District Average ")) +
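For completeness, here is a sketch of the whole plot with the labels applied and with the shapes the question asked for (circle, triangle, square). The seed, the min/max columns built as value plus or minus 1, and the named values vector are assumptions of mine; the long data is built in base R so the block runs without reshape2.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(variable = LETTERS[1:10],
                value = rnorm(10, 3),
                Schoolave = rnorm(10, 3),
                districtave = rnorm(10, 3))
d$min <- d$value - 1   # hypothetical error-bar bounds
d$max <- d$value + 1

# Long format: one row per (variable, type) pair
cols <- c("value", "Schoolave", "districtave")
d.points <- data.frame(
  variable = rep(d$variable, times = length(cols)),
  type     = factor(rep(cols, each = nrow(d)), levels = cols),
  value    = unlist(d[cols], use.names = FALSE)
)

p <- ggplot(d, aes(x = variable, y = value)) +
  geom_errorbar(aes(ymin = min, ymax = max)) +
  geom_point(data = d.points, aes(shape = type), size = 4) +
  # shape 1 = open circle, 2 = open triangle, 0 = open square
  scale_shape_manual(values = c(value = 1, Schoolave = 2, districtave = 0),
                     name = "",
                     labels = c("Individual Average  ",
                                "School Average  ",
                                "District Average  ")) +
  coord_flip() +
  theme_bw() +
  theme(legend.position = "bottom")
p
```

Naming the entries of values ties each shape to a level of type, so the mapping does not depend on the order of the factor levels.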

Related

ggplot2: How to integrate legends for scales with parsed labels?

Problem
In ggplot2, legends for different scales are usually integrated into a single, combined legend whenever possible. This worked fine for me so far. However, when I try parsing the scale labels to include mathematical symbols in the legend, this does not seem to work.
See this example:
# example data
d <- data.frame(x = 1:3, y = rep(0,3), f = c("a[1]", "a[2]", "a[3]"))
# plot
p <- ggplot(data = d, aes(x = x, y = y, color = f, shape = f)) +
geom_point() +
guides(
color = guide_legend(title = "F"),
shape = guide_legend(title = "F")
)
The following gives the plot with custom values for shapes/colors and with the legends combined as intended.
# plot + custom shapes/colors
p +
scale_color_manual(name = "F", values = c("red", "blue", "green")) +
scale_shape_manual(name = "F", values = c(16, 15, 18))
However, when parsing the labels, the labels come out as expected, but the legends are no longer combined.
# plot + custom shapes/colors + parsed labels
parse.labels <- function(x) parse(text = x)
p +
scale_color_manual(name = "F", labels = parse.labels, values = c("red", "blue", "green")) +
scale_shape_manual(name = "F", labels = parse.labels, values = c(16, 15, 18))
Note that the result is the same with scale_*_discrete instead of scale_*_manual. Similarly, specifying identical names for the two scales with guides(shape = guide_legend(title = "F"), color = guide_legend(title = "F")) does not change this behavior.
Question
How can I integrate the two legends while maintaining the parsed labels?
Use scales::parse_format() instead of the parse() function from base R, and you should be fine:
library(scales)
ggplot(data = d, aes(x = x, y = y, color = f, shape = f)) +
geom_point() +
scale_color_manual(name = "F",
labels = parse_format(),
values = c("red", "blue", "green")) +
scale_shape_manual(name = "F",
labels = parse_format(),
values = c(16, 15, 18))
I think this has something to do with how parse returns an expression tagged with automatically-generated srcfile / wholeSrcref attributes by default, while parse_format does not. These additional attributes prevent the two scales from being merged together, since they are not identical.
(Using function(x) parse(text = x, srcfile = NULL) in both scales will also work, same as above, but I find the function from scales to be less verbose.)
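The attribute difference described above can be inspected directly in base R. This is a small sketch: keep.source = TRUE is set explicitly because non-interactive sessions (Rscript, for example) do not keep source references by default, and the variable names are mine.

```r
# Parse the same label twice: once keeping source references, once without
with_src    <- parse(text = "a[1]", keep.source = TRUE)
without_src <- parse(text = "a[1]", srcfile = NULL)

# Attributes attached to each result
names(attributes(with_src))
names(attributes(without_src))

# The parsed call itself is the same in both cases
identical(with_src[[1]], without_src[[1]])
```

The first result carries the srcref-related attributes and the second does not, which fits the explanation that the extra attributes are what keeps the two legends from merging.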
I would suggest this approach using the labels argument in scale_*_discrete() and saving your values for labels in a new vector:
library(ggplot2)
# example data
d <- data.frame(x = 1:3, y = rep(0,3), f = c("a[1]", "a[2]", "a[3]"))
#Labs
lab1 <- c(expression(a[1]),
expression(a[2]),
expression(a[3]))
# plot
ggplot(data = d, aes(x = x, y = y, color = f, shape = f)) +
geom_point() +
guides(
color = guide_legend(title = "F"),
shape = guide_legend(title = "F")
)+
scale_color_discrete(labels = lab1) +
scale_shape_discrete(labels = lab1)

R: How to combine grouping and colour aesthetic in ggplot line plot

I am trying to create a line plot with 2 types of measurements, but my data is missing some x values. In "Line break when no data in ggplot2" I found how to create a plot that breaks the line where there is no data, but it does not allow plotting 2 lines (one for each Type).
1) When I try
ggplot(Data, aes(x = x, y = y, group = grp)) + geom_line()
it makes only one line, but with a break where there is no data
2) When I try
ggplot(Data, aes(x = x, y = y, col = Type)) +
geom_line()
it makes 2 lines, but without a break where there is no data
3) When I try
ggplot(Data, aes(x = x, y = y, col = Type, group = grp)) +
geom_line()
it makes an unreadable chart
4) Of course I could combine Type and grp to make a new variable, but then the legend is not nice, and I get 4 groups (and colours) instead of 2.
5) I could also do something like the following, but it does not produce a legend, and in my real dataset I have way too many Types to do that
ggplot() +
geom_line(data = Data[Data$Type == "A",], aes(x = x, y = y, group = grp), col = "red") +
geom_line(data = Data[Data$Type == "B",], aes(x = x, y = y, group = grp), col = "blue")
Data sample:
Data <- data.frame(x = c(1:100, 201:300), y = rep(c(1, 2), 100), Type = rep(c("A", "B"), 100), grp = rep(c(1, 2), each = 100))
One way is to use interaction() to specify a grouping of multiple columns:
library(ggplot2)
Data <- data.frame(x = c(1:100, 201:300), y = rep(c(1, 2), 100), Type = rep(c("A", "B"), 100), grp = rep(c(1, 2), each = 100))
ggplot(Data, aes(x = x, y = y, col = Type, group = interaction(grp,Type))) +
geom_line()
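To see why this gives two colours but four separate line segments, it may help to look at what interaction() returns for the sample data. This is just a sketch in base R; the name g is mine.

```r
Data <- data.frame(x = c(1:100, 201:300),
                   y = rep(c(1, 2), 100),
                   Type = rep(c("A", "B"), 100),
                   grp = rep(c(1, 2), each = 100))

# One level per (grp, Type) combination
g <- interaction(Data$grp, Data$Type)
levels(g)
table(g)

# The colour aesthetic still sees only the Type values
unique(Data$Type)
```

Each line is drawn within one level of g, so nothing is connected across the gap in x, while the legend is built from Type alone and keeps two entries.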

How to start ggplot2 geom_bar from different origin

I'd like to start a bar chart at somewhere other than the y = 0. In my case, I want to start the bar chart at y = 1.
As an example, let's say that I build an identity geom_bar() chart with ggplot2.
df <- data.frame(values = c(1, 2, 0),
labels = c("A", "B", "C"))
library(ggplot2)
ggplot(df, aes(x = labels, y = values, fill = labels, colour = labels)) +
geom_bar(stat="identity")
Now, I'm not asking how to set scale or axis limits. I want bars representing values less than 1 to flow down from y = 1.
It needs to look like this...but with a different y axis:
Any advice?
You could just change the labels manually, as shown in the other answer. However, I think conceptually the better solution is to define a transformation object that transforms the y axis scale as requested. With that approach, you're literally just modifying the relative baseline for the bar plots, and you can still set breaks and limits as you normally would.
df <- data.frame(values = c(1,2,0), labels = c("A", "B", "C"))
t_shift <- scales::trans_new("shift",
transform = function(x) {x-1},
inverse = function(x) {x+1})
ggplot(df, aes(x = labels, y = values, fill = labels, colour = labels)) +
geom_bar(stat="identity") +
scale_y_continuous(trans = t_shift)
Setting breaks and limits:
ggplot(df, aes(x = labels, y = values, fill = labels, colour = labels)) +
geom_bar(stat="identity") +
scale_y_continuous(trans = t_shift,
limits = c(-0.5, 2.5),
breaks = c(0, 1, 2))
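A quick way to convince yourself that the shift moves the baseline is to apply the two functions by hand. This is a sketch in plain R: shift_forward and shift_inverse are my names for the transform and inverse passed to trans_new() above, and it relies on my understanding that bars are drawn from 0 on the transformed scale.

```r
# The same pair of functions used in t_shift
shift_forward <- function(x) x - 1
shift_inverse <- function(x) x + 1

values <- c(1, 2, 0)

# Where each bar ends on the transformed scale
transformed <- shift_forward(values)

# The bar baseline sits at 0 on the transformed scale;
# mapping that back gives the baseline in data units
baseline <- shift_inverse(0)

data.frame(values, transformed, baseline)
```

A value of 0 becomes negative after the shift, so its bar hangs below the baseline, and a value of 1 has zero height.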
You could use
ggplot(df, aes(x = labels, y = values-1, fill = labels, colour = labels)) +
geom_bar(stat = "identity") +
scale_y_continuous(name = 'values',
breaks = seq(-1, 1, 0.5),
labels = seq(-1, 1, 0.5) + 1)

Side by side violin plots for multiple iteration

Here is my data
set.seed(42)
dat = data.frame(iter = rep(1:3, each = 10),
variable = rep(rep(letters[1:2], each = 5), 3),
value = rnorm(30))
I know I can draw violin plots for a and b with
library(ggplot2)
ggplot(data = dat, aes (x = variable, y = value)) + geom_violin()
But how do I draw violin plots for each iteration of a and b, so that there are three plots for a next to three plots for b? I have done this previously using base plot, but I am looking for a better solution since the number of iterations, as well as the number of 'a's and 'b's, keeps changing.
There are two possible ways. One would be by adding a fill command, the other using facet_wrap (or facet_grid)
With fill:
ggplot(data = dat, aes (x = variable, y = value, fill = as.factor(iter))) + geom_violin(position = "dodge")
Or using facet_wrap:
ggplot(data = dat, aes (x = as.factor(iter), y = value)) + geom_violin(position = "dodge") + facet_wrap(~variable)
Maybe there is a better way but in this kind of situation I usually create a new variable:
set.seed(42)
dat = data.frame(iter = rep(1:3, each = 10),
variable = rep(rep(letters[1:2], each = 5), 3),
value = rnorm(30))
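library(dplyr)  # provides %>% and mutate()
dat <- dat %>% mutate(x_axis = as.factor(as.numeric(factor(variable))*100 + 10*iter))
# the middle level of each group is named "a" / "b" so it can serve as the axis break
levels(dat$x_axis) <- c("a1", "a", "a3", "b1", "b", "b3")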
ggplot(data = dat,
aes(x = x_axis,
y = value, fill =variable)) + geom_violin() + scale_x_discrete(breaks = c("a","b"))
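Since the question says the number of iterations and variables keeps changing, the hand-written level names could probably be replaced by a factor built with interaction(). This is a sketch under that assumption, not the answerer's method; lex.order = TRUE is used so that all the a levels come before the b levels.

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(iter = rep(1:3, each = 10),
                  variable = rep(rep(letters[1:2], each = 5), 3),
                  value = rnorm(30))

# One level per (variable, iter) pair, ordered a.1, a.2, a.3, b.1, ...
dat$x_axis <- interaction(dat$variable, dat$iter, lex.order = TRUE)
levels(dat$x_axis)

p <- ggplot(dat, aes(x = x_axis, y = value, fill = variable)) +
  geom_violin()
p
```

This keeps one violin per combination without having to edit the level names whenever the data changes.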

Change the symbol in a legend key in ggplot2

This R code produces a ggplot2 graph in which the legend key contains the letter "a" repeated in red, blue and green.
x <- rnorm(9); y <- rnorm(9); s <- rep(c("F","G","K"), each = 3)
df <- data.frame(x, y, s)
require(ggplot2)
ggplot(df, aes(x = x, y = y, col = s, label = s)) +
geom_text() +
scale_colour_discrete(name = "My name", breaks = c("F","K","G"), labels = c("Fbig","Kbig","Gbig"))
I would like to replace the repeated "a" in the legend key with "F", "K" and "G".
Is this possible please? Thank you.
Adapting code from this answer:
The idea is to suppress the geom_text legend and let geom_point supply the legend instead. The point size is set to zero so the points are not visible in the plot, and the size and shape of the points in the legend are then set in the guides statement.
x <- rnorm(9); y <- rnorm(9); s <- rep(c("F","G","K"), each = 3)
df <- data.frame(x, y, s)
#
require(ggplot2)
#
ggplot(df, aes(x = x, y = y, colour = s, label = s)) +
geom_point(size = 0, stroke = 0) + # OR geom_point(shape = "") +
geom_text(show.legend = FALSE) +
guides(colour = guide_legend(override.aes = list(size = 5, shape = c(utf8ToInt("F"), utf8ToInt("K"), utf8ToInt("G"))))) +
scale_colour_discrete(name = "My name", breaks = c("F","K","G"), labels = c("Fbig","Kbig","Gbig"))
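The utf8ToInt() calls work because integer shape values in the ASCII range are drawn as the corresponding character (as with pch in base graphics). A small base-R sketch of the conversion, with keys and codes as my own names:

```r
# Letters to show in the legend keys, in the order of the breaks
keys <- c("F", "K", "G")

# Convert each letter to its integer code point
codes <- vapply(keys, utf8ToInt, integer(1))
codes

# Converting back recovers the letters
intToUtf8(codes, multiple = TRUE)
```

Building the codes from a vector like this avoids repeating utf8ToInt() for every letter when there are many legend keys.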
To manually rename the legend, add
+ scale_x_continuous(breaks = c(x1, x2, x3), labels = c("F", "K", "G"))
where x1, x2 and x3 are the point numbers.
