Plotting lines and the group aesthetic in ggplot2 - r

This question follows on from an earlier question and its answers.
First some toy data:
df = read.table(text =
"School Year Value
A 1998 5
B 1999 10
C 2000 15
A 2000 7
B 2001 15
C 2002 20", sep = "", header = TRUE)
The original question asked how to plot Value-Year lines for each School. The answers more or less correspond to p1 and p2 below. But also consider p3.
library(ggplot2)
(p1 <- ggplot(data = df, aes(x = Year, y = Value, colour = School)) +
geom_line() + geom_point())
(p2 <- ggplot(data = df, aes(x = factor(Year), y = Value, colour = School)) +
geom_line(aes(group = School)) + geom_point())
(p3 <- ggplot(data = df, aes(x = factor(Year), y = Value, colour = School)) +
geom_line() + geom_point())
Both p1 and p2 do the job. The difference between p1 and p2 is that p1 treats Year as numeric whereas p2 treats Year as a factor. Also, p2 contains a group aesthetic in geom_line. But when the group aesthetic is dropped as in p3, the lines are not drawn.
The question is: Why is the group aesthetic necessary when the x-axis variable is a factor but the group aesthetic is not needed when the x-axis variable is numeric?

In the words of Hadley himself:
The important thing [for a line graph with a factor on the horizontal axis] is to manually specify the grouping. By
default ggplot2 uses the combination of all categorical variables in
the plot to group geoms - that doesn't work for this plot because you
get an individual line for each point. Manually specify group = 1
indicates you want a single line connecting all the points.
You can actually group the points in very different ways as demonstrated by koshke here
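The default grouping rule described in the quote can be imitated in base R to see why p3 draws no lines. This is only a sketch: it mimics the rule with interaction() rather than calling ggplot2's internals, and groups_p1, groups_p2 and groups_p3 are names made up for the illustration.

```r
# Same toy data as in the question
df <- read.table(text =
"School Year Value
A 1998 5
B 1999 10
C 2000 15
A 2000 7
B 2001 15
C 2002 20", header = TRUE)

# p1: Year is numeric, so School is the only categorical variable in the plot
groups_p1 <- factor(df$School)

# p2: the grouping is stated explicitly with aes(group = School)
groups_p2 <- factor(df$School)

# p3: factor(Year) is categorical too, so the default group is the
# combination of factor(Year) and School
groups_p3 <- interaction(factor(df$Year), df$School, drop = TRUE)

table(groups_p1)  # two rows per school, so each school gets a line
table(groups_p3)  # one row per group, so there is nothing to connect
```

With ggplot2 loaded, the same counts should be visible in the group column returned by layer_data(p2) and layer_data(p3).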

Related

Where does ggplot set the order of the color scheme?

I have a data set that I'm showing in a series of violin plots with one categorical variable and one continuous numeric variable. When R generated the original series of violins, the categorical variable was plotted alphabetically (I rotated the plot, so it appears alphabetically from bottom to top). I thought it would look better if I sorted them using the numeric variable.
When I do this, the color scheme doesn't turn out as I wanted it to. It's like R assigned the colors to the violins before it sorted them; after the sorting, they kept their original colors - which is the opposite of what I wanted. I wanted R to sort them first and then apply the color scheme.
I'm using the viridis color scheme here, but I've run into the same thing when I used RColorBrewer.
Here is my code:
# Start plotting
g <- ggplot(NULL)
# Violin plot
g <- g + geom_violin(data = df, aes(x = reorder(catval, -numval,
na.rm = TRUE), y = numval, fill = catval), trim = TRUE,
scale = "width", adjust = 0.5)
(snip)
# Specify colors
g <- g + scale_fill_viridis_d() # the violins are coloured through the fill aesthetic
# Remove legend
g <- g + theme(legend.position = "none")
# Flip for readability
g <- g + coord_flip()
# Produce plot
g
Here is the resulting plot.
If I leave out the reorder() argument when I call geom_violin(), the color order is what I would like, but then my categorical variable is sorted alphabetically and not by the numeric variable.
Is there a way to get what I'm after?
I think this is a reproducible example of what you're seeing. In the diamonds dataset, the mean price does not follow the quality ordering of cut: "Fair" diamonds, for example, have a higher mean price than "Ideal" ones, and "Good" and "Very Good" are nearly tied.
library(ggplot2) # provides the diamonds dataset
library(dplyr)
diamonds %>%
group_by(cut) %>%
summarize(mean_price = mean(price))
# A tibble: 5 x 2
  cut       mean_price
  <ord>          <dbl>
1 Fair           4359.
2 Good           3929.
3 Very Good      3982.
4 Premium        4584.
5 Ideal          3458.
By default, reorder uses the mean of the sorting variable, so Good is plotted above Very Good. But the fill is still based on the un-reordered variable cut, which is a factor in order of quality.
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price, fill = cut)) +
geom_violin() +
coord_flip()
If you want the color to follow the ordering, then you could reorder upstream of ggplot2, or reorder in both aesthetics:
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price,
fill = reorder(cut, -price))) +
geom_violin() +
coord_flip()
Or
diamonds %>%
mutate(cut = reorder(cut, -price)) %>%
ggplot(aes(x = cut, y = price, fill = cut)) +
geom_violin() +
coord_flip()
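
To see what reorder() changes, it can help to compare factor levels directly. The sketch below uses a made-up data frame (toy, with columns named catval and numval after the question; the values are arbitrary) and base R only.

```r
# Hypothetical data: three categories with means 2, 11 and 6
toy <- data.frame(
  catval = c("a", "a", "b", "b", "c", "c"),
  numval = c(1, 3, 10, 12, 5, 7)
)

# Levels of the plain variable: alphabetical
plain_levels <- levels(factor(toy$catval))

# Levels after reorder(): sorted by the mean of -numval,
# so the category with the largest mean comes first
sorted_levels <- levels(reorder(toy$catval, -toy$numval))

plain_levels   # "a" "b" "c"
sorted_levels  # "b" "c" "a"
```

A discrete fill scale assigns colours in level order, so mapping fill to the plain variable follows the first ordering while x follows the second. That mismatch is what the question describes.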

Trouble graphing two columns on one graph in R

I just started learning R. I melted my dataframe and used ggplot to get this graph. There are supposed to be two lines on the same graph, but the lines connecting the points seem random.
Correct points plotted, but wrong lines.
# Melted my data to create new dataframe
AvgSleep2_DF <- melt(AvgSleep_DF , id.vars = 'SleepDay_Date',
variable.name = 'series')
# Plotting
ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
geom_point(aes(colour = series)) +
geom_line(aes(colour = series))
With or without the aes(colour = series) in the geom_line results in the same graph. What am I doing wrong here?
The following might explain what geom_line() does when you specify aesthetics in the ggplot() call.
I assign a deliberate colour column that differs from the series specification!
df <- data.frame(
x = c(1,2,3,4,5)
, y = c(2,2,3,4,2)
, colour = factor(c(rep(1,3), rep(2,2)))
, series = c(1,1,2,3,3)
)
df
x y colour series
1 1 2 1 1
2 2 2 1 1
3 3 3 1 2
4 4 4 2 3
5 5 2 2 3
Through inheritance, each layer picks up the aesthetics defined in the ggplot() call unless the layer overrides them.
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) + # setting the size to stress point layer call
geom_line() # geom_line will "inherit" a "grouping" from the colour set above
This gives you
While we can control the "grouping" associated to each line(segment) as follows:
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) +
geom_line(aes(group = series)) # defining a specific grouping
Note: As I defined a separate "group" in the series column for the 3rd point, it is depicted - in this case - as a single point "line".

Boxplot and line with dual y-axis from two data frame using ggplot in R

I am using ggplot to put boxplot and line in the same plot. I have two data frames, here are snippets for these two DFs:
TMA.core variable value
1 I-5 H&E 356642.6
2 B-1 H&E 490276.9
3 B-13 H&E 460831.8
4 L-11 H&E 551614.2
5 B-6 H&E 663711.8
6 F-10 H&E 596832.8
(there are many variables.)
TMA.core Mean CoV
I-5 390829.7 0.15181577
B-1 414909.9 0.21738852
B-13 500829.8 0.39049256
L-11 537229.7 0.07387486
B-6 575698.9 0.44764127
F-10 589245.2 0.15382864
What I want to do is draw boxplot using the first data frame and then plot the CoV for the corresponding TMA core and connect using geom_line.
My code is:
ggplot() +
geom_boxplot(data = Merge_stats_melt, aes(x = reorder(TMA.core, value, FUN = mean), y = value)) +
geom_line(data = Merge_stas_mean_order, aes(x = reorder(TMA.core, Mean), y = CoV, group = 1)) +
scale_y_continuous(
# Add a second axis and specify its features
sec.axis = sec_axis(~./1000000, name = 'CoV')
)
Using this code I can draw the boxplot, but the line is always a horizontal line at y = 0.
How can I solve this issue?
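Whether you use one or two data frames doesn't really matter. sec_axis() only relabels the secondary axis; it does not rescale the data, so the line is still drawn on the primary scale. You need to scale the y aesthetic of the line accordingly, which your code does not do.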
library(ggplot2)
library(scales)
# Find a scaling factor that maps the CoV range onto the primary axis
ratio <- max(Merge_stats_melt$value) / max(Merge_stas_mean_order$CoV)
ggplot() +
geom_boxplot(data = Merge_stats_melt, aes(x = reorder(TMA.core, value, FUN = mean), y = value)) +
geom_line(data = Merge_stas_mean_order, aes(x = reorder(TMA.core, Mean), y = CoV*ratio, group = 1)) +
scale_y_continuous(labels=comma,
sec.axis = sec_axis(~./ratio, name = 'CoV')
)
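
The arithmetic behind the scaling factor can be checked without drawing anything. The sketch below uses only the six rows shown in the question (the full data has more), and value, cov_values, cov_scaled and cov_back are names chosen for the illustration.

```r
# The six rows shown in the question's snippets
value <- c(356642.6, 490276.9, 460831.8, 551614.2, 663711.8, 596832.8)
cov_values <- c(0.15181577, 0.21738852, 0.39049256,
                0.07387486, 0.44764127, 0.15382864)

# Unscaled, the largest CoV is a tiny fraction of the primary axis range,
# which is why the line looked flat at y = 0
max(cov_values) / max(value)

# Scale CoV up so that its maximum lines up with the maximum value
ratio <- max(value) / max(cov_values)
cov_scaled <- cov_values * ratio

# sec_axis(~ . / ratio) applies the inverse, so the secondary axis
# shows the original CoV units again
cov_back <- cov_scaled / ratio
all.equal(cov_back, cov_values)
```

Using the ratio of the maxima is one reasonable choice; any factor that puts the line within the boxplot's range would do, as long as sec_axis() divides by the same factor.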

ggplot facet_wrap with equally spaced axes

Say I have the following dummy data frame:
df <- data.frame(let = LETTERS[1:13], value = sample(13),
group = rep(c("foo", "bar"), times = c(5,8)))
df
let value group
1 A 2 foo
2 B 1 foo
3 C 12 foo
4 D 8 foo
5 E 4 foo
6 F 13 bar
7 G 11 bar
8 H 3 bar
9 I 7 bar
10 J 5 bar
11 K 10 bar
12 L 9 bar
13 M 6 bar
Using ggplot with facet_wrap allows me to make a panel for each of the groups...
library(ggplot2)
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free")
...but the vertical axes are not equally spaced, i.e. the left plot contains more vertical ticks than the right one. I would like to fill up the right vertical axis with (unlabeled) ticks (with no plotted values). In this case that would add 3 empty ticks, but it should be scalable to any df size.
What is the best way to accomplish this? Should I change the data frame, or is there a way to do this using ggplot?
I’m not sure why you want to arrange the categorical variable on your chart this way, other than for aesthetics (it does seem to look better). At any rate, a simple workaround that seems to handle general cases relies on the fact that ggplot uses a numerical scale to plot categorical variables. The workaround for your chart is then to plot, for each x value, a transparent point at a y value just above the largest number of categories in any group (max_rows + 0.5 in the code below). Points are plotted for all x values as a simple solution to the case of non-overlapping ranges of x values for each group. I've added another group to your data frame to make the example a bit more general.
library(ggplot2)
set.seed(123)
df <- data.frame(let = LETTERS[1:19], value = c(sample(13),20+sample(6)),
group = rep(c("foo", "bar", "bar2"), times = c(5,8,6)))
num_rows <- xtabs(~ group, df)
max_rows <- max(num_rows)
sp <- ggplot(df, aes(y= let, x = value)) +
geom_point() +
geom_point(aes(y = max_rows +.5), alpha=0 ) +
facet_wrap(~group, scales = "free", nrow=1 )
plot(sp)
This gives the following chart:
A kludgy solution that requires magrittr (for the compound assignment pipe %<>%):
library(magrittr)
df %<>%
rbind(data.frame(let = c(" ", "  ", "   "),
value = NA,
group = "foo"))
I just add three more entries for foo that are blank strings (i.e., just spaces) of different lengths. There must be a more elegant solution, though.
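The padding idea can be made to scale with the data by computing how many filler rows each group needs. This is a base R sketch under the assumption that blank labels of different lengths are acceptable placeholders; counts, filler and padded are names introduced here, and the toy values are arbitrary.

```r
# Toy data shaped like the question's
df <- data.frame(let = LETTERS[1:13], value = 1:13,
                 group = rep(c("foo", "bar"), times = c(5, 8)))

# Number of rows per group, and the size of the largest group
counts <- lengths(split(df$let, df$group))
max_rows <- max(counts)

# Build filler rows for every group that is short of max_rows
filler <- do.call(rbind, lapply(names(counts), function(g) {
  k <- max_rows - counts[[g]]
  if (k == 0) return(NULL)
  data.frame(let = strrep(" ", seq_len(k)),  # " ", "  ", "   ", ...
             value = NA, group = g)
}))

padded <- rbind(df, filler)
table(padded$group)  # every group now has max_rows rows
```

Plotting padded instead of df should give every panel the same number of ticks; the rows with value = NA produce a missing-values warning from geom_point().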
Use free_x instead of free, like this:
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free_x")+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())

Add pair lines in R

I have some data measured pair-wise (e.g. 1C, 1M, 2C and 2M), which I have plotted separately (as C and M). However, I would like to add a line between each pair (e.g. a line from point 1 in the C column to point 1 in the M 'column').
A small section of the entire dataset:
PairNumber Type M
1 M 0.117133
2 M 0.054298837
3 M 0.039734
4 M 0.069247069
5 M 0.043053957
1 C 0.051086898
2 C 0.075519
3 C 0.065834198
4 C 0.084632915
5 C 0.054254946
I have generated the below picture using the following tiny R snippet:
boxplot(test$M ~ test$Type)
stripchart(test$M ~ test$Type, vertical = TRUE, method="jitter", add = TRUE, col = 'blue')
Current plot:
I would like to know what command or what function I would need to achieve this (a rough sketch of the desired result, with only some of the lines, is presented below).
Desired plot:
Alternatively, doing this with ggplot is also fine by me, I have the following alternative ggplot code to produce a plot similar to the first one above:
ggplot(,aes(x=test$Type, y=test$M)) +
geom_boxplot(outlier.shape=NA) +
geom_jitter(position=position_jitter(width=.1, height=0))
I have been trying geom_path, but I have not found the correct syntax to achieve what I want.
I would probably recommend breaking this up into multiple visualizations -- with more data, I feel this type of plot would become difficult to interpret. In addition, I am not sure it's possible to draw the geom_lines and connect them with the additional call to geom_jitter. That being said, this gets you most of the way there:
ggplot(test, aes(x = Type, y = M)) +
geom_boxplot(outlier.shape = NA) +
geom_line(aes(group = PairNumber)) +
geom_point()
The trick is to specify your group aesthetic within geom_line() and not up top within ggplot().
Additional note: there is no reason to fully qualify your aesthetic variables within ggplot(). That is, instead of ggplot(data = test, aes(x = test$Type, y = test$M)), just use ggplot(data = test, aes(x = Type, y = M)).
UPDATE
Leveraging cowplot to visualize this data in different plots could prove helpful:
library(cowplot)
p1 <- ggplot(test, aes(x = Type, y = M, color = Type)) +
geom_boxplot()
p2 <- ggplot(test, aes(x = Type, y = M, color = Type)) +
geom_jitter(position = position_jitter(width = 0.1, height = 0))
p3 <- ggplot(test, aes(x = M, color = Type, fill = Type)) +
geom_density(alpha = 0.5)
p4 <- ggplot(test, aes(x = Type, y = M)) +
geom_line(aes(group = PairNumber, color = factor(PairNumber)))
plot_grid(p1, p2, p3, p4, labels = c(LETTERS[1:4]), align = "v")
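
For the base-graphics version in the question, one way to get jittered points and pair lines that agree is to compute the jittered x positions once and reuse them for both. This is a sketch using the ten rows shown in the question; x_jit is a helper column invented here, and the seed is arbitrary.

```r
# The rows shown in the question
test <- data.frame(
  PairNumber = rep(1:5, times = 2),
  Type = rep(c("M", "C"), each = 5),
  M = c(0.117133, 0.054298837, 0.039734, 0.069247069, 0.043053957,
        0.051086898, 0.075519, 0.065834198, 0.084632915, 0.054254946)
)

# boxplot() draws the groups at x = 1, 2, ... in factor-level order (C, then M)
test$Type <- factor(test$Type)

# Jitter once, so that points and lines share the same x positions
set.seed(1)
test$x_jit <- as.numeric(test$Type) + runif(nrow(test), -0.1, 0.1)

boxplot(M ~ Type, data = test)
points(test$x_jit, test$M, col = "blue")

# One line per pair, joining its C point to its M point
for (p in unique(test$PairNumber)) {
  pair <- test[test$PairNumber == p, ]
  lines(pair$x_jit, pair$M, col = "grey40")
}
```

The same precomputed column could be mapped to x in geom_point() and geom_line(aes(group = PairNumber)) if a ggplot2 version is preferred.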
