ggplot with full data and subset(s) along x-axis - r

I'll use violin plots here as an example, but the question extends to many other ggplot types.
I know how to subset my data along the x-axis by a factor:
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_violin() +
geom_point(position = "jitter")
And I know how to plot only the full dataset:
ggplot(iris, aes(x = 1, y = Sepal.Length)) +
geom_violin() +
geom_point(position = "jitter")
My question is: is there a way to plot the full data AND a subset-by-factor side-by-side in the same plot? In other words, for the iris data, could I make a violin plot that has both "full data" and "setosa" along the x-axis?
This would enable a comparison of the distribution of a full dataset and a subset of that dataset. If this isn't possible, any recommendations on a better way to visualise this would also be welcome :)
Thanks for any ideas!

Using:
ggplot(iris, aes(x = "All", y = Sepal.Length)) +
geom_violin() +
geom_point(aes(color="All"), position = "jitter") +
geom_violin(data=iris, aes(x = Species, y = Sepal.Length)) +
geom_point(data=iris, aes(x = Species, y = Sepal.Length, color = Species),
position = "jitter") +
scale_color_manual(values = c("black","#F8766D","#00BA38","#619CFF")) +
theme_minimal(base_size = 16) +
theme(axis.title.x = element_blank(), legend.title = element_blank())
gives a single plot with an "All" violin next to one violin per species.
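An alternative sketch, not part of the answer above: stack a relabelled copy of the data on top of the original so that "All" becomes an ordinary level of the x variable. The names iris_stacked and both are illustrative, and the subset step assumes you only want "All" and "setosa" on the axis, as in the question.

```r
library(ggplot2)

# Stack a copy of iris labelled "All" on top of the original rows.
# Species is converted to character first so both pieces have the same type.
iris_stacked <- rbind(
  transform(iris, Species = "All"),
  transform(iris, Species = as.character(Species))
)

# Put "All" first so it is the leftmost category on the x-axis.
iris_stacked$Species <- factor(
  iris_stacked$Species,
  levels = c("All", levels(iris$Species))
)

# Keep only the full data and one subset
# (assumption: "setosa" is the subset of interest).
both <- subset(iris_stacked, Species %in% c("All", "setosa"))

# Build the plot; print p to draw it.
p <- ggplot(both, aes(x = Species, y = Sepal.Length)) +
  geom_violin() +
  geom_point(position = position_jitter(width = 0.1, height = 0))
```

Dropping the subset() step and plotting iris_stacked directly would show "All" alongside all three species.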

Related

ggplot: adding a frequency plot over a percentage plot

I am interested in making a plot showing percentages by group, something like this:
data(iris)
ggplot(iris,
aes(x = Sepal.Length, group = factor(Species), fill = factor(Species))) +
geom_histogram(position = "fill")+theme_bw()
However, I would also like to plot a histogram showing the frequency distribution on top of this graph, something like the plot below:
ggplot(iris,aes(x = Sepal.Length)) +
geom_histogram()+theme_bw()
Does anyone know how to do this?
Note I know how to do a frequency plot by group: ggplot(iris,aes(x = Sepal.Length, group = factor(Species), fill = factor(Species))) + geom_histogram()+theme_bw(). But this is not what I want. Rather I would like a small frequency distribution at the bottom of the percentage plot presented at the beginning.
Thank you very much
Something like this?
library(gridExtra)
p1 <- ggplot(iris,
aes(x = Sepal.Length,
group = factor(Species),
fill = factor(Species))) +
geom_histogram(position = "fill") +
theme_bw() +
theme(legend.position = "top")
p2 <- ggplot(iris,aes(x = Sepal.Length,
group = factor(Species),
fill = factor(Species))) +
geom_histogram() +
theme_bw() +
theme(legend.position = "none")
grid.arrange(p1, p2,
heights = c(4, 1.5))
Edit: So you are looking for this then? Note that in this case the absolute values of the smaller histogram become meaningless since they were scaled down to be ~25% of the vertical chart range.
ggplot() +
geom_histogram(data = iris,
aes(x = Sepal.Length,
group = factor(Species),
fill = factor(Species)),
position = "fill",
alpha = 1) +
geom_histogram(data = iris,
aes(x = Sepal.Length,
y = after_stat(ncount) / 4),
alpha = 0.5,
fill = 'black')
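To make the scaling in the last layer concrete: ncount is the bin count rescaled so that the tallest bin equals 1, so dividing by 4 caps the overlay at a quarter of the 0 to 1 axis used by position = "fill". The sketch below reproduces that arithmetic with base R's hist(); the variable names are illustrative, and the bins hist() picks will not match geom_histogram's default of 30 bins.

```r
# Bin the data with base R (bins differ from geom_histogram's defaults).
h <- hist(iris$Sepal.Length, plot = FALSE)

# ncount: counts rescaled so that the tallest bin is exactly 1.
ncount <- h$counts / max(h$counts)

# The overlay height used in the plot: at most a quarter of the y range.
overlay_height <- ncount / 4
range(overlay_height)
```

Changing the divisor changes how much of the vertical range the overlay occupies; the bar heights are relative, which is why the answer notes that their absolute values lose meaning.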

Combine and merge legends in ggplot2 with patchwork (discrete/categorical data)

I arranged 3 ggplot2 plots into a single figure by using the functionality of package patchwork. I tried to collect the legends and they appeared one next to the other. But still, they are 3 separate legends and I expected a single legend. So how can I merge the legends that contain identical values of the same factor variable into a single legend?
Notes:
I do not want to remove the legends of the separate plots by using, e.g., theme(legend.position = "none"), in case some additional factor level appears. I expect a patchwork-specific solution.
A similar question was answered in Combine and merge legends in ggplot2 with patchwork but the data was continuous. And in my case, I have categorical data.
The code:
library(ggplot2)
library(patchwork)
iris_1 <-
ggplot(iris, aes(x = Sepal.Length, fill = Species, color = Species)) +
geom_density(alpha = 0.3, adjust = 1.5)
iris_2 <-
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point()
iris_3 <-
ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
geom_boxplot()
(iris_1 + iris_2 + iris_3) + plot_layout(guides = "collect")
Created on 2020-10-14 by the reprex package (v0.3.0)
Update
I tried using the same aesthetic mappings (fill = Species and color = Species) as it was proposed in the comments below but it had no effect:
library(tidyverse)
library(patchwork)
iris_1 <-
ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) +
geom_density(alpha = 0.3, adjust = 1.5)
iris_2 <-
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species, fill = Species)) +
geom_point()
iris_3 <-
ggplot(iris, aes(x = Species, y = Sepal.Width, color = Species, fill = Species)) +
geom_boxplot(color = "black")
(iris_1 + iris_2 + iris_3) + plot_layout(guides = "collect")
Unfortunately, setting the same aesthetics is only one condition: patchwork merges legends only if they are identical, so we have to make sure the legends are the same for each plot. To this end I add a guides layer that makes each legend look the same by setting color, shape, size and alpha. We also have to choose the same glyph for each geom using the key_glyph argument. After these adjustments the three legends are merged into one.
library(ggplot2)
library(patchwork)
g <- guides(fill = guide_legend(override.aes = list(color = scales::hue_pal()(3),
                                                    shape = c(16, 16, 16),
                                                    size = c(1, 1, 1),
                                                    alpha = c(1, 1, 1))))
iris_1 <-
ggplot(iris, aes(x = Sepal.Length)) +
geom_density(aes(fill = Species, color = Species), key_glyph = "point", alpha = 0.3, adjust = 1.5) +
g
iris_2 <-
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point(aes(fill = Species, color = Species), key_glyph = "point") +
g
iris_3 <-
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_boxplot(aes(fill = Species, color = Species), key_glyph = "point") +
scale_color_manual(values = c("black", "black", "black")) +
g
(iris_1 + iris_2 + iris_3) + plot_layout(guides = "collect")

Increase Plot layout and reduce the legend list

Below is my code to plot a stacked bar plot:
ggplot(data = mdata, aes(x = variable, y = value, fill = Species)) +
geom_bar(position = "fill", stat = "identity") +
theme(legend.text=element_text(size=rel(0.7)),
legend.key.size = unit(0.5, "cm")) +
scale_y_continuous(labels=function(x)x*100) +
coord_flip() +
ylab("Species Percentage") +
xlab("Samples")
As you can see from the output plot, my Species legend is split into a 5-column list, which takes up 50% of the total plot layout.
Is there a way to convert the legend list into only 2 or 3 columns, so that the area above and below is used and the bar plot can be widened?
Also, how can I make the legend text bold? It looks blurred with so many legend entries.
You can set any number of columns with the ncol argument in guide_legend():
library(ggplot2)
dat <- cbind(car = rownames(mtcars), mtcars)
ggplot(dat, aes(mpg, wt, colour = car)) +
geom_point() +
scale_colour_discrete(guide = guide_legend(ncol = 3))
EDIT: As Z.Lin pointed out, for fill scales, replace scale_colour_* with scale_fill_*.
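For the stacked bar plot in the question, the same idea applied to the fill legend might look like the sketch below. The question's mdata is not shown, so mdata_demo is a made-up stand-in with the same column names. The bold legend text uses element_text(face = "bold"), which addresses the second part of the question.

```r
library(ggplot2)

# Hypothetical stand-in for the question's mdata: 4 samples x 20 species.
set.seed(1)
mdata_demo <- expand.grid(
  variable = paste0("S", 1:4),
  Species  = paste("species", 1:20)
)
mdata_demo$value <- runif(nrow(mdata_demo))

# Build the plot; print p to draw it.
p <- ggplot(mdata_demo, aes(x = variable, y = value, fill = Species)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = function(x) x * 100) +
  coord_flip() +
  # Two legend columns instead of the default layout.
  guides(fill = guide_legend(ncol = 2)) +
  # Bold, slightly smaller legend labels and compact keys.
  theme(legend.text = element_text(face = "bold", size = rel(0.7)),
        legend.key.size = unit(0.5, "cm")) +
  ylab("Species Percentage") +
  xlab("Samples")
```

Using guides(fill = ...) rather than a scale_fill_* call leaves the default fill palette untouched; either form accepts the same guide_legend(ncol = ...) argument.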

Plotting in layers in R

I'm trying to plot individual regression lines for all of my experimental subjects (n=40) on the same plot where I show the overall regression line.
I can do the plots separately with ggplot, but I haven't found a way to superpose them on the same graph.
I can illustrate what I did with the iris data frame:
#first plot
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
# second plot, grouped by species
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, colour =Species)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
# and I've been trying things like this:
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic() +
geom_point(aes(x = Sepal.Width, y = Sepal.Length, colour =Species))) +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
which returns the message "Error: Cannot add ggproto objects together. Did you forget to add this object to a ggplot object?", so I get that this is not the right way to combine them, but what is?
How can I combine both graphs in one?
Thanks in advance!
Duplicate the whole data set and set Species to something else ("Together" in the example below). Append the duplicated data to the original data and then call the second plot on the result.
d1 = iris
d2 = rbind(d1, transform(d1, Species = "Together"))
ggplot(d2, aes(x = Sepal.Width, y = Sepal.Length, colour =Species)) +
stat_smooth(method = lm, se = FALSE) +
geom_point(data = d1) +
theme_classic()
Similar to @d.b's answer, consider expanding the data frame with rbind, assigning an "All" category to Species and adjusting the factor levels (so that "All" shows at the top of the legend):
new_species_level <- c("All", unique(as.character(iris$Species)))
iris_expanded <- rbind(transform(iris, Species=factor("All", levels=new_species_level)),
transform(iris, Species=factor(Species, levels=new_species_level)))
ggplot(iris_expanded, aes(x=Sepal.Width, y=Sepal.Length, colour=Species)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
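Another option, sketched here as an alternative to duplicating the rows, is to add a second smoothing layer that ignores the grouping. Setting group = 1 inside that layer's aes() and fixing colour outside it should yield one overall line on top of the per-species lines. The black colour is an arbitrary choice.

```r
library(ggplot2)

# Build the plot; print p to draw it.
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, colour = Species)) +
  geom_point() +
  # One regression line per species (inherits colour = Species).
  stat_smooth(method = "lm", se = FALSE) +
  # One overall line: group = 1 pools all rows, and the constant colour
  # overrides the inherited colour mapping for this layer only.
  stat_smooth(aes(group = 1), method = "lm", se = FALSE, colour = "black") +
  theme_classic()
```

Unlike the rbind approaches, the overall line does not get a legend entry here, because its colour is a fixed value rather than a mapped one.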

Combining geom_point() legends

How do I combine the two legends into one "Species" legend in the code below?
library(ggplot2)
data(iris)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point(aes(color = "red", size = Species))
Thanks.
I would prefer visualizing the data this way if you would like to add the sample size per class:
cat_table <- table(iris$Species)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point(aes(color =Species)) +
scale_color_manual(breaks=names(cat_table), labels=paste(names(cat_table), ':', cat_table), values=rainbow(n=length(cat_table)))
Just move colour out of the aesthetics. This seems to be what you are after.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point(aes(size = Species), colour = "red")
If you want to keep colour as an aesthetic, then you can manually override the colour in the size legend guide:
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point(aes(size = Species, colour = "red")) +
guides(colour = "none",
size = guide_legend(override.aes = list(colour = "red")))
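If the goal is one legend that encodes Species with both colour and size, a further possibility (an assumption about what is wanted, since the question maps colour to a constant) is to map both aesthetics to Species. ggplot2 merges guides that share the same title and labels, so a single "Species" legend should result.

```r
library(ggplot2)

# Both aesthetics map to the same variable, so their guides share a title
# and labels and are drawn as one legend. ggplot2 warns that size is not
# advised for a discrete variable; the plot is still produced.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(colour = Species, size = Species))
```

Print p to draw the plot.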
