How to overlay multiple layers of data in same plot in ggplot? - r

Data: Data
Code:
## Load the data
ifpricc = read.csv(file = "IFPRI_CCAgg2050.csv", heade=TRUE)
#-----------------------------------------------------------------------
# Plotting Kernel density distribution for the final yield impact data
#-----------------------------------------------------------------------
ifpricc.df = as.data.frame(ifpricc)
ifpricc_mlt.df = melt(ifpricc.df, id.vars=c("crop","codereg","reg","sres","gcm","scen"))
kernel = ggplot(data=subset(ifpricc_mlt.df, reg %in% c("Canada","United States","Oceania","OECD Europe","Eastern Europe","Former USSR") & gcm %in% c("CSIRO","MIROC","noCC")),
aes(x = value, y = ..density..))
kernel = kernel + geom_density(aes(fill = gcm), alpha=.4, subset = .(crop %in% c("WHET")),
position="identity", stat="density", size=0.75,
bw = "nrd0", adjust = 1.5,
kernel = c("gaussian"))
kernel = kernel + scale_fill_manual(name="GCM model",breaks=c("CSIRO","MIROC","noCC"), values=c("red","blue","gray80"))
kernel = kernel + facet_grid(sres ~ reg, scale="free") + scale_y_continuous(breaks=seq(0,2,.25))
kernel = kernel + labs(title="Kernel density distribution - with and without climate change", y="Density", x="Yield") + theme_bw()
kernel = kernel + theme(plot.title=element_text(face="bold", size=rel(2), hjust=0.5, vjust=1.5, family="serif"),
axis.text.x=element_text(color="black", size=rel(2), hjust=0.5, family="serif"),
axis.text.y=element_text(color="black", size=rel(2), hjust=1, family="serif"),
axis.title.x=element_text(face="bold", color="black", size=rel(1.6), hjust=0.5, vjust=0.2, family="serif"),
axis.title.y=element_text(face="bold", color="black", size=rel(1.6), hjust=0.5, vjust=0.2, family="serif"),
strip.text=element_text(face="bold", size=rel(1.5), family="serif"),
legend.text=element_text(face="bold", size=rel(1.25), family="serif"),
legend.title=element_text(face="bold", size=rel(1.45), family="serif"))
Results:
Question:
What I am trying to achieve here is to plot Kernel density curves. My problem is that I want to overlay the baseline kernel curves (in lower facet) over the colored ones (two upper facets) and which represent deviations from the baseline. Any help will be greatly appreciated.
Cheers :)
Alternative Question:
So I tinkered a bit after looking up potential solutions on the website, and I came up with this: instead of faceting using facet_grid(sres ~ reg) by "sres" x "reg", I faceted by using facet_wrap(~ reg). It produces something closer to what I am intending .
The problem now is that I cannot identify the distribution by "sres", which is what I am looking for. To solve this, I thought to annotate the plot by adding vertical lines that plot the mean of the data by "sres". But I kind of am at loss of how to move from here.
Any suggestions?

If I understand, I think this is what you want. You need to rearrange the data: repeat the lines in the data frame containing the noCC factor, once with sres = A1B, and once with sres = B1. That way, the noCC density curve will appear in the A1B facets and in the B1 facets.
In addition, the melting of the data frame has no effect other than to create a column of 1s. Also, I do the subsetting outside the call to ggplot2.
library(ggplot2)
ifpricc = read.csv(file = "IFPRI_CCAgg2050.csv", heade=TRUE)
# Subset the data frame
df = subset(ifpricc,
reg %in% c("Canada","United States","Oceania","OECD Europe","Eastern Europe","Former USSR") &
gcm %in% c("CSIRO","MIROC","noCC") &
crop %in% c("WHET"))
# Manipulate the data frame
x = df[df$sres == "PM", ]
x = rbind(x, x)
x$sres = rep(c("A1B", "B1"), each = dim(x)[1]/2)
df = df[df$sres != "PM",]
df = rbind(df, x)
# Draw the plot
ggplot(data=df, aes(x = yield, fill = gcm)) +
geom_density(alpha=.4, size=0.75, adjust = 1.5) +
scale_fill_manual(name="GCM model",breaks=c("CSIRO","MIROC","noCC"),
values=c("red","blue","gray80")) +
facet_grid(sres ~ reg, scale="free") + scale_y_continuous(breaks=seq(0,2,.25))
EDIT: The facet_wrap version:
The idea is to draw two charts: one for A1B, and the second for B1; then to put the two charts together using functions from the gridExtra package. But that will give a legend for each chart. It would look better with only one legend. Therefore, draw one of the charts so that its legend can be extracted. Then draw the two charts without their legends, and put the two charts and the legend together.
library(ggplot)
library(gridExtra)
library(gtable)
ifpricc = read.csv(file = "IFPRI_CCAgg2050.csv", heade=TRUE)
# Subset the data frame
df = subset(ifpricc,
reg %in% c("Canada","United States","Oceania","OECD Europe","Eastern Europe","Former USSR") &
gcm %in% c("CSIRO","MIROC","noCC") &
crop %in% c("WHET"))
# Draw first chart
p1 = ggplot(data=df[df$sres != "B1", ], aes(x = yield, fill = gcm)) +
geom_density(alpha=.4, size=0.75, adjust = 1.5) +
ggtitle("sres = A1B") +
scale_fill_manual(name="GCM model",breaks=c("CSIRO","MIROC","noCC"),
values=c("red","blue","gray80")) +
facet_wrap( ~ reg, scales = "free_x") + scale_y_continuous(breaks=seq(0,2,.25))
# Extract its legend
legend = gtable_filter(ggplot_gtable(ggplot_build(p1)), "guide-box")
# Redraw the first chart without its legend
p1 = p1 + guides(fill = FALSE)
# Draw the second chart without its legend
p2 = ggplot(data=df[df$sres != "A1B", ], aes(x = yield, fill = gcm)) +
geom_density(alpha=.4, size=0.75, adjust = 1.5) +
ggtitle("sres = B1") +
scale_fill_manual(name="GCM model",breaks=c("CSIRO","MIROC","noCC"),
values=c("red","blue","gray80"), guide = "none") +
facet_wrap( ~ reg, scales = "free_x") + scale_y_continuous(breaks=seq(0,2,.25))
# Combine the two charts and the legend (and a main title)
grid.arrange(arrangeGrob(p1, p2, ncol = 1),
legend, widths = unit.c(unit(1, "npc") - legend$width, legend$width), nrow = 1,
main = textGrob("Kernel density distribution - with and without climate change",
vjust = 1, gp = gpar(fontface = "bold")))

Related

how to input data from multiple columns in x and y arguments in ggplot

I am trying to create a density plot for particle size data. My data has multiple density and size readings for each genotype set. Is there a way to specify multiple columns into x and y using ggplot? I tried coding for this but am only getting a blank plot as of now. This is the link to the csv file I used: https://drive.google.com/file/d/11djXTmZliPCGLCZavukjb0TT28HsKMRQ/view?usp=sharing
Thanks!
crop.data6 <- read.csv("barleygt25.csv", header = TRUE)
crop.data6
library(ggplot2)
plot1 = ggplot(data=crop.data6, aes(x=, xend=bq, y=a, yend=bq, color=genotype))
plot1
Your data is in a strange format that doesn't lend itself well to plotting. Effectively, it needs to be transposed then pivoted into long format to make it suitable for plotting:
df <- data.frame(xvals = c(t(crop.data6[1:9, -c(1:2)])),
yvals = c(t(crop.data6[10:18, -c(1:2)])),
genotype = rep(crop.data6$genotype[1:9], each = 68))
ggplot(df, aes(xvals, yvals, color = genotype)) +
geom_line(size = 1) +
scale_color_brewer(palette = "Set1") +
theme_bw(base_size = 16) +
labs(x = "value", y = "density")

How to plot filled points and confidence ellipses with the same color using ggplot in R?

I would like to plot a graph from a Discriminant Function Analysis in which points must have a black border and be filled with specific colors and confidence ellipses must be the same color as the points are filled. Using the following code, I get almost the graph I want, except that points do not have a black border:
library(ggplot2)
library(ggord)
library(MASS)
data("iris")
set.seed(123)
linear <- lda(Species~., iris)
linear
dfaplot <- ggord(linear, iris$Species, labcol = "transparent", arrow = NULL, poly = FALSE, ylim = c(-11, 11), xlim = c(-11, 11))
dfaplot +
scale_shape_manual(values = c(16,15,17)) +
scale_color_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
theme(legend.position = "none")
PLOT 1
I could put a black border on the points by using the following code, but then confidence ellipses turn black.
dfaplot +
scale_shape_manual(values = c(21,22,24)) +
scale_color_manual(values = c("black","black","black")) +
scale_fill_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
theme(legend.position = "none")
PLOT 2
I would like to keep the ellipses as in the first graph, but the points as in the second one. However, I am being unable to figure out how I could do this. If anyone has suggestions on how to do this, I would be very grateful. I am using the "ggord" package because I learned how to run the analysis using it, but if anyone has suggestions on how to do the same with only ggplot, it would be fine.
This roughly replicates what is going on in ggord. Looking at the source for the package, the ellipses are implemented differently in ggord than below, hence the small differences. If that is a big deal you can review the source and make changes. By default, geom_point doesn't have a fill attribute. So we set the shapes to a character type that does, and then specify color = 'black' in geom_point(). The full code (including projecting the original data) is below.
set.seed(123)
linear <- lda(Species~., iris)
linear
# Get point x, y coordinates
df <- data.frame(predict(linear, iris[, 1:4]))
df$species <- iris$Species
# Get explained variance for each axis
var_exp <- 100 * linear$svd ^ 2 / sum(linear$svd ^ 2)
ggplot(data = df,
aes(x = x.LD1,
y = x.LD2)) +
geom_point(aes(fill = species,
shape = species),
size = 4) +
stat_ellipse(aes(color = species),
level = 0.95) +
ylim(c(-11, 11)) +
xlim(c(-11, 11)) +
ylab(paste("LD2 (",
round(var_exp[2], 2),
"%)")) +
xlab(paste("LD1 (",
round(var_exp[1], 2),
"%)")) +
scale_color_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
scale_fill_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
scale_shape_manual(values = c(21, 22, 24)) +
coord_fixed(1) +
theme_bw() +
theme(
legend.position = "none"
)
To plot arrows, you can grab the scaling from the output it and plot it with geom_segment. I played with the colors/alpha so they were visible in the plot below.
scaling <- data.frame(linear$scaling)
...
geom_segment(data = scaling,
aes(x = 0,
y = 0,
xend = LD1,
yend = LD2),
arrow = arrow(),
color = "black") +
geom_text(data = scaling,
aes(x = ifelse(LD1 <= 0.1, LD1 - 2, LD1 + 2),
y = ifelse(LD2 <= 0.1, LD2 - 1, LD2 + 1)),
label = rownames(scaling),
color = "black") +
...

Plotting a vertical normal distribution next to a box plot in R

I'm trying to plot box plots with normal distribution of the underlying data next to the plots in a vertical format like this:
This is what I currently have graphed from an excel sheet uploaded to R:
And the code associated with them:
set.seed(12345)
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
#graphing boxplot and quasirandom scatterplot together
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape=20, fill="gray", color = "gray") +
geom_boxplot(fill="NA", color = c("red4", "orchid4", "dark green", "blue"),
outlier.color = "NA") +
theme_hc()
Is this possible in ggplot2 or R in general? Or is the only way this would be feasible is through something like OrignLab (where the first picture came from)?
You can do something similar to your example plot with the gghalves package:
library(gghalves)
n=0.02
ggplot(iris, aes(Species, Sepal.Length)) +
geom_half_boxplot(center=TRUE, errorbar.draw=FALSE,
width=0.5, nudge=n) +
geom_half_violin(side="r", nudge=n) +
geom_half_dotplot(dotsize=0.5, alpha=0.3, fill="red",
position=position_nudge(x=n, y=0)) +
theme_hc()
There are a few ways to do this. To gain full control over the look of the plot, I would just calculate the curves and plot them. Here's some sample data that's close to your own and shares the same names, so it should be directly applicable:
set.seed(12345)
X8_17_20_R_20_60 <- data.frame(
Diameter = rnorm(4000, rep(c(41, 40, 42, 40), each = 1000), sd = 6),
Type = rep(c("AvgFeret", "CalcDiameter", "Feret", "MinFeret"), each = 1000))
Now we create a little data frame of normal distributions based on the parameters taken from each group:
df <- do.call(rbind, mapply( function(d, n) {
y <- seq(min(d), max(d), length.out = 1000)
data.frame(x = n - 5 * dnorm(y, mean(d), sd(d)) - 0.15, y = y, z = n)
}, with(X8_17_20_R_20_60, split(Diameter, Type)), 1:4, SIMPLIFY = FALSE))
Finally, we draw your plot and add a geom_path with the new data.
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape = 20, fill = "gray", color = "gray") +
geom_boxplot(fill="NA", aes(color = Type), outlier.color = "NA") +
scale_color_manual(values = c("red4", "orchid4", "dark green", "blue")) +
geom_path(data = df, aes(x = x, y = y, group = z), size = 1) +
theme_hc()
Created on 2020-08-21 by the reprex package (v0.3.0)

Ordered bar graphs using ggplot2 and facet

I have a data.frame that looks something like this:
HSP90AA1 SSH2 ACTB TotalTranscripts
ESC_11_TTCGCCAAATCC 8.053308 12.038484 10.557234 33367.23
ESC_10_TTGAGCTGCACT 9.430003 10.687959 10.437068 30285.41
ESC_11_GCCGCGTTATAA 7.953726 9.918988 10.078192 30133.94
ESC_11_GCATTCTGGCTC 11.184402 11.056144 8.316846 24857.07
ESC_11_GTTACATTTCAC 11.943733 11.004500 9.240883 23629.00
ESC_11_CCGTTGCCCCTC 7.441695 9.774733 7.566619 22792.18
The TotalTranscripts column is sorted in descending order. What I'd like to do is generate three bar graphs using ggplot2 with each bar graph corresponding to each column of the data.frame with the exception of TotalTranscripts. I'd like the bar graphs to be ordered by TotalTranscripts just as the data.frame. I would be ideal to have these bar graphs on one plot using a facet wrap.
Any help would be greatly appreciated! Thank you!
EDIT: Here is my current code using barplot().
cells = "ESC"
genes = c("HSP90AA1", "SSH2", "ACTB")
g = data[genes,grep(cells, colnames(data))]
g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
colnames(g)[ncol(g)] = "TotalTranscripts"
g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]
barplot(as.matrix(g[1]), beside=TRUE, names.arg=paste(rownames(g)," (",g$TotalTranscripts,")",sep=""), las=2, col="light blue", cex.names=0.3, main=paste(colnames(g)[1], "\nCells sorted by total number of transcripts (colSums)", sep=""))
This will generate a plot that looks like this.
Again, the problem I seem to be having here is how to have multiple of these plots on the same image. I would like to add 20+ columns to this data.frame but I've cut this down to 3 for the sake of simplicity.
EDIT: Current code incorporating the answer below
cells = "ESC"
genes = rownames(data[x,])[1:8]
# genes = c("HSP90AA1", "SSH2", "ACTB")
g = data[genes,grep(cells, colnames(data))]
g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
colnames(g)[ncol(g)] = "TotalTranscripts"
g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]
g$rowz <- row.names(g)
g$Cells <- reorder(g$rowz, rev(g$TotalTranscripts))
df1 <- melt(g, id.vars = c("Cells", "TotalTranscripts"), measure.vars=genes)
ggplot(df1, aes(x = Cells, y = value)) + geom_bar(stat = "identity") +
theme(axis.title.x=element_blank(), axis.text.x = element_blank()) +
facet_wrap(~ variable, scales = "free") +
theme_bw() + theme(axis.text.x = element_text(angle = 90))
Here is the example data for anybody else:
df <- structure(list(HSP90AA1 = c(8.053308, 9.430003, 7.953726, 11.184402,
11.943733, 7.441695), SSH2 = c(12.038484, 10.687959, 9.918988,
11.056144, 11.0045, 9.774733), ACTB = c(10.557234, 10.437068,
10.078192, 8.316846, 9.240883, 7.566619), TotalTranscripts = c(33367.23,
30285.41, 30133.94, 24857.07, 23629, 22792.18)), .Names = c("HSP90AA1",
"SSH2", "ACTB", "TotalTranscripts"), class = "data.frame", row.names = c("ESC_11_TTCGCCAAATCC",
"ESC_10_TTGAGCTGCACT", "ESC_11_GCCGCGTTATAA", "ESC_11_GCATTCTGGCTC",
"ESC_11_GTTACATTTCAC", "ESC_11_CCGTTGCCCCTC"))
And here is a solution:
#New column for row names so they can be used as x-axis elements
df$rowz <- row.names(df)
#Explicitly order the rows (see the Kohske link)
df$rowz1 <- reorder(df$rowz, rev(df$TotalTranscripts))
library(reshape2)
#Melt the data from wide to long
df1 <- melt(df, id.vars = c("rowz1", "TotalTranscripts"),
measure.vars = c("HSP90AA1", "SSH2", "ACTB"))
library(ggplot2)
gp <- ggplot(df1, aes(x = rowz1, y = value)) + geom_bar(stat = "identity") +
facet_wrap(~ variable, scales = "free") +
theme_bw()
gp + theme(axis.text.x = element_text(angle = 90))
This example by Kohske is a constant reference for me on ordering elements in ggplot2.
If you have many columns, but the same six ESC complexes, you can switch the groupings, i.e. x = variable and facet_wrap(~ rowz1), but this fundamentally changes how you are visualizing/comparing your data. Also, consider facet_grid(row ~ column) if you can organize the columns by 2 components (Columns being the data that are melted into 'variable' and 'value').
And this additional SO solution isn't related to your question, but it is an elegant way to reorder elements in each facet by their values (for future reference).
Finally, the method that will give you the finest control is to plot each graph separately and combine the grobs. Baptiste's packages like gridExtra and gtable are useful for these tasks.
**EDIT in response to new information from OP**
The OP has subsequently asked how to visualize the data, especially when there are more ESC categorical variables (up to 600+).
Here are some examples, with the big caveat that with many categorical variables, they should be grouped or converted to a continuous variable somehow.
#Plot colour to a few discrete, categorical variables
gp + aes(fill = rowz1) +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
labs(x = NULL, fill = "Cell", title = "Discrete categorical variables")
#Plot colour on a continuous scale.
#Ultimately, not appropriate for this example! (but shown for reference)
#More appropriate: fill = TotalTranscripts
gp + aes(fill = as.numeric(rowz1)) +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
labs(x = NULL, title = "Continuous variables (legend won't work for many values)") +
scale_fill_gradient2(name = "Cell",
breaks = as.numeric(df1$rowz1),
labels = df1$rowz1,
midpoint=median(as.numeric(df1$rowz1)))
#x is continuous, colour plotted to the categorical variable.
#Same caveats as earlier.
gp1 <- ggplot(df1, aes(x = TotalTranscripts/1000, y = value, colour = rowz1)) +
geom_point(size=3) + facet_wrap(~ variable, scales = "free") +
labs(title = "X is an actual continuous variable") +
theme_bw() + labs(x = bquote("Total Transcripts,"~10^3), colour = "Cell")
gp1

Overlaying histograms with ggplot2 in R

I am new to R and am trying to plot 3 histograms onto the same graph.
Everything worked fine, but my problem is that you don't see where 2 histograms overlap - they look rather cut off.
When I make density plots, it looks perfect: each curve is surrounded by a black frame line, and colours look different where curves overlap.
Can someone tell me if something similar can be achieved with the histograms in the 1st picture? This is the code I'm using:
lowf0 <-read.csv (....)
mediumf0 <-read.csv (....)
highf0 <-read.csv(....)
lowf0$utt<-'low f0'
mediumf0$utt<-'medium f0'
highf0$utt<-'high f0'
histogram<-rbind(lowf0,mediumf0,highf0)
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
Using #joran's sample data,
ggplot(dat, aes(x=xx, fill=yy)) + geom_histogram(alpha=0.2, position="identity")
note that the default position of geom_histogram is "stack."
see "position adjustment" of this page:
geom_histogram documentation
Your current code:
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
is telling ggplot to construct one histogram using all the values in f0 and then color the bars of this single histogram according to the variable utt.
What you want instead is to create three separate histograms, with alpha blending so that they are visible through each other. So you probably want to use three separate calls to geom_histogram, where each one gets it's own data frame and fill:
ggplot(histogram, aes(f0)) +
geom_histogram(data = lowf0, fill = "red", alpha = 0.2) +
geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
geom_histogram(data = highf0, fill = "green", alpha = 0.2) +
Here's a concrete example with some output:
dat <- data.frame(xx = c(runif(100,20,50),runif(100,40,80),runif(100,0,30)),yy = rep(letters[1:3],each = 100))
ggplot(dat,aes(x=xx)) +
geom_histogram(data=subset(dat,yy == 'a'),fill = "red", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'b'),fill = "blue", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'c'),fill = "green", alpha = 0.2)
which produces something like this:
Edited to fix typos; you wanted fill, not colour.
While only a few lines are required to plot multiple/overlapping histograms in ggplot2, the results are't always satisfactory. There needs to be proper use of borders and coloring to ensure the eye can differentiate between histograms.
The following functions balance border colors, opacities, and superimposed density plots to enable the viewer to differentiate among distributions.
Single histogram:
plot_histogram <- function(df, feature) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)))) +
geom_histogram(aes(y = ..density..), alpha=0.7, fill="#33AADE", color="black") +
geom_density(alpha=0.3, fill="red") +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
print(plt)
}
Multiple histogram:
plot_multi_histogram <- function(df, feature, label_column) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
Usage:
Simply pass your data frame into the above functions along with desired arguments:
plot_histogram(iris, 'Sepal.Width')
plot_multi_histogram(iris, 'Sepal.Width', 'Species')
The extra parameter in plot_multi_histogram is the name of the column containing the category labels.
We can see this more dramatically by creating a dataframe with many different distribution means:
a <-data.frame(n=rnorm(1000, mean = 1), category=rep('A', 1000))
b <-data.frame(n=rnorm(1000, mean = 2), category=rep('B', 1000))
c <-data.frame(n=rnorm(1000, mean = 3), category=rep('C', 1000))
d <-data.frame(n=rnorm(1000, mean = 4), category=rep('D', 1000))
e <-data.frame(n=rnorm(1000, mean = 5), category=rep('E', 1000))
f <-data.frame(n=rnorm(1000, mean = 6), category=rep('F', 1000))
many_distros <- do.call('rbind', list(a,b,c,d,e,f))
Passing data frame in as before (and widening chart using options):
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')
To add a separate vertical line for each distribution:
plot_multi_histogram <- function(df, feature, label_column, means) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(xintercept=means, color="black", linetype="dashed", size=1)
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
The only change over the previous plot_multi_histogram function is the addition of means to the parameters, and changing the geom_vline line to accept multiple values.
Usage:
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, "n", 'category', c(1, 2, 3, 4, 5, 6))
Result:
Since I set the means explicitly in many_distros I can simply pass them in. Alternatively you can simply calculate these inside the function and use that way.

Resources