Specifying the scale for the density in ggplot2's stat_density2d - r

I'm looking to create multiple density graphs, to make an "animated heat map."
Since each frame of the animation should be comparable, I'd like the density -> color mapping on each graph to be the same for all of them, even if the range of the data changes for each one.
Here's the code I'd use for each individual graph:
ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) + scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level..), geom="polygon", bins=3, size=1)
Imagine I use this same code, but 'this_df' changes on each frame. So in one graph, maybe density ranges from 0 to 4e-4. On another, density ranges from 0 to 4e-2.
By default, ggplot will calculate a distinct density -> color mapping for each of these. But this would mean the two graphs-- the two frames of the animation--aren't really comparable. If this were a histogram or density plot, I'd simply make a call to coord_cartesian and change the x and y lim. But for the density plot, I have no idea how to change the scale.
The closest I could find is this:
Overlay two ggplot2 stat_density2d plots with alpha channels
But I don't have the option of putting the two density plots on the same graph, since I want them to be distinct frames.
Any help would be hugely appreciated!
EDIT:
Here's a reproducible example:
set.seed(4)
g = list(NA,NA)
for (i in 1:2) {
sdev = runif(1)
X = rnorm(1000, mean = 512, sd= 300*sdev)
Y = rnorm(1000, mean = 384, sd= 200*sdev)
this_df = as.data.frame( cbind(X = X,Y = Y, condition = 1:2) )
g[[i]] = ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) + scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level.., color= as.factor(condition)), geom="contour", bins=4, size= 2)
}
print(g) # level has a different scale for each

I would like to leave an update for this question. As of July 2016, stat_density2d is not taking breaks any more. In order to reproduce the graphic, you need to move breaks=1e-6*seq(0,10,by=2) to scale_alpha_continuous().
set.seed(4)
g = list(NA,NA)
for (i in 1:2) {
sdev = runif(1)
X = rnorm(1000, mean = 512, sd= 300*sdev)
Y = rnorm(1000, mean = 384, sd= 200*sdev)
this_df = as.data.frame( cbind(X = X,Y = Y, condition = 1:2) )
g[[i]] = ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) +
scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level.., color= as.factor(condition)),
geom="contour", bins=4, size= 2) +
scale_alpha_continuous(limits=c(0,1e-5), breaks=1e-6*seq(0,10,by=2))+
scale_color_discrete("Condition")
}
do.call(grid.arrange,c(g,ncol=2))

So to have both plots show contours with the same levels, use the breaks=... argument in stat_densit2d(...). To have both plots with the same mapping of alpha to level, use scale_alpha_continuous(limits=...).
Here is the full code to demonstrate:
library(ggplot2)
set.seed(4)
g = list(NA,NA)
for (i in 1:2) {
sdev = runif(1)
X = rnorm(1000, mean = 512, sd= 300*sdev)
Y = rnorm(1000, mean = 384, sd= 200*sdev)
this_df = as.data.frame( cbind(X = X,Y = Y, condition = 1:2) )
g[[i]] = ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) + scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level.., color= as.factor(condition)),
breaks=1e-6*seq(0,10,by=2),geom="contour", bins=4, size= 2)+
scale_alpha_continuous(limits=c(0,1e-5))+
scale_color_discrete("Condition")
}
library(gridExtra)
do.call(grid.arrange,c(g,ncol=2))
And the result...

Not sure how useful this is, but I found it easier to either use:
scale_fill_gradient(low = "purple", high = "yellow", limits = c(0, 1000))
Where you can overwrite the limits of the plot easily, choose colors etc. and you can just add it at the end of your code so it'll overwrite most things it needs to, so it's easy to use
or a similar solution using:
library(viridis)#colors for heat map
scale_fill_viridis(option = 'inferno')+
scale_fill_viridis_c(limits = c(0, 1000))

Related

How to plot filled points and confidence ellipses with the same color using ggplot in R?

I would like to plot a graph from a Discriminant Function Analysis in which points must have a black border and be filled with specific colors and confidence ellipses must be the same color as the points are filled. Using the following code, I get almost the graph I want, except that points do not have a black border:
library(ggplot2)
library(ggord)
library(MASS)
data("iris")
set.seed(123)
linear <- lda(Species~., iris)
linear
dfaplot <- ggord(linear, iris$Species, labcol = "transparent", arrow = NULL, poly = FALSE, ylim = c(-11, 11), xlim = c(-11, 11))
dfaplot +
scale_shape_manual(values = c(16,15,17)) +
scale_color_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
theme(legend.position = "none")
PLOT 1
I could put a black border on the points by using the following code, but then confidence ellipses turn black.
dfaplot +
scale_shape_manual(values = c(21,22,24)) +
scale_color_manual(values = c("black","black","black")) +
scale_fill_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
theme(legend.position = "none")
PLOT 2
I would like to keep the ellipses as in the first graph, but the points as in the second one. However, I am being unable to figure out how I could do this. If anyone has suggestions on how to do this, I would be very grateful. I am using the "ggord" package because I learned how to run the analysis using it, but if anyone has suggestions on how to do the same with only ggplot, it would be fine.
This roughly replicates what is going on in ggord. Looking at the source for the package, the ellipses are implemented differently in ggord than below, hence the small differences. If that is a big deal you can review the source and make changes. By default, geom_point doesn't have a fill attribute. So we set the shapes to a character type that does, and then specify color = 'black' in geom_point(). The full code (including projecting the original data) is below.
set.seed(123)
linear <- lda(Species~., iris)
linear
# Get point x, y coordinates
df <- data.frame(predict(linear, iris[, 1:4]))
df$species <- iris$Species
# Get explained variance for each axis
var_exp <- 100 * linear$svd ^ 2 / sum(linear$svd ^ 2)
ggplot(data = df,
aes(x = x.LD1,
y = x.LD2)) +
geom_point(aes(fill = species,
shape = species),
size = 4) +
stat_ellipse(aes(color = species),
level = 0.95) +
ylim(c(-11, 11)) +
xlim(c(-11, 11)) +
ylab(paste("LD2 (",
round(var_exp[2], 2),
"%)")) +
xlab(paste("LD1 (",
round(var_exp[1], 2),
"%)")) +
scale_color_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
scale_fill_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
scale_shape_manual(values = c(21, 22, 24)) +
coord_fixed(1) +
theme_bw() +
theme(
legend.position = "none"
)
To plot arrows, you can grab the scaling from the output it and plot it with geom_segment. I played with the colors/alpha so they were visible in the plot below.
scaling <- data.frame(linear$scaling)
...
geom_segment(data = scaling,
aes(x = 0,
y = 0,
xend = LD1,
yend = LD2),
arrow = arrow(),
color = "black") +
geom_text(data = scaling,
aes(x = ifelse(LD1 <= 0.1, LD1 - 2, LD1 + 2),
y = ifelse(LD2 <= 0.1, LD2 - 1, LD2 + 1)),
label = rownames(scaling),
color = "black") +
...

Plotting a vertical normal distribution next to a box plot in R

I'm trying to plot box plots with normal distribution of the underlying data next to the plots in a vertical format like this:
This is what I currently have graphed from an excel sheet uploaded to R:
And the code associated with them:
set.seed(12345)
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
#graphing boxplot and quasirandom scatterplot together
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape=20, fill="gray", color = "gray") +
geom_boxplot(fill="NA", color = c("red4", "orchid4", "dark green", "blue"),
outlier.color = "NA") +
theme_hc()
Is this possible in ggplot2 or R in general? Or is the only way this would be feasible is through something like OrignLab (where the first picture came from)?
You can do something similar to your example plot with the gghalves package:
library(gghalves)
n=0.02
ggplot(iris, aes(Species, Sepal.Length)) +
geom_half_boxplot(center=TRUE, errorbar.draw=FALSE,
width=0.5, nudge=n) +
geom_half_violin(side="r", nudge=n) +
geom_half_dotplot(dotsize=0.5, alpha=0.3, fill="red",
position=position_nudge(x=n, y=0)) +
theme_hc()
There are a few ways to do this. To gain full control over the look of the plot, I would just calculate the curves and plot them. Here's some sample data that's close to your own and shares the same names, so it should be directly applicable:
set.seed(12345)
X8_17_20_R_20_60 <- data.frame(
Diameter = rnorm(4000, rep(c(41, 40, 42, 40), each = 1000), sd = 6),
Type = rep(c("AvgFeret", "CalcDiameter", "Feret", "MinFeret"), each = 1000))
Now we create a little data frame of normal distributions based on the parameters taken from each group:
df <- do.call(rbind, mapply( function(d, n) {
y <- seq(min(d), max(d), length.out = 1000)
data.frame(x = n - 5 * dnorm(y, mean(d), sd(d)) - 0.15, y = y, z = n)
}, with(X8_17_20_R_20_60, split(Diameter, Type)), 1:4, SIMPLIFY = FALSE))
Finally, we draw your plot and add a geom_path with the new data.
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape = 20, fill = "gray", color = "gray") +
geom_boxplot(fill="NA", aes(color = Type), outlier.color = "NA") +
scale_color_manual(values = c("red4", "orchid4", "dark green", "blue")) +
geom_path(data = df, aes(x = x, y = y, group = z), size = 1) +
theme_hc()
Created on 2020-08-21 by the reprex package (v0.3.0)

Extending ggplot geom_polygon to end of plot

How can I make a confidence interval band that extends to the end of the plot in ggplot?
I can do it if the plotted band is entirely within the plot, for example
limits <- c(1e2, 1e7)
confPolygon <- tibble(
x = c(limits[1], limits[1]*10, limits[2], limits[2], limits[2]/10, limits[1], limits[1]),
y = c(limits[1], limits[1], limits[2]/10, limits[2], limits[2], limits[1]*10, limits[1])
)
plot <- ggplot() +
geom_polygon(data = confPolygon, aes(x = x, y = y), fill = "grey", alpha = .25) +
scale_x_log10(limits = limits) +
scale_y_log10(limits = limits)
works. However, if I try any shape that extends the polygon to the edges
confPolygon <- tibble(
x = c(limits[1], limits[2]*10, limits[2]*10, limits[1], limits[1]),
y = c(limits[1], limits[1], limits[2]*10, limits[2]*10, limits[1])
)
then it doesn't plot the polygon.
The reason is because the method you are using to zoom in to the plot (setting limits within the x or y scales) isn't meant to zoom in; it actually subsets the data, accidentally creating missing values on the way. Use coord_cartesian(xlim = c(0,5), ylim = c(0,5)), or in your case, coord_cartesian(xlim = limits, ylim = limits) instead, as this step does not subset the data.
One way to do this is with oob=scales::squish().
plot2 <- ggplot() +
geom_polygon(data = confPolygon, aes(x = x, y = y), fill = "grey", alpha = .25) +
scale_x_log10(limits = limits, oob=scales::squish) +
scale_y_log10(limits = limits, oob=scales::squish)
If you really want the polygon to extend all the way to the edge, you should also add expand=c(0,0) to each of the scale_*_log10() argument lists.

ggplot2 - draw logistic distribution with small part of the area colored

I have following code to draw my logistic distribution:
x=seq(-2000,2000,length=1000)
dat <- data.frame(x=x)
dat$value <- dlogis(x,location=200,scale=400/log(10))
dat$type <- "Expected score"
p <- ggplot(data=dat, aes(x=x, y=value)) + geom_line(col="blue", size=1) +
coord_cartesian(xlim = c(-500, 900), ylim = c(0, 0.0016)) +
scale_x_continuous(breaks=c(seq(-500, 800, 100)))
pp <- p + geom_line(aes(x = c(0,0), y = c(0,0.0011)), size=0.9, colour="green", linetype=2, alpha=0.7)
Now what I would like to do is to highlight the area to the left of x = 0.
I tried to do it like this:
x = seq(-500, 0, length=10)
y = dlogis(x,location=200,scale=400/log(10))
pol <- data.frame(x = x, y = y)
pp + geom_polygon(aes(data=pol,x=x, y=y), fill="light blue", alpha=0.6)
But this does not work. Not sure what I am doing wrong. Any help?
I haven't diagnosed the problem with your polygon (although I think you would need to give the full path around the outside, i.e. attach rep(0,length(x)) to the end of y and rev(x) to the end of x), but geom_ribbon (as in Shading a kernel density plot between two points. ) seems to do the trick:
pp + geom_ribbon(data=data.frame(x=x,y=y),aes(ymax=y,x=x,y=NULL),
ymin=0,fill="light blue",alpha=0.5)

Overlaying histograms with ggplot2 in R

I am new to R and am trying to plot 3 histograms onto the same graph.
Everything worked fine, but my problem is that you don't see where 2 histograms overlap - they look rather cut off.
When I make density plots, it looks perfect: each curve is surrounded by a black frame line, and colours look different where curves overlap.
Can someone tell me if something similar can be achieved with the histograms in the 1st picture? This is the code I'm using:
lowf0 <-read.csv (....)
mediumf0 <-read.csv (....)
highf0 <-read.csv(....)
lowf0$utt<-'low f0'
mediumf0$utt<-'medium f0'
highf0$utt<-'high f0'
histogram<-rbind(lowf0,mediumf0,highf0)
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
Using #joran's sample data,
ggplot(dat, aes(x=xx, fill=yy)) + geom_histogram(alpha=0.2, position="identity")
note that the default position of geom_histogram is "stack."
see "position adjustment" of this page:
geom_histogram documentation
Your current code:
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
is telling ggplot to construct one histogram using all the values in f0 and then color the bars of this single histogram according to the variable utt.
What you want instead is to create three separate histograms, with alpha blending so that they are visible through each other. So you probably want to use three separate calls to geom_histogram, where each one gets it's own data frame and fill:
ggplot(histogram, aes(f0)) +
geom_histogram(data = lowf0, fill = "red", alpha = 0.2) +
geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
geom_histogram(data = highf0, fill = "green", alpha = 0.2) +
Here's a concrete example with some output:
dat <- data.frame(xx = c(runif(100,20,50),runif(100,40,80),runif(100,0,30)),yy = rep(letters[1:3],each = 100))
ggplot(dat,aes(x=xx)) +
geom_histogram(data=subset(dat,yy == 'a'),fill = "red", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'b'),fill = "blue", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'c'),fill = "green", alpha = 0.2)
which produces something like this:
Edited to fix typos; you wanted fill, not colour.
While only a few lines are required to plot multiple/overlapping histograms in ggplot2, the results are't always satisfactory. There needs to be proper use of borders and coloring to ensure the eye can differentiate between histograms.
The following functions balance border colors, opacities, and superimposed density plots to enable the viewer to differentiate among distributions.
Single histogram:
plot_histogram <- function(df, feature) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)))) +
geom_histogram(aes(y = ..density..), alpha=0.7, fill="#33AADE", color="black") +
geom_density(alpha=0.3, fill="red") +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
print(plt)
}
Multiple histogram:
plot_multi_histogram <- function(df, feature, label_column) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
Usage:
Simply pass your data frame into the above functions along with desired arguments:
plot_histogram(iris, 'Sepal.Width')
plot_multi_histogram(iris, 'Sepal.Width', 'Species')
The extra parameter in plot_multi_histogram is the name of the column containing the category labels.
We can see this more dramatically by creating a dataframe with many different distribution means:
a <-data.frame(n=rnorm(1000, mean = 1), category=rep('A', 1000))
b <-data.frame(n=rnorm(1000, mean = 2), category=rep('B', 1000))
c <-data.frame(n=rnorm(1000, mean = 3), category=rep('C', 1000))
d <-data.frame(n=rnorm(1000, mean = 4), category=rep('D', 1000))
e <-data.frame(n=rnorm(1000, mean = 5), category=rep('E', 1000))
f <-data.frame(n=rnorm(1000, mean = 6), category=rep('F', 1000))
many_distros <- do.call('rbind', list(a,b,c,d,e,f))
Passing data frame in as before (and widening chart using options):
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')
To add a separate vertical line for each distribution:
plot_multi_histogram <- function(df, feature, label_column, means) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(xintercept=means, color="black", linetype="dashed", size=1)
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
The only change over the previous plot_multi_histogram function is the addition of means to the parameters, and changing the geom_vline line to accept multiple values.
Usage:
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, "n", 'category', c(1, 2, 3, 4, 5, 6))
Result:
Since I set the means explicitly in many_distros I can simply pass them in. Alternatively you can simply calculate these inside the function and use that way.

Resources