How to stack functions? - r

I have the following code:
library("ggplot2")
base <- ggplot(data.frame(x = c(-5, 5)), aes(x))
f_sin <- stat_function(fun=sin, colour="red", geom="area", position = 'stack', mapping=aes(fill = "red"))
f_cos <- stat_function(fun=cos, colour="green", geom="area", position = 'stack', mapping=aes(fill = "green"))
print(base + f_sin + f_cos)
Which is producing this graph:
Why aren't the areas of the two functions stacked?

In general, you want to do your calculations outside of ggplot. Is this what you wanted?
library(reshape)
df <- data.frame(x=seq(-5,5,0.01))
df$sin <- sin(df$x)
df$cos <- cos(df$x)
df <- melt(df,id="x")
ggplot(df, aes(x=x,y=value,fill=variable)) + geom_area(position="stack")
The red area is sin(x), the green area is 'stacked' (sin+cos).
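Position adjustments such as "stack" are applied within a single layer, which is why two separate stat_function() layers are never stacked on each other. A minimal sketch of the same idea without the reshape dependency, building the long data frame by hand (the names x, long and p are only illustrative):

```r
library(ggplot2)

x <- seq(-5, 5, by = 0.01)

# Long format: one row per (x, function) pair, built with base R only
long <- data.frame(
  x        = rep(x, times = 2),
  value    = c(sin(x), cos(x)),
  variable = factor(rep(c("sin", "cos"), each = length(x)),
                    levels = c("sin", "cos"))
)

# Both series live in the same layer, so position = "stack" can combine them
p <- ggplot(long, aes(x = x, y = value, fill = variable)) +
  geom_area(position = "stack")
print(p)
```

Because both functions are rows of one data frame, a single geom_area() layer sees them together and can stack them.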

Related

Plotting a vertical normal distribution next to a box plot in R

I'm trying to plot box plots with normal distribution of the underlying data next to the plots in a vertical format like this:
This is what I currently have graphed from an Excel sheet uploaded to R:
And the code associated with them:
set.seed(12345)
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
#graphing boxplot and quasirandom scatterplot together
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape=20, fill="gray", color = "gray") +
geom_boxplot(fill="NA", color = c("red4", "orchid4", "dark green", "blue"),
outlier.color = "NA") +
theme_hc()
Is this possible in ggplot2 or R in general? Or is something like OriginLab (where the first picture came from) the only feasible way to do this?
You can do something similar to your example plot with the gghalves package:
library(gghalves)
n=0.02
ggplot(iris, aes(Species, Sepal.Length)) +
geom_half_boxplot(center=TRUE, errorbar.draw=FALSE,
width=0.5, nudge=n) +
geom_half_violin(side="r", nudge=n) +
geom_half_dotplot(dotsize=0.5, alpha=0.3, fill="red",
position=position_nudge(x=n, y=0)) +
theme_hc()
There are a few ways to do this. To gain full control over the look of the plot, I would just calculate the curves and plot them. Here's some sample data that's close to your own and shares the same names, so it should be directly applicable:
set.seed(12345)
X8_17_20_R_20_60 <- data.frame(
Diameter = rnorm(4000, rep(c(41, 40, 42, 40), each = 1000), sd = 6),
Type = rep(c("AvgFeret", "CalcDiameter", "Feret", "MinFeret"), each = 1000))
Now we create a little data frame of normal distributions based on the parameters taken from each group:
df <- do.call(rbind, mapply(function(d, n) {
  y <- seq(min(d), max(d), length.out = 1000)
  data.frame(x = n - 5 * dnorm(y, mean(d), sd(d)) - 0.15, y = y, z = n)
}, with(X8_17_20_R_20_60, split(Diameter, Type)), 1:4, SIMPLIFY = FALSE))
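To see what the x = n - 5 * dnorm(...) - 0.15 expression does, here is a small sketch for a single group using base R only. The data d is a made-up stand-in for one group's diameters, and the scale factor 5 and offset 0.15 are simply the values used above, which you would likely tune by eye for your own data:

```r
set.seed(12345)
d <- rnorm(1000, mean = 41, sd = 6)  # stand-in for one group's diameters
n <- 1                               # position of that group on the discrete x axis

y    <- seq(min(d), max(d), length.out = 200)
dens <- dnorm(y, mean(d), sd(d))     # normal density fitted to the group

# Scale the density by 5 so it is visible, flip it to point left,
# and shift it 0.15 units to the left of the box centre at x = n
x <- n - 5 * dens - 0.15

range(x)
```

The curve stays entirely to the left of n - 0.15, so it sits beside the box rather than on top of it.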
Finally, we draw your plot and add a geom_path with the new data.
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape = 20, fill = "gray", color = "gray") +
geom_boxplot(fill="NA", aes(color = Type), outlier.color = "NA") +
scale_color_manual(values = c("red4", "orchid4", "dark green", "blue")) +
geom_path(data = df, aes(x = x, y = y, group = z), size = 1) +
theme_hc()
Created on 2020-08-21 by the reprex package (v0.3.0)

How to overlay two heatmaps via ggplot2 with two different scale_fill_gradient?

I have data with two variables, and I want a single plot in which a heatmap for each variable is overlaid on the other, with a separate color scale for each. My code, while not correct, should clearly indicate what I am trying to achieve.
I have looked through several examples, but none of them show how to do this for geom_tile(); it would have been easy for geom_point. Below is a synthetic example of what I am doing. I get the message "Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing scale." Evidently only the second scale_fill_gradient is used, but I would like to see both color gradients, one per variable, in the same heatmap.
It would be great if I could find a way to get this plot. Thank you!
library(reshape2)
library(ggplot2)
set.seed(2)
m1 = matrix(rnorm(100), nrow=10)
m2 = matrix(rnorm(100), nrow=10)
M1 = melt(m1)
M2 = melt(m2)
names(M1) = c("Var1", "Var2", "value1")
names(M2) = c("Var1", "Var2", "value2")
pp1 <- ggplot() +
geom_tile(data=M1, aes(x=Var1, y=Var2, fill=value1)) +
scale_fill_gradient(low="white", high="red") +
geom_tile(data=M2, aes(x=Var1, y=Var2, fill=value2)) +
scale_fill_gradient(low="blue", high="yellow")
pp1
So the legends themselves are no problem with the ggnewscale package, the problem lies in choosing the actual colours that you want to display. So let's make a new matrix with the actual colours you want to display:
library(ggnewscale)
library(scales)
r <- rescale(M1$value1)
# 1 - rescaled value because yellow should be bottom
g <- 1 - rescale(M2$value2)
# Second scale goes from yellow (low) to blue (high)
# The blue channel is held at 1, so only red and green vary
rgb <- rgb(r, g, 1)
# Make new matrix
M3 <- M1
M3$value1 <- rgb
And now plotting would occur as follows:
ggplot(mapping = aes(x = Var1, y = Var2)) +
# This bit is for making scales
geom_tile(data=M1, aes(fill = value1)) +
scale_fill_gradient(low = "white", high = "red") +
new_scale_fill() +
geom_tile(data=M2, aes(fill=value2)) +
scale_fill_gradient(low="yellow", high="blue") +
new_scale_fill() +
# This is the actual colours
geom_tile(data=M3, aes(fill = value1)) +
scale_fill_identity()
The legends aren't 100% accurate since ggplot mixes colours in 'Lab' space, while we've mixed colours in rgb space, but you could replace the scale_fill_gradient() with for example scale_fill_gradientn(colours = rgb(seq(0, 1, length.out = 100), 0, 0)). Also be aware that the white-to-red scale should technically be a black-to-red scale in this example.
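To check how far the mixed colours are from the legends, it can help to look at the four corner cases of the rgb(r, g, 1) mix. A small base R sketch, where value1 and value2 stand for the already rescaled (0 to 1) extremes and the corners data frame is only illustrative:

```r
# Corner cases of the mix used above:
# r = rescaled value1, g = 1 - rescaled value2, blue fixed at 1
corners <- expand.grid(value1 = c(0, 1), value2 = c(0, 1))
corners$colour <- rgb(corners$value1, 1 - corners$value2, 1)
corners
```

With blue held at 1, the tiles run between cyan, white, blue and magenta, so the white-to-red and yellow-to-blue legends should be read as a rough guide only.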
A bivariate color legend. The intervals should maybe be the corresponding quantile.
library(tidyverse)
library(reshape2) # provides melt()
library(cowplot)
set.seed(2)
m1 = matrix(rnorm(100), nrow=10)
m2 = matrix(rnorm(100), nrow=10)
M1 = melt(m1)
M2 = melt(m2)
names(M1) = c("Var1", "Var2", "value1")
names(M2) = c("Var1", "Var2", "value2")
M1$value_cut <- cut(M1$value1, breaks = 3)
M2$value_cut <- cut(M2$value2, breaks = 3)
M1$value_cut2 <- M2$value_cut
M1$cuts <- paste(M1$value_cut, M1$value_cut2, sep = "-")
levels_comb <- expand.grid(lev1 = levels(M1$value_cut), lev2 = levels(M2$value_cut))
levels_comb$cuts <- paste(levels_comb$lev1, levels_comb$lev2, sep = "-")
levels_comb$filling <- c("#be64ac","#8c62aa","#3b4994","#dfb0d6","#a5add3","#5698b9","#e8e8e8","#ace4e4","#5ac8c8")
data_m <- left_join(M1, levels_comb, by = "cuts")
plot_tile <- ggplot(data_m, aes(x = Var1, y = Var2, fill = filling)) +
geom_tile() +
scale_fill_identity() +
coord_equal() +
theme_minimal()
legend_tile <- ggplot(levels_comb, aes(x = lev1, y = lev2, fill = filling)) +
geom_tile() +
scale_fill_identity() +
coord_equal() +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggdraw() +
draw_plot(plot_tile, 0, 0, 1, 1) +
draw_plot(legend_tile, .75, .4, .3, .3)
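The remark that the intervals should maybe be quantiles could be sketched like this in base R. The vector value is a stand-in for M1$value1, and tertiles are just one possible choice of break points:

```r
set.seed(2)
value <- rnorm(100)  # stand-in for M1$value1

# Equal-width bins, as produced by cut(..., breaks = 3)
width_cut <- cut(value, breaks = 3)

# Tertile bins: break points at the 0, 1/3, 2/3 and 1 quantiles
tertiles  <- quantile(value, probs = c(0, 1/3, 2/3, 1))
quant_cut <- cut(value, breaks = tertiles, include.lowest = TRUE)

table(width_cut)
table(quant_cut)
```

Equal-width bins tend to put most observations in the middle class, whereas quantile bins give each class roughly the same number of tiles. include.lowest = TRUE is needed so that the minimum value is not dropped as NA.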
I find geom_col() + facet_grid() to be a useful pattern to get at your goal of visualizing the multiple values from the same area together.
There is a little set-up overhead from your starting data:
names(M1) = c("Var1", "Var2", "value")
names(M2) = c("Var1", "Var2", "value")
M1$type <- "M1"
M2$type <- "M2"
M <- rbind(M1, M2)
But the plot is straightforward. You don't really need the fill scale anymore, but I like to keep it for highlighting the value changes.
ggplot(M) +
geom_col(aes(type, value, fill = value)) +
facet_grid(Var2 ~ Var1) +
scale_fill_gradient(low="blue", high="yellow")
Not sure if this is palatable for you or not, but at least you get to see an alternative viz option.

Overlay plot and histogram in R with ggplot

I am trying to overlay a plot and a histogram in R, using the ggplot2 package.
The plot contains a set of curves (visualized as straight lines due to the logarithmic axis) and a horizontal line.
I would like to plot on the same image a histogram showing the density distribution of the crossing points between the curves and the horizontal line. I can plot the histogram alone but not on the graph, because the aesthetics are not the same length (the last intersection is at x = 800, while the x axis is much longer).
The code I wrote is:
baseplot +
geom_histogram(data = timesdf, aes(v)) + xlim(0,2000)
where v contains the intersections between the curves and the dashed line.
Any ideas?
edited: as suggested I wrote a little reproducible example:
library(ggplot2)
xvalues <- c(0:100)
yvalues1 <- xvalues^2-1000
yvalues2 <- xvalues^3-100
yvalues3 <- xvalues^4-10
yvalues4 <- xvalues^5-50
plotdf <- as.data.frame(xvalues)
plotdf$horiz <- 5
plotdf$vert1 <- yvalues1
plotdf$vert2 <- yvalues2
plotdf$vert3 <- yvalues3
plotdf$vert4 <- yvalues4
baseplot <- ggplot(data = plotdf, mapping = aes(x= xvalues, y= horiz))+
geom_line(linetype = "dashed", size = 1)+
geom_line(data = plotdf, mapping = aes(x= xvalues, y = vert1))+
geom_line(data = plotdf, mapping = aes(x= xvalues, y = vert2))+
geom_line(data = plotdf, mapping = aes(x= xvalues, y = vert3))+
geom_line(data = plotdf, mapping = aes(x= xvalues, y = vert4))+
coord_cartesian(xlim=c(0, 100), ylim=c(0, 1000))
baseplot
v<-c(ncol(plotdf)-1)
for(i in 1:ncol(plotdf)){
v[i] <- plotdf[max(which(plotdf[,i]<5)),1]
}
v <- as.integer(v[-1])
timesdf <- as.data.frame(v)
# my wish: visualize baseplot and histplot on the same image
histplot <- ggplot() + geom_histogram(data = timesdf, aes(v)) +
coord_cartesian(xlim=c(0, 100), ylim=c(0, 10))

alternative for ggplot2 aes: order function?

Does somebody know an alternative method for ordering the stacks of a ggplot2 bar graph?
I used to use for example
library(ggplot2)
library(plyr)
a <- cbind(rep("a",5),sample(1:100,5), rep_len(c("1","2","3"),5))
b <- cbind(rep("b",7),sample(1:100,7), rep_len(c("1","2","3"),7))
c <- cbind(rep("c",3),sample(1:100,3), rep_len(c("1","2","3"),3))
d <- cbind(rep("d",10),sample(1:100,10), rep_len(c("1","2","3"),10))
e <- cbind(rep("e",15),sample(1:100,15), rep_len(c("1","2","3"),15))
dat <- rbind(a,b,c,d,e)
colnames(dat) <- c("x","count","example")
dat <- as.data.frame(dat)
dat$x <- as.character(dat$x)
dat$count <- as.numeric(dat$count)
dat$example <- as.character(dat$example)
GP <- ggplot(dat, aes(x= reorder(x, count, sum), y=count, fill = example, order = desc(count)))+
geom_bar(stat="identity", fill= "grey", colour= "black", size = 1)+
coord_flip() +
scale_y_continuous()+
scale_x_discrete('')+
#scale_fill_brewer()+
labs(y="")+
theme_bw()+
theme(axis.text.y=element_text(size=8,face="bold"),
axis.text.x=element_text(size=10,face="bold"),
axis.title.x=element_text(size=16,face="bold"),
axis.title.y=element_text(size=16,face="bold"),
plot.title=element_text(size=16,face="bold"),
strip.text.x = element_text(size=10,face="bold"),
strip.background = element_blank())
print(GP)
to create graphs like
However, in version 2.0.0 of ggplot2 the order aesthetic has been removed, and now the graph looks like this:
Does anybody know an alternative?
Thanks

Scatterplot with too many points

I am trying to plot two variables where N=700K. The problem is that there is too much overlap, so that the plot becomes mostly a solid block of black. Is there any way of having a grayscale "cloud" where the darkness of the plot is a function of the number of points in an region? In other words, instead of showing individual points, I want the plot to be a "cloud", with the more the number of points in a region, the darker that region.
One way to deal with this is with alpha blending, which makes each point slightly transparent, so regions with more points plotted on them appear darker.
This is easy to do in ggplot2:
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
ggplot(df,aes(x=x,y=y)) + geom_point(alpha = 0.3)
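As a rough guide for choosing alpha, and assuming standard "over" compositing of identical points drawn exactly on top of each other, n overlapping points with opacity alpha combine to an opacity of 1 - (1 - alpha)^n. A small sketch (the helper name stacked_opacity is made up for illustration):

```r
# Combined opacity of n identical points drawn on top of each other,
# assuming standard "over" alpha compositing
stacked_opacity <- function(alpha, n) 1 - (1 - alpha)^n

n <- c(1, 2, 5, 10, 50)
data.frame(
  n          = n,
  alpha_0.3  = round(stacked_opacity(0.3, n), 3),
  alpha_0.05 = round(stacked_opacity(0.05, n), 3)
)
```

The opacity saturates quickly for larger alpha values, so with hundreds of thousands of points a much smaller alpha than 0.3 is probably needed before dense regions stop looking uniformly black.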
Another convenient way to deal with this (and probably more appropriate for the number of points you have) is hexagonal binning:
ggplot(df,aes(x=x,y=y)) + stat_binhex()
And there is also regular old rectangular binning, which is more like your traditional heatmap:
ggplot(df,aes(x=x,y=y)) + geom_bin2d()
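For intuition about what rectangular binning computes, here is a base R sketch that counts points per cell and draws the counts as a grayscale "cloud". The 30 x 30 grid and the 64-step gray palette are arbitrary choices for illustration:

```r
set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000)

# Assign every point to one of 30 x 30 rectangular cells and count per cell
counts <- table(cut(x, breaks = 30), cut(y, breaks = 30))

# Draw the counts as a grayscale image: white = empty, black = densest cell
image(unclass(counts), col = gray(seq(1, 0, length.out = 64)),
      xlab = "x bin", ylab = "y bin")
```

The drawing cost depends on the number of cells rather than the number of points, which is why binning tends to scale better than alpha blending for very large N.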
An overview of several good options in ggplot2:
library(ggplot2)
x <- rnorm(n = 10000)
y <- rnorm(n = 10000, sd=2) + x
df <- data.frame(x, y)
Option A: transparent points
o1 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05)
Option B: add density contours
o2 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05) +
geom_density_2d()
Option C: add filled density contours
(Note that the points distort the perception of the colors underneath, may be better without points.)
o3 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = stat(level)), geom = 'polygon') +
scale_fill_viridis_c(name = "density") +
geom_point(shape = '.')
Option D: density heatmap
(Same note as C.)
o4 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = stat(density)), geom = 'raster', contour = FALSE) +
scale_fill_viridis_c() +
coord_cartesian(expand = FALSE) +
geom_point(shape = '.', col = 'white')
Option E: hexbins
(Same note as C.)
o5 <- ggplot(df, aes(x, y)) +
geom_hex() +
scale_fill_viridis_c() +
geom_point(shape = '.', col = 'white')
Option F: rugs
Possibly my favorite option. Not quite as flashy, but visually simple and simple to understand. Very effective in many cases.
o6 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.1) +
geom_rug(alpha = 0.01)
Combine in one figure:
cowplot::plot_grid(
o1, o2, o3, o4, o5, o6,
ncol = 2, labels = 'AUTO', align = 'v', axis = 'lr'
)
You can also have a look at the ggsubplot package. This package implements features which were presented by Hadley Wickham back in 2011 (http://blog.revolutionanalytics.com/2011/10/ggplot2-for-big-data.html).
(In the following, I include the "points"-layer for illustration purposes.)
library(ggplot2)
library(ggsubplot)
# Make up some data
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each=5000),
xvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)),
yvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)))
# Scatterplot with subplots (simple)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(rep("dummy", length(xvar)), ..count..))), bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
This feature really shines, however, if you have a third variable to control for.
# Scatterplot with subplots (including a third variable)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1, aes(color = factor(cond))) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(cond, ..count.., fill = cond))),
bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
Or another approach would be to use smoothScatter():
smoothScatter(dat[2:3])
Alpha blending is easy to do with base graphics as well.
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
with(df, plot(x, y, col="#00000033"))
The first six hex digits after the # are the color in RGB and the last two are the opacity, again in hex, so 33 is 51/255, i.e. 20% opaque.
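If you would rather not work out the hex alpha by hand, the conversion can be done in base R. A short sketch:

```r
# Hex alpha "33" as a fraction of full opacity
strtoi("33", base = 16L) / 255

# Going the other way: build the colour string from a numeric alpha
semi_black <- rgb(0, 0, 0, alpha = 0.2)
semi_black

df <- data.frame(x = rnorm(5000), y = rnorm(5000))
with(df, plot(x, y, col = semi_black))
```

rgb() with an alpha argument returns the same "#RRGGBBAA" form that was typed out manually above.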
You can also use density contour lines (ggplot2):
df <- data.frame(x = rnorm(15000),y=rnorm(15000))
ggplot(df,aes(x=x,y=y)) + geom_point() + geom_density2d()
Or combine density contours with alpha blending:
ggplot(df,aes(x=x,y=y)) +
geom_point(colour="blue", alpha=0.2) +
geom_density2d(colour="black")
You may find useful the hexbin package. From the help page of hexbinplot:
library(hexbin)
mixdata <- data.frame(x = c(rnorm(5000),rnorm(5000,4,1.5)),
y = c(rnorm(5000),rnorm(5000,2,3)),
a = gl(2, 5000))
hexbinplot(y ~ x | a, mixdata)
geom_pointdensity from the ggpointdensity package (recently developed by Lukas Kremer and Simon Anders (2019)) allows you to visualize density and individual data points at the same time:
library(ggplot2)
# install.packages("ggpointdensity")
library(ggpointdensity)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))
ggplot(df, aes(x=x, y=y)) + geom_pointdensity() + scale_color_viridis_c()
My favorite method for plotting this type of data is the one described in this question - a scatter-density plot. The idea is to do a scatter-plot but to colour the points by their density (roughly speaking, the amount of overlap in that area).
It simultaneously:
clearly shows the location of outliers, and
reveals any structure in the dense area of the plot.
Here is the result from the top answer to the linked question:

Resources