This should seem relatively straightforward but I can't find an argument which would allow me to do this and I've searched Google and Stack for an answer.
Sample code:
library(ggplot2)
library(plotly)
dat <- data.frame(cond = factor(rep(c("A","B"), each=200)), rating = c(rnorm(200),rnorm(200, mean=.8)))
p <- ggplot(dat, aes(x=cond, y=rating, fill=cond)) + geom_boxplot()
p <- ggplotly(p)
This outputs the first graph, I would want something like the second.
I tried including colour=cond but that gets rid of the median.
Two possible hacks for consideration, using the same dataset as Marco Sandri's answer.
Hack 1. If you don't really need it to work in plotly, just static ggplot image:
ggplot(dat, aes(x=cond, y=rating, fill=cond)) +
geom_boxplot() +
geom_boxplot(aes(color = cond),
fatten = NULL, fill = NA, coef = 0, outlier.alpha = 0,
show.legend = FALSE)
This overlays the original boxplot with a version that's essentially an outline of the outer box, hiding the median (fatten = NULL), fill colour (fill = NA), whiskers (coef = 0) & outliers (outlier.alpha = 0).
However, it doesn't appear to work well with plotly. I've tested it with the dev version of ggplot2 (as recommended by plotly) to no avail. See output below:
Hack 2. If you need it to work in plotly:
library(dplyr)  # provides %>%, group_by(), mutate() and case_when()
ggplot(dat %>%
group_by(cond) %>%
mutate(rating.IQR = case_when(rating <= quantile(rating, 0.3) ~ quantile(rating, 0.25),
TRUE ~ quantile(rating, 0.75))),
aes(x=cond, y=rating, fill=cond)) +
geom_boxplot() +
geom_boxplot(aes(color = cond, y = rating.IQR),
fatten = NULL, fill = NA)
(ggplot output is same as above)
plotly doesn't seem to understand the coef = 0 & outlier.alpha = 0 arguments, so this hack creates a modified version of the y variable, such that everything at or below P30 is set to P25, and everything above is set to P75. This creates a boxplot with no outliers, no whiskers, and a median that sits together with the upper box limit at P75.
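The effect of that recoding can be checked outside ggplot2. The following is a minimal base R sketch, not part of the original answer: it applies the same rule to a single simulated group (the names `q` and `rating_iqr` are made up for illustration) and looks at the five-number summary a boxplot would be drawn from.

```r
# Sketch only: reproduce the Hack 2 recoding for one group in base R.
set.seed(1)
rating <- rnorm(200)

# P25, P30 and P75 of the original values
q <- unname(quantile(rating, c(0.25, 0.30, 0.75)))

# Everything at or below P30 becomes P25, everything else becomes P75
rating_iqr <- ifelse(rating <= q[2], q[1], q[3])

# Five-number summary used for the box: lower whisker, lower hinge,
# median, upper hinge, upper whisker
box <- boxplot.stats(rating_iqr)
print(box$stats)
print(length(box$out))  # number of points flagged as outliers
```

Because 30% of the recoded values sit at P25 and the rest at P75, the lower whisker should collapse onto the lower hinge, and the median and upper whisker onto the upper hinge, which is why only the outline of the box remains.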
It's more cumbersome, but it works in plotly:
Here is an inelegant solution based on grobs:
set.seed(1)
dat <- data.frame(cond = factor(rep(c("A","B"), each=200)),
rating = c(rnorm(200),rnorm(200, mean=.8)))
library(ggplot2)
library(plotly)
p <- ggplot(dat, aes(x=cond, y=rating, fill=cond)) + geom_boxplot()
# Generate a ggplot2 plot grob
g <- ggplotGrob(p)
# The first box-and-whiskers grob
box_whisk1 <- g$grobs[[6]]$children[[3]]$children[[1]]
pos.box1 <- which(grepl("geom_crossbar",names(box_whisk1$children)))
g$grobs[[6]]$children[[3]]$children[[1]]$children[[pos.box1]]$children[[1]]$gp$col <-
g$grobs[[6]]$children[[3]]$children[[1]]$children[[pos.box1]]$children[[1]]$gp$fill
# The second box-and-whiskers grob
box_whisk2 <- g$grobs[[6]]$children[[3]]$children[[2]]
pos.box2 <- which(grepl("geom_crossbar",names(box_whisk2$children)))
g$grobs[[6]]$children[[3]]$children[[2]]$children[[pos.box2]]$children[[1]]$gp$col <-
g$grobs[[6]]$children[[3]]$children[[2]]$children[[pos.box2]]$children[[1]]$gp$fill
library(grid)
grid.draw(g)
P.S. To my knowledge, the above code cannot be used for generating plotly graphs.
I couldn't find a way to suppress the outer frame when combining graphs through ggplot2 + ggExtra + cowplot. I am not sure where I have to tell R to drop it, but I suspect the issue lies in ggExtra. Here is an example:
require(ggplot2)
require(cowplot)
require(ggExtra)
# Create a graph
A <- ggplot(mpg, aes(x = cty, y = hwy, colour = factor(cyl))) + geom_point(size = 2.5)
# Add marginal histogram
B <- ggExtra::ggMarginal(A,type = 'histogram', margins = 'x', size = 9)
# Combine through cowplot
combo <- plot_grid(B,B,labels=c("A","B"))
plot(combo) # looks fine
# Re-combine through cowplot
plot_grid(B,combo,ncol=1,rel_heights = c(2,3)) # that's where I got an unwanted nasty frame around 'combo'
Any hint would be greatly appreciated!
p <- plot_grid(B,combo,ncol=1,rel_heights = c(2,3))
p <- p + panel_border(remove = TRUE)
https://rdrr.io/cran/cowplot/man/panel_border.html
I am trying to plot two variables where N=700K. The problem is that there is too much overlap, so that the plot becomes mostly a solid block of black. Is there any way of having a grayscale "cloud" where the darkness of the plot is a function of the number of points in an region? In other words, instead of showing individual points, I want the plot to be a "cloud", with the more the number of points in a region, the darker that region.
One way to deal with this is with alpha blending, which makes each point slightly transparent, so regions that have more points plotted on them appear darker.
This is easy to do in ggplot2:
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
ggplot(df,aes(x=x,y=y)) + geom_point(alpha = 0.3)
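To see why overlap reads as darkness, and how to pick alpha, here is a small sketch. It assumes standard "over" alpha compositing of same-coloured points; the helper name `combined_opacity` is made up for illustration, and real graphics devices may round slightly differently.

```r
# Sketch only: combined opacity of n overlapping points, each drawn with
# opacity `alpha`, under standard "over" alpha compositing.
combined_opacity <- function(alpha, n) {
  1 - (1 - alpha)^n
}

n_overlap <- c(1, 2, 5, 10, 50)
print(round(combined_opacity(0.30, n_overlap), 3))
print(round(combined_opacity(0.05, n_overlap), 3))
```

With alpha = 0.3 a region is already about 97% opaque after ten overlapping points, so for a very large N a much smaller alpha (or one of the binning approaches) keeps dense regions distinguishable from each other.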
Another convenient way to deal with this (and probably more appropriate for the number of points you have) is hexagonal binning:
ggplot(df,aes(x=x,y=y)) + stat_binhex()
And there is also regular old rectangular binning (image omitted), which is more like your traditional heatmap:
ggplot(df,aes(x=x,y=y)) + geom_bin2d()
An overview of several good options in ggplot2:
library(ggplot2)
x <- rnorm(n = 10000)
y <- rnorm(n = 10000, sd=2) + x
df <- data.frame(x, y)
Option A: transparent points
o1 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05)
Option B: add density contours
o2 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05) +
geom_density_2d()
Option C: add filled density contours
(Note that the points distort the perception of the colors underneath, may be better without points.)
o3 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = stat(level)), geom = 'polygon') +
scale_fill_viridis_c(name = "density") +
geom_point(shape = '.')
Option D: density heatmap
(Same note as C.)
o4 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = stat(density)), geom = 'raster', contour = FALSE) +
scale_fill_viridis_c() +
coord_cartesian(expand = FALSE) +
geom_point(shape = '.', col = 'white')
Option E: hexbins
(Same note as C.)
o5 <- ggplot(df, aes(x, y)) +
geom_hex() +
scale_fill_viridis_c() +
geom_point(shape = '.', col = 'white')
Option F: rugs
Possibly my favorite option. Not quite as flashy, but visually simple and simple to understand. Very effective in many cases.
o6 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.1) +
geom_rug(alpha = 0.01)
Combine in one figure:
cowplot::plot_grid(
o1, o2, o3, o4, o5, o6,
ncol = 2, labels = 'AUTO', align = 'v', axis = 'lr'
)
You can also have a look at the ggsubplot package. This package implements features which were presented by Hadley Wickham back in 2011 (http://blog.revolutionanalytics.com/2011/10/ggplot2-for-big-data.html).
(In the following, I include the "points"-layer for illustration purposes.)
library(ggplot2)
library(ggsubplot)
# Make up some data
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each=5000),
xvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)),
yvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)))
# Scatterplot with subplots (simple)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(rep("dummy", length(xvar)), ..count..))), bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
However, this feature rocks if you have a third variable to control for.
# Scatterplot with subplots (including a third variable)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1, aes(color = factor(cond))) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(cond, ..count.., fill = cond))),
bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
Or another approach would be to use smoothScatter():
smoothScatter(dat[2:3])
Alpha blending is easy to do with base graphics as well.
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
with(df, plot(x, y, col="#00000033"))
The first six hex digits after the # are the color in RGB and the last two are the opacity, again in hex, so 33 is 51 out of 255, i.e. about 20% opaque.
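If typing hex by hand is awkward, the same colour can be built from a fractional opacity. This is a small base R sketch (the variable names are made up for illustration) using `strtoi()`, `rgb()` and `adjustcolor()`:

```r
# Sketch only: relate the two-digit hex alpha suffix to a 0-1 opacity.
alpha_hex <- "33"
alpha_frac <- strtoi(alpha_hex, base = 16L) / 255   # 51 / 255

# Build the same colour from a fractional alpha instead of typing hex
col_from_rgb <- rgb(0, 0, 0, alpha = 0.2)
col_from_adj <- adjustcolor("black", alpha.f = 0.2)

print(alpha_frac)
print(col_from_rgb)
print(col_from_adj)
```

Either result can be passed as `col` in the `plot()` call above.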
You can also use density contour lines (ggplot2):
df <- data.frame(x = rnorm(15000),y=rnorm(15000))
ggplot(df,aes(x=x,y=y)) + geom_point() + geom_density2d()
Or combine density contours with alpha blending:
ggplot(df,aes(x=x,y=y)) +
geom_point(colour="blue", alpha=0.2) +
geom_density2d(colour="black")
You may find the hexbin package useful. From the help page of hexbinplot:
library(hexbin)
mixdata <- data.frame(x = c(rnorm(5000),rnorm(5000,4,1.5)),
y = c(rnorm(5000),rnorm(5000,2,3)),
a = gl(2, 5000))
hexbinplot(y ~ x | a, mixdata)
geom_pointdensity from the ggpointdensity package (recently developed by Lukas Kremer and Simon Anders (2019)) allows you to visualize density and individual data points at the same time:
library(ggplot2)
# install.packages("ggpointdensity")
library(ggpointdensity)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))
ggplot(df, aes(x=x, y=y)) + geom_pointdensity() + scale_color_viridis_c()
My favorite method for plotting this type of data is the one described in this question - a scatter-density plot. The idea is to do a scatter-plot but to colour the points by their density (roughly speaking, the amount of overlap in that area).
It simultaneously:
clearly shows the location of outliers, and
reveals any structure in the dense area of the plot.
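One way to sketch the idea in base R is `densCols()` from grDevices, which returns one colour per point based on a 2D kernel density estimate (it relies on the KernSmooth package). This is an illustration under those assumptions, with simulated data, and not necessarily the method used in the linked answer:

```r
# Sketch only: colour each point by the local density of points,
# using densCols() from grDevices (it relies on the KernSmooth package).
set.seed(1)
x <- c(rnorm(4000), rnorm(1000, mean = 3, sd = 0.5))
y <- c(rnorm(4000), rnorm(1000, mean = 3, sd = 0.5))

# One colour per point: light where points are sparse, dark where dense
point_cols <- densCols(x, y, colramp = colorRampPalette(c("grey85", "black")))

plot(x, y, col = point_cols, pch = 20, cex = 0.6)
```

The grey-to-black ramp gives the grayscale "cloud" asked for in the question while isolated outliers stay visible as light points; any other `colorRampPalette()` can be swapped in.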
Here is the result from the top answer to the linked question:
I've created a plot of categorical data using facet in ggplot.
Example script here:
#script to produce plot with dummy data
rm(list=ls(all=TRUE))
library(ggplot2)
require(gridExtra)
#put dummy data in df
dummy_data<-data.frame(experiment_number=c(rep("exp_1",15),rep("exp_2",15)),
group=rep(c("A","B","C"),5),yvalue=runif(30, 0.0, 0.05))
# make plot
plot1<-ggplot(data = dummy_data)+
geom_point(aes(x = group, y = yvalue,
colour=group,shape=group),size=3.5,position = position_jitter(w = 0.2)) +
facet_wrap( ~ experiment_number) +
ylab("yvalue") +
xlab("")
#plot
plot1
I now want to add text & bars below the plot to show the p values relating to a statistical test between the groups. An example that I've just drawn by hand is attached (p values just made up).
Note the p values will be different in the two different panels. I've played around with annotate & custom annotate but can't seem to get it to work. Any ideas?
thanks v much
Here's a totally ridiculous way of doing something similar to what you are asking for. I used geom_errorbar for the bars, so I had to flip the coordinate system. Anyway, you should be able to customize this to do what you need.
rm(list=ls(all=TRUE))
library(ggplot2)
#put dummy data in df
dummy_data<-data.frame(experiment_number=c(rep("exp_1",15),rep("exp_2",15)),
group=rep(c("A","B","C"),5),yvalue=runif(30, 0.0, 0.05))
# make plot
plot1<-ggplot(data = dummy_data)+
geom_point(aes(y = group, x = yvalue, #changed x and y
colour=group,shape=group),size=3.5,position = position_jitter(h = 0.2)) + # changed w=... to h=...
facet_wrap( ~ experiment_number) +
xlab("yvalue") +
ylab("") + coord_flip() # flipped coordinate system
#plot
rng <- range(dummy_data$yvalue) # range
df.lines <- data.frame(ymin=LETTERS[1:3], ymax=LETTERS[c(2,3,1)], x=rng[1]-diff(rng)*1:3/12) #data for geom_errorbar
# data for geom_text
df.txt <- data.frame(y=c("AB", "BC", "B"),
x=rng[1]-diff(rng)*(1:3+.5)/12,
label=c("p=0.003", "p=0.05", "p=0.6",
"p=0.2", "p=0.1", "p=0.05"),
experiment_number=rep(c("exp_1", "exp_2"), each=3))
# add some space and geom_errorbar and geom_text
plot2 <- plot1 + scale_x_continuous(limits=c(rng[1]-diff(rng)/3, rng[2]+diff(rng)/5)) +
geom_errorbar(data=df.lines, aes(x=x, ymin=ymin, ymax=ymax)) +
scale_y_discrete(breaks=LETTERS[1:3], limits=c("A", "AB", "B", "BC", "C")) +
geom_text(data=df.txt, aes(x=x, y=y, label=label), hjust=0.5) # hjust, not xjust: geom_text has no xjust argument
plot2