gradient fill violin plots using ggplot2 - r

I want to gradient fill a violin plot based on the density of points in the bins (blue for highest density and red for lowest).
I generated a plot using the following commands but failed to color it based on density (in this case, the width of the violin). I would also like to generate box plots with similar coloring.
library("ggplot2")
data(diamonds)
ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin()

To change the colour of the violin plot, map a variable to fill, like this:
ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin(aes(fill=cut))
The same goes for a boxplot:
ggplot(diamonds, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=cut))
But the fill variable has to have a single value for each cut. That is, if you wanted to use, for example, the mean price per cut as the color variable, you would have to compute it first.
With dplyr, group diamonds by cut and use summarize to get the mean price (or any other variable):
library(dplyr)
diamonds_group <- group_by(diamonds, cut)
diamonds_group <- summarize(diamonds_group, Mean_Price = mean(price))
Then I used diamonds2 as a copy of diamonds so that I could manipulate the dataset:
diamonds2 <- diamonds
I merge both data frames to get Mean_Price as a variable in diamonds2:
diamonds2 <- merge(diamonds2, diamonds_group)
And now I can plot it with the mean price as the color variable:
ggplot(diamonds2, aes(x=cut,y=carat)) + geom_boxplot(aes(fill=Mean_Price)) + scale_fill_gradient2(midpoint = mean(diamonds2$price))
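As a variation on the steps above, here is a minimal sketch that attaches the per-cut mean to every row with a grouped mutate(), so no separate summary table or merge() is needed. The names diamonds_filled and p_box are made up for this sketch, and the red-to-blue scale is just one choice of gradient.

```r
library(ggplot2)
library(dplyr)

# Attach the per-cut mean price to every row of diamonds.
# diamonds_filled is a name chosen for this sketch only.
diamonds_filled <- diamonds %>%
  group_by(cut) %>%
  mutate(Mean_Price = mean(price)) %>%
  ungroup()

# The fill value is constant within each cut, so each box gets one colour.
p_box <- ggplot(diamonds_filled, aes(x = cut, y = carat)) +
  geom_boxplot(aes(fill = Mean_Price, group = cut)) +
  scale_fill_gradient(low = "red", high = "blue")
```

Printing p_box draws the plot. The same data frame should work with geom_violin() in place of geom_boxplot().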

I just answered this in another thread, but I believe it is possibly more appropriate here. You can create a pseudo-fill by drawing many segments. You can get those directly from the underlying data in the object returned by ggplot_build().
If you want an additional polygon outline ("border"), you need to create it from the x/y coordinates. Below is one option.
library(tidyverse)
p <- ggplot(diamonds, aes(x=cut,y=carat)) + geom_violin()
mywidth <- .35 # bit of trial and error
# all you need for the gradient fill
vl_fill <- data.frame(ggplot_build(p)$data) %>%
  mutate(xnew = x- mywidth*violinwidth, xend = x+ mywidth*violinwidth)
# the outline is a bit more convoluted, as the order matters
vl_poly <- vl_fill %>%
  select(xnew, xend, y, group) %>%
  pivot_longer(-c(y, group), names_to = "oldx", values_to = "x") %>%
  arrange(y) %>%
  split(., .$oldx) %>%
  map(., function(x) {
    if(all(x$oldx == "xnew")) x <- arrange(x, desc(y))
    x
  }) %>%
  bind_rows()
ggplot() +
  geom_polygon(data = vl_poly, aes(x, y, group = group),
               color= "black", size = 1, fill = NA) +
  geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
                                   color = violinwidth))
Created on 2021-04-14 by the reprex package (v1.0.0)
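To see what the segments are built from, it can help to look at the computed layer data first. This is only an inspection sketch: it uses layer_data(), which returns the same per-layer data frame as ggplot_build(p)$data for a single-layer plot, and the name vl is made up here.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = cut, y = carat)) + geom_violin()

# One row per evaluated y position per violin; violinwidth is the
# relative half-width that the segment code multiplies by mywidth.
vl <- layer_data(p, 1)
head(vl[, c("x", "y", "violinwidth", "group")])
range(vl$violinwidth)
```

Each row becomes one horizontal segment from x - mywidth * violinwidth to x + mywidth * violinwidth, which is why mapping color to violinwidth colors the violin by its own width.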

Related

How to make a violin plot gradient fill as a colour Scale based on Y-axis

I am trying to assign a colour scale/gradient to some violin plots based on the y-axis value of Income. However, I only get a white violin plot. I can change the colour based on state.region but not Income.
Data
USA.states <- data.frame(state.region,state.x77)
Code
p <- ggplot(USA.states,aes(x=state.region,y=Income,fill=Income))+
geom_violin(trim = F,)+
ggtitle("Violin plot of income and Population")
p + scale_fill_gradient(low="red",high="blue")
I assigned the fill to Income but it just ends up filled with white.
You can create a pseudo-fill from segments, and you can create those directly from the underlying data (in the object returned by ggplot_build()).
If you want an additional polygon outline, you would still need to create the polygons manually, using the x and y coordinates as calculated for the segments. (There is certainly a cleverer way to put this into a data frame than below, so don't take this as gospel.)
On another note, the violins in the original plot seem to be scaled, but I don't exactly understand how, so I just scaled them with a constant that I found by trial and error.
library(tidyverse)
USA.states <- data.frame(state.region,state.x77)
p <- ggplot(USA.states,aes(x=state.region,y=Income,fill=Income))+
geom_violin(trim = F)
mywidth <- .35 # bit of trial and error
# This is all you need for the fill:
vl_fill <- data.frame(ggplot_build(p)$data) %>%
  mutate(xnew = x- mywidth*violinwidth, xend = x+ mywidth*violinwidth)
# Bit convoluted for the outline, need to be rearranged: the order matters
vl_poly <-
  vl_fill %>%
  select(xnew, xend, y, group) %>%
  pivot_longer(-c(y, group), names_to = "oldx", values_to = "x") %>%
  arrange(y) %>%
  split(., .$oldx) %>%
  map(., function(x) {
    if(all(x$oldx == "xnew")) x <- arrange(x, desc(y))
    x
  }) %>%
  bind_rows()
ggplot() +
  geom_polygon(data = vl_poly, aes(x, y, group = group),
               color= "black", size = 1, fill = NA) +
  geom_segment(data = vl_fill, aes(x = xnew, xend = xend, y = y, yend = y,
                                   color = y))
Created on 2021-04-14 by the reprex package (v1.0.0)

Convert a geom_tile in dotplot in ggplot2

I am making several heatmaps in ggplot2 using geom_tile. They work great, but what if I want dots instead of tiles (little rectangles)? My input is a binary matrix (converted to a table using the melt function).
My x and y are discrete factors. How do I produce circles or dots instead of tiles? Any ideas?
Thanks!
example:
dat=data.frame(sample = c("a","a","a","b","b","b","c","c","c"), cond=c("x","y","z","x","y","z","x","y","z"),value=c("1","4","6","2","3","7","4","6","7"),score=c(0,1,1,0,0,0,1,1,1))
if I use the following plot:
ggplot(dat, aes(x = sample, y = cond, color = value)) +
geom_point()
I get the wrong plot. Instead, I would like to show a dot only where the score is 1 (and no dot where it is 0), and color the dots by the value factor.
I assume you mean to map score to your color aesthetic and not value, as written in your shared code.
Simply convert score to a factor in your initial aesthetics call:
ggplot(dat, aes(x = sample, y = cond, color = as.factor(score))) +
geom_point()
EDIT:
The user indicated that he would like to filter observations where score is not equal to 1, and then color the points by value. You can do so by adding the following pipe operation:
library(dplyr) # provides %>% and filter()
dat %>%
  filter(score == 1) %>%
  ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
  geom_point()
Note that there are only 3 levels of the factor value, and level b of sample is missing from the x-axis. Keep all levels by specifying drop = FALSE in scale_x_discrete() (this relies on sample being a factor, which data.frame() produced by default before R 4.0.0; otherwise convert it with factor() first):
dat %>%
  filter(score == 1) %>%
  ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
  geom_point() +
  scale_x_discrete(drop = FALSE)
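For anyone running this on R 4.0.0 or later, where data.frame() no longer turns strings into factors, here is a minimal sketch of the same idea with the conversion made explicit. The name kept and the use of rep() to rebuild the question's data are my own shorthand.

```r
library(ggplot2)
library(dplyr)

# Same data as in the question, rebuilt with rep() for brevity.
dat <- data.frame(
  sample = rep(c("a", "b", "c"), each = 3),
  cond   = rep(c("x", "y", "z"), times = 3),
  value  = c("1", "4", "6", "2", "3", "7", "4", "6", "7"),
  score  = c(0, 1, 1, 0, 0, 0, 1, 1, 1)
)

# Make sample a factor BEFORE filtering so the unused level "b" survives.
dat$sample <- factor(dat$sample)
kept <- dat %>% filter(score == 1)

p_dots <- ggplot(kept, aes(x = sample, y = cond, color = value)) +
  geom_point() +
  scale_x_discrete(drop = FALSE)
```

Because filter() keeps factor levels, "b" stays on the x-axis even though none of its rows remain.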

Unintended line across X axis of density plot (r)

I am trying to identify why I have a purple line appearing along the x axis that is the same color as "Prypchan, Lida" from my legend. I took a look at the data and do not see any issues there.
ggplot(LosDoc_Ex, aes(x = LOS)) +
geom_density(aes(colour = AttMD)) +
theme(legend.position = "bottom") +
xlab("Length of Stay") +
ylab("Distribution") +
labs(title = "LOS Analysis * ",
caption = "*excluding Residential and WSH",
color = "Attending MD: ")
Usually I'd wait for a reproducible example, but in this case, I'd say the underlying explanation is really quite straightforward:
geom_density() creates a polygon, not a line.
Using a sample dataset from ggplot2's own package, we can observe the same straight line below the density plots, covering the x-axis & y-axis. The colour of the line simply depends on which plot is on top of the rest:
p <- ggplot(diamonds, aes(carat, colour = cut)) +
geom_density()
Workaround 1: You can manually calculate the density values yourself for each colour group in a new data frame, & plot the results using geom_line() instead of geom_density():
library(dplyr)
library(tidyr)
library(purrr)
# nest(data = ...) and unnest(cols = ...) are the tidyr >= 1.0.0 spellings
diamonds2 <- diamonds %>%
  nest(data = -cut) %>%
  mutate(density = map(data, ~density(.x$carat))) %>%
  mutate(density.x = map(density, ~.x[["x"]]),
         density.y = map(density, ~.x[["y"]])) %>%
  select(cut, density.x, density.y) %>%
  unnest(cols = c(density.x, density.y))
ggplot(diamonds2, aes(x = density.x, y = density.y, colour = cut)) +
geom_line()
Workaround 2: Or you can take the data generated by the original plot, & plot that using geom_line(). The colours would need to be remapped to the legend values though:
lp <- layer_data(p)
if(is.factor(diamonds$cut)) {
  col.lev = levels(diamonds$cut)
} else {
  col.lev = sort(unique(diamonds$cut))
}
lp$cut <- factor(lp$group, labels = col.lev)
ggplot(lp, aes(x = x, y = ymax, colour = cut)) +
  geom_line()
There are two simple workarounds. First, if you only want lines and no filled areas, you can simply use geom_line() with the density stat:
library(ggplot2)
ggplot(diamonds, aes(x = carat, y = after_stat(density), colour = cut)) +
  geom_line(stat = "density")
Note that for this to work, we need to set the y aesthetic to after_stat(density) (written stat(density) in ggplot2 versions before 3.3.0).
Second, if you want the area under the lines to be filled, you can use geom_density_line() from the ggridges package. It works exactly like geom_density() but draws a line (with filled area underneath) rather than a polygon.
library(ggridges)
ggplot(diamonds, aes(x = carat, colour = cut, fill = cut)) +
geom_density_line(alpha = 0.2)
Created on 2018-12-14 by the reprex package (v0.2.1)
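A later note, hedged because it depends on the installed version: as far as I know, ggplot2 3.3.0 and later give geom_density() an outline.type argument whose default, "upper", draws only the top edge of each density, so the baseline may no longer appear at all. The sketch below assumes such a version; p_upper and p_full are names made up for the comparison.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.3.0, where geom_density() accepts outline.type.
# "upper" outlines only the curve; "full" closes the polygon and so
# reproduces the line along the x-axis described in the question.
p_upper <- ggplot(diamonds, aes(carat, colour = cut)) +
  geom_density(outline.type = "upper")

p_full <- ggplot(diamonds, aes(carat, colour = cut)) +
  geom_density(outline.type = "full")
```

Printing the two plots side by side shows whether the baseline comes from the closed outline.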

R Highlight point on ecdf line graph

I'm creating a frequency plot using ggplot and the stat_ecdf function. I would like to add the Y-value to the graph for specific X-values, but just can't figure out how. geom_point or geom_text seem like likely options, but as stat_ecdf automatically calculates Y, I don't know how to refer to that value in the geom_point/geom_text mappings.
Sample code for my initial plot is:
x = as.data.frame(rnorm(100))
ggplot(x, aes(x)) +
stat_ecdf()
Now how would I add specific x-y points here, e.g. the y-value at x = -1?
The easiest way is to create the ecdf function beforehand using ecdf() from the stats package, then plot it using geom_label().
library(ggplot2)
# create a data.frame with column name
x = data.frame(col1 = rnorm(100))
# create ecdf function
e = ecdf(x$col1)
# plot the result
ggplot(x, aes(col1)) +
stat_ecdf() +
geom_label(aes(x = -1, y = e(-1)),
label = e(-1))
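A small variation on the same idea, in case it is useful: ecdf() returns a function, so the y-value can be computed once and drawn with annotate(), which adds a single point and label rather than one per data row. The seed, the name y_at and the styling are my own choices for this sketch.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the sketch is reproducible
x <- data.frame(col1 = rnorm(100))

# ecdf() returns a step function; evaluating it at -1 gives the
# proportion of observations less than or equal to -1.
e <- ecdf(x$col1)
y_at <- e(-1)

p_ecdf <- ggplot(x, aes(col1)) +
  stat_ecdf() +
  annotate("point", x = -1, y = y_at, colour = "red") +
  annotate("label", x = -1, y = y_at, label = round(y_at, 2), vjust = -0.5)
```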
You can try
library(tidyverse)
# data
set.seed(123)
df = data.frame(x=rnorm(100))
# Plot
Values <- c(-1,0.5,2)
df %>%
mutate(gr=FALSE) %>%
bind_rows(data.frame(x=Values,gr=TRUE)) %>%
mutate(y=ecdf(x)(x)) %>%
mutate(xmin=min(x)) %>%
ggplot(aes(x, y)) +
stat_ecdf() +
geom_point(data=. %>% filter(gr), aes(x, y)) +
geom_segment(data=. %>% filter(gr),aes(y=y,x=xmin, xend=x,yend=y), color="red")+
geom_segment(data=. %>% filter(gr),aes(y=0,x=x, xend=x,yend=y), color="red") +
ggrepel::geom_label_repel(data=. %>% filter(gr),
aes(x, y, label=paste("x=",round(x,2),"\ny=",round(y,2))))
The idea is to add the y values at the beginning, together with the indicator gr specifying which Values you want to show.
Edit:
Since this code adds points to the actual data, which could distort the curve, you should consider excluding these points at least in the ecdf layer: stat_ecdf(data=. %>% filter(!gr))

Ggplot Heatmap - customized colors for customized count ranges

I want to make a heatmap that creates a group of clarity & color combinations as the X axis and cut as the Y axis. The heatmap would color based upon the counts of clarity+color and its intersection with the cut.
library(ggplot2)
library(dplyr)
## rename diamonds df
# 1. Generate a count for the frequency of cut+clarity
# 2. Make a heatmap of this using the following bins
# 3. Red <= 100 Frequency
#    Yellow = between (100 and 500)
#    Green > 500
# place counts inside the cell:
df = diamonds %>%
select( cut, clarity) %>%
group_by(cut,clarity)%>%
mutate(count = n())
myplot = ggplot(df, aes(x = clarity, y=cut)) +
geom_bin2d( bins = c(100,500,50000), col='orange') #
geom_text( aes(label = count),col='red')
myplot
Try this:
df$col <- cut(df$count,breaks = c(-Inf,100,500,Inf),right = TRUE)
df$color<-df$col
levels(df$color) <- c("<=100","100<#<=500",">500")
ggplot(data = df, aes(x = clarity, y = cut)) +
  geom_tile(aes(fill = color), colour = "white") +
  scale_fill_brewer("Count", palette = "Set1") +
  geom_text(aes(label = count), col = 'yellow', size = 3)
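As a possible alternative, here is a sketch that first collapses the data to one row per cut/clarity cell and then uses the red, yellow and green colours the question asked for. The names counts, bin and p_heat and the bin labels are my own. base::cut() is written out in full only to make clear that the function is meant rather than the cut column.

```r
library(ggplot2)
library(dplyr)

# One row per cut/clarity combination, with its frequency.
counts <- diamonds %>%
  count(cut, clarity, name = "count") %>%
  mutate(bin = base::cut(count,
                         breaks = c(-Inf, 100, 500, Inf),
                         labels = c("<=100", "101-500", ">500")))

p_heat <- ggplot(counts, aes(x = clarity, y = cut)) +
  geom_tile(aes(fill = bin), colour = "white") +
  scale_fill_manual("Count",
                    values = c("<=100" = "red",
                               "101-500" = "yellow",
                               ">500" = "green"),
                    drop = FALSE) +
  geom_text(aes(label = count), size = 3)
```

Summarising first means each cell is drawn once, instead of once per diamond.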
