Layering violin plots with geom_violin to compare distributions - r

I am trying to compare the distributions of a continuous variable across groups using violin plots. Pretty easy. However, I would like to make comparisons across distributions easier by showing the distribution for one of the groups (the reference) in grey with a low alpha value in the background. Something like this but with a violin plot:
My current approach plots the data twice. For the first geom_violin, I duplicate the data for the reference group and plot it in grey. For the second geom_violin, I use the actual data d. In this example, the two violin plots in grey and blue should look the same for the group "blue". However, they are NOT the same even though they are based on exactly the same data for group "blue".
How can I resolve this problem? Or is there another better approach to do this?
library(tidyverse)
d <- tibble(
group = sample(c("green", "blue"), 1000, replace = TRUE, prob = c(0.7, 0.3)),
x = ifelse(group == "green", rnorm(1000, 1, 1), rnorm(1000, 0, 3))
)
dblue <- filter(d, group == "blue")
dblue <- bind_rows(dblue, mutate(dblue, group = "green"))
ggplot(d, aes(x = factor(group), y = x)) +
geom_violin(data = dblue, fill = alpha("#333333", 0.2), color = alpha("#333333", 0)) +
geom_violin(fill = alpha("#0072B2", 0.8), color = alpha("#0072B2", 0))

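Add scale = "width" to the second geom_violin. With the default scale = "area", each violin's width is scaled relative to the highest density among all groups in that layer, so the blue violin comes out narrower when it is drawn alongside the more sharply peaked green group. In the grey layer both violins hold the same data, so they are already drawn at full width.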
ggplot(d, aes(x = factor(group), y = x)) +
geom_violin(data = dblue, fill = alpha("#333333", 0.2), color = alpha("#333333", 0)) +
geom_violin(fill = alpha("#0072B2", 0.8), color = alpha("#0072B2", 0),
scale = "width")
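To see why the blue violin changes width, here is a minimal base-R sketch of the normalisation that, as far as I understand it, scale = "area" applies: each density is divided by the highest density among all groups drawn in the same layer. The variable names (blue, green, width_alone, width_with_green) are made up for illustration, and the sample sizes only roughly mirror the 0.3/0.7 split in the question.

```r
# Illustration only: names and sample sizes are assumptions, not from the question.
set.seed(1)
blue  <- rnorm(300, mean = 0, sd = 3)   # wide, flat distribution
green <- rnorm(700, mean = 1, sd = 1)   # narrow, sharply peaked distribution

# Peak of the kernel density estimate for each group
peak_blue  <- max(density(blue)$y)
peak_green <- max(density(green)$y)

# Relative maximum width of the blue violin when it is the only
# distribution in the layer (as in the grey layer)
width_alone <- peak_blue / max(peak_blue)

# Relative maximum width of the blue violin when the green group
# shares the layer (as in the coloured layer)
width_with_green <- peak_blue / max(peak_blue, peak_green)

print(c(alone = width_alone, with_green = width_with_green))
```

With scale = "width" every violin is normalised by its own peak instead, which is why the two layers then agree for the "blue" group.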

Related

ggplot2 Facet wrapping a filtered variable while keeping background geoms

I'm trying to create a set of small multiple line charts, where in each chart all lines are visible as light grey/background but in each individual chart one line is one highlighted.
As an example I want to go from this chart: facet_wrap without previous geom to a facet_wrap that would have two plots, each with all grey dashed lines, and one coloured line per plot.
I can facet_wrap so each highlighted line has its own chart, but then no other background lines show (facet_fail below). I can also create individual charts which have all lines in grey and a single highlighted line (ind_chart below), but I can't get both at the same time.
I'm sure I'm missing a simple setting here, but I can't see it, sorry.
Here's a simplified example of the data and charts I've made so far.
(the lines have been converted to radar/polar coordinates but I don't think that should matter?)
library(tidyverse)
#variables
food <- rep(c("apple", "pear", "orange"),3)
sense <- as.factor(rep(c("taste", "texture", "appearance"),each = 3))
score <- sample(0:10, 9)
#dataframe
df <- data.frame(food,sense,score) %>%
mutate(theta = (as.integer(sense)-1)*(360/n_distinct(sense)*pi/180)+pi/2) %>% #transform data to polar with cartesian coords
mutate(x = cos(theta)*score,
y = sin(theta)*score)%>%
rbind(filter(., sense == "taste")) #repeat first row so that geom_path completes circuit
# Plot
p <- ggplot(df, aes(x = x, y = y, group = food))+
geom_point(colour = "gray", size = 3)+
geom_path(colour = "gray", size = 1, linetype = "longdash")+
geom_spoke(data = data.frame( x = 0,
y = 0,
angle =seq(from = 0, to = 2*pi, length.out = 4)+pi/2,
radius = 10),
aes(x = x, y = y, angle = angle, radius = radius, group = NULL),
colour = "darkgray", size = 0.8)+
theme_void()
p
# individual chart is correct
ind_chart <- p + geom_path(data = filter(df, food == "apple" | food == "orange"), aes(color = food), size = 2)
# facet wrap by 'food' creates multiples without background geom_path's from p
fltr <- filter(df, food == "apple" | food == "orange")
facet_fail <- p +
geom_path(data = fltr , aes(color = food))+
facet_wrap(~food)+
coord_equal()
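One possible way around this, sketched under assumptions rather than taken from the question: ggplot2 repeats a layer in every facet when that layer's data does not contain the faceting variable. The example below uses simplified made-up data (step and score columns, no polar transform), and the copied column name food_bg is hypothetical.

```r
library(ggplot2)

# Made-up data for illustration; not the data from the question.
set.seed(42)
df <- data.frame(
  food  = rep(c("apple", "pear", "orange"), each = 4),
  step  = rep(1:4, times = 3),
  score = sample(0:10, 12, replace = TRUE)
)

# Background data: same values, but the grouping column is renamed so the
# faceting variable `food` is absent and the layer shows up in every panel.
bg <- data.frame(food_bg = df$food, step = df$step, score = df$score)

# Highlighted data keeps `food`, so each line lands in its own panel.
hl <- df[df$food %in% c("apple", "orange"), ]

p <- ggplot(hl, aes(x = step, y = score)) +
  geom_line(data = bg, aes(group = food_bg),
            colour = "gray", linetype = "longdash") +
  geom_line(aes(colour = food)) +
  facet_wrap(~food)

# Inspect the computed data of the background layer
bg_layer <- layer_data(p, 1)
```

Printing p should show two panels, each with all three grey dashed lines and one coloured line. The same idea should carry over to geom_path on the x/y columns in the question.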

Advice on how to plot side-by-side histograms with a line graph going through them in ggplot2

I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line for the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like it was one graph, but I have had issues with the endpoints matching up, since each facet is a separate graphical device. Can anyone give me some advice on how to do this? I am not particularly partial to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val))  # plyr is not attached, so qualify summarise too
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median")  # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
This produces a violin for each value of rho, and joins the medians for each violin.

How to get different colors related to treatment for boxplot and violin plot (ggplot / using geom_split_violin) that are plotted in one?

I am trying to show a boxplot and a violin plot in one.
I can fill in the colors of the boxplot and violin plot based on the treatment. But, I don't want them in exactly the same color, I'd prefer the violin plot or the boxplot filling to be lighter.
Also, I am able to get the outer lines of the boxplot in different colors if I add col=TM to the aes of the geom_boxplot. But, then I can not choose these colors or don't know how to (they are now automatically pink and blue).
BACKGROUND:
I am working with a data set that looks something like this:
TM yax X Zscore
Org zscore zhfa -1.72
Org zscore zfwa -0.12
I am plotting the z-scores based on X (zhfa, etc.) per treatment (TM).
#Colours
ocean = c('#BBDED6' , '#61C0BF' , '#FAE3D9' , '#FFB6B9' )
## Plot ##
z <- ggplot(data = data, aes(x = X, y = Zscore,fill=TM)) +
geom_split_violin(col="white", fill="white") +
geom_boxplot(alpha = 1, width=0.3, aes(fill=TM), position = position_dodge(width = 0.3))
z + theme(axis.text = element_text(size = 12),legend.position="top") +
stat_compare_means(method="t.test", label.y=2.8, label.x=0.3, size=3) +
scale_fill_manual(values=ocean, labels=c("Mineral fertilizer", "Organic fertilizer"))
Now, half of the violin plot is filled white, but not both halves (which would already be better). If I plot geom_split_violin() without arguments, it gets exactly the same colors as the boxplot.
Furthermore, the violin plot of zhfa should be on the left side, but it gets switched and is displayed on the right side, even though it matches the data of the organic (left) boxplot.
The graph now:
I don't know if it can be solved by adding something related to the scale_fill_manual or if this is an impossible request
Sample Data:
data <- data.frame(TM = c(rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5)),
Zscore = runif(30,-2,2),
X = c(rep("zwfa", 10), rep("zhfa", 10), rep("zbfa", 10)))
You can add an additional column to your data that is the same structure as TM but different values, then scale the fill:
Sample Data:
data <- data.frame(TM = c(rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5)),
Zscore = runif(30,-2,2),
X = c(rep("zwfa", 10), rep("zhfa", 10), rep("zbfa", 10)))
Begin solution:
data <- data %>% mutate(TMm = c(rep("orgM", 5), rep("minM", 5),rep("orgM", 5), rep("minM", 5),rep("orgM", 5), rep("minM", 5)))
#Colours
ocean = c('#BBDED6' , '#FAE3D9', '#61C0BF' , '#FFFFFF')
## Plot ##
z <- ggplot(data = data, aes(x = X, y = Zscore,fill=TM)) +
geom_split_violin(mapping = aes(fill=TMm)) +
geom_boxplot(alpha = 1, width=0.3, aes(fill=TM), position = position_dodge(width = 0.3))
z + theme(axis.text = element_text(size = 12),legend.position="top") +
stat_compare_means(method="t.test", label.y=2.8, label.x=0.3, size=3) +
scale_fill_manual(breaks = c("org", "min"), values=ocean, labels=c("Organic fertilizer", "Mineral fertilizer"))
In your data you may have to change breaks = c("org", "min") to whatever you call the factor levels in the TM variable
Or if you want the whole violin plot white:
ocean = c('#BBDED6' , '#FFFFFF', '#61C0BF' , '#FFFFFF')
New Plot:

Change alpha value for certain break values in ggplot geom_point

I have made a scatter plot from 100k+ points and I would like the coloured points (break values 1 and 2, which are "green", and break value 20, which is "red") to stand out more than the "cornsilk1" points (break values 3 to 19). I have tried the code below but with no luck.
Any help would be appreciated.
Thanks so much
p.s. please excuse my juvenile code. I am sure there is a way more effective way to do this...
plotIA<-ggplot(plotintaobs,aes(x=SD13009PB,y=SD13009PB2,colour=quartile))+geom_point()+labs(x="Phillips Observation 1", y="Phillips Observation 2") + ggtitle("Intra-observer Variation") + mytheme
plotIA+ scale_color_manual(breaks = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"),
values=c("green","green", "cornsilk1", "cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","red"))
plotIA+scale_alpha_manual(values=c(1,1,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,1))
One strategy is to use cut to split the quartiles into your three groups. Then you can use scale_colour_manual and scale_alpha_manual on the new variable:
# some fake data
plotintaobs <- data.frame(SD13009PB = rnorm(20), SD13009PB2 = rnorm(20), quartile = 1:20)
#cut quartile
plotintaobs$q2 <- cut(plotintaobs$quartile, breaks = c(0, 2, 19, 20), labels = c("low", "mid", "high"))
#plot
plotIA <- ggplot(plotintaobs, aes(x = SD13009PB, y = SD13009PB2, colour = q2, alpha = q2)) +
geom_point() +
scale_colour_manual(values = c("green", "cornsilk1","red")) +
scale_alpha_manual(values = c(1, 0.8, 1))
plotIA
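As a quick check of what cut does with those breaks, here is a small base-R sketch. The intervals are right-closed by default, so the breaks c(0, 2, 19, 20) give (0, 2], (2, 19] and (19, 20]. The names quartile, q2 and counts are only for illustration.

```r
# The 20 break values from the question
quartile <- 1:20

# Same cut() call as in the answer: three right-closed intervals
q2 <- cut(quartile, breaks = c(0, 2, 19, 20), labels = c("low", "mid", "high"))

# Count how many of the original values fall into each group
counts <- table(q2)
print(counts)
```

Values 1 and 2 end up in "low", 3 to 19 in "mid", and 20 in "high", which matches the green / cornsilk1 / red grouping asked for.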

How to make a color scale with sharp transition in ggplot2

I am trying to create a color scale with a sharp color transition at one point. What I am currently doing is:
test <- data.frame(x = c(1:20), y = seq(0.01, 0.2, by = 0.01))
cutoff <- 0.10
ggplot(data = test,
aes(x = as.factor(x), y = y, fill = log(y), width = 1, binwidth = 0)) +
geom_bar(stat = "identity") +
scale_fill_gradientn(colours = c("red", "red", "yellow", "green"),
values = rescale(log(c(0.01, cutoff - 0.0000000000000001, cutoff, 0.2))),
breaks = c(log(cutoff)), label = c(cutoff))
It is producing the plots I want. But the position of the break in colorbar somehow varies depending on the cutoff. Sometimes below the value, sometimes above, sometimes on the line. Here are some plots with different cutoffs (0.05, 0.06, 0.1):
What am I doing wrong? Or alternatively, is there a better way to create a such a color scale?
Have you looked into scale_colour_steps or scale_colour_stepsn?
Using the n.breaks argument of scale_colour_stepsn, you should be able to specify the number of breaks you want and get sharper transitions.
Be sure to use ggplot2 > 3.3.2
In case you are still interested in a solution for this, you can add guide = guide_colourbar(nbin = <some arbitrarily large number>) to scale_fill_gradientn(). This increases the number of bins used by the colourbar legend, which makes the transition look sharper.
# illustration using nbin = 1000, & weighted colours below the cutoff
plot.cutoff <- function(cutoff){
p <- ggplot(data = test,
aes(x = as.factor(x), y = y, fill = log(y))) +
geom_col(width = 1) +
scale_fill_gradientn(colours = c("red4", "red", "yellow", "green"),
values = scales::rescale(log(c(0.01, cutoff - 0.0000000000000001,
cutoff, 0.2))),
breaks = c(log(cutoff)),
label = c(cutoff),
guide = guide_colourbar(nbin = 1000))
return(p)
}
cowplot::plot_grid(plot.cutoff(0.05),
plot.cutoff(0.06),
plot.cutoff(0.08),
plot.cutoff(0.1),
ncol = 2)
(If you find the above insufficiently sharp at very high resolutions, you can also set raster = FALSE in guide_colourbar(), which turns off interpolation & draws rectangles instead.)
I think it is slightly tricky to achieve an exact, discrete cutoff point in the continuous color scale using scale_fill_gradientn. A quick alternative would be to use scale_fill_gradient, set the cutoff with limits, and set the color of 'out-of-bounds' values with na.value.
Here's a slightly simpler example than in your question:
# some data
df <- data.frame(x = factor(1:10), y = 1, z = 1:10)
# a cutoff point
lo <- 4
ggplot(df, aes(x = x, y = y, fill = z)) +
geom_bar(stat = "identity") +
scale_fill_gradient(low = "yellow", high = "green",
limits = c(lo, max(df$z)), na.value = "red")
As you see, the values below your cutpoint will not appear in the legend, but one may consider including a large chunk of red a waste of "legend band width" anyway. You might just add a verbal description of the red bars in the figure caption instead.
You may also wish to differentiate between values below a lower cutpoint and above an upper cutpoint. For example, set 'too low' values to blue and 'too high' values to red. Here I use findInterval to differentiate between low, mid and high values.
# some data
set.seed(2)
df <- data.frame(x = factor(1:10), y = 1, z = sample(1:10))
# lower and upper limits
lo <- 3
hi <- 8
# create a grouping variable based on the break points
df$grp <- findInterval(df$z, c(lo, hi), rightmost.closed = TRUE)
ggplot(df, aes(x = x, y = y, fill = z)) +
geom_bar(stat = "identity") +
scale_fill_gradient(low = "yellow", high = "green", limits = c(lo, hi), na.value = "red") +
geom_bar(data = df[df$grp == 0, ], fill = "blue", stat = "identity")
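To make the grouping step concrete, here is a small base-R sketch of what findInterval returns for the limits used above. The vector z is just 1:10 in order (an assumption for readability; the answer shuffles it with sample).

```r
lo <- 3
hi <- 8
z <- 1:10

# 0 = below lo, 1 = between lo and hi, 2 = above hi.
# rightmost.closed = TRUE puts z == hi into group 1 rather than group 2.
grp <- findInterval(z, c(lo, hi), rightmost.closed = TRUE)

# Show which values fall into which group
print(split(z, grp))
```

Group 0 is what the extra geom_bar layer paints blue, while group 2 falls outside limits = c(lo, hi) and so picks up na.value = "red".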

Resources