Raincloud plot - histogram? - r

I have successfully created a raincloud plot, but I would like to know whether I can replace the density curve with a histogram (it suits my dataset better).
Here is my code, in case it is useful:
ATSC <- ggplot(data = data, aes(y = atsc, x = numlecteur, fill = numlecteur)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .5) +
geom_point(aes(y = atsc, color = numlecteur), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
geom_point(data = sumld, aes(x = numlecteur, y = mean), position = position_nudge(x = 0.25), size = 2.5) +
geom_errorbar(data = sumld, aes(ymin = lower, ymax = upper, y = mean), position = position_nudge(x = 0.25), width = 0) +
guides(fill = FALSE) +
guides(color = FALSE) +
scale_color_brewer(palette = "Spectral") +
scale_y_continuous(breaks=c(0,2,4,6,8,10), labels=c("0","2","4","6","8","10"))+
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
theme_bw() +
expand_limits(y=c(0, 10))+
xlab("Lecteur") + ylab("Age total sans check")+
raincloud_theme
I thought I could add geom_histogram(), but it doesn't work.
Thank you in advance for your help!
(sources : https://peerj.com/preprints/27137v1.pdf
https://neuroconscience.wordpress.com/2018/03/15/introducing-raincloud-plots/)

This is not straightforward. There are a few challenges:
geom_histogram is "horizontal by nature", whereas the custom geom_flat_violin is vertical, as are boxplots; hence the final call to coord_flip in that tutorial. To combine them, I think it is best to swap x and y, drop coord_flip, and use ggstance::geom_boxploth instead.
Creating a separate histogram for each category is another challenge. My workaround is to create facets and "merge them together".
The histograms are scaled much larger than the width of the points/boxplots. My workaround is to rescale them with the after_stat function.
Nudging the histograms to the right position above the boxplot and points is the last challenge. I convert the discrete scale to a continuous one by mapping a constant numeric value to the global y aesthetic, and then use the facet labels as the discrete labels.
library(tidyverse)
my_data<-read.csv("https://data.bris.ac.uk/datasets/112g2vkxomjoo1l26vjmvnlexj/2016.08.14_AnxietyPaper_Data%20Sheet.csv")
my_datal <-
my_data %>%
pivot_longer(cols = c("AngerUH", "DisgustUH", "FearUH", "HappyUH"), names_to = "EmotionCondition", values_to = "Sensitivity")
# use y = -... to position boxplot and jitterplot below the histogram
ggplot(data = my_datal, aes(x = Sensitivity, y = -.5, fill = EmotionCondition)) +
  # after_stat for scaling
  geom_histogram(aes(y = after_stat(count/100)), binwidth = .05, alpha = .8) +
  # from ggstance
  ggstance::geom_boxploth(width = .1, outlier.shape = NA, alpha = 0.5) +
  geom_point(aes(color = EmotionCondition), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
  # merged those calls to one; "none" replaces the deprecated FALSE
  guides(fill = "none", color = "none") +
  # scale_y_continuous(breaks = 1, labels = unique(my_datal$EmotionCondition))
  scale_color_brewer(palette = "Spectral") +
  scale_fill_brewer(palette = "Spectral") +
  # facetting, because each histogram needs its own y
  # strip position = left to fake discrete labels in continuous scale
  facet_wrap(~EmotionCondition, nrow = 4, scales = "free_y", strip.position = "left") +
  # remove all continuous labels from the y axis
  theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
Created on 2021-04-15 by the reprex package (v1.0.0)
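To see what the after_stat() rescaling does in isolation, here is a minimal sketch on the built-in iris data (an assumption; it only stands in for the data used above, and it needs ggplot2 3.3.0 or later for after_stat()). after_stat() is evaluated on the whole layer once the bins have been counted, so dividing by max(count) gives the tallest bar across all facets a height of exactly 1, which may be easier to reason about than a hand-picked divisor such as 100.

```r
library(ggplot2)

# iris is a stand-in data set (assumption, not the data used above)
p <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  # count / max(count): the tallest bar in the whole layer gets height 1
  geom_histogram(aes(y = after_stat(count / max(count))),
                 binwidth = 0.2, alpha = 0.8) +
  # one panel per group, strip on the left to mimic a discrete axis
  facet_wrap(~Species, ncol = 1, strip.position = "left")

p
```

With the histogram heights bounded by 1, the constant y used for the boxplot and points (for example -0.5 as in the answer) can be chosen relative to a known range.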

Related

How do I shift the geom_text labels to AFTER a geom_segment arrow in ggplot2?

I have an NMDS ordination that I've plotted using ggplot2. I've added environmental vectors on top (from the envfit() function in vegan) using geom_segment() and added corresponding labels to the same coordinates as the segments using geom_text() (code below):
ggplot() +
geom_point(data = nmds.sites.plot, aes(x = NMDS1, y = NMDS2, col = greening), size = 2) +
labs(title = "Study Area",
col = "Sites") +
geom_polygon(data = hull.data, aes(x = NMDS1, y = NMDS2, fill = grp, group = grp), alpha = 0.2) +
scale_fill_discrete(name = "Ellipses",
labels = c("High", "Moderate", "Control")) +
xlim(c(-1, 1)) +
guides(shape = guide_legend(order = 1),
colour = guide_legend(order = 2)) +
geom_segment(data = env.arrows,
aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
arrow = arrow(length = unit(0.25, "cm")),
colour = "black", inherit.aes = FALSE) +
geom_text(data = env.arrows, aes(x = NMDS1, y = NMDS2, label = rownames(env.arrows))) +
coord_fixed() +
theme_bw() +
theme(text = element_text(size = 14))
However, since the labels are centre-justified, part of a label sometimes overlaps the end of its arrow. I want the text to START at the end of the arrow. In other cases, if the arrow is pointing up, it pushes into the middle of the text. Essentially, I want to be able to see both the arrow head AND the text.
I have tried using geom_text_repel() from the ggrepel package, but the placement seems random (it also repels the labels from other points or text in the plot, or does nothing at all).
[EDIT]
Below are the coordinates of the NMDS vectors (this is the env.arrows object from the example code above):
NMDS1 NMDS2
Variable1 -0.46609087 0.27567532
Variable2 -0.21524887 -0.10128795
Variable3 0.59093184 0.03423775
Variable4 -0.00136418 0.46550043
Variable5 -0.30900813 -0.19659929
Variable6 0.53510347 -0.36387227
Variable7 0.66376246 -0.05220685
In the code below, we create a radial shift function to move the labels away from the arrows. The shift includes a constant amount plus an additional shift that varies with the absolute value of the cosine of the label's angle to the x-axis. This is because labels with theta near 0 or 180 degrees have a larger length of overlap with the arrows, and therefore need to be moved farther, than labels with theta near 90 or 270 degrees.
You may need to tweak the code a bit to get the labels exactly where you want them. Also, you'll likely need to add an additional adjustment if the variable names can have different widths.
One additional note: I've turned the variable names into a data column. You should do this with your data as well and then map that data column to the label argument of aes. Using rownames(env.arrows) for the labels reaches outside the ggplot function environment to the external data frame env.arrows and breaks the mapping to the data frame you've provided in the data argument to geom_text (although it likely won't cause a problem in this particular case).
library(tidyverse)
library(patchwork)
# data
env.arrows = read.table(text=" var NMDS1 NMDS2
Variable1 -0.46609087 0.27567532
Variable2 -0.21524887 -0.10128795
Variable3 0.59093184 0.03423775
Variable4 -0.00136418 0.46550043
Variable5 -0.30900813 -0.19659929
Variable6 0.53510347 -0.36387227
Variable7 0.66376246 -0.05220685", header=TRUE)
# Radial shift function
rshift = function(r, theta, a=0.03, b=0.07) {
  r + a + b*abs(cos(theta))
}
# Calculate shift
env.arrows = env.arrows %>%
  mutate(r = sqrt(NMDS1^2 + NMDS2^2),
         theta = atan2(NMDS2,NMDS1),
         rnew = rshift(r, theta),
         xnew = rnew*cos(theta),
         ynew = rnew*sin(theta))
p = ggplot() +
geom_segment(data = env.arrows,
aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
arrow = arrow(length = unit(0.25, "cm")),
colour = "black", inherit.aes = FALSE) +
geom_text(data = env.arrows, aes(x = NMDS1, y = NMDS2, label = var)) +
coord_fixed() +
theme_bw() +
theme(text = element_text(size = 14))
pnew = ggplot() +
geom_segment(data = env.arrows,
aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
arrow = arrow(length = unit(0.2, "cm")),
colour = "grey60", inherit.aes = FALSE) +
geom_text(data = env.arrows, aes(x = xnew, y = ynew, label = var), size=3.5) +
coord_fixed() +
theme_bw() +
theme(text = element_text(size = 14)) +
scale_x_continuous(expand=expansion(c(0.12,0.12))) +
scale_y_continuous(expand=expansion(c(0.07,0.07)))
p / pnew
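To make the effect of the radial shift concrete, the sketch below applies the same rshift function to two of the vectors from the question, one nearly horizontal (Variable3) and one nearly vertical (Variable4). It uses base R only; the data frame name arrows and the shift column are illustrative assumptions.

```r
# Same radial shift function as in the answer above
rshift <- function(r, theta, a = 0.03, b = 0.07) {
  r + a + b * abs(cos(theta))
}

# Two of the vectors from the question
arrows <- data.frame(
  var   = c("Variable3", "Variable4"),
  NMDS1 = c(0.59093184, -0.00136418),
  NMDS2 = c(0.03423775,  0.46550043)
)

arrows$r     <- sqrt(arrows$NMDS1^2 + arrows$NMDS2^2)
arrows$theta <- atan2(arrows$NMDS2, arrows$NMDS1)
# Extra distance added along the arrow's own direction
arrows$shift <- rshift(arrows$r, arrows$theta) - arrows$r

print(arrows[, c("var", "shift")])
```

The nearly horizontal arrow is pushed out by close to a + b = 0.10, while the nearly vertical one is pushed out by close to a = 0.03, which matches the reasoning that horizontal labels overlap the arrow over a longer stretch.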

Combine legend for fill and colour ggplot to give only single legend

I am fitting a smooth to my data with geom_smooth and using geom_ribbon to plot shaded confidence intervals for it. No matter what I try, I cannot get a single legend that represents both the smooth and the ribbon correctly; that is, I want one legend with the correct colours and labels for both. I have tried + guides(fill = FALSE) and guides(colour = FALSE), and I also read that giving colour and fill the same label inside labs() should produce a single unified legend.
Any help would be much appreciated.
Note that I have also tried to reset the legend labels and colours using scale_colour_manual().
The code below produces the figure below. Note that there are two curves here that essentially overlap. The relabelling and colour setting has worked for the geom_smooth legend but not for the geom_ribbon legend, and I still have two legends showing, which is not what I want.
ggplot(pred.dat, aes(x = age.x, y = fit, colour = tagged)) +
geom_smooth(size = 1.2) +
geom_ribbon(aes(ymin = lci, ymax = uci, fill = tagged), alpha = 0.2, colour = NA) +
theme_classic() +
labs(x = "Age (days since hatch)", y = "Body mass (g)", colour = "", fill = "") +
scale_colour_manual(labels = c("Untagged", "Tagged"), values = c("#3399FF", "#FF0033")) +
theme(axis.title.x = element_text(face = "bold", size = 14),
axis.title.y = element_text(face = "bold", size = 14),
axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
legend.text = element_text(size = 12))
The problem is that you provide new labels for the color-aesthetic but not for the fill-aesthetic. Consequently ggplot shows two legends because the labels are different.
You can either also provide the same labels for the fill-aesthetic (code option #1 below) or you can set the labels for the levels of your grouping variable ("tagged") before calling ggplot (code option #2).
library(ggplot2)
#make some data
x = seq(0,2*pi, by = 0.01)
pred.dat <- data.frame(x = c(x,x),
y = c(sin(x), cos(x)) + rnorm(length(x) * 2, 0, 1),
tag = rep(0:1, each = length(x)))
pred.dat$lci <- c(sin(x), cos(x)) - 0.4
pred.dat$uci <- c(sin(x), cos(x)) + 0.4
#option 1: set labels within ggplot call
pred.dat$tagged <- as.factor(pred.dat$tag)
ggplot(pred.dat, aes(x = x, y = y, color = tagged, fill = tagged)) +
geom_smooth(size = 1.2) +
geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.2, color = NA) +
scale_color_manual(labels = c("untagged", "tagged"), values = c("#F8766D", "#00BFC4")) +
scale_fill_manual(labels = c("untagged", "tagged"), values = c("#F8766D", "#00BFC4")) +
theme_classic() + theme(legend.title = element_blank())
#option 2: set labels before ggplot call
pred.dat$tagged <- factor(pred.dat$tag, levels = 0:1, labels = c("untagged", "tagged"))
ggplot(pred.dat, aes(x = x, y = y, color = tagged, fill = tagged)) +
geom_smooth(size = 1.2) +
geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.2, color = NA) +
theme_classic() + theme(legend.title = element_blank())
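A third variant, not taken from the answer above but built on the same idea: the manual scale functions in ggplot2 accept an aesthetics argument, so a single scale call can serve both colour and fill, which keeps the labels and values from drifting apart. A minimal sketch with made-up data (the data frame d and its columns are illustrative assumptions):

```r
library(ggplot2)

# Made-up data: two curves with a constant-width band (assumption)
x <- seq(0, 2 * pi, by = 0.1)
d <- data.frame(
  x = c(x, x),
  y = c(sin(x), cos(x)),
  tagged = factor(rep(0:1, each = length(x)))
)
d$lci <- d$y - 0.4
d$uci <- d$y + 0.4

p <- ggplot(d, aes(x = x, y = y, colour = tagged, fill = tagged)) +
  geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.2, colour = NA) +
  geom_line() +
  # one scale definition applied to both aesthetics
  scale_colour_manual(labels = c("untagged", "tagged"),
                      values = c("#F8766D", "#00BFC4"),
                      aesthetics = c("colour", "fill")) +
  theme_classic() +
  theme(legend.title = element_blank())

p
```

Because both aesthetics share identical labels and values, the legends should merge in the same way as in option 1.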

Combine/Overlay boxplot with histogram in R

I need to combine the boxplot with the histogram using ggplot2. So far I have this code.
library(dplyr)
library(ggplot2)
data(mtcars)
dat <- mtcars %>% dplyr::select(carb, wt) %>%
dplyr::group_by(carb) %>% dplyr::mutate(mean_wt = mean(wt), carb_count = n())
plot<-ggplot(data=mtcars, aes(x=carb, y=..count..)) +
geom_histogram(alpha=0.3, position="identity", lwd=0.2,binwidth=1)+
theme_bw()+
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.7))+
geom_text(data=aggregate(mean_wt~carb+carb_count,dat,mean), aes(carb, carb_count+0.5, label=round(mean_wt,1)), color="black")
plot + geom_boxplot(data = mtcars,mapping = aes(x = carb, y = 6*wt,group=carb),
color="black", fill="red", alpha=0.2,width=0.1,outlier.shape = NA)+
scale_y_continuous(name = "Count",
sec.axis = sec_axis(~./6, name = "Weight"))
This results in
However, I don't want the secondary y axis to span the same length as the primary y axis. I want the secondary y axis to be shorter and to occupy only the top right corner. Let's say the secondary y axis should span 20-30 on the primary y axis, and the boxplots should scale with that axis.
Can anyone help me with this?
Here's one approach, where I adjusted the secondary axis formula and tweaked the way it's labeled. (EDIT: adjusted to make boxplots bigger, per OP comment.)
plot + geom_boxplot(data = mtcars,
# Adj'd scaling so each 1 wt = 2.5 count
aes(x = carb, y = (wt*2.5)+10,group=carb),
color="black", fill="red", alpha=0.2,
width=0.5, outlier.shape = NA)+ # Wider width
scale_y_continuous(name = "Count", # Adj'd labels to limit left to 0, 5, 10
breaks = 5*0:5, labels = c(5*0:2, rep("", 3)),
# Adj'd scaling to match the wt scaling
sec.axis = sec_axis(~(.-10)/2.5, name = "Weight",
breaks = c(0:5))) +
theme(axis.title.y.left = element_text(hjust = 0.15, vjust = 1),
axis.title.y.right = element_text(hjust = 0.15, vjust = 1))
You might also consider an alternative using the patchwork package, coincidentally written by the same developer who implemented secondary scales in ggplot2...
# Alternative solution using patchwork
library(patchwork)
plot2 <- ggplot(data=mtcars, aes(x=carb, y=..count..)) +
theme_bw()+
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.7))+
geom_boxplot(data = mtcars,
aes(x = carb, y = wt, group=carb),
color="black", fill="red", alpha=0.2,width=0.1,outlier.shape = NA) +
scale_y_continuous(name = "Weight") +
scale_x_continuous(labels = NULL, name = NULL,
expand = c(0, 0.85), breaks = c(2,4,6,8))
plot2 + plot + plot_layout(nrow = 2, heights = c(1,3)) +
labs(x=NULL)
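For the secondary-axis approach, the key point is that the transform applied to the boxplot's y values and the formula given to sec_axis() must be inverses of each other. The sketch below checks this with the two formulas from the answer, using base R only; the function names to_count_scale and to_weight_scale are hypothetical helpers introduced for illustration.

```r
# Forward transform used for the boxplot's y values (formula from the answer)
to_count_scale <- function(wt) wt * 2.5 + 10
# Inverse transform passed to sec_axis() (formula from the answer)
to_weight_scale <- function(y) (y - 10) / 2.5

# Where the weights end up on the primary (count) axis
count_range <- to_count_scale(range(mtcars$wt))
print(count_range)

# Round trip: the secondary axis recovers the original weights
round_trip <- to_weight_scale(to_count_scale(mtcars$wt))
print(all.equal(round_trip, mtcars$wt))
```

With these constants the boxplots land roughly between 13.8 and 23.6 on the count axis; changing the multiplier stretches the boxplots and changing the offset moves them up or down, as long as the sec_axis() formula is updated to match.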

How to choose the right parameters for dotplot in r ggplot

I intend to make a dot plot somewhat like this:
But there's some issue with the code:
df = data.frame(x=runif(100))
df %>%
ggplot(aes(x )) +
geom_dotplot(binwidth =0.01, aes(fill = ..count..), stackdir = "centerwhole",dotsize=2, stackgroups = T, binpositions = "all")
How do I choose the bin width to avoid dots overlapping, bins wrapping into two columns, or dots being truncated at the top and bottom?
And why does the y axis show decimal values instead of counts? And how can I colour the dots by x value? I tried fill = x and no colour is shown.
The overlap is caused by dotsize > 1; as @Jimbuo said, the decimal values on the y axis are due to the internals of this geom; for the fill and color you can use the ..x.. computed variable:
Computed variables
x center of each bin, if binaxis is "x"
df = data.frame(x=runif(1000))
library(dplyr)
library(ggplot2)
df %>%
ggplot(aes(x, fill = ..x.., color = ..x..)) +
geom_dotplot(method = 'histodot',
binwidth = 0.01,
stackdir = "down",
stackgroups = T,
binpositions = "all") +
scale_fill_gradientn('', colours = c('#5185FB', '#9BCFFD', '#DFDFDF', '#FF0000'), labels = c(0, 1), breaks = c(0,1), guide = guide_legend('')) +
scale_color_gradientn(colours = c('#5185FB', '#9BCFFD', '#DFDFDF', '#FF0000'), labels = c(0, 1), breaks = c(0,1), guide = guide_legend('')) +
scale_y_continuous() +
scale_x_continuous('', position = 'top') +
# coord_equal(ratio = .25) +
theme_classic() +
theme(axis.line = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
aspect.ratio = .25,
legend.position = 'bottom',
legend.direction = 'vertical'
)
Created on 2018-05-18 by the reprex package (v0.2.0).
First, from the help page ?geom_dotplot:
When binning along the x axis and stacking along the y axis, the
numbers on y axis are not meaningful, due to technical limitations of
ggplot2. You can hide the y axis, as in one of the examples, or
manually scale it to match the number of dots.
Thus you can try the following. Note that the colouring does not completely match the x axis.
library(tidyverse)
df %>%
ggplot(aes(x)) +
geom_dotplot(stackdir = "down",dotsize=0.8,
fill = colorRampPalette(c("blue", "white", "red"))(100)) +
scale_y_continuous(labels = c(0,10), breaks = c(0,-0.4)) +
scale_x_continuous(position = "top") +
theme_classic()
For the correct coloring, you have to calculate the bins yourself, e.g. using .bincode:
df %>%
mutate(gr=with(.,.bincode(x ,breaks = seq(0,1,1/30)))) %>%
mutate(gr2=factor(gr,levels = 1:30, labels = colorRampPalette(c("blue", "white", "red"))(30))) %>%
arrange(x) %>%
{ggplot(data=.,aes(x)) +
geom_dotplot(stackdir = "down",dotsize=0.8,
fill = .$gr2) +
scale_y_continuous(labels = c(0,10), breaks = c(0,-0.4)) +
scale_x_continuous(position = "top") +
theme_classic()}
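The binning step can be looked at on its own. The sketch below uses base R's .bincode with the same 30 equal-width breaks and the same blue-white-red palette as above; the three x values are illustrative assumptions.

```r
x <- c(0.02, 0.51, 0.98)                 # illustrative values (assumption)
breaks <- seq(0, 1, 1/30)                # same 30 equal-width bins as above
pal <- colorRampPalette(c("blue", "white", "red"))(30)

bin <- .bincode(x, breaks = breaks)      # bin index of each value
cols <- pal[bin]                         # colour looked up by bin index

print(data.frame(x, bin, cols))
```

Values near 0 fall in the first bin and get the blue end of the palette, values near 1 fall in the last bin and get the red end, so the colour follows the position on the x axis.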

customize two legends inside one graph in ggplot2

I have the following question.
Using this code:
Plot<-data.frame(Age=c(0,0,0,0,0),Density=c(0,0,0,0,0),Sensitivity=c(0,0,0,0,0),inf=c(0,0,0,0,0),sup=c(0,0,0,0,0),tde=c(0,0,0,0,0))
Plot[1,]<-c(1,1,0.857,0.793,0.904,0.00209834)
Plot[2,]<-c(1,2,0.771 ,0.74,0.799,0.00348286)
Plot[3,]<-c(1,3,0.763 ,0.717,0.804,0.00577784)
Plot[4,]<-c(1,4,0.724 ,0.653,0.785,0.00504161)
Plot[5,]<-c(2,1,0.906,0.866,0.934,0.00365742)
Plot[6,]<-c(2,2,0.785 ,0.754,0.813,0.00440399)
Plot[7,]<-c(2,3,0.660,0.593,0.722,0.00542849)
Plot[8,]<-c(2,4,0.544,0.425,0.658,0.00433052)
names(Plot)<-c("Age","Mammographyc density","Sensitivity","inf","sup","tde")
Plot$Age<-c("50-59","50-59","50-59","50-59","60-69","60-69","60-69","60-69")
Plot$Density<-c("Almost entirely fat","Scattered fibroglandular density","Heterogeneously dense","Extremely dense","Almost entirely fat","Scattered fibroglandular density","Heterogeneously dense","Extremely dense")
levels(Plot$Age)<-c("50-59","60-69")
levels(Plot$Density)<-c("Almost entirely fat","Scattered fibroglandular density","Heterogeneously dense","Extremely dense")
pd <- position_dodge(0.2) #
Plot$Density <- reorder(Plot$Density, 1-Plot$Sensitivity)
ggplot(Plot, aes(x = Density, y = 100*Sensitivity, colour=Age)) +
geom_errorbar(aes(ymin = 100*inf, ymax = 100*sup), width = .1, position = pd) +
geom_line(position = pd, aes(group = Age), linetype = c("dashed")) +
geom_point(position = pd, size = 4)+
scale_y_continuous(expand = c(0, 0),name = 'Sensitivity (%)',sec.axis = sec_axis(~./5, name = 'Breast cancer detection rate (per 1000 mammograms)', breaks = c(0,5,10,15,20),
labels = c('0‰',"5‰", '10‰', '15‰', '20‰')), limits = c(0,100)) +
geom_line(position = pd, aes(x = Density, y = tde * 5000, colour = Age, group = Age), linetype = c("dashed"), data = Plot) +
geom_point(shape=18,aes(x = Density, y = tde * 5000, colour = Age, group = Age), position = pd, size = 4) +
theme_light() +
scale_color_manual(name="Age (years)",values = c("50-59"= "grey55", "60-69" = "grey15")) +
theme(legend.position="bottom") + guides(colour = guide_legend(), size = guide_legend(),
shape = guide_legend())
I have made the following graph,
in which the axis on the left is the scale for the circles and the axis on the right is the scale for the diamonds. I would like to have a legend approximately like this:
But I cannot get it to work. I have tried suggestions from other threads, such as scale_shape and various settings in guides, without success. I just want to make clear what shape and colour each represent.
Does anyone know how to help me?
Best regards,
What you should do is a panel plot to avoid the confusion of double axes:
library(dplyr)
library(tidyr)
Plot %>%
gather(measure, Result, Sensitivity, tde) %>%
ggplot(aes(x = Density, y = Result, colour=Age)) +
geom_errorbar(aes(ymin = inf, ymax = sup), width = .1, position = pd,
data = . %>% filter(measure == "Sensitivity")) +
geom_line(aes(group = Age), position = pd, linetype = "dashed") +
geom_point(position = pd, size = 4)+
# scale_y_continuous(expand = c(0, 0), limits = c(0, 1)) +
scale_y_continuous(labels = scales::percent) +
facet_wrap(~measure, ncol = 1, scales = "free_y") +
theme_light() +
scale_color_manual(name="Age (years)",values = c("50-59"= "grey55", "60-69" = "grey15")) +
theme(legend.position="bottom")
But to do what you asked: your problem is that you have only one non-positional aesthetic mapped, so you cannot get more than one legend. To force a second legend, you need to add a second mapping. It can be a dummy mapping that has no effect; below we map alpha but then manually set both levels to 100%. This solution is not advisable because, as happened in your example of a desired legend, it is easy to mix up the mappings and have your viz tell a lie by mislabeling which points are sensitivity and which are detection rate.
ggplot(Plot, aes(x = Density, y = 100*Sensitivity, colour=Age, alpha = Age)) +
geom_errorbar(aes(ymin = 100*inf, ymax = 100*sup), width = .1, position = pd) +
geom_line(position = pd, aes(group = Age), linetype = c("dashed")) +
geom_point(position = pd, size = 4)+
scale_y_continuous(expand = c(0, 0),name = 'Sensitivity (%)',sec.axis = sec_axis(~./5, name = 'Breast cancer detection rate (per 1000 mammograms)', breaks = c(0,5,10,15,20),
labels = c('0‰',"5‰", '10‰', '15‰', '20‰')), limits = c(0,100)) +
geom_line(position = pd, aes(x = Density, y = tde * 5000, colour = Age, group = Age), linetype = c("dashed"), data = Plot) +
geom_point(shape=18,aes(x = Density, y = tde * 5000, colour = Age, group = Age), position = pd, size = 4) +
theme_light() +
scale_color_manual(name="Age (years)",values = c("50-59"= "grey55", "60-69" = "grey15")) +
scale_alpha_manual(values = c(1, 1)) +
guides(alpha = guide_legend("Sensitivity"),
color = guide_legend("Detection Rate", override.aes = list(shape = 18))) +
theme(legend.position="bottom")
