I've made a ggplot, coloured by the hours of incubation. When I add significance bars using geom_signif, they are all coloured by the first colour, here pink. Ideally I'd like to be able to choose the colour of the significance bars, so i can indicate which incubation time they refer to. Or if that's not possible, how would I make them black?
ggplot(data = data, mapping = aes(y = Fluorescence, x = Treatment, colour = Hours)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(position = position_jitterdodge(jitter.width = 0.3, jitter.height = 0), alpha = 0.4) +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
xlab("Treatment") +
ylab("Mean fluorescence") +
ggtitle("C6/36") +
geom_signif(y_position = c(5.8,6.3,2.5,2,1.5,0.5,1),
xmin = c(2,3,2,2.8,1.8,1.8,2.8),
xmax = c(3,4,4,10,3.8,2.8,3.8),
annotations = c("****","****","ns","****","****","ns","**"),
textsize=3.5)
If you set the color mapping only to the boxplot geom, coloring will only applied to this particular geom:
library(tidyverse)
library(ggsignif)
iris %>%
mutate(group = Species %in% c("setosa", "versicolor")) %>%
ggplot(aes(group, Sepal.Length)) +
geom_boxplot(aes(color = Species)) +
geom_jitter(aes(color = Species)) +
geom_signif(comparisons = list(c("TRUE", "FALSE")))
Created on 2021-09-14 by the reprex package (v2.0.1)
Related
I would like to create a raincloud plot. I have successfully done it. But I would like to know if instead of the density curve, I can put a histogram (it's better for my dataset).
This is my code if it can be usefull
ATSC <- ggplot(data = data, aes(y = atsc, x = numlecteur, fill = numlecteur)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .5) +
geom_point(aes(y = atsc, color = numlecteur), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
geom_point(data = sumld, aes(x = numlecteur, y = mean), position = position_nudge(x = 0.25), size = 2.5) +
geom_errorbar(data = sumld, aes(ymin = lower, ymax = upper, y = mean), position = position_nudge(x = 0.25), width = 0) +
guides(fill = FALSE) +
guides(color = FALSE) +
scale_color_brewer(palette = "Spectral") +
scale_y_continuous(breaks=c(0,2,4,6,8,10), labels=c("0","2","4","6","8","10"))+
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
theme_bw() +
expand_limits(y=c(0, 10))+
xlab("Lecteur") + ylab("Age total sans check")+
raincloud_theme
I think we can maybe put the "geom_histogram()" but it doesn't work
Thank you in advance for your help !
(sources : https://peerj.com/preprints/27137v1.pdf
https://neuroconscience.wordpress.com/2018/03/15/introducing-raincloud-plots/)
This is actually not quite easy. There are a few challenges.
geom_histogram is "horizontal by nature", and the custom geom_flat_violin is vertical - as are boxplots. Therefore the final call to coord_flip in that tutorial. In order to combine both, I think best is switch x and y, forget about coord_flip, and use ggstance::geom_boxploth instead.
Creating separate histograms for each category is another challenge. My workaround to create facets and "merge them together".
The histograms are scaled way bigger than the width of the points/boxplots. My workaround scale via after_stat function.
How to nudge the histograms to the right position above Boxplot and points - I am converting the discrete scale to a continuous by mapping a constant numeric to the global y aesthetic, and then using the facet labels for discrete labels.
library(tidyverse)
my_data<-read.csv("https://data.bris.ac.uk/datasets/112g2vkxomjoo1l26vjmvnlexj/2016.08.14_AnxietyPaper_Data%20Sheet.csv")
my_datal <-
my_data %>%
pivot_longer(cols = c("AngerUH", "DisgustUH", "FearUH", "HappyUH"), names_to = "EmotionCondition", values_to = "Sensitivity")
# use y = -... to position boxplot and jitterplot below the histogram
ggplot(data = my_datal, aes(x = Sensitivity, y = -.5, fill = EmotionCondition)) +
# after_stat for scaling
geom_histogram(aes(y = after_stat(count/100)), binwidth = .05, alpha = .8) +
# from ggstance
ggstance::geom_boxploth( width = .1, outlier.shape = NA, alpha = 0.5) +
geom_point(aes(color = EmotionCondition), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
# merged those calls to one
guides(fill = FALSE, color = FALSE) +
# scale_y_continuous(breaks = 1, labels = unique(my_datal$EmotionCondition))
scale_color_brewer(palette = "Spectral") +
scale_fill_brewer(palette = "Spectral") +
# facetting, because each histogram needs its own y
# strip position = left to fake discrete labels in continuous scale
facet_wrap(~EmotionCondition, nrow = 4, scales = "free_y" , strip.position = "left") +
# remove all continuous labels from the y axis
theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
Created on 2021-04-15 by the reprex package (v1.0.0)
i need the plan legend
How to add a legend manually for geom_line
ggplot(data = impact_end_Current_yr_m_actual, aes(x = month, y = gender_value)) +
geom_col(aes(fill = gender))+theme_classic()+
geom_line(data = impact_end_Current_yr_m_plan, aes(x=month, y= gender_value, group=1),color="#288D55",size=1.2)+
geom_point(data = impact_end_Current_yr_m_plan, aes(x=month, y=gender_value))+
theme(axis.line.y = element_blank(),axis.ticks = element_blank(),legend.position = "bottom", axis.text.x = element_text(face = "bold", color = "black", size = 10, angle = 0, hjust = 1))+
labs(x="", y="End Beneficiaries (in Num)", fill="")+
scale_fill_manual(values=c("#284a8d", "#00B5CE","#0590eb","#2746c2"))+
scale_y_continuous(labels = function(x) format(x, scientific = FALSE)
The neatest way to do it I think is to add colour = "[label]" into the aes() section of geom_line() then put the manual assigning of a colour into scale_colour_manual() here's an example from mtcars (apologies that it uses stat_summary instead of geom_line but does the same trick):
library(tidyverse)
mtcars %>%
ggplot(aes(gear, mpg, fill = factor(cyl))) +
stat_summary(geom = "bar", fun = mean, position = "dodge") +
stat_summary(geom = "line",
fun = mean,
size = 3,
aes(colour = "Overall mean", group = 1)) +
scale_fill_discrete("") +
scale_colour_manual("", values = "black")
Created on 2020-12-08 by the reprex package (v0.3.0)
The limitation here is that the colour and fill legends are necessarily separate. Removing labels (blank titles in both scale_ calls) doesn't them split them up by legend title.
In your code you would probably want then:
...
ggplot(data = impact_end_Current_yr_m_actual, aes(x = month, y = gender_value)) +
geom_col(aes(fill = gender))+
geom_line(data = impact_end_Current_yr_m_plan,
aes(x=month, y= gender_value, group=1, color="Plan"),
size=1.2)+
scale_color_manual(values = "#288D55") +
...
(but I cant test on your data so not sure if it works)
I have a dataset containing 1,000 values for a model, these values are all within the same range (y=40-70), so the points overlap a ton. I'm interested in using color to show the density of the points converging on a single value (y=56.72) which I have indicated with a horizontal dashed line on the plot below. How can I color these points to show this?
ggplot(data, aes(x=model, y=value))+
geom_point(size=1) +
geom_hline(yintercept=56.72,
linetype="dashed",
color = "black")
I think that you should opt for an histogram or density plot:
n <- 500
data <- data.frame(model= rep("model",n),value = rnorm(n,56.72,10))
ggplot(data, aes(x = value, y = after_stat(count))) +
geom_histogram(binwidth = 1)+
geom_density(size = 1)+
geom_vline(xintercept = 56.72, linetype = "dashed", color = "black")+
theme_bw()
Here is your plot with the same data:
ggplot(data, aes(x = model, y = value))+
geom_point(size = 1) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")
If your model is iterative and do converge to the value, I suggest you plot as a function of the iteration to show the convergence. An other option, keeping a similar plot to your, is dodging the position of the points :
ggplot(data, aes(x = model, y = value))+
geom_point(position = position_dodge2(width = 0.2),
shape = 1,
size = 2,
stroke = 1,
alpha = 0.5) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")
Here is a color density plot as you asked:
library(dplyr)
library(ggplot2)
data %>%
mutate(bin = cut(value, breaks = 10:120)) %>%
dplyr::group_by(bin) %>%
mutate(density = dplyr::n()) %>%
ggplot(aes(x = model, y = value, color = density))+
geom_point(size = 1) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")+
scale_colour_viridis_c(option = "A")
I would suggest to use the alpha parameter within the geom_point. You should use a value close to 0.
ggplot(data, aes(x=model, y=value)) +
geom_point(size=1, alpha = .1) +
geom_hline(yintercept=56.72, linetype="dashed", color = "black")
based on some dummy data I created a histogram with desity plot
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each=200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
a <- ggplot(wdata, aes(x = weight))
a + geom_histogram(aes(y = ..density..,
# color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
The histogram of weight shall be colored corresponding to sex, so I use aes(y = ..density.., color = sex) for geom_histogram():
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
As I want it to, the density plot stays the same (overall for both groups), but the histograms jump scale up (and seem to be treated individually now):
How do I prevent this from happening? I need individually colored histogram bars but a joint density plot for all coloring groups.
P.S.
Using aes(color = sex) for geom_density() gets everything back to original scales - but I don't want individual density plots (like below):
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
EDIT:
As it has been suggested, dividing by the number of groups in geom_histogram()'s aesthetics with y = ..density../2 may approximate the solution. Nevertheless, this only works with symmetric distributions like in the first output below:
a + geom_histogram(aes(y = ..density../2,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
which yields
Less symmetric distributions, however, may cause trouble using this approach. See those below, where for 5 groups, y = ..density../5 was used. First original, then manipulation (with position = "stack"):
Since the distribution is heavy on the left, dividing by 5 underestimates on the left and overestimates on the right.
EDIT 2: SOLUTION
As suggested by Andrew, the below (complete) code solves the problem:
library(ggplot2)
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each = 200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
binwidth <- 0.25
a <- ggplot(wdata,
aes(x = weight,
# Pass binwidth to aes() so it will be found in
# geom_histogram()'s aes() later
binwidth = binwidth))
# Basic plot w/o colouring according to 'sex'
a + geom_histogram(aes(y = ..density..),
binwidth = binwidth,
colour = "black",
fill = "white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25))
# Plot w/ colouring according to 'sex'
a + geom_histogram(aes(x = weight,
# binwidth will only be found if passed to
# ggplot()'s aes() (as above)
y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25)) +
guides(color = FALSE)
Note:
binwidth = binwidth needed to be passed to ggplot()'s aes(), otherwise the pre-specified binwidth would not be found by geom_histogram()'s aes(). Further, position = "stack" is specified, so that both versions of the histogram are comparable. Plots for dummy data and the more complex distribution below:
Solved - Thanks for your help!
I don't think you can do it using y=..density.., but you can recreate the same thing like this...
binwidth <- 0.25 #easiest to set this manually so that you know what it is
a + geom_histogram(aes(y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "identity") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
I would like to add summary statistics (e.g. mean) to the boxplot which have two factors. I have tried this:
library(ggplot2)
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_boxplot(geom = "errorbar", aes(col = supp, fill=supp), position = position_dodge(width = 0.85)) +
geom_boxplot(aes(col = supp, fill=supp), notch=T, notchwidth = 0.5, outlier.size=2, position = position_dodge(width = 0.85)) +
stat_summary(fun.y=mean, aes(supp,dose), geom="point", shape=20, size=7, color="violet", fill="violet") +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green"))
I got this picture:
It is possible somehow put the sample size of each box (e.g. top of the whiskers)? I have tried this:
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_boxplot(geom = "errorbar", aes(col = supp, fill=supp), position = position_dodge(width = 0.85)) +
geom_boxplot(aes(col = supp, fill=supp), notch=T, notchwidth = 0.5, outlier.size=2, position = position_dodge(width = 0.85)) +
stat_summary(fun.y=mean,aes(supp,dose),geom="point", shape=20, size=7, color="violet", fill="violet") +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green")) +
geom_text(data = ToothGrowth,
group_by(dose, supp),
summarize(Count = n(),
q3 = quantile(ToothGrowth, 0.75),
iqr = IQR(ToothGrowth),
aes(x= dose, y = len,label = paste0("n = ",Count, "\n")), position = position_dodge(width = 0.75)))
You can state the aesthetics just once by putting them in the main ggplot call and then they will apply to all of the geom layers: ggplot(ToothGrowth, aes(x = factor(dose), y = len, color=supp, fill=supp))
For the count of observations: The data summary step in geom_text isn't coded properly. Also, to set len (the y-value) for the text placement, the summarize function needs to output values for len.
To add the mean values in the correct locations on the x-axis, use stat_summary with the exact same aesthetics as the other geoms and stats. I've overridden the color aesthetic by setting the color to yellow so that the point markers will be visible on top of the box plot fill colors.
The code to implement the plot is below:
library(tidyverse)
pd = position_dodge(0.85)
ggplot(ToothGrowth, aes(x = factor(dose), y = len, color=supp, fill=supp)) +
stat_boxplot(geom = "errorbar", position = pd) +
geom_boxplot(notch=TRUE, notchwidth=0.5, outlier.size=2, position=pd) +
stat_summary(fun.y=mean, geom="point", shape=3, size=2, colour="yellow", stroke=1.5,
position=pd, show.legend=FALSE) +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green")) +
geom_text(data = ToothGrowth %>% group_by(dose, supp) %>%
summarize(Count = n(),
len=max(len) + 0.05 * diff(range(ToothGrowth$len))),
aes(label = paste0("n = ", Count)),
position = pd, size=3, show.legend = FALSE) +
theme_bw()
Note that the notch goes outside the hinges for all of the box plots. Also, having the sample size just above the maximum of each boxplot seems distracting and unnecessary to me. You could place all of the text annotations at the bottom of the plot like this:
geom_text(data = ToothGrowth %>% group_by(dose, supp) %>%
summarize(Count = n()) %>%
ungroup %>%
mutate(len=min(ToothGrowth$len) - 0.05 * diff(range(ToothGrowth$len))),
aes(label = paste0("n = ", Count)),
position = pd, size=3, show.legend = FALSE) +