How to add jitter to geom_point when already using dodge


I'm plotting summary stats in front of individual geom_points, but can't figure out how to add jitter to the plots. I think the issue is that I'm already using the position argument to move the High and Low water points away from each other.
waterSymPop_p <- ggplot(aes(x = SymPop, y = Finish, fill = Water, color = Water), data = xanFull) +
geom_point(position = position_dodge(width = 0.9)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.9)) +
coord_flip()
Here's the plot that code produces (obviously not finished with the color scheme, etc.).
I'd like the points to be slightly jittered within each point group (i.e., not in a straight line). Thanks for the help!
Answer: use position_jitterdodge
Amended code and new figure:
ggplot(aes(x = SymPop, y = Finish, fill = Water, color = Water), data = xanFull) +
geom_point(position = position_jitterdodge(dodge.width = 0.9, jitter.width = 0.2)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.9)) +
coord_flip()
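
For readers without the xanFull data, here is a minimal sketch of the same idea on the built-in ToothGrowth dataset. The dataset, the tg name, and the seed value are assumptions for illustration only; mean_se is swapped in for "mean_cl_normal" because the latter needs the Hmisc package.

```r
library(ggplot2)

# ToothGrowth stands in for xanFull (an assumption, not the asker's data).
tg <- transform(ToothGrowth, dose = factor(dose))

p <- ggplot(tg, aes(x = dose, y = len, colour = supp)) +
  # Jitter the raw points inside each dodged group; seed makes the jitter reproducible.
  geom_point(position = position_jitterdodge(dodge.width = 0.9,
                                             jitter.width = 0.2,
                                             seed = 1)) +
  # The summary layer only dodges, so it stays centred on each group.
  stat_summary(fun.data = mean_se, geom = "pointrange",
               position = position_dodge(width = 0.9)) +
  coord_flip()

# Inspect the computed positions of the point layer.
pts <- layer_data(p, 1)
```

Keeping dodge.width equal to the width used in position_dodge is what lines the jittered points up with their summary pointrange.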

Drey answered this.

Related

Indicating a range for the Y axis in a bar chart

For my data the average normally lies between 8,000 and 10,000 and I want to indicate this range on my bar chart below,
I want to show two red lines at y=10,000 and y=8,000 and potentially shade the area in between them, if possible.
Bar chart attached.
Monthly_accidents2 %>%
ggplot(aes(x=Month,y=Traffic_Accidents))+
geom_bar(stat ="identity",fill = "#97B3C6")+
geom_text(aes(label = Traffic_Accidents), vjust = 0.5, colour = "white")+
ylim(0,12000)+
#coord_flip()+
theme_dark()+
labs(x=NULL,
y="Number of traffic accidents",
title = " Traffic Accidents throughout the year")
Thanks for any possible help in advance.
I tried creating a data set and adding the two lines but it didn't work.

For adding the two lines, 'geom_hline' makes it very straightforward. For the shaded area, you can use 'geom_rect', though I'm guessing your 'Month' variable is a factor, so it requires a little bit of faffing to convert it to numeric first and then adjust it so the shaded area covers all of your bars :)
You didn't provide any sample data so I haven't run this but it should work.
Monthly_accidents2 %>%
ggplot(aes(x=Month,y=Traffic_Accidents))+
geom_bar(stat ="identity",fill = "#97B3C6")+
geom_text(aes(label = Traffic_Accidents), vjust = 0.5, colour = "white")+
ylim(0,12000)+
geom_hline(yintercept = c(8000, 10000), colour = 'red')+
geom_rect(aes(xmin = min(as.integer(Monthly_accidents2$Month)) - 0.5,
xmax = max(as.integer(Monthly_accidents2$Month)) + 0.5,
ymin = 8000, ymax = 10000), alpha = 0.2, fill = 'darkred')+
#coord_flip()+
theme_dark()+
labs(x=NULL,
y="Number of traffic accidents",
title = " Traffic Accidents throughout the year")

One option to achieve your desired result would be to use geom_hline to add some horizontal lines and annotate to add a shaded rectangle:
Using some fake example data:
Monthly_accidents2 <- data.frame(
Month = factor(month.abb, month.abb),
Traffic_Accidents = 1000 * seq_len(12)
)
library(ggplot2)
base <- ggplot(Monthly_accidents2, aes(x = Month, y = Traffic_Accidents)) +
geom_col(fill = "#97B3C6") +
geom_text(aes(label = Traffic_Accidents), vjust = 0.5, colour = "white") +
ylim(0, 12000) +
theme_dark() +
labs(
x = NULL,
y = "Number of traffic accidents",
title = "Traffic Accidents throughout the year"
) +
theme(plot.title = element_text(hjust = .5))
base +
geom_hline(yintercept = c(8000, 10000), color = "red") +
annotate(geom = "rect", ymin = 8000, ymax = 10000, xmin = -Inf, xmax = Inf, fill = "red", alpha = .2)

For the lines, add the following to your plot (the constants go outside aes(), otherwise "red" is treated as a data value rather than a colour):
+
geom_hline(yintercept = c(8000, 10000), color = "red")
edit:
See stefan's more complete answer.

ggplot crossbar mislocalization in log-scaled data

I want to plot my data as a scatter plot with the mean+sem error bar by using ggplot2. I am using stat_summary to add the mean bar and errorbar. As the variation is pretty huge, I used scale_y_continuous to transform the y axis as a log10 scale for better visualization.
Here is the example data:
Value <- c(815,2467,4130,32588,171,68,582,476)
Treatment <- c(rep("Ctl",4),rep("Mutant",4))
dat <- data.frame(Value, Treatment)
It works fine when I use the linear y-axis. The crossbar localized on 10000, the mean of Ctl group.
plot_linear <- dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
stat_summary(fun = mean, geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
stat_summary(fun.data = mean_se, geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
theme_bw()
However, if I log transform the y axis, the crossbar for the mean value of Ctl always localizes on the second-highest point (4130) but not the mean point (10000).
plot_log <- dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
stat_summary(fun = mean, geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
stat_summary(fun.data = mean_se, geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
theme_bw() +
# log scaled y axis
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x))
I don't understand the logic of this weird localization for the crossbar.
Is there a way to plot the mean bar for the log-scaled data?
Thanks a lot!

Is the issue related to you adding the log scale after everything else, so that the error bars are computed for the non-log axis and the axis is then overwritten in log scale? I would try putting the scale_y_continuous() function first, right after the ggplot() function.

The stat_summary steps are performed after the transformation, which is why the mean looks different with log scale. This is described a bit here:
https://stackoverflow.com/a/14845174/6851825
And more technical discussion about the rationale and implications from the ggplot2 team here:
https://github.com/tidyverse/ggplot2/issues/2804
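
A quick base R check of this explanation, using the Ctl values from the question (the variable names are mine): averaging after the log10 transform and converting back gives the geometric mean, which lands just below the second-highest point instead of at the arithmetic mean.

```r
# Ctl values from the question.
ctl <- c(815, 2467, 4130, 32588)

# What the linear-axis plot shows: the ordinary arithmetic mean.
arithmetic_mean <- mean(ctl)

# What stat_summary effectively shows on a log10 axis:
# the mean of the transformed values, drawn back on the original scale.
mean_on_log_scale <- 10^mean(log10(ctl))

print(c(arithmetic = arithmetic_mean, on_log_scale = mean_on_log_scale))
```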
To resolve this, you have a few options.
Calculate your summary values upstream of ggplot2:
dat_sum <- dat %>%
group_by(Treatment) %>%
summarize(mean = mean(Value),
mean_se = sd(Value) / sqrt(n()))
dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
geom_errorbar(data = dat_sum,
aes(y = mean, ymin = mean - mean_se, ymax = mean + mean_se)) +
geom_crossbar(data = dat_sum,
aes(y = mean, ymin = mean, ymax = mean)) +
theme_bw() +
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x))
Apply the inverse transform to your summary variables. This is probably more brittle and more likely to confuse others reading the code, but it does keep the calculation within the ggplot2 chain.
dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
stat_summary(fun = function(x) log10(mean(10^x)), geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
stat_summary(fun.data = function(x) log10(mean_se(10^x)), geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
theme_bw() +
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x))

Raincloud plot - histogram?

I would like to create a raincloud plot. I have successfully done it. But I would like to know if instead of the density curve, I can put a histogram (it's better for my dataset).
This is my code, in case it is useful:
ATSC <- ggplot(data = data, aes(y = atsc, x = numlecteur, fill = numlecteur)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .5) +
geom_point(aes(y = atsc, color = numlecteur), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
geom_point(data = sumld, aes(x = numlecteur, y = mean), position = position_nudge(x = 0.25), size = 2.5) +
geom_errorbar(data = sumld, aes(ymin = lower, ymax = upper, y = mean), position = position_nudge(x = 0.25), width = 0) +
guides(fill = FALSE) +
guides(color = FALSE) +
scale_color_brewer(palette = "Spectral") +
scale_y_continuous(breaks=c(0,2,4,6,8,10), labels=c("0","2","4","6","8","10"))+
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
theme_bw() +
expand_limits(y=c(0, 10))+
xlab("Lecteur") + ylab("Age total sans check")+
raincloud_theme
I thought I could maybe add geom_histogram(), but it doesn't work.
Thank you in advance for your help !
(sources : https://peerj.com/preprints/27137v1.pdf
https://neuroconscience.wordpress.com/2018/03/15/introducing-raincloud-plots/)

This is actually not easy. There are a few challenges:
geom_histogram is "horizontal by nature", while the custom geom_flat_violin is vertical, as are boxplots; hence the final call to coord_flip in that tutorial. To combine both, I think it is best to switch x and y, forget about coord_flip, and use ggstance::geom_boxploth instead.
Creating separate histograms for each category is another challenge. My workaround is to create facets and "merge them together".
The histograms are scaled much larger than the width of the points/boxplots. My workaround is to scale them via the after_stat function.
Nudging the histograms to the right position above the boxplot and points is the last challenge. I convert the discrete scale to a continuous one by mapping a constant numeric to the global y aesthetic, and then use the facet labels as discrete labels.
library(tidyverse)
my_data<-read.csv("https://data.bris.ac.uk/datasets/112g2vkxomjoo1l26vjmvnlexj/2016.08.14_AnxietyPaper_Data%20Sheet.csv")
my_datal <-
my_data %>%
pivot_longer(cols = c("AngerUH", "DisgustUH", "FearUH", "HappyUH"), names_to = "EmotionCondition", values_to = "Sensitivity")
# use y = -... to position boxplot and jitterplot below the histogram
ggplot(data = my_datal, aes(x = Sensitivity, y = -.5, fill = EmotionCondition)) +
# after_stat for scaling
geom_histogram(aes(y = after_stat(count/100)), binwidth = .05, alpha = .8) +
# from ggstance
ggstance::geom_boxploth( width = .1, outlier.shape = NA, alpha = 0.5) +
geom_point(aes(color = EmotionCondition), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
# merged those calls to one
guides(fill = FALSE, color = FALSE) +
# scale_y_continuous(breaks = 1, labels = unique(my_datal$EmotionCondition))
scale_color_brewer(palette = "Spectral") +
scale_fill_brewer(palette = "Spectral") +
# facetting, because each histogram needs its own y
# strip position = left to fake discrete labels in continuous scale
facet_wrap(~EmotionCondition, nrow = 4, scales = "free_y" , strip.position = "left") +
# remove all continuous labels from the y axis
theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
Created on 2021-04-15 by the reprex package (v1.0.0)
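
The count/100 divisor in the code above is tuned to that particular dataset. A possible alternative, sketched here on the built-in faithful data (the dataset and the target_height value are assumptions), is to divide by the largest bin so the tallest bar always reaches a chosen height.

```r
library(ggplot2)

# Desired height of the tallest histogram bar, in y-axis units (hypothetical value).
target_height <- 0.4

p <- ggplot(faithful, aes(x = eruptions)) +
  # Rescale the computed counts so the largest bin equals target_height.
  geom_histogram(aes(y = after_stat(target_height * count / max(count))),
                 binwidth = 0.25, alpha = 0.8)

# Inspect the rescaled bar heights.
bars <- layer_data(p, 1)
```

This keeps the histogram height predictable when the number of observations changes; after_stat requires ggplot2 3.3.0 or later.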

Connecting means with stat_summary (geom = 'line') within ticks on the x-axis

I am trying to create a plot in ggplot2 similar to this one:
Here is the code I am using:
Dataset %>%
group_by(Participant, Group, Emotion) %>%
ggplot(aes(y = Score, x = Emotion, fill = Group, colour = Group)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .4) +
geom_point(aes(y = Score, color = Group), position = position_jitter(width = .15), size = 3, alpha = 0.4) +
stat_summary(aes(y = Score, group = Emotion), fun.y = mean, geom="line", size = 2.2, alpha = 1.2, width = 0.25, colour = 'gray48') +
stat_summary(fun = mean, geom = 'pointrange', width = 0.2, size = 2, alpha = 1.2, position=position_dodge(width=0.3)) +
stat_summary(fun.data = mean_se, geom='errorbar', width = 0.25, size = 2.2, alpha = 1.2, linetype = "solid",position=position_dodge(width=0.3)) +
guides(color = FALSE) +
scale_color_brewer(palette = "Dark2") +
scale_fill_brewer(palette = "Dark2") +
ylim(0, 100) +
graph_theme
What I am failing to do is set up the stat_summary(geom = 'line') to connect the green and orange means within each emotion on the x-axis. Could anyone give any pointers on this? I'd also like all the other features to stay the same if possible (e.g., I wouldn't like to use facet_grid or facet_wrap).
Thank you!
When I change the group argument in stat_summary to 'Group' instead of 'Emotion', means for each group are connected across emotions, but I can't figure out how to connect means of different groups within each emotion:

This is a tricky one because your line needs to connect points that have different x values but even if you jitter in the point layer, they still technically have the same x value so the line doesn't know how to connect them. What others have done is to manually add the jitter to force the points to have a different x position. For more inspiration check out this, this and this. Here's an example:
library(tidyverse)
set.seed(1)
emotion <- c("anger", "fear", "sadness")
group <- letters[1:2]
participant <- 1:10
dat <- expand_grid(emotion, group, participant) %>%
mutate(across(everything(), as.factor),
score = sample(x = 1:100, size = nrow(.), replace = T))
dat %>%
mutate(new_emot = case_when(
group == "a" ~as.numeric(emotion) - 0.125,
group == "b" ~as.numeric(emotion) + 0.125
)) %>%
ggplot(aes(x = emotion, y = score)) +
stat_summary(aes(color = group), fun = mean, geom = "point", position = position_dodge(width = 0.5)) +
stat_summary(aes(color = group), fun.data = mean_se, geom = "errorbar", width = 0.5, position = position_dodge(width = 0.5)) +
stat_summary(aes(x = new_emot, group = emotion), fun = mean, geom = "line") +
theme_bw()
Created on 2021-03-24 by the reprex package (v1.0.0)
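
A variation on the same manual-offset idea, summarising the means before plotting so the line and the points share one numeric x position. This is only a sketch: the data are simulated as above, and the means, offset, and x_pos names are mine. The 0.125 offset matches what position_dodge(width = 0.5) produces for two groups.

```r
library(ggplot2)
set.seed(1)

# Simulated data in the same shape as the example above.
dat <- expand.grid(emotion = factor(c("anger", "fear", "sadness")),
                   group = factor(c("a", "b")),
                   participant = 1:10)
dat$score <- sample(1:100, nrow(dat), replace = TRUE)

# One mean per emotion and group.
means <- aggregate(score ~ emotion + group, data = dat, FUN = mean)

# Shift each group left or right of the emotion's integer position.
offset <- c(a = -0.125, b = 0.125)
means$x_pos <- unname(as.numeric(means$emotion) + offset[as.character(means$group)])

p <- ggplot(means, aes(x = x_pos, y = score)) +
  # group = emotion connects the two group means within each emotion.
  geom_line(aes(group = emotion), colour = "gray48") +
  geom_point(aes(colour = group), size = 3) +
  # Label the continuous axis with the original emotion names.
  scale_x_continuous(breaks = seq_along(levels(means$emotion)),
                     labels = levels(means$emotion), name = "emotion") +
  theme_bw()
```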

Setting geom_line to the same position as pointrange and errorbar will solve the problem.
i.e.,
stat_summary(aes(y = Score, group = Emotion), fun.y = mean, geom="line", size = 2.2, alpha = 1.2, width = 0.25, colour = 'gray48', position=position_dodge(width=0.3))

customize two legends inside one graph in ggplot2

I have the following question.
Using this code:
Plot<-data.frame(Age=c(0,0,0,0,0),Density=c(0,0,0,0,0),Sensitivity=c(0,0,0,0,0),inf=c(0,0,0,0,0),sup=c(0,0,0,0,0),tde=c(0,0,0,0,0))
Plot[1,]<-c(1,1,0.857,0.793,0.904,0.00209834)
Plot[2,]<-c(1,2,0.771 ,0.74,0.799,0.00348286)
Plot[3,]<-c(1,3,0.763 ,0.717,0.804,0.00577784)
Plot[4,]<-c(1,4,0.724 ,0.653,0.785,0.00504161)
Plot[5,]<-c(2,1,0.906,0.866,0.934,0.00365742)
Plot[6,]<-c(2,2,0.785 ,0.754,0.813,0.00440399)
Plot[7,]<-c(2,3,0.660,0.593,0.722,0.00542849)
Plot[8,]<-c(2,4,0.544,0.425,0.658,0.00433052)
names(Plot)<-c("Age","Mammographyc density","Sensitivity","inf","sup","tde")
Plot$Age<-c("50-59","50-59","50-59","50-59","60-69","60-69","60-69","60-69")
Plot$Density<-c("Almost entirely fat","Scattered fibroglandular density","Heterogeneously dense","Extremely dense","Almost entirely fat","Scattered fibroglandular density","Heterogeneously dense","Extremely dense")
levels(Plot$Age)<-c("50-59","60-69")
levels(Plot$Density)<-c("Almost entirely fat","Scattered fibroglandular density","Heterogeneously dense","Extremely dense")
pd <- position_dodge(0.2) #
Plot$Density <- reorder(Plot$Density, 1-Plot$Sensitivity)
ggplot(Plot, aes(x = Density, y = 100*Sensitivity, colour=Age)) +
geom_errorbar(aes(ymin = 100*inf, ymax = 100*sup), width = .1, position = pd) +
geom_line(position = pd, aes(group = Age), linetype = c("dashed")) +
geom_point(position = pd, size = 4)+
scale_y_continuous(expand = c(0, 0),name = 'Sensitivity (%)',sec.axis = sec_axis(~./5, name = 'Breast cancer detection rate (per 1000 mammograms)', breaks = c(0,5,10,15,20),
labels = c('0‰',"5‰", '10‰', '15‰', '20‰')), limits = c(0,100)) +
geom_line(position = pd, aes(x = Density, y = tde * 5000, colour = Age, group = Age), linetype = c("dashed"), data = Plot) +
geom_point(shape=18,aes(x = Density, y = tde * 5000, colour = Age, group = Age), position = pd, size = 4) +
theme_light() +
scale_color_manual(name="Age (years)",values = c("50-59"= "grey55", "60-69" = "grey15")) +
theme(legend.position="bottom") + guides(colour = guide_legend(), size = guide_legend(),
shape = guide_legend())
I have made the following graph,
in which the axis on the left is the scale for the circles and the axis on the right is the scale for the diamonds. I would like to have a legend approximately like this:
But I cannot get it to work. I have tried suggestions from other threads, such as scale_shape and different commands in guides, without success. I just want to make clear the difference between what shape and color represent.
Would someone know how to help me?
Best regards,

What you should do is a panel plot to avoid the confusion of double axes:
library(dplyr)
library(tidyr)
Plot %>%
gather(measure, Result, Sensitivity, tde) %>%
ggplot(aes(x = Density, y = Result, colour=Age)) +
geom_errorbar(aes(ymin = inf, ymax = sup), width = .1, position = pd,
data = . %>% filter(measure == "Sensitivity")) +
geom_line(aes(group = Age), position = pd, linetype = "dashed") +
geom_point(position = pd, size = 4)+
# scale_y_continuous(expand = c(0, 0), limits = c(0, 1)) +
scale_y_continuous(labels = scales::percent) +
facet_wrap(~measure, ncol = 1, scales = "free_y") +
theme_light() +
scale_color_manual(name="Age (years)",values = c("50-59"= "grey55", "60-69" = "grey15")) +
theme(legend.position="bottom")
But to do what you asked: your problem is that you have only one non-positional aesthetic mapped, so you cannot get more than one legend. To force a second legend, you need to add a second mapping. It can be a dummy mapping that has no effect; below we map alpha but then manually scale both levels to 100%. This solution is not advisable because, as you have done in your example of a desired legend, it is easy to mix up the mappings and have your viz tell a lie by mislabeling which points are sensitivity and which are detection rate.
ggplot(Plot, aes(x = Density, y = 100*Sensitivity, colour=Age, alpha = Age)) +
geom_errorbar(aes(ymin = 100*inf, ymax = 100*sup), width = .1, position = pd) +
geom_line(position = pd, aes(group = Age), linetype = c("dashed")) +
geom_point(position = pd, size = 4)+
scale_y_continuous(expand = c(0, 0),name = 'Sensitivity (%)',sec.axis = sec_axis(~./5, name = 'Breast cancer detection rate (per 1000 mammograms)', breaks = c(0,5,10,15,20),
labels = c('0‰',"5‰", '10‰', '15‰', '20‰')), limits = c(0,100)) +
geom_line(position = pd, aes(x = Density, y = tde * 5000, colour = Age, group = Age), linetype = c("dashed"), data = Plot) +
geom_point(shape=18,aes(x = Density, y = tde * 5000, colour = Age, group = Age), position = pd, size = 4) +
theme_light() +
scale_color_manual(name="Age (years)",values = c("50-59"= "grey55", "60-69" = "grey15")) +
scale_alpha_manual(values = c(1, 1)) +
guides(alpha = guide_legend("Sensitivity"),
color = guide_legend("Detection Rate", override.aes = list(shape = 18))) +
theme(legend.position="bottom")
