ggplot crossbar mislocalization in log-scaled data


I want to plot my data as a scatter plot with the mean+sem error bar by using ggplot2. I am using stat_summary to add the mean bar and errorbar. As the variation is pretty huge, I used scale_y_continuous to transform the y axis as a log10 scale for better visualization.
Here is the example data:
library(dplyr)
library(ggplot2)
library(scales)  # provides log10_trans() and trans_breaks()
Value <- c(815,2467,4130,32588,171,68,582,476)
Treatment <- c(rep("Ctl",4),rep("Mutant",4))
dat <- data.frame(Value, Treatment)
It works fine when I use a linear y-axis: the crossbar sits at 10000, the mean of the Ctl group.
plot_linear <- dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
stat_summary(fun = mean, geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
stat_summary(fun.data = mean_se, geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
theme_bw()
However, if I log-transform the y-axis, the crossbar for the mean of Ctl is always drawn near the second-highest point (4130) rather than at the mean (10000).
plot_log <- dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
stat_summary(fun = mean, geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
stat_summary(fun.data = mean_se, geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
theme_bw() +
# log scaled y axis
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x))
I don't understand the logic behind this weird placement of the crossbar.
Is there a way to plot the mean bar for log-scaled data?
Thanks a lot!

Is the issue that you add the log scale after everything else, so the error bars are computed for the non-log axis and the axis is then overwritten with a log scale? I would try putting scale_y_continuous() directly after ggplot().

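The stat_summary steps are performed after the scale transformation, so with a log10 scale the mean is taken over log10(Value). Converted back to data units that is the geometric mean rather than the arithmetic mean, which is why the bar moves. This is described a bit here: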
https://stackoverflow.com/a/14845174/6851825
And more technical discussion about the rationale and implications from the ggplot2 team here:
https://github.com/tidyverse/ggplot2/issues/2804
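The following base-R sketch reproduces the arithmetic with the Ctl values from the question; the variable names are illustrative only. It suggests why the crossbar lands close to 4130: averaging log10(Value) and converting back gives the geometric mean, not the arithmetic mean.

```r
# Ctl values from the question
ctl <- c(815, 2467, 4130, 32588)

# Arithmetic mean on the original scale: where the bar sits on a linear axis
arith_mean <- mean(ctl)            # 10000

# With a log10 scale, the summary function sees log10(Value) instead
mean_of_logs <- mean(log10(ctl))

# Converted back to data units, this is the geometric mean (about 4056),
# which happens to fall close to the observation 4130
geo_mean <- 10^mean_of_logs

print(c(arithmetic = arith_mean, geometric = geo_mean))
```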
To resolve this, you have a few options.
Calculate your summary values upstream of ggplot2:
dat_sum <- dat %>%
group_by(Treatment) %>%
summarize(mean = mean(Value),
mean_se = sd(Value) / sqrt(n()))
dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
geom_errorbar(data = dat_sum,
aes(y = mean, ymin = mean - mean_se, ymax = mean + mean_se)) +
geom_crossbar(data = dat_sum,
aes(y = mean, ymin = mean, ymax = mean)) +
theme_bw() +
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x))
Apply the inverse transform to your summary variables. This is probably more brittle and more likely to confuse others reading the code, but it does keep the calculation within the ggplot2 chain.
dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
stat_summary(fun = function(x) log10(mean(10^x)), geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
stat_summary(fun.data = function(x) log10(mean_se(10^x)), geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
theme_bw() +
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x))
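As a quick sanity check of the wrapper used for the crossbar in this second option, here is a base-R sketch (variable names are illustrative). It assumes, as described above, that the summary function receives log10(Value) when the y scale is log10.

```r
# Ctl values from the question
ctl <- c(815, 2467, 4130, 32588)

# What the summary function is handed on a log10 scale
x <- log10(ctl)

# Undo the log, average on the original scale, then return to log10 units
bar_position <- log10(mean(10^x))

# log10(10000) is 4, so the crossbar is drawn at 10000 on the axis
print(all.equal(bar_position, 4))
```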

Related

ggplot not respecting xlimits in coord_cartesian

A new problem with the same code from an earlier question. Funny because this is the first line of code in my R journey that just keeps stumping me. Following up on the previous post, now the extra line charts are being plotted but I realized that the x axis is not respecting the limits I'm trying to set. Here is the code. Changing the values for xlim does not 'zoom in' on the x axis and seems to have no effect at all:
ggplot(aes(x = age, y = friend_count), data = pf) +
coord_cartesian(xlim = c(13,90)) +
geom_point(alpha = 0.05,
position = position_jitter(h = 0),
color = 'orange') +
coord_trans(y = "sqrt") +
geom_line(stat = 'summary', fun = mean) +
geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = .1),
linetype = 2, color = 'blue') +
geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = .9),
linetype = 2, color = 'blue')
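One likely explanation, offered as a sketch rather than a confirmed diagnosis: a ggplot has a single coordinate system, so the later coord_trans() replaces the earlier coord_cartesian() and its xlim is discarded (ggplot2 normally prints a message about the replacement). Passing the limits to coord_trans() itself should keep both the zoom and the sqrt transform. The pf data frame below is a made-up stand-in, since the original data is not shown.

```r
library(ggplot2)

# Hypothetical stand-in for the `pf` data frame from the question
set.seed(42)
pf <- data.frame(
  age = sample(13:113, 500, replace = TRUE),
  friend_count = rpois(500, lambda = 200)
)

# Use a single coord_* call that carries both the x limits and the transform
p <- ggplot(pf, aes(x = age, y = friend_count)) +
  geom_point(alpha = 0.05,
             position = position_jitter(height = 0),
             color = "orange") +
  geom_line(stat = "summary", fun = mean) +
  geom_line(stat = "summary", fun = quantile, fun.args = list(probs = .1),
            linetype = 2, color = "blue") +
  coord_trans(y = "sqrt", xlim = c(13, 90))
```

Printing p should then show the x axis restricted to ages 13 to 90.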

stat_summary() and fun.data = mean_sdl not working

set.seed(1) # generate random data
day1 = rnorm(20,0,1)
day2 = rnorm(20,5,1)
Subject <- rep(paste0('S',seq(1:20)), 2)
Data <- data.frame(Value = matrix(c(day1,day2),ncol=1))
Day <- rep(c('Day 1', 'Day 2'), each = length(day1))
df <- cbind(Subject, Data, Day)
Using this random data, I'd like to plot individual points with unique color for each subject and a summary point (mean + standard deviation).
It seems that the plot is okay when all points are plotted with the same color because stat_summary(fun.data = mean_sdl) works properly.
ggplot(data = df, mapping = aes(x= Day, y =Value)) +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 2),
geom = 'pointrange', fatten = 3*1.2, size = 1.2,
color= 'black') +
geom_point(size = 2)
But not when all points have unique color (for each subject).
ggplot(data = df, mapping = aes(x = Day, y = Value,
fill = Subject)) +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 2),
geom = 'pointrange', fatten = 3*1.2, size = 1.2,
color = 'black') +
geom_point(shape = 21, color = 'white', size = 2)

In your example ggplot assumes that each color corresponds to an individual group, but you want the grouping and color to be separate. Therefore, you need to explicitly define the group to be "Day".
ggplot(data = df, mapping = aes(x = Day, y = Value,
fill = Subject, group = Day)) +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 2),
geom = 'pointrange', fatten = 3*1.2, size = 1.2,
color = 'black') +
geom_point(shape = 21, color = 'white', size = 2)
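A small base-R sketch of why the grouping matters, using the data from the question (rebuilt here with data.frame() for brevity, which yields the same columns). Mapping fill = Subject splits each day into 20 groups of one observation each, and a standard deviation is undefined for a single value; grouping by Day alone gives 20 observations per group.

```r
set.seed(1)
df <- data.frame(
  Subject = rep(paste0("S", 1:20), 2),
  Value   = c(rnorm(20, 0, 1), rnorm(20, 5, 1)),
  Day     = rep(c("Day 1", "Day 2"), each = 20)
)

# Group sizes when splitting by Day and Subject: every cell holds one value
n_by_subject <- table(df$Day, df$Subject)

# sd() of a single value is NA, so no range can be drawn for these groups
sd_by_subject <- tapply(df$Value, list(df$Day, df$Subject), sd)

# Splitting by Day only gives a usable standard deviation per day
sd_by_day <- tapply(df$Value, df$Day, sd)
print(sd_by_day)
```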

Try the following:
ggplot(data = df, mapping = aes(x= Day, y =Value)) +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 2),
geom = 'pointrange', fatten = 3*1.2, size = 1.2,
color= 'black') +
geom_point(size = 2, aes(color = Subject))
Rather than specifying fill in aes() in the first line (ggplot(...)), I've mapped Subject inside geom_point() (as color). Otherwise, stat_summary() will do its calculations grouped by Subject!

Raincloud plot - histogram?

I would like to create a raincloud plot, and I have managed to do so. However, I would like to know whether I can use a histogram instead of the density curve (it suits my dataset better).
This is my code, in case it is useful:
ATSC <- ggplot(data = data, aes(y = atsc, x = numlecteur, fill = numlecteur)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .5) +
geom_point(aes(y = atsc, color = numlecteur), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
geom_point(data = sumld, aes(x = numlecteur, y = mean), position = position_nudge(x = 0.25), size = 2.5) +
geom_errorbar(data = sumld, aes(ymin = lower, ymax = upper, y = mean), position = position_nudge(x = 0.25), width = 0) +
guides(fill = FALSE) +
guides(color = FALSE) +
scale_color_brewer(palette = "Spectral") +
scale_y_continuous(breaks=c(0,2,4,6,8,10), labels=c("0","2","4","6","8","10"))+
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
theme_bw() +
expand_limits(y=c(0, 10))+
xlab("Lecteur") + ylab("Age total sans check")+
raincloud_theme
I thought I could maybe use geom_histogram(), but it doesn't work.
Thank you in advance for your help!
(sources : https://peerj.com/preprints/27137v1.pdf
https://neuroconscience.wordpress.com/2018/03/15/introducing-raincloud-plots/)

This is actually not that easy. There are a few challenges.
geom_histogram is "horizontal by nature", whereas the custom geom_flat_violin is vertical, as are boxplots; hence the final call to coord_flip in that tutorial. To combine both, I think it is best to switch x and y, forget about coord_flip, and use ggstance::geom_boxploth instead.
Creating separate histograms for each category is another challenge. My workaround is to create facets and "merge them together".
The histograms are scaled much larger than the width of the points/boxplots. My workaround is to rescale them via the after_stat function.
To nudge the histograms into the right position above the boxplot and points, I convert the discrete scale to a continuous one by mapping a constant numeric value to the global y aesthetic, and then use the facet labels as discrete labels.
library(tidyverse)
my_data<-read.csv("https://data.bris.ac.uk/datasets/112g2vkxomjoo1l26vjmvnlexj/2016.08.14_AnxietyPaper_Data%20Sheet.csv")
my_datal <-
my_data %>%
pivot_longer(cols = c("AngerUH", "DisgustUH", "FearUH", "HappyUH"), names_to = "EmotionCondition", values_to = "Sensitivity")
# use y = -... to position boxplot and jitterplot below the histogram
ggplot(data = my_datal, aes(x = Sensitivity, y = -.5, fill = EmotionCondition)) +
# after_stat for scaling
geom_histogram(aes(y = after_stat(count/100)), binwidth = .05, alpha = .8) +
# from ggstance
ggstance::geom_boxploth( width = .1, outlier.shape = NA, alpha = 0.5) +
geom_point(aes(color = EmotionCondition), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
# merged those calls to one
guides(fill = "none", color = "none") +
# scale_y_continuous(breaks = 1, labels = unique(my_datal$EmotionCondition))
scale_color_brewer(palette = "Spectral") +
scale_fill_brewer(palette = "Spectral") +
# facetting, because each histogram needs its own y
# strip position = left to fake discrete labels in continuous scale
facet_wrap(~EmotionCondition, nrow = 4, scales = "free_y" , strip.position = "left") +
# remove all continuous labels from the y axis
theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
Created on 2021-04-15 by the reprex package (v1.0.0)

Connecting means with stat_summary (geom = 'line') within ticks on the x-axis

I am trying to create a plot in ggplot2 similar to this one:
Here is the code I am using:
Dataset %>%
group_by(Participant, Group, Emotion) %>%
ggplot(aes(y = Score, x = Emotion, fill = Group, colour = Group)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .4) +
geom_point(aes(y = Score, color = Group), position = position_jitter(width = .15), size = 3, alpha = 0.4) +
stat_summary(aes(y = Score, group = Emotion), fun = mean, geom="line", size = 2.2, alpha = 1.2, width = 0.25, colour = 'gray48') +
stat_summary(fun = mean, geom = 'pointrange', width = 0.2, size = 2, alpha = 1.2, position=position_dodge(width=0.3)) +
stat_summary(fun.data = mean_se, geom='errorbar', width = 0.25, size = 2.2, alpha = 1.2, linetype = "solid",position=position_dodge(width=0.3)) +
guides(color = FALSE) +
scale_color_brewer(palette = "Dark2") +
scale_fill_brewer(palette = "Dark2") +
ylim(0, 100) +
graph_theme
What I am failing to do is set up the stat_summary(geom = 'line') to connect the green and orange means within each emotion on the x-axis. Could anyone give any pointers on this? I'd also like all the other features to stay the same if possible (e.g., I wouldn't like to use facet_grid or facet_wrap).
Thank you!
When I change the group argument in stat_summary to 'Group' instead of 'Emotion', means for each group are connected across emotions, but I can't figure out how to connect means of different groups within each emotion:

This is a tricky one because your line needs to connect points that have different x values but even if you jitter in the point layer, they still technically have the same x value so the line doesn't know how to connect them. What others have done is to manually add the jitter to force the points to have a different x position. For more inspiration check out this, this and this. Here's an example:
library(tidyverse)
set.seed(1)
emotion <- c("anger", "fear", "sadness")
group <- letters[1:2]
participant <- 1:10
dat <- expand_grid(emotion, group, participant) %>%
mutate(across(everything(), as.factor),
score = sample(x = 1:100, size = n(), replace = TRUE))
dat %>%
mutate(new_emot = case_when(
group == "a" ~as.numeric(emotion) - 0.125,
group == "b" ~as.numeric(emotion) + 0.125
)) %>%
ggplot(aes(x = emotion, y = score)) +
stat_summary(aes(color = group), fun = mean, geom = "point", position = position_dodge(width = 0.5)) +
stat_summary(aes(color = group), fun.data = mean_se, geom = "errorbar", width = 0.5, position = position_dodge(width = 0.5)) +
stat_summary(aes(x = new_emot, group = emotion), fun = mean, geom = "line") +
theme_bw()
Created on 2021-03-24 by the reprex package (v1.0.0)

Setting geom_line to the same position as pointrange and errorbar will solve the problem.
i.e.,
stat_summary(aes(y = Score, group = Emotion), fun = mean, geom="line", size = 2.2, alpha = 1.2, width = 0.25, colour = 'gray48', position=position_dodge(width=0.3))

How to add jitter to geom_point when already using dodge

I'm plotting summary stats in front of individual geom_points, but can't figure out how to add jitter to the plots. I think the issue is that I'm already using the position argument to move the High and Low water points away from each other.
waterSymPop_p <- ggplot(aes(x = SymPop, y = Finish, fill = Water, color = Water), data = xanFull) +
geom_point(position = position_dodge(width = 0.9)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.9)) +
coord_flip()
Here's the plot that produces (obviously not finished with the color scheme, etc)
I'd like the point to be slightly jittered within each point group (ie, not in a straight line). Thanks for the help!
Answer: use position_jitterdodge
Amended code and new figure:
ggplot(aes(x = SymPop, y = Finish, fill = Water, color = Water), data = xanFull) +
geom_point(position = position_jitterdodge(dodge.width = 0.9, jitter.width = 0.2)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.9)) +
coord_flip()

Drey answered this.
