A new problem with the same code from an earlier question. Funny because this is the first line of code in my R journey that just keeps stumping me. Following up on the previous post, now the extra line charts are being plotted but I realized that the x axis is not respecting the limits I'm trying to set. Here is the code. Changing the values for xlim does not 'zoom in' on the x axis and seems to have no effect at all:
ggplot(aes(x = age, y = friend_count), data = pf) +
coord_cartesian(xlim = c(13,90)) +
geom_point(alpha = 0.05,
position = position_jitter(h = 0),
color = 'orange') +
coord_trans(y = "sqrt") +
geom_line(stat = 'summary', fun = mean) +
geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = .1),
linetype = 2, color = 'blue') +
geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = .9),
linetype = 2, color = 'blue')
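A likely cause, offered as a sketch rather than a confirmed diagnosis: a plot holds only one coordinate system, so the later coord_trans() call replaces the earlier coord_cartesian() and its xlim is discarded (ggplot2 prints a "Coordinate system already present" message when this happens). coord_trans() takes its own xlim argument, so the limits can be set there. The pf data frame below is a made-up stand-in, since the original data is not shown in the post.

```r
library(ggplot2)

# Hypothetical stand-in for the original `pf` data (not from the post)
set.seed(42)
pf <- data.frame(
  age = sample(13:113, 2000, replace = TRUE),
  friend_count = rpois(2000, lambda = 200)
)

p <- ggplot(pf, aes(x = age, y = friend_count)) +
  geom_point(alpha = 0.05,
             position = position_jitter(height = 0),
             color = "orange") +
  geom_line(stat = "summary", fun = mean) +
  geom_line(stat = "summary", fun = quantile, fun.args = list(probs = 0.1),
            linetype = 2, color = "blue") +
  geom_line(stat = "summary", fun = quantile, fun.args = list(probs = 0.9),
            linetype = 2, color = "blue") +
  # a single coordinate system carries both the sqrt y axis and the x zoom
  coord_trans(y = "sqrt", xlim = c(13, 90))
```

Printing p draws the plot. Changing the values passed to xlim here should now zoom the x axis.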
Related
I want to plot my data as a scatter plot with the mean+sem error bar by using ggplot2. I am using stat_summary to add the mean bar and errorbar. As the variation is pretty huge, I used scale_y_continuous to transform the y axis as a log10 scale for better visualization.
Here is the example data:
Value <- c(815,2467,4130,32588,171,68,582,476)
Treatment <- c(rep("Ctl",4),rep("Mutant",4))
dat <- data.frame(Value, Treatment)
It works fine when I use a linear y axis. The crossbar sits at 10000, the mean of the Ctl group.
plot_linear <- dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
stat_summary(fun = mean, geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
stat_summary(fun.data = mean_se, geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
theme_bw()
However, if I log-transform the y axis, the crossbar for the Ctl mean always lands near the second-highest point (4130) rather than at the mean (10000).
plot_log <- dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
stat_summary(fun = mean, geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
stat_summary(fun.data = mean_se, geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
theme_bw() +
# log scaled y axis
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x))
I don't understand the logic behind this weird placement of the crossbar.
Is there a way to plot the mean bar for log-scaled data?
Thanks a lot!
Is the issue related to adding the log scale after everything else, so that the error bars are computed for the non-log axis and the axis is then overwritten with the log scale? I would try putting the scale_y_continuous() call first, right after ggplot().
The stat_summary steps are performed after the transformation, which is why the mean looks different with log scale. This is described a bit here:
https://stackoverflow.com/a/14845174/6851825
And more technical discussion about the rationale and implications from the ggplot2 team here:
https://github.com/tidyverse/ggplot2/issues/2804
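A quick base R check of this, using the Ctl values from the question: averaging after the log10 transform and converting back gives the geometric mean, which is much smaller than the arithmetic mean when the data are skewed. The variable names are only illustrative.

```r
# Ctl values from the example data in the question
ctl <- c(815, 2467, 4130, 32588)

arith_mean <- mean(ctl)            # mean on the original scale
log_mean   <- 10^mean(log10(ctl))  # mean taken on the log10 scale, back-transformed

arith_mean  # 10000
log_mean    # lands a little below 4130
```

That is why the crossbar appears close to the 4130 point once the log scale is added.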
To resolve this, you have a few options.
Calculate your summary values upstream of ggplot2:
dat_sum <- dat %>%
group_by(Treatment) %>%
summarize(mean = mean(Value),
mean_se = sd(Value) / sqrt(n()))
dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
geom_errorbar(data = dat_sum,
aes(y = mean, ymin = mean - mean_se, ymax = mean + mean_se)) +
geom_crossbar(data = dat_sum,
aes(y = mean, ymin = mean, ymax = mean)) +
theme_bw() +
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x))
Apply the inverse transform to your summary variables. This is probably more brittle and more likely to confuse others reading the code, but it does keep the calculation within the ggplot2 chain.
dat %>%
ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
binaxis='y', stackdir='center', dotsize = 1,
position=position_dodge(0.9)) +
stat_summary(fun = function(x) log10(mean(10^x)), geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
stat_summary(fun.data = function(x) log10(mean_se(10^x)), geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
theme_bw() +
scale_y_continuous(trans = log10_trans(),
breaks = trans_breaks("log10", function(x) 10^x))
set.seed(1) # generate random data
day1 = rnorm(20,0,1)
day2 = rnorm(20,5,1)
Subject <- rep(paste0('S',seq(1:20)), 2)
Data <- data.frame(Value = matrix(c(day1,day2),ncol=1))
Day <- rep(c('Day 1', 'Day 2'), each = length(day1))
df <- cbind(Subject, Data, Day)
Using this random data, I'd like to plot individual points with unique color for each subject and a summary point (mean + standard deviation).
It seems that the plot is okay when all points are plotted with the same color because stat_summary(fun.data = mean_sdl) works properly.
ggplot(data = df, mapping = aes(x= Day, y =Value)) +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 2),
geom = 'pointrange', fatten = 3*1.2, size = 1.2,
color= 'black') +
geom_point(size = 2)
But not when all points have unique color (for each subject).
ggplot(data = df, mapping = aes(x = Day, y = Value,
fill = Subject)) +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 2),
geom = 'pointrange', fatten = 3*1.2, size = 1.2,
color = 'black') +
geom_point(shape = 21, color = 'white', size = 2)
In your example, ggplot assumes that each fill value (each Subject) is its own group, but you want the grouping and the fill to be separate. Therefore, you need to explicitly set the group to Day.
ggplot(data = df, mapping = aes(x = Day, y = Value,
fill = Subject, group = Day)) +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 2),
geom = 'pointrange', fatten = 3*1.2, size = 1.2,
color = 'black') +
geom_point(shape = 21, color = 'white', size = 2)
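One way to see the effect of the grouping is to count the rows that stat_summary() produces, here with layer_data(). This is only a sketch: it rebuilds the simulated data from the question and uses fun = mean in place of mean_sdl purely to avoid the Hmisc dependency.

```r
library(ggplot2)

# Same simulated data as in the question
set.seed(1)
df <- data.frame(
  Subject = rep(paste0("S", 1:20), 2),
  Value   = c(rnorm(20, 0, 1), rnorm(20, 5, 1)),
  Day     = rep(c("Day 1", "Day 2"), each = 20)
)

# Grouping inferred from the discrete aesthetics (Day and Subject)
p_implicit <- ggplot(df, aes(x = Day, y = Value, fill = Subject)) +
  stat_summary(fun = mean, geom = "point")

# Grouping set explicitly to Day
p_explicit <- ggplot(df, aes(x = Day, y = Value, fill = Subject, group = Day)) +
  stat_summary(fun = mean, geom = "point")

n_implicit <- nrow(layer_data(p_implicit, 1))  # one summary row per Subject within each Day
n_explicit <- nrow(layer_data(p_explicit, 1))  # one summary row per Day
```

With the implicit grouping each "summary" is computed from a single observation, which is why the pointrange does not look like a mean with a spread.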
Try the following:
ggplot(data = df, mapping = aes(x= Day, y =Value)) +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 2),
geom = 'pointrange', fatten = 3*1.2, size = 1.2,
color= 'black') +
geom_point(size = 2, aes(color = Subject))
Instead of mapping Subject in aes() in the first line (ggplot(...)), I've moved the mapping into geom_point(), as color. Otherwise, stat_summary() will do its calculations grouped by Subject!
I have the following raw data that I plotted in R:
And I would like to edit this plot to look like this version below which was made by log-transforming the X axis using Excel
However, when I run my code below using scale_x_log10(), the output is not the desired plot I was hoping to make. See image below:
Can anyone identify where I have gone wrong?
ggplot(data = data, aes(x = x, y = y, group = group, color = group)) +
stat_summary(fun = "mean", geom = "line", size = 1.2, aes(group = group, linetype = group, color = group)) +
stat_summary(fun = "mean", geom = "point", size = 3, aes(color = group)) +
theme_apa() +
scale_linetype_manual(values = c("solid", "dashed")) +
scale_color_manual(values = c("mediumturquoise", "red")) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
scale_x_log10(limits = c(.01, 40), breaks = c(.01, .1, 1, 10))
It looks like your first data point is at zero, which can't be displayed on a log scale. You'll need to work out whether the data you used in Excel is different. Failing that, you could achieve a similar result by raising the lowest value of x with pmax():
ggplot(data = data, aes(x = pmax(x,0.01), y = y, group = group, color = group)) +
stat_summary(fun = "mean", geom = "line", size = 1.2, aes(group = group, linetype = group, color = group)) +
stat_summary(fun = "mean", geom = "point", size = 3, aes(color = group)) +
theme_apa() +
scale_linetype_manual(values = c("solid", "dashed")) +
scale_color_manual(values = c("mediumturquoise", "red")) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
scale_x_log10(limits = c(.01, 40), breaks = c(.01, .1, 1, 10))
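A small base R illustration of why the zero is a problem and what pmax() does to it; the vector x here is made up for the example.

```r
# Illustrative x values, including a zero like the first data point
x <- c(0, 0.1, 1, 10)

log10(x)       # the first element is -Inf, so it has no position on a log axis
pmax(x, 0.01)  # raises the zero to 0.01 and leaves the other values unchanged
```

Note that this moves the zero point to the lower axis limit rather than showing it at its true value, so it is a display workaround only.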
I would like to create a raincloud plot, and I have managed to do so. But I would like to know whether I can use a histogram instead of the density curve (it suits my dataset better).
This is my code, in case it is useful:
ATSC <- ggplot(data = data, aes(y = atsc, x = numlecteur, fill = numlecteur)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .5) +
geom_point(aes(y = atsc, color = numlecteur), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
geom_point(data = sumld, aes(x = numlecteur, y = mean), position = position_nudge(x = 0.25), size = 2.5) +
geom_errorbar(data = sumld, aes(ymin = lower, ymax = upper, y = mean), position = position_nudge(x = 0.25), width = 0) +
guides(fill = FALSE) +
guides(color = FALSE) +
scale_color_brewer(palette = "Spectral") +
scale_y_continuous(breaks=c(0,2,4,6,8,10), labels=c("0","2","4","6","8","10"))+
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
theme_bw() +
expand_limits(y=c(0, 10))+
xlab("Lecteur") + ylab("Age total sans check")+
raincloud_theme
I thought I could maybe use geom_histogram(), but it doesn't work.
Thank you in advance for your help!
(sources : https://peerj.com/preprints/27137v1.pdf
https://neuroconscience.wordpress.com/2018/03/15/introducing-raincloud-plots/)
This is actually not that easy. There are a few challenges.
geom_histogram is "horizontal by nature", while the custom geom_flat_violin is vertical, as are boxplots; hence the final call to coord_flip in that tutorial. To combine both, I think it is best to switch x and y, forget about coord_flip, and use ggstance::geom_boxploth instead.
Creating separate histograms for each category is another challenge. My workaround is to create facets and "merge them together".
The histograms are scaled far larger than the width of the points/boxplots. My workaround is to scale them via the after_stat function.
Nudging the histograms to the right position above the boxplot and points is the last challenge. I convert the discrete scale to a continuous one by mapping a constant numeric to the global y aesthetic, and then use the facet labels as the discrete labels.
library(tidyverse)
my_data<-read.csv("https://data.bris.ac.uk/datasets/112g2vkxomjoo1l26vjmvnlexj/2016.08.14_AnxietyPaper_Data%20Sheet.csv")
my_datal <-
my_data %>%
pivot_longer(cols = c("AngerUH", "DisgustUH", "FearUH", "HappyUH"), names_to = "EmotionCondition", values_to = "Sensitivity")
# use y = -... to position boxplot and jitterplot below the histogram
ggplot(data = my_datal, aes(x = Sensitivity, y = -.5, fill = EmotionCondition)) +
# after_stat for scaling
geom_histogram(aes(y = after_stat(count/100)), binwidth = .05, alpha = .8) +
# from ggstance
ggstance::geom_boxploth( width = .1, outlier.shape = NA, alpha = 0.5) +
geom_point(aes(color = EmotionCondition), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
# merged those calls to one
guides(fill = "none", color = "none") +
# scale_y_continuous(breaks = 1, labels = unique(my_datal$EmotionCondition))
scale_color_brewer(palette = "Spectral") +
scale_fill_brewer(palette = "Spectral") +
# facetting, because each histogram needs its own y
# strip position = left to fake discrete labels in continuous scale
facet_wrap(~EmotionCondition, nrow = 4, scales = "free_y" , strip.position = "left") +
# remove all continuous labels from the y axis
theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
Created on 2021-04-15 by the reprex package (v1.0.0)
I'm a total beginner at R and I'm having trouble plotting a line of quantiles. The "mean" line appears, but the "quantile" line does not.
Can someone help me? :)
My code:
ggplot(data = pf, aes(x = age, y = friend_count)) +
geom_point(alpha = 1/20, position = position_jitter(h = 0), color = 'orange') +
xlim(13, 90) +
coord_trans(y = 'sqrt') +
geom_line(stat = 'summary', fun.y = 'mean') +
geom_line(stat = 'summary', fun.y = 'quantile', fun.args = list(probs = 0.1))
Assuming you want to show deciles (10th percentile, 20th, etc.) and you want to show them as straight regression lines, you could add geom_quantile(quantiles = seq(.1, .9, by = .1)) at the end instead of geom_line(stat = 'summary', fun.y = 'quantile', fun.args = list(probs = 0.1)).
For example:
ggplot(data = cars, aes(x=speed, y=dist)) +
geom_point() + geom_quantile(quantiles = seq(.1, .9, by = .1))
Gives: (plot image not included)
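If the aim is instead a line connecting the quantile at each x value rather than regression lines (an assumption about what the asker wants), the summary stat with fun = quantile does that, e.g. geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = 0.1)); since ggplot2 3.3.0 the fun.y argument is deprecated in favour of fun. The base R sketch below computes the same per-x quantiles by hand on the built-in cars data, to show what that line would trace.

```r
# Per-speed 10th and 90th percentiles of dist in the built-in cars data;
# this mirrors what a 'summary' stat with fun = quantile computes at each x
q10 <- tapply(cars$dist, cars$speed, quantile, probs = 0.1)
q90 <- tapply(cars$dist, cars$speed, quantile, probs = 0.9)

# One row per distinct speed value
head(data.frame(speed = as.numeric(names(q10)), q10 = q10, q90 = q90))
```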