Problem with jittered data points in geom_boxplot - r

I have created a boxplot using the following code -
ggplot(xray50g, aes(x = Company, y = DefScore, label = Batch,
label2 = PercentPopAff, label3 = AvVertAff,
label4 = EggsPerLitreReceiving)) +
geom_boxplot() +
geom_point(aes(colour = Ploidy), size = 0.5) +
geom_jitter() +
# USE ENVSTATS PACKAGE TO INCLUDE SAMPLE SIZE
stat_n_text(size = 3) +
# INCLUDE MEAN VALUES
stat_summary(fun = mean, geom = "point", shape = 4, size = 2, color = "black") +
stat_summary(fun = mean, colour = "black", geom = "text", size = 3, show.legend = FALSE,
hjust = -0.35, vjust = -0.5, aes( label = round(..y.., digits = 2)))
I wanted to spread the data points out a little; however, when I use geom_jitter it seems to blur all the data points together and ruin the chart (see image).
Any help with this would be greatly appreciated.

You can use the width argument of geom_jitter to control how much the points are spread along the x-axis. I'd also recommend making the jittered points transparent (alpha argument) and stopping geom_boxplot from plotting the outliers with outlier.shape = NA (those points are already plotted by the jitter layer). Note that geom_point and geom_jitter together draw every observation twice, so keep only the jitter layer. Try the following:
ggplot(xray50g, aes(x = Company, y = DefScore, label = Batch,
label2 = PercentPopAff, label3 = AvVertAff,
label4 = EggsPerLitreReceiving)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha = 0.25, width = 0.1)
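For readers without the xray50g data, here is a minimal sketch of the same idea on the mpg dataset that ships with ggplot2 (the dataset and the column choices are assumptions for illustration only). It also sets height = 0, since geom_jitter by default jitters in both directions and would otherwise shift the y values as well.

```r
library(ggplot2)

# Boxplot without outlier markers, overlaid with lightly jittered raw points.
# width = 0.1 keeps points close to their category; height = 0 leaves y untouched.
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(colour = drv), width = 0.1, height = 0,
              alpha = 0.4, size = 0.8)

p
```

If the colour by group is needed, as with Ploidy in the question, map it inside geom_jitter rather than adding a separate geom_point layer.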

Related

Raincloud plot - histogram?

I would like to create a raincloud plot. I have successfully done it. But I would like to know if instead of the density curve, I can put a histogram (it's better for my dataset).
This is my code, in case it is useful:
ATSC <- ggplot(data = data, aes(y = atsc, x = numlecteur, fill = numlecteur)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .5) +
geom_point(aes(y = atsc, color = numlecteur), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
geom_point(data = sumld, aes(x = numlecteur, y = mean), position = position_nudge(x = 0.25), size = 2.5) +
geom_errorbar(data = sumld, aes(ymin = lower, ymax = upper, y = mean), position = position_nudge(x = 0.25), width = 0) +
guides(fill = FALSE) +
guides(color = FALSE) +
scale_color_brewer(palette = "Spectral") +
scale_y_continuous(breaks=c(0,2,4,6,8,10), labels=c("0","2","4","6","8","10"))+
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
theme_bw() +
expand_limits(y=c(0, 10))+
xlab("Lecteur") + ylab("Age total sans check")+
raincloud_theme
I thought I could simply use geom_histogram(), but it doesn't work.
Thank you in advance for your help !
(sources : https://peerj.com/preprints/27137v1.pdf
https://neuroconscience.wordpress.com/2018/03/15/introducing-raincloud-plots/)
This is actually not easy. There are a few challenges:
geom_histogram is "horizontal by nature", while the custom geom_flat_violin is vertical, as are boxplots. Hence the final call to coord_flip in that tutorial. To combine them, I think it is best to switch x and y, forget about coord_flip, and use ggstance::geom_boxploth instead.
Creating separate histograms for each category is another challenge. My workaround is to create facets and "merge them together".
The histograms are scaled much larger than the width of the points/boxplots. My workaround is to scale them via the after_stat function.
Nudging the histograms to the right position above the boxplot and points is the last challenge. I convert the discrete scale to a continuous one by mapping a constant numeric value to the global y aesthetic, and then use the facet labels as discrete labels.
library(tidyverse)
my_data<-read.csv("https://data.bris.ac.uk/datasets/112g2vkxomjoo1l26vjmvnlexj/2016.08.14_AnxietyPaper_Data%20Sheet.csv")
my_datal <-
my_data %>%
pivot_longer(cols = c("AngerUH", "DisgustUH", "FearUH", "HappyUH"), names_to = "EmotionCondition", values_to = "Sensitivity")
# use y = -... to position boxplot and jitterplot below the histogram
ggplot(data = my_datal, aes(x = Sensitivity, y = -.5, fill = EmotionCondition)) +
# after_stat for scaling
geom_histogram(aes(y = after_stat(count/100)), binwidth = .05, alpha = .8) +
# from ggstance
ggstance::geom_boxploth( width = .1, outlier.shape = NA, alpha = 0.5) +
geom_point(aes(color = EmotionCondition), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
# merged those calls to one
guides(fill = "none", color = "none") +
# scale_y_continuous(breaks = 1, labels = unique(my_datal$EmotionCondition))
scale_color_brewer(palette = "Spectral") +
scale_fill_brewer(palette = "Spectral") +
# facetting, because each histogram needs its own y
# strip position = left to fake discrete labels in continuous scale
facet_wrap(~EmotionCondition, nrow = 4, scales = "free_y" , strip.position = "left") +
# remove all continuous labels from the y axis
theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
Created on 2021-04-15 by the reprex package (v1.0.0)
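The scaling step can be tried in isolation. The following is a minimal sketch on the built-in iris data (an assumption; it is not the dataset used above). It divides each bin count by the largest count, so the tallest bar in the plot has height 1. The divisor max(count) is an illustrative choice in place of the fixed 100 used above, and after_stat requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Histogram heights rescaled so the tallest bar equals 1.
# after_stat() gives access to the count computed by stat_bin.
p_hist <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(aes(y = after_stat(count / max(count))),
                 binwidth = 0.25, alpha = 0.8) +
  # one panel per category, labels on the left to mimic a discrete axis
  facet_wrap(~Species, ncol = 1, strip.position = "left") +
  labs(y = NULL)

p_hist
```

With the heights bounded this way, a boxplot or jittered points placed at a small negative y value sit just below each histogram.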

Is there a way to present multiple means in ggplot in r

This is the code that I am using:
p1 <- ggplot(df_xy, aes(as.factor(x = Vehicle), y = accuracy)) +
geom_boxplot(aes(fill = Analyzer)) +
stat_boxplot(aes(fill = Analyzer), geom = 'errorbar') +
stat_summary(fun.y = "mean", geom = "point", shape = 20, size = 2, color = "red", fill = "red")
I present the accuracy of 2 different analyzers against vehicle type on the x-axis.
I have 2 boxplots (not for all vehicles) and I also want to show the two mean values as dots (not only the median). How can I do that? Below is an example of my plot.
It would be helpful if you could include some sample data so that we could verify an answer. However, it looks like you need to tell stat_summary how to group the data for the mean calculation and then to space the points horizontally. The code probably should be
p1 <- ggplot(df_xy, aes(x = as.factor(Vehicle), y = accuracy)) +
geom_boxplot(aes(fill = Analyzer)) +
stat_boxplot(aes(fill = Analyzer), geom = 'errorbar') +
# group by Analyzer so one mean is computed per box, then dodge to align with the boxes
# (fun replaces fun.y, which is deprecated since ggplot2 3.3.0)
stat_summary(aes(group = Analyzer), fun = mean, geom = "point",
shape = 20, size = 2, color = "red", fill = "red", position = position_dodge(width = .75))
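Since no sample data was posted, here is a minimal sketch of the same approach on mtcars, where cyl stands in for Vehicle and am for Analyzer (both stand-ins are assumptions for illustration). It uses the fun argument, which is the current name in ggplot2 3.3.0 and later.

```r
library(ggplot2)

# Stand-in data: cyl plays the role of Vehicle, am the role of Analyzer
cars_df <- transform(mtcars, cyl = factor(cyl), am = factor(am))

p_means <- ggplot(cars_df, aes(x = cyl, y = mpg)) +
  geom_boxplot(aes(fill = am)) +
  # group = am makes stat_summary compute one mean per box;
  # the dodge width of 0.75 is the value used in the answer above
  stat_summary(aes(group = am), fun = mean, geom = "point",
               shape = 20, size = 2, colour = "red",
               position = position_dodge(width = 0.75))

p_means
```

Without the group aesthetic, stat_summary would compute a single mean per x value and draw one dot in the middle of each pair of boxes.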

How to insert grouped median segments in violin plot in ggplot2

I'd like to insert median lines for factor levels into a violin plot in ggplot2. Here's some reproducible data:
set.seed(12)
FactorVar <- sample(LETTERS[1:5], 500, replace = T)
NumericVar <- abs(rnorm(500))
df <- data.frame(FactorVar, NumericVar)
To get the grouped medians I use tapply:
medians <- tapply(df$NumericVar, df$FactorVar, FUN = median)
And this is the code for the plot. As can be seen, I'm inserting each median line individually. That's cumbersome and uneconomical:
library(ggplot2)
g <-
ggplot(data = df,
aes(x = FactorVar, y = NumericVar, fill = FactorVar)) +
geom_violin(scale = "count", trim = F, adjust = 0.75) +
geom_point(aes(y = NumericVar),
position = position_jitter(width = .15), size = 0.9, alpha = 0.8) +
geom_hline(yintercept = mean(NumericVar), color = "blue", size = 0.8, linetype = 4) +
geom_segment(x = 0.5, xend = 1.5, y= medians[1], yend = medians[1], color = "red", linetype = 2) +
geom_segment(x = 1.5, xend = 2.5, y = medians[2], yend = medians[2], color = "red", linetype = 2) +
geom_segment(x = 2.5, xend = 3.5, y = medians[3], yend = medians[3], color = "red", linetype = 2) +
geom_segment(x = 3.5, xend = 4.5, y = medians[4], yend = medians[4], color = "red", linetype = 2) +
geom_segment(x = 4.5, xend = 5.5, y = medians[5], yend = medians[5], color = "red", linetype = 2) +
guides(fill = FALSE) +
guides(color = FALSE) +
coord_flip() +
theme_gray(); g
How can the median segments be inserted in a single command? Also, observe how the median line for factor A is thinner than the others? Why's that?
One method (that simplifies the +/- axis) would be to facet it. Before, though, we'll need to put the medians into a frame, preferably with the same grouping factors as the original.
mediansdf <- data.frame(FactorVar=names(medians), NumericVar=medians)
g <-
ggplot(data = df,
aes(x = FactorVar, y = NumericVar, fill = FactorVar)) +
geom_violin(scale = "count", trim = F, adjust = 0.75) +
geom_point(aes(y = NumericVar),
position = position_jitter(width = .15), size = 0.9, alpha = 0.8) +
geom_hline(yintercept = mean(NumericVar), color = "blue", size = 0.8, linetype = 4) +
guides(fill = FALSE) +
guides(color = FALSE) +
coord_flip() +
theme_gray() +
facet_grid(FactorVar~., scales="free") +
geom_segment(aes(x = 0.5, xend = 1.5, yend = NumericVar), color = "red", linetype = 2, data = mediansdf)
g
This example reused the y aesthetic, but since we have a different frame, we could easily use different names (and specify them within aes(...)). One advantage of using the same variable names is (in my opinion) clearer declarative code.
Since the facet_grid adds the factor label on the right side, you likely could remove it from the axis. Note, if you do not use scales="free", then you'll see all factors in each facet, which is distracting and unnecessary.
The reason I am suggesting facets is that it makes the x and xend simple and relative to a single violin, so 0.5 to 1.5; otherwise, as you saw, there is some assumption on which is going with which integer placement.
Last, the appearance of thinner red lines for me was while looking at the raster plot window. If you save to vector-based format (e.g., PDF), the lines appear to be the same thickness.
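As an alternative to facets, the segments can also be drawn in one layer from a small data frame of medians. This is only a sketch and not the method described above: it assumes the factor levels occupy the integer positions 1, 2, ... on the discrete axis, which is the same assumption the original hand-written segments make. The names medians_df and pos are hypothetical.

```r
library(ggplot2)

set.seed(12)
df <- data.frame(
  FactorVar  = sample(LETTERS[1:5], 500, replace = TRUE),
  NumericVar = abs(rnorm(500))
)

# One row per factor level, holding the group median
medians_df <- aggregate(NumericVar ~ FactorVar, data = df, FUN = median)
# Integer position of each level on the discrete axis (A = 1, B = 2, ...)
medians_df$pos <- as.numeric(factor(medians_df$FactorVar))

p_med <- ggplot(df, aes(x = FactorVar, y = NumericVar, fill = FactorVar)) +
  geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
  # a single geom_segment call draws all median lines
  geom_segment(
    data = medians_df,
    aes(x = pos - 0.5, xend = pos + 0.5, y = NumericVar, yend = NumericVar),
    inherit.aes = FALSE, colour = "red", linetype = 2
  ) +
  coord_flip() +
  theme(legend.position = "none")

p_med
```

The violin layer comes first so that the x scale is set up as discrete before the numeric segment positions are added.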

How to modify and add an extra legend in a ggplot2 figure

I have data that looks like this:
example.df <- as.data.frame(matrix( c("height","fruit",0.2,0.4,0.7,
"height","veggies",0.3,0.6,0.8,
"height","exercise",0.1,0.2,0.5,
"bmi","fruit",0.2,0.4,0.6,
"bmi","veggies",0.1,0.5,0.7,
"bmi","exercise",0.4,0.7,0.8,
"IQ","fruit",0.4,0.5,0.6,
"IQ","veggies",0.3,0.5,0.7,
"IQ","exercise",0.1,0.4,0.6),
nrow=9, ncol=5, byrow = TRUE))
colnames(example.df) <- c("phenotype","predictor","corr1","corr2","corr3")
So basically three different correlations between 3x3 variables. I want to visualize the increase in correlations as follows:
ggplot(example.df, aes(x=phenotype, y=corr1, yend=corr3, colour = predictor)) +
geom_linerange(aes(x = phenotype,
ymin = corr1, ymax = corr3,
colour = predictor),
position = position_dodge(width = 0.5))+
geom_point(size = 3,
aes(x = phenotype, y = corr1, colour = predictor),
position = position_dodge(width = 0.5), shape=4)+
geom_point(size = 3,
aes(x = phenotype, y = corr2, colour = predictor),
position = position_dodge(width = 0.5), shape=18)+
geom_point(size = 3,
aes(x = phenotype, y = corr3, colour = predictor),
position = position_dodge(width = 0.5))+
labs(x=NULL, y=NULL,
title="Stackoverflow Example Plot")+
scale_colour_manual(name="", values=c("#4682B4", "#698B69", "#FF6347"))+
theme_minimal()
This gives me the following plot:
Problems:
1. There is something wrong with the way the geom_point shapes are dodged for BMI and IQ. They should all sit on the line of the same colour, as they do for height.
2. How do I get an extra legend that shows what the circle, cross, and square represent? (i.e., the three different correlations shown on the line: cross = correlation 1, square = correlation 2, circle = correlation 3).
3. The legend now shows a line, circle, and cross on top of each other, while just a line for the predictors (exercise, fruit, veggies) would suffice.
Sorry for the multiple issues, but adding the extra legend (problem #2) is the most important one, and I would be already very satisfied if that could be solved, the rest is bonus! :)
See if the following works for you? The main idea is to convert the data frame from wide to long format for the geom_point layer, and map correlation as a shape aesthetic:
example.df %>%
ggplot(aes(x = phenotype, color = predictor, group = predictor)) +
geom_linerange(aes(ymin = corr1, ymax = corr3),
position = position_dodge(width = 0.5)) +
geom_point(data = . %>% tidyr::gather(corr, value, -phenotype, -predictor),
aes(y = value, shape = corr),
size = 3,
position = position_dodge(width = 0.5)) +
scale_color_manual(values = c("#4682B4", "#698B69", "#FF6347")) +
scale_shape_manual(values = c(4, 18, 16),
labels = paste("correlation", 1:3)) +
labs(x = NULL, y = NULL, color = "", shape = "") +
theme_minimal()
Note: The colour legend is based on both geom_linerange and geom_point, hence the legend keys include both a line and a point shape. While it's possible to get rid of the second one, it does take some more convoluted code, and I don't think the plot would be much improved as a result...
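The wide-to-long step can be looked at on its own. The sketch below rebuilds the example data with data.frame() so that the correlation columns are numeric (building it through matrix() with mixed strings and numbers stores everything as text), and reshapes it with tidyr::pivot_longer, the newer counterpart of gather. The names example_num and example_long are hypothetical.

```r
library(tidyr)

# Same values as the question's data, but with numeric correlation columns
example_num <- data.frame(
  phenotype = rep(c("height", "bmi", "IQ"), each = 3),
  predictor = rep(c("fruit", "veggies", "exercise"), times = 3),
  corr1 = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.4, 0.4, 0.3, 0.1),
  corr2 = c(0.4, 0.6, 0.2, 0.4, 0.5, 0.7, 0.5, 0.5, 0.4),
  corr3 = c(0.7, 0.8, 0.5, 0.6, 0.7, 0.8, 0.6, 0.7, 0.6)
)

# One row per phenotype/predictor/correlation combination
example_long <- pivot_longer(example_num, cols = starts_with("corr"),
                             names_to = "corr", values_to = "value")

head(example_long)
```

In the long format, corr can be mapped to the shape aesthetic, which is what produces the separate shape legend.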

How to make the points look larger in ggplot

ggplot(data_exp, aes(x = transaction, y = exp, size = count, colour = class, label = label)) +
geom_point(alpha = 0.5) +
geom_text(colour = "black", vjust = 0, nudge_y = 0.5, size = 3, fontface = "bold")
I want to make the points in the plot look larger. I tried mapping size to count*1000, but nothing seems to change.
Use the size aesthetic in geom_point.
An example:
library(ggplot2)
data(mtcars)
# increase point size based on variable value
p <- ggplot(mtcars, aes(wt,mpg))
p + geom_point(aes(size=wt*100))
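A note on why multiplying the variable changes nothing: a mapped size is passed through a size scale that rescales the data to a fixed output range (c(1, 6) by default in scale_size), so only the relative values matter. A minimal sketch, using hp from mtcars as a stand-in for count (an assumption), that widens the output range instead:

```r
library(ggplot2)

# Same data, once as-is and once multiplied by 1000: the drawn sizes are identical
p_default <- ggplot(mtcars, aes(wt, mpg, size = hp)) + geom_point(alpha = 0.5)
p_scaled  <- ggplot(mtcars, aes(wt, mpg, size = hp * 1000)) + geom_point(alpha = 0.5)

# Larger points: change the output range of the size scale
p_large <- p_default + scale_size(range = c(3, 12))

p_large
```

The range values 3 and 12 are arbitrary here and can be tuned until the points look right.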
