This is the code that I am using:
p1 <- ggplot(df_xy, aes(as.factor(x = Vehicle), y = accuracy)) +
geom_boxplot(aes(fill = Analyzer)) +
stat_boxplot(aes(fill = Analyzer), geom = 'errorbar') +
stat_summary(fun.y = "mean", geom = "point", shape = 20, size = 2, color = "red", fill = "red")
I am plotting the accuracy of two different analyzers against vehicle type on the x-axis.
For most vehicles (not all) I have two boxplots, and I also want to show the two mean values as dots, not only the medians. How can I do that? Below is an example of my plot.
It would be helpful if you could include some sample data so that we could verify an answer. However, it looks like you need to tell stat_summary how to group the data for the mean calculation and then to space the points horizontally. The code probably should be
p1 <- ggplot(df_xy, aes(x = as.factor(Vehicle), y = accuracy)) +
geom_boxplot(aes(fill = Analyzer)) +
stat_boxplot(aes(fill = Analyzer), geom = 'errorbar') +
# group by Analyzer so one mean is computed per box; fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
stat_summary( aes(group = Analyzer), fun = "mean", geom = "point",
shape = 20, size = 2, color = "red", fill = "red", position = position_dodge(width = .75) )
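Since no sample data was posted, here is a minimal sketch of the same idea on simulated data. The data frame df_demo and its values are made up for illustration and only stand in for df_xy; the point is that grouping stat_summary by Analyzer and dodging by the default boxplot width (0.75) should place one mean dot on each box.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for df_xy: 3 vehicle types, 2 analyzers
df_demo <- data.frame(
  Vehicle  = rep(c("car", "truck", "bus"), each = 20),
  Analyzer = rep(c("A1", "A2"), times = 30),
  accuracy = rnorm(60, mean = 90, sd = 3)
)

p_demo <- ggplot(df_demo, aes(x = Vehicle, y = accuracy)) +
  geom_boxplot(aes(fill = Analyzer)) +
  # one mean per Vehicle/Analyzer combination, dodged like the boxes
  stat_summary(aes(group = Analyzer), fun = mean, geom = "point",
               shape = 20, size = 2, colour = "red",
               position = position_dodge(width = 0.75))

# Inspect the computed means and their dodged x positions
means <- layer_data(p_demo, 2)
means[, c("x", "y", "group")]
```

Printing p_demo draws the plot; layer_data() is only used here to check that each vehicle gets two separately positioned means.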
I conducted some interviews and wanted to create box plots with ggplot based on them. I managed to create the box plots, but I cannot include the outliers in the box plot. I have only a few observations, so I want the outliers to be part of the box plot.
This is the code that I have so far:
data_insurances_boxplot_merged <- ggplot(data_insurances_merged, aes(x = value, y = func, fill = group)) +
stat_boxplot(geom = "errorbar", width = 0.3, position = position_dodge(width = 0.75)) +
geom_boxplot() +
stat_summary(fun.y = mean, geom = "point", shape = 20, size = 3, color = "red",
position = position_dodge2(width = 0.75,
preserve = "single")) +
scale_x_continuous(breaks = seq(1, 7, 1), limits = c(1, 7)) +
scale_fill_manual(values = c("#E6645E", "#EF9C9D")) +
labs(x = "",
y = "", title = "") +
theme_light(base_size = 12) +
theme(legend.title = element_blank())
data_insurances_boxplot_merged
And this is the box plot that is generated:
Does anyone know how to achieve this?
I am making a line graph with ggplot from data where I sort objects from different treatment groups into size bins and look at the percentage in each size bin by group.
My goal is to facet the data so that the control line has its own panel but is also added to the panels of the other groups.
I have the graph working: I get all of my groups, one of which is my control, and I can facet it. I would just like to keep the control line on the other panels, slightly grayed out. I am not sure how to add that.
graph <- ggplot(data=my_data, aes(x=bins, y=p_per_bin, group = group,
color = group, linetype = group, shape = group))+
stat_summary(fun = "mean", geom = "line", lwd = rel(1))+
stat_summary(fun = mean,
geom = "pointrange",
fun.max = function(x) mean(x) + sd(x) / sqrt(length(x)),
fun.min = function(x) mean(x) - sd(x) / sqrt(length(x)))+
stat_summary(fun = "mean", geom = "point", size = rel(2), fill = "white", stroke = rel(1.1))
graph + facet_wrap(~group)
Here is a portion of my data as a sample. Group "a" is the control.
sample <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9)
group <- c("a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b",
"b","c","c","c","c","c","c","c","c","c")
bins <-c("0-20","20-40","40-60","0-20","20-40","40-60","0-20","20-40","40-60","0-20","20-40","40-60", "0-20","20-40","40-60", "0-20","20-40","40-60", "0-20","20-40","40-60", "0-20","20-40","40-60", "0-20","20-40","40-60")
p_per_bin <- c(0,37.7192982,21.0526316,0,36.744186,23.7209302,0,36.2126246,31.5614618,
0,31.25,27.0833333,0,41.2280702,28.5087719,0,39.6078431,31.372549,0,43.7262357,
20.1520913,0,35.4716981,21.1320755,0,38.5350318, 29.9363057)
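# data.frame() keeps p_per_bin numeric; cbind() would coerce everything to a character matrix
my_data <- data.frame(sample, group, bins, p_per_bin)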
One option would be to add your reference group via an additional stat_summary for which you only use the data on the reference group. To get this layer displayed on each facet it's important to drop the group column after filtering.
For the example code I have chosen group "a" as the reference group:
library(ggplot2)
library(dplyr)
ggplot(data = my_data, aes(
x = bins, y = p_per_bin, group = group,
color = group, linetype = group, shape = group
)) +
#### Add line for reference group
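stat_summary(fun = "mean", geom = "line", lwd = rel(.5),
data = ~filter(.x, group == "a") |> select(-group),
# group is dropped from this layer's data, so give the layer its own mapping
mapping = aes(x = bins, y = p_per_bin, group = 1), inherit.aes = FALSE,
color = "grey45", linetype = "solid") +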
####
stat_summary(fun = "mean", geom = "line", lwd = rel(1)) +
stat_summary(
fun = mean,
geom = "pointrange",
fun.max = function(x) mean(x) + sd(x) / sqrt(length(x)),
fun.min = function(x) mean(x) - sd(x) / sqrt(length(x))
) +
stat_summary(fun = "mean", geom = "point", size = rel(2), fill = "white", stroke = rel(1.1)) +
facet_wrap(~group)
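To see why dropping the group column matters, here is a stripped-down sketch on made-up data (the toy data frame and its values are assumptions, not the asker's data). A layer whose data does not contain the faceting variable is repeated in every panel, which can be checked with layer_data().

```r
library(ggplot2)

# Hypothetical toy data: three groups, "a" plays the role of the control
toy <- data.frame(
  group = rep(c("a", "b", "c"), each = 3),
  bins  = rep(c("0-20", "20-40", "40-60"), times = 3),
  value = c(0, 37, 21, 0, 31, 27, 0, 43, 20)
)

# Reference data: control rows only, without the faceting column
ref <- toy[toy$group == "a", c("bins", "value")]

p <- ggplot(toy, aes(x = bins, y = value)) +
  # grey control line; its data has no `group` column
  geom_line(data = ref, aes(group = 1), colour = "grey45") +
  # one coloured line per group
  geom_line(aes(colour = group, group = group)) +
  facet_wrap(~group)

# Layer 1 is the reference layer: count the panels it appears in
ref_layer <- layer_data(p, 1)
n_panels  <- length(unique(ref_layer$PANEL))
```

If the reference data still contained group, the grey line would only show up in the panel for "a".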
I need to overlay the mean of the abiotic line over the point chart. I tried using geom_line as some other answers recommend but it doesn't quite work. I also want the mean of each point to be shown for each level.
sp <- rep(c("A","B"), times = 10)
sp.val <- rnorm(20,5,1)
abitoic <- rnorm(20,40,2)
level <- rep(c("Low","High"), each = 10)
df <- data.frame(sp, sp.val, abitoic, level)
pd = position_dodge(0.5)
ggplot(df, aes(x = level, y = sp.val, col = sp, group = sp)) +
geom_point(aes(fill = sp),colour="white",pch=21, size=4, stroke = 1, alpha = 0.7, position = pd) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
width = 0.5, colour = "black",
position = pd) +
stat_summary(fun = median, color = "black",
geom = "point", size = 7,show.legend = FALSE,
position = pd) +
stat_summary(fun = median,
geom = "line", show.legend = FALSE,
position = pd)+
stat_summary(fun = median,
geom = "point", size = 3,show.legend = FALSE,
position = pd) +
geom_line(aes(x = level, y = abitoic/5, group = level))
I've written this code:
ggplot() +
geom_sf(aes(fill = dat$color_province)) +
theme_void() +
geom_point(data = producer,
aes(x = producer$MX, y = producer$MY), size = 3, col = "green", shape = 17, alpha = 0.6) +
geom_point(data = distribution,
aes(x = distribution$MX, y = distribution$MY), size = 4.5, col = "yellow", shape = 15) +
geom_point(data = retailer,
aes(x = retailer$MX, y = retailer$MY), size = 3, col = "slateblue", shape = 16) +
geom_point(data = Demand,
aes(x = Demand$MX, y = Demand$MY, size = Demand$De), col = "slateblue", shape = 17, alpha = 0.7) +
scale_fill_manual(values = c("#ff3333", "#ffc266"),
name = "Situation")
and now I want to add a legend to identify all points in my plot. How can I do it?
Here's an example that everyone can run, since it uses built-in datasets that come with R. Inside aes(), I mapped color and size to a constant string naming each series, and then mapped those series names to the actual aesthetic values using scale_*_manual, where * is the aesthetic you want to vary by series. This generates a legend automatically. Because each scale is given the same name ("source" here), ggplot2 knows to combine them into one legend.
(By the way, it's unnecessary and can lead to errors to refer to variables in ggplot2 aesthetics using the form retailer$MY; each geom will assume the variable is within the data frame referred to with data =, so you can just use MY in that case.)
ggplot() +
geom_point(data = mtcars,
aes(x = wt, y = mpg, color = "mtcars", size = "mtcars")) +
geom_point(data = attitude,
aes(x = rating/20, y = complaints/3, color = "attitude", size = "attitude")) +
scale_color_manual(values = c("mtcars" = "slateblue", "attitude" = "red"), name = "source") +
scale_size_manual(values = c("mtcars" = 3, "attitude" = 4.5), name = "source")
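The same approach should carry over to the shapes used in the question. The sketch below is only an illustration: producer_demo and retailer_demo are randomly generated stand-ins for the asker's data frames, and only the column names MX and MY are taken from the question.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-ins for the producer and retailer data frames
producer_demo <- data.frame(MX = runif(5), MY = runif(5))
retailer_demo <- data.frame(MX = runif(8), MY = runif(8))

p_legend <- ggplot() +
  geom_point(data = producer_demo,
             aes(x = MX, y = MY, colour = "producer", shape = "producer"),
             size = 3) +
  geom_point(data = retailer_demo,
             aes(x = MX, y = MY, colour = "retailer", shape = "retailer"),
             size = 3) +
  # same name on both scales, so the legends merge into one
  scale_colour_manual(values = c(producer = "green", retailer = "slateblue"),
                      name = "source") +
  scale_shape_manual(values = c(producer = 17, retailer = 16),
                     name = "source")

# Check which values each layer actually received
producer_layer <- layer_data(p_legend, 1)
retailer_layer <- layer_data(p_legend, 2)
```

Further layers (for example the distribution points) could be added the same way, with one extra entry in each values vector.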
I have created a boxplot using the following code:
ggplot(xray50g, aes(x = Company, y = DefScore, label = Batch,
label2 = PercentPopAff, label3 = AvVertAff,
label4 = EggsPerLitreReceiving)) +
geom_boxplot() +
geom_point(aes(colour = Ploidy), size = 0.5) +
geom_jitter() +
# USE ENVSTATS PACKAGE TO INCLUDE SAMPLE SIZE
stat_n_text(size = 3) +
# INCLUDE MEAN VALUES
stat_summary(fun = mean, geom = "point", shape = 4, size = 2, color = "black") +
stat_summary(fun = mean, colour = "black", geom = "text", size = 3, show.legend = FALSE,
hjust = -0.35, vjust = -0.5, aes( label = round(..y.., digits = 2)))
I wanted to spread the data points out a little; however, when I use geom_jitter it seems to blur all the data points together and ruin the chart (see image).
Any help with this would be greatly appreciated.
You can use the width argument of geom_jitter to control how far the points are spread along the x-axis. I'd also recommend making the jittered points semi-transparent (alpha argument) and stopping geom_boxplot from drawing the outliers (outlier.shape = NA), since those points are already drawn by the jitter layer. Try the following:
ggplot(xray50g, aes(x = Company, y = DefScore, label = Batch,
label2 = PercentPopAff, label3 = AvVertAff,
label4 = EggsPerLitreReceiving)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha = 0.25, width = 0.1)
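One more detail that may be worth knowing: geom_jitter also jitters vertically by default, which shifts the plotted DefScore values slightly. A possible variation, sketched here on made-up data (the demo data frame is an assumption standing in for xray50g), uses position_jitter() with height = 0 and a fixed seed so that only the x positions move and the plot is reproducible.

```r
library(ggplot2)

set.seed(7)
# Hypothetical stand-in for xray50g with two companies
demo <- data.frame(
  Company  = rep(c("X", "Y"), each = 30),
  DefScore = rpois(60, lambda = 4)
)

p_jit <- ggplot(demo, aes(x = Company, y = DefScore)) +
  geom_boxplot(outlier.shape = NA) +
  # horizontal jitter only; the seed makes the layout repeatable
  geom_point(alpha = 0.25,
             position = position_jitter(width = 0.1, height = 0, seed = 123))

# Jittered positions as ggplot2 computed them
jit <- layer_data(p_jit, 2)
```

With width = 0.1 the points stay within 0.1 units on either side of each company's position, so they remain clearly inside the box.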