ggplot2: add line and points showing means (stat_summary) - r

So I am using this data frame:
xym <- data.frame(
Var1 = c("vloga", "odločitve", "dolgoročno",
"krizno", "uživa v", "vloga", "odločitve",
"dolgoročno", "krizno", "uživa v", "vloga",
"odločitve","dolgoročno", "krizno", "uživa v",
"vloga","odločitve", "dolgoročno", "krizno",
"uživa v"),
Var2 = c("Nad","Nad", "Nad", "Nad", "Nad", "Pod",
"Pod", "Pod", "Pod", "Pod", "Enak","Enak",
"Enak", "Enak", "Enak", "Sam.", "Sam.", "Sam.",
"Sam.", "Sam."),
value = c(4, 3, 4, 4, 3, 3, 3, 2, 3, 3, 3, 2.5, 2.5,
2, 3.5 ,5 ,6 ,6 ,5 ,6))
And with this code:
p <- ggplot(xym, aes(x = Var1, y = value, fill = Var2)) + coord_flip()+
theme_bw() + scale_fill_manual(values = c("yellow", "deepskyblue1", "yellowgreen","orchid4")) + xlim(rev(levels(xym$Var1)))+ theme(axis.title=element_blank(),axis.ticks.y=element_blank(),legend.position = "bottom",
axis.text.x = element_text(angle = 0,vjust = 0.4)) +
geom_bar(stat = "identity", width = 0.7, position = position_dodge(width=0.7)) +
geom_text(aes(x = Var1, y =max(value), label = round(value, 2), fill = Var2),
angle = 0, position = position_dodge(width = 0.7), size = 4.2)
p + labs(fill="")
p + stat_summary(fun.y=mean, colour="red", geom="line", aes(group = 1))
I produce output:
Besides the red line, which marks the overall average for each question (e.g. "dolgoročno", "krizno"), I would like to add points next to the bars, together with labels showing each question's mean across the groups.
My output should look something like the picture below (I drew it in Paint), where the black dots are the points I want. The label 3.6 on the first dot is the average of (6, 2, 4, 2.5) and is an example of the value labels I am after.
I've also looked at:
Plot average line in a facet_wrap
ggplot2: line connecting the means of grouped data
How to label graph with the mean of the values using ggplot2

One option would be the following. I followed your code and added a few lines.
# Your code
p <- ggplot(xym, aes(x = Var1, y = value, fill = Var2)) +
coord_flip() +
theme_bw() +
scale_fill_manual(values = c("yellow", "deepskyblue1", "yellowgreen","orchid4")) +
xlim(rev(levels(xym$Var1))) +
theme(axis.title = element_blank(),
axis.ticks.y = element_blank(),
legend.position = "bottom",
axis.text.x = element_text(angle = 0,vjust = 0.4)) +
geom_bar(stat = "identity", width = 0.7, position = position_dodge(width = 0.7)) +
geom_text(aes(x = Var1, y = max(value), label = round(value, 2), fill = Var2),
angle = 0, position = position_dodge(width = 0.7), size = 4.2)
p + labs(fill = "")
Then I added the following code. You can add dots by changing geom to "point" in stat_summary. For the labels, I pulled the computed data out of ggplot_build() and stored it in a data frame called foo. (There are probably other ways to do the same job.) Finally, I used foo to add the labels with annotate().
# `fun` replaces `fun.y`, which was deprecated in ggplot2 3.3.0
p2 <- p +
stat_summary(fun = mean, color = "red", geom = "line", aes(group = 1)) +
stat_summary(fun = mean, color = "black", geom = "point", aes(group = 1), size = 5,
show.legend = FALSE)
# Layer 4 is the point layer, so this is the data behind the dots in the graph
foo <- as.data.frame(ggplot_build(p2)$data[[4]])
# Round the means so the labels stay short (e.g. 3.6 rather than 3.625)
p2 +
annotate("text", x = foo$x, y = foo$y + 0.5, color = "black", label = round(foo$y, 1))
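As a hedged alternative to reading the values back out of ggplot_build(), the per-question means can be computed up front with base R's aggregate(). The sketch below rebuilds the question's xym data with rep() (same values, written more compactly) and stores the means in a data frame named means; that name and the label column are chosen here for illustration only.

```r
# Rebuild the question's data compactly (same values as xym in the question)
xym <- data.frame(
  Var1 = rep(c("vloga", "odločitve", "dolgoročno", "krizno", "uživa v"), times = 4),
  Var2 = rep(c("Nad", "Pod", "Enak", "Sam."), each = 5),
  value = c(4, 3, 4, 4, 3, 3, 3, 2, 3, 3, 3, 2.5, 2.5, 2, 3.5, 5, 6, 6, 5, 6)
)

# One row per question, holding the mean of its four group values
means <- aggregate(value ~ Var1, data = xym, FUN = mean)

# Rounded label text for the dots
means$label <- round(means$value, 1)
print(means)
```

A data frame like means could then feed a layer such as geom_text(data = means, aes(x = Var1, y = value + 0.5, label = label), inherit.aes = FALSE). inherit.aes = FALSE is needed there because means has no Var2 column for the plot-level fill mapping.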

Related

ggplot trying to make a Cleveland plot but I cannot get a legend

library(ggplot2)
library(ggthemes)
data <- read.csv('/Users/zbhay/Documents/r-data.csv', header = 1)
zb <- ggplot(data) +
geom_segment( aes(x=x, xend=x, y=value1, yend=value2), color="black")+
geom_point( aes(x=x, y=value1), color=rgb(0.2,0.7,0.1,1), size=4 )+
geom_point( aes(x=x, y=value2), color=rgb(0.7,0.2,0.1,1), size=4 )+
coord_flip() +
theme_solarized() +
scale_y_continuous(breaks = seq(0, 10000, by = 500)
)
zb + labs(title = "Title",
subtitle = "subtitle") +
xlab("Business Functions") +
ylab("# of hours")
legend("left", c("Starting", "Ending"),
box.col = "darkgreen"
)
Hello, here is the code. The CSV file is structured as follows: column A = names, column B = starting number, column C = final number.
I am trying to add a legend that distinguishes the final number from the starting number. I have tried repeatedly but cannot get it to work. If anyone knows a fix, I would appreciate it if you could let me know.
As a general rule in ggplot2, you only get a legend for aesthetics that are mapped inside aes(). So instead of setting the point colors as fixed arguments, map a value to the color aesthetic. In the code below, the first geom_point maps the constant "start" to color inside aes(), and the second maps "end". You can then use scale_color_manual to assign your desired colors and labels to these categories. Finally, the color of the legend box can be set via the theme element legend.background. The legend keys have a background color of their own, which I set to NA via legend.key.
Using some fake random example data:
library(ggplot2)
library(ggthemes)
set.seed(123)
data <- data.frame(x = letters[1:5], value1 = runif(5, 0, 10000), value2 = runif(5, 0, 10000))
ggplot(data) +
geom_segment(aes(x = x, xend = x, y = value1, yend = value2), color = "black") +
geom_point(aes(x = x, y = value1, color = "start"), size = 4) +
geom_point(aes(x = x, y = value2, color = "end"), size = 4) +
coord_flip() +
theme_solarized() +
scale_y_continuous(breaks = seq(0, 10000, by = 500)) +
scale_color_manual(values = c(start = rgb(0.2, 0.7, 0.1, 1), end = rgb(0.7, 0.2, 0.1, 1)), labels = c(start = "Starting", end = "Ending")) +
labs(title = "Title", subtitle = "subtitle", x = "Business Functions", y = "# of hours", color = NULL) +
theme(
legend.key = element_rect(fill = NA),
legend.background = element_rect(fill = "darkgreen")
)

Include outliers in ggplot boxplot

I conducted some interviews and want to create box plots from them with ggplot. I managed to create the box plots, but I cannot work out how to include the outliers in them. I have only a few observations, so I want the outliers to be part of the box plot.
This is the code that I have so far:
data_insurances_boxplot_merged <- ggplot(data_insurances_merged, aes(x = value, y = func, fill = group)) +
stat_boxplot(geom = "errorbar", width = 0.3, position = position_dodge(width = 0.75)) +
geom_boxplot() +
stat_summary(fun.y = mean, geom = "point", shape = 20, size = 3, color = "red",
position = position_dodge2(width = 0.75,
preserve = "single")) +
scale_x_continuous(breaks = seq(1, 7, 1), limits = c(1, 7)) +
scale_fill_manual(values = c("#E6645E", "#EF9C9D")) +
labs(x = "",
y = "", title = "") +
theme_light(base_size = 12) +
theme(legend.title = element_blank())
data_insurances_boxplot_merged
And this is the box plot that is generated:
Does anyone know how to achieve this?
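One possible reading of the question is that the whiskers should reach the most extreme observations, so that no point is treated separately as an outlier. A minimal sketch of that idea, assuming ggplot2 is installed and using made-up data (toy, with illustrative columns func and value, not the interview data), sets coef = Inf in geom_boxplot(). coef is the whisker length as a multiple of the IQR and defaults to 1.5.

```r
library(ggplot2)

# Made-up data: group "A" contains one extreme value (7)
toy <- data.frame(
  func  = rep(c("A", "B"), each = 6),
  value = c(1, 2, 2, 3, 3, 7, 2, 3, 3, 4, 4, 5)
)

# coef = Inf: whiskers extend to the minimum and maximum of each group
p_all <- ggplot(toy, aes(x = func, y = value)) +
  geom_boxplot(coef = Inf)

# Inspect the computed statistics (ymin and ymax are the whisker ends)
stats_all <- layer_data(p_all, 1)
print(stats_all[, c("ymin", "lower", "middle", "upper", "ymax")])
```

This sketch assumes every group has a non-zero IQR; whether it matches the intended plot depends on what "part of the box plot" is meant to cover.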

How to create an individual line plot in between box plot in r

I'm trying to create a plot like the image below, where the individual data lines sit in between the box plots.
The closest I am getting is the plot produced by the code below, but it looks a bit cluttered with the lines/points behind the boxes:
data1 %>%
ggplot(aes(Time,Trait)) +
geom_line(aes(group=ID), position = "identity")+
geom_point(aes(group=ID), shape=21, colour="black", size=2, position = "identity")+
geom_boxplot(width=.5,position = position_dodge(width=0.9), fill="white") +
stat_summary(fun.data= mean_cl_boot, geom = "errorbar", width = 0.1, position = position_dodge(width = .9)) +
stat_summary(fun = mean, geom = "point", shape = 18, size=3, position = "identity")+
facet_wrap(~Cond) +
theme_classic()
Any tips would be greatly appreciated!
One option to achieve your desired result would be to use a continuous x scale. That makes it possible to shift the box plots outward (left for "Pre", right for "Post") and the points and lines inward, so that the lines run between the boxes.
The code below uses some random data to mimic your real data set (see DATA further down).
data1$Time1 <- as.numeric(factor(data1$Time, levels = c("Pre", "Post")))
data1$Time_box <- data1$Time1 + .1 * ifelse(data1$Time == "Pre", -1, 1)
data1$Time_lp <- data1$Time1 + .1 * ifelse(data1$Time == "Pre", 1, -1)
library(ggplot2)
ggplot(data1, aes(x = Time_box, y = Trait)) +
geom_line(aes(x = Time_lp, group=ID), position = "identity")+
geom_point(aes(x = Time_lp, group=ID), shape=21, colour="black", size=2, position = "identity")+
geom_boxplot(aes(x = Time_box, group=Time1), width=.25, fill="white") +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.1) +
stat_summary(fun = mean, geom = "point", shape = 18, size=3, position = "identity") +
scale_x_continuous(breaks = c(1, 2), labels = c("Pre", "Post")) +
facet_wrap(~Cond) +
theme_classic()
DATA
set.seed(42)
data1 <- data.frame(
ID = rep(1:10, 4),
Time = rep(c("Pre", "Post"), each = 10),
Trait = runif(40),
Cond = rep(c("MBSR", "SME"), each = 20)
)
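The offsets used above can be checked in isolation. The sketch below uses only base R and a two-row toy data frame (pos, a name chosen here for illustration) to show where each element ends up on the continuous axis.

```r
# Toy frame with one row per time point
pos <- data.frame(Time = c("Pre", "Post"))

# Numeric positions: Pre = 1, Post = 2
pos$Time1 <- as.numeric(factor(pos$Time, levels = c("Pre", "Post")))

# Box plots move outward, lines and points move inward
pos$Time_box <- pos$Time1 + .1 * ifelse(pos$Time == "Pre", -1, 1)
pos$Time_lp  <- pos$Time1 + .1 * ifelse(pos$Time == "Pre", 1, -1)

print(pos)
```

With an offset of 0.1 the boxes sit at 0.9 and 2.1 while the lines run between 1.1 and 1.9; a larger offset widens the gap between boxes and lines.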
EDIT If you want two boxplots side by side, the approach is basically the same. However, in that case you have to map the interaction of Time1 and the variable mapped to fill onto the group aesthetic in geom_boxplot (and probably for the error bars as well):
library(ggplot2)
set.seed(42)
data1 <- data.frame(
ID = rep(1:10, 4),
Time = rep(c("Pre", "Post"), each = 10),
Fill = rep(c("Fill1", "Fill2"), each = 5),
Trait = runif(40),
Cond = rep(c("MBSR", "SME"), each = 20)
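)
# Recompute the helper columns, since redefining data1 above dropped them
data1$Time1 <- as.numeric(factor(data1$Time, levels = c("Pre", "Post")))
data1$Time_box <- data1$Time1 + .1 * ifelse(data1$Time == "Pre", -1, 1)
data1$Time_lp <- data1$Time1 + .1 * ifelse(data1$Time == "Pre", 1, -1)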
ggplot(data1, aes(x = Time_box, y = Trait)) +
geom_line(aes(x = Time_lp, group=ID, color = Fill), position = "identity")+
geom_point(aes(x = Time_lp, group=ID, fill = Fill), shape=21, colour="black", size=2, position = "identity")+
geom_boxplot(aes(x = Time_box, group=interaction(Time1, Fill) , fill = Fill), width=.25) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.1) +
stat_summary(fun = mean, geom = "point", shape = 18, size=3, position = "identity") +
scale_x_continuous(breaks = c(1, 2), labels = c("Pre", "Post")) +
facet_wrap(~Cond) +
theme_classic()

log transform X axis R

I have the following raw data that I plotted in R:
And I would like to edit this plot to look like this version below which was made by log-transforming the X axis using Excel
However, when I run my code below using scale_x_log10(), the output is not the desired plot I was hoping to make. See image below:
Can anyone identify where I have gone wrong?
ggplot(data = data, aes(x = x, y = y, group = group, color = group)) +
stat_summary(fun = "mean", geom = "line", size = 1.2, aes(group = group, linetype = group, color = group)) +
stat_summary(fun = "mean", geom = "point", size = 3, aes(color = group)) +
theme_apa() +
scale_linetype_manual(values = c("solid", "dashed")) +
scale_color_manual(values = c("mediumturquoise", "red")) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
scale_x_log10(limits = c(.01, 40), breaks = c(.01, .1, 1, 10))
It looks like your first data point is at zero, which cannot be displayed on a log scale because log10(0) is -Inf. Check whether the data you plotted in Excel differs from the data in R. Failing that, you could achieve a similar result by raising the lowest value of x to the lower axis limit with pmax():
ggplot(data = data, aes(x = pmax(x,0.01), y = y, group = group, color = group)) +
stat_summary(fun = "mean", geom = "line", size = 1.2, aes(group = group, linetype = group, color = group)) +
stat_summary(fun = "mean", geom = "point", size = 3, aes(color = group)) +
theme_apa() +
scale_linetype_manual(values = c("solid", "dashed")) +
scale_color_manual(values = c("mediumturquoise", "red")) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
scale_x_log10(limits = c(.01, 40), breaks = c(.01, .1, 1, 10))
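The effect of pmax() can be seen without any plotting. The sketch below uses a made-up vector x (not the asker's data) in base R to show why a zero is a problem on a log scale and what the clamping does to it.

```r
# Made-up x values, including the problematic zero
x <- c(0, 0.01, 0.1, 1, 10)

# log10(0) is -Inf, so the first value has no position on a log axis
print(log10(x))

# pmax() raises every value below 0.01 to 0.01 and leaves the rest unchanged
x_clamped <- pmax(x, 0.01)
print(log10(x_clamped))
```

Note that clamping moves the zero onto the lower axis limit, so the first point is drawn at 0.01 rather than at its true position; whether that is acceptable depends on how the figure will be read.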

custom color for each group + category combination raincloud plot

I have a raincloud plot:
but I would like each combination of TL group and yr to be a different color, as one can do in base boxplot():
I have tried using the following code for the raincloud plot:
Y_C_rain= ggplot(yct_rain, aes(y=d13C, x=lengthcat,fill = yr,color=yr)) +
geom_flat_violin(position = position_nudge(x = .2, y =0), alpha = .8)+
geom_point(aes(y = d13C, color = yr),
position = position_jitter(width = .05), size = 2, alpha = .5) +
geom_boxplot(width = .3, guides = FALSE, outlier.shape = NA, alpha = 0, notch = FALSE) +
stat_summary(fun= mean, geom = "point", shape = 21, size = 3, fill = "black") +
scale_y_continuous (limits = c(-35,-10),expand = c(0,0),breaks=seq(-35,-10,5)) +
ylab("d13C") + xlab("TL group") +
ggtitle("YCT d13C") +
theme_bw() +
scale_colour_discrete(my_clrs_yct)+
scale_fill_discrete(my_clrs_yct)
Y_C_rain
I know that the colors in the raincloud plot will need to be set with some variant of scale_fill_xxx, but I am hitting a roadblock because it appears that each point also needs to have its own color. As a result, the variations of scale_fill_xxx with only 6 individual colors listed are not working.
Do you want something like this?
library(dplyr)
library(data.table)
library(ggplot2)
# used geom_flat_violin from https://gist.github.com/dgrtwo/eb7750e74997891d7c20
my_clrs_yct <- c("#404040", "#407a8c", "#7a7a7a", "#404f86", "#a6a6a6", "#3e1451")
## used storms from dplyr as reproducible example
data("storms")
setDT(storms)
storms[, season:= factor(ifelse(month <=6, "Q12", "Q34"))]
ggplot(storms, aes(x=status, y=pressure, color=interaction(status, season),
fill=interaction(status, season))) +
geom_point(aes(color = interaction(status, season)),
position = position_jitterdodge(
jitter.width=.1, dodge.width=.25), size = 2, alpha = .5)+
geom_flat_violin(position = position_nudge(x = .5, y =0), alpha = .5)+
geom_boxplot(width = .3, guides = FALSE, outlier.shape = NA, alpha = 0)+
stat_summary(fun = mean, geom = "point", shape = 21, size = 3,
fill = "black", position = position_nudge(x = c(-.075,.075), y =0)) +
theme_bw() +
scale_colour_manual(values=my_clrs_yct) +
scale_fill_manual(values=my_clrs_yct)
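The key step in the code above is mapping color and fill to interaction(), which creates one factor level per combination of the two variables. The base R sketch below illustrates this with hypothetical stand-ins for the TL group and year columns (tl_group and yr are made up here, not taken from the asker's data).

```r
# Hypothetical stand-ins for the TL group and year columns
tl_group <- factor(c("small", "small", "medium", "medium", "large", "large"))
yr <- factor(c("2019", "2020", "2019", "2020", "2019", "2020"))

# One factor level per group/year combination
combo <- interaction(tl_group, yr)
print(nlevels(combo))

# Name the colours by level so the assignment does not depend on level order
my_clrs_yct <- c("#404040", "#407a8c", "#7a7a7a", "#404f86", "#a6a6a6", "#3e1451")
combo_colours <- setNames(my_clrs_yct, levels(combo))
print(combo_colours)
```

A named vector like combo_colours can be passed as values to scale_fill_manual() and scale_colour_manual(), which match the names to the levels of the mapped factor.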

Resources