ggplot - multiple boxplots - r

I'm trying to create boxplots with this dataset faceting by factor mix (3 boxplots combined):
daf <- read.table("http://pastebin.com/raw.php?i=xxYjmdgD", header=T, sep="\t")
This is what the sample looks like:
ia mix Rs
1 Fluazinam 1 0.62
2 Fluazinam 1 0.76
3 Fluazinam 1 0.76
4 Fluazinam 1 0.52
5 Fluazinam 1 0.56
6 Fluazinam 1 0.20
7 Fluazinam 1 0.98
235 Carbendazim+Cresoxim-Metílico+Tebuconazole 3 0.65
236 Carbendazim+Cresoxim-Metílico+Tebuconazole 3 0.28
237 Carbendazim+Cresoxim-Metílico+Tebuconazole 3 0.41
These are my failed attempts!
library(ggplot2)
qplot( Rs, ia, data=daf) +
facet_grid(mix ~ ., scales = "free", space = "free", labeller = label_both)
» When I add geom="boxplot", as in qplot( Rs, ia, data=daf, geom="boxplot"),
only a line appears, not the box.
ggplot(data=daf, aes(x=ia, y=Rs))+
geom_boxplot(outlier.colour = "black", outlier.size = 2) +
coord_flip() + theme_bw() +
scale_y_continuous(breaks=seq(0,1,by=0.25))+
stat_summary(fun.y = mean, geom="point", shape = 4, size = 3, colour = "blue") +
facet_grid(mix ~. , scales = "free", space="free", labeller = label_both)
» It repeats every "ia" level in each "mix" level
ggplot(data=daf, aes(x=ia, y=Rs))+
geom_boxplot(outlier.colour = "black", outlier.size = 2) +
layer(data = a, mapping = aes(x = ia, y= 0, label=a$Rs.median),
geom = "text", color="NavyBlue", size=3.5) +
coord_flip() + theme_bw() +
scale_y_continuous(breaks=seq(0,1,by=0.25))+
stat_summary(fun.y = mean, geom="point", shape = 4, size = 3, colour = "blue")
Finally I'd like a combination of the three plots:
from the first plot, the facet_grid layout (without repeating the "ia" levels); from the second, the boxes; and from the third, the median values inside the left margin. If possible, I'd also like "ia" reordered by median value within each level of factor "mix".
Could someone help me with this??
Thanks in advance!

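In older ggplot2 versions (before 3.3.0), geom_boxplot assumes the categorical variable is on the x-axis. coord_flip doesn't work in combination with facet_grid + geom_boxplot. One workaround is to keep the boxplots vertical and rotate all the text by 90 degrees, then export the image and rotate it in another program (or figure out how to pull out the grid object and rotate it).
library(plyr) # provides ddply
library(ggplot2)
# median Rs for each ia/mix combination, used for the text labels
a = ddply(daf, .(ia,mix), function(x) c(Rs=median(x$Rs, na.rm=TRUE)))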
ggplot( data=daf, aes(x=ia, y=Rs) ) +
geom_boxplot() +
facet_wrap(~mix, scales="free_x") +
stat_summary(fun.y = mean, geom="point", shape = 4, size = 3, colour = "blue") +
theme(axis.text.x=element_text(angle = 90, hjust = 1, vjust=0.5)) +
theme(axis.title.x=element_text(angle = 90, vjust=0.5)) +
theme(axis.text.y=element_text(angle = 90, hjust=0.5)) +
theme(strip.text=element_text(angle = 90, hjust=0.5)) +
geom_text(data = a, mapping = aes(x = ia, y= 0.02, label=round(Rs,2)),
color="NavyBlue", size=3.5, angle=90, hjust=1) +
ylim(-0.03,1)

I found https://github.com/lionel-/ggstance and thought I'd make an alternative answer.
library(devtools)
devtools::install_github("lionel-/ggstance")
library(ggplot2)
library(ggstance)
daf <- read.table("http://pastebin.com/raw.php?i=xxYjmdgD", header=T, sep="\t")
library(dplyr)
a = daf %>%
group_by(ia, mix) %>%
summarize(Rs=mean(Rs))
ggplot(daf, aes(x=Rs, y=ia)) +
geom_boxploth() +
geom_point(data=a, shape = 4, size = 3, colour = "blue") +
geom_text(data = a, mapping = aes(y = ia, x=0, label=round(Rs,2)),
color="NavyBlue", size=3.5, hjust=0) +
facet_grid(mix~., scales="free_y")
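For what it's worth, ggplot2 3.3.0 and later let geom_boxplot() infer a horizontal orientation when the discrete variable is mapped to y, so the extra package may not be needed. Below is a minimal sketch under that assumption. The toy data frame is made up and only stands in for daf (it assumes each ia belongs to exactly one mix level), and fun = mean is the newer spelling of fun.y = mean.

```r
library(ggplot2)

# Made-up stand-in for daf: each ia belongs to exactly one mix level (assumption)
set.seed(1)
toy <- data.frame(
  ia  = rep(c("A", "B", "C+D", "E+F"), each = 20),
  mix = rep(c(1, 1, 2, 2), each = 20),
  Rs  = runif(80)
)

# Order the ia levels by median Rs; with free y scales each facet
# only shows the levels that occur in it, in this order
toy$ia <- reorder(toy$ia, toy$Rs, FUN = median)

# One median per ia/mix combination, used for the text labels
med <- aggregate(Rs ~ ia + mix, data = toy, FUN = median)

p <- ggplot(toy, aes(x = Rs, y = ia)) +
  geom_boxplot() +  # horizontal because ia is on y (needs ggplot2 >= 3.3.0)
  stat_summary(fun = mean, geom = "point", shape = 4, size = 3, colour = "blue") +
  geom_text(data = med, aes(x = 0, label = round(Rs, 2)),
            colour = "NavyBlue", size = 3.5, hjust = 0) +
  facet_grid(mix ~ ., scales = "free_y", space = "free_y", labeller = label_both) +
  theme_bw()
print(p)
```

Because reorder() sorts the levels globally and each facet drops the levels it does not contain, the boxes end up ordered by median within each mix panel.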

Related

How to color specific values in scatter plot in R

A sample of my data is:
df<-read.table (text=" No value
1 -1.25
2 -0.9
3 0.91
4 2.39
5 1.54
6 1.87
7 -2.5
8 -1.73
9 1.26
10 -2.1
", header=TRUE)
The numbers outside the range -2 to +2 should be coloured, let's say, red. In this example, those are numbers 4, 7 and 10. Here is my effort:
ggplot(df, aes(x=No, y=value)) +
theme_bw()+geom_text(aes(label=No))+
geom_hline(yintercept=2, linetype="dashed", color = "red")+
geom_hline(yintercept=-2, linetype="dashed", color = "red")
Use ggplot2's aesthetics for color= (and a manual color scale).
ggplot(df, aes(x=No, y=value)) +
theme_bw() + geom_text(aes(label=No, color=abs(value)>2))+
geom_hline(yintercept=2, linetype="dashed", color = "red")+
geom_hline(yintercept=-2, linetype="dashed", color = "red")+
scale_color_manual(values = c("FALSE" = "black", "TRUE" = "red"))
Reduction: you can combine your geom_hline calls if you'd like:
ggplot(df, aes(x=No, y=value)) +
theme_bw() + geom_text(aes(label=No, color=abs(value)>2))+
geom_hline(yintercept=c(-2,2), linetype="dashed", color = "red")+
scale_color_manual(values = c("FALSE" = "black", "TRUE" = "red"))
In general, I prefer to use as few geom_*s as strictly required, relying more on ggplot2's internal grouping and aesthetic handling: it is robust, elegant, and at times more flexible when the data changes. There are certainly times when I use multiple geom_* calls and bespoke subsets of the data for each, so it's not a broken paradigm.
The default legend title (abs(value) > 2) is unlikely to be satisfactory in the long term. You can remove the legend entirely with ... + guides(color="none"), or you can pre-process the variable, as Tom's answer demonstrates, which gives you control over the name of the group and its levels.
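As a quick check of the approach above, the sketch below rebuilds the question's sample data, lists which rows the logical condition flags, and drops the legend with guides(color = "none"). Nothing here goes beyond the calls already used in this answer.

```r
library(ggplot2)

# Same sample data as in the question
df <- data.frame(
  No    = 1:10,
  value = c(-1.25, -0.9, 0.91, 2.39, 1.54, 1.87, -2.5, -1.73, 1.26, -2.1)
)

# Rows that the logical colour aesthetic will turn red
flagged <- df$No[abs(df$value) > 2]
print(flagged)  # 4 7 10

p <- ggplot(df, aes(x = No, y = value)) +
  theme_bw() +
  geom_text(aes(label = No, color = abs(value) > 2)) +
  geom_hline(yintercept = c(-2, 2), linetype = "dashed", color = "red") +
  scale_color_manual(values = c("FALSE" = "black", "TRUE" = "red")) +
  guides(color = "none")  # hide the "abs(value) > 2" legend
print(p)
```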
You could create two geom_text layers by subsetting your data twice based on your conditions, like this:
library(ggplot2)
ggplot() +
geom_text(data = subset(df, value >=2 | value <= -2),
aes(x=No, y=value, label = No), color = "red") +
geom_text(data = subset(df, value < 2 & value > -2),
aes(x=No, y=value, label = No)) +
geom_hline(yintercept=2, linetype="dashed", color = "red")+
geom_hline(yintercept=-2, linetype="dashed", color = "red")+
theme_bw()
Created on 2023-01-19 with reprex v2.0.2
Another option is to mutate a new column holding the group:
library(dplyr)
library(ggplot2)
df %>%
mutate(group = if_else(between(value, -2, 2), "Inside", "Outside")) %>%
ggplot() +
aes(No, value) +
geom_text(aes(label = No, col = group)) +
theme_bw() +
geom_hline(yintercept=2, linetype="dashed", color = "red")+
geom_hline(yintercept=-2, linetype="dashed", color = "red") +
scale_color_manual(values = c("Inside" = "black", "Outside" = "red"))

How to put a black border around certain dots on a ggplot geom_point plot

I am trying to make a dot plot with ggplot's geom_point for some pathways of interest. I would like to put a black border around the dots that are significant (-log10(q value) > 1.2) so they are easier to identify. Is there any way to do this in the package, so I do not have to do it in an illustration program after I have produced the image? Thank you kindly for any advice.
Image of current dot image:
Image of raw data:
cols <- c("blue",
"white",
"red")
li <- c(-2, 2)
D1 <- ggplot(Practice, aes(Practice$case, Practice$pathway,
colour = Enrichment_score, size = Practice$ln)) +
geom_point(alpha = 0.8) +
scale_colour_gradientn(colours = cols) +
theme(legend.position="bottom") +
scale_size(breaks = c(0, 1.2, 1.4), range = c(0.06,12)) +
guides(size=guide_legend(title = "-log10(q value)"),
scale_colour_gradient()) +
labs(colour = "Enrichment Score") +
theme_bw()
D1 + ggtitle("") +
xlab("") + ylab("") +
scale_x_discrete(limits=c("Responder vs Non-responder",
"Non-responder vs Control",
"Responder vs Control",
"Case vs Control"))
Since I do not have your original data, and you don't have an example graph, I'll use diamonds to see if this is what you want.
To "circle" the data point that you want to highlight, we can use an extra geom_point, and use some subset of data in it.
In your case, the subset can be like geom_point(data = subset(Practice, -log10(Enrichment_score) > 1.2), col = "black", stroke = 3, shape = 21).
library(tidyverse)
cols <- c("blue", "white", "red")
ggplot(diamonds, aes(cut, clarity,
colour = price, size = depth)) +
geom_point(alpha = 0.8) +
scale_colour_gradientn(colours = cols) +
theme(legend.position="bottom") +
scale_size(breaks = c(0, 1.2, 1.4), range = c(0.06,12)) +
guides(size=guide_legend(title = "-log10(q value)")) +
labs(colour = "Enrichment Score") +
theme_bw() +
geom_point(data = subset(diamonds, depth > 70), col = "black", stroke = 3, shape = 21)
Also, you don't need to use the dollar sign $ to specify column names in ggplot.
Another way, which may be simpler, is to use shape 21 with geom_point:
library(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_point(shape = 21, stroke = 1, aes(colour = disp >= 250, fill = hp)) +
scale_colour_manual(values = c(`TRUE` = "black", `FALSE` = rgb(0,0,0,0)))
The manual colour scale makes the edge of shape 21 either black or transparent. Note the backticks for TRUE or FALSE.
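Applied to data shaped like the question's, the shape 21 idea might look like the sketch below. The toy_paths data frame and its values are made up. The column names Enrichment_score and ln follow the question's code, and the significant column is a hypothetical helper. It assumes ln holds -log10(q value), as the question's size legend title suggests.

```r
library(ggplot2)

# Made-up stand-in for the Practice data frame
toy_paths <- data.frame(
  case             = rep(c("Case vs Control", "Responder vs Control"), each = 3),
  pathway          = rep(c("P1", "P2", "P3"), times = 2),
  Enrichment_score = c(-1.5, 0.2, 1.8, 0.9, -0.4, 1.1),
  ln               = c(0.3, 1.3, 1.5, 0.8, 1.25, 0.1)  # assumed to be -log10(q value)
)

# Hypothetical helper column: TRUE for the dots that should get a border
toy_paths$significant <- toy_paths$ln > 1.2

p <- ggplot(toy_paths, aes(case, pathway)) +
  # shape 21 has a separate fill (interior) and colour (border)
  geom_point(aes(fill = Enrichment_score, size = ln, colour = significant),
             shape = 21, stroke = 1.5) +
  scale_fill_gradientn(colours = c("blue", "white", "red")) +
  scale_colour_manual(values = c(`TRUE` = "black", `FALSE` = "transparent"),
                      guide = "none") +
  labs(fill = "Enrichment Score", size = "-log10(q value)") +
  theme_bw()
print(p)
```

Note that the gradient moves from the colour scale to the fill scale here, because with shape 21 the colour aesthetic controls only the border.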

R graph: label by group

The data I am working on is clustered, with multiple observations within each group. I generated a caterpillar plot and want one label for each group (zipid), not for every line. My current graph and code look like this:
text = hosp_new[,c("zipid")]
ggplot(hosp_new, aes(x = id, y = oe, colour = zipid, shape = group)) +
# theme(panel.grid.major = element_blank()) +
geom_point(size=1) +
scale_shape_manual(values = c(1, 2, 4)) +
geom_errorbar(aes(ymin = low_ci, ymax = high_ci)) +
geom_smooth(method = lm, se = FALSE) +
scale_linetype_manual(values = linetype) +
geom_segment(aes(x = start_id, xend = end_id, y = region_oe, yend = region_oe, linetype = "4", size = 1.2)) +
geom_ribbon(aes(ymin = region_low_ci, ymax = region_high_ci), alpha=0.2, linetype = "blank") +
geom_hline(aes(yintercept = 1, alpha = 0.2, colour = "red", size = 1), show.legend = "FALSE") +
scale_size_identity() +
scale_x_continuous(name = "hospital id", breaks = seq(0,210, by = 10)) +
scale_y_continuous(name = "O:E ratio", breaks = seq(0,7, by = 1)) +
geom_text(aes(label = text), position = position_stack(vjust = 10.0), size = 2)
Caterpillar plot:
Each color represents a region. I just want one label per region, but I don't know how to remove the duplicated labels in this graph.
Any idea?
The key is to have geom_text return only one value for each zipid, rather than multiple values. If we want each zipid label located in the middle of its group, then we can use the average value of id as the x-coordinate for each label. In the code below, we use stat_summaryh (from the ggstance package) to calculate that average id value for the x-coordinate of the label and return a single label for each zipid.
library(ggplot2)
theme_set(theme_bw())
library(ggstance)
# Fake data
set.seed(300)
dat = data.frame(id=1:100, y=cumsum(rnorm(100)),
zipid=rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13,8)))
ggplot(dat, aes(id, y, colour=zipid)) +
geom_segment(aes(xend=id, yend=0)) +
stat_summaryh(fun.x=mean, aes(label=zipid, y=1.02*max(y)), geom="text") +
guides(colour="none")
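If you would rather not depend on an extra package, the same "one label per zipid" idea can be sketched by pre-computing a small label data frame and passing it to geom_text. The labs_df name is hypothetical, and the fake data is the same as above.

```r
library(ggplot2)

# Same fake data as above
set.seed(300)
dat <- data.frame(id = 1:100, y = cumsum(rnorm(100)),
                  zipid = rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13, 8)))

# One row per zipid: mean id as the x position, a fixed height as the y position
labs_df <- aggregate(id ~ zipid, data = dat, FUN = mean)
labs_df$y <- 1.02 * max(dat$y)

p <- ggplot(dat, aes(id, y, colour = zipid)) +
  geom_segment(aes(xend = id, yend = 0)) +
  # geom_text gets its own data, so only ten labels are drawn
  geom_text(data = labs_df, aes(label = zipid)) +
  guides(colour = "none")
print(p)
```

The label layer inherits x = id, y = y and colour = zipid from the main call, so labs_df only needs columns with those names.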
You could also use faceting, as mentioned by @user20650. In the code below, panel.spacing.x=unit(0,'pt') removes the space between facet panels, while expand=c(0,0.5) adds 0.5 units of padding on the sides of each panel. Together, these ensure constant spacing between tick marks, even across facets.
ggplot(dat, aes(id, y, colour=zipid)) +
geom_segment(aes(xend=id, yend=0)) +
facet_grid(. ~ zipid, scales="free_x", space="free_x") +
guides(colour="none") +
theme_classic() +
scale_x_continuous(breaks=0:nrow(dat),
labels=c(rbind(seq(0,100,5),'','','',''))[1:(nrow(dat)+1)],
expand=c(0,0.5)) +
theme(panel.spacing.x = unit(0,"pt"))

Adjusting distance between groups of bars in ggplot2

This is my data:
> sum.ex
Timepoint mean n sd Time Group
A1 A1-All 1.985249 26 1.000180 A1 All
A1-pT2D A1-pT2D 1.913109 13 1.012633 A1 pT2D
A1-Control A1-Control 2.934105 13 2.472951 A1 Control
B1 B1-All 2.555601 25 1.939970 B1 All
B1-pT2D B1-pT2D 2.057389 13 1.023416 B1 pT2D
B1-Control B1-Control 2.145555 12 1.089522 B1 Control
This is my code:
png('ex')
ggplot(sum.ex, aes(x = Timepoint, y = mean)) +
  geom_bar(width = 0.5, position = position_dodge(width = 200), stat="identity", aes(fill = Group)) +
geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), size = 1, shape = 1, width = 0.2) +
scale_fill_manual(values = c("#333333", "#FF0000", "#0000FF")) +
xlab(NULL) +
ggtitle("PLIN1") + theme_bw() + theme(panel.grid.major = element_blank())
dev.off()
This is the output:
However, I want to have Black+Red+Blue really close, then a space and then Black+Red+Blue really close again.
Thank you!
I think this is easiest to achieve if you use x = Time and fill = Group. Something like:
dodge <- position_dodge(width = 0.5)
ggplot(sum.ex, aes(x = Time, y = mean, fill = Group)) +
geom_bar(width = 0.5, stat="identity", position = dodge) +
geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd),
position = dodge, size = 1, shape = 1, width = 0.2) +
scale_fill_manual(values = c("#333333", "#FF0000", "#0000FF")) +
theme_bw() +
theme(panel.grid.major = element_blank())
Plot against Time only; position_dodge then has meaning for the bars, because each Time value has three observations (one per Group). Use position_dodge with a width close to the bar width. Add group=Group to make the error bars dodge like the bars (you need it since they have no fill or colour aesthetic to distinguish them), and use the same position_dodge width as for the bars so they align properly.
ggplot(sum.ex, aes(x = Time, y = mean)) +
geom_bar(width = 0.5, position = position_dodge(width = 0.5), stat = "identity", aes(fill = Group)) +
geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd, group=Group), position=position_dodge(width = 0.5), size = 1, shape = 1, width = 0.2) +
scale_fill_manual(values = c("#333333", "#FF0000", "#0000FF")) +
xlab(NULL) +
ggtitle("PLIN1") + theme_bw() + theme(panel.grid.major = element_blank())
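To try this out without the original object, the sketch below rebuilds sum.ex from the table printed in the question and shares one position_dodge object between the bars and the error bars. The factor level order All, pT2D, Control is an assumption, chosen so the black, red, blue colours follow the order of the table.

```r
library(ggplot2)

# Rebuilt from the table in the question
sum.ex <- data.frame(
  Time  = rep(c("A1", "B1"), each = 3),
  Group = rep(c("All", "pT2D", "Control"), times = 2),
  mean  = c(1.985249, 1.913109, 2.934105, 2.555601, 2.057389, 2.145555),
  sd    = c(1.000180, 1.012633, 2.472951, 1.939970, 1.023416, 1.089522)
)

# Assumed level order, so that All = black, pT2D = red, Control = blue
sum.ex$Group <- factor(sum.ex$Group, levels = c("All", "pT2D", "Control"))

# A single dodge object keeps bars and error bars aligned
dodge <- position_dodge(width = 0.5)

p <- ggplot(sum.ex, aes(x = Time, y = mean, fill = Group)) +
  geom_bar(width = 0.5, stat = "identity", position = dodge) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = dodge, width = 0.2) +
  scale_fill_manual(values = c("#333333", "#FF0000", "#0000FF")) +
  xlab(NULL) + ggtitle("PLIN1") +
  theme_bw() + theme(panel.grid.major = element_blank())
print(p)
```

With fill mapped in the top-level aes(), the error bars inherit the grouping and no explicit group=Group is needed.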

How to overlay two geom_bar?

I'm trying to overlay the bars from two geom_bar layers derived from 2 separate data.frames.
dEQ
lab perc
1 lmP 55.9
2 lmN 21.8
3 Nt 0.6
4 expG 5.6
5 expD 0.0
6 prbN 11.2
7 prbP 5.0
and
LMD
lab perc
1 lmP 16.8
2 lmN 8.9
3 Nt 0.0
4 expG 0.0
5 expD 0.0
6 prbN 0.0
7 prbP 0.0
The first plot is:
p <- ggplot(dEQ, aes(lab, perc)) +
xlab(xlabel) + ylab(ylabel) +
geom_bar(stat="identity", colour="blue", fill="darkblue") +
geom_text(aes(vecX, vecYEQ+1.5, label=vecYlbEQ), data=dEQ, size=8.5) +
theme_bw() +
theme(axis.text.x = element_text(size = 20, face = "bold", colour = "black")) +
theme(axis.text.y = element_text(size = 20, face = "bold", colour = "black")) +
coord_flip() +
scale_y_continuous(breaks=c(0,10,20,30,40,50,60),
labels=c("0","","20","","40","","60"),
limits = c(0, 64), expand = c(0,0))
print(p)
but I want to overplot with another geom_bar from data.frame LMD
ggplot(LMD, aes(lab, perc)) +
geom_bar(stat="identity", colour="blue", fill="red", add=T)
and I want to have a legend.
Here is an example:
p <- ggplot(NULL, aes(lab, perc)) +
geom_bar(aes(fill = "dEQ"), data = dEQ, stat = "identity", alpha = 0.5) +
geom_bar(aes(fill = "LMD"), data = LMD, stat = "identity", alpha = 0.5)
p
However, I recommend combining them with rbind and plotting them with dodged bars:
dEQ$name <- "dEQ"
LMD$name <- "LMD"
d <- rbind(dEQ, LMD)
p <- ggplot(d, aes(lab, perc, fill = name)) + geom_bar(stat = "identity", position = "dodge")
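For a self-contained version, the sketch below rebuilds both data frames from the tables in the question and draws the overlaid and the dodged variant. It uses geom_col(), which is shorthand for geom_bar(stat = "identity") in ggplot2 2.2.0 and later. The object names p_overlay and p_dodge are just illustrative.

```r
library(ggplot2)

# Rebuilt from the tables in the question
labs_vec <- c("lmP", "lmN", "Nt", "expG", "expD", "prbN", "prbP")
dEQ <- data.frame(lab = labs_vec, perc = c(55.9, 21.8, 0.6, 5.6, 0.0, 11.2, 5.0))
LMD <- data.frame(lab = labs_vec, perc = c(16.8, 8.9, 0.0, 0.0, 0.0, 0.0, 0.0))

# Tag each data frame, then stack them so fill can be mapped to the source
dEQ$name <- "dEQ"
LMD$name <- "LMD"
d <- rbind(dEQ, LMD)

# Overlaid bars: both series start at zero and share the same x position
p_overlay <- ggplot(d, aes(lab, perc, fill = name)) +
  geom_col(position = "identity", alpha = 0.5)

# Dodged bars: the two series sit side by side
p_dodge <- ggplot(d, aes(lab, perc, fill = name)) +
  geom_col(position = "dodge")

print(p_overlay)
print(p_dodge)
```

Mapping fill to the name column is what produces the legend the question asks for.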
This answer does not directly address the OP's requirement, but since many later questions on SO have been closed as duplicates of this one, I am proposing a method for constructing bars within a bar in ggplot2.
Example for two bars (group-wise division) within one bigger bar plot.
library(tidyverse)
set.seed(40)
df <- tibble(name = LETTERS[1:10], provision = rnorm(mean = 100, sd = 20, n = 10),
expenditure = provision - rnorm(mean = 25, sd = 10, n = 10))
df %>% mutate(savings = provision - expenditure) %>%
pivot_longer(cols = c("expenditure", "savings"), names_to = "Exp", values_to = "val") %>%
ggplot() + geom_bar(aes(x= name, y = provision/2), stat = "identity", fill = "blue", width = 0.9, alpha = 0.3) +
geom_col(aes(x=name,y=val, fill = Exp), position ="dodge", width = 0.7) +
scale_y_continuous(name = "Amount in \u20b9")
Another option is to overlay the bars without lowering their opacity via alpha: group_by the data on your fill variable, arrange(desc()) your y variable, and use position = position_identity(). The highest-value bars are then drawn behind and the lower values in front, so you don't need to change the transparency. Here is a reproducible example:
# Add name for fill aesthetic
dEQ$name <- "dEQ"
LMD$name <- "LMD"
library(dplyr)
library(ggplot2)
dEQ %>%
rbind(LMD) %>%
group_by(name) %>%
arrange(desc(perc)) %>%
ggplot(aes(x = lab, y = perc, fill = name)) +
geom_bar(stat="identity", position = position_identity())
Created on 2022-11-02 with reprex v2.0.2
As you can see, the bars overlay while keeping their original opacity.
