ggplot: several histograms as one - r

I want to plot the results of a benchmark of several bioinformatics tools using ggplot. I would like to have all the bars on the same graph instead of one graph per tool. I already have an output from LibreOffice (see image below), but I want to redo it with ggplot.
For now I have this kind of code for each tool (example with the first one):
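library(ggplot2)
data_reduced <- read.table("benchmark_groups_4sps", sep="\t", header=TRUE)
# x must match the column name in the input file (Nb_species in the data shown below)
p <- ggplot(data=data_reduced, aes(x=Nb_species, y=OrthoFinder)) +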
geom_bar(stat="identity", color="black", fill="red") +
xlab("Number of species per group") + ylab("Number of groups") +
geom_text(aes(label=OrthoFinder), vjust=1.6, color="black", size=3.5)
I have found out how to paste all the graphs together, but not how to merge them into a single one.
My input data:
Nb_species  OrthoFinder  FastOrtho  POGS (no_para)  POGS (soft_para)  proteinOrtho
4           125          142        152             202               114
5           61           65         42              79                44
6           37           29         15              21                8
7           19           17         4               7                 5
8           15           10         1               0                 0
9           10           2          0               0                 0
Thanks !

Maybe this can help you in the right direction:
# sample data
df = data.frame(Orthofinder=c(1,2,3), FastOrtho=c(2,3,4), POGs_no_para=c(1,2,2))
library(reshape2)
library(dplyr)
library(ggplot2)
# first let's convert the dataset: Convert to long format and aggregate.
df = melt(df, id.vars=NULL)
df = df %>% group_by(variable,value) %>% count()
# Then, we create a plot.
ggplot(df, aes(factor(value), n, fill = variable)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
There is enough documentation around on formatting a plot, so I'll leave that to you ;) Hope this helps!
EDIT: Since the question was changed to work with a different dataset as origin while I was typing my answer, here is the modified code to work with that:
df = data.frame(Nb_species = c(4,5,6,7), OrthoFinder=c(125,142,100,110), FastOrtho=c(100,120,130,140))
library(reshape2)
library(ggplot2)
df = melt(df, id.vars="Nb_species")
ggplot(df, aes(factor(Nb_species), value, fill = variable)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
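As a rough sketch, the same idea can be applied to the full table from the question. Two things here are my own assumptions and not from the original post: the column names POGS_no_para and POGS_soft_para (renamed from the headers containing parentheses), and the use of tidyr::pivot_longer() in place of reshape2::melt(). geom_col() is shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)
library(tidyr)

# Data copied from the question; the two POGS column names are renamed (assumption)
bench <- data.frame(
  Nb_species     = 4:9,
  OrthoFinder    = c(125, 61, 37, 19, 15, 10),
  FastOrtho      = c(142, 65, 29, 17, 10, 2),
  POGS_no_para   = c(152, 42, 15, 4, 1, 0),
  POGS_soft_para = c(202, 79, 21, 7, 0, 0),
  proteinOrtho   = c(114, 44, 8, 5, 0, 0)
)

# Long format: one row per (Nb_species, tool) pair
bench_long <- pivot_longer(bench, cols = -Nb_species,
                           names_to = "tool", values_to = "n_groups")

# One dodged bar per tool within each species count
p <- ggplot(bench_long, aes(factor(Nb_species), n_groups, fill = tool)) +
  geom_col(position = "dodge") +
  xlab("Number of species per group") + ylab("Number of groups") +
  scale_fill_brewer(palette = "Set1")
```

Printing p (or calling print(p) in a script) draws the plot.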

Related

How to combine vlines from one dataframe with series from another dataframe using GGPLOT2 in R

I am trying to make a graph that will plot the cumulative sum value of different customers which will reset whenever a new order is placed. When a new order is placed, it will be indicated with a DateTick = 1 and I've tried to add this to my plots with vlines. Unfortunately, the plot will only show me either the correct Vlines or the correct series lines.
The data I'm using looks something like this
> head(CUSTWP)
# A tibble: 6 x 6
# Groups: Customer [1]
Customer YearWeek `Corrected Delta` `Ordered Quantity TU` DateTick ROP
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 CustLoc1 2020-01 46 NA 0 46
2 CustLoc1 2020-02 148 NA 0 194
3 CustLoc1 2020-03 150 NA 0 344
4 CustLoc1 2020-04 186 NA 0 530
5 CustLoc1 2020-05 205 NA 0 735
6 CustLoc1 2020-06 246 NA 0 981
I used the code below to create the graphs.
p <- CUSTWP[CUSTWP$DateTick==1,]
p <- p[,1:2]
vline.dat <- data.frame(z=p$Customer, vl=p$YearWeek)
ggplot(CUSTWP, aes(YearWeek,`ROP`, group=1)) + geom_line(color= 'red', size = 0.8) + geom_vline(aes(xintercept=vl), data=vline.dat, linetype=4) +
facet_grid(Customer ~ ., scales = "free_y") + theme_light() + ggtitle('Reordering Points') +
theme(axis.text.x = element_text(angle = 20, vjust = 1, hjust=0.9), text = element_text(size = 14)) +
scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
When I execute the code, I get the result shown in the link.
The issue with this graph is that the vlines show the order DateTicks of all customers in every panel rather than only the DateTicks of that panel's customer. I've tried the variant below, which produces the correct graphs but also a number of incorrect ones.
p <- CUSTWP[CUSTWP$DateTick==1,]
p <- p[,1:2]
vline.dat <- data.frame(z=p$Customer, vl=p$YearWeek)
ggplot(CUSTWP, aes(YearWeek,`ROP`, group=1)) + geom_line(color= 'red', size = 0.8) + geom_vline(aes(xintercept=vl), data=vline.dat, linetype=4) +
facet_grid(Customer ~ z, scales = "free_y") + theme_light() + ggtitle('Reordering Points') +
theme(axis.text.x = element_text(angle = 20, vjust = 1, hjust=0.9), text = element_text(size = 14)) +
scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
The above code creates a matrix of plots but the only correct ones are the plots on the diagonal line running from top left to bottom right.
I would really appreciate your input on this as I've been stuck on this for quite some time. Thank you in advance and apologies for the incorrect posting standards, this is my first post.
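One possible approach, offered as a sketch and not as a confirmed fix: ggplot2 splits a layer's data across facets only if that data contains the faceting variable under the same name. In the code above the customer column is renamed to z, so facet_grid(Customer ~ .) cannot match the vlines to a panel and repeats them in all panels. Keeping the column name Customer in the vline data should restrict each line to its own customer. The toy data below is invented for illustration and only mimics the structure of CUSTWP.

```r
library(ggplot2)

# Hypothetical stand-in for CUSTWP (values invented for illustration)
CUSTWP <- data.frame(
  Customer = rep(c("CustLoc1", "CustLoc2"), each = 4),
  YearWeek = rep(c("2020-01", "2020-02", "2020-03", "2020-04"), times = 2),
  ROP      = c(46, 194, 50, 200, 30, 90, 150, 40),
  DateTick = c(0, 0, 1, 0, 0, 0, 0, 1)
)

# Keep the faceting column under its original name, Customer
vline.dat <- CUSTWP[CUSTWP$DateTick == 1, c("Customer", "YearWeek")]

p <- ggplot(CUSTWP, aes(YearWeek, ROP, group = 1)) +
  geom_line(color = "red") +
  geom_vline(aes(xintercept = YearWeek), data = vline.dat, linetype = 4) +
  facet_grid(Customer ~ ., scales = "free_y") +
  theme_light() + ggtitle("Reordering Points")
```

With this layout, facet_grid(Customer ~ .) is enough and the Customer ~ z matrix is not needed.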

coloring legend in bar chart in R

I'm fairly new to R and need some help manipulating my graph. I'm trying to compare actual and forecast figures, but cannot get the coloring of the legend right. The data looks like this:
hierarchy Actual Forecast
<fctr> <dbl> <dbl>
1 E 9313 5455
2 K 6257 3632
3 O 7183 8684
4 A 1579 6418
5 S 8755 0149
6 D 5897 7812
7 F 1400 8810
8 G 4960 5710
9 R 3032 0412
And the code looks like this:
ggplot(sam4, aes(hierarchy))+ theme_bw() +
geom_bar(aes(y = Actual, colour="Actual"),fill="#66FF33", stat="identity",position="dodge", width=0.40) +
geom_bar(aes(y = Forecast, colour="Forecast"), fill="#FF3300", stat="identity",position="dodge", width=0.2)
The graph ends up looking like this:
I believe your problem is that your data is not formatted well to use ggplot. You want to tidy up your dataframe first. Check out http://tidyr.tidyverse.org/ to get familiar with the concept of tidy data.
Using the tidyverse (ggplot is part of it), I tidied up your data and I believe got the plot you want.
library(tidyverse) #includes ggplot
newdata <- gather(sam4, actualorforecast, value, -hierarchy)
ggplot(newdata, aes(x = hierarchy)) +
theme_bw() +
geom_bar(aes(y = value, fill = actualorforecast),
stat = "identity",
width = ifelse(newdata$actualorforecast == "Actual", .4, .2),
position = "dodge") +
scale_fill_manual(values= c(Actual ="#66FF33", Forecast="#FF3300"))
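For reference, here is a minimal self-contained variant of the same idea. It uses tidyr::pivot_longer(), the current replacement for gather(), and a single bar width, both of which are my substitutions. Only the first three rows of the question's data are reproduced.

```r
library(ggplot2)
library(tidyr)

# First three rows of the data in the question
sam4 <- data.frame(
  hierarchy = c("E", "K", "O"),
  Actual    = c(9313, 6257, 7183),
  Forecast  = c(5455, 3632, 8684)
)

# Long format: one row per (hierarchy, Actual/Forecast) pair
newdata <- pivot_longer(sam4, cols = c(Actual, Forecast),
                        names_to = "actualorforecast", values_to = "value")

# Mapping fill inside aes() is what makes the legend colours match the bars
p <- ggplot(newdata, aes(x = hierarchy, y = value, fill = actualorforecast)) +
  theme_bw() +
  geom_col(position = "dodge", width = 0.4) +
  scale_fill_manual(values = c(Actual = "#66FF33", Forecast = "#FF3300"))
```

The original attempt set fill to a constant outside aes(), so the legend built from colour had nothing to show except outlines.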

R: Arranging axis using ggplot2

I am trying to present my data using ggplot2. My dataframe is built up like this:
type count
1 exon 4
2 intron 3
3 intron 1
4 exon 10
.. ... ..
I am trying to present the data by plotting as histograms and boxplots, but I encounter some problems.
For the histograms I used the following code:
ggplot(hisdat, aes(x=count, fill=type)) +
geom_histogram(binwidth=.5, position="dodge")
and that gives me this plot:
As you can see, the counts at the bottom of the plot are arranged such that 10 follows 1 and 100 follows 10, i.e. they are ordered by their first digit instead of by value. How do I get the axis to go from 1 to 148?
For the boxplot I have the same trouble and on top of that my plot is not looking like a boxplot at all. Is my code wrong?
ggplot(hisdat, aes(x=type, y=count, fill=type)) + geom_boxplot()
It gives me this result:
Since the other part of your question has already been answered in the comments, here is the answer to this part:
How do I get it to go from 1-148?
df <- read.table(header = TRUE, text=
" type count
1 exon 4
2 intron 3
3 intron 1
4 exon 10")
library(ggplot2)
ggplot(df, aes(x = reorder(type, count), y = count, fill = type)) + geom_bar(stat = "identity", position = "dodge")
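An axis ordered 1, 10, 100, ... usually means the column is stored as text or as a factor. Assuming that is the case here (the question does not show str(hisdat), so this is a guess), a sketch of converting count to numeric before plotting looks like this. The data values are hypothetical.

```r
library(ggplot2)

# Hypothetical data: counts stored as a factor, as the lexical axis order suggests
hisdat <- data.frame(
  type  = c("exon", "intron", "intron", "exon", "exon"),
  count = c("4", "3", "1", "10", "100"),
  stringsAsFactors = TRUE
)

# Go through as.character() first: as.numeric() on a factor returns level codes
hisdat$count <- as.numeric(as.character(hisdat$count))

# With a numeric x, the histogram axis is continuous and ordered by value
p <- ggplot(hisdat, aes(x = count, fill = type)) +
  geom_histogram(binwidth = 0.5, position = "dodge")
```

The same conversion also gives geom_boxplot() a continuous y to summarise.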

Boxplot R select data with non unique values only

I have a data frame like this
head(data)
n OESST wsB
4 0.52924690 4
8 0.04488144 6
6 0.29909668 6
0 1.42228888 6
2 1.92228888 4
4 1.85659560 6
and I am doing a box plot of OESST as a function of wsB for the different n values
ggplot(na.omit(data), aes(x=factor(wsB), y=OESST, colour = factor(n))) + geom_boxplot(outlier.size=0,fill = "white",position="dodge",size=0.3,alpha=0.3) + stat_summary(fun.y=median, geom="line", aes(group=factor(n), colour = factor(n)),size=1)
What I would like to do is to remove from the plot the unique n-wsB combinations (which are visualized only as a line but don't have actually a box).
Any help?
Thanks
I think the best approach is to filter your data first, using dplyr:
library(dplyr)
data %>%
group_by(n, wsB) %>%
mutate(n.wsB.count = n()) %>%
filter(n.wsB.count > 1) %>%
na.omit() %>%
ggplot(aes(x=factor(wsB), y=OESST, colour = factor(n))) +
geom_boxplot(outlier.size=0,fill = "white", position="dodge", size=0.3, alpha=0.3) +
stat_summary(fun=median, geom="line", aes(group=factor(n)), size=1) # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
Not tested, since (as @MrFlick points out) the provided data doesn't reproduce the problem. I also took out the redundant colour aesthetic in the stat_summary.
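To make the filtering step checkable on its own, here is a small sketch on toy data. The values are illustrative only, and add_count() is used here in place of the group_by() + mutate(n()) pair above.

```r
library(dplyr)

# Toy data (illustrative values): two n-wsB combinations occur twice, two occur once
data <- data.frame(
  n     = c(4, 4, 8, 6, 6, 0),
  OESST = c(0.53, 1.86, 0.04, 0.30, 1.42, 1.92),
  wsB   = c(4, 4, 6, 6, 6, 6)
)

kept <- data %>%
  add_count(n, wsB, name = "n.wsB.count") %>%  # size of each n-wsB group
  filter(n.wsB.count > 1)                      # drop combinations seen only once
```

kept can then be passed to the same ggplot() call as above.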

R ggplot barplot; Fill based on two separate variables

A picture says more than a thousand words. As you can see, my fill is based on the variable variable.
Within each bar there are, however, multiple data entries (black borders), since the discrete variable Complexity makes them unique. What I am trying to find is something that makes each section of the bar more distinguishable than it currently is. Preferably it would be something like shading.
Here's an example (not the same dataset, since the original was imported):
dat <- read.table(text = "Complexity Method Sens Spec MMC
1 L Alpha 50 20 10
2 M Alpha 40 30 80
3 H Alpha 10 10 5
4 L Beta 70 50 60
5 M Beta 49 10 80
6 H Beta 90 17 48
7 L Gamma 19 5 93
8 M Gamma 18 39 4
9 H Gamma 10 84 74", sep = "", header=T)
library(ggplot2)
library(reshape)
short.m <- melt(dat)
ggplot(short.m, aes(x=Method, y= value/100 , fill=variable)) +
geom_bar(stat="identity",position="dodge", colour="black") +
coord_flip()
This is far from perfect, but hopefully a step in the right direction, as it's dodged by variable, but still manages to represent Complexity in some way:
ggplot(short.m, aes(x=Method, y=value/100, group=variable, fill=variable, alpha=Complexity)) +
geom_bar(stat="identity",position="dodge", colour="black") +
scale_alpha_manual(values=c(0.1, 0.5, 1)) +
coord_flip()
Adding alpha=Complexity might work:
ggplot(short.m, aes(x=Method, y= value/100 , fill=variable, alpha=Complexity)) +
geom_bar(stat="identity",position="dodge", colour="black") + coord_flip()
You might need to separate your Method and variable factors. Here are two ways to do that:
Use facet_wrap():
ggplot(short.m, aes(x=variable, y=value/100, fill=Complexity)) +
facet_wrap(~ Method) + geom_bar(stat="identity", position="stack", colour="black") +
coord_flip()
Use both on the x-axis:
ggplot(short.m, aes(x=interaction(Method, variable), y=value/100, group=Method, fill=variable, alpha=Complexity)) +
geom_bar(stat="identity", position="stack", colour="black") +
scale_alpha_manual(values=c(0.1, 0.5, 1)) + coord_flip()
