R ggplot barplot; Fill based on two separate variables - r

A picture says more than a thousand words. As you can see, my fill is based on the column named variable.
Within each bar, however, there are multiple data entries (black borders), since the discrete variable Complexity makes them unique. I am looking for something that makes each section of a bar more distinguishable than it currently is, preferably something like shading.
Here's an example (not the same dataset, since the original was imported):
dat <- read.table(text = "Complexity Method Sens Spec MMC
1 L Alpha 50 20 10
2 M Alpha 40 30 80
3 H Alpha 10 10 5
4 L Beta 70 50 60
5 M Beta 49 10 80
6 H Beta 90 17 48
7 L Gamma 19 5 93
8 M Gamma 18 39 4
9 H Gamma 10 84 74", sep = "", header=T)
library(ggplot2)
library(reshape)
short.m <- melt(dat)
ggplot(short.m, aes(x=Method, y= value/100 , fill=variable)) +
geom_bar(stat="identity",position="dodge", colour="black") +
coord_flip()

This is far from perfect, but hopefully a step in the right direction, as it's dodged by variable, but still manages to represent Complexity in some way:
ggplot(short.m, aes(x=Method, y=value/100, group=variable, fill=variable, alpha=Complexity)) +
geom_bar(stat="identity",position="dodge", colour="black") +
scale_alpha_manual(values=c(0.1, 0.5, 1)) +
coord_flip()
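One thing to watch with the alpha approach: scale_alpha_manual() hands out its values in factor-level order, and the default order for Complexity is alphabetical (H, L, M), so 0.1 would land on H. The following is a minimal sketch, assuming you want opacity to increase from L to H. It uses reshape2::melt with explicit id.vars instead of reshape (a substitution on my part), and the example data without the row numbers.

```r
library(ggplot2)
library(reshape2)

# Same example data as in the question, without the row numbers
dat <- read.table(text = "Complexity Method Sens Spec MMC
L Alpha 50 20 10
M Alpha 40 30 80
H Alpha 10 10 5
L Beta 70 50 60
M Beta 49 10 80
H Beta 90 17 48
L Gamma 19 5 93
M Gamma 18 39 4
H Gamma 10 84 74", header = TRUE)

# Long format: one row per Complexity/Method/measure combination
short.m <- melt(dat, id.vars = c("Complexity", "Method"))

# Set the level order explicitly so it is L < M < H rather than alphabetical
short.m$Complexity <- factor(short.m$Complexity, levels = c("L", "M", "H"))

# Named alpha values make the level-to-opacity assignment explicit
p <- ggplot(short.m, aes(x = Method, y = value / 100, group = variable,
                         fill = variable, alpha = Complexity)) +
  geom_bar(stat = "identity", position = "dodge", colour = "black") +
  scale_alpha_manual(values = c(L = 0.1, M = 0.5, H = 1)) +
  coord_flip()
p
```

Naming the values also means the plot does not silently change if the level order changes later.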

Adding alpha=Complexity might work:
ggplot(short.m, aes(x=Method, y= value/100 , fill=variable, alpha=Complexity)) +
geom_bar(stat="identity",position="dodge", colour="black") + coord_flip()

You might need to separate your Method and variable factors. Here are two ways to do that:
Use facet_wrap():
ggplot(short.m, aes(x=variable, y=value/100, fill=Complexity)) +
facet_wrap(~ Method) + geom_bar(stat="identity", position="stack", colour="black") +
scale_alpha_manual(values=c(0.1, 0.5, 1)) + coord_flip()
Use both on the x-axis:
ggplot(short.m, aes(x=interaction(Method, variable), y=value/100, group=Method, fill=variable, alpha=Complexity)) +
geom_bar(stat="identity", position="stack", colour="black") +
scale_alpha_manual(values=c(0.1, 0.5, 1)) + coord_flip()

Related

Adding names to all X axis values using ggplot2 in R

The head of data frame is as follows:
Age number
21 4
22 4
23 5
24 6
25 11
26 10
I am trying to plot the frequency chart using ggplot using the following code
ggplot(data=x2, aes(x=Age, y=number)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=number), vjust=-0.3, size=3.5)+
theme_minimal()+ labs(x = "Age", y = "Number of users")+
ggtitle("Frequency of Age")
and I get the output but not all the values on the X Axis are visible. I am sorry as this might be a very silly question but I am very new to R.
You can use scale_x_continuous to set the axis breaks. With such a large number of axis labels, this probably works better if the orientation is flipped. Even then, it's still quite crowded.
library(tidyverse)
# Fake data
set.seed(2)
x2 = tibble(Age=sample(20:70, 1000, replace=TRUE)) %>%
group_by(Age) %>%
summarise(number=n())
ggplot(data=x2, aes(x=Age, y=number)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=number, y=0.5*number), size=3, colour="white")+
theme_minimal() +
labs(x = "Age", y = "Number of users")+
ggtitle("Frequency of Age") +
coord_flip() +
scale_x_continuous(breaks=min(x2$Age):max(x2$Age), expand=c(0,0.1)) +
scale_y_continuous(expand=c(0,0.2))
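If labelling every single age is still too crowded, a less dense set of breaks may be enough. This is only a sketch under my own assumptions: the 5-year step and the name age_breaks are arbitrary choices, and the fake data is rebuilt with base R so the block runs on its own.

```r
library(ggplot2)

# Fake data, built with base R: one row per age with its count
set.seed(2)
counts <- table(sample(20:70, 1000, replace = TRUE))
x2 <- data.frame(Age = as.integer(names(counts)),
                 number = as.vector(counts))

# One labelled break every 5 years (assumed step; adjust to taste)
age_breaks <- seq(min(x2$Age), max(x2$Age), by = 5)

ggplot(x2, aes(x = Age, y = number)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  scale_x_continuous(breaks = age_breaks) +
  theme_minimal() +
  labs(x = "Age", y = "Number of users") +
  ggtitle("Frequency of Age")
```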

ggplot : several histogram as one

I want to plot the results of a benchmark of several bioinformatics tools, using ggplot. I would like to have all the bars on the same graph instead of having one graph for each tool. I already have an output with LibreOffice (see image below), but I want to re-do it with ggplot.
For now I have this kind of code for each tool (example with the first one) :
data_reduced <- read.table("benchmark_groups_4sps", sep="\t", header=TRUE)
p<-ggplot(data=data_reduced, aes(x=Nb_sps, y=OrthoFinder)) +
geom_bar(stat="identity", color="black", fill="red") +
xlab("Number of species per group") + ylab("Number of groups") +
geom_text(aes(label=OrthoFinder), vjust=1.6, color="black", size=3.5)
I have found out how to paste all the graphs together, but not how to merge them into a single one.
My input data :
Nb_species OrthoFinder FastOrtho POGS (no_para) POGS (soft_para) proteinOrtho
4 125 142 152 202 114
5 61 65 42 79 44
6 37 29 15 21 8
7 19 17 4 7 5
8 15 10 1 0 0
9 10 2 0 0 0
Thanks !
Maybe this can help you in the right direction:
# sample data
df = data.frame(Orthofinder=c(1,2,3), FastOrtho=c(2,3,4), POGs_no_para=c(1,2,2))
library(ggplot2)
library(reshape2)
library(dplyr)
# first let's convert the dataset: Convert to long format and aggregate.
df = melt(df, id.vars=NULL)
df = df %>% group_by(variable,value) %>% count()
# Then, we create a plot.
ggplot(df, aes(factor(value), n, fill = variable)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
There is enough documentation around on formatting a plot, so I'll leave that to you ;) Hope this helps!
EDIT: Since the question was changed to work with a different dataset as origin while I was typing my answer, here is the modified code to work with that:
df = data.frame(Nb_species = c(4,5,6,7), OrthoFinder=c(125,142,100,110), FastOrtho=c(100,120,130,140))
library(reshape2)
library(dplyr)
df = melt(df, id.vars="Nb_species")
ggplot(df, aes(factor(Nb_species), value, fill = variable)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
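For completeness, here is the same idea applied to the full table from the question. Treat it as a sketch: the column names POGS_no_para and POGS_soft_para, and the names tool and n_groups, are my own syntactically valid stand-ins, and the numbers are typed in from the question rather than read from the file.

```r
library(ggplot2)
library(reshape2)

# Benchmark table copied from the question (column names adapted)
bench <- data.frame(
  Nb_species     = 4:9,
  OrthoFinder    = c(125, 61, 37, 19, 15, 10),
  FastOrtho      = c(142, 65, 29, 17, 10, 2),
  POGS_no_para   = c(152, 42, 15, 4, 1, 0),
  POGS_soft_para = c(202, 79, 21, 7, 0, 0),
  proteinOrtho   = c(114, 44, 8, 5, 0, 0)
)

# Wide to long: one row per (number of species, tool) pair
bench_long <- melt(bench, id.vars = "Nb_species",
                   variable.name = "tool", value.name = "n_groups")

# One dodged bar per tool within each number of species
ggplot(bench_long, aes(factor(Nb_species), n_groups, fill = tool)) +
  geom_bar(stat = "identity", position = "dodge", colour = "black") +
  scale_fill_brewer(palette = "Set1") +
  xlab("Number of species per group") + ylab("Number of groups")
```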

Use alpha values provided in data

I would like to use the explicit values for the alpha level.
head(D)
x y group alpha
1 1 18 A 0.40 <~~~~
2 2 18 A 0.44
3 3 18 A 0.47
4 1 18 A 0.51
5 2 21 B 0.55
6 3 21 B 0.58
...
However, ggplot is scaling the alpha levels. I can override this using scale_alpha_continuous(range = range(D$alpha)), but this becomes a nuisance when creating the graph programmatically.
Is there a direct way to tell ggplot NOT to scale alpha? (instead of telling it what range to scale to)
Reproducible Example
library(ggplot2)
library(gridExtra)
(D <- data.frame(x=rep(1:3, 4), y=rep((6:8)*3, each=4), group=rep(c("A","B", "C"), each=4), alpha=round(seq(.4, .8, length.out=12), 2)))
P <- ggplot(data=D, aes(x=x, y=y, alpha=alpha)) + geom_bar(stat="identity", fill="blue") + theme(legend.position="bottom") + facet_grid(group ~. )
### Adding scale_alpha_continuous
P.manually_scaled <- P + scale_alpha_continuous(range=range(D$alpha))
grid.arrange( P + ggtitle("INCORRECT")
, P.manually_scaled + ggtitle("CORRECT")
, ncol=2)
If your data already contains the actual alpha, colour, etc. values, use the scale_*_identity() family of scales (here scale_alpha_identity()). This tells ggplot() to use the values exactly as they appear in your data frame instead of scaling them.
ggplot(data=D, aes(x=x, y=y, alpha=alpha)) +
geom_bar(stat="identity", fill="blue") +
facet_grid(group ~. ) +
scale_alpha_identity()
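If you want to check this programmatically rather than by eye, you can compare the alpha values ggplot actually computed for the layer against the data. A small sketch, assuming layer_data() from ggplot2 is available in your version; the names base_plot, scaled and as_is are mine.

```r
library(ggplot2)

D <- data.frame(x = rep(1:3, 4),
                y = rep((6:8) * 3, each = 4),
                group = rep(c("A", "B", "C"), each = 4),
                alpha = round(seq(.4, .8, length.out = 12), 2))

base_plot <- ggplot(D, aes(x = x, y = y, alpha = alpha)) +
  geom_bar(stat = "identity", fill = "blue") +
  facet_grid(group ~ .)

# Alpha values after ggplot has applied its scales, for layer 1
scaled <- layer_data(base_plot, 1)$alpha
as_is  <- layer_data(base_plot + scale_alpha_identity(), 1)$alpha

range(scaled)  # rescaled by the default continuous alpha scale
range(as_is)   # should match range(D$alpha)
```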

geom_lines not linking what they should with error bars plot in ggplot

I have the following dataset ready to plot an error bars and lines graph
> growth
treatment class variable N value sd se ci
1 elevated Dominant RBAI2012 18 0.014127713 0.009739951 0.002295728 0.004843564
2 elevated Dominant RBAI2013 18 0.021869978 0.013578741 0.003200540 0.006752549
3 elevated Codominant RBAI2012 40 0.011564725 0.013718591 0.002169100 0.004387418
4 elevated Codominant RBAI2013 41 0.011471512 0.011091167 0.001732149 0.003500804
5 elevated Subordinate RBAI2012 24 0.004419784 0.009286883 0.001895677 0.003921507
6 elevated Subordinate RBAI2013 24 0.004397105 0.008704831 0.001776866 0.003675728
7 ambient Dominant RBAI2012 13 0.025836265 0.011880315 0.003295007 0.007179203
8 ambient Dominant RBAI2013 13 0.025992636 0.015162901 0.004205432 0.009162850
9 ambient Codominant RBAI2012 26 0.018067329 0.011830940 0.002320238 0.004778620
10 ambient Codominant RBAI2013 26 0.015595275 0.012467140 0.002445007 0.005035587
11 ambient Subordinate RBAI2012 33 0.006073904 0.008287442 0.001442658 0.002938599
12 ambient Subordinate RBAI2013 35 0.003239033 0.006846507 0.001157271 0.002351857
I've tried the following code, resulting in this plot:
p <- ggplot(growth,aes(class,value,colour=treatment,group=variable))
pd<-position_dodge(.9)
# se= standard error; ci=confidence interval
p + geom_errorbar(aes(ymin=value-se,ymax=value+se),width=.1,position=pd,colour="black") + geom_point(position=pd,size=4) + geom_line(position=pd) +
theme_bw() + theme(legend.position=c(1,1),legend.justification=c(1,1))
The lines should link the points of the same colour within each x-axis category, but clearly they don't. Could you please help me draw the lines properly (e.g. blue with blue and red with red within the "Dominant" class, and separate lines for the "Codominant" class)?
Also, do you know how to include in the x-axis labels the variable I am grouping by (i.e. "RBAI2012", "RBAI2013")?
Many thanks
To also distinguish between the levels of 'variable', you can introduce a fourth aesthetic: shape. First define a new grouping variable that combines 'treatment' and 'variable' and therefore has four levels. Map group, colour and shape to this variable. Then use scale_colour_manual to set two colours, one per level of 'treatment', and scale_shape_manual to set two shapes, one per level of 'variable'. (pd is the position_dodge object defined in the question.)
growth$grp <- paste0(growth$treatment, growth$variable)
ggplot(data = growth, aes(x = class, y = value, group = grp,
colour = grp, shape = grp)) +
geom_point(size = 4, position = pd) +
geom_line(position = pd) +
geom_errorbar(aes(ymin = value - se, ymax = value + se), colour = "black",
position = pd, width = 0.1) +
scale_colour_manual(name = "Treatment:Variable",
values = c("red", "red","blue", "blue")) +
scale_shape_manual(name = "Treatment:Variable",
values = c(19, 17, 19, 17)) +
theme_bw() +
theme(legend.position = c(1,1), legend.justification = c(1,1))
One option is using a facet plot like so:
p <- ggplot(growth, aes(x = class, y = value, group = treatment, color = treatment))
p + geom_point(size = 4) + facet_grid(. ~ variable) + geom_errorbar(aes(ymin=value-se,ymax=value+se),width=.1,colour="black") + geom_line()
If you want it on one graph, another option is defining a new variable that combines treatment and variable:
growth$treatment_variable <- paste(growth$treatment, growth$variable)
p <- ggplot(growth, aes(x = class, y = value, group = treatment_variable, colour = treatment_variable))
pd<-position_dodge(.2)
p + geom_point(size = 4, position=pd) + geom_errorbar(aes(ymin=value-se, ymax=value+se), width=.1, position=pd, colour="black") + geom_line(position=pd)
You have too many grouping variables (variable and treatment) and including them in a single plot may be a bit confusing. You might want to use faceting, like this:
p <- ggplot(growth,aes(class,value,colour=treatment,group=treatment))
pd<-position_dodge(.9)
p +
geom_errorbar(aes(ymin=value-se,ymax=value+se),width=.1,position=pd,colour="black") +
geom_point(position=pd,size=4) + geom_line(position=pd) +
theme_bw() + theme(legend.position=c(1,1),legend.justification=c(1,1)) +
facet_grid(variable~treatment)
It is possible to do this, but you need to hack it since you're essentially plotting a geom_line() on different groupings (variable + treatment) than with the geom_point() and geom_errorbar() calls.
You need to use ggplot_build() to get back the rendered data and draw a geom_line(), based on the existing points data, grouped by colour:
p <- ggplot(growth) # move the aes() into the individual charts
pd<-position_dodge(.9) # leave dodge as is
se<-0.01 # faked this
p <- p +
geom_point(aes(x=factor(class),y=value,colour=treatment,group=variable),position=pd,size=4) +
theme_bw() + theme(legend.position=c(1,1),legend.justification=c(1,1)) +
geom_errorbar(aes(x=factor(class),ymin=value-se,ymax=value+se,colour=treatment,group=variable),position=pd,width=.1,colour="black")
b<-ggplot_build(p)$data[[1]] # get the ggplot rendered data for this panel
p + geom_line(data=b,aes(x,y,group=colour), color=b$colour) # plot the lines
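Another possibility, which also touches on the x-label part of the question, is to put the year and class combination on the x-axis and group the lines by treatment within class. This is only a sketch under my own assumptions: year_class and line_grp are hypothetical helper columns, the dodge width of 0.3 is arbitrary, and only the columns needed for the plot are copied from the question.

```r
library(ggplot2)

# Subset of the columns from the question's data
growth <- data.frame(
  treatment = rep(c("elevated", "ambient"), each = 6),
  class     = rep(rep(c("Dominant", "Codominant", "Subordinate"), each = 2), times = 2),
  variable  = rep(c("RBAI2012", "RBAI2013"), times = 6),
  value     = c(0.014127713, 0.021869978, 0.011564725, 0.011471512,
                0.004419784, 0.004397105, 0.025836265, 0.025992636,
                0.018067329, 0.015595275, 0.006073904, 0.003239033),
  se        = c(0.002295728, 0.003200540, 0.002169100, 0.001732149,
                0.001895677, 0.001776866, 0.003295007, 0.004205432,
                0.002320238, 0.002445007, 0.001442658, 0.001157271)
)

# x-axis: year within class, so both years of a class sit next to each other
growth$year_class <- interaction(growth$variable, growth$class)
# one line per treatment within each class
growth$line_grp <- interaction(growth$treatment, growth$class)

pd <- position_dodge(0.3)
ggplot(growth, aes(x = year_class, y = value, colour = treatment, group = line_grp)) +
  geom_errorbar(aes(ymin = value - se, ymax = value + se),
                width = 0.1, position = pd, colour = "black") +
  geom_point(size = 4, position = pd) +
  geom_line(position = pd) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Each line then connects the 2012 and 2013 points of one treatment inside one class, and the tick labels carry the year.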

multiple ggplot2 in 1 data frame

df
primer timepoints mean sde
Acan 0 1.000000e+00 0.000000e+00
Acan 20 9.547922e-01 1.729115e-01
Acan 40 1.936454e+00 9.934593e-01
Acan 60 1.261360e+00 2.232165e-01
Acan 120 2.219807e+00 5.915425e-01
Acan 240 2.540490e+00 5.651534e-01
Acan 360 1.518923e+00 1.522455e-01
Actb 0 1.000000e+00 0.000000e+00
Actb 20 1.061931e+00 4.362860e-02
Actb 40 8.835103e-01 1.196449e-01
Actb 60 8.889279e-01 1.401378e-01
Actb 120 1.001135e+00 7.770563e-02
Actb 240 8.551348e-01 1.884853e-01
Actb 360 7.343955e-01 1.824412e-01
This treats the data like each primer is in 1 df, but I want to make a scatter plot using ggplot2 for each unique primer (the y axis would be column mean and the x axis would be timepoints), could I get lapply to work here?
If I could just lapply a function somehow that would be ideal, a list of plots.
Here's the code I've been using for ggplot, in my attempts to loop this
plot_gg <- function(x){
ggplot(df,aes(x=timepoints,mean)) +
geom_point() +
geom_line() +
scale_x_continuous(name='x axis') +
scale_y_continuous(name='y axis') +
geom_errorbar(aes(ymin=mean-sde,ymax=mean+sde),width=2) +
opts(title = primer)
}
desired_list <- lapply(unique(df$primer),plot_gg,df)
This is pretty wrong, but I'm not sure whether I should first subset the df by each individual primer, or whether it would be easier to do with ggplot in the structure the data is already in.
If you could point me in the right direction, that would be great.
I think the missing pieces are redefining the arguments to geom_errorbar and adding facet_wrap. If you specify the number of columns and rows in the layout of facet_wrap you can get multiple pages. Another way to print multiple pages is with the grid::grid.newpage() function.
ggplot(df, aes(x = timepoints, y = mean, ymin = mean - sde,
ymax = mean + sde)) +
geom_errorbar() + geom_point() + geom_line() +
facet_wrap(~ primer) +
xlab('x axis') + ylab('y axis') + ggtitle("primer")
For the multi-page request added in the comment below and using @Thierry's edits:
pdf("twopage.pdf", onefile=TRUE)
for ( i in unique(df$primer) ) {
g <- ggplot(df[df$primer == i, ], aes(x = timepoints, y = mean, ymin = mean - sde,
ymax = mean + sde)) +
geom_errorbar() + geom_point() + geom_line() +
facet_wrap(~ primer, ncol=1, nrow=1) +
xlab('x axis') + ylab('y axis') + ggtitle("primer")
print(g) ; cat(paste("printing", i, "\n"))}
dev.off()
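If you really do want a list of plots via lapply, as asked, one way is to split the data frame by primer first and pass each piece to the plotting function. A minimal sketch, assuming the two primers shown in the question; it uses labs() for the title, and the values are typed in from the question's table.

```r
library(ggplot2)

# Data copied from the question
df <- data.frame(
  primer     = rep(c("Acan", "Actb"), each = 7),
  timepoints = rep(c(0, 20, 40, 60, 120, 240, 360), times = 2),
  mean       = c(1.000000, 0.9547922, 1.936454, 1.261360, 2.219807, 2.540490, 1.518923,
                 1.000000, 1.061931, 0.8835103, 0.8889279, 1.001135, 0.8551348, 0.7343955),
  sde        = c(0, 0.1729115, 0.9934593, 0.2232165, 0.5915425, 0.5651534, 0.1522455,
                 0, 0.04362860, 0.1196449, 0.1401378, 0.07770563, 0.1884853, 0.1824412)
)

# Takes the rows for ONE primer and returns its plot
plot_gg <- function(d) {
  ggplot(d, aes(x = timepoints, y = mean)) +
    geom_errorbar(aes(ymin = mean - sde, ymax = mean + sde), width = 2) +
    geom_point() +
    geom_line() +
    labs(x = "x axis", y = "y axis", title = as.character(d$primer[1]))
}

# split() gives one data frame per primer; lapply() returns a named list of plots
plots <- lapply(split(df, df$primer), plot_gg)
plots$Acan
```

The list can then be printed in a loop inside pdf() / dev.off() to get one plot per page.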

Resources