Redistribute columns along x axis using ggplot2 - r

Using this code:
ggplot(total_reads, aes(x=Week, y=Reads)) +
geom_bar(position = "dodge", stat = "identity") +
scale_y_log10(breaks=breaks, minor_breaks=minor_breaks) +
scale_x_continuous() +
facet_grid(~PEDIS, scales="free_x", space = "free_x") +
theme_classic() +
ylab("Total Bacterial Reads")
I produced this graph:
How do I remove the empty spaces in the first facet (pedis1) and make sure only the relevant labels are on the x axis (ie 0,3,6,12,13)?

The short answer is that your x axis values (total_reads$Week) are numeric. ggplot2 therefore uses a continuous scale, so the bars are spaced according to their distance on that scale (like any numeric scale). If you want the bars right next to one another with no white space, you'll need to make the x axis a discrete variable when plotting. It's easiest to do this by mapping factor(Week) right in the aes() declaration.
Here's an example with that modification as well as some other suggestions described below:
total_reads <- data.frame(
Week=c(0,3,6,12,13),
Reads=c(100,110,100,129,135),
PEDIS=c(rep('PEDIS1', 3), rep('PEDIS2',2))
)
ggplot(total_reads, aes(x=factor(Week), y=Reads)) +
geom_col() +
facet_grid(~PEDIS, scales="free_x", space="free_x") +
theme_classic()
A few other notes on what you see changed here:
Use geom_col(), not geom_bar(). If you check out the documentation associated with the geom_bar() function, you can see it mentions that geom_bar() is used for showing counts of observations along a single axis, whereas if you want to show value, you should use geom_col(). You get the same effect with geom_col() as if you use geom_bar(stat="identity").
Remove scale_x_continuous(). Not sure why you have this there anyway; if your column Week is numeric, ggplot would default to this scale. If you do add the scale, you are asking ggplot to force a continuous scale, which is apparently not what you want here.
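To see why factor(Week) puts the bars side by side in week order, it can help to look at the factor's levels. The sketch below reuses the toy total_reads data from the answer; the column name Week_f is made up for illustration and is not part of the original code.

```r
library(ggplot2)

# Toy data from the answer above
total_reads <- data.frame(
  Week  = c(0, 3, 6, 12, 13),
  Reads = c(100, 110, 100, 129, 135),
  PEDIS = c(rep("PEDIS1", 3), rep("PEDIS2", 2))
)

# factor() sorts the distinct numeric values before turning them into labels,
# so the levels follow week order rather than alphabetical order
total_reads$Week_f <- factor(total_reads$Week)  # Week_f: illustrative name
week_levels <- levels(total_reads$Week_f)

# Mapping the factor column gives a discrete x axis: one evenly spaced bar
# per week that actually occurs in each facet
p <- ggplot(total_reads, aes(x = Week_f, y = Reads)) +
  geom_col() +
  facet_grid(~PEDIS, scales = "free_x", space = "free_x") +
  theme_classic()
```

Converting the column once in the data frame is equivalent to writing factor(Week) inside aes(); it is mostly a matter of taste, though it keeps the axis title free of the factor() wrapper.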

Related

How to make stacked bar chart with count values on y axis

I'm trying to create a stacked barchart with gene sequencing data, where for each gene there is a tRF.type and Amino.Acid value. An example data set looks like this:
tRF <- c('tRF-26-OB1690PQR3E', 'tRF-27-OB1690PQR3P', 'tRF-30-MIF91SS2P46I')
tRF.type <- c('5-tRF', 'i-tRF', '3-tRF')
Amino.Acid <- c('Ser', 'Lys', 'Ser')
tRF.data <- data.frame(tRF, tRF.type, Amino.Acid)
I would like the x-axis to represent the amino acid type, the y-axis the number of counts of each tRF type, and the fill of the bars to represent each tRF type.
My code is:
ggplot(chart_data, aes(x = Amino.Acid, y = tRF.type, fill = tRF.type)) +
geom_bar(stat="identity") +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("tRF type")
However, it generates this graph, where the y-axis is labelled with the categories of tRF type. How can I change my code so that the y-axis scale is numerical and represents the counts of each tRF type?
OP, welcome to SO. In future questions, please be sure to provide a minimal reproducible example: code, an image (if possible), and at least a representative dataset that demonstrates your question or problem clearly.
TL;DR - don't use stat="identity", just use geom_bar() without providing a stat, since default is to use the counts. This should work:
ggplot(chart_data, aes(x = Amino.Acid, fill = tRF.type)) + geom_bar()
The dataset provided doesn't adequately demonstrate your issue, so here's one that can work. The example data herein consists of 100 observations and two columns: one called Capitals for randomly-selected uppercase letters and one Lowercase for randomly-selected lowercase letters.
library(ggplot2)
set.seed(1234)
df <- data.frame(
Capitals=sample(LETTERS, 100, replace=TRUE),
Lowercase=sample(letters, 100, replace=TRUE)
)
If I plot similar to your code, you can see the result:
ggplot(df, aes(x=Capitals, y=Lowercase, fill=Lowercase)) +
geom_bar(stat="identity")
You can see, the bars are stacked, but the y axis is all smooshed down. The reason is related to understanding the difference between geom_bar() and geom_col(). Checking the documentation for these functions, you can see that the main difference is that geom_col() will plot bars with heights equal to the y aesthetic, whereas geom_bar() plots by default according to stat="count". In fact, using geom_bar(stat="identity") is really just a complicated way of saying geom_col().
Since your y aesthetic is not numeric, ggplot still tries to treat the discrete levels numerically. It doesn't really work out well, and it's the reason why your axis gets smooshed down like that. What you want is geom_bar(stat="count"), which is the same as just using geom_bar() without providing a stat=.
The one problem is that geom_bar() only accepts an x or a y aesthetic. This means you should only give it one of them. This fixes the issue and now you get the proper chart:
ggplot(df, aes(x=Capitals, fill=Lowercase)) + geom_bar()
You want your y-axis to be a count, not tRF.type. This code should give you the correct plot: I've removed y = tRF.type from ggplot() and stat = "identity" from geom_bar() (which now uses its default, stat = "count").
ggplot(tRF.data, aes(x = Amino.Acid, fill = tRF.type)) +
geom_bar() +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("tRF type")
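For comparison, the same counts can be computed by hand and passed to geom_col(), which makes explicit what geom_bar() does with its default stat. This is only a sketch using the three-row tRF.data example from the question; the object names counts, p_col and p_bar are assumptions for illustration.

```r
library(ggplot2)

# Example data from the question
tRF.data <- data.frame(
  tRF        = c("tRF-26-OB1690PQR3E", "tRF-27-OB1690PQR3P", "tRF-30-MIF91SS2P46I"),
  tRF.type   = c("5-tRF", "i-tRF", "3-tRF"),
  Amino.Acid = c("Ser", "Lys", "Ser")
)

# Count the rows for every Amino.Acid x tRF.type combination;
# the result has columns Amino.Acid, tRF.type and Freq
counts <- as.data.frame(table(tRF.data[c("Amino.Acid", "tRF.type")]))

# geom_col() draws the precomputed Freq values as bar heights
p_col <- ggplot(counts, aes(x = Amino.Acid, y = Freq, fill = tRF.type)) +
  geom_col()

# geom_bar() does the counting itself, so no y aesthetic is supplied
p_bar <- ggplot(tRF.data, aes(x = Amino.Acid, fill = tRF.type)) +
  geom_bar()
```

Both plots should show the same stacked bars; the geom_col() route is mainly useful when the counts already exist in a summary table.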

Adding text to facetted histogram

Using ggplot2 I have made facetted histograms using the following code.
library(ggplot2)
library(plyr)
df1 <- data.frame(monthNo = rep(month.abb[1:5],20),
classifier = c(rep("a",50),rep("b",50)),
values = c(seq(1,10,length.out=50),seq(11,20,length.out=50))
)
means <- ddply (df1,
c(.(monthNo),.(classifier)),
summarize,
Mean=mean(values)
)
ggplot(df1,
aes(x=values, colour=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo,ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1)
The vertical line showing means per month is to stay.
But I want to also add text over these vertical lines displaying the mean values for each month. These means are from the 'means' data frame.
I have looked at geom_text and I can add text to plots. But it appears my circumstance is a little different and not so easy. It's a lot simpler to add text in some cases where you just add values of the plotted data points. But cases like this when you want to add the mean and not the value of the histograms I just can't find the solution.
Please help. Thanks.
Having noted the possible duplicate (another answer of mine), the solution here might not be as (initially/intuitively) obvious. You can do what you need if you split the geom_text call into two (for each classifier):
ggplot(df1, aes(x=values, fill=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo, ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="a",]) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="b",])
I'm assuming you can format the numbers to the appropriate precision and place them on the y-axis where you need to with this code.
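As a rough illustration of the formatting step, the sketch below rounds the means to one decimal place before plotting them. It rebuilds the means table with base R's aggregate() instead of ddply(), and the label column is an invented name; in recent ggplot2 versions a single geom_text() call with inherit.aes = FALSE appears to be enough, but treat that as an assumption to check against your installed version.

```r
library(ggplot2)

# Data from the question
df1 <- data.frame(
  monthNo    = rep(month.abb[1:5], 20),
  classifier = c(rep("a", 50), rep("b", 50)),
  values     = c(seq(1, 10, length.out = 50), seq(11, 20, length.out = 50))
)

# Base R stand-in for the ddply() summary: one mean per month and classifier
means <- aggregate(values ~ monthNo + classifier, data = df1, FUN = mean)
names(means)[names(means) == "values"] <- "Mean"

# Illustrative label column: the mean formatted to one decimal place
means$label <- sprintf("%.1f", means$Mean)

p <- ggplot(df1, aes(x = values, fill = classifier)) +
  geom_histogram(bins = 30) +
  facet_wrap(~monthNo, ncol = 1) +
  geom_vline(data = means, aes(xintercept = Mean, colour = classifier),
             linetype = "dashed") +
  # y = 0.5 follows the answer above; raise it to move the labels up
  geom_text(data = means, aes(x = Mean, y = 0.5, label = label),
            inherit.aes = FALSE)
```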

ggplot2 stacked barplots, formatting, and grids

In the data that I am attempting to plot, each sample belongs in one of several groups, that will be plotted on their own grids. I am plotting stacked bar plots for each sample that will be ordered in increasing number of sequences, which is an id attribute of each sample.
Currently, the plot (with some random data) looks like this:
(Since I don't have the required 10 rep for images, I am linking it here)
There are a couple of things I need to accomplish, and I don't know where to start.
I would like the bars not to be placed at their corresponding nseqs values, but next to each other in ascending nseqs order.
I don't want each grid to have the same scale. Everything needs to fit snugly.
I have tried to set scales and size for facet_grid to free_x, but this results in an unused argument error. I think this is related to the fact that I have not been able to get the scales library loaded properly (it keeps saying not available).
Code that deals with plotting:
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_grid(~group) +
scale_y_continuous() +
opts(title=paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
Try this:
update.packages()
## I'm assuming your ggplot2 is out of date because you use opts()
## If the scales library is unavailable, you might need to update R
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
ggfdata$nseqs <- factor(ggfdata$nseqs)
## Making nseqs a factor will stop ggplot from treating it as a numeric,
## which sounds like what you want
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_wrap(~group, scales="free_x") + ## No need for facet_grid with only one variable
labs(title = paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
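A possible variation, sketched under assumptions: if two samples in a group could share the same nseqs value, factor(nseqs) would put them on the same bar. Mapping sample to x and ordering its levels by nseqs with base R's reorder() avoids that. The data frame below is hypothetical and only stands in for the melted ggfdata, which the question does not show.

```r
library(ggplot2)

# Hypothetical long-format data standing in for ggfdata
ggfdata <- data.frame(
  group    = rep(c("g1", "g1", "g2"), each = 2),
  sample   = rep(c("s1", "s2", "s3"), each = 2),
  nseqs    = rep(c(300, 100, 200), each = 2),
  variable = rep(c("taxonA", "taxonB"), times = 3),
  value    = c(0.6, 0.4, 0.3, 0.7, 0.5, 0.5)
)

# reorder() sorts the sample levels by ascending nseqs
ggfdata$sample <- reorder(ggfdata$sample, ggfdata$nseqs)
sample_order <- levels(ggfdata$sample)

# One stacked bar per sample, adjacent bars, free x axis per group
p <- ggplot(ggfdata, aes(x = sample, y = value, fill = variable)) +
  geom_col() +
  facet_wrap(~group, scales = "free_x")
```

The x axis then shows sample names rather than nseqs values, which may or may not be what is wanted.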

facet_wrap: How to add y axis to every individual graph when scales="free_x"?

The following code
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
ggplot(m, aes(value)) +
facet_wrap(~variable,ncol=2,scales="free_x") +
geom_histogram()
produces 4 graphs with fixed y axis (which is what I want). However, by default, the y axis is only displayed on the left side of the faceted graph (i.e. on the side of 1st and 3rd graph).
What do I do to make the y axis show itself on all 4 graphs? Thanks!
EDIT: As suggested by @Roland, one could set scales="free" and use ylim(c(0,30)), but I would prefer not to have to set the limits manually every time.
@Roland also suggested using hist and ddply outside of ggplot to get the maximum count. Isn't there any ggplot2-based solution?
EDIT: There is a very elegant solution from @baptiste. However, when changing binwidth, it starts to behave oddly (at least for me). Check this example with default binwidth (range/30). The values on the y axis are between 0 and 30,000.
library(ggplot2)
library(reshape2)
m=melt(data=diamonds[,c("x","y","z")])
ggplot(m,aes(x=value)) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram() +
geom_blank(aes(y=max(..count..)), stat="bin")
And now this one.
ggplot(m,aes(x=value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin")
The binwidth is now set to 0.5, so the highest frequency should change (decrease, in fact, since narrower bins hold fewer observations). However, nothing happened to the y axis: it still covers the same range of values, creating a huge empty space in each graph.
[The problem is solved... see @baptiste's edited answer.]
Is this what you're after?
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin", binwidth=0.5)
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
ylim(c(0,30)) +
geom_histogram()
Didzis Elferts in https://stackoverflow.com/a/14584567/2416535 suggested using ggplot_build() to get the values of the bins used in geom_histogram (ggplot_build() provides data used by ggplot2 to plot the graph). Once you have your graph stored in an object, you can find the values for all the bins in the column count:
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
plot = ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value))
ggplot_build(plot)$data[[1]]$count
Therefore, I tried to replace the max y limit by this:
max(ggplot_build(plot)$data[[1]]$count)
and managed to get a working example:
m=melt(data=diamonds[,c("x","y","z")])
bin=0.5 # you can use this to try out different bin widths to see the results
plot=
ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value),binwidth=bin)
ggplot(m) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram(aes(x=value),binwidth=bin) +
ylim(c(0,max(ggplot_build(plot)$data[[1]]$count)))
It does the job, albeit clumsily. It would be nice if someone improved upon that to eliminate the need to create 2 graphs, or rather the same graph twice.
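One way to avoid writing the plot definition twice, offered as a sketch rather than a definitive fix: store the plot once, read the bin counts from it, and add ylim() to the stored object. The example uses layer_data(), which returns the same per-layer data as ggplot_build(plot)$data[[1]], and base R's stack() in place of melt(); the names long, base_plot, max_count and final_plot are illustrative.

```r
library(ggplot2)

# stack() is a base R stand-in for melt(): it returns columns `values` and `ind`
long <- stack(iris[, 1:4])
bin <- 0.5  # try different bin widths here

# Define the faceted histogram once
base_plot <- ggplot(long, aes(x = values)) +
  facet_wrap(~ind, ncol = 2, scales = "free") +
  geom_histogram(binwidth = bin)

# Highest bin count across all panels, taken from the computed layer data
max_count <- max(layer_data(base_plot, 1)$count)

# Reuse the stored plot and fix the y limits; with scales = "free" each panel
# keeps its own y axis, now with identical limits
final_plot <- base_plot + ylim(0, max_count)
```

The plot is still built internally once to obtain the counts, so this only removes the duplicated code, not the extra computation.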

Different fills for different facet_grids using geom_bar

I'm new to ggplot2 and I have a question to which I couldn't find an answer.
I've created the following toy data to help in the explanation:
data <- data.frame(tech=c(rep(letters[1:15],2)),
sep=c(rep(c("SitutationA", "SitutationB"),each=15)),
error=c(runif(15,min=-0.2, max=0.5), runif(15, min=0.3, max=1)))
I want to plot a geom_bar graph showing the "error" (y axis) for each technique "tech" (x axis), divided into two different situations (SituationA and SituationB) using facet_grid. The color (fill) of each bar should represent the "error" of each technique, and not the technique (as a factor). The errors for situations A and B are measured on different scales. However, in my code, an error of the same value has the same color in both situations. I do not want this behavior, since they were measured on different scales. Thus, I would like the colors in situations A and B to be independent.
The following code plots the graph, but using the same color for both situations.
ggplot(data, aes(x=tech, y=error)) +
geom_bar(aes(fill=error), stat="identity", position="dodge") +
facet_grid(sep ~ ., scales="free_y") +
scale_fill_continuous(guide=FALSE)
How could I use different continuous fills for each facet (situationA and situationB)?
Thank you.
You can't have two different fill scales on the same plot.
Solution to the problem could be to make two plots and then put them together with grid.arrange() from library gridExtra.
In the first plot put only values of SitutationA. Changed y scale to show values with two numbers after decimal point (to be the same as for second plot). Removed x axis title, texts and ticks and changed plot margins - set bottom margin to -0.4 to reduce space between plots.
library(grid)
library(gridExtra)
p1<-ggplot(subset(data,sep=="SitutationA"), aes(x=tech, y=error)) +
geom_bar(aes(fill=error), stat="identity", position="dodge") +
facet_grid(sep ~ ., scales="free_y") +
scale_fill_continuous(guide=FALSE)+
scale_y_continuous(breaks=c(0,0.25,0.50))+
theme(axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank(),
plot.margin=unit(c(1,1,-0.4,1),"lines"))
For the second plot (SitutationB) changed top plot margin to -0.4 to reduce space between plots. Then changed scale_fill_continuous() and provided new colors.
p2<-ggplot(subset(data,sep=="SitutationB"), aes(x=tech, y=error)) +
geom_bar(aes(fill=error), stat="identity", position="dodge") +
facet_grid(sep ~ ., scales="free_y") +
scale_fill_continuous(guide=FALSE,low="red",high="purple") +
theme(plot.margin=unit(c(-0.4,1,1,1),"lines"))
Now put both plots together.
grid.arrange(p1,p2)
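A different technique from the two-plot approach, sketched here under assumptions: keep a single plot and a single fill scale, but map fill to the error rescaled to [0, 1] within each situation. The colors then depend only on where a value falls inside its own facet. The column name error_scaled and the seed are invented for illustration, and unlike the grid.arrange() solution both facets share the same palette.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the toy data are reproducible
data <- data.frame(
  tech  = rep(letters[1:15], 2),
  sep   = rep(c("SitutationA", "SitutationB"), each = 15),
  error = c(runif(15, min = -0.2, max = 0.5), runif(15, min = 0.3, max = 1))
)

# Rescale error to [0, 1] separately within each situation
data$error_scaled <- ave(
  data$error, data$sep,
  FUN = function(v) (v - min(v)) / (max(v) - min(v))
)

# Bar height still shows the raw error; only the fill uses the rescaled value
p <- ggplot(data, aes(x = tech, y = error)) +
  geom_col(aes(fill = error_scaled)) +
  facet_grid(sep ~ ., scales = "free_y") +
  scale_fill_gradient(guide = "none")
```

The legend is hidden because a shared 0 to 1 legend would no longer correspond to raw error values.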

Resources