ggplot2: plotting bars when using stat_summary() - r

Here is my current script and the output:
ggplot(data.and.factors.prov,aes(x=assumptions,y=FP,
colour=factor(Design.Complexity))) +
stat_summary(fun.data=mean_cl_normal,position=position_dodge(width=0.5)) +
geom_blank() + scale_colour_manual(values=1:7,name='Design Complexity') +
coord_flip()
How can I have (horizontal) bars (starting at FP=0 and ending at the point position) instead of points? (I don't want to lose the error bars.)
I'd like to give you my data.and.factors.prov data.table, but it is too big to be posted. If you need a reproducible example, please let me know how I can give you my data set.

The default geom for stat_summary() is "pointrange". To get both bars and error bars, one solution is to use two stat_summary() calls: one to draw the error bars and a second to calculate just the mean values and plot them as bars. You will also need to adjust width= inside position_dodge() and map fill= to the same factor as colour= so that the bars are filled.
Here is an example with mtcars data.
ggplot(mtcars,aes(x=factor(cyl),y=mpg,colour=factor(gear),fill=factor(gear))) +
stat_summary(fun.data=mean_cl_normal,position=position_dodge(0.95),geom="errorbar") +
stat_summary(fun=mean,position=position_dodge(width=0.95),geom="bar")+ # fun= replaces the deprecated fun.y= (ggplot2 >= 3.3.0)
coord_flip()
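In the example above the bars are added after the error bars, so the filled bars may cover the lower half of each error bar. A minimal sketch of one way around that, assuming you are happy to compute the summary yourself: calculate the mean and a t-based 95% interval per group in base R, then draw geom_col() first and geom_errorbar() on top. The helper ci() and the data frame summ are illustrative names, not part of the original answer, and the interval is a plain t interval rather than the output of mean_cl_normal.

```r
library(ggplot2)

# Mean and t-based 95% interval for one group (NA bounds when n == 1).
ci <- function(v) {
  n <- length(v)
  half <- if (n > 1) qt(0.975, df = n - 1) * sd(v) / sqrt(n) else NA_real_
  c(mean = mean(v), lower = mean(v) - half, upper = mean(v) + half)
}

# One row per cyl/gear combination present in mtcars.
summ <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = ci)
summ <- cbind(summ[c("cyl", "gear")], as.data.frame(summ$mpg))

# Bars first, error bars second, so the error bars stay fully visible.
p <- ggplot(summ, aes(x = factor(cyl), y = mean, fill = factor(gear))) +
  geom_col(position = position_dodge(width = 0.95)) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(width = 0.95), width = 0.3) +
  coord_flip()
```

Printing p draws the plot. Groups with a single observation have no interval, so ggplot2 will warn that it dropped those error bars.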

Related

Redistribute columns along x axis using ggplot2

Using this code:
ggplot(total_reads, aes(x=Week, y=Reads)) +
geom_bar(position = "dodge", stat = "identity") +
scale_y_log10(breaks=breaks, minor_breaks=minor_breaks) +
scale_x_continuous() +
facet_grid(~PEDIS, scales="free_x", space = "free_x") +
theme_classic() +
ylab("Total Bacterial Reads")
I produced this graph:
How do I remove the empty spaces in the first facet (pedis1) and make sure only the relevant labels are on the x axis (ie 0,3,6,12,13)?
The quick answer is that your x axis variable (total_reads$Week) is numeric. This automatically makes the scale continuous, so the bars are spaced according to their distance on the scale (as on any numeric scale). If you want the bars right next to one another with no white space, you'll need to make the x axis a discrete variable when plotting. It's easiest to do this by mapping factor(Week) right in the aes() declaration.
Here's an example with that modification as well as some other suggestions described below:
total_reads <- data.frame(
Week=c(0,3,6,12,13),
Reads=c(100,110,100,129,135),
PEDIS=c(rep('PEDIS1', 3), rep('PEDIS2',2))
)
ggplot(total_reads, aes(x=factor(Week), y=Reads)) +
geom_col() +
facet_grid(~PEDIS, scales="free_x", space="free_x") +
theme_classic()
A few other notes on what you see changed here:
Use geom_col(), not geom_bar(). If you check out the documentation associated with the geom_bar() function, you can see it mentions that geom_bar() is used for showing counts of observations along a single axis, whereas if you want to show value, you should use geom_col(). You get the same effect with geom_col() as if you use geom_bar(stat="identity").
Remove scale_x_continuous(). Not sure why you have this there anyway, but if your column Week is numeric, ggplot would default to this scale anyway. If you do use the scale, you are asking ggplot to force a continuous scale, which is apparently not what you want here.
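To see the difference between the two mappings without looking at the picture, here is a small sketch that asks ggplot2 which x scale it chose. It reuses the example values from above; is_discrete() is a helper name made up for this illustration.

```r
library(ggplot2)

total_reads <- data.frame(
  Week  = c(0, 3, 6, 12, 13),
  Reads = c(100, 110, 100, 129, 135)
)

p_numeric <- ggplot(total_reads, aes(x = Week, y = Reads)) + geom_col()
p_factor  <- ggplot(total_reads, aes(x = factor(Week), y = Reads)) + geom_col()

# layer_scales() returns the x and y scales ggplot2 picked for a plot.
is_discrete <- function(p) inherits(layer_scales(p)$x, "ScaleDiscrete")

is_discrete(p_numeric)  # numeric Week: continuous scale, gaps between bars
is_discrete(p_factor)   # factor(Week): discrete scale, bars side by side
```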

Why does this ggplot only plot the grid without the values?

I am trying to plot a bar chart in ggplot but I keep getting only the grid. This is apparently a demonstration of drawing nothing, but I would like to understand how to get the values visible in the simplest way.
library(ggplot2)
testData<-data.frame(x=c("a","b","c","d","e","f"), y=c(10,6,9,28,10,17))
bar <- ggplot(data=testData, aes(x=c("a","b","c","d","e","f"), y=c(10,6,9,28,10,17), fill = "#FFCC00"))
One way I can get the plots is the geom_bar
bar <- ggplot(data=testData, aes(x=c("a","b","c","d","e","f"), y=c(10,6,9,28,10,17), fill = "#FFCC00")) + geom_bar(stat="identity")
Why are the values not plotted on the first bar chart, and what is the simplest way to fix it? What is the idea behind this way of plotting with +, and what is it called?
With the ggplot2 package, calling ggplot() only sets up the basic grid; it's like taking out a piece of graph paper before drawing a graph. Having the grid ready does not plot anything on it. That's why running the following command results in the empty grid from your first example:
ggplot(data=testData, aes(x=x, y=y, fill = "#FFCC00"))
It's not the same as using a function like plot() or hist(), which prep the grid and plot the data at the same time:
plot(testData$y)
hist(testData$y)
The "+" in ggplot is just a way to say that there are more arguments related to the ggplot that we want included on top of the first blank grid. That's why each line separated by a "+" is typically called a layer.
So, if we want to make a simple scatterplot, we add points on top of a grid:
testData<-data.frame(x=c(1:6), y=c(10,6,9,28,10,17))
ggplot(data=testData,aes(x=x,y=y)) +
geom_point()
Output:
If we want to add lines to that scatterplot, we can just add one line of code:
ggplot(data=testData,aes(x=x,y=y)) +
geom_point() +
geom_line()
Output:
We can keep adding layers like this if we want. Just note that they will print in the order that you type them (i.e. the first few lines will be below the lines printed after them):
ggplot(data=testData,aes(x=x,y=y)) +
geom_bar(stat="identity",fill="#00BFC4") +
geom_point() +
geom_line()
Output:
Also, note that inside aes() you should refer to columns by their bare names rather than repeating the data as literal vectors or with $; repeating the data can lead to errors.
Don't use:
ggplot(data=testData, aes(x=c("a","b","c","d","e","f"),
y=c(10,6,9,28,10,17), fill = "#FFCC00")) +
geom_bar(stat="identity")
#or
ggplot(data=testData, aes(x=testData$x, y=testData$y, fill = "#FFCC00")) +
geom_bar(stat="identity")
Instead use:
ggplot(data=testData, aes(x=x, y=y)) +
geom_bar(stat="identity", fill="#FFCC00") # a constant colour goes outside aes()
If you want to plot data from a data frame(s) not called within the first ggplot() line, then simply add a data argument to the "layers" that use that different data frame, like this:
ggplot(data=testData,aes(x=x,y=y)) +
geom_bar(stat="identity",fill="#00BFC4") +
geom_point(data=differentDf, aes(x=x,y=y)) +
geom_line(data=differentDf, aes(x=x,y=y))
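One way to convince yourself that + really adds layers is to count them. This is only a sketch; base, bars and both are names chosen for the illustration, and it relies on the layers element of a ggplot object being accessible with $.

```r
library(ggplot2)

testData <- data.frame(x = 1:6, y = c(10, 6, 9, 28, 10, 17))

base <- ggplot(testData, aes(x = x, y = y))  # grid only, no layers yet
bars <- base + geom_col(fill = "#FFCC00")    # one layer: the bars
both <- bars + geom_point()                  # two layers: bars, then points

# Number of layers in each plot object.
n_layers <- sapply(list(base, bars, both), function(p) length(p$layers))
n_layers
```

Because each step returns a new plot object, you can keep base around and build several different plots from it.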

geom_text positions per group

I am using geom_line, geom_point, and geom_text to plot something like the picture below:
I am grouping, and coloring my data frame, but I want the geom_text not to be so close to each other.
I want to put the one text on top, and the other on bottom. Or at least, hide the one of the two. Is there any way I can do this?
You can specify custom aesthetics in different geom_text() calls. You can include only a subset of the data (such as just one group) in each call, and give each geom_text() a custom hjust or vjust value for each subset.
ggplot(dat, aes(x, y, group=mygroups, color=mygroups, label=mylabel)) +
geom_point() +
geom_line() +
geom_text(data=dat[dat$mygroups=='group1',], aes(vjust=1)) +
geom_text(data=dat[dat$mygroups=='group2',], aes(vjust=-1))
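A possible variation, sketched here with made-up data since the original dat was not posted: store the vertical justification in a column and map it inside a single geom_text() call. The data values and the column name v are assumptions for illustration only.

```r
library(ggplot2)

# Hypothetical data: two groups measured at the same x positions.
dat <- data.frame(
  x = rep(1:4, times = 2),
  y = c(1.0, 2.0, 3.0, 4.0, 1.2, 2.1, 3.3, 4.1),
  mygroups = rep(c("group1", "group2"), each = 4)
)
dat$mylabel <- as.character(dat$y)

# vjust = 1.5 puts the label below its point, vjust = -0.5 above it.
dat$v <- ifelse(dat$mygroups == "group1", 1.5, -0.5)

p <- ggplot(dat, aes(x, y, group = mygroups, color = mygroups, label = mylabel)) +
  geom_point() +
  geom_line() +
  geom_text(aes(vjust = v), show.legend = FALSE)
```

This keeps one text layer instead of one per group, which may be easier to maintain when there are more than two groups.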

facet_wrap: How to add y axis to every individual graph when scales="free_x"?

The following code
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
ggplot(m, aes(value)) +
facet_wrap(~variable,ncol=2,scales="free_x") +
geom_histogram()
produces 4 graphs with fixed y axis (which is what I want). However, by default, the y axis is only displayed on the left side of the faceted graph (i.e. on the side of 1st and 3rd graph).
What do I do to make the y axis show itself on all 4 graphs? Thanks!
EDIT: As suggested by @Roland, one could set scales="free" and use ylim(c(0,30)), but I would prefer not to have to set the limits manually every time.
@Roland also suggested using hist and ddply outside of ggplot to get the maximum count. Isn't there a ggplot2-based solution?
EDIT: There is a very elegant solution from @baptiste. However, when changing binwidth, it starts to behave oddly (at least for me). Check this example with default binwidth (range/30). The values on the y axis are between 0 and 30,000.
library(ggplot2)
library(reshape2)
m=melt(data=diamonds[,c("x","y","z")])
ggplot(m,aes(x=value)) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram() +
geom_blank(aes(y=max(..count..)), stat="bin")
And now this one.
ggplot(m,aes(x=value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin")
The binwidth is now set to 0.5, so the highest frequency should change (decrease, in fact, as tighter bins will contain fewer observations). However, nothing happened with the y axis; it still covers the same range of values, creating a huge empty space in each graph.
[The problem is solved... see @baptiste's edited answer.]
Is this what you're after?
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin", binwidth=0.5)
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
ylim(c(0,30)) +
geom_histogram()
Didzis Elferts in https://stackoverflow.com/a/14584567/2416535 suggested using ggplot_build() to get the values of the bins used in geom_histogram (ggplot_build() provides data used by ggplot2 to plot the graph). Once you have your graph stored in an object, you can find the values for all the bins in the column count:
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
plot = ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value))
ggplot_build(plot)$data[[1]]$count
Therefore, I tried to replace the max y limit by this:
max(ggplot_build(plot)$data[[1]]$count)
and managed to get a working example:
m=melt(data=diamonds[,c("x","y","z")])
bin=0.5 # you can use this to try out different bin widths to see the results
plot=
ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value),binwidth=bin)
ggplot(m) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram(aes(x=value),binwidth=bin) +
ylim(c(0,max(ggplot_build(plot)$data[[1]]$count)))
It does the job, albeit clumsily. It would be nice if someone improved upon that to eliminate the need to create 2 graphs, or rather the same graph twice.
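A small sketch of how the duplication might be avoided: store the plot once, read the maximum count from ggplot_build(), and add ylim() to the stored object instead of retyping it. It uses iris and base R's stack() (which names its columns values and ind) to stay self-contained; p, ymax and p_fixed are illustrative names.

```r
library(ggplot2)

# Long format: one row per measurement, 'ind' holds the variable name.
m <- stack(iris[, 1:4])
bin <- 0.5

# Define the faceted histogram once.
p <- ggplot(m, aes(x = values)) +
  facet_wrap(~ind, ncol = 2, scales = "free") +
  geom_histogram(binwidth = bin)

# Largest bin count across all facets, taken from the built plot.
ymax <- max(ggplot_build(p)$data[[1]]$count)

# Reuse the stored plot and only add the common y limits.
p_fixed <- p + ylim(c(0, ymax))
```

The plot is still built twice internally (once by ggplot_build(), once when printing), but the code for it is written only once.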

ggplot2 position='dodge' producing bars that are too wide

I'm interested in producing a histogram with position='dodge' and fill=some factor (i.e. side-by-side bars for different subgroups within each bar/group), but ggplot2 gives me something like the first plot here, which has a rightmost bar that's too wide and reserves no space for the empty group, which I would like.
Here's a simple case:
df = data.frame(a=c('o','x','o','o'), b=c('a','b','a','b'))
qplot(a, data=df, fill=b, position='dodge')
From "ggplot geom_bar - bars too wide" I got this idea, which technically produces bars of the same width but preserves no space for the empty group:
ggplot(df, aes(x=a, fill=a))+
geom_bar(aes(y=..count../sum(..count..))) +
facet_grid(~b,scales="free",space="free")
How do I achieve what I want? Thanks in advance.
The default options in ggplot produce what I think you describe. The scales="free" and space="free" options do the opposite of what you want, so simply remove them from the code. Also, the default stat for geom_bar is to aggregate by counting, so you don't have to specify your stat explicitly.
ggplot(df, aes(x=a, fill=a)) + geom_bar() + facet_grid(~b)
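If you would rather keep the dodged layout than switch to facets, a possible alternative is the preserve argument of position_dodge(), sketched below on the same df. With preserve = "single" every bar keeps the width of a single dodged element, so a group with a missing subgroup is not stretched. Exactly where the lone bar sits within its group depends on the position function used, so treat this as a starting point rather than a finished answer.

```r
library(ggplot2)

df <- data.frame(a = c("o", "x", "o", "o"), b = c("a", "b", "a", "b"))

# Dodged bars that keep a constant width even when a subgroup is missing.
p <- ggplot(df, aes(x = a, fill = b)) +
  geom_bar(position = position_dodge(preserve = "single"))

# Inspect the computed bar widths from the built plot.
built  <- ggplot_build(p)$data[[1]]
widths <- as.numeric(built$xmax) - as.numeric(built$xmin)
widths
```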
