Adding group mean lines to geom_bar plot and including in legend - r

I want to create a bar graph that also shows the mean value for the bars in each group, and shows the mean line in the legend.
I have been able to get this graph (bar chart with means) using the code below, which is fine, but I would like the mean lines to appear in the legend as well.
## The data to be graphed is the proportion of persons receiving a treatment
## (num = numerator) in each population (denom = denominator). The population is
## grouped by two age groups (Age) and further divided by a categorical
## variable V1
###SET UP DATAFRAME###
require(ggplot2)
df <- data.frame(V1 = c(rep(c("S1","S2","S3","S4","S5"),2)),
Age= c(rep(70,5),rep(80,5)),
num=c(5280,6570,5307,4894,4119,3377,4244,2999,2971,2322),
denom=c(9984,12600,9425,8206,7227,7290,8808,6386,6206,5227))
df$prop<-df$num/df$denom*100
PopMean<-sum(df$num)/sum(df$denom)*100
df70<-df[df$Age==70,]
group70mean<-sum(df70$num)/sum(df70$denom)*100
df80<-df[df$Age==80,]
group80mean<-sum(df80$num)/sum(df80$denom)*100
df$PopMean<-c(rep(PopMean,10))
df$groupmeans<-c(rep(group70mean,5),rep(group80mean,5))
I want the plot to look like this, but want the lines in the legend too, to be labelled as 'mean of group' or similar.
#basic plot
P<-ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity")
P
####add mean lines
P+geom_errorbar(aes(y=df$groupmeans, ymax=df$groupmeans,
ymin=df$groupmeans), col="red", lwd=2)
Adding show.legend=TRUE overlays the error bars onto the factor legend, rather than separately. If there is a way of showing geom_errorbar separately in the legend this is probably the simplest solution.
I have also tried various things with geom_line.
The syntax below produces a line for the population mean, but it runs between the centres of the bar groups rather than covering the full width of the bars. It does produce a legend, but one showing a bar of colour rather than a line.
P+geom_line(aes(y=df$PopMean, group=df$PopMean, color=df$PopMean),lwd=1)
If I try to draw lines for the group means, the lines are not visible (because each one is only a single point).
P+geom_line(aes(y=df$groupmeans, group=df$groupmeans, color=df$groupmeans))
I also tried to get round this with facet plot, although this requires me to pretend my categorical variable is numeric to get it to work.
###set up new df
df2<-df
df2$V1<-c(rep(c(1,2,3,4,5),2))
P<-ggplot(df2, aes(x=factor(V1), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(),
colour='black',stat="identity",width=1)
P+facet_grid(.~factor(df2$Age))
P+facet_grid(.~factor(df2$Age))+geom_line(aes(y=df$groupmeans,
group=df$groupmeans, color=df$groupmeans))
(Image: facet plot)
This lets me show the mean lines using geom_line, so a legend does appear (although it doesn't look right, showing a colour gradient rather than coloured lines). However, the lines still do not span the full width of the bars. Also, my x-axis now needs relabelling to show S1, S2, etc. rather than the numeric 1, 2, 3.
To sum up: is there a way of showing the error bar lines separately in the legend?
If not, and I use faceting, how do I correct the legend appearance and relabel the axes with my categorical variable, and is it possible to get the line to span the full width of the plot?
Or is there an alternative solution that I am missing?
Thanks

To get a legend for geom_errorbar, you need to map the colour aesthetic inside aes().
As you want only one category (here red), I've created a dummy variable first:
df$mean <- "Mean"
ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
  geom_bar(position=position_dodge(), colour='black', stat="identity") +
  geom_errorbar(aes(ymax=groupmeans, ymin=groupmeans, colour=mean), lwd=2) +
  scale_colour_manual(name="", values="#ff0000")
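The groupmeans column used above was built in the question with rep(), which only works while the rows stay in their original order. As a minimal sketch (not part of the original answer), the same pooled percentages can be computed per age group with base R's ave(), and the dummy column can carry the legend text directly. The label "Mean of group" is just an illustrative choice.

```r
# Same data as in the question.
df <- data.frame(
  V1    = rep(c("S1", "S2", "S3", "S4", "S5"), 2),
  Age   = c(rep(70, 5), rep(80, 5)),
  num   = c(5280, 6570, 5307, 4894, 4119, 3377, 4244, 2999, 2971, 2322),
  denom = c(9984, 12600, 9425, 8206, 7227, 7290, 8808, 6386, 6206, 5227)
)
df$prop <- df$num / df$denom * 100

# Pooled percentage per age group: sum(num) / sum(denom) within each Age,
# repeated on every row of that group regardless of row order.
df$groupmeans <- ave(df$num, df$Age, FUN = sum) /
  ave(df$denom, df$Age, FUN = sum) * 100

# The value of the column mapped to colour becomes the legend entry text.
df$mean <- "Mean of group"
```

With df prepared this way, the ggplot call from the answer should work unchanged and show "Mean of group" next to the red line in the legend.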

Related

Linking legend to plot with a line or an arrow

Context: when you have "many" categories it can become hard to distinguish them in a bar plot. I found the plot below dealing with this situation quite nicely by linking the legend with categories in the plot.
Question: is it possible to do something similar with ggplot2?
With ggplot2 it is straightforward to get this:
But I really do not know where to start to achieve the result shown in the first plot.
Here is some code to sort it out:
library(ggplot2)
ggplot(data = mtcars, aes(x = vs, y = disp, fill = factor(carb))) +
geom_bar(stat = "identity")
Expected output (not as nice as the one presented above but it shows the idea)
There is no proper labelling on the axes in any of the plots, but my guess is that the desired chart is based on relative frequencies, while your plot seems to show absolute frequencies, though I'm not sure about that.
Assuming that you want to produce a stacked bar chart giving the (relative) number of observations of a categorical variable in two groups, there are two ways to get the two stacked bars to be the same height:
There need to be exactly the same number of observations in both of them. Then you can use absolute frequencies.
The absolute frequencies need to be transformed to relative frequencies (or percentages) by dividing them by the total number of observations in each group.
You can calculate the relative frequencies yourself and use them as the y-values.
Or refer to this post, as it seems to describe exactly what you want using ggplot2.
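As a rough sketch of the "calculate the relative frequencies yourself" route, assuming the bars are meant to show how many cars fall into each carb category within each vs group of mtcars (that reading of the question is an assumption):

```r
# Count observations for each combination of vs and carb (mtcars ships with R).
counts <- as.data.frame(table(vs = mtcars$vs, carb = mtcars$carb))

# Divide each count by the total of its vs group to get relative frequencies.
counts$rel_freq <- counts$Freq / ave(counts$Freq, counts$vs, FUN = sum)

# Within each vs group the relative frequencies sum to 1,
# so the stacked bars end up the same height.
group_totals <- tapply(counts$rel_freq, counts$vs, sum)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = vs, y = rel_freq, fill = carb)) +
    geom_bar(stat = "identity")
  print(p)
}
```

ggplot2 can also do the normalisation itself with position = "fill", which avoids the manual step when the bars are built from raw counts.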

ggplot histogram with labels

I want to label two different histograms that are in the same plot. By labels, I mean identifying each histogram by colour, for example green for x and red for y.
I tried to use the labs function, but it is not working.
ggplot() +
geom_histogram(data=junk, aes(x),fill="green", alpha=.2) +
geom_histogram(data=jun, aes(y), fill="red", alpha=.2)+
labs(x = "something") +
ggtitle("title")
I expect to have both histograms, one green and the other red, with labels on the right describing each histogram.
For this, you need the data in long format: the values for the green histogram and the values for the red one stacked below one another in a single data frame, plus another column that defines the groups.
df=data.frame(values=rnorm(20000),colorby=c("red_values","green_values"))
ggplot(data=df,aes(x=values,fill=colorby))+
geom_histogram(position="dodge")+
scale_fill_manual(values=c("red_values"="red","green_values"="green"))
For the position argument, you could also check whether "stack" fits your needs better.
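In the example above the two labels simply alternate row by row over one set of random values. Starting from two separate vectors, the long format could be built roughly as follows. The vectors x and y here are made-up stand-ins for the asker's data, which is not shown in the question.

```r
set.seed(1)                  # reproducible illustrative data
x <- rnorm(1000)             # hypothetical values for the green histogram
y <- rnorm(1000, mean = 2)   # hypothetical values for the red histogram

# Stack both vectors and record which group each value came from.
long <- data.frame(
  values  = c(x, y),
  colorby = rep(c("green_values", "red_values"),
                times = c(length(x), length(y)))
)

# Plot only if ggplot2 is installed; mapping fill to colorby creates the legend.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = values, fill = colorby)) +
    geom_histogram(position = "identity", alpha = 0.2, bins = 30) +
    scale_fill_manual(values = c("red_values" = "red", "green_values" = "green"))
  print(p)
}
```

position = "identity" overlays the two semi-transparent histograms, which is closest to the original two-layer attempt.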

Shortening mean line in two level factor within facet panels

I have a factor of time that has two levels, admission and discharge. I'm using facet_grid to create four panels in which my continuous Y will be looked at by time. I want to be able to add a mean line to each of the two time levels in each panel. My problem is that the mean line spans the entire width of the panel and I'd like to shorten it to just remain within the area of the dots.
Here is the code:
plot <- ggplot(data.in, aes(x=Time, y=Y)) + geom_point()
plot <- plot + facet_grid(.~FacetGroup)
# Use the data argument so the result has columns named Time, FacetGroup and Y
data_hline <- aggregate(Y ~ Time + FacetGroup, data=data.in, FUN=mean)
plot + geom_hline(data=data_hline, aes(yintercept=Y))
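One possible way to get shorter mean lines, sketched here under assumptions, is the zero-height error bar trick from the first question on this page: geom_errorbar with ymin equal to ymax and a small width. The data below is invented for illustration, since data.in is not shown in the question, and 0.4 is an arbitrary width.

```r
set.seed(42)
# Hypothetical stand-in for data.in: four facet groups, two time levels.
data.in <- data.frame(
  Time       = factor(rep(c("admission", "discharge"), times = 40)),
  FacetGroup = factor(rep(c("A", "B", "C", "D"), each = 20)),
  Y          = rnorm(80, mean = 50, sd = 10)
)

# One mean per Time level within each facet panel.
data_mean <- aggregate(Y ~ Time + FacetGroup, data = data.in, FUN = mean)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.in, aes(x = Time, y = Y)) +
    geom_point() +
    facet_grid(. ~ FacetGroup) +
    # ymin == ymax draws a flat line; width sets how far it extends
    # on either side of each Time level.
    geom_errorbar(data = data_mean, aes(ymin = Y, ymax = Y), width = 0.4)
  print(p)
}
```

Because the line is positioned by the Time level on the x-axis, it stays centred over the dots in every panel instead of spanning the whole panel as geom_hline does.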

Revisiting R+ggplot+geom_bar+scale_x_continuous+limits: leftmost and rightmost bars not showing on plot

Please don't tag this as a duplicate of R+ggplot+geom_bar+scale_x_continuous+limits: leftmost and rightmost bars not showing on plot : some people commented that the example in there was too long/convoluted/weird, so here is a simpler example that reproduces the problem. If a moderator thinks it is a good idea, I will delete the original (longer) question.
I am trying to create a function that does a stacked bar plot of some yearly measures. The function takes as parameters the data and the min and max year I want to plot. The problem is that for some combination of the years the bars get weird.
Here is the code, it defines the function, creates a simple simulated dataset and creates four plots with different parameters. The resulting images are below.
library(ggplot2)
library(plyr)
# Stacked bar plot of yearly quantities for the years minYear to maxYear.
doPlot <- function(data,minYear,maxYear) {
  title = paste("Bob's Performance ",minYear,"-",maxYear)
  # Aggregate quantity by year and category
  byYear <- aggregate(Quantity ~ Year+Category, data, sum)
  # Get coordinates for numbers in stacked bars
  byYear = ddply(byYear, "Year", mutate, label_y = cumsum(Quantity))
  g <- ggplot(byYear, aes(x=Year,y=Quantity))
  g <- g + geom_bar(stat="identity",aes(fill=Category), colour="black") +
    ggtitle(title) +
    scale_fill_discrete("Category",labels=c("Sheep","Cactus","Chicken"),drop=FALSE,c=45, l=80)+
    scale_x_continuous(name="Year", limits=c(minYear,maxYear), breaks=seq(minYear,maxYear,1)) +
    geom_text(aes(label=Quantity,y=label_y), vjust=1.3,size=6)
  print(g)
}
consts = paste('"Category","Year","Name","Quantity"\n',
'CACTUS,1997,Bob,45\n',
'CHICKEN,1997,Bob,6\n',
'SHEEP,1998,Bob,2\n',
'SHEEP,1999,Bob,4\n',
'SHEEP,2005,Bob,5\n',sep = "")
data <- read.csv(text=consts,header = TRUE)
data$Category <- factor(data$Category, levels = c("SHEEP", "CACTUS", "CHICKEN"))
# This works OK
doPlot(data,1996,2006)
# This doesn't: the bars on the left and right sides disappear
doPlot(data,1997,2005)
# This doesn't: the left bar disappears; it seems it was not plotted
doPlot(data,1998,2000)
# This is weird: why does the bar width span over 5 years?
doPlot(data,1999,2011)
The first plot is OK since the data is all inside the years range:
In the second plot the years range is exactly the same as the range of years in the data. The leftmost and rightmost bars are not plotted, but the numbers are.
In the third plot the year range is very narrow -- again leftmost and/or rightmost bars are not plotted. There's a hint here that the bar width could not be fitted in the plot -- see the width for 1999!
In the fourth plot the year range is wider, but again the leftmost and/or rightmost bars are not plotted, and the one bar that is plotted covers several years.
I can make the plot sort of work by always using an extended range of years, but this is bugging me. I guess I didn't specify something that controls the bar widths, but what?
I noticed that there are similar problems with the leftmost and rightmost bars, e.g. In ggplot2 - how to ensure geom_errorbar displays bar limits for all points when controlling x-axis with xlim() , and the solutions are similar, but I believe there ought to be a better way.
I must point out that using
scale_x_continuous(name="Year", breaks=seq(minYear,maxYear,1)) +
coord_cartesian(xlim=c(minYear,maxYear)) +
instead of
scale_x_continuous(name="Year", limits=c(minYear,maxYear),breaks=seq(minYear,maxYear,1)) +
solves the "bar over several years" issue of the fourth plot, but causes only parts of the leftmost/rightmost bars to be plotted:
thanks
Rafael
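A likely explanation, offered as a sketch rather than a confirmed diagnosis: scale limits drop anything that falls outside them before drawing, and a bar centred on 1997 extends roughly half a bar width to either side, so its rectangle lies partly outside limits=c(1997,2005) and is removed while the text label at x = 1997 survives. The default bar width is also derived from the spacing of the data that remains, which would explain the very wide bar in the fourth plot. Setting the width explicitly and padding the limits by half a bar might look like this (bar_width, x_limits and x_breaks are illustrative names):

```r
minYear <- 1997
maxYear <- 2005
bar_width <- 0.9  # explicit width, so it no longer depends on the data spacing

# Pad the limits by half a bar so the outermost bars lie fully inside the scale.
x_limits <- c(minYear - bar_width / 2, maxYear + bar_width / 2)
x_breaks <- seq(minYear, maxYear, 1)

# Same values as the simulated data in the question.
byYear <- data.frame(
  Year     = c(1997, 1997, 1998, 1999, 2005),
  Category = c("CACTUS", "CHICKEN", "SHEEP", "SHEEP", "SHEEP"),
  Quantity = c(45, 6, 2, 4, 5)
)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(byYear, aes(x = Year, y = Quantity, fill = Category)) +
    geom_bar(stat = "identity", width = bar_width, colour = "black") +
    scale_x_continuous(name = "Year", limits = x_limits, breaks = x_breaks)
  print(g)
}
```

The breaks still run from minYear to maxYear, so the axis labels are unchanged; only the invisible margin around the first and last bar grows.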

Readjusting one plot from a two-plots graph

I have two plots in the same graph. I need to adjust the first plot so that its y-axis is scaled to its own number of cases (which is lower than for the second plot). For now the first plot is difficult to read because its bars are much smaller than those of the second one. I would like it to be proportional.
all.scores = rbind(UK_Together.scores, YesScotland.scores)
ggplot(data=all.scores) + # ggplot works on data.frames, always
geom_bar(mapping=aes(x=score, fill=Parti), binwidth=1) +
facet_grid(Parti~.) + # make a separate plot for each hashtag
theme_bw() + scale_fill_brewer() # plain display, nicer colors
Can anyone help?
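If the goal is for each panel to use a y-axis suited to its own counts, one option worth trying is the scales argument of facet_grid. The sketch below uses invented scores, since UK_Together.scores and YesScotland.scores are not shown, with one group deliberately ten times larger than the other.

```r
set.seed(7)
# Hypothetical stand-in for all.scores: 50 cases in one group, 500 in the other.
all.scores <- data.frame(
  score = c(sample(-3:3, 50, replace = TRUE),
            sample(-3:3, 500, replace = TRUE)),
  Parti = rep(c("UK_Together", "YesScotland"), times = c(50, 500))
)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data = all.scores) +
    geom_bar(mapping = aes(x = score, fill = Parti)) +
    # scales = "free_y" gives every row of panels its own y-axis range
    facet_grid(Parti ~ ., scales = "free_y") +
    theme_bw() + scale_fill_brewer()
  print(p)
}
```

With free y scales the smaller group fills its panel, at the cost that bar heights are no longer directly comparable between the two panels.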
