positioning labels on geom_bar - r

I am trying to create a horizontal bar chart with category labels using ggplot.
I have been able to create the plot without hassles, and can put labels on, however I suffer issues with the formatting. Ultimately I would like to have my label within the bar if it fits, otherwise just outside the bar without truncating the label.
The following are what I have tried so far.
Data
dt1 <- data.table(x=c("a","b","c","d","e"), y=c(43,52,296,102,157), y2=c(50,10,100,45,80))
Chart 1
ggplot() + geom_bar(data=dt1, aes(x=x, y=y), stat="identity",fill="red") + coord_flip() +
geom_text(data=dt1, aes(x=x, y=y, label=paste0("$",y," from ",y2," records")),hjust=0)
As you can see below the labels get truncated.
Chart 2
I then came across this question which was helpful and made me realise that I was setting the label position based on my y variable so I have hardcoded it now and use hjust to pad it from the axis.
ggplot() + geom_bar(data=dt1, aes(x=x, y=y), stat="identity",fill="red") + coord_flip() +
geom_text(data=dt1, aes(x=x, y=0, label=paste0("$",y," from ",y2," records")),hjust=-0.1)
But you can see below that only 2 of the labels fit within the bar, so I would prefer the others to be placed at the end, on the outside of the bar like in chart 1.
Is there a programmatic way I can get the best of both worlds from chart 1 and chart 2?

Move hjust inside aes() so it can vary with the data, then switch the justification once a bar is longer than some fraction of the longest bar. It's still a bit hacky, since it makes assumptions about the scaling, but it looks pretty good. The divisor may need tweaking:
library(tidyverse)
dt1 <- data.frame(x=c("a","b","c","d","e"), y=c(43,52,296,102,157), y2=c(50,10,100,45,80))
ggplot() +
geom_bar(data=dt1, aes(x=x, y=y), stat="identity",fill="red") +
coord_flip() +
geom_text(
data=dt1,
aes(
x=x, y=y,
label=paste0("$",y," from ",y2," records"),
hjust=ifelse(y < max(dt1$y) / 1.5, -0.1, 1.1), # <- Here lies the magic
),
)
Results in this plot:
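A variation on the same idea, as a rough sketch rather than a definitive fix: compute the inside/outside flag as a column first, so the divisor is a named knob and the same flag can also drive the text colour. The names `divisor`, `inside` and `label` are my own, not from the question, and the white/black colours are just an assumption about what reads well on red bars.

```r
library(ggplot2)

dt1 <- data.frame(x = c("a", "b", "c", "d", "e"),
                  y = c(43, 52, 296, 102, 157),
                  y2 = c(50, 10, 100, 45, 80))

divisor <- 1.5  # assumed tuning knob, same role as in the answer above

# TRUE when the bar is long enough that the label should sit inside it
dt1$inside <- dt1$y >= max(dt1$y) / divisor
dt1$label <- paste0("$", dt1$y, " from ", dt1$y2, " records")

ggplot(dt1, aes(x = x, y = y)) +
  geom_col(fill = "red") +
  geom_text(aes(label = label,
                hjust = ifelse(inside, 1.1, -0.1),  # inside: right-aligned
                colour = inside)) +
  scale_colour_manual(values = c(`TRUE` = "white", `FALSE` = "black"),
                      guide = "none") +
  coord_flip()
```

With this data only bar "c" passes the threshold, so it is the only label drawn inside its bar.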

Here is one way. It is a somewhat lengthy approach, but you can subset your data for each geom_text call. That way, you can manually assign the position you want for each bar.
ggplot() +
geom_bar(data = dt1, aes(x=x, y=y), stat="identity",fill="red") +
coord_flip() +
geom_text(data = filter(dt1, x == "e" | x == "c"),
aes(x=x, y=0, label=paste0("$",y," from ",y2," records")),hjust = -0.1) +
geom_text(data = filter(dt1, x == "d"),
aes(x=x, y=0, label=paste0("$",y," from ",y2," records")),hjust = - 1.1) +
geom_text(data = filter(dt1, x == "b"),
aes(x=x, y=0, label=paste0("$",y," from ",y2," records")),hjust = - 0.6) +
geom_text(data = filter(dt1, x == "a"),
aes(x=x, y=0, label=paste0("$",y," from ",y2," records")),hjust = - 0.5)

I'm going to misread programmatic as 'pragmatic'. Adding "+ scale_y_continuous(limits=c(0,max(dt1$y)+100))" created sufficient room for the labels. I lack the reputation to upload the plot.
ggplot() + geom_bar(data=dt1, aes(x=x, y=y), stat="identity",fill="red") + coord_flip() + geom_text(data=dt1, aes(x=x, y=y, label=paste0("$",y," from ",y2," records")),hjust=0) + scale_y_continuous(limits=c(0,max(dt1$y)+100))
Edit 2; I altered the code to retrieve the maximum value and add 100 to it. It's still not fitting the plot to include the text specifically but it'll work with fixed labels.
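If a fixed +100 feels too tied to this data, one possible alternative is to pad the axis proportionally with expansion() (available in ggplot2 3.3.0 and later). This is only a sketch: the 0.4 multiplier is a guess that still needs tuning to the label length and the rendered size, and `room` is a name I made up.

```r
library(ggplot2)

dt1 <- data.frame(x = c("a", "b", "c", "d", "e"),
                  y = c(43, 52, 296, 102, 157),
                  y2 = c(50, 10, 100, 45, 80))

# 5% padding below the data, 40% above it (assumed values, tweak as needed)
room <- expansion(mult = c(0.05, 0.4))

ggplot(dt1, aes(x = x, y = y)) +
  geom_col(fill = "red") +
  geom_text(aes(label = paste0("$", y, " from ", y2, " records")), hjust = 0) +
  coord_flip() +
  scale_y_continuous(expand = room)
```

Because the padding is relative to the data range, it scales with the data instead of relying on a hard-coded offset.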

Related

How to set automatic label position based on box height

In a previous question, I asked about moving the label position of a barplot outside of the bar if the bar was too small. I was provided this following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, and an ifelse statement is used to reposition the label. Is there a way to automatically extract the value to cut?
A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make any manual adjustments a bit easier.
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
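As a minimal sketch of the "package it into a function" idea: the helper below only computes the label y-positions, with the cutoff and shift expressed as fractions of the tallest bar. The name `label_positions` and its arguments are hypothetical, and it does not solve the grob-size problem mentioned above.

```r
library(ggplot2)

# Hypothetical helper: `cutoff` and `shift` are fractions of the tallest bar.
# Short bars get their label above the bar end, tall bars below it.
label_positions <- function(values, cutoff = 0.3, shift = 0.1) {
  top <- max(values)
  ifelse(values < cutoff * top, values + shift * top, values - shift * top)
}

dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
                      Anno = 1:10)
dataset$label_y <- label_positions(dataset$Riserva_Riv_Fine_Periodo)

ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
  geom_col(width = 0.8) +
  geom_text(aes(y = label_y, label = round(Riserva_Riv_Fine_Periodo, 0)),
            angle = 90, colour = "red", size = 4)
```

Changing the rendered size then only means adjusting `cutoff` and `shift` in one call.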
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they contain the information you want to convey while being able to have all the labels inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", size=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")

ggplot2 z order of multiple geoms (background to foreground)

I am trying to plot multiple lines with a surrounding area, using ggplot2, with geom_ribbon for the area and a centerline with geom_line. The values are overlapping, but I'd like each ribbon/line combination to be either bottom or top as a combination.
Here's a reproducible example:
library(ggplot2)
x <- 1:10
y <- c(1+x/100, 2-x, .1*x^2)
data <- data.frame(x=rep(x,3), y=y, lower=y-1, upper=y+1, color=rep(c('green', 'blue', 'yellow'), each=10))
In the example I can get the plot I want by using this code:
ggplot() +
geom_ribbon(data=data[data$color=='green',],aes(x=x, ymin=lower, ymax=upper, fill=paste0('light',color))) +
geom_line(data=data[data$color=='green',],aes(x=x, y=y, col=color)) +
geom_ribbon(data=data[data$color=='blue',],aes(x=x, ymin=lower, ymax=upper, fill=paste0('light',color))) +
geom_line(data=data[data$color=='blue',],aes(x=x, y=y, col=color)) +
geom_ribbon(data=data[data$color=='yellow',],aes(x=x, ymin=lower, ymax=upper, fill=paste0('light',color))) +
geom_line(data=data[data$color=='yellow',],aes(x=x, y=y, col=color)) +
scale_color_identity() +
scale_fill_identity()
But when I keep it simple and use this code
plot <- ggplot(data=data) +
geom_ribbon(aes(x=x, ymin=lower, ymax=upper, fill=paste0('light',color))) +
geom_line(aes(x=x, y=y, col=color)) +
scale_color_identity() +
scale_fill_identity()
the lines of the background data go over the 'top' ribbons, or if I switch the geom_line and geom_ribbon, my middle-lines are no longer visible.
For this example, the lengthy call works, but in my real data, I have a lot more lines, and I'd like to be able to switch lines from background to foreground dynamically.
Is there any way that I can tell ggplot2 that there is an ordering that has to switch between my different geoms?
P.S. I can't post images yet, sorry if my question seems unclear.
You could save some typing with a loop
ggplot(data=data) +
purrr::map(.x = split(data, f = data$color),
.f = function(d){
list(geom_ribbon(data=d,aes(x=x, ymin=lower, ymax=upper), fill=paste0('light',unique(d$color))),
geom_line(data=d,aes(x=x, y=y), col=unique(d$color)))
})
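Note that split() orders the groups alphabetically, so the loop above always draws blue, then green, then yellow. To switch which pair is in front, one option is to loop over an explicit drawing order instead. This is a sketch in base R; `layers_in_order` and `draw_order` are names I made up.

```r
library(ggplot2)

x <- 1:10
y <- c(1 + x / 100, 2 - x, .1 * x^2)
data <- data.frame(x = rep(x, 3), y = y, lower = y - 1, upper = y + 1,
                   color = rep(c("green", "blue", "yellow"), each = 10))

# Hypothetical helper: one ribbon + line pair per colour, in the given order.
# Layers added later are drawn on top, so the last entry ends up in front.
layers_in_order <- function(data, draw_order) {
  pairs <- lapply(draw_order, function(col) {
    d <- data[data$color == col, ]
    list(geom_ribbon(data = d, aes(x = x, ymin = lower, ymax = upper),
                     fill = paste0("light", col)),
         geom_line(data = d, aes(x = x, y = y), colour = col))
  })
  do.call(c, pairs)  # flatten to a single list of layers
}

draw_order <- c("yellow", "green", "blue")  # back to front
layers <- layers_in_order(data, draw_order)
ggplot() + layers
```

Reordering `draw_order` is then enough to move a ribbon/line pair between background and foreground.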

ggplot2: Shift the baseline of barplot (geom_bar) to the minimum data value

I'm trying to generate a bar plot using geom_bar. My bars have both negative and positive values:
set.seed(1)
df <- data.frame(y=log(c(runif(6,0,1),runif(6,1,10))),se=runif(12,0.05,0.1),name=factor(rep(c("a","a","b","b","c","c"),2),levels=c("a","b","c")),side=factor(rep(1:2,6),levels=1:2),group=factor(c(rep("x",6),rep("y",6)),levels=c("x","y")),stringsAsFactors=F)
This plot command plots the positive bars to face up and the negative ones to face down:
library(ggplot2)
dodge <- position_dodge(width=0.9)
limits <- aes(ymax=y+se,ymin=y-se)
ggplot(df,aes(x=name,y=y,group=interaction(side,name),col=group,fill=group))+facet_wrap(~group)+geom_bar(width=0.6,position=position_dodge(width=1),stat="identity")+
geom_bar(position=dodge,stat="identity")+geom_errorbar(limits,position=dodge,width=0.25)
My question is: how do I set the baseline to the minimum of all bars instead of 0, and therefore have the red bars facing up?
You can subtract min(df$y) from each value so that the data are shifted to a baseline of zero, but then relabel the y-axis to the actual values of the points. The code to do it is below, but I wouldn't recommend this. It seems confusing to have bars emanating from a non-zero baseline, as the lengths of the bars no longer encode the magnitudes of the y values.
ggplot(df, aes(x=name,y=y - min(y),group=interaction(side, name), col=group, fill=group)) +
facet_wrap(~group) +
geom_bar(position=dodge, stat="identity", width=0.8) +
geom_errorbar(aes(ymin=y-se-min(y), ymax=y+se-min(y)),
position=dodge, width=0.25, colour="black") +
scale_y_continuous(breaks=0:4, labels=round(0:4 + min(df$y), 1)) +
geom_hline(aes(yintercept=0))
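The hard-coded breaks=0:4 above only suit this particular data. A possible variant, sketched under the same caveats about non-zero baselines, is to let ggplot2 pick the breaks and shift only the labels with a function. The names `baseline` and `unshift` are my own; error bars are left out to keep the sketch short.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  y = log(c(runif(6, 0, 1), runif(6, 1, 10))),
  name = factor(rep(c("a", "a", "b", "b", "c", "c"), 2)),
  side = factor(rep(1:2, 6)),
  group = factor(rep(c("x", "y"), each = 6))
)

baseline <- min(df$y)

# Labelling function: map a break on the shifted axis back to the real value
unshift <- function(breaks) round(breaks + baseline, 1)

ggplot(df, aes(x = name, y = y - baseline,
               group = interaction(side, name), fill = group)) +
  facet_wrap(~group) +
  geom_col(position = position_dodge(width = 0.9)) +
  scale_y_continuous(labels = unshift)
```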
Another option is to use geom_linerange which avoids having to shift the y-values and relabel the y-axis. But this suffers from the same distortions as the bar plot above:
ggplot(df, aes(x=name, group=interaction(side, name), col=group, fill=group)) +
facet_wrap(~group) +
geom_linerange(aes(ymin=min(y), ymax=y, x=name, xend=name), position=dodge, size=10) +
geom_errorbar(aes(ymin=y-se, ymax=y+se), position=dodge, width=0.25, colour="black") +
geom_hline(aes(yintercept=min(y)))
Instead, it seems to me points would be more intuitive and natural than bars here:
ggplot(df, aes(x=name,y=y,group=interaction(side, name), col=group, fill=group)) +
facet_wrap(~group) +
geom_hline(yintercept=0, lwd=0.4, colour="grey50") +
geom_errorbar(limits, position=dodge, width=0.25) +
geom_point(position=dodge)
This simple hack also works:
m <- min(df$y) # find min
df$y <- df$y - m
ggplot(df,aes(x=name,y=y,group=interaction(side,name),col=group,fill=group))+
facet_wrap(~group)+
geom_bar(width=0.6,position=position_dodge(width=1),stat="identity")+
geom_bar(position=dodge,stat="identity")+
geom_errorbar(limits,position=dodge,width=0.25) +
scale_y_continuous(breaks=seq(min(df$y), max(df$y), length=5),labels=as.character(round(seq(m, max(df$y+m), length=5),2))) # relabel
I ran into the same problem and discovered you can also easily do this using geom_crossbar.
As long as color and fill are the same you don't see the break in the crossbar (set with y aesthetic) so they look exactly like bars.
library(ggplot2)
dodge <- position_dodge(width=0.9)
limits <- aes(ymax = y+se, ymin = y-se)
df$ymin <- min(df$y)
ggplot(df, aes(x = name, ymax = y, y = y, ymin = ymin, group = interaction(side,name), col = group, fill = group)) +
facet_wrap(~group) +
geom_crossbar(width=0.6,position=position_dodge(width=1),stat="identity") +
geom_errorbar(limits, color = 'black', position = dodge, width=0.25)

Edit 2 stat_hex_bin geoms separately ggplot2

I start by giving you my example code:
x <- runif(1000,0, 5)
y <- c(runif(500, 0, 2), runif(500, 3,5))
A <- data.frame("X"=x,"Y"=y[1:500])
B <- data.frame("X"=x,"Y"=y[501:1000])
ggplot() +
stat_bin_hex(data=A, aes(x=X, y=Y), bins=10) +
stat_bin_hex(data=B, aes(x=X, y=Y), bins=10) +
scale_fill_continuous(low="red4", high="#ED1A3A")
It produces the following plot:
Now I want the lower hexagons to follow a different scale. Namely ranging from a dark green to a lighter green. How can I achieve that?
Update:
As you can see from the answers so far, I am asking myself whether there is a solution without using alpha scales. Also, using two plots with no margin or something similar is not an option for my specific application. Though they both are legitimate answers :)
Rather than trying to get two different fill scales in one plot, you could alter the colours of the lower values after the plot has been built. The basic idea is to build two plots with differing fill scales and then copy certain details across from one plot to the other.
# Base plot
p <- ggplot() +
stat_bin_hex(data=A, aes(x=X, y=Y), bins=10) +
stat_bin_hex(data=B, aes(x=X, y=Y), bins=10)
# Produce two plots with different fill colours
p1 <- p + scale_fill_continuous(low="red4", high="#ED1A3A")
p2 <- p + scale_fill_continuous(low="darkgreen", high="lightgreen")
# Get fill colours for second plot and overwrite the corresponding
# values in the first plot
g1 <- ggplot_build(p1)
g2 <- ggplot_build(p2)
g1$data[[1]][,"fill"] <- g2$data[[1]][,"fill"]
# You can draw this now but there is only one legend
# (grid.draw() and grid.newpage() come from the grid package)
library(grid)
grid.draw(ggplot_gtable(g1))
To have two legends you can join the legends from the two plots together
# Bind the legends from the two plots together
g1 <- ggplot_gtable(g1)
g2 <- ggplot_gtable(g2)
g1$grobs[[grep("guide", g1$layout$name )]] <-
rbind(g1$grobs[[grep("guide", g1$layout$name )]],
g2$grobs[[grep("guide", g2$layout$name )]] )
grid.newpage()
grid.draw(g1)
Giving (from set.seed(10) prior to data generation)
This should provide more or less what you want
ggplot() +
stat_bin_hex(data=A, aes(x=X, y=Y, alpha=..count..), bins=10,fill="green") +
stat_bin_hex(data=B, aes(x=X, y=Y, alpha=..count..), bins=10,fill="red")
To keep the grey panel background from showing through the semi-transparent hexagons, you can underlay the plot with a white layer at the same location and darken the colours a bit, as the OP suggested in the comments:
# white underlay for the red layer, to show the impact of scale_alpha
ggplot() + scale_alpha_continuous(range=c(0.5,1)) + stat_bin_hex(data=A, aes(x=X, y=Y), bins=10,fill="white",show.legend = TRUE) +
stat_bin_hex(data=A, aes(x=X, y=Y, alpha=..count..), bins=10,fill="red",show.legend = TRUE) +
stat_bin_hex(data=B, aes(x=X, y=Y, alpha=..count..), bins=10,fill="green", show.legend=TRUE) + guides(fill="none", alpha="none")
An alternative, if you want more options to play with the colours: create two plots and remove all the space between them when combining them with grid.arrange() from the gridExtra package.
p1 <- ggplot() + stat_bin_hex(data=B, aes(x=X, y=Y), bins=10) +
scale_fill_continuous(low="red4", high="#ED1A3A") + xlab("") + theme(axis.text.x=element_blank(), axis.ticks.x=element_blank(), plot.margin=unit(c(1,1,-0.5,1), "cm")) + scale_y_continuous(limits = c(2.5, 5.5))
p2 <- ggplot() + stat_bin_hex(data=A, aes(x=X, y=Y), bins=10) + scale_fill_continuous(low="darkgreen", high="green") + theme(plot.margin=unit(c(-0.5,1,1,1), "cm")) + scale_y_continuous(limits = c(-0.5, 2.5))
grid.arrange(p1,p2)

y axis limits with bar graphs

I can display a bar graph, but the y axis always starts at zero, which doesn't make sense with this data:
y <- data.frame(x=c("a","b","c","d","e","f"), y=c(500,501,502,503,504,505))
ggplot(y, aes(x=x, y=y)) +
stat_summary(fun.y=mean, geom="bar")
Ideally, R would automatically set axis limits like it usually does. If I try to set manually, like as follows, my bars disappear. Any idea why?
y <- data.frame(x=c("a","b","c","d","e","f"), y=c(500,501,502,503,504,505))
ggplot(y, aes(x=x, y=y)) +
stat_summary(fun.y=mean, geom="bar") +
scale_y_continuous(limits=c(490,510))
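Your bars disappear because scale limits discard any data outside them, and every bar starts at 0, which lies outside c(490, 510). I think you can use coord_cartesian(ylim = ...) instead, which zooms the view without dropping data: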
y <- data.frame(x=c("a","b","c","d","e","f"), y=c(500,501,502,503,504,505))
ggplot(y, aes(x=x, y=y)) +
stat_summary(fun.y=mean, geom="bar") + coord_cartesian(ylim=c(490,510))
The resulting output is as follows:
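To see why the scale limits remove the bars, here is a small sketch. By default, continuous scales pass out-of-bounds values through scales::censor(), which turns them into NA; a bar's base at 0 is one such value. The variable names `censored`, `df` and `p` are mine.

```r
library(ggplot2)

# Default out-of-bounds handling for continuous scales: values outside the
# limits become NA. A bar's base sits at 0, which is outside c(490, 510).
censored <- scales::censor(c(0, 500), range = c(490, 510))
print(censored)

df <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                 y = c(500, 501, 502, 503, 504, 505))

# coord_cartesian() only zooms the view; no data are dropped.
# `fun` is the current name for the argument formerly called `fun.y`.
p <- ggplot(df, aes(x = x, y = y)) +
  stat_summary(fun = mean, geom = "bar") +
  coord_cartesian(ylim = c(490, 510))
p
```

Keep in mind that with a truncated axis the bar lengths no longer reflect the magnitudes, so this is a trade-off rather than a free fix.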
