Add data label to bar chart in R [duplicate] - r

I'd like to have some labels stacked on top of a geom_bar graph. Here's an example:
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
ggplot(df) + geom_bar(aes(x,fill=x)) + opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),axis.title.x=theme_blank(),legend.title=theme_blank(),axis.title.y=theme_blank())
Now
table(df$x)
FALSE TRUE
3 5
I'd like to have the 3 and 5 on top of the two bars. Even better if I could have the percent values as well. E.g. 3 (37.5%) and 5 (62.5%). Like so:
(source: skitch.com)
Is this possible? If so, how?

To add text to a ggplot, use geom_text(). I find it helpful to summarise the data first with ddply() from the plyr package:
library(plyr)
dfl <- ddply(df, .(x), summarize, y=length(x))
str(dfl)
Since the data is pre-summarized, remember to add the stat="identity" parameter to geom_bar:
ggplot(dfl, aes(x, y=y, fill=x)) + geom_bar(stat="identity") +
geom_text(aes(label=y), vjust=0) +
opts(axis.text.x=theme_blank(),
axis.ticks=theme_blank(),
axis.title.x=theme_blank(),
legend.title=theme_blank(),
axis.title.y=theme_blank()
)

As with many tasks in ggplot, the general strategy is to put what you'd like to add to the plot into a data frame in a way such that the variables match up with the variables and aesthetics in your plot. So for example, you'd create a new data frame like this:
dfTab <- as.data.frame(table(df))
colnames(dfTab)[1] <- "x"
dfTab$lab <- as.character(100 * dfTab$Freq / sum(dfTab$Freq))
This way the x variable matches the corresponding variable in df, and so on. Then you simply add it to the plot using geom_text:
ggplot(df) + geom_bar(aes(x,fill=x)) +
geom_text(data=dfTab,aes(x=x,y=Freq,label=lab),vjust=0) +
opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),
axis.title.x=theme_blank(),legend.title=theme_blank(),
axis.title.y=theme_blank())
This example will plot just the percentages, but you can paste together the counts as well via something like this:
dfTab$lab <- paste(dfTab$Freq,paste("(",dfTab$lab,"%)",sep=""),sep=" ")
Note that opts() and theme_blank() no longer work in current versions of ggplot2; use theme() and element_blank() instead.
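For readers on a current ggplot2 release, the data-frame approach from this answer could be sketched as follows, with theme() and element_blank() in place of opts() and theme_blank(). This is a minimal sketch rather than the original answer's code: the sprintf() label format and the vjust offset are my own assumptions.

```r
library(ggplot2)

df <- data.frame(x = factor(c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)))

# One row per bar: the count plus a "count (percent%)" label
dfTab <- as.data.frame(table(x = df$x))
dfTab$lab <- sprintf("%d (%.1f%%)", dfTab$Freq, 100 * dfTab$Freq / sum(dfTab$Freq))

p <- ggplot(df) +
  geom_bar(aes(x, fill = x)) +
  # vjust = -0.3 lifts the text slightly above each bar (assumed value)
  geom_text(data = dfTab, aes(x = x, y = Freq, label = lab), vjust = -0.3) +
  theme(
    axis.text.x  = element_blank(),
    axis.ticks   = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    legend.title = element_blank()
  )
# print(p) draws the plot
```

Building the label in a single sprintf() call also avoids the nested paste() calls, at the cost of fixing the number of decimals.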

Another solution is to use stat_count() when dealing with discrete variables (and stat_bin() with continuous ones).
ggplot(data = df, aes(x = x)) +
geom_bar(stat = "count") +
stat_count(geom = "text", colour = "white", size = 3.5,
aes(label = ..count..),position=position_stack(vjust=0.5))

So, this is our initial plot:
library(ggplot2)
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
p <- ggplot(df, aes(x = x, fill = x)) +
geom_bar()
p
As suggested by yuan-ning, we can use stat_count().
geom_bar() uses stat_count() by default. As mentioned in the ggplot2 reference, stat_count() returns two values: count, the number of points in each bin, and prop, the groupwise proportion. Since our groups match the x values, both props are 1 and aren’t useful. But count (referred to as “..count..”) gives the bar heights, so we can use it in our geom_text(). Note that we must also pass “stat = 'count'” to the geom_text() call.
Since we want both counts and percentages in our labels, we’ll need some calculations and string pasting in our “label” aesthetic instead of just “..count..”. I prefer to add a line of code that creates a percent formatting function with the “scales” package (installed along with “ggplot2” as a dependency).
pct_format = scales::percent_format(accuracy = .1)
p <- p + geom_text(
aes(
label = sprintf(
'%d (%s)',
..count..,
pct_format(..count.. / sum(..count..))
)
),
stat = 'count',
nudge_y = .2,
colour = 'royalblue',
size = 5
)
p
Of course, you can further edit the labels with colour, size, nudges, adjustments etc.
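Recent ggplot2 releases deprecate the “..count..” notation in favour of after_stat(). Assuming a ggplot2 version that provides after_stat(), the same labels could be produced roughly like this. It is a sketch, not the answerer's code; the vjust offset and the use of sprintf() for the percentage are my own choices.

```r
library(ggplot2)

df <- data.frame(x = factor(c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)))

p <- ggplot(df, aes(x = x, fill = x)) +
  geom_bar() +
  geom_text(
    # count is computed by stat_count(); sum(count) is the total over all bars
    aes(label = after_stat(sprintf("%d (%.1f%%)", count, 100 * count / sum(count)))),
    stat = "count",
    vjust = -0.3  # assumed offset that places the text just above each bar
  )
# print(p) draws the plot
```

Wrapping the whole expression in after_stat() keeps every reference to count inside the computed-variable stage, so no separate summary data frame is needed.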

Related

How to plot multiple boxplots with numeric x values properly in ggplot2?

I am trying to get a boxplot with 3 different tools in each dataset size like the one below:
ggplot(data1, aes(x = dataset, y = time, color = tool)) + geom_boxplot() +
labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
scale_y_log10() + theme_bw()
But I need to transform x-axis to log scale. For that, I need to numericize each dataset to be able to transform them to log scale. Even without transforming them, they look like the one below:
ggplot(data2, aes(x = dataset, y = time, color = tool)) + geom_boxplot() +
labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
scale_y_log10() + theme_bw()
I checked boxplot parameters and grouping parameters of aes, but could not resolve my problem. At first, I thought this problem is caused by scaling to log, but removing those elements did not resolve the problem.
What am I missing exactly? Thanks...
Files are in this link. "data2" is the numericized version of "data1".
Your question was a tough cookie, but I learned something new from it!
Just using group = dataset is not sufficient because you also have the tool variable to look out for. After digging around a bit, I found this post which made use of the interaction() function.
This is the trick that was missing. You want to use group because you are not using a factor for the x values, but you need to include tool in the separation of your data (hence using interaction() which will compute the possible crosses between the 2 variables).
# This is for pretty-printing the axis labels
my_labs <- function(x){
paste0(x/1000, "k")
}
levs <- unique(data2$dataset)
ggplot(data2, aes(x = dataset, y = time, color = tool,
group = interaction(dataset, tool))) +
geom_boxplot() + labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
scale_x_log10(breaks = levs, labels = my_labs) + # define a log scale with your axis ticks
scale_y_log10() + theme_bw()
This produces the desired plot: one box per tool at each dataset size, on a log-scaled x axis.
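Since the linked files are not reproduced here, the grouping trick can be tried on made-up data. In this sketch, toy, its time values, and the variable g are hypothetical stand-ins for data2, not the asker's data.

```r
library(ggplot2)

# Hypothetical stand-in for data2: 3 dataset sizes x 3 tools x 5 runs
toy <- expand.grid(dataset = c(1000, 10000, 100000),
                   tool = c("A", "B", "C"),
                   run = 1:5)
toy$time <- toy$dataset / 1000 * as.integer(toy$tool) + toy$run

# interaction() yields one grouping level per dataset/tool combination
g <- interaction(toy$dataset, toy$tool)
nlevels(g)

p <- ggplot(toy, aes(x = dataset, y = time, colour = tool,
                     group = interaction(dataset, tool))) +
  geom_boxplot() +
  scale_x_log10(breaks = unique(toy$dataset),
                labels = function(x) paste0(x / 1000, "k")) +
  scale_y_log10() +
  theme_bw()
# print(p) draws the plot
```

With a numeric x and no explicit group, ggplot2 would only split the data by the colour variable, which is why each tool ends up with a single box spanning all dataset sizes.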

Best way to calculate number of facets in geom_hline/_vline

When I combine geom_vline() with facet_grid() like so:
DATA <- data.frame(x = 1:6,y = 1:6, f = rep(letters[1:2],3))
ggplot(DATA,aes(x = x,y = y)) +
geom_point() +
facet_grid(f~.) +
geom_vline(xintercept = 2:3,
colour =c("goldenrod3","dodgerblue3"))
I get an error message stating Error: Aesthetics must be either length 1 or the same as the data (4): colour because there are two lines in each facet and there are two facets. One way to get around this is to use rep(c("goldenrod3","dodgerblue3"),2), but this requires that every time I change the faceting variables, I also have to calculate the number of facets and replace the magic number (2) in the call to rep(), which makes re-using ggplot code so much less nimble.
Is there a way to get the number of facets directly from ggplot for use in this situation?
You could put the xintercept and colour info into a data.frame to pass to geom_vline and then use scale_color_identity.
ggplot(DATA, aes(x = x, y = y)) +
geom_point() +
facet_grid(f~.) +
geom_vline(data = data.frame(xintercept = 2:3,
colour = c("goldenrod3","dodgerblue3") ),
aes(xintercept = xintercept, color = colour) ) +
scale_color_identity()
This side-steps the issue of figuring out the number of facets, although that could be done by pulling out the number of unique values in the faceting variable with something like length(unique(DATA$f)).
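If you would rather keep the colour vector and compute the repetition count instead of hard-coding it, the idea mentioned above might look like the following. It only builds the vector described in the question; n_facets and line_cols are names I made up, and whether geom_vline() accepts a colour vector of that length may depend on your ggplot2 version.

```r
DATA <- data.frame(x = 1:6, y = 1:6, f = rep(letters[1:2], 3))

# Number of facets = number of distinct values of the faceting variable
n_facets <- length(unique(DATA$f))

# Repeat the two line colours once per facet instead of using a magic number
line_cols <- rep(c("goldenrod3", "dodgerblue3"), times = n_facets)
line_cols
```

The data-frame approach with scale_color_identity() remains the more robust option, because it does not depend on the facet count at all.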

facet_wrap Title wrapping & Decimal places on free_y axis (ggplot2)

I have a set of code that produces multiple plots using facet_wrap:
ggplot(summ,aes(x=depth,y=expr,colour=bank,group=bank)) +
geom_errorbar(aes(ymin=expr-se,ymax=expr+se),lwd=0.4,width=0.3,position=pd) +
geom_line(aes(group=bank,linetype=bank),position=pd) +
geom_point(aes(group=bank,pch=bank),position=pd,size=2.5) +
scale_colour_manual(values=c("coral","cyan3", "blue")) +
facet_wrap(~gene,scales="free_y") +
theme_bw()
With the reference datasets, this code produces figures like this:
I am trying to accomplish two goals here:
Keep the auto scaling of the y axis, but make sure only 1 decimal place is displayed across all the plots. I have tried creating a new column of the rounded expr values, but it causes the error bars to not line up properly.
I would like to wrap the titles. I have tried changing the font size as in Change plot title sizes in a facet_wrap multiplot, but some of the gene names are too long and will end up being too small to read if I cram them on a single line. Is there a way to wrap the text, using code within the facet_wrap statement?
This probably cannot serve as a definitive answer, but here are some pointers regarding your questions:
Formatting the y-axis scale labels.
First, let's try the direct solution using the format function. Here we format all y-axis labels to show 1 decimal place, after rounding the values with round.
formatter <- function(...){
function(x) format(round(x, 1), ...)
}
mtcars2 <- mtcars
sp <- ggplot(mtcars2, aes(x = mpg, y = qsec)) + geom_point() + facet_wrap(~cyl, scales = "free_y")
sp <- sp + scale_y_continuous(labels = formatter(nsmall = 1))
The issue is that this approach is not always practical. Take the leftmost plot from your figure, for example. Using the same formatting, all of its y-axis labels would be rounded to -0.3, which is not desirable.
The other solution is to modify the breaks for each plot into a set of rounded values. But again, taking the leftmost plot of your figure as an example, it would end up with just one label, at -0.3.
Yet another solution is to format the labels in scientific notation. For simplicity, you can modify the formatter function as follows:
formatter <- function(...){
  function(x) format(x, ..., scientific = TRUE, digits = 2)
}
Now you have a uniform format for the y-axes of all plots. My suggestion, though, is to show the labels with 2 decimal places after rounding.
Wrap facet titles
This can be done using labeller argument in facet_wrap.
# Modify cyl into factors
mtcars2$cyl <- c("Four Cylinder", "Six Cylinder", "Eight Cylinder")[match(mtcars2$cyl, c(4,6,8))]
# Redraw the graph
sp <- ggplot(mtcars2, aes(x = mpg, y = qsec)) + geom_point() +
facet_wrap(~cyl, scales = "free_y", labeller = labeller(cyl = label_wrap_gen(width = 10)))
sp <- sp + scale_y_continuous(labels = formatter(nsmall = 2))
Note that the wrapping function only breaks lines at spaces. So, in your case, you might need to modify your variables.
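As a rough illustration of that last point, suppose the gene names use underscores instead of spaces (the names below are invented). Replacing the separators with spaces first gives the wrapper something to break on. Here strwrap() is used only to show space-based wrapping in plain R.

```r
# Hypothetical gene names; the real ones are not shown in the question
genes <- c("heat_shock_protein_70", "actin")

# Replace underscores with spaces so the text can be wrapped at word boundaries
spaced <- gsub("_", " ", genes, fixed = TRUE)

# Wrap each name to roughly 10 characters per line
wrapped <- vapply(
  spaced,
  function(s) paste(strwrap(s, width = 10), collapse = "\n"),
  character(1),
  USE.NAMES = FALSE
)
cat(wrapped, sep = "\n---\n")
```

In the plot itself you would apply the gsub() step to the faceting column and keep labeller = labeller(gene = label_wrap_gen(width = 10)) in facet_wrap(), assuming the column is called gene as in the question.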
This only solves the first part of the question. You can create a function to format your axis labels and pass it to scale_y_continuous.
df <- data.frame(x=rnorm(11), y1=seq(2, 3, 0.1) + 10, y2=rnorm(11))
library(ggplot2)
library(reshape2)
df <- melt(df, 'x')
# Before
ggplot(df, aes(x=x, y=value)) + geom_point() +
  facet_wrap(~ variable, scales="free")
# label function
f <- function(x){
  format(round(x, 1), nsmall=1)
}
# After
ggplot(df, aes(x=x, y=value)) + geom_point() +
  facet_wrap(~ variable, scales="free") +
  scale_y_continuous(labels=f)
scale_*_continuous(..., labels = function(x) sprintf("%0.0f", x)) worked in my case.
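To see how the two labelling approaches from these answers differ, you can call them on a few sample break values outside of any plot. The vector x and the name g below are only for illustration.

```r
x <- c(2, 2.46, 10)

# format() pads to a common width, so shorter labels get leading spaces
f <- function(x) format(round(x, 1), nsmall = 1)
f(x)

# sprintf() rounds and formats in one step, without padding
g <- function(x) sprintf("%.1f", x)
g(x)
```

Either function can be passed as labels = f or labels = g to scale_y_continuous(); the sprintf() variant is the same idea as the "%0.0f" one-liner above, just with one decimal place.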
