I'm trying to annotate a bar chart with the percentage of observations falling into each bucket, within a facet. This question is closely related to
"Show % instead of counts in charts of categorical variables", but faceting introduces a wrinkle. The answer to the related question is to use stat_bin with the text geom and construct the label like so:
stat_bin(geom="text", aes(x = bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
))
This works fine for an un-faceted plot. However, with facets, sum(..count..) sums over the entire collection of observations without regard for the facets. The plot below illustrates the issue: note that the percentages do not sum to 100% within a panel.
Here is the actual code for the figure above:
g.invite.distro <- ggplot(data = df.exp) +
geom_bar(aes(x = invite_bins)) +
facet_wrap(~cat1, ncol=3) +
stat_bin(geom="text", aes(x = invite_bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
),
vjust = -1, size = 3) +
theme_bw() +
scale_y_continuous(limits = c(0, 3000))
UPDATE: As requested, here's a small example reproducing the issue:
df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
ggplot(data = df) + geom_bar(aes(x = x)) +
stat_bin(geom = "text", aes(
x = x,
y = ..count.., label = ..count../sum(..count..)), vjust = -1) +
facet_wrap(~f)
Update: geom_bar requires stat = "identity".
Sometimes it's easier to obtain summaries outside the call to ggplot.
df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
# Load packages
library(ggplot2)
library(plyr)
# Obtain summary. 'Freq' is the count, 'pct' is the percent within each 'f'
m = ddply(data.frame(table(df)), .(f), mutate, pct = round(Freq/sum(Freq) * 100, 1))
# Plot the data using the summary data frame
ggplot(data = m, aes(x = x, y = Freq)) +
geom_bar(stat = "identity", width = .7) +
geom_text(aes(label = paste(pct, "%", sep = "")), vjust = -1, size = 3) +
facet_wrap(~ f, ncol = 2) + theme_bw() +
scale_y_continuous(limits = c(0, 1.2*max(m$Freq)))
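If you would rather keep the calculation inside ggplot, the counts can be normalised within each panel. The following is a minimal sketch, assuming ggplot2 3.3.0 or later (for after_stat()) and relying on the PANEL column that ggplot2 adds to the computed layer data. Here geom_text(stat = "count") stands in for the stat_bin() call from the question.

```r
library(ggplot2)

df <- data.frame(x = c("a", "a", "b", "b"), f = c("c", "d", "d", "d"))

p <- ggplot(df, aes(x = x)) +
  geom_bar() +
  geom_text(
    stat = "count",
    # ave() sums the counts within each PANEL, so every label is a
    # within-facet share rather than a share of the whole data set
    aes(label = after_stat(
      paste0(round(100 * count / ave(count, PANEL, FUN = sum), 1), "%")
    )),
    vjust = -0.5
  ) +
  facet_wrap(~f)

# Inspect the labels computed for the text layer (layer 2)
layer_data(p, 2)[, c("PANEL", "label")]
```

For the toy data, the single bar in facet c is labelled 100% and the two bars in facet d are labelled 33.3% and 66.7%.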
Related
I've got a plot that looks like the output of the following code using the iris data
require(tidyverse)
require(purrr)
require(forcats) # Useful for ordering facets found at [here][1]
# Make some long data and set a custom sorting order using some of t
tbl <- iris %>%
pivot_longer(., cols = 1:4, names_to = "Msr", values_to = "Vls") %>%
mutate(Msr = factor(Msr)) %>%
mutate(plot_fct = fct_cross(Species, Msr)) %>%
mutate(plot_fct = fct_reorder(plot_fct, Vls))
# A function factory for minor log breaks found [here][1] (very helpful)
minor_breaks_log <- function(base) {
# Prevents lazy evaluation
force(base)
# Wrap calculation in a function that the outer function returns
function(limits) {
ggplot2:::calc_logticks(
base = base,
minpow = floor(log(limits[1], base = base)),
maxpow = ceiling(log(limits[2], base = base))
)$value
}
}
# Plot the images
ggplot(data = tbl, aes(x =plot_fct, y = Vls, fill = Species)) +
geom_violin() +
coord_flip() + # swap coords
scale_y_log10(labels = function(x) sprintf("%g", x),
minor_breaks = minor_breaks_log(10)) + # format for labels # box fills
theme_bw(base_size = 12) +
annotation_logticks(base = 10, sides = "b") +
facet_wrap(~Species, nrow = 1, scales = "free")
I would now like to list the number of observations per violin on the right side of each facet, just inside the maximum border. I'm sure this is possible, but I cannot seem to find an example that does this sort of labeling with violins and facets.
ggplot(data = tbl, aes(y = plot_fct, fill = Species)) +
geom_violin(aes(x = Vls)) +
geom_text(aes(label = after_stat(count)), hjust = 1,
stat = "count", position = "fill") +
scale_x_log10(labels = function(x) sprintf("%g", x),
minor_breaks = minor_breaks_log(10)) + # format for labels # box fills
theme_bw(base_size = 12) +
annotation_logticks(base = 10, sides = "b") +
facet_wrap(~Species, nrow = 1, scales = "free")
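Another way to get the counts "just inside the maximum border" is to compute them up front and pin the text to the panel edge with x = Inf. This is only a sketch: tbl is rebuilt here in base R as a stand-in for the question's data (interaction() replaces fct_cross(), without the reordering), n_obs is a hypothetical helper name, and a horizontal geom_violin() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Stand-in for the question's `tbl`, rebuilt in base R so the sketch is self-contained
tbl <- data.frame(
  Species = rep(iris$Species, times = 4),
  Msr     = rep(names(iris)[1:4], each = nrow(iris)),
  Vls     = unlist(iris[1:4], use.names = FALSE)
)
tbl$plot_fct <- interaction(tbl$Species, tbl$Msr, sep = ":")

# One row per violin, holding its number of observations
n_obs <- aggregate(Vls ~ plot_fct + Species, data = tbl, FUN = length)
names(n_obs)[names(n_obs) == "Vls"] <- "n"

p <- ggplot(tbl, aes(x = Vls, y = plot_fct)) +
  geom_violin(aes(fill = Species)) +
  # x = Inf pins the label to the right panel border; hjust > 1 nudges it inside
  geom_text(data = n_obs, aes(x = Inf, label = paste("n =", n)), hjust = 1.1) +
  scale_x_log10() +
  theme_bw(base_size = 12) +
  facet_wrap(~Species, nrow = 1, scales = "free")
```

Because the label position does not depend on the data range, it stays at the border even with free scales per facet.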
I have this data frame
df <- data.frame(profile = rep(c(1,2), times = 1, each = 3), depth = c(100, 200, 300), value = 1:3)
This is my plot
ggplot() +
geom_bar(data = df, aes(x = profile, y = - depth, fill = value), stat = "identity")
My problem is that the y-axis labels don't correspond to the depth values in the data frame.
To illustrate, my desired plot looks like this:
ggplot() +
geom_point(data = df, aes(x = profile, y = depth, colour = value), size = 20) +
xlim(c(0,3))
But with bars instead of vertically aligned points.
NB: I don't want to correct it manually by changing the ticks with scale_y_discrete(labels = (desired_labels)).
Thanks for your help.
Considering you want a y-axis from 0 to -300, facet_grid() seems to be a good option that avoids summarising the data.
ggplot() + geom_bar(data = df, aes(x = as.factor(profile), y = -depth, fill = value), stat = 'identity') + facet_grid(~ value)
I have it!
Thanks for your replies and for this post: "R, subtract value from previous row, group by".
To summarize, the data:
df <- data.frame(profile = rep(c(1,2), times = 1, each = 3), depth = c(100, 200, 300), value = 1:3)
Then we compute the depth step of each profile:
df$diff <- ave(df$depth, df$profile, FUN=function(z) c(z[1], diff(z)))
And finally the plot:
ggplot(df, aes(x = factor(profile), y = -diff, fill = value)) + geom_col()
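The reason the differences are needed is that bars sharing an x position are stacked, so their y values add up. A small base R check of that arithmetic (stacked_raw and stacked_diff are illustrative names only):

```r
df <- data.frame(profile = rep(c(1, 2), each = 3),
                 depth = c(100, 200, 300), value = 1:3)

# Stacking the raw depths overshoots: each profile would reach 100 + 200 + 300
stacked_raw <- tapply(df$depth, df$profile, sum)

# Per-profile increments instead: the first depth, then successive differences
df$diff <- ave(df$depth, df$profile, FUN = function(z) c(z[1], diff(z)))

# Stacking the increments reproduces the original depths
stacked_diff <- ave(df$diff, df$profile, FUN = cumsum)
```

Each profile would stack to 600 with the raw depths, whereas the cumulative increments end at 300, matching the deepest observation.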
I want to explore the directlabels package with ggplot. I am trying to plot labels at the endpoint of a simple line chart; however, the labels are clipped by the plot panel. (I intend to plot about 10 financial time series in one plot and I thought directlabels would be the best solution.)
I would imagine there may be another solution using annotate or some other geoms. But I would like to solve the problem using directlabels. Please see code and image below. Thanks.
library(ggplot2)
library(directlabels)
library(tidyr)
#generate data frame with random data, for illustration and plot:
x <- seq(1:100)
y <- cumsum(rnorm(n = 100, mean = 6, sd = 15))
y2 <- cumsum(rnorm(n = 100, mean = 2, sd = 4))
data <- as.data.frame(cbind(x, y, y2))
names(data) <- c("month", "stocks", "bonds")
# Reshape to long format: one row per month and asset
tidy_data <- gather(data, key = "asset", value = "value", -month)
p <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
geom_line() +
geom_dl(aes(colour = asset, label = asset), method = "last.points") +
theme_bw()
On data visualization principles, I would like to avoid extending the x-axis to make the labels fit--this would mean having data space with no data. Rather, I would like the labels to extend toward the white space beyond the chart box/panel (if that makes sense).
In my opinion, directlabels is the way to go. Indeed, I would position labels at the beginning and at the end of the lines, creating space for the labels with the expand argument of scale_x_continuous(). Also note that with the labels, there is no need for the legend.
This is similar to answers here and here.
library(ggplot2)
library(directlabels)
library(grid)
library(tidyr)
x <- seq(1:100)
y <- cumsum(rnorm(n = 100, mean = 6, sd = 15))
y2 <- cumsum(rnorm(n = 100, mean = 2, sd = 4))
data <- as.data.frame(cbind(x, y, y2))
names(data) <- c("month", "stocks", "bonds")
# Reshape to long format: one row per month and asset
tidy_data <- gather(data, key = "asset", value = "value", -month)
ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
scale_x_continuous(expand = c(0.15, 0)) +
geom_dl(aes(label = asset), method = list(dl.trans(x = x + .3), "last.bumpup")) +
geom_dl(aes(label = asset), method = list(dl.trans(x = x - .3), "first.bumpup")) +
theme_bw()
If you prefer to push the labels into the plot margin, direct labels will do that. But because the labels are positioned outside the plot panel, clipping needs to be turned off.
p1 <- ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
scale_x_continuous(expand = c(0, 0)) +
geom_dl(aes(label = asset), method = list(dl.trans(x = x + .3), "last.bumpup")) +
theme_bw() +
theme(plot.margin = unit(c(1,4,1,1), "lines"))
# Code to turn off clipping
gt1 <- ggplotGrob(p1)
gt1$layout$clip[gt1$layout$name == "panel"] <- "off"
grid.draw(gt1)
This effect can also be achieved using geom_text (and probably also annotate), that is, without the need for direct labels.
p2 = ggplot(tidy_data, aes(x = month, y = value, group = asset, colour = asset)) +
geom_line() +
geom_text(data = subset(tidy_data, month == 100),
aes(label = asset, colour = asset, x = Inf, y = value), hjust = -.2) +
scale_x_continuous(expand = c(0, 0)) +
scale_colour_discrete(guide = 'none') +
theme_bw() +
theme(plot.margin = unit(c(1,3,1,1), "lines"))
# Code to turn off clipping
gt2 <- ggplotGrob(p2)
gt2$layout$clip[gt2$layout$name == "panel"] <- "off"
grid.draw(gt2)
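With ggplot2 3.0.0 or later, clipping can also be switched off through the coordinate system, which avoids editing the gtable by hand. A minimal sketch using geom_text, with tidy_data regenerated in base R (and a seed added) so the block runs on its own:

```r
library(ggplot2)

set.seed(1)
tidy_data <- data.frame(
  month = rep(1:100, times = 2),
  asset = rep(c("stocks", "bonds"), each = 100),
  value = c(cumsum(rnorm(100, mean = 6, sd = 15)),
            cumsum(rnorm(100, mean = 2, sd = 4)))
)

p3 <- ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
  geom_line() +
  # Label only the last observation of each series, just right of the panel
  geom_text(data = subset(tidy_data, month == max(month)),
            aes(label = asset), hjust = -0.1) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_colour_discrete(guide = "none") +
  # Let layers draw outside the panel area
  coord_cartesian(clip = "off") +
  theme_bw() +
  # Widen the right margin so the labels have room
  theme(plot.margin = unit(c(1, 4, 1, 1), "lines"))
```

The plot can then be printed as usual; no ggplotGrob() or grid.draw() step is needed.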
Since you didn't provide a reproducible example, it's hard to say what the best solution is. However, I would suggest trying to manually adjust the x-scale. Use a "buffer" to increase the plot area.
# minimum_value, maximum_value and buffer are placeholders for your own values
p <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
geom_line() +
geom_dl(aes(colour = asset, label = asset), method = "last.points") +
theme_bw() +
xlim(minimum_value, maximum_value + buffer)
Using scale_x_discrete() or scale_x_continuous() would likely also work well here if you want to use the direct labels package. Alternatively, annotate or a simple geom_text would also work well.
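As a concrete version of the buffer idea, scale_x_continuous() can pad only the right-hand side. This sketch assumes ggplot2 3.3.0 or later for expansion(), regenerates the data in base R, and uses geom_text in place of geom_dl to stay dependency-free. The 15% figure is an arbitrary choice.

```r
library(ggplot2)

set.seed(42)
tidy_data <- data.frame(
  month = rep(1:100, times = 2),
  asset = rep(c("stocks", "bonds"), each = 100),
  value = c(cumsum(rnorm(100, mean = 6, sd = 15)),
            cumsum(rnorm(100, mean = 2, sd = 4)))
)

p <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
  geom_line() +
  geom_text(data = subset(tidy_data, month == max(month)),
            aes(label = asset), hjust = -0.1, show.legend = FALSE) +
  # Keep the default 5% padding on the left, add 15% of the data range on the right
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.15))) +
  theme_bw()

# x range of the panel after expansion
x_range <- ggplot_build(p)$layout$panel_params[[1]]$x.range
```

The upper end of x_range extends past month 100, which is the room the labels occupy.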
When using ggplot2 to make faceted plots, I'm having trouble getting individual labels in each facet when I also specify a grouping parameter. Without specifying group = ..., things work fine, but I'm trying to make plots of paired data that emphasize the before vs. after treatment changes.
Here is an example:
library(tidyr)
library(ggplot2)
set.seed(253)
data <- data.frame(Subject = LETTERS[1:10],
Day1.CompoundA = rnorm(10, 4, 2),
Day2.CompoundA = rnorm(10, 7, 2),
Day1.CompoundB = rnorm(10, 5, 2),
Day2.CompoundB = rnorm(10, 5.5, 2))
# Compare concentration of compounds by day
A <- t.test(data$Day1.CompoundA, data$Day2.CompoundA, paired = TRUE)
B <- t.test(data$Day1.CompoundB, data$Day2.CompoundB, paired = TRUE)
data.long <- gather(data, key = DayCompound, value = Concentration, -Subject) %>%
separate(DayCompound, c("Day", "Compound"))
# text to annotate graphs
graphLabels <- data.frame(Compound = c("CompoundA", "CompoundB"),
Pval = paste("p =", c(signif(A$p.value, 2),
signif(B$p.value, 2))))
Ok, now that the data are set up, I can make a boxplot just fine:
ggplot(data.long, aes(x = Day, y = Concentration)) +
geom_boxplot() +
facet_wrap(~ Compound) +
geom_text(data = graphLabels, aes(x = 1.5, y = 10, label = Pval))
But if I want to show line plots that emphasize the paired nature of the data by showing each subject in a different color, the facet labels don't work.
ggplot(data.long, aes(x = Day, y = Concentration, color = Subject, group = Subject)) +
geom_point() + geom_line() +
facet_wrap(~ Compound) +
geom_text(data = graphLabels, aes(x = 1.5, y = 10, label = Pval))
# Error in eval(expr, envir, enclos) : object 'Subject' not found
Any suggestions?
When you map aesthetics (i.e. aes(...,color = Subject)) in the top level ggplot() call, those mappings are passed on to each layer, which means that each layer expects data to have variables by those names.
You either need to specify the data and mapping separately in each layer, or unmap them explicitly:
ggplot(data.long, aes(x = Day, y = Concentration, color = Subject, group = Subject)) +
geom_point() + geom_line() +
facet_wrap(~ Compound) +
geom_text(data = graphLabels, aes(x = 1.5, y = 10, label = Pval, color = NULL, group = NULL))
There is also an inherit.aes argument that you can set to FALSE in any layer you don't want pulling in those other mappings, e.g.
ggplot(data.long, aes(x = Day, y = Concentration, color = Subject, group = Subject)) +
geom_point() + geom_line() +
facet_wrap(~ Compound) +
geom_text(data = graphLabels, aes(x = 1.5, y = 10, label = Pval), inherit.aes = FALSE)
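The first option mentioned above, mapping the aesthetics separately in each layer, could look like the sketch below. The data are a hypothetical toy set that only reuses the question's column names, and the label strings are placeholders rather than real p-values.

```r
library(ggplot2)

# Hypothetical toy data with the same column names as the question
data.long <- data.frame(
  Subject = rep(c("A", "B", "C"), times = 4),
  Day = rep(rep(c("Day1", "Day2"), each = 3), times = 2),
  Compound = rep(c("CompoundA", "CompoundB"), each = 6),
  Concentration = c(3, 4, 5, 6, 8, 7, 5, 4, 6, 5, 6, 5)
)
graphLabels <- data.frame(Compound = c("CompoundA", "CompoundB"),
                          Pval = c("label for A", "label for B"))

# Only x and y are mapped globally; colour and group live in the layers
# that use data.long, so geom_text never looks for 'Subject'
p <- ggplot(data.long, aes(x = Day, y = Concentration)) +
  geom_point(aes(color = Subject)) +
  geom_line(aes(color = Subject, group = Subject)) +
  facet_wrap(~ Compound) +
  geom_text(data = graphLabels, aes(x = 1.5, y = 10, label = Pval))
```

This costs a little repetition in the point and line layers, but no layer has to undo a mapping it inherited.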
I'd like to ask if it's possible to label each of the points plotted by stat_sum with the percentage (i.e. the proportion) of the observations that that point represents. Ideally I would like the label to be in percent format rather than decimal.
Many thanks for your time.
Edit: Minimal reproducible example
library("ggplot2")
library("scales")
ggplot(diamonds, aes(x = cut, y = clarity)) +
stat_sum(aes(group = 1)) +
scale_size_continuous(labels=percent)
Image of the resulting plot
So my question is, how (if possible) to label each of those summary points with their 'prop' percentage value.
There are a few options. I'll assume that the legend is not needed given that the points are labelled with percentage counts.
One option is to add another stat_sum() function that contains a label aesthetic and a "text" geom. For instance:
library("ggplot2")
ggplot(diamonds, aes(x = cut, y = clarity, group = 1)) +
stat_sum(geom = "point", show.legend = FALSE) +
stat_sum(aes(label = paste(round(..prop.. * 100, 2), "%", sep = "")),
size = 3, hjust = -0.4, geom = "text", show.legend = FALSE)
Or, there may be no need for the points. The labels can do all the work - show location and size:
ggplot(diamonds, aes(x = cut, y = clarity, group = 1)) +
stat_sum(aes(label = paste(round(..prop.. * 100, 2), "%", sep = "")),
geom = "text", show.legend = FALSE) +
scale_size(range=c(2, 8))
Sometimes it is easier to create a summary table outside ggplot:
library(plyr)
df = transform(ddply(diamonds, .(cut, clarity), "nrow"),
percent = round(nrow/sum(nrow)*100, 2))
ggplot(df, aes(x = cut, y = clarity)) +
geom_text(aes(size = percent, label = paste(percent, "%", sep = "")),
show.legend = FALSE) +
scale_size(range = c(2, 8))
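On recent ggplot2 versions the ..prop.. notation is deprecated in favour of after_stat(), and scales::percent() can do the formatting. A sketch of the first option in that style, assuming ggplot2 3.3.0 or later and the scales package:

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = cut, y = clarity, group = 1)) +
  stat_sum(geom = "point", show.legend = FALSE) +
  # prop is computed by stat_sum; percent() turns e.g. 0.0123 into "1.2%"
  stat_sum(aes(label = after_stat(scales::percent(prop, accuracy = 0.1))),
           geom = "text", size = 3, hjust = -0.4, show.legend = FALSE)

# Computed values behind the text layer (layer 2)
label_data <- layer_data(p, 2)
```

With group = 1 the proportions are taken over the whole panel, so the prop column of label_data sums to 1.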