ggplot generally does a good job of creating sensible breaks and labels in scales.
However, I find that in plots with many facets and perhaps a formatter= statement, the labels tend to get too "dense" and overprint, for example in this picture:
df <- data.frame(
fac=rep(LETTERS[1:10], 100),
x=rnorm(1000)
)
ggplot(df, aes(x=x)) +
geom_bar(binwidth=0.5) +
facet_grid(~fac) +
scale_x_continuous(formatter="percent")
I know that I can specify the breaks and labels of scales explicitly, by providing breaks= and labels= arguments to scale_x_continuous.
However, I am processing survey data with many questions and a dozen crossbreaks, so need to find a way to do this automatically.
Is there a way of telling ggplot to calculate breaks and labels automatically, but just have fewer, say at the minimum, maximum and zero point?
EDIT: Ideally, I don't want to specify the minimum and maximum points, but somehow tap into the built-in ggplot training of scales, and use the default calculated scale limits.
You can pass in arguments such as min() and max() in your call to ggplot to dynamically specify the breaks. It sounds like you are going to be applying this across a wide variety of data so you may want to consider generalizing this into a function and messing with the formatting, but this approach should work:
ggplot(df, aes(x=x)) +
geom_bar(binwidth=0.5) +
facet_grid(~fac) +
scale_x_continuous(breaks = c(min(df$x), 0, max(df$x))
, labels = c(paste( 100 * round(min(df$x),2), "%", sep = ""), paste(0, "%", sep = ""), paste( 100 * round(max(df$x),2), "%", sep = ""))
)
or rotate the x-axis text with opts(axis.text.x = theme_text(angle = 90, hjust = 0)) to produce something like:
Update
In the latest version of ggplot2 the breaks and labels arguments to scale_x_continuous accept functions, so one can do something like the following:
myBreaks <- function(x){
breaks <- c(min(x),median(x),max(x))
names(breaks) <- attr(breaks,"labels")
breaks
}
ggplot(df, aes(x=x)) +
geom_bar(binwidth=0.5) +
facet_grid(~fac) +
scale_x_continuous(breaks = myBreaks,labels = percent_format()) +
opts(axis.text.x = theme_text(angle = 90, hjust = 1,size = 5))
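In current ggplot2 releases opts() and theme_text() have been replaced by theme() and element_text(), and a function passed to breaks= receives the scale limits. Here is a minimal sketch of the same idea under those assumptions. limit_breaks is a hypothetical helper name, and geom_histogram() stands in for the old geom_bar(binwidth=) call:

```r
library(ggplot2)
library(scales)

set.seed(1)
df <- data.frame(fac = rep(LETTERS[1:10], 100), x = rnorm(1000))

# Hypothetical helper: put breaks at the lower limit, at zero (only if zero
# lies inside the limits) and at the upper limit of the trained scale.
limit_breaks <- function(limits) {
  b <- c(limits[1], 0, limits[2])
  sort(unique(b[b >= limits[1] & b <= limits[2]]))
}

p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 0.5) +
  facet_grid(~fac) +
  scale_x_continuous(breaks = limit_breaks,
                     labels = label_percent(accuracy = 1)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 5))
```

Because the function only sees the limits, nothing about the data has to be hard-coded, which is what the question asked for.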
The scales package contains several breaks_* and label_* functions which return functions (closures) that are used by ggplot. So, you can write wrappers for these that modify the output.
For example:
library(ggplot2)
# Compute the list of breaks using original_func,
# then remove any of these that occur in remove_list
remove_breaks <- function(original_func, remove_list = list()) {
function(x) {
original_result <- original_func(x)
original_result[!(original_result %in% remove_list)]
}
}
# Compute the list of labels using original_func,
# then remove any of these that occur in remove_list
remove_labels <- function(original_func, remove_list = list()) {
function(x) {
original_result <- original_func(x)
replace(original_result, original_result %in% remove_list, '')
}
}
# Original plot
ggplot(data.frame(x=c(1,2,3,4,5,6,7,8), y = c(1,4,9,16,25,36,49,64))) + geom_line(aes(x, y)) +
scale_x_continuous(breaks = scales::breaks_pretty(9),
minor_breaks = scales::breaks_pretty(18),
labels = scales::label_number_auto()) +
scale_y_continuous(breaks = scales::breaks_pretty(9),
minor_breaks = scales::breaks_pretty(18),
labels = scales::label_number_auto())
# Remove some breaks from the x-axis, and remove some labels from the y-axis
ggplot(data.frame(x=c(1,2,3,4,5,6,7,8), y = c(1,4,9,16,25,36,49,64))) + geom_line(aes(x, y)) +
scale_x_continuous(breaks = remove_breaks(scales::breaks_pretty(9), seq(3,6)),
minor_breaks = remove_breaks(scales::breaks_pretty(18), seq(3,6,0.5)),
labels = scales::label_number_auto()) +
scale_y_continuous(breaks = scales::breaks_pretty(9),
minor_breaks = scales::breaks_pretty(18),
labels = remove_labels(scales::label_number_auto(), seq(20, 30)))
Of course, with my simple remove_breaks and remove_labels functions you still have to specify which values to remove, but you can easily modify these to something that removes the max and min value, removes any value in a specified range, etc.
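As a rough sketch of the two variations mentioned above (drop_extreme_breaks and remove_breaks_in_range are hypothetical names, written in the same closure style as remove_breaks):

```r
# Drop the smallest and largest break returned by original_func
drop_extreme_breaks <- function(original_func) {
  function(x) {
    b <- original_func(x)
    b <- b[!is.na(b)]
    if (length(b) == 0) return(b)
    b[b > min(b) & b < max(b)]
  }
}

# Drop every break that falls inside the closed interval [lower, upper]
remove_breaks_in_range <- function(original_func, lower, upper) {
  function(x) {
    b <- original_func(x)
    b[is.na(b) | b < lower | b > upper]
  }
}

# Quick check with a stand-in break function that ignores its input
fixed_breaks <- function(x) 1:8
drop_extreme_breaks(fixed_breaks)(c(1, 8))
remove_breaks_in_range(fixed_breaks, 3, 6)(c(1, 8))
```

Either wrapper can be passed to breaks= or minor_breaks= in the same way as remove_breaks, for example breaks = drop_extreme_breaks(scales::breaks_pretty(9)).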
Related
I currently have a plot and have used facet_zoom to focus on records between 0 and 10 in the x axis. The following code reproduces an example:
require(ggplot2)
require(ggforce)
require(dplyr)
x <- rnorm(10000, 50, 25)
y <- rexp(10000)
data <- data.frame(x, y)
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10))
I want to change the breaks on the zoomed portion of the graph to be the equivalent of:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10)) +
scale_x_continuous(breaks = seq(0,10,2))
But this changes the breaks of the original plot as well. Is it possible to just change the breaks of the zoomed portion whilst leaving the original plot as default?
This works for your use case:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10)) +
scale_x_continuous(breaks = pretty)
From ?scale_x_continuous, breaks would accept the following (emphasis added):
One of:
NULL for no breaks
waiver() for the default breaks computed by the transformation object
A numeric vector of positions
A function that takes the limits as input and returns breaks as output
pretty() is one such function. It doesn't offer very fine control, but does allow you to have some leeway to specify breaks across different facets with very different scales.
For illustration, here are two examples with different desired number of breaks. See ?pretty for more details on the other arguments this function accepts.
p <- ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10))
cowplot::plot_grid(
p + scale_x_continuous(breaks = function(x) pretty(x, n = 3)),
p + scale_x_continuous(breaks = function(x) pretty(x, n = 10)),
labels = c("n = 3", "n = 10"),
nrow = 1
)
Of course, you can also define your own function to convert plot limits into desired breaks (e.g. something like p + scale_x_continuous(breaks = function(x) seq(min(x), max(x), length.out = 5))), but I generally find these functions require more tweaking to get right, and pretty() is often good enough.
I have the stacked bar chart below and I would like to know if it is possible to set a max limit of characters displayed in the values of the y-axis, for example 4, and then add a "." at the point that the characters stop. For example "subcompact" should become "subc."
g <- ggplot(mpg, aes(class))
g+geom_bar(aes(fill = drv), position = position_stack(reverse = TRUE)) +
coord_flip() +
theme(legend.position = "top")
You could also do the replacement in the ggplot code if you didn't want to alter your source data. This is a slightly different regex solution from #AndreElrico's:
g <- ggplot(mpg, aes(sub(class,pattern = "(\\w{4}).*",replacement = "\\1.")))
Change your variable into the desired form before using it.
mpg$class <- sub("(?<=^.{4}).*", ".", mpg$class, perl = TRUE)
You can use a regex to achieve this.
You can adjust the labels with scale_x_discrete, which means no editing of the dataset is done.
g+geom_bar(aes(fill = drv), position = position_stack(reverse = TRUE)) +
scale_x_discrete(
labels = function(x) {
is_long <- nchar(x) > 4
x[is_long] <- paste0(substr(x[is_long], 1, 4), ".")
x
}
) +
coord_flip() +
theme(legend.position = "top")
This answer shows how you can specify where the minor breaks should go. In the documentation it says that minor_breaks can be a function. This, however, takes as input the plot limits not, as I expected, the location of the major gridlines below and above.
It doesn't seem very simple to make a script that will return me, say, 4 minors per major. This is something I would like to do since I have a script that I want to use on multiple different datasets. I don't know the limits beforehand, so I can't hard code them in. I can of course create a function that gets the values I need from the dataset before plotting, but it seems overkill.
Is there a general way to state the number of minor breaks per major break?
You can extract the majors from the plot, and from there calculate what minors you want and set it for your plot.
df <- data.frame(x = 0:10,
y = 0:10)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$panel$ranges[[1]]$x.major_source
multiplier <- 4
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1)
p + scale_x_continuous(minor_breaks = minors)
I think scales::extended_breaks is the default function for a continuous scale. You can set the number of breaks in this function, and make the number of minor_breaks an integer multiple of the number of breaks.
library(ggplot2)
library(scales)
nminor <- 7
nmajor <- 5
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_point() +
scale_y_continuous(breaks = extended_breaks(n = nmajor), minor_breaks = extended_breaks(n = nmajor * nminor) )
Using ggplot2 version 3, I have to modify Eric Watt's code above a bit to get it to work (I can't comment on that instead since I don't have a 50 reputation yet)
library(ggplot2)
df <- data.frame(x = 0:10,
y = 10:20)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$layout$panel_params[[1]]$x.major_source;majors
multiplier <- 10
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1);minors
p + scale_x_continuous(minor_breaks = minors)
If I copy paste the same code in my editor, it doesn't create majors (NULL), and so the next line gives an error.
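Since the internal structure returned by ggplot_build() differs between ggplot2 versions, another option is to avoid it altogether and pass a function to minor_breaks. A minimal sketch, assuming the major breaks are evenly spaced and come from scales::extended_breaks() (minor_per_major is a hypothetical helper name):

```r
library(ggplot2)
library(scales)

# Hypothetical helper: returns a function for minor_breaks= that recomputes
# the major breaks from the limits and splits each major interval into
# n_per_major equal parts. Assumes evenly spaced major breaks.
minor_per_major <- function(n_per_major = 4,
                            major_fun = scales::extended_breaks()) {
  function(limits) {
    majors <- major_fun(limits)
    majors <- majors[!is.na(majors)]
    if (length(majors) < 2) return(majors)
    seq(min(majors), max(majors),
        length.out = (length(majors) - 1) * n_per_major + 1)
  }
}

df <- data.frame(x = 0:10, y = 10:20)
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  scale_x_continuous(minor_breaks = minor_per_major(4))
```

If you set breaks= to your own function, pass that same function as major_fun so the minor gridlines line up with the majors.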
I'm creating a population pyramid. I would like my axis to use absolute values so that I don't have negative numbers for population count. I would also like to have the tick marks on the x-axis formatted with commas so that instead of 30000 it would read 30,000.
I know that labels=comma will format the numbers with commas, and that labels=abs will show them as absolute values.
How do I combine the two options?
library(ggplot2)
library(scales) # provides comma
#test data
mfrawcensus <- data.frame(Age = factor(rep(x = 1:90, times = 2)),
Sex = rep(x = c("Females", "Males"), each = 90),
Population = sample(x = 1:60000, size = 180))
censuspop <- ggplot(data=mfrawcensus,aes(x=Age,y=Population, fill=Sex)) +
geom_bar(data= subset(mfrawcensus,Sex=="Females"), stat="identity") +
geom_bar(data= subset(mfrawcensus,Sex=="Males"),
mapping=aes(y=-Population),
stat="identity",
position="identity") +
scale_y_continuous(labels=comma) +
xlab("Age (years)")+ ylab("Population") +
scale_x_discrete(breaks =seq(0,90,5), drop=FALSE)+ coord_flip(ylim=c(-55000,55000))
I tried adding like another scale on top of the original one to get absolute values but it didn't work. Here's my attempt:
censuspyramid <- censuspop + theme_bw() + theme(legend.position="none") +
ggtitle("a") + scale_y_continuous(labels=abs)
You can make a new function that does both abs and comma
abs_comma <- function (x, ...) {
format(abs(x), ..., big.mark = ",", scientific = FALSE, trim = TRUE)
}
Use this instead of comma, i.e. scale_y_continuous(labels = abs_comma).
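For completeness, here is a small self-contained sketch of how that could look in a pyramid plot. The data frame and the Signed column are made up for illustration, and geom_col() is used in place of the two geom_bar(stat="identity") layers:

```r
library(ggplot2)

# Label function: absolute value, then comma-separated thousands
abs_comma <- function(x, ...) {
  format(abs(x), ..., big.mark = ",", scientific = FALSE, trim = TRUE)
}

set.seed(42)
pyramid <- data.frame(
  Age = factor(rep(1:90, times = 2)),
  Sex = rep(c("Females", "Males"), each = 90),
  Population = sample(1:60000, size = 180)
)
# Plot males as negative counts so the bars extend to the left
pyramid$Signed <- ifelse(pyramid$Sex == "Males",
                         -pyramid$Population, pyramid$Population)

p <- ggplot(pyramid, aes(x = Age, y = Signed, fill = Sex)) +
  geom_col() +
  scale_y_continuous(labels = abs_comma) +
  scale_x_discrete(breaks = as.character(seq(5, 90, 5))) +
  coord_flip(ylim = c(-55000, 55000)) +
  labs(x = "Age (years)", y = "Population")

abs_comma(c(-30000, 0, 15000))
# "30,000" "0" "15,000"
```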
I am trying to format Cost and Revenue (both in thousands) and Impressions (in millions) data for a ggplot graph's y-axis labels.
My plot runs from 31 days ago to 'yesterday' and uses the min and max values over that period for the ylim(c(min,max)) option. Showing just the Cost example,
library(ggplot2)
library(TTR)
set.seed(1984)
#make series
start <- as.Date('2016-01-01')
end <- Sys.Date()
days <- as.numeric(end - start)
#make cost and moving averages
cost <- rnorm(days, mean = 45400, sd = 11640)
date <- seq.Date(from = start, to = end - 1, by = 'day')
cost_7 <- SMA(cost, 7)
cost_30 <- SMA(cost, 30)
df <- data.frame(Date = date, Cost = cost, Cost_7 = cost_7, Cost_30 = cost_30)
# set parameters for window
left <- end - 31
right <- end - 1
# plot series
ggplot(df, aes(x = Date, y = Cost))+
geom_line(lwd = 0.5) +
geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
xlim(c(left, right)) +
ylim(c(min(df$Cost[df$Date > left]), max(df$Cost[df$Date > left]))) +
xlab("")
I would a) like to represent thousands and millions on the y-axis with commas, and b) like those numbers abbreviated and with 'K' for thousands or 'MM' for millions. I realize b) may be a tall order, but for now a) cannot be accomplished with
ggplot(...) + ... + ylim(c(min, max)) + scale_y_continuous(labels = comma)
Because the following error is thrown:
## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.
I have tried putting the scale_y_continuous(labels = comma) section after the geom_line()layer (which throws the error above) or at the end of all the ggplot layers, which overrides my limits in the ylim call and then throws the error above, anyway.
Any ideas?
For the comma formatting, you need to load the scales package to use labels=comma. The "error" you discussed is actually just a message, because you used both ylim and then scale_y_continuous. The second call overrides the first. You can instead set the limits and specify comma-separated labels in a single call to scale_y_continuous:
library(scales)
ggplot(df, aes(x = Date, y = Cost))+
geom_line(lwd = 0.5) +
geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
xlim(c(left, right)) +
xlab("") +
scale_y_continuous(labels=comma, limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left])))
Another option would be to melt your data to long format before plotting, which reduces the amount of code needed and streamlines aesthetic mappings:
library(reshape2)
ggplot(melt(df, id.var="Date"),
aes(x = Date, y = value, color=variable, linetype=variable))+
geom_line() +
xlim(c(left, right)) +
labs(x="", y="Cost") +
scale_y_continuous(labels=comma, limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left])))
Either way, to put the y values in terms of thousands or millions you could divide the y values by 1,000 or 1,000,000. I've used dollar_format() below, but I think you'll also need to divide by the appropriate power of ten if you use unit_format (per #joran's suggestion). For example:
div=1000
ggplot(melt(df, id.var="Date"),
aes(x = Date, y = value/div, color=variable, linetype=variable))+
geom_line() +
xlim(c(left, right)) +
labs(x="", y="Cost (Thousands)") +
scale_y_continuous(labels=dollar_format(),
limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left]))/div)
Use scale_color_manual and scale_linetype_manual to set custom colors and linetypes, if desired.
I just found the solution. It does not work with "label = comma". Please try this solution:
scale_y_continuous(labels = scales::comma)
It works well for me.
The unit_format() function highlighted by #joran has now been deprecated within the scales package and replaced with label_number(). It defaults to using a space as the thousands separator; change this with the big.mark= argument. Use the prefix= and suffix= arguments to add characters before and after, and the scale= argument to multiply the numbers by a scaling factor (so in many cases you want a negative exponent here).
The problem that #konrad notes with a space between the number and the suffix no longer seems to exist. If you want a space, include it in the suffix argument suffix = " M".
So, for example, to show 1234000 as £1,234k on the y axis: scale_y_continuous(labels = label_number(prefix = "£", suffix = "k", scale = 1e-3, big.mark = ","))
As comma separators are so commonly used there is a convenience function label_comma which sets big.mark = ",". Or, for even less typing, the comma() function is exactly the same.
One gotcha is that the scales package is not attached by library(ggplot2); you have to load it separately or, as #Aurora points out in their answer, prefix the function with scales::
https://scales.r-lib.org/reference/label_number.html
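To tie this back to the original question (costs in thousands, impressions in millions), here is a minimal sketch using label_number(). The names label_k and label_mm, the "$" prefix and accuracy = 1 are choices made for this illustration, not something from the answers above:

```r
library(ggplot2)
library(scales)

# Thousands shown as e.g. $45K, millions as e.g. 3MM
label_k  <- label_number(prefix = "$", suffix = "K", scale = 1e-3,
                         big.mark = ",", accuracy = 1)
label_mm <- label_number(suffix = "MM", scale = 1e-6,
                         big.mark = ",", accuracy = 1)

set.seed(1984)
df <- data.frame(
  Date = seq(as.Date("2016-01-01"), by = "day", length.out = 60),
  Cost = rnorm(60, mean = 45400, sd = 11640)
)

p <- ggplot(df, aes(x = Date, y = Cost)) +
  geom_line() +
  scale_y_continuous(labels = label_k) +
  xlab("")
```

Because the scaling happens in the label function, the data and the limits stay in their original units, so there is no need to divide the y values by 1,000 first.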