ggplot2: minor breaks in scale_x_datetime

This code:
library(lubridate)
library(ggplot2)
library(scales)
.months <- 3
.minor.intervals <- 4
.minor.intervals.num <- .months * .minor.intervals
sdate <- as.POSIXct("2015-01-01")
edate <- sdate + months(.months)
df <- data.frame(x = seq(from = sdate, to = edate,
length.out = .minor.intervals.num * 2),
y = 1:(.minor.intervals.num * 2))
p <- ggplot(df, aes(x = x, y = y))
xbm <- seq(from = sdate, to = edate, length.out = .minor.intervals.num)
p <- p + scale_x_datetime(limits = c(sdate, edate),
breaks = date_breaks("month"),
minor_breaks = xbm)
p <- p + geom_line() + geom_point()
plot(p)
gives me the error: Error in Ops.POSIXt((x - from[1]), diff(from)) : '/' not defined for "POSIXt" objects
If I comment out the minor_breaks part, everything works.
If I change the minor_breaks part to minor_breaks = date_breaks("week"), everything works too.
But I want to split each month into exactly 4 parts.
How can I fix it?

I have found a way to solve the problem, but I must admit that I am not sure why it has to be done this way. It seems that minor_breaks expects numeric values and not dates as input.
I created the breaks with the following code:
maj.breaks <- sdate + months(0:.months)
min.breaks <- do.call(c,
lapply(1:.months,function(m) {
seq(maj.breaks[m],maj.breaks[m+1],length.out = .minor.intervals+1)
})
)
which relies on the variables as defined in your example. Note the difference from your way of defining the minor breaks: since months have different lengths, it is not enough to split the range between the start and end dates into the appropriate number of equal segments. You have to split each month separately.
As mentioned above, you then have to convert min.breaks to numeric before you pass it to minor_breaks. I produce the plot as follows:
p <- ggplot(df, aes(x = x, y = y)) +
scale_x_datetime(limits = c(sdate, edate),
breaks = maj.breaks,
minor_breaks = as.numeric(min.breaks)) +
geom_line() + geom_point()
plot(p)
This is identical to your code up to the inputs for breaks and minor_breaks. There is no need to use the vector maj.breaks since your version works just as well. But I think it is interesting to note that breaks works with input of class POSIXct, while minor_breaks expects numeric values. Unfortunately, I don't know the reason for this.
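The per-month splitting described above can also be sketched in base R, without lubridate. This is only an illustration under stated assumptions: the names n_months, n_minor, maj, min_breaks and min_breaks_num are made up for this example, the time zone is fixed to UTC so that no daylight saving shift changes the spacing, and the conversion to numeric reflects the behaviour reported in this answer (newer ggplot2 versions may accept POSIXct directly).

```r
# Start date and layout: 3 months, each split into 4 equal parts
sdate <- as.POSIXct("2015-01-01", tz = "UTC")
n_months <- 3
n_minor <- 4

# Major breaks at the first of each month (base R instead of lubridate)
maj <- seq(sdate, by = "month", length.out = n_months + 1)

# Split every month separately, because months differ in length
min_list <- lapply(seq_len(n_months), function(m) {
  seq(maj[m], maj[m + 1], length.out = n_minor + 1)
})

# Month boundaries appear twice (end of one month, start of the next)
min_breaks <- unique(do.call(c, min_list))

# POSIXct is stored as seconds since 1970-01-01, so this yields plain numbers
min_breaks_num <- as.numeric(min_breaks)
```

January is split into steps of 7.75 days and February into steps of 7 days, which is why a single seq() over the whole range cannot hit the month boundaries.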

Related

How to format difftime as hh:mm in ggplot2?

I want to display difftime data with ggplot2 and I want the tick format to be hh:mm.
library(ggplot2)
a= as.difftime(c("0:01", "4:00"), "%H:%M", unit="mins")
b= as.difftime(c('0:01', "2:47"), "%H:%M", unit="mins")
ggplot(data=NULL, aes(x=b, y=a)) + geom_point(shape=1) +
scale_x_time(labels = date_format("%H:%M"),
breaks = "1 hour")
But I get the following warning:
Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
Warning message:
In structure(as.numeric(x), names = names(x)) : NAs introduced by coercion
and this as a graph:
Update:
my example was too minimal, I also need to be able to display negative differences, so this would be better data:
a= as.difftime(c(-60, -4*60), unit="mins")
b= as.difftime(c(-60, 2*60+47), unit="mins")
ggplot(data=NULL, aes(x=b, y=a)) + geom_point(shape=1)
The answer has two parts.
Plotting difftime objects
According to help("scale_x_time"), ggplot2 supports three date/time classes: scale_*_date for dates (class Date), scale_*_datetime for datetimes (class POSIXct), and scale_*_time for times (class hms). The last one is what we need here.
Class hms is a custom class for difftime vectors. as.hms() has a method for difftime. So, difftime objects can be plotted with ggplot2 by coercing to class hms:
a <- as.difftime(c(-60, -4 * 60), unit = "mins")
b <- as.difftime(c(-60, 2 * 60 + 47), unit = "mins")
library(ggplot2)
ggplot(data = NULL, aes(x = hms::as.hms(b), y = hms::as.hms(a))) +
geom_point(shape = 1)
Please, note that negative time differences are shown as well.
Formatting the tick labels
The OP has requested that tick marks should be labeled in hh:mm format. Apparently, the default formatting is hh:mm:ss. This can be modified by specifying a function that takes the breaks as input and returns labels as output to the labels parameter of the scale_x_time() and scale_y_time() functions:
format_hm <- function(sec) stringr::str_sub(format(sec), end = -4L)
ggplot(data = NULL, aes(x = hms::as.hms(b), y = hms::as.hms(a))) +
geom_point(shape = 1) +
scale_x_time(name = "b", labels = format_hm) +
scale_y_time(name = "a", labels = format_hm)
The format_hm() function truncates the :ss part from the default format. In addition, the axes are labeled nicely.
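As an alternative to truncating the formatted string, the label function can compute hh:mm directly from the number of seconds. The sketch below uses base R only; format_hm_base is a hypothetical name, and it assumes that the values handed to the labels function can be converted to seconds with as.numeric(), which holds for hms and plain numeric input.

```r
# Hypothetical label function: seconds -> "hh:mm", with a leading "-" for
# negative values. Seconds beyond the full minute are dropped, not rounded.
format_hm_base <- function(sec) {
  sec <- as.numeric(sec)
  sgn <- ifelse(sec < 0, "-", "")
  total_min <- abs(sec) %/% 60
  out <- sprintf("%s%02d:%02d",
                 sgn,
                 as.integer(total_min %/% 60),
                 as.integer(total_min %% 60))
  out[is.na(sec)] <- NA_character_  # keep missing breaks as NA
  out
}

format_hm_base(c(-3600, 60, 10020))
# "-01:00" "00:01" "02:47"
```

It could be passed in the same place as format_hm, for example scale_x_time(labels = format_hm_base).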
Depending on your constraints, you might consider translating the difftimes to distinct datetimes, which ggplot can handle just fine:
library(lubridate)
a_date_times <- floor_date(Sys.time(), "1 day") + a
b_date_times <- floor_date(Sys.time(), "1 day") + b
ggplot(data=NULL, aes(x=b_date_times, y=a_date_times)) +
geom_point(shape=1)
My best approach so far is:
library(ggplot2)
library(lubridate)
a= as.difftime(c(-60, -4*60), unit="mins")
b= as.difftime(c(-60, 2*60+47), unit="mins")
# Label function: minutes -> " h:mm", keeping the sign on the hour part.
# It must be defined before the scales that use it are created.
f <- function(x){
t = seconds_to_period(abs(x)*60)
r = sprintf("% 2i:%02i", sign(x)*hour(t), minute(t))
return(r)
}
xbreaks = seq(ceiling(min(b)/60), floor(max(b)/60)) * 60
ybreaks = seq(ceiling(min(a)/60), floor(max(a)/60)) * 60
ggplot(data=NULL, aes(x=b, y=a)) + geom_point(shape=1) +
scale_x_continuous(labels = f, breaks = xbreaks) +
scale_y_continuous(labels = f, breaks = ybreaks)

Extend x-axis with dates

I wish to use ggrepel to add labels to the ends of the lines of a ggplot. To do that, I need to make space for the labels, so I use scale_x_continuous to extend the x-axis. I'm not sure that's correct and am open to other strategies.
I can do it when the x_axis type is friendly numeric.
library("tidyverse")
library("ggrepel")
p <- tibble (
x = c(1991, 1999),
y = c(3, 5)
)
ggplot(p, aes(x, y)) + geom_line() + scale_x_continuous(limits = c(1991, 2020)) +
geom_text_repel(data = p[2,], aes(label = "Minimum Wage"), size = 4, nudge_x = 1, nudge_y = 0, colour = "gray50")
However, when I try something similar except the x-axis is of the evil date type, I get the error:
Error in as.Date.numeric(value) : 'origin' must be supplied
p <- tibble (
x = c(as.Date("1991-01-01"), as.Date("1999-01-01")),
y = c(2, 5)
)
range <- c(as.Date("1991-01-01"), as.Date("2020-01-01"))
ggplot(p, aes(x, y)) + geom_line() + scale_x_continuous(limits = range)
How can I get this to work with my arch nemesis, date?
Use scale_x_date instead of scale_x_continuous:
p <- tibble (
x = c(as.Date("1991-01-01"), as.Date("1999-01-01")),
y = c(2, 5)
)
range <- c(as.Date("1991-01-01"), as.Date("2020-01-01"))
ggplot(p, aes(x, y)) + geom_line() + scale_x_date(limits = range)
Note that scale_x_date() has an expand argument which controls how much padding is added beyond the limits. You could try expand = c(0, 0) to show only the dates specified in your limits = argument, or expand = c(f, 0), where f is the fraction of the full date range to add as padding on each side of the limits (the second element is an additive constant in days). For example, f could be 0.01.
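To make the effect of a multiplicative expansion concrete, the following base R sketch computes by hand the padding that a fraction f adds on each side of the limits. The names limits, f, span_days and padded are illustrative only, and the ggplot2 call in the last comment assumes ggplot2 3.3.0 or later, where expansion() is available.

```r
# Limits from the question and an example fraction
limits <- as.Date(c("1991-01-01", "2020-01-01"))
f <- 0.01

# Length of the full range in days
span_days <- as.numeric(diff(limits))

# A multiplicative expansion pads each side by f * span_days
padded <- limits + c(-1, 1) * f * span_days
padded

# Roughly equivalent scale call (assumption: ggplot2 >= 3.3.0):
# scale_x_date(limits = limits, expand = expansion(mult = f))
```

With these dates the range is 10592 days, so f = 0.01 adds about 106 days before the first and after the last limit.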

ggplot2 integer multiple of minor breaks per major break

This answer shows how you can specify where the minor breaks should go. In the documentation it says that minor_breaks can be a function. This, however, takes as input the plot limits not, as I expected, the location of the major gridlines below and above.
It doesn't seem very simple to make a script that will return me, say, 4 minors per major. This is something I would like to do since I have a script that I want to use on multiple different datasets. I don't know the limits beforehand, so I can't hard code them in. I can of course create a function that gets the values I need from the dataset before plotting, but it seems overkill.
Is there a general way to state the number of minor breaks per major break?
You can extract the majors from the plot, and from there calculate what minors you want and set it for your plot.
df <- data.frame(x = 0:10,
y = 0:10)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$panel$ranges[[1]]$x.major_source
multiplier <- 4
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1)
p + scale_x_continuous(minor_breaks = minors)
I think scales::extended_breaks is the default function for a continuous scale. You can set the number of breaks in this function, and make the number of minor_breaks an integer multiple of the number of breaks.
library(ggplot2)
library(scales)
nminor <- 7
nmajor <- 5
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_point() +
scale_y_continuous(breaks = extended_breaks(n = nmajor), minor_breaks = extended_breaks(n = nmajor * nminor) )
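Because minor_breaks accepts a function of the limits, another option is a small function factory that recomputes the majors from the limits and subdivides them. The sketch below is an assumption-laden illustration, not ggplot2's own mechanism: minor_per_major is a hypothetical helper, it uses base R's pretty() as a stand-in for the major break function, and it assumes the majors are equally spaced. The minors only line up with the majors if the same function is also passed to breaks.

```r
# Hypothetical factory: returns a function of the limits that places
# n_minor intervals between each pair of neighbouring major breaks
minor_per_major <- function(n_minor, major_fun = pretty) {
  function(limits) {
    majors <- major_fun(limits)
    # (number of major intervals) * n_minor intervals need one more point
    seq(min(majors), max(majors),
        length.out = (length(majors) - 1) * n_minor + 1)
  }
}

# Majors for limits 0..10 are 0, 2, ..., 10; 4 minor intervals per major
minor_fun <- minor_per_major(4)
minor_fun(c(0, 10))
```

A possible use would be scale_x_continuous(breaks = pretty, minor_breaks = minor_per_major(4)), so that no limits have to be hard coded.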
Using ggplot2 version 3, I have to modify Eric Watt's code above a bit to get it to work (I can't comment on that instead since I don't have a 50 reputation yet)
library(ggplot2)
df <- data.frame(x = 0:10,
y = 10:20)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$layout$panel_params[[1]]$x.major_source;majors
multiplier <- 10
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1);minors
p + scale_x_continuous(minor_breaks = minors)
If I copy paste the same code in my editor, it doesn't create majors (NULL), and so the next line gives an error.

cumulative plot using ggplot2

I'm learning to use ggplot2 and am looking for the smallest ggplot2 code that reproduces the base::plot result below. I've tried a few things and they all ended up being horrendously long, so I'm looking for the smallest expression and ideally would like to have the dates on the x-axis (which are not there in the plot below).
df = data.frame(date = c(20121201, 20121220, 20130101, 20130115, 20130201),
val = c(10, 5, 8, 20, 4))
plot(cumsum(rowsum(df$val, df$date)), type = "l")
Try this:
ggplot(df, aes(x=1:5, y=cumsum(val))) + geom_line() + geom_point()
Just remove geom_point() if you don't want it.
Edit: Since you need the dates as x-axis labels, you can plot with x = 1:5 and use scale_x_discrete to set the labels from the data frame. Taking df:
ggplot(data = df, aes(x = 1:5, y = cumsum(val))) + geom_line() +
geom_point() + theme(axis.text.x = element_text(angle=90, hjust = 1)) +
scale_x_discrete(labels = df$date) + xlab("Date")
Since you say you'll have more than 1 val for "date", you can aggregate them first using plyr, for example.
require(plyr)
dd <- ddply(df, .(date), summarise, val = sum(val))
Then you can proceed with the same command by replacing x = 1:5 with x = seq_len(nrow(dd)).
After a couple of years, I've settled on doing:
ggplot(df, aes(as.Date(as.character(date), '%Y%m%d'), cumsum(val))) + geom_line()
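If several rows can share a date, the values can be summed per date before taking the cumulative sum, which mirrors the rowsum() call in the question. This is a minimal base R sketch; agg and cum_val are names introduced only for this example.

```r
df <- data.frame(date = c(20121201, 20121220, 20130101, 20130115, 20130201),
                 val  = c(10, 5, 8, 20, 4))

# Sum val per date (a no-op here, but needed once dates repeat)
agg <- aggregate(val ~ date, data = df, FUN = sum)

# Convert the yyyymmdd numbers to Date and make sure rows are in date order
agg$date <- as.Date(as.character(agg$date), "%Y%m%d")
agg <- agg[order(agg$date), ]

# Cumulative sum over the ordered dates
agg$cum_val <- cumsum(agg$val)
agg
```

The result could then be plotted with ggplot(agg, aes(date, cum_val)) + geom_line().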
Jan Boyer seems to have found a more concise solution to this problem in this question, which I have shortened a bit and combined with the answers of Prradep, so as to provide a (hopefully) up-to-date-answer:
ggplot(data = df,
aes(x=date)) +
geom_col(aes(y=value)) +
geom_line(aes(x = date, y = cumsum((value))/5, group = 1), inherit.aes = FALSE) +
ylab("Value") +
theme(axis.text.x = element_text(angle=90, hjust = 1))
Note that this snippet assumes date is stored as character rather than as Date, and that the value column (val in the question's df) has already been aggregated per date, as suggested by Prradep in his answer above.

Is there a way of manipulating ggplot scale breaks and labels?

ggplot generally does a good job of creating sensible breaks and labels in scales.
However, I find that in plots with many facets and perhaps a formatter= statement, the labels tend to get too "dense" and overprint, for example in this picture:
df <- data.frame(
fac=rep(LETTERS[1:10], 100),
x=rnorm(1000)
)
ggplot(df, aes(x=x)) +
geom_bar(binwidth=0.5) +
facet_grid(~fac) +
scale_x_continuous(formatter="percent")
I know that I can specify the breaks and labels of scales explicitly, by providing breaks= and labels= arguments to scale_x_continuous.
However, I am processing survey data with many questions and a dozen crossbreaks, so need to find a way to do this automatically.
Is there a way of telling ggplot to calculate breaks and labels automatically, but just have fewer, say at the minimum, maximum and zero point?
EDIT: Ideally, I don't want to specify the minimum and maximum points, but somehow tap into the built-in ggplot training of scales, and use the default calculated scale limits.
You can pass in arguments such as min() and max() in your call to ggplot to dynamically specify the breaks. It sounds like you are going to be applying this across a wide variety of data so you may want to consider generalizing this into a function and messing with the formatting, but this approach should work:
ggplot(df, aes(x=x)) +
geom_bar(binwidth=0.5) +
facet_grid(~fac) +
scale_x_continuous(breaks = c(min(df$x), 0, max(df$x))
, labels = c(paste( 100 * round(min(df$x),2), "%", sep = ""), paste(0, "%", sep = ""), paste( 100 * round(max(df$x),2), "%", sep = ""))
)
or rotate the x-axis text with opts(axis.text.x = theme_text(angle = 90, hjust = 0)), which in current ggplot2 is written theme(axis.text.x = element_text(angle = 90, hjust = 0)).
Update
In the latest version of ggplot2 the breaks and labels arguments to scale_x_continuous accept functions, so one can do something like the following:
myBreaks <- function(x){
breaks <- c(min(x),median(x),max(x))
names(breaks) <- attr(breaks,"labels")
breaks
}
ggplot(df, aes(x=x)) +
geom_bar(binwidth=0.5) +
facet_grid(~fac) +
scale_x_continuous(breaks = myBreaks,labels = percent_format()) +
opts(axis.text.x = theme_text(angle = 90, hjust = 1,size = 5))
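For the breaks the question asks for (minimum, maximum and zero), a break function along the following lines might be enough. It is a sketch only: min_zero_max is a hypothetical name, and it assumes the function receives the scale limits as a numeric vector of length two, as described above.

```r
# Hypothetical break function: lower limit, zero (if inside the limits)
# and upper limit
min_zero_max <- function(limits) {
  brks <- c(limits[1], 0, limits[2])
  # Drop zero when it falls outside the limits, and remove duplicates
  sort(unique(brks[brks >= limits[1] & brks <= limits[2]]))
}

min_zero_max(c(-2.5, 3))  # -2.5  0.0  3.0
min_zero_max(c(1, 4))     # 1 4
```

It would be passed as scale_x_continuous(breaks = min_zero_max), together with whatever label function is in use.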
The scales package contains several breaks_* and label_* functions which return functions (closures) that are used by ggplot. So, you can write wrappers for these that modify the output.
For example:
library(ggplot2)
# Compute the list of breaks using original_func,
# then remove any of these that occur in remove_list
remove_breaks <- function(original_func, remove_list = list()) {
function(x) {
original_result <- original_func(x)
original_result[!(original_result %in% remove_list)]
}
}
# Compute the list of labels using original_func,
# then remove any of these that occur in remove_list
remove_labels <- function(original_func, remove_list = list()) {
function(x) {
original_result <- original_func(x)
replace(original_result, original_result %in% remove_list, '')
}
}
# Original plot
ggplot(data.frame(x=c(1,2,3,4,5,6,7,8), y = c(1,4,9,16,25,36,49,64))) + geom_line(aes(x, y)) +
scale_x_continuous(breaks = scales::breaks_pretty(9),
minor_breaks = scales::breaks_pretty(18),
labels = scales::label_number_auto()) +
scale_y_continuous(breaks = scales::breaks_pretty(9),
minor_breaks = scales::breaks_pretty(18),
labels = scales::label_number_auto())
# Remove some breaks from the x-axis, and remove some labels from the y-axis
ggplot(data.frame(x=c(1,2,3,4,5,6,7,8), y = c(1,4,9,16,25,36,49,64))) + geom_line(aes(x, y)) +
scale_x_continuous(breaks = remove_breaks(scales::breaks_pretty(9), seq(3,6)),
minor_breaks = remove_breaks(scales::breaks_pretty(18), seq(3,6,0.5)),
labels = scales::label_number_auto()) +
scale_y_continuous(breaks = scales::breaks_pretty(9),
minor_breaks = scales::breaks_pretty(18),
labels = remove_labels(scales::label_number_auto(), seq(20, 30)))
Of course, with my simple remove_breaks and remove_labels functions you still have to specify which values to remove, but you can easily modify these to something that removes the max and min value, removes any value in a specified range, etc.
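One such modification, dropping the smallest and largest break, could look like the sketch below. remove_extremes is a hypothetical wrapper in the same style as remove_breaks; base R's pretty() is used here only so that the example runs without extra packages, and any break function such as scales::breaks_pretty(9) could be wrapped the same way.

```r
# Hypothetical wrapper: compute breaks with original_func,
# then drop the smallest and the largest of them
remove_extremes <- function(original_func) {
  function(x) {
    res <- original_func(x)
    res <- res[!is.na(res)]
    # With two or fewer breaks there is nothing sensible to drop
    if (length(res) <= 2) return(res)
    res[res > min(res) & res < max(res)]
  }
}

inner_breaks <- remove_extremes(pretty)
inner_breaks(c(0, 10))
# 2 4 6 8
```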
