Trouble with placing and formatting dates in ggplot2 graph using chron - r

I've been trying to add appropriate dates on the x-axis of my graph, but can't figure out how to do it in a sane way. What I want is pretty simple: a date at every January 1st in between the minimum and maximum of my data set.
I don't want to include the month - just '2008' or '2009' or whatever is fine. A great example would be this graph:
Except I want the date on every year, rather than every other year.
I can't seem to figure this out. My dates are defined as days since 1/1/1970, and I've included a method dateEPOCH_formatter which converts the epoch format to a format using the chron package. I've figured out how to make a tick mark and date at the origin of the graph and every 365 days thereafter, but that's not quite the same thing.
Another minor problem is that, mysteriously, the line chron(floor(y), out.format="mon year",origin.=epoch) outputs a graph with axis markers like 'Mar 2008', but changing the line to chron(floor(y), out.format="year",origin.=epoch) doesn't give me a result like '2008' - it just results in the error:
Error in parse.format(format[1]) : unrecognized format year
Calls: print ... as.character.times -> format -> format.dates -> parse.format
Execution halted
Here's my code - thanks for the help.
library(ggplot2)
library(chron)
argv <- commandArgs(trailingOnly = TRUE)
mydata = read.csv(argv[1])
png(argv[2], height=300, width=470)
timeHMS_formatter <- function(x) { # Takes time in seconds from midnight, converts to HH:MM:SS
  h <- floor(x / 3600) # Whole hours
  m <- floor((x %% 3600) / 60) # Whole minutes left over after the hours
  s <- round(x %% 60) # Remaining seconds, rounded to nearest second
  lab <- sprintf('%02d:%02d:%02d', h, m, s) # Format the strings as HH:MM:SS
  lab <- gsub('^00:', '', lab) # Remove leading 00: if present
  lab <- gsub('^0', '', lab) # Remove leading 0 if present
  lab
}
dateEPOCH_formatter <- function (y){
epoch <- c(month=1,day=1,year=1970)
chron(floor(y), out.format="mon year",origin.=epoch)
}
p= ggplot() +
coord_cartesian(xlim=c(min(mydata$day),max(mydata$day)), ylim=c(0,86400)) + # displays data from first email through present
scale_color_hue() +
xlab("Date") +
ylab("Time of Day") +
scale_y_continuous(label=timeHMS_formatter, breaks=seq(0, 86400, 14400)) + # adds tick marks every 4 hours
scale_x_continuous(label=dateEPOCH_formatter, breaks=seq(min(mydata$day), max(mydata$day), 365) ) +
ggtitle("Email Sending Times") + # adds graph title
theme( legend.position = "none", axis.title.x = element_text(vjust=-0.3)) +
theme_bw() +
layer(
data=mydata,
mapping=aes(x=mydata$day, y=mydata$seconds),
stat="identity",
stat_params=list(),
geom="point",
geom_params=list(alpha=5/8, size=2, color="#A9203E"),
position=position_identity(),
)
print(p)
dev.off()

I think it will be much easier to use the built-in function scale_x_date, together with date_format and date_breaks from the scales package. scale_x_date expects a column of class Date, so other date classes such as chron should be converted with as.Date() first.
For example:
library(ggplot2)
library(chron)
library(scales)
# some example data
days <- seq(as.Date('01-01-2000', format = '%d-%m-%Y'),
            as.Date('01-01-2010', format = '%d-%m-%Y'), by = 1)
days_chron <- as.chron(days)
# convert the chron dates back to Date before plotting
mydata <- data.frame(day = as.Date(days_chron), y = rnorm(length(days)))
# the plot
ggplot(mydata, aes(x = day, y = y)) + geom_point() +
  scale_x_date(breaks = date_breaks('year'), labels = date_format('%Y'))
To show how intuitive and easy these functions are, here is the same plot with month-year labels every 6 months. Note that this requires a very wide plot or very small axis labels.
ggplot(mydata, aes(x = day, y = y)) + geom_point() +
  scale_x_date(breaks = date_breaks('6 months'), labels = date_format('%b-%Y'))
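Since the question stores its dates as days since 1/1/1970, here is a minimal base R sketch of turning such numbers into Date objects and picking out every January 1st inside the data range. The vector epoch_days is made-up data standing in for mydata$day, and january_firsts is a hypothetical helper name, not a function from any package.

```r
# Hypothetical stand-in for mydata$day: days since 1970-01-01
epoch_days <- c(13880, 14000, 14500, 15000)

# Convert epoch days to Date objects
dates <- as.Date(epoch_days, origin = "1970-01-01")

# One break per January 1st that falls inside the data range
# (january_firsts is an illustrative helper, not part of any package)
january_firsts <- function(d) {
  first_year <- as.integer(format(min(d), "%Y"))
  last_year  <- as.integer(format(max(d), "%Y"))
  candidates <- as.Date(paste0(first_year:last_year, "-01-01"))
  candidates[candidates >= min(d) & candidates <= max(d)]
}

year_breaks <- january_firsts(dates)
year_labels <- format(year_breaks, "%Y")
```

With the x column converted this way, year_breaks could be supplied as the breaks argument of scale_x_date, with a "%Y" label format giving plain year labels.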

Related

ggplot2, x-axis not recognizing dates

Trying to plot the following data frame (call it Bob):
Since the original date is in d/m/y, I use Finaldate and Value to graph.
Here is the code used to graph:
ggplot(Bob, aes(Finaldate, Value)) +geom_line() + geom_point(size = 3) +
labs(title = "TITLE",subtitle = "SUBTITLE", y = "Y", x = "X") +
theme_fivethirtyeight()+scale_y_continuous(name="name", labels = scales::comma)+theme(legend.title = element_blank())+scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
While I do get an output, it is not a time series: the dates are out of order and the plot makes no sense. I've attached a copy of the plot as well.
I'm not sure how to fix this problem, and have tried a couple of different things.
Have you tried using
+ scale_x_date(date_labels = "%d %m %Y") (ggplot2)
https://r-graph-gallery.com/279-plotting-time-series-with-ggplot2.html
You need to convert Finaldate to a date. It is being treated as a character, so all the dates are in "alphabetical" order. Try:
Bob$Finaldate <- as.Date(Bob$Finaldate, format = "%m/%d/%Y")
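To make the difference concrete, here is a small sketch with made-up data. The data frame bob_demo and its values are hypothetical, and the month/day/year layout of the strings is an assumption carried over from the format string in the answer above.

```r
# Hypothetical data; the date strings are assumed to be month/day/year
bob_demo <- data.frame(
  Finaldate = c("12/01/2020", "02/15/2021", "11/30/2020"),
  Value = c(3, 5, 2),
  stringsAsFactors = FALSE
)

# As character, the values sort as text rather than chronologically
text_order <- sort(bob_demo$Finaldate)

# After conversion to Date, ordering follows the calendar
bob_demo$Finaldate <- as.Date(bob_demo$Finaldate, format = "%m/%d/%Y")
bob_demo <- bob_demo[order(bob_demo$Finaldate), ]
```

Once the column is a Date, a date scale such as scale_x_date applies, and the scale_x_discrete call in the original plot would most likely have to be dropped.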

How to format difftime as hh:mm in ggplot2?

I want to display difftime data with ggplot2 and I want the tick format to be hh:mm.
library(ggplot2)
a= as.difftime(c("0:01", "4:00"), "%H:%M", unit="mins")
b= as.difftime(c('0:01', "2:47"), "%H:%M", unit="mins")
ggplot(data=NULL, aes(x=b, y=a)) + geom_point(shape=1) +
scale_x_time(labels = date_format("%H:%M"),
breaks = "1 hour")
But I get the following warning:
Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.
Warning message:
In structure(as.numeric(x), names = names(x)) : NAs introduced by coercion
and this as a graph:
Update:
my example was too minimal, I also need to be able to display negative differences, so this would be better data:
a= as.difftime(c(-60, -4*60), unit="mins")
b= as.difftime(c(-60, 2*60+47), unit="mins")
ggplot(data=NULL, aes(x=b, y=a)) + geom_point(shape=1)
The answer has two parts.
Plotting difftime objects
According to help("scale_x_time"), ggplot2 supports three date/time classes: scale_*_date for dates (class Date), scale_*_datetime for datetimes (class POSIXct), and scale_*_time for times (class hms). The last one is what we need here.
Class hms is a custom class for difftime vectors, and hms::as_hms() (called as.hms() in older releases of the package) accepts difftime input. So difftime objects can be plotted with ggplot2 by coercing them to class hms:
a <- as.difftime(c(-60, -4 * 60), unit = "mins")
b <- as.difftime(c(-60, 2 * 60 + 47), unit = "mins")
library(ggplot2)
ggplot(data = NULL, aes(x = hms::as_hms(b), y = hms::as_hms(a))) +
  geom_point(shape = 1)
Note that negative time differences are shown as well.
Formatting the tick labels
The OP has requested that tick marks should be labeled in hh:mm format. Apparently, the default formatting is hh:mm:ss. This can be modified by specifying a function that takes the breaks as input and returns labels as output to the labels parameter of the scale_x_time() and scale_y_time() functions:
# drop the trailing ":ss" from the default hh:mm:ss labels
format_hm <- function(sec) stringr::str_sub(format(sec), end = -4L)
ggplot(data = NULL, aes(x = hms::as_hms(b), y = hms::as_hms(a))) +
  geom_point(shape = 1) +
  scale_x_time(name = "b", labels = format_hm) +
  scale_y_time(name = "a", labels = format_hm)
The format_hm() function truncates the :ss part from the default format. In addition, the axes are labeled nicely.
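For readers who would rather not depend on string truncation, a label function can also be sketched in base R. The name format_hm_base is hypothetical, the input is assumed to be a number of seconds (which is what an hms value holds), and NA breaks are not treated specially.

```r
# Illustrative base R labeller: signed seconds -> "hh:mm"
# (format_hm_base is a hypothetical name, not part of ggplot2 or hms)
format_hm_base <- function(sec) {
  sec <- as.numeric(sec)
  sign_chr <- ifelse(sec < 0, "-", "")   # keep the sign of negative differences
  total_min <- abs(sec) %/% 60           # whole minutes, leftover seconds dropped
  sprintf("%s%02d:%02d", sign_chr, total_min %/% 60, total_min %% 60)
}

labels_demo <- format_hm_base(c(-3600, 60, 10020))
```

A function like this can be passed to the labels argument in the same way as format_hm above.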
Depending on your constraints, you might consider translating the difftimes to distinct datetimes, which ggplot can handle just fine:
library(lubridate)
a_date_times <- floor_date(Sys.time(), "1 day") + a
b_date_times <- floor_date(Sys.time(), "1 day") + b
ggplot(data=NULL, aes(x=b_date_times, y=a_date_times)) +
geom_point(shape=1)
My best approach so far is:
library(ggplot2)
library(lubridate)
a= as.difftime(c(-60, -4*60), unit="mins")
b= as.difftime(c(-60, 2*60+47), unit="mins")
# label function (minutes -> h:mm); it must be defined before the scales that use it
f <- function(x){
  t = seconds_to_period(abs(x)*60)
  r = sprintf("% 2i:%02i", sign(x)*hour(t), minute(t))
  return(r)
}
xbreaks = seq(ceiling(min(b)/60), floor(max(b)/60)) * 60
ybreaks = seq(ceiling(min(a)/60), floor(max(a)/60)) * 60
ggplot(data=NULL, aes(x=b, y=a)) + geom_point(shape=1) +
  scale_x_continuous(labels = f, breaks = xbreaks) +
  scale_y_continuous(labels = f, breaks = ybreaks)

Ongoing dramas with epicurves date scales

I'm attempting to use ggplot and R for analysing some epidemiologic data, and I'm continuing to struggle with getting an epidemic curve to appear properly.
Data is here
attach(epicurve)
head(epicurve)
       onset age
1 21/12/2012  18
2 14/06/2013   8
3 10/06/2013  64
4 28/05/2013  79
5 14/04/2013  56
6  9/04/2013  66
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y")
ggplot(epicurve, aes(onset)) + geom_histogram() + scale_x_date(breaks=date_breaks("1 year"), minor_breaks=date_breaks("1 month"), labels = date_format("%b-%Y"))
gives this graph. This is fine, but the binwidths are not related to any time period of note, and adjusting them is a bit trial and error.
For this particular dataset, I'd like to display the cases by month of onset.
One way I worked out how to do this is:
epicurve$monyr <- format(epicurve$onset, "%b-%Y")
epicurve$monyr <- as.factor(epicurve$monyr)
ggplot(epicurve, aes(monyr)) + geom_histogram()
Outputs a graph I can't post because of the reputation system. The bars represent something meaningful, but the axis labels are a bomb-site. I can't format the axes using scale_x_date because they aren't dates and I can't work out what arguments to pass to scale_x_discrete to give useful labels.
I have a feeling there should be an easier way to do this by doing an operation on the onset column. Can anyone give me any pointers, please?
One option is to aggregate the data outside ggplot and then use geom_bar. This will produce counts by month.
edited Sept. 21 2013. Altered plot to show months with no counts.
epicurve <- read.csv("epicurve.csv", sep=",", header=T)
# initial formatting
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y") # convert to Date class
epicurve$onset <- strftime(epicurve$onset, format="%Y/%m") # convert to Year-month
epicurve$onset <- paste(epicurve$onset, "/01", sep = "") # add arbitrary day on to end to make compatible w/ ggplot2
# aggregate by month
onset_counts <- aggregate(epicurve$onset, by = list(date = epicurve$onset), length) # aggregate by month
onset_counts$date = as.Date(onset_counts$date, format = "%Y/%m/%d") # convert to Date class
# plot
library(ggplot2)
library(scales)
ggplot(onset_counts, aes(x=date, y=x)) + geom_bar(stat="identity") + theme_bw() + theme(axis.text.x = element_text(angle=90, hjust = 1, vjust = 1)) +
ylab("Frequency") + xlab(NULL) + scale_x_date(breaks="month", labels=date_format("%Y-%m"))
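As an alternative sketch of the aggregation step, base R's cut() on a Date vector can bin by calendar month directly. The dates below are the six rows shown by head(epicurve) in the question; the name onset_counts_alt is made up for this illustration.

```r
# Onset dates taken from the head(epicurve) output in the question
onset <- as.Date(c("21/12/2012", "14/06/2013", "10/06/2013",
                   "28/05/2013", "14/04/2013", "9/04/2013"),
                 format = "%d/%m/%Y")

# cut() labels each date with the first day of its month and keeps
# months without cases as factor levels, so table() reports them as zero
month_bin <- cut(onset, breaks = "month")
counts <- table(month_bin)

# One row per month, ready for a bar plot against a Date axis
onset_counts_alt <- data.frame(
  date = as.Date(names(counts)),
  n = as.vector(counts)
)
```

The resulting data frame has the same shape as onset_counts above (a Date column and a count column), so it could be plotted with the same geom_bar(stat="identity") call after adjusting the column names.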
I've also just happened across another way of making it look pretty, although it feels like a bit of a kludge.
#read data
epicurve <- read.csv("epicurve.csv", sep=",", header=T)
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y")
#load libraries
library(ggplot2)
library(scales)
#plot
ggplot(epicurve, aes(onset)) + geom_histogram(colour="white", binwidth=30.4375) +
scale_x_date(breaks=date_breaks("1 year"), minor_breaks=("1 month"), labels=date_format("%b-%Y")) +
scale_y_continuous(breaks=0:10, minor_breaks=NULL) +
theme(axis.text.x = element_text(angle=45, vjust=0.5))
# binwidth = 365.25/12 = 30.4375 days, which makes the bins line up with the monthly scale
Which gives this (notice the beautiful alignment of the bins!):
Many thanks to Nate for the help, and hopefully this will be useful!

ggplot2: plotting non-contiguous time durations as a bar chart

I'm using ggplot to plot various events as a function of the date (x-axis) and start time (y-axis) on which they began. The data/code are as follows:
date<-c("2013-06-05","2013-06-05","2013-06-04","2013-06-04","2013-06-04","2013-06-04","2013-06-04",
"2013-06-04","2013-06-04","2013-06-03","2013-06-03","2013-06-03","2013-06-03","2013-06-03",
"2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02")
start <-c("07:36:00","01:30:00","22:19:00","22:12:00","20:16:00","19:19:00","09:00:00",
"06:45:00","01:03:00","22:15:00","19:05:00","08:59:00","08:01:00","07:08:00",
"23:24:00","20:39:00","18:53:00","16:57:00","15:07:00","14:33:00","13:24:00")
duration <-c(0.5,6.1,2.18,0.12,1.93,0.95,10.32,
2.25,5.7,2.78,3.17,9.03,0.95,0.88,
7.73,2.75,1.77,1.92,1.83,0.57,1.13)
event <-c("AF201","SS431","BE201","CD331","HG511","CD331","WQ115",
"CD331","SS431","WQ115","HG511","WQ115","CD331","AF201",
"SS431","WQ115","HG511","WQ115","CD331","AS335","CD331")
df<-data.frame(date,start,duration,event)
library(ggplot2)
library(scales)
p <- ggplot(df, aes(as.Date(date),as.POSIXct(start,format='%H:%M:%S'),color=event))
p <- p+geom_point(alpha = I(6/10),size=5)
p + ylab("time (hr)") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
  scale_y_datetime(labels = date_format("%H")) +
  scale_colour_hue(h=c(360, 90)) +
  theme(axis.text.x = element_text(hjust=1, angle=0))
The resulting plot looks like this:
Question: Instead of simply indicating the start time of the event with a single point (shown above), how can I plot a bar that spans the time duration of the event? As shown in the data frame above I have this duration data (in hours). Alternatively, I could supply a 'stop time' (not shown).
I'm imagining the solution would look something like a stacked bar chart. However, a bar chart isn't quite right as it assumes the bar starts at the bottom of the plot and that the vertically stacked events have no gaps between them. My events may be non-contiguous -- 'starting' and 'stopping' at various positions along the y-axis. The solution will also have to take into consideration that 1) some events may ultimately be concurrent (overlap in time) and 2) some events will span multiple days.
I'd be very grateful for any suggestions!
It's a bit unclear exactly what you want. @Michele's answer seemed good, and I wasn't clear whether you wanted to use geom_rect because it would make for thicker lines (if so, just change the line width), or whether there was another reason. I decided to give it a go using geom_rect to enable dodging. I've plotted it with the starting date on the x axis, and the start and end times on y. I've set up the data slightly differently to enable that. If you're after something different, try to make it explicit, but at least here's another option:
df<-data.frame(date,start,duration,event)
df <- transform(df,
start = as.POSIXct(paste(date, start)),
end = as.POSIXct(paste(date, start)) + duration*3600)
df <- df[c("event", "start", "end")]
df$date <- strptime(df$start, "%Y-%m-%d")
df$start.new <- format(df$start, format = "%H:%M:%S")
df$end.new <- format(df$end, format = "%H:%M:%S")
df$day <- factor(as.POSIXct(df$date))
levels(df$day) <- 1:4
df$day <- as.numeric(as.character(df$day))
# number the events 1..7; factor() is needed because recent versions of R no longer convert strings to factors by default
df$event.int <- as.numeric(factor(df$event))
p <- ggplot(df, aes(day, start)) + geom_rect(aes(ymin = start, ymax = end,
xmin = (day - 0.45) + event.int/10,
xmax = (day - 0.35) + event.int/10,
fill = event)) +
scale_x_discrete(limits = 1:4,breaks = 1:4, labels = sort(unique(date)),
name = "Start date") + ylab("Duration")
Thanks (+1s) to @Michele and @alexwhan for your input. Using geom_rect I was able to get all of the events which occur on the same date on the same point on the x axis. (I'm anticipating that this data set may ultimately include many months of events.)
df<-data.frame(date,start,duration,event)
library(ggplot2)
library(scales) # provides date_format()
p <- ggplot(df, aes(xmin=as.Date(date),xmax=as.Date(date)+1,
                    ymin=as.POSIXct(start,format='%H:%M:%S'),
                    ymax=as.POSIXct(start,format='%H:%M:%S')+duration*3600,
                    fill=event))
p <- p+geom_rect(alpha = I(8/10))
p + ylab("time") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
  scale_y_datetime(labels = date_format("%H")) +
  scale_colour_hue(h=c(360, 90)) +
  theme(axis.text.x = element_text(hjust=1, angle=0))
... resulting in this:
This is pretty close to what I was aiming for.
I think I can deal with the potential overplotting issue by adjusting the alpha.
Ideally I'd like the y axis to include just a single day (00 to 00). To do this I guess I'll probably need to reformat the data such that events with durations that extend beyond midnight are reallocated to the next day. (Not sure how to do this in R.)
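One way to reallocate the part of an event that runs past midnight is sketched below in base R. The helper name split_at_midnight is hypothetical, and the sketch assumes POSIXct times in UTC and that no event crosses more than one midnight (true for the durations listed in the question).

```r
# Illustrative helper (hypothetical name); assumes POSIXct inputs in UTC and
# that no event crosses more than one midnight
split_at_midnight <- function(start, end) {
  # Midnight at the end of each event's start day
  midnight <- as.POSIXct(format(as.Date(start, tz = "UTC") + 1), tz = "UTC")
  crosses <- end > midnight

  # Part that stays on the start day
  first_end <- end
  first_end[crosses] <- midnight[crosses]
  first_part <- data.frame(start = start, end = first_end)

  # Part that spills into the following day
  second_part <- data.frame(start = midnight[crosses], end = end[crosses])

  rbind(first_part, second_part)
}

# Two events from the question: one crosses midnight, one does not
start <- as.POSIXct(c("2013-06-04 22:19:00", "2013-06-05 07:36:00"), tz = "UTC")
end <- start + c(2.18, 0.5) * 3600
pieces <- split_at_midnight(start, end)
```

Each row of pieces then lies within a single calendar day, so the date and the time of day can be derived separately for plotting.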
Try this method. It's probably different from what you planned, but I think it's quite a clear way to show your data:
df<-data.frame(date,start,duration,event)
df <- transform(df,
start = as.POSIXct(paste(date, start)),
end = as.POSIXct(paste(date, start)) + duration*3600)
df <- df[c("event", "start", "end")]
library(reshape2)
df <- melt(df, id.vars="event")
df$value <- as.POSIXct(df$value, origin=as.Date("1970-01-01"))
df <- df[order(df$event, df$value),]
df$eventID <- rep(seq(1, nrow(df)/2, 1), each=2)
library(ggplot2)
ggplot(df) +
geom_line(aes(value, event, group=eventID, color=event))
This approach combines the benefits of: (i) a y-axis containing a single ~24 hour period; (ii) events not overlapping; (iii) events labelled within the graph in addition to the legend; and (iv) concise code.
library(dplyr)
library(lubridate)
library(ggplot2)
# Re-create data frame (tibble() replaces the deprecated data_frame())
df <- tibble(date, start, duration, event) %>%
  mutate(start_dt = as.POSIXct(paste(date, start), tz = 'UTC'),
         start_hr = hour(start_dt),
         end_dt = start_dt + duration * 3600,
         # as.numeric() keeps end_hr a plain number of hours rather than a difftime
         end_hr = hour(end_dt) + as.numeric(as.Date(end_dt) - as.Date(start_dt)) * 24)
# Plot
df %>% ggplot() +
geom_segment(aes(x = event, y = start_hr, xend = event, yend = end_hr,
color = event, size = 1)) +
facet_wrap(~ date, nrow = 1) +
guides(size = 'none')
Image of plot:

Changing alpha doesn't affect anything in ggplot2

I'm somewhat new to R and ggplot2 so this question is likely somewhat low-level. But I've done a fair amount of experimenting and found no answers online, so I thought I'd ask here.
When I add alpha to my graph, the graph appears as follows:
Some alpha
However, no matter how I change the value of alpha, I get no changes in the graph. I tried alpha=.9 and alpha=1/10000, and there was no difference whatsoever in the graph.
Yet it seems that the 'alpha' term is doing something. When I remove the 'alpha' from the code, I get the following graph:
No alpha
Here's my code. Thanks!
library(ggplot2)
library(chron)
argv <- commandArgs(trailingOnly = TRUE)
mydata = read.csv(argv[1])
png(argv[2], height=300, width=470)
timeHMS_formatter <- function(x) { # Takes time in seconds from midnight, converts to HH:MM:SS
  h <- floor(x / 3600) # Whole hours
  m <- floor((x %% 3600) / 60) # Whole minutes left over after the hours
  s <- round(x %% 60) # Remaining seconds, rounded to nearest second
  lab <- sprintf('%02d:%02d:%02d', h, m, s) # Format the strings as HH:MM:SS
  lab <- gsub('^00:', '', lab) # Remove leading 00: if present
  lab <- gsub('^0', '', lab) # Remove leading 0 if present
  lab
}
dateEPOCH_formatter <- function (y){
epoch <- c(month=1,day=1,year=1970)
chron(floor(y),out.format="mon-year",origin.=epoch)
}
p= ggplot() +
coord_cartesian(xlim=c(min(mydata$day),max(mydata$day)), ylim=c(0,86400)) + # displays data from first email through present
scale_color_hue() +
xlab("Date") +
ylab("Time of Day") +
scale_y_continuous(label=timeHMS_formatter, breaks=seq(0, 86400, 7200)) + # adds tick marks every 2 hours
scale_x_continuous(label=dateEPOCH_formatter, breaks=seq(min(mydata$day), max(mydata$day), 365) ) +
ggtitle("Email Sending Times") + # adds graph title
theme( legend.position = "none", axis.title.x = element_text(vjust=-0.3)) +
layer(
data=mydata,
mapping=aes(x=mydata$day, y=mydata$seconds, alpha=1/2, size=5),
stat="identity",
stat_params=list(),
geom="point",
geom_params=list(),
position=position_identity(),
)
print(p)
dev.off()
You need to put the alpha specification outside the mapping statement, as in
layer(
data=mydata,
mapping=aes(x=day, y=seconds),
stat="identity",
stat_params=list(),
geom="point",
geom_params=list(alpha=1/2, size=5),
position=position_identity(),
)
I'm more used to expressing this somewhat more compactly as
geom_point(data=mydata,
mapping=aes(x=day, y=seconds),
alpha=1/2,size=5)
The rest of the excluded stuff represents default values, I believe ...
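The difference between mapping and setting can be sketched side by side. The data frame demo is made up for illustration. As far as I understand it, a constant placed inside aes() is treated as a data value and passed through the alpha scale, which is why changing the number has no visible effect, whereas a constant outside aes() is applied to the points directly.

```r
library(ggplot2)

# Hypothetical data standing in for mydata
demo <- data.frame(day = 1:200, seconds = rep(c(30000, 30500), 100))

# Mapped: the constant 1/2 is treated as data and goes through the alpha
# scale, so changing the number should not change the drawn transparency
p_mapped <- ggplot(demo, aes(x = day, y = seconds)) +
  geom_point(aes(alpha = 1/2), size = 5)

# Set: the constant is applied directly to every point
p_set <- ggplot(demo, aes(x = day, y = seconds)) +
  geom_point(alpha = 1/2, size = 5)
```

Printing p_set with different alpha values should show visibly different transparency, while p_mapped should look the same whatever constant is used.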
See also: Why does the ggplot legend show the "colour" parameter?
