What is the smartest way to manipulate POSIX for use in ggplot axis?
I am trying to create a function for plotting many graphs (one per day) spanning a period of weeks, using POSIX time for the x axis.
To do so, I create an additional integer column DF$Day with the day, which I pass to the function. Then I create a subset for that day and plot it with ggplot2. I figured out how to use scale_x_datetime to format the POSIX x axis: I have it show only the hours and minutes, omitting the date.
Here is my question: How can I set the limits for each individual graph in hours of the day?
Below is some working, reproducible code to give an idea. It creates the plot for the first day, shows it for 3 seconds, and then proceeds to create the second day. However, each day's limits are chosen based on the range of the time variable. How can I make the range cover, for instance, the whole day (0h - 24h)?
library(ggplot2)
library(scales)  # provides date_format()
DF <- data.frame(matrix(ncol = 0, nrow = 4))
DF$time <- as.POSIXct(c("2010-01-01 02:01:00", "2010-01-01 18:10:00", "2010-01-02 04:20:00", "2010-01-02 13:30:00"))
DF$observation <- c(1,2,1,2)
DF$Day <- c(1,1,2,2)
for (Individual_Day in 1:2) {
  # keep only the rows belonging to the current day
  Day_subset <- DF[DF$Day == as.integer(Individual_Day),]
  print(ggplot( data=Day_subset, aes( x=time, y=observation) ) + geom_point() +
    scale_x_datetime( breaks=("2 hour"), minor_breaks=("1 hour"), labels=date_format("%H:%M")))
  Sys.sleep(3)  # pause so each plot stays visible
}
Well, here's one way.
# ...
for (Individual_Day in 1:2) {
  Day_subset <- DF[DF$Day == as.integer(Individual_Day),]
  lower <- with(Day_subset,as.POSIXct(strftime(min(time),"%Y-%m-%d")))
  upper <- with(Day_subset,as.POSIXct(strftime(as.Date(max(time))+1,"%Y-%m-%d"))-1)
  limits <- c(lower,upper)
  print(ggplot( data=Day_subset, aes( x=time, y=observation) ) +
    geom_point() +
    scale_x_datetime( breaks=("2 hour"),
                      minor_breaks=("1 hour"),
                      labels=date_format("%H:%M"),
                      limits=limits)
  )
}
The calculation for lower takes the minimum time in the subset and coerces it to character with only the date part (i.e., strips away the time part). Converting back to POSIXct generates the beginning of that day.
The calculation for upper is a little more complicated. You have to convert the maximum time to a Date value and add 1 (i.e., 1 day), then convert to character (strip off the time part), convert back to POSIXct, and subtract 1 (i.e., 1 second). This generates 23:59:59 on the end day.
Huge amount of work for such a small thing. I hope someone else posts a simpler way to do this...
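A possibly simpler variant, sketched in base R: truncate the earliest time to the start of its day, then add one day minus one second. This assumes every day is exactly 24 hours long (no daylight-saving change on that day), and day_limits is a hypothetical helper name, not something from ggplot2 or scales.

```r
# Hypothetical helper: limits covering the whole day that contains `t`.
# Assumption: the day is exactly 24 hours long (no DST transition).
day_limits <- function(t) {
  lower <- as.POSIXct(trunc(min(t), units = "days"))  # 00:00:00 of that day
  upper <- lower + 24 * 60 * 60 - 1                   # 23:59:59 of that day
  c(lower, upper)
}

# UTC is used here only to make the example reproducible
t <- as.POSIXct(c("2010-01-01 02:01:00", "2010-01-01 18:10:00"), tz = "UTC")
lims <- day_limits(t)
format(lims, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

The result could then be passed as limits = day_limits(Day_subset$time) inside scale_x_datetime.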
Related
In ggplot2, I have a question about appropriate scales for making POSIXct datetimes into time-of-day in an axis. Consider:
library(tidyverse)
library(lubridate)
library(hms)
library(patchwork)
test <- tibble(
dates = c(ymd_hms("2022-01-01 6:00:00"),
ymd_hms("2023-01-01 19:00:00")),
x = c(1, 2),
hms_dates = as_hms(dates)
)
plot1 <- ggplot(test) + geom_point(aes(x = x, y = dates)) +
scale_y_time()
plot2 <- ggplot(test) + geom_point(aes(x = x, y = hms_dates)) +
scale_y_time()
plot1 + plot2
Plot 1's y axis includes dates and times, but plot 2 shows just the time of day. That's what I want! I'd like to generate images like plot 2 without having to use the hms::as_hms approach. This seems to imply some option for scale_y_datetime (or similar) that I can't discover. I'd welcome suggestions.
Does someone have an example of how to use the limits option in scale_*_time, or (see question #1) limits for a scale_y_datetime that specify hours within the day? For example, limits(c(8,22)) predictably fails.
For your second question: when dealing with dates, datetimes or times, you have to set the limits and/or breaks as dates, datetimes or times too, i.e. use limits = as_hms(c("8:00:00", "22:00:00")):
library(tidyverse)
library(lubridate)
library(hms)
ggplot(test) + geom_point(aes(x = x, y = hms_dates)) +
scale_y_time(limits = as_hms(c("8:00:00", "22:00:00")))
#> Warning: Removed 1 rows containing missing values (`geom_point()`).
Concerning your first question: TBMK this cannot be achieved via scale_..._datetime. If you just want to show the time part of your dates, then converting to an hms object is IMHO the easiest way to achieve that. You could of course set the format of the axis text via the date_labels argument, e.g. date_labels="%H:%M:%S" to show only the time of day. However, as your dates variable is still a datetime, the scale, breaks and limits will still reflect that. You only change the format of the labels, so for your example data you end up with an axis showing the same time for each break, i.e. the start of the day.
ggplot(test) + geom_point(aes(x = x, y = dates)) +
scale_y_datetime(date_labels = "%H:%M:%S")
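If the goal is only to avoid depending on hms, one option is to compute the time of day yourself. The sketch below uses base R to turn each datetime into seconds since midnight; the variable names are made up for illustration, and the UTC time zone is an assumption.

```r
# Example datetimes matching the question's data (UTC assumed)
dates <- as.POSIXct(c("2022-01-01 06:00:00", "2023-01-01 19:00:00"), tz = "UTC")

# Break each datetime into its clock components
lt <- as.POSIXlt(dates)

# Seconds elapsed since midnight of each datetime's own day
secs <- lt$hour * 3600 + lt$min * 60 + lt$sec

# Optionally wrap as a difftime so the unit travels with the values
tod <- as.difftime(secs, units = "secs")
secs
```

The secs (or tod) column can then be mapped to the axis in place of hms_dates. Whether scale_y_time formats it the way you want is worth checking against your ggplot2 version.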
I have a dataset with events. These events have a start time and a duration. I want to create a scatter plot with the start time on the x-axis and the duration on the y-axis, but I want to alter the x-axis so that it displays the course of a week. That is, I want the x-axis to start on Monday 00:00 and run through Sunday 23:59.
All the solutions I've found online show me how to perform group-by-and-sum over weekdays, which is not what I want to do. I want to plot all data points individually, I simply want to reduce the date-axis to weekday and time.
Any suggestions?
This does what you need. It creates a new variable that places every observation in the same week, and then generates a scatter plot in the required format.
library(lubridate)
library(dplyr)
set.seed(1)
tmp <- data.frame(st_time = mdy("01-01-2018") + minutes(sample(1e5, size = 100)))
tmp <- tmp %>%
mutate(st_week = floor_date(st_time, unit = 'week')) %>% # calculate the start of week
mutate(st_time_inweek = st_time - st_week) %>% # calculate the time elapsed from the start of the week
mutate(st_time_all_in_oneweek = st_week[1] + st_time_inweek) %>% # put every obs in one week
mutate(duration = runif(100, 0, 100)) # generate a random duration variable
This is how to generate the plot. The part "%a %H:%M:%S" could be just "%a" as the time portion is not informative.
library(ggplot2)
ggplot(tmp) + aes(x = st_time_all_in_oneweek, y = duration) +
geom_point() + scale_x_datetime(date_labels = "%a %H:%M:%S", date_breaks = "1 day")
With "%a", the axis labels show only the weekday names.
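Since the question asks for an axis running from Monday 00:00 to Sunday 23:59, here is a base R sketch of the same folding idea with an explicit Monday start. fold_into_week and ref_monday are hypothetical names, the reference week (starting Monday 2018-01-01) is an arbitrary choice, and everything is computed in UTC as an assumption. As far as I know, floor_date starts weeks on Sunday unless told otherwise, so that is worth checking in the lubridate version above.

```r
# Hypothetical helper: move every timestamp into one reference week
# that starts on a Monday (2018-01-01 was a Monday).
fold_into_week <- function(t,
                           ref_monday = as.POSIXct("2018-01-01 00:00:00", tz = "UTC")) {
  lt <- as.POSIXlt(t, tz = "UTC")
  days_since_monday <- (lt$wday + 6) %% 7      # wday: 0 = Sunday, 1 = Monday, ...
  secs_into_week <- days_since_monday * 86400 +
    lt$hour * 3600 + lt$min * 60 + lt$sec
  ref_monday + secs_into_week
}

# A Wednesday noon and a Sunday evening from different weeks
t <- as.POSIXct(c("2018-01-03 12:00:00", "2018-02-11 23:59:00"), tz = "UTC")
folded <- fold_into_week(t)
format(folded, "%Y-%m-%d %H:%M", tz = "UTC")
```

The folded values keep their weekday and clock time, so they can be plotted with scale_x_datetime(date_labels = "%a", date_breaks = "1 day") as above.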
Maybe late, but for others searching:
there is a solution with
scale_x_date(date_labels = '%a')
described here: Weekdays below date on x-axis in ggplot2
This question already has answers here:
Time series plot gets offset by 2 hours if scale_x_datetime is used
I am trying to plot a time series that has an x axis of class "POSIXct" using ggplot2, which is working up to a point.
When I try to manipulate the x axis breaks and labels using scale_x_datetime it generates a one month shift in the x axis.
Can anyone explain this, and provide a solution?
Example simplified code:
library(ggplot2)
library(scales)  # provides date_breaks() and date_format()
start <- as.POSIXct("2014/07/01 00:00:00")
end <- as.POSIXct("2014/10/01 23:30:00")
interval <- as.difftime("00:30:00")
df <- data.frame(t=seq(start, end, by="1 day"))
df$v <- sample(1:100, replace=TRUE, nrow(df))
p <- ggplot(data=df, aes(x=t)) +
  geom_line(aes(y=v))
p2 <- p + scale_x_datetime(breaks=date_breaks("1 month"), labels=date_format("%b-%y"))
It's a time zone issue. date_format sets the time zone to "UTC" by default and internally calls format.POSIXct which calls as.POSIXlt internally. There this happens:
as.POSIXlt(start, "UTC")
#[1] "2014-06-30 22:00:00 UTC"
Voilà, a different month.
You can avoid this by not changing the time zone:
p + scale_x_datetime(breaks=date_breaks("1 month"),
labels=date_format("%b-%y", tz = Sys.timezone(location = TRUE)))
If you explicitly defined a time zone (you should) when creating the POSIXct variable, you should pass this time zone here.
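To see the effect without any plotting, here is a small base R sketch. The zone "Europe/Berlin" is an assumption, picked only because it is two hours ahead of UTC in July and so matches the output above; substitute your own zone.

```r
# Midnight on 1 July in a zone that is UTC+2 in summer (assumed zone)
start <- as.POSIXct("2014-07-01 00:00:00", tz = "Europe/Berlin")

# Formatting in UTC moves the instant back to the previous day and month
in_utc   <- format(start, "%Y-%m-%d %H:%M", tz = "UTC")
# Formatting in the original zone keeps the expected month
in_local <- format(start, "%Y-%m-%d %H:%M", tz = "Europe/Berlin")

in_utc
in_local
```

The same instant is labelled June in UTC and July in the local zone, which is why passing the data's own time zone to date_format removes the shift.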
Suppose I have a vector of numbers 1:12 and want to plot them over a period of time ranging from Jan. 2013 to Dec. 2013. I used the following code to generate the data and the plot:
dates<-seq(as.Date("2013/1/1"), by = "month", length.out = 12)
n<-seq(1:12)
df<-cbind(dates,n)
plot(df)
However, this code has some problems. First, I could not find an option in the first seq to generate only months and years without the day. Second, all dates in df become serial numbers, even after wrapping dates in as.Date inside cbind. Finally, the x axis of the plot is not in a time format, as a result of the first two problems.
just use
plot(dates,n)
without cbinding it. cbind creates a matrix (see class(df)). Within this process the dates are saved as class numeric.
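A quick way to check this yourself, as a sketch in base R: compare what cbind and data.frame do to the same Date vector. The object names m and df2 are just for illustration.

```r
dates <- seq(as.Date("2013-01-01"), by = "month", length.out = 12)
n <- 1:12

# cbind builds a numeric matrix, so the Date class is dropped
m <- cbind(dates, n)
class(m)
is.numeric(m)

# data.frame keeps each column's own class
df2 <- data.frame(dates = dates, n = n)
class(df2$dates)
```

With the data frame, plot(df2$dates, df2$n) gets a proper date axis because the column is still of class Date.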
For nicer and easier to customize plots use
require(ggplot2)
qplot(dates,n) + xlab("") + ylab("my y lab")
df<-data.frame(dates=dates,n=n)
plot(df$dates, df$n, axes=FALSE)
axis(1, labels=format(df$dates, "%b %Y"), at=df$dates)
axis(2)
The data are a series of dates and times.
date time
2010-01-01 09:04:43
2010-01-01 10:53:59
2010-01-01 10:57:18
2010-01-01 10:59:30
2010-01-01 11:00:44
…
My goal was to produce a scatterplot with the date on the horizontal axis (x) and the time on the vertical axis (y). I guess I could also add a color intensity if there is more than one time for the same date.
It was quite easy to create a histogram of dates.
mydata <- read.table("mydata.txt", header=TRUE, sep=" ")
mydatahist <- hist(as.Date(mydata$day), breaks = "weeks", freq=TRUE, plot=FALSE)
barplot(mydatahist$counts, border=NA, col="#ccaaaa")
I haven't figured out yet how to create a scatterplot where the axes are date and/or time.
I would also like to be able to have axes that are not necessarily linear dates (YYYY-MM-DD), but are based on months, such as MM-DD (so different years accumulate), or even rotate on weeks.
Any help, RTFM URI slapping or hints is welcome.
The ggplot2 package handles dates and times quite easily.
Create some date and time data:
dates <- as.POSIXct(as.Date("2011/01/01") + sample(0:365, 100, replace=TRUE))
times <- as.POSIXct(runif(100, 0, 24*60*60), origin="2011/01/01")
df <- data.frame(
dates = dates,
times = times
)
Then get some ggplot2 magic. ggplot will automatically deal with dates, but to get the time axis formatted properly use scale_y_datetime():
library(ggplot2)
library(scales)
ggplot(df, aes(x=dates, y=times)) +
geom_point() +
scale_y_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) +
theme(axis.text.x=element_text(angle=90))
Regarding the last part of your question, on grouping by week, etc.: to achieve this you may have to pre-summarize the data into the buckets that you want. You could possibly use plyr for this and then pass the resulting data to ggplot.
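For the MM-DD case (so that different years accumulate), one rough way to do it is to rewrite every date into a single reference year before plotting. This is only a sketch: the reference year 2000 is an arbitrary assumption, chosen because it is a leap year and so 29 February stays valid, and same_year is a made-up name.

```r
# Dates from different years, including a leap day
d <- as.Date(c("2010-03-15", "2011-03-15", "2012-02-29"))

# Keep month and day, replace the year with the reference year 2000
same_year <- as.Date(paste0("2000-", format(d, "%m-%d")))
format(same_year)
```

The same_year column can then go on the x axis with labels formatted as "%m-%d", so points from all years line up by calendar day.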
I'd start by reading about as.POSIXct, strptime, strftime, and difftime. These and related functions should allow you to extract the desired subsets of your data. The formatting is a little tricky, so play with the examples in the help files.
And, once your dates are converted to a POSIX class, as.numeric() will convert them all to numeric values, hence easy to sort, plot, etc.
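A tiny illustration of that last point, with UTC assumed so the numbers are reproducible: as.numeric() on a POSIXct value gives seconds since 1970-01-01 00:00:00 UTC, and difftime() gives the gap between two times in the units you ask for.

```r
t1 <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
t2 <- as.POSIXct("1970-01-02 06:30:00", tz = "UTC")

# Seconds since the epoch: one full day
as.numeric(t1)   # 86400

# Elapsed time between the two stamps, in hours
gap <- difftime(t2, t1, units = "hours")
as.numeric(gap)  # 6.5
```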
Edit: Andre's suggestion to play w/ ggplot to simplify your axis specifications is a good one.