scale_datetime shifts x axis [duplicate] - r

This question already has answers here:
Time series plot gets offset by 2 hours if scale_x_datetime is used
(2 answers)
Closed 6 years ago.
I am trying to plot a time series that has an x axis of class "POSIXct" using ggplot2, which is working up to a point.
When I try to manipulate the x axis breaks and labels using scale_x_datetime it generates a one month shift in the x axis.
Can anyone explain this, and provide a solution?
Example simplified code:
library(ggplot2)
library(scales)  # provides date_breaks() and date_format()
start <- as.POSIXct("2014/07/01 00:00:00")
end <- as.POSIXct("2014/10/01 23:30:00")
interval <- as.difftime("00:30:00")
df <- data.frame(t=seq(start, end, by="1 day"))
df$v <- sample(1:100, replace=TRUE, nrow(df))
p <- ggplot(data=df, aes(x=t)) +
  geom_line(aes(y=v))
p2 <- p + scale_x_datetime(breaks=date_breaks("1 month"), labels=date_format("%b-%y"))

It's a time zone issue. date_format sets the time zone to "UTC" by default and internally calls format.POSIXct which calls as.POSIXlt internally. There this happens:
as.POSIXlt(start, "UTC")
#[1] "2014-06-30 22:00:00 UTC"
Voilà, a different month.
You can avoid this by not changing the time zone:
p + scale_x_datetime(breaks=date_breaks("1 month"),
labels=date_format("%b-%y", tz = Sys.timezone(location = TRUE)))
If you explicitly defined a time zone (you should) when creating the POSIXct variable, you should pass this time zone here.
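As a minimal base-R sketch of the effect described in this answer: the same instant gets a different month label depending on the time zone used for formatting. The zone "Europe/Berlin" is an assumption chosen only because it matches the two-hour offset in the output above; substitute the zone your data were actually recorded in.

```r
# Assumption: the data were recorded in Europe/Berlin (UTC+2 in summer),
# which matches the two-hour offset shown in the answer.
tz_local <- "Europe/Berlin"
start <- as.POSIXct("2014-07-01 00:00:00", tz = tz_local)

# Numeric month-year labels avoid locale-dependent month names.
label_utc   <- format(start, "%m-%y", tz = "UTC")     # formatted after shifting to UTC
label_local <- format(start, "%m-%y", tz = tz_local)  # formatted in the original zone

print(c(utc = label_utc, local = label_local))
```

Passing the same zone to the labeller, for example date_format("%b-%y", tz = tz_local), should keep the axis labels aligned with the data.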


Using ggplot scale_x_datetime() to set first date on x axis [duplicate]

This question already has an answer here:
Specification of first and last tick marks with scale_x_date
(1 answer)
Closed 10 months ago.
I've got a plot of water levels over two years. I have a column of date time (format POSIXct and displayed like 2020-03-05 17:00:00). There are 18,000 rows of data in the dataframe. The first record was the 5 March 2020. When I plot the data, the first date that appears on the x axis is Jul-20. How do I make this Mar-20?
This is the bit of code I have at the moment relevant to this part of the plot design.
scale_x_datetime(
date_breaks="6 months",
date_labels="%b-%y")
Here's an example of the plot.
NOTE: this is also part of ongoing data collection, so I don't want to hard code the last date.
library(ggplot2)
dat <- data.frame(date = seq(as.Date("2020-03-05 17:00:00",
format = "%Y-%m-%d %H:%M:%S"),
as.Date("2022-03-05 17:00:00",
format = "%Y-%m-%d %H:%M:%S"),
length=18000))
dat$x <- rnorm(18000)
dat$date <- as.POSIXct(dat$date)
ggplot(dat, aes(x=date, y=x)) +
geom_line() +
theme_classic() +
scale_x_datetime(breaks= seq(min(dat$date), max(dat$date), length=6),
date_labels="%b-%y")
Created on 2022-05-03 by the reprex package (v2.0.1)
scale_x_date(breaks = seq(as.Date("2020-03-05"), as.Date(Sys.Date( )), by = "6 months"), date_labels="%b-%y")
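A possible variant for the datetime case that avoids hard-coding the last date: build the breaks from the data's own range, starting at the first record and stepping by six months. This is a sketch on made-up hourly timestamps (the UTC zone and the hourly spacing are assumptions, not the asker's data).

```r
# Hypothetical stand-in for the water-level timestamps: hourly, two years.
dates <- seq(as.POSIXct("2020-03-05 17:00:00", tz = "UTC"),
             as.POSIXct("2022-03-05 17:00:00", tz = "UTC"),
             by = "hour")

# Breaks start at the first record and never run past the last one,
# so new data extends the axis without code changes.
brks <- seq(min(dates), max(dates), by = "6 months")

# Numeric labels are used here to stay locale-independent.
labels <- format(brks, "%m-%y", tz = "UTC")
print(labels)
```

The resulting vector can then be supplied as scale_x_datetime(breaks = brks, date_labels = "%b-%y").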

X axis custom intervals when posixct data type using axis.Date()

I have a data set where the X axis variable is Date / Time data. I find the following syntax works when the X axis variable is a Date but when it is a date-time it does not seem to work.
What I wanted was the X axis here to have (say) weekly labels.
Any ideas how to make this work in plot()? I don't want to switch to ggplot etc.
This does not work:
plot(x = data$Time,y=data$foobar,
xlab = "Date / Time",
ylab = "y-foo-bar",main = "foo",xaxt="n")
axis.Date(1,data$Time,
at=seq(as.POSIXct("2020-04-01 16:36:00 IST"),
as.POSIXct("2020-05-01 16:36:00 IST"),by="weeks"))
Nor this:
axis.Date(1,data$Time,at=seq(as.Date("2020/04/01"),
as.Date("2020/05/01"),by="weeks"))
For more context:
class(data$Time)
[1] "POSIXct" "POSIXt"
data$Time[500]
[1] "2020-03-24 08:18:00 IST"
I would define tick mark positions and labels beforehand. Use axis without labels, then mtext, since its las= argument lets you rotate the labels.
Using strftime you can extract the week numbers (see ?strftime for other formats) and subset the time points.
weeks <- strftime(dat$time, "%W")
ats <- dat$time[!duplicated(weeks)]
labs <- strftime(ats, "%m-%d")
with(dat, plot(x=time, y=x, type="l", main="foo", xaxt="n"))
axis(1, at=ats, labels=FALSE)
mtext(labs, side=1, line=.75, at=ats, las=2)
I omitted the year, since it might be redundant information. You could also omit the month by using two mtexts in different lines and also omit the duplicates.
Data
set.seed(33720)
n <- 100
dat <- data.frame(time=seq(1585034280, (1585034280 + n*24*60*60), length.out=n),
x=cumsum(rexp(n)))
dat$time <- as.POSIXct(dat$time, origin="1970-01-01")
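An alternative sketch, staying in base graphics: because the column is POSIXct rather than Date, axis.POSIXct() can be used where axis.Date() failed, with weekly positions generated by seq(). The data below are a simplified version of the example data (exact daily spacing and an explicit UTC zone are assumptions made so the result is reproducible).

```r
set.seed(33720)
n <- 100
# Simplified example data: one observation per day, explicit UTC zone.
dat <- data.frame(
  time = as.POSIXct(1585034280 + (0:(n - 1)) * 86400,
                    origin = "1970-01-01", tz = "UTC"),
  x = cumsum(rexp(n))
)

# One tick every 7 days, starting at the first observation.
ats <- seq(min(dat$time), max(dat$time), by = "week")

plot(dat$time, dat$x, type = "l", main = "foo", xaxt = "n",
     xlab = "Date / Time", ylab = "x")
# axis.POSIXct() formats the tick labels itself; las = 2 rotates them.
axis.POSIXct(1, at = ats, format = "%m-%d", las = 2)
```

Compared with the strftime("%W") approach, the ticks here fall at fixed seven-day steps from the first data point rather than on calendar week boundaries.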

plot(x,y) with strptime doesn't work with singular data point

I am trying to plot a scatterplot in shiny, using times of day on the y axis.
When plotting multiple points, the y axis looks great.
Here's the code:
output$outputPlot <- renderPlot({
coords <- subset(coords, location == input$cities)
month <- coords$month
time <- strptime(coords$format_time, format = "%l:%M:%S %p")
plot(month, time)
})
But when there's only 1 data point in coords, the plot's time scale on the y-axis isn't in terms of time anymore, and the data point appears in the middle of the graph.
Thanks for your help!
What you're seeing is that R doesn't know how to guess an appropriate range for a single point. Typically it expands the range of data by 4% of the range (look at ?par and look for 'xaxs'), but with a single point that means nothing.
So we need to tell it what ylim to use. (Similarly, your x-axis needs some guidance, too.)
Fake data:
set.seed(2)
coords <- data.frame(tm = Sys.time() + runif(20, -3e7, 3e7))
coords$month <- as.integer(format(coords$tm, format = "%m"))
coords$format_time <- format(coords$tm, format = "%l:%M:%S %p")
head(coords)
# tm month format_time
# 1 2018-10-24 20:15:17 10 8:15:17 PM
# 2 2019-10-19 05:07:04 10 5:07:04 AM
# 3 2019-07-21 14:19:22 7 2:19:22 PM
# 4 2018-10-13 03:44:57 10 3:44:57 AM
# 5 2020-04-03 21:32:22 4 9:32:22 PM
# 6 2020-04-03 15:27:59 4 3:27:59 PM
The "normal" plot looks fine:
month <- coords$month
time <- strptime(coords$format_time, format = "%l:%M:%S %p")
plot(month, time)
but the single-point does not:
sub <- coords[1,]
month <- sub$month
time <- strptime(sub$format_time, format = "%l:%M:%S %p")
plot(month, time)
So we fix it by specifying the xlim and ylim arguments. In this case, since I'm inferring it is meant to be a year of months (x) and a day of times (y), I can hard-code them, but in other situations you might want to just subtract/add a small amount from the one datum you have:
sub <- coords[1,]
month <- sub$month
time <- strptime(sub$format_time, format = "%l:%M:%S %p")
xlim <- c(1, 12)
ylim <- strptime(c("12:00:00 AM", "11:59:59 PM"), format = "%l:%M:%S %p")
plot(month, time, xlim = xlim, ylim = as.numeric(ylim))
You only need to specify ylim to answer this question, but without setting xlim= here, the previous x-axis spanned 6-14, not good for months. Also of note is that I had to coerce ylim to numeric for the plot, it did not work with ylim in its pure POSIXt form ... not sure exactly why that is the case, but this doesn't detract from the utility of the plot in general.
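For the "subtract/add a small amount" variant mentioned above, a minimal sketch could look like this. The one-hour padding, the one-month x padding, and the UTC zone are arbitrary assumptions for illustration.

```r
# A single observation (values taken from the first row of the fake data).
time  <- as.POSIXct("2018-10-24 20:15:17", tz = "UTC")
month <- 10

# Pad the lone time value by one hour on each side (arbitrary choice).
pad_secs <- 60 * 60
ylim <- as.numeric(time) + c(-pad_secs, pad_secs)

# ylim is passed as numeric, as in the answer above.
plot(month, time, xlim = c(month - 1, month + 1), ylim = ylim)
```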

How to plot a variable over time with time as rownames

I am trying to plot a time series in ggplot2. Assume I am using the following data structure (2500 x 20 matrix):
set.seed(21)
n <- 2500
x <- matrix(replicate(20,cumsum(sample(c(-1, 1), n, TRUE))),nrow = 2500,ncol=20)
aa <- x
rnames <- seq(as.Date("2010-01-01"), length=dim(aa)[1], by="1 month") - 1
rownames(aa) <- format(as.POSIXlt(rnames, format = "%Y-%m-%d"), format = "%d.%m.%Y")
colnames(aa) <- paste0("aa",1:ncol(aa))
library("ggplot2")
library("reshape2")
library("scales")
aa <- melt(aa, id.vars = rownames(aa))
names(aa) <- c("time","id","value")
Now the following command to plot the time series produces a weird looking x axis:
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line()
What I found out is that I can change the format to date:
aa$time <- as.Date(aa$time, "%d.%m.%Y")
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line()
This looks better, but still not a good graph. My question is especially how to control the formatting of the x axis.
Does it have to be in Date format? How can I control the amount of breaks (i.e. years) shown in either case? It seems to be mandatory if Date is not used; otherwise ggplot2 uses some kind of useful default for the breaks I believe.
For example the following command does not work:
aa$time <- as.Date(aa$time, "%d.%m.%Y")
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line() +
scale_x_continuous(breaks=pretty_breaks(n=10))
Also, if you have any hints on how to improve the overall look of the graph, feel free to add them (e.g. the lines look a bit imprecise imho).
You can format dates with scale_x_date as @Gopala mentioned. Here's an example using a shortened version of your data for illustration.
library(dplyr)
# Dates need to be in date format
aa$time <- as.Date(aa$time, "%d.%m.%Y")
# Shorten data to speed rendering
aa = aa %>% group_by(id) %>% slice(1:200)
In the code below, we get date breaks every six months with date_breaks="6 months". That's probably more breaks than you want in this case and is just for illustration. If you want to determine which months get the breaks (e.g., Jan/July, Feb/Aug, etc.) then you also need to use coord_cartesian and set the start date with xlim and expand=FALSE so that ggplot won't pad the start date. But when you set expand=FALSE you also don't get any padding on the y-axis, so you need to add the padding manually with scale_y_continuous (I'd prefer to be able to set expand separately for the x and y axes, but AFAIK it's not possible). Because the breaks are packed tightly, we use a theme statement to rotate the labels by 90 degrees.
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line(show.legend=FALSE) +
scale_y_continuous(limits=c(min(aa$value) - 2, max(aa$value) + 1)) +
scale_x_date(date_breaks="6 months",
labels=function(d) format(d, "%b %Y")) +
coord_cartesian(xlim=c(as.Date("2009-07-01"), max(aa$time) + 182),
expand=FALSE) +
theme_bw() +
theme(axis.text.x=element_text(angle=-90, vjust=0.5))
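Another way to control which months get the breaks, sketched here in base R, is to build the break vector explicitly and hand it to the scale. The four date strings are made-up stand-ins for the rownames; the start date and the 182-day margin mirror the xlim used above.

```r
# Hypothetical rownames in the question's "dd.mm.YYYY" format.
time_chr <- c("31.12.2009", "31.01.2010", "28.02.2010", "31.03.2011")
time <- as.Date(time_chr, format = "%d.%m.%Y")

# Breaks every six months, anchored on 1 July / 1 January.
brks <- seq(as.Date("2009-07-01"), max(time) + 182, by = "6 months")
print(brks)
```

Such a vector can be supplied as scale_x_date(breaks = brks, ...) in place of date_breaks="6 months"; whether that removes the need for the coord_cartesian step depends on the plot, so treat it as a starting point.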

R - ggplot2 - How to use limits on POSIX axis?

What is the smartest way to manipulate POSIX for use in ggplot axis?
I am trying to create a function for plotting many graphs (One per day) spanning a period of weeks, using POSIX time for the x axis.
To do so, I create an additional integer column DF$Day with the day, that I input into the function. Then, I create a subset using that day, which I plot using ggplot2. I figured how to use scale_x_datetime to format the POSIX x axis. Basically, I have it show the hours & minutes only, omitting the date.
Here is my question: How can I set the limits for each individual graph in hours of the day?
Below is some working, reproducible code to give an idea. It creates the plot for the first day, shows it for 3 seconds, and then proceeds to create the second day. But each day's limits are chosen based on the range of the time variable. How can I make the range, for instance, span the whole day (0h - 24h)?
library(ggplot2)
library(scales)  # provides date_format()
DF <- data.frame(matrix(ncol = 0, nrow = 4))
DF$time <- as.POSIXct(c("2010-01-01 02:01:00", "2010-01-01 18:10:00", "2010-01-02 04:20:00", "2010-01-02 13:30:00"))
DF$observation <- c(1,2,1,2)
DF$Day <- c(1,1,2,2)
for (Individual_Day in 1:2) {
Day_subset <- DF[DF$Day == as.integer(Individual_Day),]
print(ggplot( data=Day_subset, aes_string( x="time", y="observation") ) + geom_point() +
scale_x_datetime( breaks=("2 hour"), minor_breaks=("1 hour"), labels=date_format("%H:%M")))
Sys.sleep(3) }
Well, here's one way.
# ...
for (Individual_Day in 1:2) {
Day_subset <- DF[DF$Day == as.integer(Individual_Day),]
lower <- with(Day_subset,as.POSIXct(strftime(min(time),"%Y-%m-%d")))
upper <- with(Day_subset,as.POSIXct(strftime(as.Date(max(time))+1,"%Y-%m-%d"))-1)
limits = c(lower,upper)
print(ggplot( data=Day_subset, aes( x=time, y=observation) ) +
geom_point() +
scale_x_datetime( breaks=("2 hour"),
minor_breaks=("1 hour"),
labels=date_format("%H:%M"),
limits=limits)
)
}
The calculation for lower takes the minimum time in the subset and coerces it to character with only the date part (i.e., strips away the time part). Converting back to POSIXct generates the beginning of that day.
The calculation for upper is a little more complicated. You have to convert the maximum time to a Date value and add 1 (i.e., 1 day), then convert to character (strip off the time part), convert back to POSIXct, and subtract 1 (i.e., 1 second). This generates 23:59:59 on the end day.
Huge amount of work for such a small thing. I hope someone else posts a simpler way to do this...
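One possibly simpler route, offered as a sketch rather than a tested replacement: trunc() on a date-time drops the time of day directly, and the upper limit follows by adding a day minus one second. The UTC zone is an assumption; on days with a daylight-saving change in a local zone, a day is not exactly 24 hours long.

```r
# Times for one day's subset (taken from the question's second day).
times <- as.POSIXct(c("2010-01-02 04:20:00", "2010-01-02 13:30:00"), tz = "UTC")

# trunc() returns POSIXlt, so convert back to POSIXct for ggplot2.
lower <- as.POSIXct(trunc(min(times), units = "days"))  # 00:00:00 that day
upper <- lower + 24 * 60 * 60 - 1                       # 23:59:59 that day
limits <- c(lower, upper)

print(format(limits, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```

The limits vector can be passed to scale_x_datetime(limits = limits) exactly as in the loop above.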
