I am using the fasttime package for its fastPOSIXct function that can read character datetimes very efficiently. My problem is that it can only read character datetimes THAT ARE EXPRESSED IN GMT.
R) fastPOSIXct("2010-03-15 12:37:17.223",tz="GMT") #very fast
[1] "2010-03-15 12:31:16.223 GMT"
R) as.POSIXct("2010-03-15 12:37:17.223",tz="GMT") #very slow
[1] "2010-03-15 12:31:16.223 GMT"
Now, say I have a file with datetimes expressed in the "America/Montreal" timezone. The plan is to load them (implicitly pretending they are in GMT) and then modify the timezone attribute afterwards without changing the underlying value.
If I use this function, referred to in another post:
forceTZ = function(x, tz){
  return(as.POSIXct(as.numeric(x), origin=as.POSIXct("1970-01-01", tz=tz), tz=tz))
}
I am seeing a bug ...
R) forceTZ(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"),"America/Montreal")
[1] "2010-03-15 13:37:17.223 EDT"
... because I would like it to be
R) as.POSIXct("2010-03-15 12:37:17.223",format="%Y-%m-%d %H:%M:%OS",tz="America/Montreal")
[1] "2010-03-15 12:37:17.223 EDT"
Is there a workaround?
EDIT: I know about lubridate::force_tz but it is too slow (no point in using fasttime::fastPOSIXct anymore).
The smart thing to do here is almost certainly to write readable, easy-to-maintain code, and throw more hardware at the problem if your code is too slow.
If you are desperate for a code speedup, then you could write a custom time-zone adjustment function. It isn't pretty, so if you have to convert between many time zones, you'll end up with spaghetti code. Here's my solution for the specific case of converting from GMT to Montreal time.
First precompute a list of dates for daylight savings time. You'll need to extend this to before 2010/after 2013 in order to fit your dataset. I found the dates here:
http://www.timeanddate.com/worldclock/timezone.html?n=165
montreal_tz_data <- cbind(
  start = fastPOSIXct(
    c("2010-03-14 07:00:00", "2011-03-13 07:00:00", "2012-03-11 07:00:00", "2013-03-10 07:00:00")
  ),
  end = fastPOSIXct(
    c("2010-11-07 06:00:00", "2011-11-06 06:00:00", "2012-11-04 06:00:00", "2013-11-03 06:00:00")
  )
)
For speed, the function to change time zones treats the times as numbers.
to_montreal_tz <- function(x)
{
  x <- as.numeric(x)
  is_dst <- logical(length(x))  # initialise as FALSE
  # Loop over DST periods in each year
  for(row in seq_len(nrow(montreal_tz_data)))
  {
    is_dst[x > montreal_tz_data[row, 1] & x < montreal_tz_data[row, 2]] <- TRUE
  }
  # Hard-coded numbers are 4/5 hours in seconds
  ans <- ifelse(is_dst, x + 14400, x + 18000)
  class(ans) <- c("POSIXct", "POSIXt")
  ans
}
Now, to compare times:
#A million dates
ch <- rep("2010-03-15 12:37:17.223", 1e6)
#The easy way (no conversion of time zones afterwards)
system.time(as.POSIXct(ch, tz="America/Montreal"))
# user system elapsed
# 28.96 0.05 29.00
#A slight performance gain by specifying the format
system.time(as.POSIXct(ch, format = "%Y-%m-%d %H:%M:%S", tz="America/Montreal"))
# user system elapsed
# 13.77 0.01 13.79
#Using the fast functions
library(fasttime)
system.time(to_montreal_tz(fastPOSIXct(ch)))
# user system elapsed
# 0.51 0.02 0.53
As with all optimisation tricks, you've either got a 27-fold speedup (yay!) or you've saved 13 seconds of processing time but added 3 days of code-maintenance time from an obscure bug when your DST table runs out in 2035 (boo!).
It's a daylight savings issue: http://www.timeanddate.com/time/dst/2010a.html
In 2010 it began on the 14th March in Canada, but not until the 28th March in the UK.
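You can see the mismatch directly by parsing the same clock time in both zones (illustrative; the exact abbreviations come from your tz database):
# Mid-March 2010: Montreal is already on EDT (UTC-4), London is still on GMT (UTC+0)
as.POSIXct("2010-03-15 12:37:17", tz = "America/Montreal")
# [1] "2010-03-15 12:37:17 EDT"
as.POSIXct("2010-03-15 12:37:17", tz = "Europe/London")
# [1] "2010-03-15 12:37:17 GMT"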
You can use POSIXlt objects to modify timezones directly:
lt <- as.POSIXlt(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"))
attr(lt,"tzone") <- "America/Montreal"
as.POSIXct(lt)
[1] "2010-03-15 12:37:17 EDT"
Or you could use format to convert to a string and set the timezone in a call to as.POSIXct. You can therefore modify forceTZ:
forceTZ <- function(x, tz)
{
  return(as.POSIXct(format(x), tz=tz))
}
forceTZ(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"),"America/Montreal")
[1] "2010-03-15 12:37:17 EDT"
Could you not just add the appropriate number of seconds to correct the offset from GMT?
# Original problem
fastPOSIXct("2010-03-15 12:37:17.223",tz="America/Montreal")
# [1] "2010-03-15 08:37:17 EDT"
# Add 4 hours worth of seconds to the data. This should be very quick.
fastPOSIXct("2010-03-15 12:37:17.223",tz="America/Montreal") + 14400
# [1] "2010-03-15 12:37:17 EDT"
Related
I have a data set containing the following date, along with several others
03/12/2017 02:17:13
I want to put the whole data set into a data table, so I used read_csv and as.data.table to create DT, which contained the date/time information in a column named date.
Next I used
DT[, date := as.POSIXct(date, format = "%m/%d/%Y %H:%M:%S")]
Everything looked fine except I had some NA values where the original data had dates. The following expression returns an NA
as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S")
The question is why, and how to fix it.
Just use functions anytime() or utctime() from package anytime
R> library(anytime)
R> anytime("03/12/2017 02:17:13")
[1] "2017-03-12 01:17:13 CST"
R>
or
R> utctime("03/12/2017 02:17:13")
[1] "2017-03-11 20:17:13 CST"
R>
The real crux is that this time did not exist in North America due to DST. You could parse it as UTC, since UTC does not observe daylight savings:
R> utctime("03/12/2017 02:17:13", tz="UTC")
[1] "2017-03-12 02:17:13 UTC"
R>
You can express that UTC time as Mountain time, but it gets you the previous day:
R> utctime("03/12/2017 02:17:13", tz="America/Denver")
[1] "2017-03-11 19:17:13 MST"
R>
Ultimately, you (as the analyst) have to decide what was actually measured. UTC would make sense; the others may need adjustment.
My solution is below, but ways to improve it are appreciated.
The explanation for the NA is that in the mountain time zone in the US, that date and time fall in the window of the switch to daylight savings where the time doesn't exist, hence the NA. While the time zone is not explicitly specified, I guess R must be picking it up from the computer's settings, which use "America/Denver".
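For what it's worth, you can reproduce the NA by naming the timezone explicitly instead of relying on the machine's locale (on platforms where nonexistent local times parse to NA):
as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S", tz = "America/Denver")
# [1] NA   (02:17 local time did not exist on 2017-03-12 in America/Denver)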
The solution is to explicitly state the date/time string is in UTC and then convert back as follows:
time.utc <- as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
> time.utc
[1] "2017-03-12 02:17:13 UTC"
>
Next, add 6 hours to the UTC time, which is the difference between UTC and MDT (Mountain Daylight Time):
time.utc2 <- time.utc + 6 * 60 * 60
> time.utc2
[1] "2017-03-12 08:17:13 UTC"
>
Now convert to America/Denver time using daylight savings.
time.mdt <- format(time.utc2, usetz = TRUE, tz = "America/Denver")
> time.mdt
[1] "2017-03-12 01:17:13 MST"
>
Note that this is in standard time, because daylight savings doesn't start until 2 am.
If you change the original string from 2 am to 3 am, you get the following
> time.mdt
[1] "2017-03-12 03:17:13 MDT"
>
The hour between 2 and 3 is lost in the change from standard to daylight savings, but the data are now correct.
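For completeness, the 3 am case with the same recipe looks like this (just the steps above collapsed into two lines):
time.utc3 <- as.POSIXct("03/12/2017 03:17:13", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
format(time.utc3 + 6 * 60 * 60, usetz = TRUE, tz = "America/Denver")
# [1] "2017-03-12 03:17:13 MDT"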
While doing predictive modeling on timestamped data, I want to write a function in R (possibly using data.table) that rounds the date by X number of hours. E.g. rounding by 2 hours should give this:
"2014-12-28 22:59:00 EDT" becomes "2014-12-28 22:00:00 EDT"
"2014-12-28 23:01:00 EDT" becomes "2014-12-29 00:00:00 EDT"
It's very easy to do when rounding by 1 hour, using the round.POSIXt(.date, "hour") function.
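For reference, the 1-hour case is a one-liner (note that round.POSIXt returns a POSIXlt, displayed in your local timezone):
round.POSIXt(as.POSIXct("2014-12-28 22:59:00"), "hour")
# [1] "2014-12-28 23:00:00" (followed by your local timezone abbreviation)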
However, writing a generic function, like I do below using multiple if statements, becomes quite ugly:
d7.dateRoundByHour <- function (.date, byHours) {
  if (byHours == 1)
    return (round.POSIXt(.date, "hour"))
  hh = hour(.date); dd = mday(.date); mm = month(.date); yy = year(.date)
  hh = round(hh/byHours, digits=0) * byHours
  if (hh >= 24) {
    hh = 0; dd = dd+1
  }
  if ((mm==2 & dd==28) |
      (mm %in% c(1,3,5,7,8,10,12) & dd==31) |
      (mm %in% c(2,4,6,9,11) & dd==30)) { # NB: it won't work on 29 Feb in leap years.
    dd = 1; mm = mm+1
  }
  if (mm == 13) {
    mm = 1; yy = yy+1
  }
  str = sprintf("%i-%02.0f-%02.0f %02.0f:%02.0f:%02.0f EDT", yy, mm, dd, hh, 0, 0)
  as.POSIXct(str, format="%Y-%m-%d %H:%M:%S")
}
Can anyone show a better way to do that?
(Perhaps by converting to numeric and back to POSIXct, or by using some other POSIXt functions?)
Use the round_date function from the lubridate package. Assuming you had a data.table with a column named date you could do the following:
dt[, date := round_date(date, '2 hours')]
A quick example will give you exactly the results you were looking for:
x <- as.POSIXct("2014-12-28 22:59:00 EDT")
round_date(x, '2 hours')
This is actually really easy with just base R. The basic idea for rounding by "odd lots" is that you
scale down by an appropriate scale factor
round to the nearest integer in the downscaled unit
scale back up and re-convert
Or in two R code statements:
R> pt <- as.POSIXct(c("2014-12-28 22:59:00", "2014-12-28 23:01:00 EDT"))
R> pt # just to check
[1] "2014-12-28 22:59:00 CST" "2014-12-28 23:01:00 CST"
R>
R> scalefactor <- 60*60*2 # 2 hours of 60 minutes times 60 seconds
R>
R> as.POSIXct(round(as.numeric(pt)/scalefactor) * scalefactor, origin="1970-01-01")
[1] "2014-12-28 22:00:00 CST" "2014-12-29 00:00:00 CST"
R>
The key last line just does what I outlined: it converts the POSIXct to a numeric representation, scales it down, rounds, then scales back up and converts to a POSIXct again.
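If you want this as a reusable helper, a minimal sketch along the same scale/round/rescale lines might look like the following (the function name round_posix_by is made up for illustration):
round_posix_by <- function(x, hours = 2, tz = "") {
  scalefactor <- 60 * 60 * hours                       # width of the rounding bucket in seconds
  as.POSIXct(round(as.numeric(x) / scalefactor) * scalefactor,
             origin = "1970-01-01", tz = tz)           # rounded seconds back to POSIXct
}
round_posix_by(pt, 2)  # same result as the explicit one-liner above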
I have a POSIXct object and would like to change its tz attribute WITHOUT R interpreting it (interpreting it would mean changing how the datetime is displayed on the screen).
Some background: I am using the fasttime package from S. Urbanek, which takes strings and casts them to POSIXct very quickly. The problem is that the strings should represent datetimes in "GMT", which is not the case for my data.
I end up with a POSIXct object with tz=GMT when in reality it is tz=GMT+1. If I change the timezone with
attr(datetime, "tzone") <- "Europe/Paris";
datetime <- .POSIXct(datetime,tz="Europe/Paris");
then it will be "displayed" as GMT+2 (the underlying value never change).
EDIT: Here is an example
datetime=as.POSIXct("2011-01-01 12:32:23.234",tz="GMT")
attributes(datetime)
#$tzone
#[1] "GMT"
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
How can I change this attribute without R interpreting it, i.e. how can I change tzone and still have datetime displayed as "2011-01-01 12:32:23.233"?
EDIT/SOLUTION: @GSee's solution is reasonably fast; lubridate::force_tz is very slow.
datetime=rep(as.POSIXct("2011-01-01 12:32:23.234",tz="GMT"),1e5)
f <- function(x,tz) return(as.POSIXct(as.numeric(x), origin="1970-01-01", tz=tz))
> system.time(datetime2 <- f(datetime,"Europe/Paris"))
user system elapsed
0.01 0.00 0.02
> system.time(datetime3 <- force_tz(datetime,"Europe/Paris"))
user system elapsed
5.94 0.02 5.98
identical(datetime2,datetime3)
[1] TRUE
To change the tz attribute of a POSIXct variable, it is not best practice to convert to character or numeric and then back to POSIXct. Instead, you could use the force_tz function from the lubridate package:
library(lubridate)
datetime2 <- force_tz(datetime, tzone = "CET")
datetime2
attributes(datetime2)
EDITED:
My previous solution was passing a character value to origin (i.e. origin = "1970-01-01"). That only worked here because of a bug (PR#14973) that has now been fixed in R-devel.
origin was being coerced to POSIXct using the tz argument of the as.POSIXct call, and not "GMT" as it was documented to do. The behavior has been changed to match the documentation which, in this case, means that you have to specify your timezone for both the origin and the as.POSIXct call.
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
as.POSIXct(as.numeric(datetime), origin=as.POSIXct("1970-01-01", tz="Europe/Paris"),
tz="Europe/Paris")
#[1] "2011-01-01 12:32:23.233 CET"
This also works in older versions of R.
An alternative to the lubridate package is via conversion to and back from character type:
recastTimezone.POSIXct <- function(x, tz) return(
as.POSIXct(as.character(x), origin = as.POSIXct("1970-01-01"), tz = tz))
(Adapted from GSee's answer)
Don't know if this is efficient, but it would work for time zones with daylight savings.
Test code:
x <- as.POSIXct('2003-01-03 14:00:00', tz = 'Etc/UTC')
x
recastTimezone.POSIXct(x, tz = 'Australia/Melbourne')
Output:
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 14:00:00 AEDT" # Nothing is changed apart from the time zone.
Output if I replaced as.character() by as.numeric() (as GSee had done):
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 15:00:00 AEDT" # An hour is added.
How would I extract the time from a series of POSIXct objects discarding the date part?
For instance, I have:
times <- structure(c(1331086009.50098, 1331091427.42461, 1331252565.99979,
1331252675.81601, 1331262597.72474, 1331262641.11786, 1331269557.4059,
1331278779.26727, 1331448476.96126, 1331452596.13806), class = c("POSIXct",
"POSIXt"))
which corresponds to these dates:
"2012-03-07 03:06:49 CET" "2012-03-07 04:37:07 CET"
"2012-03-09 01:22:45 CET" "2012-03-09 01:24:35 CET"
"2012-03-09 04:09:57 CET" "2012-03-09 04:10:41 CET"
"2012-03-09 06:05:57 CET" "2012-03-09 08:39:39 CET"
"2012-03-11 07:47:56 CET" "2012-03-11 08:56:36 CET"
Now, I have some values for a parameter measured at those times:
val <- c(1.25343125e-05, 0.00022890575,
3.9269125e-05, 0.0002285681875,
4.26353125e-05, 5.982625e-05,
2.09575e-05, 0.0001516951251,
2.653125e-05, 0.0001021391875)
I would like to plot val vs time of the day, irrespective of the specific day when val was measured.
Is there a specific function that would allow me to do that?
You can use strftime to convert datetimes to any character format:
> t <- strftime(times, format="%H:%M:%S")
> t
[1] "02:06:49" "03:37:07" "00:22:45" "00:24:35" "03:09:57" "03:10:41"
[7] "05:05:57" "07:39:39" "06:47:56" "07:56:36"
But that doesn't help very much, since you want to plot your data. One workaround is to strip the date element from your times, and then to add an identical date to all of your times:
> xx <- as.POSIXct(t, format="%H:%M:%S")
> xx
[1] "2012-03-23 02:06:49 GMT" "2012-03-23 03:37:07 GMT"
[3] "2012-03-23 00:22:45 GMT" "2012-03-23 00:24:35 GMT"
[5] "2012-03-23 03:09:57 GMT" "2012-03-23 03:10:41 GMT"
[7] "2012-03-23 05:05:57 GMT" "2012-03-23 07:39:39 GMT"
[9] "2012-03-23 06:47:56 GMT" "2012-03-23 07:56:36 GMT"
Now you can use these datetime objects in your plot:
plot(xx, rnorm(length(xx)), xlab="Time", ylab="Random value")
For more help, see ?DateTimeClasses
The data.table package has a function as.ITime, which can do this efficiently; see below:
library(data.table)
x <- "2012-03-07 03:06:49 CET"
as.IDate(x) # Output is "2012-03-07"
as.ITime(x) # Output is "03:06:49"
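Applied to the times vector from the question, that would look like this (the clock times printed depend on your session's timezone):
library(data.table)
as.ITime(times)  # drops the dates and keeps only the time of day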
There have been previous answers that showed the trick. In essence:
you must retain POSIXct types to take advantage of all the existing plotting functions
if you want to 'overlay' several days' worth on a single plot, highlighting the intra-daily variation, the best trick is to ...
impose the same day (and month and even year if need be, which is not the case here)
which you can do by overriding the day-of-month and month components when in POSIXlt representation, or just by offsetting the 'delta' relative to 0:00:00 between the different days.
So with times and val as helpfully provided by you:
## impose month and day based on first obs
ntimes <- as.POSIXlt(times) # convert to 'POSIX list type'
ntimes$mday <- ntimes[1]$mday # and $mon if it differs too
ntimes <- as.POSIXct(ntimes) # convert back
par(mfrow=c(2,1))
plot(times,val) # old times
plot(ntimes,val) # new times
yields a plot contrasting the original and modified time scales.
Here's an update for those looking for a tidyverse method to extract hh:mm:ss.ssssss from a POSIXct object. Note that the time zone is not included in the output.
library(hms)
as_hms(times)
Many solutions have been provided, but I have not seen this one, which uses package chron:
hours = times(strftime(times, format="%T"))
plot(val~hours)
(sorry, I am not entitled to post an image, you'll have to plot it yourself)
I can't find anything that deals with clock times exactly, so I'd just use some functions from package:lubridate and work with seconds-since-midnight:
require(lubridate)
clockS = function(t){hour(t)*3600+minute(t)*60+second(t)}
plot(clockS(times),val)
You might then want to look at some of the axis code to figure out how to label axes nicely.
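For instance, one sketch is to plot hours since midnight (seconds divided by 3600) and draw the axis yourself, reusing the clockS helper above:
plot(clockS(times) / 3600, val, xaxt = "n", xlab = "Time of day", ylab = "val")
axis(1, at = 0:24, labels = sprintf("%02d:00", 0:24))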
The time_t value for midnight GMT is always divisible by 86400 (24 * 3600). The value for seconds-since-midnight GMT is thus time %% 86400.
The hour in GMT is (time %% 86400) / 3600 and this can be used as the x-axis of the plot:
plot((as.numeric(times) %% 86400)/3600, val)
To adjust for a time zone, adjust the time before taking the modulus, by adding the number of seconds that your time zone is ahead of GMT. For example, US central daylight saving time (CDT) is 5 hours behind GMT. To plot against the time in CDT, the following expression is used:
plot(((as.numeric(times) - 5*3600) %% 86400)/3600, val)
What's the best way to get the length of time represented by an interval in lubridate, in specified units? All I can figure out is something like the following messy thing:
> ival
[1] 2011-01-01 03:00:46 -- 2011-10-21 18:33:44
> difftime(attr(ival, "start") + as.numeric(ival), attr(ival, "start"), 'days')
Time difference of 293.6479 days
(I also added this as a feature request at https://github.com/hadley/lubridate/issues/105, under the assumption that there's no better way available - but maybe someone here knows of one.)
Update - apparently the difftime function doesn't handle this either. Here's an example.
> (d1 <- as.POSIXct("2011-03-12 12:00:00", 'America/Chicago'))
[1] "2011-03-12 12:00:00 CST"
> (d2 <- d1 + days(1)) # Gives desired result
[1] "2011-03-13 12:00:00 CDT"
> (i2 <- d2 - d1)
[1] 2011-03-12 12:00:00 -- 2011-03-13 12:00:00
> difftime(attr(i2, "start") + as.numeric(i2), attr(i2, "start"), 'days')
Time difference of 23 hours
As I mention below, I think one nice way to handle this would be to implement a /.interval function that doesn't first cast its input to a period.
The as.duration function is what lubridate provides. The interval class is represented internally as the number of seconds from the start, so if you wanted the number of hours you could simply divide as.numeric(ival) by 3600, or by (3600*24) for days.
If you want worked examples of functions applied to your object, you should provide the output of dput(ival). I did my testing on the objects created on the help(duration) page which is where ?interval sent me.
date <- as.POSIXct("2009-03-08 01:59:59") # DST boundary
date2 <- as.POSIXct("2000-02-29 12:00:00")
span <- date2 - date #creates interval
span
#[1] 2000-02-29 12:00:00 -- 2009-03-08 01:59:59
str(span)
#Classes 'interval', 'numeric' atomic [1:1] 2.85e+08
# ..- attr(*, "start")= POSIXct[1:1], format: "2000-02-29 12:00:00"
as.duration(span)
#[1] 284651999s (9.02y)
as.numeric(span)/(3600*24)
#[1] 3294.583
# A check against the messy method:
difftime(attr(span, "start") + as.numeric(span), attr(span, "start"), 'days')
# Time difference of 3294.583 days
This question is really old, but I'm adding an update because it has been viewed many times, and when I needed to do something like this today, I found this page. In lubridate you can now do the following:
d1 <- ymd_hms("2011-03-12 12:00:00", tz = 'America/Chicago')
d2 <- ymd_hms("2011-03-13 12:00:00", tz = 'America/Chicago')
(d1 %--% d2)/dminutes(1)
(d1 %--% d2)/dhours(1)
(d1 %--% d2)/ddays(1)
(d1 %--% d2)/dweeks(1)
Ken, dividing by days(1) will give you what you want. Lubridate doesn't coerce periods to durations when you divide intervals by periods. (Although the algorithm for finding the exact number of whole periods in the interval does begin with an estimate that uses the interval divided by the analogous number of durations, which might be what you are noticing.)
The end result is the number of whole periods that fit in the interval. The warning message alerts the user that it is an estimate because there will be some fraction of a period that is dropped from the answer. It's not sensible to do math with a fraction of a period, since we can't modify a clock time with it unless we convert it to multiples of a shorter period - but there won't be a consistent way to make the conversion. For example, the day you mention would be equal to 23 hours, but other days would be equal to 24 hours. You are thinking the right way - periods are an attempt to respect the variations caused by DST, leap years, etc., but they only do this as whole units.
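To make the distinction concrete, here is the DST day from the question divided both ways (a sketch; intermediate messages, if any, depend on your lubridate version):
library(lubridate)
d1 <- ymd_hms("2011-03-12 12:00:00", tz = "America/Chicago")
d2 <- ymd_hms("2011-03-13 12:00:00", tz = "America/Chicago")
(d1 %--% d2) / days(1)   # 1: exactly one calendar day fits, even though it lasted 23 hours
(d1 %--% d2) / ddays(1)  # ~0.958: only 82800 of 86400 fixed seconds elapsed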
I can't reproduce the error in subtraction that you mention above. It seems to work for me.
three <- force_tz(ymd_hms("2011-03-12 12:00:00"), "")
# note: here in TX, "" *is* CST
(four <- three + days(1))
> [1] "2011-03-13 12:00:00 CDT"
four - days(1)
> [1] "2011-03-12 12:00:00 CST"
Be careful when dividing time in seconds to obtain days, as then you are no longer working with abstract representations of time but with bare numbers, which can lead to the following:
> date_f <- now()
> date_i <- now() - days(23)
> as.duration(date_f - date_i)/ddays(1)
[1] 22.95833
> interval(date_i,date_f)/ddays(1)
[1] 22.95833
> int_length(interval(date_i,date_f))/as.numeric(ddays(1))
[1] 22.95833
Which leads one to consider that days or months are events in a calendar, not amounts of time that can be measured in seconds, milliseconds, etc.
The best way to calculate differences in days is to avoid the transformation into seconds and to work with days as a unit:
> e <- now()
> s <- now() - days(23)
> as.numeric(as.Date(s))
[1] 18709
> as.numeric(as.Date(e) - as.Date(s))
[1] 23
However, if you are considering a day as a pure 86400 seconds time span, as ddays() does, the previous approach can lead to the following:
> e <- ymd_hms("2021-03-13 00:00:10", tz = 'UTC')
> s <- ymd_hms("2021-03-12 23:59:50", tz = 'UTC')
> as.duration(e - s)
[1] "20s"
> as.duration(e - s)/ddays(1)
[1] 0.0002314815
> as.numeric(as.Date(e) - as.Date(s))
[1] 1
Hence, it depends on what you are looking for: time difference or calendar difference.