Why does R add non-existing microseconds/nanoseconds to date-time objects?

I have datetimes written in the format "%d-%b-%Y %H:%M:%S.%OS",
for example "25-Apr-2021 18:31:56.234",
that is, to millisecond precision.
When I parse these into a time object, the values are not the same: sometimes about 1 microsecond is added or subtracted, or something similar.
Why is this, and what can I do about it?
I want a time object that is exactly 56 seconds and 234 milliseconds (with zeroes after that if higher precision is needed).
For example, some of the values printed by print(as.numeric(), digits=20) are "1615310444.7509999 1615310442.5550001",
and when I ask for the difference between two values, I get, for example, "Time difference of 0.003999949 secs".

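You can use options(digits.secs) to show the milliseconds; an example is below. With digits.secs set to zero, no fractional seconds are printed; set it to 3 to print milliseconds. (options() returns the previous setting, which is why dp still shows 0 after the second call.) Also note that you should change the format of the date: use %OS on its own rather than %S.%OS, since %OS already reads the fractional seconds.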
> dp <- options(digits.secs=0)
> dp
$digits.secs
[1] 0
> strptime("25-Apr-2021 18:31:56.234", format = "%d-%b-%Y %H:%M:%OS")
[1] "2021-04-25 18:31:56 +08"
> dp <- options(digits.secs=3)
> dp
$digits.secs
[1] 0
> strptime("25-Apr-2021 18:31:56.234", format = "%d-%b-%Y %H:%M:%OS")
[1] "2021-04-25 18:31:56.234 +08"
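
On the underlying question: a POSIXct value is stored as a double-precision count of seconds since 1970, and a fraction such as .234 has no exact binary representation, so the stray digits are rounding noise rather than added microseconds. A minimal sketch, assuming the timestamps are exact to the millisecond (the numeric-month format, the UTC time zone and the variable names are only illustrative choices to keep the example locale-independent), is to round back to whole milliseconds whenever exact values are needed:

```r
# Parse two millisecond timestamps; %OS reads the fractional seconds.
x <- as.POSIXct("2021-04-25 18:31:56.234",
                format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
y <- as.POSIXct("2021-04-25 18:31:56.238",
                format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Whole milliseconds since the epoch: rounding removes the binary noise.
ms_x <- round(as.numeric(x) * 1000)
ms_y <- round(as.numeric(y) * 1000)

# Millisecond part of the timestamp
ms_part <- ms_x %% 1000

# Difference in seconds, rounded to millisecond precision
diff_secs <- round(as.numeric(difftime(y, x, units = "secs")), 3)
```

Comparing or subtracting the rounded millisecond counts avoids results like 0.003999949. Note that printing with %OS3 truncates rather than rounds, so a stored value slightly below .234 can still display as .233.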

Related

How to extract floating point second from POSIXct object in R?

I am working on a project that requires getting the exact second from a POSIXct object. For example, if printing out a POSIXct object named tm :
> tm
[1] "2017-07-10 09:03:32.26876 BRT"
> class(tm)
[1] "POSIXct" "POSIXt"
If I run:
> format(tm, "%S")
[1] "32"
which only prints the whole seconds. Instead I want "32.26876". How do I do that? Thanks in advance for the help.

You can use %OS parameter to extract fractional seconds:
tm <- as.POSIXct("2017-07-10 09:03:32.26876 BRT", format="%Y-%m-%d %H:%M:%OS")
class(tm)
# [1] "POSIXct" "POSIXt"
# set the second digits option
options(digits.secs=5)
# `strftime` or `format`:
strftime(tm, "%OS")
#[1] "32.26875"
format(tm, "%OS")
#[1] "32.26875"

From ?strftime the following is noted:
Specific to R is %OSn, which for output gives the seconds
truncated to 0 <= n <= 6 decimal places (and if %OS is not
followed by a digit, it uses the setting of
getOption("digits.secs"), or if that is unset, n = 0).
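Hence we can recover up to 6 decimal places, although the trailing digits can differ from the input, because the value is stored as a floating-point number and the output is truncated rather than rounded: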
> tm <- as.POSIXct("2017-07-10 09:03:32.26876", tz = "BRT")
> tm
[1] "2017-07-10 09:03:32.268 BRT"
> format(tm, "%OS5")
[1] "32.26875"
> format(tm, "%OS6")
[1] "32.268759"

If you convert it to a POSIXlt, this is easy.
tm = as.POSIXct("2017-07-10 09:03:32.26876 BRT")
as.POSIXlt(tm)$sec
[1] 32.26876
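
Because %OSn truncates, a rounded value may be preferable. A small sketch, assuming five decimal places are wanted and using UTC purely for illustration, takes the seconds field from the POSIXlt form and rounds it numerically or with sprintf:

```r
tm <- as.POSIXct("2017-07-10 09:03:32.26876", tz = "UTC")

# Seconds field, including the fractional part, as a double
sec <- as.POSIXlt(tm)$sec

# Round numerically, or format as text with rounding instead of truncation
sec_rounded <- round(sec, 5)
sec_text <- sprintf("%.5f", sec)
```

Here sec_text is a character string and sec_rounded a number; which one is useful depends on whether the value is for display or for further arithmetic.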

In R, is the %OSn time format only valid for formatting, but not parsing?

Consider this R code, which uses a defined time format string (the timeFormat variable below) to format and parse dates:
time = as.POSIXct(1433867059, origin = "1970-01-01")
print(time)
print( as.numeric(time) )
timeFormat = "%Y-%m-%d %H:%M:%OS3"
tz = "EST"
timestamp = format(time, format = timeFormat, tz = tz)
print(timestamp)
timeParsed = as.POSIXct(timestamp, format = timeFormat, tz = tz)
print(timeParsed)
print( as.numeric(timeParsed) )
If I paste that into Rgui on my Windows box, which is running the latest (3.2.0) stable release, I get this:
> time = as.POSIXct(1433867059, origin = "1970-01-01")
> print(time)
[1] "2015-06-09 12:24:19 EDT"
> print( as.numeric(time) )
[1] 1433867059
>
> timeFormat = "%Y-%m-%d %H:%M:%OS3"
> tz = "EST"
>
> timestamp = format(time, format = timeFormat, tz = tz)
> print(timestamp)
[1] "2015-06-09 11:24:19.000"
>
> timeParsed = as.POSIXct(timestamp, format = timeFormat, tz = tz)
> print(timeParsed)
[1] NA
> print( as.numeric(timeParsed) )
[1] NA
Notice how the time format, which ends with %OS3, produces the correct time stamp (a 3 digit millisecond resolution).
However, that same time format cannot parse that time stamp back into the original POSIXct value; it barfs and parses NA.
Anyone know what is going on?
A web search found this stackoverflow link,
where one of the commenters, Waldir Leoncio, in the first answer, appears to describe the same parsing bug with %OS3 that I do:
"use, for example, strptime(y, "%d.%m.%Y %H:%M:%OS3"), but it doesn't
work for me. Henrik noted that the function's help page, ?strptime
states that the %OS3 bit is OS-dependent. I'm using an updated Ubuntu
13.04 and using %OS3 yields NA."
The help page mentioned in the quote above likely is this link,
which is unfortunately terse, merely saying
"Specific to R is %OSn, which for output gives the seconds truncated
to 0 <= n <= 6 decimal places (and if %OS is not followed by a digit,
it uses the setting of getOption("digits.secs"), or if that is unset,
n = 0). Further, for strptime %OS will input seconds including
fractional seconds. Note that %S ignores (and not rounds) fractional
parts on output."
That final sentence about strptime (i.e. parsing) is subtle: it says "for strptime %OS". Note the absence of an 'n': it says %OS instead of %OSn.
Does that mean that %OSn can NOT be used for parsing, only for formatting?
That is what I have empirically found, but is it expected behavior or a bug?
Very annoying if expected behavior, since that means that I need different time formats for formatting and parsing. Have never seen that before in any other language's date API...
(Aside: I am aware that there is another issue, even if you just want to format, with %OSn: R truncates fractional parts instead of rounds. For those not aware of this bad behavior, its hazards are discussed here, here, and here.)

This is expected behavior, not a bug. "%OSn" is for output. "%OS" is for input, and includes fractional seconds, as it says in your second blockquote:
Further, for strptime %OS will input seconds including fractional seconds.
options(digits.secs=6)
as.POSIXct("2015-06-09 11:24:19.002", "America/New_York", "%Y-%m-%d %H:%M:%OS")
# [1] "2015-06-09 11:24:19.002 EDT"
Also note that "EST" is an ambiguous timezone, and probably not what you expect. See the Time zone names section of ?timezone.
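
To make the round trip concrete, here is a minimal sketch that uses one format string for output and another for input. The names out_fmt and in_fmt, the half-second test value and the UTC time zone are assumptions chosen for illustration:

```r
out_fmt <- "%Y-%m-%d %H:%M:%OS3"  # formatting: %OSn with n digits
in_fmt  <- "%Y-%m-%d %H:%M:%OS"   # parsing: plain %OS reads the fraction

# A time with an exactly representable fraction (.5 seconds)
t0 <- as.POSIXct(1433867059.5, origin = "1970-01-01", tz = "UTC")

stamp  <- format(t0, format = out_fmt, tz = "UTC")
parsed <- as.POSIXct(stamp, format = in_fmt, tz = "UTC")

print(stamp)
print(as.numeric(parsed), digits = 15)
```

Using in_fmt for parsing returns a proper time instead of NA, at the cost of keeping two format strings.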

Parsing a Millisecond Timestamp To a Time In R

I'm sure this has been asked before, but I just can't find the exact answer.
If I have a number which represents milliseconds since midnight, say 34200577, how do I turn this into an R time?

Construct a 'baseline time' at midnight, add the given millisecond once converted to seconds and interpret as a time:
R> as.POSIXct(as.numeric(ISOdatetime(2013,8,22,0,0,0)) + 34200577/1e3,
+ origin="1970-01-01")
[1] "2013-08-22 09:30:00.576 CDT"
R>
In fact, the shorter
R> ISOdatetime(2013,8,22,0,0,0) + 34200577/1e3
[1] "2013-08-22 09:30:00.576 CDT"
R>
works as well, since ISOdatetime() returns a proper time object which operates in fractional seconds, so we just apply the given offset.
This appears to be correct as
R> 34200577/1e3 # seconds
[1] 34200.6
R> 34200577/1e3/60 # minutes
[1] 570.01
R> 34200577/1e3/60/60 # hours
[1] 9.50016
R>

POSIXct uses 1970 as the origin of its time scale (measured in seconds).
> time= as.POSIXct(34200577/1000 , origin=Sys.Date() )
> time
[1] "2013-08-22 02:30:00 PDT"
Note the discrepancy in results between Dirk's and my method. The POSIX times are assumed on input to be in UTC, so the printed local time is shifted by my location's offset from UTC (7 hours while PDT is in effect).
> difftime( as.POSIXct(34200577/1000 , origin=Sys.Date() ) , Sys.Date() )
Time difference of 9.50016 hours
You could get the time since midnight with:
format( as.POSIXct(34200577/1000 , origin=Sys.Date(), tz="UCT" ),
format="%H:%M:%S")
[1] "09:30:00"
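
The time zone discrepancy disappears if the baseline midnight carries an explicit time zone. A minimal sketch, assuming the milliseconds count from midnight UTC on an arbitrary illustrative date:

```r
ms <- 34200577

# Midnight with an explicit time zone, so the result does not depend
# on the machine's local settings
midnight <- ISOdatetime(2013, 8, 22, 0, 0, 0, tz = "UTC")
x <- midnight + ms / 1000

# Clock time since midnight (whole seconds)
clock <- format(x, "%H:%M:%S")

# Elapsed seconds, recovered from the time object
elapsed <- as.numeric(difftime(x, midnight, units = "secs"))
```

If the milliseconds are relative to local midnight instead, the same code should work with the relevant zone name passed as tz.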

A little "gotcha" which I think is worth pointing out...
In R 3.1.2 on 64-bit Windows I get the following results for Dirk's example
> ISOdatetime(2013,8,22,0,0,0) + 34200577/1e3
[1] "2013-08-22 09:30:00 BST"
Note the lack of fractional seconds. This is due to the option setting for "digits.secs"
> getOption("digits.secs")
NULL
Setting this option as follows gives the expected result:
> options(digits.secs=3)
> ISOdatetime(2013,8,22,0,0,0) + 34200577/1e3
[1] "2013-08-22 09:30:00.576 BST"
As you can probably guess, this is to do with the formatting of output, not the actual values we get from our date arithmetic. See ?strptime and ?options for the documentation on this.
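
A short sketch of that point, assuming an explicit UTC time zone for reproducibility: the stored value is the same under both settings, and only the printed text changes. The previous option value is saved and restored so the session is left as it was.

```r
x <- ISOdatetime(2013, 8, 22, 0, 0, 0, tz = "UTC") + 34200577 / 1e3

old <- options(digits.secs = NULL)  # unset: whole seconds only
plain <- format(x)

options(digits.secs = 3)            # show up to 3 fractional digits
with_ms <- format(x)

options(old)                        # restore the previous setting

# The underlying number is untouched by either setting
value <- as.numeric(x)
```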

Length of lubridate interval

What's the best way to get the length of time represented by an interval in lubridate, in specified units? All I can figure out is something like the following messy thing:
> ival
[1] 2011-01-01 03:00:46 -- 2011-10-21 18:33:44
> difftime(attr(ival, "start") + as.numeric(ival), attr(ival, "start"), 'days')
Time difference of 293.6479 days
(I also added this as a feature request at https://github.com/hadley/lubridate/issues/105, under the assumption that there's no better way available - but maybe someone here knows of one.)
Update - apparently the difftime function doesn't handle this either. Here's an example.
> (d1 <- as.POSIXct("2011-03-12 12:00:00", 'America/Chicago'))
[1] "2011-03-12 12:00:00 CST"
> (d2 <- d1 + days(1)) # Gives desired result
[1] "2011-03-13 12:00:00 CDT"
> (i2 <- d2 - d1)
[1] 2011-03-12 12:00:00 -- 2011-03-13 12:00:00
> difftime(attr(i2, "start") + as.numeric(i2), attr(i2, "start"), 'days')
Time difference of 23 hours
As I mention below, I think one nice way to handle this would be to implement a /.interval function that doesn't first cast its input to a period.

The as.duration function is what lubridate provides. The interval class is represented internally as the number of seconds from the start, so if you wanted the number of hours you could simply divide as.numeric(ival) by 3600, or by (3600*24) for days.
If you want worked examples of functions applied to your object, you should provide the output of dput(ival). I did my testing on the objects created on the help(duration) page which is where ?interval sent me.
date <- as.POSIXct("2009-03-08 01:59:59") # DST boundary
date2 <- as.POSIXct("2000-02-29 12:00:00")
span <- date2 - date #creates interval
span
#[1] 2000-02-29 12:00:00 -- 2009-03-08 01:59:59
str(span)
#Classes 'interval', 'numeric' atomic [1:1] 2.85e+08
# ..- attr(*, "start")= POSIXct[1:1], format: "2000-02-29 12:00:00"
as.duration(span)
#[1] 284651999s (9.02y)
as.numeric(span)/(3600*24)
#[1] 3294.583
# A check against the messy method:
difftime(attr(span, "start") + as.numeric(span), attr(span, "start"), 'days')
# Time difference of 3294.583 days

This question is really old, but I'm adding an update because this question has been viewed many times and when I needed to do something like this today, I found this page. In lubridate you can now do the following:
d1 <- ymd_hms("2011-03-12 12:00:00", tz = 'America/Chicago')
d2 <- ymd_hms("2011-03-13 12:00:00", tz = 'America/Chicago')
(d1 %--% d2)/dminutes(1)
(d1 %--% d2)/dhours(1)
(d1 %--% d2)/ddays(1)
(d1 %--% d2)/dweeks(1)

Ken, dividing by days(1) will give you what you want. Lubridate doesn't coerce periods to durations when you divide intervals by periods. (Although the algorithm for finding the exact number of whole periods in the interval does begin with an estimate that uses the interval divided by the analogous number of durations, which might be what you are noticing.)
The end result is the number of whole periods that fit in the interval. The warning message alerts the user that it is an estimate because there will be some fraction of a period that is dropped from the answer. It's not sensible to do math with a fraction of a period, since we can't modify a clock time with it unless we convert it to multiples of a shorter period, but there won't be a consistent way to make the conversion. For example, the day you mention would be equal to 23 hours, but other days would be equal to 24 hours. You are thinking the right way: periods are an attempt to respect the variations caused by DST, leap years, etc., but they only do this as whole units.
I can't reproduce the error in subtraction that you mention above. It seems to work for me.
three <- force_tz(ymd_hms("2011-03-12 12:00:00"), "")
# note: here in TX, "" *is* CST
(four <- three + days(1))
> [1] "2011-03-13 12:00:00 CDT"
four - days(1)
> [1] "2011-03-12 12:00:00 CST"

Be careful when dividing time in seconds to obtain days, as you are then no longer working with abstract representations of time but with bare numbers, which can lead to the following:
> date_f <- now()
> date_i <- now() - days(23)
> as.duration(date_f - date_i)/ddays(1)
[1] 22.95833
> interval(date_i,date_f)/ddays(1)
[1] 22.95833
> int_length(interval(date_i,date_f))/as.numeric(ddays(1))
[1] 22.95833
This suggests treating days or months as events in a calendar, not as amounts of time that can be measured in seconds, milliseconds, etc.
The best way to calculate differences in days is to avoid the transformation into seconds and work with days as the unit:
> e <- now()
> s <- now() - days(23)
> as.numeric(as.Date(s))
[1] 18709
> as.numeric(as.Date(e) - as.Date(s))
[1] 23
However, if you consider a day to be a pure 86400-second time span, as ddays() does, the previous approach can lead to the following:
> e <- ymd_hms("2021-03-13 00:00:10", tz = 'UTC')
> s <- ymd_hms("2021-03-12 23:59:50", tz = 'UTC')
> as.duration(e - s)
[1] "20s"
> as.duration(e - s)/ddays(1)
[1] 0.0002314815
> as.numeric(as.Date(e) - as.Date(s))
[1] 1
Hence, it depends on what you are looking for: time difference or calendar difference.
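
The same distinction can be sketched in base R without lubridate, using the DST example from the question (the variable names are illustrative):

```r
d1 <- as.POSIXct("2011-03-12 12:00:00", tz = "America/Chicago")
d2 <- as.POSIXct("2011-03-13 12:00:00", tz = "America/Chicago")

# Elapsed time: the clocks moved forward in between, so this is 23 hours
elapsed_hours <- as.numeric(difftime(d2, d1, units = "hours"))

# Calendar difference: compare the local dates, ignoring the clock
calendar_days <- as.numeric(as.Date(format(d2)) - as.Date(format(d1)))
```

Here elapsed_hours is 23 while calendar_days is 1, matching the "time difference or calendar difference" split described above.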

One hour increment in R, zoo

How can I add one hour to all the elements of the index of a zoo series?
I've tried
newseries <- myzooseries
index(newseries) <- index(myzooseries)+times("1:00:00")
but I get the message
Incompatible methods ("Ops.dates", "Ops.times") for "+"
thanks
My index is a chron object with date and time, but I've tried with simpler examples and I can't get it to work.

This is easily solved by adding the time you want in a numerical fashion :
newseries <- myzooseries
index(newseries) <- index(myzooseries) + 1/24
chron objects are represented as decimal numbers, so you can use that to calculate. A day is 1, so an hour is 1/24, a minute 1/1440 and so on. You can see this easily if you use the function times. This gives you the times of the object tested, eg :
> A <- chron(c("01/01/97","01/02/97","01/03/97"))
> B <- A + 1/24
> B
[1] (01/01/97 01:00:00) (01/02/97 01:00:00) (01/03/97 01:00:00)
> times(A)
Time in days:
[1] 9862 9863 9864
> times(B)
Time in days:
[1] 9862.042 9863.042 9864.042
> times(B-A)
[1] 01:00:00 01:00:00 01:00:00
> times(A[3]-B[1])
Time in days:
[1] 1.958333

Convert to POSIXct, add 60*60 (1h in s) and then convert back.
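
A minimal base R sketch of the addition step, assuming the index has already been converted to POSIXct (the vector idx stands in for the converted index; converting from and back to chron is left out here):

```r
# Stand-in for a series index already converted to POSIXct
idx <- as.POSIXct(c("1997-01-01 00:00:00", "1997-01-02 00:00:00"),
                  tz = "UTC")

# POSIXct counts seconds, so one hour is 60 * 60
shifted <- idx + 60 * 60

# Elapsed shift per element, in hours
shift_hours <- as.numeric(difftime(shifted, idx, units = "hours"))
```

The shifted vector would then be converted back and assigned to the index of the new series.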
