Delta Timestamp Parsing in R (Nanoseconds, Microseconds, Milliseconds)

I am working in R and need to change the timestamp from what I believe is nanosecond precision to either microsecond precision or millisecond precision (I believe it needs to be milliseconds or only three digits past the decimal).
Example of two of the timestamps
"2019-03-02D00:00:12.214841000"
Part of the difficulty is I don't think there is a package like lubridate to handle it. I'm not sure if I need to use a regular expression to extract the seconds and then transform the nanoseconds to milliseconds. I'm open to any suggestions.
Also, how do you recommend dealing with the D? I was thinking I should use gsub("D", "-", df$timestamp) and maybe then a package like lubridate could parse the timestamp even with the nanosecond precision?

You can use the nanotime package, which stores timestamps as integer64 (a 64-bit integer count of nanoseconds, so no precision is lost to floating point):
library(nanotime)
x<-nanotime("2019-03-02T00:00:12.214841000+00:00")
As you can see, you need to change the D to a T and append +00:00 to the end, but that is easily done, as symbolrush showed you:
x<-nanotime(paste0(gsub("D", "T", "2019-03-02D00:00:12.214841000"), "+00:00"))
See more here:
http://dirk.eddelbuettel.com/code/nanotime.html
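If you only need the timestamp strings truncated to milliseconds (the regex route mentioned in the question), a minimal base-R sketch, assuming a hypothetical vector `ts` in the question's format:

```r
# Hypothetical vector in the question's "D"-separated format
ts <- c("2019-03-02D00:00:12.214841000", "2019-03-02D00:00:13.999999500")

# Keep the first three fractional digits and drop the rest (truncation, not rounding)
sub("(\\.\\d{3})\\d+$", "\\1", ts)
# [1] "2019-03-02D00:00:12.214" "2019-03-02D00:00:13.999"
```

Strings that already have exactly three fractional digits are left unchanged, since `\d+$` then has nothing to match.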

You can use as.POSIXct after gsub("D", " ", x):
as.POSIXct(gsub("D", " ", "2019-03-02D00:00:12.214841000"))
You can still work with millisecond precision afterwards:
dt <- as.POSIXct(gsub("D", " ", "2019-03-02D00:00:12.214841000"))
dt
[1] "2019-03-02 00:00:12 CET"
for(i in 1:1000) dt <- dt - 0.001
dt
[1] "2019-03-02 00:00:11 CET"
If you want to display those milliseconds you can use format:
format(dt, "%Y-%m-%d %H:%M:%OS3")
[1] "2019-03-02 00:00:11.214"
format(dt - 1E-3, "%Y-%m-%d %H:%M:%OS3")
[1] "2019-03-02 00:00:11.213"
format(dt - 10E-3, "%Y-%m-%d %H:%M:%OS3")
[1] "2019-03-02 00:00:11.204"
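The same approach extends to a whole column at once; a sketch assuming a hypothetical vector `ts` (note that `%OS3` in `format()` truncates rather than rounds):

```r
options(digits.secs = 3)  # print up to 3 fractional digits

ts <- c("2019-03-02D00:00:12.214841000", "2019-03-02D00:00:13.004500000")
dt <- as.POSIXct(gsub("D", " ", ts), format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Millisecond-precision strings, one per input timestamp
format(dt, "%Y-%m-%d %H:%M:%OS3")
```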

Related

How do I remove the UTC from the date using R

I am trying to remove the UTC from this data and just keep the date in single quotes. This is the function I am using in R:
date.start = as.Date(Sys.Date())
But I am getting this result.
I guess your date.start actually comes from Sys.time(); therefore do:
date.start = as.Date(Sys.time())
Sys.Date()
Sys.time()
Sys.timezone()
as.Date(Sys.time())
Output:
> Sys.Date()
[1] "2021-08-17"
> Sys.time()
[1] "2021-08-17 09:14:33 CEST"
> Sys.timezone()
[1] "Europe/Berlin"
> as.Date(Sys.time())
[1] "2021-08-17"
I think the timezone 'UTC' is being added there by your system settings. Generating the system date with lubridate might sidestep the issue within R:
date.start = lubridate::today(tzone = "")
Use sub:
sub(" UTC", "", date)
[1] "2021-08-17" "2020-12-12"
Test data:
date <- c("2021-08-17 UTC", "2020-12-12 UTC")
Try using a different time format when getting the data:
format(Sys.time(),"%d-%m-%y")
For a better understanding, you can read the R-bloggers article on date formats in R here:
https://www.r-bloggers.com/2013/08/date-formats-in-r/
I'm not sure why you want to remove it; knowing the reason would help. Another answer showed you how to convert it to a string.
But you'll want it in date format to do something like seq(Sys.Date(), Sys.Date() + 24, by = 'day').
If the reason you want it in a particular time zone is to join data sets at midnight, you should use lubridate's force_tz, as in force_tz(Sys.Date(), 'America/Chicago'). Be careful here, because the offset will change as needed due to daylight saving time. That's why it's usually better to stick with UTC anyway.
Otherwise, as the other poster mentioned, just convert to a string and format it, as in format(Sys.Date(), "%Y-%m-%d").

R time format conversion

I have almost finished my script, but I have a problem with my date format.
I installed the lubridate package and used the as_date function, but it doesn't give me what I want (a date).
"time" is my variable; I put its description below.
I am not including my entire script, since the question is only about this format issue (and it involves a huge netCDF file that is impossible to upload).
Could you help me, please?
class(time)
[1] "array"
head(time)
[1] 3573763200 3573774000 3573784800 3573795600 3573806400 3573817200
tunits
$long_name
[1] "time in seconds (UT)"
$standard_name
[1] "time"
$units
[1] "seconds since 1900-01-01T00:00:00Z"
$axis
[1] "T"
$time_origin
[1] "01-JAN-1900 00:00:00"
$conventions
[1] "relative number of seconds with no decimal part"
#conversion
date = as_date(time,tz="UTC",origin = "1900-01-01")
head(date)
[1] "-5877641-06-23" "-5877641-06-23" "-5877641-06-23" "-5877641-06-23"
[5] "-5877641-06-23" "-5877641-06-23"
Time is in seconds since 01/01/1900. Converting a value in time to an actual date works as follows, using the seconds() helper in lubridate:
lubridate::ymd("1900-01-01") + lubridate::seconds(3573763200)
You can vectorize it:
lubridate::ymd("1900-01-01") + lubridate::seconds(time)
as_date() calculates the date using the number of days since the origin.
What you are looking for seems to be as_datetime() also from the lubridate package which calculates the date using the number of seconds since the origin. In your example this would be:
time <- c(3573763200,3573774000,3573784800,3573795600,3573806400,3573817200)
date <- as_datetime(time, tz = "UTC", origin = "1900-01-01") %>% date()
Using a dplyr pipe and the date() function from lubridate to extract the date from the as_datetime() function.
Alternatively, convert the seconds to days and keep using as_date():
date <- as_date(time / (24 * 60 * 60), tz = "UTC", origin = "1900-01-01")
date
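If you prefer to avoid lubridate entirely, base R's as.POSIXct accepts a numeric offset plus an origin directly; a minimal sketch using the first few values from the question:

```r
time <- c(3573763200, 3573774000, 3573784800)

# Interpret the values as seconds since 1900-01-01 UTC, then take the date part
dt <- as.POSIXct(time, origin = "1900-01-01", tz = "UTC")
as.Date(dt)
# [1] "2013-04-01" "2013-04-01" "2013-04-01"
```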

Adding milliseconds to a timestamp in R, even though the original character does not have milliseconds?

I am doing some animal movement analysis and I want to submit data to an organisation called Movebank for annotation, but they require the timestamp to have milliseconds included with 3 decimal places.
I have a column in my data frame (dat) with my timestamps as characters (without milliseconds), for example "2017-07-19 16:30:24"
To convert them to time and date format with milliseconds I am using the code:
options(digits.secs = 3)
dat$timestamp <- as.POSIXct(dat$timestamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
Which works fine at converting my timestamp column to POSIXct which I can use to make tracks etc., but it does not add .000 milliseconds to the end of each timestamp which I was hoping it would.
I have also tried:
dat$timestamp <- as.POSIXct(dat$timestamp, format = "%Y-%m-%d %H:%M:%OS3", tz = "UTC")
(Note: I added .. %OS3 ...)
But this returns NA for my timestamps.
Can anybody shed some light on this? I essentially need to add .000 to the end of each of my timestamps so that, using the example given above, I would have the format "2017-07-19 16:30:24.000"
The milliseconds are dropped from the printed output if no element of the vector has non-zero milliseconds.
options(digits.secs=4)
x1 <- as.POSIXct("2017-07-19 16:30:25")
as.POSIXct(paste0(x1, ".000"), format="%Y-%m-%d %H:%M:%OS")
# [1] "2017-07-19 16:30:25 UTC"
However, they are shown automatically as soon as some element does have them.
x2 <- as.POSIXct("2017-07-19 16:30:25.002")
c(x1, x2)
# [1] "2017-07-19 18:30:25.000 CEST" "2017-07-19 18:30:25.002 CEST"
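For the Movebank export itself, note that the stored value does not need to change: format() with %OS3 pads whole-second times with .000 on output. A minimal sketch, assuming a single example timestamp:

```r
options(digits.secs = 3)
dt <- as.POSIXct("2017-07-19 16:30:24", tz = "UTC")

# %OS3 always writes three decimal places, so whole seconds come out as .000
format(dt, "%Y-%m-%d %H:%M:%OS3")
# [1] "2017-07-19 16:30:24.000"
```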

How do I correctly convert timezones

I am using the fasttime package for its fastPOSIXct function that can read character datetimes very efficiently. My problem is that it can only read character datetimes THAT ARE EXPRESSED IN GMT.
R) fastPOSIXct("2010-03-15 12:37:17.223",tz="GMT") #very fast
[1] "2010-03-15 12:37:17.223 GMT"
R) as.POSIXct("2010-03-15 12:37:17.223",tz="GMT") #very slow
[1] "2010-03-15 12:37:17.223 GMT"
Now, say I have a file with datetimes expressed in the "America/Montreal" timezone. The plan is to load them (implicitly pretending they are in GMT) and subsequently modify the timezone attribute without changing the underlying value.
If I use this function, referred to in another post:
forceTZ = function(x, tz) {
  return(as.POSIXct(as.numeric(x), origin = as.POSIXct("1970-01-01", tz = tz), tz = tz))
}
I am seeing a bug ...
R) forceTZ(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"),"America/Montreal")
[1] "2010-03-15 13:37:17.223 EDT"
... because I would like it to be
R) as.POSIXct("2010-03-15 12:37:17.223",format="%Y-%m-%d %H:%M:%OS",tz="America/Montreal")
[1] "2010-03-15 12:37:17.223 EDT"
Is there a workaround ?
EDIT: I know about lubridate::force_tz, but it is too slow (no point in using fasttime::fastPOSIXct anymore).
The smart thing to do here is almost certainly to write readable, easy-to-maintain code, and throw more hardware at the problem if your code is too slow.
If you are desperate for a code speedup, then you could write a custom time-zone adjustment function. It isn't pretty, so if you have to convert between many time zones, you'll end up with spaghetti code. Here's my solution for the specific case of converting from GMT to Montreal time.
First precompute a table of daylight saving time transitions. You'll need to extend this to before 2010/after 2013 in order to fit your dataset. I found the dates here:
http://www.timeanddate.com/worldclock/timezone.html?n=165
montreal_tz_data <- cbind(
  start = fastPOSIXct(
    c("2010-03-14 07:00:00", "2011-03-13 07:00:00",
      "2012-03-11 07:00:00", "2013-03-10 07:00:00")
  ),
  end = fastPOSIXct(
    c("2010-11-07 06:00:00", "2011-11-06 06:00:00",
      "2012-11-04 06:00:00", "2013-11-03 06:00:00")
  )
)
For speed, the function to change time zones treats the times as numbers.
to_montreal_tz <- function(x)
{
  x <- as.numeric(x)
  is_dst <- logical(length(x))  # initialise as FALSE
  # Loop over the DST period in each year
  for (row in seq_len(nrow(montreal_tz_data)))
  {
    is_dst[x > montreal_tz_data[row, 1] & x < montreal_tz_data[row, 2]] <- TRUE
  }
  # Hard-coded numbers are 4/5 hours in seconds
  ans <- ifelse(is_dst, x + 14400, x + 18000)
  class(ans) <- c("POSIXct", "POSIXt")
  ans
}
Now, to compare times:
#A million dates
ch <- rep("2010-03-15 12:37:17.223", 1e6)
#The easy way (no conversion of time zones afterwards)
system.time(as.POSIXct(ch, tz="America/Montreal"))
# user system elapsed
# 28.96 0.05 29.00
#A slight performance gain by specifying the format
system.time(as.POSIXct(ch, format = "%Y-%m-%d %H:%M:%S", tz="America/Montreal"))
# user system elapsed
# 13.77 0.01 13.79
#Using the fast functions
library(fasttime)
system.time(to_montreal_tz(fastPOSIXct(ch)))
# user system elapsed
# 0.51 0.02 0.53
As with all optimisation tricks, you've either got a 27-fold speedup (yay!) or you've saved 13 seconds of processing time but added 3 days of code-maintenance time from an obscure bug when your DST table runs out in 2035 (boo!).
It's a daylight saving time issue: http://www.timeanddate.com/time/dst/2010a.html
In 2010 it began on 14 March in Canada, but not until 28 March in the UK.
You can use POSIXlt objects to modify timezones directly:
lt <- as.POSIXlt(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"))
attr(lt,"tzone") <- "America/Montreal"
as.POSIXct(lt)
[1] "2010-03-15 12:37:17 EDT"
Or you could use format to convert to a string and set the timezone in a call to as.POSIXct. You can therefore modify forceTZ:
forceTZ <- function(x, tz) {
  return(as.POSIXct(format(x), tz = tz))
}
forceTZ(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"),"America/Montreal")
[1] "2010-03-15 12:37:17 EDT"
Could you not just add the appropriate number of seconds to correct the offset from GMT?
# Original problem
fastPOSIXct("2010-03-15 12:37:17.223",tz="America/Montreal")
# [1] "2010-03-15 08:37:17 EDT"
# Add 4 hours worth of seconds to the data. This should be very quick.
fastPOSIXct("2010-03-15 12:37:17.223",tz="America/Montreal") + 14400
# [1] "2010-03-15 12:37:17 EDT"
