Converting Nanotime Object in R [duplicate] - r

I am working in R and need to change the timestamp from what I believe is nanosecond precision to either microsecond precision or millisecond precision (I believe it needs to be milliseconds or only three digits past the decimal).
Example of two of the timestamps
"2019-03-02D00:00:12.214841000"
Part of the difficulty is I don't think there is a package like lubridate to handle it. I'm not sure if I need to use a regular expression to extract the seconds and then transform the nanoseconds to milliseconds. I'm open to any suggestions.
Also, how do you recommend dealing with the D? I was thinking I should use gsub("D", "-", df$timestamp) and maybe then a package like lubridate could parse the timestamp even with the nanosecond precision?

You can use the nanotime package, which stores timestamps as integer64 (a 64-bit integer count of nanoseconds since the epoch, not a floating-point value):
library(nanotime)
x<-nanotime("2019-03-02T00:00:12.214841000+00:00")
As you can see, you need to replace the D with a T and append +00:00 to the end, but that is easily done, as symbolrush showed you:
x<-nanotime(paste0(gsub("D", "T", "2019-03-02D00:00:12.214841000"), "+00:00"))
See more here:
http://dirk.eddelbuettel.com/code/nanotime.html
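To illustrate why a 64-bit integer matters here, the following base-R sketch (variable names are my own, not from the answer) stores the example timestamp as nanoseconds in an ordinary double and shows that a single nanosecond can no longer be resolved:

```r
# Whole seconds since the Unix epoch for the example timestamp.
secs <- as.numeric(as.POSIXct("2019-03-02 00:00:12", tz = "UTC"))

# Nanoseconds since the epoch, held in an ordinary double.
ns <- secs * 1e9 + 214841000

# Doubles represent every integer exactly only up to 2^53.
ns > 2^53      # TRUE
ns + 1 == ns   # TRUE: adding one nanosecond is lost to rounding
```

This is only meant to motivate the integer64 storage; it does not use nanotime itself.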

You can use as.POSIXct after gsub("D", " ", x):
as.POSIXct(gsub("D", " ", "2019-03-02D00:00:12.214841000"))
You can still work with millisecond precision afterwards:
dt <- as.POSIXct(gsub("D", " ", "2019-03-02D00:00:12.214841000"))
dt
[1] "2019-03-02 00:00:12 CET"
for(i in 1:1000) dt <- dt - 0.001
dt
[1] "2019-03-02 00:00:11 CET"
If you want to display those milliseconds you can use format:
format(dt, "%Y-%m-%d %H:%M:%OS3")
[1] "2019-03-02 00:00:11.214"
format(dt - 1E-3, "%Y-%m-%d %H:%M:%OS3")
[1] "2019-03-02 00:00:11.213"
format(dt - 10E-3, "%Y-%m-%d %H:%M:%OS3")
[1] "2019-03-02 00:00:11.204"
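If all you need is the timestamp text cut down to millisecond or microsecond precision, a plain string approach may be enough. This is a minimal sketch using base R only; the second timestamp is a made-up value for illustration, and the digits are truncated, not rounded:

```r
# First value is from the question; the second is hypothetical.
ts <- c("2019-03-02D00:00:12.214841000", "2019-03-02D23:59:59.999999999")

# Replace the literal "D" separator with a space.
clean <- sub("D", " ", ts, fixed = TRUE)

# Keep only the first three (or six) fractional digits.
ms <- sub("(\\.\\d{3})\\d*$", "\\1", clean)
us <- sub("(\\.\\d{6})\\d*$", "\\1", clean)

ms  # "2019-03-02 00:00:12.214" "2019-03-02 23:59:59.999"
us  # "2019-03-02 00:00:12.214841" "2019-03-02 23:59:59.999999"
```

The trimmed strings can then be passed to as.POSIXct() as in the answer above.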

Related

How to remove the UTC from the date using R

I am trying to remove the UTC from this date and just keep the date in single quotes. This is the function I am using in R:
date.start = as.Date(Sys.Date())
But I am getting this result
I guess date.start is Sys.time() therefore do:
date.start = as.Date(Sys.time())
Sys.Date()
Sys.time()
Sys.timezone()
as.Date(Sys.time())
Output:
> Sys.Date()
[1] "2021-08-17"
> Sys.time()
[1] "2021-08-17 09:14:33 CEST"
> Sys.timezone()
[1] "Europe/Berlin"
> as.Date(Sys.time())
[1] "2021-08-17"
I think the 'UTC' timezone is being added there by your system settings. I believe that generating the system date with lubridate might sidestep the issue within R:
date.start = lubridate::today(tzone = "")
Use sub:
sub(" UTC", "", date)
[1] "2021-08-17" "2020-12-12"
Test data:
date <- c("2021-08-17 UTC", "2020-12-12 UTC")
Try using different time formats when getting data.
format(Sys.time(),"%d-%m-%y")
For a better understanding, you can read the R-bloggers article on Date Formats in R here:
https://www.r-bloggers.com/2013/08/date-formats-in-r/
I'm not sure why you want to remove it; knowing that would help. Another answer showed you how to convert it to a string.
But you'll want it in date format to do something like seq(Sys.Date(), Sys.Date() + 24, by = 'day').
If the reason you want it in a particular time zone is to join data sets at midnight, you should use lubridate's force_tz, a la force_tz(Sys.Date(), 'America/Chicago'). Be careful here, because the timezone's offset will change as needed due to daylight saving time. That's why it's usually better to stick with UTC anyway.
Otherwise, as the other poster mentioned, just convert to string and format it ala format(Sys.Date(),"%Y-%m-%d").
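The difference between the two options can be sketched in base R as follows. The fixed timestamp and the variable names are assumptions chosen so the result is reproducible; with Sys.time() the values would of course differ:

```r
# A fixed date-time in UTC stands in for Sys.time().
ts <- as.POSIXct("2021-08-17 09:14:33", tz = "UTC")

# Option 1: a plain character string, with no time zone label attached.
label <- format(ts, "%Y-%m-%d")
label         # "2021-08-17"
class(label)  # "character"

# Option 2: a Date object, which still supports date arithmetic.
d <- as.Date(ts, tz = "UTC")
days <- seq(d, by = "day", length.out = 3)
format(days)  # "2021-08-17" "2021-08-18" "2021-08-19"
```

The string is fine for display or for building a query; the Date is what you want for sequences and comparisons.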

R time format conversion

I have almost finished my script but I have a problem with my dates format.
I installed the lubridate package and used the as_date function, but it doesn't give me what I want (a date).
"time" is my variable, I put its description below.
I do not put my entire script since the concern is only about this format question (and it implies a huge netcdf file impossible to download)
Could you help me, please?
class(time)
[1] "array"
head(time)
[1] 3573763200 3573774000 3573784800 3573795600 3573806400 3573817200
tunits
$long_name
[1] "time in seconds (UT)"
$standard_name
[1] "time"
$units
[1] "seconds since 1900-01-01T00:00:00Z"
$axis
[1] "T"
$time_origin
[1] "01-JAN-1900 00:00:00"
$conventions
[1] "relative number of seconds with no decimal part"
#conversion
date = as_date(time,tz="UTC",origin = "1900-01-01")
head(date)
[1] "-5877641-06-23" "-5877641-06-23" "-5877641-06-23" "-5877641-06-23"
[5] "-5877641-06-23" "-5877641-06-23"
Time is in seconds since 01/01/1900. Converting a value in time to an actual date would work as follows, using the seconds methods in lubridate:
lubridate::ymd("1900-01-01") + lubridate::seconds(3573763200)
You can vectorize it:
lubridate::ymd("1900-01-01") + lubridate::seconds(time)
as_date() calculates the date using the number of days since the origin.
What you are looking for seems to be as_datetime() also from the lubridate package which calculates the date using the number of seconds since the origin. In your example this would be:
time <- c(3573763200,3573774000,3573784800,3573795600,3573806400,3573817200)
date <- as_datetime(time, tz = "UTC", origin = "1900-01-01") %>% date()
This uses the %>% pipe (available via dplyr) and lubridate's date() function to extract the date from the result of as_datetime().
date <- as_date(time/(24*60*60), tz = "UTC", origin = "1900-01-01")
date
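For comparison, the same conversion can be sketched in base R without lubridate. This assumes, as the units attribute states, that the values are seconds since 1900-01-01 UTC; the variable names are my own:

```r
# Values from the question: seconds since 1900-01-01T00:00:00Z.
time <- c(3573763200, 3573774000, 3573784800)

# Treat the numbers as seconds from the stated origin. as.vector() drops
# the "array" class the netCDF reader attached in the question.
dt <- as.POSIXct(as.vector(time), origin = "1900-01-01", tz = "UTC")
format(dt, "%Y-%m-%d %H:%M:%S")
# "2013-04-01 00:00:00" "2013-04-01 03:00:00" "2013-04-01 06:00:00"

# Keep only the calendar date if the time of day is not needed.
d <- as.Date(dt, tz = "UTC")
```

The nonsensical dates in the question come from as_date() interpreting the same numbers as days rather than seconds.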

R drops hours, minutes, and seconds from date

While converting a dataframe to xts I realized that there is something wrong with the formatter. Here's an example dataframe:
effective_date price
"1990-01-01" "100"
"1990-01-02 00:05:00" "200"
This is example output from a package that I use.
Converting this to xts is straight-forward
xts(df["price"], order_by=as.POSIXct(df["effective_date"], format="%Y-%m-%d %H:%M:%S")
However this errors out, saying NAs can't be in row names, and the result is:
<NA> 100
1990-01-02 00:05:00 200
Obviously xts can't figure out what to do with the weird date there (midnight) and it won't coerce it.
If I add tz="UTC" to as.POSIXct it doesn't work. Additionally, as.POSIXlt doesn't change anything here either.
What can I do to coerce that midnight date to the correct format?
Two issues:
1) You cannot parse a date-only string as POSIXct when the given format also expects a time:
R> as.POSIXct(c("2017-01-02", "2017-01-03 04:05:06"), format="%Y-%m-%d %H:%M:%S")
[1] NA "2017-01-03 04:05:06 CST"
R>
2) You can however use the anytime() function to do it:
R> anytime::anytime(c("2017-01-02", "2017-01-03 04:05:06"))
[1] "2017-01-02 00:00:00 CST" "2017-01-03 04:05:06 CST"
R>
Once you have a POSIXct, forming the xts is easy.
Also note that you have typos: you need a comma before the column indicator: df[, "price"].
Edit: Getting a little tired of #42's comment about Gabor's (fine) solution "dominating" this one, so here's a minimal benchmark:
R> library(microbenchmark)
R> v <- c("2017-01-02", "2017-01-03 04:05:06")
R> library(anytime)
R> print(microbenchmark(anytime(v), do.call("c", lapply(v, as.POSIXct))), digits=3)
Unit: microseconds
                                expr   min    lq  mean median    uq   max neval cld
                          anytime(v)  33.6  36.8  42.1   45.6  46.6  80.7   100  a
 do.call("c", lapply(v, as.POSIXct)) 571.5 579.1 586.4  586.8 589.5 695.7   100   b
R>
So, in short, "not really". It uses only base R, which is a plus, but it is a) harder to read and understand, b) more limited, as it deals with exactly one format (ISO style), and c) about thirteen times slower.
1) To get the "POSIXct" datetime vector try converting each datetime to "POSIXct" separately and then concatenate them together:
do.call("c", lapply(df$effective_date, as.POSIXct))
2) Another base solution that is even shorter and is also substantially faster is the following which relies on the fact that as.POSIXct will ignore junk at the end.
as.POSIXct(paste(df$effective_date, "00:00:00"))
Most of lubridate's parsing functions have a truncated parameter that takes a number indicating the number of elements that can be missing from the end. Missing elements will be replaced by zero.
Example with the data at hand:
lubridate::ymd_hms(c("2017-01-02", "2017-01-03 04:05:06"), truncated = 3)
## [1] "2017-01-02 00:00:00 UTC" "2017-01-03 04:05:06 UTC"
Assuming you want the timestamps, preprocess with something like:
temp <- c("1990-01-01", "1990-01-02 00:05:00")
# match a date-only string at the end of the string (indicated by $). Replace
# it with the matched date (indicated by \\1) followed by 00:00:00
temp2 <- gsub("(\\d{4}\\-\\d{2}\\-\\d{2}$)", "\\1 00:00:00", temp)
# [1] "1990-01-01 00:00:00" "1990-01-02 00:05:00"
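Putting the preprocessing and the parsing together, a base-R sketch might look like this. The variable names are my own, and tz = "UTC" is an assumption made to keep the output independent of the local time zone:

```r
x <- c("1990-01-01", "1990-01-02 00:05:00")

# Find entries that are a bare date and append midnight to them.
date_only <- grepl("^\\d{4}-\\d{2}-\\d{2}$", x)
x[date_only] <- paste(x[date_only], "00:00:00")

# Every element now matches the full format, so nothing becomes NA.
parsed <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")
# "1990-01-01 00:00:00" "1990-01-02 00:05:00"
```

The resulting POSIXct vector can then be used as the index when building the xts object.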

Parsing ISO8601 date and time format in R [duplicate]

This question already has answers here:
Using strptime %z with special timezone format
(2 answers)
Closed 9 years ago.
This should be quick - we are parsing the following format in R:
2013-04-05T07:49:54-07:00
My current approach is
require(stringr)
timenoT <- str_replace_all("2013-04-05T07:49:54-07:00", "T", " ")
timep <- strptime(timenoT, "%Y-%m-%d %H:%M:%S%z", tz="UTC")
but it gives NA.
%z is the signed offset from UTC in hours and minutes, in the format hhmm, not hh:mm. Here's one way to remove the last colon:
newstring <- gsub("(.*).(..)$","\\1\\2","2013-04-05T07:49:54-07:00")
(timep <- strptime(newstring, "%Y-%m-%dT%H:%M:%S%z", tz="UTC"))
# [1] "2013-04-05 14:49:54 UTC"
Also note that you don't have to remove the "T".
You don't need the string replacement.
NA just means that the whole thing did not work, so build your expression up in pieces:
R> strptime("2013-04-05T07:49:54-07:00", "%Y-%m-%d")
[1] "2013-04-05"
R> strptime("2013-04-05T07:49:54-07:00", "%Y-%m-%dT%H:%M")
[1] "2013-04-05 07:49:00"
R> strptime("2013-04-05T07:49:54-07:00", "%Y-%m-%dT%H:%M:%S")
[1] "2013-04-05 07:49:54"
R>
Also, for reasons I never fully understood (but which probably reside with the C library function underlying it), %z is standard only for output; R accepts it on input only in the hhmm form, without a colon. So your NA most likely comes from your use of %z.
strptime("2013-04-05 07:49:54-07:00", "%Y-%m-%d %H:%M:%S", tz="UTC") gives 2013-04-05 07:49:54 UTC
Try
timep <- strptime(timenoT, "%Y-%m-%d %H:%M:%S", tz="UTC")
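If the offset should be honoured rather than dropped, one option is to remove only the colon inside the offset and then parse with %z. This is a sketch in base R with my own variable names; the regular expression targets a trailing +hh:mm or -hh:mm and leaves everything else untouched:

```r
x <- "2013-04-05T07:49:54-07:00"

# Turn a trailing "-07:00" into "-0700", the form that %z expects.
no_colon <- sub("([+-]\\d{2}):(\\d{2})$", "\\1\\2", x)

# Parse with the literal "T" in the format and convert to UTC.
parsed <- as.POSIXct(no_colon, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")
# "2013-04-05 14:49:54"
```

Parsing without %z, as in the last suggestion above, keeps the clock time 07:49:54 but silently ignores the seven-hour offset.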

Resources