How do I use strptime or any other functions to parse time stamps with milliseconds in R?
time[1]
# [1] "2010-01-15 13:55:23.975"
strptime(time[1], format="%Y-%m-%d %H:%M:%S.%f")
# [1] NA
strptime(time[1], format="%Y-%m-%d %H:%M:%S")
# [1] "2010-01-15 13:55:23"
Courtesy of the ?strptime help file (with the example changed to your value):
> z <- strptime("2010-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS")
> z # prints without fractional seconds
[1] "2010-01-15 13:55:23 UTC"
> op <- options(digits.secs=3)
> z
[1] "2010-01-15 13:55:23.975 UTC"
> options(op) #reset options
For output you can also use format(z, "%OSn"), where 0 <= n <= 6, without having to set digits.secs.
The documentation states "Which of these are supported is OS-dependent." so YMMV.
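A minimal sketch of the round trip, assuming only base R: parse with %OS, then ask for a fixed number of decimals with format() and %OSn. Since the help page says the seconds are truncated rather than rounded, the last printed digit can be one lower than in the input for fractions that are not exactly representable in binary. The tz = "UTC" argument is an assumption made for this example.

```r
# Parse a time stamp with fractional seconds (tz = "UTC" is assumed here)
z <- strptime("2010-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# The fraction is stored even though the default print method hides it
z$sec

# Request three decimals explicitly instead of changing options(digits.secs)
format(z, "%Y-%m-%d %H:%M:%OS3")
```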
Related
I want to calculate length in different time dimensions but I have problems dealing with the two slightly different time formats in my data frame column.
The original data frame column has about a million rows with the two formats (shown in the example code) mixed together.
Example code:
time <- c("2018-07-29T15:02:05Z", "2018-07-29T14:46:57Z",
"2018-10-04T12:13:41.333Z", "2018-10-04T12:13:45.479Z")
length <- c(15.8, 132.1, 12.5, 33.2)
df <- data.frame(time, length)
df$time <- format(as.POSIXlt(strptime(df$time,"%Y-%m-%dT%H:%M:%SZ", tz="")))
df
The formats "2018-10-04T12:13:41.333Z" and "2018-10-04T12:13:45.479Z" result in NA.
Is there a solution that would also be applicable to a big data frame where the two formats are mixed up?
We may use %OS instead of %S to account for decimals in seconds.
help("strptime")
Specific to R is %OSn, which for output gives the seconds truncated to
0 <= n <= 6 decimal places (and if %OS is not followed by a digit, it
uses the setting of getOption("digits.secs"), or if that is unset, n =
0).
as.POSIXct(time, format="%Y-%m-%dT%H:%M:%OSZ")
# [1] "2018-07-29 15:02:05 CEST" "2018-07-29 14:46:57 CEST"
# [3] "2018-10-04 12:13:41 CEST" "2018-10-04 12:13:45 CEST"
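The trailing Z in these strings marks UTC, but as.POSIXct() interprets the parsed clock time in the local time zone unless told otherwise, which is why the output above is labelled CEST. A small sketch, assuming the stamps really are UTC, that passes tz = "UTC" and then computes an elapsed time:

```r
time <- c("2018-07-29T15:02:05Z", "2018-07-29T14:46:57Z",
          "2018-10-04T12:13:41.333Z", "2018-10-04T12:13:45.479Z")

# %OS reads seconds with or without a fractional part; the literal Z is matched as text
x <- as.POSIXct(time, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Both formats parse, so this should be FALSE
anyNA(x)

# Elapsed time between the last two stamps, in seconds
difftime(x[4], x[3], units = "secs")
```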
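This base R call is fast: in the benchmark below it takes about a tenth of the time of anytime. lubridate is faster still, but it returns NAs on the mixed input used there. Try it yourself.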
Update 1
time2 <- c("2018-09-01T12:42:37.000+02:00", "2018-10-01T11:42:37.000+03:00")
This one is trickier. ?strptime says we should use %z for offsets from UTC, but it won't work here with as.POSIXct, presumably because the offset contains a colon. Instead we could do this:
as.POSIXct(substr(time2, 1, 23), format="%Y-%m-%dT%H:%M:%OS") +
{os <- as.numeric(el(strsplit(substring(time2, 24), "\\:")))
(os[1]*60 + os[2])*60}
# [1] "2018-09-01 14:42:37 CEST" "2018-10-01 13:42:37 CEST"
which cuts the unreadable part from the string, converts it to seconds and adds it to the "POSIXct" object.
If there are only hours as in time2, we could also say:
as.POSIXct(substr(time2, 1, 23), format="%Y-%m-%dT%H:%M:%OS") +
as.numeric(substr(time2, 24, 26))*3600
# [1] "2018-09-01 14:42:37 CEST" "2018-10-01 13:42:37 CEST"
That the code is slightly longer now should not obscure the fact that it runs practically as fast as the one at top of the answer.
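Two things are worth checking in the snippets above. el() keeps only the first list element, so the first variant applies the first string's offset to every element. Both variants also add the offset to a local-time parse. The following is a sketch of an alternative, assuming the goal is the UTC instant and that the offset has the ISO 8601 meaning (local time = UTC + offset), so the offset is subtracted, per element. The variable names are illustrative.

```r
time2 <- c("2018-09-01T12:42:37.000+02:00", "2018-10-01T11:42:37.000+03:00")

# Wall-clock part, read as if it were UTC
wall <- as.POSIXct(substr(time2, 1, 23), format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# Signed offset in seconds, computed for each element separately
sgn <- ifelse(substr(time2, 24, 24) == "-", -1, 1)
hh  <- as.numeric(substr(time2, 25, 26))
mm  <- as.numeric(substr(time2, 28, 29))
offset <- sgn * (hh * 3600 + mm * 60)

# Subtracting the offset gives the instant in UTC
utc <- wall - offset
format(utc, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# [1] "2018-09-01 10:42:37" "2018-10-01 08:42:37"
```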
Update 2
You could wrap the three variants into one function that splits the input by nchar(x) and converts each group with its own format, such as this one:
fixDateTime <- function(x) {
  s <- split(x, nchar(x))
  # independent if statements, so that every format present gets converted
  if ("20" %in% names(s))
    s$`20` <- as.POSIXct(s$`20`, format="%Y-%m-%dT%H:%M:%SZ")
  if ("24" %in% names(s))
    s$`24` <- as.POSIXct(s$`24`, format="%Y-%m-%dT%H:%M:%OSZ")
  # offset hours and minutes are read per element, e.g. "+03:00"
  if ("29" %in% names(s))
    s$`29` <- as.POSIXct(substr(s$`29`, 1, 23), format="%Y-%m-%dT%H:%M:%OS") +
      (as.numeric(substr(s$`29`, 24, 26))*60 + as.numeric(substr(s$`29`, 28, 29)))*60
  return(unsplit(s, nchar(x)))
}
res <- fixDateTime(time3)
res
# [1] "2018-07-29 15:02:05 CEST" "2018-10-04 12:13:41 CEST" "2018-10-01 14:42:37 CEST"
str(res)
# POSIXct[1:3], format: "2018-07-29 15:02:05" "2018-10-04 12:13:41" "2018-10-01 14:42:37"
Compared to the packages, only fixDateTime can handle all three date-time types defined here. According to the concluding benchmark, the function is still very fast.
Note: The function fails if different date formats have the same nchar; in that case it should be customized (e.g. with another split condition). Not tested: daylight saving time behavior when adding seconds to POSIXct.
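One possible "other split condition" is to classify each string by its pattern instead of its length. The sketch below is only an illustration: classify_stamp and its labels are hypothetical names, and the regular expressions assume exactly the three layouts discussed here.

```r
# Hypothetical helper: label each time stamp by its layout rather than by nchar()
classify_stamp <- function(x) {
  kind <- rep(NA_character_, length(x))
  kind[grepl("^[0-9-]{10}T[0-9:]{8}Z$", x)] <- "seconds_z"
  kind[grepl("^[0-9-]{10}T[0-9:]{8}\\.[0-9]+Z$", x)] <- "fraction_z"
  kind[grepl("^[0-9-]{10}T[0-9:]{8}\\.[0-9]+[+-][0-9]{2}:[0-9]{2}$", x)] <- "fraction_offset"
  kind
}

time3 <- c("2018-07-29T15:02:05Z", "2018-10-04T12:13:41.333Z",
           "2018-10-01T11:42:37.000+03:00")
classify_stamp(time3)
# [1] "seconds_z"       "fraction_z"      "fraction_offset"
```

The labels could then replace nchar(x) in the split() and unsplit() calls, with the if conditions testing for the label names.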
Benchmark
# Unit: milliseconds
#         expr       min        lq      mean    median        uq       max neval cld
#  fixDateTime  35.46387  35.94761  40.07578  36.05923  39.54706  68.46211    10   c
#   as.POSIXct  20.32820  20.45985  21.00461  20.62237  21.16019  23.56434    10  b   # to compare
#    lubridate  11.59311  11.68956  12.88880  12.01077  13.76151  16.54479    10 a    # produces NAs!
#      anytime 198.57292 201.06483 203.95131 202.91368 203.62130 212.83272    10    d # produces NAs!
Data
time <- c("2018-07-29T15:02:05Z", "2018-07-29T14:46:57Z", "2018-10-04T12:13:41.333Z",
"2018-10-04T12:13:45.479Z")
time2 <- c("2018-07-29T15:02:05Z", "2018-07-29T15:02:05Z", "2018-07-29T15:02:05Z")
time3 <- c("2018-07-29T15:02:05Z", "2018-10-04T12:13:41.333Z",
"2018-10-01T11:42:37.000+03:00")
Benchmark code
n <- 1e3
t1 <- sample(time2, n, replace=TRUE)
t2 <- sample(time3, n, replace=TRUE)
library(lubridate)
library(anytime)
microbenchmark::microbenchmark(fixDateTime=fixDateTime(t2),
as.POSIXct=as.POSIXct(t1, format="%Y-%m-%dT%H:%M:%OSZ"),
lubridate=parse_date_time(t2, "ymd_HMS"),
anytime=anytime(t2),
times=10L)
You can use the anytime package:
library(anytime)
time <- c("2018-07-29T15:02:05Z",
"2018-07-29T14:46:57Z",
"2018-10-04T12:13:41.333Z",
"2018-10-04T12:13:45.479Z")
anytime(time)
#[1] "2018-07-29 15:02:05 CEST" "2018-07-29 14:46:57 CEST" "2018-10-04 12:13:41 CEST" "2018-10-04 12:13:45 CEST"
Or you can use lubridate:
time <- c("2018-07-29T15:02:05Z",
"2018-07-29T14:46:57Z",
"2018-10-04T12:13:41.333Z",
"2018-10-04T12:13:45.479Z")
length <- c(15.8, 132.1, 12.5, 33.2)
df <- data.frame(time, length)
library(lubridate)
# df$time2 <- as_datetime(df$time)
df$time2 <- parse_date_time(df$time, "ymd_HMS")
df
I'm trying to convert a timestamp in microseconds to the following format in R:
YYYY-MM-DD HH:MM:SS
I've tried different approaches, but couldn't succeed. Here is my code:
options(digits=16)
value = 1521222492687300
as.POSIXct(value, tz = "UTC", origin="1970-01-01 00:00:00")
This is what it returns:
[1] "48207591-10-13 12:15:00 UTC"
Even when I divide by 1000, as some posts suggested, I still get a nonsensical result:
as.POSIXct(value/1000, tz = "UTC", origin="1970-01-01 00:00:00")
[1] "50175-08-15 19:31:27.300048 UTC"
Any suggestion to solve this problem?
As Gabor hinted, you need to divide by 1e6, not 1e3:
R> v <- 1521222492687300
R> v
[1] 1.52122e+15
R> anytime::anytime(v / 1e6)
[1] "2018-03-16 12:48:12.6872 CDT"
R>
The same works with as.POSIXct etc., of course, but you need to supply the redundant origin:
R> as.POSIXct(v / 1e6, origin="1970-01-01")
[1] "2018-03-16 12:48:12.6872 CDT"
R>
One way to see your scale is to convert current time:
R> w <- as.numeric(Sys.time())
R> c(v, w)
[1] 1.52122e+15 1.52346e+09
R>
which makes the scaling difference more obvious.
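The comparison can also be put in numbers. This is a small sketch, assuming the value is microseconds since the epoch as in the question: the difference in order of magnitude against the current time in seconds gives the power of ten to divide by. tz = "UTC" is used here so the result does not depend on the local time zone.

```r
v <- 1521222492687300          # microseconds since the epoch
w <- as.numeric(Sys.time())    # seconds since the epoch, for comparison

# Difference in order of magnitude: 6 suggests dividing by 1e6
floor(log10(v)) - floor(log10(w))

# Scale to seconds, then convert
x <- as.POSIXct(v / 1e6, origin = "1970-01-01", tz = "UTC")
format(x, "%Y-%m-%d %H:%M:%S")
# [1] "2018-03-16 17:48:12"
```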
I have several variables that exist in the following format:
/Date(1353020400000+0100)/
I want to convert this format to ddmmyyyy. I found this solution for the same problem using PHP, but I don't know PHP, so I can't translate that solution into what I need, which is a solution that I can use in R.
Any suggestions?
Thanks.
If the format is milliseconds since the epoch then anytime() or as.POSIXct() can help you:
R> anytime(1353020400000/1000)
[1] "2012-11-15 17:00:00 CST"
R> anytime(1353020400.000)
[1] "2012-11-15 17:00:00 CST"
R>
anytime() converts to local time, which is Chicago for me. You would have to deal with the UTC offset separately.
Base R can do it too, but you need the dreaded origin:
R> as.POSIXct(1353020400.000, origin="1970-01-01")
[1] "2012-11-15 17:00:00 CST"
R>
As far as I can tell from the linked question, this is milliseconds since the epoch:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "[()+]")
as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
#[1] "2012-11-15 23:00:00 UTC"
If you want to pick up the timezone difference as well, here's an attempt:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "(?=[+-])|[()]", perl=TRUE)
tzo <- sapply(spl, function(x) paste(x[3:4],collapse="") )
dt <- as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
as.POSIXct(paste(format(dt), tzo), tz="UTC", format = '%F %T %z')
#[1] "2012-11-15 22:00:00 UTC"
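To get the ddmmyyyy string the question asks for, the parsed value can be passed through format(). A base R sketch, assuming the number is milliseconds since the epoch and that the UTC calendar date is the one wanted (the +0100 part is ignored here):

```r
x <- "/Date(1353020400000+0100)/"

# Pull out the millisecond count between "(" and the sign of the offset
ms <- as.numeric(sub("^/Date\\(([0-9]+)[+-][0-9]{4}\\)/$", "\\1", x))

# Convert to a date-time in UTC, then format as ddmmyyyy
dt <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
format(dt, "%d%m%Y")
# [1] "15112012"
```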
The package lubridate can come to the rescue as follows:
as.Date("1970-01-01") + lubridate::milliseconds(1353020400000)
Read: the number is milliseconds since the epoch (1 January 1970, UTC).
A parsing function can now be made using regular expressions:
parse.myDate <- function(text) {
num <- as.numeric(stringr::str_extract(text, "(?<=/Date\\()\\d+"))
as.Date("1970-01-01") + lubridate::milliseconds(num)
}
Finally, format the date with
format(theDate, "%d/%m/%Y %H:%M")
If you also need the time zone information, you can use this instead:
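parse.myDate <- function(text) {
  # handles one string at a time, with whole-hour offsets
  parts <- stringr::str_match(text, "^/Date\\((\\d+)([+-])(\\d{4})\\)/$")
  # Etc/GMT zone names use the inverted (POSIX) sign, so "+0100" maps to "Etc/GMT-1"
  sgn <- ifelse(parts[,3] == "+", "-", "+")
  as.POSIXct(as.numeric(parts[,2])/1000, origin = "1970-01-01",
             tz = paste0("Etc/GMT", sgn, as.integer(parts[,4])/100))
}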
I want to get subseconds, so I use the following:
> options(digits.secs=6)
> as.POSIXlt(df1$Global.Time[5]/1000, origin="1970-01-01", tz="America/Los_Angeles")
[1] "2005-06-15 07:53:42.7 PDT"
Why does the output not contain something like "07:53:42.700000"?
Same problem with POSIXct:
> as.POSIXct(df1$Global.Time[3]/1000, origin="1970-01-01", tz="America/Los_Angeles")
[1] "2005-06-15 07:53:42.5 PDT"
How about this (corrected per Frank's direction):
d <- as.POSIXct(Sys.time())
format(d,"%Y-%m-%d %H:%M:%OS6")
[1] "2015-05-30 18:06:08.693852"
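As far as I can tell, the print method shows only as many decimals as are needed, up to digits.secs, which is why the output in the question stops at .7 or .5. An explicit %OS6 in format() pads to six places. A small sketch with a fixed value, so the result is reproducible (the number is an illustration chosen to have an exactly representable fraction; it is not taken from df1):

```r
# 2005-06-15 07:53:42.5 in Los Angeles, as seconds since the epoch
x <- as.POSIXct(1118847222.5, origin = "1970-01-01", tz = "America/Los_Angeles")

# %OS6 always prints six decimals, including trailing zeros
format(x, "%Y-%m-%d %H:%M:%OS6")
```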