R drops hours, minutes, and seconds from date

While converting a data frame to xts I realized that there is something wrong with the formatter. Here's an example data frame:
effective_date        price
"1990-01-01"          "100"
"1990-01-02 00:05:00" "200"
This is example output from a package that I use.
Converting this to xts should be straightforward:
xts(df["price"], order_by=as.POSIXct(df["effective_date"], format="%Y-%m-%d %H:%M:%S"))
However, this errors out, saying NAs can't be in row names, and the result is:
<NA> 100
1990-01-02 00:05:00 200
Obviously xts can't figure out what to do with the weird date there (midnight) and it won't coerce it.
Adding tz="UTC" to as.POSIXct doesn't help, and as.POSIXlt doesn't change anything here either.
What can I do to coerce that midnight date to the correct format?

Two issues:
1) You cannot parse a date alone as POSIXct with a given format:
R> as.POSIXct(c("2017-01-02", "2017-01-03 04:05:06"), format="%Y-%m-%d %H:%M:%S")
[1] NA "2017-01-03 04:05:06 CST"
R>
2) You can, however, use the anytime() function to do it:
R> anytime::anytime(c("2017-01-02", "2017-01-03 04:05:06"))
[1] "2017-01-02 00:00:00 CST" "2017-01-03 04:05:06 CST"
R>
Once you have a POSIXct, forming the xts is easy.
Also note that you have typos: you need a comma before the column indicator: df[, "price"].
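For completeness, a minimal sketch of the full pipeline, assuming a data frame df shaped like the example above (note that xts spells its index argument order.by):
library(xts)
library(anytime)
df <- data.frame(effective_date = c("1990-01-01", "1990-01-02 00:05:00"),
                 price = c(100, 200))
# anytime() handles both the date-only and the full datetime string
x <- xts(df[, "price"], order.by = anytime(df$effective_date))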
Edit: Getting a little tired of #42's comment about Gabor's (fine) solution "dominating" this one, so here's a minimal benchmark:
R> library(microbenchmark)
R> v <- c("2017-01-02", "2017-01-03 04:05:06")
R> library(anytime)
R> print(microbenchmark(anytime(v), do.call("c", lapply(v, as.POSIXct))), digits=3)
Unit: microseconds
                                expr   min    lq  mean median    uq   max neval cld
                          anytime(v)  33.6  36.8  42.1   45.6  46.6  80.7   100  a
 do.call("c", lapply(v, as.POSIXct)) 571.5 579.1 586.4  586.8 589.5 695.7   100   b
R>
So, in short: "not really". It uses only base R, which is a plus, but it is a) harder to read and understand, b) more limited, as it deals with exactly one format (in ISO style), and c) about thirteen times slower.

1) To get the "POSIXct" datetime vector try converting each datetime to "POSIXct" separately and then concatenate them together:
do.call("c", lapply(df$effective_date, as.POSIXct))
2) Another base solution that is even shorter, and also substantially faster, relies on the fact that as.POSIXct ignores junk at the end:
as.POSIXct(paste(df$effective_date, "00:00:00"))
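A quick sanity check of both base variants on the question's data (a sketch; the timezone shown will be your local one):
v <- c("1990-01-01", "1990-01-02 00:05:00")
do.call("c", lapply(v, as.POSIXct))  # variant 1): element-wise conversion
as.POSIXct(paste(v, "00:00:00"))     # variant 2): pad, then parse; trailing text is ignored
# both yield "1990-01-01 00:00:00" "1990-01-02 00:05:00" in the local timezone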

Most of lubridate's parsing functions have a truncated parameter that takes a number indicating the number of elements that can be missing from the end. Missing elements will be replaced by zero.
Example with the data at hand:
lubridate::ymd_hms(c("2017-01-02", "2017-01-03 04:05:06"), truncated = 3)
## [1] "2017-01-02 00:00:00 UTC" "2017-01-03 04:05:06 UTC"

Assuming you want the timestamps, preprocess with something like:
temp <- c("1990-01-01", "1990-01-02 00:05:00")
# Match a date-only string anchored at the end ($) and replace it
# with the full match (\\1) followed by " 00:00:00".
temp2 <- gsub("(\\d{4}\\-\\d{2}\\-\\d{2}$)", "\\1 00:00:00", temp)
# [1] "1990-01-01 00:00:00" "1990-01-02 00:05:00"
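The padded vector then parses cleanly with a single full format (a sketch continuing the example; tz = "UTC" is optional):
as.POSIXct(temp2, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
# [1] "1990-01-01 00:00:00 UTC" "1990-01-02 00:05:00 UTC"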

Reconvert numeric date to POSIXct R

I have a date that I convert to a numeric value and want to convert back to a date afterwards.
Converting date to numeric:
date1 = as.POSIXct('2017-12-30 15:00:00')
date1_num = as.numeric(date1)
# 1514646000
Reconverting numeric to date:
as.Date(date1_num, origin = '1/1/1970')
# "4146960-12-12"
What am I missing with the reconversion? I'd expect the last command to return my original date1.
As the numeric value was created from an object with a time component, the reconversion should go the same way, i.e. first to POSIXct and then wrapped with as.Date:
as.Date(as.POSIXct(date1_num, origin = '1970-01-01'))
#[1] "2017-12-30"
You could use anytime() and anydate() from the anytime package:
R> pt <- anytime("2017-12-30 15:00:00")
R> pt
[1] "2017-12-30 15:00:00 CST"
R>
R> anydate(pt)
[1] "2017-12-30"
R>
R> as.numeric(pt)
[1] 1514667600
R>
R> anydate(as.numeric(pt))
[1] "2017-12-30"
R>
POSIXct counts the number of seconds since the Unix Epoch, while Date counts the number of days. So you can recover the date by dividing by (60*60*24) (let's ignore leap seconds), or convert back to POSIXct instead.
as.Date(as.numeric(date1)/(60*60*24), origin="1970-01-01")
[1] "2017-12-30"
as.POSIXct(as.numeric(date1),origin="1970-01-01")
[1] "2017-12-30 15:00:00 GMT"
Using lubridate :
lubridate::as_datetime(1514646000)
[1] "2017-12-30 15:00:00 UTC"

How to fast convert different time formats in large data frames?

I want to calculate length in different time dimensions but I have problems dealing with the two slightly different time formats in my data frame column.
The original data frame column has about a million rows with the two formats (shown in the example code) mixed together.
Example code:
time <- c("2018-07-29T15:02:05Z", "2018-07-29T14:46:57Z",
"2018-10-04T12:13:41.333Z", "2018-10-04T12:13:45.479Z")
length <- c(15.8, 132.1, 12.5, 33.2)
df <- data.frame(time, length)
df$time <- format(as.POSIXlt(strptime(df$time,"%Y-%m-%dT%H:%M:%SZ", tz="")))
df
The formats "2018-10-04T12:13:41.333Z" and "2018-10-04T12:13:45.479Z" result in NA.
Is there a solution that would also be applicable to a big data frame where the two formats are mixed up?
We may use %OS instead of %S to account for decimals in seconds.
help("strptime")
Specific to R is %OSn, which for output gives the seconds truncated to
0 <= n <= 6 decimal places (and if %OS is not followed by a digit, it
uses the setting of getOption("digits.secs"), or if that is unset, n =
0).
as.POSIXct(time, format="%Y-%m-%dT%H:%M:%OSZ")
# [1] "2018-07-29 15:02:05 CEST" "2018-07-29 14:46:57 CEST"
# [3] "2018-10-04 12:13:41 CEST" "2018-10-04 12:13:45 CEST"
This base R code is considerably faster than the package solutions; try it yourself.
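Note that the fractional seconds are parsed but not printed by default; to display them, set the digits.secs option (a sketch):
op <- options(digits.secs = 3)  # print up to milliseconds
as.POSIXct("2018-10-04T12:13:41.333Z", format="%Y-%m-%dT%H:%M:%OSZ")
# [1] "2018-10-04 12:13:41.333 CEST" (in this answer's locale)
options(op)                     # restore the previous setting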
Update 1
time2 <- c("2018-09-01T12:42:37.000+02:00", "2018-10-01T11:42:37.000+03:00")
This one is trickier. ?strptime says we should use %z for offsets from UTC, but somehow it won't work with as.POSIXct here. Instead, we could do this:
as.POSIXct(substr(time2, 1, 23), format="%Y-%m-%dT%H:%M:%OS") +
  {os <- as.numeric(strsplit(substring(time2, 24), ":")[[1]])
   (os[1]*60 + os[2])*60}
# [1] "2018-09-01 14:42:37 CEST" "2018-10-01 13:42:37 CEST"
which cuts the unreadable part from the string, converts it to seconds and adds it to the "POSIXct" object.
If the offsets contain only full hours, as in time2, we could also say:
as.POSIXct(substr(time2, 1, 23), format="%Y-%m-%dT%H:%M:%OS") +
  as.numeric(substr(time2, 24, 26))*3600
# [1] "2018-09-01 14:42:37 CEST" "2018-10-01 13:42:37 CEST"
That the code is slightly longer now should not obscure the fact that it runs practically as fast as the one at the top of the answer.
Update 2
You could wrap the three variants into one function that splits the input on string length (nchar) and handles each group separately, such as this one:
fixDateTime <- function(x) {
  s <- split(x, nchar(x))
  # 20 characters: "2018-07-29T15:02:05Z" (whole seconds, UTC "Z")
  if ("20" %in% names(s))
    s$`20` <- as.POSIXct(s$`20`, format="%Y-%m-%dT%H:%M:%SZ")
  # 24 characters: "2018-10-04T12:13:41.333Z" (fractional seconds)
  if ("24" %in% names(s))
    s$`24` <- as.POSIXct(s$`24`, format="%Y-%m-%dT%H:%M:%OSZ")
  # 29 characters: "2018-09-01T12:42:37.000+02:00" (numeric UTC offset)
  if ("29" %in% names(s))
    s$`29` <- as.POSIXct(substr(s$`29`, 1, 23), format="%Y-%m-%dT%H:%M:%OS") +
      {os <- as.numeric(strsplit(substring(s$`29`, 24), ":")[[1]])
       (os[1]*60 + os[2])*60}
  unsplit(s, nchar(x))
}
res <- fixDateTime(time3)
res
# [1] "2018-07-29 15:02:05 CEST" "2018-10-04 12:13:41 CEST" "2018-10-01 14:42:37 CEST"
str(res)
# POSIXct[1:3], format: "2018-07-29 15:02:05" "2018-10-04 12:13:41" "2018-10-01 14:42:37"
Compared to the packages, only fixDateTime can handle all three of the date-time types defined below. According to the concluding benchmark, the function is still very fast.
Note: the function logically fails if two different date formats share the same nchar; in that case it should be customized with another split condition (a sketch follows). Also untested: daylight saving time behavior when adding seconds to a POSIXct.
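For instance, one alternative split condition could key on the string's shape rather than its length (a sketch, not benchmarked; the group names are illustrative):
# hypothetical variant: group by pattern instead of nchar()
key <- ifelse(grepl("[+-]\\d{2}:\\d{2}$", x), "offset",
       ifelse(grepl("\\.\\d+Z$", x), "frac", "plain"))
s <- split(x, key)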
Benchmark
# Unit: milliseconds
#        expr       min        lq      mean    median        uq       max neval cld
# fixDateTime  35.46387  35.94761  40.07578  36.05923  39.54706  68.46211    10  c
#  as.POSIXct  20.32820  20.45985  21.00461  20.62237  21.16019  23.56434    10  b   # to compare
#   lubridate  11.59311  11.68956  12.88880  12.01077  13.76151  16.54479    10  a   # produces NAs!
#     anytime 198.57292 201.06483 203.95131 202.91368 203.62130 212.83272    10  d   # produces NAs!
Data
time <- c("2018-07-29T15:02:05Z", "2018-07-29T14:46:57Z", "2018-10-04T12:13:41.333Z",
"2018-10-04T12:13:45.479Z")
time2 <- c("2018-07-29T15:02:05Z", "2018-07-29T15:02:05Z", "2018-07-29T15:02:05Z")
time3 <- c("2018-07-29T15:02:05Z", "2018-10-04T12:13:41.333Z",
"2018-10-01T11:42:37.000+03:00")
Benchmark code
n <- 1e3
t1 <- sample(time2, n, replace=TRUE)
t2 <- sample(time3, n, replace=TRUE)
library(lubridate)
library(anytime)
microbenchmark::microbenchmark(fixDateTime=fixDateTime(t2),
                               as.POSIXct=as.POSIXct(t1, format="%Y-%m-%dT%H:%M:%OSZ"),
                               lubridate=parse_date_time(t2, "ymd_HMS"),
                               anytime=anytime(t2),
                               times=10L)
You can use the anytime library:
library(anytime)
time<- c("2018-07-29T15:02:05Z",
"2018-07-29T14:46:57Z",
"2018-10-04T12:13:41.333Z",
"2018-10-04T12:13:45.479Z")
anytime(time)
#[1] "2018-07-29 15:02:05 CEST" "2018-07-29 14:46:57 CEST" "2018-10-04 12:13:41 CEST" "2018-10-04 12:13:45 CEST"
Or you can also use:
time<- c("2018-07-29T15:02:05Z",
"2018-07-29T14:46:57Z",
"2018-10-04T12:13:41.333Z",
"2018-10-04T12:13:45.479Z")
length<-c(15.8,132.1,12.5,33.2)
df<-data.frame(time,length)
library(lubridate)
# df$time2<-as_datetime(df$time)
df$time2 <-parse_date_time(df$time, "ymd_HMS")
df

Convert Date with special format using R

I have several variables that exist in the following format:
/Date(1353020400000+0100)/
I want to convert this format to ddmmyyyy. I found a solution for the same problem using PHP, but I don't know anything about PHP, so I'm unable to translate that solution into what I need, which is a solution I can use in R.
Any suggestions?
Thanks.
If the format is milliseconds since the epoch then anytime() or as.POSIXct() can help you:
R> anytime(1353020400000/1000)
[1] "2012-11-15 17:00:00 CST"
R> anytime(1353020400.000)
[1] "2012-11-15 17:00:00 CST"
R>
anytime() converts to local time, which is Chicago for me. You would have to deal with the UTC offset separately.
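If you want the result in UTC instead, the anytime package also offers utctime() (a sketch, assuming a reasonably current version of the package):
R> anytime::utctime(1353020400000/1000)
[1] "2012-11-15 23:00:00 UTC"
R>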
Base R can do it too, but you need the dreaded origin:
R> as.POSIXct(1353020400.000, origin="1970-01-01")
[1] "2012-11-15 17:00:00 CST"
R>
As far as I can tell from the linked question, this is milliseconds since the epoch:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "[()+]")
as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
#[1] "2012-11-15 23:00:00 UTC"
If you want to pick up the timezone difference as well, here's an attempt:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "(?=[+-])|[()]", perl=TRUE)
tzo <- sapply(spl, function(x) paste(x[3:4],collapse="") )
dt <- as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
as.POSIXct(paste(format(dt), tzo), tz="UTC", format = '%F %T %z')
#[1] "2012-11-15 22:00:00 UTC"
The package lubridate can come to the rescue as follows:
as.Date("1970-01-01") + lubridate::milliseconds(1353020400000)
Read: number of milliseconds since the epoch (1 January 1970, UTC+0).
A parsing function can now be made using regular expressions:
parse.myDate <- function(text) {
  num <- as.numeric(stringr::str_extract(text, "(?<=/Date\\()\\d+"))
  as.Date("1970-01-01") + lubridate::milliseconds(num)
}
Finally, format the date with:
format(theDate, "%d/%m/%Y %H:%M")
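End to end, with the value from the question (a sketch; theDate stands for whatever parse.myDate() returned):
theDate <- parse.myDate("/Date(1353020400000+0100)/")
format(theDate, "%d/%m/%Y %H:%M")
# "15/11/2012 23:00" (UTC; this variant ignores the +0100 offset)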
If you also need the time zone information, you can use this instead:
parse.myDate <- function(text) {
  parts <- stringr::str_match(text, "^/Date\\((\\d+)([+-])(\\d{4})\\)/$")
  as.POSIXct(as.numeric(parts[,2])/1000, origin = "1970-01-01",
             tz = paste0("Etc/GMT", parts[,3], as.integer(parts[,4])/100))
}

Subtract exactly one year from a POSIXct object

Let's say we have the date "2014-05-11 14:45:00 UTC". I would like to get the exact POSIXct object for one year earlier, i.e. "2013-05-11 14:45:00 UTC".
My first thought was to create a whole new POSIXct object by subtracting one from the year part, pasting it together with the remainder of the string, and then creating a new POSIXct object from that string, like so:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
newTime <- as.POSIXct(paste(as.character(as.numeric(substr(time,1,4)) - 1),substr(time,5,19),sep=""),tz="UTC",origin="1970-01-01")
This works fine (except in the case of leap years!), but I need to do this for each row of a large data.table, and preferably put the results right back into the data.table.
Is there any other way of subtracting a year off an object like this?
Extra info: I need to apply this to a data.table like this one:
Time
1: 1349206200
2: 1349207100
3: 1349208000
4: 1349208900
5: 1349209800
6: 1349210700
7: 1349211600
8: 1349212500
9: 1349213400
10: 1349214300
11: 1349215200
but this happens when I do:
SOdata[,Time:=as.numeric(as.POSIXct(paste(as.character(as.numeric(substr(Time,1,4)) - 1),substr(Time,5,19),sep=""),tz="UTC",origin="1970-01-01"))]
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I am guessing I need to use something like lapply, but I always mess up the syntax when using that function. Does anyone know how?
lubridate is your friend.
library(lubridate)
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time-dyears(1)
#[1] "2013-05-11 14:45:00 UTC"
time+dyears(1)
#[1] "2015-05-11 14:45:00 UTC"
For leap years
> x <- as.POSIXct(c("2012-02-28", "2012-02-29"), tz="UTC",origin="1970-01-01")
> x - dyears(1)
[1] "2011-02-28 UTC" "2011-03-01 UTC"
I haven't tested the other answers, but the following should work as required regardless of leap years:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2013-05-11 14:45:00 UTC"
With Gabor's leap year example:
time <- as.POSIXct("2012-02-29 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2011-03-01 14:45:00 UTC"
seq in base can be used:
LastYr <- function(x) seq(x, length = 2, by = "-1 year")[2]
toPOSIXct <- function(x) as.POSIXct(x, origin = "1970-01-01")
# example 1
LastYr(as.POSIXct("2012-02-28"))
## [1] "2011-02-28 EST"
# example 2 - leap year
LastYr(as.POSIXct("2012-02-29"))
## [1] "2011-03-01 EST"
# example 3 - vector case
x <- as.POSIXct(c("2012-02-28", "2012-02-29")) # test data
toPOSIXct(sapply(x, LastYr))
## [1] "2011-02-28 EST" "2011-03-01 EST"
# example 4 - data.table shown in question
DT[, Time := sapply(toPOSIXct(Time), LastYr)]
Revised: simplified using the functions LastYr and toPOSIXct.
Or you can try, in base R:
> time + as.difftime(52*7+1,units="days")
[1] "2015-05-11 14:45:00 UTC"
> time - as.difftime(52*7+1,units="days")
[1] "2013-05-11 14:45:00 UTC"
Of course, it would be easier if units could be years...

R: Time stamps, Unix time and correct usage of 'strptime'

I have a column named timings of class factor with time stamps in the following format:
1/11/07 15:15
I applied strptime on timings to generate tStamp as follows:
tStamp=strptime(timings,format="%m/%d/%Y %H:%M")
i)
The corresponding entry in tStamp now looks like 0007-01-11 15:15:00. Why has it turned 07 (meaning 2007) into 0007? What is the correct way to generate tStamp?
ii)
After generating tStamp correctly, how do we convert it to Unix time (seconds since 1970)?
You need the lowercase %y for 2-digit years:
R> pt <- strptime("1/11/07 15:15",format="%m/%d/%y %H:%M")
R> pt
[1] "2007-01-11 15:15:00 CST"
R>
where CST is my local timezone.
And as.numeric() or as.double() converts to a double ...
R> as.numeric(pt)
[1] 1168550100
... which has fractional seconds if those are in the input:
R> options("digits.secs"=3) # show milliseconds
R> as.numeric(Sys.time()) # convert current time
[1] 1372201674.52              # now with sub-seconds
