How to handle mixed precision dates using R as.POSIXct

I have a series of dates with mixed precision. Most have the format "1930-02-06T10:00:00", but a few have the format "2130-02-06", which I want to treat as 2130-02-06T00:00:00.
When I use
df$date <- as.POSIXct(df$date,tz=Sys.timezone())
I lose times from the data because some of the datetimes are missing time. I can write a little conversion routine
fixDateTime <- function (s) {
if(nchar(s) == 10) {
return (paste(s, "00:00:00"));
} else {
return (str_replace(s,"T", " "));
}
}
and then do
df$DATET <- as.POSIXct(fixDateTime(df$date),tz=Sys.timezone())
But that doesn't work because fixDateTime is actually given an array and I don't know how to adapt for that. I'm not sure which way to try to solve this. (and I'm sure this shows how newbie I am to R)

You can work with your fixDateTime function if you use ifelse which can handle vectors instead of if/else which works for scalars. Keeping everything in base R, we can do
fixDateTime <- function (s) {
ifelse(nchar(s) == 10, paste(s, "00:00:00"), sub("T", " ", s))
}
and then use it in as.POSIXct
as.POSIXct(fixDateTime(x), tz = "UTC")
#[1] "1930-02-06 10:00:00 UTC" "2130-02-06 00:00:00 UTC"
data
x <- c("1930-02-06T10:00:00", "2130-02-06")
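Applied to a data frame column, as in the question, the same idea can be sketched as follows. The data frame here is a hypothetical stand-in for the asker's df, and UTC is used only to keep the example reproducible.

```r
# Hypothetical data frame standing in for the question's df
df <- data.frame(date = c("1930-02-06T10:00:00", "2130-02-06"),
                 stringsAsFactors = FALSE)

# Vectorised helper: pad date-only strings with midnight,
# otherwise replace the ISO 8601 "T" separator with a space
fixDateTime <- function(s) {
  ifelse(nchar(s) == 10, paste(s, "00:00:00"), sub("T", " ", s, fixed = TRUE))
}

# ifelse() works element-wise, so the whole column can be passed at once
df$DATET <- as.POSIXct(fixDateTime(df$date), tz = "UTC")
format(df$DATET, "%Y-%m-%d %H:%M:%S")
```

Because ifelse() evaluates the condition for every element, no loop or sapply() is needed.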

Turns out lubridate is all you need:
library(lubridate)
data <- c("1930-02-06T10:00:00", "2130-02-06")
ymd_hms(data, truncated = 3)
#> [1] "1930-02-06 10:00:00 UTC" "2130-02-06 00:00:00 UTC"
Created on 2019-11-15 by the reprex package (v0.3.0)

@Ronak's answer is good as it uses just base R. Another solution is offered by the anytime() function of the anytime package; it does not need any formats.
R> library(anytime)
R> anytime(c("1930-02-06T10:00:00", "2130-02-06")) # localtime by default
[1] "1930-02-06 10:00:00 CST" "2130-02-06 00:00:00 CST"
R> anytime(c("1930-02-06T10:00:00", "2130-02-06"), tz="UTC", asUTC=TRUE) #override
[1] "1930-02-06 10:00:00 UTC" "2130-02-06 00:00:00 UTC"
R>
So you can have it as UTC, or in your local time.
The key point is that omitting hours:minutes:seconds is generally treated as midnight when you parse to a datetime rather than a date, so you may not need a helper function.

Related

How to drop minutes in R?

I have a DateTime object in R.
tempDateTime<-as.POSIXct("2017-07-13 01:40:00 MDT")
class(tempDateTime)
[1] "POSIXct" "POSIXt"
I would like to drop the minutes from the DateTime object. ie have "2017-07-13 01:00:00 MDT"
Is there a simple way to do this?
In Base R
trunc(tempDateTime, units = "hours")
# "2017-07-13 01:00:00 AEST"
This works because the trunc function in base R has a method for POSIXt objects, documented together with round.
From ?round.POSIXt
Round or truncate date-time objects.
As @Thelatemail points out, this returns a POSIXlt object, so you may want to wrap the result in as.POSIXct() again.
Another note: POSIXct is an object that stores the number of seconds since "1970-01-01 00:00:00" UTC (the Unix epoch).
as.numeric(tempDateTime)
# 1499874000
So the manual way to round-down the hours would be
as.POSIXct(floor(as.numeric(tempDateTime) / 3600) * 3600, origin = "1970-01-01")
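A small sketch comparing the two base R routes. The time zone is pinned to UTC here as an assumption so the result is reproducible, and the variable names are illustrative. Note that the numeric route cuts at whole hours of UTC, which matches local hours only in zones whose offset is a whole number of hours.

```r
x <- as.POSIXct("2017-07-13 01:40:00", tz = "UTC")

# Route 1: trunc() returns POSIXlt, so convert back to POSIXct
via_trunc <- as.POSIXct(trunc(x, units = "hours"))

# Route 2: floor the seconds since the epoch to a multiple of 3600
via_floor <- as.POSIXct(floor(as.numeric(x) / 3600) * 3600,
                        origin = "1970-01-01", tz = "UTC")

format(via_trunc, "%Y-%m-%d %H:%M:%S")
format(via_floor, "%Y-%m-%d %H:%M:%S")
```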
Try this:
library(lubridate)
> floor_date(tempDateTime, "hour")
[1] "2017-07-13 01:00:00 PDT"

Round time by X hours in R?

While doing predictive modeling on timestamped data, I want to write a function in R (possibly using data.table) that rounds the date by X number of hours. E.g. rounding by 2 hours should give this:
"2014-12-28 22:59:00 EDT" becomes "2014-12-28 22:00:00 EDT"
"2014-12-28 23:01:00 EDT" becomes "2014-12-29 00:00:00 EDT"
It's very easy to do when you round by 1 hour - using round.POSIXt(.date, "hour") function.
Writing a generic function, like I'm doing below using multiple if statements, becomes quite ugly however:
d7.dateRoundByHour <- function (.date, byHours) {
if (byHours == 1)
return (round.POSIXt(.date, "hour"))
hh = hour(.date); dd = mday(.date); mm = month(.date); yy = year(.date)
hh = round(hh/byHours,digits=0) * byHours
if (hh>=24) {
hh=0; dd=dd+1
}
if ((mm==2 & dd==28) |
(mm %in% c(1,3,5,7,8,10,12) & dd==31) |
(mm %in% c(2,4,6,9,11) & dd==30)) { # NB: it won't work on 29 Feb leap year.
dd=1; mm=mm+1
}
if (mm==13) {
mm=1; yy=yy+1
}
str = sprintf("%i-%02.0f-%02.0f %02.0f:%02.0f:%02.0f EDT", yy,mm,dd, hh,0,0)
as.POSIXct(str, format="%Y-%m-%d %H:%M:%S")
}
Can anyone show a better way to do that?
(perhaps by converting to numeric and back to POSIXt or some other POSIXt functions?)
Use the round_date function from the lubridate package. Assuming you had a data.table with a column named date you could do the following:
dt[, date := round_date(date, '2 hours')]
A quick example will give you exactly the results you were looking for:
x <- as.POSIXct("2014-12-28 22:59:00 EDT")
round_date(x, '2 hours')
This is actually really easy with just base R. The basic idea for rounding by "odd lots" is that you
scale down by an appropriate scale factor,
round to an integer in the downscaled unit, and
scale back up and re-convert.
Or in two R code statements:
R> pt <- as.POSIXct(c("2014-12-28 22:59:00", "2014-12-28 23:01:00 EDT"))
R> pt # just to check
[1] "2014-12-28 22:59:00 CST" "2014-12-28 23:01:00 CST"
R>
R> scalefactor <- 60*60*2 # 2 hours of 60 minutes times 60 seconds
R>
R> as.POSIXct(round(as.numeric(pt)/scalefactor) * scalefactor, origin="1970-01-01")
[1] "2014-12-28 22:00:00 CST" "2014-12-29 00:00:00 CST"
R>
The key last line just does what I outlined: it converts the POSIXct to a numeric representation, scales it down, then rounds before scaling back up and converting to a POSIXct again.
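The same steps can be wrapped in a generic function. This is a minimal sketch: round_by_hours and its tz argument are hypothetical names, and UTC is assumed so the output does not depend on the local zone. The rounding grid is aligned to the epoch in UTC, so in zones whose offset is not a multiple of the interval the boundaries will not fall on local clock multiples.

```r
# Hypothetical helper: round a POSIXct vector to the nearest multiple of `hours`
round_by_hours <- function(x, hours, tz = "UTC") {
  step <- hours * 3600                        # interval length in seconds
  secs <- round(as.numeric(x) / step) * step  # scale down, round, scale back up
  as.POSIXct(secs, origin = "1970-01-01", tz = tz)
}

pt <- as.POSIXct(c("2014-12-28 22:59:00", "2014-12-28 23:01:00"), tz = "UTC")
rounded <- round_by_hours(pt, 2)
format(rounded, "%Y-%m-%d %H:%M:%S")
```

R's round() sends exact halves to the even neighbour, so a timestamp that sits exactly midway between two grid points may round down rather than up.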

Convert Date with special format using R

I have several variables that exist in the following format:
/Date(1353020400000+0100)/
I want to convert this format to ddmmyyyy. I found this solution for the same problem using php, but I don't know anything about php, so I'm unable to convert that solution to what I need, which is a solution that I can use in R.
Any suggestions?
Thanks.
If the format is milliseconds since the epoch then anytime() or as.POSIXct() can help you:
R> anytime(1353020400000/1000)
[1] "2012-11-15 17:00:00 CST"
R> anytime(1353020400.000)
[1] "2012-11-15 17:00:00 CST"
R>
anytime() converts to local time, which is Chicago for me. You would have to deal with the UTC offset separately.
Base R can do it too, but you need the dreaded origin:
R> as.POSIXct(1353020400.000, origin="1970-01-01")
[1] "2012-11-15 17:00:00 CST"
R>
As far as I can tell from the linked question, this is milliseconds since the epoch:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "[()+]")
as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
#[1] "2012-11-15 23:00:00 UTC"
If you want to pick up the timezone difference as well, here's an attempt:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "(?=[+-])|[()]", perl=TRUE)
tzo <- sapply(spl, function(x) paste(x[3:4],collapse="") )
dt <- as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
as.POSIXct(paste(format(dt), tzo), tz="UTC", format = '%F %T %z')
#[1] "2012-11-15 22:00:00 UTC"
The package lubridate can come to the rescue as follows:
as.Date("1970-01-01") + lubridate::milliseconds(1353020400000)
Read: Number of milliseconds since epoch (= 1. January 1970, UTC + 0)
A parsing function can now be made using regular expressions:
parse.myDate <- function(text) {
num <- as.numeric(stringr::str_extract(text, "(?<=/Date\\()\\d+"))
as.Date("1970-01-01") + lubridate::milliseconds(num)
}
Finally, format the result (here called theDate) with
format(theDate, "%d/%m/%Y %H:%M")
If you also need the time zone information, you can use this instead:
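parse.myDate <- function(text) {
parts <- stringr::str_match(text, "^/Date\\((\\d+)([+-])(\\d{4})\\)/$")
# "Etc/GMT" zone names use an inverted sign: "Etc/GMT-1" means UTC+1, so flip it
sign <- ifelse(parts[,3] == "+", "-", "+")
# works for a single string with a whole-hour offset
as.POSIXct(as.numeric(parts[,2])/1000, origin = "1970-01-01", tz = paste0("Etc/GMT", sign, as.integer(parts[,4])/100))
}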
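For the ddmmyyyy output the question asks for, a base R sketch could look like this. It rests on an assumption the source does not confirm: that the millisecond count is in UTC and the trailing +0100 is the local offset to apply for display. parse_ms_date is a hypothetical helper name.

```r
# Hypothetical helper: "/Date(<ms><sign><hhmm>)/" -> "ddmmyyyy" in local wall-clock time
parse_ms_date <- function(text) {
  # Capture groups: milliseconds, sign, offset hours, offset minutes
  pattern <- "^/Date\\(([0-9]+)([+-])([0-9]{2})([0-9]{2})\\)/$"
  m <- regmatches(text, regexec(pattern, text))
  piece <- function(i) vapply(m, function(p) p[i], character(1))

  secs <- as.numeric(piece(2)) / 1000
  sgn <- ifelse(piece(3) == "-", -1, 1)
  offset <- sgn * (as.numeric(piece(4)) * 3600 + as.numeric(piece(5)) * 60)

  # Assumption: shifting the UTC instant by the offset gives the local wall clock
  local_clock <- as.POSIXct(secs + offset, origin = "1970-01-01", tz = "UTC")
  format(local_clock, "%d%m%Y")
}

parse_ms_date("/Date(1353020400000+0100)/")
```

Under that assumption the example string falls on 16 November 2012 local time, one hour after the 23:00 UTC instant shown earlier. Strings that do not match the pattern yield NA.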

How to get the beginning of the day in POSIXct

My day starts at 2016-03-02 00:00:00. Not 2016-03-02 00:00:01.
How do I get the beginning of the day in POSIXct in local time?
My confusion probably comes from the fact that R sees this as the end of 2016-03-01? Given that R uses ISO 8601?
For example if I try to find the beginning of the day using Sys.Date():
as.POSIXct(Sys.Date(), tz = "CET")
"2016-03-01 01:00:00 CET"
Which is not correct - but are there other ways?
I know I can hack my way out using a simple
as.POSIXct(paste(Sys.Date(), "00:00:00", sep = " "), tz = "CET")
But there has to be a more correct way to do this? Base R preferred.
It's a single command, but you want as.POSIXlt():
R> as.POSIXlt(Sys.Date())
[1] "2016-03-02 UTC"
R> format(as.POSIXlt(Sys.Date()), "%Y-%m-%d %H:%M:%S")
[1] "2016-03-02 00:00:00"
R>
It is only when the conversion to POSIXct happens that the time zone offset to UTC (six hours for me) enters:
R> as.POSIXct(Sys.Date())
[1] "2016-03-01 18:00:00 CST"
R>
Needless to say, by wrapping both you get the desired type and value:
R> as.POSIXct(as.POSIXlt(Sys.Date()))
[1] "2016-03-02 UTC"
R>
Filed under once again no need for lubridate or other non-Base R packages.
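The wrapped result above is midnight UTC. If the day should start at midnight in a particular zone such as CET, one base R sketch is to go through the date's character form, so that the calendar date is parsed directly in that zone. The fixed date and the names day and start_cet are illustrative assumptions.

```r
day <- as.Date("2016-03-02")  # stands in for Sys.Date()

# format() gives "2016-03-02"; parsing that string with tz = "CET"
# yields midnight on the CET wall clock rather than midnight UTC
start_cet <- as.POSIXct(format(day), tz = "CET")

format(start_cet, "%Y-%m-%d %H:%M:%S")
format(start_cet, "%Y-%m-%d %H:%M:%S", tz = "UTC")  # same instant seen from UTC
```

For the current day in the session's own time zone, trunc(Sys.time(), "days") is another base R option; it returns a POSIXlt object.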
Notwithstanding that you understandably prefer base R, a "smart way," for certain meaning of "smart," would be:
library(lubridate)
x <- floor_date(Sys.Date(),"day")
> format(x,"%Y-%m-%d-%H-%M-%S")
[1] "2016-03-02-00-00-00"
From ?floor_date:
floor_date takes a date-time object and rounds it down to the nearest
integer value of the specified time unit.
Pretty handy.
Your example is a bit unclear.
You are talking about a 1 minute difference for the day start, but your example shows a 1 hour difference due to the timezone.
You can try
?POSIXct
to get the functionality explained.
Using Sys.Date() within as.POSIXct() appears to override your time zone setting, because a Date is interpreted as midnight UTC.
as.POSIXct(Sys.Date(), tz="EET")
"2016-03-01 01:00:00 CET"
While entering a string gives you
as.POSIXct("2016-03-01 00:00:00", tz="EET")
"2016-03-01 EET"
It looks like 00:00:00 is actually the beginning of the day. You can conclude it from the results of the following 2 inequalities
as.POSIXct("2016-03-02 00:00:02 CET")>as.POSIXct("2016-03-02 00:00:01 CET")
TRUE
as.POSIXct("2016-03-02 00:00:01 CET")>as.POSIXct("2016-03-02 00:00:00 CET")
TRUE
So this is a time zone issue. Notice that 00:00:00 is automatically dropped when the as.POSIXct result is printed.
as.POSIXct("2016-03-02 00:00:00 CET")
"2016-03-02 CET"

Modifying timezone of a POSIXct object without changing the display

I have a POSIXct object and would like to change its tz attribute WITHOUT R reinterpreting it (reinterpreting it would mean changing how the datetime is displayed on the screen).
Some background: I am using the fasttime package from S. Urbanek, which takes strings and casts them to POSIXct very quickly. The problem is that the string should represent a datetime in "GMT", and that is not the case for my data.
I end up with a POSIXct object with tz=GMT, when in reality it is tz=GMT+1. If I change the timezone with
attr(datetime, "tzone") <- "Europe/Paris";
datetime <- .POSIXct(datetime,tz="Europe/Paris");
then it will be "displayed" as GMT+2 (the underlying value never changes).
EDIT: Here is an example
datetime=as.POSIXct("2011-01-01 12:32:23.234",tz="GMT")
attributes(datetime)
#$tzone
#[1] "GMT"
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
How can I change this attribute without R reinterpreting it, i.e. how can I change tzone and still have datetime displayed as "2011-01-01 12:32:23.233"?
EDIT/SOLUTION: @GSee's solution is reasonably fast; lubridate::force_tz is very slow.
datetime=rep(as.POSIXct("2011-01-01 12:32:23.234",tz="GMT"),1e5)
f <- function(x,tz) return(as.POSIXct(as.numeric(x), origin="1970-01-01", tz=tz))
> system.time(datetime2 <- f(datetime,"Europe/Paris"))
user system elapsed
0.01 0.00 0.02
> system.time(datetime3 <- force_tz(datetime,"Europe/Paris"))
user system elapsed
5.94 0.02 5.98
identical(datetime2,datetime3)
[1] TRUE
To change the tz attribute of a POSIXct variable it is not best practice to convert to character or numeric and then back to POSIXct. Instead you could use the force_tz function of the lubridate package
library(lubridate)
datetime2 <- force_tz(datetime, tzone = "CET")
datetime2
attributes(datetime2)
EDITED:
My previous solution was passing a character value to origin (i.e. origin="1970-01-01"). That only worked here because of a bug (#PR14973) that has now been fixed in R-devel.
origin was being coerced to POSIXct using the tz argument of the as.POSIXct call, and not "GMT" as it was documented to do. The behavior has been changed to match the documentation which, in this case, means that you have to specify your timezone for both the origin and the as.POSIXct call.
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
as.POSIXct(as.numeric(datetime), origin=as.POSIXct("1970-01-01", tz="Europe/Paris"),
tz="Europe/Paris")
#[1] "2011-01-01 12:32:23.233 CET"
This also works in older versions of R.
An alternative to the lubridate package is via conversion to and back from character type:
recastTimezone.POSIXct <- function(x, tz) return(
as.POSIXct(as.character(x), origin = as.POSIXct("1970-01-01"), tz = tz))
(Adapted from GSee's answer)
Don't know if this is efficient, but it would work for time zones with daylight savings.
Test code:
x <- as.POSIXct('2003-01-03 14:00:00', tz = 'Etc/UTC')
x
recastTimezone.POSIXct(x, tz = 'Australia/Melbourne')
Output:
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 14:00:00 AEDT" # Nothing is changed apart from the time zone.
Output if I replaced as.character() by as.numeric() (as GSee had done):
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 15:00:00 AEDT" # An hour is added.
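To make the distinction in this thread concrete, here is a small base R sketch contrasting the two operations: changing only the tzone attribute (same instant, different display) versus re-parsing the printed clock time in the new zone (same display, different instant). The variable names are illustrative, and fractional seconds are dropped for simplicity.

```r
x <- as.POSIXct("2011-01-01 12:32:23", tz = "GMT")

# Changing the attribute keeps the instant and shifts what is displayed
shown <- x
attr(shown, "tzone") <- "Europe/Paris"

# Going through the character form keeps the clock time and shifts the instant
forced <- as.POSIXct(format(x, "%Y-%m-%d %H:%M:%S"), tz = "Europe/Paris")

format(shown, "%Y-%m-%d %H:%M:%S")    # one hour later on the Paris clock
format(forced, "%Y-%m-%d %H:%M:%S")   # same clock time as x
as.numeric(x) - as.numeric(forced)    # offset in seconds between the two instants
```

The character round trip is the slower route, but it looks up the offset that applies on each date, which is why it copes with daylight saving time.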
