How to get the beginning of the day in POSIXct - r

My day starts at 2016-03-02 00:00:00. Not 2016-03-02 00:00:01.
How do I get the beginning of the day in POSIXct in local time?
My confusion probably comes from the fact that R sees this as the end-date of 2016-03-01? Given that R uses ISO 8601?
For example if I try to find the beginning of the day using Sys.Date():
as.POSIXct(Sys.Date(), tz = "CET")
"2016-03-01 01:00:00 CET"
Which is not correct - but are there other ways?
I know I can hack my way out using a simple
as.POSIXct(paste(Sys.Date(), "00:00:00", sep = " "), tz = "CET")
But there has to be a more correct way to do this? Base R preferred.

It's a single command, but you want as.POSIXlt():
R> as.POSIXlt(Sys.Date())
[1] "2016-03-02 UTC"
R> format(as.POSIXlt(Sys.Date()), "%Y-%m-%d %H:%M:%S")
[1] "2016-03-02 00:00:00"
R>
Only when converting to POSIXct does the timezone offset to UTC (six hours for me) enter:
R> as.POSIXct(Sys.Date())
[1] "2016-03-01 18:00:00 CST"
R>
Needless to say, by wrapping both you get the desired type and value:
R> as.POSIXct(as.POSIXlt(Sys.Date()))
[1] "2016-03-02 UTC"
R>
Filed under "once again, no need for lubridate or other non-base R packages".
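If midnight in a particular zone (rather than midnight UTC) is what is needed, one base-R sketch is to go through the date's text form. The helper name day_start is made up for illustration, and "CET" is simply the zone from the question.

```r
# Hypothetical helper (not part of base R): midnight at the start of a
# calendar day, interpreted in the given time zone.
day_start <- function(d, tz = "CET") {
  # format() gives "YYYY-MM-DD"; parsing that text with tz= yields 00:00:00
  # wall-clock time in that zone instead of midnight UTC.
  as.POSIXct(format(d), tz = tz)
}

x <- day_start(as.Date("2016-03-02"), tz = "CET")
format(x, "%Y-%m-%d %H:%M:%S %Z")
```

For the current day this would be called as day_start(Sys.Date(), tz = "CET").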

Notwithstanding that you understandably prefer base R, a "smart way," for a certain meaning of "smart," would be:
library(lubridate)
x <- floor_date(Sys.Date(),"day")
> format(x,"%Y-%m-%d-%H-%M-%S")
[1] "2016-03-02-00-00-00"
From ?floor_date:
floor_date takes a date-time object and rounds it down to the nearest
integer value of the specified time unit.
Pretty handy.

Your example is a bit unclear.
You are talking about a 1 minute difference for the day start, but your example shows a 1 hour difference due to the timezone.
You can try
?POSIXct
to get the functionality explained.
Using Sys.Date() within as.POSIXct() somehow overrides your timezone setting.
as.POSIXct(Sys.Date(), tz="EET")
"2016-03-01 01:00:00 CET"
While entering a string gives you
as.POSIXct("2016-03-01 00:00:00", tz="EET")
"2016-03-01 EET"
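What seems to be going on (my reading, not something stated above): a Date is a count of days since 1970-01-01, and as.POSIXct() on a Date multiplies that count by 86400, so the result is midnight UTC. The zone only changes how that instant is printed. A small check, using a fixed date instead of Sys.Date() so the result is reproducible:

```r
d <- as.Date("2016-03-02")
p <- as.POSIXct(d)

# Same instant as midnight UTC: days since the epoch times seconds per day.
as.numeric(p) == as.numeric(d) * 86400

# Viewed in CET (UTC+1 in winter) that instant reads 01:00, not 00:00.
format(p, "%Y-%m-%d %H:%M:%S", tz = "CET")
```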
It looks like 00:00:00 is actually the beginning of the day. You can conclude it from the results of the following 2 inequalities
as.POSIXct("2016-03-02 00:00:02 CET")>as.POSIXct("2016-03-02 00:00:01 CET")
TRUE
as.POSIXct("2016-03-02 00:00:01 CET")>as.POSIXct("2016-03-02 00:00:00 CET")
TRUE
So somehow this is a timezone issue. Notice that 00:00:00 is automatically removed from the as.POSIXct result.
as.POSIXct("2016-03-02 00:00:00 CET")
"2016-03-02 CET"
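The missing 00:00:00 is only a printing convention: when every value is exactly midnight, the default format leaves the time part out. An explicit format string brings it back. In this sketch the zone is passed through tz=, since a zone name at the end of the string is not parsed by the default formats.

```r
x <- as.POSIXct("2016-03-02 00:00:00", tz = "CET")

format(x)                        # default: time part omitted at midnight
format(x, "%Y-%m-%d %H:%M:%S")   # explicit format keeps 00:00:00
```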

Related

NA for 1 particular date when converting dates from "character" format to "POSIXct" with as.POSIXct

I'm converting a string vector to date format with as.POSIXct().
Here is the strange thing:
as.POSIXct("2017-03-26 03:00:00.000",format="%Y-%m-%d %H")
#Gives
"2017-03-26 03:00:00 CEST"
#While
as.POSIXct("2017-03-26 02:00:00.000",format="%Y-%m-%d %H")
#Outputs
NA
This is really confusing and frustrating. It seems like the function really doesn't like the specific time:
02:00:00.000
We can specify %T for the time. The string contains minutes, seconds and milliseconds, but the %H in the format only matches the hour part.
as.POSIXct("2017-03-26 02:00:00.000",format="%Y-%m-%d %T")
[1] "2017-03-26 02:00:00 EDT"
Or to take care of the milliseconds as well
as.POSIXct("2017-03-26 02:00:00.000",format="%Y-%m-%d %H:%M:%OS")
#[1] "2017-03-26 02:00:00 EDT"
Or using lubridate
library(lubridate)
ymd_hms("2017-03-26 02:00:00.000")
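To see what the shorter format actually discards, here is a small sketch. It parses in UTC on purpose, so that daylight saving cannot interfere, and uses a made-up time with non-zero minutes and seconds.

```r
s <- "2017-03-26 02:34:56.789"

# Format stops at the hour: the rest of the string is ignored.
a <- as.POSIXct(s, format = "%Y-%m-%d %H", tz = "UTC")
# Format covers minutes and (fractional) seconds as well.
b <- as.POSIXct(s, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

format(a, "%H:%M:%S")
format(b, "%H:%M:%S")
as.numeric(b) - as.numeric(a)   # seconds that the short format dropped
```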
This was a daylight saving issue: the time "2017-03-26 02:00:00.000" does not exist in Sweden, as we lost an hour on this date when changing to "summer time".
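The gap can be made visible without parsing a nonexistent time at all (what R returns for such a time may depend on the platform). A sketch, assuming the "Europe/Stockholm" zone is available in the local time zone database:

```r
# Last second before the spring-forward transition in Sweden.
x <- as.POSIXct("2017-03-26 01:59:59", tz = "Europe/Stockholm")

format(x,     "%Y-%m-%d %H:%M:%S %Z")
format(x + 1, "%Y-%m-%d %H:%M:%S %Z")   # one second later the clock reads 03:00:00
```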

lubridate: Parsing dates of the form '27.10.2013 02A:00' (daylight savings time)

I am trying to parse strings of the form 25.10.2013 17:30 (the timezone is CET/CEST but this is not specified in the strings themselves) as POSIXct using lubridate's dmy_hm(..., tz = 'Europe/Brussels') function.
I have the problem that after parsing, there are duplicate values on the day CEST switches to CET (the clock jumps one hour back). The cause seems to be the way this shift is indicated in my data: 02A:00 for 2 o'clock CEST and 02B:00 for 2 o'clock CET, which is one hour later. dmy_hm(..., tz = 'Europe/Brussels') interprets both as CET.
Minimal working example:
> library(lubridate)
> times = c("27.10.2013 01:00", "27.10.2013 02A:00",
"27.10.2013 02B:00", "27.10.2013 03:00")
> times = dmy_hm(times, tz = "Europe/Brussels")
> times
[1] "2013-10-27 01:00:00 CEST" "2013-10-27 02:00:00 CET"
[3] "2013-10-27 02:00:00 CET" "2013-10-27 03:00:00 CET"
My question is: What would be the best way to fix the "wrong" dates?
I tried to use which(duplicated(times)) to find the indices of the duplicate values and remove one hour from the "wrong" values, however there seems to be another problem:
> times[2] - hours(1)
[1] "2013-10-27 01:00:00 CEST"
Why does removing one hour from '"2013-10-27 02:00:00 CET"' bring me to '"2013-10-27 01:00:00 CEST"'? Isn't that a jump of two hours? I would expect to land at '"2013-10-27 02:00:00 CEST"'.
EDIT: The last part is a known issue (see https://github.com/tidyverse/lubridate/issues/498). The solution is to use dhours() instead of hours().
> times[2] - dhours(1)
[1] "2013-10-27 02:00:00 CEST"
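For the original question (making use of the A/B markers), one possible base-R sketch is to strip the marker, parse, and then shift by an hour whenever the zone R picked disagrees with the marker. The helper name parse_ab is made up, and the hard-coded "CET"/"CEST" abbreviations are an assumption that only fits this zone.

```r
# Hypothetical helper: parse "dd.mm.yyyy HH[A|B]:MM" strings, where A marks
# the first (CEST) and B the second (CET) pass through the repeated hour.
parse_ab <- function(s, tz = "Europe/Brussels") {
  marker <- ifelse(grepl("[0-9]A:", s), "A",
            ifelse(grepl("[0-9]B:", s), "B", ""))
  clean <- sub("([0-9]{2})[AB]:", "\\1:", s)
  x <- as.POSIXct(clean, format = "%d.%m.%Y %H:%M", tz = tz)
  abbr <- format(x, "%Z")
  # Whichever reading was picked for the ambiguous hour, shift it by one
  # hour if it disagrees with the marker.
  shift <- ifelse(marker == "A" & abbr == "CET", -3600,
           ifelse(marker == "B" & abbr == "CEST", 3600, 0))
  x + shift
}

times <- c("27.10.2013 01:00", "27.10.2013 02A:00",
           "27.10.2013 02B:00", "27.10.2013 03:00")
res <- parse_ab(times)
format(res, "%Y-%m-%d %H:%M:%S %Z")
diff(as.numeric(res))   # spacing between consecutive entries, in seconds
```

Adding or subtracting plain seconds on a POSIXct moves along the actual timeline, which is why no wall-clock surprises appear here.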

How to drop minutes in R?

I have a DateTime object in R.
tempDateTime<-as.POSIXct("2017-07-13 01:40:00 MDT")
class(tempDateTime)
[1] "POSIXct" "POSIXt"
I would like to drop the minutes from the DateTime object. ie have "2017-07-13 01:00:00 MDT"
Is there a simple way to do this?
In Base R
trunc(tempDateTime, units = "hours")
# "2017-07-13 01:00:00 AEST"
This works because trunc(), like round(), has a method in base R that handles POSIXt objects.
From ?round.POSIXt
Round or truncate date-time objects.
As @Thelatemail points out, this returns a POSIXlt object, so you may want to wrap the result in as.POSIXct() again.
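A short sketch of that wrapping. The zone is given explicitly through tz = "America/Denver", which is my assumption for the "MDT" in the question; the abbreviation inside the string is not parsed.

```r
x <- as.POSIXct("2017-07-13 01:40:00", tz = "America/Denver")

lt <- trunc(x, units = "hours")   # POSIXlt
ct <- as.POSIXct(lt)              # back to POSIXct, zone preserved

class(lt)
class(ct)
format(ct, "%Y-%m-%d %H:%M:%S %Z")
```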
Another note, POSIXct is an object that stores the number of seconds since "1970-01-01 00:00:00" (the Unix epoch).
as.numeric(tempDateTime)
# 1499874000
So the manual way to round-down the hours would be
as.POSIXct(floor(as.numeric(tempDateTime) / 3600) * 3600, origin = "1970-01-01")
Try this:
library(lubridate)
> floor_date(tempDateTime, "hour")
[1] "2017-07-13 01:00:00 PDT"

R POSIXct returns NA with "03/12/2017 02:17:13"

I have a data set containing the following date, along with several others
03/12/2017 02:17:13
I want to put the whole data set into a data table, so I used read_csv and as.data.table to create DT which contained the date/time information in date.
Next I used
DT[, date := as.POSIXct(date, format = "%m/%d/%Y %H:%M:%S")]
Everything looked fine except I had some NA values where the original data had dates. The following expression returns an NA
as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S")
The question is why and how to fix.
Just use functions anytime() or utctime() from package anytime
R> library(anytime)
R> anytime("03/12/2017 02:17:13")
[1] "2017-03-12 01:17:13 CST"
R>
or
R> utctime("03/12/2017 02:17:13")
[1] "2017-03-11 20:17:13 CST"
R>
The real crux is that this time did not exist in North America due to DST. You could parse it as UTC, as UTC does not observe daylight saving:
R> utctime("03/12/2017 02:17:13", tz="UTC")
[1] "2017-03-12 02:17:13 UTC"
R>
You can express that UTC time as Mountain time, but it gets you the previous day:
R> utctime("03/12/2017 02:17:13", tz="America/Denver")
[1] "2017-03-11 19:17:13 MST"
R>
Ultimately, you (as the analyst) have to decide what was actually measured. UTC would make sense; the others may need adjustment.
My solution is below but ways to improve appreciated.
The explanation for the NA is that in the mountain time zone in the US, that date and time is in the window of the switch to daylight saving, where the time doesn't exist, hence NA. While the time zone is not explicitly specified, I guess R must be picking it up from the computer's time zone, which is "America/Denver".
The solution is to explicitly state the date/time string is in UTC and then convert back as follows:
time.utc <- as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
> time.utc
[1] "2017-03-12 02:17:13 UTC"
>
Next, add 6 hours to the UTC time, which is the difference between UTC and MDT (MST is 7 hours behind UTC)
time.utc2 <- time.utc + 6 * 60 * 60
> time.utc2
[1] "2017-03-12 08:17:13 UTC"
>
Now convert to America/Denver time using daylight savings.
time.mdt <- format(time.utc2, usetz = TRUE, tz = "America/Denver")
> time.mdt
[1] "2017-03-12 01:17:13 MST"
>
Note that this is in standard time, because daylight savings doesn't start until 2 am.
If you change the original string from 2 am to 3 am, you get the following
> time.mdt
[1] "2017-03-12 03:17:13 MDT"
>
The hour between 2 and 3 is lost in the change from standard to daylight savings but the data are now correct.
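The lost hour can be checked directly by looking at two instants one second apart on the UTC timeline and printing them in Denver time. This is only an illustration of the transition, not a replacement for the conversion above.

```r
# Two instants one second apart, straddling the 2017 spring-forward in Denver.
before <- as.POSIXct("2017-03-12 08:59:59", tz = "UTC")
after  <- before + 1

format(before, "%Y-%m-%d %H:%M:%S %Z", tz = "America/Denver")
format(after,  "%Y-%m-%d %H:%M:%S %Z", tz = "America/Denver")
```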

Modifying timezone of a POSIXct object without changing the display

I have a POSIXct object and would like to change its tz attribute WITHOUT R reinterpreting it (reinterpreting it would mean changing how the datetime is displayed on the screen).
Some background: I am using the fasttime package from S. Urbanek, which takes strings and casts them to POSIXct very quickly. The problem is that the strings should represent datetimes in "GMT", and that is not the case for my data.
I end up with a POSIXct object with tz=GMT when in reality it is tz=GMT+1. If I change the timezone with
attr(datetime, "tzone") <- "Europe/Paris";
datetime <- .POSIXct(datetime,tz="Europe/Paris");
then it will be "displayed" as GMT+2 (the underlying value never changes).
EDIT: Here is an example
datetime=as.POSIXct("2011-01-01 12:32:23.234",tz="GMT")
attributes(datetime)
#$tzone
#[1] "GMT"
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
How can I change this attribute without R reinterpreting it, i.e. how can I change tzone and still have datetime displayed as "2011-01-01 12:32:23.233"?
EDIT/SOLUTION: @GSee's solution is reasonably fast, lubridate::force_tz is very slow
datetime=rep(as.POSIXct("2011-01-01 12:32:23.234",tz="GMT"),1e5)
f <- function(x,tz) return(as.POSIXct(as.numeric(x), origin="1970-01-01", tz=tz))
> system.time(datetime2 <- f(datetime,"Europe/Paris"))
user system elapsed
0.01 0.00 0.02
> system.time(datetime3 <- force_tz(datetime,"Europe/Paris"))
user system elapsed
5.94 0.02 5.98
identical(datetime2,datetime3)
[1] TRUE
To change the tz attribute of a POSIXct variable it is not best practice to convert to character or numeric and then back to POSIXct. Instead you could use the force_tz function of the lubridate package
library(lubridate)
datetime2 <- force_tz(datetime, tzone = "CET")
datetime2
attributes(datetime2)
EDITED:
My previous solution was passing a character value to origin (i.e. origin="1970-01-01"). That only worked here because of a bug (#PR14973) that has now been fixed in R-devel.
origin was being coerced to POSIXct using the tz argument of the as.POSIXct call, and not "GMT" as it was documented to do. The behavior has been changed to match the documentation which, in this case, means that you have to specify your timezone for both the origin and the as.POSIXct call.
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
as.POSIXct(as.numeric(datetime), origin=as.POSIXct("1970-01-01", tz="Europe/Paris"),
tz="Europe/Paris")
#[1] "2011-01-01 12:32:23.233 CET"
This will also work in older versions of R.
An alternative to the lubridate package is via conversion to and back from character type:
recastTimezone.POSIXct <- function(x, tz) return(
as.POSIXct(as.character(x), origin = as.POSIXct("1970-01-01"), tz = tz))
(Adapted from GSee's answer)
Don't know if this is efficient, but it would work for time zones with daylight savings.
Test code:
x <- as.POSIXct('2003-01-03 14:00:00', tz = 'Etc/UTC')
x
recastTimezone.POSIXct(x, tz = 'Australia/Melbourne')
Output:
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 14:00:00 AEDT" # Nothing is changed apart from the time zone.
Output if I replaced as.character() by as.numeric() (as GSee had done):
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 15:00:00 AEDT" # An hour is added.
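The two operations discussed in this thread can be put side by side: changing the tzone attribute keeps the instant and changes the display, while going through text keeps the display and changes the instant. A minimal base-R sketch, with whole seconds only (fractional seconds would be dropped by this format string):

```r
x <- as.POSIXct("2011-01-01 12:32:23", tz = "GMT")

# Same instant, different display: only the attribute changes.
y <- x
attr(y, "tzone") <- "Europe/Paris"
format(y, "%H:%M:%S %Z")

# Same display, different instant: re-parse the wall-clock text in the new zone.
z <- as.POSIXct(format(x, "%Y-%m-%d %H:%M:%S"), tz = "Europe/Paris")
format(z, "%H:%M:%S %Z")

as.numeric(x) - as.numeric(y)   # 0: nothing moved
as.numeric(x) - as.numeric(z)   # offset of the new zone, in seconds
```

Wall-clock times that do not exist or occur twice in the target zone (around daylight saving switches) would still need separate handling.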
