In the R package lubridate, I can easily create a date with the following syntax:
> mdy("5/4/2015")
As expected, it produces the following result:
[1] "2015-05-04 UTC"
However, if I try to add that very value to an array, it seems to change from UTC to my local time (EDT):
> c(mdy("5/4/2015"))
[1] "2015-05-03 20:00:00 EDT"
Since I don't care about times this wouldn't affect me much except that this results in the date shifting back by 1, as follows:
> day(mdy("5/4/2015"))
[1] 4
> day(c(mdy("5/4/2015")))
[1] 3
To me, the act of adding something to an array should not change the value of that something. Am I missing something here, and is there a way to resolve this issue?
That's because lubridate::mdy assumes UTC. When you wrap it in c(), the result is displayed in your local timezone (EDT) because c() does not pass on the timezone attribute:
> attr(mdy("5/4/2015", tz = "EDT"), "tzone")
# [1] "EDT"
> attr(c(mdy("5/4/2015", tz = "EDT")), "tzone")
# NULL
You can do:
Sys.setenv(TZ = "UTC")
To set your local timezone to UTC.
Alternatively, you can specify the timezone explicitly in mdy():
mdy("5/4/2015", tz = "UTC")
Apart from Steven's solution, you could also store your dates in a list
list(mdy("5/4/2015"))[[1]]
#[1] "2015-05-04 UTC"
This won't remove the timezone and you don't have to mess around with environment variables.
I agree with you: If you look at c as some form of constructor for a "vector" and you come from a C++ or similar background, the removal of attributes (except for names) certainly seems strange.
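If the combined vector really is meant to be read in UTC, another option is to put the attribute back after calling c(). This is only a minimal sketch: it uses base R's as.POSIXct() as a stand-in for the mdy() result, and newer R versions may already keep the attribute when all elements share it, in which case the assignment is harmless.

```r
# Base-R stand-in for mdy("5/4/2015"): midnight UTC on 2015-05-04.
d <- as.POSIXct("2015-05-04", tz = "UTC")

# Combine the values, then set the time zone attribute explicitly.
combined <- c(d, d)
attr(combined, "tzone") <- "UTC"

# The calendar day is now read in UTC, whatever the local time zone is.
format(combined, "%Y-%m-%d")
```

The underlying numbers are never changed by this; only the zone used for printing and for extracting the day is.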
I have a Time column in my df with value 1.01.2016 0:00:05. I want it without the seconds and therefore used df$Time <- as.POSIXct(df$Time, format = "%d.%m.%Y :%H:%M", tz = "Asia/Kolkata"). But I get NA value. What is the problem here?
I suspect there are two things working here: the storage of a time object (POSIXt), and the representation of that object.
The string you present is (I believe) not a proper POSIXt (whether POSIXct or POSIXlt) object for R, which means it is just a character string. In that case, you can remove the seconds with:
gsub(':[^:]*$', '', '1.01.2016 0:00:05')
# [1] "1.01.2016 0:00"
However, that is still just a string, not a date or time object. If you parse it into a time-object that R knows about:
as.POSIXct("1.01.2016 0:00:05", format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
# [1] "2016-01-01 00:00:05 IST"
then you now have a time object that R knows something about ... and it defaults to representing it (printing it on the console) with seconds-precision. Typically, all that is available to change for the console-printing is the precision of the seconds, as in
options("digits.secs")
# $digits.secs
# NULL
Sys.time()
# [1] "2018-06-26 18:21:06 PDT"
options("digits.secs"=3)
Sys.time()
# [1] "2018-06-26 18:21:10.090 PDT"
then you can get more. But alas, I do not think there is an R option to say "always print my POSIXt objects in this way". So your only choice is (at the point where you no longer need it to be a time-like object) to change it back into a string with no time-like value:
x <- as.POSIXct("1.01.2016 0:00:05", format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
x
# [1] "2016-01-01 00:00:05 IST"
?strptime
# see that day-of-month can either be "%d" for 01-31 or "%e" for 1-31
format(x, format="%e.%m.%Y %H:%M")
# [1] " 1.01.2016 00:00"
(This works equally well for a vector.)
I lean towards converting to POSIXt and back to a string rather than my gsub example, because as.POSIXct will tell you (by returning NA) when the string does not match the date-time format you are expecting, whereas gsub will happily do something wrong or nothing.
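To make that last point concrete, here is a small sketch of the round trip. The first call reuses the format string from the question (with its stray ":"), the second uses a format that matches the string. The variable names are only illustrative.

```r
raw <- "1.01.2016 0:00:05"
tz  <- "Asia/Kolkata"

# A format that does not match the string (note the stray ':') yields NA
# instead of a silently wrong value.
bad <- as.POSIXct(raw, format = "%d.%m.%Y :%H:%M", tz = tz)
is.na(bad)

# A matching format parses; format() then writes it back without seconds.
ok      <- as.POSIXct(raw, format = "%d.%m.%Y %H:%M:%S", tz = tz)
trimmed <- format(ok, format = "%d.%m.%Y %H:%M")
trimmed
```

Checking is.na() on the parsed column is a cheap way to find rows that do not follow the expected layout before you format them back to strings.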
Try as.POSIXlt:
> test <- "1.01.2016 0:00:05"
> as.POSIXlt(test, "%d.%m.%Y %H:%M:%S", tz="Asia/Kolkata")
[1] "2016-01-01 00:00:05 IST"
The date in my dataset is like this: 20130501000000 and I'm trying to convert this to a better datetime format in R
data1$date <- as.Date(data1$date, format = "%Y-%m-%s-%h-%m-%s")
However, I get an error for needing an origin. After I put the very first cell under date in as origin, it converts every cell under date to N/A. Is this right or should I try as.POSIXct()?
That is a somewhat degenerate format, but the anytime() and anydate() functions of the anytime package can help you, without requiring any explicit format strings:
R> anytime("20130501000000") ## returns POSIXct
[1] "2013-05-01 CDT"
R> anydate("20130501000000") ## returns Date
[1] "2013-05-01"
R>
Note that we parse from the character representation here -- parsing from numeric would be wrong, as we use a conflicting heuristic to make sense of dates stored as numeric values.
So here your code would just become
data1$date <- anytime::anydate(data1$date)
provided data1$date is in character, else wrap one as.character() around it.
Lastly, if you actually want Datetime rather than Date (as per your title), don't use anydate() but anytime().
Before I write my answer, I would like to say that the format argument should be the format that your string is in. Therefore, if you have "20130501000000", you have to use (you don't have - between each component of your date in the string format):
as.Date("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01"
which works just fine, does not produce any error, and will return an object of class Date:
as.Date("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "Date"
Therefore, I think your issue is with the format string rather than with the origin of the date.
Now to my detailed answer:
As far as I understand, as.Date() converts the string to a Date and drops the time, so if you want the time part of the string as well, you have to use as.POSIXct():
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01 EEST"
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "POSIXct" "POSIXt"
Note that the timezone is EEST, which is my local timezone. If you want a different timezone, you have to specify it. For example, to set the timezone to UTC:
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S", tz = "UTC")
# [1] "2013-05-01 UTC"
Using as.POSIXct(), you can do arithmetic with the object:
times <- c("20130501000000",
"20130501035001") # added 03:50:01 to the first element
class(times)
# [1] "character"
times <- as.POSIXct(times, format = "%Y%m%d%H%M%S", tz = "UTC")
class(times)
# [1] "POSIXct" "POSIXt"
times[2] - times[1]
# Time difference of 3.833611 hours
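The subtraction above lets R pick the unit for you. If you need a fixed unit or a plain number, difftime() with an explicit units argument is one way to get it. This is just a sketch reusing the two example timestamps.

```r
times <- as.POSIXct(c("20130501000000", "20130501035001"),
                    format = "%Y%m%d%H%M%S", tz = "UTC")

# Ask for the difference in a fixed unit instead of letting R choose.
gap <- difftime(times[2], times[1], units = "secs")

# Strip the difftime class to get plain numbers for further arithmetic.
gap_secs <- as.numeric(gap)
gap_mins <- as.numeric(gap, units = "mins")
gap_secs
gap_mins
```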
I defined a date with a timezone, but when I print it out using the scales package date_format it gives me the time in UTC, which is awkward for my purpose.
> library(scales)
> st <- as.POSIXct("2015-10-31 00:00:00",tz="US/Pacific")
> st
[1] "2015-10-31 PDT"
> fmt <- date_format("%Y-%m-%d %R %Z")
> fmt(st)
[1] "2015-10-31 07:00 UTC"
Interestingly this works (so POSIXct seems to understand the timezone) - but does not give me enough control over the format:
> format(st,usetz=T)
[1] "2015-10-31 PDT"
This unreliability is hinted at in the help for ?date_format:
When %z or %Z is used for output with an object with an assigned time
zone an attempt is made to use the values for that time zone — but it
is not guaranteed to succeed.
So my question is, how do I make it succeed?
Suggesting workarounds is fine and may attract upvotes, but please understand the point of this question is that I want to obtain insight as to what is going on with date_format.
The definition of date_format is very short:
function (format = "%Y-%m-%d", tz = "UTC")
{
function(x) format(x, format, tz = tz)
}
It should be obvious why the timezone is changed if you don't change the default.
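To see the effect without installing anything, here is a base-R sketch of a formatter with the same shape as the definition quoted above. my_date_format is a hypothetical stand-in, not the scales function itself, and "America/Los_Angeles" is used as the tz database name for the same zone as "US/Pacific".

```r
# Hypothetical stand-in mirroring the quoted definition: it returns a
# closure that always formats in the zone given by `tz`.
my_date_format <- function(fmt = "%Y-%m-%d", tz = "UTC") {
  function(x) format(x, fmt, tz = tz)
}

st <- as.POSIXct("2015-10-31 00:00:00", tz = "America/Los_Angeles")

# Default tz = "UTC": the object's own time zone is ignored.
utc_fmt <- my_date_format("%Y-%m-%d %H:%M")
# Passing the zone explicitly keeps the local wall-clock time.
pac_fmt <- my_date_format("%Y-%m-%d %H:%M", tz = "America/Los_Angeles")

utc_fmt(st)
pac_fmt(st)
```

Given the definition shown, passing tz to date_format itself should behave the same way as pac_fmt here.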
I have the following piece of R-code:
formatString = "%Y-%m-%d %H:%M:%OS"
x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString))
y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString))
I have the problem that when I do as.Date(y) then I get 2015-07-16 (although its date is one day later!). Apparently the problem is the timezone. So I checked the timezones:
> x
[1] "2013-11-23 23:10:38 CET"
> y
[1] "2015-07-17 01:43:38 CEST"
>
Ok, so they deviate in their timezone. This is weird, because why does R decide that one timestamp (given without any timezone at all) lies in a different timezone than another (given without any timezone at all)?
Ok, so let's set the timezone. Googling revealed that attr(y, "tzone") <- "CET" should do the deal. Let's try this:
> attr(y, "tzone") <- "CET"
> y
[1] "2015-07-17 01:43:38 CEST"
>
Ok, that did not work. Let's see what the timezone actually is in the beginning:
> formatString = "%Y-%m-%d %H:%M:%OS"
> x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString))
> y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString))
> unclass(x)
[1] 1385244638
attr(,"tzone")
[1] ""
> unclass(y)
[1] 1437090218
attr(,"tzone")
[1] ""
>
So... they don't have a timezone at all but their timezones are different????
--> here are my natural questions:
1) why are they initialized with a different timezone when I dont specify a timezone at all?
2) why do both objects apparently not have a timezone and at the same time... how come they have different timezones?
3) How can I make as.Date(y) == "2015-07-17" true? I.e. how can I set both to the current timezone? Sys.timezone() results in 'NA'... (EDIT: my timezone [Germany] seems to be "CET" --> how can I set both to CET?)
I'm scratching my head here... Thanks for any thoughts on this you share with me :-)
FW
If you don't specify a timezone then R will use your system's locale as POSIXct objects must have a timezone. The difference between CEST and CET is that one is summertime and one is not. That means if you define a date during the part of the year defined as summertime then R will decide to use the summertime version of the timezone. If you want to set dates that don't use summertime versions then define them as GMT from the beginning.
formatString = "%Y-%m-%d %H:%M:%OS"
x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString), tz="GMT")
y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString), tz="GMT")
If you want to truncate out the time, be careful with as.Date on a POSIXct object: the conversion is done in the timezone given by its tz argument (UTC by default, see ?as.Date), which is not necessarily the timezone the object is displayed in, so the date can shift. If you want to truncate POSIXct objects with base R then you'll have to wrap either round or trunc in as.POSIXct, but I would recommend checking out the lubridate package for dealing with dates and times (specifically POSIXct objects).
If you want to keep CET but never use CEST you can use a location that doesn't observe daylight savings. According to http://www.timeanddate.com/time/zones/cet your only options are Algeria and Tunisia. According to https://en.wikipedia.org/wiki/List_of_tz_database_time_zones the valid tz would be "Africa/Algiers". Therefore you could do
formatString = "%Y-%m-%d %H:%M:%OS"
x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString), tz="Africa/Algiers")
y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString), tz="Africa/Algiers")
and both x and y would be in CET.
One more thing about setting timezones. If you tell R you want a generic timezone then it won't override daylight savings settings. That's why setting attr(y, "tzone") <- "CET" didn't have the desired result. If you did attr(y, "tzone") <- "Africa/Algiers" then it would have worked as you expected. Do be careful with conversions though because when you change the timezone it will change the time to account for the new timezone. The package lubridate has the function force_tz which changes the timezone without changing the time for cases where the initial timezone setting was wrong but the time was right.
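The warning about conversions can be checked with a short base-R sketch. It builds y directly in "Europe/Berlin" (an assumption, so the example does not depend on the system locale) and then changes only the tzone attribute.

```r
y <- as.POSIXct("2015-07-17 01:43:38", tz = "Europe/Berlin")

# Changing the attribute keeps the instant but changes how it is displayed.
z <- y
attr(z, "tzone") <- "Africa/Algiers"

same_instant <- as.numeric(y) == as.numeric(z)
berlin_clock  <- format(y, "%Y-%m-%d %H:%M:%S")
algiers_clock <- format(z, "%Y-%m-%d %H:%M:%S")

same_instant
berlin_clock
algiers_clock
```

The stored number is identical, but the clock time moves back by one hour because Algiers stays on UTC+1 while Berlin is on summer time in July.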
Complementary answer:
1) Just use the right timezone throughout from the beginning. Since I live in Hamburg, Germany, the right timezone for me is "Europe/Berlin", see this list as said by Dean.
2) For extracting information from POSIXct, for example, the date, I use
as.Date(format(timeStamp, "%Y-%m-%d"))
which is slow but seems to give the correct answer... plus I don't have to install new packages [which is a bit complicated in my situation].
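A possibly faster alternative to the format() round trip, also without extra packages, is to pass the timezone to as.Date() directly. This sketch assumes the timestamps are meant to be read in "Europe/Berlin" and builds the example value in that zone explicitly.

```r
y <- as.POSIXct("2015-07-17 01:43:38", tz = "Europe/Berlin")

# The date depends on the zone in which the instant is read.
date_utc    <- as.Date(y, tz = "UTC")
date_berlin <- as.Date(y, tz = "Europe/Berlin")

date_utc
date_berlin
```

Read in UTC the instant still falls on the previous evening, which is where the one-day shift in the question comes from.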
I ran into the same issue and found your Question here.
While all the given answers are valid and get an upvote from me, I'd like to share another - not too elegant - but working solution:
Whenever you want to transform from class 'Date' to 'POSIXct' or vice versa, use as.character() before you convert:
x = as.POSIXct("2022-01-01")
y = as.POSIXct("2022-06-01")
x_Date <- as.Date(x)
x_POSIXct_again <- as.POSIXct(x_Date)
identical(x, x_POSIXct_again)
# FALSE!
y_Date <- as.Date(y)
y_POSIXct_again <- as.POSIXct(y_Date)
identical(y, y_POSIXct_again)
# FALSE!
x_Date <- as.Date(as.character(x))
x_POSIXct_again <- as.POSIXct(as.character(x_Date))
identical(x, x_POSIXct_again)
# TRUE!
y_Date <- as.Date(as.character(y))
y_POSIXct_again <- as.POSIXct(as.character(y_Date))
identical(y, y_POSIXct_again)
# TRUE!
# little helpers
as_Date2 <- function(x, ...) {
if("POSIXct" %in% class(x)) x <- as.character(x)
as.Date(x, ...)
}
as_POSIXct2 <- function(x, ...) {
if("Date" %in% class(x)) x <- as.character(x)
as.POSIXct(x, ...)
}
Obviously, time information gets lost when converting from POSIXct to Date. But no more day-switching at last.
I have a POSIXct object and would like to change its tz attribute WITHOUT R reinterpreting it (reinterpreting it would mean changing how the datetime is displayed on the screen).
Some background: I am using the fasttime package from S. Urbanek, which takes strings and casts them to POSIXct very quickly. The problem is that the strings are assumed to represent datetimes in "GMT", and that is not the case for my data.
I end up with a POSIXct object with tz=GMT, in reality it is tz=GMT+1, if I change the timezone with
attr(datetime, "tzone") <- "Europe/Paris";
datetime <- .POSIXct(datetime,tz="Europe/Paris");
then it will be "displayed" as GMT+2 (the underlying value never change).
EDIT: Here is an example
datetime=as.POSIXct("2011-01-01 12:32:23.234",tz="GMT")
attributes(datetime)
#$tzone
#[1] "GMT"
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
How can I change this attribute without R reinterpreting it, i.e. how can I change tzone and still have datetime displayed as "2011-01-01 12:32:23.233"?
EDIT/SOLUTION: @GSee's solution is reasonably fast; lubridate::force_tz is very slow
datetime=rep(as.POSIXct("2011-01-01 12:32:23.234",tz="GMT"),1e5)
f <- function(x,tz) return(as.POSIXct(as.numeric(x), origin="1970-01-01", tz=tz))
> system.time(datetime2 <- f(datetime,"Europe/Paris"))
user system elapsed
0.01 0.00 0.02
> system.time(datetime3 <- force_tz(datetime,"Europe/Paris"))
user system elapsed
5.94 0.02 5.98
identical(datetime2,datetime3)
[1] TRUE
To change the tz attribute of a POSIXct variable it is not best practice to convert to character or numeric and then back to POSIXct. Instead you could use the force_tz function of the lubridate package
library(lubridate)
datetime2 <- force_tz(datetime, tzone = "CET")
datetime2
attributes(datetime2)
EDITED:
My previous solution was passing a character value to origin (i.e. origin="1970-01-01"). That only worked here because of a bug (#PR14973) that has now been fixed in R-devel.
origin was being coerced to POSIXct using the tz argument of the as.POSIXct call, and not "GMT" as it was documented to do. The behavior has been changed to match the documentation which, in this case, means that you have to specify your timezone for both the origin and the as.POSIXct call.
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
as.POSIXct(as.numeric(datetime), origin=as.POSIXct("1970-01-01", tz="Europe/Paris"),
tz="Europe/Paris")
#[1] "2011-01-01 12:32:23.233 CET"
This will also work in older versions of R.
An alternative to the lubridate package is via conversion to and back from character type:
recastTimezone.POSIXct <- function(x, tz) return(
as.POSIXct(as.character(x), origin = as.POSIXct("1970-01-01"), tz = tz))
(Adapted from GSee's answer)
Don't know if this is efficient, but it would work for time zones with daylight savings.
Test code:
x <- as.POSIXct('2003-01-03 14:00:00', tz = 'Etc/UTC')
x
recastTimezone.POSIXct(x, tz = 'Australia/Melbourne')
Output:
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 14:00:00 AEDT" # Nothing is changed apart from the time zone.
Output if I replaced as.character() by as.numeric() (as GSee had done):
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 15:00:00 AEDT" # An hour is added.
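For completeness, a small base-R sketch of what the character round trip does to the underlying value. It spells out the format string explicitly instead of relying on as.character(), and the variable names are only illustrative.

```r
x <- as.POSIXct("2003-01-03 14:00:00", tz = "Etc/UTC")

# Write the wall-clock time out as text, then read it back in the new zone.
wall   <- format(x, "%Y-%m-%d %H:%M:%S")
recast <- as.POSIXct(wall, tz = "Australia/Melbourne")

# Same wall-clock reading in both objects ...
clock_x      <- format(x, "%H:%M:%S")
clock_recast <- format(recast, "%H:%M:%S")

# ... but the instant has moved by the zone's offset on that date.
shift_hours <- as.numeric(difftime(recast, x, units = "hours"))
shift_hours
```

So unlike setting the tzone attribute, this really does change the stored number. Sub-second precision is dropped by the format string used here.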