Is there a way to set default origin for as.Date?
I wrote this function as a workaround:
as.date=function(x, origin='1970-01-01') as.Date(x, origin=origin)
For example:
as.Date(0)
Error in as.Date.numeric(0) : 'origin' must be supplied
as.date(0)
[1] "1970-01-01"
The zoo package adds a default origin:
library(zoo)
as.Date(0)
## [1] "1970-01-01"
Update
This is several years later but it looks like R has added .Date so we can now do this using only base R.
.Date(0)
## [1] "1970-01-01"
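As a rough check of the update above, here is a minimal base-R sketch comparing .Date() with as.Date() called with an explicit origin. The variable names are mine and purely illustrative; the only assumption is that the numbers are days since 1970-01-01.

```r
# Days since the Unix epoch, as plain numbers
days <- c(0, 1, 365)

# Base R only: attach the Date class directly to the numbers
via_dot_date <- .Date(days)

# The explicit-origin form of the same conversion
via_origin <- as.Date(days, origin = "1970-01-01")

format(via_dot_date)
all(via_dot_date == via_origin)
```

Both routes should give the same dates, so .Date() is convenient when the origin is known to be 1970-01-01; for any other origin the explicit form is still needed.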
There is an elegant and simple solution that, like zoo, needs no origin but allows for some tweaking if you need it:
require(anytime)
The base is simply:
anytime(0)
which returns, for me in Eastern Standard Time:
[1] "1969-12-31 19:00:00 EST"
If you want to be able to force it to the UTC center of the temporal universe:
anytime(0, asUTC=TRUE)
which returns
[1] "1970-01-01 UTC"
And if you want to tell R that your data is from a given time zone:
Sys.setenv(TZ = 'desiredTimeZone'), using the value of anytime:::getTZ() as your desired time zone if that is the one in which your dates were gathered.
Any of the answers will work; this one just gives you control over the integer (or string) of numerals as well as the time zone, so it is pretty universally useful if you are working with data gathered remotely.
The lubridate package was made specifically to make working with dates easier:
library(lubridate)
as_date(0)
#[1] "1970-01-01"
Not really. There is no way the origin date can be changed and remain applicable forever in a session.
If you look at the parameters of as.Date, origin does not have a default value when x is numeric.
## S3 method for class 'numeric'
as.Date(x, origin, ...)
Perhaps, it would have been a good extension to as.Date function to provide default value for origin.
OP has done the right thing by creating a wrapper function to remove the dependency on origin. Perhaps the function can be improved slightly like:
Modified function based on suggestions from #sm1 and #Gregor.
## if date.origin is not defined then origin will be taken as "1970-01-01"
options(date.origin = "1970-01-01")
as.date <- function(x, origin = getOption("date.origin")) {
  # fall back to the Unix epoch when the option is unset
  if (is.null(origin)) origin <- "1970-01-01"
  as.Date(x, origin = origin)
}
## Results: (When date.origin is not set)
## > as.date(0)
## [1] "1970-01-01"
## > as.date(2)
## [1] "1970-01-03"
## Results: (When date.origin is set)
## > options(date.origin = "1970-01-05")
## > as.date(2)
## [1] "1970-01-07"
Related
I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "01-01-2021" then running as.numeric on it returns -719143. How can I turn that back into "01-01-2021"?
Note that Date class is part of base R, not lubridate.
You probably assumed that the data was year/month/day by mistake. Using base R to eliminate lubridate as a problem we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
It would be best if you fix the mistake that resulted in -719143, but if you don't control that and are faced with an input of -719143 and want to get as.Date("2021-01-01") as the output then:
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
Note that we can't tell from the question whether 01-01-2020 is supposed to represent day-month-year or month-day-year so we assumed the first but if it is to represent the second then it should be obvious at this point how to proceed.
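One caveat worth checking before relying on any such fixup: the mis-parse appears to be lossy. A small sketch, using two made-up input strings, suggests that with the wrong "%Y-%m-%d" format only the first two digits of the trailing year field are read (as the day), so inputs that differ only in the last two digits of the year collapse to the same number.

```r
# Two different day-month-year strings, deliberately parsed with the wrong format
a <- as.Date("01-01-2021", "%Y-%m-%d")
b <- as.Date("01-01-2099", "%Y-%m-%d")   # hypothetical second input

as.numeric(a)
as.numeric(b)
a == b   # TRUE would mean the last two year digits are not recoverable
```

If the two values are equal, the number alone cannot tell 2021 from 2099, so the safer route is to re-parse the original text with the correct format where that is possible.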
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 is about 1970 years of days, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being interpreted as the numeric formula equal to -2021 and so we're looking at perhaps -2021 seconds/days/[?] before year zero, which would be about -1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-1") + as.numeric(lubridt)
[1] "2021-01-01"
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE
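To close the loop on the original question (numeric back to Date), here is a minimal base-R sketch. It assumes the numbers really are days since 1970-01-01, which holds for values produced by as.numeric() on a correctly parsed Date; the variable names are illustrative.

```r
n <- 18628                                # days since 1970-01-01

d1 <- as.Date(n, origin = "1970-01-01")   # explicit origin
d2 <- structure(n, class = "Date")        # the construction dput() showed above

d1 == d2
format(d1, "%m-%d-%Y")                    # back to the original text layout
```

Either form restores the Date class; format() is then only needed if the month-day-year text is wanted rather than the Date itself.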
The date in my dataset is like this: 20130501000000 and I'm trying to convert this to a better datetime format in R
data1$date <- as.Date(data1$date, format = "%Y-%m-%s-%h-%m-%s")
However, I get an error for needing an origin. After I put the very first cell under date in as origin, it converts every cell under date to N/A. Is this right or should I try as.POSIXct()?
That is a somewhat degenerate format, but the anytime() and anydate() functions of the anytime package can help you, without requiring any explicit format strings:
R> anytime("20130501000000") ## returns POSIXct
[1] "2013-05-01 CDT"
R> anydate("20130501000000") ## returns Date
[1] "2013-05-01"
R>
Note that we parse from the character representation here; parsing from numeric would be wrong, as we use a conflicting heuristic to make sense of dates stored as numeric values.
So here your code would just become
data1$date <- anytime::anydate(data1$date)
provided data1$date is character; otherwise wrap as.character() around it.
Lastly, if you actually want Datetime rather than Date (as per your title), don't use anydate() but anytime().
Before I write my answer, I would like to point out that the format argument should describe the format your string is already in. Since "20130501000000" has no - between the components of the date, you have to use:
as.Date("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01"
which works just fine, does not produce any error, and will return an object of class Date:
as.Date("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "Date"
Therefore, I think your issue is more one of formatting than of the origin of the date.
Now to my detailed answer:
As far as I know, as.Date() converts to a Date, which carries no time of day, so if you want the time part of the string as well, you have to use as.POSIXct():
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01 EEST"
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "POSIXct" "POSIXt"
Note that the timezone is EEST, which is my local timezone; if you want a different timezone, you have to specify it. For example, to set the timezone to UTC:
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S", tz = "UTC")
# [1] "2013-05-01 UTC"
Using as.POSIXct(), you can do arithmetic with the objects:
times <- c("20130501000000",
"20130501035001") # added 03:50:01 to the first element
class(times)
# [1] "character"
times <- as.POSIXct(times, format = "%Y%m%d%H%M%S", tz = "UTC")
class(times)
# [1] "POSIXct" "POSIXt"
times[2] - times[1]
# Time difference of 3.833611 hours
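The "origin" error in the question hints that the column may be stored as a number rather than as text. If so, a sketch along these lines might help: convert the number to a plain digit string first, then parse it with the format shown above. The value and variable names are hypothetical.

```r
raw <- 20130501000000          # hypothetical numeric cell from the column

# as.character() can produce scientific notation for large numbers,
# so format the digits explicitly instead
txt <- sprintf("%.0f", raw)

as.Date(txt, format = "%Y%m%d%H%M%S")
as.POSIXct(txt, format = "%Y%m%d%H%M%S", tz = "UTC")
```

The sprintf() step is the design choice here: it keeps all fourteen digits so that the format string lines up with them.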
I have the following piece of R-code:
formatString = "%Y-%m-%d %H:%M:%OS"
x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString))
y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString))
I have the problem that when I do as.Date(y) then I get 2015-07-16 (although its date is one day later!). Apparently the problem is the timezone. So I checked the timezones:
> x
[1] "2013-11-23 23:10:38 CET"
> y
[1] "2015-07-17 01:43:38 CEST"
>
Ok, so they deviate in their timezone. This is weird, because why does R decide that one timestamp (given without any timezone at all) lies in a different timezone than another (given without any timezone at all)?
Ok, so lets set the timezone. Googling revealed that attr(y, "tzone") <- "CET" should do the deal. Lets try this:
> attr(y, "tzone") <- "CET"
> y
[1] "2015-07-17 01:43:38 CEST"
>
Ok, that did not work. Let see what the timezone actually is in the beginning:
> formatString = "%Y-%m-%d %H:%M:%OS"
> x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString))
> y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString))
> unclass(x)
[1] 1385244638
attr(,"tzone")
[1] ""
> unclass(y)
[1] 1437090218
attr(,"tzone")
[1] ""
>
So... they dont have a timezone at all but their timezones are different????
--> here are my natural questions:
1) why are they initialized with a different timezone when I dont specify a timezone at all?
2) why do both objects apparently not have a timezone and at the same time... how come they have different timezones?
3) How can I make as.Date(y) == "2015-07-17" true? I.e. how can I set both to the current timezone? Sys.timezone() results in 'NA'... (EDIT: my timezone [Germany] seems to be "CET" --> how can I set both to CET?)
I'm scratching my head here... Thanks for any thoughts on this you share with me :-)
FW
If you don't specify a timezone then R will use your system's local timezone, as POSIXct objects must have a timezone. The difference between CEST and CET is that one is summertime and one is not. That means if you define a date during the part of the year defined as summertime then R will decide to use the summertime version of the timezone. If you want to set dates that don't use summertime versions then define them as GMT from the beginning.
formatString = "%Y-%m-%d %H:%M:%OS"
x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString), tz="GMT")
y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString), tz="GMT")
If you want to truncate out the time, be careful with as.Date on a POSIXct object: unless you pass a tz argument, the conversion is not necessarily done in your local timezone (historically it defaults to UTC), which is how the date can shift by a day. If you want to truncate POSIXct objects with base R then you'll have to wrap either round or trunc in as.POSIXct, but I would recommend checking out the lubridate package for dealing with dates and times (specifically POSIXct objects).
If you want to keep CET but never use CEST you can use a location that doesn't observe daylight savings. According to http://www.timeanddate.com/time/zones/cet your only options are Algeria and Tunisia. According to https://en.wikipedia.org/wiki/List_of_tz_database_time_zones the valid tz would be "Africa/Algiers". Therefore you could do
formatString = "%Y-%m-%d %H:%M:%OS"
x = as.POSIXct(strptime("2013-11-23 23:10:38.000000", formatString), tz="Africa/Algiers")
y = as.POSIXct(strptime("2015-07-17 01:43:38.000000", formatString), tz="Africa/Algiers")
and both x and y would be in CET.
One more thing about setting timezones. If you tell R you want a generic timezone then it won't override daylight savings settings. That's why setting attr(y, "tzone") <- "CET" didn't have the desired result. If you did attr(y, "tzone") <- "Africa/Algiers" then it would have worked as you expected. Do be careful with conversions though because when you change the timezone it will change the time to account for the new timezone. The package lubridate has the function force_tz which changes the timezone without changing the time for cases where the initial timezone setting was wrong but the time was right.
Complementary answer:
1) Just use the right timezone throughout from the beginning. Since I live in Hamburg, Germany, the right timezone for me is "Europe/Berlin", see this list as said by Dean.
2) For extracting information from POSIXct, for example, the date, I use
as.Date(format(timeStamp, "%Y-%m-%d"))
which is slow but seems to give the correct answer... plus I don't have to install new packages [which is a bit complicated in my situation].
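Another package-free option that may be worth trying is the tz argument of as.Date for POSIXct input. The sketch below rebuilds the question's second timestamp with an explicit "Europe/Berlin" timezone (my assumption about the intended zone) and extracts the date in two different timezones.

```r
# The question's timestamp, created with an explicit timezone
y <- as.POSIXct("2015-07-17 01:43:38", tz = "Europe/Berlin")

as.Date(y, tz = "UTC")             # calendar date as seen in UTC
as.Date(y, tz = "Europe/Berlin")   # calendar date where the clock was read
```

Passing tz explicitly avoids depending on whatever default the installed R version uses for this conversion.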
I ran into the same issue and found your Question here.
While all the given answers are valid and get an upvote from me, I'd like to share another - not too elegant - but working solution:
Whenever you want to transform from class 'Date' to 'POSIXct' or vice versa, use as.character() before you convert:
x = as.POSIXct("2022-01-01")
y = as.POSIXct("2022-06-01")
x_Date <- as.Date(x)
x_POSIXct_again <- as.POSIXct(x_Date)
identical(x, x_POSIXct_again)
# FALSE!
y_Date <- as.Date(y)
y_POSIXct_again <- as.POSIXct(y_Date)
identical(y, y_POSIXct_again)
# FALSE!
x_Date <- as.Date(as.character(x))
x_POSIXct_again <- as.POSIXct(as.character(x_Date))
identical(x, x_POSIXct_again)
# TRUE!
y_Date <- as.Date(as.character(y))
y_POSIXct_again <- as.POSIXct(as.character(y_Date))
identical(y, y_POSIXct_again)
# TRUE!
# little helpers
as_Date2 <- function(x, ...) {
if("POSIXct" %in% class(x)) x <- as.character(x)
as.Date(x, ...)
}
as_POSIXct2 <- function(x, ...) {
if("Date" %in% class(x)) x <- as.character(x)
as.POSIXct(x, ...)
}
Obviously, time information gets lost when converting from POSIXct to Date. But no more day-switching at last.
In the R package lubridate, I can easily create a date with the following syntax:
> mdy("5/4/2015")
As expected, it produces the following result:
[1] "2015-05-04 UTC"
However, if I try to add that very value to an array, it seems to change from UTC to my local time (EDT):
> c(mdy("5/4/2015"))
[1] "2015-05-03 20:00:00 EDT"
Since I don't care about times this wouldn't affect me much except that this results in the date shifting back by 1, as follows:
> day(mdy("5/4/2015"))
[1] 4
> day(c(mdy("5/4/2015")))
[1] 3
To me, the act of adding something to an array should not change the value of that something. Am I missing something here, and is there a way to resolve this issue?
That's because lubridate::mdy assumes UTC. When you wrap it in c(), it reverts to your local timezone EDT because c() does not pass on the timezone attribute:
> attr(mdy("5/4/2015", tz = "EDT"), "tzone")
# [1] "EDT"
> attr(c(mdy("5/4/2015", tz = "EDT")), "tzone")
# NULL
You can do:
Sys.setenv(TZ = "UTC")
To set your local timezone to UTC.
Alternatively, you can specify the timezone explicitly in mdy():
mdy("5/4/2015", tz = "UTC")
Apart from Steven's solution, you could also store your dates in a list
list(mdy("5/4/2015"))[[1]]
#[1] "2015-05-04 UTC"
This won't remove the timezone and you don't have to mess around with environment variables.
I agree with you: If you look at c as some form of constructor for a "vector" and you come from a C++ or similar background, the removal of attributes (except for names) certainly seems strange.
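A further base-R workaround, sketched here with as.POSIXct standing in for the lubridate call, is to set the "tzone" attribute again after combining. Whether c() keeps the attribute may depend on the R version, so the sketch sets it explicitly rather than relying on either behaviour.

```r
x <- as.POSIXct("2015-05-04", tz = "UTC")

v <- c(x, x + 86400)        # combine two date-times (second is one day later)
attr(v, "tzone") <- "UTC"   # make the display timezone explicit again

format(v, "%d")             # day of month as displayed in UTC
```

Setting the attribute only changes how the instants are displayed, not the instants themselves, which is what is wanted when the values were right and only the printed timezone changed.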
I am trying to convert the string "2013-JAN-14" into a Date as follows:
sdate1 <- "2013-JAN-14"
ddate1 <- as.Date(sdate1,format="%Y-%b-%d")
ddate1
but I get :
[1] NA
What am I doing wrong? Should I install a package for this purpose? (I tried installing chron.)
Works for me. The reason it doesn't for you probably has to do with your system locale.
?as.Date has the following to say:
## This will give NA(s) in some locales; setting the C locale
## as in the commented lines will overcome this on most systems.
## lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- as.Date(x, "%d%b%Y")
## Sys.setlocale("LC_TIME", lct)
Worth a try.
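Applied to the string from the question, the documented trick could be wrapped up roughly as follows. The helper name parse_c_locale is made up for this sketch; it switches LC_TIME to "C" only for the duration of the call and restores the previous setting afterwards.

```r
# Hypothetical helper: parse month names using the C (English) locale
parse_c_locale <- function(x, format) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # restore on exit
  Sys.setlocale("LC_TIME", "C")
  as.Date(x, format = format)
}

parse_c_locale("2013-JAN-14", "%Y-%b-%d")
```

Restoring the locale with on.exit() keeps the change from leaking into the rest of the session, which matters if other output is expected in the local language.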
This can also happen if you try to convert your date of class factor into a date of class Date. You need to first convert into POSIXt otherwise as.Date doesn't know what part of your string corresponds to what.
Wrong way: direct conversion from factor to date:
a<-as.factor("24/06/2018")
b<-as.Date(a,format="%Y-%m-%d")
You will get as an output:
a
[1] 24/06/2018
Levels: 24/06/2018
class(a)
[1] "factor"
b
[1] NA
Right way, converting factor into POSIXt and then into date
a<-as.factor("24/06/2018")
abis<-strptime(a,format="%d/%m/%Y") #defining what is the original format of your date
b<-as.Date(abis,format="%Y-%m-%d") #defining what is the desired format of your date
You will get as an output:
abis
[1] "2018-06-24 AEST"
class(abis)
[1] "POSIXlt" "POSIXt"
b
[1] "2018-06-24"
class(b)
[1] "Date"
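It may also be worth checking whether the NA in the "wrong way" comes from the format string rather than from the factor itself. In the sketch below, which assumes the same day/month/year input, the factor is converted to character and the format argument describes the input text; a separate format() call then controls how the result is printed.

```r
a <- as.factor("24/06/2018")

# format that does not match the input text
wrong <- as.Date(as.character(a), format = "%Y-%m-%d")

# format that describes the input text
right <- as.Date(as.character(a), format = "%d/%m/%Y")

is.na(wrong)
right
format(right, "%d/%m/%Y")   # choose the output layout separately
```

The point of the sketch is that in as.Date() the format argument refers to how the input is written; the desired output layout is applied afterwards with format().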
My solution below might not work for every problem that results in as.Date() returning NA's, but it does work for some, namely, when the Date variable is read in in factor format.
Simply read in the .csv with stringsAsFactors=FALSE
data <- read.csv("data.csv", stringsAsFactors = FALSE)
data$date <- as.Date(data$date)
After trying (and failing) to solve the NA problem with my system locale, this solution worked for me.