If a date vector has two-digit years, mdy() turns years between 00 and 68 into 21st Century years and years between 69 and 99 into 20th Century years. For example:
library(lubridate)
mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))
gives the following output:
Multiple format matches with 5 successes: %m/%d/%y, %m/%d/%Y.
Using date format %m/%d/%y.
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC" "2004-01-02 UTC"
I can fix this after the fact by subtracting 100 from the incorrect dates to turn 2054 and 2068 into 1954 and 1968. But is there a more elegant and less error-prone method of parsing two-digit dates so that they get handled correctly in the parsing process itself?
Update: After @JoshuaUlrich pointed me to strptime I found this question, which deals with an issue similar to mine, but using base R.
It seems like a nice addition to date handling in R would be some way to handle century selection cutoffs for two-digit dates within the date parsing functions.
Here is a function that allows you to do this:
library(lubridate)
x <- mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))
foo <- function(x, year=1968){
m <- year(x) %% 100
year(x) <- ifelse(m > year %% 100, 1900+m, 2000+m)
x
}
Try it out:
x
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"
foo(x)
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"
foo(x, 1950)
[1] "1954-01-02 UTC" "1968-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"
The bit of magic here is to use the modulus operator %% to return the remainder of a division. So 1968 %% 100 yields 68.
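To make the remainder trick concrete, here is a minimal base R sketch of the same pivot logic applied to bare two-digit years. The names two_digit, pivot and full_year are made up for illustration and are not part of the answer above.

```r
# Two-digit years as they might come out of a parsed string
two_digit <- c(54, 68, 69, 99, 4)

# Hypothetical pivot year: anything after 1950 stays in the 1900s
pivot <- 1950

# pivot %% 100 is the remainder after dividing by 100, i.e. 50
full_year <- ifelse(two_digit > pivot %% 100, 1900 + two_digit, 2000 + two_digit)
full_year
# [1] 1954 1968 1969 1999 2004
```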
I just experienced this exact same bug / feature.
I ended up writing the following two quick functions to help convert from Excel-type dates (which is where I get this most) to something R can use.
There's nothing wrong with the accepted answer; it's just that I prefer not to load up on packages too much.
First, a helper to split and replace the years ...
year1900 <- function(dd_y, yrFlip = 50)
{
  dd_y <- as.numeric(dd_y)
  # years above yrFlip are read as 19xx; all others, including yrFlip itself, as 20xx
  ifelse(dd_y > yrFlip, dd_y + 1900, dd_y + 2000)
}
which is used by a function that 'fixes' your Excel dates, depending on type:
XLdate <- function(Xd, type = 'b-Y')
{
  switch(type,
    # "Jan-2012": month abbreviation, then four-digit year
    'b-Y' = as.Date(paste0(substr(Xd, 5, 8), "-", substr(Xd, 1, 3), "-01"), format = "%Y-%b-%d"),
    # "Jan-12": month abbreviation, then two-digit year expanded by year1900()
    'b-y' = as.Date(paste0(year1900(substr(Xd, 5, 6)), "-", substr(Xd, 1, 3), "-01"),
                    format = "%Y-%b-%d"),
    # "2012-Jan" (assumed layout): four-digit year, then month abbreviation
    'Y-b' = as.Date(paste0(substr(Xd, 1, 4), "-", substr(Xd, 6, 8), "-01"), format = "%Y-%b-%d")
  )
}
Hope this helps.
Another option would be:
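library(lubridate)
xxx <- c("01-Jan-54","01-Feb-68","01-Aug-69","01-May-99","01-Jun-04",
         "31-Dec-68","01-Jan-69","31-Dec-99")
# two-digit year at the end of each string
tt <- sub("\\d\\d-\\D\\D\\D-", "", xxx)
# years above 20 become 19xx, the rest 20xx
dmy(paste0(sub("\\d\\d$", "", xxx), ifelse(as.numeric(tt) > 20, paste0("19", tt), paste0("20", tt))))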
Though neither solution is elegant or short.
I think it would be better if lubridate just added an option to specify the cutoff date.
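For what it's worth, more recent lubridate releases appear to offer such an option: as far as I know, parse_date_time2() and fast_strptime() accept a cutoff_2000 argument, where two-digit years less than or equal to the cutoff are read as 20xx and the rest as 19xx. The sketch below assumes a lubridate version that has this argument; the cutoff value of 20 and the zero-padded input strings are illustrative choices.

```r
library(lubridate)

dates <- c("01/02/54", "01/02/68", "01/02/69", "01/02/99", "01/02/04")

# cutoff_2000 = 20L: years 00-20 are parsed as 20xx, years 21-99 as 19xx
# (assumes the installed lubridate provides cutoff_2000 in parse_date_time2)
parsed <- parse_date_time2(dates, orders = "mdy", cutoff_2000 = 20L)
year(parsed)
```

If the argument is not available in your version, the post-hoc correction functions above remain the fallback.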
Related
I have several variables that exist in the following format:
/Date(1353020400000+0100)/
I want to convert this format to ddmmyyyy. I found this solution for the same problem using php, but I don't know anything about php, so I'm unable to convert that solution to what I need, which is a solution that I can use in R.
Any suggestions?
Thanks.
If the format is milliseconds since the epoch then anytime() or as.POSIXct() can help you:
R> library(anytime)
R> anytime(1353020400000/1000)
[1] "2012-11-15 17:00:00 CST"
R> anytime(1353020400.000)
[1] "2012-11-15 17:00:00 CST"
R>
anytime() converts to local time, which is Chicago for me. You would have to deal with the UTC offset separately.
Base R can do it too, but you need the dreaded origin:
R> as.POSIXct(1353020400.000, origin="1970-01-01")
[1] "2012-11-15 17:00:00 CST"
R>
As far as I can tell from the linked question, this is milliseconds since the epoch:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "[()+]")
as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
#[1] "2012-11-15 23:00:00 UTC"
If you want to pick up the timezone difference as well, here's an attempt:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "(?=[+-])|[()]", perl=TRUE)
tzo <- sapply(spl, function(x) paste(x[3:4],collapse="") )
dt <- as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
as.POSIXct(paste(format(dt), tzo), tz="UTC", format = '%F %T %z')
#[1] "2012-11-15 22:00:00 UTC"
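Since the question asked for ddmmyyyy output, here is a minimal base R sketch that goes from the raw string to that format. It assumes the number is milliseconds since the epoch, ignores the trailing offset, and reports the date in UTC; the variable names are illustrative only.

```r
x <- c("/Date(1353020400000+0100)/", "/Date(0)/")

# Keep only the digits inside the parentheses; the optional +hhmm / -hhmm offset is dropped
ms <- as.numeric(sub("^/Date\\((\\d+)([+-]\\d{4})?\\)/$", "\\1", x, perl = TRUE))

# Milliseconds -> seconds, interpreted in UTC
dt <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

ddmmyyyy <- format(dt, "%d%m%Y")
ddmmyyyy
# [1] "15112012" "01011970"
```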
The package lubridate can come to the rescue as follows:
as.Date("1970-01-01") + lubridate::milliseconds(1353020400000)
That is, the number is read as milliseconds since the epoch (1 January 1970, UTC).
A parsing function can now be made using regular expressions:
parse.myDate <- function(text) {
num <- as.numeric(stringr::str_extract(text, "(?<=/Date\\()\\d+"))
as.Date("1970-01-01") + lubridate::milliseconds(num)
}
Finally, format the date with:
format(theDate, "%d/%m/%Y %H:%M")
If you also need the time zone information, you can use this instead:
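parse.myDate <- function(text) {
  parts <- stringr::str_match(text, "^/Date\\((\\d+)([+-])(\\d{4})\\)/$")
  # POSIX "Etc/GMT" zone names use an inverted sign: UTC+01:00 is "Etc/GMT-1"
  sign <- ifelse(parts[,3] == "+", "-", "+")
  # tz must be a single string, so the (whole-hour) offset of the first element is used
  as.POSIXct(as.numeric(parts[,2])/1000, origin = "1970-01-01", tz = paste0("Etc/GMT", sign[1], as.integer(parts[1,4])/100))
}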
I have a vector of dates called KeyDates containing two key dates. I would like to make a new vector of dates called KeyDatesPlus containing those two key dates and the day after each of them, in chronological order.
KeyDates <- structure(c(15159,15165), class = "Date")
#KeyDates Output:
[1] "2011-07-04" "2011-07-10"
#desired output for KeyDatesPlus:
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
How could I achieve that? Thank you very much.
sort(c(KeyDates, KeyDates + 1))
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
structure( sapply(KeyDates, "+", (0:1)), class = "Date")
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
Or:
as.Date( sapply(KeyDates, "+", (0:1)))
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
KeyDates <- structure(c(15159,15165), class = "Date")
KeyDates.plus <- as.Date(sapply(KeyDates, function(x) c(x, x+1)))
An answer using the package lubridate:
library("lubridate")
your.vector <- c("2011-07-04", "2011-07-10")
your.vector <- parse_date_time(x = your.vector, orders = "ymd")
your.vector
# [1] "2011-07-04 UTC" "2011-07-10 UTC"
one.day <- days(x = 1)
one.day
# [1] "1d 0H 0M 0S"
your.vector + one.day
# [1] "2011-07-05 UTC" "2011-07-11 UTC"
# your exact desired output (non-UTC time zone can be specified in parse_date_time):
new.vector <- sort(x = c(your.vector, your.vector + one.day))
# [1] "2011-07-04 UTC" "2011-07-05 UTC" "2011-07-10 UTC" "2011-07-11 UTC"
Lubridate distinguishes a "period" from a "duration".
A period is time as read on the clock (i.e. if a daylight saving time change happens, it follows what the clock reads). That's what's specified here using days().
A duration is physical elapsed time (i.e. if a daylight saving time change happens, it's how long you've actually been sitting there). That could be specified instead using ddays().
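The difference only shows up when a clock change falls inside the interval. The sketch below is an illustration, not part of the answer above: it picks noon on the day before the 2013 US spring-forward change (10 March 2013) in the America/Chicago time zone, which is an assumed example zone.

```r
library(lubridate)

start <- ymd_hms("2013-03-09 12:00:00", tz = "America/Chicago")

# Period: same clock time on the next day
plus_period <- start + days(1)
plus_period
# [1] "2013-03-10 12:00:00 CDT"

# Duration: exactly 86400 seconds later, so the clock reads one hour ahead
plus_duration <- start + ddays(1)
plus_duration
# [1] "2013-03-10 13:00:00 CDT"
```

For plain dates in UTC, as in this question, the two give the same result.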
KeyDates <- structure(c(15159,15165), class = "Date")
KeyDatesPlus <- KeyDates+1
KeyDatesPlus <- sort(unique(c(KeyDates, KeyDatesPlus)))
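The same idea can be generalised to any number of following days in base R. This is a sketch under the assumption that KeyDates is already sorted and that the windows do not overlap; n_after is a made-up parameter name.

```r
KeyDates <- structure(c(15159, 15165), class = "Date")

# Hypothetical parameter: how many days after each key date to include
n_after <- 1

# Repeat each key date (n_after + 1) times, then add the recycled offsets 0..n_after
KeyDatesPlus <- rep(KeyDates, each = n_after + 1) + 0:n_after
KeyDatesPlus
# [1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
```

If the key dates can be closer together than n_after days, wrapping the result in sort(unique(...)) as above removes the duplicates.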
Hi I'm trying to get a sequence of dates with lubridate
This doesn't work
seq(ymd('2012-04-07'),ymd('2013-03-22'),by=week(1))
the base command
seq(as.Date('2012-04-7'),as.Date('2013-03-22'),'weeks')
does, but I'd like to know if there is an elegant way to do this with lubridate.
EDIT
Please ignore : solved myself so leaving up for posterity only. Happy to have this deleted if necessary.
seq(ymd('2012-04-07'),ymd('2013-03-22'),by='weeks')
Does the trick
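ymd is a wrapper that parses date strings; in the lubridate version used here it returns a POSIXct object (current versions return a Date unless tz is set).
Either way, you simply need to use the standard terminology described in ?seq.POSIXt (base R, not lubridate) to define weeks: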
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = '1 week')
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = 'weeks')
will work,
as will
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = '2 week')
You could coerce the lubridate Period class object to a difftime, but that seems rather unnecessary
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = as.difftime(weeks(1)))
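If you would rather stay with lubridate's own arithmetic, one possible sketch is to add a vector of week periods to the start date. The names start, end, n_weeks and weekly are illustrative; the number of whole weeks is computed with base R's difftime().

```r
library(lubridate)

start <- ymd("2012-04-07")
end <- ymd("2013-03-22")

# Whole weeks that fit between start and end
n_weeks <- as.numeric(difftime(end, start, units = "days")) %/% 7

# One element per week: start, start + 1 week, ..., start + n_weeks weeks
weekly <- start + weeks(0:n_weeks)
head(weekly, 3)
```

This should give the same dates as the seq() calls above, with the last element being the final weekly step on or before the end date.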
This is a way to stick within the POSIXct universe of lubridate and not change date formats to base R's POSIXt. I avoid changing the date format in my scripts because I find it is a common place where bugs (for example time-zone changes or losing timestamps) are introduced. It follows this advice to use %m+%: R: adding 1 month to a date
# example date is a leap day for a "worst case scenario"
library("lubridate")
posixct.in <- parse_date_time(x = "2016-02-29", orders = "ymd")
# [1] "2016-02-29 UTC"
posixct.seq <- posixct.in %m+% years(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2017-02-28 UTC" "2018-02-28 UTC" "2019-02-28 UTC"
posixct.seq <- posixct.in %m+% months(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-29 UTC" "2016-04-29 UTC" "2016-05-29 UTC"
posixct.seq <- posixct.in %m+% days(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-01 UTC" "2016-03-02 UTC" "2016-03-03 UTC"
posixct.seq <- posixct.in %m+% weeks(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-07 UTC" "2016-03-14 UTC" "2016-03-21 UTC"
A regular + also works sometimes, but the %m+% prevents errors like this:
posixct.seq <- posixct.in + years(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" NA NA NA
At first I was confused because I thought %m+% was just a way to add months, and similar lubridate commands like %y+% etc. do not exist. But it turns out the "m" doesn't stand for "month addition". My best guess is "magic" =)
Is there a function (built-in or packaged) that would allow parsing a time like "25:15:00" as "1:15 on the next day"? Unfortunately, as.POSIXct doesn't like it with the %X specification (equivalent to %H:%M:%S),
> as.POSIXct('25:15:00', format='%X')
[1] NA
> as.POSIXct('15:15:00', format='%X')
[1] "2013-05-24 15:15:00 CEST"
and I can't find a suitable conversion specification in the strptime docs.
Not thoroughly tested but you can try this function
parse_time <- function(x, format = "%X") {
hour <- as.numeric(substr(x, 1, 2))
delta <- ifelse(hour >= 24, 24 * 3600, 0)
hour <- hour %% 24
date <- paste0(hour, substr(x, 3, nchar(x)))
strptime(date, format = format) + delta
}
parse_time(c('25:15:00', "23:10:00"))
##[1] "2013-05-25 01:15:00 GMT" "2013-05-24 23:10:00 GMT"
Now there is:
library(devtools)
install_github('krlmlr/kimisc')
library(kimisc)
hms.to.seconds('25:15:00')
It uses a slightly different approach than dickoa's code: The argument is filtered by gsub using a suitable regular expression, and the actual conversion doesn't involve strptime at all. See the code.
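For readers who want the idea without installing a package, here is a minimal base R sketch along the same lines: convert "HH:MM:SS" to seconds by hand and add them to a reference day. The function name hms_to_seconds and the reference date are assumptions for illustration; this is not the kimisc implementation.

```r
# Hypothetical helper: "HH:MM:SS" -> seconds, with no upper limit on hours
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

secs <- hms_to_seconds(c("25:15:00", "23:10:00"))
secs
# [1] 90900 83400

# Adding the seconds to midnight of a reference day rolls over into the next day
base_day <- as.POSIXct("2013-05-24", tz = "UTC")
result <- base_day + secs
result
# [1] "2013-05-25 01:15:00 UTC" "2013-05-24 23:10:00 UTC"
```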