R sequence of dates with lubridate

Hi, I'm trying to get a sequence of dates with lubridate.
This doesn't work:
seq(ymd('2012-04-07'),ymd('2013-03-22'),by=week(1))
the base command
seq(as.Date('2012-04-7'),as.Date('2013-03-22'),'weeks')
does, but I'd like to know if there is an elegant way to do this with lubridate.
EDIT
Please ignore: I solved this myself, so I'm leaving it up for posterity only. Happy to have this deleted if necessary.
seq(ymd('2012-04-07'),ymd('2013-03-22'),by='weeks')
does the trick.

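ymd is a wrapper that parses date strings. Older lubridate versions returned a POSIXct object from it; current versions return a Date unless tz is supplied.
Either way, you simply need to use the standard terminology described in ?seq.Date and ?seq.POSIXt (not lubridate) to define weeks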
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = '1 week')
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = 'weeks')
will work
as will
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = '2 week')
You could coerce the lubridate Period class object to a difftime, but that seems rather unnecessary
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = as.difftime(weeks(1)))
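If a lubridate-flavoured spelling is preferred, one option is to add a vector of periods to the start date instead of calling seq(). The following is only a sketch: the names start, end, n.weeks, by.seq and by.period are made up for illustration, and the comparison at the end just checks that both spellings agree for this date range.

```r
library(lubridate)

start <- ymd("2012-04-07")
end   <- ymd("2013-03-22")

# Base R spelling, as in the answer above
by.seq <- seq(start, end, by = "week")

# Number of whole weeks between the two dates
n.weeks <- as.integer(difftime(end, start, units = "days")) %/% 7

# lubridate spelling: weeks() is vectorised, so this adds 0, 1, 2, ... weeks
by.period <- start + weeks(0:n.weeks)

# Both approaches should produce the same dates
all(by.seq == by.period)
```

The seq() form is shorter when an end date is known; the period form may read more naturally when the number of steps is known instead.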

This is a way to stay within lubridate's POSIXct universe without converting the dates to another date class. I avoid changing the date class in my scripts because I find conversions are a common place where bugs (for example, time-zone changes or lost timestamps) are introduced. It follows the advice to use %m+% given in: R: adding 1 month to a date
# example date is a leap day for a "worst case scenario"
library("lubridate")
posixct.in <- parse_date_time(x = "2016-02-29", orders = "ymd")
# [1] "2016-02-29 UTC"
posixct.seq <- posixct.in %m+% years(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2017-02-28 UTC" "2018-02-28 UTC" "2019-02-28 UTC"
posixct.seq <- posixct.in %m+% months(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-29 UTC" "2016-04-29 UTC" "2016-05-29 UTC"
posixct.seq <- posixct.in %m+% days(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-01 UTC" "2016-03-02 UTC" "2016-03-03 UTC"
posixct.seq <- posixct.in %m+% weeks(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-07 UTC" "2016-03-14 UTC" "2016-03-21 UTC"
A regular + also works sometimes, but the %m+% prevents errors like this:
posixct.seq <- posixct.in + years(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" NA NA NA
At first I was confused because I thought %m+% was just a way to add months, and similar lubridate commands like %y+% etc. do not exist. But it turns out the "m" doesn't stand for "month addition". My best guess is "magic" =)
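The same roll-back behaviour shows up at the end of a month, which may make the difference between + and %m+% easier to see. A minimal sketch, with the variable names jan31, plain and rolled chosen only for this example:

```r
library(lubridate)

jan31 <- ymd("2016-01-31")

# "31 February" does not exist, so plain + gives NA
plain <- jan31 + months(1)

# %m+% rolls back to the last real day of the target month
rolled <- jan31 %m+% months(1)

is.na(plain)                  # TRUE
rolled == ymd("2016-02-29")   # TRUE
```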

Related

R reading in short date instead of long date [duplicate]

If a date vector has two-digit years, mdy() turns years between 00 and 68 into 21st Century years and years between 69 and 99 into 20th Century years. For example:
library(lubridate)
mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))
gives the following output:
Multiple format matches with 5 successes: %m/%d/%y, %m/%d/%Y.
Using date format %m/%d/%y.
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC" "2004-01-02 UTC"
I can fix this after the fact by subtracting 100 from the incorrect dates to turn 2054 and 2068 into 1954 and 1968. But is there a more elegant and less error-prone method of parsing two-digit dates so that they get handled correctly in the parsing process itself?
Update: After @JoshuaUlrich pointed me to strptime I found this question, which deals with an issue similar to mine, but using base R.
It seems like a nice addition to date handling in R would be some way to handle century selection cutoffs for two-digit dates within the date parsing functions.
Here is a function that allows you to do this:
library(lubridate)
x <- mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))
foo <- function(x, year=1968){
  m <- year(x) %% 100
  year(x) <- ifelse(m > year %% 100, 1900+m, 2000+m)
  x
}
Try it out:
x
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"
foo(x)
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"
foo(x, 1950)
[1] "1954-01-02 UTC" "1968-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"
The bit of magic here is to use the modulus operator %% to return the remainder of an integer division. So 1968 %% 100 yields 68.
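For readers who have not used %% before, here is a small worked example on plain numbers, independent of lubridate. The vector yrs and the cutoff of 50 are illustrative; the last step mirrors what foo(x, 1950) does to the year component.

```r
yrs <- c(2054, 2068, 1969, 1999, 2004)

yrs %%  100   # remainder:        54 68 69 99  4
yrs %/% 100   # integer quotient: 20 20 19 19 20

# Re-assign the century using a cutoff of 50
two.digit <- yrs %% 100
fixed <- ifelse(two.digit > 50, 1900 + two.digit, 2000 + two.digit)
fixed         # 1954 1968 1969 1999 2004
```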
I just experienced this exact same bug / feature.
I ended up writing the following two quick functions to help convert from Excel-type dates (which is where I get this most) to something R can use.
There's nothing wrong with the accepted answer -- it's just that I prefer not to load up on packages too much.
First, a helper to split and replace the years ...
year1900 <- function(dd_y, yrFlip = 50)
{
  dd_y <- as.numeric(dd_y)
  # Years above yrFlip belong to the 1900s; yrFlip itself and below to the 2000s
  ifelse(dd_y > yrFlip, dd_y + 1900, dd_y + 2000)
}
which is used by a function that 'fixes' your excel dates, depending on type:
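XLdate <- function(Xd, type = 'b-Y')
{
  switch(type,
    # month-year with four-digit year, presumably like "Jan-1954"
    'b-Y' = as.Date(paste0(substr(Xd, 5, 9), "-", substr(Xd, 1, 3), "-01"), format = "%Y-%b-%d"),
    # month-year with two-digit year, presumably like "Jan-54"
    'b-y' = as.Date(paste0(year1900(substr(Xd, 5, 6)), "-", substr(Xd, 1, 3), "-01"),
                    format = "%Y-%b-%d"),
    # year-month, presumably like "1954-Jan": year is characters 1-4, month 6-8
    'Y-b' = as.Date(paste0(substr(Xd, 1, 4), "-", substr(Xd, 6, 8), "-01"), format = "%Y-%b-%d")
  )
}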
Hope this helps.
Another option would be:
xxx <- c("01-Jan-54","01-Feb-68","01-Aug-69","01-May-99","01-Jun-04",
         "31-Dec-68","01-Jan-69", "31-Dec-99")
# tt holds the two-digit year; years above 20 are treated as 19xx
tt <- sub("\\d\\d-\\D\\D\\D-","",xxx)
dmy(paste0(sub("\\d\\d$","",xxx), ifelse(as.numeric(tt) > 20, paste0("19",tt), paste0("20",tt))))
None of these solutions is elegant or short, though.
I think it would be better if lubridate just added an option to specify the cutoff date.
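Newer lubridate releases appear to offer something close to this. As far as I know, parse_date_time2() and fast_strptime() accept a cutoff_2000 argument that sets where two-digit years switch century, so treat the following as a sketch under that assumption and check ?parse_date_time2 in your installed version first. The zero-padded input strings and the cutoff of 20 are illustrative choices, not taken from the question.

```r
library(lubridate)

x <- c("01/02/54", "01/02/68", "01/02/69", "01/02/99", "01/02/04")

# Two-digit years <= cutoff_2000 become 20xx, the rest 19xx
# (assumes cutoff_2000 is available in the installed lubridate version)
parsed <- parse_date_time2(x, orders = "mdy", cutoff_2000 = 20L)

year(parsed)
```

With this cutoff, 54, 68, 69 and 99 should land in the 1900s and 04 in the 2000s, with no after-the-fact correction needed.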

How to specify day and month for messy date data with missing day and month when converting to date in large data frame

I have a large data frame of over 100k rows. The date column contains dates in multiple formats such as "%m/%d/%Y", "%Y-%m", "%Y", and "%Y-%m-%d". I can convert these all to dates with parse_date_time() from lubridate.
dates <- c("05/10/1983","8/17/2014","1953-12","1975","2001-06-17")
parse_date_time(dates, orders = c("%m/%d/%Y","%Y-%m","%Y","%Y-%m-%d"))
[1] "1983-05-10 UTC" "2014-08-17 UTC" "1953-12-01 UTC" "1975-01-01 UTC" "2001-06-17 UTC"
But as you can see, this sets dates with missing day to the first of the month and dates with missing month and day to the first of the year. How can I set those to the 15th and June 15th, respectively?
Use nchar to check the dates vector and paste what is missing.
library(lubridate)
dates <- c("05/10/1983","8/17/2014","1953-12","1975","2001-06-17")
dates <- ifelse(nchar(dates) == 4, paste(dates, "06-15", sep = "-"),
ifelse(nchar(dates) == 7, paste(dates, 15, sep = "-"), dates))
dates
#[1] "05/10/1983" "8/17/2014" "1953-12-15" "1975-06-15"
#[5] "2001-06-17"
parse_date_time(dates, orders = c("%m/%d/%Y","%Y-%m","%Y","%Y-%m-%d"))
#[1] "1983-05-10 UTC" "2014-08-17 UTC" "1953-12-15 UTC"
#[4] "1975-06-15 UTC" "2001-06-17 UTC"
Another solution would be to use an index vector, also based on nchar.
n <- nchar(dates)
dates[n == 4] <- paste(dates[n == 4], "06-15", sep = "-")
dates[n == 7] <- paste(dates[n == 7], "15", sep = "-")
dates
#[1] "05/10/1983" "8/17/2014" "1953-12-15" "1975-06-15"
#[5] "2001-06-17"
As you can see, the result is the same as with ifelse.
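Matching on string length works for the formats listed in the question, but it could misfire if another format happens to have 4 or 7 characters. A slightly stricter variant, sketched here with regular expressions, checks the shape of each string instead. The names year.only, year.month, filled and parsed are made up for this example.

```r
library(lubridate)

dates <- c("05/10/1983", "8/17/2014", "1953-12", "1975", "2001-06-17")

# Identify incomplete dates by pattern rather than by length
year.only  <- grepl("^\\d{4}$", dates)          # e.g. "1975"
year.month <- grepl("^\\d{4}-\\d{2}$", dates)   # e.g. "1953-12"

filled <- dates
filled[year.only]  <- paste0(dates[year.only], "-06-15")
filled[year.month] <- paste0(dates[year.month], "-15")

# Same orders as in the question; every string is now a complete date
parsed <- parse_date_time(filled, orders = c("%m/%d/%Y", "%Y-%m", "%Y", "%Y-%m-%d"))
```

Both steps are vectorised, so this should scale to a column of 100k rows without an explicit loop.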
Here's another way of doing that - based on orders:
library(lubridate)
dates <- c("05/10/1983","8/17/2014","1953-12","1975","2001-06-17")
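parseDates <- function(x, orders = c('mdY', 'dmY', 'Ymd', 'Y', 'Ym')){
  # x is a single date string; only the first guessed format is used
  fmts <- guess_formats(x, orders = orders)
  dte <- parse_date_time(x, orders = fmts[1], tz = 'UTC')
  if(!grepl('m', fmts[1]) ){
    # year only: 1 January + 165 days is 15 June (14 June in a leap year)
    dte <- dte + days(165)
    return(dte)
  }
  if(!grepl('d', fmts[1]) ){
    # year and month only: move from the 1st to the 15th
    dte <- dte + days(14)
  }
  return(dte)
}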
Output:
> parseDates(dates[4])
[1] "1975-06-15 UTC"
> parseDates(dates[3])
[1] "1953-12-15 UTC"
This way for different date formats you only need to change the orders argument while the rest is done using lubridate.
Hope this is helpful!

Add one day to every date in a days vector

I have a vector of dates called KeyDates containing two key dates. I would like to make a new vector of dates called KeyDatesPlus containing those two key dates and the day after each, in chronological order.
KeyDates <- structure(c(15159,15165), class = "Date")
#KeyDates Output:
[1] "2011-07-04" "2011-07-10"
#desired output for KeyDatesPlus:
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
How could I achieve that? Thank you very much.
sort(c(KeyDates, KeyDates + 1))
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
structure( sapply(KeyDates, "+", (0:1)), class = "Date")
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
Or:
as.Date( sapply(KeyDates, "+", (0:1)))
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
KeyDates <- structure(c(15159,15165), class = "Date")
KeyDates.plus <- as.Date(sapply(KeyDates, function(x) c(x, x+1)))
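The answers above all add exactly one day. If more following days are ever needed, the same idea can be written with rep() so that the number of days becomes a parameter. This is only a sketch; k is a hypothetical name for the number of days to append after each key date.

```r
KeyDates <- structure(c(15159, 15165), class = "Date")

k <- 1  # number of following days to add after each key date (illustrative)

# Repeat each key date k + 1 times, then add the offsets 0, 1, ..., k
KeyDatesPlus <- sort(unique(rep(KeyDates, each = k + 1) + 0:k))
KeyDatesPlus
```

The unique() call only matters when the ranges of two key dates overlap, for example when k is larger than the gap between them.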
An answer using the package lubridate:
library("lubridate")
your.vector <- c("2011-07-04", "2011-07-10")
your.vector <- parse_date_time(x = your.vector, orders = "ymd")
your.vector
# [1] "2011-07-04 UTC" "2011-07-10 UTC"
one.day <- days(x = 1)
one.day
# [1] "1d 0H 0M 0S"
your.vector + one.day
# [1] "2011-07-05 UTC" "2011-07-11 UTC"
# your exact desired output (non-UTC time zone can be specified in parse_date_time):
new.vector <- sort(x = c(your.vector, your.vector + one.day))
# [1] "2011-07-04 UTC" "2011-07-05 UTC" "2011-07-10 UTC" "2011-07-11 UTC"
Lubridate distinguishes a "period" from a "duration".
A period is time as shown on the clock (i.e. if a daylight saving time change happens, it follows what the clock reads). That's what's specified here using days().
A duration is physical elapsed time (i.e. if a daylight saving time change happens, it's how long you've actually been sitting there). That could be specified instead using ddays().
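The difference only becomes visible when a clock change falls inside the interval. A small sketch, assuming the America/New_York time zone is available on the system; the variable name before is chosen just for this example.

```r
library(lubridate)

# US daylight saving time started on 2011-03-13 at 02:00 local time
before <- ymd_hms("2011-03-12 12:00:00", tz = "America/New_York")

before + days(1)    # period:   same clock time the next day (12:00)
before + ddays(1)   # duration: exactly 86400 seconds later (13:00)
```

For date-only vectors such as KeyDates the two give the same calendar day, so the distinction matters mainly for date-times in zones that observe daylight saving time.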
KeyDates <- structure(c(15159,15165), class = "Date")
KeyDatesPlus <- KeyDates+1
KeyDatesPlus <- sort(unique(c(KeyDates, KeyDatesPlus)))

Get date of timeseries object

I have a separately created time series object with daily frequency:
my.timeseries= ts(data= 1:10, start= c(2014,1,1), frequency = 365.25)
How can I get back the dates as POSIXct vector ("2014-01-01 UTC" ...) from this time series object?
Here's one potential method. I'm not really sure if it should be done this way, but it seems to work.
With your existing time series, try
p <- paste(attr(my.timeseries, "tsp")[1], my.timeseries)
as.POSIXct(as.Date(p, "%Y %j"))
# [1] "2014-01-01 UTC" "2014-01-02 UTC" "2014-01-03 UTC"
# [4] "2014-01-04 UTC" "2014-01-05 UTC" "2014-01-06 UTC"
# [7] "2014-01-07 UTC" "2014-01-08 UTC" "2014-01-09 UTC"
# [10] "2014-01-10 UTC"
As noted by G. Grothendieck in the comments, here is a more general solution
p <- paste(start(my.timeseries), seq_along(my.timeseries))
as.Date(p, "%Y %j")
# [1] "2014-01-01" "2014-01-02" "2014-01-03" "2014-01-04"
# [5] "2014-01-05" "2014-01-06" "2014-01-07" "2014-01-08"
# [9] "2014-01-09" "2014-01-10"
as.Date might be better to avoid any time-zone issues.
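One limitation of the "%Y %j" approach is that %j only covers day-of-year values 1 to 366, so it would stop working for a series longer than a year. A possible alternative, sketched below, builds the dates with seq() from the first day. It assumes the series starts on 1 January of the year stored in the tsp attribute and has one observation per calendar day; first.day and dates are names made up for this example, and start = 2014 is used in place of the three-element start from the question.

```r
my.timeseries <- ts(data = 1:10, start = 2014, frequency = 365.25)

# tsp() returns c(start, end, frequency); the start is the year here
first.day <- as.Date(paste0(floor(tsp(my.timeseries)[1]), "-01-01"))

# One calendar day per observation, counting from 1 January
dates <- seq(first.day, by = "day", length.out = length(my.timeseries))

# POSIXct at midnight UTC, if that class is required
as.POSIXct(format(dates), tz = "UTC")
```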
I strongly advise you to use an xts object instead of ts.
Here is code that replicates what you want:
library(xts)
my.index = seq(from = as.Date("2014-01-01"), by = "day", length.out = 10)
my.timeseries = xts(x = 1:10, order.by = my.index)
index(my.timeseries)
Let us know if that helps :)
Romain

Resources