Changing date formats and calculating duration using lubridate - r

I am trying to get the duration between issue_d and last_pymnt_d in my dataset using the lubridate package. issue_d is a character column formatted like "2015-05-01T00:00:00Z" and last_pymnt_d is a character column formatted like "Feb-2017". I need them in the same format (just "my" is enough, or "myd" if "my" is not an option). Then I need to calculate the duration between issue_d and last_pymnt_d.
lcDataSet2$issue_d<-parse_date_time(lcDataSet2$issue_d, "myd")
turns my issue_d into NA. I also get the below error when even just trying to view last_pymnt_d in date format
as.Date(lcRawData$last_pymnt_d)
Error in charToDate(x) :
character string is not in a standard unambiguous format
How can I get these into the same date format and then calculate the duration?

The order and letter case of the format string are important for parsing dates.
library(lubridate)
parse_date_time('2015-05-01T00:00:00Z', 'Y-m-d H:M:S')
[1] "2015-05-01 UTC"
parse_date_time('Feb-2017', 'b-Y')
[1] "2017-02-01 UTC"
If you want just the month and year, the zoo package has a function for that:
library(zoo)
date1 <- as.yearmon('2015-05-01T00:00:00Z')
[1] "May 2015"
date2 <- as.yearmon('Feb-2017', '%b-%Y')
[1] "Feb 2017"
difftime(date2, date1)
Time difference of 642 days
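For a whole column, the same parsing can be applied to the vectors directly and the gap measured with a lubridate interval. The following is a minimal sketch, assuming an English locale for the month abbreviations; the two-row data frame `loans` is a made-up stand-in for the real dataset.

```r
library(lubridate)

# Hypothetical stand-in for the real data (names and values are illustrative)
loans <- data.frame(
  issue_d      = c("2015-05-01T00:00:00Z", "2016-01-01T00:00:00Z"),
  last_pymnt_d = c("Feb-2017", "Mar-2016"),
  stringsAsFactors = FALSE
)

# Parse each column with the order that matches its layout
issue      <- ymd_hms(loans$issue_d)
last_pymnt <- parse_date_time(loans$last_pymnt_d, "b-Y")

# An interval keeps both endpoints, so it can be measured in several units
span           <- interval(issue, last_pymnt)
months_elapsed <- span %/% months(1)                # whole calendar months
days_elapsed   <- time_length(span, unit = "day")   # elapsed days
```

Because last_pymnt_d carries no day, the parsed value falls on the first of the month, so the day count is only as precise as that assumption.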

The zoo package gives you a function as.yearmon to convert dates into yearmon objects containing only the month and year. Since your last_pymnt_d is only month and year, the best date difference you will get is the number of months:
library(zoo)
issue_d <- "2015-05-01T00:00:00Z"
last_pymnt_d <- "Feb-2017"
diff <- as.yearmon(last_pymnt_d, format = "%b-%Y") - as.yearmon(as.Date(issue_d))
diff
1.75
Under the hood, the yearmon object is a number of years, with the decimal component representing the months. A difference in yearmon of 1.75 is 1 year and 9 months.
diff_months <- paste(round(diff * 12, 0), "months")
"21 months"
diff_yearmon <- paste(floor(diff), "years and", round((diff %% 1) * 12, 0), "months")
diff_yearmon
"1 years and 9 months"

Related

Convert character YYYY-MM-00 into date YYYY-MM in R

I imported Excel data into R and I have a problem to convert dates.
In R, my data are character and look like :
date<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
I would like to convert character into date (MM/YYYY) but the '00' value used for days poses a problem and 'NA' are returned systematically.
It works when I manually replace '00' with '01' and then use as.yearmon, ymd and format. But I have lots of dates to change and I don't know how to change all my '00' into '01' in R.
# example data
date1<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
# removing time -> doesn't work because of the '00' day
date1c<-format(strptime(date1, format = "%Y-%m-%d"), "%Y/%m/%d")
date1c<-format(strptime(date1, format = '%Y-%m'), '%Y/%m')
# trying to convert character into date -> doesn't work either
date1c<-ymd(date1)
date1c<-strptime(date1, format = "%Y-%m-%d %H:%M:%S")
date1c<-as.Date(date1, format="%Y-%m-%d %H:%M:%S")
date1c<-as.yearmon(date1, format='%Y%m')
# everything works if days are '01'
date2<-c('1971-02-01 00:00:00', '1979-06-01 00:00:00')
date2c<-as.yearmon(ymd(format(strptime(date2, format = "%Y-%m-%d"), "%Y/%m/%d")))
date2c
If you have an idea how to do this, or another way to solve my problem, I would be thankful!
Use gsub to replace -00 with -01.
date1<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
date1 <- gsub("-00", "-01", date1)
date1c <-format(strptime(date1, format = "%Y-%m-%d"), "%Y/%m/%d")
> date1c
[1] "1971/02/01" "1979/06/01"
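A plain "-00" pattern with gsub would also rewrite a month of 00 if one ever appeared. A slightly stricter variant, sketched here under the assumption that every string starts with yyyy-mm-dd, anchors the pattern so only the day field is touched (the third element is an extra illustrative value, not from the question):

```r
date1 <- c("1971-02-00 00:00:00", "1979-06-00 00:00:00", "2000-11-00 00:00:00")

# Capture the leading "yyyy-mm" and replace only a "-00" day that follows it
fixed <- sub("^(\\d{4}-\\d{2})-00", "\\1-01", date1)

# as.Date reads the leading yyyy-mm-dd and ignores the trailing time
as_dates <- as.Date(fixed)
as_dates
```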
Another possibility could be:
as.Date(paste0(substr(date1, 1, 9), "1"), format = "%Y-%m-%d")
[1] "1971-02-01" "1979-06-01"
Here it extracts the first nine characters, pastes it together with 1 and then converts it into a date object.
These alternatives each accept a vector input and produce a vector as output.
Date output
These all will accept a vector as input and produce a Date vector as the output.
# 1. replace first occurrence of '00 ' with '01 ' and then convert to Date
as.Date(sub("00 ", "01 ", date1))
## [1] "1971-02-01" "1979-06-01"
# 2. convert to yearmon class and then to Date
library(zoo)
as.Date(as.yearmon(date1, "%Y-%m"))
## [1] "1971-02-01" "1979-06-01"
# 3. insert a 1 and then convert to Date
as.Date(paste(1, date1), "%d %Y-%m")
## [1] "1971-02-01" "1979-06-01"
yearmon output
Note that if you really are trying to represent just months and years then yearmon class directly represents such objects without the kludge of using an unused day of the month. Such objects are internally represented as a year plus a fraction of a year, i.e. year + 0 for January, year + 1/12 for February, etc. They display in a meaningful way, they sort in the expected manner and can be manipulated, e.g. take the difference between two such objects or add 1/12 to get the next month, etc. As with the others it takes a vector in and produces a vector out.
library(zoo)
as.yearmon(date1, "%Y-%m")
## [1] "Feb 1971" "Jun 1979"
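The arithmetic described above can be sketched as follows. This is only an illustration of the year-plus-fraction representation; the variable names are made up for the example.

```r
library(zoo)

ym <- as.yearmon(c("1971-02", "1979-06"), "%Y-%m")

# Internally: year + (month - 1) / 12
as.numeric(ym)

# Adding 1/12 moves each element forward by one month
next_month <- ym + 1/12

# The difference of two yearmon values is a number of years;
# multiply by 12 to express it in months
months_apart <- round(12 * as.numeric(ym[2] - ym[1]))
```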
character output
If you want character output rather than Date or yearmon output then these variations work and again accept a vector as input and produce a vector as output:
# 1. replace -00 and everything after that with a string having 0 characters
sub("-00.*", "", date1)
## [1] "1971-02" "1979-06"
# 2. convert to yearmon and then format that
library(zoo)
format(as.yearmon(date1, "%Y-%m"), "%Y-%m")
## [1] "1971-02" "1979-06"
# 3. convert to Date class and then format that
format(as.Date(paste(1, date1), "%d %Y-%m"), "%Y-%m")
## [1] "1971-02" "1979-06"
# 4. pick off the first 7 characters
substring(date1, 1, 7)
## [1] "1971-02" "1979-06"

POSIX date from dates in weekly time format

I have dates encoded in a weekly time format (European convention >> 01 through 52/53, e.g. "2016-48") and would like to standardize them to a POSIX date:
require(magrittr)
(x <- as.POSIXct("2016-12-01") %>% format("%Y-%V"))
# [1] "2016-48"
as.POSIXct(x, format = "%Y-%V")
# [1] "2016-01-11 CET"
I expected the last statement to return "2016-12-01" again. What am I missing here?
Edit
Thanks to Dirk, I was able to piece it together:
y <- sprintf("%s-1", x)
While I still don't get why this doesn't work
(as.POSIXct(y, format = "%Y-%V-%u"))
# [1] "2016-01-11 CET"
this does
(as.POSIXct(y, format = "%Y-%U-%u"))
# [1] "2016-11-28 CET"
Edit 2
Oh my, I think using %V is a very bad idea in general:
as.POSIXct("2016-01-01") %>% format("%Y-%V")
# [1] "2016-53"
Should this be considered to be on a "serious bug" level that requires further action?!
Sticking to either %U or %W seems to be the right way to go
as.POSIXct("2016-01-01") %>% format("%Y-%U")
# [1] "2016-00"
Edit 3
Nope, not quite finished/still puzzled: the approach doesn't work for the very first week
(x <- as.POSIXct("2016-01-01") %>% format("%Y-%W"))
# [1] "2016-00"
as.POSIXct(sprintf("%s-1", x), format = "%Y-%W-%u")
# [1] NA
It does for week 01 as defined in the underlying convention when using %U or %W (so "week 2", actually)
as.POSIXct("2016-01-1", format = "%Y-%W-%u")
# [1] "2016-01-04 CET"
As I have to deal a lot with reporting by ISO weeks, I've created the ISOweek package some years ago.
The package includes the function ISOweek2date() which returns the date of a given weekdate (year, week of the year, day of week according to ISO 8601). It's the inverse function to date2ISOweek().
With ISOweek, your examples become:
library(ISOweek)
# define dates to convert
dates <- as.Date(c("2016-12-01", "2016-01-01"))
# convert to full ISO 8601 week-based date yyyy-Www-d
(week_dates <- date2ISOweek(dates))
[1] "2016-W48-4" "2015-W53-5"
# convert back to class Date
ISOweek2date(week_dates)
[1] "2016-12-01" "2016-01-01"
Note that ISOweek2date() requires a full ISO week-based date in the format yyyy-Www-d including the day of the week (1 to 7, Monday to Sunday).
So, if you only have year and ISO week number you have to create a character string with a day of the week specified.
A typical phrase in many reports is, e.g., "reporting week 31 ending 2017-08-06":
yr <- 2017
wk <- 31
ISOweek2date(sprintf("%4i-W%02i-%1i", yr, wk, 7))
[1] "2017-08-06"
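For strings shaped like the "2016-48" in the question, one way to build the required yyyy-Www-d form is a regular-expression rewrite. This is a sketch under two assumptions: the year part is the ISO week-based year (so Friday 2016-01-01 would be stored as "2015-53"), and Monday is the wanted day of the week.

```r
library(ISOweek)

# Year-week strings; the year is assumed to be the ISO week-based year
x <- c("2016-48", "2015-53")

# Insert the "W" marker and append day 1 (Monday)
week_dates <- sub("^(\\d{4})-(\\d{2})$", "\\1-W\\2-1", x)

# Convert to class Date: the Monday of each ISO week
mondays <- ISOweek2date(week_dates)
mondays
```

Appending 7 instead of 1 would give the Sunday that ends each week.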
Addendum
Please, see this answer for another use case and more background information on the ISOweek package.

Is there an inverse of the yday lubridate function?

I have a list of days in the format "day of the year" obtained by applying lubridate::yday() function to a list of dates. For instance, from the following dates (mm-dd-yyyy format):
01-01-2015
01-02-2015
...
by applying yday() you get
1
2
...
Is there a function that can do the reverse given the yday output and the year? Ie, from a yday value and a year, get back to a date in the mm-dd-yyyy format?
To do this with the lubridate package, you use the parse_date_time() function with "j" as the orders argument.
Example:
my.yday <- "1"
parse_date_time(x = my.yday, orders = "j")
# [1] "2021-01-01 UTC"
The default appears to be the current year. If you want to specify the year just add it in!
my.yday <- "1"
parse_date_time(x = paste(1995, my.yday), orders = "yj")
# [1] "1995-01-01 UTC"
Note: make sure you supply the julian day as character, not numeric. If numeric it will fail when over 3 digits (weird bug that they don't plan to fix)
Any sequence added to a Date() type creates a new Date() sequence with just that offset.
Witness:
R> as.Date("2016-01-01") + 0:9
[1] "2016-01-01" "2016-01-02" "2016-01-03"
[4] "2016-01-04" "2016-01-05" "2016-01-06"
[7] "2016-01-07" "2016-01-08" "2016-01-09"
[10] "2016-01-10"
R> as.Date("2016-01-01") + 100:109
[1] "2016-04-10" "2016-04-11" "2016-04-12"
[4] "2016-04-13" "2016-04-14" "2016-04-15"
[7] "2016-04-16" "2016-04-17" "2016-04-18"
[10] "2016-04-19"
R>
So once again a so-called lubridate question has nothing to do with that package but simply requires knowing how the base R types function.
> yday ("1990-03-17") - 1 + as.Date ("1990-01-01")
[1] "1990-03-17"
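Wrapped up as a small vectorized helper, the same idea might look like this. The function name `date_from_yday` is made up for illustration; it uses only base R.

```r
# Hypothetical helper: rebuild a Date from a year and a day-of-year
date_from_yday <- function(year, yday) {
  # January 1st of each year, shifted by (yday - 1) days
  as.Date(paste0(year, "-01-01")) + (yday - 1)
}

# Day 60 falls on different calendar dates in leap and non-leap years
d <- date_from_yday(c(2015, 2016), c(60, 60))

# Format as mm-dd-yyyy, as asked in the question
mdy_strings <- format(d, "%m-%d-%Y")
mdy_strings
```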
Just to round out the answer by @DirkEddelbuettel, since I was having the same issue and it's been 3 years.
To get the desired mm-dd-yyyy for a dataset where you have only the year and the day of the year from lubridate::yday(), you just need to offset a Date object for January 1st of that year by your yday() output minus one.
Then if you want the month or the day within that month, you can use lubridate's month() and day() functions to pull those parts.
(I assume you have the year, since leap years affect the calendar date and would otherwise throw off your month/day assignment. If not, any year will do.)
library(dplyr)
# Example dataset with years on/around a leap year
my_df <- data.frame(
  year = c(2010, 2011, 2012, 2013, 2014, 2015),
  my_yday = c(150, 150, 150, 150, 150, 150)
)
# skip straight back to the yyyy-mm-dd format
my_df %>% mutate(new_date = as.Date(paste0(year, "-01-01")) + (my_yday - 1))
# Get every component
my_df %>%
  mutate(
    new_day = lubridate::day(as.Date(paste0(year, "-01-01")) + (my_yday - 1)),
    new_month = lubridate::month(as.Date(paste0(year, "-01-01")) + (my_yday - 1)),
    # %Y (not %y) is needed to parse a four-digit year
    new_date = as.POSIXct(paste(new_month, new_day, year, sep = "/"),
                          format = "%m/%d/%Y"))
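If lubridate is already loaded, a shorter route is possible. This is a sketch, assuming lubridate's make_date() and days() helpers and a cut-down version of the example data frame:

```r
library(lubridate)

my_df <- data.frame(
  year    = c(2011, 2012),
  my_yday = c(150, 150)
)

# make_date() defaults to January 1st of the given year;
# adding a period of (yday - 1) days lands on the target date
my_df$new_date <- make_date(my_df$year) + days(my_df$my_yday - 1)

# Round trip: yday() of the rebuilt dates returns the original values
round_trip <- yday(my_df$new_date)
```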

Converting character into a Date format [duplicate]

This question already has answers here:
Converting year and month ("yyyy-mm" format) to a date?
(9 answers)
Closed 5 years ago.
Is it possible to format the following numbers as Year-Month?
I have entries as follows:
1402
1401
1312
Meaning February 2014, January 2014 and December 2013.
I tried:
date <- 1402
date <- as.Date(as.character(date), format = "%y%m")
But I get an NA as an output.
The zoo package has a "yearmon" class that directly handles year/month objects:
library(zoo)
nums <- c(1402, 1401, 1312)
ym <- as.yearmon(as.character(nums), "%y%m")
giving:
> ym
[1] "Feb 2014" "Jan 2014" "Dec 2013"
You need to include day number, otherwise it is impossible to understand what day of month you have in mind, consider:
> strptime('011402', format = "%d%y%m")
[1] "2014-02-01"
as.Date requires a full date, with day specified. Since you don't include a day it doesn't know what to do.
You could add any day and it should work like this
date <- 140201
date <- as.Date(as.character(date), format="%y%m%d")
You could use the lubridate package to work with dates a little more easily.
> library(lubridate)
> month(ymd(as.character(140201)), label = TRUE, abbr = FALSE)
[1] February
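Applied to the whole vector from the question, the add-a-day approach could be sketched like this in base R. It assumes every entry is a yymm number, and it pads to four digits so that values such as 901 (September 2009) keep their leading zero.

```r
nums <- c(1402, 1401, 1312)

# Pad to four characters so the year keeps any leading zero
yymm <- sprintf("%04d", nums)

# Append "01" as the day so as.Date has a complete date to parse
dates <- as.Date(paste0(yymm, "01"), format = "%y%m%d")

# Format back to Year-Month text if that is the desired display
year_month <- format(dates, "%Y-%m")
year_month
```

Note that %y maps two-digit years 00 to 68 to 20xx and 69 to 99 to 19xx, which matters for older data.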

How to transform dates in Y-m format without days [duplicate]

This question already has answers here:
Converting year and month ("yyyy-mm" format) to a date?
(9 answers)
Closed 6 years ago.
I have a data vector that looks like this:
dates<-c("2014-11", "2014-12", "2015-01", "2015-02", "2015-03", "2015-04")
I am trying to convert it into a recognizable date format, however no luck:
as.Date(dates,"%Y-%m")
[1] NA NA NA NA NA NA
I suspect that the problem is that there is no day specified.
Any thoughts on how this can be solved?
If we need to convert to Date class, it needs a day. So, we can paste with one of the days of interest, say 1, and use as.Date
as.Date(paste0(dates, "-01"))
The zoo package has a nice interface for this, which allows storing year-month data and provides an as.Date method to coerce to a Date object. For example:
library("zoo")
dates <- c("2014-11", "2014-12", "2015-01", "2015-02", "2015-03", "2015-04")
The function to convert the character vector of year-months into a yearmon is as.yearmon. The second argument is the format of the date parts in the individual strings. Here I use
%Y for year with century
%m for the month as a decimal number
separated by a literal -
yrmo <- as.yearmon(dates, "%Y-%m")
This gives
> yrmo
[1] "Nov 2014" "Dec 2014" "Jan 2015" "Feb 2015" "Mar 2015" "Apr 2015"
This is actually the default, so you can leave off the format part entirely, e.g. yrmo <- as.yearmon(dates)
To convert to a Date class object, the as.Date method is used
> as.Date(yrmo)
[1] "2014-11-01" "2014-12-01" "2015-01-01" "2015-02-01" "2015-03-01"
[6] "2015-04-01"
This method has a second argument, frac, which, if specified, allows you to state how far through the month you want each resulting Date element to be (how many days as a fraction of the length of the month in days)
> as.Date(yrmo, frac = 0.5)
[1] "2014-11-15" "2014-12-16" "2015-01-16" "2015-02-14" "2015-03-16"
[6] "2015-04-15"
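The two extremes of frac are often the useful ones: the default of 0 gives the first day of the month and 1 gives the last. A short sketch on a two-element subset of the data:

```r
library(zoo)

yrmo <- as.yearmon(c("2014-11", "2015-02"), "%Y-%m")

# frac = 0 (the default) is the first day of each month
first_day <- as.Date(yrmo)

# frac = 1 is the last day, with month lengths handled for you
last_day <- as.Date(yrmo, frac = 1)
last_day
```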

Resources