Negative months as difference between two dates - r

I am aware that there are similar posts here on Stack Overflow, but they did not directly address my issue. Here it is:
I have a variable called earliest_cr_line which contains dates such as Jan-01. This is a string variable. I need to create a variable called "test" which should contain the difference between earliest_cr_line and Dec-2007 in months. To this end, I ran the following code:
library(zoo)
loan_data$earliest_cr_line_date <- as.yearmon(loan_data$earliest_cr_line, "%b-%y")
ref_date <- as.yearmon("Dec-07", "%b-%y")
loan_data$test <- round((as.Date(ref_date) -
as.Date(loan_data$earliest_cr_line_date))/(365.25/12))
However, the newly created variable test contains many negative numbers as well. I figured out that when converting earliest_cr_line from string to yearmon, R misinterpreted two-digit years before 69 as belonging to the 2000s. For example, yearmon converted Jan-60 into Jan 2060 instead of Jan 1960. That is what causes the negative output. Any idea how I should approach this problem?
Thanks.

Date's integer is a day, making day-to-month determination inconsistent. yearmon's integer is a year, which makes a month just 1/12, a bit simpler to deal with. If you start with zoo's yearmon object, then I suggest you stick with it instead of trying to convert to/from R's Date object.
Handling wrong years is an annoying Y2K problem ... while the code below will generally work (assuming that everything you're looking at is in the past), I urge you to fix this problem at the source. (I am astounded that something somewhere still thinks that 2-digit years are acceptable. *shrug*)
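As a minimal sketch of staying in yearmon for the month difference (assuming an English locale so that %b parses "Jan", and using two made-up input values rather than the real loan_data column):

```r
library(zoo)

# Hypothetical sample values in the same "Mon-yy" format as the question
cr_line <- as.yearmon(c("Jan-01", "Mar-05"), "%b-%y")
ref_date <- as.yearmon("Dec-07", "%b-%y")

# yearmon stores year + (month - 1)/12, so the difference is in years;
# multiply by 12 and round to get whole months
months_diff <- round(as.numeric(ref_date - cr_line) * 12)
months_diff
```

Here the two sample values come out as 83 and 33 months, with no day-count division involved.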
vec <- c("Nov-60","Nov-70","Nov-71","Jan-01","Mar-05","Dec-07")
(out <- zoo::as.yearmon(vec, format="%b-%y"))
# [1] "Nov 2060" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
(wrongcentury <- as.integer(gsub(".* ", "", out)) > as.integer(format(Sys.Date(), "%Y")))
# [1] TRUE FALSE FALSE FALSE FALSE FALSE
vec[wrongcentury]
# [1] "Nov-60"
zoo::as.yearmon(gsub("-", "-19", vec[wrongcentury]), format = "%b-%Y")
# [1] "Nov 1960"
out[wrongcentury] <- zoo::as.yearmon(gsub("-", "-19", vec[wrongcentury]), format = "%b-%Y")
out
# [1] "Nov 1960" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
Edit: much more concise recommendation from G. Grothendieck:
out <- zoo::as.yearmon(vec, format="%b-%y")
out - 100 * (out > zoo::as.yearmon(Sys.Date()))
# [1] "Nov 1960" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
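If your source data ever reaches back about 100 years from today (close to 1920 at the time of writing), then this inferential solution will break, because those years cannot be told apart from recent ones. (More reason to fix it at the source :-)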
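The same idea can be sketched with an explicit cutoff instead of Sys.Date(), which keeps the result reproducible when the data has a known latest month. The function name fix_century and the default cutoff of Dec 2007 are assumptions for illustration, not part of zoo; an English locale is assumed for %b.

```r
library(zoo)

# Hypothetical helper: parse "Mon-yy" strings and push any month that falls
# after `cutoff` back by one century. `cutoff` is an assumption you choose.
fix_century <- function(x, cutoff = as.yearmon("2007-12")) {
  ym <- as.yearmon(x, format = "%b-%y")
  ym - 100 * (ym > cutoff)
}

vec <- c("Nov-60", "Nov-70", "Jan-01", "Dec-07")
fixed <- fix_century(vec)
format(fixed, "%Y")
```

With this cutoff only "Nov-60" is moved, from 2060 to 1960; the other values keep the year that %y assigned.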

Related

Convert Excel numeric to date

I have a vector of numeric excel dates i.e.
date <- c(42963,42994,42903,42933,42964)
The output I am expecting when using the excel_numeric_to_date function from the janitor package and the as.yearmon function from the zoo package is:
as.yearmon(excel_numeric_to_date(date))
[1] "Aug 2016" "Sep 2016" "Jun 2017" "Jul 2017" "Aug 2017"
However, the conversion of the first two elements of the date vector is incorrect. The actual result is:
as.yearmon(excel_numeric_to_date(date))
[1] "Aug 2017" "Sep 2017" "Jun 2017" "Jul 2017" "Aug 2017"
I have tried different options (modern and mac pre-2011) for the date_system argument of excel_numeric_to_date, but that does not help either.
The Excel version is 2010.
You can simply use as.Date and specify the origin, i.e.
as.Date(date, origin="1899-12-30")
#[1] "2017-08-16" "2017-09-16" "2017-06-17" "2017-07-17" "2017-08-17"
#or format it to your liking,
format(as.Date(date, origin="1899-12-30"), '%b %Y')
#[1] "Aug 2017" "Sep 2017" "Jun 2017" "Jul 2017" "Aug 2017"
This link gives quite a bit of information on this matter.
If you want to convert dates from Excel, you can use as.Date() with a specific origin. According to the documentation, "1900-01-01" is used as day 1 in Excel on Windows, but "this is complicated by Excel incorrectly treating 1900 as a leap year". So "1899-12-30" should be used for dates post-1901:
date <- c(42963,42994,42903,42933,42964)
This is the result of as.Date():
as.Date(date, origin = "1899-12-30")
[1] "2017-08-16" "2017-09-16" "2017-06-17" "2017-07-17" "2017-08-17"
You can then use zoo::as.yearmon() to get the expected outcome:
zoo::as.yearmon(as.Date(date, origin = "1899-12-30"))
[1] "Aug 2017" "Sep 2017" "Jun 2017" "Jul 2017" "Aug 2017"
Type excel_numeric_to_date to look at the function's code and you'll see it's a wrapper for the line of code used by the other answers to this question: as.Date(date_num, origin = "1899-12-30").
So that's not the issue.
The underlying matter here is confusion about date formatting. You say you expect your first number 42963 to become "Aug 2016", and your last number 42964 to become "Aug 2017". The latter is just one more than the former, which shows up in the conversion - they should be a day apart, not a year apart as you are expecting:
> excel_numeric_to_date(c(42963, 42964))
[1] "2017-08-16" "2017-08-17" # as expected, they are one day apart
Perhaps the day and year fields are switched upstream in your data at the point where these get mapped to integer dates, and it was hard to tell here because of the values chosen.
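To make the day-versus-year point concrete, here is a small base R sketch. The serial 42598 is not from the question; it is an assumed value, chosen as 42963 minus 365, to show what a number that really maps to August 2016 would look like.

```r
# Excel serials on the Windows 1900 date system, converted with the same
# origin used in the answers above
serials <- c(42598, 42963, 42964)
dates <- as.Date(serials, origin = "1899-12-30")
format(dates)

# Gap in days between consecutive serials
gaps <- diff(serials)
gaps
```

The first gap is 365 days and the second is 1 day, so a value that should land in Aug 2016 has to be roughly 365 smaller than 42963, not one smaller than 42964.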

Error using if clause with yearmon data

library(zoo)
library(lubridate)
yearmon <- as.yearmon(c("01-10", "02-15", "03-30"), "%m-%y")
for (i in yearmon) {
if (year(yearmon[i]) > 2020) {
year(yearmon[i]) <- year(yearmon[i]) - 100
}}
Error in if (year(yearmon[i]) > 2020) { :
missing value where TRUE/FALSE needed
The idea is to take data with incorrect years > 2020, and put them back to 19XX form.
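Here, we can use ifelse instead. The loop fails because for (i in yearmon) iterates over the values (2010, 2015.083, ...) rather than over positions, so yearmon[i] is NA and the if condition has nothing to test. The assignment part won't work either: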
as.yearmon(paste(format(yearmon, "%b"),
ifelse(year(yearmon) > 2020, year(yearmon)-100, year(yearmon))))
#[1] "Jan 2010" "Feb 2015" "Mar 1930"
Zoo's yearmon objects are stored as year + fractional year, so you can simply subtract 100 from anything over 2020:
> yearmon <- as.yearmon(c("01-10", "02-15", "03-30"), "%m-%y")
> yearmon
[1] "Jan 2010" "Feb 2015" "Mar 2030"
> yearmon[yearmon > 2020] = yearmon[yearmon > 2020] - 100
> yearmon
[1] "Jan 2010" "Feb 2015" "Mar 1930"
This doesn't require lubridate or any format conversion.
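If a loop is preferred for some reason, a minimal sketch of a working version iterates over positions with seq_along() and compares the underlying numeric value. The variable name ym is used here only to avoid reusing yearmon as both a class name and an object name.

```r
library(zoo)

ym <- as.yearmon(c("01-10", "02-15", "03-30"), "%m-%y")

# Iterate over positions, not values, so ym[i] is always a valid element
for (i in seq_along(ym)) {
  if (as.numeric(ym[i]) > 2020) {
    ym[i] <- ym[i] - 100   # shift back one century
  }
}

format(ym, "%Y")
```

The vectorised one-liner above remains the shorter option; the loop is only shown to make the indexing mistake visible.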

Converting factor YYYY/MM to year-month class

I have a data frame with a column of dates of the form YYYY/MM, of class factor, and I wish to convert it to a date class. E.g. 2000/01 -> Jan 2000
I note that as.Date() is unable to handle date formats without the day component. I have tried using the as.yearmon() function from the zoo package.
library('zoo')
as.yearmon(factor("2000-01")) # It works with YYYY-MM format
# [1] "Jan 2000"
as.yearmon(factor("2000/01"))
# [1] NA
as.yearmon(factor("2000/01"),"%y/%m")
# [1] NA
I'm looking for a function that will turn factor("2000/01") to "Jan 2000". Any help would be kindly appreciated.
If as.Date has a problem with the day of month not being present, then for your purposes you can temporarily feed it with any day:
# Generate 10 "YYYY/MM"
n <- 10
our_dates <- paste(sample(1000:2000, n), sample(11:12, n, replace = TRUE), sep = "/")
our_dates
[1] "1027/12" "1657/12" "1180/11" "1646/12" "1012/12" "1684/12" "1693/11" "1835/11"
[9] "1916/11" "1073/12"
# Dirty fix, add a "day of month" to our dates
our_dates <- paste0(our_dates, "/01")
our_dates
[1] "1027/12/01" "1657/12/01" "1180/11/01" "1646/12/01" "1012/12/01" "1684/12/01"
[7] "1693/11/01" "1835/11/01" "1916/11/01" "1073/12/01"
# Format as dates
x <- as.Date(our_dates,"%Y/%m/%d")
# Now print out in your format:
format(x, format = "%b %Y")
[1] "Dec 1027" "Dec 1657" "Nov 1180" "Dec 1646" "Dec 1012" "Dec 1684" "Nov 1693"
[8] "Nov 1835" "Nov 1916" "Dec 1073"
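For completeness, a short sketch of the zoo route from the question: the attempt with "%y/%m" returns NA because %y expects a two-digit year, while a four-digit year needs %Y. The sample values below are made up for illustration.

```r
library(zoo)

# Hypothetical factor column in YYYY/MM form
x <- factor(c("2000/01", "2015/12"))

# %Y (four-digit year) instead of %y (two-digit year)
ym <- as.yearmon(as.character(x), "%Y/%m")
format(ym, "%Y-%m")
```

The result is a yearmon vector, so it prints as month and year and still sorts chronologically.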

Convert numeric data such as "715" into Date "July-2015" in R

I would like to ask a question about converting numeric data into Date format.
I would like to convert numeric data like:
time1<-c(715, 1212, 0416)
to
July-2015, Dec-2012, Apr-2016
I have tried this code, but it is not working.
time2<-as.Date(as.character(time1), format="%m%y")
Does anyone have some ideas to solve this issue?
Part of the issue is that "July 2015", "December 2012", and "April 2016" are not dates, since the specific day is missing. Another approach is to convert to zoo::yearmon. Here, the numeric input needs to be converted to a string with a leading zero so that the month is from 01 to 12:
library(zoo)
ym <- as.yearmon(sprintf("%04d",time1),format="%m%y")
ym
##[1] "Jul 2015" "Dec 2012" "Apr 2016"
The result is of class yearmon, which can then be coerced to Date:
class(ym)
##[1] "yearmon"
d <- as.Date(ym)
d
##[1] "2015-07-01" "2012-12-01" "2016-04-01"
class(d)
##[1] "Date"
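A base R alternative, sketched here under the assumption that every two-digit year belongs to the 2000s, splits the number arithmetically instead of going through a padded string:

```r
time1 <- c(715, 1212, 0416)

# Leading digits are the month, last two digits are the year
month <- time1 %/% 100
year <- 2000 + time1 %% 100   # assumption: all years are 20yy

# Use the first of the month as a placeholder day
d <- as.Date(sprintf("%04d-%02d-01", year, month))
format(d)
```

This avoids the leading-zero problem entirely, since 0416 is stored as 416 and integer division still yields month 4.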
Try lubridate::parse_date_time():
library(lubridate)
time2 <- parse_date_time(time1, orders = "my")
format.Date(time2, "%b-%Y")
[1] "juil.-2015" "déc.-2012" "avril-2016" # my locale lang is French

Reorder factor levels that are dates but only month and year in R

I'm using ggvis stacked barplots to create a graph where the x-axis is a time series of dates and the y-axis shows frequencies.
The dates however, are in the format "Apr 2015" i.e. months and years.
In ggvis, this means that I have to make the dates in my dataframe factors.
I'm having trouble converting these month-year dates into factors that are ordered by their date. Instead, I get this order
"Apr 2015, Jun 2015, May 2015"
This is my code:
selecteddates <- with(dailyIndMelt, dailyIndMelt[date >= as.Date(input$monthlydateInd[1]) & date <= as.Date(input$monthlydateInd[2]),])
selecteddates <- aggregate(selecteddates[,3],by=list(selecteddates$variable,substr(selecteddates$date,1,7)),sum)
colnames(selecteddates) <- c("industry", "date", "vacancyno")
selecteddates$date <- as.Date(paste(selecteddates$date,"-28",sep=""))
selecteddates <- selecteddates[order(as.Date(selecteddates$date)),]
selecteddates$date <- format(selecteddates$date, "%m %Y")
selecteddates$date <- factor(selecteddates$date)
levels(selecteddates$date) <- levels(selecteddates$date)[order(as.Date(levels(selecteddates$date), format="%m %Y"))]
levels(selecteddates)
To get the levels in the order you want, you can feed the desired order to the levels argument of factor. For example:
selecteddates$date <- factor(selecteddates$date,
levels=paste(month.abb, rep(2014:2015, each=12)))
month.abb is a built-in vector of month abbreviations. The paste function pastes together the month abbreviations and the year values, so you get the correct ordering. I used years 2014 and 2015 for illustration. Just change that to whatever years appear in your data.
Here's the paste statement on its own for illustration:
paste(month.abb, rep(2014:2015, each=12))
[1] "Jan 2014" "Feb 2014" "Mar 2014" "Apr 2014" "May 2014" "Jun 2014" "Jul 2014" "Aug 2014"
[9] "Sep 2014" "Oct 2014" "Nov 2014" "Dec 2014" "Jan 2015" "Feb 2015" "Mar 2015" "Apr 2015"
[17] "May 2015" "Jun 2015" "Jul 2015" "Aug 2015" "Sep 2015" "Oct 2015" "Nov 2015" "Dec 2015"
Ideally, you can get the desired years programmatically, directly from your data. If your dates start out in a date format, you can do something analogous to this:
library(lubridate) # For year function
# Fake date data
dates = seq(as.Date("2010-01-15"), as.Date("2015-01-15"), by="1 day")
# Extract the years from the dates and use them to create the month-year levels
levels = paste(month.abb, rep(sort(unique(year(dates))), each=12))
In the code above, the sort ensures the years are in order, even if the dates are out of order.
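A variation on the same idea, as a sketch with made-up dates: if only the month-year combinations that actually occur should become levels, the labels can be ordered by the underlying Date values before building the factor. month.abb is used for the labels so that the result does not depend on the locale.

```r
# Hypothetical dates, deliberately out of order and with a duplicate month
dates <- as.Date(c("2015-06-28", "2015-04-28", "2015-05-28", "2015-04-28"))

# Build "Mon YYYY" labels from the month number and the year
labels <- paste(month.abb[as.integer(format(dates, "%m"))], format(dates, "%Y"))

# Order labels chronologically, then keep each one once
lev <- unique(labels[order(dates)])
date_factor <- factor(labels, levels = lev)
levels(date_factor)
```

The levels come out as Apr 2015, May 2015, Jun 2015 rather than in alphabetical order, and months absent from the data produce no empty bars.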
