Convert Excel numeric to date - r

I have a vector of numeric excel dates i.e.
date <- c(42963,42994,42903,42933,42964)
The output I am expecting when using the excel_numeric_to_date function from the janitor package and the as.yearmon function from the zoo package is:
as.yearmon(excel_numeric_to_date(date)) [1] "Aug 2016" "Sep 2016" "Jun 2017" "Jul 2017" "Aug 2017".
However, the conversion of the first two elements of the date vector is incorrect. The actual results are:
as.yearmon(excel_numeric_to_date(date)) [1] "Aug 2017" "Sep 2017" "Jun 2017" "Jul 2017" "Aug 2017"
I have tried the different options (modern and mac pre-2011) for the date_system argument of excel_numeric_to_date, but that does not help either.
The Excel version is 2010.

You can simply use as.Date and specify the origin, i.e.
as.Date(date, origin="1899-12-30")
#[1] "2017-08-16" "2017-09-16" "2017-06-17" "2017-07-17" "2017-08-17"
#or format it to your liking,
format(as.Date(date, origin="1899-12-30"), '%b %Y')
#[1] "Aug 2017" "Sep 2017" "Jun 2017" "Jul 2017" "Aug 2017"
This link gives quite a bit of information on this matter.

If you want to convert dates from Excel, you can use as.Date() with a specific origin. According to the documentation, "1900-01-01" is used as day 1 in Excel on Windows, but "this is complicated by Excel incorrectly treating 1900 as a leap year". So "1899-12-30" should be used as the origin for dates after 1901:
date <- c(42963,42994,42903,42933,42964)
This is the result of as.Date():
as.Date(date, origin = "1899-12-30")
[1] "2017-08-16" "2017-09-16" "2017-06-17" "2017-07-17" "2017-08-17"
You can then use zoo::as.yearmon() to get the expected outcome:
zoo::as.yearmon(as.Date(date, origin = "1899-12-30"))
[1] "Aug 2017" "Sep 2017" "Jun 2017" "Jul 2017" "Aug 2017"
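As a rough check that neither date system yields 2016 for these numbers, here is a minimal base R sketch. The helper name excel_serial_to_date is hypothetical (it is not part of janitor or zoo); the "1904-01-01" origin assumes the 1904 date system, in which serial 0 is 1904-01-01.

```r
# Hypothetical helper, not from janitor or zoo: convert Excel serial
# numbers to Date under either Excel date system.
excel_serial_to_date <- function(x, date_system = c("modern", "mac pre-2011")) {
  date_system <- match.arg(date_system)
  origin <- switch(date_system,
    "modern"       = "1899-12-30",  # Windows Excel, for dates after 1901
    "mac pre-2011" = "1904-01-01"   # assumed 1904 system, serial 0 = 1904-01-01
  )
  as.Date(x, origin = origin)
}

serials <- c(42963, 42994, 42903, 42933, 42964)
modern <- excel_serial_to_date(serials)                  # all fall in 2017
mac    <- excel_serial_to_date(serials, "mac pre-2011")  # 1462 days later, in 2021
```

Under both origins the five serials land in a single year, so the choice of date_system cannot turn the first two values into 2016.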

Type excel_numeric_to_date to look at the function's code and you'll see it's a wrapper for the line of code used by the other answers to this question: as.Date(date_num, origin = "1899-12-30").
So that's not the issue.
The underlying matter here is confusion about date formatting. You say you expect your first number 42963 to become "Aug 2016", and your last number 42964 to become "Aug 2017". The latter is just one more than the former, which shows up in the conversion - they should be a day apart, not a year apart as you are expecting:
> excel_numeric_to_date(c(42963, 42964))
[1] "2017-08-16" "2017-08-17" # as expected, they are one day apart
Perhaps the day and year fields are switched upstream in your data at the point where these get mapped to integer dates, and it was hard to tell here because of the values chosen.
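To see which serial number an August 2016 date would need, the conversion can be run in reverse. This is only a sketch in base R; the day of the month (16) is an assumption taken from the converted output above.

```r
excel_origin <- as.Date("1899-12-30")

# Serial number that would correspond to 16 August 2016 (assumed day).
expected_serial <- as.numeric(as.Date("2016-08-16") - excel_origin)

# The first value in the question is exactly one year (365 days) later.
observed_serial <- 42963
gap <- observed_serial - expected_serial
```

Here expected_serial is 42598 and gap is 365, so a value meant to be in August 2016 should be about 365 lower than the number stored in the data.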

Related

Negative months as difference between two dates

I am aware that there are some posts similar to this here on stackoverflow. But they did not directly address my issue. Here is my issue:
I have a variable called earliest_cr_line which contains dates as Jan-01. This is a string variable. I need to create a variable called "test" which should contain the difference between earliest_cr_line and Dec-2007 in months. To this end, I ran the following code:
library(zoo)
loan_data$earliest_cr_line_date <- as.yearmon(loan_data$earliest_cr_line, "%b-%y")
ref_date <- as.yearmon("Dec-07", "%b-%y")
loan_data$test <- round((as.Date(ref_date) -
as.Date(loan_data$earliest_cr_line_date))/(365.25/12))
However, the newly created variable test contains many negative numbers as well. I figured out that when converting earliest_cr_line from string to yearmon, R misinterpreted years which were before 1970. For example, yearmon converted Nov-60 into Nov 2060 instead of Nov 1960. That's what is causing the negative output. Any idea how I should approach this problem?
Thanks.
Date's integer is a day, making day-to-month determination inconsistent. yearmon's integer is a year, which makes a month just 1/12, a bit simpler to deal with. If you start with zoo's yearmon object, then I suggest you stick with it instead of trying to convert to/from R's Date object.
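A minimal sketch of the month difference computed directly on yearmon values, assuming the zoo package is installed and an English locale for the month abbreviations. The variable names are illustrative and the input years are chosen so that the century is unambiguous.

```r
library(zoo)

x   <- as.yearmon(c("Jan-01", "Mar-05", "Dec-07"), "%b-%y")
ref <- as.yearmon("Dec-07", "%b-%y")

# A yearmon is stored as year + (month - 1) / 12, so a difference in
# years times 12 is a difference in months; round() removes float noise.
months_between <- round(12 * (as.numeric(ref) - as.numeric(x)))
```

This gives 83, 33 and 0 months, with no dependence on the 365.25/12 approximation.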
Handling wrong years is an annoying Y2K problem ... while this below will generally work (assuming that everything you're looking at is in the past), I urge you to fix this problem at the source. (I am astounded that something somewhere still thinks that 2-digit years is acceptable. *shrug*)
vec <- c("Nov-60","Nov-70","Nov-71","Jan-01","Mar-05","Dec-07")
(out <- zoo::as.yearmon(vec, format="%b-%y"))
# [1] "Nov 2060" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
(wrongcentury <- as.integer(gsub(".* ", "", out)) > as.integer(format(Sys.Date(), "%Y")))
# [1] TRUE FALSE FALSE FALSE FALSE FALSE
vec[wrongcentury]
# [1] "Nov-60"
zoo::as.yearmon(gsub("-", "-19", vec[wrongcentury]), format = "%b-%Y")
# [1] "Nov 1960"
out[wrongcentury] <- zoo::as.yearmon(gsub("-", "-19", vec[wrongcentury]), format = "%b-%Y")
out
# [1] "Nov 1960" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
Edit: much more concise recommendation from G. Grothendieck:
out <- zoo::as.yearmon(vec, format="%b-%y")
out - 100 * (out > zoo::as.yearmon(Sys.Date()))
# [1] "Nov 1960" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
If your source data ever comes close to 1920, then this inferential solution will further break. (More reason to fix it at the source :-)
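The limitation can be made visible by passing the reference month explicitly instead of using Sys.Date(). This is a sketch only: fix_century is a hypothetical wrapper around the one-liner above, and the reference month 2018-06 is an arbitrary assumption.

```r
library(zoo)

# Hypothetical wrapper: shift any month that lies after `ref` back 100 years.
fix_century <- function(ym, ref = as.yearmon(Sys.Date())) {
  ym - 100 * (ym > ref)
}

ref   <- as.yearmon("2018-06", "%Y-%m")
raw   <- as.yearmon(c("Nov-60", "Nov-17"), "%b-%y")  # parsed as 2060 and 2017
fixed <- fix_century(raw, ref)
```

"Nov-60" is moved to 1960 because 2060 is in the future relative to ref, but "Nov-17" stays in 2017 even if the source meant 1917; nothing in the two-digit year can distinguish the two.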

convert "2014-05" into date format as "May 2014" for display in ggplot in R

I have date in this character format "2017-03" and I want to convert it in "March 2017" for display in ggplot in R. But when I try to convert it using as.Date("2017-03","%Y-%m") it gives NA
You can consider using zoo::as.yearmon function as:
library(zoo)
#Sample data
v <- c("2014-05", "2017-03")
as.yearmon(v, "%Y-%m")
#[1] "May 2014" "Mar 2017"
#if you want the month name to be in full. Then you can format yearmon type as
format(as.yearmon(v, "%Y-%m"), "%B %Y")
#[1] "May 2014" "March 2017"
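The NA in the question comes from the missing day: as.Date() needs a complete date. A base R sketch that avoids zoo is to append an assumed day (the first of the month) before parsing; the month names produced by %B follow the current locale.

```r
v <- c("2014-05", "2017-03")

# Year and month alone are not a complete date, so this is NA.
incomplete <- as.Date("2017-03", "%Y-%m")

# Assumption: treat each value as the first day of its month.
d <- as.Date(paste0(v, "-01"))

# Character labels such as "March 2017" in an English locale.
labels <- format(d, "%B %Y")
```

If the column is kept as a Date, ggplot2's scale_x_date(date_labels = "%B %Y") can produce the same labels on the axis while preserving chronological order.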
Dates can be parsed in both directions. The conversion you asked about is covered by MKR's answer, quoted here:
Use the zoo package:
library(zoo)
date <- "2017-03"
as.yearmon(date, "%Y-%m")
#[1] "Mar 2017"
format(as.yearmon(date, "%Y-%m"), "%B %Y")
#[1] "March 2017"
If you want to parse March 2017 or other similar formats back to 2017-03:
Use the readr package, whose parse_date() function takes a format string and an optional locale:
library(readr)
DATE <- "March 1 2017"
parse_date(DATE, "%B %d %Y")
#[1] "2017-03-01"
Or if you are parsing dates with foreign language:
foreign_date <- "1 janvier 2018"
parse_date(foreign_date, "%d %B %Y", locale = locale("fr"))
#[1] "2018-01-01"
By using locale = locale("language") you can parse dates with foreign month names into standard dates. Use this to list the available languages:
date_names_langs()
-Format:
-Year: %Y(4 digits) %y(2 digits; 00-69->2000-2069, 70-99 -> 1970-1999)
-Month: %m (2 digits), %b (abbreviation: Jan), %B full name January
-Day: %d (2 digits)

Converting character to date R

I am running into some date issues when working with Dates in R.
Here's my situation.
I'm working on a dataset with a column date (ProjectDate) having the following values
class(Dataset$ProjectDate)
"character"
head(Dataset$ProjectDate)
"End July 2014" "End August 2014" "End September 2014" "End October 2014"
I would like to convert it to a month-year format ("%B %Y").
How can I do that ?
Thanks
You can do this as a two-step process. First remove the End part from ProjectDate using sub.
Then apply as.yearmon from the zoo library to convert to a month-year date format.
library(zoo)
as.yearmon(sub("^End ", "", df$ProjectDate), "%b %Y")
#[1] "Aug 2014" "Sep 2014"
Try the following.
First, the data.
x <- scan(what = character(),
text = '"End July 2014" "End August 2014"
"End September 2014" "End October 2014"')
Now the conversion to dates. Note that your dates do not have a day, so I replace "End" by day "1".
as.Date(sub("^[[:alpha:]]+", "1", x), "%d %B %Y")
#[1] "2014-07-01" "2014-08-01" "2014-09-01" "2014-10-01"
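If "End" is meant literally and the last day of each month is wanted, one possible base R sketch is to step to the first day of the next month and subtract one day. The helper name month_end is hypothetical, and parsing the month names with %B assumes an English locale.

```r
x <- c("End July 2014", "End August 2014",
       "End September 2014", "End October 2014")

# First day of each month, as in the answer above.
first <- as.Date(sub("^[[:alpha:]]+", "1", x), "%d %B %Y")

# Hypothetical helper: last day of the month for first-of-month dates.
month_end <- function(first_of_month) {
  next_first <- vapply(
    first_of_month,
    function(d) as.numeric(seq(d, by = "month", length.out = 2)[2]),
    numeric(1)
  )
  as.Date(next_first - 1, origin = "1970-01-01")
}

last <- month_end(first)
```

For the four sample values this yields 2014-07-31, 2014-08-31, 2014-09-30 and 2014-10-31.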

Convert numeric data such as "715" into Date "July-2015" in R

I have a question about converting numeric data into Date format.
I would like to convert the numeric data like:
time1<-c(715, 1212, 0416)
to
July-2015, Dec-2012, Apr-2016
I have tried this code, but it does not work:
time2<-as.Date(as.character(time1), format="%m%y")
Does anyone have some ideas to solve this issue?
Part of the issue is that "July 2015", "December 2012", and "April 2016" are not dates since the specific day is missing. Another approach is to convert to zoo::yearmon. Here, the numeric input needs to be converted to a string with leading zero so that the month is from 01 to 12:
library(zoo)
ym <- as.yearmon(sprintf("%04d",time1),format="%m%y")
ym
##[1] "Jul 2015" "Dec 2012" "Apr 2016"
The result is of class yearmon, which can then be coerced to Date:
class(ym)
##[1] "yearmon"
d <- as.Date(ym)
d
##[1] "2015-07-01" "2012-12-01" "2016-04-01"
class(d)
##[1] "Date"
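The zero-padding step is what makes the month and year separable. A base R sketch of the same idea without zoo follows; it assumes every two-digit year belongs to the 2000s, which holds for the sample values but is not stated in the question.

```r
time1 <- c(715, 1212, 0416)  # the leading zero of 0416 is lost: stored as 416

# Pad back to four characters so the layout is always mmyy.
padded <- sprintf("%04d", time1)

month <- as.integer(substr(padded, 1, 2))
year  <- 2000L + as.integer(substr(padded, 3, 4))  # assumption: years are 20xx

# Build ISO dates on the first of each month.
d <- as.Date(sprintf("%04d-%02d-01", year, month))
```

format(d, "%b-%Y") then gives labels such as "Jul-2015" in an English locale.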
Try lubridate::parse_date_time():
library(lubridate)
time2 <- parse_date_time(time1, orders = "my")
format.Date(time2, "%b-%Y")
[1] "juil.-2015" "déc.-2012" "avril-2016" # my locale lang is French

Reorder factor levels that are dates but only month and year in R

I'm using ggvis stacked barplots to create a graph where the x-axis is a time series of dates and the y-axis are frequencies.
The dates however, are in the format "Apr 2015" i.e. months and years.
In ggvis, this means that I have to make the dates in my dataframe factors.
I'm having trouble converting these month-year dates into factors that are ordered by their date. Instead, I get this order
"Apr 2015, Jun 2015, May 2015"
This is my code:
selecteddates <- with(dailyIndMelt, dailyIndMelt[date >= as.Date(input$monthlydateInd[1]) & date <= as.Date(input$monthlydateInd[2]),])
selecteddates <- aggregate(selecteddates[,3],by=list(selecteddates$variable,substr(selecteddates$date,1,7)),sum)
colnames(selecteddates) <- c("industry", "date", "vacancyno")
selecteddates$date <- as.Date(paste(selecteddates$date,"-28",sep=""))
selecteddates <- selecteddates[order(as.Date(selecteddates$date)),]
selecteddates$date <- format(selecteddates$date, "%m %Y")
selecteddates$date <- factor(selecteddates$date)
levels(selecteddates$date) <- levels(selecteddates$date)[order(as.Date(levels(selecteddates$date), format="%m %Y"))]
levels(selecteddates)
To get the levels in the order you want, you can feed the desired order to the levels argument of factor. For example:
selecteddates$date <- factor(selecteddates$date,
levels=paste(month.abb, rep(2014:2015, each=12)))
month.abb is a built-in vector of month abbreviations. The paste function pastes together the month abbreviations and the year values, so you get the correct ordering. I used years 2014 and 2015 for illustration. Just change that to whatever years appear in your data.
Here's the paste statement on its own for illustration:
paste(month.abb, rep(2014:2015, each=12))
[1] "Jan 2014" "Feb 2014" "Mar 2014" "Apr 2014" "May 2014" "Jun 2014" "Jul 2014" "Aug 2014"
[9] "Sep 2014" "Oct 2014" "Nov 2014" "Dec 2014" "Jan 2015" "Feb 2015" "Mar 2015" "Apr 2015"
[17] "May 2015" "Jun 2015" "Jul 2015" "Aug 2015" "Sep 2015" "Oct 2015" "Nov 2015" "Dec 2015"
Ideally, you can get the desired years programmatically, directly from your data. If your dates start out in a date format, you can do something analogous to this:
library(lubridate) # For year function
# Fake date data
dates = seq(as.Date("2010-01-15"), as.Date("2015-01-15"), by="1 day")
# Extract the years from the dates and use them to create the month-year levels
levels = paste(month.abb, rep(sort(unique(year(dates))), each=12))
In the code above, the sort ensures the years are in order, even if the dates are out of order.
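An alternative sketch that creates only the levels that actually occur in the data: sort the dates first, format them, and use the unique labels in that order as the levels. The sample dates are made up for illustration, and the month abbreviations follow the current locale.

```r
# Made-up dates, deliberately out of order.
dates <- as.Date(c("2015-06-28", "2015-04-28", "2015-05-28", "2015-04-28"))

labels <- format(dates, "%b %Y")

# Chronological level order comes from sorting the dates, not the labels.
ordered_levels <- unique(format(sort(dates), "%b %Y"))

f <- factor(labels, levels = ordered_levels)
```

In an English locale levels(f) is "Apr 2015", "May 2015", "Jun 2015" rather than the alphabetical order from the question.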

Resources