I have a data frame with a column of dates of the form YYYY/MM, stored as a factor, and I wish to convert it to Date class, e.g. 2000/01 -> Jan 2000.
I note that as.Date() is unable to handle date formats without the day component. I have tried using the as.yearmon() function from the zoo package.
library('zoo')
as.yearmon(factor("2000-01")) # It works with YYYY-MM format
# [1] "Jan 2000"
as.yearmon(factor("2000/01"))
# [1] NA
as.yearmon(factor("2000/01"),"%y/%m")
# [1] NA
I'm looking for a function that will turn factor("2000/01") to "Jan 2000". Any help would be kindly appreciated.
If as.Date has a problem with the day of month not being present, then for your purposes you can temporarily feed it with any day:
# Generate 10 "YYYY/MM"
n <- 10
our_dates <- paste(sample(1000:2000, n), sample(11:12, n, replace = TRUE), sep = "/")
our_dates
[1] "1027/12" "1657/12" "1180/11" "1646/12" "1012/12" "1684/12" "1693/11" "1835/11"
[9] "1916/11" "1073/12"
# Dirty fix, add a "day of month" to our dates
our_dates <- paste0(our_dates, "/01")
our_dates
[1] "1027/12/01" "1657/12/01" "1180/11/01" "1646/12/01" "1012/12/01" "1684/12/01"
[7] "1693/11/01" "1835/11/01" "1916/11/01" "1073/12/01"
# Format as dates
x <- as.Date(our_dates,"%Y/%m/%d")
# Now print out in your format:
format(x, format = "%b %Y")
[1] "Dec 1027" "Dec 1657" "Nov 1180" "Dec 1646" "Dec 1012" "Dec 1684" "Nov 1693"
[8] "Nov 1835" "Nov 1916" "Dec 1073"
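Applied to the input from the question, the same idea can be sketched as follows. The vector ym is made up for illustration and assumes a factor of "YYYY/MM" strings. Note that the format needs %Y (four-digit year); the %y in the question's last attempt expects a two-digit year, which is why it returned NA. With zoo loaded, as.yearmon(ym, "%Y/%m") should work for the same reason.

```r
# Assumption: a factor of "YYYY/MM" strings, as described in the question.
ym <- factor(c("2000/01", "2013/11"))

# Convert the factor to character, append a dummy day, then parse.
d <- as.Date(paste0(as.character(ym), "/01"), format = "%Y/%m/%d")
d

# Month-year labels; the month abbreviations depend on the current locale.
format(d, "%b %Y")
```

The result d is a real Date vector, so it sorts and subtracts correctly; only the printed label drops the day.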
I have a data frame that has the date column as a char class. I've tried parsing with as.Date but the number of NAs is worrisome. The dates are in the following formats: "2003-10-19" and "October 05, 2018"
date <- c("October 05, 2018", "2003-10-19")
as.Date(date) # this is what I tried, but most of my results came back as NA
Here is an option:
date <- c("October 05, 2018", "2003-10-19", "10/9/95", "6 Oct.2010")
lubridate::parse_date_time(date, orders = c("mdy", "ymd", "dmy"))
#> [1] "2018-10-05 UTC" "2003-10-19 UTC" "1995-10-09 UTC" "2010-10-06 UTC"
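as.Date has an argument called tryFormats. It is not vectorized (one format, chosen by trying each candidate on the first non-NA element, is applied to the whole vector), but it can be applied element by element with e.g. lapply.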
date <- c("October 05, 2018", "2003-10-19", "02/04/20", "11/09/2002",
"14.05.2021", "Nov 1, 2022", "March 1, 2004")
lapply(date, as.Date, tryFormats=c("%Y-%m-%d", "%B %d, %Y", "%d/%m/%y",
"%m/%d/%Y", "%d.%m.%Y", "%b %d, %Y"))
[[1]]
[1] "2018-10-05"
[[2]]
[1] "2003-10-19"
[[3]]
[1] "2020-04-02"
[[4]]
[1] "2020-09-11"
[[5]]
[1] "2021-05-14"
[[6]]
[1] "2022-11-01"
[[7]]
[1] "2004-03-01"
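Two caveats about the output above: the result is a list rather than a Date vector, and the fourth element ("11/09/2002") came back as 2020-09-11 because it matched "%d/%m/%y" first and the trailing "02" was ignored. A minimal sketch of an alternative, assuming you can order the formats from most to least specific; parse_mixed is a hypothetical helper, not part of base R:

```r
# Hypothetical helper: try each format in turn, filling only the
# elements that are still NA, and return a single Date vector.
parse_mixed <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Numeric-only formats are used here so the result does not depend on locale.
date <- c("2003-10-19", "14.05.2021", "11/09/2002")
res <- parse_mixed(date, c("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y"))
res
```

Strings such as "02/04/20" remain genuinely ambiguous (day/month order, two-digit year), so no ordering of formats can resolve them without knowing the source convention.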
I have a column with strings of the form - "Nov - 16", "Apr - 18" and I would like R to recognize this as date columns.
I've tried using as.Date and zoo. The former gave me a bunch of NAs and the latter threw an error.
Summing up the comments so far:
vec <- c("Nov - 16", "Apr - 18")
(o1 <- as.Date(paste("01", vec), format = "%d %b - %y"))
# [1] "2016-11-01" "2018-04-01"
(o2 <- lubridate::dmy(paste("01", vec)))
# [1] "2016-11-01" "2018-04-01"
(o3 <- zoo::as.yearmon(vec, "%b - %y"))
# [1] "Nov 2016" "Apr 2018"
It should be noted that the first two produce objects of class Date, and the third returns class yearmon, and their underlying numeric values differ:
dput(o1)
# structure(c(17106, 17622), class = "Date")
dput(o2)
# structure(c(17106, 17622), class = "Date")
dput(o3)
# structure(c(2016.83333333333, 2018.25), class = "yearmon")
though you can always convert from the third if need be, as suggested by @RonakShah.
as.Date(o3)
# [1] "2016-11-01" "2018-04-01"
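To make the difference between the two numeric representations concrete, here is a small base R sketch. It assumes, as the dput output above suggests, that a yearmon value is year + (month - 1)/12, while a Date is a count of days since 1970-01-01. The name ym_num is introduced only for illustration.

```r
d <- as.Date(c("2016-11-01", "2018-04-01"))

# A Date is stored as days since 1970-01-01.
unclass(d)

# Rebuild the yearmon-style number by hand from the date components:
# POSIXlt stores years since 1900 and months as 0-11.
lt <- as.POSIXlt(d)
ym_num <- (lt$year + 1900) + lt$mon / 12
ym_num
```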
I am aware that there are some posts similar to this here on stackoverflow. But they did not directly address my issue. Here is my issue:
I have a variable called earliest_cr_line which contains dates as Jan-01. This is a string variable. I need to create a variable called "test" which should contain the difference between earliest_cr_line and Dec-2007 in months. To this end, I ran the following code:
library(zoo)
loan_data$earliest_cr_line_date <- as.yearmon(loan_data$earliest_cr_line, "%b-%y")
ref_date <- as.yearmon("Dec-07", "%b-%y")
loan_data$test <- round((as.Date(ref_date) -
as.Date(loan_data$earliest_cr_line_date))/(365.25/12))
However, the newly created variable test contains many negative numbers as well. I figured out that when converting earliest_cr_line from string to yearmon, R misinterpreted years which were before 1970. For example, yearmon converted Jan-60 into Jan 2060 instead of Jan 1960. That's what is causing the negative output. Any idea how I should approach this problem?
Thanks.
A Date is stored as a count of days, so turning a difference in days into months is inexact because months vary in length. A yearmon is stored as a year plus a fraction, so one month is exactly 1/12, which is a bit simpler to deal with. If you start with zoo's yearmon object, then I suggest you stick with it instead of trying to convert to/from R's Date object.
Handling wrong years is an annoying Y2K problem ... while this below will generally work (assuming that everything you're looking at is in the past), I urge you to fix this problem at the source. (I am astounded that something somewhere still thinks that 2-digit years are acceptable. *shrug*)
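As an illustration of counting whole months instead of dividing days by 365.25/12, here is a minimal base R sketch. months_between is a hypothetical helper and the input dates are made up; with yearmon objects the equivalent would be 12 times the difference of the two values.

```r
# Hypothetical helper: whole calendar months from `from` to `to`,
# computed from year and month components rather than from days.
months_between <- function(from, to) {
  f <- as.POSIXlt(from)
  t <- as.POSIXlt(to)
  (t$year - f$year) * 12 + (t$mon - f$mon)
}

earliest <- as.Date(c("2001-01-01", "1960-11-01"))
ref_date <- as.Date("2007-12-01")
res <- months_between(earliest, ref_date)
res
```

Because only year and month enter the calculation, the result is an exact month count and needs no rounding.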
vec <- c("Nov-60","Nov-70","Nov-71","Jan-01","Mar-05","Dec-07")
(out <- zoo::as.yearmon(vec, format="%b-%y"))
# [1] "Nov 2060" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
(wrongcentury <- as.integer(gsub(".* ", "", out)) > as.integer(format(Sys.Date(), "%Y")))
# [1] TRUE FALSE FALSE FALSE FALSE FALSE
vec[wrongcentury]
# [1] "Nov-60"
zoo::as.yearmon(gsub("-", "-19", vec[wrongcentury]), format = "%b-%Y")
# [1] "Nov 1960"
out[wrongcentury] <- zoo::as.yearmon(gsub("-", "-19", vec[wrongcentury]), format = "%b-%Y")
out
# [1] "Nov 1960" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
Edit: much more concise recommendation from G. Grothendieck:
out <- zoo::as.yearmon(vec, format="%b-%y")
out - 100 * (out > zoo::as.yearmon(Sys.Date()))
# [1] "Nov 1960" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
If your source data ever reaches back a full century (around 1920 at the time of writing), then this inferential solution will break further. (More reason to fix it at the source :-)
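The same inference can be written directly on the two-digit years, which makes the cutoff explicit. This is only a sketch: expand_year and its max_year parameter are hypothetical names, and it assumes no date lies in the future. For reference, ?strptime documents that %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx.

```r
# Hypothetical helper: expand two-digit years so that none exceeds max_year.
# Anything that would land after max_year is pushed back one century.
expand_year <- function(yy, max_year = as.integer(format(Sys.Date(), "%Y"))) {
  full <- 2000L + as.integer(yy)
  ifelse(full > max_year, full - 100L, full)
}

# max_year is fixed here so the example gives the same answer every year.
yrs <- expand_year(c(60, 70, 1, 7), max_year = 2020L)
yrs
```

With max_year = 2020, a true 1919 written as "19" would be expanded to 2019, which is the ambiguity warned about above.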
I would like to ask a question about converting numeric data into Date format.
I would like to convert the numeric data like:
time1<-c(715, 1212, 0416)
to
July-2015, Dec-2012, Apr-2016
I have tried this code, but it is not working.
time2<-as.Date(as.character(time1), format="%m%y")
Does anyone have some ideas to solve this issue?
Part of the issue is that "July 2015", "December 2012", and "April 2016" are not dates since the specific day is missing. Another approach is to convert to zoo::yearmon. Here, the numeric input needs to be converted to a string with leading zero so that the month is from 01 to 12:
library(zoo)
ym <- as.yearmon(sprintf("%04d",time1),format="%m%y")
ym
##[1] "Jul 2015" "Dec 2012" "Apr 2016"
The result is of class yearmon, which can then be coerced to Date:
class(ym)
##[1] "yearmon"
d <- as.Date(ym)
d
##[1] "2015-07-01" "2012-12-01" "2016-04-01"
class(d)
##[1] "Date"
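If you would rather not depend on zoo, the same two steps (pad to four digits, then supply a day) can be sketched in base R. The names padded and d2 are introduced for illustration only.

```r
time1 <- c(715, 1212, 416)   # 0416 is stored as the number 416

# Restore the leading zero so every value is exactly "mmyy".
padded <- sprintf("%04d", as.integer(time1))

# Prepend a dummy day so as.Date has a complete date to parse.
d2 <- as.Date(paste0("01", padded), format = "%d%m%y")
d2
```

format(d2, "%b-%Y") then gives month-year labels, with month names in the current locale.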
Try lubridate::parse_date_time():
library(lubridate)
time2 <- parse_date_time(time1, orders = "my")
format.Date(time2, "%b-%Y")
[1] "juil.-2015" "déc.-2012" "avril-2016" # my locale lang is French
I have some data that looks a bit like this:
require(zoo)
X <- rbind(c(date='20111001', fmt='%Y%m%d'),
c('20111031', '%Y%m%d'),
c('201110', '%Y%m'),
c('102011', '%m%Y'),
c('31/10/2011', '%d/%m/%Y'),
c('20111000', '%Y%m%d'))
print(X)
# date fmt
# [1,] "20111001" "%Y%m%d"
# [2,] "20111031" "%Y%m%d"
# [3,] "201110" "%Y%m"
# [4,] "102011" "%m%Y"
# [5,] "31/10/2011" "%d/%m/%Y"
# [6,] "20111000" "%Y%m%d"
I only want the year and month. I don't need the day, so I'm not worried that the final day is invalid. R, unfortunately, is:
mapply(as.yearmon, X[, 'date'], X[, 'fmt'], SIMPLIFY=FALSE)
# $`20111001`
# [1] "Oct 2011"
# $`20111031`
# [1] "Oct 2011"
# $`201110`
# [1] "Oct 2011"
# $`102011`
# [1] "Oct 2011"
# $`31/10/2011`
# [1] "Oct 2011"
# $`20111000`
# Error in charToDate(x) :
# character string is not in a standard unambiguous format
I know that the usual answer is to fix the day part of the date, e.g. using paste(x, '01', sep=''). I don't think that will work here, because I don't know in advance what the date format will be, and therefore I cannot set the day without converting to some sort of date object first.
Assuming the month always follows the year and is always two characters in your date, why not just extract the information with substr? Perhaps something like:
lapply(X[,'date'],
function(x) paste(month.abb[as.numeric(substr(x, 5, 6))], substr(x, 1, 4))
)
You don't need to specify the day in your format if you don't need it. Read ?strptime carefully. The second paragraph in the Details section says:
Each input string
is processed as far as necessary for the format specified: any
trailing characters are ignored.
So adjust your format and everything should work.
X <- rbind(c(date='20111001', fmt='%Y%m'),
c('20111031', '%Y%m'),
c('201110', '%Y%m'),
c('102011', '%m%Y'),
c('20111000', '%Y%m'))
mapply(as.yearmon, X[, 'date'], X[, 'fmt'], SIMPLIFY=FALSE)
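The "trailing characters are ignored" behaviour can be checked in base R as well. In this sketch a dummy day is put in front of the string rather than at the end, so the unwanted (and possibly invalid) day digits become the ignored tail. The vector x and the name d3 are illustrative.

```r
x <- c("20111001", "20111031", "20111000")

# "%d%Y%m" consumes the dummy "01", then the year and month;
# whatever follows the month is ignored by the parser.
d3 <- as.Date(paste0("01", x), format = "%d%Y%m")
d3
```

All three inputs, including the one with day "00", end up as the first of October 2011.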
Assuming that I'm always given a date (and never a time), and that any illegal 'day' is less than 61, I can guarantee a legal date as follows, by treating the supplied day as 'seconds' and replacing the supplied day with the 1st.
require(stringr)
safe_date <- str_c('01', X[, 'date'])
safe_fmt <- str_c('%d', str_replace(X[, 'fmt'], '%d', '%S'))
mapply(as.yearmon, safe_date, safe_fmt, SIMPLIFY=FALSE)
# $`0120111001`
# [1] "Oct 2011"
# $`0120111031`
# [1] "Oct 2011"
# $`01201110`
# [1] "Oct 2011"
# $`01102011`
# [1] "Oct 2011"
# $`0131/10/2011`
# [1] "Oct 2011"
# $`0120111000`
# [1] "Oct 2011"