Convert character YYYY-MM-00 into date YYYY-MM in R - r

I imported Excel data into R and I am having trouble converting the dates.
In R, my data are character strings and look like this:
date<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
I would like to convert the character strings into dates (MM/YYYY), but the '00' used for the day is a problem and NA is returned every time.
It works when I manually replace '00' with '01' and then use as.yearmon, ymd and format. But I have lots of dates to change and I don't know how to replace all my '00' with '01' in R.
# data example
date1<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
# removing time -> doesn't work because of the '00' day
date1c<-format(strptime(date1, format = "%Y-%m-%d"), "%Y/%m/%d")
date1c<-format(strptime(date1, format = '%Y-%m'), '%Y/%m')
# trying to convert character into date -> doesn't work either
date1c<-ymd(date1)
date1c<-strptime(date1, format = "%Y-%m-%d %H:%M:%S")
date1c<-as.Date(date1, format="%Y-%m-%d %H:%M:%S")
date1c<-as.yearmon(date1, format='%Y%m')
# everything works if days are '01'
date2<-c('1971-02-01 00:00:00', '1979-06-01 00:00:00')
date2c<-as.yearmon(ymd(format(strptime(date2, format = "%Y-%m-%d"), "%Y/%m/%d")))
date2c
If you have an idea how to do this, or another way to solve my problem, I would be thankful!

Use gsub to replace -00 with -01.
date1<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
date1 <- gsub("-00", "-01", date1)
date1c <-format(strptime(date1, format = "%Y-%m-%d"), "%Y/%m/%d")
> date1c
[1] "1971/02/01" "1979/06/01"
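The replacement above can also be written with an anchored pattern, so that only the day field directly after YYYY-MM is touched. This is a minimal sketch in base R; the third input value is a hypothetical addition to show that a valid day is left alone.

```r
# Sample input; the last element is a hypothetical value with a valid day
date1 <- c("1971-02-00 00:00:00", "1979-06-00 00:00:00", "1985-11-15 00:00:00")

# Replace "00" only when it is the day component right after "YYYY-MM"
fixed <- sub("^(\\d{4}-\\d{2})-00", "\\1-01", date1)
fixed
## [1] "1971-02-01 00:00:00" "1979-06-01 00:00:00" "1985-11-15 00:00:00"

# The repaired strings now parse as ordinary dates
dates <- as.Date(fixed, format = "%Y-%m-%d")
```

Anchoring with ^ is a design choice rather than a requirement here: the plain "-00" pattern also works on this data because the time part contains no hyphen.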

Another possibility could be:
as.Date(paste0(substr(date1, 1, 9), "1"), format = "%Y-%m-%d")
[1] "1971-02-01" "1979-06-01"
Here it extracts the first nine characters, pastes it together with 1 and then converts it into a date object.

These alternatives each accept a vector input and produce a vector as output.
Date output
All of these accept a vector as input and produce a Date vector as output.
# 1. replace first occurrence of '00 ' with '01 ' and then convert to Date
as.Date(sub("00 ", "01 ", date1))
## [1] "1971-02-01" "1979-06-01"
# 2. convert to yearmon class and then to Date
library(zoo)
as.Date(as.yearmon(date1, "%Y-%m"))
## [1] "1971-02-01" "1979-06-01"
# 3. insert a 1 and then convert to Date
as.Date(paste(1, date1), "%d %Y-%m")
## [1] "1971-02-01" "1979-06-01"
yearmon output
Note that if you really are trying to represent just months and years then yearmon class directly represents such objects without the kludge of using an unused day of the month. Such objects are internally represented as a year plus a fraction of a year, i.e. year + 0 for January, year + 1/12 for February, etc. They display in a meaningful way, they sort in the expected manner and can be manipulated, e.g. take the difference between two such objects or add 1/12 to get the next month, etc. As with the others it takes a vector in and produces a vector out.
library(zoo)
as.yearmon(date1, "%Y-%m")
## [1] "Feb 1971" "Jun 1979"
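The arithmetic described above can be tried out directly. The following is a small sketch, assuming the zoo package is installed and reusing the question's two dates; it shows the numeric representation, a one-month shift, and the gap between the two values in months.

```r
library(zoo)

date1 <- c("1971-02-00 00:00:00", "1979-06-00 00:00:00")
ym <- as.yearmon(date1, "%Y-%m")

# Internal representation: year + (month - 1) / 12
as.numeric(ym)

# Adding 1/12 moves each value one month forward
next_month <- ym + 1/12
format(next_month, "%Y-%m")
## [1] "1971-03" "1979-07"

# Gap between the two values, expressed in whole months
months_apart <- round(12 * (as.numeric(ym[2]) - as.numeric(ym[1])))
months_apart
## [1] 100
```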
character output
If you want character output rather than Date or yearmon output then these variations work and again accept a vector as input and produce a vector as output:
# 1. replace -00 and everything after that with a string having 0 characters
sub("-00.*", "", date1)
## [1] "1971-02" "1979-06"
# 2. convert to yearmon and then format that
library(zoo)
format(as.yearmon(date1, "%Y-%m"), "%Y-%m")
## [1] "1971-02" "1979-06"
# 3. convert to Date class and then format that
format(as.Date(paste(1, date1), "%d %Y-%m"), "%Y-%m")
## [1] "1971-02" "1979-06"
# 4. pick off the first 7 characters
substring(date1, 1, 7)
## [1] "1971-02" "1979-06"

Related

Converting non-standard date format strings ("April-20") to date objects R

I have a vector of date strings in the form month_name-2_digit_year i.e.
a = rbind("April-21", "March-21", "February-21", "January-21")
I'm trying to convert that vector into a vector of date objects. I'm aware this question is very similar to this: Convert non-standard date format to date in R posted some years ago, but unfortunately, it has not answered my question.
I have tried the following as.Date() calls to do this, but it just returns a vector of NA. I.e.
b = as.Date(a, format = "%B-%y")
b = as.Date(a, format = "%B%y")
b = as.Date(a, "%B-%y")
b = as.Date(a, "%B%y")
I've also attempted to do it using the convertToDate function from the openxlsx package:
b = convertToDate(a, format = "%B-%y")
I have also tried all the above but using a single character string rather than a vector, but that produced the same issue.
I'm a little lost as to why this isn't working, as this format has worked in reverse earlier in my script (that is, I had a date object already in dd-mm-yyyy format and converted it to month_name-yy using %B-%y). Is there another way to go from string to date when the string is in a non-standard date format (anything other than dd-mm-yyyy, or mm-dd-yy if you're in the US)?
For the record my R locales are all UK and english.
Thanks in advance.
A Date must have all three of day, month and year. Convert to yearmon class which requires only month and year and then to Date as in (1) and (2) below or add the day as in (3).
(1) and (3) give first of month and (2) gives the end of the month.
(3) uses only functions from base R.
Also consider not converting to Date at all but just use yearmon objects instead since they directly represent a year and month which is what the input represents.
library(zoo)
# test input
a <- c("April-21", "March-21", "February-21", "January-21")
# 1
as.Date(as.yearmon(a, "%B-%y"))
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# 2
as.Date(as.yearmon(a, "%B-%y"), frac = 1)
## [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
# 3
as.Date(paste(1, a), "%d %B-%y")
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
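Since %B depends on the session locale, a locale-independent variant of (3) may be useful when the month names are known to be English. This is only a sketch in base R: it looks the names up in the built-in month.name constant and assumes every two-digit year belongs to the 2000s.

```r
a <- c("April-21", "March-21", "February-21", "January-21")

# Split "Month-yy" into its two parts
parts <- strsplit(a, "-", fixed = TRUE)
month_name <- vapply(parts, `[`, character(1), 1)
year_2digit <- as.integer(vapply(parts, `[`, character(1), 2))

# Look up the month number in the English month names (NA if not found)
month_num <- match(month_name, month.name)

# Assumption: all two-digit years are in the 2000s
first_of_month <- as.Date(sprintf("%04d-%02d-01", 2000L + year_2digit, month_num),
                          format = "%Y-%m-%d")
first_of_month
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
```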
In addition to zoo, which @G. Grothendieck mentioned, you can also use clock or lubridate.
clock supports a variable precision calendar type called year_month_day. In this case you'd want "month" precision, then you can set the day to whatever you'd like and convert back to Date.
library(clock)
x <- c("April-21", "March-21", "February-21", "January-21")
ymd <- year_month_day_parse(x, format = "%B-%y", precision = "month")
ymd
#> <year_month_day<month>[4]>
#> [1] "2021-04" "2021-03" "2021-02" "2021-01"
# First of month
as.Date(set_day(ymd, 1))
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# End of month
as.Date(set_day(ymd, "last"))
#> [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
The simplest solution may be to use lubridate::my(), which parses strings in the order of "month then year". That assumes that you want the first day of the month, which may or may not be correct for you.
library(lubridate)
x <- c("April-21", "March-21", "February-21", "January-21")
# Assumes first of month
my(x)
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"

How to parse an invalid date with lubridate?

I need to parse dates and have cases like "31/02/2018":
library(lubridate)
> dmy("31/02/2018", quiet = T)
[1] NA
This makes sense as the 31st of Feb does not exist. Is there a way to parse the string "31/02/2018" to e.g. 2018-02-28 ? So not to get an NA, but an actual date?
Thanks.
We can write a function assuming you would only have dates which could be higher than the actual date and would have the same format always.
library(lubridate)
get_correct_date <- function(example_date) {
  # Split the string on "/" to get 3 components (day, month, year)
  vecs <- as.numeric(strsplit(example_date, "\\/")[[1]])
  # Number of days in that month; building a date from the year and month
  # lets days_in_month() account for leap years
  last_day_of_month <- days_in_month(make_date(vecs[3], vecs[2], 1))
  # If the input day is higher than the number of days in that month,
  # replace it with the last day of that month
  if (vecs[1] > last_day_of_month)
    vecs[1] <- last_day_of_month
  # Paste the components back together and parse the corrected date
  dmy(paste0(vecs, collapse = "/"))
}
get_correct_date("31/02/2018")
#[1] "2018-02-28"
get_correct_date("31/04/2018")
#[1] "2018-04-30"
get_correct_date("31/05/2018")
#[1] "2018-05-31"
With a small modification you can adapt the function to dates in a different format, or even to day values that are smaller than the first day of the month.
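One such modification is a version that works on a whole vector at once. The sketch below uses base R only and is not part of the original answer; the function name clamp_dmy is hypothetical, and it assumes every input is a "dd/mm/yyyy" string.

```r
# Hypothetical helper: clamp the day of "dd/mm/yyyy" strings to a valid range
clamp_dmy <- function(x) {
  parts <- do.call(rbind, strsplit(x, "/", fixed = TRUE))
  day   <- as.integer(parts[, 1])
  month <- as.integer(parts[, 2])
  year  <- as.integer(parts[, 3])

  # Last day of the month = first day of the following month minus one day
  next_year  <- year + (month == 12)
  next_month <- month %% 12 + 1
  last <- as.Date(sprintf("%04d-%02d-01", next_year, next_month),
                  format = "%Y-%m-%d") - 1
  last_day <- as.integer(format(last, "%d"))

  # Clamp the day into [1, last_day] and rebuild the date
  day <- pmin(pmax(day, 1L), last_day)
  as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
}

fixed_dates <- clamp_dmy(c("31/02/2018", "31/02/2020", "31/04/2018", "15/05/2018"))
fixed_dates
## [1] "2018-02-28" "2020-02-29" "2018-04-30" "2018-05-15"
```

Deriving the last day from the first day of the following month keeps leap years correct without a lookup table.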

How to format a Date as "YYYY-Mon" with Lubridate?

I would like to create a vector of dates between two specified moments in time with step 1 month, as described in this thread (Create a Vector of All Days Between Two Dates), to be then converted into factors for data visualization.
However, I'd like to have the dates in the YYYY-Mon, ie. 2010-Feb, format. But so far I managed only to have the dates in the standard format 2010-02-01, using a code like this:
require(lubridate)
first <- ymd_hms("2010-02-07 15:00:00 UTC")
start <- ymd(floor_date(first, unit="month"))
last <- ymd_hms("2017-10-29 20:00:00 UTC")
end <- ymd(ceiling_date(last, unit="month"))
> start
[1] "2010-02-01"
> end
[1] "2017-11-01"
How can I change the format to YYYY-Mon?
You can use format() (the %>% pipe comes from magrittr, which lubridate does not attach):
library(magrittr)
start %>% format('%Y-%b')
To create the vector, use seq():
seq(start, end, by = 'month') %>% format('%Y-%b')
Note: use capital 'B' for the full month name: '%Y-%B'.
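Because the question mentions converting the labels to factors for visualization, it may help to fix the level order explicitly; by default factor() sorts levels alphabetically, which scrambles the months. A minimal sketch in base R, using a shortened hypothetical date range:

```r
# Hypothetical short range; the question's start and end would work the same way
start <- as.Date("2010-02-01")
end   <- as.Date("2010-06-01")

month_seq <- seq(start, end, by = "month")
labels <- format(month_seq, "%Y-%b")  # e.g. "2010-Feb" in an English locale

# Keep the levels in chronological order instead of alphabetical order
month_factor <- factor(labels, levels = unique(labels))
levels(month_factor)
```

The abbreviated month names produced by %b follow the session locale, so the exact labels may differ on non-English systems.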

Change Format of Date Column

I need to turn one date format into another with RStudio, since lubridate and other date-related functions need a standard unambiguous format for further work. I've included a few examples and some information below:
Example-Dataset:
Function,HiredDate,FiredDate
Waitress,16-06-01 12:40:02,16-06-13 11:43:12
Chef,16-04-17 15:00:59,16-04-18 15:00:59
Current Date Format (POSIXlt) of HiredDate and FiredDate:
"%y-%m-%d %H:%M:%S"
What I want the Date Format of HiredDate and FiredDate to be:
"%Y-%m-%d %H:%M:%S" / 2016-06-01 12:40:02
or
"%Y/%m/%d %H:%M:%S" / 2016/06/01 12:40:02
In principle, you can convert date and time for example using the strftime function:
d <- "2016-06-01 12:40:02"
strftime(d, format="%Y/%m/%d %H:%M:%S")
[1] "2016/06/01 12:40:02"
In your case, the year is causing trouble:
d <- "16-06-01 12:40:02"
strftime(d, format="%Y/%m/%d %H:%M:%S")
[1] "0016/06/01 12:40:02"
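As Dave2e suggested, the two-digit year corresponds to %y. Note that strftime only formats its output, so here %y simply prints the last two digits of the year: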
strftime(d, format="%y/%m/%d %H:%M:%S")
[1] "16/06/01 12:40:02"
Assuming that your data come from the 20th and 21st centuries, you can paste a 19 or 20 in front of HiredDate and FiredDate:
currentYear <- 16
prefixHired <- ifelse(as.numeric(substr(data$HiredDate, 1, 2)) <= currentYear, 20, 19)
prefixFired <- ifelse(as.numeric(substr(data$FiredDate, 1, 2)) <= currentYear, 20, 19)
data$HiredDate <- paste(prefixHired, data$HiredDate, sep = "")
data$FiredDate <- paste(prefixFired, data$FiredDate, sep = "")
The code generates a prefix by assuming that any date with a two-digit year greater than the current one ('16) is actually from the 20th century. The prefix is then pasted onto HiredDate and FiredDate.
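An alternative to pasting a prefix is to parse the two-digit year with %y and then format the result with %Y. This is a sketch under two assumptions: the timestamps are treated as UTC, and R's documented %y rule is acceptable (values 00 to 68 become 20xx, 69 to 99 become 19xx).

```r
d <- c("16-06-01 12:40:02", "16-04-17 15:00:59")

# Parse with a two-digit year; tz = "UTC" is an assumption about the data
parsed <- as.POSIXct(d, format = "%y-%m-%d %H:%M:%S", tz = "UTC")

# Format with a four-digit year, using either separator
with_dashes  <- format(parsed, "%Y-%m-%d %H:%M:%S")
with_slashes <- format(parsed, "%Y/%m/%d %H:%M:%S")
with_dashes
## [1] "2016-06-01 12:40:02" "2016-04-17 15:00:59"
with_slashes
## [1] "2016/06/01 12:40:02" "2016/04/17 15:00:59"
```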

Separate Date into week and year

Currently my dataframe has dates displayed in the 'Date' column as 01/01/2007 etc. I would like to convert these into a week/year value, i.e. 01/2007. Any ideas?
I have been trying things like this and getting nowhere...
enviro$Week <- strptime(enviro$Date, format= "%W/%Y")
You have to first convert to date, then you can convert back to the week of the year using format, for example:
### Converts character to date
test.date <- as.Date("10/10/2014", format="%m/%d/%Y")
### Extracts only Week of the year and year
format(test.date, format="Week number %W of %Y")
[1] "Week number 40 of 2014"
### Or if you prefer
format(test.date, format="%W/%Y")
[1] "40/2014"
So, in your case, you would do something like this:
enviro$Week <- format(as.Date(enviro$Date, format="%m/%d/%Y"), format= "%W/%Y")
But remember that the part as.Date(enviro$Date, format="%m/%d/%Y") is only necessary if your data is not in Date format, and you also should put the right format parameter to convert your character to Date, if that is the case.
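Applied to a small data frame, the approach looks like the sketch below. The sample rows are hypothetical, and the strings are assumed to be day/month/year; if your data are month/day/year, swap %d and %m in the parsing format.

```r
# Hypothetical sample data; assumes day/month/year strings
enviro <- data.frame(Date = c("01/01/2007", "10/10/2014"),
                     stringsAsFactors = FALSE)

# Parse to Date first, then format as week/year
parsed <- as.Date(enviro$Date, format = "%d/%m/%Y")
enviro$Week <- format(parsed, "%W/%Y")
enviro$Week
## [1] "01/2007" "40/2014"
```

%W counts weeks starting on Monday. %U does the same with Sunday as the first day, and %V together with %G gives ISO 8601 week numbering, which can differ from %W around the turn of the year.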
What is the class of enviro$Date? If it is of class Date there is probably a better way of doing this, otherwise you can try
v <- strsplit(as.character(enviro$Date), split = "/")
weeks <- sapply(v, "[", 2)
years <- sapply(v, "[", 3)
enviro$Week <- paste(weeks, years, sep = "/")
