Error when changing the format of all dates in a column in R

I'm trying to change the format of all the dates in one column. I tried strftime, as.Date and parse_date, and each gave me some sort of issue. The column contains 2200 dates currently written in the following format: Feb-03-2022. They should be expressed as "%B %d %Y". How can I convert all of the dates?
ethereum <- read_csv('ethereum_2022-01-04_2022-02-03.csv')
head(ethereum)
# Changing date format in the dataset
ethereum$Date <- parse_date(ethereum$`Date`, "%d-%b-%y")
head(ethereum$Date)
# Naming the datatype and the timeseries
ds<- ethereum$Date
y<- ethereum$`Close`
df<- data.frame(ds,y)
View(df)
When I run this code, I get the following warning:
Warning: 2200 parsing failures.
row col expected actual
1 -- date like %d-%b-%y Feb-03-2022
2 -- date like %d-%b-%y Feb-02-2022
3 -- date like %d-%b-%y Feb-01-2022
4 -- date like %d-%b-%y Jan-31-2022
5 -- date like %d-%b-%y Jan-30-2022
... ... .................. ...........
See problems(...) for more details.

You can convert the vector to the date format and then apply any desired formatting.
vec_str <- c("Feb-03-2022", "Feb-02-2022", "Jan-31-2022", "Jan-30-2022")
vec_dates <- as.Date(x = vec_str, format = "%b-%d-%Y")
vec_dates_str <- format(vec_dates, "%B %d %Y")
vec_dates_str
# [1] "February 03 2022" "February 02 2022" "January 31 2022" "January 30 2022"
For convenience when applying this to a data frame, you can wrap this behaviour in a function:
my_date_transform <- function(x, date_in_format = "%b-%d-%Y",
                              date_out_format = "%B-%d-%Y") {
  # Parse the input strings into Date objects
  x_dates <- as.Date(x = x, format = date_in_format)
  # Format the parsed dates from the argument, not the global vec_dates
  format(x = x_dates, date_out_format)
}
my_date_transform(x = vec_str)
Example
sample_data <- data.frame(original_date_str = vec_str)
sample_data$new_date_format <- my_date_transform(sample_data$original_date_str)
sample_data
# >> sample_data
# original_date_str new_date_format
# 1 Feb-03-2022 February-03-2022
# 2 Feb-02-2022 February-02-2022
# 3 Jan-31-2022 January-31-2022
# 4 Jan-30-2022 January-30-2022
You can then apply your function to a data frame
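For the data in the question, a minimal sketch might look like the following. The data frame is a hypothetical stand-in for the CSV (the Close values are invented), and month names are matched according to the current locale, so an English locale is assumed.

```r
# Hypothetical stand-in for the asker's CSV; the Close values are invented
ethereum <- data.frame(
  Date  = c("Feb-03-2022", "Feb-02-2022", "Jan-31-2022"),
  Close = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Parse with a format that matches the input order: month-day-year
parsed <- as.Date(ethereum$Date, format = "%b-%d-%Y")

# Fail early if any value did not match the format
stopifnot(!anyNA(parsed))

# Keep a real Date column for modelling and a formatted label for display
ethereum$ds    <- parsed
ethereum$label <- format(parsed, "%B %d %Y")
ethereum
```

Keeping the Date column separate from the formatted label means later steps can still sort or difference the dates.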

I assume you are having a packages problem.
Try this example:
library(parsedate)
## Calling the function
parse_date("Feb-03-2022")
## Specifying the package to avoid masked functions
parsedate::parse_date("Feb-03-2022")
In your code it would look like this:
ethereum$Date <- parsedate::parse_date(ethereum$Date)
The "parsedate" package is designed to parse dates written in many different formats. I don't know whether all 2.2k values are in the same format, so my suggestion is to use it. If you are 100% sure they are, you can use many other date-parsing functions. You can even write one yourself using string processing techniques.
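To check whether masking is really the issue, it can help to see which package a bare function name resolves to. The helper below is a hypothetical illustration (which_package is not part of any package) and uses base R only.

```r
# Hypothetical helper: report which package a bare function name resolves to
which_package <- function(fname) {
  f <- get(fname, mode = "function")   # look the name up along the search path
  environmentName(environment(f))      # name of the namespace that defines it
}

which_package("median")   # "stats"

# With both readr and parsedate attached, which_package("parse_date")
# would show which of the two a bare parse_date() call actually uses.
```

Writing the call as package::function() sidesteps the question entirely, which is why the explicit form is the safer habit.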

Related

Converting non-standard date format strings ("April-20") to date objects R

I have a vector of date strings in the form month_name-2_digit_year i.e.
a = rbind("April-21", "March-21", "February-21", "January-21")
I'm trying to convert that vector into a vector of date objects. I'm aware this question is very similar to this: Convert non-standard date format to date in R posted some years ago, but unfortunately, it has not answered my question.
I have tried the following as.Date() calls to do this, but it just returns a vector of NA. I.e.
b = as.Date(a, format = "%B-%y")
b = as.Date(a, format = "%B%y")
b = as.Date(a, "%B-%y")
b = as.Date(a, "%B%y")
I've also attempted to do it using the convertToDate function from the openxlsx package:
b = convertToDate(a, format = "%B-%y")
I have also tried all the above but using a single character string rather than a vector, but that produced the same issue.
I'm a little lost as to why this isn't working, as this format has worked in reverse earlier in my script (that is, I had a date object already in dd-mm-yyyy format and converted it to month_name-yy using %B-%y). Is there another way to go from string to date when the string is in a non-standard date format (anything other than dd-mm-yyyy, or mm-dd-yy if you're in the US)?
For the record my R locales are all UK and english.
Thanks in advance.
A Date must have all three of day, month and year. Convert to yearmon class which requires only month and year and then to Date as in (1) and (2) below or add the day as in (3).
(1) and (3) give first of month and (2) gives the end of the month.
(3) uses only functions from base R.
Also consider not converting to Date at all but just use yearmon objects instead since they directly represent a year and month which is what the input represents.
library(zoo)
# test input
a <- c("April-21", "March-21", "February-21", "January-21")
# 1
as.Date(as.yearmon(a, "%B-%y"))
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# 2
as.Date(as.yearmon(a, "%B-%y"), frac = 1)
## [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
# 3
as.Date(paste(1, a), "%d %B-%y")
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
In addition to zoo, which @G. Grothendieck mentioned, you can also use clock or lubridate.
clock supports a variable precision calendar type called year_month_day. In this case you'd want "month" precision, then you can set the day to whatever you'd like and convert back to Date.
library(clock)
x <- c("April-21", "March-21", "February-21", "January-21")
ymd <- year_month_day_parse(x, format = "%B-%y", precision = "month")
ymd
#> <year_month_day<month>[4]>
#> [1] "2021-04" "2021-03" "2021-02" "2021-01"
# First of month
as.Date(set_day(ymd, 1))
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# End of month
as.Date(set_day(ymd, "last"))
#> [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
The simplest solution may be to use lubridate::my(), which parses strings in the order of "month then year". That assumes that you want the first day of the month, which may or may not be correct for you.
library(lubridate)
x <- c("April-21", "March-21", "February-21", "January-21")
# Assumes first of month
my(x)
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"

Convert a string into dates using R

I have a column of dates written as monthyear in the format:
11960 - this would be Jan 1960
121960 - this would be Dec 1960
I would like to convert this column into a day-month-year format assuming the first of the month as each date.
I have tried (using one number as an example as opposed to dt$dob)
x <- sprintf("%08d%", 11960)
and then x <- as.Date(x, format = "%d%m%Y")
but this gives me NAs as I assume it doesn't like the 00 at the start
So I tried pasting 01 to each value, but this pastes it to the end (R noob here). I was thinking that pasting 01 to the start and then using the sprintf function might still work:
paste 01 to start of 11960 = 011960
sprintf("%08d%", 011960) to maybe give 0101960?
Then use as.Date to convert?
Many thanks for your help
I used paste0() instead of sprintf, and it seems to work.
> x<-paste0("010",11960)
> x
[1] "01011960"
> as.Date(x , format = "%d%m%Y" )
[1] "1960-01-01"
EDIT: for two-digit months I use ifelse() and nchar()
y<-c(11960,11970,11980, 111960,111970,111980)
x<-ifelse(nchar(y) == 5,paste0("010",y),paste0("01",y))
> x
[1] "01011960" "01011970" "01011980" "01111960" "01111970" "01111980"
as.Date(x , format = "%d%m%Y" )
[1] "1960-01-01" "1970-01-01" "1980-01-01" "1960-11-01" "1970-11-01" "1980-11-01"
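The sprintf idea from the question can also be made to work. A minimal sketch, assuming every value is a month (1 or 2 digits) followed by a 4-digit year; the variable names are illustrative:

```r
y <- c(11960, 121960, 111980)

# Zero-pad to six characters so the month always has two digits
padded <- sprintf("%06d", as.integer(y))   # "011960" "121960" "111980"

# Prepend the day and parse as day-month-year
dates <- as.Date(paste0("01", padded), format = "%d%m%Y")
dates

# Day-month-year strings, if character output is wanted
format(dates, "%d-%m-%Y")
```

Padding to a fixed width handles one-digit and two-digit months in a single step, so no ifelse() branch is needed.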

Convert character YYYY-MM-00 into date YYYY-MM in R

I imported Excel data into R and I have a problem to convert dates.
In R, my data are character and look like :
date<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
I would like to convert character into date (MM/YYYY) but the '00' value used for days poses a problem and 'NA' are returned systematically.
It works when I manually replace '00' with '01' and then use as.yearmon, ymd and format. But I have lots of dates to change and I don't know how to change all my '00' into '01' in R.
# example data
date1<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
# removing time -> doesn't work because of the '00' day
date1c<-format(strptime(date1, format = "%Y-%m-%d"), "%Y/%m/%d")
date1c<-format(strptime(date1, format = '%Y-%m'), '%Y/%m')
# trying to convert character into date -> doesn't work either
date1c<-ymd(date1)
date1c<-strptime(date1, format = "%Y-%m-%d %H:%M:%S")
date1c<-as.Date(date1, format="%Y-%m-%d %H:%M:%S")
date1c<-as.yearmon(date1, format='%Y%m')
# everything works if days are '01'
date2<-c('1971-02-01 00:00:00', '1979-06-01 00:00:00')
date2c<-as.yearmon(ymd(format(strptime(date2, format = "%Y-%m-%d"), "%Y/%m/%d")))
date2c
If you have an idea how to do this, or another way to solve my problem, I would be thankful!
Use gsub to replace -00 with -01.
date1<-c('1971-02-00 00:00:00', '1979-06-00 00:00:00')
date1 <- gsub("-00", "-01", date1)
date1c <-format(strptime(date1, format = "%Y-%m-%d"), "%Y/%m/%d")
> date1c
[1] "1971/02/01" "1979/06/01"
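A slightly more targeted variant anchors the pattern to the day field, so only a "00" day directly after the year and month is replaced and valid days are left alone. This is a sketch; the third value is made up to show an already valid date passing through unchanged.

```r
date1 <- c('1971-02-00 00:00:00', '1979-06-00 00:00:00', '1985-11-15 00:00:00')

# Replace "-00" only when it directly follows "YYYY-MM" at the start of the string
fixed <- sub("^([0-9]{4}-[0-9]{2})-00", "\\1-01", date1)

# The trailing time part is ignored when parsing with this format
as_dates <- as.Date(fixed, format = "%Y-%m-%d")
as_dates
```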
Another possibility could be:
as.Date(paste0(substr(date1, 1, 9), "1"), format = "%Y-%m-%d")
[1] "1971-02-01" "1979-06-01"
Here it extracts the first nine characters, pastes it together with 1 and then converts it into a date object.
These alternatives each accept a vector input and produce a vector as output.
Date output
These all will accept a vector as input and produce a Date vector as the output.
# 1. replace first occurrence of '00 ' with '01 ' and then convert to Date
as.Date(sub("00 ", "01 ", date1))
## [1] "1971-02-01" "1979-06-01"
# 2. convert to yearmon class and then to Date
library(zoo)
as.Date(as.yearmon(date1, "%Y-%m"))
## [1] "1971-02-01" "1979-06-01"
# 3. insert a 1 and then convert to Date
as.Date(paste(1, date1), "%d %Y-%m")
## [1] "1971-02-01" "1979-06-01"
yearmon output
Note that if you really are trying to represent just months and years then yearmon class directly represents such objects without the kludge of using an unused day of the month. Such objects are internally represented as a year plus a fraction of a year, i.e. year + 0 for January, year + 1/12 for February, etc. They display in a meaningful way, they sort in the expected manner and can be manipulated, e.g. take the difference between two such objects or add 1/12 to get the next month, etc. As with the others it takes a vector in and produces a vector out.
library(zoo)
as.yearmon(date1, "%Y-%m")
## [1] "Feb 1971" "Jun 1979"
character output
If you want character output rather than Date or yearmon output then these variations work and again accept a vector as input and produce a vector as output:
# 1. replace -00 and everything after that with a string having 0 characters
sub("-00.*", "", date1)
## [1] "1971-02" "1979-06"
# 2. convert to yearmon and then format that
library(zoo)
format(as.yearmon(date1, "%Y-%m"), "%Y-%m")
## [1] "1971-02" "1979-06"
# 3. convert to Date class and then format that
format(as.Date(paste(1, date1), "%d %Y-%m"), "%Y-%m")
## [1] "1971-02" "1979-06"
# 4. pick off the first 7 characters
substring(date1, 1, 7)
## [1] "1971-02" "1979-06"

How to handle Date in format Like "Wednesday-September 7-2011"

I am trying to get Date column from an excel data. The format of date in excel is like Wednesday-September 7-2011.
How do I handle dates in such format? I've read the documentation on Date and cannot find any method.
as.Date("Wednesday-September 7-2011", "%A-%B %d-%Y")
# [1] "2011-09-07"
https://www.stat.berkeley.edu/~s133/dates.html
If all your dates follow the same format, then I'd suggest removing the weekday and parsing the rest, i.e.
x <- 'Wednesday - September 7 - 2011'
y <- paste(strsplit(x, ' - ')[[1]][-1], collapse = ' ')
#which gives [1] "September 7 2011"
as.POSIXct(y, format = '%B %d %Y')
#[1] "2011-09-07 EEST"
I'd probably strip out the weekday name and then parse the rest of the date. For example:
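library(readr)  # assuming readr's parse_date(), which takes a format argument
x <- "Wednesday-September 7-2011"
pos <- regexpr("-", x)              # position of the first hyphen
y <- substr(x, pos + 1, nchar(x))   # everything after the weekday name
z <- parse_date(y, format = "%B %d-%Y")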
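The same idea can be sketched in base R for a whole vector at once. This assumes every value starts with a weekday name followed by a hyphen and that an English locale is in use; the second value is made up for illustration.

```r
x <- c("Wednesday-September 7-2011", "Thursday-September 8-2011")

# Drop the leading weekday name and its hyphen
no_weekday <- sub("^[A-Za-z]+-", "", x)   # "September 7-2011" "September 8-2011"

# Parse the remaining "month day-year" text
parsed <- as.Date(no_weekday, format = "%B %d-%Y")
parsed
```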

Sequential numbering for each month on a period of time in R

I have a set of dates covering a period of about 10 years, from April 2006 to August 2016, i.e. 125 months. I want to identify each month with a sequential number from "1" to "125" in a corresponding new column.
Example:
All dates in Apr'2006 will be identified as 1...May'2006 as 2 ...... Aug'2016 as 125.
The dates in the data set are stored as factors.
Requesting guidance on how to achieve this.
Assume that you start with a vector of dates in factor format:
x<- as.factor(c("8/7/2006", "12/13/2006", "12/14/2006"))
First you should convert this vector to Date format. In your case this can be done like this
x<- as.Date(x, format= "%m/%d/%Y")
Using the format command you can delete the day of a specific date:
format(x, "%Y %m")
> "2006 08" "2006 12" "2006 12"
This way you get rid of the day and just keep year and month.
Next you define a reference vector which contains all months from April 2006 to August 2016:
# by = "month" yields exactly one entry per calendar month
ref<- seq(from= as.Date("04/01/2006", format= "%m/%d/%Y"), by= "month", length.out = 125)
ref<- format(ref, "%Y %m")
Finally you compare the entries of x with the entries of ref. This can be done with the sapply function, which applies a function to each component of x. Here, the function it applies is:
myfun<-function(z) {
  which(ref == format(z, "%Y %m"))
}
But since you do not need the function myfun elsewhere, you can plug it directly into the sapply call. Because each month occurs exactly once in ref, sapply returns a plain vector.
sapply(x, function(z) which(ref == format(z, "%Y %m")))
> 5 9 9
should do the trick.
Using lubridate to format the dates:
library(lubridate)
# Create a data frame from the string below, as a factor variable
dat <- '8/7/2006 12/13/2006 12/14/2006 12/15/2006 12/16/2006 8/28/2007 8/29/2007 4/22/2008 4/23/2008 4/24/2008 4/25/2008 4/28/2008 4/29/2008 4/30/2008 5/1/2008 5/2/2008 5/7/2016 5/7/2016 5/7/2016 5/7/2016 6/26/2016 7/4/2016 7/31/2016 8/28/2016'
test_df <- data.frame(original=as.factor(strsplit(dat, ' ')[[1]]))
# We will need to convert the dates to strings in the right format
test_df$converted_string <- as.character(floor_date(mdy(test_df$original), unit="month"))
# Create a lookup table
my_months <- seq(125)
names(my_months) <- seq(as.Date('2006-04-01'), by='month', length.out=125)
# Do the lookup
test_df$converted_int <- my_months[test_df$converted_string]
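An alternative that avoids a lookup table is to compute the index arithmetically from the year and month. A minimal sketch in base R, assuming April 2006 is month 1; the variable names are illustrative:

```r
x <- as.Date(c("8/7/2006", "12/13/2006", "8/28/2016"), format = "%m/%d/%Y")

# Extract year and month as integers
yr <- as.integer(format(x, "%Y"))
mo <- as.integer(format(x, "%m"))

# Months elapsed since April 2006, plus one so that April 2006 is 1
month_index <- (yr - 2006L) * 12L + (mo - 4L) + 1L
month_index
```

Dates before April 2006 would give zero or negative values here, which makes out-of-range input easy to spot.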
