Fixing date format in R

I have three data tables in R. Each one has a date column. The tables are vix_data, gold_ohlc_data, and btc_ohlc_data. They are formatted as follows:
head(vix_data$Date)
[1] 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
3435 Levels: 1/10/05 1/10/06 1/10/07 1/10/08 1/10/11 ... 9/9/16
head(gold_ohlc_data$date)
[1] 8/23/17 8/22/17 8/21/17 8/18/17 8/17/17 8/16/17
2519 Levels: 1/10/08 1/10/11 1/10/12 1/10/13 1/10/14 ... 9/9/16
head(btc_ohlc_data$Date)
[1] "2017-08-23" "2017-08-22" "2017-08-21" "2017-08-20" "2017-08-19"
[6] "2017-08-18"
How can I change the date column in the vix_data and gold_ohlc_data tables to match the btc_ohlc_data format? I have tried several methods, for example using as.Date to transform each column, but this usually messes up the values and inserts a lot of NAs.

One option is to use functions from the lubridate package. You need to know which field is the day and which is the month in order to choose the right function, such as dmy or mdy.
# Load package
library(lubridate)
# Create example string
date1 <- c("1/2/04", "1/5/04", "1/6/04", "1/7/04", "1/8/04", "1/9/04")
date2 <- c("8/23/17", "8/22/17", "8/21/17", "8/18/17", "8/17/17", "8/16/17")
# Convert to date class
dmy(date1)
# [1] "2004-02-01" "2004-05-01" "2004-06-01" "2004-07-01" "2004-08-01" "2004-09-01"
mdy(date1)
# [1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08" "2004-01-09"
mdy(date2)
# [1] "2017-08-23" "2017-08-22" "2017-08-21" "2017-08-18" "2017-08-17" "2017-08-16"

Look into the lubridate package. For month/day/year input like this, lubridate::mdy() should handle it just fine.

It looks like your data are read in as factors, so first you'll have to change them to characters. Then after that you can convert it to a date and specify the input format where %m represents the numerical month, %d represents the day, and %y represents the 2-digit year.
x <- c('1/2/04', '1/5/04', '1/6/04', '1/7/04', '1/8/04', '1/9/04')
y <- as.Date(x, format = "%m/%d/%y")
y
[1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08"
[6] "2004-01-09"
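Applied to a data frame column, the same two steps might look like the sketch below. The small vix_data here is a hypothetical stand-in built from values shown in the question, not the asker's real table.

```r
# Hypothetical stand-in for the asker's table: a factor column of m/d/yy strings.
vix_data <- data.frame(Date = factor(c("1/2/04", "1/5/04", "1/6/04")))

# Step 1: factor -> character. Step 2: character -> Date with an explicit format.
vix_data$Date <- as.Date(as.character(vix_data$Date), format = "%m/%d/%y")

class(vix_data$Date)
vix_data$Date
```

Converting through as.character() first makes it explicit that the labels, not the underlying integer codes of the factor, are what gets parsed.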

Are you sure you're specifying as.Date correctly? For example, do you have %y, instead of %Y?
I did the following and it worked:
> vix <- c("1/2/04", "1/5/04", "1/6/04", "1/7/04", "1/8/04", "1/9/04")
> vix<- as.factor(vix)
> vix
[1] 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
Levels: 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
> as.Date(vix, "%m/%d/%y")
[1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08" "2004-01-09"
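One plausible source of the NAs in the question (an assumption, since the failing call isn't shown) is a format string with day and month swapped: any value whose day is greater than 12 then fails to parse. A minimal sketch of how to spot this, using values from the question:

```r
x <- c("8/23/17", "8/22/17", "1/2/04")

# Day and month swapped: 23 and 22 are not valid months, so those entries become NA.
wrong <- as.Date(x, format = "%d/%m/%y")

# Month first, as in the data.
right <- as.Date(x, format = "%m/%d/%y")

# List the inputs that failed to parse under the swapped format.
x[is.na(wrong)]
right
```

Note that the third value parses under both formats but yields a different date each time, so the absence of NA does not by itself prove that the format is right.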

Related

What does calling as.numeric() do to a lubridate Date object?

I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "01-01-2021" then running as.numeric on it returns -719143. How can I turn that back into "01-01-2021"?
Note that Date class is part of base R, not lubridate.
You probably assumed that the data was year/month/day by mistake. Using base R to eliminate lubridate as a problem we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
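It would be best to fix the mistake that resulted in -719143, but if you don't control that and are faced with an input of -719143, the following recovers the day, the month, and the first two digits of the year. The last two digits of the year ("21") were dropped by the faulty parse and cannot be recovered, so the result is 2020-01-01 rather than 2021-01-01: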
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
Note that we can't tell from the question whether 01-01-2021 is supposed to represent day-month-year or month-day-year, so we assumed the first; if it represents the second, it should be obvious at this point how to proceed.
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 is about 1970 years of days, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being interpreted as the numeric formula equal to -2021 and so we're looking at perhaps -2021 seconds/days/[?] before year zero, which would be about -1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-1") + as.numeric(lubridt)
[1] "2021-01-01"
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE
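To answer the "convert it back" part directly: since the number is a count of days since 1970-01-01, passing it to as.Date with that origin should restore the Date. A minimal sketch, assuming the numeric value came from a correctly parsed date (the variable names are illustrative):

```r
# A day count as produced by as.numeric() on a Date.
n <- 18628

# Rebuild the Date by counting n days from the epoch.
d <- as.Date(n, origin = "1970-01-01")
d

# Format it back to the day-month-year text used in the question.
format(d, "%d-%m-%Y")

# Round trip: the rebuilt Date carries the same day count.
as.numeric(d) == n
```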

Converting non-standard date format strings ("April-20") to date objects R

I have a vector of date strings in the form month_name-2_digit_year i.e.
a = rbind("April-21", "March-21", "February-21", "January-21")
I'm trying to convert that vector into a vector of date objects. I'm aware this question is very similar to this: Convert non-standard date format to date in R posted some years ago, but unfortunately, it has not answered my question.
I have tried the following as.Date() calls to do this, but it just returns a vector of NA. I.e.
b = as.Date(a, format = "%B-%y")
b = as.Date(a, format = "%B%y")
b = as.Date(a, "%B-%y")
b = as.Date(a, "%B%y")
I also attempted to do it using the convertToDate function from the openxlsx package:
b = convertToDate(a, format = "%B-%y")
I have also tried all the above but using a single character string rather than a vector, but that produced the same issue.
I'm a little lost as to why this isn't working, as this format has worked in reverse earlier in my script (that is, I had a date object already in dd-mm-yyyy format and converted it to month_name-yy using %B-%y). Is there another way to go from string to date when the string is a non-standard (anything other than dd-mm-yyyy or mm-dd-yy if you're in the US) date format?
For the record, my R locales are all UK and English.
Thanks in advance.
A Date must have all three of day, month and year. Convert to yearmon class which requires only month and year and then to Date as in (1) and (2) below or add the day as in (3).
(1) and (3) give first of month and (2) gives the end of the month.
(3) uses only functions from base R.
Also consider not converting to Date at all but just use yearmon objects instead since they directly represent a year and month which is what the input represents.
library(zoo)
# test input
a <- c("April-21", "March-21", "February-21", "January-21")
# 1
as.Date(as.yearmon(a, "%B-%y"))
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# 2
as.Date(as.yearmon(a, "%B-%y"), frac = 1)
## [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
# 3
as.Date(paste(1, a), "%d %B-%y")
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
In addition to zoo, which @G. Grothendieck mentioned, you can also use clock or lubridate.
clock supports a variable precision calendar type called year_month_day. In this case you'd want "month" precision, then you can set the day to whatever you'd like and convert back to Date.
library(clock)
x <- c("April-21", "March-21", "February-21", "January-21")
ymd <- year_month_day_parse(x, format = "%B-%y", precision = "month")
ymd
#> <year_month_day<month>[4]>
#> [1] "2021-04" "2021-03" "2021-02" "2021-01"
# First of month
as.Date(set_day(ymd, 1))
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# End of month
as.Date(set_day(ymd, "last"))
#> [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
The simplest solution may be to use lubridate::my(), which parses strings in the order of "month then year". That assumes that you want the first day of the month, which may or may not be correct for you.
library(lubridate)
x <- c("April-21", "March-21", "February-21", "January-21")
# Assumes first of month
my(x)
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"

Converting characters like "01APR2020" to date class R

I have the following column in my dataframe
> df$dates
[1] "01APR2020" "01JUN2020" "01MAR2020" "01MAY2020" "02APR2020" "02JUN2020"
[7] "02MAR2020"
I would like to format this to an object of Date class, so I want my output to look like this
> df$dates
[1] "01-04" "01-06" "01-03" "01-05" "02-04" "02-06"
[7] "02-03"
And I would like to order them from the oldest to the newest.
Edit:
For example I tried this but it doesn't work:
> format(as.Date("01APR2020", "%d%b%Y"), "%d-%m")
[1] NA
Thanks!
Just use the anydate() function from the anytime package
R> anydate(c("01APR2020", "01JUN2020", "01MAR2020"))
[1] "2020-04-01" "2020-06-01" "2020-03-01"
R>
Its idea is to not require a format for a variety of common and sensible date (or datetime) inputs. Once they are parsed, printing the day and month is easy too:
R> format(anydate(c("01APR2020", "01JUN2020", "01MAR2020")), "%d-%m")
[1] "01-04" "01-06" "01-03"
R>
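If %b returns NA as in the question, a likely cause (an assumption; the question doesn't say) is a session locale whose month abbreviations aren't English. The sketch below sidesteps the locale by looking the month up in base R's month.abb constant, and also covers the oldest-to-newest ordering the question asks for by sorting the parsed Date values before formatting. The variable names are illustrative.

```r
dates <- c("01APR2020", "01JUN2020", "01MAR2020", "01MAY2020",
           "02APR2020", "02JUN2020", "02MAR2020")

# Split the fixed-width string into day, month abbreviation, and year.
day  <- substr(dates, 1, 2)
mon  <- match(substr(dates, 3, 5), toupper(month.abb))  # "APR" -> 4, locale-independent
year <- substr(dates, 6, 9)

# Assemble ISO yyyy-mm-dd strings, which as.Date parses without a format.
parsed <- as.Date(sprintf("%s-%02d-%s", year, mon, day))

# Sort while the values are still Dates, then format for display.
sorted <- sort(parsed)
format(sorted, "%d-%m")
```

Sorting has to happen before format(), because the "%d-%m" strings would otherwise sort alphabetically by day rather than chronologically.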
We can use as.Date with format
df$dates <- format(as.Date(df$dates, "%d%b%Y"), "%d-%m")
df$dates
#[1] "01-04" "01-06" "01-03" "01-05" "02-04" "02-06" "02-03"
Or using lubridate
library(lubridate)
df$dates <- format(dmy(df$dates), "%d-%m")
NOTE: Both the solutions work on R 4.0
data
df <- data.frame(dates = c("01APR2020" ,"01JUN2020", "01MAR2020",
"01MAY2020", "02APR2020" ,"02JUN2020" , "02MAR2020"))

Convert numeric values to dates

I have a numeric vector as follows
aa <- c(1022011, 2022011, 13022011, 23022011) (this vector is just a sample, it is very long)
Values are written in such a way that first value is day then month and then year.
What I am doing right now is
as.Date(as.character(aa), "%d%m%Y")
but,
it is causing problems (returning NA) in case of single digits day numbers. (i.e. 1022011, 2022011).
so basically
as.Date("1022011", "%d%m%Y") does not work
but
as.Date("01022011", "%d%m%Y") (pasting '0' ahead of the number) works.
I want to avoid pasting '0' in such cases. Is there any other (direct) alternative to convert numeric values to dates at once?
It could be rearranged using sub in which case a plain as.Date with no format works:
x <- c(1022011, 11022011) # test data
pat <- "^(..?)(..)(....)$"
as.Date(sub(pat, "\\3-\\2-\\1", x))
giving:
[1] "2011-02-01" "2011-02-11"
Depending on your platform, you could use sprintf in order to add a zero at the beginning. It seems that Mac is OK with this, but not windows 7 given the discussion with the OP.
aa <- c(1022011, 2022011, 13022011, 23022011)
as.Date(sprintf("%08s", aa), format = "%d%m%Y")
[1] "2011-02-01" "2011-02-02" "2011-02-13" "2011-02-23"
UPDATE
@CathyG kindly mentioned that sprintf("%08i", aa) works on Windows 7.
You can use dmy in lubridate:
library(lubridate)
aa <- c(1022011, 2022011, 13022011, 23022011)
> dmy(aa)
[1] "2011-02-01 UTC" "2011-02-02 UTC" "2011-02-13 UTC" "2011-02-23 UTC"
and if you don't want the timezone just wrap it in as.Date:
> as.Date(dmy(aa))
[1] "2011-02-01" "2011-02-02" "2011-02-13" "2011-02-23"
Thank you @Ben Bolker,
> as.Date(mdy(aa))
[1] "2011-01-02" "2011-02-02" "2012-01-02" "2011-01-02"
I know you don't want to add a "0" but still, in base R, this works :
as.Date(sapply(aa,function(x){ifelse(nchar(x)==8,x,paste("0",x,sep=""))}),format = "%d%m%Y")
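The zero-padding step discussed in the answers above can also be done with an integer format, which avoids the platform-dependent behaviour of "%08s". A minimal sketch, assuming every value fits in an integer and is missing at most one leading zero:

```r
aa <- c(1022011, 2022011, 13022011, 23022011)

# Pad to 8 digits using an integer conversion, so 1022011 becomes "01022011".
padded <- sprintf("%08d", as.integer(aa))

# Now every string has two digits for the day and %d%m%Y parses cleanly.
as.Date(padded, format = "%d%m%Y")
```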

Create a Vector of All Days Between Two Dates

Is there an easy way in R for me to itemize all valid days that occurred between two specified dates? For instance, I'd like the following inputs:
itemizeDates(startDate="12-30-11", endDate="1-4-12")
To produce the following dates:
"12-30-11" "12-31-11", "1-1-12", "1-2-12", "1-3-12", "1-4-12"
I'm flexible on classes and formatting of the dates, I just need an implementation of the concept.
You're looking for seq
> seq(as.Date("2011-12-30"), as.Date("2012-01-04"), by="days")
[1] "2011-12-30" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03"
[6] "2012-01-04"
Or, you can use :
> as.Date(as.Date("2011-12-30"):as.Date("2012-01-04"), origin="1970-01-01")
[1] "2011-12-30" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03"
[6] "2012-01-04"
Note that with : "Non-numeric arguments are coerced internally". Thus, we convert back to class Date, using as.Date method for class 'numeric' and provide origin.
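The coercion described in that note can be checked directly. The sketch below, with illustrative variable names, shows that the : operator drops the Date class and that restoring it gives the same days as seq:

```r
start <- as.Date("2011-12-30")
end   <- as.Date("2012-01-04")

# The colon operator works on the underlying day counts, so the class is lost.
raw <- start:end
inherits(raw, "Date")   # FALSE: plain numbers, not Dates

# Re-attach the Date class by interpreting the numbers as days since the epoch.
days <- as.Date(raw, origin = "1970-01-01")

# Same days as the seq() approach.
all(days == seq(start, end, by = "day"))
```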
Here's a function to meet your specific request
itemizeDates <- function(startDate="12-30-11", endDate="1-4-12",
                         format="%m-%d-%y") {
  out <- seq(as.Date(startDate, format=format),
             as.Date(endDate, format=format), by="days")
  format(out, format)
}
> itemizeDates(startDate="12-30-11", endDate="1-4-12")
[1] "12-30-11" "12-31-11" "01-01-12" "01-02-12" "01-03-12" "01-04-12"
I prefer using the lubridate package to solve datetime problems. It is more intuitive and easier to understand and use once you know it.
library(lubridate)
#mdy() in lubridate package means "month-day-year", which is used to convert
#the string to date object
>start_date <- mdy("12-30-11")
>end_date <- mdy("1-4-12")
#calculate how many days in this time interval
>n_days <- interval(start_date,end_date)/days(1)
>start_date + days(0:n_days)
[1] "2011-12-30" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04"
#convert to original format
format(start_date + days(0:n_days), format="%m-%d-%y")
[1] "12-30-11" "12-31-11" "01-01-12" "01-02-12" "01-03-12" "01-04-12"
Reference:
Dates and Times Made Easy with lubridate
2 similar implementations in lubridate:
library(lubridate)
as_date(mdy("12-30-11"):mdy("1-4-12"))
# OR
seq(mdy("12-30-11"), mdy("1-4-12"), by = "days")
These don't format your dates in month-day-year, but you can fix the formatting if you want. Year-month-day is a bit easier to work with when analyzing, though.
