I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "O1-01-2021" then running as.numeric on it returns -719143. How can I turn that back into "O1-01-2021" ?
Note that Date class is part of base R, not lubridate.
You probably assumed that the data was year/month/day by mistake. Using base R to eliminate lubridate as a problem we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
It would be best if you fix the mistake that resulted in -719143 but if you don't control that and are faced with an input of
-719143 and want to get as.Date("2021-01-01") as the output then:
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
Note that we can't tell from the question whether 01-01-2020 is supposed to represent day-month-year or month-day-year so we assumed the first but if it is to represent the second then it should be obvious at this point how to proceed.
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 is about 1970 years of days, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being interpreted as the numeric formula equal to -2021 and so we're looking at perhaps -2021 seconds/days/[?] before year zero, which would be about -1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-1") + as.numeric(lubridt)
[1] "2021-01-01"
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE
Related
I have a vector of date strings in the form month_name-2_digit_year i.e.
a = rbind("April-21", "March-21", "February-21", "January-21")
I'm trying to convert that vector into a vector of date objects. I'm aware this question is very similar to this: Convert non-standard date format to date in R posted some years ago, but unfortunately, it has not answered my question.
I have tried the following as.Date() calls to do this, but it just returns a vector of NA. I.e.
b = as.Date(a, format = "%B-%y")
b = as.Date(a, format = "%B%y")
b = as.Date(a, "%B-%y")
b = as.Date(a, "%B%y")
I'm also attempted to do it using the convertToDate function from the openxlsx package:
b = convertToDate(a, format = "%B-%y")
I have also tried all the above but using a single character string rather than a vector, but that produced the same issue.
I'm a little lost as to why this isn't working, as this format has worked in reverse earlier in my script (that is, I had a date object already in dd-mm-yyyy format and converted it to month_name-yy using %B-%y). Is there another way to go from string to date when the string is a non-standard (anything other than dd-mm-yyy or mm-dd-yy if you're in the US) date format?
For the record my R locales are all UK and english.
Thanks in advance.
A Date must have all three of day, month and year. Convert to yearmon class which requires only month and year and then to Date as in (1) and (2) below or add the day as in (3).
(1) and (3) give first of month and (2) gives the end of the month.
(3) uses only functions from base R.
Also consider not converting to Date at all but just use yearmon objects instead since they directly represent a year and month which is what the input represents.
library(zoo)
# test input
a <- c("April-21", "March-21", "February-21", "January-21")
# 1
as.Date(as.yearmon(a, "%B-%y"))
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# 2
as.Date(as.yearmon(a, "%B-%y"), frac = 1)
## [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
# 3
as.Date(paste(1, a), "%d %B-%y")
## [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
In addition to zoo, which #G. Grothendieck mentioned, you can also use clock or lubridate.
clock supports a variable precision calendar type called year_month_day. In this case you'd want "month" precision, then you can set the day to whatever you'd like and convert back to Date.
library(clock)
x <- c("April-21", "March-21", "February-21", "January-21")
ymd <- year_month_day_parse(x, format = "%B-%y", precision = "month")
ymd
#> <year_month_day<month>[4]>
#> [1] "2021-04" "2021-03" "2021-02" "2021-01"
# First of month
as.Date(set_day(ymd, 1))
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
# End of month
as.Date(set_day(ymd, "last"))
#> [1] "2021-04-30" "2021-03-31" "2021-02-28" "2021-01-31"
The simplest solution may be to use lubridate::my(), which parses strings in the order of "month then year". That assumes that you want the first day of the month, which may or may not be correct for you.
library(lubridate)
x <- c("April-21", "March-21", "February-21", "January-21")
# Assumes first of month
my(x)
#> [1] "2021-04-01" "2021-03-01" "2021-02-01" "2021-01-01"
I have three data tables in R. Each one has a date column. The tables are vix_data,gold_ohlc_data,btc_ohlc_data. They are formatted as follows:
head(vix_data$Date)
[1] 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
3435 Levels: 1/10/05 1/10/06 1/10/07 1/10/08 1/10/11 ... 9/9/16
head(gold_ohlc_data$date)
[1] 8/23/17 8/22/17 8/21/17 8/18/17 8/17/17 8/16/17
2519 Levels: 1/10/08 1/10/11 1/10/12 1/10/13 1/10/14 ... 9/9/16
head(btc_ohlc_data$Date)
[1] "2017-08-23" "2017-08-22" "2017-08-21" "2017-08-20" "2017-08-19"
[6] "2017-08-18"
How can I change the date column in the vix_data and gold_ohlc_data tables to match the btc_ohlc_data format? I have tried several methods, for example using as.Date to transform each column- but this usually messes up the values and inserts a lot of N/A's
An option is to use functions from the package lubridate. The users need to know which one is day and which one is month to select the right function to use, such as dmy or mdy
# Load package
library(lubridate)
# Create example string
date1 <- c("1/2/04", "1/5/04", "1/6/04", "1/7/04", "1/8/04", "1/9/04")
date2 <- c("8/23/17", "8/22/17", "8/21/17", "8/18/17", "8/17/17", "8/16/17")
# Convert to date class
dmy(date1)
# [1] "2004-02-01" "2004-05-01" "2004-06-01" "2004-07-01" "2004-08-01" "2004-09-01"
mdy(date1)
# [1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08" "2004-01-09"
mdy(date2)
# [1] "2017-08-23" "2017-08-22" "2017-08-21" "2017-08-18" "2017-08-17" "2017-08-16"
Look into the package lubridate. lubridate::dmy() and ymd() should handle this just fine.
It looks like your data are read in as factors, so first you'll have to change them to characters. Then after that you can convert it to a date and specify the input format where %m represents the numerical month, %d represents the day, and %y represents the 2-digit year.
x <- c('1/2/04', '1/5/04', '1/6/04', '1/7/04', '1/8/04', '1/9/04')
y <- as.Date(x, format = "%m/%d/%y")
y
[1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08"
[6] "2004-01-09"
Are you sure you're specifying as.Date correctly? For example, do you have %y, instead of %Y?
I did the following and it worked:
> vix <- c("1/2/04", "1/5/04", "1/6/04", "1/7/04", "1/8/04", "1/9/04")
> vix<- as.factor(vix)
> vix
[1] 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
Levels: 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
> as.Date(vix, "%m/%d/%y")
[1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08" "2004-01-09"
The date in my dataset is like this: 20130501000000 and I'm trying to convert this to a better datetime format in R
data1$date <- as.Date(data1$date, format = "%Y-%m-%s-%h-%m-%s")
However, I get an error for needing an origin. After I put the very first cell under date in as origin, it converts every cell under date to N/A. Is this right or should I try as.POSIXct()?
That is a somewhat degenerate format, but the anytime() and anydate() functions of the anytime package can help you, without requiring any explicit format strings:
R> anytime("20130501000000") ## returns POSIXct
[1] "2013-05-01 CDT"
R> anydate("20130501000000") ## returns Date
[1] "2013-05-01"
R>
Not that we parse from character representation here -- parsing from numeric would be wrong as we use a conflicting heuristic to make sense of dates stored a numeric values.
So here your code would just become
data1$data <- anytime::anydate(data1$date)
provided data1$date is in character, else wrap one as.character() around it.
Lastly, if you actually want Datetime rather than Date (as per your title), don't use anydate() but anytime().
Before I write my answer, I would like to say that the format argument should be the format that your string is in. Therefore, if you have "20130501000000", you have to use (you don't have - between each component of your date in the string format):
as.Date("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01"
which works just fine, does not produce any error, and will return an object of class Date:
as.Date("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "Date"
Therefore, I think your issue is more of a formatting and not origin of the date.
Now to my detailed answer:
As far as I know and can understand, the as.Date() will convert it to "date", so if you want the time part of the string as well, you have to use as.POSIXct():
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01 EEST"
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "POSIXct" "POSIXt"
Note that the timezone is EEST which is my local timezone, if you want to define the timezone, you have to define it. For example to set the timezone to UTC:
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S", tz = "UTC")
# [1] "2013-05-01 UTC"
using the as.POSIXct() you can do arithmetic with the object:
times <- c("20130501000000",
"20130501035001") # added 03:50:01 to the first element
class(times)
# [1] "character"
times <- as.POSIXct(times, format = "%Y%m%d%H%M%S", tz = "UTC")
class(times)
# [1] "POSIXct" "POSIXt"
times[2] - times[1]
# Time difference of 3.833611 hours
I am trying to convert a column of dates into Date objects in R, but I can't seem to get the desired results. These individuals have birth dates before January 1, 1970, so when I use as.Date R converts a date like 1/12/54, for example, to 2054-01-12. How can I work around this? Thanks so much.
No need for add-on packages, base R is fine. But you need to specify the century:
R> as.Date("1954-01-12")
[1] "1954-01-12"
R>
If you need non-default formats, just specify them:
R> as.Date("19540112", "%Y%m%d")
[1] "1954-01-12"
R>
Edit: In case your data really comes in using the %y% format, and you happen to make the policy decision that the 19th century is needed
, here is one base R way of doing it:
R> d <- as.Date("540112", "%y%m%d")
R> dlt <- as.POSIXlt(d)
R> dlt$year <- dlt$year - 100
R> as.Date(dlt)
[1] "1954-01-12"
R>
If everything is in the 1900s, its a one-liner - just format it with a two-digit year at the start and slap a 19 on the front and convert to a date. Again. Man this would look cool some %>% stuff:
s = c("1/12/54","1/12/74")
as.Date(format(as.Date(s,format="%d/%m/%y"), "19%y%m%d"), "%Y%m%d")
# [1] "1954-12-01" "1974-12-01"
If years from "69" to "99" are 1800s, then here's another one-liner:
library(dplyr) # for pipe operator:
s %>% as.Date(format="%d/%m/%y") %>%
format("%y%m%d") %>%
(function(d){
paste0(ifelse(d>700101,"18","19"),d)
}) %>%
as.Date("%Y%m%d")
## [1] "1954-12-01" "1874-12-01"
Note not thoroughly tested so might be some off-by-one errors or I've mixed months and days because you need to be ISO8601 Compliant
I would do:
library(lubridate)
x <- as.Date("1/12/54", format = "%m/%d/%y")
year(x) <- 1900 + year(x) %% 100
> x
[1] "1954-01-12"
I am trying to convert the string "2013-JAN-14" into a Date as follow :
sdate1 <- "2013-JAN-14"
ddate1 <- as.Date(sdate1,format="%Y-%b-%d")
ddate1
but I get :
[1] NA
What am I doing wrong ? should I install a package for this purpose (I tried installing chron) .
Works for me. The reasons it doesn't for you probably has to do with your system locale.
?as.Date has the following to say:
## This will give NA(s) in some locales; setting the C locale
## as in the commented lines will overcome this on most systems.
## lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- as.Date(x, "%d%b%Y")
## Sys.setlocale("LC_TIME", lct)
Worth a try.
This can also happen if you try to convert your date of class factor into a date of class Date. You need to first convert into POSIXt otherwise as.Date doesn't know what part of your string corresponds to what.
Wrong way: direct conversion from factor to date:
a<-as.factor("24/06/2018")
b<-as.Date(a,format="%Y-%m-%d")
You will get as an output:
a
[1] 24/06/2018
Levels: 24/06/2018
class(a)
[1] "factor"
b
[1] NA
Right way, converting factor into POSIXt and then into date
a<-as.factor("24/06/2018")
abis<-strptime(a,format="%d/%m/%Y") #defining what is the original format of your date
b<-as.Date(abis,format="%Y-%m-%d") #defining what is the desired format of your date
You will get as an output:
abis
[1] "2018-06-24 AEST"
class(abis)
[1] "POSIXlt" "POSIXt"
b
[1] "2018-06-24"
class(b)
[1] "Date"
My solution below might not work for every problem that results in as.Date() returning NA's, but it does work for some, namely, when the Date variable is read in in factor format.
Simply read in the .csv with stringsAsFactors=FALSE
data <- read.csv("data.csv", stringsAsFactors = FALSE)
data$date <- as.Date(data$date)
After trying (and failing) to solve the NA problem with my system locale, this solution worked for me.