I am trying to convert a column of dates into Date objects in R, but I can't seem to get the desired results. These individuals have birth dates before January 1, 1970, so when I use as.Date, R converts a date like 1/12/54, for example, to 2054-01-12. How can I work around this? Thanks so much.
No need for add-on packages, base R is fine. But you need to specify the century:
R> as.Date("1954-01-12")
[1] "1954-01-12"
R>
If you need non-default formats, just specify them:
R> as.Date("19540112", "%Y%m%d")
[1] "1954-01-12"
R>
Edit: In case your data really comes in using the %y format, and you happen to make the policy decision that the 20th century (the 1900s) is needed, here is one base R way of doing it:
R> d <- as.Date("540112", "%y%m%d")
R> dlt <- as.POSIXlt(d)
R> dlt$year <- dlt$year - 100
R> as.Date(dlt)
[1] "1954-01-12"
R>
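The same POSIXlt trick can be applied to a whole vector at once. A minimal sketch, assuming the policy that any parsed date later than a cutoff (by default today) belongs to the previous century; `fix_century()` and its `cutoff` argument are hypothetical names, not part of base R:

```r
# Hypothetical helper (not a base R function): parse two-digit years with %y,
# then move any date that lands after `cutoff` back by 100 years.
fix_century <- function(x, format, cutoff = Sys.Date()) {
  d <- as.Date(x, format = format)
  # TRUE where the parsed date is in the "future"; NA dates are left alone
  future <- !is.na(d) & d > cutoff
  lt <- as.POSIXlt(d)
  # POSIXlt stores years as offsets from 1900, so subtract 100 only where needed
  lt$year <- lt$year - 100L * future
  as.Date(lt)
}

fix_century(c("1/12/54", "3/5/07"), "%m/%d/%y", cutoff = as.Date("2024-01-01"))
## [1] "1954-01-12" "2007-03-05"
```

Passing a fixed cutoff, as in the call above, keeps the result reproducible regardless of when the code runs.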
If everything is in the 1900s, it's a one-liner: parse the dates, format them with the two-digit year at the start, put "19" in front, and convert to a date again. (This would look nicer with some %>% stuff.)
s = c("1/12/54","1/12/74")
as.Date(format(as.Date(s,format="%d/%m/%y"), "19%y%m%d"), "%Y%m%d")
# [1] "1954-12-01" "1974-12-01"
If years from "69" to "99" are 1800s, then here's another one-liner:
library(dplyr) # for the %>% pipe operator
s %>% as.Date(format="%d/%m/%y") %>%
  format("%y%m%d") %>%
  (function(d){
    # compare the two-digit year as a number, not the whole string
    paste0(ifelse(as.integer(substr(d, 1, 2)) >= 69, "18", "19"), d)
  }) %>%
  as.Date("%Y%m%d")
## [1] "1954-12-01" "1874-12-01"
Note: this is not thoroughly tested, so there may be off-by-one errors, or I may have mixed up months and days; this is exactly why you should use ISO 8601 compliant formats.
I would do:
library(lubridate)
x <- as.Date("1/12/54", format = "%m/%d/%y")
year(x) <- 1900 + year(x) %% 100
> x
[1] "1954-01-12"
I have a date in R, e.g.:
dt = as.Date('2010/03/17')
I would like to subtract 2 years from this date, without worrying about leap years and such issues, getting as.Date('2008-03-17').
How would I do that?
With lubridate
library(lubridate)
ymd("2010/03/17") - years(2)
The easiest thing to do is to convert it into POSIXlt and subtract 2 from the years slot.
> d <- as.POSIXlt(as.Date('2010/03/17'))
> d$year <- d$year-2
> as.Date(d)
[1] "2008-03-17"
See this related question: "How to subtract days in R?"
You could use seq:
R> dt = as.Date('2010/03/17')
R> seq(dt, length=2, by="-2 years")[2]
[1] "2008-03-17"
If leap days must be taken into account, I'd recommend lubridate's %m-% operator to subtract months, as the other methods return either March 1st or NA when starting from February 29:
> library(lubridate)
> dt %m-% months(12*2)
[1] "2008-03-17"
# Try with leap day
> leapdt <- as.Date('2016/02/29')
> leapdt %m-% months(12*2)
[1] "2014-02-28"
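If lubridate is not available, the Feb 29 clamping shown above can be approximated in base R for whole years. This is a sketch under the assumption that the only invalid target date is February 29 in a non-leap year; `subtract_years()` is a hypothetical name:

```r
# Hypothetical helper: subtract n whole years from a Date vector d.
# Feb 29 is clamped to Feb 28 when the target year is not a leap year,
# matching the %m-% result shown above.
subtract_years <- function(d, n) {
  lt <- as.POSIXlt(d)
  target <- lt$year + 1900L - n
  is_leap <- (target %% 4 == 0 & target %% 100 != 0) | target %% 400 == 0
  day <- ifelse(lt$mon == 1L & lt$mday == 29L & !is_leap, 28L, lt$mday)
  # explicit format, so an unexpected impossible date becomes NA rather than an error
  as.Date(sprintf("%04d-%02d-%02d", target, lt$mon + 1L, day), format = "%Y-%m-%d")
}

subtract_years(as.Date(c("2010-03-17", "2016-02-29", "2016-02-29")), c(2, 2, 4))
## [1] "2008-03-17" "2014-02-28" "2012-02-29"
```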
Same answer as the one by rcs, but able to operate on a vector (in reply to MichaelChirico; I can't comment because I don't have enough rep):
R> unlist(lapply(c("2015-12-01", "2016-12-01"),
function(x) { return(as.character(seq(as.Date(x), length=2, by="-1 years")[2])) }))
[1] "2014-12-01" "2015-12-01"
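The lapply() version above returns character strings. A sketch that uses the same seq() call but keeps the Date class, assuming the input is already a Date vector:

```r
dates <- as.Date(c("2015-12-01", "2016-12-01"))

# seq() takes a single 'from' date, so apply it element by element;
# lapply() passes each element as a Date, and do.call(c, ...) recombines them.
one_year_back <- do.call(c, lapply(dates, function(d) {
  seq(d, length.out = 2, by = "-1 year")[2]
}))

one_year_back
## [1] "2014-12-01" "2015-12-01"
class(one_year_back)
## [1] "Date"
```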
This way seems to do the job as well, though because it ignores leap days it can be off by one day, as the second example shows:
dt = as.Date("2010/03/17")
dt-365*2
[1] "2008-03-17"
as.Date("2008/02/29")-365*2
## [1] "2006-03-01"
library(stringr) # for str_split()
cur_date <- str_split(as.character(Sys.Date()), pattern = "-")
cur_yr <- cur_date[[1]][1]
cur_month <- cur_date[[1]][2]
cur_day <- cur_date[[1]][3]
new_year <- as.integer(cur_yr) - 2
new_date <- as.Date(paste(new_year, cur_month, cur_day, sep="-"), format = "%Y-%m-%d")
Using base R, you can do the following without installing any package.
1) Transform your character string to Date format, specifying the input format in the second argument, so R can correctly interpret your date format.
dt = as.Date('2010/03/17',"%Y/%m/%d")
NOTE: If you now look at your Environment tab, you will see dt as a variable with the value "2010-03-17" (year-month-day separated by "-", not by "/").
2) Specify how many years to subtract:
years_substract=2
3) Use paste() combined with format() to keep the month and day unchanged and subtract 2 years from the year of your original date. format() extracts only the part of the date specified by its second argument.
dt_substract_2years<-
as.Date(paste(as.numeric(format(dt,"%Y"))-years_substract,format(dt,"%m"),format(dt,"%d"),sep = "-"))
NOTE1: We used the paste() function to concatenate the date components with "-" as the separator (sep = "-"), since that is R's default separator for dates.
NOTE2: We also used the as.numeric() function to convert the year from character to numeric.
I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "01-01-2021", running as.numeric on it returns -719143. How can I turn that back into "01-01-2021"?
Note that the Date class is part of base R, not lubridate.
You probably parsed the data as year-month-day by mistake. Using base R, to rule out lubridate as the problem, we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
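It would be best to fix the mistake that resulted in -719143. If you don't control that step and are faced with an input of -719143, the following gets a Date back. Be aware that the faulty parse read only the "20" of "2021" (as the day) and ignored the trailing "21", so the last two digits of the year are lost and the result comes out as 2020 rather than as.Date("2021-01-01"):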
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
Note that we can't tell from the question whether 01-01-2021 is supposed to represent day-month-year or month-day-year, so we assumed the first; if it is meant to be the second, it should be obvious at this point how to proceed.
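A quick base R check of why the full year cannot be recovered: under the mistaken "%Y-%m-%d" format, "01" is read as year 1, "01" as January, "20" as the day, and the remaining year digits are ignored, so different years collapse to the same number. The second date string is an illustrative assumption:

```r
# Parse with the wrong format; strptime ignores whatever follows the "20"
a <- as.Date("01-01-2021", "%Y-%m-%d")
b <- as.Date("01-01-2099", "%Y-%m-%d")

as.numeric(a)
## [1] -719143
identical(as.numeric(a), as.numeric(b))
## [1] TRUE

# The components confirm this is 20 January of year 1
lt <- as.POSIXlt(a)
c(year = lt$year + 1900, mon = lt$mon + 1, mday = lt$mday)
```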
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 is about 1970 years' worth of days, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being interpreted as the arithmetic expression 1 - 1 - 2021 = -2021, so we're looking at perhaps -2021 seconds/days/[?] before year zero, which would be about 1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-01") + as.numeric(lubridt)
[1] "2021-01-01"
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE
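Going the other way, which is what the question asks: a numeric column of day counts can be turned back into Date by supplying the 1970-01-01 origin. A sketch with a hypothetical data frame `df` and column `when` standing in for the external package's output:

```r
# Hypothetical data: days since 1970-01-01, as an external package might return them
df <- data.frame(when = c(18628, 18629))

# Older R versions require 'origin' for numeric input, so it is given explicitly
df$when <- as.Date(df$when, origin = "1970-01-01")

format(df$when, "%m-%d-%Y")
## [1] "01-01-2021" "01-02-2021"
```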
I can't figure out how to turn Sys.Date() into a number in the format YYYYDDD, where DDD is the day of the year; e.g. Jan 1 would be 2016001 and Dec 31 would be 2016366 (2016 is a leap year).
Date <- Sys.Date() ## The Variable Date is created as 2016-01-01
SomeFunction(Date) ## Returns 2016001
You can just use the format function as follows:
format(Date, '%Y%j')
which gives:
[1] "2016161" "2016162" "2016163"
If you want to format it in other ways, see ?strptime for all the possible options.
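To get numbers rather than strings, and to go back from YYYYDDD to a Date, %j can be used in both directions. A sketch with illustrative input dates:

```r
d <- as.Date(c("2016-01-01", "2016-06-09", "2016-12-31"))

# %j is zero-padded to three digits, so Jan 1 becomes 001
yyyyddd <- as.integer(format(d, "%Y%j"))
yyyyddd
## [1] 2016001 2016161 2016366

# %j is also accepted when parsing, so the conversion can be reversed
as.Date(as.character(yyyyddd), format = "%Y%j")
## [1] "2016-01-01" "2016-06-09" "2016-12-31"
```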
Alternatively, you could use the year and yday functions from the data.table or lubridate packages and paste them together with paste0:
library(data.table) # or: library(lubridate)
paste0(year(Date), yday(Date))
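which gives the same result for days 100 and later. yday() is not zero-padded, though, so earlier days come out shorter than with %j (Jan 1 gives "20161" instead of "2016001").
The values returned by both options are character strings. Wrap the above solutions in as.numeric() to get numbers.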
Used data:
> Date <- Sys.Date() + 1:3
> Date
[1] "2016-06-09" "2016-06-10" "2016-06-11"
> class(Date)
[1] "Date"
Here's one option with lubridate:
library(lubridate)
x <- Sys.Date()
#[1] "2016-06-08"
paste0(year(x),yday(x))
#[1] "2016160"
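One caution with pasting year() and yday(): the day of year is not zero-padded, so early-January dates come out too short. A base R sketch that pads it to three digits with sprintf(), using an illustrative date:

```r
x <- as.Date("2016-01-05")
lt <- as.POSIXlt(x)

# yday is 0-based in POSIXlt, hence the + 1L
paste0(lt$year + 1900L, lt$yday + 1L)               # no padding
## [1] "20165"
sprintf("%d%03d", lt$year + 1900L, lt$yday + 1L)    # padded like %j
## [1] "2016005"
```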
This should work for creating a new column with the specified date format (%j is the day of the year; %d would be the day of the month):
Date <- Sys.Date()
df$Month_Yr <- format(as.Date(df$Date), "%Y%j")
But, especially when working with larger data sets, it is easier to do the following:
library(data.table)
setDT(df)[, NewDate := format(as.Date(Date), "%Y%j")]
Hope this helps. You may have to tinker if you only want one value and are not working with a data set.