Converting dates from imported CSV file - r

I'm importing time series data from a CSV file, and one of the columns contains dates in the format DD/MM/YYYY. The column's class is character, or factor if I set stringsAsFactors = TRUE. I convert the imported file to a data frame and then run the following:
df$Date <- as.Date(df$Date , "%d/%m/%y")
I get no error message, but the dates come out wrong: they are displayed as YYYY-MM-DD and every year is 2020.
Before:
10/09/2009
11/09/2009
14/09/2009
After:
2020-09-10
2020-09-11
2020-09-14

You are using %y when it should be %Y. See the documentation in ?strptime:
%y
Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
%Y
Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC): see http://en.wikipedia.org/wiki/0_(year). Note that the standards also say that years before 1582 in its calendar should only be used with agreement of the parties involved.
Re-import the data first, so that the data frame is not affected by any previous attempt, and then run the code again using:
df$Date <- as.Date(df$Date , "%d/%m/%Y")
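To see the difference side by side, here is a minimal sketch using a small hand-built data frame in place of the imported file (the data frame and the names wrong and right are made up for illustration). With %y only the first two digits of the year are read and the rest of the string is ignored, which is why every year turns into 2020.

```r
# Hypothetical stand-in for the imported CSV column (DD/MM/YYYY strings).
df <- data.frame(Date = c("10/09/2009", "11/09/2009", "14/09/2009"),
                 stringsAsFactors = FALSE)

# %y reads only the two digits "20"; the trailing "09" is ignored.
wrong <- as.Date(df$Date, format = "%d/%m/%y")

# %Y reads the full four-digit year.
right <- as.Date(df$Date, format = "%d/%m/%Y")

print(format(wrong))
print(format(right))
```

If the column was imported as a factor, wrapping it in as.character() before the conversion makes the intent explicit.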

@Heroka is right.
If you ever need it, you can also use POSIXct objects, which store the time of day (down to the second) as well as the date.
Try this:
df$Date.time <- as.POSIXct(df$Date , format="%d/%m/%Y")
If you want the date and time in strings you can try the following:
df$Date.time <- format(as.POSIXct(df$Date , format="%d/%m/%Y"),format="%Y-%m-%d %H:%M")
or
df$Date <- format(as.POSIXct(df$Date , format="%d/%m/%Y"),format="%Y-%m-%d")

Related

as.Date giving me NA's

I've tried everything in the thread "as.Date returning NA while converting from 'ddmmmyyyy'" to try and sort out my problem.
I'm using these commands to turn a factor into a date:
cohort$doi <- as.Date(cohort$doi, format= "%Y/%m/%d")
All my dates are currently in the format YYYY-MM-DD, so as far as I'm aware the above should work.
I used this code yesterday to convert all my dates for various variables from a factor to a date. It worked yesterday and everything was fine. Today I opened my script and imported in my data, ran this command and viewed my data but all of the dates now say NA.
I've tried everything from previous threads (I looked at a few more than just the one I linked above) but nothing has so far worked. I'm not sure what to do now
Example of what doi column looks like:
1970-01-01
1970-02-02
1970-03-03
1970-04-04
The column is currently classed as a factor. When I run the code above, the column becomes a Date, but all the dates are now NA.
Other than closing R and opening it up again for today, I've done nothing else.
If you read the documentation for as.Date you will note that the default formats are %Y-%m-%d and %Y/%m/%d:
The default formats follow the rules of the ISO 8601 international standard which expresses a day as "2001-02-03".
In your code you have specified your dates are formatted by slashes, but your sample data shows they are formatted in the default format used by as.Date:
doi <- as.factor(c("1970-01-01",
"1970-02-02",
"1970-03-03",
"1970-04-04"))
as.Date(doi) # default format %Y-%m-%d
[1] "1970-01-01" "1970-02-02" "1970-03-03" "1970-04-04"
as.Date(doi, format = "%Y/%m/%d") # incorrect specification of your date format
[1] NA NA NA NA
as.Date("1970/01/01") # also a default format
[1] "1970-01-01"
Note: as.Date accepts character strings, factors, logical NA and objects of classes "POSIXlt" and "POSIXct".
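Because as.Date returns NA silently when the format does not match, it can help to fail loudly instead. The following is only a sketch: parse_dates_checked is a hypothetical helper name, not part of base R, and it assumes that every non-missing input should parse.

```r
# Sample data from the question, stored as a factor.
doi <- factor(c("1970-01-01", "1970-02-02", "1970-03-03", "1970-04-04"))

# Hypothetical helper: parse dates and stop if any non-missing value fails.
parse_dates_checked <- function(x, format) {
  x <- as.character(x)                  # make factor input explicit
  out <- as.Date(x, format = format)
  failed <- !is.na(x) & is.na(out)      # values that were present but did not parse
  if (any(failed)) {
    stop(sprintf("%d value(s) did not match format '%s', e.g. '%s'",
                 sum(failed), format, x[failed][1]))
  }
  out
}

parsed <- parse_dates_checked(doi, "%Y-%m-%d")
print(parsed)
```

Calling parse_dates_checked(doi, "%Y/%m/%d") would stop with an error rather than quietly returning four NAs.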

Date Transformation in R

I'm facing a very minor issue, but somehow I can't resolve it.
When I import a CSV file that has a date column, the dates come in "%Y-%m-%d" format, but I want them in "%d-%m-%Y" format. I tried as.Date to transform them, but it isn't working.
The data structure looks like this after importing:
Date Share_Val
21/01/2015 20
22/01/2015 19
23/01/2015 21
24/01/2015 23
25/01/2015 26
But when I'm importing the file by read.csv, the data look like the following:
Date Share_Val
01/21/2015 20
01/22/2015 19
01/23/2015 21
01/24/2015 23
01/25/2015 26
I tried lubridate. But it didn't help.
Sam's result comes out exactly the way I wanted. But when I try the following, it doesn't work:
data$date<-format(as.Date(data$date,"%m/%d/%Y"))
Can anybody please give me any suggestions?
See if this helps. Note the stringsAsFactors. If your Date field is a factor, you will need data$Date <- as.character(data$Date) first
data <- data.frame(Date = c("21/01/2015", "22/01/2015", "23/01/2015",
                            "24/01/2015", "25/01/2015"),
                   Share_Val = c(20, 19, 21, 23, 26),
                   stringsAsFactors = FALSE)
format(as.Date(data$Date, "%d/%m/%Y"), "%d-%m-%Y")
[1] "21-01-2015" "22-01-2015" "23-01-2015" "24-01-2015" "25-01-2015"
Too long for a comment.
I think you may be misunderstanding how Dates work in R. A variable (or column) of class Date is stored internally as the number of days since 1970-01-01. When you print a Date variable, it is displayed using the %Y-%m-%d format. The as.Date(...) function converts character to Date. The format=... argument controls how the character string is interpreted, not how the result is displayed, as in:
as.Date("02/05/2015", format="%m/%d/%Y")
# [1] "2015-02-05"
as.Date("02/05/2015", format="%d/%m/%Y")
# [1] "2015-05-02"
So in the first case the string is interpreted as 05 Feb, in the second 02 May. Note that in both cases the result is displayed (printed) in %Y-%m-%d format.
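A short sketch of the distinction between how a Date is stored and how it is shown (the variable names d and shown are illustrative only): the Date itself is just a day count, and format() produces a character string in whatever layout you ask for.

```r
d <- as.Date("02/05/2015", format = "%m/%d/%Y")

# Internally a Date is the number of days since 1970-01-01.
print(as.numeric(as.Date("1970-01-10")))   # 9
print(unclass(d))

# format() controls the display and returns a character vector, not a Date.
shown <- format(d, "%d-%m-%Y")
print(shown)          # "05-02-2015"
print(class(shown))   # "character"
```

Keeping the column as a Date for sorting and arithmetic, and calling format() only when printing or exporting, avoids losing the date semantics.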

How to convert ordinal date day-month-year format using R

I have log files where the date is mentioned in the ordinal date format.
wikipedia page for ordinal date
For example, 14273 means the 273rd day of 2014, so 14273 is 30-Sep-2014.
Is there a function in R to convert an ordinal date (14273) to (30-Sep-2014)?
I tried the date package but didn't come across a function that would do this.
Try as.Date with the indicated format:
as.Date(sprintf("%05d", 14273), format = "%y%j")
## [1] "2014-09-30"
Notes
For more information see ?strptime
The 273 part is sometimes referred to as the day of the year (as opposed to the day of the month) or the day number or the julian day relative to the beginning of the year.
If the input were a character string of the form yyjjj (rather than numeric) then as.Date(x, format = "%y%j") will do.
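The same call is vectorized, so a whole column of codes can be converted at once. This is a sketch with made-up sample codes; zero-padding with sprintf matters for years before 2010, where the numeric value has fewer than five digits.

```r
# Hypothetical ordinal date codes (yyjjj) read as numbers, so leading zeros are lost.
codes <- c(14273, 9001, 7031, 1033)

# Pad back to five characters, then parse two-digit year plus day of year.
parsed <- as.Date(sprintf("%05d", codes), format = "%y%j")

print(parsed)
print(format(parsed, "%d-%m-%Y"))
```

Using format(parsed, "%d-%b-%Y") instead gives an abbreviated month name, but its spelling depends on the current locale.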
Update Have updated to also handle years with one digit as per comments.
Data example
x<-as.character(c("14273", "09001", "07031", "01033"))
Data conversion
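# Split each code into its two-digit year and three-digit day of year
x1<-substr(x, start=1, stop=2)
x2<-substr(x, start=3, stop=5)
# Caution: with only %j, strptime() assumes the current year, so the month-day
# mapping can be off by one after February when leap-year status differs
x3<-format(strptime(x2, format="%j"), format="%m-%d")
date<-as.Date(paste(x3, x1, sep="-"), format="%m-%d-%y")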
You can use lubridate package as follows:
library(lubridate)
# Create a template date object
date <- as.POSIXlt("2009-02-10")
# Update the year and the day of the year
update(date, year=2014, yday=273)
# [1] "2014-09-30 JST"

R as.Date conversion century error

In my dataset, a column contains the dates of birth of many employees, most of which lie in the range 1960 to 1980. I am trying to convert them using as.Date, and for some of them the results are not what I expect.
Example:
as.Date("7/1/61","%m/%d/%y")
I want it to return "1961-07-01" but it returns "2061-07-01".
Read:
?strptime # where all the formatting details are available
%y
Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behavior specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
So you need a regex to backdate and it's probably better to do as a string conversion before sending to as.Date:
dvec <- c("7/1/61", "7/1/79")
as.Date( sub("/(..$)", "/19\\1",dvec) , "%m/%d/%Y")
[1] "1961-07-01" "1979-07-01"
If this goes into production it will become an error waiting to happen when the age of your employees starts to creep above the last two digits of the current year.

date import, incorrect century

I have a bunch of dates that I am parsing that are in the form "%m/%d/%y". as.Date(dates, format = "%m/%d/%y") converts a date like "1/01/64" to "2064-01-01" but I need that to be "1964-01-01." I suppose I can find instances where the year is in the future and then subtract a century, but that seems a little ridiculous.
Dates are stored internally as integer days, so formatting only applies at the time of input or output. As for input without century information, I think you are out of luck. Here's what ?strptime says about the %y format spec: "On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’."
as.Date( "01/01/64", "%m/%d/%y", origin="1970-01-01") -100*365.25
#[1] "1964-01-01"
It might be possible to start a bar fight about programmers who allow removal of century information given that Y2K is so recent in the past.
Since the default is to assume year 00-68 is 2000-2068, it is certainly possible to create an as.Dateshift
Another way to fix the dates is to change all years that occur in the future (relative to today's date using Sys.Date()) as starting with 19 instead of 20.
dates=as.Date(c("01/01/64", "12/31/15"), format="%m/%d/%y")
# [1] "2064-01-01" "2015-12-31" ## contains an incorrect date
## Now correct the dates that haven't yet occurred
as.Date(ifelse(dates > Sys.Date(), format(dates, "19%y-%m-%d"), format(dates)))
#[1] "1964-01-01" "2015-12-31"
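The future-date rule can be wrapped in a small function so the cutoff is explicit and testable. This is a sketch under stated assumptions: parse_two_digit_year is a hypothetical name, the cutoff defaults to today, and any parsed date after the cutoff is assumed to belong to the previous century. It shifts the calendar year through POSIXlt rather than subtracting a fixed number of days.

```r
# Hypothetical helper: parse two-digit-year dates, moving future dates back 100 years.
parse_two_digit_year <- function(x, format = "%m/%d/%y", cutoff = Sys.Date()) {
  d <- as.Date(x, format = format)
  in_future <- !is.na(d) & d > cutoff          # NA inputs are left untouched
  lt <- as.POSIXlt(d)
  lt$year[in_future] <- lt$year[in_future] - 100   # shift the calendar year only
  as.Date(lt)
}

# A fixed cutoff keeps the example reproducible.
fixed <- parse_two_digit_year(c("01/01/64", "12/31/15"),
                              cutoff = as.Date("2016-01-01"))
print(fixed)
```

As with the other workarounds, this only works while no legitimate date lies after the cutoff, so the cutoff is worth revisiting if the data ever includes planned or future dates.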

Resources