Formatting Unconventional Date - r

I'm having trouble formatting a list of dates in R. The conventional methods of formatting in R such as as.Date or as.POSIXct don't seem to be working.
I have dates in the format: 1012015
using
as.POSIXct(as.character(data$Start_Date), format = "%m%d%Y")
does not give me an error, but my date returns
"0015-10-12" because the month is not a two digit number.
Is there a way to change this into the correct date format?

The lubridate package can help with this:
lubridate::mdy(1012015)
[1] "2015-01-01"
The format looks ambiguous but the OP gave two hints:
He is using format = "%m%d%Y" in his own attempt, and
he argues the issue is because the month is not a two digit number

This uses only base R. The %08d specifies a number to be formatted into 8 characters with 0 fill giving in this case "01012015".
as.POSIXct(sprintf("%08d", 1012015), format = "%m%d%Y")
## [1] "2015-01-01 EST"
Note that if you don't have any hours/minutes/seconds it would be less error prone to use "Date" class since then the possibility of subtle time zone errors is eliminated.
as.Date(sprintf("%08d", 1012015), format = "%m%d%Y")
## [1] "2015-01-01"
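For a whole column rather than a single value, the same zero-padding idea can be sketched as below. The vector start_date is made-up sample data standing in for data$Start_Date, and the sketch assumes the day always has two digits, so only the month can lose its leading zero.

```r
# Hypothetical sample values in mddyyyy / mmddyyyy form (not the asker's data)
start_date <- c(1012015, 12312015, 7042016)

# Pad to 8 characters so a one-digit month gets its leading zero back
padded <- sprintf("%08d", start_date)

# Parse as month-day-year; values that cannot be parsed come back as NA
parsed <- as.Date(padded, format = "%m%d%Y")
print(parsed)
```

Padding cannot rescue input where the day may also be a single digit, since a value such as 1112015 would then have more than one reading.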

Related

transform julian day with decimal to date and hour in R

I have to convert, in R, a column of Julian dates with a decimal part (the fraction of the day) to date and hour.
I tried this function :
as.Date(10625.15, origin=as.Date("1990-01-01 00:00:00"))
But it only gave me the date without the times :
"2019-02-02"
Can someone help me resolve this? Thanks in advance!
You used as.Date and it returned a date, exactly what it is designed to do (ref: ?as.Date says it will return an object of class "Date"). Fortunately, it returns a fractional date:
dput(as.Date(10625.15, origin=as.Date("1990-01-01 00:00:00")))
# structure(17930.15, class = "Date")
### ^^^ yay! we have fraction
so we can wrap it in as.POSIXct:
as.POSIXct(as.Date(10625.15, origin=as.Date("1990-01-01 00:00:00")))
# [1] "2019-02-02 22:36:00 EST"
Timezone notwithstanding: 0.15 of a day is 3.6 hours (3 h 36 min), so the same instant shown in UTC would read 03:36:00; the output above is that instant in my local time zone (EST).
One might be tempted to use as.POSIXct in place of as.Date, though realize that 10625.15 is in fractional days, not fractional seconds (which is what as.POSIXct will expect). To do that, you need to convert from "days" to "seconds":
as.POSIXct(86400*10625.15, origin=as.Date("1990-01-01 00:00:00"))
# [1] "2019-02-02 22:36:00 EST"
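To keep the time zone out of the picture, the days-to-seconds conversion can also be written against an explicit UTC origin. This is only a sketch: the second value in julian_days is an invented extra example, and rounding to whole seconds is an assumption made here to avoid floating-point noise in the printed time.

```r
# Fractional days since 1990-01-01 (first value from the question, second invented)
julian_days <- c(10625.15, 10625.5)

# Anchor the origin in UTC so the result does not depend on the local time zone
origin <- as.POSIXct("1990-01-01 00:00:00", tz = "UTC")

# One day is 86400 seconds; round to whole seconds before adding
stamps <- origin + round(julian_days * 86400)

# Format explicitly in UTC
formatted <- format(stamps, "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(formatted)
```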

as.Date returns the day with four digits in R

I am trying to convert a character to date format, but "day" is replaced with 4 digits.
> mydate="20/3/20"
> mydate=as.Date(mydate)
> mydate
[1] "0020-03-20"
This didn't happen a few days ago, only now as I re-run an old script. I don't know if this has to do with me upgrading recently to R version 3.6.3.
I tried to remove the "00" with gsub, but it turns the date back to character again.
> mydate=gsub("00","",mydate)
> class(mydate)
[1] "character"
Does anyone know how I can fix that with mydate remaining as date, or why this happens?
Thank you in advance
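The default format for the Date class is "%Y-%m-%d", i.e. a 4-digit year, a dash, a two-digit month, a dash, and a two-digit day. When no format is given, as.Date also tries "%Y/%m/%d", which is why "20/3/20" is read as year 20, month 3, day 20. For any other format, we need to specify the format argument in as.Date: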
as.Date(mydate, "%d/%m/%y")
#[1] "2020-03-20"
We need to be very careful with two-digit years, as they can lead to some strange output. It is better to have a 4-digit year.
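A small sketch of why two-digit years are risky: according to ?strptime, on input %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx. The sample strings other than "20/3/20" are invented for illustration.

```r
# Same day and month, three different two-digit years (last two are invented)
two_digit <- c("20/3/20", "20/3/68", "20/3/69")

# %y guesses the century: 68 becomes 2068, but 69 becomes 1969
parsed <- as.Date(two_digit, format = "%d/%m/%y")
print(parsed)

# With a four-digit year there is nothing to guess
four_digit <- as.Date("20/3/1969", format = "%d/%m/%Y")
print(four_digit)
```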

How can I change character class date variables to POSIXlt class when there are multiple date formats?

I'm struggling with converting character class dates of many different format types (e.g., yyyy/mm/dd; mm/dd/yyyy; yyyy-mm-dd; mm-dd-yyyy; yy-mm-dd; mm-dd-yy; etc.) to POSIXlt class. Ideally, I would like to convert all birth_dates to POSIXlt class with yyyy/mm/dd format (see sample data below). Is there any simple way to do this in R?
id birth_date start_date age
102 08/09/1993 2013/09/01 20
103 1995-02-21 2013/09/01 18
104 01-15-94 2013/09/01 19
105 88-12-30 2013/09/01 24
Here is what I have been doing thus far. Unfortunately, this doesn't seem to work (I wind up with more NAs than there should be) given all of the different ways in which the original date is formatted:
library(lubridate)
data$birth_date1<-as.Date(data$birth_date,format="%Y-%m-%d") #Convert character class to date class
data$birth_date2<-ymd(swc3$birth_date1) #Convert date class to POSIXlt class using lubridate pkg
That's horrible. Could be worse though. At least there are delimiters in there, like "-" and "/".
Short Answer
Yes, there's an easy way to parse that in R. Apply parse_date_time() separately to each birth date, giving it a decent orders list to choose from, and carefully set the order of the guesses. You'll need to convert the "integer-time" to a useful time when you're done.
See the Long Answer for details.
Long Answer
This is why the lubridate package has parse_date_time(). But there are problems. Let's see:
require(lubridate)
# WRONG! doesn't work as intended.
as.Date(
  parse_date_time(data$birth_date,
                  orders = c("ymd", "mdy", "mdY", "Ymd")
  )
)
[1] "1993-08-09" "1995-02-21" "1994-01-15" "0088-12-30"
That looks great, except for the last one. What's going on?
parse_date_time() is selecting a "best fit" set of orders and formats to use when parsing the dates, and the last element is the odd one out.
To make this work as intended, you'll need to apply parse_date_time() one-by-one to each date, because each date format was apparently selected more-or-less at random. This will be slower, but it will give more useful answers.
# RIGHT. Some conversion of results required.
parsed <- sapply(data[, "birth_date"],
                 parse_date_time,
                 orders = c("ymd", "mdy", "mdY", "Ymd"))
parsed
08/09/1993 1995-02-21 01-15-94 88-12-30
744854400 793324800 758592000 599443200
Ok, those look like Unix-time integers, which are the unclass()'d version of what parse_date_time() produces. And none are negative, so they must all have happened after 1970. This is encouraging. Convert:
# Conversion of results
parsed <- as.POSIXct(parsed, origin="1970-01-01", tz = "GMT")
as.Date(parsed)
08/09/1993 1995-02-21 01-15-94 88-12-30
"1993-08-09" "1995-02-21" "1994-01-15" "1988-12-30"
lubridate and parse_date_time() are very good at what they do.
Since you asked for POSIXlt, not Date types:
as.POSIXlt(parsed)
08/09/1993 1995-02-21
"1993-08-09 10:00:00 AEST" "1995-02-21 11:00:00 AEDT"
01-15-94 88-12-30
"1994-01-15 11:00:00 AEDT" "1988-12-30 11:00:00 AEDT"
Though I personally prefer only having dates when the actual time isn't important; these are assumed to be all happening at midnight UTC, and are converted to my time zone (Eastern Australia).
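If lubridate is not an option, a rough base R sketch of the same idea is to pick the format from the shape of each string. The helper parse_mixed() is hypothetical, covers only the formats listed in the question, and resolves the ambiguous two-digit-year case by trying mm-dd-yy first and yy-mm-dd second, which is an assumption about this data.

```r
# Sample birth dates from the question
birth_date <- c("08/09/1993", "1995-02-21", "01-15-94", "88-12-30")

# Hypothetical helper: choose a format from the shape of each string
parse_mixed <- function(x) {
  x <- gsub("/", "-", x, fixed = TRUE)      # treat "/" and "-" alike
  out <- rep(as.Date(NA), length(x))

  year_first <- grepl("^[0-9]{4}-", x)      # yyyy-mm-dd
  year_last  <- grepl("-[0-9]{4}$", x)      # mm-dd-yyyy
  short_year <- !(year_first | year_last)   # two-digit year somewhere

  out[year_first] <- as.Date(x[year_first], format = "%Y-%m-%d")
  out[year_last]  <- as.Date(x[year_last],  format = "%m-%d-%Y")

  # Assumption: prefer mm-dd-yy, fall back to yy-mm-dd when that fails
  mdy <- as.Date(x[short_year], format = "%m-%d-%y")
  ymd <- as.Date(x[short_year], format = "%y-%m-%d")
  mdy[is.na(mdy)] <- ymd[is.na(mdy)]
  out[short_year] <- mdy
  out
}

dates <- parse_mixed(birth_date)
print(dates)

# POSIXlt, as the question asked, pinned to UTC
birth_lt <- as.POSIXlt(dates, tz = "UTC")
```

A string such as "01-02-03" stays ambiguous under any scheme like this; the sketch simply reads it as mm-dd-yy.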

converting mddyy date to date

I have a date column in a .csv spreadsheet generated in inquisit.
the format is mddyy, so the 13th of July is 71315. R recognizes this as an integer.
Can anyone recommend a way to convert this to ISO 8601 date format?
Since you mention that the day is always two numbers, you could use a little sprintf() magic to add zeros out front.
as.Date(sprintf("%06d", 71315), "%m%d%y")
# [1] "2015-07-13"
The sprintf() call here adds zeros up to 6 characters and also turns it into a character vector, so as.Date() will accept it.
sprintf("%06d", 71315)
# [1] "071315"
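Applied to a whole column, it may be worth checking which values failed to parse before trusting the result. In the sketch below the vector raw is invented sample data (the last entry is deliberately invalid), and format() produces the ISO 8601 strings the question asks for.

```r
# Hypothetical column of mddyy / mmddyy integers; 133115 has no valid month
raw <- c(71315, 121415, 133115)

# Zero-pad to 6 characters, then parse as month-day-two-digit-year
padded <- sprintf("%06d", raw)
dates <- as.Date(padded, format = "%m%d%y")

# ISO 8601 text form; entries that failed to parse stay NA
iso <- format(dates, "%Y-%m-%d")
print(iso)

# Collect the inputs that could not be parsed
bad <- raw[is.na(dates)]
print(bad)
```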
We could also use mdy from library(lubridate)
library(lubridate)
mdy(71315) # returned POSIXct in the lubridate version used here; recent versions return Date
#[1] "2015-07-13 UTC"
as.Date(mdy(71315)) #convert to `Date` class.
#[1] "2015-07-13"

In R, how to convert a character of specific format into a date?

I have a character string that deals with time:
"2007-06-11T09:15:35Z"
However, I cannot convert this into date using the following command:
strptime("2007-06-11T09:15:35Z", paste("%y-%m-%d","T","%H:%M:%S","Z",sep=""))
I got the output as NA instead. What went wrong? How should I correctly deal with date and time?
Use the correct format. "%y" is for 2-digit years (without the century). You need "%Y".
R> strptime("2007-06-11T09:15:35Z", "%Y-%m-%dT%H:%M:%SZ")
[1] "2007-06-11 09:15:35 CDT"
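One caveat about the output above: the trailing "Z" in the string denotes UTC, but because the format treats it as a literal character and no tz is given, the time is interpreted in the local zone (CDT here). A sketch that reads it as UTC instead, assuming that is what the data means:

```r
# Parse the timestamp as UTC, since the trailing "Z" denotes UTC
stamp <- as.POSIXct("2007-06-11T09:15:35Z",
                    format = "%Y-%m-%dT%H:%M:%SZ",
                    tz = "UTC")

# Show it in UTC
utc_text <- format(stamp, "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(utc_text)

# The same instant displayed in another zone (zone name chosen for illustration)
print(format(stamp, tz = "America/Chicago", usetz = TRUE))
```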
