Date conversion without specifying the format - r

I do not understand how the ymd() function from the lubridate package works in R. I am trying to build a feature that converts dates correctly without having to specify the format, by checking which of dmy(), mdy() and ymd() produces the fewest NAs.
However, ymd() sometimes returns NA and sometimes does not for the same date value. Are there other functions or packages in R that would help me get around this problem?
> data$DTTM[1:5]
[1] "4-Sep-06" "27-Oct-06" "8-Jan-07" "28-Jan-07" "5-Jan-07"
> ymd(data$DTTM[1])
[1] NA
Warning message:
All formats failed to parse. No formats found.
> ymd(data$DTTM[2])
[1] "2027-10-06 UTC"
> ymd(data$DTTM[3])
[1] NA
Warning message:
All formats failed to parse. No formats found.
> ymd(data$DTTM[4])
[1] "2028-01-07 UTC"
> ymd(data$DTTM[5])
[1] NA
Warning message:
All formats failed to parse. No formats found.
>
> ymd(data$DTTM[1:5])
[1] "2004-09-06 UTC" "2027-10-06 UTC" "2008-01-07 UTC" "2028-01-07 UTC"
[5] "2005-01-07 UTC"
Thanks

@user1317221_G has already pointed out that your dates are in day-month-year format, which suggests that you should use dmy instead of ymd. Furthermore, because your month is in %b format ("Abbreviated month name in the current locale"; see ?strptime), your problem may have something to do with your locale. The month names you have seem to be English, which may differ from how they are spelled in the locale you are currently using.
Let's see what happens when I try dmy on the dates in my locale:
date_english <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")
dmy(date_english)
# [1] "2006-09-04 UTC" NA "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
# Warning message:
# 1 failed to parse.
"27-Oct-06" failed to parse. Let's check my time locale:
Sys.getlocale("LC_TIME")
# [1] "Norwegian (Bokmål)_Norway.1252"
dmy does not recognize "oct" as a valid %b month in my locale.
One way to deal with this issue would be to change "oct" to the corresponding Norwegian abbreviation, "okt":
date_nor <- c("4-Sep-06", "27-Okt-06", "8-Jan-07", "28-Jan-07", "5-Jan-07" )
dmy(date_nor)
# [1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
Another possibility is to use the original dates (i.e. in their original 'locale'), and set the locale argument in dmy. Exactly how this is done is platform dependent (see ?locales). Here is how I would do it in Windows:
dmy(date_english, locale = "English")
# [1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
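If setting the locale argument is not convenient, a base R alternative might look like the sketch below. It assumes the "C" locale, which uses English month abbreviations, can be set temporarily for LC_TIME; the names old_locale and parsed are only illustrative.

```r
date_english <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")

# Remember the current time locale so it can be restored afterwards
old_locale <- Sys.getlocale("LC_TIME")

# Assumption: the "C" locale is available and spells months in English
Sys.setlocale("LC_TIME", "C")
parsed <- as.Date(date_english, format = "%d-%b-%y")

# Restore the original locale
Sys.setlocale("LC_TIME", old_locale)

format(parsed, "%Y-%m-%d")
# [1] "2006-09-04" "2006-10-27" "2007-01-08" "2007-01-28" "2007-01-05"
```

Because the format is given explicitly here, this does not guess the field order; it only removes the dependence on the session's locale.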

Using the guess_formats function in the lubridate package would be the closest to what you are after.
library(lubridate)
x <- c("4-Sep-06", "27-Oct-06","8-Jan-07" ,"28-Jan-07","5-Jan-2007")
format <- guess_formats(x, c("mdY", "BdY", "Bdy", "bdY", "bdy", "mdy", "dby"))
strptime(x, format)
HTH

From the documentation on ymd (page 70):
"As long as the order of formats is correct, these functions will parse dates correctly even when the input vectors contain differently formatted dates."
ymd() expects year-month-day, you have day-month-year
x <- c("2009-01-01", "2009-01-02", "2009-01-03")
ymd(x)
maybe you need something like
y <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07" )
as.POSIXct(y, format = "%d-%b-%y")
PS: I think the reason you get NAs for some values is that the first field, which ymd reads as the year, has only a single digit, and ymd doesn't know what to do with that. It works when that field has two digits, e.g. "27-Oct-06" and "28-Jan-07", but fails for "5-Jan-07" etc.

Related

R timezone GMT and CET convert Unix timestamp

I got a Unix timestamp from a GMT-based database which I want to convert to datetime format. I use R in Germany.
Unix timestamp from GMT-based database: 1525732148
I convert it with:
install.packages("anytime")
library(anytime)
anytime(as.numeric(as.character(1525732148)))
Why do I get same date time, independent of system-date?
anytime(as.numeric(as.character(1525732148, tz="GMT")))
anytime(as.numeric(as.character(1525732148, tz="CET")))
Both give me as result: "2018-05-07 22:29:08 UTC"
I expected different results, because of different timezones.
The problem is as follows:
> as.character(1525732148, tz="GMT")
[1] "1525732148"
This is what anytime gets as input. The tz parameter is supposed to be passed to anytime, not to as.character, which swallows without warning any additional parameters it does not recognise. Try anytime(1525732148, tz="GMT").
> anytime::anytime(1525732148, tz="GMT")
[1] "2018-05-07 22:29:08 GMT"
> anytime::anytime(1525732148, tz="CET")
[1] "2018-05-08 00:29:08 CEST"
See: anytime
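For comparison, the same conversion can be sketched in base R. This assumes the timestamp counts seconds since 1970-01-01 00:00:00 UTC; the underlying instant is identical in both calls and only the displayed time zone changes.

```r
ts <- 1525732148

# Interpret the number as seconds since the Unix epoch
instant <- as.POSIXct(ts, origin = "1970-01-01", tz = "GMT")

# Display the same instant in two different time zones
gmt_text <- format(instant, "%Y-%m-%d %H:%M:%S", tz = "GMT")
cet_text <- format(instant, "%Y-%m-%d %H:%M:%S", tz = "CET")

gmt_text
# [1] "2018-05-07 22:29:08"
cet_text
# [1] "2018-05-08 00:29:08"
```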

Handle durations but no dates

What is the best way to manipulate only durations in R? I mean I have a string vector like:
> test
[1] "00:04:06" "00:04:02" "00:04:16" "00:03:51" "00:03:55"
and I want to convert it to some specific class, which will understand these durations. I know I can use for example strptime:
> strptime(test, format = '%H:%M:%S')
[1] "2016-05-02 00:04:06 UTC" "2016-05-02 00:04:02 UTC" "2016-05-02 00:04:16 UTC" "2016-05-02 00:03:51 UTC" "2016-05-02 00:03:55 UTC"
but this creates a vector of real date-times with today's date. I'd like to avoid that, since it can cause trouble in the future for my application and it is 'wrong' information.
Code:
require(lubridate)
test<-c("00:04:06", "00:04:02", "00:04:16", "00:03:51", "00:03:55")
t2<-lapply(test,lubridate::hms)
as.numeric(unlist(t2))
Output:
[1] 6 2 16 51 55
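Note that the output above contains only the seconds component of each value (for example 6 from "00:04:06"), not the total length. If total seconds are wanted, one option is a base R sketch using as.difftime, which stores a time span without attaching a date. The name durs is only illustrative.

```r
test <- c("00:04:06", "00:04:02", "00:04:16", "00:03:51", "00:03:55")

# Parse the strings as time spans measured in seconds
durs <- as.difftime(test, format = "%H:%M:%S", units = "secs")

# Plain numeric seconds, e.g. 4 * 60 + 6 = 246 for the first element
as.numeric(durs)
# [1] 246 242 256 231 235
```

The result is of class difftime, so arithmetic such as sum(durs) or mean(durs) keeps the unit.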

Why does lubridate appear to change time zones for two dates combined into a vector?

I am seeing an unexpected result when using the lubridate package in R. I am simply trying to combine two dates into a vector. When I do so, the time zone changes. What is happening here?
> x <- ymd("2016-02-08")
> y <- ymd("2016-03-29")
> x
[1] "2016-02-08 UTC"
> y
[1] "2016-03-29 UTC"
> c(x,y)
[1] "2016-02-07 18:00:00 CST" "2016-03-28 19:00:00 CDT"
Using c() will remove the timezone attribute. Hence you have to reassign it:
xy <- c(x,y)
attr(xy, "tzone") <- "UTC"
> xy
[1] "2016-02-08 UTC" "2016-03-29 UTC"
Source and more information: Peter Ehlers on R Help
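The same fix can be reproduced without lubridate. The sketch below assumes the values are POSIXct in UTC, as in the printed output above. Newer R versions may already keep a time zone shared by all inputs, in which case the reassignment simply changes nothing.

```r
x <- as.POSIXct("2016-02-08", tz = "UTC")
y <- as.POSIXct("2016-03-29", tz = "UTC")

# Combine the two values, then set the display time zone explicitly
xy <- c(x, y)
attr(xy, "tzone") <- "UTC"

format(xy, "%Y-%m-%d")
# [1] "2016-02-08" "2016-03-29"
```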

set date to posixct without knowing origin

I have a datetime supplied in the following format:
ex <-2008123118
and am trying to convert it to datetime format. I try:
mytime2 <- as.POSIXct(ex, format = "%Y%m%d%H")
but receive the error:
Error in as.POSIXct.numeric(ex, format = "%Y%m%d%H") :
'origin' must be supplied
The error seems straightforward, but what if I don't know the origin? For example, this part of my code will go into a much larger program that will process a number of files.
I would like my example to return:
2008-12-31 18:00
Is that possible?
There are two versions of as.POSIXct working behind the scenes, as.POSIXct.numeric and as.POSIXct.character, and you have tried to call both of them at once. Because that value is coming in as "numeric", it gets passed to as.POSIXct.numeric, and that is the actual cause of the error: as.POSIXct.numeric has no detection algorithm for date-like values and also has no "format" parameter. Coerce it to "character" and all is well:
> ex <-2008123118
> mytime2 <- as.POSIXct(as.character(ex), format = "%Y%m%d%H")
> mytime2
[1] "2008-12-31 18:00:00 PST"
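The time zone abbreviation in the output above (PST) comes from the answerer's session. A minimal variation, assuming UTC is the intended zone for the data, passes tz explicitly and then formats the result the way the question asked for:

```r
ex <- 2008123118

# Convert the number to text first so that the format string is applied
parsed <- as.POSIXct(as.character(ex), format = "%Y%m%d%H", tz = "UTC")

# Drop the seconds when printing
format(parsed, "%Y-%m-%d %H:%M")
# [1] "2008-12-31 18:00"
```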
Here is an example with lubridate with the ymd_h function:
library(lubridate)
ymd_h(ex)
[1] "2008-12-31 18:00:00 UTC"
Here is a base R solution:
strptime(ex, format = "%Y%m%d%H")
[1] "2008-12-31 18:00:00 CET"
You can try the following code, in base R:
> ex <-2008123118
> as.POSIXct(strptime(ex, format = "%Y%m%d%H")) #this gives you additional seconds
"2008-12-31 18:00:00 EST"
> strftime(as.POSIXct(strptime(ex, format = "%Y%m%d%H")), format = "%Y-%m-%d %H:%M") #this gives your desired format
"2008-12-31 18:00"
Hope this helps.

ymd with vector of dates

A simple question, I think. I have some dates, d:
d <- as.POSIXct(c("2014-01-01 00:00:00 BST", "2014-01-01 00:30:00 BST"))
> class(d)
[1] "POSIXct" "POSIXt"
If I try and extract just the date part with lubridate, it works fine with a single value but not the whole vector, i.e.:
> ymd(d[1])
[1] "2014-01-01 UTC"
> ymd(d)
[1] NA NA
Warning message:
All formats failed to parse. No formats found.
For the record, this works:
> as.Date(d, format="%F")
[1] "2014-01-01" "2014-01-01"
What's going on here?
Your issue is that your vector is not just year, month, day (ymd), but also hour, minute, second (hms). Consider using this instead:
ymd_hms(d)
If you want to just extract the date, you can use:
strftime(ymd_hms(d),'%Y-%m-%d')
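One plausible explanation for why ymd(d[1]) worked while ymd(d) failed, assuming ymd() converts its input to text before parsing: base R drops the time when formatting POSIXct values that are all exactly midnight, so d[1] on its own looks like a plain date, while the full vector does not. The sketch below only demonstrates the base R formatting behaviour; tz = "UTC" is an assumption added to make it reproducible.

```r
d <- as.POSIXct(c("2014-01-01 00:00:00", "2014-01-01 00:30:00"), tz = "UTC")

# Only a midnight value is present, so the time part is dropped
single <- format(d[1])

# 00:30 is present, so every element is printed with a time part
whole <- format(d)

single
whole

# Extracting the calendar date directly, without going through text
dates <- as.Date(d, tz = "UTC")
dates
```

Using as.Date with an explicit tz avoids the text round trip entirely, which may be simpler than parsing with ymd_hms and formatting again.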

Resources