Handle durations but no dates - r

What is the best way to mainpulate only durations in R ? I mean I have a string vector like:
> test
[1] "00:04:06" "00:04:02" "00:04:16" "00:03:51" "00:03:55"
and I want to convert it to some specific class, which will understand these durations. I know I can use for example strptime:
> strptime(test, format = '%H:%M:%S')
[1] "2016-05-02 00:04:06 UTC" "2016-05-02 00:04:02 UTC" "2016-05-02 00:04:16 UTC" "2016-05-02 00:03:51 UTC" "2016-05-02 00:03:55 UTC"
but this creates a real dates vectors with today's date. I'd like to avoid it since this can cause troubles in the future for my application and this is a 'wrong' info.

Code:
require(lubridate)
test<-c("00:04:06", "00:04:02", "00:04:16", "00:03:51", "00:03:55")
t2<-lapply(test,lubridate::hms)
as.numeric(unlist(t2))
Output:
[1] 6 2 16 51 55

Related

Using anytime() function in R [duplicate]

This question already has an answer here:
Issue with the format in anytime package
(1 answer)
Closed yesterday.
I am trying to convert a string of date and time in R using anytime() function. The string have the format 'date-month-year hour-minute-second'. It seems that the anytime() function does not work for a few cases, as shown in the example below.
library(lubridate)
x1<-"03-01-2019 01:00:00"
x2<-"23-01-2019 17:00:00"
anytime(x1)
[1] "2019-03-01 01:00:00 CET"
anytime(x2)
[1] NA
I am trying to figure out how to get rid of this problem. Thanks for your help :)
You can use addFormats() to add a specific format to the stack known and used by anytime() (and/or anydate())
> library(anytime)
> addFormats("%d-%m-%Y %H:%M:%S")
> anytime(c("03-01-2019 01:00:00", "23-01-2019 17:00:00"))
[1] "2019-01-03 01:00:00 CST" "2019-01-23 17:00:00 CST"
>
As explained a few times before on this site, the xx-yy notation is ambiguous and interpreted differently in different parts of the world.
So anytime is guided by use as the separator: / is more common in North America so we use "mm/dd/yyy". On the other hand a hyphen is more common in Europe so the "dd-mm-yyyy" starts that way. You can use getFormats() to see the formats in anytime(). In a fresh session:
> head(getFormats(), 12) ## abbreviated for display here
[1] "%Y-%m-%d %H:%M:%S%f" "%Y-%m-%e %H:%M:%S%f" "%Y-%m-%d %H%M%S%f"
[4] "%Y-%m-%e %H%M%S%f" "%Y/%m/%d %H:%M:%S%f" "%Y/%m/%e %H:%M:%S%f"
[7] "%Y%m%d %H%M%S%f" "%Y%m%d %H:%M:%S%f" "%m/%d/%Y %H:%M:%S%f"
[10] "%m/%e/%Y %H:%M:%S%f" "%m-%d-%Y %H:%M:%S%f" "%m-%e-%Y %H:%M:%S%f"
>
You can use it without the head() to see all.
An alternative approach could be using parsedate package:
library(parsedate)
> parsedate::parse_date(x1)
[1] "2019-03-01 01:00:00 UTC"
> parsedate::parse_date(x2)
[1] "2019-01-23 17:00:00 UTC"

Numeric to DateTime format in R

I would like to know what is the best way to convert 201509122150 (numeric) to Date class within YYYY-MM-DD hh:mm format.
E.g.
x <- 201509122150
as.POSIXct(as.character(x), format="%Y%m%d%H%M")
# [1] "2015-09-12 21:50:00 CEST"
This can be easily done with lubridate
library(lubridate)
ymd_hm(x)
#[1] "2015-09-12 21:50:00 UTC"
data
x <- 201509122150

How to get the beginning of the day in POSIXct

My day starts at 2016-03-02 00:00:00. Not 2016-03-02 00:00:01.
How do I get the beginning of the day in POSIXct in local time?
My confusing probably comes from the fact that R sees this as the end-date of 2016-03-01? Given that R uses an ISO 8601?
For example if I try to find the beginning of the day using Sys.Date():
as.POSIXct(Sys.Date(), tz = "CET")
"2016-03-01 01:00:00 CET"
Which is not correct - but are there other ways?
I know I can hack my way out using a simple
as.POSIXct(paste(Sys.Date(), "00:00:00", sep = " "), tz = "CET")
But there has to be a more correct way to do this? Base R preferred.
It's a single command---but you want as.POSIXlt():
R> as.POSIXlt(Sys.Date())
[1] "2016-03-02 UTC"
R> format(as.POSIXlt(Sys.Date()), "%Y-%m-%d %H:%M:%S")
[1] "2016-03-02 00:00:00"
R>
It is only when converting to POSIXct happens that the timezone offset to UTC (six hours for me) enters:
R> as.POSIXct(Sys.Date())
[1] "2016-03-01 18:00:00 CST"
R>
Needless to say by wrapping both you get the desired type and value:
R> as.POSIXct(as.POSIXlt(Sys.Date()))
[1] "2016-03-02 UTC"
R>
Filed under once again no need for lubridate or other non-Base R packages.
Notwithstanding that you understandably prefer base R, a "smart way," for certain meaning of "smart," would be:
library(lubridate)
x <- floor_date(Sys.Date(),"day")
> format(x,"%Y-%m-%d-%H-%M-%S")
[1] "2016-03-02-00-00-00"
From ?floor_date:
floor_date takes a date-time object and rounds it down to the nearest
integer value of the specified time unit.
Pretty handy.
Your example is a bit unclear.
You are talking about a 1 minute difference for the day start, but your example shows a 1 hour difference due to the timezone.
You can try
?POSIXct
to get the functionality explained.
Using Sys.Date() withing POSIXct somehow overwrites your timezone setting.
as.POSIXct(Sys.Date(), tz="EET")
"2016-03-01 01:00:00 CET"
While entering a string gives you
as.POSIXct("2016-03-01 00:00:00", tz="EET")
"2016-03-01 EET"
It looks like 00:00:00 is actually the beginning of the day. You can conclude it from the results of the following 2 inequalities
as.POSIXct("2016-03-02 00:00:02 CET")>as.POSIXct("2016-03-02 00:00:01 CET")
TRUE
as.POSIXct("2016-03-02 00:00:01 CET")>as.POSIXct("2016-03-02 00:00:00 CET")
TRUE
So somehow this is a timezone issue. Notice that 00:00:00 is automatically removed from the as.POSIXct result.
as.POSIXct("2016-03-02 00:00:00 CET")
"2016-03-02 CET"

Date conversion without specifying the format

I do not understand how the "ymd" function from the library "lubridate" works in R. I am trying to build a feature which converts the date correctly without having to specify the format. I am checking for the minimum number of NA's occurring as a result of dmy(), mdy() and ymd() functions.
So ymd() is giving NA sometimes and sometimes not for the same Date value. Are there any other functions or packages in R, which will help me get over this problem.
> data$DTTM[1:5]
[1] "4-Sep-06" "27-Oct-06" "8-Jan-07" "28-Jan-07" "5-Jan-07"
> ymd(data$DTTM[1])
[1] NA
Warning message:
All formats failed to parse. No formats found.
> ymd(data$DTTM[2])
[1] "2027-10-06 UTC"
> ymd(data$DTTM[3])
[1] NA
Warning message:
All formats failed to parse. No formats found.
> ymd(data$DTTM[4])
[1] "2028-01-07 UTC"
> ymd(data$DTTM[5])
[1] NA
Warning message:
All formats failed to parse. No formats found.
>
> ymd(data$DTTM[1:5])
[1] "2004-09-06 UTC" "2027-10-06 UTC" "2008-01-07 UTC" "2028-01-07 UTC"
[5] "2005-01-07 UTC"
Thanks
#user1317221_G has already pointed out that you dates are in day-month-year format, which suggests that you should use dmy instead of ymd. Furthermore, because your month is in %b format ("Abbreviated month name in the current locale"; see ?strptime), your problem may have something to do with your locale. The month names you have seem to be English, which may differ from how they are spelled in the locale you are currently using.
Let's see what happens when I try dmy on the dates in my locale:
date_english <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")
dmy(date_english)
# [1] "2006-09-04 UTC" NA "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
# Warning message:
# 1 failed to parse.
"27-Oct-06" failed to parse. Let's check my time locale:
Sys.getlocale("LC_TIME")
# [1] "Norwegian (Bokmål)_Norway.1252"
dmy does not recognize "oct" as a valid %b month in my locale.
One way to deal with this issue would be to change "oct" to the corresponding Norwegian abbreviation, "okt":
date_nor <- c("4-Sep-06", "27-Okt-06", "8-Jan-07", "28-Jan-07", "5-Jan-07" )
dmy(date_nor)
# [1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
Another possibility is to use the original dates (i.e. in their original 'locale'), and set the locale argument in dmy. Exactly how this is done is platform dependent (see ?locales. Here is how I would do it in Windows:
dmy(date_english, locale = "English")
[1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
Using the guess_formats function in the lubridate package would be the closest to what you are after.
library(lubridate)
x <- c("4-Sep-06", "27-Oct-06","8-Jan-07" ,"28-Jan-07","5-Jan-2007")
format <- guess_formats(x, c("mdY", "BdY", "Bdy", "bdY", "bdy", "mdy", "dby"))
strptime(x, format)
HTH
from the documentation on ymd on page 70
As long as the order of formats is
correct, these functions will parse dates correctly even when the input vectors contain differently
formatted dates
ymd() expects year-month-day, you have day-month-year
x <- c("2009-01-01", "2009-01-02", "2009-01-03")
ymd(x)
maybe you need something like
y <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07" )
as.POSIXct(y, format = "%d-%b-%y")
PS the reason I think you get NAs for some is that you only have a single digit for year and ymd doesn't know what to do with that, but it works when you have two digits for year e.g. "27-Oct-06" "28-Jan-07" but fails for "5-Jan-07" etc

Incorrect Conversion of Date as a Factor to a Date

I am having trouble calculating a date that is imported in from a .csv file. What I want to do is take that date in the factor DateClosed and generate a date in a date field (a). Example if a=203 I want the date to be the equivalent of DateClosed-203. However, I am having trouble with the code listed below.
DateClose is a factor.
> head(DateClosed)
[1] 7/30/2007 12/12/2007 5/8/2009 6/24/2009 6/24/2009 2/29/2008
165 Levels: 1/12/2010 1/15/2011 1/15/2013 1/17/2009 1/18/2008 1/19/2012 1/2/2013 1/21/2013 1/22/2010 1/24/2013 1/26/2014 ... 9/7/2010
> head(as.Date(DateClosed,format="%m/%d/%y"))
[1] "2020-07-30" "2020-12-12" "2020-05-08" "2020-06-24" "2020-06-24" "2020-02-29"
head(as.Date(DateClosed,format="%m/%d/%y"))-203
[1] "2020-01-09" "2020-05-23" "2019-10-18" "2019-12-04" "2019-12-04" "2019-08-10"
It subtracts 203 days correctly but for some reason reads the date wrong.
DateClosed <- factor(c("7/30/2007","12/12/2007", "5/8/2009"))
as.Date(DateClosed, format="%m/%d/%Y")
Produces:
[1] "2007-07-30" "2007-12-12" "2009-05-08"
Notice the capital "Y" in the format param. The lower case "y" is for 2 digit years, so as.Date reads the first two digits of the year token ("20"), and then assumes that refers to just the last two digits of the year, and adds the current date's century (also "20"), so you end up with dates in 2020.
Manipulating dates becomes really easy using lubridate package.
mdy(factor(c("7/30/2007","12/12/2007", "5/8/2009")))
"2007-07-30 UTC" "2007-12-12 UTC" "2009-05-08 UTC"
Or using parse_date_time with the same package:
parse_date_time(factor(c("7/30/2007","12/12/2007", "5/8/2009")),c('mdY'))
[1] "2007-07-30 UTC" "2007-12-12 UTC" "2009-05-08 UTC"

Resources