Handling integer times in R

Times in my data frame are recorded as integers, as in: 1005, 1405, 745, 1130, 2030, etc. How do I convert these integers so that R will understand them and I can use them in functions such as strptime? Thanks in advance for your help.

Solution using strptime()
As was pointed out by Psidom in his comment, you can convert the integers to character and use strptime():
int_times <- c(1005,1405,745,1130,2030)
strptime(as.character(int_times), format="%H%M")
## [1] "2016-04-21 10:05:00 CEST" "2016-04-21 14:05:00 CEST" NA
## [4] "2016-04-21 11:30:00 CEST" "2016-04-21 20:30:00 CEST"
However, as you can see, you run into trouble as soon as the number has only three digits. You can get around this by using formatC() to format the integers to character with four digits and a leading zero (if needed):
char_times <- formatC(int_times, flag = "0", width = 4)
char_times
[1] "1005" "1405" "0745" "1130" "2030"
Now, conversion works:
strptime(char_times, format="%H%M")
## [1] "2016-04-21 10:05:00 CEST" "2016-04-21 14:05:00 CEST" "2016-04-21 07:45:00 CEST"
## [4] "2016-04-21 11:30:00 CEST" "2016-04-21 20:30:00 CEST"
Note that strptime() always returns a POSIXlt object that includes both a date and a time. Since no date was given, the current day was used. But you could also use paste() to combine the times with any date:
strptime(paste("2010-03-21", char_times), format="%Y-%m-%d %H%M")
## [1] "2010-03-21 10:05:00 CET" "2010-03-21 14:05:00 CET" "2010-03-21 07:45:00 CET"
## [4] "2010-03-21 11:30:00 CET" "2010-03-21 20:30:00 CET"
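As a variation on the steps above, here is a minimal sketch that pads with sprintf() instead of formatC() and converts straight to POSIXct. The date and the fixed time zone "UTC" are assumptions, chosen only so that the result does not depend on the machine's local settings.

```r
int_times <- c(1005, 1405, 745, 1130, 2030)

# pad to four digits with leading zeros; "%04d" needs integer input
char_times <- sprintf("%04d", as.integer(int_times))

# attach an arbitrary date and parse; tz = "UTC" is an assumption for reproducibility
parsed <- as.POSIXct(paste("2010-03-21", char_times),
                     format = "%Y-%m-%d %H%M", tz = "UTC")

# keep only the clock time for display
clock <- format(parsed, "%H:%M")
clock
## [1] "10:05" "14:05" "07:45" "11:30" "20:30"
```

If you need POSIXlt instead (for example to access parsed$hour), wrap the result in as.POSIXlt().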
Solution using lubridate::hm()
As was suggested by Richard Telford in his comment, you could also make use of lubridate's period class, if you prefer not to have any date involved. This class is for periods of time, so you could represent a clock time, say 10:23, as the period 10 hours, 23 minutes. However, simply using hm() from lubridate does not work:
library(lubridate)
hm(char_times)
## [1] NA NA NA NA NA
## Warning message:
## In .parse_hms(..., order = "HM", quiet = quiet) :
## Some strings failed to parse
The reason is that without a separator, it is not clear how these times should be converted. hm() just expects a representation that has hours before minutes. But "1005" could be 100 hours and 5 minutes just as well as 10 hours and 5 minutes. So you need to introduce a separator between hours and minutes, which you could do, for instance, as follows:
char_times2 <- paste(substr(char_times, 1, 2), substr(char_times, 3, 4))
hm(char_times2)
## [1] "10H 5M 0S" "14H 5M 0S" "7H 45M 0S" "11H 30M 0S" "20H 30M 0S"
Note that I have again used the fixed-width string representation char_times, because then the hours are always given by the first two characters. This makes it easy to use substr().
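If you would rather avoid string handling altogether, the hours and minutes can also be split off arithmetically. This is only a sketch in base R: it stores the clock time as a difftime in minutes since midnight, which is my own choice of representation and not something lubridate requires.

```r
int_times <- c(1005, 1405, 745, 1130, 2030)

# integer division gives the hours, the remainder gives the minutes
hours   <- int_times %/% 100
minutes <- int_times %% 100

# represent each clock time as minutes since midnight (an assumed convention)
since_midnight <- as.difftime(hours * 60 + minutes, units = "mins")
as.numeric(since_midnight)
## [1]  605  845  465  690 1230
```

The hours and minutes vectors could equally be passed to lubridate's hours() and minutes() constructors if you want a Period.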

Related

lubridate: Parsing dates of the form '27.10.2013 02A:00' (daylight savings time)

I am trying to parse strings of the form 25.10.2013 17:30 (the timezone is CET/CEST, but this is not specified in the strings themselves) as POSIXct using lubridate's dmy_hm(..., tz = 'Europe/Brussels') function.
The problem is that after parsing, there are duplicate values on the day CEST switches to CET (the clock jumps one hour back). The cause seems to be the way this shift is indicated in my data: 02A:00 for 2 o'clock CEST and 02B:00 for 2 o'clock CET, which is one hour later. dmy_hm(..., tz = 'Europe/Brussels') interprets both as CET.
Minimal working example:
> library(lubridate)
> times = c("27.10.2013 01:00", "27.10.2013 02A:00",
"27.10.2013 02B:00", "27.10.2013 03:00")
> times = dmy_hm(times, tz = "Europe/Brussels")
> times
[1] "2013-10-27 01:00:00 CEST" "2013-10-27 02:00:00 CET"
[3] "2013-10-27 02:00:00 CET" "2013-10-27 03:00:00 CET"
My question is: What would be the best way to fix the "wrong" dates?
I tried to use which(duplicated(times)) to find the indices of the duplicate values and remove one hour from the "wrong" values, however there seems to be another problem:
> times[2] - hours(1)
[1] "2013-10-27 01:00:00 CEST"
Why does removing one hour from '"2013-10-27 02:00:00 CET"' bring me to '"2013-10-27 01:00:00 CEST"'? Isn't that a jump of two hours? I would expect to land at '"2013-10-27 02:00:00 CEST"'.
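EDIT: The last part is a known issue (see https://github.com/tidyverse/lubridate/issues/498). The solution is to use dhours() instead of hours(): hours(1) is a period that shifts the clock time, whereas dhours(1) is a duration of exactly 3600 seconds.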
> times[2] - dhours(1)
[1] "2013-10-27 02:00:00 CEST"
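For the original problem of the A/B markers, one possible approach in base R is sketched below. It assumes that "A" always means the first (CEST) occurrence and "B" the second (CET) one, as described in the question. It strips the marker, lets R parse the ambiguous clock time whichever way the platform prefers, and then shifts by one hour only where the resulting UTC offset contradicts the marker. The variable names are my own.

```r
times <- c("27.10.2013 01:00", "27.10.2013 02A:00",
           "27.10.2013 02B:00", "27.10.2013 03:00")

# extract the marker ("A", "B" or "") and remove it from the string
marker <- sub(".* [0-9]{2}([AB]?):.*", "\\1", times)
clean  <- sub("([0-9]{2})[AB]:", "\\1:", times)

parsed <- as.POSIXct(clean, format = "%d.%m.%Y %H:%M", tz = "Europe/Brussels")
offset <- format(parsed, "%z")   # "+0200" for CEST, "+0100" for CET

# shift only where the parsed offset disagrees with the marker
shift <- ifelse(marker == "A" & offset == "+0100", -3600,
         ifelse(marker == "B" & offset == "+0200",  3600, 0))
fixed <- parsed + shift

format(fixed, "%Y-%m-%d %H:%M %Z")
## [1] "2013-10-27 01:00 CEST" "2013-10-27 02:00 CEST"
## [3] "2013-10-27 02:00 CET"  "2013-10-27 03:00 CET"
```

Adding or subtracting plain seconds from a POSIXct value moves along the physical timeline, which is the same reason dhours() behaves as expected here.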

Parsing 12-hour times using lubridate

I'm trying to get my head around parsing 12-hour times using lubridate. If I run
library(lubridate)
times <- c("1:30 AM", "6:29 AM", "6:59 AM", "9:54 AM", "2:45 PM")
hm(times)
I get
[1] "1H 30M 0S" "6H 29M 0S" "6H 59M 0S" "9H 54M 0S" "2H 45M 0S"
Note that the AM/PM designation is not used. However, if the time strings also include a date, then the parsing works:
ymd_hm(paste("01-01-01", times))
[1] "2001-01-01 01:30:00 UTC" "2001-01-01 06:29:00 UTC"
[3] "2001-01-01 06:59:00 UTC" "2001-01-01 09:54:00 UTC"
[5] "2001-01-01 14:45:00 UTC"
It seems to me that the time parsing functions (hm, hms, ...) don't recognize the AM/PM, but the date functions do. Is it possible to allow for 12-hour parsing without going through the dates?
[I know I can do this by parsing the strings, but I was wondering if it was possible within lubridate.]
The two objects belong to different classes, each designed for a specific purpose.
The first function creates an object of class Period. This class is designed to represent lengths of time, such as the time of a race: Bolt's 100 meters, for instance, took 0 hours, 0 minutes and 9.58 seconds.
See:
a <- hm(times)
class(a)
[1] "Period"
attr(,"package")
[1] "lubridate"
The second function, ymd_hm, creates an object of class POSIXct:
b <- ymd_hm(paste("01-01-01", times))
class(b)
[1] "POSIXct" "POSIXt"
This class is designed to represent a point in time in the sense of the Gregorian calendar (or possibly other calendars). It also parses AM/PM, which is vital for distinguishing hours of the day on a 12-hour clock.
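If the goal is only a 24-hour clock time, base R's strptime() can read the AM/PM strings via %I and %p, and the date it fills in can simply be ignored. This is a sketch, not a lubridate feature. Note that %p is locale-dependent, so I am assuming an English (or C) locale, and tz = "UTC" is set only to keep the result reproducible.

```r
times <- c("1:30 AM", "6:29 AM", "6:59 AM", "9:54 AM", "2:45 PM")

# %I = hour on a 12-hour clock, %p = AM/PM indicator (locale-dependent)
parsed <- strptime(times, format = "%I:%M %p", tz = "UTC")

# 24-hour clock strings; the date that strptime filled in is dropped
clock24 <- format(parsed, "%H:%M")
clock24
## [1] "01:30" "06:29" "06:59" "09:54" "14:45"

# minutes since midnight, taken from the POSIXlt fields
mins <- parsed$hour * 60 + parsed$min
mins
## [1]  90 389 419 594 885
```

From here, something like lubridate::hm(clock24) should give a Period with the correct hours, since the strings are now unambiguous 24-hour times.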

Best way to deal with differing date data [duplicate]

I am trying to do some simple operations in R. After loading a table, I encountered a date column that combines several formats.
**Date**
1/28/14 6:43 PM
1/29/14 4:10 PM
1/30/14 12:09 PM
1/30/14 12:12 PM
02-03-14 19:49
02-03-14 20:03
02-05-14 14:33
I need to convert this to format like 28-01-2014 18:43 i.e. %d-%m-%y %h:%m
I tried this
tablename$Date <- as.Date(as.character(tablename$Date), "%d-%m-%y %h:%m")
but this fills the entire column with NA. Please help me get this right!
The lubridate package makes quick work of this:
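library(lubridate)
# the date column from the question, as a character vector
dates <- c("1/28/14 6:43 PM", "1/29/14 4:10 PM", "1/30/14 12:09 PM", "1/30/14 12:12 PM",
           "02-03-14 19:49", "02-03-14 20:03", "02-05-14 14:33")
d <- parse_date_time(dates, names(guess_formats(dates, c("mdy HM", "mdy IMp"))))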
d
## [1] "2014-01-28 18:43:00 UTC" "2014-01-29 16:10:00 UTC"
## [3] "2014-01-30 12:09:00 UTC" "2014-01-30 12:12:00 UTC"
## [5] "2014-02-03 19:49:00 UTC" "2014-02-03 20:03:00 UTC"
## [7] "2014-02-05 14:33:00 UTC"
# put in desired format
format(d, "%m-%d-%Y %H:%M:%S")
## [1] "01-28-2014 18:43:00" "01-29-2014 16:10:00" "01-30-2014 12:09:00"
## [4] "01-30-2014 12:12:00" "02-03-2014 19:49:00" "02-03-2014 20:03:00"
## [7] "02-05-2014 14:33:00"
You'll need to adjust the vector in guess_formats if you come across other format variations.
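For comparison, the same idea can be sketched in base R by trying a list of candidate formats in turn and filling in whatever is still NA. The two format strings are my reading of the sample data: like the lubridate answer, I assume the dash-separated values are month-day-year. The output uses the day-month-year layout the question asked for.

```r
dates <- c("1/28/14 6:43 PM", "1/29/14 4:10 PM", "1/30/14 12:09 PM",
           "1/30/14 12:12 PM", "02-03-14 19:49", "02-03-14 20:03",
           "02-05-14 14:33")

# candidate formats, tried in order (assumed from the sample data)
formats <- c("%m/%d/%y %I:%M %p", "%m-%d-%y %H:%M")

# start with an all-NA POSIXct vector and fill it format by format
parsed <- as.POSIXct(rep(NA_real_, length(dates)),
                     origin = "1970-01-01", tz = "UTC")
for (fmt in formats) {
  todo <- is.na(parsed)
  parsed[todo] <- as.POSIXct(dates[todo], format = fmt, tz = "UTC")
}

out <- format(parsed, "%d-%m-%Y %H:%M")
out
## [1] "28-01-2014 18:43" "29-01-2014 16:10" "30-01-2014 12:09" "30-01-2014 12:12"
## [5] "03-02-2014 19:49" "03-02-2014 20:03" "05-02-2014 14:33"
```

As with guess_formats, the formats vector needs extending if other layouts turn up; anything that matches none of them stays NA. The %p part is locale-dependent.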

Changing dates in different time zones by adding to POSIXlt

I am running into an error when I try to localize times for "date" (a variable of class=POSIXlt) in my dataset. Example code is as follows:
# All dates are coded by survey software in EST(not local time)
date <- c("2011-07-26 07:23", "2011-07-29 07:34", "2011-07-29 07:40")
region <-c("USA-EST", "UK", "Singapore")
#Change the times based on time-zone differences
start_time<-strptime(date,"%Y-%m-%d %h:%m")
localtime=as.POSIXlt(start_time)
localtime<-ifelse(region=="UK",start_time+6,start_time)
localtime<-ifelse(region=="Singapore",start_time+12,start_time)
#Then, I need to extract the hour and weekday
weekday<-weekdays(localtime)
hour<-factor(localtime)
There must be something wrong with my "ifelse" statement, because I get the error: number of items to replace is not a multiple of replacement length. Please help!
How about using R's native time code? The trick is that you can't have more than one time-zone in a POSIX vector, so use a list instead:
region <- c("EST","Europe/London","Asia/Singapore")
(localtime <- lapply(seq_along(date),function(x) as.POSIXlt(date[x],tz=region[x])))
[[1]]
[1] "2011-07-26 07:23:00 EST"
[[2]]
[1] "2011-07-29 07:34:00 Europe/London"
[[3]]
[1] "2011-07-29 07:40:00 Asia/Singapore"
And to convert to a vector in a single timezone:
Reduce("c",localtime)
[1] "2011-07-26 13:23:00 BST" "2011-07-29 07:34:00 BST"
[3] "2011-07-29 00:40:00 BST"
Note that my system timezone is BST, but if yours is EST it will convert to that.
You can use the time zone handling built into POSIXct:
> start_time <- as.POSIXct(date, format = "%Y-%m-%d %H:%M", tz = "America/New_York")
> start_time
[1] "2011-07-26 07:23:00 EDT" "2011-07-29 07:34:00 EDT" "2011-07-29 07:40:00 EDT"
> format(start_time, tz="Europe/London", usetz=TRUE)
[1] "2011-07-26 12:23:00 BST" "2011-07-29 12:34:00 BST" "2011-07-29 12:40:00 BST"
> format(start_time, tz="Asia/Singapore", usetz=TRUE)
[1] "2011-07-26 19:23:00 SGT" "2011-07-29 19:34:00 SGT" "2011-07-29 19:40:00 SGT"
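To get back to what the question actually wanted (local hour and weekday per row), the conversion can be done element by element. This is a sketch under two assumptions: the mapping from the survey's region labels to Olson time zone names in tz_lookup is my own guess, and the recorded times are taken to be New York wall-clock time as in the answer above.

```r
date   <- c("2011-07-26 07:23", "2011-07-29 07:34", "2011-07-29 07:40")
region <- c("USA-EST", "UK", "Singapore")

# hypothetical lookup from region label to time zone name
tz_lookup <- c("USA-EST"   = "America/New_York",
               "UK"        = "Europe/London",
               "Singapore" = "Asia/Singapore")

start_time <- as.POSIXct(date, format = "%Y-%m-%d %H:%M",
                         tz = "America/New_York")

# one POSIXlt per row, each expressed in its own local time zone
local <- lapply(seq_along(start_time), function(i) {
  as.POSIXlt(start_time[i], tz = tz_lookup[[region[i]]])
})

local_hour <- vapply(local, function(lt) lt$hour, integer(1))
local_wday <- vapply(local, function(lt) lt$wday, integer(1))  # 0 = Sunday

local_hour
## [1]  7 12 19
local_wday
## [1] 2 5 5
```

Keeping the results as plain integer vectors sidesteps the restriction that a single POSIX vector cannot hold several time zones.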

Round a POSIX date (POSIXct) with base R functionality

I'm currently playing around a lot with dates and times for a package I'm building.
Stumbling across this post reminded me again that it's generally not a bad idea to check out if something can be done with basic R features before turning to contrib packages.
Thus, is it possible to round a date of class POSIXct with base R functionality?
I checked
methods(round)
which "only" gave me
[1] round.Date round.timeDate*
Non-visible functions are asterisked
This is what I'd like to do (Pseudo Code)
x <- as.POSIXct(Sys.time())
[1] "2012-07-04 10:33:55 CEST"
round(x, atom="minute")
[1] "2012-07-04 10:34:00 CEST"
round(x, atom="hour")
[1] "2012-07-04 11:00:00 CEST"
round(x, atom="day")
[1] "2012-07-04 CEST"
I know this can be done with timeDate, lubridate etc., but I'd like to keep package dependencies down. So before going ahead and checking out the source code of the respective packages, I thought I'd ask if someone has already done something like this.
base has round.POSIXt to do this. Not sure why it doesn't come up with methods.
x <- as.POSIXct(Sys.time())
x
[1] "2012-07-04 10:01:08 BST"
round(x,"mins")
[1] "2012-07-04 10:01:00 BST"
round(x,"hours")
[1] "2012-07-04 10:00:00 BST"
round(x,"days")
[1] "2012-07-04"
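To make the behaviour reproducible, here is a small sketch with a fixed timestamp instead of Sys.time(); the time zone "UTC" is an assumption for the example. It also shows trunc(), which rounds down, and the fact that round() on a POSIXct returns a POSIXlt, so you may want to convert back.

```r
x <- as.POSIXct("2012-07-04 10:33:55", tz = "UTC")

rounded_min  <- round(x, "mins")    # nearest minute
rounded_hour <- round(x, "hours")   # nearest hour
floored_hour <- trunc(x, "hours")   # always rounds down

# round() and trunc() return POSIXlt; convert back if POSIXct is needed
rounded_hour_ct <- as.POSIXct(rounded_hour)

format(rounded_min, "%H:%M:%S")
## [1] "10:34:00"
format(rounded_hour_ct, "%H:%M:%S")
## [1] "11:00:00"
format(floored_hour, "%H:%M:%S")
## [1] "10:00:00"
```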
On this theme with lubridate, also look into the ceiling_date() and floor_date() functions:
library(lubridate)
x <- as.POSIXct("2009-08-03 12:01:59.23")
ceiling_date(x, "second")
# "2009-08-03 12:02:00 CDT"
ceiling_date(x, "hour")
# "2009-08-03 13:00:00 CDT"
ceiling_date(x, "day")
# "2009-08-04 CDT"
ceiling_date(x, "week")
# "2009-08-09 CDT"
ceiling_date(x, "month")
# "2009-09-01 CDT"
If you don't want to call external libraries and want to keep POSIXct, as I do, here is one idea (inspired by this question): use strptime and paste in a fake month and day. One might expect a more direct route, but as noted in this comment:
"For strptime the input string need not specify the date completely:
it is assumed that unspecified seconds, minutes or hours are zero, and
an unspecified year, month or day is the current one."
Thus it seems that you have to use strftime to output a truncated string, paste in the missing part, and convert the result back to POSIXct.
This is how an updated answer could look:
x <- as.POSIXct(Sys.time())
x
[1] "2018-12-27 10:58:51 CET"
round(x,"mins")
[1] "2018-12-27 10:59:00 CET"
round(x,"hours")
[1] "2018-12-27 11:00:00 CET"
round(x,"days")
[1] "2018-12-27 CET"
as.POSIXct(paste0(strftime(x,format="%Y-%m"),"-01")) #trunc by month
[1] "2018-12-01 CET"
as.POSIXct(paste0(strftime(x,format="%Y"),"-01-01")) #trunc by year
[1] "2018-01-01 CET"
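Another base R option, for intervals that round() does not offer (say, 15 minutes), is to do the arithmetic on the underlying number of seconds. The helper round_to() below is hypothetical, not part of base R, and it aligns to multiples of the interval counted from the epoch in UTC, so treat it as a sketch for zones whose offset is a whole multiple of the interval.

```r
# hypothetical helper: round a POSIXct to the nearest multiple of `seconds`
round_to <- function(x, seconds, tz = "UTC") {
  as.POSIXct(round(as.numeric(x) / seconds) * seconds,
             origin = "1970-01-01", tz = tz)
}

x <- as.POSIXct("2018-12-27 10:58:51", tz = "UTC")

quarter <- round_to(x, 15 * 60)   # nearest quarter of an hour
minute  <- round_to(x, 60)        # nearest minute

format(quarter, "%Y-%m-%d %H:%M:%S")
## [1] "2018-12-27 11:00:00"
format(minute, "%Y-%m-%d %H:%M:%S")
## [1] "2018-12-27 10:59:00"
```

Replacing round() with floor() or ceiling() inside the helper gives the truncating and ceiling variants.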

Resources