Recognize time in R

I am working on a dataset in R with a time variable like this:
Time = data.frame("X1"=c(930,1130,914,1615))
The first one or two digits of X1 refer to the hour and the last two refer to the minute. I want to make R recognize it as a time variable.
I tried to use the lubridate hm function, but it didn't work, probably because a ":" is missing between the hour and minute in my data.
I also thought about using the str_sub function to separate the hour and minute first, then putting them together with a ":" in between, and finally using the lubridate function. But I don't know how to extract the hour, since it is sometimes presented as one digit and sometimes as two digits.
How do I make R recognize this as a time variable?
Thanks very much!

You could 0-pad to 4 digits and then format using standard R date tools:
as.POSIXct(sprintf("%04d",Time$X1), format="%H%M")
#[1] "2018-04-22 09:30:00 AEST" "2018-04-22 11:30:00 AEST"
#[3] "2018-04-22 09:14:00 AEST" "2018-04-22 16:15:00 AEST"
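If only the clock part is needed, the same idea can be sketched in base R with the time zone pinned and the date dropped on output. The choice of "UTC" and the names parsed and clock are assumptions made for illustration, not part of the question.

```r
# Same data as in the question
Time <- data.frame(X1 = c(930, 1130, 914, 1615))

# Zero-pad to four digits, parse as hour/minute in a fixed zone
# (UTC is an arbitrary choice here), then keep only the clock part
parsed <- as.POSIXct(sprintf("%04d", Time$X1), format = "%H%M", tz = "UTC")
clock <- format(parsed, "%H:%M")
print(clock)
#[1] "09:30" "11:30" "09:14" "16:15"
```

Note that parsed still carries today's date, because POSIXct always stores a full date-time; clock is a plain character vector.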

This converts them to chron "times" class. Internally such variables are stored as a fraction of a day and are rendered on output as shown below. The sub inserts a : before the last 2 characters and :00 after them so that they are in HH:MM:SS format which times understands.
library(chron)
times(sub("(..)$", ":\\1:00", Time$X1))
## [1] 09:30:00 11:30:00 09:14:00 16:15:00
It could also be done like this where we transform each to a fraction of a day:
with(Time, times( (X1 %/% 100) / 24 + (X1 %% 100) / (24 * 60) ))
## [1] 09:30:00 11:30:00 09:14:00 16:15:00
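The fraction-of-a-day arithmetic can be checked by hand in base R, without chron. This is only a sketch of the calculation; the variable names are made up for the example.

```r
x <- c(930, 1130, 914, 1615)

hours   <- x %/% 100   # integer division strips the last two digits
minutes <- x %% 100    # remainder keeps the last two digits

# Fraction of a 24-hour day, as in the times() call above
day_fraction <- hours / 24 + minutes / (24 * 60)

# Converting back to whole minutes since midnight as a sanity check
minutes_since_midnight <- round(day_fraction * 24 * 60)
print(minutes_since_midnight)
#[1] 570 690 554 975
```

For example, 930 becomes 9/24 + 30/1440, which is 570 minutes out of the 1440 in a day.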

Related

Lubridate - how to properly parse decimal minutes using parse_date_time [duplicate]

Let's say that you have a vector of times in minutes stored as decimal numbers. For this example, I am going to pick 2.5, which means 2 minutes and 30 seconds.
I want to parse this using lubridate's parse_date_time function. However, the output I see is wrong. After running this line:
parse_date_time(2.5, "M:S")
I am getting the following output:
"0000-01-01 00:02:05 UTC"
Instead of expected
"0000-01-01 00:02:30 UTC"
Somehow lubridate doesn't recognize that my .5 doesn't mean 5 seconds, but 30 seconds since two and a half minutes are 2 minutes and 30 seconds and not 2 minutes and 5 seconds.
What is the proper format of the parse_date_time I should use to get the expected output?
Here's a base R option:
val <- 2.5
as.POSIXct(val*60, origin = '0000-01-01', tz = 'UTC')
#[1] "0000-01-01 00:02:30 UTC"
If you want to use lubridate, you can do something like this:
library(lubridate)
time <- 2.5
as.period(as.duration(days(x=1))*(time/24))
#[1] "2H 30M 0S"
as.period(as.duration(days(x=1))*(time/1440))
#[1] "2M 30S"
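If all that is needed is a minutes:seconds label, the conversion can also be sketched with plain arithmetic: turn decimal minutes into whole seconds first, then split. The sample values and variable names below are assumptions for illustration.

```r
val <- c(2.5, 0.25, 10.75)        # decimal minutes

total_seconds <- round(val * 60)  # 2.5 min -> 150 s
mins <- total_seconds %/% 60
secs <- total_seconds %% 60

label <- sprintf("%02d:%02d", as.integer(mins), as.integer(secs))
print(label)
#[1] "02:30" "00:15" "10:45"
```

The key step is multiplying by 60 before splitting, so that .5 is treated as half a minute rather than as the digit 5.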

R posixct dates and times not centering on midnight

I have dates and times stored in two columns. The first holds the date as "20180831". The time is stored as the number of seconds from midnight, so 3 a.m. would be stored as 10,800.
I need a combined date time column and am having a hard time with something that should be simple.
I can get the dates in with no problem, but lubridate's hms interprets the time field as a period, not a 'time' per se.
I tried converting the date to POSIXct format and then using that as the origin for the time field, but POSIXct does not set the time to midnight; instead it sets it to either 1800 or 1900 hours depending on the date. I need it set to midnight for all rows, and I don't want any daylight saving time adjustment.
Here's the code:
First I made a function because there are several date and time fields I have to do this for.
mkdate <- function(x) {
  a <- as.Date(as.character(x), format = '%Y%m%d')
  a <- as.POSIXct(a)
  return(a)
}
df$date <- mkdate(df$date) # applies the date-making function to the date field
df$datetime <- as.POSIXct(df$time, origin = df$date)
I'm sure this has to do with time zones. I'm in Central time zone and I have experimented with adding the "tz" specification into these commands in both the mkdate function and in the time code creating "datetime" column.
I've tried:
tz="America/Chicago"
tz="CST"
tz="UTC"
Help would be much appreciated!
Edited with example:
x<-c(20180831,20180710,20160511,20170105,20180101) #these are dates.
as.POSIXct(as.Date(as.character(x),format="%Y%m%d"))
The code above converts the dates to seconds since Jan 1, 1970. I could convert this to numeric and add my 'seconds' value to this field, BUT the result is not correct. This is what I see instead as the output:
[1] "2018-08-30 19:00:00 CDT" "2018-07-09 19:00:00 CDT" "2016-05-10 19:00:00 CDT" "2017-01-04 18:00:00 CST" "2017-12-31 18:00:00 CST"
Look at the first date - it should be 8/31 but instead it is 8/30. Somewhere in there there is a timezone adjustment taking place. It's moving the clock back 5 or 6 hours because I am on central time. The first entry should be 2018-08-31 00:00:00. I would then convert it to numeric and add the seconds field on and convert back to POSIXct format. I've tried including tz specification all over the place with no luck.
Sys.getlocale("LC_TIME")
returns "English_United States.1252"
I believe the following does what you want.
My locale is the following, so the results are different from yours.
Sys.getlocale("LC_TIME")
#[1] "Portuguese_Portugal.1252"
The difference will be due to daylight saving time (summer time).
As for your problem, all you have to do is remember that objects of class "POSIXct" are coded as the number of seconds since an origin, and that origin is usually midnight of 1970-01-01. So you have to add your seconds since midnight to the seconds of the converted date.
x <- "20180831"
xd <- mkdate(x)
y <- 10800
as.POSIXct(as.integer(xd) + y, origin = "1970-01-01")
#[1] "2018-08-31 04:00:00 BST"
as.POSIXct(as.integer(xd) + y, origin = "1970-01-01", tz = "America/Chicago")
#[1] "2018-08-30 22:00:00 CDT"
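If the goal is a clock reading that is always "date at midnight plus seconds", with no daylight saving shifts, one possible sketch is to parse the date directly as POSIXct in a fixed zone and add the seconds. Using "UTC" as the label is an assumption here; it only works if you are happy to treat the timestamps as zone-free wall-clock times.

```r
date_chr <- c("20180831", "20180105")   # sample dates from the question
secs     <- c(10800, 10800)             # seconds since midnight (3 a.m.)

# Parsing with tz = "UTC" puts each date at 00:00:00 in a zone with no DST
midnight <- as.POSIXct(date_chr, format = "%Y%m%d", tz = "UTC")

# POSIXct arithmetic is in seconds, so the offset can be added directly
datetime <- midnight + secs
stamp <- format(datetime, "%Y-%m-%d %H:%M:%S")
print(stamp)
#[1] "2018-08-31 03:00:00" "2018-01-05 03:00:00"
```

Both the summer and the winter date land on 03:00:00, which is what the question asks for.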
There are very many ways to do this:
mktime = function(a, b)modifyList(strptime(a, '%Y%m%d'), list(sec = as.numeric(gsub(',', '', b))))
mktime("20180831",'10,800')
[1] "2018-08-31 03:00:00 PDT"
mktime('20180301','10800')
[1] "2018-03-01 03:00:00 PST"
mktime('20180321','10800')
[1] "2018-03-21 03:00:00 PDT"
Note that the code above does not adjust for daylight saving time: irrespective of the date, the result still shows 3 AM, including on dates after the switch from standard time to daylight time (ST --> DT). It also uses your LOCAL time zone.

dates change after combining vectors of dates in R

I'm trying to combine two vectors of dates into a single vector. I have been using dates with the lubridate package.
First I create two vectors of dates:
library(lubridate)
mydate <- mdy("04/01/2016")
mydate_range <- mydate + (1:12)*months(1)
anotherdate_range <- mdy("05/01/2017") + (1:12)*months(1)
Inspecting mydate_range and anotherdate_range these seem to have worked fine.
But then when I try to combine these into one vector things get weird.
combineddates <- c(mydate_range, anotherdate_range)
combineddates
[1] "2016-04-30 19:00:00 CDT" "2016-05-31 19:00:00 CDT" "2016-06-30 19:00:00 CDT"
The first date of combineddates is now "2016-04-30". Before I combined them using the c() function the first date of mydate_range was "2016-05-01".
Not sure why this changed. How should I join these date vectors?
The reason for the date change is the conversion due to time zone adjustments. 2016-04-30 19:00:00 CDT is the same as 2016-05-01 GMT. Most likely your initial sequence was in GMT and somewhere along the way it got converted to local time.
I find it best to define the time zone in your initial definition and it should stay consistent throughout.
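To see that the two printed values are the same instant, here is a small base R sketch that formats one POSIXct value in two zones. The variable names are invented for the example.

```r
# One instant, defined with an explicit time zone
x <- as.POSIXct("2016-05-01 00:00:00", tz = "UTC")

# The stored number of seconds is unchanged; only the display differs
utc     <- format(x, "%Y-%m-%d %H:%M", tz = "UTC")
chicago <- format(x, "%Y-%m-%d %H:%M", tz = "America/Chicago")

print(utc)
#[1] "2016-05-01 00:00"
print(chicago)
#[1] "2016-04-30 19:00"
```

So a date that "moves back a day" after combining has usually just been re-displayed in local time.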

Problems using POSIXct with ">" and "<" in R

I'm getting inconsistent results when trying to subset data based on a date being before or after some POSIXct date and time. When I make a string of dates like this:
myDates <- c(as.POSIXct("2014-12-27 08:10:00 UTC"),
             as.POSIXct("2014-12-27 08:15:00 UTC"),
             as.POSIXct("2014-12-27 09:30:00 UTC"))
and then try to subset to find all the entries in myDates that were before 8:15 a.m. on Dec. 27, 2014 like this:
myDates[myDates < as.POSIXct("2014-12-27 08:15:00")]
that works fine and I get
"2014-12-27 08:10:00 PST"
(although I don't understand why it says "PST" for the time zone; that's where I am, but I set it to UTC).
However, my original date and time data were in Excel, where they were in numeric format. I imported them as a data.frame called Samples and converted the date and time column into POSIXct format by doing:
as.POSIXct(Samples$DateTime, origin = "1970-01-01", tz = "UTC")
Now, I'm having hair-pulling, head-onto-desk-bashing frustrations with subsetting those dates. Take one date in particular, x <- Samples$DateTime[34], which, according to the output R gives me, is "2014-12-27 08:10:00 UTC". If I check whether x < 2014-12-27 08:15, that should be true, and here's what I see:
x < as.POSIXct("2014-12-27 08:15:00 UTC")
TRUE
But x should NOT be less than 2014-12-27 8:09:00 UTC, right? This is what I see:
X < as.POSIXct("2014-12-27 08:09:00 UTC")
TRUE
Why, for the love of Pete, does R tell me that 8:10 is before 8:09?!? This doesn't seem to be a problem for data that I just type in like above, only for data I've imported from Excel.
You probably need to get everything in the same timezone first. Try
as.numeric(as.POSIXct("2014-12-27 08:10:00 UTC", tz="UTC"))
#[1] 1419667800
# equivalent to "2014-12-27 08:10:00 UTC"
vs.
as.numeric(as.POSIXct("2014-12-27 08:10:00 UTC"))
#[1] 1419631800
# equivalent to 8:10 in local timezone - in my case Aust. EST.
# "2014-12-27 08:10:00 AEST"
You can see that they are actually numerically different.
To fix this, specify tz= explicitly when importing, because the "UTC" in your text strings will not be detected on input.
Also, be really careful with variable names. Likely you just slipped in typing it in here, but in the description of the problem and the first logical comparison you used x and in the second one you used X.
R is case sensitive, so it would not compare your date to the one stored in x. If anything else was stored in memory with X it may actually be that you were given the right answer for the question you asked.
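The time zone effect can be reproduced with a short sketch. The zone "America/Los_Angeles" is an assumption standing in for the asker's local PST setting, and a, b_local, and b_utc are names made up for the example.

```r
# 08:10 parsed as UTC, as in the imported Excel data
a <- as.POSIXct("2014-12-27 08:10:00", tz = "UTC")

# 08:09 parsed in a Pacific zone (what happens when tz= is left at a PST default)
b_local <- as.POSIXct("2014-12-27 08:09:00", tz = "America/Los_Angeles")

# 08:09 parsed as UTC
b_utc <- as.POSIXct("2014-12-27 08:09:00", tz = "UTC")

a < b_local   # TRUE: 08:09 PST is 16:09 UTC, which is after 08:10 UTC
a < b_utc     # FALSE: both values are in the same zone
```

Comparisons are made on the underlying seconds, so mixing zones at parse time can make 8:10 appear to come before 8:09.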

Binning time series in R?

I'm new to R. My data has 600k objects defined by three attributes: Id, Date and TimeOfCall.
TimeOfCall has a 00:00:00 format and ranges from 00:00:00 to 23:59:59.
I want to bin the TimeOfCall attribute into 24 bins, each one representing an hourly slot (first bin 00:00:00 to 00:59:59 and so on).
Can someone talk me through how to do this? I tried using cut() but apparently my format is not numeric. Thanks in advance!
While you could convert to a formal time representation, in this case it might be easier to just use substr:
test <- c("00:00:01","02:07:01","22:30:15")
as.numeric(substr(test,1,2))
#[1] 0 2 22
Using a POSIXct time to deal with it would also work, and might be handy if you plan on further calculations (differences in time etc):
testtime <- as.POSIXct(test,format="%H:%M:%S")
#[1] "2013-12-09 00:00:01 EST" "2013-12-09 02:07:01 EST" "2013-12-09 22:30:15 EST"
as.numeric(format(testtime,"%H"))
#[1] 0 2 22
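Once the hour is numeric, cut() works as the question intended. The following is a sketch only; the label format and variable names are choices made for this example.

```r
test <- c("00:00:01", "02:07:01", "22:30:15")

# Hour of day as an integer from the first two characters
hour <- as.integer(substr(test, 1, 2))

# 24 left-closed bins: [0,1), [1,2), ..., [23,24)
bin_labels <- sprintf("%02d:00-%02d:59", 0:23, 0:23)
bins <- cut(hour, breaks = 0:24, right = FALSE, labels = bin_labels)

print(as.character(bins))
#[1] "00:00-00:59" "02:00-02:59" "22:00-22:59"

# Count of calls per hourly slot, including empty slots
counts <- table(bins)
```

Because bins is a factor with all 24 levels, table(bins) also reports the hours in which no calls occurred.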
You can use the cut.POSIXt function, but you should first coerce your data to a valid time object. Here I am using the handy hms from lubridate, and strftime to get the time format.
library(lubridate)
x <- c("09:10:01", "08:10:02", "08:20:02","06:10:03 ", "Collided at 9:20:04 pm")
x.h <- strftime(cut(as.POSIXct(hms(x),origin=Sys.Date()),'hours'),
format='%H:%M:%S')
data.frame(x,x.h)
                       x      x.h
1               09:10:01 10:00:00
2               08:10:02 09:00:00
3               08:20:02 09:00:00
4              06:10:03  07:00:00
5 Collided at 9:20:04 pm 22:00:00
