I would like to find the mid time between time 1 and time 2 in R. I already have the duration in hours. Sometimes it's overnight (row 1) and sometimes not (row 2).
Here are the first two rows in the data frame
dat <- data.frame(id=1:2, tm1=c("23:00","01:00"), tm2=c("07:00","06:00"), dur=c("8.0","5.0"))
So in row 1 the mid time should be 03:00 and in row 2 it should be 03:30.
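You can just add half of the duration. Since dur is stored as character in the example data, convert it to numeric first:
format(strptime(dat$tm1, format = '%H:%M') + as.numeric(dat$dur) * 3600 / 2, format = '%H:%M')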
Here is one way -
library(dplyr)
dat %>%
  mutate(across(starts_with('tm'), as.POSIXct, tz = 'UTC', format = '%H:%M'),
         tm2 = if_else(tm1 > tm2, tm2 + 86400, tm2),
         midtime = tm1 + difftime(tm2, tm1, units = 'secs')/2,
         across(c(starts_with('tm'), midtime), format, '%H:%M'))
#   id   tm1   tm2 dur midtime
# 1  1 23:00 07:00 8.0   03:00
# 2  2 01:00 06:00 5.0   03:30
The logic is -
Convert tm1 and tm2 to POSIXct class. This will add today's date in both the columns.
Now if tm1 > tm2 (like in row 1) add 1 day to tm2.
Subtract tm1 from tm2, divide the difference by 2 and add it to tm1 to get midtime.
Finally, format all the time columns as '%H:%M'.
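The same four steps can also be sketched in base R, without dplyr. This is only an illustration of the logic above: the fixed dummy date 1970-01-01 is an assumption made here so that the result does not depend on the day the code is run, and the vectors t1, t2 and mid are names introduced for this example.

```r
# Example data from the question (dur is not needed for this approach).
dat <- data.frame(id = 1:2,
                  tm1 = c("23:00", "01:00"),
                  tm2 = c("07:00", "06:00"))

# Step 1: parse both columns; the dummy date is an assumption for reproducibility.
t1 <- as.POSIXct(paste("1970-01-01", dat$tm1), format = "%Y-%m-%d %H:%M", tz = "UTC")
t2 <- as.POSIXct(paste("1970-01-01", dat$tm2), format = "%Y-%m-%d %H:%M", tz = "UTC")

# Step 2: an end time earlier than the start time means the interval crosses midnight.
overnight <- t2 < t1
t2[overnight] <- t2[overnight] + 86400

# Step 3: the midpoint is the start plus half of the elapsed seconds.
mid <- t1 + as.numeric(difftime(t2, t1, units = "secs")) / 2

# Step 4: keep only the clock time.
dat$midtime <- format(mid, "%H:%M")
dat$midtime
```

For the two example rows this gives 03:00 and 03:30, matching the dplyr output.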
my data frame looks like:
set.seed(1)
MyDates <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT") + sample(1:27000, 500)
It is fairly easy to count the number of observations per 5 mins by using cut
df <- data.frame(table(cut(MyDates, breaks = "5 mins")))
It gives me intervals as
00:00:00 -- 00:05:00,
00:05:00 -- 00:10:00
But what if I want 'customized' intervals such as
00:00:00 -- 00:05:00,
00:01:00 -- 00:06:00,
00:02:00 -- 00:07:00
Any help would be appreciated!
You just need to pass a vector of break points to cut.
An example:
data.frame(table(cut(MyDates,
                     c(min(MyDates),          ## "Leftmost" cut
                       min(MyDates) + 5000,   ## Custom cut 1
                       min(MyDates) + 17000,  ## Custom cut 2
                       min(MyDates) + 65000), ## "Rightmost" cut
                     right = TRUE)))          ## tells cut() whether the intervals are closed on the right or on the left
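Note that the breaks given to cut always define non-overlapping bins. For the overlapping windows asked about in the question (00:00 to 00:05, 00:01 to 00:06, and so on), one possible approach is to count each window separately. The sketch below is a minimal illustration under assumptions: windows start every minute from midnight, each window is half-open, and the names origin, starts, width and windows are introduced here only for the example.

```r
set.seed(1)
MyDates <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT") + sample(1:27000, 500)

# Window starts every minute, beginning at midnight (an assumed origin).
origin <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT")
starts <- seq(origin, max(MyDates), by = "1 min")
width  <- 5 * 60  # window length in seconds

# Count the observations in each half-open window [start, start + 5 min).
x <- as.numeric(MyDates)
s <- as.numeric(starts)
counts <- vapply(s, function(lo) sum(x >= lo & x < lo + width), integer(1))

windows <- data.frame(start = starts, end = starts + width, count = counts)
head(windows)
```

Because the windows overlap, an observation can be counted in up to five of them, so the counts do not sum to the number of observations.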
I have data of the following form:
DateTime | Var1
11/01/2016 06:01 | 0
11/01/2016 06:02 | 0.70
...
...
11/01/2016 23:59 | 35.08
11/02/2016 00:01 | 33.29
...
11/02/2016 06:00 | 24.62
...
11/30/2016 23:59 | 42.08
12/01/2016 00:01 | 39.79
....
I have about 5 months of data. I have to subset the data from 6:00am of one day to just before 6:00am of the next day. I can use the following code to subset the data once I have the dates in hand, but how can I automatically obtain all the successive dates from the input data?
Date1 <- as.integer(as.POSIXct(Date1))
Date2 <- as.integer(as.POSIXct(Date2))
subset <- subset(data, as.integer(as.POSIXct(data$txtime)) >= Date1 & as.integer(as.POSIXct(data$txtime)) < Date2)
Right now, I can use the following code to obtain successive dates within a month, but this won't work for the last day of the month, where part of the data to be subset falls on the first day of the next month. So I can't do it automatically for the period 6:00am 30th November to 5:59am 1st December. Also, the code is not fully automated, as the number of days (used in the loop) varies across months.
for (dateofmonth in c(1:29)) {
Date1 <- paste("2016-11-", dateofmonth, ' 06:00:00', sep = '')
Date2 <- paste("2016-11-", (dateofmonth+1), ' 06:00:00', sep = '')
}
There is possibly an easier way to do this, but I can't figure it out. Please suggest.
Try this:
datelist <- split(data, as.Date(as.POSIXct(data$txtime)-21600))
This will shift your times 6 hours backwards and then split your data by date, so that each sub data frame contains the times from 6:00 am on that date to 5:59 am on the next day.
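A small check of the idea on made-up rows around a month boundary might look like the sketch below. The sample values are hypothetical, only the column name txtime comes from the question, and UTC is assumed explicitly for both the parsing and the date conversion so that the grouping does not depend on the local time zone.

```r
# Hypothetical sample rows; the values of Var1 are placeholders.
data <- data.frame(
  txtime = c("2016-11-30 05:59:00", "2016-11-30 06:00:00", "2016-11-30 23:59:00",
             "2016-12-01 00:01:00", "2016-12-01 05:59:00", "2016-12-01 06:00:00"),
  Var1 = c(1, 2, 3, 4, 5, 6)
)

# Parse in UTC, shift back 6 hours, and take the calendar date of the shifted time.
tx <- as.POSIXct(data$txtime, tz = "UTC")
shifted_day <- as.Date(tx - 6 * 3600, tz = "UTC")

# One data frame per 6:00am-to-6:00am period, named by the starting date.
datelist <- split(data, shifted_day)
sapply(datelist, nrow)
```

Here the group named 2016-11-30 holds the four rows from 06:00 on 30 November up to 05:59 on 1 December, so the month boundary needs no special handling.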
Consider data collected at 5-minute intervals, with a numeric variable a and a discrete variable acc, which indicates whether an incident happened (0 for no incident, 1 for an incident):
a<-c(1:(288*4))
t<-seq(as.POSIXct("2016-01-01 00:05:00"), as.POSIXct("2016-01-05 00:00:00"), by = '5 min')
acc<-rep(0,288*4)
df<-data.frame(t,a,acc)
Now I have another data set which has the times (accurate to 1 sec) at which the incidents happened during the collection period:
T<-sample(seq(as.POSIXct("2016-01-01 00:05:00"), as.POSIXct("2016-01-05 00:00:00"), by = '1 sec'),size = 5)
I want to set acc to 1 for the 2 nearest prior observations of each time in T. For example, if an incident happened at 2016-01-02 07:13:23, acc is set to 1 for the observations with t of 2016-01-02 07:05:00 and 2016-01-02 07:10:00.
How could I manage to do this?
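ind <- findInterval(T, df$t)  # index of the latest observation at or before each incident
df$acc[c(ind - 1, ind)] <- 1  # flag that observation and the one before it (a zero index is ignored)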
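To check the indexing against the example in the question, here is a small sketch with a single fixed incident time. findInterval returns the index of the latest observation at or before the incident, so the two prior observations sit at ind - 1 and ind. The UTC time zone and the names incident and prior are assumptions made for this illustration.

```r
# Observation times every 5 minutes, as in the question (UTC assumed here).
obs <- seq(as.POSIXct("2016-01-01 00:05:00", tz = "UTC"),
           as.POSIXct("2016-01-05 00:00:00", tz = "UTC"), by = "5 min")
df <- data.frame(t = obs, acc = 0)

# The incident time used as the example in the question.
incident <- as.POSIXct("2016-01-02 07:13:23", tz = "UTC")

# Index of the latest observation at or before the incident.
ind <- findInterval(as.numeric(incident), as.numeric(df$t))

# The two nearest prior observations; drop index 0 if the incident is very early.
prior <- c(ind - 1, ind)
prior <- prior[prior >= 1]
df$acc[prior] <- 1

format(df$t[df$acc == 1], "%Y-%m-%d %H:%M:%S")
```

This flags the 07:05:00 and 07:10:00 observations on 2016-01-02, as the question requests.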
One way could be:
library(lubridate)
df$acc=apply(sapply(T,function(x) x %within% interval((df$t - minutes(4)-seconds(59)),(df$t + minutes(4)+seconds(59)))),1,sum)
lubridate allows for easy manipulation of dates; minutes(x) and seconds(x) create period objects of x minutes or x seconds, which can be added to or subtracted from a date-time.
interval() is used to create a time interval spanning df$t ± 4min59s.
sapply() is used to check whether each time in T falls within each interval.
apply() is used to collapse the results of sapply() (which outputs 1 column for each element in T) by summing across each row.
If T contains a value that is exactly equal to one in df$t, such as 2016-01-04 12:05:00 CET, this will only put 1 for that observation.
I have data which includes Date as well as Time enter and Time exit. These latter two contain data like this: 08:02, 12:02, 23:45 etc.
I would like to manipulate the Time enter and Time exit data - for example, subtract Time enter from Time exit to work out duration, or plot the distributions of Time enter and Time exit, e.g. to see if most entries are before 10:00, or if most exits are after 17:00.
All the packages I've looked at require a date to precede the time, e.g. 01/02/2012 12:33.
Is this possible, or should I simply append an identical date to every time for the sake of calculations? This seems a bit messy!
Use the "times" class found in the chron package:
library(chron)
Enter <- c("09:12", "17:01")
Enter <- times(paste0(Enter, ":00"))
Exit <- c("10:15", "18:11")
Exit <- times(paste0(Exit, ":00"))
Exit - Enter # durations
sum(Enter < "10:00:00") # number entering before 10am
mean(Enter < "10:00:00") # fraction entering before 10am
sum(Exit > "17:00:00") # number exiting after 5pm
mean(Exit > "17:00:00") # fraction exiting after 5pm
table(cut(hours(Enter), breaks = c(0, 10, 17, 24))) # Counts for indicated hours
## (0,10] (10,17] (17,24]
## 1 1 0
table(hours(Enter)) # Counts of entries each hour
## 9 17
## 1 1
stem(hours(Enter), scale = 2)
## The decimal point is at the |
## 9 | 0
## 10 |
## 11 |
## 12 |
## 13 |
## 14 |
## 15 |
## 16 |
## 17 | 0
Graphics:
tab <- c(table(Enter), -table(Exit)) # Freq at each time. Enter is pos; Exit is neg.
plot(times(names(tab)), tab, type = "h", xlab = "Time", ylab = "Freq")
abline(v = c(10, 17)/24, col = "red", lty = 2) # vertical red lines
abline(h = 0) # X axis
Thanks for the feedback and sorry for the confusion. I have edited it a bit to clarify.
New Edit:
First, the chron package and strptime with a fixed format both work well, as demonstrated in other answers. I just want to introduce lubridate a little bit, since it's easier to use and flexible with time formats.
Example data
df <- data.frame(TimeEnterChar = c(rep("07:58", 10), "08:02", "08:03", "08:05", "08:10", "09:00"),
TimeExitChar = c("16:30", "16:50", "17:00", rep("17:02", 10), "17:30", "18:59"),
stringsAsFactors = FALSE)
If all you want is to count how many entry times were later than 8:00, then you can compare the character strings directly (this works because the times are zero-padded HH:MM). The line below shows that 5 entry times were later.
sum(df$TimeEnterChar > "08:00")
If you want more, personally, I like lubridate package when dealing with time data, especially timestamps with dates although it's not the focus of this post at all.
library(lubridate)
# Convert character to a "Period" class by lubridate, shows in form of H M S
df$TimeEnterTime <- hm(df$TimeEnterChar)
df$TimeExitTime <- hm(df$TimeExitChar)
head(df)
sum(df$TimeEnterTime > hm("08:00"))
You can still compare the time.
A little more about using them as numeric:
I assume only minute-level time is wanted. Thus, I divided number of seconds by 60 to get number of minutes.
df$DurationMinute <- as.numeric( df$TimeExitTime - df$TimeEnterTime )/60
hist(df$DurationMinute, breaks = seq(500, 600, 5))
head(df)
  TimeEnterChar TimeExitChar TimeEnterTime TimeExitTime DurationMinute
1         07:58        16:30     7H 58M 0S   16H 30M 0S            512
2         07:58        16:50     7H 58M 0S   16H 50M 0S            532
3         07:58        17:00     7H 58M 0S    17H 0M 0S            542
4         07:58        17:02     7H 58M 0S    17H 2M 0S            544
5         07:58        17:02     7H 58M 0S    17H 2M 0S            544
6         07:58        17:02     7H 58M 0S    17H 2M 0S            544
You can simply plot a histogram to see the distribution of time duration between entry and exit.
You can also look at the distribution of entry/exit time. But some effort is needed to convert the axis.
df$TimeEnterNumMin <- as.numeric(df$TimeEnterTime) / 60
df$TimeExitNumMin <- as.numeric(df$TimeExitTime) / 60
hist(df$TimeEnterNumMin, breaks = seq(0, 1440, 60), xaxt = 'n', main = "Whole by 1hr")
axis(side = 1, at = seq(0, 1440, 60), labels = paste0(seq(0, 24, 1), ":00"))
hist(df$TimeEnterNumMin, breaks = seq(420, 600, 15), xaxt = 'n', main = "Morning by 15min")
axis(side = 1, at = seq(420, 600, 60), labels = paste0(seq(7, 10, 1), ":00"))
I did not polish the plot, nor make the axis flexible. Please do based on your needs. Hopefully, it helps.
Below is old useless post: (no need to read. kept so that comments don't look weird)
Came across a similar issue and was inspired by this post. @G. Grothendieck and @David Arenburg provided great answers for transforming the time.
For comparison, I feel forcing the time into numeric helps. Instead of comparing "11:22:33" with "9:00:00", comparing as.numeric(hms("11:22:33")) (which is 40953 seconds) and as.numeric(hms("9:00:00")) (32400) would be much easier.
as.numeric(hms("11:22:33")) > as.numeric(hms("9:00:00")) & as.numeric(hms("11:22:33")) < as.numeric(hms("17:00:00"))
[1] TRUE
The above example shows 11:22:33 is between 9AM and 5PM.
To extract just time from the date or POSIXct object, substr("2013-10-01 11:22:33 UTC", 12, 19) should work, although it looks stupid to change a time object to string/character and back to time again.
Converting the time to numeric should work for plotting, as @G. Grothendieck described. You can convert the numbers back to time as needed for x axis labels.
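As a rough illustration of that round trip without any package, the sketch below converts time strings to seconds since midnight and back to HH:MM labels. The helper names to_seconds and to_label are hypothetical, introduced only for this example, and the input is assumed to be in H:M:S form.

```r
# Hypothetical helper: "H:M:S" strings to seconds since midnight.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical helper: seconds since midnight back to "HH:MM" labels.
to_label <- function(secs) {
  sprintf("%02d:%02d",
          as.integer(secs %/% 3600),
          as.integer((secs %% 3600) %/% 60))
}

secs <- to_seconds(c("11:22:33", "9:00:00", "17:00:00"))
secs            # 40953 32400 61200
to_label(secs)  # "11:22" "09:00" "17:00"

# The comparison from the text: is 11:22:33 between 9AM and 5PM?
secs[1] > secs[2] & secs[1] < secs[3]
```

The labels from to_label could be passed to axis() in the same way as the paste0() labels used in the histogram code earlier in this answer.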
Would something like that work?
SubstracTimes <- function(TimeEnter, TimeExit){
(as.numeric(format(strptime(TimeExit, format ="%H:%M"), "%H")) +
as.numeric(format(strptime(TimeExit, format ="%H:%M"), "%M"))/60) -
(as.numeric(format(strptime(TimeEnter, format ="%H:%M"), "%H")) +
as.numeric(format(strptime(TimeEnter, format ="%H:%M"), "%M"))/60)
}
Testing:
TimeEnter <- "08:02"
TimeExit <- "12:02"
SubstracTimes(TimeEnter, TimeExit)
> SubstracTimes(TimeEnter, TimeExit)
[1] 4
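A shorter variant of the same idea might lean on difftime instead of extracting hours and minutes by hand. This is a sketch, not a drop-in replacement: the function name duration_hours is hypothetical, a fixed dummy date is pasted on so that both times are guaranteed to share a date, and exits after midnight are not handled.

```r
# Hypothetical helper: duration in hours between two "HH:MM" strings on the same day.
duration_hours <- function(enter, exit) {
  # The dummy date is an assumption; it only serves to make the times comparable.
  start <- as.POSIXct(paste("1970-01-01", enter), format = "%Y-%m-%d %H:%M", tz = "UTC")
  end   <- as.POSIXct(paste("1970-01-01", exit),  format = "%Y-%m-%d %H:%M", tz = "UTC")
  as.numeric(difftime(end, start, units = "hours"))
}

duration_hours("08:02", "12:02")
duration_hours(c("08:02", "09:12"), c("12:02", "10:15"))
```

The first call returns 4, the same as SubstracTimes, and the function is vectorised over both arguments.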