Calculation of the maximum duration over a threshold in R (time series)

I have an xts time series of temperature data at 5-minute resolution.
head(dataset)
Time Temp
2016-04-26 10:00:00 6.877
2016-04-26 10:05:00 6.877
2016-04-26 10:10:00 6.978
2016-04-26 10:15:00 6.978
2016-04-26 10:20:00 6.978
I want to calculate the longest duration for which the temperature exceeds a certain threshold (let's say 20 °C). Ideally, I want all periods in which the temperature exceeds the threshold, together with their durations.
I create a data.frame from my xts data:
df <- data.frame(Time = index(dataset), coredata(dataset))
head(df)
Time Temp
1 2016-04-26 10:00:00 6.877
2 2016-04-26 10:05:00 6.877
3 2016-04-26 10:10:00 6.978
4 2016-04-26 10:15:00 6.978
5 2016-04-26 10:20:00 6.978
6 2016-04-26 10:25:00 7.079
Then I create a subset with only the data that exceed the threshold:
sub <- subset(x = df, subset = df$Temp > 20)
head(sub)
Time Temp
7514 2016-05-22 12:05:00 20.043
7515 2016-05-22 12:10:00 20.234
7516 2016-05-22 12:15:00 20.329
7517 2016-05-22 12:20:00 20.424
7518 2016-05-22 12:25:00 20.615
7519 2016-05-22 12:30:00 20.805
But now I'm having trouble calculating the duration of each event in which the temperature exceeds the threshold. How do I identify a connected period and calculate its duration?
I would be happy if you have a solution for this question (it's my first thread, so please excuse minor mistakes). If you need more information on my data, feel free to ask.

This may work. Take this data as an example:
df <- structure(list(Time = structure(c(1463911500, 1463911800, 1463912100,
1463912400, 1463912700, 1463913000), class = c("POSIXct", "POSIXt"
), tzone = ""), Temp = c(20.043, 20.234, 6.329, 20.424, 20.615,
20.805)), row.names = c(NA, -6L), class = "data.frame")
> df
Time Temp
1 2016-05-22 12:05:00 20.043
2 2016-05-22 12:10:00 20.234
3 2016-05-22 12:15:00 6.329
4 2016-05-22 12:20:00 20.424
5 2016-05-22 12:25:00 20.615
6 2016-05-22 12:30:00 20.805
library(dplyr)

df %>%
  # add an id for each run of consecutive over/under-threshold values
  # (rleid() comes from the data.table package, not dplyr)
  mutate(tmp_Temp = Temp > 20, id = data.table::rleid(tmp_Temp)) %>%
  # keep only periods with high temperature
  filter(tmp_Temp) %>%
  # for each period/event, get its duration
  group_by(id) %>%
  summarise(event_duration = difftime(last(Time), first(Time)))
id event_duration
<int> <time>
1 1 5 mins
2 3 10 mins
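To answer the original question (the single longest event), you can filter the summarised result for the maximum duration. A minimal sketch, assuming the pipeline above is stored in a data frame called events (an illustrative name):
events <- df %>%
  mutate(tmp_Temp = Temp > 20, id = data.table::rleid(tmp_Temp)) %>%
  filter(tmp_Temp) %>%
  group_by(id) %>%
  summarise(event_duration = difftime(last(Time), first(Time)))

# keep the row(s) with the longest duration
events %>% filter(event_duration == max(event_duration))
Note that difftime(last(Time), first(Time)) measures first-to-last sample, so a one-sample event has duration 0; add one 5-minute sampling interval if you prefer durations that count whole samples.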

Related

R Populate a datetime column starting from last row

I have a dataframe to which I need to add a datetime column. It records water levels every hour for 2 years. The original data frame has the wrong dates and times, i.e. the dates say 2015 instead of 2020, and the day and month are also wrong. I do not know the original start date and time. However, I know the date and time of the very last recording (28-03-2022 14:00:00). I need to fill the column from the bottom to the top to figure out the original start date.
Current Code
I have this code, which populates the dates from a known start date (i.e. top down), but I want to populate the data from the bottom up. Is there a way to alter this, or another solution?
# recalculate date to correct date
# set start dates
startDate5 <- as.POSIXct("2020-03-05 17:00:00")
startDateMere <- as.POSIXct("2020-07-06 17:00:00")
# find length of dataframe to populate required rows.
len5 <- max(dataList$`HMB 5`$Rec)
lenMere <- max(dataList$`HM SSSI 4`$Rec)
# calculate new date column
dataList$`HMB 5`$DateTimeNew <- seq(startDate5, by='hour', length.out=len5)
dataList$`HM SSSI 4`$DateTimeNew <- seq(startDateMere, by='hour', length.out=lenMere)
Current dataframe - top 10 rows
structure(list(Rec = 1:10, DateTime = structure(c(1436202000,
1436205600, 1436209200, 1436212800, 1436216400, 1436220000, 1436223600,
1436227200, 1436230800, 1436234400), class = c("POSIXct", "POSIXt"
), tzone = "GMT"), Temperature = c(16.59, 16.49, 16.74, 17.14,
17.47, 17.71, 18.43, 18.78, 19.06, 19.18), Pressure = c(1050.64,
1050.86, 1051.28, 1051.56, 1051.48, 1051.2, 1051.12, 1050.83,
1050.83, 1050.76), DateTimeNew = structure(c(1594051200L, 1594054800L,
1594058400L, 1594062000L, 1594065600L, 1594069200L, 1594072800L,
1594076400L, 1594080000L, 1594083600L), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, 10L), class = "data.frame")
Desired Output
This is what the desired output looks like. The value I know is correct is '2020-07-07 02:00:00' (the value in the 10th row, final column), and I need to fill in the rest of the column from this value.
NB: I do not actually know what the original start date (2020-07-06 17:00:00) should be; it's just illustrative.
Here's a sequence method: build the hourly sequence backwards from the known final timestamp, then reverse it.
startDateMere <- as.POSIXct("2020-07-06 17:00:00")  # known datetime of the last row
new_date <- seq(startDateMere, length.out = nrow(data), by = "-1 hour")
data$result <- rev(new_date)  # reverse so the column runs start-to-end
data
# Rec DateTime Temperature Pressure DateTimeNew result
# 1 1 2015-07-06 17:00:00 16.59 1050.64 2020-07-06 12:00:00 2020-07-06 08:00:00
# 2 2 2015-07-06 18:00:00 16.49 1050.86 2020-07-06 13:00:00 2020-07-06 09:00:00
# 3 3 2015-07-06 19:00:00 16.74 1051.28 2020-07-06 14:00:00 2020-07-06 10:00:00
# 4 4 2015-07-06 20:00:00 17.14 1051.56 2020-07-06 15:00:00 2020-07-06 11:00:00
# 5 5 2015-07-06 21:00:00 17.47 1051.48 2020-07-06 16:00:00 2020-07-06 12:00:00
# 6 6 2015-07-06 22:00:00 17.71 1051.20 2020-07-06 17:00:00 2020-07-06 13:00:00
# 7 7 2015-07-06 23:00:00 18.43 1051.12 2020-07-06 18:00:00 2020-07-06 14:00:00
# 8 8 2015-07-07 00:00:00 18.78 1050.83 2020-07-06 19:00:00 2020-07-06 15:00:00
# 9 9 2015-07-07 01:00:00 19.06 1050.83 2020-07-06 20:00:00 2020-07-06 16:00:00
# 10 10 2015-07-07 02:00:00 19.18 1050.76 2020-07-06 21:00:00 2020-07-06 17:00:00
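Equivalently, since the series is strictly hourly, you can compute the original start directly and fill forwards. A minimal sketch (endDate is an illustrative name for the known last timestamp):
# assumes a regular hourly series whose last timestamp is known
endDate <- as.POSIXct("2020-07-06 17:00:00")
startDate <- endDate - 3600 * (nrow(data) - 1)  # back up (n - 1) hours
data$result <- seq(startDate, by = "hour", length.out = nrow(data))
Both approaches do arithmetic in absolute seconds, so clock labels can shift by an hour across a daylight-saving boundary.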

How to subset data by specific hours of interest?

I have a dataset of temperature values taken at specific datetimes across five locations. For whatever reason, sometimes the readings are every hour, and sometimes every four hours. Another issue is that when the time changed as a result of daylight saving, the readings are off by one hour. I am interested in the readings taken every four hours and would like to subset these by day and night to ultimately get daily and nightly mean temperatures.
To summarise, the readings I am interested in are either:
0800, 1200, 1600 =day
2000, 0000, 0400 =night
Recordings between 0800-1600 and 2000-0400 each day should be averaged.
During daylight savings, the equivalent times are:
0900, 1300, 1700 =day
2100, 0100, 0500 =night
Recordings between 0900-1700 and 2100-0500 each day should be averaged.
In the process, I am hoping to subset by site.
There are also some NA values or blank cells which should be ignored.
So far, I tried to subset by one hour of interest just to see if it worked, but haven't got any further than that. Any tips on how to subset by a series of times of interest? Thanks!
temperature <- read.csv("SeaTemperatureData.csv",
stringsAsFactors = FALSE)
temperature <- subset(temperature, select=-c(X)) #remove last column that contains comments, not needed
temperature$Date.Time <- as.POSIXct(temperature$Date.Time,
format="%d/%m/%Y %H:%M",
tz="Pacific/Auckland")
#subset data by time, we only want to include temperatures recorded at certain times
temperature.goat <- subset(temperature, Date.Time==c('01:00:00'), select=c("Goat.Island"))
Date.Time Goat.Island Tawharanui Kawau Tiritiri Noises
1 2019-06-10 16:00:00 16.820 16.892 16.749 16.677 15.819
2 2019-06-10 20:00:00 16.773 16.844 16.582 16.654 15.796
3 2019-06-11 00:00:00 16.749 16.820 16.749 16.606 15.819
4 2019-06-11 04:00:00 16.487 16.796 16.654 16.558 15.796
5 2019-06-11 08:00:00 16.582 16.749 16.487 16.463 15.867
6 2019-06-11 12:00:00 16.630 16.773 16.725 16.654 15.867
One possible solution is to extract hours from your DateTime variable, then filter for particular hours of interest.
Here is a fake example over 4 days:
library(lubridate)
df <- data.frame(DateTime = seq(ymd_hms("2020-02-01 00:00:00"), ymd_hms("2020-02-05 00:00:00"), by = "hour"),
Value = sample(1:100,97, replace = TRUE))
DateTime Value
1 2020-02-01 00:00:00 99
2 2020-02-01 01:00:00 51
3 2020-02-01 02:00:00 44
4 2020-02-01 03:00:00 49
5 2020-02-01 04:00:00 60
6 2020-02-01 05:00:00 56
Now you can extract hours with the hour() function of lubridate and subset for the desired hour:
subset(df, hour(DateTime) == 5)
DateTime Value
6 2020-02-01 05:00:00 56
30 2020-02-02 05:00:00 31
54 2020-02-03 05:00:00 65
78 2020-02-04 05:00:00 80
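To subset by a series of hours at once, which is what the question ultimately asks, %in% works the same way. A minimal sketch using the standard-time hours listed in the question:
# day readings at 08:00, 12:00 and 16:00 (standard time, per the question)
day_obs <- subset(df, hour(DateTime) %in% c(8, 12, 16))
# night readings at 20:00, 00:00 and 04:00
night_obs <- subset(df, hour(DateTime) %in% c(20, 0, 4))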
EDIT: Getting the mean for each site per subset of hours
Per OP's request in the comments, the goal is to calculate the mean of the values of the various sites over different periods of time.
Basically, you want two periods per day: one from 8:00 to 17:00 and the other from 18:00 to 7:00.
Here is a more elaborate example based on the previous one:
df <- data.frame(DateTime = seq(ymd_hms("2020-02-01 00:00:00"), ymd_hms("2020-02-05 00:00:00"), by = "hour"),
Site1 = sample(1:100,97, replace = TRUE),
Site2 = sample(1:100,97, replace = TRUE))
DateTime Site1 Site2
1 2020-02-01 00:00:00 100 6
2 2020-02-01 01:00:00 9 49
3 2020-02-01 02:00:00 86 12
4 2020-02-01 03:00:00 34 55
5 2020-02-01 04:00:00 76 29
6 2020-02-01 05:00:00 41 1
....
So, now you can do the following to label each time point as day or night, then group by this category for each day and calculate the mean of each individual site using summarise_at:
library(lubridate)
library(dplyr)
df %>% mutate(Date = date(DateTime),
Hour= hour(DateTime),
Category = ifelse(between(hour(DateTime),8,17),"Daily","Night")) %>%
group_by(Date, Category) %>%
summarise_at(vars(c(Site1,Site2)), ~ mean(., na.rm = TRUE))
# A tibble: 9 x 4
# Groups: Date [5]
Date Category Site1 Site2
<date> <chr> <dbl> <dbl>
1 2020-02-01 Daily 56.9 63.1
2 2020-02-01 Night 58.9 46.6
3 2020-02-02 Daily 54.5 47.6
4 2020-02-02 Night 36.9 41.7
5 2020-02-03 Daily 42.3 56.9
6 2020-02-03 Night 44.1 55.9
7 2020-02-04 Daily 54.3 50.4
8 2020-02-04 Night 54.8 34.3
9 2020-02-05 Night 75 16
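The question also notes that readings shift by one hour under daylight saving. A hedged sketch for folding that in, assuming DateTime is a POSIXct in a time zone where DST applies (e.g. Pacific/Auckland): normalise each timestamp back to standard clock time with lubridate's dst() before labelling.
# subtract the DST hour, then label day/night on the normalised hour
df %>%
  mutate(std_hour = (hour(DateTime) - ifelse(dst(DateTime), 1, 0)) %% 24,
         Category = ifelse(between(std_hour, 8, 17), "Daily", "Night")) %>%
  group_by(Date = date(DateTime), Category) %>%
  summarise_at(vars(c(Site1, Site2)), ~ mean(., na.rm = TRUE))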
Does that answer your question?

Sum a part of a time series between set end points

I have a time series (xts) of rain gauge data, and I would like to sum all the rain amounts between a beginning and an end time point taken from a list, then make a new data frame with StormNumber and TotalRain over that time.
> head(RainGage)
Rain_mm
2019-07-01 00:00:00 0
2019-07-01 00:15:00 0
2019-07-01 00:30:00 0
2019-07-01 00:45:00 0
2019-07-01 01:00:00 0
2019-07-01 01:15:00 0
head(StormTimes)
StormNumber RainStartTime RainEndTime
1 1 2019-07-21 20:00:00 2019-07-22 04:45:00
2 2 2019-07-22 11:30:00 2019-07-22 23:45:00
3 3 2019-07-11 09:15:00 2019-07-11 19:00:00
4 4 2019-05-29 17:00:00 2019-05-29 20:45:00
5 5 2019-06-27 14:30:00 2019-06-27 17:15:00
6 6 2019-07-11 06:15:00 2019-07-11 09:00:00
I have this code that I got from the SO community when I was trying to do something similar in the past (but to extract data rather than sum it). However, I have no idea how it works, so I am struggling to adapt it to this situation.
do.call(rbind, Map(function(x, y) RainGage[paste(x, y, sep="/")],
StormTimes$RainStartTime, StormTimes$RainEndTime))
In this case I would suggest just writing your own function and then using apply to achieve what you want, for example:
dates <- c('2019-07-01 00:00:00', '2019-07-01 00:15:00',
'2019-07-01 00:30:00', '2019-07-01 00:45:00',
'2019-07-01 01:00:00', '2019-07-01 01:15:00')
dates <- as.POSIXct(strptime(dates, '%Y-%m-%d %H:%M:%S'))
mm <- c(0, 10, 10, 20, 0, 0)
rain <- data.frame(dates, mm)
number <- c(1,2)
start <- c('2019-07-01 00:00:00','2019-07-01 00:18:00')
start <- as.POSIXct(strptime(start, '%Y-%m-%d %H:%M:%S'))
end <- c('2019-07-01 00:17:00','2019-07-01 01:20:00')
end <- as.POSIXct(strptime(end, '%Y-%m-%d %H:%M:%S'))
storms <- data.frame(number, start, end)
# Sum of rain for one storm. apply() passes each row as a character
# vector; the comparisons below coerce those strings back to POSIXct.
f <- function(x) {
  start <- x['start']  # starting moment
  end <- x['end']      # ending moment
  sum(rain[rain$dates >= start & rain$dates < end, 'mm'])
}
# Apply function to each row of the dataframe
storms$rain <- apply(storms, 1, f)
print(storms)
This yields:
number start end rain
1 1 2019-07-01 00:00:00 2019-07-01 00:17:00 10
2 2 2019-07-01 00:18:00 2019-07-01 01:20:00 30
So the rain column in storms now holds the sum of rain$mm for each storm, which is what you're after.
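For completeness, the Map() snippet from the question can be adapted to sum directly: RainGage[paste(start, end, sep="/")] is xts range subsetting, so summing each extracted slice gives the per-storm totals. A hedged sketch, assuming RainGage and StormTimes as shown in the question:
# sum the Rain_mm values inside each storm's start/end window
StormTimes$TotalRain <- mapply(
  function(s, e) sum(RainGage[paste(s, e, sep = "/")]),
  StormTimes$RainStartTime, StormTimes$RainEndTime)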
Hope that helps you out!

R convert hourly to daily data up to 0:00 instead of 23:00

How do you set 0:00 as the end of day instead of 23:00 in hourly data? I am struggling with this while using period.apply or to.period, as both return days ending at 23:00. Here is an example:
x1 <- xts(x = rnorm(120), order.by = seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:00:00"), by = "hour"))
The following functions show periods ending at 23:00:
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "days")
x1[endpoints(x1, 'days')]
So when aggregating the hourly data to daily, how can I set the end of day to 0:00?
As already pointed out by another answer here, to.period on days operates on the data with timestamps between 00:00:00 and 23:59:59.9999999 on the day in question. So 23:00:00 is seen as the last timestamp in your data, and 00:00:00 corresponds to a value in the next day's bin.
What you can do is shift all the timestamps back 1 hour, use to.period to get the daily data points from the hourly points, and then use align.time to get the timestamps aligned correctly.
(More generally, to.period is useful for generating OHLCV-type data: if you're generating, say, hourly bars from ticks, it makes sense to look at all the ticks between 23:00:00 and 23:59:59.99999 in the bar creation; then 00:00:00 to 00:59:59.9999 would form the next hourly bar, and so on.)
Here is an example:
> tail(x1["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -1.2760349
# 2018-02-01 19:00:00 -0.1496041
# 2018-02-01 20:00:00 -0.5989614
# 2018-02-01 21:00:00 -0.9691905
# 2018-02-01 22:00:00 -0.2519618
# 2018-02-01 23:00:00 -1.6081656
> head(x1["2018-02-02"])
# [,1]
# 2018-02-02 00:00:00 -0.3373271
# 2018-02-02 01:00:00 0.8312698
# 2018-02-02 02:00:00 0.9321747
# 2018-02-02 03:00:00 0.6719425
# 2018-02-02 04:00:00 -0.5597391
# 2018-02-02 05:00:00 -0.9810128
> head(x1["2018-02-03"])
# [,1]
# 2018-02-03 00:00:00 2.3746424
# 2018-02-03 01:00:00 0.8536594
# 2018-02-03 02:00:00 -0.2467268
# 2018-02-03 03:00:00 -0.1316978
# 2018-02-03 04:00:00 0.3079848
# 2018-02-03 05:00:00 0.2445634
x2 <- x1
.index(x2) <- .index(x1) - 3600
> tail(x2["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -0.1496041
# 2018-02-01 19:00:00 -0.5989614
# 2018-02-01 20:00:00 -0.9691905
# 2018-02-01 21:00:00 -0.2519618
# 2018-02-01 22:00:00 -1.6081656
# 2018-02-01 23:00:00 -0.3373271
x.d2 <- to.period(x2, OHLC = FALSE, drop.date = FALSE, period = "days")
> x.d2
# [,1]
# 2018-01-31 23:00:00 0.12516594
# 2018-02-01 23:00:00 -0.33732710
# 2018-02-02 23:00:00 2.37464235
# 2018-02-03 23:00:00 0.51797747
# 2018-02-04 23:00:00 0.08955208
# 2018-02-05 22:00:00 0.33067734
x.d2 <- align.time(x.d2, n = 86400)
> x.d2
# [,1]
# 2018-02-01 0.12516594
# 2018-02-02 -0.33732710
# 2018-02-03 2.37464235
# 2018-02-04 0.51797747
# 2018-02-05 0.08955208
# 2018-02-06 0.33067734
Want to convince yourself? Try something like this:
x3 <- rbind(x1, xts(x = matrix(c(1,2), nrow = 2), order.by = as.POSIXct(c("2018-02-01 23:59:59.999", "2018-02-02 00:00:00"))))
x3["2018-02-01 23/2018-02-02 01"]
# [,1]
# 2018-02-01 23:00:00.000 -1.6081656
# 2018-02-01 23:59:59.999 1.0000000
# 2018-02-02 00:00:00.000 -0.3373271
# 2018-02-02 00:00:00.000 2.0000000
# 2018-02-02 01:00:00.000 0.8312698
x3.d <- to.period(x3, OHLC = FALSE, drop.date = FALSE, period = "days")
> x3.d <- align.time(x3.d, 86400)
> x3.d
[,1]
2018-02-02 1.00000000
2018-02-03 -0.09832625
2018-02-04 -0.65075506
2018-02-05 -0.09423664
2018-02-06 0.33067734
See that the value of 2 stamped at 2018-02-02 00:00:00 did not become the last observation of the bin labelled 2018-02-02 (after alignment), which covers 2018-02-01 00:00:00 through 2018-02-01 23:59:59.9999.
Of course, if you want the daily timestamp to be the start of the day rather than the end (i.e. 2018-02-01 as the bar label for the first row of x3.d above), you could shift the index back by one day. You can do this relatively safely for most time zones when your data doesn't involve weekend dates:
index(x3.d) = index(x3.d) - 86400
I say relatively safely because there are corner cases when there are time shifts in a time zone, e.g. daylight saving. Simply subtracting 86400 seconds can be a problem when going from Sunday to Saturday in time zones where daylight saving occurs:
#e.g. bad: day light savings occurs on this weekend for US EST
z <- xts(x = 9, order.by = as.POSIXct("2018-03-12", tz = "America/New_York"))
> index(z) - 86400
[1] "2018-03-10 23:00:00 EST"
i.e. the timestamp is off by one hour, when you really want the midnight timestamp (00:00:00).
You could get around this problem using something much safer like this:
library(lubridate)
# right
> index(z) - days(1)
[1] "2018-03-11 EST"
I don't think this is possible because 00:00 is the start of the day. From the manual:
These endpoints are aligned in POSIXct time to the zero second of the day at the beginning, and the 59.9999th second of the 59th minute of the 23rd hour of the final day
I think the solution here is to use minutes instead of hours. Using your example:
x1 <- xts(x = rnorm(7200), order.by = seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:59:00"), by = "min"))
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "day")
x1[endpoints(x1, 'day')]

summarize by time interval not working

I have the following data, a list of POSIXct times spanning one month; each represents a bike delivery. My aim is to find the average number of bike deliveries per ten-minute interval over a 24-hour period (producing a total of 144 rows). First all of the trips need to be summed and binned into intervals, then divided by the number of days. So far, I've managed to write code that sums trips per 10-minute interval, but it produces incorrect values, and I am not sure where it went wrong.
The data looks like this:
head(start_times)
[1] "2014-10-21 16:58:13 EST" "2014-10-07 10:14:22 EST" "2014-10-20 01:45:11 EST"
[4] "2014-10-17 08:16:17 EST" "2014-10-07 17:46:36 EST" "2014-10-28 17:32:34 EST"
length(start_times)
[1] 1747
The code looks like this:
library(lubridate)
library(dplyr)
tripduration <- floor(runif(1747) * 1000)
time_bucket <- start_times - minutes(minute(start_times) %% 10) - seconds(second(start_times))
df <- data.frame(tripduration, start_times, time_bucket)
summarized <- df %>%
group_by(time_bucket) %>%
summarize(trip_count = n())
summarized <- as.data.frame(summarized)
out_buckets <- data.frame(out_buckets = seq(as.POSIXct("2014-10-01 00:00:00"), as.POSIXct("2014-10-31 23:00:00"), by = 600))
out <- left_join(out_buckets, summarized, by = c("out_buckets" = "time_bucket"))
out$trip_count[is.na(out$trip_count)] <- 0
head(out)
out_buckets trip_count
1 2014-10-01 00:00:00 0
2 2014-10-01 00:10:00 0
3 2014-10-01 00:20:00 0
4 2014-10-01 00:30:00 0
5 2014-10-01 00:40:00 0
6 2014-10-01 00:50:00 0
dim(out)
[1] 4459 2
test <- format(out$out_buckets,"%H:%M:%S")
test2 <- out$trip_count
test <- cbind(test, test2)
colnames(test)[1] <- "interval"
colnames(test)[2] <- "count"
test <- as.data.frame(test)
test$count <- as.numeric(test$count)
test <- aggregate(count~interval, test, sum)
head(test, n = 20)
interval count
1 00:00:00 32
2 00:10:00 33
3 00:20:00 32
4 00:30:00 31
5 00:40:00 34
6 00:50:00 34
7 01:00:00 31
8 01:10:00 33
9 01:20:00 39
10 01:30:00 41
11 01:40:00 36
12 01:50:00 31
13 02:00:00 33
14 02:10:00 34
15 02:20:00 32
16 02:30:00 32
17 02:40:00 36
18 02:50:00 32
19 03:00:00 34
20 03:10:00 39
but this is impossible, because when I sum the counts
sum(test$count)
[1] 7494
I get 7494, whereas the total should be 1747.
I'm not sure where I went wrong or how to simplify this code to get the correct result.
I've done what I can, but I can't reproduce your issue without your data.
library(dplyr)
I created the full sequence of 10 minute blocks:
blocks.of.10mins <- data.frame(out_buckets=seq(as.POSIXct("2014/10/01 00:00"), by="10 mins", length.out=30*24*6))
Then split the start_times into the same bins. Note: I created a baseline time of midnight to force the blocks to align to 10 minute intervals. Removing this later is an exercise for the reader. I also changed one of your data points so that there was at least one example of multiple records in the same bin.
start_times <- as.POSIXct(c("2014-10-01 00:00:00", ## added
"2014-10-21 16:58:13",
"2014-10-07 10:14:22",
"2014-10-20 01:45:11",
"2014-10-17 08:16:17",
"2014-10-07 10:16:36", ## modified
"2014-10-28 17:32:34"))
trip_times <- data.frame(start_times) %>%
mutate(out_buckets = as.POSIXct(cut(start_times, breaks="10 mins")))
The start_times and all the 10 minute intervals can then be merged
trips_merged <- merge(trip_times, blocks.of.10mins, by="out_buckets", all=TRUE)
These can then be grouped by 10 minute block and counted
trips_merged %>% filter(!is.na(start_times)) %>%
group_by(out_buckets) %>%
summarise(trip_count=n())
Source: local data frame [6 x 2]
out_buckets trip_count
(time) (int)
1 2014-10-01 00:00:00 1
2 2014-10-07 10:10:00 2
3 2014-10-17 08:10:00 1
4 2014-10-20 01:40:00 1
5 2014-10-21 16:50:00 1
6 2014-10-28 17:30:00 1
Instead, if we only consider time, not date
trips_merged2 <- trips_merged
trips_merged2$out_buckets <- format(trips_merged2$out_buckets, "%H:%M:%S")
trips_merged2 %>% filter(!is.na(start_times)) %>%
group_by(out_buckets) %>%
summarise(trip_count=n())
Source: local data frame [6 x 2]
out_buckets trip_count
(chr) (int)
1 00:00:00 1
2 01:40:00 1
3 08:10:00 1
4 10:10:00 2
5 16:50:00 1
6 17:30:00 1
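As for why the original code summed to 7494 rather than 1747: a plausible cause (hedged, since the full data isn't shown) is the cbind() step in the question. Binding a character vector to the counts yields a character matrix, as.data.frame() then makes count a factor (under pre-4.0 R defaults), and as.numeric() on a factor returns the internal level codes, not the original values. Converting via character avoids that:
# hypothetical fix for the question's code: never call as.numeric() on a factor
test$count <- as.numeric(as.character(test$count))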
