I would like to determine the difference between timestamps regarding a userID. Here I just want to measure the difference for those users who have both a login and a logout status. Some users have only a login or only a logout status; for them I would just like to mark the difference as NA:
Some data:
library(dplyr)
start <- as.POSIXct("2012-01-15")
interval <- 70
end <- start + as.difftime(1, units="days")
tseq <- seq(from=start, by=interval*70, to=end)
employeID <- c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
status <- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
# put together
data <- data.frame(tseq, employeID, status)
#                  tseq employeID status
#1 2012-01-15 00:00:00 1_e login
#2 2012-01-15 01:21:40 1_e logout
#3 2012-01-15 02:43:20 2_b login
#4 2012-01-15 04:05:00 2_b logout
#5 2012-01-15 05:26:40 3_c login
#6 2012-01-15 06:48:20 3_c logout
#7 2012-01-15 08:10:00 100_c login
#8 2012-01-15 09:31:40 4_d logout
#9 2012-01-15 10:53:20 4_d login
#10 2012-01-15 12:15:00 52_f logout
#11 2012-01-15 13:36:40 9_f login
#12 2012-01-15 14:58:20 9_f logout
#13 2012-01-15 16:20:00 7_u login
#14 2012-01-15 17:41:40 7_u logout
#15 2012-01-15 19:03:20 10_5 logout
#16 2012-01-15 20:25:00 22_2 login
#17 2012-01-15 21:46:40 33_a logout
#18 2012-01-15 23:08:20 33_a login
test <- data %>%
  group_by(employeID) %>%
  mutate(time.difference = tseq - lag(tseq))
However, that seems to only produce a constant time.difference.
How about this. Mainly, it looks like you're using mutate when you want summarise. Also, I've converted the status column from factor to character, and included an ifelse statement to only take the users with both "login" and "logout" entries:
test <- data %>%
  mutate( status = as.character( status ) ) %>%
  group_by( employeID ) %>%
  summarise( time.difference = ifelse( "login" %in% status && "logout" %in% status,
                                       difftime( tseq[ status == "logout" ], tseq[ status == "login" ] ),
                                       NA )
  )
Which gives:
> head( test )
# A tibble: 6 × 2
employeID time.difference
<fctr> <dbl>
1 1_e 1.361111
2 10_5 NA
3 100_c NA
4 2_b 1.361111
5 22_2 NA
6 3_c 1.361111
As others have suggested, your data contains constant intervals of time, so wherever there is a relevant value, it's always the same. I assume your actual data looks a little different, so you'll get more sensible output.
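A side note on units: difftime() without a units argument uses units = "auto", so once ifelse() strips the difftime class, different groups could come back in different units. A small variant of the above that pins the units down explicitly (just a sketch, using the same data frame as before):
library(dplyr)

test <- data %>%
  mutate(status = as.character(status)) %>%
  group_by(employeID) %>%
  summarise(time.difference = ifelse("login" %in% status && "logout" %in% status,
                                     # units = "mins" keeps every group in the same unit
                                     as.numeric(difftime(tseq[status == "logout"],
                                                         tseq[status == "login"],
                                                         units = "mins")),
                                     NA))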
We first filter out groups that have an unpaired status by checking the number of distinct status values in each group. With dplyr::do we then calculate the time difference for each group:
library(dplyr)
# same data as in the question, stored in DF
DF <- data.frame(tseq, employeID, status)
testDF <- DF %>%
  dplyr::group_by(employeID) %>%
  dplyr::filter(dplyr::n_distinct(status) > 1) %>%
  dplyr::do(data.frame(logINTime  = .$tseq[.$status == "login"],
                       logOUTTime = .$tseq[.$status == "logout"],
                       deltaTime  = difftime(.$tseq[.$status == "logout"],
                                             .$tseq[.$status == "login"],
                                             units = "secs"))) %>%
  as.data.frame()
testDF
# employeID logINTime logOUTTime deltaTime
# 1 1_e 2012-01-15 00:00:00 2012-01-15 01:21:40 4900
# 2 2_b 2012-01-15 02:43:20 2012-01-15 04:05:00 4900
# 3 3_c 2012-01-15 05:26:40 2012-01-15 06:48:20 4900
# 4 33_a 2012-01-15 23:08:20 2012-01-15 21:46:40 -4900
# 5 4_d 2012-01-15 10:53:20 2012-01-15 09:31:40 -4900
# 6 7_u 2012-01-15 16:20:00 2012-01-15 17:41:40 4900
# 7 9_f 2012-01-15 13:36:40 2012-01-15 14:58:20 4900
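Note the negative deltaTime for 33_a and 4_d: in this toy data their logout timestamp precedes the login, so these are not real sessions. If that can happen in your actual data, one option (an assumption about what you want) is to drop such pairs afterwards:
# keep only pairs where the logout actually follows the login
testDF <- testDF %>% dplyr::filter(deltaTime > 0)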
This line seems to create a constant time interval:
tseq<- seq(from=start, by=interval*70, to=end)
So when you take the difference again, wouldn't it be constant?
I have a raw data frame that looks like this:
test
id class time
1 1 start 2019-06-20 00:00:00
2 1 end 2019-06-20 00:05:00
3 1 start 2019-06-20 00:10:00
4 1 end 2019-06-20 00:15:00
5 2 end 2019-06-20 00:20:00
6 2 start 2019-06-20 00:25:00
7 2 end 2019-06-20 00:30:00
8 2 start 2019-06-20 00:35:00
9 3 end 2019-06-20 00:40:00
10 3 start 2019-06-20 00:45:00
11 3 end 2019-06-20 00:50:00
12 3 start 2019-06-20 00:55:00
My goal is to map the values to an output table for each id only where there is a start and an end in consecutive order (time). Therefore, the output would look like:
output
id start end
1 1 2019-06-20 00:00:00 2019-06-20 00:05:00
2 1 2019-06-20 00:10:00 2019-06-20 00:15:00
3 2 2019-06-20 00:25:00 2019-06-20 00:30:00
4 3 2019-06-20 00:45:00 2019-06-20 00:50:00
I have tried with the dplyr package, but
test %>% group_by(id) %>% arrange(time) %>% starts_with("start")
Error in starts_with(., "start") : is_string(match) is not TRUE
starts_with always throws an error. I would like to avoid writing a for loop because I am sure this can be handled by a few chain operations. Any ideas for a workaround in dplyr or data.table?
One possible approach:
test[, {
  si <- which(class == "start" & shift(class, -1L) == "end")
  .(start = time[si], end = time[si + 1L])
}, by = .(id)]
output:
   id               start                 end
1:  1 2019-06-20 00:00:00 2019-06-20 00:05:00
2:  1 2019-06-20 00:10:00 2019-06-20 00:15:00
3:  2 2019-06-20 00:25:00 2019-06-20 00:30:00
4:  3 2019-06-20 00:45:00 2019-06-20 00:50:00
data:
library(data.table)
test <- fread("id,class,time
1,start,2019-06-20 00:00:00
1,end,2019-06-20 00:05:00
1,start,2019-06-20 00:10:00
1,end,2019-06-20 00:15:00
2,end,2019-06-20 00:20:00
2,start,2019-06-20 00:25:00
2,end,2019-06-20 00:30:00
2,start,2019-06-20 00:35:00
3,end,2019-06-20 00:40:00
3,start,2019-06-20 00:45:00
3,end,2019-06-20 00:50:00
3,start,2019-06-20 00:55:00")
I usually use cumsum() in these cases:
library(dplyr)
library(tidyr)  # for spread()

test %>%
  group_by(id) %>%
  arrange(time, .by_group = TRUE) %>% # should use .by_group arg
  mutate(flag = cumsum(class == "start")) %>%
  group_by(id, flag) %>%
  filter(n() == 2L) %>%
  ungroup() %>%
  spread(class, time) %>%
  select(-flag)
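The flag column simply counts how many "start" rows have been seen so far within each id, so every start/end pair shares a value; a standalone illustration:
cls <- c("start", "end", "start", "end", "end")
cumsum(cls == "start")
#[1] 1 1 2 2 2
Note that spread() orders the new columns alphabetically, so end comes out before start; a final select(id, start, end) restores the order shown in the question if that matters.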
Using dplyr and tidyr, we can first filter the rows which follow the "start"/"end" pattern, create groups of 2 rows, and spread to wide format.
library(dplyr)
library(tidyr)
test %>%
  group_by(id) %>%
  filter(class == "start" & lead(class) == "end" |
           class == "end" & lag(class) == "start") %>%
  group_by(group = gl(n()/2, 2)) %>%
  spread(class, time) %>%
  ungroup() %>%
  select(-group) %>%
  select(id, start, end)
# id start end
# <int> <dttm> <dttm>
#1 1 2019-06-20 00:00:00 2019-06-20 00:05:00
#2 1 2019-06-20 00:10:00 2019-06-20 00:15:00
#3 2 2019-06-20 00:25:00 2019-06-20 00:30:00
#4 3 2019-06-20 00:45:00 2019-06-20 00:50:00
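Here gl() just builds a factor that labels consecutive pairs of rows, which gives spread() one group per start/end pair. For example:
gl(2, 2)
#[1] 1 1 2 2
#Levels: 1 2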
You can keep each start row plus the end immediately after it (if any), then use dcast to switch from long to wide form:
test[,
if (.N >= 2) head(.SD, 2)
, by=.(g = rleid(id, cumsum(class=="start"))), .SDcols=names(test)][,
dcast(.SD, id + g ~ factor(class, levels=c("start", "end")), value.var="time")
]
id g start end
1: 1 1 2019-06-20 00:00:00 2019-06-20 00:05:00
2: 1 2 2019-06-20 00:10:00 2019-06-20 00:15:00
3: 2 4 2019-06-20 00:25:00 2019-06-20 00:30:00
4: 3 7 2019-06-20 00:45:00 2019-06-20 00:50:00
rleid and cumsum are used to find the sequences; and factor is needed to tell dcast the column order.
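For reference, rleid() assigns a new integer id each time the value (or combination of values) changes; a quick standalone illustration:
library(data.table)
rleid(c("a", "a", "b", "b", "a"))
#[1] 1 1 2 2 3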
Side note: This is essentially the same as #cheetahfly's answer (I didn't realize when I posted): since the cumsum is increasing, it is sufficient to group by id + cumsum and there's no need to use rleid (which is for tracking runs of values). The only difference is that my approach would keep a run like start, end, end; while the other answer would filter it out with the n() == 2 check.
I have a dataframe. Some dates fall on the weekend, and I would like to change all weekend dates to the previous Friday.
as.Date(aapl_earnings$Date, "%Y-%m-%d")
[1] "2018-04-30" "2018-01-31" "2017-11-01" "2017-07-31" "2017-05-01" "2017-01-30" "2016-10-24"
[8] "2016-07-25" "2016-04-25" "2016-01-25" "2015-10-26" "2015-07-20" "2015-04-26" "2015-01-26"
[15] "2014-10-19" "2014-07-21" "2014-04-22" "2014-01-26" "2013-10-27"
We can use a nested ifelse here, check the day of the week using weekdays, and adjust the date accordingly. Since x (defined in the data block below) is a character vector, we first convert it to Date:
x <- as.Date(x)
dates <- weekdays(x)
as.Date(ifelse(dates == "Saturday", x - 1,
               ifelse(dates == "Sunday", x - 2, x)), origin = "1970-01-01")
#[1]"2018-04-30" "2018-01-31" "2017-11-01" "2017-07-31" "2017-05-01" "2017-01-30"
#[7]"2016-10-24" "2016-07-25" "2016-04-25" "2016-01-25" "2015-10-26" "2015-07-20"
#[13]"2015-04-24" "2015-01-26" "2014-10-17" "2014-07-21" "2014-04-22" "2014-01-24"
#[19]"2013-10-25"
Or we can use case_when from dplyr, which is more verbose:
library(dplyr)
aapl_earnings <- data.frame(Date = as.Date(x))
aapl_earnings %>%
mutate(date = weekdays(Date),
new_date = case_when(date == "Saturday" ~ Date - 1,
date == "Sunday" ~ Date - 2,
TRUE ~ Date)) %>%
select(-date)
# Date new_date
#1 2018-04-30 2018-04-30
#2 2018-01-31 2018-01-31
#3 2017-11-01 2017-11-01
#4 2017-07-31 2017-07-31
#5 2017-05-01 2017-05-01
#6 2017-01-30 2017-01-30
#7 2016-10-24 2016-10-24
#8 2016-07-25 2016-07-25
#9 2016-04-25 2016-04-25
#10 2016-01-25 2016-01-25
#11 2015-10-26 2015-10-26
#12 2015-07-20 2015-07-20
#13 2015-04-26 2015-04-24
#14 2015-01-26 2015-01-26
#15 2014-10-19 2014-10-17
#16 2014-07-21 2014-07-21
#17 2014-04-22 2014-04-22
#18 2014-01-26 2014-01-24
#19 2013-10-27 2013-10-25
data
x <- c("2018-04-30","2018-01-31","2017-11-01","2017-07-31","2017-05-01",
"2017-01-30","2016-10-24","2016-07-25","2016-04-25","2016-01-25","2015-10-26",
"2015-07-20","2015-04-26","2015-01-26" ,"2014-10-19","2014-07-21","2014-04-22",
"2014-01-26", "2013-10-27")
I have a data set with a structure such as this:
structure(list(id = c(43956L, 46640L, 71548L, 71548L, 71548L,
72029L, 72029L, 74558L, 74558L, 100596L, 100596L, 100596L, 104630L,
104630L, 104630L, 104630L, 104630L, 104630L, 104630L, 104630L
), event = c("LOGIN", "LOGIN", "LOGIN", "LOGIN", "LOGOUT", "LOGIN",
"LOGOUT", "LOGIN", "LOGOUT", "LOGIN", "LOGOUT", "LOGIN", "LOGIN",
"LOGIN", "LOGIN", "LOGIN", "LOGIN", "LOGOUT", "LOGIN", "LOGOUT"
), timestamp = c("2017-03-27 09:19:29", "2016-06-10 00:09:08",
"2016-01-27 12:00:25", "2016-06-20 11:34:29", "2016-06-20 11:35:44",
"2016-12-28 10:43:25", "2016-12-28 10:56:30", "2016-10-15 15:08:39",
"2016-10-15 15:10:06", "2016-03-09 14:30:48", "2016-03-09 14:31:10",
"2017-04-03 10:36:54", "2016-01-11 16:52:08", "2016-02-03 14:40:32",
"2016-03-30 12:34:56", "2016-05-26 13:14:25", "2016-08-22 15:20:02",
"2016-08-22 15:21:53", "2016-08-22 15:22:23", "2016-08-22 15:23:08"
)), .Names = c("id", "event", "timestamp"), row.names = c(5447L,
5446L, 5443L, 5444L, 5445L, 5441L, 5442L, 5439L, 5440L, 5436L,
5437L, 5438L, 5425L, 5426L, 5427L, 5428L, 5429L, 5430L, 5431L,
5432L), class = "data.frame")
id event timestamp
5447 43956 LOGIN 2017-03-27 09:19:29
5446 46640 LOGIN 2016-06-10 00:09:08
5443 71548 LOGIN 2016-01-27 12:00:25
5444 71548 LOGIN 2016-06-20 11:34:29
5445 71548 LOGOUT 2016-06-20 11:35:44
5441 72029 LOGIN 2016-12-28 10:43:25
5442 72029 LOGOUT 2016-12-28 10:56:30
5439 74558 LOGIN 2016-10-15 15:08:39
5440 74558 LOGOUT 2016-10-15 15:10:06
5436 100596 LOGIN 2016-03-09 14:30:48
5437 100596 LOGOUT 2016-03-09 14:31:10
5438 100596 LOGIN 2017-04-03 10:36:54
5425 104630 LOGIN 2016-01-11 16:52:08
5426 104630 LOGIN 2016-02-03 14:40:32
5427 104630 LOGIN 2016-03-30 12:34:56
5428 104630 LOGIN 2016-05-26 13:14:25
5429 104630 LOGIN 2016-08-22 15:20:02
5430 104630 LOGOUT 2016-08-22 15:21:53
5431 104630 LOGIN 2016-08-22 15:22:23
5432 104630 LOGOUT 2016-08-22 15:23:08
I wish to calculate the time difference between LOGIN and LOGOUT (session duration) as well as between LOGOUT and LOGIN (session interval). Unfortunately, I have LOGIN events that do not have a matching LOGOUT event.
The correct LOGOUT event always follows its corresponding LOGIN event (I ordered the data frame based on id and timestamp). I tried adapting this answer, but have had no luck. I also tried creating an event identifier, but since I can't find a way to get the numbering for the LOGOUT events to match the numbering of the LOGIN events, I am unsure how useful such an identifier would be:
df$eventNum <- as.numeric(ave(as.character(df$id), df$id, as.character(df$event), FUN = seq_along))
Here's the approach I'd take:
First, I'd convert the event variable to an ordered factor, because it makes sense to think of its values this way (i.e. Login < Logout, in terms of order), and because it will enable easier comparison between rows:
df$event <- factor(df$event, levels = c("LOGIN", "LOGOUT"), ordered = T)
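As a quick standalone illustration of what the ordered factor buys us:
f <- factor(c("LOGIN", "LOGOUT"), levels = c("LOGIN", "LOGOUT"), ordered = TRUE)
f[1] < f[2]
#[1] TRUE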
Then, make sure timestamp is in a workable date-time format, which this would provide:
df$timestamp <- lubridate::parse_date_time(df$timestamp, "%Y-%m-%d %H:%M:%S")
You can conditionally mutate your data.frame by grouping by ID and then calling mutate with ifelse functions:
df %>% group_by(id) %>% mutate(
  timeElapsed = ifelse(event != lag(event),
                       lubridate::seconds_to_period(timestamp - lag(timestamp)),
                       NA),
  eventType = ifelse(event > lag(event), 'Duration',
                     ifelse(event < lag(event), 'Interval', NA))
)
# id event timestamp timeElapsed eventType
# <int> <ord> <dttm> <dbl> <chr>
# 1 43956 LOGIN 2017-03-27 09:19:29 NA <NA>
# 2 46640 LOGIN 2016-06-10 00:09:08 NA <NA>
# 3 71548 LOGIN 2016-01-27 12:00:25 NA <NA>
# 4 71548 LOGIN 2016-06-20 11:34:29 NA <NA>
# 5 71548 LOGOUT 2016-06-20 11:35:44 1.25000 Duration
# 6 72029 LOGIN 2016-12-28 10:43:25 NA <NA>
# 7 72029 LOGOUT 2016-12-28 10:56:30 13.08333 Duration
# 8 74558 LOGIN 2016-10-15 15:08:39 NA <NA>
# 9 74558 LOGOUT 2016-10-15 15:10:06 1.45000 Duration
# 10 100596 LOGIN 2016-03-09 14:30:48 NA <NA>
# 11 100596 LOGOUT 2016-03-09 14:31:10 22.00000 Duration
# 12 100596 LOGIN 2017-04-03 10:36:54 44.00000 Interval
# 13 104630 LOGIN 2016-01-11 16:52:08 NA <NA>
# 14 104630 LOGIN 2016-02-03 14:40:32 NA <NA>
# 15 104630 LOGIN 2016-03-30 12:34:56 NA <NA>
# 16 104630 LOGIN 2016-05-26 13:14:25 NA <NA>
# 17 104630 LOGIN 2016-08-22 15:20:02 NA <NA>
# 18 104630 LOGOUT 2016-08-22 15:21:53 51.00000 Duration
# 19 104630 LOGIN 2016-08-22 15:22:23 30.00000 Interval
# 20 104630 LOGOUT 2016-08-22 15:23:08 45.00000 Duration
Using lubridate::seconds_to_period will give you the time difference in "%d %H %M %S" format.
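Be aware, though, that ifelse() strips attributes, so inside the mutate above the period is coerced back to a plain number (which is why the column prints as <dbl>). Called directly, it does show the formatted period:
lubridate::seconds_to_period(4900)
#[1] "1H 21M 40S"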
Assuming that any user will stay logged in indefinitely until they log out, it seems the data can be ordered in a way so that a simple "lag" function will do the trick.
Using the library dplyr and assuming that you've called your dataframe "df" and you have already converted the timestamp to a date format such as POSIXct:
df %>%
  arrange(id, timestamp) %>%
  group_by(id, event) %>%
  mutate(rank = dense_rank(timestamp)) %>%
  ungroup() %>%
  arrange(id, rank, timestamp) %>%
  group_by(id) %>%
  mutate(duration = ifelse(event == "LOGOUT", timestamp - lag(timestamp), NA))
Line by line.
First, we order the data by "id" and "timestamp", and we group by "id" and "event" to assign the rank of the login and logout events. The first login for a given user will have rank 1, and the first logout for that user will also have rank 1.
df %>%
  arrange(id, timestamp) %>%
  group_by(id, event) %>%
  mutate(rank = dense_rank(timestamp))
Then, we remove the groupings of the data and we sort again by id, rank and timestamp. This will yield a dataframe with the right order, with the LOGIN events followed by LOGOUT events for each user, so we can apply a lag calculation.
ungroup() %>%
arrange(id, rank, timestamp) %>%
Finally, we group again by "id" and we use mutate to calculate the lag of the timestamps only for the LOGOUT events.
group_by(id) %>%
mutate(duration = ifelse(event == "LOGOUT", timestamp - lag(timestamp), NA))
That should yield a dataframe such as:
id event timestamp rank duration
<int> <chr> <dttm> <int> <dbl>
1 43956 LOGIN 2017-03-27 09:19:29 1 NA
2 46640 LOGIN 2016-06-10 00:09:08 1 NA
3 71548 LOGIN 2016-01-27 12:00:25 1 NA
4 71548 LOGOUT 2016-06-20 11:35:44 1 208715.31667
5 71548 LOGIN 2016-06-20 11:34:29 2 NA
6 72029 LOGIN 2016-12-28 10:43:25 1 NA
7 72029 LOGOUT 2016-12-28 10:56:30 1 13.08333
8 74558 LOGIN 2016-10-15 15:08:39 1 NA
9 74558 LOGOUT 2016-10-15 15:10:06 1 1.45000
10 100596 LOGIN 2016-03-09 14:30:48 1 NA
11 100596 LOGOUT 2016-03-09 14:31:10 1 22.00000
I have a timestamp column with data in the form 2016-01-01 00:41:23.
I want to convert this data into 12 slots of 2 hours each, across the entire dataset. The date is of no importance; only the time needs to be considered.
00:00:00 - 01:59:59 - slot1
02:00:00 - 03:59:59 - slot2
.......
22:00:00 - 23:59:59 - slot12
How can I achieve this in R?
x <- c("01:59:59", "03:59:59", "05:59:59",
"07:59:59", "09:59:59", "11:59:59",
"13:59:59", "15:59:59", "17:59:59",
"19:59:59", "21:59:59", "23:59:59")
cut(pickup_time, breaks = x)
Above code gives error: 'x' must be numeric
Considering your dataframe as df, we can use cut with breaks of 2 hours:
df$slotnumber <- cut(strptime(df$x, "%H:%M:%S"), breaks = "2 hours",
                     labels = paste0("slot", 1:12))
# x slotnumber
#1 01:59:59 slot1
#2 03:59:59 slot2
#3 05:59:59 slot3
#4 07:59:59 slot4
#5 09:59:59 slot5
#6 11:59:59 slot6
#7 13:59:59 slot7
#8 15:59:59 slot8
#9 17:59:59 slot9
#10 19:59:59 slot10
#11 21:59:59 slot11
#12 23:59:59 slot12
data
df <- data.frame(x)
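One caveat: cut() with breaks = "2 hours" only creates intervals spanning the range of the data, so the twelve labels above only line up when the data covers all twelve slots. A purely arithmetic sketch that avoids this (assuming times stored as "HH:MM:SS" strings):
# slot number = floor(hour / 2) + 1, independent of which slots are present
hrs <- as.integer(substr(as.character(df$x), 1, 2))
df$slotnumber <- paste0("slot", hrs %/% 2 + 1)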
I am trying to split a data.table (an enhanced data.frame) by a POSIXct column, without success...
library(data.table)
rangedt <- as.POSIXct(c("2012-10-01 06:00","2012-10-01 21:00"), tz='GMT'); N=1e2
dts <- as.POSIXct(runif(n=N, min=min(rangedt), max=max(rangedt)), tz='GMT', origin='1970-01-01')
DT <- data.table(x=rnorm(N), dts=dts) # put data.frame if you prefer
# x dts
#1: 0.938973900218328494383 2012-10-01 17:11:46.503828
#2: 0.582959687387282210480 2012-10-01 17:33:24.203815
#3: -1.492752410394331263888 2012-10-01 08:37:37.585960
#4: 0.677074458537853418605 2012-10-01 08:55:04.598939
#5: 0.012120685348577473275 2012-10-01 09:35:16.664197
#6: -1.353204371844073161668 2012-10-01 18:45:46.737178
f <- cut(rangedt, breaks='10 min');
f
#[1] 2012-10-01 06:00:00 2012-10-01 21:00:00
#91 Levels: 2012-10-01 06:00:00 2012-10-01 06:10:00 2012-10-01 06:20:00
DT.split <- split(DT, f=findInterval(DT$dts,f))
length(DT.split)
#[1] 1
This is because R makes only one group from the data, which I do not understand.
I found the problem: factors and POSIXct don't work well together.
Using the same DT and f as above, the fix is to convert the factor levels back to POSIXct before calling findInterval:
f <- as.POSIXct(levels(f), tz='GMT', origin='1970-01-01')
DT[, groups:=findInterval(dts,f)]
DT.split <- split(DT, DT$groups)
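A quick sanity check (exact counts depend on the random draw, so this is only illustrative):
length(DT.split)        # number of non-empty 10-minute bins, at most 90 for this range
sapply(DT.split, nrow)  # rows per bin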