R: Combine two dataframes by the nearest time

I have two dataframes: one contains a year's worth of hourly temperatures and the other contains flight information. Below is an extract from the temperature dataframe:
Time <- c("2000-01-01 00:53:00","2000-01-01 06:53:00","2000-01-01 10:53:00")
Time <- as.POSIXct(Time)
Temp <- c(20,30,10)
Temperature <- data.frame(Time,Temp)
Temperature
Time Temp
1 2000-01-01 00:53:00 20
2 2000-01-01 06:53:00 30
3 2000-01-01 10:53:00 10
Below is an extract from the flight information dataframe:
DepartureTime <- c("2000-01-01 03:01:00","2000-01-01 10:00:00","2000-01-01 14:00:00")
DepartureTime <- as.POSIXct(DepartureTime)
FlightInformation <- data.frame(DepartureTime)
FlightInformation
DepartureTime
1 2000-01-01 03:01:00
2 2000-01-01 10:00:00
3 2000-01-01 14:00:00
My goal is to take each row of FlightInformation$DepartureTime and find the closest time in the whole column Temperature$Time. I then want to add the corresponding temperature to the FlightInformation dataframe. The desired output should look like this:
FlightInformation
DepartureTime Temp
1 2000-01-01 03:01:00 20
2 2000-01-01 10:00:00 10
3 2000-01-01 14:00:00 10
My attempts so far have come up with this:
i <- 1
while(i <= nrow(Temperature)){
  j <- 1
  while(j <= nrow(FlightInformation)){
    # compare the (hour-rounded) times and copy the temperature on a match
    if(Temperature$Time[i] == FlightInformation$DepartureTime[j]){
      FlightInformation$Temp[j] <- Temperature$Temp[i]
    }
    j <- j + 1
  }
  i <- i + 1
}
This involves first rounding all times to the nearest hour. The method is not as accurate as I would like and seems VERY inefficient! Is there an easy way to find the nearest POSIXct time and get my desired output?

Some assumptions:
- you have temperature data before and after all flight information; otherwise you'll see NA
- the temperature data is continuous enough that, with the interpolation this presents, you don't grab a reading from 3 months prior (not useful)
- the temperature data is ordered by time (easy enough to fix if not)
We'll use cut(), which finds the interval into which each value falls within a series of breaks:
(ind <- cut(FlightInformation$DepartureTime, Temperature$Time, labels = FALSE))
# [1] 1 2 NA
These are the rows within Temperature from which we should retrieve $Temp. Unfortunately, cut() always points at the interval's left endpoint and does not account for a departure being closer to the next reading, so we compensate for that:
(ind <- ind + (abs(Temperature$Time[ind] - FlightInformation$DepartureTime) >
abs(Temperature$Time[1+ind] - FlightInformation$DepartureTime)))
# [1] 1 3 NA
Now for that NA: it indicates that the latest $DepartureTime is outside of the known temperature times, a violation of my first assumption above, but it can be fixed. I use a magic constant of "6 hours" here to decide whether the data is close enough to use; there are certainly other heuristics that will be less wrong. For such rows we just fall back to the latest temperature:
(is_recoverable <- is.na(ind) & abs(FlightInformation$DepartureTime - max(Temperature$Time)) < 60*60*6)
# [1] FALSE FALSE TRUE
ind[is_recoverable] <- nrow(Temperature)
ind
# [1] 1 3 3
Then the results:
FlightInformation$Temp <- Temperature$Temp[ ind ]
FlightInformation
# DepartureTime Temp
# 1 2000-01-01 03:01:00 20
# 2 2000-01-01 10:00:00 10
# 3 2000-01-01 14:00:00 10
Though definitely quicker than the double while loop, this will be a problem if you have large gaps in your temperature data. That is, if you have a 3-year gap in your data, the most recent temperature will be used, which might be 2.99 years old. For a double-check, use this:
FlightInformation$TempTime <- Temperature$Time[ ind ]
FlightInformation$TimeDelta <- with(FlightInformation, abs(TempTime - DepartureTime))
FlightInformation
# DepartureTime Temp TempTime TimeDelta
# 1 2000-01-01 03:01:00 20 2000-01-01 00:53:00 128 mins
# 2 2000-01-01 10:00:00 10 2000-01-01 10:53:00 53 mins
# 3 2000-01-01 14:00:00 10 2000-01-01 10:53:00 187 mins
You can use different units for the time delta and check for problems with:
units(FlightInformation$TimeDelta) <- "secs"
which(FlightInformation$TimeDelta > 60*60*6)
# integer(0)
(where integer(0) says you have none that are outside of my magic window of 6 hours.)
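If the data.table package is available, a rolling join with roll = "nearest" expresses the same idea in one step. A minimal sketch of that alternative (my addition, not part of the answer above):
library(data.table)
TT <- as.data.table(Temperature)
FI <- as.data.table(FlightInformation)
# for each DepartureTime, pull Temp from the TT row whose Time is nearest;
# in the joined result the column named Time actually holds the departure times
nearest <- TT[FI, on = .(Time = DepartureTime), roll = "nearest"]
FI[, Temp := nearest$Temp]
Note that roll = "nearest" has no cutoff, so the 6-hour sanity check above is still worth doing if your temperature data has large gaps.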

Here's a way! Time is easiest to work with for this if you convert it to a numeric value. Then you can compare the numeric values to find the closest times before/after your reference time (FlightInformation$time_num in the below example). Once you have the closest time before and after your reference value, figure out which is really the closest to your reference. Use that time value to look up (index) the correct temperature value and add it to your data frame.
#convert time to numeric (seconds since origin of time)
Temperature$time_num <- as.numeric(Temperature$Time)
FlightInformation$time_num <- as.numeric(FlightInformation$DepartureTime)
#make sure time data is in correct order so that indexes for time are in correct order
Temperature <- Temperature[with(Temperature, order(time_num)), ] #sort data
for (i in 1:nrow(FlightInformation)) { #for each row of data in flight...
  #find the time in Temp that is closest + prior to Flight time:
  #create a logical vector saying which Temperature$time_num are <= to FlightInformation$time_num,
  #pull the max row index from the logical vector where value == TRUE (this is the closest time for Temp that is prior to Flight Time),
  #and use that row index to look up the Temperature$time_num value that is closest + prior to Flight time
  #(will return NA/warning message if no time in Temp is before time in Flight)
  temptime_prior <- Temperature[max(which(Temperature$time_num <= FlightInformation$time_num[i])), "time_num"]
  #find the time in Temp that is closest + after Flight time
  #(will return NA/warning message if no time in Temp is after time in Flight)
  temptime_after <- Temperature[min(which(Temperature$time_num > FlightInformation$time_num[i])), "time_num"]
  #compare times before and after to see which is closest to flight time; if no before/after time was found (e.g., NA was returned), always use the other time value
  temptime_closest <- ifelse(is.na(temptime_prior), temptime_after,
                      ifelse(is.na(temptime_after), temptime_prior,
                      ifelse((FlightInformation$time_num[i] - temptime_prior) <= (temptime_after - FlightInformation$time_num[i]),
                             temptime_prior, temptime_after)))
  #look up the right temp by finding the row index of the right Temp$time_num value and add it to Flight info
  FlightInformation$Temp[i] <- Temperature[which(Temperature$time_num == temptime_closest), "Temp"]
}
#get rid of numeric time column, you don't need it anymore
FlightInformation <- FlightInformation[,!(names(FlightInformation) %in% c("time_num"))]
Output
DepartureTime Temp
1 2000-01-01 03:01:00 20
2 2000-01-01 10:00:00 10
3 2000-01-01 14:00:00 10
If you have subsets of data in each data frame you need to match up to (e.g., match df1$group1 time values only to df2$group1 time values), you can use survival::neardate. It's a nice function for this that does basically what the above code does, but has some additional parameters if you need them.
Hope this helps! The code's a lot shorter without all the comments =)
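For larger datasets, the per-row loop above can also be replaced with a vectorized lookup. A minimal sketch of that idea (my addition), assuming Temperature is already sorted by Time:
# index of the last temperature reading at or before each departure (0 if none)
pos <- findInterval(FlightInformation$DepartureTime, Temperature$Time)
pos <- pmax(pos, 1)                       # guard: departures before the first reading
nxt <- pmin(pos + 1, nrow(Temperature))   # candidate reading on the other side
d_prior <- abs(as.numeric(FlightInformation$DepartureTime) - as.numeric(Temperature$Time[pos]))
d_after <- abs(as.numeric(Temperature$Time[nxt]) - as.numeric(FlightInformation$DepartureTime))
FlightInformation$Temp <- Temperature$Temp[ifelse(d_after < d_prior, nxt, pos)]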

Related

Create a variable based on time difference between dates on consecutive rows

I have a very large set of data driven off of an id and a date. The dataset has several hundred million rows and about 10 million id's. I am running in a non-windows environment with ample RAM and multiple processors available. I am doing this in parallel. At the moment, I'm working with multidplyr, though am considering all options.
For illustration:
> df[1:11,]
id date gap episode
1 100000019 2015-01-24 0 1
2 100000019 2015-02-20 27 1
3 100000019 2015-03-31 39 2
4 100000019 2015-04-29 29 2
5 100000019 2015-05-27 28 2
6 100000019 2015-06-24 28 2
7 100000019 2015-07-24 30 2
8 100000019 2015-08-23 30 2
9 100000019 2015-09-21 29 2
10 100000019 2015-10-22 31 3
11 100000019 2015-12-30 69 4
The data is sorted before the function call. The order is important. For each id, after the first date, I need to determine the number of days between each subsequent date. I call this a gap. So, the first date for the id gets a gap of zero. The second date gets the value of the second date minus the date in the prior row. And so on.
I am splitting the data by id, then sending the data for each id to the following function.
assign_gap <- function(x) {
  # x$gap <- NA
  for(i in 1:nrow(x)) {
    x[i, ]$gap <- ifelse(i == 1, 0, x[i,]$date - x[i-1, ]$date)
  }
  return(x)
}
cluster <- create_cluster(8)
cluster_assign_value(cluster, 'assign_gap', assign_gap)
system.time(df <- df %>% partition(id, cluster = cluster) %>% do(assign_gap(.)) %>% collect())
I then apply another function that groups the sequence of gaps across dates into "episodes" based on allowable_gap (I am using a value of 30). So, each id will potentially have multiple episodes assigned based on the date sequence and the gap.
assign_episode <- function(x, allowable_gap){
  ep <- 1
  for(i in 1:nrow(x)){
    ifelse(x[i,]$gap <= allowable_gap, ep <- ep, ep <- ep + 1)
    x[i, ]$episode <- ep
  }
  return(x)
}
cluster <- create_cluster(8)
cluster_assign_value(cluster, 'assign_episode', assign_episode)
cluster_assign_value(cluster, 'allowable_gap', allowable_gap)
system.time(df <- df %>% partition(id, cluster = cluster) %>% do(assign_episode(., allowable_gap)) %>% collect())
Given the amount of data I have, I'd really like to find a way to avoid these loops in the functions, which I expect will improve efficiency considerably. If anyone can think of an alternative that accomplishes the same thing, I would be grateful.
I would recommend using the data.table library. This library is extremely fast, particularly if one is working with large data sets like yours. Here is a partial solution, where I solve the first step of your question:
1. calculate gap between dates, making sure the first row of each id is 0
library(data.table)
setDT(df)
df[, gap := c(0L, diff(date)) , by = id ]
Even though this is not working in parallel, I would expect this code to be faster than the loop you're currently using.
2. Assign a group episode for consecutive observations when the gap is under 30 by id
I haven't found a solution for the second part of your question yet, but I would encourage others to complement this answer if they find a solution.
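One hedged sketch for that second step (my addition, not from the answer above): since a new episode starts exactly when a gap exceeds allowable_gap, cumsum() on the gap column just computed gives the episode number directly.
# assumes gap was computed as above and rows are ordered by date within each id
allowable_gap <- 30
df[, episode := cumsum(gap > allowable_gap) + 1L, by = id]
On the eleven sample rows shown above this reproduces the episode column (1, 1, 2, ..., 2, 3, 4), since only the gaps of 39, 31, and 69 days exceed 30 and each starts a new episode.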

How to mark the observations with given information

Consider data collected at 5-minute intervals, with a numeric variable a and a discrete variable acc, which indicates whether an incident happened (0 for no incident, 1 for an incident):
a<-c(1:(288*4))
t<-seq(as.POSIXct("2016-01-01 00:05:00"), as.POSIXct("2016-01-05 00:00:00"), by = '5 min')
acc<-rep(0,288*4)
df<-data.frame(t,a,acc)
Now I have another data set which has the times (accurate to 1 second) at which incidents happened during the collection period:
T<-sample(seq(as.POSIXct("2016-01-01 00:05:00"), as.POSIXct("2016-01-05 00:00:00"), by = '1 sec'),size = 5)
I want to mark acc as 1 for the nearest 2 observations prior to each time in T. For example, if an incident happened at 2016-01-02 07:13:23, the observations with t of 2016-01-02 07:05:00 and 2016-01-02 07:10:00 should have acc marked as 1.
How could I manage to do this?
ind <- findInterval(T, df$t)
df$acc[c(ind - 1, ind)] <- 1
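A quick hand check against the example in the question (my addition): an incident at 2016-01-02 07:13:23 should flag the 07:05:00 and 07:10:00 observations.
x <- as.POSIXct("2016-01-02 07:13:23")
i <- findInterval(x, df$t)
df$t[c(i - 1, i)]   # the 07:05:00 and 07:10:00 slots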
One way could be:
library(lubridate)
df$acc=apply(sapply(T,function(x) x %within% interval((df$t - minutes(4)-seconds(59)),(df$t + minutes(4)+seconds(59)))),1,sum)
lubridate allows for the easy manipulation of dates, minutes(x) and seconds(x) adds x minutes or second to a period object.
interval() is used to create a time interval confined by the time in df$t ± 4min59s.
sapply() is used to check if any of the time in T is within the interval.
apply() is used to collapse the results of sapply() (it outputs 1 column for each element in T)
If T contains a value that is exactly equal to one in df$t, such as 2016-01-04 12:05:00 CET, this will mark only that single observation.

Time series of categorical data -- how to calculate percent of each category, over time spans?

I have a dataframe of time stamps which specify a categorical status. The status is valid until the next time stamp, at which time the category might change.
I'd like to be able to determine percentage of time spent in each category over regular time periods, like monthly, quarterly, or annually.
This seems like a common enough problem, but I've been unable to find an elegant solution or library to solve it.
For example, with the following sample dataframe:
date status
2016-02-20 09:11:00 a
2016-03-06 02:38:00 c
2016-03-10 15:20:00 b
2016-03-10 21:20:00 a
2016-03-11 11:51:00 b
2016-03-12 01:19:00 c
2016-03-22 14:39:00 c
2016-03-23 11:37:00 b
2016-03-25 17:38:00 c
2016-03-26 01:24:00 c
2016-03-26 12:40:00 a
2016-04-12 10:28:00 c
... I might want to report weekly, for 3/1-3/7, 3/8-3/14, and 3/15-3/21, the percentage of time spent in each of the 'a', 'b', and 'c' statuses during each week.
I started brute force coding a solution to this (it's ugly...), when I decided maybe I should ask here whether there's a more elegant way to do it.
======== Edited to add an inelegant brute-force solution below ========
library(plyr)  # for ddply

time_analysis <- function(df, starttime, endtime) {
  # - assumes sorted by date
  startindex <- sum(df$date <= starttime)     # index of the entry which contains the start time
  endindex   <- sum(df$date <= endtime) + 1   # index of the entry which contains the end time
  if ( (startindex == 0) || (endindex > nrow(df)) ) {
    print("Date outside of available data")
    return(NULL)
  }
  df2 <- df[startindex:endindex, ]   # subset the dataframe to the range, but still need to trim ends
  df2$date[1] <- starttime           # trim to the start time
  df2$date[nrow(df2)] <- endtime     # trim back the end time
  df2$status[nrow(df2)] <- df2$status[nrow(df2)-1]  # status hasn't changed yet, so still the previous status
  duration <- diff(df2$date)         # vector of the time within each segment, 1 fewer elements than the dataframe
  units(duration) <- 'days'
  duration <- as.numeric(duration)   # need to convert to numeric, or else can't divide by total duration
  df2 <- df2[-nrow(df2), ]           # remove the last row, to make length same as the duration vector
  df2$duration <- duration           # add the duration column
  total <- sum(df2$duration)         # to allow calculations within the ddply
  return(ddply(df2[, c('status','duration')], 'status', function(x) {  # calculate by each status category
    return( c(
      date = starttime,
      totaldays = round(sum(x$duration), 2),
      fraction = round(sum(x$duration) / total, 3)) )
  } ))
}
Below is a sample use that splits the reporting into roughly 2-week chunks. I hate using manual date coding and a loop in R, but am too inexperienced to know a better way.
times <- c("2016-03-01","2016-03-15","2016-04-01","2016-04-15","2016-05-01","2016-05-15")
result <- data.frame()
for (i in 1:(length(times) - 1)) {
  result <- rbind( result, time_analysis(d, times[i], times[i+1]) )
}
print(result, row.names = FALSE)
Yielding (other than some errors for dates out of range):
status date totaldays fraction
a 2016-03-01 5.71 0.409
b 2016-03-01 0.81 0.058
c 2016-03-01 7.43 0.532
a 2016-03-15 5.47 0.322
b 2016-03-15 2.25 0.132
c 2016-03-15 9.28 0.546
=====
And after posting, found a much nicer way to generate the times:
times <- as.character( seq( as.Date("2016-03-01"), as.Date("2016-05-15"), by = '2 weeks' ) )
Here's an approach that combines the cut.POSIXt() S3 method with a nested data.table aggregation.
## define data
library(data.table);
dt <- data.table(date=as.POSIXct(c('2016-02-20 09:11:00','2016-03-06 02:38:00','2016-03-10 15:20:00','2016-03-10 21:20:00','2016-03-11 11:51:00','2016-03-12 01:19:00','2016-03-22 14:39:00','2016-03-23 11:37:00','2016-03-25 17:38:00','2016-03-26 01:24:00','2016-03-26 12:40:00','2016-04-12 10:28:00')),status=c('a','c','b','a','b','c','c','b','c','c','a','c'));
## solution
dt[,{ n1 <- .N; .SD[,.(pct=.N/n1*100),.(status)]; },.(month=cut(date,'month'))];
## month status pct
## 1: 2016-02-01 a 100
## 2: 2016-03-01 c 50
## 3: 2016-03-01 b 30
## 4: 2016-03-01 a 20
## 5: 2016-04-01 c 100
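Note that this computes the percentage of observations per month, whereas the question asks about the percentage of time. A hedged sketch of a duration-weighted variant (my addition, crediting each interval entirely to the month in which it starts, i.e. without the month-boundary trimming done in the question's function):
dt2 <- copy(dt)[order(date)]
# each status persists until the next time stamp, so weight it by the elapsed days
# (the final observation gets zero duration in this sketch)
dt2[, dur := c(as.numeric(diff(date), units = "days"), 0)]
dt2[, month := cut(date, "month")]
dt2[, .(days = sum(dur)), by = .(month, status)][, pct := 100 * days / sum(days), by = month][]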

Map a list of events (instants) to a list of periods (intervals) in R (with or without lubridate)

I have two data frames: one contains time periods marked with unique character IDs, and the other contains events with another set of unique IDs associated with them.
Period DF (code):
periodID <- c("P_UID_00", "P_UID_01", "P_UDI_02", "P_UID_03")
periodStart <- as.POSIXct(c("2016/02/10 19:00", "2016/02/11 19:00",
"2016/02/12 19:00", "2016/02/13 19:00"))
periodEnd <- as.POSIXct(c("2016/02/10 21:00", "2016/02/11 21:00",
"2016/02/12 21:00", "2016/02/13 21:00"))
periodDF <- data.frame(periodID, periodStart, periodEnd)
Period DF:
periodID periodStart periodEnd
1 P_UID_00 2016-02-10 19:00:00 2016-02-10 21:00:00
2 P_UID_01 2016-02-11 19:00:00 2016-02-11 21:00:00
3 P_UDI_02 2016-02-12 19:00:00 2016-02-12 21:00:00
4 P_UID_03 2016-02-13 19:00:00 2016-02-13 21:00:00
Event DF (code):
eventID <- c("E_UID_00", "E_UID_01", "E_UDI_02", "E_UID_03")
eventTime <- as.POSIXct(c("2016/02/09 19:55:01", "2016/02/11 19:12:01",
"2016/02/11 20:22:01", "2016/02/15 19:00:01"))
eventDF <- data.frame(eventID, eventTime)
Event DF:
eventID eventTime
1 E_UID_00 2016-02-09 19:55:01
2 E_UID_01 2016-02-11 19:12:01
3 E_UDI_02 2016-02-11 20:22:01
4 E_UID_03 2016-02-15 19:00:01
I want to map the event times in the second DF to the time periods in the first DF, in order to match the ID of each event to the ID of a period. Essentially, the result table I want should look like:
eventID periodID
1 E_UID_00 NA
2 NA P_UID_00
3 E_UID_01 P_UID_01
4 E_UDI_02 P_UID_01
5 NA P_UID_02
6 NA P_UID_03
7 E_UID_03 NA
I suppose this can be achieved by using lubridate to transform the start and end columns in the first DF to intervals and then using some form of apply with an instant %within% interval check, but I am not really familiar with lubridate and did not manage to produce working code.
Additional considerations:
- periods are completely arbitrary and can last from seconds to years
- periods never overlap, so this is not an issue
- more than one event could be associated with a time period
- it is possible for DFs to contain unassociatable events and time periods
- the solution must not include loops
- it does not have to be solved with lubridate; in fact, a solution in base R would be even more welcome.
I actually managed to come up with code that produces exactly what I wanted using lubridate. So if anyone knows how to do this in base R, or simply a better way than the one suggested below, sharing it would be greatly appreciated!
First off, the start and end times in the period DF should be converted to lubridate intervals:
intervalsP <- as.interval(periodStart, periodEnd)
Step 2: A function should be created for checking if an instant is located within a list of intervals. The only reason I have created a separate function is to be able to use it with apply:
PeriodAssign <- function(x, y){
  # x - instants
  # y - intervals
  variable1 <- mapply(`%within%`, x, y)
  if (length(y[variable1]) != 0) {
    as.character(y[variable1])
  } else {
    NA
  }
}
NOTE: I had to coerce the intervals to character, because otherwise they were coerced to their length in seconds by the apply function, which is not useful for matching purposes - i.e. all four intervals in this example have the same length.
Step 3: The function can then be used on the event DF, and both DFs can be merged to produce the DF I was looking for:
eventDF$intervals <- lapply(eventTime, PeriodAssign, intervalsP)
periodDF$intervals <- as.character(intervalsP)
mergedDF <- merge(periodDF, eventDF, by = "intervals")
presentableDF <- mergedDF[, c(2, 5)]
# adding in the unmatched Periods and Events
tDF1 <- data.frame(periodDF[!(periodDF$periodID %in% presentableDF$periodID), 1], NA)
colnames(tDF1) <- c("periodID", "eventID")
presentableDF <- rbind(presentableDF, tDF1)
tDF2 <- data.frame(NA, eventDF[!(eventDF$eventID %in% presentableDF$eventID), 1])
colnames(tDF2) <- c("periodID", "eventID")
presentableDF <- rbind(presentableDF, tDF2)
presentableDF <- presentableDF[order(presentableDF[,1]),]
The eventual DF looks like:
> presentableDF
periodID eventID
3 P_UID_00 <NA>
1 P_UID_01 E_UID_01
2 P_UID_01 E_UDI_02
4 P_UID_02 <NA>
5 P_UID_03 <NA>
6 <NA> E_UID_00
7 <NA> E_UID_03
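Since the question also asks for a base R option, here is a hedged sketch (my addition, not part of the answer above) that relies on the periods never overlapping and uses findInterval() against the sorted period starts:
# index of the last periodStart at or before each event (0 if the event precedes all periods)
idx <- findInterval(eventDF$eventTime, periodDF$periodStart)
idx[idx == 0] <- NA
hit <- !is.na(idx) & eventDF$eventTime <= periodDF$periodEnd[idx]
eventDF$periodID <- ifelse(hit, as.character(periodDF$periodID)[idx], NA)
# outer merge to keep both the unmatched events and the unmatched periods
merge(periodDF[, "periodID", drop = FALSE],
      eventDF[, c("eventID", "periodID")],
      by = "periodID", all = TRUE)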

R: Efficiently subsetting dataframe based on time of day

I have a large (150,000x7) dataframe that I intend to use for back-testing and real-time analysis of a financial market. The data represents the condition of an investment vehicle at 5 minute intervals (although holes do exist). It looks like this (but much longer):
pTime Time Price M1 M2 M3 M4
1 1212108300 20:45:00 1.5518 12.21849 -0.37125 4.50549 -31.00559
2 1212108900 20:55:00 1.5516 11.75350 -0.81792 -1.53846 -32.12291
3 1212109200 21:00:00 1.5512 10.75070 -1.47438 -8.24176 -34.35754
4 1212109500 21:05:00 1.5514 10.23529 -1.06044 -8.46154 -33.24022
5 1212109800 21:10:00 1.5514 9.74790 -1.02759 -10.21978 -33.24022
6 1212110100 21:15:00 1.5513 9.31092 -1.17076 -11.97802 -33.79888
7 1212110400 21:20:00 1.5512 8.84034 -1.28428 -13.62637 -34.35754
8 1212110700 21:25:00 1.5509 8.07843 -1.63715 -18.24176 -36.03352
9 1212111000 21:30:00 1.5509 7.39496 -1.49198 -20.65934 -36.03352
10 1212111300 21:35:00 1.5512 7.65266 -1.03717 -18.57143 -34.35754
The data is pre-loaded into R, but during my back-test I need to subset it by two criteria:
The first criterion is a sliding window to avoid peeking into the future. The window must be such that each new 5-minute interval in the back-test shifts the whole window 5 minutes into the future. This part I can do like this:
require(zoo)
zooser <- zoo(x=tser$Close, order.by=as.POSIXct(tser$pTime, origin="1970-01-01"))
window(zooser, start=A, end=B)
The second criterion is another sliding window, but one that slides through the time of day and contains only those entries within N minutes of the input time on any given day.
Example: if the window size is 2 hours and the input time is 12:00 PM, then the window must contain all rows with Time between 10:00 AM and 2:00 PM.
This is the part that I am having trouble figuring out.
Edit: My data has holes in it; two consecutive rows could be MORE than 5 minutes apart. As the window moves through these gaps, the number of points inside the window should vary.
The following is my MySQL code that does what I want to do in R (same table structure):
SET @qTime = Time(FROM_UNIXTIME(SAMP_endTime));
SET @inc = -1;
INSERT INTO MetIndListBuys (pTime,ArrayPos,M1,M2,M3,M4)
SELECT pTime,@inc:=@inc+1,M1,M2,M3,M4
FROM mergebuys USE INDEX (`y`) WHERE pTime BETWEEN SAMP_startTime AND SAMP_endTime
AND TIME_TO_SEC(TIMEDIFF(Time,@qTime))/3600 BETWEEN 0-HourSpan AND HourSpan
;
Say that you have your target time t0 on the same scale as pTime: seconds since the epoch. Then pTime - t0 is (a whole number of days) * 86400 + (a difference in clock time in seconds). Taking (pTime - t0) %% (number of seconds per day) therefore leaves just the clock-time difference in seconds (wrapped around if the difference is negative). This suggests the following function:
SecondsPerDay <- 24 * 60 * 60
within <- function(d, t0Sec, wMin) {
  diff <- (d$pTime - t0Sec) %% SecondsPerDay
  wSec <- 60 * wMin
  return(d[diff < wSec | diff > (SecondsPerDay - wSec), ])
}
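A minimal usage sketch (my example, on the sample rows above): keep the rows whose clock time falls within 20 minutes of row 3's clock time (21:00:00), on any day.
# note: this `within` shadows base::within in the global environment
tser_sub <- within(tser, t0Sec = tser$pTime[3], wMin = 20)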
1) If DF is the data frame shown in the question then create a zoo object from it as you have done and split it into days giving zs. Then lapply your function f to each successive set of w points in each component (i.e. in each day). For example, if you want to apply your function to 2 hours of data at a time and your data is regularly spaced 5-minute data then w = 24 (since there are 24 five-minute periods in two hours). In such a case f would be passed 24 rows of data as a matrix each time it's called. Also align has been set to "right" below but it can alternatively be set to align="center" and the condition giving ix can be changed to double-sided, etc. For more on rollapply see: ?rollapply
library(zoo)
z <- zoo(DF[-2], as.POSIXct(DF[,1], origin = "1970-01-01"))
w <- 3 # replace this with 24 to handle two hours at a time with five min data
f <- function(x) {
  tt <- x[, 1]
  ix <- tt[w] - tt <= w * 5 * 60 # RHS converts w to seconds
  x <- x[ix, -1]
  sum(x) # replace sum with your function
}
zs <- split(z, as.Date(time(z)))   # split into days, as described above
out <- lapply(zs, function(zd) rollapply(zd, w, f, by.column = FALSE, align = "right"))
Using the data frame in the question we get this:
> out
$`2008-05-30`
2008-05-30 02:00:00 2008-05-30 02:05:00 2008-05-30 02:10:00 2008-05-30 02:15:00
-66.04703 -83.92148 -95.93558 -100.24924
2008-05-30 02:20:00 2008-05-30 02:25:00 2008-05-30 02:30:00 2008-05-30 02:35:00
-108.15038 -121.24519 -134.39873 -140.28436
2) This could alternatively be done as follows, where w and f are as above:
n <- nrow(DF)
m <- as.matrix(DF[-2])
sapply(w:n, function(i) { m <- m[seq(length = w, to = i), ]; f(m) })
Replace the sapply with lapply if needed. Also, this may seem shorter than the first solution, but it's not much different once you add the code to define f and w (which appear in the first but not the second).
If there are no holes during the day and only holes between days then these solutions could be simplified.
