Merge data tables by time interval overlap - R

Suppose I have two tables: one with appointments and a second with receptions. Each table has a filial ID, a medic ID, start and end times (planned for appointments, actual for receptions) and some other data. I want to count how many appointments have receptions inside the appointment's time interval. A reception can begin before the appointment's start time, end after it, lie entirely inside the appointment interval, and so on.
Below I made two example tables, one for appointments and one for receptions, and wrote a nested loop, but it is very slow. My real tables contain approximately 50 million rows each, so I need a fast solution. How can I do this without a loop? Thanks in advance!
library(data.table)
date <- as.POSIXct('2015-01-01 14:30:00')
# appointments data table
app <- data.table(med.id = 1:10,
                  filial.id = rep(c(100, 200), each = 5),
                  start.time = rep(seq(date, length.out = 5, by = "hours"), 2),
                  end.time = rep(seq(date + 3599, length.out = 5, by = "hours"), 2),
                  A = rnorm(10))
# receptions data table
re <- data.table(med.id = c(1, 11, 3, 4, 15, 6, 7),
                 filial.id = c(rep(100, 5), 200, 200),
                 start.time = as.POSIXct(paste(rep('2015-01-01 ', 7),
                                               c('14:25:00', '14:25:00', '16:32:00', '17:25:00',
                                                 '16:10:00', '15:35:00', '15:50:00'))),
                 end.time = as.POSIXct(paste(rep('2015-01-01 ', 7),
                                             c('15:25:00', '15:20:00', '17:36:00', '18:40:00',
                                               '16:10:00', '15:49:00', '16:12:00'))),
                 B = rnorm(7))
app$count <- 0
for (i in 1:nrow(app)) {
  for (j in 1:nrow(re)) {
    if (app$med.id[i] == re$med.id[j] &&       # med.id is equal and
        app$filial.id[i] == re$filial.id[j]) { # filial.id is equal
      if (re$start.time[j] < app$start.time[i] &&
          re$end.time[j] > app$start.time[i]) {
        # reception starts before the appointment start time and ends after it, OR
        app$count[i] <- app$count[i] + 1
      } else if (re$start.time[j] < app$end.time[i] &&
                 re$start.time[j] > app$start.time[i]) {
        # reception starts after the appointment start time and before its end time
        app$count[i] <- app$count[i] + 1
      }
    }
  }
}

Using foverlaps(), data.table's fast overlap join; note that re must be keyed, with the last two key columns being the interval's start and end times:
setkey(re, med.id, filial.id, start.time, end.time)
olaps = foverlaps(app, re, which=TRUE, nomatch=0L)[, .N, by=xid]
app[, count := 0L][olaps$xid, count := olaps$N]
app
# med.id filial.id start.time end.time A count
# 1: 1 100 2015-01-01 14:30:00 2015-01-01 15:29:59 0.60878560 1
# 2: 2 100 2015-01-01 15:30:00 2015-01-01 16:29:59 -0.11545284 0
# 3: 3 100 2015-01-01 16:30:00 2015-01-01 17:29:59 0.68992084 1
# 4: 4 100 2015-01-01 17:30:00 2015-01-01 18:29:59 0.04703938 1
# 5: 5 100 2015-01-01 18:30:00 2015-01-01 19:29:59 -0.95315419 0
# 6: 6 200 2015-01-01 14:30:00 2015-01-01 15:29:59 0.26193554 0
# 7: 7 200 2015-01-01 15:30:00 2015-01-01 16:29:59 1.55206077 1
# 8: 8 200 2015-01-01 16:30:00 2015-01-01 17:29:59 0.44517362 0
# 9: 9 200 2015-01-01 17:30:00 2015-01-01 18:29:59 0.11475881 0
# 10: 10 200 2015-01-01 18:30:00 2015-01-01 19:29:59 -0.66139828 0
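As a side note, recent data.table versions (1.9.8+) support non-equi joins, which can express the overlap count directly without keying. A sketch, with the caveat that boundary semantics differ slightly from the loop (the loop never counts a reception starting exactly at the appointment start, while this join, like foverlaps, does):
# count, per appointment, the receptions whose interval overlaps it
olaps2 <- re[app,
             on = .(med.id, filial.id, start.time < end.time, end.time > start.time),
             .(n = sum(!is.na(B))),  # unmatched rows join to NA, so they count 0
             by = .EACHI]
app[, count := olaps2$n]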
PS: please go through the vignettes to learn how to use data.table effectively.

I actually don't think you need an overlap join at all: your loop is really just merging on med.id and filial.id and then performing a simple comparison.
First, for clarity, let's rename the start.time and end.time fields:
setnames(app, c("start.time", "end.time"), c("app.start.time", "app.end.time"))
setnames(re, c("start.time", "end.time"), c("re.start.time", "re.end.time"))
You should then merge the two data.tables on the keys med.id and filial.id, like this:
app_re <- re[app, on=c("med.id", "filial.id")]
# med.id filial.id re.start.time re.end.time B
# 1: 1 100 2015-01-01 14:25:00 2015-01-01 15:25:00 0.4307760
# 2: 2 100 <NA> <NA> NA
# 3: 3 100 2015-01-01 16:32:00 2015-01-01 17:36:00 -1.2933755
# 4: 4 100 2015-01-01 17:25:00 2015-01-01 18:40:00 -1.2374469
# 5: 5 100 <NA> <NA> NA
# 6: 6 200 2015-01-01 15:35:00 2015-01-01 15:49:00 -0.8054822
# 7: 7 200 2015-01-01 15:50:00 2015-01-01 16:12:00 2.5742241
# 8: 8 200 <NA> <NA> NA
# 9: 9 200 <NA> <NA> NA
# 10: 10 200 <NA> <NA> NA
# app.start.time app.end.time A
# 1: 2015-01-01 14:30:00 2015-01-01 15:29:59 -0.26828337
# 2: 2015-01-01 15:30:00 2015-01-01 16:29:59 0.24246341
# 3: 2015-01-01 16:30:00 2015-01-01 17:29:59 1.55824948
# 4: 2015-01-01 17:30:00 2015-01-01 18:29:59 1.25829302
# 5: 2015-01-01 18:30:00 2015-01-01 19:29:59 1.14244558
# 6: 2015-01-01 14:30:00 2015-01-01 15:29:59 -0.41234563
# 7: 2015-01-01 15:30:00 2015-01-01 16:29:59 0.07710022
# 8: 2015-01-01 16:30:00 2015-01-01 17:29:59 -1.46421985
# 9: 2015-01-01 17:30:00 2015-01-01 18:29:59 1.21682394
# 10: 2015-01-01 18:30:00 2015-01-01 19:29:59 1.11197318
You can then create your count variable with the same conditions as before:
app_re[, count := as.numeric((re.start.time < app.start.time & re.end.time > app.start.time) |
                             (re.start.time < app.end.time & re.start.time > app.start.time))]
# Convert the NAs to 0
app_re[, count := ifelse(is.na(count), 0, count)]
This should be much faster than the for loops.
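One caveat (my assumption, since the example data has at most one reception per medic/filial pair): if several receptions can share a med.id/filial.id, the join above yields one row per reception, so the flags need to be aggregated back to one count per appointment, e.g.:
app_counts <- app_re[, .(count = sum(count, na.rm = TRUE)),
                     by = .(med.id, filial.id, app.start.time, app.end.time)]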

Related

R: do calculation for each factor level separately, then calculate min/mean/max over levels

So I have the output of a water distribution model: inflow and discharge values of a river for every hour. I have done 5 model runs.
reproducible example:
df <- data.frame(rep(seq(from = as.POSIXct("2012-1-1 0:00", tz = "UTC"),
                         to = as.POSIXct("2012-1-1 23:00", tz = "UTC"),
                         by = "hour"), 5),
                 as.factor(c(rep(1, 24), rep(2, 24), rep(3, 24), rep(4, 24), rep(5, 24))),
                 rep(seq(1, 300, length.out = 24), 5),
                 rep(seq(1, 180, length.out = 24), 5))
colnames(df)<-c("time", "run", "inflow", "discharge")
In reality, of course, the values vary between runs. (And I have a lot more data, as I have 100 runs and hourly values for 35 years.)
So, first I would like to calculate a water scarcity factor for every run, which means something like (1 - (discharge / inflow 6 hours before)), as the water needs 6 hours to run through the catchment.
scarcityfactor <- 1 - (discharge / lag(inflow,6))
And then I want to calculate the mean, max and min of the scarcity factors over all runs (to find the highest, lowest and mean scarcity that could occur at every time step, according to the different model runs). So I would calculate a mean, max and min for every time step:
f1 <- function(x) c(Mean = mean(x), Max = max(x), Min = min(x))
results <- do.call(data.frame,
                   aggregate(scarcityfactor ~ time, data = df, FUN = f1))
Can anybody help me with the code?
I believe this is what you want, if I understand the problem description correctly.
I'll use data.table:
library(data.table)
setDT(df)
# add scarcity_factor (group by run)
df[ , scarcity_factor := 1 - discharge/shift(inflow, 6L), by = run]
# group by time, excluding times for which the
# scarcity factor is missing
df[!is.na(scarcity_factor), by = time,
   .(min_scarcity = min(scarcity_factor),
     mean_scarcity = mean(scarcity_factor),
     max_scarcity = max(scarcity_factor))]
# time min_scarcity mean_scarcity max_scarcity
# 1: 2012-01-01 06:00:00 -46.695652174 -46.695652174 -46.695652174
# 2: 2012-01-01 07:00:00 -2.962732919 -2.962732919 -2.962732919
# 3: 2012-01-01 08:00:00 -1.342995169 -1.342995169 -1.342995169
# 4: 2012-01-01 09:00:00 -0.776086957 -0.776086957 -0.776086957
# 5: 2012-01-01 10:00:00 -0.487284660 -0.487284660 -0.487284660
# 6: 2012-01-01 11:00:00 -0.312252964 -0.312252964 -0.312252964
# 7: 2012-01-01 12:00:00 -0.194826637 -0.194826637 -0.194826637
# 8: 2012-01-01 13:00:00 -0.110586011 -0.110586011 -0.110586011
# 9: 2012-01-01 14:00:00 -0.047204969 -0.047204969 -0.047204969
# 10: 2012-01-01 15:00:00 0.002210759 0.002210759 0.002210759
# 11: 2012-01-01 16:00:00 0.041818785 0.041818785 0.041818785
# 12: 2012-01-01 17:00:00 0.074275362 0.074275362 0.074275362
# 13: 2012-01-01 18:00:00 0.101356965 0.101356965 0.101356965
# 14: 2012-01-01 19:00:00 0.124296675 0.124296675 0.124296675
# 15: 2012-01-01 20:00:00 0.143977192 0.143977192 0.143977192
# 16: 2012-01-01 21:00:00 0.161047028 0.161047028 0.161047028
# 17: 2012-01-01 22:00:00 0.175993343 0.175993343 0.175993343
# 18: 2012-01-01 23:00:00 0.189189189 0.189189189 0.189189189
You can be a tad more concise by lapplying over different aggregators:
df[!is.na(scarcity_factor), by = time,
   lapply(list(min, mean, max), function(f) f(scarcity_factor))]
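The lapply() form leaves the columns named V1/V2/V3; a small follow-up (assuming you store the result first) restores descriptive names:
res <- df[!is.na(scarcity_factor), by = time,
          lapply(list(min, mean, max), function(f) f(scarcity_factor))]
setnames(res, c("V1", "V2", "V3"),
         c("min_scarcity", "mean_scarcity", "max_scarcity"))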
Lastly you could think of this as reshaping with aggregation and use dcast:
dcast(df, time ~ ., value.var = 'scarcity_factor',
fun.aggregate = list(min, mean, max))
(use df[!is.na(scarcity_factor)] in the first argument of dcast if you want to exclude the meaningless rows)
library(tidyverse)
df %>%
  group_by(run) %>%
  mutate(scarcityfactor = 1 - discharge / lag(inflow, 6)) %>%
  group_by(time) %>%
  summarise(Mean = mean(scarcityfactor),
            Max = max(scarcityfactor),
            Min = min(scarcityfactor))
# # A tibble: 24 x 4
# time Mean Max Min
# <dttm> <dbl> <dbl> <dbl>
# 1 2012-01-01 00:00:00 NA NA NA
# 2 2012-01-01 01:00:00 NA NA NA
# 3 2012-01-01 02:00:00 NA NA NA
# 4 2012-01-01 03:00:00 NA NA NA
# 5 2012-01-01 04:00:00 NA NA NA
# 6 2012-01-01 05:00:00 NA NA NA
# 7 2012-01-01 06:00:00 -46.7 -46.7 -46.7
# 8 2012-01-01 07:00:00 -2.96 -2.96 -2.96
# 9 2012-01-01 08:00:00 -1.34 -1.34 -1.34
#10 2012-01-01 09:00:00 -0.776 -0.776 -0.776
# # ... with 14 more rows
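If you would rather drop the six time steps for which the lag is undefined (instead of reporting NA), filter before summarising; a sketch:
df %>%
  group_by(run) %>%
  mutate(scarcityfactor = 1 - discharge / lag(inflow, 6)) %>%
  filter(!is.na(scarcityfactor)) %>%
  group_by(time) %>%
  summarise(Mean = mean(scarcityfactor),
            Max = max(scarcityfactor),
            Min = min(scarcityfactor))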

R: time series monthly max adjusted by group

I have a df like this (head):
date Value
1: 2016-12-31 169361280
2: 2017-01-01 169383153
3: 2017-01-02 169494585
4: 2017-01-03 167106852
5: 2017-01-04 166750164
6: 2017-01-05 164086438
I would like to calculate a ratio, and for that I need the max of every period. The max normally falls on the last day of the month, but sometimes it can be a few days before or after (the 28th, 29th, 30th, 31st, 1st or 2nd).
To calculate it properly, I would like to assign to my reference date (the last day of the month) the max value of this group of days, to be sure that the ratio reflects what it is supposed to.
This could be a reproducible example:
library(data.table)
library(zoo)  # for as.yearmon()
Start <- as.Date("2016-12-31")
End <- Sys.Date()
window <- data.table(seq(Start, End, by = '1 day'))
dt <- cbind(window, rep(rnorm(nrow(window))))
colnames(dt) <- c("date", "value")
# Create a date sequence of period ends
DateSeq <- function(st, en, freq) {
  st <- as.Date(as.yearmon(st))
  en <- as.Date(as.yearmon(en))
  as.Date(as.yearmon(seq(st, en, by = paste(as.character(12/freq), "months"))),
          frac = 1)
}
# df to be filled with the group max
Value.Max.Month <- data.frame(DateSeq(Start, End, 12))
colnames(Value.Max.Month) <- c("date")
date
1 2016-12-31
2 2017-01-31
3 2017-02-28
4 2017-03-31
5 2017-04-30
6 2017-05-31
7 2017-06-30
8 2017-07-31
9 2017-08-31
10 2017-09-30
11 2017-10-31
12 2017-11-30
13 2017-12-31
14 2018-01-31
15 2018-02-28
16 2018-03-31
You could use data.table:
library(lubridate)
library(zoo)
Start <- as.Date("2016-12-31")
End <- Sys.Date()
window <- data.table(seq(Start,End,by='1 day'))
dt <- cbind(window,rep(rnorm(nrow(window))))
colnames(dt) <- c("date","value")
dt <- data.table(dt)
dt[, period := as.Date(as.yearmon(date)) %m+% months(1) - 1
   ][, maximum := max(value), by = period
   ][, unique(maximum), by = period]
In the first expression we create a new column called period. Then we group by this new column and look for the maximum in value. In the last expression we just output these unique rows.
Notice that to get the last day of each period we add one month using lubridate's %m+% and then subtract 1 day.
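As a quick aside, that date arithmetic can be sanity-checked on a single date:
# as.yearmon() floors 2017-01-15 to 2017-01-01, %m+% months(1) gives
# 2017-02-01, and subtracting one day lands on the month's last day
as.Date(as.yearmon(as.Date("2017-01-15"))) %m+% months(1) - 1
# [1] "2017-01-31"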
The output is:
period V1
1: 2016-12-31 -0.7832116
2: 2017-01-31 2.1988660
3: 2017-02-28 1.6644812
4: 2017-03-31 1.2464980
5: 2017-04-30 2.8268820
6: 2017-05-31 1.7963104
7: 2017-06-30 1.3612476
8: 2017-07-31 1.7325457
9: 2017-08-31 2.7503439
10: 2017-09-30 2.4369036
11: 2017-10-31 2.4544802
12: 2017-11-30 3.1477730
13: 2017-12-31 2.8461506
14: 2018-01-31 1.8862944
15: 2018-02-28 1.8946470
16: 2018-03-31 0.7864341
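The same result can be computed a bit more directly by grouping on the period expression itself, which avoids materialising a maximum column and then deduplicating; a sketch:
dt[, .(maximum = max(value)),
   by = .(period = as.Date(as.yearmon(date)) %m+% months(1) - 1)]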

Count time stamps in different time intervals - issue with interval which spans midnight

I have a dataframe ("observations") with time stamps in H:M format ("Time"). In a second dataframe ("intervals"), I have time ranges defined by "From" and "Till" variables, also in H:M format.
I want to count the number of observations that fall within each interval. I have been using between from data.table, which has worked without any problem when dates are included.
However, now I only have time stamps, without dates. This causes problems for the times that occur in the interval spanning midnight (20:00 - 05:59): these times are not counted by the code I have tried.
Example below
interval.data <- data.frame(From = c("14:00", "20:00", "06:00"),
                            Till = c("19:59", "05:59", "13:59"),
                            stringsAsFactors = FALSE)
observations <- data.frame(Time = c("14:32", "15:59", "16:32", "21:34", "03:32",
                                    "02:00", "00:00", "05:57", "19:32", "01:32",
                                    "02:22", "06:00", "07:50"),
                           stringsAsFactors = FALSE)
interval.data
# From Till
# 1: 14:00:00 19:59:00
# 2: 20:00:00 05:59:00 # <- interval including midnight
# 3: 06:00:00 13:59:00
observations
# Time
# 1: 14:32:00
# 2: 15:59:00
# 3: 16:32:00
# 4: 21:34:00 # Row 4-8 & 10-11 falls in 'midnight interval', but are not counted
# 5: 03:32:00 #
# 6: 02:00:00 #
# 7: 00:00:00 #
# 8: 05:57:00 #
# 9: 19:32:00
# 10: 01:32:00 #
# 11: 02:22:00 #
# 12: 06:00:00
# 13: 07:50:00
library(data.table)
library(plyr)
adply(interval.data, 1,
      function(x, y) sum(y[, 1] %between% c(x[1], x[2])),
      y = observations)
# From Till V1
# 1 14:00 19:59 4
# 2 20:00 05:59 0 # <- zero counts - wrong!
# 3 06:00 13:59 2
One approach is to use a non-equi join in data.table, together with its helper function as.ITime for working with time strings.
You'll have an issue with the interval that spans midnight, but there should only ever be one of those. And as you're interested in the number of observations per 'group' of intervals, you can treat this group as the complement ('not') of the others.
For example, first convert your data.frame to data.table
library(data.table)
## set your data.frames as `data.table`
setDT(interval.data)
setDT(observations)
Then use as.ITime to convert to an integer representation of time
## convert time stamps
interval.data[, `:=`(FromMins = as.ITime(From),
                     TillMins = as.ITime(Till))]
observations[, TimeMins := as.ITime(Time)]
## you could combine this step with the non-equi join directly, but I'm separating it for clarity
You can now use a non-equi join to find the interval each time falls within. Note that the times that return NA are the ones falling inside the midnight-spanning interval:
interval.data[
observations
, on = .(FromMins <= TimeMins, TillMins > TimeMins)
]
# From Till FromMins TillMins Time
# 1: 14:00 19:59 872 872 14:32
# 2: 14:00 19:59 959 959 15:59
# 3: 14:00 19:59 992 992 16:32
# 4: NA NA 1294 1294 21:34
# 5: NA NA 212 212 03:32
# 6: NA NA 120 120 02:00
# 7: NA NA 0 0 00:00
# 8: NA NA 357 357 05:57
# 9: 14:00 19:59 1172 1172 19:32
# 10: NA NA 92 92 01:32
# 11: NA NA 142 142 02:22
# 12: 06:00 13:59 360 360 06:00
# 13: 06:00 13:59 470 470 07:50
Then, to get the number of observations for the groups of intervals, you just take .N grouped by interval, which can be chained onto the end of the above statement:
interval.data[
observations
, on = .(FromMins <= TimeMins, TillMins > TimeMins)
][
, .N
, by = .(From, Till)
]
# From Till N
# 1: 14:00 19:59 4
# 2: NA NA 7
# 3: 06:00 13:59 2
Where the NA group corresponds to the one that spans midnight.
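If you'd rather see the wrap-around interval labelled instead of NA, one option (a sketch, hard-coding the known midnight interval from interval.data) is:
res <- interval.data[
  observations
  , on = .(FromMins <= TimeMins, TillMins > TimeMins)
][
  , .N
  , by = .(From, Till)
]
res[is.na(From), `:=`(From = "20:00", Till = "05:59")]
res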
I just tweaked your code to get the desired result. Hope this helps!
adply(interval.data, 1, function(x, y) {
  if (x[1] > x[2]) {
    # interval wraps midnight: count the two pieces either side of it;
    # note the quoted "23:59" and "00:00" (unquoted, 23:59 is the
    # integer sequence 23, 24, ..., 59, not a time)
    sum(y[, 1] %between% c(x[1], "23:59")) +
      sum(y[, 1] %between% c("00:00", x[2]))
  } else {
    sum(y[, 1] %between% c(x[1], x[2]))
  }
}, y = observations)
Output is:
From Till V1
1 14:00 19:59 4
2 20:00 05:59 7
3 06:00 13:59 2

Alter values in one data frame based on comparison values in another in R

I am trying to subtract one hour from date/times within a POSIXct column that are earlier than or equal to a time stated in a different comparison dataframe for that particular ID.
For example:
# create sample data
Time <- as.POSIXct(c("2015-10-02 08:00:00", "2015-11-02 11:00:00",
                     "2015-10-11 10:00:00", "2015-11-11 09:00:00",
                     "2015-10-24 08:00:00", "2015-10-27 08:00:00"),
                   format = "%Y-%m-%d %H:%M:%S")
ID <- c(01, 01, 02, 02, 03, 03)
data <- data.frame(Time, ID)
Which produces this:
Time ID
1 2015-10-02 08:00:00 1
2 2015-11-02 11:00:00 1
3 2015-10-11 10:00:00 2
4 2015-11-11 09:00:00 2
5 2015-10-24 08:00:00 3
6 2015-10-27 08:00:00 3
I then have another dataframe with a key date and time for each ID to compare against. The Time in data should be compared against Comparison in ComparisonData for the particular ID it is associated with. If the Time value in data is earlier than or equal to the comparison value, one hour should be subtracted from the value in data:
# create sample comparison data
Comparison <- as.POSIXct(c("2015-10-29 08:00:00", "2015-11-02 08:00:00",
                           "2015-10-26 08:30:00"),
                         format = "%Y-%m-%d %H:%M:%S")
ID <- c(01, 02, 03)
ComparisonData <- data.frame(Comparison, ID)
This should look like this:
Comparison ID
1 2015-10-29 08:00:00 1
2 2015-11-02 08:00:00 2
3 2015-10-26 08:30:00 3
In summary, the code should check all times of a certain ID to see if any are earlier than or equal to the value specified in ComparisonData and, if they are, subtract one hour. This should give the following data frame as output:
Time ID
1 2015-10-02 07:00:00 1
2 2015-11-02 11:00:00 1
3 2015-10-11 09:00:00 2
4 2015-11-11 09:00:00 2
5 2015-10-24 07:00:00 3
6 2015-10-27 08:00:00 3
I have looked at similar solutions such as this one, but I cannot work out how to make the comparison against the right time for that particular ID.
ddply seems a promising option, but I'm not sure how to use it for this particular problem.
Here's a quick and efficient solution using data.table. First we join the two data sets by ID, then modify only those Times that are earlier than or equal to Comparison:
library(data.table) # v1.9.6+
setDT(data)[ComparisonData, end := i.Comparison, on = "ID"]
data[Time <= end, Time := Time - 3600L][, end := NULL]
data
# Time ID
# 1: 2015-10-02 07:00:00 1
# 2: 2015-11-02 11:00:00 1
# 3: 2015-10-11 09:00:00 2
# 4: 2015-11-11 09:00:00 2
# 5: 2015-10-24 07:00:00 3
# 6: 2015-10-27 08:00:00 3
Alternatively, we could do this in one step while joining, using ifelse (though I'm not sure how efficient this is):
setDT(data)[ComparisonData,
            Time := ifelse(Time <= i.Comparison, Time - 3600L, Time),
            on = "ID"]
data
# Time ID
# 1: 2015-10-02 07:00:00 1
# 2: 2015-11-02 11:00:00 1
# 3: 2015-10-11 09:00:00 2
# 4: 2015-11-11 09:00:00 2
# 5: 2015-10-24 07:00:00 3
# 6: 2015-10-27 08:00:00 3
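A caveat worth adding (my note, not from the original answer): base ifelse() drops attributes, so in some settings the result loses its POSIXct class. If your data.table is recent enough (1.12.3+, where fifelse() was introduced), fifelse() preserves the class:
setDT(data)[ComparisonData,
            Time := fifelse(Time <= i.Comparison, Time - 3600, Time),
            on = "ID"]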
I am sure there is a better solution than this; however, I think this works.
for (i in 1:nrow(data)) {
  # row i of ComparisonData holds the comparison time for ID i in this example
  if (data$Time[i] <= ComparisonData[data$ID[i], 1]) {
    data$Time[i] <- data$Time[i] - 3600
  }
}
# Time ID
#1 2015-10-02 07:00:00 1
#2 2015-11-02 11:00:00 1
#3 2015-10-11 09:00:00 2
#4 2015-11-11 09:00:00 2
#5 2015-10-24 07:00:00 3
#6 2015-10-27 08:00:00 3
This iterates through every row in data.
ComparisonData[data$ID[i], 1] picks the comparison time in ComparisonData for the corresponding ID. If the Time column in data is earlier than or equal to it, the time is reduced by 1 hour.
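A vectorised version of the same idea (a sketch using base match(), so no row loop is needed):
idx <- match(data$ID, ComparisonData$ID)
earlier <- data$Time <= ComparisonData$Comparison[idx]
data$Time[earlier] <- data$Time[earlier] - 3600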

How to join 2 data.tables using a time interval and a group-by

I have a data.table of frequently collected data:
set.seed(1)
t1 <- seq(from=as.POSIXct('2014-1-1'), to=as.POSIXct('2014-6-1'), by='day')
T1 <- data.table(time1=t1, group=rep(c('A', 'B'), length(t1)/2), value1=rnorm(length(t1)))
and a data.table of infrequently collected data:
t2 <- seq(from=as.POSIXct('2014-1-1'), to=as.POSIXct('2014-6-1'), by='week')
T2 <- data.table(time2=t2, group=rep(c('A', 'B'), length(t2)/2), value2='ArbitraryText')
For each row of T2 I would like to find all of the rows in T1 that fall between T2$time2 and T2$time2 minus 1 week, then take the average of T1$value1, by T2$group.
So the number of rows in the resulting table would be exactly equal to the number of rows in T2, and the "correct" value that should be returned for the second row of T2 (the average of those T1$value1 that are in T1$group B and fall between Jan 1 and Jan 22) would look like this:
t2 group value1 value2
2014-01-22 00:00:00 B 0.1674069 "Arbitrary Text"
I imagine the first step would be setting the keys for each data.table:
setkey(T1, group, time1)
setkey(T2, group, time2)
I'm unsure how to proceed. Curiously, T1[T2[time1 %between% c(t2, t2 - 604800)]] yields only results between Jan 1 and Jan 8, despite the default mult='all'.
EDIT: I should point out that the intervals (T2$time2 minus 3 weeks to T2$time2) deliberately overlap each other. This means that each row of T1 "belongs" to more than one desired average, because it falls into the interval specified by more than one row of T2.
Try creating a grouping vector within T1 that is constructed using T2 breakpoints passed to the cut.POSIXt function:
T1[ , grp := cut(time1, breaks=T2[,time2]) ]
> str(T1)
Classes ‘data.table’ and 'data.frame': 151 obs. of 4 variables:
$ time1: POSIXct, format: "2014-01-01 00:00:00" "2014-01-02 00:00:00" "2014-01-03 00:00:00" ...
$ group: chr "A" "B" "A" "B" ...
$ value1: num -0.626 0.184 -0.836 1.595 0.33 ...
$ grp : Factor w/ 21 levels "2014-01-01 00:00:00",..: 1 1 1 1 1 1 1 2 2 2 ...
- attr(*, ".internal.selfref")=<externalptr>
#------------------
> T1[, mean(value1), by = "grp"]
#----------------
grp V1
1: 2014-01-01 00:00:00 0.04475859
2: 2014-01-08 00:00:00 0.01062880
3: 2014-01-15 00:00:00 0.62024902
4: 2014-01-22 00:00:00 -0.31364304
5: 2014-01-29 00:00:00 0.02178433
6: 2014-02-05 00:00:00 0.08238828
7: 2014-02-12 00:00:00 0.12544920
8: 2014-02-19 00:00:00 0.47033820
9: 2014-02-26 00:00:00 0.29648943
10: 2014-03-05 00:00:00 0.20856893
11: 2014-03-12 01:00:00 -0.28046960
12: 2014-03-19 01:00:00 -0.22334306
13: 2014-03-26 01:00:00 0.25434429
14: 2014-04-02 01:00:00 0.48056376
15: 2014-04-09 01:00:00 -0.52624880
16: 2014-04-16 01:00:00 0.62330703
17: 2014-04-23 01:00:00 0.01092562
18: 2014-04-30 01:00:00 0.12544150
19: 2014-05-07 01:00:00 -0.15919531
20: 2014-05-14 01:00:00 -0.61236195
21: 2014-05-21 01:00:00 -0.37797879
22: NA -0.61483084
grp V1
You don't get the same number of groups as events in T2 but rather that number minus one, plus an NA group for times after the last breakpoint. I didn't use setkey since my by call was on the constructed column; if it's only a one-time use, I'm not sure it's needed.
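Since the question also wants the averages split by T1$group, the same cut-based approach can simply group on both columns; a sketch:
T1[, grp := cut(time1, breaks = T2[, time2])]
T1[, .(avg = mean(value1)), by = .(group, grp)]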
