Calculate cumulative product based on date by category in R

I want to add a new column to my data.table containing the cumulative product of Data1 based on the Date. The cumulative product should be calculated for each category (Cat) and should start with the latest available Date.
Sample data:
DF = data.frame(Cat=rep(c("A","B"),each=4), Date=rep(c("01-08-2013","01-07-2013","01-04-2013","01-03-2013"),2), Data1=c(1:8))
DF$Date = as.Date(DF$Date , "%m-%d-%Y")
DT = data.table(DF)
DT[ , Data1_cum:=NA_real_]
DT
Cat Date Data1 Data1_cum
1: A 2013-01-08 1 NA
2: A 2013-01-07 2 NA
3: A 2013-01-04 3 NA
4: A 2013-01-03 4 NA
5: B 2013-01-08 5 NA
6: B 2013-01-07 6 NA
7: B 2013-01-04 7 NA
8: B 2013-01-03 8 NA
The result should look like this:
Cat Date Data1 Data1_cum
1: A 2013-01-08 1 1
2: A 2013-01-07 2 2
3: A 2013-01-04 3 6
4: A 2013-01-03 4 24
5: B 2013-01-08 5 5
6: B 2013-01-07 6 30
7: B 2013-01-04 7 210
8: B 2013-01-03 8 1680
I figured out that I could do something similar using cumprod(), but I do not know how to handle the categories. NAs in Data1 should be ignored / treated as 1.
The real dataset has about 8 million rows and 1000 categories.

If the only issue is the ordering...
DT[order(Date, decreasing=TRUE), Data1_cum := cumprod(Data1), by=Cat]
DT
Cat Date Data1 Data1_cum
1: A 2013-01-08 1 1
2: A 2013-01-07 2 2
3: A 2013-01-04 3 6
4: A 2013-01-03 4 24
5: B 2013-01-08 5 5
6: B 2013-01-07 6 30
7: B 2013-01-04 7 210
8: B 2013-01-03 8 1680
However, if you have NA's to deal with, then there are a few extra steps:
Note: if you shuffle the order of the rows, your results can vary. Be careful with how you implement the order(.) call.
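If reordering the table in place is acceptable, a deterministic alternative (a sketch using data.table's setorder, which sorts by reference) avoids that ambiguity entirely:
## sort by Cat, then by Date descending, by reference
setorder(DT, Cat, -Date)
DT[, Data1_cum := cumprod(Data1), by=Cat]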
## Let's add some NA values
DT <- rbind(DT, DT)
DT[c(2, 6, 11, 15), Data1 := NA]
# shuffle the rows, to make sure this is right
set.seed(1)
DT <- DT[sample(nrow(DT))]
Assigning the cumulative product:
Leaving NA's
## If you want to leave the NA's as NA's in the cum prod, use:
DT[ , Data1_cum := NA_real_ ]
DT[ intersect(order(Date, decreasing=TRUE), which(!is.na(Data1)))
, Data1_cum := cumprod(Data1)
, by=Cat]
# View the data, orderly
DT[order(Date, decreasing=TRUE)][order(Cat)]
Cat Date Data1 Data1_cum
1: A 2013-01-08 1 1
2: A 2013-01-08 1 1
3: A 2013-01-07 2 2
4: A 2013-01-07 NA NA <~~~~~~~ Note that the NA rows are left as NA
5: A 2013-01-04 3 6
6: A 2013-01-04 NA NA <~~~~~~~ Note that the NA rows are left as NA
7: A 2013-01-03 4 24
8: A 2013-01-03 4 96
9: B 2013-01-08 5 5
10: B 2013-01-08 5 25
11: B 2013-01-07 6 150
12: B 2013-01-07 NA NA <~~~~~~~ Note that the NA rows are left as NA
13: B 2013-01-04 7 1050
14: B 2013-01-04 NA NA <~~~~~~~ Note that the NA rows are left as NA
15: B 2013-01-03 8 8400
16: B 2013-01-03 8 67200
Treating NA's as 1 (rows with NA take on the value of the previous row)
## If instead you want to treat the NA's as 1, use:
DT[order(Date, decreasing=TRUE),
   Data1_cum := {Data1[is.na(Data1)] <- 1; cumprod(Data1)},
   by=Cat]
# View the data, orderly
DT[order(Date, decreasing=TRUE)][order(Cat)]
Cat Date Data1 Data1_cum
1: A 2013-01-08 1 1
2: A 2013-01-08 1 1
3: A 2013-01-07 2 2
4: A 2013-01-07 NA 2 <~~~~~~~ Rows with NA took on values of the previous Row
5: A 2013-01-04 3 6
6: A 2013-01-04 NA 6 <~~~~~~~ Rows with NA took on values of the previous Row
7: A 2013-01-03 4 24
8: A 2013-01-03 4 96
9: B 2013-01-08 5 5
10: B 2013-01-08 5 25
11: B 2013-01-07 6 150
12: B 2013-01-07 NA 150 <~~~~~~~ Rows with NA took on values of the previous Row
13: B 2013-01-04 7 1050
14: B 2013-01-04 NA 1050 <~~~~~~~ Rows with NA took on values of the previous Row
15: B 2013-01-03 8 8400
16: B 2013-01-03 8 67200
Alternatively, if you already have the cumulative product and simply want to fill the NA's with the previous row's value, you can do so as follows:
# fix the NA's with the previous value
DT[order(Date, decreasing=TRUE),
Data1_cum := {tmp <- c(0, head(Data1_cum, -1));
Data1_cum[is.na(Data1_cum)] <- tmp[is.na(Data1_cum)];
Data1_cum }
, by=Cat ]
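Note that this one-step lookback only fills isolated NA's; in a run of consecutive NA's the later ones stay NA. A sketch using data.table's nafill (available from version 1.12.4), which carries the last observation forward within each group, handles that case as well:
## fill NA's by last-observation-carried-forward within each Cat
DT[order(Date, decreasing=TRUE), Data1_cum := nafill(Data1_cum, type="locf"), by=Cat]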

Related

How to calculate moving average from previous rows in data.table?

I have data like this:
library(data.table)
set.seed(1)
df <- data.table(store = sample(LETTERS[1:2],size = 10,replace = T),
week = sample(1:10),
demand = round(sample(rnorm(10,mean = 20,sd=2)),2))
random_na_index <- sample(1:nrow(df),3)
df[random_na_index,demand := NA]
setorder(df,store,week)
store week demand
1: A 3 19.18
2: A 5 NA
3: A 6 NA
4: A 8 19.55
5: A 9 20.50
6: A 10 NA
7: B 1 20.75
8: B 2 17.70
9: B 4 19.40
10: B 7 17.52
I need to calculate a moving average using the 2 weeks before the current week. I couldn't do it because zoo's rolling functions and data.table's frollmean include the current row when calculating the moving average. I also don't know how to handle NA's while applying a rolling function.
The desired output should look like;
store week demand desired_column
1: A 3 19.18 NA
2: A 5 NA 19.180
3: A 6 NA 19.180
4: A 8 19.55 NA
5: A 9 20.50 19.550
6: A 10 NA 20.025
7: B 1 20.75 NA
8: B 2 17.70 20.750
9: B 4 19.40 19.225
10: B 7 17.52 18.550
You could shift the values before applying frollmean with the na.rm = TRUE argument:
df[order(store, week), desired := frollmean(shift(demand), n = 2, na.rm = TRUE), by = .(store)][]
store week demand desired
<char> <int> <num> <num>
1: A 3 19.18 NA
2: A 5 NA 19.180
3: A 6 NA 19.180
4: A 8 19.55 NaN
5: A 9 20.50 19.550
6: A 10 NA 20.025
7: B 1 20.75 NA
8: B 2 17.70 20.750
9: B 4 19.40 19.225
10: B 7 17.52 18.550
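One small difference from the desired output: store A, week 8 shows NaN rather than NA, because both shifted values in that window are NA and na.rm = TRUE leaves nothing to average. If a plain NA is preferred, it can be converted afterwards (a sketch):
df[is.nan(desired), desired := NA_real_]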

creating a unique variable based on row differences of another variable considering groups

By using the data below, I want to create a new unique customer id by considering their contact date.
Rule: a customer should get a new unique customer id whenever more than two days have passed since that customer's previous contact; if the next contact date for the same customer falls within the following two days, the id is preserved on that record, otherwise a new id is assigned.
I couldn't go any further than calculating date differences.
The original dataset I work is bigger; therefore, I prefer a data.table solution if possible.
library(data.table)
threshold <- 2
dt <- structure(list(customer_id = c('10','20','20','20','20','20','30','30','30','30','30','40','50','50'),
contact_date = as.Date(c("2019-01-05","2019-01-01","2019-01-01","2019-01-02",
"2019-01-08","2019-01-09","2019-02-02","2019-02-05",
"2019-02-05","2019-02-09","2019-02-12","2019-02-01",
"2019-02-01","2019-02-05")),
desired_output = c(1,2,2,2,3,3,4,5,5,6,7,8,9,10)),
class = "data.frame",
row.names = 1:14)
setDT(dt)
setorder(dt, customer_id, contact_date)
dt[, date_diff_in_days:=contact_date - shift(contact_date, type = c("lag")), by=customer_id]
dt[, date_diff_in_days:=as.numeric(date_diff_in_days)]
dt
customer_id contact_date desired_output date_diff_in_days
1: 10 2019-01-05 1 NA
2: 20 2019-01-01 2 NA
3: 20 2019-01-01 2 0
4: 20 2019-01-02 2 1
5: 20 2019-01-08 3 6
6: 20 2019-01-09 3 1
7: 30 2019-02-02 4 NA
8: 30 2019-02-05 5 3
9: 30 2019-02-05 5 0
10: 30 2019-02-09 6 4
11: 30 2019-02-12 7 3
12: 40 2019-02-01 8 NA
13: 50 2019-02-01 9 NA
14: 50 2019-02-05 10 4
Rule: a customer should get a new unique customer id whenever more than two days have passed since that customer's previous contact; if the next contact date for the same customer falls within the following two days, the id is preserved on that record, otherwise a new id is assigned.
When creating a new ID, if you set up the by= vectors correctly to capture the rule, the auto-counter .GRP can be used:
thresh <- 2
dt[, g := .GRP, by=.(
customer_id,
cumsum(contact_date - shift(contact_date, fill=first(contact_date)) > thresh)
)]
dt[, any(g != desired_output)]
# [1] FALSE
I think the code above is correct since it works on the example, but you might want to check it on your actual data (comparing against results from, e.g., Gregor's approach) to be sure.
We use cumsum to increment whenever date_diff_in_days is NA or when the threshold is exceeded.
dt[, result := cumsum(is.na(date_diff_in_days) | date_diff_in_days > threshold)]
# customer_id contact_date desired_output date_diff_in_days result
# 1: 10 2019-01-05 1 NA 1
# 2: 20 2019-01-01 2 NA 2
# 3: 20 2019-01-01 2 0 2
# 4: 20 2019-01-02 2 1 2
# 5: 20 2019-01-08 3 6 3
# 6: 20 2019-01-09 3 1 3
# 7: 30 2019-02-02 4 NA 4
# 8: 30 2019-02-05 5 3 5
# 9: 30 2019-02-05 5 0 5
# 10: 30 2019-02-09 6 4 6
# 11: 30 2019-02-12 7 3 7
# 12: 40 2019-02-01 8 NA 8
# 13: 50 2019-02-01 9 NA 9
# 14: 50 2019-02-05 10 4 10
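As a quick check (a sketch), the computed column can be compared against the expected one:
dt[, all(result == desired_output)]
# [1] TRUE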

How to calculate moving average by specified grouping and deal with NAs

I have a data.table which needs a moving average to be calculated on the previous n days of data (let's use n=2 for simplicity, not incl. current day) for a specified grouping (ID1, ID2). The moving average should attempt to include the last 2 days of values for each ID1-ID2 pair. I would like to calculate moving average to handle NAs two separate ways:
1. Only calculate when there are 2 non-NA observations, otherwise avg should be NA (e.g. first 2 days within an ID1-ID2 will always have NAs).
2. Calculate the moving average based on any non-NA observations within the last 2 days (na.rm=TRUE ?).
I've tried the zoo package and various functions within it. I've settled on the following (I used shift() to exclude the current observation from the average, and put the dates in reverse order to highlight that dates are not always ordered initially):
library(zoo)
library(data.table)
DATE = rev(rep(seq(as.Date("2018-01-01"),as.Date("2018-01-04"),"day"),4))
VALUE =seq(1,16,1)
VALUE[16] <- NA
ID1 = rep(c("A","B"),each=8)
ID2 = rep(1:2,2,each=4)
testdata = data.frame (DATE, ID1, ID2, VALUE)
setDT(testdata)[order(DATE), VALUE_AVG := shift(rollapplyr(VALUE, 2, mean,
na.rm=TRUE,fill = NA)), by = c("ID1", "ID2")]
I seem to have trouble grouping by multiple columns. Groupings where VALUE begins/ends with NA values also seem to cause issues. I'm open to any solutions which make sense within a data.table framework, especially frollmean (need to update my versions of R + data.table). I don't know if I need to order the dates differently in conjunction with a specified alignment (e.g. "right").
I would hope my output would look something like the following except ordered by oldest date first per ID1-ID2 grouping:
DATE ID1 ID2 VALUE VALUE_AVG
1: 2018-01-04 A 1 1 2.5
2: 2018-01-03 A 1 2 3.5
3: 2018-01-02 A 1 3 NA
4: 2018-01-01 A 1 4 NA
5: 2018-01-04 A 2 5 6.5
6: 2018-01-03 A 2 6 7.5
7: 2018-01-02 A 2 7 NA
8: 2018-01-01 A 2 8 NA
9: 2018-01-04 B 1 9 10.5
10: 2018-01-03 B 1 10 11.5
11: 2018-01-02 B 1 11 NA
12: 2018-01-01 B 1 12 NA
13: 2018-01-04 B 2 13 14.5
14: 2018-01-03 B 2 14 15.0
15: 2018-01-02 B 2 15 NA
16: 2018-01-01 B 2 NA NA
My code seems to roughly achieve the desired results for the sample data. Nevertheless, when trying to run the same code on large dataset for a 4-week average where ID1 and ID2 are both integers, I get the following error:
Error in seq.default(start.at, NROW(data), by = by) :
wrong sign in 'by' argument
My results seem right for most ID1-ID2 combinations but there are specific cases of ID1 where VALUE has leading and trailing NAs. I'm guessing this is causing the issue, although it hasn't for the example above.
Using shift complicates this unnecessarily; rollapply can already handle it by itself. In rollapplyr, specify:
a width of list(-seq(2)) to specify that it should act on offsets -1 and -2.
partial = TRUE to indicate that if there are fewer than 2 prior rows it will use whatever is there.
fill = NA to fill empty cells with NA
na.rm = TRUE to remove any NAs and only perform the mean on the remaining cells. If the prior cells are all NA then mean gives NaN.
To return NA unless there are 2 prior non-NA values, remove the partial = TRUE and na.rm = TRUE arguments.
First case
Take mean of non-NAs in prior 2 rows or fewer rows if fewer prior rows.
testdata <- data.table(DATE, ID1, ID2, VALUE, key = c("ID1", "ID2", "DATE"))
testdata[, VALUE_AVG :=
rollapplyr(VALUE, list(-seq(2)), mean, fill = NA, partial = TRUE, na.rm = TRUE),
by = c("ID1", "ID2")]
testdata
giving:
DATE ID1 ID2 VALUE VALUE_AVG
1: 2018-01-01 A 1 4 NA
2: 2018-01-02 A 1 3 4.0
3: 2018-01-03 A 1 2 3.5
4: 2018-01-04 A 1 1 2.5
5: 2018-01-01 A 2 8 NA
6: 2018-01-02 A 2 7 8.0
7: 2018-01-03 A 2 6 7.5
8: 2018-01-04 A 2 5 6.5
9: 2018-01-01 B 1 12 NA
10: 2018-01-02 B 1 11 12.0
11: 2018-01-03 B 1 10 11.5
12: 2018-01-04 B 1 9 10.5
13: 2018-01-01 B 2 NA NA
14: 2018-01-02 B 2 15 NaN
15: 2018-01-03 B 2 14 15.0
16: 2018-01-04 B 2 13 14.5
Second case
NA if any of the prior 2 rows are NA or if there are fewer than 2 prior rows.
testdata <- data.table(DATE, ID1, ID2, VALUE, key = c("ID1", "ID2", "DATE"))
testdata[, VALUE_AVG :=
rollapplyr(VALUE, list(-seq(2)), mean, fill = NA),
by = c("ID1", "ID2")]
testdata
giving:
DATE ID1 ID2 VALUE VALUE_AVG
1: 2018-01-01 A 1 4 NA
2: 2018-01-02 A 1 3 NA
3: 2018-01-03 A 1 2 3.5
4: 2018-01-04 A 1 1 2.5
5: 2018-01-01 A 2 8 NA
6: 2018-01-02 A 2 7 NA
7: 2018-01-03 A 2 6 7.5
8: 2018-01-04 A 2 5 6.5
9: 2018-01-01 B 1 12 NA
10: 2018-01-02 B 1 11 NA
11: 2018-01-03 B 1 10 11.5
12: 2018-01-04 B 1 9 10.5
13: 2018-01-01 B 2 NA NA
14: 2018-01-02 B 2 15 NA
15: 2018-01-03 B 2 14 NA
16: 2018-01-04 B 2 13 14.5
Maybe something like:
setorder(setDT(testdata), ID1, ID2, DATE)
testdata[order(DATE), VALUE_AVG := shift(
  rollapplyr(VALUE, 2L, function(x) if (sum(!is.na(x)) > 0L) mean(x, na.rm=TRUE) else NA_real_, fill = NA_real_)
), by = c("ID1", "ID2")]
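Since the OP mentions frollmean, here is a data.table-only sketch of the na.rm = TRUE variant (assuming data.table >= 1.12.0, where frollmean was introduced); shift() drops the current row from each window, and all-NA windows come back as NaN rather than NA:
setorder(setDT(testdata), ID1, ID2, DATE)
testdata[, VALUE_AVG := frollmean(shift(VALUE), n = 2, na.rm = TRUE),
         by = .(ID1, ID2)]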

Match dates from list of data frames in R

I have a list of 100+ time series dataframes my.list, with daily observations for each product in its own data frame. Some dates have no record at all. I would like to update each data frame in this list so that it shows every date, with NA where there is no record on that date.
Dates:
start = as.Date('2016/04/08')
full <- seq(start, by='1 days', length=10)
Sample Time Series Data:
d1 <- data.frame(Date = seq(start, by ='2 days',length=5), Sales = c(5,10,15,20,25))
d2 <- data.frame(Date = seq(start, by= '1 day', length=10),Sales = c(1, 2, 3,4,5,6,7,8,9,10))
my.list <- list(d1, d2)
I want to merge all full date values into each data frame, and if no match exists then sales is NA:
my.list
[[d1]]
Date Sales
2016-04-08 5
2016-04-09 NA
2016-04-10 10
2016-04-11 NA
2016-04-12 15
2016-04-13 NA
2016-04-14 20
2016-04-15 NA
2016-04-16 25
2016-04-17 NA
[[d2]]
Date Sales
2016-04-08 1
2016-04-09 2
2016-04-10 3
2016-04-11 4
2016-04-12 5
2016-04-13 6
2016-04-14 7
2016-04-15 8
2016-04-16 9
2016-04-17 10
If I understand correctly, the OP wants to update each of the dataframes in my.list to contain one row for each date given in the vector of dates full.
Base R
In base R, merge() can be used as already mentioned by Hack-R. However, the answer below expands this to work on all dataframes in the list:
# create a dataframe from the vector of full dates
full.df <- data.frame(Date = full)
# apply merge on each dataframe in the list
lapply(my.list, merge, y = full.df, all.y = TRUE)
[[1]]
Date Sales
1 2016-04-08 5
2 2016-04-09 NA
3 2016-04-10 10
4 2016-04-11 NA
5 2016-04-12 15
6 2016-04-13 NA
7 2016-04-14 20
8 2016-04-15 NA
9 2016-04-16 25
10 2016-04-17 NA
[[2]]
Date Sales
1 2016-04-08 1
2 2016-04-09 2
3 2016-04-10 3
4 2016-04-11 4
5 2016-04-12 5
6 2016-04-13 6
7 2016-04-14 7
8 2016-04-15 8
9 2016-04-16 9
10 2016-04-17 10
Caveat
The answer assumes that full covers the overall range of Date of all dataframes in the list.
In order to avoid any mishaps, the overall range of Date can be retrieved from the available data in my.list:
overall_date_range <- Reduce(range, lapply(my.list, function(x) range(x$Date)))
full <- seq(overall_date_range[1], overall_date_range[2], by = "1 days")
Using rbindlist()
Alternatively, the list of dataframes, which are identical in structure, can be stored in one large dataframe. An additional column indicates which product each row belongs to. The homogeneous structure simplifies subsequent operations.
The code below uses the rbindlist() function from the data.table package to create a large data.table. CJ() (cross join) creates all combinations of dates and product ids, which are then merged / joined to fill in the missing dates:
library(data.table)
all_products <- rbindlist(my.list, idcol = "product.id")[
CJ(product.id = unique(product.id), Date = seq(min(Date), max(Date), by = "1 day")),
on = .(Date, product.id)]
all_products
product.id Date Sales
1: 1 2016-04-08 5
2: 1 2016-04-09 NA
3: 1 2016-04-10 10
4: 1 2016-04-11 NA
5: 1 2016-04-12 15
6: 1 2016-04-13 NA
7: 1 2016-04-14 20
8: 1 2016-04-15 NA
9: 1 2016-04-16 25
10: 1 2016-04-17 NA
11: 2 2016-04-08 1
12: 2 2016-04-09 2
13: 2 2016-04-10 3
14: 2 2016-04-11 4
15: 2 2016-04-12 5
16: 2 2016-04-13 6
17: 2 2016-04-14 7
18: 2 2016-04-15 8
19: 2 2016-04-16 9
20: 2 2016-04-17 10
Subsequent operations can be grouped by product.id, e.g., to count the valid sales observations for each product:
all_products[!is.na(Sales), .(valid.sales.data = .N), by = product.id]
product.id valid.sales.data
1: 1 5
2: 2 10
Or, the total sales per product:
all_products[, .(total.sales = sum(Sales, na.rm = TRUE)), by = product.id]
product.id total.sales
1: 1 75
2: 2 55
If required, the result can be converted back to a list by
split(all_products, by = "product.id")

Rolling joins: roll forwards and backwards

data.table is awesome, because I can do rolling joins, and even do rolling joins within groups!
library(data.table)
set.seed(42)
metrics <- data.frame(
ID=c(rep(1, 10), rep(2,5), rep(3,5)),
Time=c(1:10, 4:8, 8:12),
val1=runif(20),
val2=runif(20),
val3=runif(20),
val4=runif(20)
)
metrics <- data.table(metrics[sample(1:nrow(metrics), 15),], key=c('ID', 'Time'))
calendar <- data.table(expand.grid(ID=1:3, Time=1:12), key=c('ID', 'Time'))
metrics[calendar,roll=TRUE]
However, this isn't awesome enough for me. This data.table still has NAs:
> metrics[calendar,roll=TRUE]
ID Time val1 val2 val3 val4
1: 1 1 0.9148060 0.9040314 0.3795592 0.675607275
2: 1 2 0.9370754 0.1387102 0.4357716 0.982817198
3: 1 3 0.9370754 0.1387102 0.4357716 0.982817198
4: 1 4 0.8304476 0.9466682 0.9735399 0.566488424
5: 1 5 0.8304476 0.9466682 0.9735399 0.566488424
6: 1 6 0.5190959 0.5142118 0.9575766 0.189473935
7: 1 7 0.7365883 0.3902035 0.8877549 0.271286615
8: 1 8 0.7365883 0.3902035 0.8877549 0.271286615
9: 1 9 0.6569923 0.4469696 0.9709666 0.693204820
10: 1 10 0.7050648 0.8360043 0.6188382 0.240544740
11: 1 11 0.7050648 0.8360043 0.6188382 0.240544740
12: 1 12 0.7050648 0.8360043 0.6188382 0.240544740
13: 2 1 NA NA NA NA
14: 2 2 NA NA NA NA
15: 2 3 NA NA NA NA
16: 2 4 0.4577418 0.7375956 0.3334272 0.042988796
17: 2 5 0.7191123 0.8110551 0.3467482 0.140479094
18: 2 6 0.9346722 0.3881083 0.3984854 0.216385415
19: 2 7 0.2554288 0.6851697 0.7846928 0.479398564
20: 2 8 0.2554288 0.6851697 0.7846928 0.479398564
21: 2 9 0.2554288 0.6851697 0.7846928 0.479398564
22: 2 10 0.2554288 0.6851697 0.7846928 0.479398564
23: 2 11 0.2554288 0.6851697 0.7846928 0.479398564
24: 2 12 0.2554288 0.6851697 0.7846928 0.479398564
25: 3 1 NA NA NA NA
26: 3 2 NA NA NA NA
27: 3 3 NA NA NA NA
28: 3 4 NA NA NA NA
29: 3 5 NA NA NA NA
30: 3 6 NA NA NA NA
31: 3 7 NA NA NA NA
32: 3 8 0.9400145 0.8329161 0.7487954 0.719355838
33: 3 9 0.9400145 0.8329161 0.7487954 0.719355838
34: 3 10 0.1174874 0.2076590 0.1712643 0.375489965
35: 3 11 0.4749971 0.9066014 0.2610880 0.514407708
36: 3 12 0.5603327 0.6117786 0.5144129 0.001570554
ID Time val1 val2 val3 val4
I could fill these NA's using zoo::na.locf with fromLast=TRUE, but that's not very fun. Can anyone think of an elegant way to roll NA's backward (after rolling them forward) during the data.table join?
This is possible as of data.table version 1.8.8, released March 2013:
metrics[calendar, roll=TRUE, rollends=c(TRUE, TRUE)]
From the data.table NEWS file:
In addition to TRUE/FALSE, 'roll' may now be a positive number (roll forwards/LOCF) or negative number (roll backwards/NOCB). A finite number limits the distance a value is rolled (limited staleness). roll=TRUE and roll=+Inf are equivalent.
'rollends' is a new parameter holding two logicals. The first observation is rolled backwards if the first value of rollends is TRUE. The last observation is rolled forwards if the second value of rollends is TRUE. If roll is a finite number, the same limit applies to the ends.
New value roll='nearest' joins to the nearest value (either backwards or forwards) when the value falls in a gap, and to the end value according to 'rollends'.
'rolltolast' has been deprecated. For backwards compatibility it is converted to {roll=TRUE; rollends=c(FALSE,FALSE)}.
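For example (a sketch of the options described above, using the metrics and calendar tables from the question):
metrics[calendar, roll = 2]          # LOCF, but a value goes at most 2 Time units stale
metrics[calendar, roll = -Inf]       # NOCB: roll backwards instead of forwards
metrics[calendar, roll = "nearest"]  # fill each gap from the nearest observation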
