R tsibble add support for custom index

Problem description
I work with thrice-monthly data a lot. Thrice-monthly (roughly every 10 days, also referred to as a dekad) is the typical reporting interval for water-related data in the former Soviet Union and for many more climate/water-related data sets around the world. Below is an exemplary data set with 2 variables:
library(lubridate)
library(tibble)
library(dplyr)

date <- unique(floor_date(seq.Date(as.Date("2019-01-01"), as.Date("2019-12-31"),
                                   by = "day"), "10days"))
example_data <- tibble(
  date = date[day(date) != 31],
  value = seq(1, 36, 1),
  var = "A") %>%
  add_row(tibble(
    date = date[day(date) != 31],
    value = seq(10, 360, 10),
    var = "B"))
example_data
# A tibble: 72 x 3
   date       value var
   <date>     <dbl> <chr>
 1 2019-01-01     1 A
 2 2019-01-01    10 B
 3 2019-01-11     2 A
 4 2019-01-11    20 B
 5 2019-01-21     3 A
 6 2019-01-21    30 B
 7 2019-02-01     4 A
 8 2019-02-01    40 B
 9 2019-02-11     5 A
10 2019-02-11    50 B
# … with 62 more rows
In the example I chose the 1st, 11th, and 21st to date the dekads, but it would actually be more appropriate to index them as dekad 1 to 3 per month (analogous to months 1 to 12 per year) or as dekad 1 to 36 per year (analogous to day of the year). The most elegant solution would be a proper date format for dekadal data, like tsibble's yearmonth. However, lubridate does not plan to support dekadal data in the near future (github conversation).
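For reference, computing such dekad indices from daily dates is straightforward; here is a minimal sketch with lubridate, matching the 1st/11th/21st convention above:
library(lubridate)
d <- as.Date(c("2019-01-01", "2019-01-11", "2019-02-21"))
dekad_of_month <- pmin((day(d) - 1) %/% 10 + 1, 3)     # 1, 2, 3 within each month
dekad_of_year <- (month(d) - 1) * 3 + dekad_of_month   # 1..36 within each year
dekad_of_year
#> [1] 1 2 6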
I have workflows using tsibble and timetk that work well with monthly data, but it would really be more appropriate to work with the original dekadal time steps, and I'm looking for a way to use the tidyverse functions on dekadal data with as few cumbersome workarounds as possible.
The problem with using daily dates for dekadal data in tsibble is that it identifies the time interval as daily, and you get a lot of data gaps between your 3 values per month:
> example_data_tsbl <- as_tsibble(example_data, index = date, key = var)
> count_gaps(example_data_tsbl, .full = FALSE)
# A tibble: 70 x 4
  var   .from      .to           .n
  <chr> <date>     <date>     <int>
1 A     2019-01-02 2019-01-10     9
2 A     2019-01-12 2019-01-20     9
3 A     2019-01-22 2019-01-31    10
# …
Here's what I did so far:
1. I saw here the possibility to define ordered factors as indices in tsibble, but timetk does not recognise factors as indices. timetk suggests defining custom indices (see 2.).
2. There is the possibility to add custom indices to tsibble, but I haven't found examples of this and I don't understand how to use these functions (a vignette is still planned). I have started reading the code to try to understand how to use the functions to get support for dekadal data, but I'm a bit overwhelmed.
Questions
Will dekadal custom indices in tsibble behave similarly to yearmonth or yearweek?
Would anyone here have an example to share on how to add custom indices to tsibble?
Or does anyone know of another way to elegantly handle dekadal data in the tidyverse?

This doesn't discuss tsibbles but it was too long for a comment and does provide an alternative.
zoo can do this either by (1) the code below, which does not require the creation of a new class, or (2) by creating a new class and methods. For that alternative, following the methods that the yearmon class has would be sufficient; see here. zoo itself does not have to be modified.
As we see below, with the first approach dates are shown as year(cycle) where cycle is 1, 2, ..., 36. Internally the dates are stored as year + (cycle-1)/36.
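For example, the third dekad of February 2019 is cycle 6, so it is stored as 2019 + 5/36; illustrated with the Date2dek helper defined below:
Date2dek(as.Date("2019-02-21"))
#> [1] 2019.139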
It would also be possible to use the ts class if the dates were consecutive month thirds (or, if not, if you don't mind having NAs inserted to make them so). For that use as.ts(z).
Start a fresh session with no packages loaded, then copy and paste the input DF shown in the Note at the end, followed by this code. Date2dek converts a Date vector, or a character vector representing dates in standard yyyy-mm-dd format, to the dek format described above. dek2Date performs the inverse transformation; it is not actually used below but might be useful.
library(zoo)

# convert Date or yyyy-mm-dd character vector to dek format
Date2dek <- function(x, ...) with(as.POSIXlt(x, tz = "GMT"),
  1900 + year + (mon + ((mday >= 11) + (mday >= 21)) / 3) / 12)

# inverse transformation: not used below but may be useful
dek2Date <- function(x, ...) {
  cyc <- round(36 * (as.numeric(x) %% 1)) + 1
  if(all(is.na(x))) return(as.Date(x))
  month <- (cyc - 1) %/% 3 + 1
  day <- 10 * ((cyc - 1) %% 3) + 1
  year <- floor(x + .001)
  ix <- !is.na(year)
  as.Date(paste(year[ix], month[ix], day[ix], sep = "-"))
}

# DF given in Note below
z <- read.zoo(DF, split = "var", FUN = Date2dek, regular = TRUE, freq = 36)
z
The result is the following zooreg object:
        A  B
2019(1) 1 10
2019(2) 2 20
2019(3) 3 30
2019(4) 4 40
2019(5) 5 50
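Since the dates here happen to be consecutive month thirds, the ts conversion mentioned above also works directly:
as.ts(z)
This returns a multivariate ts with frequency 36 (output not shown).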
Note
DF <- data.frame(
  date = as.Date(ISOdate(2019, rep(1:2, 3:2), c(1, 11, 21))),
  value = c(1:5, 10 * (1:5)),
  var = rep(c("A", "B"), each = 5))

Extending tsibble to support a new index requires defining methods for these generics:
index_valid() - This method should return TRUE if the class is acceptable as an index.
interval_pull() - This method accepts your index values and computes the interval of the data. The interval can be created using tsibble::new_interval(). You may find tsibble::gcd_interval() useful for computing the smallest interval.
seq() and + - These methods are used to produce future time values by the new_data() function.
A minimal example of a new tsibble index class for 'year' is as follows:
library(tsibble)
#>
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, union
library(vctrs)
# Object creation function
my_year <- function(x = integer()) {
  x <- vec_cast(x, integer())
  vctrs::new_vctr(x, class = "year")
}

# Declare this class as a valid index
index_valid.year <- function(x) TRUE

# Compute the interval of a year input
interval_pull.year <- function(x) {
  tsibble::new_interval(
    year = tsibble::gcd_interval(vec_data(x))
  )
}

# Specify how sequences are generated from years
seq.year <- function(from, to, by, length.out = NULL, along.with = NULL, ...) {
  from <- vec_data(from)
  if (!rlang::is_missing(to)) {
    vec_assert(to, my_year())
    to <- vec_data(to)
  }
  my_year(NextMethod())
}

# Define `+` operation as needed for `new_data()`
vec_arith.year <- function(op, x, y, ...) {
  my_year(vec_arith(op, vec_data(x), vec_data(y), ...))
}

# Use the new index class
x <- tsibble::tsibble(
  year = my_year(c(2018, 2020, 2024)),
  y = rnorm(3),
  index = "year"
)
x
#> # A tsibble: 3 x 2 [2Y]
#> year y
#> <year> <dbl>
#> 1 2018 0.211
#> 2 2020 -0.410
#> 3 2024 0.333
interval(x)
#> <interval[1]>
#> [1] 2Y
new_data(x, 3)
#> # A tsibble: 3 x 1 [2Y]
#> year
#> <year>
#> 1 2026
#> 2 2028
#> 3 2030
Created on 2021-02-08 by the reprex package (v0.3.0)
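The original question asked about dekads. Purely as an untested sketch following the same pattern: my_dekad(), dekad_from_date(), and the display format are my own inventions, and since tsibble has no built-in dekad unit, the interval below is approximated as a multiple of 10 days. A full implementation would need more polish.
library(tsibble)
library(vctrs)

# Hypothetical dekad index: whole dekads since 1970-01-01, 3 per month
my_dekad <- function(x = integer()) {
  x <- vec_cast(x, integer())
  vctrs::new_vctr(x, class = "dekad")
}

# Convert daily dates: days 1-10 -> dekad 1, 11-20 -> 2, 21-31 -> 3
dekad_from_date <- function(date) {
  lt <- as.POSIXlt(date)
  my_dekad((lt$year - 70L) * 36L + lt$mon * 3L +
             pmin((lt$mday - 1L) %/% 10L, 2L))
}

# Display as e.g. "2019 D04", the 4th dekad of 2019
format.dekad <- function(x, ...) {
  d <- vec_data(x)
  sprintf("%d D%02d", 1970L + d %/% 36L, d %% 36L + 1L)
}

index_valid.dekad <- function(x) TRUE

interval_pull.dekad <- function(x) {
  # tsibble has no dekad unit, so approximate one dekad as 10 days
  tsibble::new_interval(day = 10 * tsibble::gcd_interval(vec_data(x)))
}

seq.dekad <- function(from, to, by, length.out = NULL, along.with = NULL, ...) {
  from <- vec_data(from)
  if (!rlang::is_missing(to)) to <- vec_data(to)
  my_dekad(NextMethod())
}

vec_arith.dekad <- function(op, x, y, ...) {
  my_dekad(vec_arith(op, vec_data(x), vec_data(y), ...))
}

# Usage on the question's dates
dekad_from_date(as.Date(c("2019-01-01", "2019-01-11", "2019-01-21")))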

Related

R: how to let the user subset a certain period of data through readline?

How can I let the user (in the terminal) choose a certain period (e.g. 2005-2009) from the year column and subset the data through this filter, using readline() and perhaps also the menu() function?
df <- data.frame(year = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010),
                 sale = c(11, 12, 9, 9, 4, 12, 18, 36, 21, 30, 44))
Here is one way to do it with readline and strsplit:
df <- data.frame(year = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010),
                 sale = c(11, 12, 9, 9, 4, 12, 18, 36, 21, 30, 44))
x <- readline("choose time span (e.g. '2001-2003'):")
# Enter: 2001-2003
x1 <- as.numeric(unlist(strsplit(x, "-")))
subset(df, year >= x1[1] & year <= x1[2])
#> year sale
#> 2 2001 12
#> 3 2002 9
#> 4 2003 9
Created on 2022-05-24 by the reprex package (v2.0.1)
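Since the question also mentions menu(), here is a minimal sketch of a menu-driven variant; the candidate periods offered are made up for illustration:
periods <- c("2001-2003", "2005-2009")
choice <- menu(periods, title = "Choose a time span:")  # interactive prompt
x1 <- as.numeric(unlist(strsplit(periods[choice], "-")))
subset(df, year >= x1[1] & year <= x1[2])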

Simple Moving Average Column-Wise in R

So I clean revenue data every quarter, and I need to use a two-quarter moving average to predict the next five years of quarterly revenue for each individual product (I know this will just end up being the same average for now). Attached here is the data frame: Revenue Df
Right now I have the data in wide format, and you'll see I created the empty forecasting columns by having the user enter a start and end date for the forecast; it then creates the columns for every quarter in between. How can I fill these forecast columns using a moving average? I also converted the data to long format and still could not figure out how to fill the forecast. Also, I know 9-30-2020 shows up in the forecast; we want to replace that with the actuals even if the user inputs that date for the forecast.
# the attempt that did not work (note: `for(i in ncol(Revenue_df))` iterates
# over the single value ncol(Revenue_df), not over all columns)
for(i in ncol(Revenue_df)){
  if(i < 3) {
    Revenue_df[,i] <- Revenue_df[,i]
  } else {
    Revenue_df[,i] <- (Revenue_df[,i-1] + Revenue_df[,i-2])/2
  }
}
Product <- c("a","b","c","d","e")
Revenue.3_30_2020 <- c(50,40,30,20,10)
Revenue.6_30_2020 <- c(50,45,28,19,17)
Revenue.9_30_2020 <- c(25,20,22,17,24)
revenue <- data.frame(Product, Revenue.3_30_2020, Revenue.6_30_2020, Revenue.9_30_2020)
forecast.sequence <- c("2020-09-30","2020-12-31","2021-03-31","2021-06-30","2021-09-30",
                       "2021-12-31","2022-03-31","2022-06-30","2022-09-30","2022-12-31",
                       "2023-03-31","2023-06-30","2023-09-30","2023-12-31","2024-03-31",
                       "2024-06-30","2024-09-30","2024-12-31")
forecast.sequence.amount <- paste("FC.Amount.", forecast.sequence)
revenue[, forecast.sequence.amount] <- NA
I tried this code and it did not work; any suggestions? Also attached is the code for the sample data frame shown in the picture. Sorry for the bad format; this is my second time asking a question on here.
This seems a bit simple for a product forecast. You might want to look at the forecast and fable packages for forecast functions that can account for trends and seasonality (a rough fable sketch follows at the end of this answer). These would, however, require more than two data points. Anyway, taking your problem as given, the following code seems to do what you describe.
EDIT
I've made the forecast calculation a function to make it more straightforward to use.
library(tidyverse)

product <- c("a","b","c","d","e")
Revenue.3_30_2020 <- c(50,40,30,20,10)
Revenue.6_30_2020 <- c(50,45,28,19,17)
Revenue.9_30_2020 <- c(25,20,22,17,24)
revenue <- data.frame(Product = product, Revenue.3_30_2020, Revenue.6_30_2020, Revenue.9_30_2020)

rev_frcst <- function(revenue, frcst_end, frcst_prefix) {
  #
  # Arguments:
  #   revenue = data frame with
  #     Product containing product name
  #     columns with the format "prefix.m_day_year" containing product
  #     quantities for past quarters
  #   frcst_end = end date for quarterly forecast
  #   frcst_prefix = string containing prefix for forecast
  #
  # convert revenue to long format
  #
  rev_long <- revenue %>%
    pivot_longer(cols = -Product, names_to = "Quarter", values_to = "Revenue") %>%
    mutate(quarter_end = as.Date(str_remove(Quarter, "Revenue."), "%m_%d_%Y"))
  num_revenue <- nrow(rev_long) / length(unique(revenue$Product))
  #
  # generate forecast dates
  #
  forecast.sequence <- seq(max(rev_long$quarter_end),
                           as.Date(frcst_end),
                           by = "quarter")[-1]
  #
  # Add forecast rows to data
  #
  rev_long <- rev_long %>%
    bind_rows(expand_grid(Product = unique(revenue$Product),
                          quarter_end = forecast.sequence) %>%
                mutate(Quarter = paste(frcst_prefix, quarter_end)))
  #
  # Define moving average function; note that `1:num_frcst + 2` is
  # (1:num_frcst) + 2, i.e. positions 3 onwards, past the two seed values
  #
  mov_avg <- function(num_frcst, x) {
    y <- c(x, numeric(num_frcst))
    for(i in 1:num_frcst + 2) {
      y[i] <- .5 * (y[i-1] + y[i-2])
    }
    y[1:num_frcst + 2]
  }
  #
  # Calculate forecast
  #
  rev_long %>% group_by(Product) %>%
    mutate(forecast = c(Revenue[1:num_revenue],
                        mov_avg(num_frcst = length(forecast.sequence),
                                x = Revenue[1:2 + num_revenue - 2]))) %>%
    arrange(Product, quarter_end)
}
#
# call rev_frcst to calculate forecast
#
rev_forecast <- rev_frcst(revenue = revenue,
                          frcst_end = "2024-12-31",
                          frcst_prefix = "FC.Amount.")
which gives
   Product Quarter                Revenue quarter_end forecast
   <chr>   <chr>                    <dbl> <date>         <dbl>
 1 a       Revenue.3_30_2020           50 2020-03-30      50
 2 a       Revenue.6_30_2020           50 2020-06-30      50
 3 a       Revenue.9_30_2020           25 2020-09-30      25
 4 a       FC.Amount. 2020-12-30       NA 2020-12-30      37.5
 5 a       FC.Amount. 2021-03-30       NA 2021-03-30      31.2
 6 a       FC.Amount. 2021-06-30       NA 2021-06-30      34.4
 7 a       FC.Amount. 2021-09-30       NA 2021-09-30      32.8
 8 a       FC.Amount. 2021-12-30       NA 2021-12-30      33.6
 9 a       FC.Amount. 2022-03-30       NA 2022-03-30      33.2
10 a       FC.Amount. 2022-06-30       NA 2022-06-30      33.4
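As an aside, if more quarters of history were available, a model-based version with fable might look roughly like this. This is a sketch only: rev_long refers to the long-format data built inside the function above, and ETS() is just one candidate model.
library(fable)
library(tsibble)
library(dplyr)
# convert the long data to a quarterly tsibble, keyed by product
rev_ts <- rev_long %>%
  filter(!is.na(Revenue)) %>%
  mutate(Quarter = yearquarter(quarter_end)) %>%
  as_tsibble(index = Quarter, key = Product)
# fit an exponential smoothing model per product and forecast ahead
rev_ts %>%
  model(ets = ETS(Revenue)) %>%
  forecast(h = "4 years")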

Merge overlapping time periods with milliseconds in R

I'm trying to find a way of merging overlapping time intervals that can deal with milliseconds.
Three potential options have been posted here:
How to flatten / merge overlapping time periods
However, I don't need to group by ID, and so am finding the dplyr and data.table methods confusing (I'm not sure whether they can deal with milliseconds, as I can't get them to work).
I have managed to get the IRanges solution working, but it converts the POSIXct objects to numeric values to calculate the overlaps. So I'm assuming this is why milliseconds are absent from the output?
The lack of milliseconds doesn't seem to be a display issue, as when I subtract the resulting start and end times, I get integer results in seconds.
Here's a sample of my data:
start <- c("2019-07-15 21:32:43.565",
"2019-07-15 21:32:43.634",
"2019-07-15 21:32:54.301",
"2019-07-15 21:34:08.506",
"2019-07-15 21:34:09.957")
end <- c("2019-07-15 21:32:48.445",
"2019-07-15 21:32:49.045",
"2019-07-15 21:32:54.801",
"2019-07-15 21:34:10.111",
"2019-07-15 21:34:10.236")
df <- data.frame(start, end)
The output I get from the IRanges solution:
start end
1 2019-07-15 21:32:43 2019-07-15 21:32:49
2 2019-07-15 21:32:54 2019-07-15 21:32:54
3 2019-07-15 21:34:08 2019-07-15 21:34:10
And the desired result:
start end
1 2019-07-15 21:32:43.565 2019-07-15 21:32:49.045
2 2019-07-15 21:32:54.301 2019-07-15 21:32:54.801
3 2019-07-15 21:34:08.506 2019-07-15 21:34:10.236
Suggestions would be very much appreciated!
I've found it is quite easy to preserve milliseconds if you use the POSIXlt format. Although there are faster ways to calculate the overlap, it's fast enough for most purposes to just loop through the data frame.
Here's a reproducible example.
start <- c("2019-07-15 21:32:43.565",
"2019-07-15 21:32:43.634",
"2019-07-15 21:32:54.301",
"2019-07-15 21:34:08.506",
"2019-07-15 21:34:09.957")
end <- c("2019-07-15 21:32:48.445",
"2019-07-15 21:32:49.045",
"2019-07-15 21:32:54.801",
"2019-07-15 21:34:10.111",
"2019-07-15 21:34:10.236")
df <- data.frame(start = as.POSIXlt(start), end = as.POSIXlt(end))
i <- 1
df <- data.frame(start = as.POSIXlt(start), end = as.POSIXlt(end))
while(i < nrow(df)) {
  # rows whose interval overlaps interval i
  overlaps <- which(df$start < df$end[i] & df$end > df$start[i])
  if(length(overlaps) > 1) {
    # extend interval i to cover the overlaps, then drop the other rows
    df$end[i] <- max(df$end[overlaps])
    df <- df[-overlaps[-which(overlaps == i)], ]
    i <- i - 1
  }
  i <- i + 1
}
So now our data frame doesn't have overlaps:
df
#> start end
#> 1 2019-07-15 21:32:43 2019-07-15 21:32:49
#> 3 2019-07-15 21:32:54 2019-07-15 21:32:54
#> 4 2019-07-15 21:34:08 2019-07-15 21:34:10
Although it appears we have lost the milliseconds, this is just a display issue, as we can show by doing this:
df$end - df$start
#> Time differences in secs
#> [1] 5.48 0.50 1.73
as.numeric(df$end - df$start)
#> [1] 5.48 0.50 1.73
Created on 2020-02-20 by the reprex package (v0.3.0)
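A side note on the display issue: the stored fractional seconds can be printed by raising R's sub-second digits option:
options(digits.secs = 3)
df  # now prints fractional seconds (the last digit is subject to floating point caveats)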
I think the best thing to do here is to use the clock package (for a true sub-second precision date-time type) along with the ivs package (for merging overlapping intervals).
Using POSIXct for sub-second date-times can be a bit challenging for various reasons, which I've talked about here.
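One concrete illustration of those challenges (the exact last digit can depend on platform and R version): POSIXct stores times as binary doubles, and fractional-second formatting has historically truncated rather than rounded, so a parsed millisecond value can print one off:
options(digits.secs = 3)
as.POSIXct("2019-07-15 21:32:43.565", tz = "UTC")
#> [1] "2019-07-15 21:32:43.564 UTC"  # may print .564 rather than .565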
The key here is iv_groups(), which merges all overlapping intervals and returns the intervals that remain after all of the overlaps have been merged. It is also backed by a C implementation that is very fast.
library(clock)
library(ivs)
library(dplyr)
df <- tibble(
  start = c(
    "2019-07-15 21:32:43.565", "2019-07-15 21:32:43.634",
    "2019-07-15 21:32:54.301", "2019-07-15 21:34:08.506",
    "2019-07-15 21:34:09.957"
  ),
  end = c(
    "2019-07-15 21:32:48.445", "2019-07-15 21:32:49.045",
    "2019-07-15 21:32:54.801", "2019-07-15 21:34:10.111",
    "2019-07-15 21:34:10.236"
  )
)
# Parse into "naive time" (i.e. with a yet-to-be-defined time zone)
# using a millisecond precision
df <- df %>%
  mutate(
    start = naive_time_parse(start, format = "%Y-%m-%d %H:%M:%S", precision = "millisecond"),
    end = naive_time_parse(end, format = "%Y-%m-%d %H:%M:%S", precision = "millisecond")
  )
df
#> # A tibble: 5 × 2
#> start end
#> <tp<naive><milli>> <tp<naive><milli>>
#> 1 2019-07-15T21:32:43.565 2019-07-15T21:32:48.445
#> 2 2019-07-15T21:32:43.634 2019-07-15T21:32:49.045
#> 3 2019-07-15T21:32:54.301 2019-07-15T21:32:54.801
#> 4 2019-07-15T21:34:08.506 2019-07-15T21:34:10.111
#> 5 2019-07-15T21:34:09.957 2019-07-15T21:34:10.236
# Now combine these start/end boundaries into a single interval vector
df <- df %>%
mutate(interval = iv(start, end), .keep = "unused")
df
#> # A tibble: 5 × 1
#> interval
#> <iv<tp<naive><milli>>>
#> 1 [2019-07-15T21:32:43.565, 2019-07-15T21:32:48.445)
#> 2 [2019-07-15T21:32:43.634, 2019-07-15T21:32:49.045)
#> 3 [2019-07-15T21:32:54.301, 2019-07-15T21:32:54.801)
#> 4 [2019-07-15T21:34:08.506, 2019-07-15T21:34:10.111)
#> 5 [2019-07-15T21:34:09.957, 2019-07-15T21:34:10.236)
# And use `iv_groups()` to merge all overlapping intervals.
# It returns the remaining intervals after all overlaps have been removed.
df %>%
summarise(interval = iv_groups(interval))
#> # A tibble: 3 × 1
#> interval
#> <iv<tp<naive><milli>>>
#> 1 [2019-07-15T21:32:43.565, 2019-07-15T21:32:49.045)
#> 2 [2019-07-15T21:32:54.301, 2019-07-15T21:32:54.801)
#> 3 [2019-07-15T21:34:08.506, 2019-07-15T21:34:10.236)
Created on 2022-04-05 by the reprex package (v2.0.1)

How do I get the corresponding cell value of the result in R

I am trying to get the corresponding value of a cell in R but am unable to do so. My df has basically 2 columns, Date and Price, for a set of 5 observations. I want to know at which date the price was the maximum.
I wrote the code below, but it only shows Date:
HH <- max(df$price, show = "Date")
HH
[1] Date
I suggest something like:
df$date[df$price == max(df$price)]
You might read this as: show me the value of df$date such that the value of df$price is the maximal value in the price column. Use the $ operator to select a column and read the [ and ] as 'such that'. Also note that == is not =: == means 'is equal to', while = (or <-) assigns a value to a variable. Your answer should be the date at which the price was maximal.
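A worked illustration with made-up numbers, which also shows how this differs from which.max() (used in the next answer) when there are ties:
dd <- data.frame(date = c("2021-01-01", "2021-01-02", "2021-01-03"),
                 price = c(5, 9, 9))
dd$date[dd$price == max(dd$price)]  # all rows tied at the maximum
#> [1] "2021-01-02" "2021-01-03"
dd$date[which.max(dd$price)]        # first maximum only
#> [1] "2021-01-02"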
I think this is what you want; which.max gives the index of the maximum value in a vector.
df <- data.frame(date = 1:5, price = 6:10)
df
#> date price
#> 1 1 6
#> 2 2 7
#> 3 3 8
#> 4 4 9
#> 5 5 10
df$date[which.max(df$price)]
#> [1] 5
Created on 2018-12-10 by the reprex package (v0.2.0).
I'm not sure if this is the quickest way, but it can be done in dplyr:
library(tidyverse)
test <- data.frame(date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03",
                                    "2018-01-04", "2018-01-05")),
                   price = c(20, 35, 21, 39, 40))
answer <- test %>%
filter(price == max(test$price))
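As an addendum: with dplyr 1.0 or later there is also a dedicated helper that does the same thing (up to tie handling):
test %>% slice_max(price, n = 1)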

How to get sum of values every 8 days by date in data frame in R

I don't often have to work with dates in R, but I imagine this is fairly easy. I have daily data, as below, for several years with some values, and I want to get the sum of the values for each 8-day period. What is the best approach?
Any help you can provide will be greatly appreciated!
str(temp)
'data.frame': 648 obs. of 2 variables:
 $ Date : Factor w/ 648 levels "2001-03-24","2001-03-25",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ conv2: num -3.93 -6.44 -5.48 -6.09 -7.46 ...
head(temp)
Date amount
24/03/2001 -3.927020472
25/03/2001 -6.4427004
26/03/2001 -5.477592528
27/03/2001 -6.09462162
28/03/2001 -7.45666902
29/03/2001 -6.731540928
30/03/2001 -6.855206184
31/03/2001 -6.807210228
1/04/2001 -5.40278802
I tried to use the aggregate function, but for some reason it doesn't work and aggregates in the wrong way:
z <- aggregate(amount ~ Date, timeSequence(from =as.Date("2001-03-24"),to =as.Date("2001-03-29"), by="day"),data=temp,FUN=sum)
I prefer the package xts for such manipulations.
I read your data as a zoo object; note the flexibility of the format option.
library(xts)
ts.dat <- read.zoo(text = 'Date amount
24/03/2001 -3.927020472
25/03/2001 -6.4427004
26/03/2001 -5.477592528
27/03/2001 -6.09462162
28/03/2001 -7.45666902
29/03/2001 -6.731540928
30/03/2001 -6.855206184
31/03/2001 -6.807210228
1/04/2001 -5.40278802', header = TRUE, format = '%d/%m/%Y')
Then I extract the indices of the given period:
ep <- endpoints(ts.dat,'days',k=8)
Finally, I apply my function to the time series at each index:
period.apply(x=ts.dat,ep,FUN=sum )
2001-03-29 2001-04-01
-36.13014 -19.06520
Use cut() in your aggregate() command.
Some sample data:
set.seed(1)
mydf <- data.frame(
  DATE = seq(as.Date("2000/1/1"), by = "day", length.out = 365),
  VALS = runif(365, -5, 5))
Now, the aggregation. See ?cut.Date for details. You can specify the number of days you want in each group using cut:
output <- aggregate(VALS ~ cut(DATE, "8 days"), mydf, sum)
list(head(output), tail(output))
# [[1]]
# cut(DATE, "8 days") VALS
# 1 2000-01-01 8.242384
# 2 2000-01-09 -5.879011
# 3 2000-01-17 7.910816
# 4 2000-01-25 -6.592012
# 5 2000-02-02 2.127678
# 6 2000-02-10 6.236126
#
# [[2]]
# cut(DATE, "8 days") VALS
# 41 2000-11-16 17.8199285
# 42 2000-11-24 -0.3772209
# 43 2000-12-02 2.4406024
# 44 2000-12-10 -7.6894484
# 45 2000-12-18 7.5528077
# 46 2000-12-26 -3.5631950
rollapply. The zoo package has a rolling-apply function which can also do non-rolling aggregations. First convert the temp data frame into zoo using read.zoo, like this:
library(zoo)
zz <- read.zoo(temp)
and then it's just:
rollapply(zz, 8, sum, by = 8)
Drop the by = 8 if you want a rolling total instead.
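Side by side, the two variants:
rollapply(zz, 8, sum, by = 8)  # non-overlapping 8-day blocks, as above
rollapply(zz, 8, sum)          # rolling 8-day total at every position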
(Note that the two versions of temp in your question are not the same. They have different column headings and the Date columns are in different formats. I have assumed the str(temp) output version here. For the head(temp) version one would have to add a format = "%d/%m/%Y" argument to read.zoo.)
aggregate. Here is a solution that does not use any external packages. It uses aggregate based on the original data frame.
ix <- 8 * ((1:nrow(temp) - 1) %/% 8 + 1)
aggregate(temp[2], list(period = temp[ix, 1]), sum)
Note that ix looks like this:
> ix
[1] 8 8 8 8 8 8 8 8 16
so it groups the indices of the first 8 rows, the second 8 and so on.
Those are NOT Date-classed variables. (No self-respecting program would display a date like that, not to mention that they are labeled as factors.) [I later noticed these were not the same objects.] Furthermore, the timeSequence function (at least the one in the timeDate package) does not return a Date-class vector either. So the expectation that there would be a "right way" for two disparate non-Date objects to be aligned in a sensible manner is ill-conceived. The irony is that just using the temp$Date column would have worked, since:
> z <- aggregate(amount ~ Date, data=temp , FUN=sum)
> z
Date amount
1 1/04/2001 -5.402788
2 24/03/2001 -3.927020
3 25/03/2001 -6.442700
4 26/03/2001 -5.477593
5 27/03/2001 -6.094622
6 28/03/2001 -7.456669
7 29/03/2001 -6.731541
8 30/03/2001 -6.855206
9 31/03/2001 -6.807210
But to get it in 8 day intervals use cut.Date:
> z <- aggregate(temp$amount,
                 list(Dts = cut(as.Date(temp$Date, format = "%d/%m/%Y"),
                                breaks = "8 day")), FUN = sum)
> z
Dts x
1 2001-03-24 -49.792561
2 2001-04-01 -5.402788
A cleaner approach, extending @G. Grothendieck's approach. Note: it does not take into account whether the dates are continuous or discontinuous; the sum is calculated based on the fixed width.
code
library(zoo)

interval <- 8            # desired date interval: 2 days, 3 days, or whatever
enddate <- interval - 1  # offset to the end date of each block
z <- aggregate(. ~ V1, data = df, sum)  # aggregate sum of all duplicate dates
z$V1 <- as.Date(z$V1)
nrows <- nrow(z)         # must be computed after z is created
data.frame(Start.date = z[seq(1, nrows, interval), 1],
           End.date = z[seq(1, nrows, interval) + enddate, 1],
           Total.sum = rollapply(z$V2, interval, sum, by = interval, partial = TRUE))
output
Start.date End.date Total.sum
1 2000-01-01 2000-01-08 9.1395926
2 2000-01-09 2000-01-16 15.0343960
3 2000-01-17 2000-01-24 4.0974712
4 2000-01-25 2000-02-01 4.1102645
5 2000-02-02 2000-02-09 -11.5816277
data
df <- data.frame(
  V1 = seq(as.Date("2000/1/1"), by = "day", length.out = 365),
  V2 = runif(365, -5, 5))
