R xts: apply over a rolling window - r

I wish to execute a function FUN over a rolling window of 1 year. My xts does not have the same number of points in each year. How can I do that in an efficient way?
P.S. usually to execute a FUN over a fixed number of datapoints (for instance 100) I use:
as.xts(rollapply(data = zoo(indicator), FUN = FUN, width = 100, align = "right"))
but obviously this doesn't work if there are not always the same number of points per year.

I'll try to answer my own question: One way to do that is:
First to NA-pad the time series so that there is one datapoint per day (or any unit relevant for your case),
(optional, depending on your FUN) Then to use na.locf to carry over the last data to fill the holes.
Finally to use the usual rollapply as shown in the question, over a fixed number of datapoints that corresponds to 1 year.
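The three steps above can be sketched as follows. This is a minimal illustration on a zoo series rather than the poster's actual data: the dates, values, and the 3-day width are made up to keep the output small, and for a 1-year window on daily data the width would be something like 365.

```r
library(zoo)

# Irregular series: 5 observations spread over 10 days (made-up data)
dates <- as.Date("2020-01-01") + c(0, 1, 4, 5, 9)
z <- zoo(c(1, 2, 3, 4, 5), dates)

# Step 1: NA-pad onto a regular daily grid by merging with an empty zoo series
grid <- seq(start(z), end(z), by = "day")
padded <- merge(z, zoo(, grid))

# Step 2 (optional, depends on FUN): carry the last observation forward
filled <- na.locf(padded)

# Step 3: a fixed number of points now corresponds to a fixed time span
res <- rollapply(filled, width = 3, FUN = mean, align = "right")
```

If the rest of the workflow expects xts, the result can be wrapped with as.xts(res) as in the question.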

You can use the apply.yearly(x, FUN, ...) function from the xts package. Note that it applies FUN to each calendar year separately rather than to a sliding 1-year window.
dat <- xts(rnorm(1000), order.by = as.Date(1:1000))
plot(dat)
apply.yearly(dat, mean)

Related

Using mutate ifelse and rollapply to create a conditional factor based on changes in continuous variable

tl;dr I need to condition if a promotion was on or not based upon drops(or not) in price over time. I am open to alternative approaches.
I have a data frame of prices split across several grouping factors over time. My goal is, for each 'Item' in each 'Store', to check the mode of the 'Price' for the past 7 dates (if they exist). If the value of the observation is more than 10% below the mode of the price, then the 'Promotion' column should be populated with a 1, if not a 0.
EXAMPLE DATA
dat <- data.frame(Date = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 10),
Item = rep(LETTERS[1:4], times = 10),
Store = as.factor(sample(rep(c("NY","SYD","LON","PAR"), each = 10))),
Price = rnorm(n = 40, mean = 2.5, sd = 1))
So far I have used dplyr's group_split to break out item and store groupings into separate data frames to capture all the conditions. What I believe I need to do now is mutate the new column using an ifelse statement with rollapply. I have so far attempted to use the following line of code...
data %>% mutate(Promotion = ifelse(rollapply(Price, 7, Mode <= Price*0.91,1,0)))
this returns an error statement...
Error: Problem with `mutate()` input `PRMT_IND2`.
x comparison (5) is possible only for atomic and list types
i Input `PRMT_IND2` is `ifelse(...)`.
I am not really sure where to go from here. If you have time I would also appreciate it if you could tell me how to apply this across all the groups created by the group_split, and how to stitch this back together.
Note: observations (dates/rows) are not even across shops, and some groups have fewer than 7 days. I can remove these if the rolling apply will not work without it, but that loses quite a chunk of data.
I am using this function for the Mode...
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
Maybe you can use a rolling mean instead of the mode.
library(dplyr)
library(zoo)
dat %>%
group_by(Item, Store) %>%
mutate(Promotion = as.integer(abs((Price -
rollmeanr(Price, 7, fill = NA))/Price) > 0.1))
This gives NA for the first 6 values of each group, then 1 if Price differs by more than 10% (relative to Price) from the mean of the 7-row window ending at the current row, and 0 otherwise. Also note that we take the absolute value here, so it gives 1 whether the price increases or decreases by more than 10%.
As Ronah Shak pointed out, the mode does not seem like the most appropriate statistic here.
Also note that with continuous values such as these prices almost every value is unique, so the mode will usually just be the first value in the window (tabulate here only counts the integer positions returned by match; it does not alter the prices themselves).
Regarding the error, as you correctly guessed, the problem was that your split data does not always have 7 dates, so the rollapply function with width = 7 returned an error.
Allowing your function to use the length of the Date vector, or 7 when at least 7 dates are available, solves the issue.
Also, you can just apply your function using group_by; splitting the data is not necessary.
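dat %>%
  group_by(Store, Item) %>%
  mutate(price_check = Price * 0.91,
         # fill = NA keeps the result as long as the group; align = "right" makes the window look backwards
         Promotion = ifelse(rollapply(Price, width = min(length(Date), 7), FUN = Mode,
                                      fill = NA, align = "right") >= price_check, 1, 0))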
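For reference, here is a small base R sketch of the rule as the question describes it, for a single Item/Store group: flag a row when its price is at least 10% below the mode of up to 7 preceding observations. The price vector, the 0.9 threshold, and the choice to exclude the current row from the window are assumptions made for illustration, not something the answers above confirm.

```r
# Mode function from the question
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up prices for one Item/Store group, already sorted by date
price <- c(2, 2, 2, 2, 2, 2, 2, 1.5, 2, 2)

# Mode of up to 7 previous observations (current row excluded);
# the first row has no history, so it gets NA
prev_mode <- sapply(seq_along(price), function(i) {
  if (i == 1) return(NA_real_)
  Mode(price[max(1, i - 7):(i - 1)])
})

# 1 when the price is at least 10% below the recent mode, else 0
promotion <- as.integer(price <= 0.9 * prev_mode)
```

Inside a grouped mutate the same function could be applied per group, so groups with fewer than 7 rows need no special handling.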

Calculating the difference of elements in a vector with varying lag/lead

I have some lab data and I am looking to calculate the difference between sample measurements over a moving time frame/window, e.g. 2 minutes (as opposed to static 0-2, 2-4, 4-6 minute windows).
The problem is that although the data is sampled every second there are some missed samples (e.g. 1,2,4,6,7) so I cannot use a fixed lag function especially for larger time windows.
Here is the most promising approach I have tried. I have tried to calculate the difference in row positions and then use that to determine the lag value.
library(tidyverse)
df <- data.frame(sample_group = c(rep("a", 25), rep("b", 25)),t_seconds = c(1:50), measurement = seq(1,100,2))
df <- df[-c(5,10,23,33,44),] #remove samples
t_window = 5
df_diff <- df %>%
group_by(sample_group) %>%
arrange(t_seconds) %>%
mutate(lag_row = min(which(t_seconds >= t_seconds + t_window))- min(which(t_seconds == t_seconds)), #attempt to identify the lag value for each element
Meas_diff = measurement - lag(measurement, lag_row))
In this example (lag_row) I am trying to refer to both an element of a vector and the vector itself by the same name, which obviously does not work! To make it clearer, I have added '_v' to mark what I want as the whole vector and '_e' to mark an element of that vector: min(which(t_seconds_v >= t_seconds_e + t_window)) - min(which(t_seconds_v == t_seconds_e))
I have tried to stay away from using loops but I have failed to solve the problem.
I would appreciate it if anyone has any better ideas.
Your first step should be inserting missing observations into your time series. Then you could fill the missing values using a Last-Observation-Carried-Backwards operation. This provides you with a complete regular time series.
Your desired output is very unclear, so the next step after that in the following example is just a guess. Adjust as needed.
#complete time series (using a data.table join):
library(data.table)
setDT(df)
df_fill <- df[, .SD[data.table(t_seconds = min(t_seconds):max(t_seconds)),
on = "t_seconds"],
by = sample_group]
df_fill[, filled := is.na(measurement)]
#last observation carried backwards
library(zoo)
df_fill[, measurement := na.locf(measurement, fromLast = TRUE), by = sample_group]
#differences
df_fill[, diff_value := shift(measurement, -t_window) - measurement, by = sample_group]
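An alternative that avoids padding is to look up, for each sample, the first later sample that is at least t_window seconds away, which is what the '_v'/'_e' expression in the question describes. The sketch below does this for a single group with base R's findInterval; the vectors t and m are made-up stand-ins for t_seconds and measurement, and treating "first observation at or after t + t_window" as the desired partner is an assumption.

```r
t_window <- 5

# Made-up single-group data with missing seconds (5 and 10 are absent)
t <- c(1, 2, 3, 4, 6, 7, 8, 9, 11)
m <- c(1, 3, 5, 7, 11, 13, 15, 17, 21)

# With left.open = TRUE, findInterval counts the elements of t that are
# strictly less than each target time, so adding 1 gives the index of the
# first observation at or after t + t_window
idx <- findInterval(t + t_window, t, left.open = TRUE) + 1L

# Targets beyond the last sample have no partner
idx[idx > length(t)] <- NA

diff_value <- m[idx] - m
```

This requires t to be sorted in increasing order, and it would need to be run per sample_group (for example inside a grouped mutate or a data.table by).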

alignment and offsets in rollapply

I am trying to calculate some statistics for a moving window and am using rollapply in the zoo package. My question is how do I get rollapply to apply that function to the previous n observations instead of the current observation and the previous n-1 observations as align right seems to do.
require(zoo)
z <- data.frame(x1=11:111, x2=111:211, x3=as.Date(31:131))#generate data
output<-data.frame(dates=z$x3,
rollapply(z[,1:2],by.column=TRUE, 5, max, fill=NA, align='right'))
I have a hunch this is answered by ?rollapply: "If width is a plain numeric vector its elements are regarded as widths to be interpreted in conjunction with align whereas if width is a list its components are regarded as offsets. In the above cases if the length of width is 1 then width is recycled for every by-th point. If width is a list its components represent integer offsets such that the i-th component of the list refers to time points at positions i + width[[i]]." But I have no idea what that means in terms of R code and no example is provided.
Never mind, I deciphered the 'help.' Passing the width argument to rollapply as a list of offsets, like this:
width=list(-1:-5)
accomplishes it.
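To make the difference concrete, here is a small sketch comparing the two forms on a made-up vector with a window of 3 instead of 5. With align = "right" the window ends at the current observation; with width given as a list of negative offsets it ends one observation earlier.

```r
library(zoo)

x <- c(3, 1, 4, 1, 5, 9, 2, 6)  # made-up data

# Window = current observation and the 2 before it
incl_current <- rollapply(x, width = 3, FUN = max, fill = NA, align = "right")

# Window = the 3 observations before the current one (offsets -3, -2, -1)
prev_only <- rollapply(x, width = list(-3:-1), FUN = max, fill = NA)
```

Here prev_only is simply incl_current shifted down by one position, with an extra NA at the start.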
If I'm reading correctly, you just want the column "shifted" down by one, so that the value for row n is the value that row n-1 has now.
This can be done with a lag function that shifts the values, such as dplyr::lag (base R's stats::lag only changes the time-series attributes of a plain vector and leaves the values where they are):
library(zoo)
z <- data.frame(x1=11:111, x2=111:211, x3=as.Date(31:131))#generate data
output<-data.frame(dates=z$x3,
rollapply(z[,1:2],by.column=TRUE, 5, max, fill=NA, align='right'))
output$x1 <- dplyr::lag(output$x1, 1) #shift values down by one row
output$x2 <- dplyr::lag(output$x2, 1)

Moving average with changing period in R

I have a data frame named abc on which I'm doing moving average using rollapply. The following code works:
forecast <- rollapply(abc, width=12, FUN=mean, align = "right", fill=NA)
Now I want to do the same thing with a variable width: for the 1st month the result is empty, for the 2nd month it is the first month's value, and for the 3rd month it is (first month + second month)/2. In general, for the ith month, if i <= 12 the value is the mean of months 1 to i-1 (their sum divided by i-1), and for i > 12 it is the average of the last 12 months as done by the forecast. Please help.
Here are some approaches:
1) partial=TRUE
n <- length(x)
c(NA, rollapplyr(x, 12, mean, partial = TRUE)[-n])
Note the r at the end of rollapplyr.
2) width as list The width argument of rollapply can be a list such that the ith list element is a vector of the offsets to use for the ith rolling computation. If we specify partial=TRUE then offsets that run off the end of the vector will be ignored. If we only specify one element in the list it will be recycled:
rollapply(x, list(-seq(12)), mean, partial = TRUE, fill = NA)
2a) Rather than recycling and depending on partial we can write it out. Here we want width <- list(numeric(0), -1, -(1:2), -(1:3), ..., -(1:12), ..., -(1:12)) which can be calculated like this:
width <- lapply(seq_along(x), function(x) -seq_len(min(12, x-1)))
rollapply(x, width, mean)
This one would mainly be of interest if you want to modify the specification slightly because it is very flexible.
Note: Later in the comments the poster asked for the same rolling average except for it not to be lagged. That would be just:
rollapplyr(x, 12, mean, partial = TRUE)
Note the r at the end of rollapplyr.
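As a sanity check, the lagged result of approach 1 can be compared against a direct base R computation. The sketch below uses a made-up vector and a width of 3 rather than 12 so the numbers are easy to follow.

```r
library(zoo)

x <- c(10, 20, 30, 40, 50)  # made-up monthly values
w <- 3                      # stands in for 12
n <- length(x)

# Approach 1: partial right-aligned rolling mean, shifted down by one
lagged <- c(NA, rollapplyr(x, w, mean, partial = TRUE)[-n])

# Direct computation: mean of up to w values strictly before position i
manual <- sapply(seq_along(x), function(i) {
  if (i == 1) return(NA_real_)
  mean(x[max(1, i - w):(i - 1)])
})
```

Both give NA for the first month, the first month's value for the second, and from then on the mean of up to w preceding months.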

Computing a weighted rolling average R [duplicate]

Say I have two columns in a data frame/data.table, one the level and the other the volume. I want to compute a rolling average of the level, weighted by volume, so volume acts as the weight (normalized to 1) over some rolling window.
Base R has a weighted.mean() function which does a similar calculation for two static vectors. I tried using sapply to pass a list/vector of arguments to it and create a rolling series, but to no avail.
Which "apply" mechanism should I use with weighted.mean() to get the desired result, or would I have to loop/write my own function?
P.S. In the end I settled on writing a simple custom function, which utilizes the great RcppRoll package. I found RcppRoll to be wicked fast, much faster than other rolling methods, which is important to me, as my data is several million rows.
The code for the function looks like this (I've added some NAs at the beginning since RcppRoll returns data without NAs):
require(RcppRoll)
my.rollmean.weighted <- function(vec1,vec2,width){
return(c(rep(NA,width-1),roll_sum(vec1*vec2,width)/roll_sum(vec2,width)))
}
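The function works because a rolling weighted mean is just the rolling sum of level * volume divided by the rolling sum of volume. The sketch below reproduces that identity in base R, with a cumsum-based rolling sum standing in for roll_sum, and checks it against weighted.mean; roll_wmean and the sample vectors are hypothetical names and data used only for illustration.

```r
# Rolling weighted mean via two rolling sums (base R only).
# Assumes length(level) >= width and no NAs.
roll_wmean <- function(level, volume, width) {
  # Rolling sum from differences of a zero-padded cumulative sum
  roll_sum <- function(v) {
    cs <- cumsum(c(0, v))
    cs[(width + 1):length(cs)] - cs[1:(length(cs) - width)]
  }
  c(rep(NA_real_, width - 1),
    roll_sum(level * volume) / roll_sum(volume))
}

level  <- c(10, 11, 12, 13, 14)  # made-up data
volume <- c(1, 2, 3, 4, 5)
res <- roll_wmean(level, volume, 3)
```

Each non-NA element of res equals weighted.mean() applied to the corresponding window, for example res[3] and weighted.mean(level[1:3], volume[1:3]).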
I think this might work. It employs the technique demonstrated in the rollapply documentation for rolling regression. The key is by.column=FALSE. This provides a matrix of all the columns on a rolling basis.
require(zoo)
df <- data.frame(
price = cumprod(1 + runif(1000,-0.03,0.03)) * 25,
volume = runif(1000,1e6,2e6)
)
rollapply(
df,
width = 50,
function(z){
#uncomment if you want to see the structure
#print(head(z,3))
return(
weighted_mean = weighted.mean(z[,"price"],z[,"volume"])
)
},
by.column = FALSE,
align = "right"
)
Let me know if it doesn't work or is not clear.
Here is a code snippet that might help. It uses the rollmean function from the zoo package with a window of two (you pick the window). I assume weightedVariable is the variable you would calculate using the weighted.mean function; note that rollmean itself computes an unweighted mean:
library(zoo) # for the rollmean() function
movavg <- rollmean(df$weightedVariable, k = 2, align = "right")
