R - problem with optim() when passing arguments through a function

I am trying to build a double optimization in R. By double optimization, I mean that there is going to be an inner call, in which I optimize a function called inner_function(), and then an outer call, in which I optimize an outer_function() whose output is computed using the optimization of inner_function().
I can make this work when inner_function() is optimized through optim() and outer_function() takes only one argument and is optimized through optimize():
constructor_function <- function(data, fixed = c(FALSE, FALSE)) {
params <- fixed
function(p) {
params[!fixed] <- p
a <- data[1]
b <- data[2]
c <- data[3]
d <- data[4]
e <- params[1]
f <- params[2]
## Calculate something
tot <- abs(a + b + c + d + e + f)
return(tot)
}
}
inner_function <- constructor_function(c(1, 2, 3, 4))
inner_function(c(5, 6))
#> [1] 21
optim(c(0, 0), inner_function)$par
#> [1] -3.454274 -6.545726
sum(optim(c(0, 0), inner_function)$par)
#> [1] -10
outer_function <- function(first_factor) {
inner_function <- constructor_function(c(first_factor, 2, 3, 4))
values <- optim(c(0, 0), inner_function)$par
tot <- sum(values)
return(tot)
}
# check
outer_function(1)
#> [1] -10
optimize(outer_function, lower = 0, upper = 4)
#> $minimum
#> [1] 3.99994
#>
#> $objective
#> [1] -12.99994
# check
outer_function(3.99994)
#> [1] -12.99994
But I can't make the double optimization work when the outer function (now called outer_function_args) takes more than one argument, so that it can be optimized only with optim():
outer_function_args <- function(first_factor, second_factor) {
inner_function <- constructor_function(c(first_factor, second_factor, 3, 4))
values <- optim(c(0, 0), inner_function)$par
tot <- sum(values)
return(tot)
}
outer_function_args(1,2)
#> [1] -10
optim(par=c(0,2), outer_function_args)
#> Error in fn(par, ...): argument "second_factor" is missing, with no default
The error says that argument "second_factor" is missing, even though outer_function_args() runs correctly when I call it directly.
Created on 2021-04-15 by the reprex package (v0.3.0)

You need to modify your function to take in the parameters as a vector, like so:
outer_function_args <- function(par) {
inner_function <- constructor_function(c(par[1], par[2], 3, 4))
values <- optim(c(0, 0), inner_function)$par
tot <- sum(values)
return(tot)
}
outer_function_args(par = c(1, 2))
#> [1] -10
optim(par=c(0,2), outer_function_args)
#$par
#[1] 3355434 3355445
#
#$value
#[1] -6710886
#
#$counts
#function gradient
# 253 NA
#
#$convergence
#[1] 0
#
#$message
#NULL
From the documentation of optim(par, fn) in help("optim"):
fn   A function to be minimized (or maximized), with first argument the vector of parameters over which minimization is to take place. It should return a scalar result.
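If you would rather keep a two-argument signature, a thin wrapper that unpacks the parameter vector should also work. The sketch below is only an illustration: two_arg_objective is a hypothetical stand-in (a simple quadratic with a known minimum), not the outer_function_args from the question.

```r
# Hypothetical two-argument objective with its minimum at (1, 2);
# it stands in for any function that takes its parameters separately.
two_arg_objective <- function(first_factor, second_factor) {
  (first_factor - 1)^2 + (second_factor - 2)^2
}

# optim() always calls fn(par) with a single vector,
# so the wrapper unpacks that vector into the two arguments.
fit <- optim(par = c(0, 0),
             fn = function(p) two_arg_objective(p[1], p[2]))
fit$par
```

The wrapper leaves the original function untouched, which can be convenient if it is also called directly elsewhere.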

Related

How to implicitly pass all arguments down the stack?

Take the following code:
f2 <- function(...) {
print(list(...))
}
f1 <- function(x, y = 1, ...) {
z <- 20
f2(x, y, ...)
}
f1(5, k = 6)
If I change the arguments to f1, and I still want to pass all those arguments to f2, I would need to change the call to f2. Is it possible to write the call to f2 so that it does not name x and y explicitly? Something like the following (non-working code):
f1 <- function(x, y = 1, ...) {
z <- 20
do.call(f2, formals())
}
I can use environment(), but then I need to take care that I call it at the very beginning:
f1 <- function(x, y = 1, ...) {
argv <- c(as.list(environment()), ...)
z <- 20
do.call(f2, argv)
}
Is there maybe a simpler, more direct way?
It's not clear whether you wanted the variable z added to the call, but in either case you can achieve what you are looking for using match.call. You can simply swap in the quoted name of f2 as the first element in the matched call, and if you wish to add the missing defaults from the f1 formals, you can find them in formals() and write any missing ones into the matched call. Finally, you evaluate this call.
f1 <- function(x, y = 1, ...) {
mc <- match.call()
form <- names(formals())[!names(formals()) %in% names(mc)]
form <- form[form != "..."]
mc[[1]] <- quote(f2)
mc[form] <- formals()[form]
mc$z <- 20
eval(mc)
}
f2 <- function(...) {
print(list(...))
}
f1(5, k = 6)
#> $x
#> [1] 5
#>
#> $k
#> [1] 6
#>
#> $y
#> [1] 1
#>
#> $z
#> [1] 20
Created on 2020-09-29 by the reprex package (v0.3.0)
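As a possible alternative to match.call, the named arguments can be collected by looking up the formals by name, which does not depend on where in the function body the lookup happens. This is a rough sketch rather than a drop-in replacement: here f2 returns the list instead of printing it, and the local variable z is deliberately not forwarded.

```r
f2 <- function(...) {
  list(...)
}
f1 <- function(x, y = 1, ...) {
  z <- 20  # a local variable; it is not picked up below
  # names of the declared arguments, without the dots
  arg_names <- setdiff(names(formals(sys.function())), "...")
  # fetch their current values, then append whatever came in through ...
  args <- c(mget(arg_names, envir = environment()), list(...))
  do.call(f2, args)
}
res <- f1(5, k = 6)
names(res)
#> [1] "x" "y" "k"
```

Because only the declared argument names are fetched, later local variables do not leak into the call. Note that mget() evaluates the arguments, so a missing argument without a default would raise an error at this point.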

How to perform sequential mathematical operations with R [duplicate]

I'm trying to use R to calculate the moving average over a series of values in a matrix. There doesn't seem to be a built-in function in R that will allow me to calculate moving averages. Do any packages provide one? Or do I need to write my own?
Or you can simply calculate it using filter, here's the function I use:
ma <- function(x, n = 5){stats::filter(x, rep(1 / n, n), sides = 2)}
If you use dplyr, be careful to call stats::filter explicitly, as above; otherwise dplyr::filter masks it.
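To make the effect of the sides argument concrete, here is a small worked example on a toy vector (the names centred and trailing are just illustrative). With sides = 2 the window is centred on each point; with sides = 1 it covers only the current and past values.

```r
x <- 1:5
w <- rep(1 / 3, 3)  # equal weights for a window of 3

# centred window: the first and last positions have no full window
centred <- as.numeric(stats::filter(x, w, sides = 2))
# trailing window: the first two positions have no full window
trailing <- as.numeric(stats::filter(x, w, sides = 1))

centred
#> [1] NA  2  3  4 NA
trailing
#> [1] NA NA  2  3  4
```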
Rolling Means/Maximums/Medians in the zoo package (rollmean)
MovingAverages in TTR
ma in forecast
Using cumsum should be sufficient and efficient. Assuming you have a vector x and you want a running mean over windows of n numbers (the running sum divided by n):
cx <- c(0,cumsum(x))
rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n
As pointed out in the comments by @mzuther, this assumes that there are no NAs in the data. Dealing with those requires dividing each window by the number of non-NA values. Here's one way of doing that, incorporating the comment from @Ricardo Cruz:
cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]
rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]
rsum <- rx / rn
This still has the issue that if all the values in a window are NA, the division is 0/0 and the result for that window is NaN.
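One way to handle the all-NA window is to wrap the cumsum approach in a function and return NA wherever the window contains no observed values. This is a minimal sketch; the function name running_mean_na is made up for illustration.

```r
running_mean_na <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))  # cumulative sum, NAs count as 0
  cn <- c(0, cumsum(!is.na(x)))               # cumulative count of non-NA values
  hi <- (n + 1):length(cx)
  lo <- 1:(length(cx) - n)
  rx <- cx[hi] - cx[lo]  # window sums
  rn <- cn[hi] - cn[lo]  # number of non-NA values per window
  # windows with no observed values give NA instead of NaN
  ifelse(rn == 0, NA_real_, rx / rn)
}

running_mean_na(c(1, 2, NA, NA, 5), 2)
#> [1] 1.5 2.0  NA 5.0
```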
data.table 1.12.0 added a new frollmean function that computes a fast and exact rolling mean, carefully handling NA, NaN, +Inf and -Inf values.
As there is no reproducible example in the question there is not much more to address here.
You can find more information in the manual via ?frollmean, which is also available online.
Examples from manual below:
library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three above are embarrassingly parallel using openmp
The caTools package has very fast rolling mean/min/max/sd and a few other functions. I've only worked with runmean and runsd, and they are the fastest of any of the packages mentioned to date.
You could use RcppRoll for very quick moving averages written in C++. Just call the roll_mean function. Docs can be found here.
Otherwise, this (slower) for loop should do the trick:
ma <- function(arr, n=15){
res = arr
for(i in n:length(arr)){
res[i] = mean(arr[(i-n):i])
}
res
}
Here is example code showing how to compute a centered moving average and a trailing moving average using the rollmean function from the zoo package.
library(tidyverse)
library(zoo)
some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
mutate(cma = rollmean(day, k = 3, fill = NA)) %>%
mutate(tma = rollmean(day, k = 3, fill = NA, align = "right"))
some_data
#> # A tibble: 10 x 3
#> day cma tma
#> <int> <dbl> <dbl>
#> 1 1 NA NA
#> 2 2 2 NA
#> 3 3 3 2
#> 4 4 4 3
#> 5 5 5 4
#> 6 6 6 5
#> 7 7 7 6
#> 8 8 8 7
#> 9 9 9 8
#> 10 10 NA 9
In fact RcppRoll is very good.
The code posted by cantdutchthis must be corrected in the fourth line so that the window has a fixed width of n:
ma <- function(arr, n=15){
res = arr
for(i in n:length(arr)){
res[i] = mean(arr[(i-n+1):i])
}
res
}
Another way, which handles missing values, is given here.
A third way, improving cantdutchthis code to calculate partial averages or not, follows:
ma <- function(x, n=2,parcial=TRUE){
res = x #set the first values
if (parcial==TRUE){
for(i in 1:length(x)){
t<-max(i-n+1,1)
res[i] = mean(x[t:i])
}
res
}else{
for(i in 1:length(x)){
t<-max(i-n+1,1)
res[i] = mean(x[t:i])
}
res[-c(seq(1,n-1,1))] #remove the n-1 first,i.e., res[c(-3,-4,...)]
}
}
To complement the answers of cantdutchthis and Rodrigo Remedio:
moving_fun <- function(x, w, FUN, ...) {
# x: a double vector
# w: the length of the window, i.e., the section of the vector selected to apply FUN
# FUN: a function that takes a vector and returns a summary value, e.g., mean, sum, etc.
# Given a double vector, apply FUN over a window moving from left to right.
# When the window is not fully inside the vector, i.e. lower_bound is
# smaller than 1, return NA_real_ for that position.
if (w < 1) {
stop("The length of the window 'w' must be greater than 0")
}
output <- x
for (i in 1:length(x)) {
# plus 1 because the index is inclusive with the upper_bound 'i'
lower_bound <- i - w + 1
if (lower_bound < 1) {
output[i] <- NA_real_
} else {
output[i] <- FUN(x[lower_bound:i], ...)
}
}
output
}
# example
v <- seq(1:10)
# compute a MA(2)
moving_fun(v, 2, mean)
# compute moving sum of two periods
moving_fun(v, 2, sum)
You may calculate the moving average of a vector x with a window width of k by:
apply(embed(x, k), 1, mean)
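For illustration, here is how that works on a small vector: embed() lays out each window of width k as one row of a matrix, so the row means are the moving averages. rowMeans() is used here as an equivalent, typically faster, replacement for apply(..., 1, mean).

```r
x <- c(1, 2, 3, 4, 5)
k <- 3

# each row holds one window, most recent value first:
# (3, 2, 1), (4, 3, 2), (5, 4, 3)
windows <- embed(x, k)

ma <- rowMeans(windows)
ma
#> [1] 2 3 4
```

The result has length(x) - k + 1 elements, since only complete windows are produced.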
The slider package can be used for this. It has an interface that has been specifically designed to feel similar to purrr. It accepts any arbitrary function, and can return any type of output. Data frames are even iterated over row wise. The pkgdown site is here.
library(slider)
x <- 1:3
# Mean of the current value + 1 value before it
# returned as a double vector
slide_dbl(x, ~mean(.x, na.rm = TRUE), .before = 1)
#> [1] 1.0 1.5 2.5
df <- data.frame(x = x, y = x)
# Slide row wise over data frames
slide(df, ~.x, .before = 1)
#> [[1]]
#> x y
#> 1 1 1
#>
#> [[2]]
#> x y
#> 1 1 1
#> 2 2 2
#>
#> [[3]]
#> x y
#> 1 2 2
#> 2 3 3
The overhead of both slider and data.table's frollapply() should be pretty low (much faster than zoo). frollapply() looks to be a little faster for this simple example here, but note that it only takes numeric input, and the output must be a scalar numeric value. slider functions are completely generic, and you can return any data type.
library(slider)
library(zoo)
library(data.table)
x <- 1:50000 + 0L
bench::mark(
slider = slide_int(x, function(x) 1L, .before = 5, .complete = TRUE),
zoo = rollapplyr(x, FUN = function(x) 1L, width = 6, fill = NA),
datatable = frollapply(x, n = 6, FUN = function(x) 1L),
iterations = 200
)
#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 slider 19.82ms 26.4ms 38.4 829.8KB 19.0
#> 2 zoo 177.92ms 211.1ms 4.71 17.9MB 24.8
#> 3 datatable 7.78ms 10.9ms 87.9 807.1KB 38.7
EDIT: took great joy in adding the side parameter, for a moving average (or sum, or ...) of e.g. the past 7 days of a Date vector.
For people just wanting to calculate this themselves, it's nothing more than:
# x = vector with numeric data
# w = window length
y <- numeric(length = length(x))
for (i in seq_len(length(x))) {
ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
ind <- ind[ind %in% seq_len(length(x))]
y[i] <- mean(x[ind])
}
y
But it gets fun to make it independent of mean(), so you can calculate any 'moving' function!
# our workhorse:
moving_fn <- function(x, w, fun, side = "centre", ...) {
# x = vector with numeric data
# w = window length
# fun = function to apply
# side = side to take, (c)entre, (l)eft or (r)ight
# ... = parameters passed on to 'fun'
y <- numeric(length(x))
for (i in seq_len(length(x))) {
if (side %in% c("c", "centre", "center")) {
ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
} else if (side %in% c("l", "left")) {
ind <- c((i - floor(w) + 1):i)
} else if (side %in% c("r", "right")) {
ind <- c(i:(i + floor(w) - 1))
} else {
stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
}
ind <- ind[ind %in% seq_len(length(x))]
y[i] <- fun(x[ind], ...)
}
y
}
# and now any variation you can think of!
moving_average <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = mean, side = side, na.rm = na.rm)
}
moving_sum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = sum, side = side, na.rm = na.rm)
}
moving_maximum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = max, side = side, na.rm = na.rm)
}
moving_median <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = median, side = side, na.rm = na.rm)
}
moving_Q1 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.25)
}
moving_Q3 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.75)
}
Although it is a bit slow, you can also use zoo::rollapply to perform calculations on matrices.
reqd_ma <- rollapply(x, FUN = mean, width = n)
where x is the data set, FUN = mean is the function (you can also change it to min, max, sd, etc.) and width is the size of the rolling window.
One can use the runner package for moving functions, in this case the mean_run function. The problem with cummean is that it doesn't handle NA values, but mean_run does. The runner package also supports irregular time series, and windows can depend on dates:
library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA,5), rnorm(15)), 15, replace = TRUE)
date <- Sys.Date() + cumsum(sample(1:3, 15, replace = TRUE))
mean_run(x1)
#> [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#> [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809
mean_run(x2, na_rm = TRUE)
#> [1] -0.18760011 -0.09022066 -0.06543317 0.03906450 -0.12188853 -0.13873536
#> [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202
mean_run(x2, na_rm = FALSE )
#> [1] -0.18760011 -0.09022066 -0.06543317 0.03906450 -0.12188853 -0.13873536
#> [7] NA NA NA NA NA NA
#> [13] NA NA NA
mean_run(x2, na_rm = TRUE, k = 4)
#> [1] -0.18760011 -0.09022066 -0.06543317 0.03906450 -0.10546063 -0.16299272
#> [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684 0.01103493
#> [13] 0.09609256 0.09738460 0.04740283
mean_run(x2, na_rm = TRUE, k = 4, idx = date)
#> [1] -0.187600111 -0.090220655 -0.004349696 0.168349653 -0.206571573 -0.494335093
#> [7] -0.222969541 -0.187600111 -0.087636571 0.009742884 0.009742884 0.012326968
#> [13] 0.182442234 0.125737145 0.059094786
One can also specify other options like lag, and roll only at specific indexes. More in package and function documentation.
Here is a simple function with filter demonstrating one way to take care of beginning and ending NAs with padding, and computing a weighted average (supported by filter) using custom weights:
library(dplyr)  # provides first(), last() and %>%
wma <- function(x) {
wts <- c(seq(0.5, 4, 0.5), seq(3.5, 0.5, -0.5))
nside <- (length(wts)-1)/2
# pad x with begin and end values for filter to avoid NAs
xp <- c(rep(first(x), nside), x, rep(last(x), nside))
z <- stats::filter(xp, wts/sum(wts), sides = 2) %>% as.vector
z[(nside+1):(nside+length(x))]
}
vector_avg <- function(x){
sum_x = 0
for(i in 1:length(x)){
if(!is.na(x[i]))
sum_x = sum_x + x[i]
}
return(sum_x/length(x))
}
I use aggregate along with a vector created by rep(). This has the advantage of using cbind() to aggregate more than one column in your dataframe at a time. Below is an example of a moving average of 60 for a vector (v) of length 1000:
v=1:1000*0.002+rnorm(1000)
mrng=rep(1:round(length(v)/60+0.5), length.out=length(v), each=60)
aggregate(v~mrng, FUN=mean, na.rm=T)
Note the first argument in rep is to simply get enough unique values for the moving range, based on the length of the vector and the amount to be averaged; the second argument keeps the length equal to the vector length, and the last repeats the values of the first argument the same number of times as the averaging period.
In aggregate you could use several functions (median, max, min); mean is shown as an example. Again, you could use a formula with cbind to do this on more than one (or all) columns in a dataframe.


Improving efficiency in R program [duplicate]

I'm trying to use R to calculate the moving average over a series of values in a matrix. There doesn't seem to be a built-in function in R that will allow me to calculate moving averages. Do any packages provide one? Or do I need to write my own?
Or you can simply calculate it using filter, here's the function I use:
ma <- function(x, n = 5){filter(x, rep(1 / n, n), sides = 2)}
If you use dplyr, be careful to specify stats::filter in the function above.
Rolling Means/Maximums/Medians in the zoo package (rollmean)
MovingAverages in TTR
ma in forecast
Using cumsum should be sufficient and efficient. Assuming you have a vector x and you want a running sum of n numbers
cx <- c(0,cumsum(x))
rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n
As pointed out in the comments by #mzuther, this assumes that there are no NAs in the data. to deal with those would require dividing each window by the number of non-NA values. Here's one way of doing that, incorporating the comment from #Ricardo Cruz:
cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]
rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]
rsum <- rx / rn
This still has the issue that if all the values in the window are NAs then there will be a division by zero error.
In data.table 1.12.0 new frollmean function has been added to compute fast and exact rolling mean carefully handling NA, NaN and +Inf, -Inf values.
As there is no reproducible example in the question there is not much more to address here.
You can find more info about ?frollmean in manual, also available online at ?frollmean.
Examples from manual below:
library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three above are embarrassingly parallel using openmp
The caTools package has very fast rolling mean/min/max/sd and few other functions. I've only worked with runmean and runsd and they are the fastest of any of the other packages mentioned to date.
You could use RcppRoll for very quick moving averages written in C++. Just call the roll_mean function. Docs can be found here.
Otherwise, this (slower) for loop should do the trick:
ma <- function(arr, n=15){
res = arr
for(i in n:length(arr)){
res[i] = mean(arr[(i-n):i])
}
res
}
Here is example code showing how to compute a centered moving average and a trailing moving average using the rollmean function from the zoo package.
library(tidyverse)
library(zoo)
some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
mutate(cma = rollmean(day, k = 3, fill = NA)) %>%
mutate(tma = rollmean(day, k = 3, fill = NA, align = "right"))
some_data
#> # A tibble: 10 x 3
#> day cma tma
#> <int> <dbl> <dbl>
#> 1 1 NA NA
#> 2 2 2 NA
#> 3 3 3 2
#> 4 4 4 3
#> 5 5 5 4
#> 6 6 6 5
#> 7 7 7 6
#> 8 8 8 7
#> 9 9 9 8
#> 10 10 NA 9
In fact RcppRoll is very good.
The code posted by cantdutchthis must be corrected in the fourth line to the window be fixed:
ma <- function(arr, n=15){
res = arr
for(i in n:length(arr)){
res[i] = mean(arr[(i-n+1):i])
}
res
}
Another way, which handles missings, is given here.
A third way, improving cantdutchthis code to calculate partial averages or not, follows:
ma <- function(x, n=2,parcial=TRUE){
res = x #set the first values
if (parcial==TRUE){
for(i in 1:length(x)){
t<-max(i-n+1,1)
res[i] = mean(x[t:i])
}
res
}else{
for(i in 1:length(x)){
t<-max(i-n+1,1)
res[i] = mean(x[t:i])
}
res[-c(seq(1,n-1,1))] #remove the n-1 first,i.e., res[c(-3,-4,...)]
}
}
In order to complement the answer of cantdutchthis and Rodrigo Remedio;
moving_fun <- function(x, w, FUN, ...) {
# x: a double vector
# w: the length of the window, i.e., the section of the vector selected to apply FUN
# FUN: a function that takes a vector and returns a summary value, e.g., mean, sum, etc.
# Given a double vector, apply FUN over a window moving from left to right.
# When the window does not fit inside the vector, i.e. lower_bound falls
# before the first element, return NA_real_ for that position
if (w < 1) {
stop("The length of the window 'w' must be greater than 0")
}
output <- x
for (i in 1:length(x)) {
# plus 1 because the index is inclusive with the upper_bound 'i'
lower_bound <- i - w + 1
if (lower_bound < 1) {
output[i] <- NA_real_
} else {
output[i] <- FUN(x[lower_bound:i], ...)
}
}
output
}
# example
v <- 1:10
# compute a MA(2)
moving_fun(v, 2, mean)
# compute moving sum of two periods
moving_fun(v, 2, sum)
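The point of the ... argument is that extra options such as na.rm reach FUN rather than the subsetting step. A compact sketch of the same trailing-window idea using vapply is shown below; roll_apply is a hypothetical name used only for this illustration, and it assumes FUN returns a single number.

```r
# Hypothetical helper: trailing window of width w, NA where the window does not fit
roll_apply <- function(x, w, FUN, ...) {
  stopifnot(w >= 1)
  vapply(seq_along(x), function(i) {
    lower <- i - w + 1
    # extra arguments in ... are forwarded to FUN, not to the subsetting
    if (lower < 1) NA_real_ else as.numeric(FUN(x[lower:i], ...))
  }, numeric(1))
}

out_sum  <- roll_apply(1:5, 2, sum)
out_mean <- roll_apply(c(1, NA, 3, 4), 2, mean, na.rm = TRUE)

out_sum
#> [1] NA  3  5  7  9
out_mean
#> [1]  NA 1.0 3.0 3.5
```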
You may calculate the moving average of a vector x with a window width of k by:
apply(embed(x, k), 1, mean)
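To see why this works: embed(x, k) builds a matrix whose rows are the consecutive windows of length k (in reversed order within each row), so a row-wise mean is the moving average. A small sketch with arbitrary example values, where rowMeans is used as an equivalent and usually faster alternative to apply:

```r
x <- c(2, 4, 6, 8, 10)
k <- 3

m <- embed(x, k)   # one row per window: length(x) - k + 1 rows, k columns
m
#>      [,1] [,2] [,3]
#> [1,]    6    4    2
#> [2,]    8    6    4
#> [3,]   10    8    6

ma_apply <- apply(m, 1, mean)
ma_rows  <- rowMeans(m)   # same result without the apply() loop

ma_apply
#> [1] 4 6 8
```

The result has length(x) - k + 1 elements, so it needs padding with NA if it must line up with x.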
The slider package can be used for this. It has an interface that has been specifically designed to feel similar to purrr. It accepts any arbitrary function, and can return any type of output. Data frames are even iterated over row wise. The pkgdown site is here.
library(slider)
x <- 1:3
# Mean of the current value + 1 value before it
# returned as a double vector
slide_dbl(x, ~mean(.x, na.rm = TRUE), .before = 1)
#> [1] 1.0 1.5 2.5
df <- data.frame(x = x, y = x)
# Slide row wise over data frames
slide(df, ~.x, .before = 1)
#> [[1]]
#> x y
#> 1 1 1
#>
#> [[2]]
#> x y
#> 1 1 1
#> 2 2 2
#>
#> [[3]]
#> x y
#> 1 2 2
#> 2 3 3
The overhead of both slider and data.table's frollapply() should be pretty low (much faster than zoo). frollapply() looks to be a little faster for this simple example here, but note that it only takes numeric input, and the output must be a scalar numeric value. slider functions are completely generic, and you can return any data type.
library(slider)
library(zoo)
library(data.table)
x <- 1:50000 + 0L
bench::mark(
slider = slide_int(x, function(x) 1L, .before = 5, .complete = TRUE),
zoo = rollapplyr(x, FUN = function(x) 1L, width = 6, fill = NA),
datatable = frollapply(x, n = 6, FUN = function(x) 1L),
iterations = 200
)
#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 slider 19.82ms 26.4ms 38.4 829.8KB 19.0
#> 2 zoo 177.92ms 211.1ms 4.71 17.9MB 24.8
#> 3 datatable 7.78ms 10.9ms 87.9 807.1KB 38.7
EDIT: took great joy in adding the side parameter, for a moving average (or sum, or ...) of e.g. the past 7 days of a Date vector.
For people just wanting to calculate this themselves, it's nothing more than:
# x = vector with numeric data
# w = window length
y <- numeric(length = length(x))
for (i in seq_len(length(x))) {
ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
ind <- ind[ind %in% seq_len(length(x))]
y[i] <- mean(x[ind])
}
y
But it gets fun to make it independent of mean(), so you can calculate any 'moving' function!
# our workhorse:
moving_fn <- function(x, w, fun, side = "centre", ...) {
# x = vector with numeric data
# w = window length
# fun = function to apply
# side = side to take, (c)entre, (l)eft or (r)ight
# ... = parameters passed on to 'fun'
y <- numeric(length(x))
for (i in seq_len(length(x))) {
if (side %in% c("c", "centre", "center")) {
ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
} else if (side %in% c("l", "left")) {
ind <- c((i - floor(w) + 1):i)
} else if (side %in% c("r", "right")) {
ind <- c(i:(i + floor(w) - 1))
} else {
stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
}
ind <- ind[ind %in% seq_len(length(x))]
y[i] <- fun(x[ind], ...)
}
y
}
# and now any variation you can think of!
moving_average <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = mean, side = side, na.rm = na.rm)
}
moving_sum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = sum, side = side, na.rm = na.rm)
}
moving_maximum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = max, side = side, na.rm = na.rm)
}
moving_median <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = median, side = side, na.rm = na.rm)
}
moving_Q1 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.25)
}
moving_Q3 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.75)
}
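To illustrate what the three side options select, here is a compact restatement of the same logic applied to a short vector. It is a sketch, not the answer's exact code: it assumes side is an explicit argument of moving_fn, accepts only the full names "centre", "left" and "right", and assumes fun returns a single number.

```r
# Compact variant of moving_fn; `side` is an explicit argument (assumption)
moving_fn <- function(x, w, fun, side = "centre", ...) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    ind <- switch(side,
      centre = (i - floor(w / 2)):(i + floor(w / 2)),
      left   = (i - floor(w) + 1):i,      # current value and the ones before it
      right  = i:(i + floor(w) - 1),      # current value and the ones after it
      stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
    )
    ind <- ind[ind >= 1 & ind <= n]       # clip the window at the vector ends
    fun(x[ind], ...)
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 5)
s_left   <- moving_fn(x, w = 3, fun = sum, side = "left")
s_centre <- moving_fn(x, w = 3, fun = sum, side = "centre")
s_right  <- moving_fn(x, w = 3, fun = sum, side = "right")

s_left
#> [1]  1  3  6  9 12
s_centre
#> [1]  3  6  9 12  9
s_right
#> [1]  6  9 12  9  5
```

With side = "left" the window covers the current element and the w - 1 elements before it, which is the "past 7 days" case mentioned in the edit note when w = 7.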
Although it is a bit slow, you can also use zoo::rollapply to perform calculations on matrices.
reqd_ma <- rollapply(x, FUN = mean, width = n)
where x is the data set, FUN = mean is the function to apply (you can change it to min, max, sd, etc.), and width is the size of the rolling window.
One can use the runner package for moving functions, in this case its mean_run function. The problem with cummean is that it doesn't handle NA values, but mean_run does. The runner package also supports irregular time series, and windows can depend on dates:
library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA,5), rnorm(15)), 15, replace = TRUE)
date <- Sys.Date() + cumsum(sample(1:3, 15, replace = TRUE))
mean_run(x1)
#> [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#> [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809
mean_run(x2, na_rm = TRUE)
#> [1] -0.18760011 -0.09022066 -0.06543317 0.03906450 -0.12188853 -0.13873536
#> [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202
mean_run(x2, na_rm = FALSE )
#> [1] -0.18760011 -0.09022066 -0.06543317 0.03906450 -0.12188853 -0.13873536
#> [7] NA NA NA NA NA NA
#> [13] NA NA NA
mean_run(x2, na_rm = TRUE, k = 4)
#> [1] -0.18760011 -0.09022066 -0.06543317 0.03906450 -0.10546063 -0.16299272
#> [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684 0.01103493
#> [13] 0.09609256 0.09738460 0.04740283
mean_run(x2, na_rm = TRUE, k = 4, idx = date)
#> [1] -0.187600111 -0.090220655 -0.004349696 0.168349653 -0.206571573 -0.494335093
#> [7] -0.222969541 -0.187600111 -0.087636571 0.009742884 0.009742884 0.012326968
#> [13] 0.182442234 0.125737145 0.059094786
One can also specify other options like lag, and roll only at specific indexes. More in package and function documentation.
Here is a simple function with filter demonstrating one way to take care of beginning and ending NAs with padding, and computing a weighted average (supported by filter) using custom weights:
library(dplyr) # first(), last() and %>% are assumed to come from dplyr
wma <- function(x) {
wts <- c(seq(0.5, 4, 0.5), seq(3.5, 0.5, -0.5))
nside <- (length(wts)-1)/2
# pad x with begin and end values for filter to avoid NAs
xp <- c(rep(first(x), nside), x, rep(last(x), nside))
z <- stats::filter(xp, wts/sum(wts), sides = 2) %>% as.vector
z[(nside+1):(nside+length(x))]
}
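The same padding idea can be sketched in base R only, without first(), last() or the pipe. The function name wma_base and the short default weight vector are assumptions made for this illustration; the weights must have odd length so that the window is symmetric.

```r
# Hypothetical base-R variant: pad both ends, filter, then cut the padding off
wma_base <- function(x, wts = c(1, 2, 3, 2, 1)) {
  nside <- (length(wts) - 1) / 2
  # repeat the first and last value so the centred filter has data at the ends
  xp <- c(rep(x[1], nside), x, rep(x[length(x)], nside))
  z <- as.vector(stats::filter(xp, wts / sum(wts), sides = 2))
  z[(nside + 1):(nside + length(x))]
}

flat <- wma_base(rep(2, 6))   # a constant series stays constant
ramp <- wma_base(1:7)         # interior points of a straight line are unchanged

anyNA(ramp)
#> [1] FALSE
```

Because the weights are normalised by their sum, the output stays on the same scale as the input; only the values near the ends are affected by the padding.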
vector_avg <- function(x){
sum_x = 0
for(i in 1:length(x)){
if(!is.na(x[i]))
sum_x = sum_x + x[i]
}
return(sum_x/length(x))
}
I use aggregate along with a grouping vector created by rep(). This has the advantage that cbind() can be used to aggregate more than one column of your data frame at a time. Below is an example of an average over windows of 60 for a vector (v) of length 1000:
v=1:1000*0.002+rnorm(1000)
mrng=rep(1:round(length(v)/60+0.5), length.out=length(v), each=60)
aggregate(v~mrng, FUN=mean, na.rm=TRUE)
Note that the first argument in rep simply provides enough unique values for the moving range, based on the length of the vector and the number of values to be averaged; the second argument keeps the length equal to the vector length; and the last repeats each value of the first argument as many times as the averaging period.
In aggregate you could use several functions (median, max, min); mean is shown as an example. Again, you could use a formula with cbind to do this on more than one (or all) columns in a data frame.
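A small sketch of the cbind form on a data frame follows. The column names, the block size of 2 and the data are made up for illustration. Note that this approach returns one mean per consecutive block of rows rather than one value per position of a sliding window.

```r
df <- data.frame(a = 1:6, b = (1:6) * 10)
block <- 2   # number of rows averaged together (hypothetical choice)

# Grouping vector: 1 1 2 2 3 3
df$grp <- rep(seq_len(ceiling(nrow(df) / block)), each = block, length.out = nrow(df))

# Aggregate both columns at once with cbind() on the left-hand side
res <- aggregate(cbind(a, b) ~ grp, data = df, FUN = mean)
res
#>   grp   a  b
#> 1   1 1.5 15
#> 2   2 3.5 35
#> 3   3 5.5 55
```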


Resources