How to calculate rolling bootstrapped values and confidence intervals in R

I am new to R and am trying to calculate the bootstrapped standard deviation (sd) and its associated standard error within a rolling window of 30 observations. The function below performs the rolling window correctly if I just want sd. But when I add the bootstrap function from the boot package, I get the warning shown below. I gather that I am trying to store the bootstrap results in a vector that isn't the correct size. Does anyone have advice on how to store just the bootstrapped sd and its standard error for each window in the rows of a new matrix? The goal is then to plot the sd and the associated 95% confidence intervals for each window along the time series. Thanks in advance for any help.
> head(data.srs)
1 0.8274083
2 1.0853433
3 0.8049845
4 0.8912097
5 1.3514569
6 0.8694499
### Function to apply rolling window
rollWin <- function(timeSeries, windowLength)
{
  data <- timeSeries
  nOut <- length(data[, 1]) - windowLength + 1
  out <- numeric(nOut)
  if (length(data[, 1]) >= windowLength)
  {
    for (i in 1:nOut)
    {
      # statistic passed to boot(); its name was lost in the post, so sdFun is a stand-in
      sdFun <- function(data, d) sd(data[d], na.rm = TRUE)
      out[i] <- boot(data[i:(i + windowLength - 1), ], sdFun, R = 1000)
    }
  }
  return (list(result = out))
}
###run rolling window function. ex. rollWin(data, windowlength)
> warnings()
Warning messages:
1: In out[i] <- boot(data[i:(i + windowLength - 1), ],, ... :
number of items to replace is not a multiple of replacement length

You can simplify it quite a lot. I am not familiar with the boot package, but we can roll a function along a vector quite easily with the rollapply function from the zoo package, and then draw bootstrap samples using the replicate function:
library(zoo)  # provides rollapply()
# Create some data, 12 items long
r <- runif(12)
# [1] 0.44997964 0.27425412 0.07327872 0.68054759 0.33577348 0.49239478
# [7] 0.93421646 0.19633079 0.45144966 0.53673296 0.71813017 0.85270346
# use rollapply to calculate a function along a moving window
# width is the width of the window
sds <- rollapply( r , width = 4 , by = 1 , sd )
#[1] 0.19736258 0.26592331 0.16770025 0.12585750 0.13730946 0.08488467
#[7] 0.16073722 0.22460430 0.22462168
# Now we use replicate to repeatedly evaluate a bootstrap sampling method
# 'n' is number of replications (it is also reused as the window width here)
n <- 4
replicate( n , rollapply( r , width = n , function(x) sd( x[ sample(length(x) , replace = TRUE) ] ) ) )
# [,1] [,2] [,3] [,4]
# [1,] 0.17934073 0.1815371 0.11603320 0.2992379
# [2,] 0.03551822 0.2862702 0.18492837 0.2526193
# [3,] 0.09042535 0.2419768 0.13124738 0.1666012
# [4,] 0.17238705 0.1410475 0.18136178 0.2457248
# [5,] 0.32008385 0.1709326 0.32909368 0.2550859
# [6,] 0.30832533 0.1480320 0.02363968 0.1275594
# [7,] 0.23069951 0.1275594 0.25648052 0.3016909
# [8,] 0.11235170 0.2493055 0.26089969 0.3012610
# [9,] 0.16819174 0.2099518 0.18033502 0.0906986
Each column represents the rollapply which bootstraps the observations in the current window before applying sd.
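To get from those replicates to what the question asks for (one row per window holding the sd, its bootstrap standard error and a 95% interval), something along these lines might work. It is only a sketch in base R: the function name roll_boot_sd, the simulated series, the choice of R = 200 resamples and the percentile interval are all assumptions made for illustration, not part of the original answer.

```r
# Sketch: rolling bootstrap of sd() with standard error and percentile CI.
# roll_boot_sd and its arguments are hypothetical names used for illustration.
roll_boot_sd <- function(x, width, R = 200) {
  n_out <- length(x) - width + 1
  res <- sapply(seq_len(n_out), function(i) {
    w <- x[i:(i + width - 1)]                        # current window
    b <- replicate(R, sd(sample(w, replace = TRUE))) # bootstrap replicates of sd
    c(sd    = sd(w),                                 # sd of the observed window
      se    = sd(b),                                 # bootstrap standard error
      lower = unname(quantile(b, 0.025)),            # 95% percentile interval
      upper = unname(quantile(b, 0.975)))
  })
  t(res)  # one row per window
}

set.seed(1)
x <- rnorm(100)             # stand-in for the real series
out <- roll_boot_sd(x, 30)  # 71 windows, 4 columns
head(out)
```

The result is a plain matrix, so something like matplot(out[, c("sd", "lower", "upper")], type = "l") could be used to draw the estimate and its band along the series.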


Optimization function applied to table of values in R

`values <- matrix(c(0.174,0.349,1.075,3.1424,0.173,0.346,1.038,3.114,0.171,0.343,1.03,3.09,0.17,0.34,1.02,3.06),ncol=4) `
I am attempting to maximize the total value for the dataset taking only one value from each row, and with associated costs for each column
subject to:
One value column used per row.
cost of each use of column 1 is 4
cost of each use of column 2 is 3
cost of each use of column 3 is 2
cost of each use of column 4 is 1
total cost <= 11
These are stand in values for a larger dataset. I need to be able to apply it directly to all the rows of a dataset.
I have been trying to use the lpSolve package, with no success.
`f.obj <- values
f.con <- c(4,3,2,1)
f.dir <- "<="
f.rhs <- 11
lp("max", f.obj, f.con, f.dir, f.rhs)`
I am getting a solution of "0"
I do not know how to model this in a way that chooses one value per row and then uses a different value in calculating the constraints.
Looks like the problem is as follows:
We have a matrix a[i,j] with values, and a vector c[j] with costs.
We want to select one value for each row such that:
a. total cost <= 11
b. total value is maximized
To develop a mathematical model, we introduce binary variables x[i,j] ∈ {0,1}. With this, we can write:
max sum((i,j), a[i,j]*x[i,j])
subject to
sum((i,j), c[j]*x[i,j]) <= 11
sum(j, x[i,j]) = 1 ∀i
x[i,j] ∈ {0,1}
Implementation in R, here using the CVXR package:
library(CVXR)
# data
# A : values
# C : cost
A <- matrix(c(0.174,0.349,1.075,3.1424,0.173,0.346,1.038,3.114,0.171,0.343,1.03,3.09,0.17,0.34,1.02,3.06),ncol=4)
C <- c(4,3,2,1)
maxcost <- 11
# form a matrix cmat[i,j] indicating the cost of element i,j
cmat <- matrix(C, nrow = nrow(A), ncol = ncol(A), byrow = TRUE)
# problem:
# pick one value from each row
# such that total value of selected cells is maximized
# and cost of selected cells is limited to maxcost
# model:
# max sum((i,j), a[i,j]*x[i,j])
# subject to
# sum((i,j), c[j]*x[i,j]) <= maxcost
# sum(j,x[i,j]) = 1 ∀i
# x[i,j] ∈ {0,1}
x <- Variable(nrow(A), ncol(A), name = "x", boolean = TRUE)
p <- Problem(Maximize(sum_entries(A * x)),
             constraints = list(
               sum_entries(cmat * x) <= maxcost,   # budget on total cost
               sum_entries(x, axis = 1) == 1       # exactly one cell per row
             ))
res <- solve(p, verbose = TRUE)
The output looks like:
> res$status
[1] "optimal"
> res$value
[1] 4.7304
> res$getValue(x)*A
[,1] [,2] [,3] [,4]
[1,] 0.0000 0 0.000 0.17
[2,] 0.0000 0 0.343 0.00
[3,] 1.0750 0 0.000 0.00
[4,] 3.1424 0 0.000 0.00
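With only 4^4 = 256 ways of choosing one column per row, the result can be cross-checked by brute force in base R. This is a sketch for small instances only; the variable names (choices, total_cost, total_value) are made up for illustration, and enumeration will not scale to the larger dataset mentioned in the question.

```r
values <- matrix(c(0.174, 0.349, 1.075, 3.1424,
                   0.173, 0.346, 1.038, 3.114,
                   0.171, 0.343, 1.03,  3.09,
                   0.17,  0.34,  1.02,  3.06), ncol = 4)
cost <- c(4, 3, 2, 1)
max_cost <- 11
n <- nrow(values)

# Every way of choosing one column per row: one row of `choices` per combination
choices <- as.matrix(expand.grid(rep(list(seq_len(ncol(values))), n)))

# Total cost and total value of each combination
total_cost  <- rowSums(matrix(cost[choices], nrow = nrow(choices)))
total_value <- apply(choices, 1, function(cols) sum(values[cbind(seq_len(n), cols)]))

# Best combination among those within budget
feasible <- total_cost <= max_cost
best <- which(feasible)[which.max(total_value[feasible])]
unname(choices[best, ])  # chosen column for rows 1..4: 4 3 1 1
total_value[best]        # 4.7304
```

The enumeration picks the same cells as the solver output above and uses the full budget of 11.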
The description in the original post is not very precise. For instance, I assumed that we need to select precisely one cell from each row. If we just want "select at most one cell from each row", then replace
sum(j, x[i,j]) = 1 ∀i
with
sum(j, x[i,j]) <= 1 ∀i
As mentioned by Steve, the lpSolve package expects a single objective vector, not a matrix. You could reformulate the objective as maximize(sum(rowSums(values*xij))) subject to the cost constraint.
E.g., flatten the matrix to a vector and turn the problem into an integer optimization problem:
obj <- as.vector(values)
f.con <- rep(f.con, each = 4)
r <- lp('max', obj, matrix(f.con, nrow = 1), f.dir, f.rhs, int.vec = seq_along(obj))
#' Success: the objective function is 9.899925
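The snippet above carries only the cost constraint, so nothing in it forces one value per row. A sketch of how the full model (binary variables, one budget row, one equality row per data row) might be passed to lpSolve follows; the names cost_row, row_id and row_con are assumptions introduced here, and the layout relies on as.vector() flattening the matrix column by column.

```r
library(lpSolve)

values <- matrix(c(0.174, 0.349, 1.075, 3.1424,
                   0.173, 0.346, 1.038, 3.114,
                   0.171, 0.343, 1.03,  3.09,
                   0.17,  0.34,  1.02,  3.06), ncol = 4)
cost <- c(4, 3, 2, 1)
n <- nrow(values)
m <- ncol(values)

obj <- as.vector(values)          # column-major: variable k is cell (row, col)
cost_row <- rep(cost, each = n)   # cost of the column each variable belongs to

# One constraint row per data row: the variables of that row must sum to 1
row_id  <- rep(seq_len(n), times = m)
row_con <- t(sapply(seq_len(n), function(i) as.numeric(row_id == i)))

sol <- lp(direction    = "max",
          objective.in = obj,
          const.mat    = rbind(cost_row, row_con),
          const.dir    = c("<=", rep("=", n)),
          const.rhs    = c(11, rep(1, n)),
          all.bin      = TRUE)    # x[i,j] in {0,1}

sol$objval                        # 4.7304
pick <- matrix(sol$solution, nrow = n)
pick * values                     # selected cells, same layout as `values`
```

For a larger dataset only values and cost need to change; the constraint matrix is built from their dimensions.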

using cor.test function in R

If x is an n*m matrix, cor(x) gives me an m*m matrix of correlations between each pair of columns.
How can I use the cor.test function on the n*m matrix to also get an m*m matrix of p-values?
There may be an existing function, but here's my version. p_cor_mat runs cor.test on each pair of columns in matrix x and records the p-value. These are then put into a square matrix and returned.
# Set seed
# Matrix of data
x <- matrix(runif(120), ncol = 4)
# Function for creating p value matrix
p_cor_mat <- function(x){
  # All combinations of columns
  colcom <- t(combn(1:ncol(x), 2))
  # Calculate p values
  p_vals <- apply(colcom, MARGIN = 1, function(i) cor.test(x[,i[1]], x[,i[2]])$p.value)
  # Create matrix for result
  p_mat <- diag(ncol(x))
  # Fill upper & lower triangles
  p_mat[colcom] <- p_mat[colcom[,2:1]] <- p_vals
  # Return result
  p_mat
}
# Test function
p_cor_mat(x)
#> [,1] [,2] [,3] [,4]
#> [1,] 1.0000000 0.4495713 0.9071164 0.8462530
#> [2,] 0.4495713 1.0000000 0.5960786 0.7093539
#> [3,] 0.9071164 0.5960786 1.0000000 0.7466226
#> [4,] 0.8462530 0.7093539 0.7466226 1.0000000
Created on 2019-03-06 by the reprex package (v0.2.1)
Please also see the cor.mtest() function in the corrplot package.
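For the default Pearson test, cor.test derives its two-sided p-value from a t statistic with n - 2 degrees of freedom, so the whole p-value matrix can also be computed from cor(x) in one go. The sketch below assumes complete data (no NAs) and the default two-sided alternative; the seed and the variable names are arbitrary choices for illustration.

```r
set.seed(42)                        # arbitrary seed, for reproducibility only
x <- matrix(runif(120), ncol = 4)

n <- nrow(x)
r <- cor(x)                                   # m x m correlation matrix
t_stat <- r * sqrt((n - 2) / (1 - r^2))       # t statistic for each pair
p_mat <- 2 * pt(-abs(t_stat), df = n - 2)     # two-sided p-values
diag(p_mat) <- 1                              # same diagonal convention as above

# Spot check against cor.test for one pair of columns
all.equal(p_mat[1, 2], cor.test(x[, 1], x[, 2])$p.value)
```

This avoids looping over column pairs, which may matter when m is large.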

Expected return and covariance from return time series

I’m trying to replicate the Matlab ewstats function, defined here:
The results given by Matlab are the following ones:
> ExpReturn = 1×2
0.1995 0.1002
> ExpCovariance = 2×2
0.0032 -0.0017
-0.0017 0.0010
I’m trying to replicate the example with the RiskPortfolios R package:
The R code I’m using is this one:
rets <- as.matrix(cbind(c(0.24, 0.15, 0.27, 0.14), c(0.08, 0.13, 0.06, 0.13)))
w <- 0.98
meanEstimation(rets, control = list(type = 'ewma', lambda = w))
covEstimation(rets, control = list(type = 'ewma', lambda = w))
The mean estimate is the same as the one in the example, but the covariance matrix is different:
> rets
[,1] [,2]
[1,] 0.24 0.08
[2,] 0.15 0.13
[3,] 0.27 0.06
[4,] 0.14 0.13
> w
[1] 0.98
> meanEstimation(rets, control = list(type = 'ewma', lambda = w))
[1] 0.1995434 0.1002031
> covEstimation(rets, control = list(type = 'ewma', lambda = w))
[,1] [,2]
[1,] 0.007045044 -0.003857217
[2,] -0.003857217 0.002123827
Am I missing something?
They give the same answer if type = "lw" is used:
round(covEstimation(rets, control = list(type = 'lw')), 4)
## 0.0032 -0.0017
## -0.0017 0.0010
They are using different algorithms. From the RiskPortfolios manual:
ewma ... See RiskMetrics (1996)
From the Matlab help page:
There is no relationship between ewstats function and the RiskMetrics® approach for determining the expected return and covariance from a return time series.
Unfortunately Matlab does not tell us which algorithm is used.
For those who eventually need an equivalent ewstats function in R, here is the code I wrote:
ewstats <- function(RetSeries, DecayFactor=NULL, WindowLength=NULL){
#EWSTATS Expected return and covariance from return time series.
# Optional exponential weighting emphasizes more recent data.
# [ExpReturn, ExpCovariance, NumEffObs] = ewstats(RetSeries, ...
# DecayFactor, WindowLength)
# Inputs:
# RetSeries : NUMOBS by NASSETS matrix of equally spaced incremental
# return observations. The first row is the oldest observation, and the
# last row is the most recent.
# DecayFactor : Controls how much less each observation is weighted than its
# successor. The k'th observation back in time has weight DecayFactor^k.
# DecayFactor must lie in the range: 0 < DecayFactor <= 1.
# The default is DecayFactor = 1, which is the equally weighted linear
# moving average Model (BIS).
# WindowLength: The number of recent observations used in
# the computation. The default is all NUMOBS observations.
# Outputs:
# ExpReturn : 1 by NASSETS estimated expected returns.
# ExpCovariance : NASSETS by NASSETS estimated covariance matrix.
# NumEffObs: The number of effective observations is given by the formula:
# NumEffObs = (1-DecayFactor^WindowLength)/(1-DecayFactor). Smaller
# DecayFactors or WindowLengths emphasize recent data more strongly, but
# use less of the available data set.
# The standard deviations of the asset return processes are given by:
# STDVec = sqrt(diag(ECov)). The correlation matrix is :
# CorrMat = VarMat./( STDVec*STDVec' )
# See also MEAN, COV, COV2CORR.
NumObs <- dim(RetSeries)[1]
NumSeries <- dim(RetSeries)[2]
# size the series and the window
if (is.null(WindowLength)) {
  WindowLength <- NumObs
}
if (is.null(DecayFactor)) {
  DecayFactor <- 1
}
if (DecayFactor <= 0 || DecayFactor > 1) {
  stop('Must have 0 < decay factor <= 1.')
}
if (WindowLength > NumObs) {
  stop(sprintf('Window Length %d must be <= number of observations %d',
               WindowLength, NumObs))
}
# ------------------------------------------------------------------------
# size the data to the window
# (the parentheses matter: ':' binds tighter than '+' and '-')
RetSeries <- RetSeries[(NumObs - WindowLength + 1):NumObs, , drop = FALSE]
# Calculate decay coefficients
DecayPowers <- seq(WindowLength - 1, 0, by = -1)
VarWts <- sqrt(DecayFactor)^DecayPowers
RetWts <- (DecayFactor)^DecayPowers
NEff <- sum(RetWts) # number of equivalent values in computation
# Compute the exponentially weighted mean return
WtSeries <- matrix(rep(RetWts, times = NumSeries),
                   nrow = length(RetWts), ncol = NumSeries) * RetSeries
ERet <- colSums(WtSeries) / NEff
# Subtract the weighted mean from the original Series
CenteredSeries <- RetSeries - matrix(rep(ERet, each = WindowLength),
                                     nrow = WindowLength, ncol = length(ERet))
# Compute the weighted variance
WtSeries <- matrix(rep(VarWts, times = NumSeries),
                   nrow = length(VarWts), ncol = NumSeries) * CenteredSeries
ECov <- t(WtSeries) %*% WtSeries / NEff
list(ExpReturn = ERet, ExpCovariance = ECov, NumEffObs = NEff)
}
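The same weighted mean and covariance can also be obtained from base R's cov.wt with normalized exponential weights and method = "ML", which gives a compact cross-check against the Matlab numbers quoted in the question. This is a sketch for the full-window case only; lambda, w and fit are names chosen here, and treating this as equivalent to ewstats rests on the code above rather than on Matlab documentation.

```r
rets <- cbind(c(0.24, 0.15, 0.27, 0.14),
              c(0.08, 0.13, 0.06, 0.13))
lambda <- 0.98

# Oldest observation gets lambda^(n-1), most recent gets lambda^0
w <- lambda^((nrow(rets) - 1):0)

# method = "ML" divides by the sum of weights (no small-sample correction)
fit <- cov.wt(rets, wt = w / sum(w), method = "ML")

round(fit$center, 4)  # 0.1995 0.1002
round(fit$cov, 4)     # 0.0032 -0.0017 / -0.0017 0.0010
sum(w)                # number of effective observations
```

Both the mean and the covariance agree with the ExpReturn and ExpCovariance values shown for Matlab.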

Iterating through a vector, calculating standard deviation of every n number of values in R

I'm trying to write a loop that will iterate through my vector of doubles and calculate the standard deviation of every group of 5 values. Below is the code I've written to do so. However, when I run it, most of the output values are NA, which isn't right.
data is a large matrix with 53412 elements, which should be approx. 1175 rows.
for(i in floor((nrow(data)/5)-5)){sd5[i] <-sd(data[seq((5*i) + 1,(5*i) + 5),6])}
I've attempted to iterate through it manually, just executing the following in the console
sd(data[seq((5) + 1,(5*i) + 5),6])
sd(data[seq((10) + 1,(10) + 5),6])
sd(data[seq((15) + 1,(15) + 5),6])
Each of those operate properly, however, when I attempted to do it with the loop, it results in NA for the majority of my data, including the 2nd and 3rd lines in the code block above.
Here are a few lines from the CSV it is reading
Just in case, I wanted to point out that I am grabbing the correct values from the CSV file, at least when I manually execute sd(), as I've compared the console output to the CSV file. However, that doesn't mean I'm not iterating incorrectly in a way that I just can't seem to find. I set the loop to round down to avoid any out-of-bounds errors.
No need for a for loop.
If the vector is stored as x, you could do:
NN <- length(x)
x <- x[1:(5*floor(length(x)/5))]
dim(x) <- c(5, length(x)/5)
apply(x, 2, sd)
If it's in a data.frame, I'd use data.table (especially since sd is GForce-optimized in the current devel version):
library(data.table); setDT(data)
data[ , sd(x), by = .(grp = (0:(length(x) - 1) %/% 5))]
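As for why the original loop produces mostly NA: floor(...) returns a single number, so for(i in floor(...)) runs exactly once, and assigning to sd5[i] at that one high index pads every earlier element with NA. A sketch of a corrected loop, checked against the reshape approach, is below; the simulated vector and the names n_groups and sd_mat are assumptions for illustration.

```r
set.seed(1)
x <- rnorm(53)                 # stand-in for column 6 of the data

n_groups <- length(x) %/% 5    # complete groups of 5 (leftover values are dropped)
sd5 <- numeric(n_groups)

# seq_len() yields 1, 2, ..., n_groups, so the body runs once per group
for (i in seq_len(n_groups)) {
  idx <- (5 * (i - 1) + 1):(5 * i)   # rows 1-5, 6-10, ...
  sd5[i] <- sd(x[idx])
}

# Same result without a loop: one group per matrix column
sd_mat <- apply(matrix(x[seq_len(5 * n_groups)], nrow = 5), 2, sd)
all.equal(sd5, sd_mat)
```

Note that the index here starts at group 1, whereas the loop in the question starts at row 5*i + 1 and so would skip the first five rows.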
You can just recast into a 5 column (or row) matrix and get the FUN of the rows (or columns)
And since the matrix is so large, you can use the matrixStats library
mm <- read.csv(header = FALSE, text = "2016-04-01,108.779999,110.00,108.199997,109.989998,25626200,109.989998
mm <- mm[, -1]
mm <- matrix(sample(unlist(mm), 1500 * 55000, TRUE), 1500)
# num [1:1500, 1:55000] 110 109 110 110 110 ...
m2 <- matrix(mm, ncol = 5, byrow = TRUE)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 110.42 108.88 109.56 109.56 109.90
# [2,] 108.99 107.68 105.19 107.68 109.72
# [3,] 109.90 110.00 25626200.00 108.88 30774100.00
# [4,] 105.06 25685700.00 105.19 108.88 30774100.00
# [5,] 107.68 109.90 105.19 104.89 107.79
# [6,] 108.88 108.78 108.88 108.99 108.20
sds <- apply(m2, 1, sd)
# user system elapsed
## a damn long time
# Timing stopped at: 114.028 0.81 115.398
library(matrixStats)  # provides rowSds()
sds <- rowSds(m2)
# user system elapsed
# 0.347 0.051 0.402
# [1] 5.620328e-01 1.726982e+00 1.555266e+07 1.556640e+07 2.072692e+00 3.141340e-01

calculating z scores in R

I have a sample dataframe:
I wish to calculate the z-scores for every row in the data frame, and did:
I used the zscore function from
install.packages(c("R.basic"), contriburl="")
And obtained this
row.names V1 V2 V3
a -1.2558275 -1.2649111 -1.0883839
b -0.2511655 -0.3162278 -0.4186092
c 0.4186092 0.6324555 0.2511655
d 1.0883839 0.9486833 1.2558275
But when I try manually calculating the z score for the first row of the data frame I obtain the following values:
-1.45 -0.29 0.4844, 1.25
Manually, for the first row, I calculated as follows:
1) calculate the row mean (4.75) for first row
2) Subtract each value from the row mean (e.g; 4.75-1., 4.75-4., 4.75-6., 4.75-8)
3) square each difference.
4) add them up and divide by the amount of samples in row 1
5) thus I obtain the variance( answer = 6.685) and then get the standard deviation ( 2.58) of the first row alone
6) Then apply the formula of z score.
The zscore function, whatever it is, seems to be the same as scale in the base package.
apply(data, 1, scale)
## [,1] [,2] [,3]
## [1,] -1.2558275 -1.2649111 -1.0883839
## [2,] -0.2511655 -0.3162278 -0.4186092
## [3,] 0.4186092 0.6324555 0.2511655
## [4,] 1.0883839 0.9486833 1.2558275
For each column, it is calculating (x - mean(x)) / sd(x).
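The gap between the two sets of numbers comes from the denominator: sd() (and therefore scale()) divides the sum of squares by n - 1, while the manual steps in the question divide by n. A small sketch using the four values from the manual calculation (1, 4, 6, 8); the variable names are made up for illustration.

```r
x <- c(1, 4, 6, 8)

# Sample standard deviation (n - 1 denominator), as used by sd() and scale()
z_sample <- (x - mean(x)) / sd(x)
round(z_sample, 4)   # -1.2558 -0.2512  0.4186  1.0884

# Population standard deviation (n denominator), as in the manual steps
sd_pop <- sqrt(mean((x - mean(x))^2))
z_pop <- (x - mean(x)) / sd_pop
round(z_pop, 2)      # -1.45 -0.29  0.48  1.26
```

The first vector reproduces the function's output and the second reproduces the hand calculation up to rounding, so neither is wrong; they use different definitions of the standard deviation.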
