R: How to add jitter only on singular matrices within a function? - r

I have the following function that I need to (m)apply on a list of more than 1500 large matrices (Z) and a list of vectors (p) of the same length. However, I get the error that some matrices are singular as I already posted here. Here my function:
kastner <- function(item, p) { print(item)
imp <- rowSums(Z[[item]])
exp <- colSums(Z[[item]])
x = p + imp
ac = p + imp - exp
einsdurchx = 1/as.vector(x)
einsdurchx[is.infinite(einsdurchx)] <- 0
A = Z[[item]] %*% diag(einsdurchx)
R = solve(diag(length(p))-A) %*% diag(p)
C = ac * einsdurchx
R_bar = diag(as.vector(C)) %*% R
rR_bar = round(R_bar)
return(rR_bar)
}
and my mapply command that also prints the names of the running matrix:
KASTNER <- mapply(kastner, names(Z), p, SIMPLIFY = FALSE)
In order to overcome the singularity problem, I want to add a small amount of jitter the singular matrices. The problem starts in line 9 of the function R = solve(diag(length(p))-A) %*% diag(p) as this term(diag(length(p))-A) gets singular and can't be solved. I tried to add jitter to all Z matrices in the first line of the function using: Z <- lapply(Z,function(x) jitter(x, factor = 0.0001, amount = NULL)), but this is very very low and produces still errors.
Therefore my idea is to check with if/else or something similar if this matrix diag(length(p))-A is singular (maybe using eigenvectors to check collinearity) and add on those matrices jitter, else (if not) the solve command should performed as it is. Ideas how to implement this on the function? Thanks
Here some example data, although there is no problem with singularity as I was not able to rebuild this error for line 9:
Z <- list("111.2012"= matrix(c(0,0,100,200,0,0,0,0,50,350,0,50,50,200,200,0),
nrow = 4, ncol = 4, byrow = T),
"112.2012"= matrix(c(10,90,0,30,10,90,0,10,200,50,10,350,150,100,200,10),
nrow = 4, ncol = 4, byrow = T))
p <- list("111.2012"=c(200, 1000, 100, 10), "112.2012"=c(300, 900, 50, 100))
Edit: a small amount o jitter shouldn't be problematic in my data as I have probably more than 80% of zeros in my matrices and than large values. And I am only interested in those large values, but the large amount of 0s are probably the reason for the singularity, but needed.

Since you didn't provide a working example I couldn't test this easily, so the burden of proof is on you. :) In any case, it should be a starting point for further tinkering. Comments in the code.
kastner <- function(item, p) { print(item)
imp <- rowSums(Z[[item]])
exp <- colSums(Z[[item]])
x = p + imp
ac = p + imp - exp
einsdurchx = 1/as.vector(x)
einsdurchx[is.infinite(einsdurchx)] <- 0
# start a chunk that repeats until you get a valid result
do.jitter <- TRUE # bureaucracy
while (do.jitter == TRUE) {
# run the code as usual
A = Z[[item]] %*% diag(einsdurchx)
# catch any possible errors, you can even catch "singularity" error here by
# specifying error = function(e) e
R <- tryCatch(solve(diag(length(p))-A) %*% diag(p), error = function(e) "jitterme")
# if you were able to solve(), and the result is a matrix (carefuly if it's a vector!)...
if (is.matrix(R)) {
# ... turn the while loop off
do.jitter <- FALSE
} else {
#... else apply some jitter and repeat by construcing A from a jittered Z[[item]]
Z[[item]] <- jitter(Z[[item]])
}
}
C = ac * einsdurchx
R_bar = diag(as.vector(C)) %*% R
rR_bar = round(R_bar)
return(rR_bar)
}

Related

Integrating under a curve in R

I apologise if this is a duplicate; I've read answers to similar questions to no avail.
I'm trying to integrate under a curve, given a specific formula (below) for said integration.
As a toy example, here's some data:
Antia_Model <- function(t,y,p1){
r <- p1[1]; k <- p1[2]; p <- p1[3]; o <- p1[4]
P <- y[1]; I <- y[2]
dP = r*P - k*P*I
dI = p*I*(P/(P + o))
list(c(dP,dI))
}
r <- 0.25; k <- 0.01; p <- 1; o <- 1000 # Note that r can range btw 0.1 and 10 in this model
parms <- c(r, k, p, o)
P0 <- 1; I0 <- 1
N0 <- c(P0, I0)
TT <- seq(0.1, 50, 0.1)
results <- lsoda(N0, TT, Antia_Model, parms, verbose = FALSE)
P <- results[,2]; I <- results[,3]
As I understand it, I should be able to use the auc() function from the MESS package (can I just use the integrate() function? Unclear...), which should look something like this:
auc(P, TT, from = x1, to = x2, type = "spline")
Though I don't really understand how to use the "from" and "to" arguments, or how to incorporate "u" from the original integration formula...
Using the integrate() function seems more intuitive, but if I try:
u <- 1
integrand <- function(P) {u*P}
q <- integrate(integrand, lower = 0, upper = Inf)
I get this error:
# Error in integrate(integrand, lower = 0, upper = Inf) :
# the integral is probably divergent
As you can tell, I'm pretty lost, so any help would be greatly appreciated! Thank you so much! :)
integrand is technically acceptable but right now, it's the identity function f(x) = x. The area under it from [0, inf) is infinite, i.e. divergent.
From the documentation of integrate the first argument is:
an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error.
If instead you use a pulse function:
pulse <- function(x) {ifelse(x < 5 & x >= 0, 1, 0)}
integrate(pulse, lower = 0, upper = Inf)
#> 5 with absolute error < 8.5e-05

cross product of two matrices with missing values

I would like to vectorize a distance matrix calculation by using the Law of Cosines. For a matrix with no missing values, this calculation is even faster than dist for very big matrices.
The function goes like this:
vectorizedDistMat <- function(x,y) {
an = rowSums(x^2)
bn = rowSums(y^2)
m = nrow(x)
n = nrow(y)
tmp = matrix(rep(an, n), nrow=m)
tmp = tmp + matrix(rep(bn, m), nrow=m, byrow=TRUE)
sqrt( abs(tmp - 2 * tcrossprod(x,y) ))
}
Now, missing values complicate things, especially in the last line of the above function, when the two matrices are multiplied. There is a way of accounting for missing values retrospectively, e.g. here. But how do I efficiently prevent missing value to "multiply" when multiplying the matrices?
See for example M1 and M2 below:
set.seed(42)
M1 = matrix(rnorm(50), nrow = 10, ncol = 5)
M2 = matrix(rnorm(50), nrow = 10, ncol = 5)
M1[1,2] = NA
M2[3,4] = NA
here, tcrossprod(M1, M2) yields NA's in the first row and third column. How do I account for them and get rid of them in advance (like the built-in function dist)?

Sequential Quadratic Programming in R to find optimal weights of an Equally-Weighted Risk Contribution Portfolio

Introduction to the problem
I am trying to write down a code in R so to obtain the weights of an Equally-Weighted Contribution (ERC) Portfolio. As some of you may know, the portfolio construction was presented by Maillard, Roncalli and Teiletche.
Skipping technicalities, in order to find the optimal weights of an ERC portfolio one needs to solve the following Sequential Quadratic Programming problem:
with:
Suppose we are analysing N assets. In the above formulas, we have that x is a (N x 1) vector of portfolio weights and Σ is the (N x N) variance-covariance matrix of asset returns.
What I have done so far
Using the function slsqp of the package nloptr which solves SQP problems, I would like to solve the above minimisation problem. Here is my code. Firstly, the objective function to be minimised:
ObjFuncERC <- function (x, Sigma) {
sum <- 0
R <- Sigma %*% x
for (i in 1:N) {
for (j in 1:N) {
sum <- sum + (x[i]*R[i] - x[j]*R[j])^2
}
}
}
Secondly, the starting point (we start by an equally-weighted portfolio):
x0 <- matrix(1/N, nrow = N, ncol = 1)
Then, the equality constraint (weights must sum to one, that is: sum of the weights minus one equal zero):
heqERC <- function (x) {
h <- numeric(1)
h[1] <- (t(matrix(1, nrow = N, ncol = 1)) %*% x) - 1
return(h)
}
Finally, the lower and upper bounds constraints (weights cannot exceed one and cannot be lower than zero):
lowerERC <- matrix(0, nrow = N, ncol = 1)
upperERC <- matrix(1, nrow = N, ncol = 1)
So that the function which should output optimal weights is:
slsqp(x0 = x0, fn = ObjFuncERC, Sigma = Sigma, lower = lowerERC, upper = upperERC, heq = heqERC)
Unfortunately, I do not know how to share with you my variance-covariance matrix (which takes name Sigma and is a (29 x 29) matrix, so that N = 29) so to reproduce my result, still you can simulate one.
The output error
Running the above code yields the following error:
Error in nl.grad(x, fn) :
Function 'f' must be a univariate function of 2 variables.
I have no idea what to do guys. Probably, I have misunderstood how things must be written down in order for the function slsqp to understand what to do. Can someone help me understand how to fix the problem and get the result I want?
UPDATE ONE: as pointed out by #jogo in the comments, I have updated the code, but it still produces an error. The code and the error above are now updated.
UPDATE 2: as requested by #jaySf, here is the full code that allows you to reproduce my error.
## ERC Portfolio Test
# Preliminary Operations
rm(list=ls())
require(quantmod)
require(nloptr)
# Load Stock Data in R through Yahoo! Finance
stockData <- new.env()
start <- as.Date('2014-12-31')
end <- as.Date('2017-12-31')
tickers <-c('AAPL','AXP','BA','CAT','CSCO','CVX','DIS','GE','GS','HD','IBM','INTC','JNJ','JPM','KO','MCD','MMM','MRK','MSFT','NKE','PFE','PG','TRV','UNH','UTX','V','VZ','WMT','XOM')
getSymbols.yahoo(tickers, env = stockData, from = start, to = end, periodicity = 'monthly')
# Create a matrix containing the price of all assets
prices <- do.call(cbind,eapply(stockData, Op))
prices <- prices[-1, order(colnames(prices))]
colnames(prices) <- tickers
# Compute Returns
returns <- diff(prices)/lag(prices)[-1,]
# Compute variance-covariance matrix
Sigma <- var(returns)
N <- 29
# Set up the minimization problem
ObjFuncERC <- function (x, Sigma) {
sum <- 0
R <- Sigma %*% x
for (i in 1:N) {
for (j in 1:N) {
sum <- sum + (x[i]*R[i] - x[j]*R[j])^2
}
}
}
x0 <- matrix(1/N, nrow = N, ncol = 1)
heqERC <- function (x) {
h <- numeric(1)
h[1] <- t(matrix(1, nrow = N, ncol = 1)) %*% x - 1
}
lowerERC <- matrix(0, nrow = N, ncol = 1)
upperERC <- matrix(1, nrow = N, ncol = 1)
slsqp(x0 = x0, fn = ObjFuncERC, Sigma = Sigma, lower = lowerERC, upper = upperERC, heq = heqERC)
I spotted several mistakes in your code. For instance, ObjFuncERC is not returning any value. You should use the following instead:
# Set up the minimization problem
ObjFuncERC <- function (x, Sigma) {
sum <- 0
R <- Sigma %*% x
for (i in 1:N) {
for (j in 1:N) {
sum <- sum + (x[i]*R[i] - x[j]*R[j])^2
}
}
sum
}
heqERC doesn't return anything too, I also changed your function a bit
heqERC <- function (x) {
sum(x) - 1
}
I made those changes and tried slsqp without lower and upper and it worked. Still, another thing to consider is that you set lowerERC and upperERC as matrices. Use the following instead:
lowerERC <- rep(0,N)
upperERC <- rep(1,N)
Hope this helps.

R Optimization given objective function

obj1<-function(monthly.savings,
success,
start.capital,
target.savings,
monthly.mean.return,
monthly.ret.std.dev,
monthly.inflation,
monthly.inf.std.dev,
n.obs,
n.sim=1000){
req = matrix(start.capital, n.obs+1, n.sim) #matrix for storing target weight
monthly.invest.returns = matrix(0, n.obs, n.sim)
monthly.inflation.returns = matrix(0, n.obs, n.sim)
monthly.invest.returns[] = rnorm(n.obs * n.sim, mean = monthly.mean.return, sd = monthly.ret.std.dev)
monthly.inflation.returns[] = rnorm(n.obs * n.sim, mean = monthly.inflation, sd = monthly.inf.std.dev)
#for loop to be
for (a in 1:n.obs){
req[a + 1, ] = req[a, ] * (1 + monthly.invest.returns[a,] - monthly.inflation.returns[a,]) + monthly.savings
}
ending.values=req[nrow(req),]
suc<-sum(ending.values>target.savings)/n.sim
value<-success-suc
return(abs(value))
}
I have the above objective function that I want to minimize for. It tries to solve for the monthly savings required for a given probability of success. Given the following input assumptions
success<-0.9
start.capital<-1000000
target.savings<-1749665
monthly.savings=10000
monthly.mean.return<-(5/100)/12
monthly.ret.std.dev<-(3/100)/sqrt(12)
monthly.inflation<-(5/100)/12
monthly.inf.std.dev<-(1.5/100)/sqrt(12)
monthly.withdrawals<-10000
n.obs<-10*12 #years * 12 months in a year
n.sim=1000
I used the following notation:
optimize(f=obj1,
success=success,
start.capital=start.capital,
target.savings=target.savings,
monthly.mean.return=monthly.mean.return,
monthly.ret.std.dev=monthly.ret.std.dev,
monthly.inflation=monthly.inflation,
monthly.inf.std.dev=monthly.inf.std.dev,
n.obs = n.obs,
n.sim = n.sim,
lower = 0,
upper = 10000,
tol = 0.000000001,maximum=F)
I get 7875.03
Since I am sampling from a normal distribution, the output will be different each time but they should be around the same give or take a few % points. The problem I am having is that I can't specify a upper limit arbitrarily. The above example's upper limit (10000) is cherry picked after numerous trials. If say I put in a upper limit of 100000 (unreasonable I know) it will return that number as oppose to finding the global minimum saving. Any ideas where I am structuring my objective function incorrectly?
thanks,
The fact that your function does not always return the same output for a given input
is likely to pose a few problems (it will create a lot of spurious local minima):
you can avoid them by setting the seed of the random number generator
inside the function (e.g., set.seed(1)),
or by storing the random numbers and reusing them each time,
or by using a low-discrepancy sequence (e.g., randtoolbox::sobol).
Since it is a function of one variable, you can simply plot it to see what happens:
it has a plateau after 10,000 -- optimization algorithms cannot distinguish
between a plateau and a local optimum.
f <- function(x) {
set.seed(1)
obj1(x,
success = success,
start.capital = start.capital,
target.savings = target.savings,
monthly.mean.return = monthly.mean.return,
monthly.ret.std.dev = monthly.ret.std.dev,
monthly.inflation = monthly.inflation,
monthly.inf.std.dev = monthly.inf.std.dev,
n.obs = n.obs,
n.sim = n.sim
)
}
g <- Vectorize(f)
curve(g(x), xlim=c(0, 20000))
Your initial problem is actually not a minimization problem,
but a root finding problem, which is much easier.
obj2 <- function(monthly.savings) {
set.seed(1)
req = matrix(start.capital, n.obs+1, n.sim)
monthly.invest.returns <- matrix(0, n.obs, n.sim)
monthly.inflation.returns <- matrix(0, n.obs, n.sim)
monthly.invest.returns[] <- rnorm(n.obs * n.sim, mean = monthly.mean.return, sd = monthly.ret.std.dev)
monthly.inflation.returns[] <- rnorm(n.obs * n.sim, mean = monthly.inflation, sd = monthly.inf.std.dev)
for (a in 1:n.obs)
req[a + 1, ] <- req[a, ] * (1 + monthly.invest.returns[a,] - monthly.inflation.returns[a,]) + monthly.savings
ending.values <- req[nrow(req),]
suc <- sum(ending.values>target.savings)/n.sim
success - suc
}
uniroot( obj2, c(0, 1e6) )
# [1] 7891.187

R: Complex numbers not compatible with boot() function

I found out that boot function of the boot package is not working with complex numbers. I am trying to bootstrap a data by taking the eigenvalue of the bivariate matrix. The problem with the eigenvalue is that, it often returns complex numbers, and by that it (boot) gives error. Is there a way to avoid complex numbers?
Here is my codes,
Data <- read.table('http://ubuntuone.com/6n1igcHXq4EnOm4x2zeqFb', header = FALSE)
Mat <- cbind(Data[["V1"]],Data[["V2"]])
Data.ts <- as.ts(Mat)
Below are some functions needed,
library(mvtnorm)
var1.sim <- function(T, n.start=100, phi1=matrix(c(0.7,0.2,0.2,0.7),nr=2),
err.mu=c(0,0), err.sigma2=matrix(c(1,0.5,0.5,1), nr=2),
errors=NULL) {
e <- rmvnorm(n.start + T, err.mu, err.sigma2) # (n.start+T) x 2 matrix
y <- matrix(0, nrow=n.start+T, ncol=2)
if (!is.null(errors) && is.matrix(errors) && ncol(errors) == 2) {
rows <- nrow(errors)
if (rows < n.start + T) {
# replace last nrow(errors) errors
e[seq.int(n.start+T-rows+1,n.start+T),] <- errors
} else {
e <- errors[seq.int(n.start+T+1, rows)]
}
}
for (t in seq.int(2, n.start + T)) {
y[t,] <- phi1 %*% y[t-1,] + e[t,]
}
return(ts(y[seq.int(n.start+1,n.start+T),]))
}
########
coef.var1 <- function(var.fit) {
k <- coef(var.fit)
rbind(k[[1]][,"Estimate"], k[[2]][,"Estimate"])
}
And here is the main method,
library(vars)
library(boot)
y.var <- VAR(Data.ts, p=1, type="none")
y.resid <- resid(y.var)
rm(y.boot)
y.boot <- boot(y.resid, R=100, statistic=function(x,i) {
resid.boot <- x[i,]
y.boot1 <- var1.sim(T=nrow(x), errors=resid.boot)
min(eigen(coef.var1(VAR(y.boot1, p=1, type="none")))$values)
}, stype="i")
y.boot$t
y.ci <- boot.ci(y.boot, type="norm", conf=0.95)$normal[2:3]
list(t=y.boot$t,ci=y.ci)
The problem occurs in y.boot object, particularly this line
min(eigen(coef.var1(VAR(y.boot1, p=1, type="none")))$values)
When the obtain minimum eigenvalue is complex, then boot will return this error
Error in min(eigen(coef.var1(VAR(y.boot1, p = 1, type = "none")))$values) :
invalid 'type' (complex) of argument
Otherwise, there is no problem. Now, it would be safe if this 100 bootstraps is performed once, but I am going to loop this actually about 100 times too. So, there is a big chance that complex values will occur in these loops. Hence, we will obtain the above error again.
Is there a way to avoid these complex values?

Resources