I have a CHOLMOD factorization of a sparse matrix H, and I want to edit the sparse representation of the upper, lower, and block diagonal factors. How can I do this? When I run the code below, the last line doesn't work.
H = sprand(10,10,0.5)
fac = ldltfact(H; shift=0.0)
fD = fac[:D]
D = Base.SparseArrays.CHOLMOD.Sparse(fD)
Is there also a way to go in the reverse direction, from a sparse matrix to a CHOLMOD.Factor?
Extracting the relevant factor matrices from an ldltfact result can be a little tedious. The following example is similar to the one in the question and ends with a test that the extracted matrices recover the original (permuted) matrix:
srand(1)
pre = sprand(10,10,0.5)
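H = pre + pre' + speye(10,10) # make H symmetric, since ldltfact needs a symmetric matrix
fac = ldltfact(H; shift=0.0)
P = sparse(1:size(H,1),fac[:p],ones(size(H,1))) # permutation matrix for the fill-reducing ordering fac[:p]
LD = sparse(fac[:LD]) # this matrix contains both D and L embedded in it
L = copy(LD)
for i=1:size(L,1)
    L[i,i] = 1.0 # L has a unit diagonal; LD stores D in those slots
end
D = sparse(1:size(L,1),1:size(L,1),diag(LD)) # diagonal matrix D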
PHP = P*H*P'
LDL = L*D*L'
using Base.Test
@test PHP ≈ LDL
The expected (and actual, on Julia v0.6.3) output:
julia> @test PHP ≈ LDL
Test Passed
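For readers without Julia 0.6 at hand, here is a small Python sketch (standard library only) of the same bookkeeping. It uses a textbook dense LDLᵀ factorization without pivoting that stores D on the diagonal and the strict lower part of L below it, as fac[:LD] does. It then splits that combined storage into a unit lower-triangular L and a diagonal D. This is not CHOLMOD's algorithm. CHOLMOD also applies the fill-reducing permutation fac[:p], which is why the Julia test compares P*H*P' rather than H. The 3×3 matrix H is an arbitrary example, and the helper names are illustrative.

```python
def ldl_combined(A):
    """Dense LDL^T without pivoting. Returns one matrix whose diagonal holds D
    and whose strict lower triangle holds L (unit diagonal implied)."""
    n = len(A)
    LD = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # D_j = A_jj - sum_k L_jk^2 * D_k
        d = A[j][j] - sum(LD[j][k] ** 2 * LD[k][k] for k in range(j))
        LD[j][j] = d
        for i in range(j + 1, n):
            # L_ij = (A_ij - sum_k L_ik * L_jk * D_k) / D_j
            s = A[i][j] - sum(LD[i][k] * LD[j][k] * LD[k][k] for k in range(j))
            LD[i][j] = s / d
    return LD

def split_ld(LD):
    """Split combined storage into unit lower-triangular L and diagonal D."""
    n = len(LD)
    L = [[1.0 if i == j else (LD[i][j] if i > j else 0.0) for j in range(n)] for i in range(n)]
    D = [[LD[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return L, D

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

H = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
L, D = split_ld(ldl_combined(H))
LDLt = matmul(matmul(L, D), transpose(L))
print(max(abs(LDLt[i][j] - H[i][j]) for i in range(3) for j in range(3)))  # prints 0.0
print([D[i][i] for i in range(3)])  # [4.0, 4.0, 2.75]
```

The diagonal-reset loop in the Julia code plays the role of split_ld here. The diagonal slots of the combined matrix hold D, and L's own diagonal is implicitly all ones.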
Hope this helps.
Related
I am working on a portfolio optimization algorithm, and part of the problem consists of generating moment-matching scenarios.
Because of its simplicity and speed, I chose to follow the paper "An algorithm for moment-matching scenario generation with application to financial portfolio optimization" (Ponomareva, Roman and Date).
The problem is that, even though the mathematics is very simple, some of the probability weights p_i come out negative, although the formulas in the paper should ensure otherwise. If I add a loop that reruns the algorithm until it finds a positive combination, it essentially runs forever.
Here is the part of the code, based on the paper, where things get stuck:
dummy1 = 0
while (dummy1 <=0 | dummy1 >= 1) {
dummy1 = round(rnorm(1, mean = 0.5, sd = 0.25), 2)
}
diag.cov.returns = diag(cov.returns)
Z = dummy1 * sqrt (diag.cov.returns) #Vector Z according to paper formula
ZZT = Z %*% t(Z)
LLT = cov.returns - ZZT
L = chol(LLT) #cholesky decomposition to get matrix L
s = sample (1:5, 1)
F1 = 0
F2 = -1
S = (2*N*s)+3
while (((4*F2)-(3*F1*F1)) < 0) {
#Gamma = (2*s*s)*(((N*mean.fourth) - (0.75*(sum(Z^4)* (N*mean.third/sum(Z^3))^2)))/sum(L^4))
#Gamma is necessary if we want to get p from Uniform Distribution
#U = runif(s, 0, 1)
U = rgamma(s, shape = 1, scale = ((1/exp(1)):1))
#p = (s*(N/Gamma)) + ((1/(2*N*s)) - (s/(N*Gamma)))*U
p = -log(U) #natural log; note that -log(U) < 0 whenever U > 1, which rgamma() can return
p = p/(((2*sum(p))+max(p))*N*s) #this array is expected to be positive and bounded between 0 and 1
q1 = 1/p
pz = p
p[s+1] = (1-(2*N*sum(p))) #extra point needed to get the 3 moment-matching probabilities
F1 = (N*mean.third*sqrt(p[s+1]))/(sum(Z^3))
F2 = p[s+1]*(((N*mean.fourth) - (1/(2*s*s))*sum(L^4)*(sum(1/p)))/sum(Z^4))
}
alpha = (0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
beta = -(0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
w1 = 1/(alpha*(alpha+beta))
w2 = 1/(beta*(alpha+beta))
w0 = 1 - (1/(alpha*beta))
P = rep(pz, 2*N) #Vector with Probabilities starting from p + 3 extra probabilities to match third and fourth moments
P[(2*N*s)+1] = p[s+1]*w0
P[(2*N*s)+2] = p[s+1]*w1
P[(2*N*s)+3] = p[s+1]*w2
Unfortunately I cannot disclose the input dataset containing the funds' returns, but I can be more specific. I start from a data.frame() containing the returns of N assets (in my case 11 funds, with monthly returns from 30/01/2001 to 30/09/2020). I first compute the mean returns, the covariance matrix, the central third and fourth moments (NOT skewness and kurtosis) and their averages. The algorithm then proceeds as reported in the question. Where I get stuck is that p also takes negative values. This is a problem because the first s elements of p are later used as probabilities in P.
I hope this makes the problem clearer. I should also add that the paper reports the data the authors used, but importing it into R would mean entering it manually. Again, any data.frame() containing asset returns will do.
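One thing worth checking in the code above is that -log(U) is positive only when 0 < U < 1. The commented-out line draws U with runif(s, 0, 1). The active line instead uses rgamma(s, shape = 1, scale = (1/exp(1)):1). That scale expression evaluates to the single value 1/e, and such draws can exceed 1, so any draw above 1 makes an entry of p negative. The Python sketch below is not the paper's algorithm. It only re-implements the two normalization lines, with N = 11 and s = 5 chosen for illustration. It shows that positive raw draws always give entries in (0, 1) and a positive extra weight p[s+1]. Separately, a sizable share of gamma(shape 1, scale 1/e) draws exceed 1.

```python
import math
import random

def normalize(raw, N, s):
    """Mirror of the R line p = p/(((2*sum(p))+max(p))*N*s)."""
    scale = (2 * sum(raw) + max(raw)) * N * s
    return [x / scale for x in raw]

def extra_weight(p, N):
    """Mirror of the R line p[s+1] = 1 - 2*N*sum(p)."""
    return 1 - 2 * N * sum(p)

rng = random.Random(0)
N, s = 11, 5  # illustrative values

# Uniform draws on (0, 1): -log(U) is always positive.
U_unif = [rng.random() for _ in range(s)]
p_unif = normalize([-math.log(u) for u in U_unif], N, s)

# Gamma draws with shape 1 and scale 1/e, as in the rgamma() call:
# values above 1 occur, and for those -log(U) is negative.
U_gamma = [rng.gammavariate(1.0, 1 / math.e) for _ in range(10000)]
share_above_one = sum(u > 1 for u in U_gamma) / len(U_gamma)

print(all(0 < x < 1 for x in p_unif), extra_weight(p_unif, N) > 0)  # True True
print(share_above_one)  # close to exp(-e), about 0.066, for this many draws
```

With positive draws the guarantee is algebraic. 2N·sum(p) after normalization equals 2·sum/((2·sum + max)·s), which is below 1/s. This suggests the negative weights may come from the choice of U rather than from the formulas.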
I'm having problems implementing this exercise from a quantitative economics course.
Here's my code:
N = 50
M = 20
a = 0.1
b = 0.2
c = 0.5
d = 1.0
σ = 0.1
estimates = zeros(M, 5)
for i ∈ 1:M
x₁ = Vector{BigFloat}(randn(N))
x₂ = Vector{BigFloat}(randn(N))
w = Vector{BigFloat}(randn(N))
# Derive y vector (element wise operations)
y = a*x₁ .+ b.*(x₁.^2) .+ c.*x₂ .+ d .+ σ.*w
# Derive X matrix
X = [x₁ x₁ x₂ fill(d, (N, 1)) w]
# Implementation of the formula β = inv(XᵀX)Xᵀy
estimates[i, :] = (X'*X)\X'*y
end
histogram(estimates, layout=5, labels=["a", "b", "c", "d", "σ"])
I get a SingularException(5) error, because the matrix X'X has a determinant of 0 and has no inverse. My question is, where have I gone wrong in this exercise? I had heard that floating-point inaccuracy can make the determinant zero, so I made the random variables BigFloats, to no avail. I know the mistake I'm making isn't very complicated, but I'm lost. Thank you!
Your X should be
X = [x₁ x₁.^2 x₂ fill(d, (N, 1))]
Explanation
It looks like you are trying to use OLS to estimate the parameters of the model:
y = α₀ + α₁x₁ + α₁₁x₁² + α₂x₂ + ϵ
where α₀ is the intercept of the model, α₁, α₁₁ and α₂ are the parameters of the explanatory variables, and ϵ is the random error with expected value 0 and variance σ². Hence the columns of X must match these regressors: x₁, x₁², x₂ and a constant.
By putting x₁ into X twice, you introduced perfect collinearity (two identical columns), so X'X is singular and you got the SingularException.
You also do not want to "estimate" a parameter for ϵ (the w column), because it represents the randomness. With four columns in X, estimates should then be zeros(M, 4), and the histogram needs four labels.
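To see why the original design fails and the corrected one works, here is a pure-Python sketch (not Julia; the helper names solve and ols are illustrative). It simulates the same model with N = 50 and the question's a, b, c, d and σ, then solves the normal equations (X'X)β = X'y by Gaussian elimination. With x₁ duplicated, X'X is singular and elimination hits a numerically zero pivot. With the columns x₁, x₁², x₂ and a constant, the estimates land close to (a, b, c, d).

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting; raises on a (near-)singular A."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-9:
            raise ValueError("singular matrix at column %d" % col)
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c_ in range(col, n + 1):
                M[r][c_] -= f * M[col][c_]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Solve the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

rng = random.Random(1)
N, a, b, c, d, sigma = 50, 0.1, 0.2, 0.5, 1.0, 0.1
x1 = [rng.gauss(0, 1) for _ in range(N)]
x2 = [rng.gauss(0, 1) for _ in range(N)]
w = [rng.gauss(0, 1) for _ in range(N)]
y = [a * u + b * u**2 + c * v + d + sigma * e for u, v, e in zip(x1, x2, w)]

# Original design: x1 appears twice (and w is included), so X'X is singular.
X_bad = [[u, u, v, d, e] for u, v, e in zip(x1, x2, w)]
try:
    ols(X_bad, y)
    bad_failed = False
except ValueError:
    bad_failed = True

# Corrected design: x1, x1^2, x2 and a constant column.
X_good = [[u, u**2, v, 1.0] for u, v in zip(x1, x2)]
beta = ols(X_good, y)
print(bad_failed, [round(t, 2) for t in beta])
```

The BigFloat conversion cannot help here. Two identical columns make X'X singular in exact arithmetic, not just in floating point.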
Hi everybody,
I have a function to optimize, subject to linear constraints.
I am using the maxLik R package, but it is a wrapper for various methods, so what I am actually running is constrOptim.
The problem is the following: I have a constraint matrix of size n^2 x 2n, with n ~ 10^3, so the matrix is huge and the routine stops because of memory problems:
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
It seemed natural to switch to sparse matrices with the Matrix package (my matrix is indeed very sparse), but I always get the following error
Error: Matrices must have same dimensions in ineqA * gi.old
even for small n.
Does it mean that sparseMatrix is not supported in constrOptim?
Do you know any way out?
Reproducible example
You can find the dataset I am using here:
http://konect.uni-koblenz.de/downloads/extraction/opsahl.tar.bz2
and here is the code:
#read edgelist
edgelist <- read.table('out.opsahl-usairport',skip=2)
colnames(edgelist) = c('V1','V2','weight')
require(igraph)
library(Matrix) # Diagonal, sparseMatrix, cBind, rBind
library(maxLik) # maxLik, maxControl
g = graph_from_data_frame(edgelist)
s_in = strength(g,v=V(g), mode= 'in')
s_out = strength(g,v=V(g),mode='out')
n = length(s_in)
# optimization function
objective_fun = function(x){
theta_out = x[1:(length(x)/2)]; theta_in = x[(length(x)/2+1):length(x)];
llikelihood(s_out,s_in,theta_out,theta_in)
}
llikelihood = function(s_out,s_in,theta_out, theta_in){
theta_sum_mat = outer(theta_out,rep(1,length(theta_out))) + outer(rep(1,length(theta_in)),theta_in)
theta_sum_mat = log(1-exp(-theta_sum_mat))
diag(theta_sum_mat) = 0 # avoid self loops
f = -sum(s_out*theta_out+s_in*theta_in) + sum(theta_sum_mat)
f
}
#choose appropriate starting point
starting_point = function(s_out,s_in){
s_tot = sum(s_in) # =sum(s_out)
s_mean = mean(c(mean(s_in), mean(s_out))) # mean(a, b) would treat b as the 'trim' argument
z = log((s_tot + s_mean^2)/(s_mean^2))
list(theta_out = rep(1,length(s_out)), theta_in=rep(z-1,length(s_in))) # starting parameters
}
#gradient
grad = function(x){
theta_out = x[1:(length(x)/2)]; theta_in = x[(length(x)/2+1):length(x)];
ret = grad_fun(s_out,s_in,theta_out,theta_in)
ret
}
grad_fun = function(s_out,s_in, theta_out, theta_in){
theta_sum_mat = outer(theta_out,rep(1,length(theta_out))) + outer(rep(1,length(theta_in)),theta_in)
theta_sum_mat = exp(-theta_sum_mat)/(1-exp(-theta_sum_mat))
diag(theta_sum_mat) = 0 # avoid self loops
c(-s_out + rowSums(theta_sum_mat), -s_in + colSums(theta_sum_mat))
}
#constraints
constraints = function(n){
a1 = Diagonal(n); a2 = sparseMatrix(c(1:n),rep(1,n), x=1, dims=c(n,n)) # Diagonal is a sparse diagonal matrix
a12 = cBind(a1,a2)
a12[1,] = 0 # avoid self loops
dd = function(j){
sparseMatrix(c(1:n),rep(j,n), x=rep(1,n), dims=c(n,n))
}
b1 = sparseMatrix(i=1, j=1, x=1, dims=c(n^2,1)) # 1,0,0,... n^2 vector
for(j in c(2:n)) {
a = cBind(Diagonal(n),dd(j))
a[j,]=0 # avoid self loops
a12 = rBind(a12, a)
b1[(j-1)*n+j] = 1 # add 1 to ''self loops'' rows, in order to have the inequality satisfied
}
return(list(A=a12, B=b1))
}
# starting point
theta_0 = starting_point(s_out,s_in)
x_0 = c(theta_0$theta_out, theta_0$theta_in)
#constraints
constr = list(ineqA=constraints(n)$A, ineqB=constraints(n)$B)
# optimization
co = maxControl(printLevel = 1, iterlim=500, tol=1e-4) #tol=1e-8 (def) iterlim=150 (def)
res = maxLik(objective_fun, grad=grad, start=x_0, constraints=constr, control=co)
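The constraints() function above produces n² rows and 2n columns, one column per entry of theta_out and theta_in. Yet each non-self-loop row has only two nonzeros: row (j-1)*n + i pairs theta_out_i with theta_in_j. The Python sketch below (hypothetical helper names, 1-based indices as in R) lists those positions and does the memory arithmetic for n = 1000. That comes to about 16 GB for a dense matrix of doubles, versus roughly two million nonzeros. It only illustrates the size of the problem and does not change how maxLik or constrOptim handle sparse inputs.

```python
def constraint_entries(n):
    """1-based (row, col) positions of the 1s that constraints(n) builds."""
    entries = []
    for j in range(1, n + 1):          # block j corresponds to theta_in_j
        for i in range(1, n + 1):      # row i within the block: theta_out_i
            if i == j:
                continue               # self-loop row is zeroed (a[j,] = 0, a12[1,] = 0)
            row = (j - 1) * n + i
            entries.append((row, i))       # coefficient of theta_out_i
            entries.append((row, n + j))   # coefficient of theta_in_j
    return entries

def dense_bytes(n):
    """Memory for a dense n^2 x 2n matrix of 8-byte doubles."""
    return (n * n) * (2 * n) * 8

small = constraint_entries(3)
print(len(small))               # 12 nonzeros for n = 3 (two per non-self-loop row)
print(dense_bytes(1000) / 1e9)  # 16.0 (GB) if stored densely
print(2 * 1000 * 999)           # 1998000 nonzeros if stored sparsely
```

Read row by row, every non-self-loop constraint involves just theta_out_i + theta_in_j. This is the quantity that must stay positive for log(1-exp(-theta_sum_mat)) in llikelihood() to be defined.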
When we do variable clustering in R with the package ClustOfVar, we typically cut the clustering tree into groups by specifying either h (the height) or k (the desired number of clusters).
It always works fine when we use k (as long as k is less than the number of columns).
But when we cut by h and there are three or more identical columns, there is a good chance of an error. (Please run the following code a few more times if you don't get the error message.)
A = sample(LETTERS[1:4],40,replace = TRUE)
B = A
C = A
fit = hclustvar(X.quali = data.frame(A,B,C))
plot(fit)
cutree(fit,h=0.5)
#Error in cutree(fit, h = 0.5) :
# the 'height' component of 'tree' is not sorted (increasingly)
In the following example, there are only groups of two identical columns, and it never fails.
A = sample(LETTERS[1:4],40,replace = TRUE)
B = A
C = sample(LETTERS[1:2],40,replace = TRUE)
D = C
fit = hclustvar(X.quali = data.frame(A,B,C,D))
plot(fit)
cutree(fit,h=0.5)
If we add one more identical column to the second example:
A = sample(LETTERS[1:4],40,replace = TRUE)
B = A
C = sample(LETTERS[1:2],40,replace = TRUE)
D = C
E = C
fit = hclustvar(X.quali = data.frame(A,B,C,D,E))
plot(fit)
cutree(fit,h=0.5)
#Error in cutree(fit, h = 0.5) :
# the 'height' component of 'tree' is not sorted (increasingly)
Again it fails from time to time.
I think cutting the clustering tree by h is more practical and convenient, because it is more consistent and meaningful.
Is there any workaround that could make it work? Or are there more stable R packages for variable clustering? Thanks.
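One plausible explanation, not confirmed here, starts from the check behind the error message. stats::cutree() refuses to cut by height when tree$height is not non-decreasing. Merges between identical columns all happen at height zero up to floating-point round-off, so a tiny value such as 2.2e-16 followed by 0 could appear. If that is the cause, rounding the heights before cutting might restore the order. In R that would be something like fit$height <- round(fit$height, 10), which is untested with ClustOfVar. The Python sketch below only mirrors the sortedness check, on hypothetical heights.

```python
def is_unsorted(heights):
    """Mirror of R's is.unsorted(): True if any height is smaller than its predecessor."""
    return any(b < a for a, b in zip(heights, heights[1:]))

# Hypothetical merge heights: identical columns should merge at height 0,
# but round-off could leave a tiny positive value before an exact 0.0.
heights = [2.220446049250313e-16, 0.0, 0.35, 0.61]
print(is_unsorted(heights))                          # True: cutting by h would fail
print(is_unsorted([round(h, 10) for h in heights]))  # False after rounding
```

Cutting by k avoids this check entirely, which would be consistent with k always working.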
I am trying to build, in R, a portfolio that is optimized with respect to another (benchmark) portfolio.
I am trying to minimize the objective function
$$\min_w \operatorname{Var}(r_p - r'w_{bm}), \quad r_p = r'w$$
with the constraints
$$ 1_n'w = 1$$
$$w > .005$$
$$w < .8$$
with w being the portfolio weights. There are 10 securities, so I set each benchmark weight to .1.
I know that
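$$\operatorname{Var}(r_p - r'w_{bm}) = \operatorname{Var}(r_p) + \operatorname{Var}(r'w_{bm}) - 2\operatorname{Cov}(r_p, r'w_{bm}) = w'\operatorname{Var}(r)\,w - 2\operatorname{Cov}(r'w, r'w_{bm}) + \text{const}$$
$$= w'\operatorname{Var}(r)\,w - 2\operatorname{Cov}(r, r'w_{bm})'w + \text{const}$$
where $r_p = r'w$ and $\text{const} = \operatorname{Var}(r'w_{bm})$ does not depend on $w$.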
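Up to a constant term, the last expression has the form (1/2)w'Dw - d'w that solve.QP minimizes, with D = 2var(r) and d = 2cov(r, r'w_bm). So I tried to solve this with solve.QP in R, but the constraints are giving me a problem.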
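The derivation above can be checked numerically. The Python sketch below uses simulated returns, and the candidate weights w are hypothetical. It verifies that the sample variance of r'w - r'w_bm equals w'Σw - 2c'w + w_bm'Σw_bm, where Σ is the sample covariance and c = Cov(r, r'w_bm) = Σw_bm. It also checks that solve.QP's objective (1/2)w'Dw - d'w, with D = 2Σ and d = 2c, differs from it only by the constant w_bm'Σw_bm. In the R code, c corresponds to eff, so the non-troubleshooting dvec would presumably be 2 * eff.

```python
import random

def cov_matrix(R):
    """Sample covariance (denominator T - 1) of the columns of R (list of T rows)."""
    T, n = len(R), len(R[0])
    mu = [sum(row[k] for row in R) / T for k in range(n)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in R) / (T - 1)
             for j in range(n)] for i in range(n)]

def variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def quad(u, S, v):
    """Bilinear form u' S v."""
    return sum(u[i] * S[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

rng = random.Random(42)
T, n = 60, 4
R = [[rng.gauss(0.2, 0.15) for _ in range(n)] for _ in range(T)]
w = [0.4, 0.3, 0.2, 0.1]   # hypothetical candidate weights
w_bm = [1 / n] * n         # equal benchmark weights
Sigma = cov_matrix(R)
c = [sum(Sigma[i][j] * w_bm[j] for j in range(n)) for i in range(n)]  # Cov(r, r'w_bm)

# Left side: sample variance of the active return r'w - r'w_bm.
lhs = variance([sum(ri * (wi - bi) for ri, wi, bi in zip(row, w, w_bm)) for row in R])
# Right side: w'Sigma w - 2 c'w + const.
rhs = quad(w, Sigma, w) - 2 * sum(ci * wi for ci, wi in zip(c, w)) + quad(w_bm, Sigma, w_bm)

# solve.QP minimizes (1/2) b'Db - d'b; with D = 2*Sigma and d = 2*c this is
# the tracking variance minus the constant w_bm' Sigma w_bm.
D = [[2 * s_ for s_ in row] for row in Sigma]
qp_obj = 0.5 * quad(w, D, w) - sum(2 * ci * wi for ci, wi in zip(c, w))
print(abs(lhs - rhs) < 1e-12, abs(qp_obj - (lhs - quad(w_bm, Sigma, w_bm))) < 1e-12)
```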
Here is my code:
library(quadprog) # provides solve.QP
# obs (number of observations) and assets (10 securities here) must be defined first
trackport <- array(rnorm(obs * assets, mean = .2, sd = .15), dim = c(obs,
assets)) #simulated asset returns (obs x assets); the benchmark is built from them below
wbm <- matrix(rep(1/assets, assets)) #equal benchmark weights, 1/assets each
Aeq <- t(matrix(rep(1,assets), nrow=assets, ncol = 1)) #row of 1's to sum
#the weights
Beq <- 1 # weights should sum to 1
H = 2*cov(trackport) #times 2 because solve.QP minimizes (1/2) w'Dw - d'w
#multiplies the returns times coefficients to create a vector of returns for
#the benchmark
rbm = trackport %*% wbm
#covariance between each asset's returns and the benchmark returns
eff <- cov(trackport, rbm)
#constraints
Amatrix <- t(matrix(c(Aeq, diag(assets), -diag(assets)), ncol = assets,
byrow = TRUE))
Bvector <- matrix(c(1,rep(.005, assets), rep(.8, assets)))
#solve
solQP3 <- solve.QP(Dmat = H,
dvec = rep(0, assets), #zero linear term: reduces to the min-var portfolio for
#troubleshooting purposes
Amat = Amatrix,
bvec = Bvector,
meq = 1)
The error I am getting is "constraints are inconsistent, no solution!", but I can't find what's wrong with my A matrix.
My (transposed) A matrix looks like this:
[1,1,...,1]
[1,0,...,0]
[0,1,...,0]
...
[0,0,...,1]
[-1,0,...,0]
[0,-1,...,0]
...
[0,0,...,-1]
and my $b_0$ looks like this
[1]
[.005]
[.005]
...
[.005]
[.8]
[.8]
...
[.8]
So I'm not sure why it isn't finding a solution. Could anyone take a look?
I'm not familiar with the package, but I took a quick look at https://cran.r-project.org/web/packages/quadprog/quadprog.pdf, which is apparently what you are using.
solve.QP imposes the constraints t(Amat) %*% b >= bvec, i.e. ≥ inequalities, so your RHS values of .8 should be -0.8. With the -I rows and .8 on the right-hand side, -w ≥ .8 means w ≤ -0.8. So you have been constraining the variables to be ≥ .005 and ≤ -0.8, which of course is not what you want, and is infeasible.
So leave transposed A as is and make
b0:
[1]
[.005]
[.005]
...
[.005]
[-.8]
[-.8]
...
[-.8]
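The sign fix can be checked without quadprog. The Python sketch below is an illustration, not the package's code, and its helper names are hypothetical. It rebuilds the same constraint rows in solve.QP's convention t(Amat) %*% w >= b0, treating the first meq = 1 row as an equality. It then tests the equal-weight portfolio (0.1 each) against both right-hand sides. In R, the corrected vector would be Bvector <- matrix(c(1, rep(.005, assets), rep(-.8, assets))).

```python
def build_constraints(n, lower, upper):
    """Rows of t(Amat) and b0: sum of weights = 1, then w_i >= lower, then -w_i >= -upper."""
    rows, b0 = [[1.0] * n], [1.0]
    for i in range(n):
        rows.append([1.0 if k == i else 0.0 for k in range(n)])
        b0.append(lower)
    for i in range(n):
        rows.append([-1.0 if k == i else 0.0 for k in range(n)])
        b0.append(-upper)
    return rows, b0

def satisfied(rows, b0, w, meq=1, tol=1e-12):
    """First meq rows must hold as equalities, the rest as >= inequalities."""
    vals = [sum(a * x for a, x in zip(row, w)) for row in rows]
    eq_ok = all(abs(v - b) <= tol for v, b in zip(vals[:meq], b0[:meq]))
    ineq_ok = all(v >= b - tol for v, b in zip(vals[meq:], b0[meq:]))
    return eq_ok and ineq_ok

n = 10
w = [1.0 / n] * n                     # equal weights, 0.1 each
rows, b0 = build_constraints(n, 0.005, 0.8)
b0_original = b0[:1 + n] + [0.8] * n  # the question's b0, with +.8
print(satisfied(rows, b0, w), satisfied(rows, b0_original, w))  # True False
```

The equal-weight portfolio passes with -0.8 and fails with +.8, because -0.1 ≥ .8 is false. In fact no weights ≥ .005 can satisfy -w ≥ .8.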