Implementing OLS in matrix form - julia

I'm having problems implementing this exercise from a quantitative economics course.
Here's my code:
N = 50
M = 20
a = 0.1
b = 0.2
c = 0.5
d = 1.0
σ = 0.1
estimates = zeros(M, 5)
for i ∈ 1:M
x₁ = Vector{BigFloat}(randn(N))
x₂ = Vector{BigFloat}(randn(N))
w = Vector{BigFloat}(randn(N))
# Derive y vector (element wise operations)
y = a*x₁ .+ b.*(x₁.^2) .+ c.*x₂ .+ d .+ σ.*w
# Derive X matrix
X = [x₁ x₁ x₂ fill(d, (N, 1)) w]
# Implementation of the formula β = inv(XᵀX)Xᵀy
estimates[i, :] = (X'*X)\X'*y
end
histogram(estimates, layout=5, labels=["a", "b", "c", "d", "σ"])
I get a SingularException(5) error, as the matrix X'X has a determinant of 0 and has no inverse. My question is, where have I gone wrong in this exercise? I heard that a reason the determinant might be zero is floating point inaccuracy, so I made the random variables BigFloats to no avail. I know the mistake I'm making isn't very complicated but I'm lost. Thank you!

Your X should be
X = [x₁ x₁.^2 x₂ fill(d, (N, 1))]
Explanation
It looks like you are trying to use OLS to estimate the parameters of the model:
y = α₀ + α₁x₁ + α₁₁x₁² + α₂x₂ + ϵ
where α₀ is the intercept of the model, α₁, α₁₁, α₂ are the coefficients of the explanatory variables, and ϵ is the random error with expected value 0 and variance σ². Hence the structure of X must match this model.
By putting the x₁ column in X twice you introduced perfect collinearity, which makes X'X singular and produces the error.
You also do not want to "estimate" a parameter for ϵ, because it represents the randomness, so w should not be a column of X. With four columns in X, estimates needs four columns as well: zeros(M, 4).
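To make the collinearity point concrete, here is a minimal sketch in Python (standard library only, not the Julia code from the question). The helper names solve and ols are made up for this illustration. It solves the normal equations (X'X)β = X'y once with the columns x₁, x₁², x₂, 1 and once with x₁ repeated, under the same parameter values as in the question.

```python
import random

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """Return beta solving the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

random.seed(0)
N = 50
a, b, c, d, sigma = 0.1, 0.2, 0.5, 1.0, 0.1
x1 = [random.gauss(0, 1) for _ in range(N)]
x2 = [random.gauss(0, 1) for _ in range(N)]
w = [random.gauss(0, 1) for _ in range(N)]
y = [a * u + b * u**2 + c * v + d + sigma * e for u, v, e in zip(x1, x2, w)]

# Design matrix matching the model: x1, x1^2, x2 and a column of ones.
X_good = [[u, u**2, v, 1.0] for u, v in zip(x1, x2)]
beta = ols(X_good, y)  # estimates of a, b, c, d

# Design matrix with x1 repeated: two identical columns make X'X singular.
X_bad = [[u, u, v, 1.0] for u, v in zip(x1, x2)]
try:
    ols(X_bad, y)
    singular = False
except ValueError:
    singular = True
```

Switching to higher precision does not help in the second case, because the two columns are exactly equal and the rank deficiency is structural rather than a rounding effect.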

Related

Moment Matching Scenario Generation in R

I am working on a portfolio optimization algorithm, and part of the problem consists of generating moment-matching scenarios.
Because of its simplicity and speed, I chose to follow the paper "An algorithm for moment-matching scenario generation with application to financial portfolio optimization" (Ponomareva, Roman and Date).
The problem is that even though the mathematics is very simple, some of the probability weights p_i come out negative, although the formulas in the paper should ensure otherwise. If I add a loop that reruns the algorithm until it finds an all-positive combination, it essentially runs forever.
Here is the part of the code, based on the paper, where things get stuck:
dummy1 = 0
while (dummy1 <=0 | dummy1 >= 1) {
dummy1 = round(rnorm(1, mean = 0.5, sd = 0.25), 2)
}
diag.cov.returns = diag(cov.returns)
Z = dummy1 * sqrt (diag.cov.returns) #Vector Z according to paper formula
ZZT = Z %*% t(Z)
LLT = cov.returns - ZZT
L = chol(LLT) #cholesky decomposition to get matrix L
s = sample (1:5, 1)
F1 = 0
F2 = -1
S = (2*N*s)+3
while (((4*F2)-(3*F1*F1)) < 0) {
#Gamma = (2*s*s)*(((N*mean.fourth) - (0.75*(sum(Z^4)* (N*mean.third/sum(Z^3))^2)))/sum(L^4))
#Gamma is necessary if we want to get p from Uniform Distribution
#U = runif(s, 0, 1)
U = rgamma(s, shape = 1, scale = ((1/exp(1)):1))
#p = (s*(N/Gamma)) + ((1/(2*N*s)) - (s/(N*Gamma)))*U
p = (-log(U, base = exp(1)))
p = p/(((2*sum(p))+max(p))*N*s) #this array is expected to be positive and bounded between 0 and 1
q1 = 1/p
pz = p
p[s+1] = (1-(2*N*sum(p))) #extra point necessary to get the 3 moment-matching probabilities
F1 = (N*mean.third*sqrt(p[s+1]))/(sum(Z^3))
F2 = p[s+1]*(((N*mean.fourth) - (1/(2*s*s))*sum(L^4)*(sum(1/p)))/sum(Z^4))
}
alpha = (0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
beta = -(0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
w1 = 1/(alpha*(alpha+beta))
w2 = 1/(beta*(alpha+beta))
w0 = 1 - (1/(alpha*beta))
P = rep(pz, 2*N) #Vector with Probabilities starting from p + 3 extra probabilities to match third and fourth moments
P[(2*N*s)+1] = p[s+1]*w0
P[(2*N*s)+2] = p[s+1]*w1
P[(2*N*s)+3] = p[s+1]*w2
Unfortunately I cannot disclose the input dataset containing the funds' returns, but I can be more specific. I start from a data.frame() containing the returns of N assets (in my case there are 11 funds, with monthly returns from 30/01/2001 to 30/09/2020). Once the mean returns, the covariance matrix, the central third and fourth moments (NOT skewness and kurtosis) and their averages are computed, the algorithm proceeds as shown above. The point where I get stuck is that p also takes negative values. This is a problem because the first s elements of p are later used as probabilities in P.
I hope this makes the problem clearer. The paper does report the data used by the authors, but they would have to be typed into R manually. However, any data.frame() containing asset returns will do.
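One thing that may be worth checking, sketched here in Python rather than R: the raw weights are computed as -log(U), which is only non-negative when U lies in (0, 1]. The active line draws U from a gamma distribution, which is not confined to that interval, while the commented-out runif line would be. The sketch assumes that the scale expression (1/exp(1)):1 evaluates to the single value 1/e, and the sample size of 10000 is arbitrary.

```python
import math
import random

random.seed(1)
n_draws = 10000

# Gamma draws with shape 1 and scale 1/e (assumed to match the rgamma call).
u_gamma = [random.gammavariate(1.0, 1.0 / math.e) for _ in range(n_draws)]
# Uniform draws in (0, 1], as the commented-out runif line would give.
u_unif = [1.0 - random.random() for _ in range(n_draws)]

# Raw weights before normalisation, as in p = -log(U).
p_gamma = [-math.log(u) for u in u_gamma]
p_unif = [-math.log(u) for u in u_unif]

n_negative_gamma = sum(1 for p in p_gamma if p < 0)  # draws with U > 1
n_negative_unif = sum(1 for p in p_unif if p < 0)    # none expected
```

If the raw weights are all positive, the normalisation by ((2*sum(p))+max(p))*N*s keeps every element between 0 and 1, so negative raw weights are a plausible source of the negative probabilities. This is a guess from the posted snippet, not a check against the paper.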

How do I solve a SDE with two cases in R?

I want to solve the following stochastic differential equation with R:
\frac{dx}{dt}=f(x)+sigma*dW
f(x)= a+bx+cx^2 (for x \leq 1) f(x)= a+bx (for x > 1)
and
sigma=d^2
where (a, b, c, and d are constants).
I tried using:
f = expression(a + b*x + c*x^2)
s = expression(d^2)
solution <- sde.sim(X0=0.6, t0=0, N=2000, delta=0.01, drift = f, sigma = s )
But how do I include the second case (when x>1)?
Sorry for the poor rendering of the mathematical expressions. I do not know how to write LaTeX here.
Maybe something like this where (x <= 1) would evaluate to 0 or 1 depending on the case.
f = expression(1+ 2 * x + (x <= 1) * 3*x^2)
s = expression(2^2)
solution <- sde.sim(X0=0.6, t0=0, N=2000, delta=0.01, drift = f, sigma = s)
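For readers who want to see what the indicator trick does inside a simulation, here is a minimal Euler-Maruyama sketch in Python. It is not sde.sim itself and makes no claim about how that function is implemented. The constants a = 1, b = 2, c = 3 and sigma = 2^2 are the example values from the expressions above, and the function names are made up for this illustration.

```python
import math
import random

def drift(x, a=1.0, b=2.0, c=3.0):
    """Piecewise drift: a + b*x + c*x^2 for x <= 1, a + b*x for x > 1."""
    return a + b * x + (c * x**2 if x <= 1 else 0.0)

def euler_maruyama(x0, n_steps, delta, sigma, rng):
    """Simulate dX = drift(X) dt + sigma dW on a grid with step delta."""
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(delta))  # Brownian increment over delta
        path.append(x + drift(x) * delta + sigma * dw)
    return path

# Same start value, number of steps and step size as in the question.
path = euler_maruyama(0.6, 2000, 0.01, 4.0, random.Random(42))

# With sigma = 0 the scheme reduces to a plain Euler step of the ODE.
one_step = euler_maruyama(0.6, 1, 0.01, 0.0, random.Random(0))
```

The drift is evaluated at the current state on every step, so the quadratic term switches on and off as the path crosses x = 1.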

Julia: converting CHOLMOD factor to sparse matrix and back again

I have a CHOLMOD factorization of a sparse matrix H, and I want to edit the sparse representation of the upper, lower, and block diagonal factors. How can I do this? When I run the below, the last line doesn't work.
H = sprand(10,10,0.5)
fac = ldltfact(H; shift=0.0)
fD = fac[:D]
D = Base.SparseArrays.CHOLMOD.Sparse(fD)
And is there any way to go in the reverse direction from a sparse matrix to a CHOLMOD.factor?
Extracting the relevant factorization matrices from ldltfact can be a little tedious. The following example is similar to the one in the question and ends with a test that the extracted matrices reproduce the original factorized matrix:
srand(1)
pre = sprand(10,10,0.5)
H = pre + pre' + speye(10,10)
fac = ldltfact(H; shift=0.0)
P = sparse(1:size(H,1),fac[:p],ones(size(H,1)))
LD = sparse(fac[:LD]) # this matrix contains both D and L embedded in it
L = copy(LD)
for i=1:size(L,1)
L[i,i] = 1.0
end
D = sparse(1:size(L,1),1:size(L,1),diag(LD))
PHP = P*H*P'
LDL = L*D*L'
using Base.Test
@test PHP ≈ LDL
The expected output (and actual on Julia v0.6.3):
julia> @test PHP ≈ LDL
Test Passed
Hope this helps.
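The step of splitting the combined LD matrix into a unit lower triangular L and a diagonal D can also be illustrated on a small dense matrix. This Python sketch does not use CHOLMOD: it runs a textbook LDLᵀ factorization without pivoting (so the permutation P is the identity) on a hypothetical 3×3 symmetric positive definite matrix, and all function names are made up for the illustration.

```python
def ldlt(A):
    """Dense LDL^T without pivoting. Returns one combined matrix LD:
    the strict lower triangle holds L, the diagonal holds D."""
    n = len(A)
    LD = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(LD[j][k] ** 2 * LD[k][k] for k in range(j))
        LD[j][j] = d
        for i in range(j + 1, n):
            s = sum(LD[i][k] * LD[j][k] * LD[k][k] for k in range(j))
            LD[i][j] = (A[i][j] - s) / d
    return LD

def split_ld(LD):
    """Split LD into unit lower triangular L and diagonal D."""
    n = len(LD)
    L = [[1.0 if i == j else (LD[i][j] if i > j else 0.0) for j in range(n)]
         for i in range(n)]
    D = [[LD[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return L, D

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

H = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
LD = ldlt(H)
L, D = split_ld(LD)
LDL = matmul(matmul(L, D), transpose(L))  # should reproduce H
```

As in the Julia example, the diagonal of LD is moved into D and replaced by ones in L before the product L*D*L' is compared with the original matrix.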

How does ar.yw estimate the variance

In R, how does the function ar.yw estimate the variance? Specifically, where does the number "var.pred" come from? It does not seem to come from the usual YW estimate of the variance, nor the sum of squared residuals divided by df (even though there is disagreement about what the df should be, none of the choices give an answer equivalent to var.pred). And yes, I know that there are better methods than YW; just trying to figure out what R is doing.
set.seed(82346)
temp <- arima.sim(n=10, list(ar = 0.5), sd=1)
fit <- ar(temp, method = "yule-walker", demean = FALSE, aic=FALSE, order.max=1)
## R's estimate of the sigma squared
fit$var.pred
## YW estimate
sum(temp^2)/10 - fit$ar*sum(temp[2:10]*temp[1:9])/10
## YW if there was a mean
sum((temp-mean(temp))^2)/10 - fit$ar*sum((temp[2:10]-mean(temp))*(temp[1:9]-mean(temp)))/10
## estimate based on residuals, different possible df.
sum(na.omit(fit$resid^2))/10
sum(na.omit(fit$resid^2))/9
sum(na.omit(fit$resid^2))/8
sum(na.omit(fit$resid^2))/7
You need to read the code if it's not documented.
?ar.yw
Which says: "In ar.yw the variance matrix of the innovations is computed from the fitted coefficients and the autocovariance of x." If that is not enough explanation, then you need to look at the code:
methods(ar.yw)
#[1] ar.yw.default* ar.yw.mts*
#see '?methods' for accessing help and source code
getAnywhere(ar.yw.default)
# there are two cases that I see
x <- as.matrix(x)
nser <- ncol(x)
if (nser > 1L) # .... not your situation
#....
else{
r <- as.double(drop(xacf))
z <- .Fortran(C_eureka, as.integer(order.max), r, r,
coefs = double(order.max^2), vars = double(order.max),
double(order.max))
coefs <- matrix(z$coefs, order.max, order.max)
partialacf <- array(diag(coefs), dim = c(order.max, 1L,
1L))
var.pred <- c(r[1L], z$vars)
#.......
order <- if (aic)
(0L:order.max)[xaic == 0L]
else order.max
ar <- if (order)
coefs[order, seq_len(order)]
else numeric()
var.pred <- var.pred[order + 1L]
var.pred <- var.pred * n.used/(n.used - (order + 1L))
So you now need to find the Fortran code for C_eureka. I think I found it here: https://svn.r-project.org/R/trunk/src/library/stats/src/eureka.f This is the code that I think returns the var.pred estimate. I'm not a time series guy, and it's your responsibility to review this process for applicability to your problem.
subroutine eureka (lr,r,g,f,var,a)
c
c solves Toeplitz matrix equation toep(r)f=g(1+.)
c by Levinson's algorithm
c a is a workspace of size lr, the number
c of equations
c
snipped
c estimate the innovations variance
var(l) = var(l-1) * (1 - f(l,l)*f(l,l))
if (l .eq. lr) return
d = 0.0d0
q = 0.0d0
do 50 i = 1, l
k = l-i+2
d = d + a(i)*r(k)
q = q + f(l,i)*r(k)
50 continue
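Putting the two excerpts together for a single series of order 1, the computation can be sketched in Python as follows. This is a reading of the code shown above, not a port of R: it assumes that r holds the sample autocovariances with divisor n and no mean removal (demean = FALSE), and it uses a made-up series instead of the arima.sim output.

```python
def yw_ar1_var_pred(x):
    """AR(1) Yule-Walker fit without demeaning, following the excerpts above."""
    n = len(x)
    c0 = sum(v * v for v in x) / n                           # lag-0 autocovariance
    c1 = sum(x[t] * x[t - 1] for t in range(1, n)) / n       # lag-1 autocovariance
    phi = c1 / c0                                            # Yule-Walker coefficient
    # Levinson recursion, first step: var(1) = r(1) * (1 - f(1,1)^2)
    var_levinson = c0 * (1.0 - phi * phi)
    order = 1
    # Rescaling from the R code: var.pred * n.used / (n.used - (order + 1))
    var_pred = var_levinson * n / (n - (order + 1))
    return c0, c1, phi, var_levinson, var_pred

# Hypothetical series of length 10, for illustration only.
series = [0.3, -0.5, 1.2, 0.8, -0.1, -0.9, 0.4, 1.1, 0.2, -0.6]
c0, c1, phi, var_levinson, var_pred = yw_ar1_var_pred(series)
```

Under these assumptions, c0 * (1 - phi^2) equals c0 - phi * c1, which is the "YW estimate" line in the question, so var.pred would be that value multiplied by n/(n - 2), i.e. 10/8 for a series of length 10.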

predict points on grid over time?

I have a, hopefully, simple question. I'm using Nuke to do a linear animation and I have 2 points.
point1 @ frame 1 is (5,90)
point2 @ frame 10 is (346,204)
Using linear interpolation, I want to figure out where the x and y point is at frame 30.
The way I tried is using the slope formula and then finding the y-intercept.
m = (204 - 90) / (346 - 5)
m = 114/341 = .3343
then I got the intercept by:
Y = Mx + b
90 = .3343(5) + b
90 = 1.6715 + b
88.3285 = b
so...I got the formula for my line. y = .3343X + 88.3285
Can someone help me figure out where the point is going to be at any given frame?
If you'd please refer to the image attached... you can see image of my graph.
I guess the problem I'm having is relating the time to the coord points.
Thanks
Just consider x as a function of time (t).
Here's some coordinates:
(t, x)
(1, 5)
(10, 346)
and some calculation of the line equation:
x = mt+b
m = (346-5) / (10-1)
m = 341/9
b = 5 - (341/9)*1
b = - 296/9
x = (341t - 296)/9
And using my formula (t -> x) and your formula (x -> y), I can calculate where things are at t=30
t = 30
x = 1103 + 7/9
y = 457.3214
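The same calculation can be written as one function of the frame number by interpolating x and y separately. A minimal Python sketch with the two keyframes from the question as defaults (the function name position_at is made up here):

```python
def position_at(frame, f1=1, p1=(5.0, 90.0), f2=10, p2=(346.0, 204.0)):
    """Linear interpolation (or extrapolation) of a 2D point over frames."""
    u = (frame - f1) / (f2 - f1)  # 0 at frame f1, 1 at frame f2
    x = p1[0] + u * (p2[0] - p1[0])
    y = p1[1] + u * (p2[1] - p1[1])
    return x, y

x30, y30 = position_at(30)
```

At frame 30 this gives x = 9934/9 ≈ 1103.78 and y ≈ 457.33. The small difference from 457.3214 comes from the slope being rounded to .3343 in the hand calculation.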
