Following this question: how do I get the value of `t` at which my function `h(t)` equals a fixed `epsilon`?
I first compute the eigenvalues and the 500 eigenvectors v of a random matrix G, and then generate 1000 random initial vectors of dimension 500, which I normalize and store in xmats.
#make this example reproducible
set.seed(100001)
n <- 500
#Sample GOE random matrix
A <- matrix(rnorm(n*n, mean=0, sd=1), n, n)
G <- (A + t(A))/sqrt(2*n)
ev <- eigen(G)
l <- ev$values
v <- ev$vectors
#mean vector and covariance matrix of the multivariate normal
mean <- rep(0, n)
var <- diag(n)
#simulate from the n-dimensional multivariate normal distribution
initial <- MASS::mvrnorm(n=1000, mu=mean, Sigma=var) #1000 random vectors
#normalize each vector so the initial data are uniformly distributed on the unit sphere
xmats <- lapply(1:1000, function(i) initial[i, ]/norm(initial[i, ], type="2"))
Then I define the functions used to compute res:
h1t <- function(t, x_0) {
  h10 <- c(x_0 %*% v[, n])
  denom <- vapply(t, function(.t) {
    sum((x_0 %*% v)^2 * exp(-4*(l - l[n]) * .t))
  }, numeric(1L))
  abs(h10) / sqrt(denom)
}
find_t <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  uniroot(function(t) h1t(t, x) - epsilon, range,
          tol = .Machine$double.eps)$root
}
I want to get res:
res <- lapply(xmats, find_t)
However, it throws the following error: Error in uniroot(function(t) h1t(t, x) - epsilon, range, tol = .Machine$double.eps) : f() values at end points not of opposite sign
res is a list; I ran hist(unlist(res)) and it worked well.
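For completeness, here is a minimal sketch of one way to avoid the "opposite sign" error: check whether the endpoints of range actually bracket a root before calling uniroot, and return NA otherwise (find_t_safe is a hypothetical name; epsilon and range are the values used above).
find_t_safe <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  f <- function(t) h1t(t, x) - epsilon
  # only call uniroot when the sign of f changes across the interval
  if (f(range[1]) * f(range[2]) > 0) return(NA_real_)
  uniroot(f, range, tol = .Machine$double.eps)$root
}
res <- lapply(xmats, find_t_safe)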
How can I calculate the OLS coefficients in R without using the function lm?
Formula: ß = (X'X)^-1*X'y
X <- cbind(runif(1000000), rnorm(1000000), rchisq(1000000,50))
y <- 100 * X[,1] + 200 * X[,2] + rnorm(nrow(X), 0, 10)
I would be really grateful for help, since I have no idea how to do this.
It's basic linear algebra for OLS. You might want to have a look at https://en.wikipedia.org/wiki/Linear_regression
set.seed(123)
X <- cbind(runif(1000000), rnorm(1000000), rchisq(1000000,50))
y <- 100 * X[,1] + 200 * X[,2] + rnorm(nrow(X), 0, 10)
# (X'X)^-1*X'y
# basic matrix algebra
solve(t(X) %*% X) %*% (t(X) %*% y)
# crossprod for numeric stability
crossprod(solve(crossprod(X)), crossprod(X,y))
# same in lm()
lm(y~0+X)
If your linear model has an intercept:
x <- cbind(1, X)
# (X'X)^-1*X'y
solve(t(x) %*% x) %*% (t(x) %*% y)
crossprod(solve(crossprod(x)), crossprod(x,y))
lm(y~X)
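As a side note, you can also solve the normal equations directly instead of forming the explicit inverse of X'X, which is generally cheaper and a bit more numerically robust. A minimal sketch using the objects defined above:
# solve (X'X) b = X'y without computing the inverse explicitly
solve(crossprod(X), crossprod(X, y))
# with the intercept column
solve(crossprod(x), crossprod(x, y))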
Here is my version, including gradient descent. Kudos also to this post.
x0 <- c(1,1,1,1,1) # Intercept
x1 <- c(1,2,3,4,5)
x2 <- c(8,4,3,1,8)
x <- as.matrix(cbind(x0,x1,x2))
y <- as.matrix(c(3,7,5,11,14))
# (X'X)^-1 X'y
beta1 = solve(t(x)%*%x) %*% t(x)%*%y
# R's regression command
beta2 = summary(lm(y ~ x[, 2:3]))
# Gradient descent
m <- nrow(y)
grad <- function(x, y, theta) {
  gradient <- (1/m) * (t(x) %*% ((x %*% t(theta)) - y))
  return(t(gradient))
}
# Define gradient descent update algorithm
grad.descent <- function(x, maxit){
  theta <- matrix(c(0, 0, 0), nrow=1) # Initialize the parameters
  alpha = .05 # set learning rate
  for (i in 1:maxit) {
    theta <- theta - alpha * grad(x, y, theta)
  }
  return(theta)
}
# Results without feature scaling
print(grad.descent(x,2000))
beta1
beta2
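As a follow-up sketch (not part of the original answer), the same routine can be run after standardizing the non-intercept columns, with the coefficients mapped back to the original scale afterwards; the object names below (x.scaled, beta.gd, etc.) are new.
# standardize x1 and x2, run the same gradient descent, then back-transform
mu  <- colMeans(x[, 2:3])
sdv <- apply(x[, 2:3], 2, sd)
x.scaled <- x
x.scaled[, 2:3] <- scale(x[, 2:3])
theta.scaled <- grad.descent(x.scaled, 2000)
# coefficients on the original scale
beta.gd <- c(theta.scaled[1] - sum(theta.scaled[2:3] * mu / sdv),
             theta.scaled[2:3] / sdv)
print(beta.gd)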
I'm trying to code from scratch a function that estimates regression coefficients using LASSO with coordinate descent (Gauss-Seidel) and soft thresholding.
My code is the following:
library(MASS)
set.seed(1)
n = 200
p = 200
V = matrix(0.2, p, p)
diag(V) = 1
X = as.matrix(mvrnorm(n, mu = rep(0, p), Sigma = V))
y = X[, 1] + 0.5*X[, 2] + 0.25*X[, 3] + rnorm(n)
X = scale(X)
y = scale(y)
soft_th <- function(b, lambda){
  if (b > lambda){
    return(b - lambda)
  } else if (b < -lambda){
    return(b + lambda)
  } else {
    return(0)
  }
}
myLasso <- function(X, y, lambda=0.3, tol=1e-5, maxitr=100){
  beta_old <- rep(0, p)
  beta_new <- rep(0, p)
  for(i in 1:maxitr){
    beta_old <- beta_new
    for (j in (1:ncol(X)))
    {
      X_j <- X[,j]
      y_pred <- t(X)%*%beta_old
      rho <- t(X_j)%*%(y - y_pred + beta_old[j]*X_j)
      beta_new[j] <- soft_th(rho, 0.7)
    }
    l1 <- sum(abs(beta_old-beta_new))
    print(l1)
    r <- y - t(X)%*%beta_old
    if (l1<tol){
      print('Convergence reached')
      break
    }
  }
}
myLasso(X,y)
The problem that I have is that the L1 norm between beta_old and beta_new increases (a lot!) between each iteration. I'm following what is said in this post:
https://stats.stackexchange.com/questions/123672/coordinate-descent-soft-thresholding-update-operator-for-lasso/351134#351134
I think I'm somehow not implementing the coordinate descent update rule correctly.
Any help would be appreciated. Thanks in advance.
I was doing more research, and it seems that I didn't normalise the X matrix. After adding X <- X/norm(X,type='2') after defining X, the problem was solved.
Now a new problem is that this function does not replicate the results of the glmnet implementation of LASSO regression. What could it be? I got an RMSE of 0.6 with glmnet and 0.997 with my implementation. I'd love it if someone could guide me on how to improve my function.
Thanks in advance.
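For reference, here is a minimal sketch of a textbook coordinate-descent update (lasso_cd is a hypothetical helper, not the original myLasso and not the glmnet source). The inner product is taken against the partial residual and rescaled by the column's squared norm; note also that glmnet minimizes (1/(2n))*RSS + lambda*||beta||_1, so a lambda passed to glmnet is on a different scale than one applied to an unscaled inner product.
lasso_cd <- function(X, y, lambda = 0.3, tol = 1e-5, maxitr = 100) {
  n <- nrow(X); p <- ncol(X)
  beta <- rep(0, p)
  for (it in 1:maxitr) {
    beta_old <- beta
    for (j in 1:p) {
      # partial residual excluding feature j
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]
      rho <- sum(X[, j] * r_j) / n
      beta[j] <- soft_th(rho, lambda) / (sum(X[, j]^2) / n)
    }
    if (sum(abs(beta - beta_old)) < tol) break
  }
  beta
}
If you compare against glmnet, something like glmnet(X, y, lambda = lambda, standardize = FALSE, intercept = FALSE) should keep the conventions roughly aligned, since X and y are already scaled here.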
I have some code that has been used for a paper.
After defining the function to be optimized, the author used the Nelder-Mead method to estimate the parameters needed. When I run the code, it freezes after 493 function evaluations; it doesn't show any error message or anything. I've been trying to find some information but I haven't been lucky. How can I modify the optim command so that it evaluates all the combinations it needs, and/or what is preventing the function from being optimized?
Here's the code. It's relatively long, BUT the second-to-last line (system.time(stcopfit...)) is the ONLY ONE I need to make work / fix / modify. So you can just copy and paste the code (as I said, taken from the author of the mentioned paper) and let it run; you don't have to go through all the code, just the last few lines. This is the data over which to run the optimization, i.e. a matrix of [0,1] uniform variables of dimension 2172x9.
Any help is appreciated, thanks!
Here's a screenshot in RStudio (it took around 2 minutes to arrive at 493, and then it's been stuck like this for the last 30 minutes):
Code:
#download older version of "sn" package
url <- "https://cran.r-project.org/src/contrib/Archive/sn/sn_1.0-0.tar.gz"
install.packages(url, repos=NULL, type="source")
install.packages("signal")
library(sn)
library(signal)
#1. redefine qst function
qst <- function (p, xi = 0, omega = 1, alpha = 0, nu = Inf, tol = 1e-08)
{
  if (length(alpha) > 1)
    stop("'alpha' must be a single value")
  if (length(nu) > 1)
    stop("'nu' must be a single value")
  if (nu <= 0)
    stop("nu must be non-negative")
  if (nu == Inf)
    return(qsn(p, xi, omega, alpha))
  if (nu == 1)
    return(qsc(p, xi, omega, alpha))
  if (alpha == Inf)
    return(xi + omega * sqrt(qf(p, 1, nu)))
  if (alpha == -Inf)
    return(xi - omega * sqrt(qf(1 - p, 1, nu)))
  na <- is.na(p) | (p < 0) | (p > 1)
  abs.alpha <- abs(alpha)
  if (alpha < 0)
    p <- (1 - p)
  zero <- (p == 0)
  one <- (p == 1)
  x <- xa <- xb <- xc <- fa <- fb <- fc <- rep(NA, length(p))
  nc <- rep(TRUE, length(p))
  nc[(na | zero | one)] <- FALSE
  fc[!nc] <- 0
  xa[nc] <- qt(p[nc], nu)
  xb[nc] <- sqrt(qf(p[nc], 1, nu))
  fa[nc] <- pst(xa[nc], 0, 1, abs.alpha, nu) - p[nc]
  fb[nc] <- pst(xb[nc], 0, 1, abs.alpha, nu) - p[nc]
  regula.falsi <- FALSE
  while (sum(nc) > 0) {
    xc[nc] <- if (regula.falsi)
      xb[nc] - fb[nc] * (xb[nc] - xa[nc])/(fb[nc] - fa[nc])
    else (xb[nc] + xa[nc])/2
    fc[nc] <- pst(xc[nc], 0, 1, abs.alpha, nu) - p[nc]
    pos <- (fc[nc] > 0)
    xa[nc][!pos] <- xc[nc][!pos]
    fa[nc][!pos] <- fc[nc][!pos]
    xb[nc][pos] <- xc[nc][pos]
    fb[nc][pos] <- fc[nc][pos]
    x[nc] <- xc[nc]
    nc[(abs(fc) < tol)] <- FALSE
    regula.falsi <- !regula.falsi
  }
  x <- replace(x, zero, -Inf)
  x <- replace(x, one, Inf)
  Sign <- function(x) sign(x) + as.numeric(x == 0)
  q <- as.numeric(xi + omega * Sign(alpha) * x)
  names(q) <- names(p)
  return(q)
}
#2. initial parameter setting
mkParam <- function(Omega, delta, nu){
  ndim <- length(delta)+1;
  R <- diag(ndim);
  for (i in 2:ndim){
    R[i,1] <- R[1,i] <- delta[i-1];
    if (i>=3){for (j in 2:(i-1)){R[i,j] <- R[j,i] <- Omega[i-1,j-1];}}
  }
  LTR <- t(chol(R));
  Mtheta <- matrix(0, nrow=ndim, ncol=ndim);
  for (i in 2:ndim){
    Mtheta[i,1] <- acos(LTR[i,1]);
    cumsin <- sin(Mtheta[i,1]);
    if (i >=3){for (j in 2:(i-1)){
      Mtheta[i,j] <- acos(LTR[i,j]/cumsin);
      cumsin <- cumsin*sin(Mtheta[i,j]);}
    }
  }
  c(Mtheta[lower.tri(Mtheta)], log(nu-2));
}
#3. from internal to original parameters
paramToExtCorr <- function(param){
  ntheta <- dim*(dim+1)/2;
  theta <- param[1:ntheta];
  ndim <- (1+sqrt(1+8*length(theta)))/2;
  LTR <- diag(ndim);
  for (i in 2:ndim){
    LTR[i,1] <- cos(theta[i-1]);
    cumsin <- sin(theta[i-1]);
    if (i >=3){for (j in 2:(i-1)){
      k <- i+ndim*(j-1)-j*(j+1)/2;
      LTR[i,j] <- cumsin*cos(theta[k]);
      cumsin <- cumsin*sin(theta[k]);}
    }
    LTR[i,i] <- cumsin;
  }
  R <- LTR %*% t(LTR);
  R;
}
#4. show estimated parameters and log likelihood
resultVec <- function(fit){
  R <- paramToExtCorr(fit$par);
  logLik <- -fit$value;
  Omega <- R[-1, -1];
  delta <- R[1, -1];
  ntheta <- dim*(dim+1)/2;
  nu <- exp(fit$par[ntheta+1])+2;
  c(Omega[lower.tri(Omega)], delta, nu, logLik);
}
#5. negative log likelihood for multivariate skew-t copula
stcopn11 <- function(param){
  N <- nrow(udat);
  mpoints <- 150;
  npar <- length(param);
  nu <- exp(param[npar])+2;
  R <- paramToExtCorr(param);
  Omega <- R[-1, -1];
  delta <- R[1, -1];
  zeta <- delta/sqrt(1-delta*delta);
  iOmega <- solve(Omega);
  alpha <- iOmega %*% delta / sqrt(1-(t(delta) %*% iOmega %*% delta)[1,1]);
  ix <- matrix(0, nrow=N, ncol=dim);
  lm <- matrix(0, nrow=N, ncol=dim);
  for (j in 1:dim){
    minx <- qst(min(udat[,j]), alpha=zeta[j], nu=nu);
    maxx <- qst(max(udat[,j]), alpha=zeta[j], nu=nu);
    xx <- seq(minx, maxx, length=mpoints);
    px <- sort(pst(xx, alpha=zeta[j], nu=nu));
    ix[,j] <- pchip(px, xx, udat[,j]);
    lm[,j] <- dst(ix[,j], alpha=zeta[j], nu=nu, log=TRUE);
  }
  lc <- dmst(ix, Omega=Omega, alpha=alpha, nu=nu, log=TRUE);
  -sum(lc)+sum(lm)
}
#6. sample setting
dim <- 9;
smdelta <- c(-0.36,-0.33,-0.48,-0.36,-0.33,-0.48,-0.36,-0.33,-0.48);
smdf <- 5;
smOmega <- cor(udat);
smzeta <- smdelta/sqrt(1-smdelta*smdelta);
iOmega <- solve(smOmega);
smalpha <- iOmega %*% smdelta /sqrt(1-(t(smdelta) %*% iOmega %*% smdelta)[1,1]);
#7. estimation
iniPar <- mkParam(diag(dim),numeric(dim),6);
system.time(stcopfit<-optim(iniPar,stcopn11,control=list(reltol=1e-8,trace=6)));
resultVec(stcopfit);
The parameters you arrive at by step 493 lead to an infinite loop in your qst function: not having any idea what this very complex code is actually doing, I'm afraid I can't diagnose further. Here's what I did to get that far:
I set cur.param <- NULL in the global environment, then put cur.param <<- param within stcopn11; this saves the current set of parameters to the global environment, so that when you break out of the optim() call manually (via Control-C or ESC depending on your platform) you can inspect the current set of parameters and restart from them easily
I put in old-school debugging statements (e.g. cat("entering stcopn11\n") and cat("leaving stcopn11\n") at the beginning and at the next-to-last line of the objective function, and a few more within stcopn11 to mark progress)
once I had the "bad" parameters I used debug(stcopn11) and stcopn11(cur.param) to step through the function
I discovered that it was hanging on dimension 3 (j==3 in the for loop within stcopn11) and particularly on the first qst() call
I added a maxit=1e5 argument to qst; initialized it <- 1 before the while loop; set it <- it+1 each time through the loop; changed the stopping criterion to while (sum(nc) > 0 && it<maxit); and added if (it==maxit) stop("hit max number of iterations in qst") right after the loop
1e5 iterations in qst took 74 seconds; I have no idea whether it might stop eventually, but didn't want to wait to find out.
This was my modified version of stcopn11:
cur.param <- NULL ## set parameter placeholder
##5. negative log likelihood for multivariate skew-t copula
stcopn11 <- function(param, debug=FALSE) {
  cat("stcopn11\n")
  cur.param <<- param ## record current params outside function
  N <- nrow(udat)
  mpoints <- 150
  npar <- length(param)
  nu <- exp(param[npar])+2
  R <- paramToExtCorr(param)
  Omega <- R[-1, -1]
  delta <- R[1, -1]
  zeta <- delta/sqrt(1-delta*delta)
  cat("... solving iOmega")
  iOmega <- solve(Omega)
  alpha <- iOmega %*% delta /
    sqrt(1-(t(delta) %*% iOmega %*% delta)[1,1])
  ix <- matrix(0, nrow=N, ncol=dim)
  lm <- matrix(0, nrow=N, ncol=dim)
  cat("... entering dim loop\n")
  for (j in 1:dim){
    if (debug) cat(j,"\n")
    minx <- qst(min(udat[,j]), alpha=zeta[j], nu=nu)
    maxx <- qst(max(udat[,j]), alpha=zeta[j], nu=nu)
    xx <- seq(minx, maxx, length=mpoints)
    px <- sort(pst(xx, alpha=zeta[j], nu=nu))
    ix[,j] <- pchip(px, xx, udat[,j])
    lm[,j] <- dst(ix[,j], alpha=zeta[j], nu=nu, log=TRUE)
  }
  lc <- dmst(ix, Omega=Omega, alpha=alpha, nu=nu, log=TRUE)
  cat("leaving stcopn11\n")
  -sum(lc)+sum(lm)
}
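A short usage note on the cur.param trick (just a sketch): after breaking out of the hung run, you can restart the optimization from the last recorded parameter vector instead of from iniPar.
# restart from the parameters saved just before the hang
stcopfit <- optim(cur.param, stcopn11,
                  control = list(reltol = 1e-8, trace = 6))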
I'm having trouble optimizing a multivariate normal log-likelihood in R. If anyone has a good solution for that, please let me know. Specifically, I cannot seem to keep the variance-covariance matrix positive-definite and the parameters in a reasonable range.
Let me introduce the problem more completely. I am essentially trying to simultaneously solve these two regression equations using MLE:
$$
y_1 = \beta_1 + \beta_2 x_1 + \beta_3 x_2 \\
y_2 = \beta_4 + \beta_3 x_1 + \beta_5 x_2
$$
The fact that $\beta_3$ is in both equations is not a mistake. I try to solve this using MLE by maximizing the likelihood of the multivariate normal distribution for $Y = (y_1, y_2)^\top$ where the mean is parameterized as above in the regression equations.
I've included the log-likelihood function as I believe it should be, where I constrain the variance-covariance matrix to be positive-definite by recreating it from necessarily positive eigenvalues and a Cholesky decomposition.
mvrestricted_ll <- function(par, Y, X) {
  # Indices
  n <- nrow(X)
  nbetas <- (2 + 3 * (ncol(Y) - 1))
  # Extract parameters
  beta <- par[1:nbetas]
  eigvals <- exp(par[(nbetas + 1):(nbetas + ncol(Y))]) # constrain to be positive
  chole <- par[(nbetas + ncol(Y) + 1):(nbetas + ncol(Y) + ncol(Y)*(ncol(Y)+1)/2)]
  # Build Sigma from positive eigenvalues and cholesky (should be pos def)
  L <- diag(ncol(Y))
  L[lower.tri(L, diag=T)] <- chole
  Sigma <- diag(eigvals) + tcrossprod(L)
  # Linear predictor
  # Hard coded for 2x2 example for now
  mu <- cbind(beta[1] + beta[2]*X[,1] + beta[3]*X[,2],
              beta[4] + beta[3]*X[,1] + beta[5]*X[,2])
  yminmu <- Y - mu
  nlogs <- n * log(det(Sigma))
  invSigma <- solve(Sigma)
  meat <- yminmu %*% tcrossprod(invSigma, yminmu)
  return(- nlogs - sum(diag(meat)))
}
# Create fake data
n <- 1000
p <- 2
set.seed(20160201)
X <- matrix(rnorm(n*p), nrow = n)
set.seed(20160201)
Y <- matrix(rnorm(n*p), nrow = n)
# Initialize parameters
initpars <- c(rep(0, (2 + 3 * (ncol(Y) - 1)) + ncol(Y) + ncol(Y)*(ncol(Y)+1)/2))
# Optimize fails with BFGS
optim(par = initpars, fn = mvrestricted_ll, X=X, Y=Y, method = "BFGS")
# Optim does not converge with Nelder-Mead; if you increase maxit it also fails
optim(par = initpars, fn = mvrestricted_ll, X=X, Y=Y)
Any help would be greatly appreciated.
EDIT: I should note that just letting Sigma be a vector in the parameters and then returning a very large value whenever it is not positive definite does not work either.
I have no idea if the code/answer is correct, but
invSigma <- try(solve(Sigma))
if (inherits(invSigma, "try-error")) return(NA)
and running
optim(par = initpars, fn = mvrestricted_ll, X=X, Y=Y,
      control = list(maxit = 1e5))
gets me a little farther to a convergence code of 10 (degenerate Nelder-Mead simplex).
$par
[1] 1.361612e+01 4.674349e+01 -3.050170e+01 3.305013e+01 6.731194e+01
[6] -3.117192e+01 -5.408598e+00 -6.326897e-07 -1.987449e+01 -1.795924e+01
$value
[1] -1.529013e+19
$counts
function gradient
1219 NA
$convergence
[1] 10
I suspect that a real solution will involve looking more carefully at the code to see if it's really doing what you think it's doing (sorry); understanding why solve() errors occur might be a good first step. You can work on troubleshooting this by putting a cat(par, "\n") as the first line of the function and running it without the try/NA-return code. That will allow you to isolate an example data set that throws the error — then you can work your way through your code a line at a time (with debug() or by hand) to see what's happening.
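For what it's worth, a common alternative for keeping the variance-covariance matrix positive definite by construction is to parameterize its Cholesky factor directly and exponentiate the diagonal (a log-Cholesky parameterization). A minimal sketch, where make_sigma is a hypothetical helper rather than part of the original code:
# build Sigma from an unconstrained parameter vector via its Cholesky factor
make_sigma <- function(par_chol, d) {
  L <- matrix(0, d, d)
  L[lower.tri(L, diag = TRUE)] <- par_chol
  diag(L) <- exp(diag(L))  # positive diagonal means L %*% t(L) is positive definite
  tcrossprod(L)
}
make_sigma(rep(0, 3), 2)   # quick check: gives the 2x2 identity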
You can consider using the following approach:
library(DEoptim)
fn <- function(par, mat_X, mat_Y)
{
  X <- mat_X
  Y <- mat_Y
  n <- nrow(X)
  nbetas <- (2 + 3 * (ncol(Y) - 1))
  beta <- par[1 : nbetas]
  eigvals <- exp(par[(nbetas + 1) : (nbetas + ncol(Y))])
  chole <- par[(nbetas + ncol(Y) + 1) : (nbetas + ncol(Y) + ncol(Y) * (ncol(Y) + 1) / 2)]
  L <- diag(ncol(Y))
  L[lower.tri(L, diag = TRUE)] <- chole
  Sigma <- tryCatch(diag(eigvals) + tcrossprod(L), error = function(e) NA)
  if(is.null(dim(Sigma)))
  {
    return(10 ^ 30)
  }else
  {
    mu <- cbind(beta[1] + beta[2] * X[,1] + beta[3] * X[,2],
                beta[4] + beta[3] * X[,1] + beta[5] * X[,2])
    yminmu <- Y - mu
    nlogs <- n * log(det(Sigma))
    invSigma <- tryCatch(solve(Sigma), error = function(e) NA)
    if(is.null(dim(invSigma)))
    {
      return(10 ^ 30)
    }else
    {
      meat <- yminmu %*% tcrossprod(invSigma, yminmu)
      log_Lik <- - nlogs - sum(diag(meat))
      if(is.na(log_Lik) | is.nan(log_Lik) | is.infinite(log_Lik))
      {
        return(10 ^ 30)
      }else
      {
        return(-log_Lik)
      }
    }
  }
}
n <- 1000
p <- 2
set.seed(20160201)
mat_X <- matrix(rnorm(n * p), nrow = n)
set.seed(2436537)
mat_Y <- matrix(rnorm(n * p), nrow = n)
lower <- rep(-10, 10)
upper <- rep(10, 10)
DEoptim(fn = fn, lower = lower, upper = upper,
        control = list(itermax = 10000, parallelType = 1), mat_X = mat_X, mat_Y = mat_Y)
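If you want to reuse the result, assign it and pull out the best parameter vector and objective value afterwards; if I recall the return structure correctly, these live under $optim.
fit <- DEoptim(fn = fn, lower = lower, upper = upper,
               control = list(itermax = 10000, parallelType = 1),
               mat_X = mat_X, mat_Y = mat_Y)
fit$optim$bestmem   # best parameter vector found
fit$optim$bestval   # corresponding objective value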