I am using CVXR to code a penalized linear regression. My global loss is composed of four elements: two different SSE losses, loss_u and loss_b, on two different data sets; a ridge penalty; and a specific distance D. The code works if I use distance == "MM". However, there is an error for distance == "MMD", where I use an external function from the kernlab package, kernlab::kmmd. The problem is that Xb %*% beta is a MulExpression. I don't know whether I should convert it into a numeric (but how?) or whether it is simply impossible to use an external (e.g. Rcpp) function inside a CVXR objective.
deb_reg <- function(Xu, Yu, Xb, Yb, beta, lambda = 0, theta = 0.5, alpha = 0, distance = "MM") {
  n <- nrow(Xu)
  m <- nrow(Xb)
  ridge <- lambda * sum(beta^2)
  loss_u <- sum((Yu - Xu %*% beta)^2) * (theta / n)
  loss_b <- sum((Yb - Xb %*% beta)^2) * ((1 - theta) / m)
  if (distance == "MM") {
    D <- alpha * (mean(Yu) - mean(Xb %*% beta))^2
  } else if (distance == "MMD") {
    y <- as.numeric(Yu)
    # print(beta)
    x <- Xb %*% beta   # a symbolic CVXR expression here, not a numeric vector
    # D <- alpha * EasyMMD::MMD(y, x)
    MMD <- kernlab::kmmd(as.matrix(y), as.matrix(x))   # this is where the error occurs
    D <- alpha * sum(MMD@mmdstats)   # kmmd() returns an S4 object; mmdstats is a slot
  } else {
    D <- 0
  }
  obj <- loss_u + loss_b + ridge + D
  return(obj)
}
library(CVXR)
p <- ncol(X_unbiased)
beta <- Variable(p)
obj <- deb_reg(Xu = X_unbiased, Yu = Y_unbiased, Xb = X_biased, Yb = Y_biased, beta,
               lambda = 0.1, theta = 0.5, alpha = 10, distance = "MMD")
prob <- Problem(Minimize(obj))
result <- solve(prob)
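CVXR builds objectives from its own atoms under disciplined convex programming (DCP) rules, and Xb %*% beta has no numeric value until the problem is solved, so it cannot be handed to kmmd(). One possible workaround, sketched below under stated assumptions, is to leave CVXR for the MMD case and minimise the same objective with a general-purpose optimiser, stats::optim(), computing a biased squared-MMD estimate with a Gaussian kernel by hand. The helper names mmd2() and deb_obj(), the kernel, the bandwidth sigma and the simulated data are all hypothetical; this is not the statistic kmmd() reports, and because the MMD term is generally not convex in beta, optim() only returns a local minimum.

```r
# Biased estimate of the squared MMD between two 1-D samples with a Gaussian
# kernel k(u, v) = exp(-sigma * (u - v)^2); 'sigma' is a hypothetical bandwidth.
mmd2 <- function(a, b, sigma = 1) {
  k <- function(u, v) exp(-sigma * outer(u, v, "-")^2)
  mean(k(a, a)) + mean(k(b, b)) - 2 * mean(k(a, b))
}

# Numeric counterpart of deb_reg() with the MMD distance (hypothetical name);
# beta is a plain numeric vector here, not a CVXR Variable.
deb_obj <- function(beta, Xu, Yu, Xb, Yb, lambda = 0, theta = 0.5, alpha = 0) {
  n <- nrow(Xu)
  m <- nrow(Xb)
  loss_u <- sum((Yu - Xu %*% beta)^2) * theta / n
  loss_b <- sum((Yb - Xb %*% beta)^2) * (1 - theta) / m
  ridge <- lambda * sum(beta^2)
  D <- alpha * mmd2(as.numeric(Yu), as.numeric(Xb %*% beta))
  loss_u + loss_b + ridge + D
}

# Simulated stand-ins for X_unbiased, Y_unbiased, X_biased, Y_biased
set.seed(1)
p <- 3
Xu <- matrix(rnorm(50 * p), 50)
Yu <- Xu %*% c(1, -1, 0.5) + rnorm(50)
Xb <- matrix(rnorm(80 * p), 80)
Yb <- Xb %*% c(1.2, -0.8, 0.5) + rnorm(80)

# optim() passes the current beta as the first argument of deb_obj()
fit <- optim(rep(0, p), deb_obj, Xu = Xu, Yu = Yu, Xb = Xb, Yb = Yb,
             lambda = 0.1, theta = 0.5, alpha = 10, method = "BFGS",
             control = list(maxit = 500))
fit$par
```

The "MM" branch can stay in CVXR because it is a convex quadratic in beta; only the kernel-based distance needs the numeric route.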
I'm setting up an alternative to the exponential response function commonly used in Poisson GLMs. It is called softplus and is defined as $\frac{1}{c} \log(1+\exp(c \eta))$, where $\eta$ is the linear predictor $X\beta$.
I have already managed the optimization with the parameter $c$ fixed at arbitrary values, searching only for $\hat{\beta}$.
But for the next step I have to optimize $c$ as well (alternating between updating $\beta$ and the current $c$).
I tried to write a log-likelihood function and a score function, and then set up a Newton-Raphson optimization (using a while loop),
but I don't know how to separate updating $c$ in an outer step from updating $\beta$ in an inner step.
Are there any suggestions?
# Response function:
sp <- function(eta, c = 1 ) {
return(log(1 + exp(abs(c * eta)))/ c)
}
# Log Likelihood
l.lpois <- function(par, y, X){
beta <- par[1:(length(par)-1)]
c <- par[length(par)]
l <- rep(NA, times = length(y))
for (i in 1:length(l)){
l[i] <- y[i] * log(sp(X[i,]%*%beta, c)) - sp(X[i,]%*%beta, c)
}
l <- sum(l)
return(l)
}
# Score function
score <- function(y, X, par){
beta <- par[1:(length(par)-1)]
c <- par[length(par)]
s <- matrix(rep(NA, times = length(y)*length(par)), ncol = length(y))
for (i in 1:length(y)){
s[,i] <- c(X[i,], 1) * (y[i] * plogis(c * X[i,]%*%beta) / sp(X[i,]%*%beta, c) - plogis(c * X[i,]%*%beta))
}
score <- rep(NA, times = nrow(s))
for (j in 1:length(score)){
score[j] <- sum(s[j,])
}
return(score)
}
# Optimization function
opt <- function(y, X, b.start, eps=0.0001, maxiter = 1e5){
beta <- b.start[1:(length(b.start)-1)]
c <- b.start[length(b.start)]
b.old <- b.start
i <- 0
conv <- FALSE
while(conv == FALSE){
eta <- X%*%b.old[1:(length(b.old)-1)]
s <- score(y, X, b.old)
h <- numDeriv::hessian(l.lpois,b.old,y=y,X=X)
invh <- solve(h)
# update
b.new <- b.old + invh %*% s
i <- i + 1
# Test
if(any(is.nan(b.new))){
b.new <- b.old
warning("convergence failed")
break
}
# convergence reached?
if(sqrt(sum((b.new - b.old)^2))/sqrt(sum(b.old^2)) < eps | i >= maxiter){
conv <- TRUE
}
b.old <- b.new
}
eta <- X%*%b.new[1:(length(b.new)-1)]
# covariance
invh <- solve(numDeriv::hessian(l.lpois,b.new,y=y,X=X))
fitted <- sp(eta, b.new[length(b.new)])
result <- list("coefficients" = c(beta = b.new),
"fitted.values" = fitted,
"covariance" = invh)
}
# Running fails ..
n <- 100
x <- runif(n, 0, 1)
Xdes <- cbind(1, x)
eta <- 1 + 2 * x
y <- rpois(n, sp(eta, c = 1))
opt(y,Xdes,c(0,1,1))
You have 2 bugs:
In score(), the expression
(y[i] * plogis(c * X[i,]%*%beta) / sp(X[i,]%*%beta, c) - plogis(c * X[i,]%*%beta))
returns a 1x1 matrix, so you must convert it to numeric before multiplying it by c(X[i,], 1):
as.numeric(y[i] * plogis(c * X[i,]%*%beta) / sp(X[i,]%*%beta, c) - plogis(c * X[i,]%*%beta))
In the line that initialises s, a ) is missing. You have:
s <- matrix(rep(NA, times = length(y)*length(par), ncol = length(y))
while it should be:
s <- matrix(rep(NA, times = length(y)*length(par)), ncol = length(y))
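Separately from those two fixes, the outer/inner split the question asks about can be sketched as a profile likelihood: for each trial $c$, an inner optim() call maximises over $\beta$, and an outer optimize() call searches over $c$. This swaps the hand-written Newton-Raphson loop for base-R optimisers, so treat it as one possible approach rather than the asker's method; the helper names, the search interval c_range, the starting values and the seed are hypothetical. A few details in the question's code are worth checking while adapting it: sp() applies abs() to $c\eta$, which is not the softplus defined above; with h the Hessian of the log-likelihood, a Newton step towards a maximum is b.old - solve(h) %*% s rather than +; the last score component should be $\partial\ell/\partial c = \sum_i (y_i/\mu_i - 1)\,\partial\mu_i/\partial c$ with $\partial\mu/\partial c = (\eta\,\mathrm{plogis}(c\eta) - \mu)/c$, not the $\eta$-derivative reused with $x = 1$; and opt() ends with an assignment, so its result is returned invisibly.

```r
# Numerically stable softplus response as defined in the question (no abs()):
# log(1 + exp(c * eta)) / c
sp <- function(eta, c = 1) {
  z <- c * eta
  (pmax(z, 0) + log1p(exp(-abs(z)))) / c
}

# Poisson log-likelihood in beta for a fixed c
loglik_beta <- function(beta, c, y, X) {
  mu <- sp(drop(X %*% beta), c)
  sum(y * log(mu) - mu)
}

# Score in beta for a fixed c; d sp / d eta = plogis(c * eta)
score_beta <- function(beta, c, y, X) {
  eta <- drop(X %*% beta)
  drop(crossprod(X, (y / sp(eta, c) - 1) * plogis(c * eta)))
}

# Inner step: maximise over beta with c held fixed (optim() minimises, so negate)
fit_beta <- function(c, y, X, start) {
  optim(start,
        fn = function(b) -loglik_beta(b, c, y, X),
        gr = function(b) -score_beta(b, c, y, X),
        method = "BFGS", control = list(maxit = 500))
}

# Outer step: maximise the profile log-likelihood over c with optimize()
fit_softplus <- function(y, X, c_range = c(0.05, 20), start = rep(0.1, ncol(X))) {
  profile_ll <- function(c) -fit_beta(c, y, X, start)$value
  best_c <- optimize(profile_ll, interval = c_range, maximum = TRUE)$maximum
  fb <- fit_beta(best_c, y, X, start)
  list(beta = fb$par, c = best_c, loglik = -fb$value)
}

set.seed(1)
n <- 100
x <- runif(n, 0, 1)
Xdes <- cbind(1, x)
y <- rpois(n, sp(1 + 2 * x, c = 1))
fit <- fit_softplus(y, Xdes)
fit
```

Profiling keeps the inner problem a standard fixed-$c$ fit, which the question already has working; optimize() only needs the profile value, so no derivative in $c$ is required.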
I calculated the integral of the product of a Gaussian density and some function.
First, I did it with the function int2() (rmutil package).
Then, I did it with Gauss-Hermite points.
The two results I obtained are different.
Should I consider the Gauss-Hermite result to be the correct one and the int2() numerical integration to be an approximation?
I provide an example below:
1. rmutil::int2()
library(rmutil)
Sig <- matrix (c(0.2^2, 0, 0, 0.8^2), ncol=2)
Mu<- c(2, 0)
to.integrate <- function(B0, B1) {
first.int= 1/0.8 * (1.2 * exp(B0 + B1 * 0.5))^(-1/0.8) * gamma(1/0.8)
B=matrix(c(B0, B1), ncol=1)
multi.norm=1 / (2 * pi * det(Sig)^(1/2)) *
exp (- 0.5 * t( B - Mu ) %*% solve(Sig) %*%( B - Mu ) )
return (first.int %*% multi.norm)
}
result_int2 <- int2(to.integrate, a=c(-Inf, -Inf), b=c(Inf, Inf),
eps=1.0e-6, max=16, d=5)
2. Compute multivariate Gaussian quadrature points:
library(statmod)
mgauss.hermite <- function(n, mu, sigma) {
dm <- length(mu)
gh <- gauss.quad(n, 'hermite')
gh <- cbind(gh$nodes, gh$weights)
idx <- as.matrix(expand.grid(rep(list(1:n), dm)))
pts <- matrix(gh[idx, 1], nrow(idx), dm)
wts <- apply(matrix(gh[idx, 2], nrow(idx), dm), 1, prod)
eig <- eigen(sigma)
rot <- eig$vectors %*% diag(sqrt(eig$values))
pts <- t(rot %*% t(pts) + mu)
return(list(points=pts, weights=wts))
}
nod_wei <- mgauss.hermite(10, mu=Mu, sigma=Sig)
gfun <- function(B0, B1) {
first.int <- 1/0.8 *(1.2 * exp(B0 + B1 * 0.5))^(-1/0.8)* gamma(1/0.8)
return(first.int)
}
result_GH <- sum(gfun(nod_wei$points[, 1], nod_wei$points[, 2]) * nod_wei$weights)/pi
result_int2
result_GH
The mistake came from the way the points were calculated in the mgauss.hermite function.
I replaced the eigendecomposition of the Sigma matrix with a Cholesky decomposition and multiplied it by the square root of 2.
After that, the results of the two methods became very similar.
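The factor √2 comes from a change of variables. Gauss-Hermite rules integrate against the weight $e^{-z^\top z}$ rather than the normal density, so for any matrix $L$ with $LL^\top = \Sigma$, the substitution $b = \mu + \sqrt{2}\,L z$ in $d$ dimensions gives

$$
\int g(b)\,\phi_{\mu,\Sigma}(b)\,db
= \pi^{-d/2} \int g\!\left(\mu + \sqrt{2}\,L z\right) e^{-z^\top z}\,dz
\approx \pi^{-d/2} \sum_i w_i\, g\!\left(\mu + \sqrt{2}\,L z_i\right)
$$

With $d = 2$ the prefactor is the division by pi in result_GH. The eigen-based factor is also a valid $L$; because Sig is diagonal here, it produces the same grid as the Cholesky factor up to relabelling, so the missing √2 (which effectively integrates against $N(\mu, \Sigma/2)$) is what made the two results disagree.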
Below is the corrected mgauss.hermite function:
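mgauss.hermite <- function(n, mu, sigma) {
  dm <- length(mu)
  gh <- gauss.quad(n, 'hermite')   # nodes/weights for the weight function exp(-x^2)
  gh <- cbind(gh$nodes, gh$weights)
  idx <- as.matrix(expand.grid(rep(list(1:n), dm)))   # tensor-product grid indices
  pts <- matrix(gh[idx, 1], nrow(idx), dm)
  wts <- apply(matrix(gh[idx, 2], nrow(idx), dm), 1, prod)
  rot <- sqrt(2) * t(chol(sigma))   # sqrt(2) times the lower Cholesky factor of sigma
  pts <- t(rot %*% t(pts) + mu)     # change of variables b = mu + sqrt(2) L z
  return(list(points = pts, weights = wts))
}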
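Both int2() and Gauss-Hermite quadrature are numerical approximations, but this particular integrand has a closed form that can arbitrate between them: gfun() equals a constant times $\exp(t^\top B)$ with $t = (-1/0.8, -0.5/0.8)$, so its expectation under $N(\mu, \Sigma)$ follows from the normal moment generating function. The sketch below checks the corrected change of variables against that value. gh_rule() and gh_expect() are hypothetical helpers; gh_rule() computes the n-point Gauss-Hermite rule for the weight $e^{-x^2}$ (the rule statmod::gauss.quad(n, 'hermite') returns) with the Golub-Welsch eigenvalue method, so the check runs in base R.

```r
# Gauss-Hermite rule for the weight exp(-x^2) via Golub-Welsch (base R);
# hypothetical helper standing in for statmod::gauss.quad(n, "hermite").
gh_rule <- function(n) {
  stopifnot(n >= 2)
  b <- sqrt(seq_len(n - 1) / 2)        # off-diagonal of the Jacobi matrix
  J <- matrix(0, n, n)
  J[cbind(1:(n - 1), 2:n)] <- b
  J[cbind(2:n, 1:(n - 1))] <- b
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# E[g(B1, B2)] for B ~ N(mu, Sigma) in 2-D on a tensor-product grid,
# using the substitution b = mu + sqrt(2) L z with L %*% t(L) == Sigma
gh_expect <- function(g, n, mu, Sigma) {
  r <- gh_rule(n)
  d <- length(mu)
  idx <- as.matrix(expand.grid(rep(list(seq_len(n)), d)))
  z <- matrix(r$nodes[c(idx)], nrow(idx), d)
  w <- apply(matrix(r$weights[c(idx)], nrow(idx), d), 1, prod)
  L <- t(chol(Sigma))
  b <- t(sqrt(2) * L %*% t(z) + mu)
  sum(w * g(b[, 1], b[, 2])) / pi^(d / 2)
}

Sig <- matrix(c(0.2^2, 0, 0, 0.8^2), ncol = 2)
Mu <- c(2, 0)
gfun <- function(B0, B1) 1/0.8 * (1.2 * exp(B0 + B1 * 0.5))^(-1/0.8) * gamma(1/0.8)

# gfun(B) = const * exp(sum(t_vec * B)), so E[gfun(B)] = const * exp(t' Mu + t' Sig t / 2)
t_vec <- c(-1/0.8, -0.5/0.8)
const <- 1/0.8 * 1.2^(-1/0.8) * gamma(1/0.8)
exact <- const * exp(sum(t_vec * Mu) + 0.5 * drop(t(t_vec) %*% Sig %*% t_vec))
gh_value <- gh_expect(gfun, 10, Mu, Sig)
c(exact = exact, gauss_hermite = gh_value)
```

The same closed form can be compared with result_int2 to judge how much error int2() contributes at the chosen eps.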
I'm having trouble optimizing a multivariate normal log-likelihood in R. If anyone has a good solution for that, please let me know. Specifically, I cannot seem to keep the variance-covariance matrix positive-definite and the parameters in a reasonable range.
Let me introduce the problem more completely. I am essentially trying to simultaneously solve these two regression equations using MLE:
$$
\begin{aligned}
y_1 &= \beta_1 + \beta_2 x_1 + \beta_3 x_2 \\
y_2 &= \beta_4 + \beta_3 x_1 + \beta_5 x_2
\end{aligned}
$$
The fact that $\beta_3$ is in both equations is not a mistake. I try to solve this using MLE by maximizing the likelihood of the multivariate normal distribution for $Y = (y_1, y_2)^\top$ where the mean is parameterized as above in the regression equations.
I've attached the log-likelihood function as I believe it should be. I constrain the variance-covariance matrix to be positive-definite by building it as a diagonal matrix of necessarily positive values (eigvals in the code) plus the product of a lower-triangular, Cholesky-style factor with its transpose.
mvrestricted_ll <- function(par, Y, X) {
# Indices
n <- nrow(X)
nbetas <- (2 + 3 * (ncol(Y) - 1))
# Extract parameters
beta <- par[1:nbetas]
eigvals <- exp(par[(nbetas + 1):(nbetas + ncol(Y))]) # constrain to be positive
chole <- par[(nbetas + ncol(Y) + 1):(nbetas + ncol(Y) + ncol(Y)*(ncol(Y)+1)/2)]
# Build Sigma from positive eigenvalues and cholesky (should be pos def)
L <- diag(ncol(Y))
L[lower.tri(L, diag=T)] <- chole
Sigma <- diag(eigvals) + tcrossprod(L)
# Linear predictor
# Hard coded for 2x2 example for now
mu <- cbind(beta[1] + beta[2]*X[,1] + beta[3]*X[,2],
beta[4] + beta[3]*X[,1] + beta[5]*X[,2])
yminmu <- Y - mu
nlogs <- n * log(det(Sigma))
invSigma <- solve(Sigma)
meat <- yminmu %*% tcrossprod(invSigma, yminmu)
return(- nlogs - sum(diag(meat)))
}
# Create fake data
n <- 1000
p <- 2
set.seed(20160201)
X <- matrix(rnorm(n*p), nrow = n)
set.seed(20160201)
Y <- matrix(rnorm(n*p), nrow = n)
# Initialize parameters
initpars <- c(rep(0, (2 + 3 * (ncol(Y) - 1)) + ncol(Y) + ncol(Y)*(ncol(Y)+1)/2))
# Optimization fails with BFGS
optim(par = initpars, fn = mvrestricted_ll, X=X, Y=Y, method = "BFGS")
# optim() does not converge with Nelder-Mead; increasing maxit does not help either
optim(par = initpars, fn = mvrestricted_ll, X=X, Y=Y)
Any help would be greatly appreciated.
EDIT: I should note that just letting Sigma be a vector in the parameters and then returning a very large value whenever it is not positive definite does not work either.
I have no idea whether the code/answer is correct, but replacing the solve() call in mvrestricted_ll() with
invSigma <- try(solve(Sigma))
if (inherits(invSigma, "try-error")) return(NA)
and running
optim(par = initpars, fn = mvrestricted_ll, X=X, Y=Y,
control = list(maxit = 1e5))
gets me a little farther, to convergence code 10 (degenerate Nelder-Mead simplex):
$par
[1] 1.361612e+01 4.674349e+01 -3.050170e+01 3.305013e+01 6.731194e+01
[6] -3.117192e+01 -5.408598e+00 -6.326897e-07 -1.987449e+01 -1.795924e+01
$value
[1] -1.529013e+19
$counts
function gradient
1219 NA
$convergence
[1] 10
I suspect that a real solution will involve looking more carefully at the code to see whether it's really doing what you think it's doing (sorry); understanding why the solve() errors occur might be a good first step. You can troubleshoot this by putting cat(par, "\n") as the first line of the function and running it without the try/NA-return code. That will let you isolate a parameter vector that throws the error; then you can work your way through your code one line at a time (with debug() or by hand) to see what's happening.
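You can consider a global optimizer such as DEoptim, returning a large penalty value whenever Sigma cannot be built or inverted, and returning the negative log-likelihood so that minimization goes in the right direction: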
library(DEoptim)
fn <- function(par, mat_X, mat_Y)
{
X <- mat_X
Y <- mat_Y
n <- nrow(X)
nbetas <- (2 + 3 * (ncol(Y) - 1))
beta <- par[1 : nbetas]
eigvals <- exp(par[(nbetas + 1) : (nbetas + ncol(Y))])
chole <- par[(nbetas + ncol(Y) + 1) : (nbetas + ncol(Y) + ncol(Y) * (ncol(Y) + 1) / 2)]
L <- diag(ncol(Y))
L[lower.tri(L, diag = TRUE)] <- chole
Sigma <- tryCatch(diag(eigvals) + tcrossprod(L), error = function(e) NA)
if(is.null(dim(Sigma)))
{
return(10 ^ 30)
}else
{
mu <- cbind(beta[1] + beta[2] * X[,1] + beta[3] * X[,2],
beta[4] + beta[3] * X[,1] + beta[5] * X[,2])
yminmu <- Y - mu
nlogs <- n * log(det(Sigma))
invSigma <- tryCatch(solve(Sigma), error = function(e) NA)
if(is.null(dim(invSigma)))
{
return(10 ^ 30)
}else
{
meat <- yminmu %*% tcrossprod(invSigma, yminmu)
log_Lik <- - nlogs - sum(diag(meat))
if(is.na(log_Lik) | is.nan(log_Lik) | is.infinite(log_Lik))
{
return(10 ^ 30)
}else
{
return(-log_Lik)
}
}
}
}
n <- 1000
p <- 2
set.seed(20160201)
mat_X <- matrix(rnorm(n * p), nrow = n)
set.seed(2436537)
mat_Y <- matrix(rnorm(n * p), nrow = n)
lower <- rep(-10, 10)
upper <- rep(10, 10)
DEoptim(fn = fn, lower = lower, upper = upper,
control = list(itermax = 10000, parallelType = 1), mat_X = mat_X, mat_Y = mat_Y)
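A few details in the question may explain the behaviour above. optim() minimises by default, while mvrestricted_ll() returns the log-likelihood itself, so the optimiser is pushed towards minus infinity (compare the $value of about -1.5e19 in the first answer); returning the negative, or passing control = list(fnscale = -1), fixes the direction. The fake data call set.seed(20160201) twice, so Y is identical to X and the model can fit it exactly, which lets Sigma collapse towards singularity. And diag(exp(...)) + L L^T spends five numbers on a 2x2 covariance with three free entries, which leaves flat directions. The sketch below, a hypothetical negll() on data simulated from the model with hypothetical true values, uses a plain Cholesky parameterisation with a log diagonal instead.

```r
# Negative log-likelihood (additive constants dropped) for the two equations
# with shared beta[3]. Sigma = L %*% t(L) with L lower-triangular and a
# log-parameterised diagonal, so Sigma is positive definite by construction.
negll <- function(par, Y, X) {
  beta <- par[1:5]
  L <- matrix(0, 2, 2)
  L[1, 1] <- exp(par[6])
  L[2, 1] <- par[7]
  L[2, 2] <- exp(par[8])
  mu <- cbind(beta[1] + beta[2] * X[, 1] + beta[3] * X[, 2],
              beta[4] + beta[3] * X[, 1] + beta[5] * X[, 2])
  R <- Y - mu
  # Z = solve(L, t(R)), so colSums(Z^2) are the quadratic forms r_i' Sigma^{-1} r_i
  Z <- forwardsolve(L, t(R))
  logdet <- 2 * sum(log(diag(L)))
  0.5 * (nrow(Y) * logdet + sum(Z^2))
}

# Simulated data; X and Y come from different random draws
set.seed(20160201)
n <- 1000
X <- matrix(rnorm(n * 2), nrow = n)
b_true <- c(1, 2, -1, 0.5, 3)                # hypothetical true coefficients
Sigma_true <- matrix(c(1, 0.5, 0.5, 2), 2)   # hypothetical error covariance
E <- matrix(rnorm(n * 2), nrow = n) %*% chol(Sigma_true)
Y <- cbind(b_true[1] + b_true[2] * X[, 1] + b_true[3] * X[, 2],
           b_true[4] + b_true[3] * X[, 1] + b_true[5] * X[, 2]) + E

fit <- optim(rep(0, 8), negll, Y = Y, X = X, method = "BFGS",
             control = list(maxit = 1000))
round(fit$par[1:5], 2)
```

Working with forwardsolve() on the triangular factor avoids forming solve(Sigma) and the n x n matrix meat in the original, and the log-determinant comes directly from diag(L).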
I have some matrix calculations in my code that take a long time to run. So far the only way I can think of to speed them up is to use foreach instead of a for loop, but I feel like more can be done. Is there some way of vectorizing things, or an alternative to the for loop, that I'm missing?
Thanks!
require(foreach)
require(mvtnorm)
# some dummy input values
omega.input.jP <- matrix(rnorm(3000*5, 0.1, 0.1), 3000, 5)
nsteps.obs <- ncol(omega.input.jP)
sigma.j <- rnorm(3000, 0.02, 0.05)
rho1.j <- rnorm(3000, 0.8, 0.1)
rho2.j <- rnorm(3000, 0.05, 0.1)
y.lastobs <- 0.3
# note: j is an index denoting sample number (here there are 3000 samples in total), and P denotes the time step (5 time steps here)
# the helper functions are defined first so that the loop below can call them
GetSigmaMu <- function( # Get Sigma.mu, a \code{nsteps} x \code{nsteps} matrix, for AR(2) process
  nsteps,
  sigma_ar,
  rho1,
  rho2
) {
  rho <- c(rho1, rho2)
  cor <- ARMAacf(ar = rho, pacf = FALSE, lag.max = nsteps) # phi's, first element is phi0 = 1
  var <- sigma_ar^2/(1 - sum(rho*cor[2:3])) # stationary variance # cor[2:3] gives first two phi's; cor[1] gives phi0 = 1 # change JR, 20140304
  cov <- cor*var
  Sigma.mu <- matrix(NA, nsteps, nsteps)
  for (i in 1:nsteps) {
    for (k in 1:nsteps) {
      Sigma.mu[i,k] <- cov[abs(i-k)+1]
    }
  }
  return(Sigma.mu)
}
GetConditionalMu <- function( # Get values of mu given y
  omega,
  Sigma.mu,
  y,
  method = "svd" # Method to get eigenvalues in matrix. Default method does not work, "svd" used instead.
) {
  nsteps <- length(omega)
  one <- rep(1, nsteps)
  mean.mu.cond <- c(omega + (1/(sum(Sigma.mu)))*(Sigma.mu %*% one)*c(nsteps*y - t(one) %*% omega))
  Sigma.mu.cond <- Sigma.mu - (1/(sum(Sigma.mu)))*(Sigma.mu %*% one %*% t(one) %*% Sigma.mu)
  mu.cond <- rmvnorm(1, mean.mu.cond, Sigma.mu.cond, method = method)
  return(mu.cond)
}
mu.input.jP <- matrix(NA, nrow(omega.input.jP), ncol(omega.input.jP))
mu.input.jP <- foreach (j = 1:nrow(mu.input.jP), .combine = "rbind") %do% {
  omega <- omega.input.jP[j, ]
  Sigma.mu <- GetSigmaMu(nsteps = nsteps.obs, sigma_ar = sigma.j[j], rho1 = rho1.j[j], rho2 = rho2.j[j])
  mu.input.P <- GetConditionalMu(omega = omega, Sigma.mu = Sigma.mu, y = y.lastobs)
  return(mu.input.P)
}
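One loop that can be removed outright is the double loop in GetSigmaMu(): entry [i, k] is cov[|i - k| + 1], which makes Sigma.mu a symmetric Toeplitz matrix, and base R's toeplitz() builds that in one call. A sketch, with GetSigmaMuToeplitz as a hypothetical name; note that the per-sample rmvnorm() call with method = "svd" is likely to dominate the run time, and this change does not touch it.

```r
# Loop-free variant of GetSigmaMu(): Sigma.mu[i, k] = cov[|i - k| + 1]
# is a symmetric Toeplitz matrix, which toeplitz() builds directly.
GetSigmaMuToeplitz <- function(nsteps, sigma_ar, rho1, rho2) {
  rho <- c(rho1, rho2)
  cor <- ARMAacf(ar = rho, pacf = FALSE, lag.max = nsteps)   # lags 0..nsteps
  var <- sigma_ar^2 / (1 - sum(rho * cor[2:3]))              # stationary variance
  toeplitz(unname(cor[1:nsteps]) * var)
}

S <- GetSigmaMuToeplitz(nsteps = 5, sigma_ar = 0.02, rho1 = 0.8, rho2 = 0.05)
S
```

Inside the loop, the call to GetSigmaMu() can be swapped for GetSigmaMuToeplitz() with the same arguments.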
I am not able to apply the ucminf() function to minimise my cost function in R.
Here is my cost function:
costfunction <- function(X, y, theta) {
  m <- length(y)
  J <- 1/m * ((-t(y) %*% log(sigmoid(as.matrix(X) %*% as.matrix(theta)))) - ((1 - t(y)) %*% log(1 - sigmoid(as.matrix(X) %*% as.matrix(theta)))))
  J
}
Here is my sigmoid function:
sigmoid <- function(t) {
  1 / (1 + exp(-t))
}
Here is my gradient function:
gradfunction <- function(X, y, theta) {
  m <- length(y)
  # sigmoid() is applied to X %*% theta before y is subtracted
  1/m * t(X) %*% (sigmoid(as.matrix(X) %*% as.matrix(theta)) - y)
}
I am trying to do the following:
library("ucminf")
library("ggplot2")  # provides qplot()
data <- read.csv("ex2data1.txt", header = FALSE)
X <- data[, c(1, 2)]
y <- data[, 3]
qplot(X[, 1], X[, 2], colour = factor(y))
m <- dim(X)[1]
n <- dim(X)[2]
X <- cbind(1, X)
initial_theta <- matrix(0, nrow = n + 1, ncol = 1)
cost <- costfunction(X, y, initial_theta)
grad <- gradfunction(X, y, initial_theta)
This is where I want to call ucminf to find the minimum cost and values of theta. I am not sure how to do this.
It looks like you are trying to do the week 2 problem of the Coursera machine learning course.
There is no need for the ucminf package here; you can simply use base R's optim() function.
We will define the sigmoid and cost function first.
sigmoid <- function(z)
1 / (1 + exp(-z))
costFunction <- function(theta, X, y) {
m <- length(y)
J <- -(1 / m) * crossprod(c(y, 1 - y),
c(log(sigmoid(X %*% theta)), log(1 - sigmoid(X %*% theta))))
grad <- (1 / m) * crossprod(X, sigmoid(X %*% theta) - y)
list(J = J, grad = grad)
}
Let's load the data now. To make this code reproducible, I put the data in my Dropbox.
download.file("https://dl.dropboxusercontent.com/u/8750577/ex2data1.txt",
method = "curl", destfile = "/tmp/ex2data1.txt")
data <- matrix(scan('/tmp/ex2data1.txt', what = double(), sep = ","),
ncol = 3, byrow = TRUE)
X <- data[, 1:2]
y <- data[, 3, drop = FALSE]
m <- nrow(X)
n <- ncol(X)
X <- cbind(1, X)
initial_theta = matrix(0, nrow = n + 1)
We can then compute the cost function and its gradient at the initial theta like this:
cost <- costFunction(initial_theta, X, y)
(grad <- cost$grad)
## [,1]
## [1,] -0.100
## [2,] -12.009
## [3,] -11.263
(cost <- cost$J)
## [,1]
## [1,] 0.69315
Finally, we can use optim to get the optimal theta:
res <- optim(par = initial_theta,
fn = function(t) costFunction(t, X, y)$J,
gr = function(t) costFunction(t, X, y)$grad,
method = "BFGS", control = list(maxit = 400))
(theta <- res$par)
## [,1]
## [1,] -25.08949
## [2,] 0.20566
## [3,] 0.20089
(cost <- res$value)
## [1] 0.2035
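Because the Dropbox link may no longer be reachable, here is a self-contained variant on simulated data; the design, coefficients and sample size are hypothetical. Since glm(..., family = binomial) maximises the same logistic likelihood, its coefficients give an independent check on the cost and gradient handed to optim().

```r
# Simulated stand-in for ex2data1.txt (hypothetical design and coefficients)
set.seed(1)
m <- 200
X <- cbind(1, matrix(rnorm(m * 2), nrow = m))
theta_true <- c(-0.5, 1, -2)
y <- rbinom(m, 1, plogis(drop(X %*% theta_true)))

sigmoid <- function(z) 1 / (1 + exp(-z))

# Mean cross-entropy cost and its gradient, as functions of theta only
cost <- function(theta) {
  h <- sigmoid(drop(X %*% theta))
  -mean(y * log(h) + (1 - y) * log(1 - h))
}
grad <- function(theta) {
  drop(crossprod(X, sigmoid(drop(X %*% theta)) - y)) / length(y)
}

res <- optim(rep(0, ncol(X)), cost, grad, method = "BFGS",
             control = list(maxit = 400))

# glm() maximises the same likelihood, so the coefficients should agree closely
glm_fit <- glm(y ~ X - 1, family = binomial)
cbind(optim = res$par, glm = unname(coef(glm_fit)))
```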
If you have a problem with download.file(), the data can be downloaded here.
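As you did not provide a reproducible example, it is hard to give you the exact code you need, but the general idea is to hand the functions over to ucminf(), which optimises over its first argument, par. Because your functions take theta as their third argument, wrap them so that theta is the argument being optimised:
ucminf(par = as.numeric(initial_theta), fn = function(theta) as.numeric(costfunction(X, y, theta)), gr = function(theta) as.numeric(gradfunction(X, y, theta)))
Note that par needs to be a vector of starting values for theta, with one entry per column of X (including the intercept column). Zeros, as in initial_theta, or random starting values (e.g., from runif) are common choices.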