My predictor (x) has a U-shaped distribution in relation to the binary outcome (y), with positive outcomes at both low and high values of x, leading to a biconcave ROC curve with a poor area under the curve (AUC).
To maximise its ability to discriminate the outcome, I am trying to optimise the parameters of a second-degree polynomial of x, using optim() with 1 - AUC as the cost function to minimise.
x = c(13,7,7,7,1,100,3,4,4,2,2,7,14,8,3,14,5,12,8,
13,9,4,9,4,8,3,13,9,4,4,5,9,10,10,7,6,12,7,2,
6,6,4,3,2,3,10,5,2,5,8,3,5,4,2,7,5,7,6,79,9)
y = c(0,0,1,0,0,1,0,0,1,1,0,1,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0)
library(pROC)  # roc() and its $auc field come from the pROC package

theta = c(0, 0, 0)

min_auc <- function(theta, x, y) {
  1 - roc(y, theta[1] + theta[2]*x + theta[3]*x^2)$auc
}

optim(theta, min_auc, x = x, y = y)
The results are as follows:
$par
[1] 0.0 0.1 0.0
$value
[1] 0.4380054
$counts
function gradient
8 NA
$convergence
[1] 0
$message
NULL
However, by setting the parameters manually, I know that min_auc can be reduced further:
theta = c(0, -40, 1)
(1 - roc(y, (theta[1] + theta[2]*x + theta[3]*(x^2)))$auc)
[1] 0.2762803
Could anyone explain to me what I am doing wrong, please? Is it possibly due to a non-convex cost function?
One possibility is that it's a collinearity problem. Scaling the input helps (scale() centres x and divides by its standard deviation):
min_auc <- function(theta, x, y) {
  1 - roc(y, theta[1] + theta[2]*scale(x) + theta[3]*scale(x)^2)$auc
}
optim(theta, min_auc, x = x, y = y)
# $par
# [1] -0.02469136 -0.03117284 0.11049383
#
# $value
# [1] 0.2762803
#
# $counts
# function gradient
# 30 NA
#
# $convergence
# [1] 0
#
# $message
# NULL
#
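Since Nelder-Mead can still stall on a surface like this, one cheap safeguard (just a sketch, using the scaled min_auc above) is to restart optim() from a handful of random starting values and keep the best run:

set.seed(1)  # the seed is arbitrary; it is only here for reproducibility
starts <- replicate(20, runif(3, -1, 1), simplify = FALSE)
runs <- lapply(starts, function(th) optim(th, min_auc, x = x, y = y))
best <- runs[[which.min(sapply(runs, `[[`, "value"))]]
best$value  # should be no worse than the 0.2762803 found above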
Another potential problem is that the surface you're optimising over has flat spots. Say, for example, that we fix the intercept in this equation to -2, which is about what you get from qlogis(mean(y)). Then you're only optimising over two parameters, so the surface is easier to see. Here's what it looks like with the two remaining theta terms on the horizontal axes and the 1 - AUC value on the vertical axis.
min_auc <- function(theta, x, y) {
  1 - roc(y, -2 + theta[1]*scale(x) + theta[2]*scale(x)^2)$auc
}
s <- seq(-.25, .25, length=50)
o <- outer(s, s, Vectorize(function(z,w)min_auc(c(z,w),x,y)))
library(plotly)
plot_ly(x = ~s, y = ~s, z = ~o) %>% add_surface()
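If you don't have plotly installed, base R's persp() renders the same surface, just less interactively:

persp(s, s, o, theta = 30, phi = 25,
      xlab = "theta[1]", ylab = "theta[2]", zlab = "1 - AUC")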
As you may have noticed above, there is no unique solution to this problem: lots of different parameter combinations reach the same minimum value.
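One way to see why: AUC depends only on the rank order of the scores, so any strictly increasing transformation of the linear predictor, for example rescaling the coefficients by a positive constant, gives exactly the same 1 - AUC. A quick check with (roughly) the coefficients the optimiser found above:

th <- c(-0.031, 0.110)    # approximately the values optim() returned
min_auc(th, x, y)
min_auc(10 * th, x, y)    # identical: the scores keep the same ordering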
I am using R's optim() function to estimate a set of parameters that optimise the user-defined function shown below, but optim() fails with:
Error in optim(pstart, llAgedepfn, method = "L-BFGS-B", upper = up, lower = lo) :
L-BFGS-B needs finite values of 'fn'
Please help. The complete script is shown below:
dataM<-cbind(c(1.91,0.29,0.08,0.02,0.01,0.28,0.45,0.36,0.42,0.17,0.16,0.06,0.17,0.17,0.12),
c(0.27,4.54,0.59,0.05,0.04,0.13,0.48,0.68,0.66,0.18,0.11,0.06,0.08,0.08,0.08),
c(0.07,0.57,4.48,0.48,0.02,0.05,0.09,0.43,0.78,0.52,0.17,0.10,0.05,0.05,0.14),
c(0.02,0.04,0.44,4.34,0.36,0.09,0.07,0.11,0.41,0.77,0.43,0.10,0.03,0.04,0.14),
c(0.01,0.04,0.01,0.36,2.20,0.46,0.19,0.15,0.19,0.34,0.62,0.30,0.09,0.03,0.22),
c(0.22,0.11,0.05,0.09,0.45,0.91,0.61,0.43,0.37,0.26,0.41,0.63,0.29,0.16,0.15),
c(0.31,0.35,0.07,0.05,0.16,0.54,0.81,0.59,0.48,0.36,0.33,0.43,0.47,0.26,0.20),
c(0.22,0.45,0.29,0.08,0.11,0.34,0.53,0.85,0.71,0.39,0.27,0.26,0.26,0.28,0.38),
c(0.22,0.36,0.44,0.26,0.12,0.24,0.36,0.59,0.91,0.61,0.35,0.28,0.20,0.22,0.29),
c(0.09,0.10,0.30,0.49,0.22,0.17,0.28,0.33,0.62,0.80,0.52,0.29,0.20,0.11,0.46),
c(0.10,0.07,0.12,0.32,0.48,0.32,0.30,0.27,0.42,0.61,0.78,0.47,0.33,0.23,0.49),
c(0.04,0.04,0.06,0.08,0.24,0.53,0.41,0.28,0.36,0.36,0.50,0.67,0.51,0.19,0.47),
c(0.10,0.05,0.04,0.02,0.07,0.23,0.43,0.26,0.23,0.23,0.33,0.48,0.75,0.51,0.49),
c(0.05,0.04,0.03,0.05,0.02,0.10,0.19,0.22,0.21,0.10,0.18,0.14,0.40,0.79,0.82),
c(0.03,0.02,0.03,0.03,0.06,0.04,0.06,0.12,0.11,0.18,0.16,0.14,0.16,0.34,1.26)
)
NormCM <- dataM/eigen(dataM)$values[1] # normalise the contact matrix: divide by its largest eigenvalue
w <- c(495,528,548,603,617,634,720,801,957,937,798,755,795,1016,2469)
g2 <- c(770,622,726,559,410,547,564,472,399,397,340,308,337,91,84)
h2 <- c(269,426,556,430,271,284,303,207,194,181,126,106,74,24,23)
z2 <- h2/g2
g1 <- c(774,527,665,508,459,539,543,492,402,412,365,342,213,146,152)
h1 <- c(56,31,84,173,103,85,123,70,71,80,55,25,18,12,26)
z1 <- h1/g1
#### Normal log-likelihood #########
llnormfn <- function(q) {
  tol <- 1e-9
  final.size.start <- 0.8
  zeta <- rep(final.size.start, nrow(NormCM))
  last.zeta <- rep(0, nrow(NormCM))
  current.diff <- tol + 1
  # fixed-point iteration for zeta = 1 - exp(-q * zeta %*% NormCM)
  while (current.diff > tol) {
    zeta <- 1 - exp(-(q*(zeta %*% NormCM)))
    current.diff <- sum(abs(last.zeta - zeta))
    last.zeta <- zeta
  }
  mu <- c(zeta)
  zigma <- z1*(1 - z1)/g1 + (z1 + mu)*(1 - (z1 + mu))/g2
  logliknorm <- -sum((((z2 - z1) - mu)**2)/2*zigma + 0.5*log(2*pi*zigma))
  return(logliknorm)
}
pstart <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
up <- c(5,5,5,5,5,5,5,5,5,5,5,5,5,5,5)
lo <- c(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1)
estm <- optim(pstart, llnormfn, method = "L-BFGS-B", upper = up, lower = lo )
Your llnormfn doesn't return a finite value for all parameter values within the range. For example, at the upper limit:
> llnormfn(up)
[1] NaN
Warning message:
In log(2 * pi * zigma) : NaNs produced
Because some elements of zigma are negative here, so the log produces NaNs.
If you restrict the range a bit, you can eventually find a spot where it does work...
> llnormfn(up-2)
[1] NaN
Warning message:
In log(2 * pi * zigma) : NaNs produced
> llnormfn(up-3)
[1] 42.96818
Let's check it works at the lower bound:
> llnormfn(lo)
[1] 41.92578
That looks fine. So either you've set the upper limit outside the computationally valid range of your function, or you've got a bug in llnormfn, or both, or something else.
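A quick way to map out where the function breaks down (a sketch; it just walks the diagonal of the box, where all 15 parameters are equal):

qs <- seq(0.1, 5, length.out = 50)
fin <- sapply(qs, function(q) is.finite(suppressWarnings(llnormfn(rep(q, 15)))))
range(qs[fin])   # the finite region along this diagonal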
If you run the optimisation with a reduced upper bound, you do get convergence:
> estm <- optim(pstart, llnormfn, method = "L-BFGS-B", upper = up-3, lower = lo )
> estm
$par
[1] 1.9042672 1.0891264 0.9916916 0.6208685 1.2413983 1.4822433 1.1243878
[8] 1.5224263 1.3686933 1.4876350 1.6231518 2.0000000 2.0000000 2.0000000
[15] 2.0000000
$value
[1] 38.32182
$counts
function gradient
23 23
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Although you might notice that some of those parameters are at the upper bound (2.0), which is an alarm bell.
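You can check directly which parameters ended up pinned at the bound:

which(abs(estm$par - (up - 3)) < 1e-6)
# [1] 12 13 14 15   (matching the 2.0000000 values in the output above)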
Check that your function behaves sensibly across its input range: try fixing all but one parameter and plotting how llnormfn behaves as you vary the remaining one. From a quick look, the function does not look smooth at all, with lots of discontinuities, so I doubt BFGS is a good method for optimising it.
For example, varying the fifth parameter between 0.1 and 2:
> s = seq(0.1,2,len=300)
> ss = sapply(1:length(s),function(i){ll=lo;ll[5]=s[i];llnormfn(ll)})
> plot(s,ss)
gives a plot of llnormfn against the fifth parameter, showing clear discontinuities.
I want to find Theta and Sigma that maximise the following function (this is the quantity computed by the EPPF code below, negated there so optim can minimise it):

f(Theta, Sigma) = [prod_{i=1}^{k-1} (Theta + i*Sigma)] / [prod_{j=1}^{n-1} (Theta + j)] * prod_{j=1}^{k} prod_{l=0}^{nj-1} (1 - Sigma + l)
The constraints are:
Theta > -Sigma
-1 < Sigma < 1
So one of my problems is that I don't know how to deal with the fact that one parameter appears in the constraint on the other parameter I want to optimise over.
I tried optim(), constrOptim(), and dfoptim.
Using optim():
k=8
i=1:(k-1)
x=c(5,0.2)
n=24
nj=c(3,4,8,1,1,4,2,1)
EPPF <- function(x, n, nj) {
  y <- (x[1]+1):(x[1]+1+(n-1)-1)
  z <- -(prod(x[1]+i*x[2])/(prod(y))*prod(sapply(nj, hfun)))  # negated so optim minimises
  return(z)
}

# note: hfun (and i) are looked up in the global environment, so x[2] below is
# always the global x[2] = 0.2, not the second parameter optim proposes
hfun <- function(p) {
  h <- (1-x[2]):((1-x[2])+p-1)
  prod(h)
}
> optim(c(6,0.3), fn=EPPF,method = "L-BFGS-B", n=n,nj=nj, lower = c(-x[1],-1), upper = c(Inf,1))
$par
[1] 6.0 0.3
$value
[1] -1.258458e-15
$counts
function gradient
2 2
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
I also tried using a constructor function:
make.EPPF <- function(n, nj, fixed = c(FALSE, FALSE)) {
  params <- fixed
  function(p) {
    hfun <- function(y) {
      h <- (1-sigma):((1-sigma)+y-1)
      prod(h)
    }
    params[!fixed] <- p
    theta <- params[1]
    sigma <- params[2]
    y <- (theta+1):(theta+1+(n-1)-1)
    z <- prod(theta+i*sigma)/(prod(y))*prod(sapply(nj, hfun))
    z
  }
}
EPPF <- make.EPPF(n, nj)
> optim(c(theta=6, sigma=0.5), fn=EPPF,method = "L-BFGS-B",lower = c(-sigma,-1), upper = c(Inf,1))
Error in optim(c(theta = 6, sigma = 0.5), fn = EPPF, method = "L-BFGS-B", :
object 'sigma' not found
Using constrOptim():
> A <- matrix(c(1,1,0,1,0,-1),3,2,byrow=T)
> b <- c(0,-1,-1)
>
> constrOptim(c(3,0.3),EPPF,NULL,A,b, control=list(fnscale=-1))
$par
[1] 3.0 0.3
$value
[1] 9.712117e-16
$counts
[1] 0
$convergence
[1] 0
$message
NULL
$outer.iterations
[1] 1
$barrier.value
[1] 7.313452e-05
Using Package dfoptim:
> library(dfoptim)
> nmkb(x=c(6,0.3), EPPF, lower=c(-x[2],-1), upper=c(Inf, 1 ))
Error in par < lower :
comparison (3) is possible only for atomic and list types
Either there is for some reason no convergence, or there are other errors.
I am relatively new to programming and R and would really appreciate it if someone could help me. Thanks!
These are 3 linear inequality constraints:
sigma + theta > 0
sigma + 1 > 0
-sigma + 1 > 0
You can do this with maxLik. But note that maxLik maximises the function, hence remove the '-' in front of z. Here is code that works for me (run via Rscript):
k=8
i=1:(k-1)
x=c(5,0.2)
n=24
nj=c(3,4,8,1,1,4,2,1)
EPPF <- function(x, n, nj) {
  theta <- x[1]
  sigma <- x[2]
  y <- (x[1]+1):(x[1]+1+(n-1)-1)
  z <- prod(x[1]+i*x[2])/(prod(y))*prod(sapply(nj, hfun))
  z <- log(z)  # work on the log scale
  return(z)
}
hfun <- function(p) {
  h <- (1-x[2]):((1-x[2])+p-1)
  prod(h)
}
library(maxLik)
# maxLik encodes linear inequality constraints as ineqA %*% par + ineqB >= 0;
# the three rows below are theta + sigma >= 0, sigma + 1 >= 0, and 1 - sigma >= 0
constraints <- list(ineqA = matrix(c(1,0,0,1,1,-1), 3, 2), ineqB = c(0,1,1))
m <- maxBFGS(EPPF, start = c(6,0.3), constraints = constraints, n = n, nj = nj)
print(summary(m))
I also took the logarithm of the result, as this leads to more "human" numbers; otherwise you have to re-tune the stopping conditions. The answer seems to be -1, 1.
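For comparison, the same feasible region can be handed to constrOptim(), which expects ui %*% par - ci >= 0 (a sketch using the log-scale EPPF defined above; fnscale = -1 again makes it maximise):

ui <- matrix(c(1,  1,    # theta + sigma >= 0
               0,  1,    # sigma >= -1
               0, -1),   # sigma <= 1
             nrow = 3, byrow = TRUE)
ci <- c(0, -1, -1)
constrOptim(c(6, 0.3), EPPF, grad = NULL, ui = ui, ci = ci,
            control = list(fnscale = -1), n = n, nj = nj)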