Optimisation reproduces initial values?

I am trying to maximize the following function with optimx in R
#Initial Parameter Values
beta1=0.5
beta2=0.5
lambda1=0.5
lambda2=0.5
delta=5
loglik=function(par) {sum(log(lambda1*PDF1+lambda2*PDF2))+delta*(lambda1+lambda2-1)}
G2=optimx(c(0.5,0.5,0.5,0.5,2),fn=loglik,gr=NULL, lower=-Inf, upper=Inf, hessian=FALSE)
But every time, the optimization simply reproduces the initial values I provide. For example, this is the output I get using the initial values above:
             p1  p2  p3  p4 p5    value fevals gevals niter convcode kkt1 kkt2 xtimes
Nelder-Mead 0.5 0.5 0.5 0.5  2 5144.569      6     NA    NA        0 TRUE   NA      0
BFGS        0.5 0.5 0.5 0.5  2 5144.569      1      1    NA        0 TRUE   NA      0
Can anyone please tell me what's going on?

Note: I switched out optimx for the built-in optim in my answer; this does not change the content. I also switched out your PDF1 and PDF2 in the function body for beta1 and beta2, on a hunch about your intent.
You're misunderstanding how optimx, and more fundamentally how R functions, work.
Here's your definition of loglik:
loglik <- function(par) {
  sum(log(lambda1*beta1 + lambda2*beta2)) + delta*(lambda1 + lambda2 - 1)
}
Now watch
> loglik(1)
[1] -0.6931472
> loglik(2)
[1] -0.6931472
> loglik("I like cats.")
[1] -0.6931472
You haven't so much defined a function as a constant. You can see this by observing that the function you defined makes no reference to its argument par. Instead, it ignores par and simply looks up the variables it contains in its enclosing environment.
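For comparison, here is a minimal sketch of the same trap with a made-up variable a: a "function" whose body never touches its argument just evaluates whatever free variables it finds in its enclosing environment.
a <- 10
f <- function(par) a + 1   # 'par' never appears in the body
f(1)              # 11
f("I like cats")  # 11 -- the argument is irrelevant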
You most likely meant to do this
loglik <- function(par) {
  sum(log(par[3]*par[1] + par[4]*par[2])) + par[5]*(par[3] + par[4] - 1)
}
After which the optimizer actually moves away from the starting values:
optim(c(0.5,0.5,0.5,0.5,2), fn=loglik, gr=NULL, lower=-Inf, upper=Inf, hessian=FALSE)
$par
[1] 0.6466066 0.8102440 -0.2802594 0.2236580 2.6381565
$value
[1] -40.91057
$counts
function gradient
501 NA
$convergence
[1] 1
A convergence code of 1 does not indicate convergence; per ?optim, a code of 1 "indicates that the iteration limit maxit had been reached."
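If you just want to let it run longer, you can raise that limit, e.g. (a sketch; 500 is the default maxit for Nelder-Mead):
optim(c(0.5,0.5,0.5,0.5,2), fn=loglik, control=list(maxit=5000))
Raising the limit alone may not help much here, though, because the objective is hitting invalid values.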
Indeed, there are plenty of warnings
warnings()
1: In log(par[3] * par[1] + par[4] * par[2]) : NaNs produced
2: In log(par[3] * par[1] + par[4] * par[2]) : NaNs produced
You'll have to sort that out; I don't know what you are actually trying to accomplish with this call.
Thank you, but the problem is that PDF1 and PDF2 are also functions, defined by
PDF1=function(beta1) {BiCopPDF(u,v,par=abs(beta1),family=5)}
and
PDF2=function(beta2) {BiCopPDF(u,v,par=beta2,family=3)}
How do I manage that?
You will have to call into PDF1 and PDF2 within the function you are optimizing. If I understand correctly, this would result in something like:
loglik <- function(par) {
  sum(log(par[3]*PDF1(par[1]) + par[4]*PDF2(par[2]))) + par[5]*(par[3] + par[4] - 1)
}
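Putting the pieces together, a rough sketch of the whole call might look like the following; it assumes u and v already exist in your workspace, that BiCopPDF comes from the VineCopula package, and it flips the sign of the objective because optimx minimizes by default:
library(optimx)
library(VineCopula)   # provides BiCopPDF; u and v are assumed to be defined already

PDF1 <- function(beta1) BiCopPDF(u, v, par = abs(beta1), family = 5)
PDF2 <- function(beta2) BiCopPDF(u, v, par = beta2, family = 3)

negloglik <- function(par) {
  -(sum(log(par[3]*PDF1(par[1]) + par[4]*PDF2(par[2]))) +
      par[5]*(par[3] + par[4] - 1))
}
G2 <- optimx(c(0.5, 0.5, 0.5, 0.5, 2), fn = negloglik)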

Related

Is it possible to solve equations in R that are not linear?

I want to build a function that takes E[x] and Var[x] and gives me the mean and standard error of a univariate lognormal variable, where
E[x] = exp(mu + theta)
Var[x] = exp(2*mu + theta)*(exp(theta) - 1)
The function would take E[x] and Var[x] as input and return theta and mu as output.
There are several packages that provide ways and means to solve a system of nonlinear equations. One of these is nleqslv.
You need to provide a function that returns the differences between the actual values of the equations and the desired values.
Load package nleqslv and define the following function
library(nleqslv)

f <- function(x, Ex, Varx) {
  y <- numeric(length(x))
  mu <- x[1]
  theta <- x[2]
  y[1] <- exp(mu + theta) - Ex
  y[2] <- exp(2*mu + theta)*(exp(theta) - 1) - Varx
  y
}
The vector x in the function contains the values of mu and theta.
An example with Ex=2 and Varx=3 and some random starting values
xstart <- c(1,1)
nleqslv(xstart,f,Ex=2,Varx=3)
gives the following
$x
[1] -0.6931472 1.3862944
$fvec
[1] -8.095125e-11 -8.111645e-11
$termcd
[1] 1
$message
[1] "Function criterion near zero"
$scalex
[1] 1 1
$nfcnt
[1] 31
$njcnt
[1] 2
$iter
[1] 22
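As a quick sanity check, plugging the reported x back into the two original expressions recovers the target values:
mu <- -0.6931472; theta <- 1.3862944
exp(mu + theta)                      # 2, i.e. Ex
exp(2*mu + theta)*(exp(theta) - 1)   # 3, i.e. Varx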
See the nleqslv manual for the meaning of the different elements of its return value.
If you want to investigate the effect of the different solving methods, try this:
testnslv(xstart,f,Ex=2,Varx=3)
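testnslv() compares the available methods and global strategies on your problem; if you already know which one you want, you can also request it directly, for example the full Newton method (same f and xstart as above):
nleqslv(xstart, f, Ex=2, Varx=3, method="Newton")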

R optimize multiple parameters

I am using R's optim() function to estimate a set of parameters which optimize the user-defined function shown below, but the optim() output is:
Error in optim(pstart, llAgedepfn, method = "L-BFGS-B", upper = up, lower = lo) :
L-BFGS-B needs finite values of 'fn'
Please help. The complete script is shown below:
dataM<-cbind(c(1.91,0.29,0.08,0.02,0.01,0.28,0.45,0.36,0.42,0.17,0.16,0.06,0.17,0.17,0.12),
c(0.27,4.54,0.59,0.05,0.04,0.13,0.48,0.68,0.66,0.18,0.11,0.06,0.08,0.08,0.08),
c(0.07,0.57,4.48,0.48,0.02,0.05,0.09,0.43,0.78,0.52,0.17,0.10,0.05,0.05,0.14),
c(0.02,0.04,0.44,4.34,0.36,0.09,0.07,0.11,0.41,0.77,0.43,0.10,0.03,0.04,0.14),
c(0.01,0.04,0.01,0.36,2.20,0.46,0.19,0.15,0.19,0.34,0.62,0.30,0.09,0.03,0.22),
c(0.22,0.11,0.05,0.09,0.45,0.91,0.61,0.43,0.37,0.26,0.41,0.63,0.29,0.16,0.15),
c(0.31,0.35,0.07,0.05,0.16,0.54,0.81,0.59,0.48,0.36,0.33,0.43,0.47,0.26,0.20),
c(0.22,0.45,0.29,0.08,0.11,0.34,0.53,0.85,0.71,0.39,0.27,0.26,0.26,0.28,0.38),
c(0.22,0.36,0.44,0.26,0.12,0.24,0.36,0.59,0.91,0.61,0.35,0.28,0.20,0.22,0.29),
c(0.09,0.10,0.30,0.49,0.22,0.17,0.28,0.33,0.62,0.80,0.52,0.29,0.20,0.11,0.46),
c(0.10,0.07,0.12,0.32,0.48,0.32,0.30,0.27,0.42,0.61,0.78,0.47,0.33,0.23,0.49),
c(0.04,0.04,0.06,0.08,0.24,0.53,0.41,0.28,0.36,0.36,0.50,0.67,0.51,0.19,0.47),
c(0.10,0.05,0.04,0.02,0.07,0.23,0.43,0.26,0.23,0.23,0.33,0.48,0.75,0.51,0.49),
c(0.05,0.04,0.03,0.05,0.02,0.10,0.19,0.22,0.21,0.10,0.18,0.14,0.40,0.79,0.82),
c(0.03,0.02,0.03,0.03,0.06,0.04,0.06,0.12,0.11,0.18,0.16,0.14,0.16,0.34,1.26)
)
NormCM <- dataM/eigen(dataM)$values[1]  # Normalize the contact matrix: divide by the largest eigenvalue
w <- c(495,528,548,603,617,634,720,801,957,937,798,755,795,1016,2469)
g2 <- c(770,622,726,559,410,547,564,472,399,397,340,308,337,91,84)
h2 <- c(269,426,556,430,271,284,303,207,194,181,126,106,74,24,23)
z2 <- h2/g2
g1 <- c(774,527,665,508,459,539,543,492,402,412,365,342,213,146,152)
h1 <- c(56,31,84,173,103,85,123,70,71,80,55,25,18,12,26)
z1 <- h1/g1
#### Normal loglikelihood #########
llnormfn <- function(q) {
  tol <- 1e-9
  final.size.start <- 0.8
  zeta <- rep(final.size.start, nrow(NormCM))
  last.zeta <- rep(0, nrow(NormCM))
  first.run <- T
  current.diff <- tol + 1
  loglik <- 0
  while (current.diff > tol) {
    zeta <- 1 - exp(-(q*(zeta %*% NormCM)))
    current.diff <- sum(abs(last.zeta - zeta))
    last.zeta <- zeta
  }
  mu <- c(zeta)
  zigma <- z1*(1-z1)/g1 + (z1+mu)*(1-(z1+mu))/g2
  logliknorm <- -sum((((z2-z1)-mu)**2)/2*zigma + 0.5*log(2*pi*zigma))
  return(logliknorm)
}
pstart <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
up <- c(5,5,5,5,5,5,5,5,5,5,5,5,5,5,5)
lo <- c(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1)
estm <- optim(pstart, llnormfn, method = "L-BFGS-B", upper = up, lower = lo )
Your llnormfn doesn't return a finite value for all values of its parameters within the range. For example at the upper limit:
> llnormfn(up)
[1] NaN
Warning message:
In log(2 * pi * zigma) : NaNs produced
That is because zigma goes negative here, so the log produces NaN.
If you restrict the range a bit you can eventually find a spot where it does work...
> llnormfn(up-2)
[1] NaN
Warning message:
In log(2 * pi * zigma) : NaNs produced
> llnormfn(up-3)
[1] 42.96818
Let's check it works at the lower range:
> llnormfn(lo)
[1] 41.92578
That looks fine. So either you've set that upper limit outside the computationally valid range of your function, or you've got a bug in your llnormfn function, or both, or something else.
If you do run the optimisation with a reduced upper bound you do get convergence:
> estm <- optim(pstart, llnormfn, method = "L-BFGS-B", upper = up-3, lower = lo )
> estm
$par
[1] 1.9042672 1.0891264 0.9916916 0.6208685 1.2413983 1.4822433 1.1243878
[8] 1.5224263 1.3686933 1.4876350 1.6231518 2.0000000 2.0000000 2.0000000
[15] 2.0000000
$value
[1] 38.32182
$counts
function gradient
23 23
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Although you might notice that some of those parameters are at the upper value (2.0), which is an alarm bell.
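A quick way to see which estimates are pinned there (a small check reusing the fit above):
which(abs(estm$par - (up - 3)) < 1e-6)   # indices of the parameters sitting on the upper bound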
Check that your function behaves sensibly for its input values: try fixing all but one parameter and plotting how llnormfn behaves while varying the remaining one. I had a quick look and the function does not look smooth at all, with lots of discontinuities, so I doubt BFGS is a good method for optimising it.
e.g. varying the fifth parameter between 0.1 and 2:
> s = seq(0.1,2,len=300)
> ss = sapply(1:length(s),function(i){ll=lo;ll[5]=s[i];llnormfn(ll)})
> plot(s,ss)
gives a plot (not reproduced here) showing the jagged, discontinuous behaviour described above.
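If you do want to keep the original bounds, one stop-gap (just a sketch, not a fix for the underlying model) is to wrap the objective so that non-finite values become a large penalty, which keeps L-BFGS-B out of the invalid region:
llnormfn_safe <- function(q) {
  val <- llnormfn(q)
  # llnormfn returns NaN when zigma goes negative; hand optim a large
  # finite value instead so the search can back away from that region
  if (!is.finite(val)) return(1e10)
  val
}
estm2 <- optim(pstart, llnormfn_safe, method = "L-BFGS-B", upper = up, lower = lo)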

optimx function in R

I am trying to use the optimx function in R, but keep getting the error message:
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Cannot evaluate function at initial parameters
I have looked at another Stack Overflow question, "R - Optimx for exponential function with 2 parameters - cannot evaluate function at initial parameter values", but that solution has not worked for me.
Here is my test data:
t <- seq(from=1,to=60,by=1)
len <- 100*(1-exp(-0.2*(t-0)))
t.data<-data.frame(t,len)
Starting values for par in the optimx function:
p <- as.vector(c(30,110,0.3,1.0))
Ages for the function below:
Age1 <- 1 #### a young age
Age2 <- 50 #### an old age
The function to be minimized:
schnute_richards <- function(p, data)   # parameters are Len1, Len2, K, R
{
  zero  <- p[1]^p[4]                          # Len1^R
  one   <- p[2]^p[4] - p[1]^p[4]              # (Len2^R - Len1^R)
  two   <- 1 - exp(-p[3]*(data$t - Age1))     # (1 - EXP(-K*(ObsAge - Age1)))
  three <- 1 - exp(-p[3]*(Age2 - Age1))       # (1 - EXP(-K*(Age2 - Age1)))
  pred  <- (zero + one*(two/three))^(1/p[4])  # final equation
  sum((data$len - pred)^2)
}
The optimx code:
temp <-optimx(p,function (x) schnute_richards(x[1],x[2]))
I tried other versions of this code, but I get the same error message. This pattern of code was the accepted solution in the other Stack Overflow question I referred to earlier.
Thanks for any help.
Study help(optimx). The objective function receives the whole parameter vector as its first argument, and any extra named arguments (here data) are passed through optimx's ... argument, so there is no need for an anonymous wrapper that splits the parameter vector into separate scalars.
library(optimx)
temp <- optimx(p, schnute_richards, data = t.data)
# p1 p2 p3 p4 value fevals gevals niter convcode kkt1 kkt2 xtimes
#Nelder-Mead 18.12639 99.99589 0.1999604 1.0005907 7.282821e-05 475 NA NA 0 FALSE TRUE 0.03
#BFGS 18.12844 99.99493 0.2000565 0.9993415 6.034452e-05 82 20 NA 0 FALSE TRUE 0.01
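If you prefer the anonymous-function style from your attempt, the wrapper just has to pass the whole parameter vector (and the data) through, for example:
temp <- optimx(p, function(x) schnute_richards(x, t.data))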

A strange error from an algorithm

I have written this function that computes the MLE from a Cauchy distribution numerically based on the Newton-Raphson algorithm:
mlec <- function(x, theta0 = median(x), numstp = 100, eps = 0.01){
  numfin <- numstp
  ic <- 0
  istop <- 0
  while (istop == 0) {
    ic <- ic + 1
    ltheta <- -2*sum((x - theta0)/(1 + (x - theta0)^2))
    lprimetheta <- -2*(sum(2*(x - theta0)^2/(1 + (x - theta0)^2)^2 -
                             1/(1 + (x - theta0)^2)^2))
    theta1 <- theta0 - (ltheta/lprimetheta)
    check <- abs((theta1 - theta0)/theta1)
    if (check < eps) { istop <- 1 }
    theta0 <- theta1
  }
  list(theta1 = theta1, check = check, realnumstps = ic)
}
The goal is then to generate observations from a Cauchy distribution with location parameter 2 and see how the MLE performs. The problem is that while for some samples the MLE runs wonderfully, for others I get the strange error
Error in if (check < eps) { : missing value where TRUE/FALSE needed
What is going on here? I have defined what "check" is so that shouldn't happen.
Thank you.
I've added a little bit of instrumentation (see the cat() statement in the middle) and an option to use a corrected second-derivative expression (fixed=TRUE; see below):
mlec <- function(x, theta0 = median(x), numstp = 100, eps = 0.01,
                 debug = TRUE, fixed = FALSE){
  numfin <- numstp
  ic <- 0
  istop <- 0
  while (istop == 0) {
    ic <- ic + 1
    ltheta <- -2*sum((x - theta0)/(1 + (x - theta0)^2))
    lprimetheta <- -2*(sum(2*(x - theta0)^2/(1 + (x - theta0)^2)^2 -
                             1/(1 + (x - theta0)^2)^2))
    if (!fixed) {
      theta1 <- theta0 - (ltheta/lprimetheta)
    } else theta1 <- theta0 - ltheta/ff(theta0)
    check <- abs((theta1 - theta0)/theta1)
    if (debug) cat(ic, ltheta, lprimetheta, theta0, theta1, check, "\n")
    if (check < eps) { istop <- 1 }
    theta0 <- theta1
  }
  list(theta1 = theta1, check = check, realnumstps = ic)
}
set.seed(1)
x <- rcauchy(100,2)
mlec(x)
Here's the tail end of the output:
## ic ltheta lprimetheta theta0 theta1 check
## 427 -4.48838e-75 -2.014555e-151 -4.455951e+76 -6.683926e+76 0.3333333
## 428 -2.992253e-75 -8.953579e-152 -6.683926e+76 -1.002589e+77 0.3333333
## 429 -1.994835e-75 -3.979368e-152 -1.002589e+77 -1.503883e+77 0.3333333
## 430 -1.32989e-75 0 -1.503883e+77 -Inf NaN
Now, why is it happening? Either you've got a bug somewhere, or the algorithm is unstable. tl;dr it turns out the answer is actually "both": your second-derivative expression seems wrong, but even if it were correct, the N-R algorithm is extremely unstable for this problem.
Here are your derivative and second-derivative functions (I'm wrapping them with Vectorize() for convenience so I can evaluate these at multiple theta values simultaneously):
lthetafun <- Vectorize(function(theta) {
  -2*sum((x - theta)/(1 + (x - theta)^2))
})

lprimethetafun <- Vectorize(function(theta) {
  -2*(sum(2*(x - theta)^2/(1 + (x - theta)^2)^2 -
            1/(1 + (x - theta)^2)^2))
})
A negative log-likelihood function based on the built-in dcauchy function:
thetafun <- Vectorize(function(theta) -sum(dcauchy(x,theta,log=TRUE)))
Check differentiation (at an arbitrarily chosen point):
library("numDeriv")
all.equal(grad(thetafun,2),lthetafun(2)) ## TRUE
Check second derivative:
hessian(thetafun,2) ## 36.13297
lprimethetafun(2) ## 8.609859: ???
I think your second-derivative expression is wrong.
The following alternative second-derivative function is based on lazily cheating with Wolfram Alpha, differentiating your gradient function (which matches with the finite-difference approximation):
ff <- Vectorize(function(theta)
2*sum(((x-theta)^2-1)/((x-theta)^2+1)^2))
ff(2) ## matches hessian() above.
But it looks like you may have further problems.
The negative log-likelihood surface looks OK:
curve(thetafun, from=-10,to=10,n=501)
But trouble is on the horizon:
curve(lthetafun, from=-10,to=10, n=501)
This looks irregular, and going up one level to the second derivative shows that it is:
curve(ff, from=-10, to=10, n=501)
Here's the curve of N-R updates:
ff2 <- function(x) x-lthetafun(x)/ff(x)
curve(ff2, from=-10, to=10, n=501,ylim=c(-100,100))
Yikes! This indicates why the Newton-Raphson method could go wrong unless you start close enough to the minimum (any time the likelihood surface has an inflection point, the N-R updating curve will have a pole ...). Further analysis of the problem would probably tell you why the second derivative of the Cauchy is so scary.
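If you want to see where those poles sit for your sample, you can look for sign changes in the second derivative (a quick sketch; the exact locations depend on x):
s <- seq(-10, 10, length.out = 501)
d2 <- ff(s)
s[which(diff(sign(d2)) != 0)]   # rough locations where ff crosses zero, i.e. where the update blows up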
If you just want to find the MLE, you can do it with some more robust 1-D method:
library("bbmle")
mle2(x~dcauchy(location=m),
data=data.frame(x),
start=list(m=median(x)),
method="Brent",
lower=-100,upper=100)
##
## Call:
## mle2(minuslogl = x ~ dcauchy(location = m), start = list(m = median(x)),
## method = "Brent", data = data.frame(x), lower = -100, upper = 100)
##
## Coefficients:
## m
## 1.90179
##
## Log-likelihood: -262.96
##
If you start close enough, N-R seems to work OK:
mlec(x,1.85,debug=FALSE,fixed=TRUE,eps=0.0001)
## $theta1
## [1] 1.901592
##
## $check
## [1] 5.214763e-05
##
## $realnumstps
## [1] 37079
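For comparison, base R's optimize() applied to the negative log-likelihood (thetafun from above) should land in essentially the same place, with no derivative bookkeeping at all:
optimize(thetafun, interval = c(-100, 100))
## $minimum should be near 1.9018, matching the mle2() fit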

constrained optimization in R

I am trying to use constrOptim (http://rss.acs.unt.edu/Rdoc/library/stats/html/constrOptim.html) to do optimization in R with some given linear constraints, but I am not able to figure out how to set up the problem.
For example, I need to maximize $f(x,y) = \log(x) + \frac{x^2}{y^2}$ subject to the constraints $g_1(x,y) = x + y < 1$, $g_2(x,y) = x > 0$ and $g_3(x,y) = y > 0$. How do I do this in R? This is just a hypothetical example; do not worry about its structure, I am only interested in how to set this up in R.
thanks!
Setting up the function was trivial:
fr <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  -(log(x1) + x1^2/x2^2)   # need the negative since constrOptim is a minimization routine
}
Setting up the constraint matrix was problematic due to a lack of much documentation, and I resorted to experimentation. The help page says "The feasible region is defined by ui %*% theta - ci >= 0". So I tested and this seemed to "work":
> rbind(c(-1,-1),c(1,0), c(0,1) ) %*% c(0.99,0.001) -c(-1,0, 0)
[,1]
[1,] 0.009
[2,] 0.990
[3,] 0.001
So I put in a row for each constraint/boundary:
constrOptim(c(0.99,0.001), fr, NULL, ui=rbind(c(-1,-1), # the -x-y > -1
c(1,0), # the x > 0
c(0,1) ), # the y > 0
ci=c(-1,0, 0)) # the thresholds
For this problem there is a potential difficulty in that for all values of x the function goes to Inf as y -> 0. I do get a max around x=.95 and y=0 even when I push the starting values out to the "corner", but I'm somewhat suspicious that this is not the true maximum which I would have guessed was in the "corner".
EDIT:
Pursuing this I reasoned that the gradient might provide additional "direction" and added a gradient function:
grr <- function(x) {   ## gradient of 'fr'
  x1 <- x[1]
  x2 <- x[2]
  c(-(1/x1 + 2*x1/x2^2),
    2*x1^2/x2^3)
}
This did "steer" the optimization a bit closer to the c(.999..., 0) corner, instead of moving away from it, as it did for some starting values. I remain somewhat disappointed that the process seems to "head for the cliff" when the starting values are close to the center of the feasible region:
constrOptim(c(0.99,0.001), fr, grr, ui=rbind(c(-1,-1), # the -x-y > -1
c(1,0), # the x > 0
c(0,1) ), # the y > 0
ci=c(-1,0, 0) )
$par
[1] 9.900007e-01 -3.542673e-16
$value
[1] -7.80924e+30
$counts
function gradient
2001 37
$convergence
[1] 11
$message
[1] "Objective function increased at outer iteration 2"
$outer.iterations
[1] 2
$barrier.value
[1] NaN
Note: Hans Werner Borchers posted a better example on R-Help that succeeded in getting the corner values by setting the constraint slightly away from the edge:
> constrOptim(c(0.25,0.25), fr, NULL,
ui=rbind( c(-1,-1), c(1,0), c(0,1) ),
ci=c(-1, 0.0001, 0.0001))
$par
[1] 0.9999 0.0001
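As a quick check against the rule quoted earlier (ui %*% theta - ci >= 0), the reported solution sits right on the boundary of the shrunken feasible region:
ui <- rbind(c(-1,-1), c(1,0), c(0,1))
ci <- c(-1, 0.0001, 0.0001)
ui %*% c(0.9999, 0.0001) - ci   # every component is >= 0 (two of them exactly 0)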
