The following code was presented by our teacher in a practical lesson. I have questions about the last line of the code in the optimization step using the function optim() but I add the other lines to clarify the work.
population <- read.csv("data.csv", header=FALSE, sep=';', dec=',')
and the data looks like:
1 8.29
2 5.37
3 10.61
4 5.92
5 14.99
6 9.74
7 15.47
.
.
.
We sample 100 elements from the population as
Sampled_Data <- sample(population$V1, 100)
then we write a function to calculate the likelihood "of the log-normal distribution" for the sampled elements as
MyFunc <- function(Myparameters,data){
Firstparameter <- Myparameters[1]
Secondparameter <- Myparameters[2]
n <- length(data)
Mydistribution <- -n/2*log(2*pi*(Secondparameter^2)) - sum(log(data)) - (1/(2*Secondparameter^2))*sum((log(data)-Firstparameter)^2)
return(Mydistribution)
}
Finally, we use the function optim() to estimate the two parameters of the distribution using the likelihood maximum function
optimisation <- optim(c(1,1),MyFunc,data=Sampled_Data)
in the function optim(), I don't understand why he added the vector c(1,1) while, from the documentation, we should fill it by the initial values for the parameters? does he assume that the initial values are 1 & 1 ? if so, based on what we assume the initial values.
Also, why he added data=Sampled_Data whereas there is no similar thing in the documentation? From the documentation, after adding the function, we should add other things like the gradient, method and bounds! but not the data we have!
Finally, if I want to specify the lower and upper bounds, it is not clear for me which values to use in my case with the log normal distribution.
I was lost where to post the question, here or in cross validated but I saw similar questions here. Anyway, if it is not the suitable place I will delete the question.
Unnamed arguments to functions in R are assigned in the order that they appear in the definition.
If we look at help with help(optim):
Usage
optim(par, fn, gr = NULL, ...,
method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
"Brent"),
lower = -Inf, upper = Inf,
control = list(), hessian = FALSE)
We will see that the first argument by default is par. Therefore, you are correct that your instructor has set par = c(1,1) and fn = MyFun. You will see also in help(optim) that you can set upper = and lower = arguments as well.
If you look back to the definition of MyFun, you will see that the first argument is Myparameters. Therefore, when optim is called, c(1,1) is passed to MyFun as its first argument. Thus, on the initial step, both Firstparameter and Secondparameter will be set to 1.
Finally, if you look carefully at help(optim):
Usage
optim(par, fn, gr = NULL, ...,
method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
"Brent"),
lower = -Inf, upper = Inf,
control = list(), hessian = FALSE)
Arguments
par Initial values for the parameters to be optimized over.
fn A function to be minimized (or maximized), with first argument the vector of parameters over which minimization is to take place. It should return a scalar result.
gr A function to return the gradient for the "BFGS", "CG" and "L-BFGS-B" methods. If it is NULL, a finite-difference approximation will be used.
... Further arguments to be passed to fn and gr.
You will see that ... can be further arguments to be passed to fn and thus to MyFun. In this case, data=Sampled_Data.
Related
Not sure if this numerical methods problem should really be here or in crossvalidated, but since I have a nice reproducible example I though I would start here.
I am going to be estimating and fitting a bunch of distributions both to some large data sets and to data sets generated randomly from similar distributions. As part of this process I will be generating estimates for the conditional mean of various value ranges, including truncated and non-truncated values of the right tail.
The function cr_moment below, given a pdf function for dfun and parameters for that function in params calculates the unconditional mean of that distribution. Given the upper, lower, or both bounds, it calculates the conditional mean for the range specified by those bounds, using the singly- or doubly-truncated distribution for those bounds. The function beneath it, cr_gb2, specializes cr_moment to the generalized beta distribution of the second kind. Finally, the parameter values supplied beneath that approximate the unadjusted current-dollar household income distribution from the US Census/BLS Current Population Survey for the year 2000. McDonald & Ransom 2008. (Also, kudos to Mikko Marttila on this list for help with coding this function).
This function gives me a failure to converge error, copied below, for various lower bounds and an upper bound equal to 4.55e8, or higher, but not at 4.54e8. The kth moment of the GB2 exists for k < shape1 * shape3, here about 2.51. This is a nice smooth unimodal function being integrated over a finite interval, and I don’t know why it is failing to converge and don-t know what to do about it. For other parameter values, but not this one, I have also seen convergence problems at the low end for lower bounds ranging from 6 to a couple of hundred.
Error in integrate(f = prob_interval, lower = lb, upper = ub, subdivisions = 100L):
the integral is probably divergent
455 billion will be above the highest observable income level, by one or two orders of magnitude, but given a wider range of parameter values and using hill-climbing algorithms to fit real and simulated data I think I will hit this wall many times. I know very little about numerical methods in a case like this and don’t really know where to start. Help and suggestions greatly appreciated.
cr_moment <- function(lb = -Inf, ub = Inf, dfun, params, v=1, ...){
x_pdf <- function(X){
X^v * do.call(what=dfun, args=c(list(x=X), params))
}
prob_interval <- function(X){
do.call(what=dfun, args=c(list(x=X), params))
}
integral_val <- integrate(f = x_pdf, lower = lb, upper = ub)
integral_prob <- integrate(f = prob_interval, lower = lb, upper = ub)
crm <- interval_val[[1]] / interval_prob[[1]]
out <- list(value = integral_val[[1]], probability = integral_prob[[1]],
cond_moment = crm)
out
}
library(GB2)
cr_gb2 <- function(lb = -Inf, ub = Inf, v = 1, params){
cr_moment(lb, ub, dfun = dgb2, params = get("params"))
}
GB2_params <- list(shape1 = 2.2474, scale = 58441.5, shape2 = 0.6186, shape3 = 1.118)
cr_gb2(lb=1, ub= 4.55e8, params = GB2_params)
So I'm trying to maximize the likelihood function for a gamma-poisson and I've programmed it into R as the following:
lik<- function(x,t,a,b){
for(i in 1:n){
like[i] =
log(gamma(a + x[i]))-log(gamma(a))
-log(gamma(1+x[i] + x[i]*log(t[i]/b)-(a+x[i])*log(1+t[i]/b)
}
return(sum(like))
}
where x and t are the data, and I have n data rows.
I need a and b to be solved for simultaneously. Does a built in function exist in R? Or do I need to hard code an algorithm to solve the system of equations? [I'd rather not] I know optimize() solves for 1 variable and so does fminbnd(). I'm trying to copy the behavior of FindMaximum() in mathematica. In a perfect world I'd like the code to work something like this:
optimize(f=lik, a>0, b>0, x=x, t=t, maximum=TRUE, iteration=5000)
$maximum
a 150
b 6
Thanks.
optim's first argument can be a vector of parameters. So you could try something like this:
lik <- function(p=c(1,1), x, t){
# In the body of the function replace a by p[1] and b by p[2]
}
optim(c(1,1), lik, method = c("L-BFGS-B"), x=x, t=t, control=list(fnscale=-1))
So the solution that ended up working out is:
attempt2d <- optim(
par = c(sumx/sumt, 1), fn = lik, data = data11,
method = "L-BFGS-B", control = list(fnscale = -1, trace=TRUE),
lower=0.1, upper = 170
)
However my parameters run out to 170, essentially meaning that my gamma parameters are Inf. Because gamma() hits infinity relatively quickly. And in mathematica the solutions are a=169 and b=16505, and R gets nowhere near that maxing out at 170. The known solutions are beyond 170 in some cases any solution for this anomaly?
I am trying to optmize a function in R
The function is the Likelihood function of negative binominal when estimating only mu parameter. This should not be a problem since the function clearly has just one point of maximum. But, I am not being able to reach the desirable result.
The function to be optmized is:
EMV <- function(data, par) {
Mi <- par
Phi <- 2
N <- NROW(data)
Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
return(Resultado)
}
Data is a vector of negative binomial variables with parameters 2 and 2
data <- rnegbin(10000, mu = 2, theta = 2)
When I plot the function having mu as variable with the following code:
x <- seq(0.1, 100, 0.02)
z <- EMV(data,0.1)
for (aux in x) {z <- rbind(z, EMV(data,aux))}
z <- z[2:NROW(z)]
plot(x,z)
I get the following curve:
And the maximum value of z is close to parameter value --> 2
x[which.max(z)]
But the optimization is not working with BFGS
Error in optim(par = theta, fn = EMV, data = data, method = "BFGS") :
non-finite finite-difference value [1]
And is not going to right value using SANN, for example:
$par
[1] 5.19767e-05
$value
[1] -211981.8
$counts
function gradient
10000 NA
$convergence
[1] 0
$message
NULL
The questions are:
What am I doing wrong?
Is there a way to tell optim that the param should be bigger than 0?
Is there a way to tell optim that I want to maximize the function? (I am afraid the optim is trying to minimize and is going to a very small value where function returns smallest values)
Minimization or Maximization?
Although ?optim says it can do maximization, but that is in a bracket, so minimization is default:
fn: A function to be minimized (or maximized) ...
Thus, if we want to maximize an objective function, we need to multiply an -1 to it, and then minimize it. This is quite a common situation. In statistics we often want to find maximum log likelihood, so to use optim(), we have no choice but to minimize the negative log likelihood.
Which method to use?
If we only do 1D minimization, we should use method "Brent". This method allows us to specify a lower bound and an upper bound of search region. Searching will start from one bound, and search toward the other, until it hit the minimum, or it reach the boundary. Such specification can help you to constrain your parameters. For example, you don't want mu to be smaller than 0, then just set lower = 0.
When we move to 2D or higher dimension, we should resort to "BFGS". In this case, if we want to constrain one of our parameters, say a, to be positive, we need to take log transform log_a = log(a), and reparameterize our objective function using log_a. Now, log_a is free of constraint. The same goes when we want constrain multiple parameters to be positive.
How to change your code?
EMV <- function(data, par) {
Mi <- par
Phi <- 2
N <- NROW(data)
Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
return(-1 * Resultado)
}
optim(par = theta, fn = EMV, data = data, method = "Brent", lower = 0, upper = 1E5)
The help file for optim says: "By default optim performs minimization, but it will maximize if control$fnscale is negative." So if you either multiply your function output by -1 or change the control object input, you should get the right answer.
I am teaching myself how to run some Markov models in R, by working through the textbook "Hidden Markov Models for Time Series: An Introduction using R". I am a bit stuck on how to go about implementing something that is mentioned in the text.
So, I have the following function:
f <- function(samples,lambda,delta) -sum(log(outer(samples,lambda,dpois)%*%delta))
Which I can optimize with respect to, say, lambda using:
optim(par, fn=f, samples=x, delta=d)
where "par" is the initial guess for lambda, for some x and d.
Which works perfectly fine. However, in the part of the text corresponding to the example I am trying to reproduce, they note: "The parameters delta and lambda are constrained by sum(delta_i)=1 for i=1,...m, delta_i>0, and lambda_i>0. It is therefore necessary to reparametrize if one wishes to use an unconstrained optimizer such as nlm". One possibility is to maximize the likelihood with respect to the 2m-1 unconstrained parameters".
The unconstrained parameters are given by
eta<-log(lambda)
tau<-log(delta/(1-sum(delta))
I don't entirely understand how to go about implementing this. How would I write a function to optimize over this transformed parameter space?
When using optim() without parmater transfromations like so:
simpleFun <- function(x)
(x-3)^2
out = optim(par=5,
fn=simpleFun)
the set of parmaters estimates would be obtained via out$par which is 3 in
the case, as you might expect. Alternatively, you can wrap your function
f in a transformation the parameters like so:
out = optim(par=5,
fn=function(x)
# apply the transformation x -> x^3
simpleFun(x^3))
and now the trick to get the correct set of optimal parmeters to your
function you need to apply the same transfromation to the parameter
estimates as in:
(out$par)^3
#> 2.99741
(and yes, the parameter estimate is slightly different. For this contrived
example, you could set method="BFGS" for a slightly better estimate. Anyhow, this goes to show that the choice of transformation does matter in
some cases, but that's for another discussion...)
To complete the answer, It sounds like you a want to use a wrapper like so
# the function to be optimized
f <- function(samples,lambda,delta)
-sum(log(outer(samples,lambda,dpois)%*%delta))
out <- optim(# par it now a 2m vector
par = c(eta1 = 1,
eta2 = 1,
eta3 = 1,
tau1 = 1,
tau2 = 1,
tau3 = 1),
# a wrapper that applies the constraints
fn=function(x,samples){
# exp() guarantees that the values of lambda are > 0
lambda = exp(x[1:3])
# delta is also > 0
delta = exp(x[4:6])
# and now it sums to 1
delta = delta / sum(delta)
f(samples,lambda,delta)
},
samples=samples)
The above guarantees that the the parameters passed to f()have the correct constraints, and as long as you apply the same transformation to out$par, optim() will estimate an optimal set of parameters for f().
(If anyone has a suggestion for a better title, please let me know.)
I am trying to write a backward induction optimization problem. (That might not be important, but if it helps, great.)
I have a function that is a function of two variables, x and y.
I have a matrix for which I know only the terminal column, and each column needs to be solves backwards using the last column and optimization over x and y.
For example
m.state=matrix(1:16,16,1)
m.valuemat=matrix(0,16,5)
# five is number of periods
#16 is num of states (rows)
##Suppose i want to make optim avoid chosing a configuration that lands us in states 1-5 at the end
m.valuemat[1:5,5]=-Inf
f.foo0=function(x,y){
util=2*x^2-y^1.5
return(util)
}
foo=function(x,y,a){
footomorrow=function(x,y,a){
at1=-x+2*y+a
atround=abs(m.state-at1)
round2=m.state[which(min(atround)==atround)]
at1=round2
Vtp1=m.valuemat[which(m.state==at1),(5+1)]
return(Vtp1)
}
valuetoday=f.foo0(x,y)+.9*footomorrow(x,y,a)
return(valuetoday)
}
# I know the final column should be all 0's
for(i in 1:4){
print(i)
i=5-i
for(j in 1:16){
tempfunction=function(x){
foo(x[1],x[2],m.state[j])
}
result=optim(c(.001,1), tempfunction, gr = NULL, method = "L-BFGS-B",
lower = c(0.001,0.001), upper = c(5,1),
control = list(fnscale=-1,
maxit=50000), hessian = FALSE)
m.valuemat[j,i]=result$value
print( m.valuemat)
}
}
The error you get is: Error in optim(c(0.001, 1), f.Vt.ext, gr = NULL, method = "L-BFGS-B", :
L-BFGS-B needs finite values of 'fn'.
Is there a way to make optim smarter about this? Or a condition I can put or something? This is obviously a simplified version of my real code.