Debugging optimization of a function - r

I have a function that works just fine when asked to calculate the -logLik for given parameters. However, if I try to optimize the function, it returns an error message. I'm familiar with debug() for working through problems with a function, but how would I go about debugging the optimization of a function that otherwise works?
Lik <- function(params, data) {
....
return(-log( **likelihood equation** ))
}
These work!
Lik(params=c(3,10,2,9,rowMeans(data[1,])[1]), data = data1)
Lik(params=c(3,10,2,9.5,rowMeans(data[1,])[1]), data = data1)
GENE1 32.60705
GENE1 32.31657
This doesn't work!
optim(params=c(3,10,2,9,rowMeans(data[1,])[1]), data = data1, Lik, method = "BFGS")
Error in optim(params = c(3, 10, 2, 9, rowMeans(data[1, ])[1]), data = data1, :
cannot coerce type 'closure' to vector of type 'double'

The optim argument that holds the parameters to optimize over is named par, not params. You don't need to change your Lik function: it only needs to take the parameters to optimize over as its first argument, and what that argument is called doesn't matter.
This should work. Here I name the fn argument too, although positional matching would also find it because the other arguments are named.
optim(par=c(3, 10, 2, 9, rowMeans(data[1, ])[1]),
data=data1, fn=Lik, method="BFGS")
So what was happening in your code was that it was saving both params and data to send to the function, and then the first unnamed parameter was Lik so it was getting matched to the first parameter of optim, which is par, the parameters to optimize over. That parameter should be a numeric (a double, technically) but you were sending it a function (a closure, technically), hence the error message.
To debug, you could have turned on debugging for optim debug(optim) and then at the first browse, explored what the parameters were that it was using. You would have found exactly this, though simply in exploring the parameters, you would have discovered you named them incorrectly.
Browse[2]> print(par)
function(params, data) {... return(-log( **likelihood equation** ))}
Browse[2]> print(fn)
Error in print(fn) : argument "fn" is missing, with no default
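A lighter-weight check than stepping through debug(optim) is to ask R how it would match the arguments. The following is a minimal sketch, assuming a toy negative log-likelihood nll and simulated data y (both hypothetical stand-ins for Lik and data1); it uses match.call() to show which value ends up in par, then contrasts the failing and working calls.

```r
# Toy negative log-likelihood (hypothetical stand-in for Lik):
# params[1] is the mean, params[2] is log(sd) so that sd stays positive.
nll <- function(params, data) {
  -sum(dnorm(data, mean = params[1], sd = exp(params[2]), log = TRUE))
}

set.seed(42)
y <- rnorm(50, mean = 1, sd = 1.5)   # simulated data, an assumption

# Inspect how optim's formal arguments are matched, without running optim.
bad_call <- quote(optim(params = c(0, 0), data = y, nll, method = "BFGS"))
matched  <- match.call(optim, bad_call)
par_arg  <- as.list(matched)[["par"]]   # what optim would receive as par
print(par_arg)

# The misnamed call fails; capture the error instead of stopping.
bad <- tryCatch(
  optim(params = c(0, 0), data = y, nll, method = "BFGS"),
  error = function(e) e
)

# The corrected call: starting values go in par, the function in fn,
# and data is passed through ... to nll.
fit <- optim(par = c(0, 0), fn = nll, data = y, method = "BFGS")
print(fit$par)
```

Because params and data match none of optim's own arguments, they fall into ..., and the unnamed function is the only candidate left for par.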

It is bad practice to use built-in function names as object names created (or to be created) by the user.
When the user has not yet created a "data" object (a matrix or a data frame), the R interpreter scans the environments and finds that the only object named "data" is the built-in "data" function:
> class(data)
[1] "function"
> str(data)
function (..., list = character(), package = NULL, lib.loc = NULL, verbose = getOption("verbose"),
envir = .GlobalEnv)
Hence R treats the "data" object as a closure (a function declaration) that cannot be subsetted:
> data[1]
Error in data[1] : object of type 'closure' is not subsettable
So you should change the name of the parameter to something other than data.
And a second point, the syntax of optim is:
optim(par, fn, gr = NULL, ...,
method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
"Brent"),
lower = -Inf, upper = Inf,
control = list(), hessian = FALSE)
So in your example, the second argument supplied to optim should be the function Lik, not the data. Because params and data match none of optim's named arguments, the unnamed Lik is matched to the first argument, par, and optim then fails while trying to coerce that closure to a numeric vector. Supplying the starting values as par and the function as fn fixes this.
And more importantly, as @李哲源ZheyuanLi also points out, optim has no argument named "data". Anything that is not one of optim's own arguments is passed on to fn through "...", so data1 has to be supplied under the name that Lik uses for it.
And last, as @Aaron also pointed out, the first parameter is named "par", not "params".

Related

Use of variable arguments (dot-dot-dot) in stats::lm in R [duplicate]

Suppose we have a function that makes a call to stats::lm and takes a formula and a data frame as arguments. Further arguments that we want to pass to stats::lm can be provided using variable arguments:
outer_function <- function(formula, data, ...) {
z <- stats::lm(formula = formula, data = data, ...)
return(z)
}
Now suppose we want to use this function and provide an additional argument (weights) that will be passed to stats::lm.
data <- data.frame(replicate(5, rnorm(100)))
weights <- replicate(100, 1)
formula <- X1 ~ X2 + X3
outer_function(formula = formula, data = data, weights = weights)
This produces the following error in stats::lm:
Error in eval(extras, data, env) :
..1 used in an incorrect context, no ... to look in
Debugging the call to stats::lm I see that argument weights is correctly passed to stats::lm, but match.call(), which is later used for evaluation in the function, is
stats::lm(formula = formula, data = data, weights = ..1)
such that weights is assigned the first element of the ...-list, which is empty.
Can anybody elaborate on why this approach fails? In particular, if weights was a scalar (say 5) the problem would not arise and the match.call() would be
stats::lm(formula = formula, data = data, weights = 5)
For now, I am using the following solution for my function:
outer_function <- function(formula, data, ...) {
args <- list(formula = formula, data = data, ...)
z <- do.call(stats::lm, args)
return(z)
}
which works but I am still wondering whether there is no way around do.call in case the arguments in ... are vectors or lists.
I can't think of a work-around as safe and as succinct as do.call. I can explain what's going on, having debugged into the lm call.
In the body of lm, you'll find the statement
mf <- eval(mf, parent.frame())
On the right hand side of the assignment, mf is the call
stats::model.frame(formula = formula, data = data, weights = ..1,
drop.unused.levels = TRUE)
and parent.frame() is the frame of the outer_function call (in other words, the evaluation environment of outer_function). eval is evaluating mf in parent.frame(). Due to S3 dispatch, what is ultimately evaluated in parent.frame() is the call
stats::model.frame.default(formula = formula, data = data, weights = ..1,
drop.unused.levels = TRUE)
In the body of model.frame.default, you'll find the statement
extras <- eval(extras, data, env)
On the right hand side of this assignment, extras is the call
list(weights = ..1)
specifying the arguments from mf matched to the formal argument ... of model.frame.default (just weights, in this case, because model.frame.default has formal arguments named formula, data, and drop.unused.levels); data is the data frame containing your simulated data; and env is your global environment. (env is defined earlier in the body of model.frame.default as environment(formula), which is indeed your global environment, because that is where you defined formula.)
eval is evaluating extras in data with env as an enclosure. An error is thrown here, because the data frame data and your global environment env are not valid contexts for ..n. The symbol ..1 is valid only in the frame of a function with ... as a formal argument.
You might have deduced the issue from ?lm, which notes:
All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.
There is no problem when weights is given the value of a constant (i.e., not the name of a variable bound in an environment and not a function call) in the outer_function call, because in that situation match.call does not substitute the symbol ..n. Hence
outer_function(formula = formula, data = data, weights = 5)
works (well, a different error is thrown), but
weights <- 5
outer_function(formula = formula, data = data, weights = weights)
and
outer_function(formula = formula, data = data, weights = rep(1, 100))
don't.
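The substitution of ..n by match.call can be seen in isolation, without lm. This is a minimal sketch in which inner_fn and wrapper_fn are hypothetical names chosen for illustration: inner_fn simply returns its own matched call, and wrapper_fn forwards its ... to it.

```r
# inner_fn returns the call it was invoked with, as seen by match.call().
inner_fn <- function(w) match.call()

# wrapper_fn forwards everything in ... to inner_fn, like outer_function does.
wrapper_fn <- function(...) inner_fn(...)

weights <- rep(1, 3)

# Argument given as a variable name: match.call() records a ..n placeholder.
call_symbol <- wrapper_fn(w = weights)
print(call_symbol)

# Argument given as a function call: also recorded as a ..n placeholder.
call_expr <- wrapper_fn(w = rep(1, 3))
print(call_expr)

# Argument given as a constant: the value itself is recorded.
call_const <- wrapper_fn(w = 5)
print(call_const)
```

When such a recorded call is later evaluated somewhere other than the frame of wrapper_fn, as lm does with data and environment(formula), the ..1 placeholder has nothing to refer to.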

how to use Vector of two vectors with optim

I have a wrapper function around two functions. Each function has its own parameter vector. The main idea is to pass the parameters (a vector made up of two vectors) to optim and then maximize the sum of the two functions.
Since my real function is quite complex, I have tried to provide a simple example that is similar to it. Here is my code:
set.seed(123)
x <- rnorm(10,2,0.5)
ff <- function(x, parOpt){
out <- -sum(log(dnorm(x, parOpt[[1]][1], parOpt[[1]][2]))+log(dnorm(x,parOpt[[2]][1],parOpt[[2]][2])))
return(out)
}
# parameters in mu,sd vectors arranged in list
params <- c(set1 = c(2, 0.2), set2 = c(0.5, 0.3))
xy <- optim(par = params, fn=ff ,x=x)
Which returns this error:
Error in optim(par = params, fn = ff, x = x) :
function cannot be evaluated at initial parameters
As I understand it, I get this error because optim cannot pass the parameters to each part of my function. So, how can I tell optim that the first vector holds the parameters of the first part of my function and the second vector those of the second part?
You should change the method argument so that the initial parameters can be used.
You can read the detailed documentation for optim with the ?optim command.
For example, the "L-BFGS-B" method lets you set lower and upper bounds.
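Independently of the method chosen, it may help to look at what params actually contains. c() flattens its arguments into a single named numeric vector, so parOpt[[1]][2] is NA and the objective cannot be evaluated at the starting values. The following is a minimal sketch, assuming the four values are meant as mean and sd of the first part followed by mean and sd of the second; ff_flat is a hypothetical rewrite of ff that indexes the flat vector directly.

```r
set.seed(123)
x <- rnorm(10, 2, 0.5)

# c() flattens the two vectors into one named numeric vector of length 4.
params <- c(set1 = c(2, 0.2), set2 = c(0.5, 0.3))
print(names(params))
first_sd <- params[[1]][2]   # NA: params[[1]] is the single number 2
print(first_sd)

# Hypothetical rewrite: the parameter vector comes first and is indexed flat.
ff_flat <- function(parOpt, x) {
  m1 <- parOpt[1]; s1 <- parOpt[2]   # first part: mean, sd
  m2 <- parOpt[3]; s2 <- parOpt[4]   # second part: mean, sd
  if (s1 <= 0 || s2 <= 0) return(Inf)  # reject invalid standard deviations
  -sum(dnorm(x, m1, s1, log = TRUE) + dnorm(x, m2, s2, log = TRUE))
}

start_value <- ff_flat(params, x)    # finite, so optim can start here
fit <- optim(par = params, fn = ff_flat, x = x)
print(fit$par)
```

optim always hands fn a plain numeric vector, so the split into two sets has to happen inside the function, by position as here or by name.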

Methods of summary() function

I'm using summary() on the output of the mle function from stats4; its output belongs to class mle. I would like to find out how summary() estimates the standard deviation of the coefficients returned by mle, but I do not see summary.mle in the list printed by methods(summary). Why can't I find the summary.mle() function?
(I guess the proper function is summary.mlm(), but I'm not sure of that and don't know why it would be mlm instead of mle.)
What you are looking for is what summary.mle would be if it were an S3 method; here it is implemented as an S4 method instead. S3 methods get created and then dispatched using the generic_function_name.class_of_first_argument mechanism, whereas S4 methods are dispatched on the basis of their argument "signature", which allows consideration of second and later arguments. This is an instance where only the first argument is used as the signature. The call below shows how to get showMethods to display the code that is run when an S4 method is called. You can choose any of the object signatures that appear in the abbreviated output to specify the classes argument, and it is the includeDefs flag that prompts display of the code:
showMethods("summary",classes="mle", includeDefs=TRUE)
#---(output to console)----
Function: summary (package base)
object="mle"
function (object, ...)
{
cmat <- cbind(Estimate = object@coef, `Std. Error` = sqrt(diag(object@vcov)))
m2logL <- 2 * object@min
new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
}
As shown in
>library(stats4)
>showMethods("summary")
Function: summary (package base)
object="ANY"
object="mle"
The summary is dispatched the S4 way. I don't know how to check the code from within R, so I searched the source of stats4 for you.
In stats4/R/mle.R, there is:
setMethod("summary", "mle", function(object, ...){
cmat <- cbind(Estimate = object@coef,
`Std. Error` = sqrt(diag(object@vcov)))
m2logL <- 2*object@min
new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
})
So it creates an S4 object of class summary.mle. I guess you can trace the code yourself from here.
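The method definition can also be retrieved from within R as an object, which is handy if you want to inspect or reuse it rather than just print it. This is a minimal sketch using existsMethod() and getMethod() from the methods package; the variable names are only illustrative.

```r
library(methods)
library(stats4)

# Check that an S4 summary method is registered for class "mle".
has_method <- existsMethod("summary", "mle")
print(has_method)

# Retrieve the method definition itself; printing it shows the code.
summary_mle <- getMethod("summary", signature = "mle")
print(summary_mle)

# The body can be inspected like that of any other function.
method_body <- body(summary_mle)
print(method_body)
```

The printed body should show the standard errors being computed as sqrt(diag(object@vcov)), in line with the source quoted above.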

Why does order matter when using "data" and "formula" keyword arguments?

In R, why is it that the order of the data and formula keywords matters when plotting? I thought that with named arguments order isn't supposed to matter...
For an example of what I mean, check out this code:
library(MASS)
data(menarche)
# Correct formulation (apparently):
plot(formula=Menarche/Total ~ Age, data=menarche)
# In contrast, note how the following returns an error:
plot(data=menarche, formula=Menarche/Total ~ Age)
Is this just a quirk of the plot function or is this behavior exhibited in other functions as well?
This is related to S3 methods for the S3 generic plot(). S3 dispatches methods based on the first argument. The exact behaviour is more complicated here, though, because formula is allowed as a special exception to the usual generic arguments of plot(), which are x and y plus ...:
> args(plot)
function (x, y, ...)
NULL
Hence what happens in the first case is that the plot.formula() method is run because the first argument supplied is a formula and this matches the arguments of plot.formula()
> args(graphics:::plot.formula)
function (formula, data = parent.frame(), ..., subset, ylab = varnames[response],
ask = dev.interactive())
NULL
for example:
> debugonce(graphics:::plot.formula)
> plot(formula=Menarche/Total ~ Age, data=menarche)
debugging in: plot.formula(formula = Menarche/Total ~ Age, data = menarche)
debug: {
m <- match.call(expand.dots = FALSE)
[...omitted...]
In contrast, when you call plot(data=menarche, formula=Menarche/Total ~ Age), the first argument is a data frame and hence the graphics:::plot.data.frame method is called:
> plot(data=menarche, formula=Menarche/Total ~ Age)
Error in is.data.frame(x) : argument "x" is missing, with no default
> traceback()
3: is.data.frame(x)
2: plot.data.frame(data = menarche, formula = Menarche/Total ~ Age)
1: plot(data = menarche, formula = Menarche/Total ~ Age)
but because that method expects an argument x, which you didn't supply, you get the error about missing x.
So in a sense, the ordering of named arguments doesn't and shouldn't matter. When S3 generics are in play, however, method dispatch kicks in first to decide which method to pass the arguments on to, and then it is the arguments supplied, not their ordering, that will often catch you out, especially when mixing formula methods with non-formula methods.
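The same effect can be reproduced with a toy generic, without any plotting. This is a minimal sketch in which describe and its methods are hypothetical names made up for illustration; the generic has x as its first formal, like plot(), and neither call supplies an argument named x.

```r
# A toy S3 generic whose first formal is x, mirroring plot(x, y, ...).
describe <- function(x, ...) UseMethod("describe")

# Methods only report which one was chosen.
describe.formula    <- function(formula, data, ...) "formula method"
describe.data.frame <- function(x, ...) "data.frame method"
describe.default    <- function(x, ...) "default method"

df <- data.frame(a = 1:3, b = 4:6)

# Formula supplied first: dispatch picks the formula method.
first <- describe(formula = b ~ a, data = df)
print(first)

# Data frame supplied first: dispatch picks the data.frame method.
second <- describe(data = df, formula = b ~ a)
print(second)
```

Since nothing matches x, dispatch falls back on the first argument actually supplied, which is why swapping two named arguments changes the method that runs.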

Optimizing 2 sets of variable length vectors

I searched the questions here before posting and found only one on this topic, but it doesn't apply to my case.
I have uploaded the data for PRD, INJ, tao and lambda at the links below, which can be used to reproduce the problem:
PRD
INJ
lambda
tao
the code:
PRD=read.csv(file="PRD.csv")
INJ=read.csv(file="INJ.csv")
PRD=do.call(cbind, PRD)
INJ=do.call(cbind, INJ)
tao=do.call(cbind, read.csv(file="tao.csv",header=FALSE))
lambda=do.call(cbind, read.csv(file="lambda.csv",header=FALSE))
fn1 <- function (tao, lambda) {
  # preparing i.dash
  i.dash = matrix(ncol = ncol(INJ), nrow = nrow(INJ))
  for (i in 1:ncol(INJ)) {
    for (j in 1:nrow(INJ)) {
      temp = 0
      for (k in 1:j) {
        temp = (1/tao[i]) * exp((k-j)/tao[i]) * INJ[k,i] + temp
      }
      i.dash[j,i] = temp
    }
  }
  # preparing lambda x i.dash
  lambda.i = matrix(ncol = ncol(INJ), nrow = nrow(INJ))
  for (i in 1:ncol(INJ)) {
    lambda.i[,i] = lambda[i+1] * i.dash[,i]
  }
  # calc. q.hat (I need to add the pp term)
  q.hat = matrix(nrow = nrow(INJ), 1)
  for (i in 1:nrow(INJ)) {
    q.hat[i,1] = sum(lambda.i[i, 1:ncol(INJ)])
  }
  # sum of squared errors, returned as the value to minimize
  target = sum((PRD[,1] - q.hat[,1])^2)
  target
}
What I am trying to do is minimize the value target by optimizing lambda and tao, with the starting values being the ones uploaded above. I've used optim to do so, but I still receive the error cannot coerce type 'closure' to vector of type 'double'.
I've tried many variations of the optim call and still receive the same error.
The last syntax I used was optim(fn1, tao=tao, lambda=lambda, hessian=T)
Thanks
The calling form of optim is
optim(par, fn, gr = NULL, ...,
method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
"Brent"),
lower = -Inf, upper = Inf,
control = list(), hessian = FALSE)
So, you need to pass the parameters first, not the function. Note that "closure" is another term for "function", which explains the error message: you have passed a function as the first argument, when optim expected initial parameter values.
Note also that optim only optimises over the first argument of the function fn, so you will need to redesign your function fn1 so that it takes a single parameter vector. For example, it could be a single vector of the form c(n, t1, t2, ..., tn, l1, l2, l3, ..., lm), where the ti are the components of tao, the li are the components of lambda, and n tells you how many components tao has.
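One way to do this redesign without touching the existing function is a thin wrapper that unpacks the vector. The following is a minimal sketch under stated assumptions: two_arg_fn is a hypothetical placeholder objective standing in for fn1, the starting values are made up, and the number of tao components is kept outside the parameter vector (in n_tao) rather than stored as its first element, so that optim does not try to optimise over it.

```r
# Placeholder two-argument objective (hypothetical stand-in for fn1):
# a simple quadratic with a known minimum.
two_arg_fn <- function(tao, lambda) {
  sum((tao - c(1, 2))^2) + sum((lambda - c(0.5, 0.3, 0.2))^2)
}

# Made-up starting values for illustration.
tao0    <- c(3, 3)
lambda0 <- c(1, 1, 1)
n_tao   <- length(tao0)   # fixed split point, not optimised

# Wrapper taking one vector: the first n_tao entries are tao, the rest lambda.
packed_fn <- function(p) {
  tao    <- p[seq_len(n_tao)]
  lambda <- p[-seq_len(n_tao)]
  two_arg_fn(tao, lambda)
}

# Starting values first (par), function second (fn).
fit <- optim(par = c(tao0, lambda0), fn = packed_fn,
             method = "BFGS", hessian = TRUE)

tao_hat    <- fit$par[seq_len(n_tao)]
lambda_hat <- fit$par[-seq_len(n_tao)]
print(tao_hat)
print(lambda_hat)
```

The same unpacking applied to fit$par recovers the two estimated vectors after the optimisation.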

Resources