Background
I'm doing empirical likelihood maximization and hit an error. The statistics are a bit complex, so I describe the problem as follows.
Problem
Suppose I have a multivariate function with domain
$\{(x, y, z) : x + y + z = 1,\; x \ge 0,\; y \ge 0,\; z \ge 0\},$
and I want to minimize the function over this domain.
Here is the contour plot for my function; the optimum seems to be x = y = 0.5 and z = 0.
if (!require(plotly)) {
  install.packages('plotly')
  require(plotly)
} else {
  require(plotly)
}
myfun <- function(x, y) {
  z <- 1 - x - y
  # want to minimize f
  f <- (x - 0.26)^2 + (y - 0.51)^2 + 100 * (z + 0.6)^2
  return(f) # return f, so that optim can minimize f
}
plot_ly(
  x = seq(0, 1, length.out = 101),
  y = seq(0, 1, length.out = 101),
  z = outer(seq(0, 1, length.out = 101), seq(0, 1, length.out = 101), FUN = myfun),
  type = "contour"
)
My failed attempt
Since z = 1 - x - y, I tried optimizing over the two parameters (x, y) with bounds [0, 1] on each.
Code:
myobj <- function(para) {
  x <- para[1]
  y <- para[2]
  z <- 1 - x - y
  # want to minimize f
  f <- (x - 0.26)^2 + (y - 0.51)^2 + 100 * (z + 1.6)^2
  return(f) # return f, so that optim can minimize f
}
# initialization x=y=0, lower bound (0,0), upper bound (1,1), without restriction on z
optim(par = c(0,0),fn = myobj,method = "L-BFGS-B",lower = c(0,0),upper = c(1,1))
Output:
$par
[1] 1 1
$value
[1] 36.7877
$counts
function gradient
2 2
$convergence
[1] 0
$message
[1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
The output shows that x = y = 1 is the result, but when x = y = 1, z = 1 - 1 - 1 = -1, which is not in the domain.
I am wondering how I can put the restriction on z and get the right result, using a function like optim?
Thanks a lot!
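For completeness, one lightweight workaround that stays with plain optim() is to add a penalty for negative z. This is a sketch of my own (not taken from the answers below); it uses the (z + 0.6)^2 objective from the contour-plot code and only enforces z >= 0 approximately:
# z = 1 - x - y, so z >= 0 is the same as x + y <= 1, which L-BFGS-B box
# constraints cannot express directly; penalize negative z instead
myobj_pen <- function(para, penalty = 1e6) {
  x <- para[1]
  y <- para[2]
  z <- 1 - x - y
  f <- (x - 0.26)^2 + (y - 0.51)^2 + 100 * (z + 0.6)^2
  f + penalty * min(z, 0)^2  # zero when z >= 0, large when z < 0
}
optim(par = c(0.2, 0.2), fn = myobj_pen, method = "L-BFGS-B",
      lower = c(0, 0), upper = c(1, 1))
The answers below solve the constrained problem exactly (constrOptim(), CVXR, quadprog), which is preferable; the penalty version typically lands a tiny distance on the wrong side of the boundary.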
CVXR vs. constrOptim time comparison
library(microbenchmark)
library(CVXR)
method_const <- function(){
  myobj <- function(para){
    x <- para[1]
    y <- para[2]
    z <- 1 - x - y
    # want to minimize f
    f <- (x - 0.26)^2 + (y - 0.51)^2 + 100 * (z + 0.6)^2
    return(f) # return f, so that optim can minimize f
  }
  # constrOptim() enforces ui %*% par - ci >= 0; the five rows encode
  # x >= 0, x <= 1, y >= 0, y <= 1 and x + y <= 1 (i.e. z >= 0)
  res <- constrOptim(c(0.01, 0.01), myobj, NULL,
                     ui = rbind(c(1,0), c(-1,0), c(0,1), c(0,-1), c(-1,-1)),
                     ci = c(0, -1, 0, -1, -1))
  return(round(c(res$par, res$value), 4))
}
method_CVXR <- function(){
  # declaration of the variable vector gg = (x, y, z)
  gg <- Variable(3)
  # setup constraints
  cons <- list(sum(gg) == 1, gg[1] >= 0, gg[2] >= 0, gg[3] >= 0)
  # formulate objective function
  obj <- Minimize((gg[1] - 0.26)^2 + (gg[2] - 0.51)^2 + 100*(gg[3] + 0.6)^2)
  problem <- Problem(obj, cons)
  # solve the optimization problem
  res <- solve(problem)
  return(round(c(res$getValue(gg), res$value), 4))
}
method_CVXR2 <- function(){
  # declaration of variables x, y, and z
  x <- Variable(1)
  y <- Variable(1)
  z <- Variable(1)
  # setup constraints
  cons <- list(x + y + z == 1, x >= 0, y >= 0, z >= 0)
  # formulate objective function
  obj <- Minimize((x - 0.26)^2 + (y - 0.51)^2 + 100*(z + 0.6)^2)
  problem <- Problem(obj, cons)
  # solve the optimization problem
  res <- solve(problem)
  return(round(c(res$getValue(x), res$getValue(y), res$getValue(z), res$value), 4))
}
time_res = microbenchmark(method_const(),method_CVXR(),method_CVXR2())
print(time_res)
ggplot2::autoplot(time_res)
It is more natural to solve this as a quadratic programming (QP) problem. The quadprog package provides such a QP solver, although the problem formulation is a bit clumsy.
library(quadprog)
# objective function as a quadratic program:
# solve.QP() minimizes 1/2 b'Db - d'b, so D and d below are the original
# objective scaled by 1/2 (the scaling and constant terms do not change the minimizer)
D <- diag(c(1, 1, 100))
d <- c(0.26, 0.51, -60)
# equality constraint x + y + z = 1 (first row, meq = 1) and bounds x, y, z >= 0
A <- rbind(c(1,1,1), diag(3))
b <- c(1, 0, 0, 0)
meq <- 1
s <- solve.QP(D, d, t(A), b, meq)
s$solution
## [1] 0.375 0.625 0.000
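A quick analytic check of this result (my own derivation, not part of the original answer): at any feasible point with $z > 0$, moving mass from $z$ to $x$ (or $y$) changes the objective at rate $2(x - 0.26) - 200(z + 0.6) \le 1.48 - 120 < 0$, so the objective can always be decreased and the minimizer must have $z = 0$. Minimizing $(x - 0.26)^2 + (y - 0.51)^2$ subject to $x + y = 1$ then gives $x - 0.26 = y - 0.51$, i.e. $x = 0.375$ and $y = 0.625$, in agreement with the quadprog output.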
Time comparisons on my computer are as follows:
Unit: milliseconds
expr mean median
method_quadprog() 0.035 0.032
method_solnl() 1.696 1.037
method_fmincon() 1.677 1.092
method_constroptim() 2.130 1.644
method_CVXR() 113.590 97.924
REMARK: Please note that fmincon is just a wrapper for solnl.
The solver behind CVXR is one of the fastest available in R, but building the model takes some time. That is why CVXR is not efficient for very small problems such as this one, but it can be hundreds of times faster than the others for quite large problems.
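For reference, the timing table above also lists method_quadprog(), which is not defined in the post. A minimal wrapper consistent with the quadprog code above (the function body is my reconstruction) could look like this:
library(quadprog)
method_quadprog <- function() {
  D <- diag(c(1, 1, 100))           # 1/2 * Hessian; scaling does not change the minimizer
  d <- c(0.26, 0.51, -60)
  A <- rbind(c(1, 1, 1), diag(3))   # x + y + z = 1 (equality), then x, y, z >= 0
  b <- c(1, 0, 0, 0)
  solve.QP(D, d, t(A), b, meq = 1)$solution
}
The remaining method_*() functions are defined in the benchmark code given in the next answer.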
Here is the performance comparison among several approaches
> time_res
Unit: milliseconds
expr min lq mean median uq max neval
method_constroptim() 1.8112 1.86370 3.140725 1.97750 2.07470 11.8188 20
method_fmincon() 1.1804 1.22620 1.633585 1.37365 1.45635 7.0064 20
method_solnl() 1.0980 1.17495 2.165110 1.27700 1.40575 9.3543 20
method_CVXR() 111.4424 121.00940 155.573570 129.92280 149.25700 414.5042 20
where the code for the benchmark is given below:
library(microbenchmark)
library(CVXR)
library(pracma)
library(NlcOptim)
# objective function for minimization
f <- function(v) {
  x <- v[1]
  y <- v[2]
  z <- 1 - x - y
  (x - 0.26)^2 + (y - 0.51)^2 + 100 * (z + 0.6)^2
}
# constrOptim()
method_constroptim <- function(){
  v0 <- c(0.01, 0.01)   # starting values for (x, y)
  # ui %*% par - ci >= 0 encodes 0 <= x <= 1, 0 <= y <= 1 and x + y <= 1
  ui <- rbind(c(1,0), c(-1,0), c(0,1), c(0,-1), c(-1,-1))
  ci <- c(0, -1, 0, -1, -1)
  constrOptim(v0, f, NULL, ui = ui, ci = ci)
}
# fmincon() from "pracma" package
method_fmincon <-function(){
v0 <- c(0.5,0.5,0)
Aeq <- t(matrix(c(1,1,1)))
beq <- 1
lb <- c(0,0,0)
ub <- c(1,1,1)
fmincon(v0,f,Aeq = Aeq, beq = beq, lb = lb, ub = ub)
}
# solnl() from "NlcOptim" package
method_solnl <- function() {
v0 <- c(0.5,0.5,0)
Aeq <- t(matrix(c(1,1,1)))
beq <- 1
lb <- c(0,0,0)
ub <- c(1,1,1)
solnl(v0,f,Aeq = Aeq, Beq = beq, lb = lb, ub = ub)
}
# solnl() from "CVXR" package
method_CVXR <-function(){
x <- Variable(1)
y <- Variable(1)
cons <- list(x>=0, y>=0, x+y<=1)
obj <- Minimize((x - 0.26)^2 + (y-0.51)^2 + 100*(1-x-y +0.6)^2)
problem <- Problem(obj,cons)
solve(problem)
}
time_res = microbenchmark(method_constroptim(),
method_fmincon(),
method_solnl(),
method_CVXR(),
times = 20)
Related
I wrote some R code for simulating random samples from a Poisson distribution, based on the description of an algorithm (see attached image). But my code does not seem to work correctly, because the generated random samples are of a different pattern compared with those generated by R's built-in rpois() function. Can anybody tell me what I did wrong and how to fix my function?
r.poisson <- function(n, l=0.5)
{
U <- runif(n)
X <- rep(0,n)
p=exp(-l)
F=p
for(i in 1:n)
{
if(U[i] < F)
{
X[i] <- i
} else
{
p=p*l/(i+1)
F=F+p
i=i+1
}
}
return(X)
}
r.poisson(50)
The output is very different from rpois(50, lambda = 0.5). The algorithm I followed is the one shown in the attached image.
(Thank you for your question. Now I know how a Poisson random variable is simulated.)
You had a misunderstanding. The inverse CDF method (with recursive computation) you referenced is used to generate a single Poisson random sample. So you need to fix this function to produce a single number. Here is the correct function, commented to help you follow each step.
rpois1 <- function (lambda) {
## step 1
U <- runif(1)
## step 2
i <- 0
p <- exp(-lambda)
F <- p
## you need an "infinite" loop
## no worry, it will "break" at some time
repeat {
## step 3
if (U < F) {
X <- i
break
}
## step 4
i <- i + 1
p <- lambda * p / i ## I have incremented i, so it is `i` not `i + 1` here
F <- F + p
## back to step 3
}
return(X)
}
Now to get n samples, you need to call this function n times. R has a nice function called replicate to repeat a function many times.
r.poisson <- function (n, lambda) {
## use `replicate()` to call `rpois1` n times
replicate(n, rpois1(lambda))
}
Now we can make a reasonable comparison with R's own rpois.
x1 <- r.poisson(1000, lambda = 0.5)
x2 <- rpois(1000, lambda = 0.5)
## set breaks reasonably when making a histogram
xmax <- max(x1, x2) + 0.5
par(mfrow = c(1, 2))
hist(x1, main = "proof-of-concept-implementation", breaks = seq.int(-0.5, xmax))
hist(x2, main = "R's rpois()", breaks = seq.int(-0.5, xmax))
Remark:
Applaud jblood94 for exemplifying how to seek a vectorization opportunity in an R loop, without converting everything to C/C++. R's rpois is coded in C; that is why it is fast.
A vectorized version will run much faster than a non-vectorized function using replicate(). The idea is to iteratively drop the uniform random samples whose Poisson value has already been determined as i is incremented.
r.poisson1 <- function(n, l = 0.5) {
U <- runif(n)
i <- 0L
X <- integer(n)
p <- exp(-l)
F <- p
idx <- 1:n
while (length(idx)) {
bln <- U < F
X[idx[bln]] <- i
p <- l*p/(i <- i + 1L)
F <- F + p
idx <- idx[!bln]
U <- U[!bln]
}
X
}
#Zheyuan Li's non-vectorized functions:
rpois1 <- function (lambda) {
## step 1
U <- runif(1)
## step 2
i <- 0
p <- exp(-lambda)
F <- p
## you need an "infinite" loop
## no worry, it will "break" at some time
repeat {
## step 3
if (U < F) {
X <- i
break
}
## step 4
i <- i + 1
p <- lambda * p / i
F <- F + p
## back to step 3
}
return(X)
}
r.poisson2 <- function (n, lambda) {
## use `replicate()` to call `rpois1` n times
replicate(n, rpois1(lambda))
}
Benchmark:
microbenchmark::microbenchmark(r.poisson1(1e5),
r.poisson2(1e5, 0.5),
rpois(1e5, 0.5))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> r.poisson1(1e+05) 3.063202 3.129151 3.782200 3.225402 3.734600 18.377700 100
#> r.poisson2(1e+05, 0.5) 217.631002 244.816601 269.692648 267.977001 287.599251 375.910601 100
#> rpois(1e+05, 0.5) 1.519901 1.552300 1.649026 1.579551 1.620451 7.531401 100
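As a quick distributional sanity check (my own addition, not part of the original answer), the empirical frequencies from the vectorized generator can be compared with the theoretical Poisson(0.5) probabilities:
set.seed(1)
x <- r.poisson1(1e5, 0.5)
# empirical relative frequencies vs. dpois() probabilities for counts 0..5
round(rbind(empirical   = table(factor(x, levels = 0:5)) / length(x),
            theoretical = dpois(0:5, 0.5)), 4)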
I have an assignment to estimate the parameter $\theta$ from a sample with a Pareto distribution with density $f(x; \theta) = \theta / x^{\theta + 1}$, $x \ge 1$, where $\theta > 0$ is an unknown parameter. However, we do not know the realized sample $x$; we only know, for each $x_i$, a given interval $(u_i, v_i)$ in which it lies.
Using the EM algorithm, we have to estimate the parameter $\theta$. The EM algorithm has to be implemented in R; the code is given below.
When I run the code, I get an error because of NaN values. I've tried changing the starting value of the parameter, but NaN values still appear. How can I fix this?
set.seed(1)
library(VGAM)
library(ggplot2)
#--------------------------------------------------------------------
# EM algorithm
# step E
expected_xs <- function(teta, u, v) {
teta/(teta-1) *
1/(1/(u^teta)-1/(v^teta))
}
# step M
maximize_logL <- function(xs) {
length(xs)/
sum(log(xs))
}
EM_estimate <- function(teta_0, u, v, tol = 1e-8, maxiter = 1000) {
xs <- expected_xs(teta_0, u, v)
teta <- maximize_logL(xs)
print(teta)
iter <- 1
while(!is.na(teta) && (abs(teta - teta_0) > tol) &&
iter < maxiter) {
iter <- iter + 1
teta_0 <- teta
xs <- expected_xs(teta_0, u, v)
teta <- maximize_logL(xs)
print(teta)
}
return(teta)
}
#--------------------------------------------------------------------
# Data
df <- read.table(header=T, text="
interval freq
1 1-1.5 15
2 1.5-2 5
3 2-2.5 3
4 2.5-3 3
5 3-1000 4")
df
#u <- c(1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5)
#v <- c(1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10)
u <- seq(1, 999.5, by=0.5)
v <- seq(1.5, 1000, by=0.5)
teta1=EM_estimate(0.3, u, v)
teta1
# we compare barplot with density (with its now estimated parameter)
barplot(df$freq, names.arg = df$interval)
curve(100*dpareto(x,teta1,1), add=TRUE, col="steelblue", lwd = 2)
One more thing: when I change teta/(teta-1) to teta/(teta+1) here:
expected_xs <- function(teta, u, v) {
teta/(teta-1) *
1/(1/(u^teta)-1/(v^teta))
}
everything works normally.
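For reference (my own note, not part of the original post): under the stated density, the mean of $X$ truncated to a finite interval $(u, v)$ is $E[X \mid u < X < v] = \frac{\theta}{\theta - 1} \cdot \frac{u^{1-\theta} - v^{1-\theta}}{u^{-\theta} - v^{-\theta}}$ for $\theta \ne 1$, which differs from expected_xs() by the $u^{1-\theta} - v^{1-\theta}$ factor in the numerator. Without that factor, the leading term $\theta/(\theta - 1)$ makes the computed xs negative whenever $\theta < 1$ (as with the starting value 0.3), so log(xs) in maximize_logL() returns NaN; that is also consistent with the observation that replacing it with the always-positive teta/(teta+1) removes the error.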
I am trying to run the simulation below. Note that it requires Mosek and RMosek to be installed!
I keep getting the error
Error in KWDual(A, d, w, ...) :
Mosek error: MSK_RES_TRM_STALL: The optimizer is terminated due to slow progress.
How can I resolve the MSK_RES_TRM_STALL error?
Further Research
When looking up the documentation for this error, I found the following:
The optimizer is terminated due to slow progress.
Stalling means that numerical problems prevent the optimizer from making reasonable progress and that it makes no sense to continue. In many cases this happens if the problem is badly scaled or otherwise ill-conditioned. There is no guarantee that the solution will be feasible or optimal. However, often stalling happens near the optimum, and the returned solution may be of good quality. Therefore, it is recommended to check the status of the solution. If the solution status is optimal the solution is most likely good enough for most practical purposes.
Please note that if a linear optimization problem is solved using the interior-point optimizer with basis identification turned on, the returned basic solution is likely to have high accuracy, even though the optimizer stalled.
Some common causes of stalling are a) badly scaled models, b) near feasible or near infeasible problems.
So I checked the final value of A, but nothing was in it. I found that if I change the number of simulations from 1000 to 30 I do get values (A <- sim1(30, 30, setting = 1)), but this is suboptimal.
Reproducible Script
KFE <- function(y, T = 300, lambda = 1/3){
# Kernel Fourier Estimator: Stefanski and Carroll (Statistics, 1990)
ks <- function(s,x) exp(s^2/2) * cos(s * x)
K <- function(t, y, lambda = 1/3){
k <- y
for(i in 1:length(y)){
k[i] <- integrate(ks, 0, 1/lambda, x = (y[i] - t))$value/pi
}
mean(k)
}
eps <- 1e-04
if(length(T) == 1) T <- seq(min(y)-eps, max(y)+eps, length = T)
g <- T
for(j in 1:length(T))
g[j] <- K(T[j], y, lambda = lambda)
list(x = T, y = g)
}
BDE <- function(y, T = 300, df = 5, c0 = 1){
# Bayesian Deconvolution Estimator: Efron (B'ka, 2016)
require(splines)
eps <- 1e-04
if(length(T) == 1) T <- seq(min(y)-eps, max(y)+eps, length = T)
X <- ns(T, df = df)
a0 <- rep(0, ncol(X))
A <- dnorm(outer(y,T,"-"))
qmle <- function(a, X, A, c0){
g <- exp(X %*% a)
g <- g/sum(g)
f <- A %*% g
-sum(log(f)) + c0 * sum(a^2)^.5
}
ahat <- nlm(qmle, a0, X=X, A=A, c0 = c0)$estimate
g <- exp(X %*% ahat)
g <- g/integrate(approxfun(T,g),min(T),max(T))$value
list(x = T,y = g)
}
W <- function(G, h, interp = FALSE, eps = 0.001){
#Wasserstein distance: ||G-H||_W
H <- cumsum(h$y)
H <- H/H[length(H)]
W <- integrate(approxfun(h$x, abs(G(h$x) - H)),min(h$x),max(h$x))$value
list(W=W, H=H)
}
biweight <- function(x0, x, bw){
t <- (x - x0)/bw
(1-t^2)^2*((t> -1 & t<1)-0) *15/16
}
Wasser <- function(G, h, interp = FALSE, eps = 0.001, bw = 0.7){
#Wasserstein distance: ||G-H||_W
if(interp == "biweight"){
yk = h$x
for (j in 1:length(yk))
yk[j] = sum(biweight(h$x[j], h$x, bw = bw)*h$y/sum(h$y))
H <- cumsum(yk)
H <- H/H[length(H)]
}
else {
H <- cumsum(h$y)
H <- H/H[length(H)]
}
W <- integrate(approxfun(h$x, abs(G(h$x) - H)),min(h$x),max(h$x),
rel.tol = 0.001, subdivisions = 500)$value
list(W=W, H=H)
}
sim1 <- function(n, R = 10, setting = 0){
A <- matrix(0, 4, R)
if(setting == 0){
G0 <- function(t) punif(t,0,6)/8 + 7 * pnorm(t, 0, 0.5)/8
rf0 <- function(n){
s <- sample(0:1, n, replace = TRUE, prob = c(1,7)/8)
rnorm(n) + (1-s) * runif(n,0,6) + s * rnorm(n,0,0.5)
}
}
else{
G0 <- function(t) 0 + 7 * (t > 0)/8 + (t > 2)/8
rf0 <- function(n){
s <- sample(0:1, n, replace = TRUE, prob = c(1,7)/8)
rnorm(n) + (1-s) * 2 + s * 0
}
}
for(i in 1:R){
y <- rf0(n)
g <- BDE(y)
Wg <- Wasser(G0, g)
h <- GLmix(y)
Wh <- Wasser(G0, h)
Whs <- Wasser(G0, h, interp = "biweight")
k <- KFE(y)
Wk <- Wasser(G0, k)
A[,i] <- c(Wg$W, Wk$W, Wh$W, Whs$W)
}
A
}
require(REBayes)
set.seed(12)
A <- sim1(1000, 1000, setting = 1)
I ran the code and indeed it stalls at the end, but the solution is no worse than in the preceding cases that solve without a stall:
17 1.7e-07 3.1e-10 6.8e-12 1.00e+00 5.345949918e+00 5.345949582e+00 2.4e-10 0.40
18 2.6e-08 3.8e-11 2.9e-13 1.00e+00 5.345949389e+00 5.345949348e+00 2.9e-11 0.41
19 2.6e-08 3.8e-11 2.9e-13 1.00e+00 5.345949389e+00 5.345949348e+00 2.9e-11 0.48
20 2.6e-08 3.8e-11 2.9e-13 1.00e+00 5.345949389e+00 5.345949348e+00 2.9e-11 0.54
Optimizer terminated. Time: 0.62
Interior-point solution summary
Problem status : PRIMAL_AND_DUAL_FEASIBLE
Solution status : OPTIMAL
Primal. obj: 5.3459493890e+00 nrm: 6e+00 Viol. con: 2e-08 var: 0e+00 cones: 4e-09
Dual. obj: 5.3459493482e+00 nrm: 7e-01 Viol. con: 1e-11 var: 4e-11 cones: 0e+00
A quick hack for now that worked for me is to relax the termination tolerances a little bit in the call to GLmix:
control <- list()
control$dparam <- list(INTPNT_CO_TOL_REL_GAP=1e-7,INTPNT_CO_TOL_PFEAS=1e-7,INTPNT_CO_TOL_DFEAS=1e-7)
h <- GLmix(y,control=control,verb=5)
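If you want the simulation loop to keep running before a package-level fix is available, one option (a sketch of my own) is a defensive wrapper that retries GLmix() with the relaxed tolerances from the hack above whenever the first call errors out with the stall message:
safe_GLmix <- function(y) {
  tryCatch(GLmix(y), error = function(e) {
    # retry with slightly relaxed interior-point tolerances
    ctrl <- list(dparam = list(INTPNT_CO_TOL_REL_GAP = 1e-7,
                               INTPNT_CO_TOL_PFEAS   = 1e-7,
                               INTPNT_CO_TOL_DFEAS   = 1e-7))
    GLmix(y, control = ctrl, verb = 5)
  })
}
In sim1() the call h <- GLmix(y) could then be replaced by h <- safe_GLmix(y).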
A better solution, as I indicated in the comments, is for the REBayes package not to treat the stall termination code as an error, but to use the solution status/quality instead.
I have modified the return from KWDual to avoid such messages, provided that the status sol$itr$solsta from Mosek is "Optimal"; this is in REBayes v2.2, now on CRAN.
I would like to use the multiroot function in the rootSolve package to find b and m.
The code is given below.
I tried different starting values, but the result was either NaN or a "NaNs produced" warning.
n <- 23
model <- function(theta){
b <- theta[1]
m <- theta[2]
power <- exp(-(x-m)*b)
a <- -n/sum(log(1-power))
betat <- apply(x,1,function(x) (x-m)*power/(1-power))
mut <- apply(x,1, function(x) power/(1-power))
F1 <- n/b-sum(x)+n*m+(a-1)*sum(betat)
F2 <- n*b-b*(a-1)*sum(mut)
c(F1=F1,F2=F2)
}
multiroot(f = model, start = c(.5, .5))
So can someone explain to me where the mistake is, please?
library(rootSolve)
x<- c(17.88,28.92,33,41.52,42.12,45.6,48.40,51.84,51.96,54.12,55.56,67.80,
68.64,68.64,68.88,84.12,93.12, 98.64,105.12,105.84,127.92,128.04,173.4)
n <- length(x)
model <- function(theta){
b <- theta[1]
m <- theta[2]
power <- exp(-(x-m)*b)
a <- -n/sum(log(1-power))
F1 <- n/b-sum(x-m) + (a-1)*sum((x-m)*power/(1-power))
F2 <- n*b - b*(a-1)*sum(power/(1-power))
c(F1=F1,F2=F2)
}
# model(c(b = 0.031, m = 4.748))
multiroot(f = model, start = c(.03, 5))
so the result is:
> multiroot(f = model, start = c(.03, 5))
$root
[1] 0.03140027 4.55976021
$f.root
F1 F2
-2.046363e-12 -6.217249e-15
$iter
[1] 5
$estim.precis
[1] 1.02629e-12
I am not able to apply the ucminf function to minimize my cost function in R.
Here is my cost function:
costfunction <- function(X,y,theta){
m <- length(y);
J = 1/m * ((-t(y)%*%log(sigmoid(as.matrix(X)%*%as.matrix(theta)))) - ((1-t(y))%*%log(1-sigmoid(as.matrix(X)%*%as.matrix(theta)))))
}
Here is my sigmoid function:
sigmoid <- function(t){
g = 1./(1+exp(-t))
}
Here is my gradient function:
gradfunction <- function(X,y,theta){
grad = 1/ m * t(X) %*% (sigmoid(as.matrix(X) %*% as.matrix(theta) - y));
}
I am trying to do the following:
library("ucminf")
data <- read.csv("ex2data1.txt",header=FALSE)
X <<- data[,c(1,2)]
y <<- data[,3]
qplot(X[,1],X[,2],colour=factor(y))
m <- dim(X)[1]
n <- dim(X)[2]
X <- cbind(1,X)
initial_theta <<- matrix(0,nrow=n+1,ncol=1)
cost <- costfunction(X,y,initial_theta)
grad <- gradfunction(X,y,initial_theta)
This is where I want to call ucminf to find the minimum cost and values of theta. I am not sure how to do this.
Looks like you are trying to do the week 2 problem of Coursera's machine learning course.
There is no need to use the ucminf package here; you can simply use the base R function optim, and it works.
We will define the sigmoid and cost functions first.
sigmoid <- function(z)
1 / (1 + exp(-z))
costFunction <- function(theta, X, y) {
m <- length(y)
J <- -(1 / m) * crossprod(c(y, 1 - y),
c(log(sigmoid(X %*% theta)), log(1 - sigmoid(X %*% theta))))
grad <- (1 / m) * crossprod(X, sigmoid(X %*% theta) - y)
list(J = J, grad = grad)
}
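For reference (my own note), this is the standard logistic-regression cost and gradient: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\bigl[y_i \log \sigma(x_i^\top \theta) + (1 - y_i)\log(1 - \sigma(x_i^\top \theta))\bigr]$ and $\nabla J(\theta) = \frac{1}{m} X^\top(\sigma(X\theta) - y)$; the crossprod() call simply evaluates the two log terms as a single inner product with the stacked vector c(y, 1 - y).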
Let's load the data now. To make this code reproducible, I put the data in my Dropbox.
download.file("https://dl.dropboxusercontent.com/u/8750577/ex2data1.txt",
method = "curl", destfile = "/tmp/ex2data1.txt")
data <- matrix(scan('/tmp/ex2data1.txt', what = double(), sep = ","),
ncol = 3, byrow = TRUE)
X <- data[, 1:2]
y <- data[, 3, drop = FALSE]
m <- nrow(X)
n <- ncol(X)
X <- cbind(1, X)
initial_theta = matrix(0, nrow = n + 1)
We can then compute the result of the cost function at the initial theta like this
cost <- costFunction(initial_theta, X, y)
(grad <- cost$grad)
## [,1]
## [1,] -0.100
## [2,] -12.009
## [3,] -11.263
(cost <- cost$J)
## [,1]
## [1,] 0.69315
Finally, we can use optim to get the optimal theta:
res <- optim(par = initial_theta,
fn = function(t) costFunction(t, X, y)$J,
gr = function(t) costFunction(t, X, y)$grad,
method = "BFGS", control = list(maxit = 400))
(theta <- res$par)
## [,1]
## [1,] -25.08949
## [2,] 0.20566
## [3,] 0.20089
(cost <- res$value)
## [1] 0.2035
If you have some problem with the function download.file, the data can be downloaded
here
As you did not provide a reproducible example, it is hard to give you the exact code you need, but the general idea is to hand the functions over to ucminf:
ucminf(start, costfunction, gradfunction, y = y, theta = initial_theta)
Note that start needs to be a vector of initial starting values which, when handed over as X to the two functions, needs to produce a result. Usually you would use random starting values (e.g., from runif()).
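For a concrete illustration, here is a sketch that reuses the costFunction() from the first answer together with the X, y and initial_theta built there (wrapping the returns into a plain scalar and vector is my addition, since ucminf() expects fn to return a scalar and gr a numeric vector):
library(ucminf)
fit <- ucminf(par = as.vector(initial_theta),
              fn  = function(t) as.numeric(costFunction(t, X, y)$J),
              gr  = function(t) as.vector(costFunction(t, X, y)$grad))
fit$par    # estimated theta
fit$value  # minimized cost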