Optimization with a function that involves integration - r

I'm working on a problem where a parameter is estimated through minimizing the sum of squares. The equations needed are:
I used optim in the package stats:
# provide the values for a test dataset (the y estimated should be 1.41)
pvector <- c(0.0036,0.0156,0.0204,0.0325,0.1096,0.1446,0.1843,0.4518)
zobs <- c(0.0971,0.0914,0.1629,0.1623,0.3840,0.5155,0.3648,0.6639)
# make input of the C value
c <- function(y){
gamma(y)/((gamma(y*(1-pvector)))*(gamma(y*pvector)))
}
# make input of the gamma function
F1 <- function(y){
f1 <- function(x){
c*(1-x)^(y*(1-pvector)-1)*x^(y*pvector-1)
}
return (f1)
}
# integration over x
int <- function(y){
integrate (F1(y),lower =0.001, upper =1)
}
# write the function for minimization
f2 <- function(y) {
sum ((int-zobs)^2)
}
# minimization
optim(0.01,f2, method = "Brent", lower =0, upper = 1000, hessian=TRUE)
Which didn't work. I received the following error message:
Error in int - zobs : non-numeric argument to binary operator
I think there must be something fundamentally wrong with the way the function was written.
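For what it's worth, the error message itself can be traced: int is a function, so int - zobs subtracts a vector from a function rather than from a number. The same applies to c inside f1, and integrate returns a list whose numeric result sits in $value. On top of that, the integrand returns eight values for every x because pvector has length 8. Below is a minimal sketch of one way the pieces might fit together. It rests on an assumption that the post does not confirm: the integrand as coded is the beta density dbeta(x, y*p, y*(1-p)), so the integral from 0.001 to 1 can be taken from pbeta instead of integrate. The name zhat and the lower bound of 0.01 are my own choices.

```r
pvector <- c(0.0036, 0.0156, 0.0204, 0.0325, 0.1096, 0.1446, 0.1843, 0.4518)
zobs    <- c(0.0971, 0.0914, 0.1629, 0.1623, 0.3840, 0.5155, 0.3648, 0.6639)

# Integral of the integrand from `lower` to 1, one value per element of pvector.
# Assumption: the integrand is dbeta(x, y * p, y * (1 - p)), so the integral is
# an upper-tail beta probability and no numerical quadrature is needed.
zhat <- function(y, lower = 0.001) {
  pbeta(lower, shape1 = y * pvector, shape2 = y * (1 - pvector),
        lower.tail = FALSE)
}

# Sum of squared differences: a single number, which is what optim() expects
f2 <- function(y) sum((zhat(y) - zobs)^2)

# The lower bound is kept above 0 because the gamma terms are undefined at y = 0
fit <- optim(1, f2, method = "Brent", lower = 0.01, upper = 1000)
fit$par
```

If the integral has to be computed numerically, the same structure should work with integrate(...)$value called once per element of pvector, for example through sapply.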

Related

Integration problem in R when I use the function "integrate"

I'm trying to compute a kind of Gini index using a generated dataset.
But, I got a problem in the last integrate function.
If I try to integrate the function named f1,
R says
Error in integrate(Q, 0, p) : length(upper) == 1 is not TRUE
My code is
# set up parameters b>a>1 and the number of observations n
n <- 1000
a <- 2
b <- 4
# generate x and y
# where x follows beta distribution
# y = 10x+3
x <- rbeta(n,a,b)
y <- 10*x+3
# the starting point of the integration having problem
Q <- function(q) {
quantile(y,q)
}
# integrate the function Q from 0 to p
G <- function(p) {
integrate(Q,0,p)
}
# compute a function
L <- function(p) {
numer <- G(p)$value
dino <- G(1)$value
numer/dino
}
# the part having problem
d <- 3
f1 <- function(p) {
((1-p)^(d-2))*L(p)
}
integrate(f1,0,1) # In this integration, the aforementioned error appears
I think the nested calls to integrate could be causing the problem, but I have no idea what exactly goes wrong.
Please help me!
As mentioned by @John Coleman, integrate needs a vectorized integrand and a suitable subdivisions setting to complete the integration. Even if you have already provided a vectorized function, it is sometimes tricky to choose subdivisions properly in integrate(..., subdivisions = ).
To address your problem, I recommend integral from the package pracma. You still need a vectorized integrand (see what I have done to the functions G and L), but there is no need to set subdivisions manually, i.e.,
library(pracma)
# set up parameters b>a>1 and the number of observations n
n <- 1000
a <- 2
b <- 4
# generate x and y
# where x follows beta distribution
# y = 10x+3
x <- rbeta(n,a,b)
y <- 10*x+3
# the starting point of the integration having problem
Q <- function(q) {
quantile(y,q)
}
# integrate the function Q from 0 to p
G <- function(p) {
integral(Q,0,p)
}
# compute a function
L <- function(p) {
numer <- Vectorize(G)(p)
dino <- G(1)
numer/dino
}
# the part having problem
d <- 3
f1 <- function(p) {
((1-p)^(d-2))*L(p)
}
res <- integral(f1,0,1)
then you will get
> res
[1] 0.1283569
The error that you reported is due to the fact that the function in integrate must be vectorized and integrate itself isn't vectorized.
From the help (?integrate):
f must accept a vector of inputs and produce a vector of function
evaluations at those points. The Vectorize function may be helpful to
convert f to this form.
Thus one "fix" is to replace your definition of f1 by:
f1 <- Vectorize(function(p) {
((1-p)^(d-2))*L(p)
})
But when I run the resulting code I always get:
Error in integrate(Q, 0, p) : maximum number of subdivisions reached
A solution might be to assemble a large number of quantiles and then smooth it out and use that rather than your Q, although the error here strikes me as odd.
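The vectorization requirement can be seen in isolation with a smooth toy integrand. This is only a sketch: q^2 stands in for the quantile function Q, so the inner integral is p^3/3 and the outer integral over [0, 1] should come out close to 1/12.

```r
# Inner integral G(p) = integral of q^2 from 0 to p = p^3 / 3.
# G only works for a single p, because integrate() needs a scalar upper limit.
G <- function(p) integrate(function(q) q^2, 0, p)$value

# Passing G directly fails: the outer integrate() hands G a whole vector of p
res_err <- tryCatch(integrate(G, 0, 1),
                    error = function(e) conditionMessage(e))

# Wrapping G in Vectorize() evaluates it one p at a time
res <- integrate(Vectorize(G), 0, 1)$value
res  # close to 1/12
```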

Error in using optim function due to length

I am trying to estimate the below log function using maximum likelihood method in R, but I get the following error:
Error in optim(start, f, method = method, hessian = TRUE, ...) : objective function in optim evaluates to length 10 not 1
My attempt was as follows:
Generating data
set.seed(101)
n <- 10
u <- runif(n)
theta1 <- 1
lamba1 <- 0.5
Generating PTIR data using quantile function
x <- function(u, theta1, lamba1) {
(-theta1/(log((1+lamba1)-sqrt((1+lamba1)^2-(4*lamba1*u)))/(2*lamba1)))^(1/(2))
}
x <- x(u = u, theta1 = theta1, lamba1 = lamba1)
Declaring the Log-Likelihood function
LL <- function(theta, lamba) {
R = suppressWarnings((n*log(2))+
(n*log(theta))-(((2)+1)*sum(log(x)))-
(sum(theta/(x^(2))))+
(log(1+lamba-(2*lamba*exp(-theta/(x^(2)))))))
return(-R)
}
mle(LL, start = list(theta = 5, lamba=0.5))
Any advice would be greatly appreciated.
I don't know how to fix your problem, but hopefully I can help you diagnose it. As @KonradRudolph suggests in the comments, this may be a case where the usual advice "add more parentheses if you're not sure" does more harm than good. I've rewritten your function in a way that matches what you've got above, but with fewer parentheses and more consistent line breaking/indentation (and with lamba spelled lambda). Every line below is a separate additive term. Your specific problem is that the last term involves x (which has length 10 in this case) but is not summed, so the return value ends up being a length-10 vector.
LL2 <- function(theta, lambda) {
R <- n*log(2)+
n*log(theta)-
((2)+1)*sum(log(x))-
sum(theta/(x^2))+
log(1+lambda-(2*lambda*exp(-theta/x^2)))
return(-R)
}
all.equal(LL(1,1),LL2(1,1)) ## TRUE
length(LL2(1,1)) ## 10
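If the last term is meant to be summed over the observations like the others (an assumption on my part, since the formula for the log-likelihood is not shown), a sketch of the change would look like this. LL3 is a hypothetical name, and x here is stand-in positive data rather than the PTIR sample from the question.

```r
set.seed(101)
n <- 10
x <- runif(n, 0.5, 2)  # stand-in positive data, not the sample from the question

LL3 <- function(theta, lambda) {
  R <- n * log(2) +
    n * log(theta) -
    3 * sum(log(x)) -
    sum(theta / x^2) +
    # assumed fix: sum the last term so that R is a single number
    sum(log(1 + lambda - 2 * lambda * exp(-theta / x^2)))
  -R
}

length(LL3(1, 0.5))  # 1
```

With a length-1 return value, optim no longer complains about the objective's length. Whether the estimates are sensible still depends on the likelihood being right.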

function minimization in r with constraints on function of parameters

Many libraries are available in R to perform minimisation. However, all the ones I could find (e.g. Rcgmin or optimx) only allow lower and upper bounds on the input parameters:
opt_Params <- Rcgmin(par = Params_init,
fn = cost_func,
gr = params_grad,
lower = min_par,
upper = max_par
)
I'm looking for something different: boundaries not on the input parameters, but on the values of a function that takes them.
Concretely, my cost_func is a cost function that measures the sum of the squared residuals between my (fixed) observed data Y_obs and the prediction from my fitted parameters Y_calc:
cost_func <- function(Params) {
X <- Params[1:(num_items*num_features)]
dim(X) <- c(num_items,num_features)
Theta <- Params[(num_items*num_features+1):length(Params)]
dim(Theta) <- c(num_users,num_features)
Y_calc <- X * t(Theta)
J <- ((Y_calc - Y_obs) * (Y_calc - Y_obs))
cost <- sum(rowSums(J)[])
return(cost)
}
Minimising the cost function, I can ensure that my predicted Y_calc get ever closer to Y_obs.
This however allows for arbitrary values in Y_calc. What I would like to do is to impose the same boundaries as I know to be present in Y_obs (between 0 and 10 - this is a collaborative filtering algorithm).
So I'm not trying to put constraints on my input parameters, but on a function of them (namely, on every element of Y_calc <- X * t(Theta)).
Is there an R minimisation library in which this is possible? Or do I need to change my approach?
I think you can try something like this :
library(DEoptim)
cost_func <- function(Params)
{
  X <- Params[1 : (num_items * num_features)]
  dim(X) <- c(num_items,num_features)
  Theta <- Params[(num_items * num_features + 1) : length(Params)]
  dim(Theta) <- c(num_users,num_features)
  Y_calc <- X * t(Theta)
  # any() collapses the element-wise test to a single TRUE/FALSE for if()
  if(any((Y_calc < 0) | (Y_calc > 10)))
  {
    # penalty: at least one prediction lies outside [0, 10]
    return(10 ^ 30)
  }else
  {
    J <- ((Y_calc - Y_obs) * (Y_calc - Y_obs))
    cost <- sum(rowSums(J)[])
    return(cost)
  }
}
# DEoptim() has no par argument; it searches the box defined by lower and upper
DEoptim(fn = cost_func,
        lower = min_par,
        upper = max_par)
If a parameter set generates any value of Y_calc outside the range 0 to 10, the objective function returns a very high value. Hence, the DEoptim algorithm will not consider this set of parameters as a candidate solution.
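The penalty idea can be tried on a toy problem without installing anything. This is a sketch under several assumptions: it uses base R's optim (Nelder-Mead) instead of DEoptim, the data are made up, and the prediction is taken to be the matrix product X %*% t(Theta), whereas the question's code writes *, which is element-wise in R.

```r
# Made-up toy data: 3 items, 2 users, 1 feature, ratings within [0, 10]
num_items <- 3
num_users <- 2
num_features <- 1
Y_obs <- matrix(c(2, 4, 6, 3, 6, 9), nrow = num_items)

# Rebuild the prediction matrix from the flat parameter vector
predict_Y <- function(Params) {
  X <- matrix(Params[1:(num_items * num_features)], num_items, num_features)
  Theta <- matrix(Params[(num_items * num_features + 1):length(Params)],
                  num_users, num_features)
  X %*% t(Theta)  # assumption: matrix product, not element-wise product
}

cost_func <- function(Params) {
  Y_calc <- predict_Y(Params)
  # hard penalty whenever any prediction leaves the allowed range
  if (any(Y_calc < 0 | Y_calc > 10)) return(1e30)
  sum((Y_calc - Y_obs)^2)
}

# Feasible starting point: every prediction equals 1
start <- rep(1, (num_items + num_users) * num_features)
fit <- optim(start, cost_func)
Y_fit <- predict_Y(fit$par)
range(Y_fit)
```

Because the starting point is feasible and the penalty is far larger than any feasible cost, the returned parameters stay inside the constraint. A hard wall like this gives the optimiser no gradient information near the boundary, which is one reason a population-based method such as DEoptim is a natural fit.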

Unsure what's causing this error in R (pmvt - Package: mvtnorm)?

I have a simple hazard function, the line causing the error is marked.
h <- function(t,u) {
x <- 1 - Sa(t)
y <- 1 - Sm(u)
invx <- as.numeric(qt(x,df=d1))
invy <- as.numeric(qt(x,df=d1))
[ERROR LINE] copula <- pmvt(lower=as.numeric(cbind(-9999,-9999)),upper=cbind(invx,invy),df=d1,corr=matrix(cbind(1,d2,d2,1),byrow=T,ncol=2) )
density <- dmvt(cbind(invx,invy),sigma=matrix(cbind(1,d2,d2,1),byrow=T,ncol=2),df=d1)
num <- (sa(t)*sm(u))*density/dt(invx,df=d1)/dt(invy,df=d1)
den <- 1 - x - y + copula
hazard <- num/den
return(hazard)
}
This hazard function is then called on by a likelihood function:
# log Likelihood function for each individual car i
lli <- function(data) {
result <- 0;
# for all claims, evaluate hazard function at that point
if (nrow(data)> 2) {
for (k in 1:nrow(data)) {
if (data[k,3] == 1) {
result <- result + log(h(data[k,2],data[k,1]));
}
}
}
# integrate hazard function over areas between claims
for (k in 1:(nrow(data)-1)) {
integral <- quad2d(h,data[k,2],data[k+1,2],data[k,1],data[k+1,1]);
result <- result - integral;
}
return(result)
}
This likelihood function is in turn called by a third function that runs over my entire data set. However, it is the function above that causes the error, not the one below.
# log Likelihood function over all vehicles
ll <- function(x) {
# Unpack parameters
d1 <<- x[1];
d2 <<- x[2];
total <- 0;
# Get log Likelihood for each vehicle
for (i in 1:length(alldata)) {
total <- total + lli(alldata[[i]]);
#print(sprintf("Found candidate solution %d value: %f",i,total));
}
#print(sprintf("Found candidate solution value: %f",total));
if (is.nan(total)) { #If it is undefined, make it a large negative number
total <- -2147483647 ;
}
return(-1*total); # Minimise instead of maximise
}
Error message is as follows:
> ll(cbind(50,0.923))
Error in checkmvArgs(lower = lower, upper = upper, mean = delta, corr = corr, :
‘diag(corr)’ and ‘lower’ are of different length
I kept getting this same error when using pmvnorm, and ended up having to use the pbivnorm package to get around this. I can't find an alternative package for the bivariate t distribution though. I don't understand what the problem is. When I call the function h(t,u) by itself it executes without a problem. But when lli(data) calls h(t,u), it doesn't work. What's even more bizarre is that they are the same length.
> length(as.numeric(cbind(-9999,-9999)))
[1] 2
> length(diag(matrix(cbind(1,d2,d2,1),byrow=T,ncol=2)))
[1] 2
I apologize for the messy code. I don't use R much. Anyway this has me completely stumped.
Data file is here: https://files.fm/u/yx9pw2b3
Additional code I forgot to include, basically some constants and marginal CDF functions:
Marginals.R:
p1 <- 0.4994485;
p2 <- 0.2344439;
p3 <- 0.1151654;
p4 <- 0.1509421;
b1 <- 0.7044292
t1 <- 1713.3170267
mu1 <- 7.014415
sig1 <- 1.394735
mu2 <- 6.926146
sig2 <- 1.056647
mu3 <- 6.7995896
sig3 <- 0.7212853
b2 <- 0.6444582
t2 <- 762.9962093
b3 <- 1.494303
t3 <- 410.828780
b1 <- 0.903
t1 <- 864.896
b2 <- 0.9109
t2 <- 314.2946
# Marginal survival distribution and density
Sa <- function(t) {return(exp(-(t / t1) ** b1))}
Sm <- function(u) {return(exp(-(u / t2) ** b2))}
sa <- function(t) {return((t / t1) ** b1 * b1 * exp(-(t / t1) ** b1) / t ) }
sm <- function(u) {return((u / t2) ** b2 * b2 * exp(-(u / t2) ** b2) / u ) }
Summary:
The problem is the difference in length between lower and upper when calling pmvt: upper has a length of 2048, while lower has a length of 2.
Reasoning:
1. pmvt checks the incoming arguments by calling checkmvArgs in the mvtnorm package.
2. In checkmvArgs, lower, upper and mean are put together by rec <- cbind(lower, upper, mean). Here the new object rec has 2048 rows instead of 2.
3. lower is then replaced by lower <- rec[, "lower"], so lower now has length 2048 instead of 2.
4. Since corr is still a 2 * 2 matrix, the error occurs when checking length(diag(corr)) != length(lower).
Solution:
invx <- as.numeric(qt(x,df=d1))
invy <- as.numeric(qt(x,df=d1))
upper is meant to be a length-2 vector, hence invx and invy need to be single numbers.
As I am not sure what upper range you want to define, I cannot take it further. One possibility is:
invx <- as.numeric(qt(x,df=d1))
invy <- as.numeric(qt(x,df=d1))
copula <- pmvt(lower=as.numeric(cbind(-9999,-9999)),upper=range(c(invx,invy)),df=d1,corr=matrix(c(1,d2,d2,1),byrow=T,ncol=2) )
This uses the range of invx and invy as the input, so the dmvt call is not affected.
Note:
As the value a is not provided, the line after the error line (the one calling dmvt) failed when I ran it.
Edit:
To make the issue more specific:
1. quad2d generates a Gauss-Legendre quadrature, which by default has a length of 32 on a given range.
2. Your function h is then called with the x and y from this Gauss-Legendre quadrature. Hence, the t and u defined in h are not single numbers; they are vectors.
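One way to reconcile a hazard that only makes sense for a single (t, u) pair with an integrator that passes many points at once is to evaluate point by point and restore the shape of the input. This is only a sketch of that pattern, not a fix for the copula itself. h_scalar is a hypothetical stand-in for h, and h_grid is a hypothetical wrapper name.

```r
# Hypothetical stand-in for a hazard that is only valid for one (t, u) pair;
# like the pmvt() call in h, it refuses to work on more than one point.
h_scalar <- function(t, u) {
  stopifnot(length(t) == 1, length(u) == 1)
  exp(-t) * u
}

# Wrapper: evaluate one point at a time, then restore the shape of the input
h_grid <- function(t, u) {
  out <- mapply(h_scalar, t, u)
  dim(out) <- dim(t)
  out
}

# A 4 x 4 grid of points standing in for the quadrature nodes
tt <- matrix(rep(seq(0, 1, length.out = 4), times = 4), 4, 4)
uu <- t(tt)
vals <- h_grid(tt, uu)
dim(vals)
```

A wrapper like h_grid could then be passed to quad2d in place of h, at the cost of one pmvt call per quadrature node.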

Trying to use the collin function in the R package FME to identify parameters and then fit them using modFit

I have a system of ODEs and some data, and I am using the R packages deSolve and FME to fit the parameters of the ODE system to the data. When I fit the full parameter set to the data, I get a singular matrix result. So I went back and looked at the collinearity of the parameters, using a collinearity index cut-off of 20 as suggested in the FME package documentation, and then picked a few models with subsets of parameters to fit. When I run modFit on those, I get this error:
Error in approx(xMod, yMod, xout = xDat) :
need at least two non-NA values to interpolate
Can anyone enlighten me as to a fix for this? Everything else is working fine, so this is not a coding problem.
Here is a minimal working example (removing r=2 in modFit creates the error which I can fix in the minimal working example but not in my actual problem so I doubt a minimal working example helps here):
## =======================================================================
## Now suppose we do not know K and r and they are to be fitted...
## The "observations" are the analytical solution
## =======================================================================
# You need these packages
library('deSolve')
library('FME')
## logistic growth model
TT <- seq(1, 100, 2.5)
N0 <- 0.1
r <- 0.5
K <- 100
## analytical solution
Ana <- cbind(time = TT, N = K/(1 + (K/N0 - 1) * exp(-r*TT)))
time <- 0:100
parms <- c(r = r, K = K)
x <- c(N = N0)
logist <- function(t, x, parms) {
with(as.list(parms), {
dx <- r * x[1] * (1 - x[1]/K)
list(dx)
})
}
## Run the model with initial guess: K = 10, r = 2
parms["K"] <- 10
parms["r"] <- 2
init <- ode(x, time, logist, parms)
## FITTING algorithm uses modFit
## First define the objective function (model cost) to be minimised
## more general: using modFit
Cost <- function(P) {
parms["K"] <- P[1]
parms["r"] <- P[2]
out <- ode(x, time, logist, parms)
return(modCost(out, Ana))
}
(Fit<-modFit(p = c(K = 10,r=2), f = Cost))
summary(Fit)
I think the problem is in your Cost function. If you don't provide both K and r, the cost function overwrites the value of r with NA, because P[2] does not exist. You can test this:
Cost <- function(P) {
parms["K"] <- P[1]
parms["r"] <- P[2]
print(parms)
#out <- ode(x, time, logist, parms)
#return(modCost(out, Ana))
}
Cost(c(K=10, r = 2))
Cost(c(K=10))
This function works:
Cost <- function(P) {
parms[names(P)] <- P
out <- ode(x, time, logist, parms)
return(modCost(out, Ana))
}
The vignette FMEDyna is very helpful: https://cran.r-project.org/web/packages/FME/vignettes/FMEdyna.pdf See page 14 on how to specify the Objective (Cost) function.
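The difference between the two ways of filling in parms can be checked side by side without deSolve or FME. This is a small base R sketch. by_position and by_name are hypothetical names that isolate just the parameter-update step of the two Cost functions.

```r
parms <- c(r = 0.5, K = 100)

# Positional update, as in the original Cost: P[2] is NA when only K is fitted
by_position <- function(P) {
  parms["K"] <- P[1]
  parms["r"] <- P[2]
  parms
}

# Name-based update, as in the working Cost: only the supplied names change
by_name <- function(P) {
  parms[names(P)] <- P
  parms
}

by_position(c(K = 10))  # r becomes NA, K becomes 10
by_name(c(K = 10))      # r stays 0.5, K becomes 10
```

With the name-based update, any subset of parameters chosen from the collinearity analysis can be passed to modFit while the others keep their current values.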
