I'm working on a problem where a parameter y is estimated by minimizing a sum of squares. The equations needed are, for each observed pair (p_i, zobs_i),
z_i(y) = integral from 0 to 1 of C_i(y) * (1 - x)^(y*(1 - p_i) - 1) * x^(y*p_i - 1) dx,
with C_i(y) = gamma(y) / (gamma(y*(1 - p_i)) * gamma(y*p_i)),
and the estimate minimizes sum_i (z_i(y) - zobs_i)^2 over y.
I used optim in the package stats:
# provide the values for a test dataset (the estimated y should be 1.41)
pvector <- c(0.0036, 0.0156, 0.0204, 0.0325, 0.1096, 0.1446, 0.1843, 0.4518)
zobs <- c(0.0971, 0.0914, 0.1629, 0.1623, 0.3840, 0.5155, 0.3648, 0.6639)
# make input of the C value
c <- function(y) {
  gamma(y)/((gamma(y*(1 - pvector)))*(gamma(y*pvector)))
}
# make input of the integrand
F1 <- function(y) {
  f1 <- function(x) {
    c*(1 - x)^(y*(1 - pvector) - 1)*x^(y*pvector - 1)
  }
  return(f1)
}
# integration over x
int <- function(y) {
  integrate(F1(y), lower = 0.001, upper = 1)
}
# write the function for minimization
f2 <- function(y) {
  sum((int - zobs)^2)
}
# minimization
optim(0.01, f2, method = "Brent", lower = 0, upper = 1000, hessian = TRUE)
This didn't work. I received the following error message:
Error in int - zobs : non-numeric argument to binary operator
I think there must be something fundamentally wrong with the way these functions are written.
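For what it's worth, the immediate cause of that error is that f2 computes int - zobs, subtracting a numeric vector from the function object int itself; beyond that, integrate returns a list (the number lives in $value), and it cannot handle a vector-valued integrand, so each element of pvector needs its own one-dimensional integral. Below is a minimal sketch of one possible rewrite, keeping the data and integration bounds above (the helper name zcalc is mine, and the lower bound for optim is nudged to 0.001, my choice, to keep gamma() finite; untested against the expected y = 1.41):
zcalc <- function(y) {
  # one integral per element of pvector; C is the beta normalising constant
  sapply(pvector, function(p) {
    C <- gamma(y)/(gamma(y*(1 - p))*gamma(y*p))
    integrate(function(x) C*(1 - x)^(y*(1 - p) - 1)*x^(y*p - 1),
              lower = 0.001, upper = 1)$value
  })
}
f2 <- function(y) sum((zcalc(y) - zobs)^2)
optim(0.01, f2, method = "Brent", lower = 0.001, upper = 1000, hessian = TRUE)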
I'm trying to compute a kind of Gini index using a generated dataset, but I ran into a problem with the last integrate call.
If I try to integrate the function named f1,
R says
Error in integrate(Q, 0, p) : length(upper) == 1 is not TRUE
My code is
# set up parameters b > a > 1 and the number of observations n
n <- 1000
a <- 2
b <- 4
# generate x and y,
# where x follows a beta distribution
# and y = 10x + 3
x <- rbeta(n, a, b)
y <- 10*x + 3
# the starting point of the integration having the problem
Q <- function(q) {
  quantile(y, q)
}
# integrate the function Q from 0 to p
G <- function(p) {
  integrate(Q, 0, p)
}
# compute a function
L <- function(p) {
  numer <- G(p)$value
  dino <- G(1)$value
  numer/dino
}
# the part having the problem
d <- 3
f1 <- function(p) {
  ((1 - p)^(d - 2))*L(p)
}
integrate(f1, 0, 1) # in this integration, the aforementioned error appears
I suspect the nested integrate calls could be the problem, but I have no idea what the exact problem is. Please help me!
As mentioned by @John Coleman, integrate needs a vectorized function, and sometimes a suitable subdivisions setting, to carry out the integration. Even when you have provided a vectorized integrand, it can be tricky to set the subdivisions properly in integrate(..., subdivisions = ).
To address your problem, I recommend integral from the package pracma, where you still need a vectorized integrand (see what I have done to the functions G and L), but there is no need to set subdivisions manually, i.e.,
library(pracma)
# set up parameters b > a > 1 and the number of observations n
n <- 1000
a <- 2
b <- 4
# generate x and y,
# where x follows a beta distribution
# and y = 10x + 3
x <- rbeta(n, a, b)
y <- 10*x + 3
# the starting point of the integration having the problem
Q <- function(q) {
  quantile(y, q)
}
# integrate the function Q from 0 to p
G <- function(p) {
  integral(Q, 0, p)
}
# compute a function
L <- function(p) {
  numer <- Vectorize(G)(p)
  dino <- G(1)
  numer/dino
}
# the part having the problem
d <- 3
f1 <- function(p) {
  ((1 - p)^(d - 2))*L(p)
}
res <- integral(f1, 0, 1)
then you will get
> res
[1] 0.1283569
(the exact value will differ between runs, since x is drawn with rbeta without a fixed seed).
The error you reported arises because the function passed to integrate must be vectorized, and your f1 isn't: integrate hands f1 a whole vector of p values, which then arrives as the upper limit of the inner integrate(Q, 0, p) call, and integrate itself isn't vectorized.
From the help (?integrate):
f must accept a vector of inputs and produce a vector of function
evaluations at those points. The Vectorize function may be helpful to
convert f to this form.
Thus one "fix" is to replace your definition of f1 by:
f1 <- Vectorize(function(p) {
  ((1 - p)^(d - 2))*L(p)
})
But when I run the resulting code I always get:
Error in integrate(Q, 0, p) : maximum number of subdivisions reached
A solution might be to assemble a large number of quantiles, smooth them out, and use that in place of your Q, although the error here strikes me as odd.
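A minimal sketch of that idea (the 0.001 grid step and the use of approxfun are my choices, not part of the original suggestion): precompute the quantiles once on a fine grid and interpolate, so the function handed to integrate is cheap, smooth and vectorized.
# sketch: replace Q by a smooth interpolant of precomputed quantiles
grid <- seq(0, 1, by = 0.001)
Qsmooth <- approxfun(grid, quantile(y, probs = grid))
G <- function(p) integrate(Qsmooth, 0, p)$value
L <- function(p) G(p)/G(1)
f1 <- Vectorize(function(p) (1 - p)^(d - 2)*L(p))
integrate(f1, 0, 1)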
I am trying to estimate the parameters of the log-likelihood function below by maximum likelihood in R, but I get the following error:
Error in optim(start, f, method = method, hessian = TRUE, ...) : objective function in optim evaluates to length 10 not 1
My attempt was as follows:
Generating data
set.seed(101)
n <- 10
u <- runif(n)
theta1 <- 1
lamba1 <- 0.5
Generating PTIR data using quantile function
x <- function(u, theta1, lamba1) {
  (-theta1/(log((1 + lamba1) - sqrt((1 + lamba1)^2 - (4*lamba1*u)))/(2*lamba1)))^(1/2)
}
x <- x(u = u, theta1 = theta1, lamba1 = lamba1)
Declaring the Log-Likelihood function
LL <- function(theta, lamba) {
  R = suppressWarnings((n*log(2))+
    (n*log(theta))-(((2)+1)*sum(log(x)))-
    (sum(theta/(x^(2))))+
    (log(1+lamba-(2*lamba*exp(-theta/(x^(2)))))))
  return(-R)
}
library(stats4) # mle() comes from the stats4 package
mle(LL, start = list(theta = 5, lamba = 0.5))
Any advice would be greatly appreciated.
I don't know how to fix your problem, but hopefully I can help you diagnose it. As @KonradRudolph suggests in the comments, this may be a case where the usual advice "add more parentheses if you're not sure" does more harm than good. I've rewritten your function in a way that matches what you've got above, but with fewer parentheses and more consistent line breaking and indentation; every line below is a separate additive term. Your specific problem is that the last term involves x (which has length 10 in this case) but is not summed, so the return value ends up being a length-10 vector.
LL2 <- function(theta, lambda) {
  R <- n*log(2) +
    n*log(theta) -
    ((2) + 1)*sum(log(x)) -
    sum(theta/(x^2)) +
    log(1 + lambda - (2*lambda*exp(-theta/x^2)))
  return(-R)
}
all.equal(LL(1,1),LL2(1,1)) ## TRUE
length(LL2(1,1)) ## 10
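Assuming the last term was meant to be summed over the observations like the others (a guess on my part; check this against your intended likelihood), wrapping it in sum() makes the objective a scalar, which is what optim (and hence mle) requires. A sketch:
# sketch: sum the last term so the negative log-likelihood is a scalar
LL3 <- function(theta, lambda) {
  R <- n*log(2) +
    n*log(theta) -
    3*sum(log(x)) -
    sum(theta/x^2) +
    sum(log(1 + lambda - 2*lambda*exp(-theta/x^2)))
  return(-R)
}
library(stats4)
mle(LL3, start = list(theta = 5, lambda = 0.5))
This may still warn if the optimizer wanders into theta <= 0; bounding theta (e.g. method = "L-BFGS-B" with a small positive lower limit, passed through to optim) is one way to deal with that.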
Many libraries are available in R to perform minimisation. However, all the ones I could find (e.g. Rcgmin, or optimx) only allow lower and upper bounds on the input parameters:
opt_Params <- Rcgmin(par = Params_init,
fn = cost_func,
gr = params_grad,
lower = min_par,
upper = max_par
)
I'm looking for something different: boundaries not on the input parameters, but on the values of a function that takes them.
Concretely, my cost_func is a cost function that measures the sum of the squared residuals between my (fixed) observed data Y_obs and the prediction from my fitted parameters Y_calc:
cost_func <- function(Params) {
  X <- Params[1:(num_items*num_features)]
  dim(X) <- c(num_items, num_features)
  Theta <- Params[(num_items*num_features + 1):length(Params)]
  dim(Theta) <- c(num_users, num_features)
  Y_calc <- X %*% t(Theta) # item-by-user prediction matrix (matrix product)
  J <- (Y_calc - Y_obs)*(Y_calc - Y_obs)
  cost <- sum(J)
  return(cost)
}
Minimising the cost function, I can ensure that my predicted Y_calc gets ever closer to Y_obs.
This however allows for arbitrary values in Y_calc. What I would like to do is to impose the same bounds as I know hold in Y_obs (between 0 and 10; this is a collaborative filtering algorithm).
So I'm not trying to put constraints on my input parameters, but on a function of them (namely, on every element of Y_calc <- X %*% t(Theta)).
Is there an R minimisation library in which this is possible? Or do I need to change my approach?
I think you can try something like this:
library(DEoptim)
cost_func <- function(Params) {
  X <- Params[1:(num_items*num_features)]
  dim(X) <- c(num_items, num_features)
  Theta <- Params[(num_items*num_features + 1):length(Params)]
  dim(Theta) <- c(num_users, num_features)
  Y_calc <- X %*% t(Theta) # item-by-user prediction matrix
  if (any(Y_calc < 0 | Y_calc > 10)) {
    # any prediction outside [0, 10] makes the parameter set infeasible
    return(10^30)
  } else {
    J <- (Y_calc - Y_obs)*(Y_calc - Y_obs)
    cost <- sum(J)
    return(cost)
  }
}
DEoptim(fn = cost_func,
        lower = min_par,
        upper = max_par)
If a parameter set generates any value of Y_calc outside 0 to 10, the objective function returns a very high value, and the DEoptim algorithm will therefore not retain that set of parameters as a candidate solution.
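Two side notes of mine on this approach: DEoptim has no par argument, since it evolves a whole population drawn between lower and upper (an initial population can be supplied through DEoptim.control(initialpop = ...) if needed), which is why the call above passes only fn, lower and upper. And the hard 10^30 rejection makes the objective discontinuous, which a population-based global optimiser tolerates but a gradient-based routine such as Rcgmin would not; a smoother alternative is to add a weighted penalty term such as sum(pmax(0, Y_calc - 10)^2 + pmax(0, -Y_calc)^2) to the cost instead.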
I have a simple hazard function; the line causing the error is marked.
h <- function(t, u) {
  x <- 1 - Sa(t)
  y <- 1 - Sm(u)
  invx <- as.numeric(qt(x, df = d1))
  invy <- as.numeric(qt(x, df = d1)) # note: qt(y, df = d1) may have been intended here
  [ERROR LINE] copula <- pmvt(lower = as.numeric(cbind(-9999, -9999)), upper = cbind(invx, invy), df = d1, corr = matrix(cbind(1, d2, d2, 1), byrow = T, ncol = 2))
  density <- dmvt(cbind(invx, invy), sigma = matrix(cbind(1, d2, d2, 1), byrow = T, ncol = 2), df = d1)
  num <- (sa(t)*sm(u))*density/dt(invx, df = d1)/dt(invy, df = d1)
  den <- 1 - x - y + copula
  hazard <- num/den
  return(hazard)
}
This hazard function is then called by a likelihood function:
# log-likelihood function for each individual car i
lli <- function(data) {
  result <- 0;
  # for all claims, evaluate the hazard function at that point
  if (nrow(data) > 2) {
    for (k in 1:nrow(data)) {
      if (data[k, 3] == 1) {
        result <- result + log(h(data[k, 2], data[k, 1]));
      }
    }
  }
  # integrate the hazard function over the areas between claims
  for (k in 1:(nrow(data) - 1)) {
    integral <- quad2d(h, data[k, 2], data[k+1, 2], data[k, 1], data[k+1, 1]);
    result <- result - integral;
  }
  return(result)
}
This likelihood function is in turn called by a third function that runs over my entire data set; however, it is the function above that causes the error, not the one below.
# log-likelihood function over all vehicles
ll <- function(x) {
  # unpack parameters
  d1 <<- x[1];
  d2 <<- x[2];
  total <- 0;
  # get the log-likelihood for each vehicle
  for (i in 1:length(alldata)) {
    total <- total + lli(alldata[[i]]);
    #print(sprintf("Found candidate solution %d value: %f", i, total));
  }
  #print(sprintf("Found candidate solution value: %f", total));
  if (is.nan(total)) { # if it is undefined, make it a large negative number
    total <- -2147483647;
  }
  return(-1*total); # minimise instead of maximise
}
The error message is as follows:
> ll(cbind(50,0.923))
Error in checkmvArgs(lower = lower, upper = upper, mean = delta, corr = corr, :
‘diag(corr)’ and ‘lower’ are of different length
I kept getting this same error when using pmvnorm and ended up having to use the pbivnorm package to get around it, but I can't find an alternative package for the bivariate t distribution. I don't understand what the problem is: when I call the function h(t,u) by itself, it executes without a problem, but when lli(data) calls h(t,u), it doesn't work. What's even more bizarre is that the lengths do match:
> length(as.numeric(cbind(-9999,-9999)))
[1] 2
> length(diag(matrix(cbind(1,d2,d2,1),byrow=T,ncol=2)))
[1] 2
I apologize for the messy code. I don't use R much. Anyway this has me completely stumped.
Data file is here: https://files.fm/u/yx9pw2b3
Additional code I forgot to include, basically some constants and marginal CDF functions:
Marginals.R:
p1 <- 0.4994485;
p2 <- 0.2344439;
p3 <- 0.1151654;
p4 <- 0.1509421;
b1 <- 0.7044292
t1 <- 1713.3170267
mu1 <- 7.014415
sig1 <- 1.394735
mu2 <- 6.926146
sig2 <- 1.056647
mu3 <- 6.7995896
sig3 <- 0.7212853
b2 <- 0.6444582
t2 <- 762.9962093
b3 <- 1.494303
t3 <- 410.828780
b1 <- 0.903
t1 <- 864.896
b2 <- 0.9109
t2 <- 314.2946
# Marginal survival distribution and density
Sa <- function(t) {return(exp(-(t / t1) ** b1))}
Sm <- function(u) {return(exp(-(u / t2) ** b2))}
sa <- function(t) {return((t / t1) ** b1 * b1 * exp(-(t / t1) ** b1) / t ) }
sm <- function(u) {return((u / t2) ** b2 * b2 * exp(-(u / t2) ** b2) / u ) }
Summary:
The problem is a length mismatch between lower and upper when calling pmvt: upper has length 2048 while lower has length 2.
Reasoning:
1. pmvt checks the incoming parameters by calling checkmvArgs in the mvtnorm package.
2. In checkmvArgs, lower, upper and mean are put together by rec <- cbind(lower, upper, mean). Here the new object rec has 2048 rows instead of 2.
3. lower is then replaced by lower <- rec[, "lower"], so lower now has length 2048 instead of 2.
4. Given that corr is still a 2 * 2 matrix, the error occurs when the check that length(lower) equals length(diag(corr)) fails.
Solution:
invx <- as.numeric(qt(x,df=d1))
invy <- as.numeric(qt(x,df=d1))
upper is meant to be a length-2 vector, hence invx and invy need to be single numbers.
As I'm not sure what upper range you want to define, I cannot solve it further. One possibility is:
invx <- as.numeric(qt(x,df=d1))
invy <- as.numeric(qt(x,df=d1))
copula <- pmvt(lower=as.numeric(cbind(-9999,-9999)),upper=range(c(invx,invy)),df=d1,corr=matrix(c(1,d2,d2,1),byrow=T,ncol=2) )
This uses the range of invx and invy as the input, so the dmvt call is not affected.
Note:
As the value a is not provided, the line below the error line (the dmvt call) also failed for me.
Edit:
To make the issue more specific:
1. quad2d builds a Gauss-Legendre quadrature with, by default, 32 nodes per dimension over the given range. And,
2. your function h is then called with the x and y values of this quadrature grid. Hence the t and u inside h are not single numbers but vectors of length 32 * 32 = 1024, which is why cbind(invx, invy) ends up with 2048 entries.
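One way to cope with that, sketched under the assumption that h works correctly for scalar t and u (as you observed): keep the current definition as a scalar-only version and wrap it elementwise, so every internal pmvt call still sees a length-2 upper.
# sketch: wrap the scalar hazard so quad2d can pass whole grids of nodes
h_scalar <- h # the original definition above
h <- function(t, u) {
  out <- mapply(h_scalar, t, u) # evaluate point by point
  dim(out) <- dim(t)            # preserve the grid shape quad2d expects
  out
}
This is slow (one pmvt call per quadrature node) but sidesteps the length mismatch.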
I have a system of ODEs and some data, and I am using the R packages deSolve and FME to fit the parameters of the ODE system to the data. I get a singular matrix result when I fit the full parameter set, so I went back and looked at the collinearity of the parameters, using a collinearity index cut-off of 20 as suggested in the FME package documentation, and picked a few models with subsets of the parameters to fit. But when I run modFit I get this error:
Error in approx(xMod, yMod, xout = xDat) :
need at least two non-NA values to interpolate
Can anyone enlighten me as to a fix for this? Everything else is working fine, so this is not a coding problem.
Here is a minimal working example (removing r = 2 in the modFit call creates the error; I can fix it in the minimal working example but not in my actual problem, so I doubt the example alone helps here):
## =======================================================================
## Now suppose we do not know K and r and they are to be fitted...
## The "observations" are the analytical solution
## =======================================================================
# You need these packages
library('deSolve')
library('FME')
## logistic growth model
TT <- seq(1, 100, 2.5)
N0 <- 0.1
r <- 0.5
K <- 100
## analytical solution
Ana <- cbind(time = TT, N = K/(1 + (K/N0 - 1) * exp(-r*TT)))
time <- 0:100
parms <- c(r = r, K = K)
x <- c(N = N0)
logist <- function(t, x, parms) {
  with(as.list(parms), {
    dx <- r * x[1] * (1 - x[1]/K)
    list(dx)
  })
}
## Run the model with the initial guess: K = 10, r = 2
parms["K"] <- 10
parms["r"] <- 2
init <- ode(x, time, logist, parms)
## FITTING algorithm uses modFit
## First define the objective function (model cost) to be minimised
## more general: using modFit
Cost <- function(P) {
  parms["K"] <- P[1]
  parms["r"] <- P[2]
  out <- ode(x, time, logist, parms)
  return(modCost(out, Ana))
}
(Fit <- modFit(p = c(K = 10, r = 2), f = Cost))
summary(Fit)
I think the problem is in your Cost function. If you don't provide both K and r, then P[2] is NA and the cost function overwrites the start value of r with NA. You can test this:
Cost <- function(P) {
  parms["K"] <- P[1]
  parms["r"] <- P[2]
  print(parms)
  #out <- ode(x, time, logist, parms)
  #return(modCost(out, Ana))
}
Cost(c(K = 10, r = 2))
Cost(c(K = 10))
This function works:
Cost <- function(P) {
  parms[names(P)] <- P
  out <- ode(x, time, logist, parms)
  return(modCost(out, Ana))
}
The vignette FMEdyna is very helpful: https://cran.r-project.org/web/packages/FME/vignettes/FMEdyna.pdf. See page 14 on how to specify the objective (Cost) function.
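With this version of Cost, fitting a subset of the parameters should run, since any parameter not named in P keeps its current value in parms. A sketch following the example above (with r left at whatever parms currently holds, only K is estimated):
# fit K alone; parms["r"] keeps its current value
FitK <- modFit(p = c(K = 10), f = Cost)
summary(FitK)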