Would nonidentifiability create an inconsistent response from optim in R?

My objective is to use a kinetic model to describe reaction data. The application is a fuel, and the model is widely accepted as one of the more accurate ones for the setup of my problem. I may have a nonidentifiability issue, but it bothers me that optim gives such an inconsistent response.
Take the two graphs. The first (image omitted) shows a point I picked based on its low squared error. The second is what optim selected (I don't have enough rep to post picture 2; I will try to post it in a comment, but it is not close to lining up). When I ran the numbers optim gave me, the fit did not match the expected response. I wish I could paste the exact values, but each optimization run takes more than two hours, so I have only been able to tune it as much as my time allows. I can say that optim is settling on the boundaries. The bounds are set to physical limits one can obtain from the pure compound at room temperature (i.e. the molarity at room temperature). I can be somewhat flexible, but not too much, since the point of the project is to limit the model parameters to observed physical quantities.
This is all to prepare for an MCMC that adds a Bayesian layer on top. If my first guess is junk, so is the whole project.
To sum up: I would like to know why the errors are inconsistent and, if the cause is nonidentifiability, whether improving the initial guess can fix it or whether I need to remove a variable.
Code for reference.
Objective function
init <- function(param) {
  # Solve for displacement of triglycerides.
  # T.hat.isf, x.fine1, x.points, B.obs and Bprime() are defined elsewhere.
  T.mcmc2 <- T.hat.isf
  T.mcmc2 <- T.mcmc2 - min(T.mcmc2)
  A.mcmc2 <- T.mcmc2
  A.mcmc2[1] <- (6 * 1.02 * 0.200) / 0.250
  B.mcmc2 <- T.mcmc2
  primes <- Bprime(x.fine1, param, T.mcmc2, A.mcmc2, B.mcmc2)
  B.mcmc2 <- as.numeric(unlist(primes[1]))
  A.mcmc2 <- as.numeric(unlist(primes[2]))
  # Sum of squared errors at the observation points
  res <- t(B.obs - B.mcmc2[x.points]) %*% (B.obs - B.mcmc2[x.points])
  return(res)
}
Optimization with parameters
l <- c(1e-8, 1e-8, 1e-8, 1e-8)
u <- c(2, 1.2, 24, 24)
th0 <- c(0.052, 0.19, 0.50, 8)
op <- optim(th0[1:3], init, method = "L-BFGS-B", lower = l, upper = u)
Once run, this message often appears: "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH".
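One way to separate a bad initial guess from genuine nonidentifiability is to restart the optimizer from several random points inside the bounds and compare where it lands: wildly different parameter vectors with nearly identical objective values are the classic symptom of a flat, nonidentifiable objective surface. The sketch below is a minimal illustration, assuming the init, l and u objects from the code above; the seed, the number of starts, and passing all four bounded parameters are illustrative choices (match the parameter count to what your objective actually takes, and given the two-hour runtime per fit even two or three restarts can be informative).

# Minimal multi-start sketch (illustrative, not the poster's code).
set.seed(1)
n.starts <- 10
starts <- t(replicate(n.starts, runif(length(l), min = l, max = u)))
fits <- apply(starts, 1, function(th0) {
  op <- optim(th0, init, method = "L-BFGS-B", lower = l, upper = u)
  c(op$par, value = op$value)
})
# Very different parameter vectors with nearly equal objective values
# point towards nonidentifiability rather than a bad initial guess.
t(fits)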

Related

How to estimate a parameter in R- sample with replace

I have a txt file with numbers that looks like this (but with 100 numbers):
[1] 7.1652348 5.6665965 4.4757553 4.8497086 15.2276296 -0.5730937
[7] 4.9798067 2.7396933 5.1468304 10.1221489 9.0165661 65.7118194
[13] 5.5205704 6.3067488 8.6777177 5.2528503 3.5039562 4.2477401
[19] 11.4137624 -48.1722034 -0.3764006 5.7647536 -27.3533138 4.0968204
I need to estimate the MLE of the theta parameter for this distribution:
(image of the distribution's density f(x|theta) omitted)
I need to estimate theta from a sample of 1000 observations drawn with replacement, save the sample, and plot a histogram.
How can I estimate theta from my sample? I have no information about a normal distribution.
I wrote something like this:
data <- read.table(file.choose(), header = TRUE, sep = "")
B <- 1000
sample.means <- numeric(data)
sample.sd <- numeric(data)
for (i in 1:B) {
  MySample <- sample(data, length(data), replace = TRUE)
  sample.means <- c(sample.means, mean(MySample))
  sample.sd <- c(sample.sd, sd(MySample))
}
sd(sample.sd)
but it doesn't work..
This question incorporates multiple different ones, so let's tackle each step by step.
First, you will need to draw a random sample from your population (with replacement). Assuming your 100 population observations sit in a vector named pop:
rs <- sample(pop, 1000, replace = TRUE)
This gives you your vector of random samples. If you want to save it, you can write it to disk in several formats, so I'll just suggest a few related questions (How to Export/Import Vectors in R?).
In a second step, you can use the mle() function from the stats4 package (https://stat.ethz.ch/R-manual/R-devel/library/stats4/html/mle.html) and specify the objective function explicitly.
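For illustration, a minimal mle() call could look like the sketch below. Since the actual density in the question is only shown as an image, dcauchy() is used here purely as a stand-in for f(x | theta), and rs is the resampled vector from above; swap in the real log-density before using this.

library(stats4)
# Stand-in log-density: replace dcauchy() with the log of the actual f(x | theta).
negLL <- function(theta) -sum(dcauchy(rs, location = theta, log = TRUE))
fit <- mle(negLL, start = list(theta = 1))   # minimises the negative log-likelihood
coef(fit)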
However, the second part of your question is more of a statistical/conceptual question than R related, IMO.
Try to understand what MLE actually does. You do not need normally distributed variables. The idea behind MLE is to choose theta such that, under the resulting distribution, the observed random sample is the most probable. Check https://en.wikipedia.org/wiki/Maximum_likelihood_estimation for more details, or some YouTube videos if you'd like a more intuitive approach.
I assume, from the description of your task, that f(x|theta) is the conditional joint density function and that the observations x are iid?
What you want to do in this case is select theta such that the squared difference between the observations x and the parameter theta is minimized.
For your statistical understanding, in such cases it makes sense to work with the log of the likelihood instead of dealing with the non-linear product directly.
Minimizing the squared difference is equivalent to maximizing the log-transformed function, since the sum enters with a negative sign (the product sat in the denominator) and the log, as well as the +1, are monotone transformations.
This leaves you with the maximization problem:
And the first-order condition:
Obviously, you would also have to check that you are actually dealing with a maximum via the second-order condition but I'll omit that at this stage for simplicity.
The algorithm in R does nothing other than solve this maximization problem.
Hope this helps for your understanding. Maybe some smarter people can give some additional input.

Fitting Exponential Distribution to Task Duration Counts

In my dataset, I have ants that switch between one state (in this case a resting state) and all other states over a period of time. I am attempting to fit an exponential distribution to the number of times an ant spends in a resting state for some duration of time (for instance, the ant may rest for 5 seconds 10 times, or it could rest for 6 seconds 5 times, etc.). While subjectively this distribution of durations seems to be exponential, I can't fit a single parameter exponential distribution (where the one parameter is rate) to the data. Is this possible to do with my dataset, or do I need to use a two parameter exponential distribution?
I am attempting to fit the data to the following equation (where lambda is rate):
lambda * exp(-lambda * x).
This, however, doesn't seem to be mathematically possible to fit to either the counts of my data or the probability density of my data. In R I attempt to fit the data with the following code:
fit = nls(newdata$x.counts ~ (b*exp(b*newdata$x.mids)), start =
list(x.counts = 1, x.mids = 1, b = 1))
When I do this, though, I get the following message:
Error in parse(text= x, keep.source = FALSE):
<text>:2:0: unexpected end of input
1: ~
^
I believe I am getting this because it's mathematically impossible to fit this particular equation to my data. Am I correct, or is there a way to transform the data or alter the equation so I can make it fit? I can also make it fit with the equation lambda * exp(mu * x), where mu is another free parameter, but my goal is to keep the equation as simple as possible, so I would prefer the one-parameter version.
Here is the data, as I can't seem to find a way to attach it as a csv:
https://docs.google.com/spreadsheets/d/1euqdgHfHoDmQKXHrtOLcn5x5o81zY1sr9Kq6NCbisYE/edit?usp=sharing
First, you have a typo in your formula: you forgot the - sign in
(b*exp(b*newdata$x.mids))
But this is not what is throwing the error. The start argument should be a list that initializes only the parameter to be estimated (b), not x.counts or x.mids.
So the correct version would be:
fit = nls(newdata$x.counts ~ b*exp(-b*newdata$x.mids), start = list(b = 1))
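A slightly tidier variant, assuming newdata contains the columns x.counts and x.mids, passes the data frame through the data argument so the formula needs no $ references:

# Same one-parameter exponential fit, with the data frame supplied explicitly.
fit <- nls(x.counts ~ b * exp(-b * x.mids), data = newdata, start = list(b = 1))
summary(fit)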

Brent's method in the R optim implementation always returns the same local minimum

I'm trying to minimise the function shown above (image omitted). I'm searching between (-1, 1). I use the following code
optim(runif(1,min=-1,max=+1), ..., method = "Brent", lower = -1.0, upper = 1.0)
and I've noticed that it always returns a value of x = -0.73 instead of the correct x = 0.88 answer. The reason is given in the optimise help page:
The first evaluation of f is always at x_1 = a + (1 - phi)(b - a), where (a, b) = (lower, upper) and phi = (sqrt(5) - 1)/2 = 0.61803... is the golden section ratio. Almost always, the second evaluation is at x_2 = a + phi(b - a). Note that a local minimum inside [x_1, x_2] will be found as solution, even when f is constant in there, see the last example.
I'm curious if there is any way to use Brent's method without hitting the same local minimum each time.
Changing method to "L-BFGS-B" works better (a random local minimum is returned each time):
optim(runif(1,min=-1,max=+1), ..., method = "L-BFGS-B", lower = -1.0, upper = 1.0)
Your function is NOT convex, so it has multiple local/global minima or maxima. For your function I would run a non-traditional, derivative-free global optimizer such as simulated annealing or a genetic algorithm, and use its output as the starting point for BFGS or any other local optimizer to get a precise solution. Repeat this step multiple times and you will find all the global and local optimum points.
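As a rough sketch of that two-stage idea, assuming the objective is available as a one-argument function f on [-1, 1] (f is a placeholder name; the actual objective is written as ... in the question, and the SANN settings are arbitrary):

set.seed(42)
# Stage 1: derivative-free global search with simulated annealing.
rough <- optim(runif(1, min = -1, max = 1), f, method = "SANN",
               control = list(maxit = 5000, temp = 1))
# SANN does not enforce bounds, so clamp its result before polishing.
start2 <- pmin(pmax(rough$par, -1), 1)
# Stage 2: polish the candidate with a bounded local optimizer.
polished <- optim(start2, f, method = "L-BFGS-B", lower = -1, upper = 1)
polished$par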

Linear Dependence errors in DiceKriging with linearly independent data

The Problem (with minimal working example)
DiceKriging gives linear dependence errors about half the time when the design points are close to (but not exactly) linearly dependent. This can be seen in the example below, which gave an error about half the time when I ran it on both an Ubuntu and a Windows computer. It occurs with either genetic or BFGS optimisation.
install.packages("DiceKriging")
library(DiceKriging)
x = data.frame(Param1 = c(0,1,2,2,2,2), Param2 = c(2,0,1,1e-7,0,2))
y = 1:6
duplicated(cbind(x,y))
Model = km( design = x, response = y , optim.method = "gen", control = list(maxit = 1e4), coef.cov = 1)
Model = km( design = x, response = y , optim.method = "BFGS", control = list(maxit = 1e4), coef.cov = 1)
When the data is a little more dispersed, no such errors occur.
# No problems occur if the data is more dispersed.
x = data.frame(Param1 = c(0,1,2,2,2,2), Param2 = c(2,0,1,1e-2,0,2))
y = 1:6
duplicated(cbind(x,y))
Model = km( design = x, response = y , optim.method = "gen", control = list(maxit = 1e4), coef.cov = 1)
Why this is a problem
Using Kriging for optimization of expensive models means that points near the optima will be densely sampled, and it is not possible to do this while this error keeps occurring. In addition, when several coordinates are all close, the separation at which the error appears can be larger than the 1e-7 above: in my actual problem (not the MWE above) I got errors when the 4 coordinates of a point were each around 1e-3 away from those of another point.
Related Questions
There are not many DiceKriging questions on Stack Overflow. The closest question is this one (from the Kriging package), in which the problem is genuine linear dependence. Note that the Kriging package is not a substitute for DiceKriging, as it is restricted to 2 dimensions.
Desired Solution
I would like either:
A way to change my km call to avoid this problem (preferred)
A way to determine when this problem will occur so that I can drop observations that are too close to each other for the kriging call.
Your problem is not a software problem. It's rather a mathematical one.
Your first data set contains the two points (2, 1e-7) and (2, 0), which are very close but correspond to (very) different outputs: 4 and 5. You are therefore trying to fit a Kriging model, which is an interpolation model, to a chaotic response. This cannot work. Kriging/Gaussian process modelling is not the right tool if your response varies a lot between points that are close together.
However, when you are optimizing expensive models, things are usually not like in your example: there is not such a difference in the response for very close input points.
But there can indeed be numerical problems if your points are very close.
In order to soften these numerical errors, you can add a nugget effect. The nugget is a small constant variance added to the diagonal of the covariance matrix, which allows the points not to be exactly interpolated. Your kriging approximation curve is not forced to pass exactly through the learning points. Thus, the kriging model becomes a regression model.
In DiceKriging, the nugget can be added in two ways. First, you can choose a value a priori and add it "manually" by setting km(..., nugget = your_value), or you can ask km to learn it at the same time it learns the parameters of the covariance function, by setting km(..., nugget.estim = TRUE). I advise using the second in general.
Your small example becomes then:
Model = km( design = x, response = y , optim.method = "BFGS", control = list(maxit = 1e4),
coef.cov = 1, nugget.estim=TRUE)
Extract from the DiceKriging documentation:
nugget: an optional variance value standing for the homogeneous nugget effect.
nugget.estim: an optional boolean indicating whether the nugget effect should be estimated. Note that this option does not concern the case of heterogeneous noisy observations (see noise.var below). If nugget is given, it is used as an initial value. Default is FALSE.
PS: The choice of the covariance kernel can be important in some applications. When the function to approximate is quite rough, the exponential kernel (set covtype="exp" in km) can be a good choice. See the book by Rasmussen and Williams for more information (freely and legally available at http://www.gaussianprocess.org/).
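Regarding the asker's second request (a way to determine when the problem will occur), a minimal helper is to flag design rows whose nearest neighbour is closer than some tolerance before calling km(). This sketch assumes a numeric design like x above; the tolerance is an arbitrary illustration and should be tuned to the scale of your inputs.

too_close <- function(design, tol = 1e-4) {
  d <- as.matrix(dist(design))   # pairwise Euclidean distances between design points
  diag(d) <- Inf                 # ignore each point's zero distance to itself
  which(apply(d, 1, min) < tol)  # rows with a near-duplicate neighbour
}
too_close(x)   # for the first example this flags the two rows that differ by 1e-7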

numerical integration of a tricky function

The prob package numerically evaluates characteristic functions for base R distributions. For almost all distributions there are existing formulas. For a few cases, though, no closed-form solution is known. Case in point: the Weibull distribution (but see below).
For the Weibull characteristic function I essentially compute two integrals and put them together:
# t, shape and scale here are the arguments passed to cfweibull()
fr <- function(x) cos(t * x) * dweibull(x, shape, scale)
fi <- function(x) sin(t * x) * dweibull(x, shape, scale)
Rp <- integrate(fr, lower = 0, upper = Inf)$value
Ip <- integrate(fi, lower = 0, upper = Inf)$value
Rp + (0+1i) * Ip
Yes, it's clumsy, but it works surprisingly well! ...ahem, most of the time. A user reported recently that the following breaks:
cfweibull(56, shape = 0.5, scale = 1)
Error in integrate(fr, lower = 0, upper = Inf) :
the integral is probably divergent
Now, we know that the integral isn't divergent, so it must be a numerical problem. With some fiddling I could get the following to work:
fr <- function(x) cos(56 * x) * dweibull(x, 0.5, 1)
integrate(fr, lower = 0.00001, upper = Inf, subdivisions=1e7)$value
[1] 0.08024055
That's OK, but it isn't quite right, plus it takes a fair bit of fiddling which doesn't scale well. I've been investigating this for a better solution. I found a recently published "closed form" for the characteristic function with shape > 1 (see here), but it involves Wright's generalized confluent hypergeometric function, which isn't implemented in R (yet). I looked into the archives for integrate alternatives, and there's a ton of stuff out there which doesn't seem very well organized.
As part of that searching it occurred to me to translate the region of integration to a finite interval via the inverse tangent, and voila! Check it out:
cfweibull3 <- function(t, shape, scale = 1) {
  if (shape <= 0 || scale <= 0)
    stop("shape and scale must be positive")
  # substitute tan(x) for the original variable: the 1/cos(x)^2 Jacobian appears
  # and the range [0, Inf) maps to [0, pi/2)
  fr <- function(x) cos(t * tan(x)) * dweibull(tan(x), shape, scale) / (cos(x))^2
  fi <- function(x) sin(t * tan(x)) * dweibull(tan(x), shape, scale) / (cos(x))^2
  Rp <- integrate(fr, lower = 0, upper = pi/2, stop.on.error = FALSE)$value
  Ip <- integrate(fi, lower = 0, upper = pi/2, stop.on.error = FALSE)$value
  Rp + (0+1i) * Ip
}
> cfweibull3(56, shape=0.5, scale = 1)
[1] 0.08297194+0.07528834i
Questions:
Can you do better than this?
Can people who are expert in numerical integration routines shed some light on what's happening here? I have a sneaking suspicion that for large t the cosine fluctuates rapidly, which causes problems...?
Are there existing R routines/packages which are better suited for this type of problem, and could somebody point me to a well-placed position (on the mountain) to start the climb?
Comments:
Yes, it is bad practice to use t as a function argument.
I calculated the exact answer for shape > 1 using the published result with Maple, and the brute-force-integrate-by-the-definition-with-R kicked Maple's ass. That is, I get the same answer (up to numerical precision) in a small fraction of a second and an even smaller fraction of the price.
Edit:
I was going to write down the exact integrals I'm looking for but it seems this particular site doesn't support MathJAX so I'll give links instead. I'm looking to numerically evaluate the characteristic function of the Weibull distribution for reasonable inputs t (whatever that means). The value is a complex number but we can split it into its real and imaginary parts and that's what I was calling Rp and Ip above.
One final comment: Wikipedia has a formula listed (an infinite series) for the Weibull c.f. and that formula matches the one proved in the paper I referenced above, however, that series has only been proved to hold for shape > 1. The case 0 < shape < 1 is still an open problem; see the paper for details.
You may be interested in this paper, which discusses different integration methods for highly oscillatory integrals -- that's what you are essentially trying to compute:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.6944
Another piece of advice: instead of an infinite limit you may want to specify a finite one. If you specify the precision you want, then based on the cdf of the Weibull you can easily estimate how much of the tail you can truncate. And with a fixed limit you can choose the number of subdivisions almost exactly (e.g. so that there are a few (4-8) points per period).
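To make the truncation idea concrete, here is a minimal sketch for the real part of the problematic case above: the upper limit is a Weibull quantile whose neglected tail mass is below a chosen tolerance. The tolerance and subdivision count are arbitrary illustrations, the argument is named tt to avoid shadowing base t(), and for shape < 1 a tiny positive lower limit sidesteps the density's integrable singularity at 0 (as in the original workaround).

tt <- 56; shape <- 0.5; scale <- 1; tol <- 1e-10
upper <- qweibull(1 - tol, shape, scale)   # neglected tail mass is at most tol
fr <- function(x) cos(tt * x) * dweibull(x, shape, scale)
integrate(fr, lower = 1e-8, upper = upper, subdivisions = 1e5)$value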
I had the same problem as Jay - not with the Weibull distribution but with the integrate function. I found my answer to Jay's question 3 in a comment to this question:
Divergent Integral in R is solvable in Wolfram
The R package pracma contains several functions for evaluating integrals numerically. Besides routines for specific mathematical functions, it offers a more general function, integral, which helped in my case. Example code is given below.
On question 2: the first answer to the question linked above states that R does not print the complete error message from the C source (the function may just be converging too slowly). I would therefore agree with Jay that the fast fluctuation of the cosine may be the problem; in my case, and in the example below, it was.
Example Code
# load Practical Numerical Math Functions package
library(pracma)
# define a test function with a rapidly oscillating factor
testfun <- function(r) cos(r * 10^6) * exp(-r)
# Integrate it numerically with the basic 'integrate'.
out1 <- integrate(testfun, 0, 100)
# "Error in integrate(testfun, 0, 100) : the integral is probably divergent"
# Integrate it numerically with 'integral' from the 'pracma' package,
# using the 'Gauss-Kronrod' method and 1e-8 as relative tolerance.
# I did not try the other methods.
out2 <- integral(testfun, 0, 100, method = 'Kronrod', reltol = 1e-8)
Two remarks
The integral function does not break the way integrate does, but it may take quite a long time to run. I do not know (and did not try) whether the user can limit the number of iterations.
Even if the integral function finishes without errors, I am not sure how correct the result is. Numerically integrating a function that fluctuates rapidly around zero seems quite tricky, since one does not know exactly where on the fluctuating function the values are evaluated (e.g. twice as many positive as negative values, with positive values close to local maxima and negative values far off). I am not an expert on numerical integration; I only know some basic fixed-step methods from my numerics lectures, so perhaps the adaptive methods used in integral deal with this problem in some way.
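One way to gauge that correctness concern for the testfun example: its integral has a closed form (the Laplace transform of the cosine), namely the integral of cos(w*x)*exp(-x) from 0 to infinity equals 1/(1 + w^2), and truncating at 100 changes it by at most exp(-100). The sketch below, assuming testfun and the pracma call from the example above, compares the numerical result against this exact value; a disagreement larger than the requested tolerance is direct evidence of the oscillation problem described above.

# Exact value for comparison: integral of cos(w*x)*exp(-x) over [0, Inf) is 1/(1 + w^2);
# the [0, 100] truncation error is below exp(-100), which is negligible.
w <- 1e6
exact <- 1 / (1 + w^2)
approx <- integral(testfun, 0, 100, method = 'Kronrod', reltol = 1e-8)
c(exact = exact, approx = approx)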
I'm attempting to answer questions 1 & 3. That being said, I am not contributing any original code; I did a Google search and hopefully this is helpful. Good luck!
Source: http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf (p. 6)
#Script
library(ggplot2)
## sample from a Weibull distribution with parameters shape=2.1 and scale=1.1
x.wei <- rweibull(n = 200, shape = 2.1, scale = 1.1)
## theoretical quantiles from a Weibull population with known parameters shape=2 and scale=1
x.teo <- rweibull(n = 200, shape = 2, scale = 1)
#Figure
qqplot(x.teo, x.wei, main = "QQ-plot distr. Weibull")  ## QQ-plot
abline(0, 1)  ## a 45-degree reference line is plotted
Is this of any use?
http://www.sciencedirect.com/science/article/pii/S0378383907000452
Muraleedharan et al. (2007), Modified Weibull distribution for maximum and significant wave height simulation and prediction, Coastal Engineering, Volume 54, Issue 8, August 2007, Pages 630–638.
From the abstract: "The characteristic function of the Weibull distribution is derived."
