How to solve this "non-finite function value" in R?

I am using R to calculate nested integrals like this:
C1_B <- function(T){integrate(function(tau)f(tau),lower=0.01*T,upper=0.99*T)$value}
f <- function(tau) {integrate(function(tau1)sqrt(1/(tau-tau1)),lower=0.01*tau,upper=0.99*tau)$value}
C1_B(0.5)
However, I receive a message like:
Error in integrate(function(tau1) sqrt(1/(tau - tau1)), lower = 0.01 * :
  non-finite function value
In addition: Warning message:
In sqrt(1/(tau - tau1)) : NaNs produced
I guess the problem is the "(tau-tau1)" term in my code; but given the integration domain I defined ("lower=0.01*tau,upper=0.99*tau"), (tau-tau1) can never be equal to zero.
Could anybody please tell me how I can solve this problem?

I gave it a try - the problem is that integrate expects the function it is handed to accept a vector of inputs and return a vector of the same length.
Luckily the solution is easy - just wrap your function in sapply.
The following code works:
f <- function(tau) {integrate(function(tau1)sqrt(1/(tau-tau1)),lower=0.01*tau,upper=0.99*tau)$value}
intfun <- function(x) sapply(x,f)
C1_B <- function(T){integrate(function(tau) intfun(tau),lower=0.01*T,upper=0.99*T)$value}
C1_B(0.5)

There exists an exact solution to your integral f, which you can use to check the numeric approximation. The integral of
d(tau1)/sqrt(tau - tau1)
is
-2 * sqrt(tau - tau1)
With your upper bound of 0.99*tau and your lower bound of 0.01*tau you get
-2 * (sqrt(tau - 0.99 * tau) - sqrt(tau - 0.01 * tau)) =
-2 * sqrt(tau) * (sqrt(0.01) - sqrt(0.99))
The integration of that for tau can again be solved exactly. It yields
-(4/3)(sqrt(0.01) - sqrt(0.99)) * tau^(3/2)
Edit: With your given boundaries 0.01*T and 0.99*T, the final resulting solution is
-(4/3)(sqrt(0.01)-sqrt(0.99)) * ((0.99 * T)^(3/2) - (0.01 * T)^(3/2))
You can use integrate on the first exact integration result (for f); no errors are produced. The errors you report are probably due to the method of approximation, so you could try another integration function that uses a different approximation. The exact expression for f matches the inner integral calculated in your program, and when you use integrate on that exact result for f, the result equals the exact final solution given above.
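A quick way to see the agreement is to put the closed form next to the sapply-wrapped numeric version. This is only a sketch (the name C1_B_exact is just a label for the formula above); the two values should agree to integrate's default tolerance:
f <- function(tau) integrate(function(tau1) sqrt(1/(tau - tau1)), lower = 0.01*tau, upper = 0.99*tau)$value
C1_B <- function(T) integrate(function(tau) sapply(tau, f), lower = 0.01*T, upper = 0.99*T)$value
C1_B_exact <- function(T) (4/3) * (sqrt(0.99) - sqrt(0.01)) * ((0.99*T)^(3/2) - (0.01*T)^(3/2))
C1_B(0.5)        # numeric nested integral
C1_B_exact(0.5)  # closed form from the derivation above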

Related

Optimize within for loop cannot find function

I've got a function, KozakTaper, that returns the diameter of a tree trunk at a given height (DHT). There's no algebraic way to rearrange the original taper equation to return DHT at a given diameter (4 inches, for my purposes)...enter R! (using 3.4.3 on Windows 10)
My approach was to use a for loop to iterate over likely values of DHT (25-100% of total tree height, HT), and then use optimize to choose the one that returns a diameter closest to 4". Too bad I get the error message Error in f(arg, ...) : could not find function "f".
Here's a shortened definition of KozakTaper along with my best attempt so far.
KozakTaper=function(Bark,SPP,DHT,DBH,HT,Planted){
if(Bark=='ob' & SPP=='AB'){
a0_tap=1.0693567631
a1_tap=0.9975021951
a2_tap=-0.01282775
b1_tap=0.3921013594
b2_tap=-1.054622304
b3_tap=0.7758393514
b4_tap=4.1034897617
b5_tap=0.1185960455
b6_tap=-1.080697381
b7_tap=0}
else if(Bark=='ob' & SPP=='RS'){
a0_tap=0.8758
a1_tap=0.992
a2_tap=0.0633
b1_tap=0.4128
b2_tap=-0.6877
b3_tap=0.4413
b4_tap=1.1818
b5_tap=0.1131
b6_tap=-0.4356
b7_tap=0.1042}
else{
a0_tap=1.1263776728
a1_tap=0.9485083275
a2_tap=0.0371321602
b1_tap=0.7662525552
b2_tap=-0.028147685
b3_tap=0.2334044323
b4_tap=4.8569609081
b5_tap=0.0753180483
b6_tap=-0.205052535
b7_tap=0}
p = 1.3/HT
z = DHT/HT
Xi = (1 - z^(1/3))/(1 - p^(1/3))
Qi = 1 - z^(1/3)
y = (a0_tap * (DBH^a1_tap) * (HT^a2_tap)) * Xi^(b1_tap * z^4 + b2_tap * (exp(-DBH/HT)) +
b3_tap * Xi^0.1 + b4_tap * (1/DBH) + b5_tap * HT^Qi + b6_tap * Xi + b7_tap*Planted)
return(y=round(y,4))}
HT <- .3048*85 #converting from english to metric (sorry, it's forestry)
for (i in c((HT*.25):(HT+1))) {
d <- KozakTaper(Bark='ob',SPP='RS',DHT=i,DBH=2.54*19,HT=.3048*85,Planted=0)
frame <- na.omit(d)
optimize(f=abs(10.16-d), interval=frame, lower=1, upper=90,
maximum = FALSE,
tol = .Machine$double.eps^0.25)
}
Eventually I would like this code to iterate through a csv and return i for the best d, which will require some rearranging, but I figured I should make it work for one tree first.
When I print d I get multiple values, so it is iterating through i, but it gets held up at the optimize function.
Defining frame was my most recent tactic, because d returns one NaN at the end, but it may not be the best input for interval. I've tried interval=c((HT*.25):(HT+1)), defining KozakTaper within the for loop, and defining f prior to the optimize, but I get the same error. Suggestions for what part I should target (or other approaches) are appreciated!
-KB
Forestry Research Fellow, Appalachian Mountain Club.
MS, University of Maine
Edit with a follow-up question:
I'm now trying to run this script for each row of a csv, "Input." The row contains the values for KozakTaper, and I've called them with this:
Input=read.csv...
Input$Opt=0
o <- optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
SPP='Input$Species',
DHT=x,
DBH=(2.54*Input$DBH),
HT=(.3048*Input$Ht),
Planted=0)),
lower=Input$Ht*.25, upper=Input$Ht+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
Input$Opt <- o$minimum
Input$Mht <- Input$Opt/.3048 # converting back to English
Input$Ht and Input$DBH are numeric; Input$Species is factor.
However, I get the error invalid function value in 'optimize'. I get it whether I define "o" or just run optimize. Oddly, when I don't call values from the row but instead use the code from the answer, it tells me object 'HT' not found. I have the awful feeling this is due to some obvious/careless error on my part, but I'm not finding posts about this error with optimize. If you notice what I've done wrong, your explanation will be appreciated!
I'm not an expert on optimize, but I see three issues: 1) your call to KozakTaper does not iterate through the range you specify in the loop; 2) KozakTaper returns a single number, not a vector; 3) you haven't given optimize a function but an expression.
So what is happening is that you are not giving optimize anything to iterate over.
All you should need is this:
optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
SPP='RS',
DHT=x,
DBH=2.54*19,
HT=.3048*85,
Planted=0)),
lower=HT*.25, upper=HT+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
$minimum
[1] 22.67713 ##Hopefully this is the right answer
$objective
[1] 0
optimize will now try values of x between lower and upper, seeking the one that minimizes the difference.
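For the follow-up, a sketch along the same lines (the column names Species, DBH and Ht come from the post; everything else here is an assumption): optimize solves one scalar problem at a time, so call it once per row, pass that row's values rather than whole columns, keep the bounds in metres to match KozakTaper, and pass the actual species value instead of the literal string 'Input$Species'.
Input$Opt <- NA_real_
for (r in seq_len(nrow(Input))) {
  ht_m <- 0.3048 * Input$Ht[r]   # feet to metres, as in the single-tree example
  o <- optimize(f = function(x) abs(10.16 - KozakTaper(Bark = 'ob',
                                                       SPP = as.character(Input$Species[r]),
                                                       DHT = x,
                                                       DBH = 2.54 * Input$DBH[r],
                                                       HT = ht_m,
                                                       Planted = 0)),
                lower = ht_m * 0.25, upper = ht_m,   # capping at total height avoids NaN from the taper equation
                maximum = FALSE)
  Input$Opt[r] <- o$minimum
}
Input$Mht <- Input$Opt / 0.3048   # metres back to feet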

R: How to add additional constraints to DEoptim

I am trying to minimize an objective function using DEoptim, subject to a simple constraint. I am not clear as to how to add the simple constraint to the call to DEoptim. Here is the objective function:
obj_min <- function(n,in_data) {
gamma <- in_data$Gamma
delta <- in_data$Delta
theta <- in_data$Theta
gammaSum <- sum(n * gamma)
deltaSum <- sum(n * delta)
thetaSum <- sum(n * theta)
abs((EPC * gammaSum - 2 * abs(deltaSum)) / thetaSum )
}
My mapping function (to impose integer constraints) is as follows:
mappingFun <- function(x) {
x[1:length(x)] <- round(x[1:length(x)], 0)
}
My call to DEoptim is:
out <- DEoptim(obj_min, lower = c(rep(-5, length(in_data[, 1]))),
upper = c(rep(5, length(in_data[, 1]))),
fnMap = mappingFun, DEoptim.control(trace = F),in_data)
My in_data object (data frame) is:
Underlying.Price Delta Gamma Theta Vega Rho Implied.Volatility
1 40.69 0.9237 3.2188 -0.7111 2.0493 0.0033 0.3119
2 40.69 0.7713 6.2267 -1.6352 4.3240 0.0032 0.3402
3 40.69 0.5822 8.4631 -2.0019 5.5782 0.0338 0.3229
4 40.69 0.3642 8.5186 -1.8403 5.3661 0.0210 0.3086
5 40.69 0.1802 6.1968 -1.2366 3.7517 0.0093 0.2966
I would like to add a simple constraint that:
sum(n * delta) = target
In other words, the summation of the optimized parameters, n, multiplied by the deltas in my in_data data frame should sum to some target. For simplicity, let's just say 0.5. How do I impose
sum(n * delta) = 0.5
as a constraint? Thank you for your help!
OK, thank you for all of your suggestions. I have researched and worked through my problem from many angles, and I wanted to share my thoughts with everyone, in case they can be helpful to some of you.
Most obviously, in my particular objective function deltaSum is a variable, and I am attempting to constrain it to a particular value; simple substitution of that constrained value into the objective function solves this case (trivial). However, if I were to introduce a constraint on a quantity that is not already a variable in the objective function, I can simply add a check that returns Inf whenever a constraint I wish to impose is violated, i.e.:
obj_func_sum_RRRs <- function(n, in_data) {
#Declare deltaSum, gammaSum, thetaSum, vegaSum, and rhoSum from in_data
#Impose constraints
#No dividing by 0:
if (thetaSum == 0) {
return(Inf)
}
#Specify that regardless of the length of vector of variables to
#be optimized, we only want our final results to include either 4 or 6
#nonzero n's in our final optimized solution
if (!sum(n[1:length(n)] != 0) == 4 &
!sum(n[1:length(n)] != 0) == 6) {
return(Inf)
}
(deltaSum + gammaSum)/thetaSum
}
The first check (thetaSum == 0, return Inf) works because Inf is a value the optimizer understands (and will never select as optimal), whereas division by 0 in R can return NaN (when the numerator is also 0), which "breaks" the optimization process. This is a bit "hacky", and likely not the most computationally efficient way to approach the problem, but with the infrastructure I am developing with a close friend and software-architect guru (microservices deployed through Microsoft Service Fabric), our long-range backtesting is still lightning quick. This methodology lets you impose any number of constraints on your problem, although further testing would be needed to see how burdensome the computational complexity becomes with this technique...
The Lagrange technique above can be viable, but only if you derive an analytical form of lambda on paper and then implement it in code. It is not always practical, and while you may be able to code up an algorithm to optimize that parameter, it sounds like a bad idea to paint yourself into a corner where you have to optimize a parameter which is, in turn, needed to optimize the original objective function. Just adding a check as described above seems the better way to go.
Food for thought....
The DEoptim package description says:
Implements the differential evolution algorithm for global
optimization of a real-valued function of a real-valued parameter
vector.
Global optimization in this sense has no place for constraints; it is also known as unconstrained optimization. So, sorry, it is not possible directly. Having said that, you can always use the "Lagrange multiplier" hack if you must. To do it you need something like:
abs((EPC * gammaSum - 2 * abs(deltaSum))/thetaSum) - lambda* (sum(n * delta) - 0.5)
where you penalize the slack in your constraint.
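As an illustration of that penalty idea, here is a sketch (not from the answers above; EPC is assumed to be defined in the workspace as in the question, and the target and penalty weight are illustrative) that folds the constraint into the objective so that DEoptim only ever sees an unconstrained problem:
library(DEoptim)
obj_min_pen <- function(n, in_data, target = 0.5, penalty_weight = 1e3) {
  gammaSum <- sum(n * in_data$Gamma)
  deltaSum <- sum(n * in_data$Delta)
  thetaSum <- sum(n * in_data$Theta)
  if (thetaSum == 0) return(Inf)                    # avoid division by zero
  base <- abs((EPC * gammaSum - 2 * abs(deltaSum)) / thetaSum)
  base + penalty_weight * abs(deltaSum - target)    # penalize constraint slack
}
out <- DEoptim(obj_min_pen,
               lower = rep(-5, nrow(in_data)),
               upper = rep(5, nrow(in_data)),
               control = DEoptim.control(trace = FALSE),
               in_data = in_data,
               fnMap = function(x) round(x))        # keep the integer mapping
A large penalty weight drives the solution toward sum(n * delta) = 0.5 without having to derive lambda analytically; if the constraint must hold exactly, increase the weight or repair the best member afterwards.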
I am using a wrapper which customises the call to DEoptim based on external constraints. Not very elegant, I admit, but it works to some extent.
My objective function - a Monte Carlo simulation - is quite time-consuming,
so constraints are really helpful...
Chris
Due to the very specific character of what I am doing (Monte Carlo raytracing for the optimisation of neutron beam optics) I did not see any reason to add code. I think it is really the concept what matters here. I'll gladly share what I have with anybody interested. Just let me know.... Chris

Generating conditioned random variables from a distribution function

My question is related to my previous one Generate random variables from a distribution function using inverse sampling
Now I want to generate random variables from a distribution function using inverse sampling, but the sampling should be conditioned.
For example, if the inverse of my cdf is :
invcdf <- function(y) a2 * log(a1/y - 1) + a3
I used inverse sampling to generate 10 random variables as follows:
invcdf(runif(10))
Now the problem is that I want the generated values to be greater (or less) than some value.
How should I introduce this condition into the random generator?
When I use this to get values greater than 500:
invcdf(runif(10,500,1e6))
I get this warning:
Warning message:
In log((a0/y) - 1) : NaNs produced
I have already tried repeating the process until the values satisfy my constraint, but it is not efficient!
repeat{
x=invcdf(runif(1))
if(x>100){
break
}
}
As @spf614 noted, you'd better have checks in your function, like
invcdf <- function(y) {
if (a1 > y) {
return( a2 * log(a1/y - 1) + a3 )
}
NaN
}
Then it works for all arguments.
Sampling would be:
low <- ...
r <- invcdf(runif(low, a1, 1e6))
UPDATE
checking for NaNs in output
nof_nans <- sum(is.nan(r))
if (nof_nans > 0) {
....
The reason that you're getting NaNs is that R is trying to take the logarithm of a negative number. Do you want the log term to be log((a1/y)-1) or log(a1/(y-1))? You currently have the function written the first way, and when you get a very high value for y, the term a1/y approaches zero (the speed with which it approaches zero depends on the value of a1). Thus, subtracting 1 gives you a negative number inside the log function. So if the term is meant to be how you have it written (log(a1/y-1)), you simply won't be able to calculate that above certain values of y.
The simple fix is just
invcdf <- function(y){
a2 * log(a1/(y-1)) + a3
}
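If the goal really is a conditioned (truncated) sample rather than a fix for the NaNs, a standard approach with inverse sampling is to restrict the uniform draws to the part of (0, 1) that the inverse cdf maps into the region of interest. The sketch below uses illustrative values for a1, a2 and a3 (none are from the original post) and assumes the original form log(a1/y - 1):
a1 <- 1; a2 <- -100; a3 <- 500
invcdf <- function(y) a2 * log(a1/y - 1) + a3
# Forward cdf, obtained by solving x = a2*log(a1/y - 1) + a3 for y:
cdf <- function(x) a1 / (1 + exp((x - a3)/a2))
# 10 draws conditioned on being greater than 500: with these parameters invcdf
# is increasing, so x > 500 corresponds to u in (cdf(500), 1).
x <- invcdf(runif(10, cdf(500), 1))
all(x > 500)   # should be TRUE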

Numerical precision problems in R?

I have a problem with the following function in R:
test <- function(alpha, beta, n){
result <- exp(lgamma(alpha) + lgamma(n + beta) - lgamma(alpha + beta + n) - (lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)))
return(result)
}
Now if you insert the following values:
test(-0.03292708, -0.3336882, 10)
It should fail and return NaN. That is because if we implement the same function in Excel, we get a result that is not a number. The Excel implementation is simple: J32 is the cell for alpha, K32 for beta, and L32 for n. The formula in the resulting cell is given below:
=EXP(GAMMALN(J32)+GAMMALN(L32+K32)-GAMMALN(J32+K32+L32)-(GAMMALN(J32)+GAMMALN(K32)-GAMMALN(J32+K32)))
So Excel seems to give the correct answer, because the function is only defined for alpha and beta greater than zero and n greater than or equal to zero. So I am wondering what is happening here. I have also tried the Rmpfr package to increase the numerical accuracy, but that does not seem to change anything.
Thanks
tl;dr log(gamma(x)) is defined more generally than you think, or than Excel thinks. If you want your function to reject negative values of alpha and beta, or to return NaN for them, just test for them manually and return the appropriate value (if (alpha < 0 || beta < 0) return(NaN)).
It's not a numerical accuracy problem, it's a definition issue. The Gamma function is defined for negative real values: ?lgamma says:
The gamma function is defined by (Abramowitz and Stegun section 6.1.1, page 255)
Gamma(x) = integral_0^Inf t^(x-1) exp(-t) dt
for all real ‘x’ except zero and negative integers (when ‘NaN’ is returned).
Furthermore, referring to lgamma ...
... and the natural logarithm of the absolute value of the gamma function ...
(emphasis in original)
curve(lgamma(x),-1,1)
gamma(-0.1) ## -10.68629
log(gamma(-0.1)+0i) ## 2.368961+3.141593i
log(abs(gamma(-0.1))) ## 2.368961
lgamma(-0.1) ## 2.368961
Wolfram Alpha agrees with the second calculation.
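Following the tl;dr, a minimal sketch of the guarded version (the exact domain check is an assumption about the intended parameter space, not part of the original answer):
test <- function(alpha, beta, n) {
  if (alpha <= 0 || beta <= 0 || n < 0) return(NaN)   # mimic the Excel behaviour
  exp(lgamma(alpha) + lgamma(n + beta) - lgamma(alpha + beta + n) -
      (lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)))
}
test(-0.03292708, -0.3336882, 10)   # NaN, as in Excel
test(2, 3, 10)                      # a finite value for valid parameters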

integrate a very peaked function

I am using the integrate function in R to integrate a very peaked function.
Say that function is a log-normal density:
xs <- seq(0,3,0.00001)
fun <- function(xs) dlnorm(xs, meanlog=-1.057822,sdlog=0.001861871)
plot(xs,fun(xs),type="l")
From the plot, I know that the peak is at around 0.3-0.4.
If I integrate this density function over its support (with more subdivisions and a much smaller abs.tol), integrate() gives me zero, which cannot be right.
integrate(fun,lower=0,upper=Inf,subdivisions=10000000,abs.tol=1e-100)
0 with absolute error < 0
However, if I restrict the interval to 0.3 - 0.4, it gives me the correct answer.
integrate(fun,lower=0.3,upper=0.4,subdivisions=10000000,abs.tol=1e-100)
1 with absolute error < 1.7e-05
Is there a way to integrate this density without manually choosing the interval?
Not sure whether this is helpful -- might be too specific to dlnorm, but you can partition [0, Inf[, especially if you have a good idea of where the peak will end up:
integrate.dlnorm <- function(mu=0, sd=1, width=2) {
integral.l <- integrate(f=dlnorm, lower=0, upper=exp(mu - width * sd), meanlog=mu, sdlog=sd)$value
integral.m <- integrate(f=dlnorm, lower=exp(mu - width * sd), upper=exp(mu + width * sd), meanlog=mu, sdlog=sd)$value
integral.u <- integrate(f=dlnorm, lower=exp(mu + width * sd), upper=Inf, meanlog=mu, sdlog=sd)$value
return(integral.l + integral.m + integral.u)
}
integrate.dlnorm() # 1
integrate.dlnorm(-1.05, 10^-3) # .97
integrate.dlnorm(-1.05, 10^-3, 3) # .998
From the integrate documentation:
Like all numerical integration routines, these evaluate the function
on a finite set of points. If the function is approximately constant
(in particular, zero) over nearly all its range it is possible that
the result and error estimate may be seriously wrong.
So, the answer is no.
You really need to know something about the function to compute the integral correctly - for any automated algorithm which detects support there is a function for which it fails.
PS (7 years later): for any deterministic algorithm and any error, there is a function on which that algorithm will make that error.
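One more option, offered as a sketch rather than as part of either answer: when the location and scale of the peak are known (as they are for dlnorm), a change of variables can remove the peak entirely, so integrate copes without a hand-picked interval. For a log-normal, substituting x = exp(m + s*t) turns the integrand into (numerically) a standard normal density in t.
m <- -1.057822; s <- 0.001861871
g <- function(t) {
  x <- exp(m + s * t)
  ifelse(is.finite(x), dlnorm(x, meanlog = m, sdlog = s) * x * s, 0)   # Jacobian x*s; guard against overflow
}
integrate(g, lower = -Inf, upper = Inf)   # ~1, no manual interval needed
This only works because the peak's position and width are known in advance; for a genuinely black-box integrand the caveat in the answer above still applies.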
