Integration of a function with a while loop in R

I want to integrate a function involving a while loop in R. I have pasted an MWE below. Could anyone please explain how to get rid of the warning messages when integrating such a function?
Thank you
myfun <- function(X, a, b, kmin, kmax){
  term <- 0
  k <- 1
  while(k < kmax | term < 10000){
    term <- term + a * b * X^k
    k <- k+1
  }
  fx <- exp(X) * term
  return(fx)
}
a <- 5
b <- 4
kmax <- 20
integrate(myfun, lower = 0, upper = 10, a = a, b = b, kmax = kmax)
This produces a warning, which you can see via warnings():
In while (k < kmax | term < 10000) { ... :
the condition has length > 1 and only the first element will be used

From the integrate() documentation:
f must accept a vector of inputs and produce a vector of function evaluations at those points.
This is the crux of the problem here, which you can see by running myfun(c(1, 2), a, b, kmin, kmax) and reproducing a similar warning. What's happening is that integrate() wants to pass a vector of inputs to myfun in X; this means that inside your while loop, term will become a vector as well. This creates a problem when the while loop kicks back to the evaluation stage, because now the condition k < kmax | term < 10000 has a vector structure as well (since term does), which while doesn't like.
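A quick way to see what integrate() actually passes to the integrand is to record the length of the argument on every call. This is a small sketch; probe and call_lengths are hypothetical names, not part of the question's code:

```r
# probe() records how many points integrate() asks for in each call,
# then returns exp(x) evaluated element-wise at all of them.
call_lengths <- integer(0)
probe <- function(x) {
  call_lengths <<- c(call_lengths, length(x))
  exp(x)
}

res <- integrate(probe, lower = 0, upper = 1)
res$value      # close to exp(1) - 1
call_lengths   # each call received a whole vector of points, not a single number
```

Any integrand whose body contains if() or while() on a condition built from x will run into exactly this.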
This warning is very good in this case, because it strongly suggests that integrate() isn't doing what you want it to do. Your goal here isn't to get rid of the warning messages; the function as written simply won't work with integrate() due to the while loop structure.
Your choices for how to proceed are to either (1) rewrite the function in a way that doesn't use a while loop, or (2) just hard-code some numeric integration yourself, perhaps with a for loop. The best way to use R is to vectorize everything and to avoid things like while and for when at all possible.
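A minimal sketch of option (1), under one explicit assumption: that the intended stopping rule is simply "add terms until k reaches kmax". The term < 10000 part is dropped, so this is not identical to myfun for every X. myfun_vec is a hypothetical name:

```r
# Hypothetical vectorized variant of myfun: sums a*b*X^k for k = 1, ..., kmax - 1.
# ASSUMPTION: the term < 10000 condition is dropped; only kmax controls the sum.
myfun_vec <- function(X, a, b, kmax) {
  k <- seq_len(kmax - 1)              # exponents 1, ..., kmax - 1
  powers <- outer(X, k, "^")          # length(X) x (kmax - 1) matrix of X^k
  term <- a * b * rowSums(powers)     # one partial sum per element of X
  exp(X) * term                       # same length as X, as integrate() requires
}

a <- 5; b <- 4; kmax <- 20
res <- integrate(myfun_vec, lower = 0, upper = 10, a = a, b = b, kmax = kmax)
res
```

For X = 2 this gives the same value as the original myfun(2, a, b, kmin, kmax), because there term already exceeds 10000 by the time k reaches kmax.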
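Finally, I'll note that there seems to be a problem with the underlying function itself: myfun(0.5, a, b, kmin, kmax) never finishes. For X < 1 the terms a * b * X^k shrink geometrically, so term creeps up towards a * b * X / (1 - X) (only 20 for X = 0.5) and never reaches 10000 unless X is very close to 1 (above 500/501 ≈ 0.998 for a = 5, b = 4). The condition k < kmax | term < 10000 therefore stays TRUE forever, so you won't be able to integrate it on the interval [0, 10] no matter what you do.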
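The non-termination for X < 1 can be quantified with the geometric series. For 0 ≤ X < 1 every partial sum of the loop stays strictly below the limit, and the limit itself stays at or below 10000 unless X exceeds a threshold that depends only on a·b:

$$\sum_{k=1}^{\infty} ab\,X^k = \frac{abX}{1-X}, \qquad \frac{abX}{1-X} \le 10000 \iff X \le \frac{10000}{ab + 10000} = \frac{500}{501} \quad (a = 5,\ b = 4).$$

So for every X in [0, 500/501] the condition term < 10000 is always TRUE and the while loop never exits.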

Related

R - In sqrt(1 - h * h) : NaNs produced from within rcorr - full sample data available

I have some code that creates a matrix of survey question responses: rows are the answers from a unique survey instrument, and columns are the individual questions. A final column has been appended with the row means. This is then passed to rcorr for evaluation. I have 15 sets of data, but only this particular set throws NaNs, and I can't see what the difference is.
m.rcorr <- rcorr(matrix, type="pearson")
A CSV of the matrix being passed is linked here. There are legitimate NA values in some columns, as not every respondent answers each question. Other responses are 0, 25, 50, 75, or 100.
I get two warnings of: In sqrt(1 - h * h) : NaNs produced on this data set.
I have 14 other sets, with varying occurrences of NA, that run without producing NaNs. I also checked whether 0 was the problem, but the other sets handle zeros fine too.
Next I stepped into rcorr, assigning my matrix to x:
type <- "pearson"
{
type <- match.arg(type)
if (!missing(y))
x <- cbind(x, y)
x[is.na(x)] <- 1e+50
storage.mode(x) <- "double"
p <- as.integer(ncol(x))
if (p < 1)
stop("must have >1 column")
n <- as.integer(nrow(x))
if (n < 5)
stop("must have >4 observations")
h <- .Fortran(F_rcorr, x, n, p, itype = as.integer(1 + (type == "spearman")), hmatrix = double(p * p), npair = integer(p * p), double(n), double(n), double(n), double(n), double(n), integer(n))
The assignment of h is where I get stuck
Error: object 'F_rcorr' not found
The package Hmisc is installed and loaded, as again, this code works 14 out of 15 times.
According to the help, F_rcorr is an internal Hmisc function that is undocumented and not meant to be called by the user, so I'm not quite sure where to go next.
I'm looking to answer two questions:
Why is this particular set throwing the NaNs
What impact does that have on the final values of the list $r from the rcorr results.
Addendum: Using the Hmisc::: prefix as suggested in the comments, I was able to get further and found two pairs in my data where h was 1 but 1 - h * h evaluated to a very small negative number instead of 0. It happened only for these two pairs, not on the diagonal or in other places where the value was 1, so I'm not sure why those two behaved strangely, since 1 - 1 * 1 should equal 0 all day long.
To get around it, I copied the rcorr function into a new function, added these two lines before the P assignment, and then took the sqrt of the new D, in which the negative numbers are replaced by 0:
D <- 1 - h * h
D[D<0] <- 0
P <- matrix(2 * (1 - pt(q = abs(h) * sqrt(npair - 2)/sqrt(D), df = npair - 2)), ncol = p)
I'd still like to know what produced a tiny negative number instead of 0 in that calculation, but I believe I've found a harmless way of getting around it.
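One plausible explanation (not confirmed in the thread) is ordinary floating-point rounding: a correlation that comes out one unit in the last place above 1 still prints as 1 at R's default 7 significant digits, yet 1 - h * h is then slightly negative. A minimal illustration with a made-up value of h:

```r
# h is the smallest double above 1; at 7 significant digits it prints as 1.
h <- 1 + .Machine$double.eps
print(h)               # displays 1
h == 1                 # FALSE: it is not exactly 1
1 - h * h              # a tiny negative number, about -4.4e-16
sqrt(1 - h * h)        # NaN, with a "NaNs produced" warning
print(h, digits = 17)  # reveals the hidden digits
```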
So I figured what the heck and emailed Dr. Harrell. He replied that in the next release of Hmisc he's going to replace sqrt(1 - h * h) with max(0, 1-h^2), which resolves it (more cleanly) the same way I did, by substituting 0 for the tiny negative numbers.
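One detail worth keeping in mind when patching this yourself: max() collapses all of its arguments into a single number, so applied to the whole matrix 1 - h * h it would not clamp each entry. The element-wise version is pmax(), and putting the matrix first keeps its dim attribute. A small sketch with a made-up 2 x 2 h:

```r
# Made-up 2 x 2 correlation-like matrix; h[2, 2] is a hair above 1.
h <- matrix(c(1, 0.5, 0.5, 1 + .Machine$double.eps), nrow = 2)
D <- 1 - h * h

max(0, D)         # a single number (0.75), not a matrix
pmax(D, 0)        # element-wise: same 2 x 2 shape, negative entry replaced by 0
sqrt(pmax(D, 0))  # no NaN
```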
I'll admit I fan-girled a bit with him answering my email.

Nested integration for incomplete convolution of gauss densities

Let g(x) = 1/sqrt(2*pi) * exp(-x^2 / 2) be the density of the normal distribution with mean 0 and standard deviation 1. In some calculations on paper, integrals of the form

$$ f_1(x) = g(x), \qquad f_n(x) = \int_{-c}^{c} g(x - z)\, f_{n-1}(z)\, dz \quad (n \ge 2) $$

appeared, where c > 0 is a positive number.
Since I could not evaluate this by hand, I had the idea to approximate and plot it. I tried this in R, because R provides the dnorm function and a function to do integrals.
You see that I need to integrate numerically n times, where n is chosen when calling a plot function. My code has a for loop to create those "incomplete" convolutions iteratively.
For example, even with n=3 and c=1 this gives me an error. n=2 (thus one integration) works.
N = 3
ngauss <- function(x) dnorm(x , mean = 0, sd = 1)
convoluts <- list()
convoluts[[1]] <- ngauss
for (i in 2:N) {
h <- function(y) {
g <- function(z) {ngauss(y-z)*convoluts[[i-1]](z)}
return(integrate(g, lower = -1, upper = 1)$value)
}
h <- Vectorize(h)
convoluts[[i]] <- h
}
convoluts[[3]](0)
What I get is:
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
I understand that this is a hard computation, but for "small" n something similar should be possible.
Maybe someone can help me to fix my code or provide a recommendation how I can implement this in a better way. Another language that is more appropriate for this would be also okay.
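The issue is in how the functions created inside the loop find i. Each h refers to the global variable i, which is looked up only when h is called, not when it is created. After the loop finishes, i equals N, so convoluts[[3]] calls convoluts[[i - 1]] = convoluts[[2]], which again looks up i = 3 and calls convoluts[[2]] itself, recursing until R gives up. Instead, using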
h <- evalq(function(y) {
g <- function(z) {ngauss(y - z) * convoluts[[i - 1]](z)}
integrate(g, lower = -1, upper = 1)$value
}, list(i = i))
does the job and, say, setting N <- 6 quickly gives
convoluts[[N]](0)
# [1] 0.03423872
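The same effect can be reproduced without integrate() at all: a function created inside a for loop reads the loop variable when it is called, not when it is created. A minimal sketch, using local() as an alternative to evalq() (both give each function its own environment):

```r
# Each function refers to i, which lives in the global environment.
fs <- list()
for (i in 1:3) fs[[i]] <- function() i
sapply(fs, function(f) f())   # 3 3 3: all see the final value of i

# local() evaluates its block in a fresh environment, so each function
# keeps its own copy j of the value i had in that iteration.
gs <- list()
for (i in 1:3) gs[[i]] <- local({ j <- i; function() j })
sapply(gs, function(f) f())   # 1 2 3
```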
With infinite limits, your integral is simply the pdf of a sum of N independent standard normals (which follows N(0, N)), so we may also verify this approach by setting lower = -Inf and upper = Inf. Then with N <- 4 we have
dnorm(0, sd = sqrt(N))
# [1] 0.1994711
convoluts[[N]](0)
# [1] 0.1994711
So, for practical purposes, when c = Inf, you are way better off using dnorm rather than manual computations.
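For reuse, the same fix can be packaged as a function factory, where each level receives the previous level as an argument instead of looking up a loop index. This is a sketch, not the answer's code; make_level, incomplete_convolutions and c_lim (standing for c) are hypothetical names:

```r
ngauss <- function(x) dnorm(x, mean = 0, sd = 1)

# Function factory: prev and c_lim are arguments, so each level keeps its own
# copies. force() evaluates them now rather than lazily at call time.
make_level <- function(prev, c_lim) {
  force(prev); force(c_lim)
  Vectorize(function(y) {
    integrate(function(z) ngauss(y - z) * prev(z),
              lower = -c_lim, upper = c_lim)$value
  })
}

# Builds f_1, ..., f_N for the truncation limit c_lim.
incomplete_convolutions <- function(N, c_lim) {
  convoluts <- list(ngauss)
  for (i in seq_len(N)[-1]) {
    convoluts[[i]] <- make_level(convoluts[[i - 1]], c_lim)
  }
  convoluts
}

conv <- incomplete_convolutions(3, Inf)
conv[[3]](0)            # should be close to dnorm(0, sd = sqrt(3))
dnorm(0, sd = sqrt(3))
```

Without the force() calls, prev would stay a promise for convoluts[[i - 1]] and could be evaluated after i has moved on, which is the same trap as in the question.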

Optimize within for loop cannot find function

I've got a function, KozakTaper, that returns the diameter of a tree trunk at a given height (DHT). There's no algebraic way to rearrange the original taper equation to return DHT at a given diameter (4 inches, for my purposes)...enter R! (using 3.4.3 on Windows 10)
My approach was to use a for loop to iterate likely values of DHT (25-100% of total tree height, HT), and then use optimize to choose the one that returns a diameter closest to 4". Too bad I get the error message Error in f(arg, ...) : could not find function "f".
Here's a shortened definition of KozakTaper along with my best attempt so far.
KozakTaper=function(Bark,SPP,DHT,DBH,HT,Planted){
if(Bark=='ob' & SPP=='AB'){
a0_tap=1.0693567631
a1_tap=0.9975021951
a2_tap=-0.01282775
b1_tap=0.3921013594
b2_tap=-1.054622304
b3_tap=0.7758393514
b4_tap=4.1034897617
b5_tap=0.1185960455
b6_tap=-1.080697381
b7_tap=0}
else if(Bark=='ob' & SPP=='RS'){
a0_tap=0.8758
a1_tap=0.992
a2_tap=0.0633
b1_tap=0.4128
b2_tap=-0.6877
b3_tap=0.4413
b4_tap=1.1818
b5_tap=0.1131
b6_tap=-0.4356
b7_tap=0.1042}
else{
a0_tap=1.1263776728
a1_tap=0.9485083275
a2_tap=0.0371321602
b1_tap=0.7662525552
b2_tap=-0.028147685
b3_tap=0.2334044323
b4_tap=4.8569609081
b5_tap=0.0753180483
b6_tap=-0.205052535
b7_tap=0}
p = 1.3/HT
z = DHT/HT
Xi = (1 - z^(1/3))/(1 - p^(1/3))
Qi = 1 - z^(1/3)
y = (a0_tap * (DBH^a1_tap) * (HT^a2_tap)) * Xi^(b1_tap * z^4 + b2_tap * (exp(-DBH/HT)) +
b3_tap * Xi^0.1 + b4_tap * (1/DBH) + b5_tap * HT^Qi + b6_tap * Xi + b7_tap*Planted)
return(y=round(y,4))}
HT <- .3048*85 #converting from english to metric (sorry, it's forestry)
for (i in c((HT*.25):(HT+1))) {
d <- KozakTaper(Bark='ob',SPP='RS',DHT=i,DBH=2.54*19,HT=.3048*85,Planted=0)
frame <- na.omit(d)
optimize(f=abs(10.16-d), interval=frame, lower=1, upper=90,
maximum = FALSE,
tol = .Machine$double.eps^0.25)
}
Eventually I would like this code to iterate through a csv and return i for the best d, which will require some rearranging, but I figured I should make it work for one tree first.
When I print d I get multiple values, so it is iterating through i, but it gets held up at the optimize function.
Defining frame was my most recent tactic, because d returns one NaN at the end, but it may not be the best input for interval. I've tried interval=c((HT*.25):(HT+1)), defining KozakTaper within the for loop, and defining f prior to the optimize, but I get the same error. Suggestions for what part I should target (or other approaches) are appreciated!
-KB
Forestry Research Fellow, Appalachian Mountain Club.
MS, University of Maine
Edit with a follow-up question:
I'm now trying to run this script for each row of a csv, "Input." The row contains the values for KozakTaper, and I've called them with this:
Input=read.csv...
Input$Opt=0
o <- optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
SPP='Input$Species',
DHT=x,
DBH=(2.54*Input$DBH),
HT=(.3048*Input$Ht),
Planted=0)),
lower=Input$Ht*.25, upper=Input$Ht+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
Input$Opt <- o$minimum
Input$Mht <- Input$Opt/.3048 # converting back to English
Input$Ht and Input$DBH are numeric; Input$Species is factor.
However, I get the error invalid function value in 'optimize'. I get it whether I define "o" or just run optimize. Oddly, when I don't call values from the row but instead use the code from the answer, it tells me object 'HT' not found. I have the awful feeling this is due to some obvious/careless error on my part, but I'm not finding posts about this error with optimize. If you notice what I've done wrong, your explanation will be appreciated!
I'm not an expert on optimize, but I see three issues: 1) your call to KozakTaper does not iterate through the range you specify in the loop; 2) KozakTaper returns a single number, not a vector; 3) you haven't given optimize a function, but an expression.
So what is happening is that you are not giving optimize anything to iterate over.
All you should need is this:
optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
SPP='RS',
DHT=x,
DBH=2.54*19,
HT=.3048*85,
Planted=0)),
lower=HT*.25, upper=HT+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
$minimum
[1] 22.67713 ##Hopefully this is the right answer
$objective
[1] 0
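optimize will now call the function with trial values of x between lower and upper (a golden-section search combined with parabolic interpolation, per ?optimize), looking for the x that minimizes the difference.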
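For the follow-up: the "invalid function value in 'optimize'" error is consistent with KozakTaper receiving whole columns (so f returns a vector rather than one number), and the "object 'HT' not found" error with HT simply not being defined in that script. One way to run the answer's call once per row is mapply(). This sketch uses toy_taper, a HYPOTHETICAL linear stand-in for KozakTaper so the block runs on its own, and a made-up Input data frame:

```r
# toy_taper() is a HYPOTHETICAL stand-in for KozakTaper(): diameter shrinks
# linearly from DBH at breast height (1.3 m) to 0 at the tip (HT).
toy_taper <- function(DHT, DBH, HT) DBH * (HT - DHT) / (HT - 1.3)

# Made-up input rows: DBH in inches, Ht in feet (as in the question).
Input <- data.frame(DBH = c(19, 12), Ht = c(85, 60))

# optimize() needs scalar lower/upper and an f that returns ONE number,
# so run it once per row; units are converted to metric first.
Input$Opt <- mapply(function(dbh_in, ht_ft) {
  HT <- 0.3048 * ht_ft
  DBH <- 2.54 * dbh_in
  optimize(function(x) abs(10.16 - toy_taper(DHT = x, DBH = DBH, HT = HT)),
           lower = HT * 0.25, upper = HT + 1)$minimum
}, Input$DBH, Input$Ht)

Input$Mht <- Input$Opt / 0.3048   # back to feet
Input
```

With the real KozakTaper you would also pass the species through mapply as as.character(Input$Species); the literal string 'Input$Species' never matches 'AB' or 'RS' and silently falls through to the else branch.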

Find minimums with R (1 Variable X, n times a fixed parameter U)

I'm trying to minimize a function f(X,U) = (X*log(X)-1/(1-U))^2
where U = (U_1, ..., U_n) ~ U(0,1). That means I have n fixed values of U and want to find the minimum of each of:
(x_1*ln(x_1)-1/(1-u_1))^2
(x_2*ln(x_2)-1/(1-u_2))^2
......
(x_n*ln(x_n)-1/(1-u_n))^2
For that, I wanted to use the optim function.
I have defined:
n <- 10^3
U <- sort(runif(n,min=0,max=1))
X <- c()
Xsolution<- c()
f <- function(X,U){
return(-(X*log(X)-(1/(1-U)))^2)
} #-, because min(f) = max(-f)
now I have no idea how to do this with optim()? I always get the following error for the following code:
for(i in 1:n){
  Xsolution[i] <- optim(f(X,U[i]))
}
Error in log(X) : non-numeric argument to mathematical function
Side note: I would welcome a method without a for loop, since for large n it will take too long. Maybe you can help me get it to work with sapply? Or an alternative way?
Alternatively, I thought I had it working with optimize(..., maximum = FALSE, ...):
f <- function (X, a) ((X*log(X)-(1/(1-a)))^2)
for (i in 1:n){
xmin[i] <- optimize(f, c(0, 10000), tol = 0.0001, a = U[i])
}
This doesn't work properly either...
Also, the problem may be that it will take tooooo long. I want to do it with n=10^6. But I'm quite sure there has to be a way of doing it without a for loop? I think the for loop is what makes this take ages. Please help me, I've been sitting on this problem for ages and it's quite frustrating.
Since 1 / (1 - U[i]) > 1, and X * log(X) is negative on (0, 1) but increases from 0 to infinity on [1, ∞), the equation X * log(X) = 1 / (1 - U[i]) has exactly one solution for each U[i], and at that solution (X*ln(X) - 1/(1 - U[i]))^2 is driven to zero. So each distinct U[i] has its own minimizer, and if the U[i] are all distinct there are length(U) solutions. They are given by (you can omit the unique if the U[i] are all distinct):
f <- function (X, a) ((X*log(X)-(1/(1-a)))^2)
unique(sapply(U, function(a) optimize(f, c(0, 1000000), a = a)$minimum))
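sapply() still loops over U internally. For n = 10^6 a vectorized root-finder may be much faster; the sketch below swaps in Newton's method (not the answer's optimize approach) and updates all elements at once. solve_xlogx is a hypothetical name:

```r
# Solve x * log(x) = y for every y > 0 at once with Newton's method.
# x * log(x) is increasing and convex for x > 1, and x = y + 1 always lies to
# the right of the root, so the Newton iterates decrease monotonically to it.
solve_xlogx <- function(y, tol = 1e-12, max_iter = 100) {
  x <- y + 1
  for (iter in seq_len(max_iter)) {
    step <- (x * log(x) - y) / (log(x) + 1)  # Newton step for all elements
    x <- x - step
    if (max(abs(step) / x) < tol) break      # one scalar test per iteration
  }
  x
}

set.seed(1)
n <- 10^6
U <- runif(n)
X <- solve_xlogx(1 / (1 - U))
```

The loop here runs over Newton iterations (typically a handful), not over the n values of U.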

Non-conformable arrays in R

y <- matrix(c(7, 9, -5, 0, 2, 6), ncol = 1)
try <- t(y)
tryy <- try %*% y
i <- solve(tryy)
h <- y %*% i %*% try
uniroot(as.vector(solve(((1-x) * diag(6)) + h)), c(-Inf, Inf))
Error in (1 - x) * diag(6) : non-conformable arrays
The purpose of the command uniroot(as.vector(solve(((1-x) * diag(6)) + h)), c(-Inf, Inf)) is to solve the characteristic equation det[(1-λ)I+h] = 0,
where λ = eigenvalues, I = identity matrix, and h = hat matrix = y(y'y)^(-1)y'.
Here λ is unknown; we have to solve for it.
I don't understand where the problem is. I have tried:
as.vector(solve(6*diag(6)+h))
This is not non-conformable. But why does it not work inside the uniroot function?
Your question is a bit confusing, so I have to make a couple of assumptions. If you want the eigenvalues of h, then the characteristic equation is:
det(h - I*λ) = 0
not
det[(1-λ)I+h] = 0
So I used the former.
Given the above, the short answer is: do it this way.
f <- function(lambda) det(h -lambda*diag(6))
F <- Vectorize(f)
library(rootSolve)
uniroot.all(F,c(-1000,1000),n=2000)
# [1] 0 1
# or, much more simply
eigen(h)$values
# [1] 1.000000e+00 2.220446e-16 0.000000e+00 -2.731318e-18 -6.876381e-18 -7.365903e-17
So h has two distinct eigenvalues, 0 and 1. Note that the built-in function eigen(...) returns 6 values (one per row of the 6 x 6 matrix), but 5 of them are 0 to within machine precision, i.e. 0 is an eigenvalue with multiplicity 5.
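Why only 0 and 1 appear: h = y(y'y)^(-1)y' is a projection (h %*% h equals h), so any eigenvalue λ satisfies λ^2 = λ, and because y has a single column the rank of h, and hence its trace (the sum of the eigenvalues), is 1. A quick numerical check:

```r
y <- matrix(c(7, 9, -5, 0, 2, 6), ncol = 1)
h <- y %*% solve(crossprod(y)) %*% t(y)   # hat matrix y (y'y)^(-1) y'

all.equal(h %*% h, h)   # TRUE: h is idempotent, so each eigenvalue is 0 or 1
sum(diag(h))            # 1 up to rounding: trace = sum of eigenvalues = rank
ev <- eigen(h, symmetric = TRUE)$values
zapsmall(ev)            # 1 0 0 0 0 0
```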
The question about why your code fails is a bit more involved.
First, your code:
tryy <- try %*% y
is the dot product of y with itself (so, a scalar), returned as a matrix with one element. When you "invert" that using solve(...)
i <- solve(tryy)
you simply take the reciprocal, so i is also a matrix with 1 element. I'm not sure if this is what you had in mind.
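Second, uniroot(...) does not work this way. The first argument must be a function; you've passed an expression that depends on x, which is never defined as a function argument. (The "non-conformable arrays" message suggests that some unrelated matrix named x already existed in your workspace.) You could try: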
f <- function(x) det(h-x*diag(6))
uniroot(f,c(-Inf,Inf))
but this wouldn't work either because (a) uniroot(...) works on a finite interval, (b) it requires that the function f(...) have opposite signs at the ends of the interval, and (c) in any event it would return only one root.
So you could use uniroot.all(...) in package rootSolve. uniroot.all(...) also requires a function as its first argument, but there's a twist: the function must be "vectorized". This means that if you pass a vector of lambda values, f(...) should return a vector of the same length. Fortunately, R has an easy way to "vectorize" a given function, as in:
F <- Vectorize(f)
Even this has its limits. uniroot.all(...) also requires a finite interval, so we have to guess what that is, and it evaluates F on n sub-intervals. So if your interval does not contain all the roots, or if the sub-intervals are not small enough, you will not find all the roots.
Using the built-in eigen(...) function is definitely the best option.
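To see why uniroot.all(...) needs the vectorized F rather than f: with a vector argument, x * diag(6) does not fail, it silently recycles x over the 36 matrix entries, so f returns a single (meaningless) determinant. A small sketch reusing the question's y and h:

```r
y <- matrix(c(7, 9, -5, 0, 2, 6), ncol = 1)
h <- y %*% solve(t(y) %*% y) %*% t(y)

f <- function(x) det(h - x * diag(6))
# With a vector x, x * diag(6) recycles x over all 36 entries,
# so f() returns ONE number instead of one per value of x.
length(f(c(0.5, 2)))   # 1

F <- Vectorize(f)
F(c(0.5, 2))           # one determinant per value: (1 - x) * (-x)^5 gives -0.015625 and 32
```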

Resources