I have an operation inside a tight loop in R that I need to optimize. It's updating the weights inside an IRLS algorithm by calculating the Schur product of a vector and a matrix. That is, it multiplies each element in the matrix by the corresponding row value in the vector, producing a result of the same dimensions as the matrix. In overly simplified schematic form, it looks like this:
reweight = function(iter, w, Q) {
for (i in 1:iter) {
wT = w * Q
}
}
In normal R code, a new matrix of dim() [rows,cols] is created on each iteration:
cols = 1000
rows = 1000000
w = runif(rows)
Q = matrix(1.0, rows, cols)
Rprofmem()
reweight(5, w, Q)
Rprofmem(NULL)
nate@ubuntu:~/R$ less Rprofmem.out
8000000040 :"reweight"
8000000040 :"reweight"
8000000040 :"reweight"
8000000040 :"reweight"
8000000040 :"reweight"
And if the matrix is large (multiple GB), the cost of the memory allocation exceeds the time spent on the numeric operation:
nate@ubuntu:~/R$ perf record -p `pgrep R` sleep 5 && perf report
49.93% R [kernel.kallsyms] [k] clear_page_c_e
47.67% R libR.so [.] real_binary
0.57% R [kernel.kallsyms] [k] get_page_from_freelist
0.35% R [kernel.kallsyms] [k] clear_huge_page
0.34% R libR.so [.] RunGenCollect
0.20% R [kernel.kallsyms] [k] clear_page
It also consumes a lot of memory:
USER PID VSZ RSS COMMAND
nate 17099 22.5GB 22.5GB /usr/local/lib/R/bin/exec/R --vanilla
If the matrix is smaller (several MB) but the number of iterations is larger, the memory usage is more reasonable, but at the cost of the garbage collector using more time than the numeric calculations:
cols = 100
rows = 10000
w = runif(rows)
Q = matrix(1.0, rows, cols)
reweight(1000, w, Q)
(note that this is a new process starting from scratch)
61.51% R libR.so [.] RunGenCollect
26.40% R libR.so [.] real_binary
7.94% R libR.so [.] SortNodes
2.79% R [kernel.kallsyms] [k] clear_page_c_e
USER PID VSZ RSS COMMAND
nate 17099 191MB 72MB /usr/local/lib/R/bin/exec/R --vanilla
If I write my own function with Rcpp that does the work in place, I can get the memory allocation that I want:
library(Rcpp)
cppFunction('
void weightMatrix(NumericVector w,
NumericMatrix Q,
NumericMatrix wQ) {
size_t numRows = Q.rows();
for (size_t row = 0; row < numRows; row++) {
wQ(row,_) = w(row) * Q(row,_);
}
return;
}
')
reweightCPP = function(iter, w, Q) {
# Initialize workspace to non-NA
wQ = matrix(1.0, nrow(Q), ncol(Q))
for (i in 1:iter) {
weightMatrix(w, Q, wQ)
}
}
cols = 100
rows = 10000
w = runif(rows)
Q = matrix(1.0, rows, cols)
wQ = matrix(NA, rows, cols)
Rprofmem()
reweightCPP(5, w, Q)
Rprofmem(NULL)
nate@ubuntu:~/R$ less Rprofmem.out
8000040 :"matrix" "reweightCPP"
2544 :"<Anonymous>" "weightMatrix" "reweightCPP"
2544 :"<Anonymous>" "weightMatrix" "reweightCPP"
2544 :"<Anonymous>" "weightMatrix" "reweightCPP"
2544 :"<Anonymous>" "weightMatrix" "reweightCPP"
2544 :"<Anonymous>" "weightMatrix" "reweightCPP"
(What's the 2544 bytes of allocation for? It seems to be an Rcpp constant. Is there any way I can avoid it?)
Performance is still suboptimal due to the Rcpp sugar:
76.53% R sourceCpp_82335.so [.] _Z12weightMatrixN4Rcpp6VectorILi14ENS_15PreserveStorageEEENS_6MatrixILi14ES1_EES4_
10.46% R libR.so [.] Rf_getAttrib
9.53% R libR.so [.] getAttrib0
2.06% R libR.so [.] Rf_isMatrix
0.42% R libR.so [.] INTEGER
But I can mostly fix that by resorting to lower level C++:
cppFunction('
void weightMatrix(NumericVector w_,
NumericMatrix Q_,
NumericMatrix wQ_) {
size_t numCols = Q_.ncol();
size_t numRows = Q_.nrow();
double * __restrict__ w = &w_[0];
double * __restrict__ Q = &Q_[0];
double * __restrict__ wQ = &wQ_[0];
for (size_t row = 0; row < numRows; row++) {
size_t colOffset = 0;
for (size_t col = 0; col < numCols; col++) {
wQ[colOffset + row] = w[row] * Q[colOffset + row];
colOffset += numRows;
}
}
return;
}
')
99.18% R sourceCpp_59392.so [.] sourceCpp_48203_weightMatrix
0.06% R libR.so [.] PutRNGstate
0.06% R libR.so [.] do_begin
0.06% R libR.so [.] Rf_eval
That said, I still haven't figured out how to get the compiler to reliably generate efficient assembly without resorting to SIMD intrinsics to force the use of VMULPD. Even with the ugly '__restrict__' attributes, in the form shown here it seems compelled to invert my loop order and do a lot of unnecessary work. But presumably I'll find the magic cross-compiler syntax eventually, or more likely, call out to a Fortran BLAS function.
Which brings me to my questions:
Is there any way that I can get the performance I want without going to all this trouble? Failing that, is there any way that I can at least hide it behind the scenes so that the end user in R can use "wQ = w * Q" and have it magically reuse wQ instead of allocating and throwing away another giant matrix?
The BLAS wrappers in R seem to do a fairly good job for cases where the answer can be written into one of the operands (Q = w * Q), but I haven't found any way to do this when I need a "3rd party" workspace. Is there maybe some reasonable way to define a method for %=% that will convert "wQ = w * Q" to "op_mult(w, Q, wQ)"?
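To make the op_mult(w, Q, wQ) calling convention concrete, here is a minimal sketch in Python rather than R. It only illustrates the idea and is not an R solution. It assumes the matrix is stored column-major in a flat buffer, as R stores matrices; the name op_mult is the hypothetical one from the paragraph above, and the num_rows and num_cols arguments are assumptions of this sketch. The caller allocates the workspace once and every call overwrites it, with the column loop outermost so that memory is walked contiguously.

```python
def op_mult(w, Q, wQ, num_rows, num_cols):
    """Write w[row] * Q[row, col] into the preallocated buffer wQ.

    Q and wQ are flat, column-major sequences of length num_rows * num_cols
    (the layout R uses for matrices); nothing new is allocated here.
    """
    for col in range(num_cols):
        offset = col * num_rows          # start of this column in the flat buffer
        for row in range(num_rows):      # inner loop walks contiguous memory
            wQ[offset + row] = w[row] * Q[offset + row]


# Usage: allocate the workspace once, then reuse it on every iteration.
num_rows, num_cols = 3, 2
w = [1.0, 2.0, 3.0]
Q = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]    # columns (1, 1, 1) and (10, 10, 10)
wQ = [0.0] * (num_rows * num_cols)
for _ in range(5):
    op_mult(w, Q, wQ, num_rows, num_cols)
print(wQ)
```

The same shape (inputs plus an explicit output buffer) is what a hidden in-place operator would have to call underneath.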
To preempt the question as to whether it matters: yes, I've measured, and it matters. The use case is an ensemble of cross-validated logistic regressions inside a loop handling large arrays of longitudinal data (http://cran.r-project.org/web/packages/ltmle/ltmle.pdf). It will be called millions (if not billions) of times per analysis. A good optimization of this function would help to get the runtime from "impossible" down to "days". A great optimization (or rather the combination of several such optimizations) might get it down to "hours" or even "minutes".
Edit: In the comments, Henrik correctly points out that the example loop has been simplified to the point that it simply repeats the same calculation multiple times. I hoped this would focus the issue, but perhaps it confuses it. In the real version, there will be more steps in the loop such that the 'w' in the 'w * Q' is different each iteration. Below is a poorly tested draft version of the actual functions. This one is a "semi-optimized" logistic regression in straight R based on O'Leary's QR Newton IRLS described by Bryan Lewis.
logistic_irls_qrnewton = function(A, y, maxIter=25, targetSSE=1e-16) {
# warn user below on first weight less than threshold
tinyWeightsFound = FALSE
tiny = sqrt(.Machine$double.eps)
# decompose A to QR (only once, Choleski done in loop)
QR = qr(A) # A[rows=samples, cols=covariates]
Q = qr.Q(QR) # Q[rows, cols] (same dimensions as A)
R = qr.R(QR) # R[cols, cols] (upper right triangular)
# copying now prevents copying each time y is used as argument
y = y + 0; # y[rows]
# first pass is outside loop since initial values are constant
iter = 1
t = (y - 0.5) * 4.0 # t[rows] = (y - m) * initial weight
C = chol(crossprod(Q, Q)) # C[cols, cols]
t = crossprod(Q,t)
s = forwardsolve(t(C), t) # s[cols]
s = backsolve(C, s)
t = Q %*% s
sse = crossprod(s) # sum of squared errors
print(as.vector(sse))
converged = ifelse(sse < targetSSE, 1, 0)
while (converged == 0 && iter < maxIter) {
iter = iter + 1
# only t is required as an input
dim(t) = NULL # matrix to vector to counteract crossprod
e = exp(t)
m = e / (e + 1) # mu = exp(eta) / (1 + exp(eta))
d = m / (e + 1) # mu.eta = exp(eta) / (1 + exp(eta))^2
w = d * d / (m - m^2) # W = (1 / variance) = 1 / (mu * (1 - mu))
if(tinyWeightsFound == FALSE && min(w) < tiny) {
print("Tiny weights found")
tinyWeightsFound = TRUE
}
t = crossprod(Q, w * (((y - m) / d) + t))
C = chol(crossprod(Q, w * Q))
n = forwardsolve(t(C), t)
n = backsolve(C, n)
t = Q %*% n
sse = crossprod(n - s) # divergence from previous
s = n # save divergence for difference from next
print(as.vector(sse))
if (sse < targetSSE) converged = iter
}
if (converged == 0) {
print(paste("Failed to converge after", iter, "iterations"))
print(paste("Final SSE was", sse))
} else {
print(paste("Convergence after iteration", iter))
}
coefficients = backsolve(R, crossprod(Q,t))
dim(coefficients) = NULL # return as a vector
coefficients
}
The renewal function for Weibull distribution m(t) with t = 10 is given as below.
I want to find the value of m(t). I wrote the following R code to compute m(t):
last_term = NULL
gamma_k = NULL
n = 50
for(k in 1:n){
gamma_k[k] = gamma(2*k + 1)/factorial(k)
}
for(j in 1: (n-1)){
prev = gamma_k[n-j]
last_term[j] = gamma(2*j + 1)/factorial(j)*prev
}
final_term = NULL
find_value = function(n){
for(i in 2:n){
final_term[i] = gamma_k[i] - sum(last_term[1:(i-1)])
}
return(final_term)
}
all_k = find_value(n)
af_sum = NULL
m_t = function(t){
for(k in 1:n){
af_sum[k] = (-1)^(k-1) * all_k[k] * t^(2*k)/gamma(2*k + 1)
}
return(sum(na.omit(af_sum)))
}
m_t(20)
The output is m(t) = 2.670408e+93. Is my iterative procedure correct? Thanks.
I don't think it will work. First, let's move $\Gamma(2k+1)$ from the denominator of $m(t)$ into $A_k$. Thus, $A_k$ will behave roughly as $1/k!$.
In the numerator of the $m(t)$ terms there is $t^{2k}$, so roughly speaking you're computing a sum with terms
$100^k/k!$
From Stirling's formula,
$k! \sim k^k$, making the terms
$(100/k)^k$
so yes, they will start to decrease and converge to something, but only after the 100th term.
Anyway, here is the code, you could try to improve it, but it breaks at k~70
N <- 20
A <- rep(0, N)
# compute A_k/gamma(2k+1) terms
ps <- 0.0 # previous sum
A[1] = 1.0
for(k in 2:N) {
ps <- ps + A[k-1]*gamma(2*(k-1) + 1)/factorial(k-1)
A[k] <- 1.0/factorial(k) - ps/gamma(2*k+1)
}
print(A)
t <- 10.0
t2 <- t*t
r <- 0.0
for(k in 1:N){
r <- r + (-t2)^k*A[k]
}
print(-r)
UPDATE
Ok, I calculated $A_k$ as in your question and got the same answer. I want to estimate the terms $A_k/\Gamma(2k+1)$ from $m(t)$; I believe they will be pretty much dominated by the $1/k!$ term. To do that I made another array, $k!\,A_k/\Gamma(2k+1)$, which should be close to one.
Code
N <- 20
A <- rep(0.0, N)
psum <- function( pA, k ) {
ps <- 0.0
if (k >= 2) {
jmax <- k - 1
for(j in 1:jmax) {
ps <- ps + (gamma(2*j+1)/factorial(j))*pA[k-j]
}
}
ps
}
# compute A_k/gamma(2k+1) terms
A[1] = gamma(3)
for(k in 2:N) {
A[k] <- gamma(2*k+1)/factorial(k) - psum(A, k)
}
print(A)
B <- rep(0.0, N)
for(k in 1:N) {
B[k] <- (A[k]/gamma(2*k+1))*factorial(k)
}
print(B)
shows that
I got the same $A_k$ values as you did.
$B_k$ is indeed very close to 1.
This means that the term $A_k/\Gamma(2k+1)$ could be replaced by $1/k!$ to get a quick estimate of what we might get (with replacement):
$m(t) \approx -\sum_{k=1}^{\infty} (-1)^k (t^2)^k / k! = 1 - \sum_{k=0}^{\infty} (-t^2)^k / k!$
This is actually a well-known sum: it is equal to exp() with a negative argument (well, you have to add the term for k=0):
$m(t) \approx 1 - \exp(-t^2)$
Conclusions
The approximate value is positive. It will probably stay positive; after all, $A_k/\Gamma(2k+1)$ is only a bit different from $1/k!$.
We're talking about $1 - \exp(-100)$, which is $1 - 3.72 \times 10^{-44}$! And we're trying to compute it precisely by summing and subtracting values on the order of $10^{100}$ or even higher. Even with MPFR I don't think this is possible.
Another approach is needed
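To illustrate the cancellation problem numerically, here is a small sketch in Python (not R, purely for illustration). It sums the simplified series from the estimate above, with $1/k!$ standing in for $A_k/\Gamma(2k+1)$, once in double precision and once in exact rational arithmetic, and compares both with the closed form $1 - \exp(-t^2)$. The function names and the choice of 400 terms are assumptions of this sketch.

```python
import math
from fractions import Fraction


def series_float(t, terms):
    """Partial sum of -sum_{k=1..terms} (-t^2)^k / k! in double precision."""
    x = -(t * t)
    term, total = 1.0, 0.0
    for k in range(1, terms + 1):
        term *= x / k        # term is now (-t^2)^k / k!
        total -= term
    return total


def series_exact(t, terms):
    """Same partial sum in exact rational arithmetic (t must be an integer)."""
    x = Fraction(-(t * t))
    term, total = Fraction(1), Fraction(0)
    for k in range(1, terms + 1):
        term *= x / k
        total -= term
    return total


t, terms = 10, 400
closed_form = -math.expm1(-(t * t))      # 1 - exp(-t^2)
float_sum = series_float(t, terms)
exact_sum = float(series_exact(t, terms))
print("closed form:", closed_form)
print("float sum  :", float_sum)
print("exact sum  :", exact_sum)
```

The exact sum agrees with the closed form, while the double-precision sum is dominated by rounding errors from the huge intermediate terms, which is the point made in the conclusions above.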
OK, so I ended up going down a pretty different road on this. I have implemented a simple discretization of the integral equation which defines the renewal function:
m(t) = F(t) + integrate (m(t - s)*f(s), s, 0, t)
The integral is approximated with the rectangle rule. Approximating the integral for different values of t gives a system of linear equations. I wrote a function to generate the equations and extract a matrix of coefficients from it. After looking at some examples, I guessed a rule to define the coefficients directly and used that to generate solutions for some examples. In particular I tried shape = 2, t = 10, as in OP's example, with step = 0.1 (so 101 equations).
I found that the result agrees pretty well with an approximate result which I found in a paper (Baxter et al., cited in the code). Since the renewal function is the expected number of events, for large t it is approximately equal to t/mu where mu is the mean time between events; this is a handy way to know if we're anywhere in the neighborhood.
I was working with Maxima (http://maxima.sourceforge.net), which is not efficient for numerical stuff, but which makes it very easy to experiment with different aspects. At this point it would be straightforward to port the final, numerical stuff to another language such as Python.
Thanks to OP for suggesting the problem, and S. Pappadeux for insightful discussions. Here is the plot I got comparing the discretized approximation (red) with the approximation for large t (blue). Trying some examples with different step sizes, I saw that the values tend to increase a little as step size gets smaller, so I think the red line is probably a little low, and the blue line might be more nearly correct.
Here is my Maxima code:
/* discretize weibull renewal function and formulate system of linear equations
* copyright 2020 by Robert Dodier
* I release this work under terms of the GNU General Public License
*
* This is a program for Maxima, a computer algebra system.
* http://maxima.sourceforge.net/
*/
"Definition of the renewal function m(t):" $
renewal_eq: m(t) = F(t) + 'integrate (m(t - s)*f(s), s, 0, t);
"Approximate integral equation with rectangle rule:" $
discretize_renewal (delta_t, k) :=
if equal(k, 0)
then m(0) = F(0)
else m(k*delta_t) = F(k*delta_t)
+ m(k*delta_t)*f(0)*(delta_t / 2)
+ sum (m((k - j)*delta_t)*f(j*delta_t)*delta_t, j, 1, k - 1)
+ m(0)*f(k*delta_t)*(delta_t / 2);
make_eqs (n, delta_t) :=
makelist (discretize_renewal (delta_t, k), k, 0, n);
make_vars (n, delta_t) :=
makelist (m(k*delta_t), k, 0, n);
"Discretized integral equation and variables for n = 4, delta_t = 1/2:" $
make_eqs (4, 1/2);
make_vars (4, 1/2);
make_eqs_vars (n, delta_t) :=
[make_eqs (n, delta_t), make_vars (n, delta_t)];
load (distrib);
subst_pdf_cdf (shape, scale, e) :=
subst ([f = lambda ([x], pdf_weibull (x, shape, scale)), F = lambda ([x], cdf_weibull (x, shape, scale))], e);
matrix_from (eqs, vars) :=
(augcoefmatrix (eqs, vars),
[submatrix (%%, length(%%) + 1), - col (%%, length(%%) + 1)]);
"Subsitute Weibull pdf and cdf for shape = 2 into discretized equation:" $
apply (matrix_from, make_eqs_vars (4, 1/2));
subst_pdf_cdf (2, 1, %);
"Just the right-hand side matrix:" $
rhs_matrix_from (eqs, vars) :=
(map (rhs, eqs),
augcoefmatrix (%%, vars),
[submatrix (%%, length(%%) + 1), col (%%, length(%%) + 1)]);
"Generate the right-hand side matrix, instead of extracting it from equations:" $
generate_rhs_matrix (n, delta_t) :=
[delta_t * genmatrix (lambda ([i, j], if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))];
"Generate numerical right-hand side matrix, skipping over formulas:" $
generate_rhs_matrix_numerical (shape, scale, n, delta_t) :=
block ([f, F, numer: true], local (f, F),
f: lambda ([x], pdf_weibull (x, shape, scale)),
F: lambda ([x], cdf_weibull (x, shape, scale)),
[genmatrix (lambda ([i, j], delta_t * if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))]);
"Solve approximate integral equation (shape = 3, t = 1) via LU decomposition:" $
fpprintprec: 4 $
n: 20 $
t: 1;
[AA, bb]: generate_rhs_matrix_numerical (3, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Iterative solution of approximate integral equation (shape = 3, t = 1):" $
xx: bb;
for i thru 10 do xx: AA . xx + bb;
xx - (AA.xx + bb);
xx_iterative: xx;
"Should find iterative and LU give same result:" $
xx_diff: xx_iterative - xx_by_lu[1];
sqrt (transpose(xx_diff) . xx_diff);
"Try shape = 2, t = 10:" $
n: 100 $
t: 10 $
[AA, bb]: generate_rhs_matrix_numerical (2, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Baxter, et al., Eq. 3 (for large values of t) compared to discretization:" $
/* L.A. Baxter, E.M. Scheuer, D.J. McConalogue, W.R. Blischke.
* "On the Tabulation of the Renewal Function,"
* Technometrics, vol. 24, no. 2 (May 1982).
* H(t) is their notation for the renewal function.
*/
H(t) := t/mu + sigma^2/(2*mu^2) - 1/2;
tx_points: makelist ([float (k/n*t), xx_by_lu[1][k, 1]], k, 1, n);
plot2d ([H(u), [discrete, tx_points]], [u, 0, t]), mu = mean_weibull(2, 1), sigma = std_weibull(2, 1);
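For readers who would like to try the same discretization outside Maxima, here is a rough Python sketch under stated assumptions. It uses the same discretized equation as discretize_renewal above, but because the resulting system is lower triangular it solves it by forward substitution instead of LU decomposition. The function names are mine, the Weibull density is written out by hand, and shape >= 1 is assumed so that f(0) is finite.

```python
import math


def weibull_pdf(x, shape, scale):
    # Weibull density; assumes x >= 0 and shape >= 1
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape))


def renewal_function(shape, scale, t_max, n):
    """Return [m(0), m(h), ..., m(n*h)] with h = t_max / n."""
    h = t_max / n
    f = [weibull_pdf(k * h, shape, scale) for k in range(n + 1)]
    big_f = [weibull_cdf(k * h, shape, scale) for k in range(n + 1)]
    m = [big_f[0]]                                   # m(0) = F(0)
    for k in range(1, n + 1):
        rhs = big_f[k] + 0.5 * h * m[0] * f[k]       # end-point term with m(0)
        rhs += h * sum(m[k - j] * f[j] for j in range(1, k))
        m.append(rhs / (1.0 - 0.5 * h * f[0]))       # move the m(k*h)*f(0) term to the left
    return m


def asymptote(t, mu, sigma):
    """Large-t approximation H(t) as used in the Maxima code above."""
    return t / mu + sigma ** 2 / (2.0 * mu ** 2) - 0.5


shape, scale, t_max, n = 2.0, 1.0, 10.0, 100
m = renewal_function(shape, scale, t_max, n)
mu = scale * math.gamma(1.0 + 1.0 / shape)
sigma = math.sqrt(scale ** 2 * math.gamma(1.0 + 2.0 / shape) - mu ** 2)
print("discretized m(10):", m[-1])
print("asymptote  H(10):", asymptote(t_max, mu, sigma))
```

The two printed values should land in the same neighborhood, which is the sanity check described in the text.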
So I searched the internet for programs that use Cramer's Rule and found a few, but apparently these examples were for fixed-size matrices only, like 2x2 or 4x4.
However, I am looking for a way to solve an NxN matrix. So I started, and reached the point of asking the user for the size of the matrix and for its values, but I don't know how to move on from here.
I guess my next step is to apply Cramer's rule and get the answers, but I just don't know how. This is the step I'm missing. Can anybody help me, please?
First, you need to calculate the determinant of your equations system matrix - that is the matrix, that consists of the coefficients (from the left-hand side of the equations) - let it be D.
Then, to calculate the value of a certain variable, you need to take the matrix of your system (from the previous step), replace the coefficients of the corresponding column with constant terms (from the right-hand side), calculate the determinant of resulting matrix - let it be C, and divide C by D.
A bit more about the replacement from the previous step: say your matrix is 3x3 (as in the image), so you have a system of equations where every a coefficient is multiplied by x, every b by y, and every c by z, and the ds are the constant terms. So, to calculate y, you replace the coefficients that are multiplied by y (the bs in this case) with the ds.
You perform the second step for every variable and your system gets solved.
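The two steps above can be sketched as follows. This is an illustration in Python rather than C, with hypothetical helper names det and cramer. It uses exact fractions so the example output is unambiguous, and it computes determinants by Gaussian elimination, which is a substitution of mine and not something the steps above prescribe.

```python
from fractions import Fraction


def det(matrix):
    """Determinant of a square matrix via Gaussian elimination on exact fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        # find a row with a non-zero entry in column c to use as pivot
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result                 # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def cramer(coeffs, consts):
    """Solve coeffs * x = consts with Cramer's rule (any N x N system)."""
    rows = [list(r) for r in coeffs]
    d = det(rows)                            # step 1: determinant D of the system
    if d == 0:
        raise ValueError("determinant is zero; Cramer's rule does not apply")
    solution = []
    for col in range(len(rows)):
        # step 2: replace column `col` with the constant terms, then divide C by D
        replaced = [row[:col] + [consts[i]] + row[col + 1:]
                    for i, row in enumerate(rows)]
        solution.append(det(replaced) / d)
    return solution


x, y, z = cramer([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print(x, y, z)
```

For this 3x3 system the unknowns come out as x = 2, y = 3, z = -1; the same two functions work unchanged for larger N.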
You can find an example in https://rosettacode.org/wiki/Cramer%27s_rule#C
Although the specific example deals with a 4X4 matrix the code is written to accommodate any size square matrix.
What you need is a way to calculate the determinant: Cramer's rule expresses each unknown as a ratio of determinants of NxN matrices.
If N is not big, you can compute the determinant by cofactor expansion (see code below), which is quite straightforward. However, this method is not efficient; if your N is big, you need to resort to other methods, such as LU decomposition.
This assumes your data is double and that the result can be held in a double.
#include <stdio.h>
#include <stdlib.h>

/* Determinant by cofactor expansion along the first row.
 * matrix is n x n, stored row by row. */
double det(double * matrix, int n) {
    if( 1 >= n ) return matrix[ 0 ];
    double *subMatrix = (double*)malloc(( n - 1 )*( n - 1 ) * sizeof(double));
    double result = 0.0;
    for( int i = 0; i < n; ++i ) {
        /* build the minor that skips row 0 and column i */
        for( int j = 0; j < n - 1; ++j ) {
            for( int k = 0; k < i; ++k )
                subMatrix[ j*( n - 1 ) + k ] = matrix[ ( j + 1 )*n + k ];
            for( int k = i + 1; k < n; ++k )
                subMatrix[ j*( n - 1 ) + ( k - 1 ) ] = matrix[ ( j + 1 )*n + k ];
        }
        if( i % 2 == 0 )
            result += matrix[ 0 * n + i ] * det(subMatrix, n - 1);
        else
            result -= matrix[ 0 * n + i ] * det(subMatrix, n - 1);
    }
    free(subMatrix);
    return result;
}

int main() {
    double matrix[ ] = { 1,2,3,4,5,6,7,8,2,6,4,8,3,1,1,2 };
    printf("%lf\n", det(matrix, 4));
    return 0;
}
I was wondering how I can convert this code from Matlab to R code. It seems this is the code for midpoint method. Any help would be highly appreciated.
% Usage: [y t] = midpoint(f,a,b,ya,n) or y = midpoint(f,a,b,ya,n)
% Midpoint method for initial value problems
%
% Input:
% f - Matlab inline function f(t,y)
% a,b - interval
% ya - initial condition
% n - number of subintervals (panels)
%
% Output:
% y - computed solution
% t - time steps
%
% Examples:
% [y t]=midpoint(@myfunc,0,1,1,10); here 'myfunc' is a user-defined function in M-file
% y=midpoint(inline('sin(y*t)','t','y'),0,1,1,10);
% f=inline('sin(y(1))-cos(y(2))','t','y');
% y=midpoint(f,0,1,1,10);
function [y t] = midpoint(f,a,b,ya,n)
h = (b - a) / n;
halfh = h / 2;
y(1,:) = ya;
t(1) = a;
for i = 1 : n
t(i+1) = t(i) + h;
z = y(i,:) + halfh * f(t(i),y(i,:));
y(i+1,:) = y(i,:) + h * f(t(i)+halfh,z);
end;
I have the R code for Euler method which is
euler <- function(f, h = 1e-7, x0, y0, xfinal) {
N = (xfinal - x0) / h
x = y = numeric(N + 1)
x[1] = x0; y[1] = y0
i = 1
while (i <= N) {
x[i + 1] = x[i] + h
y[i + 1] = y[i] + h * f(x[i], y[i])
i = i + 1
}
return (data.frame(X = x, Y = y))
}
so based on the matlab code, do I need to change h in euler method (R code) to (b - a) / n to modify Euler code to midpoint method?
Note
Broadly speaking, I agree with the expressed comments; however, I decided to vote up this question. (now deleted) This is due to the existence of matconv that facilitates this process.
Answer
Given your code, we could use matconv in the following manner:
pacman::p_load(matconv)
out <- mat2r(inMat = "input.m")
The created out object will attempt to translate the Matlab code into R; however, the job is far from finished. If you inspect the out object you will see that it requires further work. Simple statements are usually translated correctly, with Matlab comments % replaced with # and so forth, but more complex statements may require a more detailed investigation. You could then inspect the respective lines and attempt to evaluate them to see where further work may be required, for example:
eval(parse(text=out$rCode[1]))
NULL
(first line is a comment so the output is NULL)
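Independently of any automatic translation, the MATLAB loop in the question is short enough to port by hand. The sketch below is in Python rather than R and handles only a scalar y, purely as an illustration of the structure. It also bears on the question of whether changing h is enough: the step size definition is not the essential difference from the Euler code. What changes is the extra half step z, whose slope is then used for the full step.

```python
def midpoint(f, a, b, ya, n):
    """Explicit midpoint method for y' = f(t, y), y(a) = ya, on [a, b] with n steps.

    Scalar y only; returns (t, y) as two lists of length n + 1.
    """
    h = (b - a) / n
    t = [a]
    y = [ya]
    for i in range(n):
        z = y[i] + 0.5 * h * f(t[i], y[i])             # half Euler step to the midpoint
        y.append(y[i] + h * f(t[i] + 0.5 * h, z))      # full step using the midpoint slope
        t.append(t[i] + h)
    return t, y


# Usage: y' = y, y(0) = 1 on [0, 1]; the exact value at t = 1 is e.
t, y = midpoint(lambda t, y: y, 0.0, 1.0, 1.0, 100)
print(y[-1])
```

The same three lines inside the loop carry over to R almost verbatim, with 1-based indexing.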
I tried to implement bessel function using that formula, this is the code:
function result=Bessel(num);
if num==0
result=bessel(0,1);
elseif num==1
result=bessel(1,1);
else
result=2*(num-1)*Bessel(num-1)-Bessel(num-2);
end;
But if I use MATLAB's bessel function to compare it with this one, I get very different values.
For example if I type Bessel(20) it gives me 3.1689e+005 as result, if instead I type bessel(20,1) it gives me 3.8735e-025 , a totally different result.
Such recurrence relations are nice in mathematics but numerically unstable when implemented with limited-precision floating-point numbers.
Consider the following comparison:
x = 0:20;
y1 = arrayfun(@(n)besselj(n,1), x); %# builtin function
y2 = arrayfun(@Bessel, x); %# your function
semilogy(x,y1, x,y2), grid on
legend('besselj','Bessel')
title('J_\nu(z)'), xlabel('\nu'), ylabel('log scale')
So you can see how the computed values start to differ significantly after 9.
According to MATLAB:
BESSELJ uses a MEX interface to a Fortran library by D. E. Amos.
and gives the following as references for their implementation:
D. E. Amos, "A subroutine package for Bessel functions of a complex
argument and nonnegative order", Sandia National Laboratory Report,
SAND85-1018, May, 1985.
D. E. Amos, "A portable package for Bessel functions of a complex
argument and nonnegative order", Trans. Math. Software, 1986.
The forward recurrence relation you are using is not stable. To see why, consider that the values of BesselJ(n,x) become smaller and smaller, by about a factor of 1/(2n) at each step (for x = 1). You can see this by looking at the first term of the Taylor series for J.
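For convenience, the leading series term referred to above is the standard small-argument form, and the ratio of consecutive orders follows directly from it:

```latex
J_n(x) \approx \frac{1}{n!}\left(\frac{x}{2}\right)^{n}
\quad\Longrightarrow\quad
\frac{J_{n+1}(x)}{J_n(x)} \approx \frac{x}{2(n+1)}
```

For x = 1 and n = 20 the leading term gives about 3.9e-25, consistent with the bessel(20,1) value quoted in the question.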
So, what you're doing is subtracting a large number from a multiple of a somewhat smaller number to get an even smaller number. Numerically, that's not going to work well.
Look at it this way. We know the result is of the order of 10^-25. You start out with numbers that are of the order of 1. So in order to get even one accurate digit out of this, we have to know the first two numbers with at least 25 digits precision. We clearly don't, and the recurrence actually diverges.
Using the same recurrence relation to go backwards, from high orders to low orders, is stable. When you start with correct values for J(20,1) and J(19,1), you can calculate all orders down to 0 with full accuracy as well. Why does this work? Because now the numbers are getting larger in each step. You're subtracting a very small number from an exact multiple of a larger number to get an even larger number.
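Here is a small sketch of the backward direction, in Python rather than MATLAB and only as an illustration. Since correct starting values for high orders are usually not at hand, it uses a common variant (often called Miller's algorithm, which is my substitution and not what the paragraph above literally describes): start well above the highest order needed with an arbitrary tiny seed, recur downward, and normalize with the identity J_0(x) + 2*(J_2(x) + J_4(x) + ...) = 1. The function names and the margin of 20 extra orders are assumptions.

```python
def bessel_j_backward(nmax, x, extra=20):
    """J_0(x)..J_nmax(x) by downward recurrence plus normalization."""
    start = nmax + extra
    vals = [0.0] * (start + 2)
    vals[start] = 1e-30                      # arbitrary seed; the scale is fixed below
    for k in range(start, 0, -1):
        # J_{k-1}(x) = (2k/x) J_k(x) - J_{k+1}(x)
        vals[k - 1] = (2.0 * k / x) * vals[k] - vals[k + 1]
    norm = vals[0] + 2.0 * sum(vals[2:start + 1:2])
    return [v / norm for v in vals[:nmax + 1]]


def bessel_j_forward(nmax, x, j0, j1):
    """Same recurrence run upward from J_0 and J_1 (unstable for n > x)."""
    vals = [j0, j1]
    for n in range(1, nmax):
        vals.append((2.0 * n / x) * vals[n] - vals[n - 1])
    return vals


back = bessel_j_backward(20, 1.0)
fwd = bessel_j_forward(20, 1.0, back[0], back[1])
print("backward J_20(1):", back[20])
print("forward  J_20(1):", fwd[20])
```

The backward value should agree with the bessel(20,1) result quoted in the question, while the forward value is dominated by amplified rounding error.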
You can just modify the code below which is for the Spherical bessel function. It is well tested and works for all arguments and order range. I am sorry it is in C#
public static Complex bessel(int n, Complex z)
{
if (n == 0) return sin(z) / z;
if (n == 1) return sin(z) / (z * z) - cos(z) / z;
if (n <= System.Math.Abs(z.real))
{
Complex h0 = bessel(0, z);
Complex h1 = bessel(1, z);
Complex ret = 0;
for (int i = 2; i <= n; i++)
{
ret = (2 * i - 1) / z * h1 - h0;
h0 = h1;
h1 = ret;
if (double.IsInfinity(ret.real) || double.IsInfinity(ret.imag)) return double.PositiveInfinity;
}
return ret;
}
else
{
double u = 2.0 * abs(z.real) / (2 * n + 1);
double a = 0.1;
double b = 0.175;
int v = n - (int)System.Math.Ceiling((System.Math.Log(0.5e-16 * (a + b * u * (2 - System.Math.Pow(u, 2)) / (1 - System.Math.Pow(u, 2))), 2)));
Complex ret = 0;
while (v > n - 1)
{
ret = z / (2 * v + 1.0 - z * ret);
v = v - 1;
}
Complex jnM1 = ret;
while (v > 0)
{
ret = z / (2 * v + 1.0 - z * ret);
jnM1 = jnM1 * ret;
v = v - 1;
}
return jnM1 * sin(z) / z;
}
}
I've just been working though converting some MATLAB scripts to work in R, however having never used MATLAB in my life, and not exactly being an expert on R I'm having some trouble.
Edit: It's a script I was given, designed to correct temperature measurements for lag generated by insulation mass effects. My understanding is that it looks at the rate of change of the temperature and attempts to adjust for errors generated by the response time of the sensor. Unfortunately there is no literature available to me to give me an indication of the numbers I am expecting from the function, and the only way to find out will be to experimentally test it at a later date.
the original script:
function [Tc, dT] = CTD_TempTimelagCorrection(T0,Tau,t)
N1 = Tau/t;
Tc = T0;
N = 3;
for j=ceil(N/2):numel(T0)-ceil(N/2)
A = nan(N,1);
# Compute weights
for k=1:N
A(k) = (1/N) + N1 * ((12*k - (6*(N+1))) / (N*(N^2 - 1)));
end
A = A./sum(A);
# Verify unity
if sum(A) ~= 1
disp('Error: Sum of weights is not unity');
end
Comp = nan(N,1);
# Compute components
for k=1:N
Comp(k) = A(k)*T0(j - (ceil(N/2)) + k);
end
Tc(j) = sum(Comp);
dT = Tc - T0;
end
where I've managed to get to:
CTD_TempTimelagCorrection <- function(temp,Tau,t){
## Define which equation to use based on duration of lag and frequency
## With ESM2 profiler sampling @ 2hz: N1>tau/t = TRUE
N1 = Tau/t
Tc = temp
N = 3
for(i in ceiling(N/2):length(temp)-ceiling(N/2)){
A = matrix(nrow=N,ncol=1)
# Compute weights
for(k in 1:N){
A[k] = (1/N) + N1 * ((12*k - (6*(N+1))) / (N*(N^2 - 1)))
}
A = A/sum(A)
# Verify unity
if(sum(A) != 1){
print("Error: Sum of weights is not unity")
}
Comp = matrix(nrow=N,ncol=1)
# Compute components
for(k in 1:N){
Comp[k] = A[k]*temp[i - (ceiling(N/2)) + k]
}
Tc[i] = sum(Comp)
dT = Tc - temp
}
return(dT)
}
I think the problem is the Comp[k] line, could someone point out what I've done wrong? I'm not sure I can select the elements of the array in such a way.
by the way, Tau = 1, t = 0.5 and temp (or T0) will be a vector.
Thanks
edit: apparently my description is too brief in explaining my code samples, not really sure what more I could write that would be relevant and not just wasting peoples time. Is this enough Mr Filter?
The error is as follows:
Error in Comp[k] = A[k] * temp[i - (ceiling(N/2)) + k] :
replacement has length zero
In addition: Warning message:
In Comp[k] = A[k] * temp[i - (ceiling(N/2)) + k] :
number of items to replace is not a multiple of replacement length
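If you write print(i - (ceiling(N/2)) + k) before that line, you will see that you are using incorrect indices for temp[i - (ceiling(N/2)) + k]: the first values are -1 and 0. temp[0] returns a zero-length vector, which means that nothing is returned to be inserted into Comp[k]. The cause is operator precedence in R rather than a difference in index base (Matlab indices also start at 1): the colon binds more tightly than the minus sign, so ceiling(N/2):length(temp)-ceiling(N/2) is evaluated as (ceiling(N/2):length(temp)) - ceiling(N/2) and i starts at 0. Negative indices are also handled differently in R (temp[-1] drops the first element instead of selecting one), which explains the warning. Wrapping the upper bound in parentheses, ceiling(N/2):(length(temp) - ceiling(N/2)), gives the same range as the Matlab loop.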
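As a cross-check on the indexing, here is a sketch of the same loop in Python (not R), with the 1-based Matlab loop bounds kept explicit and shifted by one only at the point of indexing. The function name and argument names are mine. The weights do not depend on the sample, so the sketch computes them once, and it compares their sum to 1 with a tolerance instead of an exact floating-point test.

```python
import math


def temp_timelag_correction(temp, tau, dt, n_weights=3):
    """Return dT = Tc - temp, following the Matlab loop bounds."""
    n1 = tau / dt
    half = math.ceil(n_weights / 2)
    # weights A(k), k = 1..N, as in the original script
    a = [1.0 / n_weights
         + n1 * (12 * k - 6 * (n_weights + 1)) / (n_weights * (n_weights ** 2 - 1))
         for k in range(1, n_weights + 1)]
    total = sum(a)
    a = [v / total for v in a]
    if not math.isclose(sum(a), 1.0):
        print("Error: Sum of weights is not unity")
    tc = list(temp)
    # Matlab: j = ceil(N/2) : numel(T0) - ceil(N/2)   (1-based, inclusive)
    for j in range(half, len(temp) - half + 1):
        # "- 1" converts the 1-based index j - ceil(N/2) + k to Python's 0-based index
        tc[j - 1] = sum(a[k - 1] * temp[j - half + k - 1]
                        for k in range(1, n_weights + 1))
    return [c - t for c, t in zip(tc, temp)]


# Usage with Tau = 1, t = 0.5 and a linear ramp of 0.5 degrees per sample.
ramp = [0.5 * i for i in range(10)]
d_t = temp_timelag_correction(ramp, 1.0, 0.5)
print(d_t)
```

On a linear ramp the interior corrections equal Tau/t times the per-sample slope, and the first sample and the last two are left untouched, as in the Matlab loop.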