A numerical library which uses a parallelized algorithm to do one-dimensional integration?

Is there a numerical library that can use a parallelized algorithm to do one-dimensional integration (global adaptive method)? The structure of my code means that I cannot run multiple numerical integrations in parallel, so I have to use a parallelized algorithm to get a speed-up.
Thanks!

The NAG C numerical library does have a parallel version of adaptive quadrature (link here). Their trick is to require the user to supply a function with the following signature:
void (*f)(const double x[], Integer nx, double fv[], Integer *iflag, Nag_Comm *comm)
Here the function "f" evaluates the integrand at nx abscise points given by the vector x[]. This is where parallelization comes along, because you can use parallel_for (implemented in openmp for example) to evaluate f at those points concurrently. The integrator itself is single threaded.
NAG is a very expensive library, but if you code the integrator yourself using, for example, Numerical Recipes, it is not difficult to modify a serial implementation to create a parallel adaptive integrator based on the NAG idea.
Due to license restrictions, I can't reproduce code from the Numerical Recipes book to show where the modifications are necessary. So let's take the simplest example, the trapezoidal rule, whose implementation is simple and well known. The simplest way to build an adaptive method on the trapezoidal rule is to compute the integral on a grid of points, then double the number of abscissa points and compare the results. If the result changes by less than the requested accuracy, the method has converged.
At each step, the trapezoidal rule can be computed using the following generic implementation
double trapezoidal( double (*f)(double x), double a, double b, int n )
{
    double h = (b - a) / n;
    double s = 0.5 * h * (f(a) + f(b));
    for( int i = 1; i < n; ++i ) s += h * f(a + i*h);
    return s;
}
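For completeness, here is a minimal sketch of the doubling strategy described above. The name adaptive_trapezoidal, the tolerance eps, and the iteration cap are my own choices; trapezoidal is the routine just shown.
#include <math.h> /* for fabs */

double adaptive_trapezoidal( double (*f)(double x), double a, double b, double eps )
{
    int n = 2;
    double prev = trapezoidal(f, a, b, n);
    for( int iter = 0; iter < 25; ++iter ) { /* cap the number of doublings */
        n *= 2;
        double cur = trapezoidal(f, a, b, n);
        if( fabs(cur - prev) < eps ) return cur; /* converged */
        prev = cur;
    }
    return prev; /* eps not reached; return the best estimate */
}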
Now you can make the following changes to implement the NAG idea:
double trapezoidal( void (*f)( double x[], int nx, double fv[] ), double a, double b, int n )
{
    double h = (b - a) / n;
    double x[n+1];
    double fv[n+1];
    for( int i = 0; i < n; ++i ) x[i] = a + i * h;
    x[n] = b;
    f(x, n+1, fv); // inside f, use parallel_for to evaluate the integrand at x[i], i = 0..n
    double s = 0.5 * h * ( fv[0] + fv[n] );
    for( int i = 1; i < n; ++i ) s += h * fv[i];
    return s;
}
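As a sketch of what the batched integrand might look like (the name my_integrand and the Gaussian are just placeholders for your expensive function), the parallel_for can be a plain OpenMP loop:
#include <math.h>

void my_integrand( double x[], int nx, double fv[] ) /* compile with -fopenmp */
{
    /* evaluate the integrand at all nx points concurrently */
    #pragma omp parallel for
    for( int i = 0; i < nx; ++i )
        fv[i] = exp(-x[i] * x[i]); /* stand-in for an expensive integrand */
}
A call like trapezoidal(my_integrand, 0.0, 1.0, 1000) then evaluates all abscissa points in parallel while the integration logic itself stays serial.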
This procedure, however, will only speed up your code if the integrand is very expensive to compute. Otherwise, you should parallelize your code at the outer loops, not inside the integrator.

Why not simply implement a wrapper around a single-threaded algorithm that dispatches integrals over subdivisions of the bounds to different threads and then adds them together at the end? E.g.:
thread 0: i0 = integral(x0, (x0+x1)/2)
thread 1: i1 = integral((x0+x1)/2, x1)
i = i0 + i1
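In C++ that wrapper could be a few lines with std::async. Here integral is assumed to be any serial routine with the signature shown (a hypothetical stand-in, e.g. the trapezoidal function above with a fixed n):
#include <future>

double integral( double (*f)(double), double lo, double hi ); /* any serial integrator */

double parallel_integral( double (*f)(double), double x0, double x1 )
{
    double mid = 0.5 * (x0 + x1);
    auto i0 = std::async(std::launch::async, integral, f, x0, mid); /* "thread 0" */
    double i1 = integral(f, mid, x1); /* "thread 1": reuse the calling thread */
    return i0.get() + i1;
}
Note that for a global adaptive method this is not quite equivalent: the error is now controlled per subinterval rather than globally, so the subdivision points must be chosen with some care.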


Complexity Recursion in For

Hi, I wanted to know how I can work out the time complexity of this algorithm.
I solved the case f(n/4), but I'm stuck on f(n/i):
void f(int n){
    if (n < 4) return;
    for (int i = 0; i*i < n; i++)
        printf("-");
    for (int i = 2; i < 4; i++)
        f(n/i); // solved the case f(n/4) but stuck on f(n/i)
}
Note that the loop condition is i < 4, so i never reaches 4; i.e., the only recursive terms are f(n/2) and f(n/3).
Recurrence relation:
T(n) = T(n/2) + T(n/3) + Θ(sqrt(n))
There are two ways to approach this problem:
Find upper and lower bounds by replacing one of the recursive terms with the other (since T is increasing, T(n/3) ≤ T(n/2)):
R(n) = 2R(n/3) + Θ(sqrt(n))
S(n) = 2S(n/2) + Θ(sqrt(n))
R(n) ≤ T(n) ≤ S(n)
You can easily solve both bounds by substitution or by applying the Master Theorem (for R: a = 2, b = 3, log3(2) ≈ 0.63 > 1/2; for S: a = 2, b = 2, log2(2) = 1 > 1/2, so in both cases the recursion dominates the Θ(sqrt(n)) term):
R(n) = O(n^[log3(2)]) = O(n^0.63...)
S(n) = O(n)
If you need an exact answer, use the Akra-Bazzi method:
a1 = a2 = 1
h1(x) = h2(x) = 0
g(x) = sqrt(x)
b1 = 1/2
b2 = 1/3
You need to solve for a power p such that [1/2]^p + [1/3]^p = 1. Do this numerically, e.g. with Newton-Raphson, to obtain p = 0.78788.... Then perform the integral:
T(n) = Θ( n^p * (1 + ∫[1..n] g(u) / u^(p+1) du) ) = Θ( n^p * (1 + ∫[1..n] u^(-p-1/2) du) ) = Θ(n^p)
(the integral converges to a constant because p > 1/2), to obtain T(n) = O(n^0.78...), which is consistent with the bounds found before.
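For reference, a minimal sketch of the Newton-Raphson step for p; the update follows directly from the equation (1/2)^p + (1/3)^p - 1 = 0 and its derivative:
#include <cmath>
#include <cstdio>

int main()
{
    double p = 1.0; // initial guess
    for (int iter = 0; iter < 50; ++iter) {
        double fp  = std::pow(0.5, p) + std::pow(1.0/3.0, p) - 1.0; // f(p)
        double dfp = std::pow(0.5, p) * std::log(0.5)
                   + std::pow(1.0/3.0, p) * std::log(1.0/3.0);      // f'(p)
        p -= fp / dfp; // Newton-Raphson update
    }
    std::printf("p = %.5f\n", p); // prints p = 0.78788
    return 0;
}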
I think this is about O(sqrt(9/2) * sqrt(n)) time, but I'd go with O(sqrt(n)) to be safe. It's admittedly been a while since I worked with time complexity.
If n < 4, the function returns immediately, in constant O(1) time.
If n >= 4, the function's first loop, for (int i=0; i*i<n; i++), calls the constant-time printf("-") about sqrt(n) times. So far we're at O(sqrt(n)) time.
The next for loop performs two recursive calls: one for f(n/2) and one for f(n/3).
The first contributes O(sqrt(n/2)), its own recursive call O(sqrt(n/4)), and so on; this series converges to O(sqrt(2n)).
Likewise, the chain starting with f(n/3) converges to O(sqrt(3/2 n)).
This doesn't factor in the fact that each recursive call also spends a little extra time by invoking both of these functions when it runs, but I believe this converges to about O(sqrt(n)) + O(sqrt(2n)) + O(sqrt(3/2 n)), which itself converges to O(sqrt(9/2) * sqrt(n)).
This is likely a bit low for an exact constant, but I believe you can safely say this runs in O(sqrt(n)) time, with some small-ish constant out front.

What is the runtime of this recursive code?

I am wondering what the runtime for the following recursive function would be:
int f(int n) {
    if (n <= 1) {
        return 1;
    }
    return f(n-1) + f(n-1);
}
If you think of it as a call tree, each node has 2 branches. The number of nodes in that call tree is 2⁰ + 2¹ + 2² + 2³ + ... + 2^n, which is equivalent to 2^(n+1) - 1. So the time complexity of this function should be O(2^(n+1)-1), assuming each call takes constant O(1) time. Am I correct? According to the book where I have this example from, the time complexity is O(2^n). I am confused; what am I missing?
Big-O Notation ignores constant factors and lower order terms. So O(2^(n+1)-1) is equivalent to O(2^n).
O(2^(n+1)-1) = O(2^n * 2^1 - 1)
We drop the constant factor of 2^1, and then we drop the lower order term of -1 as 2^n grows asymptotically faster.
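If you want to convince yourself empirically, a small counter makes the growth visible. The global calls counter is my addition; note the exact count here is 2^n - 1 because the recursion bottoms out at n = 1, but the O(2^n) growth is the same:
#include <cstdio>

long long calls = 0;

int f(int n) {
    ++calls; // one increment per node of the call tree
    if (n <= 1) {
        return 1;
    }
    return f(n-1) + f(n-1);
}

int main() {
    for (int n = 1; n <= 10; ++n) {
        calls = 0;
        f(n);
        std::printf("n = %2d  calls = %4lld\n", n, calls); // calls == 2^n - 1
    }
    return 0;
}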

Any analytical math solver as preprocessor?

Solving a (system of) equations by hand is error prone. I have wasted a lot of time in my life hunting for bugs in my pen-and-paper derivations. Having experienced Maple, with its ability to solve equations and rsolve recursive equations, I was wondering if there is any tool out there that preprocesses code to solve equations analytically for a certain set of variables and outputs the solution in analytical form, such that in the next step the compiler compiles the solution.
I wrote this toy example (completely made up, incorrect, and under-constrained) of what I think this might look like, in C++:
tuple<3> example_function(double y, double z, double alpha) {
    // define variables
    $real x, p0, gamma;  // << unknowns
    $real y, z, alpha;   // << givens
    // define equations
    $eq rule1 = x*x / p0^gamma == 1;
    $eq rule2 = gamma^2 + p0^2 == 1;
    // instruct the preprocessor to solve it
    $prog pr = solve {rule1, rule2}
               for {x, p0, gamma}
               assuming (positive x,
                         positive z,
                         positive gamma);
    // define output variables to store the results in
    double x, p0, gamma;
    // now let the preprocessor output the solved equations
    $exec pr;
    return tuple<3>({x, p0, gamma});
}
The dollar ($) signs indicate the math language to be preprocessed.
This could for example be preprocessed to something like:
tuple<3> example_function(double y, double z, double alpha) {
    double x, p0, gamma;
    x = std::pow(y, -z) / (alpha - 1.0);
    p0 = x / alpha;
    gamma = alpha * alpha;
    return tuple<3>({x, p0, gamma});
}
Of course, the preprocessor would know this language is C++ such that it can use keywords like double and use the std:: math functions.
Does anything like this exist? If not, is there a reason why, and what do you think of this idea?

K nearest neighbor search for caret R implementation

[EDIT: I understand that it is faster partly because the function is written in C, but I want to know whether it does a brute-force search over all the training instances or something more sophisticated.]
I'm implementing the KNN algorithm in R, for study purposes.
I'm also checking the correctness of my code by comparing it against the caret implementation.
The problem lies in the execution time of the two versions. My version seems to take a lot of time, whereas the caret implementation is very fast (even with 10-fold cross-validation).
Why? I'm calculating the Euclidean distance of every test instance from every training instance, which means I'm doing NxM distance calculations (where N is the number of test instances and M the number of training instances):
for (i in 1:nrow(test)){
    distances <- c()
    classes <- c()
    for(j in 1:nrow(training)){
        d = calculateDistance(test[i,], training[j,])
        distances <- c(distances, d)
        classes <- c(classes, training[j,][[15]])
    }
}
Is the caret implementation using some approximate search, or an exact search, for example with a k-d tree? How can I speed up the search? I have 14 features for this problem, but I've been reading that k-d trees are suggested for problems with 1 to 5 features.
EDIT:
I've found the C function called by R (VR_knn), which is pretty complex for me to understand; maybe someone can help.
Anyway, I've quickly written a brute-force search in C++, which seems to be faster than my previous R version (but not as fast as the caret C version):
#include <Rcpp.h>
using namespace Rcpp;

double distance(NumericVector x1, NumericVector x2){
    int vectorLen = x1.size();
    double sum = 0;
    // skip the last column, which holds the class label
    for(int i = 0; i < vectorLen - 1; i++){
        sum += pow(x1[i] - x2[i], 2);
    }
    return sqrt(sum);
}

// [[Rcpp::export]]
void searchCpp(NumericMatrix training, NumericMatrix test) {
    int numRowTr = training.rows();
    int numRowTe = test.rows();
    for (int i = 0; i < numRowTe; i++){
        NumericVector test_i = test.row(i);
        NumericVector distances(numRowTr); // one slot per training row
        for (int j = 0; j < numRowTr; j++){
            NumericVector train_j = training.row(j);
            distances[j] = distance(test_i, train_j);
        }
    }
}
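The sketch above only fills the distance vector; to actually select the k nearest neighbors, one option (my addition, not what caret does) is a partial sort over indices:
#include <algorithm>
#include <numeric>
#include <vector>

// Return the indices of the k smallest distances; partial_sort avoids
// sorting the whole vector when k is small (assumes k <= dist.size()).
std::vector<int> kNearest(const std::vector<double>& dist, int k)
{
    std::vector<int> idx(dist.size());
    std::iota(idx.begin(), idx.end(), 0); // 0, 1, 2, ...
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b){ return dist[a] < dist[b]; });
    idx.resize(k);
    return idx;
}
Incidentally, much of the slowness of the pure R version comes from growing distances and classes with c(...) on every iteration; preallocating them (e.g. distances <- numeric(nrow(training))) already helps a lot.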

How to refine the result of a floating point division?

I have an algorithm for calculating floating-point square root and division using the Newton-Raphson method. My results are not fully accurate: they are sometimes off by 1 ulp.
I was wondering if there is a refinement algorithm for floating-point division to get the final bits of accuracy. I use the Tuckerman test for square root, but is there a similar algorithm for division? Or can the Tuckerman test be adapted for division?
I tried using this algorithm too, but didn't get fully accurate results:
z = divisor
r_temp = divisor * q
r = dividend - r_temp
result_temp = r * z
q + result_temp
One practical way of correctly rounding the result of iterative division is to produce a preliminary quotient to within one ulp of the mathematical result, then use the exactly-computed residual to compute the final result.
The tool of choice for the exact computation of residuals is the fused-multiply add (FMA) operation. Much of the foundational work of this approach (both in terms of the mathematics and of practical implementations) is due to Peter Markstein and was later refined by other researchers. Markstein's results are nicely summarized in his book:
Peter Markstein, IA-64 and Elementary Functions: Speed and Precision. Prentice-Hall 2000.
A straightforward approach to correctly-rounded division using Markstein's approach is to first compute a correctly-rounded reciprocal, then compute the correctly-rounded quotient by multiplying it with the dividend, followed by a final residual-based rounding step.
The residual can be used to compute the final rounded result directly, which is the technique used by Markstein; this is shown for the quotient rounding in the code below. (I originally rounded the reciprocal this way as well, but noticed that the code sequence produced an incorrectly rounded result in about one out of 10^11 divisions, and replaced it with another instance of the comparison-and-select idiom.) Alternatively, the residual may be used as part of a two-sided comparison-and-select process somewhat akin to Tuckerman rounding, which is shown for the reciprocal rounding in the code below.
There is one caveat with regard to the reciprocal computation. Many commonly used iterative approaches (including the one I used below), when combined with Markstein's rounding technique, deliver an incorrect result if the mantissa of the divisor consists entirely of 1-bits.
One way of getting around this is to treat this case specially. In the code below I instead opted for a two-sided comparison-and-select approach, which also allows errors slightly larger than one ulp prior to rounding and thus eliminates the need to use FMA in the reciprocal iteration itself.
Please note that I omitted the handling of sub-normal results in the C code below to keep the code concise and easy to follow. I have limited myself to standard C library functions for tasks like extracting parts of floating-point operands, assembling floating-point numbers, and applying one-ulp increments and decrements. Most platforms will offer machine-specific options with higher performance for these.
#include <math.h> /* frexpf, ldexpf, fmaf, nextafterf, copysignf, fabsf */

float my_divf (float a, float b)
{
    float q, r, ma, mb, e, s, t;
    int ia, ib;
    if (!isnanf (a+b) && !isinff (a) && !isinff (b) && (b != 0.0f)) {
        /* normal cases: remove sign, split args into exponent and mantissa */
        ma = frexpf (fabsf (a), &ia);
        mb = frexpf (fabsf (b), &ib);
        /* minimax polynomial approximation to 1/mb for mb in [0.5,1) */
        r = - 3.54939341e+0f;
        r = r * mb + 1.06481802e+1f;
        r = r * mb - 1.17573657e+1f;
        r = r * mb + 5.65684575e+0f;
        /* apply one iteration with cubic convergence */
        e = 1.0f - mb * r;
        e = e * e + e;
        r = e * r + r;
        /* round reciprocal to nearest-or-even */
        e = fmaf (-mb, r, 1.0f);                 // residual of 1st candidate
        s = nextafterf (r, copysignf (2.0f, e)); // bump or dent
        t = fmaf (-mb, s, 1.0f);                 // residual of 2nd candidate
        r = (fabsf (e) < fabsf (t)) ? r : s;     // candidate with smaller residual
        /* compute preliminary quotient from correctly-rounded reciprocal */
        q = ma * r;
        /* round quotient to nearest-or-even */
        e = fmaf (-mb, q, ma);                   // residual of 1st candidate
        s = nextafterf (q, copysignf (2.0f, e)); // bump or dent
        t = fmaf (-mb, s, ma);                   // residual of 2nd candidate
        q = (fabsf (e) < fabsf (t)) ? q : s;     // candidate with smaller residual
        /* scale back into result range */
        r = ldexpf (q, ia - ib);
        if (r < 1.17549435e-38f) {
            /* sub-normal result, left as an exercise for the reader */
        }
        /* merge in sign of quotient */
        r = copysignf (r, a * b);
    } else {
        /* handle special cases */
        if (isnanf (a) || isnanf (b)) {
            r = a + b;
        } else if (b == 0.0f) {
            r = (a == 0.0f) ? (0.0f / 0.0f) : copysignf (1.0f / 0.0f, a * b);
        } else if (isinff (b)) {
            r = (isinff (a)) ? (0.0f / 0.0f) : copysignf (0.0f, a * b);
        } else {
            r = a * b;
        }
    }
    return r;
}
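A quick way to exercise this is to compare against the hardware division. This little harness is my addition (the operand ranges are arbitrary); it reports any result that differs from a/b:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

float my_divf (float a, float b); /* as defined above */

int main (void)
{
    srand (42);
    int mismatches = 0;
    for (int i = 0; i < 10000000; i++) {
        float a = 200.0f * ((float)rand() / RAND_MAX) - 100.0f;
        float b = 200.0f * ((float)rand() / RAND_MAX) - 100.0f;
        float ref = a / b;
        float res = my_divf (a, b);
        if ((ref != res) && !(isnan (ref) && isnan (res))) { /* two NaNs match */
            mismatches++;
            printf ("a=% 14.6e  b=% 14.6e  ref=% 14.6e  res=% 14.6e\n",
                    a, b, ref, res);
        }
    }
    printf ("%d mismatches\n", mismatches);
    return 0;
}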
