Some OpenCL functions crash the kernel if any of their arguments are non-standard floats like NAN or INFINITY. I am trying to stop these crashes from occurring by wrapping my arguments to remove bad values.
Here are three of my attempts to create argument sanitizers:
clean_arg = clamp(dirty_arg, 1, 10000); Crashes
clean_arg = convert_float(convert_int(dirty_arg)); Hacky
clean_arg = isnan(dirty_arg) ? 1 : dirty_arg; Verbose
Is there a better way to detect and remove undesirable floating-point values?
Related
My code contains some random steps and exponential expression (monotonic expression), which needs to find its root at the end. The "RuntimeWarning: invalid value encountered in double_scalars" appeared occasionally. For example, 3 or 2 times it appeared when I run 5 times. Could you tell me what's going on here? PS: each time I can get the result, but it's just the warning makes me confused.
There are two possible way to solve it, depends on your data.
1.
As you are handling some huge number and exceed the limit of double
To solve this, the method is actually quite mathematical.
First, if and only if (T_data[runs][0])*(np.exp(-(x)*(T_data[runs][1]))) is always smaller than 1.7976931348623157e+308.
As a*e^(-x*b) = e(ln(a)-xb)
Thus, (T_data[runs][0])*(np.exp(-(x)*(T_data[runs][1]))) = np.exp(T_data[runs][0]-(x)*(T_data[runs][1]))
Use np.exp(np.log(T_data[runs][0])-(x)*(T_data[runs][1])) instead.
2.
However, as you said you get the result everytime, it is possible that (T_data[runs][0])*(np.exp(-(x)*(T_data[runs][1]))) is approaching zero, which is too small that double can no longer hold but cause no harm to save as 0.
And you should change your code like this to avoid the warning.
temp = (x)*(T_data[runs][1])) > 709 ? 0 : np.exp(-(x)*(T_data[runs][1]))
exponential += (T_data[runs][0]) * temp
## As ln(1.7976931348623157e+308) ~= 709.78
I have written a function that takes two arguments, a number between 0:16 and a vector which contains four parameter values.
The output of the function does change if I change the parameters in the vector, but it does not change if I change the number between 0:16.
I can add, that the function I'm having troubles with, includes another function (called 'pi') which takes the same arguments.
I have checked that the 'pi' function does actually change values if I change the value from 0:16 (and it does also change if I change the values of the parameters).
Firstly, here is my code;
pterm_ny <- function(x, theta){
(1-sum(theta[1:2]))*(theta[4]^(x))*exp((-1)*theta[4])/pi(x, theta)
}
pi <- function(x, theta){
theta[1]*1*(x==0)+theta[2]*(theta[3]^(x))*exp((-1)*(theta[3]))+(1-
sum(theta[1:2]))*(theta[4]^(x))*exp((-1)*(theta[4]))
}
Which returns 0.75 for pterm_ny(i,c(0.2,0.2,2,2)), were i = 1,...,16 and 0.2634 for i = 0, which tells me that the indicator function part in 'pi' does work.
With respect to raising a number to a certain power, I have been told that one should wrap the wished number in a 'I', as an example it would be like;
x^I(2)
I have tried to do that in my code, but that didn't help either.
I can't remember the argument for doing it, but I expect that it's to ensure that the number in parentheses is interpreted as an integer.
My end goal is to get 17 different values of the 'pterm' and to accomplish that, I was thinking of using the sapply function like this;
sapply(c(0:16),pterm_ny,theta = c(0.2,0.2,2,2))
I really hope that someone can point out what I'm missing here.
In advance, thank you!
You have a theta[4]^x term both in your main expression and in your pi() function; these are cancelling out, leaving the result invariant to changes in x ...
Also:
you might want to avoid using pi as your function name, as it's also a built-in variable (3.14159...) - this can sometimes cause confusion
the advice about using the "as is" function I() to protect powers is only relevant within formulas, e.g. as used in lm() (linear regression). (It would be used as I(x^2), not x^I(2)
There's a conditional debugging flag I miss from Matlab: dbstop if infnan described here. If set, this condition will stop code execution when an Inf or NaN is encountered (IIRC, Matlab doesn't have NAs).
How might I achieve this in R in a more efficient manner than testing all objects after every assignment operation?
At the moment, the only ways I see to do this are via hacks like the following:
Manually insert a test after all places where these values might be encountered (e.g. a division, where division by 0 may occur). The testing would be to use is.finite(), described in this Q & A, on every element.
Use body() to modify the code to call a separate function, after each operation or possibly just each assignment, which tests all of the objects (and possibly all objects in all environments).
Modify R's source code (?!?)
Attempt to use tracemem to identify those variables that have changed, and check only these for bad values.
(New - see note 2) Use some kind of call handlers / callbacks to invoke a test function.
The 1st option is what I am doing at present. This is tedious, because I can't guarantee I've checked everything. The 2nd option will test everything, even if an object hasn't been updated. That is a massive waste of time. The 3rd option would involve modifying assignments of NA, NaN, and infinite values (+/- Inf), so that an error is produced. That seems like it's better left to R Core. The 4th option is like the 2nd - I'd need a call to a separate function listing all of the memory locations, just to ID those that have changed, and then check the values; I'm not even sure this will work for all objects, as a program may do an in-place modification, which seems like it would not invoke the duplicate function.
Is there a better approach that I'm missing? Maybe some clever tool by Mark Bravington, Luke Tierney, or something relatively basic - something akin to an options() parameter or a flag when compiling R?
Example code Here is some very simple example code to test with, incorporating the addTaskCallback function proposed by Josh O'Brien. The code isn't interrupted, but an error does occur in the first scenario, while no error occurs in the second case (i.e. badDiv(0,0,FALSE) doesn't abort). I'm still investigating callbacks, as this looks promising.
badDiv <- function(x, y, flag){
z = x / y
if(flag == TRUE){
return(z)
} else {
return(FALSE)
}
}
addTaskCallback(stopOnNaNs)
badDiv(0, 0, TRUE)
addTaskCallback(stopOnNaNs)
badDiv(0, 0, FALSE)
Note 1. I'd be satisfied with a solution for standard R operations, though a lot of my calculations involve objects used via data.table or bigmemory (i.e. disk-based memory mapped matrices). These appear to have somewhat different memory behaviors than standard matrix and data.frame operations.
Note 2. The callbacks idea seems a bit more promising, as this doesn't require me to write functions that mutate R code, e.g. via the body() idea.
Note 3. I don't know whether or not there is some simple way to test the presence of non-finite values, e.g. meta information about objects that indexes where NAs, Infs, etc. are stored in the object, or if these are stored in place. So far, I've tried Simon Urbanek's inspect package, and have not found a way to divine the presence of non-numeric values.
Follow-up: Simon Urbanek has pointed out in a comment that such information is not available as meta information for objects.
Note 4. I'm still testing the ideas presented. Also, as suggested by Simon, testing for the presence of non-finite values should be fastest in C/C++; that should surpass even compiled R code, but I'm open to anything. For large datasets, e.g. on the order of 10-50GB, this should be a substantial savings over copying the data. One may get further improvements via use of multiple cores, but that's a bit more advanced.
The idea sketched below (and its implementation) is very imperfect. I'm hesitant to even suggest it, but: (a) I think it's kind of interesting, even in all of its ugliness; and (b) I can think of situations where it would be useful. Given that it sounds like you are right now manually inserting a check after each computation, I'm hopeful that your situation is one of those.
Mine is a two-step hack. First, I define a function nanDetector() which is designed to detect NaNs in several of the object types that might be returned by your calculations. Then, it using addTaskCallback() to call the function nanDetector() on .Last.value after each top-level task/calculation is completed. When it finds an NaN in one of those returned values, it throws an error, which you can use to avoid any further computations.
Among its shortcomings:
If you do something like setting stop(error = recover), it's hard to tell where the error was triggered, since the error is always thrown from inside of stopOnNaNs().
When it throws an error, stopOnNaNs() is terminated before it can return TRUE. As a consequence, it is removed from the task list, and you'll need to reset with addTaskCallback(stopOnNaNs) it you want to use it again. (See the 'Arguments' section of ?addTaskCallback for more details).
Without further ado, here it is:
# Sketch of a function that tests for NaNs in several types of objects
nanDetector <- function(X) {
# To examine data frames
if(is.data.frame(X)) {
return(any(unlist(sapply(X, is.nan))))
}
# To examine vectors, matrices, or arrays
if(is.numeric(X)) {
return(any(is.nan(X)))
}
# To examine lists, including nested lists
if(is.list(X)) {
return(any(rapply(X, is.nan)))
}
return(FALSE)
}
# Set up the taskCallback
stopOnNaNs <- function(...) {
if(nanDetector(.Last.value)) {stop("NaNs detected!\n")}
return(TRUE)
}
addTaskCallback(stopOnNaNs)
# Try it out
j <- 1:00
y <- rnorm(99)
l <- list(a=1:4, b=list(j=1:4, k=NaN))
# Error in function (...) : NaNs detected!
# Subsequent time consuming code that could be avoided if the
# error thrown above is used to stop its evaluation.
I fear there is no such shortcut. In theory on unix there is SIGFPE that you could trap on, but in practice
there is no standard way to enable FP operations to trap it (even C99 doesn't include a provision for that) - it is highly system-specifc (e.g. feenableexcept on Linux, fp_enable_all on AIX etc.) or requires the use of assembler for your target CPU
FP operations are nowadays often done in vector units like SSE so you can't be even sure that FPU is involved and
R intercepts some operations on things like NaNs, NAs and handles them separately so they won't make it to the FP code
That said, you could hack yourself an R that will catch some exceptions for your platform and CPU if you tried hard enough (disable SSE etc.). It is not something we would consider building into R, but for a special purpose it may be doable.
However, it would still not catch NaN/NA operations unless you change R internal code. In addition, you would have to check every single package you are using since they may be using FP operations in their C code and may also handle NA/NaN separately.
If you are only worried about things like division by zero or over/underflows, the above will work and is probably the closest to something like a solution.
Just checking your results may not be very reliable, because you don't know whether a result is based on some intermediate NaN calculation that changed an aggregated value which may not need to be NaN as well. If you are willing to discard such case, then you could simply walk recursively through your result objects or the workspace. That should not be extremely inefficient, because you only need to worry about REALSXP and not anything else (unless you don't like NAs either - then you'd have more work).
This is an example code that could be used to traverse R object recursively:
static int do_isFinite(SEXP x) {
/* recurse into generic vectors (lists) */
if (TYPEOF(x) == VECSXP) {
int n = LENGTH(x);
for (int i = 0; i < n; i++)
if (!do_isFinite(VECTOR_ELT(x, i))) return 0;
}
/* recurse into pairlists */
if (TYPEOF(x) == LISTSXP) {
while (x != R_NilValue) {
if (!do_isFinite(CAR(x))) return 0;
x = CDR(x);
}
return 1;
}
/* I wouldn't bother with attributes except for S4
where attributes are slots */
if (IS_S4_OBJECT(x) && !do_isFinite(ATTRIB(x))) return 0;
/* check reals */
if (TYPEOF(x) == REALSXP) {
int n = LENGTH(x);
double *d = REAL(x);
for (int i = 0; i < n; i++) if (!R_finite(d[i])) return 0;
}
return 1;
}
SEXP isFinite(SEXP x) { return ScalarLogical(do_isFinite(x)); }
# in R: .Call("isFinite", x)
Hey there,
I have a mathematical function (multidimensional which means that there's an index which I pass to the C++-function on which single mathematical function I want to return. E.g. let's say I have a mathematical function like that:
f = Vector(x^2*y^2 / y^2 / x^2*z^2)
I would implement it like that:
double myFunc(int function_index)
{
switch(function_index)
{
case 1:
return PNT[0]*PNT[0]*PNT[1]*PNT[1];
case 2:
return PNT[1]*PNT[1];
case 3:
return PNT[2]*PNT[2]*PNT[1]*PNT[1];
}
}
whereas PNT is defined globally like that: double PNT[ NUM_COORDINATES ]. Now I want to implement the derivatives of each function for each coordinate thus generating the derivative matrix (columns = coordinates; rows = single functions). I wrote my kernel already which works so far and which call's myFunc().
The Problem is: For calculating the derivative of the mathematical sub-function i concerning coordinate j, I would use in sequential mode (on CPUs e.g.) the following code (whereas this is simplified because usually you would decrease h until you reach a certain precision of your derivative):
f0 = myFunc(i);
PNT[ j ] += h;
derivative = (myFunc(j)-f0)/h;
PNT[ j ] -= h;
now as I want to do this on the GPU in parallel, the problem is coming up: What to do with PNT? As I have to increase certain coordinates by h, calculate the value and than decrease it again, there's a problem coming up: How to do it without 'disturbing' the other threads? I can't modify PNT because other threads need the 'original' point to modify their own coordinate.
The second idea I had was to save one modified point for each thread but I discarded this idea quite fast because when using some thousand threads in parallel, this is a quite bad and probably slow (perhaps not realizable at all because of memory limits) idea.
'FINAL' SOLUTION
So how I do it currently is the following, which adds the value 'add' on runtime (without storing it somewhere) via preprocessor macro to the coordinate identified by coord_index.
#define X(n) ((coordinate_index == n) ? (PNT[n]+add) : PNT[n])
__device__ double myFunc(int function_index, int coordinate_index, double add)
{
//*// Example: f[i] = x[i]^3
return (X(function_index)*X(function_index)*X(function_index));
// */
}
That works quite nicely and fast. When using a derivative matrix with 10000 functions and 10000 coordinates, it just takes like 0.5seks. PNT is defined either globally or as constant memory like __constant__ double PNT[ NUM_COORDINATES ];, depending on the preprocessor variable USE_CONST.
The line return (X(function_index)*X(function_index)*X(function_index)); is just an example where every sub-function looks the same scheme, mathematically spoken:
f = Vector(x0^3 / x1^3 / ... / xN^3)
NOW THE BIG PROBLEM ARISES:
myFunc is a mathematical function which the user should be able to implement as he likes to. E.g. he could also implement the following mathematical function:
f = Vector(x0^2*x1^2*...*xN^2 / x0^2*x1^2*...*xN^2 / ... / x0^2*x1^2*...*xN^2)
thus every function looking the same. You as a programmer should only code once and not depending on the implemented mathematical function. So when the above function is being implemented in C++, it looks like the following:
__device__ double myFunc(int function_index, int coordinate_index, double add)
{
double ret = 1.0;
for(int i = 0; i < NUM_COORDINATES; i++)
ret *= X(i)*X(i);
return ret;
}
And now the memory accesses are very 'weird' and bad for performance issues because each thread needs access to each element of PNT twice. Surely, in such a case where each function looks the same, I could rewrite the complete algorithm which surrounds the calls to myFunc, but as I stated already: I don't want to code depending on the user-implemented function myFunc...
Could anybody come up with an idea how to solve this problem??
Thanks!
Rewinding back to the beginning and starting with a clean sheet, it seems you want to be able to do two things
compute an arbitrary scalar valued
function over an input array
approximate the partial derivative of an arbitrary scalar
valued function over the input array
using first order accurate finite differencing
While the function is scalar valued and arbitrary, it seems that there are, in fact, two clear forms which this function can take:
A scalar valued function with scalar arguments
A scalar valued function with vector arguments
You appeared to have started with the first type of function and have put together code to deal with computing both the function and the approximate derivative, and are now wrestling with the problem of how to deal with the second case using the same code.
If this is a reasonable summary of the problem, then please indicate so in a comment and I will continue to expand it with some code samples and concepts. If it isn't, I will delete it in a few days.
In comments, I have been trying to suggest that conflating the first type of function with the second is not a good approach. The requirements for correctness in parallel execution, and the best way of extracting parallelism and performance on the GPU are very different. You would be better served by treating both types of functions separately in two different code frameworks with different usage models. When a given mathematical expression needs to be implemented, the "user" should make a basic classification as to whether that expression is like the model of the first type of function, or the second. The act of classification is what drives algorithmic selection in your code. This type of "classification by algorithm" is almost universal in well designed libraries - you can find it in C++ template libraries like Boost and the STL, and you can find it in legacy Fortran codes like the BLAS.
I wanted to know if there is any limit on the number of arguments that are set to kernel function in OpenCL. I am getting the error as INVALID_ARG_INDEX while setting arguments. I am setting 9 arguments in the kernel function. Please help me in this regard.
You might try calling the following function: www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clGetDeviceInfo.html
The only argument limits seem to concern:
-CL_DEVICE_MAX_CONSTANT_ARGS (arguments which are pointers to the __constant memory space)
-CL_DEVICE_MAX_READ_IMAGE_ARGS
-CL_DEVICE_MAX_WRITE_IMAGE_ARGS
Max read image arguments count should not be a problem, however max constant arguments count and max write image arguments count should be at least 8 on all devices. I might be that you are passing 9 write images on a device that only accepts 8, for example...