Smallest/Largest float or double precision constant in OpenCL

I need the smallest/largest floating-point number in an OpenCL kernel program that computes log probabilities. I had a look at the OpenCL reference manual but could not locate such constants.
In Java the equivalent is something like Double.MAX_VALUE;
in C it is in float.h.

They are called FLT_MAX / FLT_MIN, and DBL_MAX / DBL_MIN.
See Macros and Limits in the OpenCL 1.2 spec.
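Note that FLT_MIN is the smallest positive normalized float, not the most negative value; for a "smallest" log probability you want -FLT_MAX. In kernel code these macros are predefined, so no header is needed. A minimal sketch (mine, not from the spec) for the log-probability case:

__kernel void init_logprob (__global float *logp)
{
    size_t i = get_global_id (0);
    logp[i] = -FLT_MAX;   /* acts as "log of zero" before accumulation */
}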

Well, one obvious workaround is to just use 1E+37, but it would be good to know whether OpenCL already defines such constants.

Related

Is there a maximum variable size in GLPK?

I'm looking into using GLPK to solve a ILP. Some of my constraints have the following form
I * W <= A
Where I is the variable and W and A are constants. W, though, could be very, very large; an example value might be 2251799813685248, and it may be even larger. Therefore, if GLPK uses standard primitives under the hood, there could be an issue.
So my question is, is GLPK subject to machine precision (i.e. 32 bit) or does GLPK use variable precision (i.e. mathematical ints without memory bound limits)? If not, are there any other open source packages that support variable precision?
If you configure --with-gmp when you build GLPK, it will use the GNU Multiple-Precision library GMP; download and build that first. From configure -h:
--with-gmp              use GNU MP bignum library [default=no]
I don't know of a problem where GMP makes a difference; please let us know if you find one.
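For scale: even without GMP, GLPK stores constraint coefficients as C doubles, and a double represents integers exactly up to 2^53, so a W of 2251799813685248 (which is 2^51) is still exact. A hypothetical sketch (mine, not from the answer) of loading the single constraint I * W <= A through the C API:

#include <stdio.h>
#include <glpk.h>

int main (void)
{
    glp_prob *lp = glp_create_prob ();
    glp_set_obj_dir (lp, GLP_MAX);          /* maximize I */

    /* one row: I * W <= A */
    glp_add_rows (lp, 1);
    glp_set_row_bnds (lp, 1, GLP_UP, 0.0, 9007199254740992.0);  /* A = 2^53 */

    /* one integer column, I >= 0 */
    glp_add_cols (lp, 1);
    glp_set_col_bnds (lp, 1, GLP_LO, 0.0, 0.0);
    glp_set_col_kind (lp, 1, GLP_IV);
    glp_set_obj_coef (lp, 1, 1.0);

    /* GLPK's sparse arrays are 1-based; element 0 is unused */
    int ia[2] = {0, 1}, ja[2] = {0, 1};
    double ar[2] = {0.0, 2251799813685248.0};   /* W = 2^51, exact in a double */
    glp_load_matrix (lp, 1, ia, ja, ar);

    glp_simplex (lp, NULL);                 /* solve the LP relaxation first */
    glp_intopt (lp, NULL);                  /* then run the MIP solver       */
    printf ("I = %g\n", glp_mip_col_val (lp, 1));
    glp_delete_prob (lp);
    return 0;
}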

Optimal NEON vector structure for processing vectors of uint8_t type with Arm Cortex-A8 (32-bit)

I am doing some image processing on an embedded system (BeagleBone Black) using OpenCV and need to write some code to take advantage of NEON optimization. Specifically, I would like to write a NEON optimized thresholding function and then a NEON optimized erosion/dilation function.
This is my first time writing NEON code and I don't have experience writing assembly code, so I have been looking at examples and resources for the C-style NEON intrinsics. I believe that I can put some working code together, but I am not sure how I should structure the vectors. According to page 2 of the "ARM NEON support in the ARM compiler" white paper:
"These registers can hold "vectors" of items which are 8, 16, 32 or 64
bits. The traditional advice when optimizing or porting algorithms
written in C/C++ is to use the natural type of the machine for data
handling (in the case of ARM 32 bits). The unwanted bits can then be
discarded by casting and/or shifting before storing to memory."
What exactly does this mean? Do I need to restrict my NEON code to using uint32x4_t vectors rather than uint8x16_t? How would I go about loading the registers? Or does this mean that I need to take some special steps when using vst1q_u8 to store the data to memory?
I did find this example, which is untested but uses the uint8x16_t type. Does it adhere to the "32-bit" advice given above?
I would really appreciate it if someone could please elaborate on the above quotation and maybe provide a very simple working example.
The next sentence from the document you linked gives your answer.
The ability of NEON to specify the data width in the instruction and
hence use the whole register width for useful information means
keeping the natural type for the algorithm is both possible and
preferable.
Note, the document is distinguishing between the natural type of the machine (32-bit) and the natural type of the algorithm (in your case uint8_t).
The document is saying that in the past you would have written your code in such a way that it used 32-bit integers so that it could use the efficient machine instructions suited for 32-bit operations.
With Neon, this is not necessary. It is more useful to use the data type you actually want to use, as Neon can efficiently operate on those data types.
The optimal choice of register width (uint8x8_t or uint8x16_t) will depend on your algorithm.
To give a simple example of using the Neon intrinsics to add two sets of uint8_t:
#include <arm_neon.h>

/* Add 16 bytes from a and b element-wise and store the result in c. */
void
foo (uint8_t *a, uint8_t *b, uint8_t *c)
{
  uint8x16_t t1 = vld1q_u8 (a);       /* load 16 uint8_t from a */
  uint8x16_t t2 = vld1q_u8 (b);       /* load 16 uint8_t from b */
  uint8x16_t t3 = vaddq_u8 (t1, t2);  /* 16 parallel byte adds  */
  vst1q_u8 (c, t3);                   /* store 16 results to c  */
}
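Since the question mentions thresholding, here is a hypothetical sketch in the same style (the function name and the 255/0 output convention are my own) that binarizes 16 pixels per call using the same uint8x16_t type:

#include <arm_neon.h>

/* Pixels greater than thresh become 255, all others become 0. */
void
threshold16 (const uint8_t *src, uint8_t *dst, uint8_t thresh)
{
  uint8x16_t pix = vld1q_u8 (src);
  uint8x16_t t   = vdupq_n_u8 (thresh);  /* replicate thresh into 16 lanes */
  /* vcgtq_u8 sets each lane to 0xFF where pix > t, 0x00 elsewhere */
  vst1q_u8 (dst, vcgtq_u8 (pix, t));
}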

Floating point math on a processor that does not support it?

How is floating-point math performed on a processor with no floating-point unit, e.g. a low-end 8-bit microcontroller?
Have a look at this article: http://www.edwardrosten.com/code/fp_template.html
(from this article)
First you have to think about how to represent a floating point number in memory:
struct this_is_a_floating_point_number
{
    static const unsigned int mant = ???;  /* mantissa        */
    static const int expo = ???;           /* exponent        */
    static const bool posi = ???;          /* sign: positive? */
};
Then you'd have to consider how to do basic calculations with this representation. Some might be easy to implement and rather fast at runtime (multiplying or dividing by 2 comes to mind).
Division is harder; Newton's method, for instance, could be used to compute the answer.
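For instance, a reciprocal via Newton's method needs only integer multiplies. A sketch (mine, not from the article) in Q16.16 fixed point, assuming the divisor has already been normalized into [0.5, 1) so the initial guess of 1.0 converges:

#include <stdint.h>

typedef int32_t q16_t;                 /* Q16.16 fixed point */
#define Q_ONE (1 << 16)

static q16_t q_mul (q16_t a, q16_t b)  /* fixed-point multiply */
{
    return (q16_t) (((int64_t) a * b) >> 16);
}

/* x_{n+1} = x_n * (2 - d * x_n) converges quadratically to 1/d */
q16_t q_recip (q16_t d)                /* assumes d in [0.5, 1) */
{
    q16_t x = Q_ONE;                   /* initial guess: 1.0 */
    for (int i = 0; i < 5; i++)        /* error halves its exponent each pass */
        x = q_mul (x, 2 * Q_ONE - q_mul (d, x));
    return x;
}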
Finally, smart approximations and precomputed tables can speed up the calculations at run time.
Many years ago, C++ templates helped me get floating-point calculations running on an Intel 386 SX. In the end I learned a lot of math and C++, but decided at the same time to buy a co-processor.
The polynomial algorithms and the smart lookup tables helped most in thinking about using integers for floating-point arithmetic (who needs a cosine or tangent function when you have a sine function?). Taylor series were a revelation too.
In systems without any floating-point hardware, the CPU emulates it using a series of simpler fixed-point arithmetic operations that run on the integer arithmetic logic unit.
Take a look at the Wikipedia section Floating-point_unit#Floating-point_library, where you might find more info.
It is not actually the CPU that emulates the instructions. The floating-point operations for low-end CPUs are built out of integer arithmetic instructions, and the compiler is what generates those instructions. Basically, the compiler (toolchain) ships with a floating-point library containing the floating-point routines.
The short answer is "slowly". Specialized hardware can do tasks like extracting groups of bits that are not necessarily byte-aligned very fast. Software can do everything that can be done by specialized hardware, but tends to take much longer to do it.
Read "The complete Spectrum ROM disassembly" at http://www.worldofspectrum.org/documentation.html to see examples of floating point computations on an 8 bit Z80 processor.
For things like sine functions, you precompute a few values then interpolate using Chebyshev polynomials.
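A lighter cousin of that approach is linear interpolation from a small precomputed table instead of a Chebyshev fit; a sketch with my own table granularity and Q15 scaling (table values rounded from sin(k*pi/32) * 32767):

#include <stdint.h>

/* Quarter-wave sine table: 17 precomputed points covering 0..pi/2. */
static const int16_t sin_tab[17] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787,
    23170, 25330, 27245, 28898, 30273, 31356, 32137, 32609, 32767
};

/* angle: 0..16383 maps to 0..pi/2; returns sine in Q15 (0..32767). */
int16_t
sin_q15 (uint16_t angle)
{
    uint16_t idx  = angle >> 10;    /* which of the 16 segments    */
    uint16_t frac = angle & 1023;   /* position inside the segment */
    int32_t a = sin_tab[idx], b = sin_tab[idx + 1];
    return (int16_t) (a + (((b - a) * frac) >> 10));
}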

math.h functions in fixed-point (32,32) format (64-bit) library for C

I'm looking for a 64-bit fixed-point (32,32) library for one of my C implementations.
It should be similar to this one: http://code.google.com/p/libfixmath/
I need support for the standard math.h operations.
Has anyone seen such an implementation?
fixedptc seems to be what you're looking for. It is a header-only, integer-only C library for fixed-point operations, located at http://www.sourceforge.net/projects/fixedptc
The bit width is settable through defines. In your case, you want to compile with -DFIXEDPT_BITS=64 -DFIXEDPT_WBITS=32 to get a (32,32) fixed-point number format.
Implemented functions are conversion to string, multiplication, division, square root, sine, cosine, tangent, exponential, power, natural logarithm and arbitrary-base logarithm.
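An untested sketch of how that might look; the function names (fixedpt_rconst, fixedpt_mul, fixedpt_sqrt, fixedpt_cstr) are from my reading of the fixedptc header, so verify them against the copy you download:

/* build: cc -DFIXEDPT_BITS=64 -DFIXEDPT_WBITS=32 demo.c */
#include <stdio.h>
#include "fixedptc.h"

int main (void)
{
    fixedpt a = fixedpt_rconst (2.5);   /* constant, converted at compile time */
    fixedpt b = fixedpt_mul (a, a);     /* 6.25 in (32,32) format */
    printf ("b      = %s\n", fixedpt_cstr (b, 4));
    printf ("sqrt b = %s\n", fixedpt_cstr (fixedpt_sqrt (b), 4));
    return 0;
}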

OpenCL Fast Relaxed Math

What does the OpenCL compiler option -cl-fast-relaxed-math do?
From reading the documentation, it looks like -cl-fast-relaxed-math allows a kernel to do floating-point math on any variables, even if those variables hold the wrong data type, cause division by zero, or trigger some other illegal behavior.
Is this correct? In what situations would this compiler option be useful?
From the comments:
Enables -cl-finite-math-only and -cl-unsafe-math-optimizations. These two options provide additional speed by removing some checks on the input values, i.e. not checking for NaN. However, if the input values happen to be non-normal numbers, the results are unknown. – DarkZeros
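The flag is just a build option, so the same kernel source can be compiled strict or relaxed. A minimal host-side sketch (the program and device variables are placeholders):

cl_int err = clBuildProgram (program, 1, &device,
                             "-cl-fast-relaxed-math", NULL, NULL);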
