Hacks for clamping an integer to 0-255 and a double to 0.0-1.0?

Are there any branch-less or similar hacks for clamping an integer to the interval of 0 to 255, or a double to the interval of 0.0 to 1.0? (Both ranges are meant to be closed, i.e. endpoints are inclusive.)
I'm using the obvious minimum-maximum check:
value = (value < 0 ? 0 : value > 255 ? 255 : value);
but is there a way to get this faster -- similar to the "modulo" clamp value & 255? And is there a way to do similar things with floating points?
I'm looking for a portable solution, so preferably no CPU/GPU-specific stuff please.

This is a trick I use for clamping an int to a 0 to 255 range:
/**
 * Clamps the input to a 0 to 255 range.
 * @param v any int value
 * @return {@code v < 0 ? 0 : v > 255 ? 255 : v}
 */
public static int clampTo8Bit(int v) {
    // if out of range
    if ((v & ~0xFF) != 0) {
        // invert sign bit, shift to fill, then mask (generates 0 or 255)
        v = ((~v) >> 31) & 0xFF;
    }
    return v;
}
That still has one branch, but a handy thing about it is that you can test whether any of several ints are out of range in one go by ORing them together, which makes things faster in the common case that all of them are in range. For example:
/** Packs four 8-bit values into a 32-bit value, with clamping. */
public static int ARGBclamped(int a, int r, int g, int b) {
    if (((a | r | g | b) & ~0xFF) != 0) {
        a = clampTo8Bit(a);
        r = clampTo8Bit(r);
        g = clampTo8Bit(g);
        b = clampTo8Bit(b);
    }
    return (a << 24) + (r << 16) + (g << 8) + (b << 0);
}

Note that your compiler may already give you what you want if you code value = min (value, 255). This may be translated into a MIN instruction if it exists, or into a comparison followed by conditional move, such as the CMOVcc instruction on x86.
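As a sketch of that straightforward form (plain C; the MIN/MAX helpers and the function name here are purely illustrative, not taken from any particular header):
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* With optimization enabled, a compiler can often turn this into
   MIN/MAX instructions or a pair of conditional moves. */
static inline int clamp_to_255(int v)
{
    return MAX(0, MIN(v, 255));
}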
The following code assumes two's complement representation of integers, which is usually a given today. The conversion from Boolean to integer should not involve branching under the hood, as modern architectures either provide instructions that can directly be used to form the mask (e.g. SETcc on x86 and ISETcc on NVIDIA GPUs), or can apply predication or conditional moves. If all of those are lacking, the compiler may emit a branchless instruction sequence based on arithmetic right shift to construct a mask, along the lines of Boann's answer. However, there is some residual risk that the compiler could do the wrong thing, so when in doubt, it would be best to disassemble the generated binary to check.
int value, mask;
mask = 0 - (value > 255); // mask = all 1s if value > 255, all 0s otherwise
value = (255 & mask) | (value & ~mask);
On many architectures, use of the ternary operator ?: can also result in a branchless instruction sequence. The hardware may support select-type instructions which are essentially the hardware equivalent of the ternary operator, such as ICMP on NVIDIA GPUs. Or it may provide CMOV (conditional move), as on x86, or predication, as on ARM, both of which can be used to implement branchless code for ternary operators. As in the previous case, one would want to examine the disassembled binary code to be absolutely sure the resulting code is without branches.
int value;
value = (value > 255) ? 255 : value;
In the case of floating-point operands, modern floating-point units typically provide FMIN and FMAX instructions that map directly to the C/C++ standard math functions fmin() and fmax(). Alternatively, fmin() and fmax() may be translated into a comparison followed by a conditional move. Again, it would be prudent to examine the generated code to make sure it is branchless.
double value;
value = fmax (fmin (value, 1.0), 0.0);

I use this thing, 100% branchless.
int clampU8(int val)
{
    val &= (val < 0) - 1;  // clamp < 0
    val |= -(val > 255);   // clamp > 255
    return val & 0xFF;     // mask out
}

For those using C#, Kotlin or Java, this is the best I could do; it's nice and succinct, if somewhat cryptic:
(x & ~(x >> 31) | 255 - x >> 31) & 255
It only works on signed 32-bit integers, so that might be a blocker for some.
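Spelled out as a function (a sketch in C with explicit parentheses added for readability, assuming a 32-bit two's-complement int and an arithmetic right shift of negative values, as mainstream compilers provide; the expression itself works as written in Java, Kotlin and C#):
static inline int clamp_u8(int x)
{
    /* x & ~(x >> 31)  : 0 if x is negative, otherwise x       */
    /* (255 - x) >> 31 : all ones if x > 255, otherwise 0      */
    /* OR them together and mask with 255 to get 0, x, or 255  */
    return ((x & ~(x >> 31)) | ((255 - x) >> 31)) & 255;
}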

For clamping doubles, I'm afraid there's no language/platform-agnostic solution.
The problem with floating point is that compilers offer a whole range of modes, from fastest (MSVC /fp:fast, gcc -funsafe-math-optimizations) to fully precise and safe (MSVC /fp:strict, gcc -frounding-math -fsignaling-nans). In fully precise mode the compiler does not try to use any bit hacks, even where it could.
A solution that manipulates the bits of a double cannot be portable. Endianness may differ, there may be no (efficient) way to get at the bits, and double is not necessarily IEEE 754 binary64 in the first place. In addition, direct bit manipulation will not raise signals for signaling NaNs when they are expected.
For integers the compiler will most likely do the right thing anyway; otherwise there are already good answers given.

Related

Simple low pass filter in fixed point

I have a simple circuit set up to read the light level via an LDR into an Arduino. I'm trying to implement a simple low-pass filter on the data read in. How best to tackle this, given that analogRead() returns an unsigned int?
I have tried to implement a simple fixed point representation but am unsure if this is the correct approach.
Here's a code snippet:
#define WLPF 0.1
#define FIXED_SHIFT 4
ldr_val = ((int)analogRead(A0)) << FIXED_SHIFT;
while (true) {
    int newval = (int)analogRead(A0) << FIXED_SHIFT;
    ldr_val += WLPF * (newval - ldr_val);
    Serial.println(ldr_val >> FIXED_SHIFT, DEC);
}
Note the resolution of the ADC is 10 bits and I am working with an 8-bit Arduino Micro.
I'm paraphrasing from the book "Musical Applications of Microprocessors" by Hal Chamberlin, page 438:
If you allow large numbers in the accumulator, then you can make a first-order low-pass filter with one multiplication and some right-shifts.
out = accum >> k
accum = accum - out + in
Choose 'k' to change the cutoff frequency. The more shifts, the lower the low-pass cutoff, but the larger the value in the accumulator. With a 10-bit value from analogRead(), you can easily right-shift 4 places, and still have 2 bits of headroom in the accumulator (as @datafiddler noted above).
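As a minimal sketch (plain C rather than anything Arduino-specific; the function and constant names are just illustrative), the accumulator filter described above could look like this:
#define K_SHIFT 4                       /* number of right-shifts, i.e. 'k' */

static long accum = 0;                  /* wide accumulator with headroom */

int lowpass_step(int in)
{
    int out = (int)(accum >> K_SHIFT);  /* out = accum >> k */
    accum = accum - out + in;           /* accum = accum - out + in */
    return out;
}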
Cypress has some app-notes for their PSOC chips with similar equations, and using shifts. I remember one had a nice table that related number of shifts to the cutoff frequency.
The approximate cutoff frequency is the sampling frequency divided by 2-pi times the gain factor:
f0 ~ fs / (2 pi a)
where 'a' is that power of two.
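For example, assuming a 100 Hz sample rate and k = 4 shifts (so a = 16), that works out to roughly f0 ~ 100 / (2 pi * 16), or about 1 Hz.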
Keep smoothin' those signals!
On a device with no FPU, rather than multiplying by 0.1 (which in any case makes this a floating-point, not a fixed-point, implementation), you should divide by 10:
#define WLPF_DIV 10
...
ldr_val += (newval - ldr_val) / WLPF_DIV;
However, division on an 8-bit processor is often expensive (although probably dwarfed by the execution time of Serial.println() in the loop, but that is a different issue). Instead it is more efficient to select a power-of-two divisor so that the division can be performed with a right shift.
#define WLPF_SHIFT 3 // divide by 8
...
ldr_val += (newval - ldr_val) >> WLPF_SHIFT ;
The use of signed int is problematic here, since right-shifting a negative signed value is implementation-defined behaviour, and (newval - ldr_val) can be negative. In this case that can be resolved by changing the code to:
#define WLPF_DIV 8
...
ldr_val += (newval - ldr_val) / WLPF_DIV ;
The compiler will most likely spot the power-of-two constant and generate the division using an arithmetic shift right (plus a small adjustment for negative values) in any case. However, you would probably do better to reconsider the data type.
You still have a right-shift in the Serial.println() call, but that too could be replaced with a divide-by-16:
#define WLPF_DIV 8
#define FIXED_MUL 16
ldr_val = (int)analogRead(A0) * FIXED_MUL ;
for (;;)
{
    int newval = (int)analogRead(A0) * FIXED_MUL;
    ldr_val += (newval - ldr_val) / WLPF_DIV;
    Serial.println(ldr_val / FIXED_MUL, DEC);
}
Outputting the data non-deterministically on a per-sample basis is not going to make for a very accurate filter, and it will dominate the timing in any case, so you have little control over the frequency response and it will not be stable. It also makes the previous performance optimisations rather pointless. You may want to think about that if it is important in your application, but that is a different question.
Stick with integer arithmetic:
#define WLPF 9
filtered = ((long)filtered * WLPF + newValue) / (WLPF + 1);
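For context, here is a minimal sketch of how that update could sit in the original loop (reusing the question's analogRead()/Serial.println() calls; WLPF = 9 weights each new sample by 1/(WLPF + 1) = 1/10, which matches the original WLPF of 0.1):
#define WLPF 9

int filtered = analogRead(A0);   // seed with the first reading
while (true) {
    int newValue = analogRead(A0);
    // exponential moving average, entirely in integer arithmetic
    filtered = (int)(((long)filtered * WLPF + newValue) / (WLPF + 1));
    Serial.println(filtered, DEC);
}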

Microcontroller, How to display decimal on LCD?

I have a microcontroller and I am sampling the values of an LM335 temperature sensor.
The LCD library that I have allows me to display the hexadecimal value sampled by the 10-bit ADC.
The 10-bit ADC gives me values from 0x0000 to 0x03FF.
What I am having trouble with is converting that hexadecimal value into a format that can be understood by regular humans.
Any leads would be greatly appreciated, since I am completely lost on the issue.
You could create a "string" into which you construct the decimal number like this (the constants depend on the actual range of the value, I presume 0-255, and on whether you want it null-terminated, etc.):
char result[4];
char i = 3;
do {
    result[i] = '0' + value % 10;  // extract the lowest decimal digit
    value /= 10;
    i--;
} while (value > 0);
Basically, your problem is how to split a number into decimal digits so you can use your LCD library and send one digit to each cell.
If your LCD is based on 7-segment cells, then you need to output a value from 0 to 9 for each digit, not an ASCII code. The solution by @Roman Hocke is fine for this, provided that you don't add '0' to value % 10
Another way to split a number into digits is to convert it into BCD. For that, there is an algorithm named "double dabble" which lets you convert the number to BCD without using division or modulo operations, which can be nice if your microcontroller has no hardware divide, or if division is slower than you need.
"Double dabble" sounds perfect for microcontrollers without a divide instruction. However, a quick look at the algorithm on Wikipedia shows an implementation that uses dynamic memory, which seems worse than a routine for division. Of course, there must be implementations out there that do not call malloc() and friends.
Just to point out that Roman Hocke's snippet has a little mistake. This version works for values in the range 0-255 and can easily be expanded to any range:
void dec2str(uint8_t val, char *res)
{
    uint8_t i = 2;
    res[0] = res[1] = res[2] = '0';  // pad with leading zeros
    do {
        res[i] = '0' + val % 10;
        val /= 10;
        i--;
    } while (val > 0);
    res[3] = 0;                      // null terminator
}

Multiply number by 10 n times

Is there a better mathematical way to multiply a number by 10 n times in Dart than the following (below). I don't want to use the math library, because it would be overkill. It's no big deal; however if there's a better (more elegant) way than the "for loop", preferably one line, I'd like to know.
int iDecimals = 3;
int iValue = 1;
print ("${iValue} to power of ${iDecimals} = ");
for (int iLp1 = 1; iLp1 <= iDecimals; iLp1++) {
iValue *= 10;
}
print ("${iValue}");
You are not raising to a power of ten, you are multiplying by a power of ten. That is, in your code the result is iValue * 10^iDecimals, whereas raising to a power would mean iValue^10.
Now, your code still contains exponentiation: what it does is raise ten to the power iDecimals and then multiply by iValue. The raising step can be made much more efficient. (Disclaimer: I've never written a line of Dart code before and I don't have an interpreter to test with, so this might not work right away.)
int iValue = 1;
int p = 3;
int a = 10;
// The following code raises `a` to the power of `p`
int tmp = 1;
while (p > 1) {
  if (p % 2 == 0) {
    p ~/= 2;
  } else {
    tmp *= a;
    p = (p - 1) ~/ 2;
  }
  a *= a;
}
a *= tmp;
// in our example `a` is now 10^3
iValue *= a;
print("${iValue}");
This exponentiation algorithm is very straightforward and it is known as Exponentiation by squaring.
Use the math library. Your idea of doing so being "overkill" is misguided. The following is easier to write, easier to read, fewer lines of code, and most likely faster than anything you might replace it with:
import 'dart:math';

void main() {
  int iDecimals = 3;
  int iValue = 1;
  print("${iValue} times ten to the power of ${iDecimals} = ");
  iValue *= pow(10, iDecimals);
  print(iValue);
}
Perhaps you're deploying to JavaScript, concerned about deployment size, and unaware that dart2js does tree shaking?
Finally, if you do want to raise a number to the power of ten, as you asked for but didn't do, simply use pow(iValue, 10).
Considering that you don't want to use any math library, I think this is the best way to compute the power of a number. The time complexity of this code snippet also seems minimal. If you need a one-line solution you will have to use some math library function.
By the way, you are not raising to a power but simply multiplying a number by 10 n times.
Are you trying to multiply something by a power of 10? If so, I believe Dart supports scientific notation. So the above value would be written as: iValue = 1e3;
Which is equal to 1000. If you want to raise the number itself to the power of ten, I think your only other option is to use the Math library.
Because the criteria were that the answer must not require the math library, must be fast, and should ideally be a mathematical solution (not a String-based one), and because the exponential-notation solution involves too much overhead (String, double and integer conversions), I think the only answer that meets the criteria is the following:
for (int iLp1 = 0; iLp1 < iDecimals; iLp1++, iScale *= 10);
It is quite fast, doesn't require the "math" library, and is a one-liner.

Unexpected integer math results on Arduino

I'm trying to smoothly transition an RGB LED from one colour to another. As part of the logic for this I have the following function to determine how big the change will be (it multiplies by a factor f to avoid floating-point math):
int colorDelta(int from, int to, int f) {
    int delta;
    if (to == from) {
        delta = 0;
    } else {
        delta = (to - from) * f;
    }
    return delta;
}
When I call colorDelta(255, 0, 1000) I expect the result to be -255000, but instead the function returns 7144.
I've tried performing the operation as directly as possible for debugging, but Serial.print((0 - 255) * 1000, DEC); also writes 7144 to the serial port.
What have I foolishly overlooked here? I'd really like to see the (smoothly transitioning) light. ;)
I would suspect an integer overflow: the int type (16 bits wide on AVR-based Arduinos) is incapable of holding -255000. By the language standard, signed integer overflow is undefined behavior, but in practice the high-order bits of the result are usually just thrown away (warning: this observation is not meant to be used when writing code, because undefined behavior remains undefined; it is only for reasoning about a program that is already known to be wrong).
A good way to check this quickly is to compute the difference between the expected result and the actual one: -255000 - 7144 = -262144. That is -(1 << 18), an exact multiple of 2^16, which indicates that my suspicion is well-founded: the result has simply been truncated to 16 bits.
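The answer stops at the diagnosis; as a minimal sketch of one common fix (assuming a 32-bit long, as on AVR), force the multiplication to be carried out in a wider type:
long colorDelta(int from, int to, int f) {
    if (to == from)
        return 0;
    // cast one operand to long so (to - from) * f is computed in 32 bits
    return ((long)to - from) * f;
}
The caller then has to store the result in a long as well, of course.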

number squared in programming

I know this is probably a very simple question but how would I do something like
n² in a programming language?
Is it n * n? Or is there another way?
n * n is the easiest way.
For languages that support the exponentiation operator (** in this example), you can also do n ** 2
Otherwise you could use a Math library to call a function such as pow(n, 2) but that is probably overkill for simply squaring a number.
n * n will almost always work -- the couple of cases where it won't are in prefix languages (Lisp, Scheme, and co.) or postfix languages (Forth, Factor, dc); but obviously then you can just write (* n n) or n n * respectively.
It will also fail when there is an overflow case:
#include <limits.h>
#include <stdio.h>

int main()
{
    volatile int x = INT_MAX;
    printf("INT_MAX squared: %d\n", x * x);
    return 0;
}
I threw the volatile qualifier on there just to point out that this can be compiled with -Wall and not raise any warnings, but on my 32-bit computer this says that INT_MAX squared is 1.
Depending on the language, you might have a power function such as pow(n, 2) in C, or math.pow(n, 2) in Python... Since those power functions work in floating point, they are more useful in cases where integer overflow is a concern.
There are many programming languages, each with their own way of expressing math operations.
Some common ones will be:
x*x
pow(x,2)
x^2
x ** 2
square(x)
(* x x)
If you specify a specific language, we can give you more guidance.
If n is an integer :p :
int res = 0;
for (int i = 0; i < n; i++)
    res += n;  // res = n + n + ... + n = n * n
For positive integers you may use recursion:
int square(int n){
    if (n > 1)
        return square(n-1) + (n-1) + n;
    else
        return 1;
}
Calculate using array allocation (extremely sub-optimal):
#include <iostream>
using namespace std;

int heapSquare(int n){
    return sizeof(char[n][n]);
}

int main(){
    for (int i = 1; i <= 10; i++)
        cout << heapSquare(i) << endl;
    return 0;
}
Using bit shift (ancient Egyptian multiplication):
int sqr(int x){
    int i = 0;
    int result = 0;
    for (; i < 32; i++)
        if (x >> i & 0x1)
            result += x << i;
    return result;
}
Assembly:
int x = 10;
__asm__ __volatile__("imul %%eax, %%eax"
    : "=a" (x)
    : "a" (x)
);
printf("x*x=%d\n", x);
Always use the language's multiplication, unless the language has an explicit square function. Specifically avoid using the pow function provided by most math libraries. Multiplication will (except in the most outrageous of circumstances) always be faster, and -- if your platform conforms to the IEEE-754 specification, which most platforms do -- will deliver a correctly-rounded result. In many languages, there is no standard governing the accuracy of the pow function. It will generally give a high-quality result for such a simple case (many library implementations will special-case squaring to save programmers from themselves), but you don't want to depend on this[1].
I see a tremendous amount of C/C++ code where developers have written:
double result = pow(someComplicatedExpression, 2);
presumably to avoid typing that complicated expression twice or because they think it will somehow slow down their code to use a temporary variable. It won't. Compilers are very, very good at optimizing this sort of thing. Instead, write:
const double myTemporaryVariable = someComplicatedExpression;
double result = myTemporaryVariable * myTemporaryVariable;
To sum up: Use multiplication. It will always be at least as fast and at least as accurate as anything else you can do[2].
1) Recent compilers on mainstream platforms can optimize pow(x,2) into x*x when the language semantics allow it. However, not all compilers do this at all optimization settings, which is a recipe for hard-to-debug rounding errors. Better not to depend on it.
2) For basic types. If you really want to get into it, if multiplication needs to be implemented in software for the type that you are working with, there are ways to make a squaring operation that is faster than multiplication. You will almost never find yourself in a situation where this matters, however.
