Unexpected integer math results on Arduino - arduino

I'm trying to smoothly transition an RGB LED from one colour to another. As part of the logic for this I have the following function to determine how big the change will be (it multiplies by a factor f to avoid floating-point math):
int colorDelta(int from, int to, int f) {
int delta;
if (to == from) {
delta = 0;
} else {
delta = (to - from) * f;
}
return delta;
}
When I call colorDelta(0, 255, 1000) I expect the result to be -255000 but instead the function returns 7144.
I've tried performing the operation as directly as possible for debugging, but Serial.print((0 - 255) * 1000, DEC); also writes 7144 to the serial port.
What have I foolishly overlooked here? I'd really like to see the (smoothly transitioning) light. ;)

I would suspect an integer overflow: the int type being incapable of holding -255000. By language standard, signed integer overflow is undefined behavior, but in practice the major bits of a result are usually just thrown away (warning: this observation is not meant to be used in writing code, because undefined behavior remains undefined; it's just for those cases when you have to reason about the program that is known to be wrong).
A good way to check it quickly is computing a difference between your real result and your expected one: -255000 - 7144 = -262144. The latter is -(1<<18), which is the indication that my suspicions are well-founded.

Related

Understanding the method for OpenCL reduction on float

Following this link, I try to understand the operating of kernel code (there are 2 versions of this kernel code, one with volatile local float *source and the other with volatile global float *source, i.e local and global versions). Below I take local version :
float sum=0;
void atomic_add_local(volatile local float *source, const float operand) {
union {
unsigned int intVal;
float floatVal;
} newVal;
union {
unsigned int intVal;
float floatVal;
} prevVal;
do {
prevVal.floatVal = *source;
newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
Afterwards, if I take a work-item, the code will add operand value to newVal.floatVal variable. Then, after this operation, I call atomic_cmpxchg function which check if previous assignment (preVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand; ) has been done, i.e by comparing the value stored at address source with the preVal.intVal.
During this atomic operation (which is not uninterruptible by definition), as value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is coded on 4 bytes like integer).
Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address.
But for each work-item thread, is there only one iteration into the while loop?
I think there will be one iteration because the comparison "*source== prevVal.int ? newVal.intVal : newVal.intVal" will always assign newVal.intVal value to value stored at source address, won't it?
I have not understood all the subtleties of this trick for this kernel code.
Update
Sorry, I almost understand all the subtleties, especially in the while loop :
First case : for a given single thread, before the call of atomic_cmpxchg, if prevVal.floatVal is still equal to *source, then atomic_cmpxchg will change the value contained in source pointer and return the value contained in old pointer, which is equal to prevVal.intVal, so we break from the while loop.
Second case : If between the prevVal.floatVal = *source; instruction and the call of atomic_cmpxchg, the value *source has changed (by another thread ??) then atomic_cmpxchg returns old value which is no more equal to prevVal.floatVal, so the condition into while loop is true and we stay in this loop until previous condition isn't checked any more.
Is my interpretation correct?
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, force a load/store at each read/write of said memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).
do {
prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
newVal.floatVal = prevVal.floatVal + operand;
} while(atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal)
After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:
int atomic_cmpxchg(int *ptr, int expected, int new)
What it does is:
atomically {
int old = *ptr;
if (old == expected) {
*ptr = new;
}
return old;
}
To summarize, each thread, individually, does:
Load current value of *source from memory into preVal.floatVal
Compute desired value of *source in newVal.floatVal
Execute the atomic compare-exchange described above (using the type-punned values)
If the result of atomic_cmpxchg == newVal.intVal, it means the compare-exchange was successful, break. Otherwise, the exchange didn't happen, go to 1 and try again.
The above loop eventually terminates, because eventually, each thread succeeds in doing their atomic_cmpxchg.
Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address.
Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics) but this is not one.

how to rewrite the recursive solution to iterative one

The problem is derive from OJ.
The description is :
We are playing the Guess Game. The game is as follows:
I pick a number from 1 to n. You have to guess which number I picked.
Every time you guess wrong, I'll tell you whether the number I picked is higher or lower.
However, when you guess a particular number x, and you guess wrong, you pay $x. You win the game when you guess the number I picked.
Given a particular n ≥ 1, find out how much money you need to have to guarantee a win.
I write small snippet about MinMax problem in recursion. But it is slow and I want to rewrite it in a iterative way. Could anyone help with that and give me the idea about how you convert the recursive solution to iterative one? Any idea is appreciated. The code is showed below:
public int getMoneyAmount(int n) {
int[][] dp = new int[n + 1][n + 1];
for(int i = 0; i < dp.length; i++)
Arrays.fill(dp[i], -1);
return solve(dp, 1, n);
}
private int solve(int[][] dp, int left, int right){
if(left >= right){
return 0;
}
if(dp[left][right] != -1){
return dp[left][right];
}
dp[left][right] = Integer.MAX_VALUE;
for(int i = left; i <= right; i++){
dp[left][right] = Math.min(dp[left][right], i + Math.max(solve(dp, left, i - 1),solve(dp, i + 1, right)));
}
return dp[left][right];
}
In general, you convert using some focused concepts:
Replace the recursion with a while loop -- or a for loop, if you can pre-determine how many iterations you need (which you can do in this case).
Within the loop, check for the recursion's termination conditions; when you hit one of those, skip the rest of the loop.
Maintain local variables to replace the parameters and return value.
The loop termination is completion of the entire problem. In your case, this would be filling out the entire dp array.
The loop body consists of the computations that are currently in your recursion step: preparing the arguments for the recursive call.
Your general approach is to step through a nested (2-D) loop to fill out your array, starting from the simplest cases (left = right) and working your way to the far corner (left = 1, right = n). Note that your main diagonal is 0 (initialize that before you get into the loop), and your lower triangle is unused (don't even bother to initialize it).
For the loop body, you should be able to derive how to fill in each succeeding diagonal (one element shorter in each iteration) from the one you just did. That assignment statement is the body. In this case, you don't need the recursion termination conditions: the one that returns 0 is what you cover in initialization; the other you never hit, controlling left and right with your loop indices.
Are these enough hints to get you moving?

Calculating the average of Sensor Data (Capacitive Sensor)

So I am starting to mess around with Capacitive sensors and all because its some pretty cool stuff.
I have followed some tutorials online about how to set it up and use the CapSense library for Arduino and I just had a quick question about this code i wrote here to get the average for that data.
void loop() {
long AvrNum;
int counter = 0;
AvrNum += cs_4_2.capacitiveSensor(30);
counter++;
if (counter = 10) {
long AvrCap = AvrNum/10;
Serial.println(AvrCap);
counter = 0;
}
}
This is my loop statement and in the Serial it seems like its working but the numbers just look suspiciously low to me. I'm using a 10M resistor (brown, black, black, green, brown) and am touching a piece of foil that both the send and receive pins are attached to (electrical tape) and am getting numbers around about 650, give or take 30.
Basically I'm asking if this code looks right and if these numbers make sense...?
The language used in the Arduino environment is really just an unenforced subset of C++ with the main() function hidden inside the framework code supplied by the IDE. Your code is a module that will be compiled and linked to the framework. When the framework starts running it first initializes itself then your module by calling the function setup(). Once initialized, the framework enters an infinite loop, calling your modules function loop() on each iteration.
Your code is using local variables in loop() and expecting that they will hold their values from call to call. While this might happen in practice (and likely does since that part of framework's main() is probably just while(1) loop();), this is invoking the demons of Undefined Behavior. C++ does not make any promises about the value of an uninitialized variable, and even reading it can cause anything to happen. Even apparently working.
To fix this, the accumulator AvrNum and the counter must be stored somewhere other than on loop()'s stack. They could be declared static, or moved to the module outside. Outside is better IMHO, especially in the constrained Arduino environment.
You also need to clear the accumulator after you finish an average. This is the simplest form of an averaging filter, where you sum up fixed length blocks of N samples, and then use that average each Nth sample.
I believe this fragment (untested) will work for you:
long AvrNum;
int counter;
void setup() {
AvrNum = 0;
counter = 0;
}
void loop() {
AvrNum += cs_4_2.capacitiveSensor(30);
counter++;
if (counter == 10) {
long AvrCap = AvrNum/10;
Serial.println(AvrCap);
counter = 0;
AvrNum = 0;
}
}
I provided a setup(), although it is redundant with the C++ language's guarantee that the global variables begin life initialized to 0.
your line if (counter = 10) is invalid. It should be if (counter == 10)
The first sets counter to 10 and will (of course) evaluate to true.
The second tests for counter equal to 10 and will not evaluate to true until counter is, indeed, equal to 10.
Also, kaylum mentions the other problem, no initialization of AvrNum
This is What I ended up coming up with after spending some more time on it. After some manual calc it gets all the data.
long AvrArray [9];
for(int x = 0; x <= 10; x++){
if(x == 10){
long AvrMes = (AvrArray[0] + AvrArray[1] + AvrArray[2] + AvrArray[3] + AvrArray[4] + AvrArray[5] + AvrArray[6] + AvrArray[7] + AvrArray[8] + AvrArray[9]);
long AvrCap = AvrMes/x;
Serial.print("\t");
Serial.println(AvrCap);
x = 0;
}
AvrArray[x] = cs_4_2.capacitiveSensor(30);
Serial.println(AvrArray[x]);
delay(500);

Hacks for clamping integer to 0-255 and doubles to 0.0-1.0?

Are there any branch-less or similar hacks for clamping an integer to the interval of 0 to 255, or a double to the interval of 0.0 to 1.0? (Both ranges are meant to be closed, i.e. endpoints are inclusive.)
I'm using the obvious minimum-maximum check:
int value = (value < 0? 0 : value > 255? 255 : value);
but is there a way to get this faster -- similar to the "modulo" clamp value & 255? And is there a way to do similar things with floating points?
I'm looking for a portable solution, so preferably no CPU/GPU-specific stuff please.
This is a trick I use for clamping an int to a 0 to 255 range:
/**
* Clamps the input to a 0 to 255 range.
* #param v any int value
* #return {#code v < 0 ? 0 : v > 255 ? 255 : v}
*/
public static int clampTo8Bit(int v) {
// if out of range
if ((v & ~0xFF) != 0) {
// invert sign bit, shift to fill, then mask (generates 0 or 255)
v = ((~v) >> 31) & 0xFF;
}
return v;
}
That still has one branch, but a handy thing about it is that you can test whether any of several ints are out of range in one go by ORing them together, which makes things faster in the common case that all of them are in range. For example:
/** Packs four 8-bit values into a 32-bit value, with clamping. */
public static int ARGBclamped(int a, int r, int g, int b) {
if (((a | r | g | b) & ~0xFF) != 0) {
a = clampTo8Bit(a);
r = clampTo8Bit(r);
g = clampTo8Bit(g);
b = clampTo8Bit(b);
}
return (a << 24) + (r << 16) + (g << 8) + (b << 0);
}
Note that your compiler may already give you what you want if you code value = min (value, 255). This may be translated into a MIN instruction if it exists, or into a comparison followed by conditional move, such as the CMOVcc instruction on x86.
The following code assumes two's complement representation of integers, which is usually a given today. The conversion from Boolean to integer should not involve branching under the hood, as modern architectures either provide instructions that can directly be used to form the mask (e.g. SETcc on x86 and ISETcc on NVIDIA GPUs), or can apply predication or conditional moves. If all of those are lacking, the compiler may emit a branchless instruction sequence based on arithmetic right shift to construct a mask, along the lines of Boann's answer. However, there is some residual risk that the compiler could do the wrong thing, so when in doubt, it would be best to disassemble the generated binary to check.
int value, mask;
mask = 0 - (value > 255); // mask = all 1s if value > 255, all 0s otherwise
value = (255 & mask) | (value & ~mask);
On many architectures, use of the ternary operator ?: can also result in a branchless instruction sequences. The hardware may support select-type instructions which are essentially the hardware equivalent of the ternary operator, such as ICMP on NVIDIA GPUs. Or it provides CMOV (conditional move) as in x86, or predication as on ARM, both of which can be used to implement branch-less code for ternary operators. As in the previous case, one would want to examine the disassembled binary code to be absolutely sure the resulting code is without branches.
int value;
value = (value > 255) ? 255 : value;
In case of floating-point operands, modern floating-point units typically provide FMIN and FMAX instructions which map straight to the C/C++ standard math functions fmin() and fmax(). Alternatively fmin() and fmax() may be translated into a comparison followed by a conditional move. Again, it would be prudent to examine the generated code to make sure it is branchless.
double value;
value = fmax (fmin (value, 1.0), 0.0);
I use this thing, 100% branchless.
int clampU8(int val)
{
val &= (val<0)-1; // clamp < 0
val |= -(val>255); // clamp > 255
return val & 0xFF; // mask out
}
For those using C#, Kotlin or Java this is the best I could do, it's nice and succinct if somewhat cryptic:
(x & ~(x >> 31) | 255 - x >> 31) & 255
It only works on signed integers so that might be a blocker for some.
For clamping doubles, I'm afraid there's no language/platform agnostic solution.
The problem with floating point that they have options from fastest operations (MSVC /fp:fast, gcc -funsafe-math-optimizations) to fully precise and safe (MSVC /fp:strict, gcc -frounding-math -fsignaling-nans). In fully precise mode the compiler does not try to use any bit hacks, even if they could.
A solution that manipulates double bits cannot be portable. There may be different endianness, also there may be no (efficient) way to get double bits, double is not necessarily IEEE 754 binary64 after all. Plus direct manipulations will not cause signals for signaling NANs, when they are expected.
For integers most likely the compiler will do it right anyway, otherwise there are already good answers given.

FSM data structure design

I want to write an FSM which starts with an idle state and moves from one state to another based on some event. I am not familiar with coding of FSM and google didn't help.
Appreciate if someone could post the C data structure that could be used for the same.
Thanks,
syuga2012
We've implemented finite state machine for Telcos in the past and always used an array of structures, pre-populated like:
/* States */
#define ST_ANY 0
#define ST_START 1
: : : : :
/* Events */
#define EV_INIT 0
#define EV_ERROR 1
: : : : :
/* Rule functions */
int initialize(void) {
/* Initialize FSM here */
return ST_INIT_DONE
}
: : : : :
/* Structures for transition rules */
typedef struct {
int state;
int event;
(int)(*fn)();
} rule;
rule ruleset[] = {
{ST_START, EV_INIT, initialize},
: : : : :
{ST_ANY, EV_ERROR, error},
{ST_ANY, EV_ANY, fatal_fsm_error}
};
I may have the function pointer fn declared wrong since this is from memory. Basically the state machine searched the array for a relevant state and event and called the function which did what had to be done then returned the new state.
The specific states were put first and the ST_ANY entries last since priority of the rules depended on their position in the array. The first rule that was found was the one used.
In addition, I remember we had an array of indexes to the first rule for each state to speed up the searches (all rules with the same starting state were grouped).
Also keep in mind that this was pure C - there may well be a better way to do it with C++.
A finite state machine consists of a finite number discrete of states (I know pedantic, but still), which can generally be represented as integer values. In c or c++ using an enumeration is very common.
The machine responds to a finite number of inputs which can often be represented with another integer valued variable. In more complicated cases you can use a structure to represent the input state.
Each combination of internal state and external input will cause the machine to:
possibly transition to another state
possibly generate some output
A simple case in c might look like this
enum state_val {
IDLE_STATE,
SOME_STATE,
...
STOP_STATE
}
//...
state_val state = IDLE_STATE
while (state != STOP_STATE){
int input = GetInput();
switch (state) {
case IDLE_STATE:
switch (input) {
case 0:
case 3: // note the fall-though here to handle multiple states
write(input); // no change of state
break;
case 1:
state = SOME_STATE;
break
case 2:
// ...
};
break;
case SOME_STATE:
switch (input) {
case 7:
// ...
};
break;
//...
};
};
// handle final output, clean up, whatever
though this is hard to read and more easily split into multiple function by something like:
while (state != STOP_STATE){
int input = GetInput();
switch (state) {
case IDLE_STATE:
state = DoIdleState(input);
break;
// ..
};
};
with the complexities of each state held in it's own function.
As m3rLinEz says, you can hold transitions in an array for quick indexing. You can also hold function pointer in an array to efficiently handle the action phase. This is especially useful for automatic generation of large and complex state machines.
The answers here seem really complex (but accurate, nonetheless.) So here are my thoughts.
First, I like dmckee's (operational) definition of an FSM and how they apply to programming.
A finite state machine consists of a
finite number discrete of states (I
know pedantic, but still), which can
generally be represented as integer
values. In c or c++ using an
enumeration is very common.
The machine responds to a finite
number of inputs which can often be
represented with another integer
valued variable. In more complicated
cases you can use a structure to
represent the input state.
Each combination of internal state and
external input will cause the machine
to:
possibly transition to another state
possibly generate some output
So you have a program. It has states, and there is a finite number of them. ("the light bulb is bright" or "the light bulb is dim" or "the light bulb is off." 3 states. finite.) Your program can only be in one state at a time.
So, say you want your program to change states. Usually, you'll want something to happen to trigger a state change. In this example, how about we take user input to determine the state - say, a key press.
You might want logic like this. When the user presses a key:
If the bulb is "off" then make the bulb "dim".
If the bulb is "dim", make the bulb "bright".
If the bulb is "bright", make the bulb "off".
Obviously, instead of "changing a bulb", you might be "changing the text color" or whatever it is you program needs to do. Before you start, you'll want to define your states.
So looking at some pseudoish C code:
/* We have 3 states. We can use constants to represent those states */
#define BULB_OFF 0
#define BULB_DIM 1
#define BULB_BRIGHT 2
/* And now we set the default state */
int currentState = BULB_OFF;
/* now we want to wait for the user's input. While we're waiting, we are "idle" */
while(1) {
waitForUserKeystroke(); /* Waiting for something to happen... */
/* Okay, the user has pressed a key. Now for our state machine */
switch(currentState) {
case BULB_OFF:
currentState = BULB_DIM;
break;
case BULB_DIM:
currentState = BULB_BRIGHT;
doCoolBulbStuff();
break;
case BULB_BRIGHT:
currentState = BULB_OFF;
break;
}
}
And, voila. A simple program which changes the state.
This code executes only a small part of the switch statement - depending on the current state. Then it updates that state. That's how FSMs work.
Now here are some things you can do:
Obviously, this program just changes the currentState variable. You'll want your code to do something more interesting on a state change. The doCoolBulbStuff() function might, i dunno, actually put a picture of a lightbulb on a screen. Or something.
This code only looks for a keypress. But your FSM (and thus your switch statement) can choose state based on what the user inputted (eg, "O" means "go to off" rather than just going to whatever is next in the sequence.)
Part of your question asked for a data structure.
One person suggested using an enum to keep track of states. This is a good alternative to the #defines that I used in my example. People have also been suggesting arrays - and these arrays keep track of the transitions between states. This is also a fine structure to use.
Given the above, well, you could use any sort of structure (something tree-like, an array, anything) to keep track of the individual states and define what to do in each state (hence some of the suggestions to use "function pointers" - have a state map to a function pointer which indicates what to do at that state.)
Hope that helps!
See Wikipedia for the formal definition. You need to decide on your set of states S, your input alphabet Σ and your transition function δ. The simplest representation is to have S be the set of integers 0, 1, 2, ..., N-1, where N is the number of states, and for Σ be the set of integers 0, 1, 2, ..., M-1, where M is the number of inputs, and then δ is just a big N by M matrix. Finally, you can store the set of accepting states by storing an array of N bits, where the ith bit is 1 if the ith state is an accepting state, or 0 if it is not an accepting state.
For example, here is the FSM in Figure 3 of the Wikipedia article:
#define NSTATES 2
#define NINPUTS 2
const int transition_function[NSTATES][NINPUTS] = {{1, 0}, {0, 1}};
const int is_accepting_state[NSTATES] = {1, 0};
int main(void)
{
int current_state = 0; // initial state
while(has_more_input())
{
// advance to next state based on input
int input = get_next_input();
current_state = transition_function[current_state][input];
}
int accepted = is_accepting_state[current_state];
// do stuff
}
You can basically use "if" conditional and a variable to store the current state of FSM.
For example (just a concept):
int state = 0;
while((ch = getch()) != 'q'){
if(state == 0)
if(ch == '0')
state = 1;
else if(ch == '1')
state = 0;
else if(state == 1)
if(ch == '0')
state = 2;
else if(ch == '1')
state = 0;
else if(state == 2)
{
printf("detected two 0s\n");
break;
}
}
For more sophisticated implementation, you may consider store state transition in two dimension array:
int t[][] = {{1,0},{2,0},{2,2}};
int state = 0;
while((ch = getch()) != 'q'){
state = t[state][ch - '0'];
if(state == 2){
...
}
}
A few guys from AT&T, now at Google, wrote one of the best FSM libraries available for general use. Check it out here, it's called OpenFST.
It's fast, efficient, and they created a very clear set of operations you can perform on the FSMs to do things like minimize them or determinize them to make them even more useful for real world problems.
if by FSM you mean finite state machine,
and you like it simple, use enums to name your states
and switch betweem them.
Otherwise use functors. you can look the
fancy definition up in the stl or boost docs.
They are more or less objects, that have a
method e.g. called run(), that executes
everything that should be done in that state,
with the advantage that each state has it's own
scope.

Resources