Arduino #define gives incorrect values for multiplication - arduino

I am somewhat confused with why the code below gives an incorrect value for x and y. When I run it on my Arduino b is shown to be 124, as one would expect, however x is shown to be 10420 and y is 2104.
I believe the "#define" is responsible, because if I replace the define for b with an "int", x and y have the correct values. Oddly, this issue only occurs for multiplication and division; addition and subtraction using values from "#define" work correctly.
#define a 20
#define b a + 104
int x = b*100;
int y = 100*b;
void setup() {
}
void loop() {
    delay (500);
}
Please could someone explain why the multiplication returns incorrect values and why the order of multiplication impacts the result.
Thank you in advance.

#define does not calculate a value. It is a preprocessor directive that substitutes text, so after preprocessing (before compilation) your code looks like this:
#define a 20
#define b a + 104
int x = 20 + 104*100; //b*100
int y = 100*20 + 104; //100*b;
void setup() {
}
void loop() {
    delay (500);
}

C preprocessor macros are based on token substitution, not evaluation. The first rule when using them is to add extra parentheses. The zeroth rule is not to use them for simple constants.
So either write it with parentheses:
#define a 20
#define b (a + 104)
int x = b*100;
int y = 100*b;
which expands to
int x = (20 + 104)*100;
int y = 100*(20 + 104);
or use constant variables for constants
const int a = 20;
const int b = a + 104;
int x = b*100;
int y = 100*b;
which avoids the problem. The compiler should optimise the constant away.

The problem is how macros are expanded. They are expanded verbatim, so in your case a + 104 is substituted wherever b appears, like this
int x = a + 104*100;
which is then expanded to
int x = 20 + 104*100;
which evaluates to 20 + 10400 = 10420, the value you are seeing.
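The expansions discussed in these answers can be checked with a small sketch. The names A, B_UNSAFE and B_SAFE are hypothetical stand-ins for the a and b macros in the question, chosen only so that both variants can live in one file.

```c
/* Hypothetical names: A, B_UNSAFE and B_SAFE stand in for the
 * question's a and b macros so both variants can coexist. */
#define A 20
#define B_UNSAFE A + 104      /* pasted as the tokens: 20 + 104 */
#define B_SAFE   (A + 104)    /* pasted as the tokens: (20 + 104) */

/* Expands to 20 + 104 * 100, so the multiplication binds first. */
int unsafe_times_100(void) { return B_UNSAFE * 100; }

/* Expands to 100 * 20 + 104. */
int hundred_times_unsafe(void) { return 100 * B_UNSAFE; }

/* Expands to (20 + 104) * 100. */
int safe_times_100(void) { return B_SAFE * 100; }

/* Expands to 100 * (20 + 104). */
int hundred_times_safe(void) { return 100 * B_SAFE; }
```

The unparenthesized macro yields 10420 and 2104, matching the values reported in the question, while the parenthesized one yields 12400 in both orders.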


The requirement:
Let's say we have 1) five groups of colors, each group with three colors (the colors are generated dynamically on the CPU), and 2) a list of 1000 cars, each car represented in the list by its color (picked from one of the groups).
We want to pass three arguments to an OpenCL kernel: 1) the generated color groups, 2) a 1D array of car colors, and 3) a 1D integer array used to test each car color against the color groups (doing a simple calculation).
The structures:
typedef struct
{
    float4 Color1; //16 =2^4
    float4 Color2; //16 =2^4
    float4 Color3; //16 =2^4
    float4 Color4; //16 =2^4
} GeneratedColorGroup;
typedef struct
{
    GeneratedColorGroup Colors[8]; //512 = 2^9
} ColorGroup;
The kernel code:
__kernel void findCarColorRelation(
    const __global ColorGroup *InColorGroups,
    const __global float4* InCarColor,
    const __global int* CarGroupIndicator,
    const int carsNumber)
{
    int globalID = get_global_id( 0 );
    if(globalID < carsNumber)
    {
        ColorGroup colorGroups;
        float4 carColor;
        colorGroups = InColorGroups[globalID];
        carColor = InCarColor[globalID];
        for(int groupIndex =0; groupIndex < 8; groupIndex++)
        {
            if(colorGroups[groupIndex].Color1 == carColor)
                CarGroupIndicator[globalID] = groupIndex + 1 ;
            if(colorGroups[groupIndex].Color2 == carColor)
                CarGroupIndicator[globalID] = groupIndex * 2 + 2;
            if(colorGroups[groupIndex].Color3 == carColor)
                CarGroupIndicator[globalID] = groupIndex * 3 + 3;
        }
    }
}
Now, we have 1000 items, which means the kernel is going to be executed 1000 times. That's OK.
The problem:
As you can see, the kernel takes a global ColorGroup as input; this global memory holds five items of "GeneratedColorGroup" type.
I tried to access these items as shown in the code above, but I got an unexpected result, and the execution is very slow.
What is wrong with my code?
Any help is highly appreciated.
When passing structs from a host to a device, make sure you declare the struct type with __attribute__ ((packed)) in both host and device code. Otherwise the host and device compilers may create different memory layouts for the struct, i.e. they can use different amounts of padding.
Using packed structs may cause a performance degradation, because packed structs have no padding at all, so data within a struct may not be properly aligned, and unaligned access is usually slow. In that case, you have to either insert padding manually with char[], or use __attribute__ ((aligned (N))) on a struct field (or on the struct itself).
See the OpenCL C specification for details on the packed and aligned attributes.
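The effect of the packed attribute on layout can be seen on the host side with a minimal sketch. It assumes a GCC- or Clang-compatible host compiler (the attribute syntax is a compiler extension), and the struct and function names are hypothetical, not taken from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example types; not the question's structs. */

/* Natural layout: the compiler may insert padding after tag
 * so that value is aligned. */
struct Natural {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout (GCC/Clang extension): no padding at all. */
struct __attribute__((packed)) Packed {
    uint8_t  tag;
    uint32_t value;
};

/* Byte offset of value in each layout. */
size_t natural_value_offset(void) { return offsetof(struct Natural, value); }
size_t packed_value_offset(void)  { return offsetof(struct Packed, value); }

/* Total size of the packed struct: 1 + 4 bytes. */
size_t packed_size(void) { return sizeof(struct Packed); }
```

Comparing these offsets and sizes between host and device builds is one way to confirm that both sides agree on the layout before transferring buffers.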
I'm wildly guessing the problem is
... CarGroupIndicator[globalID] = groupIndex + 1 ;
... CarGroupIndicator[globalID] = groupIndex * 2 + 2;
... CarGroupIndicator[globalID] = groupIndex * 3 + 3;
... which makes it impossible to tell from the result CarGroupIndicator[globalID] what exactly was matched. E.g. a match on group 5 color 1 results in the value 6, but so do group 2 color 2 and group 1 color 3. What you want is something like this:
... CarGroupIndicator[globalID] = groupIndex;
... CarGroupIndicator[globalID] = groupIndex + 8;
... CarGroupIndicator[globalID] = groupIndex + 16;
... then 0-7 are color1, 8-15 color2, 16-23 color3.
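The numbering suggested above can be sketched in plain C as an encode/decode pair. GROUP_COUNT and the function names are hypothetical, and colorSlot is assumed to be 0 for Color1, 1 for Color2 and 2 for Color3.

```c
/* Hypothetical helper names; GROUP_COUNT matches the 8 groups
 * iterated by the kernel's loop. */
#define GROUP_COUNT 8

/* Give every (group, color slot) pair its own code:
 * slot 0 -> 0..7, slot 1 -> 8..15, slot 2 -> 16..23. */
int encode_match(int groupIndex, int colorSlot) {
    return groupIndex + GROUP_COUNT * colorSlot;
}

/* Recover the group index from a code. */
int decode_group(int code) { return code % GROUP_COUNT; }

/* Recover the color slot from a code. */
int decode_color_slot(int code) { return code / GROUP_COUNT; }
```

With this scheme the three matches that all produced 6 in the original kernel (group 5 color 1, group 2 color 2, group 1 color 3) map to 5, 10 and 17.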

I have an ADXL355 (EVAL-ADXL355-PMDZ) that I am trying to test against a very expensive industrial grade sensor. I am using I2C and I am able to read the device properties and settings as described in the datasheet.
The issue I'm having is how to read the 3 ZDATA (or XDATA, YDATA) registers as a single value. I have tried two approaches. Here is the first:
double values[3];
Wire.write(0x08); // ACCEL_XAXIS
Wire.requestFrom(addr, 9, true); // Read 9, 3 for each axis
byte x1, x2, x3;
for (int i = 0; i < 3; ++i){
    x3 = Wire.read();
    x2 = Wire.read();
    x1 = Wire.read();
    unsigned long tempV = 0;
    unsigned long value = 0;
    value = x3;
    value <<= 12;
    tempV = x2;
    tempV <<= 4;
    value |= tempV;
    tempV = x1;
    tempV >>= 4;
    value |= tempV;
    values[i] = SCALEFACTOR * value;
}
This will produce values that approach 1g for negative gravity and 3g for positive gravity. Also the unloaded axes will sometimes show offscale high instead of -0.0g. They bounce from 0.0 to 4.0 g's. This tells me I have a sign problem which I'm sure comes from using unsigned long. So I attempted to read it as a 16 bit value and retain the sign.
double values[3];
Wire.write(0x08); // ACCEL_XAXIS
Wire.requestFrom(addr, 9, true); // Read 9, 3 for each axis
byte x1, x2, x3;
for (int i = 0; i < 3; ++i){
    x3 = Wire.read();
    x2 = Wire.read();
    x1 = Wire.read();
    long tempV = 0;
    long value = 0;
    value = x3;
    value <<= 8;
    tempV = x2;
    value |= tempV;
    values[i] = SCALEFACTOR * value;
}
The values this produces have the correct sign, but they are (as expected) much lower in magnitude than they should be. I tried to create a 20-bit number like this: long value:20; but I received
expected initializer before ':' token
and the same error for int.
How do I properly read from 3 registers to obtain a correct 20 bit value?
First of all, you really want to use unsigned types when using the left and right shift operators (see this question).
Looking at the avr-gcc type layout, we learn that a long is represented on 4 bytes (i.e. 32 bits), so it is long enough (no pun intended) to hold your 20-bit numbers (XDATA, YDATA, and ZDATA). An int, on the other hand, is represented on 2 bytes (i.e. 16 bits) and thus should not be used in your case.
According to page 33 of the datasheet you linked, the numbers are formatted as two's complement. Your first example correctly sets the low 20 bits of your unsigned 32-bit value (in particular, the handling of the left justification, right-shifting x1 by four, already looks correct), but the "new" 12 most significant bits are always set to 0.
To perform sign extension, you need to set the "new" 12 most significant bits to 0 if the number is a positive value, 1 if the number is a negative value (adaptation of your first example):
value |= tempV;
if (x3 & 0x80) /* msb is 1 so the number is a negative value */
value |= 0xFFF00000;
From there, you should observe about the same behaviour as before: high positive values instead of small negative ones (but even higher than before). This is because, while your value is now correct bitwise, it is still interpreted as unsigned. You can work around this by forcing the compiler to treat value as signed:
values[i] = SCALEFACTOR * (long)value;
And now it should be working.
Note that this answer relies on your C/C++ implementation using two's complement to represent negative integers. While alternatives are very rare in practice, the standard allows other representations (see this question for examples).
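As a minimal sketch of the sign extension described in this answer, the function below assembles the three data bytes into a signed 20-bit value. The function name is hypothetical, and instead of OR-ing in the 0xFFF00000 mask and casting, it uses the XOR-and-subtract idiom, a swapped-in technique that gives the same result without converting an out-of-range unsigned value to a signed type.

```c
#include <stdint.h>

/* Hypothetical helper: b3 is the most significant data byte,
 * b1 the least significant one, whose low four bits are unused
 * because the 20-bit value is left-justified in the three bytes. */
int32_t adxl355_raw_to_int(uint8_t b3, uint8_t b2, uint8_t b1) {
    /* Assemble the 20 data bits into the low bits of a 32-bit word. */
    uint32_t u = ((uint32_t)b3 << 12) | ((uint32_t)b2 << 4) | ((uint32_t)b1 >> 4);

    /* Flip the sign bit (bit 19), then subtract its weight:
     * codes at or above 0x80000 come out negative. */
    return (int32_t)(u ^ 0x80000UL) - (int32_t)0x80000L;
}
```

The result can then be multiplied by the scale factor as a signed quantity, as in values[i] = SCALEFACTOR * adxl355_raw_to_int(x3, x2, x1).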
Here is one way to make it work. It does use bitshifting on a signed value. Various sources have said that this is a potential bug as it is implementation defined. It worked on my platform.
typedef union {
byte bytes[3];
long value:24;
} accelData;
double values[3];
Wire.write(0x08); // ACCEL_XAXIS
Wire.requestFrom(addr, 9, true); // Read 9, 3 for each axis
accelData raw;
for (int i = 0; i < 3; ++i){
    raw.bytes[2] = Wire.read();
    raw.bytes[1] = Wire.read();
    raw.bytes[0] = Wire.read();
    long temp = raw.value >> 4;
    values[i] = SCALEFACTOR * (double)temp;
}
I prefer the solution presented by Alexandre Perrin.
union {
    struct {
        unsigned : 4;
        unsigned long uvalue : 20;
    };
    struct {
        unsigned : 4;
        signed long ivalue : 20;
    };
    unsigned char rawdata[3];
} raw;
for (int i = 0; i < 3; ++i){
    raw.rawdata[2] = Wire.read(); //if most significant part is transferred first
    raw.rawdata[1] = Wire.read();
    raw.rawdata[0] = Wire.read();
    values[i] = SCALEFACTOR * (double)raw.ivalue;
}

Add one to or subtract one from an odd integer such that the even result is closer to the nearest power of two.
if ( ??? ) x += 1; else x -= 1;// x > 2 and odd
For example, 25 through 47 round towards 32: add one to 25 through 31 and subtract one from 33 through 47. 23 rounds down towards 16, giving 22, and 49 rounds up towards 64, giving 50.
Is there a way to do this without finding the specific power of two that is being rounded towards? I know how to use a logarithm or count bits to get the specific power of two.
My specific use case for this is in splitting odd-sized inputs to Karatsuba multiplication.
If the second most significant bit is set then add, otherwise subtract.
if ( (x&(x>>1)) > (x>>2) ) x += 1; else x -= 1;
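The one-line test above can be wrapped in a function and checked against the examples from the question. This is a small sketch with a hypothetical function name; it assumes x is odd and greater than 2, as the question states.

```c
/* Hypothetical wrapper around the answer's expression.
 * x & (x >> 1) keeps only bits that have another set bit directly
 * above them. It exceeds x >> 2 exactly when the bit just below
 * the most significant bit is set, i.e. when x lies in the upper
 * half of its power-of-two interval. */
unsigned step_toward_pow2(unsigned x) {
    return ((x & (x >> 1)) > (x >> 2)) ? x + 1 : x - 1;
}
```

For instance, 25 gives 26 and 47 gives 46 (both toward 32), 23 gives 22 (toward 16) and 49 gives 50 (toward 64).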
It isn't a big deal to keep all of the powers of 2 for a 32-bit integer (only 32 entries) and do a quick binary search for the position where your number belongs. Then you can figure out which power it is closer to by subtracting it from the higher and lower neighbours and comparing the absolute differences, and decide whether to add or subtract.
You may be able to avoid the search by taking the log base 2 of your number and using that to index into the array
UPDATE: reminder this code is not thoroughly tested.
#include <array>
#include <cmath>
#include <iostream>
// Build the table of powers of two: entry i holds 2^i for i in 0..31.
std::array<unsigned int,32> powers_of_two() {
    std::array<unsigned int,32> powers_of_two{};
    for (unsigned int i = 0; i < 32; ++i) {
        powers_of_two[i] = 1u << i; // 1u, because 1 << 31 overflows a signed int
    }
    return powers_of_two;
}
const std::array<unsigned int,32> powers = powers_of_two();
unsigned int round_to_closest(unsigned int number) {
    if (number % 2 == 0) return number;
    unsigned int i = std::ceil(std::log2(number));
    // powers[i] is the next power of two above number, powers[i-1] the one below
    return (powers[i]-number) < (number - powers[i-1]) ?
        number + 1 : number - 1;
}
int main() {
    std::cout << round_to_closest(27) << std::endl;
    std::cout << round_to_closest(23) << std::endl;
    return 0;
}
The table is built with the unsigned literal 1u because 1 << 31 overflows a signed int; 2^31 itself fits in a 32-bit unsigned int. Odd inputs above 2^31 would index past the end of the table, so that one range produces incorrect results; I figured that's not a big deal.
I was thinking that you could use a std::vector<bool> as a very large lookup table for whether to add 1 or subtract 1, but that seems like overkill for an operation that already runs quite fast.
As @aaronman pointed out, if you are working with integers only, the fastest way to do this is to keep all powers of 2 in a table, as there are not that many. By construction, an unsigned 32-bit integer has 32 powers of 2 (including the number 1), a 64-bit integer has 64, and so on.
But if you want to do it on the fly for a generic case, you can easily calculate the surrounding powers of 2 of any number. In C/C++:
#include <math.h>
double bottom, top, number, exponent;
number = 1234; // Set the value for number
exponent = int(log(number) / log(2.0)); // int(10.2691) = 10
bottom = pow(2, exponent); // 2^10 = 1024
top = bottom * 2; // 2048
// Calculate the difference between number, top and bottom and add or subtract
// 1 accordingly
number = (top - number) < (number - bottom) ? number + 1 : number - 1;
For the nearest power of two (not the next greater or equal one), see this:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
unsigned int val = atoi(argv[1]);
unsigned int x = val;
unsigned int result;
do {
result = x;
} while(x &= x - 1);
if((result >> 1) & val)
result <<= 1;
printf("result=%u\n", result);
return 0;
}
If you need greater or equal, change:
if((result >> 1) & val)
to
if(result != val)

I have int A, B, C. And A is in range 0-9999, B is 0-99, C is 0-99.
Because the function must return only one double, I am thinking of putting them all into one number. Otherwise I need to call the function three times.
But I cannot write efficient code to do this. It will be called millions of times, so it should be quite efficient, but no ASM.
I need a function double pack3int_to_double(int A, int B, int C) {}
Couldn't you just store A + 10000*B + 1000000*C? (The multipliers must leave room for the four digits of A and the two digits of B.)
For example, if you wanted to store A = 1234, B = 6, and C = 89, you'd just store 89061234.
You can then extract the numbers by casting the double to an int and using standard integer division and modulus tricks to recover the individual values.
Hope this helps!
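A minimal sketch of this decimal packing follows, assuming A is in 0-9999 and B and C are in 0-99 as stated in the question. The multipliers 10000 and 1000000 are chosen so the digit ranges cannot overlap, and the unpack function name is a hypothetical companion to the one the question asks for.

```c
/* Pack three small integers into the decimal digits of one double.
 * Assumes 0 <= A <= 9999 and 0 <= B, C <= 99, so the largest packed
 * value is 99999999, which a double represents exactly. */
double pack3int_to_double(int A, int B, int C) {
    return (double)(A + 10000L * B + 1000000L * C);
}

/* Hypothetical inverse: recover the three integers with integer
 * division and modulus. */
void unpack3int_from_double(double packed, int *A, int *B, int *C) {
    long v = (long)packed;
    *A = (int)(v % 10000);          /* low four digits */
    *B = (int)((v / 10000) % 100);  /* next two digits */
    *C = (int)(v / 1000000);        /* remaining digits */
}
```

For A = 1234, B = 6 and C = 89 the packed value is 89061234.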
If A<10,000 and B & C <100, A can be expressed with 14 bits, and B & C with 8 bits. Thus you need 30 bits in total.
You could therefore pack/unpack the integers by shifting each one into place (the parentheses are required, because + binds tighter than <<):
int packed = A | (B << 14) | (C << 22);
A = packed & 0x3FFF; B = (packed >> 14) & 0xFF; C = (packed >> 22) & 0xFF;
Bit shifting is typically cheaper than multiplication and division, and you can cast the int to a double and vice versa.
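The shift-based layout described in this answer can be sketched as follows. The function names are hypothetical, and the inputs are assumed to be within the ranges given in the question. Each shift is parenthesized because an expression such as A + B<<14 + C<<22 parses as ((A + B) << (14 + C)) << 22.

```c
#include <stdint.h>

/* Hypothetical helpers. Layout: bits 0-13 hold A, bits 14-21 hold B,
 * bits 22-29 hold C. Assumes A < 16384 and B, C < 256. */
uint32_t pack_abc(unsigned A, unsigned B, unsigned C) {
    return (uint32_t)A | ((uint32_t)B << 14) | ((uint32_t)C << 22);
}

/* Extract each field by shifting it down and masking it off. */
unsigned unpack_a(uint32_t packed) { return packed & 0x3FFF; }
unsigned unpack_b(uint32_t packed) { return (packed >> 14) & 0xFF; }
unsigned unpack_c(uint32_t packed) { return (packed >> 22) & 0xFF; }
```

The packed value uses at most 30 bits, so converting it to a double and back is exact.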
This relies on type punning through a union, so use it at your own risk:
typedef union {
    double x;
    struct {
        unsigned a : 14;
        unsigned b : 7;
        unsigned c : 7;
    } y;
} result_t;
Reading a union member other than the one last written is undefined behavior in C++. C (C99 and later) permits it and reinterprets the stored bytes, but the result may be a trap representation for a double, and the layout of bit-fields is implementation-defined. If you know your system will not generate any trap representations, you can consider using this.
double pack3int_to_double(int A, int B, int C) {
    result_t r = { 0 }; /* start from 0.0 so the unused bits are defined */
    r.y.a = A;
    r.y.b = B;
    r.y.c = C;
    return r.x;
}
void unpack3int_from_double (double X, int *A, int *B, int *C) {
    result_t r = { X };
    *A = r.y.a;
    *B = r.y.b;
    *C = r.y.c;
}
You can use out parameters in function call and retrieve all 3 int variables.
You could return a NaN double with the data stored in its fraction bits. A quiet NaN leaves 51 payload bits free, which should be plenty.
Inspired by your answers, this is what I have come up with so far. It should be quite efficient, and only 32 bits are used, so the exponent of the double is not touched.
#include <stdio.h>
struct pack_abc {
    unsigned short a;
    unsigned char b, c;
    int safety;
};
double pack3int_to_double(int A, int B, int C) {
    struct pack_abc R = {A, B, C, 0}; // or 0 could be replaced with something smarter, like NaN?
    return *(double*)&R; // pointer cast assumes sizeof(struct pack_abc) == sizeof(double)
}
int main(void) {
    int w = 1234, a = 56, d = 78;
    double p = pack3int_to_double(w, a, d);
    // we got the data packed into 'p', now let's unpack it
    struct pack_abc *R = (struct pack_abc*) & p;
    printf("%i %i %i\n", (int)R->a, (int)R->b, (int)R->c);
    return 0;
}

I'm using the rainbowduino and it has some methods that take individual r g b values as unsigned chars, and some that take a 24bit rgb colour code.
I want to convert r g b values into this 24-bit colour code of type uint32_t (so that all my code only has to use r g b values).
Any ideas?
I have already tried uint32_t result = r << 16 + g << 8 + b;
r = 100 g =200 b=0 gave green, but r=0 g=200 b=0 gave nothing
Rb.setPixelXY(unsigned char x, unsigned char y, unsigned char colorR, unsigned char colorG, unsigned char colorB)
This sets the pixel (x,y) by specifying each channel (color) with an 8-bit number.
Rb.setPixelXY(unsigned char x, unsigned char y, uint32_t colorRGB)
This sets the pixel (x,y) by specifying a 24-bit RGB color code.
The driver's code is:
void Rainbowduino::setPixelXY(unsigned char x, unsigned char y, uint32_t colorRGB /*24-bit RGB Color*/)
{
    if(x > 7 || y > 7)
    {
        // Do nothing.
        // This check is used to avoid writing to out-of-bound pixels by graphics function.
        // But this might slow down setting pixels (remove this check if fast display is desired)
    }
    else
    {
        colorRGB = (colorRGB & 0x00FFFFFF);
        frameBuffer[0][x][y]=(colorRGB & 0x0000FF); //channel Blue
        colorRGB = (colorRGB >> 8);
        frameBuffer[1][x][y]=(colorRGB & 0x0000FF); //channel Green
        colorRGB = (colorRGB >> 8);
        frameBuffer[2][x][y]=(colorRGB & 0x0000FF); //channel Red
    }
}
So I would think something similar to the above:
uint8_t x,y,r,b,g;
uint32_t result = ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
Rb.setPixelXY(x, y, result);
should work. The parentheses are needed to ensure proper ordering, as "+" has higher precedence than "<<". The casts matter on 8-bit AVR boards, where int is 16 bits wide, so r << 16 would otherwise be computed in a type too narrow to hold the result. Using "|" instead of "+" also prevents undesired carries.
P.S. Remember to use unsigned types when shifting, unless you want an arithmetic shift rather than a logical one.
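The conversion can be wrapped in a small helper, sketched here with a hypothetical function name. Each channel is widened to uint32_t before it is shifted, so the result does not depend on how wide int is on the target board.

```c
#include <stdint.h>

/* Hypothetical helper: combine three 8-bit channels into a
 * 24-bit 0x00RRGGBB colour code. */
uint32_t rgb_to_u32(uint8_t r, uint8_t g, uint8_t b) {
    /* Widen first, then shift; OR avoids carries between channels. */
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}
```

A call such as Rb.setPixelXY(x, y, rgb_to_u32(r, g, b)) would then pass the combined code to the 24-bit overload quoted in the question. For r = 100, g = 200, b = 0 the helper returns 0x64C800.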
On that note, I don't like shifts, as they are easy to get wrong. A union is simple and efficient instead.
union rgb {
uint32_t word;
uint8_t byte[3];
struct {
uint8_t blue;
uint8_t green;
uint8_t red;
} color ;
}rgb ;
// one way to assign by discrete names.
rgb.color.blue = b;
rgb.color.green = g;
rgb.color.red = r;
//or assign using array
rgb.byte[0] = b;
rgb.byte[1] = g;
rgb.byte[2] = r;
// then interchangeably use the whole integer word when desired.
Rb.setPixelXY(x, y, rgb.word);
No need to keep track of shifts.
One way to approach this would be to shift the bits to the left, parenthesizing each shift because + binds tighter than <<:
uint32_t result = ((uint32_t)r << 16) + ((uint32_t)g << 8) + b;
