I am doing a basic operation in Arduino and for some reason (this is why I need you) it gives me a totally wrong result. Below is the code:
long init_H_top; //I am declaring it a long to make sure I got enough bytes
init_H_top=251*255/360; //gives me -4 and it should be 178
Any idea why it does that?
I am very confused... Thanks!
Your variable may be a long but your constants (251, 255, and 360) are not.
They are of type int, so the calculation is done in int, giving an int result which is then put into the long variable, after any overflow has already done the damage.
Since Arduino has a 16-bit int type, 251 * 255 (64005) exceeds the maximum int value of 32767 and produces the behaviour you're seeing. The value 64005 is -1531 in 16-bit two's complement and, when you divide that by 360, you get about -4.25, which truncates to -4.
You should be using long constants to avoid this:
init_H_top = 251L * 255L / 360L;
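For what it's worth, a cast on the first constant achieves the same thing, since the long type then propagates through the whole left-to-right evaluation:
init_H_top = (long)251 * 255 / 360; // multiplication now happens in long, so no overflow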
I'm trying to do this very basic task in C, where I want to define a number of ints in a header file. I've done this like so:
#define MINUTE (60)
#define HOUR (60 * MINUTE)
#define DAY (24 * HOUR)
The problem is that while MINUTE and HOUR return the correct answer, DAY returns something weird.
Serial.println(MINUTE); // 60
Serial.println(HOUR); // 3600
Serial.println(DAY); // 20864
Can someone explain why this happens?
Assuming you have something like
int days = DAY;
or
unsigned days = DAY;
You seem to have 16-bit integers. The maximum representable positive value for (signed) two's complement integers with 16 bits is 32767; for unsigned it is 65535.
So, as 24 * 3600 == 86400, you invoke undefined behaviour for the signed int and wrap for the unsigned (the int will likely wrap too, but that is not guaranteed).
This results in 86400 modulo 65536 (which is 2 to the power of 16), which happens to be 20864.
Solution: use the stdint.h types uint32_t or int32_t to get integers of defined size.
Edit: Using function arguments follows basically the same principle as the initialisers above.
Update: As you claimed, when directly passing the integer constant 86400 to the function, it will have type long, because the compiler automatically chooses the smallest type which can hold the value. It is very likely that the println methods are overloaded for long arguments, so they will print the correct value.
However, for the expression the original types are what matter. The values 24, 60, and 60 all have type int, so the result will also be int. The compiler will not use a larger type just because the result might overflow. Use 24L and you will get a long result for the macros, too.
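Applied to the macros from the question, that looks like this (a sketch; suffixing the first constant in each product is enough):
#define MINUTE (60)
#define HOUR (60L * MINUTE)
#define DAY (24L * HOUR) // 86400, computed in long

Serial.println(DAY); // now prints 86400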
It looks like you actually managed to dig up an ancient 16-bit compiler (where did you find it?). Otherwise I'd like to see the code that produces these numbers.
20864 = 86400 % 65536
Try storing the value in an int instead of a short.
Does anyone have experience replacing floating point operations on ATmega (2560) based systems? There are a couple of very common situations which come up every day.
For example:
Are comparisons faster than divisions/multiplications?
Are float to int type cast with followed multiplication/division faster than pure floating point operations without type cast?
I hope I don't have to write a benchmark just for this.
Example one:
int iPartialRes = (int)fArg1 * (int)fArg2;
iPartialRes *= iFoo;
Is this faster than:
float fPartialRes = fArg1 * fArg2;
fPartialRes *= iFoo;
And example two:
iSign = fVal < 0 ? -1 : 1;
Is this faster than:
iSign = fVal / fabs(fVal);
These questions could be answered just by thinking about them for a moment:
AVRs do not have an FPU, so all floating point work is done in software --> an fp multiplication involves much more than a simple int multiplication.
Since AVRs also do not have an integer division unit, a simple branch is also much faster than a software division. Dividing floating point values is the worst worst case :)
But please note that your first two examples produce very different results.
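To see why the results differ, note that the casts in the first example truncate each operand before the multiplication (a small illustration with made-up values):
float fArg1 = 2.9, fArg2 = 3.9;
int iPartialRes = (int)fArg1 * (int)fArg2; // truncates first: 2 * 3 = 6
float fPartialRes = fArg1 * fArg2;         // full float multiply: 11.31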
This is an old question, but I will submit this elaborated answer for the curious.
Just typecasting a float will truncate it, i.e. 3.7 will become 3; there is no rounding.
The fastest math on a 2560 will be (+, -, *), with division being the slowest since there is no hardware divide. Typecasting to an unsigned long int after multiplying all operands by a pseudo decimal point that suits the fractional range(1) your floats are expected to see, and tracking the sign as a bool, will give the best range/accuracy compromise.
If your loop needs to be as fast as possible, avoid even integer division; instead multiply by a pseudo fraction, and do the typecast back into a float only at the end: myFloat (defined elsewhere) = float(myPseudoFloat) / myPseudoDecimalConstant;
Not sure if you came across the Show info page in the playground. It's basically a sketch that runs a benchmark on your (insert Arduino model here) and shows the actual compute times for various operations and subsystems. The Mega 2560 will be very close to an ATmega328 as far as FLOPs go: up to 12.5K/s (80 µs per float divide). Typecasting would likely handicap the CPU more, as it introduces extra overhead and might even give erroneous results due to rounding errors and lack of precision.
(1) e.g. 543.509291 * 1000000 = 543,509,291 moves the decimal six places, to about the maximum precision of a float on an 8-bit AVR. If you first multiply all values by the same constant, like 1000 or 100000, then the position of the decimal point is preserved, and you cast back to a float by dividing by your decimal constant when you are ready to print or store the result.
float f = 3.1428;
int x;
x = f * 10000;
x now contains 31428
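One thing worth knowing about the snippet above: the float-to-int conversion truncates toward zero. If rounding to the nearest value is wanted instead, a common idiom (for non-negative values) is to add 0.5 before converting:
float f = 3.14159;
int x = (int)(f * 10000.0 + 0.5); // 31416 rather than the truncated 31415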
I am trying to multiply 52 by 1000:
int getNewSum = 52 * 1000;
but the code above is outputting a negative result: -13536
An explanation of how two's complement representation works is probably better given on Wikipedia and other places than here. What I'll do here is to take you through the workings of your exact example.
The int type on your Arduino is represented using sixteen bits, in a two's complement representation (bear in mind that other Arduinos use 32 bits for it, but yours is using 16.) This means that both positive and negative numbers can be stored in those 16 bits, and if the leftmost bit is set, the number is considered negative.
What's happening is that you're overflowing the number of bits you can use to store a positive number, and (accidentally, as far as you're concerned) setting the sign bit, thus indicating that the number is negative.
In 16 bits on your Arduino, decimal 52 would be represented in binary as:
0000 0000 0011 0100
(2^5 + 2^4 + 2^2 = 52)
However, the result of multiplying 52 by 1,000 -- 52,000 -- will end up overflowing the magnitude bits of an int, putting a '1' in the sign bit on the end:
*----This is the sign bit. It's now 1, so the number is considered negative.
1100 1011 0010 0000
(typically, computer integer arithmetic and associated programming languages don't protect you against doing things like this, for a variety of reasons, mostly related to efficiency, and mostly now historical.)
Because that sign bit on the left-hand end is set, to convert that number back into decimal from its assumed two's complement representation, we assume it's a negative number, and then first take the one's complement (flipping all the bits):
0011 0100 1101 1111
-- which represents 13,535 -- and add one to it, yielding 13,536, and call it negative: -13,536, the value you're seeing.
If you read up on two's complement/integer representations in general, you'll get the hang of it.
In the meantime, this probably means you should be looking for a bigger type to store your number. Arduino has unsigned integers, and a long type, which will use four bytes to store your numbers, giving you a range from -2,147,483,648 to 2,147,483,647. If that's enough for you, you should probably just switch to use long instead of int.
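One caveat worth adding to that advice: the multiplication itself must also happen in long, so give at least one constant an L suffix (otherwise the 16-bit int overflow occurs before the result is widened):
long getNewSum = 52L * 1000; // 52000, computed as long from the start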
Matt's answer is already a very good in-depth explanation of what's happening, but for those looking for a more TL;DR practical answer:
Problem:
This happens quite often to Arduino programmers when they assign (with the = sign) the result of an arithmetic operation (usually a multiplication) to a normal integer (int). As mentioned, when the result is bigger than the range of the type the compiler has assigned to the variable, an overflow happens.
Solution 1:
The easiest solution is to replace the int type with a bigger datatype, considering your needs. As this tutorialspoint.com tutorial explains, there are different integer types we can use:
int:
16 bit: from -32,768 to 32,767
32 bit: from -2,147,483,648 to 2,147,483,647
unsigned int: from 0 to 65,535
long: from -2,147,483,648 to 2,147,483,647
unsigned long: from 0 to 4,294,967,295
Solution 2:
This works only if your arithmetic contains divisions with big enough denominators. In an expression like a * b / c the multiplication is evaluated before the division, so if you have divisions in your equation, try to encapsulate them with parentheses: replace a * b / c with a * (b / c).
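For instance (made-up numbers; note that integer division truncates, so this trick only gives the same answer when the denominator divides evenly):
int scaled = 52 * (1000 / 8); // 1000 / 8 = 125, then 52 * 125 = 6500: no overflow
// versus 52 * 1000 / 8, where the intermediate 52000 overflows a 16-bit int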
I have a 16-bit sample between -32768 and 32767.
To save space I want to convert it to an 8-bit sample, so I divide the sample by 256 and add 128.
-32768 / 256 = -128; -128 + 128 = 0
32767 / 256 = 127.99; 127.99 + 128 = 255.99
Now, the 0 will fit perfectly in a byte, but the 255.99 has to be rounded down to 255, causing me to lose precision, because when converting back I'll get 32512 instead of 32767.
How can I do this without losing the original min/max values? I know I'm making a very obvious thinking error, but I can't figure out where the mistake lies.
And yes, of course I'm fully aware I lose precision by dividing, and will not be able to deduce the original values from the 8-bit samples, but I just wonder why I don't get back the original maximum.
The answers for down-sampling have already been provided.
This answer relates to up-sampling using the full range. Here is a C99 snippet demonstrating how you can spread the error across the full range of your values:
#include <stdio.h>

int main(void)
{
    for( int i = 0; i < 256; i++ ) {
        unsigned short scaledVal = ((unsigned short)i << 8) + (unsigned short)i;
        printf( "%8d%8hu\n", i, scaledVal );
    }
    return 0;
}
It's quite simple. You shift the value left by 8 and then add the original value back. That means every increase by 1 in the [0,255] range corresponds to an increase by 257 in the [0,65535] range.
I would like to point out that this might give worse results than you began with. For example, if you downsampled 65280 (0xff00) you would get 255, but then upsampling that would give 65535 (0xffff), which is a total error of 255. You will have similarly large errors across most of the higher end of your data range.
You might do better to abandon the notion of going back to the full [0,65535] range, and instead aim for the middle of each bucket: shift left and add 127. This makes the error uniform instead of skewed. Because you don't actually know what the original value was, the best you can do is estimate it with a value right in the centre.
To summarize, I think this is more mathematically correct:
unsigned short scaledVal = ((unsigned short)i << 8) + 127;
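Comparing the two variants at the top of the range makes the trade-off concrete (same assumptions as the snippet above):
unsigned short full = ((unsigned short)255 << 8) + 255; // 65535: error can be as large as 255
unsigned short mid  = ((unsigned short)255 << 8) + 127; // 65407: error never exceeds 128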
You don't get the original maximum because you can't represent the number 256 as an 8-bit unsigned integer.
If you're trying to compress your 16-bit integer value into an 8-bit integer range, you keep the most significant 8 bits and throw out the least significant 8 bits. Normally this is accomplished by shifting the bits: the >> operator shifts from most to least significant bits, so shifting eight times (>> 8) does the job. You can also mask out the high byte and divide off the 00s, doing your rounding before the division, with something like sample8 = (sample16 & 65280) / 256; [65280 a.k.a. 0xFF00]
Every bit you shift off of a value halves it, like division by 2, and rounds down.
All of the above is complicated somewhat by the fact that you're dealing with a signed integer.
Finally I'm not 100% certain I got everything right here because really, I haven't tried doing this.
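Putting the pieces together for the signed samples in the question, a minimal sketch (with hypothetical helper names) might look like this:
#include <stdint.h>

// maps a signed 16-bit sample onto [0, 255], as described in the question
uint8_t downsample(int16_t s) {
    return (uint8_t)(s / 256 + 128);            // -32768 -> 0, 32767 -> 255
}

// approximate inverse; the low 8 bits are gone for good
int16_t upsample(uint8_t b) {
    return (int16_t)(((int16_t)b - 128) * 256); // 255 -> 32512, not 32767
}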
Greetings everybody. I have seen examples of such operations so many times that I am beginning to think I am getting something wrong about binary arithmetic. Is there any sense in performing the following:
byte value = someAnotherByteValue & 0xFF;
I don't really understand this, because it does not change anything anyway. Thanks for the help.
P.S.
I was trying to search for information both elsewhere and here, but unsuccessfully.
EDIT:
Well, of course I assume that someAnotherByteValue is 8 bits long. The problem is that I don't get why so many people (I mean professionals) use such things in their code. For example, in SharpZlib there is:
buffer_ |= (uint)((window_[windowStart_++] & 0xff |
(window_[windowStart_++] & 0xff) << 8) << bitsInBuffer_);
where window_ is a byte buffer.
The most likely reason is to make the code more self-documenting. In your particular example, it is not the size of someAnotherByteValue that matters, but rather the fact that value is a byte. This makes the & redundant in every language I am aware of. But, to give an example of where it would be needed, if this were Java and someAnotherByteValue was a byte, then the line int value = someAnotherByteValue; could give a completely different result than int value = someAnotherByteValue & 0xff. This is because Java's long, int, short, and byte types are signed, and the rules for conversion and sign extension have to be accounted for.
If you always use the idiom value = someAnotherByteValue & 0xFF then, no matter what the types of the variable are, you know that value is receiving the low 8 bits of someAnotherByteValue.
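For anyone who wants to see the sign-extension issue concretely, the same effect exists in C with a negative signed char (a small illustration):
signed char b = -128;   // bit pattern 0x80
int widened = b;        // sign-extended on conversion: -128
int masked  = b & 0xff; // low 8 bits only: 128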
uint s1 = (uint)(initial & 0xffff);
There is a point to this because uint is 32 bits, while 0xffff is 16 bits. The line selects the 16 least significant bits from initial.
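Concretely (an illustrative C sketch, assuming a 32-bit int as in the quoted code):
int initial = -1;                                   // all 32 bits set
unsigned int s1 = (unsigned int)(initial & 0xffff); // keeps the low 16 bits: 65535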
Nope, there is no use in doing this. If the value were wider than 8 bits, then the above statement would have some meaning. Otherwise, it's the same as the input.
If someAnotherByteValue is wider than 8 bits and you want to extract its least significant 8 bits, then it makes sense. Otherwise, there is no use.
No, there is no point so long as you are dealing with a byte. If value were a long then the lower 8 bits would be the lower 8 bits of someAnotherByteValue and the rest would be zero.
In a language like C++ where operators can be overloaded, it's possible but unlikely that the & operator has been overloaded. That would be pretty unusual and bad practice though.
EDIT: Well, of course I assume that someAnotherByteValue is 8 bits long; the problem is that I don't get why so many people (I mean professionals) use such things in their code. For example, in Jon Skeet's MiscUtil there is:
uint s1 = (uint)(initial & 0xffff);
where initial is int.
In this particular case, the author might be trying to convert an int to a uint. The & with 0xffff would ensure that it still converts only the lowest 2 bytes, even on a system which does not have a 2-byte int type.
To be picky, there is no guarantee regarding a machine's byte size; there is no reason to assume, in an extremely portable program, that the architecture's byte is 8 bits wide. To the best of my memory, according to the C standard (for example), a char is one byte, short is wider than or the same as char, int is wider than or the same as short, long is wider than or the same as int, and so on. Hence, theoretically there can be a compiler where a byte is, say, 10 bits wide. To ensure your program behaves the same on such a machine, you need to use this (seemingly redundant) coding style.
"Byte" # Wikipedia gives examples for such peculiar architectures.