Trouble Understanding a Checksum Algorithm - math

Protocol for BM62 Bluetooth Module
I just have a simple question about how the Checksum algorithm works for a particular Bluetooth module (BM62). The picture above has the UART protocol explained, and it explains the checksum rule, but I am having trouble understanding how it actually works, and cannot seem to guess the checksum value as it is given in the example in the picture.

The idea seems to be that you need to come up with CHKSUM such that LENH + LENL + OPCODE + PARAM + CHKSUM has 0 in the least significant byte. So, let's do the summation in 8 bits (or modulo 256):
LENH + LENL + OPCODE + PARAM + CHKSUM = 0
CHKSUM = -(LENH + LENL + OPCODE + PARAM)
IOW, CHKSUM = -(0 + 2 + 1 + 0) = -3 = 0xFD. (Remember that all of this was done in 8 bits).
You can verify that CHKSUM satisfies the requirement (you're now doing everything in 16 bits):
0 + 2 + 1 + 0 + 0xFD = 0x100
And that has 0 in the least significant byte. If we did this in 8 bits as well, we'd get 0 instead of 0x100 and that would also be a valid way to check correctness.

Related

TCP calculate checksum

I am trying to calculate the checksum of the following packet:
A TCP packet captured with wire shark
But I never managed to get the correct checksum (0x67ea).
I tried to calculate it like follows (of course with using one's complement sum):
source IP + destination IP + TCP protocol + (TCP header length +
payload length) + payload
c0a8 + ae80 + c0a8 + ae01 + 0006 + 0026 + 6c69 + 646f + 720a = 420df
With the one's complement: 4 + 20df = 20e3
not(20e3) = df1c
Which is defiantly not the current checksum.
I also notice that every time I send the same payload, the checksum is changing so I guess it can't be those same unchanging variables and it must be more (for example timestamp..).
What are the exact parameters and the formula that checksum uses?
and how can I calculate it?
Thanks for your help!
You might've overlooked TCP fixed header and TCP options. These bytes also contribute to the checksum (not just TCP payload). Please note that the TCP checksum field (ddff) is replaced with 0 for correct calculation.
IPv4 SRC + IPv4 DST + IPv4 Protocol + TCP Segment Length +
TCP Fixed Header (with checksum field set to 0) +
TCP Options +
TCP Payload
gives you
c0a8 + ae80 + c0a8 + ae01 + 0006 + 0026 +
115c + dcba + 28d5 + 41da + 64e8 + 6a10 + 8018 + 01fe + 0000 + 0000 +
(csum)
0101 + 080a + 5c86 + c6f8 + bd62 e36f +
6c69 + 646f + 720a
which equals 9980c. Next, 9 + 980c = 9815, and, finally, this will give you 67ea.

how to encode 27 vector3's into a 0-256 value?

I have 27 combinations of 3 values from -1 to 1 of type:
Vector3(0,0,0);
Vector3(-1,0,0);
Vector3(0,-1,0);
Vector3(0,0,-1);
Vector3(-1,-1,0);
... up to
Vector3(0,1,1);
Vector3(1,1,1);
I need to convert them to and from a 8-bit sbyte / byte array.
One solution is to say the first digit, of the 256 = X the second digit is Y and the third is Z...
so
Vector3(-1,1,1) becomes 022,
Vector3(1,-1,-1) becomes 200,
Vector3(1,0,1) becomes 212...
I'd prefer to encode it in a more compact way, perhaps using bytes (which I am clueless about), because the above solution uses a lot of multiplications and round functions to decode, do you have some suggestions please? the other option is to write 27 if conditions to write the Vector3 combination to an array, it seems inefficient.
Thanks to Evil Tak for the guidance, i changed the code a bit to add 0-1 values to the first bit, and to adapt it for unity3d:
function Pack4(x:int,y:int,z:int,w:int):sbyte {
var b: sbyte = 0;
b |= (x + 1) << 6;
b |= (y + 1) << 4;
b |= (z + 1) << 2;
b |= (w + 1);
return b;
}
function unPack4(b:sbyte):Vector4 {
var v : Vector4;
v.x = ((b & 0xC0) >> 6) - 1; //0xC0 == 1100 0000
v.y = ((b & 0x30) >> 4) - 1; // 0x30 == 0011 0000
v.z = ((b & 0xC) >> 2) - 1; // 0xC == 0000 1100
v.w = (b & 0x3) - 1; // 0x3 == 0000 0011
return v;
}
I assume your values are float not integer
so bit operations will not improve speed too much in comparison to conversion to integer type. So my bet using full range will be better. I would do this for 3D case:
8 bit -> 256 values
3D -> pow(256,1/3) = ~ 6.349 values per dimension
6^3 = 216 < 256
So packing of (x,y,z) looks like this:
BYTE p;
p =floor((x+1.0)*3.0);
p+=floor((y+1.0)*3.0*6.0);
p+=floor((y+1.0)*3.0*6.0*6.0);
The idea is convert <-1,+1> to range <0,1> hence the +1.0 and *3.0 instead of *6.0 and then just multiply to the correct place in final BYTE.
and unpacking of p looks like this:
x=p%6; x=(x/3.0)-1.0; p/=6;
y=p%6; y=(y/3.0)-1.0; p/=6;
z=p%6; z=(z/3.0)-1.0;
This way you use 216 from 256 values which is much better then just 2 bits (4 values). Your 4D case would look similar just use instead 3.0,6.0 different constant floor(pow(256,1/4))=4 so use 2.0,4.0 but beware case when p=256 or use 2 bits per dimension and bit approach like the accepted answer does.
If you need real speed you can optimize this to force float representation holding result of packet BYTE to specific exponent and extract mantissa bits as your packed BYTE directly. As the result will be <0,216> you can add any bigger number to it. see IEEE 754-1985 for details but you want the mantissa to align with your BYTE so if you add to p number like 2^23 then the lowest 8 bit of float should be your packed value directly (as MSB 1 is not present in mantissa) so no expensive conversion is needed.
In case you got just {-1,0,+1} instead of <-1,+1>
then of coarse you should use integer approach like bit packing with 2 bits per dimension or use LUT table of all 3^3 = 27 possibilities and pack entire vector in 5 bits.
The encoding would look like this:
int enc[3][3][3] = { 0,1,2, ... 24,25,26 };
p=enc[x+1][y+1][z+1];
And decoding:
int dec[27][3] = { {-1,-1,-1},.....,{+1,+1,+1} };
x=dec[p][0];
y=dec[p][1];
z=dec[p][2];
Which should be fast enough and if you got many vectors you can pack the p into each 5 bits ... to save even more memory space
One way is to store the component of each vector in every 2 bits of a byte.
Converting a vector component value to and from the 2 bit stored form is as simple as adding and subtracting one, respectively.
-1 (1111 1111 as a signed byte) <-> 00 (in binary)
0 (0000 0000 in binary) <-> 01 (in binary)
1 (0000 0001 in binary) <-> 10 (in binary)
The packed 2 bit values can be stored in a byte in any order of your preference. I will use the following format: 00XXYYZZ where XX is the converted (packed) value of the X component, and so on. The 0s at the start aren't going to be used.
A vector will then be packed in a byte as follows:
byte Pack(Vector3<int> vector) {
byte b = 0;
b |= (vector.x + 1) << 4;
b |= (vector.y + 1) << 2;
b |= (vector.z + 1);
return b;
}
Unpacking a vector from its byte form will be as follows:
Vector3<int> Unpack(byte b) {
Vector3<int> v = new Vector<int>();
v.x = ((b & 0x30) >> 4) - 1; // 0x30 == 0011 0000
v.y = ((b & 0xC) >> 2) - 1; // 0xC == 0000 1100
v.z = (b & 0x3) - 1; // 0x3 == 0000 0011
return v;
}
Both the above methods assume that the input is valid, i.e. All components of vector in Pack are either -1, 0 or 1 and that all two-bit sections of b in Unpack have a (binary) value of either 00, 01 or 10.
Since this method uses bitwise operators, it is fast and efficient. If you wish to compress the data further, you could try using the 2 unused bits too, and convert every 3 two-bit elements processed to a vector.
The most compact way is by writing a 27 digits number in base 3 (using a shift -1 -> 0, 0 -> 1, 1 -> 2).
The value of this number will range from 0 to 3^27-1 = 7625597484987, which takes 43 bits to be encoded, i.e. 6 bytes (and 5 spare bits).
This is a little saving compared to a packed representation with 4 two-bit numbers packed in a byte (hence 7 bytes/56 bits in total).
An interesting variant is to group the base 3 digits five by five in bytes (hence numbers 0 to 242). You will still require 6 bytes (and no spare bits), but the decoding of the bytes can easily be hard-coded as a table of 243 entries.

bitxor, solve equation, boolean logic

I have the equation:
C = A^b + (2*A)^b + (4*A)^b.
Where C and A are known, but b is unknown. How to find b?
All numbers are 8 bit bytes. Is there any possible method much faster than brute-force?
Does the + sign indicate addition on bytes and * multipication, with overflowing bits discarded? If so, I think the answer is
b = C ^ (A + 2 * A + 4* A)
How to reach that conclusion :
C = A^b + (2*A)^b + (4*A)^b
hence
C^b = A^b^b + (2*A)^b^b + (4*A)^b^b = A + 2*A + 4*A
then
C^C^b = b = C^(A + 2*A + 4*A)
EDIT Just to make sure : This answer is not correct. Shame on me. I'll have to think more about it.
I'm taking the same assumptions: + and * are addition and multiplication with overflow ignored.
Look-up Table
This would probably be the fastest solution: Precompute the results, and store them in a look-up table. It would require 216 bytes of memory, or 64 kB.
Guess, Check, Refine Method
Presented in C-family-like pseudocode:
byte Solve(byte a, byte c){
byte guess = lastGuess = result = lastResult = 0;
do {
guess = lastGuess ^ lastResult ^ c; //see explanation below
result = a^guess + (2*a)^guess + (4*a)^guess;
lastGuess = guess;
lastResult = result;
} while (result != c);
return guess;
}
The idea of this algorithm is that it makes a guess at what b is, then plugs it into the formula for a tentative result, and checks it against c. Whatever bits in the guess caused the result to differ from c are changed. This corresponds to the XOR of the last guess, last result, and c (if this statement is a little bit of a jump, I encourage you draw a truth table, and not just take my word for it!).
Explanation
It works because changing a bit can only affect the results of that bit, and the more significant bits, but not lower bits (since when you do addition with pen and paper, the carries can propagate to the left). So in the worst case the algorithm takes 2 guesses to get the least significant bit correct, another guess for the 2nd lsb, another for the 3rd, etc. for a maximum of 9 guesses given any combination of a and c.
Here's an example trace from my test program:
a: 00001100
c: 01100111
Guess: 01100111
Result: 01000001
Guess: 01000001
Result: 00010111
Guess: 00110001
Result: 01100111
b: 00110001

How to find x mod 15 without using any Arithmetic Operations?

We are given a unsigned integer, suppose. And without using any arithmetic operators ie + - / * or %, we are to find x mod 15. We may use binary bit manipulations.
As far as I could go, I got this based on 2 points.
a = a mod 15 = a mod 16 for a<15
Let a = x mod 15
then a = x - 15k (for some non-negative k).
ie a = x - 16k + k...
ie a mod 16 = ( x mod 16 + k mod 16 ) mod 16
ie a mod 15 = ( x mod 16 + k mod 16 ) mod 16
ie a = ( x mod 16 + k mod 16 ) mod 16
OK. Now to implement this. A mod16 operations is basically & OxF. and k is basically x>>4
So a = ( x & OxF + (x>>4) & OxF ) & OxF.
It boils down to adding 2 4-bit numbers. Which can be done by bit expressions.
sum[0] = a[0] ^ b[0]
sum[1] = a[1] ^ b[1] ^ (a[0] & b[0])
...
and so on
This seems like cheating to me. I'm hoping for a more elegant solution
This reminds me of an old trick from base 10 called "casting out the 9s". This was used for checking the result of large sums performed by hand.
In this case 123 mod 9 = 1 + 2 + 3 mod 9 = 6.
This happens because 9 is one less than the base of the digits (10). (Proof omitted ;) )
So considering the number in base 16 (Hex). you should be able to do:
0xABCE123 mod 0xF = (0xA + 0xB + 0xC + 0xD + 0xE + 0x1 + 0x2 + 0x3 ) mod 0xF
= 0x42 mod 0xF
= 0x6
Now you'll still need to do some magic to make the additions disappear. But it gives the right answer.
UPDATE:
Heres a complete implementation in C++. The f lookup table takes pairs of digits to their sum mod 15. (which is the same as the byte mod 15). We then repack these results and reapply on half as much data each round.
#include <iostream>
uint8_t f[256]={
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,0,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,0,1,
2,3,4,5,6,7,8,9,10,11,12,13,14,0,1,2,
3,4,5,6,7,8,9,10,11,12,13,14,0,1,2,3,
4,5,6,7,8,9,10,11,12,13,14,0,1,2,3,4,
5,6,7,8,9,10,11,12,13,14,0,1,2,3,4,5,
6,7,8,9,10,11,12,13,14,0,1,2,3,4,5,6,
7,8,9,10,11,12,13,14,0,1,2,3,4,5,6,7,
8,9,10,11,12,13,14,0,1,2,3,4,5,6,7,8,
9,10,11,12,13,14,0,1,2,3,4,5,6,7,8,9,
10,11,12,13,14,0,1,2,3,4,5,6,7,8,9,10,
11,12,13,14,0,1,2,3,4,5,6,7,8,9,10,11,
12,13,14,0,1,2,3,4,5,6,7,8,9,10,11,12,
13,14,0,1,2,3,4,5,6,7,8,9,10,11,12,13,
14,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,0};
uint64_t mod15( uint64_t in_v )
{
uint8_t * in = (uint8_t*)&in_v;
// 12 34 56 78 12 34 56 78 => aa bb cc dd
in[0] = f[in[0]] | (f[in[1]]<<4);
in[1] = f[in[2]] | (f[in[3]]<<4);
in[2] = f[in[4]] | (f[in[5]]<<4);
in[3] = f[in[6]] | (f[in[7]]<<4);
// aa bb cc dd => AA BB
in[0] = f[in[0]] | (f[in[1]]<<4);
in[1] = f[in[2]] | (f[in[3]]<<4);
// AA BB => DD
in[0] = f[in[0]] | (f[in[1]]<<4);
// DD => D
return f[in[0]];
}
int main()
{
uint64_t x = 12313231;
std::cout<< mod15(x)<<" "<< (x%15)<<std::endl;
}
Your logic is somewhere flawed but I can't put a finger on it. Think about it yourself, your final formula operates on first 8 bits and ignores the rest. That could only be valid if the part you throw away (9+ bits) are always the multiplication of 15. However, in reality (in binary numbers) 9+ bits are always multiplications of 16 but not 15. For example try putting 1 0000 0000 and 11 0000 0000 in your formula. Your formula will give 0 as a result for both cases, while in reality the answer is 1 and 3.
In essense I'm almost sure that your task can not be solved without loops. And if you are allowed to use loops - then it's nothing easier than to implement bitwiseAdd function and do whatever you like with it.
Added:
Found your problem. Here it is:
... a = x - 15k (for some non-negative k).
... and k is basically x>>4
It equals x>>4 only by pure coincidence for some numbers. Take any big example, for instance x=11110000. By your calculation k = 15, while in reality it is k=16: 16*15 = 11110000.

OR-multiplication on big integers

Multiplication of two n-bit numbers A and B can be understood as a sum of shifts:
(A << i1) + (A << i2) + ...
where i1, i2, ... are numbers of bits that are set to 1 in B.
Now lets replace PLUS with OR to get new operation I actually need:
(A << i1) | (A << i2) | ...
This operation is quite similar to regular multiplication for which there exists many faster algorithms (Schönhage-Strassen for example).
Is a similar algorithm for operation I presented here?
The size of the numbers is 6000 bits.
edit:
For some reason I have no link/button to post comments (any idea why?) so I will edit my question insead.
I indeed search for faster than O(n^2) algorithm for the operation defined above.
And yes, I am aware that it is not ordinary multiplication.
Is there a similar algorithm? I think probably not.
Is there some way to speed things up beyond O(n^2)? Possibly. If you consider a number A to be the analogue of A(x) = Σanxn where an are the binary digits of A, then your operation with bitwise ORs (let's call it A ⊕ B ) can be expressed as follows, where "⇔" means "analogue"
A ⇔ A(x) = Σanxn
B ⇔ B(x) = Σbnxn
C = A ⊕ B ⇔ C(x) = f(A(x)B(x)) = f(V(x)) where f(V(x)) = f(Σvnxn) = Σu(vn)xn where u(vn) = 0 if vn = 0, u(vn) = 1 otherwise.
Basically you are doing the equivalent of taking two polynomials and multiplying them together, then identifying all the nonzero terms. From a bit-string standpoint, this means treating the bitstring as an array of samples of zeros or ones, convolving the two arrays, and collapsing the resulting samples that are nonzero. There are fast convolution algorithms that are O(n log n), using FFTs for instance, and the "collapsing" step here is O(n)... but somehow I wonder if the O(n log n) evaluation of fast convolution treats something (like multiplication of large integers) as O(1) so you wouldn't actually get a faster algorithm. Either that, or the constants for orders of growth are so large that you'd have to have thousands of bits before you got any speed advantage. ORing is so simple.
edit: there appears to be something called "binary convolution" (see this book for example) that sounds awfully relevant here, but I can't find any good links to the theory behind it and whether there are fast algorithms.
edit 2: maybe the term is "logical convolution" or "bitwise convolution"... here's a page from CPAN (bleah!) talking a little about it along with Walsh and Hadamard transforms which are kind of the bitwise equivalent to Fourier transforms... hmm, no, that seems to be the analog for XOR rather than OR.
You can do this O(#1-bits in A * #1-bits in B).
a-bitnums = set(x : ((1<<x) & A) != 0)
b-bitnums = set(x : ((1<<x) & B) != 0)
c-set = 0
for a-bit in a-bitnums:
for b-bit in b-bitnums:
c-set |= 1 << (a-bit + b-bit)
This might be worthwhile if A and B are sparse in the number
of 1 bits present.
I presume, you are asking the name for the additive technique you have given
when you write "Is a similar algorithm for operation I presented here?"...
Have you looked at the Peasant multiplication technique?
Please read up the Wikipedia description if you do not get the 3rd column in this example.
B X A
27 X 15 : 1
13 30 : 1
6 60 : 0
3 120 : 1
1 240 : 1
B is 27 == binary form 11011b
27x15 = 15 + 30 + 120 + 240
= 15<<0 + 15<<1 + 15<<3 + 15<<4
= 405
Sounds familiar?
Here is your algorithm.
Choose the smaller number as your A
Initialize C as your result area
while B is not zero,
if lsb of B is 1, add A to C
left shift A once
right shift B once
C has your multiplication result (unless you rolled over sizeof C)
Update If you are trying to get a fast algorithm for the shift and OR operation across 6000 bits,
there might actually be one. I'll think a little more on that.
It would appear like 'blurring' one number over the other. Interesting.
A rather crude example here,
110000011 X 1010101 would look like
110000011
110000011
110000011
110000011
---------------
111111111111111
The number of 1s in the two numbers will decide the amount of blurring towards a number with all its bits set.
Wonder what you want to do with it...
Update2 This is the nature of the shift+OR operation with two 6000 bit numbers.
The result will be 12000 bits of course
the operation can be done with two bit streams; but, need not be done to its entirety
the 'middle' part of the 12000 bit stream will almost certainly be all 1s (provided both numbers are non-zero)
the problem will be in identifying the depth to which we need to process this operation to get both ends of the 12000 bit stream
the pattern at the two ends of the stream will depend on the largest consecutive 1s present in both the numbers
I have not yet got to a clean algorithm for this yet. Have updated for anyone else wanting to recheck or go further from here. Also, describing the need for such an operation might motivate further interest :-)
The best I could up with is to use a fast out on the looping logic. Combined with the possibility of using the Non-Zero approach as described by themis, you can answer you question by inspecting less than 2% of the N^2 problem.
Below is some code that gives the timing for numbers that are between 80% and 99% zero.
When the numbers get around 88% zero, using themis' approach switches to being better (was not coded in the sample below, though).
This is not a highly theoretical solution, but it is practical.
OK, here is some "theory" of the problem space:
Basically, each bit for X (the output) is the OR summation of the bits on the diagonal of a grid constructed by having the bits of A along the top (MSB to LSB left to right) and the bits of B along the side (MSB to LSB from top to bottom). Since the bit of X is 1 if any on the diagonal is 1, you can perform an early out on the cell traversal.
The code below does this and shows that even for numbers that are ~87% zero, you only have to check ~2% of the cells. For more dense (more 1's) numbers, that percentage drops even more.
In other words, I would not worry about tricky algorithms and just do some efficient logic checking. I think the trick is to look at the bits of your output as the diagonals of the grid as opposed to the bits of A shift-OR with the bits of B. The trickiest thing is this case is keeping track of the bits you can look at in A and B and how to index the bits properly.
Hopefully this makes sense. Let me know if I need to explain this a bit further (or if you find any problems with this approach).
NOTE: If we knew your problem space a bit better, we could optimize the algorithm accordingly. If your numbers are mostly non-zero, then this approach is better than themis since his would result is more computations and storage space needed (sizeof(int) * NNZ).
NOTE 2: This assumes the data is basically bits, and I am using .NET's BitArray to store and access the data. I don't think this would cause any major headaches when translated to other languages. The basic idea still applies.
using System;
using System.Collections;
namespace BigIntegerOr
{
class Program
{
private static Random r = new Random();
private static BitArray WeightedToZeroes(int size, double pctZero, out int nnz)
{
nnz = 0;
BitArray ba = new BitArray(size);
for (int i = 0; i < size; i++)
{
ba[i] = (r.NextDouble() < pctZero) ? false : true;
if (ba[i]) nnz++;
}
return ba;
}
static void Main(string[] args)
{
// make sure there are enough bytes to hold the 6000 bits
int size = (6000 + 7) / 8;
int bits = size * 8;
Console.WriteLine("PCT ZERO\tSECONDS\t\tPCT CELLS\tTOTAL CELLS\tNNZ APPROACH");
for (double pctZero = 0.8; pctZero < 1.0; pctZero += 0.01)
{
// fill the "BigInts"
int nnzA, nnzB;
BitArray a = WeightedToZeroes(bits, pctZero, out nnzA);
BitArray b = WeightedToZeroes(bits, pctZero, out nnzB);
// this is the answer "BigInt" that is at most twice the size minus 1
int xSize = bits * 2 - 1;
BitArray x = new BitArray(xSize);
int LSB, MSB;
LSB = MSB = bits - 1;
// stats
long cells = 0;
DateTime start = DateTime.Now;
for (int i = 0; i < xSize; i++)
{
// compare using the diagonals
for (int bit = LSB; bit < MSB; bit++)
{
cells++;
x[i] |= (b[MSB - bit] && a[bit]);
if (x[i]) break;
}
// update the window over the bits
if (LSB == 0)
{
MSB--;
}
else
{
LSB--;
}
//Console.Write(".");
}
// stats
TimeSpan elapsed = DateTime.Now.Subtract(start);
double pctCells = (cells * 100.0) / (bits * bits);
Console.WriteLine(pctZero.ToString("p") + "\t\t" +elapsed.TotalSeconds.ToString("00.000") + "\t\t" +
pctCells.ToString("00.00") + "\t\t" + cells.ToString("00000000") + "\t" + (nnzA * nnzB).ToString("00000000"));
}
Console.ReadLine();
}
}
}
Just use any FFT Polynomial Multiplication Algorithm and transform all resulting coefficients that are greater than or equal 1 into 1.
Example:
10011 * 10001
[1 x^4 + 0 x^3 + 0 x^2 + 1 x^1 + 1 x^0] * [1 x^4 + 0 x^3 + 0 x^2 + 0 x^1 + 1 x^0]
== [1 x^8 + 0 x^7 + 0 x^6 + 1 x^5 + 2 x^4 + 0 x^3 + 0 x^2 + 1 x^1 + 1 x^0]
-> [1 x^8 + 0 x^7 + 0 x^6 + 1 x^5 + 1 x^4 + 0 x^3 + 0 x^2 + 1 x^1 + 1 x^0]
-> 100110011
For an example of the algorithm, check:
http://www.cs.pitt.edu/~kirk/cs1501/animations/FFT.html
BTW, it is of linearithmic complexity, i.e., O(n log(n))
Also see:
http://everything2.com/title/Multiplication%2520using%2520the%2520Fast%2520Fourier%2520Transform

Resources