Encryption - PKCS5/7 padding race condition? - encryption

I'm writing some crypto (known algorithm - not rolling my own) but I couldn't find any specific documentation on this case.
One method of padding (although the issue is there for any of them could have the same problem) works like this:
If your block is < 8 bytes, pad the end with the number of padding bytes
So FF E2 B8 AA becomes FF E2 B8 AA 04 04 04 04
Which is great and allows you with a pretty obvious window with which you can remove padding during decryption, but my question is that instead of the above example say I have this -
10 39 ff ef 09 64 aa (7 bytes in length). Now in this situation the above algorithm would say to convert this to 10 39 ff ef 09 64 aa 01, but my question is then when decrypting how do you decide between when you get a 01 byte on the end of a decrypted message how do you know whether it's meant to be padding (and should be stripped) or it's part of the actual message and you should keep it?
The most reasonable solutions I can think of would be append/prepend the size of the actual message in the encryption or add a parity block to state whether there's padding or not, which both have their own problems in my mind.
I'm assuming this problem has been encountered before but I was wondering what the solution was.

PKCS #5/7 padding is always added – if the length of the plaintext is a multiple of the block size, a whole block of padding is added. This way there is no ambiguity, which is the main benefit of PKCS #7 over, say, zero padding.
Quoted from the PKCS #7 specification:
2. Some content-encryption algorithms assume the
input length is a multiple of k octets, where k > 1, and
let the application define a method for handling inputs
whose lengths are not a multiple of k octets. For such
algorithms, the method shall be to pad the input at the
trailing end with k - (l mod k) octets all having value k -
(l mod k), where l is the length of the input. In other
words, the input is padded at the trailing end with one of
the following strings:
01 -- if l mod k = k-1
02 02 -- if l mod k = k-2
.
.
.
k k ... k k -- if l mod k = 0
The padding can be removed unambiguously since all input is
padded and no padding string is a suffix of another. This
padding method is well-defined if and only if k < 256;
methods for larger k are an open issue for further study.

Related

Implementation of Speck cipher

I am trying to implement the speck cipher as specified here: Speck Cipher. On page 18 of the document you can find some speck pseudo-code I want to implement.
It seems that I got a problem on understanding the pseudo-code. As you can find there, x and y are plaintext words with length n. l[m-2],...l[0], k[0] are key words (as for words, they have length n right?). When you do the key expansion, we iterate for i from 0 to T-2, where T are the round numbers (for example 34). However I get an IndexOutofBoundsException, because the array with the l's has only m-2 positions and not T-2.
Can someone clarify what the key expansions does and how?
Ah, I get where the confusion lies:
l[m-2],...l[0], k[0]
these are the input key words, in other words, they represent the key. These are not declarations of the size of the arrays, as you might expect if you're a developer.
Then the subkey's in array k should be derived, using array l for intermediate values.
According to the formulas, taking the largest i, i.e. i_max = T - 2 you get a highest index for array l of i_max + m - 1 = T - 2 + m - 1 = T + m - 3 and therefore a size of the array of one more: T + m - 2. The size of a zero-based array is always the index of the last element - plus one, after all.
Similarly, for subkey array k you get a highest index of i_max + 1, which is T - 2 + 1 or T - 1. Again, the size of the array is one more, so there are T elements in k. This makes a lot of sense if you require T round keys :)
Note that it seems possible to simply redo the subkey derivation for each round if you require a minimum of RAM. The entire l array doesn't seem necessary either. For software implementations that doesn't matter a single iota of course.

Samsung IR codes checksum

I had a long time decoding IR codes with optimum's Ken Shirriff Arduino Library. I modified the code a bit so that I was able to dump a Samsung air conditioner (MH026FB) 56-bit signals.
The results of my work is located in Google Docs document Samsung MH026FB AirCon IR Codes Dump.
It is a spreasheet with all dumped values and the interpretation of results. AFAIK, air conditioner unit sends out two or three "bursts" of 56 bit data, depending on command. I was able to decode bits properly, figuring out where air conditioner temperature, fan, function and other options are located.
The problem I have is related to the checksum. In all those 7-byte codes, the second one is computed somehow from the latter 5 bytes, for example:
BF B2 0F FF FF FF F0 (lead-in code)
7F B8 8A 71 F6 4F F0 (auto mode - 25 degrees)
7F B2 80 71 7A 4F F0 (auto mode - 26 degrees)
7F B4 80 71 FA 7D F0 (heat mode - 26 degrees - fan auto)
Since I re-create the IR codes at runtime, I need to be able to compute checksum for these codes.
I tried with many standard checksum algorithms, none of them gave meaningful results. The checksum seems to be related to number of zeroes in the rest of code (bytes from 3 to 7), but I really can't figure it how.
Is there a solution to this problem?
Ken Shirriff sorted this out. Algorithm is as follow:
Count the number of 1 bits in all the bytes except #2 (checksum)
Compute count mod 15. If the value is 0, use 15 instead.
Take the value from 2, flip the 4 bits, and reverse the 4 bits.
The checksum is Bn where n is the value from the previous step.
Congraturations to him for his smartness and sharpness.
When bit order in bytes/packets and 0/1 are interpreted properly (from the algorithm it appears that both are reversed), the algorithm would be just sum of 0 bits modulo 15.
It is nearly correct.
Count the 0's / 1's (You can call them what you like, but it is the short signals).
Do not count 2. byte and first/last bit of 3.byte (depending if you are seeing it as big or little indian).
Take the result and -30 (29-30 = 15, only looking af 4 bits!)
Reverse result
Checksum = 0x4 "reverse resultesult", if short signals = 0, and 0xB "reverse resultesult" if long signal = 0.
i used Ken's method but mod 15 didnt work for me.
Count the number of 1 bits in all the bytes except #2 (checksum)
Compute count mod 17. if value is 16, use first byte of mode result(0).
Take the value , flip the 4 bits.
The checksum is 0xn9 where n is the value from the previous step.

Calculating FCS(CRC) for HDLC frame

I have the following frame:
7e 01 00 00 01 00 18 ef 00 00 00 b5 20 c1 05 10 02 71 2e 1a c2 05 10 01 71 00 6e 87 02 00 01 42 71 2e 1a 01 96 27 be 27 54 17 3d b9 93 ac 7e
If I understand correctly, then it is this portion of the frame on which the FCS is calculated:
010000010018ef000000b520c1051002712e1ac205100171006e8702000142712e1a019627be2754173db9
I've tried entering this into a number of online calculators but I cant produce 0x93ac from the above data.
http://www.lammertbies.nl/comm/info/crc-calculation.html with input type hex.
How is 0x93ac arrived at?
Thanks,
Barry
Answering rather for others who got here while searching for advice.
The key is what several points in the closely related ITU-T recommendations (e.g. Q.921, available online for quite some time already) say:
1. the lowest order bit is transmitted (and thus received) first
This legacy behaviour is in contrary to the daily life conventions where highest order digits are written first in the order of reading, and all the generic online calculators and libraries perform the calculation using the conventional order and provide optional settings to facilitate the reversed one.
Therefore, you must ask the online calculator
to reverse the order of bits in the message you've input in the "conventional" format before performing the calculation,
to reverse the order of bits of the result so that you get them in
the same order like in the message itself
Quite reasonably, some calculators offer just a single common setting for both.
This reasons the settings "reverse data bytes" and "reverse CRC result before Final XOR" recommended in the previous answer;
2. the result of the CRC calculation must be bit-inverted before sending
Bit inversion is another name of "xor by 0xffff...". There is a purpose in bit-inverting the CRC calculation result before sending it as the message FCS (the last two bytes of the message, the '93 ac' in your example).
See point 4 for details.
This reasons the setting "Final value ffff", whose name is quite misleading as it actually defines the pattern to be for xor'ed with the result of the calculation. As such operation is required by several CRC types, only the xor patterns vary from 0 (no op) through 0xfff... (complete inversion), generic calculators/libraries offer it for simplicity of use.
3. the calculation must include processing of a leading sequence of 0xffff
This reasons the point "initial value ffff".
4. on the receiving (checking) side, it is recommended to push the complete message, i.e. including the FCS, through the CRC calculation, and expect the result to be 0x1d0f
There is some clever thinking behind this:
the intrinsic property of the CRC algorithm is that
CRC( x.CRC(x) )
is always 0 (x represents the original message and "." represents concatenation).
running the complete message through the calculation rather than
calculating only the message itself and comparing with the FCS
received separately means much simpler algorithm (or even circuitry)
at the receiving side.
however, it is too easy to make a coding mistake causing a result to become 0. Luckily, thanks to the CRC algorithm intrinsic properties again,
CRC( x.(CRC(x))' )
yields a constant value independent of x and different from 0 (at least for CRC-CCITT, which we talk about here). The "'" sign represents the bit inversion as required in point 2.
First of all, CRC value is 0xac93
Use this calculator: http://www.zorc.breitbandkatze.de/crc.html
Set CRC order 16
Polynomial 1021
Initial value ffff
Final value ffff
"reverse data bytes"
"reverse CRC result before Final XOR"
Enter your sequence as:
%01%00%00%01%00%18%ef%00%00%00%b5%20%c1%05%10%02%71%2e%1a%c2%05%10%01%71%00%6e%87%02%00%01%42%71%2e%1a%01%96%27%be%27%54%17%3d%b9
Press "calculate" and you get 0xAC93
This is simple Python script for HDLC CRC calculation. You can use it for DLMS
def byte_mirror(c):
c = (c & 0xF0) >> 4 | (c & 0x0F) << 4
c = (c & 0xCC) >> 2 | (c & 0x33) << 2
c = (c & 0xAA) >> 1 | (c & 0x55) << 1
return c
CRC_INIT=0xffff
POLYNOMIAL=0x1021
DATA_VALUE=0xA0
SNRM_request=[ 0x7E, 0xA0, 0x08, 0x03, 0x02, 0xFF, 0x93, 0xCA, 0xE4, 0x7E]
print("sent>>", end=" ")
for x in SNRM_request:
if x>15:
print(hex(x), end=" ")
else:
a=str(hex(x))
a = a[:2] + "0" + a[2:]
print(a, end=" ")
lenn=len(SNRM_request)
print(" ")
crc = CRC_INIT
for i in range(lenn):
if( (i!=0) and (i!=(lenn-1)) and (i!=(lenn-2)) and (i!=(lenn-3)) ):
print("i>>",i)
c=SNRM_request[i]
c=byte_mirror(c)
c = c << 8
print(hex(c))
for j in range(8):
print(hex(c))
print("CRC",hex(crc))
if (crc ^ c) & 0x8000:
crc = (crc << 1) ^ POLYNOMIAL
else:
crc = crc << 1
c = c << 1
crc=crc%65536
c =c%65536
print("CRC-CALC",hex(crc))
crc=0xFFFF-crc
print("CRC- NOT",hex(crc))
crc_HI=crc//256
crc_LO=crc%256
print("CRC-HI",hex(crc_HI))
print("CRC-LO",hex(crc_LO))
crc_HI=byte_mirror(crc_HI)
crc_LO=byte_mirror(crc_LO)
print("CRC-HI-zrc",hex(crc_HI))
print("CRC-LO-zrc",hex(crc_LO))
crc=256*crc_HI+crc_LO
print("CRC-END",hex(crc))
For future readers, there's code in appendix C of RFC1662 to calculate FCS for HDLC.

Hexadecimal calculation for a checksum

I'm not understanding how this result can be zero. This was presented to me has an example to validate a checksum of a message.
ED(12+01+ED=0)
How can this result be zero?
"1201 is the message" ED is the checksum, my question is more on, how can I determine the checksum?
Thank you for any help.
Best regards,
FR
How can this result be zero?
The checksum is presumably represented by a byte.
A byte can store 256 different values, so the calculation is probably done module 256.
Since 0x12 + 0x01 + 0xED = 256, the result becomes 0.
how can I determine the checksum?
The checksum is the specific byte value B that makes the sum of the bytes in the message + B = 0 (modulo 256).
So, as #LanceH says in the comment, to figure out the checksum B, you...
add up the values of the bytes in the message (say it adds up to M)
compute M' = M % 256
Now, the checksum B is computed as 256 - M'.
I'm not sure about your checksum details but in base-16 arithmetic (and in base-10):
base-16 base-10
-----------------------
12 18
01 1
+ ED 237
------------------------
100 256
If your checksum is modulo-256 (16^2), you only keep the last 2 base-16 digits, so you have 00
Well, obviously, when you add up 12 + 01 + ED the result overflows 1 byte, and it's actually the hex number 100. So, if you only take the final byte of 0x0100. you get 0.

Arbitrary-precision arithmetic Explanation

I'm trying to learn C and have come across the inability to work with REALLY big numbers (i.e., 100 digits, 1000 digits, etc.). I am aware that there exist libraries to do this, but I want to attempt to implement it myself.
I just want to know if anyone has or can provide a very detailed, dumbed down explanation of arbitrary-precision arithmetic.
It's all a matter of adequate storage and algorithms to treat numbers as smaller parts. Let's assume you have a compiler in which an int can only be 0 through 99 and you want to handle numbers up to 999999 (we'll only worry about positive numbers here to keep it simple).
You do that by giving each number three ints and using the same rules you (should have) learned back in primary school for addition, subtraction and the other basic operations.
In an arbitrary precision library, there's no fixed limit on the number of base types used to represent our numbers, just whatever memory can hold.
Addition for example: 123456 + 78:
12 34 56
78
-- -- --
12 35 34
Working from the least significant end:
initial carry = 0.
56 + 78 + 0 carry = 134 = 34 with 1 carry
34 + 00 + 1 carry = 35 = 35 with 0 carry
12 + 00 + 0 carry = 12 = 12 with 0 carry
This is, in fact, how addition generally works at the bit level inside your CPU.
Subtraction is similar (using subtraction of the base type and borrow instead of carry), multiplication can be done with repeated additions (very slow) or cross-products (faster) and division is trickier but can be done by shifting and subtraction of the numbers involved (the long division you would have learned as a kid).
I've actually written libraries to do this sort of stuff using the maximum powers of ten that can be fit into an integer when squared (to prevent overflow when multiplying two ints together, such as a 16-bit int being limited to 0 through 99 to generate 9,801 (<32,768) when squared, or 32-bit int using 0 through 9,999 to generate 99,980,001 (<2,147,483,648)) which greatly eased the algorithms.
Some tricks to watch out for.
1/ When adding or multiplying numbers, pre-allocate the maximum space needed then reduce later if you find it's too much. For example, adding two 100-"digit" (where digit is an int) numbers will never give you more than 101 digits. Multiply a 12-digit number by a 3 digit number will never generate more than 15 digits (add the digit counts).
2/ For added speed, normalise (reduce the storage required for) the numbers only if absolutely necessary - my library had this as a separate call so the user can decide between speed and storage concerns.
3/ Addition of a positive and negative number is subtraction, and subtracting a negative number is the same as adding the equivalent positive. You can save quite a bit of code by having the add and subtract methods call each other after adjusting signs.
4/ Avoid subtracting big numbers from small ones since you invariably end up with numbers like:
10
11-
-- -- -- --
99 99 99 99 (and you still have a borrow).
Instead, subtract 10 from 11, then negate it:
11
10-
--
1 (then negate to get -1).
Here are the comments (turned into text) from one of the libraries I had to do this for. The code itself is, unfortunately, copyrighted, but you may be able to pick out enough information to handle the four basic operations. Assume in the following that -a and -b represent negative numbers and a and b are zero or positive numbers.
For addition, if signs are different, use subtraction of the negation:
-a + b becomes b - a
a + -b becomes a - b
For subtraction, if signs are different, use addition of the negation:
a - -b becomes a + b
-a - b becomes -(a + b)
Also special handling to ensure we're subtracting small numbers from large:
small - big becomes -(big - small)
Multiplication uses entry-level math as follows:
475(a) x 32(b) = 475 x (30 + 2)
= 475 x 30 + 475 x 2
= 4750 x 3 + 475 x 2
= 4750 + 4750 + 4750 + 475 + 475
The way in which this is achieved involves extracting each of the digits of 32 one at a time (backwards) then using add to calculate a value to be added to the result (initially zero).
ShiftLeft and ShiftRight operations are used to quickly multiply or divide a LongInt by the wrap value (10 for "real" math). In the example above, we add 475 to zero 2 times (the last digit of 32) to get 950 (result = 0 + 950 = 950).
Then we left shift 475 to get 4750 and right shift 32 to get 3. Add 4750 to zero 3 times to get 14250 then add to result of 950 to get 15200.
Left shift 4750 to get 47500, right shift 3 to get 0. Since the right shifted 32 is now zero, we're finished and, in fact 475 x 32 does equal 15200.
Division is also tricky but based on early arithmetic (the "gazinta" method for "goes into"). Consider the following long division for 12345 / 27:
457
+-------
27 | 12345 27 is larger than 1 or 12 so we first use 123.
108 27 goes into 123 4 times, 4 x 27 = 108, 123 - 108 = 15.
---
154 Bring down 4.
135 27 goes into 154 5 times, 5 x 27 = 135, 154 - 135 = 19.
---
195 Bring down 5.
189 27 goes into 195 7 times, 7 x 27 = 189, 195 - 189 = 6.
---
6 Nothing more to bring down, so stop.
Therefore 12345 / 27 is 457 with remainder 6. Verify:
457 x 27 + 6
= 12339 + 6
= 12345
This is implemented by using a draw-down variable (initially zero) to bring down the segments of 12345 one at a time until it's greater or equal to 27.
Then we simply subtract 27 from that until we get below 27 - the number of subtractions is the segment added to the top line.
When there are no more segments to bring down, we have our result.
Keep in mind these are pretty basic algorithms. There are far better ways to do complex arithmetic if your numbers are going to be particularly large. You can look into something like GNU Multiple Precision Arithmetic Library - it's substantially better and faster than my own libraries.
It does have the rather unfortunate misfeature in that it will simply exit if it runs out of memory (a rather fatal flaw for a general purpose library in my opinion) but, if you can look past that, it's pretty good at what it does.
If you cannot use it for licensing reasons (or because you don't want your application just exiting for no apparent reason), you could at least get the algorithms from there for integrating into your own code.
I've also found that the bods over at MPIR (a fork of GMP) are more amenable to discussions on potential changes - they seem a more developer-friendly bunch.
While re-inventing the wheel is extremely good for your personal edification and learning, its also an extremely large task. I don't want to dissuade you as its an important exercise and one that I've done myself, but you should be aware that there are subtle and complex issues at work that larger packages address.
For example, multiplication. Naively, you might think of the 'schoolboy' method, i.e. write one number above the other, then do long multiplication as you learned in school. example:
123
x 34
-----
492
+ 3690
---------
4182
but this method is extremely slow (O(n^2), n being the number of digits). Instead, modern bignum packages use either a discrete Fourier transform or a Numeric transform to turn this into an essentially O(n ln(n)) operation.
And this is just for integers. When you get into more complicated functions on some type of real representation of number (log, sqrt, exp, etc.) things get even more complicated.
If you'd like some theoretical background, I highly recommend reading the first chapter of Yap's book, "Fundamental Problems of Algorithmic Algebra". As already mentioned, the gmp bignum library is an excellent library. For real numbers, I've used MPFR and liked it.
Don't reinvent the wheel: it might turn out to be square!
Use a third party library, such as GNU MP, that is tried and tested.
You do it in basically the same way you do with pencil and paper...
The number is to be represented in a buffer (array) able to take on an arbitrary size (which means using malloc and realloc) as needed
you implement basic arithmetic as much as possible using language supported structures, and deal with carries and moving the radix-point manually
you scour numeric analysis texts to find efficient arguments for dealing by more complex function
you only implement as much as you need.
Typically you will use as you basic unit of computation
bytes containing with 0-99 or 0-255
16 bit words contaning wither 0-9999 or 0--65536
32 bit words containing...
...
as dictated by your architecture.
The choice of binary or decimal base depends on you desires for maximum space efficiency, human readability, and the presence of absence of Binary Coded Decimal (BCD) math support on your chip.
You can do it with high school level of mathematics. Though more advanced algorithms are used in reality. So for example to add two 1024-byte numbers :
unsigned char first[1024], second[1024], result[1025];
unsigned char carry = 0;
unsigned int sum = 0;
for(size_t i = 0; i < 1024; i++)
{
sum = first[i] + second[i] + carry;
carry = sum - 255;
}
result will have to be bigger by one place in case of addition to take care of maximum values. Look at this :
9
+
9
----
18
TTMath is a great library if you want to learn. It is built using C++. The above example was silly one, but this is how addition and subtraction is done in general!
A good reference about the subject is Computational complexity of mathematical operations. It tells you how much space is required for each operation you want to implement. For example, If you have two N-digit numbers, then you need 2N digits to store the result of multiplication.
As Mitch said, it is by far not an easy task to implement! I recommend you take a look at TTMath if you know C++.
One of the ultimate references (IMHO) is Knuth's TAOCP Volume II. It explains lots of algorithms for representing numbers and arithmetic operations on these representations.
#Book{Knuth:taocp:2,
author = {Knuth, Donald E.},
title = {The Art of Computer Programming},
volume = {2: Seminumerical Algorithms, second edition},
year = {1981},
publisher = {\Range{Addison}{Wesley}},
isbn = {0-201-03822-6},
}
Assuming that you wish to write a big integer code yourself, this can be surprisingly simple to do, spoken as someone who did it recently (though in MATLAB.) Here are a few of the tricks I used:
I stored each individual decimal digit as a double number. This makes many operations simple, especially output. While it does take up more storage than you might wish, memory is cheap here, and it makes multiplication very efficient if you can convolve a pair of vectors efficiently. Alternatively, you can store several decimal digits in a double, but beware then that convolution to do the multiplication can cause numerical problems on very large numbers.
Store a sign bit separately.
Addition of two numbers is mainly a matter of adding the digits, then check for a carry at each step.
Multiplication of a pair of numbers is best done as convolution followed by a carry step, at least if you have a fast convolution code on tap.
Even when you store the numbers as a string of individual decimal digits, division (also mod/rem ops) can be done to gain roughly 13 decimal digits at a time in the result. This is much more efficient than a divide that works on only 1 decimal digit at a time.
To compute an integer power of an integer, compute the binary representation of the exponent. Then use repeated squaring operations to compute the powers as needed.
Many operations (factoring, primality tests, etc.) will benefit from a powermod operation. That is, when you compute mod(a^p,N), reduce the result mod N at each step of the exponentiation where p has been expressed in a binary form. Do not compute a^p first, and then try to reduce it mod N.
Here's a simple ( naive ) example I did in PHP.
I implemented "Add" and "Multiply" and used that for an exponent example.
http://adevsoft.com/simple-php-arbitrary-precision-integer-big-num-example/
Code snip
// Add two big integers
function ba($a, $b)
{
if( $a === "0" ) return $b;
else if( $b === "0") return $a;
$aa = str_split(strrev(strlen($a)>1?ltrim($a,"0"):$a), 9);
$bb = str_split(strrev(strlen($b)>1?ltrim($b,"0"):$b), 9);
$rr = Array();
$maxC = max(Array(count($aa), count($bb)));
$aa = array_pad(array_map("strrev", $aa),$maxC+1,"0");
$bb = array_pad(array_map("strrev", $bb),$maxC+1,"0");
for( $i=0; $i<=$maxC; $i++ )
{
$t = str_pad((string) ($aa[$i] + $bb[$i]), 9, "0", STR_PAD_LEFT);
if( strlen($t) > 9 )
{
$aa[$i+1] = ba($aa[$i+1], substr($t,0,1));
$t = substr($t, 1);
}
array_unshift($rr, $t);
}
return implode($rr);
}

Resources