When using AES, is there a way to tell if data was encrypted using 128 or 256 bit keys? - encryption

I was wondering if there is some way to tell if data was encrypted with a specific key size, without the source code of course. Is there any detectable differences with the data that you can check post encryption?

No there is not any way to do that. Both encrypt 16-byte chunks of data and the resulting blocks would "look" the same after the encryption is complete (they would have different values, but an analysis on only the encrypted data would not be able to determine the original key size). If the original data (plain text) is available, it may be possible to do some kind of analysis.
A very simplistic "proof" is:
For a given input, the length of the output is the same regardless of the key size. It may, however, differ depending on the mode (CBC, CTR, etc.).
Since the encryption is reversible, it can be considered to be a one-to-one function. In other words, a different input results in a different output.
Therefore, it is possible to produce any given output (by changing the plain text) regardless of the key size.
Thus, for a given password, you could end up with the same output by using the appropriate plain text regardless of the key size. This "proof" has a hole in that padding schemes can result in a longer output than input (so the function is not necessarily onto.) But I doubt this would make a difference in the end result.

If an encryption system is any good (AES is) then there should be no way to distinguish its raw output from random data -- so, in particular, there should be no way to distinguish between AES-128 and AES-256, at least on the output bits.
However, most protocols which use encryption end up including some metadata which designates, without ambiguity, the kind of algorithm which was used, including key size. This is to that the receiver knows what to use to decrypt. This is not considered to be an issue. So, in practice, one has to assume that whatever attacker looks at your system knows whether the key is actually a 128-bit or 256-bit key.
Some side channels may give that information, too. AES encryption with a 256-bit key is 40% slower than AES encryption with a 128-bit key: simply timing how much time an encrypting server takes to respond can reveal the key size.

Related

Storing IV when using AES asymmetric encryption and decryption

I'm looking at an C# AES asymmetric encryption and decryption example here and not sure if i should store the IV in a safe place (also encrypted??). Or i can just attach it to the encrypted text for using later when i with to decrypt. From a short reading about AES it seems it's not needed at all for decryption but i'm not sure i got it right and also the aes.CreateDecryptor(keyBytes, iv) need it as parameter.
I use a single key for all encryptions.
It's fairly standard to transmit the encrypted data as IV.Concat(cipherText). It's also fairly standard to put the IV off to the side, like in PKCS#5.
The IV-on-the-side approach matches more closely with how .NET wants to process the data, since it's somewhat annoying to slice off the IV to pass it separately to the IV parameter (or property), and then to have a more complicated slicing operation with the ciphertext (or recovered plaintext).
But the IV is usually transmitted in the clear either way.
So, glue it together, or make it a separate column... whatever fits your program and structure better.
Answer: IV is necessary for decryption as long as the content has been encrypted with it. You don't need to encrypt or hide the IV. It may be public.
--
The purpose of the IV is to be combined to the key that you are using, so it's like you are encrypting every "block of data" with a different "final key" and then it guarantees that the cipher data (the encrypted one) will always be different along the encryption (and decryption) process.
This is a very good illustration of what happens IF YOU DON'T use IV.
Basically, the encryption process is done by encrypting the input data in blocks. So during the encryption of this example, all the parts of the image that have the same color (let's say the white background) will output the same "cipher data" if you use always the same key, then a pattern can still be found and then you didn't hide the image as desired.
So combining a different extra data (the IV) to the key for each block is like you are using a different "final key" for each block, then you solve your problem.

Why do different implementations of AES produce different output?

I feel I have a pretty good understanding of hash functions and the contracts they entail.
SHA1 on Input X will ALWAYS produce the same output. You could use a Python library, a Java library, or pen and paper. It's a function, it is deterministic. My SHA1 does the same as yours and Alice's and Bob's.
As I understand it, AES is also a function. You put in some values, it spits out the ciphertext.
Why, then, could there ever be fears that Truecrypt (for instance) is "broken"? They're not saying AES is broken, they're saying the program that implements it may be. AES is, in theory, solid. So why can't you just run a file through Truecrypt, run it through a "reference AES" function, and verify that the results are the same? I know it absolutely does not work like that, but I don't know why.
What makes AES different from SHA1 in this way? Why might Truecrypt AES spit out a different file than Schneier-Ifier* AES, when they were both given all the same inputs?
In the end, my question boils down to:
My_SHA1(X) == Bobs_SHA1(X) == ...etc
But TrueCrypt_AES(X) != HyperCrypt_AES(X) != VeraCrypt_AES(X) etc. Why is that? Do all those programs wrap AES, but have different ways of determining stuff like an initialization vector or something?
*this would be the name of my file encryption program if I ever wrote one
In the SHA-1 example you give, there is only a single input to the function, and any correct SHA-1 implementation should produce the same output as any other when provided the same input data.
For AES however things are a bit tricker, and since you don't specify what you mean exactly by "AES", this itself seems likely to be the source of the perceived differences between implementations.
Firstly, "AES" isn't a single algorithm, but a family of algorithms that take different key sizes (128, 192 or 256 bits). AES is also a block cipher, it takes a single block of 128 bits/16 bytes of plaintext input, and encrypts this using the key to produce a single 16 byte block of output.
Of course in practice we often want to encrypt more than 16 bytes of data at once, so we must find a way to repeatedly apply the AES algorithm in order to encrypt all the data. Naively we could split it into 16 byte chunks and encrypt each one in turn, but this mode (described as Electronic Codebook or ECB) turns out to be horribly insecure. Instead, various other more secure modes are usually used, and most of these require an Initialization Vector (IV) which helps to ensure that encrypting the same data with the same key doesn't result in the same ciphertext (which would otherwise leak information).
Most of these modes still operate on fixed-sized blocks of data, but again we often want to encrypt data that isn't a multiple of the block size, so we have to use some form of padding, and again there are various different possibilities for how we pad a message to a length that is a multiple of the block size.
So to put all of this together, two different implementations of "AES" should produce the same output if all of the following are identical:
Plaintext input data
Key (and hence key size)
IV
Mode (including any mode-specific inputs)
Padding
Iridium covered many of the causes for a different output between TrueCrypt and other programs using nominally the same (AES) algorithm. If you are just checking actual initialization vectors, these tend to be done using ECB. It is the only good time to use ECB -- to make sure the algorithm itself is implemented correctly. This is because ECB, while insecure, does work without an IV and therefore makes it easier to check "apples to apples" though other stumbling blocks remain as Iridium pointed out.
With a test vector, the key is specified along with the plain text. And test vectors are specified as exact multiples of the block size. Or more specifically, they tend to be exactly 1 block in size for the plain text. This is done to remove padding and mode from the list of possible differences. So if you use standard test vectors between two AES encryption programs, you eliminate the issue with the plain text data differences, key differences, IV, mode, and padding.
But note you can still have differences. AES is just as deterministic as hashing, so you can get the same result every time with AES just as you can with hashing. It's just that there are more variables to control to get the same output result. One item Iridium did not mention but which can be an issue is endianness of the input (key and plain text). I ran into exactly this when checking a reference implementation of Serpent against TrueCrypt. They gave the same output to the text vectors only if I reversed the key and plain text between them.
To elaborate on that, if you have plain text that is all 16 bytes as 0s, and your key is 31 bytes of 0s and one byte of '33' (in the 256 bit version), if the '33' byte was on the left end of the byte string for the reference implementation, you had to feed TrueCrypt 31 '00' bytes and then the '33' byte on the right-hand side to get the same output. So as I mentioned, an endianness issue.
As for TrueCrypt maybe not being secure even if AES still is, that is absolutely true. I don't know the specifics on TrueCrypt's alleged weaknesses, but let me present a couple ways a program can have AES down right and still be insecure.
One way would be if, after the user keys in their password, the program stores it for the session in an insecure manner. If it is not encrypted in memory or if it encrypts your key using its own internal key but fails to protect that key well enough, you can have Windows write it out on the hard drive plain for all to read if it swaps memory to the hard drive. Or as such swaps are less common than they used to be, unless the TrueCrypt authors protect your key during a session, it is also possible for a malicious program to come and "debug" the key right out of the TrueCrypt software. All without AES being broken at all.
Another way it could be broken (theoretically) would be in a way that makes timing attacks possible. As a simple example, imagine a very basic crypto that takes your 32 bit key and splits it into 2 each chunks of 16 bytes. It then looks at the first chunk by byte. It bit-rotates the plain text right a number of bits corresponding to the value of byte 0 of your key. Then it XORs the plain text with the right-hand 16 bytes of your key. Then it bit-rotates again per byte 1 of your key. And so on, 16 shifts and 16 XORs. Well, if a "bad guy" were able to monitor your CPU's power consumption, they could use side channel attacks to time the CPU and / or measure its power consumption on a per-bit-of-the-key basis. The fact is it would take longer (usually, depending on the code that handles the bit-rotate) to bit-rotate 120 bits than it takes to bit-rotate 121 bits. That difference is tiny, but it is there and it has been proven to leak key information. The XOR steps would probably not leak key info, but half of your key would be known to an attacker with ease based on the above attack, even on an implementation of an unbroken algorithm, if the implementation itself is not done right -- a very difficult thing to do.
So I do not know if TrueCrypt is broken in one of these ways or in some other way altogether. But crypto is a lot harder than it looks. If the people on the inside say it is broken, it is very easy for me to believe them.

AES Encryption and Obfuscating IDs

I was considering hashing small blocks of sensitive ID data but I require to maintain the full uniqueness of the data blocks as a whole once obfuscated.
So, I came up with the idea of encrypting some publicly-known input data (say, 128 bits of zeroes), and use the data I want to obfuscate as the key/password, then throw it away, thus protecting the original data from ever being discovered.
I already know about hashing algorithms, but my problem is that I need to maintain full uniqueness (generally speaking a 1:1 mapping of input to output) while still making it impossible to retrieve the actual input. A hash cannot serve this function because information is lost during the process.
It is not necessary that the data be retrieved once "encrypted". It is only to be used as an ID number from then on.
An actual GUID/UUID is not suitable here because I need to manually control the identifiers on a per-identifier basis. The IDs cannot be unknown or arbitrarily generated data.
EDIT: To clarify exactly what these identifiers are made of:
(unencrypted) 64bit Time Stamp
ID Generation Counter (one count for each filetype)
Random Data (to make multiple encrypted keys dissimilar)
MAC Address (or if that's not available, set top bit + random digits)
Other PC-Specific Information (from registry)
The whole thing should add up to 192 bits, but the encrypted section's content size(s) could vary (this is by no means a final specification).
Given:
A static IV value
Any arbitrary 128bit key
A static 128 bits of input
Are AES keys treated in a fashion that would result in a 1:1 key<---->output mapping, given the same input and IV value?
No. AES is, in the abstract, a family of permutations of which you select a random one with the key. It is the case that for one of those permutations(i.e. for encryption under a given AES key) you will not get collisions because permutations are bijective.
However, for two different permutations (i.e. encryption under different AES keys, which is what you have), there is no guarantee what so ever that you don't get a collision. Indeed, because of the birthday paradox, the likelihood of a collision is probably higher than you think.
If your ID's are short ( < 1024 bits) you could just do an RSA encryption of them which would give you want you want. You'd just need to forget the private key.

Can information encoded with a one time pad be distinguished from random noise?

I understand that the cyphertext from a properly used one time pad cypher reveals absolutely no data about the encrypted message.
Does this mean that there is no way to distinguish a message encrypted with a one time pad from completely random noise? Or is there some theoretical way to determine that there is a message, even though you can't learn anything about it?
There is no way to determine if a string has been encrypted with a OTP. You can produce any string of the same size by choosing an appropriate key.
For example (from the Wikipedia One Time Pad article), the plaintext "HELLO" can be encrypted with the key "XMCKL", giving ciphertext "EQNVZ". But it is possible to find keys which produce any 5 character plaintext, such as "LATER". There is no way to determine the original plaintext without the original key.
A OTP can be 'broken' if it is reused (and therefore is no longer a one time pad). The Venona Project is an example of what can happen when OTPs are reused.
A major drawback to OTPs is that you must securely distribute a key equal in size to the plaintext to be encoded.
If your one-time pad is completely random, then anything XOR'd with it also is (assuming your message has no/low correlation with the contents of the one-time pad).

AES, Cipher Block Chaining Mode, Static Initialization Vector, and Changing Data

When using AES (or probably most any cipher), it is bad practice to reuse an initialization vector (IV) for a given key. For example, suppose I encrypt a chunk of data with a given IV using cipher block chaining (CBC) mode. For the next chunk of data, the IV should be changed (e.g., the nonce might be incremented or something). I'm wondering, though, (and mostly out of curiosity) how much of a security risk it is if the same IV is used if it can be guaranteed that the first four bytes of the chunks are incrementing. In other words, suppose two chunks of data to be encrypted are:
0x00000000someotherdatafollowsforsomenumberofblocks
0x00000001someotherdatathatmaydifferormaynotfollows
If the same IV is used for both chunks of data, how much information would be leaked?
In this particular case, it's probably OK (but don't do it, anyway). The "effective IV" is your first encrypted block, which is guaranteed to be different for each message (as long as the nonce truly never repeats under the same key), because the block cipher operation is a bijection. It's also not predictable, as long as you change the key at the same time as you change the "IV", since even with fully predictable input the attacker should not be able to predict the output of the block cipher (block cipher behaves as a pseudo-random function).
It is, however, very fragile. Someone who is maintaining this protocol long after you've moved on to greener pastures might well not understand that the security depends heavily on that non-repeating nonce, and could "optimise" it out. Is sending that single extra block each message for a real IV really an overhead you can't afford?
Mark,
what you describe is pretty much what is proposed in Appendix C of NIST SP800-38a.
In particular, there are two ways to generate an IV:
Generate a new IV randomly for
each message.
For each message use a new unique nonce (this may be a counter), encrypt the nonce, and use the result as IV.
The second option looks very similar to what you are proposing.
Well, that depends on the block size of the encryption algorithm. For the usual block size of 64 bytes i dont think that would make any difference. The first bits would be the same for many blocks, before entering the block cipher, but the result would not have any recognisable pattern. For block sizes < 4 bytes (i dont think that happens) it would make a difference, because the first block(s) would always be the same, leaking information about patterns. Just my opinion.
edit:
Found this
"For CBC and CFB, reusing an IV leaks some information about the first block
of plaintext, and about any common prefix shared by the two messages"
Source: lectures of my university :)

Resources