Using AES in CBC with the same IV for messages - encryption

Will encrypting two identical plaintext messages with AES in CBC with the same IV yield the same ciphertext?
From my understanding the first block is XOR'd with the IV, and then each subsequent block with the previous. Does this mean that with the same IV and identical messages that every block will be encrypted to the same thing? I understand using a predictable or non-changing IV for encryption is a very bad thing to do, and I am wondering why - is it because attackers can build up a "book" of known messages, or because we leave the first block vulnerable to frequency checks?
Thanks

If you use the same key both times, then yes, you'd get identical output. If you use a different key, then you'd get different output (you XOR the previous block with the current block, but then you encrypt the result to produce a block of ciphertext).
That, however, is generally of little help. One of the basic reasons for using something like CBC is to avoid repetition among messages even if they contain the same data and you continue to use the same key (though of course, it's also useful that it avoids patterns within a single message as well). Changing the IV keeps each message unique (even if some of the plaintext content is predictable) without going to all the work of distributing a new key for every message (which would generally be relatively painful).

Related

Storing IV when using AES asymmetric encryption and decryption

I'm looking at an C# AES asymmetric encryption and decryption example here and not sure if i should store the IV in a safe place (also encrypted??). Or i can just attach it to the encrypted text for using later when i with to decrypt. From a short reading about AES it seems it's not needed at all for decryption but i'm not sure i got it right and also the aes.CreateDecryptor(keyBytes, iv) need it as parameter.
I use a single key for all encryptions.
It's fairly standard to transmit the encrypted data as IV.Concat(cipherText). It's also fairly standard to put the IV off to the side, like in PKCS#5.
The IV-on-the-side approach matches more closely with how .NET wants to process the data, since it's somewhat annoying to slice off the IV to pass it separately to the IV parameter (or property), and then to have a more complicated slicing operation with the ciphertext (or recovered plaintext).
But the IV is usually transmitted in the clear either way.
So, glue it together, or make it a separate column... whatever fits your program and structure better.
Answer: IV is necessary for decryption as long as the content has been encrypted with it. You don't need to encrypt or hide the IV. It may be public.
--
The purpose of the IV is to be combined to the key that you are using, so it's like you are encrypting every "block of data" with a different "final key" and then it guarantees that the cipher data (the encrypted one) will always be different along the encryption (and decryption) process.
This is a very good illustration of what happens IF YOU DON'T use IV.
Basically, the encryption process is done by encrypting the input data in blocks. So during the encryption of this example, all the parts of the image that have the same color (let's say the white background) will output the same "cipher data" if you use always the same key, then a pattern can still be found and then you didn't hide the image as desired.
So combining a different extra data (the IV) to the key for each block is like you are using a different "final key" for each block, then you solve your problem.

Why do different implementations of AES produce different output?

I feel I have a pretty good understanding of hash functions and the contracts they entail.
SHA1 on Input X will ALWAYS produce the same output. You could use a Python library, a Java library, or pen and paper. It's a function, it is deterministic. My SHA1 does the same as yours and Alice's and Bob's.
As I understand it, AES is also a function. You put in some values, it spits out the ciphertext.
Why, then, could there ever be fears that Truecrypt (for instance) is "broken"? They're not saying AES is broken, they're saying the program that implements it may be. AES is, in theory, solid. So why can't you just run a file through Truecrypt, run it through a "reference AES" function, and verify that the results are the same? I know it absolutely does not work like that, but I don't know why.
What makes AES different from SHA1 in this way? Why might Truecrypt AES spit out a different file than Schneier-Ifier* AES, when they were both given all the same inputs?
In the end, my question boils down to:
My_SHA1(X) == Bobs_SHA1(X) == ...etc
But TrueCrypt_AES(X) != HyperCrypt_AES(X) != VeraCrypt_AES(X) etc. Why is that? Do all those programs wrap AES, but have different ways of determining stuff like an initialization vector or something?
*this would be the name of my file encryption program if I ever wrote one
In the SHA-1 example you give, there is only a single input to the function, and any correct SHA-1 implementation should produce the same output as any other when provided the same input data.
For AES however things are a bit tricker, and since you don't specify what you mean exactly by "AES", this itself seems likely to be the source of the perceived differences between implementations.
Firstly, "AES" isn't a single algorithm, but a family of algorithms that take different key sizes (128, 192 or 256 bits). AES is also a block cipher, it takes a single block of 128 bits/16 bytes of plaintext input, and encrypts this using the key to produce a single 16 byte block of output.
Of course in practice we often want to encrypt more than 16 bytes of data at once, so we must find a way to repeatedly apply the AES algorithm in order to encrypt all the data. Naively we could split it into 16 byte chunks and encrypt each one in turn, but this mode (described as Electronic Codebook or ECB) turns out to be horribly insecure. Instead, various other more secure modes are usually used, and most of these require an Initialization Vector (IV) which helps to ensure that encrypting the same data with the same key doesn't result in the same ciphertext (which would otherwise leak information).
Most of these modes still operate on fixed-sized blocks of data, but again we often want to encrypt data that isn't a multiple of the block size, so we have to use some form of padding, and again there are various different possibilities for how we pad a message to a length that is a multiple of the block size.
So to put all of this together, two different implementations of "AES" should produce the same output if all of the following are identical:
Plaintext input data
Key (and hence key size)
IV
Mode (including any mode-specific inputs)
Padding
Iridium covered many of the causes for a different output between TrueCrypt and other programs using nominally the same (AES) algorithm. If you are just checking actual initialization vectors, these tend to be done using ECB. It is the only good time to use ECB -- to make sure the algorithm itself is implemented correctly. This is because ECB, while insecure, does work without an IV and therefore makes it easier to check "apples to apples" though other stumbling blocks remain as Iridium pointed out.
With a test vector, the key is specified along with the plain text. And test vectors are specified as exact multiples of the block size. Or more specifically, they tend to be exactly 1 block in size for the plain text. This is done to remove padding and mode from the list of possible differences. So if you use standard test vectors between two AES encryption programs, you eliminate the issue with the plain text data differences, key differences, IV, mode, and padding.
But note you can still have differences. AES is just as deterministic as hashing, so you can get the same result every time with AES just as you can with hashing. It's just that there are more variables to control to get the same output result. One item Iridium did not mention but which can be an issue is endianness of the input (key and plain text). I ran into exactly this when checking a reference implementation of Serpent against TrueCrypt. They gave the same output to the text vectors only if I reversed the key and plain text between them.
To elaborate on that, if you have plain text that is all 16 bytes as 0s, and your key is 31 bytes of 0s and one byte of '33' (in the 256 bit version), if the '33' byte was on the left end of the byte string for the reference implementation, you had to feed TrueCrypt 31 '00' bytes and then the '33' byte on the right-hand side to get the same output. So as I mentioned, an endianness issue.
As for TrueCrypt maybe not being secure even if AES still is, that is absolutely true. I don't know the specifics on TrueCrypt's alleged weaknesses, but let me present a couple ways a program can have AES down right and still be insecure.
One way would be if, after the user keys in their password, the program stores it for the session in an insecure manner. If it is not encrypted in memory or if it encrypts your key using its own internal key but fails to protect that key well enough, you can have Windows write it out on the hard drive plain for all to read if it swaps memory to the hard drive. Or as such swaps are less common than they used to be, unless the TrueCrypt authors protect your key during a session, it is also possible for a malicious program to come and "debug" the key right out of the TrueCrypt software. All without AES being broken at all.
Another way it could be broken (theoretically) would be in a way that makes timing attacks possible. As a simple example, imagine a very basic crypto that takes your 32 bit key and splits it into 2 each chunks of 16 bytes. It then looks at the first chunk by byte. It bit-rotates the plain text right a number of bits corresponding to the value of byte 0 of your key. Then it XORs the plain text with the right-hand 16 bytes of your key. Then it bit-rotates again per byte 1 of your key. And so on, 16 shifts and 16 XORs. Well, if a "bad guy" were able to monitor your CPU's power consumption, they could use side channel attacks to time the CPU and / or measure its power consumption on a per-bit-of-the-key basis. The fact is it would take longer (usually, depending on the code that handles the bit-rotate) to bit-rotate 120 bits than it takes to bit-rotate 121 bits. That difference is tiny, but it is there and it has been proven to leak key information. The XOR steps would probably not leak key info, but half of your key would be known to an attacker with ease based on the above attack, even on an implementation of an unbroken algorithm, if the implementation itself is not done right -- a very difficult thing to do.
So I do not know if TrueCrypt is broken in one of these ways or in some other way altogether. But crypto is a lot harder than it looks. If the people on the inside say it is broken, it is very easy for me to believe them.

Is it possible to split a large AES encrypted string and decrypt the parts one by one?

Due to some platform restrictions our decryption can only handle up to 1 million bytes. The string we receive is larger. Is it possible to somehow split the encrypted data and decrypt the parts?
Yes. You can cut it up into multiples of the block size.
You need to know the block chaining method used. If it is CBC or another one which uses the results of the previous block as the IV for the next block{^1], then you will have to handle saving the IV out of the last block of each batch and use it to feed into the next.
[^1]: so basically anything but ECB and CTR, although even with the latter you'll need to track the correct counter value.
Answer is yes, because AES encrypts and decrypts using blocks of bits. So you can decrypt as the blocks come in, but in the proper order...
Normally you only have to split up your encrypted string if you want to split decryption over multiple processors or threads. Most platforms provide some method of streaming encryption/decryption. If that is not present, it should be relatively easy to create it yourself.

AES, Cipher Block Chaining Mode, Static Initialization Vector, and Changing Data

When using AES (or probably most any cipher), it is bad practice to reuse an initialization vector (IV) for a given key. For example, suppose I encrypt a chunk of data with a given IV using cipher block chaining (CBC) mode. For the next chunk of data, the IV should be changed (e.g., the nonce might be incremented or something). I'm wondering, though, (and mostly out of curiosity) how much of a security risk it is if the same IV is used if it can be guaranteed that the first four bytes of the chunks are incrementing. In other words, suppose two chunks of data to be encrypted are:
0x00000000someotherdatafollowsforsomenumberofblocks
0x00000001someotherdatathatmaydifferormaynotfollows
If the same IV is used for both chunks of data, how much information would be leaked?
In this particular case, it's probably OK (but don't do it, anyway). The "effective IV" is your first encrypted block, which is guaranteed to be different for each message (as long as the nonce truly never repeats under the same key), because the block cipher operation is a bijection. It's also not predictable, as long as you change the key at the same time as you change the "IV", since even with fully predictable input the attacker should not be able to predict the output of the block cipher (block cipher behaves as a pseudo-random function).
It is, however, very fragile. Someone who is maintaining this protocol long after you've moved on to greener pastures might well not understand that the security depends heavily on that non-repeating nonce, and could "optimise" it out. Is sending that single extra block each message for a real IV really an overhead you can't afford?
Mark,
what you describe is pretty much what is proposed in Appendix C of NIST SP800-38a.
In particular, there are two ways to generate an IV:
Generate a new IV randomly for
each message.
For each message use a new unique nonce (this may be a counter), encrypt the nonce, and use the result as IV.
The second option looks very similar to what you are proposing.
Well, that depends on the block size of the encryption algorithm. For the usual block size of 64 bytes i dont think that would make any difference. The first bits would be the same for many blocks, before entering the block cipher, but the result would not have any recognisable pattern. For block sizes < 4 bytes (i dont think that happens) it would make a difference, because the first block(s) would always be the same, leaking information about patterns. Just my opinion.
edit:
Found this
"For CBC and CFB, reusing an IV leaks some information about the first block
of plaintext, and about any common prefix shared by the two messages"
Source: lectures of my university :)

Should I use an initialization vector (IV) along with my encryption?

Is it recommended that I use an initialization vector to encrypt/decrypt my data? Will it make things more secure? Is it one of those things that need to be evaluated on a case by case basis?
To put this into actual context, the Win32 Cryptography function, CryptSetKeyParam allows for the setting of an initialization vector on a key prior to encrypting/decrypting. Other API's also allow for this.
What is generally recommended and why?
An IV is essential when the same key might ever be used to encrypt more than one message.
The reason is because, under most encryption modes, two messages encrypted with the same key can be analyzed together. In a simple stream cipher, for instance, XORing two ciphertexts encrypted with the same key results in the XOR of the two messages, from which the plaintext can be easily extracted using traditional cryptanalysis techniques.
A weak IV is part of what made WEP breakable.
An IV basically mixes some unique, non-secret data into the key to prevent the same key ever being used twice.
In most cases you should use IV. Since IV is generated randomly each time, if you encrypt same data twice, encrypted messages are going to be different and it will be impossible for the observer to say if this two messages are the same.
Take a good look at a picture (see below) of CBC mode. You'll quickly realize that an attacker knowing the IV is like the attacker knowing a previous block of ciphertext (and yes they already know plenty of that).
Here's what I say: most of the "problems" with IV=0 are general problems with block encryption modes when you don't ensure data integrity. You really must ensure integrity.
Here's what I do: use a strong checksum (cryptographic hash or HMAC) and prepend it to your plaintext before encrypting. There's your known first block of ciphertext: it's the IV of the same thing without the checksum, and you need the checksum for a million other reasons.
Finally: any analogy between CBC and stream ciphers is not terribly insightful IMHO.
Just look at the picture of CBC mode, I think you'll be pleasantly surprised.
Here's a picture:
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
link text
If the same key is used multiple times for multiple different secrets patterns could emerge in the encrypted results. The IV, that should be pseudo random and used only once with each key, is there to obfuscate the result. You should never use the same IV with the same key twice, that would defeat the purpose of it.
To not have to bother keeping track of the IV the simplest thing is to prepend, or append it, to the resulting encrypted secret. That way you don't have to think much about it. You will then always know that the first or last N bits is the IV.
When decrypting the secret you just split out the IV, and then use it together with the key to decrypt the secret.
I found the writeup of HTTP Digest Auth (RFC 2617) very helpful in understanding the use and need for IVs / nonces.
Is it one of those things that need to be evaluated on a case by case
basis?
Yes, it is. Always read up on the cipher you are using and how it expects its inputs to look. Some ciphers don't use IVs but do require salts to be secure. IVs can be of different lengths. The mode of the cipher can change what the IV is used for (if it is used at all) and, as a result, what properties it needs to be secure (random, unique, incremental?).
It is generally recommended because most people are used to using AES-256 or similar block ciphers in a mode called 'Cipher Block Chaining'. That's a good, sensible default go-to for a lot of engineering uses and it needs you to have an appropriate (non-repeating) IV. In that instance, it's not optional.
The IV allows for plaintext to be encrypted such that the encrypted text is harder to decrypt for an attacker. Each bit of IV you use will double the possibilities of encrypted text from a given plain text.
For example, let's encrypt 'hello world' using an IV one character long. The IV is randomly selected to be 'x'. The text that is then encrypted is then 'xhello world', which yeilds, say, 'asdfghjkl'. If we encrypt it again, first generate a new IV--say we get 'b' this time--and encrypt like normal (thus encrypting 'bhello world'). This time we get 'qwertyuio'.
The point is that the attacker doesn't know what the IV is and therefore must compute every possible IV for a given plain text to find the matching cipher text. In this way, the IV acts like a password salt. Most commonly, an IV is used with a chaining cipher (either a stream or block cipher). In a chaining block cipher, the result of each block of plain text is fed to the cipher algorithm to find the cipher text for the next block. In this way, each block is chained together.
So, if you have a random IV used to encrypt the plain text, how do you decrypt it? Simple. Pass the IV (in plain text) along with your encrypted text. Using our fist example above, the final cipher text would be 'xasdfghjkl' (IV + cipher text).
Yes you should use an IV, but be sure to choose it properly. Use a good random number source to make it. Don't ever use the same IV twice. And never use a constant IV.
The Wikipedia article on initialization vectors provides a general overview.

Resources