Storing the Initialization Vector - Separate field? - encryption

When encryption sensitive information using the .NET AesCryptoServiceProvider library I generate a unique Initialization Vector (IV) for each value that is encrypted. In the database record where I save the encrypted data I have a field named "IV" which stores the Initialization Vector for use in later decryption.
Is there a different way in which the Initialization Vector can be stored alongside the cipher-text? By appending the IV to the Cipher Text perhaps? If so, is there a standard approach?

Is there a different way in which the Initialization Vector can be stored alongside the cipher-text? By appending the IV to the Cipher Text perhaps?
Yes, you can do exactly that. Prepending it to the ciphertext works. Since the IV has fixed size depending on block mode, block cipher and protocol, you can slice the IV off during decryption and treat the remaining bytes as the actual ciphertext.
If so, is there a standard approach?
No, there is no standard. A common way is to prepend the IV. If you're applying the Cryptographic Message Standard (CMS), then there is a little bit about how the IV is stored. RFC3370

Related

Storing IV when using AES asymmetric encryption and decryption

I'm looking at an C# AES asymmetric encryption and decryption example here and not sure if i should store the IV in a safe place (also encrypted??). Or i can just attach it to the encrypted text for using later when i with to decrypt. From a short reading about AES it seems it's not needed at all for decryption but i'm not sure i got it right and also the aes.CreateDecryptor(keyBytes, iv) need it as parameter.
I use a single key for all encryptions.
It's fairly standard to transmit the encrypted data as IV.Concat(cipherText). It's also fairly standard to put the IV off to the side, like in PKCS#5.
The IV-on-the-side approach matches more closely with how .NET wants to process the data, since it's somewhat annoying to slice off the IV to pass it separately to the IV parameter (or property), and then to have a more complicated slicing operation with the ciphertext (or recovered plaintext).
But the IV is usually transmitted in the clear either way.
So, glue it together, or make it a separate column... whatever fits your program and structure better.
Answer: IV is necessary for decryption as long as the content has been encrypted with it. You don't need to encrypt or hide the IV. It may be public.
--
The purpose of the IV is to be combined to the key that you are using, so it's like you are encrypting every "block of data" with a different "final key" and then it guarantees that the cipher data (the encrypted one) will always be different along the encryption (and decryption) process.
This is a very good illustration of what happens IF YOU DON'T use IV.
Basically, the encryption process is done by encrypting the input data in blocks. So during the encryption of this example, all the parts of the image that have the same color (let's say the white background) will output the same "cipher data" if you use always the same key, then a pattern can still be found and then you didn't hide the image as desired.
So combining a different extra data (the IV) to the key for each block is like you are using a different "final key" for each block, then you solve your problem.

AES Encryption - Key versus IV

The application I am working on lets the user encrypt files. The files could be of any format (spreadsheet, document, presentation, etc.).
For the specified input file, I create two output files - an encrypted data file and a key file. You need both these files to obtain your original data. The key file must work only on the corresponding data file. It should not work on any other file, either from the same user or from any other user.
AES algorithm requires two different parameters for encryption, a key and an initialization vector (IV).
I see three choices for creating the key file:
Embed hard-coded IV within the application and save the key in the key file.
Embed hard-coded key within the application and save the IV in the key file.
Save both the key and the IV in the key file.
Note that it is the same application that is used by different customers.
It appears all three choices would achieve the same end goal. However, I would like to get your feedback on what the right approach should be.
As you can see from the other answers, having a unique IV per encrypted file is crucial, but why is that?
First - let's review why a unique IV per encrypted file is important. (Wikipedia on IV). The IV adds randomness to your start of your encryption process. When using a chained block encryption mode (where one block of encrypted data incorporates the prior block of encrypted data) we're left with a problem regarding the first block, which is where the IV comes in.
If you had no IV, and used chained block encryption with just your key, two files that begin with identical text will produce identical first blocks. If the input files changed midway through, then the two encrypted files would begin to look different beginning at that point and through to the end of the encrypted file. If someone noticed the similarity at the beginning, and knew what one of the files began with, he could deduce what the other file began with. Knowing what the plaintext file began with and what it's corresponding ciphertext is could allow that person to determine the key and then decrypt the entire file.
Now add the IV - if each file used a random IV, their first block would be different. The above scenario has been thwarted.
Now what if the IV were the same for each file? Well, we have the problem scenario again. The first block of each file will encrypt to the same result. Practically, this is no different from not using the IV at all.
So now let's get to your proposed options:
Option 1. Embed hard-coded IV within the application and save the key in the key file.
Option 2. Embed hard-coded key within the application and save the IV in the key file.
These options are pretty much identical. If two files that begin with the same text produce encrypted files that begin with identical ciphertext, you're hosed. That would happen in both of these options. (Assuming there's one master key used to encrypt all files).
Option 3. Save both the key and the IV in the key file.
If you use a random IV for each key file, you're good. No two key files will be identical, and each encrypted file must have it's key file. A different key file will not work.
PS: Once you go with option 3 and random IV's - start looking into how you'll determine if decryption was successful. Take a key file from one file, and try using it to decrypt a different encryption file. You may discover that decryption proceeds and produces in garbage results. If this happens, begin research into authenticated encryption.
The important thing about an IV is you must never use the same IV for two messages. Everything else is secondary - if you can ensure uniqueness, randomness is less important (but still a very good thing to have!). The IV does not need to be (and indeed, in CBC mode cannot be) secret.
As such, you should not save the IV alongside the key - that would imply you use the same IV for every message, which defeats the point of having an IV. Typically you would simply prepend the IV to the encrypted file, in the clear.
If you are going to be rolling your own cipher modes like this, please read the relevant standards. The NIST has a good document on cipher modes here: http://dx.doi.org/10.6028/NIST.SP.800-38A IV generation is documented in Appendix C. Cryptography is a subtle art. Do not be tempted to create variations on the normal cipher modes; 99% of the time you will create something that looks more secure, but is actually less secure.
When you use an IV, the most important thing is that the IV should be as unique as possible, so in practice you should use a random IV. This means embedding it in your application is not an option. I would save the IV in the data file, as it does not harm security as long as the IV is random/unique.
Key/Iv pairs likely the most confused in the world of encryption. Simply put, password = key + iv. Meaning you need matching key and iv to decrypt an encrypted message. The internet seems to imply you only need iv to encrypt and toss it away but its also required to decrypt. The reason for spitting the key/iv values is to make it possible to encrypt same messages with the same key but use different Iv to get unequal encrypted messages. So, Encrypt("message", key, iv) != Encrypt("message", key, differentIv). The idea is to use a new random Iv value every time a message is encrypted. But how do you manage an ever changing Iv value? There's a million possibilities but the most logical way is to embed the 16 byte Iv within the encrypted message itself. So, encrypted = Iv + encryptedMessage. This way the contently changing Iv value can be pulled and removed from the encrypted message then decrypted. So decryptedMessage = Decrypt("messageWithoutIv", key, IvFromEncryptedMessage). Alternatively if storing encrypted messages in a database Iv could be stored in a field there. Although its true Iv is part of the secret, its tiny in comparison to the 32 bit key and is never reused so it is practically safe to expose publicly. Keep in mind, iv has nothing to do with encruotion, it has to do with masking encryption of messages having the same content.
IV is used for increase the security via randomness, but that does not mean it is used by all algorithm, i.e.
The trick thing is how long should the IV be? Usually it is the same size as the block size, or cipher size. For example, AES would have 16 bytes for IV. Besides, IV type can also be selected, i.e. eseqiv, seqiv, chainiv ...

Key salt and initial value AES

I am creating an encryption scheme with AES in cbc mode with a 256-bit key. Before I learned about CBC mode and initial values, I was planning on creating a 32-bit salt for each act of encryption and storing the salt. The password/entered key would then be padded with this salt up to 32 bits.
ie. if the pass/key entered was "tree," instead of padding it with 28 0s, it would be padded with the first 28 chars of this salt.
However, this was before I learned of the iv, also called a salt in some places. The question for me has now arisen as to whether or not this earlier method of salting has become redundant in principle with the IV. This would be to assume that the salt and the iv would be stored with the cipher text and so a theoretical brute force attack would not be deterred any.
Storing this key and using it rather than 0s is a step that involves some effort, so it is worth asking I think whether or not it is a practically useless measure. It is not as though there could be made, with current knowledge, any brute-force decryption tables for AES, and even a 16 bit salt pains the creation of md5 tables.
Thanks,
Elijah
It's good that you know CBC, as it is certainly better than using ECB mode encryption (although even better modes such as the authenticated modes GCM and EAX exist as well).
I think there are several things that you should know about, so I'll explain them here.
Keys and passwords are not the same. Normally you create a key used for symmetric encryption out of a password using a key derivation function. The most common one discussed here is PBKDF2 (password based key derivation function #2), which is used for PBE (password based encryption). This is defined in the latest, open PKCS#5 standard by RSA labs. Before entering the password need to check if the password is correctly translated into bytes (character encoding).
The salt is used as another input of the key derivation function. It is used to prevent brute force attacks using "rainbow tables" where keys are pre-computed for specific passwords. Because of the salt, the attacker cannot use pre-computed values, as he cannot generate one for each salt. The salt should normally be 8 bytes (64 bits) or longer; using a 128 bit salt would give you optimum security. The salt also ensures that identical passwords (of different users) do not derive the same key.
The output of the key derivation function is a secret of dkLen bytes, where dkLen is the length of the key to generate, in bytes. As an AES key does not contain anything other than these bytes, the AES key will be identical to the generated secret. dkLen should be 16, 24 or 32 bytes for the key lengths of AES: 128, 192 or 256 bits.
OK, so now you finally have an AES key to use. However, if you simply encrypt each plain text block with this key, you will get identical result if the plain text blocks are identical. CBC mode gets around this by XOR'ing the next plain text block with the last encrypted block before doing the encryption. That last encrypted block is the "vector". This does not work for the first block, because there is no last encrypted block. This is why you need to specify the first vector: the "initialization vector" or IV.
The block size of AES is 16 bytes independent of the key size. So the vectors, including the initialization vector, need to be 16 bytes as well. Now, if you only use the key to encrypt e.g. a single file, then the IV could simply contain 16 bytes with the value 00h.
This does not work for multiple files, because if the files contain the same text, you will be able to detect that the first part of the encrypted file is identical. This is why you need to specify a different IV for each encryption you perform with the key. It does not matter what it contains, as long as it is unique, 16 bytes and known to the application performing the decryption.
[EDIT 6 years later] The above part is not entirely correct: for CBC the IV needs to be unpredictable to an attacker, it doesn't just need to be unique. So for instance a counter cannot be used.
Now there is one trick that might allow you to use all zero's for the IV all the time: for each plain text you encrypt using AES-CBC, you could calculate a key using the same password but a different salt. In that case, you will only use the resulting key for a single piece of information. This might be a good idea if you cannot provide an IV for a library implementing password based encryption.
[EDIT] Another commonly used trick is to use additional output of PBKDF2 to derive the IV. This way the official recommendation that the IV for CBC should not be predicted by an adversary is fulfilled. You should however make sure that you do not ask for more output of the PBKDF2 function than that the underlying hash function can deliver. PBKDF2 has weaknesses that would enable an adversary to gain an advantage in such a situation. So do not ask for more than 256 bits if SHA-256 is used as hash function for PBKDF2. Note that SHA-1 is the common default for PBKDF2 so that only allows for a single 128 bit AES key.
IV's and salts are completely separate terms, although often confused. In your question, you also confuse bits and bytes, key size and block size and rainbow tables with MD5 tables (nobody said crypto is easy). One thing is certain: in cryptography it pays to be as secure as possible; redundant security is generally not a problem, unless you really (really) cannot afford the extra resources.
When you understand how this all works, I would seriously you to find a library that performs PBE encryption. You might just need to feed this the password, salt, plain data and - if separately configured- the IV.
[Edit] You should probably look for a library that uses Argon2 by now. PBKDF2 is still considered secure, but it does give unfair advantage to an attacker in some cases, letting the attacker perform fewer calculations than the regular user of the function. That's not a good property for a PBKDF / password hash.
If you are talking about AES-CBC then it is an Initialisation Vector (IV), not Salt. It is common practice to send the IV in clear as the first block of the encyphered message. The IV does not need to be kept secret. It should however be changed with every message - a constant IV means that effectively your first block is encrypted in ECB mode, which is not properly secure.

Should the initialization vector be required to be able to decrypt my data?

I'm using RijndaelManaged to encrypt and decrypt data. I may well have misunderstood the point of an initialization vector, but I am finding that if I set it to a different value when decrypting my data, all but the first 16 characters are still decrypted correctly. Is that expected behaviour?
Yes. In CBC mode each cyphertext block is used as the IV for the next cyphertext block. Using a faulty IV will mess up the first 16 byte block, but subsequent blocks will be unaffected. This can be a useful property as it allows error recovery after a faulty block, which can be important in some situations. It also illustrates why it is not really necessary to keep the IV secret (unlike the key!).

Should I use an initialization vector (IV) along with my encryption?

Is it recommended that I use an initialization vector to encrypt/decrypt my data? Will it make things more secure? Is it one of those things that need to be evaluated on a case by case basis?
To put this into actual context, the Win32 Cryptography function, CryptSetKeyParam allows for the setting of an initialization vector on a key prior to encrypting/decrypting. Other API's also allow for this.
What is generally recommended and why?
An IV is essential when the same key might ever be used to encrypt more than one message.
The reason is because, under most encryption modes, two messages encrypted with the same key can be analyzed together. In a simple stream cipher, for instance, XORing two ciphertexts encrypted with the same key results in the XOR of the two messages, from which the plaintext can be easily extracted using traditional cryptanalysis techniques.
A weak IV is part of what made WEP breakable.
An IV basically mixes some unique, non-secret data into the key to prevent the same key ever being used twice.
In most cases you should use IV. Since IV is generated randomly each time, if you encrypt same data twice, encrypted messages are going to be different and it will be impossible for the observer to say if this two messages are the same.
Take a good look at a picture (see below) of CBC mode. You'll quickly realize that an attacker knowing the IV is like the attacker knowing a previous block of ciphertext (and yes they already know plenty of that).
Here's what I say: most of the "problems" with IV=0 are general problems with block encryption modes when you don't ensure data integrity. You really must ensure integrity.
Here's what I do: use a strong checksum (cryptographic hash or HMAC) and prepend it to your plaintext before encrypting. There's your known first block of ciphertext: it's the IV of the same thing without the checksum, and you need the checksum for a million other reasons.
Finally: any analogy between CBC and stream ciphers is not terribly insightful IMHO.
Just look at the picture of CBC mode, I think you'll be pleasantly surprised.
Here's a picture:
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
link text
If the same key is used multiple times for multiple different secrets patterns could emerge in the encrypted results. The IV, that should be pseudo random and used only once with each key, is there to obfuscate the result. You should never use the same IV with the same key twice, that would defeat the purpose of it.
To not have to bother keeping track of the IV the simplest thing is to prepend, or append it, to the resulting encrypted secret. That way you don't have to think much about it. You will then always know that the first or last N bits is the IV.
When decrypting the secret you just split out the IV, and then use it together with the key to decrypt the secret.
I found the writeup of HTTP Digest Auth (RFC 2617) very helpful in understanding the use and need for IVs / nonces.
Is it one of those things that need to be evaluated on a case by case
basis?
Yes, it is. Always read up on the cipher you are using and how it expects its inputs to look. Some ciphers don't use IVs but do require salts to be secure. IVs can be of different lengths. The mode of the cipher can change what the IV is used for (if it is used at all) and, as a result, what properties it needs to be secure (random, unique, incremental?).
It is generally recommended because most people are used to using AES-256 or similar block ciphers in a mode called 'Cipher Block Chaining'. That's a good, sensible default go-to for a lot of engineering uses and it needs you to have an appropriate (non-repeating) IV. In that instance, it's not optional.
The IV allows for plaintext to be encrypted such that the encrypted text is harder to decrypt for an attacker. Each bit of IV you use will double the possibilities of encrypted text from a given plain text.
For example, let's encrypt 'hello world' using an IV one character long. The IV is randomly selected to be 'x'. The text that is then encrypted is then 'xhello world', which yeilds, say, 'asdfghjkl'. If we encrypt it again, first generate a new IV--say we get 'b' this time--and encrypt like normal (thus encrypting 'bhello world'). This time we get 'qwertyuio'.
The point is that the attacker doesn't know what the IV is and therefore must compute every possible IV for a given plain text to find the matching cipher text. In this way, the IV acts like a password salt. Most commonly, an IV is used with a chaining cipher (either a stream or block cipher). In a chaining block cipher, the result of each block of plain text is fed to the cipher algorithm to find the cipher text for the next block. In this way, each block is chained together.
So, if you have a random IV used to encrypt the plain text, how do you decrypt it? Simple. Pass the IV (in plain text) along with your encrypted text. Using our fist example above, the final cipher text would be 'xasdfghjkl' (IV + cipher text).
Yes you should use an IV, but be sure to choose it properly. Use a good random number source to make it. Don't ever use the same IV twice. And never use a constant IV.
The Wikipedia article on initialization vectors provides a general overview.

Resources