Storing IV when using AES asymmetric encryption and decryption - encryption

I'm looking at a C# AES asymmetric encryption and decryption example here and I'm not sure whether I should store the IV in a safe place (also encrypted?), or whether I can just attach it to the encrypted text for later use when I want to decrypt. From a short reading about AES it seems the IV is not needed at all for decryption, but I'm not sure I got that right, and aes.CreateDecryptor(keyBytes, iv) also needs it as a parameter.
I use a single key for all encryptions.

It's fairly standard to transmit the encrypted data as IV.Concat(cipherText). It's also fairly standard to put the IV off to the side, like in PKCS#5.
The IV-on-the-side approach matches more closely with how .NET wants to process the data, since it's somewhat annoying to slice off the IV to pass it separately to the IV parameter (or property), and then to have a more complicated slicing operation with the ciphertext (or recovered plaintext).
But the IV is usually transmitted in the clear either way.
So, glue it together, or make it a separate column... whatever fits your program and structure better.
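For the "glue it together" option, the framing could look roughly like the sketch below. It is written in Python rather than C# and only shows the slicing; the names pack and unpack are made up for illustration, and the ciphertext is a placeholder rather than real AES output.

```python
import os
from typing import Tuple

IV_SIZE = 16  # AES block size in bytes, which is also the IV size

def pack(iv: bytes, ciphertext: bytes) -> bytes:
    """Return iv || ciphertext as a single blob (hypothetical helper)."""
    if len(iv) != IV_SIZE:
        raise ValueError("IV must be 16 bytes for AES")
    return iv + ciphertext

def unpack(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a blob made by pack() back into (iv, ciphertext)."""
    if len(blob) < IV_SIZE:
        raise ValueError("blob too short to contain an IV")
    return blob[:IV_SIZE], blob[IV_SIZE:]

iv = os.urandom(IV_SIZE)        # fresh random IV for this message
ciphertext = b"\x00" * 32       # placeholder standing in for real AES output
blob = pack(iv, ciphertext)     # this is what gets stored or transmitted
recovered_iv, recovered_ciphertext = unpack(blob)
```

The separate-column approach is the same idea without the slicing: store iv and ciphertext as two fields.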

Answer: IV is necessary for decryption as long as the content has been encrypted with it. You don't need to encrypt or hide the IV. It may be public.
--
The purpose of the IV is to be combined with the key you are using, so it is as if you were encrypting every block of data with a different "final key". That guarantees the cipher data (the encrypted output) differs from block to block, even when the input blocks are identical.
This is a very good illustration of what happens IF YOU DON'T use IV.
Basically, encryption is done by splitting the input data into blocks and encrypting each one. In this example, all the parts of the image that have the same color (say, the white background) produce the same cipher data if you always use the same key, so a pattern can still be found and the image is not hidden as intended.
So combining some different extra data (the IV) with the key for each block is like using a different "final key" for each block, which solves the problem.

Related

Proper/Secure encryption of data using AES and a password

Right now, this is what I am doing:
1. SHA-1 a password like "pass123", use the first 32 characters of the hexadecimal encoding for the key
2. Encrypt with AES-256 with just whatever the default parameters are
^Is that secure enough?
I need my application to encrypt data with a password, and securely. There are too many different things that come up when I google this and some things that I don't understand about it too. I am asking this as a general question, not any specific coding language (though I'm planning on using this with Java and with iOS).
So now that I am trying to do this more properly, please follow what I have in mind:
Input is a password such as "pass123" and the data is
what I want to encrypt such as "The bank account is 038414838 and the pin is 5931"
Use PBKDF2 to derive a key from the password. Parameters:
1000 iterations
length of 256bits
Salt - this one confuses me because I am not sure where to get the salt from, do I just make one up? As in, all my encryptions would always use the salt "F" for example (since apparently salts are 8bits which is just one character)
Now I take this key, and do I hash it?? Should I use something like SHA-256? Is that secure? And what is HMAC? Should I use that?
Note: Do I need to perform both steps 2 and 3 or is just one or the other okay?
Okay now I have the 256-bit key to do the encryption with. So I perform the encryption using AES, but here's yet another confusing part (the parameters).
I'm not really sure what are the different "modes" to use, apparently there's like CBC and EBC and a bunch of others
I also am not sure about the "Initialization Vector," do I just make one up and always use that one?
And then what about other options, what is PKCS7Padding?
For your initial points:
Using hexadecimals clearly splits the key size in half. Basically, you are using AES-128 security wise. Not that that is bad, but you might also go for AES-128 and use 16 bytes.
SHA-1 is relatively safe for key derivation, but it shouldn't be used directly because of the existence/creation of rainbow tables. For this you need a function like PBKDF2 which uses an iteration count and salt.
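To see why using hex characters as key bytes halves the effective key size, here is a small Python check. It is only an illustration of the counting argument; the variable names are made up.

```python
import hashlib

password = b"pass123"
hex_digest = hashlib.sha1(password).hexdigest()  # 40 hex characters (20 bytes)

# Approach from the question: use 32 hex characters directly as the 32 key bytes.
key_from_hex_text = hex_digest[:32].encode("ascii")

# Each hex character can take only 16 values, so it carries 4 bits, not 8.
effective_bits = 4 * len(key_from_hex_text)

# Decoding those same 32 characters yields the 16 bytes they actually represent.
key_from_raw_bytes = bytes.fromhex(hex_digest[:32])

print(len(key_from_hex_text), effective_bits, len(key_from_raw_bytes))
```

So the 32-byte "AES-256" key built this way contains at most 128 bits of hash output.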
As for the solution:
You should not encrypt PINs if that can be avoided. Please make sure your passwords are safe enough; allow pass phrases.
Create a random salt (16 bytes) per password and save it with the output of PBKDF2. The salt does not have to be secret, although you might want to include a system secret to add some extra security. The salt and password are hashed, so they may have any length to be compatible with PBKDF2.
No, you just save the secret generated by the PBKDF2, let the PBKDF2 generate more data when required.
Never use ECB (not EBC). Use CBC as a minimum. Note that CBC encryption does not provide integrity checking (somebody might change the cipher text and you might never know it) or authenticity. For that, you might want to add an additional MAC such as HMAC, or use an encryption mode such as GCM. PKCS7Padding (identical to PKCS5Padding in most occurrences) is a simple method of adding bogus data to get N * [blocksize] bytes, as required by block-wise encryption.
Don't forget to prepend a (random) IV to your cipher text in case you reuse your encryption keys. An IV is similar to a salt, but should be exactly [blocksize] bytes (16 for AES).
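The salt and PBKDF2 steps above can be sketched with Python's standard library as follows. The helper name derive_key and the iteration count are assumptions for illustration (the count is deliberately much higher than the 1000 in the question and should be tuned to your hardware); HMAC-SHA256 as the PRF is also a choice made here, not something the answer prescribes.

```python
import hashlib
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor, not a recommendation from the answer
SALT_SIZE = 16        # random salt, stored in the clear next to the data
KEY_SIZE = 32         # 256-bit key for AES-256

def derive_key(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a key with PBKDF2-HMAC-SHA256; generate a fresh salt if none is given."""
    if salt is None:
        salt = os.urandom(SALT_SIZE)
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_SIZE
    )
    return key, salt

# Encrypting: new salt, keep it with the ciphertext.
key, salt = derive_key("pass123")

# Decrypting later: same password + stored salt gives the same key again.
same_key, _ = derive_key("pass123", salt)
```

The derived key is used as the AES key directly; there is no extra hashing step in between.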

AES Encryption - Key versus IV

The application I am working on lets the user encrypt files. The files could be of any format (spreadsheet, document, presentation, etc.).
For the specified input file, I create two output files - an encrypted data file and a key file. You need both these files to obtain your original data. The key file must work only on the corresponding data file. It should not work on any other file, either from the same user or from any other user.
AES algorithm requires two different parameters for encryption, a key and an initialization vector (IV).
I see three choices for creating the key file:
Embed hard-coded IV within the application and save the key in the key file.
Embed hard-coded key within the application and save the IV in the key file.
Save both the key and the IV in the key file.
Note that it is the same application that is used by different customers.
It appears all three choices would achieve the same end goal. However, I would like to get your feedback on what the right approach should be.
As you can see from the other answers, having a unique IV per encrypted file is crucial, but why is that?
First, let's review why a unique IV per encrypted file is important (see the Wikipedia article on initialization vectors). The IV adds randomness to the start of your encryption process. When using a chained block encryption mode (where one block of encrypted data incorporates the prior block of encrypted data) we're left with a problem regarding the first block, which is where the IV comes in.
If you had no IV, and used chained block encryption with just your key, two files that begin with identical text would produce identical first blocks. If the input files diverged midway through, then the two encrypted files would begin to look different from that point through to the end of the encrypted file. If someone noticed the similarity at the beginning, and knew what one of the files began with, he could deduce what the other file began with. With a cipher like AES, knowing what the plaintext file began with and what its corresponding ciphertext is does not by itself reveal the key, but leaking how the other file begins is already a break in confidentiality.
Now add the IV - if each file used a random IV, their first block would be different. The above scenario has been thwarted.
Now what if the IV were the same for each file? Well, we have the problem scenario again. The first block of each file will encrypt to the same result. Practically, this is no different from not using the IV at all.
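The fixed-IV versus random-IV scenario can be reproduced with a toy model. The sketch below is Python and deliberately does not use AES: toy_block_encrypt is a made-up 4-round Feistel permutation built on HMAC-SHA256 that merely stands in for a block cipher, and padding is skipped by using inputs that are exact multiples of 16 bytes. Only the CBC chaining logic is the point.

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes, same as the AES block size

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for AES. NOT secure."""
    left, right = block[:8], block[8:]
    for round_no in range(4):
        f = hmac.new(key, bytes([round_no]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CBC: XOR each plaintext block with the previous ciphertext block (IV first)."""
    if len(iv) != BLOCK or len(plaintext) % BLOCK != 0:
        raise ValueError("this sketch needs a 16-byte IV and unpadded whole blocks")
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        previous = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], previous))
        out.append(previous)
    return b"".join(out)

key = os.urandom(32)
file_a = b"SAME 16B HEADER!" + b"contents of A..."
file_b = b"SAME 16B HEADER!" + b"contents of B..."

# Same (hard-coded) IV for both files: the first ciphertext blocks match.
fixed_iv = bytes(BLOCK)
same_a = cbc_encrypt(key, fixed_iv, file_a)
same_b = cbc_encrypt(key, fixed_iv, file_b)

# Fresh random IV per file: the first ciphertext blocks differ.
rand_a = cbc_encrypt(key, os.urandom(BLOCK), file_a)
rand_b = cbc_encrypt(key, os.urandom(BLOCK), file_b)
```

Comparing same_a[:16] with same_b[:16] shows the leak; comparing rand_a[:16] with rand_b[:16] shows it gone.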
So now let's get to your proposed options:
Option 1. Embed hard-coded IV within the application and save the key in the key file.
Option 2. Embed hard-coded key within the application and save the IV in the key file.
These options are pretty much identical. If two files that begin with the same text produce encrypted files that begin with identical ciphertext, you're hosed. That would happen in both of these options. (Assuming there's one master key used to encrypt all files).
Option 3. Save both the key and the IV in the key file.
If you use a random IV for each key file, you're good. No two key files will be identical, and each encrypted file must have its key file. A different key file will not work.
PS: Once you go with option 3 and random IVs, start looking into how you'll determine whether decryption was successful. Take a key file from one file, and try using it to decrypt a different encrypted file. You may discover that decryption proceeds and produces garbage results. If this happens, begin research into authenticated encryption.
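One common way to detect a wrong key file or modified data is encrypt-then-MAC: compute an HMAC over IV || ciphertext and check it before decrypting. The Python sketch below shows only that tagging and checking step; seal and open_sealed are hypothetical names, the ciphertext is a random placeholder, and using a MAC key separate from the encryption key is an assumption of this sketch. An AEAD mode such as GCM would be the other route.

```python
import hashlib
import hmac
import os
from typing import Tuple

TAG_SIZE = 32  # HMAC-SHA256 output length in bytes

def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Return iv || ciphertext || tag, with the tag covering iv || ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag

def open_sealed(mac_key: bytes, blob: bytes, iv_size: int = 16) -> Tuple[bytes, bytes]:
    """Verify the tag first; return (iv, ciphertext) or raise ValueError."""
    if len(blob) < iv_size + TAG_SIZE:
        raise ValueError("blob too short")
    body, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed: wrong key file or modified data")
    return body[:iv_size], body[iv_size:]

mac_key = os.urandom(32)      # assumed to live in the key file next to the AES key
iv = os.urandom(16)
ciphertext = os.urandom(48)   # placeholder standing in for real AES-CBC output
blob = seal(mac_key, iv, ciphertext)
checked_iv, checked_ciphertext = open_sealed(mac_key, blob)
```

With this in place, a mismatched key file fails loudly instead of silently producing garbage.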
The important thing about an IV is you must never use the same IV for two messages. Everything else is secondary - if you can ensure uniqueness, randomness is less important (but still a very good thing to have!). The IV does not need to be (and indeed, in CBC mode cannot be) secret.
As such, you should not save the IV alongside the key - that would imply you use the same IV for every message, which defeats the point of having an IV. Typically you would simply prepend the IV to the encrypted file, in the clear.
If you are going to be rolling your own cipher modes like this, please read the relevant standards. The NIST has a good document on cipher modes here: http://dx.doi.org/10.6028/NIST.SP.800-38A IV generation is documented in Appendix C. Cryptography is a subtle art. Do not be tempted to create variations on the normal cipher modes; 99% of the time you will create something that looks more secure, but is actually less secure.
When you use an IV, the most important thing is that the IV should be as unique as possible, so in practice you should use a random IV. This means embedding it in your application is not an option. I would save the IV in the data file, as it does not harm security as long as the IV is random/unique.
Key/IV pairs are likely the most confused thing in the world of encryption. Simply put, password = key + IV, meaning you need a matching key and IV to decrypt an encrypted message. The internet seems to imply you only need the IV to encrypt and can then toss it away, but it is also required to decrypt. The reason for splitting the key and IV values is to make it possible to encrypt the same message with the same key but a different IV and get unequal encrypted messages. So, Encrypt("message", key, iv) != Encrypt("message", key, differentIv). The idea is to use a new random IV value every time a message is encrypted. But how do you manage an ever-changing IV value? There are a million possibilities, but the most logical way is to embed the 16-byte IV within the encrypted message itself. So, encrypted = Iv + encryptedMessage. This way the constantly changing IV value can be pulled and removed from the encrypted message, which is then decrypted. So decryptedMessage = Decrypt("messageWithoutIv", key, IvFromEncryptedMessage). Alternatively, if storing encrypted messages in a database, the IV could be stored in a field there. Although it's true the IV is part of the secret, it is tiny in comparison to the 32-byte key and is never reused, so it is practically safe to expose publicly. Keep in mind, the IV has nothing to do with the encryption itself; it has to do with masking the encryption of messages that have the same content.
An IV is used to increase security via randomness, but that does not mean it is used by every algorithm, i.e.
The tricky thing is how long the IV should be. Usually it is the same size as the block size, or cipher size. For example, AES would have 16 bytes for the IV. Besides, the IV type can also be selected, e.g. eseqiv, seqiv, chainiv ...

When using AES, is there a way to tell if data was encrypted using 128 or 256 bit keys?

I was wondering if there is some way to tell if data was encrypted with a specific key size, without the source code of course. Is there any detectable differences with the data that you can check post encryption?
No there is not any way to do that. Both encrypt 16-byte chunks of data and the resulting blocks would "look" the same after the encryption is complete (they would have different values, but an analysis on only the encrypted data would not be able to determine the original key size). If the original data (plain text) is available, it may be possible to do some kind of analysis.
A very simplistic "proof" is:
For a given input, the length of the output is the same regardless of the key size. It may, however, differ depending on the mode (CBC, CTR, etc.).
Since the encryption is reversible, it can be considered to be a one-to-one function. In other words, a different input results in a different output.
Therefore, it is possible to produce any given output (by changing the plain text) regardless of the key size.
Thus, for a given password, you could end up with the same output by using the appropriate plain text regardless of the key size. This "proof" has a hole in that padding schemes can result in a longer output than input (so the function is not necessarily onto.) But I doubt this would make a difference in the end result.
If an encryption system is any good (AES is) then there should be no way to distinguish its raw output from random data -- so, in particular, there should be no way to distinguish between AES-128 and AES-256, at least on the output bits.
However, most protocols which use encryption end up including some metadata which designates, without ambiguity, the kind of algorithm which was used, including key size. This is so that the receiver knows what to use to decrypt. This is not considered to be an issue. So, in practice, one has to assume that whatever attacker looks at your system knows whether the key is actually a 128-bit or 256-bit key.
Some side channels may give that information, too. AES encryption with a 256-bit key is 40% slower than AES encryption with a 128-bit key: simply timing how much time an encrypting server takes to respond can reveal the key size.

Identifying An Encryption Algorithm

First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data. This seems highly unlikely, if not impossible to me, so sorry if it's a stupid question.
My final question is, say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data.
No. There are algorithms that use a different key to decrypt than to encrypt, but a keyless method would rely on secrecy of the algorithm, generally regarded as a poor idea.
My final question is, say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
Most likely yes, especially given the key. A good crypto algorithm relies on the secrecy of the key, and the key alone. See Kerckhoffs's principle.
Also if a common algorithm is used it would be a simple matter of trial and error, and besides cryptotext often is accompanied by metadata which tells you algorithm details.
edit: as per comments, you may be thinking of digital signature (which requires a secret only on the sender side), a hash algorithm (which requires no key but isn't encryption), or a zero-knowledge proof (which can prove knowledge of a secret without revealing it).
Abstractly, we can think of the encryption system this way:
               -------------------
plaintext ---> | algorithm & key | ---> ciphertext
               -------------------
The system must guarantee the following:
decrypt(encrypt(plaintext, algorithm, key), algorithm, key) = plaintext
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data.
Yes, in such a system the key is redundant; all the "secrecy" lies in the algorithm.
My final question is, say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
In practice, you'll probably have a small space of algorithms, so a simple brute-force search is feasible. However, there may be more than one algorithm that fits the given information. Consider the following example:
We define the following encryption and decryption operations, where plaintext, ciphertext, algorithm, and key are real numbers (assume algorithm is nonzero):
encrypt(plaintext, algorithm, key) = algorithm x (plaintext + key) = ciphertext
decrypt(ciphertext, algorithm, key) = ciphertext/algorithm - key = plaintext
Now, suppose that plaintext + key = 0. We have ciphertext = 0 for any choice of algorithm. Hence, we cannot deduce the algorithm used.
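The real-number example above translates directly into a few lines of Python. The specific values (plaintext 5.0, key -5.0, and the three candidate algorithm values) are arbitrary choices for illustration.

```python
def encrypt(plaintext: float, algorithm: float, key: float) -> float:
    # encrypt(plaintext, algorithm, key) = algorithm x (plaintext + key)
    return algorithm * (plaintext + key)

def decrypt(ciphertext: float, algorithm: float, key: float) -> float:
    # decrypt(ciphertext, algorithm, key) = ciphertext / algorithm - key
    return ciphertext / algorithm - key

plaintext, key = 5.0, -5.0           # chosen so that plaintext + key == 0
candidates = (1.0, 2.0, 7.5)         # three different "algorithms"

# Every candidate maps this (plaintext, key) pair to the same ciphertext, 0.
ciphertexts = {a: encrypt(plaintext, a, key) for a in candidates}

# And every candidate decrypts its ciphertext back to the plaintext.
round_trips = {a: decrypt(c, a, key) for a, c in ciphertexts.items()}
```

Given only this plaintext, key and ciphertext, all three candidates fit equally well.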
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data.
What are you getting at? It's trivial to come up with a pair of functions that fits the letter of the specification, but without knowing the intent it's hard to give a more helpful answer.
say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
If the algorithm is any good the output will be indistinguishable from random noise, so there is no analytic solution to this. As a practical matter, there are only so many trusted algorithms in wide use. Trying each one in turn would be quick, but would be complicated by the fact that an implementation has some freedom with regard to things like byte order (little-endian vs big-endian), key derivation (if you had a pass-phrase instead of the actual cryptographic key itself), encryption modes and padding.
As frankodwyer points out, this situation is not part of usual threat models. This would work in your favor, as it makes it more likely that the algorithm is a well-known one.
The best you could do without a known key in the decoder would be to add a bit of obscurity. For example, if the first step of the decode algorithm is to strip out everything except for every tenth character, then your encode key may be used to seed some random garbage for nine out of every ten characters. Thus, with different keys you could achieve different encoded results which would be decoded to the same message, with no key necessary for the decoder.
However, this does not add much real security and should not be solely relied on to protect crucial data. I'm just thinking of a case where it would be possible, so yes, I suppose it could be done if you were just trying to prove a point or add one more level of security.
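A rough Python sketch of that every-tenth-character idea, to make it concrete. The function names and the choice of ASCII letters as filler are made up here, and the message character is placed first in each group of ten. As the answer says, this is obscurity, not encryption.

```python
import random
import string

STRIDE = 10  # one real character followed by nine filler characters

def encode(message: str, key: int) -> str:
    """Pad each message character with nine key-seeded filler characters."""
    rng = random.Random(key)  # not a cryptographic RNG; fine for a toy
    out = []
    for ch in message:
        out.append(ch)
        out.extend(rng.choice(string.ascii_letters) for _ in range(STRIDE - 1))
    return "".join(out)

def decode(encoded: str) -> str:
    """Keyless decode: keep characters 0, 10, 20, ... and drop the filler."""
    return encoded[::STRIDE]

message = "attack at dawn"
encoded_with_key_1 = encode(message, key=1)
encoded_with_key_2 = encode(message, key=2)
```

Both encodings differ from each other, yet decode() recovers the same message from either without any key.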
I don't believe that there is such an algorithm that would use a key to encrypt, but not to decrypt. (Silly answers like a 26 character Caesar cipher aside...)
To your second question, yes; it just depends on how much time you're willing to spend on it. In theoretical cryptography it is assumed that the algorithm can always be determined, whether through theft of the algorithm or a physical machine or, as in your case, by having a plaintext and ciphertext pair.

Should I use an initialization vector (IV) along with my encryption?

Is it recommended that I use an initialization vector to encrypt/decrypt my data? Will it make things more secure? Is it one of those things that need to be evaluated on a case by case basis?
To put this into actual context, the Win32 Cryptography function CryptSetKeyParam allows for the setting of an initialization vector on a key prior to encrypting/decrypting. Other APIs also allow for this.
What is generally recommended and why?
An IV is essential when the same key might ever be used to encrypt more than one message.
The reason is that, under most encryption modes, two messages encrypted with the same key can be analyzed together. In a simple stream cipher, for instance, XORing two ciphertexts encrypted with the same key results in the XOR of the two messages, from which the plaintext can be easily extracted using traditional cryptanalysis techniques.
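The stream-cipher case is easy to demonstrate. In the Python sketch below, toy_stream_encrypt is a made-up construction that uses SHAKE-256 of key || IV as a keystream; it stands in for a real stream cipher purely to show the XOR relationship and is not meant for actual use.

```python
import hashlib

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_stream_encrypt(key: bytes, message: bytes, iv: bytes = b"") -> bytes:
    """XOR the message with a keystream derived from key || iv (toy, not secure).

    Decryption is the same operation with the same key and IV.
    """
    stream = hashlib.shake_256(key + iv).digest(len(message))
    return xor(message, stream)

key = b"one key for everything"
m1 = b"attack at dawn!!"
m2 = b"retreat at noon!"

# No IV: both messages are XORed with the same keystream.
c1 = toy_stream_encrypt(key, m1)
c2 = toy_stream_encrypt(key, m2)
leaked = xor(c1, c2)             # keystream cancels: equals m1 XOR m2
recovered_m2 = xor(leaked, m1)   # anyone who knows m1 can now read m2

# Distinct IVs: the keystreams differ, so the XOR no longer cancels.
d1 = toy_stream_encrypt(key, m1, iv=b"iv-0001")
d2 = toy_stream_encrypt(key, m2, iv=b"iv-0002")
```

Here xor(c1, c2) equals xor(m1, m2) without the key ever being involved, while xor(d1, d2) does not have that property.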
A weak IV is part of what made WEP breakable.
An IV basically mixes some unique, non-secret data into the key to prevent the same key ever being used twice.
In most cases you should use an IV. Since the IV is generated randomly each time, if you encrypt the same data twice, the encrypted messages are going to be different and it will be impossible for an observer to say whether these two messages are the same.
Take a good look at a picture (see below) of CBC mode. You'll quickly realize that an attacker knowing the IV is like the attacker knowing a previous block of ciphertext (and yes they already know plenty of that).
Here's what I say: most of the "problems" with IV=0 are general problems with block encryption modes when you don't ensure data integrity. You really must ensure integrity.
Here's what I do: use a strong checksum (cryptographic hash or HMAC) and prepend it to your plaintext before encrypting. There's your known first block of ciphertext: it's the IV of the same thing without the checksum, and you need the checksum for a million other reasons.
Finally: any analogy between CBC and stream ciphers is not terribly insightful IMHO.
Just look at the picture of CBC mode, I think you'll be pleasantly surprised.
Here's a picture:
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
If the same key is used multiple times for multiple different secrets patterns could emerge in the encrypted results. The IV, that should be pseudo random and used only once with each key, is there to obfuscate the result. You should never use the same IV with the same key twice, that would defeat the purpose of it.
To not have to bother keeping track of the IV the simplest thing is to prepend, or append it, to the resulting encrypted secret. That way you don't have to think much about it. You will then always know that the first or last N bits is the IV.
When decrypting the secret you just split out the IV, and then use it together with the key to decrypt the secret.
I found the writeup of HTTP Digest Auth (RFC 2617) very helpful in understanding the use and need for IVs / nonces.
Is it one of those things that need to be evaluated on a case by case basis?
Yes, it is. Always read up on the cipher you are using and how it expects its inputs to look. Some ciphers don't use IVs but do require salts to be secure. IVs can be of different lengths. The mode of the cipher can change what the IV is used for (if it is used at all) and, as a result, what properties it needs to be secure (random, unique, incremental?).
It is generally recommended because most people are used to using AES-256 or similar block ciphers in a mode called 'Cipher Block Chaining'. That's a good, sensible default go-to for a lot of engineering uses and it needs you to have an appropriate (non-repeating) IV. In that instance, it's not optional.
The IV allows for plaintext to be encrypted such that the encrypted text is harder to decrypt for an attacker. Each bit of IV you use will double the possibilities of encrypted text from a given plain text.
For example, let's encrypt 'hello world' using an IV one character long. The IV is randomly selected to be 'x'. The text that is then encrypted is 'xhello world', which yields, say, 'asdfghjkl'. If we encrypt it again, we first generate a new IV (say we get 'b' this time) and encrypt like normal (thus encrypting 'bhello world'). This time we get 'qwertyuio'.
The point is that the attacker doesn't know what the IV is and therefore must compute every possible IV for a given plain text to find the matching cipher text. In this way, the IV acts like a password salt. Most commonly, an IV is used with a chaining cipher (either a stream or block cipher). In a chaining block cipher, the result of each block of plain text is fed to the cipher algorithm to find the cipher text for the next block. In this way, each block is chained together.
So, if you have a random IV used to encrypt the plain text, how do you decrypt it? Simple. Pass the IV (in plain text) along with your encrypted text. Using our first example above, the final cipher text would be 'xasdfghjkl' (IV + cipher text).
Yes you should use an IV, but be sure to choose it properly. Use a good random number source to make it. Don't ever use the same IV twice. And never use a constant IV.
The Wikipedia article on initialization vectors provides a general overview.

Resources