I have a go script that uses crypto/aes to encrypt and decrypt a plaintext.
https://play.golang.org/p/le_-uuzWN4
I want this script to be used across different machines and produce the same encrypted text. I thought that by having a custom IV it would result in a consistent encryption no matter where.
Right now it yields in different results on the go playground versus on https://repl.it/languages/go
Is it possible to produce consistent encryption or will it always be different due to internal implementation (like encryption salt, etc..)
Also, what exactly is IV, I'm still confused about that. The doc doesnt really explain what it is
I figured out why it was generating a different cipher text each time. The IV is randomized with this statement
if _, err := io.ReadFull(rand.Reader, iv); err != nil {
panic(err)
}
Removing it will keep the IV constant and it will generate the same encryption given the same Key an IV on any machine
Related
I am using node and the crypto module to encrypt and decrypt a large binary file. I encrypt the file using crypto.createCipheriv and decrypt it using crypto.createDecipheriv.
For the encryption I use a random IV as follows:
const iv = crypto.randomBytes(16);
const encrypt = crypto.createCipheriv('aes-128-cbc', key, iv)
What I don't understand, do I need to pass a random IV for createDecipheriv as well? The SO here says:
The IV needs to be identical for encryption and decryption.
Can the IV be static? And if it can't, is it considered to be a secret? Where would I store the IV? In the payload?
If I use different random IVs for the encryption and decryption, my payload gets decrypted but the first 16 bytes are corrupt. This means, it looks like the IV needs to be the same but from a security perspective there is also not much value as the payload is decrypted except 16 bytes.
Can anyone elaborate what the go-to approach is? Thanks for your help!
The Key+IV pair must never be duplicated on two encryptions using CBC. Doing so leaks information about the first block (in all cases), and is creates duplicate cipher texts (which is a problem if you ever encrypt the same message prefix twice).
So, if your key changes for every encryption, then your IV could be static. But no one does that. They have a key they reuse. So the IV must change.
There is no requirement that it be random. It just shouldn't repeat and it must not be predictable (in cases where the attacker can control the messages). Random is the easiest way to do that. Anything other than random requires a lot of specialized knowledge to get right, so use random.
Reusing a Key+IV pair in CBC weakens the security of the cipher, but does not destroy it, as in CTR. IV reused with CTR can lead to trivial decryptions. In CBC, it generally just leaks information. It's a serious problem, but it is not catastrophic. (Not all insecure configurations are created equal.)
The IV is not a secret. Everyone can know it. So it is typically prepended to the ciphertext.
For security reasons, the IV needs to be chosen to meet cryptographic randomness security requirements (i.e. use crypto.randomBytes( ) in node). This was shown in Phil Rogaway's research paper. The summary is in Figure 1.2 of the paper, which I transcribe here:
CBC (SP 800-38A): An IV-based encryption scheme, the mode is secure as a probabilistic encryption scheme, achieving indistinguishability from random bits, assuming a random IV. Confidentiality is not achieved if the IV is merely a nonce, nor if it is a nonce enciphered under the same key used by the scheme, as the standard incorrectly suggests to do.
The normal way to implement this is to include the IV prepended to the ciphertext. The receiving party extracts the IV and then decrypts the ciphertext. The IV is not a secret, instead it is just used to bring necessary security properties into the mode of operation.
However, be aware that encryption with CBC does not prevent people from tampering with the data. If an attacker fiddles with ciphertext bits within a block, it affects exactly two plaintext blocks, one of which is in a very controlled way.
To make a very long story short, GCM is a better mode to use to prevent such abuses. In that case, you do not need a random IV, but instead you must never let the IV repeat (in cryptography, we call this property a "nonce"). Luke Park gives an example of how to implement it, here. He uses randomness for the nonce, which achieves the nonce property for all practical purposes (unless you are encrypting 2^48 texts, which is crazy large).
But whatever mode you do, you must never repeat an IV for a given key, which is a very common mistake.
The GCM and CBC AES ciphers in Go can't be used along side with StreamWriter or StreamReader, which forces me to allocate the entire file into memory. Obviously, this is not ideal with large files.
I was thinking of making them streamable, by allocation some fixed size of blocks into memory, and feeding them to GCM or CBC, but I'm assuming that is probably a bad idea, since there must be a reason they've been designed this way.
Can someone explain why these operation modes can't be used without allocating the entire files into memory?
Simple answer - that's how they designed the API.
CBC and GCM are very different modes. GCM is AEAD mode (Authenticated Encryption with Associated Data). Do you actually need authentication? If not, for large files CBC is a good fit. You could also use CTR or OFB but they're streaming modes and are very strict about choosing an IV.
Before implementing anything I suggest you read about these modes. You have to at least understand, which parameters they need, for what purpose and how they should be generated.
CBC
BlockMode interface is a good fit for encrypting large files. You just need to encrypt block by block. CBC requires padding, but Go doesn't have the implementation. At least, I don't see one. You will have to deal with that somehow.
GCM
GCM uses AEAD interface, which only allows you to encrypt and decrypt a whole message. There is absolutely no reason why it should be implemented like that. GCM is a streaming mode, it's actually a good fit for streaming encryption. The only problem is authentication. GCM produces a tag at the end, which acts like a MAC. To make use of that tag you can't just encrypt an endless stream of data. You have to split it into chunks and authenticate them separately. Or do something else but at some point you have to read that tag and verify it, otherwise there's no point in using GCM.
What some libraries do, including Go, is they append that tag at the end implicitly on encryption and read and verify it on decryption. Personally, I think that's a very bad design. Tag should be available as a separate entity, you can't just assume that it will always be at the end. And that's not the only one of the problems in Go implementation.
Sorry for that rant. Recently I've dealt with a particulary bad implementation.
As a solution I suggest you split your stream into chunks and encrypt them separately with a unique nonce (that's very important). Each chunk will have a tag at the end, which you should verify. That way you can make use of GCM authentication. Yes, it's kind of ugly but Go doesn't give you access to inner methods, so that you could make your own encryption API.
As an alternative you could find a different implementation. Maybe even a C library. I can suggest mbedtls. For me, it's the best implementation I came across in terms of API.
Heres a stream implementation for reading from a BlockMode. Your mileage may vary.
type BlockReader struct {
buf []byte
block cipher.BlockMode
in io.Reader
}
func NewBlockReader(blockMode cipher.BlockMode,reader io.Reader) *BlockReader {
return &BlockReader{
block: blockMode,
in: reader,
}
}
func (b *BlockReader) Read(p []byte) (n int, err error) {
toRead := len(p)
mul := toRead/b.block.BlockSize()
size := mul*b.block.BlockSize()
if cap(b.buf) != size{
b.buf = make([]byte,toRead,toRead)
}
read, err := b.in.Read(b.buf)
if err != nil {
return 0,err
}
if read < b.block.BlockSize(){
return 0,io.ErrUnexpectedEOF
}
b.block.CryptBlocks(b.buf,b.buf)
return copy(p,b.buf),nil
}
The application I am working on lets the user encrypt files. The files could be of any format (spreadsheet, document, presentation, etc.).
For the specified input file, I create two output files - an encrypted data file and a key file. You need both these files to obtain your original data. The key file must work only on the corresponding data file. It should not work on any other file, either from the same user or from any other user.
AES algorithm requires two different parameters for encryption, a key and an initialization vector (IV).
I see three choices for creating the key file:
Embed hard-coded IV within the application and save the key in the key file.
Embed hard-coded key within the application and save the IV in the key file.
Save both the key and the IV in the key file.
Note that it is the same application that is used by different customers.
It appears all three choices would achieve the same end goal. However, I would like to get your feedback on what the right approach should be.
As you can see from the other answers, having a unique IV per encrypted file is crucial, but why is that?
First - let's review why a unique IV per encrypted file is important. (Wikipedia on IV). The IV adds randomness to your start of your encryption process. When using a chained block encryption mode (where one block of encrypted data incorporates the prior block of encrypted data) we're left with a problem regarding the first block, which is where the IV comes in.
If you had no IV, and used chained block encryption with just your key, two files that begin with identical text will produce identical first blocks. If the input files changed midway through, then the two encrypted files would begin to look different beginning at that point and through to the end of the encrypted file. If someone noticed the similarity at the beginning, and knew what one of the files began with, he could deduce what the other file began with. Knowing what the plaintext file began with and what it's corresponding ciphertext is could allow that person to determine the key and then decrypt the entire file.
Now add the IV - if each file used a random IV, their first block would be different. The above scenario has been thwarted.
Now what if the IV were the same for each file? Well, we have the problem scenario again. The first block of each file will encrypt to the same result. Practically, this is no different from not using the IV at all.
So now let's get to your proposed options:
Option 1. Embed hard-coded IV within the application and save the key in the key file.
Option 2. Embed hard-coded key within the application and save the IV in the key file.
These options are pretty much identical. If two files that begin with the same text produce encrypted files that begin with identical ciphertext, you're hosed. That would happen in both of these options. (Assuming there's one master key used to encrypt all files).
Option 3. Save both the key and the IV in the key file.
If you use a random IV for each key file, you're good. No two key files will be identical, and each encrypted file must have it's key file. A different key file will not work.
PS: Once you go with option 3 and random IV's - start looking into how you'll determine if decryption was successful. Take a key file from one file, and try using it to decrypt a different encryption file. You may discover that decryption proceeds and produces in garbage results. If this happens, begin research into authenticated encryption.
The important thing about an IV is you must never use the same IV for two messages. Everything else is secondary - if you can ensure uniqueness, randomness is less important (but still a very good thing to have!). The IV does not need to be (and indeed, in CBC mode cannot be) secret.
As such, you should not save the IV alongside the key - that would imply you use the same IV for every message, which defeats the point of having an IV. Typically you would simply prepend the IV to the encrypted file, in the clear.
If you are going to be rolling your own cipher modes like this, please read the relevant standards. The NIST has a good document on cipher modes here: http://dx.doi.org/10.6028/NIST.SP.800-38A IV generation is documented in Appendix C. Cryptography is a subtle art. Do not be tempted to create variations on the normal cipher modes; 99% of the time you will create something that looks more secure, but is actually less secure.
When you use an IV, the most important thing is that the IV should be as unique as possible, so in practice you should use a random IV. This means embedding it in your application is not an option. I would save the IV in the data file, as it does not harm security as long as the IV is random/unique.
Key/Iv pairs likely the most confused in the world of encryption. Simply put, password = key + iv. Meaning you need matching key and iv to decrypt an encrypted message. The internet seems to imply you only need iv to encrypt and toss it away but its also required to decrypt. The reason for spitting the key/iv values is to make it possible to encrypt same messages with the same key but use different Iv to get unequal encrypted messages. So, Encrypt("message", key, iv) != Encrypt("message", key, differentIv). The idea is to use a new random Iv value every time a message is encrypted. But how do you manage an ever changing Iv value? There's a million possibilities but the most logical way is to embed the 16 byte Iv within the encrypted message itself. So, encrypted = Iv + encryptedMessage. This way the contently changing Iv value can be pulled and removed from the encrypted message then decrypted. So decryptedMessage = Decrypt("messageWithoutIv", key, IvFromEncryptedMessage). Alternatively if storing encrypted messages in a database Iv could be stored in a field there. Although its true Iv is part of the secret, its tiny in comparison to the 32 bit key and is never reused so it is practically safe to expose publicly. Keep in mind, iv has nothing to do with encruotion, it has to do with masking encryption of messages having the same content.
IV is used for increase the security via randomness, but that does not mean it is used by all algorithm, i.e.
The trick thing is how long should the IV be? Usually it is the same size as the block size, or cipher size. For example, AES would have 16 bytes for IV. Besides, IV type can also be selected, i.e. eseqiv, seqiv, chainiv ...
Will encrypting two identical plaintext messages with AES in CBC with the same IV yield the same ciphertext?
From my understanding the first block is XOR'd with the IV, and then each subsequent block with the previous. Does this mean that with the same IV and identical messages that every block will be encrypted to the same thing? I understand using a predictable or non-changing IV for encryption is a very bad thing to do, and I am wondering why - is it because attackers can build up a "book" of known messages, or because we leave the first block vulnerable to frequency checks?
Thanks
If you use the same key both times, then yes, you'd get identical output. If you use a different key, then you'd get different output (you XOR the previous block with the current block, but then you encrypt the result to produce a block of ciphertext).
That, however, is generally of little help. One of the basic reasons for using something like CBC is to avoid repetition among messages even if they contain the same data and you continue to use the same key (though of course, it's also useful that it avoids patterns within a single message as well). Changing the IV keeps each message unique (even if some of the plaintext content is predictable) without going to all the work of distributing a new key for every message (which would generally be relatively painful).
Is it recommended that I use an initialization vector to encrypt/decrypt my data? Will it make things more secure? Is it one of those things that need to be evaluated on a case by case basis?
To put this into actual context, the Win32 Cryptography function, CryptSetKeyParam allows for the setting of an initialization vector on a key prior to encrypting/decrypting. Other API's also allow for this.
What is generally recommended and why?
An IV is essential when the same key might ever be used to encrypt more than one message.
The reason is because, under most encryption modes, two messages encrypted with the same key can be analyzed together. In a simple stream cipher, for instance, XORing two ciphertexts encrypted with the same key results in the XOR of the two messages, from which the plaintext can be easily extracted using traditional cryptanalysis techniques.
A weak IV is part of what made WEP breakable.
An IV basically mixes some unique, non-secret data into the key to prevent the same key ever being used twice.
In most cases you should use IV. Since IV is generated randomly each time, if you encrypt same data twice, encrypted messages are going to be different and it will be impossible for the observer to say if this two messages are the same.
Take a good look at a picture (see below) of CBC mode. You'll quickly realize that an attacker knowing the IV is like the attacker knowing a previous block of ciphertext (and yes they already know plenty of that).
Here's what I say: most of the "problems" with IV=0 are general problems with block encryption modes when you don't ensure data integrity. You really must ensure integrity.
Here's what I do: use a strong checksum (cryptographic hash or HMAC) and prepend it to your plaintext before encrypting. There's your known first block of ciphertext: it's the IV of the same thing without the checksum, and you need the checksum for a million other reasons.
Finally: any analogy between CBC and stream ciphers is not terribly insightful IMHO.
Just look at the picture of CBC mode, I think you'll be pleasantly surprised.
Here's a picture:
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
link text
If the same key is used multiple times for multiple different secrets patterns could emerge in the encrypted results. The IV, that should be pseudo random and used only once with each key, is there to obfuscate the result. You should never use the same IV with the same key twice, that would defeat the purpose of it.
To not have to bother keeping track of the IV the simplest thing is to prepend, or append it, to the resulting encrypted secret. That way you don't have to think much about it. You will then always know that the first or last N bits is the IV.
When decrypting the secret you just split out the IV, and then use it together with the key to decrypt the secret.
I found the writeup of HTTP Digest Auth (RFC 2617) very helpful in understanding the use and need for IVs / nonces.
Is it one of those things that need to be evaluated on a case by case
basis?
Yes, it is. Always read up on the cipher you are using and how it expects its inputs to look. Some ciphers don't use IVs but do require salts to be secure. IVs can be of different lengths. The mode of the cipher can change what the IV is used for (if it is used at all) and, as a result, what properties it needs to be secure (random, unique, incremental?).
It is generally recommended because most people are used to using AES-256 or similar block ciphers in a mode called 'Cipher Block Chaining'. That's a good, sensible default go-to for a lot of engineering uses and it needs you to have an appropriate (non-repeating) IV. In that instance, it's not optional.
The IV allows for plaintext to be encrypted such that the encrypted text is harder to decrypt for an attacker. Each bit of IV you use will double the possibilities of encrypted text from a given plain text.
For example, let's encrypt 'hello world' using an IV one character long. The IV is randomly selected to be 'x'. The text that is then encrypted is then 'xhello world', which yeilds, say, 'asdfghjkl'. If we encrypt it again, first generate a new IV--say we get 'b' this time--and encrypt like normal (thus encrypting 'bhello world'). This time we get 'qwertyuio'.
The point is that the attacker doesn't know what the IV is and therefore must compute every possible IV for a given plain text to find the matching cipher text. In this way, the IV acts like a password salt. Most commonly, an IV is used with a chaining cipher (either a stream or block cipher). In a chaining block cipher, the result of each block of plain text is fed to the cipher algorithm to find the cipher text for the next block. In this way, each block is chained together.
So, if you have a random IV used to encrypt the plain text, how do you decrypt it? Simple. Pass the IV (in plain text) along with your encrypted text. Using our fist example above, the final cipher text would be 'xasdfghjkl' (IV + cipher text).
Yes you should use an IV, but be sure to choose it properly. Use a good random number source to make it. Don't ever use the same IV twice. And never use a constant IV.
The Wikipedia article on initialization vectors provides a general overview.