The application I am working on lets the user encrypt files. The files could be of any format (spreadsheet, document, presentation, etc.).
For the specified input file, I create two output files - an encrypted data file and a key file. You need both these files to obtain your original data. The key file must work only on the corresponding data file. It should not work on any other file, either from the same user or from any other user.
AES algorithm requires two different parameters for encryption, a key and an initialization vector (IV).
I see three choices for creating the key file:
Embed hard-coded IV within the application and save the key in the key file.
Embed hard-coded key within the application and save the IV in the key file.
Save both the key and the IV in the key file.
Note that it is the same application that is used by different customers.
It appears all three choices would achieve the same end goal. However, I would like to get your feedback on what the right approach should be.
As you can see from the other answers, having a unique IV per encrypted file is crucial, but why is that?
First - let's review why a unique IV per encrypted file is important. (Wikipedia on IV). The IV adds randomness to your start of your encryption process. When using a chained block encryption mode (where one block of encrypted data incorporates the prior block of encrypted data) we're left with a problem regarding the first block, which is where the IV comes in.
If you had no IV, and used chained block encryption with just your key, two files that begin with identical text will produce identical first blocks. If the input files changed midway through, then the two encrypted files would begin to look different beginning at that point and through to the end of the encrypted file. If someone noticed the similarity at the beginning, and knew what one of the files began with, he could deduce what the other file began with. Knowing what the plaintext file began with and what it's corresponding ciphertext is could allow that person to determine the key and then decrypt the entire file.
Now add the IV - if each file used a random IV, their first block would be different. The above scenario has been thwarted.
Now what if the IV were the same for each file? Well, we have the problem scenario again. The first block of each file will encrypt to the same result. Practically, this is no different from not using the IV at all.
So now let's get to your proposed options:
Option 1. Embed hard-coded IV within the application and save the key in the key file.
Option 2. Embed hard-coded key within the application and save the IV in the key file.
These options are pretty much identical. If two files that begin with the same text produce encrypted files that begin with identical ciphertext, you're hosed. That would happen in both of these options. (Assuming there's one master key used to encrypt all files).
Option 3. Save both the key and the IV in the key file.
If you use a random IV for each key file, you're good. No two key files will be identical, and each encrypted file must have it's key file. A different key file will not work.
PS: Once you go with option 3 and random IV's - start looking into how you'll determine if decryption was successful. Take a key file from one file, and try using it to decrypt a different encryption file. You may discover that decryption proceeds and produces in garbage results. If this happens, begin research into authenticated encryption.
The important thing about an IV is you must never use the same IV for two messages. Everything else is secondary - if you can ensure uniqueness, randomness is less important (but still a very good thing to have!). The IV does not need to be (and indeed, in CBC mode cannot be) secret.
As such, you should not save the IV alongside the key - that would imply you use the same IV for every message, which defeats the point of having an IV. Typically you would simply prepend the IV to the encrypted file, in the clear.
If you are going to be rolling your own cipher modes like this, please read the relevant standards. The NIST has a good document on cipher modes here: http://dx.doi.org/10.6028/NIST.SP.800-38A IV generation is documented in Appendix C. Cryptography is a subtle art. Do not be tempted to create variations on the normal cipher modes; 99% of the time you will create something that looks more secure, but is actually less secure.
When you use an IV, the most important thing is that the IV should be as unique as possible, so in practice you should use a random IV. This means embedding it in your application is not an option. I would save the IV in the data file, as it does not harm security as long as the IV is random/unique.
Key/Iv pairs likely the most confused in the world of encryption. Simply put, password = key + iv. Meaning you need matching key and iv to decrypt an encrypted message. The internet seems to imply you only need iv to encrypt and toss it away but its also required to decrypt. The reason for spitting the key/iv values is to make it possible to encrypt same messages with the same key but use different Iv to get unequal encrypted messages. So, Encrypt("message", key, iv) != Encrypt("message", key, differentIv). The idea is to use a new random Iv value every time a message is encrypted. But how do you manage an ever changing Iv value? There's a million possibilities but the most logical way is to embed the 16 byte Iv within the encrypted message itself. So, encrypted = Iv + encryptedMessage. This way the contently changing Iv value can be pulled and removed from the encrypted message then decrypted. So decryptedMessage = Decrypt("messageWithoutIv", key, IvFromEncryptedMessage). Alternatively if storing encrypted messages in a database Iv could be stored in a field there. Although its true Iv is part of the secret, its tiny in comparison to the 32 bit key and is never reused so it is practically safe to expose publicly. Keep in mind, iv has nothing to do with encruotion, it has to do with masking encryption of messages having the same content.
IV is used for increase the security via randomness, but that does not mean it is used by all algorithm, i.e.
The trick thing is how long should the IV be? Usually it is the same size as the block size, or cipher size. For example, AES would have 16 bytes for IV. Besides, IV type can also be selected, i.e. eseqiv, seqiv, chainiv ...
Related
I'm looking at an C# AES asymmetric encryption and decryption example here and not sure if i should store the IV in a safe place (also encrypted??). Or i can just attach it to the encrypted text for using later when i with to decrypt. From a short reading about AES it seems it's not needed at all for decryption but i'm not sure i got it right and also the aes.CreateDecryptor(keyBytes, iv) need it as parameter.
I use a single key for all encryptions.
It's fairly standard to transmit the encrypted data as IV.Concat(cipherText). It's also fairly standard to put the IV off to the side, like in PKCS#5.
The IV-on-the-side approach matches more closely with how .NET wants to process the data, since it's somewhat annoying to slice off the IV to pass it separately to the IV parameter (or property), and then to have a more complicated slicing operation with the ciphertext (or recovered plaintext).
But the IV is usually transmitted in the clear either way.
So, glue it together, or make it a separate column... whatever fits your program and structure better.
Answer: IV is necessary for decryption as long as the content has been encrypted with it. You don't need to encrypt or hide the IV. It may be public.
--
The purpose of the IV is to be combined to the key that you are using, so it's like you are encrypting every "block of data" with a different "final key" and then it guarantees that the cipher data (the encrypted one) will always be different along the encryption (and decryption) process.
This is a very good illustration of what happens IF YOU DON'T use IV.
Basically, the encryption process is done by encrypting the input data in blocks. So during the encryption of this example, all the parts of the image that have the same color (let's say the white background) will output the same "cipher data" if you use always the same key, then a pattern can still be found and then you didn't hide the image as desired.
So combining a different extra data (the IV) to the key for each block is like you are using a different "final key" for each block, then you solve your problem.
For
`BDK = "0123456789ABCDEFFEDCBA9876543210"` `KSN = "FFFF9876543210E00008"`
The ciphertext generated was below
"C25C1D1197D31CAA87285D59A892047426D9182EC11353C051ADD6D0F072A6CB3436560B3071FC1FD11D9F7E74886742D9BEE0CFD1EA1064C213BB55278B2F12"`
which I found here. I know this cipher-text is based on BDK and KSN but how this 128 length cipher text was generated? What are steps involved in it or algorithm used for this? Could someone explain in simple steps. I found it hard to understand the documents I got while googled.
Regarding DUKPT , there are some explanations given on Wiki. If that doesn't suffice you, here goes some brief explanation.
Quoting http://www.maravis.com/library/derived-unique-key-per-transaction-dukpt/
What is DUKPT?
Derived Unique Key Per Transaction (DUKPT) is a key management scheme. It uses one time encryption keys that are derived from a secret master key that is shared by the entity (or device) that encrypts and the entity (or device) that decrypts the data.
Why DUKPT?
Any encryption algorithm is only as secure as its keys. The strongest algorithm is useless if the keys used to encrypt the data with the algorithm are not secure. This is like locking your door with the biggest and strongest lock, but if you hid the key under the doormat, the lock itself is useless. When we talk about encryption, we also need to keep in mind that the data has to be decrypted at the other end.
Typically, the weakest link in any encryption scheme is the sharing of the keys between the encrypting and decrypting parties. DUKPT is an attempt to ensure that both the parties can encrypt and decrypt data without having to pass the encryption/decryption keys around.
The Cryptographic Best Practices document that VISA has published also recommends the use of DUKPT for PCI DSS compliance.
How DUKPT Works
DUKPT uses one time keys that are generated for every transaction and then discarded. The advantage is that if one of these keys is compromised, only one transaction will be compromised. With DUKPT, the originating (say, a Pin Entry Device or PED) and the receiving (processor, gateway, etc) parties share a key. This key is not actually used for encryption. Instead, another one time key that is derived from this master key is used for encrypting and decrypting the data. It is important to note that the master key should not be recoverable from the derived one time key.
To decrypt data, the receiving end has to know which master key was used to generate the one time key. This means that the receiving end has to store and keep track of a master key for each device. This can be a lot of work for someone that supports a lot of devices. A better way is required to deal with this.
This is how it works in real-life: The receiver has a master key called the Base Derivation Key (BDK). The BDK is supposed to be secret and will never be shared with anyone. This key is used to generate keys called the Initial Pin Encryption Key (IPEK). From this a set of keys called Future Keys is generated and the IPEK discarded. Each of the Future keys is embedded into a PED by the device manufacturer, with whom these are shared. This additional derivation step means that the receiver does not have to keep track of each and every key that goes into the PEDs. They can be re-generated when required.
The receiver shares the Future keys with the PED manufacturer, who embeds one key into each PED. If one of these keys is compromised, the PED can be rekeyed with a new Future key that is derived from the BDK, since the BDK is still safe.
Encryption and Decryption
When data needs to be sent from the PED to the receiver, the Future key within that device is used to generate a one time key and then this key is used with an encryption algorithm to encrypt the data. This data is then sent to the receiver along with the Key Serial Number (KSN) which consists of the Device ID and the device transaction counter.
Based on the KSN, the receiver then generates the IPEK and from that generates the Future Key that was used by the device and then the actual key that was used to encrypt the data. With this key, the receiver will be able to decrypt the data.
Source
First, let me quote the complete sourcecode you linked and of which you provided only 3 lines...
require 'bundler/setup'
require 'test/unit'
require 'dukpt'
class DUKPT::DecrypterTest < Test::Unit::TestCase
def test_decrypt_track_data
bdk = "0123456789ABCDEFFEDCBA9876543210"
ksn = "FFFF9876543210E00008"
ciphertext = "C25C1D1197D31CAA87285D59A892047426D9182EC11353C051ADD6D0F072A6CB3436560B3071FC1FD11D9F7E74886742D9BEE0CFD1EA1064C213BB55278B2F12"
plaintext = "%B5452300551227189^HOGAN/PAUL ^08043210000000725000000?\x00\x00\x00\x00"
decrypter = DUKPT::Decrypter.new(bdk, "cbc")
assert_equal plaintext, decrypter.decrypt(ciphertext, ksn)
end
end
Now, you're asking is how the "ciphertext" was created...
Well, first thing we know is that it is based on "plaintext", which is used in the code to verify if decryption works.
The plaintext is 0-padded - which fits the encryption that is being tested by verifying decryption with this DecrypterTest TestCase.
Let's look at the encoding code then...
I found the related encryption code at https://github.com/Shopify/dukpt/blob/master/lib/dukpt/encryption.rb.
As the DecrypterTEst uses "cbc", it becomes apparent that the encrypting uses:
#cipher_type_des = "des-cbc"
#cipher_type_tdes = "des-ede-cbc"
A bit more down that encryption code, the following solves our quest for an answer:
ciphertext = des_encrypt(...
Which shows we're indeed looking at the result of a DES encryption.
Now, DES has a block size of 64 bits. That's (64/8=) 8 bytes binary, or - as the "ciphertext" is a hex-encoded text representation of the bytes - 16 chars hex.
The "ciphertext" is 128 hex chars long, which means it holds (128 hex chars/16 hex chars=) 8 DES blocks with each 64 bits of encrypted information.
Wrapping all this up in a simple answer:
When looking at "ciphertext", you are looking at (8 blocks of) DES encrypted data, which is being represented using a human-readable, hexadecimal (2 hex chars = 1 byte) notation instead of the original binary bytes that DES encryption would produce.
As for the steps involved in "recreating" the ciphertext, I tend to tell you to simply use the relevant parts of the ruby project where you based your question upon. Simply have to look at the sourcecode. The file at "https://github.com/Shopify/dukpt/blob/master/lib/dukpt/encryption.rb" pretty much explains it all and I'm pretty sure all functionality you need can be found at the project's GitHub repository. Alternatively, you can try to recreate it yourself - using the preferred programming language of your choice. You only need to handle 2 things: DES encryption/decryption and bin-to-hex/hex-to-bin translation.
Since this is one of the first topics that come up regarding this I figured I'd share how I was able to encode the ciphertext. This is the first time I've worked with Ruby and it was specifically to work with DUKPT
First I had to get the ipek and pek (same as in the decrypt) method. Then unpack the plaintext string. Convert the unpacked string to a 72 byte array (again, forgive me if my terminology is incorrect).
I noticed in the dukpt gem author example he used the following plain text string
"%B5452300551227189^HOGAN/PAUL ^08043210000000725000000?\x00\x00\x00\x00"
I feel this string is incorrect as there shouldn't be a space after the name (AFAIK).. so it should be
"%B5452300551227189^HOGAN/PAUL^08043210000000725000000?\x00\x00\x00\x00"
All in all, this is the solution I ended up on that can encrypt a string and then decrypt it using DUKPT
class Encrypt
include DUKPT::Encryption
attr_reader :bdk
def initialize(bdk, mode=nil)
#bdk = bdk
self.cipher_mode = mode.nil? ? 'cbc' : mode
end
def encrypt(plaintext, ksn)
ipek = derive_IPEK(bdk, ksn)
pek = derive_PEK(ipek, ksn)
message = plaintext.unpack("H*").first
message = hex_string_from_unpacked(message, 72)
encrypted_cryptogram = triple_des_encrypt(pek,message).upcase
encrypted_cryptogram
end
def hex_string_from_unpacked val, bytes
val.ljust(bytes * 2, "0")
end
end
boomedukpt FFFF9876543210E00008 "%B5452300551227189^HOGAN/PAUL^08043210000000725000000?"
(my ruby gem, the KSN and the plain text string)
2542353435323330303535313232373138395e484f47414e2f5041554c5e30383034333231303030303030303732353030303030303f000000000000000000000000000000000000
(my ruby gem doing a puts on the unpacked string after calling hex_string_from_unpacked)
C25C1D1197D31CAA87285D59A892047426D9182EC11353C0B82D407291CED53DA14FB107DC0AAB9974DB6E5943735BFFE7D72062708FB389E65A38C444432A6421B7F7EDD559AF11
(my ruby gem doing a puts on the encrypted string)
%B5452300551227189^HOGAN/PAUL^08043210000000725000000?
(my ruby gem doing a puts after calling decrypt on the dukpt gem)
Look at this: https://github.com/sgbj/Dukpt.NET, I was in a similar situation where i wondered how to implement dukpt on the terminal when the terminal has its own function calls which take the INIT and KSN to create the first key, so my only problem was to make sure the INIT key was generated the same way on the terminal as it is in the above mentioned repo's code, which was simple enough using ossl encryption library for 3des with ebc and applying the appropriate masks.
I'm implementing a web based file storage service (C#). The files will be encrypted when stored on the server, but the challenge is how to implement the decryption functionality.
The files can be of any size, from a few KB to several GB. The data transfer is done in chunks, so that the users download the data from say offset 50000, 75000 etc. This works fine for unencrypted files, but if encryption is used the entire file has to be decrypted before each chunk is read from the offset.
So, I am looking at how to solve this. My research so far shows that ECB and CBC can be used. ECB is the most basic (and most insecure), with each block encrypted separately. The way ECB works is pretty much what I am looking for, but security is a concern. CBC is similar, but you need the previous block decrypted before decrypting the current block. This is ok as long as the file is read from beginning to end, and you keep the data as you decrypt it on the server, but at the end of the day its not much better than decrypting the entire file server side before transferring.
Does anyone know of any other alternatives I should consider? I have no code at this point, because I am still only doing theoretical research.
Do not use ECB (Electronic Code Book). Any patterns in your plaintext will appear as patterns in the ciphertext. CBC (Cipher Block Chaining) has random read access (the calling code knows the key and the IV is the previous block's result) but writing a block requires rewriting all following blocks.
A better mode is Counter (CTR). In effect, each block uses the same key and the IV for each block is the sum of offset of that block from a defined start and an initial IV. For example, the IV for block n is IV + n. CTR mode is described in detail on page 15 in NIST SP800-38a. Guidance on key and IV derivation can be found in NIST SP800-108.
There are a few other similar modes such as (Galois Counter Mode) GCM.
Right now, this is what I am doing:
1. SHA-1 a password like "pass123", use the first 32 characters of the hexadecimal decoding for the key
2. Encrypt with AES-256 with just whatever the default parameters are
^Is that secure enough?
I need my application to encrypt data with a password, and securely. There are too many different things that come up when I google this and some things that I don't understand about it too. I am asking this as a general question, not any specific coding language (though I'm planning on using this with Java and with iOS).
So now that I am trying to do this more properly, please follow what I have in mind:
Input is a password such as "pass123" and the data is
what I want to encrypt such as "The bank account is 038414838 and the pin is 5931"
Use PBKDF2 to derive a key from the password. Parameters:
1000 iterations
length of 256bits
Salt - this one confuses me because I am not sure where to get the salt from, do I just make one up? As in, all my encryptions would always use the salt "F" for example (since apparently salts are 8bits which is just one character)
Now I take this key, and do I hash it?? Should I use something like SHA-256? Is that secure? And what is HMAC? Should I use that?
Note: Do I need to perform both steps 2 and 3 or is just one or the other okay?
Okay now I have the 256-bit key to do the encryption with. So I perform the encryption using AES, but here's yet another confusing part (the parameters).
I'm not really sure what are the different "modes" to use, apparently there's like CBC and EBC and a bunch of others
I also am not sure about the "Initialization Vector," do I just make one up and always use that one?
And then what about other options, what is PKCS7Padding?
For your initial points:
Using hexadecimals clearly splits the key size in half. Basically, you are using AES-128 security wise. Not that that is bad, but you might also go for AES-128 and use 16 bytes.
SHA-1 is relatively safe for key derivation, but it shouldn't be used directly because of the existence/creation of rainbow tables. For this you need a function like PBKDF2 which uses an iteration count and salt.
As for the solution:
You should not encrypt PIN's if that can be avoided. Please make sure your passwords are safe enough, allow pass phrases.
Create a random number per password and save the salt (16 bytes) with the output of PBKDF2. The salt does not have to be secret, although you might want to include a system secret to add some extra security. The salt and password are hashed, so they may have any length to be compatible with PBKDF2.
No, you just save the secret generated by the PBKDF2, let the PBKDF2 generate more data when required.
Never use ECB (not EBC). Use CBC as minimum. Note that CBC encryption does not provide integrity checking (somebody might change the cipher text and you might never know it) or authenticity. For that, you might want to add an additional MAC, HMAC or use an encryption mode such as GCM. PKCS7Padding (identical to PKCS5Padding in most occurences) is a simple method of adding bogus data to get N * [blocksize] bytes, required by block wise encryption.
Don't forget to prepend a (random) IV to your cipher text in case you reuse your encryption keys. An IV is similar to a salt, but should be exactly [blocksize] bytes (16 for AES).
Is it recommended that I use an initialization vector to encrypt/decrypt my data? Will it make things more secure? Is it one of those things that need to be evaluated on a case by case basis?
To put this into actual context, the Win32 Cryptography function, CryptSetKeyParam allows for the setting of an initialization vector on a key prior to encrypting/decrypting. Other API's also allow for this.
What is generally recommended and why?
An IV is essential when the same key might ever be used to encrypt more than one message.
The reason is because, under most encryption modes, two messages encrypted with the same key can be analyzed together. In a simple stream cipher, for instance, XORing two ciphertexts encrypted with the same key results in the XOR of the two messages, from which the plaintext can be easily extracted using traditional cryptanalysis techniques.
A weak IV is part of what made WEP breakable.
An IV basically mixes some unique, non-secret data into the key to prevent the same key ever being used twice.
In most cases you should use IV. Since IV is generated randomly each time, if you encrypt same data twice, encrypted messages are going to be different and it will be impossible for the observer to say if this two messages are the same.
Take a good look at a picture (see below) of CBC mode. You'll quickly realize that an attacker knowing the IV is like the attacker knowing a previous block of ciphertext (and yes they already know plenty of that).
Here's what I say: most of the "problems" with IV=0 are general problems with block encryption modes when you don't ensure data integrity. You really must ensure integrity.
Here's what I do: use a strong checksum (cryptographic hash or HMAC) and prepend it to your plaintext before encrypting. There's your known first block of ciphertext: it's the IV of the same thing without the checksum, and you need the checksum for a million other reasons.
Finally: any analogy between CBC and stream ciphers is not terribly insightful IMHO.
Just look at the picture of CBC mode, I think you'll be pleasantly surprised.
Here's a picture:
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
link text
If the same key is used multiple times for multiple different secrets patterns could emerge in the encrypted results. The IV, that should be pseudo random and used only once with each key, is there to obfuscate the result. You should never use the same IV with the same key twice, that would defeat the purpose of it.
To not have to bother keeping track of the IV the simplest thing is to prepend, or append it, to the resulting encrypted secret. That way you don't have to think much about it. You will then always know that the first or last N bits is the IV.
When decrypting the secret you just split out the IV, and then use it together with the key to decrypt the secret.
I found the writeup of HTTP Digest Auth (RFC 2617) very helpful in understanding the use and need for IVs / nonces.
Is it one of those things that need to be evaluated on a case by case
basis?
Yes, it is. Always read up on the cipher you are using and how it expects its inputs to look. Some ciphers don't use IVs but do require salts to be secure. IVs can be of different lengths. The mode of the cipher can change what the IV is used for (if it is used at all) and, as a result, what properties it needs to be secure (random, unique, incremental?).
It is generally recommended because most people are used to using AES-256 or similar block ciphers in a mode called 'Cipher Block Chaining'. That's a good, sensible default go-to for a lot of engineering uses and it needs you to have an appropriate (non-repeating) IV. In that instance, it's not optional.
The IV allows for plaintext to be encrypted such that the encrypted text is harder to decrypt for an attacker. Each bit of IV you use will double the possibilities of encrypted text from a given plain text.
For example, let's encrypt 'hello world' using an IV one character long. The IV is randomly selected to be 'x'. The text that is then encrypted is then 'xhello world', which yeilds, say, 'asdfghjkl'. If we encrypt it again, first generate a new IV--say we get 'b' this time--and encrypt like normal (thus encrypting 'bhello world'). This time we get 'qwertyuio'.
The point is that the attacker doesn't know what the IV is and therefore must compute every possible IV for a given plain text to find the matching cipher text. In this way, the IV acts like a password salt. Most commonly, an IV is used with a chaining cipher (either a stream or block cipher). In a chaining block cipher, the result of each block of plain text is fed to the cipher algorithm to find the cipher text for the next block. In this way, each block is chained together.
So, if you have a random IV used to encrypt the plain text, how do you decrypt it? Simple. Pass the IV (in plain text) along with your encrypted text. Using our fist example above, the final cipher text would be 'xasdfghjkl' (IV + cipher text).
Yes you should use an IV, but be sure to choose it properly. Use a good random number source to make it. Don't ever use the same IV twice. And never use a constant IV.
The Wikipedia article on initialization vectors provides a general overview.