Deterministic Encryption - Generating IV from password key - encryption

I need to encrypt file and directory names/paths but I need the encryption to be deterministic. I need to sync the local files with a cloud storage provider so I can't use probabilistic encryption.
I know that you should not use a static IV when encrypting text. Would this be an acceptable workaround:
Run passphrase through scrypt and store resulting output
Take the resulting output from scrypt and hash it (using MD5 for example)
Take the first 16 bytes of the hash and use it as the IV to encrypt the directory and file name
The only other thing I can think of:
Use probabilistic encryption
Read the directory/file structure from the cloud service provider and local directory
Map all the encrypted cloud provider names with their decrypted values
Map all the encrypted local names with their decrypted values
Sync based on the mappings found above
The only issue with that is that it is time consuming and really difficult to implement when using different cloud service providers.

In order to securely encrypt data, you need to use a different key/IV pair for each message. If you don't, you leak a lot of information about the encryption and it becomes very weak. However, it's not too difficult to do if you have an incrementing counter that never repeats:
Generate a random salt (32 bytes) and store it with the rest of the data. This is public.
Take the current version of the counter as a 32-bit or 64-bit integer.
Use scrypt with your passphrase, and for the salt, concatenate your salt and the counter. Take enough bytes out for both a key and an IV.
Encrypt your file or directory name (ideally with an AEAD if possible, such as AES-GCM or ChaCha20-Poly1305) using the key and IV you've generated. Prepend the counter as an integer.
Increment the counter and store the new counter.
Using a key derivation function like scrypt to generate both the key and IV is secure as long as you use a different salt each time. By generating a random salt, which can be used for your entire project, and then appending a counter, you're producing salts that are both distinct and different from those used by others. Using just the counter wouldn't be distinct enough.
Your proposed idea will use the same key/IV pair for each file name encryption, which would be weak. It doesn't matter how you generate that same key/IV pair, using the same one would remain weak. You must also never reuse the counter in my proposal above, because otherwise you generate the same key/IV pair from scrypt. You can reuse the same counter if you change the random salt, though.
As a note, you should avoid using MD5 for any reason. SHA-256 or BLAKE2b are better choices in all situations.
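The derivation steps above can be sketched roughly as follows. This is only an illustration using Python's standard `hashlib.scrypt`; the function name `derive_key_iv`, the 64-bit big-endian counter encoding, the cost parameters, and the 32-byte key plus 12-byte nonce split (sized for AES-GCM) are all assumptions, not part of the answer.

```python
import hashlib
import secrets
import struct

KEY_LEN = 32    # assumed: AES-256 key
NONCE_LEN = 12  # assumed: 96-bit nonce as commonly used with AES-GCM


def derive_key_iv(passphrase: bytes, project_salt: bytes, counter: int,
                  n: int = 2**14) -> tuple[bytes, bytes]:
    """Derive a fresh key/IV pair from the passphrase, the salt and a counter."""
    # The scrypt salt is the public random salt followed by the counter,
    # so every counter value yields an unrelated key/IV pair.
    salt = project_salt + struct.pack(">Q", counter)
    material = hashlib.scrypt(passphrase, salt=salt, n=n, r=8, p=1,
                              dklen=KEY_LEN + NONCE_LEN)
    return material[:KEY_LEN], material[KEY_LEN:]


# Hypothetical usage: one random 32-byte salt per project, counter never reused.
project_salt = secrets.token_bytes(32)
key, iv = derive_key_iv(b"correct horse battery staple", project_salt, counter=0)
```

The counter itself would still be prepended to the ciphertext, as described above, so the same pair can be re-derived for decryption.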

Related

Self-validating encrypted string - is method feasible?

I have a keystring which allows customer to have additional features.
Obviously I would like the software to check that this string is valid, and not modified.
Is the following idea feasible:
get the key string as encrypted value, and encode it in Base64
(my encrypted string is around 100 characters, for my purpose)
calculate the checksum (MD5) of course using a private salt.
weave the checksum into the encrypted data
In principle :
xxxxCxxxxxxCxxxxxxxxCxxxxxxxxxxCxxxxxxxxxxxxxCxxx
the places to weave into the encrypted data could be determined by the first character of the encrypted string, creating up to 16 different patterns.
On checking the code validity I simply "unweave" the checksum, test if it's correct, and thereby know if the data has been modified.
Is my line of thoughts correct ?
The cryptographic feature you're thinking of is called "authentication," and there are many well-established approaches. You should strongly avoid inventing your own, particularly using a long-outdated hash like MD5. When an encryption system is authenticated, it can detect changes to the ciphertext.
Your best approach is to use an authenticated cipher mode, such as AES-GCM. Used correctly, that combines encryption and authentication in a single operation. When decrypting with an authenticated scheme, the decryption will fail if the cipher text has been modified.
If you don't have access to AES-GCM, the next option is AES-CBC+HMAC, which uses the more ubiquitous AES-CBC with a random IV, and appends a keyed hash (called an HMAC) to the end of the message to authenticate it. In order to authenticate, you need to remove the HMAC, use it to validate that the cipher text is unmodified, and then proceed to decrypt normally. This scheme is generally called "encrypt then MAC."
The implementation details will depend on your language and frameworks.
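As a rough illustration of the "encrypt then MAC" check in Python, the sketch below treats the ciphertext as opaque bytes (the AES-CBC step itself is left out) and only shows appending and verifying an HMAC-SHA256 tag. The names `seal` and `open_sealed` are made up for this example, and the MAC key is assumed to be separate from the encryption key.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte tag


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over the ciphertext (IV included)."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, blob: bytes) -> bytes:
    """Verify and strip the tag; raise ValueError if the data was modified."""
    if len(blob) < TAG_LEN:
        raise ValueError("message too short")
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return ciphertext  # only now is it safe to decrypt
```

The important design point is the order: the tag is checked before any decryption is attempted.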

Encrypting a file with a weak password, bcrypt or SHA-256 + AES-256?

I start with a weak password (8 lower case characters for ex) and a file. I need to encrypt that file using that password. Result has to be secure against known attacks.
Approach 1: I could hash the password using SHA-256 and then use the resulting hash and file as inputs to AES-256, giving me an encrypted file. I understand that both SHA-256 and AES-256 are very fast. Wouldn't this make the file vulnerable to a brute force attack?
For example, could one grab a rainbow table of pre-computed SHA-256 hashes and, assuming it's a really small file and a really weak password, try to AES-256 decrypt using each hash from that table in a reasonable time (a few months with specialized hardware).
Approach 2: Use bcrypt. If I understand correctly, bcrypt is better suited for encrypting files than SHA-256 + AES-256, since its key generation scheme has a work factor resulting in a stronger key. Or am I wrong?
The Ruby and Python implementations (wrappers?) that I've seen focus on using bcrypt as a hashing scheme for passwords, not a cipher per se. Can I even use bcrypt to hash a weak pass AND encrypt the file in "one step"?
Approach 3: Use bcrypt to hash the pass, use that hash and file as inputs into AES-256, giving me the encrypted file. This takes care of the "key is too fast to generate" problem. (Assuming its a problem.) However, bcrypt hashes are 448-bits long and AES-256 wants a 256-bit key. Naive solution is to simply drop the trailing bits of the hash and use that as the key for AES-256. I would NOT go this route because I don't know enough about cryptography to know what the consequences are.
EDIT: I originally wrote that I can't salt the pass, since this is for an offline application, i.e. there is no reasonable place to store the salt. That was wrong: I can salt the pass and store the salt unencrypted along with the encrypted file. Salts are almost inherently public/visible if, say, a database is compromised. The purpose of a salt is to prevent a rainbow table attack. Thanks to Nemo, below.
Approach 4: Use PKCS#5 (PBKDF2 for deriving a key from a pass + a cipher of your choice for encryption using that key), preferably somebody else's implementation.
And don't forget the salt. (You store it together with the encrypted data. It only needs to be 8 bytes or so.)
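A minimal sketch of Approach 4 with Python's built-in PBKDF2, for illustration only. The iteration count, the 16-byte salt (a bit more than the 8 bytes mentioned above) and the function names are assumptions; the derived key would then be handed to whatever cipher implementation you use.

```python
import hashlib
import secrets

SALT_LEN = 16         # assumed salt size
ITERATIONS = 600_000  # assumed work factor; tune it for your hardware


def derive_file_key(password: str, salt: bytes,
                    iterations: int = ITERATIONS) -> bytes:
    """Derive a 256-bit key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, iterations, dklen=32)


def new_file_key(password: str,
                 iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Pick a fresh random salt and return (salt, key)."""
    salt = secrets.token_bytes(SALT_LEN)
    return salt, derive_file_key(password, salt, iterations)
```

When encrypting, the salt is written unencrypted next to the ciphertext; when decrypting, it is read back and passed to `derive_file_key` together with the password.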

Encryption: How to turn an 8 character string into a 128-bit key, 256-bit key, etc?

I tried to research this, but there were still some questions left unanswered. I was looking into figuring out how an 8 character password gets turned into a high-bit encryption key. During my research I found articles that would talk about the salt value.
Assume you could get all 256 characters to play with, then an 8-character password would be 64-bits long. So, the remaining 64 bits is simply a salt value. And, correct me if I'm wrong, but this is done so that if someone was going to try ALL the possible values (brute force) they'd have to try all 128-bits since even the salt is unknown.
My questions really relate to this 'salt' value:
When someone makes an application, is the salt value hard-coded into it? And if so, can't it be obtained through reverse engineering the executable?
If the salt is generated at random, then I assume it must have some way to duplicate it. So, isn't that function that returns a random salt able to be reverse engineered to force it to duplicate itself to get the salt value?
This might be out of the scope, but if a salt value is generated on a server side (of a client/server relation), then wouldn't it have to be shared with the client so they can decrypt data sent by the server? And, if it's being sent over to the client, can't it be intercepted which makes it useless?
Is there some other method that is used besides this 'salt' value that can turn an 8-character string into a strong encryption key?
As usual with security-related questions, this answer's going to be a long one.
First, the simple answer.
Q: How does one turn an 8-character string into a 128-bit key?
A: One doesn't.
This is a truthful answer. Now, one that's more appropriate to what you're asking:
A: Create a random 64-bit value, and store it with the password in the database. Now, the password is half the key, and the random value is the other half.
This one is a lie. Here's what you actually do:
A: Hash the password along with a random salt using a method producing 128-bit or longer output. Use 128 bits of that as the key. Store the salt.
Now to address your questions on salt. First off, the purpose of salt is not really to lengthen encryption keys. It is to prevent people building rainbow tables - mappings from hashed to unhashed forms. To see that your encryption is no stronger, just imagine the attacker knows your key-extending algorithm. Now, instead of guessing 128-bit keys, he just guesses your 64-bit password and then uses the same algorithm. Voila. If the salt is unknown to the attacker, yes, you've gained a bit, but they must already have your ciphertexts to attack them, and the salt must be stored in the plain along with the ciphertext. So this is an unlikely scenario.
Salt is random per encryption key.
Random means random. If you are insufficiently random when you use a cryptography algorithm which assumes unpredictable material, you are vulnerable. That's what /dev/random is for - the system entropy pool is very good. Get a better hardware RNG if you're worried.
Yes, if you salted the key, someone needs the salt to decrypt things you encrypted using the salted key's hashed value. No, sending the salt does not necessarily compromise your data; send the salt only to someone who has proved they already have the password, but it's stored in your database next to the ciphertext. As mentioned above, someone needs both the salt and the ciphertext to mount an attack. Again, the purpose of the salt is not to raise the strength of the encryption, it is only to prevent precomputation attacks against your hashes.
There are methods of key extension. But, fundamentally, your protection is only so strong as its weakest link, so to provide 100% unbreakable encryption you will need a one-time-pad (a truly random key as long as the data to be encrypted). In the real world, what is usually done is hashing the password along with a salt to produce unpredictable longer keying material.
The function that turns a password or passphrase into a cryptographic key is called a Key Derivation Function (this might help you searching for more information on the topic). Such functions take a password and a randomly generated salt, and produce a key through a process that is deliberately computationally intensive. To reproduce that key, you must have both the password and the salt - so you are correct, the salt must be stored or transmitted along with the encrypted data.
The reason that Key Derivation Functions use a salt is to increase the work factor for any attacker. If a salt was not used, then a given password will only ever produce one single key. This means that an attacker can easily create a dictionary of keys - one key for each word in his dictionary. If, on the other hand, a 64 bit salt is used then each password can produce ~2**64 different possible keys, which expands the size of the dictionary by the same factor. This essentially makes producing such a dictionary ahead-of-time impossible. Instead, the attacker has to wait until he's seen the salt value, and then start generating keys to test. Since the key derivation function is computationally expensive, this is slow, and he won't be able to get far through his dictionary in a reasonable timeframe.
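To make the dictionary argument concrete, here is a small Python sketch. The password, the 64-bit salts, the iteration count and the function names are illustrative assumptions. Without a salt, one password always maps to one key, so a precomputed dictionary works against everyone; with a salt, the same password gives unrelated keys.

```python
import hashlib
import secrets


def unsalted_key(password: bytes) -> bytes:
    """A plain hash: the same password always gives the same 128-bit key."""
    return hashlib.sha256(password).digest()[:16]


def salted_key(password: bytes, salt: bytes, iterations: int = 200_000) -> bytes:
    """A key derivation function: the key depends on password and salt."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=16)


password = b"hunter22"
salt_a = secrets.token_bytes(8)  # 64-bit salts, as in the text above
salt_b = secrets.token_bytes(8)

# One dictionary entry covers every unsalted use of this password...
same_everywhere = unsalted_key(password) == unsalted_key(password)
# ...but each salt needs its own entry, even though the salt is public.
differs_per_salt = salted_key(password, salt_a) != salted_key(password, salt_b)
```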
1) Each password is salted differently, and the salt is stored with the hash.
2) It's stored.
3) No, the client never decrypts anything. It sends the password, which the server salts, hashes and compares.
4) Yes, I'll add a few links.
Salts are generally not hardcoded, but they are generated at random, usually server-side, and never communicated to the user.
The salt would be stored in a database, separate from the passwords. The idea is that even if the password hash database is stolen, it would be very difficult to get the actual passwords (you'd have to try a lot of combinations), without having the salts as well. The salts would be generated at random, and different for each user, so even if you found it out for one, you'd still need to find all the others.
The salt is never sent, because the client never decrypts anything. The client sends the password to the server, and the server adds the salt (which is randomly generated and stored for each user, and which the user never knows).
So basically this is what happens.
On registration:
User sends password to server.
Server adds a random salt to the password and then hashes it.
The salt and final hash are stored in separate tables.
On login:
User sends password to server.
Server fetches the stored salt and adds it to the password.
Server hashes the password and salt.
If the final hash matches the one in database, the user is logged in.
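The registration and login steps above might look roughly like this in Python. The two dictionaries stand in for the separate salt and hash tables, and the use of scrypt with these parameters is an assumption made for the sketch, not something this answer specifies.

```python
import hashlib
import hmac
import secrets

salts: dict[str, bytes] = {}   # stand-in for the salt table
hashes: dict[str, bytes] = {}  # stand-in for the password-hash table


def hash_password(password: str, salt: bytes) -> bytes:
    """Salt and hash a password with a deliberately slow function."""
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)


def register(username: str, password: str) -> None:
    salt = secrets.token_bytes(16)  # random, different for each user
    salts[username] = salt
    hashes[username] = hash_password(password, salt)


def login(username: str, password: str) -> bool:
    if username not in salts:
        return False
    candidate = hash_password(password, salts[username])
    # Constant-time comparison of the stored and recomputed hashes.
    return hmac.compare_digest(candidate, hashes[username])
```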
...Is there some other method that is used besides this 'salt' value that can turn an 8-character string into a strong encryption key?
YES but...
You can compute the hash of that 8-character string:
For example if you need a 256 bit key:
key-256bit = hash(8-character string) // use SHA-256 - very secure
key-128bit = hash(8-character string) // use MD5 - no longer considered secure
"into a strong encryption key?" How strong depends on what you need. Hashing adds no entropy: an 8-character string drawn from the 95 printable ASCII characters has at most 95^8 (about 2^53) possible values, so the hash can only ever take that many values. That is far fewer than 2^128 or 2^256, and short lowercase-only passwords are an easy task to brute force!!
conclusion: a salt would be of great value!
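As a back-of-the-envelope check on how large the space of 8-character passwords really is, this small Python sketch computes the equivalent key length in bits for a few alphabets. The alphabet sizes are the usual assumptions: 26 lowercase letters, 95 printable ASCII characters, 256 byte values.

```python
import math


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Bits needed to enumerate every string of `length` over the alphabet."""
    return length * math.log2(alphabet_size)


for label, size in [("lowercase letters", 26),
                    ("printable ASCII", 95),
                    ("any byte value", 256)]:
    print(f"8 characters of {label}: about {keyspace_bits(size, 8):.1f} bits")
```

Whatever the hash's output length, an attacker only has to search this many candidates, which is why a slow key derivation function matters more than the hash size.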
cheers
Daniel

AES Encryption and key storage?

A few years ago, when first being introduced to ASP.net and the .NET Framework, I built a very simple online file storage system.
This system used Rijndael encryption for storing the files encrypted on the server's hard drive, and an HttpHandler to decrypt and send those files to the client.
Being one of my first projects with ASP.net and databases, not understanding much about how the whole thing works (as well as falling to the same trap described by Jeff Atwood on this subject), I decided to store freshly generated keys and IVs together with each file entry in the database.
To make things a bit clearer, encryption was only to protect files from direct access to the server, and keys were not generated by user-entered passwords.
My question is, assuming I don't want to keep one key for all files, how should I store encryption keys for best security? What is considered best practice? (i.e: On a different server, on a plain-text file, encrypted).
Also, what is the initialization vector used for in this type of encryption algorithm? Should it be constant in a system?
Keys should be protected and kept secret, simple as that. The implementation is not. Key Management Systems get sold for large amounts of money by trusted vendors because solving the problem is hard.
You certainly don't want to use the same key for each user; the more a key is used, the "easier" it becomes to break it, or at least to leak some information. AES is a block cipher: it encrypts data in fixed-size blocks, and in a chaining mode such as CBC the result of each block's encryption is fed into the next block. An initialization vector is the initial feed into the algorithm, because at the starting point there is nothing to start with. Using random IVs with the same key lowers the risk of information leaks; the IV should be different for every single piece of data encrypted.
How you store the keys depends on how your system is architected. I've just finished a KMS where the keys are kept away from the main system and functions to encrypt and decrypt are exposed via WCF. You send in plain text and get a reference to a key and the ciphered text back - that way the KMS is responsible for all cryptography in the system. This may be overkill in your case. If the user enters a password into your system then you could use that to generate a key pair. This keypair could then be used to encrypt a key store for that user - XML, SQL, whatever, and used to decrypt each key which is used to protect data.
Without knowing more about how your system is configured, or its purpose, it's hard to recommend anything other than "Keys must be protected, keys and IVs must not be reused."
There's a very good article on this one at http://web.archive.org/web/20121017062956/http://www.di-mgt.com.au/cryptoCreditcard.html which covers both the IV and salting issues and the problems with ECB referred to above.
It still doesn't quite cover "where do I store the key", admittedly, but after reading and digesting it, it won't be a huge leap to a solution hopefully....
As a pretty good solution, you could store your Key/IV pair in a table:
ID Key IV
skjsh-38798-1298-hjj FHDJK398720== HFkjdf87923==
When you save an encrypted value, save the ID and a random Salt value along with it.
Then, when you need to decrypt the value, lookup the key/iv pair using the id and the salt stored with the data.
You'd want to make sure you have a good security model around the key storage. If you went with SQL server, don't grant SELECT rights to the user that accesses the database from the application. You wouldn't want to give someone access to the whole table.
What if you simply generated a key for each user, then encrypted it with a "master key"? Then make sure to use random IVs, and as long as you keep the master key secret, no one should be able to make much use of any number of stolen keys. Of course, the encryption and decryption functions would have to be server-side, and the master key should not be exposed at all, not even to the rest of the server. This would be a decent way to go about it, but obviously there are some issues: if you have stored your master key unsafely, there goes your security. Of course, you could encrypt the master key, but then you're just kicking the can down the road. Maybe you could have an AES key encrypted with an RSA key, with the RSA key then secured by a secret passphrase. This would mitigate the problem: with a decent-sized RSA key you should be good, and you could then expose the encryption functions to the client (though you still probably shouldn't), since key encryption uses the public key, which can safely be taken. For added security, you could cycle the RSA key every few months or even weeks if need be. These are just a few ideas, and I know that it isn't bulletproof, but it is more secure than just stuffing it in a SQL database.

How to implement password protection for individual files?

I'm writing a little desktop app that should be able to encrypt a data file and protect it with a password (i.e. one must enter the correct password to decrypt). I want the encrypted data file to be self-contained and portable, so the authentication has to be embedded in the file (or so I assume).
I have a strategy that appears workable and seems logical based on what I know (which is probably just enough to be dangerous), but I have no idea if it's actually a good design or not. So tell me: is this crazy? Is there a better/best way to do it?
Step 1: User enters plain-text password, e.g. "MyDifficultPassword"
Step 2: App hashes the user-password and uses that value as the symmetric key to encrypt/decrypt the data file. e.g. "MyDifficultPassword" --> "HashedUserPwdAndKey".
Step 3: App hashes the hashed value from step 2 and saves the new value in the data file header (i.e. the unencrypted part of the data file) and uses that value to validate the user's password. e.g. "HashedUserPwdAndKey" --> "HashedValueForAuthentication"
Basically I'm extrapolating from the common way to implement web-site passwords (when you're not using OpenID, that is), which is to store the (salted) hash of the user's password in your DB and never save the actual password. But since I use the hashed user password for the symmetric encryption key, I can't use the same value for authentication. So I hash it again, basically treating it just like another password, and save the doubly-hashed value in the data file. That way, I can take the file to another PC and decrypt it by simply entering my password.
So is this design reasonably secure, or hopelessly naive, or somewhere in between? Thanks!
EDIT: clarification and follow-up question re: Salt.
I thought the salt had to be kept secret to be useful, but your answers and links imply this is not the case. For example, this spec linked by erickson (below) says:
Thus, password-based key derivation as defined here is a function of a password, a salt, and an iteration count, where the latter two quantities need not be kept secret.
Does this mean that I could store the salt value in the same place/file as the hashed key and still be more secure than if I used no salt at all when hashing? How does that work?
A little more context: the encrypted file isn't meant to be shared with or decrypted by others, it's really single-user data. But I'd like to deploy it in a shared environment on computers I don't fully control (e.g. at work) and be able to migrate/move the data by simply copying the file (so I can use it at home, on different workstations, etc.).
Key Generation
I would recommend using a recognized algorithm such as PBKDF2 defined in PKCS #5 version 2.0 to generate a key from your password. It's similar to the algorithm you outline, but is capable of generating longer symmetric keys for use with AES. You should be able to find an open-source library that implements PBE key generators for different algorithms.
File Format
You might also consider using the Cryptographic Message Syntax as a format for your file. This will require some study on your part, but again there are existing libraries to use, and it opens up the possibility of inter-operating more smoothly with other software, like S/MIME-enabled mail clients.
Password Validation
Regarding your desire to store a hash of the password, if you use PBKDF2 to generate the key, you could use a standard password hashing algorithm (big salt, a thousand rounds of hashing) for that, and get different values.
Alternatively, you could compute a MAC on the content. A hash collision on a password is more likely to be useful to an attacker; a hash collision on the content is likely to be worthless. But it would serve to let a legitimate recipient know that the wrong password was used for decryption.
Cryptographic Salt
Salt helps to thwart pre-computed dictionary attacks.
Suppose an attacker has a list of likely passwords. He can hash each and compare it to the hash of his victim's password, and see if it matches. If the list is large, this could take a long time. He doesn't want spend that much time on his next target, so he records the result in a "dictionary" where a hash points to its corresponding input. If the list of passwords is very, very long, he can use techniques like a Rainbow Table to save some space.
However, suppose his next target salted their password. Even if the attacker knows what the salt is, his precomputed table is worthless—the salt changes the hash resulting from each password. He has to re-hash all of the passwords in his list, affixing the target's salt to the input. Every different salt requires a different dictionary, and if enough salts are used, the attacker won't have room to store dictionaries for them all. Trading space to save time is no longer an option; the attacker must fall back to hashing each password in his list for each target he wants to attack.
So, it's not necessary to keep the salt secret. Ensuring that the attacker doesn't have a pre-computed dictionary corresponding to that particular salt is sufficient.
As Niyaz said, the approach sounds reasonable if you use a quality implementation of strong algorithms, like SHA-256 and AES for hashing and encryption. Additionally I would recommend using a salt to make it harder to build a dictionary of all password hashes.
Of course, reading Bruce Schneier's Applied Cryptography is never wrong either.
If you are using a strong hash algorithm (SHA-2) and a strong Encryption algorithm (AES), you will do fine with this approach.
Why not use a compression library that supports password-protected files? I've used a password-protected zip file containing XML content in the past :}
Is there really a need to save the hashed password into the file? Can't you just use the password (or hashed password) with some salt and then encrypt the file with it? When decrypting, just try to decrypt the file with the password + salt. If the user gives the wrong password, the decrypted file isn't correct.
The only drawback I can think of is that if the user accidentally enters the wrong password and the decryption is slow, he has to wait to try again. And of course if the password is forgotten there's no way to decrypt the file.
