Block ciphers SALT: clear text or secret? - encryption

There are many articles and quotes on the web saying that a 'salt' must be kept secret. Even the Wikipedia entry on Salt:
For best security, the salt value is
kept secret, separate from the
password database. This provides an
advantage when a database is stolen,
but the salt is not. To determine a
password from a stolen hash, an
attacker cannot simply try common
passwords (such as English language
words or names). Rather, they must
calculate the hashes of random
characters (at least for the portion
of the input they know is the salt),
which is much slower.
Since I happen to know for a fact that an encryption salt (or initialization vector) is OK to store in clear text along with the encrypted text, I want to ask: why is this misconception perpetuated?
My opinion is that the origin of the problem is a common confusion between the encryption salt (the block cipher's initialization vector) and the hashing 'salt'. In storing hashed passwords it is common practice to add a nonce, or 'salt', and it is (marginally) true that this 'salt' is better kept secret. That in turn makes it not a salt at all, but a key, similar to the much more clearly named secret in HMAC. If you look at the article Storing Passwords - done right!, which is linked from the Wikipedia 'Salt' entry, you'll see that it is talking about this kind of 'salt', the password hash salt. I happen to disagree with most of these schemes because I believe that a password storage scheme should also allow for HTTP Digest authentication, in which case the only possible storage is the HA1 digest of username:realm:password; see Storing password in tables and Digest authentication.
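For readers unfamiliar with HA1: in HTTP Digest authentication it is the MD5 digest of the string username:realm:password. A minimal sketch follows; the account values are hypothetical, and UTF-8 encoding is an assumption made here for illustration.

```python
import hashlib

def ha1(username: str, realm: str, password: str) -> str:
    """Return the HTTP Digest HA1 value: MD5 of "username:realm:password"."""
    joined = f"{username}:{realm}:{password}".encode("utf-8")  # encoding is an assumption
    return hashlib.md5(joined).hexdigest()

# Hypothetical account, for illustration only.
stored = ha1("alice", "example.org", "correct horse")
print(stored)
```

Because the realm is part of the input, the same username and password produce a different HA1 on a different realm.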
If you have an opinion on this issue, please post here as a response.
Do you think that the salt for block cipher encryption should be hidden? Explain why and how.
Do you agree that the blanket statement 'salts should be hidden' originates from salted hashing and does not apply to encryption?
Should we include stream ciphers in the discussion (RC4)?

If you are talking about the IV in a block cipher, it definitely should be in the clear. Most people make their cipher weaker by using a secret IV.
The IV should be random and different for each encryption. Managing a random IV is difficult, so some people simply use a fixed IV, defeating the purpose of the IV.
I used to work with a database whose passwords were encrypted using a secret fixed IV. The same password was always encrypted to the same ciphertext, which makes it very prone to a rainbow table attack.

Do you think that the salt for block
cipher encryption should be hidden?
Explain why and how
No it shouldn't. The strength of a block cipher relies on the key. IMO you should not increase the strength of your encryption by adding extra secrets. If the cipher and key are not strong enough then you need to change the cipher or key length, not start keeping other bits of data secret. Security is hard enough so keep it simple.

Like LFSR Consulting says:
There are people that are much smarter
than you and I that have spent more
time thinking about this topic than
you or I ever will.
Which is a loaded answer, to say the least. There are folks, marginally in the honest category, who will overlook some restraints when money is available. There are plenty of people with no skin in the game who will lower the boundaries for that type.
Then, not too far away, there is a type of risk that comes from social factors, which is almost impossible to program away. For that kind of person, setting up a device solely to "break the locks" can be an exercise of pure pleasure, for no gain or measurable reason. That said, you asked that those who have an opinion please respond, so here goes:
Do you think that the salt for block
cipher encryption should be hidden?
Explain why and how.
Think of it this way: it adds to the computational strength needed. It's just one more thing to hide if it has to be hidden. In and of itself, being forced to hide anything (salt, IV, or anything else) places the entity doing the security in the position of being forced to do something. Any time the opposition can tell you what to do, they can manipulate you. If it leaks, that should be caught by cross-controls that detect the leak, with replacement salts available. There is no perfect cipher, save the one-time pad, and even that can be compromised somehow, as the greatest risk comes from within.
In my opinion, the only solution is to be selective about whom you do any security for - the issue of protecting salts leads to issues that are relevant to the threat model. Obviously, keys have to be protected. If you have to protect the salt, you probably need to review your burger flippin resume and question the overall security approach of those for whom you are working.
There is no answer, actually.
Do you agree that the blanket statement 'salts should be hidden' originates from salted hashing and does not apply to encryption?
Who said this, where, and what basis was given?
Should we include stream ciphers in discussion (RC4)?
A cipher is a cipher - what difference would it make?

In a chained mode such as CBC, each encrypted block is the next block's IV. So by definition, the IV cannot be secret. Each block is an IV.
The first block is not very different. An attacker who knows the length of the plain text could have a hint that the first block is the IV.
BLOCK 1 could be the IV, or could be encrypted with a well-known IV
BLOCK 2 is encrypted with BLOCK 1 as an IV
...
BLOCK N is encrypted with BLOCK N-1 as an IV
Still, whenever possible, I generate a random (non-null) IV and give it to each party out-of-band. But the security gain is probably not that important.
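The chaining described in this answer can be sketched with the standard library alone. The block cipher below is a TOY 4-round Feistel network built from HMAC-SHA256, used here only as a stand-in for AES so the example stays self-contained. All function names are hypothetical, padding is omitted, and none of this is meant for real use.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # bytes per block, the same size AES uses

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Feistel round function: keyed hash truncated to half a block.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # TOY cipher standing in for AES; illustrative only.
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("padding is omitted in this sketch")
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        # Each ciphertext block acts as the IV for the next block.
        prev = toy_encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return iv + b"".join(out)  # the IV travels in the clear as "block 0"

def cbc_decrypt(key: bytes, blob: bytes) -> bytes:
    blocks = [blob[i:i + BLOCK] for i in range(0, len(blob), BLOCK)]
    return b"".join(_xor(toy_decrypt_block(key, cur), prev)
                    for prev, cur in zip(blocks, blocks[1:]))

key = secrets.token_bytes(32)
message = b"sixteen byte msg" * 2

# Random IVs: the same message encrypts differently each time.
a = cbc_encrypt(key, secrets.token_bytes(BLOCK), message)
b = cbc_encrypt(key, secrets.token_bytes(BLOCK), message)

# Fixed IV: the same message always yields the same ciphertext.
fixed_iv = bytes(BLOCK)
c = cbc_encrypt(key, fixed_iv, message)
d = cbc_encrypt(key, fixed_iv, message)
```

The decryptor reads the IV from the front of the blob, so nothing about the IV needs to be secret. What matters is that it differs between encryptions.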

The purpose of a per record salt is to make the task of reversing the hashes much harder. So if a password database is exposed the effort required to break the passwords is increased. So assuming that the attacker knows exactly how you perform the hash, rather than constructing a single rainbow table for the entire database they need to do this for every entry in the database.
The per record salt is usually some combination of fields in the record that vary greatly between records. Transaction time, Account Number, transaction Number are all good examples of fields that can be used in a per record salt. A record salt should come from other fields in the record. So yes it is not secret, but you should avoid publicising the method of calculation.
There is a separate issue with a database wide salt. This is a sort of key, and protects against the attacker using existing rainbow tables to crack the passwords. The database wide salt should be stored separately so that if the database is compromised then it is unlikely that the attacker will get this value as well.
A database wide salt should be treated as though it was a key and access to the salt value should be moderately protected. One way of doing this is to split the salt into components that are managed in different domains. One component in the code, one in a configuration file, one in the database. Only the running code should be able to read all of these and combine them together using a bitwise XOR.
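A minimal sketch of the XOR combination, assuming three equal-length components. The function name and the way each component is obtained are hypothetical; random bytes stand in for the configuration-file and database values.

```python
import secrets

def combine_salt(*components: bytes) -> bytes:
    """XOR equal-length components together into the database-wide salt."""
    if len({len(c) for c in components}) != 1:
        raise ValueError("all components must have the same length")
    result = bytes(len(components[0]))
    for component in components:
        result = bytes(x ^ y for x, y in zip(result, component))
    return result

# Hypothetical sources: in practice each part lives in a different place.
from_code = bytes.fromhex("00112233445566778899aabbccddeeff")
from_config = secrets.token_bytes(16)    # stands in for a config-file value
from_database = secrets.token_bytes(16)  # stands in for a database value

db_salt = combine_salt(from_code, from_config, from_database)
```

With XOR, someone holding only one or two of the components learns nothing about the combined value, as long as the missing component is random.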
The last area is where many fail. There must be a way to change these salt values and/or the algorithm. If a security incident occurs we may want to be able to change the salt values easily. The database should have a salt version field, and the code will use the version to identify which salts to use and in what combination. The encryption or hash creation always uses the latest salt algorithm, but the decode/verify function always uses the algorithm specified in the record. This way a low priority thread can read through the database decrypting and re-encrypting the entries.
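The version-field idea might look like the sketch below. Several things are assumptions made for illustration: the registry contents, the record layout, a random per-record salt, and a single HMAC-SHA256 in place of a properly slow hash. One deliberate swap: a hashed password cannot be re-hashed by a background thread without the plaintext, so this sketch upgrades a record at the next successful verification instead.

```python
import hashlib
import hmac
import secrets

# Hypothetical registry: version -> database-wide salt (treated like a key).
DB_SALTS = {
    1: bytes.fromhex("aa" * 16),
    2: bytes.fromhex("bb" * 16),
}
LATEST = max(DB_SALTS)

def _digest(version: int, record_salt: bytes, password: str) -> bytes:
    key = DB_SALTS[version]
    return hmac.new(key, record_salt + password.encode("utf-8"), hashlib.sha256).digest()

def make_record(password: str, version: int = LATEST) -> dict:
    # New hashes default to the latest salt version.
    record_salt = secrets.token_bytes(16)
    return {"salt_version": version, "salt": record_salt,
            "hash": _digest(version, record_salt, password)}

def verify(record: dict, password: str) -> bool:
    # Verification uses the version stored in the record, not the latest.
    expected = _digest(record["salt_version"], record["salt"], password)
    return hmac.compare_digest(expected, record["hash"])

def verify_and_upgrade(record: dict, password: str) -> bool:
    if not verify(record, password):
        return False
    if record["salt_version"] != LATEST:
        record.update(make_record(password))  # re-hash under the latest version
    return True

old = make_record("hunter2", version=1)
```

After `verify_and_upgrade(old, "hunter2")` succeeds, the record carries the latest version and a fresh per-record salt.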

Related

Symmetric AES encryption concept

I have a project for a website, running on Django. One function of it needs to store user/password for a third party website. So it needs to be symmetric encryption, as it needs to use these credentials in an automated process.
Storing credentials is never a good idea, I know, but for this case there is no other option.
My idea so far is, to create a Django app, that will save and use these passwords, and do nothing else. With this I can have 2 "webservers" that will not receive any request from outside, but only get tasking via redis or something. Therefore I can isolate them to some degree (they are the only servers who will have access to this extra db, they will not handle any web request, etc)
First question: Does this plan sound solid or is there a major flaw?
Second question is about the encryption itself:
AES requires an encryption key for all its work, ok that needs to be "secured" in some way. But I am more interested in the IV.
Every user can have one or more credential sets saved in the extra db. Would it be a good idea to use some sort of hash over the user id or something similar to generate a per-user custom IV? Most of the time I see the IV just being randomly generated. But then I will have to also store them somewhere in addition to the key.
For me it gets a bit confusing here. I need the key and IV to decrypt, but I would "store" them the same way. So wouldn't it be likely that if one gets compromised, the other will be too? Would it then make any difference if I generate the IV on the fly using a known procedure? The problem then is that everyone could know the IV if they know their user id, as the code will be open source....
In the end, I need some direction guidance as how to handle key and best unique IV per user. Thank you very much for reading so far :-)
Does this plan sound solid or is there a major flaw?
The need to store user credentials is imho a design flaw; at least we all appreciate that you are aware of it.
Having a separate credential service with dedicated datastore seems to be best you can do under stated conditions. I don't like the option to store user credentials, but let's skip academic discussion to practical things.
AES requires an encryption key for all its work, ok that needs to be "secured" in some way.
Yes, there's the whole problem.
to generate a per user custom IV?
IV allows reusing the same key for multiple encryptions, so effectively it needs to be unique for each ciphertext (if a user has multiple passwords, you need an IV for each password). Very commonly IV is prepended to the ciphertext as it is needed to decrypt it.
Would it then make any difference if I generate the IV on the fly over a known procedure?
IV doesn't need to be secret itself.
Some encryption modes require the IV to be unpredictable (e.g. CBC mode), therefore it's best to generate the IV randomly. Other modes use the IV differently: CTR uses it as the start of a counter, which allows encrypting or decrypting only part of the data, and OFB uses it to seed the keystream. In every case, the IV is still required to be unique for each encryption under a given key.

How to generate AES-ECB encryption secret key given examples of plaintexts and hashes?

My question is that, suppose you have some AES-ECB encrypted hash and you want to decode it. You are also given a bunch of example plaintexts and hashes. For example:
I want: unknown_plaintext for the hash given_hash
and I have a bunch of known_plaintexts and hashes that have been encrypted with the same secret key. None of them (obviously) is exactly the same as the given hash.
Please let me know if you can help. This is not for malicious intents, just to learn how Cryptography and AES systems work.
This is not computationally feasible. I.e., you can't do this.
Modern encryption algorithms like AES are resistant to known-plaintext attacks, which is what you are describing.
There has been some past success in a category called adaptive chosen plaintext attacks. Often these exploit an "oracle." In this scenario, an attacker can decrypt a single message by repeatedly asking the victim whether it can successfully decrypt a guess generated by the attacker. By being smart about choosing successive guesses, the attacker could decrypt the message with a million tries or so, which is a relatively small number. But even in this scenario, the attacker can't recover the key.
As an aside, ciphers don't generate hashes. They output cipher text. Hash functions (aka message digests) generate hashes.
For any respectable block cipher (and AES is a respectable block cipher), the only way to decrypt a ciphertext block (not "hash") is to know the key, and the only way to find the key from a bunch of plaintext-ciphertext pairs is by guessing a key and seeing if it maps a known plaintext onto the corresponding ciphertext. If you have some knowledge of how the key was chosen (e.g., SHA-256 of a pet's name), this might work; but if the key was randomly selected from the set of all possible AES keys, the number of guesses required to produce a significant probability of success is such a large number that you wander off into age-of-the-universe handwaving.
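To put a rough number on the "age-of-the-universe" remark, here is a back-of-the-envelope calculation. The guessing rate of 10^12 keys per second and the million-entry pet-name dictionary are assumptions chosen for illustration, not measurements.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years

def years_to_search(key_bits: int, guesses_per_second: float) -> float:
    """Expected years to find a random key: half the keyspace on average."""
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR

# Assumed attacker speed: 10**12 key trials per second (hypothetical).
RATE = 1e12

years = years_to_search(128, RATE)
ratio = years / AGE_OF_UNIVERSE_YEARS

# By contrast, a key derived from one of a million pet names (hypothetical
# dictionary size) is exhausted in a tiny fraction of a second.
weak_key_seconds = 1_000_000 / RATE

print(f"AES-128 random key: about {years:.1e} years ({ratio:.1e} universe ages)")
print(f"pet-name key: about {weak_key_seconds:.0e} seconds")
```

The cipher is the same in both cases. Only the way the key was chosen decides whether guessing is feasible.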
If you know that all the encrypted hashes are encrypted with the same key you can first try to find that key using your pairs of plaintexts and encrypted hashes. The most obvious way to do that would be to just take one of your plaintexts, first hash it and then try out all the possible keys to encrypt it until it matches the encrypted hash that you know. If the key you're looking for is just one of the many many possible AES keys this is set to fail, because it would take way too long to try all the keys.
Assuming you were able to recover the AES key somehow, you can decrypt that one hash you don't have a plaintext for and start looking for the plaintext.
The more you know about the plaintext, the easier this guesswork would be. You could just throw the decrypted hash into google and see what it spits out, query databases of known hashes or make guesses in the most educated way possible. This step will again fail, if the hash is strong enough and the plaintext is random enough.
As other people have indicated, modern encryption algorithms are specifically designed to resist this kind of attack. Even a rather weak encryption algorithm like the Tiny Encryption Algorithm would require well over 8 million chosen plaintexts to do anything like this. Better algorithms like AES, Blowfish, etc. require vastly more than that.
As of right now, there are no practical attacks on AES.
If you're interested in learning about cryptography, the older Data Encryption Standard (DES) may actually be a more interesting place to start than AES; there's a lot of literature available about it and it was already broken (the code to do so is still freely available online - studying it is actually really useful).

RC4 Safe to use plaintext as the key to encrypt itself?

Basically what the title says. If I have a password of, say, "APPLEPIE", is it safe to use "APPLEPIE" as the key when I RC4 it? Is it possible to break the RC4 encryption when you know the key and plaintext, or when they are short and the same?
This should be handled with a key derivation algorithm like PBKDF2, which will allow you to securely generate a hash from your password in a way that is appropriate for password verification (which is what I assume you're doing).
While it is possible to build a system in which RC4 would be safe this way (by converting the password into an RC4 key using a good KDF such as PBKDF2, and then generating a random nonce), this is a lot of overhead to no purpose. You'll wind up with a much longer final cipher text for the same level of security, and it'll take you longer to generate it. In the end, you'll have just created an extremely complicated secure hash function (whose first step is "do the only thing you needed to do anyway"). And you'll probably have made a mistake along the way, making the system insecure. RC4 can be tricky to do correctly and has known related-key attacks; hence the break of WEP.
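A minimal sketch of the PBKDF2 approach recommended above, using Python's standard library. The iteration count, salt length, and function names are assumptions for illustration; the work factor should be tuned to the hardware in use.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor, not a recommendation

def hash_password(password: str):
    """Return (salt, digest) for storage; the salt is stored in the clear."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("APPLEPIE")
```

Nothing is encrypted here: only the salt and the derived digest are stored, and verification recomputes the digest from the submitted password.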

How does the user inputted password unlock the master key created by TrueCrypt?

I am a student attempting to understand the mechanisms of the Open Source cryptography software http://www.truecrypt.org/ . In TrueCrypt there is a user created key, and/or keyfile, as well as a program generated master key. I would like a link to or a better semi technical explanation of how this user created password unlocks the header file. I have read the TrueCrypt docs at http://www.truecrypt.org/docs/?s=technical-details , [I would post more but new users are only allowed two links] , and the rest of the true crypt documentation. I would like an explanation at a High level of how the password unlocks the header files, and as a sidebar, how the salt helps to prevent rainbow attacks.
Sorry for adding to the question so frequently, but I realize the heart of the question is this. I am trying to figure out how the password is changeable. To do this, I need to understand how the header key relates to the master key, because you can change the header key, yet only certain header keys will work with your master key. The header key must be used to create the master, yet you can choose an arbitrary password that will create a header key that will also work with the master key.
TrueCrypt takes your password and passes it through PBKDF2. It's like a hash function, but deliberately much slower, in order to slow down brute force attacks. Similar password-derivation algorithms are bcrypt and scrypt. These three are the 'big three' when it comes to 'hashing' passwords - anything else, like a simple SHA-1 or MD5 of a password, is generally too fast to be safe. Attackers can run brute force attacks against simple hashes like SHA-1 very quickly. PBKDF2, bcrypt, and scrypt are much slower.
But, theoretically you could make a rainbow table against PBKDF2, bcrypt, and scrypt with the parameters used (Each has some optional parameters). The salt Truecrypt uses is designed to defend against that.
http://www.truecrypt.org/docs/header-key-derivation is the main reference for this.
More specifically is the header key == to the user key after PBKDF2 and salt?
I believe the derived-from-password key is used to decrypt the header, which contains the master key. This way you can change your password just by re-encrypting the master key with a new password.
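The password-change idea can be sketched as key wrapping. This is not TrueCrypt's actual header format: the XOR "wrap" is a TOY stand-in for the real header cipher, and the iteration count, salt size, and function names are all assumptions for illustration.

```python
import hashlib
import secrets

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def derive_header_key(password: str, salt: bytes) -> bytes:
    # Assumed parameters; a real tool picks its own hash and iteration count.
    return hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, 1000, dklen=32)

def wrap_master_key(master_key: bytes, password: str) -> dict:
    salt = secrets.token_bytes(64)  # stored unencrypted next to the header
    header_key = derive_header_key(password, salt)
    # TOY wrap: XOR stands in for encrypting the header with the header key.
    return {"salt": salt, "wrapped": _xor(master_key, header_key)}

def unwrap_master_key(header: dict, password: str) -> bytes:
    return _xor(header["wrapped"], derive_header_key(password, header["salt"]))

def change_password(header: dict, old: str, new: str) -> dict:
    # The master key (and so the bulk data) is untouched; only the wrap changes.
    return wrap_master_key(unwrap_master_key(header, old), new)

master = secrets.token_bytes(32)  # the randomly generated master key
header = wrap_master_key(master, "old password")
new_header = change_password(header, "old password", "new password")
```

In this toy a wrong password simply yields garbage. A real header format also needs some way to detect that the derived key was wrong.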
truecrypt.org/docs/?s=header-key-derivation says that the salt is unencrypted? Is it really that difficult to add the unencrypted salt to your rainbow table and try again?
Building a rainbow table is difficult; I think about as difficult as brute-forcing, but I'm not sure. They're in the same ballpark, though. So the threat model you're thinking of ("I should encrypt my salt!") doesn't really come into play. Plus, you would need the salt to derive the key, to decrypt the block, to get the salt. Chicken and egg.
I'm not sure how adding the salt translates to if "512-bit salt is used, which means there are 2^512 keys for each password."
They mean a password of "password" actually has 2^512 combinations: password0000001, password0000002, password0000003 and so on.

Passwords and different types of encryption

I know, I know, similar questions have been asked millions and billions of times already, but since most of them got a different flavor, I got one of my own.
Currently I'm working on a website that is meant to be launched all across my country, therefore, needs some kind of protection for user system.
Lately I've been reading a lot about password encryption, hashing, salting.. you name it, but after reading that many articles, I get confused.
One says that plain SHA512 encryption is enough for a password, others say that you have to use "salt" no matter what you would do, and then there are guys who say that you should build a whole new machine for password encryption because that way no one will be able to get it.
For now I'm using hash_hmac(); with SHA512, plus, password gets random SHA1 salt and the last part, defined random md5(); key. For most of us it'll sound secure, but is it?
I recently read here on SO, that bcrypt(); (now known as crypt(); with Blowfish hashing) is the most secure way. After reading PHP manual about crypt(); and associated stuff, I'm confused.
Basically, the question is, will my hash_hmac(); beat the hell out of Blowfished crypt(); or vice-versa?
And one more, maybe there are more secure options for password hashing?
The key to proper application of cryptography is to define with enough precision what properties you are after.
Usually, when someone wants to hash passwords, it is in the following context: a server is authenticating users; users show their password, through a confidential channel (HTTPS...). Thus, the server must store user passwords, or at least store something which can be used to verify a password. We do not want to store the passwords "as is" because an attacker gaining read access to the server database would then learn all passwords. This is our attack model.
A password is something which fits in the brain of the average user, hence it cannot be fully unguessable. A few users will choose very long passwords with high entropy, but most will select passwords with an entropy no higher than, say, 32 bits. This is a way of saying that an attacker will have to "try" on average less than 2^31 (about 2 billion) potential passwords before finding the right one.
Whatever the server stores, it is sufficient to verify a password; hence, our attacker has all the data needed to try passwords, limited only by the computing power he can muster. This is known as an offline dictionary attack.
One must assume that our attacker can crack one password. At that point we may hope for two properties:
cracking a single password should be difficult (a matter of days or weeks, rather than seconds);
cracking two passwords should be twice as hard as cracking one.
Those two properties call for distinct countermeasures, which can be combined.
1. Slow hash
Hash functions are fast. Computing power is cheap. As a data point, with SHA-1 as hash function, and a $130 NVidia graphics card, I can hash 160 million passwords per second. The 2^31 cost is paid in about 13 seconds. SHA-1 is thus too fast for security.
On the other hand, the user will not see any difference between being authenticated in 1µs, and being authenticated in 1ms. So the trick here is to warp the hash function in a way which makes it slow.
For instance, given a hash function H, use another hash function H' defined as:
H'(x) = H(x || x || x || ... || x)
where '||' means concatenation. In plain words, repeat the input enough times so that computing the H' function takes some non-negligible time. So you set a timing target, e.g. 10ms, and adjust the number of repetitions needed to reach that target. 10ms means that your server will be able to authenticate 10 users per second at the cost of only 10% of its computing power. Note that we are talking about a server storing a hashed password for its own ulterior usage, hence there is no interoperability issue here: each server can use a specific repetition count, tailored for its power.
Suppose now that the attacker can have 100 times your computing power; e.g. the attacker is a bored student -- the nemesis of many security systems -- and can use dozens of computers across his university campus. Also, the attacker may use a more thoroughly optimized implementation of the hash function H (you are talking about PHP but the attacker can do assembly). Moreover, the attacker is patient: users cannot wait for more than a fraction of a second, but a sufficiently bored student may try for several days. Yet, trying 2 billion passwords will still require about 3 full days' worth of computing. This is not ultimately secure, but is much better than 13 seconds on a single cheap PC.
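The two figures in this section (about 13 seconds, and about 3 days) can be reproduced with a few lines of arithmetic, assuming the 10 ms per-hash cost and the 100x attacker advantage stated in the text.

```python
CANDIDATES = 2 ** 31          # guesses needed, from the 32-bit entropy estimate

# Fast hash: plain SHA-1 at the quoted GPU rate.
FAST_RATE = 160e6             # hashes per second
fast_seconds = CANDIDATES / FAST_RATE

# Slow hash: 10 ms per hash on the server, attacker 100 times faster.
SLOW_HASH_SECONDS = 0.010
ATTACKER_ADVANTAGE = 100
slow_seconds = CANDIDATES * SLOW_HASH_SECONDS / ATTACKER_ADVANTAGE
slow_days = slow_seconds / 86400

print(f"fast hash: {fast_seconds:.1f} s")   # about 13.4 s
print(f"slow hash: {slow_days:.1f} days")   # about 2.5 days
```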
2. Salts
A salt is a piece of public data which you hash with the password in order to prevent sharing.
"Sharing" is what happens when the attacker can reuse his hashing efforts over several attacked passwords. This is what happens when the attacker has several hashed passwords (he read the whole database of hashed passwords): whenever he hashes one potential password, he can look it up against all hashed passwords he is trying to attack. We call that a parallel dictionary attack. Another instance of sharing is when the attacker can build a precomputed table of hashed passwords, and then use his table repeatedly (by simple lookups). The fabled rainbow table is just a special case of a precomputed table (that's just a time-memory trade-off which allows for using a precomputed table much bigger than what would fit on a hard disk; but building the table still requires hashing each potential password). Space-time wise, parallel attacks and precomputed tables are the same attack.
Salting defeats sharing. The salt is a public data element which alters the hashing process (one could say that the salt selects the hash function among a whole set of distinct functions). The point of the salt is that it is unique for each password. The attacker can no longer share cracking efforts because any precomputed table would have to use a specific salt and would be useless against a password hashed with a distinct salt.
The salt must be used to verify a password, hence the server must store, for each hashed password, the salt value which was used to hash that password. In a database, that's just an extra column. Or you could concatenate the salt and the hash password in a single blob; that's just a matter of data encoding and it is up to you.
Assuming S to be the salt (i.e. some bytes), the hashing process for password p is: H'(S||p) (with the H' function defined in the previous section). That's it!
The point of the salt is to be, as much as possible, unique to each hashed password. A simple way to achieve that is to use random salts: whenever a password is created or changed, use a random generator to get 16 random bytes. 16 bytes ought to be enough to make salt reuse highly improbable. Note that the salt should be unique for each password: using the user name as a salt is not sufficient (some distinct server instances may have users with the same name -- how many "bob"s exist out there ? -- and, also, some users change their password, and the new password should not use the same salt as the previous password).
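To illustrate what "sharing" means in practice, here is a small sketch with a hypothetical three-user table. Plain SHA-256 is used only to keep the example short; the slowness discussed in the previous section is left out.

```python
import hashlib
import secrets

def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def salted(salt: bytes, password: str) -> str:
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# Hypothetical leaked table; two users happen to share a password.
passwords = {"alice": "letmein", "bob": "letmein", "carol": "tr0ub4dor"}

unsalted_db = {user: unsalted(pw) for user, pw in passwords.items()}
salted_db = {}
for user, pw in passwords.items():
    salt = secrets.token_bytes(16)  # fresh random salt per password
    salted_db[user] = (salt, salted(salt, pw))

# Without salts: ONE hash per guess is checked against every user at once.
guess_hash = unsalted("letmein")
cracked_unsalted = [u for u, h in unsalted_db.items() if h == guess_hash]

# With salts: the same guess must be re-hashed for each user's salt.
hashes_computed = 0
cracked_salted = []
for user, (salt, stored) in salted_db.items():
    hashes_computed += 1
    if salted(salt, "letmein") == stored:
        cracked_salted.append(user)
```

The salted table leaks no hint that alice and bob share a password, and every guess costs one hash per user instead of one hash in total.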
3. Choice of hash function
The H' hash function is built over a hash function H. Some traditional implementations have used encryption algorithms twisted into hash functions (e.g. DES for Unix's crypt()). This has promoted the use of the "encrypted password" expression, although it is not proper (the password is not encrypted because there is no decryption process; the correct term is "hashed password"). It seems safer, however, to use a real hash function, designed for the purpose of hashing.
The most used hash functions are: MD5, SHA-1, SHA-256, SHA-512 (the latter two are collectively known as "SHA-2"). Some weaknesses have been found in MD5 and SHA-1. Those weaknesses have serious impact for some usages, but not for what is described above (the weaknesses are about collisions, whereas we work here on preimage resistance). However, it is better public relations to choose SHA-256 or SHA-512: if you use MD5 or SHA-1, you may have to justify yourself. SHA-256 and SHA-512 differ by their output size and performance (on some systems, SHA-256 is much faster than SHA-512, and on others SHA-512 is faster than SHA-256). However, performance is not an issue here (regardless of the hash function intrinsic speed, we make it much slower through input repetitions), and the 256 bits of SHA-256 output are more than enough. Truncating the hash function output to the first n bits, in order to save on storage costs, is cryptographically valid, as long as you keep at least 128 bits (n >= 128).
4. Conclusion
Whenever you create or modify a password, generate a new random salt S (16 bytes). Then hash the password p as SHA-256(S||p||S||p||S||p||...||S||p) where the 'S||p' pattern is repeated enough times so that the hashing process takes 10ms. Store both S and the hash result. To verify a user password, retrieve S, recompute the hash, and compare it with the stored value.
And you will live longer and happier.
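A minimal sketch of the recipe in the conclusion above. The function names, the doubling calibration loop, and storing the repetition count in each record are assumptions added for illustration. In practice a vetted function such as PBKDF2, bcrypt, or scrypt plays the role of H'.

```python
import hashlib
import hmac
import secrets
import time

def slow_hash(salt: bytes, password: str, repetitions: int) -> bytes:
    """SHA-256 over the pattern S||p repeated `repetitions` times."""
    h = hashlib.sha256()
    unit = salt + password.encode("utf-8")
    for _ in range(repetitions):
        h.update(unit)  # streaming updates equal hashing the full concatenation
    return h.digest()

def calibrate(target_seconds: float = 0.010) -> int:
    """Double the repetition count until one hash takes about the target time."""
    reps = 1024
    while True:
        start = time.perf_counter()
        slow_hash(bytes(16), "calibration", reps)
        if time.perf_counter() - start >= target_seconds:
            return reps
        reps *= 2

def create_record(password: str, repetitions: int) -> dict:
    salt = secrets.token_bytes(16)  # new random salt on every create or change
    return {"salt": salt, "reps": repetitions,
            "hash": slow_hash(salt, password, repetitions)}

def check(record: dict, password: str) -> bool:
    candidate = slow_hash(record["salt"], password, record["reps"])
    return hmac.compare_digest(candidate, record["hash"])

reps = calibrate()
record = create_record("s3cret", reps)
```

Keeping the repetition count in the record lets the server raise the count later without invalidating older hashes.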
This question raises multiple points, each of which needs to be addressed individually.
Firstly, you should not engineer your own encryption algorithm. The argument that something is secure because it is not mainstream is completely invalid. Any algorithm you might develop will only be as strong as your understanding of cryptography.
The average developer does not have a grasp of the mathematical concepts necessary to create a strong algorithm. Should your application be compromised, your completely untested algorithm will be the only thing standing between an attacker and your users' personal information, and a suitably motivated attacker will probably defeat your custom encryption much faster than they could have had you used a time-tested algorithm.
Using a salt is a very good idea. Because the hash is generated using both the salt and password value, a brute force attack on the hashed data becomes excessively expensive because the dictionary of hashed passwords used by an attacker would not take into account the salt value used when generating the hashes.
I'm not the most qualified person to comment on algorithm selection, so I'll leave that to somebody else.
I'm not a PHP developer, but I have some experience with encryption. My first recommendation is as Crippledsmurf suggested, absolutely don't try to "roll your own" encryption. It will have disaster written all over it.
You say you're using hash_hmac() currently. If you're just protecting user accounts and some basic information (name, address, email etc.) and not anything important such as SSN, credit cards, I think you're safe to stick with what you have.
With encryption we'd all like the most secure, complex vault to secure our stuff, but the question is, why have a huge safe door to protect things no-one would realistically want? You have to balance the type and strength of encryption you use against what you are protecting and the risk of it being taken.
Currently, if you are encrypting your information, even at a basic level, you already beat the hell out of 90% of sites and applications out there - who still store in plain text. You're using a salt (excellent idea) and you're making it extremely difficult to decrypt the information (the md5 key is good).
Make a call - is this worth protecting further. If not, don't waste your time and move on.
