Web app passwords: bcrypt and SHA256 (and scrypt) - encryption

With all the recent (e.g. LinkedIn) discussions of passwords I'm looking at password hashing implementations. After two cups of coffee and a morning reading I'm no more a cryptographer than when I started. And I really don't want to pretend that I am.
Specific Questions
Does using an integer unique user ID fail as an effective salt? (crypt() uses only 16 bits?)
If I simply run sha256() on a hash over and over until a second is used up does that defeat the brute-force attacks?
If I have to ask these questions should I be using bcrypt?
Discussion/Explanation:
The goal is simply if my user's hashed passwords were leaked they:
would not be "easy" to crack,
cracking one password would not expose other users that use the same password).
What I've read for #1 is that the hash computation must be expensive -- taking, say, a second or two to calculate and maybe requiring a bit of memory (to thwart hardware decryption).
bcrypt has this built in, and scrypt, if I understand correctly, is more future-proof and includes a minimum memory usage requirement.
But, is it an equally effective approach to eat time by "rehashing" the result of sha256() as many times as needed to use up a few seconds and then store the final loop count with the hash for later checking a provided password?
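A minimal Python sketch of the loop described above, only to make the idea concrete. The names `stretch_sha256` and `calibrate_iterations`, the 0.05 second budget, and the `b"42"` user-ID prefix are all assumptions for illustration; this is the scheme being asked about, not a recommendation.

```python
import hashlib
import time


def stretch_sha256(data: bytes, iterations: int) -> bytes:
    """Feed the SHA-256 digest back into SHA-256 `iterations` times."""
    digest = data
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest


def calibrate_iterations(target_seconds: float, probe: int = 10_000) -> int:
    """Estimate how many rounds fit in `target_seconds` on this machine."""
    start = time.perf_counter()
    stretch_sha256(b"calibration input", probe)
    elapsed = time.perf_counter() - start
    return max(1, int(probe * target_seconds / elapsed))


# The loop count has to be stored next to the hash so that a login
# attempt can repeat exactly the same number of rounds.
rounds = calibrate_iterations(target_seconds=0.05)
stored = (rounds, stretch_sha256(b"42" + b"mypassword", rounds))
```

Checking a supplied password would then mean calling `stretch_sha256` again with `stored[0]` rounds and comparing the result with `stored[1]`.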
For #2, using a unique salt for every password is important. What's not been clear is how random (or large) the salt must be. If the goal is to avoid everyone that uses "mypassword" as their password from having the same hash is it not enough to simply do this?:
hash = sha256_hex( unique_user_id + user_supplied_password );
or even this, although I'm not sure it buys me anything:
hash = sha256_hex( sha256( unique_user_id ) + user_supplied_password );
The only benefit I can see from using the user's ID, besides I know it is unique, is avoiding having to save the salt along with the hash. Not much of an advantage. Is there a real problem with using a user's ID as the salt? Does it not accomplish #2?
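To make the trade-off concrete, here is a small Python sketch of the user-ID-as-salt idea. `sha256_hex` mirrors the pseudocode above; `id_salted_hash` and the `:` separator are assumptions added for illustration and are not part of the original snippet.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def id_salted_hash(user_id: int, password: str) -> str:
    # The separator avoids ambiguity such as (1, "23pw") vs (12, "3pw").
    return sha256_hex(f"{user_id}:{password}")


# Goal #2 holds inside one database: same password, different hashes.
alice = id_salted_hash(1, "mypassword")
bob = id_salted_hash(2, "mypassword")

# But user 1 on any other site using the same scheme gets the same hash,
# so a table precomputed for small IDs could be reused across sites.
other_site_user_1 = id_salted_hash(1, "mypassword")
```

So the ID does separate users within one table, but it is predictable and repeats across installations, which a random per-password salt would not.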
I assume if someone can steal my user's hashed passwords then I must assume they can get whatever they want -- including the source code that generates the hash. So, is there any benefit to adding an extra random string (the same string) to the password before hashing? That is:
# app_wide_string = one-time generated, random 64 7-bit *character* string.
hash = sha256_hex( unique_user_id + app_wide_string + user_supplied_password );
I have seen that suggested, but I don't understand what I gain from that over the per-user salt. If someone wanted to brute-force the attack they would know that "app_wide_string" and use that when running their dictionary attack, right?
Is there a good reason to use bcrypt over rolling my own as described above? Maybe the fact that I'm asking these questions is reason enough?
BTW -- I just timed an existing hashing function I have, and on my laptop I can generate about 7000 hashes a second. Not quite the one or two seconds that are often suggested.
Some related links:
using sha256 as hashing and salting with user's ID
SHA512 vs. Blowfish and Bcrypt
What is the optimal length for user password salt?

Bcrypt is great because you can tune the work factor from 4 to 31, and each increment doubles the required time, so the cost grows exponentially. I've actually graphed it: at a work factor of 14 it's already taking over a second. So as computers get faster and faster you only need to change one parameter, and of course update your password hashes ...
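As a rough worked example of that growth (Python, with hypothetical helper names): bcrypt's cost parameter is a base-2 logarithm, so the number of key-expansion iterations is 2 to the power of the work factor.

```python
def bcrypt_rounds(work_factor: int) -> int:
    """Number of key-expansion iterations for a given work factor."""
    if not 4 <= work_factor <= 31:
        raise ValueError("bcrypt work factor must be between 4 and 31")
    return 2 ** work_factor


def relative_cost(new_factor: int, old_factor: int) -> float:
    """How many times more work `new_factor` costs than `old_factor`."""
    return bcrypt_rounds(new_factor) / bcrypt_rounds(old_factor)


# Moving from a work factor of 10 to 14 multiplies the work by 2**4.
increase = relative_cost(14, 10)
```

Actual wall-clock times depend on the hardware, so the "over a second at 14" figure above should be measured rather than assumed.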
My main concern with bcrypt is that if the work factor is set too high, it may overload your system when multiple users are trying to log in, so you have to tune it depending on the number of concurrent logins and the resources of your system ...
Salts are still required. Their main purpose is to deter offline attacks: if the salt space is too large, the adversary won't be able to generate the lookup table. A 64-bit salt seems a bit low; bcrypt has 128-bit salts, which coupled with the work factor makes offline attacks quite a challenge ... And yes, the salt should be random for each password; bcrypt will generate one for you. If you use the same salt for each password then you have made it easier for the adversary to compromise all the passwords using an online attack.
Bcrypt really shines for online attacks if you have set the work factor properly, because even if the adversary gets the hash, the work factor makes it really painful to go through an entire dictionary, taking multiple days. And if the password isn't in the dictionary, then the adversary is really in trouble, because a brute-force attack will be epic; the password bit space for bcrypt is quite large, though finite :)
Sha256 may be taking a bit of time now, but eventually computers will get faster and faster and it'll be fairly easy to attack. The Unix guys thought crypt was so slow it would never be an issue, and today I have done an online attack in seconds, an offline attack in days, and a brute-force attack (going through the entire password bit space) in weeks ...
You want the salt to be as large and random as possible; using only numbers makes it easier for me to iterate over all the possible IDs.
Multiple rounds of sha256 may take a second now, but down the road that won't be effective any more. Computer processing power grows exponentially, so you want an algorithm whose cost can be configured to keep up.
You are doing the right thing by asking questions and doing your homework. If more people did this we wouldn't have so many breaches.

Does using an integer unique user ID fail as an effective salt? (crypt() uses only 16 bits?)
You'd normally use a randomly generated salt and then store that salt along with the hashed password. It doesn't matter that the attacker also gets access to the salt - the purpose of it is to prevent a lookup table from being used, thereby forcing the attacker to brute force each hash individually.
crypt just stores the salt and hash in a single string, along with the algorithm to use.
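For illustration, a bcrypt string in this style has the shape `$<version>$<cost>$<22 salt characters><31 hash characters>`. The Python sketch below splits such a string; `BcryptRecord` and `parse_bcrypt` are hypothetical names, and the example value is a placeholder with the right lengths, not a real hash.

```python
from typing import NamedTuple


class BcryptRecord(NamedTuple):
    version: str  # algorithm identifier, e.g. "2b"
    cost: int     # work factor
    salt: str     # 22 characters encoding the 128-bit salt
    digest: str   # 31 characters encoding the hash


def parse_bcrypt(stored: str) -> BcryptRecord:
    """Split a bcrypt string of the form $<version>$<cost>$<salt><digest>."""
    _, version, cost, payload = stored.split("$")
    if len(payload) != 53:
        raise ValueError("expected 22 salt characters plus 31 digest characters")
    return BcryptRecord(version, int(cost), payload[:22], payload[22:])


# Placeholder payload with the right lengths; NOT a real bcrypt hash.
example = "$2b$12$" + "S" * 22 + "H" * 31
record = parse_bcrypt(example)
```

Because everything needed for verification travels in the one string, no separate salt column is required.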

Related

Why is BCrypt specifically effective against rainbow tables?

If somebody has the hashing functions from BCrypt, hashes a dictionary of passwords, stores the results on a CD, and then gets access to the hashed passwords in the database, shouldn't they be able to look the passwords up?
I hope the answer is no. If so, why not?
Bcrypt, like other PBKDF functions, includes salting and stretching. Salting means that it adds some extra random data with the password. The salt is public. So, for instance, if my salt is "f588d29a" and my password is "password" the thing I'm actually going to hash is "f588d29apassword" (this is not precisely how bcrypt does it, but it is equivalent).
After salting, you'll hash (more in a second), and the output will be: "f588d29a,hash". So the salt and hash are known to everyone. But now your rainbow table that includes "password" isn't any use. You need "f588d29apassword" and also "aaaaaaaapassword" and also "aaaaaaabpassword" and also... a lot of passwords hashed. So that dramatically increases the amount of time and space you need. And longer salts can make this arbitrarily hard on the attacker for very little cost to the defender. This is the part that makes rainbow tables basically useless. Even if I find multiple people with the same password, their hashes will be different so my table doesn't help.
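A small Python sketch of the "salt,hash" idea just described. It uses plain SHA-256 purely for brevity (bcrypt does the salting differently and adds stretching), and the names `salted_record` and `check` are assumptions.

```python
import hashlib
import hmac
import secrets


def salted_record(password: str) -> str:
    """Return "salt,hash" using a fresh random salt."""
    salt = secrets.token_hex(4)  # 8 hex characters, like "f588d29a"
    digest = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    return f"{salt},{digest}"


def check(password: str, record: str) -> bool:
    """Recompute the hash with the stored (public) salt and compare."""
    salt, digest = record.split(",")
    candidate = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, digest)


# Two users with the same password end up with different records.
first = salted_record("password")
second = salted_record("password")
```

A table precomputed for "password" alone matches neither record; the attacker would need an entry per salt value.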
The second half of bcrypt (and other PBKDFs) is stretching, which means that performing the hash function is time intensive. We're talking tens of milliseconds usually, so it's not a big deal to humans or to one hash, but it makes password guessing much more expensive.

Will increasing the # of rounds in bcrypt make it good enough for encrypting strings?

Looking through the various encrypting and hashing algorithms they seem to focus on computation time vs security, and seem to target encrypting/hashing passwords.
In my scenario I am trying to encrypt a string that will be provided to the end user, and later I will provide the unencrypted version so they can match it up to the encrypted version to verify a certain action (a la a provably fair system).
I thought of using sha-512: providing the hash, and then later on providing the unencrypted string so that the end user will be able to match up the hash and the unencrypted string.
However, I recently discovered bcrypt, which certain people have said is a better choice. Now for me it does not matter how long it takes to generate the hash, so for my circumstances is it best to use bcrypt with an ungodly # of rounds to make my string harder to crack, or am I just going about this the wrong way?
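For reference, the publish-the-hash-now, reveal-the-string-later flow described here could be sketched in Python as below. The function names, the `nonce:outcome` layout, and the 32-byte random nonce are assumptions; the nonce is there so that a short or guessable string cannot simply be brute-forced from the published hash.

```python
import hashlib
import hmac
import secrets


def commit(outcome: str) -> tuple[str, str]:
    """Return (commitment, nonce). Publish the commitment, keep the nonce private."""
    nonce = secrets.token_hex(32)  # random padding so short outcomes can't be guessed
    commitment = hashlib.sha512(f"{nonce}:{outcome}".encode("utf-8")).hexdigest()
    return commitment, nonce


def verify(commitment: str, nonce: str, outcome: str) -> bool:
    """What the end user runs once the nonce and outcome are revealed."""
    expected = hashlib.sha512(f"{nonce}:{outcome}".encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected, commitment)
```

With enough randomness mixed into the hashed string, a fast hash such as SHA-512 is arguably sufficient here, since the slowness of bcrypt mainly helps when the input is a low-entropy password.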

Is my password storage technique strong enough

This is a question about whether my security process is adequate for the kind of information i am storing.
I am building a website using ASP.NET 4.0 with a SQL backend and need to know how my security would hold up with regards to passwords and hashes etc.
I don't store any critical information on someone - No real names, addresses, credit card details or anything like that... just email and username.
For now, I am deliberately leaving out some specifics as I am not sure if telling you them will weaken my security but if not I can reveal slightly more.
Here is how I do it:
The user registers with their email and a unique username up to 50 chars long
They create a password (minimum 6 chars) using any characters on the keyboard (I HTMLEncode the input and am using parameterized stored procedures so I don't restrict the chars)
I send them an email with a link to verify they are real.
I use FormsAuthentication to set an auth cookie but I'm not using SSL at the moment... I understand the implications of sending auth details across plain http but I have asked my host to add the cert so it should be ready shortly.
It's the hashing bit I need to be sure of!
I create a random 100 character salt from the following char set (I just use the System.Random class, nothing cryptographic) - abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNOPQRSTUVWXYZ0123456789!£$%^*()_{}[]#~#<,>.?
This is then merged with the password and then hashed using SHA-512 (SHA512Managed class) tens of thousands of times (takes nearly 2 seconds on my i7 laptop to generate the final hash).
This final hash is then converted to a base64 string and compared with the already-hashed password in the database (the salt is stored in another column in the DB too)
A few questions (ignore the lack of SSL for the moment, I just haven't bought the certificate yet but it will be ready in a week or so):
Does this strike you as secure enough? I understand there are degrees of security and that given enough time and resources anything is breakable but given that I don't store critical data, does it seem like enough?
Would revealing the actual number of times I hash the password weaken my security?
Does a 100 character salt make any difference over, say, a 20 character one?
By revealing how I join a password and salt together, would that weaken my security?
So, let's try to answer your questions one by one:
Does this strike you as secure enough? I understand there are degrees of security and that given enough time and resources anything is breakable but given that I don't store critical data, does it seem like enough?
No. It is definitely not "secure enough".
Without seeing code, it's hard to say more. But the fact that you're doing a straight SHA512 instead of doing a HMAC indicates one problem. Not because you need to be using a HMAC, but because most algorithms that are designed for this purpose use HMAC under the hood (for several reasons).
And it seems likely you're doing hash = SHA512(hash) (just from your wording) which is proven to be bad.
So without seeing code, it's hard to say for sure, but it's not pointing in the right direction...
Would revealing the actual number of times I hash the password weaken my security?
No, it shouldn't. If it does, you have a problem somewhere else in the algorithm.
Does a 100 character salt make any difference over, say, a 20 character one?
Nope. All the salt does is make the hash unique (forcing the attacker to attack each password separately). All you need is a salt long enough to be statistically unique. Thanks to the Birthday Problem, 128 bits is more than enough for a 1/10^12 chance of collision. Which is plenty for us. So that means that 16 characters is the upper bound on salt effectiveness.
That doesn't mean it's bad to use a longer salt. It just means that making it longer than 16 characters doesn't significantly increase the security it provides...
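The birthday estimate behind that claim can be checked with a few lines of Python. This is a sketch using the usual approximation p ≈ n² / (2 · 2^bits); the function name is an assumption.

```python
import math


def salts_for_collision_probability(bits: int, probability: float) -> float:
    """Approximate number of random salts before a collision is that likely.

    Uses the birthday approximation p ~ n**2 / (2 * 2**bits).
    """
    return math.sqrt(2 * probability * 2.0 ** bits)


# With 128-bit salts, a 1-in-10^12 collision chance needs about 2.6e13 salts.
n = salts_for_collision_probability(bits=128, probability=1e-12)
```

That is tens of trillions of stored passwords before a repeated salt becomes even a one-in-a-trillion event.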
By revealing how I join a password and salt together, would that weaken my security?
If it does, your algorithm is severely flawed. If it does, it amounts to Security Through Obscurity.
The Real Answer
The real answer here is to not re-invent the wheel. Algorithms like PBKDF2 and BCRYPT exist for exactly this purpose. So use them.
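As a rough illustration of what "use them" looks like, here is a PBKDF2 sketch using Python's standard library (the question is about ASP.NET, so treat this as showing the shape of the API rather than the exact code to ship). The function names and the iteration count are assumptions; the count in particular should be tuned to your own hardware.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed value; tune to your hardware and latency budget


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived_key) for storage."""
    salt = secrets.token_bytes(16)  # cryptographically secure random salt
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, derived


def verify_password(password: str, salt: bytes, iterations: int, derived: bytes) -> bool:
    """Recompute with the stored salt and iteration count, compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, derived)
```

Storing the iteration count with each record lets you raise it later without invalidating existing hashes.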
Further Information (Note that these talk about PHP, but the concepts are 100% applicable to ASP.NET and C#):
YouTube Video - Password Storage and Hacking in PHP
Blog Post - The Rainbow Table Is Dead
Blog Post - Properly Salting Passwords
PHP password_hash RFC
Blog Post - Seven Ways To Screw Up BCrypt
In theory, your hashing scheme sounds ok. In practice, it sounds like you have rolled your own crypto, which is bad. Use bcrypt, scrypt, or pbkdf2. All of these are designed by security professionals.
Not really, but I don't think anyone needs to know that anyway.
No. It just needs to be unique to every user. The purpose of salt is to prevent precalculation of hashes/rainbow table attacks.
This doesn't apply once you make use of bcrypt (or scrypt or pbkdf2)
http://security.stackexchange.com has some topics on the subject, you should check them out.
Some extra notes - serious attackers will crack sha512 hashes way faster than your laptop. For example you could rent a server with a few Tesla GPU's from Amazon or similar, and start cracking at a few billion hashes/second rate. Scrypt makes some effort trying to prevent this by using memory intensive operations.
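A minimal scrypt sketch in Python, assuming an OpenSSL-backed `hashlib` that provides `hashlib.scrypt`. The parameters below are assumptions chosen for illustration; with n=2**14 and r=8, each hash needs roughly 16 MiB of memory (128 * n * r bytes), which is what makes massively parallel GPU cracking less attractive.

```python
import hashlib
import hmac
import secrets

# Assumed parameters: n=2**14, r=8 needs about 16 MiB per hash.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def scrypt_hash(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the password with scrypt."""
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS)


salt = secrets.token_bytes(16)
stored = scrypt_hash("correct horse", salt)
```

Verification recomputes `scrypt_hash` with the stored salt and compares the result with `hmac.compare_digest`.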
6 characters minimum for password is not enough, go with at least 8. A related image, I haven't verified the times but it gives a rough estimate and gives you the general idea (excluding dictionary attacks, which can target longer passwords):

Passwords and different types of encryption

I know, I know, similar questions have been asked millions and billions of times already, but since most of them got a different flavor, I got one of my own.
Currently I'm working on a website that is meant to be launched all across my country, therefore, needs some kind of protection for user system.
I've lately been reading a lot about password encryption, hashing, salting... you name it, but after reading that many articles, I get confused.
One says that plain SHA512 encryption is enough for a password, others say that you have to use "salt" no matter what you would do, and then there are guys who say that you should build a whole new machine for password encryption because that way no one will be able to get it.
For now I'm using hash_hmac(); with SHA512, plus, password gets random SHA1 salt and the last part, defined random md5(); key. For most of us it'll sound secure, but is it?
I recently read here on SO, that bcrypt(); (now known as crypt(); with Blowfish hashing) is the most secure way. After reading PHP manual about crypt(); and associated stuff, I'm confused.
Basically, the question is, will my hash_hmac(); beat the hell out of Blowfished crypt(); or vice-versa?
And one more, maybe there are more secure options for password hashing?
The key to proper application of cryptography is to define with enough precision what properties you are after.
Usually, when someone wants to hash passwords, it is in the following context: a server is authenticating users; users show their password, through a confidential channel (HTTPS...). Thus, the server must store user passwords, or at least store something which can be used to verify a password. We do not want to store the passwords "as is" because an attacker gaining read access to the server database would then learn all passwords. This is our attack model.
A password is something which fits in the brain of the average user, hence it cannot be fully unguessable. A few users will choose very long passwords with high entropy, but most will select passwords with an entropy no higher than, say, 32 bits. This is a way of saying that an attacker will have to "try" on average fewer than 2^31 (about 2 billion) potential passwords before finding the right one.
Whatever the server stores, it is sufficient to verify a password; hence, our attacker has all the data needed to try passwords, limited only by the computing power he can muster. This is known as an offline dictionary attack.
One must assume that our attacker can crack one password. At that point we may hope for two properties:
cracking a single password should be difficult (a matter of days or weeks, rather than seconds);
cracking two passwords should be twice as hard as cracking one.
Those two properties call for distinct countermeasures, which can be combined.
1. Slow hash
Hash functions are fast. Computing power is cheap. As a data point, with SHA-1 as hash function, and a 130$ NVidia graphic card, I can hash 160 million passwords per second. The 2^31 cost is paid in about 13 seconds. SHA-1 is thus too fast for security.
On the other hand, the user will not see any difference between being authenticated in 1µs, and being authenticated in 1ms. So the trick here is to warp the hash function in a way which makes it slow.
For instance, given a hash function H, use another hash function H' defined as:
H'(x) = H(x || x || x || ... || x)
where '||' means concatenation. In plain words, repeat the input enough times so that computing the H' function takes some non-negligible time. So you set a timing target, e.g. 10ms, and adjust the number of repetitions needed to reach that target. 10ms means that your server will be able to authenticate 10 users per second at the cost of only 10% of its computing power. Note that we are talking about a server storing a hashed password for its own ulterior usage, hence there is no interoperability issue here: each server can use a specific repetition count, tailored for its power.
Suppose now that the attacker can have 100 times your computing power; e.g. the attacker is a bored student -- the nemesis of many security systems -- and can use dozens of computers across his university campus. Also, the attacker may use a more thoroughly optimized implementation of the hash function H (you are talking about PHP but the attacker can do assembly). Moreover, the attacker is patient: users cannot wait for more than a fraction of a second, but a sufficiently bored student may try for several days. Yet, trying 2 billion passwords will still require about 3 full days worth of computing. This is not ultimately secure, but is much better than 13 seconds on a single cheap PC.
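The two figures in this section (about 13 seconds for the fast hash, about 3 days for the slow one) can be reproduced with a short Python calculation. It assumes the 10ms-per-hash target and the 100x attacker advantage mentioned above; the function names are made up for the sketch.

```python
GUESSES = 2 ** 31  # average guesses for a password with 32 bits of entropy


def seconds_fast_hash(guesses: int, hashes_per_second: float) -> float:
    """Time to try `guesses` passwords at a given raw hashing rate."""
    return guesses / hashes_per_second


def seconds_slow_hash(guesses: int, seconds_per_hash: float, attacker_speedup: float) -> float:
    """Time when each hash costs the defender `seconds_per_hash` and the
    attacker is `attacker_speedup` times faster than the defender."""
    return guesses * seconds_per_hash / attacker_speedup


# Plain SHA-1 at 160 million hashes per second: a little over 13 seconds.
gpu_seconds = seconds_fast_hash(GUESSES, 160e6)

# 10 ms slow hash against an attacker with 100x the power: about 2.5 days.
slow_days = seconds_slow_hash(GUESSES, 0.010, 100) / 86_400
```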
2. Salts
A salt is a piece of public data which you hash with the password in order to prevent sharing.
"Sharing" is what happens when the attacker can reuse his hashing efforts over several attacked passwords. This is what happens when the attacker has several hashed passwords (he read the whole database of hashed passwords): whenever he hashes one potential password, he can look it up against all hashed passwords he is trying to attack. We call that a parallel dictionary attack. Another instance of sharing is when the attacker can build a precomputed table of hashed passwords, and then use his table repeatedly (by simple lookups). The fabled rainbow table is just a special case of a precomputed table (that's just a time-memory trade-off which allows for using a precomputed table much bigger than what would fit on a hard disk; but building the table still requires hashing each potential password). Space-time wise, parallel attacks and precomputed tables are the same attack.
Salting defeats sharing. The salt is a public data element which alters the hashing process (one could say that the salt selects the hash function among a whole set of distinct functions). The point of the salt is that it is unique for each password. The attacker can no longer share cracking efforts because any precomputed table would have to use a specific salt and would be useless against a password hashed with a distinct salt.
The salt must be used to verify a password, hence the server must store, for each hashed password, the salt value which was used to hash that password. In a database, that's just an extra column. Or you could concatenate the salt and the hash password in a single blob; that's just a matter of data encoding and it is up to you.
Assuming S to be the salt (i.e. some bytes), the hashing process for password p is: H'(S||p) (with the H' function defined in the previous section). That's it!
The point of the salt is to be, as much as possible, unique to each hashed password. A simple way to achieve that is to use random salts: whenever a password is created or changed, use a random generator to get 16 random bytes. 16 bytes ought to be enough to make salt reuse highly improbable. Note that the salt should be unique for each password: using the user name as a salt is not sufficient (some distinct server instances may have users with the same name -- how many "bob"s exist out there? -- and, also, some users change their password, and the new password should not use the same salt as the previous password).
3. Choice of hash function
The H' hash function is built over a hash function H. Some traditional implementations have used encryption algorithms twisted into hash functions (e.g. DES for Unix's crypt()). This has promoted the use of the "encrypted password" expression, although it is not proper (the password is not encrypted because there is no decryption process; the correct term is "hashed password"). It seems safer, however, to use a real hash function, designed for the purpose of hashing.
The most used hash functions are: MD5, SHA-1, SHA-256, SHA-512 (the latter two are collectively known as "SHA-2"). Some weaknesses have been found in MD5 and SHA-1. Those weaknesses have serious impact for some usages, but not for what is described above (the weaknesses are about collisions, whereas we work here on preimage resistance). However, it is better public relations to choose SHA-256 or SHA-512: if you use MD5 or SHA-1, you may have to justify yourself. SHA-256 and SHA-512 differ by their output size and performance (on some systems, SHA-256 is much faster than SHA-512, and on others SHA-512 is faster than SHA-256). However, performance is not an issue here (regardless of the hash function intrinsic speed, we make it much slower through input repetitions), and the 256 bits of SHA-256 output are more than enough. Truncating the hash function output to the first n bits, in order to save on storage costs, is cryptographically valid, as long as you keep at least 128 bits (n >= 128).
4. Conclusion
Whenever you create or modify a password, generate a new random salt S (16 bytes). Then hash the password p as SHA-256(S||p||S||p||S||p||...||S||p) where the 'S||p' pattern is repeated enough times so that the hashing process takes 10ms. Store both S and the hash result. To verify a user password, retrieve S, recompute the hash, and compare it with the stored value.
And you will live longer and happier.
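The recipe from the conclusion can be sketched in Python roughly as follows. The function names are assumptions, and the repetition count is left as a parameter because it has to be calibrated per server to hit the 10ms target.

```python
import hashlib
import hmac
import secrets


def slow_hash(salt: bytes, password: bytes, repetitions: int) -> bytes:
    """SHA-256 over the pattern salt||password repeated `repetitions` times."""
    h = hashlib.sha256()
    block = salt + password
    for _ in range(repetitions):
        h.update(block)  # streaming the input avoids building one huge buffer
    return h.digest()


def create_record(password: str, repetitions: int) -> tuple[bytes, bytes]:
    """Return (salt, hash) with a new random salt on every password change."""
    salt = secrets.token_bytes(16)
    return salt, slow_hash(salt, password.encode("utf-8"), repetitions)


def verify(password: str, salt: bytes, stored: bytes, repetitions: int) -> bool:
    """Retrieve S, recompute the hash, and compare with the stored value."""
    candidate = slow_hash(salt, password.encode("utf-8"), repetitions)
    return hmac.compare_digest(candidate, stored)
```

The repetition count must stay fixed (or be stored with the record), since a different count produces a different hash.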
This question raises multiple points, each of which need to be addressed individually.
Firstly you should not engineer your own encryption algorithm. The argument that something is secure because it is not mainstream is completely invalid. Any algorithm you might develop will only be as strong as your understanding of cryptography.
The average developer does not have a grasp of the mathematical concepts necessary to create a strong algorithm. Should your application be compromised, your completely untested algorithm will be the only thing standing between an attacker and your users' personal information, and a suitably motivated attacker will probably defeat your custom encryption much faster than they could have had you used a time-tested algorithm.
Using a salt is a very good idea. Because the hash is generated using both the salt and password value, a brute force attack on the hashed data becomes excessively expensive because the dictionary of hashed passwords used by an attacker would not take into account the salt value used when generating the hashes.
I'm not the most qualified person to comment on algorithm selection, so I'll leave that to somebody else.
I'm not a PHP developer, but I have some experience with encryption. My first recommendation is as Crippledsmurf suggested, absolutely don't try to "roll your own" encryption. It will have disaster written all over it.
You say you're using hash_hmac() currently. If you're just protecting user accounts and some basic information (name, address, email etc.) and not anything important such as SSN, credit cards, I think you're safe to stick with what you have.
With encryption we'd all like the most secure, complex vault to secure our stuff, but the question is, why have a huge safe door to protect things no-one would realistically want? You have to balance the type and strength of encryption you use against what you are protecting and the risk of it being taken.
Currently, if you are encrypting your information, even at a basic level, you already beat the hell out of 90% of sites and applications out there - who still store in plain text. You're using a salt (excellent idea) and you're making it extremely difficult to decrypt the information (the md5 key is good).
Make a call - is this worth protecting further. If not, don't waste your time and move on.

How does using a salt make a password more secure if it is stored in the database?

I am learning Rails, at the moment, but the answer doesn't have to be Rails specific.
So, as I understand it, a secure password system works like this:
User creates password
System encrypts password with an encryption algorithm (say SHA2).
Store hash of encrypted password in database.
Upon login attempt:
User tries to login
System creates hash of attempt with same encryption algorithm
System compares hash of attempt with hash of password in the database.
If match, they get let in. If not, they have to try again.
As I understand it, this approach is subject to a rainbow attack — wherein the following can happen.
An attacker can write a script that essentially tries every permutation of characters, numbers and symbols, creates a hash with the same encryption algorithm and compares them against the hash in the database.
So the way around it is to combine the hash with a unique salt. In many cases, the current date and time (down to milliseconds) that the user registers.
However, this salt is stored in the database column 'salt'.
So my question is, how does this change the fact that if the attacker got access to the database in the first place and has the hash created for the 'real' password and also has the hash for the salt, how is this not just as subject to a rainbow attack? Because, the theory would be that he tries every permutation + the salt hash and compare the outcome with the password hash. Just might take a bit longer, but I don't see how it is foolproof.
Forgive my ignorance, I am just learning this stuff and this just never made much sense to me.
The primary advantage of a salt (chosen at random) is that even if two people use the same password, the hash will be different because the salts will be different. This means that the attacker can't precompute the hashes of common passwords because there are too many different salt values.
Note that the salt does not have to be kept secret; it just has to be big enough (64-bits, say) and random enough that two people using the same password have a vanishingly small chance of also using the same salt. (You could, if you wanted to, check that the salt was unique.)
First of all, what you've described isn't a rainbow attack, it's a dictionary attack.
Second, the primary point of using salt is that it just makes life more difficult for the attacker. For example, if you add a 32-bit salt to each pass-phrase, the attacker has to hash and re-hash each input in the dictionary ~4 billion times, and store the results from all of those to have a successful attack.
To have any hope of being at all effective, a dictionary needs to include something like a million inputs (and a million matching results). You mentioned SHA-1, so let's use that for our example. It produces a 20-byte (160-bit) result. Let's guess that an average input is something like 8 characters long. That means a dictionary needs to be something like 28 megabytes. With a 32-bit salt, however, both the size and time to produce the dictionary get multiplied by 2^32-1.
Just as an extremely rough approximation, let's say producing an (unsalted) dictionary took an hour. Doing the same with a 32-bit salt would take 2^32-1 hours, which works out to roughly 490,000 years. There aren't very many people willing to spend that amount of time on an attack.
Since you mention rainbow tables, I'll add that they're typically even larger and slower to start with. A typical rainbow table will easily fill a DVD, and multiplying that by 2^32-1 gives a large enough number that storage becomes a serious problem as well (as in, that's more than all the storage built in the entire history of computers, at least on planet earth).
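The scaling in this example can be worked through in a few lines of Python. The constants are the rough guesses used above (a million entries, 8-character inputs, 20-byte SHA-1 results, one hour to build the unsalted dictionary), and 2^32 is used in place of 2^32-1 since the difference is negligible.

```python
ENTRIES = 1_000_000       # dictionary inputs
BYTES_PER_ENTRY = 8 + 20  # average input plus a 160-bit SHA-1 digest
SALT_VALUES = 2 ** 32     # every possible 32-bit salt

# Size of the unsalted dictionary: 28 megabytes.
unsalted_bytes = ENTRIES * BYTES_PER_ENTRY

# Size once every entry is repeated for every salt: about 1.2e17 bytes.
salted_bytes = unsalted_bytes * SALT_VALUES

# Build time in years if the unsalted dictionary takes one hour.
salted_years = 1 * SALT_VALUES / (24 * 365.25)
```

Here `salted_bytes` comes to roughly 120 petabytes, and `salted_years` to several hundred thousand years.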
The attacker cannot do a rainbow-table attack and has to brute-force which is a lot less efficient.
