Why the salt is 8 characters long? [duplicate]

Why the salt is 8 characters long? [duplicate] - encryption

This question already has answers here:
What is the optimal length for user password salt? [closed]
(5 answers)
Closed 4 years ago.
Why is the salt not more than 8 ~ 16 characters long?
Also, why in most cases is it in the front or end of the password, and not in different positions?
Is this to make it harder for the breaker? Or is it useless?

Because more salt doesn't serve a useful purpose.
The point of salt is to prevent some parallel attacks from working in a reasonable amount of time/memory, and/or drive space. (You can no longer have a table that says Hash A => Password A, because even if you had enough disk space to construct a rainbow table, the salt makes the number of possible entries way beyond feasibility. And you can no longer hash a potential password once and compare it against a bunch of hashes at a time, because the salt is quite likely to be different for each hash.)
16 characters gives you somewhere between 10^16 and 96^16 times as many possibilities, which already fits the definition of "way beyond feasibility". Past a certain point, you're simply increasing your own storage requirements for no significant benefit.

The salt of 8 characters is enough against any imaginable in real life dictionary attack, it makes any dictionary or rainbow table useless.

Related

Building a unique ID without collisions

I'm playing around with system design and have been reading up on url shortener. I realize there are many questions around this topic, but have some specific questions with respect to hashing and the order in which I hash + encode.
Input: https://example.com/owjpojwepofjwpoejfpwjepfojpwejfp/wefoijhwioejfiowef/weoifhwoiehjfiowef
Output: https://example.com/abr4fna
If I run this input through md5 I get the following 9e91e9c2a7ce0f0d11b475d2abfb8593. Clearly, this exceeds the length that I want, so I could truncate the substring from (0,7]. The problem is, to some degree, I can still have a collision since the prefix of the md5 is not guaranteed to be unique as the amount of urls generated increases within the service.
I do not want to have to check the database if I've already used this ID before as that would increase the amount of reads I'm doing proportional to the number of writes I'm doing. In addition, there could be concurrency issues as I grow the number of application servers doing the hash generation and storage.
I see people mentioning the use of base64 encoding the output hash, but what value does this add after the hash? Is it because I grow the amount of unique combinations by 64^n where n is the length of my hash versus md5 being only 36^n?
Thanks. Just interested in having this discussion.
edit:
As I understand, we purely doing the encoding piece to ensure we do not have transmission failures if the receiving system has issues interpreting binary data from the output hash - so it's used for the pure sake of display.

By definition, you cannot hash a large domain and expect to get a smaller domain without collisions. A hash is useful because it is one-way and would require a computationally infeasible amount of tries to find those collisions. However, with a 7 character output and a large input domain, it will be exceptionally easy to generate collisions even by chance.
You're currently using 7 hexadecimal digits. Each hexadecimal digit represents 4 bits. So you have 28 bits or 2^28 possible values. That's around 256 million possible values. So if you guess long enough you'll get a collision soon enough. With base64 you'd have 6 bits per character instead (2^6 = 64, hence the name). That means that you increase the bit size with 7 * 2 = 14 bits, or around 16 thousand times as much, but you'd still be pretty far from collision free.
Actually, for any cryptographic reassurance when taking in the birthday bound, the 16 byte output of MD5 is about the absolute minimum size of hash you want to avoid collisions. Of course, MD5 hashn't been deprecated for nothing, you'd really want to use SHA-256.

Salt practices clarification [duplicate]

This question already has answers here:
How does password salt help against a rainbow table attack?
(10 answers)
Closed 8 years ago.
I was recently reading Application Security For The Android Platform by Jeff Six and I came across a statement that I found puzzling. In the encryption section while describing salts and hashing functions this statement was made
Just like with IVs [Initialization Vector], salt values should be random but they do not need to be kept secret.
Is this true? Because my understanding of salts and hashing functions was that this statement is just wrong and the salt needs to be protected because if the salt is released a new rainbow table can be generated making the salt unnecessary? Is this correct? Or does the salt really not have to be kept secret and why is this?

The salt doesn't have to be kept secret because it will be a 64-bit or 128-bit random number, and the attacker would be unable to use any rainbow table that didn't incorporate that salt. In effect, the attacker would be brute-forcing each individual password (because each password will have its own salt, of course — no two passwords should be hashed with the same salt).
The rainbow table attack is based on storing precomputed hashes for all possible password inputs (up to a certain length, naturally). It is infeasible to store rainbow tables for every conceivable salt of 128-bit complexity: a rainbow table to cover just single byte passwords that accounts for 128-bit salts would be approximately 280 Terabytes (that's 1027: one thousand trillion trillion 1TB hard drives).

Which method of password storage is more secure [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
Improve this question
Which is the more secure method of storing passwords? I lack the mathematical background to determine the answer myself.
Let's please for the sake of argument assume that all passwords and usernames generated for each of the following methods are randomly generated 6 characters known to be exactly six alpha-humeric-special-character fields and that each are using the same hashing algorithm and the same number of passes.
The standard way. UserName stored in plain text and only the password is to be discovered. Hash(PlaintextPassword + UniqueRecordSalt) = Password stored in DB.
One field recognized as LoginInfo = Hash(Encryption(UserName, Password) + Shared Salt). Neither the UserName nor the Password are ever stored in any other format EVER.
Does the forced cross attempting of username/password combinations offset the weakness of a shared salt as opposed to a unique record salt? This is of course completely IGNORING all affects on usability and focusing entirely on security.
Can anyone point me to any software to help me answer this question myself since I lack the cryptography and mathematical knowledge to arrive at the answer myself?
Please feel free to move this to a more appropriate forum. I didn't know where else to put it. However, I don't feel that it is a topic irrelevant to programmers overall doing their everyday job.

Please read How to securely hash passwords? first. To summarize:
Never use a single pass of any hashing algorithm.
Never roll your own, which is what your example 2 is (and example 1 as well, if + means concatenation).
Username stored in the clear
Salt generated per user, 8-16 random bytes, stored in the clear
in pure binary or encoded into Base64 or hex or whatever you like.
Use BCrypt, SCrypt, or PBKDF2
Until some time after the results of the Password Hashing Competition, at least.
Use as high an work factor/cost/iteration count as your CPU's can handle during expected future peak times.
For PBKDF2 in particular, do not ask for more binary output bytes than the native hash produces. I would say not less than 20 binary bytes, regardless.
SHA-1: output = 20 bytes (40 hex digits)
SHA-224: 20 bytes <= output <= 28 bytes (56 hex digits)
SHA-256: 20 bytes <= output <= 32 bytes (64 hex digits)
SHA-384: 20 bytes <= output <= 48 bytes (96 hex digits)
SHA-512: 20 bytes <= output <= 64 bytes (128 hex digits)
For PBKDF2 in particular, SHA-384 and SHA-512 have a comparative advantage on 64-bit systems for the moment, as 2014 vintage GPU's many attackers will use have a smaller margin of advantage for 64-bit operations over your defensive CPU's than they would on 32-bit operations.
If you want an example, then perhaps look at PHP source code, in particular the password_hash() and password_verify() functions, per the PHP.net Password Hashing FAQ.
Alternately, I have a variety of (currently very crude) password hashing examples at my github repositories. Right now it's almost entirely PBKDF2, but I will be adding BCrypt, SCrypt, and so on in the future.

As you say option 1 is the standard way to store passwords. As long as you use a secure hash function (eg. NIST recommend PBKDF2) with a unique salt, your passwords are secure. So I would recommend this option.
Option 2 doesn't really make sense. You cant 'undo' a hash function, so why encrypt its contents? You would then also have to store the encryption key somewhere which is different issue entirely.
Also what do you mean by a shared salt? If you always use the same salt then that defeats the point of salting your hashes. A unique salt per row is the way to go.
I would say that combining the username and password into a single hash is overcomplicating things, and limits your options in development, since you can't get a row from the DB given a username.
Say you want to lock out a user after 5 incorrect password attempts. With a standard plain-text username and hashed pw, you can just have a 'login_attempt_count' column and update the row for that user each time their password is incorrectly entered.
If your username and passwords are hashed together, you have no way of identifying which row to update with a login attempt count, since a hashed correct username with a wrong password wont match any hash.
I guess you could have some kind of mapping function to get a row_id given a username, but I would say its just needlessly complicated, and with greater complication you have a bigger chance of security flaws.
As I said, I would just go with option 1. It's the industry standard way to store passwords, and its secure enough for pretty much any application (as long as you use a modern secure hash function).

Entire range - Reverse MD5 lookup

I am learning about encryption methods and I have a question about MD5.
I have seen there are several websites that have 'rainbow tables' that will give you reverse MD5 lookup, but, they can't lookup all the combinations possible.
For knowledge's sake, my question is this :
Hypothetically, if a group of people were to consider an upper limit (eg. 5 or 6 characters) and decide to map out the entire MD5 hash for all the values inside that range, storing the results in a database to use for reverse lookup.
1. Do you think such a thing is probable.
2. If you can speculate, what kind of scale of resources would this mean?
3. To your knowledge have there been any public or private attempts to do this?
I am not referring to tables that have select entries based on a dictionary, but mapping the entire range upto a certain number of characters.
(I have refered to This question already.)

It is possible. For a small number of characters, it has already been done. In the near future, it will be easy for larger numbers of characters. MD5 isn't getting any stronger.
That's a function of time. To reverse the entire 6-or-fewer-character alphanumeric space would require computing 62^6 entries. That's 56 trillion MD5s. That's doable by a determined small group or easy for a government, right now. In the future, it will be doable on a home computer. Remember, though, that as the number of allowable characters or the maximum length increases, the difficulty increase is exponential.
People already have done it. But, honestly, it doesn't matter - because anyone with half an ounce of sense uses a random salt. If you precompute the entire MD5 space and reverse it, that doesn't mean jack dandy if someone is using key strengthening or a good salt! Read up on salting.

5 or 6 characters is easy. 6 bytes is doable (that's 248 combinations), even with limited hardware.
Namely, a simple Core2 CPU from Intel will be able to hash one password in about 150 clock cycles (assuming you use a SSE2 implementation, which will hash four passwords in parallel in 600 clock cycles). With a 2.4 GHz quad core CPU (that's my PC, not exactly the newest machine available), I can then try about 226 passwords per second. For that kind of job, a massively parallel architecture is fine, hence it makes sense to use a GPU. For maybe 200$, you can buy a NVidia video card which will be about four times faster (i.e. 228 passwords per second). 6 alphanumeric characters (uppercase, lowercase and digits) are close to 236 combinations; trying them all is then a matter of 2(36-28) seconds, which is less than five minutes. With 6 random bytes, it will need 220 seconds, i.e. a bit less than a fortnight.
That's for the CPU cost. If you want to speed up the actual attack, you store the hash results: thus you will not need to recompute all those hashed passwords every time you attack a password (but you still have to do it once). 236 hash results (16 bytes each) mean 1 terabyte. You can buy a harddisk that big for 100$. 248 hash results imply 4096 times that storage space; in plain harddisks this will cost as much as a house: a bit expensive for the average bored student, but affordable for most kinds of governmental or criminal organizations.
Rainbow tables are an optimization trick for the storage. In rough terms, you store only one every t hash results, in exchange of having to do t lookups and t2 hash computations for every attack. E.g., you choose t=1000, you only have to buy four harddisks instead of four thousands, but you will need to make 1000 lookups and a million hashes every time you want to crack a password (this will need a dozen seconds at most, if you do it right).
Hence you have two costs:
The CPU cost is about computing hashes for the complete password space; with a table (rainbow or not) you have to do it once, and then can reuse that computational effort for every attacked password.
The storage cost is about storing the hash results in order to easily attack several passwords. Harddisks are not very expensive, as shown above. Rainbow tables help you lower storage costs.
Salting defeats cost sharing through precomputed tables (whether they are rainbow tables or just plain tables has no effect here: tables are about reusing precomputed values for several attacked passwords, and salts prevent such recycling).
The CPU cost can be increased by defining that the hash procedure is not just a single hash computation; for instance, you can define the "password hash" as applying MD5 over the concatenation of 10000 copies of the password. This will make each attacker guess one
thousand times more expensive. It also makes legitimate password validation one thousands times more expensive, but most users will not mind (the user has just typed his password; he cannot really see whether the password verification took 10ms or 10µs).
Modern Unix-like systems (e.g. Linux) use "MD5" passwords which actually combine salting and iterated hashing, as described above. (Actually, a modern Linux system may use another hash function, such as SHA-256, but that does not change things much here.) So precomputed tables will not help, and the on-the-fly password cracking is expensive. A password with 6 alphanumeric characters can still be cracked within a few days, because 6 characters are kind of weak anyway. Also, many longer passwords are crackable because it turns out that human begins are bad are remembering passwords; hence they will not choose just any random sequence of characters, they will select passwords which have some "meaning". This reduces the space of possible passwords.

It's called a rainbow table, and it's easily defeated with salting.

Yes, it is not only probable, but it's probably been done before.
It depends on whether they are mapping the entire possible range or just a range of ASCII characters. Let's say you need 128 bits + 6 bytes to store each match. That's 22 bytes. You'd need:
6.32 GB to store all lowercase alphabetic combinations [a-z]
405 GB to for all alphabetic combinations [a-zA-Z]
1.13 TB for all alphanumeric combinations [a-zA-Z0-9]
5.24 TB for all combinations that consists of letters, numbers and 18 symbols.
As you see, it increases exponentially, but even at 5.24 TB that's nothing to agencies like, say, the NSA or the CIA. They probably have done it.
As everyone else said, salting can easily defeat rainbow tables and that's almost as important as hashing. Read this: Just hashing is far from enough - How to position against dictionary and rainbow attacks

How can SHA encryption be possible? [duplicate]

This question already has answers here:
Closed 13 years ago.
Duplicate:
Confused about hashes
How can SHA encryption create unique 40 character hash for any string, when there are n infinite number of possible input strings but only a finite number of 40 character hashes?

SHA is not an encryption algorithm, it is a cryptographic hashing algorithm.
Check out this reference at Wikipedia

The simple answer is that it doesn't create a unique 40 character hash for any string - it's inevitable that different strings will have the same hash.
It does try to make sure that close-by string will have very different hashes. 40 characters is a pretty long hash, so the chance of collision is quite low unless you're doing ridiculous numbers of them.

SHA doesn't create a unique 40 character hash for any string. If you create enough hashes, you'll get a collision (two inputs that hash to the same output) eventually. What makes SHA and other hash functions cryptographically useful is that there's no easy way to find two files that will have the same hash.

To elaborate on jdigital's answer:
Since it's a hash algorithm and not an encryption algorithm, there is no need to reverse the operation. This, in turn, means that the result does not need to be unique; there are (in theory) in infinite number of strings that will result in the same hash. Finding out which on those are is practically impossible, though.

Hash algorithms like SHA-1 or the SHA-2 family are used as "one-way" hashes in support of password-based authentication. It is not computationally feasible to find a message (password) that hashes to a given value. So, if an attacker obtains the list of hashed passwords, they can't determine the original passwords.
You are correct that, in general, there are an infinite number of messages that hash to a given value. It's still hard to find one though.

It does not guarantee that two strings will have unique 40 character hashes. What it does is provide an extremely low probability that two strings will have conflicting hashes, and makes it very difficult to create two conflicting documents without just randomly trying inputs.
Generally, a low enough probability of something bad happening is as good as a guarantee that it never will. As long as it's more likely that the world will end when a comet hits it, the chance of a colliding hash isn't generally worth worrying about.
Of course, secure hash algorithms are not perfect. Because they are used in cryptography, they are very valuable things to try and crack. SHA-1, for instance, has been weakened (you can find a collision in 2000 times fewer guesses than just doing random guessing); MD5 has been completely cracked, and security researchers have actually created two certificates which have the same MD5 sum, and got one of them signed by a certificate authority, thus allowing them to use the other one as if it had been signed by the certificate authority. You should not blindly put your faith in cryptographic hashes; once one has been weakened (like SHA-1), it is time to look for a new hash, which is why there is currently a competition to create a new standard hash algorithm.

The function is something like:
hash1 = SHA1(plaintext1)
hash2 = SHA1(plaintext2)
now, hash1 and hash2 can technically be the same. It's a collision. Not common, but possible, and not a problem.
The real magic is in the fact that it's impossible to do this:
plaintext1 = SHA1-REVERSE(hash1)
So you can never reverse it. Handy if you dont want to know what a password is, only that the user gave you the same one both times. Think about it. You have 1024 bytes of input. You get 40 bits of output. How can you EVER reconstruct those 1024 bytes from the 40 - you threw information away. It's just not possible (well, unless you design the algorithm to allow it, I guess....)
Also, if 40 bits isn't enough, use SHA256 or something with a bigger output. And Salt it. Salt is good.
Oh, and as an aside: any website which emails you your password, is not hashing it's passwords. It's either storing them unencrypted (run, run screaming), or encrypting them with a 2 way encryption (DES, AES, public-private key et al - trust them a little more)
There is ZERO reasons for a website to be able to email you your password, or need to store anything but the hash. /rant.

Nice observation. Short answer it can't and leads to collisions which can be exploited in birthday attacks.

The simple answer is: it doesn't create unique hashes. Look at the Pidgeonhole priciple. It's just so unlikely for there to be a collision that nobody has ever found one.