I am using the PBEWithMD5AndDES algorithm to encrypt a password. I need to store the encrypted password in a db column. Based on the password length, I have to decide the length of the db column. I want to know the maximum length of a password that can be encrypted using this algorithm.
If this is a password for access to something(s) on your system, you shouldn't store it encrypted at all, you should instead store an irreversible and slow hash, sometimes disguised as a PBKDF (password-based key derivation function). This provides much better security, which is offtopic for SO but has been discussed thousands of times at great length on security.SX and to a somewhat lesser extent on crypto.SX. See https://security.stackexchange.com/questions/211/how-to-securely-hash-passwords for the canonical, and many links.
If you actually do need to store encrypted, PBEwithMD5andDES is not secure. Any value you store encrypted with this algorithm can be easily decrypted in hours at most and probably seconds by a competent attacker, so it's hardly worth the effort of encrypting.
That said, to answer the only Q you asked: there is no inherent limit on the size of data that can be encrypted insecurely with this algorithm. Like all Java Cipher instances it uses an init, update, doFinal structure, and although the Java language limits the size of the byte[] arguments passed or returned on one call to slightly less than 2^31 (about 2,100,000,000) bytes, you can make any number of calls. (For secure ciphers there are some limits on data size, depending on algorithm and mode, to remain secure, but since this one is not secure to start with it doesn't lose anything by exceeding the data sizes that could reduce security.)
It will take increasing amounts of time to process more data, because this cipher uses CBC mode which cannot be parallelized or pipelined on encrypt. I haven't measured and anyway it will vary depending on your hardware and to some extent your Java version, but if for example you want to encrypt 1,000,000,000,000,000,000 bytes it will probably take somewhere in the range of 300,000 years. Your computer may not last that long.
OTOH since you want to store the result in a database, there will almost certainly be some limit there. Even if there is no architectural limit, or no effective one, every database is ultimately limited by the size of the disk or equivalent storage it can use, and the total of all storage in data centers (admittedly not all) is estimated as approaching 2,000,000,000,000,000,000 bytes.
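For the column-sizing part of the question, the arithmetic can be sketched as follows. PBEWithMD5AndDES is DES in CBC mode with PKCS#5 padding, so the ciphertext is always the plaintext length rounded up to the next multiple of 8 bytes. The sketch assumes the ciphertext is stored Base64-encoded in a character column and that the salt is stored separately; the function names are made up for illustration.

```python
import math

DES_BLOCK_BYTES = 8  # DES block size in bytes


def ciphertext_len(plaintext_len: int) -> int:
    """Bytes produced by DES/CBC with PKCS#5 padding for a plaintext of this length."""
    # PKCS#5 always adds between 1 and 8 padding bytes, so a plaintext that is
    # already a multiple of 8 bytes still grows by one full block.
    return (plaintext_len // DES_BLOCK_BYTES + 1) * DES_BLOCK_BYTES


def base64_len(raw_len: int) -> int:
    """Characters needed to Base64-encode raw_len bytes, including '=' padding."""
    return 4 * math.ceil(raw_len / 3)


def column_chars(max_password_bytes: int) -> int:
    """Column width (in characters) for a Base64-encoded ciphertext."""
    return base64_len(ciphertext_len(max_password_bytes))
```

So you pick the longest password (in bytes, after encoding, e.g. UTF-8) that you are willing to accept, and size the column from that, rather than the other way round.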
Which is the more secure method of storing passwords? I lack the mathematical background to determine the answer myself.
Let's please for the sake of argument assume that all passwords and usernames generated for each of the following methods are randomly generated, known to be exactly six alphanumeric or special characters long, and that each method uses the same hashing algorithm and the same number of passes.
The standard way. UserName stored in plain text and only the password is to be discovered. Hash(PlaintextPassword + UniqueRecordSalt) = Password stored in DB.
One field recognized as LoginInfo = Hash(Encryption(UserName, Password) + Shared Salt). Neither the UserName nor the Password are ever stored in any other format EVER.
Does the forced cross attempting of username/password combinations offset the weakness of a shared salt as opposed to a unique record salt? This is of course completely IGNORING all effects on usability and focusing entirely on security.
Can anyone point me to any software to help me answer this question myself since I lack the cryptography and mathematical knowledge to arrive at the answer myself?
Please feel free to move this to a more appropriate forum. I didn't know where else to put it. However, I don't feel that it is a topic irrelevant to programmers overall doing their everyday job.
Please read How to securely hash passwords? first. To summarize:
Never use a single pass of any hashing algorithm.
Never roll your own, which is what your example 2 is (and example 1 as well, if + means concatenation).
Username stored in the clear
Salt generated per user, 8-16 random bytes, stored in the clear
in pure binary or encoded into Base64 or hex or whatever you like.
Use BCrypt, SCrypt, or PBKDF2
Until some time after the results of the Password Hashing Competition, at least.
Use as high a work factor/cost/iteration count as your CPUs can handle during expected future peak times.
For PBKDF2 in particular, do not ask for more binary output bytes than the native hash produces. I would say not less than 20 binary bytes, regardless.
SHA-1: output = 20 bytes (40 hex digits)
SHA-224: 20 bytes <= output <= 28 bytes (56 hex digits)
SHA-256: 20 bytes <= output <= 32 bytes (64 hex digits)
SHA-384: 20 bytes <= output <= 48 bytes (96 hex digits)
SHA-512: 20 bytes <= output <= 64 bytes (128 hex digits)
For PBKDF2 in particular, SHA-384 and SHA-512 have a comparative advantage on 64-bit systems for the moment, as the 2014-vintage GPUs that many attackers will use have a smaller margin of advantage over your defensive CPUs for 64-bit operations than they would for 32-bit operations.
If you want an example, then perhaps look at PHP source code, in particular the password_hash() and password_verify() functions, per the PHP.net Password Hashing FAQ.
Alternately, I have a variety of (currently very crude) password hashing examples at my github repositories. Right now it's almost entirely PBKDF2, but I will be adding BCrypt, SCrypt, and so on in the future.
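As a rough illustration of the checklist above, here is a minimal PBKDF2 sketch using only Python's standard library (hashlib.pbkdf2_hmac). The iteration count is an assumed placeholder, not a recommendation; tune it to what your own hardware can handle. The function names are made up for this example.

```python
import hashlib
import hmac
import os

HASH_NAME = "sha512"
ITERATIONS = 200_000  # assumed placeholder; raise it as far as your CPUs allow
SALT_BYTES = 16       # per-user random salt, stored in the clear
DK_BYTES = 32         # at least 20, and no more than SHA-512's native 64 bytes


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, derived_key); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    derived = hashlib.pbkdf2_hmac(
        HASH_NAME, password.encode("utf-8"), salt, iterations, dklen=DK_BYTES
    )
    return salt, derived


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt, iterations)
    return hmac.compare_digest(derived, expected)
```

Store the salt, the iteration count and the derived key next to the (plain) username; all three are needed to verify a login later.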
As you say, option 1 is the standard way to store passwords. As long as you use a secure hash function (e.g. NIST recommends PBKDF2) with a unique salt, your passwords are secure. So I would recommend this option.
Option 2 doesn't really make sense. You can't 'undo' a hash function, so why encrypt its contents? You would then also have to store the encryption key somewhere, which is a different issue entirely.
Also what do you mean by a shared salt? If you always use the same salt then that defeats the point of salting your hashes. A unique salt per row is the way to go.
I would say that combining the username and password into a single hash is overcomplicating things, and limits your options in development, since you can't get a row from the DB given a username.
Say you want to lock out a user after 5 incorrect password attempts. With a standard plain-text username and hashed pw, you can just have a 'login_attempt_count' column and update the row for that user each time their password is incorrectly entered.
If your usernames and passwords are hashed together, you have no way of identifying which row to update with a login attempt count, since a hashed correct username with a wrong password won't match any hash.
I guess you could have some kind of mapping function to get a row_id given a username, but I would say it's just needlessly complicated, and with greater complication you have a bigger chance of security flaws.
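To make the lockout argument concrete, here is a small in-memory sketch. The users dict stands in for the DB table, password_matches stands in for the result of the salted hash comparison, and record_login is a made-up name.

```python
MAX_ATTEMPTS = 5


def record_login(users, username, password_matches):
    """Update the per-user attempt counter and return 'ok', 'denied' or 'locked'."""
    # The lookup only works because the username is stored in the clear.
    row = users.get(username)
    if row is None:
        return "denied"
    if row["login_attempt_count"] >= MAX_ATTEMPTS:
        return "locked"
    if password_matches:
        row["login_attempt_count"] = 0  # reset the counter on success
        return "ok"
    row["login_attempt_count"] += 1
    if row["login_attempt_count"] >= MAX_ATTEMPTS:
        return "locked"
    return "denied"
```

With a combined username+password hash there is no key to do the users.get() lookup with when the password is wrong, which is exactly the problem described above.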
As I said, I would just go with option 1. It's the industry standard way to store passwords, and it's secure enough for pretty much any application (as long as you use a modern secure hash function).
I want to compare a hash function and a RSA encryption with another parameter.
I have an algorithm with some hash function and I want to claim that computation load of these hashes is less than one RSA.
Can I compare them by a multiplication count, for example how many multiplications each of them performs?
How can I compare them in communication load? How can I determine the length of the RSA output?
It sounds like you're trying to compare apples and oranges.
A hash function is generally expected to accept arbitrarily long inputs, and the time needed to compute it should generally scale linearly with the length of the input. Thus, a useful measure of hash function performance would be, say, "megabytes per second".
(Specifically, that would be a measure of throughput, which is the relevant measure when hashing long inputs. For short messages, a more relevant measure is the latency, which is basically the minimum time needed to hash zero-length input. Given the throughput and the latency, one can generally calculate a fairly good approximation of the time needed to hash an input of any given length as time = latency + length / throughput.)
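As a sketch of that approximation: the first function is just the formula above, and the second is one crude way to get a throughput figure for your own machine. Both function names are made up, and the measurement is a single untuned run, so treat its result as a ballpark only.

```python
import hashlib
import time


def estimated_hash_time(length_bytes, latency_s, throughput_bytes_per_s):
    """time = latency + length / throughput"""
    return latency_s + length_bytes / throughput_bytes_per_s


def measure_throughput(hash_name="sha256", size=8 * 1024 * 1024):
    """Hash `size` zero bytes once and return the observed bytes per second."""
    data = bytes(size)
    start = time.perf_counter()
    hashlib.new(hash_name, data).digest()
    elapsed = time.perf_counter() - start
    return size / elapsed
```

For example, with an assumed latency of 1 microsecond and an assumed throughput of 500 MB/s, estimated_hash_time(500_000_000, 1e-6, 5e8) comes out at essentially one second.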
RSA, on the other hand, can only encrypt messages shorter than the modulus, which is chosen at the time the key is generated. (Typical modulus sizes might be, say, from 1024 to 4096 bits.) To "encrypt a long message with RSA" one would normally use hybrid encryption: first encrypt the message using a symmetric cipher like AES, using a suitable mode of operation and a randomly chosen key, and then encrypt the AES key with RSA.
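To illustrate the "shorter than the modulus" limit, and the output-length part of the question, here is a toy textbook-RSA sketch with tiny primes. It has no padding and is completely insecure; the numbers and function names are for illustration only. It needs Python 3.8+ for the modular inverse via pow().

```python
# Toy textbook RSA: tiny primes, no padding, illustration only.
P, Q = 61, 53
N = P * Q                          # modulus (3233)
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent


def encrypt(m):
    return pow(m, E, N)


def decrypt(c):
    return pow(c, D, N)


def rsa_output_bytes(modulus_bits):
    """An RSA ciphertext or signature is as long as the modulus."""
    return (modulus_bits + 7) // 8
```

A message m below N round-trips, but m + N encrypts to the same ciphertext as m, so anything at or above the modulus cannot be recovered. On the output length: with a 2048-bit modulus, every ciphertext or signature takes 256 bytes, regardless of how short the input was.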
The same length limits apply to signing messages with RSA — by itself, RSA can only sign messages shorter than the modulus. The standard workaround in this case is to first hash the message, and then sign the hash value. (There's also a lot of important details like padding involved that I'm not going to go into here, since we're not on crypto.SE, but which are absolutely crucial for security.)
The point is that, in both cases, the RSA operation itself takes a fixed amount of time regardless of the message length, and thus, for sufficiently long messages, most of the time will be consumed by AES or the hash function, not by RSA itself. So when you say you want to "claim that computation load of these hashes is less than one RSA", I would say that's meaningless, at least unless you fixed a specific input length for your hash. (And if you did, my next question would be "what's so special about that particular input length?")
I have some offline files that have to be password-protected. My strategy is as follows:
Cipher Algorithm: AES, 128-bit block, 256-bit key (PBKDF2-SHA-256, 10000 iterations with a random salt stored plainly elsewhere)
Whole file is divided into pages with page size 1024 bytes
For a complete page, CBC is used
For an incomplete page,
Use CBC with cipher text stealing if it has at least one block
Use CTR if it has less than one block
With this setup, we can keep the same file size
IV or nonce will be based on the salt and deterministic. Since this is not for network communication, I reckon we don't need to worry about replay attacks?
Question: Will this kind of mixing lower the security? Would we be better off just using CTR throughout the whole file?
You're better off just using CTR for the entire file. Otherwise, you're adding a lot of extra work, in supporting multiple modes (CBC, CTR, and CTS) and determining which mode to use. It's not clear there's any value in doing so, since CTR is perfectly fine for encrypting a large amount of data.
Are you planning on reusing the same IV for each page? You should expand a bit on what you mean by a page, but I'd recommend unique IVs for each page. Are these pages addressable somehow? You might want to look into some of the new disk encryption modes for an idea on generating unique IVs.
You also really need to MAC your data. In CTR for example, if someone flips a bit of the ciphertext, it'll flip the bit when you decrypt, and you'll never know it was tampered with. You can use HMAC or, if you want to simplify your entire scheme, use AES GCM mode, which combines CTR for encryption and GMAC for integrity.
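If you go the HMAC route, the usual arrangement is encrypt-then-MAC: compute the tag over the nonce and the ciphertext, and check it before decrypting. A minimal sketch with Python's standard hmac module follows. The ciphertext here is just placeholder bytes (the standard library has no AES), the MAC key is assumed to be separate from the encryption key, and the function names are made up.

```python
import hashlib
import hmac
import os


def make_tag(mac_key, nonce, ciphertext):
    """HMAC-SHA-256 over nonce || ciphertext (encrypt-then-MAC)."""
    return hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()


def check_tag(mac_key, nonce, ciphertext, tag):
    """Constant-time comparison; call this before attempting to decrypt."""
    return hmac.compare_digest(make_tag(mac_key, nonce, ciphertext), tag)


# Placeholder values standing in for real CTR output.
mac_key = os.urandom(32)
nonce = os.urandom(8)
ciphertext = b"pretend this is AES-CTR output"
tag = make_tag(mac_key, nonce, ciphertext)

# Flip one bit of the ciphertext, as an attacker might.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
```

check_tag() accepts the original ciphertext and rejects the tampered one, which is the detection plain CTR cannot give you.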
There are a few things you need to know about CTR mode. After you know them all you could happily apply a stream cipher in your situation:
never ever reuse a data key with the same nonce;
the rule above also holds over time: do not reuse the pair even for a later version of the same data;
be aware that CTR mode really shows the size of the encrypted data; always encrypting full blocks can hide this somewhat (in general a 1024 byte block takes as much as a single bit block if the file system boundaries are honored);
CTR mode in itself does not provide authentication (for completeness, as this was already discussed);
If you don't keep to the first two rules, an attacker will immediately see the place of the edit and the attacker will be able to retrieve data directly related to the plain text.
On a positive note:
you can happily use the offset (in, e.g., blocks) in the file to be part of the nonce;
it is very easy to seek in files, buffer ciphertext and create multi-threaded code around CTR.
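As a sketch of the offset-as-part-of-the-nonce idea: one common layout for the 16-byte AES counter block is an 8-byte per-file nonce followed by a 64-bit big-endian block index. That layout and the function name are assumptions made for this example, not something the question specifies.

```python
import struct

BLOCK_BYTES = 16  # AES block size


def counter_block(file_nonce, byte_offset):
    """Build the CTR input block for the AES block starting at byte_offset."""
    if len(file_nonce) != 8:
        raise ValueError("file_nonce must be exactly 8 bytes")
    if byte_offset % BLOCK_BYTES != 0:
        raise ValueError("byte_offset must be block-aligned")
    block_index = byte_offset // BLOCK_BYTES
    # 8-byte nonce || 64-bit big-endian block index
    return file_nonce + struct.pack(">Q", block_index)
```

Seeking to page k of a 1024-byte-page file is then just counter_block(nonce, k * 1024). Note this only keeps counter blocks unique within one version of the file; rewriting a page under the same key and nonce still breaks the first two rules above.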
And in general:
it pays off to use data-specific keys for specific sets of files, so that if a key is compromised or changed you don't have to re-encrypt everything;
think very well about how your keys are used, stored, backed up etc. Key management is the hardest part.
I know, I know, similar questions have been asked millions and billions of times already, but since most of them got a different flavor, I got one of my own.
Currently I'm working on a website that is meant to be launched all across my country, therefore, needs some kind of protection for user system.
Lately I've been reading a lot about password encryption, hashing, salting... you name it, but after reading that many articles, I get confused.
One says that plain SHA512 encryption is enough for a password, others say that you have to use "salt" no matter what you would do, and then there are guys who say that you should build a whole new machine for password encryption because that way no one will be able to get it.
For now I'm using hash_hmac(); with SHA512, plus, password gets random SHA1 salt and the last part, defined random md5(); key. For most of us it'll sound secure, but is it?
I recently read here on SO, that bcrypt(); (now known as crypt(); with Blowfish hashing) is the most secure way. After reading PHP manual about crypt(); and associated stuff, I'm confused.
Basically, the question is, will my hash_hmac(); beat the hell out of Blowfished crypt(); or vice-versa?
And one more, maybe there are more secure options for password hashing?
The key to proper application of cryptography is to define with enough precision what properties you are after.
Usually, when someone wants to hash passwords, it is in the following context: a server is authenticating users; users show their password, through a confidential channel (HTTPS...). Thus, the server must store user passwords, or at least store something which can be used to verify a password. We do not want to store the passwords "as is" because an attacker gaining read access to the server database would then learn all passwords. This is our attack model.
A password is something which fits in the brain of the average user, hence it cannot be fully unguessable. A few users will choose very long passwords with high entropy, but most will select passwords with an entropy no higher than, say, 32 bits. This is a way of saying that an attacker will have to "try" on average less than 2^31 (about 2 billion) potential passwords before finding the right one.
Whatever the server stores, it is sufficient to verify a password; hence, our attacker has all the data needed to try passwords, limited only by the computing power he can muster. This is known as an offline dictionary attack.
One must assume that our attacker can crack one password. At that point we may hope for two properties:
cracking a single password should be difficult (a matter of days or weeks, rather than seconds);
cracking two passwords should be twice as hard as cracking one.
Those two properties call for distinct countermeasures, which can be combined.
1. Slow hash
Hash functions are fast. Computing power is cheap. As a data point, with SHA-1 as hash function, and a $130 NVidia graphics card, I can hash 160 million passwords per second. The 2^31 cost is paid in about 13 seconds. SHA-1 is thus too fast for security.
On the other hand, the user will not see any difference between being authenticated in 1µs, and being authenticated in 1ms. So the trick here is to warp the hash function in a way which makes it slow.
For instance, given a hash function H, use another hash function H' defined as:
H'(x) = H(x || x || x || ... || x)
where '||' means concatenation. In plain words, repeat the input enough times so that computing the H' function takes some non-negligible time. So you set a timing target, e.g. 10ms, and adjust the number of repetitions needed to reach that target. 10ms means that your server will be able to authenticate 10 users per second at the cost of only 10% of its computing power. Note that we are talking about a server storing a hashed password for its own ulterior usage, hence there is no interoperability issue here: each server can use a specific repetition count, tailored for its power.
Suppose now that the attacker can have 100 times your computing power; e.g. the attacker is a bored student -- the nemesis of many security systems -- and can use dozens of computers across his university campus. Also, the attacker may use a more thoroughly optimized implementation of the hash function H (you are talking about PHP but the attacker can do assembly). Moreover, the attacker is patient: users cannot wait for more than a fraction of a second, but a sufficiently bored student may try for several days. Yet, trying 2 billion passwords will still require about 3 full days' worth of computing. This is not ultimately secure, but is much better than 13 seconds on a single cheap PC.
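The two estimates in this section can be checked with a few lines of arithmetic. All inputs are the figures quoted in the text (2^31 candidate passwords, 160 million SHA-1 hashes per second, a 10ms slow hash, an attacker with 100 times the server's computing power); the variable names are made up.

```python
CANDIDATES = 2 ** 31                 # average number of guesses for a 32-bit-entropy password

# Fast hash: plain SHA-1 on a cheap GPU.
GPU_SHA1_PER_SECOND = 160_000_000
fast_seconds = CANDIDATES / GPU_SHA1_PER_SECOND

# Slow hash: 10ms per guess on the server, attacker 100 times more powerful.
SLOW_HASH_SECONDS = 0.010
ATTACKER_ADVANTAGE = 100
slow_seconds = CANDIDATES * SLOW_HASH_SECONDS / ATTACKER_ADVANTAGE
slow_days = slow_seconds / 86_400    # 86,400 seconds in a day

print(f"fast hash: {fast_seconds:.1f} seconds")
print(f"slow hash: {slow_days:.1f} days")
```

That gives roughly 13 seconds for the fast hash and about two and a half days for the slow one, in line with the figures above.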
2. Salts
A salt is a piece of public data which you hash with the password in order to prevent sharing.
"Sharing" is what happens when the attacker can reuse his hashing efforts over several attacked passwords. This is what happens when the attacker has several hashed passwords (he read the whole database of hashed passwords): whenever he hashes one potential password, he can look it up against all hashed passwords he is trying to attack. We call that a parallel dictionary attack. Another instance of sharing is when the attacker can build a precomputed table of hashed passwords, and then use his table repeatedly (by simple lookups). The fabled rainbow table is just a special case of a precomputed table (that's just a time-memory trade-off which allows for using a precomputed table much bigger than what would fit on a hard disk; but building the table still requires hashing each potential password). Space-time wise, parallel attacks and precomputed tables are the same attack.
Salting defeats sharing. The salt is a public data element which alters the hashing process (one could say that the salt selects the hash function among a whole set of distinct functions). The point of the salt is that it is unique for each password. The attacker can no longer share cracking efforts because any precomputed table would have to use a specific salt and would be useless against a password hashed with a distinct salt.
The salt must be used to verify a password, hence the server must store, for each hashed password, the salt value which was used to hash that password. In a database, that's just an extra column. Or you could concatenate the salt and the hash password in a single blob; that's just a matter of data encoding and it is up to you.
Assuming S to be the salt (i.e. some bytes), the hashing process for password p is: H'(S||p) (with the H' function defined in the previous section). That's it!
The point of the salt is to be, as much as possible, unique to each hashed password. A simple way to achieve that is to use random salts: whenever a password is created or changed, use a random generator to get 16 random bytes. 16 bytes ought to be enough to make salt reuse highly improbable. Note that the salt should be unique for each password: using the user name as a salt is not sufficient (some distinct server instances may have users with the same name -- how many "bob"s exist out there? -- and, also, some users change their password, and the new password should not use the same salt as the previous password).
3. Choice of hash function
The H' hash function is built over a hash function H. Some traditional implementations have used encryption algorithms twisted into hash functions (e.g. DES for Unix's crypt()). This has promoted the use of the "encrypted password" expression, although it is not proper (the password is not encrypted because there is no decryption process; the correct term is "hashed password"). It seems safer, however, to use a real hash function, designed for the purpose of hashing.
The most used hash functions are: MD5, SHA-1, SHA-256, SHA-512 (the latter two are collectively known as "SHA-2"). Some weaknesses have been found in MD5 and SHA-1. Those weaknesses have serious impact for some usages, but not for what is described above (the weaknesses are about collisions, whereas we work here on preimage resistance). However, it is better public relations to choose SHA-256 or SHA-512: if you use MD5 or SHA-1, you may have to justify yourself. SHA-256 and SHA-512 differ by their output size and performance (on some systems, SHA-256 is much faster than SHA-512, and on others SHA-512 is faster than SHA-256). However, performance is not an issue here (regardless of the hash function intrinsic speed, we make it much slower through input repetitions), and the 256 bits of SHA-256 output are more than enough. Truncating the hash function output to the first n bits, in order to save on storage costs, is cryptographically valid, as long as you keep at least 128 bits (n >= 128).
4. Conclusion
Whenever you create or modify a password, generate a new random salt S (16 bytes). Then hash the password p as SHA-256(S||p||S||p||S||p||...||S||p) where the 'S||p' pattern is repeated enough times so that the hashing process takes 10ms. Store both S and the hash result. To verify a user password, retrieve S, recompute the hash, and compare it with the stored value.
And you will live longer and happier.
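A minimal sketch of the scheme from the conclusion, using Python's standard library. The repetition count is left as a parameter because it has to be found by timing on your own server (the 10ms target); the function names are made up for this example.

```python
import hashlib
import hmac
import secrets

SALT_BYTES = 16


def slow_hash(salt, password, repetitions):
    """SHA-256(S||p||S||p||...||S||p) with the S||p pattern repeated `repetitions` times."""
    unit = salt + password.encode("utf-8")
    h = hashlib.sha256()
    for _ in range(repetitions):
        h.update(unit)  # feeding the pieces one by one equals hashing the concatenation
    return h.digest()


def create_record(password, repetitions):
    """Generate a fresh random salt and return (salt, hash) for storage."""
    salt = secrets.token_bytes(SALT_BYTES)
    return salt, slow_hash(salt, password, repetitions)


def verify(password, salt, stored_hash, repetitions):
    """Recompute with the stored salt and compare in constant time."""
    return hmac.compare_digest(slow_hash(salt, password, repetitions), stored_hash)
```

To calibrate, time slow_hash() with increasing repetition counts until one call takes about 10ms on the production machine, then store that count alongside S so it can be raised later.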
This question raises multiple points, each of which need to be addressed individually.
Firstly you should not engineer your own encryption algorithm. The argument that something is secure because it is not mainstream is completely invalid. Any algorithm you might develop will only be as strong as your understanding of cryptography.
The average developer does not have a grasp of the mathematical concepts necessary to create a strong algorithm. Should your application be compromised, your completely untested algorithm will be the only thing standing between an attacker and your users' personal information, and a suitably motivated attacker will probably defeat your custom encryption much faster than they could have had you used a time-tested algorithm.
Using a salt is a very good idea. Because the hash is generated using both the salt and password value, a brute force attack on the hashed data becomes excessively expensive because the dictionary of hashed passwords used by an attacker would not take into account the salt value used when generating the hashes.
I'm not the most qualified person to comment on algorithm selection, so I'll leave that to somebody else.
I'm not a PHP developer, but I have some experience with encryption. My first recommendation is as Crippledsmurf suggested, absolutely don't try to "roll your own" encryption. It will have disaster written all over it.
You say you're using hash_hmac() currently. If you're just protecting user accounts and some basic information (name, address, email etc.) and not anything important such as SSN, credit cards, I think you're safe to stick with what you have.
With encryption we'd all like the most secure, complex vault to secure our stuff, but the question is, why have a huge safe door to protect things no-one would realistically want? You have to balance the type and strength of encryption you use against what you are protecting and the risk of it being taken.
Currently, if you are encrypting your information, even at a basic level, you already beat the hell out of 90% of sites and applications out there - who still store in plain text. You're using a salt (excellent idea) and you're making it extremely difficult to decrypt the information (the md5 key is good).
Make a call - is this worth protecting further. If not, don't waste your time and move on.
I am learning Rails, at the moment, but the answer doesn't have to be Rails specific.
So, as I understand it, a secure password system works like this:
User creates password
System encrypts password with an encryption algorithm (say SHA2).
Store hash of encrypted password in database.
Upon login attempt:
User tries to login
System creates hash of attempt with same encryption algorithm
System compares hash of attempt with hash of password in the database.
If match, they get let in. If not, they have to try again.
As I understand it, this approach is subject to a rainbow attack — wherein the following can happen.
An attacker can write a script that essentially tries every permutation of characters, numbers and symbols, creates a hash with the same encryption algorithm and compares them against the hash in the database.
So the way around it is to combine the hash with a unique salt. In many cases, the current date and time (down to milliseconds) that the user registers.
However, this salt is stored in the database column 'salt'.
So my question is, how does this change the fact that if the attacker got access to the database in the first place and has the hash created for the 'real' password and also has the hash for the salt, how is this not just as subject to a rainbow attack? Because, the theory would be that he tries every permutation + the salt hash and compare the outcome with the password hash. Just might take a bit longer, but I don't see how it is foolproof.
Forgive my ignorance, I am just learning this stuff and this just never made much sense to me.
The primary advantage of a salt (chosen at random) is that even if two people use the same password, the hash will be different because the salts will be different. This means that the attacker can't precompute the hashes of common passwords because there are too many different salt values.
Note that the salt does not have to be kept secret; it just has to be big enough (64-bits, say) and random enough that two people using the same password have a vanishingly small chance of also using the same salt. (You could, if you wanted to, check that the salt was unique.)
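A tiny demonstration of that point: the same password hashed under two different random salts gives two unrelated hashes. A single SHA-256 pass is used here only to keep the demo short (it is not a suitable password hash on its own), and the function name is made up.

```python
import hashlib
import os


def salted_hash(password, salt):
    """Hex SHA-256 of salt || password; for demonstrating salts only."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two users pick the same password but get different 64-bit random salts.
salt_alice = os.urandom(8)
salt_bob = os.urandom(8)
hash_alice = salted_hash("letmein", salt_alice)
hash_bob = salted_hash("letmein", salt_bob)
```

The two stored hashes differ, so a table precomputed for one salt says nothing about the other, while verification still works because each salt is stored next to its hash.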
First of all, what you've described isn't a rainbow attack, it's a dictionary attack.
Second, the primary point of using salt is that it just makes life more difficult for the attacker. For example, if you add a 32-bit salt to each pass-phrase, the attacker has to hash and re-hash each input in the dictionary ~4 billion times, and store the results from all of those to have a successful attack.
To have any hope of being at all effective, a dictionary needs to include something like a million inputs (and a million matching results). You mentioned SHA-1, so let's use that for our example. It produces a 20-byte (160-bit) result. Let's guess that an average input is something like 8 characters long. That means a dictionary needs to be something like 28 megabytes. With a 32-bit salt, however, both the size and time to produce the dictionary get multiplied by 2^32-1.
Just as an extremely rough approximation, let's say producing an (unsalted) dictionary took an hour. Doing the same with a 32-bit salt would take 2^32-1 hours, which works out to around 490,000 years. There aren't very many people willing to spend that amount of time on an attack.
Since you mention rainbow tables, I'll add that they're typically even larger and slower to start with. A typical rainbow table will easily fill a DVD, and multiplying that by 2^32-1 gives a large enough number that storage becomes a serious problem as well (as in, that's more than all the storage built in the entire history of computers, at least on planet earth).
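The storage side of that argument, worked through with the numbers from the text (a million entries of 8 + 20 bytes, a 32-bit salt). The 4.7 GB figure for a single-layer DVD is an assumption added here, and the variable names are made up.

```python
ENTRIES = 1_000_000
BYTES_PER_ENTRY = 8 + 20         # 8-character input + 20-byte SHA-1 result
SALT_VALUES = 2 ** 32            # number of possible 32-bit salts

unsalted_bytes = ENTRIES * BYTES_PER_ENTRY
salted_bytes = unsalted_bytes * SALT_VALUES

DVD_BYTES = 4_700_000_000        # assumed: one single-layer DVD
rainbow_salted_bytes = DVD_BYTES * SALT_VALUES

print(f"unsalted dictionary:   {unsalted_bytes / 1e6:.0f} MB")
print(f"salted dictionary:     {salted_bytes / 1e15:.0f} PB")
print(f"salted rainbow tables: {rainbow_salted_bytes / 1e18:.0f} EB")
```

So the 28 MB dictionary grows to roughly 120 petabytes, and a DVD-sized rainbow table to roughly 20 exabytes, once every possible salt has to be covered.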
The attacker cannot do a rainbow-table attack and has to brute-force which is a lot less efficient.