I'm reverse engineering a system that I've been tasked to understand. The system uses tokens and semi-randomised[1] timestamps (as the salt) and an unknown hashing function to generate random keys for authentication.
The strings it produces are 64 characters in length and there appears to be a pattern to the hashing. For example (based on fictional inputs and outputs of unrelated lengths, as this question is about the theory):
salt: 135407067754316
token: aaaa.bbbb.cccc.dddd
This would produce:
hash: d41d8cd98f00b204e9800998ecf8427ed41d8cd98f00b204e9800998ecf8427e
And then changing the salt:
salt: 13540707209819
token: aaaa.bbbb.cccc.dddd
Produces:
hash: d41d9g838fddb275e9800123ecf8427ed41d9g838fddb275e9800123ecf8427e
This is similar to the first hash (the first 3 characters always match), and the only difference in the generation is the key (the semi-randomised timestamp), which leads me to conclude that they're using some basic hash function with the randomised timestamp as the salt.
Question: How would I go about narrowing down how they're hashing, or which function they're using? I can trigger the generation of as many hashes as I need (with the key and token visible to me), but I cannot trigger generation using a token and key of my choosing. The result is always 64 characters in length.
Does a technique exist that would allow me to brute-force the generation method if I have 100, 200, 500, 1,000, 10,000, or 100,000 examples of salts, tokens, and the resulting hashes?
[1] Semi-randomised means they're taking the current timestamp (e.g. 1354068826) and appending a random 5-digit number to the end, to create something like 135406882613951.
Final note: I know nothing of encryption, hashing, or security beyond the basic ideas (put in a value and a salt, get a hash).
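To illustrate what such a brute-force search could look like: below is a minimal Python sketch (the sample values are the fictional ones from above, and the candidate list is an assumption; any construction the system might actually use, such as separators, truncation, or double hashing, would need to be added to it) that replays captured salt/token/hash triples against common 64-hex-character digests.

    import hashlib
    import itertools

    # Captured (salt, token, hash) samples; the values below are the fictional
    # ones from the question and will not actually match any real construction.
    samples = [
        ("135407067754316", "aaaa.bbbb.cccc.dddd",
         "d41d8cd98f00b204e9800998ecf8427ed41d8cd98f00b204e9800998ecf8427e"),
    ]

    digests = ["sha256", "sha3_256", "blake2s"]   # all produce 64 hex characters
    orders = {
        "salt+token": lambda s, t: s + t,
        "token+salt": lambda s, t: t + s,
    }

    for name, (label, combine) in itertools.product(digests, orders.items()):
        if all(hashlib.new(name, combine(s, t).encode()).hexdigest() == h
               for s, t, h in samples):
            print("candidate construction:", name, label)

With even a few hundred samples, a single construction that matches all of them is strong evidence; if nothing matches, the candidate list needs extending rather than more samples.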
I'm playing around with system design and have been reading up on URL shorteners. I realize there are many questions around this topic, but I have some specific questions with respect to hashing and the order in which I hash and encode.
Input: https://example.com/owjpojwepofjwpoejfpwjepfojpwejfp/wefoijhwioejfiowef/weoifhwoiehjfiowef
Output: https://example.com/abr4fna
If I run this input through MD5 I get the following: 9e91e9c2a7ce0f0d11b475d2abfb8593. Clearly, this exceeds the length that I want, so I could truncate it to the first 7 characters. The problem is that, to some degree, I can still have a collision, since the prefix of the MD5 is not guaranteed to be unique as the number of URLs generated within the service grows.
I do not want to have to check the database for whether I've already used an ID, as that would increase the number of reads I'm doing in proportion to the number of writes. In addition, there could be concurrency issues as I grow the number of application servers doing the hash generation and storage.
I see people mention base64-encoding the output hash, but what value does this add after the hash? Is it because I grow the number of unique combinations to 64^n, where n is the length of my code, versus hex being only 16^n?
Thanks. Just interested in having this discussion.
edit:
As I understand it, we're doing the encoding purely to ensure we don't get transmission failures if the receiving system has trouble interpreting binary data from the output hash; it's used purely for the sake of display.
By definition, you cannot hash a large domain into a smaller domain without collisions. A hash is useful because it is one-way and would require a computationally infeasible number of tries to deliberately find those collisions. However, with a 7-character output and a large input domain, it will be exceptionally easy to generate collisions even by chance.
You're currently using 7 hexadecimal digits. Each hexadecimal digit represents 4 bits, so you have 28 bits, or 2^28 possible values; that's around 268 million. So if you generate codes for long enough you'll get a collision soon enough. With base64 you'd have 6 bits per character instead (2^6 = 64, hence the name). That means you increase the size by 7 * 2 = 14 bits, i.e. around 16 thousand times as many possible values, but you'd still be pretty far from collision-free.
Actually, for any cryptographic assurance once you take the birthday bound into account, the 16-byte output of MD5 is about the absolute minimum hash size you want for avoiding collisions. Of course, MD5 hasn't been deprecated for nothing; you'd really want to use SHA-256.
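To put rough numbers on both points, here is a minimal Python sketch (the 7-character length is taken from the question) that builds a base64url-truncated code and prints the birthday-bound estimates:

    import base64
    import hashlib
    import math

    url = b"https://example.com/owjpojwepofjwpoejfpwjepfojpwejfp/wefoijhwioejfiowef/weoifhwoiehjfiowef"

    # Base64url over the raw digest: 6 bits per character instead of hex's 4.
    digest = hashlib.sha256(url).digest()
    code = base64.urlsafe_b64encode(digest)[:7].decode()   # 7 chars * 6 bits = 42 bits

    # Birthday bound: roughly 50% collision odds after about sqrt(2^bits) codes.
    for bits in (28, 42):
        print(bits, "bits: ~", int(math.sqrt(2 ** bits)), "codes for 50% collision odds")

That prints roughly 16 thousand codes for the 28-bit hex version versus roughly 2 million for the 42-bit base64 version: better, but as the answer says, still far from collision-free at scale.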
Does a cipher exist that can take in a plaintext that is x characters long and return a ciphertext that is less than x characters long, but that can be decrypted back to the original plaintext? What if the alphabet for the ciphertext is much bigger than the alphabet for the plaintext?
No, this is not possible by using a cipher.
For a generic cipher that accepts any message with equal probability, it is theoretically - and therefore practically - impossible for it to compress the message in any meaningful way. Generally it is not the job of a cipher to perform any kind of compression anyway, and modern ciphers are written to operate on any kind of binary-encoded message.
Due to the pigeonhole principle, there must be at least as many possible ciphertexts as plaintexts. Otherwise one ciphertext would map to multiple plaintexts, and there would be no way to decide which one was the original plaintext.
Generally, if a key is reused, an IV is even required for security. If no IV or other unique value is added, then repeated plaintexts will generate identical ciphertexts, leaking information to an adversary, so commonly the ciphertext is actually expanded. Nowadays we also often add an authentication tag so that an adversary cannot change the ciphertext undetected.
If it doesn't matter for your application that repeated plaintexts produce identical ciphertexts, then there are relatively complex techniques that allow you to "break even". These techniques are called Format Preserving Encryption (FPE). If you're lucky enough that the plaintext size is identical to the block size, then you could also simply perform a single block encryption using any block cipher.
If you want to reduce message size then you need to compress your input plaintext somehow, before encrypting. This can, for instance, be performed by a generic compression routine on either binary or textual data. Often it is also possible to re-encode the message very effectively at the application level (an enum instead of a string) or at the representation level (binary instead of text).
Finally, there is also "cryptographic compression", which is performed in secure hash algorithms such as SHA-256. However, SHA-256 is a one-way hash function, making it impossible to retrieve your message. Hashing is used to create a fingerprint of the message; it doesn't provide confidentiality of the message the way a cipher would.
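A minimal sketch of the compress-then-encrypt approach described above, using Python's zlib and the pyca/cryptography AES-GCM AEAD (the message is a stand-in; note the ciphertext is the compressed size plus a 16-byte authentication tag, so the cipher itself never shrinks anything):

    import os
    import zlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

    message = b"a long, highly redundant plaintext " * 50

    # Compression happens *before* encryption; the cipher never compresses.
    compressed = zlib.compress(message)

    key = AESGCM.generate_key(bit_length=128)
    nonce = os.urandom(12)          # unique per message, as discussed above
    ciphertext = AESGCM(key).encrypt(nonce, compressed, None)

    # Ciphertext = compressed size + 16-byte tag; the nonce travels alongside it.
    print(len(message), len(compressed), len(ciphertext))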
Why is the salt not more than 8 to 16 characters long?
Also, why is it in most cases placed at the front or the end of the password, and not at other positions?
Is this to make it harder for the attacker? Or is it useless?
Because more salt doesn't serve a useful purpose.
The point of salt is to prevent some parallel attacks from working in a reasonable amount of time/memory, and/or drive space. (You can no longer have a table that says Hash A => Password A, because even if you had enough disk space to construct a rainbow table, the salt makes the number of possible entries way beyond feasibility. And you can no longer hash a potential password once and compare it against a bunch of hashes at a time, because the salt is quite likely to be different for each hash.)
16 characters gives you somewhere between 10^16 and 96^16 times as many possibilities, which already fits the definition of "way beyond feasibility". Past a certain point, you're simply increasing your own storage requirements for no significant benefit.
A salt of 8 characters is enough against any real-life dictionary attack imaginable; it makes any precomputed dictionary or rainbow table useless.
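For illustration, a minimal Python sketch of per-record salting (the iteration count is an assumed placeholder to be tuned to your hardware):

    import hashlib
    import os

    password = b"correct horse battery staple"

    # 16 random bytes of salt: 2^128 possibilities, far beyond any rainbow table.
    salt = os.urandom(16)

    # PBKDF2-HMAC-SHA256; the iteration count here is an assumed placeholder.
    digest = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000)

    # Store salt and digest together; the salt is not secret.
    record = salt.hex() + "$" + digest.hex()
    print(record)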
Which is the more secure method of storing passwords? I lack the mathematical background to determine the answer myself.
Let's, for the sake of argument, assume that all passwords and usernames for each of the following methods are randomly generated, known to be exactly six alphanumeric-plus-special characters, and that each method uses the same hashing algorithm and the same number of passes.
1. The standard way: the username stored in plain text, and only the password left to be discovered. Hash(PlaintextPassword + UniqueRecordSalt) = the password stored in the DB.
2. One field recognized as LoginInfo = Hash(Encryption(UserName, Password) + SharedSalt). Neither the username nor the password is ever stored in any other format EVER.
Does the forced cross-attempting of username/password combinations offset the weakness of a shared salt as opposed to a unique per-record salt? This is of course completely IGNORING all effects on usability and focusing entirely on security.
Can anyone point me to any software that would help me answer this question myself, since I lack the cryptographic and mathematical knowledge to arrive at the answer on my own?
Please feel free to move this to a more appropriate forum. I didn't know where else to put it. However, I don't feel that it's a topic irrelevant to programmers doing their everyday jobs.
Please read How to securely hash passwords? first. To summarize:
Never use a single pass of any hashing algorithm.
Never roll your own, which is what your example 2 is (and example 1 as well, if + means concatenation).
Username stored in the clear
Salt generated per user, 8-16 random bytes, stored in the clear (in pure binary, or encoded into Base64 or hex, whatever you like)
Use BCrypt, SCrypt, or PBKDF2
Until some time after the results of the Password Hashing Competition, at least.
Use as high a work factor/cost/iteration count as your CPUs can handle during expected future peak times.
For PBKDF2 in particular, do not ask for more binary output bytes than the native hash produces (see the sketch just after this list). I would say no fewer than 20 binary bytes, regardless.
SHA-1: output = 20 bytes (40 hex digits)
SHA-224: 20 bytes <= output <= 28 bytes (56 hex digits)
SHA-256: 20 bytes <= output <= 32 bytes (64 hex digits)
SHA-384: 20 bytes <= output <= 48 bytes (96 hex digits)
SHA-512: 20 bytes <= output <= 64 bytes (128 hex digits)
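As promised above, a minimal Python sketch of that rule (SHA-256 as the PRF, so dklen is capped at its native 32 bytes; the iteration count is an assumed placeholder):

    import hashlib
    import os

    salt = os.urandom(16)

    # Cap dklen at the native digest size (32 bytes for SHA-256). Asking for
    # more forces the defender to run PBKDF2 multiple times, while an attacker
    # can verify each guess against just the first native-sized block.
    key = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt, 600_000, dklen=32)
    print(key.hex())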
For PBKDF2 in particular, SHA-384 and SHA-512 have a comparative advantage on 64-bit systems for the moment, as the 2014-vintage GPUs many attackers will use have a smaller margin of advantage over your defensive CPUs for 64-bit operations than they would for 32-bit operations.
If you want an example, then perhaps look at PHP source code, in particular the password_hash() and password_verify() functions, per the PHP.net Password Hashing FAQ.
Alternatively, I have a variety of (currently very crude) password hashing examples in my GitHub repositories. Right now it's almost entirely PBKDF2, but I will be adding BCrypt, SCrypt, and so on in the future.
As you say, option 1 is the standard way to store passwords. As long as you use a secure hash function (e.g. NIST recommends PBKDF2) with a unique salt, your passwords are secure, so I would recommend this option.
Option 2 doesn't really make sense. You can't 'undo' a hash function, so why encrypt its input? You would then also have to store the encryption key somewhere, which is a different issue entirely.
Also, what do you mean by a shared salt? If you always use the same salt, that defeats the point of salting your hashes. A unique salt per row is the way to go.
I would say that combining the username and password into a single hash overcomplicates things and limits your options in development, since you can't look up a row in the DB given a username.
Say you want to lock out a user after 5 incorrect password attempts. With a standard plain-text username and hashed password, you can just have a login_attempt_count column and update the row for that user each time their password is entered incorrectly.
If your username and password are hashed together, you have no way of identifying which row to update with a login-attempt count, since a hash of a correct username with a wrong password won't match any stored hash.
I guess you could have some kind of mapping function to get a row_id given a username, but I would say it's just needlessly complicated, and with greater complication comes a bigger chance of security flaws.
As I said, I would just go with option 1. It's the industry-standard way to store passwords, and it's secure enough for pretty much any application (as long as you use a modern secure hash function).
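To make the lockout argument concrete, a hypothetical sketch of option 1's lookup-then-verify flow (the schema, column names, and iteration count are all assumptions for illustration):

    import hashlib
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE users (
        username TEXT PRIMARY KEY, salt BLOB, hash BLOB, login_attempt_count INT)""")

    def verify_login(username: str, password: str) -> bool:
        # Plain-text username makes the row lookup possible at all.
        row = db.execute("SELECT salt, hash, login_attempt_count FROM users "
                         "WHERE username = ?", (username,)).fetchone()
        if row is None:
            return False
        salt, stored, attempts = row
        if attempts >= 5:
            return False                  # locked out
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
        if candidate == stored:
            db.execute("UPDATE users SET login_attempt_count = 0 "
                       "WHERE username = ?", (username,))
            return True
        db.execute("UPDATE users SET login_attempt_count = login_attempt_count + 1 "
                   "WHERE username = ?", (username,))
        return False

Under option 2 the SELECT above has nothing to match on, which is exactly the limitation described in this answer.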
How can SHA encryption create a unique 40-character hash for any string, when there are an infinite number of possible input strings but only a finite number of 40-character hashes?
SHA is not an encryption algorithm, it is a cryptographic hashing algorithm.
Check out this reference at Wikipedia
The simple answer is that it doesn't create a unique 40-character hash for any string; it's inevitable that different strings will have the same hash.
It does try to make sure that close-by strings will have very different hashes. 40 characters is a pretty long hash, so the chance of collision is quite low unless you're doing ridiculous numbers of them.
SHA doesn't create a unique 40 character hash for any string. If you create enough hashes, you'll get a collision (two inputs that hash to the same output) eventually. What makes SHA and other hash functions cryptographically useful is that there's no easy way to find two files that will have the same hash.
To elaborate on jdigital's answer:
Since it's a hash algorithm and not an encryption algorithm, there is no need to reverse the operation. This, in turn, means that the result does not need to be unique; there are (in theory) an infinite number of strings that will result in the same hash. Finding out which ones those are is practically impossible, though.
Hash algorithms like SHA-1 or the SHA-2 family are used as "one-way" hashes in support of password-based authentication. It is not computationally feasible to find a message (password) that hashes to a given value. So, if an attacker obtains the list of hashed passwords, they can't determine the original passwords.
You are correct that, in general, there are an infinite number of messages that hash to a given value. It's still hard to find one though.
It does not guarantee that two strings will have unique 40-character hashes. What it does is make the probability of two strings having conflicting hashes extremely low, and make it very difficult to deliberately create two conflicting documents without just randomly trying inputs.
Generally, a low enough probability of something bad happening is as good as a guarantee that it never will. As long as it's more likely that the world will end when a comet hits it, the chance of a colliding hash isn't generally worth worrying about.
Of course, secure hash algorithms are not perfect. Because they are used in cryptography, they are very valuable things to try and crack. SHA-1, for instance, has been weakened (you can find a collision in 2000 times fewer guesses than just doing random guessing); MD5 has been completely cracked, and security researchers have actually created two certificates which have the same MD5 sum, and got one of them signed by a certificate authority, thus allowing them to use the other one as if it had been signed by the certificate authority. You should not blindly put your faith in cryptographic hashes; once one has been weakened (like SHA-1), it is time to look for a new hash, which is why there is currently a competition to create a new standard hash algorithm.
The function is something like:
hash1 = SHA1(plaintext1)
hash2 = SHA1(plaintext2)
now, hash1 and hash2 can technically be the same. It's a collision. Not common, but possible, and not a problem.
The real magic is in the fact that it's impossible to do this:
plaintext1 = SHA1-REVERSE(hash1)
So you can never reverse it. Handy if you don't want to know what a password is, only that the user gave you the same one both times. Think about it: you have 1024 bytes of input, and you get 160 bits (20 bytes) of output. How can you EVER reconstruct those 1024 bytes from 20? You threw information away. It's just not possible (well, unless you design the algorithm to allow it, I guess...).
Also, if 160 bits isn't enough, use SHA-256 or something with a bigger output. And salt it. Salt is good.
Oh, and as an aside: any website which emails you your password is not hashing its passwords. It's either storing them unencrypted (run, run screaming), or encrypting them with a two-way encryption (DES, AES, public/private key, et al.; trust them a little more).
There are ZERO reasons for a website to be able to email you your password, or to need to store anything but the hash. /rant
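To make the pseudocode above runnable, a minimal Python sketch (plain SHA-1 purely to mirror the example; real password storage should use a salted, slow hash as other answers on this page note):

    import hashlib

    # Deterministic: the same input always yields the same digest.
    h1 = hashlib.sha1(b"password1").hexdigest()
    h2 = hashlib.sha1(b"password1").hexdigest()
    assert h1 == h2

    # There is no SHA1-REVERSE; verifying a login means re-hashing the
    # submitted value and comparing digests.
    def verify(submitted: bytes, stored_digest: str) -> bool:
        return hashlib.sha1(submitted).hexdigest() == stored_digest

    print(verify(b"password1", h1))   # True
    print(verify(b"password2", h1))   # False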
Nice observation. Short answer: it can't, and that leads to collisions, which can be exploited in birthday attacks.
The simple answer is: it doesn't create unique hashes. Look at the pigeonhole principle. It's just so unlikely for there to be a collision that nobody has ever found one.