Will an MD5 hash be unique for a given input? - encryption

Will an MD5 hash calculated for 'Apple' be the same when computed on two different machines?
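MD5 is a deterministic function of the input bytes, so the same bytes give the same digest on any machine. What can differ between machines is how the text is turned into bytes (character encoding, a trailing newline). A minimal Python sketch using the standard `hashlib` module, assuming UTF-8 encoding:

```python
import hashlib

def md5_hex(text: str, encoding: str = "utf-8") -> str:
    """Return the MD5 digest of `text` as 32 lowercase hex characters."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

# Same input bytes -> same digest, whatever machine runs this.
print(md5_hex("Apple") == md5_hex("Apple"))    # True

# A trailing newline (what plain `echo` adds) changes the bytes,
# so the digest is completely different.
print(md5_hex("Apple") == md5_hex("Apple\n"))  # False
```

On Linux, `echo -n Apple | md5sum` hashes the same five bytes, while `echo Apple | md5sum` appends a newline and therefore prints a different value.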

Related

Creating Hash Values with variable Functions

I am curious whether it is possible to create a hash value with an MD5 or SHA algorithm such that, if an attacker had a populated hash table and access to my hash values, they wouldn't be able to guess the original value. By this I mean: if someone populates a table using MD5 or SHA on dictionary words, can I run or generate a hash with a variable parameter, so that an attacker can't use a pre-populated hash table because they would have to guess the variable used to generate the hash?
For instance:
Generate MD5 "OriginalText1" ---> FD823lF8lGSLJlDFDF....
Generate MD5 -variance 10000 "OriginalText1" ---> SLJDFLDSKJ3243243D....
I am not asking for a platform specific answer, but if you happen to provide one for Linux or Python, I would appreciate it. Many thanks.
You probably want to check out salting.
Short explanation:
You store an additional random string and concatenate your "OriginalText1" with it -> "OriginalText1RANDOM". Then your hash function gets the concatenated version as input.
Advantage:
Makes the use of rainbow tables close to impossible
Disadvantage:
You need to store more data
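A minimal sketch of the salting scheme described above, using Python's standard `hashlib`, `hmac` and `os` modules. The 16-byte salt length and the helper names `salted_md5` and `check` are assumptions for illustration; MD5 is kept only to mirror the question.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

def salted_md5(text: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Hash text + salt; generate a fresh random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)  # random per-value salt, stored next to the hash
    digest = hashlib.md5(text.encode("utf-8") + salt).hexdigest()
    return salt, digest

def check(text: str, salt: bytes, expected_hex: str) -> bool:
    """Recompute the salted hash with the stored salt and compare."""
    _, digest = salted_md5(text, salt)
    return hmac.compare_digest(digest, expected_hex)

salt, stored = salted_md5("OriginalText1")
print(stored)                                # differs on every run
print(check("OriginalText1", salt, stored))  # True
```

Because each stored value gets its own random salt, a precomputed table of plain MD5 hashes of dictionary words no longer matches; an attacker would need a separate table for every salt.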

Surrogate keys for distributed systems

I am new to the world of big data, and I would like to ask a question about surrogate keys. In a distributed data system, creating surrogate keys with an MD5 hash seems interesting. At the same time, the MD5 input is a concatenation of attributes. My question:
Is there any reason to prefer an MD5 hash over simply using the concatenation of some attributes as the surrogate key?
Great question. I believe both implementations will work just fine, but from a readability point of view, I think a set of properties makes more sense than a hashed MD5 value.
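A minimal sketch of deriving an MD5-based surrogate key from several attributes in Python. The attribute values and the `|` separator are hypothetical. A separator matters because plain concatenation is ambiguous ("ab" + "c" and "a" + "bc" give the same string). The hashed key always has a fixed length of 32 hex characters, whereas the plain concatenation grows with the attributes.

```python
import hashlib

def surrogate_key(*attributes: object, sep: str = "|") -> str:
    """Join attribute values with a separator and return their MD5 hex digest."""
    joined = sep.join(str(a) for a in attributes)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Hypothetical natural-key attributes of a customer record.
natural_key = ("ACME", "2023-01-15", 42)
print("|".join(str(a) for a in natural_key))  # readable, variable-length key
print(surrogate_key(*natural_key))           # opaque, always 32 hex chars

# Without a separator these two would both hash "abc".
print(surrogate_key("ab", "c") == surrogate_key("a", "bc"))  # False
```

If an attribute value can itself contain the separator, it needs escaping to keep the key unambiguous.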

Decrypting MD5 hashed text when salt is known

Let's say I have the following MD5 hashed password:
bec0932119f0b0dd192c3bb5e5984eec
I know that the original password was salted and hashed, except that instead of a typical salt, it was simply wrapped in 'flag{}' before taking the MD5 sum.
How may I decrypt MD5 in this case?
The other answer is not correct in defining what you are trying to do. Let's begin with the formal definitions of the resistances required of cryptographic hash functions. The following is from Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance by P. Rogaway and T. Shrimpton:
preimage-resistance — for essentially all pre-specified outputs, it is computationally infeasible to find any input which hashes to that output, i.e., to find any preimage x' such that h(x') = y when given any y for which a corresponding input is not known.
2nd-preimage resistance, weak-collision — it is computationally infeasible to find any second input which has the same output as any specified input, i.e., given x, to find a 2nd-preimage x' != x such that h(x) = h(x').
collision resistance, strong-collision — it is computationally infeasible to find any two distinct inputs x, x' which hash to the same output, i.e., such that h(x) = h(x').
Collisions and password cracking are not related. What you are actually trying to do is find a preimage that works with the given hash value and the salt. A generic preimage attack costs O(2^n); for MD5, n = 128, so that is O(2^128). There is a preimage attack on MD5 that is better than the generic one, with a cost of 2^123.4:
Finding Preimages in Full MD5 Faster Than Exhaustive Search
This attack is still beyond the reach of everybody (quantum computers are another story), even supercomputers or the combined power of bitcoin miners.
As pointed out above, MD5 is no longer cryptographically secure since its collision resistance is broken; even SHA-1 is no longer secure.
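A back-of-the-envelope Python calculation puts these preimage costs in perspective. It assumes a rate of 65322.5 MH/s (the single RTX 3090 hashcat figure quoted further down in this answer) and a 365.25-day year; both are illustrative assumptions.

```python
RATE = 65_322.5e6                    # assumed MD5 hashes per second (one RTX 3090)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_for(log2_work: float, rate: float = RATE) -> float:
    """Years needed to evaluate 2**log2_work hashes at `rate` hashes/second."""
    return 2.0 ** log2_work / rate / SECONDS_PER_YEAR

generic = years_for(128)       # generic preimage search
best_known = years_for(123.4)  # cost of the preimage attack cited above
print(f"generic: {generic:.2e} years")       # about 1.7e20 years
print(f"2^123.4: {best_known:.2e} years")
print(f"speed-up: {generic / best_known:.1f}x")
```

The improved attack is only about 2^4.6 (roughly 24) times cheaper than exhaustive search, which does not change the conclusion.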
Hashing is not encryption/decryption. That is a long story; the short answer is that encryption is reversible but hashes are not (consider the pigeonhole principle, and see one-way functions). [A minor note: a block cipher mode of operation like CTR mode doesn't require a PRP; it can work with a PRF and is designed that way.]...
What can you do?
First, use the John the Ripper password cracker.
If not found, then
Build a fast preimage search on MD5, up to some limit according to your budget. hashcat is a very powerful tool you can use for this. Here is a hashcat performance figure:
With hashcat on an Nvidia RTX 3090, one can test 65322.5 MH/s (megahashes per second), which is roughly 2^16 MH/s, or about 2^36 hashes per second. The time, device cost and electricity cost can then be calculated for the target search space, if it is known.
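When the search space is known, the search described above is an exhaustive loop. A deliberately slow, pure-Python sketch, assuming the password is wrapped as `flag{...}` exactly as the question says; the alphabet and maximum length are hypothetical inputs you would choose from what you know about the password. hashcat or John the Ripper do the same thing many orders of magnitude faster; this only shows the logic and the size of the space.

```python
import hashlib
import itertools
import string
from typing import Optional

def wrapped_md5(candidate: str) -> str:
    """MD5 of the candidate wrapped as flag{...}, as described in the question."""
    return hashlib.md5(f"flag{{{candidate}}}".encode("utf-8")).hexdigest()

def brute_force(target_hex: str, alphabet: str, max_len: int) -> Optional[str]:
    """Try every string over `alphabet` of length 1..max_len."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if wrapped_md5(candidate) == target_hex:
                return candidate
    return None

def search_space(alphabet_size: int, max_len: int) -> int:
    """Number of candidates of length 1..max_len."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))

# Demo on a tiny, self-made target (not the hash from the question).
target = wrapped_md5("ab1")
print(brute_force(target, string.ascii_lowercase + string.digits, 3))  # ab1

# Rough time at the quoted 65322.5 MH/s for up to 8 chars from [a-zA-Z0-9]:
space = search_space(62, 8)
print(space / 65_322.5e6 / 3600, "hours")
```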
MD5 is a hash function; you cannot really decrypt the result (please look up the difference between hashing and encryption).
However, you may try to find a collision: an input giving the same hash. With some probability, it will match the original input. Cryptographic hash functions are designed so that finding a collision is very difficult (infeasible); however, this no longer holds for MD5 (which is why MD5 is considered unsafe to use).
You may check the resources in Vlastimil Klima's "Tunnels in Hash Functions: MD5 Collisions Within a Minute"; more references and tools related to the latest tunnel attack are linked there.

Can rainbow table analysis reveal simple cleartext passwords from an md5 value?

I know that a hash value (for example, an md5 value) can correspond to multiple inputs, like '^&#%we242eweqweqweqwedfdfdfee2' and '%$#%3423efffe435%%^#'.
But since most users actually use very simple passwords, can those md5 values only correspond to a limited set of simple cleartext passwords?
I mean, if 'cfcd208495d565ef66e7dff9f98764da' only corresponds to 30 simple values like '0', 'tom123', 'goodcar', then a hacker who has obtained md5 data from a database could easily figure out the relationship between a username and its cleartext password, and could then use this pair of values to hack the same account on other websites.
So, does any given md5 value correspond to only a limited set of simple values?
PS: I know I can add a salt or use a better method like sha512 or sha3, but I'm very curious about the question above.
The answer depends on what your understanding of "simple values" is. Generally speaking, cryptographic hash functions try to emulate a random mapping from arbitrary-length inputs to fixed-length outputs. The most fundamental security notion for these cryptographic hashes is so-called collision resistance, i.e. it is computationally infeasible to find a pair of input messages that hash to the same fixed-length output. As you have demonstrated, this notion is now broken for md5, since one can construct special messages that do indeed collide under md5.
But since you were talking about "simple values", I assume you exclude such artificially crafted messages, and then we can still view md5 as a random mapping.
For such a random mapping, the chance of a collision depends only on the size of the input domain. For example, if you are looking at all 6-character passwords over the charset {a-z, A-Z, 0-9}, you can be practically certain that there will be no collision (and you can even try it yourself, as Chris has pointed out). But if you expand the length to 25 characters over the same charset, a collision is guaranteed, because there are now more possible passwords (62^25, about 2^149) than available hash values (2^128).
Estimating the chance of a collision is called the birthday problem. As a simple estimate, if you have k possible output values, you can expect a collision once you reach about sqrt(k) input values. So for md5, with k = 2^128, you expect a collision once your set of inputs approaches 2^64.
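The birthday estimate can be checked numerically. A short Python sketch using the standard approximation p ≈ 1 - exp(-n(n-1)/(2k)) for n random inputs and k = 2^128 outputs; treating md5 as a random mapping is the assumption made in the answer above.

```python
import math

K = 2 ** 128  # number of possible md5 outputs

def collision_probability(n: int, k: int = K) -> float:
    """Approximate chance that n random inputs contain at least one collision."""
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2 * k))

six_chars = 62 ** 6  # all 6-character passwords over [a-zA-Z0-9]
print(collision_probability(six_chars))  # vanishingly small, about 5e-18
print(collision_probability(2 ** 64))    # about 0.39
print(62 ** 25 > K)                      # True: pigeonhole guarantees a collision
```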

MD5 purpose or uses

If we can't decode an MD5 hash string, then what is the purpose of MD5, and where can we use it?
To store data safely in a database, for example.
If you store a password as its md5 hash, you can hash the password entered in a form and compare the two: it is still the same password, but it can't be read in clear text in the database.
For example:
password = 123
md5(123) === "202cb962ac59075b964b07152d234b70"
If you try to log in and enter 123 as your password, its md5 will be the same, so you can compare the two. But if your database is hacked, the hacker cannot read the password in clear text, only the hashed value.
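The login flow described above as a minimal Python sketch. `stored_hash` stands in for the value read from the database (a hypothetical name), and `hmac.compare_digest` compares the hex strings in constant time. Unsalted MD5 is used only to match the example.

```python
import hashlib
import hmac

def md5_hex(password: str) -> str:
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Value stored in the database at registration time: md5("123").
stored_hash = "202cb962ac59075b964b07152d234b70"

def login(entered_password: str) -> bool:
    """Hash the password typed into the form and compare with the stored hash."""
    return hmac.compare_digest(md5_hex(entered_password), stored_hash)

print(login("123"))   # True
print(login("1234"))  # False
```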
An encrypted (decryptable) file is always at least as big as the original file, whereas a hash is much, much smaller.
This allows us to create a hash of a file that can prove the integrity of the file, without storing the file itself.
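The size difference is easy to see: whatever the input length, an MD5 digest is always 16 bytes (32 hex characters). A small Python check with arbitrary example inputs:

```python
import hashlib

# Inputs of very different sizes all produce a 16-byte (128-bit) digest.
for size in (0, 1_000, 10_000_000):
    digest = hashlib.md5(b"x" * size).digest()
    print(f"{size:>10} bytes in -> {len(digest)} bytes out")
```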
There are many reasons not to store the file encrypted or in plain text:
As soon as an encrypted file falls into the wrong hands, someone could try to decrypt it. There's no chance of that happening with a hash.
You simply don't need the file yourself; maybe you're sending it to someone, and that person can verify its integrity using the hash.
It allows you to determine whether the data you have (e.g., an entered password) is the same as some other data which is secret (e.g., the correct password) without requiring access to the secret data. In other words, it can be used to determine "is this user-entered password correct?" while also keeping the correct password secret. (Note that there are stronger hashing methods out there which should be used instead of md5 for this purpose these days, such as sha* and bcrypt. With modern hardware, it's fairly easy to throw millions of passwords per second at an md5 hash until you find one that matches the correct password.)
It allows you to verify the integrity of a transmitted file by comparing the md5 hash of the original file with the md5 hash of the data that was received. If the hashes are different, the received data was not the same as the sent data, so you know to re-send it; if they're the same, you can be reasonably certain that the sent and received data are identical.
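A minimal Python sketch of that check: hash the received file in chunks (so large files need not fit in memory) and compare the result with the hash published by the sender. The 64 KiB chunk size is an arbitrary choice, and the temporary file and "published" hash in the demo are stand-ins.

```python
import hashlib
import hmac
import os
import tempfile

def md5_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, published_hex: str) -> bool:
    """True if the received file hashes to the published value."""
    return hmac.compare_digest(md5_of_file(path), published_hex.lower())

# Demo: a temporary file stands in for the received download, and the
# "published" hash is computed from the same bytes.
payload = b"example payload " * 10_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
published = hashlib.md5(payload).hexdigest()
ok = verify(tmp.name, published)   # True: data arrived intact
bad = verify(tmp.name, "0" * 32)   # False: mismatch, so re-send
os.remove(tmp.name)
print(ok, bad)
```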
Good hash functions like MD5 can be used for identification (see this question). Under certain conditions, you can assume that equal hashes mean equal data blocks.
MD5 is mainly used to maintain the integrity of files sent from one machine to another, i.e. to detect whether a third party (a man in the middle) has modified the contents of the files.
A basic example: when you download a file from a server, the server has already calculated its MD5. When the file arrives, you calculate the MD5 again; if the hashes match, the file has not been corrupted or modified by a third party.
MD5 is a hash function, and there are others like it, such as SHA, PBKDF, bcrypt and scrypt. I really prefer scrypt. Hash functions are used for integrity purposes, to detect any manipulation that may have occurred during transmission of the actual message. The receiver can tell whether the received message has changed by checking the hash value of the message.
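Since scrypt is mentioned as the preferred choice: Python's standard library exposes it as `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later). A minimal password-hashing sketch; the cost parameters n=2**14, r=8, p=1, the 16-byte salt and the 32-byte key length are common example values chosen here as assumptions, not a recommendation from this answer.

```python
import hashlib
import hmac
import os
from typing import Tuple

N, R, P = 2 ** 14, 8, 1  # CPU/memory cost, block size, parallelism (assumed values)

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, key) for storage; the salt is random per password."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt, key

def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return hmac.compare_digest(candidate, key)

salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))  # True
print(verify_password("wrong guess", salt, key))    # False
```

Unlike a single MD5 call, each scrypt evaluation is deliberately expensive in both time and memory, which slows down the kind of GPU guessing described in the answers above.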
These functions have three security properties:
1) It is difficult for someone to recover the actual message when they only have h(m).
2) Given a message m and its hash value, it is difficult to find another message with the same hash value.
3) Finally, it is difficult to find two different messages m1, m2 with the same hash value.
It is also important to know that hash function algorithms are public and that it is very easy to compute the hash value of a message. Moreover, hashes are "one-way" functions, meaning that it is hard to find the message given its hash. The actual security is thus based on that property.
