I was asked to implement the AES algorithm for a security class. While implementing it, I couldn't find an answer on how to accept a key like a password, of arbitrary length, from the user and convert it to a 128-, 192- or 256-bit key. What should I do?
As mentioned in the comments, this is typically done with a key derivation function (KDF). There are two main types of key derivation functions that are used.
The first kind is used when you have some type of cryptographic material already, oftentimes some variant of a key exchange (usually, Diffie-Hellman). In this case, the key material is assumed to be strong and you just want to distill it and generate potentially multiple keys from it. HKDF, which is used in TLS 1.3, and the TLS 1.2 PRF are good examples of this. They are generally wrappers around HMAC, and they're pretty fast.
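As a rough illustration of the "wrapper around HMAC" point, here is a minimal sketch of HKDF's extract-then-expand construction (RFC 5869) using only Python's standard library. SHA-256 is an assumed hash choice, and `shared_secret` and the `info` labels are placeholders invented for the example; in real code a maintained library implementation would be the safer choice.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Distill the input key material into one fixed-size pseudorandom key."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869: an absent salt is a string of zeros
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Stretch the pseudorandom key into `length` bytes bound to `info`."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        # T(n) = HMAC(PRK, T(n-1) || info || n)
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


# Placeholder standing in for the output of a key exchange.
shared_secret = bytes(range(32))
enc_key = hkdf(shared_secret, b"", b"encryption", 32)
mac_key = hkdf(shared_secret, b"", b"authentication", 32)
```

Different `info` labels yield independent keys from the same secret, which is how one exchange can feed several keys.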
The second kind is used when you have a password. Because, in general, people are bad at coming up with and remembering passwords with sufficient entropy, we use a KDF that is specifically iterated so as to be slow, such as the older PBKDF2 or the newer scrypt and Argon2. These options are designed to use a unique salt and be iterated many times so that users who pick poor passwords are afforded at least some level of protection against compromise, and the newer options are designed to be expensive in memory to prevent efficient attacks on GPUs.
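For the password case, a minimal sketch using PBKDF2 from Python's standard library might look like the following. The function name `derive_aes_key`, the 16-byte salt, and the iteration count are assumptions made for the example, not recommendations from the answer; scrypt or Argon2 would follow the same pattern with different parameters.

```python
import hashlib
import os


def derive_aes_key(password: str, salt: bytes, key_bits: int = 256,
                   iterations: int = 600_000) -> bytes:
    """Turn an arbitrary-length password into a 128-, 192- or 256-bit key."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    return hashlib.pbkdf2_hmac(
        "sha256",                  # hash used inside HMAC
        password.encode("utf-8"),  # password of any length
        salt,                      # unique per password, stored with the ciphertext
        iterations,                # deliberately slow
        dklen=key_bits // 8,       # output length in bytes
    )


salt = os.urandom(16)  # random salt; it is not secret
key = derive_aes_key("correct horse battery staple", salt, key_bits=256)
```

The salt and iteration count have to be kept alongside the ciphertext so that the same key can be derived again for decryption.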
Related
Suppose you have some AES-ECB encrypted hash and you want to decrypt it. You are also given a bunch of example plaintexts and hashes. For example:
I want: unknown_plaintext for the hash given_hash
and I have a bunch of known_plaintexts and hashes that have been encrypted with the same secret key. None of them (obviously) is identical to the given hash.
Please let me know if you can help. This is not for malicious purposes, just to learn how cryptography and AES systems work.
This is not computationally feasible. I.e., you can't do this.
Modern encryption algorithms like AES are resistant to known-plaintext attacks, which is what you are describing.
There has been some past success in a category called adaptive chosen-ciphertext attacks. Often these exploit an "oracle." In this scenario, an attacker can decrypt a single message by repeatedly asking the victim whether it can successfully decrypt a guess generated by the attacker. By being smart about choosing successive guesses, the attacker could decrypt the message with a million tries or so, which is a relatively small number. But even in this scenario, the attacker can't recover the key.
As an aside, ciphers don't generate hashes. They output ciphertext. Hash functions (aka message digests) generate hashes.
For any respectable block cipher (and AES is a respectable block cipher), the only way to decrypt a ciphertext block (not "hash") is to know the key, and the only way to find the key from a bunch of plaintext-ciphertext pairs is by guessing a key and seeing if it maps a known plaintext onto the corresponding ciphertext. If you have some knowledge of how the key was chosen (e.g., SHA-256 of a pet's name), this might work; but if the key was randomly selected from the set of all possible AES keys, the number of guesses required to produce a significant probability of success is such a large number that you wander off into age-of-the-universe handwaving.
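To put a number on the "age-of-the-universe" remark, here is a back-of-the-envelope sketch. The guessing rate of 10^12 keys per second is an assumed figure for a very well-resourced attacker, and the age of the universe is rounded to 1.38 × 10^10 years.

```python
GUESSES_PER_SECOND = 10**12        # assumed attacker speed
SECONDS_PER_YEAR = 365 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10    # rounded


def years_to_search_half(key_bits: int, rate: int = GUESSES_PER_SECOND) -> float:
    """Years needed to try half the key space, the expected cost of a brute force."""
    return 2 ** (key_bits - 1) / rate / SECONDS_PER_YEAR


for bits in (128, 192, 256):
    years = years_to_search_half(bits)
    print(f"AES-{bits}: about {years:.1e} years, "
          f"{years / AGE_OF_UNIVERSE_YEARS:.1e} times the age of the universe")
```

Under these assumptions even AES-128 comes out at roughly 5 × 10^18 years, hundreds of millions of times the age of the universe.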
If you know that all the encrypted hashes are encrypted with the same key you can first try to find that key using your pairs of plaintexts and encrypted hashes. The most obvious way to do that would be to just take one of your plaintexts, first hash it and then try out all the possible keys to encrypt it until it matches the encrypted hash that you know. If the key you're looking for is just one of the many many possible AES keys this is set to fail, because it would take way too long to try all the keys.
Assuming you were able to recover the AES key somehow, you can decrypt that one hash you don't have a plaintext for and start looking for the plaintext.
The more you know about the plaintext, the easier this guesswork would be. You could just throw the decrypted hash into Google and see what it spits out, query databases of known hashes, or make guesses in the most educated way possible. This step will again fail if the hash is strong enough and the plaintext is random enough.
As other people have indicated, modern encryption algorithms are specifically designed to resist this kind of attack. Even a rather weak encryption algorithm like the Tiny Encryption Algorithm would require well over 8 million chosen plaintexts to do anything like this. Better algorithms like AES, Blowfish, etc. require vastly more than that.
As of right now, there are no practical attacks on AES.
If you're interested in learning about cryptography, the older Data Encryption Standard (DES) may actually be a more interesting place to start than AES; there's a lot of literature available about it and it was already broken (the code to do so is still freely available online - studying it is actually really useful).
If I use different encryption methods but provide no indication in the ciphertext output of which method I use (for example, attaching an unencrypted header to the ciphertext) does that make the ciphertext harder to decrypt than just the difficulty implied by, for example, the keylength? The lack of information as to what encryption protocol and parameters to use should add difficulty by requiring a potential decrypter to try some or all the various encryption methods and parameters.
Well, in general you should not rely on keeping the algorithm or protocol itself secret. That information is the same for every key you use, so you should consider it public knowledge. OK, so that's that out of the way.
Now say you use 16 methods and you have somehow created a protocol that keeps the encryption method confidential (let's say by encrypting a single block half filled with random data and a magic value, and decrypting blocks at the receiver until you find the correct one). If you wanted to brute force the key, you would now need 16 times as many tries. In other words, you have just increased the key length by 4 bits, as 2 ^ 4 = 16. So say you have AES-256 equivalent ciphers. You would now have the equivalent of 256 + 4 = 260 bits of encryption. That hardly registers, especially since AES-256 is already considered safe against attacks using a quantum computer.
Now those 4 bits come at a very high price: a highly complex protocol using multiple ciphers. Each of these ciphers has its own weaknesses. None of them will have received as much scrutiny as AES, and if one breaks you are in trouble (at least for 1 out of 16 encrypted messages). Speeds will differ, parameters and block sizes will differ, platforms may not support them all...
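The arithmetic generalizes: hiding which of N equally likely methods was used adds log2(N) bits to a brute-force search. A small sketch, with function names made up for the illustration:

```python
import math


def extra_key_bits(num_methods: int) -> float:
    """Bits of brute-force work added by hiding which method was used."""
    return math.log2(num_methods)


def effective_key_bits(key_bits: int, num_methods: int) -> float:
    return key_bits + extra_key_bits(num_methods)


print(effective_key_bits(256, 16))  # 260.0
```

Even a thousand hidden methods would add only about 10 bits, far less than the jump from AES-128 to AES-256.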
All in all, just use AES-256 if you are not willing to accept AES-128. If you must, encrypt things twice using AES and SERPENT. Adding an authentication tag over IV & ciphertext probably makes much more of a difference though. See this answer by Thomas over at the security site.
Try GCM or EAX mode of operation. Much more useful.
Looking through the various encrypting and hashing algorithms they seem to focus on computation time vs security, and seem to target encrypting/hashing passwords.
In my scenario I am trying to encrypt a string that will be provided to the end user. Later I will provide the unencrypted version, which they can match up to the encrypted version to verify a certain action (a la a provably fair system).
I thought of using SHA-512: providing the hash and then later on providing the unencrypted string, so that the end user can match up the hash and the unencrypted string.
However, I recently discovered bcrypt, which certain people have said is a better choice. For me it does not matter how long it takes to generate the hash, so for my circumstances is it best to use bcrypt with an ungodly number of rounds to make my string harder to crack, or am I just going about this the wrong way?
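The scheme described in the question (publish a hash now, reveal the string later) can be sketched roughly as follows with SHA-512. The random nonce is an assumption added for the illustration: with enough randomness in the revealed string, guessing it from the hash is infeasible even with a fast hash. The names `commit` and `verify` and the `nonce:outcome` layout are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(outcome: str) -> tuple[str, str]:
    """Return (commitment to publish now, string to reveal later)."""
    nonce = secrets.token_hex(32)      # 256 bits of randomness
    revealed = f"{nonce}:{outcome}"
    commitment = hashlib.sha512(revealed.encode("utf-8")).hexdigest()
    return commitment, revealed


def verify(commitment: str, revealed: str) -> bool:
    """Check that the revealed string matches the earlier commitment."""
    digest = hashlib.sha512(revealed.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, commitment)


commitment, revealed = commit("roll=4")  # hand `commitment` to the end user first
assert verify(commitment, revealed)      # later, hand over `revealed`
```

The end user only needs the same hash function to run the check themselves.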
I am looking for a way of obtaining the key from this set of information: I know for a fact that we are using 16-byte blocks with CBC, and I have the first 16 bytes of plaintext and ciphertext, along with the IV that was used.
At the moment I can test whether a key is correct by comparing the output, but I cannot brute-force 16-character keys for obvious reasons. From reading other posts, my understanding was that with the data I have it might be possible to get the key.
Any hint?
What you are trying to do is called a "known plaintext attack": you have both the cyphertext and the plaintext, and all that you lack is the key used. Unfortunately, all modern cyphers are designed to resist such attacks. Unless you have extremely sophisticated mathematical skills, you will not be able to find the key this way. AES is resistant to a known plaintext attack.
You will have to try some other method of determining the key. Has the key owner left it written on a piece of paper somewhere?
Note that if AES has been applied as it should be, then you cannot find the key. However, judging by the number of incorrect implementations on Stack Overflow, the key may well be a password, or a simple SHA-256 of a string. If you can obtain information about how the key was generated, applied or stored, you may be able to get around even AES-256.
Otherwise your only attack vector is breaking AES or brute forcing the key. In that case I wish you good luck, because brute forcing a 256-bit key is completely out of the question, even with a quantum computer. Unless vulnerabilities are found, of course: AES is not provably secure, after all, so there may be a vulnerability.
What is the difference between Obfuscation, Hashing, and Encryption?
Here is my understanding:
Hashing is a one-way algorithm; cannot be reversed
Obfuscation is similar to encryption but doesn't require any "secret" to understand (ROT13 is one example)
Encryption is reversible but a "secret" is required to do so
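The ROT13 example can be tried with Python's standard `codecs` module. This short sketch shows that no secret is involved: anyone who knows the method can undo it. The message text is made up for the example.

```python
import codecs

message = "Attack at dawn"
obfuscated = codecs.encode(message, "rot_13")    # shift each letter by 13
recovered = codecs.decode(obfuscated, "rot_13")  # shifting by 13 again undoes it

print(obfuscated)  # Nggnpx ng qnja
print(recovered)   # Attack at dawn
```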
Hashing is a technique of creating semi-unique keys based on larger pieces of data. In a given hash you will eventually have "collisions" (e.g. two different pieces of data calculating to the same hash value) and when you do, you typically create a larger hash key size.
Obfuscation generally involves trying to remove helpful clues (i.e. meaningful variable/function names), removing whitespace to make things hard to read, and generally doing things in convoluted ways to make following what's going on difficult. It provides no serious level of security like "true" encryption would.
Encryption can follow several models, one of which is the "secret" method, called private key encryption where both parties have a secret key. Public key encryption uses a shared one-way key to encrypt and a private recipient key to decrypt. With public key, only the recipient needs to have the secret.
That's a high level explanation. I'll try to refine them:
Hashing - in a perfect world, it's a random oracle. For the same input X, you always receive the same output Y, which is in NO WAY related to X. This is mathematically impossible (or at least unproven to be possible). The closest we get is one-way functions: H(X) = Y, where computing H^-1(Y) = X is so difficult that you're better off trying to brute force a Z such that H(Z) = Y.
Obfuscation (my opinion) - Any function f, such that f(a) = b, where you rely on f being secret. f may be a hash function, but the "obfuscation" part implies security through obscurity. If you had never seen ROT13 before, it would be obfuscation.
Encryption - Ek(X) = Y, Dl(Y) = X where E is known to everyone. k and l are keys, they may be the same (in symmetric, they are the same). Y is the ciphertext, X is the plaintext.
A hash is a one way algorithm used to compare an input with a reference without compromising the reference.
It is commonly used in logins to compare passwords, and you can also find it on your receipt if you shop using a credit card. There you will find your credit card number with some digits hidden; this way you can prove with high probability that your card was used to buy the stuff, while someone searching through your garbage won't be able to find the number of your card.
A very naive and simple hash is "The first 3 letters of a string".
That means the hash of "abcdefg" will be "abc". This function obviously cannot be reversed, which is the entire purpose of a hash. However, note that "abcxyz" will have exactly the same hash; this is called a collision. So again: a hash only proves with a certain probability that the two compared values are the same.
Another very naive and simple hash is the 5-modulus of a number, here you will see that 6,11,16 etc.. will all have the same hash: 1.
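Both toy hashes are easy to try out. A small sketch (the function names are made up) that shows the collisions described above:

```python
def first_three(text: str) -> str:
    """Toy hash: the first three letters of a string."""
    return text[:3]


def mod_five(number: int) -> int:
    """Toy hash: the remainder after dividing by 5."""
    return number % 5


print(first_three("abcdefg"), first_three("abcxyz"))  # abc abc
print([mod_five(n) for n in (6, 11, 16)])             # [1, 1, 1]
```

Neither function can be reversed, since many inputs share each output, but collisions are also trivial to find, which is what real hash functions are designed to make hard.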
Modern hash algorithms are designed to keep the number of collisions as low as possible, but they can never be completely avoided. A rule of thumb is: the longer your hash is, the fewer collisions it has.
Obfuscation in cryptography is encoding the input data before it is hashed or encrypted.
This makes brute force attacks less feasible, as it gets harder to determine the correct cleartext.
That's not a bad high-level description. Here are some additional considerations:
Hashing typically reduces a large amount of data to a much smaller size. This is useful for verifying the contents of a file without having to have two copies to compare, for example.
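As an illustration of verifying a file without keeping a second copy, here is a minimal sketch using SHA-256 from Python's standard library. The helper name `file_digest_hex` and the chunk size are choices made for the example.

```python
import hashlib


def file_digest_hex(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the 64-character result against a published digest shows whether the contents match, however large the file is.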
Encryption involves storing some secret data, and the security of the secret data depends on keeping a separate "key" safe from the bad guys.
Obfuscation is hiding some information without a separate key (or with a fixed key). In this case, keeping the method a secret is how you keep the data safe.
From this, you can see how a hash algorithm might be useful for digital signatures and content validation, how encryption is used to secure your files and network connections, and why obfuscation is used for Digital Rights Management.
This is how I've always looked at it.
Hashing is deriving a value from another, using a set algorithm. Depending on the algo used, this may be one way, may not be.
Obfuscating is making something harder to read by symbol replacement.
Encryption is like hashing, except the value is dependent on another value you provide the algorithm.
A brief answer:
Hashing - creating a check field on some data (to detect when data is modified). This is a one way function and the original data cannot be derived from the hash. Typical standards for this are SHA-1, SHA256 etc.
Obfuscation - modify your data/code to confuse anyone else (no real protection). This may or may not lose some of the original data. There are no real standards for this.
Encryption - using a key to transform data so that only those with the correct key can understand it. The encrypted data can be decrypted to obtain the original data. Typical standards are DES, TDES, AES, RSA etc.
All fine, except obfuscation is not really similar to encryption - sometimes it doesn't even involve ciphers as simple as ROT13.
Hashing is a one-way task of creating one value from another. The algorithm should try to create a value that is as short and as unique as possible.
Obfuscation is making something unreadable without changing semantics. It involves value transformation, removing whitespace, etc. Some forms of obfuscation can also be one-way, so it's impossible to get the starting value.
Encryption is two-way, and there's always some decryption working the other way around.
So, yes, you are mostly correct.
Obfuscation is hiding or making something harder to understand.
Hashing takes an input, runs it through a function, and generates an output that can be a reference to the input. It is not necessarily unique, a function can generate the same output for different inputs.
Encryption transforms the input into an output in a unique manner. There is a one-to-one correlation so there is no potential loss of data or confusion - the output can always be transformed back to the input with no ambiguity.
Obfuscation is merely making something harder to understand by introducing techniques to confuse someone. Code obfuscators usually do this by renaming things to remove anything meaningful from variable or method names. It's not similar to encryption in that nothing has to be decrypted to be used.
Typically, the difference between hashing and encryption is that hashing just applies a formula to translate the data into another form, with no key and no way back, whereas encryption uses a formula that requires key(s) to encrypt and decrypt. Neither should be confused with encoding: Base64 is an encoding, so anyone can decode Base64 data, while MD5 is a hash, so there is no key involved and an MD5 digest cannot be decrypted at all.
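For reference, this is how the two behave in Python's standard library: Base64 round-trips with no key, and MD5 produces a fixed-size digest with no key and no inverse function. The input `b"hello"` is just an example value.

```python
import base64
import hashlib

data = b"hello"

encoded = base64.b64encode(data)     # b"aGVsbG8="
decoded = base64.b64decode(encoded)  # anyone can reverse it, no key needed

digest = hashlib.md5(data).hexdigest()  # 5d41402abc4b2a76b9719d911017c592
# There is no hashlib call that turns the digest back into b"hello".

print(encoded, decoded, digest)
```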