How collision resistant are encryption algorithms?

Given a ciphertext produced by a (symmetric or asymmetric) encryption algorithm from a plaintext/key pair, how hard is it to find a different plaintext/key pair that yields the same ciphertext?
And how hard is it to find two plaintext/key pairs that lead to the same ciphertext?
What led to this question is another question, which might turn out to have nothing to do with the ones above:
If you have a ciphertext and a key and decrypt it using some decryption routine, the routine usually tells you whether the key was correct. But how does it know? Does it look for some pattern in the resulting plaintext that indicates the decryption was successful? Does there exist another key that results in a different plaintext which contains the pattern and is also reported as "valid" by the routine?
Follow-up question inspired by answers and comments:
If the allowed plaintext/key pairs were restricted in one (or both) of the following ways:
1) The plaintext starts with the KCV (Key check value) of the key.
2) The plaintext starts with a hash value of some plaintext/key combination
Would this make collision finding infeasible? Is it even clear that such a plaintext/key pair exists?

The answer to your question, the way you phrased it, is that there is no collision resistance whatsoever.
Symmetric case
Let's presume you got a plain text PT with a length that is a multiple of the block length of the underlying block cipher. You generate a random IV and encrypt the plain text using a key K, CBC mode and no padding.
Producing a plain text PT' and key K' that produces the same cipher text CT is easy. Simply select K' at random, decrypt CT using key K' and IV, and you get your colliding PT'.
This gets a bit more complicated if you also use padding, but it is still possible. If you use PKCS#5/7 padding, just keep generating keys until you find one such that the last octet of your decrypted text PT' is 0x01. Each key succeeds with probability 1/256, so this will take on average about 256 attempts.
To make such collision finding infeasible, you have to use a message authentication code (MAC).
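The symmetric argument above can be sketched with the standard library alone. This is a minimal sketch, not real CBC: `keystream_xor` is a hypothetical toy cipher (XOR with a SHAKE-256 keystream) standing in for the block cipher, and `tag_is_valid` is an illustrative encrypt-then-MAC check using HMAC-SHA-256.

```python
import hashlib
import hmac
import secrets


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy cipher (assumption, illustration only): XOR with a SHAKE-256 keystream.

    Encryption and decryption are the same operation.
    """
    stream = hashlib.shake_256(key + iv).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


iv = secrets.token_bytes(16)
key = secrets.token_bytes(16)
plaintext = b"attack at dawn!!"
ciphertext = keystream_xor(key, iv, plaintext)

# "Collision": decrypt under an unrelated random key, then re-encrypt.
other_key = secrets.token_bytes(16)
other_plaintext = keystream_xor(other_key, iv, ciphertext)
assert keystream_xor(other_key, iv, other_plaintext) == ciphertext

# Encrypt-then-MAC: the tag binds IV and ciphertext to the MAC key.
mac_key = secrets.token_bytes(32)
tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()


def tag_is_valid(candidate_mac_key: bytes) -> bool:
    """Recompute the tag under a candidate key and compare in constant time."""
    expected = hmac.new(candidate_mac_key, iv + ciphertext, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

Without the tag, any key "works" for some plaintext; with the tag, a receiver holding a different MAC key rejects the message.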
Asymmetric case
Something similar applies to RSA public key encryption. If you use no padding (which obviously isn't recommended and possibly not even supported by most cryptographic libraries), and use a public key (N,E) for encrypting PT into CT, simply generate a second key pair (N',E',D') such that N' > N; then PT' = CT^D' (mod N') will encrypt into CT under (N',E').
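Here is a small worked instance of the unpadded RSA case. The primes are tiny and chosen purely for illustration (they are my assumption, not from the answer); `N2`, `E2`, `D2` play the roles of N', E', D'.

```python
# Original key pair (tiny primes, illustration only).
N, E = 61 * 53, 17                  # N = 3233
PT = 65
CT = pow(PT, E, N)                  # textbook RSA, no padding

# Second key pair with a larger modulus, so that CT < N2.
P2, Q2 = 67, 71
N2 = P2 * Q2                        # 4757 > 3233
PHI2 = (P2 - 1) * (Q2 - 1)          # 4620
E2 = 17                             # coprime to 4620
D2 = pow(E2, -1, PHI2)              # modular inverse (Python 3.8+)

# "Decrypt" CT under the second private key to get a colliding plaintext.
PT2 = pow(CT, D2, N2)

# Encrypting PT2 under the second public key reproduces the same ciphertext.
assert pow(PT2, E2, N2) == CT
```

The condition N' > N matters because CT must be a valid residue modulo N' for the round trip to give back exactly CT.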
If you are using PKCS#1 v1.5 padding for your RSA encryption, the most significant octet after the RSA private key operation has to be 0x02, which it will be with a probability of approximately one in 256. Furthermore the first 0x00 valued octet has to occur no sooner than at index 9, which will happen with a high probability (approximately 0.97). Hence you will have to generate on average some 264 random RSA key pairs of the same bit size before you hit one that for some plain text could have produced the same cipher text.
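The figures 0.97 and 264 can be checked with a few lines of arithmetic. This sketch assumes the decrypted octets behave like independent uniform random bytes, and it ignores the (very likely) requirement that a 0x00 separator appears somewhere later in the block.

```python
# The type octet must be 0x02.
p_type_octet = 1 / 256

# The eight octets after it must all be non-zero padding.
p_eight_nonzero = (255 / 256) ** 8

# Expected number of random key pairs until both conditions hold.
expected_key_pairs = 1 / (p_type_octet * p_eight_nonzero)

print(f"P(eight non-zero octets) = {p_eight_nonzero:.3f}")
print(f"expected key pairs       = {expected_key_pairs:.0f}")
```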
If you are using RSA-OAEP padding, however, the private key decryption will fail (except with negligible probability) unless the cipher text was generated using the corresponding public key.

If you're encrypting some plaintext (of length n bits), then there are 2^n unique input strings, and each must result in a unique ciphertext (otherwise it wouldn't be reversible). Therefore, all possible strings of length n are valid ciphertexts. But this is true for all keys. Therefore, for any given ciphertext, there are 2^k ways of obtaining it, each with a different key of length k.
Therefore, to answer your first question: very easy! Just pick an arbitrary key, and "decrypt" the ciphertext. You will get the plaintext that matches the key.
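To make the counting argument concrete, here is a sketch with a made-up 8-bit keyed permutation (`toy_encrypt` and `toy_decrypt` are hypothetical and have no security value). With n = k = 8, every ciphertext has exactly 2^8 (key, plaintext) preimages, one per key.

```python
BLOCK_BITS = 8
SIZE = 1 << BLOCK_BITS


def toy_encrypt(key: int, block: int) -> int:
    """Toy 8-bit keyed permutation (hypothetical, not a real cipher)."""
    return ((block ^ key) * 5 + key) % SIZE


def toy_decrypt(key: int, block: int) -> int:
    """Inverse of toy_encrypt: undo the addition, the multiplication, the XOR."""
    inv5 = pow(5, -1, SIZE)  # 5 is odd, so it is invertible mod 256
    return (((block - key) * inv5) % SIZE) ^ key


target = 0x5A

# "Decrypting" the target under every key yields one plaintext per key.
pairs = [(k, toy_decrypt(k, target)) for k in range(SIZE)]

# Every one of those pairs encrypts to the same target ciphertext.
assert all(toy_encrypt(k, p) == target for k, p in pairs)
print(len(pairs), "plaintext/key pairs map to ciphertext", hex(target))
```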
I'm not sure what you mean by "the routine usually tells you if the key was correct".

One simple way to check the validity of a key is to add a known part to the plaintext before encryption. If the decryption doesn't reproduce that, it's not the right key.
The known part should not be a constant, since that would be an instant crib. But it could, e.g., be a hash of the plaintext; if hashing the decrypted text yields the same hash value, the key is probably correct (with the exception of hash collisions).
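A rough sketch of that hash-prefix check, again with a hypothetical toy stream cipher (`keystream_xor`) in place of a real one. The helper names `seal` and `unseal` are mine. This only shows how a routine can "know" that the key was wrong; it is not a replacement for a proper MAC.

```python
import hashlib
import secrets

HASH_LEN = 32  # SHA-256 digest size in bytes


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher (assumption, illustration only): XOR with a SHAKE-256 keystream."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, message: bytes) -> bytes:
    """Prepend a hash of the plaintext, then encrypt hash and plaintext together."""
    return keystream_xor(key, hashlib.sha256(message).digest() + message)


def unseal(key: bytes, blob: bytes):
    """Decrypt, then report the key as wrong (None) if the hash does not match."""
    body = keystream_xor(key, blob)
    digest, message = body[:HASH_LEN], body[HASH_LEN:]
    if hashlib.sha256(message).digest() != digest:
        return None
    return message


right_key = secrets.token_bytes(16)
blob = seal(right_key, b"hello world")
print(unseal(right_key, blob))                # the original message
print(unseal(secrets.token_bytes(16), blob))  # None, except with negligible probability
```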

Related

Can AES-128 have a key of 15-long ASCII characters?

I'm trying to decrypt an encrypted h264 I-frame, and I was given a key of length 15, is this even valid?
Should not it be of length 16, so the binary representation would be 128 bits?
If you have a thing you could type on a keyboard, that is not a proper AES key, no matter the length. AES derives its power from the fact that its key is effectively random. Anything you can type on a keyboard is not an effectively random sequence of equivalent length. There are only about 96 characters you can type easily on a Latin-style keyboard. A byte has 256 values. 96^16 is a minuscule fraction of 256^16.
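To put a number on "minuscule", a quick calculation (assuming each of the 16 characters is drawn uniformly from 96 typeable symbols, which is already generous for a human-chosen string):

```python
import math

typed_bits = 16 * math.log2(96)      # 16 typeable characters
random_bits = 16 * math.log2(256)    # 16 uniformly random bytes
fraction = (96 / 256) ** 16          # share of the AES-128 key space covered

print(f"typed key:  {typed_bits:.1f} bits")
print(f"random key: {random_bits:.0f} bits")
print(f"fraction of key space: {fraction:.2e}")
```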
To convert a "password" that a human could type into an effectively random AES key, you need a password-based key derivation function (PBKDF). The most famous and widely available is PBKDF2. There are other excellent PBKDFs including scrypt and Argon2. All of them require a random salt, and all are (in cryptographic terms) very slow to compute.
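As a minimal sketch, PBKDF2 is available in Python's standard library as `hashlib.pbkdf2_hmac`. The iteration count, salt length and the helper name `derive_aes128_key` are assumptions for illustration; the salt has to be stored alongside the ciphertext so the same key can be derived again.

```python
import hashlib
import secrets


def derive_aes128_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 16-byte (128-bit) key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=16
    )


salt = secrets.token_bytes(16)            # random, not secret, unique per password
key = derive_aes128_key("15-char passwd!", salt)
print(len(key) * 8, "bit key:", key.hex())
```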
That said, regarding your framework, it is not possible to guess how they have converted this string into a key. You must consult the documentation or the implementation. There are an unbounded number of ways to convert strings into keys (most of them are terrible, but there is still an unbounded selection to pick from). As Michael Fehr noted they might have done something insecure like padding with zeros. They might also have used a simple hashing function like SHA-256 and either used a 256-bit key or taken the top or bottom 128 bits. Or…almost literally anything else. There is no common practice here. Each encryption system has to document how it is implemented.
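To illustrate how ambiguous this is, here are a few of the conversions mentioned above applied to a made-up 15-character string. The string and the candidate list are hypothetical, and none of these is a recommendation; the point is only that each guess yields a different key.

```python
import hashlib

given = b"fifteen-chars!!"   # hypothetical 15-byte "key" as handed over

digest = hashlib.sha256(given).digest()
candidates = {
    "zero-padded to 16 bytes": given.ljust(16, b"\x00"),
    "sha256, first 16 bytes": digest[:16],
    "sha256, last 16 bytes": digest[16:],
    "sha256, all 32 bytes (AES-256)": digest,
}

for name, key in candidates.items():
    print(f"{name}: {len(key) * 8}-bit key {key.hex()}")
```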
(Note that even if you see "AES-128," this is also ambiguous. It can mean "AES with a 128-bit key" or it can mean "AES with a 128-bit block and a key of 128, 192 or 256 bits." While the former meaning is a bit more common, the latter occurs often, for example in Apple documentation, despite being redundant (AES always has a 128-bit block). So even questions like "how long is the key" require digging into the documentation or the implementation. Cryptography is unfortunately incredibly unstandardized.)
Should not it be of length 16, so the binary representation would be 128 bits?
You are right. For AES, only key lengths of 128, 192 or 256 bits are valid.
I commonly see two possible reasons for a key of a different length:
You were given a password, not a key. Then you also need to ask how the key is generated from the password (Hash? PBKDF2? Other?)
Many frameworks will silently accept a different key length and then trim or zero-pad the value to fit the required key size. IMHO this is not a proper approach, as it gives developers the feeling that the key is good while in reality a different (padded or trimmed) value is used.

3DES: does identical ciphertext mean identical keys?

Can we assume that same encryption key is used to encrypt data if encrypted data are same?
For example, plain text is 'This is sample'.
First time we use 3DES algorithm and encryption key to encrypt it. Encrypted data became 'MNBVCXZ'.
Second time again, we use 3DES algorithm and encryption key to encrypt it. Encrypted data became 'MNBVCXZ'.
My questions are:
Can I assume static encryption key is used in this encryption process?
How many keys can be used to encrypt data using 3DES algorithm?
Can I assume static encryption key is used in this encryption process?
Yes, if you perform the encryption yourself (with a very high probability), no if an adversary can perform the encryption and the plaintext/ciphertext is relatively small.
As 3DES does indeed have 2^168 possible keys and 2^64 possible blocks, it should be obvious that some keys will encrypt a single plaintext to the same ciphertext. Finding such a pair of keys requires about 2^32 calculations on average (because of the birthday paradox).
If the plaintext is larger (requires more than one block encrypt) then the chance of finding a different key that produces the same ciphertext quickly will go to zero.
If one of the keys is preset it will take about 2^64 calculations to find another key. And - for the same reason - there is only a chance of 1 / 2^64 to use two keys that unfortunately produce the same ciphertext for a specific plaintext.
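The birthday estimate is easy to try at toy scale. In this sketch the "cipher" is shrunk to a 16-bit block and modelled as a truncated hash of key and plaintext (`toy_block` is a hypothetical stand-in, not 3DES), so a pair of colliding keys is expected after roughly 2^8 tries instead of 2^32.

```python
import hashlib

BLOCK_BYTES = 2  # toy 16-bit block instead of the 64-bit block of 3DES


def toy_block(key: int, plaintext: bytes) -> bytes:
    """Stand-in for one block encryption: truncated hash of key and plaintext."""
    return hashlib.sha256(key.to_bytes(8, "big") + plaintext).digest()[:BLOCK_BYTES]


def find_colliding_keys(plaintext: bytes):
    """Try keys 0, 1, 2, ... until two of them give the same ciphertext."""
    seen = {}
    key = 0
    while True:
        ct = toy_block(key, plaintext)
        if ct in seen:
            return seen[ct], key, ct
        seen[ct] = key
        key += 1


k1, k2, ct = find_colliding_keys(b"This is sample")
print(f"keys {k1} and {k2} both give ciphertext {ct.hex()}")
```

Fixing one key in advance and searching for a partner would instead take on the order of 2^16 tries here, which mirrors the 2^64 figure for the real block size.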
If you want to make the calculations yourself, more information here on the crypto site.
How many keys can be used to encrypt data using 3DES algorithm?
2^168 if you consider the full set of possible keys, i.e. you allow DES-ABC keys. These keys are encoded as 192 bits including parity. This would include DES-ABA and DES-AAA keys (the latter is equivalent to single DES).
2^112 if you consider only DES-ABA keys. These keys are encoded as 128 bits including parity. This would include single DES.

AES128 vs AES256 using bruteforce

I came across this:
I don't understand how AES128 is stronger than AES256 in a brute force attack, or how AES256 allows for more combinations than AES128.
These are my simplified premises - assuming I have 100 unique characters on my keyboard, and my ideal password length is 10 characters - there would be 100^10 (or 1x10^20) combinations for brute force attack to decrypt a given cipher text.
In that case, whether or not AES128 or AES256 is applied doesn't make a difference - please correct me.
Yes, you are correct (in that a weak password will negate the difference between AES128 and AES256 and make bruteforcing as complex as the password is). But this applies only to the case when the password is the only source for key generation.
In normal use, AES keys are generated by a "truly" random source and never by a simple pseudorandom generator (like C++ rand()).
AES256 is "more secure" than AES128 because it has 256-bit key - that means 2^256 possible keys to bruteforce, as opposed to 2^128 (AES128). The numbers of possible keys are shown in your table as "combinations".
Personally, I use KeePass and passwords of 20 symbols and above.
Using 20-symbol password composed of small+capital letters (26+26), digits (10) and special symbols (around 20) gives (26+26+10+20)^20 = 1.89*10^38 possible combinations - comparable to an AES128 key.
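The comparison is easier to see in bits. A small sketch, assuming every character is chosen uniformly and independently (real human-chosen passwords fall well short of this); `password_bits` is just an illustrative helper:

```python
import math


def password_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random password."""
    return length * math.log2(alphabet_size)


print(f"10 chars from 100 symbols: {password_bits(100, 10):.1f} bits")
print(f"20 chars from 82 symbols:  {password_bits(82, 20):.1f} bits")
print("AES-128 key:               128 bits")
print("AES-256 key:               256 bits")
```

So the 10-character password from the question is the weak link under either key size, while the 20-symbol one lands just below the 128-bit mark.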
how AES128 is stronger than AES256 in a brute force attack
AES does multiple rounds of transforming each chunk of data, and it uses different portions of the key in these different rounds. The specification for which portions of the key get used when is called the key schedule. The key schedule for 256-bit keys is not as well designed as the key schedule for 128-bit keys. And in recent years there has been substantial progress in turning those design problems into potential attacks on AES 256. This is the basis for advice on key choice.
how AES256 allows for more combinations than AES128
AES256 uses 256 bits, giving you around 2^256 permissible combinations, while in the case of AES128 it's 2^128.
These are my simplified premises - assuming I have 100 unique characters on my keyboard, and my ideal password length is 10 characters - there would be 100^10 (or 1x10^20) combinations for brute force attack to decrypt a given cipher text.
I am not quite sure what your understanding is, but when you apply AES128/AES256, you actually encrypt your password into a cipher text. It is encoded information because it contains a form of the original plaintext that is unreadable by a human. It won't just use the 100 unique characters from your keyboard; it uses more than that. So, if you want to get the original password, you must find the key with which it is encrypted. And that gives you the combination figures 2^128 and 2^256.

Characteristics of an Initialization Vector

I'm by no means a cryptography expert, I have been reading a few questions around Stack Overflow and on Wikipedia but nothing is really 'clear cut' in terms of defining an IV and its usage.
Points I have discovered:
An IV is prepended to a plaintext message in order to strengthen the encryption
The IV is truly random
Each message has its own unique IV
Timestamps and cryptographic hashes are sometimes used instead of random values, but these are considered to be insecure as timestamps can be predicted
One of the weaknesses of WEP (in 802.11) is the fact that the IV will reset after a specific amount of encryptions, thus repeating the IV
I'm sure there are many other points to be made, can anyone think of any other characteristics which I've missed?
An IV is "a public value which impacts the encryption process". The point of the IV is often to "randomize" the input data to avoid leaking information about which input blocks were identical in the plaintext (because identical blocks happen quite a lot in "real-life" data).
Whether the IV is input by pre-pending it or otherwise depends on the algorithm in which it is used. For symmetric encryption with a block cipher in CBC mode, the IV is pre-pended to the encrypted data (CBC uses, for each block, the previous encrypted block; the IV plays the role of the encrypted block -1).
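The chaining described above can be sketched as follows. The block cipher here is a hypothetical 4-round Feistel network built from SHA-256, used only so the example runs with the standard library; it is not AES and must not be used for real data. The CBC structure around it (XOR with the previous ciphertext block, IV acting as block -1 and prepended to the output) is the part being illustrated.

```python
import hashlib
import secrets

BLOCK = 16
HALF = BLOCK // 2
ROUNDS = 4


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Feistel round function: truncated SHA-256 of key, round index and half-block."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    previous, out = iv, [iv]          # the IV is prepended to the ciphertext
    for i in range(0, len(plaintext), BLOCK):
        previous = encrypt_block(key, _xor(plaintext[i:i + BLOCK], previous))
        out.append(previous)
    return b"".join(out)


def cbc_decrypt(key: bytes, data: bytes) -> bytes:
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    # Each plaintext block is D(current) XOR previous; blocks[0] is the IV.
    return b"".join(
        _xor(decrypt_block(key, cur), prev) for prev, cur in zip(blocks, blocks[1:])
    )


key = secrets.token_bytes(16)
message = b"A" * 32                   # two identical plaintext blocks
c1 = cbc_encrypt(key, secrets.token_bytes(BLOCK), message)
c2 = cbc_encrypt(key, secrets.token_bytes(BLOCK), message)
```

With a fresh random IV per message, the two identical plaintext blocks encrypt to different ciphertext blocks, and the same message encrypted twice gives different ciphertexts.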
An IV is distinct from a key in that a key is secret whereas the IV need not be secret; the IV is often transmitted along the encrypted message. Conversely, the IV must be distinct for every message, whereas the key may be reused. Actually, the IV must be distinct for every message encrypted with the same key; if you use a new key for every message then you can use a constant, fixed IV. Note that the IV need not be secret, but you can keep it secret if you wish. But the sender and the receiver must agree on the IV, and since the IV changes for every message then it can be inconvenient, in some setups, to keep IV secret.
Whether the IV must be uniformly random, or simply non-repeating, depends on the algorithm. CBC requires a random IV. Other modes are less picky, e.g. GCM. You may derive the key and the IV from a "master key", using a proper one-way function. This is what SSL does. It is trickier than it seems; do not try it at home.
Repeating the IV is one of the numerous sins of WEP.

Identifying An Encryption Algorithm

First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data. This seems highly unlikely, if not impossible to me, so sorry if it's a stupid question.
My final question is, say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data.
No. There are algorithms that use a different key to decrypt than to encrypt, but a keyless method would rely on secrecy of the algorithm, generally regarded as a poor idea.
My final question is, say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
Most likely yes, especially given the key. A good crypto algorithm relies on the secrecy of the key, and the key alone. See Kerckhoffs's principle.
Also, if a common algorithm is used it would be a simple matter of trial and error, and besides, ciphertext is often accompanied by metadata which tells you algorithm details.
edit: as per comments, you may be thinking of digital signature (which requires a secret only on the sender side), a hash algorithm (which requires no key but isn't encryption), or a zero-knowledge proof (which can prove knowledge of a secret without revealing it).
Abstractly, we can think of the encryption system this way:
               -------------------
plaintext ---> | algorithm & key | ---> ciphertext
               -------------------
The system must guarantee the following:
decrypt(encrypt(plaintext, algorithm, key), algorithm, key) = plaintext
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data.
Yes, in such a system the key is redundant; all the "secrecy" lies in the algorithm.
My final question is, say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
In practice, you'll probably have a small space of algorithms, so a simple brute-force search is feasible. However, there may be more than one algorithm that fits the given information. Consider the following example:
We define the following encryption and decryption operations, where plaintext, ciphertext, algorithm, and key are real numbers (assume algorithm is nonzero):
encrypt(plaintext, algorithm, key) = algorithm x (plaintext + key) = ciphertext
decrypt(ciphertext, algorithm, key) = ciphertext/algorithm - key = plaintext
Now, suppose that plaintext + key = 0. We have ciphertext = 0 for any choice of algorithm. Hence, we cannot deduce the algorithm used.
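The same example written out as code, using the definitions given above with floats standing in for real numbers. The concrete values are arbitrary.

```python
def encrypt(plaintext: float, algorithm: float, key: float) -> float:
    return algorithm * (plaintext + key)


def decrypt(ciphertext: float, algorithm: float, key: float) -> float:
    return ciphertext / algorithm - key


# plaintext + key = 0, so every nonzero "algorithm" gives the same ciphertext.
plaintext, key = 3.0, -3.0
ciphertexts = {a: encrypt(plaintext, a, key) for a in (1.0, 2.0, 7.5)}
print(ciphertexts)
```

Every entry is zero, so the triple (plaintext, key, ciphertext) is consistent with all three choices of algorithm and cannot single one out.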
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data.
What are you getting at? It's trivial to come up with a pair of functions that fits the letter of the specification, but without knowing the intent it's hard to give a more helpful answer.
say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
If the algorithm is any good the output will be indistinguishable from random noise, so there is no analytic solution to this. As a practical matter, there are only so many trusted algorithms in wide use. Trying each one in turn would be quick, but would be complicated by the fact that an implementation has some freedom with regard to things like byte order (little-endian vs big-endian), key derivation (if you had a pass-phrase instead of the actual cryptographic key itself), encryption modes and padding.
As frankodwyer points out, this situation is not part of usual threat models. This would work in your favor, as it makes it more likely that the algorithm is a well-known one.
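"Trying each one in turn" might look roughly like this. The three candidate "algorithms" are deliberately trivial, made-up functions so the sketch stays self-contained; with real ciphers the candidate list would also have to enumerate modes, paddings and key derivations.

```python
from itertools import cycle


def xor_cipher(pt: bytes, key: bytes) -> bytes:
    return bytes(p ^ k for p, k in zip(pt, cycle(key)))


def add_cipher(pt: bytes, key: bytes) -> bytes:
    return bytes((p + k) % 256 for p, k in zip(pt, cycle(key)))


def reversed_xor(pt: bytes, key: bytes) -> bytes:
    return xor_cipher(pt[::-1], key)


# Hypothetical candidate set to search through.
CANDIDATES = {"xor": xor_cipher, "add": add_cipher, "reversed-xor": reversed_xor}


def identify(pt: bytes, key: bytes, ct: bytes) -> list:
    """Return the names of all candidates that map (pt, key) to ct."""
    return [name for name, fn in CANDIDATES.items() if fn(pt, key) == ct]


print(identify(b"hello", b"k", add_cipher(b"hello", b"k")))
print(identify(b"level", b"\x00", b"level"))
```

The second call shows that a single known triple can be consistent with several candidates at once, so more than one sample may be needed.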
The best you could do without a known key in the decoder would be to add a bit of obscurity. For example, if the first step of the decode algorithm is to strip out everything except every tenth character, then your encode key may be used to seed some random garbage for nine out of every ten characters. Thus, with different keys you could achieve different encoded results which would be decoded to the same message, with no key necessary for the decoder.
However, this does not add much real security and should not be relied on alone to protect crucial data. I'm just describing a case where it would be possible, so yes, I suppose it could be done if you were just trying to prove a point or add one more level of security.
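A sketch of that scheme, with names of my own choosing. The key only seeds the filler characters, and the decoder needs no key at all. As said above, this is obscurity, not security.

```python
import random
import string

STRIDE = 10  # keep every tenth character


def encode(message: str, key: int) -> str:
    """Put nine key-seeded filler characters in front of each message character."""
    rng = random.Random(key)  # the key only seeds the filler
    out = []
    for ch in message:
        out.extend(rng.choice(string.ascii_letters) for _ in range(STRIDE - 1))
        out.append(ch)
    return "".join(out)


def decode(encoded: str) -> str:
    """Keyless decoder: strip everything except every tenth character."""
    return encoded[STRIDE - 1::STRIDE]


a = encode("hello", key=1)
b = encode("hello", key=2)
print(a)
print(b)
print(decode(a), decode(b))
```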
I don't believe that there is such an algorithm that would use a key to encrypt, but not to decrypt. (Silly answers like a 26 character Caesar cipher aside...)
To your second question, yes; it just depends on how much time you're willing to spend on it. In theoretical cryptography it is assumed that the algorithm can always be determined, whether through theft of the algorithm or of a physical machine or, as in your case, from a plain text and cipher text pair.
