Relation between encryption input and output length? - encryption

If I were to encrypt a string of arbitrary length (let's say using md5), is there any relation between increasing string size to the output hash?
For example, if I were to encrypt a string that was 100 million characters in length, would it be more likely to output a longer hash than an input string of 5 characters?
Elaboration on why would be much appreciated as well, thanks!

One idea of a hash function is that there is no relation between output and input length. So for example the length of a md5 hash is always 128 bit.
Normally you cannot infer anything from the output hash about the input.

MD5 is not encryption, it's hashing. That's an important distinction to make. When encrypting a string, you can reverse the encryption afterwards. That necessarily means the encrypted string length will have a correlation with the input string length.
A hash on the other hand is a fingerprint or digest of the input, which is always a constant length and is therefore also not reversible. The length of the hash depends on the algorithm used, not on the length of the input.

Related

Can AES-128 have a key of 15-long ASCII characters?

I'm trying to decrypt an encrypted h264 I-frame, and I was given a key of length 15, is this even valid?
Should not it be of length 16, so the binary representation would be 128 bits?
If you have a thing you could type on a keyboard, that is not a proper AES key, no matter the length. AES derives its power from the fact that its key is effectively random. Anything you can type on a keyboard in not an effectively random sequence of equivalent length. There are only about 96 characters you can type easily on a Latin-style keyboard. A byte has 256 values. 96^16 is a minuscule fraction of 256^16.
To convert a "password" that a human could type into an effectively random AES key, you need a password-based key derivation function (PBKDF). The most famous and widely available is PBKDF2. There are other excellent PBKDFs including scrypt and Argon2. All of them require a random salt, and all are (in cryptographic terms) very slow to compute.
That said, regarding your framework, it is not possible to guess how they have converted this string into a key. You must consult the documentation or the implementation. There are an unbounded number of ways to convert strings into keys (most of them are terrible, but there are still an unbounded selection to pick from). As Michael Fehr noted they might have done something insecure like padding with zeros. They might also have used a simple hashing function like SHA-256 and either used a 256-bit key or taken the top or bottom 128 bits. Or…almost literally anything else. There is no common practice here. Each encryption system has to document how it is implemented.
(Note that even if you see "AES-128," this is also ambiguous. It can mean "AES with a 128-bit key" or it can mean "AES with a 128-bit block and a key of 128, 192 or 256 bits." While the former meaning is a bit more common, the latter occurs often, for example in Apple documentation, despite being redundant (AES always has a 128-bit block). So even questions like "how long is the key" requires digging into the documentation or the implementation. Cryptography is unfortunately incredibly unstandardized.)
Should not it be of length 16, so the binary representation would be 128 bits?
You are right. For AES only key length of 128, 192 or 256 bit is valid.
I commonly see two possibilities for having a key of different length:
You was given a password, not a key. Then you need as well to ask for a way to generate a key from the password (Hash? PBKDF2? Other?)
Many frameworks will silently accept different key length and then trim or zero-pad the value to fit the required key size. IMHO this is not a proper approach as it gives the developers feeling the key is good and in reality a different (padded or trimmed) value is used.

How to convert a message to an integer to encrypt it with RSA?

In encryption methods like RSA, we operate on an integer which represents our message. I've toyed around with converting the string to an array of bytes and working one character at a time, but that seems overly slow and the RSA algorithm is designed to work with the entire message.
How do we convert a string to a representation (integer, big integer etc) in which we can apply our cryptographic algorithm too?
In typical usage, you don't actually encrypt the entire message using RSA. Instead, you encrypt the encryption key for a symmetric block cipher (like AES) using RSA, then encrypt your stream of data using that block cipher.
Do not attempt to do this on your own! You have to be very careful with how you do the conversion, including setting up a secure padding scheme and using the block cipher correctly and in a secure mode. You might want to look using language-provided crypto libraries or a standard library like OpenSSL.
Hope this helps!
Think about how integers and strings are represented in memory. A 32-bit integer takes up four 8-bit bytes, and a 64-bit integer takes up eight bytes. A string is stored as bytes too, and in case of ASCII, each characters is represented by one byte. (UTF-8 and UTF-16 are variable length encodings, but it's still bytes.)
There is nothing to convert, because all datatypes are represented by bytes internally.
There's no reason this can't be extended to, say, 2048-bit integers for use with RSA.

How collision resistant are encryption algorithms?

How hard is it for a given ciphertext generated by a given (symmetric or asymmetric) encryption algorithm working on a plaintext/key pair, to find a different plaintext/key pair that yields the same cyphertext?
And how hard is it two find two plaintext/key pairs lead to the same cyphertext?
What led to this question, is another question that might turn out to have nothing to do with the above questions:
If you have a ciphertext and a key and want to decrypt it using some decryption routine, the routine usually tells you, if the key was correct. But how does it know it? Does it look for some pattern in the resulted plaintext, that indicates, that the decryption was successful? Does there exists another key results in some different plaintext, that contains the pattern and is also reported "valid" by the routine?
Follow-up question inspired by answers and comments:
If the allowed plaintext/key pairs where restricted in the on of the following (or both) way(s):
1) The plaintext starts with the KCV (Key check value) of the key.
2) The plaintext starts with a hash value of some plaintext/key combination
Would this make the collision finding infeasible? Is it even clear, that such a plaintext/key exists=
The answer to your question the way you phrased it, is that there is no collision resistance what so ever.
Symmetric case
Let's presume you got a plain text PT with a length that is a multiple of the block length of the underlying block cipher. You generate a random IV and encrypt the plain text using a key K, CBC mode and no padding.
Producing a plain text PT' and key K' that produces the same cipher text CT is easy. Simply select K' at random, decrypt CT using key K' and IV, and you get your colliding PT'.
This gets a bit more complicated if you also use padding, but it is still possible. If you use PKCS#5/7 padding, just keep generating keys until you find one such that the last octet of your decrypted text PT' is 0x01. This will take on average 128 attempts.
To make such collision finding infeasible, you have to use a message authentication code (MAC).
Asymmetric case
Something similar applies to RSA public key encryption. If you use no padding (which obviously isn't recommended and possibly not even supported by most cryptographic libraries), and use a public key (N,E) for encrypting PT into CT, simply generate a second key pair (N',E',D') such that N' > N, then PT' = CT^D' (mod N) will encrypt into CT under (N',E').
If you are using PKCS#1 v1.5 padding for your RSA encryption, the most significant octet after the RSA private key operation has to be 0x02, which it will be with a probability of approximately one in 256. Furthermore the first 0x00 valued octet has to occur no sooner than at index 9, which will happen with a high probability (approximately 0,97). Hence, on average you will have to generate on average some 264 random RSA key pairs of the same bit size, before you hit one that for some plain text could have produced the same cipher text.
If your are using RSA-OAEP padding, the private key decryption is however guaranteed to fail unless the cipher text was generated using the the corresponding public key.
If you're encrypting some plaintext (length n), then there are 2n unique input strings, and each must result in a unique ciphertext (otherwise it wouldn't be reversible). Therefore, all possible strings of length n are valid ciphertexts. But this is true for all keys. Therefore, for any given ciphertext, there are 2k ways of obtaining it, each with a different key of length k.
Therefore, to answer your first question: very easy! Just pick an arbitrary key, and "decrypt" the ciphertext. You will get the plaintext that matches the key.
I'm not sure what you mean by "the routine usually tells you if the key was correct".
One simple way to check the validity of a key is to add a known part to the plaintext before encryption. If the decryption doesn't reproduce that, it's not the right key.
The known part should not be a constant, since that would be an instant crib. But it could be e.g. be a hash of the plaintext; if hashing the decrypted text yields the same hash value, the key is probably correct (with the exception of hash collisions).

How is an MD5 or SHA-X hash different from an encryption?

I've read a couple times that MD5 is not an encryption, e.g. on MD5 ... Encryption? or Command Line Message Digest Utility.
Well, I get that it's a hash/message digest, and the explanation in the links above says an encryption has to have a key, while hash/md is a cryptographic hash function that produces just a signature. I don't really understand the difference. Couldn't you see the cryptographic hash function / algorithm as a key?
Also, what is the difference between something that's cryptographic and something that's encryption?
You can't "decrypt" a md5 hashed function, and you chose a bad algorithm if you want to transmit information and the receiver can't read it.
So encryption must be decryptable. MD5 is a "cryptographic" hash function, because it's very difficult to produce a block of information that has a specific given hash value.
So if you want to sign a message, it is enough to sign the hash. This uses less computing power and the receiver can regardless be sure that the original message is untouched.
A hash algorithm causes information about the original data to be lost irreplaceably, whereas an encryption algorithm has a corresponding decryption algorithm which restores the original data.
This can be shown in that hash algorithm results have a uniform size (128, 160, 256, etc. bits) regardless of the input, whereas encryption algorithm results have a variable size depending on the size of the input.
I don't think you can regard the function itself as a key.
Because a key is something you pass to the function in order encrypt or decrypt (<- impossible with md5) a message.

What encryption algorithm is best for small strings?

I have a string of 10-15 characters and I want to encrypt that string. The problem is I want to get a shortest encrypted string as possible. I will also want to decrypt that string back to its original string.
Which encryption algorithm fits best to this situation?
AES uses a 16-byte block size; it is admirably suited to your needs if your limit of 10-15 characters is firm. The PKCS#11 (IIRC) padding scheme would add 6-1 bytes to the data and generate an output of exactly 16 bytes. You don't really need to use an encryption mode (such as CBC) since you're only encrypting one block. There is an issue of how you'd be handling the keys - there is always an issue of how you handle encryption keys.
If you must go with shorter data lengths for shorter strings, then you probably need to consider AES in CTR mode. This uses the key and a counter to generate a byte stream which is XOR'd with the bytes of the string. It would leave your encrypted string at the same length as the input plaintext string.
You'll be hard pressed to find a general purpose compression algorithm that reliably reduces the length of such short strings, so compressing before encrypting is barely an option.
If it's just one short string, you could use a one-time pad which is mathematically perfect secrecy.
http://en.wikipedia.org/wiki/One-time_pad
Just be sure you don't use the key more than one time.
If the main goal is shortening, I would look for a compression library that allows a fixed dictionary built on a corpus of common strings.
Personally I do not have experience with that, but I bet LZMA can do that.

Resources