Comparison of one hash function and RSA - encryption

I want to compare a hash function and an RSA encryption using some common parameter.
I have an algorithm that uses some hash functions, and I want to claim that the computational load of these hashes is less than that of one RSA operation.
Can I compare them by number of multiplications, for example how many multiplications each of them performs?
How can I compare them in terms of communication load? How can I state the length of the RSA output?

It sounds like you're trying to compare apples and oranges.
A hash function is generally expected to accept arbitrarily long inputs, and the time needed to compute it should generally scale linearly with the length of the input. Thus, a useful measure of hash function performance would be, say, "megabytes per second".
(Specifically, that would be a measure of throughput, which is the relevant measure when hashing long inputs. For short messages, a more relevant measure is the latency, which is basically the minimum time needed to hash zero-length input. Given the throughput and the latency, one can generally calculate a fairly good approximation of the time needed to hash an input of any given length as time = latency + length / throughput.)
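The approximation above can be written as a small helper. This is only a sketch: the function name and the latency and throughput figures are made-up placeholders for illustration, not measurements of any real hash function.

```python
def estimate_hash_time(length_bytes: int, latency_s: float, throughput_bps: float) -> float:
    """Approximate time to hash `length_bytes` bytes: latency + length / throughput."""
    return latency_s + length_bytes / throughput_bps


# Hypothetical figures, chosen only for illustration:
# 1 microsecond latency, 500 MB/s throughput.
LATENCY = 1e-6
THROUGHPUT = 500e6

short_msg = estimate_hash_time(64, LATENCY, THROUGHPUT)          # dominated by latency
long_msg = estimate_hash_time(100 * 10**6, LATENCY, THROUGHPUT)  # dominated by throughput
print(f"64 B: {short_msg:.2e} s, 100 MB: {long_msg:.2e} s")
```

With these placeholder numbers the short input costs roughly the latency alone, while the long input costs roughly length / throughput.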
RSA, on the other hand, can only encrypt messages shorter than the modulus, which is chosen at the time the key is generated. (Typical modulus sizes might be, say, from 1024 to 4096 bits.) To "encrypt a long message with RSA" one would normally use hybrid encryption: first encrypt the message using a symmetric cipher like AES, using a suitable mode of operation and a randomly chosen key, and then encrypt the AES key with RSA.
The same length limits apply to signing messages with RSA: by itself, RSA can only sign messages shorter than the modulus. The standard workaround in this case is to first hash the message, and then sign the hash value. (There are also a lot of important details, like padding, involved that I'm not going to go into here, since we're not on crypto.SE, but which are absolutely crucial for security.)
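A minimal sketch of the hash-then-sign idea, using textbook RSA with no padding, so it is insecure and for illustration only. The key is an assumption of this example: it is built from two well-known Mersenne primes purely to avoid key-generation code, and all function names are hypothetical.

```python
import hashlib

# Toy key built from two well-known Mersenne primes. This is NOT a secure
# RSA key; it is used only so the example needs no key-generation code.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash a message of any length down to a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes) -> int:
    """Textbook RSA on the hash value; real schemes add padding (omitted here)."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Check that the signature, raised to E, gives back the message hash."""
    return pow(signature, E, N) == digest_as_int(message)


long_message = b"x" * 1_000_000   # far longer than the modulus
sig = sign(long_message)
print(verify(long_message, sig))          # True
print(verify(long_message + b"!", sig))   # False
```

The RSA step always operates on the fixed-size hash value, so only the hashing cost grows with the message length.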
The point is that, in both cases, the RSA operation itself takes a fixed amount of time regardless of the message length, and thus, for sufficiently long messages, most of the time will be consumed by AES or the hash function, not by RSA itself. So when you say you want to "claim that computation load of these hashes is less than one RSA", I would say that's meaningless, at least unless you fixed a specific input length for your hash. (And if you did, my next question would be "what's so special about that particular input length?")

Related

AES Encryption and Obfuscating IDs

I was considering hashing small blocks of sensitive ID data, but I need to preserve the full uniqueness of the data blocks as a whole once obfuscated.
So, I came up with the idea of encrypting some publicly-known input data (say, 128 bits of zeroes), using the data I want to obfuscate as the key/password, and then throwing the key away, thus protecting the original data from ever being discovered.
I already know about hashing algorithms, but my problem is that I need to maintain full uniqueness (generally speaking a 1:1 mapping of input to output) while still making it impossible to retrieve the actual input. A hash cannot serve this function because information is lost during the process.
It is not necessary that the data be retrieved once "encrypted". It is only to be used as an ID number from then on.
An actual GUID/UUID is not suitable here because I need to manually control the identifiers on a per-identifier basis. The IDs cannot be unknown or arbitrarily generated data.
EDIT: To clarify exactly what these identifiers are made of:
(unencrypted) 64bit Time Stamp
ID Generation Counter (one count for each filetype)
Random Data (to make multiple encrypted keys dissimilar)
MAC Address (or if that's not available, set top bit + random digits)
Other PC-Specific Information (from registry)
The whole thing should add up to 192 bits, but the encrypted section's content size(s) could vary (this is by no means a final specification).
Given:
A static IV value
Any arbitrary 128bit key
A static 128 bits of input
Are AES keys treated in a fashion that would result in a 1:1 key<---->output mapping, given the same input and IV value?
No. AES is, in the abstract, a family of permutations, of which you select a random one with the key. For any one of those permutations (i.e. for encryption under a given AES key) you will not get collisions, because permutations are bijective.
However, for two different permutations (i.e. encryption under different AES keys, which is what you have), there is no guarantee whatsoever that you don't get a collision. Indeed, because of the birthday paradox, the likelihood of a collision is probably higher than you think.
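To get a feel for the birthday effect, here is a rough estimate using the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n IDs and b output bits. It assumes the outputs behave like independent, uniformly random 128-bit values, which is an idealization; the function name is made up for this sketch.

```python
import math


def collision_probability(num_ids: int, output_bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * 2^bits))."""
    space = 2.0 ** output_bits
    return -math.expm1(-num_ids * (num_ids - 1) / (2.0 * space))


for n in (2**32, 2**48, 2**64):
    print(f"{n:>22d} IDs -> p ~ {collision_probability(n, 128):.3e}")
```

Under this model a collision becomes likely once the number of IDs approaches 2^64, i.e. the square root of the number of possible 128-bit outputs.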
If your IDs are short (< 1024 bits), you could just do an RSA encryption of them, which would give you what you want. You'd just need to forget the private key.

SHA2 - Why is it difficult (actually impossible without using brute-force) to recover the message if you have the Digest and the Key

I looked into the implementation of SHA2 in Python and it looks like it uses some default key for hashing. Once the key is known and the digest is known, is it possible to get the plain-text back? (pre-image attack without brute-force) http://en.wikipedia.org/wiki/Preimage_attack
My intuition says NO as the block size (input size) is 512 bit and the output size is 256 bit. It means that to be a good hashing function (cryptographically) the function should be many to one function (non-invertible). {This is the exact opposite to the requirement of a block cipher where the function should be invertible (one to one).}
As far as I understood the requirement is to have random many-to-one function!
It's impossible, even with brute force, to recover the plain text for a given hash, because there are many texts that map to the same N-bit digest (for any value of N). That is, there are different messages that have the same digest value.
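The many-to-one argument can be made concrete by shrinking the digest. The sketch below uses only the first byte of SHA-256 as a deliberately tiny 8-bit "hash" (an assumption made for illustration, not a real construction): with 257 distinct messages and only 256 possible digests, two of them must collide.

```python
import hashlib


def tiny_digest(message: bytes) -> int:
    """First byte of SHA-256: a deliberately tiny 8-bit 'hash'."""
    return hashlib.sha256(message).digest()[0]


def find_collision():
    """Hash distinct messages until two share a digest (guaranteed within 257 tries)."""
    seen = {}
    for i in range(257):
        msg = str(i).encode()
        d = tiny_digest(msg)
        if d in seen:
            return seen[d], msg, d
        seen[d] = msg
    raise AssertionError("unreachable: 257 messages cannot fit in 256 digests")


m1, m2, d = find_collision()
print(m1, m2, d)  # two different messages, one digest
```

Given only the digest, nothing tells you which of the colliding messages was the original. The same counting argument applies to the full 256-bit output, except that finding the collisions is no longer feasible.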

Short (6 bit) cryptographic keyed hash

I have to implement a simple hashing algorithm.
Input data:
Value (16-bit integer).
Key (any length).
Output data:
6-bit hash (number 0-63).
Requirements:
It should be practically impossible to predict the hash value if you only have the input value but not the key. More specifically: if I know hash(x) for x < M, it should be hard to predict hash(M) without knowing the key.
Possible solutions:
Keep full mapping as a key. So the key has length 2^16*6 bits. It's too long for my case.
Linear code. The key is a generator matrix. Its length is 16*6 bits. But it's easy to find the generator matrix using several known hash values.
Are there any other possibilities?
An HMAC seems to be what you want. So a possibility for you could be to use a SHA-based HMAC and just use a substring of the resulting hash. This should be relatively safe, since the bits of a cryptographic hash should be as independent and unpredictable as possible.
Depending on your environment, this could however take too much processing time, so you might have to choose a simpler hashing scheme to construct your HMAC.
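A minimal sketch of the truncated-HMAC approach, using Python's standard hmac and hashlib modules. The choice of HMAC-SHA256, the function name and the decision to keep the low 6 bits of the first byte are assumptions made for this example.

```python
import hashlib
import hmac


def keyed_hash6(key: bytes, value: int) -> int:
    """Map a 16-bit value to 0..63 using truncated HMAC-SHA256."""
    if not 0 <= value < 2**16:
        raise ValueError("value must fit in 16 bits")
    tag = hmac.new(key, value.to_bytes(2, "big"), hashlib.sha256).digest()
    return tag[0] & 0x3F  # keep the low 6 bits of the first byte


key = b"example secret key"  # placeholder; use a randomly generated key in practice
print([keyed_hash6(key, x) for x in range(8)])
```

Any 6 bits of the tag would do equally well; which ones are kept is arbitrary as long as it stays fixed.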
Original Answer the discussion in the comments is based on:
Since you can forget cryptographic properties anyway (it is trivial to find collisions via brute-force attacks on a 5-bit hash), you might as well use something like CRC or Hamming codes and get error detection for free.
Mensi's suggestion to use truncated HMAC is a good one, but if you do happen to be on a highly constrained system and want something faster or simpler, you could take any block cipher, encrypt your 16-bit value (padded to a full block) with it and truncate the result to 6 bits.
Unlike HMAC, which computes a pseudorandom function, a block cipher is a pseudorandom permutation — every input maps to a different output. However, when you throw away all but six bits of the block cipher's output, what remains will look very much like a pseudorandom function. There will be a very tiny bias against repeated outputs, but (assuming that the block cipher's block size is much larger than 6 bits, which it should be) it'll be so small as to be all but undetectable.
A good block cipher choice for very low-end systems might be TEA or its successors XTEA and XXTEA. While there are some known attacks on these ciphers, they all require much more extensive access to the cipher than should be possible in your application.
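The block-cipher variant might look like the sketch below. The XTEA routine is a Python transcription of the widely published reference code and has not been checked against official test vectors here; the zero-padding of the 16-bit value, the function names and the choice of which 6 bits to keep are assumptions of this example.

```python
import struct

MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9


def xtea_encrypt_block(v0: int, v1: int, key: tuple, rounds: int = 32) -> tuple:
    """Encrypt one 64-bit block (two 32-bit halves) under four 32-bit key words."""
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK32
        s = (s + DELTA) & MASK32
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK32
    return v0, v1


def cipher_hash6(key_bytes: bytes, value: int) -> int:
    """Encrypt the zero-padded 16-bit value and keep 6 bits of the result."""
    if len(key_bytes) != 16:
        raise ValueError("XTEA needs a 128-bit (16-byte) key")
    if not 0 <= value < 2**16:
        raise ValueError("value must fit in 16 bits")
    key = struct.unpack(">4I", key_bytes)
    _, v1 = xtea_encrypt_block(0, value, key)  # block = 48 zero bits + value
    return v1 & 0x3F


key_bytes = bytes(range(16))  # placeholder key for the example only
print([cipher_hash6(key_bytes, x) for x in range(8)])
```

Before truncation every input maps to a distinct 64-bit block; only the final masking step discards that property.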

How collision resistant are encryption algorithms?

How hard is it, for a given ciphertext generated by a given (symmetric or asymmetric) encryption algorithm working on a plaintext/key pair, to find a different plaintext/key pair that yields the same ciphertext?
And how hard is it to find two plaintext/key pairs that lead to the same ciphertext?
What led to this question, is another question that might turn out to have nothing to do with the above questions:
If you have a ciphertext and a key and want to decrypt it using some decryption routine, the routine usually tells you if the key was correct. But how does it know? Does it look for some pattern in the resulting plaintext that indicates that the decryption was successful? Does there exist another key that results in some different plaintext that contains the pattern and is also reported as "valid" by the routine?
Follow-up question inspired by answers and comments:
If the allowed plaintext/key pairs were restricted in one (or both) of the following ways:
1) The plaintext starts with the KCV (Key check value) of the key.
2) The plaintext starts with a hash value of some plaintext/key combination
Would this make the collision finding infeasible? Is it even clear that such a plaintext/key pair exists?
The answer to your question, the way you phrased it, is that there is no collision resistance whatsoever.
Symmetric case
Let's presume you got a plain text PT with a length that is a multiple of the block length of the underlying block cipher. You generate a random IV and encrypt the plain text using a key K, CBC mode and no padding.
Producing a plain text PT' and key K' that produces the same cipher text CT is easy. Simply select K' at random, decrypt CT using key K' and IV, and you get your colliding PT'.
This gets a bit more complicated if you also use padding, but it is still possible. If you use PKCS#5/7 padding, just keep generating keys until you find one such that the last octet of your decrypted text PT' is 0x01. Since a random final octet equals 0x01 with probability 1/256, this will take about 256 attempts on average.
To make such collision finding infeasible, you have to use a message authentication code (MAC).
Asymmetric case
Something similar applies to RSA public key encryption. If you use no padding (which obviously isn't recommended and possibly not even supported by most cryptographic libraries), and use a public key (N,E) for encrypting PT into CT, simply generate a second key pair (N',E',D') such that N' > N; then PT' = CT^D' (mod N') will encrypt into CT under (N',E').
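The unpadded RSA case can be checked with toy numbers. All primes and exponents below are illustrative assumptions, far too small for real use.

```python
# Toy parameters, far too small for real use; textbook RSA, no padding.
N, E = 61 * 53, 17                 # original public key
PT = 65
CT = pow(PT, E, N)                 # ciphertext under (N, E)

# Second key pair with a larger modulus N' > N, so that CT < N'.
P2, Q2 = 67, 71
N2, E2 = P2 * Q2, 17
D2 = pow(E2, -1, (P2 - 1) * (Q2 - 1))  # private exponent (needs Python 3.8+)

PT2 = pow(CT, D2, N2)              # "decrypt" CT under the new key
assert pow(PT2, E2, N2) == CT      # PT' encrypts to the same CT under (N', E')
print(PT, PT2, CT)
```

The requirement N' > N guarantees that CT is a valid input for the second key; without it, re-encrypting PT' could give CT reduced modulo N' rather than CT itself.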
If you are using PKCS#1 v1.5 padding for your RSA encryption, the most significant octet after the RSA private key operation has to be 0x02, which it will be with a probability of approximately one in 256. Furthermore, the first 0x00-valued octet has to occur no sooner than at index 9, which will happen with high probability (approximately 0.97). Hence, you will have to generate on average some 264 random RSA key pairs of the same bit size before you hit one that, for some plain text, could have produced the same cipher text.
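The figure of roughly 264 key pairs follows from a short calculation. It assumes the octets of the decrypted value are independent and uniformly random, which is an idealization, and it models only the two conditions named above.

```python
# Probability that the leading octet equals 0x02.
p_type_octet = 1 / 256
# Probability that none of the next 8 octets is 0x00
# (so the first 0x00 appears at index 9 or later).
p_no_early_zero = (255 / 256) ** 8

p_valid = p_type_octet * p_no_early_zero
expected_keys = 1 / p_valid        # mean of a geometric distribution

print(f"{p_no_early_zero:.3f}, {expected_keys:.0f}")
```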
If you are using RSA-OAEP padding, however, the private key decryption is guaranteed to fail unless the cipher text was generated using the corresponding public key.
If you're encrypting some plaintext (length n), then there are 2^n unique input strings, and each must result in a unique ciphertext (otherwise it wouldn't be reversible). Therefore, all possible strings of length n are valid ciphertexts. But this is true for all keys. Therefore, for any given ciphertext, there are 2^k ways of obtaining it, each with a different key of length k.
Therefore, to answer your first question: very easy! Just pick an arbitrary key, and "decrypt" the ciphertext. You will get the plaintext that matches the key.
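Here is a sketch of that "pick an arbitrary key and decrypt" step. Python's standard library has no block cipher, so a simple XOR stream cipher with a SHA-256-based keystream stands in for the real cipher; that construction and all names are assumptions for illustration, not a recommended design.

```python
import hashlib
import os


def keystream(key: bytes, length: int) -> bytes:
    """Stand-in keystream: SHA-256 in counter mode (for illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypts and decrypts (XOR with the keystream is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = os.urandom(16)
ciphertext = xor_cipher(key, b"attack at dawn")

other_key = os.urandom(16)                       # arbitrary, unrelated key
other_plaintext = xor_cipher(other_key, ciphertext)

# (other_key, other_plaintext) is a second pair producing the same ciphertext.
assert xor_cipher(other_key, other_plaintext) == ciphertext
```

The second plaintext will almost certainly look like random bytes, which is why a routine needs some known structure in the plaintext to tell a right key from a wrong one.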
I'm not sure what you mean by "the routine usually tells you if the key was correct".
One simple way to check the validity of a key is to add a known part to the plaintext before encryption. If the decryption doesn't reproduce that, it's not the right key.
The known part should not be a constant, since that would be an instant crib. But it could, for example, be a hash of the plaintext; if hashing the decrypted text yields the same hash value, the key is probably correct (with the exception of hash collisions).

When using AES, is there a way to tell if data was encrypted using 128 or 256 bit keys?

I was wondering if there is some way to tell if data was encrypted with a specific key size, without the source code of course. Are there any detectable differences in the data that you can check after encryption?
No, there is not any way to do that. Both encrypt 16-byte chunks of data, and the resulting blocks would "look" the same after the encryption is complete (they would have different values, but an analysis of only the encrypted data would not be able to determine the original key size). If the original data (plain text) is available, it may be possible to do some kind of analysis.
A very simplistic "proof" is:
For a given input, the length of the output is the same regardless of the key size. It may, however, differ depending on the mode (CBC, CTR, etc.).
Since the encryption is reversible, it can be considered to be a one-to-one function. In other words, a different input results in a different output.
Therefore, it is possible to produce any given output (by changing the plain text) regardless of the key size.
Thus, for a given password, you could end up with the same output by using the appropriate plain text regardless of the key size. This "proof" has a hole in that padding schemes can result in a longer output than input (so the function is not necessarily onto.) But I doubt this would make a difference in the end result.
If an encryption system is any good (AES is) then there should be no way to distinguish its raw output from random data -- so, in particular, there should be no way to distinguish between AES-128 and AES-256, at least on the output bits.
However, most protocols which use encryption end up including some metadata which designates, without ambiguity, the kind of algorithm which was used, including key size. This is so that the receiver knows what to use to decrypt. This is not considered to be an issue. So, in practice, one has to assume that whatever attacker looks at your system knows whether the key is actually a 128-bit or 256-bit key.
Some side channels may give that information, too. AES encryption with a 256-bit key is roughly 40% slower than AES encryption with a 128-bit key (14 rounds instead of 10): simply timing how long an encrypting server takes to respond can reveal the key size.

Resources