AES Encryption and Obfuscating IDs

I was considering hashing small blocks of sensitive ID data, but I need to maintain the full uniqueness of the data blocks as a whole once obfuscated.
So I came up with the idea of encrypting some publicly known input data (say, 128 bits of zeroes) using the data I want to obfuscate as the key/password, then throwing that key away, thus protecting the original data from ever being discovered.
I already know about hashing algorithms, but my problem is that I need to maintain full uniqueness (generally speaking a 1:1 mapping of input to output) while still making it impossible to retrieve the actual input. A hash cannot serve this function because information is lost during the process.
It is not necessary that the data be retrieved once "encrypted". It is only to be used as an ID number from then on.
An actual GUID/UUID is not suitable here because I need to manually control the identifiers on a per-identifier basis. The IDs cannot be unknown or arbitrarily generated data.
EDIT: To clarify exactly what these identifiers are made of:
(unencrypted) 64bit Time Stamp
ID Generation Counter (one count for each filetype)
Random Data (to make multiple encrypted keys dissimilar)
MAC Address (or if that's not available, set top bit + random digits)
Other PC-Specific Information (from registry)
The whole thing should add up to 192 bits, but the encrypted section's content size(s) could vary (this is by no means a final specification).
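As a rough sketch of how such a 192-bit identifier could be assembled (note that the field widths below, a 64-bit timestamp, 32-bit counter, 32 random bits, 48-bit MAC and 16 bits of PC-specific data, are illustrative assumptions, since the layout above is explicitly not final):

```python
import os
import time

def build_raw_id(counter: int, mac48: int, pc_info16: int) -> bytes:
    """Pack the fields listed above into a 192-bit (24-byte) raw
    identifier. Field widths are assumptions, not a specification."""
    ts = int(time.time() * 1000) & (2**64 - 1)    # 64-bit timestamp
    rand = int.from_bytes(os.urandom(4), "big")   # 32 random bits
    raw = (ts << 128) \
        | ((counter & 0xFFFFFFFF) << 96) \
        | (rand << 64) \
        | ((mac48 & (2**48 - 1)) << 16) \
        | (pc_info16 & 0xFFFF)
    return raw.to_bytes(24, "big")
```

The random field is what makes two identifiers with the same timestamp and counter dissimilar after encryption.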
Given:
A static IV value
Any arbitrary 128bit key
A static 128 bits of input
Are AES keys treated in a fashion that would result in a 1:1 key<---->output mapping, given the same input and IV value?

No. AES is, in the abstract, a family of permutations, of which you select a random one with the key. For any one of those permutations (i.e. for encryption under a given AES key) you will not get collisions, because permutations are bijective.
However, for two different permutations (i.e. encryption under two different AES keys, which is what you have), there is no guarantee whatsoever that you won't get a collision. Indeed, because of the birthday paradox, the likelihood of a collision is probably higher than you think.
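To get a feel for the birthday bound, here is a small sketch (my own, not from the original answer) that approximates the probability of a collision among n random 128-bit outputs:

```python
import math

def collision_probability(n_keys: int, output_bits: int = 128) -> float:
    """Birthday-bound approximation: treat each encryption of the
    fixed block under an independent key as a uniform random value
    in a space of 2**output_bits, and return
    P(collision) ~= 1 - exp(-n(n-1) / (2 * 2**output_bits))."""
    space = 2.0 ** output_bits
    return 1.0 - math.exp(-n_keys * (n_keys - 1) / (2.0 * space))
```

For 2**64 identifiers the collision probability is already about 39%, far higher than intuition suggests for a 128-bit output space.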
If your IDs are short (< 1024 bits) you could just do an RSA encryption of them, which would give you what you want. You'd just need to forget the private key.
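A toy sketch of that "encrypt and forget the private key" idea, using illustrative tiny primes (real use needs at least 2048-bit keys from a proper library):

```python
# Toy RSA parameters: p=61, q=53 are for illustration only.
p, q = 61, 53
n = p * q            # 3233, the public modulus
e = 17               # public exponent, coprime to (p-1)*(q-1)
# The private exponent is never computed or stored, so the mapping
# below cannot be inverted without factoring n.

def obfuscate_id(raw_id: int) -> int:
    """Map a raw ID to its obfuscated form. Because RSA encryption is
    a permutation of the values modulo n, the mapping is 1:1."""
    assert 0 <= raw_id < n
    return pow(raw_id, e, n)
```

The bijectivity is exactly the property the question asks for: distinct inputs always give distinct outputs under a single fixed key.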


How are pairs of asymmetric encryption keys generated?

I have recently been learning about public/private key encryption in my computer science lessons, and how it works in terms of data encryption/decryption. We also covered how it can be used for digital signatures. However, we didn't go into too much detail on how the actual keys are generated themselves.
I know that it begins with a very large number, which is then passed through some kind of keygen algorithm which returns two distinctive keys, one of which is private and the other is public. Are these algorithms known or are they black box systems? And does one user always have the same pair of keys linked to them or do they ever change at any point?
It just seems like a very mathematical issue, as the keys are linked, yet one is not deducible from the other.
I know that it begins with a very large number, which is then passed through some kind of keygen algorithm which returns two distinctive keys, one of which is private and the other is public.
Well, that's not entirely correct. Most asymmetric algorithms are of course based on large numbers, but this is not a requirement. There are, for instance, algorithms based on hashing, and hashing is based on bits/bytes, not numbers.
But yes, asymmetric algorithms usually include a specific algorithm to perform the key pair generation. For instance, an asymmetric encryption scheme consists of a triple (Gen, Enc, Dec), where Gen represents the key pair generation. And the key pair of course consists of a public and a private part.
RSA basically starts off by generating two large random primes; it doesn't necessarily start with a single number.
Are these algorithms known or are they black box systems?
They are known, and they are fundamental to the security of the system. You cannot use just any numbers to perform, e.g., RSA. Note that for RSA there are different algorithms and configurations possible; not every system will use the same Gen.
And does one user always have the same pair of keys linked to them or do they ever change at any point?
That depends on the key management of the system. Usually there is some way of refreshing or regenerating keys. For instance, X.509 certificates tend to have an end date (the expiration date), so you cannot keep using the corresponding private key forever; you have to refresh the certificates and keys now and then.
It just seems like a very mathematical issue, as the keys are linked, yet one is not deducible from the other.
That's generally not correct. The public key is usually easy to derive from the private key. For RSA the public exponent may not be stored explicitly, but it is usually set to a fixed number (65537); this, together with the modulus (also part of the private key), makes up the public key. For elliptic curve keys, a private random value is generated first and the public key is derived directly from it.
You can of course never derive the private key from the public key; that would make no sense - it would not be very private if you could.
In RSA, the two generated numbers p and q are very large prime numbers of roughly the same size; they are multiplied to form the modulus N, from which the public and private keys are derived using modular arithmetic.
The following answer on crypto.stackexchange.com describes in more detail how we can start from a random (large) number and use the Fermat and Miller-Rabin tests to reach a number that is very probably prime.
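A simplified sketch of the Miller-Rabin test mentioned above (production code should use a vetted library, and the round count here is my own choice):

```python
import random

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test (simplified sketch).
    A composite survives each round with probability at most 1/4."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True
```

Key generation repeats this on random candidates of the desired size until a probable prime is found.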

Comparison of one hash function and RSA

I want to compare a hash function and an RSA encryption using some common parameter.
I have an algorithm that uses several hash functions, and I want to claim that the computational load of these hashes is less than that of one RSA operation.
Can I compare them by the number of multiplications each requires?
How can I compare their communication load? And how do I determine the length of the RSA output?
It sounds like you're trying to compare apples and oranges.
A hash function is generally expected to accept arbitrarily long inputs, and the time needed to compute it should generally scale linearly with the length of the input. Thus, a useful measure of hash function performance would be, say, "megabytes per second".
(Specifically, that would be a measure of throughput, which is the relevant measure when hashing long inputs. For short messages, a more relevant measure is the latency, which is basically the minimum time needed to hash zero-length input. Given the throughput and the latency, one can generally calculate a fairly good approximation of the time needed to hash an input of any given length as time = latency + length / throughput.)
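The latency/throughput model above can be written as a one-line helper (function and parameter names are my own):

```python
def hash_time_estimate(length_bytes: float, latency_s: float,
                       throughput_bps: float) -> float:
    """Linear cost model from the answer above:
    time = latency + length / throughput,
    with throughput in bytes per second."""
    return latency_s + length_bytes / throughput_bps
```

For long inputs the latency term vanishes and the throughput figure ("megabytes per second") dominates; for short inputs the opposite holds.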
RSA, on the other hand, can only encrypt messages shorter than the modulus, which is chosen at the time the key is generated. (Typical modulus sizes might be, say, from 1024 to 4096 bits.) To "encrypt a long message with RSA" one would normally use hybrid encryption: first encrypt the message using a symmetric cipher like AES, using a suitable mode of operation and a randomly chosen key, and then encrypt the AES key with RSA.
The same length limits apply to signing messages with RSA — by itself, RSA can only sign messages shorter than the modulus. The standard workaround in this case is to first hash the message, and then sign the hash value. (There's also a lot of important details like padding involved that I'm not going to go into here, since we're not on crypto.SE, but which are absolutely crucial for security.)
The point is that, in both cases, the RSA operation itself takes a fixed amount of time regardless of the message length, and thus, for sufficiently long messages, most of the time will be consumed by AES or the hash function, not by RSA itself. So when you say you want to "claim that computation load of these hashes is less than one RSA", I would say that's meaningless, at least unless you fixed a specific input length for your hash. (And if you did, my next question would be "what's so special about that particular input length?")

When using AES, is there a way to tell if data was encrypted using 128 or 256 bit keys?

I was wondering if there is some way to tell if data was encrypted with a specific key size, without the source code of course. Is there any detectable differences with the data that you can check post encryption?
No, there is not any way to do that. Both encrypt 16-byte blocks of data, and the resulting blocks would "look" the same after encryption is complete (they would have different values, but an analysis of only the encrypted data would not be able to determine the original key size). If the original data (plaintext) is available, some kind of analysis may be possible.
A very simplistic "proof" is:
For a given input, the length of the output is the same regardless of the key size. It may, however, differ depending on the mode (CBC, CTR, etc.).
Since the encryption is reversible, it can be considered to be a one-to-one function. In other words, a different input results in a different output.
Therefore, it is possible to produce any given output (by changing the plain text) regardless of the key size.
Thus, for a given password, you could end up with the same output by using the appropriate plain text regardless of the key size. This "proof" has a hole in that padding schemes can result in a longer output than input (so the function is not necessarily onto.) But I doubt this would make a difference in the end result.
If an encryption system is any good (AES is) then there should be no way to distinguish its raw output from random data -- so, in particular, there should be no way to distinguish between AES-128 and AES-256, at least on the output bits.
However, most protocols which use encryption end up including some metadata which designates, without ambiguity, the kind of algorithm that was used, including the key size. This is so that the receiver knows what to use to decrypt, and it is not considered to be an issue. So, in practice, one has to assume that whatever attacker looks at your system knows whether the key is actually a 128-bit or 256-bit key.
Some side channels may give that information, too. AES with a 256-bit key uses 14 rounds instead of the 10 used with a 128-bit key, making it roughly 40% slower: simply timing how long an encrypting server takes to respond can reveal the key size.
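The 40% figure comes straight from the round counts in the AES specification (FIPS 197); a quick sketch of the arithmetic:

```python
# AES round counts by key size, per FIPS 197. Cost per block scales
# roughly linearly with the number of rounds.
ROUNDS = {128: 10, 192: 12, 256: 14}

def slowdown(from_bits: int, to_bits: int) -> float:
    """Approximate relative cost of moving between key sizes,
    assuming time is proportional to round count."""
    return ROUNDS[to_bits] / ROUNDS[from_bits]
```

This is only an approximation: real implementations also differ in key-schedule cost and cache behavior.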

Error correcting key encryption

Say I have a scheme that derives a key from N different inputs. Each input on its own may not be secure (e.g. bad passwords), but in combination they are secure. The simple way to do this is to concatenate all of the inputs in order and use a hash of the result.
Now I want to allow key-derivation (or rather key-decryption) given only N-1 out of the N inputs. A simple way to do this is to generate a random key K, generate N temporary keys out of different N subsets of the input, each with one input missing (i.e. Hash(input_{1}, ..., input_{N-1}), Hash(input_{0}, input_{2}, ..., input_{N-1}), Hash(input_{0}, input_{1}, input_{3},..., input_{N-1}), ..., Hash(input_{0}, ..., input_{N-2})) then encrypt K with each of the N keys and store all of the results.
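The leave-one-out construction described above can be sketched as follows (function name and the length-prefixing are my own choices; the prefix avoids ambiguity when concatenating variable-length inputs):

```python
import hashlib

def leave_one_out_keys(inputs: list) -> list:
    """One derived key per subset that omits exactly one of the N
    inputs, as in the N-1-of-N scheme described above."""
    keys = []
    for omit in range(len(inputs)):
        h = hashlib.sha256()
        for i, part in enumerate(inputs):
            if i != omit:
                # Length-prefix each part so (b"ab", b"c") and
                # (b"a", b"bc") hash differently.
                h.update(len(part).to_bytes(4, "big") + part)
        keys.append(h.digest())
    return keys
```

The random key K is then encrypted once under each of these N derived keys and all N ciphertexts are stored.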
Now I want a generalized solution, where I can decrypt the key using K out of the N inputs. The naive way to expand the scheme above requires storing (N choose N-K) values, which quickly becomes infeasible.
Is there a good algorithm for this, that does not entail this much storage?
I have thought about a way to use something like Shamir's Secret Sharing Scheme, but cannot think of a good way, since the inputs are fixed.
Error Correcting Codes are the most direct way to deal with the problem. They are not, however, particularly easy to implement.
The best approach would be using a Reed Solomon Code. When you derive the password for the first time you also calculate the redundancy required by the code and store it. When you want to recalculate the key you use the redundancy to correct the wrong or missing inputs.
To encrypt / create:
Take the N inputs. Turn each into a block in a good/secure way. Use Reed-Solomon to generate M redundancy blocks from the N-block combination. You now have N+M blocks, of which you need only a total of N to regenerate the original N blocks.
Use the N blocks to encrypt or create a secure key.
If the first, store the encrypted key and the M redundancy blocks. If the second, store only the M redundancy blocks.
To decrypt / retrieve:
Take N - R correct input blocks, where R <= M. Combine them with the redundancy blocks you stored to recreate the original N blocks. Use the original N blocks to decrypt or recreate the secure key.
(Thanks to https://stackoverflow.com/users/492020/giacomo-verticale : This is essentially what he/she said, but I think a little more explicit / clearer.)
Shamir's secret sharing is a technique used when you want to split a secret into multiple shares such that only a combination of at least k parts reveals the initial secret. If you are not sure about the honesty of the initiator and want to verify it, you use verifiable secret sharing. Both are based on polynomial interpolation.
One approach would be to generate a purely random key (or derive it by hashing all of the inputs, if you want to avoid an RNG for some reason), split it using a k-of-n threshold scheme, and encrypt each share using the individual password inputs (e.g. send them through PBKDF2 with 100,000 iterations and then encrypt/MAC with AES-CTR/HMAC). This would require less storage than storing hash subsets: roughly N * (share size + salt size + MAC size).
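A sketch of the per-share wrapping step using only the standard library (an HMAC-based counter keystream stands in for AES-CTR, which Python's stdlib lacks; iteration count and key sizes are illustrative):

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, length: int) -> bytes:
    """Counter-mode keystream built from HMAC-SHA256 blocks
    (a stdlib stand-in for AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]

def wrap_share(share: bytes, password: bytes) -> dict:
    """Encrypt-then-MAC one secret share under one password input."""
    salt = os.urandom(16)
    km = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=64)
    enc_key, mac_key = km[:32], km[32:]
    ct = bytes(a ^ b for a, b in zip(share, _keystream(enc_key, len(share))))
    tag = hmac.new(mac_key, salt + ct, hashlib.sha256).digest()
    return {"salt": salt, "ct": ct, "tag": tag}

def unwrap_share(blob: dict, password: bytes) -> bytes:
    """Verify the MAC, then decrypt; raises on a wrong password."""
    km = hashlib.pbkdf2_hmac("sha256", password, blob["salt"], 100_000, dklen=64)
    enc_key, mac_key = km[:32], km[32:]
    tag = hmac.new(mac_key, blob["salt"] + blob["ct"], hashlib.sha256).digest()
    if not hmac.compare_digest(tag, blob["tag"]):
        raise ValueError("wrong password or corrupted share")
    return bytes(a ^ b for a, b in
                 zip(blob["ct"], _keystream(enc_key, len(blob["ct"]))))
```

Any k correctly unwrapped shares can then be fed to the threshold scheme's recombination step.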
Rather than simply allowing a few errors out of a large number of inputs, you should divide the inputs up into groups and allow some number of errors in each group. Allowing 4 errors out of 64 inputs would require (64 choose 4) = 635,376 encrypted keys, but if you break that up into two groups of 32, allowing two errors per group, you would only need 2 × (32 choose 2) = 992 encrypted keys.
Once you have decrypted the key information from each group, use that as input to decrypting the key that you ultimately want.
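Counting unordered subsets of inputs, the per-group storage requirement is a binomial coefficient; a quick sketch (math.comb needs Python 3.8+):

```python
from math import comb

def stored_keys(group_size: int, errors: int) -> int:
    """Encrypted keys stored per group: one per way of choosing
    which `errors` inputs are allowed to be missing."""
    return comb(group_size, errors)
```

Grouping helps because the binomial coefficient grows combinatorially in the group size.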
Also, the keys obtained from each group must not be trivial in comparison to the key that you ultimately want. Do not simply break a 256-bit key into 8 32-bit pieces; doing so would allow someone who could decrypt 7 of those pieces to brute-force the last one. If you want a 256-bit key, then you must work with 256-bit keys for the whole procedure.

Generating short license keys with OpenSSL

I'm working on a new licensing scheme for my software, based on OpenSSL public / private key encryption. My past approach, based on this article, was to use a large private key size and encrypt an SHA1 hashed string, which I sent to the customer as a license file (the base64 encoded hash is about a paragraph in length). I know someone could still easily crack my application, but it prevented someone from making a key generator, which I think would hurt more in the long run.
For various reasons I want to move away from license files and simply email a 16-character base32 string the customer can type into the application. Even using small private keys (which I understand are trivial to crack), it's hard to get the encrypted hash this small. Would there be any benefit to using the same strategy to generate an encrypted hash, but simply using the first 16 characters as the license key? If not, is there a better alternative that will create keys in the format I want?
DSA signatures are significantly shorter than RSA ones. A DSA signature is twice the size of the Q parameter; if you use the OpenSSL defaults, Q is 160 bits, so your signatures fit into 320 bits.
If you can switch to a base-64 representation (which requires the upper- and lower-case letters, the digits and two other symbols) then you will need 54 symbols, which you could do with 11 groups of 5. Not quite the 16 that you wanted, but still within the bounds of being user-enterable.
Actually, it occurs to me that you could halve the number of bits required in the license key. DSA signatures are made up of two numbers, R and S, each the size of Q. However, the R values can all be pre-computed by the signer (you) - the only requirement is that you never re-use them. So this means that you could precalculate a whole table of R values - say 1 million of them (taking up 20MB) - and distribute these as part of the application. Now when you create a license key, you pick the next un-used R value, and generate the S value. The license key itself only contains the index of the R value (needing only 20 bits) and the complete S value (160 bits).
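The index-plus-S packing could look like the following sketch (function names and the exact layout are my own assumptions, not part of the original scheme):

```python
import base64

def pack_license(r_index: int, s_value: int) -> str:
    """Pack a 20-bit R-table index and a 160-bit DSA S value
    (180 bits total) into a base32 license string."""
    assert 0 <= r_index < 2**20 and 0 <= s_value < 2**160
    blob = ((r_index << 160) | s_value).to_bytes(23, "big")
    return base64.b32encode(blob).decode().rstrip("=")

def unpack_license(key: str):
    """Recover (r_index, s_value) from a packed license string."""
    pad = "=" * (-len(key) % 8)  # restore base32 padding
    n = int.from_bytes(base64.b32decode(key + pad), "big")
    return n >> 160, n & (2**160 - 1)
```

At 37 base32 characters this is longer than the 16 characters asked for, but far shorter than a full RSA-sized blob.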
And if you're getting close to selling a million copies of the app - a nice problem to have - just create a new version with a new R table.
Did you consider using some existing protection + key generation scheme? I know that EXECryptor (I am not advertising it at all, this is just some info I remember) offers strong protection which, together with a complementary product from the same guys, StrongKey (if memory serves), offers short keys and protection against cracking. Armadillo is another product name that comes to mind, though I don't know what level of protection they offer now. But they also had short keys earlier.
In general, cryptographically strong short keys are based on aspects of ECC (elliptic curve cryptography). A large part of ECC is patented, and overall ECC is hard to implement correctly, so an industry solution is the preferred way to go.
Of course, if you don't need strong keys, you can go with just a hash of "secret word" (salt) + user name, and verify them in the application, but this is crackable in minutes.
Why use public key crypto? It gives you the advantage that nobody can reverse-engineer the executable to create a key generator, but key generators are a somewhat secondary risk compared to patching the executable to skip the check, which is generally much easier for an attacker, even with well-obfuscated executables.
Eugene's suggestion of using ECC is a good one - ECC keys are much shorter than RSA or DSA for a given security level.
However, 16 characters in base 32 is still only 5*16=80 bits, which is low enough that brute-forcing for valid keys might be practical, regardless of what algorithm you use.
