I've done some research into this, but I'm still not sure why this cannot be implemented. Provided we share an initial OTP, possibly via USB or some other physically secure method, surely we can include the next one in the messages that follow.
[Edit: More specifically, if I were to take a pad of double length and split it into x and y, then use x to encrypt the message and use y twice to encrypt the next pad, would that be insecure?]
You have to pair each bit of message with one bit of OTP, and there's a limited amount of OTP.
If you pair up all of the OTP bits with bits for the next OTP...
a b c d e ...
q w e r t ...
There's no room for a message. And if you keep spending your OTP transferring another OTP, there never will be room for a message.
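On the edit in the question (using y twice to carry the next pad), here is a minimal sketch of what an eavesdropper would see, assuming pads are byte strings and encryption is plain XOR. The names `x2`, `y2` and the helper `xor` are illustrative and not from the question.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(p ^ q for p, q in zip(a, b))

n = 16
y = os.urandom(n)                       # half of the current pad, reserved for key transport
x2, y2 = os.urandom(n), os.urandom(n)   # the two halves of the next pad

c1 = xor(x2, y)   # first use of y
c2 = xor(y2, y)   # second use of y

# An eavesdropper who never sees y can still cancel it out:
leak = xor(c1, c2)
print(leak == xor(x2, y2))  # True
```

The attacker learns x2 XOR y2, so the two halves of the next pad are no longer independent secrets, and that relation carries forward into everything encrypted with them.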
You can't compress the OTP, because the strength of the OTP is that it's completely random - that's what makes it impossible for codebreakers, because there's no pattern to latch onto.
Compression is a technology that works by finding patterns and replacing them with shorter "that large repetitive block goes here and here and there" signals - and by definition there are no patterns in complete randomness, so OTPs are not compressible.
If you can compress it a bit, you could do this, but it's not right to describe it as an OTP anymore: it's weak, and also massively wasteful of bandwidth. If you can compress it a lot, throw your random number generator away, it's terrible.
Quick test demonstration of the concept on a Linux machine:
$ dd if=/dev/urandom of=/tmp/test count=10k
-> 5MB file of randomness
$ bzip2 -k /tmp/test
-> 5.1MB file
$ gzip -k /tmp/test
-> 5.1MB file
(The -k flag keeps /tmp/test in place; without it, bzip2 replaces the file with /tmp/test.bz2 and the gzip step has nothing to read.)
Compressing a pad makes it bigger: bzip2 and gzip add their file format information and achieve nothing else.
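The same experiment can be sketched in Python with only the standard library. This assumes `os.urandom` as the random source and uses 1 MiB rather than 5 MB to keep it quick.

```python
import bz2
import os
import zlib

data = os.urandom(1024 * 1024)  # 1 MiB from the operating system's random source

for name, compress in (("zlib", zlib.compress), ("bz2", bz2.compress)):
    out = compress(data)
    # For random input the output is expected to be no smaller than the input.
    print(f"{name}: {len(data)} -> {len(out)} bytes")
```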
What makes a one-time pad strong is, in addition to the lack of a pattern, the fact that there is no way to tell whether the key used was the correct key. A message could be decrypted to reveal some "take over the world" scenario, but for any ciphertext of that length there is a key that reveals that exact same message, word for word. This means you could have the actual decrypted message and the correct key, yet it would be impossible to know that this is the case, because literally any message (and I do mean literally) of that length can be the result. Even rubber-hose decryption won't work: even if the person being "persuaded" gives the correct key, there's no way to be sure. It's even common practice for people to possess fake keys that decrypt messages to reveal something that isn't what an investigator is looking for, but would plausibly be something even a completely innocent person would hide. An OTP hiding confidential information could, for instance, have a fake key that reveals someone bad-mouthing their commanding officer.
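A minimal sketch of the fake-key idea, assuming an XOR-based pad over bytes. The two messages and the helper `xor` are made up for illustration.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(p ^ q for p, q in zip(a, b))

real = b"attack at dawn"
decoy = b"my boss stinks"          # must be the same length as the real message

key = os.urandom(len(real))        # the genuine one-time pad
ciphertext = xor(real, key)

# For ANY message of the same length there is a key that "decrypts" to it:
fake_key = xor(ciphertext, decoy)

print(xor(ciphertext, key))        # b'attack at dawn'
print(xor(ciphertext, fake_key))   # b'my boss stinks'
```

Nothing in the ciphertext distinguishes `key` from `fake_key`, which is the point made above.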
I need two-way (symmetric) encryption of distinct tokens. These tokens are expected to repeat (e.g. they are people's first names), but I do not want an attacker to be able to tell which encrypted tokens came from the same original token. Salting is the way to go for one-way cryptography (hashing).
Is there a method that can work in symmetric cryptography, a workaround or an alternative?
Yes. Properly used, symmetric encryption does not reveal anything about the plaintext, not even the fact that multiple plaintexts are the same.
Proper usage means choosing a mode of operation that uses an initialization vector (IV) or nonce (that is, not ECB), and choosing the IV appropriately (usually random bytes). Encrypting multiple plaintexts with the same key and IV lets an attacker spot equal plaintexts, pretty much as ECB mode does, and using a static IV is a common mistake.
As mentioned above, properly utilizing a symmetric encryption scheme would NOT reveal information about the plaintext. You mention the need to protect the users against a dictionary attack on the hidden tokens, and a properly utilized encryption scheme such as GCM would provide you with this property.
I recommend utilizing GCM mode as it is an efficient authenticated encryption scheme. Performing cryptographic functions on unauthenticated data may lead to security flaws, so an authenticated encryption scheme such as GCM is your best bet. Note that this encryption scheme, along with other CPA-secure schemes, will give you security against an adversary who wishes to learn the value of an encrypted token.
For example, in correctly implemented GCM mode, with a fresh nonce for every encryption, encrypting the same name twice results in different ciphertexts, i.e. GCM mode is non-deterministic.
Make sure to utilize a secure padding scheme and fix a length for the ciphertexts, so that an attacker can't use the length of a ciphertext to learn something about the contents that generated the token.
Be careful, however: you can't use hash functions and symmetric encryption schemes interchangeably, as they are created for very different purposes. Be careful with how you share the key, and remember that once an adversary has knowledge of the key, there is nothing random about the ciphertext.
-NOTE-
Using encryption incorrectly: if every user is utilizing the same key to encrypt their token, then they can simply decrypt everyone else's token and see the name that generated it.
To be safe, every user must encrypt with a different key so now you have to somehow store and manage the key for each user. This may be very painful and you have to be very careful with this.
However if you are utilizing salts and hash functions, then even if every user is utilizing the same salt to compute hash(name||salt), a malicious user would have to brute force all possible names with the salt to figure out what generated these tokens.
So take this into consideration and be careful, as hash functions and symmetric encryption schemes can't be used interchangeably.
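To make the hash(name||salt) case in the note concrete, here is a rough sketch assuming SHA-256 and a salt known to every user. The salt value, the names and the helper `token` are hypothetical.

```python
import hashlib

def token(name: str, salt: bytes) -> str:
    """hash(name || salt) as in the note, using SHA-256 (an assumption)."""
    return hashlib.sha256(name.encode() + salt).hexdigest()

salt = b"shared-salt"   # hypothetical value that every user knows
stored = [token(n, salt) for n in ("alice", "bob", "alice")]

# With one shared salt, equal names give equal tokens.
print(stored[0] == stored[2])   # True

# A malicious user who knows the salt hashes a list of candidate names.
candidates = ["alice", "bob", "carol", "dave"]
lookup = {token(n, salt): n for n in candidates}
print([lookup.get(t) for t in stored])   # ['alice', 'bob', 'alice']
```

For a small input space such as first names, the brute force mentioned above is a short loop, so the shared-salt hash hides little in practice.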
Assuming that the only items to be ciphered are the tokens (that is, they are not embedded in a larger data structure), then initialization vectors (IVs) are the way to go.
They are quite simple to understand: let M be your token, padded to fit the block size used by the symmetric ciphering algorithm (I'm assuming it's AES), and let IV be a random array of bits, also the size of the cipher block.
Then compute C = AES_ENCRYPT(M xor IV, K) where C is the ciphered data and K the symmetric key. That way, the same message M will not be ciphered the same way multiple times since IV is randomly obtained every time.
To decrypt M, just compute M = (AES_DECRYPT(C, K) xor IV).
Of course, both IV and K must be known at decryption time. The most usual way to transmit the IV is to just send it along with the ciphered text. This does not compromise security; it's pretty much like storing a salt value, since the encryption key will remain unknown to everybody else.
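The single-block scheme above can be sketched as follows. Python's standard library has no AES, so this sketch assumes a toy 16-byte Feistel permutation built from HMAC-SHA256 as a stand-in for AES_ENCRYPT and AES_DECRYPT. It is not a vetted cipher, and the zero padding is only for illustration. All function names are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes, the same block size as AES
HALF = BLOCK // 2

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Feistel round function: truncated HMAC-SHA256 of round number and half block.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]

def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    # Stand-in for AES_ENCRYPT on one block (toy construction, assumption).
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    # Undo the rounds in reverse order.
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right

def encrypt_token(token: bytes, key: bytes):
    if len(token) > BLOCK:
        raise ValueError("token must fit in one block")
    m = token.ljust(BLOCK, b"\x00")        # naive zero padding, illustration only
    iv = os.urandom(BLOCK)                 # fresh random IV for every encryption
    return iv, toy_encrypt_block(xor(m, iv), key)   # C = ENCRYPT(M xor IV, K)

def decrypt_token(iv: bytes, c: bytes, key: bytes) -> bytes:
    m = xor(toy_decrypt_block(c, key), iv)          # M = DECRYPT(C, K) xor IV
    return m.rstrip(b"\x00")

key = os.urandom(32)
iv1, c1 = encrypt_token(b"Alice", key)
iv2, c2 = encrypt_token(b"Alice", key)
print(c1 != c2)   # same token, different ciphertexts (barring an IV collision)
print(decrypt_token(iv1, c1, key), decrypt_token(iv2, c2, key))
```

The IV travels next to the ciphertext, much like a stored salt; only the key stays secret.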
Will any encryption scheme safely allow me to encrypt the same integer repeatedly, with different random material prepended each time? It seems like the kind of operation that might get me in hot water.
I want to prevent spidering of items at my web application, but still have persistent item IDs/URLs so content links don't expire over time. My security requirements aren't really high for this, but I'd rather not do something totally moronic that obviously compromises the secret.
// performed on each ID before transmitting item search results to the client
public int64 encryptWithRandomPadding(int32 id) {
    int32 randomPadding = getNextRandomInt32();
    // random padding in the high 32 bits, item ID in the low 32 bits
    return encrypt(((int64)randomPadding << 32) + id, SECRET);
}
// performed on an encrypted/padded ID for which the client requests details
public int32 decryptAndRemoveRandomPadding(int64 idToDecrypt) {
    int64 idWithPadding = decrypt(idToDecrypt, SECRET);
    // the cast keeps only the low 32 bits, which discards the padding
    return (int32)idWithPadding;
}
static readonly string SECRET = "thesecret";
Generated IDs/URLs are permanent, the encrypted IDs are sparsely populated (less than 1 in uint32.Max are unique, and I could add another constant padding to reduce the likelihood of a guess existing), and the client may run the same search and get the same results with different representative IDs each time. I think it meets my requirements, unless there's a blatant cryptographic issue.
Example:
encrypt(rndA + item1) -> tokenA
encrypt(rndB + item1) -> tokenB
encrypt(rndC + item2) -> tokenC
encrypt(rndD + item175) -> tokenD
Here, there is no way to identify that tokenA and tokenB both point to identical items; this prevents a spider from removing duplicate search results without retrieving them (while retrieving increments the usage meter). Additionally, item2 may not exist.
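A small sketch of just the packing step from the question, assuming a big-endian 8-byte layout with the random padding in the high half. The cipher is deliberately left out: whatever 64-bit block cipher `encrypt` stands for would be applied to the returned block. `pack_id` and `unpack_id` are hypothetical names.

```python
import secrets
import struct

def pack_id(item_id: int) -> bytes:
    """Build the 8-byte plaintext block: 32 random bits, then the signed 32-bit ID."""
    padding = secrets.randbits(32)
    return struct.pack(">Ii", padding, item_id)

def unpack_id(block: bytes) -> int:
    """Drop the random padding and return the ID."""
    _padding, item_id = struct.unpack(">Ii", block)
    return item_id

a, b = pack_id(175), pack_id(175)
print(unpack_id(a), unpack_id(b))   # 175 175
print(a[4:] == b[4:])               # True: the ID half is identical before encryption
```

Only the padding half differs between the two blocks, so hiding the relationship between tokenA and tokenB rests entirely on the block cipher mixing all 64 bits.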
Knowing that re-running a search will return the same int32 padded multiple ways with the same secret, can I do this safely with any popular crypto algorithms? Thanks, crypto experts!
note: this is a follow-up to a question that didn't work out as I'd hoped: Encrypt integer with a secret and shared salt
If your encryption is secure, then random padding makes cracking neither easier nor harder. For a message this short, a single block long, either everything is compromised or nothing is. Even with a stream cipher, you'd still need the key to get any further; the point of good encryption is that you don't need extra randomness. Zero padding or other known messages at least a block long at the beginning are obviously to be avoided if possible, but that's not the issue here. It's pure noise, and once someone discovered that, they'd just skip ahead and start cracking from there.
Now, in a stream cipher, you can add all the randomness in the beginning and the later bytes will still be the same with the same key, don't forget that. This only actually does anything at all for a block cipher, otherwise you'd have to xor the random bits into the real value to get any use out of it.
However, you might be better off using a MAC as padding: with proper encryption, the encrypted MAC won't give any information away, but it looks semi-random, and you can use it to verify that there were no errors or malicious changes during decryption. Any hash function you like can produce the check value, even a simple CRC-32, without giving anything away after encryption.
(A cryptographer might find a way to shave a bit or two off due to the relatedness, given tons of plaintexts and prior knowledge of how they were related, but that's still far beyond practicality.)
As you asked before, you can safely throw in an unencrypted salt in front of every message; a salt can only compromise an encrypted value if the implementation is broken or the key compromised, as long as the salt is properly mixed into the key, particularly if you can mix it into the expanded key schedule before decryption. Modern hash algorithms with lots of bits are really good at that, but even mixing into a regular input key will always have the same security as the key alone.
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data. This seems highly unlikely, if not impossible to me, so sorry if it's a stupid question.
My final question is, say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data.
No. There are algorithms that use a different key to decrypt than to encrypt, but a keyless method would rely on secrecy of the algorithm, generally regarded as a poor idea.
My final question is, say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
Most likely yes, especially given the key. A good crypto algorithm relies on the secrecy of the key, and the key alone. See Kerckhoffs's principle.
Also, if a common algorithm is used, it would be a simple matter of trial and error, and besides, ciphertext is often accompanied by metadata which tells you algorithm details.
edit: as per comments, you may be thinking of digital signature (which requires a secret only on the sender side), a hash algorithm (which requires no key but isn't encryption), or a zero-knowledge proof (which can prove knowledge of a secret without revealing it).
Abstractly, we can think of the encryption system this way:
               -------------------
plaintext ---> | algorithm & key | ---> ciphertext
               -------------------
The system must guarantee the following:
decrypt(encrypt(plaintext, algorithm, key), algorithm, key) = plaintext
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data.
Yes, in such a system the key is redundant; all the "secrecy" lies in the algorithm.
My final question is, say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
In practice, you'll probably have a small space of algorithms, so a simple brute-force search is feasible. However, there may be more than one algorithm that fits the given information. Consider the following example:
We define the following encryption and decryption operations, where plaintext, ciphertext, algorithm, and key are real numbers (assume algorithm is nonzero):
encrypt(plaintext, algorithm, key) = algorithm x (plaintext + key) = ciphertext
decrypt(ciphertext, algorithm, key) = ciphertext/algorithm - key = plaintext
Now, suppose that plaintext + key = 0. We have ciphertext = 0 for any choice of algorithm. Hence, we cannot deduce the algorithm used.
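The toy system above can be checked numerically. This sketch uses exact fractions in place of real numbers to avoid rounding. The specific values are arbitrary.

```python
from fractions import Fraction

def encrypt(plaintext, algorithm, key):
    return algorithm * (plaintext + key)

def decrypt(ciphertext, algorithm, key):
    return ciphertext / algorithm - key

algorithms = (Fraction(2), Fraction(5), Fraction(-1, 3))   # any nonzero values

# Round trip: decrypt(encrypt(p)) returns p for every choice of algorithm.
p, k = Fraction(7), Fraction(3)
for a in algorithms:
    assert decrypt(encrypt(p, a, k), a, k) == p

# Degenerate case: plaintext + key = 0 gives ciphertext 0 whatever the algorithm.
p, k = Fraction(4), Fraction(-4)
print({encrypt(p, a, k) for a in algorithms})   # {Fraction(0, 1)}
```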
First off, I would like to ask if any of you know of an encryption algorithm that uses a key to encrypt the data, but no key to decrypt the data.
What are you getting at? It's trivial to come up with a pair of functions that fits the letter of the specification, but without knowing the intent it's hard to give a more helpful answer.
say you have access to the plain text data before it is encrypted, the key used to encrypt the plain text data, and the resulting encrypted data, would figuring out which algorithm used to encrypt the data be feasible?
If the algorithm is any good the output will be indistinguishable from random noise, so there is no analytic solution to this. As a practical matter, there are only so many trusted algorithms in wide use. Trying each one in turn would be quick, but would be complicated by the fact that an implementation has some freedom with regard to things like byte order (little-endian vs big-endian), key derivation (if you had a pass-phrase instead of the actual cryptographic key itself), encryption modes and padding.
As frankodwyer points out, this situation is not part of usual threat models. This would work in your favor, as it makes it more likely that the algorithm is a well-known one.
The best you could do without a known key in the decoder would be to add a bit of obscurity. For example, if the first step of the decode algorithm is to strip out everything except for every tenth character, then your encode key may be used to seed some random garbage for nine out of every ten characters. Thus, with different keys you could achieve different encoded results which would be decoded to the same message, with no key necessary for the decoder.
However, this does not add much real security and should not be relied on by itself to protect crucial data. I'm just describing a case where it would be possible, so yes, I suppose it could be done if you were just trying to prove a point or add one more level of security.
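A rough sketch of the every-tenth-character scheme described above, assuming the real character sits last in each group of ten and the key seeds Python's (non-cryptographic) `random.Random`. `encode` and `decode` are hypothetical names.

```python
import random
import string

def encode(message: str, key: int) -> str:
    """Hide each real character behind nine garbage characters seeded by the key."""
    rng = random.Random(key)
    out = []
    for ch in message:
        out.append("".join(rng.choice(string.ascii_letters) for _ in range(9)))
        out.append(ch)
    return "".join(out)

def decode(encoded: str) -> str:
    """Keyless decoder: keep only every tenth character."""
    return encoded[9::10]

a = encode("hello", key=1)
b = encode("hello", key=2)
print(a != b)                 # different keys give different encodings
print(decode(a), decode(b))   # hello hello
```

Anyone who learns the decode rule can read every message, which is why this is obscurity rather than encryption.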
I don't believe that there is such an algorithm that would use a key to encrypt, but not to decrypt. (Silly answers like a 26 character Caesar cipher aside...)
To your second question, yes; it just depends on how much time you're willing to spend on it. In theoretical cryptography it is assumed that the algorithm can always be determined, whether through theft of the algorithm or of a physical machine or, as in your case, from a plaintext and ciphertext pair.
What is the difference between Obfuscation, Hashing, and Encryption?
Here is my understanding:
Hashing is a one-way algorithm; cannot be reversed
Obfuscation is similar to encryption but doesn't require any "secret" to understand (ROT13 is one example)
Encryption is reversible but a "secret" is required to do so
Hashing is a technique of creating semi-unique keys based on larger pieces of data. With any given hash you will eventually have "collisions" (i.e. two different pieces of data calculating to the same hash value), and when they become a problem, you typically move to a larger hash size.
Obfuscation generally involves trying to remove helpful clues (i.e. meaningful variable/function names), removing whitespace to make things hard to read, and generally doing things in convoluted ways to make following what's going on difficult. It provides no serious level of security like "true" encryption would.
Encryption can follow several models, one of which is the "secret" method, called private-key (symmetric) encryption, where both parties share the same secret key. Public-key encryption uses a publicly shared key to encrypt and a private key held by the recipient to decrypt. With public-key encryption, only the recipient needs to hold a secret.
That's a high level explanation. I'll try to refine them:
Hashing - in a perfect world, it's a random oracle. For the same input X, you always receive the same output Y, which is in NO WAY related to X. This is mathematically impossible (or at least unproven to be possible). The closest we get is one-way functions: H(X) = Y, where computing H^-1(Y) = X is so difficult that you're better off trying to brute force a Z such that H(Z) = Y.
Obfuscation (my opinion) - any function f such that f(a) = b, where you rely on f being secret. f may be a hash function, but the "obfuscation" part implies security through obscurity. If you never saw ROT13 before, it'd be obfuscation.
Encryption - E_k(X) = Y, D_l(Y) = X, where E and D are known to everyone. k and l are keys; they may be the same (in symmetric encryption, they are the same). Y is the ciphertext, X is the plaintext.
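To illustrate the ROT13 remark, here is a small sketch using the `rot_13` text codec from Python's standard library. The point is that the function is its own inverse and takes no key, so knowing f is all it takes to read the data.

```python
import codecs

def rot13(text: str) -> str:
    """f(a) = b with no key: rotate each ASCII letter by 13 places."""
    return codecs.encode(text, "rot_13")

hidden = rot13("Hello, World")
print(hidden)          # Uryyb, Jbeyq
print(rot13(hidden))   # Hello, World
```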
A hash is a one way algorithm used to compare an input with a reference without compromising the reference.
It is commonly used in logins to compare passwords, and you can see a similar idea on your receipt if you pay by credit card. There you will find your credit card number with some digits hidden; this way you can prove with high probability that your card was used to buy the stuff, while someone searching through your garbage won't be able to find the number of your card.
A very naive and simple hash is "The first 3 letters of a string".
That means the hash of "abcdefg" will be "abc". This function obviously cannot be reversed, which is the entire purpose of a hash. However, note that "abcxyz" will have exactly the same hash; this is called a collision. So again: a hash only proves with a certain probability that the two compared values are the same.
Another very naive and simple hash is a number modulo 5; here you will see that 6, 11, 16 etc. all have the same hash: 1.
Modern hash algorithms are designed to keep the number of collisions as low as possible, but they can never be completely avoided. A rule of thumb is: the longer your hash is, the fewer collisions it has.
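The two naive hashes from this answer, written out as a quick sketch. The function names `first3` and `mod5` are made up here.

```python
def first3(s: str) -> str:
    """Naive hash: the first 3 letters of a string."""
    return s[:3]

def mod5(n: int) -> int:
    """Naive hash: the number modulo 5."""
    return n % 5

print(first3("abcdefg"), first3("abcxyz"))   # abc abc  (a collision)
print([mod5(n) for n in (6, 11, 16)])        # [1, 1, 1]
```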
Obfuscation in cryptography is encoding the input data before it is hashed or encrypted.
This makes brute force attacks less feasible, as it gets harder to determine the correct cleartext.
That's not a bad high-level description. Here are some additional considerations:
Hashing typically reduces a large amount of data to a much smaller size. This is useful for verifying the contents of a file without having to have two copies to compare, for example.
Encryption involves storing some secret data, and the security of the secret data depends on keeping a separate "key" safe from the bad guys.
Obfuscation is hiding some information without a separate key (or with a fixed key). In this case, keeping the method a secret is how you keep the data safe.
From this, you can see how a hash algorithm might be useful for digital signatures and content validation, how encryption is used to secure your files and network connections, and why obfuscation is used for Digital Rights Management.
This is how I've always looked at it.
Hashing is deriving a value from another, using a set algorithm. Depending on the algorithm used, this may or may not be one-way.
Obfuscating is making something harder to read by symbol replacement.
Encryption is like hashing, except the value is dependent on another value you provide to the algorithm.
A brief answer:
Hashing - creating a check field on some data (to detect when data is modified). This is a one-way function and the original data cannot be derived from the hash. Typical standards for this are SHA-1, SHA-256 etc.
Obfuscation - modify your data/code to confuse anyone else (no real protection). This may or may not lose some of the original data. There are no real standards for this.
Encryption - using a key to transform data so that only those with the correct key can understand it. The encrypted data can be decrypted to obtain the original data. Typical standards are DES, TDES, AES, RSA etc.
All fine, except obfuscation is not really similar to encryption - sometimes it doesn't even involve ciphers as simple as ROT13.
Hashing is a one-way task of creating one value from another. The algorithm should try to create a value that is as short and as unique as possible.
Obfuscation is making something unreadable without changing semantics. It involves value transformation, removing whitespace, etc. Some forms of obfuscation can also be one-way, so it's impossible to get the starting value back.
Encryption is two-way, and there's always some decryption working the other way around.
So, yes, you are mostly correct.
Obfuscation is hiding or making something harder to understand.
Hashing takes an input, runs it through a function, and generates an output that can be a reference to the input. It is not necessarily unique, a function can generate the same output for different inputs.
Encryption transforms the input into an output in a unique manner. There is a one-to-one correlation so there is no potential loss of data or confusion - the output can always be transformed back to the input with no ambiguity.
Obfuscation is merely making something harder to understand by introducing techniques to confuse someone. Code obfuscators usually do this by renaming things to remove anything meaningful from variable or method names. It's not similar to encryption in that nothing has to be decrypted to be used.
Typically, the difference between hashing and encryption is that hashing just applies a fixed formula to translate the data into another form, whereas encryption uses a formula that requires key(s) to encrypt and decrypt. Neither Base64 nor MD5 is encryption, though: Base64 is a keyless encoding that anyone can reverse, and MD5 is a keyless hash whose output cannot be decoded at all, with or without a key.
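A short sketch of how the two functions named above behave in Python's standard library: Base64 can be reversed by anyone without a key, while MD5 produces a fixed-length digest that has no decode operation and takes no key.

```python
import base64
import hashlib

data = b"hello"

# Base64: a keyless, fully reversible encoding.
encoded = base64.b64encode(data)
print(encoded)                     # b'aGVsbG8='
print(base64.b64decode(encoded))   # b'hello'

# MD5: a keyless, fixed-length digest; the input cannot be decoded from it.
digest = hashlib.md5(data).hexdigest()
print(digest)                      # 5d41402abc4b2a76b9719d911017c592
```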