I have to implement a simple hashing algorithm.
Input data:
Value (16-bit integer).
Key (any length).
Output data:
6-bit hash (number 0-63).
Requirements:
It should be practically impossible to predict the hash value if you only have the input value but not the key. More specifically: if I know hash(x) for all x < M, it should be hard to predict hash(M) without knowing the key.
Possible solutions:
Keep the full mapping as the key. Then the key has length 2^16*6 bits, which is too long for my case.
A linear code, where the key is a generator matrix of length 16*6 bits. But it's easy to recover the generator matrix from a few known hash values.
Are there any other possibilities?
An HMAC seems to be what you want. So a possibility for you would be to use a SHA-based HMAC and just use a substring of the resulting hash. This should be relatively safe, since the bits of a cryptographic hash should be as independent and unpredictable as possible.
Depending on your environment, this could however take too much processing time, so you might have to choose a simpler hash function to construct your HMAC.
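For illustration, here is a minimal sketch of the truncated-HMAC idea in Python (the function name, the 2-byte big-endian encoding of the value, and the choice of SHA-256 are my own assumptions, not part of the answer):

```python
import hashlib
import hmac

def hash6(key: bytes, value: int) -> int:
    """6-bit keyed hash of a 16-bit value: HMAC-SHA256, truncated."""
    mac = hmac.new(key, value.to_bytes(2, "big"), hashlib.sha256).digest()
    return mac[0] & 0x3F  # keep the low 6 bits of the first byte -> 0..63
```

Assuming HMAC-SHA256 behaves like a pseudorandom function, someone without the key can do no better than guessing among the 64 possible outputs.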
Original answer that the discussion in the comments is based on:
Since you can forget about cryptographic properties anyway (it is trivial to find collisions via brute-force attacks on a 5-bit hash), you might as well use something like a CRC or a Hamming code and get error detection for free.
Mensi's suggestion to use a truncated HMAC is a good one, but if you do happen to be on a highly constrained system and want something faster or simpler, you could take any block cipher, encrypt your 16-bit value (padded to a full block) with it and truncate the result to 6 bits.
Unlike HMAC, which computes a pseudorandom function, a block cipher is a pseudorandom permutation — every input maps to a different output. However, when you throw away all but six bits of the block cipher's output, what remains will look very much like a pseudorandom function. There will be a very tiny bias against repeated outputs, but (assuming that the block cipher's block size is much larger than 6 bits, which it should be) it'll be so small as to be all but undetectable.
A good block cipher choice for very low-end systems might be TEA or its successors XTEA and XXTEA. While there are some known attacks on these ciphers, they all require much more extensive access to the cipher than should be possible in your application.
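As an illustration of the encrypt-and-truncate approach, here is a sketch using XTEA in pure Python (the zero-padding layout and the function names are assumptions for the example, not a vetted implementation):

```python
import struct

def xtea_encrypt_block(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    """Encrypt one 64-bit block with XTEA (16-byte key, 32 cycles = 64 Feistel rounds)."""
    v0, v1 = struct.unpack(">2L", block)
    k = struct.unpack(">4L", key)
    delta, mask, total = 0x9E3779B9, 0xFFFFFFFF, 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & mask
        total = (total + delta) & mask
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & mask
    return struct.pack(">2L", v0, v1)

def hash6_blockcipher(key: bytes, value: int) -> int:
    """6-bit keyed hash: pad the 16-bit value to one block, encrypt, truncate."""
    block = struct.pack(">HHL", value, 0, 0)  # 2-byte value zero-padded to 8 bytes
    return xtea_encrypt_block(key, block)[0] & 0x3F
```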
Related
I want to compare a hash function and RSA encryption using some common parameter.
I have an algorithm that uses several hash computations, and I want to claim that the computational load of these hashes is less than that of one RSA operation.
Can I compare them by counting multiplications, for example how many multiplications each of them requires?
How can I compare them in terms of communication load? How can I determine the length of the output in RSA?
It sounds like you're trying to compare apples and oranges.
A hash function is generally expected to accept arbitrarily long inputs, and the time needed to compute it should generally scale linearly with the length of the input. Thus, a useful measure of hash function performance would be, say, "megabytes per second".
(Specifically, that would be a measure of throughput, which is the relevant measure when hashing long inputs. For short messages, a more relevant measure is the latency, which is basically the minimum time needed to hash zero-length input. Given the throughput and the latency, one can generally calculate a fairly good approximation of the time needed to hash an input of any given length as time = latency + length / throughput.)
RSA, on the other hand, can only encrypt messages shorter than the modulus, which is chosen at the time the key is generated. (Typical modulus sizes might be, say, from 1024 to 4096 bits.) To "encrypt a long message with RSA" one would normally use hybrid encryption: first encrypt the message using a symmetric cipher like AES, using a suitable mode of operation and a randomly chosen key, and then encrypt the AES key with RSA.
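A minimal sketch of that hybrid pattern, assuming the pyca/cryptography package (key sizes and parameters below are illustrative choices, not a recommendation):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

message = b"a long message ..." * 10_000

# Bulk encryption with a freshly generated AES key.
aes_key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
ciphertext = AESGCM(aes_key).encrypt(nonce, message, None)

# Only the short AES key goes through RSA (OAEP padding); its cost is fixed,
# regardless of how long the message is.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(aes_key, oaep)
```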
The same length limits apply to signing messages with RSA — by itself, RSA can only sign messages shorter than the modulus. The standard workaround in this case is to first hash the message, and then sign the hash value. (There's also a lot of important details like padding involved that I'm not going to go into here, since we're not on crypto.SE, but which are absolutely crucial for security.)
The point is that, in both cases, the RSA operation itself takes a fixed amount of time regardless of the message length, and thus, for sufficiently long messages, most of the time will be consumed by AES or the hash function, not by RSA itself. So when you say you want to "claim that computation load of these hashes is less than one RSA", I would say that's meaningless, at least unless you fixed a specific input length for your hash. (And if you did, my next question would be "what's so special about that particular input length?")
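To make that concrete, here is a toy calculation using the latency/throughput approximation above against a fixed per-operation RSA cost (all figures are hypothetical):

```python
def hash_time(length_bytes: float, latency_s: float = 1e-6,
              throughput_Bps: float = 500e6) -> float:
    """Estimated hashing time: time = latency + length / throughput."""
    return latency_s + length_bytes / throughput_Bps

RSA_OP_TIME = 2e-3  # made-up fixed cost of one private-key RSA operation

for length in (64, 1_000_000, 1_000_000_000):
    print(length, hash_time(length), RSA_OP_TIME)
# For short inputs the hash is far cheaper than one RSA operation; for very
# long inputs the hashing time dominates and the fixed RSA cost is negligible.
```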
I looked into the implementation of SHA2 in Python and it looks like it uses some default key for hashing. Once the key is known and the digest is known, is it possible to get the plain-text back? (pre-image attack without brute-force) http://en.wikipedia.org/wiki/Preimage_attack
My intuition says NO, as the block size (input size) is 512 bits and the output size is 256 bits. This means that to be a good (cryptographic) hash function, the function should be many-to-one (non-invertible). {This is the exact opposite of the requirement for a block cipher, where the function should be invertible (one-to-one).}
As far as I understood the requirement is to have random many-to-one function!
It's impossible, even with brute force, to uniquely recover the plaintext for a given hash, because there are many texts that map to the same N-bit digest (for any value of N). That is, there are different messages that have the same digest value.
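To see the many-to-one behaviour concretely, here is a small experiment that brute-forces a collision on a deliberately truncated SHA-256 (truncation to 16 bits is only there to make a collision findable in milliseconds; it says nothing about full SHA-256):

```python
import hashlib

def truncated_sha256(data: bytes, bits: int = 16) -> int:
    """The first `bits` bits of SHA-256, as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

seen = {}
i = 0
while True:
    msg = str(i).encode()
    h = truncated_sha256(msg)
    if h in seen and seen[h] != msg:
        print(f"collision: {seen[h]!r} and {msg!r} both map to {h:#06x}")
        break
    seen[h] = msg
    i += 1
```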
I'm in the process of designing an encryption algorithm. The algorithm is symmetric (single key).
How do you measure an algorithm's strength in terms of bits? Is the key length the strength of the algorithm?
EDIT:
Lesson 1: Don't design an encryption algorithm; AES and others are designed and standardized by academics for a reason.
Lesson 2: An encryption algorithm's strength is not measured in bits; key sizes are. An algorithm's strength is determined by its design. In general, an algorithm using a larger key size is harder to brute-force, and thus stronger.
First of all, is this for anything serious? If it is, stop right now. Don't do it. Designing algorithms is one of the hardest things in the world. Unless you have years and years of experience breaking ciphers, you will not design anything remotely secure.
AES and RSA serve two very different purposes. The difference is more than just signing. RSA is a public key algorithm. We use it for encryption, key exchange, digital signatures. AES is a symmetric block cipher. We use it for bulk encryption. RSA is very slow. AES is very fast. Most modern cryptosystems use a hybrid approach of using RSA for key exchange, and then AES for the bulk encryption.
Typically when we say "128-bit strength", we mean the size of the key. This is incredibly deceptive, though, in that there is much more to the strength of an algorithm than the size of its key. In other words, just because you have a million-bit key, it means nothing.
The strength of an algorithm is defined both in terms of its key size and its resistance to cryptanalytic attacks. We say an algorithm is broken if there exists an attack better than brute force.
So, with AES and a 128-bit key, AES is considered "secure" if there is no attack that takes less than 2^128 work. If there is, we consider it "broken" (in an academic sense). Some of these attacks (for your searching) include differential cryptanalysis, linear cryptanalysis, and related-key attacks.
How we brute-force an algorithm also depends on its type. A symmetric block cipher like AES is brute-forced by trying every possible key. For RSA, though, the size of the key is the size of the modulus. We don't break that by trying every possible key, but rather by factoring, so the strength of RSA depends on the current state of number theory. Thus, the size of the key doesn't always tell you its actual strength. RSA-128 is horribly insecure. Typical RSA key sizes are 1024 bits and up.
DES with a 56-bit key is stronger than pretty much EVERY amateur cipher ever designed.
If you are interested in designing algorithms, you should start by breaking other people's. Bruce Schneier has a self-study course in cryptanalysis that can get you started: http://www.schneier.com/paper-self-study.html
FEAL is one of the most broken ciphers of all time. It makes for a great starting place of learning block cipher cryptanalysis. The source code is available, and there are countless published papers on it, so you can always "look up the answer" if you get stuck.
You can compare key lengths for the same algorithm. Between algorithms it does not make too much sense.
If the algorithm is any good (and it would be very hard to prove that for something homegrown), then it gets more secure with a longer key size. Adding one bit should (again, if the algorithm is good) double the effort it takes to brute-force it (because there are now twice as many possible keys).
The more important point, though, is that this only works for "good" algorithms. If your algorithm is broken (i.e. it can be decrypted without trying all the keys because of some design flaws in it), then making the key longer probably does not help much.
If you tell me you have invented an algorithm with a 1024-bit key, I have no way to judge if that is better or worse than a published 256-bit algorithm (I'd err on the safe side and assume worse).
If you have two algorithms in your competition, telling the judge the key size is not helping them to decide which one is better.
Oh man, this is a really difficult problem. One thing is for sure: key length by itself says nothing about an encryption algorithm's strength.
I can only think of two measures of encryption algorithm strength:
Show your algorithm to a professional cryptanalyst. The algorithm's strength will be proportional to the time the cryptanalyst takes to break your encryption.
Strong encryption algorithms make encrypted data look pretty much random. So measure the randomness of your encrypted data; the algorithm's strength should be proportional to that degree of randomness. Warning: this criterion is just for playing around and doesn't show the real strength of an encryption scheme!
So the real measure is the first, but you can play around with the second for fun.
Assuming the algorithm is sound and that it uses the entire key range...
Raise the number of unique byte values for each key byte to the power of the number of bytes.
So if you are using only the ASCII characters A-Z, a-z, 0-9, that's 62 unique values; a 10-byte key using these values gives 62^10 possibilities. If you are using all 256 values, 0x00-0xFF, a 10-byte key gives 256^10 (or, at 8 bits per byte, 2^80).
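Expressed in bits, that calculation looks like this (a small helper added purely for illustration):

```python
import math

def key_space_bits(alphabet_size: int, key_length_bytes: int) -> float:
    """Effective key-space size in bits: log2(alphabet_size ** key_length)."""
    return key_length_bytes * math.log2(alphabet_size)

print(key_space_bits(62, 10))   # ~59.5 bits for A-Z, a-z, 0-9
print(key_space_bits(256, 10))  # 80.0 bits for full 0x00-0xFF bytes
```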
"Bits of security" is defined by NIST (National Institute of Standards and Technology), in:
NIST SP 800-57 Part 1, section 5.6.1 "Comparable Algorithm Strengths".
Various revisions of SP 800-57 Part 1 from NIST:
http://csrc.nist.gov/publications/PubsSPs.html#800-57-part1
Current version:
http://csrc.nist.gov/publications/nistpubs/800-57/sp800-57_part1_rev3_general.pdf
The "strength" is defined as "the amount of work needed to “break the algorithms”", and 5.6.1 goes on to describe that criterion at some length.
Table 2, in the same section, lays out the "bits of security" achieved by different key sizes of various algorithms, including AES, RSA, and ECC.
Rigorously determining the relative strength of a novel algorithm will require serious work.
My quick and dirty definition is "the number of bits that AES would require to have the same average cracking time". You can use any measure you like for time: operations, wall time, whatever. If yours takes as long to crack as a theoretical 40-bit AES message would (2^88 times less than 128-bit AES), then it's 40 bits strong, regardless of whether you used 64,000-bit keys.
That's being honest, and honesty is hard to find in the crypto world, of course. For hilarity, compare it to plain RSA keys instead.
Obviously it's in no way hard and fast, and it goes down every time someone finds a better crack, but that's the nature of an arbitrary "strength-in-terms-of-bits" measure. Strength-in-terms-of-operations is a much more concrete measure.
When using AES (or probably most any cipher), it is bad practice to reuse an initialization vector (IV) for a given key. For example, suppose I encrypt a chunk of data with a given IV using cipher block chaining (CBC) mode. For the next chunk of data, the IV should be changed (e.g., the nonce might be incremented or something). I'm wondering, though, (and mostly out of curiosity) how much of a security risk it is if the same IV is used if it can be guaranteed that the first four bytes of the chunks are incrementing. In other words, suppose two chunks of data to be encrypted are:
0x00000000someotherdatafollowsforsomenumberofblocks
0x00000001someotherdatathatmaydifferormaynotfollows
If the same IV is used for both chunks of data, how much information would be leaked?
In this particular case, it's probably OK (but don't do it, anyway). The "effective IV" is your first encrypted block, which is guaranteed to be different for each message (as long as the nonce truly never repeats under the same key), because the block cipher operation is a bijection. It's also not predictable, as long as you change the key at the same time as you change the "IV", since even with fully predictable input the attacker should not be able to predict the output of the block cipher (block cipher behaves as a pseudo-random function).
It is, however, very fragile. Someone who is maintaining this protocol long after you've moved on to greener pastures might well not understand that the security depends heavily on that non-repeating nonce, and could "optimise" it out. Is sending that single extra block each message for a real IV really an overhead you can't afford?
Mark,
what you describe is pretty much what is proposed in Appendix C of NIST SP800-38a.
In particular, there are two ways to generate an IV:
Generate a new IV randomly for each message.
For each message, use a new unique nonce (this may be a counter), encrypt the nonce, and use the result as the IV.
The second option looks very similar to what you are proposing.
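A sketch of that second option, assuming AES via the pyca/cryptography package (the counter encoding and function name are illustrative):

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)

def iv_from_nonce(key: bytes, counter: int) -> bytes:
    """Encrypt a unique 16-byte counter block to obtain an unpredictable IV."""
    nonce_block = counter.to_bytes(16, "big")
    # ECB on a single block is just one raw AES block encryption here.
    encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return encryptor.update(nonce_block) + encryptor.finalize()

iv = iv_from_nonce(key, 1)  # use as the CBC IV for message number 1
```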
Well, that depends on the block size of the encryption algorithm. For the usual block sizes of 8 or 16 bytes (64 or 128 bits), I don't think it would make any difference: the first bits would be the same across messages before entering the block cipher, but the result would not have any recognisable pattern. For block sizes < 4 bytes (which I don't think occur in practice) it would make a difference, because the first block(s) would always be the same, leaking information about patterns. Just my opinion.
Edit: I found this:
"For CBC and CFB, reusing an IV leaks some information about the first block of plaintext, and about any common prefix shared by the two messages."
Source: lectures at my university :)
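The quoted statement is easy to demonstrate: with a reused key and IV in CBC mode, messages sharing a prefix produce identical leading ciphertext blocks. A small sketch, assuming the pyca/cryptography package:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, iv = os.urandom(16), os.urandom(16)

def cbc_encrypt(data: bytes) -> bytes:
    """AES-CBC with a deliberately reused key and IV (for demonstration only)."""
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(data) + enc.finalize()

m1 = b"0000000000000000" + b"secret payload A"   # two 16-byte blocks each
m2 = b"0000000000000000" + b"secret payload B"

c1, c2 = cbc_encrypt(m1), cbc_encrypt(m2)
print(c1[:16] == c2[:16])  # True: the shared first block is visible to an observer
print(c1[16:] == c2[16:])  # False: blocks diverge once the plaintexts differ
```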
What is the difference between Obfuscation, Hashing, and Encryption?
Here is my understanding:
Hashing is a one-way algorithm; cannot be reversed
Obfuscation is similar to encryption but doesn't require any "secret" to understand (ROT13 is one example)
Encryption is reversible but a "secret" is required to do so
Hashing is a technique for creating semi-unique keys based on larger pieces of data. With any given hash you will eventually have "collisions" (i.e. two different pieces of data producing the same hash value), and when you do, you typically move to a larger hash size.
Obfuscation generally involves trying to remove helpful clues (i.e. meaningful variable/function names), removing whitespace to make things hard to read, and generally doing things in convoluted ways to make following what's going on difficult. It provides no serious level of security the way "true" encryption would.
Encryption can follow several models, one of which is the "secret" method, called private-key encryption, where both parties have a secret key. Public-key encryption instead uses a shared public key to encrypt and a private recipient key to decrypt; with public keys, only the recipient needs to have a secret.
That's a high level explanation. I'll try to refine them:
Hashing - in a perfect world, it's a random oracle. For the same input X, you always receive the same output Y, which is in NO WAY related to X. This is mathematically impossible (or at least unproven to be possible). The closest we get are one-way functions: H(X) = Y where computing H^-1(Y) = X is so difficult that you're better off trying to brute-force a Z such that H(Z) = Y.
Obfuscation (my opinion) - any function f such that f(a) = b, where you rely on f being kept secret. f may be a hash function, but the "obfuscation" part implies security through obscurity. If you had never seen ROT13 before, it would be obfuscation.
Encryption - E_k(X) = Y, D_l(Y) = X, where E is known to everyone. k and l are keys; they may be the same (in symmetric encryption, they are the same). Y is the ciphertext, X is the plaintext.
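A small illustration of that notation, using hashlib for H and the Fernet recipe from the pyca/cryptography package for a symmetric E_k/D_k pair (my choice of primitives, purely for the example):

```python
import hashlib
from cryptography.fernet import Fernet

X = b"plaintext"

Y_hash = hashlib.sha256(X).hexdigest()  # H(X) = Y: there is no D to get X back

k = Fernet.generate_key()               # symmetric case, so k = l
f = Fernet(k)
Y_enc = f.encrypt(X)                    # E_k(X) = Y
assert f.decrypt(Y_enc) == X            # D_k(Y) = X
```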
A hash is a one-way algorithm used to compare an input with a reference without compromising the reference.
It is commonly used in logins to compare passwords, and you can also find it on your receipt if you shop using a credit card. There you will find your credit card number with some digits hidden; this way you can prove with high probability that your card was used to buy the stuff, while someone searching through your garbage won't be able to find the number of your card.
A very naive and simple hash is "The first 3 letters of a string".
That means the hash of "abcdefg" will be "abc". This function obviously cannot be reversed, which is the entire purpose of a hash. However, note that "abcxyz" will have exactly the same hash; this is called a collision. So again: a hash only proves with a certain probability that the two compared values are the same.
Another very naive and simple hash is a number modulo 5; here you will see that 6, 11, 16, etc. will all have the same hash: 1.
Modern hash algorithms are designed to keep the number of collisions as low as possible, but they can never be completely avoided. A rule of thumb is: the longer your hash is, the fewer collisions it has.
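The two naive hashes above, written out (just for illustration):

```python
def first3(s: str) -> str:
    """Naive hash: the first three letters of a string."""
    return s[:3]

def mod5(n: int) -> int:
    """Naive hash: a number modulo 5."""
    return n % 5

print(first3("abcdefg"), first3("abcxyz"))  # 'abc' 'abc' -> a collision
print(mod5(6), mod5(11), mod5(16))          # 1 1 1       -> all collide
```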
Obfuscation in cryptography is encoding the input data before it is hashed or encrypted.
This makes brute force attacks less feasible, as it gets harder to determine the correct cleartext.
That's not a bad high-level description. Here are some additional considerations:
Hashing typically reduces a large amount of data to a much smaller size. This is useful for verifying the contents of a file without having to have two copies to compare, for example.
Encryption involves storing some secret data, and the security of the secret data depends on keeping a separate "key" safe from the bad guys.
Obfuscation is hiding some information without a separate key (or with a fixed key). In this case, keeping the method a secret is how you keep the data safe.
From this, you can see how a hash algorithm might be useful for digital signatures and content validation, how encryption is used to secure your files and network connections, and why obfuscation is used for Digital Rights Management.
This is how I've always looked at it.
Hashing is deriving a value from another, using a set algorithm. Depending on the algorithm used, this may or may not be one-way.
Obfuscating is making something harder to read by symbol replacement.
Encryption is like hashing, except the value is dependent on another value you provide to the algorithm.
A brief answer:
Hashing - creating a check field on some data (to detect when data is modified). This is a one way function and the original data cannot be derived from the hash. Typical standards for this are SHA-1, SHA256 etc.
Obfuscation - modify your data/code to confuse anyone else (no real protection). This may or may not lose some of the original data. There are no real standards for this.
Encryption - using a key to transform data so that only those with the correct key can understand it. The encrypted data can be decrypted to obtain the original data. Typical standards are DES, TDES, AES, RSA etc.
All fine, except that obfuscation is not really similar to encryption - sometimes it doesn't even involve a cipher as simple as ROT13.
Hashing is the one-way task of creating one value from another. The algorithm should try to create a value that is as short and as unique as possible.
Obfuscation is making something unreadable without changing the semantics. It involves value transformation, removing whitespace, etc. Some forms of obfuscation can also be one-way, so it's impossible to get back the starting value.
Encryption is two-way; there's always some decryption working the other way around.
So, yes, you are mostly correct.
Obfuscation is hiding or making something harder to understand.
Hashing takes an input, runs it through a function, and generates an output that can serve as a reference to the input. It is not necessarily unique; a function can generate the same output for different inputs.
Encryption transforms the input into an output in a unique manner. There is a one-to-one correlation so there is no potential loss of data or confusion - the output can always be transformed back to the input with no ambiguity.
Obfuscation is merely making something harder to understand by introducing techniques to confuse someone. Code obfuscators usually do this by renaming things to remove anything meaningful from variable or method names. It's not similar to encryption in that nothing has to be decrypted to be used.
Typically, the difference between hashing and encryption is that hashing is a keyless one-way transformation of the data, whereas encryption uses key(s) to encrypt and decrypt. For example, MD5 is a hash algorithm and AES is an encryption algorithm: anyone can recompute the MD5 hash of some data, but you can't decrypt AES-encrypted data without the key. (Base64, by contrast, is neither; it is just an encoding that anyone can reverse.)