Related
I tried to find the complexity for AES (Advanced Encryption Algorithm) but sometime I got constant time and sometime polynomial. Can anyone explain the exact time and space complexity for it?
I want to compare a hash function and a RSA encryption with another parameter.
I have an algorithm with some hash function and I want to claim that computation load of these hashes is less than one RSA.
Can I say compare them with multiplication parameter, for example how many multiplication each of them has?
How can I compare them in communication load? How can I say that what the length of output in RSA is?
It sounds like you're trying to compare apples and oranges.
A hash function is generally expected to accept arbitrarily long inputs, and the time needed to compute it should generally scale linearly with the length of the input. Thus, a useful measure of hash function performance would be, say, "megabytes per second".
(Specifically, that would be a measure of throughput, which is the relevant measure when hashing long inputs. For short messages, a more relevant measure is the latency, which is basically the minimum time needed to hash zero-length input. Given the throughput and the latency, one can generally calculate a fairly good approximation of the time needed to hash an input of any given length as time = latency + length / throughput.)
RSA, on the other hand, can only encrypt messages shorter than the modulus, which is chosen at the time the key is generated. (Typical modulus sizes might be, say, from 1024 to 4096 bits.) To "encrypt a long message with RSA" one would normally use hybrid encryption: first encrypt the message using a symmetric cipher like AES, using a suitable mode of operation and a randomly chosen key, and then encrypt the AES key with RSA.
The same length limits apply to signing messages with RSA — by itself, RSA can only sign messages shorter than the modulus. The standard workaround in this case is to first hash the message, and then sign the hash value. (There's also a lot of important details like padding involved that I'm not going to go into here, since we're not on crypto.SE, but which are absolutely crucial for security.)
The point is that, in both cases, the RSA operation itself takes a fixed amount of time regardless of the message length, and thus, for sufficiently long messages, most of the time will be consumed by AES or the hash function, not by RSA itself. So when you say you want to "claim that computation load of these hashes is less than one RSA", I would say that's meaningless, at least unless you fixed a specific input length for your hash. (And if you did, my next question would be "what's so special about that particular input length?")
This isn’t a specific question with a specific solution; but it’s rather a response to the fact that I can’t find any good Stack Overflow qestions about how to choose a good a hashing function for hash tables and similar tasks.
So! Let’s talk hash functions, and how to choose one. How should a programming noob, who needs to choose a good hash function for their specific task, go about choosing one? When is the simple and quick Fowler-Noll-Vo appropriate? When should they vendor in MurmurHash3 instead? Do you have any links to good resources on comparing the various options?
The hash function for hash tables should have these two properties
Uniformity all outputs of H() should be evenly distributed as much as possible. In other words the for 32-bit hash function the probability for every output should be equal to 1/2^32. (for n-bit it should be 1/2^n). With uniform hash function the chance of collision is minimized to lowest possible for any possible input.
Low computational cost Hash functions for tables are expected to be FAST, compared to cryptographic hash functions where speed is traded for preimage resistance (eg it is hard to find the message from given hash value) and collision resistance.
For purposes of hash tables all cryptographic functions are BAD choice, since the computational cost is enormous. Because hashing here is used not for security but for fast access. MurmurHash is considered one of the fastest and uniform functions suitable for big hash tables or hash indexes. For small tables a trivial hash function should be OK. A trivial hash is where we mix values of object (by multiplication, addition and subtraction with some prime).
If your hash keys are strings (or other variable-length data) you might look at this paper by Ramakrishna and Zobel. They benchmark a few classes of hashing functions (for speed and low collisions) and exhibit a class that is better than the usual Bernstein hashes.
I'm in the process of designing an encryption algorithm. The algorithm is symmetric (single key).
How do you measure an algorithms strength in terms of bits? Is the key length the strength of the algorithm?
EDIT:
Lesson 1: Don't design an encryption algorithm, AES and others are
designed and standardized by academics for a reason
Lesson 2: An encryption algorithms strength is not measured in bits, key sizes are. An algorithm's strength is determined by its design. In general, an algorithm using a larger key size is harder to brute-force, and thus stronger.
First of all, is this for anything serious? If it is, stop right now. Don't do it. Designing algorithms is one of the hardest things in the world. Unless you have years and years of experience breaking ciphers, you will not design anything remotely secure.
AES and RSA serve two very different purposes. The difference is more than just signing. RSA is a public key algorithm. We use it for encryption, key exchange, digital signatures. AES is a symmetric block cipher. We use it for bulk encryption. RSA is very slow. AES is very fast. Most modern cryptosystems use a hybrid approach of using RSA for key exchange, and then AES for the bulk encryption.
Typically when we say "128-bit strength", we mean the size of the key. This is incredibly deceptive though, in that there is much more to the strength of an algorithm than the size of it's key. In other words, just because you have a million bit key, it means nothing.
The strength of an algorithm, is defined both in terms of it's key size, as well as it's resistance to cryptanalytic attacks. We say an algorithm is broken if there exists an attack better than brute force.
So, with AES and a 128-bit key, AES is considered "secure" if there is no attack that less than 2^128 work. If there is, we consider it "broken" (in an academic sense). Some of these attacks (for your searching) include differential cryptanalysis, linear cryptanalysis, and related key attacks.
How we brute force an algorithm also depends on it's type. A symmetric block cipher like AES is brute forced by trying every possible key. For RSA though, the size of the key is the size of the modulus. We don't break that by trying every possible key, but rather factoring. So the strength of RSA then is dependent on the current state of number theory. Thus, the size of the key doesn't always tell you it's actual strength. RSA-128 is horribly insecure. Typically RSA key sizes are 1024-bits+.
DES with a 56-bit key is stronger than pretty much EVERY amateur cipher ever designed.
If you are interested in designing algorithms, you should start by breaking other peoples. Bruce Schenier has a self-study course in cryptanalysis that can get you started: http://www.schneier.com/paper-self-study.html
FEAL is one of the most broken ciphers of all time. It makes for a great starting place of learning block cipher cryptanalysis. The source code is available, and there are countless published papers on it, so you can always "look up the answer" if you get stuck.
You can compare key lengths for the same algorithm. Between algorithms it does not make too much sense.
If the algorithm is any good (and it would be very hard to prove that for something homegrown), then it gets more secure with a longer key size. Adding one bit should (again, if the algorithm is good) double the effort it takes to brute-force it (because there are now twice as many possible keys).
The more important point, though, is that this only works for "good" algorithms. If your algorithm is broken (i.e. it can be decrypted without trying all the keys because of some design flaws in it), then making the key longer probably does not help much.
If you tell me you have invented an algorithm with a 1024-bit key, I have no way to judge if that is better or worse than a published 256-bit algorithm (I'd err on the safe side and assume worse).
If you have two algorithms in your competition, telling the judge the key size is not helping them to decide which one is better.
Oh man, this is a really difficult problem. One is for sure - key length shows nothing about encryption algorithm strength.
I can only think of two measures of encryption algorithm strength:
Show your algorithm to professional cryptanalyst. Algorithm strength will be proportional to the time cryptanalyst has taken to break your encryption.
Strong encryption algorithms makes encrypted data look pretty much random. So - measure randomness of your encrypted data. Algorithm strength should be proportional to encrypted data randomness degree. Warning - this criteria is just for playing arround, doesn't shows real encryption scheme strength !
So real measure is first, but with second you can play around for fun.
Assuming the algorithm is sound and that it uses the entire key range...
Raise the number of unique byte values for each key byte to the power of the number of bytes.
So if you are using only ASCII characters A-Z,a-z,0-9, that's 62 unique values - a 10 byte key using these values is 62^10. If you are using all 256 values, 0x00 - 0xFF, a 10 byte key is 256^10 (or 10 * 8 bits per byte = 2 ^ 80).
"Bits of security" is defined by NIST (National Institute of Standards and Technology), in:
NIST SP 800-57 Part 1, section 5.6.1 "Comparable Algorithm Strengths".
Various revisions of SP 800-57 Part 1 from NIST:
http://csrc.nist.gov/publications/PubsSPs.html#800-57-part1
Current version:
http://csrc.nist.gov/publications/nistpubs/800-57/sp800-57_part1_rev3_general.pdf
The "strength" is defined as "the amount of work needed to “break the algorithms”", and 5.6.1 goes on to describe that criterion at some length.
Table 2, in the same section, lays out the "bits of security" achieved by different key sizes of various algorithms, including AES, RSA, and ECC.
Rigorously determining the relative strength of a novel algorithm will require serious work.
My quick and dirty definition is "the number of bits that AES would require to have the same average cracking time". You can use any measure you like for time, like operations, wall time, whatever. If yours takes as long to crack as a theoretical 40-bit AES message would (2^88 less time than 128-bit AES), then it's 40 bits strong, regardless of whether you used 64,000 bit keys.
That's being honest, and honestly is hard to find in the crypto world, of course. For hilarity, compare it to plain RSA keys instead.
Obviously it's in no way hard and fast, and it goes down every time someone finds a better crack, but that's the nature of an arbitrary "strength-in-terms-of-bits" measure. Strength-in-terms-of-operations is a much more concrete measure.
I understand how standard random number generators work. But when working with crytpography, the random numbers really have to be random.
I know there are instruments that read cosmic white noise to help generate secure hashes, but your standard PC doesn't have this.
How does a cryptographically secure random number generator get its values with no repeatable patterns?
A cryptographically secure number random generator, as you might use for generating encryption keys, works by gathering entropy - that is, unpredictable input - from a source which other people can't observe.
For instance, /dev/random(4) on Linux collects information from the variation in timing of hardware interrupts from sources such as hard disks returning data, keypresses and incoming network packets. This approach is secure provided that the kernel does not overestimate how much entropy it has collected. A few years back the estimations of entropy from the various different sources were all reduced, making them far more conservative. Here's an explanation of how Linux estimates entropy.
None of the above is particularly high-throughput. /dev/random(4) probably is secure, but it maintains that security by refusing to give out data once it can't be sure that that data is securely random. If you want to, for example, generate a lot of cryptographic keys and nonces then you'll probably want to resort to hardware random number generators.
Often hardware RNGs are designed about sampling from the difference between a pair of oscillators that are running at close to the same speed, but whose rates are varied slightly according to thermal noise. If I remember rightly, the random number generator that's used for the UK's premium bond lottery, ERNIE, works this way.
Alternate schemes include sampling the noise on a CCD (see lavaRND), radioactive decay (see hotbits) or atmospheric noise (see random.org, or just plug an AM radio tuned somewhere other than a station into your sound card). Or you can directly ask the computer's user to bang on their keyboard like a deranged chimpanzee for a minute, whatever floats your boat.
As andras pointed out, I only thought to talk about some of the most common entropy gathering schemes. Thomas Pornin's answer and Johannes Rössel's answer both do good jobs of explaining how one can go about mangling gathered entropy in order to hand bits of it out again.
For cryptographic purposes, what is needed is that the stream shall be "computationally indistinguishable from uniformly random bits". "Computationally" means that it needs not be truly random, only that it appears so to anybody without access to God's own computer.
In practice, this means that the system must first gather a sequence of n truly random bits. n shall be large enough to thwart exhaustive search, i.e. it shall be infeasible to try all 2^n combinations of n bits. This is achieved, with regards to today's technology, as long as n is greater than 90-or-so, but cryptographers just love powers of two, so it is customary to use n = 128.
These n random bits are obtained by gathering "physical events" which should be unpredictable, as far as physics are concerned. Usually, timing is used: the CPU has a cycle counter which is updated several billions times per second, and some events occur with an inevitable amount of jitter (incoming network packets, mouse movements, key strokes...). The system encodes these events and then "compresses" them by applying a cryptographically secure hash function such as SHA-256 (output is then truncated to yield our n bits). What matters here is that the encoding of the physical events has enough entropy: roughly speaking, that the said events could have collectively assumed at least 2^n combinations. The hash function, by its definition, should make a good job at concentrating that entropy into a n-bit string.
Once we have n bits, we use a PRNG (Pseudo-Random Number Generator) to crank out as many bits as necessary. A PRNG is said to be cryptographically secure if, assuming that it operates over a wide enough unknown n-bit key, its output is computationally indistinguishable from uniformly random bits. In the 90's, a popular choice was RC4, which is very simple to implement, and quite fast. However, it turned out to have measurable biases, i.e. it was not as indistinguishable as was initially wished for. The eSTREAM Project consisted in gathering newer designs for PRNG (actually stream ciphers, because most stream ciphers consist in a PRNG, which output is XORed with the data to encrypt), documenting them, and promoting analysis by cryptographers. The eSTREAM Portfolio contains seven PRNG designs which were deemed secure enough (i.e. they resisted analysis and cryptographers tend to have a good understanding of why they resisted). Among them, four are "optimized for software". The good news is that while these new PRNG seem to be much more secure than RC4, they are also noticeably faster (we are talking about hundreds of megabytes per second, here). Three of them are "free for any use" and source code is provided.
From a design point of view, PRNG reuse much of the elements of block ciphers. The same concepts of avalanche and diffusion of bits into a wide internal state are used. Alternatively, a decent PRNG can be built from a block cipher: simply use the n-bit sequence as key into a block cipher, and encrypt successive values of a counter (expressed as a m-bit sequence, if the block cipher uses m-bit blocks). This produces a pseudo-random stream of bits which is computationally indistinguishable from random, as long as the block cipher is secure, and the produced stream is no longer than m*2^(m/2) bits (for m = 128, this means about 300 billions of gigabytes, so that's big enough for most purposes). That kind of usage is known as counter mode (CTR).
Usually, a block cipher in CTR mode is not as fast as a dedicated stream cipher (the point of the stream cipher is that, by forfeiting the flexibility of a block cipher, better performance is expected). However, if you happen to have one of the most recent CPU from Intel with the AES-NI instructions (which are basically an AES implementation in hardware, integrated in the CPU), then AES with CTR mode will yield unbeatable speed (several gigabytes per second).
First of all, the point of a cryptographically secure PRNG is not to generate entirely unpredictable sequences. As you noted, the absence of something that generates large volumes of (more or less) true randomness1 makes that impossible.
So you resort to something which is only hard to predict. “Hard” meaning here that it takes unfeasibly long by which time whatever it was necessary for would be obsolete anyway. There are a number of mathematical algorithms that play a part in this—you can get a glimpse if you take some well-known CSPRNGs and look at how they work.
The most common variants to build such a PRNG are:
Using a stream cipher, which already outputs a (supposedly secure) pseudo-random bit stream.
Using a block cipher in counter mode
Hash functions on a counter are also sometimes used. Wikipedia has more on this.
General requirements are just that it's unfeasible to determine the original initialization vector from a generator's bit stream and that the next bit cannot be easily predicted.
As for initialization, most CSPRNGs use various sources available on the system, ranging from truly random things like line noise, interrupts or other events in the system to other things like certain memory locations, &c. The initialization vector is preferrably really random and not dependent on a mathematical algorithm. This initialization was broken for some time in Debian's implementation of OpenSSL which led to severe security problems.
1 Which has its problems too and one has to be careful in eliminating bias as things such as thermal noise has different characteristics depending on the temperature—you almost always have bias and need to eliminate it. And that's not a trivial task in itself.
In order for a random number generator to be considered cryptographically secure, in needs to be secure against attack by an adversary who knows the algorithm and a (large) number of previously generated bits. What this means is that someone with that information can't reconstruct any of the hidden internal state of the generator and give predictions of what the next bits produced will be with better than 50% accuracy.
Normal pseudo-random number generators are generally not cryptographically secure, as reconstructing the internal state from previously output bits is generaly trivial (often, the entire internal state is just the last N bits produced directly). Any random number generator without good statistical properties is also not cryptographically secure, as its output is at least party predictable even without knowing the internal state.
So, as to how they work, any good crypto system can be used as a cryptographically secure random number generator -- use the crypto system to encrypt the output of a 'normal' random number generator. Since an adversary can't reconstruct the plaintext output of the normal random number generator, he can't attack it directly. This is a somewhat circular definition an begs the question of how you key the crypto system to keep it secure, which is a whole other problem.
Each generator will use its own seeding strategy, but here's a bit from the Windows API documentation on CryptGenRandom
With Microsoft CSPs, CryptGenRandom uses the same random number
generator used by other security components. This allows numerous
processes to contribute to a system-wide seed. CryptoAPI stores an
intermediate random seed with every user. To form the seed for the
random number generator, a calling application supplies bits it might
have—for instance, mouse or keyboard timing input—that are then
combined with both the stored seed and various system data and user
data such as the process ID and thread ID, the system clock, the
system time, the system counter, memory status, free disk clusters,
the hashed user environment block. This result is used to seed the
pseudorandom number generator (PRNG).
In Windows Vista with Service Pack 1 (SP1) and later, an
implementation of the AES counter-mode based PRNG specified in NIST
Special Publication 800-90 is used. In Windows Vista, Windows Storage
Server 2003, and Windows XP, the PRNG specified in Federal Information
Processing Standard (FIPS) 186-2 is used. If an application has access
to a good random source, it can fill the pbBuffer buffer with some
random data before calling CryptGenRandom. The CSP then uses this data
to further randomize its internal seed. It is acceptable to omit the
step of initializing the pbBuffer buffer before calling
CryptGenRandom.