How does being able to factor large numbers determine the security of popular encryption algorithms? - encryption

How is the encryption algorithm's security dependent on factoring large numbers?
For example, I've read on some math-programming forums that by using the Quadratic Sieve or the General Number Field Sieve, one can factor a 256 bit number with relative ease on commercially available hardware.
How does this translate to being able to break the security of algorithms such as RSA, AES, etc? Is being able to factor numbers the length of the key enough?
Is there anyone knowledgeable in cryptography and encryption algorithms who could shed a little light on it?

RSA, the public-key algorithm, relies on number theory: specifically, on the fact that multiplying two large primes is easy while factoring the resulting product back into those primes is hard. That asymmetry is what separates the public key from the private key.
Here's a question on Yahoo answers where someone has given some detail: http://answers.yahoo.com/question/index?qid=20070125183948AALJ40l
It relies on a few facts:
n = p*q is easy to calculate but hard to reverse.
Fermat's little theorem: http://en.wikipedia.org/wiki/Fermat%27s_little_theorem
Various results of number theory.
It is not factoring large numbers in general that is difficult; it is factoring a large number whose only factors are two large primes, because finding those primes is difficult.
A quick search through my bookmarks gives me this: the mathematical guts of rsa encryption if you're interested in how it works. Also, some explanation here too - just re-read my num-theory notes to be clear.
n = p*q gives you a large number, given p, q prime.
phi(n) = (p-1)(q-1). This comes from Euler's totient function, the generalization of Fermat's little theorem. More on why we need this and why it works on my blog here: http://vennard.org.uk/weblog/2010/02/a-full-explanation-of-the-rsa-algorithm/
This means that if we choose a number E coprime (no common prime factors) to (p-1)(q-1), we can find E's inverse mod phi(n).
Which we do: we find D such that D*E ≡ 1 (mod (p-1)(q-1)), solving with the extended version of Euclid's greatest common divisor algorithm.
Now, given all of the above, if we take T^E mod (p*q) we get the ciphertext C. However, if we take (T^E)^D mod (p*q) we get T back again.
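To make that flow concrete, here is a minimal sketch in Python of textbook RSA with tiny illustrative primes (the variable names are mine; real RSA uses primes hundreds of digits long plus padding such as OAEP):

    # Textbook RSA with tiny primes, purely to illustrate the math above.
    # Real RSA uses primes hundreds of digits long and padding (OAEP/PSS).
    from math import gcd

    p, q = 61, 53                  # two (tiny) primes
    n = p * q                      # public modulus, n = 3233
    phi = (p - 1) * (q - 1)        # phi(n) = 3120

    E = 17                         # public exponent, must be coprime to phi(n)
    assert gcd(E, phi) == 1

    D = pow(E, -1, phi)            # private exponent: D*E = 1 mod phi(n),
                                   # i.e. the extended Euclidean algorithm
                                   # (modular inverse needs Python 3.8+)

    T = 65                         # plaintext, as a number smaller than n
    C = pow(T, E, n)               # encryption:  C = T^E mod n
    assert pow(C, D, n) == T       # decryption: (T^E)^D mod n gives T back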
AES isn't the same - it isn't public-key cryptography. AES takes one key and uses it in both directions, encryption and decryption. The process is deliberately very difficult to undo without the key, rather like a hash but designed to be reversible. It does not, however, rely on factoring large numbers into primes for its security; it relies entirely on the strength of the key and on the infeasibility of deducing the key from the algorithm, or from known plaintext plus the algorithm.
Wikipedia has a good article on AES for high level with a good link that shows you how it works - see here and here. I particularly like the latter link.

How is the encryption algorithm's security dependent on factoring large numbers?
The missing phrase is "public-key", as in "How is the public key encryption algorithm's security..."
In modern cryptography there are two major categories of ciphers: symmetric (secret-key) and public-key (which uses a public/private key pair).
Within each category, you will find the key sizes relatively close. For public-key systems like RSA and DH/DSA, both used in OpenPGP e-mail encryption, common key sizes are 1024 bits and larger these days (early 2010). This has to do with the mathematical requirements of keys suitable for encrypting and decrypting messages. For RSA, in short, it is many times easier to generate two random large prime numbers and multiply them together than it is to factor a very large number that has no small factors. As you've discovered, factoring very large numbers is the "problem", i.e. the approach needed to break RSA via brute force.
Diffie-Hellman / Digital Signature Algorithm (DH/DSA) are based on a different mathematical problem, calculating discrete logarithms.
Due to the properties of public and private key pairs, the search space is limited to the large prime factors of very large numbers, and such factors are incredibly sparse, so it makes sense to be far more intelligent than simply trying every possible factor of a very large number.
With symmetric ciphers like AES, RC6, RC4, Twofish, DES and Triple-DES it is different: these algorithms use a random key of a given bit length, and any non-trivial random key is suitable (0x000...000, say, may be a poor choice). So for these systems, if there is no attack against the algorithm itself, you can simply brute-force your way through the key space (i.e. try all 2^256 possible keys for a 256-bit key) to decrypt a message without the secret key. Since any key is suitable, the full 2^256 key space is in play.
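To make "simply brute-force your way through the key space" concrete, here is a hedged Python sketch (it assumes the third-party pycryptodome package for AES, and it cheats by leaving only 16 bits of the key unknown - a full 128- or 256-bit search is utterly infeasible):

    # Brute-forcing a symmetric key by exhaustive search (toy version).
    # Assumes pycryptodome (pip install pycryptodome). We pretend 14 of the
    # 16 key bytes are known, so only 2^16 candidates remain; a real 128-bit
    # key would require up to 2^128 trial decryptions.
    from Crypto.Cipher import AES

    known_prefix = b"\x00" * 14
    secret_key = known_prefix + b"\xBE\xEF"
    plaintext = b"attack at dawn!!"            # one known 16-byte block
    ciphertext = AES.new(secret_key, AES.MODE_ECB).encrypt(plaintext)

    for guess in range(2 ** 16):               # search the unknown 16 bits
        key = known_prefix + guess.to_bytes(2, "big")
        if AES.new(key, AES.MODE_ECB).encrypt(plaintext) == ciphertext:
            print("recovered key:", key.hex())
            break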
I'm ignoring Quantum Computing (theoretical and practical), mainly because a) I can't give a solid answer, and b) it represents a large paradigm shift that potentially turns much of the applied mathematics and computational-complexity theory behind cryptography on its head, so even a basic understanding is still a moving target. Oh, and most of my enemies don't have a quantum computer yet. :)
I hope that explains the general difference between the two types of crypto systems, such as RSA and AES.
Sidebar: Cryptography is a rich and complex topic. The basics may be simple enough to understand, and even to write a naive ("textbook") implementation, but the subtle requirements of a secure implementation make it best for programmers who are not cryptography experts to use high-level crypto-systems, including well-known standard protocols, to improve the chances that the cryptography is not the exploitable flaw in the system.

AES is much different. AES is an SPN, a substitution-permutation network. Its S-boxes (substitution boxes) are fixed tables derived from inversion in the finite field GF(2^8), and the data is run through 10, 12 or 14 rounds of byte-level substitution and bit-level permutation, with the bit length of the key determining the number of rounds and, via the key schedule, the round keys.
RSA is based on factoring products of large prime numbers, which is extremely hard to do computationally, even though producing such a product (and encrypting with it) is quite easy.

RSA is broken by factoring. Actually, RSA is two algorithms, one for (asymmetric) encryption and one for digital signatures; both use the same primitive. In RSA, there is a public value (the modulus, often noted n) which is a product of two (or more) distinct prime factors. Factoring n reveals the private key. Factoring becomes harder when the size of n increases. The current record (published earlier this year) is for a 768-bit integer; it took four years of big computing and hard work by very smart people. The same people openly admit that they have little clue of how they could try the same stunt on a 1024-bit integer (there is a part of the best known factorization algorithm which requires an awful lot of fast RAM, and for a 1024-bit integer that would require a ludicrously huge machine). Current recommendations of key length for RSA are 1024 bits for short term, 2048 bits for long term security. Note that computational cost of RSA increases with key size as well, so we do not want to use really big keys without a good reason. A basic PC will produce about 1000 RSA signatures per second (and per core) with a 1024-bit key, and eight times less with a 2048-bit key. This is still quite good.
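As a sketch of why factoring n is equivalent to recovering the private key, here is a toy Python illustration (tiny numbers of my own choosing; a real modulus is 1024+ bits):

    # Why factoring n breaks RSA: with p and q in hand, the private
    # exponent d follows immediately. Toy numbers, for illustration only.
    n, e = 3233, 17               # the public key (a real n is 1024+ bits)

    p, q = 61, 53                 # the result of factoring n (the hard step)
    assert p * q == n

    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)           # the "private" exponent, now recovered

    c = pow(65, e, n)             # a ciphertext made with the public key...
    print(pow(c, d, n))           # ...decrypts to 65 using the recovered d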
There are other asymmetric encryption algorithms, and digital signature algorithms. Somewhat related to RSA is the Rabin-Williams encryption algorithm; factoring also breaks it. Then there are algorithms based on discrete logarithm (in the multiplicative group of numbers modulo a big prime): Diffie-Hellman (key exchange), DSA (signature), El Gamal (encryption)... for these algorithms, factorization is no direct threat; but they rely on the same parts of number theory, and the best known algorithm for discrete logarithm is very similar to the best known algorithm for factorization (and has the same name: GNFS -- as General Number Field Sieve). So it is suspected that an advance in factorization would result from an advance in number theory which would be likely to shed some light on discrete logarithm as well.
The discrete logarithm algorithms can be applied to other groups, the most popular being elliptic curves. Elliptic curves are not impacted by factorization. If factorization became easy, thus scrapping RSA and indirectly jeopardizing DSA and Diffie-Hellman, then we would switch to ECDH and ECDSA; standards and implementations exist and are deployed.
"Symmetric cryptography", i.e. hash functions (MD5, SHA-256...), authentication code (HMAC, CBC-MAC...), symmetric encryption (AES, 3DES...), random number generation (RC4...) and related activities, are totally unaffected by factorization. For these algorithms, keys are mere bunches of bits, without any special structure; there is nothing to factor.

Related

Why do the keys for asymmetric encryption algorithms such as RSA have to be so much longer than those for typical symmetric encryption algorithms such as AES?

I'm reading that it is because RSA has to do with math (prime numbers) while symmetric-key encryption is about taking blocks of data and modifying the blocks with replacements and remappings, but I still don't understand why asymmetric encryption has to have longer keys because of that, or whether that's even the reason.
For symmetric ciphers, the cipher strength depends on the key length assuming the cipher is not broken.
Asymmetric encryption is based on a trapdoor function (not necessarily built on prime numbers; there are others as well, such as elliptic curves or lattices). It should be a one-way function (for encryption) whose inverse (decryption) is very difficult to compute without some kind of secret. So the strength of an asymmetric cipher depends on the key length and also on how difficult it is to compute the function's inverse at that key length.
Example: breaking a 128-bit symmetric key would mean testing 2^128 keys. Factoring a 128-bit RSA modulus (the way to invert RSA without the private key) takes far less work than that, because we are solving a structured math problem rather than searching for a random key, so a much longer key is needed to reach the same level of security.
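As a rough illustration of that gap, here is a hedged pure-Python sketch: Pollard's rho factors a ~63-bit modulus in on the order of sqrt(p) steps (tens of thousands of iterations), while a 63-bit symmetric key would genuinely need up to 2^63 trial decryptions:

    # Factoring a ~63-bit RSA-style modulus takes vastly fewer than 2^63
    # steps because factoring is a structured problem. Pollard's rho sketch:
    from math import gcd
    import random

    def pollard_rho(n):
        """Return a nontrivial factor of composite n (expected ~sqrt(p) steps)."""
        while True:
            x = y = random.randrange(2, n)
            c = random.randrange(1, n)
            d = 1
            while d == 1:
                x = (x * x + c) % n           # tortoise
                y = (y * y + c) % n           # hare (two steps per iteration)
                y = (y * y + c) % n
                d = gcd(abs(x - y), n)
            if d != n:
                return d

    n = 2147483647 * 4294967291               # product of two known primes
    p = pollard_rho(n)
    print(p, n // p)                           # recovers the two factors quickly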

Short (6 bit) cryptographic keyed hash

I have to implement a simple hashing algorithm.
Input data:
Value (16-bit integer).
Key (any length).
Output data:
6-bit hash (number 0-63).
Requirements:
It should be practically impossible to predict the hash value if you only have the input value but not the key. More specifically: if I know hash(x) for x < M, it should be hard to predict hash(M) without knowing the key.
Possible solutions:
Keep full mapping as a key. So the key has length 2^16*6 bits. It's too long for my case.
Linear code. The key is a generator matrix; its length is 16*6. But it's easy to find the generator matrix using several known hash values.
Are there any other possibilities?
An HMAC seems to be what you want. So a possibility for you could be to use a SHA-based HMAC and just use a substring of the resulting hash. This should be relatively safe, since the bits of a cryptographic hash should be as independent and unpredictable as possible.
Depending on your environment, this could however take too much processing time, so you might have to choose a simpler hashing scheme to construct your HMAC.
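A minimal sketch of that truncated-HMAC idea, using only the Python standard library (the function name hash6 is mine):

    # Truncated-HMAC keyed hash: 16-bit value in, 6-bit hash (0-63) out.
    import hmac, hashlib

    def hash6(value, key):
        msg = value.to_bytes(2, "big")                    # the 16-bit input
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        return digest[0] & 0x3F                           # keep 6 bits

    key = b"some secret key of any length"
    print(hash6(12345, key))   # stable for a given key, hard to predict without it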
Original Answer the discussion in the comments is based on:
Since you can forget cryptographic properties anyway (it is trivial to find collisions via brute force on a 6-bit hash), you might as well use something like a CRC or a Hamming code and get error detection for free.
Mensi's suggestion to use a truncated HMAC is a good one, but if you do happen to be on a highly constrained system and want something faster or simpler, you could take any block cipher, encrypt your 16-bit value (padded to a full block) with it, and truncate the result to 6 bits.
Unlike HMAC, which computes a pseudorandom function, a block cipher is a pseudorandom permutation — every input maps to a different output. However, when you throw away all but six bits of the block cipher's output, what remains will look very much like a pseudorandom function. There will be a very tiny bias against repeated outputs, but (assuming that the block cipher's block size is much larger than 6 bits, which it should be) it'll be so small as to be all but undetectable.
A good block cipher choice for very low-end systems might be TEA or its successors XTEA and XXTEA. While there are some known attacks on these ciphers, they all require much more extensive access to the cipher than should be possible in your application.
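Here is a hedged sketch of that "encrypt and truncate" construction, using a pure-Python rendering of the published XTEA routine (helper names are mine; treat it as an illustration, not a vetted implementation):

    # "Encrypt then truncate" keyed 6-bit hash built on XTEA
    # (64-bit block, 128-bit key given as four 32-bit words).
    def xtea_encrypt_block(v0, v1, key, rounds=32):
        mask, delta, s = 0xFFFFFFFF, 0x9E3779B9, 0
        for _ in range(rounds):
            v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & mask
            s = (s + delta) & mask
            v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & mask
        return v0, v1

    def hash6_xtea(value, key4words):
        v0, _ = xtea_encrypt_block(value & 0xFFFF, 0, key4words)  # pad to a block
        return v0 & 0x3F                                          # truncate to 6 bits

    key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
    print(hash6_xtea(12345, key))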

Analyzing goals and choosing a good hash function

This isn’t a specific question with a specific solution; it’s rather a response to the fact that I can’t find any good Stack Overflow questions about how to choose a good hashing function for hash tables and similar tasks.
So! Let’s talk hash functions, and how to choose one. How should a programming noob, who needs to choose a good hash function for their specific task, go about choosing one? When is the simple and quick Fowler-Noll-Vo appropriate? When should they vendor in MurmurHash3 instead? Do you have any links to good resources on comparing the various options?
The hash function for hash tables should have these two properties
Uniformity: all outputs of H() should be distributed as evenly as possible. In other words, for a 32-bit hash function the probability of every output should be 1/2^32 (for an n-bit function, 1/2^n). With a uniform hash function the chance of collision is minimized to the lowest possible for any input.
Low computational cost: hash functions for tables are expected to be FAST, unlike cryptographic hash functions, where speed is traded for preimage resistance (i.e. it is hard to find the message from a given hash value) and collision resistance.
For the purposes of hash tables, all cryptographic hash functions are a BAD choice, since their computational cost is enormous; hashing here is used not for security but for fast access. MurmurHash is considered one of the fastest and most uniform functions, suitable for big hash tables or hash indexes. For small tables a trivial hash function should be OK - one where we mix the values of the object by multiplication, addition and subtraction with some prime.
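As a sketch of such a trivial hash, here is FNV-1a (the Fowler-Noll-Vo function mentioned in the question) in a few lines of Python - fine for small tables, useless for security:

    # FNV-1a: a "mix with a prime" hash of the kind described above.
    def fnv1a_32(data):
        h = 0x811C9DC5                          # 32-bit FNV offset basis
        for byte in data:
            h ^= byte
            h = (h * 0x01000193) & 0xFFFFFFFF   # multiply by the 32-bit FNV prime
        return h

    print(fnv1a_32(b"hello") % 1024)            # bucket index for a 1024-slot table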
If your hash keys are strings (or other variable-length data) you might look at this paper by Ramakrishna and Zobel. They benchmark a few classes of hashing functions (for speed and low collisions) and exhibit a class that is better than the usual Bernstein hashes.

How to define your encryption algorithm's strength in terms of bits?

I'm in the process of designing an encryption algorithm. The algorithm is symmetric (single key).
How do you measure an algorithm's strength in terms of bits? Is the key length the strength of the algorithm?
EDIT:
Lesson 1: Don't design an encryption algorithm; AES and others are designed and standardized by academics for a reason.
Lesson 2: An encryption algorithm's strength is not measured in bits; key sizes are. An algorithm's strength is determined by its design. In general, an algorithm using a larger key size is harder to brute-force, and thus stronger.
First of all, is this for anything serious? If it is, stop right now. Don't do it. Designing algorithms is one of the hardest things in the world. Unless you have years and years of experience breaking ciphers, you will not design anything remotely secure.
AES and RSA serve two very different purposes. The difference is more than just signing. RSA is a public key algorithm. We use it for encryption, key exchange, digital signatures. AES is a symmetric block cipher. We use it for bulk encryption. RSA is very slow. AES is very fast. Most modern cryptosystems use a hybrid approach of using RSA for key exchange, and then AES for the bulk encryption.
Typically when we say "128-bit strength", we mean the size of the key. This is incredibly deceptive though, in that there is much more to the strength of an algorithm than the size of its key. In other words, just because you have a million-bit key, it means nothing.
The strength of an algorithm is defined both in terms of its key size and its resistance to cryptanalytic attacks. We say an algorithm is broken if there exists an attack better than brute force.
So, with AES and a 128-bit key, AES is considered "secure" if there is no attack that takes less than 2^128 work. If there is, we consider it "broken" (in an academic sense). Some of these attacks (for your searching) include differential cryptanalysis, linear cryptanalysis, and related-key attacks.
How we brute-force an algorithm also depends on its type. A symmetric block cipher like AES is brute-forced by trying every possible key. For RSA though, the size of the key is the size of the modulus. We don't break that by trying every possible key, but rather by factoring. So the strength of RSA is dependent on the current state of number theory. Thus, the size of the key doesn't always tell you its actual strength. RSA-128 is horribly insecure. Typical RSA key sizes are 1024 bits and up.
DES with a 56-bit key is stronger than pretty much EVERY amateur cipher ever designed.
If you are interested in designing algorithms, you should start by breaking other people's. Bruce Schneier has a self-study course in cryptanalysis that can get you started: http://www.schneier.com/paper-self-study.html
FEAL is one of the most broken ciphers of all time. It makes for a great starting place of learning block cipher cryptanalysis. The source code is available, and there are countless published papers on it, so you can always "look up the answer" if you get stuck.
You can compare key lengths for the same algorithm. Between algorithms it does not make too much sense.
If the algorithm is any good (and it would be very hard to prove that for something homegrown), then it gets more secure with a longer key size. Adding one bit should (again, if the algorithm is good) double the effort it takes to brute-force it (because there are now twice as many possible keys).
The more important point, though, is that this only works for "good" algorithms. If your algorithm is broken (i.e. it can be decrypted without trying all the keys because of some design flaws in it), then making the key longer probably does not help much.
If you tell me you have invented an algorithm with a 1024-bit key, I have no way to judge if that is better or worse than a published 256-bit algorithm (I'd err on the safe side and assume worse).
If you have two algorithms in your competition, telling the judge the key size is not helping them to decide which one is better.
Oh man, this is a really difficult problem. One thing is for sure - key length by itself says nothing about an encryption algorithm's strength.
I can only think of two measures of encryption algorithm strength:
Show your algorithm to a professional cryptanalyst. The algorithm's strength will be proportional to the time the cryptanalyst takes to break your encryption.
Strong encryption algorithms make encrypted data look pretty much random. So measure the randomness of your encrypted data; the algorithm's strength should be proportional to the degree of randomness of the encrypted data. Warning - this criterion is just for playing around; it doesn't show the real strength of an encryption scheme!
So the real measure is the first, but you can play around with the second for fun.
Assuming the algorithm is sound and that it uses the entire key range...
Raise the number of unique byte values for each key byte to the power of the number of bytes.
So if you are using only ASCII characters A-Z,a-z,0-9, that's 62 unique values - a 10 byte key using these values is 62^10. If you are using all 256 values, 0x00 - 0xFF, a 10 byte key is 256^10 (or 10 * 8 bits per byte = 2 ^ 80).
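A quick check of that arithmetic in Python:

    import math
    print(math.log2(62 ** 10))   # ~59.5 bits for a 10-character A-Z,a-z,0-9 key
    print(math.log2(256 ** 10))  # exactly 80 bits for 10 fully random bytes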
"Bits of security" is defined by NIST (National Institute of Standards and Technology), in:
NIST SP 800-57 Part 1, section 5.6.1 "Comparable Algorithm Strengths".
Various revisions of SP 800-57 Part 1 from NIST:
http://csrc.nist.gov/publications/PubsSPs.html#800-57-part1
Current version:
http://csrc.nist.gov/publications/nistpubs/800-57/sp800-57_part1_rev3_general.pdf
The "strength" is defined as "the amount of work needed to “break the algorithms”", and 5.6.1 goes on to describe that criterion at some length.
Table 2, in the same section, lays out the "bits of security" achieved by different key sizes of various algorithms, including AES, RSA, and ECC.
Rigorously determining the relative strength of a novel algorithm will require serious work.
My quick and dirty definition is "the number of bits that AES would require to have the same average cracking time". You can use any measure you like for time - operations, wall time, whatever. If yours takes as long to crack as a theoretical 40-bit AES message would (2^88 times less than 128-bit AES), then it's 40 bits strong, regardless of whether you used 64,000-bit keys.
That's being honest, and honesty is hard to find in the crypto world, of course. For hilarity, compare it to plain RSA keys instead.
Obviously it's in no way hard and fast, and it goes down every time someone finds a better crack, but that's the nature of an arbitrary "strength-in-terms-of-bits" measure. Strength-in-terms-of-operations is a much more concrete measure.
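One hedged way to turn that quick-and-dirty definition into a number (the figures below are made up purely for illustration):

    # "Strength in bits" as the log2 of the expected cracking work,
    # independent of the nominal key length. Illustrative numbers only.
    import math

    trial_time = 1e-9                      # assumed seconds per guess
    crack_time = (2 ** 40) * trial_time    # measured/estimated average crack time

    print(math.log2(crack_time / trial_time))  # 40.0 -> "40 bits strong",
                                               # even with a 64,000-bit key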

How does a cryptographically secure random number generator work?

I understand how standard random number generators work. But when working with cryptography, the random numbers really have to be random.
I know there are instruments that read cosmic white noise to help generate secure hashes, but your standard PC doesn't have this.
How does a cryptographically secure random number generator get its values with no repeatable patterns?
A cryptographically secure random number generator, as you might use for generating encryption keys, works by gathering entropy - that is, unpredictable input - from a source which other people can't observe.
For instance, /dev/random(4) on Linux collects information from the variation in timing of hardware interrupts from sources such as hard disks returning data, keypresses and incoming network packets. This approach is secure provided that the kernel does not overestimate how much entropy it has collected. A few years back the estimations of entropy from the various different sources were all reduced, making them far more conservative. Here's an explanation of how Linux estimates entropy.
None of the above is particularly high-throughput. /dev/random(4) probably is secure, but it maintains that security by refusing to give out data once it can't be sure that that data is securely random. If you want to, for example, generate a lot of cryptographic keys and nonces then you'll probably want to resort to hardware random number generators.
Often hardware RNGs are designed around sampling the difference between a pair of oscillators that are running at close to the same speed, but whose rates are varied slightly according to thermal noise. If I remember rightly, the random number generator that's used for the UK's premium bond lottery, ERNIE, works this way.
Alternate schemes include sampling the noise on a CCD (see lavaRND), radioactive decay (see hotbits) or atmospheric noise (see random.org, or just plug an AM radio tuned somewhere other than a station into your sound card). Or you can directly ask the computer's user to bang on their keyboard like a deranged chimpanzee for a minute, whatever floats your boat.
As andras pointed out, I only thought to talk about some of the most common entropy gathering schemes. Thomas Pornin's answer and Johannes Rössel's answer both do good jobs of explaining how one can go about mangling gathered entropy in order to hand bits of it out again.
For cryptographic purposes, what is needed is that the stream shall be "computationally indistinguishable from uniformly random bits". "Computationally" means that it needs not be truly random, only that it appears so to anybody without access to God's own computer.
In practice, this means that the system must first gather a sequence of n truly random bits. n shall be large enough to thwart exhaustive search, i.e. it shall be infeasible to try all 2^n combinations of n bits. This is achieved, with regards to today's technology, as long as n is greater than 90-or-so, but cryptographers just love powers of two, so it is customary to use n = 128.
These n random bits are obtained by gathering "physical events" which should be unpredictable, as far as physics is concerned. Usually, timing is used: the CPU has a cycle counter which is updated several billion times per second, and some events occur with an inevitable amount of jitter (incoming network packets, mouse movements, key strokes...). The system encodes these events and then "compresses" them by applying a cryptographically secure hash function such as SHA-256 (the output is then truncated to yield our n bits). What matters here is that the encoding of the physical events has enough entropy: roughly speaking, that the said events could have collectively assumed at least 2^n combinations. The hash function, by its definition, should do a good job of concentrating that entropy into an n-bit string.
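A toy Python sketch of that gather-then-compress shape (the timing loop here is NOT a vetted entropy source - real systems use interrupt timings and careful entropy estimation - it only shows how the hashing step condenses noisy samples into n bits):

    # Toy: gather noisy timing samples, then compress them with SHA-256.
    import hashlib, time

    samples = bytearray()
    for _ in range(1024):
        t0 = time.perf_counter_ns()
        sum(range(1000))                        # some work whose duration jitters
        samples += (time.perf_counter_ns() - t0).to_bytes(8, "little")

    seed = hashlib.sha256(bytes(samples)).digest()[:16]   # n = 128 bits
    print(seed.hex())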
Once we have n bits, we use a PRNG (Pseudo-Random Number Generator) to crank out as many bits as necessary. A PRNG is said to be cryptographically secure if, assuming that it operates over a wide enough unknown n-bit key, its output is computationally indistinguishable from uniformly random bits. In the 90's, a popular choice was RC4, which is very simple to implement and quite fast. However, it turned out to have measurable biases, i.e. it was not as indistinguishable as was initially hoped. The eSTREAM Project consisted in gathering newer designs for PRNGs (actually stream ciphers, because most stream ciphers consist of a PRNG whose output is XORed with the data to encrypt), documenting them, and promoting analysis by cryptographers. The eSTREAM Portfolio contains seven PRNG designs which were deemed secure enough (i.e. they resisted analysis and cryptographers tend to have a good understanding of why they resisted). Among them, four are "optimized for software". The good news is that while these new PRNGs seem to be much more secure than RC4, they are also noticeably faster (we are talking about hundreds of megabytes per second here). Three of them are "free for any use" and source code is provided.
From a design point of view, PRNGs reuse many of the elements of block ciphers. The same concepts of avalanche and diffusion of bits into a wide internal state are used. Alternatively, a decent PRNG can be built from a block cipher: simply use the n-bit sequence as the key for a block cipher, and encrypt successive values of a counter (expressed as an m-bit sequence, if the block cipher uses m-bit blocks). This produces a pseudo-random stream of bits which is computationally indistinguishable from random, as long as the block cipher is secure and the produced stream is no longer than m*2^(m/2) bits (for m = 128, this means about 300 billion gigabytes, so that's big enough for most purposes). That kind of usage is known as counter mode (CTR).
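A minimal sketch of that counter-mode construction (assuming the pycryptodome package for the AES primitive; the seed shown is a placeholder, not real entropy):

    # CTR-mode PRNG: encrypt successive counter values under the n-bit seed.
    from Crypto.Cipher import AES

    def ctr_prng(seed16, nbytes):
        cipher = AES.new(seed16, AES.MODE_ECB)   # raw block cipher keyed by the seed
        out, counter = b"", 0
        while len(out) < nbytes:
            out += cipher.encrypt(counter.to_bytes(16, "big"))  # 128-bit counter block
            counter += 1
        return out[:nbytes]

    print(ctr_prng(b"\x00" * 16, 64).hex())      # demo seed; use real entropy!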
Usually, a block cipher in CTR mode is not as fast as a dedicated stream cipher (the point of a stream cipher is that, by forfeiting the flexibility of a block cipher, better performance is expected). However, if you happen to have one of the more recent CPUs from Intel with the AES-NI instructions (which are basically an AES implementation in hardware, integrated into the CPU), then AES in CTR mode will yield unbeatable speed (several gigabytes per second).
First of all, the point of a cryptographically secure PRNG is not to generate entirely unpredictable sequences. As you noted, the absence of something that generates large volumes of (more or less) true randomness1 makes that impossible.
So you resort to something which is only hard to predict. “Hard” meaning here that it takes unfeasibly long by which time whatever it was necessary for would be obsolete anyway. There are a number of mathematical algorithms that play a part in this—you can get a glimpse if you take some well-known CSPRNGs and look at how they work.
The most common variants to build such a PRNG are:
Using a stream cipher, which already outputs a (supposedly secure) pseudo-random bit stream.
Using a block cipher in counter mode
Hash functions on a counter are also sometimes used. Wikipedia has more on this.
General requirements are just that it's unfeasible to determine the original initialization vector from a generator's bit stream and that the next bit cannot be easily predicted.
As for initialization, most CSPRNGs use various sources available on the system, ranging from truly random things like line noise, interrupts or other events in the system to other things like certain memory locations, &c. The initialization vector is preferably really random and not dependent on a mathematical algorithm. This initialization was broken for some time in Debian's implementation of OpenSSL, which led to severe security problems.
1 Which has its problems too and one has to be careful in eliminating bias as things such as thermal noise has different characteristics depending on the temperature—you almost always have bias and need to eliminate it. And that's not a trivial task in itself.
In order for a random number generator to be considered cryptographically secure, it needs to be secure against attack by an adversary who knows the algorithm and a (large) number of previously generated bits. What this means is that someone with that information can't reconstruct any of the hidden internal state of the generator and give predictions of what the next bits produced will be with better than 50% accuracy.
Normal pseudo-random number generators are generally not cryptographically secure, as reconstructing the internal state from previously output bits is generally trivial (often, the entire internal state is just the last N bits produced directly). Any random number generator without good statistical properties is also not cryptographically secure, as its output is at least partly predictable even without knowing the internal state.
So, as to how they work, any good crypto system can be used as a cryptographically secure random number generator -- use the crypto system to encrypt the output of a 'normal' random number generator. Since an adversary can't reconstruct the plaintext output of the normal random number generator, he can't attack it directly. This is a somewhat circular definition and begs the question of how you key the crypto system to keep it secure, which is a whole other problem.
Each generator will use its own seeding strategy, but here's a bit from the Windows API documentation on CryptGenRandom
With Microsoft CSPs, CryptGenRandom uses the same random number
generator used by other security components. This allows numerous
processes to contribute to a system-wide seed. CryptoAPI stores an
intermediate random seed with every user. To form the seed for the
random number generator, a calling application supplies bits it might
have—for instance, mouse or keyboard timing input—that are then
combined with both the stored seed and various system data and user
data such as the process ID and thread ID, the system clock, the
system time, the system counter, memory status, free disk clusters,
the hashed user environment block. This result is used to seed the
pseudorandom number generator (PRNG).
In Windows Vista with Service Pack 1 (SP1) and later, an
implementation of the AES counter-mode based PRNG specified in NIST
Special Publication 800-90 is used. In Windows Vista, Windows Storage
Server 2003, and Windows XP, the PRNG specified in Federal Information
Processing Standard (FIPS) 186-2 is used. If an application has access
to a good random source, it can fill the pbBuffer buffer with some
random data before calling CryptGenRandom. The CSP then uses this data
to further randomize its internal seed. It is acceptable to omit the
step of initializing the pbBuffer buffer before calling
CryptGenRandom.
