Newbie question:
I am studying ANSI X9.31-1998 for implementing a PRNG as per section A.2.4. I am not able to properly understand the representation of the variables used, like "ede".
Is the "ede" an operation or a variable ?
Why a * is used before X? Is it some kind of a standard representation?
Is there any specific document which describes all these?
"A.2.4 Generating Pseudo Random Numbers Using the DEA
Let ede*X(Y) represent the DEA multiple encryption of Y under the key *X.
Let *K be a DEA key pair reserved only for the generation of pseudo random numbers, let V be a 64-bit seed value which is also kept secret, and let XOR be the exclusive-or operator. Let DT be a date/time vector which is updated on each iteration. I is an intermediate value. A 64-bit vector R is generated as follows:
I = ede*K(DT)
R = ede*K(I XOR V) and a new V is generated by V = ede*K(R XOR I).
Successive values of R may be concatenated to produce a pseudo random number of the desired length."
EDE means Encrypt-Decrypt-Encrypt, the usual sequence of operations when using Triple DES.
The use of * looks a lot like the subscript notation common in cryptography articles:
EDE_X(Y) means using X as the key for the algorithm EDE.
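Putting it together, here is a minimal sketch of the A.2.4 loop in Python (my own illustration, assuming pycryptodome's DES3 for the two-key EDE operation and the system clock for DT; variable names follow the standard's notation):

import os
import struct
import time
from Crypto.Cipher import DES3  # pip install pycryptodome

def x931_blocks(key, V, n):
    # ede*K(...) is two-key Triple DES (EDE) applied to single 8-byte blocks
    ede = DES3.new(key, DES3.MODE_ECB).encrypt
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    for _ in range(n):
        DT = struct.pack(">Q", time.time_ns())  # 64-bit date/time vector, updated each iteration
        I = ede(DT)                             # I = ede*K(DT)
        R = ede(xor(I, V))                      # R = ede*K(I XOR V)
        V = ede(xor(R, I))                      # new V = ede*K(R XOR I)
        yield R

K = DES3.adjust_key_parity(os.urandom(16))      # *K: a DEA key pair reserved for the PRNG
V = os.urandom(8)                               # V: the secret 64-bit seed value
print(b"".join(x931_blocks(K, V, 4)).hex())     # successive R values concatenated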
Is there any (simple) random generation function that can work without variable assignment? Most functions I've seen look like this: current = next(current). However, I currently have a restriction (from SQLite) that I cannot use any variables at all.
Is there a way to generate a number sequence (for example, from 1 to max) with only n (current number index in the sequence) and seed?
Currently I am using this:
cast(((1103515245 * Seed * ROWID + 12345) % 2147483648) / 2147483648.0 * Max as int) + 1
with Max being 47 and ROWID being n. However, for some seeds the repeat rate is too high (3 unique values out of 47).
In my requirements, repetition is ok as long as it's not too much (<50%). Is there any better function that meets my need?
The question has sqlite tag but any language/pseudo-code is ok.
P.S.: I have tried using linear congruential generators with some a/c/m triplets and Seed * ROWID as the seed, but it does not work well; it's even worse.
EDIT: I currently use this one, but I do not know where it's from. Its repeat rate looks better than my function's:
((((Seed * ROWID) % 79) * 53) % "Max") + 1
I am not sure if you still have the same problem, but I might have a solution for you.
What you could do is use pseudo-random m-sequence generators based on shift registers, where you just have to pick a high enough order for your primitive polynomial, and you don't really need to store any variables.
For more info you can check the wiki page.
What you would need to code is just the primitive-polynomial shifting equation, and I have checked in an online editor that it should be very easy to do. I think the easiest way for you would be to work in binary and use PRBS sequences; depending on how many elements you have, you can choose your sequence length. For example, this is the implementation for a length of 2^15 = 32768 (PRBS15), with the primitive polynomial taken from the wiki page (there you can find primitive polynomials all the way to PRBS31, which would be 2^31 = 2.1475e+09).
Basically what you need to do is:
SELECT (((ROWID << 1) | (((ROWID >> 14) & 1) <> ((ROWID >> 13) & 1))) & 0x7fff)
The beauty of this approach is that if you take a PRBS sequence whose period is longer than your largest ROWID value, each row gets a unique, random-looking index. Very simple. :)
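As a sanity check, here is a small pure-Python sketch (my own, separate from the SQL above) showing that one PRBS15 step maps the 32767 nonzero 15-bit states one-to-one:

def prbs15_step(state):
    # feedback bit is the XOR of bits 14 and 13 (taps of x^15 + x^14 + 1)
    newbit = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | newbit) & 0x7FFF

outputs = {prbs15_step(s) for s in range(1, 1 << 15)}
print(len(outputs) == (1 << 15) - 1)  # True: distinct ROWIDs give distinct indexes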
If you need help with searching for primitive polynomials, you can see my github repo, which deals exactly with finding primitive polynomials and unique m-sequences. It is currently written in Matlab, but I plan to rewrite it in Python in the next few days.
Cheers!
What about using a good hash function and mapping the result into the [1...Max] range?
Along these lines (in pseudocode; sha1 was added in SQLite 3.17):
sha1(ROWID) % Max + 1
Or use any external C code for hash (murmur, chacha, ...) as shown here
A linear congruential generator with appropriately-chosen parameters (a, c, and modulus m) will be a full-period generator, such that it cycles pseudorandomly through every integer in its period before repeating. Although you may have tried this idea before, have you considered that m is equivalent to max in your case? For a list of parameter choices for such generators, see L'Ecuyer, P., "Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure", Mathematics of Computation 68(225), January 1999.
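For example (a sketch with an illustrative parameter choice, not one taken from L'Ecuyer's tables: 5 is a primitive root modulo the prime 47, so this multiplicative generator visits every value in 1..46 exactly once per cycle):

def mlcg_cycle(a, m, x0=1):
    # multiplicative LCG: collect one full cycle starting from x0
    x, cycle = x0, []
    while True:
        x = (a * x) % m
        cycle.append(x)
        if x == x0:
            return cycle

cycle = mlcg_cycle(5, 47)
print(len(cycle), len(set(cycle)))  # 46 46: a permutation of 1..46, no repeats
# stateless form: the n-th value of this generator is pow(5, n, 47)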
Note that there are some practical issues to implementing this in SQLite, especially if your SQLite version supports only 32-bit integers and 64-bit floating-point numbers (with 52 bits of precision). Namely, there may be a risk of:
overflow, if an intermediate multiplication exceeds 32 bits for integers, and
precision loss, if an intermediate multiplication produces a number wider than 52 bits.
Also, consider why you are creating the random number sequence:
Is the sequence intended to be unpredictable? In that case, a linear congruential generator alone is not enough, and you should generate unique identifiers by other means, such as by combining unique numbers with cryptographically random numbers.
Will the numbers generated this way be exposed in any way to end users? If not, there is no need to obfuscate them by "shuffling" them.
Also, depending on the SQLite API you're using (for your programming language), there may be a way to write a custom function to convert the seed and ROWID to a random unique number. The details, however, depend heavily on the specific SQLite API. Another answer shows an example for Perl.
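For instance, with Python's built-in sqlite3 module such a custom function might look like this (a sketch; the hash-based mixing and all names here are my own illustration):

import hashlib
import sqlite3

def rand_for_row(seed, rowid, max_value):
    # hash seed and rowid together, then fold the digest into 1..max_value
    digest = hashlib.sha256(f"{seed}:{rowid}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % max_value + 1

con = sqlite3.connect(":memory:")
con.create_function("rand_for_row", 3, rand_for_row)
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
print(con.execute("SELECT rand_for_row(42, rowid, 47) FROM t").fetchall())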
I'm not learning cryptography yet, and this exercise, in the form it was delivered as homework, was more about reading composite functions and the like. Either way, I took a look at part of the source code and didn't understand this.
For RSA encryption, the source code manipulated the string in such a way:
Message is being hashed into an integer list. (int1, int2, int3...)
Encrypt int1
Subtract result from int2 ( int2 - e(int1))
Modulo with the modulo key (n)
RSA transform with a key.
However, the RSA decryption method is done by:
1) RSA_transform
2) Result is added
3) Modulo with n
The part that puzzles me about the RSA decryption is the need for the modulo after the addition and rsa_transform. If it's needed, shouldn't it be used in the reverse order of how the chain of operations was carried out in RSA encryption?
Also, an "invert_modulo" was provided in the source code. I originally believed this to be a key in decrypting the message, but it wasn't so. What could "invert_modulo" be used for?
I cannot understand the first part of your question, as the steps to hash the string are not clear, and I don't get the third part of your encryption step. As for the second question, invert_modulo is the modular multiplicative inverse.
While working with modular arithmetic we always want our answer to be in the integer range 0 to M-1 (where M is the number we take the modulo with). Simple operations like addition, multiplication, and subtraction are easy to perform: (a+b) MOD M, for example, is well defined under the constraints of modular arithmetic.
The problem arises when we try to divide: (a/b) MOD M.
As you can see, a/b may not always give an integer, so (a/b) does not necessarily lie in the integer range 0 to M-1. To overcome this, we find an inverse of b and multiply a by it instead, i.e. (a*b_inverse) MOD M.
b_inverse can be defined by: (b*b_inverse) MOD M = 1,
i.e. b_inverse is a number in the range 0 to M-1 which, when multiplied by b modulo M, yields 1.
Note: the modular inverse of some numbers might not exist. We can check for that by taking the GCD of M and the number concerned (in our example, b); if the GCD is not equal to 1, the modular inverse does not exist.
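For completeness, here is a short sketch of how invert_modulo could be implemented with the extended Euclidean algorithm (the function name comes from the question; the body is my assumption):

def invert_modulo(b, M):
    # extended Euclidean algorithm: find b_inverse with (b * b_inverse) MOD M = 1
    old_r, r = b % M, M
    old_s, s = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("no inverse: GCD(b, M) != 1")
    return old_s % M

print(invert_modulo(7, 40))  # 23, since (7 * 23) MOD 40 = 161 MOD 40 = 1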
I have a table with an auto-increment 32-bit integer primary key in a database, which will produce numbers ranging 1-4294967295.
I would like to keep the convenience of an auto-generated primary key while having the numbers on the front end of the application look randomly generated.
Is there a mathematical function which would allow a two-way, one-to-one transformation between an integer and another?
For example a function would take a number, and translate it to another:
1 => 1538645623
2 => 2043145593
3 => 393439399
And another function the way back:
1538645623 => 1
2043145593 => 2
393439399 => 3
I'm not necessarily looking for an implementation here, but rather a hint on what I suppose, must be a well-known mathematical problem somewhere :)
Mathematically this is almost exactly the same problem as cryptography.
You: I want to go from an id(string of bits) to another number (string of bits) and back again in a non-obvious way.
Cryptography: I want to go from plaintext (string of bits) to another string of bits and back again (reversible) in a non-obvious way.
So for a simple solution, can I suggest just plugging in whatever cryptography algorithm is most convenient in your language, and encrypt and decrypt your id?
If you wanted to be a bit cleverer you can do what is called "salting" in addition to cryptography. Take your id as a 32 bit (or whatever) number. Concatenate it with a random 32 bit number. Encrypt the result. To reverse, just decrypt, and throw away the random part.
Of course, if someone was seriously attacking this, this might be vulnerable to known plaintext/differential cryptanalysis attacks as you have a very small known plaintext space, but it sounds like you aren't trying to defend against serious attacks.
First remove the offset of 1, so you get numbers in the range 0 to 2^32-2. Let m = 2^32-1.
Choose some a that is relatively prime to m. Since it is relatively prime, it has an inverse a' such that a * a' = 1 (mod m). Also choose some b. Choose big numbers to get a good mixing effect.
Then you can compute your desired pseudo-random number by y = (a * x + b) % m, and get back the original by x = ((y - b) * a') % m.
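In Python, the whole scheme might look like this (a sketch; the constants are illustrative and only need to satisfy the conditions above):

import math

M = 2**32 - 1
A = 2654435761               # relatively prime to M (checked below)
B = 889351602                # arbitrary offset
assert math.gcd(A, M) == 1
A_INV = pow(A, -1, M)        # a', the modular inverse of A (Python 3.8+)

def obfuscate(key):
    # map id 1..2**32-1 down to 0..M-1, mix, then map back up
    return ((key - 1) * A + B) % M + 1

def deobfuscate(value):
    return ((value - 1 - B) * A_INV) % M + 1

print(obfuscate(1), obfuscate(2), obfuscate(3))
print(deobfuscate(obfuscate(123456789)))  # 123456789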
This is essentially one step of a linear congruential generator (LCG) for pseudo-random numbers.
Note that this is not secure, it is only obfuscation. For example, if a user can get two numbers in sequence then he can recover a and b easily.
In most cases web apps use a hash of a randomly generated number as a reference to a table row. This hash can be stored as a number and displayed as a string for the end user.
This hash is unique and serves as the identifier; the id itself is only used within the application and never shown to the outside world.
Is there a standard way to do this?
Googling -- "approximate entropy" bits -- uncovers multiple academic papers but I'd like to just find a chunk of pseudocode defining the approximate entropy for a given bit string of arbitrary length.
(In case this is easier said than done and it depends on the application, my application involves 16,320 bits of encrypted data (cyphertext). But encrypted as a puzzle and not meant to be impossible to crack. I thought I'd first check the entropy but couldn't easily find a good definition of such. So it seemed like a question that ought to be on StackOverflow! Ideas for where to begin with de-cyphering 16k random-seeming bits are also welcome...)
See also this related question:
What is the computer science definition of entropy?
Entropy is not a property of the string you got, but of the strings you could have obtained instead. In other words, it qualifies the process by which the string was generated.
In the simple case, you get one string from a set of N possible strings, where each string has the same probability of being chosen as every other, i.e. 1/N. In that situation, the string is said to have an entropy of N. The entropy is often expressed in bits, which is a logarithmic scale: an entropy of "n bits" is an entropy equal to 2^n.
For instance: I like to generate my passwords as two lowercase letters, then two digits, then two lowercase letters, and finally two digits (e.g. va85mw24). Letters and digits are chosen randomly, uniformly, and independently of each other. This process may produce 26*26*10*10*26*26*10*10 = 4569760000 distinct passwords, and all these passwords have equal chances to be selected. The entropy of such a password is then 4569760000, which means about 32.1 bits.
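Double-checking that arithmetic in Python:

import math

n = 26 * 26 * 10 * 10 * 26 * 26 * 10 * 10
print(n)             # 4569760000 equally likely passwords
print(math.log2(n))  # about 32.1 bits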
Shannon's entropy equation is the standard method of calculation. Here is a simple implementation in Python, shamelessly copied from the Revelation codebase, and thus GPL licensed:
import math

def entropy(string):
    "Calculates the Shannon entropy of a string"
    # get probability of chars in string
    prob = [float(string.count(c)) / len(string) for c in dict.fromkeys(list(string))]
    # calculate the entropy
    entropy = -sum([p * math.log(p) / math.log(2.0) for p in prob])
    return entropy

def entropy_ideal(length):
    "Calculates the ideal Shannon entropy of a string with given length"
    prob = 1.0 / length
    return -1.0 * length * prob * math.log(prob) / math.log(2.0)
Note that this implementation assumes that your input bit-stream is best represented as bytes. This may or may not be the case for your problem domain. What you really want is your bitstream converted into a string of numbers. Just how you decide on what those numbers are is domain specific. If your numbers really are just one and zeros, then convert your bitstream into an array of ones and zeros. The conversion method you choose will affect the results you get, however.
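For instance, reusing the entropy() function above on the question's 16,320 bits (the file name is made up):

data = open("puzzle.bin", "rb").read()                  # hypothetical file: 2,040 bytes = 16,320 bits
bits = "".join(format(byte, "08b") for byte in data)
print(entropy(data.decode("latin-1")))                  # each byte as a symbol (at most 8.0)
print(entropy(bits))                                    # each bit as a symbol (at most 1.0)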
I believe the answer is the Kolmogorov Complexity of the string.
Not only is this not answerable with a chunk of pseudocode, Kolmogorov complexity is not a computable function!
One thing you can do in practice is compress the bit string with the best available data compression algorithm.
The more it compresses the lower the entropy.
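For instance (a rough sketch using zlib as the stand-in compressor, with random bytes standing in for the real ciphertext):

import os
import zlib

data = os.urandom(2040)                          # 16,320 bits of stand-in "ciphertext"
ratio = len(zlib.compress(data, 9)) / len(data)
print(round(ratio, 3))                           # near (or above) 1.0 suggests high entropy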
The NIST Random Number Generator evaluation toolkit has a way of calculating "Approximate Entropy." Here's the short description:
Approximate Entropy Test Description: The focus of this test is the frequency of each and every overlapping m-bit pattern. The purpose of the test is to compare the frequency of overlapping blocks of two consecutive/adjacent lengths (m and m+1) against the expected result for a random sequence.
And a more thorough explanation is available from the PDF on this page:
http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html
There is no single answer. Entropy is always relative to some model. When someone talks about a password having limited entropy, they mean "relative to the ability of an intelligent attacker to predict", and it's always an upper bound.
Your problem is, you're trying to measure entropy in order to help you find a model, and that's impossible; what an entropy measurement can tell you is how good a model is.
Having said that, there are some fairly generic models you can try; they're called compression algorithms. If gzip can compress your data well, you have found at least one model that can predict it well. And gzip is, for example, mostly insensitive to simple substitution: it can handle a text where "wkh" occurs frequently as easily as one where "the" does.
Using the Shannon entropy of a word with this formula: H(s) = -SUM(p(c) * log2(p(c))), summed over the distinct characters c of s, where p(c) is the character's relative frequency (formula image: http://imgur.com/a/DpcIH).
Here's an O(n) algorithm that calculates it:

import math
from collections import Counter

def entropy(s):
    l = float(len(s))
    return -sum(map(lambda a: (a / l) * math.log2(a / l), Counter(s).values()))
Here's an implementation in Python (I also added it to the Wiki page):
import numpy as np

def ApEn(U, m, r):
    def _maxdist(x_i, x_j):
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])

    def _phi(m):
        x = [[U[j] for j in range(i, i + m - 1 + 1)] for i in range(N - m + 1)]
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    N = len(U)
    return _phi(m) - _phi(m + 1)
Example:
>>> U = np.array([85, 80, 89] * 17)
>>> ApEn(U, 2, 3)
1.0996541105257052e-05
The above example is consistent with the example given on Wikipedia.
Take a commonly used binary hash function - for example, SHA-256. As the name implies, it outputs a 256 bit value.
Let A be the set of all possible 256 bit binary values. A is extremely large, but finite.
Let B be the set of all possible binary values. B is infinite.
Let C be the set of values obtained by running SHA-256 on every member of B. Obviously this can't be done in practice, but I'm guessing we can still do mathematical analysis of it.
My Question: By necessity, C ⊆ A. But does C = A?
EDIT: As was pointed out by some answers, this is wholly dependent on the hash function in question. So, if you know the answer for any particular hash function, please say so!
First, let's point out that SHA-256 does not accept all possible binary strings as input. As defined by FIPS 180-3, SHA-256 accepts as input any sequence of bits of length lower than 2^64 bits (i.e. no more than 18446744073709551615 bits). This is very common; all hash functions are somehow limited in formal input length. One reason is that the notion of security is defined with regard to computational cost; there is a threshold on the computational power that any attacker may muster. Inputs beyond a given length would require more than that maximum computational power to simply evaluate the function. In brief, cryptographers are very wary of infinities, because infinities tend to prevent security from being even defined, let alone quantified. So your input set B should be restricted to sequences up to 2^64-1 bits.
That being said, let's see what is known about hash function surjectivity.
Hash functions try to emulate a random oracle, a conceptual object which selects outputs at random under the only constraint that it "remembers" previous inputs and outputs, and, if given an already seen input, returns the same output as previously. By definition, a random oracle can be proven surjective only by trying inputs and exhausting the output space. If the output has size n bits, then it is expected that about 2^(2n) distinct inputs will be needed to exhaust the output space of size 2^n. For n = 256, this means that hashing about 2^512 messages (e.g. all messages of 512 bits) ought to be enough (on average). SHA-256 accepts inputs very much longer than 512 bits (indeed, it accepts inputs up to 18446744073709551615 bits), so it seems highly plausible that SHA-256 is surjective.
However, it has not been proven that SHA-256 is surjective, and that is expected. As shown above, a surjectivity proof for a random oracle requires an awful lot of computing power, substantially more than mere attacks such as preimages (2^n) and collisions (2^(n/2)). Consequently, a good hash function "should not" allow a property such as surjectivity to be actually proven. It would be very suspicious: the security of hash functions stems from the intractability of their internal structure, and such intractability should firmly oppose any attempt at mathematical analysis.
As a consequence, surjectivity is not formally proven for any decent hash function, and not even for "broken" hash functions such as MD4. It is only "highly suspected" (a random oracle with inputs much longer than the output should be surjective).
Not necessarily. The pigeonhole principle states that once one more hash than the size of A has been generated, the probability of a collision is 1, but it does not state that every single element of A has been generated.
It really depends on the hash function. If you use this valid hash function:
Int256 Hash (string input) {
    return 0;
}
then it is obvious that C != A. So the "for example, SHA256" is a pretty important note to consider.
To answer your actual question: I believe so, but I'm just guessing. Wikipedia does not provide any meaningful info on this.
Not necessarily. That would depend on the hash function.
It would probably be ideal if the hash function were surjective, but there are things that are usually more important, such as a low likelihood of collisions.
It is not always the case. However, the qualities required of a hash algorithm are:
the cardinality of its output set, and
the distribution of hashes over that output set (every possible output value must have the same probability of being a hash).