How does the modulus shorten a number to only 16 digits in this example? - math

I am learning Solidity by following various tutorials online. One of these tutorials is Cryptozombies. In this tutorial we create a zombie that has "dna" (a number). These numbers must be only 16 digits long to work with the code. We do this by defining
uint dnaDigits = 16;
uint dnaModulus = 10 ** dnaDigits;
At some point in the lesson, we define a function that generates "random" dna by passing a string to the keccak256 hash function.
function _generateRandomDna(string memory _str) private view returns (uint) {
uint rand = uint(keccak256(abi.encodePacked(_str)));
return rand % dnaModulus;
}
I am trying to figure out how the output of keccak256 % 10^16 is always a 16 digit integer. I guess part of the problem is that I don't exactly understand what keccak256 outputs. What I (think) I know is that it outputs a 256 bit number. I bit is either 0 or 1, so there are 2^256 possible outputs from this function?
If you need more information please let me know. I think included everything relevant from the code.

Whatever keccak256(abi.encodePacked(_str)) returns, it converted into uint because of the casting uint(...).
uint rand = uint(keccak256(abi.encodePacked(_str)));
When it's an uint then the simple math, because it's modulus.
xxxxx % 100 always < 100

I see now that this code is mean to produce a number that has no more than 16 digits, not exactly 16 digits.
Take x % 10^16
If x < 10^16, then then the remainder is x, and x is by definition less than 10^16, which is the smallest possible number with 17 digits I.e. every number less than 10^16 has 16 or fewer digits.
If x = 10^16, then remainder is 0 which has fewer than 16 digits.
If x > 10^16, then either 10^16 goes into x fully or not. If it does fully, remainder is zero which has less than 16 digits, if it goes in partially, then remainder is only a part of 10^16 which will always have 16 or fewer digits.

Related

Statistical probability of N contiguous true-bits in a sequence of bits?

Let's assume I have an N-bit stream of generated bits. (In my case 64kilobits.)
Whats the probability of finding a sequence of X "all true" bits, contained within a stream of N bits. Where X = (2 to 16), and N = (16 to 1000000), and X < N.
For example:
If N=16 and X=5, whats the likelyhood of finding 11111 within a 16-bit number.
Like this pseudo-code:
int N = 1<<16; // (64KB)
int X = 5;
int Count = 0;
for (int i = 0; i < N; i++) {
int ThisCount = ContiguousBitsDiscovered(i, X);
Count += ThisCount;
}
return Count;
That is, if we ran an integer in a loop from 0 to 64K-1... how many times would 11111 appear within those numbers.
Extra rule: 1111110000000000 doesn't count, because it has 6 true values in a row, not 5. So:
1111110000000000 = 0x // because its 6 contiguous true bits, not 5.
1111100000000000 = 1x
0111110000000000 = 1x
0011111000000000 = 1x
1111101111100000 = 2x
I'm trying to do some work involving physically-based random-number generation, and detecting "how random" the numbers are. Thats what this is for.
...
This would be easy to solve if N were less than 32 or so, I could just "run a loop" from 0 to 4GB, then count how many contiguous bits were detected once the loop was completed. Then I could store the number and use it later.
Considering that X ranges from 2 to 16, I'd literally only need to store 15 numbers, each less than 32 bits! (if N=32)!
BUT in my case N = 65,536. So I'd need to run a loop, for 2^65,536 iterations. Basically impossible :)
No way to "experimentally calculate the values for a given X, if N = 65,536". So I need maths, basically.
Fix X and N, obiously with X < N. You have 2^N possible values of combinations of 0 and 1 in your bit number, and you have N-X +1 possible sequences of 1*X (in this part I'm only looking for 1's together) contained in you bit number. Consider for example N = 5 and X = 2, this is a possible valid bit number 01011, so fixed the last two characteres (the last two 1's) you have 2^2 possible combinations for that 1*Xsequence. Then you have two cases:
Border case: Your 1*X is in the border, then you have (2^(N -X -1))*2 possible combinations
Inner case: You have (2^(N -X -2))*(N-X-1) possible combinations.
So, the probability is (border + inner )/2^N
Examples:
1)N = 3, X =2, then the proability is 2/2^3
2) N = 4, X = 2, then the probaility is 5/16
A bit brute force, but I'd do something like this to avoid getting mired in statistics theory:
Multiply the probabilities (1 bit = 0.5, 2 bits = 0.5*0.5, etc) while looping
Keep track of each X and when you have the product of X bits, flip it and continue
Start with small example (N = 5, X=1 - 5) to make sure you get edge cases right, compare to brute force approach.
This can probably be expressed as something like Sum (Sum 0.5^x (x = 1 -> 16) (for n = 1 - 65536) , but edge cases need to be taken into account (i.e. 7 bits doesn't fit, discard probability), which gives me a bit of a headache. :-)
#Andrex answer is plain wrong as it counts some combinations several times.
For example consider the case N=3, X=1. Then the combination 101 happens only 1/2^3 times but the border calculation counts it two times: one as the sequence starting with 10 and another time as the sequence ending with 01.
His calculations gives a (1+4)/8 probability whereas there are only 4 unique sequences that have at least a single contiguous 1 (as opposed to cases such as 011):
001
010
100
101
and so the probability is 4/8.
To count the number of unique sequences you need to account for sequences that can appear multiple times. As long as X is smaller than N/2 this will happens. Not sure how you can count them tho.

How do computers evaluate huge numbers?

If I enter a value, for example
1234567 ^ 98787878
into Wolfram Alpha it can provide me with a number of details. This includes decimal approximation, total length, last digits etc. How do you evaluate such large numbers? As I understand it a programming language would have to have a special data type in order to store the number, let alone add it to something else. While I can see how one might approach the addition of two very large numbers, I can't see how huge numbers are evaluated.
10^2 could be calculated through repeated addition. However a number such as the example above would require a gigantic loop. Could someone explain how such large numbers are evaluated? Also, how could someone create a custom large datatype to support large numbers in C# for example?
Well it's quite easy and you can have done it yourself
Number of digits can be obtained via logarithm:
since `A^B = 10 ^ (B * log(A, 10))`
we can compute (A = 1234567; B = 98787878) in our case that
`B * log(A, 10) = 98787878 * log(1234567, 10) = 601767807.4709646...`
integer part + 1 (601767807 + 1 = 601767808) is the number of digits
First, say, five, digits can be gotten via logarithm as well;
now we should analyze fractional part of the
B * log(A, 10) = 98787878 * log(1234567, 10) = 601767807.4709646...
f = 0.4709646...
first digits are 10^f (decimal point removed) = 29577...
Last, say, five, digits can be obtained as a corresponding remainder:
last five digits = A^B rem 10^5
A rem 10^5 = 1234567 rem 10^5 = 34567
A^B rem 10^5 = ((A rem 10^5)^B) rem 10^5 = (34567^98787878) rem 10^5 = 45009
last five digits are 45009
You may find BigInteger.ModPow (C#) very useful here
Finally
1234567^98787878 = 29577...45009 (601767808 digits)
There are usually libraries providing a bignum datatype for arbitrarily large integers (eg. mapping digits k*n...(k+1)*n-1, k=0..<some m depending on n and number magnitude> to a machine word of size n redefining arithmetic operations). for c#, you might be interested in BigInteger.
exponentiation can be recursively broken down:
pow(a,2*b) = pow(a,b) * pow(a,b);
pow(a,2*b+1) = pow(a,b) * pow(a,b) * a;
there also are number-theoretic results that have engenedered special algorithms to determine properties of large numbers without actually computing them (to be precise: their full decimal expansion).
To compute how many digits there are, one uses the following expression:
decimal_digits(n) = 1 + floor(log_10(n))
This gives:
decimal_digits(1234567^98787878) = 1 + floor(log_10(1234567^98787878))
= 1 + floor(98787878 * log_10(1234567))
= 1 + floor(98787878 * 6.0915146640862625)
= 1 + floor(601767807.4709647)
= 601767808
The trailing k digits are computed by doing exponentiation mod 10^k, which keeps the intermediate results from ever getting too large.
The approximation will be computed using a (software) floating-point implementation that effectively evaluates a^(98787878 log_a(1234567)) to some fixed precision for some number a that makes the arithmetic work out nicely (typically 2 or e or 10). This also avoids the need to actually work with millions of digits at any point.
There are many libraries for this and the capability is built-in in the case of python. You seem primarily concerned with the size of such numbers and the time it may take to do computations like the exponent in your example. So I'll explain a bit.
Representation
You might use an array to hold all the digits of large numbers. A more efficient way would be to use an array of 32 bit unsigned integers and store "32 bit chunks" of the large number. You can think of these chunks as individual digits in a number system with 2^32 distinct digits or characters. I used an array of bytes to do this on an 8-bit Atari800 back in the day.
Doing math
You can obviously add two such numbers by looping over all the digits and adding elements of one array to the other and keeping track of carries. Once you know how to add, you can write code to do "manual" multiplication by multiplying digits and putting the results in the right place and a lot of addition - but software will do all this fairly quickly. There are faster multiplication algorithms than the one you would use manually on paper as well. Paper multiplication is O(n^2) where other methods are O(n*log(n)). As for the exponent, you can of course multiply by the same number millions of times but each of those multiplications would be using the previously mentioned function for doing multiplication. There are faster ways to do exponentiation that require far fewer multiplies. For example you can compute x^16 by computing (((x^2)^2)^2)^2 which involves only 4 actual (large integer) multiplications.
In practice
It's fun and educational to try writing these functions yourself, but in practice you will want to use an existing library that has been optimized and verified.
I think a part of the answer is in the question itself :) To store these expressions, you can store the base (or mantissa), and exponent separately, like scientific notation goes. Extending to that, you cannot possibly evaluate the expression completely and store such large numbers, although, you can theoretically predict certain properties of the consequent expression. I will take you through each of the properties you talked about:
Decimal approximation: Can be calculated by evaluating simple log values.
Total number of digits for expression a^b, can be calculated by the formula
Digits = floor function (1 + Log10(a^b)), where floor function is the closest integer smaller than the number. For e.g. the number of digits in 10^5 is 6.
Last digits: These can be calculated by the virtue of the fact that the expression of linearly increasing exponents form a arithmetic progression. For e.g. at the units place; 7, 9, 3, 1 is repeated for exponents of 7^x. So, you can calculate that if x%4 is 0, the last digit is 1.
Can someone create a custom datatype for large numbers, I can't say, but I am sure, the number won't be evaluated and stored.

how do I generate 2 random prime numbers that when multiplied, yield a number with X bits? (X given as argument))

I lack the math skills to make this function.
basically, i want to return 2 random prime numbers that when multiplied, yield a number of bits X given as argument.
for example:
if I say my X is 3 then a possible solution would be:
p = 2 and q = 3 becouse 2 * 3 = 6 (110 has 3 bits).
A problem with this statement is that it starts by asking for two "random" prime numbers. Without any explicit statement of the distribution of the required random primes, we are already stuck. (This is the beginning of a classic paradox, where we are asked to generate a "random" integer.)
But suppose that we change the statement to finding any two arbitrary primes, that yield the desired product with a given number of bits x. The answer is trivial.
The set of numbers that have exactly x bits in their binary representation is the half open set of integers [2^(x-1),2^x-1].
Choose an arbitrary prime number that is less than or equal to (2^x-1)/2. Call it p1.
Next, choose a second prime number that lies in the interval (2^(x-1)/p1,(2^x-1)/p1). Call it p2.
It must be true that p1*p2 will be in the desired interval.
For example, given x = 10, so the product must lie in the interval [512,1023], the set of integers with exactly 10 bits. (Note, there are apparently 147 such numbers in that interval, with exactly two prime factors.)
Step 1:
Choose p1 as any prime no larger than 1023/2 = 511.5. I'll pick p1 = 137. Then the second prime factor must be a prime that lies in the interval
[512 1023]/137
ans =
3.7372 7.4672
thus either 5 or 7.
dec2bin(137*[5 7])
ans =
1010101101
1110111111
If you know the number of bits, you can generate a number 2^(x-2) < x < 2^(x-1). Then take the square root and find the closest primes on either side of it. Multiplying them together will, in most cases, get you a number in the correct range. If it's too high, you can take the two primes directly on the lower side of it.
pseudocode:
x = bits
primelist[] = makeprimelist()
rand = randnum between 2^(x-2) and 2^(x-1)
n = findposition(primelist, rand)
do
result = primelist[n]*primelist[n+1]
n--
while result > 2^(x-1)
Note that numbers generated this way will allways have '1' as the highest significant bit, so would be possible to generate a number of x-1 bits and just tack the 1 onto the end.

How to efficiently convert a few bytes into an integer between a range?

I'm writing something that reads bytes (just a List<int>) from a remote random number generation source that is extremely slow. For that and my personal requirements, I want to retrieve as few bytes from the source as possible.
Now I am trying to implement a method which signature looks like:
int getRandomInteger(int min, int max)
I have two theories how I can fetch bytes from my random source, and convert them to an integer.
Approach #1 is naivé . Fetch (max - min) / 256 number of bytes and add them up. It works, but it's going to fetch a lot of bytes from the slow random number generator source I have. For example, if I want to get a random integer between a million and a zero, it's going to fetch almost 4000 bytes... that's unacceptable.
Approach #2 sounds ideal to me, but I'm unable come up with the algorithm. it goes like this:
Lets take min: 0, max: 1000 as an example.
Calculate ceil(rangeSize / 256) which in this case is ceil(1000 / 256) = 4. Now fetch one (1) byte from the source.
Scale this one byte from the 0-255 range to 0-3 range (or 1-4) and let it determine which group we use. E.g. if the byte was 250, we would choose the 4th group (which represents the last 250 numbers, 750-1000 in our range).
Now fetch another byte and scale from 0-255 to 0-250 and let that determine the position within the group we have. So if this second byte is e.g. 120, then our final integer is 750 + 120 = 870.
In that scenario we only needed to fetch 2 bytes in total. However, it's much more complex as if our range is 0-1000000 we need several "groups".
How do I implement something like this? I'm okay with Java/C#/JavaScript code or pseudo code.
I'd also like to keep the result from not losing entropy/randomness. So, I'm slightly worried of scaling integers.
Unfortunately your Approach #1 is broken. For example if min is 0 and max 510, you'd add 2 bytes. There is only one way to get a 0 result: both bytes zero. The chance of this is (1/256)^2. However there are many ways to get other values, say 100 = 100+0, 99+1, 98+2... So the chance of a 100 is much larger: 101(1/256)^2.
The more-or-less standard way to do what you want is to:
Let R = max - min + 1 -- the number of possible random output values
Let N = 2^k >= mR, m>=1 -- a power of 2 at least as big as some multiple of R that you choose.
loop
b = a random integer in 0..N-1 formed from k random bits
while b >= mR -- reject b values that would bias the output
return min + floor(b/m)
This is called the method of rejection. It throws away randomly selected binary numbers that would bias the output. If min-max+1 happens to be a power of 2, then you'll have zero rejections.
If you have m=1 and min-max+1 is just one more than a biggish power of 2, then rejections will be near half. In this case you'd definitely want bigger m.
In general, bigger m values lead to fewer rejections, but of course they require slighly more bits per number. There is a probabilitistically optimal algorithm to pick m.
Some of the other solutions presented here have problems, but I'm sorry right now I don't have time to comment. Maybe in a couple of days if there is interest.
3 bytes (together) give you random integer in range 0..16777215. You can use 20 bits from this value to get range 0..1048575 and throw away values > 1000000
range 1 to r
256^a >= r
first find 'a'
get 'a' number of bytes into array A[]
num=0
for i=0 to len(A)-1
num+=(A[i]^(8*i))
next
random number = num mod range
Your random source gives you 8 random bits per call. For an integer in the range [min,max] you would need ceil(log2(max-min+1)) bits.
Assume that you can get random bytes from the source using some function:
bool RandomBuf(BYTE* pBuf , size_t nLen); // fill buffer with nLen random bytes
Now you can use the following function to generate a random value in a given range:
// --------------------------------------------------------------------------
// produce a uniformly-distributed integral value in range [nMin, nMax]
// T is char/BYTE/short/WORD/int/UINT/LONGLONG/ULONGLONG
template <class T> T RandU(T nMin, T nMax)
{
static_assert(std::numeric_limits<T>::is_integer, "RandU: integral type expected");
if (nMin>nMax)
std::swap(nMin, nMax);
if (0 == (T)(nMax-nMin+1)) // all range of type T
{
T nR;
return RandomBuf((BYTE*)&nR, sizeof(T)) ? *(T*)&nR : nMin;
}
ULONGLONG nRange = (ULONGLONG)nMax-(ULONGLONG)nMin+1 ; // number of discrete values
UINT nRangeBits= (UINT)ceil(log((double)nRange) / log(2.)); // bits for storing nRange discrete values
ULONGLONG nR ;
do
{
if (!RandomBuf((BYTE*)&nR, sizeof(nR)))
return nMin;
nR= nR>>((sizeof(nR)<<3) - nRangeBits); // keep nRangeBits random bits
}
while (nR >= nRange); // ensure value in range [0..nRange-1]
return nMin + (T)nR; // [nMin..nMax]
}
Since you are always getting a multiple of 8 bits, you can save extra bits between calls (for example you may need only 9 bits out of 16 bits). It requires some bit-manipulations, and it is up to you do decide if it is worth the effort.
You can save even more, if you'll use 'half bits': Let's assume that you want to generate numbers in the range [1..5]. You'll need log2(5)=2.32 bits for each random value. Using 32 random bits you can actually generate floor(32/2.32)= 13 random values in this range, though it requires some additional effort.

Fast exponentiation when only first k digits are required?

This is actually for a programming contest, but I've tried really hard and haven't got even the faintest clue how to do this.
Find the first and last k digits of nm where n and m can be very large ~ 10^9.
For the last k digits I implemented modular exponentiation.
For the first k I thought of using the binomial theorem upto certain powers but that involves quite a lot of computation for factorials and I'm not sure how to find an optimal point at which n^m can be expanded as (x+y)m.
So is there any known method to find the first k digits without performing the entire calculation?
Update 1 <= k <= 9 and k will always be <= digits in nm
not sure, but the identity nm = exp10(m log10(n)) = exp(q (m log(n)/q)) where q = log(10) comes to mind, along with the fact that the first K digits of exp10(x) = the first K digits of exp10(frac(x)) where frac(x) = the fractional part of x = x - floor(x).
To be more explicit: the first K digits of nm are the first K digits of its mantissa = exp(frac(m log(n)/q) * q), where q = log(10).
Or you could even go further in this accounting exercise, and use exp((frac(m log(n)/q)-0.5) * q) * sqrt(10), which also has the same mantissa (+ hence same first K digits) so that the argument of the exp() function is centered around 0 (and between +/- 0.5 log 10 = 1.151) for speedy convergence.
(Some examples: suppose you wanted the first 5 digits of 2100. This equals the first 5 digits of exp((frac(100 log(2)/q)-0.5)*q)*sqrt(10) = 1.267650600228226. The actual value of 2100 is 1.267650600228229e+030 according to MATLAB, I don't have a bignum library handy. For the mantissa of 21,000,000,000 I get 4.612976044195602 but I don't really have a way of checking.... There's a page on Mersenne primes where someone's already done the hard work; 220996011-1 = 125,976,895,450... and my formula gives 1.259768950493908 calculated in MATLAB which fails after the 9th digit.)
I might use Taylor series (for exp and log, not for nm) along with their error bounds, and keep adding terms until the error bounds drop below the first K digits. (normally I don't use Taylor series for function approximation -- their error is optimized to be most accurate around a single point, rather than over a desired interval -- but they do have the advantage that they're mathematically simple, and you can increased accuracy to arbitrary precision simply by adding additional terms)
For logarithms I'd use whatever your favorite approximation is.
Well. We want to calculate and to get only n first digits.
Calculate by the following iterations:
You have .
Calcluate each not exactly.
The thing is that the relative error of is less
than n times relative error of a.
You want to get final relative error less than .
Thus relative error on each step may be .
Remove last digits at each step.
For example, a=2, b=16, n=1. Final relative error is 10^{-n} = 0,1.
Relative error on each step is 0,1/16 > 0,001.
Thus 3 digits is important on each step.
If n = 2, you must save 4 digits.
2 (1), 4 (2), 8 (3), 16 (4), 32 (5), 64 (6), 128 (7), 256 (8), 512 (9), 1024 (10) --> 102,
204 (11), 408 (12), 816 (13), 1632 (14) -> 163, 326 (15), 652 (16).
Answer: 6.
This algorithm has a compexity of O(b). But it is easy to change it to get
O(log b)
Suppose you truncate at each step? Not sure how accurate this would be, but, e.g., take
n=11
m=some large number
and you want the first 2 digits.
recursively:
11 x 11 -> 121, truncate -> 12 (1 truncation or rounding)
then take truncated value and raise again
12 x 11 -> 132 truncate -> 13
repeat,
(132 truncated ) x 11 -> 143.
...
and finally add #0's equivalent to the number of truncations you've done.
Have you taken a look at exponentiation by squaring? You might be able to modify one of the methods such that you only compute what's necessary.
In my last algorithms class we had to implement something similar to what you're doing and I vaguely remember that page being useful.

Resources