Will this addition always produce a unique number? - math

I don't know how to test the following exhaustively, without brute forcing it, so I'll just ask whether the concept is sound.
I have two 64bit unsigned int variables, which are both used as bit-fields. Both variables can have up to 60 bits set, from 1-60. Any amount of the 60 bits can be set, and they can be set in any order. bits 61, 62, and 63 do not get set in either variable. Additionally, one, and only one, of the variables always has the 64th bit set.
Given the above description, am I correct in thinking that hash will be unique for all possible combinations of field1 and field2?:
uint64_t field1 = ...;
uint64_t field2 = ...;
uint64_t hash = field1 + field2;

No. Simple example:
0b0011 + 0b0100 = 0b0111
0b0010 + 0b0101 = 0b0111
It is not possible to provide unique hash of length n for all pairs of values with length n. Note that there are about 2^60 * 2^60 = 2^120 combinations, so 2^60 hashes cannot fit them all.

Related

Most elegant way to determine how much one has been bitshifted

So let's say we bitshift 1 by some number x; eg, in c:
unsigned char cNum= 1, x= 6;
cNum <<= x;
cNum will equal 01000000b (0x40).
Easy peasy. But without using a lookup table or while loop, is there a simple operation that will take cNum and give me x back?
AFAIK, no 'simple' formula is available.
One can, however, calculate the index of most significant (or least significant) set bit:
a = 000010010, a_left = 3, a_right = 1
b = 001001000, b_left = 5, b_right = 3
The difference of the shifts is 2 (or -2).
One can then shift the smaller by abs(shift) to compare that a << 2 == b. (In some architectures there exists a shift by signed value, which works without absolute value or checking which way the shift needs to be carried.)
In ARM Neon there exists an instruction for counting the MSB bit and in Intel there exists an instruction to scan both from left and right.
log2(cNum)+ 1; will yield x where cNum != 0, at least in GNU c.
And the compiler does the casts automagically which is probably bad form, but it gives me what I need.

Weighted randomization based on runtime data in System Verilog

Is there a way to do weighted randomization in System Verilog based on runtime data. Say, I have a queue of integers and a queue of weights (unsigned integers) and wish to select a random integer from the first queue as per the weights in the second queue.
int data[$] = '{10, 20, 30};
uint_t weights[$] = '{100, 200, 300};
Any random construct expects the weights hardcoded as in
constraint range { Var dist { [0:1] := 50 , [2:7] := 50 }; }
But in my case, I need to pick an element from an unknown number of elements.
PS: Assume the number of elements and weights will be the same always.
Unfortunately, the dist constraint only lets you choose from a fixed number of values.
Two approaches I can think of are
Push each data value into a queue using the weight as a repetition count. In your example, you wind up with a queue of 600 values. Randomly pick an index into the queue. The selected element has the distribution you want. An example is posted here.
Create an array of ranges for each weight. For your example the array would be uint_t ranges[][2]'{{0,99},{100,299},{300,599}}. Then you could do the following in a constraint
index inside {[0:weights.sum()-1]};
foreach (data[ii])
index inside {[ranges[ii][0]:ranges[ii][1]} -> value == date[ii];

Can any set of numbers be expressed by a function?

Is it possible to express ANY random set of numbers by a function?
Question clarification:
for example:
if desired result set = {1,2,3,4,5}
so I don't mean something like this:
function getSet(){
return {1,2,3,4,5};
}
but more like this:
function genSet(){
result = {}
for(i=0;i<5;i++){
result.push(i);
}
return result;
}
So in other words, can there be a logic to calculate any desired set?
There is a lot of mathematics behind this question. There are some interesting results.
Any set of (real) numbers can be define by a polynomial function f(x) = a + b x + c x^2 + ... so that a number is in the set if f(x)=0. Technically this is an algebraic curve in 1D. While this might seem a optimistic result there is not limit on how complex the polynomial could be and polynomials above the degree 5 have no explicit result.
There is a whole field of study on Computable numbers, real numbers which can be can be computed to within any desired precision by a finite, terminating algorithm, and their converse: non computable numbers, which can't. The bad news is there are a lot more non-computable numbers than computable ones.
The above has been based on real numbers which are decidedly more tricky than the integers or even a finite set of integers which is all we can represent by int or long datatypes. There is a big field of study in this see Computability theory (computer science). I think the Turings halting problem come in to play, this is about if you can determine if a algorithm will terminate. Unfortunately this can't be determined and a consequence is "Not every set of natural numbers is computable." The proof of this does require the infinite size of the naturals so I'm not sure about finite sets.
Representations
There are two common representations used for sets when programming. Suppose the set S is a subset of some universe of items U.
Membership Predicate
One way to represent the set S is a function member from S to { true, false }. For all x in U:
member(x) = true if x is in S
member(x) = false if x is not in S
Pseudocode
bool member(int n)
return 1 <= n <= 5
Enumeration
Another way to represent the S is to store all of its members in a data structure, such as a list, hash table, or binary tree.
Pseudocode
enumerable<int> S()
for int i = 1 to 5
yield return i
Operations
With either of these representations, most set operations can be defined. For example, the union of two sets would look as follows with each of the two representations.
Membership Predicate
func<int, bool> union(func<int, bool> s, func<int, bool> t)
return x => s(x) || t(x)
Enumeration
enumrable<int> union(enumerable<int> s, enumerable<int> t)
hashset<int> r
foreach x in s
r.add(x)
foreach x in t
if x not in r
r.add(x)
return r
Comparison
The membership predicate representation can be extremely versatile because all kinds of set operations from mathematics can be very easily expressed (complement, Cartesian product, etc.). The drawback is that there is no general way to enumerate all the members of a set represented in this way. The set of all positive real numbers, for example, cannot even be enumerated.
The enumeration representation typically involves much more expensive set operations, and some operations (such as the complement of the integer set {1, 2, 3, 4, 5}) cannot even be represented. It should be chosen if you need to be able to enumerate the members of a set, not just test membership.

How to efficiently convert a few bytes into an integer between a range?

I'm writing something that reads bytes (just a List<int>) from a remote random number generation source that is extremely slow. For that and my personal requirements, I want to retrieve as few bytes from the source as possible.
Now I am trying to implement a method which signature looks like:
int getRandomInteger(int min, int max)
I have two theories how I can fetch bytes from my random source, and convert them to an integer.
Approach #1 is naivé . Fetch (max - min) / 256 number of bytes and add them up. It works, but it's going to fetch a lot of bytes from the slow random number generator source I have. For example, if I want to get a random integer between a million and a zero, it's going to fetch almost 4000 bytes... that's unacceptable.
Approach #2 sounds ideal to me, but I'm unable come up with the algorithm. it goes like this:
Lets take min: 0, max: 1000 as an example.
Calculate ceil(rangeSize / 256) which in this case is ceil(1000 / 256) = 4. Now fetch one (1) byte from the source.
Scale this one byte from the 0-255 range to 0-3 range (or 1-4) and let it determine which group we use. E.g. if the byte was 250, we would choose the 4th group (which represents the last 250 numbers, 750-1000 in our range).
Now fetch another byte and scale from 0-255 to 0-250 and let that determine the position within the group we have. So if this second byte is e.g. 120, then our final integer is 750 + 120 = 870.
In that scenario we only needed to fetch 2 bytes in total. However, it's much more complex as if our range is 0-1000000 we need several "groups".
How do I implement something like this? I'm okay with Java/C#/JavaScript code or pseudo code.
I'd also like to keep the result from not losing entropy/randomness. So, I'm slightly worried of scaling integers.
Unfortunately your Approach #1 is broken. For example if min is 0 and max 510, you'd add 2 bytes. There is only one way to get a 0 result: both bytes zero. The chance of this is (1/256)^2. However there are many ways to get other values, say 100 = 100+0, 99+1, 98+2... So the chance of a 100 is much larger: 101(1/256)^2.
The more-or-less standard way to do what you want is to:
Let R = max - min + 1 -- the number of possible random output values
Let N = 2^k >= mR, m>=1 -- a power of 2 at least as big as some multiple of R that you choose.
loop
b = a random integer in 0..N-1 formed from k random bits
while b >= mR -- reject b values that would bias the output
return min + floor(b/m)
This is called the method of rejection. It throws away randomly selected binary numbers that would bias the output. If min-max+1 happens to be a power of 2, then you'll have zero rejections.
If you have m=1 and min-max+1 is just one more than a biggish power of 2, then rejections will be near half. In this case you'd definitely want bigger m.
In general, bigger m values lead to fewer rejections, but of course they require slighly more bits per number. There is a probabilitistically optimal algorithm to pick m.
Some of the other solutions presented here have problems, but I'm sorry right now I don't have time to comment. Maybe in a couple of days if there is interest.
3 bytes (together) give you random integer in range 0..16777215. You can use 20 bits from this value to get range 0..1048575 and throw away values > 1000000
range 1 to r
256^a >= r
first find 'a'
get 'a' number of bytes into array A[]
num=0
for i=0 to len(A)-1
num+=(A[i]^(8*i))
next
random number = num mod range
Your random source gives you 8 random bits per call. For an integer in the range [min,max] you would need ceil(log2(max-min+1)) bits.
Assume that you can get random bytes from the source using some function:
bool RandomBuf(BYTE* pBuf , size_t nLen); // fill buffer with nLen random bytes
Now you can use the following function to generate a random value in a given range:
// --------------------------------------------------------------------------
// produce a uniformly-distributed integral value in range [nMin, nMax]
// T is char/BYTE/short/WORD/int/UINT/LONGLONG/ULONGLONG
template <class T> T RandU(T nMin, T nMax)
{
static_assert(std::numeric_limits<T>::is_integer, "RandU: integral type expected");
if (nMin>nMax)
std::swap(nMin, nMax);
if (0 == (T)(nMax-nMin+1)) // all range of type T
{
T nR;
return RandomBuf((BYTE*)&nR, sizeof(T)) ? *(T*)&nR : nMin;
}
ULONGLONG nRange = (ULONGLONG)nMax-(ULONGLONG)nMin+1 ; // number of discrete values
UINT nRangeBits= (UINT)ceil(log((double)nRange) / log(2.)); // bits for storing nRange discrete values
ULONGLONG nR ;
do
{
if (!RandomBuf((BYTE*)&nR, sizeof(nR)))
return nMin;
nR= nR>>((sizeof(nR)<<3) - nRangeBits); // keep nRangeBits random bits
}
while (nR >= nRange); // ensure value in range [0..nRange-1]
return nMin + (T)nR; // [nMin..nMax]
}
Since you are always getting a multiple of 8 bits, you can save extra bits between calls (for example you may need only 9 bits out of 16 bits). It requires some bit-manipulations, and it is up to you do decide if it is worth the effort.
You can save even more, if you'll use 'half bits': Let's assume that you want to generate numbers in the range [1..5]. You'll need log2(5)=2.32 bits for each random value. Using 32 random bits you can actually generate floor(32/2.32)= 13 random values in this range, though it requires some additional effort.

A way of checking if the digits of num1 are the digits in num2 without checking each digit?

Lets say I have guessed a lottery number of:
1689
And the way the lottery works is, the order of the digits don't matter as long as the digits match up 1:1 with the digits in the actual winning lottery number.
So, the number 1689 would be a winning lottery number with:
1896, 1698, 9816, etc..
As long as each digit in your guess was present in the target number, then you win the lottery.
Is there a mathematical way I can do this?
I've solved this problem with a O(N^2) looping checking each digit against each digit of the winning lottery number (separating them with modulus). Which is fine, it works but I want to know if there are any neat math tricks I can do.
For example, at first... I thought I could be tricky and just take the sum and product of each digit in both numbers and if they matched then you won.
^ Do you think that would work?
However, I quickly disproved this when I found that lottery guess: 222, and 124 have the different digits but the same product and sum.
Anyone know any math tricks I can use to quickly determine if the digits in num1 match the digits in num2 regardless of order?
How about going through each number, and counting up the number of appearances of each digit (into two different 10 element arrays)? After you'd done the totaling, compare the counts of each digit. Since you only look at each digit once, that's O(N).
Code would look something like:
for(int i=0; i<digit_count; i++)
{
guessCounts[guessDigits[i] - '0']++;
actualCounts[actualDigits[i] - '0']++;
}
bool winner = true;
for(int i=0; i<10 && winner; i++)
{
winner &= guessCounts[i] == actualCounts[i];
}
Above code makes the assumption that guessDigits and actualDigits are both char strings; if they held the actual digits then you can just skip the - '0' business.
There are probably optimizations that would make this take less space or terminate sooner, but it's a pretty straightforward example of an O(N) approach.
By the way, as I mentioned in a comment, the multiplication/sum comparison will definitely not work because of zeros. Consider 0123 and 0222. Product is 0, sum is 6 in both cases.
Split into array, sort array, join into string, compare strings.
(Not a math trick, I know)
You can place the digits into an array, sort the array, then compare the arrays element by element. This will give you O( NlogN ) complexity which is better than O( N^2 ).
If N can become large, sorting the digits is the answer.
Because digits are 0..9 you can count the number of occurrences of each digit of the lottery answer in an array [0..9].
To compare you can subtract 1 for each digit you encounter in the guess. When you encounter a digit where the count is already 0, you know the guess is different. When you get through all the digits, the guess is the same (as long as the guess has as many digits as the lottery answer).
For each digit d multiply with the (d+1)-th prime number.
This is more mathematical but less efficient than the sorting or bucket methods. Actually it is the bucket method in disguise.
I'd sort both number's digits and compare them.
If you are only dealing with 4 digits I dont think you have to put much thought into which algorithm you use. They will all perform roughly the same.
Also 222 and 124 dont have the same sum
You have to consider that when n is small, the order of efficiency is irrelevant, and the constants start to matter more. How big can your numbers actually get? Can you get up to 10 digits? 20? 100? If your numbers have just a few digits, n^2 really isn't that bad. If you have strings of thousands of digits, then you might actually need to do something more clever like sorting or bucketing. (i.e. count the 0s, count the 1s, etc.)
I'm stealing the answer from Yuliy, and starblue (upvote them)
Bucketing is the fastest aside from the O(1)
lottonumbers == mynumbers;
Sorting is O(nlog2n)
Bucketsort is an O(n) algorithm.
So all you need to do is do it twice (once for your numbers, once for the target-set), and if the numbers of the digits add up, then they match.
Any kind of sorting is an added overhead that is unnecessary in this case.
array[10] digits;
while(targetnum > 0)
{
short currDig = targetnum % 10;
digits[currDig]++;
targetnum = targetnum / 10;
}
while(mynum > 0)
{
short myDig = mynum % 10;
digits[myDig]--;
mynum = mynum / 10;
}
for(int i = 0; i < 10; i++)
{
if(digits[i] == 0)
continue;
else
//FAIL TO MATCH
}
Not the prettiest code, I'll admit.
Create an array of 10 integers subscripted [0 .. 9].
Initialize each element to a different prime number
Set product to 1.
Use each digit from the number, to subscript into the array,
pull out the prime number, and multiply the product by it.
That gives you a unique representation which is digit order independent.
Do the same procedure for the other number.
If the unique representations match, then the original numbers match.
If there are no repeating digits allowed (not sure if this is the case though) then use a 10-bit binary number. The most significant bit represents the digit 9 and the LSB represents the digit 0. Work through each number in turn and flip the appropriate bit for each digit that you find
So 1689 would be: 1101000010
and 9816 would also be: 1101000010
then a XOR or a subtract will leave 0 if you are a winner
This is just a simple form of bucketing
Just for fun, and thinking outside of the normal, instead of sorting and other ways, do the deletion-thing. If resultstring is empty, you have a winner!
Dim ticket As String = "1324"
Dim WinningNumber As String = "4321"
For Each s As String In WinningNumber.ToCharArray
ticket = Replace(ticket, s, "", 1, 1)
Next
If ticket = "" Then MsgBox("WINNER!")
If ticket.Length=1 then msgbox "Close but no cigar!"
This works with repeating numbers too..
Sort digits before storing a number. After that, your numbers will be equal.
One cute solution is to use a variant of Zobrist hashing. (Yes, I know it's overkill, as well as probabilistic, but hey, it's "clever".)
Initialize a ten-element array a[0..9] to random integers. Then, for each number d[], compute the sum of a[d[i]]. If the numbers contained the same digits, the resulting numbers will be equal; with high probability (~ 1 in how many possible ints there are), the opposite is true as well.
(If you know that there will be at most 10 digits total, then you can use the fixed numbers 1, 10, 100, ... instead of random numbers for guaranteed success. This is bucket sorting in not-too-much disguise.)

Resources