One-to-one integer mapping function - math

We are using MySQL and developing an application where we'd like the ID sequence not to be publicly visible... the IDs are hardly top secret and there is no significant issue if someone indeed was able to decode them.
A hash is the obvious solution, and we are currently using MD5: 32-bit integers go in, and we truncate the MD5 digest to 64 bits and store that. However, we have no idea how likely collisions are when truncating like this (especially since all numbers come from autoincrement or the current time). We currently check for collisions, but since we may be inserting 100,000 rows at once, the performance is terrible (we can't bulk insert).
But in the end, we really don't need the security offered by the hashes and they consume unnecessary space and also require an additional index... so, is there any simple and good enough function/algorithm out there that guarantees one-to-one mapping for any number without obvious visual patterns for sequential numbers?
EDIT: I'm using PHP, which does not support fixed-width (wrapping) 32-bit integer arithmetic by default, but after looking around I found that it can be cheaply replicated with bitwise operators. Code for 32-bit integer multiplication can be found here: http://pastebin.com/np28xhQF

You could simply XOR with 0xDEADBEEF, if that's good enough.
Alternatively, multiply by an odd number mod 2^32. For the inverse mapping, just multiply by its multiplicative inverse, which exists because every odd number is coprime to 2^32.
Example: n = 2345678901; multiplicative inverse (mod 2^32): 2313902621
For the mapping just multiply by 2345678901 (mod 2^32):
1 --> 2345678901
2 --> 396390506
For the inverse mapping, multiply by 2313902621.
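As a rough sketch of the multiply-by-an-odd-number idea in C++ (the function names are made up for illustration, and the inverse is computed with Newton's iteration rather than hard-coded), unsigned 32-bit arithmetic already wraps mod 2^32, so no explicit modulo is needed:

```cpp
#include <cassert>
#include <cstdint>

// Multiplier taken from the example above; any odd constant would work.
constexpr std::uint32_t kMultiplier = 2345678901u;

// Inverse of an odd number modulo 2^32 via Newton's iteration.
// Each step doubles the number of correct low bits (3 -> 6 -> 12 -> 24 -> 48).
std::uint32_t inverse_mod_2_32(std::uint32_t a) {
    assert((a & 1u) == 1u);   // only odd numbers are invertible mod 2^32
    std::uint32_t x = a;      // a * a == 1 (mod 8) for every odd a
    for (int i = 0; i < 4; ++i) {
        x *= 2u - a * x;      // unsigned arithmetic wraps mod 2^32
    }
    return x;
}

// Forward mapping: sequential ID -> scrambled ID.
std::uint32_t encode_id(std::uint32_t id) {
    return id * kMultiplier;
}

// Inverse mapping: scrambled ID -> sequential ID.
std::uint32_t decode_id(std::uint32_t code) {
    static const std::uint32_t kInverse = inverse_mod_2_32(kMultiplier);
    return code * kInverse;
}
```

With these constants, encode_id(1) and encode_id(2) give the two values listed above, and decode_id undoes encode_id for every 32-bit input.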

If you want to ensure a 1:1 mapping then use an encryption (i.e. a permutation), not a hash. Encryption has to be 1:1 because it can be decrypted.
If you want 32 bit numbers then use Hasty Pudding Cypher or just write a simple four round Feistel cypher.
Here's one I prepared earlier:
import java.util.Random;
/**
* IntegerPerm is a reversible keyed permutation of the integers.
* This class is not cryptographically secure as the F function
* is too simple and there are not enough rounds.
*
* @author Martin Ross
*/
public final class IntegerPerm {
//////////////////
// Private Data //
//////////////////
/** Non-zero default key, from www.random.org */
private final static int DEFAULT_KEY = 0x6CFB18E2;
private final static int LOW_16_MASK = 0xFFFF;
private final static int HALF_SHIFT = 16;
private final static int NUM_ROUNDS = 4;
/** Permutation key */
private int mKey;
/** Round key schedule */
private int[] mRoundKeys = new int[NUM_ROUNDS];
//////////////////
// Constructors //
//////////////////
public IntegerPerm() { this(DEFAULT_KEY); }
public IntegerPerm(int key) { setKey(key); }
////////////////////
// Public Methods //
////////////////////
/** Sets a new value for the key and key schedule. */
public void setKey(int newKey) {
assert (NUM_ROUNDS == 4) : "NUM_ROUNDS is not 4";
mKey = newKey;
mRoundKeys[0] = mKey & LOW_16_MASK;
mRoundKeys[1] = ~(mKey & LOW_16_MASK);
mRoundKeys[2] = mKey >>> HALF_SHIFT;
mRoundKeys[3] = ~(mKey >>> HALF_SHIFT);
} // end setKey()
/** Returns the current value of the key. */
public int getKey() { return mKey; }
/**
* Calculates the enciphered (i.e. permuted) value of the given integer
* under the current key.
*
* @param plain the integer to encipher.
*
* @return the enciphered (permuted) value.
*/
public int encipher(int plain) {
// 1 Split into two halves.
int rhs = plain & LOW_16_MASK;
int lhs = plain >>> HALF_SHIFT;
// 2 Do NUM_ROUNDS simple Feistel rounds.
for (int i = 0; i < NUM_ROUNDS; ++i) {
if (i > 0) {
// Swap lhs <-> rhs
final int temp = lhs;
lhs = rhs;
rhs = temp;
} // end if
// Apply Feistel round function F().
rhs ^= F(lhs, i);
} // end for
// 3 Recombine the two halves and return.
return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
} // end encipher()
/**
* Calculates the deciphered (i.e. inverse permuted) value of the given
* integer under the current key.
*
* @param cypher the integer to decipher.
*
* @return the deciphered (inverse permuted) value.
*/
public int decipher(int cypher) {
// 1 Split into two halves.
int rhs = cypher & LOW_16_MASK;
int lhs = cypher >>> HALF_SHIFT;
// 2 Do NUM_ROUNDS simple Feistel rounds.
for (int i = 0; i < NUM_ROUNDS; ++i) {
if (i > 0) {
// Swap lhs <-> rhs
final int temp = lhs;
lhs = rhs;
rhs = temp;
} // end if
// Apply Feistel round function F().
rhs ^= F(lhs, NUM_ROUNDS - 1 - i);
} // end for
// 3 Recombine the two halves and return.
return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
} // end decipher()
/////////////////////
// Private Methods //
/////////////////////
// The F function for the Feistel rounds.
private int F(int num, int round) {
// XOR with round key.
num ^= mRoundKeys[round];
// Square, then XOR the high and low parts.
num *= num;
return (num >>> HALF_SHIFT) ^ (num & LOW_16_MASK);
} // end F()
} // end class IntegerPerm

Do what Henrik said in his second suggestion. But since these values seem to be used by people (otherwise you wouldn't want to randomize them), take one additional step. Multiply the sequential number by a large prime and reduce mod N, where N is a power of 2. Choose N a few bits smaller than the largest value you can store, because the next step, multiplying the result by 11, adds up to 4 bits. With N = 2^29 = 536870912 the result can need 33 bits, so it fits a 64-bit column but not a 32-bit one. So we have:
Hash = ((count * large_prime) % 536870912) * 11
The multiplication by 11 protects against most data entry errors: if any single digit is typed wrong, the result will not be a multiple of 11, and if any two adjacent digits are transposed, the result will not be a multiple of 11 either. So as a preliminary check of any value entered, you check whether it is divisible by 11 before even looking in the database.
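A minimal sketch of this scheme in C++, assuming 64-bit storage for the result. The multiplier below is a hypothetical odd constant standing in for "large_prime" (any odd value keeps the mapping one-to-one mod 2^29), and the function names are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kModulus = 536870912ULL;      // 2^29, as in the formula above
constexpr std::uint64_t kMultiplier = 2654435761ULL;  // hypothetical odd multiplier

// Hash = ((count * large_prime) % 2^29) * 11
std::uint64_t encode_with_check(std::uint64_t count) {
    assert(count < kModulus);  // larger inputs would collide after the modulo
    return ((count * kMultiplier) % kModulus) * 11;
}

// Cheap sanity check before touching the database.
bool passes_check(std::uint64_t code) {
    return code % 11 == 0;
}

// Recovers the sequential number from a code that passes the check.
std::uint64_t decode_with_check(std::uint64_t code) {
    assert(passes_check(code));
    // Newton's iteration for the inverse of kMultiplier; arithmetic wraps mod 2^64,
    // and four steps give at least 48 correct low bits, more than the 29 needed.
    std::uint64_t inverse = kMultiplier;
    for (int i = 0; i < 4; ++i) {
        inverse *= 2 - kMultiplier * inverse;
    }
    return ((code / 11) * (inverse % kModulus)) % kModulus;
}
```

A value that fails passes_check can be rejected immediately as a typo; only values that pass need a lookup.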

You can use the mod operation with big prime numbers:
(your number * big prime number 1) mod big prime number 2
Prime number 1 should be bigger than prime number 2. Prime number 2 should be close to 2^32 but less than it. Then the IDs will be hard to substitute. The mapping is one-to-one as long as your number is smaller than prime number 2.
Prime 1 and prime 2 should be constants.

For our application, we use bit shuffle to generate the ID. It is very easy to reverse back to the original ID.
func (m Meeting) MeetingCode() uint {
hashed := (m.ID + 10000000) & 0x00FFFFFF
chunks := [24]uint{}
for i := 0; i < 24; i++ {
chunks[i] = hashed >> i & 0x1
}
shuffle := [24]uint{14, 1, 15, 21, 0, 6, 5, 10, 4, 3, 20, 22, 2, 23, 8, 13, 19, 9, 18, 12, 7, 11, 16, 17}
result := uint(0)
for i := 0; i < 24; i++ {
result = result | (chunks[shuffle[i]] << i)
}
return result
}
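The reverse direction is not shown above. A rough C++ sketch of both directions, using the same table, offset and mask as the Go snippet (the function names are made up for illustration), could look like this:

```cpp
#include <cassert>
#include <cstdint>

// Same table as in the Go snippet: output bit i is taken from input bit kShuffle[i].
static const int kShuffle[24] = {14, 1, 15, 21, 0, 6, 5, 10, 4, 3, 20, 22,
                                 2, 23, 8, 13, 19, 9, 18, 12, 7, 11, 16, 17};
constexpr std::uint32_t kMask = 0x00FFFFFFu;  // 24 bits
constexpr std::uint32_t kOffset = 10000000u;

// Forward mapping: add the offset, then move every bit to its shuffled position.
std::uint32_t meeting_code(std::uint32_t id) {
    assert(id <= kMask);  // only 24-bit IDs map one-to-one
    const std::uint32_t hashed = (id + kOffset) & kMask;
    std::uint32_t result = 0;
    for (int i = 0; i < 24; ++i) {
        result |= ((hashed >> kShuffle[i]) & 1u) << i;
    }
    return result;
}

// Inverse mapping: move every bit back, then subtract the offset mod 2^24.
std::uint32_t meeting_id(std::uint32_t code) {
    std::uint32_t hashed = 0;
    for (int i = 0; i < 24; ++i) {
        hashed |= ((code >> i) & 1u) << kShuffle[i];
    }
    return (hashed - kOffset) & kMask;
}
```

Because the table is a permutation of 0..23, every bit lands in exactly one position and the mapping is one-to-one over the 24-bit range.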

There is an exceedingly simple solution that nobody has posted. Even though an answer has been selected, I highly advise anyone visiting this question to consider the nature of binary representations and the application of modular arithmetic.
Given a finite range of integers, all the values can be permuted in any order through a simple addition over their index, bounded to the range of the index by a modulo operation. You could even leverage simple integer overflow so that the modulo operator is not necessary.
Essentially, you'd have a static variable in memory, and a function that, when called, increments the static variable by some constant, enforces the boundaries, and then returns the value. This output could be an index into a collection of desired outputs, or the desired output itself.
The constant of the increment that defines the mapping may be several times the size in memory of the value being returned, but given any mapping there exists some finite constant that will achieve the mapping through trivial modular arithmetic.
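One way to read this suggestion, sketched in C++ under the assumption that the range is the full 32-bit range and the increment is odd (the class name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Stateful counter that steps by a fixed constant and relies on unsigned overflow
// instead of an explicit modulo.
class SteppedCounter {
public:
    explicit SteppedCounter(std::uint32_t step) : step_(step) {
        assert((step & 1u) == 1u);  // an odd step visits all 2^32 values before repeating
    }

    // Returns the next value in the permuted sequence.
    std::uint32_t next() {
        state_ += step_;  // wraps mod 2^32
        return state_;
    }

private:
    std::uint32_t state_ = 0;
    std::uint32_t step_;
};
```

The k-th value returned is k * step mod 2^32, so with an odd step this produces the same sequence as the multiplication approach in the earlier answer, just computed incrementally.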


two dimensional array and pointer arithmetic

I am trying to copy a 2-dimensional array to another 2-dimensional array. Since the name (srcAry) is the address of the first element of the source array, I have been able to print out all the values in the source array using pointer arithmetic in a for loop. I am using the number of rows times the number of columns as the condition to stop looping. If I try to assign the values to the new array using this method I get an error message (error: assignment to expression with array type). Is it possible to do this, or am I limited to using two nested for loops with indexes?
...
void copyAry(double *pAry, int numRows, int numCols)
{
double newAry[numRows][numCols];
int end = numRows * numCols;
int ctr = 0;
for( ; ctr < end; ctr++)
// printf("*(pAry + %d) = %.1f\n", ctr, *(pAry + ctr)); //this works fine
{
*(newAry + ctr) = *(pAry + ctr); //this is where I receive error
}
return;
}
...
Thanks in advance.
I would assume that the type of newAry + ctr is not double* as your code assumes, but rather double (*)[numCols], i.e. a pointer to an array of numCols elements. Dereferencing it yields a whole row of type double[numCols], and arrays cannot be assigned to, hence the error message. It also means that you would advance not one element at a time, but numCols elements.
Usually you would use memcpy for this kind of low-level data copying. Barring that, you might start with double* pNewAry = &newAry[0][0] or some such in order to treat the 2D array as a linear sequence of doubles.
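Both options can be sketched as follows. This is written as C++ with fixed-size arrays at the call site, since variable-length arrays as in the question are a C feature; the function names are made up, and the same memcpy call works in C via <string.h>:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies num_rows * num_cols doubles in one call, treating both arrays as flat storage.
void copy_ary(const double* src, double* dst, std::size_t num_rows, std::size_t num_cols) {
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst, src, num_rows * num_cols * sizeof(double));
}

// Same copy with a single loop and pointer arithmetic, as attempted in the question.
// It works here because dst is a double*, so *(dst + ctr) is a single double.
void copy_ary_loop(const double* src, double* dst, std::size_t num_rows, std::size_t num_cols) {
    assert(src != nullptr && dst != nullptr);
    const std::size_t end = num_rows * num_cols;
    for (std::size_t ctr = 0; ctr < end; ++ctr) {
        *(dst + ctr) = *(src + ctr);
    }
}
```

A caller with double a[2][3] and double b[2][3] would pass &a[0][0] and &b[0][0], which are plain double* pointers to the first element.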

Constraint on an array with same values group together

I have two rand arrays: pointer and value. Every value in pointer should also appear in value, as many times as the value itself. For example, if pointer[i] == 2, then value should contain 2 exactly two times, grouped together and placed after 1.
Expected result is shown below.
Sample code:
class ABC;
rand int unsigned pointer[$];
rand int unsigned value[20];
int count;
constraint c_mode {
pointer.size() == count;
solve pointer before value;
//======== Pointer constraints =========//
// To avoid duplicates
unique {pointer};
foreach(pointer[i]) {
// Make sure pointer is inside 1 to 4
pointer[i] inside {[1:4]};
// Make sure in increasing order
if (i>0)
pointer[i] > pointer[i-1];
}
//======== Value constraints =========//
//Make sure Pointer = 2 has to come two times in value, but this is not working as expected
foreach(pointer[i]) {
value.sum with (int'(item == pointer[i])) == pointer[i];
}
// Ensure it will be in increasing order but not making sure that pointers are not grouping together
// For eg: if pointer = 2, then 2 has to come two times together and after 1 in the array order. This is not met with the below constraint
foreach(value[i]) {
foreach(value[j]) {
((i>j) && (value[i] inside pointer) && (value[j] inside pointer)) -> value[i] >= value[j];
}
}
}
function new(int num);
count = num;
endfunction
endclass
module tb;
initial begin
int unsigned index;
ABC abc = new(4);
abc.randomize();
$display("-----------------");
$display("Pointer = %p", abc.pointer);
$display("Value = %p", abc.value);
$display("-----------------");
end
endmodule
I would implement this using a couple of helper arrays:
class pointers_and_values;
rand int unsigned pointers[];
rand int unsigned values[];
local rand int unsigned values_dictated_by_pointers[][];
local rand int unsigned filler_values[][];
// ...
endclass
The values_dictated_by_pointers array will contain the groups of values that your pointers mandate. The other array will contain the dummy values that come between these groups. So, the values array will contain filler_values[0], values_dictated_by_pointers[0], filler_values[1], values_dictated_by_pointers[1], etc.
Computing the values mandated by the pointers is easy:
constraint compute_values_dicated_by_pointers {
values_dictated_by_pointers.size() == pointers.size();
foreach (pointers[i]) {
values_dictated_by_pointers[i].size() == pointers[i];
foreach (values_dictated_by_pointers[i,j])
values_dictated_by_pointers[i][j] == pointers[i];
}
}
You need as many groups as you need pointers. In each group you have as many elements as the pointer value for that group. Also, each element of a group has the same value as the group's pointer value.
For the filler values you didn't mention what they should look like. I interpreted your problem description to say that the values in the pointers array should only come in the patterns described above. This means that they are not allowed as filler values. Depending on whether you want to allow filler values before the first value, you will need either as many filler groups as you have pointers or one extra. In the following code I allowed filler values before the "real" values:
constraint compute_filler_values {
filler_values.size() == pointers.size() + 1;
foreach (filler_values[i, j])
!(filler_values[i][j] inside { pointers });
}
You'll also need to constrain the size of each of the filler value groups, otherwise the solver will leave them as 0. Here you can change the constraints to match your requirements. I chose to always insert filler values and to never insert more than 3 filler values.
constraint max_number_of_filler_values {
foreach (filler_values[i]) {
filler_values[i].size() > 0;
filler_values[i].size() <= 3;
}
}
For the real values array, you can compute its value in post_randomize() by interleaving the other two arrays:
function void post_randomize();
values = filler_values[0];
foreach (pointers[i])
values = { values, values_dictated_by_pointers[i], filler_values[i + 1] };
endfunction
If you need to be able to constrain values as well, then you'll have to implement this interleaving operation using constraints. I'm not going to show this, as it is probably pretty complicated in itself and warrants its own question.
Be aware that the code above might not work on all EDA tools, because of spotty support for random multi-dimensional arrays. I only got this to work on Aldec Riviera Pro on EDA Playground.

How to find prime number on O(1) runtime

I got this question in an interview
Please provide a solution to check if a number is a prime number using
a loop of one - O(1). The input number can be between 1 and 10,000
only.
I said that it's impossible unless you have stored all prime numbers up to 10,000. Now I am not entirely sure whether my answer was correct. I tried to search for an answer on the internet, and the best I came up with was the AKS algorithm, with a run-time of O((log n)^6).
It is doable using SoE (Sieve of Eratosthenes). Its result is an array of bools, usually encoded as single bits in a BYTE/WORD/DWORD array for denser storage. Also, usually only the odd numbers are stored, as the even numbers except 2 are all non-prime. Usually a true value means the number is not prime.
So the naive O(1) C++ code for checking x would look like:
bool SoE[10001]; // precomputed sieve array
int x = 27; // any x <0,10000>
bool x_is_prime = !SoE[x];
If the SoE is encoded as an 8-bit BYTE array, you need to tweak the access a bit:
BYTE SoE[1251]; // precomputed sieve bit array, ceil(10001/8) bytes
int x = 27; // any x <0,10000>
bool x_is_prime = !(SoE[x>>3] & (1<<(x&7))); // bit set means not prime
Of course, constructing the SoE is not O(1)! Here is an example that uses it heavily to speed up my IsPrime function:
Prime numbers by Eratosthenes quicker sequential than concurrently?
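For completeness, the precomputation step could look roughly like this. It is a sketch with made-up names that keeps the convention used above (true means not prime) and builds the table once, on first use:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLimit = 10000;  // upper bound from the interview question

// Sieve of Eratosthenes: composite[x] is true when x is NOT prime.
std::array<bool, kLimit + 1> build_composite_table() {
    std::array<bool, kLimit + 1> composite{};
    composite[0] = true;
    composite[1] = true;
    for (std::size_t p = 2; p * p <= kLimit; ++p) {
        if (composite[p]) continue;  // p has a smaller factor, its multiples are marked already
        for (std::size_t m = p * p; m <= kLimit; m += p) {
            composite[m] = true;
        }
    }
    return composite;
}

// O(1) per query once the table exists; the table is built on the first call only.
bool is_prime(std::size_t x) {
    assert(x <= kLimit);
    static const std::array<bool, kLimit + 1> composite = build_composite_table();
    return !composite[x];
}
```

The one-time build cost is what makes each later query a single array access.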
Yes!
You can use the Sieve of Eratosthenes to check whether a number is prime or not.
However, you will have to precompute it for a certain range of values and store the result in an array; then each query can be checked in O(1).
If you do not want to precompute, as it will take O(n log log n) time, then you can use this concept:
if P is a prime number greater than 3, then P^2 - 1 is divisible by 24.
Note that this is only a necessary condition (25^2 - 1 = 624 = 24 * 26, yet 25 is not prime), so it can rule numbers out but cannot prove primality. In C++, if the given number is less than or equal to 10^9, P^2 fits in a 64-bit integer and we can use this concept.
More on this concept can be found at www.brilliant.org
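public static boolean prime(int n) {
    // Valid for 1 <= n <= 10,000: every composite in that range has a prime factor <= 97,
    // so the loop runs at most 25 times regardless of n.
    final int[] smallPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                               43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97};
    if (n < 2)
        return false;
    for (int p : smallPrimes) {
        if (n == p)
            return true;
        if (n % p == 0)
            return false;
    }
    return true;
}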

two dimensional vector

I wanted to have a linked list of nodes with below structure.
struct node
{
string word;
string color;
node *next;
};
For some reasons I decided to use a vector instead of a list. My question is: is it possible to implement a vector that is bounded in the j direction and unlimited in the i direction, and to add two more strings at the end of my vector?
In other words, is it possible to implement the structure below with a vector?
j
i color1 color2 …
word1 word2 …
I am not good with C/C++, so this answer will only be very general. Unless you are extremely concerned about speed or memory optimization (most of the time you shouldn't be), use encapsulation.
Make a class. Make an interface which says what you want to do. Make the simplest possible implementation of how to do it. Most of the time, the simplest implementation is good enough, unless it contains some bugs.
Let's start with the interface. You could have made it part of the question. To me it seems that you want a two-dimensional something-like-an-array of strings, where one dimension allows only values 0 and 1, and the other dimension allows any non-negative integers.
Just to make sure there is no misunderstanding: The bounded dimension is always size 2 (not at most 2), right? So we are basically speaking about 2×N "rectangles" of strings.
What methods will you need? My guesses: A constructor for a new 2×0 size rectangle. A method to append a new pair of values, which increases the size of the rectangle from 2×N to 2×(N+1) and sets the two new values. A method which returns the current length of the rectangle (only the unbounded dimension, because the other one is constant). And a pair of random-access methods for reading or writing a single value by its coordinates. Is that all?
Let's write the interface (sorry, I am not good at C/C++, so this will be some C/Java/pseudocode hybrid).
class StringPairs {
constructor StringPairs(); // creates an empty rectangle
int size(); // returns the length of the unbounded dimension
void append(string s0, string s1); // adds two strings to the new J index
string get(int i, int j); // return the string at given coordinates
void set(int i, int j, string s); // sets the string at given coordinates
}
We should specify what will the functions "set" and "get" do, if the index is out of bounds. For simplicity, let's say that "set" will do nothing, and "get" will return null.
Now we have the question ready. Let's get to the answer.
I think the fastest way to write this class would be to simply use the existing C++ class for one-dimensional vector (I don't know what it is and how it is used, so I just assume that it exists, and will use some pseudocode; I will call it "StringVector") and do something like this:
class StringPairs {
private StringVector _vector0;
private StringVector _vector1;
private int _size;
constructor StringPairs() {
_vector0 = new StringVector();
_vector1 = new StringVector();
_size = 0;
}
int size() {
return _size;
}
void append(string s0, string s1) {
_vector0.append(s0);
_vector1.append(s1);
_size++;
}
string get(int i, int j) {
if (0 == i) return _vector0.get(j);
if (1 == i) return _vector1.get(j);
return null;
}
void set(int i, int j, string s) {
if (0 == i) _vector0.set(j, s);
if (1 == i) _vector1.set(j, s);
}
}
Now, translate this pseudocode to C++, and add any new methods you need (it should be obvious how).
Using the existing classes to build your new classes can help you program faster. And if you later change your mind, you can change the implementation while keeping the interface.
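One possible C++ translation of the pseudocode, as a sketch only: std::vector<std::string> plays the role of the assumed "StringVector", and since C++ strings cannot be null, get returns a pointer that is nullptr for out-of-bounds coordinates. The member names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class StringPairs {
public:
    // Length of the unbounded dimension.
    std::size_t size() const { return row0_.size(); }

    // Grows the rectangle from 2 x N to 2 x (N + 1).
    void append(const std::string& s0, const std::string& s1) {
        row0_.push_back(s0);
        row1_.push_back(s1);
        assert(row0_.size() == row1_.size());
    }

    // Returns nullptr when (i, j) is out of bounds, mirroring "get returns null".
    const std::string* get(std::size_t i, std::size_t j) const {
        if (i > 1 || j >= size()) return nullptr;
        return i == 0 ? &row0_[j] : &row1_[j];
    }

    // Does nothing when (i, j) is out of bounds, as specified above.
    void set(std::size_t i, std::size_t j, const std::string& s) {
        if (i > 1 || j >= size()) return;
        (i == 0 ? row0_ : row1_)[j] = s;
    }

private:
    std::vector<std::string> row0_;
    std::vector<std::string> row1_;
};
```

Note that a pointer returned by get may become invalid after a later append, because the underlying vectors can reallocate.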

Modifying motion vectors in ffmpeg H.264 decoder

For research purposes, I am trying to modify H.264 motion vectors (MVs) for each P- and B-frame prior to motion compensation during the decoding process. I am using FFmpeg for this purpose. An example of a modification is replacing each MV with the mean of its original spatial neighbors and then using the resultant MVs for motion compensation, rather than the original ones. Please direct me appropriately.
So far, I have been able to do a simple modification of MVs in the file /libavcodec/h264_cavlc.c. In the function, ff_h264_decode_mb_cavlc(), modifying the mx and my variables, for instance, by increasing their values modifies the MVs used during decoding.
For example, as shown below, the mx and my values are increased by 50, thus lengthening the MVs used in the decoder.
mx += get_se_golomb(&s->gb)+50;
my += get_se_golomb(&s->gb)+50;
However, in this regard, I don't know how to access the neighbors of mx and my for my spatial mean analysis that I mentioned in the first paragraph. I believe that the key to doing so lies in manipulating the array, mv_cache.
Another experiment that I performed was in the file libavcodec/error_resilience.c. Based on the guess_mv() function, I created a new function, mean_mv(), that is executed in ff_er_frame_end() within the first if-statement. That first if-statement exits the function ff_er_frame_end() if one of its conditions holds, for example a zero error count (s->error_count == 0). I decided to insert my mean_mv() function at this point so that it is always executed when there is a zero error count. This experiment somewhat yielded the results I wanted, as I could start seeing artifacts in the top portions of the video, but they were restricted just to the upper-right corner. I'm guessing that my inserted function is not being completed so as to meet playback deadlines or something.
Below is the modified if-statement. The only addition is my function, mean_mv(s).
if(!s->error_recognition || s->error_count==0 || s->avctx->lowres ||
s->avctx->hwaccel ||
s->avctx->codec->capabilities&CODEC_CAP_HWACCEL_VDPAU ||
s->picture_structure != PICT_FRAME || // we dont support ER of field pictures yet, though it should not crash if enabled
s->error_count==3*s->mb_width*(s->avctx->skip_top + s->avctx->skip_bottom)) {
//av_log(s->avctx, AV_LOG_DEBUG, "ff_er_frame_end in er.c\n"); //KG
if(s->pict_type==AV_PICTURE_TYPE_P)
mean_mv(s);
return;
And here's the mean_mv() function I created based on guess_mv().
static void mean_mv(MpegEncContext *s){
//uint8_t fixed[s->mb_stride * s->mb_height];
//const int mb_stride = s->mb_stride;
const int mb_width = s->mb_width;
const int mb_height= s->mb_height;
int mb_x, mb_y, mot_step, mot_stride;
//av_log(s->avctx, AV_LOG_DEBUG, "mean_mv\n"); //KG
set_mv_strides(s, &mot_step, &mot_stride);
for(mb_y=0; mb_y<s->mb_height; mb_y++){
for(mb_x=0; mb_x<s->mb_width; mb_x++){
const int mb_xy= mb_x + mb_y*s->mb_stride;
const int mot_index= (mb_x + mb_y*mot_stride) * mot_step;
int mv_predictor[5][2]={{0}}; // up to 4 neighbours plus one slot for their mean
int ref[5]={0};
int pred_count=0;
int m, n;
if(IS_INTRA(s->current_picture.f.mb_type[mb_xy])) continue;
//if(!(s->error_status_table[mb_xy]&MV_ERROR)){
//if (1){
if(mb_x>0){
mv_predictor[pred_count][0]= s->current_picture.f.motion_val[0][mot_index - mot_step][0];
mv_predictor[pred_count][1]= s->current_picture.f.motion_val[0][mot_index - mot_step][1];
ref [pred_count] = s->current_picture.f.ref_index[0][4*(mb_xy-1)];
pred_count++;
}
if(mb_x+1<mb_width){
mv_predictor[pred_count][0]= s->current_picture.f.motion_val[0][mot_index + mot_step][0];
mv_predictor[pred_count][1]= s->current_picture.f.motion_val[0][mot_index + mot_step][1];
ref [pred_count] = s->current_picture.f.ref_index[0][4*(mb_xy+1)];
pred_count++;
}
if(mb_y>0){
mv_predictor[pred_count][0]= s->current_picture.f.motion_val[0][mot_index - mot_stride*mot_step][0];
mv_predictor[pred_count][1]= s->current_picture.f.motion_val[0][mot_index - mot_stride*mot_step][1];
ref [pred_count] = s->current_picture.f.ref_index[0][4*(mb_xy-s->mb_stride)];
pred_count++;
}
if(mb_y+1<mb_height){
mv_predictor[pred_count][0]= s->current_picture.f.motion_val[0][mot_index + mot_stride*mot_step][0];
mv_predictor[pred_count][1]= s->current_picture.f.motion_val[0][mot_index + mot_stride*mot_step][1];
ref [pred_count] = s->current_picture.f.ref_index[0][4*(mb_xy+s->mb_stride)];
pred_count++;
}
if(pred_count==0) continue;
if(pred_count>=1){
int sum_x=0, sum_y=0, sum_r=0;
int k;
for(k=0; k<pred_count; k++){
sum_x+= mv_predictor[k][0]; // Sum all the MVx from MVs avail. for EC
sum_y+= mv_predictor[k][1]; // Sum all the MVy from MVs avail. for EC
sum_r+= ref[k];
// if(k && ref[k] != ref[k-1])
// goto skip_mean_and_median;
}
mv_predictor[pred_count][0] = sum_x/k;
mv_predictor[pred_count][1] = sum_y/k;
ref [pred_count] = sum_r/k;
}
s->mv[0][0][0] = mv_predictor[pred_count][0];
s->mv[0][0][1] = mv_predictor[pred_count][1];
for(m=0; m<mot_step; m++){
for(n=0; n<mot_step; n++){
s->current_picture.f.motion_val[0][mot_index + m + n * mot_stride][0] = s->mv[0][0][0];
s->current_picture.f.motion_val[0][mot_index + m + n * mot_stride][1] = s->mv[0][0][1];
}
}
decode_mb(s, ref[pred_count]);
//}
}
}
}
I would really appreciate some assistance on how to go about this properly.
It's been a long time since I have been in touch with FFmpeg's code internals.
However, given my experience with the horrors inside FFmpeg (you will know what I mean), I would rather give you some simple, pragmatic advice.
Suggestion #1
The best possibility is that, once the motion vector of each block has been identified, you create your own additional array inside the FFmpeg encoder context (a.k.a. s) that stores all of them. When your algorithm runs, it picks up the values from there.
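To illustrate the idea independently of FFmpeg's internals, here is a small standalone C++ sketch. Nothing in it is FFmpeg API: the MotionVector struct, the flat grid layout and the function name are all assumptions made for the example. It averages each block's neighbours from a saved copy, so blocks processed later still see the original vectors rather than already averaged ones:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one block's motion vector.
struct MotionVector {
    int x;
    int y;
};

// Returns a new grid in which every vector is the mean of its original
// left/right/top/bottom neighbours. The input grid is never modified.
std::vector<MotionVector> neighbour_mean(const std::vector<MotionVector>& original,
                                         std::size_t width, std::size_t height) {
    assert(original.size() == width * height);
    std::vector<MotionVector> result(original);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            int sum_x = 0, sum_y = 0, count = 0;
            auto add = [&](std::size_t nx, std::size_t ny) {
                const MotionVector& n = original[ny * width + nx];
                sum_x += n.x;
                sum_y += n.y;
                ++count;
            };
            if (x > 0) add(x - 1, y);
            if (x + 1 < width) add(x + 1, y);
            if (y > 0) add(x, y - 1);
            if (y + 1 < height) add(x, y + 1);
            if (count > 0) {
                // Integer division truncates toward zero.
                result[y * width + x] = MotionVector{sum_x / count, sum_y / count};
            }
        }
    }
    return result;
}
```

Reading from the saved copy matters: averaging in place would feed already modified left and top neighbours into the blocks that follow.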
Suggestion #2
Another thing I read (I am not sure if I read it right):
the mx and my values are increased by 50
I think 50 is a very large motion vector offset. Usually the range allowed for encoded motion vectors (the F-value) is quite restrictive. Altering things by +/- 8 (or even +/- 16) might just be OK, but +50 could be so large that the result may not be handled properly.
I didn't quite understand your objective with mean_mv() and what failure you expect from it. Please rephrase a bit.
