How can I best check these Elliptic Curve parameters are valid? - encryption

(Just a bit of context:) I'm a novice to cryptography, but for a school project I wanted to create a proof-of-concept 64-bit ECC curve. (Yes, I do know 64-bit keys are not very secure!) However, as far as I know there are no SEC standard parameters for 64-bit curves, only for 160-512 bits.
So I had to generate my own parameters, which is the bit I'm quite unsure about. I followed a quick guide and came out with these parameters for my curve:
p = 10997031918897188677
a = 3628449283386729367
b = 4889270915382004880
x = 3124469192170877657
y = 4370601445727723733
n = 10997031916045924769 (order)
h = 1 (co-factor)
Could someone give me some advice on whether this curve will generate valid private/public key pairs? How could I check this?
Any help (or confirmation) would be greatly appreciated, thanks!

You can use OpenSSL's EC_GROUP_check() function to make sure it's a valid group. In the following program I did two things:
1. Generate an EC_GROUP with the provided parameters, and check that it is valid.
2. Generate an EC_KEY using that EC_GROUP, and check that it is valid.
Note that the thing to validate is the EC group (the curve together with its generator, order and cofactor), not the curve alone, because the group is what key generation uses.
Please read the comments for details :)
// gcc 22270485.c -lcrypto -o 22270485
#include <openssl/bn.h>
#include <openssl/crypto.h>
#include <openssl/ec.h>
#include <stdio.h>
int main(){
    BN_CTX *ctx = NULL;
    BIGNUM *p, *a, *b, *x, *y, *order;
    EC_GROUP *group;
    EC_POINT *G;
    int ok = 1;
    ctx = BN_CTX_new();
    p = BN_new();
    a = BN_new();
    b = BN_new();
    x = BN_new();
    y = BN_new();
    order = BN_new();
    /* Set EC_GROUP */
    group = EC_GROUP_new(EC_GFp_mont_method());
    BN_dec2bn(&p, "10997031918897188677");
    BN_dec2bn(&a, "3628449283386729367");
    BN_dec2bn(&b, "4889270915382004880");
    EC_GROUP_set_curve_GFp(group, p, a, b, ctx);
    /* Set generator G=(x,y), its order and the cofactor (1) */
    G = EC_POINT_new(group);
    BN_dec2bn(&x, "3124469192170877657");
    BN_dec2bn(&y, "4370601445727723733");
    BN_dec2bn(&order, "10997031916045924769");
    EC_POINT_set_affine_coordinates_GFp(group, G, x, y, ctx);
    EC_GROUP_set_generator(group, G, order, BN_value_one());
    /* Checks whether the parameters in the EC_GROUP define a valid EC group */
    if (!EC_GROUP_check(group, ctx)) {
        fprintf(stdout, "EC_GROUP_check() failed\n");
        ok = 0;
    }
    if (ok) {
        fprintf(stdout, "It is a valid EC group\n");
    }
    /* Generate a private/public key pair with above EC_GROUP */
    if (ok) {
        const BIGNUM *private_key;
        const EC_POINT *public_key;
        BIGNUM *pub_x, *pub_y;
        EC_KEY *eckey;
        char *priv_dec, *x_dec, *y_dec;
        pub_x = BN_new(); pub_y = BN_new();
        eckey = EC_KEY_new();
        /* create key on group */
        EC_KEY_set_group(eckey, group);
        EC_KEY_generate_key(eckey);
        /* Verifies that a private and/or public key is valid */
        if (!EC_KEY_check_key(eckey)) {
            fprintf(stdout, "EC_KEY_check_key() failed\n");
            ok = 0;
        }
        if (ok) {
            fprintf(stdout, "It is a valid EC key, where\n");
            private_key = EC_KEY_get0_private_key(eckey);
            /* BN_bn2dec() returns a heap string; release it with OPENSSL_free() */
            priv_dec = BN_bn2dec(private_key);
            fprintf(stdout, "\tprivate key = %s", priv_dec);
            OPENSSL_free(priv_dec);
            public_key = EC_KEY_get0_public_key(eckey);
            EC_POINT_get_affine_coordinates_GFp(group, public_key, pub_x, pub_y, ctx);
            x_dec = BN_bn2dec(pub_x);
            y_dec = BN_bn2dec(pub_y);
            fprintf(stdout, "\n\tpublic key = ( %s , %s )\n", x_dec, y_dec);
            OPENSSL_free(x_dec);
            OPENSSL_free(y_dec);
        }
        BN_free(pub_x); BN_free(pub_y);
        EC_KEY_free(eckey);
    }
    if (ctx)
        BN_CTX_free(ctx);
    BN_free(p); BN_free(a); BN_free(b);
    EC_GROUP_free(group);
    EC_POINT_free(G);
    BN_free(x); BN_free(y); BN_free(order);
    return 0;
}
Compile and run with this command:
$ gcc 22270485.c -lcrypto -o 22270485
$ ./22270485
The stdout should print
It is a valid EC group
It is a valid EC key, where
private key = 1524190197747279622
public key = ( 3228020167903858345 , 9344375093791763077 )
The private/public key pair will change on every run, since EC_KEY_generate_key(eckey) randomly chooses a private key and computes the corresponding public key each time.
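As a cross-check that does not depend on OpenSSL, the three basic conditions behind such a group check (non-zero discriminant, generator on the curve, order times generator equals the point at infinity) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, p is assumed to be prime, and neither the primality of p and n nor the cofactor is verified here.

```python
def inv_mod(k, p):
    """Modular inverse of k modulo the prime p (needs Python 3.8+)."""
    return pow(k % p, -1, p)


def point_add(P, Q, a, p):
    """Add two points of y^2 = x^3 + a*x + b over GF(p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P), which also covers doubling a point with y = 0
    if P == Q:
        s = (3 * x1 * x1 + a) * inv_mod(2 * y1, p) % p  # tangent slope
    else:
        s = (y2 - y1) * inv_mod(x2 - x1, p) % p  # chord slope
    x3 = (s * s - x1 - x2) % p
    y3 = (s * (x1 - x3) - y1) % p
    return (x3, y3)


def scalar_mult(k, P, a, p):
    """Compute k*P with the double-and-add method."""
    result, addend = None, P
    while k:
        if k & 1:
            result = point_add(result, addend, a, p)
        addend = point_add(addend, addend, a, p)
        k >>= 1
    return result


def check_params(p, a, b, G, n):
    """Run the three basic checks and report each one separately."""
    x, y = G
    nonsingular = (4 * a**3 + 27 * b**2) % p != 0
    on_curve = (y * y - (x**3 + a * x + b)) % p == 0
    try:
        # Only meaningful if G is on the curve, so short-circuit otherwise.
        order_ok = on_curve and scalar_mult(n, G, a, p) is None
    except ValueError:
        # A non-invertible value shows up if p is not prime or the inputs are inconsistent.
        order_ok = False
    return {
        "nonsingular": nonsingular,
        "generator_on_curve": on_curve,
        "order_kills_generator": order_ok,
    }


if __name__ == "__main__":
    # Parameters from the question.
    print(check_params(
        p=10997031918897188677,
        a=3628449283386729367,
        b=4889270915382004880,
        G=(3124469192170877657, 4370601445727723733),
        n=10997031916045924769,
    ))
```

A textbook toy curve is handy for sanity-testing the sketch itself: y^2 = x^3 + 2x + 2 over GF(17) with generator (5, 1) has order 19.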

Related

Parse EC Public key

I am working on ECIES (with ECDH) and need to load the peer's public key.
When I load the public key from a PEM file there is no issue, but reading the same key from a base64 string in memory fails.
Issue here:
EVP_PKEY * get_peer_key()
{
    // base64 certificate data of alice_pub_key.pem
    char *buffer = "MFYwEAYHKoZIzj0CAQYFK4EEAAoDQgAEjWrT7F97QrSqGrlIgPK8dphNBicNO6gDLfOIMjhF2MiLuuzd7L7BP+bLCuNtKKe/2dOkgPqgXv4BFWqgp6PZXQ==";
    // calculate buffer length
    int l = strlen(buffer);
    // create bio from buffer
    BIO *in = BIO_new_mem_buf(buffer, l);
    // generate ec key
    EC_KEY *eckey = PEM_read_bio_EC_PUBKEY(in, NULL, NULL, NULL); // ==> FAIL
    // need to convert to EVP format
    EVP_PKEY *peerKey = EVP_PKEY_new();
    // assign ec key evp
    if (EVP_PKEY_assign_EC_KEY(peerKey, eckey) != 1)
        printf("\n error happened");
    return peerKey;
}
Works fine:
EVP_PKEY * get_peer_key()
{
    // Load PEM format file
    char * infile = "alice_pub_key.pem";
    // create bio
    BIO *in = BIO_new(BIO_s_file());
    // read bio file
    BIO_read_filename(in, infile);
    // create eckey
    EC_KEY *eckey = PEM_read_bio_EC_PUBKEY(in, NULL, NULL, NULL); // ==> success
    // create peer key
    EVP_PKEY *peerKey = EVP_PKEY_new();
    // assign public key
    if (EVP_PKEY_assign_EC_KEY(peerKey, eckey) != 1)
        printf("\n error happened");
    return peerKey;
}
Can someone suggest what is going wrong when reading the base64 data of the PEM file?
PEM_read_bio_EC_PUBKEY() expects complete PEM text, including the -----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY----- lines, and the in-memory string contains only the base64 body. There are two ways of solving this:
1. Create a PEM by adding a header line, a footer line and line breaks (at the 64th character);
2. Base64-decode the text and then parse the resulting ASN.1 / DER binary.
I'd prefer the latter: adding lines and such is error-prone at best, and string manipulation should be avoided where possible.
Note that this assumes that the base64 contains a SubjectPublicKeyInfo structure, which I've shown you earlier. Otherwise you may have to find out how to parse an X9.62 structure or just a point.
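Both routes can be sketched in Python to show what the data has to look like before OpenSSL sees it. This is a minimal illustration, not the answerer's code: to_pem and to_der are names invented for this sketch, and the "PUBLIC KEY" label assumes the base64 really holds a SubjectPublicKeyInfo structure.

```python
import base64

# Base64 body from the question (no header, footer or line breaks).
B64_KEY = (
    "MFYwEAYHKoZIzj0CAQYFK4EEAAoDQgAEjWrT7F97QrSqGrlIgPK8dphNBicNO6gD"
    "LfOIMjhF2MiLuuzd7L7BP+bLCuNtKKe/2dOkgPqgXv4BFWqgp6PZXQ=="
)


def to_pem(b64_text, label="PUBLIC KEY"):
    """Option 1: wrap the base64 at 64 characters and add the PEM header and footer."""
    lines = [b64_text[i:i + 64] for i in range(0, len(b64_text), 64)]
    body = "\n".join(lines)
    return f"-----BEGIN {label}-----\n{body}\n-----END {label}-----\n"


def to_der(b64_text):
    """Option 2: decode the base64 to the raw ASN.1 / DER bytes."""
    return base64.b64decode(b64_text, validate=True)


if __name__ == "__main__":
    print(to_pem(B64_KEY))
    der = to_der(B64_KEY)
    print(len(der), "DER bytes, first byte", hex(der[0]))
```

In C, the PEM text from option 1 is what a memory BIO would need to contain for PEM_read_bio_EC_PUBKEY(), while the DER bytes from option 2 are the kind of input that a d2i-style parser such as d2i_PUBKEY() takes.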

Change and wrap keyword integers without loop in C

I'm writing a program that accepts a string at the command prompt and then converts each character of the string to the corresponding number 0-25 for its position in the alphabet. Each number is then used to encipher one character of another string, which the user enters after being prompted by the program. Each alphabetic character of the second string should be paired with the numbers in order, and the sequence of numbers should wrap around if the second string is longer. The goal of the program is to use the first string as a key to shift each character of a message (the second string).
Example (desired output):
User runs program and enters keyword: bad
User is prompted to enter string of alphabetical characters and punctuation only: Dr. Oz
Program converts keyword 'bad' into 1,0,3
Program enciphers message into Er. Ra
What I actually get is:
… T.B.S. …
I've tried many things but unfortunately I can't seem to figure out how to loop and wrap the key without looping the second message. If you run the program you will see my problem.
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int shift(char key1);
int main(int argc, string argv[]) // user enter number at cmd prompt
{
    if (argv[1] == '\0')
    {
        printf("Usage: ./vigenere keyword\n");
        return 1;
    }
    string key = argv[1]; // declare second arg as string
    for (int i = 0, n = strlen(key); i < n; i++)
        if (isdigit(key[i]) != 0 || argc != 2)
        {
            printf("Usage: ./vigenere keyword\n");
            return 1;
        }
    string text = get_string("plaintext: ");
    printf("ciphertext: ");
    int k;
    char t;
    for (int j = 0, o = strlen(text); j < o; j++)
    {
        t = text[j];
        for (int i = 0, n = strlen(key); i < n; i++)
        {
            k = shift(key[i]);
            if (isupper(t))
            {
                t += k;
                if (t > 'Z')
                {
                    t -= 26;
                }
            }
            if (islower(t))
            {
                t += k;
                if (t > 'z')
                {
                    t -= 26;
                }
            }
            printf("%c", t);
        }
    }
    printf("\n");
}
int shift(char key1)
{
    int k1 = key1;
    if (islower(key1))
    {
        k1 %= 97;
    }
    if (isupper(key1))
    {
        k1 %= 65;
    }
    return k1;
}
I appreciate any help and suggestions but please keep in mind the solution should match the level of coding my program suggests. There may be many advanced ways to write this program but unfortunately we are still in the beginning of this course so showing new methods (which I will definitely try to understand) may go over my head.
Here's a modified version of your code, with changes based on my comments:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int shift(char key1);
int main(int argc, string argv[]) // user enter number at cmd prompt
{
    if (argc != 2 || argv[1][0] == '\0')
    {
        fprintf(stderr, "Usage: ./vigenere keyword\n");
        return 1;
    }
    string key = argv[1]; // declare second arg as string
    for (int i = 0, n = strlen(key); i < n; i++)
    {
        if (!isalpha(key[i]))
        {
            fprintf(stderr, "Usage: ./vigenere keyword\n");
            return 1;
        }
    }
    string text = get_string("plain text: ");
    printf("ciphertext: ");
    int keylen = strlen(key);
    int keyidx = 0;
    for (int j = 0, o = strlen(text); j < o; j++)
    {
        int t = text[j];
        if (isupper(t))
        {
            int k = shift(key[keyidx++ % keylen]);
            t += k;
            if (t > 'Z')
                t -= 26;
        }
        else if (islower(t))
        {
            int k = shift(key[keyidx++ % keylen]);
            t += k;
            if (t > 'z')
                t -= 26;
        }
        printf("%c", t);
    }
    printf("\n");
}
int shift(char key1)
{
    if (islower(key1))
        key1 -= 'a';
    if (isupper(key1))
        key1 -= 'A';
    return key1;
}
The test for exactly two arguments and for a non-empty key are moved to the top. This is slightly different from what was suggested in the comments. The error messages are printed to standard error, not standard output. I'd probably replace the second 'usage' message with a more specific error — the key may only contain alphabetic characters or thereabouts. And the errors should include argv[0] as the program name rather than hard-coding the name. The key validation loop checks that the key is all alphabetic, rather than checking that they are not digits — there are more character classes than digits and letters. The code uses keyidx and keylen to track the length of the key and the position in the key. I use single-letter variable names, but usually only for loop indexes or simple pointers (usually pointers into strings); otherwise I use short semi-mnemonic names. There are two calls to shift() so that keyidx is only incremented when the input character is a letter. There are other ways that this could be coded.
One very important change not foretold in the comments is the change of type for t, from char to int. When it is a char, if you encrypt the letter z with a letter late in the alphabet (e.g. y), the value 'z' + 24 is 146, which does not fit in the (signed) char type prevalent on Intel machines, whose maximum is 127. Storing it back into t typically gives a negative value (formally, the result of that conversion is implementation-defined). That leads to bogus outputs. Changing to int fixes that problem. Since the value of t is promoted to int anyway when passed to printf(), there is no harm done in the printing. I used the prompt plain text: with a space so that the input and output align on the page.
I decided not to use the extra local variable k1 in shift(). I also used subtraction instead of modulus as noted in the comments.
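The key-wrapping idea (advance the key index only when a letter is enciphered, and take it modulo the key length) is independent of C. Here is a minimal Python sketch of the same logic; encipher is a name chosen for this illustration, and it assumes the key contains only ASCII letters.

```python
def encipher(text, key):
    """Shift each ASCII letter of text by the matching key letter (a=0 ... z=25)."""
    shifts = [ord(c.lower()) - ord("a") for c in key]
    out = []
    keyidx = 0  # advances only when a letter is enciphered
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            k = shifts[keyidx % len(shifts)]  # wrap around the key
            keyidx += 1
            out.append(chr(base + (ord(ch) - base + k) % 26))
        else:
            out.append(ch)  # punctuation and spaces pass through unchanged
    return "".join(out)


if __name__ == "__main__":
    print(encipher("Dr. Oz", "bad"))
```

Because Python integers do not overflow, the modulo form can be used directly; the C version needs int for t for the reason given above.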
Given the program cc59 created from cc59.c, a sample run is:
$ cc59 bad
plain text: Dr. Oz
ciphertext: Er. Ra
$ cc59 zax
plain text: Er. Ra
ciphertext: Dr. Oz
$ cc59 ablewasiereisawelba
plain text: The quick brown fox jumps over the lazy dog. Pack my box with five dozen liquor jugs. The five boxing wizards jump quickly. How vexingly quick daft zebras jump. Bright vixens jump; dozy fowl quack.
ciphertext: Tip uqius fisef fkb uvmpt zzar lpi cehq dkk. Abck nj fkx oqxy jqne zskfn ljbykr bckj. Xpw fezp coxjyk sirivuw rmml ufjckmj. Lkw nmbzrody mytdk dbqx vetzej ncep. Xvthht wtbank rydt; lgzu jzxl qvlgg.
$ cc59 azpweaiswjwsiaewpza
plain text: Tip uqius fisef fkb uvmpt zzar lpi cehq dkk. Abck nj fkx oqxy jqne zskfn ljbykr bckj. Xpw fezp coxjyk sirivuw rmml ufjckmj. Lkw nmbzrody mytdk dbqx vetzej ncep. Xvthht wtbank rydt; lgzu jzxl qvlgg.
ciphertext: The quick brown fox jumps over the lazy dog. Pack my box with five dozen liquor jugs. The five boxing wizards jump quickly. How vexingly quick daft zebras jump. Bright vixens jump; dozy fowl quack.
$
The decrypting keys were derived by matching the 'encrypting' letters in row 1 with the decrypting letters in row 2 of the data:
abcdefghijklmnopqrstuvwxyz
azyxwvutsrqponmlkjihgfedcb
With encryption and decryption, the most basic acid test for the code is that the program can decrypt its own encrypted output given the correct decrypting key and the cipher text.
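The two-row table amounts to replacing each key letter with shift k by the letter with shift (26 - k) mod 26. A small Python sketch of that derivation, with decrypting_key as a name made up for this illustration and a lowercase-only key assumed:

```python
def decrypting_key(key):
    """Map each key letter with shift k to the letter with shift (26 - k) % 26."""
    return "".join(
        chr(ord("a") + (26 - (ord(c) - ord("a"))) % 26)
        for c in key.lower()
    )


if __name__ == "__main__":
    print(decrypting_key("bad"))
    print(decrypting_key("ablewasiereisawelba"))
```

Applying the function twice returns the original key, which is another cheap self-check.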

Frama-c slice : choosing an entry to get pragma ctrl

I'm having a problem getting a CTRL slice.
I'm trying to analyze the following OpenSSL function:
int dtls1_process_heartbeat(SSL *s)
{
    unsigned char *p = &s->s3->rrec.data[0], *pl;
    unsigned short hbtype;
    unsigned int payload;
    unsigned int padding = 16; /* Use minimum padding */
    /* Read type and payload length first */
    hbtype = *p++;
    n2s(p, payload);
    pl = p;
    if (s->msg_callback)
        s->msg_callback(0, s->version, TLS1_RT_HEARTBEAT,
                        &s->s3->rrec.data[0], s->s3->rrec.length,
                        s, s->msg_callback_arg);
    if (hbtype == TLS1_HB_REQUEST)
    {
        unsigned char *buffer, *bp;
        int r;
        /* Allocate memory for the response, size is 1 byte
         * message type, plus 2 bytes payload length, plus
         * payload, plus padding
         */
        buffer = OPENSSL_malloc(1 + 2 + payload + padding);
        bp = buffer;
        /* Enter response type, length and copy payload */
        *bp++ = TLS1_HB_RESPONSE;
        s2n(payload, bp);
        /*@ slice pragma stmt; */
        memcpy(bp, pl, payload);
        bp += payload;
        /* Random padding */
        RAND_pseudo_bytes(bp, padding);
        r = dtls1_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);
        if (r >= 0 && s->msg_callback)
            s->msg_callback(1, s->version, TLS1_RT_HEARTBEAT,
                            buffer, 3 + payload + padding,
                            s, s->msg_callback_arg);
        OPENSSL_free(buffer);
        if (r < 0)
            return r;
    }
    else if (hbtype == TLS1_HB_RESPONSE)
    {
        unsigned int seq;
        /* We only send sequence numbers (2 bytes unsigned int),
         * and 16 random bytes, so we just try to read the
         * sequence number */
        n2s(pl, seq);
        if (payload == 18 && seq == s->tlsext_hb_seq)
        {
            dtls1_stop_timer(s);
            s->tlsext_hb_seq++;
            s->tlsext_hb_pending = 0;
        }
    }
    return 0;
}
I ran this command:
frama-c ./ssl/d1_both.c -main dtls1_process_heartbeat -slice-calls memcpy -cpp-command "gcc -C -E -I ./include/ -I ./" -then-on 'Slicing export' -print
That produced nothing, so I then tried this, hoping to get a backward slice:
frama-c ./ssl/d1_both.c -main dtls1_process_heartbeat -slice-pragma dtls1_process_heartbeat -cpp-command "gcc -C -E -I ./include/ -I ./" -then-on 'Slicing export' -print
But I still get nothing useful, only this:
void dtls1_process_heartbeat(void);
void dtls1_process_heartbeat(void)
{
return;
}
How can I get a slice like this?
function A (){
…
memcpy()
...
}
function B (){
…
…
...
}
function C (){
…
memcpy()
...
}
I want to capture everything to do with memcpy(), so I want to keep A and C, but not B.
How should I choose an entry point? How do I choose the pragma?
I hope I've stated my question clearly; it's had me confused for days.
First, notice that Frama-C Fluorine is an obsolete version. It was released more than 3 years ago, and some slicing-related bugs have been fixed in the meantime. Please upgrade to a newer version, preferably Aluminium.
Second, the documentation for option -slice-value is
select the result of left-values v1,...,vn at the
end of the function given as entry point (addresses are
evaluated at the beginning of the function given as entry
point)
It is unlikely to do what you want. Did you try option -slice-calls, more precisely -slice-calls memcpy ?
Also, keep in mind that B will be kept in the slice if it computes a value that is later used within a call to memcpy.

"uncompressable" data sequence

I would like to generate an "uncompressable" data sequence of X MBytes through an algorithm. I want it that way in order to create a program that measures the network speed through VPN connection (avoiding vpn built-in compression).
Can anybody help me? Thanks!
PS. I need an algorithm. I have used a file compressed to the point that it cannot be compressed any further, but now I need to generate the data sequence from scratch programmatically.
White noise data is truly random and thus incompressible.
Therefore, you should find an algorithm that generates it (or an approximation).
Try this in Linux:
# dd if=/dev/urandom bs=1024 count=10000 2>/dev/null | bzip2 -9 -c -v > /dev/null
(stdin): 0.996:1, 8.035 bits/byte, -0.44% saved, 10240000 in, 10285383 out.
You might try any kind of random number generation though...
One simple approach to creating statistically hard-to-compress data is just to use a random number generator. If you need it to be repeatable, fix the seed. Any reasonably good random number generator will do. Ironically, the result is incredibly compressible if you know the random number generator: the only information present is the seed. However, it will defeat any real compression method.
Other answers have pointed out that random noise is incompressible, and good encryption functions have output that is as close as possible to random noise (unless you know the decryption key). So a good approach could be to just use random number generators or encryption algorithms to generate your incompressible data.
Genuinely incompressible (by any compression algorithm) bitstrings exist (for certain formal definitions of "incompressible"), but even recognising them is computationally undecidable, let alone generating them.
It's worth pointing out though that "random data" is only incompressible in that there is no compression algorithm that can achieve a compression ratio of better than 1:1 on average over all possible random data. However, for any particular randomly generated string, there may be a particular compression algorithm that does achieve a good compression ratio. After all, any compressible string should be possible output from a random generator, including stupid things like all zeroes, however unlikely.
So while the possibility of getting "compressible" data out of a random number generator or an encryption algorithm is probably vanishingly small, I would want to actually test the data before I use it. If you have access to the compression algorithm(s) used in the VPN connection that would be best; just randomly generate data until you get something that won't compress. Otherwise, just running it through a few common compression tools and checking that the size doesn't decrease would probably be sufficient.
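The "generate with a fixed seed, then test it" approach can be sketched as follows. The function names and the default seed are made up for this illustration, and zlib merely stands in for whatever compressor the VPN actually uses.

```python
import random
import zlib


def incompressible_bytes(size, seed=12345):
    """Return `size` pseudo-random bytes; the same seed always gives the same data."""
    if size == 0:
        return b""
    rng = random.Random(seed)
    return rng.getrandbits(8 * size).to_bytes(size, "little")


def compression_ratio(data, level=9):
    """Compressed size divided by original size (about 1.0 means no gain)."""
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    data = incompressible_bytes(1 << 20)  # 1 MiB
    print("random data ratio:", compression_ratio(data))
    print("all-zero ratio:   ", compression_ratio(bytes(1 << 20)))
```

Python's default generator is not cryptographically secure, which does not matter here: the only goal is to defeat general-purpose compressors.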
You have a couple of options:
1. Use a decent pseudo-random number generator
2. Use an encryption function like AES (implementations found everywhere)
Algorithm for option 2:
1. Come up with whatever key you want. All zeroes is fine.
2. Create an empty block.
3. Encrypt the block using the key.
4. Output the block.
5. If you need more data, go to step 3.
If done correctly, the datastream you generate will be mathematically indistinguishable from random noise.
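The feedback loop in these steps can be sketched in Python. Note the substitution: the standard library has no AES, so this sketch uses SHA-256 over key plus block as a stand-in for "encrypt the block using the key". That swap, the function name and the 16-byte zero key are all assumptions of the illustration; with a real AES implementation the loop keeps the same shape.

```python
import hashlib


def chained_blocks(size, key=bytes(16)):
    """Produce `size` bytes by repeatedly transforming the previous output block.

    SHA-256(key + block) stands in for AES encryption of the block here.
    """
    block = bytes(32)  # step 2: start from an empty (all-zero) block
    out = bytearray()
    while len(out) < size:
        block = hashlib.sha256(key + block).digest()  # step 3: transform the block
        out += block                                  # step 4: output the block
    return bytes(out[:size])                          # step 5: repeat until enough


if __name__ == "__main__":
    data = chained_blocks(1 << 16)
    print(len(data), "bytes, starting with", data[:8].hex())
```

The output is fully determined by the key, so two machines can generate the same test stream without exchanging a file.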
The following program (C/POSIX) produces incompressible data quickly; it should be in the gigabytes per second range. I'm sure it's possible to use the general idea to make it even faster (maybe using Djb's ChaCha core with SIMD?).
/* public domain, 2013 */
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#define R(a,b) (((a) << (b)) | ((a) >> (32 - (b))))
static void salsa_scrambler(uint32_t out[16], uint32_t x[16])
{
    int i;
    /* This is a quickly mutilated Salsa20 of only 1 round */
    x[ 4] ^= R(x[ 0] + x[12],  7);
    x[ 8] ^= R(x[ 4] + x[ 0],  9);
    x[12] ^= R(x[ 8] + x[ 4], 13);
    x[ 0] ^= R(x[12] + x[ 8], 18);
    x[ 9] ^= R(x[ 5] + x[ 1],  7);
    x[13] ^= R(x[ 9] + x[ 5],  9);
    x[ 1] ^= R(x[13] + x[ 9], 13);
    x[ 5] ^= R(x[ 1] + x[13], 18);
    x[14] ^= R(x[10] + x[ 6],  7);
    x[ 2] ^= R(x[14] + x[10],  9);
    x[ 6] ^= R(x[ 2] + x[14], 13);
    x[10] ^= R(x[ 6] + x[ 2], 18);
    x[ 3] ^= R(x[15] + x[11],  7);
    x[ 7] ^= R(x[ 3] + x[15],  9);
    x[11] ^= R(x[ 7] + x[ 3], 13);
    x[15] ^= R(x[11] + x[ 7], 18);
    for (i = 0; i < 16; ++i)
        out[i] = x[i];
}
#define CHUNK 2048
int main(void)
{
    uint32_t bufA[CHUNK];
    uint32_t bufB[CHUNK];
    uint32_t *input = bufA, *output = bufB;
    int i;
    /* Initialize seed */
    srand(time(NULL));
    for (i = 0; i < CHUNK; i++)
        input[i] = rand();
    while (1) {
        for (i = 0; i < CHUNK/16; i++) {
            salsa_scrambler(output + 16*i, input + 16*i);
        }
        write(1, output, sizeof(bufA));
        {
            /* Swap buffers so the last output becomes the next input */
            uint32_t *tmp = output;
            output = input;
            input = tmp;
        }
    }
    return 0;
}
A very simple solution is to generate a random string and then compress it.
An already compressed file is incompressible.
For copy-paste lovers, here is some C# code to generate files with (almost) incompressible content. The heart of the code is the MD5 hashing algorithm, but any cryptographically strong hash algorithm (one with a good random distribution in the final result) does the job (SHA1, SHA256, etc).
It just uses the bytes of the file number (a 32-bit little-endian signed integer on my machine) as the hash function's initial input, then rehashes and concatenates the output until the desired file size is reached. So the file content is deterministic (the same number always generates the same output), randomly distributed "junk" for the compression algorithm under test.
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
class Program {
    static void Main( string [ ] args ) {
        GenerateUncompressableTestFiles(
            outputDirectory : Path.GetFullPath( "." ),
            fileNameTemplate : "test-file-{0}.dat",
            fileCount : 10,
            fileSizeAsBytes : 16 * 1024
        );
        byte[] bytes = GetIncompressibleBuffer( 16 * 1024 );
    }//Main
    static void GenerateUncompressableTestFiles( string outputDirectory, string fileNameTemplate, int fileCount, int fileSizeAsBytes ) {
        using ( var md5 = MD5.Create() ) {
            for ( int number = 1; number <= fileCount; number++ ) {
                using ( var content = new MemoryStream() ) {
                    var inputBytes = BitConverter.GetBytes( number );
                    while ( content.Length <= fileSizeAsBytes ) {
                        var hashBytes = md5.ComputeHash( inputBytes );
                        content.Write( hashBytes, 0, hashBytes.Length );
                        inputBytes = hashBytes;
                        if ( content.Length >= fileSizeAsBytes ) {
                            var file = Path.Combine( outputDirectory, String.Format( fileNameTemplate, number ) );
                            File.WriteAllBytes( file, content.ToArray().Take( fileSizeAsBytes ).ToArray() );
                            break; // the file is complete; avoid writing it a second time
                        }
                    }//while
                }//using
            }//for
        }//using
    }//GenerateUncompressableTestFiles
    public static byte[] GetIncompressibleBuffer( int size, int seed = 0 ) {
        using ( var md5 = MD5.Create() ) {
            using ( var content = new MemoryStream() ) {
                var inputBytes = BitConverter.GetBytes( seed );
                while ( content.Length <= size ) {
                    var hashBytes = md5.ComputeHash( inputBytes );
                    content.Write( hashBytes, 0, hashBytes.Length );
                    inputBytes = hashBytes;
                    if ( content.Length >= size ) {
                        return content.ToArray().Take( size ).ToArray();
                    }
                }//while
            }//using
        }//using
        return Array.Empty<byte>();
    }//GetIncompressibleBuffer
}//class
I just created a (very simple and not optimized) C# console application that creates uncompressable files.
It scans a folder for textfiles (extension .txt) and creates a binary file (extension .bin) with the same name and size for each textfile.
Hope this helps someone.
Here is the C# code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var files = Directory.EnumerateFiles(@"d:\MyPath\To\TextFile\", "*.txt");
            var random = new Random();
            foreach (var fileName in files)
            {
                var fileInfo = new FileInfo(fileName);
                var newFileName = Path.GetDirectoryName(fileName) + @"\" + Path.GetFileNameWithoutExtension(fileName) + ".bin";
                using (var f = File.Create(newFileName))
                {
                    long bytesWritten = 0;
                    while (bytesWritten < fileInfo.Length)
                    {
                        f.WriteByte((byte)random.Next());
                        bytesWritten++;
                    }
                    f.Close();
                }
            }
        }
    }
}

One-to-one integer mapping function

We are using MySQL and developing an application where we'd like the ID sequence not to be publicly visible... the IDs are hardly top secret and there is no significant issue if someone indeed was able to decode them.
So, a hash is of course the obvious solution, we are currently using MD5... 32bit integers go in, and we trim the MD5 to 64bits and then store that. However, we have no idea how likely collisions are when you trim like this (especially since all numbers come from autoincrement or the current time). We currently check for collisions, but since we may be inserting 100.000 rows at once the performance is terrible (can't bulk insert).
But in the end, we really don't need the security offered by the hashes and they consume unnecessary space and also require an additional index... so, is there any simple and good enough function/algorithm out there that guarantees one-to-one mapping for any number without obvious visual patterns for sequential numbers?
EDIT: I'm using PHP which does not support integer arithmetic by default, but after looking around I found that it could be cheaply replicated with bitwise operators. Code for 32bit integer multiplication can be found here: http://pastebin.com/np28xhQF
You could simply XOR with 0xDEADBEEF, if that's good enough.
Alternatively multiply by an odd number mod 2^32. For the inverse mapping just multiply by the multiplicative inverse
Example: n = 2345678901; multiplicative inverse (mod 2^32): 2313902621
For the mapping just multiply by 2345678901 (mod 2^32):
1 --> 2345678901
2 --> 396390506
For the inverse mapping, multiply by 2313902621.
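The example values can be reproduced with a short Python sketch. The names obfuscate and restore are made up for this illustration; pow with a negative exponent (Python 3.8+) computes the multiplicative inverse, which exists because the multiplier is odd.

```python
MOD = 2**32
MULT = 2345678901                # any odd multiplier is invertible mod 2^32
MULT_INV = pow(MULT, -1, MOD)    # multiplicative inverse, Python 3.8+


def obfuscate(n):
    """Map a 32-bit ID to its scrambled form."""
    return (n * MULT) % MOD


def restore(m):
    """Undo obfuscate()."""
    return (m * MULT_INV) % MOD


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "-->", obfuscate(n), "-->", restore(obfuscate(n)))
```

Since multiplication by an invertible element is a bijection on 0..2^32-1, no collision check is needed for IDs within that range.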
If you want to ensure a 1:1 mapping then use an encryption (i.e. a permutation), not a hash. Encryption has to be 1:1 because it can be decrypted.
If you want 32 bit numbers then use Hasty Pudding Cypher or just write a simple four round Feistel cypher.
Here's one I prepared earlier:
import java.util.Random;
/**
 * IntegerPerm is a reversible keyed permutation of the integers.
 * This class is not cryptographically secure as the F function
 * is too simple and there are not enough rounds.
 *
 * @author Martin Ross
 */
public final class IntegerPerm {
    //////////////////
    // Private Data //
    //////////////////
    /** Non-zero default key, from www.random.org */
    private final static int DEFAULT_KEY = 0x6CFB18E2;
    private final static int LOW_16_MASK = 0xFFFF;
    private final static int HALF_SHIFT = 16;
    private final static int NUM_ROUNDS = 4;
    /** Permutation key */
    private int mKey;
    /** Round key schedule */
    private int[] mRoundKeys = new int[NUM_ROUNDS];
    //////////////////
    // Constructors //
    //////////////////
    public IntegerPerm() { this(DEFAULT_KEY); }
    public IntegerPerm(int key) { setKey(key); }
    ////////////////////
    // Public Methods //
    ////////////////////
    /** Sets a new value for the key and key schedule. */
    public void setKey(int newKey) {
        assert (NUM_ROUNDS == 4) : "NUM_ROUNDS is not 4";
        mKey = newKey;
        mRoundKeys[0] = mKey & LOW_16_MASK;
        mRoundKeys[1] = ~(mKey & LOW_16_MASK);
        mRoundKeys[2] = mKey >>> HALF_SHIFT;
        mRoundKeys[3] = ~(mKey >>> HALF_SHIFT);
    } // end setKey()
    /** Returns the current value of the key. */
    public int getKey() { return mKey; }
    /**
     * Calculates the enciphered (i.e. permuted) value of the given integer
     * under the current key.
     *
     * @param plain the integer to encipher.
     *
     * @return the enciphered (permuted) value.
     */
    public int encipher(int plain) {
        // 1 Split into two halves.
        int rhs = plain & LOW_16_MASK;
        int lhs = plain >>> HALF_SHIFT;
        // 2 Do NUM_ROUNDS simple Feistel rounds.
        for (int i = 0; i < NUM_ROUNDS; ++i) {
            if (i > 0) {
                // Swap lhs <-> rhs
                final int temp = lhs;
                lhs = rhs;
                rhs = temp;
            } // end if
            // Apply Feistel round function F().
            rhs ^= F(lhs, i);
        } // end for
        // 3 Recombine the two halves and return.
        return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
    } // end encipher()
    /**
     * Calculates the deciphered (i.e. inverse permuted) value of the given
     * integer under the current key.
     *
     * @param cypher the integer to decipher.
     *
     * @return the deciphered (inverse permuted) value.
     */
    public int decipher(int cypher) {
        // 1 Split into two halves.
        int rhs = cypher & LOW_16_MASK;
        int lhs = cypher >>> HALF_SHIFT;
        // 2 Do NUM_ROUNDS simple Feistel rounds.
        for (int i = 0; i < NUM_ROUNDS; ++i) {
            if (i > 0) {
                // Swap lhs <-> rhs
                final int temp = lhs;
                lhs = rhs;
                rhs = temp;
            } // end if
            // Apply Feistel round function F().
            rhs ^= F(lhs, NUM_ROUNDS - 1 - i);
        } // end for
        // 3 Recombine the two halves and return.
        return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
    } // end decipher()
    /////////////////////
    // Private Methods //
    /////////////////////
    // The F function for the Feistel rounds.
    private int F(int num, int round) {
        // XOR with round key.
        num ^= mRoundKeys[round];
        // Square, then XOR the high and low parts.
        num *= num;
        return (num >>> HALF_SHIFT) ^ (num & LOW_16_MASK);
    } // end F()
} // end class IntegerPerm
Do what Henrik said in his second suggestion. But since these values seem to be used by people (else you wouldn't want to randomize them), take one additional step. Multiply the sequential number by a large prime and reduce mod N, where N is a power of 2. But choose N to be 2 bits smaller than you can store. Next, multiply the result by 11 and use that. So we have:
Hash = ((count * large_prime) % 536870912) * 11
The multiplication by 11 protects against most data entry errors: if any single digit is typed wrong, the result will not be a multiple of 11, and if any two adjacent digits are transposed, the result will not be a multiple of 11 either. So as a preliminary check of any value entered, you check if it's divisible by 11 before even looking in the database.
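A minimal Python sketch of this scheme follows. The prime 1000000007 is only a placeholder picked for the illustration (the answer does not name one), and make_code, looks_valid and decode are hypothetical names.

```python
LARGE_PRIME = 1_000_000_007   # placeholder prime, an assumption of this sketch
N = 536870912                 # 2**29, the modulus from the formula above


def make_code(count):
    """Hash = ((count * large_prime) % N) * 11"""
    return ((count * LARGE_PRIME) % N) * 11


def looks_valid(code):
    """Cheap pre-check before touching the database."""
    return code % 11 == 0


def decode(code):
    """Recover count (for 0 <= count < N) using the inverse of the prime mod N."""
    return (code // 11 * pow(LARGE_PRIME, -1, N)) % N


if __name__ == "__main__":
    code = make_code(42)
    print(code, looks_valid(code), decode(code))
    typo = int(str(code)[:-1] + str((code % 10 + 1) % 10))  # alter the last digit
    print(typo, looks_valid(typo))
```

The single-digit check works because every power of 10 is congruent to +1 or -1 modulo 11, so changing one digit by d (1 to 9) changes the value by something that 11 cannot divide.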
You can use the mod operation with big prime numbers:
(your number * big prime number 1) mod big prime number 2
Prime number 1 should be bigger than prime number 2, and prime number 2 should be close to 2^32 but less than it. Then the mapping will be hard to predict.
Prime 1 and prime 2 should be constants.
For our application, we use bit shuffle to generate the ID. It is very easy to reverse back to the original ID.
func (m Meeting) MeetingCode() uint {
	hashed := (m.ID + 10000000) & 0x00FFFFFF
	chunks := [24]uint{}
	for i := 0; i < 24; i++ {
		chunks[i] = hashed >> i & 0x1
	}
	shuffle := [24]uint{14, 1, 15, 21, 0, 6, 5, 10, 4, 3, 20, 22, 2, 23, 8, 13, 19, 9, 18, 12, 7, 11, 16, 17}
	result := uint(0)
	for i := 0; i < 24; i++ {
		result = result | (chunks[shuffle[i]] << i)
	}
	return result
}
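To illustrate the "easy to reverse" claim, here is a Python port of the shuffle together with its inverse. The function names are invented for this sketch, and the round trip is exact only for IDs below 2^24, because the original masks everything to 24 bits.

```python
SHUFFLE = [14, 1, 15, 21, 0, 6, 5, 10, 4, 3, 20, 22,
           2, 23, 8, 13, 19, 9, 18, 12, 7, 11, 16, 17]
OFFSET = 10_000_000
MASK = 0x00FFFFFF  # 24 bits


def meeting_code(meeting_id):
    """Bit i of the result is bit SHUFFLE[i] of the offset, masked ID."""
    hashed = (meeting_id + OFFSET) & MASK
    result = 0
    for i, src in enumerate(SHUFFLE):
        result |= ((hashed >> src) & 1) << i
    return result


def meeting_id_from_code(code):
    """Put bit i of the code back at position SHUFFLE[i], then remove the offset."""
    hashed = 0
    for i, src in enumerate(SHUFFLE):
        hashed |= ((code >> i) & 1) << src
    return (hashed - OFFSET) & MASK


if __name__ == "__main__":
    for mid in (1, 2, 3, 1000):
        code = meeting_code(mid)
        print(mid, "-->", code, "-->", meeting_id_from_code(code))
```

The inverse exists because the shuffle table is a permutation of 0..23, so every bit position is used exactly once.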
There is an exceedingly simple solution that no one has posted. Even though an answer has been selected, I highly advise anyone visiting this question to consider the nature of binary representations and the application of modular arithmetic.
Given a finite range of integers, all the values can be permuted in any order through a simple addition over their index, bounded by the range of the index through a modulo operation. You could even leverage simple integer overflow so that the modulo operator is not necessary.
Essentially, you'd have a static variable in memory, and a function that, when called, increments the static variable by some constant, enforces the boundaries, and then returns the value. This output could be an index into a collection of desired outputs, or the desired output itself.
The increment constant that defines the mapping may be several times the size in memory of the value being returned, but given any mapping there exists some finite constant that will achieve the mapping through trivial modular arithmetic.

Resources