UUID: which version to use to avoid collisions?

In my system, people use different devices (which work online and offline) to create entries (stored locally until synced), and these get synced with other devices and the server periodically. For this reason, each device generates a UUID to identify a record and the subrecords under it, so everything can be synced with other devices once it reaches the server. So far, it's working well. The only problem is that I got a collision: even though there are only 400k records in the system, I am wondering whether this could happen more often.
I am using https://github.com/monicao/angular-uuid4, which generates version 4 UUIDs. Should I use version 1 instead? Each device usually generates between zero and a few UUIDs per minute, so there is no chance of split-second generation on a single device. These UUIDs are always generated on the client side, never by the server.
What is the best way to reduce the chance of collisions here, assuming no evil/malicious clients, given that the UUIDs are system generated and it doesn't matter if they can be traced?

The issue seems to be with the implementation of the library you are using.
Since you are already storing ids with 128-bit precision (stored as 16 bytes, not the 36 characters used to encode one as a string of hexadecimal values and dashes), you can simply use this for your id generator:
// Returns a 128 bit random id encoded as hexadecimal values.
function randomId() {
  const bytes = new Uint8Array(16)
  crypto.getRandomValues(bytes)
  return Array.from(bytes, b => ('0' + b.toString(16)).slice(-2)).join('')
}
But fully random ids are bad for your database indexes, so to improve things you can use a number of leading bytes to add locality:
const LOCALITY_BYTES = 4
const RESOLUTION = 500 // [ms]
// 4 bytes with 500 ms resolution gives 68 years modulo.
const MODULO = Math.pow(2, LOCALITY_BYTES * 8)

function locality() {
  let date = Math.floor(Date.now() / RESOLUTION) % MODULO
  const bytes = []
  for (let i = 0; i < LOCALITY_BYTES; ++i) {
    const b = date % 256
    bytes.unshift(b)
    date = (date - b) / 256
  }
  return bytes
}

function randomIdWithLocality() {
  const bytes = new Uint8Array(16)
  crypto.getRandomValues(bytes)
  bytes.set(locality())
  return Array.from(bytes, b => ('0' + b.toString(16)).slice(-2)).join('')
}
With this implementation, you have "only" 96 bits of randomness per 500 ms timeframe, compared to v4's 122 bits (for the whole life of your app). These 96 bits give you a 50% chance of at least one collision if you write about 3 × 10^14 records within a single 500 ms window, versus about 2.6 × 10^18 records over your app's whole lifetime with v4 UUIDs.
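As a sanity check on those figures, here is a back-of-the-envelope sketch of the birthday bound (my own illustration, in Java rather than the JavaScript above). It evaluates n ≈ sqrt(2N ln 2), the number of ids at which the chance of at least one collision in a space of N values reaches roughly 50%:

// Birthday bound: number of ids for a ~50% collision chance in a space
// of 2^bits equally likely values. Doubles lose precision at this scale,
// but that's fine for an order-of-magnitude check.
static double idsForHalfCollisionChance(int bits) {
    double space = Math.pow(2, bits); // N = 2^bits
    return Math.sqrt(2 * space * Math.log(2));
}

// idsForHalfCollisionChance(96)  ~= 3.3e14 (per 500 ms window)
// idsForHalfCollisionChance(122) ~= 2.7e18 (over the app's lifetime)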

Related

Storing binary data in QR codes

I'm trying to store binary data in a QR code. Apparently QR codes do support storing raw binary data (or ISO-8859-1 / Latin1). Here is what I want to encode (hex):
d1 50 01 00 00 00 f6 5f 05 2d 8f 0b 40 e2 01
I've tried the following encoders:
qr.js
Google Charts
qrcode.js
Decoding with zxing.org produces various incorrect results. The two JavaScript ones produce this (it's wrong; the first text character should be Ñ).
Whereas Google Charts produces this...
What is going on? Are any of these correct? What's really weird is that if I encode the following sequence (with the JS ones at least) then it works fine - I would have thought the issue was non-ASCII characters, but Ñ (0xd1) is non-ASCII and appears in both sequences.
d1 50 01 00 00 00 01 02 03 04 05 06 40 e2 01
Does anyone know what is going on?
Update
It occurred to me to try scanning them with a ZBar-based scanner app I found. It scans both JS versions ok (at least they start with ÑP). The Google Charts one is just wrong. So it seems like the issue is with ZXing (which is surprisingly shit - I wouldn't recommend it to anyone).
Update 2
ZBar can't handle null bytes. :-(
"What is going on? Are any of these correct?"
Except for the Google Charts one (which is just empty), your QR codes are correct.
You can see the binary data from zxing is what you would expect:
4: byte mode indicator
0f: length of 15 bytes
d15001...: your 15 bytes of data
ec11: just padding
The problem comes from the decoding, because most decoders will try to interpret the payload as text. But since it's binary data, you should not try to handle it as text. Even if you think you can convert it from text back to binary, as you saw, this may cause issues with values which are not valid text.
So the solution is to use a decoder that will output you the binary data, and not text data.
Now, about interpreting the QR code binary data as text: you said the first character should be 'Ñ', which is true if it is interpreted as ISO-8859-1, which, according to the QR code standard, is what should be done when there is no ECI mode defined.
But in practice, most smartphone QR code readers will interpret it as UTF-8 in this case (or at least try to auto-detect the encoding).
Even though this is not the standard, it has become common practice: binary mode with no ECI, UTF-8 encoded text.
Maybe the reason behind it is that no one wants to waste these precious bytes adding an ECI mode specifying UTF-8. And actually, not all decoders support ECI.
There are two issues that you have to overcome to store binary data in QR codes.
1. ISO-8859-1 does not allow bytes in the ranges 00-1F and 7F-9F. If you need to encode these bytes anyway, quote or encode them, i.e. use quoted-printable or Base64 encoding to avoid these ranges.
2. Since you are trying to store binary data in QR codes, you have to rely only on your own scanner to handle this binary data. You can't rely on other software, like the web application at zxing.org, to display text from your QR codes, because most QR decoders, including that of zxing.org, use heuristics to detect the character set used. These heuristics may detect a character set other than ISO-8859-1 and thus fail to properly display your binary data. Some scanners use heuristics to detect a character set even if the character set is explicitly given by ECI. This is why providing ECI may not help much - scanners still use heuristics even with ECI.
So, using only US-ASCII printable characters (e.g., binary data encoded in Base64 before passing it to a QR code generator) is the safest choice against these heuristics. This also overcomes another complication: ISO-8859-1 was not the default encoding in the earlier QR code standard published in 2000 (ISO/IEC 18004:2000). That standard specified the 8-bit Latin/Kana character set in accordance with JIS X 0201 (JIS8, also known as ISO-2022-JP) as the default encoding for 8-bit mode, while the updated standard published in 2005 changed the default to ISO-8859-1.
As an alternative to Base64, you can encode each byte with two hexadecimal characters (0-9, A-F), so in the QR code your data will be encoded in alphanumeric mode, not in 8-bit mode. This disables all heuristics for sure and should not produce a much larger QR code than Base64, because each character in alphanumeric mode takes only 5.5 bits (two characters per 11 bits) in the QR code stream.
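To illustrate the hexadecimal alternative, here is a minimal sketch (mine, in Java; the class and method names are made up). It converts a payload to and from the uppercase hex digits 0-9/A-F, all of which fall inside the QR alphanumeric character set, so a generator should select alphanumeric mode for the result:

import java.util.Locale;

public class HexQrPayload {
    // Encode each byte as two uppercase hex characters (0-9, A-F).
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format(Locale.ROOT, "%02X", b));
        }
        return sb.toString();
    }

    // Reverse the encoding after scanning.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0xd1, 0x50, 0x01, 0x00, 0x00, 0x00};
        System.out.println(toHex(payload)); // "D15001000000" - pass this to any QR generator
    }
}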
Update:
I recently went back and published the referenced code as a project on GitHub for anyone who wants to use it.
https://github.com/yurelle/Base45Encoder
This is a bit necro, but I just hit this problem, and figured out a solution.
The problem with reading QR codes with ZXING is that it assumes all QR payloads are Strings. If you're willing to generate the QR code in Java with ZXING, I developed a solution which enables storing a binary payload in ZXING QR codes with a storage efficiency loss of only ~8%; better than the 33% inflation from Base64.
It exploits an internal compression optimization of the ZXING library based around pure Alphanum Strings. If you want a full explanation, with math and Unit Tests, check out my other answer.
But the short answer is this:
Solution
I implemented it as a self-contained static utility class, so all you have to do is call:
//Encode
final byte[] myBinaryData = ...;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(myBinaryData);
//Decode
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(encodedStr);
Alternatively, you can also do it via InputStreams:
//Encode
final InputStream in_1 = ... ;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(in_1);
//Decode
final InputStream in_2 = ... ;
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(in_2);
Here's the implementation
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
/**
* For some reason none of the Java QR Code libraries support binary payloads. At least, none that
* I could find anyway. The commonly suggested workaround for this is to use Base64 encoding.
* However, this results in a 33% payload size inflation. If your payload is already near the size
* limit of QR codes, this is a lot.
*
* This class implements an encoder which takes advantage of a built-in compression optimization
* of the ZXING QR Code library, to enable the storage of Binary data into a QR Code, with a
* storage efficiency loss of only -8%.
*
* The built-in optimization is this: ZXING will automatically detect if your String payload is
* purely AlphaNumeric (by their own definition), and if so, it will automatically compress 2
* AlphaNumeric characters into 11 bits.
*
*
* ----------------------
*
*
* The included ALPHANUMERIC_TABLE is the conversion table used by the ZXING library as a reverse
* index for determining if a given input data should be classified as alphanumeric.
*
* See:
*
* com.google.zxing.qrcode.encoder.Encoder.chooseMode(String content, String encoding)
*
* which scans through the input string one character at a time and passes them to:
*
* getAlphanumericCode(int code)
*
* in the same class, which uses that character as a numeric index into the
* ALPHANUMERIC_TABLE.
*
* If you examine the values, you'll notice that it ignores / disqualifies certain values, and
* effectively converts the input into base 45 (0 -> 44; -1 is interpreted by the calling code
* to mean a failure). This is confirmed in the function:
*
* appendAlphanumericBytes(CharSequence content, BitArray bits)
*
* where they pack 2 of these base 45 digits into 11 bits. This presents us with an opportunity.
* If we can take our data, and convert it into a compatible base 45 alphanumeric representation,
* then the QR Encoder will automatically pack that data into sub-byte chunks.
*
* 2 digits in base 45 is 2,025 possible values. 11 bits has a maximum storage capacity of 2,048
* possible states. This is only a loss of 1.1% in storage efficiency behind raw binary.
*
* 45 ^ 2 = 2,025
* 2 ^ 11 = 2,048
* 2,048 - 2,025 = 23
* 23 / 2,048 = 0.01123046875 = 1.123%
*
* However, this is the ideal / theoretical efficiency. This implementation processes data in
* chunks, using a Long as a computational buffer. However, since Java Longs are signed, we
* can only use the lower 7 bytes. The conversion code requires consistently positive values;
* using the highest 8th byte would contaminate the sign bit and randomly produce negative
* values.
*
*
* Real-World Test:
*
* Using a 7 byte Long to encode a 2KB buffer of random bytes, we get the following results.
*
* Raw Binary Size: 2,048
* Encoded String Size: 3,218
* QR Code Alphanum Size: 2,213 (after the QR Code compresses 2 base45 digits to 11 bits)
*
* This is a real-world storage efficiency loss of only 8%.
*
* 2,213 - 2,048 = 165
* 165 / 2,048 = 0.08056640625 = 8.0566%
*/
public class BinaryToBase45Encoder {
    public final static int[] ALPHANUMERIC_TABLE;

    /*
     * You could probably just copy & paste the array literal from the ZXING source code; it's only
     * an array definition. But I was unsure of the licensing issues with posting it on the internet,
     * so I did it this way.
     */
    static {
        final Field SOURCE_ALPHANUMERIC_TABLE;
        int[] tmp;

        //Copy lookup table from ZXING Encoder class
        try {
            SOURCE_ALPHANUMERIC_TABLE = com.google.zxing.qrcode.encoder.Encoder.class.getDeclaredField("ALPHANUMERIC_TABLE");
            SOURCE_ALPHANUMERIC_TABLE.setAccessible(true);
            tmp = (int[]) SOURCE_ALPHANUMERIC_TABLE.get(null);
        } catch (NoSuchFieldException e) {
            e.printStackTrace(); //Shouldn't happen
            tmp = null;
        } catch (IllegalAccessException e) {
            e.printStackTrace(); //Shouldn't happen
            tmp = null;
        }

        //Store
        ALPHANUMERIC_TABLE = tmp;
    }

    public static final int NUM_DISTINCT_ALPHANUM_VALUES = 45;
    public static final char[] alphaNumReverseIndex = new char[NUM_DISTINCT_ALPHANUM_VALUES];

    static {
        //Build AlphaNum Index
        final int len = ALPHANUMERIC_TABLE.length;
        for (int x = 0; x < len; x++) {
            // The base45 result which the alphanum lookup table produces.
            // i.e. the base45 digit value which String characters are
            // converted into.
            //
            // We use this value to build a reverse lookup table to find
            // the String character we have to send to the encoder, to
            // make it produce the given base45 digit value.
            final int base45DigitValue = ALPHANUMERIC_TABLE[x];

            //Ignore the -1 records
            if (base45DigitValue > -1) {
                //The index into the lookup table which produces the given base45 digit value.
                //
                //i.e. to produce a base45 digit with the numeric value in base45DigitValue, we need
                //to send the Encoder a String character with the numeric value in x.
                alphaNumReverseIndex[base45DigitValue] = (char) x;
            }
        }
    }

    /*
     * The storage capacity of one digit in the number system; i.e. the maximum
     * possible number of distinct values which can be stored in 1 logical digit
     */
    public static final int QR_PAYLOAD_NUMERIC_BASE = NUM_DISTINCT_ALPHANUM_VALUES;

    /*
     * We can't use all 8 bytes, because the Long is signed, and the conversion math
     * requires consistently positive values. If we populated all 8 bytes, then the
     * last byte has the potential to contaminate the sign bit, and break the
     * conversion math. So, we only use the lower 7 bytes, and avoid this problem.
     */
    public static final int LONG_USABLE_BYTES = Long.BYTES - 1;

    //The following mapping was determined by brute-forcing a -1 Long (all bits 1), and compressing to base45 until it hit zero.
    public static final int[] BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION = new int[] {0, 2, 3, 5, 6, 8, 9, 11, 12};
    public static final int NUM_BASE45_DIGITS_PER_LONG = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[LONG_USABLE_BYTES];
    public static final Map<Integer, Integer> BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION = new HashMap<>();

    static {
        //Build Reverse Lookup
        int len = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION.length;
        for (int x = 0; x < len; x++) {
            int numB45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[x];
            BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.put(numB45Digits, x);
        }
    }

    public static String encodeToBase45QrPayload(final byte[] inputData) throws IOException {
        return encodeToBase45QrPayload(new ByteArrayInputStream(inputData));
    }

    public static String encodeToBase45QrPayload(final InputStream in) throws IOException {
        //Init conversion state vars
        final StringBuilder strOut = new StringBuilder();
        int data;
        long buf = 0;

        // Process all input data in chunks of size LONG.BYTES, this allows for economies of scale
        // so we can process more digits of arbitrary size before we hit the wall of the binary
        // chunk size in a power of 2, and have to transmit a sub-optimal chunk of the "crumbs"
        // left over; i.e. the slack space between where the multiples of QR_PAYLOAD_NUMERIC_BASE
        // and the powers of 2 don't quite line up.
        while (in.available() > 0) {
            //Fill buffer
            int numBytesStored = 0;
            while (numBytesStored < LONG_USABLE_BYTES && in.available() > 0) {
                //Read next byte
                data = in.read();

                //Push byte into buffer
                buf = (buf << 8) | data; //8 bits per byte

                //Increment
                numBytesStored++;
            }

            //Write out in lower base
            final StringBuilder outputChunkBuffer = new StringBuilder();
            final int numBase45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[numBytesStored];
            int numB45DigitsProcessed = 0;
            while (numB45DigitsProcessed < numBase45Digits) {
                //Chunk out a digit
                final byte digit = (byte) (buf % QR_PAYLOAD_NUMERIC_BASE);

                //Drop digit data from buffer
                buf = buf / QR_PAYLOAD_NUMERIC_BASE;

                //Write Digit
                outputChunkBuffer.append(alphaNumReverseIndex[(int) digit]);

                //Track output digits
                numB45DigitsProcessed++;
            }

            /*
             * The way this code works, the processing output results in a First-In-Last-Out digit
             * reversal. So, we need to buffer the chunk output, and feed it to the OutputStream
             * backwards to correct this.
             *
             * We could probably get away with writing the bytes out in inverted order, and then
             * flipping them back on the decode side, but just to be safe, I'm always keeping
             * them in the proper order.
             */
            strOut.append(outputChunkBuffer.reverse().toString());
        }

        //Return
        return strOut.toString();
    }

    public static byte[] decodeBase45QrPayload(final String inputStr) throws IOException {
        //Prep for InputStream
        final byte[] buf = inputStr.getBytes(); //Use the default encoding (the same encoding that the 'char' primitive uses)
        return decodeBase45QrPayload(new ByteArrayInputStream(buf));
    }

    public static byte[] decodeBase45QrPayload(final InputStream in) throws IOException {
        //Init conversion state vars
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        int data;
        long buf = 0;
        int x = 0;

        // Process all input data in chunks of size LONG.BYTES, this allows for economies of scale
        // so we can process more digits of arbitrary size before we hit the wall of the binary
        // chunk size in a power of 2, and have to transmit a sub-optimal chunk of the "crumbs"
        // left over; i.e. the slack space between where the multiples of QR_PAYLOAD_NUMERIC_BASE
        // and the powers of 2 don't quite line up.
        while (in.available() > 0) {
            //Convert & Fill Buffer
            int numB45Digits = 0;
            while (numB45Digits < NUM_BASE45_DIGITS_PER_LONG && in.available() > 0) {
                //Read in next char
                char c = (char) in.read();

                //Translate back through lookup table
                int digit = ALPHANUMERIC_TABLE[(int) c];

                //Shift buffer up one digit to make room
                buf *= QR_PAYLOAD_NUMERIC_BASE;

                //Append next digit
                buf += digit;

                //Increment
                numB45Digits++;
            }

            //Write out in higher base
            final LinkedList<Byte> outputChunkBuffer = new LinkedList<>();
            final int numBytes = BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.get(numB45Digits);
            int numBytesProcessed = 0;
            while (numBytesProcessed < numBytes) {
                //Chunk out 1 byte
                final byte chunk = (byte) buf;

                //Shift buffer to next byte
                buf = buf >> 8; //8 bits per byte

                //Write byte to output
                //
                //Again, we need to invert the order of the bytes, so as we chunk them off, push
                //them onto a FILO stack; inverting their order.
                outputChunkBuffer.push(chunk);

                //Increment
                numBytesProcessed++;
            }

            //Write chunk buffer to output stream (in reverse order)
            while (outputChunkBuffer.size() > 0) {
                out.write(outputChunkBuffer.pop());
            }
        }

        //Return
        out.flush();
        out.close();
        return out.toByteArray();
    }
}
Just at a glance, the qr formats are different. I'd compare the qr formats to see if it's a problem of error correction or encoding or something else.
It turned out that ZXing is just crap, and ZBar does some weird stuff with the data (converting it to UTF-8, for example). I managed to get it to output the raw data, including null bytes, though. Here is a patch for the best Android ZBar library I found, which has now been merged.
I used System.Convert.ToBase64String to convert the supplied sample byte array into a Base64-encoded string, then I used ZXing to create a QRCode image.
Next I called ZXing to read the string back from the generated QRCode, and then called System.Convert.FromBase64String to convert the string back into a byte array.
I confirm that the data completed the round trip successfully.
RFC 9285 - The Base45 Data Encoding, an informational document describing the optimal scheme for storing binary data within the constraints of QR alphanumeric mode, was recently published by the IETF.
(one positive side-effect of ongoing standardization work surrounding Health Certificate QR-codes)

Source text, key size relationship for encryption/decryption in Go

In the code below (also at http://play.golang.org/p/77fRvrDa4A, though it takes "too long to process" in the browser there), the 124 byte version of the sourceText won't encrypt because of "message too long for RSA public key size" with a 1024 bit key. Both it and the shorter 92 byte sourceText version work with a 2048 bit key size.
My question is: how does one calculate exactly the key size needed for rsa.GenerateKey given the byte length of the source text? (A small paragraph of text takes nearly 10 seconds at a 4096 bit key size, and I don't know the length of the sourceText until runtime.)
There's a very brief discussion of this at https://stackoverflow.com/a/11750658/3691075, but it's not clear to me as I'm not a crypto guy.
My goal is to encrypt, store in a DB and decrypt about 300-byte long JSON strings. I control both the sending and the receiving end. Text is encrypted once, and decrypted many times. Any hints of strategy would be appreciated.
package main

import (
    "crypto/md5"
    "crypto/rand"
    "crypto/rsa"
    "fmt"
    "hash"
    "log"
    "time"
)

func main() {
    startingTime := time.Now()

    var err error
    var privateKey *rsa.PrivateKey
    var publicKey *rsa.PublicKey
    var sourceText, encryptedText, decryptedText, label []byte

    // SHORT TEXT 92 bytes
    sourceText = []byte(`{347,7,3,8,7,0,7,5,6,4,1,6,5,6,7,3,7,7,7,6,5,3,5,3,3,5,4,3,2,10,3,7,5,6,65,350914,760415,33}`)
    fmt.Printf("\nsourceText byte length:\n%d\n", len(sourceText))

    // LONGER TEXT 124 bytes
    // sourceText = []byte(`{347,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,65,350914,760415,33}`)
    // fmt.Printf("\nsourceText byte length:\n%d\n", len(sourceText))

    if privateKey, err = rsa.GenerateKey(rand.Reader, 1024); err != nil {
        log.Fatal(err)
    }
    // fmt.Printf("\nprivateKey:\n%s\n", privateKey)

    privateKey.Precompute()
    if err = privateKey.Validate(); err != nil {
        log.Fatal(err)
    }
    publicKey = &privateKey.PublicKey

    encryptedText = encrypt(publicKey, sourceText, label)
    decryptedText = decrypt(privateKey, encryptedText, label)

    fmt.Printf("\nsourceText: \n%s\n", string(sourceText))
    fmt.Printf("\nencryptedText: \n%x\n", encryptedText)
    fmt.Printf("\ndecryptedText: \n%s\n", decryptedText)
    fmt.Printf("\nDone in %v.\n\n", time.Now().Sub(startingTime))
}

func encrypt(publicKey *rsa.PublicKey, sourceText, label []byte) (encryptedText []byte) {
    var err error
    var md5_hash hash.Hash
    md5_hash = md5.New()
    if encryptedText, err = rsa.EncryptOAEP(md5_hash, rand.Reader, publicKey, sourceText, label); err != nil {
        log.Fatal(err)
    }
    return
}

func decrypt(privateKey *rsa.PrivateKey, encryptedText, label []byte) (decryptedText []byte) {
    var err error
    var md5_hash hash.Hash
    md5_hash = md5.New()
    if decryptedText, err = rsa.DecryptOAEP(md5_hash, rand.Reader, privateKey, encryptedText, label); err != nil {
        log.Fatal(err)
    }
    return
}
One does not usually calculate the RSA key size based on the payload. One simply selects an RSA key size as a compromise between security (bigger is better) and performance (smaller is better). Having done that, one uses hybrid encryption in conjunction with AES or another symmetric cipher to actually encrypt the data.
If the payload doesn't exceed 300 bytes and you're using OAEP (at least 42 bytes of padding), then you can easily calculate the minimum key size:
(300 + 42) * 8 = 2736 bit
That's already a reasonable size key. It provides good security according to today's norms and is fairly fast. There is no need to apply a hybrid encryption scheme for this.
Now, you may notice that this key size isn't a power of 2. This is not a problem, but you should use a key size that is a multiple of 64 bits, because processors use 32-bit and 64-bit primitives to do the actual calculations, so you can increase security without a performance penalty. The next such key size would be:
ceil((300 + 42) * 8 / 64.0) * 64 = 2752 bit
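That rounding rule is easy to codify. Here is a small illustrative helper (my own sketch, in Java; the names are made up), using the OAEP overhead of 2 * hashLen + 2 bytes from RFC 8017 (42 bytes for SHA-1, 66 for SHA-256):

// Smallest RSA modulus, rounded up to a multiple of 64 bits, that fits an
// OAEP-padded payload of the given size.
static int minRsaBits(int payloadBytes, int hashLenBytes) {
    int neededBits = (payloadBytes + 2 * hashLenBytes + 2) * 8;
    return ((neededBits + 63) / 64) * 64; // round up to a multiple of 64
}

// minRsaBits(300, 20) == 2752, matching the calculation above.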
Here are some experimental results showing what some languages/frameworks accept (correctness, not performance) as the key size:
Golang: multiple of 1 bit and >= 1001 (sic!) [used ideone.com]
PyCrypto: multiple of 256 bit and >= 1024 [local install]
C#: multiple of 16 bit and >= 512 [used ideone.com]
Groovy: multiple of 1 bit and >= 512 [local install]
Java: multiple of 1 bit and >= 512 [used ideone.com: Java & Java7]
PHP/OpenSSL Ext: multiple of 128 bit and >= 640 [used ideone.com]
Crypto++: multiple of 1 bit and >= 16 [local install with maximal validation toughness of 3]
Before you decide to use some kind of specific key size, you should check that all frameworks support that size. As you see, there are vastly varying results.
I tried to write some performance tests of key generation, encryption and decryption with different key sizes: 512, 513, 514, 516, 520, 528, 544, 576. Since I don't know any Go, it would have been hard to get the timing right, so I settled for Java and Crypto++. The Crypto++ code is probably very buggy, because key generation for 520-bit and 528-bit keys is up to seven orders of magnitude faster than for the other key sizes, which is otherwise more or less constant across this small window of key sizes.
In Java, the key generation results were fairly clear: generating a 513-bit key was 2-3 times slower than a 512-bit key. Other than that, the results are nearly linear. The graph is normalized, and the number of iterations is 1000 for the full keygen-enc-dec cycle.
The decryption shows a little dip at 544 bits, which is a multiple of 32 bits. Since the benchmark was executed on a 32-bit Debian machine, this might mean there are indeed some performance improvements at such sizes; on the other hand, the encryption was slower for that key size.
Since this benchmark wasn't done in Go, I won't give any advice on how small the overhead can be.

Decrypt Mega.co.nz file partially using aes 128 ctr for streaming range support

How do you decrypt an AES-128-CTR encrypted file from the middle, for HTTP range support?
Here is the encrypted file:
https://www.dropbox.com/s/8e9qembud6n3z7i/encrypted.txt?dl=0
the key is base64 encoded: E7VQWj3cv1JUi5pklirtDQ9SRJt1DhiqYgzPSpIiVP0
Mega docs: https://mega.co.nz/#doc
The IV is calculated by decrypting the key which gives an array:
Array
(
[0] => 330649690
[1] => 1037877074
[2] => 1418435172
[3] => 2519395597
[4] => 257049755
[5] => 1963858090
[6] => 1645006666
[7] => 2451723517
)
The IV is obtained by slicing the array at offset 4 with a length of two, and the last two elements of the IV array are filled with 0:
Array
(
[0] => 257049755
[1] => 1963858090
[2] => 0
[3] => 0
)
Then the key is XOR'd and made into a 128-bit array, which is then converted into a string by the PHP function pack:
$key = array($key[0] ^ $key[4], $key[1] ^ $key[5], $key[2] ^ $key[6], $key[3] ^ $key[7]);
$key = base64_encode(a32_to_str($key));
$iv = base64_encode(a32_to_str($iv));
Then the file is decrypted using the normal PHP AES library; I am using mcrypt_generic for the decryption process.
The problem arises when I try to decrypt the file from the 2nd byte, or the 3rd, or anywhere in the middle.
It works fine if I decrypt it from the 1st byte.
Another thing I have noticed: if I decrypt the file from the 2nd byte, but before that I decrypt a random string or just the digit 0, the decryption from the 2nd byte then works.
I suppose it has something to do with the IV block counter: I decrypt a random byte first and then continue decrypting the actual ciphertext, so it works.
I need to be able to start decrypting the file from the middle, let's say from a 40 MB offset, to support live stream seeking.
But doing it that way would consume too much memory, because I would have to decrypt 40 MB of zeros before seeking can be done.
How can I move the IV counter value to a 40 MB offset?
I read that the IV is increased by +1 for each block during decryption. But since my IV is an array, I have tried everything, and it does not work if I just add 1 to it.
I've been at it for months with no fruit. Please help.
Here is my previous question which helped understanding the process a bit: AES 128 bit CTR partial file decryption with PHP
Your initial research is indeed correct. In CTR mode, the IV (or nonce) is simply incremented by 1 after each encryption operation. (Encryption and decryption are the same operation in CTR mode, so you can substitute one word for the other as necessary.)
In other words, the state of a CTR mode cipher can be predicted in advance – just add the number of blocks already encrypted to the initial IV. In particular, the state does not depend on the plaintext in any way. AES has a block size of 16 bytes, so you would add the number of bytes encrypted divided by 16.
The IV can be considered a 128-bit integer stored in big endian. The cryptography API you use represents it as an array of four 32-bit integers. Simply add the number of blocks to the fourth integer before initializing the cipher. If you think you'll need to handle more than four billion blocks or so, you need to add handling for overflow to the third integer.
The slightly trickier part is initializing the cipher to a state where the number of bytes already encrypted is not divisible by the block size. The solution is to first initialize the counter to the number of bytes already encrypted divided by 16, rounded down, and then encrypt (the number of bytes already encrypted mod 16) dummy bytes. I believe this is in fact what you already suspected.
You're writing in PHP, but I'm posting a method from a Mega downloader program which I've written in Java in case it helps:
public Cipher getDownloadCipher(final long startPosition) throws Exception {
    final Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    final ByteBuffer buffer = ByteBuffer.allocate(16).put(nonce);
    buffer.asLongBuffer().put(startPosition / 16);
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(buffer.array()));
    final int skip = (int) (startPosition % 16);
    if (skip != 0) {
        if (cipher.update(new byte[skip]).length != skip) {
            //that should always work with a CTR mode cipher
            throw new IOException("Failed to skip bytes from cipher");
        }
    }
    return cipher;
}
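A hedged usage sketch: assuming the same key and nonce fields the method above relies on, the returned cipher can wrap the ranged HTTP response so reads come out decrypted from the requested offset (rangeResponseStream is a hypothetical name for that partial-download stream):

// Hypothetical usage of getDownloadCipher(): the HTTP range request must
// start at the same startPosition the cipher was initialized with.
InputStream plaintext = new CipherInputStream(rangeResponseStream, getDownloadCipher(startPosition));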

Get four 16bit numbers from a 64bit hex value

I have been through these related questions:
How to convert numbers between hexadecimal and decimal in C#?
How to Convert 64bit Long Data Type to 16bit Data Type
Way to get value of this hex number
But I did not get an answer, probably because I do not understand 64-bit or 16-bit values.
I had posted a question about Picasa and face detection, on using the face detection that Picasa does to get individual pictures out of a photo containing many faces: Automatic Face detection using API.
In an answer, @Joel Martinez linked to an answer on Picasa help which said:
The number encased in rect64() is a 64-bit hexadecimal number.
Break that up into four 16-bit numbers.
Divide each by the maximum unsigned 16-bit number (65535) and you'll have four
numbers between 0 and 1.
The full text:
@oedious wrote:
This is going to be somewhat technical, so hang on.
* The number encased in rect64() is a 64-bit hexadecimal number.
* Break that up into four 16-bit numbers.
* Divide each by the maximum unsigned 16-bit number (65535) and you'll have four numbers between 0 and 1.
* The four numbers remaining give you relative coordinates for the face rectangle: (left, top, right, bottom).
* If you want to end up with absolute coordinates, multiply the left and right by the image width and the top and bottom by the image height.
A sample picasa.ini file:
[1.jpg]
backuphash=65527
faces=rect64(5520c092dfb2f8d),615eec1bb18bdec5;rect64(dcc2ccf1fd63e93e),bc209d92a3388dc3;rect64(52524b7c785e6cf6),242908faa5044cb3
crop=rect64(0)
How do I get the 4 numbers from the 64 bit hex?
I am sorry, people; currently I do not understand the answers. I guess I will have to learn some C++ (I am a PHP & Java web developer with a weakness in math) before I can jump in and write something which will cut up an image into multiple images with the help of some coordinates. I am also looking into CodeLab and creating plugins for Paint.NET.
If you want basics, say you have this hexadecimal number:
4444333322221111
We split it into your 4 parts on paper, so all that's left is to extract them. This involves using an ffff mask to block out everything else besides our number (f masks nothing, 0 masks everything) and sliding it over each part. So we have:
part 1: 4444333322221111 & ffff = 1111
part 2: 4444333322221111 & ffff0000 = 22220000
part 3: 4444333322221111 & ffff00000000 = 333300000000
part 4: 4444333322221111 & ffff000000000000 = 4444000000000000
All that's left is to remove the 0's at the end. All in all, in C, you'd write this as:

int GetPart(int64 pack, int n) // where you define int64 as whatever your platform uses
{                              // __int64 in MSVC
    return (int)((pack & ((int64)0xffff << (16 * n))) >> (16 * n));
}
So basically, you calculate the mask as 0xffff (2 bytes) shifted left by 16*n bits (0 for the first part, 16 for the 2nd, 32 for the 3rd and 48 for the 4th), apply it over the number to mask out everything but the part we're interested in, then shift the result back right by 16*n bits to clear out those 0's at the end.
Some additional reading: Bitwise operators in C.
Hope that helps!
Here is the algorithm (a short sketch in code follows below):
The remainder of the division by 0x10000 (65536) gives you the first number.
Take the result, then divide by 0x10000 (65536) again; the remainder gives you the second number.
Take that result and divide by 0x10000 (65536) again; the remainder gives you the third number.
The result is the fourth number.
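As a sketch of that walk (my own illustration, in Java; fourShorts is a made-up name), using unsigned division since a 64-bit hex value may have its top bit set:

// Peels four 16-bit numbers off a 64-bit value, lowest group first,
// exactly as the divide/remainder steps above describe.
static int[] fourShorts(long packed) {
    int[] parts = new int[4];
    for (int i = 0; i < 4; i++) {
        parts[i] = (int) Long.remainderUnsigned(packed, 0x10000); // low 16 bits
        packed = Long.divideUnsigned(packed, 0x10000);            // drop them
    }
    return parts; // parts[0] is the first number, parts[3] the fourth
}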
It depends on your programming language - in C#, for example, you can use the BitConverter class, which allows you to extract a number based on the byte position within a byte array.
UInt64 largeHexNumber = 420404334;
byte[] hexData = BitConverter.GetBytes(largeHexNumber);
UInt16 firstValue = BitConverter.ToUInt16(hexData, 0);
UInt16 secondValue = BitConverter.ToUInt16(hexData, 2);
UInt16 thirdValue = BitConverter.ToUInt16(hexData, 4);
UInt16 forthValue = BitConverter.ToUInt16(hexData, 6);
It depends on the language. For the C-family of languages, it can be done like this (in C#):
UInt64 number = 0x4444333322221111;
//to get the ones, use a mask
// 0x4444333322221111
const UInt64 mask1 = 0xFFFF;
UInt16 part1 = (UInt16)(number & mask1);
//to get the twos, use a mask then shift
// 0x4444333322221111
const UInt64 mask2 = 0xFFFF0000;
UInt16 part2 = (UInt16)((number & mask2) >> 16);
//etc.
// 0x4444333322221111
const UInt64 mask3 = 0xFFFF00000000;
UInt16 part3 = (UInt16)((number & mask3) >> 32);
// 0x4444333322221111
const UInt64 mask4 = 0xFFFF000000000000;
UInt16 part4 = (UInt16)((number & mask4) >> 48);
What I think you are being asked to do is take the 64 bits of data you have and treat it like 4 16-bit integers. From there you are taking the 16-bit values and converting them to percentages. Those percentages, when multiplied to the image height/width, give you 4 coordinates.
How you do this depends on the language you're programming in.
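Putting the pieces of this thread together, here is a rough end-to-end sketch (mine, in Java; rect64ToRelative is a made-up name) of the rect64() conversion described above:

// Parse the hex inside rect64() as an unsigned 64-bit value, split it into
// four 16-bit numbers, and scale each into [0, 1] by dividing by 65535.
static double[] rect64ToRelative(String hex) {
    long packed = Long.parseUnsignedLong(hex, 16);
    double[] coords = new double[4]; // left, top, right, bottom
    for (int i = 3; i >= 0; i--) {
        coords[i] = (packed & 0xFFFF) / 65535.0; // lowest 16 bits are "bottom"
        packed >>>= 16;
    }
    return coords;
}

// e.g. rect64ToRelative("5520c092dfb2f8d") for the sample picasa.ini above;
// multiply by image width/height for absolute coordinates.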
I needed to convert the crop=rect64() values from picasa.ini file.
I created the following Ruby method with the above information.
def coordinates(hex_num)
  [
    hex_num.divmod(65536)[1],
    hex_num.divmod(65536)[0].divmod(65536)[1],
    hex_num.divmod(65536)[0].divmod(65536)[0].divmod(65536)[1],
    hex_num.divmod(65536)[0].divmod(65536)[0].divmod(65536)[0].divmod(65536)[1]
  ].reverse
end
It works, but I needed to add the .reverse method on the array to achieve the desired result.

How unique is UUID?

How safe is it to use UUIDs to uniquely identify something (I'm using them for files uploaded to the server)? As I understand it, they are based on random numbers. However, it seems to me that, given enough time, one would eventually repeat itself, just by pure chance. Is there a better system, or a pattern of some type, to alleviate this issue?
Very safe:
the annual risk of a given person being hit by a meteorite is
estimated to be one chance in 17 billion, which means the
probability is about 0.00000000006 (6 × 10−11), equivalent to the odds
of creating a few tens of trillions of UUIDs in a year and having one
duplicate. In other words, only after generating 1 billion UUIDs every
second for the next 100 years, the probability of creating just one
duplicate would be about 50%.
Caveat:
However, these probabilities only hold when the UUIDs are generated
using sufficient entropy. Otherwise, the probability of duplicates
could be significantly higher, since the statistical dispersion might
be lower. Where unique identifiers are required for distributed
applications, so that UUIDs do not clash even when data from many
devices is merged, the randomness of the seeds and generators used on
every device must be reliable for the life of the application. Where
this is not feasible, RFC4122 recommends using a namespace variant
instead.
Source: The Random UUID probability of duplicates section of the Wikipedia article on Universally unique identifiers (link leads to a revision from December 2016 before editing reworked the section).
Also see the current section on the same subject in the same Universally unique identifier article, Collisions.
If by "given enough time" you mean 100 years and you're creating them at a rate of a billion a second, then yes, you have a 50% chance of having a collision after 100 years.
There is more than one type of UUID, so "how safe" depends on which type (which the UUID specifications call "version") you are using.
Version 1 is the time based plus MAC address UUID. The 128-bits contains 48-bits for the network card's MAC address (which is uniquely assigned by the manufacturer) and a 60-bit clock with a resolution of 100 nanoseconds. That clock wraps in 3603 A.D. so these UUIDs are safe at least until then (unless you need more than 10 million new UUIDs per second or someone clones your network card). I say "at least" because the clock starts at 15 October 1582, so you have about 400 years after the clock wraps before there is even a small possibility of duplications.
Version 4 is the random number UUID. There are six fixed bits and the rest of the UUID is 122 bits of randomness. See Wikipedia or other analyses that describe how very unlikely a duplicate is.
Version 3 uses MD5 and Version 5 uses SHA-1 to create those 122 bits, instead of a random or pseudo-random number generator. So in terms of safety they are like Version 4: a statistical issue (as long as you make sure that what the digest algorithm is processing is always unique).
Version 2 is similar to Version 1, but with a smaller clock so it is going to wrap around much sooner. But since Version 2 UUIDs are for DCE, you shouldn't be using these.
So for all practical purposes they are safe. If you are uncomfortable with leaving it up to probabilities (e.g. you are the type of person worried about the earth getting destroyed by a large asteroid in your lifetime), just make sure you use a Version 1 UUID and it is guaranteed to be unique (in your lifetime, unless you plan to live past 3603 A.D.).
So why doesn't everyone simply use Version 1 UUIDs? That is because Version 1 UUIDs reveal the MAC address of the machine it was generated on and they can be predictable -- two things which might have security implications for the application using those UUIDs.
The answer to this may depend largely on the UUID version.
Many UUID generators use a version 4 random number. However, many of these use a pseudo-random number generator (PRNG) to generate them.
If a poorly seeded PRNG with a small period is used to generate the UUID, I would say it's not very safe at all. Some random number generators also have poor variance, i.e. they favour certain numbers more often than others. This isn't going to work well.
Therefore, it's only as safe as the algorithms used to generate it.
On the flip side, if you know the answers to these questions, then I think a version 4 UUID should be very safe to use. In fact, I'm using it to identify blocks on a network block file system and so far have not had a clash.
In my case, the PRNG I'm using is a Mersenne Twister, and I'm being careful with the way it's seeded, which is from multiple sources including /dev/urandom. The Mersenne Twister has a period of 2^19937 − 1. It's going to be a very, very long time before I see a repeated UUID.
So pick a good library or generate it yourself and make sure you use a decent PRNG algorithm.
For UUID4, I make it that there are approximately as many IDs as there are grains of sand in a cube-shaped box with sides 360,000 km long. That's a box with sides roughly 2.5 times Jupiter's diameter.
Working so someone can tell me if I've messed up units:
volume of grain of sand 0.00947mm^3 (Guardian)
UUID4 has 122 random bits -> 5.3e36 possible values (wikipedia)
volume of that many grains of sand = 5.0191e34 mm^3 or 5.0191e+25m^3
side length of cubic box with that volume = 3.69E8m or 369,000km
diameter of Jupiter: 139,820km (google)
I concur with the other answers. UUIDs are safe enough for nearly all practical purposes1, and certainly for yours.
But suppose (hypothetically) that they aren't.
Is there a better system or a pattern of some type to alleviate this issue?
Here are a couple of approaches:
Use a bigger UUID. For instance, instead of 128 random bits, use 256 or 512 or ... Each bit you add to a type-4 style UUID will halve the probability of a collision, assuming that you have a reliable source of entropy2. (A small sketch of this follows the next item.)
Build a centralized or distributed service that generates UUIDs and records each and every one it has ever issued. Each time it generates a new one, it checks that the UUID has never been issued before. Such a service would be technically straight-forward to implement (I think) if we assumed that the people running the service were absolutely trustworthy, incorruptible, etcetera. Unfortunately, they aren't ... especially when there is the possibility of governments' security organizations interfering. So, this approach is probably impractical, and may be3 impossible in the real world.
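As a sketch of the first approach (my own illustration, in Java; the names are made up, and it assumes SecureRandom is seeded with enough entropy):

import java.math.BigInteger;
import java.security.SecureRandom;

public class BigRandomId {
    private static final SecureRandom RNG = new SecureRandom();

    // 256 random bits as a hex string; each bit beyond a type-4 UUID's
    // 122 random bits halves the collision probability.
    static String next() {
        byte[] bytes = new byte[32];
        RNG.nextBytes(bytes);
        return String.format("%064x", new BigInteger(1, bytes));
    }
}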
1 - If uniqueness of UUIDs determined whether nuclear missiles got launched at your country's capital city, a lot of your fellow citizens would not be convinced by "the probability is extremely low". Hence my "nearly all" qualification.
2 - And here's a philosophical question for you. Is anything ever truly random? How would we know if it wasn't? Is the universe as we know it a simulation? Is there a God who might conceivably "tweak" the laws of physics to alter an outcome?
3 - If anyone knows of any research papers on this problem, please comment.
Quoting from Wikipedia:
Thus, anyone can create a UUID and use
it to identify something with
reasonable confidence that the
identifier will never be
unintentionally used by anyone for
anything else
It goes on to explain in pretty good detail on how safe it actually is. So to answer your question: Yes, it's safe enough.
UUID schemes generally use not only a pseudo-random element, but also the current system time, and some sort of often-unique hardware ID if available, such as a network MAC address.
The whole point of using UUID is that you trust it to do a better job of providing a unique ID than you yourself would be able to do. This is the same rationale behind using a 3rd party cryptography library rather than rolling your own. Doing it yourself may be more fun, but it's typically less responsible to do so.
Been doing it for years. Never run into a problem.
I usually set up my DB's to have one table that contains all the keys and the modified dates and such. Haven't run into a problem of duplicate keys ever.
The only drawback is that when you are writing queries to find some information quickly, you end up doing a lot of copying and pasting of the keys. You don't have the short, easy-to-remember ids anymore.
Here's a testing snippet for you to test its uniqueness.
Inspired by @scalabl3's comment:
Funny thing is, you could generate 2 in a row that were identical, of course at mind-boggling levels of coincidence, luck and divine intervention, yet despite the unfathomable odds, it's still possible! :D Yes, it won't happen. just saying for the amusement of thinking about that moment when you created a duplicate! Screenshot video! – scalabl3 Oct 20 '15 at 19:11
If you feel lucky, check the checkbox; it only checks the currently generated ids. If you wish a history check, leave it unchecked.
Please note, you might run out of RAM at some point if you leave it unchecked. I tried to make it CPU friendly so you can abort quickly when needed: just hit the run snippet button again or leave the page.
Math.log2 = Math.log2 || function(n) { return Math.log(n) / Math.log(2); };

Math.trueRandom = (function() {
  var crypt = window.crypto || window.msCrypto;

  if (crypt && crypt.getRandomValues) {
    // if we have a crypto library, use it
    var random = function(min, max) {
      var rval = 0;
      var range = max - min;
      if (range < 2) {
        return min;
      }

      var bits_needed = Math.ceil(Math.log2(range));
      if (bits_needed > 53) {
        throw new Error("We cannot generate numbers larger than 53 bits.");
      }
      var bytes_needed = Math.ceil(bits_needed / 8);
      var mask = Math.pow(2, bits_needed) - 1;
      // 7776 -> (2^13 = 8192) - 1 == 8191 or 0x00001111 11111111

      // Create byte array and fill with N random numbers
      var byteArray = new Uint8Array(bytes_needed);
      crypt.getRandomValues(byteArray);

      var p = (bytes_needed - 1) * 8;
      for (var i = 0; i < bytes_needed; i++) {
        rval += byteArray[i] * Math.pow(2, p);
        p -= 8;
      }

      // Use & to apply the mask and reduce the number of recursive lookups
      rval = rval & mask;

      if (rval >= range) {
        // Integer out of acceptable range
        return random(min, max);
      }
      // Return an integer that falls within the range
      return min + rval;
    };

    return function() {
      var r = random(0, 1000000000) / 1000000000;
      return r;
    };
  } else {
    // From http://baagoe.com/en/RandomMusings/javascript/
    // Johannes Baagøe <baagoe@baagoe.com>, 2010
    function Mash() {
      var n = 0xefc8249d;

      var mash = function(data) {
        data = data.toString();
        for (var i = 0; i < data.length; i++) {
          n += data.charCodeAt(i);
          var h = 0.02519603282416938 * n;
          n = h >>> 0;
          h -= n;
          h *= n;
          n = h >>> 0;
          h -= n;
          n += h * 0x100000000; // 2^32
        }
        return (n >>> 0) * 2.3283064365386963e-10; // 2^-32
      };

      mash.version = 'Mash 0.9';
      return mash;
    }

    // From http://baagoe.com/en/RandomMusings/javascript/
    function Alea() {
      return (function(args) {
        // Johannes Baagøe <baagoe@baagoe.com>, 2010
        var s0 = 0;
        var s1 = 0;
        var s2 = 0;
        var c = 1;

        if (args.length == 0) {
          args = [+new Date()];
        }
        var mash = Mash();
        s0 = mash(' ');
        s1 = mash(' ');
        s2 = mash(' ');

        for (var i = 0; i < args.length; i++) {
          s0 -= mash(args[i]);
          if (s0 < 0) {
            s0 += 1;
          }
          s1 -= mash(args[i]);
          if (s1 < 0) {
            s1 += 1;
          }
          s2 -= mash(args[i]);
          if (s2 < 0) {
            s2 += 1;
          }
        }
        mash = null;

        var random = function() {
          var t = 2091639 * s0 + c * 2.3283064365386963e-10; // 2^-32
          s0 = s1;
          s1 = s2;
          return s2 = t - (c = t | 0);
        };
        random.uint32 = function() {
          return random() * 0x100000000; // 2^32
        };
        random.fract53 = function() {
          return random() +
            (random() * 0x200000 | 0) * 1.1102230246251565e-16; // 2^-53
        };
        random.version = 'Alea 0.9';
        random.args = args;
        return random;
      }(Array.prototype.slice.call(arguments)));
    }

    return Alea();
  }
}());
Math.guid = function() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
    var r = Math.trueRandom() * 16 | 0,
        v = c == 'x' ? r : (r & 0x3 | 0x8);
    return v.toString(16);
  });
};

function logit(item1, item2) {
  console.log("Do " + item1 + " and " + item2 + " equal? " + (item1 == item2 ? "OMG! take a screenshot and you'll be epic on the world of cryptography, buy a lottery ticket now!" : "No they do not. shame. no fame") + ", runs: " + window.numberofRuns);
}
numberofRuns = 0;
function test() {
  window.numberofRuns++;
  var x = Math.guid();
  var y = Math.guid();
  var test = x == y || historyTest(x, y);
  logit(x, y);
  return test;
}

historyArr = [];
historyCount = 0;
function historyTest(item1, item2) {
  if (window.luckyDog) {
    return false;
  }
  for (var i = historyCount; i > -1; i--) {
    logit(item1, window.historyArr[i]);
    if (item1 == window.historyArr[i]) {
      return true;
    }
    logit(item2, window.historyArr[i]);
    if (item2 == window.historyArr[i]) {
      return true;
    }
  }
  window.historyArr.push(item1);
  window.historyArr.push(item2);
  window.historyCount += 2;
  return false;
}
luckyDog = false;
document.body.onload = function() {
  document.getElementById('runit').onclick = function() {
    window.luckyDog = document.getElementById('lucky').checked;
    var val = document.getElementById('input').value;
    if (val.trim() == '0') {
      var intervaltimer = window.setInterval(function() {
        var test = window.test();
        if (test) {
          window.clearInterval(intervaltimer);
        }
      }, 0);
    } else {
      var num = parseInt(val);
      if (num > 0) {
        var intervaltimer = window.setInterval(function() {
          var test = window.test();
          num--;
          if (num < 0 || test) {
            window.clearInterval(intervaltimer);
          }
        }, 0);
      }
    }
  };
};
Please input how often the calculation should run. Set to 0 for forever. Check the checkbox if you feel lucky.<BR/>
<input type="text" value="0" id="input"><input type="checkbox" id="lucky"><button id="runit">Run</button><BR/>
I don't know if this matters to you, but keep in mind that GUIDs are globally unique, but substrings of GUIDs aren't.
I should mention that I bought two external Seagate drives on Amazon, and they had the same device UUID but differing PARTUUIDs. Presumably the cloning software used to format the drives just copied the UUID as well.
Obviously UUID collisions are much more likely to happen due to a flawed cloning or copying process than from random coincidence. Bear that in mind when calculating UUID risks.
