Is there an algorithm to compress a version 4 GUID to less than 16 bytes?

I reverse engineered the assembly of the Windows UuidCreate() function and am guessing it's the same algorithm that MS uses for SQL Server. Anyhow, it is a recursive algorithm which self-modifies a 256 byte array as it uses that same 256 byte array to generate the 16 GUID bytes. There are actually 8 such 256 byte arrays that they round-robin through; after 8 GUIDs are generated they cycle back to the first. They basically just sum up the bytes, adding in one more byte at a time to the sum, taking the sum mod 256 as they go, then swapping the current byte with the byte value at that offset in the array, and finally summing the 2 values just swapped to index back into the array and grab the next GUID byte. I've been thinking about this off and on for a few months now and can't figure out a way to compress it. Does anyone know of any group theory behind such an algorithm which might make it possible to compress a version 4 GUID?
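For what it's worth, here is a minimal Python sketch of the swap-and-sum generator as described above. This is just my reading of the description, not Microsoft's actual code, and the structure is essentially the output stage of RC4:

```python
def next_guid_bytes(state, i=0, j=0):
    """One round of the generator described above: a running sum (mod 256)
    picks a swap partner in the 256-byte array, the array self-modifies via
    the swap, and the sum of the two swapped bytes indexes the next output."""
    out = bytearray()
    for _ in range(16):                                # 16 bytes per GUID
        i = (i + 1) % 256                              # add in one more byte...
        j = (j + state[i]) % 256                       # ...to the running sum, mod 256
        state[i], state[j] = state[j], state[i]        # swap current byte with that offset
        out.append(state[(state[i] + state[j]) % 256]) # swapped pair's sum indexes the array
    return bytes(out), i, j

# One of the 8 round-robin arrays, here seeded as the identity permutation.
state = bytearray(range(256))
guid_bytes, i, j = next_guid_bytes(state)
```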

Write HEX value out on Arduino digital lines

I am attempting to address 16 bytes on an SRAM chip, using a 74595 shift register to load the memory and then read it back later. My issue is that the memory address is 4 bits wide, and I want to control those 4 bits from the Arduino. I would prefer a method that writes the decimal value out as binary to the 4 lines at once, but I am not sure of the best or most efficient way to do that.

How stream cipher works

I am new here, and I am trying to understand encryption. I have done a lot of reading here and I cannot find an explanation that helps me understand.
When we are talking about stream ciphers, from what I understood, the encryption is done bit by bit.
Does that mean that the input text (let's say "Google") is encrypted character by character (because that would be byte by byte)? Or is it converted to binary first, and then the sequence of 0s and 1s is encrypted bit by bit?
Thank you.
When we are talking about stream ciphers, from what I understood, the encryption is done bit by bit.
I assume you are talking about the simple XOR-ing of plaintext with the cipherstream.
Stream ciphers are often defined (theoretically, as a formal definition) as a PRG (pseudo-random generator) producing output bit by bit, where no bit can be guessed with probability better than chance. I've seen such a definition in multiple courses. You could (in theory) apply the XOR operation bit by bit. As you've already found out, that would not be very practical on current computer architectures.
Or is it converted to binary first, and then the sequence of 0s and 1s is encrypted bit by bit?
In practice, stream ciphers keep some internal state and produce their output as a stream of bytes or a byte array. As a result, the string is converted to a byte array and the XOR is applied to the whole array (byte by byte, or in whole chunks of bytes).
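As a minimal Python sketch of that byte-wise XOR (the keystream bytes below are just stand-ins for whatever the cipher's internal PRG would produce):

```python
def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    """XOR each data byte with the corresponding keystream byte.
    Decryption is the same operation, since (p ^ k) ^ k == p."""
    return bytes(d ^ k for d, k in zip(data, keystream))

plaintext = "Google".encode("utf-8")     # the string becomes a byte array first
keystream = b"\x3a\x91\x5c\x07\xee\x42"  # stand-in PRG output, one byte per plaintext byte
ciphertext = xor_bytes(plaintext, keystream)
assert xor_bytes(ciphertext, keystream) == plaintext
```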

How many bits of integer data can be stored in a DynamoDB attribute of type Number?

DynamoDB's Number type supports 38 digits of decimal precision. This is not big enough to store a 128-bit integer which would require 39 digits. The max value is 340,282,366,920,938,463,463,374,607,431,768,211,455 for unsigned 128-bit ints or 170,141,183,460,469,231,731,687,303,715,884,105,727 for signed 128-bit ints. These are both 39-digit numbers.
If I can't store 128 bits, then how many bits of integer data can I store in a Number?
A DynamoDB attribute of type Number can store 126-bit integers (or 127-bit unsigned integers, with serious caveats).
According to Amazon's documentation:
Numbers can have up to 38 digits precision. Exceeding this results in an exception.
This means (verified by testing in the AWS console) that the largest positive integer and the smallest negative integer, respectively, that DynamoDB can store in a Number attribute are:
99,999,999,999,999,999,999,999,999,999,999,999,999 (aka 10^38-1)
-99,999,999,999,999,999,999,999,999,999,999,999,999 (aka -10^38+1)
These limits work out to 126 usable bits, using this formula:
bits = floor(ln(number) / ln(2))
     = floor(87.498 / 0.693)
     = floor(126.259)
     = 126
So you can safely store a 126-bit signed int in a DynamoDB Number attribute.
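You can verify that arithmetic with a few lines of Python, using exact integer math instead of floating point:

```python
n = 10**38 - 1              # largest positive value a DynamoDB Number accepts
print(n.bit_length() - 1)   # 126, i.e. floor(log2(n))
print(2**126 - 1 <= n)      # True: every 126-bit value fits
print(2**127 - 1 <= n)      # False: the full 127-bit range does not
```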
If you want to live dangerously, you can store a 127-bit unsigned int too, but there are some caveats:
You'd need to avoid (or at least be very careful) using such a number as a sort key, because values with a most-significant-bit of 1 will sort as negative numbers.
Your app will need to convert unsigned ints to signed ints when storing them or querying for them in DynamoDB, and will also need to convert them back to unsigned after reading data from DynamoDB.
If it were me, I wouldn't take these risks for one extra bit without a very, very good reason.
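If you do take the risk, here is a minimal Python sketch of the unsigned/signed mapping described in the caveats above (the two's-complement-style interpretation and the function names are mine):

```python
def u127_to_signed(u: int) -> int:
    """Map an unsigned 127-bit int into DynamoDB's signed range.
    Values with the most-significant bit set come out negative."""
    assert 0 <= u < 2**127
    return u - 2**127 if u >= 2**126 else u

def signed_to_u127(s: int) -> int:
    """Inverse mapping, applied after reading back from DynamoDB."""
    return s + 2**127 if s < 0 else s

u = 2**127 - 1                 # largest 127-bit unsigned value
s = u127_to_signed(u)          # -1, which easily fits in a Number
assert signed_to_u127(s) == u
```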
One logical question is whether 126 bits (or 127 given the caveats above) is good enough to store a UUID. The answer is: it depends. If you are in control of the UUID generation, then you can always shave a bit or two from the UUID and store it. If you shave from the 4 "version" bits (see the layout in RFC 4122) then you may not be losing any entropy at all, provided you always generate UUIDs with the same version.
However, if someone else is generating those UUIDs AND is expecting lossless storage, then you may not be able to use a Number to store the UUID. But you may be able to store it if you restrict clients to a whitelist of 4-8 UUID versions. The largest version now is 5 out of a 0-15 range, and some of the older versions are discouraged for privacy reasons, so this limitation may be reasonable depending on your clients and whether they adhere to the version bits as defined in RFC 4122.
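As an illustration of the version-bit shaving, here is a Python sketch that assumes all of your UUIDs are version 4, so the shaved nibble carries no information:

```python
import uuid

VERSION_SHIFT = 76   # the version nibble occupies bits 76-79 of the 128-bit UUID

def uuid4_to_int124(u: uuid.UUID) -> int:
    """Drop the 4 constant version bits of a v4 UUID, leaving 124 bits."""
    assert u.version == 4
    low = u.int & ((1 << VERSION_SHIFT) - 1)   # bits below the version nibble
    high = u.int >> (VERSION_SHIFT + 4)        # bits above it
    return (high << VERSION_SHIFT) | low

def int124_to_uuid4(n: int) -> uuid.UUID:
    """Re-insert the version nibble (0x4) to recover the original UUID."""
    low = n & ((1 << VERSION_SHIFT) - 1)
    high = n >> VERSION_SHIFT
    return uuid.UUID(int=(high << (VERSION_SHIFT + 4)) | (0x4 << VERSION_SHIFT) | low)

u = uuid.uuid4()
assert int124_to_uuid4(uuid4_to_int124(u)) == u
assert uuid4_to_int124(u) < 2**124   # comfortably within 126 bits
```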
BTW, I was surprised that this bit-limit question wasn't already answered online... at least not in an easily Google-able place. So I'm contributing this Q&A pair so future searchers can find it.

brute force encrypted file (XOR encryption)

I've got a file (GIF type if that matters) that is encrypted using the XOR algorithm.
The only thing I have is the encrypted text, so no key or plaintext. Now I was wondering how I can brute force this file to get the (symmetric) key and eventually decrypt it.
If I'm not mistaken it should be a 10 byte key. I've looked into using John the Ripper, but I almost only see that being used to brute force accounts.
Also, if it is relevant, I do not have a file which could contain the key, so the attack would have to generate its possible keys itself.
update:
Now I have found a way to generate all possible hexadecimal keys, and I'll have to XOR the file again with each candidate key to decrypt it, if that makes sense. Performing this operation is not going to be a problem, but how do I check whether a decryption attempt worked (i.e., that it had the correct key), so that it basically stops trying any further?
You (and @gusto2) are exactly correct about using the magic number: you immediately get the first 6 bytes of the key by knowing that the first 6 plaintext bytes are GIF89a.
Following the gif specification, we can learn more of the key. Here are a few tips, where I am numbering the bytes of your file from index 0 (so bytes 0-5 correspond to the magic number):
The last byte of a plaintext gif file is 0x3B. This possibly gives you one more byte of the key, depending on the file size: e.g. if the file size is equivalent to 7, 8, 9, or 0 modulo 10, then the last byte lines up with a key byte you don't already know.
After the magic number is a 7 byte Logical Screen Descriptor. The first 4 bytes tell the width and height: if you knew the width and height of your gif, then you would be able to derive the remaining 4 unknown bytes of the key. Let's assume you don't know it.
Byte 10 of the file you will know, because it is encrypted with key byte 0 in your XOR scheme. When you decrypt that byte, its most significant bit is the Global Color Table Flag. If this bit is 0, then there is no Global Color Table -- which means that the next byte (byte 11) starts either an image block (0x2C) or an extension block (0x21). Again, you can decrypt this byte (because it corresponds to key byte 1), so you know exactly what it is.
Images come in image blocks starting with 0x2C and ending with 0x00.
There are two approaches you can do to decrypt this:
(1) Work by hand, as I am describing above. You should be able to interpret the blocks, and look for the expected plaintext byte values of 0x2C, 0x21, 0x00, and 0x3B. From there you can figure out what makes sense to come next, and derive key bytes by hand; or
(2) Brute force the last 4 key bytes (2^32 possible values). For each guess, you decrypt the candidate gif image and then feed the result into a gif parser to see whether it barfs or not. If it barfs, then you know that candidate is wrong. If it does not, then you have a possible real decryption and you save it. At the end, you look through your real candidates one by one (you don't expect many candidates) to see which one is the right decryption.
EDIT: You said that the width and height are 640 and 960. That means that bytes 6 and 7 will be the little-endian representation of 640, and bytes 8 and 9 the little-endian representation of 960. You should have the entire key from this. Try it and let us know if it works. Make sure you get the endianness right!
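Putting the known-plaintext pieces together, here is a Python sketch that derives the full 10-byte key and decrypts the file (the file names are placeholders):

```python
import struct
from itertools import cycle

# Known plaintext: the magic number, then width (640) and height (960) as
# little-endian 16-bit values in the Logical Screen Descriptor (bytes 6-9).
known = b"GIF89a" + struct.pack("<HH", 640, 960)   # exactly 10 bytes

with open("encrypted.gif", "rb") as f:             # placeholder file name
    ciphertext = f.read()

# key[i] = ciphertext[i] XOR plaintext[i] for the first 10 bytes
key = bytes(c ^ p for c, p in zip(ciphertext, known))

# Decrypt by XOR-ing the repeating 10-byte key across the whole file.
plaintext = bytes(c ^ k for c, k in zip(ciphertext, cycle(key)))

with open("decrypted.gif", "wb") as f:
    f.write(plaintext)

print(plaintext[:6], hex(plaintext[-1]))   # expect b'GIF89a' and 0x3b
```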

Probability of collision with truncated SHA-256 hash

I have a database-driven web application where the primary keys of all data rows are obfuscated as follows: SHA256(content type + primary key + secret), truncated to the first 8 characters. The content type is a simple word, e.g. "post" or "message" and the secret is a 20-30 char ASCII constant. The result is stored in a separate indexed column for fast DB lookup.
How do I calculate the probability of a hash collision in this scenario? I am not a mathematician at all, but a friend claimed that due to the Birthday Paradox the collision probability would be ~1% for 10,000 rows with an 8-char truncation. Is there any truth to this claim?
Yes, there is a collision probability & it's probably somewhat too high. The exact probability depends on what "8 characters" means.
Does "8 characters" mean:
A) You store 8 hex characters of the hash? That would store 32 bits.
B) You store 8 characters of BASE-64? That would store 48 bits.
C) You store 8 bytes, encoded in some single-byte charset, or hacked in some broken way into a character encoding? That would store 56-64 bits, but if you don't do the encoding right you'll encounter character conversion problems.
D) You store 8 bytes, as bytes? That genuinely stores 64 bits of the hash.
Storing the hash as either A) hex or D) raw binary bytes would be my preferred option. But I'd definitely recommend either reconsidering your "key obfuscation" scheme or significantly expanding the stored key-size to reduce the (currently excessive) probability of key collision.
From Wikipedia:
https://en.wikipedia.org/wiki/Birthday_problem#Probability_table
The birthday problem in this more generic sense applies to hash functions: the expected number of N-bit hashes that can be generated before getting a collision is not 2^N, but rather only 2^(N/2).
In the most conservative reading of your design above (reading it as A, 8 chars of hex == 32 bits), your scheme would be expected to suffer collisions once it stored on the scale of ~65,000 (2^16) rows. I would consider such an outcome unacceptable for any serious, or even toy, system.
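To put numbers on that, the standard birthday approximation is p ≈ 1 - e^(-n²/2d) where d = 2^bits; a small Python check confirms your friend's ~1% figure for reading A:

```python
import math

def collision_probability(n_rows: int, bits: int) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    return 1.0 - math.exp(-(n_rows ** 2) / (2.0 * 2.0 ** bits))

print(collision_probability(10_000, 32))   # ~0.0116: the ~1% claim, for 8 hex chars
print(collision_probability(10_000, 48))   # ~1.8e-7 for 8 base-64 chars
print(collision_probability(65_536, 32))   # ~0.39: collisions likely around 2^16 rows
```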
Transaction tables may see volumes of 1,000 - 100,000 transactions/day (or more), allowing growth for the business. Systems should be designed to function for 100 years (36,500 days), with a 10x growth factor built in, so...
For your keying mechanism to be genuinely robust & professionally useful, you would need to be able to scale it up to potentially handle ~36 billion (2^35) rows without collision. That would imply 70+ bits of hash.
The source-control system Git, for example, stores 160 bits of SHA-1 hash (40 chars of hex == 20 bytes == 160 bits). Collisions would not be expected to be probable with less than 2^80 different file revisions stored.
A possibly better design might be, rather than hashing & pseudo-randomizing the key entirely & hoping (against hope) to avoid collisions, to prepend/append/fold in 8-10 bits of a hash into the key.
This would generate a larger key, containing all the uniqueness of the original key plus 8-10 bits of verification. Attempts to access keys would then be verified, and more than 3 invalid requests would be treated as an attempt to violate security by "probing" the keyspace & would trigger a semi-permanent lockout.
The only major cost here would be a modest reduction in the size of the available keyspace for a given int size. A 32-bit int to/from the browser would have 8-10 bits dedicated to security, leaving 22-24 for the actual key. So you'd use 64-bit ints where that was not sufficient.
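A minimal Python sketch of that fold-in design, assuming SHA-256 as the hash and 8 verification bits (the secret, the message layout, and the function names are all illustrative):

```python
import hashlib

SECRET = b"example-20-30-char-secret"   # illustrative constant
CHECK_BITS = 8                          # verification bits folded into the key

def public_id(content_type: str, primary_key: int) -> int:
    """Append 8 hash bits to the real key: public_id = (key << 8) | check."""
    msg = content_type.encode() + str(primary_key).encode() + SECRET
    check = hashlib.sha256(msg).digest()[0]      # first byte of the hash
    return (primary_key << CHECK_BITS) | check

def verify_id(content_type: str, pid: int):
    """Recover the primary key, or None if the check bits don't match
    (which should be treated as an attempt to probe the keyspace)."""
    primary_key, check = pid >> CHECK_BITS, pid & 0xFF
    msg = content_type.encode() + str(primary_key).encode() + SECRET
    return primary_key if hashlib.sha256(msg).digest()[0] == check else None

pid = public_id("post", 12345)
assert verify_id("post", pid) == 12345
assert verify_id("post", pid ^ 0x01) is None   # an altered/probed id fails
```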
