randomblob(N) in sqlite

The documentation (http://www.sqlite.org/lang_corefunc.html) says that it generates an N-byte blob, and it goes on to give an example of using randomblob(16) with hex() for generating unique ids.
But isn't randomblob(8) more than enough for most databases? 8 bytes gives 64 bits, which allows 2^64 different possible values (which hex(randomblob(8)) then converts to hex format). Why waste the extra 8 bytes here?

GUIDs are defined as having 128 bits. The extra bytes also buy collision resistance: by the birthday bound, with only 2^64 possible values you should expect a collision after roughly 2^32 (about four billion) generated IDs, whereas 128 bits pushes the expected first collision out to around 2^64 draws.
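If you want to see it in action, here is a minimal sketch that calls it through the sqlite3 C API from C++ (assuming the sqlite3 development headers are installed; the in-memory database is just for illustration):

#include <cstdio>
#include <sqlite3.h>

int main() {
    sqlite3 *db = nullptr;
    if (sqlite3_open(":memory:", &db) != SQLITE_OK) return 1;

    // hex(randomblob(16)) yields a 32-character hex string: 128 random bits.
    sqlite3_stmt *stmt = nullptr;
    const char *sql = "SELECT hex(randomblob(16));";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) == SQLITE_OK &&
        sqlite3_step(stmt) == SQLITE_ROW) {
        std::printf("%s\n", (const char *)sqlite3_column_text(stmt, 0));
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}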

Related

Finding AES-256 keys

I have a question about aes keys.
I have a binary file which contains an aes256 key (32 bytes) at an unknown offset.
Would it be somehow possible to find this key in the file? Is it somehow possible to tell whether the next 32 bytes would be a valid aes key?
Thanks in advance
EDIT:
Thanks for all of your answers.
The key is stored in the file as normal bytes.
I finally managed to find a way to get it:
I basically filter out all strings, which actually made it work.
Thanks again.
Well, yes and no. AES-256 keys should consist of just 32 bytes that are indistinguishable from random. Most files do not consist of just random bytes, so it could be possible to find a sequence that is most likely random, and this could be the key you are looking for. However, it might very well be that there are other random sequences in the file, or sequences that look random but aren't random at all (such as the binary representation of the number Pi).
It may also be that you are unlucky and the AES key doesn't look all that random. Or the key may be stored in hexadecimals (text) rather than binary byte values. Then there is the problem of finding the exact offset (is that initial byte with value 0x20 the size of the AES key, a space character, or part of the key value?).
Most files have a specific format, so you should first have a look at that. Just looking for random sequences may give you false positives (rather likely) as well as false negatives (less likely). If you expect 64 bytes of randomness (two keys), then I suggest you search for that first, as it brings the chance of false positives down by a rather large amount.
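As a rough illustration of the "search for random-looking bytes" idea, here is a C++ sketch that slides a 32-byte window over a file and counts distinct byte values as a crude randomness proxy (the file name and threshold are made up for the example; real tools typically use entropy measures over larger windows):

#include <cstdio>
#include <fstream>
#include <iterator>
#include <set>
#include <vector>

int main() {
    // Hypothetical input file; in practice take it from the command line.
    std::ifstream in("firmware.bin", std::ios::binary);
    std::vector<unsigned char> data((std::istreambuf_iterator<char>(in)),
                                    std::istreambuf_iterator<char>());

    const std::size_t KEY_LEN = 32;
    for (std::size_t off = 0; off + KEY_LEN <= data.size(); ++off) {
        std::set<unsigned char> distinct(data.begin() + off,
                                         data.begin() + off + KEY_LEN);
        // 32 uniformly random bytes contain ~30 distinct values on average;
        // text and structured data usually score far lower.
        if (distinct.size() >= 29)
            std::printf("candidate key at offset 0x%zx\n", off);
    }
    return 0;
}

Expect plenty of false positives; as noted above, this only narrows down where to look.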
No - unless you have a way to verify the key against a known plaintext/ciphertext pair - an AES key is not distinguishable from random noise. Any set of 16, 24 or 32 bytes is a valid AES key.

Length of AES encrypted data

I have data that needs to be stored in a database encrypted; the maximum length of the data before encryption is 50 characters (English or Arabic). I need to encrypt the data using AES-128 and store the output in the database as a Base64 string.
How do I know the length of the data after encryption?
Try it with your specified algorithm, block size, IV size, and see what size output you get :-)
First it depends on the encoding of the input text. Is it UTF-8? UTF-16?
Let's assume UTF-8, so 1 byte per character means 50 bytes of input data to your encryption algorithm (100 bytes if UTF-16). Note that Arabic characters take 2 bytes each in UTF-8, so mixed English/Arabic text will land somewhere in between.
Then you will pad to the block size of the algorithm. AES, regardless of key size, uses a 16-byte block, so we will be padded out to 64 bytes (or 112 for UTF-16).
Then we need to store the IV and header information. That is (usually, with default settings/IV sizes) another 16 bytes, so we are at 80 bytes (or 128 for UTF-16).
Finally we are encoding to Base64. I assume you want the string length, since otherwise it is wasteful to make it into a string. Base64 bloats the data according to the formula ceil(bytes / 3) * 4. So for us that is ceil(80 / 3) * 4 = 27 * 4 = 108 characters (or 172 for UTF-16).
Again this is all highly dependent on your choices of how you encrypt, what the text is encoded as, etc.
I would try it with your scenario before relying on these numbers for anything useful.
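That arithmetic is easy to sanity-check in code. Here is a sketch, assuming PKCS#7 padding, a 16-byte IV stored in front of the ciphertext, and standard Base64 without line breaks (the function name is illustrative):

#include <cstddef>
#include <cstdio>

// Predicted Base64 string length for AES-CBC output under the
// assumptions above (PKCS#7 padding, prepended 16-byte IV).
std::size_t encryptedBase64Length(std::size_t plaintextBytes) {
    const std::size_t block = 16;
    std::size_t padded = (plaintextBytes / block + 1) * block; // PKCS#7 always pads
    std::size_t withIv = padded + block;                       // room for the IV
    return ((withIv + 2) / 3) * 4;                             // Base64: ceil(n/3)*4
}

int main() {
    std::printf("%zu\n", encryptedBase64Length(50));  // 108 (UTF-8 case)
    std::printf("%zu\n", encryptedBase64Length(100)); // 172 (UTF-16 case)
    return 0;
}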

Huffman code length

I want to build a Huffman tree and assign a code to all 256 byte values based on their frequencies. But for my application I need a hash table so I can get the code for a byte in constant time. The problem is that in the worst case the tree may be so unbalanced that certain bytes get a very long code (up to 255 bits), so maintaining a hash table is proving very difficult. The code requires high performance, so storing the codes as strings won't work. How can I resolve this?
Why would you need a hash table for 256 values? Simply have a 256-entry table where you directly index the code for each byte.
Each code is at most 32 bytes long, so just have a table of 256 entries, each a fixed 33 bytes, for 8448 bytes in all. The first of the 33 bytes is the length of the code in bits, and the remaining bytes hold the code itself, of which you use only the requisite number of bits.
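A sketch of that layout (the struct and function names are illustrative, not from any particular library):

#include <cstddef>
#include <cstdint>

// One entry per byte value: codeBits is the code length in bits,
// code[] holds the code itself, most significant bit first.
struct HuffEntry {
    uint8_t codeBits;  // 1..255
    uint8_t code[32];  // up to 255 bits, MSB-first
};

HuffEntry table[256]; // 256 * 33 = 8448 bytes, indexed directly by byte value

// Append the code for `symbol` to a bit-level output buffer.
void emit(uint8_t symbol, uint8_t *out, std::size_t &bitPos) {
    const HuffEntry &e = table[symbol];
    for (unsigned i = 0; i < e.codeBits; ++i, ++bitPos) {
        unsigned bit = (e.code[i / 8] >> (7 - i % 8)) & 1u;
        out[bitPos / 8] |= bit << (7 - bitPos % 8);
    }
}

int main() {
    table['A'] = {3, {0xA0}}; // toy entry: give byte 'A' the 3-bit code 101
    uint8_t out[16] = {};
    std::size_t bitPos = 0;
    emit('A', out, bitPos);   // out[0] == 0xA0, bitPos == 3
    return 0;
}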

Good Idea/Bad Idea: Using Qt's QSet on very large dataset?

Is it a bad idea to use QSet to keep track of a very large set of fairly large strings? Each string is 54 characters (108 bytes). The set may contain thousands of entries (I'm not sure on the exact number yet). The QSet will only be used for insertion and membership query.
If it is a bad idea, I'm definitely open to suggestions. My 54 character strings are composed of only 6 different characters (e.g. "AAAAAAAAABBBBBBBBBCCCCCCCCCDDDDDDDDDEEEEEEEEEFFFFFFFFF"). This seems like a good candidate for compression, perhaps? Any other suggestions are welcome.
Realize that by using a built-in set, you're going to have some path-level compression based on the nature of your data. Of course, this depends on the container's implementation.
Look at some information on radix trees, digital search trees, red-black trees, etc. You'll see that you don't need to store each and every string, but rather the shared patterns. For instance, let's simplify your problem: we have only 3 characters that can each appear a maximum of 2 times, and each string is 6 characters long. Three possible strings are:
AABBCC, AABCBC, and AACBCB
With these examples, we could get away with using a maximum of 6 + 3 + 4 = 13 nodes instead of a full 18 nodes. Not substantial, but I don't know what you're doing either. As with any type of compression, the more your prefix patterns are reused, the more compression you get.
Edit:
The numbers 13 and 18 come from the path-level compression. For instance, in straight C (for argument/discussion), if I implemented my string storage as a wrapper around an array, I would probably just have an array of character pointers, each pointer referencing a spot in memory that contains a pattern. In the example above, the patterns take 18 characters of storage (6 * 3 = 18). Adding on the size of the array (say sizeof(char*) is 4), the array takes 3 * 4 = 12 bytes, for 12 + 18 = 30 bytes total to store our patterns.
If I instead store the patterns in a sort of digital search tree, I make a small tradeoff. The nodes in my tree are going to be larger than 1 byte apiece: 1 byte for the character in the node plus 4 bytes for the "next" pointer, so 5 bytes apiece. The first pattern we store is AABBCC, which is 6 nodes in the tree. Next is AABCBC: we reuse the path AAB from the first pattern and need only an additional 3 nodes for CBC. The last pattern is AACBCB: we reuse AA and need 4 new nodes for CBCB. That is a total of 13 nodes * 5 bytes = 65 bytes of storage. However, if you have a lot of long, repeating patterns in the prefixes of your data, then you'll see some prefix path-level compression.
If this isn't the case for you, I would look into Huffman or LZW compression. Both require you to build a dictionary of patterns with integer IDs tied to them. When you compress, you build the dictionary and create an integer ID for each pattern in your text, then replace the patterns in the text with those IDs. When uncompressing, you do the opposite. I don't have the time to describe these algorithms in more detail, so you'll need to look them up.
It's a tradeoff in simplicity/time. If your data will allow it, take the shorter method and just use the built-in container. If not, you will need something more tailored to your data.
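For the curious, here is a toy sketch that inserts the three example strings into a bare-bones trie and counts the nodes; it reports 13, matching the arithmetic above (the node layout is illustrative and deliberately naive):

#include <cstdio>
#include <map>
#include <string>

struct Node {
    std::map<char, Node*> next; // one child per distinct character
};

int nodeCount = 0;

void insert(Node *root, const std::string &s) {
    Node *cur = root;
    for (char c : s) {
        if (!cur->next.count(c)) { cur->next[c] = new Node(); ++nodeCount; }
        cur = cur->next[c];
    }
}

int main() {
    Node root;
    insert(&root, "AABBCC");
    insert(&root, "AABCBC");
    insert(&root, "AACBCB");
    std::printf("%d nodes vs %d raw characters\n", nodeCount, 3 * 6); // 13 vs 18
    return 0;
}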
I don't think you'd have any additional problems using QSet over another sort of container, such as std::set, a map, or a vector. If you are worried about running out of memory, that depends on how many thousands of strings you need to store and whether there is a way to encode them more concisely. (For example, if the characters always occur in the same order but vary in relative lengths, store the length of each run rather than all of the characters.) Even 50,000 of these strings is only around 5 MB, and 500,000 is only 50 MB, discounting storage overhead, which is a moderate amount of memory on modern machines.
QSet does sound like a good idea. It's basically just a hash-table and it can optimize its bucket size dynamically. Perfect.
Another suggestion for compressing the key:
Treat it as a base-6 number string (think A=0, B=1, ... F=5) and convert it into binary (int).
QByteArray ba("112"); // instead of "BBC"
int num = ba.toInt(0, 6 /*base*/); // num == 44
6^3 < 2^8, so we can represent every 3 characters of your string with a 1-byte int (or char) and make a byte array of them. That would cut the size of the key down from 54 bytes to 18 bytes.
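Here is a sketch of that packing applied to the full 54-character key, assuming the alphabet is exactly the letters A through F (the function name is illustrative):

#include <QByteArray>
#include <QSet>

// Pack a 54-character string over the alphabet A..F into 18 bytes:
// each group of 3 base-6 digits fits into one byte, since 6^3 = 216 < 256.
QByteArray packKey(const QByteArray &s) {
    QByteArray out;
    out.reserve(s.size() / 3);
    for (int i = 0; i + 2 < s.size(); i += 3) {
        int v = (s[i] - 'A') * 36 + (s[i + 1] - 'A') * 6 + (s[i + 2] - 'A');
        out.append(char(v));
    }
    return out;
}

int main() {
    QSet<QByteArray> seen;
    QByteArray key("AAAAAAAAABBBBBBBBBCCCCCCCCCDDDDDDDDDEEEEEEEEEFFFFFFFFF");
    seen.insert(packKey(key));                  // insertion
    return seen.contains(packKey(key)) ? 0 : 1; // membership query
}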
From your earlier comment: "In my strings, there will always be 54 characters, and there will always be 9 of each character. The order is the only thing that changes."
Don't store the raw strings then. You could just compress them down to the 6 characters actually used and make a QSet of those. A trivial compression would encode each of {A,B,C,D,E,F} in 3 bits (about 21 bytes per string), and since the character set and counts are known beforehand you could in principle even pack each string into a 16-byte integer, since there are only 54!/(9!)^6, roughly 2^126, distinct arrangements.

Trying to decode a 128-bit or 256-bit string?

The password string looks something like this:
MTY5LTYtNjEtMjAxLTkwLTE3MS05My0yMDAtMTMxLTE5Mi01My0xNjItMC0yMjAtMTgxLTIyNg==
I tried a Base64 decoder and it gives me:
169-6-61-201-90-171-93-200-131-192-53-162-0-220-181-226
It looks like it's encoded as ASCII codes.
Mapping the numbers onto the ASCII table gives me:
©=ÉZ«]ȃÀ5¢Üµâ
But this is not the password I was looking for.
Does anyone know the solution?
I am not an expert, sorry for the bad explanation.
The string contains 16 number groups, and each number is between 0 and 255. So it looks like 16 bytes, and 16 bytes / 128 bits is the size of an MD5 hash. So that would be my guess.
While a cryptographic hash function can't be easily reversed, there are online rainbow-table services which can reverse them for short or common inputs. But if the programmer who wrote it did it right (used a salt and many iterations), they won't help.
I'd split it into 16 numbers, then convert these to a byte array of size 16, and then hex-encode them, since that's the form most programs will accept. Edit: See Kenny's comment.
And then search for some website which allows searching in rainbow tables. And pray...
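Here is a sketch of that conversion: split the string on '-', then hex-encode the bytes:

#include <cstdio>
#include <sstream>
#include <string>

int main() {
    // The dash-separated decimal bytes from the question.
    std::string s = "169-6-61-201-90-171-93-200-131-192-53-162-0-220-181-226";
    std::istringstream in(s);
    std::string part;
    while (std::getline(in, part, '-'))
        std::printf("%02x", std::stoi(part)); // e.g. 169 -> a9
    std::printf("\n"); // prints a 32-character hex string, ready for lookup sites
    return 0;
}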
