I want to know whether the hex number "0xDEADBEEF" is a 32-bit signed number or an unsigned number. A 32-bit signed number ranges from -2,147,483,648 to 2,147,483,647, but this value is 3,735,928,559, so does anyone know about this?
Neither. Signedness isn't a property of a set of bits. It's an interpretation layer you impose on top of the bits, which informs how you read them.
If you read these bits in a context where you expect them to encode an unsigned 32-bit integer, then they have a decimal value of 3,735,928,559.
But if you instead read them in a context where you expect them to encode a signed 32-bit integer, then they have a decimal value of -559,038,737.
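A quick way to see both readings is to apply the two's-complement conversion yourself; here is a minimal Python check (the variable names are just for illustration):

    bits = 0xDEADBEEF

    as_unsigned = bits                                     # unsigned 32-bit reading
    as_signed = bits - 2**32 if bits >= 2**31 else bits    # two's-complement reading

    print(as_unsigned)   # 3735928559
    print(as_signed)     # -559038737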
DynamoDB's Number type supports 38 digits of decimal precision. This is not big enough to store a 128-bit integer which would require 39 digits. The max value is 340,282,366,920,938,463,463,374,607,431,768,211,455 for unsigned 128-bit ints or 170,141,183,460,469,231,731,687,303,715,884,105,727 for signed 128-bit ints. These are both 39-digit numbers.
If I can't store 128 bits, then how many bits of integer data can I store in a Number?
A DynamoDB attribute of type Number can store 126-bit signed integers (or 127-bit unsigned integers, with serious caveats).
According to Amazon's documentation:
Numbers can have up to 38 digits precision. Exceeding this results in an exception.
This means (verified by testing in the AWS console) that the largest positive integer and the smallest negative integer, respectively, that DynamoDB can store in a Number attribute are:
99,999,999,999,999,999,999,999,999,999,999,999,999 (aka 10^38-1)
-99,999,999,999,999,999,999,999,999,999,999,999,999 (aka -10^38+1)
These numbers require 126 bits of storage, using this formula:
bits = floor(ln(number) / ln(2))
     = floor(87.498 / 0.693)
     = floor(126.26)
     = 126
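To sanity-check that arithmetic, here is a short Python snippet, assuming only the documented 38-digit cap (i.e. a maximum magnitude of 10^38 - 1):

    DYNAMO_MAX = 10**38 - 1      # 38 nines: DynamoDB's documented precision limit

    print(2**126 - 1 <= DYNAMO_MAX)   # True:  every 126-bit magnitude fits
    print(2**127 - 1 <= DYNAMO_MAX)   # False: not every 127-bit magnitude fits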
So you can safely store a 126-bit signed int in a DynamoDB Number attribute.
If you want to live dangerously, you can store a 127-bit unsigned int too, but there are some caveats:
You'd need to avoid (or at least be very careful about) using such a number as a sort key, because values with a most-significant bit of 1 will sort as negative numbers.
Your app will need to convert unsigned ints to signed ints when storing them in or querying DynamoDB, and convert them back to unsigned after reading data from DynamoDB (a sketch of this mapping follows below).
If it were me, I wouldn't take these risks for one extra bit without a very, very good reason.
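If you do go the 127-bit-unsigned route, the conversion mentioned in the second caveat can be a simple two's-complement-style mapping. A minimal sketch in Python (the helper names are made up for illustration):

    BIAS = 2**127

    def to_stored(u: int) -> int:
        """Map a 127-bit unsigned int onto the signed range DynamoDB can hold."""
        assert 0 <= u < 2**127
        return u - BIAS if u >= 2**126 else u

    def from_stored(s: int) -> int:
        """Invert the mapping after reading the Number back."""
        return s + BIAS if s < 0 else s

Both halves of the mapped range stay within the 38-digit limit (their magnitude never exceeds 2^126), which is what makes the trick work.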
One logical question is whether 126 bits (or 127, given the caveats above) is enough to store a UUID. The answer is: it depends. If you are in control of the UUID generation, then you can always shave a bit or two off the UUID and store it. If you shave from the 4 "version" bits (see format here), then you may not be losing any entropy at all, as long as you always generate UUIDs with the same version.
However, if someone else is generating those UUIDs AND is expecting lossless storage, then you may not be able to use a Number to store the UUID. But you may be able to store it if you restrict clients to a whitelist of 4-8 UUID versions. The largest version now is 5 out of a 0-15 range, and some of the older versions are discouraged for privacy reasons, so this limitation may be reasonable depending on your clients and whether they adhere to the version bits as defined in RFC 4122.
BTW, I was surprised that this bit-limit question wasn't already online... at least not in an easily-Google-able place. So contributing this Q&A pair so future searchers can find it.
I've got a file (GIF type if that matters) that is encrypted using the XOR algorithm.
The only thing I have is the encrypted file, so no key or plaintext. Now I was wondering how I can brute force this file to get the (symmetric) key to eventually decrypt it.
If I'm not mistaken, it should be a 10-byte key. I've looked into using John the Ripper, but I almost only see it being used to brute force accounts.
Also, if it is relevant, I do not have a file which could contain the key, so the tool would have to generate the possible keys itself.
Update:
I've now found a way to generate all possible hexadecimal keys, so I'll have to XOR the file with each candidate key to decrypt it, if that makes sense. Performing this operation is not going to be a problem, but how do I check that the decryption worked (i.e. that the candidate key was correct) so that it stops trying any further?
You (and @gusto2) are exactly correct about using the magic number: you immediately get the first 6 bytes of the key, because you know the first 6 bytes of the plaintext are GIF89a.
Following the gif specification, we can learn more of the key. Here are a few tips, where I am numbering the bytes of your file from index 0 (so bytes 0-5 correspond to the magic number):
The last byte of a plaintext GIF file is 0x3B. This possibly gives you one more byte of the key, depending on the file size: if the file size is equivalent to 7, 8, 9, or 0 modulo 10, the last byte lines up with one of the still-unknown key bytes 6-9.
After the magic number comes a 7-byte Logical Screen Descriptor. Its first 4 bytes hold the width and height: if you knew the width and height of your GIF, you would be able to derive the remaining 4 unknown bytes of the key. Let's assume you don't know them.
Byte 10 of the file you will know, because it corresponds to key byte 0 in your XOR encryption. When you decrypt that byte, its most significant bit is the Global Color Table Flag. If this bit is 0, then there is no Global Color Table, so the first block introducer follows immediately after the Logical Screen Descriptor, at byte 13: either an image descriptor (0x2C) or an extension introducer (0x21). Again, you can decrypt that byte (it corresponds to key byte 3, which you already know from the magic number), so you know exactly what it is.
Images come in image blocks starting with 0x2C and ending with 00.
There are two approaches you can take to decrypt this:
(1) Work by hand, as described above. You should be able to interpret the blocks and look for the expected plaintext byte values of 0x2C, 0x21, 0x00, and 0x3B. From there you can figure out what plausibly comes next, and derive key bytes by hand; or
(2) Brute force the last 4 unknown key bytes (2^32 possible values). For each guess, you decrypt the candidate GIF image and then feed the result into a GIF parser (example parser) to see if it barfs or not. If it barfs, then you know that candidate is wrong. If it does not, then you have a possible real decryption and you save it. At the end, you look through your real candidates one by one (you don't expect many) to see which one is the right decryption.
EDIT: You said that the width and height are 640 and 960. That means bytes 6 and 7 will be the little-endian representation of 640, and bytes 8 and 9 the little-endian representation of 960. You should have the entire key from this. Try it and let us know if it works. Make sure you get the endianness right!
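Putting the magic number and the dimensions together, here is a minimal Python sketch of the whole recovery (assuming a repeating 10-byte key and a hypothetical filename encrypted.gif):

    import struct

    with open("encrypted.gif", "rb") as f:
        ct = f.read()

    key = bytearray(10)
    for i, p in enumerate(b"GIF89a"):          # key bytes 0-5 from the magic number
        key[i] = ct[i] ^ p
    dims = struct.pack("<HH", 640, 960)        # width, height as little-endian uint16
    for i, p in enumerate(dims):               # key bytes 6-9 from the screen descriptor
        key[6 + i] = ct[6 + i] ^ p

    pt = bytes(c ^ key[i % 10] for i, c in enumerate(ct))
    assert pt.startswith(b"GIF89a") and pt[-1] == 0x3B   # quick sanity check
    with open("decrypted.gif", "wb") as f:
        f.write(pt)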
I have a database-driven web application where the primary keys of all data rows are obfuscated as follows: SHA256(content type + primary key + secret), truncated to the first 8 characters. The content type is a simple word, e.g. "post" or "message" and the secret is a 20-30 char ASCII constant. The result is stored in a separate indexed column for fast DB lookup.
How do I calculate the probability of a hash collision in this scenario? I am not a mathematician at all, but a friend claimed that due to the Birthday Paradox the collision probability would be ~1% for 10,000 rows with an 8-char truncation. Is there any truth to this claim?
Yes, there is a real collision probability & it's probably too high. The exact probability depends on what "8 characters" means.
Does "8 characters" mean:
A) You store 8 hex characters of the hash? That would store 32 bits.
B) You store 8 characters of BASE-64? That would store 48 bits.
C) You store 8 bytes, encoded in some single-byte charset, or hacked in some broken way into a character encoding? That would store 56-64 bits, but if you don't do the encoding right you'll run into character conversion problems.
D) You store 8 bytes, as bytes? That genuinely stores 64 bits of the hash.
Storing binary data as either A) hex or D) raw bytes would be my preferred options. But I'd definitely recommend either reconsidering your "key obfuscation" scheme or significantly expanding the stored key size to reduce the (currently excessive) probability of key collision.
From Wikipedia:
https://en.wikipedia.org/wiki/Birthday_problem#Probability_table
The birthday problem in this more generic sense applies to hash functions: the expected number of N-bit hashes that can be generated before getting a collision is not 2^N, but rather only 2^(N/2).
Under the most conservative reading of your design above (option A: 8 hex chars == 32 bits), your scheme would be expected to suffer collisions once it stores on the order of ~65,000 (2^16) rows. I would consider such an outcome unacceptable for any serious system, and even for toy ones.
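To put numbers on this for the scenario in the question, the standard birthday approximation p ≈ 1 - e^(-n(n-1)/2^(b+1)) for n items hashed into b bits can be checked in a couple of lines of Python:

    import math

    def collision_probability(n: int, bits: int) -> float:
        return 1.0 - math.exp(-n * (n - 1) / 2**(bits + 1))

    for bits in (32, 48, 64):    # 8 hex chars, 8 base64 chars, 8 raw bytes
        print(bits, collision_probability(10_000, bits))
    # ~1.2% at 32 bits, ~0.00002% at 48 bits, ~3e-12 at 64 bits

So your friend's ~1% estimate for 10,000 rows is about right if "8 characters" means 8 hex characters.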
Transaction tables may see volumes of 1,000 - 100,000 transactions/day (or more), allowing for business growth. Systems should be designed to function for 100 years (36,500 days), with a 10x growth factor built in, so..
For your keying mechanism to be genuinely robust & professionally useful, you would need to be able to scale it up to potentially handle ~36 billion (2^35) rows without collision. That would imply 70+ bits of hash.
The source-control system Git, for example, stores 160 bits of SHA-1 hash (40 chars of hex == 20 bytes or 160 bits). Collisions would not be expected to become probable with fewer than 2^80 different file revisions stored.
A possibly better design might be, rather than hashing & pseudo-randomizing the key entirely & hoping (against hope) to avoid collisions, to prepend/append/fold in 8-10 bits of a hash into the key.
This would generate a larger key, containing all the uniqueness of the original key plus 8-10 bits of verification. Attempts to access keys would then be verified, and more than 3 invalid requests would be treated as an attempt to violate security by "probing" the keyspace & would trigger semi-permanent lockout.
The only major cost here would be a modest reduction in the size of the available keyspace for a given int size. A 32-bit int to/from the browser would have 8-10 bits dedicated to security, thus leaving 22-24 for the actual key. So you'd use 64-bit ints where that was not sufficient.
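A minimal Python sketch of that verify-bits idea (the SECRET value, the 8-bit check size, and the helper names are all assumptions for illustration):

    import hashlib

    SECRET = b"server-side-secret"

    def public_id(content_type: str, pk: int) -> int:
        digest = hashlib.sha256(f"{content_type}{pk}".encode() + SECRET).digest()
        return (pk << 8) | digest[0]          # original key plus 8 verification bits

    def resolve(content_type: str, public: int):
        pk, check = public >> 8, public & 0xFF
        digest = hashlib.sha256(f"{content_type}{pk}".encode() + SECRET).digest()
        return pk if digest[0] == check else None   # None -> treat as keyspace probing

Uniqueness comes entirely from the original primary key, so collisions are impossible by construction; the hash bits only make blind guessing expensive.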
Original problem:
What is the right column format for a unix timestamp?
The net is full of confusion: some posts claim SQLite has no unsigned types - either whatsoever, or with the exception of the 64-bit int type (but there are (counter-)examples that invoke UNSIGNED INTEGER). The data types page mentions it only in a bigint example. It also claims there is a 6-byte integer but doesn't give a name for it. It seems my attempts with INTEGER being 4-byte signed store unix timestamps as negative numbers. I've heard that some systems return 64-bit timestamps too. OTOH I'm not too fond of wasting 4 bytes to store 1 extra bit (the top bit of the timestamp), and even if I have to pick a bigger data format, I'd rather go for the 6-byte one. I've even seen a post that claims the SQLite unix timestamp is of type REAL...
Complete problem:
Could someone please clarify that mess?
The size of an integer
All columns in SQLite databases are internally variable-width. The file format stores integers in 1, 2, 3, 4, 6, or 8 bytes, depending on how big the number is, plus one byte in the header to indicate the size. So, in total, Unix dates stored as integers will take up 5 bytes until 2038-01-19 and 7 bytes after that.
From the point of view of the user of the C API, all integers are signed 64-bit.
The column type
It doesn't matter whether you declare your column as INTEGER, UNSIGNED INTEGER, BIGINT, or whatever. Anything with "INT" in it has integer affinity. And, as mentioned above, all integers are signed 64-bit but not usually stored that way.
SQLite does not have unsigned types. That's directly from the main author, as well as the docs. Moreover, it doesn't have fixed column widths for integers; the actual on-disk width is an implementation detail.
SQLite has no date or time datatype. However, it has date functions that can operate on ISO8601 strings (TEXT), Julian day numbers (REAL), and Unix timestamps (INTEGER).
So if you decide to make your time field a Unix timestamp, know that it can store up to 64-bit signed integers, but values you store now should actually occupy 32 bits on disk, even if the source value is a 64-bit time_t.
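A quick illustration with Python's built-in sqlite3 module, showing that the declared type only sets integer affinity and that SQLite's date functions can read the stored Unix timestamp directly:

    import sqlite3, time

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER)")
    con.execute("INSERT INTO events (created_at) VALUES (?)", (int(time.time()),))
    row = con.execute(
        "SELECT created_at, datetime(created_at, 'unixepoch') FROM events"
    ).fetchone()
    print(row)    # (current Unix timestamp, its human-readable UTC rendering)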
My preference would be for a 64-bit integer. The classic scheme of a signed 32-bit count of seconds since 1970-01-01 runs out in 2038. See http://en.wikipedia.org/wiki/Unix_time and http://en.wikipedia.org/wiki/Year_2038_problem . With a 64-bit integer, you're safe.
Could you give an example of what you mean by "It seems my attempts with INTEGER being 4-byte signed store unix timestamps as negative numbers"?
If you haven't already I'd suggest reading SQLite docs on datatypes (section 1.2 Date and Time Datatype) and date and time functions.
If you're on an embedded system where the memory situation is critical, you can consider dropping precision by shifting the 64-bit value several bits (resulting in a precision of 2, 4, 8... seconds instead of 1 sec) and using a 32-bit value to store it.
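For example, a rough Python sketch of that trade-off (dropping the low 2 bits gives 4-second resolution):

    import time

    stored = int(time.time()) >> 2    # fits comfortably in 32 bits for centuries
    restored = stored << 2            # accurate to within 4 seconds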
What is the minimum number of bits needed to represent a single character of encrypted text?
E.g., if I wanted to encrypt the letter 'a', how many bits would I require? (Assume there are many singly encrypted characters using the same key.)
Am I right in thinking that it would be the size of the key, e.g. 256 bits?
Though the question is somewhat fuzzy, the answer would first of all depend on whether you use a stream cipher or a block cipher.
With a stream cipher, you get the same number of bits out that you put in, so the binary logarithm of your input alphabet size would make sense. A block cipher requires input blocks of a fixed size, so you might pad your 'a' with zeroes and encrypt that, effectively having the block size as a minimum, like you already proposed.
I'm afraid all the answers you've had so far are quite wrong! It seems I can't reply to them, but do ask if you need more information on why they are wrong. Here is the correct answer:
About 80 bits.
You need a few bits for the "nonce" (sometimes called the IV). When you encrypt, you combine key, plaintext and nonce to produce the ciphertext, and you must never use the same nonce twice. So how big the nonce needs to be depends on how often you plan on using the same key; if you won't be using the key more than 256 times, you can use an 8 bit nonce. Note that it's only the encrypting side that needs to ensure it doesn't use a nonce twice; the decrypting side only needs to care if it cares about preventing replay attacks.
You need 8 bits for the payload, since that's how many bits of plaintext you have.
Finally, you need about 64 bits for the authentication tag. At this length, an attacker has to try about 2^63 bogus messages on average before they get one accepted by the remote end. Do not think that you can do without the authentication tag; it is essential for the security of the whole mode.
Put these together using AES in an authenticated mode such as EAX or GCM, and you get about 80 bits of ciphertext.
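For comparison, off-the-shelf parameters are larger than this hand-tuned minimum. A quick check with AES-GCM from the Python cryptography package, using the recommended 96-bit nonce (the tag in that API is always 128 bits):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    nonce = os.urandom(12)                        # 96-bit nonce
    ct = AESGCM(key).encrypt(nonce, b"a", None)   # ciphertext with the tag appended

    print(len(ct) * 8)                 # 136: 8-bit payload + 128-bit tag
    print((len(nonce) + len(ct)) * 8)  # 232 bits once the nonce is included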
The key size isn't a consideration.
You can have the same number of bits as the plaintext if you use a one-time pad.
This is hard to answer. You should definitely first read up on some fundamentals. You can 'encrypt' an 'a' with a single bit (Huffman encoding-style), and of course you could use more bits too. A number like 256 bits without any context is meaningless.
Here's something to get you started:
Information Theory -- esp. check out Shannon's seminal paper
One Time Pad -- famously secure, but impractical, encryption scheme
Huffman encoding -- not encryption, but demonstrates the above point