I am working on a project that uses a 32-bit unsigned integer to store timestamp values. When the project was deployed in the field, the counter overflowed after about 49 days.
We have decided to change the 32-bit unsigned integer to a 64-bit unsigned integer. But simply replacing the 32-bit integer with a 64-bit integer will not work, as the counter variable is used in so many places that we cannot be sure the replacements will actually work. Moreover, wherever this counter value is passed to a function as a parameter or returned from a function, the function needs to be modified as well. This is tedious.
Is there a smooth way to migrate with minimum fuss, so that there is no functional difference between the original project and the modified project?
I am expecting a Linux tool or a compiler shortcut to solve this.
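(Not from the original thread, just a sketch of the kind of "compiler shortcut" being asked about: if every declaration can be routed through a single type alias, the 32-bit to 64-bit switch becomes a one-line change, and warnings such as -Wconversion can help flag call sites that still truncate. Names here are illustrative only.)

#include <cstdint>

// Hypothetical sketch: centralize the counter type behind one alias so that
// widening it is a single edit.
using timestamp_t = std::uint64_t;   // was: using timestamp_t = std::uint32_t;

// Functions written against timestamp_t pick up the wider type automatically
// once the alias changes; no per-function edits are needed.
timestamp_t elapsed_ms(timestamp_t now, timestamp_t start) {
    return now - start;
}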
I have noticed int64 precision loss in my project's Firebase Realtime Database:
When I add a new child or edit the child value in the browser;
When I add a new child or edit the child value via my (C++) code: as SetValue(int64_t) or even as SetValue(firebase::Variant::kTypeInt64);
The precision loss starts after 53 bits:
// 9007199254740991 <- I set 53 bits value.."11111111111111111111111111111111111111111111111111111";
// 9007199254740991 -> it records correctly;
// 18014398509481983 <- I set 54 bits value."111111111111111111111111111111111111111111111111111111";
// 18014398509481984 -> it records as......"1000000000000000000000000000000000000000000000000000000";
// seems it is declared as int64_t but saved as a float?
Can someone reproduce it?
Is it a bug or a feature?
Any time you work with data in the Firebase console, it's going to be subject to the limits of JavaScript. JS always stores numbers as 64-bit floating point values (not integers), whose significand holds only 53 bits, so if you're dealing with 64-bit integers, some of that precision gets lost when the data passes through JS.
Try ignoring what you see in the console, and only perform reads and writes with code that handles 64-bit integers correctly. If that doesn't work, then there is something inside Realtime Database that imposes a similar limitation. The documentation doesn't make any claims about the precision of numbers, as far as I can see. However, Firestore (the successor to Realtime Database) does claim that 64-bit integers are OK. So if this is important to you, you might want to switch (though its console still has problems displaying those numbers, due to the same JS limits).
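A quick way to see the 53-bit boundary from the question is to round-trip the same values through a C++ double, which has the same 53-bit significand as a JS number (a minimal standalone check, not Firebase code):

#include <cstdint>
#include <cstdio>

int main() {
    std::int64_t ok  = (1LL << 53) - 1;   //  9007199254740991, 53 one-bits
    std::int64_t bad = (1LL << 54) - 1;   // 18014398509481983, 54 one-bits

    // A double's 53-bit significand holds the first value exactly,
    // but the second one rounds up to 2^54 = 18014398509481984.
    std::printf("%lld -> %.0f\n", static_cast<long long>(ok),  static_cast<double>(ok));
    std::printf("%lld -> %.0f\n", static_cast<long long>(bad), static_cast<double>(bad));
    return 0;
}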
DynamoDB's Number type supports 38 digits of decimal precision. This is not big enough to store a 128-bit integer which would require 39 digits. The max value is 340,282,366,920,938,463,463,374,607,431,768,211,455 for unsigned 128-bit ints or 170,141,183,460,469,231,731,687,303,715,884,105,727 for signed 128-bit ints. These are both 39-digit numbers.
If I can't store 128 bits, then how many bits of integer data can I store in a Number?
A DynamoDB attribute of type Number can store 126-bit integers (or 127-bit unsigned integers, with serious caveats).
According to Amazon's documentation:
Numbers can have up to 38 digits precision. Exceeding this results in an exception.
This means (verified by testing in the AWS console) that the largest positive integer and the smallest negative integer, respectively, that DynamoDB can store in a Number attribute are:
99,999,999,999,999,999,999,999,999,999,999,999,999 (aka 10^38-1)
-99,999,999,999,999,999,999,999,999,999,999,999,999 (aka -10^38+1)
These numbers require 126 bits of storage, using this formula:
bits = floor (ln(number) / ln (2))
= floor (87.498 / 0.693)
= floor (126.259)
= 126
So you can safely store a 126-bit signed int in a DynamoDB Number attribute.
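The calculation above is easy to reproduce; this small sketch just re-derives the 126 figure from the formula (using 38 * ln(10) for ln(10^38 - 1), which is identical at double precision):

#include <cmath>
#include <cstdio>

int main() {
    double ln_number = 38.0 * std::log(10.0);          // ~87.498
    double ratio     = ln_number / std::log(2.0);      // ~126.26
    std::printf("bits = %.0f\n", std::floor(ratio));   // prints 126
    return 0;
}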
If you want to live dangerously, you can store a 127-bit unsigned int too, but there are some caveats:
You'd need to avoid (or at least be very careful) using such a number as a sort key, because values with a most-significant-bit of 1 will sort as negative numbers.
Your app will need to convert unsigned ints to signed ints when storing them or querying for them in DynamoDB, and will also need to convert them back to unsigned after reading data from DynamoDB.
If it were me, I wouldn't take these risks for one extra bit without a very, very good reason.
One logical question is whether 126 bits (or 127, given the caveats above) is enough to store a UUID. The answer is: it depends. If you are in control of the UUID generation, then you can always shave a bit or two from the UUID and store the rest. If you shave from the 4 "version" bits (see the format in RFC 4122), then you may not be losing any entropy at all if you always generate UUIDs with the same version.
However, if someone else is generating those UUIDs AND is expecting lossless storage, then you may not be able to use a Number to store the UUID. But you may be able to store it if you restrict clients to a whitelist of 4-8 UUID versions. The largest version now is 5 out of a 0-15 range, and some of the older versions are discouraged for privacy reasons, so this limitation may be reasonable depending on your clients and whether they adhere to the version bits as defined in RFC 4122.
BTW, I was surprised that this bit-limit question wasn't already online... at least not in an easily-Google-able place. So contributing this Q&A pair so future searchers can find it.
I'm looking for what I'll call a 'binary serializer/deserializer code generator' for lack of a better term that specifically allows you to specify the on-the-wire format with arbitrary bit lengths and then generates the necessary C/C++ code to pack/unpack packets in that format. I started down the path of using a struct with bit fields but after reading this post I'm wondering if there's already something out there that handles all the messy problems. An example data structure I would need to deal with:
struct header {
    unsigned int val1 : 8;
    unsigned int val2 : 24;
    unsigned int val3 : 16;
    unsigned int val4 : 2;
    unsigned int val5 : 3;
    unsigned int val6 : 1;
    unsigned int val7 : 10;
};
The motivation for keeping the fields of the data structure like that is that it makes the programmer's job easier: they can set/get fields based on what they represent in the protocol, e.g. val5 might be a meaningful 3-bit flag. Yes, I could just use two 32-bit values for the whole struct and juggle bit masks to keep track of everything, but why?
I'm aware of things like Google Protocol Buffers and the like, but AFAIK these all focus on the programmer-side data structure and don't allow you to specify specific bit patterns - imagine trying to create the client code for low-level protocols where the binary wire format is how the protocol is specified. The closest thing I've found is protlr, which sounds great except it doesn't appear to be FOSS. Other posts on SO point to:
RedBlocks, which appears to be part of a full-blown embedded framework.
PADS, which seems extremely stale and overly complicated for my needs.
binpac, which sounds interesting, but I can't find an example of using it to parse arbitrary bit lengths (e.g. 1-bit, 2-bit, 17-bit fields), or whether it also has a serialization method, since it seems to be focused on one-way deserialization for intrusion detection.
Is there a FOSS alternative that meets my criteria besides rolling yet another serialization format, or can someone provide an example using one of these references for the structure above?
You might consider ASN.1 for this and use PER (aligned or unaligned). You can use either BIT STRING types constrained to the lengths you need, or INTEGER types with value constraints that limit them to the number of bits you would like. Since ASN.1 and its encoding rules are independent of machine architecture and programming language, you don't have to worry about whether your machine is big-endian or little-endian, or whether one end of the communication prefers Java rather than C or C++; a good ASN.1 tool handles all of that for you. You can find out more at the ASN.1 Project page, which has a link to an Introduction to ASN.1 as well as a list of ASN.1 tools (some free, some commercial). The reason I mention UNALIGNED PER is that it sends exactly the number of bits you specify across the line, with no padding bits added in between.
For BIT STRINGs, you can even assign names to individual bits that have meaning for your application.
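As a rough sketch (the exact value ranges are assumptions about the protocol, and the field names are reused from the question), the header above could be described like this, with UNALIGNED PER encoding each constrained INTEGER in exactly the declared number of bits:

Header ::= SEQUENCE {
    val1  INTEGER (0..255),        -- 8 bits in UNALIGNED PER
    val2  INTEGER (0..16777215),   -- 24 bits
    val3  INTEGER (0..65535),      -- 16 bits
    val4  INTEGER (0..3),          -- 2 bits
    val5  INTEGER (0..7),          -- 3 bits
    val6  BOOLEAN,                 -- 1 bit
    val7  INTEGER (0..1023)        -- 10 bits
}

A typical ASN.1 compiler then generates the C or C++ encode/decode code for this type, so the 64-bit wire layout never has to be hand-packed.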
I assume an implementation in a language that allows you to treat pointers as integers, including doing the standard arithmetic on them. If this is unrealistic due to hardware constraints, please let me know. If programming languages normally don't have such powerful pointer arithmetic, but it is feasible in practice, then I still want to know if this approach to implementing BigInt is feasible.
I don't really have any experience with low-level programming - as in, programming with pointers - so many of my assumptions in the following might be wrong.
Question: As far as I know, a BigInt - an arbitrary precision/size integer - might be implemented as a dynamic array of integers, which grows as needed. The data structure might then be represented as a pointer to that array of integers. But, assuming that the pointer is an integer just like the ones in the array, and that one can do pointer arithmetic on it, is it feasible to represent the value of the BigInt using the pointer itself? Then one could avoid the indirection for small integer values.
Since the pointer could be either a real pointer to the memory address of an array of integers or a plain integer value, you would need some way of knowing which of the two it is at any point in the BigInt's life cycle. Suppose you do this by setting the most significant bit to 1 if the pointer is really a pointer, and 0 otherwise. Treating it as an integer then seems simple enough: check whether that bit is set to 1 before doing anything with it. If it is not, do the arithmetic directly, check whether it has overflowed, and handle that case appropriately.
But this has a problem: does the pointer use its full range to point to memory addresses? If so, then there doesn't seem to be any way of using any bit-pattern to distinguish integers from pointers: every bit-pattern is a potential memory address. It seems reasonable that pointers are represented as signed integers, though to my mind they might also be represented as unsigned integers if that makes the implementation simpler.
So, if pointers are signed, then you don't seem to be able to use pointers as integers for this purpose. If so, is it feasible (that is, efficient compared to the alternatives) to represent the BigInt as a struct (or record, if you want) with two members: a pointer to an array and an integer that is used when the value of the BigInt is small? If the pointer to the array is null, the integer is used; if not, use the pointer and ignore the integer in the struct. This makes for a more "bloated" data structure, but it might help avoid indirection sometimes, assuming that you don't need a pointer to this struct and can pass it around as a value.
Another question: Is this done in practice?
On 32-bit machines, the lower two bits of pointers are almost always 0 because allocations are 4-byte aligned. Similarly, on 64-bit machines, the lower three bits will be 0 thanks to 8-byte alignment.
You can use this fact to use the least-significant bit of the pointer to tag whether it's a number or not. One simple option would be to set the LSB to 1 if it's a number and to 0 if it's a pointer. When performing arithmetic computations, you first check the LSB to see whether you have a pointer or an integer. If it's 1, you arithmetically right-shift the number over one bit to get the real integer value, then use that value in the computation. If it's 0, you simply follow the pointer to the representation.
You could conceivably use the fact that you have 2 or 3 bits of space to encode more possible representations. For example, you could have the number be either an integer, a pointer to a fixed-sized buffer, or a pointer to a variable-sized buffer, using the free pointer bits to encode which case you happen to be in.
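A minimal sketch of that tagging scheme in C++ (the names and layout are mine, not from any particular BigInt library; it assumes heap blocks are at least 2-byte aligned and a two's-complement target for the arithmetic right shift):

#include <cassert>
#include <cstdint>

struct BigInt {
    std::uintptr_t bits;   // tagged word: LSB = 1 means inline integer, 0 means pointer
};

inline BigInt make_small(std::intptr_t value) {
    // Caller must ensure the value fits in (pointer width - 1) bits.
    return BigInt{ (static_cast<std::uintptr_t>(value) << 1) | 1u };
}

inline BigInt make_heap(std::uint64_t* digits) {
    auto raw = reinterpret_cast<std::uintptr_t>(digits);
    assert((raw & 1u) == 0 && "heap storage must be at least 2-byte aligned");
    return BigInt{ raw };
}

inline bool is_small(BigInt b) { return (b.bits & 1u) != 0; }

inline std::intptr_t small_value(BigInt b) {
    // Arithmetic right shift restores the signed value, as described above.
    return static_cast<std::intptr_t>(b.bits) >> 1;
}

inline std::uint64_t* heap_digits(BigInt b) {
    return reinterpret_cast<std::uint64_t*>(b.bits);
}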
Hope this helps!
Original problem:
What is the right column format for a unix timestamp?
The net is full of confusion: some posts claim SQLite has no unsigned types - either whatsoever, or with the exception of the 64-bit int type (but there are (counter-)examples that invoke UNSIGNED INTEGER). The data types page mentions it only in a bigint example. It also claims there is a 6-byte integer but doesn't give a name for it. My attempts with INTEGER (apparently a 4-byte signed type) seem to store unix timestamps as negative numbers. I've heard that some systems return 64-bit timestamps too. On the other hand, I'm not too fond of wasting 4 bytes to store 1 extra bit (the top bit of the timestamp), and even if I have to pick a bigger format, I'd rather go for the 6-byte one. I've even seen a post claiming that the SQLite unix timestamp is of type REAL...
Complete problem:
Could someone please clarify that mess?
The size of an integer
All columns in SQLite databases are internally variable-width. The file format stores integers in 1, 2, 3, 4, 6, or 8 bytes, depending on how big the number is, plus one byte in the header to indicate the size. So, in total, Unix dates stored as integers will take up 5 bytes until 2038-01-19 and 7 bytes after that.
From the point of view of the user of the C API, all integers are signed 64-bit.
The column type
It doesn't matter whether you declare your column as INTEGER, UNSIGNED INTEGER, BIGINT, or whatever. Anything with "INT" in it has integer affinity. And, as mentioned above, all integers are signed 64-bit but not usually stored that way.
SQLite does not have unsigned types. That's directly from the main author, as well as the docs. Moreover, it doesn't have fixed column widths for integers; the actual on-disk width is an implementation detail.
SQLite has no date or time datatype. However, it has date functions that can operate on ISO8601 strings (TEXT), Julian day numbers (REAL), and Unix timestamps (INTEGER).
So if you decide to make your time field a Unix timestamp, know that it can store up to 64-bit signed integers, but values you store now should actually occupy 32 bits on disk, even if the source value is a 64-bit time_t.
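For example (a minimal sketch using the public sqlite3 C API; the table and column names are made up), binding a current Unix timestamp always goes through the signed 64-bit interface even though the column is just declared INTEGER:

#include <sqlite3.h>
#include <cstdio>
#include <ctime>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open(":memory:", &db) != SQLITE_OK) return 1;

    sqlite3_exec(db, "CREATE TABLE events (ts INTEGER)", nullptr, nullptr, nullptr);

    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO events (ts) VALUES (?)", -1, &stmt, nullptr);
    // The C API only offers a signed 64-bit bind for integers.
    sqlite3_bind_int64(stmt, 1, static_cast<sqlite3_int64>(std::time(nullptr)));
    sqlite3_step(stmt);
    sqlite3_finalize(stmt);

    sqlite3_prepare_v2(db, "SELECT ts FROM events", -1, &stmt, nullptr);
    if (sqlite3_step(stmt) == SQLITE_ROW) {
        std::printf("stored timestamp: %lld\n",
                    static_cast<long long>(sqlite3_column_int64(stmt, 0)));
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}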
My preference would be for a 64-bit integer. The classic case of a signed 32-bit count of seconds since 1970-01-01 runs out in 2038. See http://en.wikipedia.org/wiki/Unix_time and http://en.wikipedia.org/wiki/Year_2038_problem. With a 64-bit integer, you're safe.
Could you give an example of what you mean by "my attempts with INTEGER seem to store unix timestamps as negative numbers"?
If you haven't already, I'd suggest reading the SQLite docs on datatypes (section 1.2, Date and Time Datatype) and the date and time functions.
If you're on an embedded system where the memory situation is critical, you can consider dropping precision by shifting the 64-bit value several bits (resulting in a precision of 2, 4, 8... seconds instead of 1 sec) and using a 32-bit value to store it.
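A small sketch of that trade-off (the function names are mine): dropping the low 3 bits gives 8-second resolution, so a 32-bit field covers about 2^35 seconds (roughly 1089 years) from the epoch instead of 2^32 (about 136 years).

#include <cstdint>

// Store a 64-bit Unix time in 32 bits at 8-second resolution.
inline std::uint32_t compress_timestamp(std::uint64_t seconds_since_epoch) {
    return static_cast<std::uint32_t>(seconds_since_epoch >> 3);
}

inline std::uint64_t expand_timestamp(std::uint32_t stored) {
    return static_cast<std::uint64_t>(stored) << 3;
}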