SQLite: Column format for unix timestamp; Integer types - sqlite

Original problem:
What is the right column format for a unix timestamp?
The net is full of confusion: some posts claim SQLite has no unsigned types - either whatsoever, or with the exception of the 64-bit int type (but there are (counter-)examples that invoke UNSIGNED INTEGER). The data types page mentions it only in a bigint example. It also claims there is a 6-byte integer but doesn't give a name for it. It seems my attempts with INTEGER (assuming it to be 4-byte signed) store unix timestamps as negative numbers. I've heard that some systems return 64-bit timestamps too. OTOH I'm not too fond of wasting 4 bytes to store 1 extra bit (the top bit of the timestamp), and even if I have to pick a bigger data format, I'd rather go for the 6-byte one. I've even seen a post that claims the SQLite unix timestamp is of type REAL...
Complete problem:
Could someone please clarify that mess?

The size of an integer
All columns in SQLite databases are internally variable-width. The file format stores integers in 1, 2, 3, 4, 6, or 8 bytes, depending on how big the number is, plus one byte in the header to indicate the size. So, in total, Unix dates stored as integers will take up 5 bytes until 2038-01-19 and 7 bytes after that.
From the point of view of the user of the C API, all integers are signed 64-bit.
The column type
It doesn't matter whether you declare your column as INTEGER, UNSIGNED INTEGER, BIGINT, or whatever. Anything with "INT" in it has integer affinity. And, as mentioned above, all integers are signed 64-bit but not usually stored that way.
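As a quick illustration (a minimal sketch using Python's built-in sqlite3 module; the table and column names are made up), differently-declared integer columns all end up with the same storage class:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b UNSIGNED INTEGER, c BIGINT)")
con.execute("INSERT INTO t VALUES (?, ?, ?)", (1700000000,) * 3)
# typeof() reports the storage class SQLite actually used for each value
print(con.execute("SELECT typeof(a), typeof(b), typeof(c) FROM t").fetchone())
# -> ('integer', 'integer', 'integer')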

SQLite does not have unsigned types. That's directly from the main author, as well as the docs. Moreover, it doesn't have fixed column widths for integers; the actual on-disk width is an implementation detail.
SQLite has no date or time datatype. However, it has date functions that can operate on ISO8601 strings (TEXT), Julian day numbers (REAL), and Unix timestamps (INTEGER).
So if you decide to make your time field a Unix timestamp, know that it can store up to 64-bit signed integers, but values you store now should actually occupy 32 bits on disk, even if the source value is a 64-bit time_t.
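To make that concrete, here's a small sketch (Python with the standard sqlite3 module; the events table is hypothetical) of storing Unix timestamps in an INTEGER column and formatting them with SQLite's date functions:

import sqlite3, time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ts INTEGER)")
con.execute("INSERT INTO events VALUES (strftime('%s', 'now'))")   # let SQLite supply the timestamp
con.execute("INSERT INTO events VALUES (?)", (int(time.time()),))  # or pass one in from the application
# The 'unixepoch' modifier tells the date functions to treat the integer as a Unix timestamp
for ts, formatted in con.execute("SELECT ts, datetime(ts, 'unixepoch') FROM events"):
    print(ts, formatted)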

My preference would be for a 64-bit integer. The classic case, a signed 32-bit count of seconds since 1970-01-01, runs out in 2038. See http://en.wikipedia.org/wiki/Unix_time and http://en.wikipedia.org/wiki/Year_2038_problem . With a 64-bit integer, you're safe.

Could you give an example of what you mean by "It seems my attempts with INTEGER (assuming it to be 4-byte signed) store unix timestamps as negative numbers"?
If you haven't already, I'd suggest reading the SQLite docs on datatypes (section 1.2, Date and Time Datatype) and on date and time functions.

If you're on an embedded system where the memory situation is critical, you can consider dropping precision by shifting the 64-bit value right by a few bits (resulting in a precision of 2, 4, 8... seconds instead of 1 second) and using a 32-bit value to store it.
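For example, a rough sketch of that trade-off (plain Python; the shift amount is arbitrary):

SHIFT = 3                        # keep 2**3 = 8-second resolution
ts = 1700000000                  # example 64-bit Unix timestamp

packed = ts >> SHIFT             # 8x smaller, so it stays inside 32 bits much longer
restored = packed << SHIFT       # at most 2**SHIFT - 1 = 7 seconds below the original

print(packed.bit_length(), ts - restored)   # -> 28 0 (28 bits used; the error here happens to be 0)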

Related

How many bits of integer data can be stored in a DynamoDB attribute of type Number?

DynamoDB's Number type supports 38 digits of decimal precision. This is not big enough to store a 128-bit integer which would require 39 digits. The max value is 340,282,366,920,938,463,463,374,607,431,768,211,455 for unsigned 128-bit ints or 170,141,183,460,469,231,731,687,303,715,884,105,727 for signed 128-bit ints. These are both 39-digit numbers.
If I can't store 128 bits, then how many bits of integer data can I store in a Number?
DynamoDB attribute of type Number can store 126-bit integers (or 127-bit unsigned integers, with serious caveats).
According to Amazon's documentation:
Numbers can have up to 38 digits precision. Exceeding this results in an exception.
This means (verified by testing in the AWS console) that the largest positive integer and smallest negative integer, respectively, that DynamoDB can store in a Number attribute are:
99,999,999,999,999,999,999,999,999,999,999,999,999 (aka 10^38-1)
-99,999,999,999,999,999,999,999,999,999,999,999,999 (aka -10^38+1)
These numbers require 126 bits of storage, using this formula:
bits = floor(ln(number) / ln(2))
     = floor(87.498 / 0.693)
     = floor(126.259)
     = 126
So you can safely store a 126-bit signed int in a DynamoDB Number.
If you want to live dangerously, you can store a 127-bit unsigned int too, but there are some caveats:
You'd need to avoid (or at least be very careful) using such a number as a sort key, because values with a most-significant-bit of 1 will sort as negative numbers.
Your app will need to convert unsigned ints to signed ints when storing them or querying for them in DynamoDB, and will also need to convert them back to unsigned after reading data from DynamoDB.
If it were me, I wouldn't take these risks for one extra bit without a very, very good reason.
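If you do go that route, the conversion in the second caveat amounts to reinterpreting the top bit as a sign bit. A sketch (Python; the helper names are mine, not part of any DynamoDB SDK):

BITS = 127

def to_signed(u):
    # Reinterpret a 127-bit unsigned int as two's-complement signed,
    # so its magnitude fits inside DynamoDB's 38-digit limit.
    assert 0 <= u < 1 << BITS
    return u - (1 << BITS) if u >= 1 << (BITS - 1) else u

def to_unsigned(s):
    # Undo the reinterpretation after reading the value back.
    return s + (1 << BITS) if s < 0 else s

u = (1 << BITS) - 1                      # largest 127-bit unsigned value
assert to_unsigned(to_signed(u)) == u    # round-trips losslessly
assert abs(to_signed(u)) < 10**38        # magnitude stays inside DynamoDB's range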
One logical question is whether 126 (or 127 given the caveats above) is good enough to store a UUID. The answer is: it depends. If you are in control of the UUID generation, then you can always shave a bit or two from the UUID and store it. If you shave from the 4 "version" bits (see format here) then you may not be losing any entropy at all if you are always generating UUIDs with the same version.
However, if someone else is generating those UUIDs AND is expecting lossless storage, then you may not be able to use a Number to store the UUID. But you may be able to store it if you restrict clients to a whitelist of 4-8 UUID versions. The largest version now is 5 out of a 0-15 range, and some of the older versions are discouraged for privacy reasons, so this limitation may be reasonable depending on your clients and whether they adhere to the version bits as defined in RFC 4122.
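As a sketch of the "shave the version bits" idea (Python; this packing scheme is my own illustration, not a standard), you can drop the 4-bit version nibble, store the remaining 124 bits as a Number, and put the nibble back on the way out, provided every UUID you accept uses the same known version:

import uuid

VERSION = 4   # only works if all stored UUIDs share this version

def uuid_to_number(u):
    assert u.version == VERSION
    high = u.int >> 80                    # 48 bits above the version nibble
    low = u.int & ((1 << 76) - 1)         # 76 bits below it
    return (high << 76) | low             # 124-bit value, version dropped

def number_to_uuid(n):
    high, low = n >> 76, n & ((1 << 76) - 1)
    return uuid.UUID(int=(high << 80) | (VERSION << 76) | low)

u = uuid.uuid4()
assert number_to_uuid(uuid_to_number(u)) == u   # lossless round trip for version-4 UUIDs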
BTW, I was surprised that this bit-limit question wasn't already online... at least not in an easily-Google-able place. So contributing this Q&A pair so future searchers can find it.

What is the Realm native integer size? Int vs Int8, Int16, Int32

TL;DR: I'm building a data set to share between iOS and Android. Should I tweak the integer sizes to match their actual ranges, or just make everything Integer and use Int in Swift and long in Java?
In a typical SQL database, storing a large number of 4-byte integers would take ~4x more space than the same number of 1-byte integers[1]. However, I read in this answer that integers are stored bit-packed, and in the Realm Java help that "The integer types byte, short, int, and long are all mapped to the same type (long actually) within Realm." So, reading between the lines, it seems that the on-disk storage will be the same regardless of which integer sub-type I use.
So, from a pure Realm / database perspective should I just use Int & long in Swift & Java respectively? (I.e. leaving aside language differences, like casting, in-memory size etc.)
If an integer field is indexed, does that make any difference to the type chosen?
PS: Many thanks to the Realm team for their great docs and good support here on SO!
[1] Yes, I know it's more complicated than that.
Your interpretation is correct: the same underlying storage type is used for all integer types in Realm, and that storage type adjusts the number of bits it uses per value based on the range of values stored. For example, if you only store values in the range 0-15 then each value will use fewer bits than if you store values in the range 0-65,535. Similarly, all indexes on integer properties use a common storage type.

Cassandra date as column with micro or nano level precision

I have a column family with a datetime value as the column name and an associated value. But since the date type has millisecond precision, I am limited to storing events recorded at milliseconds. How shall I store events recorded at micro or nano level? Also, I wish to store them in order and query columns between two datetimes.
Thanks.
Off the top of my head I don't know if nano time fits in 64 bits, but if it does you can use the BIGINT data type (in CQL3, Long in Thrift). If 64 bits isn't enough use VARINT (CQL3, not sure what the equivalent in Thrift is), which supports arbitrarily big numbers. For the rest of the requirements it will work exactly the same as if you were using TIMESTAMP or INT data type (except that you wouldn't get a Date or integer back, but a long or BigInteger -- or the equivalent in the language you're using).
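For what it's worth, a quick back-of-the-envelope check (plain Python) suggests nanoseconds since the epoch do fit in a signed 64-bit value for roughly another 250 years:

MAX_SIGNED_64 = 2**63 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = MAX_SIGNED_64 / 1_000_000_000 / SECONDS_PER_YEAR
print(1970 + years)   # ~2262, the year a signed 64-bit nanosecond timestamp overflows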

CoreData NSDate SQLite precision discrepancy

I've seen a few questions which float around this topic but nothing which quite matches.
I'm creating a Timestamp as part of a primary key. The timestamp is set with [NSDate date] and is stored in a SQLite store. When I look into the store the dates have full precision (up to 7 decimal places indicating 100 nanosecond precision).
I have services running on a server which I need to send the data to and retrieve the date from. The sending process serializes the data and in order to send the date with the required precision (stored as datetime2 in SQL Server) I use:
NSDateFormatter *dateFormat = [[NSDateFormatter alloc] init];
[dateFormat setDateFormat:@"yyyy-MM-dd'T'HH:mm:ss.SSSSSSS'Z'"];
NSString *stringFromDate = [dateFormat stringFromDate:dateTime];
and vice versa when retrieving the date from the server.
The problem is that the date being retrieved from the store is for some reason ONLY at millisecond precision. So if I have a Timestamp in the store as 341196557.808558, retrieve it into an NSDate, and then use the above code to generate a string, it reads as "2011-10-24'T'08:48:17.8090000".
This gets sent to the server which dutifully stores it as millisecond precision (because that's all it's getting). When I then retrieve the date, deserialize it and try and use a predicate fetch against the store it doesn't return the record because the dates aren't equal. Doing a < or > comparison won't work because it's a primary key... I need ==
I wouldn't mind dropping to millisecond precision on the timestamp prior to saving (if I could work out how) but it seems extremely odd to me that the original date can store and save microsecond precision but doesn't keep the same precision when retrieving the date from the store?
Would love any thoughts on this, or that gem one-liner which sorts this mess out...
While I do not know why the nanosecond precision is being dropped (that surprises me, and you should file a radar on it), what I can say is that:
Timestamps are a terrible primary key. If you can change that I would highly recommend it.
If you must use Timestamps then I suggest storing the -timeIntervalSinceReferenceDate in Core Data instead and then reconstruct the date whenever you need to send it to the server.
Update
Timestamps are a pain (as you are experiencing) and wasteful from a purely DB point of view since you never use all of them in order.
Depending on what you are trying to do, an incremented INT64 (not easy with Core Data) works well. If you are using the timestamp for more than just part of the unique key, then I would just keep it as an NSTimeInterval (double) as I suggested above. That will keep your precision and avoid messing with strings. Keep in mind that strings can be quite slow.
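If you do decide to simply drop to millisecond precision before saving, the rounding itself is one line; a sketch of the idea (shown here in Python, but the same arithmetic applies to an NSTimeInterval double):

t = 341196557.808558               # the stored interval from the question
t_ms = round(t * 1000.0) / 1000.0  # round to whole milliseconds before saving
print(t_ms)                        # 341196557.809 - matches what the server will hand back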

SQLite datatypes lengths?

I'm completely new to SQLite (as of about 5 minutes ago), but I do know the Oracle and MySQL backends somewhat.
The question: I'm trying to know the lengths of each of the datatypes supported by SQLite, such as the differences between a bigint and a smallint. I've searched across the SQLite documentation (it only talks about affinity - is that all that matters?), SO threads, Google... and found nothing.
My guess: I've just briefly reviewed the SQL92 specification, which talks about datatypes and their relations but not about their lengths, which is quite obvious I assume. Yet I've come across the Oracle and MySQL datatype specs, and the specified lengths are mostly identical for integers at least. Should I assume SQLite is using the same lengths?
Aside question: Have I missed something about the SQLite docs? Or have I missed something about SQL in general? Asking this because I can't really understand why the SQLite docs don't specify something as basic as the datatype lengths. It just doesn't make sense to me! Although I'm sure there is a simple command to discover the lengths... but why not write them in the docs?
Thank you!
SQLite is a bit odd when it comes to field types. You can store any type in any field (i.e. put a blob into an integer field). The way it works for integers is: it depends.
While your application may use a long (64 bits) to hold the value, SQLite stores each integer in the smallest of 1, 2, 3, 4, 6, or 8 bytes (signed, big-endian) that can hold it, and notes which size it used in the row header. So a value in the range -128..127 takes one byte of payload, anything up to +/-32767 takes two bytes, and so on. (The scheme of using 7 bits of each byte, with the 8th bit indicating whether another byte follows, is SQLite's "varint" encoding; it is used for rowids and record headers rather than for column values, and it is the one where negative values cost 9 bytes.)
In SQLite, datatypes don't have lengths; values have lengths. A column you define as TINYINT could hold a BLOB, or vice versa.
I'm completely new to the SQLite documentation, but I found it in less than 30 seconds.
Datatypes In SQLite Version 3
INTEGER. The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
With 8 bytes, the maximum value is 9223372036854775807 (2^63 - 1).
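If you want a feel for how a value's magnitude maps onto those sizes, here is a small sketch (Python; it ignores the dedicated zero-byte serial types SQLite uses for the literal values 0 and 1):

def sqlite_int_bytes(value):
    # Bytes the record format uses for an integer value, per the file-format docs.
    for bits, size in ((8, 1), (16, 2), (24, 3), (32, 4), (48, 6)):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return size
    return 8

print(sqlite_int_bytes(100), sqlite_int_bytes(1700000000), sqlite_int_bytes(2**40))
# -> 1 4 6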
