SQLite3: storing text and numbers in the same column

In the SQLite3 documentation you can find:
"SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with its container. The dynamic type system of SQLite is backwards compatible with the more common static type systems of other database engines in the sense that SQL statements that work on statically typed databases should work the same way in SQLite. However, the dynamic typing in SQLite allows it to do things which are not possible in traditional rigidly typed databases."
I have a column with different values: some are integers, some floats and some text. I use BLOB as the storage class and it works fine. But I'm a little suspicious of this solution. Is this the right way to store numbers and text in the same column?

TLDR; you are not doing it wrong, but I also think you are not doing what you think you are doing.
I don't think you are using a BLOB storage class, but a BLOB type affinity. In SQLite there are two concepts of datatype: storage class and type affinity.
Storage class is the way a given value is stored; it describes how bit patterns represent values. E.g. while numerically equal, the integer 1 and the floating point 1.0 are represented by very different bit patterns (at least in memory; curiously, "As an internal optimization, small floating point values with no fractional component and stored in columns with REAL affinity are written to disk as integers").
There are 5 storage classes: NULL, INTEGER, REAL, TEXT, and BLOB.
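You can check the storage class of any value with the typeof() function. A quick sketch in the sqlite3 shell (the literals are just examples):
sqlite> SELECT typeof(NULL), typeof(1), typeof(1.0), typeof('1'), typeof(x'01');
null|integer|real|text|blob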
Type affinity, on the other hand, is not a value-level but a column-level concept: type affinity is a characteristic of a given column. There are also 5 type affinities: NUMERIC, INTEGER, REAL, TEXT, and BLOB.
The first thing you might notice is that there is no 1-to-1 mapping between storage classes and affinities. That's because they are indeed almost completely unrelated. Consider the following:
sqlite> CREATE TABLE Test(A TEXT, B BLOB, C REAL);
sqlite> INSERT INTO Test VALUES ('text', 'also a text', 'another one');
sqlite> INSERT INTO Test VALUES (10, 10.1, 10.11111111111111111111111);
sqlite> SELECT A, TYPEOF(A), B, TYPEOF(B), C, TYPEOF(C) FROM Test;
text|text|also a text|text|another one|text
10|text|10.1|real|10.1111111111111|real
As you can see, I defined the columns with types TEXT, BLOB, and REAL. Those are the affinities. But I was able to insert values of different types into them, regardless of their affinities. E.g. column C now stores both 'another one', of storage class TEXT, and 10.1111111111111, of storage class REAL.
Normally you should not be too concerned about storage classes; as you can see from my example, SQLite just automatically and correctly inferred them from the data I inserted.
Type affinity is more important. That's what describes how your data is compared and sorted, and how its uniqueness is determined. E.g.
sqlite> CREATE Table T(A TEXT);
sqlite> INSERT INTO T VALUES ('1'), (2), (0.0);
sqlite> SELECT * FROM T ORDER BY A ASC;
0.0
1
2
sqlite> CREATE TABLE R(A REAL);
sqlite> INSERT INTO R VALUES ('1'), (2), (0.0);
sqlite> SELECT * FROM R ORDER BY A ASC;
0.0
1.0
2.0
sqlite> CREATE TABLE B(A BLOB);
sqlite> INSERT INTO B VALUES ('1'), (2), (0.0);
sqlite> SELECT * FROM B ORDER BY A ASC;
0.0
2
1
So for practical purposes, type affinity is more closely related to your intuitive idea of a datatype. Think about what your data means, and choose the affinity accordingly. If you store your values in a column of BLOB affinity, a bytewise comparison (memcmp) will be used on them. If this is what you want, go for it. But if you want a lexicographic comparison you need TEXT affinity, and if you want comparisons based on the numerical values your data represents, then e.g. REAL.
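If you do end up with mixed storage classes in a BLOB-affinity column and still want a numeric ordering, an explicit CAST in the query is one way out. A small sketch reusing table B from above (expected output, untested):
sqlite> SELECT A FROM B ORDER BY CAST(A AS REAL) ASC;
0.0
1
2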

Related

Why does SQLite store strings as text in BLOB columns, rather than bytes?

This behavior has me scratching my head: apparently, when you store a string into a BLOB column, it doesn't behave like bytes when you query it? And, weirder still, when you take a substring of a BLOB, you have to ask for a length of 2 to get a single byte?
sqlite> create table wtf (a BLOB);
sqlite> insert into wtf (a) values (NULL);
sqlite> insert into wtf (a) values ('a');
sqlite> insert into wtf (a) values (X'61');
sqlite> select * from wtf;
a
a
sqlite> select a = X'61' from wtf;
0
1
sqlite> select HEX(a) from wtf;
61
61
sqlite> select substr(a, 0, 1) from wtf;
sqlite> select substr(a, 0, 2) from wtf;
a
a
Why does SQLite store strings as TEXT in BLOB columns, rather than bytes?
(I'll disregard your imprecise language: consider that everything stored in a computer is "bytes")
SQLite does not enforce column types.
shock! and horror! (...yes)
From the docs (emphasis mine):
In SQLite, the datatype of a value is associated with the value itself, not with its column [...] Flexible typing is a feature of SQLite, not a bug.
Read more here: https://www.sqlite.org/flextypegood.html
When you run INSERT INTO table ( thisIsANumericColumn ) VALUES ( 'zzz' );, the SQLite engine is perfectly happy to store the TEXT string as-is. A later SELECT thisIsANumericColumn FROM table then leaves your SQLite library (or the application code which consumes SQLite's API) to perform implicit type conversions if required, which can break at runtime (so you'd get away with this in NodeJS or PHP, but not in .NET, due to how ADO.NET works).
There are at least 3 possible alternative solutions:
Add STRICT to your CREATE TABLE DDL. This instructs SQLite to respect column types, just like a traditional RDBMS.
i.e. CREATE TABLE tbl ( a BLOB NOT NULL ) STRICT;
You must be running SQLite 3.37 (dated 2021-11-27) or later to use STRICT tables.
Simply don't insert incorrectly-typed values in the first place.
Use explicit CHECK constraints to enforce data-type restrictions and other data integrity checks, like value ranges, string length, etc.
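As a minimal sketch of option 3, reusing the wtf table from the question (typeof() reports the storage class; the IS NULL branch keeps the column nullable):
CREATE TABLE wtf (
a BLOB CHECK (a IS NULL OR typeof(a) = 'blob')
);
-- INSERT INTO wtf (a) VALUES ('a');   -- now rejected: the string would be stored as text
-- INSERT INTO wtf (a) VALUES (X'61'); -- OK: stored as a blob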

sqlite3 binary data type

I am looking at migrating a small sqlite3 db to MySQL. I know MySQL but am new to sqlite3, so I have been reading about it online. I used pragma table_info(<table_name>) to get info about the table structure.
From the output I could understand columns with data type TEXT and INTEGER, but I do not understand the datatype BINARY(32). From the sqlite3 documentation on the net there is a BINARY collation, but there is no BINARY datatype. So I just want to understand this BINARY(32) datatype. Thanks.
SQLite is unusual with regard to datatypes (column types). You can store any type of data in any type of column, with the exception of the rowid column or an alias of the rowid column.
see Rowid Tables
rowid is similar to MySQL AUTO_INCREMENT, BUT beware of the differences.
In the example below see how the rowid starts from -100, then -99 .....
AUTOINCREMENT in SQLite is only a constraint of sorts: it enforces that a new rowid is higher than any rowid that has ever existed in the table, so deleted rowids are never reused.
So BINARY, BINARY(32) (or even rumplestiltskin) are all valid as the datatype when defining a column.
However, a column will be given a column affinity governed by the following rules :-
If the column type contains INT then the affinity is INTEGER.
If the column type contains CHAR, CLOB or TEXT, then its affinity is TEXT.
If the column type contains BLOB then its affinity is BLOB.
If the column type contains REAL, FLOA or DOUB then its affinity is REAL.
Otherwise the affinity is NUMERIC.
As such, BINARY(32) has NUMERIC affinity. However, the declared column type is of little consequence in regards to storing data; the affinity mainly affects how values are converted when they are stored and how they are compared.
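A quick way to see the effect of that NUMERIC affinity (a small sketch in the sqlite3 shell; the table and values are made up for illustration): well-formed numeric text is converted, other text stays text, and blobs stay blobs:
sqlite> CREATE TABLE bintest (b BINARY(32));
sqlite> INSERT INTO bintest VALUES ('abc'), ('123'), (x'0102');
sqlite> SELECT typeof(b) FROM bintest;
text
integer
blob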
In regard to converting, the rules mentioned above can be utilised, and you may also find the typeof function useful (an example of its use is in the example below, along with the results). However, neither will necessarily indicate how the data is subsequently used, which could well be a factor that needs consideration.
SQLite's flexibility with column types aids in converting from other relational databases BUT can be a bit of a hindrance when converting from SQLite.
Note this answer is by no means intended to be a comprehensive explanation of the conversion from SQLite to MySQL.
See Datatypes in SQLite
Here's an example that shows that any type can be stored in any column (thus any row/col combination can store different types) :-
DROP TABLE IF EXISTS example;
CREATE TABLE IF NOT EXISTS example (
rowid_alias_must_be_unique_integer INTEGER PRIMARY KEY, -- INTEGER PRIMARY KEY makes the column an alias of the rowid
col_text TEXT,
col_integer INTEGER,
col_real REAL,
col_BLOB BLOB,
col_anyother this_is_a_stupid_column_type
);
INSERT INTO example VALUES (-100,'MY TEXT', 340000,34.5678,x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',100);
INSERT INTO example (col_text,col_integer,col_real,col_blob,col_anyother) VALUES
('MY TEXT','MY TEXT','MY TEXT','MY TEXT','MY TEXT'),
(100,100,100,100,100),
(34.5678,34.5678,34.5678,34.5678,34.5678),
(x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff')
;
SELECT
*,
rowid,
typeof(rowid_alias_must_be_unique_integer),
typeof(col_text),
typeof(col_integer),
typeof(col_real),
typeof(col_blob),
typeof(col_anyother)
FROM example
;
DROP TABLE IF EXISTS example;
Running the above shows, via the typeof columns in the final SELECT, the storage class actually used for each value (note that different SQL tools handle blobs in different ways; Navicat was used to run the above).
Note that the typeof function returns the storage class as opposed to the affinity. However, the affinity can affect the storage class: e.g. if the affinity is TEXT then, with the exception of blobs, the value is stored as text (see rule 2 of the affinity rules above and Datatypes In SQLite).

Does SQLite actually support DATE type?

After reading https://sqlite.org/datatype3.html which states
"SQLite does not have a storage class set aside for storing dates
and/or times."
but then being able to run this:
CREATE TABLE User (ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, BORN_ON DATE NULL)
and see the column listed with type DATE in "DB Browser for SQLite", I start to wonder whether SQLite really does support a Date type or is just "faking" the support using other types. And even if so, why does DB Browser see it as a Date? Is any meta info stored inside the DB?
SQLite does not fake Date with Numerics.
There is no Date data type in SQLite.
In Datatypes In SQLite Version 3 it is explained clearly that:
SQLite uses a more general dynamic type system
Instead of data types there are 5 Storage Classes: NULL, INTEGER, REAL, TEXT and BLOB.
Also:
Any column in an SQLite version 3 database, except an INTEGER PRIMARY
KEY column, may be used to store a value of any storage class.
So when you use Date as the data type of a column in the CREATE TABLE statement, you are not restricted to storing only date-like values in it. Actually, you can store anything in that column.
Tools like "DB Browser for SQLite" and others may offer various data types to select from to define a column when you create the table.
The selection of the data type that you make is not restrictive, but it is rather indicative of what type of data you want to store in a column.
In fact, you can create a table without even declaring the data types of the columns:
CREATE TABLE tablename(col1, col2)
or use fictional data types:
CREATE TABLE tablename(col1 somedatatype, col2 otherdatatype)
and insert values of any data type:
INSERT INTO tablename(col1, col2) VALUES
(1, 'abc'),
('XYZ', '2021-01-06'),
(null, 3.5)
Based on what Colonel Thirty Two suggested (read more on the linked page), it seems that when you declare a column as Date its affinity will be NUMERIC.
So SQLite "fakes" Date with Numerics.
And even if so why the DB Browser see it as a Date? Any meta info stored inside the DB?
Yes, it simply stores the type name used when the column was created. The linked page calls it "declared type". In this case you get NUMERIC affinity (DATE is even given as one of the examples in 3.1.1) and it behaves like any other column with this affinity:
A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, the storage class of the text is converted to INTEGER or REAL (in order of preference) if the text is a well-formed integer or real literal, respectively. If the TEXT value is a well-formed integer literal that is too large to fit in a 64-bit signed integer, it is converted to REAL. For conversions between TEXT and REAL storage classes, only the first 15 significant decimal digits of the number are preserved. If the TEXT value is not a well-formed integer or real literal, then the value is stored as TEXT. For the purposes of this paragraph, hexadecimal integer literals are not considered well-formed and are stored as TEXT. (This is done for historical compatibility with versions of SQLite prior to version 3.8.6 2014-08-15 where hexadecimal integer literals were first introduced into SQLite.) If a floating point value that can be represented exactly as an integer is inserted into a column with NUMERIC affinity, the value is converted into an integer. No attempt is made to convert NULL or BLOB values.
A string might look like a floating-point literal with a decimal point and/or exponent notation but as long as the value can be expressed as an integer, the NUMERIC affinity will convert it into an integer. Hence, the string '3.0e+5' is stored in a column with NUMERIC affinity as the integer 300000, not as the floating point value 300000.0.
So if you insert dates that look like e.g. "2021-01-05", they will be stored as strings. But:
you can also insert strings which don't look like dates;
if you insert "20210105" it will be stored as the number 20210105.
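A small sketch of that behaviour (hypothetical table, sqlite3 shell): the declared DATE type gives NUMERIC affinity, so a date-like string stays text while a digits-only string becomes an integer:
sqlite> CREATE TABLE d(v DATE);
sqlite> INSERT INTO d VALUES ('2021-01-05'), ('20210105'), ('not a date');
sqlite> SELECT v, typeof(v) FROM d;
2021-01-05|text
20210105|integer
not a date|text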
You can use CHECK constraints to prevent inserting non-date strings.
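For example, a sketch of such a constraint using the date(X) IS X idiom (it accepts NULL and canonical YYYY-MM-DD strings and rejects everything else; adjust to your needs):
CREATE TABLE User (
ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
BORN_ON DATE NULL CHECK (BORN_ON IS date(BORN_ON)) -- NULL still passes, since date(NULL) IS NULL
);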
See also https://sqlite.org/lang_datefunc.html which says what (string and number) formats date/time functions expect.

SQLite is very slow when performing .import on a large table

I'm running the following:
.mode tabs
CREATE TABLE mytable(mytextkey TEXT PRIMARY KEY, field1 INTEGER, field2 REAL);
.import mytable.tsv mytable
mytable.tsv is approx. 6 GB and 50 million rows. The process takes an extremely long time (hours) to run and it also completely throttles the performance of the entire system, I'm guessing because of temporary disk IO.
I don't understand why it takes so long and why it thrashes the disk so much, when I have plenty of free physical RAM it could use for temporary write.
How do I improve this process?
PS: Yes, I did search for previous questions and answers, but nothing I found helped.
In SQLite, a normal rowid table uses a 64-bit integer primary key. If you have a PK in the table definition that's anything but a single INTEGER column, it is instead treated as a unique index, and each row inserted has to update both the original table and that index, doubling the work (and in your case effectively doubling the storage requirements). If you instead make your table a WITHOUT ROWID one, the PK is a true PK and doesn't require an extra index table. That change alone should roughly halve both the time it takes to import your dataset and the size of the database. (If you have other indexes on the table, or use that PK as a foreign key in another table, it might not be worth making the change in the long run, as it'll increase the amount of space needed for those tables by potentially a lot given the lengths of your keys; in that case, see Schwern's answer.)
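In terms of the DDL from the question, that change would look something like this (a sketch, untested against your data):
CREATE TABLE mytable(mytextkey TEXT PRIMARY KEY, field1 INTEGER, field2 REAL) WITHOUT ROWID;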
Sorting the input on the key column first can also help on large imports, because there's less random access of b-tree pages and less moving of data within those pages. Everything goes into the same page until it fills up, then a new one is allocated and any needed rebalancing is done.
You can also turn on some unsafe settings that aren't recommended in normal usage because they can result in data loss or outright corruption, but if that happens during the import because of a freak power outage or whatever, you can always just start over. In particular, set the synchronous mode and the journal type to OFF. That results in fewer disc writes over the course of the import.
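Concretely, that could look something like this before running the import (a sketch; only do this if you can re-run the import from scratch after a crash):
PRAGMA journal_mode = OFF;
PRAGMA synchronous = OFF;
.mode tabs
.import mytable.tsv mytable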
My assumption is the problem is the text primary key. This requires building a large and expensive text index.
The primary key is a long nucleotide sequence (anywhere from 20 to 300 characters), field1 is an integer (between 1 and 1500) and field2 is a relative log ratio (between -10 and +10, roughly).
Text primary keys have few advantages and many drawbacks.
They require large, slow indexes. Slow to build, slow to query, slow to insert.
Text is tempting to change, which is exactly what you don't want a primary key to do.
Any table referencing it also requires storing and indexing text adding to bloat.
Joins with this table will be slower due to the text primary key.
Consider what happens when you make a new table which references this one.
create table othertable(
myreference references mytable, -- this is text
something integer,
otherthing integer
);
othertable now must store a copy of the entire sequence, bloating the table. Instead of a simple integer it now has a text column, bloating the table further. And it must build its own text index, bloating the index and slowing down joins and inserts.
Instead, use a normal, integer, autoincrementing primary key and make the sequence column unique (which is also indexed). This provides all the benefits of a text primary key with none of the drawbacks.
create table sequences(
id integer primary key autoincrement,
sequence text not null unique,
field1 integer not null,
field2 real not null
);
Now references to sequences are a simple integer.
Because the SQLite import process is not very customizable, getting your data into this table in SQLite efficiently requires a couple steps.
First, import your data into a table which does not yet exist. Make sure it has header fields matching your desired column names.
$ cat test.tsv
sequence field1 field2
d34db33f 1 1.1
f00bar 5 5.5
somethings 9 9.9
sqlite> .import test.tsv import_sequences
As there's no indexing happening, this process should go pretty quickly. SQLite created a table called import_sequences with every column of type TEXT.
sqlite> .schema import_sequences
CREATE TABLE import_sequences(
"sequence" TEXT,
"field1" TEXT,
"field2" TEXT
);
sqlite> select * from import_sequences;
sequence field1 field2
---------- ---------- ----------
d34db33f 1 1.1
f00bar 5 5.5
somethings 9 9.9
Now we create the final production table.
sqlite> create table sequences(
...> id integer primary key autoincrement,
...> sequence text not null unique,
...> field1 integer not null,
...> field2 real not null
...> );
For efficiency, normally you'd add the unique constraint after the import, but SQLite has very limited ability to alter a table and cannot alter an existing column except to change its name.
Now transfer the data from the import table into sequences. The primary key will be automatically populated.
insert into sequences (sequence, field1, field2)
select sequence, field1, field2
from import_sequences;
Because the sequence column must be indexed, this might not import any faster, but it will result in a much better and more efficient schema going forward. If you want efficiency, consider a more robust database.
Once you've confirmed the data came over correctly, drop the import table.
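That cleanup is just:
DROP TABLE import_sequences;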
The following settings helped speed things up tremendously.
PRAGMA journal_mode = OFF
PRAGMA cache_size = 7500000
PRAGMA synchronous = 0
PRAGMA temp_store = 2

What is the difference between related SQLite data-types like INT, INTEGER, SMALLINT and TINYINT?

When creating a table in SQLite3, I get confused when confronted with all the possible datatypes which imply similar contents, so could anyone tell me the difference between the following data-types?
INT, INTEGER, SMALLINT, TINYINT
DEC, DECIMAL
LONGCHAR, LONGVARCHAR
DATETIME, SMALLDATETIME
Is there some documentation somewhere which lists the min./max. capacities of the various data-types? For example, I guess SMALLINT holds a larger maximum value than TINYINT, but a smaller one than INTEGER, but I have no idea what these capacities are.
SQLite, technically, has no fixed column data types; there are storage classes in a manifest (dynamic) typing system, and yeah, it's confusing if you're used to traditional RDBMSes. Internally, every value is stored using one of five storage classes (NULL, INTEGER, REAL, TEXT or BLOB), not according to its declared column type. Values are coerced/converted into a storage class based on affinities (the loose data types assigned to columns).
The best thing that I'd recommend you do is to:
Temporarily forget everything you used to know about standalone database datatypes
Read the Datatypes In SQLite 3 page on the SQLite site.
Take the types from your old schema and see what they'd map to in SQLite
Migrate all the data to the SQLite database.
Note: The datatype limitations can be cumbersome, especially if you add time durations, or dates, or things of that nature in SQL. SQLite has very few built-in functions for that sort of thing. However, SQLite does provide an easy way for you to make your own built-in functions for adding time durations and things of that nature, through the sqlite3_create_function library function. You would use that facility in place of traditional stored procedures.
The difference is syntactic sugar. Only a few substrings of the type names matter as far as the type affinity is concerned.
INT, INTEGER, SMALLINT, TINYINT → INTEGER affinity, because they all contain "INT".
LONGCHAR, LONGVARCHAR → TEXT affinity, because they contain "CHAR".
DEC, DECIMAL, DATETIME, SMALLDATETIME → NUMERIC, because they don't contain any of the substrings that matter.
The rules for determining affinity are listed at the SQLite site.
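A quick sketch to see those rules in action (hypothetical table, sqlite3 shell): TINYINT maps to INTEGER affinity, LONGVARCHAR to TEXT, DATETIME to NUMERIC, and the stored storage class then follows from the value and the affinity, not from any declared size:
sqlite> CREATE TABLE demo(a TINYINT, b LONGVARCHAR, c DATETIME);
sqlite> INSERT INTO demo VALUES (300, 42, '2.5');
sqlite> SELECT typeof(a), typeof(b), typeof(c) FROM demo;
integer|text|real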
If you insist on strict typing, you can implement it with CHECK constraints:
CREATE TABLE T (
N INTEGER CHECK(TYPEOF(N) = 'integer'),
Str TEXT CHECK(TYPEOF(Str) = 'text'),
Dt DATETIME CHECK(JULIANDAY(Dt) IS NOT NULL)
);
But I never bother with it.
As for the capacity of each type:
INTEGER is always signed 64-bit. Note that SQLite optimizes the storage of small integers behind-the-scenes, so TINYINT wouldn't be useful anyway.
REAL is always 64-bit (double).
TEXT and BLOB have a maximum size determined by a preprocessor macro, which defaults to 1,000,000,000 bytes.
Most of those are there for compatibility. You really only have integer, float, text, and blob. Dates can be stored as either a number (unix time is integer, microsoft time is float) or as text.
NULL. The value is a NULL value.
INTEGER. The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
BLOB. The value is a blob of data, stored exactly as it was input.
As an addition to the answer from dan04: if you want to blindly insert a non-zero NUMERIC value supplied as TEXT, while ensuring the text is actually convertible to a numeric:
your_numeric_col NUMERIC CHECK(abs(your_numeric_col) <> 0)
A typical use case is a query from a program that treats all data as text (for uniformity & simplicity, since SQLite already does so). The nice thing about this is that it allows constructs like this:
INSERT INTO table (..., your_numeric_column, ...) VALUES (..., some_string, ...)
which is convenient when you're using placeholders, because you don't have to handle such non-zero numeric fields specially. An example using Python's sqlite3 module would be:
conn_or_cursor.execute(
    "INSERT INTO table VALUES (" + ",".join("?" * num_values) + ")",
    str_value_tuple)  # no need to convert values from str to int/float
In the above example, all values in str_value_tuple will be bound as strings when passed to SQLite. However, since we're not checking the type explicitly via TYPEOF but only convertibility to a numeric type, it will still work as desired (i.e., SQLite will either store the value as a numeric or fail otherwise).
