BLOB/TEXT column used in key specification without key length (1170)? [duplicate] - mariadb

Entries in my table are uniquely identified by a word that is 5-10 characters long, and I use TINYTEXT(10) for the column. However, when I try to set it as the PRIMARY KEY I get an error saying the key length is missing.
From my limited understanding of the docs, the size for PRIMARY keys can be used as a shortcut for detecting a unique value, i.e. when the first few characters (specified by the size) are enough to consider it a unique match. In my case the length would vary from 5 to 10 (the values are all latin1, so it is exactly one byte per character, plus 1 for the length). Two questions:
1. If I wanted to use TINYTEXT as the PRIMARY key, which size should I specify? The maximum available, 10 in this case? Or must the size be strictly exact? For example, if my key is a 6-character word but I specify a size of 10 for the PK, will it try to read all 10 bytes, fail, and throw an exception?
2. How bad, performance-wise, would it be to use [TINY]TEXT for the PK? All Google results lead me to opinions and statements like "it is BAD, you are fired", but is that really true in this case, considering TINYTEXT is 255 bytes max and I already limited the length to 10?

MySQL/MariaDB can index only a prefix of a text column, not the whole value if it is too large. The maximum key size is 3072 bytes, so any text longer than that cannot be indexed in full. With VARCHAR or CHAR the length to index can be taken directly from the declaration, because you set it explicitly when declaring the datatype. That is not the case with the *TEXT types - they carry no declared length - so for them you must specify explicitly how many characters to index. The solution is to create the primary key like this:
CREATE TABLE mytbl (
    name TEXT NOT NULL,
    PRIMARY KEY (name(255))
);
The same trick can be used if you need a primary key on a VARCHAR column longer than 3072 bytes, or on BINARY and BLOB columns. Bear in mind, though, that if two different texts share the same first 3072 bytes, the index treats them as equal - with a prefix primary key the second one is rejected as a duplicate. That may be a problem.
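To make that concrete, a hedged illustration (the table name and the deliberately tiny 3-character prefix are invented for this example):
CREATE TABLE demo (
    txt TEXT NOT NULL,
    PRIMARY KEY (txt(3))
);
INSERT INTO demo VALUES ('abcdef'); -- OK
INSERT INTO demo VALUES ('abcxyz'); -- rejected as a duplicate of 'abc', although the full values differ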
It is generally a bad idea to use a text field as a primary key. There are two reasons for that:
1. It takes much more processing time than using integers to search the table (WHERE, JOINs, etc.);
2. Any foreign key in another table must have the same datatype as the primary key. When you use text, this wastes disk space;
Note: the difference between *TEXT and VARCHAR is that the contents of *TEXT columns are not stored inside the row but in a separate storage area, with the row holding a pointer. Usually we do that when we need to store really large text.

You cannot specify a size for TINYTEXT. Use VARCHAR(size) instead.
SQL Data Types
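For the original question's 5-10 character latin1 words, a minimal sketch of that advice (table and column names are my own):
CREATE TABLE words (
    word VARCHAR(10) CHARACTER SET latin1 NOT NULL,
    PRIMARY KEY (word) -- no prefix length needed: VARCHAR carries an explicit maximum
);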

FYI, you can't specify a size for TINYTEXT in MySQL:
mysql> create table t1 ( t tinytext(10) );
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds
to your MySQL server version for the right syntax to use near '(10) )' at line 1
You can specify a length after TEXT, but it doesn't work the way you think it does. It means it will choose one of the family of TEXT types, the smallest type that supports at least the length you requested. But once it does that, it does not limit the length of input. It still accepts any data up to the maximum length of the type it chose.
mysql> create table t1 ( t text(10) );
Query OK, 0 rows affected (0.02 sec)
mysql> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`t` tinytext
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
mysql> insert into t1 set t = repeat('a', 255);
Query OK, 1 row affected (0.01 sec)
mysql> select length(t) from t1;
+-----------+
| length(t) |
+-----------+
|       255 |
+-----------+

Related

sqlite3 binary data type

I am looking at migrating a small sqlite3 db to MySQL. I know MySQL but I am new to sqlite3, so I have been reading about it online. I used pragma table_info(<table_name>) to get info about the table structure.
From the output I could understand columns with data type TEXT and INTEGER, but I do not understand the datatype BINARY(32). From the sqlite3 documentation on the net there is a BINARY collation, but there is no BINARY datatype. So I just want to understand this BINARY(32) datatype. Thanks.
SQLite is unusual regarding datatypes (column types). You can store any type of data in any type of column, with the exception of the rowid column or an alias of the rowid column.
see Rowid Tables
rowid is similar to MySQL AUTO_INCREMENT, BUT beware of the differences.
In the example below, see how the rowid starts from -100, then -99, and so on.
AUTOINCREMENT in SQLite is only a constraint, one that enforces that a new id is higher than any that has ever existed in the table.
So BINARY, BINARY(32) (even rumplestiltskin) are valid as the datatype when defining a column.
However, a column will be given a column affinity, governed by these rules :-
If the column type contains INT, then the affinity is INTEGER.
If the column type contains CHAR, CLOB or TEXT, then its affinity is TEXT.
If the column type contains BLOB, then its affinity is BLOB.
If the column type contains REAL, FLOA or DOUB, then its affinity is REAL.
Otherwise the affinity is NUMERIC.
As such, BINARY(32) has NUMERIC affinity. However, the column type is of little consequence in regard to storing data; the affinity can affect retrieval a little.
In regard to converting, the rules mentioned above can be utilised; you may also find the typeof function useful (an example of its use, along with the results, is in the example below). However, neither will necessarily indicate how the data is subsequently used, which could well be a factor that needs consideration.
SQLite's flexibility with column types aids in converting from other relational databases, BUT it can be a bit of a hindrance when converting from SQLite.
Note this answer is by no means intended to be a comprehensive explanation of the conversion from SQLite to MySQL.
See Datatypes in SQLite
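Applying those rules to the question's column type, a quick sketch of my own:
CREATE TABLE t (b BINARY(32)); -- matches none of the INT/CHAR/BLOB/REAL rules, so NUMERIC affinity
INSERT INTO t VALUES ('123'), ('abc'), (x'00ff');
SELECT typeof(b) FROM t; -- integer ('123' was coerced), text ('abc' cannot be), blob (stored as given)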
Here's an example that shows that any type can be stored in any column (thus any row/col combination can store different types) :-
DROP TABLE IF EXISTS example;
CREATE TABLE IF NOT EXISTS example (
    rowid_alias_must_be_unique_integer INTEGER PRIMARY KEY, -- INTEGER PRIMARY KEY makes the column an alias of the rowid
    col_text TEXT,
    col_integer INTEGER,
    col_real REAL,
    col_BLOB BLOB,
    col_anyother this_is_a_stupid_column_type
);
INSERT INTO example VALUES (-100,'MY TEXT', 340000,34.5678,x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',100);
INSERT INTO example (col_text,col_integer,col_real,col_blob,col_anyother) VALUES
('MY TEXT','MY TEXT','MY TEXT','MY TEXT','MY TEXT'),
(100,100,100,100,100),
(34.5678,34.5678,34.5678,34.5678,34.5678),
(x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff',x'f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff')
;
SELECT
    *,
    rowid,
    typeof(rowid_alias_must_be_unique_integer),
    typeof(col_text),
    typeof(col_integer),
    typeof(col_real),
    typeof(col_blob),
    typeof(col_anyother)
FROM example;
DROP TABLE IF EXISTS example;
Running the above produces a result grid (screenshot omitted here; note that different SQL tools handle blobs in different ways - Navicat was used to run the above) :-
Note that the typeof function returns the storage type, as opposed to the affinity. However, the affinity can affect the storage type:
e.g. if the affinity is TEXT then, with the exception of a blob, the value is stored as text (see rule 2 in Datatypes in SQLite above).

SQLite is very slow when performing .import on a large table

I'm running the following:
.mode tabs
CREATE TABLE mytable(mytextkey TEXT PRIMARY KEY, field1 INTEGER, field2 REAL);
.import mytable.tsv mytable
mytable.tsv is approx. 6 GB and 50 million rows. The process takes an extremely long time (hours) to run and it also completely throttles the performance of the entire system, I'm guessing because of temporary disk IO.
I don't understand why it takes so long and why it thrashes the disk so much, when I have plenty of free physical RAM it could use for temporary writes.
How do I improve this process?
PS: Yes, I did search for a previous question and answer, but nothing I found helped.
In SQLite, a normal rowid table uses a 64-bit integer primary key. If the PK in the table definition is anything but a single INTEGER column, it is instead treated as a unique index, and each inserted row has to update both the original table and that index, doubling the work (and in your case effectively doubling the storage requirements). If you instead make your table a WITHOUT ROWID one, the PK is a true PK and doesn't require an extra index table. That change alone should roughly halve both the time it takes to import your dataset and the size of the database. (If you have other indexes on the table, or use that PK as a foreign key in another table, it might not be worth making the change in the long run, as it will increase the amount of space needed for those tables, potentially by a lot given the lengths of your keys; in that case, see Schwern's answer.)
Sorting the input on the key column first can also help on large imports, because there is less random access of b-tree pages and less moving of data within those pages. Everything goes into the same page until it fills up, a new one is allocated, and any needed rebalancing is done.
You can also turn on some unsafe settings that aren't recommended in normal usage because they can result in data loss or outright corruption - but if that happens during an import because of a freak power outage or whatever, you can always just start over. In particular, set the synchronous mode and the journal type to OFF, as in the sketch below. That results in fewer disc writes over the course of the import.
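Putting the two suggestions together, a minimal sketch against the question's schema (the unsafe settings are for a restartable one-off import only):
PRAGMA journal_mode = OFF; -- no rollback journal: corruption risk on crash, acceptable for a redoable import
PRAGMA synchronous = OFF;  -- don't wait for the OS to flush writes to disk
CREATE TABLE mytable(
    mytextkey TEXT PRIMARY KEY,
    field1 INTEGER,
    field2 REAL
) WITHOUT ROWID;           -- the text PK becomes the table's b-tree key; no second hidden index
.mode tabs
.import mytable.tsv mytable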
My assumption is the problem is the text primary key. This requires building a large and expensive text index.
The primary key is a long nucleotide sequence (anywhere from 20 to 300 characters), field1 is an integer (between 1 and 1500) and field2 is a relative log ratio (between -10 and +10, roughly).
Text primary keys have few advantages and many drawbacks.
They require large, slow indexes. Slow to build, slow to query, slow to insert.
Text values are tempting to change, which is exactly what you don't want a primary key to do.
Any table referencing it must also store and index text, adding to the bloat.
Joins with this table will be slower due to the text primary key.
Consider what happens when you make a new table which references this one.
create table othertable(
    myreference references mytable, -- this is text
    something integer,
    otherthing integer
);
othertable must now store a copy of the entire sequence, bloating the table; instead of a simple integer it has a text column. And it must build its own text index, bloating the index and slowing down joins and inserts.
Instead, use a normal, integer, autoincrementing primary key and make the sequence column unique (which is also indexed). This provides all the benefits of a text primary key with none of the drawbacks.
create table sequences(
    id integer primary key autoincrement,
    sequence text not null unique,
    field1 integer not null,
    field2 real not null
);
Now references to sequences are a simple integer.
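For contrast with the earlier othertable example, the referencing table now only needs an integer column (a sketch; the column name is my own):
create table othertable(
    sequence_id integer references sequences(id), -- a small integer instead of a copy of the whole sequence
    something integer,
    otherthing integer
);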
Because the SQLite import process is not very customizable, getting your data into this table in SQLite efficiently requires a couple steps.
First, import your data into a table which does not yet exist. Make sure it has header fields matching your desired column names.
$ cat test.tsv
sequence field1 field2
d34db33f 1 1.1
f00bar 5 5.5
somethings 9 9.9
sqlite> .import test.tsv import_sequences
As there's no indexing happening, this process should go pretty quickly. SQLite made a table called import_sequences with every column of type text.
sqlite> .schema import_sequences
CREATE TABLE import_sequences(
"sequence" TEXT,
"field1" TEXT,
"field2" TEXT
);
sqlite> select * from import_sequences;
sequence field1 field2
---------- ---------- ----------
d34db33f 1 1.1
f00bar 5 5.5
somethings 9 9.9
Now we create the final production table.
sqlite> create table sequences(
...> id integer primary key autoincrement,
...> sequence text not null unique,
...> field1 integer not null,
...> field2 real not null
...> );
For efficiency, normally you'd add the unique constraint after the import, but SQLite has very limited ability to alter a table and cannot add a constraint to an existing column.
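One workaround - my own suggestion, not part of the original answer - is to declare the sequence column without UNIQUE, do the transfer, and then build the index in a separate step, which SQLite does support:
CREATE UNIQUE INDEX idx_sequences_sequence ON sequences(sequence); -- run after the bulk copy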
Now transfer the data from the import table into sequences. The primary key will be automatically populated.
insert into sequences (sequence, field1, field2)
select sequence, field1, field2
from import_sequences;
Because the sequence column must be indexed, this might not import any faster, but it will result in a much better and more efficient schema going forward. If you want efficiency, consider a more robust database.
Once you've confirmed the data came over correctly, drop the import table.
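That cleanup is a single statement:
DROP TABLE import_sequences;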
The following settings helped speed things up tremendously.
PRAGMA journal_mode = OFF;   -- no rollback journal
PRAGMA cache_size = 7500000; -- much larger page cache
PRAGMA synchronous = 0;      -- OFF: don't wait for disk flushes
PRAGMA temp_store = 2;       -- MEMORY: temp tables and indices kept in RAM

SQLITE3 store text and numbers in the same column

In the SQLite documentation you can find:
"SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with its container. The dynamic type system of SQLite is backwards compatible with the more common static type systems of other database engines in the sense that SQL statements that work on statically typed databases should work the same way in SQLite. However, the dynamic typing in SQLite allows it to do things which are not possible in traditional rigidly typed databases."
I have a column with different values: some are integers, some floats and some text. I use BLOB as the storage class and it works fine. But I'm a little suspicious of this solution. Is this the right way to store numbers and text in the same column?
TL;DR: you are not doing it wrong, but I also think you are not doing what you think you are doing.
I don't think you are using a BLOB storage class, but a BLOB type affinity. In SQLite there are 2 concepts of datatype: storage class and type affinity.
Storage class is the way a given value is stored in memory, it describes how bit patterns represent values. E.g. while numerically equal, the integer 1 and the floating point 1.0 are represented by very different bit patterns (at least in the memory, but curiously "As an internal optimization, small floating point values with no fractional component and stored in columns with REAL affinity are written to disk as integers").
There are 5 storage classes, NULL, INTEGER, REAL, TEXT, and BLOB.
Type affinity, on the other hand, is not a value-level but a column-level concept. Type affinity is a characteristic of a given column. There are also 5 type affinities: NUMERIC, INTEGER, REAL, TEXT, and BLOB.
The first thing you might notice is that there is no 1-1 mapping between storage classes and affinities. That's because they are indeed almost completely unrelated. Consider the following:
sqlite> CREATE TABLE Test(A TEXT, B BLOB, C REAL);
sqlite> INSERT INTO Test VALUES ('text', 'also a text', 'another one');
sqlite> INSERT INTO Test VALUES (10, 10.1, 10.11111111111111111111111);
sqlite> SELECT A, TYPEOF(A), B, TYPEOF(B), C, TYPEOF(C) FROM Test;
text|text|also a text|text|another one|text
10|text|10.1|real|10.1111111111111|real
As you can see, I defined the columns with types TEXT, BLOB, and REAL. Those are the affinities. But I was able to insert values of different types into them, regardless of their affinities. E.g. column C now both stores 'another one', of storage class TEXT, and 10.1111111111111 of storage class REAL.
Normally you should not be too concerned about storage classes; as you can see from my example, SQLite just automatically and correctly inferred them from the data I inserted.
Type affinity is more important. That's what determines how your data is compared and sorted, and how its uniqueness is determined. E.g.
sqlite> CREATE Table T(A TEXT);
sqlite> INSERT INTO T VALUES ('1'), (2), (0.0);
sqlite> SELECT * FROM T ORDER BY A ASC;
0.0
1
2
sqlite> CREATE TABLE R(A REAL);
sqlite> INSERT INTO R VALUES ('1'), (2), (0.0);
sqlite> SELECT * FROM R ORDER BY A ASC;
0.0
1.0
2.0
sqlite> CREATE TABLE B(A BLOB);
sqlite> INSERT INTO B VALUES ('1'), (2), (0.0);
sqlite> SELECT * FROM B ORDER BY A ASC;
0.0
2
1
So for practical purposes, affinity is more closely related to your intuitive idea of a datatype. Think about what your data means, and choose the affinity accordingly. If you store your values in a column of BLOB affinity, a bitwise comparison will be used on them. If this is what you want, go for it. But if you want a lexicographic comparison you need TEXT affinity, and if you want comparisons based on the numerical values your data represents, then e.g. REAL.
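To tie this back to the question: with BLOB affinity no coercion takes place, so each value keeps its own storage class. A small sketch of my own:
sqlite> CREATE TABLE mixed(v BLOB);
sqlite> INSERT INTO mixed VALUES (42), (3.14), ('hello'), (x'cafe');
sqlite> SELECT typeof(v) FROM mixed;
integer
real
text
blob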

What is the maximum size limit of varchar data type in sqlite?

I'm new to sqlite. I want to know the maximum size limit of the varchar data type in sqlite.
Can anybody point me to some information related to this? I searched the sqlite.org site and they give this answer:
Q. What is the maximum size of a VARCHAR in SQLite?
A. SQLite does not enforce the length of a VARCHAR. You can declare a VARCHAR(10) and SQLite will be happy to let you put 500 characters in it. And it will keep all 500 characters intact - it never truncates.
But I want to know the exact max size limit of the varchar datatype in sqlite.
From http://www.sqlite.org/limits.html:
Maximum length of a string or BLOB
The maximum number of bytes in a string or BLOB in SQLite is defined by the preprocessor macro SQLITE_MAX_LENGTH. The default value of this macro is 1 billion (1 thousand million or 1,000,000,000). You can raise or lower this value at compile-time using a command-line option like this:
-DSQLITE_MAX_LENGTH=123456789
The current implementation will only support a string or BLOB length up to 2^31-1 or 2147483647. And some built-in functions such as hex() might fail well before that point. In security-sensitive applications it is best not to try to increase the maximum string and blob length. In fact, you might do well to lower the maximum string and blob length to something more in the range of a few million if that is possible.
During part of SQLite's INSERT and SELECT processing, the complete content of each row in the database is encoded as a single BLOB. So the SQLITE_MAX_LENGTH parameter also determines the maximum number of bytes in a row.
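A quick sketch of the FAQ behaviour quoted in the question - the declared length is ignored, and only SQLITE_MAX_LENGTH applies:
sqlite> CREATE TABLE t(v VARCHAR(10));
sqlite> INSERT INTO t VALUES (hex(zeroblob(250))); -- hex() yields a 500-character string
sqlite> SELECT length(v) FROM t;
500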

What is the maximum size we can give for nvarchar in sqlite?

In SQL Server we can use NVARCHAR(MAX), but this is not possible in sqlite. What is the maximum size I can give for NVARCHAR(?)?
There is no maximum in SQLite. You can insert strings of unlimited length (subject to memory and disk space). The size in the CREATE TABLE statement is ignored anyway.
What is the maximum size I can give for Nvarchar(?)?
You don't, because SQLite ignores any length specified inside NVARCHAR(?); the declared size is not enforced at all.
Instead, use the TEXT datatype wherever you need NVARCHAR(MAX).
For instance, if you need a very large string column to store Base64 string values for images, you can use something like the following for that column definition.
LogoBase64String TEXT NULL,
SQLite doesn't really enforce length restrictions on strings:
Note that numeric arguments in parentheses that follow the type name (ex: "VARCHAR(255)") are ignored by SQLite - SQLite does not impose any length restrictions (other than the large global SQLITE_MAX_LENGTH limit) on the length of strings, BLOBs or numeric values.
Source www.sqlite.org/datatype3
