When I use EXPLAIN QUERY PLAN in SQLite 3, it sometimes gives me output such as:
SEARCH TABLE staff AS s USING AUTOMATIC COVERING INDEX (is_freelancer=? AND sap=?) (~6 rows)
Where does the index come from and what does it do? The table has no manually created indices on it.
"Automatic" means that SQLite creates a temporary index that is used only for this query, and deleted afterwards.
This happens when the cost of creating the index is estimated to be smaller than the cost of looking up records in the table without the index.
(A covering index is an index that contains all the columns to be read, which means that the record corresponding to the index entry does not need to be looked up in the table.)
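You can watch an automatic index appear and disappear with a small sketch that uses Python's built-in sqlite3 module. The tables t1 and t2 are hypothetical and have no indexes. Joining them on an unindexed column is the typical case where the planner may build a temporary index. The automatic_index pragma turns the feature off for a connection. The exact plan text differs between SQLite versions, so the sketch only looks for the word AUTOMATIC.

```python
import sqlite3

# Hypothetical join between two tables that have no indexes at all.
JOIN_SQL = "SELECT * FROM t1, t2 WHERE t1.a = t2.c"


def join_plan(automatic_index):
    """Return the EXPLAIN QUERY PLAN detail strings for JOIN_SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Automatic indexing is on by default; this pragma toggles it per connection.
        conn.execute("PRAGMA automatic_index = " + ("ON" if automatic_index else "OFF"))
        conn.execute("CREATE TABLE t1 (a, b)")
        conn.execute("CREATE TABLE t2 (c, d)")
        rows = conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL).fetchall()
        # The last column of each row holds the human-readable plan text.
        return [row[-1] for row in rows]
    finally:
        conn.close()


print(join_plan(True))   # expect a SEARCH ... USING AUTOMATIC ... INDEX line
print(join_plan(False))  # expect plain SCAN lines only
```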
With Sqlite3, I am trying to do a query like:
select *
from data
where instr(filepath,'.txt') != 0
And I want to index this query to speed it up.
I tried to create an index like:
create index data_instr_filepath
on data(instr(filepath,'.txt'));
However, "explain query plan" still shows that I'm doing a table scan.
Is this doable in SQLite? The examples I have found of expression-based indexes seem to be limited to the length function and to multiplying two columns together.
UPDATE:
Thanks to Mike's answer, I refactored my query to not use inequalities and was able to create an index that it hits. Below are the indexes I ended up using:
create index data_instr_filepath_txt on data(instr(filepath,'.txt'));
create index data_instr_filepath_substr on data(substr(filepath,0,instr(filepath,'.')));
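The reason is that an index will not be used for a != (not-equal) comparison: the SQLite optimizer only treats constraints such as =, IS, <, <=, >, >=, IN and IS NULL as usable index terms. The optimizer documentation also notes: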
Similarly, index columns will not normally be used (for indexing purposes) if they are to the right of a column that is constrained only by inequalities.
(The SQLite Query Optimizer Overview)
You can try to force the use of an index with INDEXED BY. However, this will not work in your situation, because the above makes the index unusable for this query, so SQLite reports an error instead (the query still works without INDEXED BY).
e.g.
EXPLAIN QUERY PLAN
SELECT * FROM data INDEXED BY data_instr_filepath
WHERE instr(filepath,'.txt') != 0
results in :-
no query solution
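The difference between != and an index-friendly constraint can be checked with a short sketch using Python's sqlite3 module (indexes on expressions need SQLite 3.9.0 or later). The sample file paths are made up. instr() returns 0 when the substring is absent and a positive position otherwise, which is why != 0 can be rewritten as > 0. INDEXED BY is used on the > 0 query only so the plan does not depend on the planner's cost estimates.

```python
import sqlite3


def plan(conn, sql):
    """Join the EXPLAIN QUERY PLAN detail strings for sql into one line."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (filepath TEXT)")
conn.execute("CREATE INDEX data_instr_filepath ON data(instr(filepath,'.txt'))")
conn.executemany("INSERT INTO data VALUES (?)",
                 [("a.txt",), ("b.csv",), ("notes.txt",)])  # made-up rows

# != is not a usable index constraint: expect a full table scan.
plan_ne = plan(conn, "SELECT * FROM data WHERE instr(filepath,'.txt') != 0")

# Equality on the exact indexed expression can use the expression index.
plan_eq = plan(conn, "SELECT * FROM data WHERE instr(filepath,'.txt') = 0")

# > 0 is a range constraint the index can serve; INDEXED BY pins the choice.
plan_gt = plan(conn, "SELECT * FROM data INDEXED BY data_instr_filepath "
                     "WHERE instr(filepath,'.txt') > 0")

# > 0 selects the same rows as != 0, because instr() is never negative.
matches = [row[0] for row in conn.execute(
    "SELECT filepath FROM data WHERE instr(filepath,'.txt') > 0 ORDER BY filepath")]

print(plan_ne, plan_eq, plan_gt, matches, sep="\n")
```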
My problem is that my queries are too slow.
I have a fairly large sqlite database. The table is:
CREATE TABLE results (
timestamp TEXT,
name TEXT,
result float
)
(I know that timestamps as TEXT is not optimal, but please ignore that for the purposes of this question. I'll have to fix that when I have the time)
"name" is a category. This table holds the results of a calculation that has to be done at each timestamp for all "name"s. So the inserts are done one timestamp at a time, but the queries will be done one name at a time (i.e., given a name, I want to get its time series), like:
SELECT timestamp, result FROM results WHERE name='some_name';
Right now, I insert with no indexes, calculate all the results, and then create an index on name: CREATE INDEX index_name ON results (name). The reasoning is that I don't need the index while inserting, but having it should make queries by name really fast.
But it's not. The database is fairly large. It has about half a million timestamps, and for each timestamp I have about 1000 names.
I suspect, although I'm not sure, that the reason it's slow is that even though I've indexed the names, their rows are still scattered all around the physical disk. Something like:
timestamp1,name1,result
timestamp1,name2,result
timestamp1,name3,result
...
timestamp1,name999,result
timestamp1,name1000,result
timestamp2,name1,result
timestamp2,name2,result
etc...
I'm sure this is slower to query with NAME='some_name' than if the rows were physically ordered as:
timestamp1,name1,result
timestamp2,name1,result
timestamp3,name1,result
...
timestamp499997,name1000,result
timestamp499998,name1000,result
timestamp499999,name1000,result
timestamp500000,name1000,result
etc...
So, how do I tell SQLite that the order in which I'd like the rows in disk isn't the one they were written in?
UPDATE: I'm further convinced that the slowness in doing a select with such an index comes exclusively from non-contiguous disk access. Doing SELECT * FROM results WHERE name=<something_that_doesnt_exist> immediately returns zero results. This suggests that it's not finding the names that's slow, it's actually reading them from the disk.
Normal SQLite tables have, as a primary key, a 64-bit integer (known as rowid and a few other aliases). That determines the order in which rows are stored in a B*-tree (which puts all actual data in leaf node pages). You can change this with a WITHOUT ROWID table, but that requires an explicit primary key, which is then used to place rows in the B-tree. So if every row's (name, timestamp) pair is unique, that's a possibility that will keep all rows with the same name on a smaller set of pages instead of scattered all over.
You'd want the composite PK to be in that order if you're searching for a particular name most of the time, so something like:
CREATE TABLE results (
timestamp TEXT
, name TEXT
, result REAL
, PRIMARY KEY (name, timestamp)
) WITHOUT ROWID
(And of course not bothering with a second index on name.) The tradeoff is that inserts are likely to be slower as the chances of needing to split a page in the B-tree go up.
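A minimal sketch of the WITHOUT ROWID layout, using Python's sqlite3 module and a handful of made-up rows inserted in timestamp-major order as in the question. Physical page placement cannot be observed directly from SQL, but the query plan shows that a per-name lookup is a search on the primary key and that ORDER BY timestamp needs no extra sort step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        timestamp TEXT,
        name      TEXT,
        result    REAL,
        PRIMARY KEY (name, timestamp)
    ) WITHOUT ROWID
""")

# Insert in "write order": every name for timestamp 1, then for timestamp 2, ...
rows = [(f"t{ts:03d}", f"name{n}", ts * 0.5)
        for ts in range(1, 4) for n in range(1, 4)]
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

query = "SELECT timestamp, result FROM results WHERE name = ? ORDER BY timestamp"
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("name2",))]
series = conn.execute(query, ("name2",)).fetchall()

print(plan)    # expect: SEARCH results USING PRIMARY KEY (name=?)
print(series)
```

Because rows are stored in (name, timestamp) order, the rows for one name sit next to each other in the B-tree regardless of the order in which they were inserted.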
Some pragmas worth looking into to tune things:
cache_size
mmap_size
optimize (After creating your index; also consider building sqlite with SQLITE_ENABLE_STAT4.)
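These pragmas are set per connection. The sketch below uses placeholder values (about 64 MiB of page cache and up to 256 MiB of memory-mapped I/O) and a throwaway database file; suitable sizes depend on your machine and data, and some SQLite builds cap or disable mmap_size.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "results.db"))  # throwaway file

    # A negative cache_size is measured in KiB: -65536 means roughly 64 MiB.
    conn.execute("PRAGMA cache_size = -65536")
    # Request up to 256 MiB of memory-mapped I/O (the build may cap this).
    conn.execute("PRAGMA mmap_size = 268435456")

    conn.execute("CREATE TABLE results (timestamp TEXT, name TEXT, result REAL)")
    conn.execute("CREATE INDEX index_name ON results (name)")
    # After creating the index, let SQLite gather whatever statistics it deems useful.
    conn.execute("PRAGMA optimize")

    cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
    conn.close()

print(cache_size)
```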
Since you don't have an INTEGER PRIMARY KEY, consider VACUUM after deleting a lot of rows if you ever do that.
I have a table that is actually a ranking list. I want to give the user a chance to rearrange the ranking however they want, i.e., allow them to move rows within that table. Should I create a separate column that holds each row's place, or can it be done using the table's built-in row order?
The documentation says:
If a SELECT statement that returns more than one row does not have an ORDER BY clause, the order in which the rows are returned is undefined.
(This is true for all SQL databases.)
So you cannot rely on the order that the rows happen to be stored in; you have to use some value in some table column.
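A sketch of the explicit-column approach, using Python's sqlite3 module and a hypothetical ranking table with a position column. Moving an item shifts every item between its old and new place by one, all inside a single transaction, and the ranking is always read back with ORDER BY position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: position holds the user's chosen order.
conn.execute("CREATE TABLE ranking (id INTEGER PRIMARY KEY, item TEXT, "
             "position INTEGER NOT NULL)")
conn.executemany("INSERT INTO ranking (item, position) VALUES (?, ?)",
                 [("a", 1), ("b", 2), ("c", 3), ("d", 4)])


def move(conn, item, new_pos):
    """Move item to new_pos, shifting the items in between by one place."""
    with conn:  # single transaction: commit on success, roll back on error
        (old_pos,) = conn.execute(
            "SELECT position FROM ranking WHERE item = ?", (item,)).fetchone()
        if new_pos < old_pos:
            # Items from new_pos up to just before old_pos move down one rank.
            conn.execute("UPDATE ranking SET position = position + 1 "
                         "WHERE position >= ? AND position < ?", (new_pos, old_pos))
        elif new_pos > old_pos:
            # Items from just after old_pos up to new_pos move up one rank.
            conn.execute("UPDATE ranking SET position = position - 1 "
                         "WHERE position > ? AND position <= ?", (old_pos, new_pos))
        conn.execute("UPDATE ranking SET position = ? WHERE item = ?",
                     (new_pos, item))


def ordered(conn):
    # Only ORDER BY guarantees an order; storage order is irrelevant.
    return [row[0] for row in conn.execute(
        "SELECT item FROM ranking ORDER BY position")]


move(conn, "d", 1)
print(ordered(conn))
```

position is deliberately not declared UNIQUE here: SQLite checks UNIQUE constraints row by row during an UPDATE, so the intermediate shift may fail with a constraint error.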
I have a table 'data' with columns
id (auto_increment), id_device (integer), timestamp (numeric)
I need to execute these selects:
select * from data where id<10000000 and id_device=345
or
select * from data where id<10000000 and id_device=345 and timestamp>'2017-01-01 10:00:00' and timestamp<'2017-03-01 08:00:00'
For first select:
Is it better to make separate index for "id" and separate for "id_device"?
Or is it better for performance to make index like INDEX id, id_device?
For second select:
Is it better to make a separate index for "id", a separate one for "id_device" and a separate one for "timestamp"?
Or is it better for performance to make index like INDEX id, id_device, timestamp?
My short answer: it depends on your data.
Longer: if id_device=345 is true for fewer rows than id<10000000, then id_device should be listed first in a multi-column index: ...ON data(id_device,id). Also, if select speed is more important to you/your users than insert/update/delete speed, then why not add a lot of indexes and leave it to the query planner to choose which ones to use:
create index i01_data on data(id);
create index i02_data on data(id_device);
create index i03_data on data(timestamp);
create index i04_data on data(id,id_device);
create index i05_data on data(id_device,id);
create index i06_data on data(timestamp,id);
create index i07_data on data(id,timestamp);
create index i08_data on data(id_device,timestamp);
create index i09_data on data(timestamp,id_device);
create index i10_data on data(id, id_device, timestamp);
create index i11_data on data(id_device, id, timestamp);
create index i12_data on data(id_device, timestamp, id);
create index i13_data on data(id, timestamp, id_device);
create index i14_data on data(timestamp, id_device, id);
create index i15_data on data(timestamp, id, id_device);
The query planner in your database (SQLite has one too) usually makes good choices here, especially if you run the SQLite ANALYZE command periodically or after changing lots of data. The downsides of having many indexes are slower inserts and deletes (and updates, if they involve indexed columns) and more disk/memory usage. Use EXPLAIN QUERY PLAN on your important SQL statements (important when it comes to speed) to check which indexes are used and which are not. If an index is never used, or is only used in queries that are fast anyway without it, you can drop it. Also be aware that newer versions of your database (SQLite, Oracle, PostgreSQL) can have newer query planner algorithms, which are better for most SELECTs but can be worse for some. Realistic tests on realistic datasets are the best way to tell. Which indexes to create is not an exact science, and there are no definitive rules that fit all cases.
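Checking the plans takes only a few lines. The sketch below assumes id is an INTEGER PRIMARY KEY AUTOINCREMENT column (the question only says auto_increment) and creates just two hypothetical indexes: idx_device_id on (id_device, id) and idx_device_ts_id on (id_device, timestamp, id). On an empty table the planner works from default estimates, so treat this as a way to run the check, and repeat it with ANALYZE on realistic data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: "id (auto_increment)" is an INTEGER PRIMARY KEY AUTOINCREMENT column.
conn.execute("""
    CREATE TABLE data (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        id_device INTEGER,
        timestamp NUMERIC
    )
""")
# Hypothetical composite indexes that lead with the equality column.
conn.execute("CREATE INDEX idx_device_id ON data(id_device, id)")
conn.execute("CREATE INDEX idx_device_ts_id ON data(id_device, timestamp, id)")


def plan(sql):
    """Join the EXPLAIN QUERY PLAN detail strings for sql into one line."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


q1 = "SELECT * FROM data WHERE id < 10000000 AND id_device = 345"
q2 = (q1 + " AND timestamp > '2017-01-01 10:00:00'"
           " AND timestamp < '2017-03-01 08:00:00'")
plan1 = plan(q1)
plan2 = plan(q2)
print(plan1)
print(plan2)

# With real data loaded, gather statistics and compare the plans again.
conn.execute("ANALYZE")
```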
I am new to indexes and DB optimization. I know there is a simple index on one column:
CREATE index ON table(col)
and that a B-tree will probably be created, improving search.
But what happens with a two-column index? And why is the order of definition important?
CREATE index ON table(col1, col2)
Yes, a B-tree index will be created in most databases if you don't specify another index type. A composite index is useful when queries constrain several of its columns together, so that their combined selectivity is put to use.
The order of the columns in a composite index is important. Searching with exact values for all of the indexed columns gives the minimal search time, but if you provide values for only some of them, the index is used only for the leading columns you supplied (for example, just the first one) to retrieve all matching records, and the remaining conditions are checked against those records.
I found the following example to help you understand:
In the phone book example with a composite index created on the columns (city, last_name, first_name), if we search by giving exact values for all three fields, search time is minimal, but if we provide values for city and first_name only, the search uses only the city field to retrieve all matching records. A sequential check against first_name then filters them. So, to improve performance, make sure the index columns are ordered to match the columns you search on.
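The phone book example can be checked directly with EXPLAIN QUERY PLAN. In this sketch (Python's sqlite3 module, hypothetical table and values), the plan detail lists the index columns that are actually used for the search. All three are used when every column is given. Only city is used when last_name is skipped. The index is not used at all when the leading column is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical phone book; the extra phone column keeps the index non-covering.
conn.execute("CREATE TABLE phonebook "
             "(city TEXT, last_name TEXT, first_name TEXT, phone TEXT)")
conn.execute("CREATE INDEX idx_phone ON phonebook (city, last_name, first_name)")


def plan(where):
    """Return the plan detail for SELECT * FROM phonebook WHERE <where>."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM phonebook WHERE " + where
    return " | ".join(row[-1] for row in conn.execute(sql))


all_three = plan("city = 'Oslo' AND last_name = 'Hansen' AND first_name = 'Ola'")
city_first = plan("city = 'Oslo' AND first_name = 'Ola'")  # last_name skipped
first_only = plan("first_name = 'Ola'")                     # no leading column

print(all_three)   # index used on city, last_name and first_name
print(city_first)  # index used on city only; first_name checked per row
print(first_only)  # full table scan
```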