How do I create an expression-based index using the instr() function? - sqlite

With Sqlite3, I am trying to do a query like:
select *
from data
where instr(filepath,'.txt') != 0
And I want to index this query to speed it up.
I tried to create an index like:
create index data_instr_filepath
on data(instr(filepath,'.txt'));
However, "explain query plan" still shows that I'm doing a table scan.
Is this doable in sqlite? The examples I have found for expression-based indexes seem to be limited to the length function and to multiplying two columns together.
UPDATE:
Thanks to Mike's answer, I refactored my query to not use inequalities and was able to create an index that hits it. Below are my indexes that I ended up using:
create index data_instr_filepath_txt on data(instr(filepath,'.txt'));
create index data_instr_filepath_substr on data(substr(filepath,0,instr(filepath,'.')));

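The index is not used because the condition is a != comparison, which SQLite cannot look up in an index (only terms such as =, IS, IN, <, <=, > and >= are candidates). The documentation also notes the following about inequalities: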
Similarly, index columns will not normally be used (for indexing
purposes) if they are to the right of a column that is constrained
only by inequalities. The SQLite Query Optimizer Overview
You can try to force the use of an index with INDEXED BY. However, this will not work in your situation, because the != condition makes the index unusable and the statement is rejected (the query without INDEXED BY will still work).
e.g.
EXPLAIN QUERY PLAN
SELECT * FROM data INDEXED BY data_instr_filepath
WHERE instr(filepath,'.txt') != 0
results in :-
no query solution
Time: 0s

Related

When to create multi-column indices in SQLite?

Assume I have a table in an SQLite database:
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
price INTEGER NOT NULL,
updateTime INTEGER NOT NULL
) [WITHOUT ROWID];
what indices should I create to optimize the following query:
SELECT * FROM orders WHERE price > ? ORDER BY updateTime DESC;
Do I create two indices:
CREATE INDEX i_1 ON orders(price);
CREATE INDEX i_2 ON orders(updateTime);
or one complex index?
CREATE INDEX i_3 ON orders(price, updateTime);
What would the time complexity of the query be?
From The SQLite Query Optimizer Overview/WHERE Clause Analysis:
If an index is created using a statement like this:
CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
Then the index might
be used if the initial columns of the index (columns a, b, and so
forth) appear in WHERE clause terms. The initial columns of the index
must be used with the = or IN or IS operators. The right-most column
that is used can employ inequalities.
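The rule quoted above can be checked with EXPLAIN QUERY PLAN. The following is a rough sketch using Python's sqlite3 module; the table ex1 is cut down to four columns and the query_plan helper is a hypothetical convenience, neither comes from the documentation.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "\n".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ex1 (a, b, c, d)")
conn.execute("CREATE INDEX idx_ex1 ON ex1(a, b, c)")

# Equality on the first column, inequality on the next one: both are used.
plan_prefix = query_plan(conn, "SELECT * FROM ex1 WHERE a = 1 AND b > 2")

# Column b is skipped in the WHERE clause, so only the a = ? term is used.
plan_gap = query_plan(conn, "SELECT * FROM ex1 WHERE a = 1 AND c = 3")

# The leading column is missing and ANALYZE has not been run: no index search.
plan_no_lead = query_plan(conn, "SELECT * FROM ex1 WHERE b = 2")

print(plan_prefix)
print(plan_gap)
print(plan_no_lead)
```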
As explained also in The SQLite Query Optimizer Overview/The Skip-Scan Optimization with an example:
Because the left-most column of the index does not appear in the WHERE
clause of the query, one is tempted to conclude that the index is not
usable here. However, SQLite is able to use the index.
This means that if you create an index like:
CREATE INDEX idx_orders ON orders(updateTime, price);
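it might be used to optimize the WHERE clause even though updateTime does not appear there. Note that SQLite only considers a skip-scan after ANALYZE has been run, and only when the left-most column has many duplicate values.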
Also, from The SQLite Query Optimizer Overview/ORDER BY Optimizations:
SQLite attempts to use an index to satisfy the ORDER BY clause of a
query when possible. When faced with the choice of using an index to
satisfy WHERE clause constraints or satisfying an ORDER BY clause,
SQLite does the same cost analysis described above and chooses the
index that it believes will result in the fastest answer.
Since updateTime is defined first in the composite index, the index may also be used to optimize the ORDER BY clause.
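To see the effect on the ORDER BY clause, one can compare the plan with an index on price only against the plan with the composite index. This is a sketch under assumptions: it uses Python's sqlite3 module, an empty in-memory copy of the orders table (as a regular rowid table), and a hypothetical query_plan helper. On real data the planner may weigh the options differently, especially after ANALYZE.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "\n".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " price INTEGER NOT NULL,"
    " updateTime INTEGER NOT NULL)"
)
sql = "SELECT * FROM orders WHERE price > ? ORDER BY updateTime DESC"

# With an index on price alone, the rows still have to be sorted afterwards.
conn.execute("CREATE INDEX i_1 ON orders(price)")
plan_price_only = query_plan(conn, sql, (100,))
conn.execute("DROP INDEX i_1")

# With updateTime first, the index can be walked in ORDER BY order.
conn.execute("CREATE INDEX idx_orders ON orders(updateTime, price)")
plan_composite = query_plan(conn, sql, (100,))

print(plan_price_only)
print(plan_composite)
```

A line mentioning a temporary B-tree for ORDER BY indicates a separate sort step; its absence means the index already delivers the rows in order.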

How to introduce indexing to sqlite query in android?

In my android application, I use Cursor c = db.rawQuery(query, null); to query data from a local sqlite database, and one of the query strings looks like the following:
SELECT t1.* FROM table t1
WHERE NOT EXISTS (
SELECT 1 FROM table t2
WHERE t2.start_time = t1.start_time AND t2.stop_time > t1.stop_time
)
However, the query gets very slow when the database gets huge. I have been looking into adding an index to speed up the query, but so far without much success. It would be great to have some help here, as it is also hard to find examples of this for android applications.
You can create a composite index for the columns start_time and stop_time:
CREATE INDEX idx_name ON table_name(start_time, stop_time);
You can read in The SQLite Query Optimizer Overview:
The ON and USING clauses of an inner join are converted into
additional terms of the WHERE clause prior to WHERE clause analysis
...
and:
If an index is created using a statement like this:
CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
Then the index might be used if the initial columns of the index
(columns a, b, and so forth) appear in WHERE clause terms. The initial
columns of the index must be used with the = or IN or IS operators.
The right-most column that is used can employ inequalities.
You may have to uninstall the app from the device so that the db is deleted and rerun to recreate it, or increase the version number of the db so that you can create the index in the onUpgrade() method.
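Before changing the app, the effect of the composite index can be checked off-device. Below is a minimal sketch with Python's sqlite3 module rather than the Android API; the table name events, the payload column, the sample rows, and the query_plan helper are all made up for illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "\n".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the table in the question.
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " start_time INTEGER,"
    " stop_time INTEGER,"
    " payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (start_time, stop_time, payload) VALUES (?, ?, ?)",
    [(1, 5, "a"), (1, 9, "b"), (2, 3, "c"), (2, 4, "d"), (3, 7, "e")],
)
conn.execute("CREATE INDEX idx_name ON events(start_time, stop_time)")

sql = """
SELECT t1.* FROM events t1
WHERE NOT EXISTS (
    SELECT 1 FROM events t2
    WHERE t2.start_time = t1.start_time AND t2.stop_time > t1.stop_time
)
"""

# The subquery has an equality on start_time and an inequality on stop_time,
# which matches the column order of the composite index.
plan = query_plan(conn, sql)
print(plan)

# Keeps, for each start_time, the row with the largest stop_time.
kept = sorted(row[3] for row in conn.execute(sql))
print(kept)
```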

sqlite optimization on inner join with tables values around 18K

I have two tables, tool and tool_attribute.
tool has 12 columns and tool_attribute has 5.
The information I need from the tables:
tool - refid, serial, type, id
tool_attribute - key, value, id (There will be multiple entries for this)
Right now I have around 18264 rows in tool and 255696 rows in tool_attribute.
Current Query :
select
tool.refid,
tool.serial,
tool_attribute.value,
tool.type
from tool
inner join tool_attribute
on tool.id = tool_attribute.id
where
(tool_attribute.value LIKE '%t00%' or
tool.serial LIKE '%t00%')
group by tool.refid
order by tool.serial asc;
This takes around 750ms, which is quite fast, but I want to make it much faster. I run this code on a low-memory Windows 6.0 device, so it takes too much time.
Is there any way I could make it faster?
You can try adding indices to the columns involved in the join:
CREATE INDEX idx_tool ON tool (id);
CREATE INDEX idx_tool_attr ON tool_attribute (id);
I think the LIKE conditions in your WHERE clause preclude any chance of using an index on the columns involved. The reason is that a pattern with a leading wildcard, such as '%something', cannot be searched through a B-tree, which matches the prefix of the key from left to right. If you could rephrase your WHERE logic using something similar to LIKE 'something%', then an index could be used there as well (in SQLite the index also has to use the NOCASE collation, unless case_sensitive_like is turned on).
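The difference between a prefix pattern and a leading wildcard shows up directly in the query plan. Here is a small sketch with Python's sqlite3 module; the reduced tool table, the index name idx_tool_serial, and the query_plan helper are assumptions for illustration. The index is declared with COLLATE NOCASE because SQLite's LIKE is case-insensitive by default.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "\n".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Reduced version of the tool table; only the columns from the question.
conn.execute(
    "CREATE TABLE tool (id INTEGER PRIMARY KEY, refid TEXT, serial TEXT, type TEXT)"
)
conn.execute("CREATE INDEX idx_tool_serial ON tool(serial COLLATE NOCASE)")

# A literal prefix lets SQLite turn LIKE into a range search on the index.
plan_prefix = query_plan(conn, "SELECT * FROM tool WHERE serial LIKE 't00%'")

# A leading wildcard gives the B-tree nothing to descend on.
plan_infix = query_plan(conn, "SELECT * FROM tool WHERE serial LIKE '%t00%'")

print(plan_prefix)
print(plan_infix)
```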

separate indexes for select optimization

I have a table 'data' with columns
id (auto_increment) id_device (integer) timestamp(numeric)
I need to execute these selects:
select * from data where id<10000000 and id_device=345
or
select * from data where id<10000000 and id_device=345 and timestamp>'2017-01-01 10:00:00' and timestamp<'2017-03-01 08:00:00'
For first select:
Is it better to make separate index for "id" and separate for "id_device"?
Or is it better for performance to make index like INDEX id, id_device?
For second select:
Is it better to make separate index for "id" and separate for "id_device" and separate for "timestamp"?
Or is it better for performance to make index like INDEX id, id_device, timestamp?
My short answer: it depends on your data.
Longer: if id_device=345 is true for fewer rows than id<10000000 then id_device should be listed first in a multi-column index: ...ON data(id_device,id). Also if select speed is more important to you/your users than insert/update/delete speed, then why not add a lot of indexes and leave it to the query planner to choose which ones to use:
create index i01_tbl on tbl(id);
create index i02_tbl on tbl(id_device);
create index i03_tbl on tbl(timestamp);
create index i04_tbl on tbl(id,id_device);
create index i05_tbl on tbl(id_device,id);
create index i06_tbl on tbl(timestamp,id);
create index i07_tbl on tbl(id,timestamp);
create index i08_tbl on tbl(id_device,timestamp);
create index i09_tbl on tbl(timestamp,id_device);
create index i10_tbl on tbl(id, id_device, timestamp);
create index i11_tbl on tbl(id_device, id, timestamp);
create index i12_tbl on tbl(id_device, timestamp, id);
create index i13_tbl on tbl(id, timestamp, id_device);
create index i14_tbl on tbl(timestamp, id_device, id);
create index i15_tbl on tbl(timestamp, id, id_device);
The query planner in your database (SQLite has one too) usually makes good choices here, especially if you run SQLite's ANALYZE command periodically or after changing a lot of data. The downside of having many indexes is slower inserts and deletes (and updates, if they involve indexed columns) and more disk and memory usage.
Use EXPLAIN QUERY PLAN on your important statements (important when it comes to speed) to check which indexes are used and which are not. If an index is never used, or is only used in queries that are fast without it anyway, you can drop it.
Also be aware that newer versions of your database (SQLite, Oracle, PostgreSQL) can have newer query planner algorithms that are better for most SELECTs but can be worse for some. Realistic tests on realistic datasets are the best way to tell. Choosing which indexes to create is not an exact science, and there are no definitive rules that fit all cases.
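The workflow of creating candidate indexes, running ANALYZE, and then inspecting the plans could look roughly like this. It is only a sketch with Python's sqlite3 module: the generated rows, the two index names, and the query_plan helper are invented for illustration, and id is assumed to be an INTEGER PRIMARY KEY, in which case SQLite already stores it at the end of every index entry.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "\n".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " id_device INTEGER,"
    " timestamp NUMERIC)"
)
# Synthetic rows: 500 devices, 10 rows each, all dated February 2017.
conn.executemany(
    "INSERT INTO data (id_device, timestamp) VALUES (?, ?)",
    [(i % 500, f"2017-02-{1 + i % 28:02d} 09:00:00") for i in range(5000)],
)

# Two candidate indexes; the planner decides which one (if any) to use.
conn.execute("CREATE INDEX i02_data ON data(id_device)")
conn.execute("CREATE INDEX i08_data ON data(id_device, timestamp)")
conn.execute("ANALYZE")  # collect statistics for the planner

q1 = "SELECT * FROM data WHERE id < 10000000 AND id_device = 345"
q2 = (
    q1
    + " AND timestamp > '2017-01-01 10:00:00'"
    + " AND timestamp < '2017-03-01 08:00:00'"
)

plan1 = query_plan(conn, q1)
plan2 = query_plan(conn, q2)
rows1 = conn.execute(q1).fetchall()

print(plan1)
print(plan2)
print(len(rows1), "rows for the first query")
```

An index that never shows up in the plans of the queries that matter is a candidate for dropping.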

What is an automatic covering index?

When using EXPLAIN QUERY PLAN in SQLite 3 it sometimes gives me output such as
SEARCH TABLE staff AS s USING AUTOMATIC COVERING INDEX (is_freelancer=? AND sap=?) (~6 rows)
Where does the index come from and what does it do? The table has no manually created indices on it.
"Automatic" means that SQLite creates a temporary index that is used only for this query, and deleted afterwards.
This happens when the cost of creating the index is estimated to be smaller than the cost of looking up records in the table without the index.
(A covering index is an index that contains all the columns to be read, which means that the record corresponding to the index entry does not need to be looked up in the table.)
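An automatic index typically appears when two tables are joined on a column that has no index. The sketch below uses Python's sqlite3 module to provoke one and then turns the feature off with PRAGMA automatic_index; the shifts table, the column layout of staff, and the query_plan helper are assumptions, and whether the planner really chooses an automatic index depends on its cost estimates.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "\n".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical tables; neither has an index on the join column sap.
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, sap INTEGER, is_freelancer INTEGER)"
)
conn.execute(
    "CREATE TABLE shifts (id INTEGER PRIMARY KEY, sap INTEGER, hours INTEGER)"
)

sql = "SELECT * FROM shifts AS sh JOIN staff AS s ON s.sap = sh.sap"

# With default settings the planner may build a temporary index for the join.
plan_default = query_plan(conn, sql)

# With the feature disabled it has to fall back to nested scans.
conn.execute("PRAGMA automatic_index = OFF")
plan_disabled = query_plan(conn, sql)

print(plan_default)
print(plan_disabled)
```

If the same automatic index keeps appearing for a frequently run query, creating a permanent index on those columns avoids rebuilding it every time.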
