What causes SQLite join constraints with OR clauses to be significantly slower?

Here are two sqlite queries:
SELECT * FROM items JOIN licenses ON items.id=licenses.id OR items.type=licenses.type;
And here is an equivalent query that uses UNION instead of OR:
SELECT * FROM items JOIN licenses ON items.id=licenses.id UNION SELECT * FROM items JOIN licenses ON items.type=licenses.type;
Assuming the licenses table has an index on id and an index on type, shouldn't the first query, which uses OR, be only a tiny bit slower?
I am seeing that the first query is approximately 20 times slower than the second in SQLite. What is the cause of that?
I would expect the internal plan to look something like this for the first query:
For each row in the items table:
Take the value from the id column of the items table and use it to lookup all rows in the licenses table with that id, call that set of matching rows A.
Take the value from the type column of the items table and use it to lookup all rows in the licenses table with that type, call that set of matching rows A'.
Combine A and A' and eliminate any duplicate rows. Add the result to the list of result rows.

For joins, SQLite supports only nested loop joins over two tables at a time, which can be optimized with an index on the join column of the inner table.
As explained in The SQLite Query Planner and Query Planning, looking up rows in the inner table through two indexes at once, which is what an OR-connected join constraint would require, is not one of the supported optimizations, so SQLite falls back to scanning the entire licenses table for every row of items.
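To see this concretely, compare the plans SQLite chooses for the two queries (a sketch; the schema and index names below are assumptions matching the question, and the exact output varies by SQLite version):
-- Hypothetical schema matching the question:
CREATE TABLE items (id INTEGER, type TEXT);
CREATE TABLE licenses (id INTEGER, type TEXT);
CREATE INDEX licenses_id ON licenses(id);
CREATE INDEX licenses_type ON licenses(type);
-- The OR constraint typically forces a scan of licenses for every row of items:
EXPLAIN QUERY PLAN
SELECT * FROM items JOIN licenses ON items.id=licenses.id OR items.type=licenses.type;
-- Each half of the UNION can use one index for its lookups:
EXPLAIN QUERY PLAN
SELECT * FROM items JOIN licenses ON items.id=licenses.id
UNION
SELECT * FROM items JOIN licenses ON items.type=licenses.type;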

Related

SQLite: treat non-existent column as NULL

I have a query like this (simplified and anonymised):
SELECT
Department.id,
Department.name,
Department.manager_id,
Employee.name AS manager_name
FROM
Department
LEFT OUTER JOIN Employee
ON Department.manager_id = Employee.id;
The field Department.manager_id may be NULL. If it is non-NULL then it is guaranteed to be a valid id for precisely one row in the Employee table, so the OUTER JOIN is there just for the rows in the Department table where it is NULL.
Here is the problem: old instances of the database do not have the Department.manager_id column at all. In those cases, I would like the query to act as if the column existed but were always NULL, so that e.g. the manager_name field is returned as NULL. If the query used only the Department table, I could just use SELECT * and check for the column in my application, but the JOIN seems to make this impossible. I would prefer not to modify the database, partly so that I can load it in read-only mode. Can this be done just by clever adjustment of the query?
For completeness, here is an answer that does not require munging both possible schemas into one query (but still doesn't need you to actually do the schema migration):
Check for the schema version, and use that to determine which SELECT query to issue (i.e. with or without the manager_id column and JOIN) as a separate step. Here are a few possibilities for determining the schema version (a short PRAGMA sketch follows the list):
The ideal situation is that you already keep track of the schema by assigning version numbers to the schema and recording them in the database. Commonly this is done with either:
The user_version pragma.
A table called "Schema" or similar with one row containing the schema version number.
You can directly determine whether the column is present in the table. Two possibilities:
Use the table_info pragma to determine the list of columns in the table.
Use a simple SELECT * FROM Table LIMIT 1 and look at what columns are returned (this is probably better as it is independent of the database engine).
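For example (a sketch; both PRAGMAs are built into SQLite):
-- Schema version, if you record one when migrating:
PRAGMA user_version;
-- Column listing; look for a row whose name column is 'manager_id':
PRAGMA table_info(Department);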
This seems to work:
SELECT
Dept.id,
Dept.name,
Dept.manager_id,
Employee.name AS manager_name
FROM
(SELECT *, NULL AS manager_id FROM Department) AS Dept
LEFT OUTER JOIN Employee
ON Dept.manager_id = Employee.id;
If the manager_id column is present in Department then it is used for the join, whereas if it is not then Dept.manager_id and Employee.name are both NULL.
If I swap the column order in the subquery:
(SELECT NULL AS manager_id, * FROM Department) AS Dept
then Dept.manager_id and Employee.name are both NULL even if the Department.manager_id column exists, so it seems that Dept.manager_id refers to the first column in the Dept subquery that has that name. It would be good to find a reference in the SQLite documentation saying that this behaviour is guaranteed (or explicitly saying that it is not), but I can't find anything (e.g. in the SELECT or expression pages).
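A minimal way to observe this resolution order on a scratch table (hypothetical names, mirroring the behaviour described above rather than any documented guarantee):
CREATE TABLE t (manager_id INTEGER);
INSERT INTO t VALUES (42);
-- The first column named manager_id in the subquery appears to win in each case:
SELECT manager_id FROM (SELECT *, NULL AS manager_id FROM t); -- returns 42
SELECT manager_id FROM (SELECT NULL AS manager_id, * FROM t); -- returns NULL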
I haven't tried this with other database systems so I don't know if it will work with anything other than SQLite.

SQL query for extracting one column from many tables

I need your support for a query in SQLite Studio.
I am dealing with a database made up of 1,000 different tables.
Half of them (all named "news" plus an identification number, like 04AD86) contain the column "category", which I am interested in. This column can have from 100 to 200 records per table.
Could you suggest a query that extracts "category" from every table and returns a list of all possible categories (without duplicate records)?
Thanks a lot
You will probably need dynamic SQL to handle this in a single query. If you don't mind doing this over several queries, then here is one option. First do a query to obtain all the tables which contain the category column:
SELECT name
FROM sqlite_master
WHERE type = 'table' AND name LIKE 'news%'
Next, for the actual queries to obtain the unique categories, you can perform a series of unions to get your list. Here is what it would look like:
SELECT DISTINCT category
FROM news04AD86
UNION
SELECT DISTINCT category
FROM news05BG34
UNION
...
The DISTINCT keyword will remove duplicates within any one news table, and UNION will remove duplicates that might occur between one table and another.
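If writing out 500 UNIONs by hand is impractical, you can also let SQLite generate the statement text for you, then execute the generated string as a second step (a sketch using group_concat; the double quotes guard against unusual table names):
SELECT group_concat('SELECT DISTINCT category FROM "' || name || '"', ' UNION ')
FROM sqlite_master
WHERE type = 'table' AND name LIKE 'news%';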

What is the fastest way of selecting by a list of strings in sqlite database?

I have a database with roughly the following structure:
table1 (name) -< table2 -< table3 (score)
where -< means a 1-to-many relationship. What I need to do is, for every string in a given list, find the linked entry from table3 with the maximum score value. The way I do it now is quite slow, and I wonder if it could be sped up.
How I am doing this:
SELECT k.score,k.yaw,k.pitch,k.roll,k.kp_number,k.ke_number,k.points,k.elems --various fields of third table
FROM File
JOIN FaceDetection AS d ON d.f_id=File.file_id --joining second table
JOIN FaceKey AS k ON k.face_det=d.fd_id --joining third table
WHERE name=:fld
ORDER BY k.score DESC
I open a transaction, prepare a query with the above text, retrieve the entries I am interested in from the database in a loop, and then commit the transaction. What are better, faster ways?
Indexes can be used for all the columns that are used for lookups or sorting, but a query cannot use more than one index per table.
Check the EXPLAIN QUERY PLAN output to see whether this query does table scans or uses indexes.
You are not returning values from any table but FaceKey, so you do not actually need to do a join.
However, rewriting the query as below might or might not help:
SELECT score,
yaw,
pitch,
roll,
kp_number,
ke_number,
points,
elems
FROM FaceKey
WHERE face_det IN (SELECT fd_id
FROM FaceDetection
WHERE f_id IN (SELECT file_id
FROM File
WHERE name = :fld))
ORDER BY score DESC
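If the lookup columns are not already indexed (any INTEGER PRIMARY KEY among them needs no extra index), here is a sketch of the indexes that would let each IN lookup above avoid a table scan (index names are hypothetical):
CREATE INDEX idx_File_name ON File(name);
CREATE INDEX idx_FaceDetection_f_id ON FaceDetection(f_id);
CREATE INDEX idx_FaceKey_face_det ON FaceKey(face_det);
Re-check the EXPLAIN QUERY PLAN output afterwards to confirm the table scans are gone.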

SQLite - Selecting a non-indexed column in GROUP BY

I have a situation similar to the question below.
Mysql speed up max() group by
SELECT MAX(id) id, cid FROM table GROUP BY cid
To optimize the above query (shown in that question), creating an index on (cid, id) does the trick.
However, when I add a non-indexed column to the SELECT, the query slows down drastically.
For example,
SELECT MAX(id) id, cid, newcolumn FROM table GROUP BY cid
If I create an index on (cid, id, newcolumn), the query time goes back to minimal. It seems I have to index every column I select when using GROUP BY.
Is there any way other than indexing all the columns to be selected?
When all the columns used in the query are part of the index (which is then called a covering index), SQLite can get all values from the index and does not need to access the table itself.
When adding a column that is not indexed, each record must be looked up in both the index and the table.
Furthermore, the order of the records in the table is unlikely to be the same as the order in the index, so the table's pages are not read in order, and are read multiple times, which means that caching will not work as well.
The newcolumn values must be read from either the table or an index; there is no other mechanism to store data.
tl;dr: no
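For reference, the covering index described above would be created like this (a sketch; "table" stands in for your actual table name):
CREATE INDEX idx_cid_id_newcolumn ON "table"(cid, id, newcolumn);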

SQLite: slow joined query through 4 tables

I have a query over 4 tables: times, tags, users, and categories.
Each table has no more than 400 records, but this query takes 70 ms.
I need to run it many times (400×), so the whole procedure takes about 30 seconds.
SELECT COUNT(*) FROM times
INNER JOIN tags ON times.user_id = tags.tag_id
INNER JOIN users ON tags.user_nr = users.nr
INNER JOIN categories ON users.category_id = categories.id
WHERE (times.time_raw < "000560")
AND (times.time_raw != 0 )
AND (times.cell != 1 )
AND (categories.name="kategory_A")
AND (times.run_id="08")
How can I make it faster?
Indexes are the solution!
The following list gives guidelines for choosing columns to index (a sketch applying them to the query above follows the list):
• You should create indexes on columns that are used frequently in WHERE clauses.
• You should create indexes on columns that are used frequently to join tables.
• You should create indexes on columns that are used frequently in ORDER BY clauses.
• You should create indexes on columns that have few of the same values, or unique values, in the table.
• You should not create indexes on small tables (tables that use only a few blocks) because a full table scan may be faster than an indexed query.
• If possible, choose a primary key that orders the rows in the most appropriate order.
• If only one column of the concatenated index is used frequently in WHERE clauses, place that column first in the CREATE INDEX statement.
• If more than one column in a concatenated index is used frequently in WHERE clauses, place the most selective column first in the CREATE INDEX statement.
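Applying these guidelines to the query above gives something like the following (a sketch; index names are hypothetical, and any column that is already an INTEGER PRIMARY KEY needs no extra index):
-- Join columns:
CREATE INDEX idx_tags_tag_id ON tags(tag_id);
CREATE INDEX idx_users_nr ON users(nr);
-- Filter columns:
CREATE INDEX idx_times_run_id ON times(run_id, time_raw);
CREATE INDEX idx_categories_name ON categories(name);
Check with EXPLAIN QUERY PLAN which of these the query planner actually uses, and drop the rest.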
