SQLite Query plan - sqlite

Is there a way to manipulate the query plan generated in SQLite?
I'll try to explain my problem:
I have 3 tables:
CREATE TABLE "index_term" (
"id" INT,
"term" VARCHAR(255) NOT NULL,
PRIMARY KEY("id"),
UNIQUE("term"));
CREATE TABLE "index_posting" (
"doc_id" INT NOT NULL,
"term_id" INT NOT NULL,
PRIMARY KEY("doc_id", "field_id", "term_id"),
CONSTRAINT "index_posting_doc_id_fkey" FOREIGN KEY ("doc_id")
REFERENCES "document"("doc_id") ON DELETE CASCADE,
CONSTRAINT "index_posting_term_id_fkey" FOREIGN KEY ("term_id")
REFERENCES "index_term"("id") ON DELETE CASCADE);
CREATE INDEX "index_posting_term_id_idx" ON "index_posting"("term_id");
CREATE TABLE "published_files" (
"doc_id" INTEGER NOT NULL,
"uri_id" INTEGER,
"user_id" INTEGER NOT NULL,
"status" INTEGER NOT NULL,
"title" VARCHAR(1024),
PRIMARY KEY("uri_id"));
CREATE INDEX "published_files_doc_id_idx" ON "published_files"("doc_id");
There are about 600,000 entries in index_term, about 4 million in index_posting, and 300,000 in the published_files table.
When I want to find the number of unique doc_ids in index_posting that reference some terms, I use the following SQL:
select count(distinct index_posting.doc_id) from index_term, index_posting
where
index_posting.term_id = index_term.id and index_term.term like '%test%'
The result comes back in reasonable time (0.3 seconds). EXPLAIN QUERY PLAN returns:
0|0|0|SCAN TABLE index_term
0|1|1|SEARCH TABLE index_posting USING INDEX index_posting_term_id_idx (term_id=?)
When I want to filter the count so that it only includes doc_ids of index_posting for which a published_files entry exists:
select count(distinct index_posting.doc_id) from index_term, index_posting,
published_files where
index_posting.term_id = index_term.id and index_posting.doc_id = published_files.doc_id and index_term.term like '%test%'
This query takes almost 10 times as long. EXPLAIN QUERY PLAN returns:
0|0|1|SCAN TABLE index_posting
0|1|0|SEARCH TABLE index_term USING INDEX sqlite_autoindex_index_term_1 (id=?)
0|2|2|SEARCH TABLE published_files AS pf USING COVERING INDEX published_files_doc_id_idx (doc_id=?)
As far as I understand, SQLite changed its query plan here: it now does a full table scan of index_posting and a lookup in index_term instead of the other way around.
As a workaround I ran
analyze index_posting;
analyze index_term;
analyze published_files;
and now the plan looks correct:
0|0|0|SCAN TABLE index_term
0|1|1|SEARCH TABLE index_posting USING INDEX index_posting_term_id_idx (term_id=?)
0|2|2|SEARCH TABLE published_files USING COVERING INDEX published_files_doc_id_idx (doc_id=?)
But my question is: is there a way to force SQLite to always use the correct query plan?
TIA

ANALYZE is not a workaround; it's supposed to be used.
You can use CROSS JOIN to enforce a certain order of the nested loops, or use INDEXED BY to force a certain index to be used.
However, you asked for "the correct query plan", which might not be the same as the one enforced by these mechanisms.
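As a rough illustration of both mechanisms, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a simplified version of the question's schema (field_id, the foreign keys, and the unused columns are left out) and empty tables; the helper name loop_order is made up for this example.

```python
import sqlite3

# Simplified schema from the question (assumption: foreign keys, field_id and
# unused columns are omitted because they do not affect the join order).
SCHEMA = """
CREATE TABLE index_term (id INT, term VARCHAR(255) NOT NULL,
                         PRIMARY KEY (id), UNIQUE (term));
CREATE TABLE index_posting (doc_id INT NOT NULL, term_id INT NOT NULL,
                            PRIMARY KEY (doc_id, term_id));
CREATE INDEX index_posting_term_id_idx ON index_posting (term_id);
CREATE TABLE published_files (doc_id INTEGER NOT NULL, uri_id INTEGER,
                              PRIMARY KEY (uri_id));
CREATE INDEX published_files_doc_id_idx ON published_files (doc_id);
"""

# CROSS JOIN pins the nested-loop order to the order written in FROM;
# INDEXED BY makes the statement fail if the named index cannot be used.
QUERY = """
SELECT count(DISTINCT index_posting.doc_id)
FROM index_term
CROSS JOIN index_posting INDEXED BY index_posting_term_id_idx
CROSS JOIN published_files
WHERE index_posting.term_id = index_term.id
  AND index_posting.doc_id = published_files.doc_id
  AND index_term.term LIKE '%test%'
"""


def loop_order(conn):
    """Return the SCAN/SEARCH steps of the plan, outermost loop first."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    details = [row[3] for row in rows]  # the 4th column holds the description
    return [d for d in details if d.startswith(("SCAN", "SEARCH"))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for step in loop_order(conn):
    print(step)
```

The exact wording of the plan lines differs between SQLite versions, but the order of the loops should follow the FROM clause regardless of whether ANALYZE has been run. The price is that the planner can no longer pick a better order if the data changes.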

Related

How to write complex recursive maria db query

I'm trying to write a recursive query for use on an old and poorly designed database, so the queries get quite complex.
Here are the (relevant) table relationships.
Because people asked, here is the creation code for these tables:
CREATE TABLE CircuitLayout(
CircuitLayoutID int,
PRIMARY KEY (CircuitLayoutID)
);
CREATE TABLE LitCircuit (
LitCircuitID int,
CircuitLayoutID int,
PRIMARY KEY (LitCircuitID),
FOREIGN KEY (CircuitLayoutID) REFERENCES CircuitLayout(CircuitLayoutID)
);
CREATE TABLE CircuitLayoutItem(
CircuitLayoutItemID int,
CircuitLayoutID int,
TableName varchar(255),
TablePK int,
PRIMARY KEY (CircuitLayoutItemID),
FOREIGN KEY (CircuitLayoutID) REFERENCES CircuitLayout(CircuitLayoutID)
);
TableName refers to another table in the database, so TablePK is a primary key from the specified table.
One of the valid options for TableName is LitCircuit.
I'm trying to write a query that will select a circuit and any circuit it is related to.
I am having trouble understanding the syntax for recursive CTEs.
My non-functional attempt is this:
WITH RECURSIVE carries AS (
SELECT LitCircuit.LitCircuitID AS recurseList FROM LitCircuit
JOIN CircuitLayoutItem ON LitCircuit.CircuitLayoutID = CircuitLayoutItem.CircuitLayoutID
WHERE CircuitLayoutItem.TableName = "LitCircuit" AND CircuitLayoutItem.TablePK IN (00340)
UNION
SELECT LitCircuit.LitCircuitID AS CircuitIDs FROM LitCircuit
JOIN CircuitLayout ON LitCircuit.CircuitLayoutID = CircuitLayoutItem.CircuitLayoutID
WHERE CircuitLayoutItem.TableName = "LitCircuit" AND CircuitLayoutItem.TablePK IN (SELECT recurseList FROM carries)
)
SELECT * FROM carries;
The "00340" is a dummy number for testing; in real use it would be replaced with an actual list.
What I'm attempting to do is get a list of LitCircuitIDs based on one or many LitCircuitIDs. That is the anchor member, and it works fine.
What I want to do is take this result and feed it back into itself.
I lack an understanding of how to access data from the anchor member:
I don't know whether it is a table with the columns from the SELECT in the anchor or simply a list of resulting values.
I don't understand if or where I need to include "carries" in the FROM part of a query.
If I were to write this function in python I would do it like this:
def get_circuits(circuit_list):
    result_list = []
    # Find every layout item that points at one of the given circuits.
    for layout_item_key, layout_item in CircuitLayoutItem.items():
        if layout_item['TableName'] == "LitCircuit" and layout_item['TablePK'] in circuit_list:
            layout = layout_item['CircuitLayoutID']
            # Collect the circuits that use that layout.
            for circuit_key, circuit in LitCircuit.items():
                if circuit["CircuitLayoutID"] == layout:
                    result_list.append(circuit_key)
    # Recurse only while new circuits turn up (assumes the data has no cycles).
    if result_list:
        result_list.extend(get_circuits(result_list))
    return result_list
How do I express this in SQL?
danblack's comment made me realize something I was missing:
Here is what I was trying to do:
WITH RECURSIVE carries AS (
SELECT LitCircuit.LitCircuitID FROM LitCircuit
JOIN CircuitLayoutItem ON LitCircuit.CircuitLayoutID = CircuitLayoutItem.CircuitLayoutID
WHERE CircuitLayoutItem.TableName = 'LitCircuit' AND CircuitLayoutItem.TablePK IN (00340)
UNION ALL
SELECT LitCircuit.LitCircuitID FROM carries
JOIN CircuitLayoutItem ON carries.LitCircuitID = CircuitLayoutItem.TablePK
JOIN LitCircuit ON CircuitLayoutItem.CircuitLayoutID = LitCircuit.CircuitLayoutID
WHERE CircuitLayoutItem.TableName = 'LitCircuit'
)
SELECT DISTINCT LitCircuitID FROM carries;
I did not think of the CTE as a table to query against - rather just a result set, so I did not realize you have to SELECT from it - or in general treat it like a table.
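To see the "CTE as a table" idea run end to end, here is a small sketch using Python's built-in sqlite3 module. Note the assumptions: it runs on SQLite rather than MariaDB (both accept WITH RECURSIVE), the sample rows are made up, and it uses UNION instead of UNION ALL so that the recursion also stops if the data contains a cycle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CircuitLayout (CircuitLayoutID INT PRIMARY KEY);
CREATE TABLE LitCircuit (LitCircuitID INT PRIMARY KEY, CircuitLayoutID INT);
CREATE TABLE CircuitLayoutItem (CircuitLayoutItemID INT PRIMARY KEY,
                                CircuitLayoutID INT,
                                TableName VARCHAR(255), TablePK INT);
-- Made-up sample data: layout 2 contains circuit 340, layout 3 contains 341.
INSERT INTO CircuitLayout VALUES (1), (2), (3);
INSERT INTO LitCircuit VALUES (340, 1), (341, 2), (342, 3);
INSERT INTO CircuitLayoutItem VALUES (1, 2, 'LitCircuit', 340),
                                     (2, 3, 'LitCircuit', 341);
""")

QUERY = """
WITH RECURSIVE carries(LitCircuitID) AS (
    -- Anchor member: circuits whose layout contains the starting circuit.
    SELECT LitCircuit.LitCircuitID FROM LitCircuit
    JOIN CircuitLayoutItem
      ON LitCircuit.CircuitLayoutID = CircuitLayoutItem.CircuitLayoutID
    WHERE CircuitLayoutItem.TableName = 'LitCircuit'
      AND CircuitLayoutItem.TablePK = ?
    UNION
    -- Recursive member: carries is read like a table that holds the rows
    -- produced by the previous step.
    SELECT LitCircuit.LitCircuitID FROM carries
    JOIN CircuitLayoutItem ON carries.LitCircuitID = CircuitLayoutItem.TablePK
    JOIN LitCircuit
      ON CircuitLayoutItem.CircuitLayoutID = LitCircuit.CircuitLayoutID
    WHERE CircuitLayoutItem.TableName = 'LitCircuit'
)
SELECT LitCircuitID FROM carries ORDER BY LitCircuitID
"""

related = [row[0] for row in conn.execute(QUERY, (340,))]
print(related)  # [341, 342]
```

Circuit 341 comes from the anchor member, and 342 is found only because the recursive member selects FROM carries and joins onward from the rows already found.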

How to check if a parent/child relationship exists in a tree?

Checking if the following tables have a certain relationship among their records would be useful:
-- Table: privilege_group
CREATE TABLE privilege_group (
privilege_group_id integer NOT NULL CONSTRAINT privilege_group_pk PRIMARY KEY AUTOINCREMENT,
name text NOT NULL,
CONSTRAINT privilege_group_name UNIQUE (name)
);
-- Table: privilege_relationship
CREATE TABLE privilege_relationship (
privilege_relationship_id integer NOT NULL CONSTRAINT privilege_relationship_pk PRIMARY KEY AUTOINCREMENT,
parent_id integer NOT NULL,
child_id integer NOT NULL,
CONSTRAINT privilege_relationship_parent_child UNIQUE (parent_id, child_id),
CONSTRAINT privilege_relationship_parent_id FOREIGN KEY (parent_id)
REFERENCES privilege_group (privilege_group_id),
CONSTRAINT privilege_relationship_child_id FOREIGN KEY (child_id)
REFERENCES privilege_group (privilege_group_id),
CONSTRAINT privilege_relationship_check CHECK (parent_id != child_id)
);
Parents can have many children, children can have many parents. Writing code to process records outside of the database is always possible, but is it possible to use a depth-first (or breadth-first) search to check if a child has a particular parent?
My related question received a comment from CL. that mentions the WITH clause, but my experience with hierarchical queries is rather limited and insufficient to understand, select, and apply the examples on the page to my goal:
Only worked with hierarchical queries in Oracle.
Only used to implement "range" number generators (like in Python).
Only seen how to process records in a broad-to-narrow pattern.
Not sure if an expanding result set in a hierarchical query is possible.
Unsure of how to select a depth-first or breadth-first search strategy.
Could someone show me how to find out if a child has a parent if the names of both are known?
This is a standard tree search (using UNION instead of UNION ALL to prevent infinite loops):
WITH RECURSIVE ParentsOfG1(id) AS (
SELECT privilege_group_id
FROM privilege_group
WHERE name = 'G1'
UNION
SELECT parent_id
FROM privilege_relationship
JOIN ParentsOfG1 ON id = child_id
)
SELECT id
FROM ParentsOfG1
WHERE id = (SELECT privilege_group_id
FROM privilege_group
WHERE name = 'P2');
Depth/breadth-first does not matter for this.
An alternative to CL.'s answer could be this query which has been reformatted and adjusted to use bound parameters that could be plugged into a project that needs to check certain relationships:
WITH RECURSIVE parent_of_child(id)
AS (
SELECT privilege_group_id
FROM privilege_group
WHERE name = :child
UNION
SELECT parent_id
FROM privilege_relationship
JOIN parent_of_child
ON id = child_id)
SELECT id
FROM parent_of_child
WHERE id = (
SELECT privilege_group_id
FROM privilege_group
WHERE name = :parent)
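For plugging the parameterized query into a project, a minimal sketch with Python's built-in sqlite3 module might look like this. The sample groups, the trimmed-down constraint names, and the helper has_ancestor are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE privilege_group (
    privilege_group_id integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    name text NOT NULL UNIQUE);
CREATE TABLE privilege_relationship (
    privilege_relationship_id integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    parent_id integer NOT NULL REFERENCES privilege_group,
    child_id integer NOT NULL REFERENCES privilege_group,
    UNIQUE (parent_id, child_id), CHECK (parent_id != child_id));
-- Made-up data: P2 is a parent of P1, P1 is a parent of G1, X is unrelated.
INSERT INTO privilege_group (privilege_group_id, name)
VALUES (1, 'G1'), (2, 'P1'), (3, 'P2'), (4, 'X');
INSERT INTO privilege_relationship (parent_id, child_id)
VALUES (2, 1), (3, 2);
""")

QUERY = """
WITH RECURSIVE parent_of_child(id) AS (
    SELECT privilege_group_id FROM privilege_group WHERE name = :child
    UNION
    SELECT parent_id
    FROM privilege_relationship
    JOIN parent_of_child ON id = child_id)
SELECT id
FROM parent_of_child
WHERE id = (SELECT privilege_group_id FROM privilege_group
            WHERE name = :parent)
"""


def has_ancestor(conn, child, parent):
    """Return True if `parent` is reachable by walking up from `child`."""
    params = {"child": child, "parent": parent}
    return conn.execute(QUERY, params).fetchone() is not None


print(has_ancestor(conn, "G1", "P2"))  # True
print(has_ancestor(conn, "G1", "X"))   # False
```

Because the anchor member puts the child itself into the result set, has_ancestor(conn, "G1", "G1") also returns True; filter that case out in the caller if a group should not count as its own parent.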

SQLite search two tables

In a SQLite database, I have created two tables:
CREATE Table Master (ItemID VARCHAR PRIMARY KEY, Property VARCHAR)
CREATE Table Counter (OtherID VARCHAR PRIMARY KEY, ItemID VARCHAR)
Records on table Master:
* ItemID: Book, Property: large
* ItemID: Table, Property: green
Records on table Counter:
* OtherID: random1, ItemID: Book
* OtherID: random2, ItemID: Book
* OtherID: random3, ItemID: Book
The column ItemID on table Master has the same contents as the same-named column on table Counter.
What is the correct SQL select statement to get all rows from table Master sorted by the number of their records in table Counter ?
In this case, row "Book" has three counts in table Counter and should be listed on first position, while row "Table" has no counts and should be the second result.
I know how to do this on one table but never managed to get a SQL select statement working that spans two tables.
Any help is appreciated.
By the way: I cannot change the table structure; so not sure if there would be something better, but I have to work with the tables as they are.
attach to two different databases
access tables with "db?." in front
join both tables on the common semantic, i.e. the ItemId
left join to get the "empty" lines, too, with "0" count
make groups which represent the lines you want in the output, i.e. also by ItemId
grouping allows using the aggregate function "count()"
order according to desired output, i.e. by count, but descending to get "3" first
select the ItemId and the property to match desired output
Code:
attach 'master.db' as dbm;
attach 'counter.db' as dbc;
select a.ItemId, property
from dbm.Master a LEFT JOIN dbc.Counter b
using (ItemId)
group by a.ItemId
order by count(OtherId) desc;
Tested with:
echo .dump | sqlite3 counter.db
BEGIN TRANSACTION;
CREATE TABLE Counter (OtherID VARCHAR PRIMARY KEY, ItemID VARCHAR);
INSERT INTO Counter VALUES('random1','book');
INSERT INTO Counter VALUES('random2','book');
INSERT INTO Counter VALUES('random3','book');
COMMIT;
echo .dump | sqlite3 master.db
BEGIN TRANSACTION;
CREATE TABLE Master (ItemID VARCHAR PRIMARY KEY, Property VARCHAR);
INSERT INTO Master VALUES('book','large');
INSERT INTO Master VALUES('table','green');
COMMIT;
Output:
book|large
table|green
If I understand you, I think this should work:
SELECT M.ItemId, Property
FROM Master M
LEFT JOIN Counter C
ON M.itemid=C.itemid
GROUP BY M.ItemId
ORDER BY COUNT(C.itemid) DESC;
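A quick way to check the LEFT JOIN approach in a single database is the sketch below, using Python's built-in sqlite3 module. The extra 'Chair' row and the column alias uses are made up for this example; grouping is done on the Master key so that every Master row keeps its own group, including rows with no match in Counter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Master (ItemID VARCHAR PRIMARY KEY, Property VARCHAR);
CREATE TABLE Counter (OtherID VARCHAR PRIMARY KEY, ItemID VARCHAR);
-- 'Chair' is a made-up second item without any Counter rows.
INSERT INTO Master VALUES ('Book', 'large'), ('Table', 'green'),
                          ('Chair', 'old');
INSERT INTO Counter VALUES ('random1', 'Book'), ('random2', 'Book'),
                           ('random3', 'Book');
""")

rows = conn.execute("""
    SELECT M.ItemID, M.Property, COUNT(C.OtherID) AS uses
    FROM Master AS M
    LEFT JOIN Counter AS C ON M.ItemID = C.ItemID
    GROUP BY M.ItemID               -- one group per Master row
    ORDER BY uses DESC, M.ItemID    -- ties are broken by name
""").fetchall()

for row in rows:
    print(row)
# ('Book', 'large', 3)
# ('Chair', 'old', 0)
# ('Table', 'green', 0)
```

COUNT(C.OtherID) counts only non-NULL values, which is why the unmatched rows produced by the LEFT JOIN show up with 0 rather than 1.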

sqlite3 join-filter order-by performance

I'm trying to run a query like this on an sqlite3 database:
select node.loc, node.weight from node
inner join filt on (node.id = filt.node_id)
inner join filt T5 on (node.id = T5.node_id)
where (filt.word = 'aaa' and T5.word = 'aasvogel')
order by node.weight desc limit 10;
On MySQL, this query runs fine and fast (<0.2s); on sqlite3, on the same data, it takes ~2s.
What could be the problem and what can I do to improve its performance?
The files I made to test sqlite3 can be found here: https://github.com/HoverHell/sqlperftst1
The table definitions in particular:
CREATE TABLE "node" (
"id" integer PRIMARY KEY,
"loc" varchar(255) NOT NULL,
"weight" real
);
CREATE INDEX "node_loc" ON "node" ("loc");
CREATE INDEX "node_weight" on "node" ("weight");
CREATE TABLE "filt" (
"id" integer PRIMARY KEY,
"node_id" integer NOT NULL,
"word" varchar(120) NOT NULL
);
CREATE INDEX "filt_word" ON "filt" ("word");
CREATE INDEX "filt_node_id" ON "filt" ("node_id");
UPD: Performance comparison on realistic data and queries:
This query can be improved by creating an index on both columns used for lookups:
CREATE INDEX filt_word_node_id ON filt(word, node_id);
However, the shape of the data is unusual, which makes the query optimizer misestimate the selectivity of the word lookups.
Run ANALYZE to fix this.
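The two suggestions can be tried together with a sketch like the following, written with Python's built-in sqlite3 module. The data here is made up and far smaller than the linked test files, so it only shows the mechanics (create the composite index, run ANALYZE, inspect the plan) and says nothing about real timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id integer PRIMARY KEY, loc varchar(255) NOT NULL,
                   weight real);
CREATE INDEX node_weight ON node (weight);
CREATE TABLE filt (id integer PRIMARY KEY, node_id integer NOT NULL,
                   word varchar(120) NOT NULL);
CREATE INDEX filt_word ON filt (word);
CREATE INDEX filt_node_id ON filt (node_id);
""")

# Made-up data: every node has the word 'aaa', every 4th also 'aasvogel'.
for i in range(1, 201):
    conn.execute("INSERT INTO node VALUES (?, ?, ?)", (i, f"loc{i}", float(i)))
    conn.execute("INSERT INTO filt (node_id, word) VALUES (?, 'aaa')", (i,))
    if i % 4 == 0:
        conn.execute(
            "INSERT INTO filt (node_id, word) VALUES (?, 'aasvogel')", (i,))

# The composite index from the answer, followed by fresh statistics.
conn.execute("CREATE INDEX filt_word_node_id ON filt (word, node_id)")
conn.execute("ANALYZE")

QUERY = """
SELECT node.loc, node.weight FROM node
INNER JOIN filt ON node.id = filt.node_id
INNER JOIN filt T5 ON node.id = T5.node_id
WHERE filt.word = 'aaa' AND T5.word = 'aasvogel'
ORDER BY node.weight DESC LIMIT 10
"""

rows = conn.execute(QUERY).fetchall()
print(rows[0])  # ('loc200', 200.0)

# Print the plan to see which indexes the planner picked after ANALYZE.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(step[3])
```

ANALYZE stores its statistics in the database file, so it only needs to be rerun when the distribution of the data changes substantially.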

sqlite3 if exists alternative

I am trying to run a simple query in SQLite that inserts a record if it does not already exist.
I could have used INSERT OR REPLACE, but the article_tags table doesn't have a primary key because it is a relational table.
How can I write this query for SQLite, given that IF NOT EXISTS is not supported?
I also have no idea how to use CASE for this.
Table Structure:
articles(id, content)
article_tags(article_id, tag_id)
tag(id, name)
Incorrect SQLite syntax I tried:
insert into article_tags (article_id, tag_id ) values ( 2,7)
if not exists (select 1 from article_tags where article_id =2 AND tag_id=7)
I think the correct way to do this would be to add a primary key to the article_tags table, a composite one crossing both columns. That's the normal way to do many-to-many relationship tables.
In other words (pseudo-DDL):
create table article_tags (
article_id int references articles(id),
tag_id int references tag(id),
primary key (article_id, tag_id)
);
That way, insertion of a duplicate pair would fail rather than having to resort to if not exists.
This seems to work, although I will, as others have also said, suggest that you add some constraints on the table and then simply "try" to insert.
insert into article_tags
select 2, 7
where not exists ( select *
from article_tags
where article_id = 2
and tag_id = 7 )
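If the composite primary key is added as suggested, the "simply try to insert" idea could be sketched as follows with Python's built-in sqlite3 module. INSERT OR IGNORE is SQLite's conflict clause for silently skipping a row that would violate a uniqueness constraint; it is not used in the answers above, and the helper name tag_article is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The references to articles and tag are left out to keep the sketch short.
conn.executescript("""
CREATE TABLE article_tags (
    article_id int,
    tag_id int,
    PRIMARY KEY (article_id, tag_id));
""")


def tag_article(conn, article_id, tag_id):
    """Insert the pair; a duplicate is skipped instead of raising an error."""
    conn.execute(
        "INSERT OR IGNORE INTO article_tags (article_id, tag_id) VALUES (?, ?)",
        (article_id, tag_id))


tag_article(conn, 2, 7)
tag_article(conn, 2, 7)  # second call hits the primary key and is ignored
count = conn.execute("SELECT count(*) FROM article_tags").fetchone()[0]
print(count)  # 1
```

Without the primary key (or some other UNIQUE constraint) on the pair, INSERT OR IGNORE has nothing to conflict with, and the WHERE NOT EXISTS form is the one that still works.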
