SQLite3: How can I force a query to not use an index on a given table?

I am doing performance analysis on the Mondial database using sqlite3. In one test case I have to compare performance with and without an index (the query should not use a sqlite_autoindex either).
I found the question "How can I force a query to not use a index on a given table?" very useful, but most of its answers refer to SQL Server, and I need this for SQLite3. (I have tried PRAGMA options, without success.)

It's buried in the syntax diagrams for SELECT, but there is a way. Put NOT INDEXED after the table name in the FROM clause:
sqlite> CREATE TABLE foo(bar);
sqlite> CREATE INDEX foo_idx ON foo(bar);
sqlite> EXPLAIN QUERY PLAN SELECT * FROM foo WHERE bar = ?;
QUERY PLAN
`--SEARCH TABLE foo USING COVERING INDEX foo_idx (bar=?)
sqlite> EXPLAIN QUERY PLAN SELECT * FROM foo NOT INDEXED WHERE bar = ?;
QUERY PLAN
`--SCAN TABLE foo
As you can see, the first query uses the index, and the second one doesn't.
See also: How to read EXPLAIN QUERY PLAN output.

Related

Use views and table valued functions as node or edge tables in match clauses

I would like to use table-valued functions in MATCH clauses in the same way as is possible with node tables. Is there a way to achieve this?
The need for table valued functions
There can be various use cases for using table-valued functions or views as node tables. For instance, mine is the following.
I have node tables that contain NVarChar(max) fields that I would like to search for literal text. I need only equality searching and no full-text searching, so I opted for an index on the hash value of the text field, as suggested by Remus Rusanu in his answer to "SQL server - worth indexing large string keys?" and in https://www.brentozar.com/archive/2013/05/indexing-wide-keys-in-sql-server/. A table-valued function takes care of using the CHECKSUM index; see "Msg 207 Invalid column name $node_id for pseudo column in inline table valued function".
Example data definitions
CREATE TABLE [Tags](
[tag] NVarChar(max),
[tagHash] AS CHECKSUM([Tag]) PERSISTED NOT NULL
) as Node;
CREATE TABLE [Sites](
[endPoint] NVarChar(max),
[endPointHash] AS CHECKSUM([endPoint]) PERSISTED NOT NULL
) as Node;
CREATE TABLE [Links] as Edge;
CREATE INDEX [IX_TagsByName] ON [Tags]([tagHash]);
GO
CREATE FUNCTION [TagsByName](
@tag NVarChar(max))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT
$node_id AS [NodeId],
[tag],
[tagHash]
FROM [dbo].[Tags]
WHERE [tagHash] = CHECKSUM(@tag) AND
[tag] = @tag;
[TagsByName] returns the $node_id with an alias NodeId as suggested by https://stackoverflow.com/a/45565410/814206. However, real Node tables contain two more internal columns which I do not know how to export.
Desired query
I would like to query the database similar to this:
SELECT *
FROM [TagsByName]('important') as t,
[Sites] as s,
[Links] as l
WHERE MATCH ([t]-([l])->[s])
However, this results in the following error (see footnote 1):
Msg 13901, Level 16, State 2, Line ...
Identifier 't' in a MATCH clause is not a node table or an alias for a node table.
Is there a way to do this?
PS. There are some workarounds, but they do not look as elegant as the MATCH query, especially considering that my actual query involves matching more relations and more string equality tests. I will post these workarounds as answers and hope that someone comes up with a better idea.
1 This is a very specific difference between views and tables, relevant to "Difference between View and table in sql"; it occurs only in sql-server-2017 and only when using SQL Graph.
Workaround
Revert to traditional relational joins via JOIN clauses or FROM with <table_or_view_name> and WHERE clauses. In queries that match on more relations, the latter has the advantage that sql-server-2017-graph can MATCH on FROM <table_or_view_name> but not on FROM <table_source> JOIN <table_source>.
SELECT *
FROM [TagsByName]('important') as t,
[Sites] as s,
[Links] as l
WHERE t.NodeId = l.$from_id AND
l.$to_id = s.$node_id;
Workaround
Add the Node table twice to the from clause: once as table and once as table valued function and join them via the $node_id in the where clause:
SELECT *
FROM [TagsByName]('important') as t1,
[Tags] as t2,
[Sites] as s,
[Links] as l
WHERE MATCH ([t2]-([l])->[s]) AND
t1.[NodeId] = t2.$node_id
Does this affect performance?
Workaround
Do not use the table valued function, but include its expression in the WHERE clause:
SELECT *
FROM [Tags] as t,
[Sites] as s,
[Links] as l
WHERE MATCH ([t]-([l])->[s]) AND
[t].[tagHash] = CHECKSUM('important') AND
[t].[tag] = 'important'
Downside: this is easy to get wrong, for example by forgetting the comparison on the CHECKSUM.

sqlite: SELECTing with UNION when the other table does not exist

I have two sqlite3 databases which have tables with identical schemas and (possibly) overlapping data. For example a Temperature table in both databases. If I want to get all the columns from both tables combined I will first ATTACH the other database:
sqlite> ATTACH DATABASE 'old.sqlite' AS Old;
and then combine them with UNION like this:
sqlite> SELECT * FROM Temperature UNION SELECT * FROM Old.Temperature;
This works fine.
However, sometimes the table exists in only one database. For example, I might have a Humidity table with no counterpart in the other database. In this case the query fails:
sqlite> SELECT * FROM Humidity UNION SELECT * FROM Old.Humidity;
SQL error: no such table: Old.Humidity
What I would like to get is all columns from the tables that do exist and not to fail just because the other table doesn't exist.
I don't know beforehand which tables exist in which databases. I only have the table names from all the databases combined into one list, and the part of the codebase that reads the data expects to get all the columns in one query.
It is not possible to create a query dynamically in SQL.
(SQLite is designed to be used from within an application written in a 'real' programming language.)
You have to check beforehand whether the table exists (use PRAGMA table_info, or test whether a query using that table works).
Then execute a query either with or without UNION.
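As a rough sketch of that approach, the following Python code (standard sqlite3 module) checks each schema's sqlite_master table for the table name and then builds the query with or without UNION. Checking sqlite_master is one way to test for existence, alongside the PRAGMA table_info check mentioned above. The function names existing_sources and select_all, the column names ts and value, and the use of in-memory databases in place of 'old.sqlite' are assumptions made for the example.

```python
import sqlite3

def existing_sources(conn, table, schemas=("main", "Old")):
    """Return schema-qualified names for every schema that contains `table`."""
    found = []
    for schema in schemas:
        # Schema names cannot be bound as parameters, so they are formatted in;
        # they come from a fixed tuple here, not from user input.
        row = conn.execute(
            f"SELECT 1 FROM {schema}.sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if row:
            found.append(f'{schema}."{table}"')
    return found

def select_all(conn, table, schemas=("main", "Old")):
    """SELECT * from every existing copy of `table`, combined with UNION."""
    sources = existing_sources(conn, table, schemas)
    if not sources:
        return []
    sql = " UNION ".join(f"SELECT * FROM {src}" for src in sources)
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Old")
conn.execute("CREATE TABLE main.Temperature(ts INTEGER, value REAL)")
conn.execute("CREATE TABLE Old.Temperature(ts INTEGER, value REAL)")
conn.execute("CREATE TABLE main.Humidity(ts INTEGER, value REAL)")  # no Old.Humidity
conn.execute("INSERT INTO main.Temperature VALUES (1, 20.5), (2, 21.0)")
conn.execute("INSERT INTO Old.Temperature VALUES (2, 21.0), (3, 19.0)")
conn.execute("INSERT INTO main.Humidity VALUES (1, 40.0)")

temperature_rows = select_all(conn, "Temperature")  # both copies, duplicates merged
humidity_rows = select_all(conn, "Humidity")        # only the copy that exists
```

Table names are quoted but not otherwise validated, so this is only appropriate when the list of names comes from a trusted source such as the databases themselves.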

Indexes with custom collations in sqlite

Assuming I have a schema like this:
CREATE TABLE abc(
id INTEGER PRIMARY KEY AUTOINCREMENT,
txt TEXT
);
CREATE INDEX "txtCS" ON "abc"("txt" COLLATE MY_CUSTOM_SORT);
When will SQLite use my index on txt?
I ask because I ran:
EXPLAIN QUERY PLAN SELECT * FROM abc ORDER BY txt COLLATE MY_CUSTOM_SORT DESC ...
and it tells me that it scans the table, twice, using the txtCS index. (It doesn't search like I expected.)
MY_CUSTOM_SORT is my own sorting function that I hooked with sqliteCreateCollation. I just need that index for some queries that involve special ordering, and I want them to be fast.
In the EXPLAIN QUERY PLAN output, SEARCH means that the database tries to look up some particular record(s) with specific values, while SCAN means that the database goes through the entire table.
This query returns all records, so the most efficient operation is a SCAN.
Either operation can be sped up with an index.
(In a SCAN, the database just goes through all index entries in order.)
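The difference between the two operations can be seen by comparing an ORDER BY query with an equality lookup. The sketch below uses Python's standard sqlite3 module. The case-insensitive comparison stands in for the real MY_CUSTOM_SORT, whose behavior is not given in the question, and the helper plan() is a hypothetical name.

```python
import sqlite3

def my_custom_sort(a, b):
    """Stand-in collation: case-insensitive comparison (negative, zero or positive)."""
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)

conn = sqlite3.connect(":memory:")
conn.create_collation("MY_CUSTOM_SORT", my_custom_sort)
conn.executescript("""
    CREATE TABLE abc(id INTEGER PRIMARY KEY AUTOINCREMENT, txt TEXT);
    CREATE INDEX "txtCS" ON "abc"("txt" COLLATE MY_CUSTOM_SORT);
""")

def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)

# All rows are returned, so the index can only be walked in order (a SCAN).
order_plan = plan("SELECT * FROM abc ORDER BY txt COLLATE MY_CUSTOM_SORT DESC")

# A specific value is looked up with the index's collation, so a SEARCH is possible.
search_plan = plan("SELECT * FROM abc WHERE txt COLLATE MY_CUSTOM_SORT = ?", ("x",))

print(order_plan)
print(search_plan)
```

The comparison in the WHERE clause has to use the same collation as the index; a plain txt = ? would compare with the column's default collation and could not use txtCS for the lookup.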

Hierarchical Database Select / Insert Statement (SQL Server)

I recently stumbled upon a problem with selecting relationship details from one table and inserting them into another table. I hope someone can help.
I have a table structure as follows, e.g.:
ID (PK)  Name       ParentID
1        Myname     0
2        nametwo    1
3        namethree  2
This is the table I need to select from to get all the relationship data. As there could be an unlimited number of sub-links, is there a function I can create to do the looping?
Once I have all the data, I need to insert it into another table, and the IDs will have to change because they must go in order (for example, ID 2 cannot be a child of ID 3). I am hoping I can use the same function for both the selecting and the inserting.
If you are using SQL Server 2005 or above, you may use recursive queries to get your information. Here is an example:
With tree (id, Name, ParentID, [level])
As (
Select id, Name, ParentID, 1
From [myTable]
Where ParentID = 0
Union All
Select child.id
,child.Name
,child.ParentID
,parent.[level] + 1 As [level]
From [myTable] As [child]
Inner Join [tree] As [parent]
On [child].ParentID = [parent].id)
Select * From [tree];
This query will return the row requested by the first portion (Where ParentID = 0) and all sub-rows recursively. Does this help you?
I'm not sure I understand what you want to have happen with your insert. Can you provide more information in terms of the expected result when you are done?
Good luck!
For the retrieval part, you can take a look at Common Table Expressions (CTEs). This feature supports recursive queries in SQL.
For the insertion part, you can use the CTE above to regenerate the IDs and insert the rows accordingly.
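One possible reading of "regenerate the ID" is sketched below. To keep the example runnable without a server it uses SQLite through Python's sqlite3 module rather than SQL Server; the CTE is the same as the one shown earlier except for the RECURSIVE keyword, which SQL Server omits. The target table newTable and the root ID of 5 are made up for the example, and ParentID = 0 is assumed to mean "no parent", as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable(id INTEGER PRIMARY KEY, Name TEXT, ParentID INTEGER);
    -- The root deliberately has a higher id than its descendants.
    INSERT INTO myTable VALUES (5, 'Myname', 0), (2, 'nametwo', 5), (3, 'namethree', 2);
    CREATE TABLE newTable(id INTEGER PRIMARY KEY, Name TEXT, ParentID INTEGER);
""")

# Walk the hierarchy from the roots down; ordering by level guarantees that
# every parent is returned before any of its children.
rows = conn.execute("""
    WITH RECURSIVE tree(id, Name, ParentID, level) AS (
        SELECT id, Name, ParentID, 1 FROM myTable WHERE ParentID = 0
        UNION ALL
        SELECT child.id, child.Name, child.ParentID, parent.level + 1
        FROM myTable AS child
        JOIN tree AS parent ON child.ParentID = parent.id
    )
    SELECT id, Name, ParentID FROM tree ORDER BY level, id
""").fetchall()

new_id = {0: 0}  # old id -> new id; 0 stays the "no parent" marker
for old_id, name, old_parent in rows:
    cur = conn.execute(
        "INSERT INTO newTable(Name, ParentID) VALUES (?, ?)",
        (name, new_id[old_parent]),
    )
    new_id[old_id] = cur.lastrowid  # id assigned by the target table

copied = conn.execute("SELECT id, Name, ParentID FROM newTable ORDER BY id").fetchall()
```

Because parents are inserted first, every child in newTable ends up with an ID larger than its parent's, which is the ordering the question asks for.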
I hope this link helps: Self-Joins in SQL.
This is the problem of finding the transitive closure of a graph in sql. SQL does not support this directly, which leaves you with three common strategies:
use a vendor specific SQL extension
store the Materialized Path from the root to the given node in each row
store the Nested Sets, that is the interval covered by the subtree rooted at a given node when nodes are labeled depth first
The first option is straightforward and, if you don't need database portability, is probably the best. The second and third options have the advantage of being plain SQL, but they require maintaining some denormalized state. Updating a table that uses materialized paths is simple, but for fast queries your database must support indexes for prefix queries on string values. Nested sets avoid the need for any string indexing features, but can require updating a lot of rows as you insert or remove nodes.
If you're fine with always using MSSQL, I'd use the vendor specific option Adrian mentioned.
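To make the materialized-path strategy concrete, here is a minimal sketch in Python with SQLite. The table node, its path column, the '/'-terminated path format and the helper subtree() are all assumptions made for the example; the answer does not prescribe a particular encoding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node(id INTEGER PRIMARY KEY, name TEXT, path TEXT NOT NULL);
    CREATE INDEX node_path ON node(path);
    -- Each path lists the ids from the root down to the node itself.
    INSERT INTO node VALUES
        (1, 'Myname',    '1/'),
        (2, 'nametwo',   '1/2/'),
        (3, 'namethree', '1/2/3/'),
        (4, 'other',     '4/');
""")

def subtree(conn, node_id):
    """Return the ids of `node_id` and all of its descendants, ordered by path."""
    (prefix,) = conn.execute(
        "SELECT path FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    # A descendant's path starts with its ancestor's path, so a prefix
    # match finds the whole subtree without any recursion.
    rows = conn.execute(
        "SELECT id FROM node WHERE path LIKE ? || '%' ORDER BY path", (prefix,)
    ).fetchall()
    return [row[0] for row in rows]

whole_tree = subtree(conn, 1)
lower_part = subtree(conn, 2)
```

The trailing '/' keeps the path of node 1 from matching a hypothetical node 10. Whether the prefix match can actually use the index depends on the database and its collation settings, which is the indexing caveat raised above.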
