SQLite strange response with IN and subselect

CREATE TABLE minitable (uid INTEGER primary key);
insert into minitable values (1);
insert into minitable values (2);
insert into minitable values (3);
insert into minitable values (4);
insert into minitable values (5);
insert into minitable values (6);
select uid
from minitable as gp
where
uid in (
(select uid from minitable)
);
SQLite gives only one line as the result.
I expect SQLite to give all 6 values like other database engines.
MariaDB, PostgreSQL and some other DBMSs give all values from 1 to 6 as the result.
With this,
select uid
from minitable as gp
where
uid in (
1,2,3,4,5,6
);
SQLite (like MariaDB and PostgreSQL) returns all values.
I tested with SQLite 3.40.1, 3.21 and 3.37.

From SQL Language Expressions/Subquery Expressions (emphasis mine):
A SELECT statement enclosed in parentheses is a subquery. All types of
SELECT statement, including aggregate and compound SELECT queries
(queries with keywords like UNION or EXCEPT) are allowed as scalar
subqueries. The value of a subquery expression is the first row of the
result from the enclosed SELECT statement....
In your code you enclose select uid from minitable in its own parentheses and then again in the parentheses of the IN list.
This way the subquery is interpreted as a single element of the IN list. Since SQLite treats it as a scalar subquery, its value is the first row of the result set (the row whose uid is 1), so in the versions you tested the full query is equivalent to:
select uid
from minitable as gp
where uid in (1);
For SQLite, you get the desired behavior if you remove the parentheses around the subquery and keep only those that belong to the IN operator:
select uid
from minitable as gp
where uid in (select uid from minitable);
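The difference can be checked with a small script. This is a minimal sketch using Python's standard sqlite3 module (the Python wrapper is not part of the question; only the SQL is). It shows that a parenthesized SELECT used as an expression yields just its first row, and that the single-parentheses form returns every row. The doubled-parentheses result is printed but not relied on, on the assumption that it may depend on the SQLite version in use; the question reports one row on 3.21, 3.37 and 3.40.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE minitable (uid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO minitable VALUES (?)", [(i,) for i in range(1, 7)])

# A parenthesized SELECT used as an expression is a scalar subquery:
# only the first row of its result is used.
scalar_value = conn.execute(
    "SELECT (SELECT uid FROM minitable ORDER BY uid)"
).fetchone()[0]

# One pair of parentheses: the SELECT is the right-hand side of IN.
in_subquery = [row[0] for row in conn.execute(
    "SELECT uid FROM minitable "
    "WHERE uid IN (SELECT uid FROM minitable) ORDER BY uid"
)]

# Two pairs of parentheses: the form from the question. The result is
# only printed, because it may vary with the SQLite version.
doubled = [row[0] for row in conn.execute(
    "SELECT uid FROM minitable "
    "WHERE uid IN ((SELECT uid FROM minitable)) ORDER BY uid"
)]

print("SQLite", sqlite3.sqlite_version)
print("scalar subquery value:", scalar_value)
print("IN (SELECT ...):", in_subquery)
print("IN ((SELECT ...)):", doubled)
```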

Why does this SQLite query not use an index for the correlated subquery?

Consider a SQLite database for things with parts, containing the following tables
CREATE TABLE thing (id integer PRIMARY KEY, name text, total_cost real);
CREATE TABLE part (id integer PRIMARY KEY, cost real);
CREATE TABLE thing_part (thing_id REFERENCES thing(id), part_id REFERENCES part(id));
I have an index to find the parts of a thing
CREATE INDEX thing_part_idx ON thing_part (thing_id);
To illustrate the problem, I'm using the following queries to fill the tables with random data
INSERT INTO thing(name)
WITH RECURSIVE
cte(x) AS (
SELECT 1
UNION ALL
SELECT 1 FROM cte LIMIT 10000
)
SELECT hex(randomblob(4)) FROM cte;
INSERT INTO part(cost)
WITH RECURSIVE
cte(x) AS (
SELECT 1
UNION ALL
SELECT 1 FROM cte LIMIT 10000
)
SELECT abs(random()) % 100 FROM cte;
INSERT INTO thing_part (thing_id, part_id)
SELECT thing.id, abs(random()) % 10000 FROM thing, (SELECT 1 UNION ALL SELECT 1), (SELECT 1 UNION ALL SELECT 1);
So each thing is associated with a small number of parts (4 in this example).
At this point, I have not yet set the total cost of the things. I thought I could use the following query
UPDATE thing SET total_cost = (
SELECT sum(part.cost)
FROM thing_part, part
WHERE thing_part.thing_id = thing.id
AND thing_part.part_id = part.id);
but it is extremely slow (I did not have the patience to wait for it to complete).
EXPLAIN QUERY PLAN shows that both thing and thing_part are scanned in full; only the lookup in part is done using the rowid:
SCAN TABLE thing
EXECUTE CORRELATED SCALAR SUBQUERY 0
SCAN TABLE thing_part
SEARCH TABLE part USING INTEGER PRIMARY KEY (rowid=?)
If I look at the query plan for the inner query with a fixed thing_id, i.e.
SELECT sum(part.cost)
FROM thing_part, part
WHERE thing_part.thing_id = 1000
AND thing_part.part_id = part.id;
it does use the thing_part_idx:
SEARCH TABLE thing_part USING INDEX thing_part_idx (thing_id=?)
SEARCH TABLE part USING INTEGER PRIMARY KEY (rowid=?)
I would expect the first query to be equivalent to iterating over all rows of thing and executing the inner query each time, but obviously that's not the case. Why? Should I use a different index or rewrite my query or maybe do the iteration in the client to generate multiple queries instead?
In case it matters, I'm using SQLite version 3.22.0
SQLite might use dynamic typing, but column types still matter for affinity, and indexes can be used only when the database can prove that index lookups behave the same as comparisons with the actual table values, which often requires the affinities to be compatible.
So when you tell the database that the thing_part values are integers:
CREATE TABLE thing_part (
thing_id integer REFERENCES thing(id),
part_id integer REFERENCES part(id)
);
then the index on thing_id will have the correct affinity, and will be used:
QUERY PLAN
|--SCAN TABLE thing
`--CORRELATED SCALAR SUBQUERY
|--SEARCH TABLE thing_part USING INDEX thing_part_idx (thing_id=?)
`--SEARCH TABLE part USING INTEGER PRIMARY KEY (rowid=?)
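To compare the two schemas side by side, the plans can be printed from a script. This is a minimal sketch with Python's standard sqlite3 module; the helper name update_plan is made up for illustration, the tables are left empty (so the planner works from its default estimates), and the exact wording of the plan lines differs between SQLite versions.

```python
import sqlite3

UPDATE_SQL = """
UPDATE thing SET total_cost = (
    SELECT sum(part.cost)
    FROM thing_part, part
    WHERE thing_part.thing_id = thing.id
      AND thing_part.part_id = part.id)
"""

def update_plan(thing_part_ddl):
    """Return the EXPLAIN QUERY PLAN detail lines for the UPDATE.

    thing_part_ddl is the CREATE TABLE statement to use for thing_part.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE thing (id integer PRIMARY KEY, name text, total_cost real);"
        "CREATE TABLE part (id integer PRIMARY KEY, cost real);"
        + thing_part_ddl + ";"
        "CREATE INDEX thing_part_idx ON thing_part (thing_id);"
    )
    rows = conn.execute("EXPLAIN QUERY PLAN " + UPDATE_SQL).fetchall()
    conn.close()
    # The human-readable description is the last column of each row.
    return [row[-1] for row in rows]

# Columns declared without a type, as in the question.
untyped_plan = update_plan(
    "CREATE TABLE thing_part (thing_id REFERENCES thing(id), "
    "part_id REFERENCES part(id))"
)
# Columns declared as integer, as suggested above.
typed_plan = update_plan(
    "CREATE TABLE thing_part (thing_id integer REFERENCES thing(id), "
    "part_id integer REFERENCES part(id))"
)

print("untyped:", *untyped_plan, sep="\n  ")
print("typed:", *typed_plan, sep="\n  ")
```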
I would rewrite your query as:
-- calculating sum for each thing_id at once
WITH cte AS (
SELECT thing_part.thing_id, sum(part.cost) AS s
FROM thing_part
JOIN part
ON thing_part.part_id = part.id
GROUP BY thing_part.thing_id
)
UPDATE thing
SET total_cost = (SELECT s FROM cte WHERE thing.id = cte.thing_id);
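A quick way to sanity-check this rewrite is to run it on a handful of rows. The sketch below uses Python's standard sqlite3 module with made-up sample data (three things, three parts); it is not the data from the question. Note that a thing without any parts ends up with a NULL total_cost, because the scalar subquery finds no row for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (id integer PRIMARY KEY, name text, total_cost real);
CREATE TABLE part (id integer PRIMARY KEY, cost real);
CREATE TABLE thing_part (thing_id integer REFERENCES thing(id),
                         part_id integer REFERENCES part(id));
-- Hypothetical sample data for illustration only.
INSERT INTO thing (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'no parts');
INSERT INTO part (id, cost) VALUES (1, 10.0), (2, 2.5), (3, 4.0);
INSERT INTO thing_part VALUES (1, 1), (1, 2), (2, 3);
""")

# Compute the sum per thing_id once, then look each thing up in the result.
conn.execute("""
WITH cte AS (
    SELECT thing_part.thing_id, sum(part.cost) AS s
    FROM thing_part
    JOIN part ON thing_part.part_id = part.id
    GROUP BY thing_part.thing_id
)
UPDATE thing
SET total_cost = (SELECT s FROM cte WHERE thing.id = cte.thing_id)
""")

# Map each thing id to its computed total.
totals = dict(conn.execute("SELECT id, total_cost FROM thing"))
print(totals)
```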

Get last_insert_rowid() from bulk insert

I have an SQLite database where I need to insert spatial information along with metadata into an R*tree and an accompanying regular table. Each entry needs to be uniquely defined for the lifetime of the database. Therefore the regular table has an INTEGER PRIMARY KEY AUTOINCREMENT column, and my plan was to start with the insert into this table, extract the last inserted rowids and use these for the insert into the R*tree. Alas, this doesn't seem possible:
>testCon <- dbConnect(RSQLite::SQLite(), ":memory:")
>dbGetQuery(testCon, 'CREATE TABLE testTable (x INTEGER PRIMARY KEY, y INTEGER)')
>dbGetQuery(testCon, 'INSERT INTO testTable (y) VALUES ($y)', bind.data=data.frame(y=1:5))
>dbGetQuery(testCon, 'SELECT last_insert_rowid() FROM testTable')
last_insert_rowid()
1 5
2 5
3 5
4 5
5 5
Only the last inserted rowid seems to be kept (probably for performance reasons). As the number of records to be inserted is hundreds of thousands, it is not feasible to do the insert line by line.
So the question is: Is there any way to make the last_insert_rowid() bend to my will? And if not, what is the best failsafe alternative? Some possibilities:
Record highest rowid before insert and 'SELECT rowid FROM testTable WHERE rowid > prevRowid'
Get the number of rows to insert, fetch the last_insert_rowid() and use seq(to=lastRowid, length.out=nInserts)
While the two suggestions above should work, at least intuitively, I don't feel confident enough in SQLite to know whether they are failsafe.
The algorithm for generating autoincrementing IDs is documented.
For an INTEGER PRIMARY KEY column, you can simply get the current maximum value:
SELECT IFNULL(MAX(x), 0) FROM testTable
and then use the next values.
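A minimal sketch of this approach, written in Python with the standard sqlite3 module rather than the R code from the question. The helper name insert_and_get_ids is made up for illustration, and the sketch assumes that no other connection writes to testTable between the MAX(x) query and the insert.

```python
import sqlite3

def insert_and_get_ids(conn, ys):
    """Insert the values in ys and return the x keys assigned to them."""
    # Remember the highest key before the insert (0 for an empty table).
    prev_max = conn.execute(
        "SELECT IFNULL(MAX(x), 0) FROM testTable"
    ).fetchone()[0]
    with conn:  # commit the bulk insert on success, roll back on error
        conn.executemany(
            "INSERT INTO testTable (y) VALUES (?)", [(y,) for y in ys]
        )
    # Every key above the remembered maximum belongs to this batch,
    # provided nobody else inserted in the meantime (assumption).
    rows = conn.execute(
        "SELECT x FROM testTable WHERE x > ? ORDER BY x", (prev_max,)
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testTable (x INTEGER PRIMARY KEY, y INTEGER)")

first_ids = insert_and_get_ids(conn, range(1, 6))
second_ids = insert_and_get_ids(conn, range(6, 11))
print(first_ids, second_ids)
```

The returned keys can then be used for the matching insert into the R*tree.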

How to read the last record in SQLite table?

Is there a way to read the value of the last record inserted in an SQLite table without going through the previous records ?
I ask this question for performance reasons.
There is a function named sqlite3_last_insert_rowid() which will return the integer key for the most recent insert operation. http://www.sqlite.org/c3ref/last_insert_rowid.html
This only helps if you know the last insert happened on the table you care about.
If you need the last row of a table, regardless of whether the last insert was on this table or not, you will have to use an SQL query:
SELECT * FROM mytable WHERE ROWID IN ( SELECT max( ROWID ) FROM mytable );
When you sort the records by ID, in reverse order, the last record will be returned first.
(Because of the implicit index on the autoincrementing column, this is efficient.)
If you aren't interested in any other records, use LIMIT:
SELECT *
FROM MyTable
ORDER BY _id DESC
LIMIT 1
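Both queries can be tried against a small table. This sketch uses Python's standard sqlite3 module; the table layout (an _id key plus a value column) is an assumption made for the example. In both cases "last" means the row with the highest key, which matches the most recently inserted row only as long as keys are handed out in increasing order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: _id is an alias for the rowid.
conn.execute("CREATE TABLE mytable (_id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO mytable (value) VALUES (?)", [("a",), ("b",), ("c",)]
)

# Variant 1: pick the row whose rowid is the maximum.
by_max = conn.execute(
    "SELECT * FROM mytable "
    "WHERE rowid IN (SELECT max(rowid) FROM mytable)"
).fetchone()

# Variant 2: sort by key in reverse order and keep the first row.
by_order = conn.execute(
    "SELECT * FROM mytable ORDER BY _id DESC LIMIT 1"
).fetchone()

print(by_max, by_order)
```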

SQLITE Insert Multiple Rows Using Select as Value

I have a sqlite statement that will only insert one row.
INSERT INTO queue (TransKey, CreateDateTime, Transmitted)
VALUES (
(SELECT Id from trans WHERE Id != (SELECT TransKey from queue)),
'2013-12-19T19:47:33',
0
)
How would I have it insert every row where Id from trans != (SELECT TransKey from queue) in one statement?
INSERT INTO queue (TransKey, CreateDateTime, Transmitted)
SELECT Id, '2013-12-19T19:47:33', 0
FROM trans WHERE Id != (SELECT TransKey from queue)
There are two different "flavors" of INSERT. The one you're using (VALUES) inserts one or more rows that you "create" in the INSERT statement itself. The other flavor (SELECT) inserts a variable number of rows that are retrieved from one or more other tables in the database.
While it's not immediately obvious, the SELECT version allows you to include expressions and simple constants -- as long as the number of columns lines up with the number of columns you're inserting, the statement will work (in other databases, the types of the values must match the column types as well).
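One caveat about the condition itself: a parenthesized SELECT used as an operand of != is a scalar subquery, so Id != (SELECT TransKey from queue) compares against only the first row the subquery returns. If the intent is "every Id that is not yet in queue", NOT IN is the usual way to express that. The sketch below shows the INSERT ... SELECT form with NOT IN, run through Python's standard sqlite3 module; the column types and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema and data for illustration only.
CREATE TABLE trans (Id INTEGER PRIMARY KEY);
CREATE TABLE queue (TransKey INTEGER, CreateDateTime TEXT, Transmitted INTEGER);
INSERT INTO trans (Id) VALUES (1), (2), (3), (4);
INSERT INTO queue VALUES (1, '2013-12-18T08:00:00', 1),
                         (2, '2013-12-18T09:00:00', 1);
""")

# Insert one queue row for every trans row that is not queued yet.
# Note: NOT IN matches nothing if the subquery returns a NULL TransKey.
conn.execute("""
INSERT INTO queue (TransKey, CreateDateTime, Transmitted)
SELECT Id, '2013-12-19T19:47:33', 0
FROM trans
WHERE Id NOT IN (SELECT TransKey FROM queue)
""")

# The rows added by the statement above are the ones with Transmitted = 0.
new_rows = [row[0] for row in conn.execute(
    "SELECT TransKey FROM queue WHERE Transmitted = 0 ORDER BY TransKey"
)]
print(new_rows)
```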

How do I find out if a SQLite index is unique? (With SQL)

I want to find out, with an SQL query, whether an index is UNIQUE or not. I'm using SQLite 3.
I have tried two approaches:
SELECT * FROM sqlite_master WHERE name = 'sqlite_autoindex_user_1'
This returns information about the index ("type", "name", "tbl_name", "rootpage" and "sql"). Note that the sql column is empty when the index is automatically created by SQLite.
PRAGMA index_info(sqlite_autoindex_user_1);
This returns the columns in the index ("seqno", "cid" and "name").
Any other suggestions?
Edit: The above example is for an auto-generated index, but my question is about indexes in general. For example, I can create an index with "CREATE UNIQUE INDEX index1 ON visit (user, date)". It seems no SQL command will show if my new index is UNIQUE or not.
PRAGMA INDEX_LIST('table_name');
Returns one row per index of the table, with (among others) these 3 columns:
seq: unique numeric ID of the index
name: name of the index
unique: uniqueness flag (nonzero if the index is UNIQUE)
Edit
Since SQLite 3.16.0 you can also use table-valued pragma functions, which have the advantage that you can JOIN them to search for a specific table and column. See @mike-scotty's answer.
Since no one has come up with a good answer, I think the best solution is this:
If the index starts with "sqlite_autoindex", it is an auto-generated index for a single UNIQUE column
Otherwise, look for the UNIQUE keyword in the sql column in the table sqlite_master, with something like this:
SELECT * FROM sqlite_master WHERE type = 'index' AND sql LIKE '%UNIQUE%'
You can programmatically build a select statement to see if any tuples point to more than one row. If you get back three columns, foo, bar and baz, create the following query:
select count(*) from t
group by foo, bar, baz
having count(*) > 1
If that returns any rows, your index is not unique, since more than one row maps to the given tuple. If sqlite3 supports derived tables (I've yet to have the need, so I don't know off-hand), you can make this even more succinct:
select count(*) from (
select count(*) from t
group by foo, bar, baz
having count(*) > 1
)
This will return a single row result set, denoting the number of duplicate tuple sets. If positive, your index is not unique.
You are close:
1) If the index name starts with "sqlite_autoindex", it is an index that SQLite generated automatically for a PRIMARY KEY or UNIQUE constraint. However, it will be listed in sqlite_master or sqlite_temp_master depending on whether the table being indexed is temporary.
2) You need to watch out for table names and columns that contain the substring unique, so you want to use:
SELECT * FROM sqlite_master WHERE type = 'index' AND sql LIKE 'CREATE UNIQUE INDEX%'
See the sqlite website documentation on Create Index
As of sqlite 3.16.0 you could also use pragma functions:
SELECT distinct il.name
FROM sqlite_master AS m,
pragma_index_list(m.name) AS il,
pragma_index_info(il.name) AS ii
WHERE m.type='table' AND il.[unique] = 1;
The above statement will list all names of unique indexes.
SELECT DISTINCT m.name as table_name, ii.name as column_name
FROM sqlite_master AS m,
pragma_index_list(m.name) AS il,
pragma_index_info(il.name) AS ii
WHERE m.type='table' AND il.[unique] = 1;
The above statement will return all tables and their columns if the column is part of a unique index.
From the docs:
The table-valued functions for PRAGMA feature was added in SQLite version 3.16.0 (2017-01-02). Prior versions of SQLite cannot use this feature.
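To answer the original question for one specific table, pragma_index_list can be queried directly. This is a minimal sketch with Python's standard sqlite3 module, assuming the bundled SQLite is 3.16.0 or newer. The visit table and index1 come from the question; the note column and the non-unique index2 are added here only for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit (user TEXT, date TEXT, note TEXT);
CREATE UNIQUE INDEX index1 ON visit (user, date);
-- Hypothetical non-unique index, added for comparison.
CREATE INDEX index2 ON visit (note);
""")

# "unique" is a keyword, so the column name has to be quoted.
uniqueness = {
    name: bool(is_unique)
    for name, is_unique in conn.execute(
        "SELECT name, \"unique\" FROM pragma_index_list('visit')"
    )
}
print(uniqueness)
```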
