How do I find out if a SQLite index is unique? (With SQL)

I want to find out, with an SQL query, whether an index is UNIQUE or not. I'm using SQLite 3.
I have tried two approaches:
SELECT * FROM sqlite_master WHERE name = 'sqlite_autoindex_user_1'
This returns information about the index ("type", "name", "tbl_name", "rootpage" and "sql"). Note that the sql column is empty when the index is automatically created by SQLite.
PRAGMA index_info(sqlite_autoindex_user_1);
This returns the columns in the index ("seqno", "cid" and "name").
Any other suggestions?
Edit: The above example is for an auto-generated index, but my question is about indexes in general. For example, I can create an index with "CREATE UNIQUE INDEX index1 ON visit (user, date)". It seems no SQL command will show if my new index is UNIQUE or not.

PRAGMA INDEX_LIST('table_name');
Returns one row per index on the table. The first 3 columns are:
seq     Unique numeric ID of index
name    Name of the index
unique  Uniqueness flag (nonzero if UNIQUE index)
Newer SQLite versions also return "origin" and "partial" columns.
Edit
Since SQLite 3.16.0 you can also use table-valued pragma functions, which have the advantage that you can JOIN them to search for a specific table and column. See @mike-scotty's answer.
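As a minimal sketch of the PRAGMA index_list approach, the snippet below wraps it in a small Python helper using the standard sqlite3 module. The helper name is_index_unique and the sample table and indexes are assumptions made for illustration (the visit table and index1 mirror the example in the question).

```python
import sqlite3


def is_index_unique(conn, table, index):
    """Return True/False for the index's uniqueness flag, or None if not found."""
    # PRAGMA arguments cannot be bound as parameters, so the table name is
    # quoted by hand (doubling any embedded single quotes).
    quoted = "'" + table.replace("'", "''") + "'"
    for row in conn.execute(f"PRAGMA index_list({quoted})"):
        # Columns: seq, name, unique, and (in newer versions) origin, partial.
        if row[1] == index:
            return bool(row[2])
    return None


# Hypothetical sample schema, modelled on the question's example.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE visit ("user" TEXT, "date" TEXT, note TEXT)')
conn.execute('CREATE UNIQUE INDEX index1 ON visit ("user", "date")')
conn.execute("CREATE INDEX index2 ON visit (note)")

print(is_index_unique(conn, "visit", "index1"))
print(is_index_unique(conn, "visit", "index2"))
```

Reading the flag by position (row[2]) avoids having to quote the column name "unique", which is also an SQL keyword.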

Since no one's come up with a good answer, I think the best solution is this:
If the index name starts with "sqlite_autoindex", it is an auto-generated index for a single UNIQUE column.
Otherwise, look for the UNIQUE keyword in the sql column of the sqlite_master table, with something like this:
SELECT * FROM sqlite_master WHERE type = 'index' AND sql LIKE '%UNIQUE%'

You can programmatically build a select statement to see if any tuples point to more than one row. If you get back three columns, foo, bar and baz, create the following query:
select count(*) from t
group by foo, bar, baz
having count(*) > 1
If that returns any rows, your index is not unique, since more than one row maps to the given tuple. If sqlite3 supports derived tables (I've yet to have the need, so I don't know off-hand), you can make this even more succinct:
select count(*) from (
select count(*) from t
group by foo, bar, baz
having count(*) > 1
)
This will return a single row result set, denoting the number of duplicate tuple sets. If positive, your index is not unique.

You are close:
1) If the index name starts with "sqlite_autoindex", it is an auto-generated index for the primary key. However, it will be in the sqlite_master or sqlite_temp_master table, depending on whether the table being indexed is temporary.
2) You need to watch out for table names and columns that contain the substring unique, so you want to use:
SELECT * FROM sqlite_master WHERE type = 'index' AND sql LIKE 'CREATE UNIQUE INDEX%'
See the SQLite documentation on CREATE INDEX.

As of SQLite 3.16.0 you can also use pragma functions:
SELECT distinct il.name
FROM sqlite_master AS m,
pragma_index_list(m.name) AS il,
pragma_index_info(il.name) AS ii
WHERE m.type='table' AND il.[unique] = 1;
The above statement lists the names of all unique indexes.
SELECT DISTINCT m.name as table_name, ii.name as column_name
FROM sqlite_master AS m,
pragma_index_list(m.name) AS il,
pragma_index_info(il.name) AS ii
WHERE m.type='table' AND il.[unique] = 1;
The above statement returns every table and column pair where the column is part of a unique index.
From the docs:
The table-valued functions for PRAGMA feature was added in SQLite version 3.16.0 (2017-01-02). Prior versions of SQLite cannot use this feature.
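To narrow the pragma-function query to one table, the table name can be passed as a bound parameter. The following is a sketch only, assuming the Python sqlite3 module is linked against SQLite 3.16.0 or later; the helper name unique_index_columns and the sample schema are hypothetical.

```python
import sqlite3


def unique_index_columns(conn, table):
    """Map each unique index on `table` to its ordered list of column names."""
    sql = """
        SELECT il.name, ii.name
        FROM pragma_index_list(?) AS il,
             pragma_index_info(il.name) AS ii
        WHERE il."unique" = 1
        ORDER BY il.name, ii.seqno
    """
    result = {}
    for index_name, column_name in conn.execute(sql, (table,)):
        result.setdefault(index_name, []).append(column_name)
    return result


# Hypothetical sample schema: one explicit unique index, one plain index,
# and one UNIQUE column constraint (which gets an automatic index).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE visit (id INTEGER PRIMARY KEY, "user" TEXT, "date" TEXT, '
    "token TEXT UNIQUE)"
)
conn.execute('CREATE UNIQUE INDEX index1 ON visit ("user", "date")')
conn.execute('CREATE INDEX index2 ON visit ("date")')

result = unique_index_columns(conn, "visit")
for index_name, columns in result.items():
    print(index_name, columns)
```

Unlike the sqlite_master text search, this also reports automatically created indexes, whose sql column is empty.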

Related

When to create multi-column indices in SQLite?

Assume I have a table in an SQLite database:
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
price INTEGER NOT NULL,
updateTime INTEGER NOT NULL
) [WITHOUT ROWID];
What indices should I create to optimize the following query:
SELECT * FROM orders WHERE price > ? ORDER BY updateTime DESC;
Do I create two indices:
CREATE INDEX i_1 ON orders(price);
CREATE INDEX i_2 ON orders(updateTime);
or one complex index?
CREATE INDEX i_3 ON orders(price, updateTime);
What would the query's time complexity be?
From The SQLite Query Optimizer Overview/WHERE Clause Analysis:
If an index is created using a statement like this:
CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
Then the index might
be used if the initial columns of the index (columns a, b, and so
forth) appear in WHERE clause terms. The initial columns of the index
must be used with the = or IN or IS operators. The right-most column
that is used can employ inequalities.
As explained also in The SQLite Query Optimizer Overview/The Skip-Scan Optimization with an example:
Because the left-most column of the index does not appear in the WHERE
clause of the query, one is tempted to conclude that the index is not
usable here. However, SQLite is able to use the index.
This means that if you create an index like:
CREATE INDEX idx_orders ON orders(updateTime, price);
it might be used to optimize the WHERE clause even though updateTime does not appear there.
Also, from The SQLite Query Optimizer Overview/ORDER BY Optimizations:
SQLite attempts to use an index to satisfy the ORDER BY clause of a
query when possible. When faced with the choice of using an index to
satisfy WHERE clause constraints or satisfying an ORDER BY clause,
SQLite does the same cost analysis described above and chooses the
index that it believes will result in the fastest answer.
Since updateTime is defined first in the composite index, the index may also be used to optimize the ORDER BY clause.
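Rather than reasoning about which index the planner will pick, you can ask SQLite directly with EXPLAIN QUERY PLAN. The sketch below assumes the orders table from the question with both candidate composite indexes created; the helper name query_plan is hypothetical. The exact plan text differs between SQLite versions and depends on the available statistics, so no particular output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        price INTEGER NOT NULL,
        updateTime INTEGER NOT NULL
    );
    CREATE INDEX i_3 ON orders(price, updateTime);
    CREATE INDEX idx_orders ON orders(updateTime, price);
""")


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the description of that plan step.
    return [row[-1] for row in rows]


plan = query_plan(
    conn,
    "SELECT * FROM orders WHERE price > ? ORDER BY updateTime DESC",
    (100,),
)
for step in plan:
    print(step)
```

Running this against a copy of the real data, after ANALYZE, gives a more representative plan than an empty in-memory table.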

How to list total records for all sqlite tables? [duplicate]

Please help with the query below:
an SQLite query to list all table names along with the number of records in each.
I want to get the count of rows in every table in a Sqlite3 database. I want to avoid writing out a longhand query. I can get the list of tables like this:
SELECT name FROM sqlite_master WHERE type='table'
and I would like to use it in a subquery like this:
select count (*) from (SELECT name FROM sqlite_master WHERE type='table');
but that would just return the total number of rows in the subquery, which isn't what I want.
Perhaps you could use the results of ANALYZE as a workaround. It creates the internal schema object sqlite_stat1:
2.6.3. The sqlite_stat1 table
The sqlite_stat1 is an internal table created by the ANALYZE command
and used to hold supplemental information about tables and indexes
that the query planner can use to help it find better ways of
performing queries. Applications can update, delete from, insert into
or drop the sqlite_stat1 table, but may not create or alter the
sqlite_stat1 table. The schema of the sqlite_stat1 table is as
follows:
CREATE TABLE sqlite_stat1(tbl,idx,stat);
There is normally one row per index, with the index identified by the
name in the sqlite_stat1.idx column. The sqlite_stat1.tbl column is
the name of the table to which the index belongs. In each such row,
the sqlite_stat.stat column will be a string consisting of a list of
integers followed by zero or more arguments. The first integer in this
list is the approximate number of rows in the index. (The number of
rows in the index is the same as the number of rows in the table,
except for partial indexes.) .....
If there are no partial indexes, SELECT tbl, cast(stat AS INT) FROM sqlite_stat1 will return the number of rows in each table, unless the table has 0 rows.
This SQL gives the expected results on a small (25MB, 34 tables, 26 indexes, 33K+ rows) production database. Your mileage may (will?) vary.
ANALYZE;
select DISTINCT tbl_name, CASE WHEN stat is null then 0 else cast(stat as INT) END numrows
from sqlite_master m
LEFT JOIN sqlite_stat1 stat on m.tbl_name = stat.tbl
where m.type='table'
and m.tbl_name not like 'sqlite_%'
order by 1;
--drop table sqlite_stat1;
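If approximate statistics are not good enough, a different technique is to run one count(*) per table from application code. This is a sketch of that alternative, not the ANALYZE method above; the helper name count_rows and the sample tables are hypothetical.

```python
import sqlite3


def count_rows(conn):
    """Return {table_name: exact row count} for every user table."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    counts = {}
    for (name,) in tables:
        # Identifiers cannot be bound as parameters, so quote the name
        # manually (doubling any embedded double quotes).
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT count(*) FROM {quoted}").fetchone()[0]
    return counts


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE empty_table (x)")
conn.executemany("INSERT INTO team (name) VALUES (?)", [("a",), ("b",), ("c",)])

print(count_rows(conn))
```

This scans every table, so it is slower than reading sqlite_stat1 on large databases, but the counts are exact and empty tables report 0.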

Deleting duplicate rows

I am learning SQLite and constructed a line which I thought would delete dups but it deletes all rows instead.
DELETE from tablename WHERE rowid not in (SELECT distinct(timestamp) from tablename);
I expected this to delete rows with a duplicate (leaving one). I know I can simply create a new table with the distinct rows, but why does what I have done not work? Thanks
If timestamp is a column in the table and it is the value you want to compare in order to find duplicates, then do this:
delete from tablename
where exists (
select 1 from tablename t
where t.rowid < tablename.rowid and t.timestamp = tablename.timestamp
)
With recent versions of SQLite (3.25.0 or later, which added window functions), the following is an alternative:
DELETE FROM tablename
WHERE rowid IN (SELECT rowid
FROM (SELECT rowid, row_number() OVER (PARTITION BY timestamp) AS rownum
FROM tablename)
WHERE rownum >= 2);
why does what I have done not work?
Consider the WHERE condition:
rowid not in (SELECT distinct(timestamp) from tablename)
The simple answer is that you are not comparing data in the same columns, nor are they columns with the same type of data. rowid is an automatically-incremented integer column and I assume that the timestamp column is either a numeric or string column containing time values, or perhaps custom-generated sequential numeric values. Because rowid likely never matches a value in timestamp, the NOT IN operation will always return true. Thus each row of the table will be deleted.
SQL is rather explicit and so there are no hidden/mysterious column comparisons. It will not automatically compare the rowid's from one query with another. Notice that the various alternative statements do something to distinguish rows with duplicate key values (timestamp in your case), either by direct comparison between main query and subquery, or using windowing functions to uniquely label rows with duplicate values, etc.
Just for kicks, here's another alternative that uses NOT IN like your original code.
DELETE FROM tablename
WHERE rowid NOT IN (
SELECT max(t.rowid) FROM tablename t
GROUP BY t.timestamp )
First notice that this is comparing rowid with max(t.rowid), values which derive from the same column.
Because the subquery groups on t.timestamp, the aggregate function max() will return the greatest/last t.rowid separately for each set of rows with the same t.timestamp value. The resultant list will exclude t.rowid values that are less than the maximum. Thus, the NOT IN operation will not find those lesser values and will return true so they will be deleted.
It also uses basic SQL (no window functions... the OVER keyword). It will likely be more efficient than the alternative that references the outer query from the subquery, because this statement can execute the subquery just once and then use an efficient index to match individual records... it doesn't need to rerun the query for each row. For that matter, it should also be more efficient than the windowing function, because the window partition essentially "groups" on the partitioned columns, but must then execute the windowing function for each row, an extra step not present in the basic aggregate query. Efficiency is not always critical, but something important to consider.
By the way, the distinct keyword is not a function and does not need/accept parentheses. It is a directive that applies to the entire select statement. The subquery is being interpreted as
SELECT DISTINCT (timestamp) FROM tablename
where DISTINCT is interpreted in isolation and the parentheses are interpreted as a separate expression.
Update
These two queries will return the same data:
SELECT DISTINCT timestamp FROM tablename;
SELECT timestamp FROM tablename GROUP BY timestamp;
Both results eliminate duplicate rows from the output by showing only unique/distinct values, but neither has a "handle" (other data column) which indicates which rows to keep and which rows to eliminate. In other words, these queries return distinct values, but the results lose all relationship to the source rows and so have no use in specifying which source rows to delete (or keep). To understand better, you should run subqueries separately to inspect what they return so that you can understand and verify what data you're working with.
To make those queries useful, we need to do something to distinguish rows with duplicate key values. The rows need a "handle"--some other key value to select for either deleting or keeping those rows. Try this...
SELECT DISTINCT rowid, timestamp FROM tablename;
But that won't work: DISTINCT applies to ALL returned columns, and since rowid is already unique, every row is output separately, so the query is of no use.
SELECT max(rowid), timestamp FROM tablename GROUP BY timestamp;
That query preserves the unique grouping, but provides just one rowid per timestamp as the "handle" to include/exclude for deletion.
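To see the max(rowid) ... GROUP BY approach end to end, here is a small sketch against an in-memory database. The table name and timestamp column follow the question; the value column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (timestamp INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO tablename (timestamp, value) VALUES (?, ?)",
    [(100, "a"), (100, "b"), (200, "c"), (300, "d"), (300, "e"), (300, "f")],
)

# Keep the row with the greatest rowid for each timestamp; delete the rest.
cur = conn.execute(
    """
    DELETE FROM tablename
    WHERE rowid NOT IN (
        SELECT max(rowid) FROM tablename GROUP BY timestamp
    )
    """
)
deleted = cur.rowcount
remaining = conn.execute(
    "SELECT timestamp, value FROM tablename ORDER BY timestamp"
).fetchall()
print(deleted, remaining)
```

Swapping max(rowid) for min(rowid) keeps the earliest row of each group instead of the latest.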
Try this:
DELETE FROM liens
WHERE id IN (SELECT min(id) FROM liens GROUP BY lkey HAVING count(*) > 1)
Each run removes one duplicate per lkey group, so repeat it until no more rows are deleted.

Make select query return in order of arguments

I have a relatively simple select query which asks for rows by a column value (this is not controlled by me). I pass in a variable argument of id values to be returned. Here's an example:
select * from team where id in (2, 1, 3)
I'm noticing that as the database changes its order over time, my results are changing order as well. Is there a way to make SQLite guarantee results in the same order as the arguments?
If you could have so many IDs that the query becomes unwieldy, use a temporary table to store them:
CREATE TEMPORARY TABLE SearchIDs (
ID,
OrderNr INTEGER PRIMARY KEY
);
(The OrderNr column is autoincrementing so that it automatically gets proper values when you insert values.)
To do the search, you have to fill this table:
INSERT INTO SearchIDs(ID) VALUES (2), (1), (3) ... ;
SELECT Team.*
FROM Team
JOIN SearchIDs USING (ID)
ORDER BY SearchIDs.OrderNr;
DELETE FROM SearchIDs;
Try this:
select * from team where id in (2, 1, 3) order by
case id when 2 then 0
when 1 then 1
when 3 then 2
end
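Since the id list is a variable argument, the CASE ordering would normally be generated rather than typed by hand. The sketch below builds it with bound parameters from Python; the helper name fetch_in_order and the sample team rows are hypothetical.

```python
import sqlite3


def fetch_in_order(conn, ids):
    """Fetch team rows whose id is in `ids`, in the order given by `ids`."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    # One "WHEN ? THEN <position>" branch per id; positions are integers
    # generated here, never user input.
    whens = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(ids)))
    sql = (
        f"SELECT * FROM team WHERE id IN ({placeholders}) "
        f"ORDER BY CASE id {whens} END"
    )
    # The ids are bound twice: once for IN (...), once for the CASE branches.
    return conn.execute(sql, list(ids) + list(ids)).fetchall()


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO team (id, name) VALUES (?, ?)",
    [(1, "Ajax"), (2, "Boca"), (3, "Celtic"), (4, "Dynamo")],
)

print(fetch_in_order(conn, [2, 1, 3]))
```

Each id consumes two bound parameters, so very long lists can hit SQLite's bound-parameter limit; that is the case where the temporary-table approach fits better.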
