sqlite, slow joined query through 4 tables - sqlite

I have a query that joins 4 tables: times, tags, users and categories.
Each table has no more than 400 records, but the query takes 70 ms.
I need to run it many times (400x), so the whole procedure takes about 30 seconds in total.
SELECT COUNT(*) FROM times
INNER JOIN tags ON times.user_id = tags.tag_id
INNER JOIN users ON tags.user_nr = users.nr
INNER JOIN categories ON users.category_id = categories.id
WHERE (times.time_raw < "000560")
AND (times.time_raw != 0 )
AND (times.cell != 1 )
AND (categories.name="kategory_A")
AND (times.run_id="08")
How can I make it faster?

Indexes are the solution!
The following list gives guidelines for choosing columns to index:
•You should create indexes on columns that are used frequently in WHERE clauses.
•You should create indexes on columns that are used frequently to join tables.
•You should create indexes on columns that are used frequently in ORDER BY clauses.
•You should create indexes on columns that have few of the same values or unique values in the table.
•You should not create indexes on small tables (tables that use only a few blocks) because a full table scan may be faster than an indexed query.
•If possible, choose a primary key that orders the rows in the most appropriate order.
•If only one column of the concatenated index is used frequently in WHERE clauses, place that column first in the CREATE INDEX statement.
•If more than one column in a concatenated index is used frequently in WHERE clauses, place the most selective column first in the CREATE INDEX statement.
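As a minimal sketch of how these guidelines might apply to the query in the question, the Python snippet below builds a tiny in-memory copy of the four tables, indexes the join and filter columns, and prints the plan SQLite chooses. The column types, sample rows and index names are assumptions made for illustration; they are not taken from the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: only the columns the query touches.
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users      (nr INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE tags       (tag_id INTEGER, user_nr INTEGER);
    CREATE TABLE times      (user_id INTEGER, run_id TEXT, time_raw TEXT, cell INTEGER);

    INSERT INTO categories VALUES (1, 'kategory_A'), (2, 'kategory_B');
    INSERT INTO users VALUES (1, 1), (2, 2);
    INSERT INTO tags VALUES (10, 1), (20, 2);
    INSERT INTO times VALUES
        (10, '08', '000100', 0),   -- matches every condition
        (10, '08', '000900', 0),   -- time_raw too large
        (20, '08', '000100', 0),   -- user is in kategory_B
        (10, '07', '000100', 0),   -- different run
        (10, '08', '000200', 1);   -- cell = 1 is excluded

    -- Hypothetical index names; columns come from the WHERE and JOIN clauses.
    CREATE INDEX idx_times_run_time  ON times(run_id, time_raw);
    CREATE INDEX idx_tags_tag_id     ON tags(tag_id);
    CREATE INDEX idx_users_category  ON users(category_id);
    CREATE INDEX idx_categories_name ON categories(name);
""")

QUERY = """
    SELECT COUNT(*) FROM times
    INNER JOIN tags ON times.user_id = tags.tag_id
    INNER JOIN users ON tags.user_nr = users.nr
    INNER JOIN categories ON users.category_id = categories.id
    WHERE times.time_raw < '000560'
      AND times.time_raw != 0
      AND times.cell != 1
      AND categories.name = 'kategory_A'
      AND times.run_id = '08'
"""

count = conn.execute(QUERY).fetchone()[0]
# The fourth column of each EXPLAIN QUERY PLAN row is the readable plan step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
index_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)

print("matching rows:", count)
for step in plan:
    print(step)
```

Which of these indexes the planner actually picks depends on the SQLite version and on the data, so it is worth checking the printed plan against the real database before keeping any of them.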

Related

How to list total records for all sqlite tables? [duplicate]

Please help with the query below:
An SQLite query to list all table names with the number of records in each:
I want to get the count of rows in every table in a Sqlite3 database. I want to avoid writing out a longhand query. I can get the list of tables like this:
SELECT name FROM sqlite_master WHERE type='table'
and I would like to use it in a subquery like this:
select count (*) from (SELECT name FROM sqlite_master WHERE type='table');
but that would just return the total number of rows in the subquery, which isn't what I want.
Perhaps you can use the results of ANALYZE as a workaround. It creates the internal schema object sqlite_stat1:
2.6.3. The sqlite_stat1 table
The sqlite_stat1 is an internal table created by the ANALYZE command
and used to hold supplemental information about tables and indexes
that the query planner can use to help it find better ways of
performing queries. Applications can update, delete from, insert into
or drop the sqlite_stat1 table, but may not create or alter the
sqlite_stat1 table. The schema of the sqlite_stat1 table is as
follows:
CREATE TABLE sqlite_stat1(tbl,idx,stat);
There is normally one row per index, with the index identified by the
name in the sqlite_stat1.idx column. The sqlite_stat1.tbl column is
the name of the table to which the index belongs. In each such row,
the sqlite_stat1.stat column will be a string consisting of a list of
integers followed by zero or more arguments. The first integer in this
list is the approximate number of rows in the index. (The number of
rows in the index is the same as the number of rows in the table,
except for partial indexes.) .....
If there are no partial indexes, SELECT tbl, cast(stat as INT) FROM sqlite_stat1 returns the number of rows in each table, except for tables with 0 rows, which get no entry.
This SQL gives the expected results on a small (25MB, 34 tables, 26 indexes, 33K+ rows) production database. Your mileage may (will?) vary.
ANALYZE;
select DISTINCT tbl_name, CASE WHEN stat is null then 0 else cast(stat as INT) END numrows
from sqlite_master m
LEFT JOIN sqlite_stat1 stat on m.tbl_name = stat.tbl
where m.type='table'
and m.tbl_name not like 'sqlite_%'
order by 1;
--drop table sqlite_stat1;
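A small sketch of the same idea driven from Python, with the ANALYZE-based counts checked against a plain COUNT(*) per table. The three tables (a, b, c) and the index name are made up for the example; table c is left empty to show the 0-row case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: one indexed, one without indexes, one empty.
    CREATE TABLE a (x INTEGER);
    CREATE INDEX idx_a_x ON a(x);
    CREATE TABLE b (y TEXT);
    CREATE TABLE c (z TEXT);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES ('p'), ('q');
    ANALYZE;
""")

# Row counts read from sqlite_stat1; tables with no entry are reported as 0.
from_stats = dict(conn.execute("""
    SELECT DISTINCT m.tbl_name,
           CASE WHEN s.stat IS NULL THEN 0 ELSE CAST(s.stat AS INT) END
    FROM sqlite_master m
    LEFT JOIN sqlite_stat1 s ON m.tbl_name = s.tbl
    WHERE m.type = 'table' AND m.tbl_name NOT LIKE 'sqlite_%'
    ORDER BY 1
"""))

# Exact counts for comparison: one COUNT(*) per user table.
exact = {}
for name in from_stats:
    quoted = '"' + name.replace('"', '""') + '"'
    exact[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]

print(from_stats)
print(exact)
```

The statistics are only as fresh as the last ANALYZE, so the per-table COUNT(*) loop is the safer choice when exact, current numbers matter.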

sql query for extracting one column from many tables

I need your help with a query in SQLite Studio.
I am dealing with a database made up of 1,000 different tables.
Half of them (all named "news" + an identification number, like 04AD86) contain the column "category", which I am interested in. This column can have from 100 to 200 records in each table.
Could you suggest a query that extracts "category" from every table and returns a list of all possible categories (without duplicate records)?
Thanks a lot
You will probably need dynamic SQL to handle this in a single query. If you don't mind doing this over several queries, then here is one option. First do a query to obtain all the tables which contain the category column:
SELECT name
FROM sqlite_master
WHERE type = 'table' AND name LIKE 'news%'
Next, for the actual queries to obtain the unique categories, you can perform a series of unions to get your list. Here is what it would look like:
SELECT DISTINCT category
FROM news04AD86
UNION
SELECT DISTINCT category
FROM news05BG34
UNION
...
The DISTINCT keyword will remove duplicates within any given table, and UNION will remove duplicates which might occur between one table and another.
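The two steps can also be glued together from application code. The sketch below, written in Python with the standard sqlite3 module, lists the news tables and then builds the UNION query as a string. The two sample tables, their contents and the quote_ident helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two stand-in news tables plus one unrelated table.
    CREATE TABLE news04AD86 (category TEXT);
    CREATE TABLE news05BG34 (category TEXT);
    CREATE TABLE other (x TEXT);
    INSERT INTO news04AD86 VALUES ('sports'), ('politics'), ('sports');
    INSERT INTO news05BG34 VALUES ('politics'), ('tech');
""")


def quote_ident(name):
    """Quote a table name so it is safe to splice into SQL text."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: find every table whose name starts with "news".
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'news%' ORDER BY name"
    )
]

# Step 2: one SELECT per table, joined with UNION (which removes duplicates).
union_sql = " UNION ".join(
    f"SELECT category FROM {quote_ident(t)}" for t in tables
)
categories = sorted(row[0] for row in conn.execute(union_sql))

print(categories)
```

SQLite limits how many SELECTs a single compound statement may contain (500 by default, as far as I know), so with several hundred tables it may be necessary to run the UNION in batches and merge the results in a Python set.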

SQLite - Selecting not indexed column in GROUP BY

I have a situation similar to the question below.
Mysql speed up max() group by
SELECT MAX(id) id, cid FROM table GROUP BY cid
To optimize the above query (shown in that question), creating an index on (cid, id) does the trick.
However, when I add a column that is not indexed to the SELECT, the query slows down drastically.
For example,
SELECT MAX(id) id, cid, newcolumn FROM table GROUP BY cid
If I create an index on (cid, id, newcolumn), the query time drops back to the minimum. It seems I have to index every column I select when using GROUP BY.
Is there any way other than indexing all the columns to be selected?
When all the columns used in the query are part of the index (which is then called a covering index), SQLite can get all values from the index and does not need to access the table itself.
When adding a column that is not indexed, each record must be looked up in both the index and the table.
Furthermore, the order of the records in the table is unlikely to be the same as the order in the index, so the table's pages are not read in order, and are read multiple times, which means that caching will not work as well.
The newcolumn values must be read from either the table or an index; there is no other mechanism to store data.
tl;dr: no
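The difference is visible in EXPLAIN QUERY PLAN. The following is a rough sketch using Python's sqlite3 module; the table name t, its column types and the index name are assumptions, since the question does not give the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema; the question's table is simply called "table".
    CREATE TABLE t (id INTEGER, cid INTEGER, newcolumn TEXT);
    CREATE INDEX idx_cid_id ON t(cid, id);
    INSERT INTO t VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")


def plan(sql):
    """Return the readable steps of SQLite's query plan as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Every referenced column is in idx_cid_id, so the index alone can answer this.
covered = plan("SELECT MAX(id) AS id, cid FROM t GROUP BY cid")

# newcolumn is not in any index, so the table itself has to be read as well.
uncovered = plan("SELECT MAX(id) AS id, cid, newcolumn FROM t GROUP BY cid")

rows = sorted(conn.execute("SELECT MAX(id) AS id, cid FROM t GROUP BY cid"))

print(covered)
print(uncovered)
print(rows)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should mention a covering index and the second should not.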

What causes sqlite join constraints with OR clauses to be significantly slower?

Here are two sqlite queries:
SELECT * FROM items JOIN licenses ON items.id=licenses.id OR items.type=licenses.type;
This query doesn't use OR; it uses UNION:
SELECT * FROM items JOIN licenses ON items.id=licenses.id UNION SELECT * FROM items JOIN licenses ON items.type=licenses.type;
Assuming I have an index in the licenses table on id and another index in the licenses table on type, shouldn't the first query, which uses an OR, be only a tiny bit slower?
I am seeing that the first query is approximately 20 times slower than the second query in SQLite. What is the cause of that?
I would expect the internal plan to look something like this for the first query:
For each row in the items table:
Take the value from the id column of the items table and use it to lookup all rows in the licenses table with that id, call that set of matching rows A.
Take the value from the type column of the items table and use it to lookup all rows in the licenses table with that type, call that set of matching rows A'.
Combine A and A' and eliminate any duplicate rows. Add the result to the list of result rows.
For doing joins, SQLite supports only nested loop joins on two tables (which can be optimized with indexes).
As explained in The SQLite Query Planner and Query Planning, doing joins with two tables at once is not one of the supported optimizations.
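To compare the two formulations on a concrete database, a sketch along these lines may help. It uses Python's sqlite3 module; the column types, sample rows and index names are invented for the example. It checks that both queries return the same set of rows on this data and prints the plan for each without assuming what the planner will choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal schema for the two tables in the question.
    CREATE TABLE items    (id INTEGER, type TEXT);
    CREATE TABLE licenses (id INTEGER, type TEXT);
    CREATE INDEX idx_licenses_id   ON licenses(id);
    CREATE INDEX idx_licenses_type ON licenses(type);
    INSERT INTO items    VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO licenses VALUES (1, 'x'), (5, 'b'), (3, 'c');
""")

OR_QUERY = (
    "SELECT * FROM items JOIN licenses "
    "ON items.id = licenses.id OR items.type = licenses.type"
)
UNION_QUERY = (
    "SELECT * FROM items JOIN licenses ON items.id = licenses.id "
    "UNION "
    "SELECT * FROM items JOIN licenses ON items.type = licenses.type"
)

or_rows = set(conn.execute(OR_QUERY).fetchall())
union_rows = set(conn.execute(UNION_QUERY).fetchall())

# Print each plan; the fourth column holds the readable description.
for label, sql in (("OR", OR_QUERY), ("UNION", UNION_QUERY)):
    print(label)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row[3])

print(or_rows == union_rows)
```

Note that UNION removes duplicate result rows while the OR join does not, so the two queries are only interchangeable when the joined rows are distinct, as they are in this sample.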

Explanation on index on a datetime field and included columns

I have a SQL Server table with the usual
intID(primary key),field1,field2,manyotherfields..., datetime TimeOperation
99% of my different kinds of queries start with TimeOperation BETWEEN startTime AND endTime, then select * (or count(*)) where fieldA=xxx, and join with other smaller tables.
select * because more or less I need all the fields.
I obviously created an index on TimeOperation ... but performance is not good enough, so I want to add some index key columns or index included columns, but I'm a little bit confused.
I get the difference between the two, but I don't get how much adding a column in each case impacts speed and size.
I guess that the biggest improvement would be to create an index including ALL the columns, is that right? (but I can't afford it in terms of space)
And if I often use field1=xxx for example, adding field1 to the index key columns (after TimeOperation) would give better performance, right?
Also...just to be sure how an index with included columns works: if I select rows with TimeOperation in a certain range, SQL Server seeks my TimeOperation index for the rows I'm interested in, and this is faster than scanning the whole table because in the index the TimeOperation values are in ascending order, is that right? But then I need all the rest of the data fields of those rows...how does SQL Server retrieve the data? I guess it has a sort of bookmark to those rows in the index, right? But then it has to hit the table multiple times... so including all the columns in the index would save the time needed to hit the table, is that correct?
Thanks!
Mattia
We will need more information on your table and examples of your queries to address this fully, but:
DateTime columns should be highly selective by themselves, so an index with TimeOperation as the first column should address the bulk of queries against TimeOperation.
Do not add all columns blindly to an index, not even as included columns - this will make the index page density worse and be counterproductive (you would be duplicating your table in an index).
If all data in your database centres around TimeOperation, you might consider building your clustered index around it.
If you have queries just on field1 = x then you need a separate index just for field1 (assuming that it is suitably selective), i.e. no TimeOperation in the index if it's not in the WHERE clause of your query.
Yes, you are right, when SQL Server locates a record in an index, it needs to do a key (or RID) lookup back into the cluster to retrieve the rest of the columns. If your non-clustered index includes the other columns in your select statement, the lookup can be avoided. But since you are using SELECT *, covering indexes are unlikely to help.
Edit
Explanation - Selectivity and density are explained in detail here. For example, the index on TimeOperation will be used only if your queries against TimeOperation return a small fraction of the rows (the rule of thumb is < 5%, but this doesn't always hold), i.e. only if your query is selective enough for SQL Server to choose the index.
The basic starting point would be:
CREATE TABLE [MyTable]
(
intID INT IDENTITY(1,1) NOT NULL,
field1 NVARCHAR(20),
-- .. More columns, which may be selected, but not filtered
TimeOperation DateTime,
CONSTRAINT PK_MyTable PRIMARY KEY (IntId)
);
And the basic indexes will be
CREATE NONCLUSTERED INDEX IX_MyTable_1 ON [MyTable](TimeOperation);
CREATE NONCLUSTERED INDEX IX_MyTable_2 ON [MyTable](Field1);
Clustering Consideration / Option
If most of your records are inserted in 'serial' ascending TimeOperation order, i.e. intId and TimeOperation both increase in tandem, then I would leave the clustering on intID (i.e. table DDL is PRIMARY KEY CLUSTERED (IntId), which is the default anyway).
However, if there is NO correlation between IntId and TimeOperation, and IF most of your queries are of the form SELECT * FROM [MyTable] WHERE TimeOperation between xx and yy then CREATE CLUSTERED INDEX CL_MyTable ON MyTable(TimeOperation) (and changing PK to PRIMARY KEY NONCLUSTERED (IntId)) should improve this query (Rationale: since contiguous times are kept together, fewer pages need to be read, and the bookmark lookup will be avoided). Even better, if values of TimeOperation are guaranteed to be unique, then CREATE UNIQUE CLUSTERED INDEX CL_MyTable ON MyTable(TimeOperation) will improve density as it will avoid the uniqueifier.
Note - for the rest of this answer, I'm assuming that your IntId and TimeOperations ARE strongly correlated and hence the clustering is by IntId.
Covering Indexes
As others have mentioned, your use of SELECT * is bad practice and inter alia means covering indexes won't be of any use (the exception being COUNT(*)).
If your queries weren't SELECT *, but instead e.g.
SELECT TimeOperation, field1
FROM [MyTable]
WHERE TimeOperation BETWEEN x and y -- and returns < 5% data.
Then altering your index on TimeOperation to include field1
CREATE NONCLUSTERED INDEX IX_MyTable ON [MyTable](TimeOperation) INCLUDE(Field1);
OR adding both to the index (with the most common filter first, or the most selective first if both filters are always present)
CREATE NONCLUSTERED INDEX IX_MyTable ON [MyTable](TimeOperation, Field1);
Either will avoid the RID / key lookup. The second option (both columns in the index key) will also address queries where BOTH TimeOperation and Field1 are filtered in a WHERE or HAVING clause.
Re : What's the difference between index on (TimeOperation, Field1) and separate indexes?
e.g.
CREATE NONCLUSTERED INDEX IX_MyTable ON [MyTable](TimeOperation, Field1);
will not be useful for the query
SELECT ... FROM MyTable WHERE Field1 = 'xyz';
The index will only be useful for the queries which have TimeOperation
SELECT ... FROM MyTable WHERE TimeOperation between x and y;
OR
SELECT ... FROM MyTable WHERE TimeOperation between x and y AND Field1 = 'xyz';
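The leading-column rule is not specific to SQL Server. As a rough illustration only, the sketch below reproduces it in SQLite through Python's sqlite3 module, because that can be run without a server. SQLite's planner is a different engine, so this demonstrates the general principle rather than SQL Server's actual plans. The payload column and the date literals are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- SQLite stand-in for MyTable; "payload" is a hypothetical extra column.
    CREATE TABLE MyTable (
        intID INTEGER PRIMARY KEY,
        field1 TEXT,
        payload TEXT,
        TimeOperation TEXT
    );
    CREATE INDEX IX_MyTable ON MyTable(TimeOperation, field1);
""")


def plan(sql):
    """Return the readable steps of SQLite's query plan as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Filter on the second key column only: the composite index is not used.
field1_only = plan("SELECT * FROM MyTable WHERE field1 = 'xyz'")

# Filter on the leading key column: the index can be searched.
time_only = plan(
    "SELECT * FROM MyTable "
    "WHERE TimeOperation BETWEEN '2020-01-01' AND '2020-02-01'"
)

# Filter on both columns: the index is still usable via its leading column.
both = plan(
    "SELECT * FROM MyTable "
    "WHERE TimeOperation BETWEEN '2020-01-01' AND '2020-02-01' "
    "AND field1 = 'xyz'"
)

print(field1_only)
print(time_only)
print(both)
```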
Hope this helps?
An index, at its most basic, creates a layer of the "hypertree" structure behind the scenes, which allows the SQL engine to more easily find rows with particular values for indexed columns. Each index creates a different way to "drill down" into the table's data using a tree search (O(log N) performance). Each index you add makes selecting by that index faster, at the cost of slowing insertions/updates (the data must be written and then every index must be updated as well).
An index, therefore, should normally be created for combinations of columns that are commonly used to filter records. I would indeed create an index on TimeOperation, and TimeOperation alone.
NEVER simply create an index including all columns of a table, especially a wide one such as this.
