Can somebody tell me why 'CASE WHEN' makes this query so slow, and how to optimize or fix it, please?
The items that are pinned need to come first in the result, and in their pinned order.
I could probably do the sorting after the SQL query, but I think it would be faster, when done right, inside the query itself.
slow query ~490ms
SELECT
places.id AS place_id,
url,
title,
thumbnails.score AS score,
thumbnails.clipping AS clipping,
thumbnails.lastModified AS lastModified,
EXISTS (SELECT 1 FROM pinned pi WHERE pi.place_id = places.id) AS pinned
FROM places
LEFT JOIN thumbnails ON (thumbnails.place_id = places.id)
LEFT JOIN pinned j ON (j.place_id = places.id) WHERE (hidden == 0)
ORDER BY case when j.id is null then 1 else 0 end,
j.id,
frecency DESC LIMIT 24
Removing the 'CASE WHEN' part:
query ~6ms
SELECT
places.id AS place_id,
url,
title,
thumbnails.score AS score,
thumbnails.clipping AS clipping,
thumbnails.lastModified AS lastModified,
EXISTS (SELECT 1 FROM pinned pi WHERE pi.place_id = places.id) AS pinned
FROM places
LEFT JOIN thumbnails ON (thumbnails.place_id = places.id) WHERE (hidden == 0)
ORDER BY frecency DESC LIMIT 24
Table info:
var Create_Table_Places =
'CREATE TABLE places (' +
'id INTEGER PRIMARY KEY,' +
'url LONGVARCHAR,' +
'title LONGVARCHAR,' +
'visit_count INTEGER DEFAULT 0,' +
'hidden INTEGER DEFAULT 0 NOT NULL,' +
'typed INTEGER DEFAULT 0 NOT NULL,' +
'frecency INTEGER DEFAULT -1 NOT NULL,' +
'last_visit_date INTEGER,' +
'dateAdded INTEGER,' +
'lastModified INTEGER' +
')';
var Create_Table_Thumbnails =
'CREATE TABLE thumbnails (' +
'id INTEGER PRIMARY KEY,' +
'place_id INTEGER UNIQUE,' +
'data LONGVARCHAR,' +
'score REAL,' +
'clipping INTEGER,' +
'dateAdded INTEGER,' +
'lastModified INTEGER' +
')';
var Create_Table_Pinned =
'CREATE TABLE pinned (' +
'id INTEGER PRIMARY KEY,' +
'place_id INTEGER UNIQUE,' +
'position INTEGER,' +
'dateAdded INTEGER,' +
'lastModified INTEGER' +
')';
To find out whether there are fundamental differences in the execution of queries, use EXPLAIN QUERY PLAN.
In SQLite 3.7.15, your queries have the following plans:
selectid order from detail
-------- ----- ---- ------
0 0 0 SCAN TABLE places (~100000 rows)
0 1 1 SEARCH TABLE thumbnails USING INDEX sqlite_autoindex_thumbnails_1 (place_id=?) (~1 rows)
0 2 2 SEARCH TABLE pinned AS j USING COVERING INDEX sqlite_autoindex_pinned_1 (place_id=?) (~1 rows)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1
1 0 0 SEARCH TABLE pinned AS pi USING COVERING INDEX sqlite_autoindex_pinned_1 (place_id=?) (~1 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
selectid order from detail
-------- ----- ---- ------
0 0 0 SCAN TABLE places (~100000 rows)
0 1 1 SEARCH TABLE thumbnails USING INDEX sqlite_autoindex_thumbnails_1 (place_id=?) (~1 rows)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1
1 0 0 SEARCH TABLE pinned AS pi USING COVERING INDEX sqlite_autoindex_pinned_1 (place_id=?) (~1 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
These two plans are almost identical, except for the duplicate pinned lookup.
If your SQLite doesn't execute the queries this way, update it.
In your first query, you can drop the subquery for the pinned field: you are already joining with the pinned table, and that subquery performs exactly the same lookup as the join. Use j.id IS NOT NULL instead.
Your CASE WHEN serves to sort the NULLs after the other values.
You can get the same effect by converting all NULLs to some value that is sorted after numbers, such as a string:
... ORDER BY IFNULL(j.id, ''), frecency DESC
However, in theory, this should not have much of a runtime difference from CASE WHEN.
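As a quick sanity check of the IFNULL trick, here is a minimal sketch using Python's `sqlite3` module (the table and column names are invented for illustration). In SQLite, TEXT values always sort after INTEGER values, which is why converting NULLs to `''` pushes them to the end:

```python
import sqlite3

# In SQLite's type ordering, TEXT sorts after INTEGER,
# so IFNULL(pin_id, '') pushes rows with a NULL pin_id to the end.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT, pin_id INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [("a", None), ("b", 2), ("c", 1), ("d", None)])

rows = con.execute(
    "SELECT name FROM t ORDER BY IFNULL(pin_id, ''), name"
).fetchall()
print([r[0] for r in rows])  # → ['c', 'b', 'a', 'd']: pinned rows first, NULLs last
```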
Related
This query worked perfectly until the moment I went on vacation; now it does not run anymore and does not merge. I don't know what the cause could be.
MERGE INTO STG_FATO_MACRO_GESTAO AS FAT
USING(SELECT DISTINCT
COD_EMPRESA
,FUN.MATRICULA AS FUN_MAT
,APR.MATRICULA AS APR_MAT
,FUN.CPF AS FUN_CPF
,APR.CPF AS APR_CPF
,APR.DAT_DESLIGAMENTO
,YEAR(APR.DAT_DESLIGAMENTO)*100+MONTH(APR.DAT_DESLIGAMENTO) AS DESL
,FUN.DATA_ADMISSAO
,YEAR(FUN.DATA_ADMISSAO)*100+MONTH(FUN.DATA_ADMISSAO) AS ADM
, CASE WHEN YEAR(APR.DAT_DESLIGAMENTO)*100+MONTH(APR.DAT_DESLIGAMENTO) <= YEAR(FUN.DATA_ADMISSAO)*100+MONTH(FUN.DATA_ADMISSAO) THEN 1 ELSE 0 END AS ADMITIDO
,CASE WHEN FUN.DATA_ADMISSAO <= (APR.DAT_DESLIGAMENTO + INTERVAL '90' DAY) THEN 1 ELSE 0 END AS APR_90
FROM (SELECT CPF,DATA_ADMISSAO, MATRICULA, COD_EMPRESA FROM DIM_FUNCIONARIO
WHERE PROFISSAO NOT LIKE '%APRENDIZ%') AS FUN
INNER JOIN (SELECT DISTINCT
CPF,DAT_DESLIGAMENTO,MATRICULA
FROM HST_APRENDIZ
WHERE FLAG_FECHAMENTO = 2
AND DAT_DESLIGAMENTO IS NOT NULL) AS APR
ON FUN.CPF = APR.CPF) AS APR_90
ON FAT.COD_EMPRESA = APR_90.COD_EMPRESA
AND FAT.MATRICULA = APR_90.FUN_MAT
AND APR_90.APR_90 = 1
AND APR_90.ADMITIDO = 1
WHEN MATCHED THEN
UPDATE SET APRENDIZ_EFETIVADO_90 = 1
;
Running this query returns the following error:
"The search condition must fully specify the Target table primary index and partition column(s) and expression must match INSERT specification primary index and partition column(s). "
We have a single table that includes Parent/Child relationships.
We are trying to update the level of the children based on the parent's level + 1.
We are currently doing a loop in code that looks something like this:
Dim ParentLevel as Integer = 1
Do While Count > 0
Update [MyTable]
SET [MyLevel] = (SELECT [P].[MyLevel] + 1
FROM [MyTable] [P]
WHERE [MyTable].[Parent] = [P].[Child]
AND [P].[MyLevel] = ParentLevel)
WHERE [MyLevel] IS NULL
AND [Parent] IN (SELECT [Child]
FROM [MyTable] [P]
WHERE P.[MyLevel] = ParentLevel);
execute query
ParentLevel = ParentLevel + 1
Loop
This table has only 31148 records.
When ParentLevel is below around 3 or 4, the update query's performance is acceptable.
After that, however, it takes too long, so we know we must be doing something wrong.
Indexes exist on Parent, Child, and MyLevel.
We've tried other update queries like below but performance is the same:
Update [MyTable]
SET [MyLevel] = (SELECT P.[MyLevel] + 1
FROM [MyTable] AS P
WHERE [MyTable].[Parent] = P.[Child]
AND [P].[MyLevel] = ParentLevel )
WHERE EXISTS (SELECT P.[MyLevel] + 1
FROM [MyTable] AS P
WHERE [MyTable].[Parent] = P.[Child]
AND [P].[MyLevel] = 6);
We were hoping for some SQLite experts' advice on a better-performing update query.
We're using sqlite version 3.16.0.
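For reference, the level-by-level loop above can be sketched end-to-end with Python's `sqlite3` module (the table name `mytable` and its columns are stand-ins for the real schema). Note that this sketch sets `mylevel = level + 1` directly instead of using the correlated subquery; the two are equivalent here, because every row updated in a pass has a parent at level `level`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (child INTEGER, parent INTEGER, mylevel INTEGER)")
con.execute("CREATE INDEX idx_parent ON mytable(parent)")
con.execute("CREATE INDEX idx_child_level ON mytable(child, mylevel)")
# A small tree: 1 is the root (level 1), every other level starts out NULL.
con.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                [(1, None, 1), (2, 1, None), (3, 1, None),
                 (4, 2, None), (5, 4, None)])

level = 1
while True:
    cur = con.execute(
        """UPDATE mytable
           SET mylevel = ?1 + 1
           WHERE mylevel IS NULL
             AND parent IN (SELECT child FROM mytable WHERE mylevel = ?1)""",
        (level,))
    if cur.rowcount == 0:   # no children found at this level: done
        break
    level += 1

result = con.execute("SELECT child, mylevel FROM mytable ORDER BY child").fetchall()
print(result)  # → [(1, 1), (2, 2), (3, 2), (4, 3), (5, 4)]
```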
I would like to create some views to simplify some common recursive operations I do on our schema. However, these views turn out to be significantly slower than running the SQL directly.
Specifically, a view to show me the ancestors for a given node:
CREATE VIEW ancestors AS
WITH RECURSIVE ancestors
(
leafid
, parentid
, name
, depth
)
AS
(SELECT id
, parentid
, name
, 1
FROM objects
UNION ALL
SELECT a.leafid
, f.parentid
, f.name
, a.depth + 1
FROM objects f
JOIN ancestors a
ON f.id = a.parentid
) ;
when used with this query:
SELECT *
FROM ancestors
WHERE leafid = 157609;
yields this result:
sele order from deta
---- ------------- ---- ----
2 0 0 SCAN TABLE objects
3 0 1 SCAN TABLE ancestors AS a
3 1 0 SEARCH TABLE objects AS f USING INTEGER PRIMARY KEY (rowid=?)
1 0 0 COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
0 0 0 SCAN SUBQUERY 1
Run Time: real 0.374 user 0.372461 sys 0.001483
Yet running the query directly (with a WHERE constraint on the initial query for the same row), yields:
WITH RECURSIVE ancestors
(
leafid, parentid, name, depth
)
AS
(SELECT id, parentid , name, 1
FROM objects
WHERE id = 157609
UNION ALL
SELECT a.leafid, f.parentid , f.name, a.depth + 1
FROM objects f
JOIN ancestors a
ON f.id = a.parentid
)
SELECT *
FROM ancestors;
Run Time: real 0.021 user 0.000249 sys 0.000111
sele order from deta
---- ------------- ---- ----
2 0 0 SEARCH TABLE objects USING INTEGER PRIMARY KEY (rowid=?)
3 0 1 SCAN TABLE ancestors AS a
3 1 0 SEARCH TABLE objects AS f USING INTEGER PRIMARY KEY (rowid=?)
1 0 0 COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
0 0 0 SCAN SUBQUERY 1
The second result is around 15 times faster because we're using the PK index on objects to get the initial row, whereas the view seems to scan the entire table, filtering on leaf node only after the ancestors for all rows are found.
Is there any way to write the view such that I can apply a constraint on a consuming select that would be applied to the optimization of the initial query?
You are asking for the WHERE leafid = 157609 to be moved inside the first subquery. This is the push-down optimization, and SQLite tries to do it whenever possible.
However, this is possible only if the database is able to prove that the result is guaranteed to be the same. For this particular query, you know that the transformation would be valid, but, at the moment, there is no algorithm to make this proof for recursive CTEs.
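Until such a proof algorithm exists, the practical workaround is to keep the recursive CTE as a prepared query with a parameter in the anchor member, rather than as a view. A minimal sketch with Python's `sqlite3` (the `objects` table here is a toy stand-in for the real schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, parentid INTEGER, name TEXT)")
con.executemany("INSERT INTO objects VALUES (?, ?, ?)",
                [(1, None, "root"), (2, 1, "child"), (3, 2, "leaf")])

# Placing the constraint inside the anchor member lets SQLite use the
# primary-key index, instead of scanning the whole table as the view does.
ANCESTORS = """
WITH RECURSIVE ancestors(leafid, parentid, name, depth) AS (
    SELECT id, parentid, name, 1 FROM objects WHERE id = ?
    UNION ALL
    SELECT a.leafid, f.parentid, f.name, a.depth + 1
    FROM objects f JOIN ancestors a ON f.id = a.parentid
)
SELECT * FROM ancestors
"""
rows = con.execute(ANCESTORS, (3,)).fetchall()
print(rows)  # → [(3, 2, 'leaf', 1), (3, 1, 'child', 2), (3, None, 'root', 3)]
```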
I am having trouble with a query.
Fiddle: https://www.db-fiddle.com/f/JXQHw1VzF7vAowNLFrxv5/1
This is not going to work.
So my question is: what has to be done to get a result when I want to use both conditions?
(attr_key = 0 AND attr_value & 201326592 = 201326592)
AND
(attr_key = 30 AND attr_value & 8 = 8)
One way to check for the presence of some number of key value pairs in the items_attributes table would be to use conditional aggregation:
SELECT i.id
FROM items i
LEFT JOIN items_attributes ia
ON i.id = ia.owner
GROUP BY
i.id
HAVING
SUM(CASE WHEN ia.key = 0 AND ia.value = 201326592 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN ia.key = 30 AND ia.value = 8 THEN 1 ELSE 0 END) > 0
The trick in the above query is that we scan each cluster of key/value pairs for each item, and then check whether the pairs you expect are present.
Note: My query just returns id values from items matching all key value pairs. If you want to bring in other columns from either of the two tables, you may simply add on more joins to what I wrote above.
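Here is a runnable sketch of the conditional-aggregation pattern, using Python's `sqlite3` with toy data modeled on the `items`/`items_attributes` schema from the fiddle:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE items_attributes (owner INTEGER, key INTEGER, value INTEGER)")
con.executemany("INSERT INTO items VALUES (?)", [(1,), (2,)])
con.executemany("INSERT INTO items_attributes VALUES (?, ?, ?)",
                [(1, 0, 201326592), (1, 30, 8),   # item 1 has both pairs
                 (2, 0, 201326592)])              # item 2 has only one

rows = con.execute("""
    SELECT i.id
    FROM items i
    LEFT JOIN items_attributes ia ON i.id = ia.owner
    GROUP BY i.id
    HAVING SUM(CASE WHEN ia.key = 0  AND ia.value = 201326592 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN ia.key = 30 AND ia.value = 8         THEN 1 ELSE 0 END) > 0
""").fetchall()
print(rows)  # → [(1,)]: only item 1 matches both key/value pairs
```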
I have a table TABLE in SQLite database with columns DATE, GROUP. I want to select the first 10 entries in each group. After researching similar topics here on stackoverflow, I came up with the following query, but it runs very slowly. Any ideas how to make it faster?
select * from TABLE as A
where (select count(*) from TABLE as B
where B.DATE < A.DATE and A.GROUP == B.GROUP) < 10
This is the result of EXPLAIN QUERY PLAN (TABLE = clients_bets):
Here are a few suggestions:
Use a covering index (an index containing all the data needed in the subquery, in this case the group and date)
create index some_index on some_table(some_group, some_date)
Additionally, rewrite the subquery to make it less dependent on the outer query:
select * from some_table as A
where rowid in (
select B.rowid
from some_table as B
where A.some_group == B.some_group
order by B.some_date limit 10 )
The query plan changes from:
0 0 0 SCAN TABLE some_table AS A
0 0 0 EXECUTE CORRELATED LIST SUBQUERY 1
1 0 0 SEARCH TABLE some_table AS B USING COVERING INDEX idx_1 (some_group=?)
to:
0 0 0 SCAN TABLE some_table AS A
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1
1 0 0 SEARCH TABLE some_table AS B USING COVERING INDEX idx_1 (some_group=? AND some_date<?)
While the plan is very similar, the query seems noticeably faster; I'm not sure why.
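A runnable sketch of the rewritten top-N-per-group query, using Python's `sqlite3` with the same generic table and column names as above (LIMIT 3 instead of 10, to keep the toy data small):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE some_table (some_group TEXT, some_date INTEGER)")
con.execute("CREATE INDEX some_index ON some_table(some_group, some_date)")
con.executemany("INSERT INTO some_table VALUES (?, ?)",
                [("a", d) for d in range(5)] + [("b", d) for d in range(5)])

# Keep only the first 3 rows per group, ordered by date.
rows = con.execute("""
    SELECT * FROM some_table AS A
    WHERE rowid IN (SELECT B.rowid
                    FROM some_table AS B
                    WHERE A.some_group = B.some_group
                    ORDER BY B.some_date LIMIT 3)
""").fetchall()
print(rows)  # → [('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2)]
```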