over - partition by in SQLite - sqlite

I have a table TABLE in an SQLite database with columns DATE and GROUP. I want to select the first 10 entries in each group. After researching similar topics here on Stack Overflow, I came up with the following query, but it runs very slowly. Any ideas how to make it faster?
select * from TABLE as A
where (select count(*) from TABLE as B
where B.DATE < A.DATE and A.GROUP == B.GROUP) < 10
This is the result of EXPLAIN QUERY PLAN (TABLE = clients_bets):

Here are a few suggestions:
Use a covering index (an index containing all the data needed in the subquery, in this case the group and date)
create index some_index on some_table(some_group, some_date)
Additionally, rewrite the subquery to make it less dependent on the outer query:
select * from some_table as A
where rowid in (
select B.rowid
from some_table as B
where A.some_group == B.some_group
order by B.some_date limit 10 )
The query plan changes from:
0 0 0 SCAN TABLE some_table AS A
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1
1 0 0 SEARCH TABLE some_table AS B USING COVERING INDEX idx_1 (some_group=? AND some_date<?)
to:
0 0 0 SCAN TABLE some_table AS A
0 0 0 EXECUTE CORRELATED LIST SUBQUERY 1
1 0 0 SEARCH TABLE some_table AS B USING COVERING INDEX idx_1 (some_group=?)
While the two plans are very similar, the rewritten query seems noticeably faster. I'm not sure why.
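As a rough way to try the suggestions above, here is a minimal sketch using Python's built-in sqlite3 module. The table contents, the TOP_N value and the in-memory database are assumptions made up for illustration; only the index and the shape of the query come from the answer. One plausible reason for the speed-up is that the count(*) version has to visit every earlier row in the group for each outer row, while the LIMIT version can stop after reading N index entries.

```python
import sqlite3

# Hypothetical sample data: two groups with five dated rows each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_group TEXT, some_date TEXT)")
rows = [(g, f"2020-01-{d:02d}") for g in ("a", "b") for d in range(1, 6)]
conn.executemany("INSERT INTO some_table VALUES (?, ?)", rows)

# Index on (group, date), as suggested in the answer.
conn.execute("CREATE INDEX some_index ON some_table(some_group, some_date)")

TOP_N = 3  # assumed value for the demo; the question uses 10
query = """
SELECT A.some_group, A.some_date
FROM some_table AS A
WHERE A.rowid IN (
    SELECT B.rowid
    FROM some_table AS B
    WHERE B.some_group = A.some_group
    ORDER BY B.some_date
    LIMIT ?)
ORDER BY A.some_group, A.some_date
"""
result = conn.execute(query, (TOP_N,)).fetchall()
for row in result:
    print(row)

# The plan text differs between SQLite versions, so it is only printed.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + query, (TOP_N,)):
    print(plan_row)
```

Note that with tied dates the LIMIT form returns exactly N rows per group, whereas the original count(*) form can return more.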

Related

How to create a sqlite recursive view that properly uses an index for the first row

We're using sqlite version 3.16.0.
I would like to create some views to simplify some common recursive operations I do on our schema. However, these views turn out to be significantly slower than running the SQL directly.
Specifically, a view to show me the ancestors for a given node:
CREATE VIEW ancestors AS
WITH RECURSIVE ancestors
(
leafid
, parentid
, name
, depth
)
AS
(SELECT id
, parentid
, name
, 1
FROM objects
UNION ALL
SELECT a.leafid
, f.parentid
, f.name
, a.depth + 1
FROM objects f
JOIN ancestors a
ON f.id = a.parentid
) ;
when used with this query:
SELECT *
FROM ancestors
WHERE leafid = 157609;
yields this result:
sele order from deta
---- ------------- ---- ----
2 0 0 SCAN TABLE objects
3 0 1 SCAN TABLE ancestors AS a
3 1 0 SEARCH TABLE objects AS f USING INTEGER PRIMARY KEY (rowid=?)
1 0 0 COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
0 0 0 SCAN SUBQUERY 1
Run Time: real 0.374 user 0.372461 sys 0.001483
Yet running the query directly (with a WHERE constraint on the initial query for the same row), yields:
WITH RECURSIVE ancestors
(
leafid, parentid, name, depth
)
AS
(SELECT id, parentid , name, 1
FROM objects
WHERE id = 157609
UNION ALL
SELECT a.leafid, f.parentid , f.name, a.depth + 1
FROM objects f
JOIN ancestors a
ON f.id = a.parentid
)
SELECT *
FROM ancestors;
Run Time: real 0.021 user 0.000249 sys 0.000111
sele order from deta
---- ------------- ---- ----
2 0 0 SEARCH TABLE objects USING INTEGER PRIMARY KEY (rowid=?)
3 0 1 SCAN TABLE ancestors AS a
3 1 0 SEARCH TABLE objects AS f USING INTEGER PRIMARY KEY (rowid=?)
1 0 0 COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
0 0 0 SCAN SUBQUERY 1
The second result is roughly 18 times faster (0.021 s versus 0.374 s real time) because we're using the PK index on objects to get the initial row, whereas the view seems to scan the entire table, filtering on the leaf node only after the ancestors for all rows are found.
Is there any way to write the view such that I can apply a constraint on a consuming select that would be applied to the optimization of the initial query?
You are asking for the WHERE leafid = 157609 to be moved inside the first subquery. This is the push-down optimization, and SQLite tries to do it whenever possible.
However, this is possible only if the database is able to prove that the result is guaranteed to be the same. For this particular query, you know that the transformation would be valid, but, at the moment, there is no algorithm to make this proof for recursive CTEs.
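Since the constraint cannot be pushed into the view, one workaround is to keep the recursive query in application code and bind the leaf id directly in the initial SELECT. The following is only a sketch using Python's sqlite3 module: the ancestors() helper, the ANCESTORS_SQL constant and the four sample rows are hypothetical, and the objects table is reduced to the three columns used in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, parentid INTEGER, name TEXT)"
)
# Hypothetical tree: root(1) -> child(2) -> leaf(3), plus an unrelated node 4.
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "child"), (3, 2, "leaf"), (4, 1, "other")],
)

# Same CTE as in the question, with the filter placed in the initial SELECT
# so the starting row is found through the primary key.
ANCESTORS_SQL = """
WITH RECURSIVE ancestors(leafid, parentid, name, depth) AS (
    SELECT id, parentid, name, 1
    FROM objects
    WHERE id = :leaf
    UNION ALL
    SELECT a.leafid, f.parentid, f.name, a.depth + 1
    FROM objects f
    JOIN ancestors a ON f.id = a.parentid
)
SELECT leafid, parentid, name, depth FROM ancestors ORDER BY depth
"""

def ancestors(connection, leaf_id):
    """Return the ancestor chain for one leaf, starting at the leaf itself."""
    return connection.execute(ANCESTORS_SQL, {"leaf": leaf_id}).fetchall()

chain = ancestors(conn, 3)
for row in chain:
    print(row)
```

This gives up the convenience of a view, but the consuming code still has a single place that defines the recursion.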

SQLite query returns 0 results

I am having trouble with a query.
Fiddle: https://www.db-fiddle.com/f/JXQHw1VzF7vAowNLFrxv5/1
This is not going to work.
So my question is: what has to be done to get a result when I want to use both conditions?
(attr_key = 0 AND attr_value & 201326592 = 201326592)
AND
(attr_key = 30 AND attr_value & 8 = 8)
Thanks in advance!
Best regards
One way to check for the presence of some number of key value pairs in the items_attributes table would be to use conditional aggregation:
SELECT i.id
FROM items i
LEFT JOIN items_attributes ia
ON i.id = ia.owner
GROUP BY
i.id
HAVING
SUM(CASE WHEN ia.key = 0 AND ia.value = 201326592 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN ia.key = 30 AND ia.value = 8 THEN 1 ELSE 0 END) > 0
The trick in the above query is that we scan each cluster of key/value pairs for each item, and then check whether the pairs you expect are present.
Note: My query just returns id values from items matching all key value pairs. If you want to bring in other columns from either of the two tables, you may simply add on more joins to what I wrote above.
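The two conditions in the question can never hold for the same row, because a single row cannot have attr_key = 0 and attr_key = 30 at once; that is why they have to be checked across the rows of each item. Below is a small sketch of conditional aggregation that keeps the bitmask tests from the question. It uses Python's sqlite3 module, and the schema (items, items_attributes with owner, attr_key, attr_value) and sample rows are assumptions, since the fiddle contents are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and data, for illustration only.
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY);
CREATE TABLE items_attributes (owner INTEGER, attr_key INTEGER, attr_value INTEGER);
INSERT INTO items VALUES (1), (2), (3);
INSERT INTO items_attributes VALUES
    (1, 0, 201326592), (1, 30, 8),   -- item 1: both conditions hold
    (2, 0, 201326592), (2, 30, 4),   -- item 2: bit 8 is not set for key 30
    (3, 30, 12);                     -- item 3: no row for key 0
""")

query = """
SELECT i.id
FROM items i
JOIN items_attributes ia ON ia.owner = i.id
GROUP BY i.id
HAVING SUM(CASE WHEN ia.attr_key = 0
                 AND (ia.attr_value & 201326592) = 201326592
                THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN ia.attr_key = 30
                 AND (ia.attr_value & 8) = 8
                THEN 1 ELSE 0 END) > 0
"""
matching = [row[0] for row in conn.execute(query)]
print(matching)
```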

Trouble with Sqlite subquery

My CustomTags table may have a series of "temporary" records where Tag_ID is 0, and Tag_Number will have some five digit value.
Periodically, I want to clean up my Sqlite table to remove these temporary values.
For example, I might have:
Tag_ID Tag_Number
0 12345
0 67890
0 45678
1 12345
2 67890
In this case, I want to remove the first two records because they are duplicated with actual Tag_ID 1 and 2. But I don't want to remove the third record yet because it hasn't been duplicated yet.
I have tried a number of different types of subqueries, but I just can't get it working. This is the last thing I tried, but my database client complains of an unknown syntax error. (I have tried with and without AS as an alias)
DELETE FROM CustomTags t1
WHERE t1.Tag_ID = 0
AND (SELECT COUNT(*) FROM CustomTags t2 WHERE t1.Tag_Number = t2.Tag_Number) > 1
Can anyone offer some insight? Thank you
There are many options, but the simplest is probably to use EXISTS:
DELETE FROM CustomTags
WHERE Tag_ID = 0
AND EXISTS(
SELECT 1 FROM CustomTags c
WHERE c.Tag_ID <> 0 AND c.Tag_Number = CustomTags.Tag_Number
)
An SQLfiddle to test with.
...or IN...
DELETE FROM CustomTags
WHERE Tag_ID = 0
AND Tag_Number IN (
SELECT Tag_Number FROM CustomTags WHERE Tag_ID <> 0
)
Another SQLfiddle.
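To check the EXISTS variant against the sample data from the question, something like the following sketch can be used. It relies on Python's sqlite3 module with an in-memory database; the column types are an assumption, as the question does not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomTags (Tag_ID INTEGER, Tag_Number INTEGER)")
conn.executemany(
    "INSERT INTO CustomTags VALUES (?, ?)",
    [(0, 12345), (0, 67890), (0, 45678), (1, 12345), (2, 67890)],
)

# Delete temporary rows (Tag_ID = 0) whose Tag_Number also exists
# on a row with a real Tag_ID.
cursor = conn.execute("""
DELETE FROM CustomTags
WHERE Tag_ID = 0
  AND EXISTS (
      SELECT 1 FROM CustomTags c
      WHERE c.Tag_ID <> 0 AND c.Tag_Number = CustomTags.Tag_Number
  )
""")
deleted = cursor.rowcount

remaining = conn.execute(
    "SELECT Tag_ID, Tag_Number FROM CustomTags ORDER BY Tag_ID, Tag_Number"
).fetchall()
print(deleted, remaining)
```

The temporary row for 45678 survives because no row with a non-zero Tag_ID shares its Tag_Number.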
With your dataset like so:
sqlite> select * from test;
tag_id tag_number
---------- ----------
1 12345
1 67890
0 12345
0 67890
0 45678
You can run:
delete from test
where rowid not in (
select a.rowid
from test a
inner join (select tag_number, max(tag_id) as mt from test group by tag_number) b
on a.tag_number = b.tag_number
and a.tag_id = b.mt
);
Result:
sqlite> select * from test;
tag_id tag_number
---------- ----------
1 12345
1 67890
0 45678
Please do test this out with a few more test cases than you have to be entirely sure that's what you want. I'd recommend creating a copy of your database before you run this on a large dataset.

WebSQL query optimization help - CASE WHEN makes query 400 ms slow

Can somebody tell me why 'CASE WHEN' makes it so slow, and how to optimize or fix it, please?
It is needed so that pinned items come first in the result, in order.
I could probably do it after the SQL query, but I think it would be faster, when done right, if this sorting is done inside the SQL query.
slow query ~490ms
SELECT
places.id AS place_id,
url,
title,
thumbnails.score AS score,
thumbnails.clipping AS clipping,
thumbnails.lastModified AS lastModified,
EXISTS (SELECT 1 FROM pinned pi WHERE pi.place_id = places.id) AS pinned
FROM places
LEFT JOIN thumbnails ON (thumbnails.place_id = places.id)
LEFT JOIN pinned j ON (j.place_id = places.id) WHERE (hidden == 0)
ORDER BY case when j.id is null then 1 else 0 end,
j.id,
frecency DESC LIMIT 24
Removing the 'CASE WHEN' part:
query ~6ms
SELECT
places.id AS place_id,
url,
title,
thumbnails.score AS score,
thumbnails.clipping AS clipping,
thumbnails.lastModified AS lastModified,
EXISTS (SELECT 1 FROM pinned pi WHERE pi.place_id = places.id) AS pinned
FROM places
LEFT JOIN thumbnails ON (thumbnails.place_id = places.id) WHERE (hidden == 0)
ORDER BY frecency DESC LIMIT 24
Table info:
var Create_Table_Places =
'CREATE TABLE places (' +
'id INTEGER PRIMARY KEY,' +
'url LONGVARCHAR,' +
'title LONGVARCHAR,' +
'visit_count INTEGER DEFAULT 0,' +
'hidden INTEGER DEFAULT 0 NOT NULL,' +
'typed INTEGER DEFAULT 0 NOT NULL,' +
'frecency INTEGER DEFAULT -1 NOT NULL,' +
'last_visit_date INTEGER,' +
'dateAdded INTEGER,' +
'lastModified INTEGER' +
')';
var Create_Table_Thumbnails =
'CREATE TABLE thumbnails (' +
'id INTEGER PRIMARY KEY,' +
'place_id INTEGER UNIQUE,' +
'data LONGVARCHAR,' +
'score REAL,' +
'clipping INTEGER,' +
'dateAdded INTEGER,' +
'lastModified INTEGER' +
')';
var Create_Table_Pinned =
'CREATE TABLE pinned (' +
'id INTEGER PRIMARY KEY,' +
'place_id INTEGER UNIQUE,' +
'position INTEGER,' +
'dateAdded INTEGER,' +
'lastModified INTEGER' +
')';
To find out whether there are fundamental differences in the execution of queries, use EXPLAIN QUERY PLAN.
In SQLite 3.7.almost15, your queries have the following plans:
selectid order from detail
-------- ----- ---- ------
0 0 0 SCAN TABLE places (~100000 rows)
0 1 1 SEARCH TABLE thumbnails USING INDEX sqlite_autoindex_thumbnails_1 (place_id=?) (~1 rows)
0 2 2 SEARCH TABLE pinned AS j USING COVERING INDEX sqlite_autoindex_pinned_1 (place_id=?) (~1 rows)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1
1 0 0 SEARCH TABLE pinned AS pi USING COVERING INDEX sqlite_autoindex_pinned_1 (place_id=?) (~1 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
selectid order from detail
-------- ----- ---- ------
0 0 0 SCAN TABLE places (~100000 rows)
0 1 1 SEARCH TABLE thumbnails USING INDEX sqlite_autoindex_thumbnails_1 (place_id=?) (~1 rows)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1
1 0 0 SEARCH TABLE pinned AS pi USING COVERING INDEX sqlite_autoindex_pinned_1 (place_id=?) (~1 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
These two plans are almost identical, except for the duplicate pinned lookup.
If your SQLite doesn't execute the queries this way, update it.
In your first query, you can remove the subquery for the pinned field because you are already joining with the pinned table, and you're executing exactly the same lookup that was done for the join; use j.id IS NOT NULL instead.
Your CASE WHEN has the purpose of sorting the NULLs after the other values.
You can get the same effect by converting all NULLs to some value that is sorted after numbers, such as a string:
... ORDER BY IFNULL(j.id, ''), frecency DESC
However, in theory, this should not have much of a runtime difference from CASE WHEN.
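The two ORDER BY variants can be compared with a small sketch like this one, written with Python's sqlite3 module. The tables are cut down to the columns involved in the sort, the sample rows are invented, and is_pinned is a hypothetical alias for the j.id IS NOT NULL expression suggested above. It relies on SQLite sorting integers before text values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, assumed versions of the tables from the question.
conn.executescript("""
CREATE TABLE places (id INTEGER PRIMARY KEY, frecency INTEGER NOT NULL);
CREATE TABLE pinned (id INTEGER PRIMARY KEY, place_id INTEGER UNIQUE);
INSERT INTO places VALUES (1, 50), (2, 90), (3, 10), (4, 70);
INSERT INTO pinned VALUES (1, 3), (2, 1);
""")

base = """
SELECT places.id, j.id IS NOT NULL AS is_pinned
FROM places
LEFT JOIN pinned j ON j.place_id = places.id
ORDER BY {order}, frecency DESC
"""

# Original approach: explicit flag that sorts NULLs last, then j.id.
with_case = conn.execute(
    base.format(order="CASE WHEN j.id IS NULL THEN 1 ELSE 0 END, j.id")
).fetchall()

# Alternative: map NULL to a string, which sorts after all numbers.
with_ifnull = conn.execute(
    base.format(order="IFNULL(j.id, '')")
).fetchall()

print(with_case)
print(with_ifnull)
```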

Parsing data rows in plsql

This is quite clumsy.
Initial info: There's a clumsy select query that eventually returns 0 or 1 depending on several conditions. Usually it selects only one row (other data is cut off by the WHERE clause, etc.). The problem occurs when there are more rows to parse. So the data actually looks as follows:
Status
0
1
instead of
Status
1
Problem: Only a single row is needed in return, i.e. if a 1 is present in any row, 1 should be returned, otherwise 0.
Condition: It should be done only in a query (no variables, ifs etc.).
Thanks in advance.
If you are sure that 1 and 0 are the only values being returned, can't you take the max over this query to detect any 1s?
select max(id) result
from (
select 1 id from dual
union all
select 0 id from dual
)
RESULT
----------
1
select max(id)
from (
select 0 id from dual
union all
select 0 id from dual
union all
select 0 id from dual
)
SQL> /
MAX(ID)
----------
0
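One case the examples above do not cover is an inner query that returns no rows at all: MAX then yields NULL rather than 0. Wrapping it in COALESCE, which Oracle also supports, handles that. The sketch below shows the idea with Python's sqlite3 module instead of Oracle, so there is no dual table; status_rows and any_flag_set() are hypothetical stand-ins for the real inner query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def any_flag_set(statuses):
    """Collapse a list of 0/1 status rows into a single 0 or 1 using MAX."""
    # status_rows stands in for the result of the original inner query.
    conn.execute("DROP TABLE IF EXISTS status_rows")
    conn.execute("CREATE TABLE status_rows (status INTEGER)")
    conn.executemany(
        "INSERT INTO status_rows VALUES (?)", [(s,) for s in statuses]
    )
    # COALESCE turns the NULL from an empty input into 0.
    (result,) = conn.execute(
        "SELECT COALESCE(MAX(status), 0) FROM status_rows"
    ).fetchone()
    return result

print(any_flag_set([0, 1]))
print(any_flag_set([0, 0, 0]))
print(any_flag_set([]))
```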

Resources