sqlite3 recursive aggregation of data - recursion

This may be a kind of knapsack problem.
I need to traverse a data table, group it by a column, and choose the row with the best (lowest) time.
Then I repeat the previous step until the limit given by the capacity column is reached.
This is the demo scenario:
create table if not exists data( vid num, size num, epid num, sid num, capacity num, dt );
delete from data;
insert into data(vid,size,epid,sid,capacity,dt)
values
(0,20,1,1,50,1100), -- 2nd choice
(0,20,1,1,50,1000), -- 1st choice
(0,20,1,1,50,1200), -- last choice excluded because out of capacity
(1,20,2,2,50,1100), -- 2nd choice
(1,20,2,2,50,1000), -- 1st choice
(1,20,2,2,50,1200); -- last choice excluded because out of capacity
This is the non recursive solution:
with best0 as (
select a.rowid as tid,a.vid,a.sid,a.size,a.dt,a.capacity-a.size as remains,0 as level
from data a
group by a.sid
having min(a.dt)
),
best1 as (
select a.tid,a.vid,a.sid,a.size,a.dt,a.remains, a.level
from (
select
a.rowid as tid,a.sid,a.vid,a.size,a.capacity,a.dt,b.remains-a.size as remains,
b.level+1 as level
from data a
join best0 b on b.sid=a.sid -- and b.level=a.level-1
where not a.rowid in (select tid from best0)
and b.remains-a.size>0
) a group by a.sid having min(a.dt)
),
best2 as (
select a.tid,a.vid,a.sid,a.size,a.dt,a.remains, a.level
from (
select
a.rowid as tid,a.sid,a.vid,a.size,a.capacity,a.dt,b.remains-a.size as remains,
b.level+1 as level
from data a
join best1 b on b.sid=a.sid -- and b.level=a.level-1
where not a.rowid in (select tid from best0 union all select tid from best1)
and b.remains-a.size>0
) a group by a.sid having min(a.dt)
)
select * from best0
union all
select * from best1
union all
select * from best2
And this the result:
tid | vid | sid | size | dt | remains | level
--- | --- | --- | ---- | ----- | -------- | -----------
2 | 0 | 1 | 20 | 1000 | 30 | 0
5 | 1 | 2 | 20 | 1000 | 30 | 0
1 | 0 | 1 | 20 | 1100 | 10 | 1
4 | 1 | 2 | 20 | 1100 | 10 | 1
This is the recursive version, which gives the error "recursive reference in a subquery: best":
with recursive best(tid,vid,sid,size,dt,remains,level)
as (
select a.rowid as tid,a.vid,a.sid,a.size,a.dt,a.capacity-a.size as remains,0 as level
from data a
group by a.sid
having min(a.dt)
union all
select a.tid,a.vid,a.sid,a.size,a.dt,a.remains, a.level
from (
select
a.rowid as tid,a.sid,a.vid,a.size,a.dt,b.remains-a.size as remains,
b.level+1 as level
from data a
join best b on b.sid=a.sid -- and b.level=a.level-1
where not a.rowid in (select tid from best) and b.remains-a.size>0
) a group by a.sid having min(a.dt)
)
select * from best
I tried different solutions, even using a loop counter, but every one gives the same error.
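SQLite rejects the query because the recursive table may appear only once, in the FROM clause of the recursive step, and never inside a subquery; the recursive step also may not use aggregate functions, so `group by ... having min(dt)` cannot live there. A minimal sketch of one workaround, assuming SQLite 3.25 or later (for window functions): rank each sid's rows by `dt` in an ordinary CTE, then let the recursion simply walk to the next rank while capacity remains. Breaking ties in `dt` by `rowid` and the names `ranked` and `rn` are my own assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table data(vid num, size num, epid num, sid num, capacity num, dt);
insert into data(vid,size,epid,sid,capacity,dt) values
  (0,20,1,1,50,1100),(0,20,1,1,50,1000),(0,20,1,1,50,1200),
  (1,20,2,2,50,1100),(1,20,2,2,50,1000),(1,20,2,2,50,1200);
""")

QUERY = """
with recursive
ranked as (
  -- non-recursive CTE: number each sid's rows by ascending dt (rn = 1 is best)
  select rowid as tid, vid, sid, size, dt, capacity,
         row_number() over (partition by sid order by dt, rowid) as rn
  from data
),
best(tid, vid, sid, size, dt, remains, level) as (
  -- anchor: the best row of every sid
  select tid, vid, sid, size, dt, capacity - size, 0
  from ranked where rn = 1
  union all
  -- recursive step: the next-ranked row of the same sid, if capacity remains;
  -- level k holds rank k + 1, so the next row has rank level + 2
  select r.tid, r.vid, r.sid, r.size, r.dt, b.remains - r.size, b.level + 1
  from best b
  join ranked r on r.sid = b.sid and r.rn = b.level + 2
  where b.remains - r.size > 0
)
select * from best order by level, sid
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because "not already chosen" becomes "the next rank", the recursive step no longer needs `not in (select tid from best)` or any aggregate, which are exactly the two constructs SQLite forbids there.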

Related

ROW_NUMBER, RANK, or DENSE_RANK to have the SAME order number if there are rows with same values

PROBLEM
Form a scoreboard with position number, player name and best score. If two players have the same score, they share the position and the names are in alphabetical order. (See example.)
I have two tables:
INSERT INTO players (id, name) VALUES (1, 'Uolevi'), (2, 'Maija'), (3, 'Liisa'), (4, 'Kaaleppi'), (5, 'Kotivalo');
INSERT INTO results (id, player_id, score) VALUES (1, 1, 100), (2, 2, 200), (3, 3, 200), (4, 4, 100), (5, 5, 50);
The expected result is:
Order | Name | Score
----- | -------- | -----
1 | Liisa | 200
1 | Maija | 200
3 | Kaaleppi | 100
3 | Uolevi | 100
5 | Kotivalo | 50
Please look carefully at the order number. Because there are 2 rows with order number 1, the next order number will be 3, instead of 2.
Here is a possible solution:
WITH
places AS (
SELECT row_number() OVER (ORDER BY score DESC, name) AS place, name, score
FROM players
JOIN results ON players.id = results.player_id
)
SELECT first_value(place) OVER (PARTITION BY score ORDER BY place) AS place, name, score
FROM places ORDER BY score DESC, name;
Use rank() when tied rows should share a position and the following position should skip ahead (leaving a gap). Use dense_rank() when you don't want gaps between ranks.
select rank() over (order by r.score desc) as order1,
dense_rank() over (order by r.score desc) as order2,
p.name,
r.score
from players p
join results r
on p.id = r.player_id
order by r.score desc, p.name;
Outcome (both rank() and dense_rank() are included for reference):
| order1 | order2 | name | score |
+--------+--------+-----------+-------+
| 1 | 1 | Liisa | 200 |
| 1 | 1 | Maija | 200 |
| 3 | 2 | Kaaleppi | 100 |
| 3 | 2 | Uolevi | 100 |
| 5 | 3 | Kotivalo | 50 |
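To check that the two answers agree, here is a small sketch that runs both queries through Python's sqlite3 module. The CREATE TABLE statements are assumptions (the question shows only the inserts), and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);      -- assumed schema
CREATE TABLE results (id INTEGER PRIMARY KEY, player_id INTEGER, score INTEGER);
INSERT INTO players (id, name) VALUES
  (1, 'Uolevi'), (2, 'Maija'), (3, 'Liisa'), (4, 'Kaaleppi'), (5, 'Kotivalo');
INSERT INTO results (id, player_id, score) VALUES
  (1, 1, 100), (2, 2, 200), (3, 3, 200), (4, 4, 100), (5, 5, 50);
""")

# First answer: row_number() gives 1..n, then first_value() copies the first
# place of each score group onto every tied row.
first_value_rows = conn.execute("""
WITH places AS (
  SELECT row_number() OVER (ORDER BY score DESC, name) AS place, name, score
  FROM players JOIN results ON players.id = results.player_id
)
SELECT first_value(place) OVER (PARTITION BY score ORDER BY place) AS place, name, score
FROM places ORDER BY score DESC, name
""").fetchall()

# Second answer: rank() produces the same shared positions with gaps directly.
rank_rows = conn.execute("""
SELECT rank() OVER (ORDER BY r.score DESC) AS place, p.name, r.score
FROM players p JOIN results r ON p.id = r.player_id
ORDER BY r.score DESC, p.name
""").fetchall()

print(rank_rows)
```

For this data both lists are identical, which is why rank() is the simpler choice when it is available.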

How can I set multiple aliases for a single derived table in MariaDB 5.5?

Consider a database with three tables:
goods (Id is the primary key)
+----+-------+-----+
| Id | Name | SKU |
+----+-------+-----+
| 1 | Nails | 123 |
| 2 | Nuts | 456 |
| 3 | Bolts | 789 |
+----+-------+-----+
invoiceheader (Id is the primary key)
+----+--------------+-----------+---------+
| Id | Date | Warehouse | BuyerId |
+----+--------------+-----------+---------+
| 1 | '2021-10-15' | 1 | 223 |
| 2 | '2021-09-18' | 1 | 356 |
| 3 | '2021-07-13' | 2 | 1 |
+----+--------------+-----------+---------+
invoiceitems (Id is the primary key)
+----+----------+--------+-----+-------+
| Id | HeaderId | GoodId | Qty | Price |
+----+----------+--------+-----+-------+
| 1 | 1 | 1 | 15 | 1.1 |
| 2 | 1 | 3 | 7 | 1.5 |
| 3 | 2 | 1 | 12 | 1.5 |
| 4 | 3 | 3 | 3 | 1.3 |
+----+----------+--------+-----+-------+
What I'm trying to do is get MAX(invoiceheader.Date) for every invoiceitems.GoodId. In everyday terms: find out, preferably in a single query, when each of the goods was last sold from a specific warehouse.
To do that, I'm using a derived table and the solution proposed here. To make that work, I think I need a way of giving multiple (well, two) aliases to a derived table.
My query looks like this at the moment:
SELECT tmp.* /* placing the second alias here, before or after tmp.* doesn't work */
FROM ( /* placing the second alias, tmpClone, here also doesn't work */
SELECT
invoiceheader.Id,
invoiceheader.Date,
invoiceitems.HeaderId,
invoiceitems.Id,
invoiceitems.GoodId
FROM invoiceheader
LEFT JOIN invoiceitems
ON invoiceheader.Id = invoiceitems.HeaderId
WHERE invoiceheader.Warehouse = 3
AND invoiceheader.Date > '0000-00-00 00:00:00'
AND invoiceheader.Date IS NOT NULL
AND invoiceheader.Date > ''
AND invoiceitems.GoodId > 0
ORDER BY
invoiceitems.GoodId ASC,
invoiceheader.Date DESC
) tmp, tmpClone /* this doesn't work with or without a comma */
INNER JOIN (
SELECT
invoiceheader.Id,
MAX(invoiceheader.Date) AS maxDate
FROM tmpClone
WHERE invoiceheader.Warehouse = 3
GROUP BY invoiceitems.GoodId
) headerGroup
ON tmp.Id = headerGroup.Id
AND tmp.Date = headerGroup.maxDate
AND tmp.HeaderId = headerGroup.Id
Is it possible to set multiple aliases for a single derived table? If it is, how should I do it?
I'm using 5.5.52-MariaDB.
You can use both an inner select (a subquery) and a LEFT JOIN to achieve this, for example:
select t1.b, (select t2.b from table2 as t2 where t1.x = t2.x) as Y from table1 as t1 where t1.y = (select t3.y from table3 as t3 where t1.a = t3.a)
While this doesn't answer my original question, it does solve the problem from which the question arose, and I'll leave it here in case anyone ever comes across a similar issue.
The following query does what I intended: it finds the latest sale date for each good from the specific warehouse.
SELECT
invoiceheader.Id,
invoiceheader.Date,
invoiceitems.HeaderId,
invoiceitems.Id,
invoiceitems.GoodId
FROM invoiceheader
INNER JOIN invoiceitems
ON invoiceheader.Id = invoiceitems.HeaderId
INNER JOIN (
SELECT
MAX(invoiceheader.Date) AS maxDate,
invoiceitems.GoodId
FROM invoiceheader
INNER JOIN invoiceitems
ON invoiceheader.Id = invoiceitems.HeaderId
WHERE invoiceheader.Warehouse = 3
AND invoiceheader.Date > '0000-00-00 00:00:00'
AND invoiceheader.Date IS NOT NULL
AND invoiceheader.Date > ''
GROUP BY invoiceitems.GoodId
) tmpDate
ON invoiceheader.Date = tmpDate.maxDate
AND invoiceitems.GoodId = tmpDate.GoodId
WHERE invoiceheader.Warehouse = 3
AND invoiceitems.GoodId > 0
ORDER BY
invoiceitems.GoodId ASC,
invoiceheader.Date DESC
The trick was to join on two things, MAX(invoiceheader.Date) and invoiceitems.GoodId, since a GoodId can appear only once within a given invoiceheader/invoiceitems join (a strict limit enforced by the code that inserts into invoiceitems).
Whether this is the optimal solution (ignoring the redundant conditions in the query), and whether it would scale well, remains to be seen. It has been tested on tables with about 5,000 rows in invoiceheader, 60,000 rows in invoiceitems and 4,000 rows in goods, and execution time was under 1 second.
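A trimmed sketch of the same idea (a derived table of the latest date per good, joined back on both GoodId and that date), run through Python's sqlite3 rather than MariaDB 5.5. Assumptions: the column types in the CREATE TABLE statements, ISO date strings (so MAX compares them correctly), warehouse 1 instead of 3 because the sample data has no warehouse 3, and only the columns needed for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoiceheader (Id INTEGER PRIMARY KEY, Date TEXT, Warehouse INTEGER, BuyerId INTEGER);
CREATE TABLE invoiceitems (Id INTEGER PRIMARY KEY, HeaderId INTEGER, GoodId INTEGER,
                           Qty INTEGER, Price REAL);
INSERT INTO invoiceheader VALUES
  (1, '2021-10-15', 1, 223), (2, '2021-09-18', 1, 356), (3, '2021-07-13', 2, 1);
INSERT INTO invoiceitems VALUES
  (1, 1, 1, 15, 1.1), (2, 1, 3, 7, 1.5), (3, 2, 1, 12, 1.5), (4, 3, 3, 3, 1.3);
""")

WAREHOUSE = 1  # the sample data has no warehouse 3

rows = conn.execute("""
SELECT h.Id, h.Date, i.Id AS ItemId, i.GoodId
FROM invoiceheader h
JOIN invoiceitems i ON h.Id = i.HeaderId
JOIN (
  -- derived table: latest sale date per good in the chosen warehouse
  SELECT i2.GoodId, MAX(h2.Date) AS maxDate
  FROM invoiceheader h2
  JOIN invoiceitems i2 ON h2.Id = i2.HeaderId
  WHERE h2.Warehouse = ?
  GROUP BY i2.GoodId
) latest ON latest.GoodId = i.GoodId AND latest.maxDate = h.Date
WHERE h.Warehouse = ?
ORDER BY i.GoodId
""", (WAREHOUSE, WAREHOUSE)).fetchall()

print(rows)
```

Nails (GoodId 1) was sold on 2021-09-18 and 2021-10-15, and only the later invoice survives the join; Bolts (GoodId 3) appears once in warehouse 1.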

Make partitions based on difference in date in Postgres window function

I have data in the following format:
id | first_name | last_name | birth_date
abc | Jared | Pollard | 1970-01-01
def | Jared | Pollard | 1972-02-02
ghi | Jared | Pollard | 1980-01-01
klm | Jared | Pollard | 2015-01-01
and I would like a query which groups data based on the following rule
If first_name and last_name are equal and the birth_dates are within 5 years of each other, then the records belong to the same group.
So the above data contains three groups group1=(abc, def), group2=(ghi) and group3=(klm)
Currently I have the following query which incorrectly creates only 2 groups, group1=(abc, def) and group2=(ghi, klm)
SELECT
g.id,
FIRST_VALUE(g.id) OVER (PARTITION BY lower(trim(g.last_name)), lower(trim(g.first_name)),
CASE WHEN g.birth_date between g.fv_birth_date - interval '5 year' AND g.fv_birth_date + interval '5 year' THEN 1 ELSE 0 END
ORDER BY g.last_used_dt DESC NULLS LAST) AS cluster_id
FROM (
SELECT id, last_used_dt, last_name, first_name, birth_date,
FIRST_VALUE(birth_date)
OVER (PARTITION BY
lower(trim(last_name)),
lower(trim(first_name))
ORDER BY last_used_dt DESC NULLS LAST) AS fv_birth_date
FROM guest
) g;
I understand this is because of the CASE expression within the PARTITION BY clause, but I am unable to come up with any other query.
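One common way to get this kind of grouping is the "gaps and islands" pattern: within each name, order by birth_date, flag a row as the start of a new group when it is more than 5 years after the previous row, and take a running sum of the flags. This is a sketch under assumptions: "within 5 years" is read as consecutive birth dates chaining together, it runs in SQLite through Python's sqlite3 (version 3.25 or later for lag()), and the table has only the four columns shown (last_used_dt is left out). In Postgres the date test would be written as `birth_date > prev_birth_date + interval '5 years'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE guest (id TEXT, first_name TEXT, last_name TEXT, birth_date TEXT);
INSERT INTO guest VALUES
  ('abc', 'Jared', 'Pollard', '1970-01-01'),
  ('def', 'Jared', 'Pollard', '1972-02-02'),
  ('ghi', 'Jared', 'Pollard', '1980-01-01'),
  ('klm', 'Jared', 'Pollard', '2015-01-01');
""")

rows = conn.execute("""
WITH ordered AS (
  -- previous birth_date for the same normalized name
  SELECT id, birth_date,
         lower(trim(last_name)) AS ln, lower(trim(first_name)) AS fn,
         lag(birth_date) OVER (PARTITION BY lower(trim(last_name)), lower(trim(first_name))
                               ORDER BY birth_date) AS prev_birth_date
  FROM guest
),
flagged AS (
  -- is_new = 1 starts a group: first row of a name, or a gap of more than 5 years
  SELECT id, ln, fn, birth_date,
         CASE WHEN prev_birth_date IS NULL
                OR birth_date > date(prev_birth_date, '+5 years')
              THEN 1 ELSE 0 END AS is_new
  FROM ordered
)
-- running sum of the flags numbers the groups within each name
SELECT id,
       sum(is_new) OVER (PARTITION BY ln, fn ORDER BY birth_date
                         ROWS UNBOUNDED PRECEDING) AS group_no
FROM flagged
ORDER BY ln, fn, birth_date
""").fetchall()

print(rows)
```

The CASE expression moves out of PARTITION BY into a flag column, which avoids the problem of partitioning on a comparison with a single fixed reference date.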

SQLite Sum of Sums

Let's say I have two tables which look like this:
Games:
| AwayTeam | HomeTeam | AwayPoints | HomePoints |
------------------------------------------------------
| Aardvarks | Bobcats | 2 | 1 |
| Bobcats | Caterpillars | 20 | 10 |
| Aardvarks | Caterpillars | 200 | 100 |
Teams:
| Name |
----------------
| Aardvarks |
| Bobcats |
| Caterpillars |
How can I make a result which looks like this?
| Name | TotalPoints |
------------------------------
| Aardvarks | 202 |
| Bobcats | 21 |
| Caterpillars | 110 |
I think my real problem is how to splice statements together in SQL. These two statements work well individually:
SELECT SUM ( AwayPoints )
FROM Games
WHERE AwayTeam='Bobcats';
SELECT SUM ( HomePoints )
FROM Games
WHERE HomeTeam='Bobcats';
I suspect that I need a compound operator if I want to splice two SELECT statements together. Then I'd pass that statement into the aggregate expression below:
SELECT Name, SUM( aggregate_expression )
AS 'TotalPoints'
FROM Teams
GROUP BY Name;
If I had to just throw it all together, I think I'd end up with something like this:
SELECT Name, SUM (
SELECT SUM ( AwayPoints )
FROM Games
WHERE AwayTeam=Name
UNION
SELECT SUM ( HomePoints )
FROM Games
WHERE HomeTeam=Name
)
AS 'TotalPoints'
FROM Teams
GROUP BY Name;
However, that doesn't work, because SELECT SUM ( SELECT ... is invalid.
Use a UNION ALL:
SELECT team, SUM(points)
FROM (
SELECT HomeTeam AS team, SUM(HomePoints) AS points
FROM Games
GROUP BY HomeTeam
UNION ALL
SELECT AwayTeam AS team, SUM(AwayPoints) AS points
FROM Games
GROUP BY AwayTeam
)
GROUP BY team
See the SQLite documentation for:
SELECT expr AS alias
SELECT ... FROM ( select-stmt ) AS table-alias
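A sketch comparing the UNION ALL answer with a valid form of the correlated-subquery idea from the question (adding two scalar subqueries instead of wrapping a UNION in SUM), run through Python's sqlite3. The column types, the `per_side` alias, and the ORDER BY clauses are assumptions added for the demo. COALESCE turns a team with no home or away games into 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Games (AwayTeam TEXT, HomeTeam TEXT, AwayPoints INTEGER, HomePoints INTEGER);
CREATE TABLE Teams (Name TEXT);
INSERT INTO Games VALUES
  ('Aardvarks', 'Bobcats', 2, 1),
  ('Bobcats', 'Caterpillars', 20, 10),
  ('Aardvarks', 'Caterpillars', 200, 100);
INSERT INTO Teams VALUES ('Aardvarks'), ('Bobcats'), ('Caterpillars');
""")

# The answer's approach: per-side sums stacked with UNION ALL, then summed again.
union_rows = conn.execute("""
SELECT team, SUM(points)
FROM (
  SELECT HomeTeam AS team, SUM(HomePoints) AS points FROM Games GROUP BY HomeTeam
  UNION ALL
  SELECT AwayTeam AS team, SUM(AwayPoints) AS points FROM Games GROUP BY AwayTeam
) AS per_side
GROUP BY team
ORDER BY team
""").fetchall()

# The question's approach made valid: one scalar subquery per side, added together.
correlated_rows = conn.execute("""
SELECT Name,
       (SELECT COALESCE(SUM(AwayPoints), 0) FROM Games WHERE AwayTeam = Teams.Name)
     + (SELECT COALESCE(SUM(HomePoints), 0) FROM Games WHERE HomeTeam = Teams.Name)
       AS TotalPoints
FROM Teams
ORDER BY Name
""").fetchall()

print(correlated_rows)
```

The second form starts from Teams, so a team that never played would still appear (with 0), while the UNION ALL form only lists teams found in Games. UNION ALL also matters in the first form: a plain UNION would collapse two identical per-side sums for the same team into one row.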

Suggestion needed writing a complex query - sqlite

I have 4 columns in a table called musics: artist, genre, writer and producer.
I need to write a query that returns, for each column, 0 if no value in that column is repeated and 1 if some value is repeated.
Any help is much appreciated
SELECT (COUNT(artist) <> COUNT(DISTINCT artist)) artist,
(COUNT(genre) <> COUNT(DISTINCT genre)) genre,
(COUNT(writer) <> COUNT(DISTINCT writer)) writer,
(COUNT(producer) <> COUNT(DISTINCT producer)) producer
FROM musics
Another version
SELECT
( SELECT (COUNT(*) > 0)
FROM (SELECT 1 FROM musics GROUP BY artist HAVING COUNT(*) > 1) a
) artist,
( SELECT (COUNT(*) > 0)
FROM (SELECT 1 FROM musics GROUP BY genre HAVING COUNT(*) > 1) g
) genre,
( SELECT (COUNT(*) > 0)
FROM (SELECT 1 FROM musics GROUP BY writer HAVING COUNT(*) > 1) w
) writer,
( SELECT (COUNT(*) > 0)
FROM (SELECT 1 FROM musics GROUP BY producer HAVING COUNT(*) > 1) p
) producer
Sample data
| artist | genre | writer | producer |
------------------------------------------
| artist1 | genre1 | writer1 | producer1 |
| artist2 | genre2 | writer1 | producer2 |
| artist1 | genre3 | writer3 | producer3 |
Sample output:
| artist | genre | writer | producer |
--------------------------------------
| 1 | 0 | 1 | 0 |
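Both versions can be checked against the sample data with Python's sqlite3; the CREATE TABLE statement is an assumption, since the question names only the columns. In SQLite a comparison such as `COUNT(x) <> COUNT(DISTINCT x)` evaluates to the integer 1 or 0, which is why no conversion is needed. Note that COUNT(column) skips NULLs, so NULLs are never counted as repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE musics (artist TEXT, genre TEXT, writer TEXT, producer TEXT);
INSERT INTO musics VALUES
  ('artist1', 'genre1', 'writer1', 'producer1'),
  ('artist2', 'genre2', 'writer1', 'producer2'),
  ('artist1', 'genre3', 'writer3', 'producer3');
""")

# First version: a column repeats a value when COUNT(col) <> COUNT(DISTINCT col).
flags = conn.execute("""
SELECT (COUNT(artist)   <> COUNT(DISTINCT artist))   AS artist,
       (COUNT(genre)    <> COUNT(DISTINCT genre))    AS genre,
       (COUNT(writer)   <> COUNT(DISTINCT writer))   AS writer,
       (COUNT(producer) <> COUNT(DISTINCT producer)) AS producer
FROM musics
""").fetchone()

# Second version: 1 if any value of the column occurs more than once.
second = conn.execute("""
SELECT
  (SELECT COUNT(*) > 0 FROM (SELECT 1 FROM musics GROUP BY artist   HAVING COUNT(*) > 1) a) AS artist,
  (SELECT COUNT(*) > 0 FROM (SELECT 1 FROM musics GROUP BY genre    HAVING COUNT(*) > 1) g) AS genre,
  (SELECT COUNT(*) > 0 FROM (SELECT 1 FROM musics GROUP BY writer   HAVING COUNT(*) > 1) w) AS writer,
  (SELECT COUNT(*) > 0 FROM (SELECT 1 FROM musics GROUP BY producer HAVING COUNT(*) > 1) p) AS producer
""").fetchone()

print(flags, second)
```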
For artist (this is SQL Server syntax, since SQLite has no convert(bit, ...); it returns one row per distinct artist, with 1 where that value is repeated):
select convert(bit,(count(1)-1))
from table_name
group by artist -- <-- replace artist with the column you want to check for duplicates
Write one SELECT COUNT using DISTINCT on the column and another without DISTINCT, and compare the two: if the counts differ, the column contains repeated values.
You can use 4 different queries combined with UNION, each containing COUNT(column_name) and a GROUP BY clause.
