MonetDB recursive CTE (common table expressions) - recursion

It seems MonetDB does not support recursive CTEs. This is a useful feature that I have used to extract BOMs (bills of materials) from ERP systems. For greater flexibility I used Firebird recursive stored procedures to enhance the output with extra calculations. A good example of a SQL Server recursive CTE can be found here: https://www.essentialsql.com/recursive-ctes-explained/
The question is: is there any way I can achieve similar results in MonetDB?

There is currently no support for recursive CTEs in MonetDB[Lite]. The solution you have proposed yourself seems like the way to go.

It is clear that once I have access to procedures, variables and WHILE loops, something can be done. The following code gives me the desired result using temporary tables. I would appreciate it if anybody could suggest an alternative that produces the same results without the overhead of temporary tables.
CREATE TEMPORARY TABLE BOM (parent_id string, comp_id string, qty double) ON COMMIT PRESERVE ROWS;
INSERT INTO BOM VALUES('a','b',5), ('a','c',2), ('b','d',4), ('b','c',7), ('c','e',3);
select * from BOM;
+-----------+---------+--------------------------+
| parent_id | comp_id | qty |
+===========+=========+==========================+
| a | b | 5 |
| a | c | 2 |
| b | d | 4 |
| b | c | 7 |
| c | e | 3 |
+-----------+---------+--------------------------+
CREATE TEMPORARY TABLE EXPLODED_BOM (parent_id string, comp_id string, path string, qty double, level integer) ON COMMIT PRESERVE ROWS;
CREATE OR REPLACE PROCEDURE UPDATE_BOM()
BEGIN
DECLARE prev_count int;
DECLARE crt_count int;
DECLARE crt_level int;
delete from EXPLODED_BOM; -- make sure it is empty
insert into EXPLODED_BOM select parent_id, comp_id, parent_id||'-'||comp_id, qty, 0 from BOM; --insert first level
SET prev_count = 0;
SET crt_count = (select count(*) from EXPLODED_BOM);
SET crt_level = 0;
-- (crt_level < 100) avoids possible infinite loop, if BOM is malformed
WHILE (crt_level < 100) and (crt_count > prev_count) DO
SET prev_count = crt_count;
insert into EXPLODED_BOM select e.parent_id, a.comp_id, e.path||'-'||a.comp_id, a.qty*e.qty, crt_level+1
from BOM a, EXPLODED_BOM e
where a.parent_id = e.comp_id and e.level=crt_level;
-- is there any way to get the number of rows affected by an insert, update or delete statement, so that I can avoid re-counting the table?
SET crt_count = (select count(*) from EXPLODED_BOM);
SET crt_level = crt_level +1;
END WHILE;
END;
call UPDATE_BOM();
select * from EXPLODED_BOM;
+-----------+---------+---------+--------------------------+-------+
| parent_id | comp_id | path | qty | level |
+===========+=========+=========+==========================+=======+
| a | b | a-b | 5 | 0 |
| a | c | a-c | 2 | 0 |
| b | d | b-d | 4 | 0 |
| b | c | b-c | 7 | 0 |
| c | e | c-e | 3 | 0 |
| a | d | a-b-d | 20 | 1 |
| a | c | a-b-c | 35 | 1 |
| a | e | a-c-e | 6 | 1 |
| b | e | b-c-e | 21 | 1 |
| a | e | a-b-c-e | 105 | 2 |
+-----------+---------+---------+--------------------------+-------+
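For comparison, here is a minimal sketch of the same explosion written as a recursive CTE on an engine that does support one. It uses SQLite through Python's sqlite3 module, not MonetDB, so it only illustrates the result the procedure above emulates. The lowercase table name bom, the CTE name exploded and the REAL/TEXT column types are choices made for this sketch; the level < 100 guard mirrors the loop limit in the procedure.

```python
import sqlite3

# In-memory SQLite database (an assumption of this sketch; not MonetDB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bom (parent_id TEXT, comp_id TEXT, qty REAL)")
conn.executemany(
    "INSERT INTO bom VALUES (?, ?, ?)",
    [("a", "b", 5), ("a", "c", 2), ("b", "d", 4), ("b", "c", 7), ("c", "e", 3)],
)

# The anchor member inserts level 0; the recursive member joins each
# already-exploded row to its child components and multiplies quantities.
EXPLODE = """
WITH RECURSIVE exploded(parent_id, comp_id, path, qty, level) AS (
    SELECT parent_id, comp_id, parent_id || '-' || comp_id, qty, 0
    FROM bom
    UNION ALL
    SELECT e.parent_id, b.comp_id, e.path || '-' || b.comp_id,
           b.qty * e.qty, e.level + 1
    FROM bom AS b
    JOIN exploded AS e ON b.parent_id = e.comp_id
    WHERE e.level < 100  -- guard against cycles in a malformed BOM
)
SELECT parent_id, comp_id, path, qty, level
FROM exploded
ORDER BY level, path
"""

exploded_rows = conn.execute(EXPLODE).fetchall()
for row in exploded_rows:
    print(row)
```

The query returns the same ten paths as the EXPLODED_BOM table above, only in a different row order.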

Related

Updating multiple rows in SQLite with relevant data from the same table

I have a database whose source I don't control directly, and it produces errant '0' entries that show up in the generated graphs as drops to zero. I am able to manipulate the data after the fact and update the database.
It is acceptable to use the last known good value instead, so I am trying to write a general query that replaces every zero with the last known value.
Luckily, every entry includes the ID of the previous entry, so it is simply a matter of looking back and grabbing it.
I have got very close to a final answer, but instead of updating with the last good value, it uses the first value over and over again.
dummy data
CREATE TABLE tbl(id INT,r INT,oid INT);
INSERT INTO tbl VALUES(1,10,0);
INSERT INTO tbl VALUES(2,20,1);
INSERT INTO tbl VALUES(3,0,2);
INSERT INTO tbl VALUES(4,40,3);
INSERT INTO tbl VALUES(5,50,4);
INSERT INTO tbl VALUES(6,0,5);
INSERT INTO tbl VALUES(7,70,6);
INSERT INTO tbl VALUES(8,80,7);
SELECT * FROM tbl;
OUTPUT:
| id| r |oid|
|---|----|---|
| 1 | 10 | 0 |
| 2 | 20 | 1 |
| 3 | 0 | 2 | ** NEEDS FIXING
| 4 | 40 | 3 |
| 5 | 50 | 4 |
| 6 | 0 | 5 | ** NEEDS UPDATE
| 7 | 70 | 6 |
| 8 | 80 | 7 |
I have worked several queries to get results around what I am after:
All zero entries:
SELECT * FROM tbl WHERE r = 0;
OUTPUT:
| id | r | oid |
|----|----|-----|
| 3 | 0 | 2 |
| 6 | 0 | 5 |
Output only those rows together with the preceding good row
SELECT * FROM tbl WHERE id in (
SELECT id FROM tbl WHERE r = 0
UNION
SELECT oid FROM tbl WHERE r = 0
)
OUTPUT:
| id| r |oid|
|---|----|---|
| 2 | 20 | 1 |
| 3 | 0 | 2 |
| 5 | 50 | 4 |
| 6 | 0 | 5 |
Almost works
This is as close as I have got. It does change all the zeros, but it changes them all to the value of the first lookup
UPDATE tbl
SET r = (SELECT r
FROM tbl
WHERE id in (SELECT oid
FROM tbl
WHERE r = 0)
) WHERE r = 0 ;
OUTPUT:
| id| r |oid|
|---|----|---|
| 1 | 10 | 0 |
| 2 | 20 | 1 |
| 3 | 20 | 2 | ** GOOD
| 4 | 40 | 3 |
| 5 | 50 | 4 |
| 6 | 20 | 5 | ** BAD, should be 50
| 7 | 70 | 6 |
| 8 | 80 | 7 |
If it helps, I created this fiddle here that I've been playing with:
http://sqlfiddle.com/#!5/8afff/1
For this sample data all you have to do is use the correct correlated subquery that returns the value of r from the row with id equal to the current oid in the WHERE clause:
UPDATE tbl AS t
SET r = (SELECT tt.r FROM tbl tt WHERE tt.id = t.oid)
WHERE t.r = 0;
See the demo.
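A minimal sketch to check the correlated update against the dummy data from the question, using Python's sqlite3 module with an in-memory database. The alias prev and the variable names are chosen for this sketch, and the statement is written without an alias on the updated table, which should behave the same as the version above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INT, r INT, oid INT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 10, 0), (2, 20, 1), (3, 0, 2), (4, 40, 3),
     (5, 50, 4), (6, 0, 5), (7, 70, 6), (8, 80, 7)],
)

# For each row being updated, the subquery looks up the single row whose
# id equals that row's oid, so every zero gets its own predecessor's value.
conn.execute("""
    UPDATE tbl
    SET r = (SELECT prev.r FROM tbl AS prev WHERE prev.id = tbl.oid)
    WHERE r = 0
""")

fixed = conn.execute("SELECT id, r FROM tbl ORDER BY id").fetchall()
print(fixed)
```

The original attempt failed because its subquery did not refer to the row being updated, so it returned the same first match for every zero row.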

Get all table values if match in 2 other tables exists

I have a table "channel".
channelId
a
b
c
d
a table "video"
videoId | channelId
1 | a
2 | b
3 | c
4 | e
a table "comment"
commentID | videoID | videoID_channelID
xx | 1 | a
yy | 2 | b
zz | 5 | e
tt | 6 | f
Keys are:
channel.channelId = video.channelId = comment.videoID_channelID
video.videoId = comment.videoID
I need:
all channels with at least 1 video and 1 comment
all videos with at least 1 channel and 1 comment
all comments with a video and a channel
So I want to do 3 SQL statements, one for each table that references the other 2.
I tried it with a double inner-join (https://www.sqlitetutorial.net/sqlite-inner-join/) but it seems to return all combinations that fit rather than:
channelId
a
b
videoId | channelId
1 | a
2 | b
commentID | videoID | videoID_channelID
xx | 1 | a
yy | 2 | b
My code so far to get all channels with at least 1 video and 1 comment:
SELECT
channel.channelId
FROM
channel
INNER JOIN video ON video.channelId = channel.channelId
INNER JOIN comment ON comment.videoID_channelID = video.channelId
You can get all the results that you want with the same query that joins all 3 tables, but for each case select different columns:
SELECT c.channelId
FROM channel c
INNER JOIN video v ON v.channelId = c.channelId
INNER JOIN comment cm ON cm.videoID_channelID = v.channelId;
SELECT v.videoID, c.channelId
FROM channel c
INNER JOIN video v ON v.channelId = c.channelId
INNER JOIN comment cm ON cm.videoID_channelID = v.channelId;
SELECT cm.commentID, v.videoID, c.channelId
FROM channel c
INNER JOIN video v ON v.channelId = c.channelId
INNER JOIN comment cm ON cm.videoID_channelID = v.channelId;
You may have to add DISTINCT after each SELECT if you get duplicates in your actual data.
See the demo.
Results:
| channelId |
| --------- |
| a |
| b |
| videoID | channelId |
| ------- | --------- |
| 1 | a |
| 2 | b |
| commentID | videoID | channelId |
| --------- | ------- | --------- |
| xx | 1 | a |
| yy | 2 | b |
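To illustrate the remark about DISTINCT, here is a small sketch using Python's sqlite3 module. The tables hold the sample data from the question plus one extra comment, 'ww', on video 1; that row is hypothetical and was added only to provoke a duplicate. The column types are also assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE channel (channelId TEXT);
    CREATE TABLE video (videoId INTEGER, channelId TEXT);
    CREATE TABLE comment (commentID TEXT, videoID INTEGER, videoID_channelID TEXT);
    INSERT INTO channel VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO video VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'e');
    INSERT INTO comment VALUES ('xx', 1, 'a'), ('yy', 2, 'b'), ('zz', 5, 'e'), ('tt', 6, 'f');
    -- Hypothetical second comment on video 1, not part of the question's data.
    INSERT INTO comment VALUES ('ww', 1, 'a');
""")

# Shared join from the answer; only the SELECT list changes per result set.
JOINS = """
    FROM channel c
    INNER JOIN video v ON v.channelId = c.channelId
    INNER JOIN comment cm ON cm.videoID_channelID = v.channelId
"""

plain = conn.execute(
    "SELECT c.channelId" + JOINS + "ORDER BY c.channelId"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT c.channelId" + JOINS + "ORDER BY c.channelId"
).fetchall()

print(plain)     # channel 'a' appears once per matching comment
print(distinct)  # each qualifying channel appears once
```

With the extra comment in place the plain join lists channel a twice, once per matching comment, while the DISTINCT version lists each channel once.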

SQLite take N rows per each group

I have an SQLite table similar to the following:
| A | B |
_________
| e | 5 |
| f | 7 |
| a | 5 |
| n | 7 |
| g | 5 |
| d | 7 |
| i | 5 |
| j | 5 |
| e | 7 |
| v | 7 |
How can I retrieve three random rows with value 5 in column B and three random rows with value 7? I don't know the values in B in advance, not even that they are 5 and 7. I want 3 random rows for each distinct value in B. The result does not need to be grouped by the values in column B. It could be something like:
| A | B |
_________
| e | 5 |
| g | 5 |
| e | 7 |
| v | 7 |
| j | 5 |
| f | 7 |
The following almost does what you want:
select t.*
from t
where t.rowid in (select t2.rowid
from t t2
where t2.b = t.b
order by random()
limit 3
);
Alas, the subquery will be run for every row, so this is only approximate because the random number generator changes values on each execution.
One solution is to use a temporary table to store a random number for each row, which can then be used for sorting. Unfortunately, a CTE doesn't seem to do the trick, because these are re-evaluated on each reference.
After some thought, I think a temporary table might be the only solution:
drop table if exists tempt;
create temporary table tempt as
select t.*, random() as rand
from t;
select t.*
from tempt t
where t.rowid in (select t2.rowid
from tempt t2
where t2.b = t.b
order by rand
limit 3
);
You can use the hidden RowID column to get three rows per B value as follows:
SELECT A, B FROM T T1
WHERE RowID IN (SELECT RowID FROM T T2 WHERE B = T1.B LIMIT 3);
Note that you're likely (but not 100% guaranteed) to get the same three rows each time. If you want to get random rows at the expense of some performance, you can do:
SELECT A, B FROM T T1
WHERE RowID IN (SELECT RowID FROM T T2 WHERE B = T1.B ORDER BY random() LIMIT 3);
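The temporary-table variant can be tried out with a short sketch in Python's sqlite3 module. It loads the sample rows from the question, stores one random number per row, and then keeps the three rows with the smallest stored number in each group. The column types and the variable names are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("e", 5), ("f", 7), ("a", 5), ("n", 7), ("g", 5),
     ("d", 7), ("i", 5), ("j", 5), ("e", 7), ("v", 7)],
)

# Freeze one random number per row so that every evaluation of the
# correlated subquery sees the same ordering within a group.
conn.execute("CREATE TEMPORARY TABLE tempt AS SELECT t.*, random() AS rand FROM t")

sample = conn.execute("""
    SELECT t1.a, t1.b
    FROM tempt AS t1
    WHERE t1.rowid IN (SELECT t2.rowid
                       FROM tempt AS t2
                       WHERE t2.b = t1.b
                       ORDER BY t2.rand
                       LIMIT 3)
""").fetchall()

# Count how many rows were kept for each value of b.
counts = {}
for _, b in sample:
    counts[b] = counts.get(b, 0) + 1

print(sample)
print(counts)
```

Because the random numbers are stored rather than regenerated, each group should contribute exactly three rows, and which three changes every time the temporary table is rebuilt.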

SQLite - Update a column based on values from two other tables' columns

I am trying to update Data1's ID to Record2's ID when:
Record1's and Record2's Name are the same, and
Weight is greater in Record2.
Record1
| ID | Weight | Name |
|----|--------|------|
| 1 | 10 | a |
| 2 | 10 | b |
| 3 | 10 | c |
Record2
| ID | Weight | Name |
|----|--------|------|
| 4 | 20 | a |
| 5 | 20 | b |
| 6 | 20 | c |
Data1
| ID | Weight |
|----|--------|
| 4 | 40 |
| 5 | 40 |
I have tried the following SQLite query:
update data1
set id =
(select record2.id
from record2,record1
where record1.name=record2.name
and record1.weight<record2.weight)
where id in
(select record1.id
from record1, record2
where record1.name=record2.name
and record1.weight<record2.weight)
Using the above query Data1's id is updated to 4 for all records.
NOTE: Record1's ID is the foreign key for Data1.
For the given data set the following seems to serve the cause:
update data1
set id =
(select record2.id
from record2,record1
where
data1.id = record1.id
and record1.name=record2.name
and record1.weight<record2.weight)
where id in
(select record1.id
from record1, record2
where
record1.id in (select id from data1)
and record1.name=record2.name
and record1.weight<record2.weight)
;
See it in action: SQL Fiddle.
Please comment if and as this requires adjustment / further detail.
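A minimal sketch to run the corrected statement, using Python's sqlite3 module. It assumes Data1 starts out holding Record1's IDs 1 and 2, since the question says Record1's ID is the foreign key for Data1; the IDs 4 and 5 shown in the question's Data1 table are treated here as the desired outcome. The column types are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record1 (id INT, weight INT, name TEXT);
    CREATE TABLE record2 (id INT, weight INT, name TEXT);
    CREATE TABLE data1 (id INT, weight INT);
    INSERT INTO record1 VALUES (1, 10, 'a'), (2, 10, 'b'), (3, 10, 'c');
    INSERT INTO record2 VALUES (4, 20, 'a'), (5, 20, 'b'), (6, 20, 'c');
    -- Assumed starting state: data1 still points at record1's IDs.
    INSERT INTO data1 VALUES (1, 40), (2, 40);
""")

# The condition data1.id = record1.id ties the subquery to the row being
# updated, so each data1 row receives the ID of its own record2 match.
conn.execute("""
    UPDATE data1
    SET id = (SELECT record2.id
              FROM record2, record1
              WHERE data1.id = record1.id
                AND record1.name = record2.name
                AND record1.weight < record2.weight)
    WHERE id IN (SELECT record1.id
                 FROM record1, record2
                 WHERE record1.id IN (SELECT id FROM data1)
                   AND record1.name = record2.name
                   AND record1.weight < record2.weight)
""")

updated = conn.execute("SELECT id, weight FROM data1 ORDER BY id").fetchall()
print(updated)
```

Under that assumption the two rows end up with IDs 4 and 5, rather than both receiving 4 as with the original query.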

Oracle 11g r2: strange behavior on index

I have the query:
SELECT count(*)
FROM
(
SELECT
TBELENCO.DATA_PROC, TBELENCO.POD, TBELENCO.DESCRIZIONE, TBELENCO.ERROR, TBELENCO.STATO,
TBELENCO.SEZIONE, TBELENCO.NOME_FILE, TBELENCO.ID_CARICAMENTO, TBELENCO.ESITO_OPERAZIONE,
TBELENCO.DES_TIPO_MISURA,
--TBELENCO.RAGIONE_SOCIALE,
--ROW_NUMBER() OVER (ORDER BY TBELENCO.DATA_PROC DESC) R
ROWNUM R
FROM(
SELECT
LOG.DATA_PROC, LOG.POD, LOG.DESCRIZIONE, LOG.ERROR, LOG.STATO,
LOG.SEZIONE, LOG.NOME_FILE, LOG.ID_CARICAMENTO, LOG.ESITO_OPERAZIONE, TM.DES_TIPO_MISURA
--,C.RAGIONE_SOCIALE
--ROW_NUMBER() OVER (ORDER BY LOG.DATA_PROC DESC) R
FROM
MS042_LOADING_LOGS LOG JOIN MS116_MEASURE_TYPES TM ON
TM.ID_TIPO_MISURA=LOG.SEZIONE
-- LEFT JOIN(
-- SELECT CUST.RAGIONE_SOCIALE,STR.POD,RSC.DATA_DA, RSC.DATA_A
-- FROM
-- MS038_METERS STR JOIN MS036_REL_SITES_CUSTOMERS RSC ON
-- STR.ID_SITO=RSC.ID_SITO
-- JOIN MS030_CUSTOMERS CUST ON
-- CUST.ID_CLIENTE=RSC.ID_CLIENTE
-- ) C ON
-- C.POD=LOG.POD
--AND LOG.DATA_PROC BETWEEN C.DATA_DA AND C.DATA_A
WHERE
1=1
--AND LOG.DATA_PROC>=TRUNC(SYSDATE)
AND LOG.DATA_PROC>=TRUNC(SYSDATE)-3
--TO_DATE('01/11/2014', 'DD/MM/YYYY')
) TBELENCO
)
WHERE
R BETWEEN 1 AND 200;
If I execute the query with AND LOG.DATA_PROC>=TRUNC(SYSDATE)-3, Oracle uses the index on the data_proc field of the MS042_LOADING_LOGS (LOG) table. If I instead use AND LOG.DATA_PROC>=TRUNC(SYSDATE)-4, or -5, or -6, etc., it does a full table access. Why does it behave this way?
I also executed:
ALTER INDEX MS042_DATA_PROC_IDX REBUILD;
but nothing changed.
Thanks,
Igor
--***********************************************************
SELECT count(*)
FROM
(
SELECT
TBELENCO.DATA_PROC, TBELENCO.POD, TBELENCO.DESCRIZIONE, TBELENCO.ERROR, TBELENCO.STATO,
TBELENCO.SEZIONE, TBELENCO.NOME_FILE, TBELENCO.ID_CARICAMENTO, TBELENCO.ESITO_OPERAZIONE,
TBELENCO.DES_TIPO_MISURA,
ROWNUM R
FROM(
SELECT
LOG.DATA_PROC, LOG.POD, LOG.DESCRIZIONE, LOG.ERROR, LOG.STATO,
LOG.SEZIONE, LOG.NOME_FILE, LOG.ID_CARICAMENTO, LOG.ESITO_OPERAZIONE, TM.DES_TIPO_MISURA
FROM
MS042_LOADING_LOGS LOG JOIN MS116_MEASURE_TYPES TM ON
TM.ID_TIPO_MISURA=LOG.SEZIONE
WHERE
1=1
AND LOG.DATA_PROC>=TRUNC(SYSDATE)-1
) TBELENCO
)
WHERE
R BETWEEN 1 AND 200;
Plan hash value: 2191058229
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 30866 (2)| 00:06:11 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | VIEW | | 94236 | 1196K| 30866 (2)| 00:06:11 |
| 3 | COUNT | | | | | |
|* 4 | HASH JOIN | | 94236 | 1104K| 30866 (2)| 00:06:11 |
| 5 | INDEX FULL SCAN | P087_TIPI_MISURE_PK | 15 | 30 | 1 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| MS042_LOADING_LOGS | 94236 | 920K| 30864 (2)| 00:06:11 |
|* 7 | INDEX RANGE SCAN | MS042_DATA_PROC_IDX | 94236 | | 25742 (2)| 00:05:09 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("R"<=200 AND "R">=1)
4 - access("TM"."ID_TIPO_MISURA"="LOG"."SEZIONE")
7 - access(SYS_OP_DESCEND("DATA_PROC")<=SYS_OP_DESCEND(TRUNC(SYSDATE#!)-1))
filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("DATA_PROC"))>=TRUNC(SYSDATE#!)-1)
Plan hash value: 69930686
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 95921 (1)| 00:19:12 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | VIEW | | 1467K| 18M| 95921 (1)| 00:19:12 |
| 3 | COUNT | | | | | |
|* 4 | HASH JOIN | | 1467K| 16M| 95921 (1)| 00:19:12 |
| 5 | INDEX FULL SCAN | P087_TIPI_MISURE_PK | 15 | 30 | 1 (0)| 00:00:01 |
|* 6 | TABLE ACCESS FULL| MS042_LOADING_LOGS | 1467K| 13M| 95912 (1)| 00:19:11 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("R"<=200 AND "R">=1)
4 - access("TM"."ID_TIPO_MISURA"="LOG"."SEZIONE")
6 - filter("LOG"."DATA_PROC">=TRUNC(SYSDATE#!)-4)
The larger the fraction of rows that will be returned, the more efficient a table scan is and the less efficient it is to use an index. Apparently, Oracle expects that inflection point to come when the query returns more than 3 days of data. If that is inaccurate, I would expect that the statistics on your table or indexes are inaccurate.
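As a rough illustration of that inflection point, the cost figures from the two plans in the question can be compared with some back-of-the-envelope arithmetic in Python. This is not how Oracle's optimizer actually computes costs. It assumes that the cost of the index path grows linearly with the number of rows fetched and that each extra day adds about as many rows as the 94236 estimated in the first plan; the much larger row estimate in the second plan suggests the second assumption may not hold for the real data.

```python
# Figures copied from the two execution plans in the question.
index_rows = 94236       # rows estimated for the plan using MS042_DATA_PROC_IDX
index_cost = 30864       # cost of TABLE ACCESS BY INDEX ROWID in that plan
full_scan_cost = 95912   # cost of TABLE ACCESS FULL in the other plan

# Assumption: the index path costs the same amount for every row fetched.
cost_per_row = index_cost / index_rows

# Row count at which the index path becomes as expensive as the full scan,
# which costs roughly the same no matter how many rows qualify.
break_even_rows = full_scan_cost / cost_per_row

# Assumption: every day of data adds about index_rows rows.
break_even_days = break_even_rows / index_rows

print(f"index cost per row: {cost_per_row:.3f}")
print(f"break-even rows:    {break_even_rows:.0f}")
print(f"break-even days:    {break_even_days:.2f}")
```

Under these assumptions the break-even point falls a little above three days of data, which would be consistent with the switch to a full scan between -3 and -4.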
