Recursive join and group_concat with SQLite - sqlite

I have the following table:
id  value  successor_id
--  -----  ------------
1   v1     2
2   v2     4
4   v3     null
7   v4     9
9   v5     null
12  v6     null
Note: These are simple paths (no trees), so two ids cannot have the same successor_id. Also, the ids are ordered and a successor must be the following id. E.g., in my example the only possible successor of id 2 is 4; it cannot be 1 or 7.
Now, I want to do some kind of recursive LEFT JOIN ON id = successor_id with the table itself and GROUP_CONCAT the values in order to get the following result:
min_id  max_id  values
------  ------  --------
1       4       v1,v2,v3
7       9       v4,v5
12      12      v6
How can I achieve this? I guess a combination of WITH RECURSIVE and GROUP BY, but I don't know how to start since WITH RECURSIVE requires a starting point, but I have multiple starting points (ids: 1, 7 and 12).
Here is the SQL code to create the example table:
.mode column
.nullvalue null
.width -1 -1 -1
CREATE TABLE test (
id INTEGER NOT NULL,
value TEXT NOT NULL, -- TEXT rather than STRING: SQLite gives a "STRING" column NUMERIC affinity
successor_id INTEGER
);
INSERT INTO test (id, value, successor_id) VALUES ( 1, 'v1', 2);
INSERT INTO test (id, value, successor_id) VALUES ( 2, 'v2', 4);
INSERT INTO test (id, value, successor_id) VALUES ( 4, 'v3', null);
INSERT INTO test (id, value, successor_id) VALUES ( 7, 'v4', 9);
INSERT INTO test (id, value, successor_id) VALUES ( 9, 'v5', null);
INSERT INTO test (id, value, successor_id) VALUES (12, 'v6', null);
SELECT id, value, successor_id
FROM test;

There is no need for a recursive query.
Use window functions:
WITH cte AS (
SELECT *, LAG(successor_id) OVER (ORDER BY id) IS NULL flag
FROM test
)
SELECT DISTINCT
MIN(id) OVER (PARTITION BY grp) min_id,
MAX(id) OVER (PARTITION BY grp) max_id,
GROUP_CONCAT(value) OVER (
PARTITION BY grp
ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) "values"
FROM (
SELECT *, SUM(flag) OVER (ORDER BY id) grp
FROM cte
);
See the demo.
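To try the window-function query without the demo, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or later, the first release with window functions. Since the question asked specifically about WITH RECURSIVE, the sketch also runs a recursive CTE for comparison (an illustration added here, not part of the answer): the seed query may return several rows, one per chain head, meaning an id that is nobody's successor, which covers the "multiple starting points" concern.

```python
import sqlite3

# Window functions (LAG, SUM ... OVER) need SQLite 3.25 or later.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"

# In-memory copy of the question's sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER NOT NULL, value TEXT NOT NULL, successor_id INTEGER)")
conn.executemany(
    "INSERT INTO test (id, value, successor_id) VALUES (?, ?, ?)",
    [(1, "v1", 2), (2, "v2", 4), (4, "v3", None),
     (7, "v4", 9), (9, "v5", None), (12, "v6", None)],
)

# The answer's query: a new group starts whenever the previous row (by id)
# has no successor; a running SUM of that flag numbers the groups.
WINDOW_QUERY = """
WITH cte AS (
  SELECT *, LAG(successor_id) OVER (ORDER BY id) IS NULL flag
  FROM test
)
SELECT DISTINCT
  MIN(id) OVER (PARTITION BY grp) min_id,
  MAX(id) OVER (PARTITION BY grp) max_id,
  GROUP_CONCAT(value) OVER (
    PARTITION BY grp
    ORDER BY id
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) "values"
FROM (
  SELECT *, SUM(flag) OVER (ORDER BY id) grp
  FROM cte
)
"""

# Recursive alternative: one seed row per chain head, then follow successor_id.
RECURSIVE_QUERY = """
WITH RECURSIVE chain(start_id, id, successor_id, vals) AS (
  SELECT id, id, successor_id, value
  FROM test
  -- NULLs are filtered out: NOT IN against a set containing NULL matches nothing
  WHERE id NOT IN (SELECT successor_id FROM test WHERE successor_id IS NOT NULL)
  UNION ALL
  SELECT c.start_id, t.id, t.successor_id, c.vals || ',' || t.value
  FROM chain c
  JOIN test t ON t.id = c.successor_id
)
SELECT start_id AS min_id, id AS max_id, vals AS "values"
FROM chain
WHERE successor_id IS NULL  -- keep only the row that reached the end of its chain
ORDER BY min_id
"""

# DISTINCT gives no row order, so sort the window results in Python.
window_rows = sorted(conn.execute(WINDOW_QUERY).fetchall())
recursive_rows = conn.execute(RECURSIVE_QUERY).fetchall()
print(window_rows)
print(recursive_rows)
```

Both queries should agree on the sample data. The window version relies on the ordering rule stated in the question (a successor is always the next id), while the recursive version follows successor_id explicitly.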

Related

SQLite - Create dummy variable vector/string from multiple columns

I have some data that looks like this:
UserID Category
------ --------
1 a
1 b
2 c
3 b
3 a
3 c
I'd like to binary-encode this grouped by UserID: three different values exist in Category, so a binary encoding would be something like:
UserID encoding
------ --------
1 "1, 1, 0"
2 "0, 0, 1"
3 "1, 1, 1"
i.e., all three values are present for UserID = 3, so the corresponding vector is "1, 1, 1".
Is there a way to do this without writing a bunch of CASE WHEN statements? There may be dozens of possible values in Category.
Cross join the distinct users with the distinct categories, then left join the result to the table.
Then use GROUP_CONCAT() as a window function, which supports an ORDER BY clause, to collect the 0s and 1s in category order:
WITH
users AS (SELECT DISTINCT UserID FROM tablename),
categories AS (
SELECT DISTINCT Category, DENSE_RANK() OVER (ORDER BY Category) rn
FROM tablename
),
cte AS (
SELECT u.UserID, c.rn,
'"' || GROUP_CONCAT(t.UserID IS NOT NULL)
OVER (PARTITION BY u.UserID ORDER BY c.rn) || '"' encoding
FROM users u CROSS JOIN categories c
LEFT JOIN tablename t
ON t.UserID = u.UserID AND t.Category = c.Category
)
SELECT DISTINCT userID,
FIRST_VALUE(encoding) OVER (PARTITION BY UserID ORDER BY rn DESC) encoding
FROM cte
ORDER BY userID
This will work for any number of categories.
See the demo.
Results:
UserID  encoding
------  --------
1       "1,1,0"
2       "0,0,1"
3       "1,1,1"
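A minimal sketch for reproducing these results locally with Python's sqlite3 module, assuming SQLite 3.25 or later for window functions. The table name tablename is the answer's placeholder, and the rows are the question's sample data; the query itself is the answer's, only re-indented.

```python
import sqlite3

# DENSE_RANK, GROUP_CONCAT ... OVER and FIRST_VALUE need SQLite 3.25 or later.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (UserID INTEGER, Category TEXT)")
conn.executemany(
    "INSERT INTO tablename (UserID, Category) VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "b"), (3, "a"), (3, "c")],
)

QUERY = """
WITH
  users AS (SELECT DISTINCT UserID FROM tablename),
  categories AS (
    SELECT DISTINCT Category, DENSE_RANK() OVER (ORDER BY Category) rn
    FROM tablename
  ),
  cte AS (
    -- running concatenation of 1/0 per user, in category rank order
    SELECT u.UserID, c.rn,
      '"' || GROUP_CONCAT(t.UserID IS NOT NULL)
        OVER (PARTITION BY u.UserID ORDER BY c.rn) || '"' encoding
    FROM users u CROSS JOIN categories c
    LEFT JOIN tablename t
      ON t.UserID = u.UserID AND t.Category = c.Category
  )
-- the row with the highest rank holds the complete string
SELECT DISTINCT userID,
  FIRST_VALUE(encoding) OVER (PARTITION BY UserID ORDER BY rn DESC) encoding
FROM cte
ORDER BY userID
"""

rows = conn.execute(QUERY).fetchall()
for user_id, encoding in rows:
    print(user_id, encoding)
```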
First, create an encoding table that explicitly establishes the order of the categories in the bitmap:
create table e (Category text, Encoding int);
insert into e values ('a', 1), ('b', 2), ('c', 4);
Next, generate the list of users u and cross join it with the encoding table e to get a fully populated (UserId, Category, Encoding) table. Then left join that table with the user-supplied data t. Whether a matching row exists in t (t.UserId is not null) decides whether the bit is set:
select UserId,
       '"' || group_concat(bit, ', ') || '"' as encoding
from (
  -- one row per (user, category); bit is 1 when the user has that category
  select u.UserId, e.Encoding,
         case when t.UserId is null then 0 else 1 end as bit
  from (select distinct UserID from t) u
  join e
  left natural join t
  -- sort before grouping so group_concat normally sees the categories in
  -- Encoding order (SQLite does not formally guarantee this)
  order by u.UserId, e.Encoding
)
group by UserId
order by UserId;
and it gives the expected result:
1|"1, 1, 0"
2|"0, 0, 1"
3|"1, 1, 1"

How to get the last updated value of a value in sql

Sample data: I have 2 columns, old_store_id and changed_new_store_id, and there are cases where the changed_new_store_id value also gets updated to a new value. How can I traverse the DB (Teradata) to get the last value (changed_new_store_id) of the respective old_store_id?
Let's say the 1st row has
old_store_id = A;
changed_new_store_id = B
and the 5th row has
old_store_id = B;
changed_new_store_id = C
and in some other nth row C is changed to X, etc.
How do I get the final value of A, which is X?
I could use multiple self joins
or a stored procedure, but that would not be efficient (for many reasons).
Is there any way to find it?
Any suggestions are welcome.
This assumes no "loops", and uses "bottom-up" recursion. Something very similar could be done "top-down", limiting the seed query to rows where the "old" value doesn't appear anywhere as a "new" value.
CREATE VOLATILE TABLE #Example (
Old_Store_ID VARCHAR(8),
New_Store_ID VARCHAR(8)
)
PRIMARY INDEX(Old_Store_ID)
ON COMMIT PRESERVE ROWS;
INSERT INTO #Example VALUES ('A', 'B');
INSERT INTO #Example VALUES ('D', 'c');
INSERT INTO #Example VALUES ('B', 'F');
INSERT INTO #Example VALUES ('c', 'FF');
INSERT INTO #Example VALUES ('FF', 'GG');
INSERT INTO #Example VALUES ('F', 'X');
WITH RECURSIVE #Traverse(Old_Store_ID,New_Store_ID,Final_ID)
AS
(
--Seed Query - start with only the rows having no further changes
SELECT Old_Store_ID
,New_Store_ID
,New_Store_ID as Final_ID
FROM #Example as This
WHERE NOT EXISTS (
SELECT 1 FROM #Example AS Other WHERE This.New_Store_ID = Other.Old_Store_ID
)
UNION ALL
--Recursive Join
SELECT NewRow.Old_Store_ID
,NewRow.New_Store_ID
,OldRow.Final_ID
FROM #Example AS NewRow
INNER JOIN #Traverse AS OldRow
ON NewRow.New_Store_ID = OldRow.Old_Store_ID
)
SELECT *
FROM #Traverse
;
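The answer notes that a "top-down" version is also possible. A minimal sketch of that variant, run on SQLite through Python's sqlite3 module instead of Teradata: the volatile table becomes an ordinary table with the hypothetical name example, and, as in the answer, there must be no loops. The seed keeps only chain heads (old values that never appear as a new value), so the result reports the final id for A and D only, whereas the bottom-up query reports every id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "example" is a hypothetical stand-in for the answer's #Example volatile table.
conn.execute("CREATE TABLE example (Old_Store_ID TEXT, New_Store_ID TEXT)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?)",
    [("A", "B"), ("D", "c"), ("B", "F"), ("c", "FF"), ("FF", "GG"), ("F", "X")],
)

TOP_DOWN = """
WITH RECURSIVE traverse(start_id, current_id) AS (
  -- Seed: chain heads, i.e. old values that never appear as a new value.
  SELECT h.Old_Store_ID, h.New_Store_ID
  FROM example AS h
  WHERE NOT EXISTS (
    SELECT 1 FROM example AS p WHERE p.New_Store_ID = h.Old_Store_ID
  )
  UNION ALL
  -- Follow each chain one step forward (assumes no loops).
  SELECT t.start_id, e.New_Store_ID
  FROM traverse AS t
  JOIN example AS e ON e.Old_Store_ID = t.current_id
)
-- Keep only the last hop of each chain: no further change is recorded for it.
SELECT start_id, current_id AS final_id
FROM traverse AS t
WHERE NOT EXISTS (SELECT 1 FROM example AS e WHERE e.Old_Store_ID = t.current_id)
ORDER BY start_id
"""

final_ids = dict(conn.execute(TOP_DOWN).fetchall())
print(final_ids)
```

Note that SQLite compares these TEXT values case-sensitively, so 'c' and 'C' would be different stores.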
Another recursive answer, which records the depth of every hop and then keeps the deepest one per SearchID:
CREATE VOLATILE TABLE #SearchList (
SearchID CHAR(2),
ParentSearchID CHAR(2)
)
PRIMARY INDEX(SearchID)
ON COMMIT PRESERVE ROWS;
INSERT INTO #SearchList VALUES ('A', 'B');
INSERT INTO #SearchList VALUES ('D', 'c');
INSERT INTO #SearchList VALUES ('B', 'F');
INSERT INTO #SearchList VALUES ('c', 'FF');
INSERT INTO #SearchList VALUES ('FF', 'GG');
INSERT INTO #SearchList VALUES ('F', 'X');
CREATE VOLATILE TABLE #IntermediateResults(
SearchID CHAR(2),
ParentSearchID CHAR(2),
SearchLevel INTEGER
)
ON COMMIT PRESERVE ROWS;
INSERT INTO #IntermediateResults
WITH RECURSIVE RecursiveParent(SearchID,ParentSearchID,SearchLevel)
AS
(
--Seed Query
SELECT SearchID
,ParentSearchID
,1
FROM #SearchList
UNION ALL
--Recursive Join
SELECT a.SearchID
,b.ParentSearchID
,SearchLevel+1
FROM #SearchList a
INNER JOIN RecursiveParent b
ON a.ParentSearchID = b.SearchID
)
SELECT SearchID
,ParentSearchID
,MAX(SearchLevel)
FROM RecursiveParent
GROUP BY SearchID
,ParentSearchID
;
SELECT RESULTS.*
FROM #IntermediateResults RESULTS
INNER JOIN (SELECT RESULTS_MAX.SearchID
,MAX(RESULTS_MAX.SearchLevel) MaxSearchLevel
FROM #IntermediateResults RESULTS_MAX
GROUP BY RESULTS_MAX.SearchID
) GROUPED_RESULTS
ON RESULTS.SearchID = GROUPED_RESULTS.SearchID
AND RESULTS.SearchLevel = GROUPED_RESULTS.MaxSearchLevel
ORDER BY RESULTS.SearchID ASC
,RESULTS.SearchLevel ASC
;
Output:
SearchID ParentSearchID SearchLevel
-------- -------------- -----------
A X 3
B X 2
c GG 2
D GG 3
F X 1
FF GG 1

Simple Split function in SQL Server 2012 with explanation pls

I have two tables Procedures and ProcedureTypes.
Procedures has a column Type, which is a varchar with values like (1, 2), (3, 4), (4, 5), etc.
ProcedureTypes has a primary key 'ID' ranging from 1 to 9.
ID Description
1 Drug
2 Other-Drug
etc...
ID is an integer and Type is a varchar.
Now I need to join these two tables to show:
ID in the Procedures table
ProcedureType in the Procedures table
Description from the ProcedureTypes table, with the values separated by a "-".
For example, if the value in Type is (1,2), the new table after the join should show the description as (Drug-Other Drug).
I have tried this query, but to no avail:
SELECT * FROM dbo.[Split]((select RequestType from GPsProcedures), ',')
Can anyone tell me how to do it and why the above query is not working?
with Procedures as (
select 1 as ID, '1,2,3' as Typ
),
ProcedureTypes as (
select 1 as TypeID, 'Drug' as Name
union select 2 , 'Other-Drug'
union select 3 , 'Test 3'
)
/*Get one extra column of type xml*/
,Procedures_xml as (
select id,CONVERT(xml,' <root> <s>' + REPLACE(Typ,',','</s> <s>') + '</s> </root> ') as Typ_xml
from Procedures
)
/*Convert the field string to multiple rows then join to procedure types*/
, Procdure_With_Type as (
select ID,T.c.value('.','varchar(20)') as TypeID,
ProcedureTypes.Name
from Procedures_xml
CROSS APPLY Typ_xml.nodes('/root/s') T(c)
INNER JOIN ProcedureTypes ON T.c.value('.','varchar(20)') = ProcedureTypes.TypeID
)
/*Finally, group the procedures type names by procedure id*/
select id,
STUFF((
SELECT ', ' + [Name]
FROM Procdure_With_Type inn
WHERE (Procdure_With_Type.ID = inn.ID)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS NameValues
from Procdure_With_Type
group by ID
You can't have a select statement as a parameter for a function, so instead of this:
SELECT * FROM dbo.[Split]((select RequestType from GPsProcedures), ',')
Use this:
select S.*
from GPsProcedures P
cross apply dbo.[Split](P.RequestType, ',') S
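The first answer's three steps (split the string into rows, join each piece to ProcedureTypes, concatenate the names per procedure) can be tried outside SQL Server. This is a swapped-in technique, not the answer's method: a minimal sketch on SQLite through Python's sqlite3 module that splits the string with a recursive CTE instead of the XML trick, and concatenates with GROUP_CONCAT as a window function (SQLite 3.25 or later) using the "-" separator the question asked for. The column Typ follows the answer; the sample rows are hypothetical.

```python
import sqlite3

assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Procedures (ID INTEGER, Typ TEXT);
CREATE TABLE ProcedureTypes (TypeID INTEGER, Name TEXT);
-- hypothetical sample rows
INSERT INTO Procedures VALUES (1, '1,2'), (2, '3,1');
INSERT INTO ProcedureTypes VALUES (1, 'Drug'), (2, 'Other-Drug'), (3, 'Test 3');
""")

QUERY = """
WITH RECURSIVE split(ID, item, rest) AS (
  -- Seed: nothing split yet; append ',' so every item ends with a comma.
  SELECT ID, '', Typ || ',' FROM Procedures
  UNION ALL
  -- Peel off the text before the first comma as the next item.
  SELECT ID,
         substr(rest, 1, instr(rest, ',') - 1),
         substr(rest, instr(rest, ',') + 1)
  FROM split
  WHERE rest <> ''
)
SELECT DISTINCT s.ID,
  -- ORDER BY inside OVER fixes the concatenation order
  GROUP_CONCAT(pt.Name, '-') OVER (
    PARTITION BY s.ID ORDER BY pt.TypeID
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS NameValues
FROM split AS s
JOIN ProcedureTypes AS pt ON pt.TypeID = CAST(trim(s.item) AS INTEGER)
WHERE s.item <> ''  -- drop the empty seed item
ORDER BY s.ID
"""

rows = conn.execute(QUERY).fetchall()
for proc_id, names in rows:
    print(proc_id, names)
```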

SQLite cross reference unique combinations

I've got two tables already populated with data with the given schemas:
CREATE TABLE objects
(
id BIGINT NOT NULL,
latitude BIGINT NOT NULL,
longitude BIGINT NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE tags
(
id BIGINT NOT NULL,
tag_key VARCHAR(100) NOT NULL,
tag_value VARCHAR(500),
PRIMARY KEY (id , tag_key)
);
objects.id and tags.id refer to the same object.
I'd like to populate a third table with the unique combinations of tag_key and tag_value. For example:
INSERT OR REPLACE INTO objects (id) VALUES (0);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (0, 'a', 'x');
INSERT OR REPLACE INTO objects (id) VALUES (1);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (1, 'a', 'y');
INSERT OR REPLACE INTO objects (id) VALUES (2);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (2, 'a', 'x');
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (2, 'a', 'y');
INSERT OR REPLACE INTO objects (id) VALUES (3);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (3, 'a', 'x');
INSERT OR REPLACE INTO objects (id) VALUES (4);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (4, 'a', 'y');
Should result in 3 entries:
0: ([a,x])
1: ([a,y])
3: ([a,x][a,y])
Currently I have:
CREATE TABLE tags_combinations
(
id INTEGER PRIMARY KEY,
tag_key VARCHAR(100) NOT NULL,
tag_value VARCHAR(500)
);
The id shouldn't be related to the original id of the object, just something to group unique combinations.
This is the query I have so far:
SELECT
t1.tag_key, t1.tag_value
FROM
tags t1
WHERE
t1.id
IN
(
/* select ids whose tags entries are not all under one id in tags_combinations */
SELECT
t2.id
FROM
tags t2
WHERE
t2.tag_key, t2.tag_value
NOT IN
(
)
);
The part with the comment is what I am not sure about, how would I select every id from tags that does not have all of the corresponding tag_key and tag_value entries already under one id in tags_combinations?
To clarify exactly the result I am after: From the sample data given, it should return 4 rows with:
row id tag_key tag_value
0 0 a x
1 1 a y
2 2 a x
3 2 a y
SQL is a set-based language. If you reformulate your question in the language of set theory, you can directly translate it into SQL:
You want all rows of the tags table, except those from duplicate objects.
Objects are duplicates if they have exactly the same key/value combinations. However, we still want to return one object from each group of duplicates, so we only treat an object as a duplicate if an identical object with a smaller ID exists.
Two objects A and B have exactly the same key/value combinations if
all key/value combinations in A also exist in B, and
all key/value combinations in B also exist in A.
All key/value combinations in A also exist in B if there is no key/value combination in A that does not exist in B (note: double negation).
SELECT id, tag_key, tag_value
FROM tags
WHERE NOT EXISTS (SELECT 1
FROM tags AS dup
WHERE dup.id < tags.id
AND NOT EXISTS (SELECT 1
FROM tags AS A
WHERE A.id = tags.id
AND NOT EXISTS (SELECT 1
FROM tags AS B
WHERE B.id = dup.id
AND B.tag_key = A.tag_key
AND B.tag_value = A.tag_value)
)
AND NOT EXISTS (SELECT 1
FROM tags AS B
WHERE B.id = dup.id
AND NOT EXISTS (SELECT 1
FROM tags AS A
WHERE A.id = tags.id
AND A.tag_key = B.tag_key
AND A.tag_value = B.tag_value)
)
)
ORDER BY id, tag_key;
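A minimal sketch that runs the NOT EXISTS query above on the question's sample data with Python's sqlite3 module. One assumption: the tags primary key is widened to (id, tag_key, tag_value), because with the question's PRIMARY KEY (id, tag_key) the second INSERT OR REPLACE for id 2 would replace (2, 'a', 'x') instead of adding a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Primary key widened (assumption) so id 2 can hold both ('a', 'x') and ('a', 'y').
conn.execute("""
CREATE TABLE tags (
  id BIGINT NOT NULL,
  tag_key VARCHAR(100) NOT NULL,
  tag_value VARCHAR(500),
  PRIMARY KEY (id, tag_key, tag_value)
)""")
conn.executemany(
    "INSERT INTO tags (id, tag_key, tag_value) VALUES (?, ?, ?)",
    [(0, "a", "x"), (1, "a", "y"), (2, "a", "x"), (2, "a", "y"),
     (3, "a", "x"), (4, "a", "y")],
)

# The answer's query: keep a row unless an identical object with a smaller id exists.
QUERY = """
SELECT id, tag_key, tag_value
FROM tags
WHERE NOT EXISTS (SELECT 1
                  FROM tags AS dup
                  WHERE dup.id < tags.id
                    AND NOT EXISTS (SELECT 1
                                    FROM tags AS A
                                    WHERE A.id = tags.id
                                      AND NOT EXISTS (SELECT 1
                                                      FROM tags AS B
                                                      WHERE B.id = dup.id
                                                        AND B.tag_key = A.tag_key
                                                        AND B.tag_value = A.tag_value)
                                   )
                    AND NOT EXISTS (SELECT 1
                                    FROM tags AS B
                                    WHERE B.id = dup.id
                                      AND NOT EXISTS (SELECT 1
                                                      FROM tags AS A
                                                      WHERE A.id = tags.id
                                                        AND A.tag_key = B.tag_key
                                                        AND A.tag_value = B.tag_value)
                                   )
                 )
ORDER BY id, tag_key
"""

# ORDER BY id, tag_key leaves ties within id 2, so sort fully in Python.
rows = sorted(conn.execute(QUERY).fetchall())
print(rows)
```

Objects 3 and 4 are dropped because objects 0 and 1 have the same key/value sets and smaller ids.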
This is not easy in SQLite. We want to identify groups of tag key/value pairs, so we could group by id and build a string of the associated pairs with group_concat. This would be the way to do it in another DBMS. SQLite, however, cannot order within group_concat (an ORDER BY clause inside aggregate calls only arrived in SQLite 3.44.0), so we might end up with 2: 'a/x,a/y' and 5: 'a/y,a/x': two different strings for the same pairs.
Your best bet may be to write a program and find the distinct pairs iteratively.
In SQLite you may want to try this:
insert into tags_combinations (id, tag_key, tag_value)
select id, tag_key, tag_value
from tags
where id in
(
select min(id)
from
(
select id, group_concat(tag_key || '/' || tag_value) as tag_pairs
from
(
select id, tag_key, tag_value
from tags
order by id, tag_key, tag_value
) ordered_data
group by id
) aggregated_data
group by tag_pairs
);
Ordering the data before applying group_concat is likely to get the tag pairs ordered, but in no way guaranteed! If this is something you want to do only once, it may be worth a try, though.
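If window functions are available (SQLite 3.25 or later), the ordering concern can be sidestepped by building the signature with GROUP_CONCAT as a window function, where the ORDER BY inside the OVER clause fixes the concatenation order, as in the first answer on this page. A minimal sketch with Python's sqlite3 module, assuming tag keys and values never contain '/' or ',' (otherwise two different sets could produce the same string) and that tag_value is never NULL (a NULL makes the pair string NULL, which GROUP_CONCAT skips):

```python
import sqlite3

assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER, tag_key TEXT, tag_value TEXT)")
conn.executemany(
    "INSERT INTO tags (id, tag_key, tag_value) VALUES (?, ?, ?)",
    [(0, "a", "x"), (1, "a", "y"), (2, "a", "x"), (2, "a", "y"),
     (3, "a", "x"), (4, "a", "y")],
)

QUERY = """
WITH sig AS (
  -- One signature string per id; the window's ORDER BY fixes the pair order.
  SELECT DISTINCT id,
    GROUP_CONCAT(tag_key || '/' || tag_value) OVER (
      PARTITION BY id ORDER BY tag_key, tag_value
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS tag_pairs
  FROM tags
)
-- Keep the rows of the smallest id for each distinct signature.
SELECT id, tag_key, tag_value
FROM tags
WHERE id IN (SELECT MIN(id) FROM sig GROUP BY tag_pairs)
ORDER BY id, tag_key, tag_value
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```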
To merge multiple rows into one value, you need a function like group_concat().
The ORDER BY in the subquery is meant to give a consistent order of the rows within each group (SQLite does not formally guarantee that group_concat preserves it):
SELECT DISTINCT group_concat(tag_key) AS tag_keys,
group_concat(tag_value) AS tag_values
FROM (SELECT id,
tag_key,
tag_value
FROM tags
ORDER BY id,
tag_key,
tag_value)
GROUP BY id;
If you want to have keys and values interleaved, as shown in the question, you need to do more string concatenation:
SELECT DISTINCT group_concat(tag_key || ',' || tag_value, ';') AS keys_and_values
FROM (...

Sqlite Group By of subquery returns only one row

We observed that SQLite always returns only one row if we apply GROUP BY to a subquery and do not use an aggregate function such as count or sum.
Here is a toy example:
Given table
CREATE TABLE ExampleTable (
id INT PRIMARY KEY,
rank INT NOT NULL
);
with data
INSERT INTO ExampleTable(id, rank) VALUES (1, 1);
INSERT INTO ExampleTable(id, rank) VALUES (2, 2);
INSERT INTO ExampleTable(id, rank) VALUES (3, 2);
the query
SELECT rank, COUNT(*) FROM (select id, rank from ExampleTable) GROUP BY rank;
returns
rank|count
2|2
1|1
However, without the COUNT, SQLite returns only 1 row:
SELECT rank FROM (select id, rank from ExampleTable) GROUP BY rank;
=>
rank
1
Is this a bug or expected behavior?
