SQLite cross reference unique combinations - sqlite

I've got two tables already populated with data with the given schemas:
CREATE TABLE objects
(
id BIGINT NOT NULL,
latitude BIGINT NOT NULL,
longitude BIGINT NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE tags
(
id BIGINT NOT NULL,
tag_key VARCHAR(100) NOT NULL,
tag_value VARCHAR(500),
PRIMARY KEY (id , tag_key)
)
object.id and tags.id refer to the same object
I'd like to populate a third table with the unique combinations of tag_key and tag_value. For example:
INSERT OR REPLACE INTO objects (id) VALUES (0);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (0, 'a', 'x');
INSERT OR REPLACE INTO objects (id) VALUES (1);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (1, 'a', 'y');
INSERT OR REPLACE INTO objects (id) VALUES (2);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (2, 'a', 'x');
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (2, 'a', 'y');
INSERT OR REPLACE INTO objects (id) VALUES (3);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (3, 'a', 'x');
INSERT OR REPLACE INTO objects (id) VALUES (4);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (4, 'a', 'y');
Should result in 3 entries of
0: ([a,x])
1: ([a,y])
3: ([a,x][a,y])
Currently I have:
CREATE TABLE tags_combinations
(
id INTEGER PRIMARY KEY,
tag_key VARCHAR(100) NOT NULL,
tag_value VARCHAR(500)
);
The id shouldn't be related to the original id of the object, just something to group unique combinations.
This is the query I have so far:
SELECT
t1.tag_key, t1.tag_value
FROM
tags t1
WHERE
t1.id
IN
(
/* select ids who's every tags entry is not under one id in tags_combinations */
SELECT
t2.id
FROM
tags t2
WHERE
t2.tag_key, t2.tag_value
NOT IN
(
)
);
The part with the comment is what I am not sure about, how would I select every id from tags that does not have all of the corresponding tag_key and tag_value entries already under one id in tags_combinations?
To clarify exactly the result I am after: From the sample data given, it should return 4 rows with:
row id tag_key tag_value
0 0 a x
1 1 a y
2 2 a x
3 2 a y

SQL is a set-based language. If you reformulate your question in the language of set theory, you can directly translate it into SQL:
You want all rows of the tags table, except those from duplicate objects.
Objects are duplicates if they have exactly the same key/value combinations. However, we still want to return one of those objects, so we define duplicates only as those objects where no other duplicate object with a smaller ID exists.
Two objects A and B have exactly the same key/value combinations if
all key/value combinations in A also exist in B, and
all key/value combinations in B also exist in A.
All key/value combinations in A also exist in B if there is no key/value combination in A that does not exist in B (note: double negation).
SELECT id, tag_key, tag_value
FROM tags
WHERE NOT EXISTS (SELECT 1
FROM tags AS dup
WHERE dup.id < tags.id
AND NOT EXISTS (SELECT 1
FROM tags AS A
WHERE A.id = tags.id
AND NOT EXISTS (SELECT 1
FROM tags AS B
WHERE B.id = dup.id
AND B.tag_key = A.tag_key
AND B.tag_value = A.tag_value)
)
AND NOT EXISTS (SELECT 1
FROM tags AS B
WHERE B.id = dup.id
AND NOT EXISTS (SELECT 1
FROM tags AS A
WHERE A.id = tags.id
AND A.tag_key = B.tag_key
AND A.tag_value = B.tag_value)
)
)
ORDER BY id, tag_key;

This is not easy in SQLite. We want to identify groups of tag key/value pairs. So we could group by id and get a string of the associated pairs with group_concat. This would be the way to do it in another DBMS. SQLite, however, cannot order in group_concat, so we might end up with 2: 'a/x,a/y' and 5: 'a/y,a/x'. Two different strings for the same pairs.
Your best bet may be to write a program and find the distinct pairs iteratively.
In SQLite you may want to try this:
insert into tags_combinations (id, tag_key, tag_value)
select id, tag_key, tag_value
from tags
where id in
(
select min(id)
from
(
select id, group_concat(tag_key || '/' || tag_value) as tag_pairs
from
(
select id, tag_key, tag_value
from tags
order by id, tag_key, tag_value
) ordered_data
group by id
) aggregated_data
group by tag_pairs
);
Ordering the data before applying group_concat is likely to get the tag pairs ordered, but in no way guaranteed! If this is something you want to do only once, it may be worth a try, though.

To merge multiple rows into one value, you need a function like group_concat().
The ORDER BY is needed to ensure a consistent order of the rows within a group:
SELECT DISTINCT group_concat(tag_key) AS tag_keys,
group_concat(tag_value) AS tag_values
FROM (SELECT id,
tag_key,
tag_value
FROM tags
ORDER BY id,
tag_key,
tag_value)
GROUP BY id;
If you want to have keys and values interleaved, as shown in the question, you need to do more string concatenation:
SELECT DISTINCT group_concat(tag_key || ',' || tag_value, ';') AS keys_and_values
FROM (...

Related

How can I simulate variables, loops, and concatenations to work together in SQLite?

I am stuck with the limitations of both SQLite and the design of some of its tables. Here's what I'd like to achieve:
Create a variable to track the number of iterations of a loop. Then use this variable to help create unique table rows during the loop. Also insert it as the literal count value in a record.
Concatenate said variable to existing strings to create unique IDs that can be referenced multiple times.
Loop code using the previous variable to track iterations, and the previous concatenations as IDs for inserting new rows for every loop. 255 loops is an arbitrary number, I doubt I'd need 255 loops in 99.9% of cases, but want to avoid failure in the cases where they are needed. I am realistically looking at 50 loops minimum, with 100 as a rare maximum. 200 would likely be closer to the true maximum outlier number. 255 is just to be safe.
Here is what I have attempted so far:
DECLARE #cnt INT = 1;
WHILE #cnt < 256
BEGIN
#seyield = 'BUILDING_STOCK_EXCHANGE_YIELD_' + #cnt;
#secitizens = 'BUILDING_STOCK_EXCHANGE_CITIZENS_' + #cnt;
#secount = 'COUNT_CITIZENS_' + #cnt;
INSERT INTO
BuildingModifiers (BuildingType, ModifierId)
VALUES
('BUILDING_STOCK_EXCHANGE', #seyield);
INSERT INTO
Modifiers (ModifierId, ModifierType, RunOnce, Permanent, SubjectRequirementSetId)
VALUES
(#seyield, 'MODIFIER_BUILDING_YIELD_CHANGE', 0, 0, #secitizens);
INSERT INTO
ModifierArguments (ModifierID, Name, Value)
VALUES
(#seyield, 'BuildingType', 'BUILDING_STOCK_EXCHANGE'),
(#seyield, 'Amount', '2'),
(#seyield, 'YieldType', 'YIELD_GOLD');
INSERT INTO
RequirementSets(RequirementSetId, RequirementSetType)
VALUES
(#secitizens, 'REQUIREMENT_TEST_ALL');
INSERT INTO
RequirementSetRequirements(RequirementSetId, RequirementId)
VALUES
(#secitizens, #secount);
INSERT INTO
Requirements(RequirementId, RequirementType)
VALUES
(#secount, 'REQUIREMENT_COLLECTION_ATLEAST');
INSERT INTO
RequirementArguments(RequirementId, Name, Value)
VALUES
(#secount, 'CollectionType', 'COLLECTION_CITY_PLOT_YIELDS'),
(#secount, 'Count', #cnt);
SET #cnt = #cnt + 1;
END;
Of course this does not work due to the limitations of SQLite.
Are there any valid workarounds to this?
I know of one, but is almost unfeasible: Leave out the loop, variable, and concatenations, and manually copy and paste this code bloc, manually changing the relevant fields each time. However, this would require 255 copy and pastes multiplied by around 8 or 9 times for each different BUILDING_TYPE I need to attach rows to. I'd rather not do this if there is a faster and more efficient way!
You can do it with a recursive CTE and the use of a temporary table:
drop table if exists temp.temptable;
create temporary table temptable(cnt int, seyield text, secitizens text, secount text);
with
recursive constants as (
select
'BUILDING_STOCK_EXCHANGE_YIELD_' seyield,
'BUILDING_STOCK_EXCHANGE_CITIZENS_' secitizens,
'COUNT_CITIZENS_' secount
),
numbers as (
select 1 cnt
from constants
union all
select cnt + 1 from numbers
where cnt < 255
),
cte as (
select
n.cnt cnt,
c.seyield || n.cnt seyield,
c.secitizens || n.cnt secitizens,
c.secount || n.cnt secount
from numbers n cross join constants c
)
insert into temptable
select * from cte;
INSERT INTO BuildingModifiers (BuildingType, ModifierId)
SELECT 'BUILDING_STOCK_EXCHANGE', seyield FROM temptable;
INSERT INTO Modifiers (ModifierId, ModifierType, RunOnce, Permanent, SubjectRequirementSetId)
SELECT seyield, 'MODIFIER_BUILDING_YIELD_CHANGE', 0, 0, secitizens FROM temptable;
INSERT INTO ModifierArguments (ModifierID, Name, Value)
SELECT seyield, 'BuildingType', 'BUILDING_STOCK_EXCHANGE' FROM temptable
UNION ALL
SELECT seyield, 'Amount', '2' FROM temptable
UNION ALL
SELECT seyield, 'YieldType', 'YIELD_GOLD' FROM temptable;
INSERT INTO RequirementSets(RequirementSetId, RequirementSetType)
SELECT secitizens, 'REQUIREMENT_TEST_ALL' FROM temptable;
INSERT INTO RequirementSetRequirements(RequirementSetId, RequirementId)
SELECT secitizens, secount FROM temptable;
INSERT INTO Requirements(RequirementId, RequirementType)
SELECT secount, 'REQUIREMENT_COLLECTION_ATLEAST' FROM temptable;
INSERT INTO RequirementArguments(RequirementId, Name, Value)
SELECT secount, 'CollectionType', 'COLLECTION_CITY_PLOT_YIELDS' FROM temptable
UNION ALL
SELECT secount, 'Count', cnt FROM temptable;
See the demo.

Does two columns have any equal value? SQLite

In two columns within the same table (idParent1 and idParent2), I want to know if there are any two equal ids in any row (not just in the same row). If there are equal values (in any row) I want to obtain these ids.
I try this without success.
SELECT idParent1
FROM
Person
WHERE
Person.idParent1 IN Person.idParent2;
I'm getting the error: no such table: Person.idParent2
Example:
sqlite> CREATE TABLE Person(
ID INT PRIMARY KEY NOT NULL,
Parent1 INT NOT NULL,
Parent2 INT NOT NULL,
);
INSERT INTO Person (ID,Parent1,Parent2)
VALUES (1, 2,3);
INSERT INTO Person (ID,Parent1,Parent2)
VALUES (2, 5,6);
INSERT INTO Person (ID,Parent1,Parent2)
VALUES (3, 7,5);
INSERT INTO Person (ID,Parent1,Parent2)
VALUES (9, 10,12);
INSERT INTO Person (ID,Parent1,Parent2)
VALUES (45, 2,3);
In this example 5 is common in Parent1 and Parent2. I want to identify all the values like 5 are present in both columns
If you want to check only for idParent1:
SELECT idParent1
FROM
Person
WHERE
Person.idParent1 IN (
SELECT
idParent2
FROM
Person
);
or
SELECT idParent1
FROM
Person AS p
WHERE EXISTS (SELECT 1 FROM Person WHERE Person.idParent2 = p.idParent1);
A similar solution can apply to idParent2.
Your syntax is not correct. The correct syntax is:
SELECT idParent1 FROM Person WHERE idParent1 IN (SELECT idParent2 FROM Person)

Simple Split function in SQL Server 2012 with explanation pls

I have two tables Procedures and ProcedureTypes.
Procedures has a column Type which is a varchar with the values (1, 2), (3, 4), (4, 5) etc...
ProcedureType has a primary key 'ID' 1 to 9.
ID Description
1 Drug
2 Other-Drug
etc...
ID is an integer value and Type is varchar value.
Now I need to join these two tables to show the values
ID in the Procedures table
ProcedureType in the Procedures table
Description in the ProceduresType table with the value separated by a "-".
For example if he value in Type is (1,2) the new table after join should show values in the description like (Drug-Other Drug)
I have used this query bot to no avail
SELECT * FROM dbo.[Split]((select RequestType from GPsProcedures), ',')
Can anyone tell me how to do it and why the above query is not working
with Procedures as (
select 1 as ID, '1,2,3' as Typ
),
ProcedureTypes as (
select 1 as TypeID, 'Drug' as Name
union select 2 , 'Other-Drug'
union select 3 , 'Test 3'
)
/*Get one extra column of type xml*/
,Procedures_xml as (
select id,CONVERT(xml,' <root> <s>' + REPLACE(Typ,',','</s> <s>') + '</s> </root> ') as Typ_xml
from Procedures
)
/*Convert the field string to multiple rows then join to procedure types*/
, Procdure_With_Type as (
select ID,T.c.value('.','varchar(20)') as TypeID,
ProcedureTypes.Name
from Procedures_xml
CROSS APPLY Typ_xml.nodes('/root/s') T(c)
INNER JOIN ProcedureTypes ON T.c.value('.','varchar(20)') = ProcedureTypes.TypeID
)
/*Finally, group the procedures type names by procedure id*/
select id,
STUFF((
SELECT ', ' + [Name]
FROM Procdure_With_Type inn
WHERE (Procdure_With_Type.ID = inn.ID)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS NameValues
from Procdure_With_Type
group by ID
You can't have a select statement as a parameter for a function, so instead of this:
SELECT * FROM dbo.[Split]((select RequestType from GPsProcedures), ',')
Use this:
select S.*
from GPsProcedures P
cross apply dbo.[Split](P.RequestType, ',') S

SQLite FTS4 with preferred language

I have an SQLite table that was generated by using the FTS4 module. Each entry is listed at least twice with different languages, but still sharing a unique ID (int column, not indexed).
Here is what I want to do:
I want to lookup a term in a preferred language. I want to union the result with a lookup for the same term using another language.
For the second lookup though, I want to ignore all entries (identified by their ID) that I already found during the first lookup. So basically I want to do this:
WITH term_search1 AS (
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 1)
SELECT *
FROM term_search1
UNION
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 2
AND id NOT IN (SELECT id FROM term_search1)
The problem here is, that the term_seach1 Query would be executed twice. Is there a way of materializing my results maybe? Any solution for limiting it to 2 Queries (instead of 3) would be great.
I also tried using recursive Queries, something like:
WITH RECURSIVE term_search1 AS (
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 1
UNION ALL
SELECT m.*
FROM myFts m LEFT OUTER JOIN term_search1 t ON (m.id = t.id)
WHERE myFts MATCH 'term'
AND m.languageId = 2
AND t.id IS NULL
)
SELECT * FROM term_search1
This didn't work neither. Apparently he just executed two lookups for languageId = 2 (is this a bug maybe?).
Thanks in advance :)
You can use TEMPORARY tables to reduce the number of queries to myFts to 2:
CREATE TEMP TABLE results (id INTEGER PRIMARY KEY);
INSERT INTO results
SELECT id FROM myFts
WHERE myFts MATCH 'term' AND languageId = 1;
INSERT INTO results
SELECT id FROM myFts
WHERE myFts MATCH 'term' AND languageId = 2
AND id NOT IN (SELECT id FROM results);
SELECT * FROM myFts
WHERE id IN (SELECT id FROM results);
DROP TABLE results;
If it's possible to change the schema, you should only keep text data in the FTS table. This way you will avoid incorrect results when you are searching for numbers and rows matching languageId is not desired. Create another meta table holding non-textual data (like id and languageId) and filter the rows by joining against the rowid of the myFts. This way you will need to query the FTS table only once - use the temporary table to store the FTS table results then use the meta table to order them.
This is the best I can think of :
SELECT *
FROM myFts t1
JOIN (SELECT COUNT(*) AS cnt, id
FROM myFts t2
WHERE t2.languageId in (1, 2)
AND t2.myFts MATCH 'term'
GROUP BY t2.id) t3
ON t1.id = t3.id
WHERE t1.myFts MATCH 'term'
AND t1.languageId in (1, 2)
AND (t1.languageId = 1 or t3.cnt = 1)
I am not sure if the second MATCH clause is necessary.
The idea is to first count the acceptable rows, then choose the best one.
Edit : I have no idea why it does not work with your table. This is what I did to test it (SQLite version 3.8.10.2):
CREATE VIRTUAL TABLE myFts USING fts4(
id integer,
languageId integer,
content TEXT
);
insert into myFts(id, languageId, content) values (10, 1, 'term 10 lang 1');
insert into myFts(id, languageId, content) values (10, 2, 'term 10 lang 2');
insert into myFts(id, languageId, content) values (11, 1, 'term 11 lang 1');
insert into myFts(id, languageId, content) values (12, 2, 'term 12 lang 2');
insert into myFts(id, languageId, content) values (13, 1, 'not_erm 13 lang 1');
insert into myFts(id, languageId, content) values (13, 2, 'term 13 lang 2');
executing the query gives :
sqlite> SELECT *
...> FROM myFts t1
...> JOIN (SELECT COUNT(*) AS cnt, id
...> FROM myFts t2
...> WHERE t2.languageId in (1, 2)
...> AND t2.myFts MATCH 'term'
...> GROUP BY t2.id) t3
...> ON t1.id = t3.id
...> WHERE t1.myFts MATCH 'term'
...> AND t1.languageId in (1, 2)
...> AND (t1.languageId = 1 or t3.cnt = 1);
10|1|term 10 lang 1|2|10
11|1|term 11 lang 1|1|11
12|2|term 12 lang 2|1|12
13|2|term 13 lang 2|1|13
sqlite>

SQLite Insert and Replace with condition

I can not figure out how to query a SQLite.
needed:
1) Replace the record (the primary key), if the condition (comparison of new and old fields entries)
2) Insert an entry if no such entry exists in the database on the primary key.
Importantly, it has to work very fast!
I can not come up with an effective inquiry.
Edit.
MyInsertRequest - the desired expression.
Script:
CREATE TABLE testtable (a INT PRIMARY KEY, b INT, c INT)
INSERT INTO testtable VALUES (1, 2, 3)
select * from testtable
1|2|3
-- Adds an entry, because the primary key is not
++ MyInsertRequest VALUES (2, 2, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
-- Adds
++ MyInsertRequest VALUES (3, 8, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
3|8|3
-- Does nothing, because such a record (from primary key field 'a')
-- is in the database and none c>4
++ MyInsertRequest VALUES (1, 2, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
3|8|3
-- Does nothing
++ MyInsertRequest VALUES (3, 34, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
3|8|3
-- replace, because such a record (from primary key field 'a')
-- is in the database and c>2
++ MyInsertRequest VALUES (3, 34, 1) {if c>2 then replace}
select * from testtable
1|2|3
2|2|3
3|34|1
Isn't INSERT OR REPLACE what you need ? e.g. :
INSERT OR REPLACE INTO table (cola, colb) values (valuea, valueb)
When a UNIQUE constraint violation occurs, the REPLACE algorithm
deletes pre-existing rows that are causing the constraint violation
prior to inserting or updating the current row and the command
continues executing normally.
You have to put the condition in a unique constraint on the table. It will automatically create an index to make the check efficient.
e.g.
-- here the condition is on columnA, columnB
CREATE TABLE sometable (columnPK INT PRIMARY KEY,
columnA INT,
columnB INT,
columnC INT,
CONSTRAINT constname UNIQUE (columnA, columnB)
)
INSERT INTO sometable VALUES (1, 1, 1, 0);
INSERT INTO sometable VALUES (2, 1, 2, 0);
select * from sometable
1|1|1|0
2|1|2|0
-- insert a line with a new PK, but with existing values for (columnA, columnB)
-- the line with PK 2 will be replaced
INSERT OR REPLACE INTO sometable VALUES (12, 1, 2, 6)
select * from sometable
1|1|1|0
12|1|2|6
Assuming your requirements are:
Insert a new row when a doesn't exists;
Replacing row when a exist and existing c greater then new c;
Do nothing when a exist and existing c lesser or equal then new c;
INSERT OR REPLACE fits first two requirements.
For last requirement, the only way I know to make an INSERT ineffective is supplying a empty rowset.
A SQLite command like following whould make the job:
INSERT OR REPLACE INTO sometable SELECT newdata.* FROM
(SELECT 3 AS a, 2 AS b, 1 AS c) AS newdata
LEFT JOIN sometable ON newdata.a=sometable.a
WHERE newdata.c<sometable.c OR sometable.a IS NULL;
New data (3,2,1 in this example) is LEFT JOINen with current table data.
Then WHERE will "de-select" the row when new c is not less then existing c, keeping it when row is new, ie, sometable.* IS NULL.
I tried the others answers because I was also suffering from a solution to this problem.
This should work, however I am unsure about the performance implications. I believe that you may need the first column to be unique as a primary key else it will simply insert a new record each time.
INSERT OR REPLACE INTO sometable
SELECT columnA, columnB, columnC FROM (
SELECT columnA, columnB, columnC, 1 AS tmp FROM sometable
WHERE sometable.columnA = 1 AND
sometable.columnB > 9
UNION
SELECT 1 AS columnA, 1 As columnB, 404 as columnC, 0 AS tmp)
ORDER BY tmp DESC
LIMIT 1
In this case one dummy query is executed and union-ed onto a second query which would have a performance impact depending on how it is written and how the table is indexed. The next performance problem has potential where the results are ordered and limited. However, I expect that the second query should only return one record and therefore it should not be too much of a performance hit.
You can also omit the ORDER BY tmp LIMIT 1 and it works with my version of sqlite, but it may impact performance since it can end up updating the record twice (writing the original value then the new value if applicable).
The other problem is that you end up with a write to the table even if the condition states that it should not be updated.

Resources