SQLite FTS4 with preferred language - sqlite

I have an SQLite table that was generated by using the FTS4 module. Each entry is listed at least twice with different languages, but still sharing a unique ID (int column, not indexed).
Here is what I want to do:
I want to look up a term in a preferred language, and union the result with a lookup of the same term in another language.
For the second lookup, though, I want to ignore all entries (identified by their ID) that I already found in the first lookup. So basically I want to do this:
WITH term_search1 AS (
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 1)
SELECT *
FROM term_search1
UNION
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 2
AND id NOT IN (SELECT id FROM term_search1)
The problem here is that the term_search1 query would be executed twice. Is there a way to materialize my results? Any solution that limits it to 2 queries (instead of 3) would be great.
I also tried using recursive queries, something like:
WITH RECURSIVE term_search1 AS (
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 1
UNION ALL
SELECT m.*
FROM myFts m LEFT OUTER JOIN term_search1 t ON (m.id = t.id)
WHERE myFts MATCH 'term'
AND m.languageId = 2
AND t.id IS NULL
)
SELECT * FROM term_search1
This didn't work either. Apparently it just executed two lookups for languageId = 2 (is this maybe a bug?).
Thanks in advance :)

You can use a TEMPORARY table to reduce the number of full-text (MATCH) lookups on myFts to 2. Store the FTS docid alongside id, so that the final SELECT returns only the row in the chosen language rather than every language of a matched id:
CREATE TEMP TABLE results (docid INTEGER PRIMARY KEY, id INTEGER);
INSERT INTO results
SELECT docid, id FROM myFts
WHERE myFts MATCH 'term' AND languageId = 1;
INSERT INTO results
SELECT docid, id FROM myFts
WHERE myFts MATCH 'term' AND languageId = 2
AND id NOT IN (SELECT id FROM results);
SELECT * FROM myFts
WHERE docid IN (SELECT docid FROM results);
DROP TABLE results;
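A minimal Python sketch of the temporary-table approach, using the standard sqlite3 module and assuming the SQLite build includes FTS4 (most do). It keys the temp table on the FTS docid rather than on id, so the final SELECT returns only the row in the chosen language. The function name preferred_matches and its parameters are illustrative, and the sample rows mirror the test table from the other answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data mirroring the test table from the other answer (requires FTS4 support).
conn.execute("CREATE VIRTUAL TABLE myFts USING fts4(id integer, languageId integer, content TEXT)")
conn.executemany(
    "INSERT INTO myFts(id, languageId, content) VALUES (?, ?, ?)",
    [(10, 1, "term 10 lang 1"), (10, 2, "term 10 lang 2"), (11, 1, "term 11 lang 1"),
     (12, 2, "term 12 lang 2"), (13, 1, "not_erm 13 lang 1"), (13, 2, "term 13 lang 2")],
)


def preferred_matches(term, preferred=1, fallback=2):
    """Hits in the preferred language, plus fallback-language hits for ids not yet found."""
    conn.execute("CREATE TEMP TABLE results (docid INTEGER PRIMARY KEY, id INTEGER)")
    try:
        # Full-text lookup 1: preferred language.
        conn.execute(
            "INSERT INTO results SELECT docid, id FROM myFts "
            "WHERE myFts MATCH ? AND languageId = ?",
            (term, preferred),
        )
        # Full-text lookup 2: fallback language, skipping ids already found.
        conn.execute(
            "INSERT INTO results SELECT docid, id FROM myFts "
            "WHERE myFts MATCH ? AND languageId = ? "
            "AND id NOT IN (SELECT id FROM results)",
            (term, fallback),
        )
        # Fetch the chosen rows by docid (no MATCH needed here).
        return conn.execute(
            "SELECT id, languageId, content FROM myFts "
            "WHERE docid IN (SELECT docid FROM results) ORDER BY id"
        ).fetchall()
    finally:
        conn.execute("DROP TABLE results")


print(preferred_matches("term"))
```

For id 10, only the language 1 row comes back, because the language 2 row's id is already in results when the second lookup runs.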
If it's possible to change the schema, you should keep only text data in the FTS table. A MATCH on the table searches every column, so while numeric columns live in it, a search for a number can return rows whose languageId (or id) happens to equal that number, which is not what you want. Create another meta table holding the non-textual data (like id and languageId), keyed by the rowid of myFts, and filter the rows by joining against it. This way you need to query the FTS table only once: use the temporary table to store the FTS results, then use the meta table to filter and order them.
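A sketch of that split design in Python's sqlite3, under assumed names: docs is a text-only FTS4 table and doc_meta is a hypothetical ordinary table keyed by the FTS docid. The FTS table is queried once, and the choice between languages is made entirely against the meta table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical redesign: only text goes into the FTS table ...
    CREATE VIRTUAL TABLE docs USING fts4(content);
    -- ... and non-text attributes go into a plain table keyed by the FTS docid.
    CREATE TABLE doc_meta (docid INTEGER PRIMARY KEY, id INTEGER, languageId INTEGER);
""")
sample = [  # (docid, id, languageId, content)
    (1, 10, 1, "term 10 lang 1"), (2, 10, 2, "term 10 lang 2"), (3, 11, 1, "term 11 lang 1"),
    (4, 12, 2, "term 12 lang 2"), (5, 13, 1, "not_erm 13 lang 1"), (6, 13, 2, "term 13 lang 2"),
]
for docid, obj_id, lang, content in sample:
    conn.execute("INSERT INTO docs(docid, content) VALUES (?, ?)", (docid, content))
    conn.execute("INSERT INTO doc_meta VALUES (?, ?, ?)", (docid, obj_id, lang))


def search_preferred(term, preferred=1, fallback=2):
    # The only full-text lookup: remember every matching docid once.
    conn.execute("DROP TABLE IF EXISTS temp.hits")
    conn.execute("CREATE TEMP TABLE hits (docid INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO hits SELECT docid FROM docs WHERE docs MATCH ?", (term,))
    # Everything else uses the meta table: keep the preferred-language hit per id,
    # or the fallback-language hit when no preferred-language hit exists.
    return conn.execute("""
        SELECT m.id, m.languageId, d.content
        FROM hits h
        JOIN doc_meta m ON m.docid = h.docid
        JOIN docs d ON d.docid = h.docid
        WHERE m.languageId = ?
           OR (m.languageId = ? AND NOT EXISTS (
                   SELECT 1 FROM hits h2 JOIN doc_meta m2 ON m2.docid = h2.docid
                   WHERE m2.id = m.id AND m2.languageId = ?))
        ORDER BY m.id
    """, (preferred, fallback, preferred)).fetchall()


print(search_preferred("term"))
```

Swapping the preferred and fallback arguments changes the preference order without touching the FTS query.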

This is the best I can think of:
SELECT *
FROM myFts t1
JOIN (SELECT COUNT(*) AS cnt, id
FROM myFts t2
WHERE t2.languageId in (1, 2)
AND t2.myFts MATCH 'term'
GROUP BY t2.id) t3
ON t1.id = t3.id
WHERE t1.myFts MATCH 'term'
AND t1.languageId in (1, 2)
AND (t1.languageId = 1 or t3.cnt = 1)
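The second MATCH clause is necessary: without it, t1 would also return rows that do not contain the term, e.g. (13, 1, 'not_erm 13 lang 1') from the test data below, because id 13 matched in language 2 and so has cnt = 1.
The idea is to first count the acceptable rows per id, then choose the best one.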
Edit: I have no idea why it does not work with your table. This is what I did to test it (SQLite version 3.8.10.2):
CREATE VIRTUAL TABLE myFts USING fts4(
id integer,
languageId integer,
content TEXT
);
insert into myFts(id, languageId, content) values (10, 1, 'term 10 lang 1');
insert into myFts(id, languageId, content) values (10, 2, 'term 10 lang 2');
insert into myFts(id, languageId, content) values (11, 1, 'term 11 lang 1');
insert into myFts(id, languageId, content) values (12, 2, 'term 12 lang 2');
insert into myFts(id, languageId, content) values (13, 1, 'not_erm 13 lang 1');
insert into myFts(id, languageId, content) values (13, 2, 'term 13 lang 2');
Executing the query gives:
sqlite> SELECT *
...> FROM myFts t1
...> JOIN (SELECT COUNT(*) AS cnt, id
...> FROM myFts t2
...> WHERE t2.languageId in (1, 2)
...> AND t2.myFts MATCH 'term'
...> GROUP BY t2.id) t3
...> ON t1.id = t3.id
...> WHERE t1.myFts MATCH 'term'
...> AND t1.languageId in (1, 2)
...> AND (t1.languageId = 1 or t3.cnt = 1);
10|1|term 10 lang 1|2|10
11|1|term 11 lang 1|1|11
12|2|term 12 lang 2|1|12
13|2|term 13 lang 2|1|13
sqlite>

Related

Querying a many to many relationship with filter and pagination in SQLite

I am trying to paginate over a list of blog posts and filter those based on a list of tags they might have in an SQLite database.
Posts and Tags have a n-to-n relationship so I created a PostTag relation table.
CREATE TABLE "Post" (
"Id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
"Title" TEXT
);
CREATE TABLE "Tag" (
"Id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
"Label" TEXT
);
CREATE TABLE "PostTag" (
"Id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
"PostId" INTEGER,
"TagId" INTEGER,
FOREIGN KEY("PostId") REFERENCES "Post"("Id"),
FOREIGN KEY("TagId") REFERENCES "Tag"("Id")
);
Given the following data
INSERT INTO Post (Title) VALUES ('Post title 1'), ('Post title 2'), ('Post title 3');
INSERT INTO Tag (Label) VALUES ('news'), ('funny'), ('review');
INSERT INTO PostTag (PostId, TagId) VALUES (1, 1), (1, 2), (2, 3), (3, 2), (3, 3);
I am trying to select 10 posts that have both the tags 'news' and 'funny', so I would like only 'Post title 1' to be returned (edit for clarification: I need post 1 to be returned twice here, once with the 'news' tag and once with the 'funny' tag).
I am using DENSE_RANK to actually have 10 different posts in the results even though the join could return more than 10 rows.
The issue I have is how to manage the 'AND' operator on tags values, i.e. not returning posts that have only one of the tags. So here I would not want post 3 to be returned because it only has the 'funny' tag, not the 'news' tag.
Here is my best query so far (updated below), which returns posts having 'news' or 'funny', which is not what I want:
SELECT * FROM (
SELECT p.*, t.*, DENSE_RANK() OVER(order by p.id desc) rnk
FROM Post p
JOIN PostTag pt ON p.Id = pt.PostId
JOIN Tag t ON pt.TagId = t.Id AND t.Label IN ('news', 'funny')
ORDER BY p.id desc
) ranked
WHERE rnk <= 10
Please note that I am deduplicating and regrouping the results by post afterwards using Dapper, so having each post appear several times is not a real concern (please read the update below for more details).
UPDATE:
The query must return a matching post as many times as the number of its associated tags (even though those tags may not be in the queried tags), something like :
Id Title Id:2 Label rnk
1 'Post Title 1' 1 'news' 1
1 'Post Title 1' 2 'funny' 1
If later on, someone adds a tag to post 1 like so :
INSERT INTO Tag (Label) VALUES ('tech'); -- id is 4
INSERT INTO PostTag (PostId, TagId) VALUES (1, 4);
The result of the query should be
Id Title Id:2 Label rnk
1 'Post Title 1' 1 'news' 1
1 'Post Title 1' 2 'funny' 1
1 'Post Title 1' 4 'tech' 1
So I can show the matching post with all its tags, even though the tag was not in the query.
I finally have something working, but it is horribly nested, and I am sincerely wondering why this problem ends up being so convoluted. Isn't there a way to count on ranks directly?
select * from (
select *, dense_rank() over(order by p.id desc) rnk
from Post p
join PostTag pt on p.Id = pt.PostId
join Tag t on pt.TagId = t.Id
and postId in (
select postId from (
select dense_rank() over(order by pt2.PostId) rnk2, pt2.PostId
from PostTag pt2
join Tag t2 on pt2.TagId = t2.Id
where t2.Label in ('news', 'funny')
)
group by rnk2
having count(rnk2) == 2 -- 2 being the number of tags requested
) order by p.id desc
)
ranked where rnk <= 10
Just to give you an idea:
select * from (
select p.*, t.*, dense_rank() over(order by p.id desc) rnk
from Post p
join PostTag on p.Id = PostId
join Tag t on TagId = t.Id
and postid in (select postid
from posttag join tag t on tagid = t.id
where label in ('news', 'funny')
group by postid having count(distinct tagid) > 1)
order by p.id desc
) ranked
where rnk <= 10;
You can do it with a simple group by post, like this:
select p.id, p.title
from posttag pt
inner join post p on p.id = pt.postid
inner join tag t on t.id = pt.tagid
where t.label in ('news', 'funny')
group by p.id, p.title
having count(distinct t.id) = 2
order by p.id limit 10
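A Python/sqlite3 sketch that combines the two answers above for the updated requirement: the GROUP BY/HAVING subquery selects posts carrying all requested tags, and the outer query re-joins every tag of those posts and pages with DENSE_RANK (window functions need SQLite 3.25 or later). It assumes tag labels are unique and the requested labels are distinct, because it compares COUNT(DISTINCT ...) with the number of labels. posts_with_all_tags is a hypothetical helper, and the schema is simplified from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified version of the question's schema and data, including the later 'tech' tag.
    CREATE TABLE Post (Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Tag (Id INTEGER PRIMARY KEY, Label TEXT);
    CREATE TABLE PostTag (Id INTEGER PRIMARY KEY,
                          PostId INTEGER REFERENCES Post(Id),
                          TagId INTEGER REFERENCES Tag(Id));
    INSERT INTO Post (Title) VALUES ('Post title 1'), ('Post title 2'), ('Post title 3');
    INSERT INTO Tag (Label) VALUES ('news'), ('funny'), ('review'), ('tech');
    INSERT INTO PostTag (PostId, TagId) VALUES (1, 1), (1, 2), (2, 3), (3, 2), (3, 3), (1, 4);
""")


def posts_with_all_tags(labels, page_size=10):
    placeholders = ", ".join("?" for _ in labels)
    sql = f"""
        SELECT Id, Title, TagId, Label, rnk FROM (
            SELECT p.Id, p.Title, t.Id AS TagId, t.Label,
                   DENSE_RANK() OVER (ORDER BY p.Id DESC) AS rnk
            FROM Post p
            JOIN PostTag pt ON pt.PostId = p.Id
            JOIN Tag t ON t.Id = pt.TagId  -- all tags of the post, not only the requested ones
            WHERE p.Id IN (
                SELECT pt2.PostId
                FROM PostTag pt2
                JOIN Tag t2 ON t2.Id = pt2.TagId
                WHERE t2.Label IN ({placeholders})
                GROUP BY pt2.PostId
                HAVING COUNT(DISTINCT t2.Id) = ?  -- carries every requested tag
            )
        )
        WHERE rnk <= ?
        ORDER BY Id DESC, TagId
    """
    return conn.execute(sql, (*labels, len(labels), page_size)).fetchall()


rows = posts_with_all_tags(["news", "funny"])
print(rows)
```

Only the label values are bound as parameters; the f-string inserts nothing but "?" placeholders, so the query stays safe from injection.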

Query fails to execute after converting a column from Varchar2 to CLOB

I have an Oracle query:
select id from (
select ID, ROW_NUMBER() over (partition by LATEST_RECEIPT order by ID) rownumber
from Table
where LATEST_RECEIPT in
(
select LATEST_RECEIPT from Table
group by LATEST_RECEIPT
having COUNT(1) > 1
)
) t
where rownumber <> 1;
The data type of LATEST_RECEIPT was previously varchar2(4000), and this query worked fine. Since the column length needs to be extended, I modified it to CLOB, after which the query fails. Could anyone help me fix this issue or provide a workaround?
You can change your inner query to look for other rows with the same LATEST_RECEIPT value but a different ID (assuming ID is unique); if another row exists, that is equivalent to your count returning greater than one. But you can't simply test two CLOB values for equality; you need to use dbms_lob.compare:
select ID
from your_table t1
where exists (
select null from your_table t2
where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
and t2.ID != t1.ID
-- or if ID isn't unique: and t2.ROWID != t1.ROWID
);
Applying the row number filter is trickier, as you also can't use a CLOB in the analytic partition by clause. As André Schild suggested, you can use a hash; here passing the integer value 3, which is the equivalent of dbms_crypto.hash_sh1 (though in theory that could change in a future release!):
select id from (
select ID, ROW_NUMBER() over (partition by dbms_crypto.hash(LATEST_RECEIPT, 3)
order by ID) rownumber
from your_table t1
where exists (
select null from your_table t2
where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
and t2.ID != t1.ID
-- or if ID isn't unique: and t2.ROWID != t1.ROWID
)
)
where rownumber > 1;
It is of course possible to get a hash collision, and if that happened - you had two latest_receipt values which both appeared more than once and both hashed to the same value - then you could get too many rows back. That seems pretty unlikely, but it's something to consider.
Alternatively, rather than numbering the rows, you can look only for rows which have the same LATEST_RECEIPT and a lower ID:
select ID
from your_table t1
where exists (
select null from your_table t2
where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
and t2.ID < t1.ID
);
Again, that assumes ID is unique. If it isn't, you could still use ROWID instead, but you would have less control over which rows are found: the lowest ROWID isn't necessarily the lowest ID. Presumably you're using this to find rows to delete. If you actually don't mind which row you keep and which you delete, then you could still do:
and t2.ROWID < t1.ROWID
But since you are currently ordering that probably isn't acceptable, and hashing might be preferable, despite the small risk.
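The "same value and a lower ID" rewrite returns exactly the rows that ROW_NUMBER() > 1 would. A small Python/sqlite3 check shows that equivalence on made-up data. SQLite stands in for Oracle here, so the comparison is a plain = on TEXT rather than dbms_lob.compare on a CLOB; it only illustrates the set logic, not the CLOB handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (ID INTEGER PRIMARY KEY, LATEST_RECEIPT TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a"), (6, "b")],
)

# Original approach: number the rows within each value and keep all but the first.
by_row_number = {r[0] for r in conn.execute("""
    SELECT ID FROM (
        SELECT ID, ROW_NUMBER() OVER (PARTITION BY LATEST_RECEIPT ORDER BY ID) AS rownumber
        FROM your_table)
    WHERE rownumber > 1""")}

# Rewrite: a row is a later duplicate if a row with the same value and a lower ID exists.
by_lower_id = {r[0] for r in conn.execute("""
    SELECT ID FROM your_table t1
    WHERE EXISTS (SELECT 1 FROM your_table t2
                  WHERE t2.LATEST_RECEIPT = t1.LATEST_RECEIPT
                    AND t2.ID < t1.ID)""")}

print(sorted(by_row_number), sorted(by_lower_id))
```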

Simple Split function in SQL Server 2012 with explanation pls

I have two tables Procedures and ProcedureTypes.
Procedures has a column Type which is a varchar with the values (1, 2), (3, 4), (4, 5) etc...
ProcedureTypes has a primary key 'ID' with values 1 to 9.
ID Description
1 Drug
2 Other-Drug
etc...
ID is an integer value and Type is varchar value.
Now I need to join these two tables to show the values
ID in the Procedures table
ProcedureType in the Procedures table
Description in the ProceduresType table with the value separated by a "-".
For example, if the value in Type is (1,2), the new table after the join should show values in the description like (Drug-Other Drug).
I have tried this query, but to no avail:
SELECT * FROM dbo.[Split]((select RequestType from GPsProcedures), ',')
Can anyone tell me how to do it and why the above query is not working
with Procedures as (
select 1 as ID, '1,2,3' as Typ
),
ProcedureTypes as (
select 1 as TypeID, 'Drug' as Name
union select 2 , 'Other-Drug'
union select 3 , 'Test 3'
)
/*Get one extra column of type xml*/
,Procedures_xml as (
select id,CONVERT(xml,' <root> <s>' + REPLACE(Typ,',','</s> <s>') + '</s> </root> ') as Typ_xml
from Procedures
)
/*Convert the field string to multiple rows then join to procedure types*/
, Procdure_With_Type as (
select ID,T.c.value('.','varchar(20)') as TypeID,
ProcedureTypes.Name
from Procedures_xml
CROSS APPLY Typ_xml.nodes('/root/s') T(c)
INNER JOIN ProcedureTypes ON T.c.value('.','varchar(20)') = ProcedureTypes.TypeID
)
/*Finally, group the procedures type names by procedure id*/
select id,
STUFF((
SELECT ', ' + [Name]
FROM Procdure_With_Type inn
WHERE (Procdure_With_Type.ID = inn.ID)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS NameValues
from Procdure_With_Type
group by ID
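The CTEs above perform three steps: split the string into rows, join each piece to ProcedureTypes, and concatenate the names per procedure. Here is the same pipeline in plain Python as a language-neutral sketch. The sample data and the describe helper are hypothetical, and it assumes Type holds plain comma-separated IDs such as '1,2'. It joins with '-' as the question asks, while the query above uses ', '. Note that '-' is ambiguous here because 'Other-Drug' itself contains one.

```python
# Hypothetical sample data: Type holds comma-separated ProcedureTypes IDs.
procedure_types = {1: "Drug", 2: "Other-Drug", 3: "Test 3"}
procedures = [(1, "1,2"), (2, "3"), (3, "2, 3")]


def describe(type_list, types, sep="-"):
    # Step 1: split the string into individual IDs (int() tolerates surrounding spaces).
    ids = [int(part) for part in type_list.split(",") if part.strip()]
    # Steps 2 and 3: look each ID up (like the INNER JOIN, unknown IDs are dropped)
    # and concatenate the descriptions in their original order.
    return sep.join(types[i] for i in ids if i in types)


descriptions = {proc_id: describe(typ, procedure_types) for proc_id, typ in procedures}
print(descriptions)
```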
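You can't pass a SELECT statement as a parameter to a function: the arguments of a table-valued function in the FROM clause must be scalar expressions, and your subquery returns one RequestType per row of GPsProcedures. So instead of this: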
SELECT * FROM dbo.[Split]((select RequestType from GPsProcedures), ',')
Use this:
select S.*
from GPsProcedures P
cross apply dbo.[Split](P.RequestType, ',') S

SQLite cross reference unique combinations

I've got two tables already populated with data with the given schemas:
CREATE TABLE objects
(
id BIGINT NOT NULL,
latitude BIGINT NOT NULL,
longitude BIGINT NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE tags
(
id BIGINT NOT NULL,
tag_key VARCHAR(100) NOT NULL,
tag_value VARCHAR(500),
PRIMARY KEY (id , tag_key)
)
objects.id and tags.id refer to the same object.
I'd like to populate a third table with the unique combinations of tag_key and tag_value. For example:
INSERT OR REPLACE INTO objects (id) VALUES (0);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (0, 'a', 'x');
INSERT OR REPLACE INTO objects (id) VALUES (1);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (1, 'a', 'y');
INSERT OR REPLACE INTO objects (id) VALUES (2);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (2, 'a', 'x');
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (2, 'a', 'y');
INSERT OR REPLACE INTO objects (id) VALUES (3);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (3, 'a', 'x');
INSERT OR REPLACE INTO objects (id) VALUES (4);
INSERT OR REPLACE INTO tags (id, tag_key, tag_value) VALUES (4, 'a', 'y');
Should result in 3 entries of
0: ([a,x])
1: ([a,y])
3: ([a,x][a,y])
Currently I have:
CREATE TABLE tags_combinations
(
id INTEGER PRIMARY KEY,
tag_key VARCHAR(100) NOT NULL,
tag_value VARCHAR(500)
);
The id shouldn't be related to the original id of the object, just something to group unique combinations.
This is the query I have so far:
SELECT
t1.tag_key, t1.tag_value
FROM
tags t1
WHERE
t1.id
IN
(
/* select ids whose tags entries are not all already under one id in tags_combinations */
SELECT
t2.id
FROM
tags t2
WHERE
t2.tag_key, t2.tag_value
NOT IN
(
)
);
The part with the comment is what I am not sure about, how would I select every id from tags that does not have all of the corresponding tag_key and tag_value entries already under one id in tags_combinations?
To clarify exactly the result I am after: From the sample data given, it should return 4 rows with:
row id tag_key tag_value
0 0 a x
1 1 a y
2 2 a x
3 2 a y
SQL is a set-based language. If you reformulate your question in the language of set theory, you can directly translate it into SQL:
You want all rows of the tags table, except those from duplicate objects.
Objects are duplicates if they have exactly the same key/value combinations. However, we still want to return one of those objects, so we treat an object as a duplicate (and drop its rows) only if another object with exactly the same combinations and a smaller ID exists.
Two objects A and B have exactly the same key/value combinations if
all key/value combinations in A also exist in B, and
all key/value combinations in B also exist in A.
All key/value combinations in A also exist in B if there is no key/value combination in A that does not exist in B (note: double negation).
SELECT id, tag_key, tag_value
FROM tags
WHERE NOT EXISTS (SELECT 1
FROM tags AS dup
WHERE dup.id < tags.id
AND NOT EXISTS (SELECT 1
FROM tags AS A
WHERE A.id = tags.id
AND NOT EXISTS (SELECT 1
FROM tags AS B
WHERE B.id = dup.id
AND B.tag_key = A.tag_key
AND B.tag_value = A.tag_value)
)
AND NOT EXISTS (SELECT 1
FROM tags AS B
WHERE B.id = dup.id
AND NOT EXISTS (SELECT 1
FROM tags AS A
WHERE A.id = tags.id
AND A.tag_key = B.tag_key
AND A.tag_value = B.tag_value)
)
)
ORDER BY id, tag_key;
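Here is a Python/sqlite3 sketch that checks the double-negation query against the sample data. One assumption differs from the question: with PRIMARY KEY (id, tag_key), the INSERT OR REPLACE of (2, 'a', 'y') would replace (2, 'a', 'x'), so object 2 could never hold both pairs. The sketch therefore uses PRIMARY KEY (id, tag_key, tag_value) and plain INSERTs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumption: the key includes tag_value so that object 2 can hold both pairs.
    CREATE TABLE tags (id BIGINT NOT NULL, tag_key VARCHAR(100) NOT NULL,
                       tag_value VARCHAR(500), PRIMARY KEY (id, tag_key, tag_value));
    INSERT INTO tags VALUES (0, 'a', 'x'), (1, 'a', 'y'), (2, 'a', 'x'),
                            (2, 'a', 'y'), (3, 'a', 'x'), (4, 'a', 'y');
""")

# The answer's query, unchanged: drop an object's rows if an object with a smaller id
# has exactly the same pairs (every pair of one exists in the other, both ways round).
rows = conn.execute("""
    SELECT id, tag_key, tag_value
    FROM tags
    WHERE NOT EXISTS (SELECT 1 FROM tags AS dup
                      WHERE dup.id < tags.id
                        AND NOT EXISTS (SELECT 1 FROM tags AS A
                                        WHERE A.id = tags.id
                                          AND NOT EXISTS (SELECT 1 FROM tags AS B
                                                          WHERE B.id = dup.id
                                                            AND B.tag_key = A.tag_key
                                                            AND B.tag_value = A.tag_value))
                        AND NOT EXISTS (SELECT 1 FROM tags AS B
                                        WHERE B.id = dup.id
                                          AND NOT EXISTS (SELECT 1 FROM tags AS A
                                                          WHERE A.id = tags.id
                                                            AND A.tag_key = B.tag_key
                                                            AND A.tag_value = B.tag_value)))
    ORDER BY id, tag_key
""").fetchall()
print(rows)
```

Objects 3 and 4 are dropped because objects 0 and 1 carry the same single pair.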
This is not easy in SQLite. We want to identify groups of tag key/value pairs, so we could group by id and build a string of the associated pairs with group_concat. That would be the way to do it in another DBMS. SQLite, however, cannot order within group_concat (an ORDER BY inside aggregate functions was only added in SQLite 3.44), so we might end up with 2: 'a/x,a/y' and 5: 'a/y,a/x', two different strings for the same pairs.
Your best bet may be to write a program and find the distinct pairs iteratively.
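A minimal sketch of such a program in Python. It assumes the tags rows have already been fetched as (id, tag_key, tag_value) tuples, for example with SELECT id, tag_key, tag_value FROM tags, and it leaves writing the result into tags_combinations to you. Comparing frozensets of pairs sidesteps the group_concat ordering problem entirely, and the lowest id is kept for each distinct set. unique_tag_sets is a hypothetical name.

```python
from collections import defaultdict


def unique_tag_sets(tag_rows):
    """Keep the rows of the lowest-id object for each distinct set of (tag_key, tag_value) pairs."""
    pairs_by_id = defaultdict(set)
    for obj_id, key, value in tag_rows:
        pairs_by_id[obj_id].add((key, value))
    first_id_for_set = {}
    for obj_id in sorted(pairs_by_id):
        # A frozenset ignores row order, which avoids the group_concat ordering problem.
        first_id_for_set.setdefault(frozenset(pairs_by_id[obj_id]), obj_id)
    keep = set(first_id_for_set.values())
    return [row for row in tag_rows if row[0] in keep]


tags = [(0, "a", "x"), (1, "a", "y"), (2, "a", "x"), (2, "a", "y"), (3, "a", "x"), (4, "a", "y")]
print(unique_tag_sets(tags))
```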
In SQLite you may want to try this:
insert into tags_combinations (id, tag_key, tag_value)
select id, tag_key, tag_value
from tags
where id in
(
select min(id)
from
(
select id, group_concat(tag_key || '/' || tag_value) as tag_pairs
from
(
select id, tag_key, tag_value
from tags
order by id, tag_key, tag_value
) ordered_data
group by id
) aggregated_data
group by tag_pairs
);
Ordering the data before applying group_concat is likely to get the tag pairs ordered, but in no way guaranteed! If this is something you want to do only once, it may be worth a try, though.
To merge multiple rows into one value, you need a function like group_concat().
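The ORDER BY in the subquery is there to give a consistent order of the rows within each group (SQLite's documentation does not formally guarantee that group_concat preserves it):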
SELECT DISTINCT group_concat(tag_key) AS tag_keys,
group_concat(tag_value) AS tag_values
FROM (SELECT id,
tag_key,
tag_value
FROM tags
ORDER BY id,
tag_key,
tag_value)
GROUP BY id;
If you want to have keys and values interleaved, as shown in the question, you need to do more string concatenation:
SELECT DISTINCT group_concat(tag_key || ',' || tag_value, ';') AS keys_and_values
FROM (...

SQLite Insert and Replace with condition

I cannot figure out how to write this query in SQLite.
What I need:
1) Replace the record (by primary key) if a condition holds (a comparison between the new and the old field values).
2) Insert the record if no record with that primary key exists in the database.
Importantly, it has to be very fast!
I cannot come up with an efficient query.
Edit:
MyInsertRequest stands for the desired statement.
Script:
CREATE TABLE testtable (a INT PRIMARY KEY, b INT, c INT)
INSERT INTO testtable VALUES (1, 2, 3)
select * from testtable
1|2|3
-- Adds an entry, because the primary key does not exist yet
++ MyInsertRequest VALUES (2, 2, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
-- Adds
++ MyInsertRequest VALUES (3, 8, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
3|8|3
-- Does nothing, because such a record (by primary key field 'a')
-- is already in the database and c>4 does not hold
++ MyInsertRequest VALUES (1, 2, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
3|8|3
-- Does nothing
++ MyInsertRequest VALUES (3, 34, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
3|8|3
-- Replaces, because such a record (by primary key field 'a')
-- is in the database and c>2
++ MyInsertRequest VALUES (3, 34, 1) {if c>2 then replace}
select * from testtable
1|2|3
2|2|3
3|34|1
Isn't INSERT OR REPLACE what you need? e.g.:
INSERT OR REPLACE INTO table (cola, colb) values (valuea, valueb)
When a UNIQUE constraint violation occurs, the REPLACE algorithm
deletes pre-existing rows that are causing the constraint violation
prior to inserting or updating the current row and the command
continues executing normally.
You have to put the condition in a unique constraint on the table. It will automatically create an index to make the check efficient.
e.g.
-- here the condition is on columnA, columnB
CREATE TABLE sometable (columnPK INT PRIMARY KEY,
columnA INT,
columnB INT,
columnC INT,
CONSTRAINT constname UNIQUE (columnA, columnB)
)
INSERT INTO sometable VALUES (1, 1, 1, 0);
INSERT INTO sometable VALUES (2, 1, 2, 0);
select * from sometable
1|1|1|0
2|1|2|0
-- insert a line with a new PK, but with existing values for (columnA, columnB)
-- the line with PK 2 will be replaced
INSERT OR REPLACE INTO sometable VALUES (12, 1, 2, 6)
select * from sometable
1|1|1|0
12|1|2|6
Assuming your requirements are:
Insert a new row when a doesn't exist;
Replace the row when a exists and the existing c is greater than the new c;
Do nothing when a exists and the existing c is less than or equal to the new c;
INSERT OR REPLACE fits the first two requirements.
For the last requirement, the only way I know to make an INSERT ineffective is to supply an empty rowset.
A SQLite command like the following would do the job:
INSERT OR REPLACE INTO sometable SELECT newdata.* FROM
(SELECT 3 AS a, 2 AS b, 1 AS c) AS newdata
LEFT JOIN sometable ON newdata.a=sometable.a
WHERE newdata.c<sometable.c OR sometable.a IS NULL;
The new data (3, 2, 1 in this example) is LEFT JOINed with the current table data.
Then the WHERE clause "de-selects" the row when the new c is not less than the existing c, and keeps it when the row is new, i.e. when sometable.a IS NULL.
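Here is the question's script replayed with this statement as a Python/sqlite3 sketch, under the answer's reading of the condition (replace only when the stored c is greater than the new c). sometable is the question's testtable here, and conditional_upsert is a hypothetical helper that binds the new row as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (a INT PRIMARY KEY, b INT, c INT)")
conn.execute("INSERT INTO testtable VALUES (1, 2, 3)")


def conditional_upsert(a, b, c):
    # Inserts when no row has primary key a; replaces when the stored c is greater
    # than the new c; otherwise the SELECT yields no rows and nothing is written.
    conn.execute(
        """
        INSERT OR REPLACE INTO testtable
        SELECT newdata.* FROM (SELECT ? AS a, ? AS b, ? AS c) AS newdata
        LEFT JOIN testtable ON newdata.a = testtable.a
        WHERE newdata.c < testtable.c OR testtable.a IS NULL
        """,
        (a, b, c),
    )


# The sequence of requests from the question's script.
for row in [(2, 2, 3), (3, 8, 3), (1, 2, 3), (3, 34, 3), (3, 34, 1)]:
    conditional_upsert(*row)

rows = conn.execute("SELECT * FROM testtable ORDER BY a").fetchall()
print(rows)
```

The final table matches the question's expected end state: (3, 34, 3) is ignored, and (3, 34, 1) replaces (3, 8, 3).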
I tried the other answers because I was also looking for a solution to this problem.
This should work, however I am unsure about the performance implications. I believe that you may need the first column to be unique as a primary key else it will simply insert a new record each time.
INSERT OR REPLACE INTO sometable
SELECT columnA, columnB, columnC FROM (
SELECT columnA, columnB, columnC, 1 AS tmp FROM sometable
WHERE sometable.columnA = 1 AND
sometable.columnB > 9
UNION
SELECT 1 AS columnA, 1 As columnB, 404 as columnC, 0 AS tmp)
ORDER BY tmp DESC
LIMIT 1
In this case one dummy query is executed and UNIONed onto a second query, which has a performance impact depending on how it is written and how the table is indexed. A further potential performance cost lies in ordering and limiting the results. However, I expect the second query to return only one record, so it should not be much of a performance hit.
You can also omit the ORDER BY tmp DESC LIMIT 1, and it works with my version of SQLite, but that may impact performance, since it can end up updating the record twice (writing the original value, then the new value if applicable).
The other problem is that you end up with a write to the table even if the condition states that it should not be updated.
