SQLite Duplicates NULL Perl - sqlite

I use the following code in Perl to find duplicates in my SQLite database on the basis of 5 columns (Term1 to Term5).
my $dublette = $dbh->selectall_arrayref("
SELECT t.ID, t.Tag1, t.Tag2, t.Term1, t.Term2, t.Term3, t.Term4, t.Term5
FROM Data t inner join (
SELECT ID, Tag1, Tag2, Term1, Term2, Term3, Term4, Term5, COUNT(*) c FROM Data GROUP BY Term1, Term2, Term3, Term4, Term5 HAVING c > 1)
x on t.Term1=x.Term1 AND t.Term2=x.Term2 AND t.Term3=x.Term3 AND t.Term4=x.Term4 AND t.Term5=x.Term5
");
This seems to work except when my unused values are NULL (it works if they are empty strings). Any idea how I can change it to get all duplicates?
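The likely cause is that GROUP BY puts NULLs into the same group, while `=` in the join condition never evaluates to true when either side is NULL. A minimal sketch of the difference, assuming a cut-down `Data` table with made-up rows and using Python's `sqlite3` module instead of Perl DBI purely so the example is self-contained; SQLite's `IS` operator treats two NULLs as equal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced version of the Data table (Tag1/Tag2 omitted).
conn.execute(
    "CREATE TABLE Data (ID INTEGER PRIMARY KEY, Term1, Term2, Term3, Term4, Term5)"
)
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "a", "b", None, None, None),
        (2, "a", "b", None, None, None),  # duplicate of 1, unused terms are NULL
        (3, "a", "c", None, None, None),  # not a duplicate
        (4, "x", "y", "", "", ""),
        (5, "x", "y", "", "", ""),        # duplicate of 4, unused terms are ''
    ],
)

# {op} is filled in with either "=" or "IS" below.
DUPLICATES_SQL = """
SELECT t.ID
FROM Data t
INNER JOIN (
    SELECT Term1, Term2, Term3, Term4, Term5
    FROM Data
    GROUP BY Term1, Term2, Term3, Term4, Term5
    HAVING COUNT(*) > 1
) x
ON  t.Term1 {op} x.Term1 AND t.Term2 {op} x.Term2 AND t.Term3 {op} x.Term3
AND t.Term4 {op} x.Term4 AND t.Term5 {op} x.Term5
ORDER BY t.ID
"""

def duplicate_ids(op):
    """Return the IDs of duplicated rows when joining with the given operator."""
    return [row[0] for row in conn.execute(DUPLICATES_SQL.format(op=op))]

ids_with_equals = duplicate_ids("=")   # misses the rows whose terms are NULL
ids_with_is = duplicate_ids("IS")      # NULL-safe comparison
print(ids_with_equals, ids_with_is)
```

In the Perl code the same change would amount to replacing each `=` in the `on` clause with `IS`.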

Related

Is it possible to compare a value to multiple columns in an "IN" clause?

select m.value
from MY_TABLE m
where m.value in (select m2.some_third_value, m2.some_fourth_value
from MY_TABLE_2 m2
where m2.first_val member of v_my_array
or m2.second_val member of v_my_array_2)
Is it possible to write a select similar to this, where m.value is compared to two columns and has to match at least one of those? Something like where m.value in (select m2.first_val, m2.second_val). Or is writing two separate selects unavoidable here?
No. When there are multiple columns in the IN clause, there must be the same number of columns in the WHERE clause. The pairwise query compares each record in the WHERE clause against the records returned by the sub-query. The statement below
SELECT *
FROM table_main m
WHERE ( m.col_1, m.col_2 ) IN (SELECT s.col_a,
s.col_b
FROM table_sub s)
is equivalent to
SELECT *
FROM table_main m
WHERE EXISTS (SELECT 1
FROM table_sub s
WHERE m.col_1 = s.col_a
AND m.col_2 = s.col_b)
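One way to search both columns in a single SELECT statement is to join the second table to the first. Note that the WHERE clause below discards the unmatched rows, so the LEFT JOIN behaves like an inner join, and a row of table_main is returned once per matching row of table_sub.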
SELECT m.*
FROM table_main m
LEFT JOIN table_sub s ON (m.col_1 = s.col_a OR m.col_1 = s.col_b)
WHERE m.col_1 = s.col_a
OR m.col_1 = s.col_b
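A small sketch of the three forms side by side. It runs against SQLite through Python's `sqlite3` module rather than Oracle, and the table contents are invented for illustration; the EXISTS ... OR variant is an assumption of mine, not part of the answer above, and is shown because it avoids the repeated rows that the join produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_main (col_1, col_2);
CREATE TABLE table_sub  (col_a, col_b);
INSERT INTO table_main VALUES (1, 2), (3, 9), (5, 6);
INSERT INTO table_sub  VALUES (1, 2), (7, 3), (3, 3);
""")

# Pairwise comparison: both columns must match the same sub-query row.
pairwise = conn.execute("""
    SELECT m.col_1 FROM table_main m
    WHERE (m.col_1, m.col_2) IN (SELECT s.col_a, s.col_b FROM table_sub s)
    ORDER BY m.col_1
""").fetchall()

# One value against either column, written as a join: a main row that
# matches two sub rows comes back twice.
either_join = conn.execute("""
    SELECT m.col_1 FROM table_main m
    JOIN table_sub s ON m.col_1 = s.col_a OR m.col_1 = s.col_b
    ORDER BY m.col_1
""").fetchall()

# Same condition with EXISTS: each main row is returned at most once.
either_exists = conn.execute("""
    SELECT m.col_1 FROM table_main m
    WHERE EXISTS (SELECT 1 FROM table_sub s
                  WHERE m.col_1 = s.col_a OR m.col_1 = s.col_b)
    ORDER BY m.col_1
""").fetchall()

print(pairwise, either_join, either_exists)
```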

Find rows with same non-null values

This seems like a rather straightforward problem, yet I have not been able to find the solution:
In a table test, I have some subset of columns which I am interested in, say a,b,c,d,e,f.
Some or most of these columns are NULL, but at least one is always filled.
Now for some rows, returned, say by:
SELECT rowid,a,b,c,d,e,f FROM test LIMIT 1;
I would like to get the number of rows which have the same non-null values.
So for example if a,d,f are the columns that are not NULL for this row, the result would be the same as for:
SELECT COUNT(*)
FROM test WHERE a=a_ AND d=d_ AND f=f_
SELECT a as a_, d as d_, f as f_ FROM test LIMIT 1;
How can this be done in one step / line? Or do I need a temporary table?
You can use the operator IS to safely compare values that may be NULL:
SELECT COUNT(*)
FROM test t1
INNER JOIN (SELECT a, b, c, d, e, f FROM test LIMIT 1) t2
ON (t1.a, t1.b, t1.c, t1.d, t1.e, t1.f) IS (t2.a, t2.b, t2.c, t2.d, t2.e, t2.f);
or with a CTE:
WITH cte AS (SELECT a, b, c, d, e, f FROM test LIMIT 1)
SELECT COUNT(*)
FROM test t1 INNER JOIN cte t2
ON (t1.a, t1.b, t1.c, t1.d, t1.e, t1.f) IS (t2.a, t2.b, t2.c, t2.d, t2.e, t2.f);
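To see what the IS comparison does and does not count, here is a minimal sketch with invented rows, run through Python's `sqlite3` module. The `ORDER BY rowid` in the reference sub-query is my addition so that "the first row" is deterministic; row-value comparisons need SQLite 3.15 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a, b, c, d, e, f)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, None, None, 4, None, 6),  # reference row: a, d, f are filled
        (1, None, None, 4, None, 6),  # same values, same NULLs
        (1, None, None, 4, None, 7),  # f differs
        (1, 2, None, 4, None, 6),     # b is filled here but NULL in the reference
    ],
)

# {op} is replaced by "IS" or "=" to compare the two behaviours.
COUNT_SQL = """
SELECT COUNT(*)
FROM test t1
INNER JOIN (SELECT a, b, c, d, e, f FROM test ORDER BY rowid LIMIT 1) t2
ON (t1.a, t1.b, t1.c, t1.d, t1.e, t1.f) {op} (t2.a, t2.b, t2.c, t2.d, t2.e, t2.f)
"""

count_with_is = conn.execute(COUNT_SQL.format(op="IS")).fetchone()[0]
count_with_equals = conn.execute(COUNT_SQL.format(op="=")).fetchone()[0]
print(count_with_is, count_with_equals)
```

Note that IS also requires the columns that are NULL in the reference row to be NULL in the counted rows, which is why the fourth row is not counted. With `=` nothing matches at all, because any comparison involving NULL is not true.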

order of search for Sqlite's "IN" operator guaranteed?

I'm performing an Sqlite3 query similar to
SELECT * FROM nodes WHERE name IN ('name1', 'name2', 'name3', ...) LIMIT 1
Am I guaranteed that it will search for name1 first, name2 second, etc? Such that by limiting my output to 1 I know that I found the first hit according to my ordering of items in the IN clause?
Update: with some testing it seems to always return the first hit in the index regardless of the IN order. It's using the order of the index on name. Is there some way to enforce the search order?
The order of the returned rows is not guaranteed to match the order of the items inside the parentheses after IN.
What you can do is use ORDER BY in your statement with the use of the function INSTR():
SELECT * FROM nodes
WHERE name IN ('name1', 'name2', 'name3')
ORDER BY INSTR(',name1,name2,name3,', ',' || name || ',')
LIMIT 1
This code uses the same list from the IN clause as a string, where the items are in the same order, concatenated and separated by commas, assuming that the items do not contain commas.
This way the results are ordered by their position in the list, and LIMIT 1 then returns the match that is closest to the start of the list.
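When the list comes from application code, the comma-separated string can be built from the same list that feeds the IN clause. A minimal sketch using Python's `sqlite3` module; the `nodes` table layout, its rows and the `wanted` list are assumptions for illustration, and the names must not contain commas:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical nodes table; the primary key gives the index on name.
conn.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [("name1", "first"), ("name2", "second"), ("name3", "third")],
)

wanted = ["name3", "name1", "name2"]          # desired priority order
placeholders = ", ".join("?" for _ in wanted)  # "?, ?, ?"
haystack = "," + ",".join(wanted) + ","        # ",name3,name1,name2,"

row = conn.execute(
    f"SELECT name FROM nodes WHERE name IN ({placeholders}) "
    "ORDER BY INSTR(?, ',' || name || ',') LIMIT 1",
    wanted + [haystack],
).fetchone()
print(row)
```

Without the ORDER BY, the row found first in the index on name could be returned instead of the first item of the list.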
Another way to achieve the same results is by using a CTE which returns the list along with an Id which serves as the desired ordering of the results, which will be joined to the table:
WITH list(id, item) AS (
SELECT 1, 'name1' UNION ALL
SELECT 2, 'name2' UNION ALL
SELECT 3, 'name3'
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
Or:
WITH list(id, item) AS (
SELECT * FROM (VALUES
(1, 'name1'), (2, 'name2'), (3, 'name3')
)
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
This way you don't have to repeat the list twice.
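If the list is only known at run time, the VALUES rows of the CTE can be generated with bound parameters. This is a sketch under my own assumptions: the helper name `first_match`, the `nodes` layout and the sample rows are hypothetical, and the list is assumed to be non-empty (an empty VALUES clause is a syntax error).

```python
import sqlite3

def first_match(conn, names):
    """Return the first name from `names` (in list order) that exists in nodes."""
    values = ", ".join("(?, ?)" for _ in names)
    # Flatten [(1, 'n1'), (2, 'n2'), ...] into [1, 'n1', 2, 'n2', ...].
    params = [p for pair in enumerate(names, start=1) for p in pair]
    sql = (
        f"WITH list(id, item) AS (VALUES {values}) "
        "SELECT n.name FROM nodes n INNER JOIN list l ON l.item = n.name "
        "ORDER BY l.id LIMIT 1"
    )
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO nodes VALUES (?)", [("name1",), ("name3",)])

hit = first_match(conn, ["name2", "name3", "name1"])  # name2 is not in the table
miss = first_match(conn, ["missing"])
print(hit, miss)
```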

SQLITE query, if last row matches criteria, check row preceding it matches different criteria

I'm finding it hard to get my head around this problem, and I couldn't find any answers to this specific problem anywhere:
Say I have a table like this, I'm just using fruit as an example:
Fruit | Date | Value
=================================
Apple | 1 | other_random_value
Apple | 2 | some_value_1
Apple | 3 | some_value_2
Pear | 1 | other_random_value
Pear | 2 | unexpected_value_1
Pear | 3 | some_value_2
Everything will be ordered by Fruit, then Date.
Basically, if the last row (for each fruit) is some_value_2, but the one preceding it is not some_value_1, I want to match just those fruits (i.e. in this case, Pear).
So, some_value_2 I always expect to come after a row with a certain value for that particular fruit, and if it doesn't I want to flag errors against those particular fruits. It would also be nice to match cases where nothing precedes some_value_2 as well, though if this is too complicated I could match it separately and just check that some_value_2 is not the first row, which I don't imagine would be a difficult query.
EDIT: Also, being able to match any consecutive rows where the preceding value is unexpected would be nice, though I mainly care about the last 2 rows. So if being able to match all consecutive rows results in a simpler and better performing query, then I might go with that. I'm going to be doing an INSERT at the same time (into an alert table), so if I could flag it as an ERROR if it's the last two rows and a WARNING if it's not, that would be really nifty. Though I wouldn't know where to start with writing a query that does that. Also having a query that performs well is a must, as I will be using this across a large dataset.
EDIT:
This is what I used in the end, it's quite slow, but if I index Date, it's not so bad:
SELECT c.Id AS CId, c.Fruit AS CFruit,
c.Date AS CDate, c.Value AS CValue,
(SELECT Id
FROM fruits
WHERE Fruit = c.Fruit
AND Date >= c.Date
AND Id > c.Id
ORDER BY Date, Id) AS NId, n.Fruit AS NFruit,
n.Date AS NDate, n.Value AS NValue
FROM fruits AS c
JOIN fruits AS n ON n.Id = NId
ORDER BY c.Date, c.Id
I might try Joachim's method again at some point, as I realised I'm getting a lot of results I don't really care much about. Or I might even try incorporating the two somehow and delegate to INFO/ERROR as appropriate...
Solved: I used the same SELECT statement that I used to get NId, and used SELECT COUNT(*) instead of SELECT Id. This told me the number of results after the current one. Then I just used a CASE operator to turn it into a boolean field called Latest :). So I effectively combined Nicolas' and Joachim's methods. Performance still seems OK, probably because SQLite caches the results.
SQLite is (as far as I know) a bit low on efficient operators for this, so this is the best I can come up with for now :)
SELECT Fruit FROM fruits
WHERE ( SELECT COUNT(*) FROM fruits f
WHERE f.fruit=fruits.fruit
AND f.date > fruits.date ) = 1
AND fruits.value <> 'some_value_1'
INTERSECT
SELECT Fruit FROM fruits
WHERE ( SELECT COUNT(*) FROM fruits f
WHERE f.fruit=fruits.fruit
AND f.date > fruits.date ) = 0
AND fruits.value = 'some_value_2'
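A quick way to try this query on the sample data from the question, sketched with Python's `sqlite3` module. The `Id` column and the index on (Fruit, Date) are my assumptions; the index is only there because the correlated COUNT(*) sub-queries look rows up by fruit and date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fruits (Id INTEGER PRIMARY KEY, Fruit TEXT, Date INTEGER, Value TEXT)"
)
conn.executemany(
    "INSERT INTO fruits (Fruit, Date, Value) VALUES (?, ?, ?)",
    [
        ("Apple", 1, "other_random_value"),
        ("Apple", 2, "some_value_1"),
        ("Apple", 3, "some_value_2"),
        ("Pear", 1, "other_random_value"),
        ("Pear", 2, "unexpected_value_1"),
        ("Pear", 3, "some_value_2"),
    ],
)
conn.execute("CREATE INDEX idx_fruits_fruit_date ON fruits (Fruit, Date)")

flagged = [
    row[0]
    for row in conn.execute("""
        -- second-to-last row (exactly one later row) is not some_value_1
        SELECT Fruit FROM fruits
        WHERE (SELECT COUNT(*) FROM fruits f
               WHERE f.Fruit = fruits.Fruit AND f.Date > fruits.Date) = 1
          AND fruits.Value <> 'some_value_1'
        INTERSECT
        -- last row (no later row) is some_value_2
        SELECT Fruit FROM fruits
        WHERE (SELECT COUNT(*) FROM fruits f
               WHERE f.Fruit = fruits.Fruit AND f.Date > fruits.Date) = 0
          AND fruits.Value = 'some_value_2'
    """)
]
print(flagged)
```

A fruit whose only row is some_value_2 is not flagged by this form, since the first half of the INTERSECT needs a second-to-last row to exist.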
I named the table fruits. This query gets you the preceding date for a 'key' (fruit + date):
select fruit, date, value currvalue,
(select max(date) precedingDate
from fruits p
where p.fruit = c.fruit
and p.date < c.date) precedingdate
from fruits c ;
From there we can get the preceding value for each key (f2 is the current row, f1 is the row at its preceding date):
select f2.fruit, f2.date, f2.value currvalue, f2.precedingdate, f1.value precedingvalue
from
fruits f1 join
(select fruit, date, value,
(select max(date)
from fruits p
where p.fruit = c.fruit
and p.date < c.date) precedingdate
from fruits c) f2
on f1.fruit = f2.fruit and f1.date = f2.precedingdate ;
For all the rows that have a previous row, you get both the current and preceding date and the current and preceding value.
Edit: we add an id, used to pick one row when several rows share the same previous date (see comment below).
I will be using intermediate views for the sake of clarity, but you could write one big query.
As before, what's the previous date:
create view VFruitsWithPreviousDate
as select fruit, date, value, id,
(select max(date)
from fruits p
where p.fruit = c.fruit
and p.date < c.date) previousdate
from fruits c ;
What's the previous id :
create view VFruitsWithPreviousId
as select fruit, date, value,
(select max(id)
from fruits f
where v.fruit = f.fruit AND
v.previousdate = f.date) previousID
from VFruitsWithPreviousDate v ;
A query for all consecutive rows :
select f.*, v.value
from fruits f
join VFruitsWithPreviousId v on f.id = v.previousid ;
You can then add the condition WHERE v.value = 'some_value_2' AND f.value != 'some_value_1'. In the query above, v is the current row and f is the row that precedes it (f.id = v.previousid).
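The two views can be tried on the sample data with a short script. This is a sketch using Python's `sqlite3` module with an assumed `id` column; keep in mind that in the final join the view row `v` is the current row and `f` is the row that precedes it, so the filter tests `v.value` for some_value_2 and `f.value` for the unexpected predecessor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruits (id INTEGER PRIMARY KEY, fruit TEXT, date INTEGER, value TEXT);
INSERT INTO fruits (fruit, date, value) VALUES
    ('Apple', 1, 'other_random_value'),
    ('Apple', 2, 'some_value_1'),
    ('Apple', 3, 'some_value_2'),
    ('Pear',  1, 'other_random_value'),
    ('Pear',  2, 'unexpected_value_1'),
    ('Pear',  3, 'some_value_2');

-- For every row, the latest earlier date of the same fruit.
CREATE VIEW VFruitsWithPreviousDate AS
SELECT fruit, date, value, id,
       (SELECT MAX(p.date) FROM fruits p
        WHERE p.fruit = c.fruit AND p.date < c.date) AS previousdate
FROM fruits c;

-- For every row, the id of the row at that previous date.
CREATE VIEW VFruitsWithPreviousId AS
SELECT fruit, date, value,
       (SELECT MAX(f.id) FROM fruits f
        WHERE v.fruit = f.fruit AND v.previousdate = f.date) AS previousid
FROM VFruitsWithPreviousDate v;
""")

# v = current row, f = preceding row.
suspicious = conn.execute("""
    SELECT v.fruit, v.date, f.value AS previous_value, v.value AS current_value
    FROM fruits f
    JOIN VFruitsWithPreviousId v ON f.id = v.previousid
    WHERE v.value = 'some_value_2' AND f.value != 'some_value_1'
""").fetchall()
print(suspicious)
```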

record types that weren't found for a specific value in oracle query

I have this query
Select distinct p_id, p_date,p_city
from p_master
where p_a_id in(1,2,5,8,2,1,10,02)
and my IN clause contains 200 values. How do I find out which of them weren't returned by the query? Each value in the IN clause may or may not have a matching record. I want to know all the p_a_id values for which no record was found.
Please help
This will do the trick but I'm sure there's an easier way to find this out :-)
with test1 as
(select '1,2,5,8,2,1,10,02' str from dual)
select * from (
select trim(x.column_value.extract('e/text()')) cols
from test1 t, table (xmlsequence(xmltype('<e><e>' || replace(t.str,',','</e><e>')|| '</e></e>').extract('e/e'))) x) cols
left outer join
(Select count(*), p_a_id from p_master where p_a_id in (1,2,5,8,2,1,10,02) group by p_a_id) p
on p.p_a_id = cols.cols
where p_a_id is null
;
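The idea behind the query is an anti-join: turn the IN list into rows, left join the table to it, and keep the list values that found no partner. A minimal sketch of the same idea, run against SQLite through Python's `sqlite3` module rather than Oracle; the `p_master` rows are made up and the VALUES-based CTE stands in for the XML string splitting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE p_master (p_id INTEGER, p_date TEXT, p_city TEXT, p_a_id INTEGER)"
)
# Invented sample rows: only p_a_id 1, 2 and 10 are present.
conn.executemany(
    "INSERT INTO p_master VALUES (?, ?, ?, ?)",
    [
        (100, "2020-01-01", "Oslo", 1),
        (101, "2020-01-02", "Rome", 2),
        (102, "2020-01-03", "Lima", 2),
        (103, "2020-01-04", "Kyiv", 10),
    ],
)

wanted = [1, 2, 5, 8, 2, 1, 10, 2]       # the IN list, duplicates included
unique = sorted(set(wanted))             # one row per distinct value
values = ", ".join("(?)" for _ in unique)

missing = [
    row[0]
    for row in conn.execute(
        f"WITH wanted(p_a_id) AS (VALUES {values}) "
        "SELECT w.p_a_id FROM wanted w "
        "LEFT JOIN p_master p ON p.p_a_id = w.p_a_id "
        "WHERE p.p_a_id IS NULL "          # no matching record was found
        "ORDER BY w.p_a_id",
        unique,
    )
]
print(missing)
```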

Resources