This seems like a rather straightforward problem, yet I have not been able to find the solution:
In a table test, I have some subset of columns which I am interested in, say a,b,c,d,e,f.
Some or most of these columns are NULL, but at least one is always filled.
Now, for a given row, returned for example by:
SELECT rowid,a,b,c,d,e,f FROM test LIMIT 1;
I would like to get the number of rows which have the same non-null values.
For example, if a, d and f are the columns that are not NULL in this row, the result should be the same as for:
SELECT COUNT(*)
FROM test WHERE a=a_ AND d=d_ AND f=f_;
where a_, d_ and f_ are the values returned by:
SELECT a AS a_, d AS d_, f AS f_ FROM test LIMIT 1;
How can this be done in one step / line? Or do I need a temporary table?
You can use the IS operator to safely compare values that may be NULL:
SELECT COUNT(*)
FROM test t1
INNER JOIN (SELECT a, b, c, d, e, f FROM test LIMIT 1) t2
ON (t1.a, t1.b, t1.c, t1.d, t1.e, t1.f) IS (t2.a, t2.b, t2.c, t2.d, t2.e, t2.f);
or with a CTE:
WITH cte AS (SELECT a, b, c, d, e, f FROM test LIMIT 1)
SELECT COUNT(*)
FROM test t1 INNER JOIN cte t2
ON (t1.a, t1.b, t1.c, t1.d, t1.e, t1.f) IS (t2.a, t2.b, t2.c, t2.d, t2.e, t2.f);
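A minimal, runnable sketch of the IS-based count, using Python's built-in sqlite3 module and a small hypothetical test table. It spells the row-value comparison out column by column (t1.a IS t2.a AND ...), which means the same thing, and adds ORDER BY rowid so that LIMIT 1 picks a predictable row. Note that IS also treats NULL as equal to NULL, so a row whose b is filled does not match a reference row whose b is NULL. If those other columns should be ignored instead, as in the a=a_ AND d=d_ AND f=f_ query from the question, a per-column (t2.x IS NULL OR t1.x = t2.x) test is one possible variant; it is an assumption, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a, b, c, d, e, f)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, None, None, 2, None, 3),  # reference row: a, d, f filled
        (1, None, None, 2, None, 3),  # identical, including the NULLs
        (1, 9, None, 2, None, 3),     # same a, d, f, but b is also filled
        (1, None, None, 5, None, 3),  # different d
    ],
)

COLS = ["a", "b", "c", "d", "e", "f"]

# NULL-safe equality on every column: NULL IS NULL is true.
is_match = " AND ".join(f"t1.{c} IS t2.{c}" for c in COLS)
# Hypothetical variant: ignore columns that are NULL in the reference row.
non_null_match = " AND ".join(
    f"(t2.{c} IS NULL OR t1.{c} = t2.{c})" for c in COLS)


def count_matches(condition):
    # The reference row comes from a CTE, as in the second query above.
    sql = f"""
        WITH cte AS (SELECT a, b, c, d, e, f FROM test ORDER BY rowid LIMIT 1)
        SELECT COUNT(*) FROM test t1 INNER JOIN cte t2 ON {condition}
    """
    return conn.execute(sql).fetchone()[0]


print(count_matches(is_match))        # 2
print(count_matches(non_null_match))  # 3
```

Both counts include the reference row itself.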
select m.value
from MY_TABLE m
where m.value in (select m2.some_third_value, m2.some_fourth_value
from MY_TABLE_2 m2
where m2.first_val member of v_my_array
or m2.second_val member of v_my_array_2)
Is it possible to write a select similar to this, where m.value is compared to two columns and has to match at least one of those? Something like where m.value in (select m2.first_val, m2.second_val). Or is writing two separate selects unavoidable here?
No. When the sub-query in an IN clause returns multiple columns, the left-hand side of the IN must list the same number of columns. This pairwise comparison checks each row of the outer query against the rows returned by the sub-query. The statement below
SELECT *
FROM table_main m
WHERE ( m.col_1, m.col_2 ) IN (SELECT s.col_a,
s.col_b
FROM table_sub s)
is equivalent to
SELECT *
FROM table_main m
WHERE EXISTS (SELECT 1
FROM table_sub s
WHERE m.col_1 = s.col_a
AND m.col_2 = s.col_b)
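The equivalence can be checked with a quick sqlite3 sketch on made-up data. SQLite accepts the pairwise (m.col_1, m.col_2) IN (...) form because it supports row values (SQLite 3.15 or later is assumed); both queries should keep only the rows where both columns match the same table_sub row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_main (col_1, col_2);
    CREATE TABLE table_sub (col_a, col_b);
    INSERT INTO table_main VALUES (1, 'x'), (1, 'y'), (2, 'x');
    INSERT INTO table_sub VALUES (1, 'x'), (2, 'y');
""")

# Pairwise IN: both columns must match the same sub-query row.
in_rows = conn.execute("""
    SELECT * FROM table_main m
    WHERE (m.col_1, m.col_2) IN (SELECT s.col_a, s.col_b FROM table_sub s)
    ORDER BY m.col_1, m.col_2
""").fetchall()

# The equivalent correlated EXISTS.
exists_rows = conn.execute("""
    SELECT * FROM table_main m
    WHERE EXISTS (SELECT 1 FROM table_sub s
                  WHERE m.col_1 = s.col_a AND m.col_2 = s.col_b)
    ORDER BY m.col_1, m.col_2
""").fetchall()

print(in_rows)      # [(1, 'x')]
print(exists_rows)  # [(1, 'x')]
```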
One way to search both columns in a single SELECT statement is to OUTER JOIN the second table to the first table:
SELECT m.*
FROM table_main m
LEFT JOIN table_sub s ON (m.col_1 = s.col_a OR m.col_1 = s.col_b)
WHERE m.col_1 = s.col_a
OR m.col_1 = s.col_b
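A small sqlite3 sketch of the join above, with made-up data. A main row is returned once per matching table_sub row, so it can appear more than once, and because the WHERE clause drops unmatched rows, the LEFT JOIN behaves like an inner join here. For comparison, a correlated EXISTS with the same OR condition is shown; it is an alternative not taken from the original answer, and it returns each matching row once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_main (col_1);
    CREATE TABLE table_sub (col_a, col_b);
    INSERT INTO table_main VALUES (1), (2), (3);
    INSERT INTO table_sub VALUES (1, 10), (20, 2), (2, 30);
""")

# The join from the answer: one output row per matching sub row.
join_rows = conn.execute("""
    SELECT m.*
    FROM table_main m
    LEFT JOIN table_sub s ON (m.col_1 = s.col_a OR m.col_1 = s.col_b)
    WHERE m.col_1 = s.col_a OR m.col_1 = s.col_b
    ORDER BY m.col_1
""").fetchall()

# A correlated EXISTS returns each main row at most once.
exists_rows = conn.execute("""
    SELECT m.*
    FROM table_main m
    WHERE EXISTS (SELECT 1 FROM table_sub s
                  WHERE m.col_1 = s.col_a OR m.col_1 = s.col_b)
    ORDER BY m.col_1
""").fetchall()

print(join_rows)    # [(1,), (2,), (2,)]
print(exists_rows)  # [(1,), (2,)]
```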
I have a table with a column like this:
table1:
c1  c2  c3
.   a   .
.   a   .
.   a   .
.   a   .
.   b   .
.   b   .
.   c   .
How can I get a result like the following?
-- a b c
count(a) count(b) count(c)
Of course, there is an auxiliary table like the one below:
--field table
d1 d2
a
b
c
Transferring comments into an answer.
If there were an entry in table1.c2 with d as the value, is it correct to guess that you'd want a fourth column of output named d, with the number of d values as its value? And there'd be an extra row in the auxiliary table too. That's pretty tricky.
You'd probably be better off with a result table with N rows, one for each value in the table1.c2 column, with the first column identifying the value and the second the count:
SELECT c2, COUNT(c2) FROM table1 GROUP BY c2 ORDER BY c2
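The one-row-per-value query can be tried directly with sqlite3. The sample c2 values below are an assumption based on the table in the question (four a, two b, one c); c1 and c3 are left NULL because they do not affect the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (c1, c2, c3)")
# Hypothetical sample data: only c2 matters for the count.
conn.executemany("INSERT INTO table1 (c2) VALUES (?)",
                 [("a",), ("a",), ("a",), ("a",), ("b",), ("b",), ("c",)])

rows = conn.execute(
    "SELECT c2, COUNT(c2) FROM table1 GROUP BY c2 ORDER BY c2").fetchall()
print(rows)  # [('a', 4), ('b', 2), ('c', 1)]
```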
To generate a single row with the names and counts as shown requires a dynamically built SQL statement — you write an SQL statement that generates the SQL (or the key components of the SQL) for a second statement that you actually execute to get the result. The main reason for it being dynamic like that is that the number of columns in the result set is not known until you run a query that determines which values exist in table1.c2. That's non-trivial — doable, but non-trivial.
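A sketch of the two-step dynamic approach, with one swap: here Python (not SQL) generates the second statement, using sqlite3 and the same hypothetical sample data. The first query finds the distinct c2 values; the generated statement then has one scalar subquery per value, the same shape as the a..e query in this answer. The helper names quote_ident and pivot_counts are made up for illustration. Values are passed as bound parameters and column aliases are double-quoted, so unusual c2 values do not break the SQL.

```python
import sqlite3


def quote_ident(name):
    # Double-quote an SQL identifier, escaping any embedded double quotes.
    return '"' + str(name).replace('"', '""') + '"'


def pivot_counts(conn):
    # Step 1: find out which values actually exist in table1.c2.
    values = [row[0] for row in conn.execute(
        "SELECT DISTINCT c2 FROM table1 WHERE c2 IS NOT NULL ORDER BY c2")]
    if not values:
        return {}
    # Step 2: generate one scalar subquery (one output column) per value.
    select_list = ", ".join(
        f"(SELECT COUNT(*) FROM table1 WHERE c2 = ?) AS {quote_ident(v)}"
        for v in values)
    # SQLite accepts SELECT without FROM, so no dual table is needed here.
    cur = conn.execute(f"SELECT {select_list}", values)
    # Step 3: pair the generated column names with the single result row.
    names = [d[0] for d in cur.description]
    return dict(zip(names, cur.fetchone()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (c1, c2, c3)")
conn.executemany("INSERT INTO table1 (c2) VALUES (?)",
                 [("a",), ("a",), ("a",), ("a",), ("b",), ("b",), ("c",)])
print(pivot_counts(conn))  # {'a': 4, 'b': 2, 'c': 1}
```

On Informix the generated statement would still need a FROM clause such as the one-row dual table mentioned below.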
I forget whether 11.50 has a built-in sysmaster:sysdual table. I ordinarily use a regular one-column, one-row table called dual. You can get the result you want, if your Table1.C2 has values a through e in it, with:
SELECT (SELECT COUNT(*) FROM Table1 WHERE c2 = 'a') AS a,
(SELECT COUNT(*) FROM Table1 WHERE c2 = 'b') AS b,
(SELECT COUNT(*) FROM Table1 WHERE c2 = 'c') AS c,
(SELECT COUNT(*) FROM Table1 WHERE c2 = 'd') AS d,
(SELECT COUNT(*) FROM Table1 WHERE c2 = 'e') AS e
FROM dual;
This gets the information you need. I don't think it is elegant, but "works" beats "doesn't work".
I use the following code in Perl to find duplicates in my SQLite database on the basis of 5 columns (Term1 to Term5).
my $dublette = $dbh->selectall_arrayref("
SELECT t.ID, t.Tag1, t.Tag2, t.Term1, t.Term2, t.Term3, t.Term4, t.Term5
FROM Data t inner join (
SELECT ID, Tag1, Tag2, Term1, Term2, Term3, Term4, Term5, COUNT(*) c FROM Data GROUP BY Term1, Term2, Term3, Term4, Term5 HAVING c > 1)
x on t.Term1=x.Term1 AND t.Term2=x.Term2 AND t.Term3=x.Term3 AND t.Term4=x.Term4 AND t.Term5=x.Term5
");
This seems to work, except when my unused values are NULL (it works if they are empty strings). Any idea how I can change it to get all duplicates?
I need to perform a recursive count operation on tables, but here are the challenges I am facing.
Let's say I have tables A, B, C, D, E, F, .... Z
Here is a code snippet of what I have:
Proc sql;
create table temp as(
select count(*)
from a
inner join b on a.id = b.id
inner join c on a.id = c.id
inner join d on a.id = d.id
where <condition>
);
quit;
Once this code completes, I need to run the same query with B, C, D and E and add the result to the same temp table. I have to do this for the entire list of tables.
Is there a recursive SQL query to do this? I don't want a separate macro that calls the query each time with different tables.
I would not do it quite this way.
proc sql;
create table temp as
select sum(case when n(a.id,b.id,c.id,d.id)=4 then 1 else 0 end) as abcd_count,
       sum(case when n(b.id,c.id,d.id,e.id)=4 then 1 else 0 end) as bcde_count
from a full join b on a.id=b.id
       full join c ... etc.
;
quit;
I.e., do just one join and use CASE WHEN to decide which rows contribute to each count. Here I use n() to identify records that have all 4 ids.
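The single-join idea can be sketched in SQLite with hypothetical tables a..e, each holding unique numeric ids. Two substitutions are assumed: SQLite has no n() function, so presence is tested with IS NOT NULL, and the full outer join is emulated by LEFT JOINing every table to the union of all ids. SUM is used rather than COUNT, because COUNT counts every non-NULL value, including the 0 from the ELSE branch; the third column shows that effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables a..e, each with one id column and unique ids.
ids = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [1, 2], "d": [1, 2, 4], "e": [2, 4]}
for table, values in ids.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [(v,) for v in values])

row = conn.execute("""
    WITH all_ids AS (
        SELECT id FROM a UNION SELECT id FROM b UNION SELECT id FROM c
        UNION SELECT id FROM d UNION SELECT id FROM e
    )
    SELECT
        -- SUM adds up the 1s from rows where all four ids are present.
        SUM(CASE WHEN a.id IS NOT NULL AND b.id IS NOT NULL
                  AND c.id IS NOT NULL AND d.id IS NOT NULL
                 THEN 1 ELSE 0 END) AS abcd_count,
        SUM(CASE WHEN b.id IS NOT NULL AND c.id IS NOT NULL
                  AND d.id IS NOT NULL AND e.id IS NOT NULL
                 THEN 1 ELSE 0 END) AS bcde_count,
        -- COUNT also counts the 0s, so this is just the number of joined rows.
        COUNT(CASE WHEN a.id IS NOT NULL AND b.id IS NOT NULL
                    AND c.id IS NOT NULL AND d.id IS NOT NULL
                   THEN 1 ELSE 0 END) AS count_with_else_0
    FROM all_ids
    LEFT JOIN a ON a.id = all_ids.id
    LEFT JOIN b ON b.id = all_ids.id
    LEFT JOIN c ON c.id = all_ids.id
    LEFT JOIN d ON d.id = all_ids.id
    LEFT JOIN e ON e.id = all_ids.id
""").fetchone()
print(row)  # (2, 1, 4)
```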
C:\Users\pengsir>sqlite3 e:\\test.db
sqlite> create table test (f1 TEXT,f2 TEXT, f3 TEXT);
sqlite> insert into test values("x1","y1","w1");
sqlite> insert into test values("x1","y1","w2");
sqlite> insert into test values("x1","y3","w2");
sqlite> insert into test values("x2","y3","w2");
sqlite> insert into test values("x3","y4","w4");
sqlite> insert into test values("x2","y3","w4");
sqlite> insert into test values("x1","y3","w2");
sqlite>
1. Select the rows (with their rowid) that have the same f1 and f2.
sqlite> select rowid,f1,f2 from test group by f1,f2 having(count(f2)>1 and count(f2)>1);
2|x1|y1
7|x1|y3
6|x2|y3
I want the result to be:
1|x1|y1
2|x1|y1
3|x1|y3
4|x2|y3
6|x2|y3
7|x1|y3
2. Select the rows (with their rowid) that have the same f1, f2 and f3.
sqlite> select rowid,f1,f2,f3 from test group by f1,f2,f3 having(count(f2)>1 and count(f3)>1);
7|x1|y3|w2
I want the result to be:
3|x1|y3|w2
7|x1|y3|w2
Let us take this problem further: I want to delete one x1|y3|w2 row and keep the other one in the table. Here is my method:
DELETE FROM test
WHERE rowid in(
SELECT rowid FROM test
WHERE (SELECT count(*)
FROM test AS t2
WHERE t2.f1 = test.f1
AND t2.f2 = test.f2
AND t2.f3 = test.f3
) >= 2 limit 1);
Is there a simpler and smarter way to do that? (This method is wrong.)
I found the proper way to do it:
delete from test
where rowid not in
(
select max(rowid)
from test
group by
f1,f2,f3
);
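A quick sqlite3 check of this DELETE on the sample rows from the session above. It keeps the highest rowid in every (f1, f2, f3) group, so only rowid 3, the older x1|y3|w2 copy, is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (f1 TEXT, f2 TEXT, f3 TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    ("x1", "y1", "w1"), ("x1", "y1", "w2"), ("x1", "y3", "w2"),
    ("x2", "y3", "w2"), ("x3", "y4", "w4"), ("x2", "y3", "w4"),
    ("x1", "y3", "w2"),
])

# Keep only the highest rowid in every (f1, f2, f3) group.
conn.execute("""
    DELETE FROM test
    WHERE rowid NOT IN (SELECT max(rowid) FROM test GROUP BY f1, f2, f3)
""")

remaining = [r[0] for r in conn.execute("SELECT rowid FROM test ORDER BY rowid")]
print(remaining)  # [1, 2, 4, 5, 6, 7]
```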
And the method when there is more than one duplicate for an f1/f2 combination is:
delete from test
where rowid not in
(select rowid from test group by f1,f2);
It only needs to be executed once.
You want records that have duplicates (in those fields), i.e., where the number of records with the same values in those fields is at least two:
SELECT rowid, f1, f2
FROM test
WHERE (SELECT count(*)
FROM test AS t2
WHERE t2.f1 = test.f1
AND t2.f2 = test.f2
) >= 2
This requires executing the subquery for each record.
Alternatively, compute the records with duplicates in a subquery once; this might be more efficient:
SELECT test.rowid, test.f1, test.f2
FROM test
JOIN (SELECT f1, f2
FROM test
GROUP BY f1, f2
HAVING count(*) >= 2
) USING (f1, f2)
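Both queries can be run against the sample rows from the question with sqlite3. They should return the same rows, matching the result the question asks for (rowids 1, 2, 3, 4, 6 and 7). ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (f1 TEXT, f2 TEXT, f3 TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    ("x1", "y1", "w1"), ("x1", "y1", "w2"), ("x1", "y3", "w2"),
    ("x2", "y3", "w2"), ("x3", "y4", "w4"), ("x2", "y3", "w4"),
    ("x1", "y3", "w2"),
])

# Correlated subquery: evaluated once per row of test.
correlated = conn.execute("""
    SELECT rowid, f1, f2 FROM test
    WHERE (SELECT count(*) FROM test AS t2
           WHERE t2.f1 = test.f1 AND t2.f2 = test.f2) >= 2
    ORDER BY rowid
""").fetchall()

# Join against the duplicate groups, which are computed once.
joined = conn.execute("""
    SELECT test.rowid, test.f1, test.f2 FROM test
    JOIN (SELECT f1, f2 FROM test GROUP BY f1, f2 HAVING count(*) >= 2)
    USING (f1, f2)
    ORDER BY test.rowid
""").fetchall()

print([r[0] for r in correlated])  # [1, 2, 3, 4, 6, 7]
print([r[0] for r in joined])      # [1, 2, 3, 4, 6, 7]
```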
If you want to remove one of the duplicates, this is easier to do because GROUP BY already returns exactly one output row for each group:
DELETE FROM test
WHERE rowid IN (SELECT max(rowid)
FROM test
GROUP BY f1, f2
HAVING COUNT(*) >= 2)
(If there is more than one duplicate for a f1/f2 combination, you have to execute this multiple times.)
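Repeating the DELETE until it removes nothing can be automated. A minimal sketch with sqlite3, assuming the sample rows plus one hypothetical extra x1|y1 row so that one group has three members; cursor.rowcount reports how many rows each pass deleted, and the function name delete_until_unique is made up for illustration.

```python
import sqlite3


def delete_until_unique(conn):
    # Run the DELETE repeatedly; return how many passes removed rows.
    passes = 0
    while True:
        cur = conn.execute("""
            DELETE FROM test
            WHERE rowid IN (SELECT max(rowid) FROM test
                            GROUP BY f1, f2 HAVING COUNT(*) >= 2)
        """)
        if cur.rowcount == 0:
            return passes
        passes += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (f1 TEXT, f2 TEXT, f3 TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    ("x1", "y1", "w1"), ("x1", "y1", "w2"), ("x1", "y3", "w2"),
    ("x2", "y3", "w2"), ("x3", "y4", "w4"), ("x2", "y3", "w4"),
    ("x1", "y3", "w2"),
    ("x1", "y1", "w3"),  # hypothetical third x1/y1 row
])

passes = delete_until_unique(conn)
remaining = [r[0] for r in conn.execute("SELECT rowid FROM test ORDER BY rowid")]
print(passes)     # 2
print(remaining)  # [1, 3, 4, 5]
```

The first pass removes rowids 8, 7 and 6; the second removes rowid 2, because the x1/y1 group started with three rows.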
Try using SELECT ALL instead of SELECT. According to the linked docs, though, SELECT should behave like SELECT ALL by default, not like SELECT DISTINCT, so I don't know where the problem may be.