Count Persons in tree of Groups with a single query - sqlite

I'm writing a query in SQLite, for Android, with a schema like this (extremely simplified here, just the fields I need):
GROUP
group_id primary_key,
parent_group_id nullable
PERSON
person_id primary_key,
parent_group
I need to count the number of persons in a group and in its descendant groups, given the group_id of the group I want to count for. I think I need a CTE query and I've been reading all morning about them, but I'm not grasping how they work.

You're on the right track with needing a CTE. Something like:
WITH tree AS
(SELECT g.group_id AS root,
g.group_id AS parent,
p.person_id AS person
FROM "group" AS g
LEFT JOIN person AS p ON g.group_id = p.parent_group
WHERE g.group_id = :desired_group
UNION ALL
SELECT t.root, g.group_id, p.person_id
FROM tree AS t
JOIN "group" AS g ON t.parent = g.parent_group_id
LEFT JOIN person AS p on g.group_id = p.parent_group)
SELECT count(DISTINCT person)
FROM tree;
Start by selecting the desired group and its members, then recursively select all members of the groups whose parent is a group already found. Finally, count the distinct persons that were found.
db<>fiddle example.
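To try the query above locally, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the helper name count_persons are made up for illustration, the placeholder is written as a named parameter, and the RECURSIVE keyword is spelled out for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "group" (
        group_id        INTEGER PRIMARY KEY,
        parent_group_id INTEGER
    );
    CREATE TABLE person (
        person_id    INTEGER PRIMARY KEY,
        parent_group INTEGER
    );
    -- Hypothetical tree: 1 -> (2, 3), 2 -> 4, 3 -> 6; group 5 is unrelated.
    INSERT INTO "group" VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, NULL), (6, 3);
    INSERT INTO person VALUES (10, 1), (11, 2), (12, 2), (13, 4), (14, 3), (15, 5);
""")

COUNT_SQL = """
WITH RECURSIVE tree AS
 (SELECT g.group_id AS root,
         g.group_id AS parent,
         p.person_id AS person
  FROM "group" AS g
  LEFT JOIN person AS p ON g.group_id = p.parent_group
  WHERE g.group_id = :desired_group
  UNION ALL
  SELECT t.root, g.group_id, p.person_id
  FROM tree AS t
  JOIN "group" AS g ON t.parent = g.parent_group_id
  LEFT JOIN person AS p ON g.group_id = p.parent_group)
SELECT count(DISTINCT person)
FROM tree
"""

def count_persons(group_id):
    """Count persons in the given group and all of its descendant groups."""
    return conn.execute(COUNT_SQL, {"desired_group": group_id}).fetchone()[0]

print(count_persons(1), count_persons(2), count_persons(6))
```

Because the LEFT JOIN yields a NULL person for empty groups, count(DISTINCT person) reports 0 for a group without members instead of dropping it.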

I powered through all the articles I could find and, after a lot of trial and error, I got here (please note that in my real database person is model):
WITH RECURSIVE is_in_group(group_id, group_name, parent_group_id) AS(
SELECT gr.group_id, gr.group_name, gr.parent_group_id FROM _group as gr WHERE gr.group_id = :groupId
UNION ALL
SELECT g.group_id, g.group_name, g.parent_group_id FROM _group as g
JOIN is_in_group as c ON g.parent_group_id = c.group_id
)
SELECT q.group_id, q.group_name, count(m.model_id) as model_count FROM is_in_group as q
LEFT JOIN _model m ON m.parent_group_id = q.group_id
GROUP BY q.group_id
This will give me a list of groups (including the root one), with a group_id, group_name and a model_count of models in each group. With this I can simply sum to get the total or look at the row with the searched group_id to know how many models are just in this group.
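A small sketch of that last step, assuming made-up rows for _group and _model and a hypothetical helper named counts_for. It runs the query above (with an added ORDER BY so the output is stable) and sums the per-group counts in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _group (group_id INTEGER PRIMARY KEY, group_name TEXT, parent_group_id INTEGER);
    CREATE TABLE _model (model_id INTEGER PRIMARY KEY, parent_group_id INTEGER);
    -- Hypothetical data: root -> a -> b, plus an unrelated group.
    INSERT INTO _group VALUES (1, 'root', NULL), (2, 'a', 1), (3, 'b', 2), (4, 'other', NULL);
    INSERT INTO _model VALUES (100, 1), (101, 2), (102, 2), (103, 4);
""")

PER_GROUP_SQL = """
WITH RECURSIVE is_in_group(group_id, group_name, parent_group_id) AS (
    SELECT gr.group_id, gr.group_name, gr.parent_group_id
    FROM _group AS gr WHERE gr.group_id = :groupId
    UNION ALL
    SELECT g.group_id, g.group_name, g.parent_group_id
    FROM _group AS g
    JOIN is_in_group AS c ON g.parent_group_id = c.group_id
)
SELECT q.group_id, q.group_name, count(m.model_id) AS model_count
FROM is_in_group AS q
LEFT JOIN _model m ON m.parent_group_id = q.group_id
GROUP BY q.group_id
ORDER BY q.group_id
"""

def counts_for(group_id):
    """Return (rows, total): one row per group in the subtree, plus the overall sum."""
    rows = conn.execute(PER_GROUP_SQL, {"groupId": group_id}).fetchall()
    total = sum(model_count for _, _, model_count in rows)
    return rows, total

rows, total = counts_for(1)
print(rows, total)
```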

Related

Would these two SQLite queries generate the same result?

I'm working through this exercise.
On question 4, the goal is to find employees hired after "Jones". I think this problem can be solved without a join like so:
SELECT first_name, last_name, hire_date
FROM employees
WHERE hire_date > (
SELECT hire_date FROM employees WHERE last_name = "Jones"
)
But the answer on the website suggests:
SELECT e.first_name, e.last_name, e.hire_date
FROM employees e
JOIN employees davies
ON (davies.last_name = "Jones")
WHERE davies.hire_date < e.hire_date;
Are these more-or-less the same or is there a reason the second answer should be considered better?
I assume that the column last_name is defined as UNIQUE, so that the subquery in the 1st query returns only 1 row.
If not, then the queries do not return the same results. The subquery in the 1st query may return more than 1 row (in other databases the query would not even run), but SQLite will pick just the 1st of the returned rows and compare its hire_date with all the rows of the table, while the join will use all the rows where last_name = "Jones".
If my assumption is correct then the 2 queries are equivalent, but the 1st one is what I would suggest because it is more readable and I believe it would perform better than the join.
If I had to use a join for this requirement (since it is homework) I would choose a more readable form:
SELECT e.first_name, e.last_name, e.hire_date
FROM employees e
JOIN (SELECT * FROM employees WHERE last_name = 'Jones') t
ON t.hire_date < e.hire_date;
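The difference between the two forms only shows up when last_name is not unique. The following sketch uses an invented employees table with two rows named Jones to make that visible; the names and dates are illustrative, not taken from the exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, last_name TEXT, hire_date TEXT);
    -- Hypothetical data with a duplicated last name.
    INSERT INTO employees VALUES
        ('Ann',  'Jones', '2010-01-01'),
        ('Bob',  'Jones', '2015-01-01'),
        ('Cara', 'Smith', '2012-06-01'),
        ('Dev',  'Brown', '2018-03-01');
""")

# Scalar subquery: SQLite uses a single hire_date, so no employee is repeated.
subquery_rows = conn.execute("""
    SELECT first_name, last_name, hire_date
    FROM employees
    WHERE hire_date > (SELECT hire_date FROM employees WHERE last_name = 'Jones')
""").fetchall()

# Join: every employee is paired with every Jones hired earlier,
# so someone hired after both Joneses appears twice.
join_rows = conn.execute("""
    SELECT e.first_name, e.last_name, e.hire_date
    FROM employees e
    JOIN (SELECT * FROM employees WHERE last_name = 'Jones') t
      ON t.hire_date < e.hire_date
""").fetchall()

print(len(subquery_rows), len(join_rows))
```

With a unique last_name both queries return the same rows; with duplicates, the join repeats employees once per matching Jones.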

I've got 2 questions regarding sqlite3 queries using 2 tables

A) List each lecturer together with each module they teach and the number of students studying that module, in order of the lecturer name.
B) Output the number of modules in which everyone passed the module (assuming pass mark is 40).
For A, you need to join the 2 tables and then group by lecturer, module and count the number of rows for each group (each row corresponds to a student):
select t.lecturer, t.module, count(*) numberofstudents
from teaches t inner join studies s
on s.module = t.module
group by t.lecturer, t.module
order by t.lecturer
For B, use NOT EXISTS to find the modules where all grades are >= 40 and count them:
select count(distinct module) numberofmodules
from studies s
where not exists (
select 1 from studies
where module = s.module and grade < 40
)
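Both queries can be checked against a few rows. This is a sketch only: the question does not show the table definitions, so the columns teaches(lecturer, module) and studies(student, module, grade) and all sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema; only lecturer, module and grade appear in the queries above.
    CREATE TABLE teaches (lecturer TEXT, module TEXT);
    CREATE TABLE studies (student TEXT, module TEXT, grade INTEGER);
    INSERT INTO teaches VALUES ('Adams', 'DB'), ('Baker', 'AI'), ('Adams', 'OS');
    INSERT INTO studies VALUES
        ('s1', 'DB', 55), ('s2', 'DB', 38),
        ('s1', 'AI', 70), ('s3', 'AI', 41),
        ('s2', 'OS', 40);
""")

# A) one row per (lecturer, module) with the number of enrolled students
students_per_module = conn.execute("""
    select t.lecturer, t.module, count(*) numberofstudents
    from teaches t inner join studies s
    on s.module = t.module
    group by t.lecturer, t.module
    order by t.lecturer
""").fetchall()

# B) number of modules in which no grade is below the pass mark of 40
modules_all_passed = conn.execute("""
    select count(distinct module) numberofmodules
    from studies s
    where not exists (
        select 1 from studies
        where module = s.module and grade < 40
    )
""").fetchone()[0]

print(students_per_module, modules_all_passed)
```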

SQLite - Select by column value count

How do I select by column value count? In SQL it would be something like this:
select * from band
inner join bandsinger on band.id = bandsinger.bandid
inner join singer on singer.id = bandsinger.singerid
group by band.id
having count(singerid=6)>0 and count(singerid=4)>0
That would work if the SQLite function count() could accept a function as a parameter, but it doesn't.
The point is to select two bands, where two singers with known IDs sing.
I found the solution. In this case the query should be:
select * from band
inner join bandsinger on band.id = bandsinger.bandid
inner join singer on singer.id = bandsinger.singerid
where singer.id = 6 or singer.id = 4
group by band.id
having count(*) = x
where x is the number of given IDs to count.
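A runnable sketch of the same idea, with invented sample rows and a hypothetical helper named bands_with_all. It differs from the query above in two small ways: it filters with IN instead of OR, and it uses count(DISTINCT singerid) rather than count(*), which is a swapped-in guard against duplicate rows in bandsinger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE band (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bandsinger (bandid INTEGER, singerid INTEGER);
    -- Hypothetical data: bands 1 and 3 contain both singers 4 and 6.
    INSERT INTO band VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO singer VALUES (4, 'four'), (6, 'six'), (7, 'seven');
    INSERT INTO bandsinger VALUES (1, 4), (1, 6), (2, 4), (2, 7), (3, 6), (3, 4), (3, 7);
""")

def bands_with_all(singer_ids):
    """Return (id, name) of every band that contains all of the given singers."""
    ids = sorted(set(singer_ids))
    placeholders = ",".join("?" for _ in ids)
    sql = f"""
        SELECT band.id, band.name
        FROM band
        INNER JOIN bandsinger ON band.id = bandsinger.bandid
        WHERE bandsinger.singerid IN ({placeholders})
        GROUP BY band.id
        HAVING count(DISTINCT bandsinger.singerid) = ?
        ORDER BY band.id
    """
    # The last parameter is x from the answer: the number of IDs to match.
    return conn.execute(sql, ids + [len(ids)]).fetchall()

print(bands_with_all([6, 4]))
```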

Need to apply Primary Indexes and secondary indexes in teradata tables

Can some one please help in solving my problem
I have three tables to be joined using indexes in Teradata to improve performance. The query is specified below:
Select b.Id, b.First_name, b.Last_name, c.Id,
c.First_name, c.Last_name, c.Result
from
(
select a.Id, a.First_name, a.Last_name, a.Approver1, a.Approver2
From table1 a
Inner join table2 d
On a.Id = d.Id
and a.Approver1 = d.Approver1
And a.Approver2 = d.Approver2
) b
Left join
(
select * from table3
where result is not null
and application like 'application1'
) c
On c.Id = b.Id
Group by b.Id, b.First_name, b.Last_name, c.Id,
c.First_name, c.Last_name, c.Result
The above query takes a long time because the PI is not defined correctly.
The first two tables (table1 and table2) have the same set of columns, so the PI can be defined on Id, Approver1, Approver2.
However, I am confused about the join with table3 and need to understand how to define the PI there. Can a PI only work when the tables have the same set of columns?
The structure of table3 is:
Id, First_name, Last_name, Result
And table1 and table2:
Id, First_name, Last_name, Approver1, Approver2, Results
Can you please help me define primary indexes so that the query can be optimised?
Teradata will usually not use Secondary Indexes for joins. The best PI would be id for all three tables, of course you need to check if there are not too many rows per value and it's not too skewed.
The GROUP BY can be simplified to a DISTINCT. Why do you need it? Can you show the primary keys of those tables?
Edit based on comment:
PI-based joins are by far the fastest way. But you should be able to get rid of the DISTINCT, too; it's always a huge overhead.
Try replacing the 1st join with an EXISTS:
Select b.Id, b.First_name, b.Last_name, c.Id,
c.First_name, c.Last_name, c.Result
from
(
select a.Id, a.First_name, a.Last_name, a.Approver1, a.Approver2
From table1 a
WHERE EXISTS
(
SELECT *
FROM table2 d
WHERE a.Id = d.Id
and a.Approver1 = d.Approver1
And a.Approver2 = d.Approver2
)
) b
Left join
(
select * from table3
where result is not null
and application like 'application1'
) c
On c.Id = b.Id

Advanced SQLite Update table query

I am trying to update Table B of a database looking like this:
Table A:
id, amount, date, b_id
1,200,6/31/2012,1
2,300,6/31/2012,1
3,400,6/29/2012,2
4,200,6/31/2012,1
5,200,6/31/2012,2
6,200,6/31/2012,1
7,200,6/31/2012,2
8,200,6/31/2012,2
Table B:
id, b_amount, b_date
1,0,0
2,0,0
3,0,0
Now with this query I get all the data I need in one select:
SELECT A.*,B.* FROM A LEFT JOIN B ON B.id=A.b_id WHERE A.b_id>0 GROUP BY B.id
id, amount, date, b_id, id, b_amount, b_date
1,200,6/31/2012,1,1,0,0
3,400,6/29/2012,2,2,0,0
Now, I just want to copy the selected column amount to b_amount and date to b_date
b_amount=amount, b_date=date
resulting in
id, amount, date, b_id, id, b_amount, b_date
1,200,6/31/2012,1,1,200,6/31/2012
3,400,6/29/2012,2,2,400,6/29/2012
I've tried COALESCE() without success.
Does someone experienced have a solution for this?
Solution:
Thanks to the answers below, I managed to come up with this. It is probably not the most efficient way but it is fine for a one time only update. This will insert for you the first corresponding entry of each group.
REPLACE INTO B SELECT Bid, amount, date FROM
(SELECT A.id, A.amount, A.date, B.id as Bid FROM A INNER JOIN B ON (B.id=A.b_id)
ORDER BY A.id DESC)
GROUP BY Bid;
So what you are looking for seems to be a JOIN inside an UPDATE query. In MySQL you would use
UPDATE B INNER JOIN A ON B.id=A.b_id SET B.b_amount=A.amount, B.b_date=A.date;
but this syntax is not supported by SQLite, as this probably related question points out. However, there is a workaround using REPLACE:
REPLACE INTO B
SELECT B.id, A.amount, A.date FROM A
LEFT JOIN B ON B.id=A.b_id
WHERE A.b_id>0 GROUP BY B.id;
The query simply fills in the existing values of table B for all columns that should keep their state, and the values of table A for the copied columns. Make sure the order of the columns in the SELECT statement matches the column order of table B and that all columns are listed, or you will lose the data in the omitted fields. This is fragile with respect to future changes to table B, so remember to update the column order and column list of this query whenever table B changes.
Something a bit off topic, because you did not ask about it: A.b_id is obviously a foreign key to B.id. It seems you are using the value 0 for the foreign key to express that there is no corresponding entry in B (inferred from your SELECT with WHERE A.b_id>0). You should consider using NULL for that. If you then use INNER JOIN instead of LEFT JOIN, you can drop the WHERE clause entirely; the DBMS will filter out all unmatched rows.
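As an alternative that is not taken from the answers here, an UPDATE with correlated subqueries avoids rewriting whole rows of B. The sketch below assumes the tables from the question, stores dates as plain text exactly as given, and copies the values of the A row with the lowest id for each B.id. SQLite 3.33.0 and later also accept an UPDATE ... FROM form, which is not used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, amount INTEGER, date TEXT, b_id INTEGER);
    CREATE TABLE B (id INTEGER PRIMARY KEY, b_amount INTEGER, b_date TEXT);
    INSERT INTO A VALUES
        (1, 200, '6/31/2012', 1), (2, 300, '6/31/2012', 1),
        (3, 400, '6/29/2012', 2), (4, 200, '6/31/2012', 1),
        (5, 200, '6/31/2012', 2), (6, 200, '6/31/2012', 1),
        (7, 200, '6/31/2012', 2), (8, 200, '6/31/2012', 2);
    INSERT INTO B VALUES (1, 0, '0'), (2, 0, '0'), (3, 0, '0');
""")

# For every B row that has at least one matching A row, copy amount and date
# from the matching A row with the smallest id. B rows without a match are untouched.
conn.execute("""
    UPDATE B
    SET b_amount = (SELECT amount FROM A WHERE A.b_id = B.id ORDER BY A.id LIMIT 1),
        b_date   = (SELECT date   FROM A WHERE A.b_id = B.id ORDER BY A.id LIMIT 1)
    WHERE EXISTS (SELECT 1 FROM A WHERE A.b_id = B.id)
""")

b_rows = conn.execute("SELECT id, b_amount, b_date FROM B ORDER BY id").fetchall()
print(b_rows)
```

Unlike REPLACE, this form names the target columns explicitly, so it keeps working if columns are added to B later.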
WARNING: Some RDBMSs will return 2 rows as you show above. Others will return the Cartesian product of the rows, i.e. A rows times B rows.
One tricky method is to generate SQL that is then executed
SELECT "update B set b.b_amount = ", a.amount, ", b.b_date = ", a.date,
" where b.id = ", a.b_id
FROM A LEFT JOIN B ON B.id=A.b_id WHERE A.b_id>0 GROUP BY B.id
Now add the batch terminator and execute this SQL. The query result should look like this
update B set b.b_amount = 200, b.b_date = 6/31/2012 where b.id = 1
update B set b.b_amount = 400, b.b_date = 6/29/2012 where b.id = 2
NOTE: Some RDBMS will handle dates differently. Some require quotes.
