Optimize left join query with multiple counts in SQLAlchemy?

Optimize left join query with multiple counts in SQLAlchemy? - count

Trying to optimize a query, which has multiple counts for objects in subordinate table (used aliases in SQLAlchemy). In Witch Academia terms, something like this:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(tried_witch.id) AS tried,
count(passed_witch.id) AS passed,
count(failed_witch.id) AS failed
FROM exam
LEFT OUTER JOIN witch AS tried_witch
ON tried_witch.exam_id = exam.id AND
tried_witch.is_failed = 0 AND
tried_witch.status != "passed"
LEFT OUTER JOIN witch AS passed_witch
ON passed_witch.exam_id = exam.id AND
passed_witch.is_failed = 0 AND
passed_witch.status = "passed"
LEFT OUTER JOIN witch AS failed_witch
ON failed_witch.exam_id = exam.id AND
failed_witch.is_failed = 1
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
Number of witches can be large (hundreds of thousands), number of exams is lower (hundreds), so the above query is quite slow. In a lot of similar questions I've found answers, which propose the above, but I feel like a totally different approach is needed here. I am stuck at coming up with alternative. NB, there is a need to order by calculated counts. It is also important to have zeros as counts, of course, where due. (do not pay attention to a somewhat funny model: witches can easily clone themselves to go to multiple exams, thus per exam identity)
With one EXISTS subquery, which is not reflected in the above and does not influence the ouotcome, the situation is:
# Query_time: 1.135747 Lock_time: 0.000209 Rows_sent: 20 Rows_examined: 98174
# Rows_affected: 0
# Full_scan: Yes Full_join: No Tmp_table: Yes Tmp_table_on_disk: Yes
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
Updated query, which is still quite slow:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(CASE WHEN (witch.status != "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS tried,
count(CASE WHEN (witch.status = "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS passed,
count(CASE WHEN (witch.is_failed = 1)
THEN witch.id
ELSE NULL END) AS failed
FROM exam
LEFT OUTER JOIN witch ON witch.exam_id = exam.id
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20

Indexing is the key to get performance of the query.
I do not know MariaDB at all, so not sure what the possibilities are. But if it is anything like Microsoft SQL Server, then here is what I would try:
Create ONE composite index covering ALL the required columns: witch_id, status and is_failed. If the query uses that index, that should be it. Here the order of the included columns might be very important. Then profile the query in order to understand if the index is used. See Optimization and Indexes documentation page.
Consider Generated (Virtual and Persistent) Columns.
It looks like all the information for classification of the witch into tried, passed or failed bucket is contained in the row for witch. Therefore, you can basically create those virtual columns on the database table directly and use PERSISTENT option. This option allows creating index on it. Then you can create an index specifically for this query containing witch_id and three virtual columns: tried, passed and failed. Make sure you query uses it, and that should be pretty good. The query will then look very simple:
SELECT exam.id,
exam.name,
sum(witch.tried) AS tried,
sum(witch.passed) AS passed,
sum(witch.failed) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(witch.tried)
LIMIT 20
Although query simple comparisons and AND/OR clauses, you are basically offloading the calculation of the 3 statuses to the database during INSERT/UPDATE. Then during SELECT you query should be much faster.
Your example does not specify any result filtering (WHERE clause), but if you have one, it might also have an impact on the way one optimises indices for query performance.
Original answer: Below is the originally proposed change to the query.
Here i assume that indexing part of the optimisation has been already done.
Could you try with SUM instead of COUNT?
SELECT exam.id,
exam.name,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END) AS tried,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status = 'passed') THEN 1
ELSE 0
END) AS passed,
sum(CASE
WHEN (witch.is_failed = 1) THEN 1
ELSE 0
END) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END)
LIMIT 20
The rest:
Given you have specified sqlalchemy in your answer, here is the sqlalchemy code, which i used to model and generate the query:
# model
class Exam(Base):
id = Column(Integer, primary_key=True)
name = Column(String)
class Witch(Base):
id = Column(Integer, primary_key=True)
exam_id = Column(Integer, ForeignKey('exam.id'))
is_failed = Column(Integer)
status = Column(String)
exam = relationship(Exam, backref='witches')
# computed fields
#hybrid_property
def tried(self):
return self.is_failed == 0 and self.status != 'passed'
#hybrid_property
def passed(self):
return self.is_failed == 0 and self.status == 'passed'
#hybrid_property
def failed(self):
return self.is_failed == 1
# computed fields: expression
#tried.expression
def _tried_expression(cls):
return case([(and_(
cls.is_failed == 0,
cls.status != 'passed',
), 1)], else_=0)
#passed.expression
def _passed_expression(cls):
return case([(and_(
cls.status == 'passed',
cls.is_failed == 0,
), 1)], else_=0)
#failed.expression
def _failed_expression(cls):
return case([(cls.is_failed == 1, 1)], else_=0)
and:
# query
q = (
session.query(
Exam.id, Exam.name,
func.sum(Witch.tried).label("tried"),
func.sum(Witch.passed).label("passed"),
func.sum(Witch.failed).label("failed"),
)
.join(Witch)
.group_by(Exam.id, Exam.name)
.order_by(func.sum(Witch.tried))
.limit(20)
)

Related

I can't run this query with MERGE in Teradata

This query worked perfectly until the moment I went in for vacations, now itdoes not run anymore and does not merge, dont know what it can be
MERGE INTO STG_FATO_MACRO_GESTAO AS FAT
USING(SELECT DISTINCT
COD_EMPRESA
,FUN.MATRICULA AS FUN_MAT
,APR.MATRICULA AS APR_MAT
,FUN.CPF AS FUN_CPF
,APR.CPF AS APR_CPF
,APR.DAT_DESLIGAMENTO
,YEAR(APR.DAT_DESLIGAMENTO)*100+MONTH(APR.DAT_DESLIGAMENTO) AS DESL
,FUN.DATA_ADMISSAO
,YEAR(FUN.DATA_ADMISSAO)*100+MONTH(FUN.DATA_ADMISSAO) AS ADM
, CASE WHEN YEAR(APR.DAT_DESLIGAMENTO)*100+MONTH(APR.DAT_DESLIGAMENTO) <= YEAR(FUN.DATA_ADMISSAO)*100+MONTH(FUN.DATA_ADMISSAO) THEN 1 ELSE 0 END AS ADMITIDO
,CASE WHEN FUN.DATA_ADMISSAO <= (APR.DAT_DESLIGAMENTO + INTERVAL '90' DAY) THEN 1 ELSE 0 END AS APR_90
FROM (SELECT CPF,DATA_ADMISSAO, MATRICULA, COD_EMPRESA FROM DIM_FUNCIONARIO
WHERE PROFISSAO NOT LIKE '%APRENDIZ%') AS FUN
INNER JOIN (SELECT DISTINCT
CPF,DAT_DESLIGAMENTO,MATRICULA
FROM HST_APRENDIZ
WHERE FLAG_FECHAMENTO = 2
AND DAT_DESLIGAMENTO IS NOT NULL) AS APR
ON FUN.CPF = APR.CPF) AS APR_90
ON FAT.COD_EMPRESA = APR_90.COD_EMPRESA
AND FAT.MATRICULA = APR_90.FUN_MAT
AND APR_90.APR_90 = 1
AND APR_90.ADMITIDO = 1
WHEN MATCHED THEN
UPDATE SET APRENDIZ_EFETIVADO_90 = 1
;
when running this query returns me this error:
"The search condition must fully specify the Target table primary index and partition column(s) and expression must match INSERT specification primary index and partition column(s). "

Count of group by statement AX2012 R3

I want to count group by statement values and write control.
There is a table which have 3 fields; ItemId, TaxGroup and TaxItemGroup. Same itemId must have same taxgroup and taxItemGroup values. If lines which have same ItemId have different tax values, i should throw an exception.
I wrote this code but this return number of all records. How can i write this control?
while select count(RecId) from ActivityLineExtraLcl
group by Itemid, TaxGroup, TaxItemGroup
where ActivityLineExtraLcl.Amount != 0
if(ActivityLineExtraLcl.RecId > 1 )
throw error("Same ItemIds can't have different values!");

The logic you're attempting won't work as written and it doesn't really make sense. You can try something like the below. This identifies the actual items. Your code (if it worked) just looks to see if the entire table ActivityLineExtraLcl has any issues.
ActivityLineExtraLcl ActivityLineExtraLcl;
ActivityLineExtraLcl ActivityLineExtraLclExists;
while select ItemId from ActivityLineExtraLcl
group by ItemId
where ActivityLineExtraLcl.Amount != 0
exists join ActivityLineExtraLclExists
where ActivityLineExtraLclExists.ItemId == ActivityLineExtraLcl.ItemId &&
ActivityLineExtraLclExists.Amount != 0 &&
(ActivityLineExtraLclExists.TaxGroup != ActivityLineExtraLcl.TaxGroup ||
ActivityLineExtraLclExists.TaxItemGroup != ActivityLineExtraLcl.TaxItemGroup)
{
error(strFmt("Item '%1' has different tax values", ActivityLineExtraLcl.ItemId));
}

Why would oracle subquery with AND & OR return returning wrogn results set

I have two subqueries. as shown below. the first query works fine but the second query which is basically the first query that I modified to use AND & OR, doesn't work in the sense that it doesn't return ID as expected. any suggestions on what is happening here?
1. (SELECT * FROM (SELECT EMPID FROM EVENT_F
INNER JOIN WCINFORMATION_D
ON EVENT_F.JOB_INFO_ROW_WID= WCINFORMATION_D.ROW_WID
INNER JOIN WCANDIDATE_D ON WCCANDIDATE_D.ROW_WID = VENT_F.CANDIDATE_ROW_WID
WHERE STEP_NAME = 'Offer'
AND WCINFORMATION_D.JOB_FAMILY_NAME IN ('MDP','ELP','Emerging Leader Program','Other')
AND TITLE NOT IN ('Student Ambassador Program for Eligible Summer Interns','Student Ambassador')
AND PI_CANDIDATE_NUM = OUTERAPP.PI_CANDIDATE_NUM
--limit 1
ORDER BY CREATION_DT ASC
) T1 WHERE ROWNUM=1) AS A_ID,
2.(SELECT * FROM (SELECT EMPID FROM EVENT_F
INNER JOIN WCINFORMATION_D
ON EVENT_F.JOB_INFO_ROW_WID= WCINFORMATION_D.ROW_WID
INNER JOIN WCANDIDATE_D ON WCCANDIDATE_D.ROW_WID = VENT_F.CANDIDATE_ROW_WID
WHERE STEP_NAME = 'Offer'
AND WCINFORMATION_D.JOB_FAMILY_NAME IN ('MDP','ELP','Emerging Leader Program','Other') or WCINFORMATION_D.JOB_FAMILY_NAME NOT IN ('MDP','ELP','Emerging Leader Program','Other')
AND TITLE NOT IN ('Student Ambassador Program for Eligible Summer Interns','Student Ambassador')
AND PI_CANDIDATE_NUM = OUTERAPP.PI_CANDIDATE_NUM
--limit 1
ORDER BY CREATION_DT ASC
) T1 WHERE ROWNUM=1) AS A_ID,

If you're wanting to get the count of people in one set of job families, plus a count of people in another set, you need to use a conditional count, e.g. something along the lines of:
SELECT COUNT(CASE WHEN wid.job_family_name IN ('MDP', 'ELP', 'Emerging Leader Program', 'Other') THEN 1 END) job_family_grp1,
COUNT(CASE WHEN wid.job_family_name IS NULL OR wid.job_family_name NOT IN ('MDP', 'ELP', 'Emerging Leader Program', 'Other') THEN 1 END) job_family_grp2
FROM event_f ef
INNER JOIN wcinformation_d wid
ON ef.job_info_row_wid = wid.row_wid
INNER JOIN wcandidate_d wcd
ON wcd.row_wid = ef.candidate_row_wid
WHERE step_name = 'Offer' -- alias this column name
AND title NOT IN ('Student Ambassador Program for Eligible Summer Interns', 'Student Ambassador') -- alias this column name;
You will most likely need to amend this to work for your particular case (it'll have to go as a join into your main query, given there are two columns being selected) since you didn't provide enough information in your question to give us the wider context.

SQLite query returns 0 results

I am having trouble with a query.
Fiddle: https://www.db-fiddle.com/f/JXQHw1VzF7vAowNLFrxv5/1
This is not going to work.
So my question is: What has to be done to get a result when I wanna use both conditions.
(attr_key = 0 AND attr_value & 201326592 = 201326592)
AND
(attr_key = 30 AND attr_value & 8 = 8)
Thanks in advance!
Best regards

One way to check for the presence of some number of key value pairs in the items_attributes table would be to use conditional aggregation:
SELECT i.id
FROM items i
LEFT JOIN items_attributes ia
ON i.id = ia.owner
GROUP BY
i.id
HAVING
SUM(CASE WHEN ia.key = 0 AND ia.value = 201326592 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN ia.key = 30 AND ia.value = 8 THEN 1 ELSE 0 END) > 0
The trick in the above query is that we scan each cluster of key/value pairs for each item, and then check whether the pairs you expect are present.
Note: My query just returns id values from items matching all key value pairs. If you want to bring in other columns from either of the two tables, you may simply add on more joins to what I wrote above.

Doctrine2 Join count on same table returning empty array

$messageQuery
->select('m, COUNT(pm) AS newResponses')
//->addSelect($messageQuery->expr()->countDistinct('pm.id'))
->from('entities:PrivateMessage', 'm')
->where('m.employeeId = :employeeId AND m.responseTo = 0')
->innerJoin('entity:PrivateMessage', 'pm', 'WITH', 'pm.responseTo = m.id AND pm.employeeRead = 0')
->setParameter('employeeId', $employeeId)
->setFirstResult($offset)
->setMaxResults($max)
->addGroupBy('m.id')
->orderBy('m.id', 'DESC');
Let's assume there are two messages with employeeId = 1 and responseTo = 0. One of those messages also has two responses (therefore two other records with responseTo = messageId). The other has none. The result I would expect from this query is two arrays with the entity object as index 0, and the count as index numResponses (value of 2 for the first row, 0 for the second). What I'm getting back is an empty array on $messageQuery->getQuery()->getResult();
Does anyone have any ideas as to why this may be happening? Is there something obvious I'm missing here?

Actually, I seem to have fixed it by switching to a leftJoin.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Optimize left join query with multiple counts in SQLAlchemy? - count

Related

I can't run this query with MERGE in Teradata

Count of group by statement AX2012 R3

Why would oracle subquery with AND & OR return returning wrogn results set

SQLite query returns 0 results

Doctrine2 Join count on same table returning empty array

Categories

Resources