I am having trouble with a query.
Fiddle: https://www.db-fiddle.com/f/JXQHw1VzF7vAowNLFrxv5/1
This is not going to work.
So my question is: what has to be done to get a result when I want to use both conditions?
(attr_key = 0 AND attr_value & 201326592 = 201326592)
AND
(attr_key = 30 AND attr_value & 8 = 8)
Thanks in advance!
Best regards
One way to check for the presence of some number of key value pairs in the items_attributes table would be to use conditional aggregation:
SELECT i.id
FROM items i
LEFT JOIN items_attributes ia
ON i.id = ia.owner
GROUP BY
i.id
HAVING
SUM(CASE WHEN ia.key = 0 AND ia.value = 201326592 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN ia.key = 30 AND ia.value = 8 THEN 1 ELSE 0 END) > 0
The trick in the above query is that we scan each cluster of key/value pairs for each item, and then check whether the pairs you expect are present.
Note: My query just returns id values from items matching all key value pairs. If you want to bring in other columns from either of the two tables, you may simply add on more joins to what I wrote above.
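Since the question tests bit masks rather than exact values, here is a minimal sketch of the same conditional aggregation with the bitwise comparison, run against an in-memory SQLite database from Python. The table layout (items.id, items_attributes.owner / attr_key / attr_value) and the sample rows are assumptions for illustration, not the schema from the fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    CREATE TABLE items_attributes (owner INTEGER, attr_key INTEGER, attr_value INTEGER);
    INSERT INTO items (id) VALUES (1), (2), (3);
    -- item 1 has both masks set (plus extra bits), item 2 only the first, item 3 neither
    INSERT INTO items_attributes (owner, attr_key, attr_value) VALUES
        (1, 0, 201326593), (1, 30, 12),
        (2, 0, 201326592), (2, 30, 4),
        (3, 0, 1);
""")

# One SUM(CASE ...) per required key/mask pair; an item qualifies only if
# every pair was seen at least once among its attribute rows.
query = """
    SELECT i.id
    FROM items i
    JOIN items_attributes ia ON i.id = ia.owner
    GROUP BY i.id
    HAVING SUM(CASE WHEN ia.attr_key = 0
                     AND (ia.attr_value & 201326592) = 201326592 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN ia.attr_key = 30
                     AND (ia.attr_value & 8) = 8 THEN 1 ELSE 0 END) > 0
    ORDER BY i.id
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
```

The two conditions can never hold on the same row, because one row has only one attr_key; that is why they have to be checked per item across its group of rows rather than in a single WHERE clause.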
First, I'm sure there is a cleaner way to do this, but it's the only way I've been able to make the code combine the DXs into one column. Originally they were in separate columns as 0/1 flags and I needed them in one column. I tried the PIVOT function, but was not able to figure it out.
The issue is that I need the paid amounts to be counted once per diagnosis, so a member with several DXs is counted several times. That sounds counterintuitive, but for this report it's what I need.
For example, if member A has COPD, ASTHMA, and DIABETES, and the member's paid claims were 40,000, I need the paid amount for that member to reflect 120,000, and so forth.
The code:
SELECT
DX_FLAG
,Sum( AMT_PAID) AS PHARM_PAID_AMT
,Count(DISTINCT(MEMBER_AMISYS_NBR)) AS MEMBER_COUNT
FROM
(SELECT
st.MEMBER_AMISYS_NBR
,ph.PHARMACY_CLAIM_CK
,ph.AMT_PAID
,FILL.DATE_DATE AS Fill_Date
,Coalesce(CASE WHEN DX_ASTHMA = 'ASTHMA' THEN 'Asthma' END,
CASE WHEN DX_COPD = 'COPD' THEN 'COPD' END,
CASE WHEN DX_DIABETES = 'DIABETES' THEN 'DIABETES' END,
CASE WHEN DX_HEART_FAILURE = 'HEART FAILURE' THEN 'HEART_FAILURE' END,
CASE WHEN DX_HYPERTENSION = 'HYPERTENSION' THEN 'HYPERTENSION' END)
AS DX_FLAG
FROM
STATE_OVERALL_MBRS st
JOIN FT_PHARMACY_CLAIM ph ON st.MEMBER_CURR_CK = ph.PRESCRIBER_MEMBER_CURR_CK AND ph.DELETED_IND = 'N'
JOIN DIM_DATE FILL ON ph.FILL_DATE_DIM_CK = FILL.DATE_DIM_CK
WHERE FILL.DATE_DATE BETWEEN '2021-10-01' AND '2022-09-30'
AND ph.PLAN_DIM_CK =10
AND ph.REVERSAL_IND = 'N'
AND ph.AMT_PAID > 0
) rx
GROUP BY DX_FLAG
My output looks like this:
DX_FLAG        PHARM_PAID_AMT  MEMBER_COUNT
DIABETES       70,000,000      14,144
COPD           38,266,409      6,641
HEART_FAILURE  10,908,000      2,544
ASTHMA         125,000,000     30,000
HYPERTENSION   52,900          22,325
I have tried adding/removing DISTINCT from each SELECT statement, and the only change that made a difference was removing DISTINCT from this line, in which case I ended up with far too many member counts (even taking into account the duplicate DX counts).
,Count(DISTINCT(MEMBER_AMISYS_NBR)) AS MEMBER_COUNT
The State_Overall_Mbrs table with the DX flags looks like this, and I need all the diagnoses to be in one column (with duplicate rows for members depending on how many diagnoses they have):
Member ID Asthma COPD Hypertension Diabetes CHF
55555555 0 1 1 1 0
66666666 1 0 0 1 0
77777777 0 0 1 0 0
Normalize the members table, then join and aggregate; something like this:
SELECT
DX_FLAG
,Sum(AMT_PAID) AS PHARM_PAID_AMT
,Count(DISTINCT(MEMBER_AMISYS_NBR)) AS MEMBER_COUNT
FROM
(SELECT * FROM State_Overall_Members
UNPIVOT (has_dx /* New column to hold the 0 or 1 value */
FOR DX_FLAG IN (Asthma,COPD,Hypertension,Diabetes,CHF)
/* Original column names become the values in new column DX_FLAG */
) nmlz
WHERE has_dx = 1 /* Only unpivot rows with a 1 in original column */
) st
JOIN FT_PHARMACY_CLAIM ph ON st.MEMBER_CURR_CK = ph.PRESCRIBER_MEMBER_CURR_CK AND ph.DELETED_IND = 'N'
JOIN DIM_DATE FILL ON ph.FILL_DATE_DIM_CK = FILL.DATE_DIM_CK
WHERE FILL.DATE_DATE BETWEEN '2021-10-01' AND '2022-09-30'
AND ph.PLAN_DIM_CK =10
AND ph.REVERSAL_IND = 'N'
AND ph.AMT_PAID > 0
GROUP BY DX_FLAG;
Another option to normalize the members table would be to have a subquery for each DX and UNION ALL those, along these lines:
... FROM
(SELECT MEMBER_CURR_CK, MEMBER_AMISYS_NBR, 'Asthma' (VARCHAR(16)) AS DX_FLAG
FROM State_Overall_Members
WHERE Asthma = 1
UNION ALL
SELECT MEMBER_CURR_CK, MEMBER_AMISYS_NBR, 'COPD' (VARCHAR(16)) AS DX_FLAG
FROM State_Overall_Members
WHERE COPD = 1
UNION ALL
...
) st
JOIN ...
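To see what the normalization does to the totals, here is a small self-contained sketch using SQLite from Python (SQLite has no UNPIVOT, so it uses the UNION ALL form). The table and column names (members, claims, member_id, amt_paid) and the sample rows are made up for illustration and cover only three of the diagnoses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (member_id INTEGER, asthma INTEGER, copd INTEGER, diabetes INTEGER);
    CREATE TABLE claims (member_id INTEGER, amt_paid INTEGER);
    -- member 1 has all three diagnoses, member 2 only asthma
    INSERT INTO members VALUES (1, 1, 1, 1), (2, 1, 0, 0);
    INSERT INTO claims VALUES (1, 30000), (1, 10000), (2, 500);
""")

# Each branch of the UNION ALL emits one row per member having that diagnosis,
# so a member with three diagnoses joins to the claims three times.
query = """
    SELECT st.dx_flag,
           SUM(c.amt_paid)              AS pharm_paid_amt,
           COUNT(DISTINCT st.member_id) AS member_count
    FROM (
        SELECT member_id, 'ASTHMA' AS dx_flag FROM members WHERE asthma = 1
        UNION ALL
        SELECT member_id, 'COPD' FROM members WHERE copd = 1
        UNION ALL
        SELECT member_id, 'DIABETES' FROM members WHERE diabetes = 1
    ) st
    JOIN claims c ON c.member_id = st.member_id
    GROUP BY st.dx_flag
    ORDER BY st.dx_flag
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Member 1's 40,000 in claims shows up under each of the three diagnoses, which is the deliberate duplication the question asks for.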
This query worked perfectly until I went on vacation. Now it no longer runs and does not merge, and I don't know what the cause could be.
MERGE INTO STG_FATO_MACRO_GESTAO AS FAT
USING(SELECT DISTINCT
COD_EMPRESA
,FUN.MATRICULA AS FUN_MAT
,APR.MATRICULA AS APR_MAT
,FUN.CPF AS FUN_CPF
,APR.CPF AS APR_CPF
,APR.DAT_DESLIGAMENTO
,YEAR(APR.DAT_DESLIGAMENTO)*100+MONTH(APR.DAT_DESLIGAMENTO) AS DESL
,FUN.DATA_ADMISSAO
,YEAR(FUN.DATA_ADMISSAO)*100+MONTH(FUN.DATA_ADMISSAO) AS ADM
, CASE WHEN YEAR(APR.DAT_DESLIGAMENTO)*100+MONTH(APR.DAT_DESLIGAMENTO) <= YEAR(FUN.DATA_ADMISSAO)*100+MONTH(FUN.DATA_ADMISSAO) THEN 1 ELSE 0 END AS ADMITIDO
,CASE WHEN FUN.DATA_ADMISSAO <= (APR.DAT_DESLIGAMENTO + INTERVAL '90' DAY) THEN 1 ELSE 0 END AS APR_90
FROM (SELECT CPF,DATA_ADMISSAO, MATRICULA, COD_EMPRESA FROM DIM_FUNCIONARIO
WHERE PROFISSAO NOT LIKE '%APRENDIZ%') AS FUN
INNER JOIN (SELECT DISTINCT
CPF,DAT_DESLIGAMENTO,MATRICULA
FROM HST_APRENDIZ
WHERE FLAG_FECHAMENTO = 2
AND DAT_DESLIGAMENTO IS NOT NULL) AS APR
ON FUN.CPF = APR.CPF) AS APR_90
ON FAT.COD_EMPRESA = APR_90.COD_EMPRESA
AND FAT.MATRICULA = APR_90.FUN_MAT
AND APR_90.APR_90 = 1
AND APR_90.ADMITIDO = 1
WHEN MATCHED THEN
UPDATE SET APRENDIZ_EFETIVADO_90 = 1
;
When I run this query, it returns this error:
"The search condition must fully specify the Target table primary index and partition column(s) and expression must match INSERT specification primary index and partition column(s). "
I have a table TABLE in an SQLite database with columns DATE and GROUP. I want to select the first 10 entries in each group. After researching similar topics here on Stack Overflow, I came up with the following query, but it runs very slowly. Any ideas how to make it faster?
select * from TABLE as A
where (select count(*) from TABLE as B
where B.DATE < A.DATE and A.GROUP == B.GROUP) < 10
This is the result of EXPLAIN QUERY PLAN (TABLE = clients_bets):
Here are a few suggestions:
Use a covering index (an index containing all the data needed in the subquery, in this case the group and the date):
create index some_index on some_table(some_group, some_date)
Additionally, rewrite the subquery to make it less dependent on the outer query:
select * from some_table as A
where rowid in (
select B.rowid
from some_table as B
where A.some_group == B.some_group
order by B.some_date limit 10 )
The query plan changes from:
0 0 0 SCAN TABLE some_table AS A
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1
1 0 0 SEARCH TABLE some_table AS B USING COVERING INDEX idx_1 (some_group=? AND some_date<?)
to
0 0 0 SCAN TABLE some_table AS A
0 0 0 EXECUTE CORRELATED LIST SUBQUERY 1
1 0 0 SEARCH TABLE some_table AS B USING COVERING INDEX idx_1 (some_group=?)
While the two plans are very similar, the rewritten query seems noticeably faster. I'm not sure why, but a likely reason is that the LIMIT lets the subquery stop after reading 10 index entries per outer row, whereas count(*) has to visit every earlier row in the group.
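For anyone who wants to try the rewritten query quickly, here is a minimal sketch using Python's sqlite3 module. The sample data is invented, and the per-group limit is 2 instead of 10 to keep the output short; the table, column, and index names follow the placeholders used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_group TEXT, some_date TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [("a", f"2020-01-{day:02d}") for day in range(1, 6)]
    + [("b", f"2020-02-{day:02d}") for day in range(1, 4)],
)
# Covering index: the subquery only needs the group, the date and the rowid.
conn.execute("CREATE INDEX some_index ON some_table (some_group, some_date)")

per_group = 2
query = """
    SELECT A.some_group, A.some_date
    FROM some_table AS A
    WHERE A.rowid IN (
        SELECT B.rowid
        FROM some_table AS B
        WHERE B.some_group = A.some_group
        ORDER BY B.some_date
        LIMIT ?
    )
    ORDER BY A.some_group, A.some_date
"""
rows = conn.execute(query, (per_group,)).fetchall()
print(rows)
```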
I am trying to optimize a query that has multiple counts for objects in a subordinate table (I used aliases in SQLAlchemy). In Witch Academia terms, something like this:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(tried_witch.id) AS tried,
count(passed_witch.id) AS passed,
count(failed_witch.id) AS failed
FROM exam
LEFT OUTER JOIN witch AS tried_witch
ON tried_witch.exam_id = exam.id AND
tried_witch.is_failed = 0 AND
tried_witch.status != "passed"
LEFT OUTER JOIN witch AS passed_witch
ON passed_witch.exam_id = exam.id AND
passed_witch.is_failed = 0 AND
passed_witch.status = "passed"
LEFT OUTER JOIN witch AS failed_witch
ON failed_witch.exam_id = exam.id AND
failed_witch.is_failed = 1
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
The number of witches can be large (hundreds of thousands), while the number of exams is lower (hundreds), so the above query is quite slow. In a lot of similar questions I've found answers that propose the above, but I feel a totally different approach is needed here, and I am stuck coming up with an alternative. NB: the result has to be ordered by the calculated counts. It is also important to get zeros as counts where due. (Don't pay attention to the somewhat funny model: witches can easily clone themselves to go to multiple exams, hence the per-exam identity.)
With one EXISTS subquery, which is not reflected in the above and does not influence the outcome, the situation is:
# Query_time: 1.135747 Lock_time: 0.000209 Rows_sent: 20 Rows_examined: 98174
# Rows_affected: 0
# Full_scan: Yes Full_join: No Tmp_table: Yes Tmp_table_on_disk: Yes
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
Updated query, which is still quite slow:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(CASE WHEN (witch.status != "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS tried,
count(CASE WHEN (witch.status = "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS passed,
count(CASE WHEN (witch.is_failed = 1)
THEN witch.id
ELSE NULL END) AS failed
FROM exam
LEFT OUTER JOIN witch ON witch.exam_id = exam.id
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
Indexing is the key to good performance for this query.
I do not know MariaDB at all, so not sure what the possibilities are. But if it is anything like Microsoft SQL Server, then here is what I would try:
Create ONE composite index covering ALL the required columns: witch_id, status and is_failed. If the query uses that index, that should be it. Here the order of the included columns might be very important. Then profile the query in order to understand if the index is used. See Optimization and Indexes documentation page.
Consider Generated (Virtual and Persistent) Columns.
It looks like all the information needed to classify a witch into the tried, passed or failed bucket is contained in the witch row itself. Therefore, you can create those virtual columns directly on the database table and use the PERSISTENT option, which allows creating an index on them. Then you can create an index specifically for this query containing witch_id and the three virtual columns: tried, passed and failed. Make sure your query uses it, and that should be pretty good. The query will then look very simple:
SELECT exam.id,
exam.name,
sum(witch.tried) AS tried,
sum(witch.passed) AS passed,
sum(witch.failed) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(witch.tried)
LIMIT 20
Although the query involves only simple comparisons and AND/OR clauses, you are basically offloading the calculation of the three statuses to the database at INSERT/UPDATE time. Then during SELECT your query should be much faster.
Your example does not specify any result filtering (WHERE clause), but if you have one, it might also have an impact on the way one optimises indices for query performance.
Original answer: Below is the originally proposed change to the query.
Here I assume that the indexing part of the optimisation has already been done.
Could you try with SUM instead of COUNT?
SELECT exam.id,
exam.name,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END) AS tried,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status = 'passed') THEN 1
ELSE 0
END) AS passed,
sum(CASE
WHEN (witch.is_failed = 1) THEN 1
ELSE 0
END) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END)
LIMIT 20
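The question also asks for zero counts where an exam has no matching witches. With an INNER JOIN such exams drop out of the result; with a LEFT JOIN they are kept, and the ELSE 0 branch makes their sums 0. A minimal sketch of that behaviour using SQLite from Python follows; the sample data is invented, only the column names come from the question, and the ordering is by exam.id purely to keep the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exam (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE witch (id INTEGER PRIMARY KEY, exam_id INTEGER,
                        is_failed INTEGER, status TEXT);
    INSERT INTO exam VALUES (1, 'brooms'), (2, 'potions'), (3, 'empty');
    INSERT INTO witch (exam_id, is_failed, status) VALUES
        (1, 0, 'passed'), (1, 0, 'pending'), (1, 1, 'pending'),
        (2, 0, 'passed'), (2, 0, 'passed');
""")

# For the exam without witches the joined columns are NULL, every WHEN
# condition is not true, and each CASE falls through to ELSE 0.
query = """
    SELECT exam.id, exam.name,
           SUM(CASE WHEN witch.is_failed = 0 AND witch.status != 'passed'
                    THEN 1 ELSE 0 END) AS tried,
           SUM(CASE WHEN witch.is_failed = 0 AND witch.status = 'passed'
                    THEN 1 ELSE 0 END) AS passed,
           SUM(CASE WHEN witch.is_failed = 1 THEN 1 ELSE 0 END) AS failed
    FROM exam
    LEFT JOIN witch ON witch.exam_id = exam.id
    GROUP BY exam.id, exam.name
    ORDER BY exam.id
    LIMIT 20
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```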
The rest:
Given that you mentioned SQLAlchemy in your question, here is the SQLAlchemy code which I used to model and generate the query:
# model (written for SQLAlchemy 1.4+)
from sqlalchemy import Column, ForeignKey, Integer, String, and_, case
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Exam(Base):
    __tablename__ = 'exam'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Witch(Base):
    __tablename__ = 'witch'
    id = Column(Integer, primary_key=True)
    exam_id = Column(Integer, ForeignKey('exam.id'))
    is_failed = Column(Integer)
    status = Column(String)
    exam = relationship(Exam, backref='witches')

    # computed fields (evaluated in Python on loaded instances)
    @hybrid_property
    def tried(self):
        return self.is_failed == 0 and self.status != 'passed'

    @hybrid_property
    def passed(self):
        return self.is_failed == 0 and self.status == 'passed'

    @hybrid_property
    def failed(self):
        return self.is_failed == 1

    # computed fields: SQL expressions (each must reuse the hybrid's name)
    # case() uses the 1.4+ call style; older versions take a list of
    # tuples instead: case([(condition, 1)], else_=0)
    @tried.expression
    def tried(cls):
        return case((and_(
            cls.is_failed == 0,
            cls.status != 'passed',
        ), 1), else_=0)

    @passed.expression
    def passed(cls):
        return case((and_(
            cls.status == 'passed',
            cls.is_failed == 0,
        ), 1), else_=0)

    @failed.expression
    def failed(cls):
        return case((cls.is_failed == 1, 1), else_=0)
and:
# query (assumes an open Session named `session`)
from sqlalchemy import func

q = (
    session.query(
        Exam.id, Exam.name,
        func.sum(Witch.tried).label("tried"),
        func.sum(Witch.passed).label("passed"),
        func.sum(Witch.failed).label("failed"),
    )
    .join(Witch)
    .group_by(Exam.id, Exam.name)
    .order_by(func.sum(Witch.tried))
    .limit(20)
)
I am using the following insert query to create a comparison between two tables using the dates to join on.
INSERT INTO Comp_Table (Date, CKROne, CKRTwo, ChangeOne, ChangeTwo, State)
SELECT BaseTbl.Date, BaseTbl.CKR, CompTbl.CKR, BaseTbl.Change, CompTbl.Change,
CASE
WHEN BaseTbl.Change > 0 AND CompTbl.Change > 0 THEN 'positive'
WHEN BaseTbl.Change < 0 AND CompTbl.Change < 0 THEN 'positive'
ELSE 'inversely'
END AS 'Correlation'
FROM BaseTbl
JOIN CompTbl ON BaseTbl.Date = CompTbl.Date;
This works well. However, I would like to be able to join the tables with a lag. That is, the user can choose either an exact match on dates, or a date from one table plus a number of days, so that the value from the later date is compared with the value from the earlier date. Pseudo code example:
User sets variable = 0 then
Join CompTbl On BaseTbl.Date = CompTbl.Date + 0;
User sets variable = 7 then
Join CompTbl On BaseTbl.Date = CompTbl.Date + 7;
(joins 2012-01-01 from BaseTbl to 2012-01-08 from CompTbl)
I tried to add days like you would in a WHERE clause ('+7 day'), but this didn't work. I also tried using a WHERE clause with BaseTbl.Date = CompTbl.Date '+ 7 day', but that also returned a 0 value. How can this be accomplished in SQLite?
I think you can use the DATE() function to build the join condition you want:
INSERT INTO ...
SELECT ...
FROM BaseTbl
INNER JOIN CompTbl
ON BaseTbl.Date = DATE(CompTbl.Date, '7 days')
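To make the lag user-defined, the modifier string can be passed as a bound parameter. Below is a minimal sketch with Python's sqlite3 module; the helper name joined_dates and the sample rows are hypothetical, and the tables are reduced to the Date column. It assumes the dates are stored as 'YYYY-MM-DD' text, which is what DATE() returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BaseTbl (Date TEXT);
    CREATE TABLE CompTbl (Date TEXT);
    INSERT INTO BaseTbl VALUES ('2012-01-01'), ('2012-01-08');
    INSERT INTO CompTbl VALUES ('2012-01-01'), ('2012-01-08');
""")

def joined_dates(lag_days):
    """Return (BaseTbl.Date, CompTbl.Date) pairs for a lag given in days."""
    modifier = f"{lag_days:+d} days"  # e.g. '+7 days', '+0 days', '-7 days'
    query = """
        SELECT BaseTbl.Date, CompTbl.Date
        FROM BaseTbl
        INNER JOIN CompTbl
            ON BaseTbl.Date = DATE(CompTbl.Date, ?)
        ORDER BY BaseTbl.Date
    """
    return conn.execute(query, (modifier,)).fetchall()

print(joined_dates(0))
print(joined_dates(7))
print(joined_dates(-7))
```

With this join condition a lag of 7 pairs 2012-01-08 in BaseTbl with 2012-01-01 in CompTbl. To shift in the other direction (2012-01-01 in BaseTbl against 2012-01-08 in CompTbl), pass a negative lag or apply DATE() to BaseTbl.Date instead.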