I have a question regarding my current project with HSQLDB.
I have a huge join which should give me feedback about the amount of each product sold in each region. The statement works completely correctly, but the performance is quite bad.
SELECT Verkaufsgebiet.verkaufsgebiet, Artikel.Artikel_Name, sum(Artikel_Liste.Menge)
FROM Artikel_Liste
JOIN Artikel ON Artikel_Liste.Artikel = Artikel.Artikel
JOIN Rechnung ON artikel_liste.rechnungsnummer = Rechnung.rechnungsnummer
JOIN Bearbeiter ON Rechnung.Bearbeiter = Bearbeiter.B_Bearbeiter_ID
JOIN Verkaufsgebiet ON Bearbeiter.b_verkaufsgebiet = Verkaufsgebiet.verkaufsgebiet_id
GROUP BY Artikel.Artikel_Name, Verkaufsgebiet.verkaufsgebiet
Explain Analyze:
isDistinctSelect=[false]
isGrouped=[true]
isAggregated=[true]
columns=[ COLUMN: PUBLIC.VERKAUFSGEBIET.VERKAUFSGEBIET nullable
COLUMN: PUBLIC.ARTIKEL.ARTIKEL_NAME not nullable
SUM arg=[ COLUMN: PUBLIC.ARTIKEL_LISTE.MENGE
nullable
]
[range variable 1
join type=INNER
table=ARTIKEL_LISTE
cardinality=4000009
access=FULL SCAN
join condition = [index=SYS_IDX_ARTIKEL_LISTE_PK_10099
]
][range variable 2
join type=INNER
table=ARTIKEL
cardinality=11
access=INDEX PRED
join condition = [index=SYS_IDX_ARTIKEL_PK_10093
start conditions=[
EQUAL arg_left=[ COLUMN: PUBLIC.ARTIKEL.ARTIKEL
] arg_right=[ COLUMN: PUBLIC.ARTIKEL_LISTE.ARTIKEL
]]
end condition=[
EQUAL arg_left=[ COLUMN: PUBLIC.ARTIKEL.ARTIKEL
] arg_right=[ COLUMN: PUBLIC.ARTIKEL_LISTE.ARTIKEL
]]
]
][range variable 3
join type=INNER
table=RECHNUNG
cardinality=1332427
access=INDEX PRED
join condition = [index=SYS_IDX_RECHNUNG_PK_10120
start conditions=[
EQUAL arg_left=[ COLUMN: PUBLIC.RECHNUNG.RECHNUNGSNUMMER
] arg_right=[ COLUMN: PUBLIC.ARTIKEL_LISTE.RECHNUNGSNUMMER
]]
end condition=[
EQUAL arg_left=[ COLUMN: PUBLIC.RECHNUNG.RECHNUNGSNUMMER
] arg_right=[ COLUMN: PUBLIC.ARTIKEL_LISTE.RECHNUNGSNUMMER
]]
]
][range variable 4
join type=INNER
table=BEARBEITER
cardinality=50
access=INDEX PRED
join condition = [index=SYS_IDX_BEARBEITER_PK_10108
start conditions=[
EQUAL arg_left=[ COLUMN: PUBLIC.BEARBEITER.B_BEARBEITER_ID
] arg_right=[ COLUMN: PUBLIC.RECHNUNG.BEARBEITER
]]
end condition=[
EQUAL arg_left=[ COLUMN: PUBLIC.BEARBEITER.B_BEARBEITER_ID
] arg_right=[ COLUMN: PUBLIC.RECHNUNG.BEARBEITER
]]
]
][range variable 5
join type=INNER
table=VERKAUFSGEBIET
cardinality=5
access=INDEX PRED
join condition = [index=SYS_IDX_VERKAUFSGEBIET_PK_10129
start conditions=[
EQUAL arg_left=[ COLUMN: PUBLIC.VERKAUFSGEBIET.VERKAUFSGEBIET_ID
] arg_right=[ COLUMN: PUBLIC.BEARBEITER.B_VERKAUFSGEBIET
]]
end condition=[
EQUAL arg_left=[ COLUMN: PUBLIC.VERKAUFSGEBIET.VERKAUFSGEBIET_ID
] arg_right=[ COLUMN: PUBLIC.BEARBEITER.B_VERKAUFSGEBIET
]]
]
]]
groupColumns=[COLUMN: PUBLIC.ARTIKEL.ARTIKEL_NAME
COLUMN: PUBLIC.VERKAUFSGEBIET.VERKAUFSGEBIET
]
PARAMETERS=[]
SUBQUERIES[]
My question is: how can I get better performance out of this query? I will be running it against roughly 4,000,000 rows of data.
The HSQLDB runs on a virtual server with 2x vCPU (Intel Xeon E5540, 2.53 GHz) and 6 GB RAM, on an OpenSuse 11.4 server with OpenJDK 1.5. I know that is not the best Java version, but I don't want too much upgrade trouble right before the end of my final exam ;)
My other question: how can I get more data into this database in Server Memory mode?
Currently I'm running out of memory when I try to load more than 4,000,000 records into the database. The server is started with the -Xmx2048M parameter; if I try to start it with more RAM, I'm not able to launch a JVM at all. It already uses 2 GB of RAM for this dataset.
Is there any way to get better RAM handling, maybe with the CHECKPOINT statement?
Or is that simply the limit of HSQLDB for me? I don't really want CACHED tables; the data should stay in memory.
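If the heap stays the hard limit (not being able to launch the JVM with a larger -Xmx usually points to a 32-bit JVM; a 64-bit build would be needed for a bigger heap), the standard lever in HSQLDB is the CACHED table type, even if you'd rather avoid it. A minimal sketch, assuming HSQLDB 2.x syntax:
-- move the large fact table off the heap; only a row cache stays in memory
SET TABLE ARTIKEL_LISTE TYPE CACHED;
-- flush and compact the .data file; note that CHECKPOINT does not reduce
-- the memory held by MEMORY tables
CHECKPOINT DEFRAG;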
I just want a hint. Is there a type issue here?
Failing case:
SET @ids = '4094,8562,11144,3017,5815,11121,1957,4095,8563,11145,3018,5816,8527,11122,1959,4096,8564,3020,5817,8528,11123,1961,4097,8571,3021,6020,8535,11128,1962,5181,8572,3581,6021';
This @ids value is actually collected with GROUP_CONCAT() from a subquery.
SELECT
ifnull(sum(case when a.student IS NOT NULL then total END), 0)
from
tb_class a
WHERE
a.id IN (@ids)
and a.date >= '2023-02-01' AND a.DATE <= '2023-02-02'
==> 0
Correct case:
SELECT
ifnull(sum(case when a.student IS NOT NULL then total END), 0)
from
tb_class a
WHERE
a.id IN (4094,8562,11144,3017,5815,11121,1957,4095,8563,11145,3018,5816,8527,11122,1959,4096,8564,3020,5817,8528,11123,1961,4097,8571,3021,6020,8535,11128,1962,5181,8572,3581,6021)
and a.date >= '2023-02-01' AND a.DATE <= '2023-02-02'
==> 54
I found the answer by googling: use the function FIND_IN_SET().
SELECT
ifnull(sum(case when a.student IS NOT NULL then total END), 0)
from
tb_class a
WHERE
FIND_IN_SET(a.id, @ids)
and a.date >= '2023-02-01' AND a.DATE <= '2023-02-02'
Variables store single values, not lists. Your @ids is just a string that happens to contain a comma-separated list of numbers. The IN operator only compares against an explicit list; what you are doing is no different from a.id = @ids (which will actually be true, with a warning, for the first number in the list if id is a numeric type, since the string is converted to a number and the trailing non-numeric portion discarded).
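You can see that implicit conversion directly (a minimal demo; the exact warning text varies by MySQL/MariaDB version):
SET @ids = '4094,8562,11144';
SELECT 4094 = @ids; -- 1, with a 'Truncated incorrect DOUBLE value' warning
SELECT 8562 = @ids; -- 0: only the leading number survives the cast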
Sometimes you do want to work with a string containing a list of ids like this, for instance if you have a query that reads many rows in order to produce a small list of ids to update, and you don't want the update to lock all the rows that were read. Then you can use dynamic SQL:
SET @ids = '4094,8562,...';
SET @sql = concat('select * from a where a.id in (',@ids,')');
prepare stmt from @sql;
execute stmt;
deallocate prepare stmt;
Or, in MariaDB since 10.2:
EXECUTE IMMEDIATE concat('select * from a where a.id in (',@ids,')');
Another alternative is to use FIND_IN_SET, as shown in another answer, but that will not use an index to look up ids, so it may be inefficient.
So I have a query that needs to return a JSON object, but the DISTINCT doesn't work no matter what I try. I've been running a series of other tests, and no matter what, the 'WBS' always shows up in three or more duplicated entries. Anyone got any ideas?
I am working in ASP.NET 6 MVC.
PROCEDURE GET_BASELINE_RPT (in_WBS_LEVEL_ID IN NUMBER, in_FISCAL_YEAR IN VARCHAR2,
in_FISCAL_MONTH IN VARCHAR2, RET OUT CLOB) AS
BEGIN
WITH cte AS (
SELECT /*+MATERIALIZE*/ DISTINCT L.WBS_LEVEL_ID
FROM
WBS_LEVEL L
)
SELECT
JSON_ARRAYAGG (
JSON_OBJECT (
'WBS' VALUE L.WBS_LEVEL_NAME,
'Title' VALUE W.DESCRIPTION,
'Rev' VALUE B.REV_NUMBER,
'ScopeStatus' VALUE W.STATUS,
'BCP' VALUE CASE WHEN BC.FISCAL_YEAR = 0 THEN '' ELSE
SUBSTR(BC.FISCAL_YEAR,3,2)||'-'||LPAD(BC.BCP_FISCAL_ID, 3, '0') END,
'BCPApprovalDate' VALUE BC.APPROVAL_DATE,
'Manager' VALUE P1.NICK_NAME,
'ProjectControlManager' VALUE P2.NICK_NAME,
'ProjectControlEngineer' VALUE P3.NICK_NAME,
'FiscalYear' VALUE W.FISCAL_YEAR,
'FiscalMonth' VALUE W.FISCAL_MONTH,
'WBSNumber' VALUE L.WBS_LEVEL_ID
) RETURNING CLOB)
INTO RET
FROM WBS_LEVEL L
LEFT OUTER JOIN BASELINE_RPT B ON L.WBS_LEVEL_ID = B.WBS_LEVEL_ID
JOIN BCP BC ON BC.BCP_ID = B.BCP_ID
LEFT OUTER JOIN WBS_TREE_MOD W ON L.WBS_LEVEL_ID = W.WBS_LEVEL_ID
LEFT OUTER JOIN VW_SITEPEOPLE P1 ON W.WBS_MANAGER_SNUMBER = P1.SNUMBER
LEFT OUTER JOIN VW_SITEPEOPLE P2 ON W.PCM_SNUMBER = P2.SNUMBER
LEFT OUTER JOIN VW_SITEPEOPLE P3 ON W.PCE_SNUMBER = P3.SNUMBER
ORDER BY L.WBS_LEVEL_NAME, B.REV_NUMBER DESC;
END GET_BASELINE_RPT;
So it turns out I wasn't getting duplicates at all. There were differences in the data, but so many columns had the same values that I didn't notice the differences until another review. I will try to rework my query, but to be honest I may just put a filter in my C#.
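For completeness, if the goal is one row per WBS (say, the latest revision), a windowed filter in the SQL itself might avoid the C# post-processing. A minimal sketch (assumes the highest REV_NUMBER per WBS_LEVEL_ID is the row to keep; trim the column list as needed):
SELECT WBS_LEVEL_NAME, REV_NUMBER, DESCRIPTION
FROM (
    SELECT L.WBS_LEVEL_ID,
           L.WBS_LEVEL_NAME,
           B.REV_NUMBER,
           W.DESCRIPTION,
           ROW_NUMBER() OVER (PARTITION BY L.WBS_LEVEL_ID
                              ORDER BY B.REV_NUMBER DESC) AS rn
    FROM WBS_LEVEL L
    LEFT OUTER JOIN BASELINE_RPT B ON L.WBS_LEVEL_ID = B.WBS_LEVEL_ID
    LEFT OUTER JOIN WBS_TREE_MOD W ON L.WBS_LEVEL_ID = W.WBS_LEVEL_ID
)
WHERE rn = 1;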
Trying to optimize a query which has multiple counts for objects in a subordinate table (done with aliases in SQLAlchemy). In Witch Academia terms, something like this:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(tried_witch.id) AS tried,
count(passed_witch.id) AS passed,
count(failed_witch.id) AS failed
FROM exam
LEFT OUTER JOIN witch AS tried_witch
ON tried_witch.exam_id = exam.id AND
tried_witch.is_failed = 0 AND
tried_witch.status != "passed"
LEFT OUTER JOIN witch AS passed_witch
ON passed_witch.exam_id = exam.id AND
passed_witch.is_failed = 0 AND
passed_witch.status = "passed"
LEFT OUTER JOIN witch AS failed_witch
ON failed_witch.exam_id = exam.id AND
failed_witch.is_failed = 1
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
The number of witches can be large (hundreds of thousands) and the number of exams is lower (hundreds), so the above query is quite slow. In a lot of similar questions I've found answers that propose the above, but I feel like a totally different approach is needed here, and I am stuck coming up with an alternative. NB: there is a need to order by the calculated counts, and it is also important to have zeros as counts where due. (Do not pay attention to the somewhat funny model: witches can easily clone themselves to go to multiple exams, hence the per-exam identity.)
With one EXISTS subquery, which is not reflected in the above and does not influence the outcome, the situation is:
# Query_time: 1.135747 Lock_time: 0.000209 Rows_sent: 20 Rows_examined: 98174
# Rows_affected: 0
# Full_scan: Yes Full_join: No Tmp_table: Yes Tmp_table_on_disk: Yes
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
Updated query, which is still quite slow:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(CASE WHEN (witch.status != "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS tried,
count(CASE WHEN (witch.status = "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS passed,
count(CASE WHEN (witch.is_failed = 1)
THEN witch.id
ELSE NULL END) AS failed
FROM exam
LEFT OUTER JOIN witch ON witch.exam_id = exam.id
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
Indexing is the key to getting good performance out of this query.
I do not know MariaDB at all, so I am not sure what the possibilities are. But if it is anything like Microsoft SQL Server, here is what I would try:
Create ONE composite index covering ALL the required columns: witch_id, status and is_failed. If the query uses that index, that should be it. The order of the included columns might be very important here. Then profile the query to see whether the index is actually used; see the Optimization and Indexes documentation page.
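For illustration, such an index might look like the following (a sketch only: the answer names witch_id, but the join in the question is on exam_id, so that column is assumed as the leading one here; confirm the order by profiling):
-- one composite index covering the join key and both filter columns
CREATE INDEX ix_witch_exam_failed_status ON witch (exam_id, is_failed, status);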
Consider Generated (Virtual and Persistent) Columns.
It looks like all the information needed to classify a witch into the tried, passed or failed bucket is contained in the witch row itself. Therefore, you can create those virtual columns directly on the database table and use the PERSISTENT option, which allows creating an index on them. Then you can create an index specifically for this query containing witch_id and the three virtual columns: tried, passed and failed. Make sure your query uses it, and that should be pretty good. The query will then look very simple:
SELECT exam.id,
exam.name,
sum(witch.tried) AS tried,
sum(witch.passed) AS passed,
sum(witch.failed) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(witch.tried)
LIMIT 20
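For reference, the persistent generated columns could be declared roughly as follows (a sketch in MariaDB syntax; the column types and index name are assumptions):
-- flags computed once per row at INSERT/UPDATE time, stored, and therefore indexable
ALTER TABLE witch
    ADD COLUMN tried  TINYINT AS (IF(is_failed = 0 AND status <> 'passed', 1, 0)) PERSISTENT,
    ADD COLUMN passed TINYINT AS (IF(is_failed = 0 AND status = 'passed', 1, 0)) PERSISTENT,
    ADD COLUMN failed TINYINT AS (IF(is_failed = 1, 1, 0)) PERSISTENT;
-- an index the grouped query above can use
CREATE INDEX ix_witch_exam_flags ON witch (exam_id, tried, passed, failed);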
Although the query uses only simple comparisons and AND/OR clauses, you are basically offloading the calculation of the three statuses to the database at INSERT/UPDATE time. The SELECT should then be much faster.
Your example does not specify any result filtering (WHERE clause), but if you have one, it might also have an impact on the way one optimises indices for query performance.
Original answer: below is the originally proposed change to the query.
Here I assume that the indexing part of the optimisation has already been done.
Could you try SUM instead of COUNT?
SELECT exam.id,
exam.name,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END) AS tried,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status = 'passed') THEN 1
ELSE 0
END) AS passed,
sum(CASE
WHEN (witch.is_failed = 1) THEN 1
ELSE 0
END) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END)
LIMIT 20
The rest:
Given that you mentioned SQLAlchemy in your question, here is the SQLAlchemy code which I used to model and generate the query:
# model
from sqlalchemy import Column, ForeignKey, Integer, String, and_, case
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import relationship

Base = declarative_base()

class Exam(Base):
    __tablename__ = 'exam'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Witch(Base):
    __tablename__ = 'witch'
    id = Column(Integer, primary_key=True)
    exam_id = Column(Integer, ForeignKey('exam.id'))
    is_failed = Column(Integer)
    status = Column(String)
    exam = relationship(Exam, backref='witches')

    # computed fields
    @hybrid_property
    def tried(self):
        return self.is_failed == 0 and self.status != 'passed'

    @hybrid_property
    def passed(self):
        return self.is_failed == 0 and self.status == 'passed'

    @hybrid_property
    def failed(self):
        return self.is_failed == 1

    # computed fields: expression (the names match the properties above,
    # as required for the hybrids to pick them up)
    @tried.expression
    def tried(cls):
        return case([(and_(
            cls.is_failed == 0,
            cls.status != 'passed',
        ), 1)], else_=0)

    @passed.expression
    def passed(cls):
        return case([(and_(
            cls.status == 'passed',
            cls.is_failed == 0,
        ), 1)], else_=0)

    @failed.expression
    def failed(cls):
        return case([(cls.is_failed == 1, 1)], else_=0)
and:
# query
q = (
session.query(
Exam.id, Exam.name,
func.sum(Witch.tried).label("tried"),
func.sum(Witch.passed).label("passed"),
func.sum(Witch.failed).label("failed"),
)
.join(Witch)
.group_by(Exam.id, Exam.name)
.order_by(func.sum(Witch.tried))
.limit(20)
)
I have a scenario in which I have a table A with fixed data of 7 rows, as below:
Range
0-59
60-119
120-179
180-239
240-299
300-499
500+
Now I have another table B which holds units. I have to show all records of table A, with the matching unit value for each range the units fall into, and 0 or NULL for the remaining ranges.
So in order to join the two tables, I created a column in table B with a CASE statement so that each row carries its range as a column:
(case
when v between '0' and '59' then '0 to 59'
when v between '60' and '119' then '60 to 119'
when v between '120' and '179' then '120 to 179'
when v between '180' and '239' then '180 to 239'
when v between '240' and '299' then '240 to 299'
when v between '300' and '499' then '300 to 499'
when v >= '500' then '500+'
else 'other'
end)
Then I joined these two tables. Now when I populate the records, I get only the single matching row; my requirement is to show all ranges as well.
Please refer to the screenshots below for clarification of the scenario:
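What the requirement seems to call for is a LEFT JOIN driven from the ranges table, so unmatched ranges still appear. A minimal sketch, with every table and column name here being an assumption:
SELECT a.range_label,
       COALESCE(SUM(b.units), 0) AS total_units
FROM table_a a
LEFT JOIN (
    SELECT units,
           CASE
               WHEN v BETWEEN 0 AND 59 THEN '0-59'
               WHEN v BETWEEN 60 AND 119 THEN '60-119'
               WHEN v BETWEEN 120 AND 179 THEN '120-179'
               WHEN v BETWEEN 180 AND 239 THEN '180-239'
               WHEN v BETWEEN 240 AND 299 THEN '240-299'
               WHEN v BETWEEN 300 AND 499 THEN '300-499'
               ELSE '500+' -- v >= 500, assuming v is numeric and non-negative
           END AS range_label
    FROM table_b
) b ON b.range_label = a.range_label
GROUP BY a.range_label;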
I'm running the following stored procedure, which joins the Classes and Dates tables. However, out of seven test records, I'm getting a single duplicate record in the results:
SELECT DISTINCT dbo.Classes.Title, dbo.Classes.ClassTime, dbo.Classes.Category,
dbo.Classes.SubCategory1, dbo.Classes.[Description],dbo.Classes.ContactName,
dbo.Classes.ContactPhone, dbo.Classes.Location, dbo.Classes.Room,
dbo.Dates.StartDate
FROM dbo.Classes INNER JOIN dbo.Dates ON dbo.Classes.ClassID = dbo.Dates.ClassID
ORDER BY StartDate DESC
Most likely one of the Date columns of those two rows differs somewhat, but the difference doesn't show up in the output.
You can verify this by dropping those columns from your results.
SELECT DISTINCT dbo.Classes.Title
, dbo.Classes.Category
, dbo.Classes.SubCategory1
, dbo.Classes.[Description]
, dbo.Classes.ContactName
, dbo.Classes.ContactPhone
, dbo.Classes.Location
, dbo.Classes.Room
FROM dbo.Classes
INNER JOIN dbo.Dates ON dbo.Classes.ClassID = dbo.Dates.ClassID
On a different note, I would advise you to use aliases to improve the readability of the statement.
SELECT DISTINCT c.Title
, c.Category
, c.SubCategory1
, c.[Description]
, c.ContactName
, c.ContactPhone
, c.Location
, c.Room
FROM dbo.Classes AS c
INNER JOIN dbo.Dates AS d ON c.ClassID = d.ClassID