I am trying to run the query below in Teradata SQL Assistant. Unfortunately I am running into a spool space error. Any tips on how to make this query execute using fewer resources?
SELECT TOP 10
A.USER_ID,
C.SUPER_NM_LVL2,
C.SUPER_NM_LVL1,
DIR.SUPER_NM_LVL2,
A.ACTIVITY_DT,
B.MTN,
A.ACCT_NUM,
A.PPLAN_CD_CURR
FROM DLY_LINE_ACTIVITY_PPLAN_V AS A
INNER JOIN
CUST_ACCT_LINE_V AS B
ON A.ACCT_NUM=B.ACCT_NUM
INNER JOIN
HR_EMPLOYEE_V AS C
ON A.USER_ID=C.NT_USER_ID
INNER JOIN
HR_EMPLOYEE_V AS DIR
ON C.SUPER_ID_LVL2_EMP_ID=DIR.EMP_ID
WHERE
A.ACTIVITY_DT >= '2012-11-01'
AND A.ACTIVITY_DT <= '2012-11-19'
AND C.EMP_AREA_CD = 'WE'
AND A.PPLAN_CD_CURR IN ('86489', '86491', '86492',
'86494', '86495', '86496', '86497', '86498', '86499', '86500', '86501',
'86502', '86487', '86489', '86504', '86505', '86506', '86507', '86508',
'86509', '86510', '86511', '86512')
GROUP BY
A.USER_ID,
C.SUPER_NM_LVL2,
C.SUPER_NM_LVL1,
DIR.SUPER_NM_LVL2,
A.ACTIVITY_DT,
B.MTN,
A.ACCT_NUM,
A.PPLAN_CD_CURR
EDIT: Here is the EXPLAIN output.
Explanation
1) First, we lock UDM_PRDSTG_SECTBLS.MASKING_PROD_NM_XREF in view
NTL_PRD_ALLVM.CUST_ACCT_LINE_V for access, we lock
UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V for access, we lock
UDM_PRD_TBLS.CUST_ACCT_LINE in view NTL_PRD_ALLVM.CUST_ACCT_LINE_V
for access, and we lock UDM_PRD_TBLS.HR_EMPLOYEE in view
NTL_PRD_ALLVM.HR_EMPLOYEE_V for access.
2) Next, we execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from a single partition of
UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V with a condition of (
"(NOT (UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.ACCT_NUM IS NULL ))
AND ((NOT (UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.USER_ID IS NULL ))
AND ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.ACTIVITY_DT <= DATE
'2012-11-19') AND ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in
view NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.ACTIVITY_DT >=
DATE '2012-11-01') AND ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN
in view NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR
= '86489') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86487') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86491') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86492') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86494') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86495') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86496') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86497') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86498') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86499') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86500') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86501') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86502') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86504') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86505') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86506') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86507') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86508') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86509') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86510') OR ((UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86511') OR (UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN in view
NTL_PRD_ALLVM.DLY_LINE_ACTIVITY_PPLAN_V.PPLAN_CD_CURR =
'86512')))))))))))))))))))))))))") into Spool 9 (all_amps)
(compressed columns allowed), which is built locally on the
AMPs. The size of Spool 9 is estimated with low confidence
to be 2,908,683 rows (119,256,003 bytes). The estimated time
for this step is 0.02 seconds.
2) We do an all-AMPs RETRIEVE step from UDM_PRD_TBLS.HR_EMPLOYEE
in view NTL_PRD_ALLVM.HR_EMPLOYEE_V by way of an all-rows
scan with a condition of ("(NOT (UDM_PRD_TBLS.HR_EMPLOYEE in
view NTL_PRD_ALLVM.HR_EMPLOYEE_V.NT_USER_ID IS NULL )) AND
((NOT (UDM_PRD_TBLS.HR_EMPLOYEE in view
NTL_PRD_ALLVM.HR_EMPLOYEE_V.SUPER_ID_LVL2_EMP_ID IS NULL ))
AND (UDM_PRD_TBLS.HR_EMPLOYEE in view
NTL_PRD_ALLVM.HR_EMPLOYEE_V.EMP_AREA_CD = 'WE'))") into Spool
10 (all_amps) (compressed columns allowed), which is
duplicated on all AMPs. The size of Spool 10 is estimated
with low confidence to be 31,938,444 rows (2,012,121,972
bytes). The estimated time for this step is 0.67 seconds.
3) We do an all-AMPs RETRIEVE step from UDM_PRD_TBLS.A in view
NTL_PRD_ALLVM.CUST_ACCT_LINE_V by way of an all-rows scan
with a condition of ("(( CASE WHEN (NOT (UDM_PRD_TBLS.A in
view NTL_PRD_ALLVM.CUST_ACCT_LINE_V.CUST_ASSOC_ID IS NULL ))
THEN (UDM_PRD_TBLS.A in view
NTL_PRD_ALLVM.CUST_ACCT_LINE_V.CUST_ASSOC_ID) ELSE (' ') END
))<> '12'") into Spool 11 (all_amps) (compressed columns
allowed) fanned out into 14 hash join partitions, which is
built locally on the AMPs. The size of Spool 11 is estimated
with no confidence to be 186,602,214 rows (5,784,668,634
bytes). The estimated time for this step is 2.27 seconds.
3) We do an all-AMPs JOIN step from Spool 9 (Last Use) by way of an
all-rows scan, which is joined to Spool 10 (Last Use) by way of an
all-rows scan. Spool 9 and Spool 10 are joined using a single
partition hash join, with a join condition of ("USER_ID =
NT_USER_ID"). The result goes into Spool 12 (all_amps)
(compressed columns allowed), which is duplicated on all AMPs into
14 hash join partitions. The size of Spool 12 is estimated with
low confidence to be 360,568,332 rows (30,648,308,220 bytes). The
estimated time for this step is 8.09 seconds.
4) We do an all-AMPs JOIN step from Spool 11 (Last Use) by way of an
all-rows scan, which is joined to Spool 12 (Last Use) by way of an
all-rows scan. Spool 11 and Spool 12 are joined using a hash join
of 14 partitions, with a join condition of ("ACCT_NUM = ACCT_NUM").
The result goes into Spool 13 (all_amps) (compressed columns
allowed), which is redistributed by the hash code of (
UDM_PRD_TBLS.HR_EMPLOYEE.SUPER_ID_LVL2_EMP_ID) to all AMPs. The
size of Spool 13 is estimated with no confidence to be 8,247,457
rows (783,508,415 bytes). The estimated time for this step is
1.01 seconds.
5) We do an all-AMPs JOIN step from UDM_PRD_TBLS.HR_EMPLOYEE in view
NTL_PRD_ALLVM.HR_EMPLOYEE_V by way of an all-rows scan with no
residual conditions, which is joined to Spool 13 (Last Use) by way
of an all-rows scan. UDM_PRD_TBLS.HR_EMPLOYEE and Spool 13 are
joined using a single partition hash join, with a join condition
of ("SUPER_ID_LVL2_EMP_ID = UDM_PRD_TBLS.HR_EMPLOYEE.EMP_ID").
The result goes into Spool 8 (all_amps) (compressed columns
allowed), which is built locally on the AMPs. The size of Spool 8
is estimated with no confidence to be 8,247,457 rows (882,477,899
bytes). The estimated time for this step is 0.06 seconds.
6) We do an all-AMPs SUM step to aggregate from Spool 8 (Last Use) by
way of an all-rows scan , grouping by field1 (
UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN.USER_ID
,UDM_PRD_TBLS.HR_EMPLOYEE.SUPER_NM_LVL2
,UDM_PRD_TBLS.HR_EMPLOYEE.SUPER_NM_LVL1
,UDM_PRD_TBLS.HR_EMPLOYEE.SUPER_NM_LVL2
,UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN.ACTIVITY_DT
,UDM_PRD_TBLS.A.MTN ,UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN.ACCT_NUM
,UDM_PRD_TBLS.DLY_LINE_ACTIVITY_PPLAN.PPLAN_CD_CURR). Aggregate
Intermediate Results are computed globally, then placed in Spool
14. The size of Spool 14 is estimated with no confidence to be
8,247,455 rows (2,762,897,425 bytes). The estimated time for this
step is 0.76 seconds.
7) We do an all-AMPs STAT FUNCTION step from Spool 14 by way of an
all-rows scan into Spool 18, which is redistributed by hash code
to all AMPs. The result rows are put into Spool 5 (all_amps),
which is built locally on the AMPs. This step is used to retrieve
the TOP 10 rows. Load distribution optimization is used.
If this step retrieves less than 10 rows, then execute step 8.
The size is estimated with no confidence to be 10 rows (1,130
bytes).
8) We do an all-AMPs STAT FUNCTION step from Spool 14 (Last Use) by
way of an all-rows scan into Spool 18 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 5 (all_amps), which is built locally on the AMPs. This
step is used to retrieve the TOP 10 rows. The size is estimated
with no confidence to be 10 rows (1,130 bytes).
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 5 are sent back to the user as the result of
statement 1.
Try moving your limiting criteria closer to your massive tables:
SELECT TOP 10
A.USER_ID,
C.SUPER_NM_LVL2,
C.SUPER_NM_LVL1,
DIR.SUPER_NM_LVL2,
A.ACTIVITY_DT,
B.MTN,
A.ACCT_NUM,
A.PPLAN_CD_CURR
FROM
( SELECT * FROM DLY_LINE_ACTIVITY_PPLAN_V
WHERE
ACTIVITY_DT >= '2012-11-01'
AND ACTIVITY_DT <= '2012-11-19'
AND PPLAN_CD_CURR IN ('86489', '86491', '86492',
'86494', '86495', '86496', '86497', '86498', '86499', '86500', '86501',
'86502', '86487', '86489', '86504', '86505', '86506', '86507', '86508',
'86509', '86510', '86511', '86512')
)
AS A
INNER JOIN
CUST_ACCT_LINE_V AS B
ON A.ACCT_NUM=B.ACCT_NUM
INNER JOIN
(SELECT * FROM HR_EMPLOYEE_V
WHERE
EMP_AREA_CD = 'WE' ) AS C
ON A.USER_ID=C.NT_USER_ID
INNER JOIN
HR_EMPLOYEE_V AS DIR
ON C.SUPER_ID_LVL2_EMP_ID=DIR.EMP_ID
GROUP BY
A.USER_ID,
C.SUPER_NM_LVL2,
C.SUPER_NM_LVL1,
DIR.SUPER_NM_LVL2,
A.ACTIVITY_DT,
B.MTN,
A.ACCT_NUM,
A.PPLAN_CD_CURR
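If the derived-table rewrite still blows out your spool, another common Teradata tactic is to materialize the filtered activity rows into a volatile table first, so the later joins only ever see the small slice. A minimal sketch, assuming ACCT_NUM is a sensible primary index for the join and that the filtered slice really is small (both assumptions, not facts from the question):

-- Hedged sketch: stage only the filtered activity rows, then join to them.
-- The table name, PRIMARY INDEX choice and statistics columns are illustrative.
CREATE VOLATILE TABLE vt_activity AS (
    SELECT USER_ID, ACTIVITY_DT, ACCT_NUM, PPLAN_CD_CURR
    FROM DLY_LINE_ACTIVITY_PPLAN_V
    WHERE ACTIVITY_DT BETWEEN DATE '2012-11-01' AND DATE '2012-11-19'
      AND PPLAN_CD_CURR IN ('86487','86489','86491','86492','86494','86495',
                            '86496','86497','86498','86499','86500','86501',
                            '86502','86504','86505','86506','86507','86508',
                            '86509','86510','86511','86512')
) WITH DATA
PRIMARY INDEX (ACCT_NUM)
ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS ON vt_activity COLUMN (ACCT_NUM);
COLLECT STATISTICS ON vt_activity COLUMN (USER_ID);

Then run the SELECT above against vt_activity instead of DLY_LINE_ACTIVITY_PPLAN_V. Keep in mind that volatile tables live in your spool allocation too, so this only helps if the filtered row set is genuinely small.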
Related
This query worked perfectly until the moment I went on vacation; now it does not run anymore and does not merge, and I don't know what the cause could be.
MERGE INTO STG_FATO_MACRO_GESTAO AS FAT
USING(SELECT DISTINCT
COD_EMPRESA
,FUN.MATRICULA AS FUN_MAT
,APR.MATRICULA AS APR_MAT
,FUN.CPF AS FUN_CPF
,APR.CPF AS APR_CPF
,APR.DAT_DESLIGAMENTO
,YEAR(APR.DAT_DESLIGAMENTO)*100+MONTH(APR.DAT_DESLIGAMENTO) AS DESL
,FUN.DATA_ADMISSAO
,YEAR(FUN.DATA_ADMISSAO)*100+MONTH(FUN.DATA_ADMISSAO) AS ADM
, CASE WHEN YEAR(APR.DAT_DESLIGAMENTO)*100+MONTH(APR.DAT_DESLIGAMENTO) <= YEAR(FUN.DATA_ADMISSAO)*100+MONTH(FUN.DATA_ADMISSAO) THEN 1 ELSE 0 END AS ADMITIDO
,CASE WHEN FUN.DATA_ADMISSAO <= (APR.DAT_DESLIGAMENTO + INTERVAL '90' DAY) THEN 1 ELSE 0 END AS APR_90
FROM (SELECT CPF,DATA_ADMISSAO, MATRICULA, COD_EMPRESA FROM DIM_FUNCIONARIO
WHERE PROFISSAO NOT LIKE '%APRENDIZ%') AS FUN
INNER JOIN (SELECT DISTINCT
CPF,DAT_DESLIGAMENTO,MATRICULA
FROM HST_APRENDIZ
WHERE FLAG_FECHAMENTO = 2
AND DAT_DESLIGAMENTO IS NOT NULL) AS APR
ON FUN.CPF = APR.CPF) AS APR_90
ON FAT.COD_EMPRESA = APR_90.COD_EMPRESA
AND FAT.MATRICULA = APR_90.FUN_MAT
AND APR_90.APR_90 = 1
AND APR_90.ADMITIDO = 1
WHEN MATCHED THEN
UPDATE SET APRENDIZ_EFETIVADO_90 = 1
;
When I run this query it returns this error:
"The search condition must fully specify the Target table primary index and partition column(s) and expression must match INSERT specification primary index and partition column(s). "
I have a SQL query in Teradata that returns a result set of ~160m rows in (I guess) a reasonable time: depending on how good a day the server is having, it runs in between 10 and 60 minutes.
I recently got access to space to save it as a table; however, using my initial query and the "insert into" command I get error 2646 - no more spool.
The query structure is:
insert into <test_DB.tablename>
with smaller_dataset as
(
select
*
from
(
select
items
,case items
from
<Database.table>
where 1=1
and other things
QUALIFY ROW_NUMBER() OVER (PARTITION BY A,B ORDER BY C desc , LAST_UPDATE_DTM DESC) = 1
) T --irrelevant alias for subquery
QUALIFY ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY C desc) = 1)
, employee_table as
(
select
items
,max(J1.field1) J1_field1
,max(J2.field1) J2_field1
,max(J3.field1) J3_field1
,max(J4.field1) J4_field1
from smaller_dataset S
self joins J1,J2,J3,J4
group by
non-aggregate items
)
select
items
case items
from employee_table
;
How can I break up the return into smaller chunks to prevent this error?
I have a big table which is 100k rows in size, and the PRIMARY KEY is of the datatype NUMBER. The data in this column is populated using a random number generator.
So my question is: is there a SQL query that can partition the table evenly into ranges of values? E.g., if my column values are like this:
1
2
3
4
5
6
7
8
9
10
And I would like this to be broken into three partitions, then I would expect an output like this:
Range 1 1-3
Range 2 4-7
Range 3 8-10
It sounds like you want the WIDTH_BUCKET() function.
This query will give you the start and end range for a table of 1250 rows split into 20 buckets based on id:
with bkt as (
select id
, width_bucket(id, 1, 1251, 20) as id_bucket
from t23
)
select id_bucket
, min(id) as bkt_start
, max(id) as bkt_end
, count(*)
from bkt
group by id_bucket
order by 1
;
The two middle parameters specify min and max values; the last parameter specifies the number of buckets. The output is the rows between the minimum and maximum bounds split as evenly as possible into the specified number of buckets. Be careful with the min and max parameters; I've found poorly chosen bounds can have an odd effect on the split.
This solution works without the width_bucket function. While it is more verbose and certainly less efficient, it will split the data as evenly as possible, even if some ID values are missing.
CREATE TABLE t AS
SELECT rownum AS id
FROM dual
CONNECT BY level <= 10;
WITH
data AS (
SELECT id, rownum as row_num
FROM t
),
total AS (
SELECT count(*) AS total_rows
FROM data
),
parts AS (
SELECT rownum as part_no, total.total_rows, total.total_rows / 3 as part_rows
FROM dual, total
CONNECT BY level <= 3
),
bounds AS (
SELECT parts.part_no,
parts.total_rows,
parts.part_rows,
COALESCE(LAG(data.row_num) OVER (ORDER BY parts.part_no) + 1, 1) AS start_row_num,
data.row_num AS end_row_num
FROM data
JOIN parts
ON data.row_num = ROUND(parts.part_no * parts.part_rows, 0)
)
SELECT bounds.part_no, d1.ID AS start_id, d2.ID AS end_id
FROM bounds
JOIN data d1
ON d1.row_num = bounds.start_row_num
JOIN data d2
ON d2.row_num = bounds.end_row_num
ORDER BY bounds.part_no;
PART_NO START_ID END_ID
---------- ---------- ----------
1 1 3
2 4 7
3 8 10
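If analytic functions are available, NTILE gives the same kind of row-count based split with much less machinery. A hedged sketch against the same 10-row table t created above; note that NTILE hands the extra rows to the lowest-numbered buckets, so this particular data comes out 4/3/3 rather than the 3/4/3 shown in the question:

select part_no,
       min(id) as start_id,
       max(id) as end_id,
       count(*) as rows_in_part
from (
  select id,
         ntile(3) over (order by id) as part_no  -- 3 = number of partitions wanted
  from t
)
group by part_no
order by part_no;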
I am responsible for an old time recording system which was written in ASP.net Web Forms using ADO.Net 2.0 for persistence.
Basically the system allows users to add details about a piece of work they are doing, the amount of hours they have been assigned to complete the work as well as the amount of hours they have spent on the work to date.
The system also has a reporting facility with the reports based on SQL queries. Recently I have noticed that many reports being run from the system have become very slow to execute. The database has around 11 tables, and it doesn’t store too much data. 27,000 records is the most records any one table holds, with the majority of tables well below even 1,500 records.
I therefore don't think the issue is related to large volumes of data; I think it is more to do with poorly constructed SQL queries, and possibly the database design as well.
For example, there are queries similar to this
#start_date datetime,
#end_date datetime,
#org_id int
select distinct t1.timesheet_id,
t1.proposal_job_ref,
t1.work_date AS [Work Date],
consultant.consultant_fname + ' ' + consultant.consultant_lname AS [Person],
proposal.proposal_title AS [Work Title],
t1.timesheet_time AS [Hours],
--GET TOTAL DAYS ASSIGNED TO PROPOSAL
(select sum(proposal_time_assigned.days_assigned)-- * 8.0)
from proposal_time_assigned
where proposal_time_assigned.proposal_ref_code = t1.proposal_job_ref )
as [Total Days Assigned],
--GET TOTAL DAYS SPENT ON THE PROPOSAL SINCE 1ST APRIL 2013
(select isnull(sum(t2.timesheet_time / 8.0), '0')
from timesheet_entries t2
where t2.proposal_job_ref = t1.proposal_job_ref
and t2.work_date <= t1.work_date
and t2.work_date >= '01/04/2013' )
as [Days Spent Since 1st April 2013],
--GET TOTAL DAYS REMAINING ON THE PROPOSAL
(select sum(proposal_time_assigned.days_assigned)
from proposal_time_assigned
where proposal_time_assigned.proposal_ref_code = t1.proposal_job_ref )
-
(select sum(t2.timesheet_time / 8.0)
from timesheet_entries t2
where t2.proposal_job_ref = t1.proposal_job_ref
and t2.work_date <= t1.work_date
) as [Total Days Remaining]
from timesheet_entries t1,
consultant,
proposal,
proposal_time_assigned
where (proposal_time_assigned.consultant_id = consultant.consultant_id)
and (t1.proposal_job_ref = proposal.proposal_ref_code)
and (proposal_time_assigned.proposal_ref_code = t1.proposal_job_ref)
and (t1.code_id = #org_id) and (t1.work_date >= #start_date) and (t1.work_date <= #end_date)
and (t1.proposal_job_ref <> '0')
order by 2, 3
Which are expected to return data for reports. I am not even sure if anyone can follow what is happening in the query above, but basically there are quite a few calculations happening, e.g. dividing, multiplying, subtraction. I am guessing this is what is slowing down the SQL queries.
I suppose my question is: can anyone make enough sense of the query above to suggest how to speed it up?
Also, should calculations like the ones mentioned above ever be carried out in an SQL query, or should this be done within code?
Any help would be really appreciated with this one.
Thanks.
Based on the information given I had to make an educated guess about certain table relationships. If you post the table structures, indexes etc. we can complete the remaining columns in this query.
As of right now this query calculates "Days Assigned", "Days Spent" and "Days Remaining"
for the KEY "timesheet_id and proposal_job_ref".
What we have to see is how "work_date", "timesheet_time", "[Person]" and "proposal_title" are associated with that.
Are these calculations by person and proposal_title as well?
You can use SQL Fiddle to provide us with the sample data and expected output, so we can work off meaningful data instead of guessing.
SELECT
q1.timesheet_id
,q1.proposal_job_ref
,q1.[Total Days Assigned]
,q2.[Days Spent Since 1st April 2013]
,(
q1.[Total Days Assigned]
-
q2.[Days Spent Since 1st April 2013]
) AS [Total Days Remaining]
FROM
(
select
t1.timesheet_id
,t1.proposal_job_ref
,sum(t4.days_assigned) as [Total Days Assigned]
from tbl1.timesheet_entries t1
JOIN tbl1.proposal t2
ON t1.proposal_job_ref=t2.proposal_ref_code
JOIN tbl1.proposal_time_assigned t4
ON t4.proposal_ref_code = t1.proposal_job_ref
JOIN tbl1.consultant t3
ON t3.consultant_id=t4.consultant_id
WHERE t1.code_id = #org_id
AND t1.work_date BETWEEN #start_date AND #end_date
AND t1.proposal_job_ref <> '0'
GROUP BY t1.timesheet_id,t1.proposal_job_ref
)q1
JOIN
(
select
tbl1.timesheet_id,tbl1.proposal_job_ref
,isnull(sum(tbl1.timesheet_time / 8.0), '0') AS [Days Spent Since 1st April 2013]
from tbl1.timesheet_entries tbl1
JOIN tbl1.timesheet_entries tbl2
ON tbl1.proposal_job_ref=tbl2.proposal_job_ref
AND tbl2.work_date <= tbl1.work_date
AND tbl2.work_date >= '01/04/2013'
WHERE tbl1.code_id = #org_id
AND tbl1.work_date BETWEEN #start_date AND #end_date
AND tbl1.proposal_job_ref <> '0'
GROUP BY tbl1.timesheet_id,tbl1.proposal_job_ref
)q2
ON q1.timesheet_id=q2.timesheet_id
AND q1.proposal_job_ref=q2.proposal_job_ref
The problems I see in your query are:
1> Alias names are not provided for the tables.
2> Subqueries are used (which are costly to execute) instead of a WITH clause.
If I were to write your query, it would look like this:
select distinct t1.timesheet_id,
t1.proposal_job_ref,
t1.work_date AS [Work Date],
c1.consultant_fname + ' ' + c1.consultant_lname AS [Person],
p1.proposal_title AS [Work Title],
t1.timesheet_time AS [Hours],
--GET TOTAL DAYS ASSIGNED TO PROPOSAL
(select sum(pta2.days_assigned)-- * 8.0)
from proposal_time_assigned pta2
where pta2.proposal_ref_code = t1.proposal_job_ref )
as [Total Days Assigned],
--GET TOTAL DAYS SPENT ON THE PROPOSAL SINCE 1ST APRIL 2013
(select isnull(sum(t2.timesheet_time / 8.0), 0)
from timesheet_entries t2
where t2.proposal_job_ref = t1.proposal_job_ref
and t2.work_date <= t1.work_date
and t2.work_date >= '01/04/2013' )
as [Days Spent Since 1st April 2013],
--GET TOTAL DAYS REMAINING ON THE PROPOSAL
(select sum(pta2.days_assigned)
from proposal_time_assigned pta2
where pta2.proposal_ref_code = t1.proposal_job_ref )
-
(select sum(t2.timesheet_time / 8.0)
from timesheet_entries t2
where t2.proposal_job_ref = t1.proposal_job_ref
and t2.work_date <= t1.work_date
) as [Total Days Remaining]
from timesheet_entries t1,
consultant c1,
proposal p1,
proposal_time_assigned pta1
where (pta1.consultant_id = c1.consultant_id)
and (t1.proposal_job_ref = p1.proposal_ref_code)
and (pta1.proposal_ref_code = t1.proposal_job_ref)
and (t1.code_id = #org_id) and (t1.work_date >= #start_date) and (t1.work_date <= #end_date)
and (t1.proposal_job_ref <> '0')
order by 2, 3
Check the above query for any indexing options and the number of records to be processed from each table.
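On point 2>, at least the [Total Days Assigned] figure can be pre-aggregated once in a WITH clause and joined in, instead of being re-run as a correlated subquery for every row. A hedged, partial sketch using only columns from the question (the running [Days Spent] figure depends on each row's work_date, so it is left out here; the remaining display columns would join back in exactly as in the query above):

with assigned as (
    select proposal_ref_code,
           sum(days_assigned) as total_days_assigned
    from proposal_time_assigned
    group by proposal_ref_code
)
select distinct t1.timesheet_id,
       t1.proposal_job_ref,
       t1.work_date AS [Work Date],
       a.total_days_assigned AS [Total Days Assigned]
from timesheet_entries t1
join assigned a
  on a.proposal_ref_code = t1.proposal_job_ref
where t1.code_id = #org_id
  and t1.work_date >= #start_date
  and t1.work_date <= #end_date
  and t1.proposal_job_ref <> '0';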
Check your database for indexes on the following columns (if those columns are not indexed, then start by indexing each).
proposal_time_assigned.proposal_ref_code
proposal_time_assigned.consultant_id
timesheet_entries.code_id
timesheet_entries.proposal_job_ref
timesheet_entries.work_date
consultant.consultant_id
proposal.proposal_ref_code
Without all of these indexes, nothing will improve this query.
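A hedged sketch of those indexes as plain CREATE INDEX statements (the index names are made up; also, consultant_id and proposal_ref_code may already be covered by primary keys, in which case the last two statements are unnecessary):

create index ix_pta_proposal_ref_code on proposal_time_assigned (proposal_ref_code);
create index ix_pta_consultant_id on proposal_time_assigned (consultant_id);
create index ix_ts_code_id on timesheet_entries (code_id);
create index ix_ts_proposal_job_ref on timesheet_entries (proposal_job_ref);
create index ix_ts_work_date on timesheet_entries (work_date);
create index ix_consultant_id on consultant (consultant_id);
create index ix_proposal_ref_code on proposal (proposal_ref_code);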
The only thing in your query that would affect performance is the way you are filtering the [work_date]. Your current syntax causes a table scan:
--bad
and t2.work_date <= t1.work_date
and t2.work_date >= '01/04/2013'
This syntax uses an index (if it exists) and would be much faster:
--better
and t2.work_date between '01/04/2013' and t1.work_date
I have a query that runs as part of a function which produces a one-row table full of counts, averages, and comma-separated lists like this:
select
(select
count(*)
from vw_disp_details
where round = 2013
and rating = 1) applicants,
(select
count(*)
from vw_disp_details
where round = 2013
and rating = 1
and applied != 'yes') s_applicants,
(select
LISTAGG(discipline, ',')
WITHIN GROUP (ORDER BY discipline)
from (select discipline,
count(*) discipline_number
from vw_disp_details
where round = 2013
and rating = 1
group by discipline)) disciplines,
(select
LISTAGG(discipline_count, ',')
WITHIN GROUP (ORDER BY discipline)
from (select discipline,
count(*) discipline_count
from vw_disp_details
where round = 2013
and rating = 1
group by discipline)) disciplines_count,
(select
round(avg(util.getawardstocols(application_id,'1','AWARD_NAME')), 2)
from vw_disp_details
where round = 2013
and rating = 1) average_award_score,
(select
round(avg(age))
from vw_disp_details
where round = 2013
and rating = 1) average_age
from dual;
Except that instead of 6 main sub-queries there are 23.
This returns something like this (if it were a CSV):
applicants | s_applicants | disciplines | disciplines_count | average_award_score | average_age
107 | 67 | "speed,accuracy,strength" | 3 | 97 | 23
Now I am programmatically swapping out the "rating = 1" part of the where clauses for other expressions. They all work rather quickly except for the "rating = 1" one, which takes about 90 seconds to run; that is because the rating column in the vw_disp_details view is itself computed by a sub-query:
(SELECT score
FROM read r,
eval_criteria_lookup ecl
WHERE r.criteria_id = ecl.criteria_id
AND r.application_id = a.lgo_application_id
AND criteria_description = 'Overall Score'
AND type = 'ABC'
) reader_rank
So when the function runs, this extra query seems to slow everything down dramatically.
My question is: is there a better (more efficient) way to run a query like this that is basically just a series of counts and averages, and how can I refactor it to optimize the speed so that the rating = 1 query doesn't take 90 seconds to run?
You could choose to MATERIALIZE the vw_disp_details VIEW. That would pre-calculate the value of the rating column. There are various options for how up-to-date a materialized view is kept; you would probably want to use the ON COMMIT clause so that vw_disp_details is always correct.
Have a look at the official documentation and see if that would work for you.
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_6002.htm
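A minimal sketch of what that might look like, assuming you are allowed to create the object. Note that ON COMMIT refresh has extra requirements (materialized view logs on the base tables and a fast-refresh-eligible defining query), so this sketch falls back to an on-demand complete refresh, which still lets you pre-compute rating ahead of the reports:

-- Hedged sketch; the materialized view name is made up, and the defining
-- query simply reuses the existing view for a complete refresh.
create materialized view mv_disp_details
  build immediate
  refresh complete on demand
as
select * from vw_disp_details;

-- Point the report function at mv_disp_details instead of vw_disp_details,
-- and refresh before the reports run, e.g.:
-- exec dbms_mview.refresh('MV_DISP_DETAILS');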
Do all or most of your queries in only one. Instead of doing:
select
(select count(*) from my_tab) as count_all,
(select avg(age) from my_tab) as avg_age,
(select avg(mypkg.get_award(application_id)) from my_tab) as avg_app_id
from dual;
Just do:
select count(*), avg(age),avg(mypkg.get_award(application_id)) from my_tab;
And then, maybe you can do some union all for the other results. But this step all by itself should help.
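Applied to the question's own column names, collapsing the sub-queries into one pass might look like this (a hedged sketch; CASE inside the aggregates does the per-column filtering, though the LISTAGG columns would still need their own GROUP BY sub-queries):

select
  count(*) as applicants,
  count(case when applied != 'yes' then 1 end) as s_applicants,
  round(avg(util.getawardstocols(application_id, '1', 'AWARD_NAME')), 2) as average_award_score,
  round(avg(age)) as average_age
from vw_disp_details
where round = 2013
  and rating = 1;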
I was able to solve this issue by doing two things: creating a new view that displayed only the results I needed, which gave me marginal gains in speed, and, in that view, moving the where clause of the sub-query that caused the lag into the where clause of the view and tacking the result of the sub-query on as a column of the view. This still returns the same results, thanks to the fact that there are always going to be records in the table the sub-query accessed for each row of the view query.
SELECT
a.application_id,
util.getstatus (a.application_id) status,
(SELECT score
FROM applicant_read ar,
eval_criteria_lookup ecl
WHERE ar.criteria_id = ecl.criteria_id
AND ar.application_id = a.application_id
AND criteria_description = 'Overall Score' --THESE TWO FIELDS
AND type = 'ABC' --ARE CRITERIA_ID = 15
) score,
aps.test_total test_total
FROM application a,
applicant_scores aps
WHERE a.application_id = aps.application_id(+);
Became
SELECT
a.application_id,
util.getstatus (a.application_id) status,
ar.score,
aps.test_total test_total
FROM application a,
applicant_scores aps,
applicant_read ar
WHERE a.application_id = aps.application_id(+)
AND ar.application_id = a.application_id(+)
AND ar.criteria_id = 15;