sql avg not returning expected result

sql avg not returning expected result - sqlite

I'm running the following on my sqlite3 DB, but the result is not limited to the last 3 records. It is returning the average for all records.
SELECT AVG(time) FROM tbl_aa ORDER BY ID LIMIT 3
Any thoughts?

Use a subquery to get the first 3 records and then calculate the average on them
select avg(time) from
(
SELECT time
FROM tbl_a
ORDER BY ID
LIMIT 3
) x

Limit will restrict the number of results in your result set, however AVG is calculated on the entire set so will only return one row. Therefore the limit is redundant.

Related

Oracle 11g PLSQL - Splitting A Record Out Into Constituent Records Over Time - Row Generating

I have a dataset (a view) that has a numeric field "WR_EST_MHs". If that field exceeds a certain number of man hours (120 or 60, depending on 2 other fields' values), I need to split it out into constiuent records and spread those hours over future weeks.
The OH_UG_Key and 1kMCM_Flag fields determine the threshold for splitting. For example, if the OH_UG = 1 AND 1kMCM_Flag = 'N' and the WR_EST_MHs > 120, then spread the WR_EST_MHs value over as many records as is necessary, in 120 MH increments, changing only the WRSchedDate and WRSchedDate_Key fields (advancing each by one week).
Each OH_UG / 1kMCM_Flag / WR_EST_MHs scenario is as follows:
This is an example of what I need to do:
I thought that something like this might work, but I haven't worked with levels before:
with cte as
2 (Select * from "STJOF"."vfactScheduledWAWork"
5 )
6 select WR_Key, WP_Key, WRShedDate, DistSA_Key_Hash, CrewHQ_Key_Hash, Priority_Key_Hash, JobType_Key_Hash, WRStatus_Key_Hash, PerfBy_Key, OHUG_Key, 1kMCM_Flag, WR_EST_MHs
7 from cte cross join table(cast(multiset(select level from dual
8 connect by level >= WR_EST_MHs / 120
9 ) as sys.odcinumberlist))
10 order by WR_Key;
I also thought this could be done with a "tally table" which I have a little experience with. I really don't know where to begin on this one.

So I would say that a "Tally Table" will work if it is applied correctly. (Or, in this case, a tally view.)
First, break the logic for the hour breakout into a function so we don't have case when everywhere like so:
CREATE OR REPLACE FUNCTION get_hour_breakout(in_ohug_key IN NUMBER, in_1kmcm_flag in varchar2, in_tot_hours in number)
RETURN number
IS hours number;
BEGIN
hours:=
case when in_ohug_key=2 and in_1kmcm_flag='N' and in_tot_hours>60 then 60 else
case when in_ohug_key=2 and in_1kmcm_flag='Y' and in_tot_hours>60 and in_tot_hours<=120 then 60 else
case when in_ohug_key=2 and in_1kmcm_flag='Y' and in_tot_hours>120 then 120 else
120
end
end
end;
RETURN(hours);
END get_hour_breakout;
This way, if the hour breakout logic changes, it can be tweaked in one place.
Second, join to a dynamic "tally" view like so:
select wr_key,
WP_Key,
wrscheddate+idxkey.nnn*7 wrscheddate,
to_char(wrscheddate+idxkey.nnn*7,'yyyymmdd') WRSchedDate_Key,
OHUG_Key,
kMCM_Flag,
case when (wr_est_mhs-idxkey.nnn*get_hour_breakout(ohug_key, kmcm_flag, wr_est_mhs))>=get_hour_breakout(ohug_key, kmcm_flag, wr_est_mhs) then get_hour_breakout(ohug_key, kmcm_flag, wr_est_mhs) else wr_est_mhs-idxkey.nnn*get_hour_breakout(ohug_key, kmcm_flag, wr_est_mhs) end wr_est_mhs
from yourView inner join (SELECT ROWNUM-1 nnn
FROM ( SELECT 1 just_a_column
FROM dual
CONNECT BY LEVEL <= 52
)
) idxkey on vwrk.wr_est_mhs/get_hour_breakout(ohug_key, kmcm_flag, wr_est_mhs) > idxkey.nnn
By using the connect by level we, in effect, generate a bunch of zero indexed rows, then by joining to it with the hours divided by the breakout greater than the feed number we get a few rows for each group.
For example, if the function returns 120 and the hours are 100 you get a single row, so it stays 1 to 1. If the function returns 120 and the hours are 500, however, you get 5 rows because 500/120=4.1666666…, which in the join gives rows 4,3,2,1,0. Then the rest is simple math to determine the number of hours per breakout.
This could also be improved by moving the function call into the lower view so it is only used once per row. And the inline tally view could be made into it's own view, depends on the maintainability you need to build into it.

How to find maximum number of records for a particular key in a table

I was trying to find which customer has more number of records in a table, i got suggested by RANK function but its not the useful in finding the exact record , so i used this following snippet:
select count(customerkey),customerkey
FROM FILEMAPPERTEMPLATE
group by customerkey;
Result :
1 298,254
1 299,732
2 246,027
43 197,053
1 299,745
1 299,751
60 271,623
Though i am able to find how many reocrds attributed to a customerkey in the table, I couldn't find the single exact record(after executing the query ) that has maximum record fro a customer. Please help
I want only
60 271,623 as reult

select * from (select count(customerkey) cnt,customerkey
FROM FILEMAPPERTEMPLATE
group by customerkey order by cnt desc) where rownum<2;

Optimization of Oracle Query

I have following query :
SELECT distinct A1 ,sum(total) as sum_total FROM
(
SELECT A1, A2,A3,A4,A5,A6,COUNT(A7) AS total,A8
FROM (
select a.* from table1 a
left join (select * from table_reject where name = 'smith') b on A.A3 = B.B3 and A.A9 =B.B2
where B.ID is null
) t1
WHERE A8 >= NEXT_DAY ( trunc(to_date('17/09/2013 12:00:00','dd/mm/yyyy hh24:mi:ss')) ,'SUN' )
GROUP BY
CUBE(A1, A2,A3,A4,A5,A6,A8)
)INN
WHERE
INN.A1 IS NOT NULL AND
INN.A2 IS NULL AND
INN.A3 IS NULL AND
INN.A4 IS NULL AND
INN.A5 IS NULL AND
INN.A6 is NULL AND
INN.A8 IS NOT NULL
GROUP BY A1
ORDER BY sum_total DESC ;
Total number records in table1 is around 8 million.
My problem is i need to optimize the above query in best possible way.I did tried to make index on column A8 of table1 and creating the index helped me to decrease the cost of query but execution time of query is more or less same when there was no index on the table1.
Any help would be appreciated.
Thanks

CUBE operation on large data set is really expensive, so you need to check do you really need all that data in inner query. because i see you are doing COUNT in inner and then on the outer query you have SUM of counts. so in other words, give me the row count of A7 at for all combination A1-A8 (-A7). then get only SUM for selected combinations filtered by WHERE clause. we can sure optimize this by limiting CUBE on certain column itself but very obvious things so far i have notice are as follows.
if you use below query and have right index o Table1 and Table_reject then both query can utilize the Index and reduce the data set needs to be join and further processed.
I am not 100% sure but yes Partial CUBE processing is possible and need to check that.
clustered index --> Table1 need on A8 And Table_Reject need clustered index on NAME.
non-clustered index--> Table1 need on A3,A9and Table_reject need on B3,B2
SELECT qry1.
(
SELECT A1, A2,A3,A4,A5,A6,A7,A8
FROM table1
WHERE A8 >= NEXT_DAY ( trunc(to_date('17/09/2013 12:00:00','dd/mm/yyyy hh24:mi:ss')) ,'SUN' )
)qry1
LEFT JOIN
(
select B3,B2,ID
from table_reject
where name = 'smith'
)qry2
ON qry1.A3 = qry2.B3 and qry1.A9=qry2.B2
WHERE qry2.ID IS NULL
EDIT1:
I tried to find out what will be the difference in CUBE operator result if you do it on all Columns or you do it on only columns that you need it in result set. what I found is the way CUBE function works you do not need to perform CUBE on all columns. because at the end you just care about combinations generated by CUBE where A1 and A8 is NOT NULL.
Try this link and see the output.
enter link description here
Query 1 and Query2 is just inner most queries to compare the CUBE result set.
Query3 and Query4 is the same query that you are trying and you see the results are same in both case.
DECLARE #NEXT_DAY DATE = NEXT_DAY ( trunc(to_date('17/09/2013 12:00:00','dd/mm/yyyy hh24:mi:ss')) ,'SUN' )
SELECT distinct A1 ,sum(total) as sum_total FROM
(
SELECT A1,COUNT(A7) AS total,A8
FROM (
select a.a1,a.a7,a.a8
from table1 a
left join (select * from table_reject where name = 'smith') b
on A.A3 = B.B3 and A.A9 =B.B2
where B.ID is null
) t1
WHERE A8 >= #NEXT_DAY
GROUP BY
CUBE(A1,A8)
)INN
WHERE INN.A1 IS NOT NULL AND
INN.A8 IS NOT NULL
GROUP BY A1
ORDER BY sum_total DESC ;
EDIT3
As I mentioned in the Comment this is a Round3 update. i can not change comment but i meant Edit3 instead Round3.
well the new change in your query is adding the WHERE A8 >= #NEXT_DAY condition in the inner most left join select where A8 >= #NEXT_DAY AND B.ID is null as well. that has improved the selection very much.
in your last comment you mentioned that query is taking 30-35 second and as you change the value of A8 it keep increasing. now with the execution time you didn't mentioned how much data is in the result set. why that is important? because if my query is returning 5M rows as a final result set that will going to spend 90% time in just droping that data on to UI, or output file what ever output method you are using. but actual performance should be measured how soon the query has started giving first couple of rows. because by that time Optimizer has already decided the execution paln and DB is executing that plan. yet i agree that if query is returning 100 rows and taking 10 seconds then something can be wrong with execution plan.
to demo that what I did is I created the dummy data. and perfomred your query against it.
i have table Test_CubeData with 9M rows in it with the same column numbers and data type you explained for your Table1. I have second table Table_Reject with 80K rows with number of columns and its datatype I figured out from query. To test the extreme side of this table; name column has only one value "smith" and ID is null for all 80K rows. so column values that can affect inner left join result will be B2 and B3.
in these tests i do not have any index on both tables. both are heap. and you see the results are in still few seconds with acceptable range of data in result set. as my result data set increases the completion time increases. if i create explained indexes then it will give me Index Seek operation
for all these tested cases. but at certain point that index will also exhaust and become Index Scan.
one sure example would be if my filter value for A8 column is smallest date value exist in that column. in that case Optimizer will see that all 9M rows need to be participate in inner select and CUBE and lot of data will be get processed in memory. which is expected. on the other hand lets see the another example of queries. i have unique 32873 values in A8 column and those values are almost equally distributed among 9M rows. so per single A8 values there are 260 to 300 rows. now if I execute the query for any single value smallest, largest, or any thing in between the query execution time should not change.
notice the highlighted text in each image below that indicated what the value of A8 filter is chosen,
important columns only in the select list instead using *, added A8 filter in the inner left join query, execution plan showing the TableScan operation on both table, query execution time in second,and total number of rows return by the query.
I hope that this will clear some doubts on performance of your query and will help you to set right expectation.
**Table Row Counts**
**TableScan_InnerLeftJoin**
**TableScan_FullQuery_248Rows**
**TableScan_FullQuery_5K**
**TableScan_FullQuery_56K**
**TableScan_FullQuery_480k**

You're calculating a potentially very large cube result on seven columns, and then discarding all the results except those that are logically just a group_by on column A1.
I suggest that you rewrite the query to just group by A1.

Cognos: Count the number of occurences of a distinct id

I'm making a report in Cognos Report Studio and I'm having abit of trouble getting a count taht I need. What I need to do is count the number of IDs for a department. But I need to split the count between initiated and completed. If an ID occures more than once, it is to be counted as completed. The others, of course, will be initiated. So I'm trying to count the number of ID occurences for a distinct ID. Here is the query I've made in SQl Developer:
SELECT
COUNT((CASE WHEN COUNT(S.RFP_ID) > 8 THEN MAX(CT.GCT_STATUS_HISTORY_CLOSE_DT) END)) AS "Sales Admin Completed"
,COUNT((CASE WHEN COUNT(S.RFP_ID) = 8 THEN MIN(CT.GCT_STATUS_HISTORY_OPEN_DT) END)) as "Sales Admin Initiated"
FROM
ADM.B_RFP_WC_COVERAGE_DIM S
JOIN ADM.B_GROUP_CHANGE_REQUEST_DIM CR
ON S. RFP_ID = CR.GCR_RFP_ID
JOIN ADM.GROUP_CHANGE_TASK_FACT CT
ON CR.GROUP_CHANGE_REQUEST_KEY = CT.GROUP_CHANGE_REQUEST_KEY
JOIN ADM.B_DEPARTMENT_DIM D
ON D.DEPARTMENT_KEY = CT.DEPARTMENT_RESP_KEY
WHERE CR.GCR_CHANGE_TYPE_ID = '20'
AND S.RFP_LOB_IND = 'WC'
AND S.RFP_AUDIT_IND = 'N'
AND CR.GCR_RECEIVED_DT BETWEEN '01-JAN-13' AND '31-DEC-13'
AND D.DEPARTMENT_DESC = 'Sales'
AND CT.GCT_STATUS_IND = 'C'
GROUP BY S.RFP_ID ;
Now this works. But I'm not sure how to translate taht into Cognos. I tried doing a CASE taht looked liek this(this code is using basic names such as dept instead of D.DEPARTMENT_DESC):
CASE WHEN dept = 'Sales' AND count(ID for {DISTINCT ID}) > 1 THEN count(distinct ID)END)
I'm using count(distinct ID) instead of count(maximum(close_date)). But the results would be the same anyway. The "AND" is where I think its being lost. It obviously isn't the proper way to count occurences. But I'm hoping I'm close. Is there a way to do this with a CASE? Or at all?
--EDIT--
To make my question more clear, here is an example:
Say I have this data in my table
ID
---
1
2
3
4
2
5
5
6
2
My desired count output would be:
Initiated Completed
--------- ---------
4 2
This is because two of the distinct IDs (2 and 5) occure more than once. So they are counted as Completed. The ones that occure only once are counted as Initiated. I am able to do this in SQl Dev, but I can't figure out how to do this in Cognos Report Studio. I hope this helps to better explaine my issue.

Oh, I didn't quite got it originally, amending the answer.
But it's still easiest to do with 2 queries in Report Studio. Key moment is that you can use a query as a source for another query, guaranteeing proper group by's and calculations.
So if you have ID list in the table in Report Studio you create:
Query 1 with dataitems:
ID,
count(*) or count (1) as count_occurences
status (initiated or completed) with a formula: if (count_occurences > 1) then ('completed') else ('initiated').
After that you create a query 2 using query one as source with just 2 data items:
[Query1].[Status]
Count with formula: count([Query1].[ID])
That will give you the result you're after.
Here's a link to doco on how to nest queries:
http://pic.dhe.ibm.com/infocenter/cx/v10r1m0/topic/com.ibm.swg.ba.cognos.ug_cr_rptstd.10.1.0.doc/c_cr_rptstd_wrkdat_working_with_queries_rel.html?path=3_3_10_6#cr_rptstd_wrkdat_working_with_queries_rel

Fastest Way to Count Distinct Values in a Column, Including NULL Values

The Transact-Sql Count Distinct operation counts all non-null values in a column. I need to count the number of distinct values per column in a set of tables, including null values (so if there is a null in the column, the result should be (Select Count(Distinct COLNAME) From TABLE) + 1.
This is going to be repeated over every column in every table in the DB. Includes hundreds of tables, some of which have over 1M rows. Because this needs to be done over every single column, adding Indexes for every column is not a good option.
This will be done as part of an ASP.net site, so integration with code logic is also ok (i.e.: this doesn't have to be completed as part of one query, though if that can be done with good performance, then even better).
What is the most efficient way to do this?
Update After Testing
I tested the different methods from the answers given on a good representative table. The table has 3.2 million records, dozens of columns (a few with indexes, most without). One column has 3.2 million unique values. Other columns range from all Null (one value) to a max of 40K unique values. For each method I performed four tests (with multiple attempts at each, averaging the results): 20 columns at one time, 5 columns at one time, 1 column with many values (3.2M) and 1 column with a small number of values (167). Here are the results, in order of fastest to slowest
Count/GroupBy (Cheran)
CountDistinct+SubQuery (Ellis)
dense_rank (Eriksson)
Count+Max (Andriy)
Testing Results (in seconds):
Method 20_Columns 5_Columns 1_Column (Large) 1_Column (Small)
1) Count/GroupBy 10.8 4.8 2.8 0.14
2) CountDistinct 12.4 4.8 3 0.7
3) dense_rank 226 30 6 4.33
4) Count+Max 98.5 44 16 12.5
Notes:
Interestingly enough, the two methods that were fastest (by far, with only a small difference in between then) were both methods that submitted separate queries for each column (and in the case of result #2, the query included a subquery, so there were really two queries submitted per column). Perhaps because the gains that would be achieved by limiting the number of table scans is small in comparison to the performance hit taken in terms of memory requirements (just a guess).
Though the dense_rank method is definitely the most elegant, it seems that it doesn't scale well (see the result for 20 columns, which is by far the worst of the four methods), and even on a small scale just cannot compete with the performance of Count.
Thanks for the help and suggestions!

SELECT COUNT(*)
FROM (SELECT ColumnName
FROM TableName
GROUP BY ColumnName) AS s;
GROUP BY selects distinct values including NULL. COUNT(*) will include NULLs, as opposed to COUNT(ColumnName), which ignores NULLs.

I think you should try to keep the number of table scans down and count all columns in one table in one go. Something like this could be worth trying.
;with C as
(
select dense_rank() over(order by Col1) as dnCol1,
dense_rank() over(order by Col2) as dnCol2
from YourTable
)
select max(dnCol1) as CountCol1,
max(dnCol2) as CountCol2
from C
Test the query at SE-Data

A development on OP's own solution:
SELECT
COUNT(DISTINCT acolumn) + MAX(CASE WHEN acolumn IS NULL THEN 1 ELSE 0 END)
FROM atable

Run one query that Counts the number of Distinct values and adds 1 if there are any NULLs in the column (using a subquery)
Select Count(Distinct COLUMNNAME) +
Case When Exists
(Select * from TABLENAME Where COLUMNNAME is Null)
Then 1 Else 0 End
From TABLENAME

You can try:
count(
distinct coalesce(
your_table.column_1, your_table.column_2
-- cast them if you want replace value from column are not same type
)
) as COUNT_TEST
Function coalesce help you combine two columns with replace not null values.
I used this in mine case and success with correctly result.

Not sure this would be the fastest but might be worth testing. Use case to give null a value. Clearly you would need to select a value for null that would not occur in the real data. According to the query plan this would be a dead heat with the count(*) (group by) solution proposed by Cheran S.
SELECT
COUNT( distinct
(case when [testNull] is null then 'dbNullValue' else [testNull] end)
)
FROM [test].[dbo].[testNullVal]
With this approach can also count more than one column
SELECT
COUNT( distinct
(case when [testNull1] is null then 'dbNullValue' else [testNull1] end)
),
COUNT( distinct
(case when [testNull2] is null then 'dbNullValue' else [testNull2] end)
)
FROM [test].[dbo].[testNullVal]

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

sql avg not returning expected result - sqlite

I'm running the following on my sqlite3 DB, but the result is not limited to the last 3 records. It is returning the average for all records. SELECT AVG(time) FROM tbl_aa ORDER BY ID LIMIT 3 Any thoughts?

Use a subquery to get the first 3 records and then calculate the average on them select avg(time) from ( SELECT time FROM tbl_a ORDER BY ID LIMIT 3 ) x

Limit will restrict the number of results in your result set, however AVG is calculated on the entire set so will only return one row. Therefore the limit is redundant.

Related

Oracle 11g PLSQL - Splitting A Record Out Into Constituent Records Over Time - Row Generating

How to find maximum number of records for a particular key in a table

Optimization of Oracle Query

Cognos: Count the number of occurences of a distinct id

Fastest Way to Count Distinct Values in a Column, Including NULL Values

Categories

Resources