PL SQL SELECT Case Statements involving aggregate values

I'm trying to write a query in Teradata but I'm not sure how to do it; my table looks like this:
col1: text (account_number)
col2: text (secondary account number)
col3: text (Primary_cust)
the business requirements are:
"Group records by account number.
If there is only one record for an account then keep that record.
If there are multiple records for an account number then:
(1) if only one record has Primary_CUST = 'Y' then keep.
(2) if multiple records have Primary_CUST = 'Y' then keep one with lowest SCDRY_ACCT_NBR
(3) If no records have Primary_CUST = 'Y' then keep one with lowest SCDRY_ACCT_NBR.
I know I need a CASE statement and I'm able to write the first requirement, but I'm not sure about the second. Any help would be greatly appreciated.

You just have to think about how to order the rows so that the row you want ends up on top; it seems to be like this:
SELECT * FROM tab
QUALIFY
    Row_Number()
    Over (PARTITION BY account_number  -- for each account
          ORDER BY Primary_CUST DESC   -- 'Y' before 'N' (assuming it's a Y/N column)
                  ,SCDRY_ACCT_NBR      -- lowest number
         ) = 1                         -- return the top row
Of course QUALIFY is proprietary Teradata syntax; if you need to do this on Oracle you have to wrap it in a derived table:
SELECT *
FROM
 (
   SELECT t.*,
          Row_Number()
          Over (PARTITION BY account_number  -- for each account
                ORDER BY Primary_CUST DESC   -- 'Y' before 'N' (assuming it's a Y/N column)
                        ,SCDRY_ACCT_NBR      -- lowest number
               ) AS rn
   FROM tab t
 ) dt
WHERE rn = 1  -- return the top row
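For illustration, here is a hypothetical data set (the table and column names come from the question; the rows are invented) showing which record each rule keeps:
CREATE TABLE tab (
    account_number VARCHAR(10),
    SCDRY_ACCT_NBR VARCHAR(10),
    Primary_CUST   CHAR(1)
);
INSERT INTO tab VALUES ('A1', '001', 'N');  -- only record for A1: kept
INSERT INTO tab VALUES ('A2', '002', 'Y');  -- the single 'Y' for A2: kept (rule 1)
INSERT INTO tab VALUES ('A2', '003', 'N');
INSERT INTO tab VALUES ('A3', '004', 'Y');  -- lowest SCDRY_ACCT_NBR among multiple 'Y' rows: kept (rule 2)
INSERT INTO tab VALUES ('A3', '005', 'Y');
INSERT INTO tab VALUES ('A4', '006', 'N');  -- no 'Y' rows, lowest SCDRY_ACCT_NBR: kept (rule 3)
INSERT INTO tab VALUES ('A4', '007', 'N');
Either query above should then return the rows with SCDRY_ACCT_NBR '001', '002', '004' and '006'.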

Related

Count returns null with Insert Into, but it works when I just use select in Netezza

I'm trying to insert data into a table and one of my columns is coming back null. The columns I sum are working fine, but the one I count is returning null with the insert into command followed by select. When I eliminate insert into, and just run my select statement I get the result I need. What am I doing wrong?
INSERT INTO DAILY_TOTALS(TIME_FRAME,
VIDEO,
DATA,
VOICE,
PREV_VIDEO,
PREV_DATA,
PREV_VOICE,
VIDEO_REVENUE,
INTERNET_REVENUE,
PHONE_REVENUE,
OCC_REVENUE,
TOTAL_REVENUE,
ACCOUNTS)
SELECT
D.TIME_FRAME,
D.VIDEO,
D.DATA,
CASE WHEN D.VOICE ='N' THEN 'N' ELSE 'Y' END AS VOICE,
D.PREV_VIDEO,
D.PREV_DATA,
CASE WHEN D.PREV_VOICE = 'N' THEN 'N' ELSE 'Y' END AS PREV_VOICE,
SUM(D.VIDEO_REVENUE) AS VIDEO_REVENUE,
SUM(D.INTERNET_REVENUE) AS INTERNET_REVENUE,
SUM(D.OCC_REVENUE) AS OCC_REVENUE,
SUM(D.TOTAL_REVENUE) AS TOTAL_REVENUE,
COUNT(D.ACCOUNT_NUMBER) AS ACCOUNTS
FROM DAILY D
WHERE D.TIME_FRAME = CURRENT_DATE -1
GROUP BY
D.TIME_FRAME,
D.VIDEO,
D.DATA,
D.VOICE,
D.PREV_VIDEO,
D.PREV_DATA,
D.PREV_VOICE
Aren't you missing one column?
The INSERT statement mentions 13 columns ... but the SELECT only selects 12 columns.
So "COUNT(D.ACCOUNT_NUMBER) AS ACCOUNTS" is being inserted into the TOTAL_REVENUE column, and the ACCOUNTS column in your table is NULL because you haven't specified anything to go in there.

SQLite: Using COALESCE inside a CASE statement

I have two tables: one with a record of a person with an initial number, and a second one with records of changes to this number.
During a join, I do coalesce(latest_of_series, initial) to get a single number per person. So far so good.
I also group these numbers into groups, in order to sort these groups separately. I know I can do:
select
coalesce(latest, initial) as final,
case
when coalesce(latest, initial) > 1 and coalesce(latest, initial) < 100 then 'group 1'
-- other cases
end as group
-- rest of the query
but that's of course horribly unreadable.
I tried:
select
coalesce(latest_of_series, initial_if_no_series) as value,
case
when value > 1 and value < 100 then 'group 1'
-- rest of the cases
end as group
-- rest of the query
but then SQLite complains that there's no column "value".
Is there really no way of using previous result of coalesce as a "variable"?
That's not an SQLite limitation; it's an SQL limitation.
All the column names are decided as one. You can't define a column in line 2 of your query and then refer to it in line 3 of your query. All columns derive from the tables you select, each on their own; they can't "see" each other.
But you can use nested queries.
select
value,
case
when value >= 1 and value < 100 then 'group 1'
when value >= 100 and value < 200 then 'group 2'
else 'group 3'
end value_group
from
(
select
coalesce(latest_of_series, initial_if_no_series) as value
from
my_table
group by
user_id
) v
This way, the columns of the inner query can be decided as one, and the columns of the outer query can be decided as one. It might even be faster, depending on the circumstances.
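As a side note, the same idea can be written with a CTE (supported in SQLite 3.8.3 and later); this is just a sketch using the hypothetical names from the question:
with v as
(
select
coalesce(latest_of_series, initial_if_no_series) as value
from
my_table
group by
user_id
)
select
value,
case
when value >= 1 and value < 100 then 'group 1'
when value >= 100 and value < 200 then 'group 2'
else 'group 3'
end value_group
from v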

SQLite order results by smallest difference

In many ways this question follows on from my previous one. I have a table that is pretty much identical
CREATE TABLE IF NOT EXISTS test
(
id INTEGER PRIMARY KEY,
a INTEGER NOT NULL,
b INTEGER NOT NULL,
c INTEGER NOT NULL,
d INTEGER NOT NULL,
weather INTEGER NOT NULL);
in which I would typically have entries such as
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30140306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,5,5,10100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,5,5,5,11100306);
INSERT INTO test (a,b,c,d,weather) VALUES(5,5,5,5,21101306);
Typically this table would have multiple rows with some/all of the b, c and d values being identical but with different a and weather values. As per the answer to my other question I can certainly issue
WITH cte AS (SELECT *, DENSE_RANK() OVER (ORDER BY (b=2) + (c=3) + (d=4) DESC) rn FROM test where a = 1) SELECT * FROM cte WHERE rn < 3;
No issues thus far. However, I have one further requirement which arises as a result of the weather column. Although this value is an integer it is in fact a composite where each digit represents a "banded" weather condition. Take for example weather = 20100306. Here 2 represents the wind direction divided up into 45 degree bands on the compass, 0 represents a wind speed range, 1 indicates precipitation as snow etc. What I need to do now while obtaining my ordered results is to allow for weather differences. Take for example the first two rows
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30140306);
Though otherwise similar, they represent rather different weather conditions - the fourth digit is 4 as opposed to 0, indicating a higher precipitation intensity band. The WITH cte... above would rank the first two rows at the top, which is fine. But what if I would rather have the row that differs the least from an incoming "weather condition" of 30130306? I would clearly like to have the second row appearing at the top. Once again, I can live with the "raw" result returned by WITH cte... and then drill down to the right row based on my current "weather condition" in Java. However, once again I find myself thinking that there is perhaps a rather neat way of doing this in SQL that is outwith my skill set. I'd be most obliged to anyone who might be able to tell me how/whether this can be done using just SQL.
You can sort the results 1st by DENSE_RANK() and 2nd by the absolute difference of weather and the incoming "weather condition":
WITH cte AS (
SELECT *,
DENSE_RANK() OVER (ORDER BY (b=2) + (c=3) + (d=4) DESC) rn
FROM test
WHERE a = 1
)
SELECT a,b,c,d,weather
FROM cte
WHERE rn < 3
ORDER BY rn, ABS(weather - ?);
Replace ? with the value of that incoming "weather condition".
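As a quick sanity check with the question's incoming condition 30130306 (a hypothetical parameter value), the absolute differences order the two similar rows as desired:
SELECT ABS(30140306 - 30130306);  -- 10000: this row sorts first
SELECT ABS(30100306 - 30130306);  -- 30000
Note that this treats weather as a plain number, so a difference in a high-order digit (e.g. wind direction) outweighs any difference in the lower-order digits.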

How to cast a column into decimal of varying significant digits in Oracle

I have a column that is stored in ###0.0000000000 format. In a report I'm generating I need it to show only a few significant digits. The problem is that the number of digits needed changes based on the product, with a default of 2. There's a column in another table that provides the required digits for each product.
I've tried a few things so far, but Oracle doesn't seem to like them and throws a syntax error.
Cast(A.Price as Numeric(10,coalesce(B.Sig_Digits,2)))
That threw an error, so I tried making the coalesce part a column and aliasing it in case the coalesce broke it; that didn't work either. ROUND will take a column as an argument, but I don't want it to round. Other than an ugly
case when Sig_digits = 1 then to_char(price,'###0.0') when Sig_digits = 2...
etc., what other options are there? This is a very large report, with 100+ columns and a few million rows, so I'd prefer not to do the CASE WHEN.
Use TO_CHAR with RPAD to add 0s to the end of the format model to the correct number of decimal places:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( value, sig ) AS
SELECT 123.456789, 2 FROM DUAL UNION ALL
SELECT 123456789.123456789, 7 FROM DUAL;
Query 1:
SELECT TO_CHAR( value, RPAD( 'FM999999999990.', sig + 15, '0' ) )
FROM table_name
Results:
| TO_CHAR(VALUE,RPAD('FM999999999990.',SIG+15,'0')) |
|---------------------------------------------------|
| 123.46 |
| 123456789.1234568 |
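The arithmetic behind the format model: 'FM999999999990.' is 15 characters long, so RPAD( 'FM999999999990.', sig + 15, '0' ) appends exactly sig zeros. For sig = 2 the model becomes 'FM999999999990.00', which you can verify directly:
SELECT TO_CHAR( 123.456789, 'FM999999999990.00' ) FROM DUAL;  -- 123.46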

Fastest Way to Count Distinct Values in a Column, Including NULL Values

The Transact-SQL COUNT DISTINCT operation counts all non-null values in a column. I need to count the number of distinct values per column in a set of tables, including null values (so if there is a null in the column, the result should be (Select Count(Distinct COLNAME) From TABLE) + 1).
This is going to be repeated over every column in every table in the DB. That includes hundreds of tables, some of which have over 1M rows. Because this needs to be done for every single column, adding indexes for every column is not a good option.
This will be done as part of an ASP.net site, so integration with code logic is also ok (i.e.: this doesn't have to be completed as part of one query, though if that can be done with good performance, then even better).
What is the most efficient way to do this?
Update After Testing
I tested the different methods from the answers given on a good representative table. The table has 3.2 million records, dozens of columns (a few with indexes, most without). One column has 3.2 million unique values. Other columns range from all Null (one value) to a max of 40K unique values. For each method I performed four tests (with multiple attempts at each, averaging the results): 20 columns at one time, 5 columns at one time, 1 column with many values (3.2M) and 1 column with a small number of values (167). Here are the results, in order of fastest to slowest
Count/GroupBy (Cheran)
CountDistinct+SubQuery (Ellis)
dense_rank (Eriksson)
Count+Max (Andriy)
Testing Results (in seconds):
Method             20_Columns   5_Columns   1_Column (Large)   1_Column (Small)
1) Count/GroupBy         10.8         4.8                2.8               0.14
2) CountDistinct         12.4         4.8                3                 0.7
3) dense_rank           226          30                  6                 4.33
4) Count+Max             98.5        44                 16                12.5
Notes:
Interestingly enough, the two methods that were fastest (by far, with only a small difference between them) were both methods that submitted separate queries for each column (and in the case of result #2, the query included a subquery, so there were really two queries submitted per column). Perhaps this is because the gains that would be achieved by limiting the number of table scans are small in comparison to the performance hit taken in terms of memory requirements (just a guess).
Though the dense_rank method is definitely the most elegant, it seems that it doesn't scale well (see the result for 20 columns, which is by far the worst of the four methods), and even on a small scale just cannot compete with the performance of Count.
Thanks for the help and suggestions!
SELECT COUNT(*)
FROM (SELECT ColumnName
FROM TableName
GROUP BY ColumnName) AS s;
GROUP BY selects distinct values including NULL. COUNT(*) will include NULLs, as opposed to COUNT(ColumnName), which ignores NULLs.
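A minimal check of that NULL handling, using a hypothetical throwaway table:
CREATE TABLE #t (ColumnName INT);
INSERT INTO #t VALUES (1), (2), (NULL), (1);

SELECT COUNT(*)  -- returns 3: the distinct values are 1, 2 and NULL
FROM (SELECT ColumnName
      FROM #t
      GROUP BY ColumnName) AS s;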
I think you should try to keep the number of table scans down and count all columns in one table in one go. Something like this could be worth trying.
;with C as
(
select dense_rank() over(order by Col1) as dnCol1,
dense_rank() over(order by Col2) as dnCol2
from YourTable
)
select max(dnCol1) as CountCol1,
max(dnCol2) as CountCol2
from C
Test the query at SE-Data
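Why this counts NULLs as well: dense_rank() assigns all NULLs a single rank of their own (in SQL Server NULLs sort first), so max(dnCol1) equals the number of distinct values in Col1 including NULL. A minimal check with a hypothetical table:
CREATE TABLE #chk (Col1 INT);
INSERT INTO #chk VALUES (10), (10), (20), (NULL);

;with C as
(
select dense_rank() over(order by Col1) as dn from #chk
)
select max(dn) from C;  -- returns 3: one rank each for NULL, 10 and 20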
A development on OP's own solution:
SELECT
COUNT(DISTINCT acolumn) + MAX(CASE WHEN acolumn IS NULL THEN 1 ELSE 0 END)
FROM atable
Run one query that counts the number of distinct values and adds 1 if there are any NULLs in the column (using a subquery):
Select Count(Distinct COLUMNNAME) +
Case When Exists
(Select * from TABLENAME Where COLUMNNAME is Null)
Then 1 Else 0 End
From TABLENAME
You can try:
count(
distinct coalesce(
your_table.column_1, your_table.column_2
-- cast them if the two columns are not the same type
)
) as COUNT_TEST
The coalesce function helps you combine two columns, replacing null values. I used this in my case and it returned the correct result.
Not sure this would be the fastest, but it might be worth testing. Use CASE to give NULL a value. Clearly you would need to select a value for NULL that does not occur in the real data. According to the query plan, this would be a dead heat with the count(*) (group by) solution proposed by Cheran S.
SELECT
COUNT( distinct
(case when [testNull] is null then 'dbNullValue' else [testNull] end)
)
FROM [test].[dbo].[testNullVal]
With this approach you can also count more than one column:
SELECT
COUNT( distinct
(case when [testNull1] is null then 'dbNullValue' else [testNull1] end)
),
COUNT( distinct
(case when [testNull2] is null then 'dbNullValue' else [testNull2] end)
)
FROM [test].[dbo].[testNullVal]
