TERADATA: Is it possible to ignore rows in an OLAP partition when the condition is met and still pass the value down when it isn't met?

I'm partitioning data based on a customer's previous order. If the customer previously added a service to their account (they either have the service or they don't), I want that value to carry down to the next row for that customer for all orders, regardless of order status. However, I don't want canceled orders' services to be calculated into the next order; I want to skip those rows and bring down the value from the previously completed order. Does anyone know if this is possible? If I add the field to the PARTITION BY clause, it'll partition by order status instead of carrying the value from the previous completed order.
Sum(SUBSCR1_ORD)
Over (PARTITION BY ACCT_NO
      ORDER BY ORDER_DATE
      ROWS BETWEEN 1 Preceding AND 1 Preceding)
AS EXISTING_SVC1
This is what I'd want the results to look like for the EXISTING_SVC columns, based on activity in the SUBSCR1_ORD column with special handling of ORDER_STATUS:
ACCT_NO  ORDER_DATE  ORDER_STATUS  SUBSCR1_ORD  SUBSCR2_ORD  EXISTING_SVC1  EXISTING_SVC2
1234     6/5/2022    Complete      1            null         0              0
1234     6/6/2022    Canceled      -1           1            1              0
1234     6/7/2022    Complete      null         1            1              0

Use LAG with IGNORE NULLS and a CASE expression to "pull down" the prior value.
SELECT Acct_No, Order_Date, Order_Status, Subscr1_Ord, Subscr2_Ord,
       LAG(CASE WHEN Order_Status = 'Canceled' THEN NULL ELSE Subscr1_Ord END, 1, 0)
           IGNORE NULLS
           OVER (PARTITION BY Acct_No ORDER BY Order_Date) AS Existing_Svc1,
       LAG(CASE WHEN Order_Status = 'Canceled' THEN NULL ELSE Subscr2_Ord END, 1, 0)
           IGNORE NULLS
           OVER (PARTITION BY Acct_No ORDER BY Order_Date) AS Existing_Svc2
FROM MyTable
ORDER BY Order_Date;
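If your Teradata release predates LEAD/LAG (they were added in 16.20), a minimal sketch of an equivalent, assuming your release supports LAST_VALUE with IGNORE NULLS, replaces each LAG with a last-non-null lookup over the preceding rows (shown here for Existing_Svc1 only):
SELECT Acct_No, Order_Date, Order_Status, Subscr1_Ord,
       COALESCE(
           LAST_VALUE(CASE WHEN Order_Status = 'Canceled' THEN NULL ELSE Subscr1_Ord END IGNORE NULLS)
           OVER (PARTITION BY Acct_No ORDER BY Order_Date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
           0) AS Existing_Svc1   -- last non-NULL value from any earlier row, default 0
FROM MyTable
ORDER BY Order_Date;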

Related

Select conversations from a SQLite chatlog

I have a SQLite table representing a chatlog. The two important columns for this question are 'content' and 'timestamp'.
I need to group the messages in the chatlog into conversations. Each message is only an individual line, so a conversation can be selected by joining its messages with newlines using group_concat:
group_concat(content, CHAR(10))
I want to identify a conversation by any messages which are within a length of time (such as 15 minutes) from each other. A conversation can be any length (including just an individual message, if there are no other messages within 15 minutes of it).
Knowing this, I can identify whether a message is the start or part of a conversation as
WHEN timestamp - LAG(timestamp, 1, timestamp) OVER (ORDER BY timestamp) < 900
But this is as far as I've gotten. I can make a column 'is_new_convo' using
WITH ordered_messages AS (
    SELECT content, timestamp
    FROM messages
    ORDER BY timestamp
), conversations_identified AS (
    SELECT *,
           CASE
               WHEN timestamp - LAG(timestamp, 1, timestamp) OVER (ORDER BY timestamp) < 900
               THEN 0
               ELSE 1
           END AS is_new_convo
    FROM ordered_messages
)
SELECT * FROM conversations_identified
How can I then form a group of messages from where is_new_convo = 1 to the last subsequent is_new_convo = 0?
Here is some sample data and the expected result.
If you take the sum of the is_new_convo column from the start down to a given row, you get the number of times a new conversation has started, which yields an ID that is unique per conversation (is_new_convo is 0 for messages continuing a conversation, so they all get the same conversation ID). Using this, we can compute the conversation ID for every message and then group on it for group_concat. This doesn't require referencing the original table multiple times, so the WITH clauses aren't needed.
SELECT group_concat(content, CHAR(10)) AS conversation
FROM (
    SELECT content, timestamp,
           SUM(is_new_convo) OVER (ORDER BY timestamp) AS conversation_id
    FROM (
        SELECT content, timestamp,
               CASE
                   WHEN timestamp - LAG(timestamp, 1, timestamp) OVER (ORDER BY timestamp) < 900
                   THEN 0
                   ELSE 1
               END AS is_new_convo
        FROM messages
    )
)
GROUP BY conversation_id
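As a quick check, a hypothetical sample dataset (only the two columns the question mentions; the values are made up) that exercises the 900-second gap logic:
CREATE TABLE messages (content TEXT, timestamp INTEGER);
INSERT INTO messages VALUES
    ('hi',        1000),   -- first message: is_new_convo = 1, conversation_id = 1
    ('hello',     1100),   -- 100 s gap  -> same conversation
    ('anyone?',   1500),   -- 400 s gap  -> same conversation
    ('new topic', 9000);   -- 7500 s gap -> is_new_convo = 1, conversation_id = 2
-- The query above then returns two rows: the first three messages joined by
-- newlines, and 'new topic' on its own.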

android room database Dao two queries?

How do I return the results of two queries using one @Query statement?
I have a database of items with a single table. Each item has a due date (saved as a long in the Room database) or no due date (saved as -1 in the database). I would like to have a query that returns all items with due dates in ascending order and then return all of the remaining items, sorted by a timestamp that is saved in the database. The timestamp represents the calendar date and time when the item was originally saved to the Room database.
Here is an example of the output I expect, using a U.S. calendar for the due dates:
8/17/2022 (August 17, 2022 due date)
8/19/2022 (due date)
12/15/2022 (due date)
5601 timestamp (no due date)
4200 timestamp (no due date)
1150 timestamp (no due date)
The query below in the Dao returns the expected results for the first part: the ascending due dates. So how do I extend it with the second part, where I also return the items that have no due dates, showing their timestamps in descending order? I tried multiple ways to use UNION, UNION ALL, etc., with no luck.
#Query("SELECT * FROM cards WHERE cardDuedatentime !=-1 ORDER BY cardDuedatentime ASC")
First sort by the boolean expression cardDuedatentime = -1 to get all the rows with no due date at the bottom of the result set.
Then use conditional sorting with a CASE expression to sort the rows with no due date descending and the rows with a valid due date ascending:
SELECT *
FROM cards
ORDER BY cardDuedatentime = -1,
CASE WHEN cardDuedatentime = -1 THEN -timestamp ELSE cardDuedatentime END;
If you want only 1 column in the results:
SELECT CASE WHEN cardDuedatentime = -1 THEN timestamp ELSE cardDuedatentime END time
FROM cards
ORDER BY cardDuedatentime = -1,
CASE WHEN cardDuedatentime = -1 THEN -timestamp ELSE cardDuedatentime END;
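As a quick check, a minimal sketch with made-up rows (SQLite syntax):
CREATE TABLE cards (cardDuedatentime INTEGER, timestamp INTEGER);
INSERT INTO cards VALUES
    (1660694400, 100),   -- has a due date (2022-08-17 as Unix epoch)
    (1663286400, 200),   -- has a due date (2022-09-16)
    (-1, 5601),          -- no due date
    (-1, 4200),
    (-1, 1150);
SELECT * FROM cards
ORDER BY cardDuedatentime = -1,
         CASE WHEN cardDuedatentime = -1 THEN -timestamp ELSE cardDuedatentime END;
-- Due-dated rows come first in ascending date order, then the -1 rows by
-- timestamp descending (5601, 4200, 1150), matching the expected output.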
If I understand your question correctly, then I believe you could use:-
#Query("WITH cte1 AS (SELECT * FROM cards WHERE cardDueDatentime != -1 ORDER BY cardDueDatentime ASC),cte2 AS (SELECT * FROM cards WHERE cardDueDatentime = -1 ORDER BY timestamp ASC) SELECT * FROM cte1 UNION ALL SELECT * FROM cte2;")
The following was used to test/demonstrate:-
DROP TABLE IF EXISTS cards;
CREATE TABLE IF NOT EXISTS cards (cardDueDatentime INTEGER,timestamp INTEGER, othercolumns TEXT);
INSERT INTO cards VALUES
(strftime('%s','2022-08-17'),strftime('%s','now'),'A')
,(-1,5601,'A')
,(strftime('%s','2022-08-11'),strftime('%s','now'),'A')
,(-1,4201,'A')
,(strftime('%s','2022-12-15'),strftime('%s','now'),'A')
,(-1,1150,'A')
;
WITH
cte1 AS (SELECT * FROM cards WHERE cardDueDatentime != -1 ORDER BY cardDueDatentime ASC),
cte2 AS (SELECT * FROM cards WHERE cardDueDatentime = -1 ORDER BY timestamp ASC)
SELECT * FROM cte1 UNION ALL SELECT * FROM cte2;
DROP TABLE IF EXISTS cards;
Executing the above (using the Navicat for SQLite tool) returns the three due-dated rows first, in ascending date order, followed by the three no-due-date rows in ascending timestamp order (1150, 4201, 5601).
Two CTEs (Common Table Expressions, akin to temporary tables) were used, each extracting one of the two sets of data and, importantly, sorting it independently. They are then combined via UNION ALL without a final sort (as a final sort would affect the complete set of data).
Note how the data has purposely been inserted so that it is not already sorted appropriately.
An even simpler way would be to use:-
#Query("SELECT * FROM cards ORDER BY cardDueDatentime=-1 ASC,cardDueDatentime ASC, timestamp ASC;")
This, using the data above, produces the same result. It works because cardDueDatentime=-1 evaluates to either true (1) or false (0): -1 evaluates to 1 and a valid datetime evaluates to 0, so the valid datetimes precede the invalid (-1) datetimes. The subsequent sort fields then sort each set accordingly.
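You can see the boolean sort key directly against the test table above:
SELECT cardDueDatentime, cardDueDatentime = -1 AS sort_key FROM cards;
-- Rows with a real due date yield sort_key 0; the -1 rows yield 1, so they sort last.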
If you wanted any invalid date (less than 0) then you could use something like:-
@Query("SELECT * FROM cards ORDER BY cardDueDatentime<0 ASC,max(CAST(cardDueDatentime AS INTEGER),0) ASC, timestamp ASC;")
So if you had additional rows inserted such as :-
....
,(-2,111,'B')
,(-3,11,'C')
,(-1,1,'X')
Then the result would place the -3, -2, and -1 rows together in the no-due-date set, sorted by timestamp ascending (1, 11, 111, 1150, 4201, 5601).
Whilst with the first, simpler SELECT, the result with the additional data would be WRONG: as -3 is not equal to -1, the C row would be treated as if it had a valid date.
The < 0 comparison instead treats -3 as an invalid date, so it is included in the set of invalid dates. Within that set, a plain sort on cardDueDatentime would place -3 before -2 and before -1; the max function clamps every value below 0 up to 0, so all invalid dates compare equal and the third sort field (timestamp) becomes the applicable sort within the set.
This could be useful if, for some reason, you wanted different sets/types of invalid dates without affecting the query.
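To make the three sort keys visible, a small sketch (hypothetical, SQLite syntax) selects them alongside each row:
SELECT cardDueDatentime, timestamp,
       cardDueDatentime < 0 AS key1,                       -- 0 = valid date, 1 = invalid
       max(CAST(cardDueDatentime AS INTEGER), 0) AS key2,  -- clamps every invalid date to 0
       timestamp AS key3                                   -- orders rows within the invalid set
FROM cards;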

PL SQL SELECT Case Statements involving aggregate values

I'm trying to write a query in Teradata but I'm not sure how to do it; my table looks like this:
col1: text (account_number)
col2: text (secondary account number)
col3: text (Primary_cust)
the business requirements are:
"Group records by account number.
If there is only one record for an account then keep that record.
If there are multiple records for an account number then:
(1) if only one record has Primary_CUST = 'Y' then keep.
(2) if multiple records have Primary_CUST = 'Y' then keep one with lowest SCDRY_ACCT_NBR
(3) If no records have Primary_CUST = 'Y' then keep one with lowest SCDRY_ACCT_NBR."
I know I need a CASE statement and I'm able to write the first requirement, but not sure on the second. Any help would be greatly appreciated.
You just have to think about how to order the rows to get the row you want on top; it seems to be like this:
SELECT * FROM tab
QUALIFY
Row_Number()
Over (PARTITION BY account_number -- for each account
ORDER BY Primary_CUST DESC -- 'Y' before 'N' (assuming it's a Y/N column)
,SCDRY_ACCT_NBR -- lowest number
) = 1 -- return the top row
Of course, QUALIFY is proprietary Teradata syntax; if you need to do this on Oracle, you have to wrap it in a derived table:
SELECT *
FROM
 (
   SELECT t.*,
          Row_Number()
          Over (PARTITION BY account_number -- for each account
                ORDER BY Primary_CUST DESC  -- 'Y' before 'N' (assuming it's a Y/N column)
                        ,SCDRY_ACCT_NBR     -- lowest number
               ) AS rn
   FROM tab t
 ) dt
WHERE rn = 1 -- return the top row
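As a sanity check of the ordering, a minimal sketch with made-up inline rows (Teradata syntax; the account and secondary numbers are hypothetical):
SELECT *
FROM (SELECT 'A100' AS account_number, '002' AS SCDRY_ACCT_NBR, 'Y' AS Primary_CUST
      UNION ALL SELECT 'A100', '005', 'Y'
      UNION ALL SELECT 'A100', '001', 'N') AS t
QUALIFY Row_Number()
        Over (PARTITION BY account_number
              ORDER BY Primary_CUST DESC, SCDRY_ACCT_NBR) = 1;
-- Returns ('A100', '002', 'Y'): 'Y' sorts before 'N', then the lowest SCDRY_ACCT_NBR wins.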

Fastest Way to Count Distinct Values in a Column, Including NULL Values

The Transact-Sql Count Distinct operation counts all non-null values in a column. I need to count the number of distinct values per column in a set of tables, including null values (so if there is a null in the column, the result should be (Select Count(Distinct COLNAME) From TABLE) + 1).
This is going to be repeated over every column in every table in the DB. It includes hundreds of tables, some of which have over 1M rows. Because this needs to be done for every single column, adding indexes for every column is not a good option.
This will be done as part of an ASP.net site, so integration with code logic is also ok (i.e.: this doesn't have to be completed as part of one query, though if that can be done with good performance, then even better).
What is the most efficient way to do this?
Update After Testing
I tested the different methods from the answers given on a good representative table. The table has 3.2 million records and dozens of columns (a few with indexes, most without). One column has 3.2 million unique values; other columns range from all NULL (one value) to a max of 40K unique values. For each method I performed four tests (with multiple attempts at each, averaging the results): 20 columns at one time, 5 columns at one time, 1 column with many values (3.2M), and 1 column with a small number of values (167). Here are the results, in order of fastest to slowest:
Count/GroupBy (Cheran)
CountDistinct+SubQuery (Ellis)
dense_rank (Eriksson)
Count+Max (Andriy)
Testing Results (in seconds):
Method            20_Columns  5_Columns  1_Column (Large)  1_Column (Small)
1) Count/GroupBy  10.8        4.8        2.8               0.14
2) CountDistinct  12.4        4.8        3                 0.7
3) dense_rank     226         30         6                 4.33
4) Count+Max      98.5        44         16                12.5
Notes:
Interestingly enough, the two fastest methods (by far, with only a small difference between them) were both methods that submitted separate queries for each column (and in the case of result #2, the query included a subquery, so there were really two queries submitted per column). Perhaps the gains that would be achieved by limiting the number of table scans are small in comparison to the performance hit taken in terms of memory requirements (just a guess).
Though the dense_rank method is definitely the most elegant, it seems that it doesn't scale well (see the result for 20 columns, which is by far the worst of the four methods), and even on a small scale it just cannot compete with the performance of Count.
Thanks for the help and suggestions!
SELECT COUNT(*)
FROM (SELECT ColumnName
FROM TableName
GROUP BY ColumnName) AS s;
GROUP BY selects distinct values including NULL. COUNT(*) will include NULLs, as opposed to COUNT(ColumnName), which ignores NULLs.
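A tiny illustration of that difference, assuming a hypothetical table t(x) holding 'a', 'b', and NULL:
SELECT COUNT(*) FROM (SELECT x FROM t GROUP BY x) AS s;  -- 3: the NULL group is counted
SELECT COUNT(x) FROM (SELECT x FROM t GROUP BY x) AS s;  -- 2: COUNT(x) skips the NULL group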
I think you should try to keep the number of table scans down and count all columns in one table in one go. Something like this could be worth trying.
;with C as
(
    select dense_rank() over(order by Col1) as dnCol1,
           dense_rank() over(order by Col2) as dnCol2
    from YourTable
)
select max(dnCol1) as CountCol1,
       max(dnCol2) as CountCol2
from C
A development on OP's own solution:
SELECT
COUNT(DISTINCT acolumn) + MAX(CASE WHEN acolumn IS NULL THEN 1 ELSE 0 END)
FROM atable
Run one query that counts the number of distinct values and adds 1 if there are any NULLs in the column (using a subquery):
Select Count(Distinct COLUMNNAME) +
Case When Exists
(Select * from TABLENAME Where COLUMNNAME is Null)
Then 1 Else 0 End
From TABLENAME
You can try:
count(
    distinct coalesce(
        your_table.column_1, your_table.column_2
        -- cast them first if the columns are not the same type
    )
) as COUNT_TEST
The coalesce function combines two columns, replacing null values. I used this in my case and it produced the correct result.
Not sure this would be the fastest, but it might be worth testing. Use CASE to give NULL a value. Clearly you would need to pick a value for NULL that does not occur in the real data. According to the query plan, this is a dead heat with the count(*)/GROUP BY solution proposed by Cheran S.
SELECT
COUNT( distinct
(case when [testNull] is null then 'dbNullValue' else [testNull] end)
)
FROM [test].[dbo].[testNullVal]
With this approach you can also count more than one column:
SELECT
COUNT( distinct
(case when [testNull1] is null then 'dbNullValue' else [testNull1] end)
),
COUNT( distinct
(case when [testNull2] is null then 'dbNullValue' else [testNull2] end)
)
FROM [test].[dbo].[testNullVal]

Conditional pl/sql output

I need to get a value of 1 or 0 from a DB query, which in turn should do the following:
get some field value from a table
compare that field value to some literal (defined in the query itself)
if the value does not match the literal and the query is executed in a certain time period (i.e. from 9:00 AM to 10:00 AM), return 0, else 1
handle a multi-row result set in the response (see further)
So far I have next thing:
select instr(field, 'literal') from table_name where trunc(time) = trunc(sysdate)
which returns a non-zero position if field in table table_name contains 'literal' (the where clause checks that the truncated time in table_name equals the truncated system date).
What I can't get is how I can:
introduce a time constraint (basically, if it's from 9:00 AM to 10:00 AM, always return 1)
handle several response rows, meaning that if any of the response rows returns 1 then I need only one row with the value 1 in it
Thanks in advance.
P.S.: Please comment on the question if something is left vague.
It sounds like you want a CASE statement. It would be helpful if you posted the DDL to create the table, some DML to populate the data, and the expected output. You seem to have conflicting requirements about what you want returned if the query is run between 9 and 10:00 am: you initially say "if ... query is executed in certain time period ... it should return 0, else 1" but later say "if its from 9:00 AM to 10:00 AM always return 1". My guess is that you want something like
SELECT MAX(zero_or_one)
FROM (
SELECT (CASE WHEN to_char( sysdate, 'HH24' ) = '09'
THEN 1
WHEN instr( column_name, 'literal' ) > 0
THEN 1
ELSE 0
END) zero_or_one
FROM table_name
WHERE trunc(date_column) = trunc(sysdate)
)
