Lead and Lag in Teradata Query - For category - teradata

I am writing a query to get data as SCD 2 type from a data dump.
My data and code are as follows:
create table promotions
(
start_date date,
end_date date,
promotion_name varchar(50));
Insert statements to populate the table:
insert into promotions values ('9/1/2017','9/2/2017','P1');
insert into promotions values ('9/2/2017','9/3/2017','P1');
insert into promotions values ('9/3/2017','9/4/2017','P1');
insert into promotions values ('9/4/2017','9/5/2017','P1');
insert into promotions values ('9/5/2017','9/6/2017','P2');
insert into promotions values ('9/6/2017','9/7/2017','P2');
insert into promotions values ('9/7/2017','9/8/2017','P2');
insert into promotions values ('9/8/2017','9/9/2017','P2');
insert into promotions values ('9/9/2017','9/10/2017','P2');
insert into promotions values ('9/10/2017','9/11/2017','P2');
insert into promotions values ('9/11/2017','9/12/2017','P3');
insert into promotions values ('9/12/2017','9/13/2017','P3');
insert into promotions values ('9/13/2017','9/14/2017','P3');
insert into promotions values ('9/14/2017','9/15/2017','P3');
Expected Result:
Date_Start Date_End Promotion Name
9/1/2017 9/4/2017 P1
9/5/2017 9/10/2017 P2
9/11/2017 9/13/2017 P3
Query I have Written:
with cte as (select rank() over (partition by promotion_name order by start_date asc) as "Rank"
,start_date
,dateadd(day,-1,start_date) as EndDate
,promotion_name
--first_name, last_name
from dbo.promotions)
select * from cte where rank=1;
Output of Query
start_date EndDate promotion_name
2017-09-01 2017-08-31 P1
2017-09-05 2017-09-04 P2
2017-09-11 2017-09-10 P3
Issue with the above query is that EndDate is getting displayed in the Wrong way,
when compared to the output table above.
On SQL server lead and lag functions solve this issue, but On TERADATA im not able to get the equivalent for lead /lag function.
How should I go about it. I dont want to create any volatile / Temp tables, its just a plain query for ETL.

LAG and LEAD are just shorter syntax, you can rewrite it like this:
LAG(col1, n) OVER (PARTITION BY ... ORDER BY col2)
=
MIN(col1) OVER (PARTITION BY ... ORDER BY col2
ROWS BETWEEN n PRECEDING AND n PRECEDING), 0)
LEAD(col1, n) OVER (PARTITION BY ... ORDER BY col2)
=
MIN(col1) OVER (PARTITION BY ... ORDER BY col2
ROWS BETWEEN n FOLLOWING AND n FOLLOWING), 0)
To get a default value simply use COALESCE:
LAG(col1, n, default) OVER (PARTITION BY ... ORDER BY col2)
=
COALESCE(MIN(col1) OVER (PARTITION BY ... ORDER BY col2
ROWS BETWEEN n PRECEDING AND n PRECEDING), 0)
,default)

Related

SQLite: Running balance with an ending balance

I have an ending balance of $5000. I need to create a running balance, but adjust the first row to show the ending balance then sum the rest, so it will look like a bank statement. Here is what I have for the running balance but how can I adjust row 1 to not show a sum of the first row, but the ending balance instead.
with BalBefore as (
select *
from transactions
where ACCT_NAME = 'Real Solutions'
ORDER BY DATE DESC
)
select
DATE,
amount,
'$' || printf("%.2f", sum(AMOUNT) over (order by ROW_ID)) as Balance
from BalBefore;
This gives me"
DATE AMOUNT BALANCE
9/6/2019 -31.00 $-31.00 <- I need this balance to be replaced with $5000 and have the rest
9/4/2019 15.00 $-16.00 sum as normal.
9/4/2019 15.00 $-1.00
9/3/2019 -16.00 $-17.00
I have read many other questions, but I couldn't find one that I could understand so I thought I would post a simpler question.
The following is not short and sweet, but using the WITH statement and CTEs, I hope that the logic is apparent. Multiple CTEs are defined which refer to each other to make the overall query more readable. Altogether the goal was just to add a beginning balance record that could be :
/*
DROP TABLE IF EXISTS data;
CREATE temp TABLE data (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
date DATETIME NOT NULL,
amount NUMERIC NOT NULL
);
INSERT INTO data
(date, amount)
VALUES
('2019-09-03', -16.00),
('2019-09-04', 15.00),
('2019-09-04', 15.00),
('2019-09-06', -31.00)
;
*/
WITH
initial_filter AS (
SELECT id, date, amount
FROM data
--WHERE ACCT_NAME = 'Real Solutions'
),
prepared AS (
SELECT *
FROM initial_filter
UNION ALL
SELECT
9223372036854775807 as id, --largest signed integer
(SELECT MAX(date) FROM initial_filter) AS FinalDate,
-(5000.00) --ending balance (negated for summing algorithm)
),
running AS (
SELECT
id,
date,
amount,
SUM(-amount) OVER
(ORDER BY date DESC, id DESC
RANGE UNBOUNDED PRECEDING
EXCLUDE CURRENT ROW) AS balance
FROM prepared
ORDER BY date DESC, id DESC
)
SELECT *
FROM running
WHERE id != 9223372036854775807
ORDER BY date DESC, id DESC;
This produces the following
id date amount balance
4 2019-09-06 -31.00 5000
3 2019-09-04 15.00 5031
2 2019-09-04 15.00 5016
1 2019-09-03 -16.00 5001
UPDATE: The first query was not producing the correct balances. The beginning balance row and the windowing function (i.e. OVER clause) were updated to accurately sum over the correct amounts.
Note: The balance on each row is determined completely from the previous rows, not from the current row's amount, because this works backward from an ending balance, not forward from the previous row balance.

SQLite Nested Query for maximum

I'm trying to use DB Browser for SQLite to construct a nested query to determine the SECOND highest priced item purchased by the top 10 spenders. The query I have to pick out the top 10 spenders is:
SELECT user_id, max(item_total), SUM (item_total + shipping_cost -
discounts_applied) AS total_spent
FROM orders AS o
WHERE payment_reject = "FALSE"
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10
This gives the user_id, most expensive item they purchased (not counting shipping or discounts) as well as the total amount they spent on the site.
I was trying to use a nested query to generate a list of the second most expensive items they purchased, but keep getting errors. I've tried
SELECT user_id, MAX(item_total) AS second_highest
FROM orders
WHERE item_total < (SELECT user_id, SUM (item_total + shipping_cost -
discounts_applied) AS total_spent
FROM orders
WHERE payment_reject = "FALSE"
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10)
group by user_id
I keep getting a row value misused error. Does anyone have pointers on this nested query or know of another way to find the second highest item purchased from within the group found in the first query?
Thanks!
(Note: The following assumes you're using Sqlite 3.25 or newer since it uses window functions).
This will return the second-largest item_total for each user_id without duplicates:
WITH ranked AS
(SELECT DISTINCT user_id, item_total
, dense_rank() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
FROM orders)
SELECT user_id, item_total FROM ranked WHERE ranking = 2;
You can combine it with your original query with something like:
WITH ranked AS
(SELECT DISTINCT user_id, item_total
, dense_rank() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
FROM orders),
totals AS
(SELECT user_id
, sum (item_total + shipping_cost - discounts_applied) AS total_spent
FROM orders
WHERE payment_reject = 0
GROUP BY user_id)
SELECT t.user_id, r.item_total, t.total_spent
FROM totals AS t
JOIN ranked AS r ON t.user_id = r.user_id
WHERE r.ranking = 2
ORDER BY t.total_spent DESC, t.user_id
LIMIT 10;
Okay, after fixing your table definition to better reflect the values being stored in it and the stated problem, and fixing the data and adding to it so you can actually get results, plus an optional but useful index like so:
CREATE TABLE orders (order_id INTEGER PRIMARY KEY
, user_id INTEGER
, item_total REAL
, shipping_cost NUMERIC
, discounts_applied NUMERIC
, payment_reject INTEGER);
INSERT INTO orders(user_id, item_total, shipping_cost, discounts_applied
, payment_reject) VALUES (9852,60.69,10,0,FALSE),
(2784,123.91,15,0,FALSE), (1619,119.75,15,0,FALSE), (9725,151.92,15,0,FALSE),
(8892,153.27,15,0,FALSE), (7105,156.86,25,0,FALSE), (4345,136.09,15,0,FALSE),
(7779,134.93,15,0,FALSE), (3874,157.27,15,0,FALSE), (5102,108.3,10,0,FALSE),
(3098,59.97,10,0,FALSE), (6584,124.92,15,0,FALSE), (5136,111.06,10,0,FALSE),
(1869,113.44,20,0,FALSE), (3830,129.63,15,0,FALSE), (9852,70.69,10,0,FALSE),
(2784,134.91,15,0,FALSE), (1619,129.75,15,0,FALSE), (9725,161.92,15,0,FALSE),
(8892,163.27,15,0,FALSE), (7105,166.86,25,0,FALSE), (4345,146.09,15,0,FALSE),
(7779,144.93,15,0,FALSE), (3874,167.27,15,0,FALSE), (5102,118.3,10,0,FALSE),
(3098,69.97,10,0,FALSE), (6584,134.92,15,0,FALSE), (5136,121.06,10,0,FALSE),
(1869,123.44,20,0,FALSE), (3830,139.63,15,0,FALSE);
CREATE INDEX orders_idx_1 ON orders(user_id, item_total DESC);
the above query will give:
user_id item_total total_spent
---------- ---------- -----------
7105 156.86 373.72
3874 157.27 354.54
8892 153.27 346.54
9725 151.92 343.84
4345 136.09 312.18
7779 134.93 309.86
3830 129.63 299.26
6584 124.92 289.84
2784 123.91 288.82
1619 119.75 279.5
(If you get a syntax error from the query now, it's because you're using an old version of sqlite that doesn't support window functions.)

Transpose rows to columns in SQLite

I have data like this:
I am trying to transform it to this (using SQLite). In the desired result, within each id, each start should be on the same row as the chronologically closest end. If an id has a start but no end (like id=4), then the corresponding end, will be empty (as shown below).
I have tried this
select
id,
max( case when start_end = "start" then date end) as start,
max(case when start_end = "end" then date end ) as end
from df
group by id
But the result is this, which is wrong because id=5 only have one row, when it should have two:
id start end
1 2 1994-05-01 1996-11-04
2 4 1979-07-18 <NA>
3 5 2010-10-01 2012-10-06
Any help is much appreciated
CREATE TABLE mytable(
id INTEGER NOT NULL PRIMARY KEY
,start_end VARCHAR(5) NOT NULL
,date DATE NOT NULL
);
INSERT INTO mytable(id,start_end,date) VALUES (2,'start','1994-05-01');
INSERT INTO mytable(id,start_end,date) VALUES (2,'end','1996-11-04');
INSERT INTO mytable(id,start_end,date) VALUES (4,'start','1979-07-18');
INSERT INTO mytable(id,start_end,date) VALUES (5,'start','2005-02-01');
INSERT INTO mytable(id,start_end,date) VALUES (5,'end','2009-09-17');
INSERT INTO mytable(id,start_end,date) VALUES (5,'start','2010-10-01');
INSERT INTO mytable(id,start_end,date) VALUES (5,'end','2012-10-06');
select
s.id as id,
s.date as 'start',
min(e.date) as 'end' -- earliest end date from "same id&start"
from
-- only start dates
(select id, date
from intable
where start_end='start'
) as s
left join -- keep the start-only lines
-- only end dates
(select id, date
from intable
where start_end='end'
) as e
on s.id = e.id
and s.date < e.date -- not too early
group by s.id, s.date -- "same id&start"
order by s.id, s.date; -- ensure sequence
Left join (to keep the start-only line for id "4") two on-the-fly tables, start dates and end dates.
Take the minimal end date which is just higher than start date (same id, using min()and group by.
Order by id, then start date.
I tested this on a test table which is similar to your dump, but has no "NOT NULL" and no "PRIMARY KEY". I guess for this test table that is irrelevant; otherwise explain the effect, please.
Note:
Internally three pairs of dates for id 5 (those that match end>start) are found, but only those are forwarded with the lowest end (min(end)) for each of the two different combinations of ID and start group by ID, start. The line where end>start but end not being the minimum is therefor not returned. That makes two lines with start/end pairs as desired.
Output (with .headers on):
id|start|end
2|1994-05-01|1996-11-04
4|1979-07-18|
5|2005-02-01|2009-09-17
5|2010-10-01|2012-10-06
UPDATE: Incorporate helpful comments by #MatBailie.
Thank you! This is exactly what I needed to do, only with a few changes:
SELECT
s.value AS 'url',
"AVGDATE" AS 'fieldname',
sum(e.value)/count(*) AS 'value'
FROM
(SELECT url, value
FROM quicktag
WHERE fieldname='NAME'
) AS s
LEFT JOIN
(SELECT url, substr(value,1,4) AS value
FROM quicktag
WHERE fieldname='DATE'
) AS e
ON s.url = e.url
WHERE e.value != ""
GROUP BY s.value;
I had a table like this:
url fieldname value
---------- ---------- ----------
1000052801 NAME Thomas
1000052801 DATE 2007
1000131579 NAME Morten
1000131579 DATE 2005
1000131929 NAME Tanja
1000131929 DATE 2014
1000158449 NAME Knud
1000158449 DATE 2007
1000158450 NAME Thomas
1000158450 DATE 2003
I needed to correlate NAME and DATE in columns based on url as a key, and generate a field with average DATE grouped by multiple NAME fields.
So my result looks like this:
url fieldname value
---------- ---------- ----------
Thomas AVGDATE 2005
Morten AVGDATE 2005
Tanja AVGDATE 2014
Knud AVGDATE 2007
Unfortunately I not have enough posts to make my vote count yet.

Teradata First and Last Value

How to take first value and last value of a column with group by on a particuar column?
For eg:i need first_value and last_value of case_owner column based on group by og case id.
For first value:
select case_owner as case_owner_first_value
from
table
qualify row_number() over (partition by case_id order by case_id) = 1
For last value:
select case_owner as case_owner_last_value
from
table
qualify row_number() over (partition by case_id order by case_id desc ) = 1
Please note, while combining the FIRST_VALUE with ORDER BY clause, you may need to add rows between.
Example:
CREATE VOLATILE TABLE test_fv (id INTEGER,seq SMALLINT) ON COMMIT PRESERVE ROWS;
INSERT INTO test_fv VALUES (NULL,1);
INSERT INTO test_fv VALUES (2,2);
INSERT INTO test_fv VALUES (3,3);
INSERT INTO test_fv VALUES (4,4);
INSERT INTO test_fv VALUES (5,5);
SELECT seq, FIRST_VALUE(id ignore NULLS) OVER (ORDER BY seq ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM test_fv;
This wont work:
SELECT seq, FIRST_VALUE(id ignore NULLS) OVER (ORDER BY seq ASC) FROM test_fv;

Time Difference between query result rows in SQLite: How To?

Consider the following reviews table contents:
CustomerName ReviewDT
Doe,John 2011-06-20 10:13:24
Doe,John 2011-06-20 10:54:45
Doe,John 2011-06-20 11:36:34
Doe,Janie 2011-06-20 05:15:12
The results are ordered by ReviewDT and grouped by CustomerName, such as:
SELECT
CustomerName,
ReviewDT
FROM
Reviews
WHERE
CustomerName NOT NULL
ORDER BY CustomerName ASC, ReviewDT ASC;
I'd like to create a column of the time difference between each row of this query for each Customer... rowid gives the original row, and there is no pattern to the inclusion from the rowid etc...
For the 1st entry for a CustomerName, the value would be 0. I am asking here incase this is something that can be calculated as part of the original query somehow. If not, I was planning to do this by a series of queries - initially creating a new TABLE selecting the results of the query above - then ALTERING to add the new column and using UPDATE/strftime to get the time differences by using rowid-1 (somehow)...
To compute the seconds elapsed from one ReviewDT row to the next:
SELECT q.CustomerName, q.ReviewDT,
strftime('%s',q.ReviewDT)
- strftime('%s',coalesce((select r.ReviewDT from Reviews as r
where r.CustomerName = q.CustomerName
and r.ReviewDT < q.ReviewDT
order by r.ReviewDT DESC limit 1),
q.ReviewDT))
FROM Reviews as q WHERE q.CustomerName NOT NULL
ORDER BY q.CustomerName ASC, q.ReviewDT ASC;
To get the DT of each ReviewDT and its preceding CustomerName row:
SELECT q.CustomerName, q.ReviewDT,
coalesce((select r.ReviewDT from Reviews as r
where r.CustomerName = q.CustomerName
and r.ReviewDT < q.ReviewDT
order by r.ReviewDT DESC limit 1),
q.ReviewDT)
FROM Reviews as q WHERE q.CustomerName NOT NULL
ORDER BY q.CustomerName ASC, q.ReviewDT ASC;

Resources