Split a row into multiple rows - Teradata

Below is an example of my table:
Names  Start_Date  Orders  Items
AAA    2020-01-01  300     100
BAA    2020-02-01  896     448
My requirement is the output below:
Names  Start_Date  Orders
AAA    2020-01-01  100
AAA    2020-01-01  100
AAA    2020-01-01  100
BAA    2020-02-01  448
BAA    2020-02-01  448
Each row should be split into (Orders/Items) rows, each carrying the Items value.

This is a nice task for EXPAND ON, Teradata's SQL extension for creating time series (based on @Andrew's test data):
SELECT *
FROM vt_foo
EXPAND ON PERIOD(start_date, start_date + Cast(Ceiling(Cast(orders AS FLOAT)/items) AS INT)) AS pd
For an exact split of orders into items:
SELECT dt.*,
       CASE WHEN items * (end_date - start_date) > orders
            THEN orders MOD items  -- the last row gets the remainder
            ELSE items
       END
FROM
 (
   SELECT t.*, End(pd) AS end_date
   FROM vt_foo AS t
   EXPAND ON PERIOD(start_date, start_date + Cast(Ceiling(Cast(orders AS FLOAT)/items) AS INT)) AS pd
 ) AS dt
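EXPAND ON is Teradata-specific, so the arithmetic of the exact split is hard to try elsewhere. Below is a minimal sketch of the same idea using a recursive CTE in SQLite, driven from Python's sqlite3 module. This is a swapped-in technique, not the EXPAND ON query itself. The names pieces, part and split_orders are illustrative only, and integer arithmetic stands in for CEILING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vt_foo (names TEXT, start_date TEXT, orders INT, items INT);
    INSERT INTO vt_foo VALUES ('AAA', '2020-01-01', 300, 100);
    INSERT INTO vt_foo VALUES ('BAA', '2020-02-01', 896, 448);
    INSERT INTO vt_foo VALUES ('CCC', '2020-03-01', 525, 100);
""")

# part numbers the pieces 1..ceil(orders / items); (orders + items - 1) / items
# is integer division, used here in place of CEILING.
# The last piece carries the remainder, mirroring the CASE ... orders MOD items logic.
EXACT_SPLIT = """
WITH RECURSIVE pieces (names, start_date, orders, items, part) AS (
    SELECT names, start_date, orders, items, 1 FROM vt_foo
    UNION ALL
    SELECT names, start_date, orders, items, part + 1
    FROM pieces
    WHERE part < (orders + items - 1) / items
)
SELECT names, start_date,
       CASE WHEN items * part > orders THEN orders % items ELSE items END AS split_orders
FROM pieces
ORDER BY names, part
"""

split_rows = conn.execute(EXACT_SPLIT).fetchall()
for row in split_rows:
    print(row)
```

With the sample data, CCC (525 orders, 100 items) comes out as five rows of 100 and a final row of 25, so the pieces add up to the original order count.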

This calls for a recursive CTE. Here's how I'd approach it, with a lovely volatile table for some sample data.
create volatile table vt_foo
(names varchar(100), start_date date, orders int, items int)
on commit preserve rows;
insert into vt_foo values ('AAA','2020-01-01',300,100);
insert into vt_foo values ('BAA','2020-02-01',896,448);
insert into vt_foo values ('CCC','2020-03-01',525,100);
with recursive cte (names, start_date, items, num, counter) as (
    select
        names,
        start_date,
        items,
        round(orders / (items * 1.0)) as num,
        1 as counter
    from vt_foo
    UNION ALL
    select
        a.names,
        a.start_date,
        a.items,
        b.num,
        b.counter + 1
    from vt_foo a
    inner join cte b
        on a.names = b.names
        and a.start_date = b.start_date
    where b.counter + 1 <= b.num
)
select * from cte
order by names, start_date
This bit: b.counter + 1 <= b.num is the key to limiting the output to the proper number of rows per names/start_date combination.
I think this should be OK, but test it with small volumes of data first.
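One way to follow the "test it with small volumes" advice without a Teradata system is to run essentially the same CTE in SQLite through Python's sqlite3 module. This is only a sketch: SQLite's ROUND and type handling are not guaranteed to match Teradata's, so it checks the recursion logic rather than the exact Teradata behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vt_foo (names TEXT, start_date TEXT, orders INT, items INT);
    INSERT INTO vt_foo VALUES ('AAA', '2020-01-01', 300, 100);
    INSERT INTO vt_foo VALUES ('BAA', '2020-02-01', 896, 448);
    INSERT INTO vt_foo VALUES ('CCC', '2020-03-01', 525, 100);
""")

# Same shape as the CTE above; the final SELECT only counts the rows generated per name.
ROW_COUNT_CTE = """
WITH RECURSIVE cte (names, start_date, items, num, counter) AS (
    SELECT names, start_date, items, round(orders / (items * 1.0)), 1
    FROM vt_foo
    UNION ALL
    SELECT a.names, a.start_date, a.items, b.num, b.counter + 1
    FROM vt_foo a
    INNER JOIN cte b
        ON a.names = b.names AND a.start_date = b.start_date
    WHERE b.counter + 1 <= b.num
)
SELECT names, COUNT(*) FROM cte GROUP BY names ORDER BY names
"""

rows_per_name = dict(conn.execute(ROW_COUNT_CTE).fetchall())
print(rows_per_name)
```

In SQLite, CCC (525 orders, 100 items) yields five rows because 5.25 rounds down, so the remaining 25 orders get no row of their own. Whether that is acceptable depends on the requirement.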

Related

Select Rows by Consecutive Dates in SQLite

I have a table with data like below:
Log Table:
User Id  Login Date
1        2022-01-03
1        2022-01-04
1        2022-01-10
1        2022-01-11
1        2022-01-12
1        2022-01-23
1        2022-01-25
1        2022-01-26
1        2022-01-27
1        2022-01-28
What I'm trying to do is create a query that returns the rows of the latest run of consecutive login dates ending the day before var_date, where var_date is a parameter.
If var_date is 2022-01-29, then the result is:
User Id  Login Date
1        2022-01-25
1        2022-01-26
1        2022-01-27
1        2022-01-28
If var_date is 2022-01-30, then no result is returned, since 2022-01-29 is not in the table.
If var_date is 2022-01-24, then the query returns the row with 2022-01-23 as its login date.
How can I do this in SQLite?
Thank you.
This question is a variant of gaps and islands, with the islands being clusters of records per user with continuous dates. Here is one approach using analytic functions:
WITH cte AS (
    SELECT *, CASE WHEN julianday(LoginDate) -
                        julianday(LAG(LoginDate) OVER (PARTITION BY UserID
                                                       ORDER BY LoginDate))
                        > 1 THEN 1 ELSE 0 END AS counter
    FROM yourTable
),
cte2 AS (
    SELECT *, SUM(counter) OVER (PARTITION BY UserID ORDER BY LoginDate) AS grp
    FROM cte
)
SELECT UserID, LoginDate
FROM cte2 t1
WHERE LoginDate < '2022-01-29' AND
      grp = (SELECT t2.grp FROM cte2 t2
             WHERE t2.UserID = t1.UserID AND t2.LoginDate = '2022-01-28');
The two CTEs generate a pseudo date group for each cluster of consecutive dates per user. The final query returns all records earlier than the target date whose group value matches the group of the day immediately before the target date. Hence, if a user has no record on that preceding day, the query returns an empty set.
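The query above hard-codes the target date and the day before it. Here is a minimal sketch of a parameterised variant, run through Python's sqlite3 module against the question's data. The :var_date parameter, the date(:var_date, '-1 day') expression and the helper name latest_streak are my own additions, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (UserID INTEGER, LoginDate TEXT)")
login_dates = ["2022-01-03", "2022-01-04", "2022-01-10", "2022-01-11", "2022-01-12",
               "2022-01-23", "2022-01-25", "2022-01-26", "2022-01-27", "2022-01-28"]
conn.executemany("INSERT INTO yourTable VALUES (1, ?)", [(d,) for d in login_dates])

# counter flags the first row of each island; its running sum (grp) labels the island.
ISLAND_QUERY = """
WITH cte AS (
    SELECT *, CASE WHEN julianday(LoginDate) -
                        julianday(LAG(LoginDate) OVER (PARTITION BY UserID ORDER BY LoginDate)) > 1
                   THEN 1 ELSE 0 END AS counter
    FROM yourTable
),
cte2 AS (
    SELECT *, SUM(counter) OVER (PARTITION BY UserID ORDER BY LoginDate) AS grp
    FROM cte
)
SELECT UserID, LoginDate
FROM cte2 t1
WHERE LoginDate < :var_date
  AND grp = (SELECT t2.grp FROM cte2 t2
             WHERE t2.UserID = t1.UserID
               AND t2.LoginDate = date(:var_date, '-1 day'))
ORDER BY LoginDate
"""

def latest_streak(var_date):
    """Return the login dates of the island that ends the day before var_date."""
    return [d for _, d in conn.execute(ISLAND_QUERY, {"var_date": var_date})]

print(latest_streak("2022-01-29"))
```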
Use a recursive CTE:
WITH cte(UserId, LoginDate) AS (
    SELECT :var_user_id, :var_date
    UNION ALL
    SELECT UserId, date(c.LoginDate, '-1 day')
    FROM cte c
    WHERE EXISTS (SELECT 1 FROM tablename t
                  WHERE t.UserId = c.UserId
                    AND t.LoginDate = date(c.LoginDate, '-1 day'))
)
SELECT *
FROM cte
WHERE LoginDate < (SELECT MAX(LoginDate) FROM cte);
Change :var_user_id and :var_date to the values that you want for the user's id and the date.
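For illustration, the two parameters can be bound by name from Python's sqlite3 module. The sketch below assumes the question's data for user 1 plus a hypothetical user 2 (not in the question) to show that the user id parameter matters; the helper name consecutive_logins is mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (UserId INTEGER, LoginDate TEXT)")
user1_dates = ["2022-01-03", "2022-01-04", "2022-01-10", "2022-01-11", "2022-01-12",
               "2022-01-23", "2022-01-25", "2022-01-26", "2022-01-27", "2022-01-28"]
conn.executemany("INSERT INTO tablename VALUES (1, ?)", [(d,) for d in user1_dates])
# Hypothetical second user, added only for this example.
conn.executemany("INSERT INTO tablename VALUES (2, ?)", [("2022-01-27",), ("2022-01-28",)])

# The anchor row is (user, var_date); each step walks back one day while a login exists.
STREAK_QUERY = """
WITH cte(UserId, LoginDate) AS (
    SELECT :var_user_id, :var_date
    UNION ALL
    SELECT UserId, date(c.LoginDate, '-1 day')
    FROM cte c
    WHERE EXISTS (SELECT 1 FROM tablename t
                  WHERE t.UserId = c.UserId
                    AND t.LoginDate = date(c.LoginDate, '-1 day'))
)
SELECT UserId, LoginDate
FROM cte
WHERE LoginDate < (SELECT MAX(LoginDate) FROM cte)
ORDER BY LoginDate
"""

def consecutive_logins(user_id, var_date):
    """Return the consecutive login dates that end the day before var_date."""
    params = {"var_user_id": user_id, "var_date": var_date}
    return [login_date for _, login_date in conn.execute(STREAK_QUERY, params)]

print(consecutive_logins(1, "2022-01-29"))
```

The final WHERE drops the anchor row itself, which is why a var_date with no login on the previous day gives an empty result.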

SQLITE get next row after ORDERBY

I need to get the next row from an ORDER BY query.
I have 2 columns, ID (primary key) and Age (float), in a table T, and I need something like the following:
SELECT ID FROM T WHERE !> (inputted ID) + 1 rowID/Next row <! ORDERBY Age (then primary key, but I suspect if the Age values are the same SQLite would default to order by primary key anyway) LIMIT 1
Essentially it would select the next row after the inputted ID in the ordered table; it's the "next row / rowID + 1" part that I am not sure how to get.
As suggested, here is a data set as an example:
https://dbfiddle.uk?rdbms=sqlite_3.27&fiddle=19685ac20cc42041a59d318a01a2010f
ID Age
1 12.2
2 36.8
3 22.5
4 41
5 16.7
I am attempting to get the row that follows a specific ID in the list ordered by Age:
ID Age
1 12.2
5 16.7
3 22.5
2 36.8
4 41
Something similar to
SELECT ID FROM OrderedInfo WHERE ID = 5 ORDER BY Age ASC LIMIT 1 OFFSET 1;
My expected result would be '3' from the example data above
I have expanded the data set to include duplicate entries, as I didn't explicitly state that it could contain such data. forpas's answer works for the first example, which has no duplicate entries. Thanks for your help.
https://dbfiddle.uk?rdbms=sqlite_3.27&fiddle=f13d7f5a44ba414784547d9bbdf4997e
Use a subquery for the ID that you want in the WHERE clause:
SELECT *
FROM OrderedInfo
WHERE Age > (SELECT Age FROM OrderedInfo WHERE ID = 5)
ORDER BY Age LIMIT 1;
If there are duplicate values in the column Age, use a CTE that returns the row that you want and join it to the table, so that you can extend the conditions with a tie-breaker on ID:
WITH cte AS (SELECT ID, Age FROM OrderedInfo WHERE ID = 5)
SELECT o.*
FROM OrderedInfo o INNER JOIN cte c
ON o.Age > c.Age OR (o.Age = c.Age AND o.ID > c.ID)
ORDER BY o.Age, o.ID LIMIT 1;
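To see the tie-breaker in action, here is a small sketch using Python's sqlite3 module. It uses the question's five rows plus one hypothetical row (ID 6 with the same Age as ID 5) that I added to create a duplicate; the parameter :current_id and the helper next_id are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderedInfo (ID INTEGER PRIMARY KEY, Age REAL)")
conn.executemany(
    "INSERT INTO OrderedInfo VALUES (?, ?)",
    [(1, 12.2), (2, 36.8), (3, 22.5), (4, 41), (5, 16.7),
     (6, 16.7)],  # ID 6 is a hypothetical duplicate Age, not from the question
)

# Rows that sort after the current one: a larger Age, or the same Age and a larger ID.
NEXT_ROW = """
WITH cte AS (SELECT ID, Age FROM OrderedInfo WHERE ID = :current_id)
SELECT o.ID
FROM OrderedInfo o INNER JOIN cte c
  ON o.Age > c.Age OR (o.Age = c.Age AND o.ID > c.ID)
ORDER BY o.Age, o.ID LIMIT 1
"""

def next_id(current_id):
    """Return the ID that follows current_id in (Age, ID) order, or None at the end."""
    row = conn.execute(NEXT_ROW, {"current_id": current_id}).fetchone()
    return row[0] if row else None

print(next_id(5), next_id(6), next_id(4))
```

With the extra row, the successor of ID 5 is ID 6 (same Age, larger ID), and the successor of ID 6 is ID 3. The last row in the order has no successor.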

MariaDB running total up to N and rows NOT included in its calculation

I have a table which, amongst other columns, has amt and created (timestamp).
I'm trying to:
1. Calculate the running total of amt up to N.
2. Get all the rows not included in the calculation leading to the sum up to N.
I'm doing this in code but was wondering if there is a way to get these with SQL, ideally in one query.
It's easy to find examples of calculating the running total, like
https://stackoverflow.com/a/1290936/400048, but harder to find a running total up to N that then returns only the rows not involved in calculating N.
You can use the window version of the SUM aggregate function to get the running total for each row.
CREATE TABLE TEST (ID BIGINT PRIMARY KEY, AMT INT, CREATED TIMESTAMP);
INSERT INTO TEST VALUES
(1, 1, TIMESTAMP '2000-01-01 00:00:00'),
(2, 2, TIMESTAMP '2000-01-02 00:00:00'),
(3, 1, TIMESTAMP '2000-01-03 00:00:00'),
(4, 3, TIMESTAMP '2000-01-04 00:00:00'),
(5, 5, TIMESTAMP '2000-01-05 00:00:00'),
(6, 1, TIMESTAMP '2000-01-07 00:00:00');
SELECT ID, AMT, SUM(AMT) OVER (ORDER BY CREATED) RT, CREATED FROM TEST ORDER BY CREATED;
> ID AMT RT CREATED
> -- --- -- -------------------
> 1 1 1 2000-01-01 00:00:00
> 2 2 3 2000-01-02 00:00:00
> 3 1 4 2000-01-03 00:00:00
> 4 3 7 2000-01-04 00:00:00
> 5 5 12 2000-01-05 00:00:00
> 6 1 13 2000-01-07 00:00:00
Then you can use a non-standard QUALIFY clause in H2 or a subquery (in both MariaDB and H2) to filter out rows below the limit.
If N is a running total limit and by “rows not included in the calculation” you mean rows above the limit, the queries will look like these:
-- Simple non-standard query for H2
SELECT ID, AMT, SUM(AMT) OVER (ORDER BY CREATED) RT, CREATED FROM TEST
QUALIFY RT > 10 ORDER BY CREATED;
-- Equivalent standard query with subquery for MariaDB, H2, and many others
SELECT * FROM (
SELECT ID, AMT, SUM(AMT) OVER (ORDER BY CREATED) RT, CREATED FROM TEST
) T WHERE RT > 10 ORDER BY CREATED;
> ID AMT RT CREATED
> -- --- -- -------------------
> 5 5 12 2000-01-05 00:00:00
> 6 1 13 2000-01-07 00:00:00
RT - AMT in the first row here is a running total of all previous rows. You can select it separately, if you wish:
-- Non-standard query for H2
SELECT SUM(AMT) OVER (ORDER BY CREATED) RT FROM TEST
QUALIFY RT < 10 ORDER BY CREATED DESC FETCH FIRST ROW ONLY;
-- Non-standard query for MariaDB or H2
SELECT RT FROM (
SELECT ID, AMT, SUM(AMT) OVER (ORDER BY CREATED) RT, CREATED FROM TEST
) T WHERE RT < 10 ORDER BY CREATED DESC LIMIT 1;
-- Standard query for H2 and others (but not for MariaDB)
SELECT RT FROM (
SELECT ID, AMT, SUM(AMT) OVER (ORDER BY CREATED) RT, CREATED FROM TEST
) T WHERE RT < 10 ORDER BY CREATED DESC FETCH FIRST ROW ONLY;
> RT
> --
> 7
If you meant something else, the QUALIFY or WHERE criteria will be different.
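The subquery form can be tried quickly in SQLite, which also supports window functions (3.25 or later). This is a sketch through Python's sqlite3 module, not MariaDB itself. CREATED is stored as text here because SQLite has no TIMESTAMP literal, and the parameter :n and the helper names are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEST (ID INTEGER PRIMARY KEY, AMT INT, CREATED TEXT);
    INSERT INTO TEST VALUES
        (1, 1, '2000-01-01 00:00:00'),
        (2, 2, '2000-01-02 00:00:00'),
        (3, 1, '2000-01-03 00:00:00'),
        (4, 3, '2000-01-04 00:00:00'),
        (5, 5, '2000-01-05 00:00:00'),
        (6, 1, '2000-01-07 00:00:00');
""")

# Rows whose running total has already passed the limit :n.
ROWS_ABOVE = """
SELECT ID, AMT, RT FROM (
    SELECT ID, AMT, SUM(AMT) OVER (ORDER BY CREATED) AS RT, CREATED FROM TEST
) T WHERE RT > :n ORDER BY CREATED
"""

# Largest running total that is still below the limit :n.
TOTAL_BELOW = """
SELECT RT FROM (
    SELECT SUM(AMT) OVER (ORDER BY CREATED) AS RT, CREATED FROM TEST
) T WHERE RT < :n ORDER BY CREATED DESC LIMIT 1
"""

def rows_above(n):
    return conn.execute(ROWS_ABOVE, {"n": n}).fetchall()

def total_below(n):
    row = conn.execute(TOTAL_BELOW, {"n": n}).fetchone()
    return row[0] if row else None

print(rows_above(10), total_below(10))
```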

Teradata - Case - PARTITION

I have the following query that returns, for each party_id, all the related business_party_id values, the monthly accumulated value for each business party, and the rank of that value. There is a special case, business_party_id = 200: when a given party_id has more than one related business party and the business party that ranks 1 is 200, it shouldn't be considered. Instead, the business party ranked 2 for that party_id should be treated as rank 1, and 200 should move to rank 2.
How can I do that?
Here's my query:
select party_id,
       business_party_id,
       monthly_accum_event_amt,
       Row_number() OVER (PARTITION BY party_id ORDER BY monthly_accum_event_amt desc) orden
from (
    select * from P_DMT_VIEWS.BUSINESS_AGREEMENT_PAYROLL
    where LAST_DAY(payroll_dt) = '2018-05-31'
    and not business_payrroll_concept_val in ('SUE004','SUE005','SUE00A')
    and monthly_event_qty <> 0
    QUALIFY Row_number() OVER (PARTITION BY business_party_id, party_id ORDER BY monthly_event_qty desc) = 1
) a
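No answer is recorded for this question here. One possible way to express the rule is to compute the plain rank, the number of business parties per party_id, and the business party that ranks first, and then swap ranks 1 and 2 with a CASE. The sketch below tries that idea in SQLite through Python's sqlite3 module, not in Teradata. The table accum, its sample rows and the column names cnt, top_bp and final_rank are all hypothetical, and ties in the amount would need an extra tie-breaker in both ORDER BY clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accum (party_id INT, business_party_id INT, monthly_accum_event_amt INT)"
)
# Made-up rows covering the three cases described in the question.
conn.executemany("INSERT INTO accum VALUES (?, ?, ?)", [
    (1, 200, 900), (1, 10, 500), (1, 11, 100),  # 200 ranks first among several: swap 1 and 2
    (2, 200, 300),                              # 200 is the only business party: stays rank 1
    (3, 20, 800), (3, 200, 400),                # 200 is not first: ranks unchanged
])

RERANK = """
SELECT party_id, business_party_id,
       CASE WHEN top_bp = 200 AND cnt > 1 AND orden = 1 THEN 2
            WHEN top_bp = 200 AND cnt > 1 AND orden = 2 THEN 1
            ELSE orden END AS final_rank
FROM (
    SELECT party_id, business_party_id,
           ROW_NUMBER() OVER (PARTITION BY party_id
                              ORDER BY monthly_accum_event_amt DESC) AS orden,
           COUNT(*) OVER (PARTITION BY party_id) AS cnt,
           FIRST_VALUE(business_party_id) OVER (PARTITION BY party_id
                                                ORDER BY monthly_accum_event_amt DESC) AS top_bp
    FROM accum
) ranked
ORDER BY party_id, final_rank
"""

reranked = conn.execute(RERANK).fetchall()
for row in reranked:
    print(row)
```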

SQLite count(*) in WHERE clause

I have a calendar table in which there are all the dates in the future and a workday field:
fld_date / fld_workday
2014-01-01 / 1
2014-01-02 / 1
2014-01-03 / 0
...
I want to select the date that is n workdays away from another date. I tried two approaches, but both failed.
For example, the 5th workday from 2014-11-07:
1.
SELECT n1.fld_date FROM calendar as n1 WHERE n1.fld_workday=1 AND
(select count(*) FROM calendar as n2 WHERE n2.fld_date>='2014-11-07' AND n2.fld_workday=1)=5
It returned 0 rows.
2.
SELECT fld_date FROM calendar WHERE fld_date>='2014-11-07' AND fld_workday=1 LIMIT 1 OFFSET 5
This works, but I would like to replace the constant 5 with a field, which is not allowed (it would be inside a bigger SELECT statement):
SELECT fld_date FROM calendar WHERE fld_date>='2014-11-07' AND fld_workday=1 LIMIT 1 OFFSET fld_another_field
Any suggestions?
In the first query, the subquery does not refer to the row in n1.
You need a correlated subquery:
SELECT fld_Date
FROM Calendar AS n1
WHERE fld_WorkDay = 1
  AND (SELECT COUNT(*)
       FROM Calendar AS n2
       WHERE fld_Date BETWEEN '2014-11-07' AND n1.fld_Date
         AND fld_WorkDay = 1
      ) = 5
LIMIT 1
The subquery is extremely inefficient if there is no index on the fld_Date column.
You can avoid executing the subquery for every row in n1 by adding another condition with an estimate of the result date (assuming that there are about four to five work days per week, so that n work days span roughly n*7/5 to n*7/4 calendar days, and using a few extra days to be sure):
...
WHERE fld_Date BETWEEN date('2014-11-07', (5 * 7/5 - 10) || ' days')
                   AND date('2014-11-07', (5 * 7/4 + 10) || ' days')
  AND fld_WorkDay = 1
  AND (SELECT ...
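The correlated subquery can be checked on a small calendar. The sketch below uses Python's sqlite3 module with a hypothetical calendar for November 2014 in which Monday to Friday are workdays (the real table may mark holidays differently). The number of workdays is bound as the parameter :n here; inside a bigger SELECT it could equally be a column, since the comparison is an ordinary expression rather than an OFFSET.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (fld_date TEXT PRIMARY KEY, fld_workday INTEGER)")

# Hypothetical calendar: 2014-11-01 .. 2014-11-30, Monday to Friday marked as workdays.
first_day = date(2014, 11, 1)
days = [first_day + timedelta(days=i) for i in range(30)]
conn.executemany(
    "INSERT INTO calendar VALUES (?, ?)",
    [(d.isoformat(), 1 if d.weekday() < 5 else 0) for d in days],
)

# For each candidate workday n1, count the workdays from :start up to and including n1.
NTH_WORKDAY = """
SELECT fld_date
FROM calendar AS n1
WHERE fld_workday = 1
  AND (SELECT COUNT(*)
       FROM calendar AS n2
       WHERE n2.fld_date BETWEEN :start AND n1.fld_date
         AND n2.fld_workday = 1) = :n
LIMIT 1
"""

def nth_workday(start, n):
    """Return the n-th workday counting from start (inclusive), or None if out of range."""
    row = conn.execute(NTH_WORKDAY, {"start": start, "n": n}).fetchone()
    return row[0] if row else None

print(nth_workday("2014-11-07", 5))
```

Note that the count includes the start date itself when it is a workday, so the 1st workday from 2014-11-07 (a Friday) is that same day.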
