MariaDB, merge items based on set of rules

How to implement the following rules (knowing A < B < C < D)

rule1    | rule2    | rule3
---------|----------|---------
A->B     | A->B     | A->C
(B+1)->C | A->C     | B->C
(B+1)->D | (B+1)->D | (C+1)->D

All of them should result in A->D
on this dataset:
create table testing
(
id int,
start_dt date,
stop_dt date
);
insert into testing
values (2, date '2010-02-14', date '2010-03-22'), # R1
(3, date '2010-03-23', date '2010-04-12'), # R1
(4, date '2010-03-23', date '2010-05-14'), # R1
(5, date '2010-05-15', date '2010-06-07'), # R1
# -> 2010-02-14 | 2010-06-07
(6, date '2011-01-01', date '2011-02-02'), # R2
(7, date '2011-01-01', date '2011-03-04'), # R2
(8, date '2011-02-03', date '2011-04-04'), # R2
# -> 2011-01-01 | 2011-04-04
(14, date '2014-05-05', date '2014-06-06'), # R3
(15, date '2014-05-07', date '2014-06-06'), # R3
(16, date '2014-06-07', date '2014-12-12'); # R3
# -> 2014-05-05 | 2014-12-12
My current results:

start_dt   | stop_dt
-----------|-----------
2010-02-14 | 2010-06-07
2011-01-01 | 2011-03-04   # should merge
2011-02-03 | 2011-04-04   # with this
2014-05-05 | 2014-12-12
My current approach:
If two similar start_dt exists then merge by applying min(start_dt)
If two similar stop_dt exists then merge by applying max(stop_dt)
However, this approach violates rule 2 (A->B is removed)
select min(start_dt) as start_dt,
       case when count(*) = count(stop_dt) then max(stop_dt) end as stop_dt,
       grp
from (select start_dt,
             stop_dt,
             count(flag) over (order by start_dt, stop_dt) as grp
      from (select start_dt,
                   stop_dt,
                   if(lag(stop_dt) over (order by start_dt, stop_dt) =
                      start_dt - interval 1 day, null, 1) as flag
            from (select start_dt,
                         max(stop_dt) as stop_dt
                  from (select min(start_dt) as start_dt,
                               stop_dt,
                               if(stop_dt is null, 1, 0) as grp
                        from testing
                        group by stop_dt
                        order by start_dt, grp) as minAndMaxDate
                  group by start_dt, grp) as sdsd) with_flags) grouped
group by grp
order by grp, start_dt;
I am using 10.4.12-MariaDB
Thanks.
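For reference, the intended merge logic (treat ranges that overlap, or that touch within one day, as one island) can be sketched outside SQL. This is an illustrative Python sketch of the expected result, not the MariaDB query itself:

```python
from datetime import date, timedelta

def merge_ranges(ranges):
    """Merge date ranges that overlap or are adjacent (gap of exactly one day)."""
    out = []
    for start, stop in sorted(ranges):
        # Extend the current island if this range starts on or before
        # the day after the island's current stop date.
        if out and start <= out[-1][1] + timedelta(days=1):
            out[-1][1] = max(out[-1][1], stop)
        else:
            out.append([start, stop])
    return [(s, e) for s, e in out]

rows = [
    (date(2010, 2, 14), date(2010, 3, 22)),
    (date(2010, 3, 23), date(2010, 4, 12)),
    (date(2010, 3, 23), date(2010, 5, 14)),
    (date(2010, 5, 15), date(2010, 6, 7)),
    (date(2011, 1, 1), date(2011, 2, 2)),
    (date(2011, 1, 1), date(2011, 3, 4)),
    (date(2011, 2, 3), date(2011, 4, 4)),
    (date(2014, 5, 5), date(2014, 6, 6)),
    (date(2014, 5, 7), date(2014, 6, 6)),
    (date(2014, 6, 7), date(2014, 12, 12)),
]
print(merge_ranges(rows))
```

Run against the sample data, this collapses each group into the single expected A->D range per year.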


SQL: grouping to have exact rows

Let's say there is a schema:
|date|value|
DBMS is SQLite.
I want to get N groups and calculate AVG(value) for each of them.
Sample:
2020-01-01 10:00|2.0
2020-01-01 11:00|2.0
2020-01-01 12:00|3.0
2020-01-01 13:00|10.0
2020-01-01 14:00|2.0
2020-01-01 15:00|3.0
2020-01-01 16:00|11.0
2020-01-01 17:00|2.0
2020-01-01 18:00|3.0
Result (N=3):
2020-01-01 11:00|7.0/3
2020-01-01 14:00|15.0/3
2020-01-01 17:00|16.0/3
I need to use a windowing function, like NTILE, but it seems NTILE is not usable after GROUP BY. It can create buckets, but then how can I use these buckets for aggregation?
SELECT
/*AVG(*/value/*)*/,
NTILE (3) OVER (ORDER BY date) bucket
FROM
test
/*GROUP BY bucket*/
/*GROUP BY NTILE (3) OVER (ORDER BY date) bucket*/
Also dropped the test data and this query into DBFiddle.
You can use NTILE() window function to create the groups and aggregate:
SELECT
DATETIME(MIN(DATE), ((STRFTIME('%s', MAX(DATE)) - STRFTIME('%s', MIN(DATE))) / 2) || ' second') date,
ROUND(AVG(value), 2) avg_value
FROM (
SELECT *, NTILE(3) OVER (ORDER BY date) grp
FROM test
)
GROUP BY grp;
To change the number of buckets (and with it the number of rows that fall in each one), change the number 3 inside the parentheses of NTILE().
See the demo.
Results:
| date | avg_value |
| ------------------- | --------- |
| 2020-01-01 11:00:00 | 2.33 |
| 2020-01-01 14:00:00 | 5 |
| 2020-01-01 17:00:00 | 5.33 |
I need to use a windowing function, like NTILE, but it seems NTILE is not usable after GROUP BY. It can create buckets, but then how can I use these buckets for aggregation?
You first use NTILE to assign bucket numbers in a subquery, then group by it in an outer query.
Using sub-query
SELECT bucket
, AVG(value) AS avg_value
FROM ( SELECT value
, NTILE(3) OVER ( ORDER BY date ) AS bucket
FROM test
) x
GROUP BY bucket
ORDER BY bucket
Using WITH clause
WITH x AS (
SELECT date
, value
, NTILE(3) OVER ( ORDER BY date ) AS bucket
FROM test
)
SELECT bucket
, COUNT(*) AS bucket_size
, MIN(date) AS from_date
, MAX(date) AS to_date
, MIN(value) AS min_value
, AVG(value) AS avg_value
, MAX(value) AS max_value
, SUM(value) AS sum_value
FROM x
GROUP BY bucket
ORDER BY bucket
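Both answers follow the same pattern: assign the bucket number in a subquery, then aggregate over it in the outer query. A quick way to check this locally (assuming SQLite 3.25+ for window-function support):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (date TEXT, value REAL)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [
    ("2020-01-01 10:00", 2.0), ("2020-01-01 11:00", 2.0),
    ("2020-01-01 12:00", 3.0), ("2020-01-01 13:00", 10.0),
    ("2020-01-01 14:00", 2.0), ("2020-01-01 15:00", 3.0),
    ("2020-01-01 16:00", 11.0), ("2020-01-01 17:00", 2.0),
    ("2020-01-01 18:00", 3.0),
])

# NTILE() is assigned in the subquery; GROUP BY then uses the bucket number.
rows = conn.execute("""
    SELECT bucket, ROUND(AVG(value), 2) AS avg_value
    FROM (SELECT value, NTILE(3) OVER (ORDER BY date) AS bucket FROM test)
    GROUP BY bucket
    ORDER BY bucket
""").fetchall()
print(rows)   # three buckets of three rows each
```

The nine rows split into three buckets of three, giving the same averages as the answer's result table.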

Split a row into multiple rows - Teradata

Below is an example of my table
Names Start_Date Orders Items
AAA   2020-01-01 300    100
BAA   2020-02-01 896    448
My requirement would be as below:
Names Start_Date Orders
AAA   2020-01-01 100
AAA   2020-01-01 100
AAA   2020-01-01 100
BBB   2020-02-01 448
BBB   2020-02-01 448
The rows should be split based on the (Orders/Items) value
This is a nice task for Teradata's SQL extension to create time series (based on @Andrew's test data):
SELECT *
FROM vt_foo
EXPAND ON PERIOD(start_date, start_date + Cast(Ceiling(Cast(orders AS FLOAT)/items) AS INT)) AS pd
For an exact split of orders into items:
SELECT dt.*,
CASE WHEN items * (end_date - start_date) > orders
THEN orders MOD items
ELSE items
end
FROM
(
SELECT t.*, End(pd) AS end_date
FROM vt_foo AS t
EXPAND ON PERIOD(start_date, start_date + Cast(Ceiling(Cast(orders AS FLOAT)/items) AS INT)) AS pd
) AS dt
This calls for a recursive CTE. Here's how I'd approach it, with a lovely volatile table for some sample data.
create volatile table vt_foo
(names varchar(100), start_date date, orders int, items int)
on commit preserve rows;
insert into vt_foo values ('AAA','2020-01-01',300,100);
insert into vt_foo values ('BAA','2020-02-01',896,448);
insert into vt_foo values ('CCC','2020-03-01',525,100);
with recursive cte (names, start_date, items, num, counter) as (
    select
        names,
        start_date,
        items,
        round(orders / (items * 1.0)) as num,
        1 as counter
    from vt_foo
    UNION ALL
    select
        a.names,
        a.start_date,
        a.items,
        b.num,
        b.counter + 1
    from vt_foo a
    inner join cte b
        on a.names = b.names
        and a.start_date = b.start_date
    where b.counter + 1 <= b.num
)
select * from cte
order by names, start_date
This bit: b.counter + 1 <= b.num is the key to limiting the output to the proper # of rows per product/date.
I think this should be ok, but test it with small volumes of data.
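The row-splitting itself is easy to sanity-check outside Teradata. This is an illustrative Python sketch of what the expansion produces, using Ceiling as in the EXPAND ON answer so the last row carries the MOD remainder; the CCC row (525/100) is my added example of a non-exact split:

```python
import math

def split_row(name, start_date, orders, items):
    """Split one row into ceil(orders/items) rows; the last row
    carries the remainder when orders is not a multiple of items."""
    n = math.ceil(orders / items)
    rows = []
    for i in range(n):
        qty = items
        if i == n - 1 and orders % items:
            qty = orders % items   # exact split, like "orders MOD items"
        rows.append((name, start_date, qty))
    return rows

print(split_row("AAA", "2020-01-01", 300, 100))
print(split_row("BAA", "2020-02-01", 896, 448))
print(split_row("CCC", "2020-03-01", 525, 100))
```

AAA becomes three rows of 100, BAA two rows of 448, and CCC five rows of 100 plus one of 25.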

How to get nearest DateTime from 2 tables

In SQLite, I want to build a query to get the nearest datetime for 'tag' entries against a 'tick' list:
CREATE TABLE Tick (
id integer primary key,
dt varchar(20)
);
INSERT INTO Tick (id, dt) VALUES
( 1, '2018-10-30 13:00:00'),
( 2, '2018-10-30 14:00:00'),
( 3, '2018-10-30 15:00:00'),
( 4, '2018-10-30 16:00:00'),
( 5, '2018-10-30 17:00:00'),
( 6, '2018-10-30 18:00:00'),
( 7, '2018-10-30 19:00:00'),
( 8, '2018-10-31 05:00:00'),
( 9, '2018-10-31 06:00:00'),
(10, '2018-10-31 07:00:00');
CREATE TABLE Tag (
id integer primary key,
dt varchar(20)
);
INSERT INTO Tag (id, dt) VALUES
(100, '2018-10-30 16:08:00'),
(101, '2018-10-30 17:30:00'),
(102, '2018-10-30 19:12:00'),
(103, '2018-10-31 04:00:00'),
(104, '2018-10-31 13:00:00');
The following query gives me the right match (based on diff), but I'm unable to get the Tick columns:
SELECT Tag.dt,
(SELECT ABS(strftime('%s',Tick.dt) - strftime('%s',Tag.dt)) as diff
FROM Tick
ORDER BY diff ASC
LIMIT 1
) as diff from Tag
I tried the following but I receive an error on Tag.dt in ORDER BY:
SELECT Tag.id, Tag.dt,
       Tick.id, Tick.dt,
       abs(strftime('%s',Tick.dt) - strftime('%s',Tag.dt)) as Diff
FROM Tag
JOIN Tick ON Tick.dt = (SELECT Tick.dt
                        FROM Tick
                        ORDER BY abs(strftime('%s',Tick.dt) - strftime('%s',Tag.dt)) ASC
                        LIMIT 1)
The result I would like to have is something like:
TagID,DateTimeTag ,TickID,DateTimeTick
100,2018-10-30 16:08:00, 4,2018-10-30 16:00:00
101,2018-10-30 17:30:00, 6,2018-10-30 18:00:00
102,2018-10-30 19:12:00, 7,2018-10-30 19:00:00
103,2018-10-31 04:00:00, 8,2018-10-31 05:00:00
104,2018-10-31 13:00:00, 10,2018-10-31 07:00:00
Edited later...
Based on forpas's answer, I was able to derive something without using the ROW_NUMBER() function, which I can't use in FME. I also set a maximum delta time difference (10000 sec) to find a match:
SELECT t.TagId, t.Tagdt, t.TickId, t.Tickdt, MIN(t.Diff)
FROM
(
SELECT
Tag.id as TagId, Tag.dt as Tagdt,
Tick.id as TickId, Tick.dt as Tickdt,
abs(strftime('%s',Tick.dt) - strftime('%s',Tag.dt)) as Diff
FROM Tag, Tick
WHERE Diff < 10000
) AS t
GROUP BY t.TagId
Thanks again!
Use ROW_NUMBER() window function:
SELECT t.tagID, t.tagDT, t.tickID, t.tickDT
FROM (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.tagID, t.tagDT ORDER BY t.Diff) AS rn
FROM (
SELECT Tag.id tagID, Tag.dt tagDT, Tick.id tickID, Tick.dt tickDT,
ABS(strftime('%s',Tick.dt) - strftime('%s',Tag.dt)) as Diff
FROM Tag CROSS JOIN Tick
) AS t
) AS t
WHERE t.rn = 1
See the demo.
Results:
| tagID | tagDT | tickID | tickDT |
| ----- | ------------------- | ------ | ------------------- |
| 100 | 2018-10-30 16:08:00 | 4 | 2018-10-30 16:00:00 |
| 101 | 2018-10-30 17:30:00 | 5 | 2018-10-30 17:00:00 |
| 102 | 2018-10-30 19:12:00 | 7 | 2018-10-30 19:00:00 |
| 103 | 2018-10-31 04:00:00 | 8 | 2018-10-31 05:00:00 |
| 104 | 2018-10-31 13:00:00 | 10 | 2018-10-31 07:00:00 |
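The ROW_NUMBER() approach can be verified in any SQLite 3.25+ client. Note that tag 101 is exactly 30 minutes from two ticks, so the sketch below adds an explicit tie-break by Tick id (my assumption, not part of the original query) to make the result deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tick (id INTEGER PRIMARY KEY, dt TEXT);
INSERT INTO Tick VALUES (1,'2018-10-30 13:00:00'),(2,'2018-10-30 14:00:00'),
 (3,'2018-10-30 15:00:00'),(4,'2018-10-30 16:00:00'),(5,'2018-10-30 17:00:00'),
 (6,'2018-10-30 18:00:00'),(7,'2018-10-30 19:00:00'),(8,'2018-10-31 05:00:00'),
 (9,'2018-10-31 06:00:00'),(10,'2018-10-31 07:00:00');
CREATE TABLE Tag (id INTEGER PRIMARY KEY, dt TEXT);
INSERT INTO Tag VALUES (100,'2018-10-30 16:08:00'),(101,'2018-10-30 17:30:00'),
 (102,'2018-10-30 19:12:00'),(103,'2018-10-31 04:00:00'),(104,'2018-10-31 13:00:00');
""")

# Rank every Tick per Tag by absolute time difference; keep the closest.
rows = conn.execute("""
    SELECT tagID, tickID FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY t.tagID
                                       ORDER BY t.Diff, t.tickID) AS rn
        FROM (SELECT Tag.id tagID, Tick.id tickID,
                     ABS(strftime('%s',Tick.dt) - strftime('%s',Tag.dt)) AS Diff
              FROM Tag CROSS JOIN Tick) AS t
    ) WHERE rn = 1
    ORDER BY tagID
""").fetchall()
print(rows)
```

With the tie-break, tag 101 pairs with tick 5 (17:00), matching the result table above.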
Create a temp_table query to get the differences of the timestamps in the cross product of the Tick and Tag tables, and select the min value for each Tag id.
The two temp_table queries are identical.
Note that this query may not be efficient, as it takes the full cross product of the two tables.
SELECT temp_table.tid, temp_table.tdt, temp_table.tiid, temp_table.tidt, temp_table.diff
FROM
(SELECT Tag.id AS tid, Tag.dt AS tdt, Tick.id AS tiid, Tick.dt AS tidt, abs(strftime('%s',Tick.dt) - strftime('%s',Tag.dt)) as diff
FROM tag, tick) temp_table
WHERE temp_table.diff =
(SELECT MIN(temp_table2.diff) FROM
(SELECT Tag.id AS tid, Tag.dt AS tdt, Tick.id AS tiid, Tick.dt AS tidt, abs(strftime('%s',Tick.dt) - strftime('%s',Tag.dt)) as diff
FROM tag, tick) temp_table2
WHERE temp_table2.tid = temp_table.tid
)
group by temp_table.tid

Not A valid month error while subtracting timestamps

I am trying to subtract timestamp values, but it gives an ORA-01843: NOT A VALID MONTH error.
The query below runs fine in SQL Developer,
but at runtime it throws the "not a valid month" error.
I am not able to find out why. Can anybody modify this query?
Select substr(TO_TIMESTAMP(TO_CHAR(end_time,'DD-MM-YY HH12:MI:SS'))-(TO_TIMESTAMP(TO_CHAR(start_time,'DD-MM-YY HH12:MI:SS')),12,8))as Duration from Job_execution
If those datatypes are TIMESTAMP, then why don't you just subtract them?
SQL> create table job_execution
2 (id number,
3 start_time timestamp(6),
4 end_time timestamp(6));
Table created.
SQL> insert into job_execution (id, start_time, end_time) values
2 (1, to_timestamp('20.11.2019 10:30:00', 'dd.mm.yyyy hh24:mi:ss'),
3 to_timestamp('25.11.2019 14:00:00', 'dd.mm.yyyy hh24:mi:ss'));
1 row created.
SQL> select end_time - start_time diff from job_execution where id = 1;
DIFF
---------------------------------------------------------------------------
+000000005 03:30:00.000000
SQL>
"Not a valid month" can be the result of timestamp values stored in VARCHAR2 columns, where you think everything was entered correctly but there are values such as 25.18.2019 (dd.mm.yyyy), as there's no 18th month in any year.
That's why I asked for the datatype.
[EDIT: how to format the result]
If you want a "nice" displayed result, then it requires some more typing. For example:
SQL> with difference as
2 (select end_time - start_Time as diff
3 from job_execution
4 where id = 1
5 )
6 select extract (day from diff) ||' '||
7 lpad(extract (hour from diff), 2, '0') ||':'||
8 lpad(extract (minute from diff), 2, '0') ||':'||
9 lpad(extract (second from diff), 2, '0') result
10 from difference;
RESULT
--------------------------------------------------------------
5 03:30:00
SQL>
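The principle is the same in any host language: subtracting two timestamps yields a duration (Oracle's INTERVAL DAY TO SECOND), not another timestamp, and formatting it is a separate step. A rough Python analogue of the example above, for illustration only:

```python
from datetime import datetime

start = datetime(2019, 11, 20, 10, 30, 0)
end = datetime(2019, 11, 25, 14, 0, 0)
diff = end - start                       # a timedelta, i.e. a duration
hours, rem = divmod(diff.seconds, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{diff.days} {hours:02d}:{minutes:02d}:{seconds:02d}")  # 5 03:30:00
```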

Create multiple rows based off a date range

I have a calendar query and a table below. I have a StartDate and EndDate for a member. On my calendar table I have also captured a "WeekOf" based on the StartDate. I would like to capture whether a member is active at any time during that week. See expected results.
SELECT DISTINCT
--CA.CALENDAR_DATE,
TO_CHAR(CALENDAR_DATE,'MM/DD/YYYY') AS CALENDAR_DATE,
TO_CHAR(NEXT_DAY(CALENDAR_DATE, 'Monday') - 7, 'MM/DD/YY-') ||
TO_CHAR(NEXT_DAY(CALENDAR_DATE, 'Monday') - 1, 'MM/DD/YY') AS WEEK_OF_YEAR,
ROW_NUMBER () OVER ( ORDER BY CALENDAR_DATE) AS MasterCalendar_RNK
FROM CALENDAR CA
WHERE 1=1
--AND CA.CALENDAR_DATE BETWEEN ADD_MONTHS(TRUNC(SYSDATE), -12) AND TRUNC(SYSDATE)
--AND CA.CALENDAR_DATE BETWEEN TRUNC(SYSDATE) -5 AND TRUNC(SYSDATE)
ORDER BY TO_DATE(CALENDAR_DATE,'MM/DD/YYYY') DESC
Table
Member StartDate EndDate
A      1/31/17
B      2/1/17    2/15/17
Expected output:
Member StartDate EndDate Week_Of_Year    Active
A      1/31/17           1/30/17-2/5/17  1
A      1/31/17           2/6/17-2/12/17  1
A      1/31/17           2/13/17-2/19/17 1
B      2/1/17    2/15/17 1/30/17-2/5/17  1
B      2/1/17    2/15/17 2/6/17-2/12/17  1
B      2/1/17    2/15/17 2/13/17-2/19/17 1
Current Query:
WITH MASTER_CALENDAR AS (
SELECT TRUNC(SYSDATE) + 1 - LEVEL , A.CALENDAR_DATE
FROM (SELECT C.CALENDAR_DATE FROM MST.CALENDAR C WHERE 1=1 AND C.CALENDAR_DATE > SYSDATE-30 AND C.CALENDAR_DATE < SYSDATE) A
WHERE 1=1
CONNECT BY LEVEL <= 1 --NEED TO UPDATE?
ORDER BY A.CALENDAR_DATE DESC
),
ActiveMembers AS (
SELECT H.CLT_CLT_PGMID, H.START_DT
,CASE WHEN TRUNC(H.END_DT) = '1-JAN-3000'
THEN SYSDATE
ELSE TO_DATE(H.END_DT)
END AS END_DT
FROM H
WHERE 1=1
AND H.CLT_CLT_PGMID IN ('1','2','3')
)
SELECT CLT_CLT_PGMID, STARTDATE, ENDDATE, WEEK_OF_YEAR, ACTIVE -- but not week_start
FROM (
SELECT DISTINCT A.CLT_CLT_PGMID,
TO_CHAR(A.START_DT, 'MM/DD/YY') AS STARTDATE,
TO_CHAR(A.END_DT, 'MM/DD/YY') AS ENDDATE,
NEXT_DAY(CAL.CALENDAR_DATE, 'Monday') - 7 AS WEEK_START, -- for ordering later
TO_CHAR(NEXT_DAY(CAL.CALENDAR_DATE, 'Monday') - 7, 'MM/DD/YY-') ||
TO_CHAR(NEXT_DAY(CAL.CALENDAR_DATE, 'Monday') - 1, 'MM/DD/YY') AS WEEK_OF_YEAR,
1 AS ACTIVE
FROM ActiveMembers A
INNER JOIN MASTER_CALENDAR CAL ON CAL.CALENDAR_DATE BETWEEN A.START_DT AND A.END_DT
--BETWEEN TO_CHAR(A.START_DT,'MM/DD/YYYY') AND COALESCE(A.END_DT,(SYSDATE))
)
WHERE 1=1
ORDER BY
CLT_CLT_PGMID , STARTDATE, ENDDATE, WEEK_START
;
Since the calendar query currently generates strings, it would be simpler to go back to the calendar table, join that to your member/date table, and regenerate the week range string:
With CTEs to represent your calendar table (just with dates for the last few weeks for now) and member data:
with calendar(calendar_date) as (
select trunc(sysdate) + 1 - level from dual connect by level <= 42
),
mytable (member, startdate, enddate) as (
select cast('A' as varchar2(6)), date '2017-01-31', cast (null as date) from dual
union all select cast('B' as varchar2(6)), date '2017-02-01', date '2017-02-15' from dual
)
select member, startdate, enddate, week_of_year, active -- but not week_start
from (
select distinct m.member,
to_char(m.startdate, 'MM/DD/YY') as startdate,
to_char(m.enddate, 'MM/DD/YY') as enddate,
next_day(c.calendar_date, 'Monday') - 7 as week_start, -- for ordering later
to_char(next_day(c.calendar_date, 'Monday') - 7, 'MM/DD/YY-') ||
to_char(next_day(c.calendar_date, 'Monday') - 1, 'MM/DD/YY') as week_of_year,
1 as active
from mytable m
join calendar c
on c.calendar_date between m.startdate and coalesce(m.enddate, trunc(sysdate))
)
order by member, startdate, enddate, week_start;
gets
MEMBER STARTDAT ENDDATE WEEK_OF_YEAR ACTIVE
------ -------- -------- ----------------- ----------
A 01/31/17 01/30/17-02/05/17 1
A 01/31/17 02/06/17-02/12/17 1
A 01/31/17 02/13/17-02/19/17 1
A 01/31/17 02/20/17-02/26/17 1
B 02/01/17 02/15/17 01/30/17-02/05/17 1
B 02/01/17 02/15/17 02/06/17-02/12/17 1
B 02/01/17 02/15/17 02/13/17-02/19/17 1
You haven't specified an upper limit for members with no end-date, so I've used today, via coalesce().
The inner query is only needed for ordering, as the week range string can't be used, and you don't want to see the week start on its own; and you can't use distinct and order by a field you aren't selecting.
I'd do this in a similar way to Alex, but slightly differently. Seeing as your weeks start with a Monday, I'd use TRUNC(dt, 'iw') to get the ISO start of the week (which happens to be defined as a Monday) for the specified date. Then I'd get the distinct values of those before joining to your table, like so:
with calendar as (select trunc(sysdate) - level + 1 calendar_date
from dual
connect by level <= 50),
your_table as (select 'A' member, date '2017-01-31' startdate, NULL enddate from dual union all
select 'B' member, date '2017-02-01' startdate, date '2017-02-15' enddate from dual)
select yt.member,
yt.startdate,
yt.enddate,
to_char(c.week_start, 'mm/dd/yyyy')
|| ' - ' || to_char(c.week_start + 6, 'mm/dd/yyyy') week_of_year,
1 as active
from your_table yt
inner join (select distinct trunc(cl.calendar_date, 'iw') week_start
from calendar cl) c on c.week_start <= nvl(yt.enddate, SYSDATE) AND c.week_start + 6 >= yt.startdate
order by yt.member,
c.week_start;
MEMBER STARTDATE ENDDATE WEEK_OF_YEAR ACTIVE
------ ---------- ---------- ----------------------- ----------
A 01/31/2017 01/30/2017 - 02/05/2017 1
A 01/31/2017 02/06/2017 - 02/12/2017 1
A 01/31/2017 02/13/2017 - 02/19/2017 1
A 01/31/2017 02/20/2017 - 02/26/2017 1
B 02/01/2017 02/15/2017 01/30/2017 - 02/05/2017 1
B 02/01/2017 02/15/2017 02/06/2017 - 02/12/2017 1
B 02/01/2017 02/15/2017 02/13/2017 - 02/19/2017 1
Like Alex, I've assumed your null enddate runs up until today (sysdate). However, looking at your results for member B, it looks like you're looking for an overlapping range (since 30th Jan is not between 1st and 15th Feb), so I've amended my join clause accordingly. This results in an extra row for member A, so maybe you're wanting to run null enddates up until the previous Sunday of sysdate? Not sure. I'm sure you'll be able to amend that yourself, if you need to.
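The week-generation and null-end-date handling in both answers can be modeled directly. A Python sketch, with the open-ended end date pinned to a fixed 2017-02-26 "today" (my assumption, so the output is reproducible):

```python
from datetime import date, timedelta

def monday_of(d):
    """ISO week start (Monday), like Oracle's TRUNC(dt, 'iw')."""
    return d - timedelta(days=d.weekday())

def active_weeks(start, end, today=date(2017, 2, 26)):
    """All (week_start, week_end) pairs in which the member is active."""
    end = end or today              # null end date runs up to "today"
    week = monday_of(start)         # first week containing the start date
    out = []
    while week <= end:              # week overlaps the active range
        out.append((week, week + timedelta(days=6)))
        week += timedelta(days=7)
    return out

print(active_weeks(date(2017, 1, 31), None))            # member A, open-ended
print(active_weeks(date(2017, 2, 1), date(2017, 2, 15)))  # member B
```

Member A gets four weeks (through the week of 02/20) and member B three, matching the result tables above.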
