SQL: grouping to have exact rows - sqlite

Let's say there is a schema:
|date|value|
DBMS is SQLite.
I want to get N groups and calculate AVG(value) for each of them.
Sample:
2020-01-01 10:00|2.0
2020-01-01 11:00|2.0
2020-01-01 12:00|3.0
2020-01-01 13:00|10.0
2020-01-01 14:00|2.0
2020-01-01 15:00|3.0
2020-01-01 16:00|11.0
2020-01-01 17:00|2.0
2020-01-01 18:00|3.0
Result (N=3):
2020-01-01 11:00|7.0/3
2020-01-01 14:00|15.0/3
2020-01-01 17:00|16.0/3
I need to use a windowing function, like NTILE, but it seems NTILE is not usable after GROUP BY. It can create buckets, but then how can I use these buckets for aggregation?
SELECT
/*AVG(*/value/*)*/,
NTILE (3) OVER (ORDER BY date) bucket
FROM
test
/*GROUP BY bucket*/
/*GROUP BY NTILE (3) OVER (ORDER BY date) bucket*/
Also dropped the test data and this query into DBFiddle.

You can use the NTILE() window function in a subquery to create the groups, and then aggregate over them:
SELECT
DATETIME(MIN(DATE), ((STRFTIME('%s', MAX(DATE)) - STRFTIME('%s', MIN(DATE))) / 2) || ' second') date,
ROUND(AVG(value), 2) avg_value
FROM (
SELECT *, NTILE(3) OVER (ORDER BY date) grp
FROM test
)
GROUP BY grp;
To change the number of buckets (and therefore how many rows fall into each one), change the number 3 inside the parentheses of NTILE().
See the demo.
Results:
| date | avg_value |
| ------------------- | --------- |
| 2020-01-01 11:00:00 | 2.33 |
| 2020-01-01 14:00:00 | 5 |
| 2020-01-01 17:00:00 | 5.33 |

I need to use a windowing function, like NTILE, but it seems NTILE is not usable after GROUP BY. It can create buckets, but then how can I use these buckets for aggregation?
You first use NTILE to assign bucket numbers in a subquery, then group by it in an outer query.
Using sub-query
SELECT bucket
, AVG(value) AS avg_value
FROM ( SELECT value
, NTILE(3) OVER ( ORDER BY date ) AS bucket
FROM test
) x
GROUP BY bucket
ORDER BY bucket
Using WITH clause
WITH x AS (
SELECT date
, value
, NTILE(3) OVER ( ORDER BY date ) AS bucket
FROM test
)
SELECT bucket
, COUNT(*) AS bucket_size
, MIN(date) AS from_date
, MAX(date) AS to_date
, MIN(value) AS min_value
, AVG(value) AS avg_value
, MAX(value) AS max_value
, SUM(value) AS sum_value
FROM x
GROUP BY bucket
ORDER BY bucket
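To check the subquery approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The function name bucket_averages and the in-memory database are only for illustration, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions).

```python
import sqlite3

# Sample rows from the question: (date, value)
ROWS = [
    ("2020-01-01 10:00", 2.0),
    ("2020-01-01 11:00", 2.0),
    ("2020-01-01 12:00", 3.0),
    ("2020-01-01 13:00", 10.0),
    ("2020-01-01 14:00", 2.0),
    ("2020-01-01 15:00", 3.0),
    ("2020-01-01 16:00", 11.0),
    ("2020-01-01 17:00", 2.0),
    ("2020-01-01 18:00", 3.0),
]


def bucket_averages(rows, n_buckets):
    """Return (bucket, from_date, to_date, avg_value) for each NTILE bucket."""
    n = int(n_buckets)  # validated here because it is formatted into the SQL text
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (date TEXT, value REAL)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    # Inner query assigns the bucket number, outer query aggregates per bucket.
    result = conn.execute(
        f"""
        SELECT bucket, MIN(date), MAX(date), AVG(value)
        FROM (
            SELECT date, value,
                   NTILE({n}) OVER (ORDER BY date) AS bucket
            FROM test
        )
        GROUP BY bucket
        ORDER BY bucket
        """
    ).fetchall()
    conn.close()
    return result


averages = bucket_averages(ROWS, 3)
for row in averages:
    print(row)
```

With N=3 the three averages should come out as 7/3, 5 and 16/3, which matches the result asked for in the question.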

Related

Split a row into multiple rows - Teradata

Below is an example of my table
Names Start_Date Orders Items
AAA 2020-01-01 300 100
BAA 2020-02-01 896 448
My requirement would be as below
Names Start_Date Orders
AAA 2020-01-01 100
AAA 2020-01-01 100
AAA 2020-01-01 100
BAA 2020-02-01 448
BAA 2020-02-01 448
The rows should be split based on the (Orders/Items) value
This is a nice task for Teradata's SQL extension to create time series (based on @Andrew's test data):
SELECT *
FROM vt_foo
EXPAND ON PERIOD(start_date, start_date + Cast(Ceiling(Cast(orders AS FLOAT)/items) AS INT)) AS pd
For an exact split of orders into items:
SELECT dt.*,
CASE WHEN items * (end_date - start_date) > orders
THEN orders MOD items
ELSE items
end
FROM
(
SELECT t.*, End(pd) AS end_date
FROM vt_foo AS t
EXPAND ON PERIOD(start_date, start_date + Cast(Ceiling(Cast(orders AS FLOAT)/items) AS INT)) AS pd
) AS dt
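The arithmetic behind the exact split can be hard to read from the CASE expression alone, so here is a small Python sketch of the same logic as I read it. It is not Teradata code; split_orders is a made-up helper, and k stands in for end_date - start_date of each expanded row.

```python
import math


def split_orders(orders, items):
    """Split `orders` into rows of `items` each; the last row gets the remainder."""
    n_rows = math.ceil(orders / items)  # same as Ceiling(orders / items) in the query
    chunks = []
    for k in range(1, n_rows + 1):      # k plays the role of end_date - start_date
        if items * k > orders:
            chunks.append(orders % items)  # partial last row (orders MOD items)
        else:
            chunks.append(items)           # full row
    return chunks


print(split_orders(300, 100))
print(split_orders(896, 448))
print(split_orders(525, 100))
```

For the two rows in the question this gives three rows of 100 and two rows of 448; for the extra 525/100 test row it gives five rows of 100 and one of 25.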
This calls for a recursive CTE. Here's how I'd approach it, with a lovely volatile table for some sample data.
create volatile table vt_foo
(names varchar(100), start_date date, orders int, items int)
on commit preserve rows;
insert into vt_foo values ('AAA','2020-01-01',300,100);
insert into vt_foo values ('BAA','2020-02-01',896,448);
insert into vt_foo values ('CCC','2020-03-01',525,100);
with recursive cte (names, start_date,items, num, counter) as (
select
names,
start_date,
items,
round(orders /( items * 1.0) ) as num ,
1 as counter
from vt_foo
UNION ALL
select
a.names,
a.start_date,
a.items,
b.num,
b.counter + 1
from vt_foo a
inner join cte b
on a.names = b.names
and a.start_date =b.start_date
where b.counter + 1 <= b.num
)
select * from cte
order by names,start_date
This bit: b.counter + 1 <= b.num is the key to limiting the output to the proper number of rows per name/date.
I think this should be ok, but test it with small volumes of data.

SQLite: Calculate how a counter has increased in current day and week

I have a SQLite database with a counter and a timestamp in Unix time, as shown below:
+---------+------------+
| counter | timestamp  |
+---------+------------+
| 1       | 1582933500 |
+---------+------------+
| 2       | 1582933800 |
+---------+------------+
| ...     | ...        |
+---------+------------+
I would like to calculate how much 'counter' has increased in the current day and the current week.
Is this possible in a SQLite query?
Thanks!
Provided you have SQLite version >= 3.25.0, the SQLite window functions will help you achieve this.
The LAG function retrieves the value from the previous record; if there is none (which is the case for the first row), a default value is used, here the value of the current row.
For the purpose of demonstration, this code:
SELECT counter, timestamp,
LAG (timestamp, 1, timestamp) OVER (ORDER BY counter) AS previous_timestamp,
(timestamp - LAG (timestamp, 1, timestamp) OVER (ORDER BY counter)) AS diff
FROM your_table
ORDER BY counter ASC
will give this result:
1 1582933500 1582933500 0
2 1582933800 1582933500 300
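The same LAG pattern also works on the counter column itself, which is closer to what the question asks. A minimal sketch with Python's sqlite3 module follows; the third reading (counter 5) is made up for illustration, and SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (counter INTEGER, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        (1, 1582933500),
        (2, 1582933800),
        (5, 1582934400),  # hypothetical extra reading, not from the question
    ],
)

# For each row: seconds since the previous reading and the counter increase.
# The third LAG argument is the default used for the first row, so it yields 0.
lag_rows = conn.execute(
    """
    SELECT counter,
           timestamp,
           timestamp - LAG(timestamp, 1, timestamp) OVER (ORDER BY counter) AS seconds_since_previous,
           counter - LAG(counter, 1, counter) OVER (ORDER BY counter) AS counter_increase
    FROM your_table
    ORDER BY counter
    """
).fetchall()
conn.close()

for row in lag_rows:
    print(row)
```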
In a CTE get the min and max timestamp for each day and join it twice to the table:
with cte as (
select date(timestamp, 'unixepoch', 'localtime') day,
min(timestamp) mindate, max(timestamp) maxdate
from tablename
group by day
)
select c.day, t2.counter - t1.counter difference
from cte c
inner join tablename t1 on t1.timestamp = c.mindate
inner join tablename t2 on t2.timestamp = c.maxdate;
With similar code get the results for each week:
with cte as (
select strftime('%W', date(timestamp, 'unixepoch', 'localtime')) week,
min(timestamp) mindate, max(timestamp) maxdate
from tablename
group by week
)
select c.week, t2.counter - t1.counter difference
from cte c
inner join tablename t1 on t1.timestamp = c.mindate
inner join tablename t2 on t2.timestamp = c.maxdate;
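Here is a small runnable sketch of the per-day query using Python's sqlite3 module. The four readings are invented for illustration, and the sketch drops 'localtime' so the grouping is done in UTC and the output does not depend on the machine's time zone.

```python
import sqlite3
from datetime import datetime, timezone


def ts(year, month, day, hour):
    """Unix timestamp for the given UTC date and hour."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


# Hypothetical readings: (counter, timestamp)
readings = [
    (1, ts(2020, 2, 28, 10)),
    (5, ts(2020, 2, 28, 20)),
    (7, ts(2020, 2, 29, 9)),
    (12, ts(2020, 2, 29, 18)),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (counter INTEGER, timestamp INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", readings)

# First and last timestamp of each day, joined back to fetch the counters.
daily = conn.execute(
    """
    WITH cte AS (
        SELECT date(timestamp, 'unixepoch') AS day,
               MIN(timestamp) AS mindate,
               MAX(timestamp) AS maxdate
        FROM tablename
        GROUP BY day
    )
    SELECT c.day, t2.counter - t1.counter AS difference
    FROM cte c
    INNER JOIN tablename t1 ON t1.timestamp = c.mindate
    INNER JOIN tablename t2 ON t2.timestamp = c.maxdate
    ORDER BY c.day
    """
).fetchall()
conn.close()

print(daily)
```

Note that this measures the change between the first and the last reading of each day, so an increase that happens between the last reading of one day and the first reading of the next is not counted in either day.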

Grouping query in Redshift takes huge amount of time

I have the following requirement: I have a table in the following format.
and this is what I want it to be transformed into:
Basically I want number of users with various combination of activities
I want to have this format as I want to create a TreeMap visualization out of it.
This is what I have done so far.
First, find the number of users for each activity grouping:
WITH lookup AS
(
SELECT listagg(name,',') AS groupings,
processed_date,
guid
FROM warehouse.test
GROUP BY processed_date,
guid
)
SELECT groupings AS activity_groupings,
LENGTH(groupings) -LENGTH(REPLACE(groupings,',','')) + 1 AS count,
processed_date,
COUNT( guid) AS users
FROM lookup
GROUP BY processed_date,
groupings
I put the results in a separate table
Then I split and coalesce like this:
SELECT grouping_1,
grouping_2,
grouping_3,
SUM(num_users) AS num_users -- assumed: the outer SELECT list is missing from the posted snippet
FROM (
SELECT NULLIF(SPLIT_PART(groupings,',', 1),'') AS grouping_1,
COALESCE(NULLIF(SPLIT_PART(groupings,',', 2),''), grouping_1) AS grouping_2,
COALESCE(NULLIF(SPLIT_PART(groupings,',', 3),''), grouping_2, grouping_1) AS grouping_3,
num_users
FROM warehouse.groupings) AS expr_qry
GROUP BY grouping_1,
grouping_2,
grouping_3
The problem is that the first query takes more than 90 minutes to execute, as I have more than 250M rows.
There must be a better and more efficient way to do this.
Any pointers would be greatly appreciated.
Thanks
You do not need to use complex string manipulation functions (LISTAGG(), SPLIT_PART()). You can achieve what you're after with the ROW_NUMBER() function and simple aggregates.
-- Create sample data
CREATE TEMP TABLE test_data (id, guid, name)
AS SELECT 1::INT, 1::INT, 'cooking'
UNION ALL SELECT 2::INT, 1::INT, 'cleaning'
UNION ALL SELECT 3::INT, 2::INT, 'washing'
UNION ALL SELECT 4::INT, 4::INT, 'cooking'
UNION ALL SELECT 6::INT, 5::INT, 'cooking'
UNION ALL SELECT 7::INT, 3::INT, 'cooking'
UNION ALL SELECT 8::INT, 3::INT, 'cleaning'
;
-- Assign a row number to each name per guid
WITH name_order AS (
SELECT guid
, name
, ROW_NUMBER() OVER(PARTITION BY guid ORDER BY id) row_n
FROM test_data
) -- Use MAX() to collapse each guid's data to 1 row
, groupings AS (
SELECT guid
, MAX(CASE WHEN row_n = 1 THEN name END) grouping_1
, MAX(CASE WHEN row_n = 2 THEN name END) grouping_2
FROM name_order
GROUP BY guid
) -- Count the guids per each grouping
SELECT grouping_1
, COALESCE(grouping_2, grouping_1) AS grouping_2
, COUNT(guid) num_users
FROM groupings
GROUP BY 1,2
;
-- Output
grouping_1 | grouping_2 | num_users
------------+------------+-----------
washing | washing | 1
cooking | cleaning | 2
cooking | cooking | 2
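To try the ROW_NUMBER() approach without a Redshift cluster, the query can be ported to SQLite almost unchanged. The sketch below uses Python's sqlite3 module; the only changes are the table setup (no ::INT casts) and an added ORDER BY so the output order is stable. It assumes SQLite 3.25 or newer and says nothing about performance on 250M rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_data (id INTEGER, guid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test_data VALUES (?, ?, ?)",
    [
        (1, 1, "cooking"),
        (2, 1, "cleaning"),
        (3, 2, "washing"),
        (4, 4, "cooking"),
        (6, 5, "cooking"),
        (7, 3, "cooking"),
        (8, 3, "cleaning"),
    ],
)

grouping_counts = conn.execute(
    """
    WITH name_order AS (
        -- Number each activity within its guid
        SELECT guid, name,
               ROW_NUMBER() OVER (PARTITION BY guid ORDER BY id) AS row_n
        FROM test_data
    ), groupings AS (
        -- Pivot the numbered activities into one row per guid
        SELECT guid,
               MAX(CASE WHEN row_n = 1 THEN name END) AS grouping_1,
               MAX(CASE WHEN row_n = 2 THEN name END) AS grouping_2
        FROM name_order
        GROUP BY guid
    )
    -- Count the guids per combination
    SELECT grouping_1,
           COALESCE(grouping_2, grouping_1) AS grouping_2,
           COUNT(guid) AS num_users
    FROM groupings
    GROUP BY 1, 2
    ORDER BY 1, 2
    """
).fetchall()
conn.close()

for row in grouping_counts:
    print(row)
```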

SQLite - add days to a certain date in insert

I am using SQLite.
Let's say I have a table like this one:
CREATE TABLE dates (
date1 DATE NOT NULL PRIMARY KEY,
date2 DATE NOT NULL
);
Now, I want date1 to be a certain date and date2 to be date1 + 10 days.
How can I insert values to the table by using only date1 to produce both of them?
The only thing I could find on the internet was something like the statement below, but it obviously doesn't work unless I replace date('date1','+10 days') with date('now','+10 days'), which is not what I want:
insert into dates values('2012-01-01', date('date1','+10 days'))
Any ideas?
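Create a trigger to set date2 automatically every time you insert a date1 into the table. Note that the INSERT below omits date2, so this only works if date2 is nullable or has a default value; with the NOT NULL constraint from the question, the insert fails before the AFTER INSERT trigger can run.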
CREATE TRIGGER date2_trigger AFTER INSERT ON dates
BEGIN
UPDATE dates SET date2 = DATE(NEW.date1, '+10 days') WHERE date1 = NEW.date1;
END;
-- insert date1 like so; date2 will be set automatically.
INSERT INTO dates(date1) VALUES('2012-01-01');
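One thing worth checking with the trigger approach is how it interacts with the NOT NULL constraint on date2 from the question. The sketch below uses Python's sqlite3 module to run the same trigger and insert against two variants of the table; insert_with_trigger is a made-up helper name.

```python
import sqlite3

TRIGGER_SQL = """
CREATE TRIGGER date2_trigger AFTER INSERT ON dates
BEGIN
    UPDATE dates SET date2 = DATE(NEW.date1, '+10 days') WHERE date1 = NEW.date1;
END
"""


def insert_with_trigger(date2_definition):
    """Create the table with the given date2 definition and insert only date1."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE dates ("
            "date1 DATE NOT NULL PRIMARY KEY, "
            f"date2 {date2_definition})"
        )
        conn.execute(TRIGGER_SQL)
        try:
            conn.execute("INSERT INTO dates(date1) VALUES ('2012-01-01')")
        except sqlite3.IntegrityError as exc:
            # The constraint is checked when the row is inserted, before the trigger runs
            return f"failed: {exc}"
        return conn.execute("SELECT date1, date2 FROM dates").fetchone()
    finally:
        conn.close()


strict_result = insert_with_trigger("DATE NOT NULL")  # table as defined in the question
nullable_result = insert_with_trigger("DATE")         # date2 allowed to be NULL

print(strict_result)
print(nullable_result)
```

With the table exactly as defined in the question the insert is rejected, while with a nullable date2 the trigger fills in the value ten days later.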
Instead of INSERT...VALUES use INSERT...SELECT like this:
insert into dates (date1, date2)
select t.date1, date(t.date1, '+10 days')
from (
select '2012-01-01' as date1
union all
select '2012-01-02'
union all
....................
) t
See the demo.
Results:
| date1 | date2 |
| ---------- | ---------- |
| 2012-01-01 | 2012-01-11 |
| 2012-01-02 | 2012-01-12 |
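If the dates come from application code rather than from a literal list, the same INSERT...SELECT idea can be used with a bound parameter, so date1 is written once and date2 is derived from it. A minimal sketch with Python's sqlite3 module; the list start_dates and the parameter name :d are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dates (date1 DATE NOT NULL PRIMARY KEY, date2 DATE NOT NULL)"
)

start_dates = ["2012-01-01", "2012-01-25"]  # hypothetical input values

# The named parameter :d is used twice: once as date1, once as the base for date2.
conn.executemany(
    "INSERT INTO dates (date1, date2) SELECT :d, date(:d, '+10 days')",
    [{"d": d} for d in start_dates],
)

inserted = conn.execute("SELECT date1, date2 FROM dates ORDER BY date1").fetchall()
conn.close()

print(inserted)
```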

SQLite - Merge 2 tables according to modified date, insert a new row if necessary

I have a table having an ID column, this column is a primary key and unique as well. In addition, the table has a modified date column.
I have the same table in 2 databases and I am looking to merge both into one database. The merging scenario in a table is as follows:
Insert the record if the ID is not present;
If the ID exists, only update if the modified date is greater than that of the existing row.
For example, having:
Table 1:
id | name | createdAt | modifiedAt
---|------|------------|-----------
1 | john | 2019-01-01 | 2019-05-01
2 | jane | 2019-01-01 | 2019-04-03
Table 2:
id | name | createdAt | modifiedAt
---|------|------------|-----------
1 | john | 2019-01-01 | 2019-04-30
2 | JANE | 2019-01-01 | 2019-04-04
3 | doe | 2019-01-01 | 2019-05-01
The resulting table would be:
id | name | createdAt | modifiedAt
---|------|------------|-----------
1 | john | 2019-01-01 | 2019-05-01
2 | JANE | 2019-01-01 | 2019-04-04
3 | doe | 2019-01-01 | 2019-05-01
I've read about INSERT OR REPLACE, but I couldn't figure out how the date condition can be applied. I also know that I can loop through each pair of matching rows and compare the dates manually, but that would be very slow. Is there an efficient way to accomplish this in SQLite?
I'm using sqlite3 on Node.js.
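The UPSERT notation added in SQLite 3.24 makes this easy (the WHERE true is needed to avoid a parsing ambiguity between the SELECT and the ON CONFLICT clause):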
INSERT INTO table1(id, name, createdAt, modifiedAt)
SELECT id, name, createdAt, modifiedAt FROM table2 WHERE true
ON CONFLICT(id) DO UPDATE
SET (name, createdAt, modifiedAt) = (excluded.name, excluded.createdAt, excluded.modifiedAt)
WHERE excluded.modifiedAt > modifiedAt;
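To see the UPSERT in action on the sample data from the question, here is a minimal sketch with Python's sqlite3 module. It keeps both tables in one in-memory database for simplicity (with two database files you would presumably ATTACH the second one first) and assumes SQLite 3.24 or newer. The target column is written as table1.modifiedAt only to make the comparison explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("table1", "table2"):
    conn.execute(
        f"CREATE TABLE {table} ("
        "id INTEGER PRIMARY KEY, name TEXT, createdAt TEXT, modifiedAt TEXT)"
    )

conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "john", "2019-01-01", "2019-05-01"),
        (2, "jane", "2019-01-01", "2019-04-03"),
    ],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [
        (1, "john", "2019-01-01", "2019-04-30"),
        (2, "JANE", "2019-01-01", "2019-04-04"),
        (3, "doe", "2019-01-01", "2019-05-01"),
    ],
)

# Insert new ids; on an id conflict, overwrite only if the incoming row is newer.
conn.execute(
    """
    INSERT INTO table1(id, name, createdAt, modifiedAt)
    SELECT id, name, createdAt, modifiedAt FROM table2 WHERE true
    ON CONFLICT(id) DO UPDATE
    SET (name, createdAt, modifiedAt) = (excluded.name, excluded.createdAt, excluded.modifiedAt)
    WHERE excluded.modifiedAt > table1.modifiedAt
    """
)

merged = conn.execute("SELECT * FROM table1 ORDER BY id").fetchall()
conn.close()

for row in merged:
    print(row)
```

Because the dates are stored as ISO 8601 text (YYYY-MM-DD), the string comparison in the WHERE clause orders them chronologically.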
First create the table Table3:
CREATE TABLE Table3 (
id INTEGER,
name TEXT,
createdat TEXT,
modifiedat TEXT,
PRIMARY KEY(id)
);
and then insert the rows like this:
insert into table3 (id, name, createdat, modifiedat)
select id, name, createdat, modifiedat from (
select * from table1 t1
where not exists (
select 1 from table2 t2
where t2.id = t1.id and t2.modifiedat >= t1.modifiedat
)
union all
select * from table2 t2
where not exists (
select 1 from table1 t1
where t1.id = t2.id and t1.modifiedat > t2.modifiedat
)
)
This uses UNION ALL on the 2 tables and keeps only the needed rows with NOT EXISTS, which is a very efficient way to check the condition you want.
I have >= instead of > in the WHERE clause for Table1 in case the 2 tables have a row with the same id and the same modifiedat values.
In this case the row from Table2 will be inserted.
If you want to merge the 2 tables into Table1, you can use REPLACE:
replace into table1 (id, name, createdat, modifiedat)
select id, name, createdat, modifiedat
from table2 t2
where
not exists (
select 1 from table1 t1
where (t1.id = t2.id and t1.modifiedat > t2.modifiedat)
)
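The REPLACE variant can be checked the same way. This is a minimal sketch with Python's sqlite3 module on the question's sample data, again with both tables in one in-memory database for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("table1", "table2"):
    conn.execute(
        f"CREATE TABLE {table} ("
        "id INTEGER PRIMARY KEY, name TEXT, createdat TEXT, modifiedat TEXT)"
    )

conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "john", "2019-01-01", "2019-05-01"),
        (2, "jane", "2019-01-01", "2019-04-03"),
    ],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [
        (1, "john", "2019-01-01", "2019-04-30"),
        (2, "JANE", "2019-01-01", "2019-04-04"),
        (3, "doe", "2019-01-01", "2019-05-01"),
    ],
)

# Take every table2 row unless table1 already holds a newer row with the same id.
conn.execute(
    """
    REPLACE INTO table1 (id, name, createdat, modifiedat)
    SELECT id, name, createdat, modifiedat
    FROM table2 t2
    WHERE NOT EXISTS (
        SELECT 1 FROM table1 t1
        WHERE t1.id = t2.id AND t1.modifiedat > t2.modifiedat
    )
    """
)

replaced = conn.execute("SELECT * FROM table1 ORDER BY id").fetchall()
conn.close()

for row in replaced:
    print(row)
```

One design point to keep in mind: REPLACE resolves the conflict by deleting the existing row and inserting the new one, which can matter if other tables reference the row through foreign keys.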
