ClickHouse moving average between partitions - window-functions

How to calculate a moving average between partitions in ClickHouse?
To enable window functions:
set allow_experimental_window_functions = 1;
1. Create the table
CREATE TABLE IF NOT EXISTS tb (t DateTime, v Float32) ENGINE=MergeTree()
PARTITION BY toYYYYMM(t) ORDER BY t;
(the table needs to be partitioned by month)
2. Insert values
INSERT INTO tb (t, v) VALUES ('2021-07-31 23:00:00', 5.0),
('2021-07-31 22:00:00', 4.0), ('2021-07-31 21:00:00', 3.0),
('2021-07-31 20:00:00', 2.0), ('2021-07-31 19:00:00', 1.0);
3. Calculate the moving average in a materialized view
CREATE MATERIALIZED VIEW IF NOT EXISTS mv_tb ENGINE=MergeTree()
PARTITION BY toYYYYMM(t)
ORDER BY t
POPULATE
AS SELECT t, v, avg(v) OVER w AS ma
FROM tb
WINDOW w AS (ORDER BY t ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW);
4. Query the materialized view (everything is OK)
select * from mv_tb;
5. Insert new data
INSERT INTO tb (t, v) VALUES ('2021-08-01 01:00:00', 7.0),
('2021-08-01 00:00:00', 6.0);
6. Query the materialized view (!!!wrong!!!)
select * from mv_tb;

Related

sqlite3 update column with rolling average

I want to update a column with the rolling average of the product of two other columns, all in the one table.
create table tb (code str, date str, cl float, vo float);
insert into tb values
('BHP', '2020-01-03', 3.25, 1000),
('BHP', '2020-01-04', 3.50, 2000),
('BHP', '2020-01-05', 3.55, 1000),
('CSR', '2020-01-03', 5.55, 1500),
('CSR', '2020-01-04', 5.60, 2000),
('CSR', '2020-01-05', 5.55, 2000),
('DDG', '2020-01-03', 10.20, 4000),
('DDG', '2020-01-04', 10.25, 4500),
('DDG', '2020-01-05', 10.30, 5000);
alter table tb add column dv float;
alter table tb add column avg_dv float;
update tb set dv = (select cl * vo);
select * from tb;
BHP|2020-01-03|3.25|1000.0|3250.0|
BHP|2020-01-04|3.5|2000.0|7000.0|
BHP|2020-01-05|3.55|1000.0|3550.0|
CSR|2020-01-03|5.55|1500.0|8325.0|
CSR|2020-01-04|5.6|2000.0|11200.0|
CSR|2020-01-05|5.55|2000.0|11100.0|
DDG|2020-01-03|10.2|4000.0|40800.0|
DDG|2020-01-04|10.25|4500.0|46125.0|
DDG|2020-01-05|10.3|5000.0|51500.0|
So far, so good. I can select a rolling average -
select code, date, avg(dv)
over (partition by code order by date asc rows 1 preceding)
from tb;
BHP|2020-01-03|3250.0
BHP|2020-01-04|5125.0
BHP|2020-01-05|5275.0
CSR|2020-01-03|8325.0
CSR|2020-01-04|9762.5
CSR|2020-01-05|11150.0
DDG|2020-01-03|40800.0
DDG|2020-01-04|43462.5
DDG|2020-01-05|48812.5
but when I try to put that selection into the avg_dv column of the table, I don't get what I was expecting -
update tb set avg_dv =
(select avg(dv)
over (partition by code order by date asc rows 1 preceding)
);
select * from tb;
BHP|2020-01-03|3.25|1000.0|3250.0|3250.0
BHP|2020-01-04|3.5|2000.0|7000.0|7000.0
BHP|2020-01-05|3.55|1000.0|3550.0|3550.0
CSR|2020-01-03|5.55|1500.0|8325.0|8325.0
CSR|2020-01-04|5.6|2000.0|11200.0|11200.0
CSR|2020-01-05|5.55|2000.0|11100.0|11100.0
DDG|2020-01-03|10.2|4000.0|40800.0|40800.0
DDG|2020-01-04|10.25|4500.0|46125.0|46125.0
DDG|2020-01-05|10.3|5000.0|51500.0|51500.0
The avg_dv column has been updated with just the dv values, not the rolling average.
I also tried to adapt the answer here as
update tb
set tb.avg_dv = b.avg_dv
from tb as a inner join
(select code, date, avg(dv)
over (partition by code order by date asc rows 1 preceding)) b
on a.code = b.code and a.date = b.date;
But that just gives me a syntax error -
Error: near ".": syntax error
In particular, the reference to "b" in the line
set tb.avg_dv = b.avg_dv
looks like garbage, but I don't know what to replace it with.
Can this be done in a single query?
The code that you tried to adapt in your case uses SQL Server syntax.
If your version of SQLite is 3.33.0+ the correct syntax is:
update tb
set avg_dv = b.avg_dv
from (
select code, date,
avg(dv) over (partition by code order by date asc rows 1 preceding) avg_dv
from tb
) b
where tb.code = b.code and tb.date = b.date;
If you are using a previous version of SQLite that does not support the FROM clause in the UPDATE statement, then you can do it with a CTE:
with cte as (
select code, date,
avg(dv) over (partition by code order by date asc rows 1 preceding) avg_dv
from tb
)
update tb
set avg_dv = (select c.avg_dv from cte c where c.code = tb.code and c.date = tb.date)
Also note that the update statement for the column dv can be written simply:
update tb set dv = cl * vo;
and if your version of SQLite is 3.31.0+, you could create it as a generated column so that there is no need for updates:
alter table tb add column dv float generated always as (cl * vo);
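To sanity-check the CTE variant, here is a minimal sketch driving SQLite from Python's sqlite3 module (it assumes an SQLite build with window-function support, 3.25.0+, but does not need the 3.33.0 UPDATE ... FROM syntax); the table and column names match the question, with only the BHP rows kept for brevity:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table tb (code text, date text, cl real, vo real, dv real, avg_dv real);
    insert into tb (code, date, cl, vo) values
        ('BHP', '2020-01-03', 3.25, 1000),
        ('BHP', '2020-01-04', 3.50, 2000),
        ('BHP', '2020-01-05', 3.55, 1000);
    update tb set dv = cl * vo;

    -- CTE form: works on pre-3.33 SQLite, which lacks UPDATE ... FROM
    with cte as (
        select code, date,
               avg(dv) over (partition by code order by date asc
                             rows 1 preceding) avg_dv
        from tb
    )
    update tb
    set avg_dv = (select c.avg_dv from cte c
                  where c.code = tb.code and c.date = tb.date);
""")
for row in con.execute("select date, avg_dv from tb order by date"):
    print(row)  # rolling averages over the current and previous row
```

The three avg_dv values come out as 3250, 5125, and 5275 (the last two averaging each day's dv with the previous day's).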

Moving average in SQLite

I would like to compute a moving average over data in a SQLite table. I found several methods for MySQL, but couldn't find an efficient one for SQLite.
In SQL, I think something like this should do it (however, I was not able to try it...):
SELECT date, value,
avg(value) OVER (ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) as MovingAverageWindow7
FROM t ORDER BY date;
However, I see two drawbacks:
This does not seem to work in SQLite.
If data are missing for a few dates, the preceding/following rows cover a wider window than I actually want, since the frame is based only on the number of surrounding rows. A date condition should therefore be added.
Indeed, I would like to compute the average of 'value' at each date over +/-3 days (weekly moving average) or +/-15 days (monthly moving average).
Here is an example data set :
CREATE TABLE t ( date DATE, value INTEGER );
INSERT INTO t (date, value) VALUES ('2018-02-01', 8);
INSERT INTO t (date, value) VALUES ('2018-02-02', 2);
INSERT INTO t (date, value) VALUES ('2018-02-05', 5);
INSERT INTO t (date, value) VALUES ('2018-02-06', 4);
INSERT INTO t (date, value) VALUES ('2018-02-07', 1);
INSERT INTO t (date, value) VALUES ('2018-02-10', 6);
INSERT INTO t (date, value) VALUES ('2018-02-11', 0);
INSERT INTO t (date, value) VALUES ('2018-02-12', 2);
INSERT INTO t (date, value) VALUES ('2018-02-13', 1);
INSERT INTO t (date, value) VALUES ('2018-02-14', 3);
INSERT INTO t (date, value) VALUES ('2018-02-15', 11);
INSERT INTO t (date, value) VALUES ('2018-02-18', 4);
INSERT INTO t (date, value) VALUES ('2018-02-20', 1);
INSERT INTO t (date, value) VALUES ('2018-02-21', 5);
INSERT INTO t (date, value) VALUES ('2018-02-28', 10);
INSERT INTO t (date, value) VALUES ('2018-03-02', 6);
INSERT INTO t (date, value) VALUES ('2018-03-03', 7);
INSERT INTO t (date, value) VALUES ('2018-03-04', 3);
INSERT INTO t (date, value) VALUES ('2018-03-08', 5);
INSERT INTO t (date, value) VALUES ('2018-03-09', 6);
INSERT INTO t (date, value) VALUES ('2018-03-15', 1);
INSERT INTO t (date, value) VALUES ('2018-03-16', 3);
INSERT INTO t (date, value) VALUES ('2018-03-25', 5);
INSERT INTO t (date, value) VALUES ('2018-03-31', 1);
Window functions were added in version 3.25.0 (2018-09-15). With the RANGE frame type added in version 3.28.0 (2019-04-16), you can now do:
SELECT date, value,
avg(value) OVER (
ORDER BY CAST (strftime('%s', date) AS INT)
RANGE BETWEEN 3 * 24 * 60 * 60 PRECEDING
AND 3 * 24 * 60 * 60 FOLLOWING
) AS MovingAverageWindow7
FROM t ORDER BY date;
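Run through Python's sqlite3 module against the first three sample rows (this assumes the bundled SQLite is 3.28.0+, as the answer requires for RANGE frames), the epoch-seconds trick gives a genuinely date-based window:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table t (date date, value integer);
    insert into t (date, value) values
        ('2018-02-01', 8), ('2018-02-02', 2), ('2018-02-05', 5);
""")
# Order by epoch seconds so RANGE can measure distance in seconds:
# 3 * 24 * 60 * 60 seconds = 3 days on either side of each row.
rows = con.execute("""
    select date, value,
           avg(value) over (
               order by cast(strftime('%s', date) as int)
               range between 3 * 24 * 60 * 60 preceding
                         and 3 * 24 * 60 * 60 following
           ) as MovingAverageWindow7
    from t order by date
""").fetchall()
print(rows)
# [('2018-02-01', 8, 5.0), ('2018-02-02', 2, 5.0), ('2018-02-05', 5, 3.5)]
```

Note how '2018-02-05' averages only with '2018-02-02': '2018-02-01' is four days away, outside the frame, which is exactly the behavior a ROWS frame cannot give.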
I think I actually found a solution :
SELECT date, value,
(SELECT AVG(value) FROM t t2
WHERE datetime(t1.date, '-3 days') <= datetime(t2.date) AND datetime(t1.date, '+3 days') >= datetime(t2.date)
) AS MAVG
FROM t t1
GROUP BY strftime('%Y-%m-%d', date);
I don't know if it is the most efficient way, but it seems to work.
Edit:
Applied to my real database of 20,000 rows, a weekly moving average over two parameters takes approximately 1 minute to calculate.
I see two options there :
There is a more efficient way to compute this with SQLite
I compute the moving average in Python after extracting data from SQLite
One approach is to create an intermediate table that maps each date to the groups it belongs to.
CREATE TABLE groups (date DATE, daygroup DATE);
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '-1 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '-2 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '-3 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '+1 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '+2 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, strftime('%Y-%m-%d', datetime(date, '+3 days')) AS daygroup
FROM t;
INSERT INTO groups
SELECT date, date AS daygroup FROM t;
You get for example,
SELECT * FROM groups WHERE date = '2018-02-05'
date daygroup
2018-02-05 2018-02-04
2018-02-05 2018-02-03
2018-02-05 2018-02-02
2018-02-05 2018-02-06
2018-02-05 2018-02-07
2018-02-05 2018-02-08
2018-02-05 2018-02-05
indicating that '2018-02-05' belongs to groups '2018-02-02' through '2018-02-08'. If a date belongs to a group, its value contributes to the moving-average calculation for that group.
With this, calculating the moving average becomes straightforward:
SELECT
d.date, d.value, c.ma
FROM
t AS d
INNER JOIN
(SELECT
b.daygroup,
avg(a.value) AS ma
FROM
t AS a
INNER JOIN
groups AS b
ON a.date = b.date
GROUP BY b.daygroup) AS c
ON
d.date = c.daygroup
Note that the intermediate table has 7 times as many rows as the original, and it grows proportionally as the window widens. This should be acceptable unless you have a much larger table.
I also experimented with 20,000 rows.
The insert queries took 1.5 s and the select query took 0.5 s on my laptop.
ADDED: perhaps better.
An alternative that does not require an intermediate table.
The query below joins the table with itself, allowing up to 3 days of lag in either direction, then takes the average.
SELECT
t1.date, avg(t2.value) AS MVG
FROM
t AS t1
INNER JOIN
t AS t2
ON
datetime(t1.date, '-3 days') <= datetime(t2.date)
AND
datetime(t1.date, '+3 days') >= datetime(t2.date)
GROUP BY
t1.date
;
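A quick way to check the self-join version is to run it through Python's sqlite3 module on a few of the sample rows (no window functions involved, so any SQLite build with the datetime functions works); the first three dates are enough to see the date-based frame in action:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table t (date date, value integer);
    insert into t (date, value) values
        ('2018-02-01', 8), ('2018-02-02', 2), ('2018-02-05', 5);
""")
# Self-join: pair every date with all dates at most 3 days away,
# then average the paired values per date.
rows = con.execute("""
    select t1.date, avg(t2.value) as mvg
    from t as t1
    join t as t2
      on datetime(t1.date, '-3 days') <= datetime(t2.date)
     and datetime(t1.date, '+3 days') >= datetime(t2.date)
    group by t1.date
    order by t1.date
""").fetchall()
print(rows)
# [('2018-02-01', 5.0), ('2018-02-02', 5.0), ('2018-02-05', 3.5)]
```

As with the intermediate-table approach, '2018-02-05' pairs only with '2018-02-02' and itself, since '2018-02-01' falls outside the 3-day window.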

Column with keywords found in comments field

At work I've been given a task that I'm told is "simple and straightforward", but I'm having difficulty with:
I have a view that contains 4 columns: a PK, an FK, comments, and column #4. My manager is telling me to make a table that contains 75 or so keywords, and then a query that will go through each row in the view, compare the comments to the keyword table, and then append each found keyword to column #4. I've searched Google and SO, and have not found a query that would do this. Any help would be appreciated.
Try below. I am new to Teradata.
--***************************************************************
DROP TABLE x;
--***************************************************************
CREATE MULTISET VOLATILE TABLE x, NO FALLBACK,
CHECKSUM = DEFAULT,
LOG
(
RCD_ID INTEGER,
FK INTEGER,
CMT VARCHAR(200),
COL_4 VARCHAR(200),
RN INTEGER
)
PRIMARY INDEX (RCD_ID)
ON COMMIT PRESERVE ROWS;
--***************************************************************
INSERT INTO x VALUES (1,10,'DID YOU SEE THE COW?','',0);
INSERT INTO x VALUES (2,20,'DID YOU SEE THE CAT?','',0);
INSERT INTO x VALUES (3,30,'DID YOU SEE THE FOX?','',0);
INSERT INTO x VALUES (4,40,'DID YOU SEE THE GOAT, FOX, AND CAT?','',0);
INSERT INTO x VALUES (5,50,'DID YOU SEE THE DUCK AND THE COW?','',0);
--***************************************************************
SELECT * FROM x ORDER BY 1;
--***************************************************************
DROP TABLE y;
--***************************************************************
CREATE MULTISET VOLATILE TABLE y, NO FALLBACK,
CHECKSUM = DEFAULT,
LOG
(
RCD_ID INTEGER,
KEY_WORD VARCHAR(20)
)
PRIMARY INDEX (RCD_ID)
ON COMMIT PRESERVE ROWS;
--***************************************************************
INSERT INTO y VALUES (1,'COW');
INSERT INTO y VALUES (2,'CAT');
INSERT INTO y VALUES (3,'FOX');
INSERT INTO y VALUES (4,'GOAT');
INSERT INTO y VALUES (5,'DUCK');
--***************************************************************
SELECT * FROM y ORDER BY 1;
--***************************************************************
DROP TABLE z;
--***************************************************************
CREATE MULTISET VOLATILE TABLE z AS(
SELECT x.RCD_ID,x.CMT,x.COL_4,y.key_word, ROW_NUMBER() OVER(PARTITION BY x.RCD_ID ORDER BY x.RCD_ID) AS RN
FROM x JOIN y ON x.cmt LIKE '%' || y.KEY_WORD || '%'
)
WITH DATA PRIMARY INDEX (RCD_ID)
ON COMMIT PRESERVE ROWS;
--***************************************************************
SELECT * FROM z ORDER BY 1,5;
--***************************************************************
WITH RECURSIVE RPT AS(
SELECT
RCD_ID,FK,CMT,COL_4,RN
FROM x
UNION ALL
SELECT
b.RCD_ID,b.FK,b.CMT,b.COL_4 || ';' || a.KEY_WORD,a.RN
FROM z AS a
JOIN RPT AS b
ON b.RCD_ID = a.RCD_ID
AND b.RN = a.RN-1
)
SELECT *
FROM RPT
QUALIFY ROW_NUMBER() OVER (PARTITION BY RCD_ID ORDER BY RCD_ID, RN DESC) = 1
ORDER BY 1,5;
--***************************************************************
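The recursive concatenation above is needed because older Teradata releases lack an ordered string-aggregation function. On engines that do have one, the whole pipeline collapses to a join plus a group by. Here is a stand-in sketch (SQLite's group_concat via Python's sqlite3 module, not Teradata syntax) showing the same keyword-matching idea on a subset of the sample rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table x (rcd_id integer, cmt text);
    insert into x values
        (1, 'DID YOU SEE THE COW?'),
        (4, 'DID YOU SEE THE GOAT, FOX, AND CAT?');
    create table y (key_word text);
    insert into y values ('COW'), ('CAT'), ('FOX'), ('GOAT'), ('DUCK');
""")
# Join each comment to every keyword it contains, then aggregate
# the matches into one ';'-separated string per comment row.
rows = con.execute("""
    select x.rcd_id, group_concat(y.key_word, ';') as col_4
    from x join y on x.cmt like '%' || y.key_word || '%'
    group by x.rcd_id
    order by x.rcd_id
""").fetchall()
print(rows)
```

Row 1 matches only COW; row 4 matches CAT, FOX, and GOAT (group_concat's ordering within a group is unspecified, which is the detail the Teradata recursion controls explicitly).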

intersect function for arrays in PL/SQL

I am trying to find the intersection between two arrays in SQL.
I also have a small doubt about the code below: I have written y.EXTEND(32000).
If I remove this line I get a limit error, even though I have only 9 records in my 2-d array.
Because of the EXTEND call I am unable to get the current count of the array, i.e. y.COUNT returns 32000.
The demo code is given as follows:
DECLARE
type items is table of number;
type item_sets is table of items;
y item_sets;
i number := 0;
v_c items;
cursor c1 is
select distinct item from sales_demo order by item;
BEGIN
y := item_sets();
y.EXTEND(32000);
FOR Z IN c1 LOOP
i := i + 1;
SELECT tid bulk collect into y(i) FROM sales_demo WHERE item = z.item;
END LOOP;
v_c := y(1) multiset intersect y(2); -- i want intersection result between y1 and y2
DBMS_OUTPUT.PUT_LINE((v_c).count);
END;
Any help will be useful
Try this:
WITH Y1 AS (SELECT TID
            FROM SALES_DEMO
            WHERE ITEM = (SELECT MIN(ITEM)
                          FROM SALES_DEMO)),
     Y2 AS (SELECT TID
            FROM SALES_DEMO
            WHERE ITEM = (SELECT ITEM
                          FROM (SELECT ITEM
                                FROM SALES_DEMO
                                WHERE ITEM > (SELECT MIN(ITEM) FROM SALES_DEMO)
                                ORDER BY ITEM)
                          WHERE ROWNUM = 1))
SELECT TID FROM Y1
INTERSECT
SELECT TID FROM Y2;
This avoids all the mucking about with arrays and such, and lets the database do what it's good at.
Share and enjoy.
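For reference, here is what the PL/SQL block is computing, sketched in plain Python over a hypothetical sales_demo of (item, tid) pairs: collect the TIDs per distinct item, then intersect the first two collections. When TIDs are unique within an item, MULTISET INTERSECT behaves like an ordinary set intersection:

```python
# Hypothetical sales_demo rows as (item, tid) pairs.
sales_demo = [
    ('A', 1), ('A', 2), ('A', 3),
    ('B', 2), ('B', 3), ('B', 4),
]

# Bulk-collect equivalent: one TID set per distinct item.
groups = {}
for item, tid in sales_demo:
    groups.setdefault(item, set()).add(tid)

items = sorted(groups)                      # distinct items, in order
v_c = groups[items[0]] & groups[items[1]]   # MULTISET INTERSECT of y(1), y(2)
print(sorted(v_c))  # [2, 3]
```

This also shows why the answer's pure-SQL INTERSECT works: the database performs the same per-item grouping and set intersection without any collection plumbing or EXTEND sizing.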

SQLite trigger insert/replace multiple rows multiple tables

I am attempting to respond to the insertion of a row in one table (A) to create or update multiple rows in a second table (B) based on the values of a third table (C) (which can be joined to the first).
I have the following construct,
CREATE TRIGGER MyTrigger AFTER INSERT ON A
BEGIN
INSERT OR REPLACE INTO B (ID, T1, T2, Role)
VALUES
(
( SELECT ID FROM C WHERE R1 = NEW.R1 ),
NEW.T1,
B.T2, -- The existing row's T2
( SELECT Role FROM C WHERE R1 = NEW.R1 )
)
END;
Table A has columns ID, T1, R1
Table B has columns ID, T1, T2, Role
Table C has columns ID, R1, R2, Role
I have at least two problems with my attempts at composing the trigger
I don't know how to reference B's existing values in the REPLACE case, thus the "B.T2"
I don't know how to reference multiple columns (R1, Role) from the same row in table C when doing my INSERT/REPLACE in table B.
Thanks for any help in sorting this out.
Using SELECT instead of VALUES:
CREATE TRIGGER MyTrigger AFTER INSERT ON A BEGIN
INSERT OR REPLACE INTO B (ID, T1, T2, Role) SELECT
(SELECT ID FROM C WHERE R1 = NEW.R1),
NEW.T1,
B.T2,
(SELECT Role FROM C WHERE R1 = NEW.R1)
FROM B WHERE ROWID = NEW.ROWID;
END;
I was able to use a LEFT OUTER JOIN on the SELECT so that all the needed values can be named regardless of whether there's an existing row.
CREATE TRIGGER MyTrigger AFTER INSERT ON A
BEGIN
INSERT OR REPLACE INTO B (ID, T1, T2, Role)
SELECT
C.ID,
NEW.T1,
B.T2,
C.Role
FROM C LEFT OUTER JOIN B ON C.ID = B.ID WHERE C.R1 = NEW.R1;
END;
To find the B record, just use a subquery like you're doing with C.
The B.ID value to search for is the same as that you're trying to insert.
CREATE TRIGGER MyTrigger
AFTER INSERT ON A
BEGIN
INSERT OR REPLACE INTO B (ID, T1, T2, Role)
VALUES
(
( SELECT ID FROM C WHERE R1 = NEW.R1 ),
NEW.T1,
( SELECT T2 FROM B WHERE ID = ( SELECT ID FROM C WHERE R1 = NEW.R1 ) ),
( SELECT Role FROM C WHERE R1 = NEW.R1 )
);
END;
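To see the subquery version in action end to end, here is a minimal sketch via Python's sqlite3 module with hypothetical sample data; B.ID is declared as a primary key so that INSERT OR REPLACE actually replaces the existing row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table A (ID integer, T1 text, R1 text);
    create table B (ID integer primary key, T1 text, T2 text, Role text);
    create table C (ID integer, R1 text, R2 text, Role text);

    insert into C values (1, 'r1', 'r2', 'admin');
    insert into B values (1, 'old_t1', 'keep_me', 'old_role');

    create trigger MyTrigger after insert on A
    begin
        insert or replace into B (ID, T1, T2, Role)
        values (
            (select ID from C where R1 = NEW.R1),
            NEW.T1,
            -- preserve the existing B row's T2, if any
            (select T2 from B where ID = (select ID from C where R1 = NEW.R1)),
            (select Role from C where R1 = NEW.R1)
        );
    end;

    insert into A values (1, 'new_t1', 'r1');
""")
print(con.execute("select * from B").fetchall())
# [(1, 'new_t1', 'keep_me', 'admin')]
```

The VALUES expressions are evaluated before the conflict resolution fires, so the subquery on B still sees the old row and 'keep_me' survives the replace.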
