Grand total column in Impala using a window function

I'm looking for a way to do a "grand total" column across ALL groups in Impala.
It's easy to use window functions to obtain the total for each partitioned group, like this:
sum(x) over (partition by A)
However, there does not appear to be an expression to partition by 'all'. Is this a shortcoming in Impala? It looks like in Postgres you can leave the OVER clause empty.

The partition clause is optional. You can write a query like this:
select sum(x) over () from t;
For example:
[localhost:21000] > select id, sum(id) over () from tbl;
+----+-------------------+
| id | sum(id) OVER(...) |
+----+-------------------+
| 0  | 28                |
| 1  | 28                |
| 2  | 28                |
| 3  | 28                |
| 6  | 28                |
| 7  | 28                |
| 4  | 28                |
| 5  | 28                |
+----+-------------------+
Fetched 8 row(s) in 0.08s
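Building on that, the grand total can sit next to a per-group subtotal in the same query. A minimal sketch, assuming a table t with a grouping column A and a measure x (the names from the question; t is a placeholder):
select
    A,
    x,
    sum(x) over (partition by A) as group_total,  -- subtotal within each group
    sum(x) over ()               as grand_total   -- total across all rows
from t;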

How to get most recent data from DynamoDB for each primary partition key in PartiQL

Inspired by this question: How to get most recent data from DynamoDB for each primary partition key?
I have a table in DynamoDB that stores account stats. The account stats may be updated several times per day, so the table records may look like:
+------------+--------------+-------+-------+
| account_id | record_id    | views | stars |
+------------+--------------+-------+-------+
| 3          | 2019/03/16/1 | 29    | 3     |
| 2          | 2019/03/16/2 | 130   | 21    |
| 1          | 2019/03/16/3 | 12    | 2     |
| 2          | 2019/03/16/1 | 57    | 12    |
| 1          | 2019/03/16/2 | 8     | 2     |
| 1          | 2019/03/16/1 | 3     | 0     |
+------------+--------------+-------+-------+
account_id is the primary partition key and record_id is the primary sort key.
How can I get only the latest record for each account_id? From the example above I expect to get:
+------------+--------------+-------+-------+
| account_id | record_id    | views | stars |
+------------+--------------+-------+-------+
| 3          | 2019/03/16/1 | 29    | 3     |
| 2          | 2019/03/16/2 | 130   | 21    |
| 1          | 2019/03/16/3 | 12    | 2     |
+------------+--------------+-------+-------+
This data is convenient to use for reporting purposes.
Execute the following PartiQL query for each account_id:
SELECT * FROM <Table> WHERE account_id='3' AND record_id > '2021/11' ORDER BY record_id DESC
PartiQL has no LIMIT keyword, so the query will return all matching records.
You can reduce overfetching by constraining the record_id date range as tightly as possible. If only the current date is of interest, for example, the sort key condition would be record_id > '2021/12/01'.
As in the referenced example, you must execute one query for each account_id of interest. Batching operations are supported.
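For the sample data above, the per-account statements would look roughly like this (a sketch; <Table> is the placeholder from the answer, the '2019/03' prefix merely narrows the scan to the relevant dates, and the newest record is the first row of each result):
SELECT * FROM <Table> WHERE account_id = '1' AND record_id > '2019/03' ORDER BY record_id DESC
SELECT * FROM <Table> WHERE account_id = '2' AND record_id > '2019/03' ORDER BY record_id DESC
SELECT * FROM <Table> WHERE account_id = '3' AND record_id > '2019/03' ORDER BY record_id DESC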

Updating multiple rows in SQLite with relevant data from the same table

I have a database whose source I don't control directly, and it ends up with errant '0' entries that mess up the generated graphs with drops to zero. I am able to manipulate the data after the fact and update the database.
It is acceptable to use the last known good value instead, so I am trying to write a general query that replaces all the zeros with the last known value.
Luckily, every entry includes the ID of the previous entry, so it is simply a matter of looking back and grabbing it.
I have got very close to a final answer, but instead of updating with the last good value, it just uses the first value over and over again.
dummy data
CREATE TABLE tbl(id INT,r INT,oid INT);
INSERT INTO tbl VALUES(1,10,0);
INSERT INTO tbl VALUES(2,20,1);
INSERT INTO tbl VALUES(3,0,2);
INSERT INTO tbl VALUES(4,40,3);
INSERT INTO tbl VALUES(5,50,4);
INSERT INTO tbl VALUES(6,0,5);
INSERT INTO tbl VALUES(7,70,6);
INSERT INTO tbl VALUES(8,80,7);
SELECT * FROM tbl;
OUTPUT:
| id | r  | oid |
|----|----|-----|
| 1  | 10 | 0   |
| 2  | 20 | 1   |
| 3  | 0  | 2   |   ** NEEDS FIXING
| 4  | 40 | 3   |
| 5  | 50 | 4   |
| 6  | 0  | 5   |   ** NEEDS UPDATE
| 7  | 70 | 6   |
| 8  | 80 | 7   |
I have worked out several queries that get results close to what I am after:
All zero entries:
SELECT * FROM tbl WHERE r = 0;
OUTPUT:
| id | r | oid |
|----|---|-----|
| 3  | 0 | 2   |
| 6  | 0 | 5   |
Output only those rows, along with the preceding good row:
SELECT * FROM tbl WHERE id IN (
    SELECT id FROM tbl WHERE r = 0
    UNION
    SELECT oid FROM tbl WHERE r = 0
)
OUTPUT:
| id | r  | oid |
|----|----|-----|
| 2  | 20 | 1   |
| 3  | 0  | 2   |
| 5  | 50 | 4   |
| 6  | 0  | 5   |
Almost works
This is as close as I have got: it does change all the zeros, but it changes them all to the value of the first lookup:
UPDATE tbl
SET r = (SELECT r
         FROM tbl
         WHERE id IN (SELECT oid
                      FROM tbl
                      WHERE r = 0)
        )
WHERE r = 0;
OUTPUT:
| id | r  | oid |
|----|----|-----|
| 1  | 10 | 0   |
| 2  | 20 | 1   |
| 3  | 20 | 2   |   ** GOOD
| 4  | 40 | 3   |
| 5  | 50 | 4   |
| 6  | 20 | 5   |   ** BAD, should be 50
| 7  | 70 | 6   |
| 8  | 80 | 7   |
If it helps, I created this fiddle here that I've been playing with:
http://sqlfiddle.com/#!5/8afff/1
Your subquery is not correlated with the row being updated, so it returns the same value for every zero row. For this sample data, all you have to do is use a correlated subquery that returns the value of r from the row whose id is equal to the current row's oid:
UPDATE tbl AS t
SET r = (SELECT tt.r FROM tbl tt WHERE tt.id = t.oid)
WHERE t.r = 0;
See the demo.
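As a variation, if your SQLite is 3.33 or newer (which added UPDATE ... FROM), the same correlated lookup can be written as a join; a sketch that should behave identically on this sample data:
UPDATE tbl AS t
SET r = prev.r
FROM tbl AS prev          -- the row whose id matches the current row's oid
WHERE t.r = 0
  AND prev.id = t.oid;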

Default value for LAG function in MariaDB

I'm trying to build a view which allows me to track the difference between paid values at two consecutive month_ids. When a figure is missing, however, that is because it's the first entry, and the paid amount should therefore be treated as 0. At present, I'm using the expression below to represent the previous figure, since the [, default] argument of LAG has not been implemented in MariaDB.
CASE
    WHEN NOT (policy_agent_month.policy_agent_month_id IS NOT NULL
              AND LAG(days_paid, 1) OVER (PARTITION BY claim_id ORDER BY month_id) IS NULL)
    THEN LAG(days_paid, 1) OVER (PARTITION BY claim_id ORDER BY month_id)
    ELSE 0
END
The problem I have with this is that there are about 30 columns this expression needs to be applied to, which makes my code unreadable and very clunky. Is there a better solution?
Why use WITH?
SELECT province, tot_pop,
       tot_pop - COALESCE(
           LAG(tot_pop) OVER (ORDER BY tot_pop ASC),
           0) AS delta
FROM provinces
ORDER BY tot_pop ASC;
+---------------------------+----------+---------+
| province                  | tot_pop  | delta   |
+---------------------------+----------+---------+
| Nunavut                   |    14585 |   14585 |
| Yukon                     |    21304 |    6719 |
| Northwest Territories     |    24571 |    3267 |
| Prince Edward Island      |    63071 |   38500 |
| Newfoundland and Labrador |   100761 |   37690 |
| New Brunswick             |   332715 |  231954 |
| Nova Scotia               |   471284 |  138569 |
| Saskatchewan              |   622467 |  151183 |
| Manitoba                  |   772672 |  150205 |
| Alberta                   |  2481213 | 1708541 |
| British Columbia          |  3287519 |  806306 |
| Quebec                    |  5321098 | 2033579 |
| Ontario                   | 10071458 | 4750360 |
+---------------------------+----------+---------+
13 rows in set (0.00 sec)
However, it is not cheap (at least in MySQL 8.0);
the table has 13 rows, yet
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
MySQL 8.0:
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_read_rnd           | 89    |
| Handler_read_rnd_next      | 52    |
| Handler_write              | 26    |
(and others)
MariaDB 10.3:
| Handler_read_rnd           | 77    |
| Handler_read_rnd_next      | 42    |
| Handler_tmp_write          | 13    |
| Handler_update             | 13    |
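Applied to the question's own columns, the COALESCE approach shown above would look roughly like this (a sketch; days_paid, claim_id and month_id come from the question, and 0 is the default for the first entry of each claim):
SELECT claim_id,
       month_id,
       days_paid,
       -- previous month's value for this claim, or 0 when there is none
       days_paid - COALESCE(
           LAG(days_paid) OVER (PARTITION BY claim_id ORDER BY month_id),
           0) AS days_paid_delta
FROM policy_agent_month;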
You can use a CTE (Common Table Expression) in MariaDB 10.2+ to pre-compute frequently used expressions and name them for later use:
with
x as (  -- first we compute the CTE that we name "x"
    select
        *,
        coalesce(
            LAG(days_paid, 1) OVER (PARTITION BY claim_id ORDER BY month_id),
            123456
        ) as prev_month  -- this expression gets the name "prev_month"
    from my_table  -- or a simple/complex join here
)
select  -- now the main query
    prev_month
from x
...  -- rest of your query here where "prev_month" is computed.
In the main query prev_month has the lag value, or the default value 123456 when it's null.
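To see how this helps with the ~30 columns, here is a sketch of the same pattern computing several lagged values at once (amount_paid and fees_paid are hypothetical names standing in for the other measures; policy_agent_month is the table referenced in the question):
with x as (
    select
        t.*,
        -- amount_paid and fees_paid are hypothetical stand-ins for the other measures
        coalesce(lag(days_paid)   over (partition by claim_id order by month_id), 0) as prev_days_paid,
        coalesce(lag(amount_paid) over (partition by claim_id order by month_id), 0) as prev_amount_paid,
        coalesce(lag(fees_paid)   over (partition by claim_id order by month_id), 0) as prev_fees_paid
    from policy_agent_month t
)
select
    claim_id,
    month_id,
    days_paid   - prev_days_paid   as days_paid_delta,   -- month-over-month differences
    amount_paid - prev_amount_paid as amount_paid_delta,
    fees_paid   - prev_fees_paid   as fees_paid_delta
from x;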

how to reference a result in a subquery

I have the following table in an SQLite database:
+----+-------------+-------+
| ID | Week Number | Count |
+----+-------------+-------+
| 1  | 1           | 31    |
| 2  | 2           | 16    |
| 3  | 3           | 73    |
| 4  | 4           | 59    |
| 5  | 5           | 44    |
| 6  | 6           | 73    |
+----+-------------+-------+
I want to get the following table out, where this week's sales are one column and last week's sales are the next column.
+-------------+-----------+-----------+
| Week Number | This_Week | Last_Week |
+-------------+-----------+-----------+
| 1           | 31        | null      |
| 2           | 16        | 31        |
| 3           | 73        | 16        |
| 4           | 59        | 73        |
| 5           | 44        | 59        |
| 6           | 73        | 44        |
+-------------+-----------+-----------+
This is the select statement I was going to use:
select
    id, week_number, count,
    (select count from tempTable
     where week_number = (week_number - 1))
from
    tempTable;
You are comparing values in two different rows. When you are just writing week_number, the database does not know which one you mean.
To refer to a column in a specific table, you have to prefix it with the table name: tempTable.week_number.
And if both tables have the same name, you have to rename at least one of them:
SELECT id,
       week_number,
       count AS This_Week,
       (SELECT count
        FROM tempTable AS T2
        WHERE T2.week_number = tempTable.week_number - 1
       ) AS Last_Week
FROM tempTable;
If you want to query the same table twice, you have to put aliases on the original table and its copy to differentiate them:
select a.week_number, a.count this_week,
       (select b.count from tempTable b
        where b.week_number = (a.week_number - 1)) last_week
from tempTable a;
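As a separate technique from the subqueries above, if your SQLite is 3.25 or newer (which added window functions), LAG gives the same result directly; a minimal sketch:
SELECT week_number,
       count AS This_Week,
       -- the previous row's count in week_number order; NULL for the first week
       LAG(count) OVER (ORDER BY week_number) AS Last_Week
FROM tempTable
ORDER BY week_number;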

Converting columns to rows in Oracle 11g

I have a table like this:
+----+----------+----------+----------+-----------+----------+----------+
| ID | AR_SCORE | ER_SCORE | FS_SCORE | CPF_SCORE | IF_SCORE | IS_SCORE |
+----+----------+----------+----------+-----------+----------+----------+
| 1  | 25       | 35       | 45       | 55        | 65       | 75       |
| 2  | 95       | 85       | 75       | 65        | 55       | 45       |
+----+----------+----------+----------+-----------+----------+----------+
And I need to extract this:
+----+----------+-------+
| ID | SCORE    | VALUE |
+----+----------+-------+
| 1  | AR_SCORE | 25    |
| 1  | ER_SCORE | 35    |
| 2  | AR_SCORE | 95    |
+----+----------+-------+
I have read many questions about how to use pivoting in Oracle, but I could not make it work.
The conversion from columns into rows is actually an UNPIVOT. Since you are using Oracle 11g there are a few ways that you can get the result.
The first way would be using a combination of SELECT yourcolumn FROM...UNION ALL:
select ID, 'AR_SCORE' as Score, AR_SCORE as value
from yourtable
union all
select ID, 'ER_SCORE' as Score, ER_SCORE as value
from yourtable
union all
select ID, 'FS_SCORE' as Score, FS_SCORE as value
from yourtable
union all
select ID, 'IF_SCORE' as Score, IF_SCORE as value
from yourtable
union all
select ID, 'IS_SCORE' as Score, IS_SCORE as value
from yourtable
order by id
See Demo. Using UNION ALL was how you needed to unpivot data prior to Oracle 11g, but starting in that version the UNPIVOT function was implemented. This will get you the same result with fewer lines of code:
select ID, Score, value
from yourtable
unpivot
(
    value
    for Score in (AR_SCORE, ER_SCORE, FS_SCORE, IF_SCORE, IS_SCORE)
) un
order by id
See Demo. Both will give a result:
| ID | SCORE    | VALUE |
|----|----------|-------|
| 1  | AR_SCORE | 25    |
| 1  | FS_SCORE | 45    |
| 1  | IS_SCORE | 75    |
| 1  | IF_SCORE | 65    |
| 1  | ER_SCORE | 35    |
| 2  | FS_SCORE | 75    |
| 2  | IS_SCORE | 45    |
| 2  | ER_SCORE | 85    |
| 2  | IF_SCORE | 55    |
| 2  | AR_SCORE | 95    |
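One small note: both versions above leave out the CPF_SCORE column from the source table. If that column is needed as well, it can simply be added to the UNPIVOT list (same yourtable placeholder):
select ID, Score, value
from yourtable
unpivot
(
    value
    for Score in (AR_SCORE, ER_SCORE, FS_SCORE, CPF_SCORE, IF_SCORE, IS_SCORE)
) un
order by id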
