MariaDB: How to group by 2 or more columns combined

I want to get 1 entry per day per hour from my MariaDB database.
I have a table structured like this (with some more columns):
+------------+-----------+
| dayOfMonth | hourOfDay |
+------------+-----------+
Let's assume this table is filled like this:
+------------+-----------+
| dayOfMonth | hourOfDay |
+------------+-----------+
| 11 | 0 |
| 11 | 0 |
| 11 | 1 |
| 12 | 0 |
| 12 | 0 |
| 12 | 1 |
+------------+-----------+
What I want to get is this (every hourOfDay for each dayOfMonth; in reality, with all columns):
+------------+-----------+
| dayOfMonth | hourOfDay |
+------------+-----------+
| 11 | 0 |
| 11 | 1 |
| 12 | 0 |
| 12 | 1 |
+------------+-----------+
I was able to achieve this with this statement, but it would become way too long if I want to do this for an entire month:
(SELECT * FROM table WHERE dayOfMonth = 11 GROUP BY hourOfDay)
UNION
(SELECT * FROM table WHERE dayOfMonth = 12 GROUP BY hourOfDay)

You can group by dayOfMonth, hourOfDay:
SELECT dayOfMonth, hourOfDay
FROM table
GROUP BY dayOfMonth, hourOfDay
ORDER BY dayOfMonth, hourOfDay
With GROUP BY you can't select the other columns (if any) directly; you can only aggregate them with MIN(), MAX(), AVG(), etc.
Or use DISTINCT:
SELECT DISTINCT dayOfMonth, hourOfDay
FROM table
ORDER BY dayOfMonth, hourOfDay
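As a quick check of both variants, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MariaDB. The table name `readings` and the extra `temperature` column are made up for illustration; the point is only that the extra column has to go through an aggregate once you group by the two columns.

```python
import sqlite3

# SQLite (in memory) stands in for MariaDB here; the table name `readings`
# and the `temperature` column are hypothetical, not from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (dayOfMonth INTEGER, hourOfDay INTEGER, temperature REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(11, 0, 1.0), (11, 0, 3.0), (11, 1, 5.0),
     (12, 0, 2.0), (12, 0, 4.0), (12, 1, 6.0)],
)

# One row per (dayOfMonth, hourOfDay); the other column is aggregated.
grouped = conn.execute(
    """
    SELECT dayOfMonth, hourOfDay, MIN(temperature), MAX(temperature), COUNT(*)
    FROM readings
    GROUP BY dayOfMonth, hourOfDay
    ORDER BY dayOfMonth, hourOfDay
    """
).fetchall()

# DISTINCT gives the same day/hour pairs, without any aggregates.
distinct_pairs = conn.execute(
    "SELECT DISTINCT dayOfMonth, hourOfDay FROM readings "
    "ORDER BY dayOfMonth, hourOfDay"
).fetchall()

for row in grouped:
    print(row)
```

Both queries return four rows for the sample data, one per day and hour.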

Your question is unclear. This will transform your initial data into your proposed data:
SELECT DISTINCT
dayOfMonth, hourOfDay
FROM tbl;
"Every hourOfDay": do you want all hours, that is, 24 rows per day? If so, see the "sequence table" feature (e.g., seq_0_to_23) in MariaDB.

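If the goal really is 24 rows per day, the idea behind the sequence table can be sketched as follows. This is an illustration under assumptions, not MariaDB code: it runs on Python's built-in sqlite3, and a recursive CTE named `hours` plays the role that seq_0_to_23 would play in MariaDB. The `entries` column is added only to show which hours actually have data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (dayOfMonth INTEGER, hourOfDay INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(11, 0), (11, 0), (11, 1), (12, 0), (12, 0), (12, 1)],
)

rows = conn.execute(
    """
    WITH RECURSIVE hours(h) AS (   -- stand-in for MariaDB's seq_0_to_23
        SELECT 0
        UNION ALL
        SELECT h + 1 FROM hours WHERE h < 23
    ),
    days AS (SELECT DISTINCT dayOfMonth FROM tbl)
    SELECT days.dayOfMonth,
           hours.h AS hourOfDay,
           COUNT(tbl.hourOfDay) AS entries   -- 0 for hours without data
    FROM days
    CROSS JOIN hours
    LEFT JOIN tbl
           ON tbl.dayOfMonth = days.dayOfMonth
          AND tbl.hourOfDay = hours.h
    GROUP BY days.dayOfMonth, hours.h
    ORDER BY days.dayOfMonth, hours.h
    """
).fetchall()

print(len(rows))   # 2 days x 24 hours
print(rows[:3])
```

Every day present in the table is paired with every hour from 0 to 23, and hours with no matching row show up with a count of 0.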
Related

How to get most recent data from DynamoDB for each primary partition key in PartiQL

Inspired by this question: How to get most recent data from DynamoDB for each primary partition key?
I have a table in DynamoDB that stores account stats. The stats for an account may be updated several times per day, so the table records may look like this:
+------------+--------------+-------+-------+
| account_id | record_id | views | stars |
+------------+--------------+-------+-------+
| 3 | 2019/03/16/1 | 29 | 3 |
+------------+--------------+-------+-------+
| 2 | 2019/03/16/2 | 130 | 21 |
+------------+--------------+-------+-------+
| 1 | 2019/03/16/3 | 12 | 2 |
+------------+--------------+-------+-------+
| 2 | 2019/03/16/1 | 57 | 12 |
+------------+--------------+-------+-------+
| 1 | 2019/03/16/2 | 8 | 2 |
+------------+--------------+-------+-------+
| 1 | 2019/03/16/1 | 3 | 0 |
+------------+--------------+-------+-------+
account_id is the primary partition key and record_id is the primary sort key.
How can I get only the latest record for each account_id? From the example above I expect to get:
+------------+--------------+-------+-------+
| account_id | record_id | views | stars |
+------------+--------------+-------+-------+
| 3 | 2019/03/16/1 | 29 | 3 |
+------------+--------------+-------+-------+
| 2 | 2019/03/16/2 | 130 | 21 |
+------------+--------------+-------+-------+
| 1 | 2019/03/16/3 | 12 | 2 |
+------------+--------------+-------+-------+
This data is convenient to use for reporting purposes.
Execute the following PartiQL query for each account_id:
SELECT * FROM <Table> WHERE account_id='3' AND record_id > '2021/11' ORDER BY record_id DESC
PartiQL for DynamoDB has no LIMIT keyword, so the query returns all matching records; because of the descending order, the first record returned is the latest one.
You can reduce overfetching by constraining the record_id date as tightly as possible. If only the current date is of interest, for example, the sort key expression would be record_id > '2021/12/01'.
As in the referenced example, you must execute one query for each account_id of interest. Batching operations are supported.
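The per-account loop described above can be sketched in Python as follows. Everything outside the PartiQL statement is an assumption made for illustration: `execute` is a hypothetical callable that runs one statement with positional parameters and returns the matching items (in a real program it would wrap the DynamoDB ExecuteStatement call), and `fake_execute` is an in-memory fake filled with the sample rows from the question so that the sketch runs on its own. The table name `account_stats` is also made up.

```python
from typing import Any, Callable, Dict, Iterable, List

Item = Dict[str, Any]
# Hypothetical executor signature: (statement, parameters) -> matching items.
Execute = Callable[[str, List[Any]], List[Item]]


def latest_per_account(execute: Execute, table: str,
                       account_ids: Iterable[Any], since: str) -> Dict[Any, Item]:
    """Run one PartiQL query per account and keep the first (newest) item."""
    statement = (
        f'SELECT * FROM "{table}" '
        "WHERE account_id = ? AND record_id > ? "
        "ORDER BY record_id DESC"
    )
    latest: Dict[Any, Item] = {}
    for account_id in account_ids:
        items = execute(statement, [account_id, since])
        if items:  # accounts without matching records are skipped
            latest[account_id] = items[0]
    return latest


# In-memory fake of the executor, using the sample rows from the question.
SAMPLE: List[Item] = [
    {"account_id": 3, "record_id": "2019/03/16/1", "views": 29, "stars": 3},
    {"account_id": 2, "record_id": "2019/03/16/2", "views": 130, "stars": 21},
    {"account_id": 1, "record_id": "2019/03/16/3", "views": 12, "stars": 2},
    {"account_id": 2, "record_id": "2019/03/16/1", "views": 57, "stars": 12},
    {"account_id": 1, "record_id": "2019/03/16/2", "views": 8, "stars": 2},
    {"account_id": 1, "record_id": "2019/03/16/1", "views": 3, "stars": 0},
]


def fake_execute(statement: str, params: List[Any]) -> List[Item]:
    """Mimic the query: filter by key condition, sort by record_id descending."""
    account_id, since = params
    rows = [r for r in SAMPLE
            if r["account_id"] == account_id and r["record_id"] > since]
    return sorted(rows, key=lambda r: r["record_id"], reverse=True)


result = latest_per_account(fake_execute, "account_stats", [1, 2, 3], "2019/03/16")
for account_id, item in sorted(result.items()):
    print(account_id, item["record_id"], item["views"], item["stars"])
```

Injecting the executor keeps the selection logic separate from the database client, so the same function can be exercised against a fake, as here, or against a real client wrapper.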

SQLite time duration calculation from rows

I want to calculate duration between rows with datetime data in SQLite.
Let's consider this for the base data (named intervals):
| id | date | state |
| 1 | 2020-07-04 10:11 | On |
| 2 | 2020-07-04 10:22 | Off |
| 3 | 2020-07-04 11:10 | On |
| 4 | 2020-07-04 11:25 | Off |
I'd like to calculate the duration for both On and Off state:
| Total On | 26mins |
| Total Off | 48mins |
Then I wrote this query:
SELECT
"Total " || interval_start.state AS state,
(SUM(strftime('%s', interval_end.date)-strftime('%s', interval_start.date)) / 60) || "mins" AS duration
FROM
intervals interval_start
INNER JOIN
intervals interval_end ON interval_end.id =
(
SELECT id FROM intervals WHERE
id > interval_start.id AND
state = CASE WHEN interval_start.state = 'On' THEN 'Off' ELSE 'On' END
ORDER BY id
LIMIT 1
)
GROUP BY
interval_start.state
However, if the states in the base data do not strictly alternate:
| id | date | state |
| 1 | 2020-07-04 10:11 | On |
| 2 | 2020-07-04 10:22 | On | !!!
| 3 | 2020-07-04 11:10 | On |
| 4 | 2020-07-04 11:25 | Off |
My query calculates the wrong result, because it pairs the only Off date with each of the On dates and sums the differences.
The desired result would be something like this:
| Total On | 74mins |
| Total Off | 0mins | --this line can be omitted, or can be N/A
I have two questions:
How can I rewrite the query to handle these wrong data situations?
I feel my query is not the best in terms of performance, is it possible to improve it?
Use a CTE where you return only the starting rows of each state and then aggregate:
with cte as (
select *, lead(id) over (order by date) next_id
from (
select *, lag(state) over (order by date) prev_state
from intervals
)
where state <> coalesce(prev_state, '')
)
select c1.state,
(sum(strftime('%s', c2.date) - strftime('%s', c1.date)) / 60) || 'mins' duration
from cte c1 inner join cte c2
on c2.id = c1.next_id
group by c1.state
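To try the CTE approach on both data sets from the question, here is a small sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer, since lag() and lead() are window functions. The division is parenthesized in this sketch because || binds more tightly than / in SQLite.

```python
import sqlite3

QUERY = """
with cte as (
  select *, lead(id) over (order by date) next_id
  from (
    select *, lag(state) over (order by date) prev_state
    from intervals
  )
  where state <> coalesce(prev_state, '')
)
select c1.state,
       (sum(strftime('%s', c2.date) - strftime('%s', c1.date)) / 60) || 'mins' duration
from cte c1 inner join cte c2
  on c2.id = c1.next_id
group by c1.state
"""


def durations(rows):
    """Load rows into a fresh in-memory table and return {state: duration}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE intervals (id INTEGER PRIMARY KEY, date TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO intervals VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


# Strictly alternating states (first table in the question).
alternating = [
    (1, "2020-07-04 10:11", "On"),
    (2, "2020-07-04 10:22", "Off"),
    (3, "2020-07-04 11:10", "On"),
    (4, "2020-07-04 11:25", "Off"),
]

# Repeated On rows (second table in the question).
repeated_on = [
    (1, "2020-07-04 10:11", "On"),
    (2, "2020-07-04 10:22", "On"),
    (3, "2020-07-04 11:10", "On"),
    (4, "2020-07-04 11:25", "Off"),
]

print(durations(alternating))
print(durations(repeated_on))
```

For the alternating data this yields 26mins for On and 48mins for Off; for the repeated On rows only On remains, with 74mins, because the last Off row has no following row to pair with.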

how to reference a result in a subquery

I have the following table in an sqlite database
+----+-------------+-------+
| ID | Week Number | Count |
+----+-------------+-------+
| 1 | 1 | 31 |
| 2 | 2 | 16 |
| 3 | 3 | 73 |
| 4 | 4 | 59 |
| 5 | 5 | 44 |
| 6 | 6 | 73 |
+----+-------------+-------+
I want to get the following table out of it, where one column holds this week's sales and the next column holds last week's sales.
+-------------+-----------+-----------+
| Week Number | This_Week | Last_Week |
+-------------+-----------+-----------+
| 1 | 31 | null |
| 2 | 16 | 31 |
| 3 | 73 | 16 |
| 4 | 59 | 73 |
| 5 | 44 | 59 |
| 6 | 73 | 44 |
+-------------+-----------+-----------+
This is the select statement I was going to use:
select
id, week_number, count,
(select count from tempTable
where week_number = (week_number-1))
from
tempTable;
You are comparing values from two different rows. When you write just week_number, both occurrences resolve to the table in the subquery, so the condition compares a row's week_number with itself minus one and never matches.
To refer to a column in a specific table, you have to prefix it with the table name: tempTable.week_number.
And because both queries read from the same table, you have to give at least one of them an alias:
SELECT id,
week_number,
count AS This_Week,
(SELECT count
FROM tempTable AS T2
WHERE T2.week_number = tempTable.week_number - 1
) AS Last_Week
FROM tempTable;
If you query the same table twice, give both the outer reference and the inner one an alias to tell them apart:
select a.week_number,a.count this_week,
(select b.count from tempTable b
where b.week_number=(a.week_number-1)) last_week
from tempTable a;
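The difference between the unqualified and the aliased subquery can be seen with a short sketch on Python's built-in sqlite3 module. The table definition is reconstructed from the question's sample data; the column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tempTable (id INTEGER PRIMARY KEY, week_number INTEGER, count INTEGER)"
)
conn.executemany(
    "INSERT INTO tempTable VALUES (?, ?, ?)",
    [(1, 1, 31), (2, 2, 16), (3, 3, 73), (4, 4, 59), (5, 5, 44), (6, 6, 73)],
)

# Unqualified names: both week_number references bind to the inner table,
# so the condition is never true and every row gets NULL.
naive = conn.execute(
    """
    SELECT week_number, count,
           (SELECT count FROM tempTable
            WHERE week_number = (week_number - 1)) AS last_week
    FROM tempTable
    ORDER BY week_number
    """
).fetchall()

# Aliased names: the subquery looks up the row of the previous week.
aliased = conn.execute(
    """
    SELECT a.week_number, a.count AS this_week,
           (SELECT b.count FROM tempTable AS b
            WHERE b.week_number = a.week_number - 1) AS last_week
    FROM tempTable AS a
    ORDER BY a.week_number
    """
).fetchall()

print(naive)
print(aliased)
```

With the aliases in place, week 1 has no previous week and gets NULL (None in Python), while every later week picks up the count of the week before it.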

grand total column in Impala using window function

I'm looking for a way to do a "grand total" column across ALL groups in Impala.
It's easy to use window functions to obtain total of partitioned groups like this:
sum(x) over (partition by A)
However, there does not appear to be an expression to partition by 'all'. Is this a shortcoming in Impala? It looks like in PostgreSQL you can leave the OVER clause empty.
The partition clause is optional. You can write a query like this:
select sum(x) over () from t;
For example:
[localhost:21000] > select id, sum(id) over () from tbl;
+----+-------------------+
| id | sum(id) OVER(...) |
+----+-------------------+
| 0 | 28 |
| 1 | 28 |
| 2 | 28 |
| 3 | 28 |
| 6 | 28 |
| 7 | 28 |
| 4 | 28 |
| 5 | 28 |
+----+-------------------+
Fetched 8 row(s) in 0.08s
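The same behaviour can be reproduced outside Impala. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions) as a stand-in; the `grp` column is made up so that a partitioned total and the grand total can be shown side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, grp TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(i, "even" if i % 2 == 0 else "odd") for i in range(8)],
)

rows = conn.execute(
    """
    SELECT id, grp,
           SUM(id) OVER (PARTITION BY grp) AS group_total,  -- per-group total
           SUM(id) OVER ()                 AS grand_total   -- total over all rows
    FROM tbl
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

An empty OVER () treats the whole result set as a single partition, so every row carries the same grand total of 28 next to its group total.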

Get number of distinct records with same key

Let's assume I have a table A:
| ID | B_ID | C | column 1 | ... | column x|
| 1 | 24 | 44 | xxxxxxx
| 2 | 25 | 55 | xxxxxxx
| 3 | 25 | 66 | xxxxxxx (data in all other columns are the same)
| 4 | 26 | 77 | xxxxxxx
| 4 | 26 | 78 | xxxxxxx
| 4 | 26 | 79 | xxxxxxx
I want to get the highest number of distinct records with the same B_ID (and I also want to know the B_ID where this occurs). So in this example I want to get the values 3 and 26.
What would be the best approach to achieve this?
The best approach is a simple aggregation, ordered by the count, with the number of rows returned limited to 1 (rownum is Oracle syntax):
select b_id, no_of_records
from (
select b_id, count(1) as no_of_records
from table_A
group by b_id
order by no_of_records desc
)
where rownum <= 1;
This, of course, returns 1 row even if multiple groups share the same highest number of records. If you want all groups with the highest number of records, rank the groups by their count and keep those ranked first:
select b_id, no_of_records
from (
select b_id, count(1) as no_of_records,
rank() over (partition by null order by count(1) desc) as rank$
from table_A
group by b_id
)
where rank$ <= 1;
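Both variants can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite has no rownum, so LIMIT 1 stands in for it, and the counts are computed in a CTE before ranking. The rows for b_id 27 are made up to produce a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_A (id INTEGER, b_id INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO table_A VALUES (?, ?, ?)",
    [(1, 24, 44), (2, 25, 55), (3, 25, 66), (4, 26, 77), (4, 26, 78), (4, 26, 79)],
)

# Single winner: aggregate, order by the count, keep one row.
TOP_ONE = """
SELECT b_id, COUNT(*) AS no_of_records
FROM table_A
GROUP BY b_id
ORDER BY no_of_records DESC
LIMIT 1
"""

# All winners: rank the groups by their count and keep rank 1.
TOP_WITH_TIES = """
WITH counts AS (
    SELECT b_id, COUNT(*) AS no_of_records
    FROM table_A
    GROUP BY b_id
)
SELECT b_id, no_of_records
FROM (
    SELECT b_id, no_of_records,
           RANK() OVER (ORDER BY no_of_records DESC) AS rnk
    FROM counts
)
WHERE rnk = 1
ORDER BY b_id
"""

top_one = conn.execute(TOP_ONE).fetchall()
print(top_one)

# Hypothetical extra rows that tie b_id 27 with b_id 26.
conn.executemany(
    "INSERT INTO table_A VALUES (?, ?, ?)",
    [(5, 27, 80), (6, 27, 81), (7, 27, 82)],
)
top_with_ties = conn.execute(TOP_WITH_TIES).fetchall()
print(top_with_ties)
```

On the question's data the first query returns b_id 26 with 3 records; after the tie is introduced, the ranked query returns both 26 and 27.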
