Looking for a way to calculate aggregates without collapsing rows - azure-data-explorer

As the title says, I'd like to find an efficient way to calculate aggregates over groups of rows without collapsing those rows together. For an example I want to create the mean column in the table below.
|------------|---------|-------------|
| category | value | mean(value) |
|------------|---------|-------------|
| A | 1 | 3 |
|------------|---------|-------------|
| A | 3 | 3 |
|------------|---------|-------------|
| A | 5 | 3 |
|------------|---------|-------------|
| B | 1 | 1.5 |
|------------|---------|-------------|
| B | 2 | 1.5 |
|------------|---------|-------------|
So far, the best way I've found to do this is:
T
| join kind=leftouter (T | summarize avg() by category) on category
This seems to be causing performance problems. I'm also aware of a way of doing it using partition by, but need to support having more than 64 categories.
Am I missing a good way of doing this task?

Here you go:
let MyTable = datatable(Category:string, value:long) [
"A", 1,
"A", 3,
"A", 5,
"B", 1,
"B", 2
];
let Avgs = MyTable | summarize avg(value) by Category;
MyTable | lookup (Avgs) on Category
This will output exactly what you want.
Explanation:
First you create a temporary table (using a let statement) named Avgs, where you'll have the average per Category.
Your main statement is to output MyTable, but for every category you want to also display the relevant value from Avgs, which you achieve by using the lookup operator.

Related

Kusto - Get Average and Count in the same row

Using Kusto, I want to write a query to see the average duration of events and total count of those events as well. I am able to do it in two queries like this but is it possible to do this in 1 query?
dependencies
| where type == 'SQL' and operation_Name == 'something'
| summarize avg(duration) by data
| order by data asc
dependencies
| where type == 'SQL' and operation_Name == 'something'
| summarize count() by data
| order by data asc
This is giving me what I want in two separate results. For example I get this:
A 1.2
B 1.4
C 4.5
and
A 5
B 90
C 78
but what I want is this
A 1.2 5
B 1.4 90
C 4.5 78
See summarize operator for syntax
T | summarize [SummarizeParameters] [[Column =] Aggregation [, ...]] [by [Column =] GroupExpression [, ...]]
dependencies
| where type == 'SQL' and operation_Name == 'something'
| summarize avg(duration), count() by data
| order by data asc

query to transpose rows to columns SQLite

Good afternoon,
I would like to know if it is possible to make a query to generate columns according to the number of rows that I have in my table
example:
ID COD DIAG
111111111 | Z359 | D
111111112 | Z359 | D
111111112 | Z359 | D
111111113 | Z359 | R
111111113 | Z359 | P
111111113 | Z359 | R
111111114 | Z359 | D
111111114 | Z359 | D
111111114 | Z359 | D
111111115 | Z359 | D
it would be ideal that columns be created according to the number of rows for each id, if not possible it would put a fixed number of columns.
result query
ID | COD1 | DIAG1 | COD2 | DIAG2 | COD3 | DIAG3
111111111 | Z359 | D | | | |
111111112 | Z359 | D | Z359 | D | |
111111113 | Z359 | R | Z359 | P | Z359 | R
111111114 | Z359 | D | Z359 | D | Z359 | D
111111115 | Z359 | D | | | |
sorry my english
Thanks a Lot !!
This first query follows the pattern of the answer to the duplicate question, included here for comparison.
WITH numbered AS (
SELECT row_number() OVER
(PARTITION BY ID ORDER BY COD, DIAG)
AS seq,
t.*
FROM SO58566470 t)
SELECT ID,
max(CASE WHEN seq = 1 THEN COD END) AS COD1,
max(CASE WHEN seq = 1 THEN DIAG END) AS DIAG1,
max(CASE WHEN seq = 2 THEN COD END) AS COD1,
max(CASE WHEN seq = 2 THEN DIAG END) AS DIAG1,
max(CASE WHEN seq = 3 THEN COD END) AS COD3,
max(CASE WHEN seq = 3 THEN DIAG END) AS DIAG3
FROM numbered n
GROUP BY ID;
But that really is a naive use of window functions, since it could have maximized the window by calculating other values at the same time. The first query is already collecting and traversing partitioned rows to get the row number, yet it essentially repeats that process twice by collecting values in the next query using the aggregate max() functions.
The following query looks longer and perhaps more complicated, but it takes advantage of the partitioned data (i.e. window data) by collecting the transformed values in the same process. But because window functions necessarily operate on each row, it becomes necessary to filter out "incomplete" rows. I did not do any kind of profiling on the queries, but I suspect this second query is much more efficient overall.
WITH transform AS (
SELECT id,
lag(COD, 0) OVER IDWin AS COD1,
lag(DIAG, 0) OVER IDWin AS DIAG1,
lag(COD, 1) OVER IDWin AS COD2,
lag(DIAG, 1) OVER IDWin AS DIAG2,
lag(COD, 2) OVER IDWin AS COD3,
lag(DIAG, 2) OVER IDWin AS DIAG3,
row_number() OVER IDWin AS seq
FROM SO58566470 t
WINDOW IDWin AS (PARTITION BY ID ORDER BY COD, DIAG)
ORDER BY ID, SEQ
),
last AS (
SELECT id, max(seq) as maxseq
FROM transform
GROUP BY id
)
SELECT transform.*
FROM transform
JOIN last
ON transform.id = last.id AND transform.seq = last.maxseq
ORDER BY id;

Add serial number for each id based on dates

I have a dataset like shown below (except the Ser_NO, this is the field i want to create).
+--------+------------+--------+
| CaseID | Order_Date | Ser_No |
+--------+------------+--------+
| 44 | 22-01-2018 | 1 |
+--------+------------+--------+
| 44 | 24-02-2018 | 3 |
+--------+------------+--------+
| 44 | 12-02-2018 | 2 |
+--------+------------+--------+
| 100 | 24-01-2018 | 1 |
+--------+------------+--------+
| 100 | 26-01-2018 | 2 |
+--------+------------+--------+
| 100 | 27-01-2018 | 3 |
+--------+------------+--------+
How can i achieve a serial number for each CaseId based on my dates. So the first date in a specific CaseID gets number 1, the second date in this CaseID gets number 2 and so on.
I'm working with T-SQL btw,
I've tried a few things:
CASE
WHEN COUNT(CaseID) > 1
THEN ORDER BY (Order_Date)
AND Ser_no +1
END
Thanks in advance.
First of all, although I don't understand what you did, it gives you what you wanted. The serial number is assigned by date order. The problem I can see is that the result shows you the rows in the wrong order (1, 3, 2 instead of 1, 2, 3).
To sort that order you can try this:
SELECT *, ROW_NUMBER() OVER (PARTITION BY caseid ORDER BY caseid, order_date) AS ser_no
FROM [Table]
Thanks for your reply,
Sorry for the misunderstanding, because the ser_no is not yet in my table. That is the field a want to calculate.
I finished it myself this morning, but it looks almost the same like your measure:
RANK() OVER(PARTITION BY CaseID ORDER BY CaseID, Order_Date ASC

selecting a row based on a number of column values in SQLite

I have a table with this structure:
id | IDs | Name | Type
1 | 10 | A | 1
2 | 11 | B | 1
3 | 12 | C | 2
4 | 13 | D | 3
except id nothing else is a FOREIGN or PRIMARY KEY. I want to select a row based on it's column values that are not PRIMARY KEY. I have tried the following syntax but it yields no results.
SELECT * FROM MyTable WHERE Name = 'A', Type = 1;
what am I doing wrong? What is exactly returned by a SELECT statement? I'm totally new to Data Base and I'm currently experimenting and trying to learn it. so far my search has not yield any results regarding this case.
Use and to add multiple conditions to your query
SELECT *
FROM MyTable
WHERE Name = 'A'
AND Type = 1;

How to add a new column in a View in sqlite?

I have this database in sqlite (table1):
+-----+-------+-------+
| _id | name | level |
+-----+-------+-------+
| 1 | Mike | 3 |
| 2 | John | 2 |
| 3 | Bob | 2 |
| 4 | David | 1 |
| 5 | Tom | 2 |
+-----+-------+-------+
I want to create a view with all elements of level 2 and then to add a new column indicating the order of the row in the new table. That is, I would want this result:
+-------+------+
| index | name |
+-------+------+
| 1 | John |
| 2 | Bob |
| 3 | Tom |
+-------+------+
I have tried:
CREATE VIEW words AS SELECT _id as index, name FROM table1;
But then I get:
+-------+------+
| index | name |
+-------+------+
| 2 | John |
| 3 | Bob |
| 5 | Tom |
+-------+------+
I suppose it should be something as:
CREATE VIEW words AS SELECT XXXX as index, name FROM table 1;
What should I use instead of XXXX?
When ordered by _id, the number of rows up to and including this one is the same as the number of rows where the _id value is less than or equal to this row's _id:
CREATE VIEW words AS
SELECT (SELECT COUNT(*)
FROM table1 b
WHERE level = 2
AND b._id <= a._id) AS "index",
name
FROM table1 a
WHERE level = 2;
(The computation itself does not actually require ORDER BY _id because the order of the rows does not matter when we're just counting them.)
Please note that words is not guaranteed to be sorted; add ORDER BY "index" if needed.
And this is, of course, not very efficient.
You have two options. First, you could simply add a new column with the following:
ALTER TABLE {tableName} ADD COLUMN COLNew {type};
Second, and more complicatedly, but would actually put the column where you want it, would be to rename the table:
ALTER TABLE {tableName} RENAME TO TempOldTable;
Then create the new table with the missing column:
CREATE TABLE {tableName} (name TEXT, COLNew {type} DEFAULT {defaultValue}, qty INTEGER, rate REAL);
And populate it with the old data:
INSERT INTO {tableName} (name, qty, rate) SELECT name, qty, rate FROM TempOldTable;
Then delete the old table:
DROP TABLE TempOldTable;
I'd much prefer the second option, as it will allow you to completely rename everything if need be.

Resources