Kusto summarize unique occurrences of the value in the column

I have the following dataset:
let t1 = datatable(id:string, col1:string, col2:string)
[
'1', 'ValueA', 'AT',
'2', 'ValueC', 'AT',
'3', 'ValueA', 'AT',
'4', 'ValueB', 'AT',
'1', 'ValueC', 'v-username',
];
t1
| summarize (Id) by col1
My goal is to count occurrences of values in col1 per id. Because id=1 occurs twice, I need to decide whether to take ValueA or ValueC. This is decided by the value of col2: if col2 starts with "v-", take the value from that row.
When I use "summarize (Id) by col1" I get:
ValueA,2
ValueC,2
ValueB,1
ValueD,1
Total:6
Expected result is:
ValueA,1
ValueC,2
ValueB,1
ValueD,1
Total:5
Is it possible to achieve this with Kusto?

a. when you run `... | summarize (id) by col1` you should get a semantic error, as there's no aggregation function specified (e.g. you could have run `... | summarize dcount(id) by col1`)
b. it's not clear where ValueD, 1 in your expected result comes from, as your datatable expression includes no record with ValueD
c. if i had to guess the solution to your question, despite a and b, this would be my guess:
let t1 = datatable(id:string, col1:string, col2:string)
[
'1', 'ValueA', 'AT',
'2', 'ValueC', 'AT',
'3', 'ValueD', 'AT',
'4', 'ValueB', 'AT',
'1', 'ValueC', 'v-username',
];
t1
| summarize c = dcount(id) by col1
| as T
| union (T | summarize c = sum(c) by col1 = "Total")
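The precedence rule described in the question (when an id appears more than once, prefer the row whose col2 starts with "v-") can be illustrated outside Kusto. The following is a minimal plain-Python sketch of that rule using the question's original five rows. It is not a KQL query, and the function name `pick_value_per_id` is a hypothetical helper introduced only for illustration.

```python
from collections import Counter

# Rows from the question's datatable: (id, col1, col2).
rows = [
    ("1", "ValueA", "AT"),
    ("2", "ValueC", "AT"),
    ("3", "ValueA", "AT"),
    ("4", "ValueB", "AT"),
    ("1", "ValueC", "v-username"),
]

def pick_value_per_id(rows):
    """Keep one col1 value per id, preferring rows whose col2 starts with 'v-'."""
    chosen = {}  # id -> (is_preferred, col1)
    for row_id, col1, col2 in rows:
        preferred = col2.startswith("v-")
        # Take the row if the id is new, or if it upgrades a non-preferred pick.
        if row_id not in chosen or (preferred and not chosen[row_id][0]):
            chosen[row_id] = (preferred, col1)
    return {row_id: col1 for row_id, (_, col1) in chosen.items()}

# Count how many ids ended up with each col1 value, plus a grand total.
counts = Counter(pick_value_per_id(rows).values())
total = sum(counts.values())
print(dict(counts), "Total:", total)
```

With these rows, id 1 resolves to ValueC, so each id is counted exactly once and the total equals the number of distinct ids.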

Related

Join values of two tables that represent the missing values between those tables in SQLITE

I have the following table
CREATE TABLE holes (`tournament_id` INTEGER, `year` INTEGER, `course_id` INTEGER, `round` INTEGER, `hole` INTEGER, `front` INTEGER, `side` INTEGER, `region` INTEGER);
With the following data sample
INSERT INTO holes (`tournament_id`, `year`, `course_id`, `round`, `hole`, `front`, `side`, `region`) VALUES
('33', '2016', '895', '1', '1', '12', '5', 'L'),
('33', '2016', '895', '1', '2', '18', '10', 'R'),
('33', '2016', '895', '1', '3', '15', '7', 'R'),
('33', '2016', '895', '1', '4', '11', '7', 'R'),
('33', '2016', '895', '1', '5', '18', '7', 'L'),
('33', '2016', '895', '1', '6', '28', '5', 'L'),
('33', '2016', '895', '1', '7', '21', '12', 'R');
In addition, I have another table tournaments
CREATE TABLE tournaments (`tournament_id` INTEGER, `year` INTEGER, `R1` INTEGER, `R2` INTEGER, `R3` INTEGER, `R4` INTEGER);
With data
INSERT INTO tournaments VALUES
(33, 2016, 715, 715, 895, 400);
The values for R1, R2, R3 and R4 represent the ids of the courses.
I want the columns tournament_id, year and course_id that are missing in table holes based on all the possible values of table tournaments.
With the help of this answer I tried the following:
WITH h AS (
SELECT DISTINCT tournament_id, year, course_id
FROM holes)
SELECT t.tournament_id, t.year
FROM tournaments t
WHERE NOT EXISTS (
SELECT *
FROM h
WHERE h.tournament_id = t.tournament_id
AND h.year = t.year
AND h.course_id IN (t.R1, t.R2, t.R3, t.R4)
);
The above goes a long way but I also want the h.course_id that is/are missing. Desired result:
33 2016 715
33 2016 400
These combinations of tournament_id, year and course_id are not present in holes. However, they do exist because they are present in tournaments.

For this requirement you need a resultset consisting of all the values of the Rx columns, which you can get with UNION in a CTE.
Then you can use NOT EXISTS to get all the combinations of tournament_id, year and course_id that do not exist in holes:
WITH cte AS (
SELECT tournament_id, year, R1 AS course_id FROM tournaments
UNION
SELECT tournament_id, year, R2 FROM tournaments
UNION
SELECT tournament_id, year, R3 FROM tournaments
UNION
SELECT tournament_id, year, R4 FROM tournaments
)
SELECT c.*
FROM cte c
WHERE NOT EXISTS (
SELECT *
FROM holes h
WHERE (h.tournament_id, h.year, h.course_id) = (c.tournament_id, c.year, c.course_id)
);
See the demo.
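To check the approach end to end, here is a minimal sketch that runs the UNION plus NOT EXISTS idea through Python's built-in `sqlite3` module. It uses the column names from the question's schema (`tournament_id`, `course_id`), a trimmed-down `holes` table with only two sample rows, and spells the comparison out column by column rather than relying on row-value syntax. Those simplifications are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE holes (tournament_id INTEGER, year INTEGER, course_id INTEGER,
                    round INTEGER, hole INTEGER);
-- Two sample rows are enough: only course 895 is present in holes.
INSERT INTO holes VALUES (33, 2016, 895, 1, 1), (33, 2016, 895, 1, 2);
CREATE TABLE tournaments (tournament_id INTEGER, year INTEGER,
                          R1 INTEGER, R2 INTEGER, R3 INTEGER, R4 INTEGER);
INSERT INTO tournaments VALUES (33, 2016, 715, 715, 895, 400);
""")

# Unpivot R1..R4 into one course_id column (UNION also removes the duplicate 715),
# then keep only the combinations that have no matching row in holes.
missing = conn.execute("""
WITH cte AS (
    SELECT tournament_id, year, R1 AS course_id FROM tournaments
    UNION
    SELECT tournament_id, year, R2 FROM tournaments
    UNION
    SELECT tournament_id, year, R3 FROM tournaments
    UNION
    SELECT tournament_id, year, R4 FROM tournaments
)
SELECT c.tournament_id, c.year, c.course_id
FROM cte c
WHERE NOT EXISTS (
    SELECT 1 FROM holes h
    WHERE h.tournament_id = c.tournament_id
      AND h.year = c.year
      AND h.course_id = c.course_id
)
""").fetchall()

print(sorted(missing))
```

The query has no ORDER BY, so the sketch sorts the rows in Python before printing them.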

Neo4js conditional information storage

I am new to graph databases and stuck on the following issue. I'm trying to store the conditional information below in a graph.
when a=1 and b=2 then sum=3,
when a=2 and b=3 then sum=5 and mul=6
Here there are 4 pre-conditions[(a=1, b=2),(a=2, b=3)], 3 post conditions(sum=3,sum=5,mul=6)
The number of pre/post conditions can change from sentence to sentence.
What is the appropriate way to store such information in graphs?
Case 1:
Case 2:
Or please do suggest any other scalable way to store such info which can be easily queried.

One option is to use something like this graph, only Input, Cond and Res nodes:
MERGE (a:Input{key: 'a', value: 2})
MERGE (b:Input{key: 'b', value: 1})
MERGE (c:Res{key: 'sum', value: 5})
MERGE (d:Input{key: 'a', value: 7})
MERGE (e:Res{key: 'sum', value: 19})
MERGE (a)-[:POINTS]->(c)
MERGE (b)-[:POINTS]->(c)
MERGE (d)-[:POINTS]->(e)
MERGE (b)-[:POINTS]->(e)
With a result from a query like this:
MATCH (n:Res{key: 'sum'})<-[:POINTS]-(a:Input{key: 'a', value: 2})
WITH n
MATCH (n)<-[:POINTS]-(b:Input{key: 'b', value: 1})
WITH n
MATCH (n)<--(p:Input)
WITH n, COUNT(p) as inputCount
WHERE inputCount=2
RETURN n
Or:
MATCH (res:Res)<--(i:Input)
WITH res, count(i) as inputCount
WHERE EXISTS {MATCH (res)<--(:Input{key: 'a', value: 2})}
AND EXISTS {MATCH (res)<--(:Input{key: 'b', value: 1})}
AND inputCount=2
RETURN res
But keep in mind that this works for 'AND' conditions only
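The matching rule behind both queries (every given input must point at the result, and the result must have no other inputs) amounts to an exact set comparison. The sketch below emulates that rule in plain Python with in-memory dictionaries. It does not talk to Neo4j, and the `results` structure and `find_results` helper are hypothetical names used only to illustrate the logic.

```python
# Each result node lists the (key, value) inputs that point at it,
# mirroring the Input -> Res relationships created by the MERGE statements.
results = [
    {"key": "sum", "value": 5, "inputs": {("a", 2), ("b", 1)}},
    {"key": "sum", "value": 19, "inputs": {("a", 7), ("b", 1)}},
]

def find_results(results, given):
    """Return results whose input set equals the given inputs exactly."""
    wanted = set(given.items())
    # Set equality covers both checks from the Cypher queries:
    # all wanted inputs are present, and the input count matches.
    return [r for r in results if r["inputs"] == wanted]

matches = find_results(results, {"a": 2, "b": 1})
partial = find_results(results, {"a": 2})
print(matches, partial)
```

A partial set of inputs returns nothing, which is the same effect the `inputCount=2` check has in the Cypher versions.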

Kusto | KQL: Expand dynamic column to all combinations of two ( Couples | Tuples )

I have a scenario where I am trying to create a view that shows me all the unique couples of values per key. For example:
datatable(Key:string, Value:string)[
'1', 'A',
'2', 'B',
'2', 'C',
'3', 'A',
'3', 'B',
'3', 'C',
'3', 'C']
| sort by Key, Value asc
| summarize Tuples=make_set(Value) by Key
Result:
Key Tuples
1 ["A"]
2 ["B","C"]
3 ["A","B","C"]
Desired Result:
Key Tuples
1 ["A"]
2 ["B","C"]
3 ["A","B"]
3 ["A","C"]
3 ["B","C"]
How can I achieve this in KQL?

Here's one way, neither especially elegant nor efficient, that uses an inner self-join to get all combinations per Key:
datatable(Key:string, Value:string)
[
'1', 'A',
'2', 'B',
'2', 'C',
'3', 'A',
'3', 'B',
'3', 'C',
'3', 'C'
]
| distinct Key, Value
| as hint.materialized=true T1
| join kind=inner T1 on Key
| where Value != Value1
| project Key, Tuple = tostring(array_sort_asc(pack_array(Value, Value1)))
| distinct Key, Tuple
| as hint.materialized=true T2
| union (
T1
| where Key !in ((T2 | project Key)) | project Key, Tuple = tostring(pack_array(Value))
)
| order by Key asc, Tuple asc
Key Tuple
1 ["A"]
2 ["B","C"]
3 ["A","B"]
3 ["A","C"]
3 ["B","C"]
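For comparison, the desired output (all unique pairs of values per key, with single-value keys kept as a one-element tuple) can be sketched in plain Python with `itertools.combinations`. This only illustrates the expected result; it is not KQL, and `pairs_per_key` is a hypothetical helper name.

```python
from itertools import combinations

# Same rows as the datatable in the question: (Key, Value).
rows = [("1", "A"), ("2", "B"), ("2", "C"),
        ("3", "A"), ("3", "B"), ("3", "C"), ("3", "C")]

def pairs_per_key(rows):
    """Return (key, tuple) rows: every unique pair of values per key."""
    values_by_key = {}
    for key, value in rows:
        # A set drops duplicate values, like `distinct Key, Value`.
        values_by_key.setdefault(key, set()).add(value)
    out = []
    for key in sorted(values_by_key):
        values = sorted(values_by_key[key])
        if len(values) == 1:
            # Keys with a single value keep that value on its own.
            out.append((key, values))
        else:
            out.extend((key, list(pair)) for pair in combinations(values, 2))
    return out

result = pairs_per_key(rows)
for key, tuple_values in result:
    print(key, tuple_values)
```

Sorting the values before pairing plays the same role as `array_sort_asc` in the query: each pair appears once, in a stable order.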

Kusto | add column to show percentages of total

I have seen several other questions similar to this but each is slightly different and none of them provide an answer I was able to adapt to my situation. I have a table like this:
let T =
datatable(Val1:string, Val2:string, Val3:bool)
[
'', 'false', 'false',
'Yes', 'false', 'true',
'No', 'false', 'false',
'Yes', 'false', 'false'
]
;
I want to only get results where Val3 is false, summarize count by Val1 & Val2, then extend a column to show the percentage of the whole on each row. I have tried doing this:
T
| where Val3 == "false"
| summarize Count = count() by Val1, Val2
| let Total = sum(Count)
| extend Percentage = round(100.0 * Count/Total, 0)
This throws the error "A recognition error occurred. Token: let". I've also tried many variations, such as this:
T
| where Val3 == "false"
| summarize Count = count() by Val1, Val2
| extend Total = sum(Count)
| extend Percentage = round(100.0 * Count/Total, 0)
This throws the error "Function 'sum' cannot be invoked in current context". If I change extend Total to summarize Total, that line works, but the next line then fails because Count is no longer recognized. If I add Count to the summarize line like this:
| summarize Total = sum(Count), Count
Then I get an error "Non valid aggregation function is used after summarize".
This is the output I'm going for:
It seems like this is a lot more difficult than it should be. What am I missing?

You can calculate the percentage using a sub-query to get the total, passed into toscalar(). In the examples below, I've also used the as operator.
toscalar(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/toscalarfunction
as operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/asoperator
datatable(Val1:string, Val2:string, Val3:bool)
[
'', 'false', 'false',
'Yes', 'false', 'true',
'No', 'false', 'false',
'Yes', 'false', 'false'
]
| where Val3 == "false"
| as T
| summarize Count = count() by Val1, Val2
| extend Percentage = round(100.0 * Count / toscalar(T | count), 2)
or, following the same concept - here's an alternative:
datatable(Val1:string, Val2:string, Val3:bool)
[
'', 'false', 'false',
'Yes', 'false', 'true',
'No', 'false', 'false',
'Yes', 'false', 'false'
]
| where Val3 == "false"
| summarize Count = count() by Val1, Val2
| as T
| extend Percentage = round(100.0 * Count / toscalar(T | summarize sum(Count)), 2)
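The arithmetic in both variants is the same: count rows per group, compute one grand total, then divide each group count by that total. The plain-Python sketch below shows the calculation, assuming Val3 is modelled as a real boolean (the question's datatable writes it as the string 'false'). It is an illustration of the calculation, not a KQL query.

```python
from collections import Counter

# (Val1, Val2, Val3) rows from the question, with Val3 as a Python bool (assumption).
rows = [
    ("", "false", False),
    ("Yes", "false", True),
    ("No", "false", False),
    ("Yes", "false", False),
]

# Keep only rows where Val3 is false, then count per (Val1, Val2) group.
filtered = [(val1, val2) for val1, val2, val3 in rows if val3 is False]
counts = Counter(filtered)

# The grand total plays the role of toscalar(T | count) in the queries.
total = sum(counts.values())
percentages = {group: round(100.0 * n / total, 2) for group, n in counts.items()}

for group, pct in percentages.items():
    print(group, counts[group], pct)
```

The total is computed once, outside the per-group step, which is the role `toscalar()` plays in the queries.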

SQLite 5 tables, Join, Sum, GroupBy

I am desperately working on an issue and cannot resolve it. I have a SQLite DB with five tables and corresponding columns:
Tab1 = {Job_ID, Company_ID, Source_ID}
Tab1_Category = {Job_ID, Category_ID}
Category = {ID, First_level, Second_level}
Tab2 = {Job_ID, Log_Date, Clicks, Applications}
Source = {ID, Name}
I created a sample db:
CREATE TABLE TAB1 (
`job_id` INTEGER,
`company_id` INTEGER,
`source_id` INTEGER
);
INSERT INTO TAB1
(`job_id`, `company_id`, `source_id`)
VALUES
('1', '222', '2'),
('2', '222', '1'),
('3', '222', '1'),
('4', '222', '1'),
('5', '255', '3');
CREATE TABLE TAB1_CATEGORY (
`job_id` INTEGER,
`category_id` INTEGER
);
INSERT INTO TAB1_CATEGORY
(`job_id`, `category_id`)
VALUES
('1', '31'),
('2', '36'),
('3', '33'),
('3', '35'),
('4', '32'),
('4', '31'),
('5', '34');
CREATE TABLE CATEGORY (
`id` INTEGER,
`first_level` VARCHAR(3),
`second_level` VARCHAR(3)
);
INSERT INTO CATEGORY
(`id`, `first_level`, `second_level`)
VALUES
('30', 'sss', 'aaa'),
('31', 'sss', 'aaa'),
('32', 'sss', 'bbb'),
('33', 'ggg', 'ccc'),
('34', 'ggg', 'ddd'),
('35', 'ggg', 'eee'),
('36', 'hhh', 'fff');
CREATE TABLE SOURCE (
`id` INTEGER,
`name` VARCHAR(3)
);
INSERT INTO SOURCE
(`id`, `name`)
VALUES
('1', 'mmm'),
('2', 'nnn'),
('3', 'ooo');
CREATE TABLE TAB2 (
`job_id` INTEGER,
`log_date` VARCHAR(10),
`clicks` INTEGER,
`applications` INTEGER
);
INSERT INTO TAB2
(`job_id`, `log_date`, `clicks`, `applications`)
VALUES
('1', '01-01-1999', '6', '2'),
('1', '02-01-1999', '7', '3'),
('1', '03-01-1999', '9', '1'),
('2', '02-01-1999', '4', '1'),
('2', '05-01-1999', '8', '2'),
('3', '03-01-1999', '9', '0'),
('4', '05-01-1999', '5', '3'),
('4', '06-01-1999', '4', '1'),
('5', '01-01-1999', '1', '0'),
('5', '03-01-1999', '3', '1');
I need the following results with one query:
list of all JOB_ID (Tab1) and Company_ID (Tab1) where First_level(from table category) is "ggg" or "sss" and Name (from table Source) is "mmm"
sum of clicks and sum of applications (Tab2) per Job_ID
sum of distinct Second_level (from table Category)
sum of applications for each company_ID (a company_ID can have many Job_ids)
This is what I did so far, but it is not working the way I want:
SELECT t1.job_id, t1.company_id,
SUM(t2.clicks), SUM(t2.applications), COUNT(DISTINCT c.second_level)
FROM TAB1 t1
JOIN SOURCE s ON s.id = t1.source_id
JOIN TAB1_CATEGORY tc ON t1.job_id = tc.job_id
JOIN CATEGORY c ON tc.category_id = c.id
JOIN TAB2 t2 ON t1.job_id = t2.job_id
WHERE c.first_level IN ('ggg', 'sss') AND s.NAME ='mmm'
GROUP BY t1.job_id
What I get is the sum of all clicks/applications and not the sum per job_id:
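job_id | company_id | SUM(t2.clicks) | SUM(t2.applications) | COUNT(DISTINCT c.second_level)
-----: | ---------: | -------------: | -------------------: | -----------------------------:
3 | 222 | 18 | 0 | 2
4 | 222 | 18 | 8 | 2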
And this is what I want to get:
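job_id | company_id | SUM(t2.clicks) | SUM(t2.applications) | COUNT(DISTINCT c.second_level) | Total Appl per company
-----: | ---------: | -------------: | -------------------: | -----------------------------: | ---------------------:
3 | 222 | 9 | 0 | 2 | 4
4 | 222 | 9 | 4 | 2 | 4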

First you must aggregate inside TAB2 and then join (with INNER joins).
Also you need SUM() window function for the column Total Appl per company:
SELECT t1.JOB_ID, t1.COMPANY_ID,
t2.total_clicks, t2.total_apps,
COUNT(DISTINCT c.SECOND_LEVEL) count_second_level,
SUM(t2.total_apps) OVER (PARTITION BY t1.COMPANY_ID) [Total Appl per company]
FROM TAB1 t1
INNER JOIN SOURCE s ON s.ID = t1.SOURCE_ID
INNER JOIN TAB1_CATEGORY tc ON t1.JOB_ID = tc.JOB_ID
INNER JOIN CATEGORY c ON tc.CATEGORY_ID = c.ID
INNER JOIN (
SELECT JOB_ID, SUM(CLICKS) total_clicks, SUM(APPLICATIONS) total_apps
FROM TAB2
GROUP BY JOB_ID
) t2 ON t1.JOB_ID = t2.JOB_ID
WHERE c.FIRST_LEVEL IN ('ggg', 'sss') AND s.NAME ='mmm'
GROUP BY t1.JOB_ID, t1.COMPANY_ID, t2.total_clicks, t2.total_apps
See the demo.
Results:
> job_id | company_id | total_clicks | total_apps | count_second_level | Total Appl per company
> -----: | ---------: | -----------: | ---------: | -----------------: | ---------------------:
> 3 | 222 | 9 | 0 | 2 | 4
> 4 | 222 | 9 | 4 | 2 | 4
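The doubled sums in the question come from joining TAB2 to jobs that have two category rows each, so every TAB2 row is counted twice. Aggregating TAB2 first avoids that. The sketch below loads the sample data into an in-memory database with Python's `sqlite3` module and runs a query of the same shape. It assumes the bundled SQLite is version 3.25 or newer (required for window functions), and it uses a plain alias `total_appl_per_company` and an added ORDER BY for a stable row order; both are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TAB1 (job_id INTEGER, company_id INTEGER, source_id INTEGER);
INSERT INTO TAB1 VALUES (1,222,2),(2,222,1),(3,222,1),(4,222,1),(5,255,3);
CREATE TABLE TAB1_CATEGORY (job_id INTEGER, category_id INTEGER);
INSERT INTO TAB1_CATEGORY VALUES (1,31),(2,36),(3,33),(3,35),(4,32),(4,31),(5,34);
CREATE TABLE CATEGORY (id INTEGER, first_level TEXT, second_level TEXT);
INSERT INTO CATEGORY VALUES (30,'sss','aaa'),(31,'sss','aaa'),(32,'sss','bbb'),
  (33,'ggg','ccc'),(34,'ggg','ddd'),(35,'ggg','eee'),(36,'hhh','fff');
CREATE TABLE SOURCE (id INTEGER, name TEXT);
INSERT INTO SOURCE VALUES (1,'mmm'),(2,'nnn'),(3,'ooo');
CREATE TABLE TAB2 (job_id INTEGER, log_date TEXT, clicks INTEGER, applications INTEGER);
INSERT INTO TAB2 VALUES (1,'01-01-1999',6,2),(1,'02-01-1999',7,3),(1,'03-01-1999',9,1),
  (2,'02-01-1999',4,1),(2,'05-01-1999',8,2),(3,'03-01-1999',9,0),
  (4,'05-01-1999',5,3),(4,'06-01-1999',4,1),(5,'01-01-1999',1,0),(5,'03-01-1999',3,1);
""")

# TAB2 is aggregated per job before joining, so the two category rows per job
# no longer multiply the click and application sums.
result = conn.execute("""
SELECT t1.job_id, t1.company_id, t2.total_clicks, t2.total_apps,
       COUNT(DISTINCT c.second_level) AS count_second_level,
       SUM(t2.total_apps) OVER (PARTITION BY t1.company_id) AS total_appl_per_company
FROM TAB1 t1
INNER JOIN SOURCE s ON s.id = t1.source_id
INNER JOIN TAB1_CATEGORY tc ON t1.job_id = tc.job_id
INNER JOIN CATEGORY c ON tc.category_id = c.id
INNER JOIN (
    SELECT job_id, SUM(clicks) AS total_clicks, SUM(applications) AS total_apps
    FROM TAB2
    GROUP BY job_id
) t2 ON t1.job_id = t2.job_id
WHERE c.first_level IN ('ggg', 'sss') AND s.name = 'mmm'
GROUP BY t1.job_id, t1.company_id, t2.total_clicks, t2.total_apps
ORDER BY t1.job_id
""").fetchall()

for row in result:
    print(row)
```

The window function runs after GROUP BY, so it sums one `total_apps` value per job within each company.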
