Hi everyone!
Do you know how I can create a recursive query using KQL in Application Insights?
Just to give you some context: as you know, there is currently a hierarchical relationship between the requests and dependencies tables in Application Insights, via the id and operation_ParentId columns:
-> (system A) request id=req_1, parent_id=dep_1
-> (system B) dependency id=dep_2, parent_id=req_1
-> (system C) request id=req_3, parent_id=dep_2
I'm trying to build a tree view in my Workbook to get a better distributed-tracing visualization and, consequently, to know what happened in a specific request.
Do you know if there's something I can use to achieve that goal?
Here is a quick example of how you can work with hierarchical data using multiple joins.
Note that you must assume a maximum depth for the tree and match the number of joins to that assumption; in the following example I assumed there are no more than 8 hierarchy levels (the root column plus 7 joins).
In the following example we traverse from the root down to the leaves.
The result set contains one record for every leaf in the hierarchical tree, i.e., one record per root-to-leaf path.
let t = datatable(id:int, pid:int)
[
1 ,-1
,2 ,1
,3 ,1
,4 ,2
,5 ,2
,6 ,2
,7 ,3
,8 ,7
,9 ,8
,10 ,8
];
t
| where pid == -1
| join kind=leftouter t on $left.id == $right.pid
| join kind=leftouter t on $left.id1 == $right.pid
| join kind=leftouter t on $left.id2 == $right.pid
| join kind=leftouter t on $left.id3 == $right.pid
| join kind=leftouter t on $left.id4 == $right.pid
| join kind=leftouter t on $left.id5 == $right.pid
| join kind=leftouter t on $left.id6 == $right.pid
| project-away pid*
| id | id1 | id2 | id3 | id4 | id5 | id6 | id7 |
|----|-----|-----|-----|-----|-----|-----|-----|
| 1  | 3   | 7   | 8   | 9   |     |     |     |
| 1  | 3   | 7   | 8   | 10  |     |     |     |
| 1  | 2   | 6   |     |     |     |     |     |
| 1  | 2   | 5   |     |     |     |     |     |
| 1  | 2   | 4   |     |     |     |     |     |
In the following example we traverse from each node (of any kind), up to the root.
let t = datatable(id:int, pid:int)
[
1 ,-1
,2 ,1
,3 ,1
,4 ,2
,5 ,2
,6 ,2
,7 ,3
,8 ,7
,9 ,8
,10 ,8
];
t
| join kind=leftouter t on $left.pid == $right.id
| join kind=leftouter t on $left.pid1 == $right.id
| join kind=leftouter t on $left.pid2 == $right.id
| join kind=leftouter t on $left.pid3 == $right.id
| join kind=leftouter t on $left.pid4 == $right.id
| join kind=leftouter t on $left.pid5 == $right.id
| join kind=leftouter t on $left.pid6 == $right.id
| project-away pid*
| id | id1 | id2 | id3 | id4 | id5 | id6 | id7 |
|----|-----|-----|-----|-----|-----|-----|-----|
| 10 | 8   | 7   | 3   | 1   |     |     |     |
| 9  | 8   | 7   | 3   | 1   |     |     |     |
| 6  | 2   | 1   |     |     |     |     |     |
| 5  | 2   | 1   |     |     |     |     |     |
| 4  | 2   | 1   |     |     |     |     |     |
| 7  | 3   | 1   |     |     |     |     |     |
| 1  |     |     |     |     |     |     |     |
| 3  | 1   |     |     |     |     |     |     |
| 2  | 1   |     |     |     |     |     |     |
| 8  | 7   | 3   | 1   |     |     |     |     |
For example, suppose I have the following master table, where the rows with ID 1 and 2 have empty values in the X column:
| ID | DateTime            | IngestionTime       | X  | Y  | Z  |
|----|---------------------|---------------------|----|----|----|
| 1  | 2012-12-28T12:04:00 | 2012-12-28T12:04:00 | 12 | 11 | 10 |
| 2  | 2012-12-28T12:06:00 | 2012-12-28T12:06:00 | 2  | 9  | 7  |
| 3  | 2012-12-29T12:11:00 | 2012-12-29T12:11:00 | 2  | 9  | 7  |
| 1  | 2012-12-29T12:15:00 | 2012-12-29T12:15:00 |    | 33 | 7  |
| 2  | 2012-12-29T12:24:00 | 2012-12-29T12:24:00 |    | 9  | 7  |
I have a function demo(fromTime:datetime, toTime:datetime), and I'm querying from fromTime 2012-12-29T12:11:00 to a toTime on that same 29th of December.
If there are any empty values in the result, I need to fill them from the previous date in the respective column.
In other words, I need the X value filled with the last known value for the same ID from the master table:
| ID | DateTime            | IngestionTime       | X                        | Y  | Z  |
|----|---------------------|---------------------|--------------------------|----|----|
| 1  | 2012-12-28T12:04:00 | 2012-12-28T12:04:00 | 12                       | 11 | 10 |
| 2  | 2012-12-28T12:06:00 | 2012-12-28T12:06:00 | 2                        | 9  | 7  |
| 3  | 2012-12-29T12:11:00 | 2012-12-29T12:11:00 | 2                        | 9  | 7  |
| 1  | 2012-12-29T12:15:00 | 2012-12-29T12:15:00 | lastknownvalueforthisID? | 33 | 7  |
| 2  | 2012-12-29T12:24:00 | 2012-12-29T12:24:00 | lastknownvalueforthisID? | 9  | 7  |
datatable(ID:int, DateTime:datetime, IngestionTime:datetime, X:int, Y:int, Z:int)
[
1 ,datetime(2012-12-28T12:04:00) ,datetime(2012-12-28T12:04:00) ,12 ,11 ,10
,2 ,datetime(2012-12-28T12:06:00) ,datetime(2012-12-28T12:06:00) ,2 ,9 ,7
,3 ,datetime(2012-12-29T12:11:00) ,datetime(2012-12-29T12:11:00) ,2 ,9 ,7
,1 ,datetime(2012-12-29T12:15:00) ,datetime(2012-12-29T12:15:00) ,int(null) ,33 ,7
,2 ,datetime(2012-12-29T12:24:00) ,datetime(2012-12-29T12:24:00) ,int(null) ,9 ,7
]
| partition hint.strategy=native by ID
(
order by DateTime asc
| scan with (step s: true => X = coalesce(X, s.X);)
)
| ID | DateTime             | IngestionTime        | X  | Y  | Z  |
|----|----------------------|----------------------|----|----|----|
| 1  | 2012-12-28T12:04:00Z | 2012-12-28T12:04:00Z | 12 | 11 | 10 |
| 1  | 2012-12-29T12:15:00Z | 2012-12-29T12:15:00Z | 12 | 33 | 7  |
| 3  | 2012-12-29T12:11:00Z | 2012-12-29T12:11:00Z | 2  | 9  | 7  |
| 2  | 2012-12-28T12:06:00Z | 2012-12-28T12:06:00Z | 2  | 9  | 7  |
| 2  | 2012-12-29T12:24:00Z | 2012-12-29T12:24:00Z | 2  | 9  | 7  |
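The scan-based forward fill can be cross-checked with a small plain-Python sketch of the same logic (row data copied from the datatable above; only ID, DateTime, and X are kept for brevity, and the variable names are my own):

```python
# Rows mirroring the datatable; None marks the empty X cells.
rows = [
    {"ID": 1, "DateTime": "2012-12-28T12:04:00", "X": 12},
    {"ID": 2, "DateTime": "2012-12-28T12:06:00", "X": 2},
    {"ID": 3, "DateTime": "2012-12-29T12:11:00", "X": 2},
    {"ID": 1, "DateTime": "2012-12-29T12:15:00", "X": None},
    {"ID": 2, "DateTime": "2012-12-29T12:24:00", "X": None},
]

# Walk rows in time order, remembering the last non-null X per ID
# (equivalent to `partition by ID (order by DateTime asc | scan ...)`).
last_seen = {}
for row in sorted(rows, key=lambda r: r["DateTime"]):
    if row["X"] is None:
        row["X"] = last_seen.get(row["ID"])   # fill from previous value
    else:
        last_seen[row["ID"]] = row["X"]
```

Unlike the lookup-based variant below, this fills gaps anywhere in the sequence, not just trailing ones.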
If the gaps are always at the end, you can use the following query.
let t = datatable(ID:int, DateTime:datetime, IngestionTime:datetime, X:int, Y:int, Z:int)
[
1 ,datetime(2012-12-28T12:04:00) ,datetime(2012-12-28T12:04:00) ,12 ,11 ,10
,2 ,datetime(2012-12-28T12:06:00) ,datetime(2012-12-28T12:06:00) ,2 ,9 ,7
,3 ,datetime(2012-12-29T12:11:00) ,datetime(2012-12-29T12:11:00) ,2 ,9 ,7
,1 ,datetime(2012-12-29T12:15:00) ,datetime(2012-12-29T12:15:00) ,int(null) ,33 ,7
,2 ,datetime(2012-12-29T12:24:00) ,datetime(2012-12-29T12:24:00) ,int(null) ,9 ,7
];
let last_notnull_X_values =
t
| where isnotnull(X)
| summarize arg_max(DateTime, X) by ID
| project ID, new_X = X;
t
| lookup last_notnull_X_values on ID
| extend X = coalesce(X, new_X)
| project-away new_X
| ID | DateTime             | IngestionTime        | X  | Y  | Z  |
|----|----------------------|----------------------|----|----|----|
| 1  | 2012-12-28T12:04:00Z | 2012-12-28T12:04:00Z | 12 | 11 | 10 |
| 2  | 2012-12-28T12:06:00Z | 2012-12-28T12:06:00Z | 2  | 9  | 7  |
| 3  | 2012-12-29T12:11:00Z | 2012-12-29T12:11:00Z | 2  | 9  | 7  |
| 1  | 2012-12-29T12:15:00Z | 2012-12-29T12:15:00Z | 12 | 33 | 7  |
| 2  | 2012-12-29T12:24:00Z | 2012-12-29T12:24:00Z | 2  | 9  | 7  |
I have the following table :
| Group | UserId | count_ |
|-------|--------|--------|
| 1     | 2      | 2      |
| 1     | 1      | 3      |
| 2     | 3      | 3      |
| 2     | 4      | 7      |
I want to run a sum() over partition by group in order to calculate the total requests for every group and add a percentage column for every user in the group.
The expected output :
| Group | UserId | percent |
|-------|--------|---------|
| 1     | 2      | 0.4     |
| 1     | 1      | 0.6     |
| 2     | 3      | 0.3     |
| 2     | 4      | 0.7     |
In SQL I would do something like the following:
select group, user, count_/sum(count_) over (partition by group) from table
How can I get this output?
At least at this point, a join is needed (similar to a SQL solution without the use of window functions):
let t = datatable(Group:int, UserId:int, count:int)
[
1 ,2 ,2
,1 ,1 ,3
,2 ,3 ,3
,2 ,4 ,7
];
t
| summarize sum(['count']) by Group
| join kind=inner t on Group
| project Group, UserId, percent = 1.0*['count']/sum_count
| Group | UserId | percent |
|-------|--------|---------|
| 1     | 2      | 0.4     |
| 1     | 1      | 0.6     |
| 2     | 3      | 0.3     |
| 2     | 4      | 0.7     |
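As a sanity check, the same sum-over-partition arithmetic can be sketched in plain Python (the tuples mirror the Group/UserId/count columns of the datatable above; variable names are my own):

```python
# (Group, UserId, count) rows from the question.
rows = [(1, 2, 2), (1, 1, 3), (2, 3, 3), (2, 4, 7)]

# First pass: total count per Group (the `summarize sum(count) by Group` step).
totals = {}
for group, _, cnt in rows:
    totals[group] = totals.get(group, 0) + cnt

# Second pass: each user's share of the group's total (the join + project step).
percents = [(g, u, c / totals[g]) for g, u, c in rows]
```

Two passes over the data correspond exactly to the summarize-then-join shape of the KQL query.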
I have a table like this:
user_id | subscription_id
-------------------------
1 | 1
1 | 2
2 | 3
2 | 4
3 | 1
3 | 2
4 | 3
5 | 3
What I want to do is count how many users have similar subscriptions:
user_id | same_subscriptions
----------------------------
1 | 1
2 | 0
3 | 1
4 | 1
5 | 1
Is this even possible? How can I achieve this...
Best I managed to do is get a table like this with group_concat:
user_id | subscriptions
-----------------------
1 | 1,2
2 | 3,4
3 | 1,2
4 | 3
5 | 3
This is how I achieved it:
SELECT A.user_id, group_concat(B.subscription_id)
FROM Subscriptions A LEFT JOIN Subscriptions B ON
A.user_id=B.user_id GROUP BY A.user_id;
The aggregate function GROUP_CONCAT() does not help in this case, because in SQLite it does not support an ORDER BY clause, which would be needed to produce a canonical, safely comparable list.
But you can use the GROUP_CONCAT() window function instead:
SELECT user_id,
COUNT(*) OVER (PARTITION BY subs) - 1 same_subscriptions
FROM (
SELECT user_id,
GROUP_CONCAT(subscription_id) OVER (PARTITION BY user_id ORDER BY subscription_id) subs,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY subscription_id DESC) rn
FROM Subscriptions
)
WHERE rn = 1
ORDER BY user_id
Results:
> user_id | same_subscriptions
> ------: | -----------------:
> 1 | 1
> 2 | 0
> 3 | 1
> 4 | 1
> 5 | 1
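If you want to verify this locally, here is a self-contained Python reproduction using the standard-library sqlite3 module (window functions require SQLite 3.25 or later; the table and column names are taken from the question):

```python
import sqlite3

# In-memory copy of the Subscriptions table from the question.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Subscriptions (user_id INTEGER, subscription_id INTEGER);
    INSERT INTO Subscriptions VALUES
        (1,1),(1,2),(2,3),(2,4),(3,1),(3,2),(4,3),(5,3);
""")

# Same query as above: build an ordered subscription list per user,
# keep one row per user (rn = 1), then count users sharing that list.
rows = con.execute("""
    SELECT user_id,
           COUNT(*) OVER (PARTITION BY subs) - 1 AS same_subscriptions
    FROM (
        SELECT user_id,
               GROUP_CONCAT(subscription_id) OVER (
                   PARTITION BY user_id ORDER BY subscription_id) AS subs,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY subscription_id DESC) AS rn
        FROM Subscriptions
    )
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()
```

The `rn = 1` filter keeps, for each user, the row whose running GROUP_CONCAT() frame has accumulated the full subscription list.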
I have a table which looks like this:
id timestamp value1 value2
1 09:12:37 1 1
1 09:12:42 1 2
1 09:12:41 1 3
1 10:52:16 2 4
1 10:52:18 2 5
2 09:33:12 3 1
2 09:33:15 3 2
2 09:33:13 3 3
I need to group by id and value1. For each group I want the row with the highest timestamp.
The result for the table above would look like this:
id timestamp value1 value2
1 09:12:42 1 2
2 09:33:15 3 2
I know there is the summarize operator which would give me this:
mytable
| project id, timestamp, value1, value2
| summarize max(timestamp) by id, value1
Result:
id timestamp value1
1 09:12:42 1
2 09:33:15 3
But I was not able to get value2 for these rows as well.
Thanks in advance!
If I understand your question correctly, you should be able to use summarize arg_max():
doc: https://learn.microsoft.com/en-us/azure/kusto/query/arg-max-aggfunction
datatable(id:long, timestamp:datetime, value1:long, value2:long)
[
1, datetime(2019-03-20 09:12:37), 1, 1,
1, datetime(2019-03-20 09:12:42), 1, 2,
1, datetime(2019-03-20 09:12:41), 1, 3,
1, datetime(2019-03-20 10:52:16), 2, 4,
1, datetime(2019-03-20 10:52:18), 2, 5, // this has the latest timestamp for id == 1
2, datetime(2019-03-20 09:33:12), 3, 1,
2, datetime(2019-03-20 09:33:15), 3, 2, // this has the latest timestamp for id == 2
2, datetime(2019-03-20 09:33:13), 3, 3,
]
| summarize arg_max(timestamp, *) by id
This will result in:
| id | timestamp | value1 | value2 |
|----|-----------------------------|--------|--------|
| 2 | 2019-03-20 09:33:15.0000000 | 3 | 2 |
| 1 | 2019-03-20 10:52:18.0000000 | 2 | 5 |
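The semantics of arg_max(timestamp, *) by id (keep, per id, the entire row carrying the maximum timestamp) can be mimicked in plain Python as a sanity check; the tuples below mirror the id/timestamp/value1/value2 columns, with timestamps kept as same-format strings so they compare correctly:

```python
# (id, timestamp, value1, value2) rows from the question.
rows = [
    (1, "09:12:37", 1, 1),
    (1, "09:12:42", 1, 2),
    (1, "09:12:41", 1, 3),
    (1, "10:52:16", 2, 4),
    (1, "10:52:18", 2, 5),
    (2, "09:33:12", 3, 1),
    (2, "09:33:15", 3, 2),
    (2, "09:33:13", 3, 3),
]

# Keep the whole row with the maximum timestamp per id.
latest = {}
for row in rows:
    rid, ts = row[0], row[1]
    if rid not in latest or ts > latest[rid][1]:
        latest[rid] = row
```

This is the key difference from a plain `summarize max(timestamp) by id`: arg_max carries all remaining columns of the winning row along, instead of dropping them.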
I found a solution to my problem, but there might be a better one.
mytable
| project id, timestamp, value1, value2
| order by timestamp desc
| summarize max(timestamp), make_list(value2) by id, value1
Results in:
id timestamp value1 list_value2
1 09:12:42 1 ["2", "3", "1"]
2 09:33:15 3 ["2", "3", "1"]
Now you can extend the query by adding
| project max_timestamp, id, value1, list_value2[0]
to get the first element from that list. Replace '0' with any number between 0 and array_length(list_value2)-1 to access the other values.
One more piece of advice:
The timestamp I use is the one generated by Application Insights. In our code we call TrackTrace to log some data. If you order the rows by this timestamp, the resulting list of rows is not guaranteed to be in the same order in which the data was produced in code.
I ran a study in Qualtrics with 4 conditions. I'm only including 3 in the example below for ease. The resulting data looks something like this:
condition Q145 Q243 Q34 Q235 Q193 Q234 Q324 Q987 Q88
condition How a? How b? How c? How a? How b? How c? How a? How b? How c?
1 3 5 2
1 5 4 7
1 3 1 4
2 3 4 7
2 1 2 8
2 1 3 9
3 7 6 5
3 8 1 3
3 9 2 2
The questions in the 2nd row are longer and more complex in the actual dataset, but they are consistent across conditions. In this sample, I've tried to capture the consistency and the fact that the default variable names (all starting with Q) do not match up.
Ultimately, I would like a dataframe that looks like the following. I would like to consolidate all the responses to a single question into one column per question. (Then I will go in and rename the lengthy questions with more concise variable names and "tidy" the data.)
condition How a? How b? How c?
1 3 5 2
1 5 4 7
1 3 1 4
2 3 4 7
2 1 2 8
2 1 3 9
3 7 6 5
3 8 1 3
3 9 2 2
I'd appreciate any ideas for how to accomplish this.
library(tidyverse)
file = 'condition,Q145 ,Q243 ,Q34 ,Q235 ,Q193 ,Q234 ,Q324 ,Q987 ,Q88
condition,How a?,How b?,How c?,How a?,How b?,How c?,How a?,How b?,How c?
1 ,3 ,5 ,2 , , , , , ,
1 ,5 ,4 ,7 , , , , , ,
1 ,3 ,1 ,4 , , , , , ,
2 , , , ,3 ,4 ,7 , , ,
2 , , , ,1 ,2 ,8 , , ,
2 , , , ,1 ,3 ,9 , , ,
3 , , , , , , , 7 , 6 , 5
3 , , , , , , , 8 , 1 , 3
3 , , , , , , , 9 , 2 , 2'
# Read in just the data without the weird header situation
data <- read_csv(file, col_names = FALSE, skip = 2)
# Pull out the questions row and reshape into a dataframe to make the next part easy
questions <- gather(read_csv(file, col_names = FALSE, skip = 1, n_max = 1))
# Generate list of data frames (one df for each question)
split(questions, questions$value) %>%
# Then coalesce the columns
map_df(~do.call(coalesce, data[, .x$key]))
Gives the following result:
# A tibble: 9 x 4
condition `How a?` `How b?` `How c?`
<int> <int> <int> <int>
1 1 3 5 2
2 1 5 4 7
3 1 3 1 4
4 2 3 4 7
5 2 1 2 8
6 2 1 3 9
7 3 7 6 5
8 3 8 1 3
9 3 9 2 2
Of course, if you intend to move to long format eventually, you might just do something like this:
data %>%
gather(key, answer, -X1) %>%
filter(!is.na(answer)) %>%
left_join(questions, by = 'key') %>%
select(condition = X1, question = value, answer)
Resulting in the following:
# A tibble: 27 x 3
condition question answer
<int> <chr> <int>
1 1 How a? 3
2 1 How a? 5
3 1 How a? 3
4 1 How b? 5
5 1 How b? 4
6 1 How b? 1
7 1 How c? 2
8 1 How c? 7
9 1 How c? 4
10 2 How a? 3
# ... with 17 more rows