kql window query sum over partition - azure-data-explorer

I have the following table:

Group  UserId  count_
1      2       2
1      1       3
2      3       3
2      4       7
I want to run a sum() over partition by group in order to calculate the total requests for every group and add a percentage column for every user in the group.
The expected output:

Group  UserId  percent
1      2       0.4
1      1       0.6
2      3       0.3
2      4       0.7
In SQL I would do something like the following:
select group, user, count_/sum(count_) over(partition by group) from table
How can I get this output?

At least at this point, a JOIN is needed (similarly to a SQL solution that does not use window functions).
let t = datatable(Group:int, UserId:int, count:int)
[
1 ,2 ,2
,1 ,1 ,3
,2 ,3 ,3
,2 ,4 ,7
];
t
| summarize sum(['count']) by Group
| join kind=inner t on Group
| project Group, UserId, percent = 1.0*['count']/sum_count
Group  UserId  percent
1      2       0.4
1      1       0.6
2      3       0.3
2      4       0.7
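If you would rather avoid the explicit join, one possible alternative sketch is to pack each user's row into a dynamic bag while aggregating and then expand it back out; make_list(), bag_pack() (pack() on older clusters), and mv-expand are standard KQL operators, and the table and column names are the same ones used above:
t
| summarize users = make_list(bag_pack("UserId", UserId, "count", ['count'])), total = sum(['count']) by Group
| mv-expand users
| project Group, UserId = toint(users.UserId), percent = todouble(users['count']) / total
This should return the same four rows as the join-based query.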

Related

How can I create a recursive query using KQL?

Hi everyone!
Do you know how I can create a recursive query using KQL in Application Insights?
Just to give you some context: as you know, there is currently a hierarchical relationship between the requests and dependencies tables in Application Insights via the id and operation_ParentId columns:
-> (system A) request id=req_1, parent_id=dep_1
  -> (system B) dependency id=dep_2, parent_id=req_1
    -> (system C) request id=req_3, parent_id=dep_2
I'm trying to build a tree view in my Workbook to get a better distributed tracing visualization and, consequently, understand what happened in a specific request.
Do you know if there's something I can use to achieve that goal?
Here is a quick example of how you can work with hierarchical data using multiple JOINs.
Please note that you must assume the depth of the tree and match the number of JOINs to that assumption; e.g., in the following example I assumed there are no more than 7 hierarchy levels.
In the following example we traverse from the root to the leaves.
In the result data set, we get a record for every leaf in the hierarchical tree.
let t = datatable(id:int, pid:int)
[
1 ,-1
,2 ,1
,3 ,1
,4 ,2
,5 ,2
,6 ,2
,7 ,3
,8 ,7
,9 ,8
,10 ,8
];
t
| where pid == -1
| join kind=leftouter t on $left.id == $right.pid
| join kind=leftouter t on $left.id1 == $right.pid
| join kind=leftouter t on $left.id2 == $right.pid
| join kind=leftouter t on $left.id3 == $right.pid
| join kind=leftouter t on $left.id4 == $right.pid
| join kind=leftouter t on $left.id5 == $right.pid
| join kind=leftouter t on $left.id6 == $right.pid
| project-away pid*
id  id1  id2  id3  id4  id5  id6  id7
1   3    7    8    9
1   3    7    8    10
1   2    6
1   2    5
1   2    4
In the following example we traverse from each node (of any kind) up to the root.
let t = datatable(id:int, pid:int)
[
1 ,-1
,2 ,1
,3 ,1
,4 ,2
,5 ,2
,6 ,2
,7 ,3
,8 ,7
,9 ,8
,10 ,8
];
t
| join kind=leftouter t on $left.pid == $right.id
| join kind=leftouter t on $left.pid1 == $right.id
| join kind=leftouter t on $left.pid2 == $right.id
| join kind=leftouter t on $left.pid3 == $right.id
| join kind=leftouter t on $left.pid4 == $right.id
| join kind=leftouter t on $left.pid5 == $right.id
| join kind=leftouter t on $left.pid6 == $right.id
| project-away pid*
id  id1  id2  id3  id4  id5  id6  id7
10  8    7    3    1
9   8    7    3    1
6   2    1
5   2    1
4   2    1
7   3    1
1
3   1
2   1
8   7    3    1
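Applied to the Application Insights tables from the question, the same pattern might look like the sketch below. The requests and dependencies tables and their id / operation_ParentId columns are the ones described in the question; the root filter is an assumption (in some setups the root request's operation_ParentId equals its operation_Id rather than being empty), and you would add one join per additional level of depth you expect:
let nodes = union
    (requests     | project itemType, id, pid = operation_ParentId),
    (dependencies | project itemType, id, pid = operation_ParentId);
nodes
| where isempty(pid)                                    // assumed root condition
| join kind=leftouter nodes on $left.id == $right.pid   // level 2
| join kind=leftouter nodes on $left.id1 == $right.pid  // level 3
| project-away pid*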

Group by having clause in teradata

I have a Teradata table:

Group_categ  id
A            1
A            2
A            3
A            5
A            8
A            9
B            11
C            1
C            2
C            3
C            4

I need to filter it like this:

Group_categ  min_id  max_id
A            1       2
A            3       5
A            8       9
B            11      11
C            1       2
C            3       4
Seems you want to combine two consecutive rows into a single row:
SELECT Group_categ, Min(id), Max(id)
FROM
 (
   SELECT
      Group_categ, id,
      -- assign the same value to two consecutive rows: 0,0,1,1,2,2,..
      -- -> used in the outer GROUP BY
      (Row_Number() Over (PARTITION BY Group_categ ORDER BY id) - 1) / 2 AS grp
   FROM mytab
 ) AS dt
GROUP BY Group_categ, grp
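To see why this pairs up consecutive rows, here are the values the inner query computes for Group_categ A (integer division collapses each pair of row numbers to one grp value):
id  Row_Number  grp
1   1           0
2   2           0
3   3           1
5   4           1
8   5           2
9   6           2
Grouping by (Group_categ, grp) and taking Min(id) and Max(id) then yields the requested pairs (1,2), (3,5), and (8,9).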

Combining several columns based on matching text in R

I ran a study in Qualtrics with 4 conditions. I'm only including 3 in the example below for ease. The resulting data looks something like this:
condition  Q145    Q243    Q34     Q235    Q193    Q234    Q324    Q987    Q88
condition  How a?  How b?  How c?  How a?  How b?  How c?  How a?  How b?  How c?
1          3       5       2
1          5       4       7
1          3       1       4
2                                  3       4       7
2                                  1       2       8
2                                  1       3       9
3                                                          7       6       5
3                                                          8       1       3
3                                                          9       2       2
The questions in the 2nd row are longer and more complex in the actual dataset, but they are consistent across conditions. In this sample, I've tried to capture the consistency and the fact that the default variable names (all starting with Q) do not match up.
Ultimately, I would like a dataframe that looks like the following. I would like to consolidate all the responses to a single question into one column per question. (Then I will go in and rename the lengthy questions with more concise variable names and "tidy" the data.)
condition  How a?  How b?  How c?
1          3       5       2
1          5       4       7
1          3       1       4
2          3       4       7
2          1       2       8
2          1       3       9
3          7       6       5
3          8       1       3
3          9       2       2
I'd appreciate any ideas for how to accomplish this.
library(tidyverse)
file = 'condition,Q145 ,Q243 ,Q34 ,Q235 ,Q193 ,Q234 ,Q324 ,Q987 ,Q88
condition,How a?,How b?,How c?,How a?,How b?,How c?,How a?,How b?,How c?
1 ,3 ,5 ,2 , , , , , ,
1 ,5 ,4 ,7 , , , , , ,
1 ,3 ,1 ,4 , , , , , ,
2 , , , ,3 ,4 ,7 , , ,
2 , , , ,1 ,2 ,8 , , ,
2 , , , ,1 ,3 ,9 , , ,
3 , , , , , , , 7 , 6 , 5
3 , , , , , , , 8 , 1 , 3
3 , , , , , , , 9 , 2 , 2'
# Read in just the data without the weird header situation
data <- read_csv(file, col_names = FALSE, skip = 2)
# Pull out the questions row and reshape into a dataframe to make the next part easy
questions <- gather(read_csv(file, col_names = FALSE, skip = 1, n_max = 1))
# Generate list of data frames (one df for each question)
split(questions, questions$value) %>%
# Then coalesce the columns
map_df(~do.call(coalesce, data[, .x$key]))
Gives the following result:
# A tibble: 9 x 4
condition `How a?` `How b?` `How c?`
<int> <int> <int> <int>
1 1 3 5 2
2 1 5 4 7
3 1 3 1 4
4 2 3 4 7
5 2 1 2 8
6 2 1 3 9
7 3 7 6 5
8 3 8 1 3
9 3 9 2 2
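For the renaming step mentioned in the question, one possible follow-up (assuming you save the result of the pipeline above to a variable, say wide, and pick your own concise names) is:
wide %>%
  rename(how_a = `How a?`, how_b = `How b?`, how_c = `How c?`)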
Of course, if you intend to move to long format eventually, you might just do something like this:
data %>%
gather(key, answer, -X1) %>%
filter(!is.na(answer)) %>%
left_join(questions, by = 'key') %>%
select(condition = X1, question = value, answer)
Resulting in the following:
# A tibble: 27 x 3
condition question answer
<int> <chr> <int>
1 1 How a? 3
2 1 How a? 5
3 1 How a? 3
4 1 How b? 5
5 1 How b? 4
6 1 How b? 1
7 1 How c? 2
8 1 How c? 7
9 1 How c? 4
10 2 How a? 3
# ... with 17 more rows

Retrieve all rows with same minimum value for a column with sqldf

I have to retrieve IDs for employees who have completed the minimum number of jobs. There are multiple employees who have completed 1 job. My current sqldf query retrieves only 1 row of data, while there are multiple employee IDs who have completed just 1 job. Why does it stop at the first minimum value? And how do I fetch all rows with the minimum value in a column? Here is a data sample:
ID  TaskCount
1   74
2   53
3   10
4   5
5   1
6   1
7   1
The code I have used:
sqldf("select id, min(taskcount) as Jobscompleted
from (select id,count(id) as taskcount
from MyData
where id is not null
group by id order by id)")
Output is
ID leastcount
5 1
While what I want is all the rows with minimum jobs completed.
ID Jobscompleted
5 1
6 1
7 1
min(...) always returns one row in SQL, as do all SQL aggregate functions. Try this instead:
sqldf("select ID, TaskCount TasksCompleted from MyData
where TaskCount = (select min(TaskCount) from MyData)")
giving:
ID TasksCompleted
1 5 1
2 6 1
3 7 1
Note: The input in reproducible form is:
Lines <- "
ID TaskCount
1 74
2 53
3 10
4 5
5 1
6 1
7 1"
MyData <- read.table(text = Lines, header = TRUE)
As an alternative to sqldf, you could use data.table:
library(data.table)
dt <- data.table(ID=1:7, TaskCount=c(74, 53, 10, 5, 1, 1, 1))
dt[TaskCount==min(TaskCount)]
## ID TaskCount
## 1: 5 1
## 2: 6 1
## 3: 7 1

Combining Dataframes in R

I have two data frames, A and B, which both contain the same columns.
A = person, place
1 , 2
1 , 3
B = person, place
2 , 4
2 , 3
I want to combine both tables into one.
NewTable = person, place
1 , 2
1 , 3
2 , 4
2 , 3
Any ideas how?
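A minimal sketch, assuming both data frames really do share exactly the same column names: base R's rbind() (or dplyr::bind_rows()) stacks the rows of B under A.
A <- data.frame(person = c(1, 1), place = c(2, 3))
B <- data.frame(person = c(2, 2), place = c(4, 3))
# columns are matched by name, so the column order in A and B may differ
NewTable <- rbind(A, B)
NewTable
#   person place
# 1      1     2
# 2      1     3
# 3      2     4
# 4      2     3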
