Fill empty values with the last known value in Kusto KQL - azure-data-explorer

For example, I have one master table.
The latest rows for IDs 1 and 2 have empty values in the X column.
ID  DateTime             IngestionTime        X   Y   Z
1   2012-12-28T12:04:00  2012-12-28T12:04:00  12  11  10
2   2012-12-28T12:06:00  2012-12-28T12:06:00  2   9   7
3   2012-12-29T12:11:00  2012-12-29T12:11:00  2   9   7
1   2012-12-29T12:15:00  2012-12-29T12:15:00      33  7
2   2012-12-29T12:24:00  2012-12-29T12:24:00      9   7
I have a function demo(fromTime:datetime, toTime:datetime).
I query it with fromTime 2012-12-29T12:11:00 and a toTime on the same day (29 December).
If any values in that range are empty, I need to fill them with the previous value of the same column.
I need a filled X value for the same ID, taken from the master table:
ID  DateTime             IngestionTime        X                         Y   Z
1   2012-12-28T12:04:00  2012-12-28T12:04:00  12                        11  10
2   2012-12-28T12:06:00  2012-12-28T12:06:00  2                         9   7
3   2012-12-29T12:11:00  2012-12-29T12:11:00  2                         9   7
1   2012-12-29T12:15:00  2012-12-29T12:15:00  lastknownvalueforthisID?  33  7
2   2012-12-29T12:24:00  2012-12-29T12:24:00  lastknownvalueforthisID?  9   7

datatable(ID:int, DateTime:datetime, IngestionTime:datetime, X:int, Y:int, Z:int)
[
1 ,datetime(2012-12-28T12:04:00) ,datetime(2012-12-28T12:04:00) ,12 ,11 ,10
,2 ,datetime(2012-12-28T12:06:00) ,datetime(2012-12-28T12:06:00) ,2 ,9 ,7
,3 ,datetime(2012-12-29T12:11:00) ,datetime(2012-12-29T12:11:00) ,2 ,9 ,7
,1 ,datetime(2012-12-29T12:15:00) ,datetime(2012-12-29T12:15:00) ,int(null) ,33 ,7
,2 ,datetime(2012-12-29T12:24:00) ,datetime(2012-12-29T12:24:00) ,int(null) ,9 ,7
]
| partition hint.strategy=native by ID
(
order by DateTime asc
| scan with (step s: true => X = coalesce(X, s.X);)
)
ID  DateTime              IngestionTime         X   Y   Z
1   2012-12-28T12:04:00Z  2012-12-28T12:04:00Z  12  11  10
1   2012-12-29T12:15:00Z  2012-12-29T12:15:00Z  12  33  7
3   2012-12-29T12:11:00Z  2012-12-29T12:11:00Z  2   9   7
2   2012-12-28T12:06:00Z  2012-12-28T12:06:00Z  2   9   7
2   2012-12-29T12:24:00Z  2012-12-29T12:24:00Z  2   9   7

If the gaps are always at the end, you can use the following query.
let t = datatable(ID:int, DateTime:datetime, IngestionTime:datetime, X:int, Y:int, Z:int)
[
1 ,datetime(2012-12-28T12:04:00) ,datetime(2012-12-28T12:04:00) ,12 ,11 ,10
,2 ,datetime(2012-12-28T12:06:00) ,datetime(2012-12-28T12:06:00) ,2 ,9 ,7
,3 ,datetime(2012-12-29T12:11:00) ,datetime(2012-12-29T12:11:00) ,2 ,9 ,7
,1 ,datetime(2012-12-29T12:15:00) ,datetime(2012-12-29T12:15:00) ,int(null) ,33 ,7
,2 ,datetime(2012-12-29T12:24:00) ,datetime(2012-12-29T12:24:00) ,int(null) ,9 ,7
];
let last_notnull_X_values =
t
| where isnotnull(X)
| summarize arg_max(DateTime, X) by ID
| project ID, new_X = X;
t
| lookup last_notnull_X_values on ID
| extend X = coalesce(X, new_X)
| project-away new_X
ID  DateTime              IngestionTime         X   Y   Z
1   2012-12-28T12:04:00Z  2012-12-28T12:04:00Z  12  11  10
2   2012-12-28T12:06:00Z  2012-12-28T12:06:00Z  2   9   7
3   2012-12-29T12:11:00Z  2012-12-29T12:11:00Z  2   9   7
1   2012-12-29T12:15:00Z  2012-12-29T12:15:00Z  12  33  7
2   2012-12-29T12:24:00Z  2012-12-29T12:24:00Z  2   9   7

Related

How can I create a recursive query using KQL?

Do you know how I can create a recursive query using KQL in Application Insights?
To give you some context: there is currently a hierarchical relationship between the requests and dependencies tables in Application Insights, through the id and operation_ParentId columns:
->(system A) request id=req_1, parent_id=dep_1
-> (system B) dependency id=dep_2, parent_id=req_1
->(system C) request id=req_3, parent_id=dep_2
I'm trying to build a tree view in my Workbook to have a better distributed tracing visualization and consequently know what happened in a specific request.
Do you know if there's something I can use to achieve that goal?
Here is a quick example of how you can work with hierarchical data using multiple JOINs.
Please note that you must assume the depth of the tree and match the number of JOINs to that assumption; e.g., in the following example I assumed there are no more than 7 hierarchy levels.
In the following example we traverse from the root to the leaves.
In the result data set, we get a record of every leaf in the hierarchical tree.
let t = datatable(id:int, pid:int)
[
1 ,-1
,2 ,1
,3 ,1
,4 ,2
,5 ,2
,6 ,2
,7 ,3
,8 ,7
,9 ,8
,10 ,8
];
t
| where pid == -1
| join kind=leftouter t on $left.id == $right.pid
| join kind=leftouter t on $left.id1 == $right.pid
| join kind=leftouter t on $left.id2 == $right.pid
| join kind=leftouter t on $left.id3 == $right.pid
| join kind=leftouter t on $left.id4 == $right.pid
| join kind=leftouter t on $left.id5 == $right.pid
| join kind=leftouter t on $left.id6 == $right.pid
| project-away pid*
id  id1  id2  id3  id4  id5  id6  id7
1   3    7    8    9
1   3    7    8    10
1   2    6
1   2    5
1   2    4
In the following example we traverse from each node (of any kind), up to the root.
let t = datatable(id:int, pid:int)
[
1 ,-1
,2 ,1
,3 ,1
,4 ,2
,5 ,2
,6 ,2
,7 ,3
,8 ,7
,9 ,8
,10 ,8
];
t
| join kind=leftouter t on $left.pid == $right.id
| join kind=leftouter t on $left.pid1 == $right.id
| join kind=leftouter t on $left.pid2 == $right.id
| join kind=leftouter t on $left.pid3 == $right.id
| join kind=leftouter t on $left.pid4 == $right.id
| join kind=leftouter t on $left.pid5 == $right.id
| join kind=leftouter t on $left.pid6 == $right.id
| project-away pid*
id  id1  id2  id3  id4  id5  id6  id7
10  8    7    3    1
9   8    7    3    1
6   2    1
5   2    1
4   2    1
7   3    1
1
3   1
2   1
8   7    3    1

kql window query sum over partition

I have the following table:
Group  UserId  count_
1      2       2
1      1       3
2      3       3
2      4       7
I want to run a sum() over partition by group in order to calculate the total requests for every group and add a percentage column for every user in the group.
The expected output:
Group  UserId  percent
1      2       0.4
1      1       0.6
2      3       0.3
2      4       0.7
In SQL I would do something like the following:
select group,user,count_/sum(count_) over(partition by group) from table
How can I get this output?
At least at this point, a JOIN is needed (similar to a SQL solution that does not use window functions).
let t = datatable(Group:int, UserId:int, count:int)
[
1 ,2 ,2
,1 ,1 ,3
,2 ,3 ,3
,2 ,4 ,7
];
t
| summarize sum(['count']) by Group
| join kind=inner t on Group
| project Group, UserId, percent = 1.0*['count']/sum_count
Group  UserId  percent
1      2       0.4
1      1       0.6
2      3       0.3
2      4       0.7

R Creating new columns using vector contains name of variables

I have a data frame and a vector containing the names of some of its variables. I want to create a new variable holding the row sums of the variables named in the vector, and I want the name of the new variable to be the concatenation of those variable names.
For example, I have this data:
> data
Name A B C D E
r1 1 5 12 21 15
r2 2 4 7 10 9
r3 5 15 6 9 6
r4 7 8 0 7 18
and this vector
>Vec
"A" , "C" , "D"
The result I want is the sum of variables A, C and D, in a new variable named ACD.
Here is the result I want:
> data
Name A B C D ACD E
r1 1 5 12 21 34 15
r2 2 4 7 10 19 9
r3 5 15 6 9 20 6
r4 7 8 0 7 14 18
I tried this:
data <- cbind(data , as.data.frame(rowSums(data[,Vec]) ))
But I don't know how to set the name.
Here is the result I got:
> data
Name A B C D E rowSums(data[,Vec])
r1 1 5 12 21 15 34
r2 2 4 7 10 9 19
r3 5 15 6 9 6 20
r4 7 8 0 7 18 14
Note that this is just a small example to explain what I want to do.
I want to assign the new data (containing the new variable) back to my old data, as in the command above.
Edit 1: in my real program I don't know the elements of the vector (the variable names) in advance, so I cannot do data$ACD <- cbind(data , as.data.frame(rowSums(data[,Vec]) )) as suggested by Pax. I have a for loop that generates the vectors, and each time I create a variable to hold the result (the sum of the variables in the vector), so I don't know how to assign the name without knowing the elements of the vector.
Please tell me if you need any more clarification or information.
Thank you
It's not a one line solution but you can set the name on the subsequent line:
data <- data.frame(A = c(1, 2, 5, 7),
B = c(5, 4, 15, 8),
C = c(12, 7, 6, 0),
D = c(21, 10, 9, 7),
E = c(15, 9, 6, 18))
Vec <- c("A" , "C" , "D")
data <- cbind(data, rowSums(data[,Vec]))
# Add name
names(data)[ncol(data)] <- paste(Vec, collapse="")
# A B C D E ACD
# 1 1 5 12 21 15 34
# 2 2 4 7 10 9 19
# 3 5 15 6 9 6 20
# 4 7 8 0 7 18 14
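Since the question's edit mentions a loop that generates several such vectors, the two steps above could be wrapped in a small helper. The following is only a minimal base-R sketch, not part of the original answers: add_sum_column and vec_list are hypothetical names introduced for illustration, and vec_list stands in for whatever vectors the asker's loop produces.

```r
# Hypothetical helper (not from the original answers): adds one column holding
# the row sums of the columns named in `vars`; the new column's name is the
# concatenation of the names in `vars`.
add_sum_column <- function(df, vars) {
  new_name <- paste(vars, collapse = "")
  # drop = FALSE keeps a data frame even when `vars` has a single element
  df[[new_name]] <- rowSums(df[, vars, drop = FALSE])
  df
}

data <- data.frame(A = c(1, 2, 5, 7),
                   B = c(5, 4, 15, 8),
                   C = c(12, 7, 6, 0),
                   D = c(21, 10, 9, 7),
                   E = c(15, 9, 6, 18))

# Assumed stand-in for the vectors generated inside the asker's loop
vec_list <- list(c("A", "C", "D"), c("B", "E"))

for (vars in vec_list) {
  data <- add_sum_column(data, vars)
}

print(data)
```

Because the name is built from the vector itself, the loop never needs to know the variable names in advance.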
Here is an option with the janitor package. You can use adorn_totals, which appends a totals row or column to a data.frame. Here the name argument sets the name of the new column, and the all_of(Vec) at the end selects the columns to total.
library(janitor)
adorn_totals(data, "col", fill = NA, na.rm = TRUE, name = paste(Vec, collapse = ""), all_of(Vec))
Output
A B C D E ACD
1 5 12 21 15 34
2 4 7 10 9 19
5 15 6 9 6 20
7 8 0 7 18 14

Combining several columns based on matching text in R

I ran a study in Qualtrics with 4 conditions. I'm only including 3 in the example below for ease. The resulting data looks something like this:
condition  Q145    Q243    Q34     Q235    Q193    Q234    Q324    Q987    Q88
condition  How a?  How b?  How c?  How a?  How b?  How c?  How a?  How b?  How c?
1          3       5       2
1          5       4       7
1          3       1       4
2                                  3       4       7
2                                  1       2       8
2                                  1       3       9
3                                                          7       6       5
3                                                          8       1       3
3                                                          9       2       2
The questions in the 2nd row are longer and more complex in the actual dataset, but they are consistent across conditions. In this sample, I've tried to capture the consistency and the fact that the default variable names (all starting with Q) do not match up.
Ultimately, I would like a dataframe that looks like the following. I would like to consolidate all the responses to a single question into one column per question. (Then I will go in and rename the lengthy questions with more concise variable names and "tidy" the data.)
condition How a? How b? How c?
1 3 5 2
1 5 4 7
1 3 1 4
2 3 4 7
2 1 2 8
2 1 3 9
3 7 6 5
3 8 1 3
3 9 2 2
I'd appreciate any ideas for how to accomplish this.
library(tidyverse)
file = 'condition,Q145 ,Q243 ,Q34 ,Q235 ,Q193 ,Q234 ,Q324 ,Q987 ,Q88
condition,How a?,How b?,How c?,How a?,How b?,How c?,How a?,How b?,How c?
1 ,3 ,5 ,2 , , , , , ,
1 ,5 ,4 ,7 , , , , , ,
1 ,3 ,1 ,4 , , , , , ,
2 , , , ,3 ,4 ,7 , , ,
2 , , , ,1 ,2 ,8 , , ,
2 , , , ,1 ,3 ,9 , , ,
3 , , , , , , , 7 , 6 , 5
3 , , , , , , , 8 , 1 , 3
3 , , , , , , , 9 , 2 , 2'
# Read in just the data without the weird header situation
data <- read_csv(file, col_names = FALSE, skip = 2)
# Pull out the questions row and reshape into a dataframe to make the next part easy
questions <- gather(read_csv(file, col_names = FALSE, skip = 1, n_max = 1))
# Generate list of data frames (one df for each question)
split(questions, questions$value) %>%
# Then coalesce the columns
map_df(~do.call(coalesce, data[, .x$key]))
Gives the following result:
# A tibble: 9 x 4
condition `How a?` `How b?` `How c?`
<int> <int> <int> <int>
1 1 3 5 2
2 1 5 4 7
3 1 3 1 4
4 2 3 4 7
5 2 1 2 8
6 2 1 3 9
7 3 7 6 5
8 3 8 1 3
9 3 9 2 2
Of course, if you intend to move to long format eventually, you might just do something like this:
data %>%
gather(key, answer, -X1) %>%
filter(!is.na(answer)) %>%
left_join(questions, by = 'key') %>%
select(condition = X1, question = value, answer)
Resulting in the following:
# A tibble: 27 x 3
condition question answer
<int> <chr> <int>
1 1 How a? 3
2 1 How a? 5
3 1 How a? 3
4 1 How b? 5
5 1 How b? 4
6 1 How b? 1
7 1 How c? 2
8 1 How c? 7
9 1 How c? 4
10 2 How a? 3
# ... with 17 more rows
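The coalescing step can also be sketched without the tidyverse. This is a minimal base-R illustration on a cut-down, made-up version of the data (two questions, two conditions). The names responses, labels and first_non_na are hypothetical, and the sketch assumes each row has at most one non-missing answer per question.

```r
# Toy stand-in for the survey export: each condition answered its own columns.
responses <- data.frame(
  condition = c(1, 1, 2, 2),
  Q145 = c(3, 5, NA, NA),
  Q243 = c(5, 4, NA, NA),
  Q235 = c(NA, NA, 3, 1),
  Q193 = c(NA, NA, 4, 2)
)

# Question text for each Q column (the second header row of the export).
labels <- c(Q145 = "How a?", Q243 = "How b?", Q235 = "How a?", Q193 = "How b?")

# Row by row, take the first non-missing value across a set of columns.
first_non_na <- function(cols) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), cols)
}

# Build one column per distinct question text.
combined <- data.frame(condition = responses$condition)
for (q in unique(labels)) {
  cols <- names(labels)[labels == q]
  combined[[q]] <- first_non_na(responses[cols])
}

print(combined)
```

The loop groups the Q columns by their question text, which is the same idea as the split() call above, and first_non_na plays the role of coalesce.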

Ways to improve for loop in R

I have a large file and need to match admin ids with users something like this:
TABLE1                TABLE 2
INDEX  V1  IDS        AdmID
1      A   30         30
2      U   3          123
3      U   25         60
4      U   4          .
5      U   5          .
6      A   123        .
7      U   7
8      U   8
9      U   9
10     A   60
11     U   26
12     U   2
.      .   .
.      .   .
.      .   .
I want something like this:
COMPLETE TABLE
INDEX V1 IDS ADMIN_ID
1 A 30 30
2 U 3 30
3 U 25 30
4 U 4 30
5 U 5 30
6 A 123 123
7 U 7 123
8 U 8 123
9 U 9 123
10 A 60 60
11 U 26 60
12 U 2 60
. . . .
. . . .
. . . .
So I wrote this loop, but it is taking forever to finish. Any idea how to use apply() in this situation?
ln <- 10000 # number of records in the Adm table
# TABLE2 = index of the adm ids
for (k in 1:ln){
w<-TABLE2$A_ID[k] #Ids of the adms
for(i in seq(from=AdmID[k], to=AdmID[k+1], by=1)){
TABLE1$ADMIN_ID[i]<-w
}
}
It is easier if you know how many records each admin id covers (admin$ind). Take the cumulative sums of those counts and reverse the mapping table (admin). The ids can then be assigned sequentially, at row cut-offs 12, 9 and 5 in your case.
df <- data.frame(index = c(1:12),
v1 = c("A","U","U","U","U","A","U","U","U","A","U","U"),
ids = 13:24,
admin = 0)
# need rule to assign ids - ind
admin <- data.frame(ind = c(5,4,3), id = c(30,123,60))
# get cumulative sum and reverse admin table
admin$cum <- cumsum(admin$ind)
admin <- admin[nrow(admin):1,]
admin
ind id cum
3 3 60 12
2 4 123 9
1 5 30 5
# ids will be subsequently updated - 12, 9, 5
for(i in 1:length(admin$cum)) {
df[as.numeric(row.names(df)) <= admin$cum[i], 4] <- admin$id[i]
}
df
index v1 ids admin
1 1 A 13 30
2 2 U 14 30
3 3 U 15 30
4 4 U 16 30
5 5 U 17 30
6 6 A 18 123
7 7 U 19 123
8 8 U 20 123
9 9 U 21 123
10 10 A 22 60
11 11 U 23 60
12 12 U 24 60
Below is another version that uses the individual matching rule (admin$ind) rather than the cumulative one.
df <- data.frame(index = c(1:12),
v1 = c("A","U","U","U","U","A","U","U","U","A","U","U"),
ids = 13:24)
# need rule to assign ids - ind
admin <- data.frame(ind = c(5,4,3), id = c(30,123,60))
df$admin <- do.call(c, lapply(1:length(admin$ind), function(x) {
rep(admin$id[x], sum(as.numeric(row.names(df)) <= admin$ind[x]))
}))
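If the admin rows can be recognised directly from the V1 column, as in the question's sample, the admin id can also be filled down without a loop. This is only a sketch under stated assumptions: table1 is a hypothetical stand-in for the asker's TABLE1, every block is assumed to start with an "A" row, and the first row is assumed to be an admin.

```r
# Hypothetical stand-in for TABLE1, using the sample values from the question
table1 <- data.frame(
  index = 1:12,
  v1    = c("A", "U", "U", "U", "U", "A", "U", "U", "U", "A", "U", "U"),
  ids   = c(30, 3, 25, 4, 5, 123, 7, 8, 9, 60, 26, 2)
)

is_admin <- table1$v1 == "A"

# Running count of admin rows seen so far: 1 1 1 1 1 2 2 2 2 3 3 3
block <- cumsum(is_admin)

# Look up each row's admin id by its block number
# (assumes the first row is an admin, so block is never 0)
table1$ADMIN_ID <- table1$ids[is_admin][block]

print(table1)
```

Both steps are vectorised, so the cost grows with the number of rows rather than with nested loop iterations.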
