Kusto: How to find missing values from a list - azure-data-explorer

I am trying to generate a list of values that are missing from a set list of values. I have a query like below:
let fcu = todynamic(pack_array("Alarm",
"State",
"Zone",
"Air",
"Temp Sp",
"Fan",
"Zone Air"));
let ac = all
| join kind=inner (AT) on $left.SourceId == $right.Id
| summarize Models=todynamic(make_list(Name2)) by Id
| extend MissingValues =
array_iff(dynamic([false,false,false,false,false,false,false]), fcu, Models);
This gives me MissingValues as below, with nulls for the values that are missing from Models. How do I get the actual list of missing values?
"MissingValues": [
"Alarm",
"State",
"Zone",
"Air",
"Temp Sp",
null,
null
],

You should be able to use set_difference() to get the set of all distinct values that are in the first array (the "expected" one) but aren't in the other array (the "actual" one).
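In the question's query that would be something like `| extend MissingValues = set_difference(fcu, Models)`. As a rough illustration of what set_difference() computes (a Python sketch, not Kusto code; the sample arrays are taken from the question):

```python
# Sketch of KQL's set_difference(expected, actual): distinct values that
# appear in `expected` but not in `actual`, in first-seen order.
def set_difference(expected, actual):
    seen = set(actual)
    out = []
    for v in expected:
        if v not in seen:
            out.append(v)
            seen.add(v)   # keep the result distinct
    return out

fcu = ["Alarm", "State", "Zone", "Air", "Temp Sp", "Fan", "Zone Air"]
models = ["Alarm", "State", "Zone", "Air", "Temp Sp"]
print(set_difference(fcu, models))  # → ['Fan', 'Zone Air']
```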

Related

Extend a column's value in the same table

I have the datatable below, where WId and ParentId are values of the same column but are related to each other. The State shown here is for WId; I want to extend another column, ParentIdState, which should be the State of the ParentId (the value of State also exists in the same table). How can I do so?
datatable(WId:int, WType:string, Link:string, ParentId:dynamic, State:string)
[
374075, "Deliverable", "Link", dynamic(315968), "Started",
]
Updating further for clarification -
datatable(WId:int, WType:string, Link:string, ParentId:dynamic, State:string)
[
374075, "Deliverable", "Link", dynamic(315968), "Started",
315968, "Parent", "Link", dynamic(467145), "Planned"
]
ParentId is dynamic because it's extracted from JSON. In the datatable above, ParentId is actually a value of WId and has its own row with the relevant details. My intent is to extend my table with the ParentState in another column, like below:
You should use join or lookup.
I believe you could join two tables:
1) the one you provided, with a small modification: the type of ParentId is changed from dynamic to int (the same as the type of WId, since the join is performed on it);
2) a simplified version of table 1), with only two columns: WId and State.
let data = datatable(WId:int, WType:string, Link:string, ParentId:dynamic, State:string)
[
374075, "Deliverable", "Link", dynamic(315968), "Started",
315968, "Parent", "Link", dynamic(467145), "Planned"
]
| extend ParentId = toint(ParentId); // to make sure the type of ParentId is the same as WId
data
| join kind=leftouter (data | project WId, State) on $left.ParentId == $right.WId
| project WId, WType, Link, ParentId, State, ParentState = State1
There might be some optimization to be done here (for example by using materialize(), but I'm not entirely sure).
You can also achieve the same with lookup:
data
| lookup (data | project WId, State) on $left.ParentId == $right.WId
| project WId, WType, Link, ParentId, State, ParentState = State1
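For readers more familiar with general-purpose code, here is a minimal Python sketch of what the lookup does, using the two sample rows from the question (plain dicts stand in for the table):

```python
# Sketch of the leftouter join / lookup: a WId -> State dictionary plays
# the role of the right-hand table (data | project WId, State).
rows = [
    {"WId": 374075, "WType": "Deliverable", "ParentId": 315968, "State": "Started"},
    {"WId": 315968, "WType": "Parent",      "ParentId": 467145, "State": "Planned"},
]

state_by_wid = {r["WId"]: r["State"] for r in rows}
for r in rows:
    # leftouter semantics: a missing parent yields None instead of dropping the row
    r["ParentState"] = state_by_wid.get(r["ParentId"])
```

Row 374075 picks up "Planned" from its parent 315968; row 315968 gets None because parent 467145 has no row.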

How can I adapt this Power query recursion to handle multiple Parents in a hierarchy?

I'm trying to adapt the recursion code below to handle the situation where there can be multiple ParentIds for each Id (this is where I found the code on SO).
I see that List.PositionOf can take an optional Occurrence parameter, so that instead of returning the position of the Id in the list, I can return a list of the positions of all the matching Ids.
The problem I have is what to do next.
How can I use this list of positions to get the ParentId elements in ParentID_List corresponding to those positions? Or is there a better approach?
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    ChangedType = Table.TransformColumnTypes(Source, {{"ID", type text}, {"ParentID", type text}}),
    ID_List = List.Buffer( ChangedType[ID] ),
    ParentID_List = List.Buffer( ChangedType[ParentID] ),
    Type_List = List.Buffer( ChangedType[Type] ),
    Highest = (n as text, searchfor as text) as text =>
        let
            Spot = List.PositionOf( ID_List, n ),
            ThisType = Type_List{Spot},
            Parent_ID = ParentID_List{Spot}
        in
            if Parent_ID = null or ThisType = searchfor then ID_List{Spot} else @Highest(Parent_ID, searchfor),
    FinalTable = Table.AddColumn( ChangedType, "StrategyID", each Highest( [ID], "Strategy" ), type text),
    FinalTable2 = Table.AddColumn( FinalTable, "SubstrategyID", each Highest( [ID], "Substrategy" ), type text),
    #"Replaced Errors" = Table.ReplaceErrorValues(FinalTable2, {{"SubstrategyID", null}})
in
    #"Replaced Errors"
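One way to generalize the single-parent climb is exactly as the question suggests: collect every position of the Id (the analogue of List.PositionOf with Occurrence.All), then recurse over each parent, accumulating all matching ancestors. A Python sketch of that idea, on hypothetical sample data (this is not Power Query M):

```python
# Hypothetical sample: each position is one (Id, ParentId, Type) row;
# an Id may appear on several rows, i.e. have several parents.
ids     = ["A", "B", "C", "D"]
parents = [None, "A", "A", "B"]
types   = ["Strategy", "Substrategy", "Substrategy", "Task"]

def highest(n, searchfor, seen=None):
    """Collect every ancestor of n whose Type equals searchfor."""
    seen = seen if seen is not None else set()
    if n in seen:               # guard against cycles in the hierarchy
        return []
    seen.add(n)
    found = []
    # every position of n -- analogous to List.PositionOf(.., Occurrence.All)
    for spot, id_val in enumerate(ids):
        if id_val != n:
            continue
        if types[spot] == searchfor:
            found.append(id_val)
        elif parents[spot] is not None:
            found.extend(highest(parents[spot], searchfor, seen))
    return found

print(highest("D", "Strategy"))  # → ['A']
```

The same shape translates back to M: replace the single `Spot` with the list of positions and combine the recursive results with List.Combine.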

How to query for grouped distinct records that both have null and non-null values in Kusto?

I am trying to create a query that returns a result set with a distinct (car) column, based on another (data) column being non-null.
In the example below: if a non-null value is found in the data column for a car, return the single instance with that value; otherwise return the row with null; and always maintain the distinctness of the first column.
let Car = datatable(car:string, data:string)
[
"mercedes", "fast",
"mercedes", null,
"tesla", null,
"toyota", "good",
"sonata", null,
"sonata", null,
"sonata", "amazing"
];
So the desired output would be:
"mercedes", "fast",
"tesla", null,
"toyota", "good",
"sonata", "amazing",
Thanks!
One option would be using a combination of set_difference() and make_set():
make_set() creates a set of all unique values of data (by car, the aggregation key).
dynamic([""]) is an array containing a single empty string.
set_difference() produces the difference between the two former arrays, yielding a set with the non-empty string (or an empty set).
Last, by accessing the first element of the result (using [0]), you get the first non-empty element (or null, if the set is empty).
datatable(car:string, data:string)
[
"mercedes", "",
"mercedes", "fast",
"tesla", "",
"toyota", "good",
"sonata", "",
"sonata", "",
"sonata", "amazing"
]
| summarize data = set_difference(make_set(data), dynamic([""]))[0] by car
car        data
--------   -------
mercedes   fast
tesla
toyota     good
sonata     amazing
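To see why the make_set()/set_difference() trick works, the same aggregation can be mimicked in Python (plain lists and dicts stand in for Kusto's dynamic arrays):

```python
from collections import defaultdict

# Mimic: summarize data = set_difference(make_set(data), dynamic([""]))[0] by car
rows = [("mercedes", ""), ("mercedes", "fast"), ("tesla", ""),
        ("toyota", "good"), ("sonata", ""), ("sonata", ""), ("sonata", "amazing")]

sets = defaultdict(list)
for car, data in rows:
    if data not in sets[car]:            # make_set(): unique values per key
        sets[car].append(data)

# set_difference(.., [""]) then [0]: first non-empty value, or None if none exists
result = {car: next((v for v in vals if v != ""), None)
          for car, vals in sets.items()}
print(result)
# → {'mercedes': 'fast', 'tesla': None, 'toyota': 'good', 'sonata': 'amazing'}
```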

Kusto | Summarize count() multiple columns with where clauses

I'm trying to get the count of multiple things in a Kusto query but having trouble getting it working. Let's say I have a sample table like this:
let SampleTable = datatable(Department:string, Status:string, DateStamp:datetime)
[
"Logistics", "Open", "05-01-2019",
"Finance", "Closed", "05-01-2020",
"Logistics", "Open", "05-01-2020"
];
And I query like this:
SampleTable
| summarize closedEntries = count() by (Status | where Status == "Closed"),
openEntries = (Status | where Status == "Open"),
recentDates = (DateStamp | where DateStamp > "12-31-2019"),
Department
Expected results:
But this gives the error "The name 'Status' does not refer to any known column, table, variable or function", and the same error for DateStamp. I've also tried using extend and join, but it's a mess.
You could use the countif() aggregation function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/countif-aggfunction
datatable(Department:string, Status:string, DateStamp:datetime)
[
"Logistics", "Open", "05-01-2019",
"Finance", "Closed", "05-01-2020",
"Logistics", "Open", "05-01-2020"
]
| summarize closedEntries = countif(Status == "Closed"),
openEntries = countif(Status == "Open"),
recentDates = countif(DateStamp > datetime(2019-12-31))
by Department
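As an illustration of the countif() semantics, here is the equivalent grouping in Python (the rows mirror the sample datatable):

```python
from collections import defaultdict
from datetime import datetime

# Sketch of countif(): per group (Department), count only the rows
# that satisfy each predicate, all in a single pass.
rows = [("Logistics", "Open",   datetime(2019, 5, 1)),
        ("Finance",   "Closed", datetime(2020, 5, 1)),
        ("Logistics", "Open",   datetime(2020, 5, 1))]

stats = defaultdict(lambda: {"closed": 0, "open": 0, "recent": 0})
for dept, status, stamp in rows:
    s = stats[dept]
    s["closed"] += int(status == "Closed")              # countif(Status == "Closed")
    s["open"]   += int(status == "Open")                # countif(Status == "Open")
    s["recent"] += int(stamp > datetime(2019, 12, 31))  # countif(DateStamp > ...)
```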

Replace nulls with default values in Oracle

Please consider the following Oracle beginner's case:
Table "X" contains customer data:
ID Variable_A Variable_B Variable_C Variable_D
--------------------------------------------------
1 100 null abc 2003/07/09
2 null 2 null null
Table "Dictionary" contains what we can regard as default values for customer data:
Variable_name Default_Value
----------------------------
Variable_A 50
Variable_B 0
Variable_C text
Variable_D sysdate
The goal is to examine a row in "X" by a given ID and replace null values with the default values from "Dictionary". The concrete question is about the optimal solution: for now, my own solution uses a loop with a MERGE INTO statement, which I think is not optimal. The code should also be flexible enough that it does not have to change when a new column is added to "X".
The direct way is to use
update X set
variable_a = coalesce(variable_a, (select default_value from Dictionary where variable_name = 'Variable_A')),
variable_b = coalesce(variable_b, (select default_value from Dictionary where variable_name = 'Variable_B')),
... and so on ...
Generally it should be fast enough.
Since you don't know which fields of table X will be null, you must provide every row with every default value. And since each field of X may be a different data type, the Dictionary table should store each default value in a field of the appropriate type. Such a layout is shown in this fiddle.
A query which shows each row of X fully populated with either the value in X or its default becomes relatively simple.
select ID,
nvl( Var_A, da.Int_Val ) Var_A,
nvl( Var_B, db.Int_Val ) Var_B,
nvl( Var_C, dc.Txt_Val ) Var_C,
nvl( Var_D, dd.Date_Val ) Var_D
from X
join Dict da
on da.Name = 'VA'
join Dict db
on db.Name = 'VB'
join Dict dc
on dc.Name = 'VC'
join Dict dd
on dd.Name = 'VD';
Turning this into an Update statement is a little more complicated but is simple enough once you've used it a few times:
update X
set (Var_A, Var_B, Var_C, Var_D) =(
select nvl( Var_A, da.Int_Val ),
nvl( Var_B, db.Int_Val ),
nvl( Var_C, dc.Txt_Val ),
nvl( Var_D, dd.Date_Val )
from X InnerX
join Dict da
on da.Name = 'VA'
join Dict db
on db.Name = 'VB'
join Dict dc
on dc.Name = 'VC'
join Dict dd
on dd.Name = 'VD'
where InnerX.ID = X.ID )
where Var_A is null
or Var_B is null
or Var_C is null
or Var_D is null;
There is a problem with this. The default for Date types is given as sysdate, which means it will show the date and time the default table was populated, not the date and time the Update was performed. This, I assume, is not what you want. You could try to make this all work using dynamic SQL, but that would be a lot more complicated. Much too complicated for what you want to do here.
I see only two realistic options: either store a meaningful date as the default (9999-12-31, for example) or just know that every default for a date type will be sysdate and use that in your updates. That would be accomplished in the above Update just by changing one line:
nvl( Var_D, sysdate )
and getting rid of the last join.
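The coalesce-with-dictionary-defaults pattern is easy to see in a small Python sketch (hypothetical in-memory rows; the sysdate default is omitted, as discussed above):

```python
# Per-column defaults, playing the role of the Dictionary table.
defaults = {"Variable_A": 50, "Variable_B": 0, "Variable_C": "text"}

# Rows of table X; None stands in for SQL NULL.
rows = [
    {"ID": 1, "Variable_A": 100,  "Variable_B": None, "Variable_C": "abc"},
    {"ID": 2, "Variable_A": None, "Variable_B": 2,    "Variable_C": None},
]

for row in rows:
    for col, default in defaults.items():
        if row[col] is None:             # COALESCE(col, default) / NVL
            row[col] = default
```

As in the SQL version, non-null values are left untouched and only the nulls are filled from the dictionary.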
