Kusto equivalent of SQL NOT IN - azure-data-explorer

I am trying to identify what records exist in table 1 that are not in table 2 (so essentially using NOT IN)
let outliers =
Table 2
| project UniqueEventGuid;
Table 1
|where UniqueEventGuid !in (outliers)
|project UniqueEventGuid
but getting 0 records back even though I know there are orphans in table 1.
Is the !in not the right syntax?
Thanks in advance!

!in operator
"In tabular expressions, the first column of the result set is
selected."
In the following example I intentionally ordered the column such that the query will result in error due to mismatched data types.
In your case, the data types might match, so the query is valid, but the results are wrong.
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| where i !in (t2)
Relop semantic error: SEM0025: One of the values provided to the
'!in' operator does not match the left side expression type 'int',
consider using explicit cast
Fiddle
If that is indeed the case, you can reorder the columns or project only the relevant one.
Note the use of double brackets.
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| where i !in ((t2 | project i))
i
x
1
A
2
B
3
C
Fiddle
Another option is to use leftanti join
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| join kind=leftanti t2 on i
i
x
2
B
3
C
1
A
Fiddle

Related

Find the values missing between two tables

Pretty new to kql. Have a very basic question.
Say - I have two tables - Table1, Table2 which has a column named id.
What i am looking for is a query - to find the id's which are present in Table1 but not in Table2?
I saw set_differnce, where i am stuck is to generate the arrays to be passed to this.
Thanks in advance.
You could consider using a leftanti/rightanti join, or !in.
examples:
this returns a table with a column x, with the values 1,4,7,22,25,28
let T1 = range x from 1 to 30 step 3;
let T2 = range y from 10 to 20 step 1;
T1
| join kind=leftanti T2 on $left.x == $right.y
and so does this:
let T1 = range x from 1 to 30 step 3;
let T2 = range y from 10 to 20 step 1;
T1
| where x !in((T2 | project y))

Can I use tabular parameters in Kusto user-defined functions

Basically I'd like to pass in a set of field values to a function so I can use in/!in operators. I'd prefer to be able to use the result of a previous query rather than having to construct a set manually.
As in:
let today = exception | where EventInfo_Time > ago(1d) | project exceptionMessage;
MyAnalyzeFunction(today)
What is then the signature of MyAnalyzeFunction?
See: https://learn.microsoft.com/en-us/azure/kusto/query/functions/user-defined-functions
For instance, the following will return a table with a single column (y) with the values 2 and 3:
let someTable = range x from 2 to 10 step 1
;
let F = (T:(x:long))
{
range y from 1 to 3 step 1
| where y in (T)
}
;
F(someTable)

BQ array lookup: similar to NTH, but based on index, not position

The NTH function is really useful for extracting nested array elements in BQ, but its utility for a given table depends on each row's nested array containing the same amount of elements, and in the same order. If I have a 2+ column nested array where one column is variable name/ID, and the different instances of the array in different rows have inconsistent naming and/or ordering, is there an elegant way to fetch/pivot a variable based on the variable name/ID?
For example, if row1 has customDimensions array:
index value
4 aaa
23 bbb
70 ccc
and row2 has customDimensions array:
index value
4 ddd
70 eee
I'd want to run something like
SELECT
NTHLOOKUP(70, customdims.index, customdims.value) as val70,
NTHLOOKUP(4, customdims.index, customdims.value) as val4,
NTHLOOKUP(23, customdims.index, customdims.value) as val23
from my_table;
And get:
val70 val4 val23
ccc aaa bbb
eee ddd (null)
I've been able to get this sort of result by making a subquery for each desired index value, unnesting the array in each and filtering WHERE index = (value), but that gets really ugly as the variables pile up. Is there an alternative?
EDIT: Based on Mikhail's answer below (thank you!!) I was able to write my query more elegantly. Not quite as slick as an NTHLOOKUP, but I'll take it:
select id,
max(case when index = 41 then value[OFFSET(0)] else '' end) as val41,
max(case when index = 59 then value[OFFSET(0)] else '' end) as val59
from
(select
concat(array1.thing1, array1.thing2) as id,
cd.index,
ARRAY_AGG(distinct cd.value) as value
FROM my_table g
,unnest(array1) as array1
,unnest(array1.customDimensions) as cd
where index in (41,59)
group by 1,2
order by 1,2
) x
group by 1
order by 1
The best I can "offer" is below (BigQuery Standard SQL)
#standardSQL
WITH `project.dataset.my_table` AS (
SELECT ARRAY<STRUCT<index INT64, value STRING>>
[(4, 'aaa'), (23, 'bbb'), (70, 'ccc')] customDimensions
UNION ALL
SELECT ARRAY<STRUCT<index INT64, value STRING>>
[(4, 'ddd'), (70, 'eee')] customDimensions
)
SELECT cd.index, ARRAY_AGG(cd.value) VALUES
FROM `project.dataset.my_table`,
UNNEST(customDimensions) cd
GROUP BY cd.index
with result as below
Row index values
1 4 aaa
ddd
2 23 bbb
3 70 ccc
eee
I would recommend to stay with this flatten version as it serves most of practical cases I can think of
But if you still want to further pivot this - there are quite a number of posts related to how to pivot in BigQuery
I've been able to get this sort of result by making a subquery for each desired index value, unnesting the array in each and filtering WHERE index = (value), but that gets really ugly as the variables pile up. Is there an alternative?
Yes, you can use a user-defined function to encapsulate the common logic. For example,
CREATE TEMP FUNCTION NTHLOOKUP(
targetIndex INT64,
customDimensions ARRAY<STRUCT<index INT64, value STRING>>
) AS (
(SELECT value FROM UNNEST(customDimensions)
WHERE index = targetIndex)
);
SELECT
NTHLOOKUP(70, customDimensions) as val70,
NTHLOOKUP(4, customDimensions) as val4,
NTHLOOKUP(23, customDimensions) as val23
from my_table;

How can I concatenate(or merge) values from 2 result sets with the same PK?

I don't know if I'm being dumb here but I can't seem to find an efficient way to do this. I wrote a very long and inefficient query that does what I need, but what I WANT is a more efficient way.
I have 2 result sets that displays an ID (a PK which is generic/from the same source in both sets) and a FLAG (A - approve and V - Validate).
Result Set 1
ID FLAG
1 V
2 V
3 V
4 V
5 V
6 V
Result Set 2
ID FLAG
2 A
5 A
7 A
8 A
I want to "merge" these two sets to give me this output:
ID FLAG
1 V
2 (V/A)
3 V
4 V
5 (V/A)
6 V
7 A
8 A
Neither of the 2 result sets will at any time have all the ID's to make a simple left join with a case statement on the other result set an easy solution.
I'm currently doing a union between the two sets to get ALL the ID's. Thereafter I left join the 2 result sets to get the required '(V/A)' by use of a case statement.
There must be a more efficient way but I just can't seem to figure it out now as I'm running low on amps... I need a holiday... :-/
Thanks in advance!
Use a FULL OUTER JOIN:
SELECT ID,
CASE
WHEN t1.FLAG IS NULL THEN t2.FLAG
WHEN t2.FLAG IS NULL THEN t1.FLAG
ELSE '(' || t1.FLAG || '/' || t2.FLAG || ')'
END AS MERGED_FLAG
FROM TABLE1 t1
FULL OUTER JOIN TABLE2 t2
USING (ID)
ORDER BY ID
See this SQLFiddle.
Share and enjoy.
I think that you can use xmlagg. Here an exemple :
SELECT deptno,
SUBSTR (REPLACE (REPLACE (XMLAGG (XMLELEMENT ("x", ename)
ORDER BY ename),'</x>'),'<x>','|'),2) as concated_list
FROM emp
GROUP BY deptno
ORDER BY deptno;
Bye

Strange result in SQLite query

I´m getting a strange result from a SQLite query. The query is the next one:
SELECT rule FROM rules
WHERE idRule = (SELECT idRuleForeign FROM rulesXfilter
WHERE idFilterForeign = (SELECT idFilter FROM filters
WHERE name = 'Filter1'));
Now, let´s suppose that I have the following tables with a few rows on it.
filters rules rulesXfilter
idFilter name idRule rule idRuleForeign idFilterForeign
1 Filter1 1 Rule1 1 1
2 Filter2 2 Rule2 2 1
3 Rule3 3 1
2 2
What I get is {Rule1}, although I think I should get {Rule1, Rule2, Rule3}
What am I doing wrong?
Select idRuleForeign... returns multiple results, yes ({1, 2, 3}). However, you then say "give me the rule where idRule = {SET}", and sql doesnt like this. I believe what is happening is that it is instead taking the first result only and giving you that.
The solution is to use joins. Inner selects like that, while work most of the time, can REALLY slow down your query. If I got my syntax correct, the following should do what you need:
SELECT r.rule FROM rules r
JOIN rulesXfilter rf ON r.idRule = rf.idRuleForeign
JOIN filters f ON f.idFilter = rf.idFilterForeign
WHERE f.name = 'Filter1'

Resources