Pretty new to kql. Have a very basic question.
Say - I have two tables - Table1, Table2 which has a column named id.
What i am looking for is a query - to find the id's which are present in Table1 but not in Table2?
I saw set_differnce, where i am stuck is to generate the arrays to be passed to this.
Thanks in advance.
You could consider using a leftanti/rightanti join, or !in.
examples:
this returns a table with a column x, with the values 1,4,7,22,25,28
let T1 = range x from 1 to 30 step 3;
let T2 = range y from 10 to 20 step 1;
T1
| join kind=leftanti T2 on $left.x == $right.y
and so does this:
let T1 = range x from 1 to 30 step 3;
let T2 = range y from 10 to 20 step 1;
T1
| where x !in((T2 | project y))
Related
I am trying to identify what records exist in table 1 that are not in table 2 (so essentially using NOT IN)
let outliers =
Table 2
| project UniqueEventGuid;
Table 1
|where UniqueEventGuid !in (outliers)
|project UniqueEventGuid
but getting 0 records back even though I know there are orphans in table 1.
Is the !in not the right syntax?
Thanks in advance!
!in operator
"In tabular expressions, the first column of the result set is
selected."
In the following example I intentionally ordered the column such that the query will result in error due to mismatched data types.
In your case, the data types might match, so the query is valid, but the results are wrong.
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| where i !in (t2)
Relop semantic error: SEM0025: One of the values provided to the
'!in' operator does not match the left side expression type 'int',
consider using explicit cast
Fiddle
If that is indeed the case, you can reorder the columns or project only the relevant one.
Note the use of double brackets.
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| where i !in ((t2 | project i))
i
x
1
A
2
B
3
C
Fiddle
Another option is to use leftanti join
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| join kind=leftanti t2 on i
i
x
2
B
3
C
1
A
Fiddle
I am just learning SQL and I got a task, that I need to find the final length of a discontinuous line when I have imput such as:
start | finish
0 | 3
2 | 7
15 | 17
And the correct answer here would be 9, because it spans from 0-3 and then I am suppsed to ignore the parts that are present multiple times so from 3-7(ignoring the two because it is between 0 and 3 already) and 15-17. I am supposed to get this answer solely through an sql query(no functions) and I am unsure of how. I have tried to experiment with some code using with, but I can't for the life of me figure out how to ignore all the multiples properly.
My half-attempt:
WITH temp AS(
SELECT s as l, f as r FROM lines LIMIT 1),
cte as(
select s, f from lines where s < (select l from temp) or f > (select r from temp)
)
select * from cte
This really only gives me all the rows tha are not completly usless and extend the length, but I dont know what to do from here.
Use a recursive CTE that breaks all the (start, finish) intervals to as many 1 unit length intervals as is the total length of the interval and then count all the distinct intervals:
WITH cte AS (
SELECT start x1, start + 1 x2, finish FROM temp
WHERE start < finish -- you can omit this if start < finish is always true
UNION
SELECT x2, x2 + 1, finish FROM cte
WHERE x2 + 1 <= finish
)
SELECT COUNT(DISTINCT x1) length
FROM cte
See the demo.
Result:
length
9
I have a big table which is 100k rows in size and the PRIMARY KEY is of the datatype NUMBER. The way data is populated in this column is using a random number generator.
So my question is, can there be a possibility to have a SQL query that can help me with getting partition the table evenly with the range of values. Eg: If my column value is like this:
1
2
3
4
5
6
7
8
9
10
And I would like this to be broken into three partitions, then I would expect an output like this:
Range 1 1-3
Range 2 4-7
Range 3 8-10
It sounds like you want the WIDTH_BUCKET() function. Find out more.
This query will give you the start and end range for a table of 1250 rows split into 20 buckets based on id:
with bkt as (
select id
, width_bucket(id, 1, 1251, 20) as id_bucket
from t23
)
select id_bucket
, min(id) as bkt_start
, max(id) as bkt_end
, count(*)
from bkt
group by id_bucket
order by 1
;
The two middle parameters specify min and max values; the last parameter specifies the number of buckets. The output is the rows between the minimum and maximum bows split as evenly as possible into the specified number of buckets. Be careful with the min and max parameters; I've found poorly chosen bounds can have an odd effect on the split.
This solution works without width_bucket function. While it is more verbose and certainly less efficient it will split the data as evenly as possible, even if some ID values are missing.
CREATE TABLE t AS
SELECT rownum AS id
FROM dual
CONNECT BY level <= 10;
WITH
data AS (
SELECT id, rownum as row_num
FROM t
),
total AS (
SELECT count(*) AS total_rows
FROM data
),
parts AS (
SELECT rownum as part_no, total.total_rows, total.total_rows / 3 as part_rows
FROM dual, total
CONNECT BY level <= 3
),
bounds AS (
SELECT parts.part_no,
parts.total_rows,
parts.part_rows,
COALESCE(LAG(data.row_num) OVER (ORDER BY parts.part_no) + 1, 1) AS start_row_num,
data.row_num AS end_row_num
FROM data
JOIN parts
ON data.row_num = ROUND(parts.part_no * parts.part_rows, 0)
)
SELECT bounds.part_no, d1.ID AS start_id, d2.ID AS end_id
FROM bounds
JOIN data d1
ON d1.row_num = bounds.start_row_num
JOIN data d2
ON d2.row_num = bounds.end_row_num
ORDER BY bounds.part_no;
PART_NO START_ID END_ID
---------- ---------- ----------
1 1 3
2 4 7
3 8 10
Does anybody know how to do a group by query by n records.
For example if I have a db with xn records I would like to aggregate the first 3 and then the next 3 and so on.
Where {x,n member of positive integers excluding 0} :)
Thanks
This does exactly what you want :
SELECT int(((T.Rank - 1) / 3)) AS GroupID, SUM(T.field_to_agregate)
FROM
(
SELECT (SELECT COUNT(*) FROM your_table AS T2 WHERE T1.ID>T2.ID) + 1 AS Rank , ID, field_to_agregate
FROM your_table AS T1
) T
GROUP BY int(((T.Rank - 1) / 3))
However since you did not posted any data sample and table structure (mistake!), I had to suppose that you have an ID field in your table, if not you will have to adapt it. If you don't suceed add more info about your data and I will adapt my query to match your table struct
Sorry because I dont think good title for my problem.
I have table a(f1 integer, date Long), date increase, and the data
f1 date
1 1
2 2
3 3
...
I need to sum f1 by date, with record 1{1,1} the sum f1 is 1,with record 2 the sum f1 is 1+2, record 3 the sum f1 is 1+2+3...
How can I do that?
This requires a correlated subquery:
SELECT date,
(SELECT SUM(f1)
FROM a AS a2
WHERE a2.date <= a.date
) AS f1_sum
FROM a
ORDER BY date;
But it's inefficient. Consider just scanning the table, sorted by the date, and summing f1 as you're reading it.