Kusto SubQuery Referencing "outer" query - azure-data-explorer

I'm trying to write a Kusto (KQL) query which in SQL I'd write with a subquery that references the "outer" query, like below. However, I cannot find/understand how to accomplish the equivalent in KQL. Is it possible to have a subquery reference the "outer" query in KQL? If not, is there some other method to accomplish the below?
The basic idea is joining a table onto itself to get the next time value, i.e. return the minimum time value where the time is greater than the current record's time and the key (DeviceName in this case) matches.
DECLARE @DeviceIPs TABLE (DeviceName NVARCHAR(MAX), DeviceIP NVARCHAR(MAX), TimeGenerated DATETIME)
INSERT INTO @DeviceIPs SELECT 'PC1', '192.168.100.1', '2021-01-01'
INSERT INTO @DeviceIPs SELECT 'PC1', '192.168.100.2', '2021-01-02'
INSERT INTO @DeviceIPs SELECT 'PC1', '192.168.100.3', '2021-01-03'
INSERT INTO @DeviceIPs SELECT 'PC2', '192.168.100.3', '2021-01-01'
INSERT INTO @DeviceIPs SELECT 'PC2', '192.168.100.1', '2021-01-02'
INSERT INTO @DeviceIPs SELECT 'PC2', '192.168.100.2', '2021-01-03'
SELECT
    i.DeviceName,
    i.DeviceIP,
    i.TimeGenerated AS BeginDateTime,
    ISNULL((SELECT MIN(i2.TimeGenerated) FROM @DeviceIPs i2 WHERE i.DeviceName = i2.DeviceName AND i.TimeGenerated < i2.TimeGenerated), '2200-01-01') AS EndDateTime
FROM
    @DeviceIPs i
In the above data structure each row represents the time when a device was granted an IP address (BeginDateTime). The result I'm looking for also includes the "EndDateTime", which would be the next time the device was granted an IP address. If there is no "next record", I'm just setting some arbitrary future date as the end date, but that part isn't really pertinent to this question. The expected results would be:
DeviceName | DeviceIP      | BeginDateTime | EndDateTime
-----------|---------------|---------------|------------
PC1        | 192.168.100.1 | 2021-01-01    | 2021-01-02
PC1        | 192.168.100.2 | 2021-01-02    | 2021-01-03
PC1        | 192.168.100.3 | 2021-01-03    | 2200-01-01
PC2        | 192.168.100.3 | 2021-01-01    | 2021-01-02
PC2        | 192.168.100.1 | 2021-01-02    | 2021-01-03
PC2        | 192.168.100.2 | 2021-01-03    | 2200-01-01

Assuming I understood your intention correctly, the following should work; the order by serializes the rows, which is what lets next() peek at the following record:
.create table DeviceIPs (DeviceName:string, DeviceIP: string, TimeGenerated:datetime)
.ingest inline into table DeviceIPs <|
PC1,192.168.100.1,2021-01-01
PC1,192.168.100.2,2021-01-02
PC1,192.168.100.3,2021-01-03
PC2,192.168.100.3,2021-01-01
PC2,192.168.100.1,2021-01-02
PC2,192.168.100.2,2021-01-03
DeviceIPs
| order by DeviceName asc, TimeGenerated asc
| project DeviceName, DeviceIP, BeginDateTime = TimeGenerated, EndDateTime = case(next(DeviceName) == DeviceName, next(TimeGenerated), datetime(2200-01-01))
DeviceName | DeviceIP      | BeginDateTime               | EndDateTime
-----------|---------------|-----------------------------|----------------------------
PC1        | 192.168.100.1 | 2021-01-01 00:00:00.0000000 | 2021-01-02 00:00:00.0000000
PC1        | 192.168.100.2 | 2021-01-02 00:00:00.0000000 | 2021-01-03 00:00:00.0000000
PC1        | 192.168.100.3 | 2021-01-03 00:00:00.0000000 | 2200-01-01 00:00:00.0000000
PC2        | 192.168.100.3 | 2021-01-01 00:00:00.0000000 | 2021-01-02 00:00:00.0000000
PC2        | 192.168.100.1 | 2021-01-02 00:00:00.0000000 | 2021-01-03 00:00:00.0000000
PC2        | 192.168.100.2 | 2021-01-03 00:00:00.0000000 | 2200-01-01 00:00:00.0000000
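For reference, the same result can also be produced with the partition operator instead of the explicit device-boundary check (the same pattern as in the window-function question below). A minimal sketch, relying on next() returning an empty datetime on the last row of each partition so that coalesce() supplies the fallback:
DeviceIPs
| partition by DeviceName
(
    order by TimeGenerated asc
    | extend EndDateTime = coalesce(next(TimeGenerated), datetime(2200-01-01))
)
| project DeviceName, DeviceIP, BeginDateTime = TimeGenerated, EndDateTime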

Related

Change the way Join Operator renames like columns

Is there a way to change how the join operator appends a '1' to the end of columns that have the same name in both joined tables? I would like to do this without renaming the columns explicitly.
datatable(TableKey:string, CreatedDate:datetime)
['1', datetime('2022-01-01')]
| join kind=inner (
datatable(TableKey:string, CreatedDate:datetime)
['1', datetime('2022-01-02')]
) on TableKey
Result:
TableKey | CreatedDate                 | TableKey1 | CreatedDate1
---------|-----------------------------|-----------|----------------------------
1        | 2022-01-01 00:00:00.0000000 | 1         | 2022-01-02 00:00:00.0000000
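No answer is recorded here; as far as I know the '1' suffix itself isn't configurable, so the usual workaround is still renaming or dropping columns on one side, which the asker hoped to avoid. A sketch using project-rename before the join and project-away for the duplicated key column (RightCreatedDate is a name chosen for illustration):
datatable(TableKey:string, CreatedDate:datetime)
['1', datetime('2022-01-01')]
| join kind=inner (
    datatable(TableKey:string, CreatedDate:datetime)
    ['1', datetime('2022-01-02')]
    | project-rename RightCreatedDate = CreatedDate
) on TableKey
| project-away TableKey1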

KQL window functions - how to partition by multiple columns?

Input table dimVehicleV1:
SaleStart | Product | Model
----------|---------|------
1/1/2020  | Car     | 1
1/2/2020  | Bike    | 1
2/1/2020  | Car     | 2
3/1/2020  | Bike    | 2
Desired output dimVehicleV2:
SaleStart | Product | Model | SaleEnd
----------|---------|-------|---------
1/1/2020  | Car     | 1     | 2/1/2020
1/2/2020  | Bike    | 1     | 3/1/2020
2/1/2020  | Car     | 2     | null
3/1/2020  | Bike    | 2     | null
I see serialization via order by, and then the next() function. I don't see how to make it respect the Product column groupings though.
Failing query:
let dimVehicleV2 =
dimVehicleV1
| order by Product asc, SaleStart asc
| extend SaleEnd = next(SaleStart, 1);
dimVehicleV2
How does one use the next() function so that it respects column groups?
If I understand your question correctly, this should work:
datatable(SaleStart:datetime, Product:string, Model:int)
[
datetime(1/1/2020), 'Car', 1,
datetime(1/2/2020), 'Bike', 1,
datetime(2/1/2020), 'Car', 2,
datetime(3/1/2020), 'Bike', 2,
]
| order by Product asc, SaleStart asc
| extend SaleEnd = iff(next(Product) == Product and next(Model) != Model, next(SaleStart), datetime(null))
SaleStart                   | Product | Model | SaleEnd
----------------------------|---------|-------|----------------------------
2020-01-01 00:00:00.0000000 | Car     | 1     | 2020-02-01 00:00:00.0000000
2020-01-02 00:00:00.0000000 | Bike    | 1     | 2020-03-01 00:00:00.0000000
2020-02-01 00:00:00.0000000 | Car     | 2     |
2020-03-01 00:00:00.0000000 | Bike    | 2     |
I came to this post searching for an answer to the question actually in the title of this post: "How to partition by multiple columns?"
In case someone else needs it, here is what I ended up doing: extend the table with a new column that combines the values of the multiple columns you want, and use that new column as the partition key.
You can combine the columns by concatenation, a hash, or something else.
dimVehicleV1
| extend PartitionKey = strcat(Product, ":", Model)
| partition hint.strategy=native by PartitionKey (top 1 by SaleStart) // or whatever partition transformation
In case it's useful to anyone, I found a solution I prefer over Yoni's perfectly adequate one.
let MyTable = datatable(SaleStart:datetime, Product:string, Model:int)
[
datetime(1/1/2020), 'Car', 1,
datetime(1/2/2020), 'Bike', 1,
datetime(2/1/2020), 'Car', 2,
datetime(3/1/2020), 'Bike', 2,
];
MyTable
| partition by Product
(
order by Model asc
| extend SaleEnd = next(SaleStart)
)
This seems to me to abstract away the details of the logic required, expressing just the thought.
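For the sample data this yields exactly the desired output: next() never crosses a partition boundary, so the last row of each Product gets a null SaleEnd without any explicit boundary check.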

SQL: grouping to have exact rows

Let's say there is a schema:
|date|value|
DBMS is SQLite.
I want to get N groups and calculate AVG(value) for each of them.
Sample:
2020-01-01 10:00|2.0
2020-01-01 11:00|2.0
2020-01-01 12:00|3.0
2020-01-01 13:00|10.0
2020-01-01 14:00|2.0
2020-01-01 15:00|3.0
2020-01-01 16:00|11.0
2020-01-01 17:00|2.0
2020-01-01 18:00|3.0
Result (N=3):
2020-01-01 11:00|7.0/3
2020-01-01 14:00|15.0/3
2020-01-01 17:00|16.0/3
I need to use a windowing function, like NTILE, but it seems NTILE is not usable after GROUP BY. It can create buckets, but then how can I use these buckets for aggregation?
SELECT
/*AVG(*/value/*)*/,
NTILE (3) OVER (ORDER BY date) bucket
FROM
test
/*GROUP BY bucket*/
/*GROUP BY NTILE (3) OVER (ORDER BY date) bucket*/
I've also dropped the test data and this query into DBFiddle.
You can use the NTILE() window function to create the groups and then aggregate:
SELECT
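-- bucket midpoint: MIN(date) shifted forward by half the seconds between MIN(date) and MAX(date)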
DATETIME(MIN(DATE), ((STRFTIME('%s', MAX(DATE)) - STRFTIME('%s', MIN(DATE))) / 2) || ' second') date,
ROUND(AVG(value), 2) avg_value
FROM (
SELECT *, NTILE(3) OVER (ORDER BY date) grp
FROM test
)
GROUP BY grp;
NTILE(3) splits the ordered rows into 3 buckets of (near) equal size, so change the argument of NTILE() to change the number of buckets; for the sample's 9 rows this gives the buckets {2, 2, 3}, {10, 2, 3} and {11, 2, 3}.
Results:
| date | avg_value |
| ------------------- | --------- |
| 2020-01-01 11:00:00 | 2.33 |
| 2020-01-01 14:00:00 | 5 |
| 2020-01-01 17:00:00 | 5.33 |
I need to use a windowing function, like NTILE, but it seems NTILE is not usable after GROUP BY. It can create buckets, but then how can I use these buckets for aggregation?
You first use NTILE to assign bucket numbers in a subquery, then group by it in an outer query.
Using sub-query
SELECT bucket
, AVG(value) AS avg_value
FROM ( SELECT value
, NTILE(3) OVER ( ORDER BY date ) AS bucket
FROM test
) x
GROUP BY bucket
ORDER BY bucket
Using WITH clause
WITH x AS (
SELECT date
, value
, NTILE(3) OVER ( ORDER BY date ) AS bucket
FROM test
)
SELECT bucket
, COUNT(*) AS bucket_size
, MIN(date) AS from_date
, MAX(date) AS to_date
, MIN(value) AS min_value
, AVG(value) AS avg_value
, MAX(value) AS max_value
, SUM(value) AS sum_value
FROM x
GROUP BY bucket
ORDER BY bucket

How to query data in Oracle, showing 1 extra row under some condition?

I have a simple table like below:
NAME MONEY TYPE
----- ------ ----------
Tom 10000 food
Jim 6000 food
Tom 5000 transport
Jim 3000 transport
I need to split out one extra row for food, where the money will be 20% of the original amount and the type will be food_split, just like below.
NAME MONEY TYPE
----- ------ -----------
Tom 8000 food
Tom 2000 food_split
Tom 5000 transport
Jim 4800 food
Jim 1200 food_split
Jim 3000 transport
How should I do it? Is there any function/solution that can help? Thanks.
You could query the table twice, the first time adjusting the food amount to 80% of its original value, and the second just calculating the split values:
-- CTE for sample data
with your_table (name, money, type) as (
select 'Tom', 10000,'food' from dual
union all select 'Jim', 6000, 'food' from dual
union all select 'Tom', 5000, 'transport' from dual
union all select 'Jim', 3000, 'transport' from dual
)
-- actual query
select name,
money * case when type = 'food' then 0.8 else 1 end as money,
type
from your_table
union all
select name,
money * 0.2,
'food_split'
from your_table
where type = 'food';
NAME MONEY TYPE
---------- ---------- ----------
Tom 8000 food
Jim 4800 food
Tom 5000 transport
Jim 3000 transport
Tom 2000 food_split
Jim 1200 food_split
Or if you don't want to hit the table twice you could use a recursive CTE (11gR2+); the recursive member divides by 4 because the anchor has already reduced the food rows to 80%, and 20% of the original amount is a quarter of that:
with rcte (name, money, type) as (
select name,
money * case when type = 'food' then 0.8 else 1 end as money,
type
from your_table
union all
select name,
money / 4,
'food_split'
from rcte
where type = 'food'
)
select * from rcte;
NAME MONEY TYPE
---------- ---------- ----------
Tom 8000 food
Jim 4800 food
Tom 5000 transport
Jim 3000 transport
Tom 2000 food_split
Jim 1200 food_split
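If the rows should come out grouped per name as in the expected result, an ORDER BY can be appended; a sketch for the CTE version (note 'food' < 'food_split' < 'transport' sorts naturally, and NAME DESC happens to put Tom before Jim for this particular data):
select * from rcte
order by name desc, type;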

SQLite - Merge 2 tables according to modified date, insert a new row if necessary

I have a table with an ID column; this column is a primary key and unique as well. In addition, the table has a modified date column.
I have the same table in 2 databases and I am looking to merge both into one database. The merging scenario for a table is as follows:
Insert the record if the ID is not present;
If the ID exists, only update if the modified date is greater than that of the existing row.
For example, having:
Table 1:
id | name | createdAt | modifiedAt
---|------|------------|-----------
1 | john | 2019-01-01 | 2019-05-01
2 | jane | 2019-01-01 | 2019-04-03
Table 2:
id | name | createdAt | modifiedAt
---|------|------------|-----------
1 | john | 2019-01-01 | 2019-04-30
2 | JANE | 2019-01-01 | 2019-04-04
3 | doe | 2019-01-01 | 2019-05-01
The resulting table would be:
id | name | createdAt | modifiedAt
---|------|------------|-----------
1 | john | 2019-01-01 | 2019-05-01
2 | JANE | 2019-01-01 | 2019-04-04
3 | doe | 2019-01-01 | 2019-05-01
I've read about INSERT OR REPLACE, but I couldn't figure out how the date condition can be applied. I know as well that I could loop through each pair of similar rows and check the date manually, but that would be very costly in time and performance. Therefore, is there an efficient way to accomplish this in SQLite?
I'm using sqlite3 on Node.js.
The UPSERT notation added in SQLite 3.24 makes this easy; the otherwise redundant WHERE true is needed so the parser doesn't confuse ON CONFLICT with a join's ON clause:
INSERT INTO table1(id, name, createdAt, modifiedAt)
SELECT id, name, createdAt, modifiedAt FROM table2 WHERE true
ON CONFLICT(id) DO UPDATE
SET (name, createdAt, modifiedAt) = (excluded.name, excluded.createdAt, excluded.modifiedAt)
WHERE excluded.modifiedAt > modifiedAt;
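Since the question actually has the two copies of the table in separate database files, the source table can be brought into scope with ATTACH first. A sketch, where the file name other.db and the schema alias other are placeholders:
ATTACH DATABASE 'other.db' AS other;

INSERT INTO table1 (id, name, createdAt, modifiedAt)
SELECT id, name, createdAt, modifiedAt FROM other.table1 WHERE true
ON CONFLICT(id) DO UPDATE
SET (name, createdAt, modifiedAt) = (excluded.name, excluded.createdAt, excluded.modifiedAt)
WHERE excluded.modifiedAt > modifiedAt;

DETACH DATABASE other;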
First create the table Table3:
CREATE TABLE Table3 (
id INTEGER,
name TEXT,
createdat TEXT,
modifiedat TEXT,
PRIMARY KEY(id)
);
and then insert the rows like this:
insert into table3 (id, name, createdat, modifiedat)
select id, name, createdat, modifiedat from (
select * from table1 t1
where not exists (
select 1 from table2 t2
where t2.id = t1.id and t2.modifiedat >= t1.modifiedat
)
union all
select * from table2 t2
where not exists (
select 1 from table1 t1
where t1.id = t2.id and t1.modifiedat > t2.modifiedat
)
)
This uses a UNION ALL for the 2 tables and gets only the needed rows with EXISTS which is a very efficient way to check the condition you want.
I have >= instead of > in the WHERE clause for Table1 in case the 2 tables have a row with the same id and the same modifiedat values.
In this case the row from Table2 will be inserted.
If you want to merge the 2 tables into Table1 you can use REPLACE (which resolves each id conflict by deleting the existing row and inserting the new one):
replace into table1 (id, name, createdat, modifiedat)
select id, name, createdat, modifiedat
from table2 t2
where
not exists (
select 1 from table1 t1
where (t1.id = t2.id and t1.modifiedat > t2.modifiedat)
)
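For the sample data this keeps john (Table 1's row is newer), replaces jane with JANE (Table 2's row is newer), and inserts doe, matching the expected result.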
