Kusto Query to transform the results in another table


I have a table like this in Azure Data Explorer:
| OrderNo | InitDate | EndDate | Value |
|---------|------------------|------------------|-------|
| 00001 | 2021/06/23 00:15 | 2021/06/23 00:19 | 5 |
| 00002 | 2021/06/23 00:35 | 2021/06/23 00:43 | 8 |
How can I use a Kusto query to get a table with this information:
| OrderNo | Date | Value |
|---------|------------------|-------|
| 00001 | 2021/06/23 00:15 | 0 |
| 00001 | 2021/06/23 00:19 | 5 |
| 00002 | 2021/06/23 00:35 | 5 |
| 00002 | 2021/06/23 00:43 | 13 |
I need one row for each InitDate and one row for each EndDate.
For the Value column I need:
If the row represents the InitDate: 0 (for the first order) or the previous running total.
If the row represents the EndDate: the previous running total plus the order's own Value.
How can I achieve this with Kusto in Azure Data Explorer?

Would something like the below work for you?
datatable(OrderNo:string, InitDate:datetime, EndDate:datetime, Value:long)
[
"00001", datetime(2021-06-23 00:15), datetime(2021-06-23 00:19), 5,
"00002", datetime(2021-06-23 00:35), datetime(2021-06-23 00:43), 8
]
| order by InitDate asc // serializes the rows so that row_cumsum() and prev() can be used
| extend EndValue = row_cumsum(Value) // running total up to and including this order
| extend InitValue = iff(isnull(prev(EndValue)), 0, prev(EndValue)) // running total before this order (0 for the first row)
| extend Date = pack_array(InitDate, EndDate), Value = pack_array(InitValue, EndValue)
| mv-expand Date to typeof(datetime), Value to typeof(long) // one row for InitDate, one for EndDate
| project OrderNo, Date, Value
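To make the steps of the query easier to follow, here is a minimal plain-Python model of the same logic (a sketch, not KQL and not part of the original answer). It assumes the rows fit in memory as `(order_no, init_date, end_date, value)` tuples; the function name `expand_orders` is hypothetical. It sorts by InitDate, keeps a running total (the role of `row_cumsum()`), and emits an InitDate row carrying the previous total and an EndDate row carrying the new total (the role of `mv-expand`).

```python
from datetime import datetime


def expand_orders(rows):
    """rows: iterable of (order_no, init_date, end_date, value) tuples (hypothetical shape)."""
    result = []
    running_total = 0  # plays the role of row_cumsum(Value)
    for order_no, init_date, end_date, value in sorted(rows, key=lambda r: r[1]):
        init_value = running_total  # like prev(EndValue), with 0 for the first row
        running_total += value      # EndValue for this order
        result.append((order_no, init_date, init_value))
        result.append((order_no, end_date, running_total))
    return result


orders = [
    ("00001", datetime(2021, 6, 23, 0, 15), datetime(2021, 6, 23, 0, 19), 5),
    ("00002", datetime(2021, 6, 23, 0, 35), datetime(2021, 6, 23, 0, 43), 8),
]

for order_no, when, value in expand_orders(orders):
    print(order_no, when.strftime("%Y/%m/%d %H:%M"), value)
```

Taking the total before adding the current Value is equivalent to the answer's `prev(EndValue)` with a default of 0; in KQL the same result could also be written as `EndValue - Value`.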

Related

Kusto Query Language - Round datetime to nearest month using bin

I have plenty of logs, each with its own timestamp, and I am trying to count the logs on a monthly basis.
Here is a sample table and query using bin(30d):
datatable(Date:datetime, Log:string)[
datetime(2018-02-02T15:14),"log1",
datetime(2018-03-23T12:14),"log2",
datetime(2018-03-24T16:14),"log3",
datetime(2019-04-26T15:14),"log4"]
| summarize count(Log) by bin(Date,30d)
The output I want:
Date count_Log
2018-02 00:00:00.0000000 1
2018-03 00:00:00.0000000 2
2019-04 00:00:00.0000000 1
The output I get:
Date count_Log
2018-01-17 00:00:00.0000000 1 // note the date: it shows January, but the log is from February
2018-03-18 00:00:00.0000000 2
2019-04-12 00:00:00.0000000 1
I need the summary per month. How can I do that? I pass the bin size in as a parameter with values such as 1h, 1d, 7d, 10d, etc., but there is no timespan for 1 month.
So how can I do it without having to extract the month/year manually?

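You can use the startofmonth() function. bin() rounds down to fixed-width buckets, and because a 30d bucket is always exactly 30 days long, its boundaries don't line up with calendar months; startofmonth() instead returns the first day of the calendar month that the datetime falls in.
For example: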
datatable(Date: datetime, Log: string)
[
datetime(2018-02-02T15:14), "log1",
datetime(2018-03-23T12:14), "log2",
datetime(2018-03-24T16:14), "log3",
datetime(2019-04-26T15:14), "log4"
]
| summarize count() by startofmonth(Date)
| Column1 | count_ |
|-----------------------------|--------|
| 2018-02-01 00:00:00.0000000 | 1 |
| 2018-03-01 00:00:00.0000000 | 2 |
| 2019-04-01 00:00:00.0000000 | 1 |
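The difference between the two approaches can be modelled in plain Python (a sketch, not KQL). The hypothetical `fixed_bin` helper assumes that `bin()` floors each datetime onto a grid of equal-width buckets anchored at 0001-01-01; under that assumption it reproduces the 2018-01-17, 2018-03-18 and 2019-04-12 buckets shown in the question. The hypothetical `start_of_month` helper gives the calendar-month buckets from the answer.

```python
from collections import Counter
from datetime import date, datetime, timedelta

ANCHOR = date(1, 1, 1)  # assumed origin of the bin() grid


def fixed_bin(ts, width_days):
    # Floor to a multiple of width_days counted from ANCHOR. Time of day is ignored,
    # which is safe here because the bucket width is a whole number of days.
    offset = (ts.date() - ANCHOR).days
    return ANCHOR + timedelta(days=offset - offset % width_days)


def start_of_month(ts):
    # First day of the calendar month that contains ts.
    return ts.date().replace(day=1)


logs = [
    datetime(2018, 2, 2, 15, 14),
    datetime(2018, 3, 23, 12, 14),
    datetime(2018, 3, 24, 16, 14),
    datetime(2019, 4, 26, 15, 14),
]

binned = Counter(fixed_bin(ts, 30) for ts in logs)
monthly = Counter(start_of_month(ts) for ts in logs)
print(sorted(binned.items()))
print(sorted(monthly.items()))
```

The fixed grid drifts relative to month boundaries, which is why the February log lands in a bucket labelled with a January date.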

how to apply prev on group of rows with same value for column

I have data that looks like this:
| Id | timestamp | customdimension |
|----|-------------------|-----------------|
| 1 | 1/1/2022 10:00 am | "somedata" |
| 1 | 1/1/2022 10:01 am | "somedata" |
| 1 | 1/1/2022 10:20 am | "somedata" |
| 2 | 1/1/2022 10:05 am | "somedata" |
| 2 | 1/1/2022 10:07 am | "somedata" |
| 2 | 1/1/2022 10:30 am | "somedata" |
I want to group rows that have the same Id and whose timestamp is within 5 minutes of the previous row (with the same Id). In other words, for each Id, drop any row that comes more than 5 minutes after the previous row.
Expected result (the third row for each Id is dropped because it is more than 5 minutes after the previous row for that Id):
| Id | timestamp | customdimension |
|----|-------------------|-----------------|
| 1 | 1/1/2022 10:00 am | "somedata" |
| 1 | 1/1/2022 10:01 am | "somedata" |
| 2 | 1/1/2022 10:05 am | "somedata" |
| 2 | 1/1/2022 10:07 am | "somedata" |
I tried using prev(), but I can't figure out how to bucket the rows by Id:
requests
| extend PrviousId = prev(Id), timegap = datediff('minute', timestamp, prev(timestamp))
| where timegap < 5
This query doesn't take Id into account; it just compares consecutive rows. How can I bucket/window the rows by the Id column and then apply prev()?
Thanks

Here is one way to do it:
Test
| order by Id asc, timestamp asc
| extend prevTimestamp = iif(prev(Id) == Id, prev(timestamp), datetime(null))
| where timestamp - prevTimestamp <=5m or isnull(prevTimestamp)
| project-away prevTimestamp
| Id | timestamp | customdimension |
|----|-----------------------------|-----------------|
| 1 | 2022-01-01 10:00:00.0000000 | "somedata" |
| 1 | 2022-01-01 10:01:00.0000000 | "somedata" |
| 2 | 2022-01-01 10:05:00.0000000 | "somedata" |
| 2 | 2022-01-01 10:07:00.0000000 | "somedata" |
An alternative is to use the partition operator:
Test
| partition hint.strategy=native by Id
(
order by timestamp asc
| where timestamp - prev(timestamp) <= 5m or isnull(prev(timestamp))
)
| Id | timestamp | customdimension |
|----|-----------------------------|-----------------|
| 1 | 2022-01-01 10:00:00.0000000 | "somedata" |
| 1 | 2022-01-01 10:01:00.0000000 | "somedata" |
| 2 | 2022-01-01 10:05:00.0000000 | "somedata" |
| 2 | 2022-01-01 10:07:00.0000000 | "somedata" |

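Both queries apply the same rule: within each Id, ordered by timestamp, keep a row if it is the first one or if it comes at most 5 minutes after the immediately preceding row. A plain-Python sketch of that rule (not KQL; the function name `drop_late_rows` and the tuple layout are hypothetical) looks like this. As with `prev()`, the "previous row" is the preceding row in the sorted input, even if that row was itself dropped.

```python
from datetime import datetime, timedelta


def drop_late_rows(rows, max_gap=timedelta(minutes=5)):
    """rows: (id, timestamp, customdimension) tuples (hypothetical shape)."""
    kept = []
    prev_id = prev_ts = None
    for row_id, ts, dim in sorted(rows, key=lambda r: (r[0], r[1])):
        # A new Id starts a new group; otherwise compare with the previous row of this Id.
        if row_id != prev_id or ts - prev_ts <= max_gap:
            kept.append((row_id, ts, dim))
        prev_id, prev_ts = row_id, ts  # updated even for dropped rows, like prev()
    return kept


requests = [
    (1, datetime(2022, 1, 1, 10, 0), "somedata"),
    (1, datetime(2022, 1, 1, 10, 1), "somedata"),
    (1, datetime(2022, 1, 1, 10, 20), "somedata"),
    (2, datetime(2022, 1, 1, 10, 5), "somedata"),
    (2, datetime(2022, 1, 1, 10, 7), "somedata"),
    (2, datetime(2022, 1, 1, 10, 30), "somedata"),
]

kept = drop_late_rows(requests)
for row_id, ts, dim in kept:
    print(row_id, ts, dim)
```

Sorting by `(Id, timestamp)` corresponds to the first query's `order by Id asc, timestamp asc`, and the Id-change check corresponds to the `prev(Id) == Id` test (or to `partition by Id` in the second query).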
How to get count of multiple distinct columns with one column as date

I have a table with the columns id, date and name:
id date name
1 2019-08-01 00:00:00 abc
1 2019-08-01 00:00:00 def
2 2019-08-01 00:00:00 pqr
1 2019-08-31 00:00:00 def
I want to get the count of ids for a given month.
The expected count of ids for month 8 is 3.
SELECT strftime('%Y/%m/%d', date) as vdate,count(DISTINCT vdate,id) AS totalcount FROM cardtable WHERE date BETWEEN date('" + $rootScope.mydate + "', 'start of month') AND date('" + $rootScope.mydate + "','start of month','+1 month','-1 day') group by vdate
Basically, I want to count rows where the combination of id and date is distinct. For example: if there are 2 entries on 2019-08-01 with the same id, the count should be 1; if there are 3 entries on 2019-08-01, 2 with id 1 and the third with id 2, the count should be 2; and if there are 2 entries with id 1 on different dates, say one on 2019-08-01 and the other on 2019-08-31, the count for month 8 should be 2. How can I modify the above query?

Use a subquery which returns the distinct values that you want to count:
SELECT COUNT(*) AS totalcount
FROM (
SELECT DISTINCT strftime('%Y/%m/%d', date), id
FROM cardtable
WHERE date(date) BETWEEN
date('" + $rootScope.mydate + "', 'start of month')
AND
date('" + $rootScope.mydate + "','start of month','+1 month','-1 day')
)
Results:
| totalcount |
| ---------- |
| 3 |
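The answer's query can be checked against the question's sample rows with Python's built-in `sqlite3` module, assuming the database is SQLite (which the `strftime()` and `'start of month'` modifiers suggest). In this sketch the `$rootScope.mydate` string concatenation is replaced by a named parameter `:mydate`; that substitution, the table schema and the helper name `make_demo_db` are assumptions, not part of the original answer.

```python
import sqlite3

QUERY = """
SELECT COUNT(*) AS totalcount
FROM (
    SELECT DISTINCT strftime('%Y/%m/%d', date), id
    FROM cardtable
    WHERE date(date) BETWEEN date(:mydate, 'start of month')
                         AND date(:mydate, 'start of month', '+1 month', '-1 day')
)
"""


def make_demo_db():
    # In-memory table holding the sample rows from the question (assumed schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cardtable (id INTEGER, date TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO cardtable (id, date, name) VALUES (?, ?, ?)",
        [
            (1, "2019-08-01 00:00:00", "abc"),
            (1, "2019-08-01 00:00:00", "def"),
            (2, "2019-08-01 00:00:00", "pqr"),
            (1, "2019-08-31 00:00:00", "def"),
        ],
    )
    return conn


conn = make_demo_db()
# Any date inside August 2019 selects the whole month.
totalcount = conn.execute(QUERY, {"mydate": "2019-08-15"}).fetchone()[0]
print(totalcount)
```

Binding the date as a parameter also avoids building SQL by string concatenation.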

Compare rows in different column Teradata

I need to compare the value in one column with the previous row's value in another column. For example, I have this table:
id | create_date | end_date
1 | 2016-12-31 | 2017-01-25
2 | 2017-01-26 | 2017-05-21
3 | 2017-05-22 | 2017-08-26
4 | 2017-09-01 | 2017-09-02
I need to compare create_date for id = 2 with end_date for id = 1,
create_date for id = 3 with end_date for id = 2, and so on.
Result: show me the ids where create_date (id = n) <> end_date (id = n-1) + interval '1' day.
Should I use the lag() function? Which function should I use, and how?
Thank you

Teradata doesn't have lag/lead, but you can still get the same functionality:
select
id,
create_date,
end_date,
max(end_date) over (order by id rows between 1 preceding and 1 preceding) as prev_end_date
...
qualify
create_date <> prev_end_date + INTERVAL '1' day;
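The window `rows between 1 preceding and 1 preceding` looks at exactly one row back, so `max(end_date)` over it is simply the previous row's end_date (NULL for the first row, which the QUALIFY comparison then filters out). A plain-Python sketch of the same check (not Teradata SQL; the function name `find_gaps` is hypothetical) on the question's sample data:

```python
from datetime import date, timedelta


def find_gaps(rows):
    """rows: (id, create_date, end_date) tuples. Return the ids whose create_date
    is not the day after the previous row's end_date, ordered by id."""
    gaps = []
    prev_end = None
    for row_id, create_date, end_date in sorted(rows):
        # The first row has no predecessor, like the NULL prev_end_date in SQL.
        if prev_end is not None and create_date != prev_end + timedelta(days=1):
            gaps.append(row_id)
        prev_end = end_date
    return gaps


periods = [
    (1, date(2016, 12, 31), date(2017, 1, 25)),
    (2, date(2017, 1, 26), date(2017, 5, 21)),
    (3, date(2017, 5, 22), date(2017, 8, 26)),
    (4, date(2017, 9, 1), date(2017, 9, 2)),
]

print(find_gaps(periods))
```

Only id 4 is reported, because 2017-09-01 is not the day after 2017-08-26.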

How can I count cells using for loop in r? (table() is not applicable)

This is the sample data.
created_date start_date
2014-12-11 2014-12-10
2014-12-11 2014-12-11
2014-12-12 2014-12-13
2014-12-13 NULL
2014-12-13 2014-12-13
2014-12-13 2014-12-13
2014-12-23 NULL
2014-12-23 NULL
I'd like to count how many start_dates were checked (i.e., are not NULL) on each day, grouped by created_date. The value of start_date is not important; only the number of start_dates checked is meaningful.
In this case, the result of the for loop should look like this:
created_date count
2014-12-11 2
2014-12-12 1
2014-12-13 2
2014-12-23 0
I cannot simply use table() because:
table(created_date) counts the rows per created_date, not the start_dates.
>table(created_date)
created_date count
2014-12-11 2
2014-12-12 1
2014-12-13 3
2014-12-23 2
table(start_date) won't work either: it counts the start_date values themselves instead of grouping them by created_date (the NULL rows end up in their own "NULL" bucket), and, more importantly, the value of start_date itself is meaningless.
>table(start_date)
start_date count
2014-12-10 1
2014-12-11 1
2014-12-13 3
NULL 3
I guess a for loop should be used, but I have no idea how to code it.
Thanks in advance!

Short version: use table() separately on the complete data and on the rows with an empty start_date only, then subtract the second from the first.
Long version:
Assuming your data is in x (and that NULLs are actually NAs, see Gist for details):
Count the entries, and put them into data_frames for convenience:
library(dplyr)
all_counts = as_data_frame(table(x$created_date))
na_counts = as_data_frame(table(x[is.na(x$start_date), ]$created_date))
Subtract na_counts from all_counts. To do this, we first need to join these two tables. Joining will introduce NAs, which we will replace with 0s:
full_join(all_counts, na_counts, by = 'Var1') %>%
mutate(n.y = ifelse(is.na(n.y), 0, n.y)) %>%
mutate(count = n.x - n.y) %>% # And finally, subtract the counts.
select(created_date = Var1, count)
Result:
| created_date | count |
|:---------------|--------:|
| 2014-12-11 | 2 |
| 2014-12-12 | 1 |
| 2014-12-13 | 2 |
| 2014-12-23 | 0 |
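The "all rows minus NA rows" idea can also be collapsed into a single pass: count the non-missing start_date values per created_date, while still emitting 0 for dates that only have NULLs. The sketch below is in Python rather than R, so it is not the answer's dplyr code; `None` stands in for NULL/NA and the function name `count_start_dates` is hypothetical.

```python
def count_start_dates(rows):
    """rows: (created_date, start_date) pairs, with None standing in for NULL/NA."""
    counts = {}
    for created_date, start_date in rows:
        # Adds 1 only when start_date is present; a date with only NULLs still gets a 0 entry.
        counts[created_date] = counts.get(created_date, 0) + (start_date is not None)
    return counts


sample = [
    ("2014-12-11", "2014-12-10"),
    ("2014-12-11", "2014-12-11"),
    ("2014-12-12", "2014-12-13"),
    ("2014-12-13", None),
    ("2014-12-13", "2014-12-13"),
    ("2014-12-13", "2014-12-13"),
    ("2014-12-23", None),
    ("2014-12-23", None),
]

for created_date, count in count_start_dates(sample).items():
    print(created_date, count)
```

Because every created_date gets a key on first sight, no join or NA replacement step is needed.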
