I have a table in Azure Data Explorer like this:
| OrderNo | InitDate         | EndDate          | Value |
| ------- | ---------------- | ---------------- | ----- |
| 00001   | 2021/06/23 00:15 | 2021/06/23 00:19 | 5     |
| 00002   | 2021/06/23 00:35 | 2021/06/23 00:43 | 8     |
How can I use a Kusto query to get a table with this information:
| OrderNo | Date             | Value |
| ------- | ---------------- | ----- |
| 00001   | 2021/06/23 00:15 | 0     |
| 00001   | 2021/06/23 00:19 | 5     |
| 00002   | 2021/06/23 00:35 | 5     |
| 00002   | 2021/06/23 00:43 | 13    |
I need one row for each InitDate and one row for each EndDate.
The Value column should contain:
For the row representing the InitDate: 0 for the first order, otherwise the previous value.
For the row representing the EndDate: the previous value plus the order's own value.
How can I achieve this with Kusto in Azure Data Explorer?
Would something like the below work for you?
datatable(OrderNo:string, InitDate:datetime, EndDate:datetime, Value:long)
[
"00001", datetime(2021-06-23 00:15), datetime(2021-06-23 00:19), 5,
"00002", datetime(2021-06-23 00:35), datetime(2021-06-23 00:43), 8
]
| order by InitDate asc
| extend EndValue = row_cumsum(Value)
| extend InitValue = iff(isnull(prev(EndValue)), 0, prev(EndValue))
| extend Date = pack_array(InitDate, EndDate), Value = pack_array(InitValue, EndValue)
| mv-expand Date to typeof(datetime), Value to typeof(long)
| project OrderNo, Date, Value
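To check the logic of the query above outside Kusto, here is a minimal Python sketch of the same running-total idea. The function name `expand_orders` and the tuple layout are assumptions made for illustration; they are not part of the original answer.

```python
from datetime import datetime


def expand_orders(orders):
    """Return one (order_no, date, value) row per InitDate and per EndDate.

    `orders` is a list of (order_no, init_date, end_date, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    rows = []
    running_total = 0  # plays the role of row_cumsum(Value) in the query
    # Sorting by init_date mirrors `order by InitDate asc`.
    for order_no, init_date, end_date, value in sorted(orders, key=lambda o: o[1]):
        # InitDate row: total accumulated before this order (0 for the first one).
        rows.append((order_no, init_date, running_total))
        running_total += value
        # EndDate row: previous total plus this order's own value.
        rows.append((order_no, end_date, running_total))
    return rows


orders = [
    ("00001", datetime(2021, 6, 23, 0, 15), datetime(2021, 6, 23, 0, 19), 5),
    ("00002", datetime(2021, 6, 23, 0, 35), datetime(2021, 6, 23, 0, 43), 8),
]
result = expand_orders(orders)
for row in result:
    print(row[0], row[1].strftime("%Y/%m/%d %H:%M"), row[2])
```

With the sample data from the question, the Value column comes out as 0, 5, 5, 13, which matches the requested table.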
I have many logs, each with its own timestamp, and I am trying to count them on a monthly basis.
Here is a sample table and query using bin(30d):
datatable(Date:datetime, Log:string)[
datetime(2018-02-02T15:14),"log1",
datetime(2018-03-23T12:14),"log2",
datetime(2018-03-24T16:14),"log3",
datetime(2019-04-26T15:14),"log4"]
| summarize count(Log) by bin(Date,30d)
The output I want:
Date count_Log
2018-02 00:00:00.0000000 1
2018-03 00:00:00.0000000 2
2019-04 00:00:00.0000000 1
The output I get:
Date count_Log
2018-01-17 00:00:00.0000000 1 //see the date, it shows JAN but the log is of Feb
2018-03-18 00:00:00.0000000 2
2019-04-12 00:00:00.0000000 1
I need the summary per month. How can I do that? I accept the bin size as a parameter that can take different values like 1h, 1d, 7d, 10d, etc., but there is no timespan value for 1 month.
How can I do this without extracting the month and year manually?
You can use the startofmonth() function, for example:
datatable(Date: datetime, Log: string)
[
datetime(2018-02-02T15:14), "log1",
datetime(2018-03-23T12:14), "log2",
datetime(2018-03-24T16:14), "log3",
datetime(2019-04-26T15:14), "log4"
]
| summarize count() by startofmonth(Date)
| Column1                     | count_ |
| --------------------------- | ------ |
| 2018-02-01 00:00:00.0000000 | 1      |
| 2018-03-01 00:00:00.0000000 | 2      |
| 2019-04-01 00:00:00.0000000 | 1      |
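The reason a fixed 30d bin drifts is that months have different lengths, whereas truncating each timestamp to the first day of its month always lands on a calendar boundary. A small Python sketch of that truncate-then-count idea follows; the helper name `start_of_month` is an assumption for illustration and only imitates what startofmonth() does.

```python
from collections import Counter
from datetime import datetime


def start_of_month(ts):
    """Truncate a datetime to midnight on the first day of its month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


logs = [
    (datetime(2018, 2, 2, 15, 14), "log1"),
    (datetime(2018, 3, 23, 12, 14), "log2"),
    (datetime(2018, 3, 24, 16, 14), "log3"),
    (datetime(2019, 4, 26, 15, 14), "log4"),
]

# Count logs per calendar month, keyed by the month's first day.
counts = Counter(start_of_month(ts) for ts, _ in logs)
for month, n in sorted(counts.items()):
    print(month.isoformat(), n)
```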
I have data that looks like this:
| Id | timestamp         | customdimension |
| -- | ----------------- | --------------- |
| 1  | 1/1/2022 10:00 am | "somedata"      |
| 1  | 1/1/2022 10:01 am | "somedata"      |
| 1  | 1/1/2022 10:20 am | "somedata"      |
| 2  | 1/1/2022 10:05 am | "somedata"      |
| 2  | 1/1/2022 10:07 am | "somedata"      |
| 2  | 1/1/2022 10:30 am | "somedata"      |
I want to keep, for each Id, only the rows whose timestamp is within 5 minutes of the previous row with the same Id. In other words, for any Id, drop every row that comes more than 5 minutes after the previous row.
Expected result (the third row for each Id is dropped because it is more than 5 minutes after the previous row for that Id):
| Id | timestamp         | customdimension |
| -- | ----------------- | --------------- |
| 1  | 1/1/2022 10:00 am | "somedata"      |
| 1  | 1/1/2022 10:01 am | "somedata"      |
| 2  | 1/1/2022 10:05 am | "somedata"      |
| 2  | 1/1/2022 10:07 am | "somedata"      |
I tried using prev(), but I could not figure out how to bucket rows by Id.
requests
| extend PrviousId = prev(Id), timegap = datediff('minute', timestamp, prev(timestamp))
| where timegap < 5
This query does not take Id into account; it only compares consecutive rows. How can I bucket/window the rows by the Id column and then apply prev()?
Thanks
Here is one way to do it:
Test
| order by Id asc, timestamp asc
| extend prevTimestamp = iif(prev(Id) == Id, prev(timestamp), datetime(null))
| where timestamp - prevTimestamp <= 5m or isnull(prevTimestamp)
| project-away prevTimestamp
| Id | timestamp                   | customdimension |
| -- | --------------------------- | --------------- |
| 1  | 2022-01-01 10:00:00.0000000 | "somedata"      |
| 1  | 2022-01-01 10:01:00.0000000 | "somedata"      |
| 2  | 2022-01-01 10:05:00.0000000 | "somedata"      |
| 2  | 2022-01-01 10:07:00.0000000 | "somedata"      |
An alternative is to use the partition operator:
Test
| partition hint.strategy=native by Id
(
order by timestamp asc
| where timestamp - prev(timestamp) <= 5m or isnull(prev(timestamp))
)
| Id | timestamp                   | customdimension |
| -- | --------------------------- | --------------- |
| 1  | 2022-01-01 10:00:00.0000000 | "somedata"      |
| 1  | 2022-01-01 10:01:00.0000000 | "somedata"      |
| 2  | 2022-01-01 10:05:00.0000000 | "somedata"      |
| 2  | 2022-01-01 10:07:00.0000000 | "somedata"      |
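Both queries above rely on the same idea: sort by Id and timestamp, then compare each row only with the previous row of the same Id. A minimal Python sketch of that idea is shown here; the function name `drop_late_rows` and the tuple layout are assumptions for illustration. Like prev() in the queries, it compares against the previous row in the sorted input, not against the previous row that was kept.

```python
from datetime import datetime, timedelta


def drop_late_rows(rows, max_gap=timedelta(minutes=5)):
    """Keep rows that are first for their Id or within max_gap of the previous row.

    `rows` is a list of (id, timestamp, customdimension) tuples
    (a hypothetical layout chosen for this sketch).
    """
    kept = []
    prev_id, prev_ts = None, None
    # Sorting by (id, timestamp) mirrors `order by Id asc, timestamp asc`.
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        row_id, ts = row[0], row[1]
        # A change of Id means there is no previous row for this Id.
        if row_id != prev_id or ts - prev_ts <= max_gap:
            kept.append(row)
        prev_id, prev_ts = row_id, ts
    return kept


rows = [
    (1, datetime(2022, 1, 1, 10, 0), "somedata"),
    (1, datetime(2022, 1, 1, 10, 1), "somedata"),
    (1, datetime(2022, 1, 1, 10, 20), "somedata"),
    (2, datetime(2022, 1, 1, 10, 5), "somedata"),
    (2, datetime(2022, 1, 1, 10, 7), "somedata"),
    (2, datetime(2022, 1, 1, 10, 30), "somedata"),
]
kept = drop_late_rows(rows)
for row in kept:
    print(row[0], row[1].isoformat(), row[2])
```

With the sample data, the 10:20 row for Id 1 and the 10:30 row for Id 2 are dropped, leaving four rows.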
I have a table with the columns id, date, and name:
id date name
1 2019-08-01 00:00:00 abc
1 2019-08-01 00:00:00 def
2 2019-08-01 00:00:00 pqr
1 2019-08-31 00:00:00 def
I want to get the count of ids for a given month.
For month 8, the expected count is 3.
SELECT strftime('%Y/%m/%d', date) AS vdate, count(DISTINCT vdate, id) AS totalcount
FROM cardtable
WHERE date BETWEEN date('" + $rootScope.mydate + "', 'start of month')
AND date('" + $rootScope.mydate + "', 'start of month', '+1 month', '-1 day')
GROUP BY vdate
Basically, I want to count distinct combinations of id and date. For example:
If there are 2 entries on 2019-08-01 with the same id, the count should be 1.
If there are 3 entries on 2019-08-01, 2 of them with id 1 and the third with id 2, the count should be 2.
If there are 2 entries with id 1 on different dates, say one on 2019-08-01 and the other on 2019-08-31, the count for month 8 should be 2.
How can I modify the above query?
Use a subquery which returns the distinct values that you want to count:
SELECT COUNT(*) AS totalcount
FROM (
SELECT DISTINCT strftime('%Y/%m/%d', date), id
FROM cardtable
WHERE date(date) BETWEEN
date('" + $rootScope.mydate + "', 'start of month')
AND
date('" + $rootScope.mydate + "','start of month','+1 month','-1 day')
)
See the demo.
Results:
| totalcount |
| ---------- |
| 3 |
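The subquery approach can be tried against an in-memory SQLite database with Python's standard sqlite3 module, as in the sketch below. Two things here are assumptions rather than part of the original answer: the month is passed as a bound parameter (`:d`) instead of being concatenated from `$rootScope.mydate`, and the sample value '2019-08-15' is just an arbitrary day in August.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cardtable (id INTEGER, date TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO cardtable VALUES (?, ?, ?)",
    [
        (1, "2019-08-01 00:00:00", "abc"),
        (1, "2019-08-01 00:00:00", "def"),
        (2, "2019-08-01 00:00:00", "pqr"),
        (1, "2019-08-31 00:00:00", "def"),
    ],
)

# The inner query keeps one row per distinct (day, id) pair in the month;
# the outer query counts those rows.
query = """
SELECT COUNT(*) AS totalcount
FROM (
  SELECT DISTINCT strftime('%Y/%m/%d', date), id
  FROM cardtable
  WHERE date(date) BETWEEN
    date(:d, 'start of month')
    AND
    date(:d, 'start of month', '+1 month', '-1 day')
)
"""
total = conn.execute(query, {"d": "2019-08-15"}).fetchone()[0]
print(total)  # 3 for the sample rows above
conn.close()
```

Binding the date as a parameter also avoids building SQL by string concatenation, which is worth considering if the value comes from user input.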
I need to compare a value in one column with the previous row's value in another column. For example, I have this table:
id | create_date | end_date
1 | 2016-12-31 | 2017-01-25
2 | 2017-01-26 | 2017-05-21
3 | 2017-05-22 | 2017-08-26
4 | 2017-09-01 | 2017-09-02
I need to compare create_date for id = 2 with end_date for id = 1
and compare create_date for id = 3 with end_date for id = 2 etc.
Result: show the ids for which create_date (id = n) <> end_date (id = n-1) + interval '1' day
Should I use the lag() function? If so, how do I write the comparison?
Thank you
Teradata doesn't have lag/lead, but you can still get the same functionality:
select
id,
create_date,
end_date,
max(end_date) over (order by id rows between 1 preceding and 1 preceding) as prev_end_date
...
qualify
create_date <> prev_end_date + INTERVAL '1' day;
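The window frame `rows between 1 preceding and 1 preceding` contains exactly one row, the previous one, so the aggregate simply returns that row's end_date. The same pairing of each row with its predecessor can be sketched in Python to check which ids the query should return for the sample data; the function name `ids_with_gap` is an assumption for illustration.

```python
from datetime import date, timedelta


def ids_with_gap(rows):
    """Return ids whose create_date is not the previous end_date plus one day.

    `rows` is a list of (id, create_date, end_date) tuples. The first row has
    no predecessor and is never reported, just as a NULL prev_end_date never
    satisfies the <> comparison in SQL.
    """
    ordered = sorted(rows)  # order by id
    return [
        cur[0]
        for prev, cur in zip(ordered, ordered[1:])
        if cur[1] != prev[2] + timedelta(days=1)
    ]


rows = [
    (1, date(2016, 12, 31), date(2017, 1, 25)),
    (2, date(2017, 1, 26), date(2017, 5, 21)),
    (3, date(2017, 5, 22), date(2017, 8, 26)),
    (4, date(2017, 9, 1), date(2017, 9, 2)),
]
gaps = ids_with_gap(rows)
print(gaps)  # [4]: 2017-09-01 is not 2017-08-26 plus one day
```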
This is the sample data.
created_date start_date
2014-12-11 2014-12-10
2014-12-11 2014-12-11
2014-12-12 2014-12-13
2014-12-13 NULL
2014-12-13 2014-12-13
2014-12-13 2014-12-13
2014-12-23 NULL
2014-12-23 NULL
I'd like to count, for each created_date, how many rows have a start_date filled in. The value of start_date is not important; only the number of non-empty start_dates is meaningful.
In this case, the result should look like this:
created_date count
2014-12-11 2
2014-12-12 1
2014-12-13 2
2014-12-23 0
I cannot simply use table() because:
table(created_date) will count created_date, not start_date.
>table(created_date)
created_date count
2014-12-11 2
2014-12-12 1
2014-12-13 3
2014-12-23 2
table(start_date) won't work either, since it doesn't show which created_date the "NULL" rows belong to and, more importantly, the value of start_date itself is meaningless.
>table(start_date)
start_date count
2014-12-10 1
2014-12-11 1
2014-12-13 3
NULL 3
I guess a for loop should be used, but I have no idea how to code that.
Thanks in advance!
Short version: use table() separately on the complete data and on the rows with an empty start_date only, then subtract the second from the first.
Long version:
Assuming your data is in x (and that NULLs are actually NAs, see Gist for details):
Count the entries, and put them into data_frames for convenience:
library(dplyr)
all_counts = as_data_frame(table(x$created_date))
na_counts = as_data_frame(table(x[is.na(x$start_date), ]$created_date))
Subtract the na_counts from the all_counts. To do this, we first need to join these two tables. Joining will introduce NAs, which we will replace by 0s:
full_join(all_counts, na_counts, by = 'Var1') %>%
mutate(n.y = ifelse(is.na(n.y), 0, n.y)) %>%
mutate(count = n.x - n.y) %>% # And finally, subtract the counts.
select(created_date = Var1, count)
Result:
| created_date | count |
|:---------------|--------:|
| 2014-12-11 | 2 |
| 2014-12-12 | 1 |
| 2014-12-13 | 2 |
| 2014-12-23 | 0 |
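The same result can also be reached in a single pass, by starting every created_date at 0 and adding 1 only when start_date is present. The sketch below shows that alternative in Python rather than R; the function name `count_filled` is an assumption for illustration, and None stands in for NULL/NA.

```python
def count_filled(rows):
    """Count rows with a non-empty start_date for each created_date.

    `rows` is a list of (created_date, start_date) pairs, with None for NULL.
    """
    counts = {}
    for created_date, start_date in rows:
        # Register every created_date so that all-NULL days show up as 0.
        counts.setdefault(created_date, 0)
        if start_date is not None:
            counts[created_date] += 1
    return counts


rows = [
    ("2014-12-11", "2014-12-10"),
    ("2014-12-11", "2014-12-11"),
    ("2014-12-12", "2014-12-13"),
    ("2014-12-13", None),
    ("2014-12-13", "2014-12-13"),
    ("2014-12-13", "2014-12-13"),
    ("2014-12-23", None),
    ("2014-12-23", None),
]
counts = count_filled(rows)
for created_date, n in counts.items():
    print(created_date, n)
```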