kusto query - how to group by date and also group by name - azure-data-explorer

In the query below I'm looking at one API's (foo/bar1) 80th-percentile duration over a given date range, so I can see whether there is any spike or degradation.
let dataset = requests
| where name == "GET foo/bar1"
    and timestamp between (datetime('2020-10-15') .. datetime('2020-10-28'));
dataset
| summarize loadTime = round(percentile(duration, 80)) by day = format_datetime(timestamp, 'yyyy-MM-dd')
| order by day desc
The challenge I'm facing is that there can be more than one API (there are about 150 in my environment), and I also want each of those APIs' 80th percentile, but I'm having difficulty working out how to do it, or whether it's even possible.

I may have figured this out: remove the name filter from the dataset, then add name to the grouping clause at the end of the summarize row.
let dataset = requests
| where timestamp between (datetime('2020-10-25') .. datetime('2020-10-28'));
dataset
| summarize loadTime = round(percentile(duration, 80)) by day = format_datetime(timestamp, 'yyyy-MM-dd'), name
| order by day desc
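As a side note, bin() (which also comes up in the answers below) keeps the grouping column as a datetime instead of a string, which makes ordering and charting easier. A sketch of the same per-API grouping with bin(), plus a timechart to spot spikes:
let dataset = requests
| where timestamp between (datetime('2020-10-25') .. datetime('2020-10-28'));
dataset
| summarize loadTime = round(percentile(duration, 80)) by bin(timestamp, 1d), name
| render timechart // one line per API name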

Related

Pyspark GroupBy time span

I have data with a start and end date e.g.
+---+----------+------------+
| id| start| end|
+---+----------+------------+
| 1|2021-05-01| 2022-02-01|
| 2|2021-10-01| 2021-12-01|
| 3|2021-11-01| 2022-01-01|
| 4|2021-06-01| 2021-10-01|
| 5|2022-01-01| 2022-02-01|
| 6|2021-08-01| 2021-12-01|
+---+----------+------------+
I want a count, for each month, of how many observations were "active", in order to display that in a plot. By "active" I mean observations whose start and end dates span the given month. The result for the example data should look like this:
[Plot: number of active observations per month]
I have looked into the pyspark Window function, but I don't think that can help me with my problem. So far my only idea is to specify an extra column for each month in the data and indicate whether the observation is active in that month and work from there. But I feel like there must be a much more efficient way to do this.
You can use the sequence SQL function. sequence creates a date range from start to end at the given interval and returns it as a list.
Then you can use explode to flatten the list and count rows per month.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Make sure your Spark session is set to UTC.
# This SQL won't work well with a month interval if the timezone is set to a place that has daylight saving.
spark = (SparkSession
         .builder
         .config('spark.sql.session.timeZone', 'UTC')
         ...  # other config
         .getOrCreate())

# One row per (id, month) the observation is active in, then count per month.
df = (df.withColumn('range', F.expr('sequence(to_date(`start`), to_date(`end`), interval 1 month)'))
        .withColumn('observation', F.explode('range')))
df = df.groupby('observation').count()

Using KQL (Kusto query language), how to group datetimes into weeks (or 7-day chunks)?

I am running KQL (Kusto query language) queries against Azure Application Insights. I have certain measurements that I want to aggregate weekly. I am trying to figure out how to split my data into weeks.
To illustrate what I seek, here is a query that computes daily averages of the duration column.
requests
| where timestamp > ago(7d)
| summarize
    avg(duration)
    by
    Date = format_datetime(timestamp, "yyyy-MM-dd")
This produces a table with one row per day and that day's average duration.
In the above I have converted the datetimes to strings and thus effectively "rounded them down" to the precision of one day. This may be ugly, but it's the easiest way I could think of to group all results from a given day. It would be trivial to round down to months or years with the same technique.
But what if I want to group datetimes by week? Is there a nice way to do that?
I do not care whether my "weeks" start on Monday or Sunday or January 1st or whatever. I just want to group a collection of KQL datetimes into 7-day chunks. How can I do that?
Thanks in advance!
Looks like you are looking for the "bin()" function:
requests
| where timestamp > ago(7d)
| summarize
    avg(duration)
    by
    bin(timestamp, 1d) // one day; for 7 days change it to 7d
I found out that I can use the week_of_year function to split datetimes by week number:
requests
| where timestamp > ago(30d)
| summarize
    avg(duration)
    by
    Week = week_of_year(timestamp)
| sort by Week
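One caveat: week_of_year collapses weeks from different years into the same bucket if the range crosses a year boundary. A sketch that instead anchors each row to the start of its calendar week, using the built-in startofweek() function:
requests
| where timestamp > ago(30d)
| summarize avg(duration) by Week = startofweek(timestamp)
| sort by Week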

Make the value 0 if rows are not available in Kusto

My query has a count function that returns the count of rows summarized by day. When there are no rows in that table, I get no result at all; instead I need rows for all days, with the count as zero. I tried coalesce but it didn't work. Any help is much appreciated!
Thanks!
Here is my query:
exceptions
| where name == 'my_scheduler' and timestamp > ago(30d)
| extend day = split(tostring(timestamp + 19800s), 'T')[0]
| summarize schedulerFailed = coalesce(count(),tolong("0")) by tostring(day)
Instead of summarize, you need to use make-series, which will fill the gaps with a default value for you.
exceptions
| where name == 'my_scheduler' and timestamp > ago(30d)
| extend day = todatetime(split(tostring(timestamp + 19800s), 'T')[0]) // cast back to datetime so it can be the series axis
| make-series schedulerFailed = count() on day step 1d
You might want to add from and to to make-series so that it also fills gaps at the beginning and the end of the 30-day period.
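For instance, a sketch with explicit bounds (same schema as above; the 19800s offset is left out here, so days are in UTC):
exceptions
| where name == 'my_scheduler' and timestamp > ago(30d)
| make-series schedulerFailed = count() on timestamp from bin(ago(30d), 1d) to now() step 1d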

Aggregate/Summarize Timeseries data in Azure Data Explorer using Kusto

I have a requirement where I need to regularize/aggregate data that is polled every second into 1-minute intervals. I also have two columns that need to be aggregated, say SensorName and SensorValue. I am able to bin the timestamp to 1 minute, but I am not able to get the corresponding two columns. How do I do that? Below is the query I used.
Table
| where TimeStamp between (datetime(2020-09-01)..datetime(2020-09-30))
| summarize by bin(TimeStamp, 1min)
Based on my understanding of the question (which could be wrong, as there's no clear specification of sample input/schema and matching output), you could try the following example - it calculates the average sensor value for each sensor name, using an aggregation span of 1 minute:
Table
| where TimeStamp between (datetime(2020-09-01)..datetime(2020-09-30))
| summarize avg(SensorValue) by SensorName, bin(TimeStamp, 1min)
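If more than one aggregate per sensor is needed, summarize accepts several at once - a sketch over the same assumed schema:
Table
| where TimeStamp between (datetime(2020-09-01)..datetime(2020-09-30))
| summarize avg(SensorValue), min(SensorValue), max(SensorValue) by SensorName, bin(TimeStamp, 1min)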

Getting the sum of a category in a specific month in sqlite

I am trying to get the sum of all categories from a certain month from my transactions table in my sqlite database. Here is how the table is set up...
| id | transactionDate | transactionAmount | transactionCategory | transactionAccount |
Now, I want to specify three things:
The account name
The month
The year
And get the sum of the transactionAmount grouped by transactionCategory from the specified account, year, and month.
Here is what my SELECT statement looks like...
SELECT SUM(transactionAmount) AS total, transactionDate, transactionCategory
FROM transactions
WHERE transactionAccount=? AND strftime('%m', transactionDate)=? AND strftime('%y', transactionDate)=?
GROUP BY transactionCategory ORDER BY transactionCategory
Unfortunately, this returns zero rows. I am able to get accurate results if I don't try to filter by the month and year, but I would like to see the data from specific ranges of time.
I figured out the issue. I was simply formatting the year incorrectly. It should have been strftime('%Y', transactionDate)=? NOT strftime('%y', transactionDate)=? - the difference being a capital Y vs. a lowercase one.
