Aggregate/Summarize Timeseries data in Azure Data Explorer using Kusto - azure-data-explorer

I have a requirement where I need to regularize/aggregate data that is polled every second into 1-minute intervals. I also have two columns that need to be aggregated, say SensorName and SensorValue. I am able to bin the timestamp to 1 minute, but I am not able to get the corresponding two columns. How do I do that? Below is the query I used and the output I get.
Table
| where TimeStamp between (datetime(2020-09-01)..datetime(2020-09-30))
| summarize by bin(TimeStamp , 1min)

Based on my understanding of the question (which could be wrong, as there's no clear specification of the sample input/schema and matching output), you could try the following example. It calculates the average sensor value for each sensor name, using an aggregation span of 1 minute:
Table
| where TimeStamp between (datetime(2020-09-01)..datetime(2020-09-30))
| summarize avg(SensorValue) by SensorName, bin(TimeStamp, 1min)
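To make the semantics of that query concrete, here is a minimal pure-Python sketch of what it computes: each timestamp is floored to the start of its minute (as bin(TimeStamp, 1min) does), and readings are averaged per (SensorName, minute) pair. The function name summarize_avg_per_minute and the sample readings are hypothetical; in ADX the query itself does this work.

```python
from collections import defaultdict
from datetime import datetime


def summarize_avg_per_minute(rows):
    """Average SensorValue per (SensorName, 1-minute bin).

    rows: iterable of (timestamp, sensor_name, sensor_value) tuples.
    Returns a dict mapping (sensor_name, minute_start) -> average value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, name, value in rows:
        # Floor the timestamp to the start of its minute, like bin(TimeStamp, 1min).
        minute_start = ts.replace(second=0, microsecond=0)
        key = (name, minute_start)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical 1-second readings from two sensors.
readings = [
    (datetime(2020, 9, 1, 10, 0, 0), "TempA", 20.0),
    (datetime(2020, 9, 1, 10, 0, 1), "TempA", 22.0),
    (datetime(2020, 9, 1, 10, 0, 2), "TempB", 5.0),
    (datetime(2020, 9, 1, 10, 1, 0), "TempA", 30.0),
]
result = summarize_avg_per_minute(readings)
```

This is also why SensorName goes into the by clause: in summarize, every output column must either be an aggregate or a grouping key.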

Related

Pyspark GroupBy time span

I have data with a start and end date e.g.
+---+----------+----------+
| id|     start|       end|
+---+----------+----------+
|  1|2021-05-01|2022-02-01|
|  2|2021-10-01|2021-12-01|
|  3|2021-11-01|2022-01-01|
|  4|2021-06-01|2021-10-01|
|  5|2022-01-01|2022-02-01|
|  6|2021-08-01|2021-12-01|
+---+----------+----------+
I want a count for each month of how many observations were "active", in order to display that in a plot. By active I mean that the observation's range from start date to end date includes the given month. The result for the example data should look like this:
[Image: example plot of the active observations per month]
I have looked into PySpark window functions, but I don't think they can help with my problem. So far my only idea is to add an extra column for each month, indicate whether the observation is active in that month, and work from there. But I feel like there must be a much more efficient way to do this.
You can use the sequence SQL function, which creates an array of dates from start to end with the given interval.
Then you can use explode to flatten the array into one row per month, and count.
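from pyspark.sql import SparkSession
from pyspark.sql import functions as F
# Make sure your Spark session is set to UTC.
# sequence with a month interval may not work well if the session time zone observes daylight saving time.
spark = (SparkSession
         .builder
         .config('spark.sql.session.timeZone', 'UTC')
         # ... other config
         .getOrCreate())
# df has columns `start` and `end`; build one array of month-start dates per row (start to end, inclusive).
df = (df.withColumn('range', F.expr('sequence(to_date(`start`), to_date(`end`), interval 1 month)'))
        # One row per (observation, active month).
        .withColumn('observation', F.explode('range')))
# Number of active observations per month, in chronological order for plotting.
df = df.groupby('observation').count().orderBy('observation')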
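The sequence/explode/count idea can be checked without Spark. Below is a minimal pure-Python sketch on the example data: month_starts plays the role of sequence(start, end, interval 1 month) (it matches exactly when start falls on the 1st of a month, as in the sample), the generator expression is the "explode", and Counter is the groupby/count. The function name is hypothetical.

```python
from collections import Counter
from datetime import date


def month_starts(start, end):
    """First day of every month from start's month to end's month, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


observations = [
    (1, date(2021, 5, 1), date(2022, 2, 1)),
    (2, date(2021, 10, 1), date(2021, 12, 1)),
    (3, date(2021, 11, 1), date(2022, 1, 1)),
    (4, date(2021, 6, 1), date(2021, 10, 1)),
    (5, date(2022, 1, 1), date(2022, 2, 1)),
    (6, date(2021, 8, 1), date(2021, 12, 1)),
]

# "Explode" each observation into one entry per active month, then count per month.
active_per_month = Counter(
    month for _id, start, end in observations for month in month_starts(start, end)
)
```

For this data the counts run from 1 in May 2021 up to 4 in October through December 2021, then back down to 2 in February 2022.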

Using KQL (Kusto query language), how to group datetimes into weeks (or 7-day chunks)?

I am running KQL (Kusto query language) queries against Azure Application Insights. I have certain measurements that I want to aggregate weekly. I am trying to figure out how to split my data into weeks.
To illustrate what I seek, here is a query that computes daily averages of the duration column.
requests
| where timestamp > ago(7d)
| summarize
avg(duration)
by
Date = format_datetime(timestamp, "yyyy-MM-dd")
This produces something similar to this:
In the above, I have converted datetimes to strings and thus effectively "rounded them down" to a precision of one day. This may be ugly, but it's the easiest way I could think of to group all results from a given day. It would be trivial to round down to months or years with the same technique.
But what if I want to group datetimes by week? Is there a nice way to do that?
I do not care whether my "weeks" start on Monday or Sunday or January 1st or whatever. I just want to group a collection of KQL datetimes into 7-day chunks. How can I do that?
Thanks in advance!
Looks like you are looking for the "bin()" function:
requests
| where timestamp > ago(7d)
| summarize
avg(duration)
by
bin(timestamp, 1d) // one day, for 7 days change it to 7d
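bin() rounds each value down to a multiple of the bin size, so with 7d every row falls into a 7-day chunk. Here is a minimal Python sketch of that rounding, assuming bins are counted from a fixed origin. I use 0001-01-01 as the origin, which is an assumption: I have not verified which origin Kusto aligns datetime bins to, so the exact chunk boundaries may differ. The function name bin_datetime is hypothetical.

```python
from datetime import datetime, timedelta


def bin_datetime(value, size, origin=datetime(1, 1, 1)):
    """Round value down to a whole number of `size` steps counted from `origin`."""
    steps = (value - origin) // size  # timedelta // timedelta -> int
    return origin + steps * size


daily = bin_datetime(datetime(2020, 10, 15, 13, 5), timedelta(days=1))
weekly = bin_datetime(datetime(2020, 10, 15, 13, 5), timedelta(days=7))
```

Because 0001-01-01 is a Monday, 7-day bins from that origin always start on a Monday. Also note that a 7-day window such as ago(7d) will usually straddle two 7-day bins unless it happens to line up with a bin boundary.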
I found out that I can use the week_of_year function to split datetimes by week number:
requests
| where timestamp > ago(30d)
| summarize
avg(duration)
by
Week = week_of_year(timestamp)
| sort by Week
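One caveat: as far as I know, week_of_year returns only the ISO 8601 week number, without the year. With a window that crosses a year boundary, weeks 52/53 and 1 would be sorted out of order, and with a window longer than a year, weeks from different years would be merged. Here is a sketch in Python, whose date.isocalendar() also follows ISO 8601, that groups by (ISO year, ISO week) instead. The function name and sample durations are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_average(samples):
    """Average duration per (ISO year, ISO week).

    samples: iterable of (day, duration) pairs.
    """
    groups = defaultdict(list)
    for day, duration in samples:
        iso_year, iso_week, _weekday = day.isocalendar()
        groups[(iso_year, iso_week)].append(duration)
    # Sorting by the (year, week) tuple keeps weeks in chronological order.
    return {key: mean(values) for key, values in sorted(groups.items())}


samples = [
    (date(2020, 12, 28), 100.0),  # Monday of ISO week 53 of 2020
    (date(2021, 1, 1), 200.0),    # still ISO week 53 of 2020
    (date(2021, 1, 4), 50.0),     # Monday of ISO week 1 of 2021
]
weekly = weekly_average(samples)
```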

kusto query - how to group by date and also group by name

In the query below, I look at the 80th-percentile duration of one API (foo/bar1) over a given date range, so that I can see whether there is any spike or degradation. (image below)
let dataset = requests
| where name == "GET foo/bar1"
and timestamp between(datetime("2020-10-15") .. datetime('2020-10-28'));
dataset
| summarize loadTime = round(percentile(duration, 80)) by format_datetime(timestamp, 'yyyy-MM-dd')
| order by timestamp desc
The challenge I'm facing is that there is more than one API (about 150 in my environment), and I also want the 80th percentile for each of them, but I'm having difficulty figuring out how to do it, or whether it is even possible.
I think I figured this out: remove the 'name' filter from the dataset, then add 'name' to the grouping (the by clause) at the end of the summarize line.
let dataset = requests
| where timestamp between(datetime("2020-10-25") .. datetime('2020-10-28'));
dataset
| summarize loadTime = round(percentile(duration, 80)) by format_datetime(timestamp, 'yyyy-MM-dd'), name
| order by timestamp desc
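Adding name to the by clause makes summarize compute one percentile per (day, API name) pair. Here is a minimal Python sketch of that grouping. It uses the nearest-rank percentile definition as a stand-in; Kusto's percentile() computes an estimate, so its values may differ slightly on large data. The function names and request tuples are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def p80_by_day_and_name(requests):
    """requests: iterable of (timestamp, name, duration) tuples."""
    groups = defaultdict(list)
    for ts, name, duration in requests:
        # Same day key as format_datetime(timestamp, 'yyyy-MM-dd').
        groups[(ts.strftime("%Y-%m-%d"), name)].append(duration)
    return {key: round(nearest_rank_percentile(d, 80)) for key, d in groups.items()}


requests = [
    (datetime(2020, 10, 25, 9, 0), "GET foo/bar1", 100),
    (datetime(2020, 10, 25, 9, 5), "GET foo/bar1", 200),
    (datetime(2020, 10, 25, 9, 10), "GET foo/bar1", 300),
    (datetime(2020, 10, 25, 9, 15), "GET foo/bar1", 400),
    (datetime(2020, 10, 25, 9, 20), "GET foo/bar1", 500),
    (datetime(2020, 10, 25, 9, 30), "GET foo/bar2", 40),
    (datetime(2020, 10, 26, 9, 0), "GET foo/bar1", 250),
]
load_times = p80_by_day_and_name(requests)
```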

How to include 'Time' in Date Hierarchy in Power BI

I am working on a report in Power BI. One of the tables in my data model collects sensor data. It has the following columns:
Serial (int), e.g. 123456789
Timestamp (datetime), e.g. 12/20/2016 12:04:23 PM
Reading (decimal), e.g. 123.456
A new record is added every few minutes, with the current reading from the sensor.
Power BI automatically creates a Hierarchy for the datetime column, which includes Year, Quarter, Month and Day. So, when you add a visual to your report, you can easily drill down to each of those levels.
I would like to include the "Time" part of the data in the hierarchy, so that you can drill down one more level after "Day", and see the detailed readings during that period.
I have already set up a Date table, using the CALENDARAUTO() function, added all of the appropriate columns, and related it to my Readings table in order to summarize the data by date - which works great. But it does not include the "Time" dimension.
I have looked at the following SO questions, but they didn't help:
Time-based drilldowns in Power BI powered by Azure Data Warehouse
Creating time factors in PowerBI
I also found this article, but it was confusing:
Power BI Date & Time Dimension Toolkit
Any ideas?
Thanks!
Unfortunately, I cannot comment on the previous answer, so I have to add this as a separate answer:
Yes, there is a way to automatically generate Date and Time tables. Here's some example code I use in my reports:
let
Source = List.Dates(startDate, Duration.Days(DateTime.Date(DateTime.LocalNow()) - startDate)+1, #duration(1,0,0,0)),
convertToTable = Table.FromList(Source, Splitter.SplitByNothing(), {"Date"}, null, ExtraValues.Error),
calcDateKey = Table.AddColumn(convertToTable, "DateKey", each Date.ToText([Date], "yyyyMMdd")), // lowercase yyyy and dd: "YYYYMMDD" is not a valid format string
yearIndex = Table.AddColumn(calcDateKey, "Year", each Date.Year([Date])),
monthIndex = Table.AddColumn(yearIndex, "MonthIndex", each Date.Month([Date])),
weekIndex = Table.AddColumn(monthIndex, "WeekIndex", each Date.WeekOfYear([Date])),
DayOfWeekIndex = Table.AddColumn(weekIndex, "DayOfWeekIndex", each Date.DayOfWeek([Date], 1)),
DayOfMonthIndex = Table.AddColumn(DayOfWeekIndex, "DayOfMonthIndex", each Date.Day([Date])),
Weekday = Table.AddColumn(DayOfMonthIndex, "Weekday", each Date.ToText([Date], "dddd")),
setDataType = Table.TransformColumnTypes(Weekday,{{"Date", type date}, {"DateKey", type text}, {"Year", Int64.Type}, {"MonthIndex", Int64.Type}, {"WeekIndex", Int64.Type}, {"DayOfWeekIndex", Int64.Type}, {"DayOfMonthIndex", Int64.Type}, {"Weekday", type text}})
in
setDataType
Paste it into a blank query. The code uses a parameter called startDate (a date), so make sure you have defined something similar.
And here's the snippet for a time-table:
let
Source = List.Times(#time(0,0,0) , 1440, #duration(0,0,1,0)),
convertToTable = Table.FromList(Source, Splitter.SplitByNothing(), {"DayTime"}, null, ExtraValues.Error),
createTimeKey = Table.AddColumn(convertToTable, "TimeKey", each Time.ToText([DayTime], "HHmmss")),
hourIndex = Table.AddColumn(createTimeKey, "HourIndex", each Time.Hour([DayTime])),
minuteIndex = Table.AddColumn(hourIndex, "MinuteIndex", each Time.Minute([DayTime])),
setDataType = Table.TransformColumnTypes(minuteIndex,{{"DayTime", type time}, {"TimeKey", type text}, {"HourIndex", Int64.Type}, {"MinuteIndex", Int64.Type}})
in
setDataType
If you use the DateKey and TimeKey (as suggested in the first answer) in your fact table, you can easily build the date/time hierarchy by putting the time element below the date element in the visualization, like this:
[Image: date-time hierarchy]
You will want separate date & time tables. You don't want to put the time into the date table, because the time is repeated every day.
A time dimension follows the same principle as a date dimension, except that instead of a row for every day, you have a row for every minute or every second, depending on how precise you need to be. I wouldn't recommend including seconds unless you absolutely need them, as that multiplies the number of rows by 60 (86,400 instead of 1,440), which impacts performance. There is no reference to the date in the time table.
E.g.
Time | Time Text| Hour | Minute | AM/PM
---------|----------|------|--------|------
12:00 AM | 12:00 AM | 12 | 00 | AM
12:01 AM | 12:01 AM | 12 | 01 | AM
12:02 AM | 12:02 AM | 12 | 02 | AM
... | ... | ... | ... | ...
I include a text version of the time because Power BI has a habit of adding a date from 1899 to time data types. You can add other columns too if they'd be helpful to you.
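A table like the one above can be generated rather than typed. Here is a minimal Python sketch that builds one row per minute (1,440 rows) with the Time, Time Text, Hour, Minute and AM/PM columns; in Power BI you would more likely build it in Power Query, and the tuple layout is just an illustrative assumption.

```python
from datetime import time


def time_dimension():
    """One row per minute of the day: (time, time_text, hour_12, minute, am_pm)."""
    rows = []
    for hour in range(24):
        for minute in range(60):
            hour_12 = hour % 12 or 12  # 0 -> 12 (AM), 13 -> 1 (PM)
            am_pm = "AM" if hour < 12 else "PM"
            text = f"{hour_12:02d}:{minute:02d} {am_pm}"
            rows.append((time(hour, minute), text, hour_12, minute, am_pm))
    return rows


rows = time_dimension()
```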
In your fact table, you'll want to split your datetime column into separate date & time columns, so that you can join the date to the date table & the time to the time table. The time will likely need to be converted to the nearest round minute or second so that every time in your data corresponds to a row in your time table.
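Here is a sketch of that split in Python. It truncates to the minute, which is one reading of "round minute" (rounding to the nearest minute would also work), and produces text keys in yyyyMMdd and HHmmss form. The function name split_datetime is hypothetical.

```python
from datetime import datetime


def split_datetime(ts):
    """Split a timestamp into (date_key, time_key) for joining to date and time tables."""
    minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
    return minute.strftime("%Y%m%d"), minute.strftime("%H%M%S")


keys = split_datetime(datetime(2016, 12, 20, 12, 4, 23))
```

Truncating the seconds guarantees that every time key matches a row in a one-row-per-minute time table.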
It's worth keeping but hiding the original datetime field in your data in case you later want to calculate durations that span days.
In Power BI, you'd add the time attribute (or the hour (and minute) attribute) under the month/day attributes on your axis to make a column chart that can be drilled from year > quarter > month > day > hour > minute. Power BI doesn't care that the attributes come from different tables.
You can read more about time dimensions here: http://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/
Hope this helps.
My approach was to create a new column with the following formula:
<new-column-name>=Format([<your-datetime-column>],"hh:mm:ss")
This creates a new column that you can use together with <your-datetime-column> to get a drill-down effect.
I created a new custom column, set its formula to [Timestamp], and changed its type to datetime:
#"Added Custom" = Table.AddColumn(#"Added Conditional Column16", "TestTimestamp", each [Timestamp]),
#"Changed Type" = Table.TransformColumnTypes(#"Added Custom",{{"TestTimestamp", type datetime}}),

Count between months in Tableau

I need to count the months between collect dates, because I need to know whether the test was run in the last 3 months. Below is the code I used, but it gives me a count of zero, even though I know the same test was run 3 times in a year because I can see the dates. I understand that the first one has a count of zero, because there is no test before it, but the counts for the others should be 3 and 5 respectively.
DATEDIFF('month',[Collect Date],[Collect Date])
Dates of the tests:
1/8/2015
4/23/2015
9/30/2015
What you are looking for is possible using the LOOKUP function in Tableau. Keep in mind that the result relies heavily on the data that is displayed and how it is displayed (sorting, etc.).
You can create a calculated field like this:
DATEDIFF("month",LOOKUP(ATTR([Test Date]),-1),ATTR([Test Date]))
This calculates the number of months between the date in the current row and the date in the prior row.
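The original calculation returned zero because it compared each date with itself. To see why the LOOKUP version gives 3 and 5: DATEDIFF with 'month' counts month boundaries crossed rather than full 30-day periods, and LOOKUP(..., -1) reads the previous row in the view. Here is a small Python sketch of the same calculation over the three test dates, assuming the rows are sorted by date. The function names are hypothetical.

```python
from datetime import date


def month_diff(earlier, later):
    """Month boundaries crossed between two dates, like DATEDIFF('month', earlier, later)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def months_since_previous(dates):
    """For each date (sorted), months since the previous row; None for the first row."""
    ordered = sorted(dates)
    return [None] + [month_diff(prev, cur) for prev, cur in zip(ordered, ordered[1:])]


gaps = months_since_previous([date(2015, 1, 8), date(2015, 4, 23), date(2015, 9, 30)])
```

Because only boundaries are counted, 1/31 to 2/1 already counts as 1 month, which matters if "within the last 3 months" needs to be exact.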
Your result will look something like this:
