I'm a bit stuck trying to calculate the 95th centile for time-series data that has been summarised into 1-minute bins over a 24-hour period, but where some bins are missing because no data was recorded during them.
For example given this table, which has already been summarised into bins from the raw data:
datatable (Timestamp: datetime, MaxRUsPerSecond: double)
[
'2020-07-06 00:01:00', 1,
'2020-07-06 00:20:00', 10
]
If I simply append | summarize percentile(MaxRUsPerSecond, 95), it gives me the value 10, which is mathematically correct for the rows present, but it ignores the 18 missing minute-by-minute samples, which should be treated as zero.
In effect, the result I really want calculated is this, which gives a 95th centile as 1:
datatable (Timestamp: datetime, MaxRUsPerSecond: double)
[
'2020-07-06 00:01:00', 1,
'2020-07-06 00:02:00', 0,
'2020-07-06 00:03:00', 0,
'2020-07-06 00:04:00', 0,
'2020-07-06 00:05:00', 0,
'2020-07-06 00:06:00', 0,
'2020-07-06 00:07:00', 0,
'2020-07-06 00:08:00', 0,
'2020-07-06 00:09:00', 0,
'2020-07-06 00:10:00', 0,
'2020-07-06 00:11:00', 0,
'2020-07-06 00:12:00', 0,
'2020-07-06 00:13:00', 0,
'2020-07-06 00:14:00', 0,
'2020-07-06 00:15:00', 0,
'2020-07-06 00:16:00', 0,
'2020-07-06 00:17:00', 0,
'2020-07-06 00:18:00', 0,
'2020-07-06 00:19:00', 0,
'2020-07-06 00:20:00', 10,
]
| summarize percentile(MaxRUsPerSecond, 95)
I started looking at weighted percentiles using percentilew, but it felt like I was heading down a rabbit hole: appending a synthetic bin to account for the missing ones, then working out what weight to give it based on the number of missing bins. So I stopped for a minute to see if anyone else has a better idea.
For context, I'm trying to get the maximum throughput (RU/s) per minute from a CosmosDB account. This is the query I've got so far:
AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "DataPlaneRequests"
| summarize ConsumedRUsPerSecond = sum(todouble(requestCharge_s)) by collectionName_s, _ResourceId, bin(TimeGenerated, 1s)
| summarize MaxRUsPerSecond = max(ConsumedRUsPerSecond) by collectionName_s, _ResourceId, bin(TimeGenerated, 1min)
Basically: get the total consumed RUs for each collection into 1-second bins, then take the maximum of those for each minute. If I can then get the 95th centile of those (somehow including the missing 1-minute bins), it will tell me whether I can scale some of our collections down to smaller throughputs.
In general you can fill missing values in arrays. The first option is to use the make-series operator and set the default argument to the value you want to use in place of the missing values, or use one of the series_fill functions such as series_fill_linear.
Once you have created the arrays, you can expand them using the mv-expand operator and calculate the percentiles.
Here is an example:
let Start = datetime(2020-07-06 00:01:00);
let End = datetime(2020-07-06 00:21:00);
datatable (Timestamp: datetime, MaxRUsPerSecond: double)
[
datetime(2020-07-06 00:01:00), 1,
datetime(2020-07-06 00:20:00), 10
]
| make-series MaxRUsPerSecond = any(MaxRUsPerSecond) default = 0 on Timestamp from Start to End step 1m
| mv-expand MaxRUsPerSecond to typeof(double), Timestamp to typeof(datetime)
| summarize percentiles(MaxRUsPerSecond, 95)
I collect free disk space metrics at regular intervals and would like to predict when the disk will be full.
I thought I could use series_decompose_forecast.
Here's a sample query:
let DiskSpace =
range Timestamp from ago(60d) to now() step 1d
| order by Timestamp desc
| serialize rn=row_number() + 10
| extend FreeSpace = case
(
rn % 5 == 0, rn + 5
, rn % 3 == 0, rn -4
, rn % 7 == 0, rn +3
, rn
)
| project Timestamp, FreeSpace;
DiskSpace
| make-series
FreeSpace = max(FreeSpace) default = long(null)
on Timestamp from ago(60d) to now() step 12h
| extend FreeSpace = series_fill_backward(FreeSpace)
| extend series_decompose_forecast(FreeSpace, 24)
| render timechart
The baseline looks like it could show me when it will hit zero (or some other threshold), but if I specify more points, it excludes more points from the learning process (I'm still unsure whether it excludes them from the start or the end).
I don't even care about the whole time series, just the date of running out of free space. Is this the correct approach?
It seems that series_fit_line() is more than enough in this scenario.
Once you have the slope and the intercept (interception in the output), you can calculate any point on the line.
range Timestamp from now() to ago(60d) step -1d
| extend rn = row_number() + 10
| extend FreeSpace = rn + case(rn % 5 == 0, 5, rn % 3 == 0, -4, rn % 7 == 0, 3, 0)
| make-series FreeSpace = max(FreeSpace) default = long(null) on Timestamp from ago(60d) to now() step 12h
| extend FreeSpace = series_fill_forward(series_fill_backward(FreeSpace))
| extend (rsquare, slope, variance, rvariance, interception, line_fit) = series_fit_line(FreeSpace)
| project slope, interception, Timestamp, FreeSpace, line_fit
| extend x_intercept = todatetime(Timestamp[0]) - 12h*(1 + interception / slope)
| project-reorder x_intercept
| render timechart with (xcolumn=Timestamp, ycolumns=FreeSpace,line_fit)
x_intercept
2022-12-06T01:56:54.0389796Z
P.S.
No need for serialize after order by.
No need for order by if you create the range backwards.
A null value in a time series breaks a lot of functionality (fixed with the additional series_fill_forward).
If you look at the example in the documentation (https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/series-decompose-forecastfunction), you will see that they add empty slots into the "future" of the original series, which the forecast then predicts.
This is also stated in the notes:
The dynamic array of the original input series should include a number of points slots to be forecasted. The forecast is typically done by using make-series and specifying the end time in the range that includes the timeframe to forecast.
To make your example work:
let DiskSpace =
range Timestamp from ago(60d) to now() step 1d
| order by Timestamp desc
| serialize rn=row_number() + 10
| extend FreeSpace = case
(
rn % 5 == 0, rn + 5
, rn % 3 == 0, rn -4
, rn % 7 == 0, rn +3
, rn
)
| project Timestamp, FreeSpace;
DiskSpace
// add 4 weeks of empty slots in the "future" - these slots will be forecast
| make-series FreeSpace = max(FreeSpace) default=long(null) on Timestamp from ago(60d) to now()+24h*7*4 step 12h
| extend FreeSpace = series_fill_backward(FreeSpace)
| extend forecast=series_decompose_forecast(FreeSpace, 7*4*2)
| render timechart
The documentation could be a bit clearer, but I think what the points parameter does is simply omit the last N points from training (since they are empty, and you don't want to include them in your forecast model).
To get when you hit close to 0:
let DiskSpace =
range Timestamp from ago(60d) to now() step 1d
| order by Timestamp desc
| serialize rn=row_number() + 10
| extend FreeSpace = case
(
rn % 5 == 0, rn + 5
, rn % 3 == 0, rn -4
, rn % 7 == 0, rn +3
, rn
)
| project Timestamp, FreeSpace;
DiskSpace
| make-series FreeSpace = max(FreeSpace) default=long(null) on Timestamp from ago(60d) to now()+24h*7*4 step 12h
| extend FreeSpace = series_fill_backward(FreeSpace)
| extend forecast=series_decompose_forecast(FreeSpace, 7*4*2)
| mv-apply with_itemindex=idx f=forecast to typeof(double) on (
where f <= 0.5
| summarize min(idx)
)
| project AlmostOutOfDiskSpace = Timestamp[min_idx], PredictedDiskSpaceAtThatPoint = forecast[min_idx]
AlmostOutOfDiskSpace   PredictedDiskSpaceAtThatPoint
5/12/2022 13:02:24     0.32277009977544
I am trying to achieve these things:
Get the most recent data for certain fields (based on timestamp) -> call this latestRequest
Get the previous data for these fields (basically timestamp < latestRequest.timestamp) -> call this previousRequest
Calculate the difference between latestRequest and previousRequest
This is what I have come up with so far:
let LatestRequest=requests
| where operation_Name == "SearchServiceFieldMonitor"
| extend Mismatch = split(tostring(customDimensions.IndexerMismatch), " in ")
| extend difference = toint(Mismatch[0])
, field = tostring(Mismatch[1])
, indexer = tostring(Mismatch[2])
, index = tostring(Mismatch[3])
, service = tostring(Mismatch[4])
| summarize MaxTime=todatetime(max(timestamp)) by service,index,indexer;
let previousRequest = requests
| where operation_Name == "SearchServiceFieldMonitor"
| extend Mismatch = split(tostring(customDimensions.IndexerMismatch), " in ")
| extend difference = toint(Mismatch[0])
, field = tostring(Mismatch[1])
, indexer = tostring(Mismatch[2])
, index = tostring(Mismatch[3])
, service = tostring(Mismatch[4])
| join (LatestRequest) on indexer, index, service
| where timestamp < LatestRequest.MaxTime
However, I get this error from the query:
Ensure that expression: LatestRequest.MaxTime is indeed a simple name
I tried using todatetime(LatestRequest.MaxTime), but it doesn't make any difference. What am I doing wrong?
The error you get is because you can't refer to a column in a table using dot notation; you should simply use the column name, since the result of a join operator is a table with the applicable columns from both sides of the join.
An alternative to join might be to use the row_number() and prev() functions. You can find the last record and the one before it by ordering the rows by key and timestamp, and then calculate the difference between each row and the row before it.
Here is an example:
datatable(timestamp:datetime, requestId:int, val:int)
[datetime(2021-02-20 10:00), 1, 5,
datetime(2021-02-20 11:00), 1, 6,
datetime(2021-02-20 12:00), 1, 8,
datetime(2021-02-20 10:00), 2, 10,
datetime(2021-02-20 11:00), 2, 20,
datetime(2021-02-20 12:00), 2, 30,
datetime(2021-02-20 13:00), 2, 40,
datetime(2021-02-20 13:00), 3, 100
]
| order by requestId asc, timestamp desc
| extend rn = row_number(0, requestId != prev(requestId))
| where rn <= 1
| order by requestId, rn desc
| extend diff = iif(prev(rn) == 1, val - prev(val), val)
| where rn == 0
| project-away rn
The results are:
timestamp            requestId  val  diff
2021-02-20 12:00:00  1          8    2
2021-02-20 13:00:00  2          40   10
2021-02-20 13:00:00  3          100  100
I have the program below to validate a date coming from a third party. Sometimes the date is improper, in which case I want the comparison to fail, but somehow the date always gets parsed to today, which returns a positive response.
package main

import (
	"fmt"
	"time"
)

func main() {
	bday := time.Date(0, time.Month(0), 0, 0, 0, 0, 0, time.UTC)
	fmt.Print(bday)
}
The print from the main() is: -0001-11-30 00:00:00 +0000 UTC
What concerns me are the DD and MM values being converted to today's, as I am using this snippet to check a user's birthday.
Foreword: the question was asked on November 30; that's why the month and day parts look like today.
Zero values are parsed properly, but there is no "month 0": the first month is January, which has numeric value 1. Similarly, there is no 0th day of a month; the first day of every month is 1.
time.Date documents that:
The month, day, hour, min, sec, and nsec values may be outside their usual ranges and will be normalized during the conversion. For example, October 32 converts to November 1.
So if you pass 0 for month and day, that is interpreted the same as passing 1 for month and day and then adding -1 to each.
See this example:
bday := time.Date(0, time.Month(0), 0, 0, 0, 0, 0, time.UTC)
fmt.Println(bday)
bday2 := time.Date(0, time.Month(1), 1, 0, 0, 0, 0, time.UTC)
fmt.Println(bday2)
bday2 = bday2.AddDate(0, -1, -1)
fmt.Println(bday2)
Which outputs (try it on the Go Playground):
-0001-11-30 00:00:00 +0000 UTC
0000-01-01 00:00:00 +0000 UTC
-0001-11-30 00:00:00 +0000 UTC
So the result matching "today" is purely accidental, today being November 30. If you run the code tomorrow, the month-day part will no longer be today but yesterday.
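Building on that normalization behavior, one way to do what the question asks (fail on an improper third-party date) is to construct the time.Time and check that it round-trips unchanged; if any component was normalized away, the input was not a real calendar date. This is a sketch, not the only approach, and validDate is a helper name introduced here for illustration, not a standard library function:

```go
package main

import (
	"fmt"
	"time"
)

// validDate reports whether year/month/day form a real calendar date.
// Because time.Date normalizes out-of-range values (e.g. October 32
// becomes November 1), an input is valid only if the constructed date
// round-trips with the same components.
func validDate(year, month, day int) bool {
	t := time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
	y, m, d := t.Date()
	return y == year && int(m) == month && d == day
}

func main() {
	fmt.Println(validDate(2000, 2, 29)) // true: 2000 was a leap year
	fmt.Println(validDate(1990, 2, 29)) // false: normalized to March 1
	fmt.Println(validDate(0, 0, 0))     // false: no month 0 or day 0
}
```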
I have a Kusto table counts with 4 rows and 3 columns that has the following elements
HasFailure FunnelPhase count_
0 Experienced 172425
0 NewSubs 25399
1 Experienced 3289
1 NewSubs 643
I would like to access the 3rd element in the 2nd column and save it to a scalar. I have tried the following code:
let value = counts | project count_ lookup 3;
But I am not able to obtain the desired result. What would be the correct way in which to obtain this value?
You'll need to order the records in your table (according to an order you define), then access the 3rd record (according to that same order), and finally project the specific column you're interested in.
e.g.:
let T =
datatable(HasFailure:bool, FunnelPhase:string, count_:long)
[
0, 'Experienced', 172425,
0, 'NewSubs', 25399,
1, 'Experienced', 3289,
1, 'NewSubs', 643,
]
;
let ['3rd_element_in_2nd_column'] = toscalar(
T
| order by count_ desc
| where row_number() == 3
| project FunnelPhase
)
;
print result = ['3rd_element_in_2nd_column']
I'm new to PeopleSoft and am trying to set a current-date field to the previous Sunday. For that I used the Weekday function, but it returns an integer. How can I convert that integer back to a date? Can anyone help me out with this?
Thanks in advance.
I assume you know how many days ago last Sunday was; in that case you can use this function:
AddToDate(date, num_years, num_months, num_days)
It returns a date. For example:
AddToDate(Date(), 0, 0, -3), assuming Sunday was 3 days before today.
Assuming you want the last Sunday: for example, if today is 30/06/2015, then the previous Sunday is 28/06/2015. To do that you can use:
Local date &dt = %Date;
Local number &num = Weekday(&dt);
WinMessage(Date(&dt - (&num - 1)), 0);
The Weekday function returns a number from 1 (Sunday) to 7 (Saturday), so if you know today's date (%Date) you can get the weekday from it.
If you want a date other than the current date, use DateValue(date_str), where date_str is the string value of the date you want.
Another way of doing this is:
SQLExec("select To_date(:1,'DD/MM/YYYY') - (To_Char(To_date(:1,'DD/MM/YYYY'), 'D') - 1) from dual", &dtValue, &dtSunday);
Substitute &dtValue with the date you want.
Visit http://peoplesoftdotnet.blogspot.com.au/ for more tips.
Here is the code. %Date is used to retrieve SYSDATE.
I have added a few MessageBox calls to validate the result.
/* Code Begins Here */
Local date &dtSunday;
Local integer &i;
MessageBox(0, "", 0, 0, "SYSDATE - " | %Date);
MessageBox(0, "", 0, 0, "Previous Sunday - 28-June-2015");
&i = Weekday(%Date);
&dtSunday = AddToDate(%Date, 0, 0, - (&i - 1));
MessageBox(0, "", 0, 0, "Computed Sunday - " | &dtSunday);
/* Code Ends Here */
Here is the result:
SYSDATE - 2015-07-02 (0,0)
Previous Sunday - 28-June-2015 (0,0)
Computed Sunday - 2015-06-28 (0,0)