I am trying to achieve these things:
Get the most recent data for certain fields (based on timestamp) -> call this latestRequest
Get the previous data for these fields (basically timestamp < latestRequest.timestamp) -> call this previousRequest
Calculate the difference between latestRequest and previousRequest
This is what I have come up with so far:
let LatestRequest = requests
| where operation_Name == "SearchServiceFieldMonitor"
| extend Mismatch = split(tostring(customDimensions.IndexerMismatch), " in ")
| extend difference = toint(Mismatch[0])
    , field = tostring(Mismatch[1])
    , indexer = tostring(Mismatch[2])
    , index = tostring(Mismatch[3])
    , service = tostring(Mismatch[4])
| summarize MaxTime = todatetime(max(timestamp)) by service, index, indexer;
let previousRequest = requests
| where operation_Name == "SearchServiceFieldMonitor"
| extend Mismatch = split(tostring(customDimensions.IndexerMismatch), " in ")
| extend difference = toint(Mismatch[0])
    , field = tostring(Mismatch[1])
    , indexer = tostring(Mismatch[2])
    , index = tostring(Mismatch[3])
    , service = tostring(Mismatch[4])
| join (LatestRequest) on indexer, index, service
| where timestamp < LatestRequest.MaxTime
However, I get this error from this query:
Ensure that expression: LatestRequest.MaxTime is indeed a simple name
I tried to use todatetime(LatestRequest.MaxTime) but it doesn't make any difference. What am I doing wrong?
The error you get is because you can't refer to a column in a table using dot notation; you should simply use the column name, since the result of a join operator is a single table with the applicable columns from both sides of the join.
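For example, the end of your query would become something like this (a sketch based on your column names; since MaxTime only exists on the right side of the join, it keeps its name in the joined output):
| join (LatestRequest) on indexer, index, service
| where timestamp < MaxTime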
An alternative to join might be using the row_number() and prev() functions. You can find the last record and the one before it by ordering the rows by key and timestamp, and then calculating the difference between the current row and the row before it.
Here is an example:
datatable(timestamp:datetime, requestId:int, val:int)
[
    datetime(2021-02-20 10:00), 1, 5,
    datetime(2021-02-20 11:00), 1, 6,
    datetime(2021-02-20 12:00), 1, 8,
    datetime(2021-02-20 10:00), 2, 10,
    datetime(2021-02-20 11:00), 2, 20,
    datetime(2021-02-20 12:00), 2, 30,
    datetime(2021-02-20 13:00), 2, 40,
    datetime(2021-02-20 13:00), 3, 100
]
| order by requestId asc, timestamp desc
| extend rn = row_number(0, requestId != prev(requestId)) // restart numbering at 0 for every new requestId
| where rn <= 1 // keep the latest record and the one before it, per requestId
| order by requestId, rn desc
| extend diff = iif(prev(rn) == 1, val - prev(val), val) // subtract the previous record's value, if one exists
| where rn == 0
| project-away rn
The results are:
timestamp            requestId  val  diff
2021-02-20 12:00:00  1          8    2
2021-02-20 13:00:00  2          40   10
2021-02-20 13:00:00  3          100  100
I collect Free disk space metrics at regular intervals and would like to predict when the disk will be full.
I thought I could use series_decompose_forecast.
Here's a sample query:
let DiskSpace =
range Timestamp from ago(60d) to now() step 1d
| order by Timestamp desc
| serialize rn=row_number() + 10
| extend FreeSpace = case
(
rn % 5 == 0, rn + 5
, rn % 3 == 0, rn -4
, rn % 7 == 0, rn +3
, rn
)
| project Timestamp, FreeSpace;
DiskSpace
| make-series
FreeSpace = max(FreeSpace) default= long(null)
on Timestamp from ago(60d) to now() step 12h
| extend FreeSpace = series_fill_backward(FreeSpace)
| extend series_decompose_forecast(FreeSpace, 24)
| render timechart
And the result:
The baseline seems like it could show me when it will hit zero (or some other threshold), but if I specify more points, it excludes more points from the learning process (I'm still unsure whether it excludes them from the start or the end).
I don't even care about the whole time series, just the date of running out of free space. Is this the correct approach?
It seems that series_fit_line() is more than enough in this scenario.
Once you have the slope and the interception, you can calculate any point on the line.
range Timestamp from now() to ago(60d) step -1d
| extend rn = row_number() + 10
| extend FreeSpace = rn + case(rn % 5 == 0, 5, rn % 3 == 0, -4, rn % 7 == 0, 3, 0)
| make-series FreeSpace = max(FreeSpace) default=long(null) on Timestamp from ago(60d) to now() step 12h
| extend FreeSpace = series_fill_forward(series_fill_backward(FreeSpace))
| extend (rsquare, slope, variance, rvariance, interception, line_fit) = series_fit_line(FreeSpace)
| project slope, interception, Timestamp, FreeSpace, line_fit
// estimate when the fitted line reaches zero, converting from series slot index (12h steps) to a datetime
| extend x_intercept = todatetime(Timestamp[0]) - 12h*(1 + interception / slope)
| project-reorder x_intercept
| render timechart with (xcolumn=Timestamp, ycolumns=FreeSpace, line_fit)
x_intercept
2022-12-06T01:56:54.0389796Z
P.S.
No need for serialize after order by.
No need for order by if you create the range backwards.
A null value in a time series breaks a lot of functionality (fixed here with an additional series_fill_forward).
If you look at the example at https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/series-decompose-forecastfunction, you will see that they add empty slots to the "future" of the original series, which the forecast then predicts.
This is also stated in the notes:
The dynamic array of the original input series should include a number of points slots to be forecasted. The forecast is typically done by using make-series and specifying the end time in the range that includes the timeframe to forecast.
To make your example work:
let DiskSpace =
range Timestamp from ago(60d) to now() step 1d
| order by Timestamp desc
| serialize rn=row_number() + 10
| extend FreeSpace = case
(
rn % 5 == 0, rn + 5
, rn % 3 == 0, rn -4
, rn % 7 == 0, rn +3
, rn
)
| project Timestamp, FreeSpace;
DiskSpace
// add 4 weeks of empty slots in the "future" - these slots will be forecast
| make-series FreeSpace = max(FreeSpace) default=long(null) on Timestamp from ago(60d) to now()+24h*7*4 step 12h
| extend FreeSpace = series_fill_backward(FreeSpace)
| extend forecast=series_decompose_forecast(FreeSpace, 7*4*2)
| render timechart
The documentation could be a bit clearer, but I think what the points parameter does is simply omit the last N points from training (since they are empty, and you don't want to include them in your forecast model).
To get when you hit close to 0:
let DiskSpace =
range Timestamp from ago(60d) to now() step 1d
| order by Timestamp desc
| serialize rn=row_number() + 10
| extend FreeSpace = case
(
rn % 5 == 0, rn + 5
, rn % 3 == 0, rn -4
, rn % 7 == 0, rn +3
, rn
)
| project Timestamp, FreeSpace;
DiskSpace
| make-series FreeSpace = max(FreeSpace) default=long(null) on Timestamp from ago(60d) to now()+24h*7*4 step 12h
| extend FreeSpace = series_fill_backward(FreeSpace)
| extend forecast=series_decompose_forecast(FreeSpace, 7*4*2)
| mv-apply with_itemindex=idx f=forecast to typeof(double) on (
where f <= 0.5
| summarize min(idx)
)
| project AlmostOutOfDiskSpace = Timestamp[min_idx], PredictedDiskSpaceAtThatPoint = forecast[min_idx]
AlmostOutOfDiskSpace  PredictedDiskSpaceAtThatPoint
5/12/2022 13:02:24    0.32277009977544
I have this query that almost works:
datatable (timestamp:datetime, value:dynamic)
[
datetime("2021-04-19"), "a",
datetime("2021-04-19"), "b",
datetime("2021-04-20"), 1,
datetime("2021-04-20"), 2,
datetime("2021-04-21"), "b",
datetime("2021-04-22"), 2,
datetime("2021-04-22"), 3,
]
| project timestamp, stringvalue=iif(gettype(value)=="string", tostring(value), ""), numericvalue=iif(gettype(value)=="long", toint(value), int(null))
| summarize any(stringvalue), avg(numericvalue) by bin(timestamp, 1d)
| project timestamp, value=iif(isnan(avg_numericvalue), any_stringvalue, avg_numericvalue)
This splits the values in the value field into stringvalue if the value is a string, and numericvalue if the value is a long. Then it summarizes the values at day level: for the string values it just takes any value, and for the numeric values it calculates the average.
After this I want to put the values back into the value field.
I was thinking that the last row could look like below, but the dynamic() function only accepts literals:
| project timestamp, value=iif(isnan(avg_numericvalue), dynamic(any_stringvalue), dynamic(avg_numericvalue))
If I do it like this, it will actually work:
| project timestamp, value=iif(isnan(avg_numericvalue), parse_json(any_stringvalue), parse_json(tostring(avg_numericvalue)))
But is there a better way than converting it to json and back?
iif() expects the types of the 2nd and 3rd arguments to match. In your case, one is a number and the other one is a string. To fix the issue, just add tostring() around the number:
datatable (timestamp:datetime, value:dynamic)
[
datetime("2021-04-19"), "a",
datetime("2021-04-19"), "b",
datetime("2021-04-20"), 1,
datetime("2021-04-20"), 2,
datetime("2021-04-21"), "b",
datetime("2021-04-22"), 2,
datetime("2021-04-22"), 3,
]
| project timestamp, stringvalue=iif(gettype(value)=="string", tostring(value), ""), numericvalue=iif(gettype(value)=="long", toint(value), int(null))
| summarize any(stringvalue), avg(numericvalue) by bin(timestamp, 1d)
| project timestamp, value=iif(isnan(avg_numericvalue), any_stringvalue, tostring(avg_numericvalue))
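If you'd rather get the value column back as dynamic (as in the original table), a small variant of your own approach using todynamic(), which is just an alias of parse_json(), seems like the least noisy option; the final line would then look something like:
| project timestamp, value=todynamic(iif(isnan(avg_numericvalue), any_stringvalue, tostring(avg_numericvalue)))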
I have a Kusto table counts with 4 rows and 3 columns that has the following elements:
HasFailure  FunnelPhase  count_
0           Experienced  172425
0           NewSubs      25399
1           Experienced  3289
1           NewSubs      643
I would like to access the 3rd element in the 2nd column and save it to a scalar. I have tried the following code:
let value = counts | project count_ lookup 3;
But I am not able to obtain the desired result. What would be the correct way to obtain this value?
You'll need to order the records in your table (according to an order you define), then access the 3rd record (according to that same order), and finally project the specific column you're interested in.
e.g.:
let T =
datatable(HasFailure:bool, FunnelPhase:string, count_:long)
[
0, 'Experienced', 172425,
0, 'NewSubs', 25399,
1, 'Experienced', 3289,
1, 'NewSubs', 643,
]
;
let ['3rd_element_in_2nd_column'] = toscalar(
T
| order by count_ desc
| where row_number() == 3
| project FunnelPhase
)
;
print result = ['3rd_element_in_2nd_column']
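If row_number() inside a where clause gives you trouble, an equivalent sketch (reusing the same T) materializes the row number with extend first:
T
| order by count_ desc
| extend rn = row_number()
| where rn == 3
| project FunnelPhase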
I'm a newbie in the Kusto language but experienced in SQL, so maybe I'm doing things in a completely wrong way.
I'm trying to create a query which needs to check whether a value from one table exists in another.
Something like this:
let T1 = datatable(id: int, ss:dynamic)
[
1, dynamic(["qwe", "rty"]),
2, dynamic(["uio", "pas"]),
3, dynamic(["dfg", "hjk"]),
];
let T2 = datatable(id:int, s:string)
[
1, "rty",
2, "abc",
3, "hjk"
];
T2
| join (T1) on id
| extend e=case(s has_any (ss),"Yes","No");
But I'm getting the error "Error has_any(): failed to cast argument 2 to scalar constant".
Is there a way to do it?
Even better would be a function, something like this:
let E = (i_id: int, i_s: string)
{
T1 | where id==i_id | project e=case(i_s has_any (ss),"Yes","No")
};
T2
| extend e=E(id,s);
Please advise.
Here are a couple of options for you to consider:
1.
let T1 = datatable(id: int, ss:dynamic)
[
1, dynamic(["qwe", "rty"]),
2, dynamic(["uio", "pas"]),
3, dynamic(["dfg", "hjk"]),
]
;
let T2 = datatable(id:int, s:string)
[
1, "rty",
2, "abc",
3, "hjk"
]
;
T2
| join (T1 | mv-expand ss to typeof(string)) on id
| summarize e = case(countif(s == ss) > 0, "Yes", "No") by id, s
2.
let T1 = datatable(id: int, ss:dynamic)
[
1, dynamic(["qwe", "rty"]),
2, dynamic(["uio", "pas"]),
3, dynamic(["dfg", "hjk"]),
]
;
let T2 = datatable(id:int, s:string)
[
1, "rty",
2, "abc",
3, "hjk"
]
;
T2
| join T1 on id
| project id, s, e = case(indexof(tostring(ss), s) > 0, "Yes", "No")
// not necessarily accurate, depending on the values in the actual data
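Another option worth considering: set_has_element() tests whether a dynamic array contains a given value, which avoids both the mv-expand and the indexof() substring caveat:
T2
| join (T1) on id
| project id, s, e = iif(set_has_element(ss, s), "Yes", "No")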
I have 6 columns, each containing one of the values 0, 1, 2, or 3. I want to display the result such that 0 represents SUCCESS, 1 or 2 represents FAILURE, and 3 represents NOT APPLICABLE. So if in the DB the values are:
col A | col B | col C | col D | col E | col F
0 | 1 | 2 | 0 | 3 | 2
Output should be :
col A | col B | col C | col D | col E | col F
S | F | F | S | NA | F
Is it possible to do it through decode by selecting all the columns at once rather than selecting them individually?
If I understand your question correctly, it sounds like you just need a case expression (or decode, if you prefer, but that's less self-documenting than a case expression), along the lines of:
case when some_col = 0 then 'S'
when some_col in (1, 2) then 'F'
...
else some_col -- replace with whatever you want the output to be if none of the above conditions are met
end
or maybe:
case some_col
when 0 then 'S'
when 1 then 'F'
...
else some_col -- replace with whatever you want the output to be if none of the above conditions are met
end
So your query would look something like:
select case ...
end col_a,
...
case ...
end col_f
from your_table;
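For instance, a full decode version for your six columns might look like this (with your_table standing in for the real table name):
select decode(col_a, 0, 'S', 1, 'F', 2, 'F', 3, 'NA') as col_a,
       decode(col_b, 0, 'S', 1, 'F', 2, 'F', 3, 'NA') as col_b,
       decode(col_c, 0, 'S', 1, 'F', 2, 'F', 3, 'NA') as col_c,
       decode(col_d, 0, 'S', 1, 'F', 2, 'F', 3, 'NA') as col_d,
       decode(col_e, 0, 'S', 1, 'F', 2, 'F', 3, 'NA') as col_e,
       decode(col_f, 0, 'S', 1, 'F', 2, 'F', 3, 'NA') as col_f
from your_table;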
Is it possible to do it through decode by selecting all the columns at once rather than selecting them individually?
No
However, besides using pivot, the only solution I see would be using PL/SQL:
1. This is how I simulated your table:
SELECT *
FROM (WITH tb1 (col_a, col_b, col_c, col_d, col_e, col_f) AS
(SELECT 0, 1, 2, 0, 3, 2 FROM DUAL)
SELECT *
FROM tb1)
2. I would concatenate the columns together with a comma between them and save them into a table of strings:
SELECT col_a || ',' || col_b || ',' || col_c || ',' || col_d || ',' || col_e || ',' || col_f
FROM (WITH tb1 (col_a, col_b, col_c, col_d, col_e, col_f) AS (SELECT 0, 1, 2, 0, 3, 2 FROM DUAL)
SELECT *
FROM tb1)
3. Then I would use REGEXP_REPLACE to replace your values one row at a time:
SELECT REPLACE (REGEXP_REPLACE (REPLACE ('0,1,2,0,3,2', 0, 'S'), '[1-2]', 'F'), 3, 'NA') COL_STR
FROM DUAL
4. Using dynamic SQL, I would update the table using rowid or whatever you intend to do. I made this SQL which will separate the string back into columns:
SELECT REGEXP_SUBSTR (COL_STR, '[^,]+', 1, 1) AS COL_A,
REGEXP_SUBSTR (COL_STR, '[^,]+', 1, 2) AS COL_B,
REGEXP_SUBSTR (COL_STR, '[^,]+', 1, 3) AS COL_C,
REGEXP_SUBSTR (COL_STR, '[^,]+', 1, 4) AS COL_D,
REGEXP_SUBSTR (COL_STR, '[^,]+', 1, 5) AS COL_E,
REGEXP_SUBSTR (COL_STR, '[^,]+', 1, 6) AS COL_F
FROM tst1
All of this is very tedious and it could take some time. Using DECODE or CASE would be easier to look at and interpret and thus easier to maintain.