Azure Log Analytics batches - azure-application-insights

I can't figure out why batches in Azure Log Analytics queries do not work as I expect from the documentation.
For example, this query should return two tabular results.
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000;
m | where n < 10
When I try to do the same, I always get a result only for the first of my subqueries. In this case only one table with one entry is returned, while the second subquery is not executed at all (I can leave mistakes in it and they are not caught upon execution).
let someMetrics = materialize (customMetrics | where timestamp > ago (1h));
someMetrics | take 1;
someMetrics | take 3

I think this is a limitation of the current Analytics UI. If you submit the query through the API layer (for example against the public demo app), it comes back with two tables.
Here is a curl script for reference:
curl "https://api.applicationinsights.io/v1/apps/DEMO_APP/query?query=let%20req%20%3D%20materialize(requests%7C%20where%20timestamp%20%3E%20ago(10m))%3Breq%20%7C%20take%201%3Breq%20%7C%20take%202" -H "x-api-key: DEMO_KEY"
This will return two tables (two metadata parts, one for each table + two result sets, one after each metadata).
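For readability, the URL-encoded query in the curl call decodes to:
let req = materialize(requests | where timestamp > ago(10m));
req | take 1;
req | take 2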

Related

Slow query on table | WHERE x | ORDER by timestamp | DISTINCT a,b,c,d | TAKE 20 when table large

We are experiencing a sudden performance drop with a query structured like this:
table(tablename)
| where MeasurementName in ('ActiveJobId')
    and MachineId == machineId
    and SourceTimestamp <= from
    and isnotnull(Value)
| order by SourceTimestamp desc
| distinct SourceTimestamp, MeasurementName, tostring(Value), SourceTimestampUtc
| take rows
tablename, machineId, from, and rows are all query parameters; rows is typically 20. The Value column is of type dynamic.
The table contains 240 million entries, with about 64,000 matching the where criteria. The goal of the query is to get the last 20 UNIQUE, non-empty entries for a given machine and data point, starting after a specific date.
The query runs smoothly in the Staging database system, but started to degrade in performance on the Dev system, possibly because of the increased data volume.
If we remove the distinct clause, or move it behind the take clause, the query completes very fast (<1 s). The data contains about 5-10% duplicate entries.
To our understanding, the query should be executed like this:
Prepare a filter for the source table, starting at a specific datetime
Order desc: walk backwards
Walk down the table and stop when you have 20 distinct rows
From the time it sometimes takes, it looks almost as if ADX walks down the whole table, performs the distinct, and only then takes the topmost 20 rows.
The problem persists if we swap | order and | distinct around.
The problem disappears if we move | distinct to the end of the query, but then we often receive 1-2 items fewer than required.
Is there a logical error we make, can this query be rewritten, or are there better options at hand?
"The goal of the query is to get the last 20 UNIQUE, non-empty entries for a given machine and data point, starting after a specific date."
This part of the description doesn't match the filter in your query (and SourceTimestamp <= from): did you mean to use >= instead of <=?
"Is there a logical error we make, can this query be rewritten, or are there better options at hand?"
If you can't eliminate the duplicates upstream, you can consider setting up a materialized view that performs the deduplication, then query the view directly instead of the raw data. Also see: Handle duplicate data.
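A minimal sketch of that approach, assuming a hypothetical source table named SensorMeasurements standing in for the tablename parameter:
.create materialized-view DedupedMeasurements on table SensorMeasurements
{
    SensorMeasurements
    // take_any(*) keeps one arbitrary row per duplicate key - the standard
    // deduplication pattern for materialized views.
    | summarize take_any(*) by MachineId, MeasurementName, SourceTimestamp
}
The where/order/take pipeline from the question can then run against the view without paying for the distinct at query time.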

Kusto: Full table scan on join even join conditions are time-based?

I have a Kusto Query like:
(Events
| take 1)
| join kind=leftouter Sensor_Data
    on $left.start_timestmp == $right.timestmp, someotherfield
and it will never return. The right side of the join has several billion entries.
If I do a
Events
| take 1
and use the result in the where clause of Sensor_Data, it returns in no time.
The MS Support Team explains that this query requires a full table scan of the Sensor_Data table. The join parameters are not taken into consideration by the query optimizer.
Question:
Is the Kusto Query Optimizer really not able to optimize queries based on the join condition? To me it sounds a little bit like 1999 to have to first run the left side of the query manually and then the right side manually as well. Is there some hint to make this run fast?
Consider rewriting your query as follows (for example) and see if that helps it perform better:
let x = toscalar(Events | take 1 | project pack("ts", start_timestmp, "sof", someotherfield));
Sensor_Data
| where timestmp == todatetime(x["ts"])
| where someotherfield == tostring(x["sof"])
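The rewrite helps because toscalar evaluates the left side once, up front; the join then becomes two constant equality filters on Sensor_Data, which the engine can push down into the scan (including time-based filtering on timestmp) instead of matching against several billion rows.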

Using the results of the output of one query as table names

I have written the following code, which extracts the names of tables that have used storage in Sentinel. I'm not sure if Kusto is capable of doing this, but essentially I want to use the values stored in Out as table names, e.g. union(Out) or search in (Out) *.
let MyTables =
    Usage
    | where Quantity > 0
    | summarize by DataType;
let Out =
    MyTables
    | summarize make_list(DataType);
No, this is not possible. The tables referenced by a query must be known during query planning. A possible workaround is to generate the query text and invoke it using the execute_query plugin.
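For illustration, if Out came back as ["SecurityEvent", "Syslog"] (hypothetical table names), the generated query would simply reference them literally:
union SecurityEvent, Syslog
| take 10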

App Insights 'join' that returns only the first result

I have a job running hourly (at slightly different times) and logging metrics into Application Insights.
I want to trigger an alert based on the metrics from the latest job run.
let metrics = customMetrics | where ... | extend run = bin(timestamp, 1m);
let latestRun = metrics | top 1 by run desc;
metrics | join latestRun on run
Looking at metrics, I can see this query should return 8 results, but it returns only the first of them. Why?
Surprisingly, this is by design: the query language does not use inner joins by default; instead it uses "innerunique" joins, which deduplicate the join key on the left side - so the 8 metrics rows sharing the same run collapse into one.
Switching to join kind=inner fixed my original query.
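For reference, the fixed query end to end (a minimal sketch; the name filter is a hypothetical stand-in for the elided where clause above):
let metrics = customMetrics
    | where name == "JobDuration" // hypothetical filter
    | extend run = bin(timestamp, 1m);
let latestRun = metrics | top 1 by run desc;
metrics
| join kind=inner latestRun on run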

DynamoDB ExclusiveStartKey Misuse?

I was planning to use a Dynamo table as a sort of replication log, so I have a table that looks like this:
+--------------+--------+--------+
| Sequence Num | Action | Thing  |
+--------------+--------+--------+
| 0            | ADD    | Thing1 |
| 1            | DEL    | Thing1 |
| 2            | ADD    | Thing2 |
+--------------+--------+--------+
Each of my processes keeps track of the last sequence number it read. Then on an interval it issues a Scan against the table with ExclusiveStartKey set to that sequence number. I assumed this would result in reading everything after that sequence, but instead I am seeing inconsistent results.
For example, given the table above, if I do a Scan(ExclusiveStartKey=1), I get zero results when I am expecting to see the 3rd row (seq=2).
I have a feeling it has to do with the internal hashing DynamoDB uses to partition the items and that I am misusing the ExclusiveStartKey option.
Is this the wrong tool for the job?
Alternatively, each process could issue a Query for seq+1 on each interval (looping while anything is found), which would consume the same read throughput, but would require N API calls instead of the roughly N/1 MB calls (one per returned page) I would get with a Scan.
When you do a DynamoDB Scan operation, it does not proceed sorted by the hash key; items come back in DynamoDB's internal partition order. So using ExclusiveStartKey does not allow you to get an arbitrary page of keys - it is only meant to resume a previous Scan from the LastEvaluatedKey it returned.
For this example table with the Sequence ID, what I want can be accomplished with a Kinesis stream.
