Multiple where clauses vs. 'and' in Kusto - azure-data-explorer

In terms of performance, is the following query
ResourceEvents
| where ResourceType == "Foo" and EventType == "Bar"
practically the same as
ResourceEvents
| where ResourceType == "Foo"
| where EventType == "Bar"
Or are the records filtered sequentially, performing two searches instead of one combined?

Both options are equivalent in terms of semantics and performance.

Adding to Yoni's answer, you can check it yourself by looking at the query plan.
.show queryplan <|
StormEvents
| where State == "TEXAS" and EventType == "Flood"
.show queryplan <|
StormEvents
| where State == "TEXAS"
| where EventType == "Flood"
The plans are equivalent.

In your exact scenario they appear to be equivalent; but if you have heavy parsing, it is better to use chained | where clauses than a single | where ... and ....
Maybe have a look at the KQL query best practices, under 'Lookup for rare keys/values in dynamic object'.
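As an illustration of that best practice, here is a minimal sketch (the table SomeLogs, the string column RawPayload, and the key someKey are all hypothetical) where chaining the filters lets the cheap term lookup run before the expensive JSON parsing:
SomeLogs
| where RawPayload has "rareValue"                      // cheap term-index lookup filters out most rows first
| where parse_json(RawPayload).someKey == "rareValue"   // expensive parsing runs only on the surviving rows
The chained form makes the intended evaluation order explicit, which is what the best-practices article recommends when looking for rare keys/values in dynamic objects.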
Thanks @sheldonzy! I didn't know about .show queryplan. Sadly it doesn't seem to be available when using Log Analytics, but some testing can be done by running queries against Microsoft's test ADX cluster, which is freely available.

Related


Cosmos DB RU Usage by collection name

I am trying to find what's causing the high RU usage on Cosmos DB. I enabled Log Analytics on the DocumentDB account and ran the Kusto query below to get RU consumption by collection name.
AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "DataPlaneRequests"
| summarize ConsumedRUsPer15Minute = sum(todouble(requestCharge_s)) by collectionName_s, _ResourceId, bin(TimeGenerated, 15m)
| project TimeGenerated , ConsumedRUsPer15Minute , collectionName_s, _ResourceId
| render timechart
We have only one collection on the DocDB account (prd-entities), which is represented by the red line in the chart. I am not able to figure out what the blue line represents.
Is there a way to get more details about the RU usage with an empty collection name (i.e., the blue line)?
I'm not sure, but I don't think an empty collection is actually consuming RUs.
From my own testing, when I execute your Kusto query I also get the 'empty collection' line, but when I look at the details of that line, all of those rows correspond to operations I actually ran. What I mean is that we shouldn't summarize by collectionName_s, especially since you only have one collection in total; you may want to try requestResourceId_s instead.
When using requestResourceId_s there are still some rows with no id, but they cost 0 RUs.
AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "DataPlaneRequests"
| summarize ConsumedRUsPer15Minute = sum(todouble(requestCharge_s)) by requestResourceId_s, bin(TimeGenerated, 15m)
| project TimeGenerated , ConsumedRUsPer15Minute , requestResourceId_s
| render timechart
Actually, you can check which operations the requestCharge_s values are coming from: look at the details in Results rather than in Chart, order by collectionName_s, and you'll see which requests are producing the 'empty collection' rows; then judge whether those requests belong to your collection.
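If you want to drill into those rows yourself, a sketch along these lines (assuming the DataPlaneRequests columns OperationName and requestResourceType_s are populated in your workspace) groups the charge of the rows with an empty collectionName_s by operation:
AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "DataPlaneRequests"
| where isempty(collectionName_s)            // only the rows behind the 'empty collection' (blue) line
| summarize ConsumedRUs = sum(todouble(requestCharge_s)), Requests = count() by OperationName, requestResourceType_s
| order by ConsumedRUs desc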

Application Insights - Calculate timespan between two events that have the same operation Id?

I've got a lot of events in the traces section of Application Insights. I'm interested in two events, "Beginning" and "End"; they each have the same operation Id as they're logged in sets.
Sometimes the "End" event won't exist, as there will have been a problem with the application we're monitoring.
We can say, for the sake of argument that we have these fields that we're interested in: timestamp, eventName, operationId
How can I calculate the exact time between the two timestamps for the pair of events for all unique operation Ids in a timespan?
My initial thought was to get the distinct operationIds from traces where the eventName is "Beginning"... But that's as far as I get, as I'm not really sure how to perform the rest of the operations required (namely the calculation, and checking whether the "End" event even exists).
let operations =
traces
| where customDimensions.eventName == "Beginning"
| distinct operationId
Any help would be greatly appreciated!
EDIT: I'm obviously thinking about this all wrong. What I'm after is non-unique operationIds; this will filter out the missing "End" events.
If I could then merge the resulting rows together, based on that id, I would have two timestamps I could operate on.
So, I figured it out after some coffee and time to think.
Ended up with:
let a =
traces
| summarize count() by operation_Id;           // number of trace events per operation id
let b =
a
| where count_ == 2                            // keep only operations that logged both events
| project operation_Id;
let c =
traces
| where operation_Id in (b)
| join kind = inner (traces) on operation_Id   // pair each event with the other event of the same operation
| order by timestamp, timestamp1
| project evaluatedTime = (timestamp1 - timestamp), operation_Id, timestamp;
c
| where evaluatedTime > timespan(0)            // keep only the Beginning -> End direction
| project seconds = evaluatedTime / time(1s), operation_Id, timestamp
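For comparison, the same duration can be computed without the self-join, using conditional aggregates in a single summarize; this is only a sketch and assumes the event name really is stored in customDimensions.eventName as described in the question:
traces
| extend eventName = tostring(customDimensions.eventName)
| where eventName in ("Beginning", "End")
| summarize started = minif(timestamp, eventName == "Beginning"),
            ended = maxif(timestamp, eventName == "End"),
            starts = countif(eventName == "Beginning"),
            ends = countif(eventName == "End") by operation_Id
| where starts == 1 and ends == 1              // drop operations that are missing either event
| project operation_Id, started, seconds = (ended - started) / 1s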

Application Insights Join/Combine Columns Into A Single Column

I have an Application Insights query, and in this query I want to join/combine several columns into a single column for display. How can this be accomplished?
I want to combine ip, city, state, country.
customEvents
| where timestamp >= ago(7d)
| where (itemType == 'customEvent')
| where name == "Signin"
| project timestamp, customDimensions.appusername, client_IP,client_City,client_StateOrProvince, client_CountryOrRegion
| order by timestamp desc
strcat is your friend, with whatever strings you want as separators (I just use spaces in the example):
| project timestamp, customDimensions.appusername,
strcat(client_IP," ",client_City," ",client_StateOrProvince," ", client_CountryOrRegion)
Also, the | where (itemType == 'customEvent') in your query is unnecessary, as everything in the customEvents table is already a customEvent. You only need a filter like that on itemType if you query multiple tables somehow (like union requests, customEvents, or a join somewhere in your query that references multiple tables).
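For example, an itemType filter only starts to matter in a cross-table query like this hypothetical sketch, where the union pulls in request rows as well:
union requests, customEvents
| where timestamp >= ago(7d)
| where itemType == 'customEvent'    // needed here: the union also contains rows from the requests table
| where name == "Signin"
| project timestamp, name, client_IP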

DynamoDB ExclusiveStartKey Misuse?

I was planning to use a Dynamo table as a sort of replication log, so I have a table that looks like this:
+--------------+--------+--------+
| Sequence Num | Action | Thing  |
+--------------+--------+--------+
| 0            | ADD    | Thing1 |
| 1            | DEL    | Thing1 |
| 2            | ADD    | Thing2 |
+--------------+--------+--------+
Each of my processes keeps track of the last sequence number it read. Then on an interval it issues a Scan against the table with ExclusiveStartKey set to that sequence number. I assumed this would result in reading everything after that sequence, but instead I am seeing inconsistent results.
For example, given the table above, if I do a Scan(ExclusiveStartKey=1), I get zero results when I am expecting to see the 3rd row (seq=2).
I have a feeling it has to do with the internal hashing DynamoDB uses to partition the items and that I am misusing the ExclusiveStartKey option.
Is this the wrong tool for the job?
Alternatively, each process could issue a Query for seq+1 on each interval (looping if anything was found), which would result in the same read throughput, but would require N API calls instead of the roughly N/1MB calls I would get with a Scan.
A DynamoDB Scan does not proceed in order of the hash key values; items come back in the order of DynamoDB's internal hashing. So ExclusiveStartKey cannot be used to jump to an arbitrary page of keys; it only resumes a Scan from where a previous page left off.
For this example table with the Sequence ID, what I want can be accomplished with a Kinesis stream.
