Cosmos DB RU Usage by collection name

I am trying to find what's causing the high RU usage on our Cosmos DB account. I enabled Log Analytics on the account and ran the Kusto query below to get RU consumption by collection name.
AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "DataPlaneRequests"
| summarize ConsumedRUsPer15Minute = sum(todouble(requestCharge_s)) by collectionName_s, _ResourceId, bin(TimeGenerated, 15m)
| project TimeGenerated, ConsumedRUsPer15Minute, collectionName_s, _ResourceId
| render timechart
We have only one collection in the account (prd-entities), which is represented by the red line in the chart. I am not able to figure out what the blue line represents.
Is there a way to get more details about the RU usage with an empty collection name (i.e., the blue line)?

I'm not sure, but I don't think an empty collection is actually consuming RUs.
In my own testing, running your Kusto query also produced an 'empty collection' line, but when I inspected the row details, all of those rows corresponded to operations on my real collection. My point is that you shouldn't sum by collectionName_s, especially when you only have one collection in total; try requestResourceId_s instead.
When using requestResourceId_s, there are still some rows with no id, but they cost 0 RUs.
AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "DataPlaneRequests"
| summarize ConsumedRUsPer15Minute = sum(todouble(requestCharge_s)) by requestResourceId_s, bin(TimeGenerated, 15m)
| project TimeGenerated, ConsumedRUsPer15Minute, requestResourceId_s
| render timechart
You can also check which operations the requestCharge_s values come from: look at the row details in Results (rather than Chart) and order by collectionName_s. You'll then see which requests produce the 'empty collection' rows, and you can judge whether those requests belong to your collection.
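To dig into those rows, here is a small variant of the query above that groups only the empty-collection-name requests by operation. It's a sketch: OperationName is a standard AzureDiagnostics column, while requestResourceType_s is an assumption that may differ in your diagnostics schema.
AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "DataPlaneRequests"
| where isempty(collectionName_s)
// group the unattributed charges by operation to see what issued them
| summarize ConsumedRUs = sum(todouble(requestCharge_s)) by OperationName, requestResourceType_s
| order by ConsumedRUs desc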

Related

DynamoDB Global Secondary Index "Batch" Retrieval

I've seen older posts around this but am hoping to bring the topic up again. I have a table in DynamoDB that has a UUID for the primary key, and I created a global secondary index (GSI) for a more business-friendly key. For example:
| account_id  | email           | first_name | last_name |
|--------------|-----------------|------------|-----------|
| 4f9cb231...  | linda@gmail.com | Linda      | James     |
| a0302e59...  | bruce@gmail.com | Bruce      | Thomas    |
| 3e0c1dde...  | harry@gmail.com | Harry      | Styles    |
If account_id is my primary key and email is my GSI, how do I query the table to get accounts with email in ('linda@gmail.com', 'harry@gmail.com')? I looked at the IN conditional expression, but it doesn't appear to work with a GSI. I'm using the Go SDK v2 library but will take any guidance. Thanks.
Short answer, you can't.
DDB is designed to return a single item, via GetItem(), or a set of related items, via Query(). Related meaning that you're using a composite primary key (hash key & sort key) and the related items all have the same hash key (aka partition key).
Another way to think of it, you can't Query() a DDB Table/index. You can only Query() a specific partition in a table or index.
Scan() is the only operation that works across partitions in one shot. But scanning is very inefficient and costly since it reads the entire table every time.
You'll need to issue a GetItem() for every email you want returned.
Luckily, DDB now offers BatchGetItem(), which allows you to send multiple GetItem() requests (up to 100) in a single call. That saves a little network time and automatically runs the requests in parallel, but it is otherwise little different from what your application could do itself directly with GetItem(). Make no mistake: BatchGetItem() is making individual GetItem() requests behind the scenes; the requests in a batch don't even have to be against the same table. The cost of each request in a batch is the same as if you'd used GetItem() directly.
One difference to make note of: BatchGetItem() can only return 16 MB of data. So if your DDB items are large, you may not get as many items back as you requested.
For example, if you ask to retrieve 100 items, but each individual item is 300 KB in size, the system returns 52 items (so as not to exceed the 16 MB limit). It also returns an appropriate UnprocessedKeys value so you can get the next page of results. If desired, your application can include its own logic to assemble the pages of results into one dataset.
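For illustration, here is a minimal sketch of a BatchGetItem() call with the AWS SDK for Go v2. The table name "accounts" is hypothetical, and note that BatchGetItem() reads by the table's primary key (account_id here), not by a GSI attribute:
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := dynamodb.NewFromConfig(cfg)

	// One network call carrying two GetItem requests against the hypothetical
	// "accounts" table, keyed by its primary key account_id.
	out, err := client.BatchGetItem(context.TODO(), &dynamodb.BatchGetItemInput{
		RequestItems: map[string]types.KeysAndAttributes{
			"accounts": {
				Keys: []map[string]types.AttributeValue{
					{"account_id": &types.AttributeValueMemberS{Value: "4f9cb231..."}},
					{"account_id": &types.AttributeValueMemberS{Value: "3e0c1dde..."}},
				},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	// Responses come back grouped by table; anything squeezed out by the
	// 16 MB cap is reported in out.UnprocessedKeys for a follow-up call.
	fmt.Printf("got %d items\n", len(out.Responses["accounts"]))
}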
Because you have a GSI with a partition key of email (from what I understand), you can use PartiQL to get your batch of emails back. The API is called ExecuteStatement, and it uses a SQL-like syntax:
SELECT * FROM "mytable"."myindex" WHERE email IN ['email@email.com','email1@email.com']
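Since the question mentions the Go SDK v2, here is a minimal sketch of that call. The names "mytable"."myindex" and the emails are placeholders, and each ? in the statement is bound positionally from Parameters:
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := dynamodb.NewFromConfig(cfg)

	// PartiQL SELECT against the GSI; one call returns the whole batch.
	out, err := client.ExecuteStatement(context.TODO(), &dynamodb.ExecuteStatementInput{
		Statement: aws.String(`SELECT * FROM "mytable"."myindex" WHERE email IN [?, ?]`),
		Parameters: []types.AttributeValue{
			&types.AttributeValueMemberS{Value: "email@email.com"},
			&types.AttributeValueMemberS{Value: "email1@email.com"},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("matched %d items\n", len(out.Items))
}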

Application Insights - Calculate timespan between two events that have the same operation Id?

I have a lot of events in the traces section of Application Insights. I'm interested in two events, "Beginning" and "End"; they each have the same operation Id, as they're logged in sets.
Sometimes the "End" event won't exist, as there will have been a problem with the application we're monitoring.
We can say, for the sake of argument, that we have these fields that we're interested in: timestamp, eventName, operationId
How can I calculate the exact time between the two timestamps for the pair of events, for all unique operation Ids in a timespan?
My initial thought was to get the distinct operationIds from traces where the eventName is "Beginning"... but that's as far as I get, as I'm not really sure how to perform the rest of the operations required (namely the calculation, and checking whether the "End" event even exists).
let operations =
traces
| where customDimensions.eventName == "Beginning"
| distinct operationId
Any help would be greatly appreciated!
EDIT: I'm obviously thinking about this all wrong. What I'm after is non-unique operationIds; that will filter out the missing "End" events.
If I could then merge the resulting rows together based on that id, I would have two timestamps I could operate on.
So, I figured it out after some coffee and time to think.
I ended up with:
let a =
traces
| summarize count() by operation_Id;
let b =
a
| where count_ == 2
| project operation_Id;
let c =
traces
| where operation_Id in (b)
| join kind = inner(traces) on operation_Id
| order by timestamp, timestamp1
| project evaluatedTime=(timestamp1 - timestamp), operation_Id, timestamp;
c
| where evaluatedTime > timespan(0)
| project seconds=evaluatedTime/time(1s), operation_Id, timestamp
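For what it's worth, the same result can be computed without the self-join. This is a sketch assuming the event name lives in customDimensions.eventName, as in the question:
traces
| extend eventName = tostring(customDimensions.eventName)
| where eventName in ("Beginning", "End")
| summarize started = minif(timestamp, eventName == "Beginning"),
            ended = maxif(timestamp, eventName == "End"),
            events = dcount(eventName)
            by operation_Id
| where events == 2 // keep only operations where both events exist
| project operation_Id, seconds = (ended - started) / time(1s), started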

DynamoDB ExclusiveStartKey Misuse?

I was planning to use a Dynamo table as a sort of replication log, so I have a table that looks like this:
+--------------+--------+--------+
| Sequence Num | Action | Thing  |
+--------------+--------+--------+
| 0            | ADD    | Thing1 |
| 1            | DEL    | Thing1 |
| 2            | ADD    | Thing2 |
+--------------+--------+--------+
Each of my processes keeps track of the last sequence number it read. Then on an interval it issues a Scan against the table with ExclusiveStartKey set to that sequence number. I assumed this would result in reading everything after that sequence, but instead I am seeing inconsistent results.
For example, given the table above, if I do a Scan(ExclusiveStartKey=1), I get zero results when I am expecting to see the 3rd row (seq=2).
I have a feeling it has to do with the internal hashing DynamoDB uses to partition the items and that I am misusing the ExclusiveStartKey option.
Is this the wrong tool for the job?
Alternatively, each process could issue a Query for seq+1 on each interval (looping if anything was found), which would result in the same read throughput but would require N API calls instead of the N/1MB calls I would get with a Scan.
When you do a DynamoDB Scan operation, it does not proceed in sorted order by the hash key; items come back in the order of DynamoDB's internal hashing. So ExclusiveStartKey does not allow you to start from an arbitrary key value; it is intended to be the LastEvaluatedKey returned by the previous page of the same Scan.
For this example table with the Sequence ID, what I want can be accomplished with a Kinesis stream.
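For reference, here is a minimal sketch of the intended ExclusiveStartKey pattern with the AWS SDK for Go v2 (the table name "replication-log" is an assumption): each page's LastEvaluatedKey feeds the next request, rather than a key constructed by the client.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := dynamodb.NewFromConfig(cfg)

	input := &dynamodb.ScanInput{TableName: aws.String("replication-log")}
	for {
		out, err := client.Scan(context.TODO(), input)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("page with %d items\n", len(out.Items))
		// ExclusiveStartKey must be the LastEvaluatedKey handed back by the
		// previous page; an empty one means the scan is complete.
		if len(out.LastEvaluatedKey) == 0 {
			break
		}
		input.ExclusiveStartKey = out.LastEvaluatedKey
	}
}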

Using extend to add a count column in Azure Stream Analytics/Application Insights

I have an Application Insights Analytics query that looks like this...
requests
| summarize count() by bin(duration, 1000)
| order by duration asc nulls last
...which gives me something like this, showing the number of requests recorded in Application Insights, binned by duration into 1-second buckets:
| duration | count_ |
|----------|--------|
| 0        | 1000   |
| 1000     | 500    |
| 2000     | 200    |
I would like to able to add another column which shows the count of exceptions from all requests in each bin.
I understand that extend is used to add additional columns, but to do so I would have to reference the 'outer' expression to get the bin constraints, which I don't know how to do. Is this the best way to do this? Or am I better off trying to join the two tables together and then doing the summarize?
Thanks
As you suspected, extend will not help you much here. What you need is to run join kind=leftouter on the operation IDs (leftouter is needed so you won't drop requests that did not have any exceptions):
requests
| join kind=leftouter (
    exceptions
    | summarize exceptionsCount = count() by operation_Id
) on operation_Id
| summarize count(), sum(exceptionsCount) by bin(duration, 1000)
| order by duration asc nulls last
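One detail to watch: for requests with no matching exceptions, the left-outer join leaves exceptionsCount empty, so the summed column can come back blank for a bin. A variant of the same query (a sketch) fills those in with coalesce before summarizing:
requests
| join kind=leftouter (
    exceptions
    | summarize exceptionsCount = count() by operation_Id
) on operation_Id
// rows with no exceptions get 0 instead of an empty value
| extend exceptionsCount = coalesce(exceptionsCount, 0)
| summarize count(), exceptions = sum(exceptionsCount) by bin(duration, 1000)
| order by duration asc nulls last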

Can't filter Firebase data

I'm having problems filtering data with Polymer's firebase-collection element. My code is like this:
<firebase-collection location="--url--" order-by-child="user_id" equal-to="1" log="true" data="{{message}}"></firebase-collection>
My database is:
messages
|
|__0__content
| |__letter_id
| |__user_id
| |__datetime
|
|__1__content
| |__letter_id
| |__user_id
| |__datetime
|
|__etcetera
This should get the messages with a user_id equal to 1. However, it shows nothing. I guess this is a problem with my syntax, but I can't figure out what's wrong.
Never mind, I had my Firebase database set up with user_id stored as a number, while the equal-to attribute was comparing it as a string. Therefore it could not fetch the data.
I had the same issue but fixed it by setting both the order-value-type (as number) and the order-by-child attributes, like so:
<firebase-collection
order-by-child="posted_date"
start-at="0"
order-value-type="number"
limit-to-first="1"
log="true"
location="https://somefirebaselocation.firebaseio.com/articles"
data="{{articles}}"></firebase-collection>
Without the order-value-type the query did not work.
