How to check ADX ingestion log and queue? - azure-data-explorer

The command
.show ingestion failures
outputs errors in the ingestion process. However, I did not find a way to get a list of successfully ingested items, or to inspect the ingestion queue (the names of the queued items) and its current status (what is being ingested at the moment). Is this possible, and how can I view that information?

ADX is optimized for high throughput, so by default it does not expose tracking of individual ingest operations (that level of granularity would put extra load on the service).
We also do not expose detailed information about the queues, and certainly not a listing of the items in the ingress queue.
You can track all the ingest operations (failed/succeeded/both) by setting up Diagnostic Logs with Azure Monitor.
An aggregated view on your cluster via metrics is also available.
Please see Monitor Azure Data Explorer performance, health & usage with metrics and Monitor batching ingestion in Azure Data Explorer.
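For illustration only, here is a minimal Python sketch of pulling those diagnostic-log records from a Log Analytics workspace with the azure-monitor-query package. The workspace ID is a placeholder, and the SucceededIngestion/FailedIngestion table and column names assume resource-specific diagnostic logs for Azure Data Explorer; verify them against your own workspace.

    # Sketch: read ADX ingestion diagnostic logs from a Log Analytics workspace.
    # Assumes diagnostic settings route SucceededIngestion / FailedIngestion
    # records to the workspace identified by WORKSPACE_ID (placeholder value).
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    WORKSPACE_ID = "<your-log-analytics-workspace-id>"

    client = LogsQueryClient(DefaultAzureCredential())

    # Count succeeded and failed ingestions per target table over the last day.
    query = """
    SucceededIngestion
    | summarize Succeeded = count() by Table
    | join kind=fullouter (
        FailedIngestion
        | summarize Failed = count() by Table
    ) on Table
    """

    response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=1))
    for table in response.tables:
        for row in table.rows:
            print(row)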

Related

I have multiple JMS listeners where I am using the @JmsListener annotation with a queue as the destination. We are observing a loss of events in our listeners.

The volume of our application is very high, around 500 million messages per day. We are observing some loss of events. Please share if anyone has observed loss of events while consuming with @JmsListener.
There could be many reasons:
some of your messages are producing errors and generating rollbacks which you did not handle carefully
you are not using transactions while you are having connectivity / sync / infrastructure issues
it could be your configuration: not using persistent messaging, caching messages on the client side within the driver, etc.
Whatever it is, it is NOT an issue with the @JmsListener annotation, but with your code, driver, and messaging configuration.

How to see throttling requests caused by replication?

I have the following setup:
Cosmos DB with one write region (Central US) and one read-only region (West Europe). I need to upload new data nightly.
While uploading test data I see throttled requests in the write region (Central US), but I could not find out in the Azure portal whether throttling happened in the read-only region as well.
I tried to find such information in Insights, but it seems there is no grouping/filtering by region.
I tried to create a report in Metrics, with no luck.
So my questions are:
Is it guaranteed that data replication cannot cause throttled requests? From reading the thread "Can replication cause request throttling?" I assumed it is not guaranteed.
If the replication process can consume enough RUs to cause throttling, how can I monitor it?
P.S. I tried to use all 5 consistency levels while uploading data.
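Not a definitive answer, but as a starting point, here is a rough Python sketch of splitting the Cosmos DB request metric by region and status code (429 = throttled) with the azure-monitor-query package. The resource ID is a placeholder, and the TotalRequests metric name with its Region/StatusCode dimensions is an assumption to verify against your account's metric definitions.

    # Sketch: count throttled (HTTP 429) requests per region for a Cosmos DB account.
    # Resource ID, metric name and dimension names are assumptions / placeholders.
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import MetricsQueryClient, MetricAggregationType

    RESOURCE_ID = (
        "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
        "Microsoft.DocumentDB/databaseAccounts/<account-name>"
    )

    client = MetricsQueryClient(DefaultAzureCredential())

    response = client.query_resource(
        RESOURCE_ID,
        metric_names=["TotalRequests"],
        timespan=timedelta(hours=24),
        granularity=timedelta(minutes=5),
        aggregations=[MetricAggregationType.COUNT],
        # Splitting by Region and filtering on StatusCode surfaces 429s per region.
        filter="Region eq '*' and StatusCode eq '429'",
    )

    for metric in response.metrics:
        for series in metric.timeseries:
            throttled = sum(point.count or 0 for point in series.data)
            # metadata_values identifies the dimension combination (e.g. the region).
            print(series.metadata_values, "throttled requests:", throttled)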

Verify dynamodb is healthy

I would like to verify in my service's /health check that I have a connection to my DynamoDB.
I am searching for something like SELECT 1 in MySQL (it only pings the DB and returns 1), but for DynamoDB.
I saw this post, but searching for a nonexistent item is an expensive action.
Any ideas on how to only ping my db?
I believe the SELECT 1 equivalent in DDB is a Scan with a Limit of 1 item. You can read more here.
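A minimal boto3 sketch of that idea, assuming a placeholder table name:

    # Sketch: cheap "ping" of DynamoDB via a Scan limited to a single item.
    # "my-table" is a placeholder table name.
    import boto3
    from botocore.exceptions import ClientError, EndpointConnectionError

    dynamodb = boto3.client("dynamodb")

    def dynamodb_is_healthy(table_name: str = "my-table") -> bool:
        try:
            # Limit=1 stops the scan after one item, keeping consumed capacity minimal.
            dynamodb.scan(TableName=table_name, Limit=1)
            return True
        except (ClientError, EndpointConnectionError):
            return False

    print(dynamodb_is_healthy())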
DynamoDB is a managed service from AWS and is highly available anyway. Instead of using a query to verify the health of DynamoDB, why not set up CloudWatch metrics on your table and check for recent alarms in CloudWatch concerning DynamoDB? This will also prevent you from spending your read units.
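For instance, a rough boto3 sketch of that check, assuming the relevant alarms are defined on metrics in the AWS/DynamoDB namespace:

    # Sketch: treat any firing CloudWatch alarm on DynamoDB metrics as "unhealthy",
    # instead of issuing reads against the table itself.
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def dynamodb_alarms_firing() -> list:
        firing = []
        paginator = cloudwatch.get_paginator("describe_alarms")
        for page in paginator.paginate(StateValue="ALARM"):
            for alarm in page["MetricAlarms"]:
                # Only keep alarms on DynamoDB metrics (the namespace is an assumption;
                # adjust if your alarms use a custom namespace).
                if alarm.get("Namespace") == "AWS/DynamoDB":
                    firing.append(alarm["AlarmName"])
        return firing

    print(dynamodb_alarms_firing() or "no DynamoDB alarms in ALARM state")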
The question is perhaps too broad to answer as stated. There are many ways you could set this up, depending on your concerns and constraints.
My recommendation would be not to over-think or over-do it in terms of verifying connectivity from your service host to DynamoDB: for example, just performing a periodic GetItem should be sufficient to establish basic network connectivity.
Instead of going about the problem from this angle, perhaps you might want to consider a different approach:
a) set up canary tests that exercise all your service features periodically -- these should be "fail-fast", light tests that run constantly, and in the event of consistent failure you can take action
b) set up error metrics from your service and monitor those metrics: for example, CloudWatch allows you to take action on metrics -- you will likely get more mileage out of this approach than narrowly focusing on a single failure mode (i.e. DynamoDB, which, as others have stated, is a managed service with a very good availability SLA)
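To make option b) concrete, here is a minimal sketch of publishing a custom error metric from the service with boto3; the namespace and metric name are placeholders you would choose yourself, and a CloudWatch alarm can then be defined on this metric.

    # Sketch: publish a custom error-count metric from the service so CloudWatch
    # alarms (and actions) can be defined on it. Namespace/metric name are placeholders.
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def report_dynamodb_error(count: int = 1) -> None:
        cloudwatch.put_metric_data(
            Namespace="MyService/Health",
            MetricData=[
                {
                    "MetricName": "DynamoDBErrors",
                    "Value": count,
                    "Unit": "Count",
                }
            ],
        )

    # Call this from the error path of your DynamoDB data-access code.
    report_dynamodb_error()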

How do you retrieve cpu usage from Node in Kubernetes via API?

I want to calculate and show node specific cpu usage in percent in my own web application using Kubernetes API.
I need the same information that Kube UI and cAdvisor display, but I want to use the Kubernetes API.
I have found some CPU metrics under node-ip:10255/stats which contain a timestamp and CPU usage (total, user and system) as large numbers that I do not understand. Also, the CPU limit is reported as 1024.
How does Kube UI calculate cpu usage and is it possible to do the same via the API?
If you use Kubernetes v1.2, there is a new, cleaner metrics summary API. From the release note:
Kubelet exposes a new Alpha metrics API - /stats/summary in a user friendly format with reduced system overhead.
You can access the endpoint through <node-ip>:10255/stats/summary, and the detailed API objects are here.
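As an illustration, a small Python sketch that reads the summary endpoint and derives a rough CPU figure; the usageNanoCores field and the node core count are assumptions to verify against the API objects linked above:

    # Sketch: read node CPU usage from the kubelet summary API. Assumes port 10255
    # is reachable without authentication, as in the setup described in the question.
    import requests

    NODE_IP = "10.0.0.1"   # placeholder node IP
    NODE_CORES = 4         # assumption: take this from the node's reported capacity

    summary = requests.get(f"http://{NODE_IP}:10255/stats/summary", timeout=5).json()

    # usageNanoCores is the instantaneous CPU usage in nanocores (1 core = 1e9).
    usage_nano_cores = summary["node"]["cpu"]["usageNanoCores"]
    cores_used = usage_nano_cores / 1e9

    print(f"cores used: {cores_used:.2f}")
    print(f"approximate cpu usage: {100 * cores_used / NODE_CORES:.1f}%")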
So the way CPU usage metrics are usually collected in Kubernetes is using cAdvisor (https://github.com/google/cadvisor), which looks at the cgroups to get metrics, so mostly CPU and some memory metrics. cAdvisor can then put its data into a metrics DB like Heapster, InfluxDB, or Prometheus. Kubernetes does not directly deal with metrics, and therefore does not expose them through the API; however, you can use the metrics DB instead. Additionally, you can use an extra container in your pod to collect metrics and place them into your metrics DB. You can also get resource quotas through the API, but not usage. This proposal may be of some interest to you as well: https://github.com/kubernetes/kubernetes/blob/release-1.2/docs/proposals/metrics-plumbing.md

DynamoDB tables uses more Read/Write capacity than expected

Background: I have a DynamoDB table which I interact with exclusively through a DAO class. This DAO class logs metrics on the number of calls to insert/update/delete operations made through the boto library.
I noticed that the # of operations I logged in my code does correlate with the consumed read/write capacity in AWS monitoring, but the AWS measurements of consumption are 2 - 15 times the # of operations I logged in my code.
I know for a fact that the only other process interacting with the table is my manual queries in the AWS UI (which are insignificant in capacity consumption). I also know that the size of each item is < 1 KB, which would mean each call should only consume 1 read.
I use strongly consistent reads, so I do not enjoy the 2x benefit of eventually consistent reads.
I am aware that boto auto-retries at most 10 times when throttled, but my throttling threshold is seldom reached, so that is unlikely to be the cause.
With that said, I wonder if anyone knows of any factor that may cause such a discrepancy between the # of calls to boto and the actual consumed capacity.
While I'm not sure of the support with the boto AWS SDK, in other languages it is possible to ask DynamoDB to return the capacity that was consumed as part of each request. It sounds like you are logging actual requests and not this metric from the API itself. The values returned by the API should accurately reflect what is consumed.
One possible source for this discrepancy is if you are doing query/scan requests where you are performing server side filtering. DynamoDB will consume the capacity for all of the records scanned and not just those returned.
Another possible cause of a discrepancy is the actual metrics you are viewing in the AWS console. If you are viewing the CloudWatch metrics directly, make sure you are looking at the appropriate SUM or AVERAGE value depending on which metric you are interested in. If you are viewing the metrics in the DynamoDB console, the interval you are looking at can dramatically affect the graph (e.g. short spikes that appear in a 5-minute interval would be smoothed out in a 1-hour interval).
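To illustrate the first point, here is what requesting the consumed capacity looks like with the newer boto3 SDK (a sketch, not necessarily what the boto version in the question supports); the table name and key are placeholders:

    # Sketch: ask DynamoDB to report the capacity each request actually consumed,
    # shown here with the newer boto3 SDK. Table name and key are placeholders.
    import boto3

    dynamodb = boto3.client("dynamodb")

    response = dynamodb.get_item(
        TableName="my-table",
        Key={"id": {"S": "item-1"}},
        ConsistentRead=True,              # strongly consistent, as in the question
        ReturnConsumedCapacity="TOTAL",   # include consumed capacity in the response
    )

    # Log this value alongside your own operation counts to pin down the discrepancy.
    print(response["ConsumedCapacity"]["CapacityUnits"])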
