How do you retrieve CPU usage from a Node in Kubernetes via the API?

I want to calculate and show node-specific CPU usage as a percentage in my own web application, using the Kubernetes API.
I need the same information that the Kube UI and cAdvisor display, but I want to get it through the Kubernetes API.
I have found some CPU metrics under <node-ip>:10255/stats, which contain a timestamp and CPU usage (total, user and system) as very large numbers I do not understand. The CPU limit is also reported as 1024.
How does the Kube UI calculate CPU usage, and is it possible to do the same via the API?

If you use Kubernetes v1.2, there is a new, cleaner metrics summary API. From the release note:
Kubelet exposes a new Alpha metrics API - /stats/summary in a user friendly format with reduced system overhead.
You can access the endpoint through <node-ip>:10255/stats/summary; the detailed API objects are defined in the kubelet's summary API types.
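For illustration, here is a minimal sketch (not part of the original answer) of reading that endpoint and turning the node CPU counter into a percentage. The field path node.cpu.usageNanoCores and the core count are assumptions that may differ between kubelet versions, and NODE_IP is a placeholder.

```csharp
// Hedged sketch: poll <node-ip>:10255/stats/summary and derive a CPU percentage.
// The field path (node.cpu.usageNanoCores) and the core count are assumptions;
// check the summary payload of your kubelet version before relying on them.
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class NodeCpu
{
    static async Task Main()
    {
        const int coreCount = 4;                      // assumed number of cores on the node
        using var http = new HttpClient();
        var json = await http.GetStringAsync("http://NODE_IP:10255/stats/summary");

        using var doc = JsonDocument.Parse(json);
        var usageNanoCores = doc.RootElement
            .GetProperty("node")
            .GetProperty("cpu")
            .GetProperty("usageNanoCores")
            .GetDouble();

        // usageNanoCores is an instantaneous rate in nano-cores; divide by total capacity.
        var cpuPercent = usageNanoCores / (coreCount * 1e9) * 100;
        Console.WriteLine($"Node CPU usage: {cpuPercent:F1}%");
    }
}
```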

CPU usage metrics in Kubernetes are usually collected with cAdvisor (https://github.com/google/cadvisor), which reads the cgroups to get metrics, mostly CPU and some memory. cAdvisor can then push its data into a metrics backend such as Heapster, InfluxDB or Prometheus. Kubernetes does not deal with metrics directly, so it does not expose them through its API; you can query the metrics backend instead. You can also run an additional container in your pod to collect metrics and write them into your metrics backend. Resource quotas are available through the API, but usage is not. This proposal may be of interest to you as well: https://github.com/kubernetes/kubernetes/blob/release-1.2/docs/proposals/metrics-plumbing.md.
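If the cAdvisor data ends up in Prometheus, for example, the node CPU rate can be read from Prometheus' HTTP API instead of the Kubernetes API. A rough sketch (PROMETHEUS_HOST is a placeholder, and the PromQL below assumes the standard cAdvisor metric container_cpu_usage_seconds_total is being scraped):

```csharp
// Hedged sketch: ask Prometheus for the per-node CPU usage rate derived from cAdvisor metrics.
// id="/" selects the root cgroup, i.e. total node usage; divide by core count for a percentage.
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class PrometheusNodeCpu
{
    static async Task Main()
    {
        var promQl = "sum(rate(container_cpu_usage_seconds_total{id=\"/\"}[1m])) by (instance)";
        using var http = new HttpClient();
        var url = "http://PROMETHEUS_HOST:9090/api/v1/query?query=" + WebUtility.UrlEncode(promQl);
        var json = await http.GetStringAsync(url);
        Console.WriteLine(json);   // JSON result: cores used per node
    }
}
```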

Related

How to check ADX ingestion log and queue?

The command
.show ingestion failures
outputs errors in the ingestion process. However, I did not find a way to get a list of successfully ingested items, to inspect the ingestion queue (names of the items), or to see the current status (what is being ingested at the moment). Is this possible, and how can I view that information?
ADX is optimized for high throughput, so by default it does not expose tracking of individual ingest operations (that level of granularity puts extra load on the service).
We also do not expose detailed information about the queues, and certainly not a listing of the items in the ingress queue.
You can track all the ingest operations (failed/succeeded/both) by setting up Diagnostic Logs with Azure Monitor.
An aggregated view on your cluster via metrics is also available.
Please see Monitor Azure Data Explorer performance, health & usage with metrics and Monitor batching ingestion in Azure Data Explorer.
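For illustration, a rough sketch (not from the original answer) of reading those diagnostic logs once they are routed to a Log Analytics workspace, using the Azure.Monitor.Query package. The SucceededIngestion table and its columns assume resource-specific diagnostic settings; adjust to what your workspace actually contains, and "<workspace-id>" is a placeholder.

```csharp
// Hedged sketch: query the SucceededIngestion / FailedIngestion diagnostic-log tables in a
// Log Analytics workspace. Table and column names are assumptions based on resource-specific
// diagnostic settings for Azure Data Explorer.
using System;
using System.Threading.Tasks;
using Azure.Identity;
using Azure.Monitor.Query;

class IngestionLogCheck
{
    static async Task Main()
    {
        var client = new LogsQueryClient(new DefaultAzureCredential());

        var result = await client.QueryWorkspaceAsync(
            "<workspace-id>",
            "SucceededIngestion | project TimeGenerated, Database, Table, IngestionSourcePath | take 20",
            new QueryTimeRange(TimeSpan.FromHours(1)));

        foreach (var row in result.Value.Table.Rows)
            Console.WriteLine(string.Join(", ", row));   // one line per successfully ingested item
    }
}
```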

Azure App Insights Operation count is inexplicably high

We are currently monitoring a web API using the data in the Performance page of Application Insights, to give us the number of requests received per operation.
The architecture of our API solution is to use APIM as the frontend and an App Service as the backend. Both instances have App Insights enabled, and we don't see a reasonable correlation between the number of requests to APIM and the requests to the App Service. Also, this is most noticeable only in a couple of operations.
For example,
The Apim-GetUsers operation has a count of 60,000 requests per day (APIM's AI instance; screenshot: APIM App Insights Performance page).
The AS-GetUsers operation has a count of 3,000,000 requests per day (the App Service's AI instance; screenshot: App Service App Insights Performance page).
Apim-GetUsers routes the request to AS-GetUsers and Apim-GetUsers is the only operation that can call AS-GetUsers.
Given this, I would expect to see ~60,000 requests on the App Service's AI performance page for that operation, instead we see that huge number.
I looked into this a little and learned about sampling, and that some App Insights features use the itemCount property to reconstruct the exact number of requests. In summary:
Is my expectation correct, and if so what could cause this? Also, would disabling adaptive sampling and using a fixed sampling rate give me the expected result?
Is my expectation wrong, and if so, what is a good way to get the expected result? Should I not use the Performance page for that metric?
I haven't tried much yet, since I won't get access to change the settings until I can propose a viable solution, but I have looked into sampling and the itemCount property as mentioned above. APIM sampling is set to 100%.
I ran a query in Log Analytics on the requests table: when I simply counted the requests, I got a number close to the one I see in APIM, but when I summed itemCount, as some MS docs suggest, I got the huge number shown on the Performance page.
List of NuGet packages and versions that you are using:
Microsoft.Extensions.Logging.ApplicationInsights 2.14.0
Microsoft.ApplicationInsights.AspNetCore 2.14.0
Runtime version (e.g. net461, net48, netcoreapp2.1, netcoreapp3.1, etc. You can find this information in the *.csproj file):
netcoreapp3.1
Hosting environment (e.g. Azure Web App, App Service on Linux, Windows, Ubuntu, etc.):
App Service on Windows
Edit 1: Picture of operation_Id and itemCount
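For reference, a hedged sketch (not a confirmed fix) of the option raised in the questions above: disabling adaptive sampling and switching to a fixed rate with the Microsoft.ApplicationInsights.AspNetCore package already listed. The 10% rate is an arbitrary example value.

```csharp
// Hedged sketch for Startup.cs: turn off adaptive sampling and apply a fixed sampling rate.
// With sampling active, sum(itemCount) in Log Analytics approximates the true request count
// while a plain row count undercounts; at 100% sampling the two should converge.
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

public partial class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddApplicationInsightsTelemetry(options =>
        {
            options.EnableAdaptiveSampling = false;   // stop the SDK from adjusting the rate under load
        });
    }

    public void Configure(IApplicationBuilder app, TelemetryConfiguration configuration)
    {
        // Fixed-rate sampling: keep 10% of items; each kept item then carries itemCount ~10.
        var builder = configuration.DefaultTelemetrySink.TelemetryProcessorChainBuilder;
        builder.UseSampling(10);
        builder.Build();
    }
}
```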

What can I do to speed up my load test using NBomber? (VS LT reaches 250 RPS easily; NBomber maxes out at 25 RPS)

We've been using Visual Studio Load Test to exercise our .NET Framework 4.7.2 telemetry client, where we can set up the load test to post metrics to our RabbitMQ at a rate of about 250 metrics per second. Recently, we've had to migrate our telemetry client to .NET Core and need to run load tests to verify that it can still post metrics at the same rate. Visual Studio Load Test (VSLT) is being deprecated and has no support for .NET Core, so we've had to look at something like NBomber in place of VSLT.
With regard to NBomber, there doesn't seem to be enough documentation or support available; I've tried everything I know and cannot get NBomber to post more than 25 metrics per second. At the same time, I'm seeing 100% CPU usage.
Does anyone have any insight to share? Thanks in advance for your help,
Tien
It turns out my logic was bad. A senior developer and friend pointed out that I was initializing a new telemetry client for every metric I posted. That was the cause of the high CPU consumption and what kept me from reaching the performance I expected. I'm now re-coding my test(s) so that NBomber initializes 250 telemetry clients posting a minimum of 1MM metrics within an hour. A fix I ran yesterday posted 17K metrics within 56 seconds using just one telemetry client, a rate of roughly 300 RPS. I thought VS LT was awesome, but NBomber is quite impressive.
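A rough sketch of that pattern (NBomber's C# API; exact names vary across NBomber versions, and MyTelemetryClient / PostMetricAsync are hypothetical stand-ins for the real client): create the telemetry client once, outside the step, and reuse it for every call.

```csharp
// Hedged sketch: one shared telemetry client, reused by every step invocation,
// instead of constructing a new client per posted metric.
using System;
using System.Threading.Tasks;
using NBomber.Contracts;
using NBomber.CSharp;

public static class MetricsLoadTest
{
    public static void Run()
    {
        var telemetryClient = new MyTelemetryClient();   // hypothetical client that posts to RabbitMQ

        var step = Step.Create("post_metric", async context =>
        {
            await telemetryClient.PostMetricAsync("cpu", 0.5);   // hypothetical call
            return Response.Ok();
        });

        var scenario = ScenarioBuilder
            .CreateScenario("post_metrics", step)
            .WithLoadSimulations(Simulation.InjectPerSec(250, TimeSpan.FromMinutes(60)));

        NBomberRunner.RegisterScenarios(scenario).Run();
    }
}
```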
Cheers to Load Testing with NBomber!!
Tien
If a single instance of NBomber is consuming 100% CPU and not generating the necessary load, you will need to set up another machine and run NBomber in distributed cluster mode.
Why do you need a cluster?
You have reached the point where the capacity of one node is not enough to create the relevant load.
You want to delegate running multiple scenarios to different nodes. For example, you want to test the database by sending read and write queries in parallel. In this case, one node can send inserts and another one can send read queries.
You want to simulate a real production load that requires several nodes to participate. For example, you may have one node that periodically writes data to the Kafka broker and two nodes that constantly read this data from the Redis cache.
Also, it seems that Microsoft recommends Apache JMeter™, so it might be worth giving it a try. JMeter is capable of sending messages to various MQ implementations and its documentation is more concise; see, for example, Building a JMS Topic Test Plan.

Does U-SQL allow custom code to call external services?

In U-SQL custom code (code-behind or assemblies), can external services be called, e.g. Bing Search or Maps?
Thanks,
Nasir
This is currently not supported for the following reason:
Imagine that you write a UDF or UDO (e.g., an extractor) that calls the REST endpoint of a service which normally gets a few calls per minute from the same originating IP address. But now you execute this user code in a U-SQL job that is scaled out over millions of rows, possibly running on hundreds of vertices concurrently. This is a (hopefully unintended) distributed denial-of-service attack against that service, and it will most likely lead to that service experiencing an outage and our IP ranges getting blocked.
Thus, we are currently closing off our containers and recommend that you use other mechanisms (like getting a data set for coordinate translations) instead.

Service Fabric: Should I split my API up into multiple little APIs?

I have been building .NET Web APIs for years... normally I have one API with 10 or so different controllers that handle everything from signing users up, handling business logic, payment, etc. Those all talk to class libraries that talk to the database and such. Nothing fancy, but it has been effective.
Fast forward to today... I am building a version 2 for an app that gets a good amount of traffic. I know my app is gonna get hit hard so I am looking for something with a foundation of efficiency and scale.
This has led me to embrace the coolness of Service Fabric and ASP.Net Core Web APIs. I have been reading lots of tutorials, articles, and SO questions and from what I understand, the beauty of Service Fabric is that it will spawn up multiple nodes in a single VM when things get busy.
So, if I maintain my normal pattern and make a single Web API with 10+ controllers, can Service Fabric do what it needs to do? Or am I supposed to create multiple smaller, more focused APIs so that Service Fabric can add/remove them as things get busy?
That sounds like the right thing to do, and I have set up my code for it by putting my Models and Data classes in their own class libraries so they can be reused by the different APIs, but I just wanted to double-check before I do something potentially stupid.
If I split up, say, each controller into its own Service Fabric service, will the Azure server be more efficient and scale better?
Nodes
In Service Fabric clusters (on Azure / standalone) a Node equals a VM. If you increase the number of machines, more Nodes appear in the cluster. (This is not the case for your local dev cluster.) Scaling Azure clusters is simple: just change the VMSS instance count.
Only if you configure a stateless service with instance count -1 will Service Fabric spawn new instances of it, and that happens when nodes are added, not because of load itself.
You can configure autoscaling for VM scale sets (VMSSes).
Web API
Service Fabric just tries to balance the load of all running SF services across the available resources. That could be one instance of one service type on every node, or multiple instances of many types. A single service can simply use all the resources of the node it's running on, just as with IIS. (This is why container support is coming, by the way.)
Web API design isn't directly influenced by Service Fabric. The same rules apply as when running on IIS or elsewhere. It's really your choice.
Microservices
Your normal pattern will work, but splitting it into smaller services could help reduce the impact of changes (at the cost of increased complexity). Consider creating services that offer common functionality, following the microservices paradigm.
With microservices, code changes are scoped to smaller modules, less testing is needed, and performance degrades less during updates. In theory, this lets you release new features in less time.
It depends.
If you have a natural division in your controllers regarding the resources they use, then you may get some benefit from splitting your services along that division line. Say service A uses lots of CPU and service B mostly serves HTTP; giving SF the ability to place the CPU-heavy load separately may mean fewer affected HTTP calls.
You can optimize how SF distributes load by reporting load metrics from inside your app, but do so in the simplest way possible and don't add numerous dimensions, maybe one metric per service at most (a rough sketch follows).
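As a minimal sketch of that, assuming a custom metric (here hypothetically named "RequestsPerSecond") has been declared for the service so the Cluster Resource Manager can balance on it:

```csharp
// Hedged sketch: report one custom load metric from inside a stateless service.
// "RequestsPerSecond" is a hypothetical metric name and must match a metric declared
// for the service; ReportLoad itself is part of System.Fabric.
using System.Collections.Generic;
using System.Fabric;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class UsersApiService : StatelessService
{
    public UsersApiService(StatelessServiceContext context) : base(context) { }

    // Call this periodically (e.g. from a timer) with a value you measured yourself.
    private void ReportCurrentLoad(int requestsPerSecond)
    {
        this.Partition.ReportLoad(new List<LoadMetric>
        {
            new LoadMetric("RequestsPerSecond", requestsPerSecond)
        });
    }
}
```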
If all your controllers use roughly the same types of resources in roughly the same way, then there's no real benefit to splitting them into separate services, just added complications in code management, deployments and potentially inter-service communication.
