cpu_util metric in Ussuri version of OpenStack

Although cpu_util is deprecated in the Ussuri release, I understand there is an alternative way to get CPU utilization through the cpu metric. I tried the query below to retrieve CPU utilization, and it worked.
gnocchi aggregates '(* (/ (aggregate rate:mean (metric cpu mean)) 300000000000) 100)' id=620a7185-fbe4-4744-ae73-66b16c4c533a
But I'm using OpenStack Tacker on Ussuri, and I need to trigger automatic scaling of instances when CPU utilization is greater than 80%. How do I go ahead with the above Gnocchi query?
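One way to act on that query, as an untested sketch: Gnocchi only stores metrics, but Aodh, the alarming service, can evaluate the same rate-based cpu metric and call a webhook, which is how Tacker's alarm-based scaling is normally driven. The resource ID and the 300 s granularity come from the question; the alarm name, threshold, and webhook variable below are my assumptions. 80% of one vCPU over a 300 s window is 0.8 x 300 x 10^9 = 240000000000 ns of CPU time.

```shell
# Hypothetical Aodh alarm: fire when the instance's cpu metric grows by more
# than 80% of one vCPU, averaged over 300 s windows.
# TACKER_ALARM_URL is a placeholder for the alarm webhook URL from your VNFD.
aodh alarm create \
  --name vdu-cpu-high \
  --type gnocchi_resources_threshold \
  --resource-type instance \
  --resource-id 620a7185-fbe4-4744-ae73-66b16c4c533a \
  --metric cpu \
  --aggregation-method rate:mean \
  --granularity 300 \
  --threshold 240000000000 \
  --comparison-operator gt \
  --alarm-action "$TACKER_ALARM_URL"
```

With Tacker's built-in alarm monitoring, the equivalent policy usually lives in the VNFD itself rather than being created by hand; the manual alarm above is just to show where the 80% threshold lands in Gnocchi's nanosecond units.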


Why is DataStax OpsCenter eating too much CPU?

Environment:
machines: 2.1 Xeon, 128 GB RAM, 32 CPUs
os: CentOS 7.2 15.11
cassandra version: 2.1.15
opscenter version: 5.2.5
3 keyspaces: Opscenter (3 tables), OpsCenter (10 tables), and the application's keyspace (485 tables)
2 datacenters: 1 for Cassandra (5 machines) and another, DCOPS, to store the OpsCenter data (1 machine)
Right now the agents on the nodes consume on average ~1300 CPU (out of 3200 available), while the only transactional load is ~1500 w/s on the application keyspace.
Is there any relation between the number of tables and OpsCenter load? Is it eating so much CPU because the agents are trying to write data for too many metrics, or is it some kind of bug?
Note: the behaviour was the same on the previous OpsCenter version, 5.2.4. That is why I first tried upgrading OpsCenter to the newest available version.
From opscenter 5.2.5 release notes :
"Fixed an issue with high CPU usage by agents on some cluster topologies. (OPSC-6045)"
Any help/opinion much appreciated.
Thank you.
Observing with the awesome tool you provided, Chris, I noticed on the specific agent's PID that heap utilisation was constantly above 90%, which triggered a lot of GC activity with huge GC pauses of almost 1 second. During those pauses I suspect the polling threads had to wait and block, costing a lot of CPU. Anyway, I am not a specialist in this area.
I decided to enlarge the agent's heap from the default 128 MB to a nicer value of 512 MB, and all the GC pressure went away; thread allocation now behaves nicely.
Overall, the OpsCenter agent's CPU utilization dropped from 40-50% down to 1-2%. And I can live with 1-2%, since I know for sure that CPU is consumed by the JMX metrics.
So my advice is to edit the file datastax-agent-env.sh, raise the default -Xmx128M to -Xmx512M, save the file, restart the agent, and monitor for a while.
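As a sketch, assuming the usual package-install path for the agent (verify the path on your own nodes):

```shell
# Raise the DataStax agent heap from the default 128 MB to 512 MB.
# The path below is the typical package location; adjust for your layout.
sudo sed -i 's/-Xmx128M/-Xmx512M/' /usr/share/datastax-agent/bin/datastax-agent-env.sh
sudo service datastax-agent restart
```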
http://s000.tinyupload.com/?file_id=71546243887469218237
Thank you again Chris.
Hope this will help other folks.

How do I modify AWS cloudwatch metrics for memory?

I have a cluster of Elastic Beanstalk instances running Wordpress. RAM utilization is high, around 93% because of some plugins that we're running (a different issue altogether). However, every time one of the instances hits 90+% RAM utilization it kicks the instance into a "degraded" state and kicks it out of the cluster.
I can't for the life of me find a way to modify this RAM utilization check. Ideally I'd bump the threshold up to 95% or so. How do I change this metric, disable it, or remove it altogether?
EDIT
Here is a screenshot of the Elastic Beanstalk Health screen. I may have jumped to conclusions in assuming it's a CloudWatch metric, but the concept is the same: how do I raise the threshold?
Of particular note, the instance status and the message beneath it.

MemSQL: High CPU usage

My cluster has one MASTER AGGREGATOR and one LEAF. After running for two months, CPU usage on the LEAF is very high, almost 100%. Is this normal?
By the way, the table data is only about 545 MB.
This is not normal for MemSQL operation. Note that the Ops console shows all CPU use on that host, not just what MemSQL is using. I recommend running 'top' or similar to determine which process(es) are consuming the resources.
You can also run 'SHOW PROCESSLIST' on any node to see if there is a long-running MemSQL query.
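A minimal sketch of both checks (the mysql host, port, and user are placeholder assumptions; MemSQL speaks the MySQL wire protocol, so a stock client works):

```shell
# See which processes actually own the CPU time on the leaf host.
top -b -n 1 | head -n 15

# Look for long-running queries on the node; connection details are placeholders.
mysql -h 127.0.0.1 -P 3306 -u root -e 'SHOW PROCESSLIST'
```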

Difference between CPU Usage and CPU Utilization?

I was wondering whether there is a scientific distinction between the terms CPU Usage and CPU Utilization. I have the feeling both are used as synonyms: both describe the relation between CPU time and CPU capacity. Wikipedia calls it CPU Usage; Microsoft uses CPU Utilization, though I also found an article where Microsoft uses the term CPU Usage. VMware, meanwhile, uses CPU Utilization in the context of physical CPUs and CPU Usage in the context of logical CPUs. Also, there is no cpu_utilization tag on Stack Overflow.
Does anyone know a scientific differentiation?
Usage
CPU usage as a percentage during the interval.
- VM: Amount of actively used virtual CPU, as a percentage of total available CPU. This is the host's view of the CPU usage, not the guest operating system's view. It is the average CPU utilization over all available virtual CPUs in the virtual machine. For example, if a virtual machine with one virtual CPU is running on a host that has four physical CPUs and the CPU usage is 100%, the virtual machine is using one physical CPU completely.
  virtual CPU usage = usagemhz / (# of virtual CPUs x core frequency)
- Host: Actively used CPU of the host, as a percentage of the total available CPU. Active CPU is approximately equal to the ratio of the used CPU to the available CPU.
  available CPU = # of physical CPUs x clock rate
  100% represents all CPUs on the host. For example, if a four-CPU host is running a virtual machine with two CPUs, and the usage is 50%, the host is using two CPUs completely.
- Cluster: Sum of actively used CPU of all virtual machines in the cluster, as a percentage of the total available CPU.
  CPU Usage = CPU usagemhz / effectivecpu
CPU usage, as measured in megahertz, during the interval.
- VM: Amount of actively used virtual CPU. This is the host's view of the CPU usage, not the guest operating system's view.
- Host: Sum of the actively used CPU of all powered-on virtual machines on a host. The maximum possible value is the frequency of the processors multiplied by the number of processors. For example, if you have a host with four 2 GHz CPUs running a virtual machine that is using 4000 MHz, the host is using two CPUs completely:
  4000 / (4 x 2000) = 0.50
Used
Time accounted to the virtual machine. If a system service runs on behalf of this virtual machine, the time spent by that service (represented by cpu.system) should be charged to this virtual machine. If not, the time spent (represented by cpu.overlap) should not be charged against this virtual machine.
Reference: http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.wssdk.apiref.doc%2Fcpu_counters.html
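The host-level example above can be sanity-checked with a one-liner (same numbers as in the excerpt):

```shell
# used MHz / (number of physical CPUs x MHz per CPU) = fraction of host CPU used
awk 'BEGIN { printf "%.2f\n", 4000 / (4 * 2000) }'
# prints 0.50, i.e. two of the four 2 GHz CPUs fully used
```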
Very doubtful. You will probably find exact definitions in some academic textbooks, but I bet they'll be inconsistent between textbooks. I've seen definitions in man pages that are inconsistent with the actual implementation in the code. This is a case where everyone assumes the definitions are so obvious that they never check whether theirs is consistent with others'.
My suggestion is to fully define your own usage and go with that. Others then have a reference (your formula/algorithm) and can translate between your definitions and theirs.
By the way, figuring out utilization, usage, etc. is very complicated and fraught with traps. OSs move tasks around, logical CPUs move between cores, turbo modes temporarily bump clock rates, work is offloaded to internal coprocessors, processors go to sleep or drop in frequency, hyperthreading where multiple logical CPUs contend for shared resources, etc. What's worse is that it is a moving target. Exact and well-defined metrics today will start to get out of date quickly as hardware and software architectures continue to evolve per Moore's law and any SW equivalent.
Within a single context (paper, book, web article, etc.), there may be a difference, but there are not, as far as I know, consistent universally accepted standard definitions for these terms.
Within one author's writings, however, they might be used to describe different things. For example (not an exhaustive list):
How much of a single CPU's computing capacity is being used over a specific sample period
How much of a single CPU's computing capacity is being used by a specific schedulable entity (thread, process, light-weight process, kernel, interrupt routine, etc.) over a specific sample period
Either of the above, but taking all CPUs in the system into account
Any of the above, but with a difference in perspective between real CPUs and virtual CPUs (whether hyperthreading or CPUs actually being emulated by VMware, KVM/QEMU, Xen, Virtualbox or the like)
A comparative measure of how much CPU capacity is being used in one algorithm over another
Probably several other possibilities as well....

When should I add (auto-scale) new Nginx server instances?

Should I take into consideration CPU utilization, network traffic, or HTTP response time checks? I ran some tests with Apache ab (from the same server, e.g. ab -k -n 500000 -c 100 http://192.XXX.XXX.XXX/) and monitored the load average. Even when the load was between 1.0 and 1.5 (a one-core server), the mean "time per request" was pretty solid: 140 ms for a simple dynamic page with one Redis set/get operation. Anyway, I'm confused, as the general advice is to launch a new instance when you pass the 70% CPU utilization threshold.
70% CPU utilization is a good rule of thumb for CPU-bound applications like nginx. CPU time is a bit like body temperature: it hides a lot of different things, but it is a good general indicator of health. Load average is a separate measure of how many processes are waiting to be scheduled. The reason the rule is 70% (or 80%) utilization is that, past this point, CPU-bound applications tend to suffer contention-induced latency and non-linear performance degradation.
You can test this yourself by plotting throughput and latency (median and 90th percentile) against CPU utilization on your setup. Finding the inflection point for your particular system is important for capacity planning.
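A sketch of such a sweep, assuming the URL placeholder from the question and arbitrary concurrency steps; it pairs each concurrency level with ab's reported throughput and mean latency so you can spot where latency turns non-linear:

```shell
# Hypothetical load sweep: raise concurrency and record throughput vs latency.
URL="http://192.XXX.XXX.XXX/"   # placeholder target from the question
for c in 10 25 50 100 200 400; do
  ab -k -q -n 50000 -c "$c" "$URL" \
    | awk -v c="$c" '/Requests per second/ { rps = $4 }
                     /^Time per request.*mean\)$/ { tpr = $4 }
                     END { printf "c=%s rps=%s ms=%s\n", c, rps, tpr }'
done
```

Plot the latency column against CPU utilization sampled during each run (e.g. with sar or top) and look for the knee; that inflection point is the capacity-planning threshold for your particular setup.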
A very good writeup of this phenomenon is given in Facebook's original paper on Dyno, their system for measuring throughput of PHP under load.
https://www.facebook.com/note.php?note_id=203367363919
