Regarding CPU utilization by a given process via SNMP

I have a multiprocessor system and I am trying to calculate the CPU usage of a particular process, but I am getting more than 100%.
Later I looked at the properties of that OID (hrSWRunPerfCPU) in the hrSWRunPerfTable:
Type: Integer32
Access: read-only
Description: "The number of centi-seconds of the total system's CPU resources consumed by this process. Note that on a multi-processor system, this value may increment by more than one centi-second in one centi-second of real (wall clock) time."
So how do I calculate the CPU usage (%) of a process on a multi-processor machine?
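For what it's worth, a rough, library-agnostic sketch of the usual approach: poll hrSWRunPerfCPU twice, a known wall-clock interval apart, and normalise the delta against the capacity of all cores over that interval (the core count could come, for example, from the number of rows in hrProcessorTable). The function and variable names below are purely illustrative.

#include <iostream>

// Sketch only: perf_cpu_* are two samples of hrSWRunPerfCPU (centi-seconds),
// taken elapsed_seconds of wall-clock time apart on a machine with num_cpus cores.
double process_cpu_percent(long perf_cpu_before,
                           long perf_cpu_after,
                           double elapsed_seconds,
                           int num_cpus)
{
    double consumed_centiseconds = static_cast<double>(perf_cpu_after - perf_cpu_before);
    double capacity_centiseconds = elapsed_seconds * 100.0 * num_cpus;
    return 100.0 * consumed_centiseconds / capacity_centiseconds;
}

int main()
{
    // Example: the counter advanced by 1200 centi-seconds in 10 s on a 4-core box
    // => 1200 / (10 * 100 * 4) = 30% of the whole machine.
    std::cout << process_cpu_percent(5000, 6200, 10.0, 4) << "%\n";
}

Dividing only by elapsed_seconds * 100 (i.e. by one core's capacity) instead gives "percent of one core", which is exactly the figure that can exceed 100% on a multi-processor box.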

Related

An optimal number of worker threads in an async IO TCP server

We have migrated our thread-per-connection communication model to an async-IO-based TCP server using boost::asio. The reason for this change is that the old model didn't scale well enough. We permanently have about 2k persistent connections on average, with a tendency to keep growing month over month.
My question is: what would be the ideal number of worker threads that poll the io_service queue for completion handlers - the number of virtual CPU cores?
Choosing too small a number can lead to situations where the server does not consume events quickly enough and can't cope with the rate at which clients send messages.
Does it make sense to add worker threads dynamically in such situations?
Update:
It is probably my implementation, but I find this part of the Boost.Asio documentation confusing:
Implementation strategies such as thread-per-connection (which a synchronous-only approach would require) can degrade system performance, due to increased context switching, synchronisation and data movement among CPUs. With asynchronous operations it is possible to avoid the cost of context switching by minimising the number of operating system threads - typically a limited resource - and only activating the logical threads of control that have events to process.
Because if you have X threads pumping completion events on a machine that has X cores: 1) you have no guarantee that each thread gets a dedicated CPU, and 2) if my connections are persistent, I have no guarantee that the thread that, say, issues an async_read will be the same one that executes the completion handler.
void Connection::read() {
    // Issue an asynchronous read of the next 18-byte message; handleRead is
    // invoked later by whichever worker thread dequeues the completion event.
    boost::asio::async_read(socket, boost::asio::buffer(buffer, 18),
        boost::bind(&Connection::handleRead, shared_from_this(),
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
}

void Connection::handleRead(const boost::system::error_code &error,
                            std::size_t bytes_transferred) {
    // processing of the bytes
    // ...

    // keep processing on this socket
    read();
}
In an ideal situation with perfectly non-blocking I/O, a working set that fits entirely in L1 cache, and no other processes in the physical system, each thread will use the entire resources of a processor core. In such a situation, the ideal number of threads is one per logical core.
If any part of your I/O is blocking then it makes sense to add more threads than the number of cores so that no cores sit idle. If half the thread time is spent blocked then you should have almost 2 threads per core. If 75% of the thread time is spent blocked then you should have 3 or 4 per core, and so on. Context switching overhead counts as blocking for this purpose.
I've noticed that when Microsoft has to make a blind guess about this, they tend to go for two or four threads per core. Depending on your budget for making this determination I would either go with 2 or 4, or start with one thread per core and work my way up, measuring throughput (requests serviced / second) and latency (min, max, and average response time) until I hit the sweet spot.
Dynamically adjusting this value only makes sense if you are dealing with radically different programs. For a predictable workload there is a sweet spot for your hardware that should not change much even with increasing amounts of work to be done. If you are making a general purpose web server then dynamic adjustment is probably necessary.
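For reference, a minimal sketch of the usual worker-pool pattern (using the same older io_service API as the question): one io_service shared by N threads, each of which calls run(). N starts at the core count, per the advice above, and would be scaled up if the handlers spend time blocked.

#include <boost/asio.hpp>
#include <algorithm>
#include <thread>
#include <vector>

int main()
{
    boost::asio::io_service io_service;

    // Keeps run() from returning while there is not yet any outstanding work.
    boost::asio::io_service::work work(io_service);

    // One worker per core to start with; scale towards 2-4 per core if the
    // completion handlers block on I/O or locks.
    unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n_threads; ++i)
        pool.emplace_back([&io_service] { io_service.run(); });

    // ... create acceptors / Connection objects that start async operations ...

    for (auto &t : pool)
        t.join();
}

Note that this does not pin threads to cores, and any thread in the pool may run any completion handler - which is exactly the behaviour the update above describes. The pattern relies on handlers being short and non-blocking rather than on thread affinity.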

Difference between CPU Usage and CPU Utilization?

I was wondering if there is a scientific differentiation in terminology between CPU Usage and CPU Utilization. I have the feeling that both words are used as synonyms; they both describe the relation between CPU time and CPU capacity. Wikipedia calls it CPU usage, Microsoft uses CPU utilization, but I also found an article where Microsoft uses the term CPU usage. VMware defines CPU utilization in the context of physical CPUs and CPU usage in the context of logical CPUs. Also, there is no tag for cpu_utilization on Stack Overflow.
Does anyone know a scientific differentiation?
Usage
CPU usage as a percentage during the interval.
- VM - Amount of actively used virtual CPU, as a percentage of total available CPU. This is the host's view of the CPU usage, not the guest operating system view. It is the average CPU utilization over all available virtual CPUs in the virtual machine. For example, if a virtual machine with one virtual CPU is running on a host that has four physical CPUs and the CPU usage is 100%, the virtual machine is using one physical CPU completely.
virtual CPU usage = usagemhz / (# of virtual CPUs x core frequency)
- Host - Actively used CPU of the host, as a percentage of the total available CPU. Active CPU is approximately equal to the ratio of the used CPU to the available CPU.
available CPU = # of physical CPUs x clock rate
100% represents all CPUs on the host. For example, if a four-CPU host is running a virtual machine with two CPUs, and the usage is 50%, the host is using two CPUs completely.
- Cluster - Sum of actively used CPU of all virtual machines in the cluster, as a percentage of the total available CPU.
CPU Usage = CPU usagemhz / effectivecpu
Usage in MHz
CPU usage, as measured in megahertz, during the interval.
- VM - Amount of actively used virtual CPU. This is the host's view of the CPU usage, not the guest operating system view.
- Host - Sum of the actively used CPU of all powered-on virtual machines on a host. The maximum possible value is the frequency of the processors multiplied by the number of processors. For example, if you have a host with four 2GHz CPUs running a virtual machine that is using 4000MHz, the host is using two CPUs completely.
4000 / (4 x 2000) = 0.50
Used:
Time accounted to the virtual machine. If a system service runs on behalf of this virtual machine, the time spent by that service (represented by cpu.system) should be charged to this virtual machine. If not, the time spent (represented by cpu.overlap) should not be charged against this virtual machine.
Reference: http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.wssdk.apiref.doc%2Fcpu_counters.html
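The percentage counters above boil down to used MHz divided by available MHz; a trivial sketch of the host formula, using the worked numbers from the example:

#include <iostream>

// usage = used MHz / (number of physical CPUs x clock rate in MHz)
double host_cpu_usage(double used_mhz, int physical_cpus, double clock_mhz)
{
    double available_mhz = physical_cpus * clock_mhz; // "available CPU"
    return used_mhz / available_mhz;                  // fraction; 1.0 == 100%
}

int main()
{
    // The example above: VMs using 4000 MHz on a host with four 2 GHz CPUs.
    std::cout << host_cpu_usage(4000.0, 4, 2000.0) << '\n'; // prints 0.5
}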
Very doubtful. You will probably find exact definitions in some academic text books but I bet they'll be inconsistent between text books. I've seen definitions in manpages that are inconsistent with the actual implementation within the code. This is a case where everyone assumes the definitions are so obvious they never check to see if theirs is consistent with others.
My suggestion is to fully define your usage and go with that. Others can then have a reference (your formula/algorithm) and can translate between yours and theirs.
By the way, figuring out utilization, usage, etc. is very complicated and fraught with traps. OSs move tasks around, logical CPUs move between cores, turbo modes temporarily bump clock rates, work is offloaded to internal coprocessors, processors go to sleep or drop in frequency, hyperthreading where multiple logical CPUs contend for shared resources, etc. What's worse is that it is a moving target. Exact and well-defined metrics today will start to get out of date quickly as hardware and software architectures continue to evolve per Moore's law and any SW equivalent.
Within a single context (paper, book, web article, etc.), there may be a difference, but there are not, as far as I know, consistent universally accepted standard definitions for these terms.
Within one author's writings, however, they might be used to describe different things. For example (not an exhaustive list):
How much of a single CPU's computing capacity is being used over a specific sample period
How much of a single CPU's computing capacity is being used by a specific schedulable entity (thread, process, light-weight process, kernel, interrupt routine, etc.) over a specific sample period
Either of the above, but taking all CPUs in the system into account
Any of the above, but with a difference in perspective between real CPUs and virtual CPUs (whether hyperthreading or CPUs actually being emulated by VMware, KVM/QEMU, Xen, Virtualbox or the like)
A comparative measure of how much CPU capacity is being used in one algorithm over another
Probably several other possibilities as well....

Limit CPU & Memory for *nix Process

Is it possible to limit CPU & Memory for the *nix Process?
The CPU limit may look like "use no more than 10% of one core".
The memory limit may look like "use no more than 100Mb"; the OS may limit it or kill the process if it tries to exceed the limit, both ways are fine.
Any *nix that could do that would be fine.
It seems it is possible to implement it with virtual machines, but that is not acceptable because the overhead is too high.
If you happen to use Solaris, the ability to limit resource usage is a native feature.
Memory (RAM) usage can be capped per process using the rcap.max-rss setting while CPU usage can be limited per project using the project.cpu-caps.
Note that Solaris also allows OS level virtualization (a.k.a. zones) which have no significant overhead, if any, compared to a bare metal OS instance.
Resource capping is part of Solaris zones configuration.
Try CPULimit
cpulimit is a simple program which attempts to limit the cpu usage of a process (expressed in percentage, not in cpu time). This is useful to control batch jobs, when you don't want them to eat too much cpu. It does not act on the nice value or other scheduling priority stuff, but on the real cpu usage. Also, it is able to adapt itself to the overall system load, dynamically and quickly.
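Besides the Solaris facilities and cpulimit mentioned above, plain POSIX resource limits are an option on most *nix systems, with the caveat that RLIMIT_CPU caps total CPU seconds rather than a percentage of a core. A rough sketch (the 100 MB figure is from the question; the 60-second CPU cap is arbitrary):

#include <sys/resource.h>
#include <cstdio>

int main()
{
    // Cap the address space at roughly 100 MB: allocations beyond this fail,
    // which usually makes the process exit on its own.
    struct rlimit mem;
    mem.rlim_cur = mem.rlim_max = 100UL * 1024 * 1024;
    if (setrlimit(RLIMIT_AS, &mem) != 0)
        perror("setrlimit(RLIMIT_AS)");

    // Cap total CPU time at 60 seconds; the kernel sends SIGXCPU (and
    // eventually SIGKILL) once the limit is exceeded. Note this is cumulative
    // CPU time, not a "10% of one core" style rate limit.
    struct rlimit cpu;
    cpu.rlim_cur = cpu.rlim_max = 60;
    if (setrlimit(RLIMIT_CPU, &cpu) != 0)
        perror("setrlimit(RLIMIT_CPU)");

    // ... exec or run the actual workload here ...
    return 0;
}

On Linux, a true percentage-style CPU cap plus a hard memory limit is usually done with cgroups (the cpu and memory controllers) rather than with setrlimit.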

w3wp.exe exceeding max threads

We have a requirement to handle 10,000 concurrent users.
Let me explain the system. The machine has two processors. processModel in machine.config is set to autoConfig = true, so that makes maxWorkerThreads = 20.
When I load-test my case with 30 users and watch CPU usage, it maxes out at 100%, and the number of threads on w3wp.exe is more than 70, even though my default is 20 * 2 (CPUs) = 40.
Once the CPU touches 100%, most of the transactions fail or take the maximum time to respond.
Now questions
1. How do I get 30 more threads assigned to the same worker process?
2. How can I reduce CPU usage here?
You have a bit of an issue here. Increasing the # of threads will further increase CPU usage. (Your 2 goals are incompatible.) Not only are you asking each CPU to do more, but you'll have additional overhead with context switching.
You could investigate using a non-blocking IO library, which would essentially mean only 1-2 threads per CPU. However, this could be a significant architecture change to your project (probably not feasible) - and what you might actually find is that most of the CPU was actually spent due to the work your code is performing, and not because of anything threading-related.
It sounds like you need to do some performance tuning and optimization of your application.
You should take a look at making async calls so that your threads are not remaining active while the caller is waiting for a response.
http://msdn.microsoft.com/en-us/magazine/cc163463.aspx

Munin Graphs meaning

I've been using Munin for some days and I think it provides very interesting information, but I don't understand some of the graphs, and how they can be used/read to get information to improve the system.
The ones I don't understand are:
Disk
Disk throughput per device
Inode usage in percent
IOstat
Firewall Throughput
Processes
Fork rate
Number of threads
VMstat
System
Available entropy
File table usage
Individual interrupts
Inode table usage
Interrupts and context switches
Ty!
Munin creates graphs that enable you to see trends. This is very useful for checking that a change you made doesn't negatively impact the performance of the system.
Disk
Disk throughput per device & IOstat
The amount of data written to or read from a disk device. Disks are always slow compared to memory. A lot of disk reads could, for example, indicate that your database server doesn't have enough RAM.
Inode usage in percent
Every filesystem has an index where information about the files is stored, like name, permissions and location on the disk. With many small files, the space available for this index could run out. If that happens, no new files can be saved to that filesystem, even if there is enough space on the device.
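As an illustration of what the graph measures, the same percentage can be computed for a single mount point via statvfs (f_files is the total number of inodes on the filesystem, f_ffree the number still free); Munin gathers it per filesystem, much like df -i does.

#include <sys/statvfs.h>
#include <cstdio>

int main()
{
    struct statvfs vfs;
    if (statvfs("/", &vfs) != 0) {   // check the root filesystem
        perror("statvfs");
        return 1;
    }
    double used = static_cast<double>(vfs.f_files - vfs.f_ffree);
    double pct  = 100.0 * used / static_cast<double>(vfs.f_files);
    std::printf("inode usage on /: %.1f%%\n", pct);
    return 0;
}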
Firewall Throughput
Just like it says, the amount of packets going through the iptables firewall. Often this firewall is active on all interfaces on the system. This is only really interesting if you run Munin on a router/firewall/gateway system.
Processes
Fork rate
Processes are created by forking an existing process into two processes. This is the rate at which new processes are created.
Number of threads
The total number of threads (lightweight processes) running on the system.
VMstat
Usage of CPU time.
running: time spent running non-kernel code
I/O sleep: time spent waiting for I/O
System
Available entropy: a measure of the random bits available in the kernel's entropy pool (what /dev/random draws from). These random numbers are needed to create SSL connections. If you create a large number of SSL connections, this pool of randomness could possibly run out of real random numbers.
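On Linux, the number behind this graph is simply the kernel's current estimate, readable from /proc/sys/kernel/random/entropy_avail; a tiny sketch:

#include <fstream>
#include <iostream>

int main()
{
    // Bits of entropy the kernel currently estimates to be in its pool.
    std::ifstream in("/proc/sys/kernel/random/entropy_avail");
    long bits = -1;
    if (in >> bits)
        std::cout << "available entropy: " << bits << " bits\n";
    else
        std::cerr << "could not read entropy_avail\n";
    return 0;
}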
File table usage
The total number of files open in the system. If this number suddenly goes up there might be a program that is not releasing its file handles properly.
