How to get per-core CPU utilization in Xen?

I have installed Xen as the hypervisor, with dom0 and some paravirtualized machines running as domU VMs on it.
I know xentop is used for checking the performance of the system and the virtual machines, and I can read its output to measure a virtual machine's CPU utilization. But it only gives the total usage across all CPUs!
So, is there any tool or way to get CPU usage per core?

I think you may be able to get what you want from XenMon. http://www.virtuatopia.com/index.php/Xen_Monitoring_Tools_and_Techniques
Also, try using xentop with the VCPUs option: -v or press V when inside xentop.
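If you want something scriptable rather than the interactive view, xentop can also be run in batch mode with the per-VCPU columns included. A minimal sketch run from dom0 (flag names as in common xentop builds; check xentop's help output, since options vary slightly between Xen versions):

    # One non-interactive sample that includes per-VCPU statistics
    xentop -b -i 1 -v

    # Collect 30 samples at a 2-second interval for later parsing
    xentop -b -d 2 -i 30 -v > xentop-vcpus.log

    # Show which physical CPU each VCPU is currently running on
    xl vcpu-list        # use "xm vcpu-list" on older toolstacks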
If you are using XenServer, then there are lots of interesting host metrics available, much more than the base Xen setup. Check out http://support.citrix.com/servlet/KbServlet/download/38321-102-714737/XenServer-6.5.0_Administrators%20Guide.pdf, Chapter 9.
It's all particularly good if you use XenCenter to view those metrics, but then I would say that since I wrote a significant portion of it ;-)

Related

Running R models on AWS - do multiple vCPUs function like a multicore system?

I'm running models in the R package 'secr'. The simplest models take days to complete on a 4 GB MacBook, and I've already done everything possible within the model's setup to decrease run time. Parallel (multicore) processing is possible and straightforward in secr, but the benefits are minimal and run time may actually increase. Am I likely to see an improvement in run time if I switch to a high-powered virtual machine in the cloud (e.g. an AWS EC2 instance with 16 GB RAM and 4 vCPUs), or do the EC2's four vCPUs function like a multicore system (in which case I would only benefit from one vCPU despite having 4)?
I've asked this question in a couple of different forums and received conflicting answers.
You can think of the vCPUs just like a multicore system. They would appear as multiple cores to any software running on the system.
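As a quick sanity check after launching the instance, you can confirm how many logical CPUs the guest (and therefore R) actually sees. A small sketch for a Linux EC2 instance, assuming R is installed:

    # Number of logical CPUs visible inside the guest
    nproc

    # Sockets, cores per socket, and threads per core as presented by the hypervisor
    lscpu

    # The parallel back end in R reports the same count
    Rscript -e 'parallel::detectCores()'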
Good question. It depends. You may see an improvement in run time if you switch to an EC2 instance type with better virtual hardware specifications. AWS runs a custom version of the Xen hypervisor, and you're getting vCPUs, as you pointed out. Performance will depend on the variability of the other guests' workloads. If the vCPUs are all assigned to instances, and each instance is running CPU-heavy workloads, you're going to see a downward trend in performance. It depends on the pattern of usage of all the instances running on the hypervisor. This article from Citrix explains some of the nuances of balancing vCPU time between instances on Xen and why performance will vary:
Citrix on Xen vCPU Performance
The instance type matters, not only the vCPUs and RAM. Avoid the T2 instances because they are 'burstable', so CPU performance will certainly vary. This article from AWS recommends trying M4 instance types for parallel workloads in R:
Running R on AWS
For specific types of EC2 instances you can control the C-state (the sleep levels a core can enter when it is idle) and the P-state (the desired performance, in frequency, of a core). This lets you tune your instance's performance for your workload. The following link explains in detail which instance types allow C-state and P-state control and shows how to use the utility "stress" to benchmark and tune different configurations.
EC2: Processor State Control
It would be best to design a test when you first provision the instance to see if the instance type meets your performance requirements, and then run the test again later to see if the benchmark holds.
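As a rough illustration of such a benchmark, here is a sketch using the stress utility mentioned in that article, assuming a 4-vCPU Linux instance (package names and install commands differ by distribution):

    # Install the stress tool (Debian/Ubuntu shown; use yum/EPEL on RHEL-style systems)
    sudo apt-get install -y stress

    # Load all 4 vCPUs for 60 seconds
    stress --cpu 4 --timeout 60

    # In another terminal, watch per-core clock frequency while the load runs
    watch -n 1 "grep MHz /proc/cpuinfo"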

Limit CPU & Memory for *nix Process

Is it possible to limit CPU and memory for a *nix process?
The CPU limit may look like "use no more than 10% of one core".
The memory limit may look like "use no more than 100 MB"; the OS may throttle the process or kill it if it tries to exceed the limit, and either way is fine.
Any *nix that could do that would be fine.
It seems possible to implement this with virtual machines, but that is not acceptable because the overhead is too high.
If you happen to use Solaris, the ability to limit resource usage is a native feature.
Memory (RAM) usage can be capped using the rcap.max-rss setting, while CPU usage can be limited per project using the project.cpu-cap resource control.
Note that Solaris also allows OS level virtualization (a.k.a. zones) which have no significant overhead, if any, compared to a bare metal OS instance.
Resource capping is part of Solaris zones configuration.
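For illustration, a hedged sketch of how those caps could be attached to a Solaris project (the project name user.batch and the values are made up; see the resource_controls(5) and rcapd(1M) man pages for the authoritative syntax):

    # Create a project for the workload
    projadd -c "capped batch jobs" user.batch

    # Cap resident memory at about 100 MB (enforced by the rcapd daemon)
    projmod -s -K "rcap.max-rss=100MB" user.batch
    rcapadm -E                      # enable the resource-capping daemon

    # Cap CPU usage at 10% of one CPU (project.cpu-cap is in percent of a CPU,
    # so 100 would mean one full core)
    projmod -s -K "project.cpu-cap=(privileged,10,deny)" user.batch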
Try CPULimit
cpulimit is a simple program which attempts to limit the cpu usage of a process (expressed in percentage, not in cpu time). This is useful to control batch jobs, when you don't want them to eat too much cpu. It does not act on the nice value or other scheduling priority stuff, but on the real cpu usage. Also, it is able to adapt itself to the overall system load, dynamically and quickly.
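For example, a minimal sketch combining cpulimit for the CPU cap with the shell's ulimit for the memory cap (ulimit -v limits virtual address space rather than resident memory, which is usually close enough; older cpulimit builds only accept -p/-e, while newer ones can also launch a command directly):

    # Limit an already running process (PID 1234) to 10% of one core
    cpulimit -p 1234 -l 10

    # Or launch a program under the limit (newer cpulimit versions)
    cpulimit -l 10 ./long_batch_job

    # Cap the address space at ~100 MB for anything started from this shell;
    # allocations beyond that fail and the process typically exits
    ulimit -v 102400                # value in kilobytes
    ./long_batch_job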

Create a cluster of co-workers' Windows 7 PCs for parallel processing in R?

I am running the termstrc yield curve analysis package in R across 10 years of daily bond price data for 5 different countries. This is highly compute-intensive: it takes 3200 seconds per country with a standard lapply, and if I use foreach and %dopar% (with doSNOW) on my 2009 i7 Mac, using all 4 cores (8 with hyperthreading), I get this down to 850 seconds. I need to re-run this analysis every time I add a country (to compute inter-country spreads), and I have 19 countries to go, with many more credit yield curves to come in the future. The time taken is starting to look like a major issue. By the way, the termstrc analysis function in question is accessed from R but is written in C.
Now, we're a small company of 12 people (read: limited budget), all equipped with 8 GB RAM, i7 PCs, of which at least half are used for mundane word processing / email / browsing tasks, that is, using at most 5% of their performance. They are all networked using gigabit (but not 10-gigabit) ethernet.
Could I cluster some of these underused PCs using MPI and run my R analysis across them? Would the network be affected? Each iteration of the yield curve analysis function takes about 1.2 seconds so I'm assuming that if the granularity of parallel processing is to pass a whole function iteration to each cluster node, 1.2 seconds should be quite large compared with the gigabit ethernet lag?
Can this be done? How? And what would the impact be on my co-workers? Can they continue to read their emails while I'm taxing their machines?
I note that Open MPI seems not to support Windows anymore, while MPICH seems to. Which would you use, if any?
Perhaps run an Ubuntu virtual machine on each PC?
Yes you can. There are a number of ways. One of the easiest is to use Redis as a backend (as easy as calling sudo apt-get install redis-server on an Ubuntu machine; rumor has it that you could run a Redis backend on a Windows machine too).
By using the doRedis package, you can very easily enqueue jobs on a task queue in Redis, and then use one, two, ... idle workers to query the queue. Best of all, you can easily mix operating systems, so yes, your co-workers' Windows machines qualify. Moreover, you can use one, two, three, ... clients as you see fit and scale up or down. The queue does not know or care, it simply supplies jobs.
Better still, the vignette in the doRedis package has working examples of a mix of Linux and Windows clients to make a bootstrapping example go faster.
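To make the moving parts concrete, here is a hedged sketch driven from the shell; the queue name jobs and the Redis host 192.168.1.10 are made up for the example, and the doRedis vignette remains the authoritative walk-through:

    # On one Linux box: install and start the Redis server that holds the task queue
    sudo apt-get install redis-server

    # On each idle co-worker machine (Windows or Linux, with R and doRedis installed):
    # attach two local workers to the shared queue
    Rscript -e "library(doRedis); startLocalWorkers(n=2, queue='jobs', host='192.168.1.10')"

    # On your own machine: register the same queue; the existing foreach loop
    # then runs unchanged, with %dopar% farming iterations out to the workers
    Rscript -e "library(doRedis); registerDoRedis('jobs', host='192.168.1.10'); library(foreach); print(foreach(i=1:4, .combine=c) %dopar% sqrt(i))"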
Perhaps not the answer you were looking for, but - this is one of those situations where an alternative is sooo much better that it's hard to ignore.
The cost of AWS clusters is ridiculously low (my emphasis) for exactly these types of computing problems. You pay only for what you use. I can guarantee you that you will save money (at the very least in opportunity costs) by not spending the time trying to convert 12 Windows machines into a cluster. For your purposes, you could probably even do this for free. (IIRC, they still offer free computing time on clusters.)
References:
Using AWS for parallel processing with R
http://blog.revolutionanalytics.com/2011/01/run-r-in-parallel-on-a-hadoop-cluster-with-aws-in-15-minutes.html
http://code.google.com/p/segue/
http://www.vcasmo.com/video/drewconway/8468
http://aws.amazon.com/ec2/instance-types/
http://aws.amazon.com/ec2/pricing/
Some of these instances are so powerful that you probably wouldn't even need to figure out how to set up your work on a cluster (given your current description). As you can see from the references, costs are ridiculously low, ranging from $1-4 per hour of compute time.
What about OpenCL?
This would require rewriting the C code, but would allow potentially large speedups. The GPU has immense computing power.

Openstack: How to decide hardware capacity?

I've been reading some OpenStack material recently, but haven't had a chance to try it yet. I get the sense that OpenStack can manage a large number of virtual machines via an API or a dashboard interface, and that users can easily create/start virtual machines.
This leaves me confused, though. Since the underlying hardware varies, some computers may only be able to host one virtual machine, others maybe ten. When a user starts a virtual machine, does the user manually designate a physical computer to host it, or does OpenStack do so automatically? In either case, how is the physical computer's capacity decided? Does OpenStack provide the functionality to set a capacity attribute on a physical computer?
When you run OpenStack, each physical machine (which OpenStack calls compute hosts) will periodically report how many CPUs it has and how much RAM it has, as well as how many CPUs and how much RAM have been allocated to virtual machines that are currently running.
The OpenStack scheduler uses this information to determine which compute host to run a VM on. First, it checks to see if a host has enough CPUs (by applying the CoreFilter) and enough RAM (by applying the RamFilter). Compute hosts that don't have enough CPUs or RAM available won't even be considered.
Once it has a set of candidate hosts with enough CPU and RAM, the scheduler needs to pick one of them. By default, the scheduler uses a "spread-first" strategy, allocating VMs to the machines that have the largest amount of CPU/RAM not currently allocated to VMs. It's possible to change this strategy to a "fill-first" behavior, so that the compute host with the least amount of free resources gets allocated first. This is configured by setting the nova.scheduler.least_cost.compute_fill_first_cost_fn parameter.
For more information, see the chapter on scheduling in the OpenStack Compute Admin guide.
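If you want to see the capacity figures the scheduler works from, the nova client of that era exposes them per compute host. A hedged sketch (the host name compute-node-01 is made up; newer releases move these commands to the unified openstack client):

    # Aggregate view across all hypervisors: total vs. used vCPUs and RAM
    nova hypervisor-stats

    # Per-host detail: vcpus, vcpus_used, memory_mb, memory_mb_used, free_ram_mb
    nova hypervisor-list
    nova hypervisor-show compute-node-01

    # Note: the filters overcommit physical resources by default; the ratios are
    # controlled in nova.conf via cpu_allocation_ratio and ram_allocation_ratio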

How do I select the no. of processors/cores to run my MPI program on?

I am using mpich2 version 1.2.1p1, which has MPD as its default process manager.
When we run mpiexec, we can specify the number of processes we want to spawn, but I also want to specify/select the number of processors/cores I want to use. How do I do it?
Also, when we simply spawn n processes, how do we know how many processors/cores are being used?
Please help.
Any sensible operating system will use as many cores as possible on each machine. You should not have to worry about that. When spawning 4 MPI processes on a quad-core machine, it is safe to assume that all 4 cores will be used. If not, there is something seriously wrong with the configuration. Anyway, if you really want to be sure, check the CPU usage with, for example, 'top'.
The number of processes is the number of cores used. Mpi will put at least one process on each core. If you want to make sure you are always using the maximum number of cores on your machine then use the OS resources on your system to get the number of cores and pass that to the mpiexec call.
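For illustration, a hedged sketch with MPICH2's MPD process manager (host names and core counts are made up, and exact options differ between MPD and the newer Hydra manager, so check mpiexec --help on your installation):

    # mpd.hosts lists each machine and how many processes it may take, e.g.:
    #   node01:4
    #   node02:4
    mpdboot -n 2 -f mpd.hosts        # start the MPD ring on the 2 hosts

    # Spawn 8 processes; MPD places them over the hosts listed above
    mpiexec -n 8 ./my_mpi_program

    # Or detect the local core count and pass it along
    mpiexec -n "$(nproc)" ./my_mpi_program

    # Verify the cores are actually busy while the job runs
    top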
