I have a physical host whose CPU model is 'Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz', and it has the 'avx2' flag in cpuinfo. The host runs the KVM/QEMU hypervisor with libvirt configured, and I set the CPU mode to host-model in the domain XML. Guest VMs can be created on the host. When I check the CPU model of a guest VM, it shows up as 'SandyBridge', and it also has the 'avx2' flag in cpuinfo. But 'SandyBridge' does not support the 'avx2' flag, while the 'Haswell' model does. I understand that with host-model mode libvirt finds the nearest CPU model to 'Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz', but it picked 'SandyBridge' when it should show 'Haswell' instead. Does that mean libvirt has a bug, or is this a valid representation in this scenario? I am using libvirt version 1.2.2.
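For reference, this is the full extent of the CPU configuration I set in the domain XML (a minimal sketch; everything else in the <cpu> element is filled in by libvirt at runtime):

```xml
<!-- ask libvirt to mirror the host CPU as closely as its models allow -->
<cpu mode='host-model'/>
```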
Within a particular chip generation (SandyBridge, Haswell, etc.), Intel does not in fact guarantee that all the different models it makes have the same CPU flags present. We can see this with Haswell or later, where some CPUs have the TSX feature and some don't. QEMU/libvirt generally only provide a single model for each Intel generation, though, so it's possible that your physical CPU might not actually be compatible with the correspondingly named QEMU model.
From libvirt's POV, the names are just a shortcut for a particular group of features. As such, when identifying the CPU for "host-model", libvirt completely ignores names and just looks for the CPU model whose list of features most closely matches your host CPU, then lists any extra CPU features explicitly in the XML. All this means that even though you have a Haswell as your physical CPU, it is entirely possible that libvirt will display a different model name for your guest. There's nothing really wrong with this from a functional POV: the features should all still be present (except for a few that KVM intentionally blocks); it is merely a bit "surprising" to look at.
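As an illustration (a sketch, not necessarily your exact output), dumping the live XML of a running guest configured with host-model shows the chosen base model plus the extra host features listed explicitly:

```xml
<cpu mode='custom' match='exact'>
  <model fallback='allow'>SandyBridge</model>
  <!-- host features beyond the base model are added individually -->
  <feature policy='require' name='avx2'/>
  <feature policy='require' name='bmi2'/>
</cpu>
```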
In your case, what I think is going on is a bug in Intel's TSX support. This feature was introduced in Haswell, but then blocked in a microcode update after Intel found out it was broken. This causes the TSX features ('hle' and 'rtm') to disappear from the CPU flags on your physical machine. The libvirt/QEMU Haswell CPU model still contains them, so libvirt won't match your CPU against its Haswell model. In libvirt >= 1.2.14 we introduced a new Haswell-noTSX CPU model to deal with this particular problem, but you say you only have 1.2.2. SandyBridge is simply the next best compatible CPU model that libvirt can find for you.
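If you want to verify this on your host, check whether the TSX flags ('hle' and 'rtm') still appear in /proc/cpuinfo; with the microcode update applied, this quick check should print nothing:

```sh
# hle and rtm are the cpuinfo flags corresponding to TSX
grep -o -w -E 'hle|rtm' /proc/cpuinfo | sort -u
```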
I found another workaround which doesn't require upgrading libvirt. I removed the hle and rtm flags from the definition of Haswell in the CPU map XML file used by libvirt (/usr/share/libvirt/cpu_map.xml), then restarted the libvirtd process. After rebooting the VM, it showed the correct model name, Haswell.
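For anyone trying the same thing, the edit amounts to deleting the two TSX feature lines from the Haswell model definition; in my copy of cpu_map.xml it looked roughly like this (abridged):

```xml
<model name='Haswell'>
  <model name='SandyBridge'/>
  <feature name='fma'/>
  <feature name='avx2'/>
  <!-- delete these two lines so the model matches TSX-disabled hosts -->
  <feature name='hle'/>
  <feature name='rtm'/>
  ...
</model>
```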
I'm running models in the R package 'secr'. The simplest models take days to complete on a 4 GB MacBook, and I've already done everything possible within the model's setup to decrease run time. Parallel (multicore) processing is possible and straightforward in secr, but the benefits are minimal and run time may actually increase. Am I likely to see an improvement in run time if I switch to a high-powered virtual machine in the cloud (e.g. AWS's EC2 with 16 GB RAM and 4 vCPUs), or do EC2's four vCPUs function like a multicore system (in which case I would only benefit from one vCPU despite having 4)?
I've asked this question in a couple of different forums and received conflicting answers.
You can think of the vCPUs just like a multicore system. They would appear as multiple cores to any software running on the system.
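You can confirm this from inside the instance; on Linux, for example, a trivial check is:

```sh
# prints 4 on a 4-vCPU instance, just as on a 4-core physical box
nproc
```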
Good question. It depends. You may see an improvement in run time if you switch to an EC2 instance type with better virtual hardware specifications. AWS runs a custom version of the Xen hypervisor, and you're getting vCPUs, as you pointed out. Performance will depend on the variability of the other guests' workloads. If the vCPUs are all assigned to instances, and each instance is running CPU-heavy workloads, you're going to see a downward trend in performance. It depends on the pattern of usage of all the instances running on the hypervisor. This article from Citrix explains some of the nuances of balancing vCPU time between instances on Xen and why performance will vary:
Citrix on Xen vCPU Performance
The instance type matters, not only the vCPUs and RAM. Avoid the T2 instances because they are 'burstable' and CPU performance will certainly vary. This article from AWS recommends trying M4 instance types for parallelization with R:
Running R on AWS
For specific types of EC2 instances you can control the C-state (the sleep levels a core can enter when it is idle) and the P-state (the desired performance, in frequency, of a core). This lets you tune your instance's performance for your workload. The following link explains in detail which instance types allow C-state and P-state control and shows how to use the "stress" utility to benchmark and tune different configurations.
EC2: Processor State Control
It would be best to design a test when you first provision the instance to see if the instance type meets your performance requirements, and then run the test again later to see if the performance benchmark holds.
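As a starting point for such a test, you can use the "stress" utility mentioned above to load every vCPU while sampling the clock rate each one actually sustains (a sketch assuming stress is installed, e.g. from EPEL):

```sh
# load 4 CPU workers for 60 seconds in the background
stress --cpu 4 --timeout 60 &

# sample the frequency each vCPU sustains under load, every 5 seconds
watch -n 5 "grep 'cpu MHz' /proc/cpuinfo"
```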
I couldn't find any query for whether a device is integrated/embedded in the CPU (using system RAM) or has its own dedicated GDDR memory. I could benchmark mapping/unmapping versus reading/writing to reach a conclusion, but the device could be under load at that moment and give misleading results, and it would add complexity to the already complex load-balancing algorithm I'm using.
Is there a simple way to check if a GPU shares the same memory with the CPU, so I can choose mapping/unmapping directly instead of reading/writing?
Edit: there is CL_DEVICE_LOCAL_MEM_TYPE, which returns CL_GLOBAL or CL_LOCAL. Is this an indication of integratedness?
OpenCL 1.x has the device query CL_DEVICE_HOST_UNIFIED_MEMORY:
Is CL_TRUE if the device and the host have a unified memory subsystem
and is CL_FALSE otherwise.
This query is deprecated as of OpenCL 2.0, but should probably still work on OpenCL 2.x platforms for now. Otherwise, you may be able to produce a heuristic from the result of CL_DEVICE_SVM_CAPABILITIES instead.
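A minimal sketch of the 1.x query in C (error handling omitted; it just grabs the first device of the first platform):

```c
#define CL_TARGET_OPENCL_VERSION 200
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL);

    /* CL_TRUE when the device and the host share a memory subsystem */
    cl_bool unified = CL_FALSE;
    clGetDeviceInfo(device, CL_DEVICE_HOST_UNIFIED_MEMORY,
                    sizeof(unified), &unified, NULL);
    printf("unified host memory: %s\n", unified ? "yes" : "no");

    /* OpenCL 2.x heuristic: fine-grained SVM hints at shared memory */
    cl_device_svm_capabilities svm = 0;
    clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES,
                    sizeof(svm), &svm, NULL);
    printf("fine-grained SVM: %s\n",
           (svm & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) ? "yes" : "no");
    return 0;
}
```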
I was wondering if there is a scientific differentiation in terminology between CPU Usage and CPU Utilization. I have the feeling that both terms are used as synonyms: they both describe the relation between CPU time and CPU capacity. Wikipedia calls it CPU usage. Microsoft uses CPU utilization, but I also found an article where Microsoft uses the term CPU usage. VMware, meanwhile, uses CPU Utilization in the context of physical CPUs and CPU Usage in the context of logical CPUs. Also, there is no tag for cpu_utilization on Stack Overflow.
Does anyone know a scientific differentiation?
Usage
CPU usage as a percentage during the interval.
- VM: Amount of actively used virtual CPU, as a percentage of total available CPU. This is the host's view of the CPU usage, not the guest operating system view. It is the average CPU utilization over all available virtual CPUs in the virtual machine. For example, if a virtual machine with one virtual CPU is running on a host that has four physical CPUs and the CPU usage is 100%, the virtual machine is using one physical CPU completely.
virtual CPU usage = usagemhz / (# of virtual CPUs x core frequency)
- Host: Actively used CPU of the host, as a percentage of the total available CPU. Active CPU is approximately equal to the ratio of the used CPU to the available CPU.
available CPU = # of physical CPUs x clock rate
100% represents all CPUs on the host. For example, if a four-CPU host is running a virtual machine with two CPUs, and the usage is 50%, the host is using two CPUs completely.
- Cluster: Sum of actively used CPU of all virtual machines in the cluster, as a percentage of the total available CPU.
CPU Usage = CPU usagemhz / effectivecpu
Usage in MHz
CPU usage, as measured in megahertz, during the interval.
- VM: Amount of actively used virtual CPU. This is the host's view of the CPU usage, not the guest operating system view.
- Host: Sum of the actively used CPU of all powered-on virtual machines on a host. The maximum possible value is the frequency of the processors multiplied by the number of processors. For example, if you have a host with four 2GHz CPUs running a virtual machine that is using 4000MHz, the host is using two CPUs completely.
4000 / (4 x 2000) = 0.50
Used
Time accounted to the virtual machine. If a system service runs on behalf of this virtual machine, the time spent by that service (represented by cpu.system) should be charged to this virtual machine. If not, the time spent (represented by cpu.overlap) should not be charged against this virtual machine.
Reference: http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.wssdk.apiref.doc%2Fcpu_counters.html
Very doubtful. You will probably find exact definitions in some academic textbooks, but I bet they'll be inconsistent between textbooks. I've seen definitions in manpages that are inconsistent with the actual implementation in the code. This is a case where everyone assumes the definitions are so obvious they never check whether theirs is consistent with others'.
My suggestion is to fully define your usage and go with that. Others can then have a reference (your formula/algorithm) and can translate between yours and theirs.
By the way, figuring out utilization, usage, etc. is very complicated and fraught with traps: OSs move tasks around, logical CPUs move between cores, turbo modes temporarily bump clock rates, work is offloaded to internal coprocessors, processors go to sleep or drop in frequency, hyperthreaded logical CPUs contend for shared resources, and so on. What's worse is that it is a moving target: exact and well-defined metrics today will quickly get out of date as hardware and software architectures continue to evolve per Moore's law and any SW equivalent.
Within a single context (paper, book, web article, etc.), there may be a difference, but there are not, as far as I know, consistent universally accepted standard definitions for these terms.
Within one author's writings, however, they might be used to describe different things. For example (not an exhaustive list):
How much of a single CPU's computing capacity is being used over a specific sample period
How much of a single CPU's computing capacity is being used by a specific schedulable entity (thread, process, light-weight process, kernel, interrupt routine, etc.) over a specific sample period
Either of the above, but taking all CPUs in the system into account
Any of the above, but with a difference in perspective between real CPUs and virtual CPUs (whether hyperthreading or CPUs actually being emulated by VMware, KVM/QEMU, Xen, Virtualbox or the like)
A comparative measure of how much CPU capacity is being used in one algorithm over another
Probably several other possibilities as well....
I have installed Xen as the hypervisor, and there are dom0 and some paravirtualized machines as domU VMs on it.
I know xentop is used for checking the performance of the system and the virtual machines, and I can read its output to measure virtual machine CPU utilization, but it just gives the total usage across all CPUs!
So, is there any tool, or any way, to get CPU usage per core?
I think you may be able to get what you want from XenMon: http://www.virtuatopia.com/index.php/Xen_Monitoring_Tools_and_Techniques
Also, try using xentop with the VCPUs option: pass -v, or press V while inside xentop.
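For example (a quick sketch; xentop ships with the Xen tools):

```sh
# show per-VCPU rows, refreshing every 2 seconds
xentop -v -d 2
```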
If you are using XenServer, then there are lots of interesting host metrics available, many more than in a base Xen setup. Check out http://support.citrix.com/servlet/KbServlet/download/38321-102-714737/XenServer-6.5.0_Administrators%20Guide.pdf, Chapter 9.
It's all particularly good if you use XenCenter to view those metrics, but then I would say that, since I wrote a significant portion of it ;-)
I've been reading some OpenStack material recently but haven't had a chance to try it yet. I got the sense that OpenStack can manage a large number of virtual machines via an API or a dashboard interface, and users can easily create and start virtual machines.
This leaves me confused about one point. As the underlying computer hardware may vary, some computers may only be able to host one virtual machine, others maybe ten. When a user starts a virtual machine, does the user manually designate a hardware computer to host it, or does OpenStack do so automatically? In either case, how is the hardware computer's capacity decided? Does OpenStack provide the functionality to set capacity attributes of a hardware computer?
When you run OpenStack, each physical machine (which OpenStack calls compute hosts) will periodically report how many CPUs it has and how much RAM it has, as well as how many CPUs and how much RAM have been allocated to virtual machines that are currently running.
The OpenStack scheduler uses this information to determine which compute host to run a VM on. First, it checks to see if a host has enough CPUs (by applying the CoreFilter) and enough RAM (by applying the RamFilter). Compute hosts that don't have enough CPUs or RAM available won't even be considered.
Once it has a set of candidate hosts with enough CPU and RAM, the scheduler needs to pick one of them. By default, the scheduler uses a "spread-first" strategy, allocating VMs to the machines that have the largest amount of CPU/RAM not currently allocated to VMs. It's possible to change this to a "fill-first" behavior, so that the compute host with the least amount of free resources gets allocated first. This is configured via the nova.scheduler.least_cost.compute_fill_first_cost_fn parameter.
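For illustration, in the nova.conf of that era the relevant settings looked roughly like this (option names have changed in later releases, so treat this as a sketch):

```ini
# enable the CPU and RAM checks described above
scheduler_default_filters = RamFilter,CoreFilter,ComputeFilter

# cost function used to rank the remaining hosts; a negative weight
# spreads VMs across hosts (the default), a positive one fills hosts first
least_cost_functions = nova.scheduler.least_cost.compute_fill_first_cost_fn
compute_fill_first_cost_fn_weight = -1.0
```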
For more information, see the chapter on scheduling in the OpenStack Compute Admin guide.