I am looking to measure stats for VMs running in an OpenStack environment. The stats would be things like uptime and the CPU or RAM consumed by that VM alone.
My understanding from reading the documentation is that Ceilometer and Healthnmon are for measuring the stats of the resources used on individual OpenStack nodes.
Is this true or can Ceilometer or Healthnmon be extended to capture monitoring stats from VMs as well?
Ceilometer meters can be used to monitor a VM's vital stats. In Heat templates they have been used to monitor VM CPU utilization and memory consumption.
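To illustrate how a per-VM CPU utilization figure can be derived from a meter, here is a minimal sketch. It assumes the meter reports cumulative CPU time in nanoseconds per instance; the `CpuSample` class and `cpu_util_percent` function are hypothetical names for illustration, not part of any Ceilometer API.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    timestamp_s: float  # wall-clock time of the sample, in seconds
    cpu_time_ns: int    # cumulative CPU time used by the VM (assumed nanoseconds)


def cpu_util_percent(prev: CpuSample, curr: CpuSample, vcpus: int = 1) -> float:
    """Average CPU utilization (0-100) of one VM between two samples."""
    elapsed_ns = (curr.timestamp_s - prev.timestamp_s) * 1e9
    if elapsed_ns <= 0:
        raise ValueError("samples must be in increasing time order")
    used_ns = curr.cpu_time_ns - prev.cpu_time_ns
    # Normalize by the number of vCPUs so a fully busy VM reads 100%.
    return 100.0 * used_ns / (elapsed_ns * vcpus)


if __name__ == "__main__":
    earlier = CpuSample(timestamp_s=0.0, cpu_time_ns=0)
    later = CpuSample(timestamp_s=10.0, cpu_time_ns=5_000_000_000)
    print(cpu_util_percent(earlier, later, vcpus=2))
```

Two samples taken ten seconds apart are enough to get an average for that window; shorter windows give noisier numbers.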
I have an OpenStack VM that is getting really poor performance on its root disk: less than 50 MB/s on writes. My setup is 10 GbE, OpenStack deployed using kolla (the Queens release), with storage on Ceph. I'm trying to follow the path through the infrastructure to identify where the performance bottleneck is, but I'm getting lost along the way:
nova show lets me see which hypervisor (an Ubuntu 16.04 machine) the VM is running on but once I'm on the hypervisor I don't know what to look at. Where else can I look?
Thank you!
My advice is to first check the performance between the host (hypervisor) and Ceph. If you are able to create a Ceph block device, you can map it with the rbd command, create a filesystem on it, and mount it. Then you can measure the device's I/O performance with sysstat, iostat, iotop, dstat, vmstat, or even sar.
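If you want a quick number to compare between the guest and the hypervisor, a rough sequential-write test can be sketched as follows. This is only an assumption-laden illustration (the function name `measure_write_mb_per_s` and the sizes are made up here); it is not a replacement for a proper benchmark tool, and the directory you pass in should sit on the mounted device you want to test.

```python
import os
import tempfile
import time


def measure_write_mb_per_s(directory: str, total_mb: int = 64, block_kb: int = 1024) -> float:
    """Write total_mb of data sequentially into `directory` and return MB/s."""
    # Random data, generated once, so the test is not flattered by zero-detection.
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # fsync so the timing includes getting the data to the device,
            # not just into the page cache.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return total_mb / elapsed


if __name__ == "__main__":
    # Replace the temp directory with the mount point of the device under test.
    rate = measure_write_mb_per_s(tempfile.gettempdir(), total_mb=16)
    print(f"sequential write: {rate:.1f} MB/s")
```

Running the same script inside the VM and on a mapped RBD device on the hypervisor shows whether the slowdown is already present below the virtualization layer.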
What are the minimum hardware requirements for setting up an Apache Airflow cluster?
E.g. RAM, CPU, disk, etc. for the different types of nodes in the cluster.
I have had no issues using very small instances in pseudo-distributed mode (32 parallel workers; Postgres backend):
RAM 4096 MB
CPU 1000 MHz
VCPUs 2 VCPUs
Disk 40 GB
If you want distributed mode, you should be more than fine with that sizing if you keep the nodes homogeneous. Airflow shouldn't really do heavy lifting anyway; push the workload out to other systems (Spark, EMR, BigQuery, etc.).
You will also have to run some kind of messaging queue, like RabbitMQ. I think they take Redis too. However, this doesn't really dramatically impact how you size.
We are running Airflow on AWS with the config below:
t2.small --> airflow scheduler and webserver
db.t2.small --> postgres for metastore
The parallelism parameter in airflow.cfg is set to 10, and around 10 users access the Airflow UI.
All we do from Airflow is SSH to other instances and run the code there.
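For anyone checking their own setting, here is a small sketch that reads the parallelism value out of an airflow.cfg-style file. It assumes the option lives in the `[core]` section; the sample text and the `read_parallelism` helper are made up for illustration.

```python
import configparser

# Hypothetical excerpt of an airflow.cfg, matching the value mentioned above.
SAMPLE_CFG = """
[core]
parallelism = 10
"""


def read_parallelism(cfg_text: str) -> int:
    """Return the parallelism setting from airflow.cfg-style text."""
    # Interpolation is disabled so literal '%' characters elsewhere in a
    # real config file do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint("core", "parallelism")


if __name__ == "__main__":
    print(read_parallelism(SAMPLE_CFG))
```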
Amazon / AWS EC2 offers SR-IOV (Single Root I/O Virtualization) instances, which it dubs "enhanced networking" -- does Google offer this on Compute Engine?
Specifically, are any GCE instance types able to bypass the hypervisor and have direct access to a multi-queue NIC?
Is SR-IOV support needed to take advantage of Scylla DB's architecture?
HN Discussion: https://news.ycombinator.com/item?id=10262719
Currently Google Compute Engine does not offer SR-IOV. That said, SR-IOV is not strictly necessary to take advantage of Scylla's architecture.
GCE offers multi-queue networking and it is possible to directly user-mode assign the virtio-net queues using Intel's DPDK. This should allow our virtio-net NIC to work with Scylla, although at least at one point DPDK made certain qemu specific assumptions with respect to virtio-net (in particular it assumed Tx/Rx queue depths of 256 descriptors; the virtio-net NIC in GCE currently advertises 16,384 entry queues although this is likely to change in the near future).
For applications like Scylla this should offer better network performance and lower in-guest compute overhead than using the kernel TCP/IP stack.
Additionally, for all GCE instances with >= 1 cores (i.e., not fractional core instances) we offer multi-Gbps throughput subject to fabric availability. Latency is likely to be lowest in zones with Haswell processors. We do not currently guarantee specific network characteristics, but we offer up to 2 Gbps/core of network throughput shared between the virtual NIC and any attached persistent disk volumes (Local SSD throughput does not count against this limit). Throughput wise this makes 8-vCPU and larger instances comparable to EC2 Enhanced Networking.
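The per-core figure above can be turned into a quick upper-bound calculation. This is a sketch only: it uses the "up to 2 Gbps/core" number quoted in this answer, assumes whole-core instances, and does not model any absolute cap or fabric contention. The function name is hypothetical.

```python
GBPS_PER_CORE = 2.0  # "up to" figure quoted in the answer above


def max_network_gbps(vcpus: int) -> float:
    """Upper bound on throughput shared by the virtual NIC and persistent disks."""
    if vcpus < 1:
        raise ValueError("the per-core figure applies to instances with >= 1 core")
    return GBPS_PER_CORE * vcpus


if __name__ == "__main__":
    for vcpus in (1, 2, 4, 8, 16):
        print(f"{vcpus:>2} vCPUs -> up to {max_network_gbps(vcpus):.0f} Gbps")
```

Remember that persistent disk traffic counts against the same budget, so the bandwidth actually left for the NIC can be lower.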
At the moment, nothing that we offer is similar to AWS' "enhanced networking".
You are more than welcome to post this as a Feature Request on our Compute Engine Issue tracker, though, so we can look at implementing a similar feature.
I know there are built-in tools swift-dispersion-populate and swift-dispersion-report that measure swift cluster health. swift-dispersion-report samples 1% of all partitions to compute the health of cluster.
My question is: are there other efficient approaches, rather than sampling, to monitor Swift cluster health?
https://github.com/Dieterbe/graph-explorer Graph Explorer
https://github.com/BrightcoveOS/Diamond Diamond
Diamond is a Python daemon that collects system metrics and publishes them to Graphite (and others). It is capable of collecting CPU, memory, network, I/O, load and disk metrics. Additionally, it features an API for implementing custom collectors for gathering metrics from almost any source. It can be integrated with Swift.
Hope it helps.
I've been reading some OpenStack material recently, but haven't had a chance to try it yet. My understanding is that OpenStack can manage a large number of virtual machines via an API or dashboard interface, and that users can easily create and start virtual machines.
One thing confuses me, though. The underlying hardware might vary: some machines may only be able to host one virtual machine, others maybe ten. When a user starts a virtual machine, does the user manually designate a physical machine to host it, or does OpenStack do that automatically? In either case, how is a physical machine's capacity determined? Does OpenStack provide a way to set a capacity attribute for each physical machine?
When you run OpenStack, each physical machine (which OpenStack calls compute hosts) will periodically report how many CPUs it has and how much RAM it has, as well as how many CPUs and how much RAM have been allocated to virtual machines that are currently running.
The OpenStack scheduler uses this information to determine which compute host to run a VM on. First, it checks to see if a host has enough CPUs (by applying the CoreFilter) and enough RAM (by applying the RamFilter). Compute hosts that don't have enough CPUs or RAM available won't even be considered.
Once it has a set of candidate hosts that have enough CPU and RAM, the scheduler needs to pick one of them. By default, the scheduler will use a "spread-first" strategy, allocating VMs to the machines that have the largest amount of CPU/RAM not currently allocated to VMs. It's possible to change this strategy to a "fill-first" behavior, so that the compute host with the least amount of free resources will get allocated first. This is configured by setting the nova.scheduler.least_cost.compute_fill_first_cost_fn parameter.
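The filter-then-pick logic described above can be sketched in a few lines. This is a simplified illustration, not Nova's actual code: the `Host` class and `pick_host` function are hypothetical, and ranking by free RAM first and free vCPUs second is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_vcpus: int   # CPUs not yet allocated to VMs
    free_ram_mb: int  # RAM not yet allocated to VMs


def pick_host(hosts: List[Host], vcpus: int, ram_mb: int,
              fill_first: bool = False) -> Optional[Host]:
    """Filter out hosts that cannot fit the VM, then choose among the rest."""
    # Filtering step: plays the role of the CPU and RAM filters.
    candidates = [h for h in hosts
                  if h.free_vcpus >= vcpus and h.free_ram_mb >= ram_mb]
    if not candidates:
        return None

    def free_resources(h: Host):
        return (h.free_ram_mb, h.free_vcpus)

    # Spread-first picks the emptiest host; fill-first picks the fullest
    # host that still fits the request.
    chooser = min if fill_first else max
    return chooser(candidates, key=free_resources)


if __name__ == "__main__":
    hosts = [Host("a", 2, 2048), Host("b", 8, 16384), Host("c", 1, 512)]
    print(pick_host(hosts, vcpus=2, ram_mb=1024).name)
    print(pick_host(hosts, vcpus=2, ram_mb=1024, fill_first=True).name)
```

With the sample hosts, host "c" is filtered out for lacking CPUs and RAM, so the choice is between "a" and "b" depending on the strategy.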
For more information, see the chapter on scheduling in the OpenStack Compute Admin guide.