devstack multi node installation - openstack

I have 3 nodes which I am using for a multi-node setup. I am thinking of following the structure below:
Controller: keystone, horizon, g-reg, g-api, n-api, n-crt, n-sch, n-cond, n-cauth, n-obj, n-novnc, n-xvnc, c-api, c-sch (this node will have mysql and rabbitmq as well)
Network: q-svc, q-agt, q-dhcp, q-l3, q-meta, quantum
Compute: n-cpu, c-vol
I have a few questions:
1. On the compute node, do I need to keep n-api? Also, what else is needed apart from n-api and c-vol? Is q-agt needed on the compute node?
2. Will I need c-api along with c-vol? Does the compute node need RabbitMQ installed?

Q1)
You generally don't want nova-api on the compute nodes; it's better on the controller.
nova-api's paste config file contains hard-coded system credentials, and you don't want that paste file exposed on any node that a user may compromise with a hypervisor escape.
nova-compute and nova-volume are probably all you need. They communicate with the scheduler over RabbitMQ, so make sure that's working =P
Q2)
You don't NEED cinder to run an OpenStack cloud, though I see no reason not to include it.
I don't know what impact disabling cinder has on the devstack stack.sh script; I've never done it.
As for RabbitMQ, see the answer above.
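As a rough illustration of that split, here is what the per-node devstack configuration could look like. This is only a sketch: the IP addresses and the exact ENABLED_SERVICES lists are assumptions based on the layout in the question, not something from the original answer.
# controller localrc (hypothetical)
HOST_IP=10.0.0.11
ENABLED_SERVICES=key,horizon,g-reg,g-api,n-api,n-crt,n-sch,n-cond,n-cauth,n-obj,n-novnc,n-xvnc,c-api,c-sch,mysql,rabbit
# network node localrc (hypothetical)
HOST_IP=10.0.0.12
SERVICE_HOST=10.0.0.11
MYSQL_HOST=$SERVICE_HOST
RABBIT_HOST=$SERVICE_HOST
ENABLED_SERVICES=quantum,q-svc,q-agt,q-dhcp,q-l3,q-meta
# compute node localrc (hypothetical)
HOST_IP=10.0.0.13
SERVICE_HOST=10.0.0.11
MYSQL_HOST=$SERVICE_HOST
RABBIT_HOST=$SERVICE_HOST
ENABLED_SERVICES=n-cpu,c-vol,q-agt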

Related

Kubernetes and MPI

I want to run an MPI job on my Kubernetes cluster. The context is that I'm actually running a modern, nicely containerised app but part of the workload is a legacy MPI job which isn't going to be re-written anytime soon, and I'd like to fit it into a kubernetes "worldview" as much as possible.
One initial question: has anyone had any success in running MPI jobs on a kube cluster? I've seen Christian Kniep's work in getting MPI jobs to run in docker containers, but he's going down the docker swarm path (with peer discovery using consul running in each container) and I want to stick to kubernetes (which already knows the info of all the peers) and inject this information into the container from the outside. I do have full control over all the parts of the application, e.g. I can choose which MPI implementation to use.
I have a couple of ideas about how to proceed:
1. Fat containers containing Slurm and the application code -> populate slurm.conf with appropriate info about the peers at container startup -> use srun as the container entrypoint to start the jobs.
2. Slimmer containers with only OpenMPI (no Slurm) -> populate a rankfile in the container with info from outside (provided by Kubernetes) -> use mpirun as the container entrypoint.
3. An even slimmer approach, where I basically "fake" the MPI runtime by setting a few environment variables (e.g. the OpenMPI ORTE ones) -> run the mpicc'd binary directly (where it'll find out about its peers through the env vars).
4. Some other option.
5. Give up in despair.
I know trying to mix "established" workflows like MPI with the "new hotness" of kubernetes and containers is a bit of an impedance mismatch, but I'm just looking for pointers/gotchas before I go too far down the wrong path. If nothing exists I'm happy to hack on some stuff and push it back upstream.
I tried MPI jobs on Kubernetes for a few days and solved it by using dnsPolicy: None and dnsConfig (the CustomDNS=true feature gate is needed).
I pushed my manifests (as a Helm chart) here:
https://github.com/everpeace/kube-openmpi
I hope it helps.
Assuming you don't want to use a hardware-specific MPI library (for example anything that uses direct access to the communication fabric), I would go with option 2.
1. First, implement a wrapper for mpirun which populates the necessary data using the Kubernetes API, specifically using endpoints if using a service (might be a good idea); it could also scrape the pods' exposed ports directly (a rough sketch follows this list).
2. Add some form of checkpoint program that can be used for "rendezvous" synchronization before starting the actual running code (I don't know how well MPI deals with ephemeral nodes). This is to ensure that when mpirun starts it has a stable set of pods to use.
3. Finally, actually build a container with the necessary code and, I guess, an SSH service for mpirun to use for starting processes in the other pods.
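As an illustration of step 1, here is a rough shell sketch of such a wrapper. It assumes a headless service named mpi-workers in front of the worker pods, kubectl access from inside the launcher pod, and password-less SSH between pods; all names and paths are hypothetical.
#!/bin/sh
# hypothetical mpirun wrapper: resolve the worker pods through the service's
# endpoints, write an OpenMPI hostfile, then hand off to mpirun
SVC=mpi-workers
kubectl get endpoints "$SVC" \
  -o jsonpath='{range .subsets[*].addresses[*]}{.ip}{"\n"}{end}' \
  | sed 's/$/ slots=1/' > /tmp/hostfile
NP=$(wc -l < /tmp/hostfile)
exec mpirun --hostfile /tmp/hostfile -np "$NP" "$@"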
Another interesting option would be to use Stateful Sets, possibly even running SLURM inside, to implement a "virtual" cluster of MPI machines running on Kubernetes.
This provides stable hostnames for each node, which would reduce the problem of discovery and of keeping track of state. You could also use statefully-assigned storage for the container's local work filesystem (which, with some work, could be made to always refer to the same local SSD, for example).
Another benefit is that it would probably be the least invasive to the actual application.
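To illustrate the stable-hostname point, a minimal sketch: with a StatefulSet governed by a headless service, the worker addresses are predictable, so no API lookup is needed at launch time. The StatefulSet name "mpi", its headless service "mpi", the default namespace and the replica count are all assumptions.
# hypothetical: pods of a StatefulSet "mpi" behind a headless service "mpi"
# are reachable as mpi-0.mpi, mpi-1.mpi, ... so a hostfile can be generated statically
REPLICAS=4
: > hostfile
for i in $(seq 0 $((REPLICAS - 1))); do
  echo "mpi-${i}.mpi.default.svc.cluster.local slots=1" >> hostfile
done
mpirun --hostfile hostfile -np "$REPLICAS" ./my_mpi_binary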

openstack: relation between controller & compute nodes

I just started playing with OpenStack, and there are many things I still don't understand. As I see it, to start a VM instance, we normally execute some commands on the controller, e.g.
glance image-create
nova boot
But how does the controller know:
1) on which compute node to start the VM
2) how many compute nodes it has
Where does it get this information?
The controller determines where to launch the instance based on the information provided by nova-scheduler:
http://docs.openstack.org/juno/config-reference/content/section_compute-scheduler.html
As for how many compute nodes are recognized, this is determined when a compute node running nova-compute registers itself with the controller. Here is a reference for how compute is installed and configured for RHEL/CentOS/Fedora:
http://docs.openstack.org/juno/install-guide/install/yum/content/ch_nova.html
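For example, once the compute nodes have registered themselves, they can be listed from the controller. These are standard nova client commands, shown purely as an illustration (output varies by release):
nova service-list --binary nova-compute   # each registered nova-compute host and its state
nova hypervisor-list                      # the hypervisors the scheduler can place VMs on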
I'd suggest learning the OpenStack software architecture for such questions; for example, look at this page: http://docs.openstack.org/openstack-ops/content/example_architecture.html.
Simply speaking, OpenStack saves all the configuration in a database (MySQL by default), so the controller knows all the information. A Nova component named nova-scheduler, running as a controller service, decides where to place a VM among all available hosts.
A good starting point is to deploy a multi-node environment. You will learn how OpenStack works during the deployment procedure.

how to get instances back after reboot in openstack

After successfully installing devstack and launching instances, once I reboot the machine I have to start all over again and lose all the instances that were launched. I tried rejoin-stack but it did not work. How can I get the instances back after a reboot?
You might set resume_guests_state_on_host_boot = True in nova.conf. The file should be located at /etc/nova/nova.conf
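A minimal sketch of what that could look like, assuming the stock option name and the [DEFAULT] section (restart the nova-compute service afterwards):
# /etc/nova/nova.conf
[DEFAULT]
resume_guests_state_on_host_boot = True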
I've found some old discussion http://www.gossamer-threads.com/lists/openstack/dev/8772
AFAIK, at the present time OpenStack (Icehouse) is still not completely aware of the environment inside it, so it can't restore itself completely after a reboot. The instances will be there (as virsh domains), but even if you start them manually or via nova flags, I'm not sure whether the other facilities will handle this correctly (e.g. whether neutron will correctly configure all L3 rules according to its DB records, etc.). Honestly, I'm pretty sure they won't...
The answer depends on what you need to achieve:
If you need a template environment (e.g. a similar set of instances and networks each time after reboot), you may just script everything. In other words, just write a bash script that creates everything you need and run it each time after stack.sh (a sketch is shown after this answer). Make sure you're starting with a clean environment, since the OpenStack DB state persists between ./unstack.sh and ./stack.sh or ./rejoin-stack.sh (you might try to just clean the DB, or delete it; stack.sh will rebuild it).
If you need a persistent environment (e.g. you don't want to lose the VMs and the whole infrastructure state after reboot), I'm not aware of how to do this with OpenStack. For example, the neutron agents (which configure iptables, DHCP, etc.) do not save state and are driven by events from the Neutron service. They will not recover after a reboot, so the network will be dead. I'll be very glad if someone shares a method for doing such a recovery.
In general, I think OpenStack is not focusing on this and will not focus on it during the nearest release cycles. The common approach is to have a multi-node environment where each node is replaceable.
See http://docs.openstack.org/high-availability-guide/content/ch-intro.html for reference
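For the "template environment" case above, a hypothetical re-provisioning script might look like the following; the image file, network names and flavor are placeholders, not part of the original answer:
#!/bin/bash
# hypothetical: recreate a throwaway environment after ./stack.sh
source openrc admin admin
glance image-create --name cirros --disk-format qcow2 --container-format bare \
  --file cirros-0.3.4-x86_64-disk.img
neutron net-create demo-net
neutron subnet-create demo-net 10.1.0.0/24 --name demo-subnet
nova boot --flavor m1.tiny --image cirros \
  --nic net-id=$(neutron net-show demo-net -f value -c id) demo-vm-1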
devstack is an ephemeral environment; it is not supposed to survive a reboot, and surviving one is not supported behavior.
That being said, you might find success in re-initializing the environment by running
./unstack.sh
followed by
./stack.sh
again.
Again, devstack is an ephemeral environment. Its primary purpose is to run gate testing for OpenStack's CI infrastructure.
or try ./rejoin-stack.sh to re-join previous screens.

openstack: how to run multiple glance instances?

I'm new to OpenStack, and I have followed the installation guide to set up OpenStack on a single host server. Now I have a question. On the single node, I registered a glance service and its endpoints in keystone. If I want to run glance on multiple host servers, do I need to register two glance services in keystone? Or do I still just need one glance service but add more glance endpoints?
You would probably want to find a way to load balance the glance API endpoints, then place the load-balanced address into the keystone catalog as the PublicURL.
The trick is finding a way to load balance glance. The big issue is the size of the queries to glance, in both time and data throughput. It is not exactly an easy service to load balance.
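If you do put a load balancer in front of the glance API servers, the catalog then only needs that single address. A hypothetical registration with the keystone v2 CLI (the URL and region are placeholders):
keystone endpoint-create --region RegionOne \
  --service-id $(keystone service-list | awk '/ glance / {print $2}') \
  --publicurl   http://glance-lb.example.com:9292 \
  --internalurl http://glance-lb.example.com:9292 \
  --adminurl    http://glance-lb.example.com:9292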

Should nova-api run on different compute nodes?

I am dealing with OpenStack (Folsom) and I want to deploy OpenStack across different compute nodes. Is it necessary to run the Nova API service on every node?
It seems that every compute node needs a nova-api service for my requirements, but I think that does not make sense.
In my understanding, only one nova-api service is required in the whole cloud system.
Request -> nova-api -> nova-scheduler to determine which node to use.
Yes, I think that is so. According to the official OpenStack guide, "Installing Additional Compute Nodes", only the dependencies and the nova-* components (or just the nova-compute package) need to be installed on the additional compute node.
In general, you only need one nova-api service running.
However, if your networking is configured for multi-host, then you will need a metadata service on each compute node; in that scenario, run the nova-api-metadata service on every compute node.
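A hypothetical compute-node excerpt for that multi-host scenario; on many distributions you would instead simply install the nova-api-metadata package, which runs only the metadata service:
# /etc/nova/nova.conf on each compute node
[DEFAULT]
multi_host = True
enabled_apis = metadata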
It is not necessary to run the Nova API service on every compute node. But if you are using some of the available images with a cloud-init script that looks for metadata from the Nova API, then you need to install it on every compute node.
If you can build your own VM images without cloud-init scripts, then it will not be required.
