how to get instances back after reboot in openstack - openstack

After successful installation of devstack and launching instances,but once reboot machine, need to start all over again and lose all the instances which were launched back then.I tried rejoin-stack but did not worked,How can i get the instances back after reboot ?

You might set resume_guests_state_on_host_boot = True in nova.conf. The file should be located at /etc/nova/nova.conf

I've found some old discussion http://www.gossamer-threads.com/lists/openstack/dev/8772
AFAIK at the present time OpenStack (Icehouse) still not completely aware about environments inside it, so it can't restore completely after reboot. The instances will be there (virsh domains), but even if you start them manually or using nova flags I'm not sure whether other facilities will handle this correctly (e.g. neutron will correctly configure all L3 rules according to DB records, etc.) Honestly I'm pretty sure they won't...
The answer depends of what you need to achieve:
If you need a template environment (e.g. similar set of instances and networks each time after reboot) you may just script everything. In other words just make a bash script creating everything you need and run it each time after stack.sh. Make sure you're starting with clean environment since OpenStack DB state remains between ./unstack - ./stack.sh or ./rejoin-stack.sh (you might try to just clean DB, or delete it. stack.sh will build it back).
If you need a persistent environment (e.g. you don't want to loose VM's and whole infrastructure state after reboot) I'm not aware how to do this using OpenStack. F.e. neutron agents (they configure iptables, dhcp etc) do not save state and driven by events from Neutron service. They will not restore after reboot, so the network will be dead. I'll be very glad if someone will share a method to do such recovery.
In general I think OpenStack is not focusing on this and will not focus during the nearest release cycles. Common approach is to have multi-node environment where each node is replaceable.
See http://docs.openstack.org/high-availability-guide/content/ch-intro.html for reference

devstack is an ephemeral environment. it is not supposed to survive a reboot. this is not a supported behavior.
that being said you might find success in re-initializing the environment by running
./unstack.sh
follower by
./stack.sh
again.
Again, devstack is an ephemeral environment. It's primary purpose for existing is to run gate testing for openstack's CI infrastructure.

or try ./rejoin-stack.sh to re-join previous screens.

Related

puppet client reporting to wrong host in Foreman

This is my first post!
I have 100's of nodes managed by puppet/foreman. Everything is fine.
I did something I already did without problem in the past:
Change the hostname of a server.
This time I changed 2 hostnames:
Initially I had 'gate02' and 'gate03'.
I moved gate02 to 'gate02old' (with dummy IP, and switched the server OFF)
then I moved gate03 to gate02 ...
Now (the new) gate02 reports are updating the host called gate02old in foreman.
I did clean the certs in the puppetserver. I rm the ssl dir in the (new) gate02 and run puppet agent. I did not fing any reference to 'gate' in /var/lib/puppet. I changed the certname in puppet.conf and in hostname, and in sysconfig/network-script/ifcfg-xxxx.
The puppet agent run smoothly, and sends it to the puppetserver. But it updates the wrong host!
Anyone would have a clue on how to fix this ?
Thanks!
Foreman 2.0.3
Puppet 6
I do not accept that the sequence of events described led to the behavior described. If reports for the former gate03, now named gate02, are being logged on the server for name gate02old, then that is because that machine is presenting a cert to the server that identifies it as gate02old (and the server is accepting that cert). The sequence of events presented does not explain how that might be, but my first guess would be that it is actually (new) gate02old that is running and requesting catalogs from the server, not (new) gate02.
Fix it by
Ensuring that the machine you want running is in fact the one that is running, and that its hostname is in fact what you intend for it to be.
Shutting down the agent on (new) gate02. If it is running in daemon mode then shut down the daemon and disable it. If it is being scheduled by an external scheduler then stop and disable it there. Consider also using puppet agent --disable.
Deactivating the node on the server and cleaning out its data, including certs:
puppet node deactivate gate02
puppet node deactivate gate02old
puppet node deactivate gate03
You may want to wait a bit at this point, then ...
puppet node clean gate02
puppet node clean gate02old
puppet node clean gate03
Cleaning out the nodes' certs. For safety, I would do this on both nodes. Removing /opt/puppetlabs/puppet/ssl (on the nodes, not the server!) should work for that, or you could remove the puppet-agent package altogether, remove any files left behind, and then reinstall.
Updating the puppet configuration on the new gate02 as appropriate.
Re-enabling the agent on gate02, and starting it or running it in --test mode.
Signing the new CSR (unless you have autosigning enabled), which should have been issued for gate02 or whatever certname is explicitly specified in in that node's puppet configuration.
Thanks for the answer, though it was not the right one.
I did get to the right point by changing again the hostname of the old gateold02 to a another existing-for-testing one, starting the server and get it back in Foreman. Once that done, removing (again!) the certs of the new gate02 put it right, and its reports now updates the right entry in Foreman.
I still beleive there is something (a db ?) that was not updated right so foreman was sure that the host called 'gate02' was in the GUI 'gateold02'.
I am very sorry if you don't beleive me.
Not to say rather disappointed.
Cheers.

Problem communicating over a local area network (LAN) with ROS on WSL2

I am a developer of ROS projects. Recently I am trying using ROS(melodic) on WSL2(Windows Subsystem for Linux), and all things works just great. But I got some trouble when I want to use another PC which also in the same local area network(LAN) to communicate with. Before setting the environment variables like "ROS_MASTER_URI, ROS_IP", I know that since WSL 2 work on Hyper-V so the IP show on WSL2 is not the one in the real LAN. I have to do some command like below in order to make everyone in LAN communicate with the specific host:PORT on WSL2.
netsh interface portproxy delete v4tov4 listenport=$port listenaddress=$addr
But here comes a new question:
The nodes which use TCPROS to communicate with each other have a random PORT every time I launch the file.
How can I handle this kind of problem?
Or is there any information on the internet that I can have a look?
Thank you.
The root problem is described in WSL issue #4150. To quote from that thread,
WSL 2 seems to NAT it's virtual network, instead of making it bridged
to the host NIC.
Option 1 - Port forwarding script on login
Note: From #kraego's comment (and the edited question, which I'm just seeing based on the comment), this is probably not a good option for ROS, since the port numbers are randomly assigned. This makes port forwarding something that would have to be dynamically done.
There are a number of workarounds described in that issue, for which you've already figured out the first part (the port forwarding). The primary technique seems to be to create a PowerShell script to detect the IP address and create the port forwarding rules that runs upon Windows login. This particular comment near the top of the thread seems to be the canonical go-to answer, although many people have posted their tweaks or alternatives throughout the very long thread.
One downside - I believe the script that is mentioned there needs to be run at logon since the WSL subsystem seems to only want to run when a user is logged in. I've found that attempting to run a WSL service or instance through Windows OpenSSH results in that instance/service shutting down soon after the SSH session is closed, unless the user is already logged into Windows with a WSL instance opened.
Option 2 - WSL1
I would also propose that, assuming it fits your workflow and if the ROS works on it (it may not, given the device access you need, but not sure), you can simply use WSL1 instead of WSL2 to avoid this. You can try this out by:
Backing up your existing distro (from PowerShell or cmd, use wsl --export <DistroName> <FileName>
Import the backup into a new WSL1 instance with wsl --import <NewDistroName> <InstallLocation> <FileNameOfBackup> --version 1
It's possible to simply change versions in place, but I tend to like to have a backup anyway before doing it, and as long as you are backing up, you may as well leave the original in place.

What is the difference between cold and hot reboot in openstack

I am new to openstack for virtualization.
I can reboot instance by 2 ways: cold and hard reboot.
I can understand the difference on a physical computer, but what is the difference between cold and hot reboot on a VM ?
Thanks
Apart from the documentation here that it's already mentioned on this thread:
http://docs.openstack.org/user-guide/cli-reboot-an-instance.html
A hard-reboot also affects the virtual machine at hypervisor level. Example: If you are using libvirt-based hypervisors (qemu/kvm), the instance control file (the libvirt XML representing the virtual machine in Libvirt) get's reconstructed from scratch.
That's very usefull when for any reason the instance storage space (/var/lib/nova/instances/INSTANCE_UUID) suffers any kind of problem, or, in general for any reason that you need OpenStack to reconstruct the libvirt definitions !.
It affects both the XML libvirt definition normally stored at /etc/libvirt/qemu and the copy at /var/nova/instances/INSTANCE_UUID.
So, in resume: Use hard-reboot if you need to fully reset/reboot the instance up to Hypervisor level. As you can see, is more like a "power-cycle with steroids".
Hope this helps !!

Kubernetes and MPI

I want to run an MPI job on my Kubernetes cluster. The context is that I'm actually running a modern, nicely containerised app but part of the workload is a legacy MPI job which isn't going to be re-written anytime soon, and I'd like to fit it into a kubernetes "worldview" as much as possible.
One initial question: has anyone had any success in running MPI jobs on a kube cluster? I've seen Christian Kniep's work in getting MPI jobs to run in docker containers, but he's going down the docker swarm path (with peer discovery using consul running in each container) and I want to stick to kubernetes (which already knows the info of all the peers) and inject this information into the container from the outside. I do have full control over all the parts of the application, e.g. I can choose which MPI implementation to use.
I have a couple of ideas about how to proceed:
fat containers containing slurm and the application code -> populate
the slurm.conf with appropriate info about the peers at container
startup -> use srun as the container entrypoint to start the jobs
slimmer containers with only OpenMPI (no slurm) -> populate a
rankfile in the container with info from outside (provided by
kubernetes) -> use mpirun as the container entrypoint
an even slimmer approach, where I basically "fake" the MPI runtime by
setting a few environment variables (e.g. the OpenMPI ORTE ones) ->
run the mpicc'd binary directly (where it'll find out about its peers
through the env vars)
some other option
give up in despair
I know trying to mix "established" workflows like MPI with the "new hotness" of kubernetes and containers is a bit of an impedance mismatch, but I'm just looking for pointers/gotchas before I go too far down the wrong path. If nothing exists I'm happy to hack on some stuff and push it back upstream.
I tried MPI Jobs on Kubernetes for a few days and solved it by using dnsPolicy:None and dnsConfig (CustomDNS=true feature gate will be needed).
I pushed my manifests (as Helm chart) here.
https://github.com/everpeace/kube-openmpi
I hope it would help.
Assuming you don't want to use hw-specific MPI library (for example anything that uses direct access to communication fabric), I would go with option 2.
First, implement a wrapper for mpirun which populates necessary data
using kubernetes API, specifically using endpoints if using a
service (might be a good idea), could also scrape pod's exposed
ports directly.
Add some form of checkpoint program that can be used for
"rendezvous" synchronization before starting actual running code (I
don't know how well MPI deals with ephemeral nodes). This is to
ensure that when mpirun starts it has stable set of pods to use
And finally actually build a container with necessary code and I
guess SSH service for mpirun to use for starting processes in
other pods.
Another interesting option would be to use Stateful Sets, possibly even running with SLURM inside, which implement a "virtual" cluster of MPI machines running on kubernetes.
This provides stable hostnames for each node, which would reduce the problem of discovery and keeping track of state. You could also use statefully-assigned storage for container's local work filesystem (which, with some work, could be made to for example always refer to same local SSD).
Another benefit is that it would be probably least invasive to the actual application.

devstack multi node installation

I have 3 nodes which i am using for multi node setup. I am thinking of following the below structure
Controller: keystone, horizon, g-reg, g-api, n-api, n-crt, n-sch, n-cond, n-cauth, n-obj, n-novnc, n-xvnc, c-api, c-sch (this node will have mysql and rabbitmq as well)
Network: q-svc, q-agt, q-dhcp, q-l3, q-meta, quantum
Compute: n-cpu, c-vol
I have a few questions. 1. In Compute node, do i need to keep n-api? Also what else is needed apart from n-api and c-vol? Is q-agt needed in compute? 2. Will i need c-api along with c-vol? Does compute node need rabbit mq installed?
Q1)
You don't want the nova-api on the compute nodes generally. It's better on the controller.
Nova api makes use of pasted hard system credentials and you don't want that paste file exposed on any node that a user may compromise with a hypervisor escape.
nova-compute and nova-volume is all you probably need. they do communicate with the scheduler over rabbitmq so make sure that's working =P
Q2)
You don't NEED cinder to run an openstack cloud, though I see no reason not to include it.
I don't know what impact disabling cinder has on the devstack stack.sh script, I've never done it.
As per RabbitMQ see above answer.

Resources