I am looking to create a sidecar container while using the KubernetesPodOperator. I see options to create an init container with pod_mutation_hook, but I am not seeing an option to create a sidecar. If I create an init container, it has to complete before my actual container can start, but I don't want that: I need the sidecar to keep running as long as the main container in the pod is alive.
A sidecar is a container that supports the main container. With Composer, the main container is Airflow. Composer is a managed product and does not need additional support. Keep in mind that you can always increase the machine type if you need more processing power.
If you are looking to use a sidecar with pods that you create via the KubernetesPodOperator, you can define the sidecar in the Airflow settings file using pod_mutation_hook.
You can use pod_mutation_hook, which is defined in airflow_local_settings.py.
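As a minimal sketch, assuming a recent Airflow where the hook receives a kubernetes.client V1Pod (the sidecar's name, image, and command below are illustrative assumptions):

# airflow_local_settings.py
from kubernetes.client import models as k8s

def pod_mutation_hook(pod):
    # Appending to spec.containers (not init_containers) keeps the
    # sidecar running for the whole lifetime of the pod.
    sidecar = k8s.V1Container(
        name="my-sidecar",  # hypothetical name
        image="busybox:1.36",  # hypothetical image
        command=["sh", "-c", "while true; do sleep 30; done"],
    )
    pod.spec.containers.append(sidecar)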
I am trying to migrate our monolithic PHP Symfony app to a somewhat more scalable solution with Docker. There is some communication between the app and RabbitMQ, and I use docker-compose to bring all the containers up, in this case the app and the RabbitMQ server.
There is a lot of discussion around the idea that one container should spawn only one process, and the Docker best practices are somewhat vague on this point:
While this mantra has good intentions, it is not necessarily true that there should be only one operating system process per container. In addition to the fact that containers can now be spawned with an init process, some programs might spawn additional processes of their own accord.
Does it make sense to create a separate Docker container for each RabbitMQ consumer? It kind of feels "right" and "clean" to not let the RabbitMQ server know about the language/tools used to process the queue. I came up with the following (relevant parts of docker-compose.yml):
app:
  # my php-fpm app container
rabbitmq_server:
  container_name: sf.rabbitmq_server
  build: .docker/rabbitmq
  ports:
    - "15672:15672"
    - "5672:5672"
  networks:
    - app_network
rabbitmq_consumer:
  container_name: sf.rabbit_consumer
  extends: app
  depends_on:
    - rabbitmq_server
  working_dir: /app
  command: "php bin/console rabbitmq:consumer test"
  networks:
    - app_network
I could run several consumers in the rabbitmq_consumer container using nohup or some other way of running them in the background.
I guess my questions are:
Can I somehow automate the "adding a new consumer" step, so that I would not have to edit the "build script" of Docker (and others, like Ansible) every time a new consumer is added in the code?
Does it make sense to separate RabbitMQ server from Consumers, or should I use the Rabbit server with consumers running in the background?
Or should they be placed in the background of the app container?
I'll share my experience, so think critically about it.
Consumers have to run in a separate container from the web app. The consumer container runs a process manager whose responsibility is to spawn child consumer processes, restart them if they exit, reload them on a SIGUSR1 signal, and shut them down correctly on SIGTERM. If the main process exits, the whole container exits as well. You can set a restart policy for that case, such as restart: always. Here's what the consume.php script looks like:
<?php
// bin/consume.php
use App\Infra\SymfonyDaemon;
use Symfony\Component\Process\ProcessBuilder;
require __DIR__.'/../vendor/autoload.php';
$workerBuilder = new ProcessBuilder(['bin/console', 'enqueue:consume', '--setup-broker', '-vvv']);
$workerBuilder->setPrefix('php');
$workerBuilder->setWorkingDirectory(realpath(__DIR__.'/..'));
$daemon = new SymfonyDaemon($workerBuilder);
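// SymfonyDaemon is the author's custom supervisor class; start(3)
// presumably spawns three consumer processes.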
$daemon->start(3);
The container config looks like:
app_consumer:
  restart: 'always'
  entrypoint: "php bin/consume.php"
  depends_on:
    - 'rabbitmq_server'
Can I somehow automate the "adding a new consumer" step, so that I would not have to edit the "build script" of Docker (and others, like Ansible) every time a new consumer is added in the code?
Unfortunately, the RabbitMQ bundle's queue management leaves much to be desired. By default, you have to run a single command per queue: if you have 100 queues, you need 100 processes, at least one per queue. There is a way to configure a multi-queue consumer, but it requires a completely different setup. By the way, enqueue handles this a lot better: you can run a single command to consume from all queues at once, and the --queue command option allows more precise adjustments.
Does it make sense to separate RabbitMQ server from Consumers, or should I use the Rabbit server with consumers running in the background?
The RabbitMQ server should run in a separate container. I would not suggest mixing them up in one container.
Or should they be placed in the background of the app container?
I'd suggest having at least two app containers: one runs a web server and serves HTTP requests, the other runs the queue consumers.
In the pre-Docker 1.9 days I used to have a VPN provider container which I could use as the network gateway for my other containers by passing the option --net=container:[container-name].
This was very simple but had a major limitation in that the provider container had to exist prior to starting the consumers and it could not be restarted.
The new docker networking stack seems to have dropped this provision in favour of creating networks, which does sound better, but I'm struggling to get equivalent behaviour.
Right now I have created an internal network (docker network create isolated --internal --subnet=172.32.0.0/16) and brought up two containers, one attached only to the internal network and one attached to both the default bridge and the internal network.
Now I need to route all network traffic from the isolated container through the connected one. I've messed around with some iptables rules, but to be honest this is not my strongest area.
So my questions are simply: is my approach along the right lines? What rules need to be in place in the two containers to get this working like --net=container did?
I want to run an MPI job on my Kubernetes cluster. The context is that I'm actually running a modern, nicely containerised app but part of the workload is a legacy MPI job which isn't going to be re-written anytime soon, and I'd like to fit it into a kubernetes "worldview" as much as possible.
One initial question: has anyone had any success in running MPI jobs on a kube cluster? I've seen Christian Kniep's work in getting MPI jobs to run in docker containers, but he's going down the docker swarm path (with peer discovery using consul running in each container) and I want to stick to kubernetes (which already knows the info of all the peers) and inject this information into the container from the outside. I do have full control over all the parts of the application, e.g. I can choose which MPI implementation to use.
I have a couple of ideas about how to proceed:
- fat containers containing slurm and the application code -> populate the slurm.conf with appropriate info about the peers at container startup -> use srun as the container entrypoint to start the jobs
- slimmer containers with only OpenMPI (no slurm) -> populate a rankfile in the container with info from outside (provided by kubernetes) -> use mpirun as the container entrypoint
- an even slimmer approach, where I basically "fake" the MPI runtime by setting a few environment variables (e.g. the OpenMPI ORTE ones) -> run the mpicc'd binary directly (where it'll find out about its peers through the env vars)
- some other option
- give up in despair
I know trying to mix "established" workflows like MPI with the "new hotness" of kubernetes and containers is a bit of an impedance mismatch, but I'm just looking for pointers/gotchas before I go too far down the wrong path. If nothing exists I'm happy to hack on some stuff and push it back upstream.
I tried MPI jobs on Kubernetes for a few days and solved it by using dnsPolicy: None and dnsConfig (the CustomPodDNS=true feature gate is needed).
I pushed my manifests (as a Helm chart) here:
https://github.com/everpeace/kube-openmpi
I hope it helps.
Assuming you don't want to use a hardware-specific MPI library (for example, anything that uses direct access to the communication fabric), I would go with option 2.
- First, implement a wrapper for mpirun which populates the necessary data using the kubernetes API, specifically using endpoints if using a service (might be a good idea); it could also scrape the pods' exposed ports directly. See the sketch after this list.
- Add some form of checkpoint program that can be used for "rendezvous" synchronization before starting the actual code (I don't know how well MPI deals with ephemeral nodes). This is to ensure that when mpirun starts, it has a stable set of pods to use.
- And finally, actually build a container with the necessary code and, I guess, an SSH service for mpirun to use for starting processes in the other pods.
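A minimal sketch of that first step, assuming the official kubernetes Python client and a service named mpi-workers (the service name, namespace, and slot count are hypothetical):

# Query the Kubernetes API for a service's endpoints and write an
# OpenMPI hostfile; run this as a wrapper step before invoking mpirun.
from kubernetes import client, config

def write_hostfile(service="mpi-workers", namespace="default", slots=1):
    config.load_incluster_config()  # assumes this runs inside the cluster
    endpoints = client.CoreV1Api().read_namespaced_endpoints(service, namespace)
    with open("hostfile", "w") as f:
        for subset in endpoints.subsets or []:
            for addr in subset.addresses or []:
                f.write(f"{addr.ip} slots={slots}\n")

if __name__ == "__main__":
    write_hostfile()
    # then, for example: mpirun --hostfile hostfile ./mpi_binary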
Another interesting option would be to use Stateful Sets, possibly even running SLURM inside, to implement a "virtual" cluster of MPI machines running on kubernetes.
This provides stable hostnames for each node, which would reduce the problem of discovery and of keeping track of state. You could also use statefully-assigned storage for the container's local work filesystem (which, with some work, could be made to always refer to, for example, the same local SSD).
Another benefit is that it would probably be the least invasive to the actual application.
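To illustrate the stable-hostname point: with a StatefulSet behind a headless service of the same name, pod DNS names are predictable, so a hostfile could be generated without querying the API at all (the set name, replica count, and namespace here are hypothetical):

def statefulset_hostfile(name="mpi", replicas=4, namespace="default"):
    # StatefulSet pods get DNS names of the form
    # <pod>.<service>.<namespace>.svc.cluster.local,
    # e.g. mpi-0.mpi.default.svc.cluster.local
    return "\n".join(
        f"{name}-{i}.{name}.{namespace}.svc.cluster.local slots=1"
        for i in range(replicas)
    )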
I'm trying to create something like this (the diagram showed a single client container talking to several server containers):
The server containers each have port 8080 exposed, and accept requests from the client, but crucially, they are not allowed to communicate with each other.
The problem here is that the server containers are launched after the client container, so I can't pass container link flags to the client like I used to, since the containers it's supposed to link to don't exist yet.
I've been looking at the newer Docker networking stuff, but I can't use a bridge because I don't want server cross-communication to be possible. It also seems to me like one bridge per server doesn't scale well, and would be difficult to manage within the client container.
Is there some kind of switch-like docker construct that can do this?
It seems like you will need to create multiple bridge networks, one per container. To simplify that, you may want to use docker-compose to specify how the networks and containers should be provisioned, and have the docker-compose tool wire it all up correctly.
Resources:
https://docs.docker.com/engine/userguide/networking/dockernetworks/
https://docs.docker.com/compose/
https://docs.docker.com/compose/compose-file/#version-2
One more side note: I think exposed ports are accessible from all networks. If that's right, you may be able to set all of the servers' networking to none and rely on the exposed ports to reach the servers.
Hope this is relevant to your use-case; I'm attempting to infer the context of your actual application from the diagram and comments. I'd recommend you go the Service Discovery route. It may involve a little bit of a simple API over a central store (say Redis, or SkyDNS), but it would make things simple in the long run.
Kubernetes, for instance, uses SkyDNS to do so with DNS. At the end of the day, any orchestration tool of your choice would most likely do something like this out of the box: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns
The idea is simple:
Use a DNS container that keeps entries of newly spawned servers
Allow the Client Container to query it for a list of servers. e.g. Picture a DNS response with a bunch of server-<<ISO Timestamp of Server Creation>>s
Disallow client containers write-access to this DNS (managing this permission configuration without indirection, i.e. without proxying through an endpoint that allows writing into the DNS container but not reading it, is going to be exotic)
Bonus Edit: I just realised you can use a simpler Redis-like setup to do this, and that DNS might just be overengineering :)
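For instance, a minimal sketch of that Redis-based variant using the redis-py client (the key name, discovery host, and port are hypothetical):

import socket
import time
import redis

r = redis.Redis(host="discovery", port=6379)  # the central store container

def register_server():
    # Each server registers itself at startup under a timestamped name,
    # mirroring the server-<ISO timestamp> idea above.
    name = "server-" + time.strftime("%Y-%m-%dT%H:%M:%S")
    r.hset("servers", name, socket.gethostname() + ":8080")

def list_servers():
    # The client queries the registry instead of relying on link flags.
    return {k.decode(): v.decode() for k, v in r.hgetall("servers").items()}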
The following documentation link indicates that the docker driver needs to be configured on all compute nodes
from
compute_driver=libvirt.LibvirtDriver
to
compute_driver=docker.DockerDriver
Does this mean there will not be an option to select the instantiation of a normal VM? Will the Horizon UI allow selecting which type of virtualization (docker vs kvm) to use?
In OpenStack you cannot have hybrid compute drivers unless they are separated by availability zones (AZs), so it's either one or the other.
Of course, the hackish workaround would be to spin up an OpenStack compute instance inside the docker/LXC environment and join it to a new AZ as a libvirt node... a bit of inception there, though, and it makes your scheduler basically worthless.
With basic OpenStack you can't, but you can write and add a filter which makes it possible: just write a class with a host_passes method and add your new filter to the nova scheduler filters.
I did it and it works.
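For reference, here is a minimal sketch of such a filter, assuming the classic nova filter-scheduler API in which host_passes(host_state, filter_properties) is called once per candidate host; the filter name and the image property it checks are illustrative, not taken from the answer above.

from nova.scheduler import filters

class HypervisorTypeFilter(filters.BaseHostFilter):
    """Pass only hosts whose hypervisor type matches the type
    requested via a (hypothetical) image property."""

    def host_passes(self, host_state, filter_properties):
        spec = filter_properties.get('request_spec', {})
        props = spec.get('image', {}).get('properties', {})
        requested = props.get('hypervisor_type')
        if not requested:
            return True  # nothing requested, so any host passes
        return host_state.hypervisor_type == requested

You would then append the filter's class path to scheduler_default_filters in nova.conf so the scheduler picks it up.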