I am trying to migrate our monolithic PHP Symfony app to a somewhat more scalable solution with Docker. There is some communication between the app and RabbitMQ, and I use docker-compose to bring all the containers up, in this case the app and the RabbitMQ server.
There is a lot of discussion around the idea that one container should spawn only one process, and the Docker best practices are somewhat vague on this point:
While this mantra has good intentions, it is not necessarily true that there should be only one operating system process per container. In addition to the fact that containers can now be spawned with an init process, some programs might spawn additional processes of their own accord.
Does it make sense to create a separate Docker container for each RabbitMQ consumer? It kind of feels "right" and "clean" not to let the RabbitMQ server know about the language/tools used to process the queue. I came up with the following (relevant parts of docker-compose.yml):
app:
    # my php-fpm app container
rabbitmq_server:
    container_name: sf.rabbitmq_server
    build: .docker/rabbitmq
    ports:
        - "15672:15672"
        - "5672:5672"
    networks:
        - app_network
rabbitmq_consumer:
    container_name: sf.rabbit_consumer
    extends: app
    depends_on:
        - rabbitmq_server
    working_dir: /app
    command: "php bin/console rabbitmq:consumer test"
    networks:
        - app_network
I could run several consumers in the rabbitmq_consumer container using nohup or some other way of running them in the background.
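Side note: instead of nohup, I assume I could also run several copies of the consumer service with docker-compose scale (after dropping the fixed container_name, since scaled services cannot share one), e.g.
docker-compose scale rabbitmq_consumer=3
but the questions below still stand.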
I guess my questions are:
Can I somehow automate "adding a new consumer", so that I would not have to edit the Docker "build script" (and others, like Ansible) every time a new consumer is added in the code?
Does it make sense to separate RabbitMQ server from Consumers, or should I use the Rabbit server with consumers running in the background?
Or should they be placed in the background of the app container?
I'll share my experience, so think critically about it.
Consumers have to be run in a separate container from the web app. The consumer container runs a process manager. Its responsibility is to spawn child consumer processes, reboot them if they exit, reload them on a SIGUSR1 signal, and shut them down correctly on SIGTERM. If the main process exits, the whole container exits as well. You may have a restart policy for this case, such as always. Here's what the consume.php script looks like:
<?php
// bin/consume.php
use App\Infra\SymfonyDaemon;
use Symfony\Component\Process\ProcessBuilder;
require __DIR__.'/../vendor/autoload.php';
// build the worker command: php bin/console enqueue:consume --setup-broker -vvv
$workerBuilder = new ProcessBuilder(['bin/console', 'enqueue:consume', '--setup-broker', '-vvv']);
$workerBuilder->setPrefix('php');
$workerBuilder->setWorkingDirectory(realpath(__DIR__.'/..'));
// start three consumer processes and supervise them
$daemon = new SymfonyDaemon($workerBuilder);
$daemon->start(3);
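Note that ProcessBuilder was deprecated in Symfony 3.4 and removed in 4.0. On newer Symfony versions, a rough sketch of the same idea with the Process class directly could look like the following (the file name is hypothetical, the restart loop is a simplified stand-in for the SymfonyDaemon class above, and signal handling is omitted for brevity):
<?php
// bin/consume_process.php (hypothetical) - same idea for Symfony >= 4
use Symfony\Component\Process\Process;
require __DIR__.'/../vendor/autoload.php';

$projectDir = realpath(__DIR__.'/..');
$workers = [];
for ($i = 0; $i < 3; $i++) {
    // same worker command as above
    $process = new Process(['php', 'bin/console', 'enqueue:consume', '--setup-broker', '-vvv'], $projectDir);
    $process->setTimeout(null); // consumers are long-running
    $process->start();
    $workers[] = $process;
}

// naive supervision loop: restart any worker that has exited
while (true) {
    foreach ($workers as $i => $process) {
        if (!$process->isRunning()) {
            $workers[$i] = $process->restart();
        }
    }
    sleep(1);
}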
The container config looks like:
app_consumer:
    restart: 'always'
    entrypoint: "php bin/consume.php"
    depends_on:
        - 'rabbitmq_server'
Can I somehow automate "adding a new consumer", so that I would not have to edit the Docker "build script" (and others, like Ansible) every time a new consumer is added in the code?
Unfortunately, the RabbitMQ bundle's queue management leaves much to be desired. By default, you have to run a single command per queue; if you have 100 queues, you need 100 processes, at least one per queue. There is a way to configure a multi-queue consumer, but it requires a completely different setup. By the way, enqueue handles this a lot better: you can run a single command to consume from all queues at once. The --queue command option allows for more fine-grained adjustments.
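For illustration, assuming the enqueue bundle is installed, that boils down to something like the following (the foo/bar queue names are placeholders, and the exact option syntax may differ between enqueue versions):
# consume from every configured queue with one process
php bin/console enqueue:consume --setup-broker -vvv
# or restrict a worker to specific queues
php bin/console enqueue:consume --queue=foo --queue=bar -vvv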
Does it make sense to separate RabbitMQ server from Consumers, or should I use the Rabbit server with consumers running in the background?
The RabbitMQ server should be run in a separate container. I would not suggest mixing them up in one container.
Or should they be placed in the background of the app container?
I'd suggest having at least two app containers. One runs a web server and serves HTTP requests, the other runs queue consumers.
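As a rough sketch (service names are placeholders, and both services can build from the same image):
app_web:
    build: .
    # php-fpm / web server, handles HTTP requests
app_consumer:
    build: .
    restart: 'always'
    entrypoint: "php bin/consume.php"
    depends_on:
        - 'rabbitmq_server'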
I have a Spring Boot application with a /health endpoint accessible, deployed in AWS ECS Fargate. Sometimes the container is stopped with a "Task failed container health checks" message. Sometimes it happens once daily, sometimes once a week; maybe it depends on the load. This is the health check command specified in the Task Definition:
CMD-SHELL,curl -f http://localhost/actuator/health || exit 1
My question is how to troubleshoot what AWS receives when the health check fails.
In case anyone else lands here because of failing container health checks (not the same as ELB health checks), AWS provides some basic advice:
Check that the command works from inside the container. In my case I had not installed curl in the container image, but when I tested it from outside the container it worked fine, which fooled me into thinking it was working.
Check the task logs in CloudWatch
If the checks are only failing sometimes (especially under load), you can try increasing the timeout, but also check the task metrics (memory and CPU usage). Garbage collection can cause the task to pause, and if all the vCPUs are busy handling other requests, the health check may be delayed, so you may need to allocate more memory and/or vCPUs to the task.
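For reference, those knobs live in the healthCheck block of the container definition; a sketch with relaxed timings (the numbers are placeholders to tune, not recommendations):
"healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost/actuator/health || exit 1"],
    "interval": 30,
    "timeout": 10,
    "retries": 3,
    "startPeriod": 60
}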
Thanks @John Velonis,
I don't have enough reputation to comment on your answer, so I'm posting this as a separate answer.
In my case, the ECS container keeps getting UNKNOWN status from the ECS cluster, but I can access the health check successfully. When I read this post and checked my base image, which is node:14.19.1-alpine3.14, I found that it doesn't have the curl command,
so I have to install that in the Dockerfile:
RUN apk --no-cache add curl
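In context, the relevant part of the Dockerfile is roughly this (the rest of the image build is omitted):
FROM node:14.19.1-alpine3.14
# curl is required by the ECS container health check command
RUN apk --no-cache add curl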
Trying to split out Airflow processes onto 2 servers. Server A, which has already been running in standalone mode with everything on it, has the DAGs, and I'd like to set it as the worker in the new setup with an additional server.
Server B is the new server which would host the metadata database on MySQL.
Can I have Server A run LocalExecutor, or would I have to use CeleryExecutor? The airflow scheduler has to run on the server that has the DAGs, right? Or does it have to run on every server in a cluster? I'm confused as to what dependencies there are between the processes.
This article does an excellent job demonstrating how to cluster Airflow onto multiple servers.
Multi-Node (Cluster) Airflow Setup
A more formal setup for Apache Airflow is to distribute the daemons across multiple machines as a cluster.
Benefits
Higher Availability
If one of the worker nodes were to go down or be purposely taken offline, the cluster would still be operational and tasks would still be executed.
Distributed Processing
If you have a workflow with several memory intensive tasks, then the tasks will be better distributed to allow for higher utilization of data across the cluster and provide faster execution of the tasks.
Scaling Workers
Horizontally
You can scale the cluster horizontally and distribute the processing by adding more executor nodes to the cluster and allowing those new nodes to take load off the existing nodes. Since workers don’t need to register with any central authority to start processing tasks, the machine can be turned on and off without any downtime to the cluster.
Vertically
You can scale the cluster vertically by increasing the number of celeryd daemons running on each node. This can be done by increasing the value in the ‘celeryd_concurrency’ config in the {AIRFLOW_HOME}/airflow.cfg file.
Example:
celeryd_concurrency = 30
You may need to increase the size of the instances in order to support a larger number of celeryd processes. This will depend on the memory and CPU intensity of the tasks you're running on the cluster.
Scaling Master Nodes
You can also add more Master Nodes to your cluster to scale out the services that are running on the Master Nodes. This will mainly allow you to scale out the Web Server Daemon in case there are too many HTTP requests coming for one machine to handle or if you want to provide Higher Availability for that service.
One thing to note is that there can only be one Scheduler instance running at a time. If you have multiple Schedulers running, there is a possibility that multiple instances of a single task will be scheduled. This could cause some major problems with your Workflow and cause duplicate data to show up in the final table if you were running some sort of ETL process.
If you would like, the Scheduler daemon may also be setup to run on its own dedicated Master Node.
Apache Airflow Cluster Setup Steps
Pre-Requisites
The following nodes are available with the given host names:
master1 - Will have the role(s): Web Server, Scheduler
master2 - Will have the role(s): Web Server
worker1 - Will have the role(s): Worker
worker2 - Will have the role(s): Worker
A Queuing Service is Running. (RabbitMQ, AWS SQS, etc)
You can install RabbitMQ by following these instructions: Installing RabbitMQ
If you’re using RabbitMQ, it is recommended that it is also setup to be a cluster for High Availability. Setup a Load Balancer to proxy requests to the RabbitMQ instances.
Additional Documentation
Documentation: https://airflow.incubator.apache.org/
Install Documentation: https://airflow.incubator.apache.org/installation.html
GitHub Repo: https://github.com/apache/incubator-airflow
All airflow processes need to have the same contents in their airflow_home folder. This includes configuration and dags. If you only want server B to run your MySQL database, you do not need to worry about any airflow specifics. Simply install the database on server B and change your airflow.cfg's sql_alchemy_conn parameter to point to your database on Server B and run airflow initdb from Server A.
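A minimal sketch of that change on Server A, assuming a MySQL database named airflow on Server B (host name, user, and password are placeholders):
[core]
executor = LocalExecutor
sql_alchemy_conn = mysql://airflow_user:airflow_pass@server-b:3306/airflow
Then run airflow initdb once from Server A.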
If you also want to run airflow processes on server B, you would have to look into scaling using the CeleryExecutor.
I want to run an MPI job on my Kubernetes cluster. The context is that I'm actually running a modern, nicely containerised app but part of the workload is a legacy MPI job which isn't going to be re-written anytime soon, and I'd like to fit it into a kubernetes "worldview" as much as possible.
One initial question: has anyone had any success in running MPI jobs on a kube cluster? I've seen Christian Kniep's work in getting MPI jobs to run in docker containers, but he's going down the docker swarm path (with peer discovery using consul running in each container) and I want to stick to kubernetes (which already knows the info of all the peers) and inject this information into the container from the outside. I do have full control over all the parts of the application, e.g. I can choose which MPI implementation to use.
I have a couple of ideas about how to proceed:
fat containers containing slurm and the application code -> populate the slurm.conf with appropriate info about the peers at container startup -> use srun as the container entrypoint to start the jobs
slimmer containers with only OpenMPI (no slurm) -> populate a rankfile in the container with info from outside (provided by kubernetes) -> use mpirun as the container entrypoint
an even slimmer approach, where I basically "fake" the MPI runtime by setting a few environment variables (e.g. the OpenMPI ORTE ones) -> run the mpicc'd binary directly (where it'll find out about its peers through the env vars)
some other option
give up in despair
I know trying to mix "established" workflows like MPI with the "new hotness" of kubernetes and containers is a bit of an impedance mismatch, but I'm just looking for pointers/gotchas before I go too far down the wrong path. If nothing exists I'm happy to hack on some stuff and push it back upstream.
I tried MPI jobs on Kubernetes for a few days and solved it by using dnsPolicy: None and dnsConfig (the CustomDNS=true feature gate will be needed).
I pushed my manifests (as Helm chart) here.
https://github.com/everpeace/kube-openmpi
I hope it helps.
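For reference, the relevant part of a pod spec looks roughly like this (the nameserver IP and search domains are placeholders; the actual chart may wire this up differently):
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 10.96.0.10
    searches:
      - mpi-job.default.svc.cluster.local
      - svc.cluster.local
      - cluster.local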
Assuming you don't want to use a hardware-specific MPI library (for example anything that uses direct access to the communication fabric), I would go with option 2.
First, implement a wrapper for mpirun which populates the necessary data using the Kubernetes API, specifically using endpoints if you use a service (might be a good idea); it could also scrape the pods' exposed ports directly (a rough sketch follows this list).
Add some form of checkpoint program that can be used for "rendezvous" synchronization before starting the actual code (I don't know how well MPI deals with ephemeral nodes). This is to ensure that when mpirun starts it has a stable set of pods to use.
And finally, actually build a container with the necessary code and, I guess, an SSH service for mpirun to use for starting processes in other pods.
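As a rough sketch of the first step (the label, binary path, and process count are placeholders), the wrapper could build a hostfile from the pod IPs and hand it to mpirun:
# inside the wrapper, before launching the MPI job
kubectl get pods -l app=mpi-worker \
  -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' > /tmp/hostfile
mpirun --hostfile /tmp/hostfile -np 4 /opt/app/mpi_binary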
Another interesting option would be to use StatefulSets, possibly even running SLURM inside, which would implement a "virtual" cluster of MPI machines running on Kubernetes.
This provides stable hostnames for each node, which would reduce the problem of discovery and keeping track of state. You could also use statefully-assigned storage for the container's local work filesystem (which, with some work, could be made to, for example, always refer to the same local SSD).
Another benefit is that it would be probably least invasive to the actual application.
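A minimal sketch of that idea (names and image are placeholders): a headless Service plus a StatefulSet gives each pod a stable DNS name such as mpi-0.mpi, mpi-1.mpi, and so on:
apiVersion: v1
kind: Service
metadata:
  name: mpi
spec:
  clusterIP: None        # headless service: stable per-pod DNS names
  selector:
    app: mpi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mpi
spec:
  serviceName: mpi
  replicas: 4
  selector:
    matchLabels:
      app: mpi
  template:
    metadata:
      labels:
        app: mpi
    spec:
      containers:
        - name: mpi-node
          image: my-mpi-image   # placeholder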
compose/docker-compose.yml
version: '2'
services:
    worker:
        image: some-image
    manager:
        image: some-image
        environment:
            # number of workers
            - INSTANCES=5
networks:
    default:
        driver: overlay
the workers are scaled with
docker-compose scale worker=5
The manager container is responsible for distributing the workload to the worker containers.
In order to achieve this, the manager container needs to know how many workers there are and what the hostnames are.
I know that I can reach the first worker container by using the host, "worker" or "compose_worker_1" and the second container by "compose_worker_2".
But how is the manager supposed to know how many workers there are?
My current workaround is to specify the number of workers as an environment variable, but that seems redundant when the number has already been given to docker-compose scale.
Is there any other lightweight method I can use to discover the number of workers?
I would have the worker make a connection back to the manager after it starts and is ready to handle work, as a "registration". That way the manager doesn't need to know anything in advance; it just waits for workers to register themselves.
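A minimal sketch of that registration, assuming the manager exposes an HTTP endpoint for it (the endpoint and port are made up for illustration):
# run by each worker once it is ready to accept work
curl -X POST "http://manager:8080/register" -d "host=$(hostname)"
The manager then simply counts the registrations it has received.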
Is it recommended to launch a docker instance per request?
I have either lighttpd or Nginx running on my web server as a reverse proxy. I support a number of subdomains with very low usage. When a request for a subdomain arrives, I want to start the docker instance. Preferably I'd like to launch them dynamically, so that if more than one user arrives I would launch one per user... and/or a shared instance (determined by configuration).
Originally I said this should work well for low traffic sites, but upon further thought, no, this is a bad idea.
Each time you launch a Docker container, it adds a read-write layer on top of the image. Even if there is very little data written, the layer exists, and each request will generate one. When a single user visits a website, rendering the page will generate tens to thousands of requests, for CSS, for JavaScript, for each image, for fonts, for AJAX, and each of these would create one of those read-write layers.
Right now there is no automatic cleanup of the read-write layers; they persist even after the Docker container has exited. By default, nothing is lost.
So, even for a single low traffic site, you would find your disk use growing steadily over time. You could add your own automated cleanup.
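For example, starting the containers with docker run --rm discards the writable layer as soon as each container exits, and on newer Docker versions a periodic docker container prune removes whatever stopped containers remain:
docker run --rm my-site-image      # layer is discarded when the container exits
docker container prune -f          # remove all stopped containers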
Then there is the second problem: anything uploaded to the website would not be available to any other requests unless it was written to some out-of-container shared storage. That's pretty easy to do with S3 or a separate and persistent database service, but it does start showing the weakness in the "one new Docker container per request" approach. If you're going to have some persistent services, why not make the Docker containers more persistent and run them longer?