I have a requirement to deploy an HTTP application on Kubernetes with zero downtime. I also have a restriction of using a single pod (replicas=1). The problem is that some HTTP requests get a 502 Bad Gateway whenever I make changes to the pod.
I referred to the following two issues [1] [2], but their solutions only work when I have more than a single replica. With a single replica, the NGINX ingress still has a slight downtime of less than a millisecond.
The rolling-update strategy and lifecycle spec of my deployment are set as below, according to the answers given in those issues [1] [2].
spec:
  strategy:
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
    type: RollingUpdate
...
spec:
  ...
    lifecycle:
      preStop:
        exec:
          command:
          - sleep
          - "30"
Note that I have ConfigMaps mounted into this deployment; I'm not sure whether that affects the downtime or not.
I also referred to these two blog posts [3] [4], but they did not solve my problem either. Blog [4] shows that Kubernetes can achieve zero downtime even with a single replica, but unfortunately it does not use the ingress-nginx controller.
In brief, I wanted to know: is it possible to achieve zero downtime in ingress-nginx with a single pod replica?
References
[1] https://github.com/kubernetes/ingress-nginx/issues/489
[2] https://github.com/kubernetes/ingress-nginx/issues/322
[3] https://blog.sebastian-daschner.com/entries/zero-downtime-updates-kubernetes
[4] http://rahmonov.me/posts/zero-downtime-deployment-with-kubernetes/
I suppose that your single-pod restriction applies at runtime and not during the upgrade; otherwise you can't achieve your goal.
In my opinion your rolling-update strategy is good. You can add a PodDisruptionBudget to manage voluntary disruptions and make sure that at least 1 pod is available.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: sample-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      <your_app_label>
Another very important thing is the probes. According to the documentation:
The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
You should set the liveness probe, but above all the readiness probe, to return a success response only when your new pod is really ready to accept connections; otherwise Kubernetes thinks the new pod is up, and the old pod will be destroyed before the new one can accept connections.
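As a minimal sketch of what that could look like in the container spec (the /healthz path and port 8080 are placeholders for your app's actual health endpoint; tune the timings to your app):

```yaml
# Placeholder probe config: adjust path, port and timings to your app.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
```

With maxSurge set, the new pod only starts receiving traffic once its readiness probe succeeds, while the old pod keeps serving during the preStop sleep.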
Nginx as a reverse proxy can handle zero downtime if the IP address of the backend doesn't change, but in your case I think the combination of only 1 replica and the mounted volumes makes the handover process a bit slower. Zero downtime is not possible if you mount the same volume on the new pod, because the new pod has to wait for the old pod to be destroyed and release the volume before it can start up.
In the blog post you referenced [4], the example doesn't use volumes and uses a very small image, which makes the pull and startup process very fast.
I recommend you review your volume requirements and try not to let them be a blocking factor in the startup process.
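Whether the volume actually blocks the new pod depends mostly on its access mode. As a hedged sketch (the claim name is a placeholder), a ReadWriteOnce claim can only be attached to one node at a time, while ReadWriteMany, if your storage backend supports it, lets the old and new pod mount the volume simultaneously during the rollout:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data            # placeholder name
spec:
  accessModes:
    - ReadWriteMany         # ReadWriteOnce would serialize the pod handover
  resources:
    requests:
      storage: 1Gi
```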
Related
I'm seeing some strange behaviour when trying to do some load testing.
Environment:
NGINX Ingress controller version: 0.44.0
Kubernetes version: 1.17.8
openidc.lua version: 1.7.4
Here's the situation :
The nginx ingress controller is deployed as a DaemonSet, and because of the openidc module I activated sessionAffinity: ClientIP.
I have a simple stateless REST service deployed with a basic ingress which is being load tested (no sessionAffinity on that one).
When launching load tests on the REST service without sessionAffinity: ClientIP, I get far beyond 25 req/s (about 130 req/s before the service's resources begin to fail, but that's another issue).
But with sessionAffinity activated, I only reach 25 req/s.
After some research, I found some interesting things, described here: https://medium.com/titansoft-engineering/rate-limiting-for-your-kubernetes-applications-with-nginx-ingress-2e32721f7f57
So the formula, as the load test should always be served by the same nginx pod, should be : successful requests = period * rate + burst
So I did try to add the annotation nginx.ingress.kubernetes.io/limit-rps: "100" on my ingress, but no luck, still the same 25 req/s.
I also tried different combinations of the following annotations : https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#rate-limiting, but no luck either.
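For reference, the annotations I tried look roughly like this (the values are just examples):

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "100"
    # burst = limit-rps * limit-burst-multiplier (the multiplier defaults to 5)
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
```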
Am I missing something ?
In fact, it was more vicious than that.
It had nothing to do with sessionAffinity, nor with rate limiting (in fact there is none by default; I didn't get that at first. The rate limit is only there if you want to limit for DDoS-protection purposes).
The problem was that I had enabled the ModSecurity AND OWASP rules options in the ConfigMap.
Because of that, request processing was so slow that it limited the number of requests per second. When sessionAffinity was not set, I didn't see the problem, as the req/s looked fair, being distributed among all pods.
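For context, the ConfigMap options I had enabled were along these lines (a sketch; the ConfigMap name and namespace depend on how the controller was installed):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration   # name used by many ingress-nginx installs
  namespace: ingress-nginx
data:
  enable-modsecurity: "true"
  enable-owasp-modsecurity-crs: "true"
```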
But with sessionAffinity, meaning a load test hitting a single pod, the problem was clearly visible.
So I had to remove ModSecurity and OWASP, and the apps themselves will have to be responsible for that.
A little sad, as I wanted more centralized security on nginx so the apps don't need to handle it, but not at that cost...
I'd be curious to understand what exactly ModSecurity is doing to be so slow.
I would like to upgrade my Weave Net network from version 2.5.0 to 2.5.2. I understand that it's "as simple" as updating the weave DaemonSet... however, I was wondering if there is a way this can be done with minimal disruption to the pods running on the system.
A simple example in my mind would be to:
cordon node1
drain node1 of all pods
update weave on node1
uncordon node1
... then rinse and repeat for each k8s node until all done.
Based on the Weave Net documentation:
Upgrading the Daemon Sets
The DaemonSet definition specifies Rolling Updates, so when you apply a new version Kubernetes will automatically restart the Weave Net pods one by one.
With RollingUpdate update strategy, after you update a DaemonSet template, old DaemonSet pods will be killed, and new DaemonSet pods will be created automatically, in a controlled fashion.
As I read in another Stack Overflow answer:
It is possible to perform rolling updates with no downtime using a DaemonSet as of today! What you need is to have at least 2 nodes running in your cluster and set maxUnavailable to 1 in your DaemonSet configuration.
Assuming the previous configuration, when an update is pushed, a first node will start updating. The second will wait until the first completes. Upon success, the second does the same.
The major drawback is that you need to keep 2 nodes running continuously, or to take actions to spawn/kill a node before/after an update.
So I think the best option for you to upgrade your CNI plugin is to use the DaemonSet rolling update and set maxUnavailable to 1 in your DaemonSet configuration.
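A minimal sketch of that update strategy (only the relevant fields are shown; the rest of the Weave Net DaemonSet spec stays unchanged):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # upgrade one node's pod at a time
```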
I have a cluster running two nodes, and I'm getting the following for one of the two kube-proxy pods:
kube-system kube-proxy-gke-app-... 0/1 Init:0/1 0 4m
A describe command more or less tells the same.
Not knowing enough to fully understand what's happening (the whole cluster works quite fine), I was wondering whether such a state for one of the pods could have an impact, what the reason was, and/or the workaround to solve it.
This all runs on GKE with nginx-ingress-controller behind the load balancer.
Thanks a lot for any help.
Using nginx-ingress-controller:0.9.0, below is the permanent state of the Google Cloud load balancer:
Basically, the single healthy node is the one running the nginx-ingress-controller pods. Besides not looking good on this screen, everything works fine. The thing is, I'm wondering why such a bad status appears on the LB.
Here's the service/deployment used.
I'm just getting a little lost over how things work; I hope to get some experienced feedback on how to do things right (I mean, getting green lights on all nodes), or to double-check whether that's a drawback of not using the 'official' gcloud L7 load balancer.
Your Service is using the service.beta.kubernetes.io/external-traffic: OnlyLocal annotation. This configures it so that traffic arriving at the NodePort for that service will never go to a Pod on another node. Since your Deployment only has 1 replica, the only node that will receive traffic is the one where the 1 Pod is running.
If you scale your Deployment to 2 replicas, 2 nodes will be healthy, etc.
Using that annotation is a recommended configuration so that you are not introducing additional network hops.
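A minimal sketch of a Service using that annotation (the names are placeholders; on newer Kubernetes versions the same behaviour is expressed with the externalTrafficPolicy: Local field instead of the annotation):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-lb    # placeholder
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: LoadBalancer
  selector:
    app: nginx-ingress-controller
  ports:
  - port: 80
    targetPort: 80
```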
I am trying to migrate our monolithic PHP Symfony app to a somewhat more scalable solution with Docker. There is some communication between the app and RabbitMQ, and I use docker-compose to bring all the containers up, in this case the app and the RabbitMQ server.
There is a lot of discussion around the topic that one container should spawn only one process, and the Docker best practices are somewhat vague regarding this point:
While this mantra has good intentions, it is not necessarily true that there should be only one operating system process per container. In addition to the fact that containers can now be spawned with an init process, some programs might spawn additional processes of their own accord.
Does it make sense to create a separate Docker container for each RabbitMQ consumer? It kind of feels "right" and "clean" not to let the RabbitMQ server know about the language/tools used to process the queue. I came up with this (relevant parts of docker-compose.yml):
app:
  # my php-fpm app container
rabbitmq_server:
  container_name: sf.rabbitmq_server
  build: .docker/rabbitmq
  ports:
    - "15672:15672"
    - "5672:5672"
  networks:
    - app_network
rabbitmq_consumer:
  container_name: sf.rabbit_consumer
  extends: app
  depends_on:
    - rabbitmq_server
  working_dir: /app
  command: "php bin/console rabbitmq:consumer test"
  networks:
    - app_network
I could run several consumers in the rabbitmq_consumer container using nohup or some other way of running them in the background.
I guess my questions are:
Can I somehow automate the "adding a new consumer", so that I would not have to edit the "build script" of Docker (and others, like ansible) every time the new consumer is added from the code?
Does it make sense to separate RabbitMQ server from Consumers, or should I use the Rabbit server with consumers running in the background?
Or should they be placed in the background of the app container?
I'll share my experience, so think critically about it.
Consumers have to run in a separate container from the web app. The consumer container runs a process manager. Its responsibility is to spawn some child consumer processes, reboot them if they exit, reload them on a SIGUSR1 signal, and shut them down correctly on SIGTERM. If the main process exits, the whole container exits as well. You may have a restart policy for this case, like always. Here's how the consume.php script looks:
<?php
// bin/consume.php
use App\Infra\SymfonyDaemon;
use Symfony\Component\Process\ProcessBuilder;

require __DIR__.'/../vendor/autoload.php';

// Build the worker command: php bin/console enqueue:consume --setup-broker -vvv
$workerBuilder = new ProcessBuilder(['bin/console', 'enqueue:consume', '--setup-broker', '-vvv']);
$workerBuilder->setPrefix('php');
$workerBuilder->setWorkingDirectory(realpath(__DIR__.'/..'));

// Spawn and supervise three worker processes
$daemon = new SymfonyDaemon($workerBuilder);
$daemon->start(3);
The container config looks like:
app_consumer:
  restart: 'always'
  entrypoint: "php bin/consume.php"
  depends_on:
    - 'rabbitmq_server'
Can I somehow automate the "adding a new consumer", so that I would not have to edit the "build script" of Docker (and others, like ansible) every time the new consumer is added from the code?
Unfortunately, the RabbitMQ bundle's queue management leaves much to be desired. By default, you have to run a single command per queue: if you have 100 queues, you need 100 processes, at least one per queue. There is a way to configure a multi-queue consumer, but it requires a completely different setup. By the way, enqueue does this a lot better: you can run a single command to consume from all queues at once, and the --queue command option allows more fine-grained adjustments.
Does it make sense to separate RabbitMQ server from Consumers, or should I use the Rabbit server with consumers running in the background?
The RabbitMQ server should run in a separate container. I would not suggest mixing them up in one container.
Or should they be placed in the background of the app container?
I'd suggest having at least two app containers: one runs a web server and serves HTTP requests, the other runs the queue consumers.