I would like to upgrade my Weave Net network from version 2.5.0 to 2.5.2. I understand that it's "as simple" as updating the weave DaemonSet... however, I was wondering whether this can be done with minimal disruption to the pods running on the system.
A simple example in my mind would be to:
cordon node1
drain node1 of all pods
update weave on node1
uncordon node1
... then rinse and repeat for each k8s node until all done.
Based on the Weave Net documentation:
Upgrading the Daemon Sets
The DaemonSet definition specifies Rolling Updates, so when you apply a new version Kubernetes will automatically restart the Weave Net pods one by one.
With RollingUpdate update strategy, after you update a DaemonSet template, old DaemonSet pods will be killed, and new DaemonSet pods will be created automatically, in a controlled fashion.
As I read in another Stack Overflow answer:
It is possible to perform rolling updates with no downtime using a DaemonSet as of today! What you need is to have at least 2 nodes running in your cluster and to set maxUnavailable to 1 in your DaemonSet configuration.
Assuming that configuration, when an update is pushed, the first node starts updating. The second waits until the first completes. Upon success, the second does the same.
The major drawback is that you need to keep 2 nodes running continuously, or take action to spawn/kill a node before/after an update.
So I think the best option for upgrading your CNI plugin is to use the DaemonSet rolling update and set maxUnavailable to 1 in your DaemonSet configuration.
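As a rough sketch (container and image names follow the stock Weave Net manifest; merge these fields into the full DaemonSet you already run rather than applying this excerpt on its own):

# Excerpt of the weave-net DaemonSet (apps/v1) showing only the fields
# relevant to a controlled rolling upgrade.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1          # replace the weave pod on one node at a time
  template:
    spec:
      containers:
        - name: weave
          image: weaveworks/weave-kube:2.5.2   # bumped from 2.5.0
        - name: weave-npc
          image: weaveworks/weave-npc:2.5.2

With this in place, applying the updated manifest lets Kubernetes replace the Weave Net pod on one node, wait for it to become ready, and only then move on to the next node.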
Related
I want to set up HA for Airflow (2.3.1) on CentOS 7. Messaging queue: RabbitMQ; metadata DB: Postgres. Does anybody know how to set it up?
Your question is very broad, because high availability has multiple levels and definitions:
Airflow availability: multiple schedulers, multiple workers, autoscaling to avoid pressure, high storage volume, ...
The databases: an HA cluster for RabbitMQ and an HA cluster for Postgres
Even if you have the first two levels, how many nodes do you want to use? You cannot put everything on the same node; you need to run one replica of each service per node.
Suppose you did that, and you now have 3 different nodes running in the same data center: what if there is a fire in the data center? So you need to use multiple nodes in different regions.
After doing all of the above, is there still a risk of network problems? Of course there is.
If you just want to run Airflow in HA mode, you have multiple options to do that on any OS:
docker compose: usually used for development, but you can use it in production too; you can create multiple scheduler instances with multiple workers, which helps improve the availability of your service (see the sketch after this list)
docker swarm: similar to docker compose with additional features (scaling, multiple nodes, ...); you will not find many resources on installing it, but you can reuse the compose files with just a few changes
kubernetes: the best solution; K8s can help you ensure the availability of your services, and it is easy to install with Helm
or just running the different services on your host: not recommended, because of the manual work involved, and applying HA is complicated
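To make the docker compose option concrete, here is a minimal sketch of an Airflow 2.3.1 CeleryExecutor stack with two schedulers and two workers. Service names, credentials, and the omitted init/volume setup are assumptions to adapt to your environment:

# docker-compose sketch: Airflow 2.3.1 with CeleryExecutor, RabbitMQ broker
# and Postgres metadata DB; database migration, user creation, DAG volumes
# and secrets handling are intentionally left out.
version: "3.8"

x-airflow-common: &airflow-common
  image: apache/airflow:2.3.1
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: amqp://airflow:airflow@rabbitmq:5672//
  depends_on:
    - postgres
    - rabbitmq

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow

  rabbitmq:
    image: rabbitmq:3-management
    environment:
      RABBITMQ_DEFAULT_USER: airflow
      RABBITMQ_DEFAULT_PASS: airflow

  webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - "8080:8080"

  # Airflow 2.x can run several schedulers against the same metadata DB,
  # so losing one scheduler container does not stop scheduling.
  scheduler-1:
    <<: *airflow-common
    command: scheduler
  scheduler-2:
    <<: *airflow-common
    command: scheduler

  worker-1:
    <<: *airflow-common
    command: celery worker
  worker-2:
    <<: *airflow-common
    command: celery worker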
I ran into this issue the other day and I'm not sure if this is the correct cause. Essentially, I am spinning up 10 KubernetesPodOperators in parallel in Airflow. When I request the 10 pods, the nodes autoscale to meet their resource requirements. However, once, say, 8/10 pods have completed their task, the autoscaler scales the nodes down, which seems to crash my 2 remaining running pods (I assume they are being placed onto a new node). When I turn autoscaling off in Kubernetes and predefine the correct number of nodes, my 10 pods run fine. Does this logic make sense? If so, has anyone faced a similar issue, and is there any way around it? We are running Airflow on an Azure AKS instance.
Thanks,
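If the remaining task pods really are being evicted when the autoscaler removes a node, one commonly used mitigation (sketched here as an assumption about your setup; with KubernetesPodOperator the annotation can typically be passed through its annotations argument) is to mark the task pods as not safe to evict:

# Sketch: cluster-autoscaler annotation that prevents scale-down of the
# node while this pod is still running; pod name and image are made up.
apiVersion: v1
kind: Pod
metadata:
  name: example-task-pod
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  restartPolicy: Never
  containers:
    - name: task
      image: busybox:1.36
      command: ["sh", "-c", "echo running task && sleep 300"]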
I have a requirement to deploy an HTTP application in K8s with zero downtime. I also have a restriction of using a single pod (replicas=1). But the problem is that when I do that, some HTTP requests get 502 Bad Gateway when I make changes to the K8s pod.
I referred to the following two issues [1] [2], but those solutions only work when I have more than a single replica. For a single replica, NGINX Ingress still has a slight downtime of less than 1 millisecond.
The lifecycle spec and rolling update spec of my deployment are set as below, according to the answers given in the above issues [1] [2].
spec:
  strategy:
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
    type: RollingUpdate
  ...
  spec:
    ....
    lifecycle:
      preStop:
        exec:
          command:
            - sleep
            - "30"
Note that I have ConfigMaps mounted into this deployment. I'm not sure whether that affects this downtime or not.
Also, I referred to these two blog posts [3] [4], but they did not solve my problem either. However, blog [4] shows that K8s can achieve zero downtime even with a single replica. Unfortunately, in [4] the author did not use an ingress-nginx controller.
In brief, I want to know: is it possible to achieve zero downtime with ingress-nginx and a single pod replica?
References
[1] https://github.com/kubernetes/ingress-nginx/issues/489
[2] https://github.com/kubernetes/ingress-nginx/issues/322
[3] https://blog.sebastian-daschner.com/entries/zero-downtime-updates-kubernetes
[4] http://rahmonov.me/posts/zero-downtime-deployment-with-kubernetes/
I suppose that your single-pod restriction applies at runtime and not during the upgrade; otherwise, you can't achieve your goal.
In my opinion your rolling upgrade strategy is good; you can add a PodDisruptionBudget to manage disruptions and make sure that at least 1 pod is available.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: sample-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      <your_app_label>
Another very important thing is the probes. According to the documentation:
The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
You should set the liveness probe, but above all the readiness probe, to return a success response only when your new pod is really ready to accept connections; otherwise k8s thinks that the new pod is up and the old pod will be destroyed before the new one can accept connections.
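As a sketch only (path, port and timings are assumptions about your application), the probes could look like this on the application container:

# Probes on the application container; adjust path, port and timings.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready          # must return 200 only once the app accepts traffic
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 2
  failureThreshold: 2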
Nginx as a reverse proxy can handle zero downtime if the IP address of the backend doesn't change, but in your case I think the requirement of only 1 replica, together with the mounted volumes, makes the switchover a bit slower. Zero downtime is not possible because if you mount the same volume on the new pod, it has to wait for the old pod to be destroyed and release the volume before it can start up.
In the blog post you referenced, which explains how to achieve this, the example doesn't use volumes and uses a very small image, which makes the image pull and start-up very fast.
I recommend you study your volume needs and try not to let them block the start-up process.
Using nginx-ingress-controller:0.9.0, below is the permanent state of the Google Cloud load balancer:
Basically, the single healthy node is the one running the nginx-ingress-controller pods. Apart from not looking good on this screen, everything works super fine. The thing is, I'm wondering why such a bad notice appears on the LB.
Here's the service/deployment used
I'm just getting a little lost over how things work; I hope to get some experienced feedback on how to do things right (I mean, getting green lights on all nodes), or to double-check whether that's a drawback of not using the 'official' GCloud L7 load balancer.
Your Service is using the service.beta.kubernetes.io/external-traffic: OnlyLocal annotation. This configures it so that traffic arriving at the NodePort for that Service will never go to a Pod on another node. Since your Deployment only has 1 replica, the only node that will receive traffic is the one where that Pod is running.
If you scale your Deployment to 2 replicas, 2 nodes will be healthy, etc.
Using that annotation is a recommended configuration so that you are not introducing additional network hops.
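For reference, a sketch of what such a Service could look like (names and ports are assumptions taken from a typical ingress-controller setup, not from your manifest):

# LoadBalancer Service for the ingress controller; only the node(s) that
# actually run a matching Pod pass the load balancer's health check.
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-controller
  annotations:
    # on newer clusters this is expressed as spec.externalTrafficPolicy: Local
    service.beta.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: LoadBalancer
  selector:
    app: nginx-ingress-controller
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443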
I have 3 nodes which I am using for a multi-node setup. I am thinking of following the structure below:
Controller: keystone, horizon, g-reg, g-api, n-api, n-crt, n-sch, n-cond, n-cauth, n-obj, n-novnc, n-xvnc, c-api, c-sch (this node will have mysql and rabbitmq as well)
Network: q-svc, q-agt, q-dhcp, q-l3, q-meta, quantum
Compute: n-cpu, c-vol
I have a few questions. 1. On the compute node, do I need to keep n-api? What else is needed apart from n-api and c-vol? Is q-agt needed on the compute node? 2. Will I need c-api along with c-vol? Does the compute node need RabbitMQ installed?
Q1)
You generally don't want nova-api on the compute nodes. It's better kept on the controller.
nova-api uses hard-coded system credentials from its paste config file, and you don't want that file exposed on any node that a user could compromise via a hypervisor escape.
nova-compute and nova-volume are probably all you need on the compute node. They communicate with the scheduler over RabbitMQ, so make sure that's working =P
Q2)
You don't NEED Cinder to run an OpenStack cloud, though I see no reason not to include it.
I don't know what impact disabling Cinder has on the DevStack stack.sh script; I've never done it.
As for RabbitMQ, see the answer above.