I am running nginx on a Kubernetes cluster with 3 nodes.
I am wondering whether there is any benefit to having, for example, 4 pods with CPU/memory limited to roughly 1/4 of a node's capacity, versus running 1 pod per node with limits that let each pod use the whole node's resources (for the sake of simplicity, we leave Kubernetes services out of the equation).
My feeling is that fewer pods mean less overhead, so going for 1 pod per node should be best for performance. Is that right?
Thanks in advance
With more than 1 Pod, you get a degree of high availability. Your Pod will die at some point, and if it is behind a controller (which it should be), it will be re-created, but with a single replica you will have a small window of downtime.
Now, take into consideration that if you deploy more than one replica of your app, even though you give each one 1/n of the resources, the base image and dependencies are replicated along with it.
As an example, let's imagine an app that runs on Ubuntu, and has 5 dependencies:
If you run 1 replica of this app, you are deploying 1 Ubuntu + 5 dependencies + the app itself.
If you run 4 replicas of this app, you are running 4 Ubuntus + 4*5 dependencies + 4 times the app.
My point is, if your base image is big and your dependencies are heavy, the increase in resources is not linear with the number of replicas.
Performance-wise, I don't think there is much difference. One of your nodes may be hit hard if all your requests end up there, but if your nodes can handle it, there should be no problem.
What you are referring to is the difference between horizontal and vertical scaling. With vertical scaling, you increase the resources of your application as you see fit. With horizontal scaling, you increase the number of replicas of your application instead.
In the case of nginx, scaling horizontally splits traffic per pod and also per node, which should result in better throughput for what is most likely a reverse proxy.
Related
I ran into this issue the other day and I'm not sure if this is the correct cause. Essentially I am spinning up 10 KubernetesPodOperators in parallel in Airflow. When I request the 10 pods, the nodes autoscale to meet their resource requirements. However, once, let's say, 8 of the 10 pods have completed their task, the autoscaler scales the nodes down, which seems to crash my 2 remaining running pods (I assume they are being rescheduled onto a new node). When I set autoscaling to "off" in Kubernetes and predefine the correct number of nodes, my 10 pods run fine. Does this logic make sense? Has anyone faced a similar issue, and if so, is there a way around it? We are running Airflow on an Azure AKS instance.
Thanks,
Is it possible to limit replication on OpenStack Swift to one or possibly 2? 3 data replicas are currently being created.
The default replication in Swift is 3 replicas, but you can change that to any number greater than or equal to 1. The swift-ring-builder command has a set-replicas verb for adjusting the number of replicas; see
https://docs.openstack.org/swift/xena/admin/objectstorage-ringbuilder.html#replica-counts
Of course, there is a tradeoff between the number of replicas and your cluster's ability to cope with loss of disk drives and servers.
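For reference, a minimal sketch of how the replica count is usually lowered with swift-ring-builder; the builder file name and the target count of 2 are assumptions, and the rebalanced *.ring.gz files still need to be pushed to every node:

    # tell the object ring to keep 2 replicas instead of 3
    swift-ring-builder object.builder set_replicas 2
    swift-ring-builder object.builder rebalance
    # repeat for the container and account rings if desired,
    # then distribute the resulting *.ring.gz files to all nodes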
You also asked this:
If I don't want replication on OpenStack Swift (a replica count of 1), can I achieve this by disabling the replicators with sudo swift-init object-replicator stop? If I may ask, is this a better option?
I guess it would probably work ... until someone or something restarts the replicators.
But it is not a good idea:
Monitoring may complain that the replicators are not working(!)
Other swift tools may complain that the cluster is out of spec; e.g. swift-recon --replication.
If you have been previously running with (say) a 3 replica policy, then nothing will delete the (now) redundant replicas created before you turned off the replicators.
I have 2 NFS mounts of 100 TB each, i.e. 200 TB in total, and I have mounted both on a Kubernetes container. My file server is a typical log server that holds a mix of data types like JSON, HTML, images, logs and text files, and the file sizes also vary a lot. I am largely guessing at what the ideal resource request for this Kubernetes container should be. My assumptions:
As this is mostly file reads, i.e. I/O-intensive operations, CPU should be high
Since large files may be transferred over, memory should also be high
I just wanted to check whether my assumptions are right.
Posting this community wiki answer to set a baseline and to show one possible set of actions that should lead to a solution.
Feel free to edit and expand.
As I stated previously, this setup heavily depends on the specific case, and giving approximate numbers could be misleading. In my opinion, the best course of action would be:
Install monitoring tools
Deploy the application for testing
Simulate the load
Install monitoring tools
There are a lot of monitoring tools that can retrieve data about the CPU and memory usage of your Pods. You will need to choose the one that best suits your workloads and infrastructure; a quick baseline check is also sketched after the list below.
Some of them are:
Prometheus.io
Elastic.co
Datadoghq.com
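Whichever tool you pick, a quick baseline can also be read straight from the metrics pipeline, assuming metrics-server is installed in the cluster (the namespace is a placeholder):

    kubectl top pod -n <namespace>    # current CPU / memory per Pod
    kubectl top node                  # current CPU / memory per node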
Deploy the application for testing
This can also be quite a wide topic, considering that the exact requirements and infrastructure are not known. One of many questions is whether the Deployment should have a fixed replica count or use some kind of Horizontal Pod Autoscaling (based on CPU and/or memory). The access modes on the storage shouldn't matter, as NFS supports RWX.
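If you do go for Horizontal Pod Autoscaling, a minimal sketch could be the one-liner below; the Deployment name and the thresholds are assumptions, and memory-based scaling would need an autoscaling/v2 HorizontalPodAutoscaler manifest rather than this CPU-only shortcut:

    # scale the (hypothetical) file-server Deployment between 2 and 6 replicas based on CPU usage
    kubectl autoscale deployment file-server --cpu-percent=70 --min=2 --max=6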
A basic Deployment to start from can be found in the official Kubernetes documentation (a minimal sketch follows these links):
Kubernetes.io: Docs: Concepts: Workloads: Controllers: Deployment: Creating a deployment
Kubernetes.io: Docs: Concepts: Storage: Volumes: NFS
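Putting those two documents together, a minimal sketch of a Deployment mounting the NFS share could look like the following; the server address, export path, image and resource figures are all placeholders to be refined after monitoring:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: file-server
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: file-server
      template:
        metadata:
          labels:
            app: file-server
        spec:
          containers:
          - name: nginx
            image: nginx:1.25
            volumeMounts:
            - name: logs
              mountPath: /usr/share/nginx/html   # serve the NFS content
            resources:
              requests:                          # starting point only
                cpu: 500m
                memory: 512Mi
          volumes:
          - name: logs
            nfs:                                 # RWX-capable, as noted above
              server: nfs.example.com
              path: /exports/logs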
Simulate the load
The simulation part can be done either with real-life usage or with a tool that simulates the load. Here too you need to choose the option/tool that best suits your requirements. This step will show you the approximate resources that should be allocated to your nginx file server.
A side note!
In my testing I've used ab to check whether the load was divided equally across the X replicas.
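For example, a run along these lines (the URL, request count and concurrency are placeholders) hammers the Service while you watch the per-Pod CPU and memory in the monitoring tool:

    # 100k requests, 100 at a time, against the exposed file server
    ab -n 100000 -c 100 http://<service-ip>/some/file.log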
Additional resources
I do recommend checking the official guide in the Kubernetes documentation on managing resources:
Kubernetes.io: Docs: Concepts: Configuration: Manage resources containers
I also think that the VPA could help you in the whole process as:
Vertical Pod Autoscaler (VPA) frees the users from necessity of setting up-to-date resource limits and requests for the containers in their pods. When configured, it will set the requests automatically based on usage and thus allow proper scheduling onto nodes so that appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in initial containers configuration.
It can both down-scale pods that are over-requesting resources, and also up-scale pods that are under-requesting resources based on their usage over time.
-- Github.com: Kubernetes: Autoscaler: Vertical Pod Autoscaler
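A minimal VPA object, sketched here in recommendation-only mode so it reports suggested requests without evicting Pods; the target Deployment name is an assumption:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: file-server-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: file-server
      updatePolicy:
        updateMode: "Off"      # only recommend; switch to "Auto" to let VPA apply changes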
I reckon you could also look at this answer:
Stackoverflow.com: Answers: PromQL query to find CPU and memory used for the last week
A Flask API (served with gunicorn) is used as the inference API of a deep learning model.
This specific inference process is very CPU-intensive (not using a GPU yet).
What is the best practice for deploying it to a Kubernetes cluster, based on these aspects:
should I create multiple pods each handling requests with a single gunicorn worker, or fewer pods each running multiple gunicorn workers? (node memory footprint)
since Google lets you expose your deployment as a service using an external load balancer,
do I need an nginx web server in my flask-gunicorn stack?
is creating multiple identical pods on the same node more memory-intensive than handling all these requests with multithreading in a single pod?
More smaller pods is generally better, provided you're staying under "thousands". It is easier for the cluster to place a pod that requires 1 CPU and 1 GB of RAM 16 times than it is to place a single pod that requires 16 CPU and 16 GB RAM once. You usually want multiple replicas for redundancy, to tolerate node failure, and for zero-downtime upgrades in any case.
If the Istio Ingress system works for you, you may not need a separate URL-routing layer (nginx) inside your cluster. If you're okay with having direct access to your Gunicorn servers with no routing or filtering in front of them, pointing a LoadBalancer Service directly at them is a valid choice.
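If you take the direct route, a sketch of such a Service could be the following; the labels and ports are assumptions (8000 is gunicorn's usual default bind port):

    apiVersion: v1
    kind: Service
    metadata:
      name: inference
    spec:
      type: LoadBalancer       # cloud provider provisions the external load balancer
      selector:
        app: flask-inference   # must match the Pod labels of your Deployment
      ports:
      - port: 80
        targetPort: 8000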
Running 16 copies of 1 application will generally need more memory than 1 copy with 16 threads; how much more depends on the application.
In particular, if you load your model into memory and the model itself is large, but your multi-threaded setup can share a single copy of it, 1 large pod could use significantly less memory than 16 small pods. If the model is COPYed directly into the Docker image and the application code mmap()s it then you'd probably get to share memory at the kernel layer.
If the model itself is small and most of the memory is used in the processing, it will still use "more" memory to have multiple pods, but it would just be the cost of your runtime system and HTTP service; it shouldn't substantially change the memory required per thread/task/pod if that isn't otherwise shared.
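To make the shared-model variant concrete, a hedged gunicorn invocation (module name and counts are assumptions) could be:

    # load the app (and model) once in the master, then fork workers that share
    # those pages copy-on-write; threads within a worker share them directly
    gunicorn --preload --workers 2 --threads 8 app:app

Whether the copy-on-write sharing holds up in practice depends on how the model objects are touched at runtime, so it is worth measuring rather than assuming.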
I have been researching Kubernetes and saw that it does load balancing on the same node. If I'm not wrong, one node means one server machine, so what good is load balancing on the same server machine, since it will use the same CPU and RAM to handle the requests? At first I thought load balancing would be done on separate machines to share CPU and RAM resources. So I want to know the point of doing load balancing on the same server.
Just because you can do it on one node doesn't mean that you should, especially in a production environment.
a production cluster will have at least 3 or 5 nodes
Kubernetes will spread the replicas across the cluster nodes to balance the node workload, so pods end up on different nodes
you can also configure on which nodes your pods land
use advanced scheduling, such as pod affinity and anti-affinity (see the sketch after this list)
you can also plug in your own scheduler that will not allow placing replica pods of the same app on the same node
then you define a Service to load-balance across the pods on the different nodes
kube-proxy will do the rest
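A sketch of such an anti-affinity rule, placed under the Pod template's spec in a Deployment (the label values are assumptions):

    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: my-app                        # the app's own Pod label
          topologyKey: kubernetes.io/hostname    # never put two such Pods on the same node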
here is a useful read:
https://itnext.io/keep-you-kubernetes-cluster-balanced-the-secret-to-high-availability-17edf60d9cb7
So you generally need to choose a level of availability you are
comfortable with. For example, if you are running three nodes in three
separate availability zones, you may choose to be resilient to a
single node failure. Losing two nodes might bring your application
down but the odds of losing two data centres in separate availability
zones are low.
The bottom line is that there is no universal approach; only you can
know what works for your business and the level of risk you deem
acceptable.
I guess you mean how Services do automatic load-balancing. Imagine you have a Deployment with 2 replicas on your one node and a Service. Traffic to the Pods goes through the Service, so if it were not load-balancing, everything would go to just one Pod and the other Pod would get nothing. By spreading the load evenly you can handle more traffic and still be confident that requests will be served if one Pod dies.
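For instance, a plain ClusterIP Service like the sketch below (names and ports are assumptions) already spreads requests across both replicas, even when they run on the same node:

    apiVersion: v1
    kind: Service
    metadata:
      name: web
    spec:
      selector:
        app: web         # matches both replicas of the Deployment
      ports:
      - port: 80
        targetPort: 80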
You can also load-balance traffic coming into the cluster from outside so that the entrypoint to the cluster isn't always the same node. But that is a different level of load-balancing. Even with one node you may still want load-balancing for the Services within the cluster. See Clarify Ingress load balancer on load-balancing of the external entrypoint.