2017/06/29 18:37:56 [crit] 2470#2470: ngx_slab_alloc() failed: no memory in upstream zone "backends"
2017/06/29 18:37:56 [error] 2470#2470: cannot add new server to upstream "<redacted>", memory exhausted
I am receiving a stream of critical errors in my log indicating that servers cannot be added to various upstream zones because memory is exhausted.
That said, I have plenty of free memory. I'm guessing I need to increase some setting somewhere, but for the life of me, The Google doesn't seem to be able to tell me what I need to increase.
We use nginx as a reverse proxy for service discovery with our AWS ECS Docker container cluster.
Turns out the problem was that I had the zone "backends" set too small. I increased that from 64k to 256k and all is fine now. It also turns out that I should probably be using a different zone for each upstream rather than a shared one for all my upstreams.
This answer comes compliments of the nginx professional support team. We're using a licensed version of nginx. Awesome support!
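For anyone hitting the same error, the change looks roughly like this. A minimal sketch, not the poster's actual config: the server addresses are placeholders, and the zone directive is from ngx_http_upstream_module.

    upstream backends {
        zone backends 256k;        # was 64k -- too small for the servers being added
        server 10.0.0.10:8080;
        server 10.0.0.11:8080;
    }

    # Per the support team's advice, give each upstream its own zone
    # rather than sharing one:
    upstream api_backends {
        zone api_backends 256k;
        server 10.0.0.20:8080;
    }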
Related
I have a fully working config doing load balancing over 2 upstream servers.
I want to use "least_conn".
When I put least_conn in, it is still doing round robin.
I can confirm that other configs like "weight" and "ip_hash" are working as expected.
Is there some other configuration/setting that also affects whether least_conn is honored?
This is using nginx 1.18.0
For future reference: it was indeed working. It turns out that nginx buffers quite a large amount of response data (roughly 1GB by default, between a temp file and some in-memory buffers), and the files being transferred were a little smaller than that. Each upstream connection was therefore released almost immediately, so the active connection counts stayed equal and least_conn's choices looked exactly like round robin.
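To actually see least_conn diverge from round robin with large transfers, the buffering can be reduced. A sketch under those assumptions; the upstream addresses are placeholders, and proxy_max_temp_file_size really does default to 1024m:

    upstream backend {
        least_conn;
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
    }

    server {
        location / {
            proxy_pass http://backend;
            # With default buffering, nginx soaks up ~1GB per response
            # (proxy_max_temp_file_size 1024m plus in-memory proxy_buffers),
            # so upstream connections close quickly and counts stay equal.
            # Disabling buffering keeps connections open for the full transfer:
            proxy_buffering off;
        }
    }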
I have a simple location block in my nginx conf that echoes back the server's IP. It's deployed on a 2-core, 4 GB RAM EC2 instance, and I am able to get 400 requests per second when load testing it.
I've made optimizations like log buffering and opening more file descriptors, following the guidelines in http://www.freshblurbs.com/blog/2015/11/28/high-load-nginx-config.html.
The peak CPU load on the node is 4-5%, and the same goes for memory. I am wondering how I can push it even further. Will using Docker help, or are CPU load and memory irrelevant here because it might be running into network congestion? Will increasing the EC2 node size help?
OS: CentOS. Any help appreciated. Thanks!
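For reference, the kind of tuning mentioned above (log buffering, higher file-descriptor limits) looks roughly like this. All values are illustrative, not the poster's actual config:

    worker_processes auto;
    worker_rlimit_nofile 65535;      # raise the per-worker FD limit

    events {
        worker_connections 16384;
        multi_accept on;
    }

    http {
        # Buffer access-log writes instead of hitting disk on every request:
        access_log /var/log/nginx/access.log combined buffer=64k flush=5s;
        keepalive_requests 1000;
    }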
One of our production stacks using NancyFX (self-hosted) on Mono 4.2.3 on Ubuntu 14.04 (on AWS EC2 behind an ELB) serves RESTful HTTP calls from external clients. Processing those requests, on many occasions, also involves making HTTP calls to other services, database lookups and such.
At a certain volume of incoming requests on each machine (one that seems to us the server should be able to handle gracefully), the servers start to stall, new incoming connections run into timeouts, etc.
Using netstat, we saw that there are thousands of sockets in TIME_WAIT state. We presume those are either a cause or at least a symptom of the problems we're having.
Does somebody with a similar setup have an idea how we could identify and fix the root cause of those problems?
We've already experimented with various methods of making HTTP requests in C# and with Mono startup parameters (--server, and setting the number of threads per CPU to sensible amounts), but could not stop the servers from stalling and degrading catastrophically.
Thanks for any input!
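One classic source of thousands of TIME_WAIT sockets in C# code that makes outbound HTTP calls is creating and disposing a new HttpClient per request: each disposal closes the socket and leaves it in TIME_WAIT. A minimal sketch of the usual mitigation, assuming the request handlers currently create clients per call (the class and names here are illustrative, not from the poster's code):

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    // A single long-lived HttpClient for outbound calls. A shared
    // instance reuses pooled connections instead of opening and
    // closing a socket per request.
    public static class Downstream
    {
        private static readonly HttpClient Client = new HttpClient
        {
            Timeout = TimeSpan.FromSeconds(10)
        };

        public static Task<string> GetAsync(string url)
        {
            return Client.GetStringAsync(url);
        }
    }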
I'm new to Ubuntu and Storm, and I need to solve this problem:
[ERROR] Async loop died!
org.zeromq.ZMQException: Address already in use(0x62)
at org.zeromq.ZMQ$Socket.bind(Native Method)
It appeared in the worker log file because the supervisor still hadn't started. By searching, I found someone who wrote that this happens when the ephemeral port range is messed up on the machines.
I tried increasing /proc/sys/net/ipv4/ip_local_port_range to 1024 65000, but it is not working.
This issue is caused by a port that is already in use on your system; a port can only be bound by a single application. Using lsof -i you can check which applications are using which ports, and you should spot the conflicting port number. Either terminate that application or change the configuration of ZooKeeper or Storm to use a different port.
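For example, a quick way to find the conflict (the port number below is illustrative; Storm's worker slots default to 6700-6703):

    # List listening sockets and the processes that own them (may need root):
    sudo lsof -i -P -n | grep LISTEN

    # Or inspect one port directly, e.g. 6700, one of Storm's default worker slots:
    sudo lsof -i :6700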
I have a machine with only nginx installed (no Passenger) that acts as a load balancer, with the IPs of several machines in its upstream list. All the app machines run nginx with Phusion Passenger and serve the main application. Some of the application machines are medium instances while others are large. As far as I know, the default nginx load-balancing scheme is round robin, so the load is distributed equally among the large and medium machines: when traffic is heavy the medium machines get overloaded, and when it is light the large machines' resources are wasted. I use New Relic to monitor CPU and memory on these machines, and a script to pull the data from New Relic, so is there any way to use this data to decide how the load balancer routes traffic?
One way I know of is to monitor the machines, mark them good or bad, replace the upstream list with only the good ones, and reload nginx.conf each time without a complete restart. So my second question: is that approach sound? In other words, does it have any drawbacks or will it cause any issues?
Third, and more generally: is there a better way to tackle this load-balancing problem?
You can use another load-balancing algorithm that distributes load more fairly: http://nginx.org/r/least_conn, and/or configure weights.
Making decisions based on current CPU/memory usage isn't a good idea if your goal is faster request processing rather than nicer-looking but meaningless numbers.
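For instance, a minimal upstream sketch combining both suggestions; the addresses and weights are placeholders, with large instances given twice the weight of medium ones:

    upstream app_servers {
        least_conn;                      # prefer the server with the fewest active connections
        server 10.0.1.10:80 weight=2;    # large instance
        server 10.0.1.11:80 weight=2;    # large instance
        server 10.0.2.10:80 weight=1;    # medium instance
    }

    server {
        listen 80;
        location / {
            proxy_pass http://app_servers;
        }
    }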