Rate limiting in the built-in HTTP server of Rserve?

I'm looking into the built-in HTTP server of Rserve (1.8.5) after modifying .http.request() from FastRWeb. It works fine with the updated request function, but whenever the number of concurrent requests is high, some or most of them fail with the following error.
WARNING: fork() failed in fork_http(): Cannot allocate memory
WARNING: fork() failed in Rserve_prepare_child(): Cannot allocate memory
This happens because there isn't enough free memory left, so the number of requests has to be limited in one way or another.
I tried a couple of client layers: (1) Python's requests + hug libraries and (2) Python's pyRserve + hug libraries, with the number of worker processes set according to the number of CPUs. I also tried a reverse proxy with Nginx in both (3) a single-container and (4) a multi-container setup.
In all cases, I observe some overhead (~300-450 ms) compared to running only Rserve with the built-in HTTP server.
I guess using it as is would be the most efficient option, but I'm concerned that it just keeps trying to fork and returns an error. (Besides, because errors are thrown quickly, it wouldn't be easy to auto-scale on typical metrics such as CPU utilization or mean response time.)
Does anyone know of a way to enforce rate limiting, with or without relying on another tool, that doesn't sacrifice performance?
My Rserve config is roughly as follows.
http.port 8000
socket /var/rserve/socket
sockmod 0666
control disable
Also here is a simplified nginx.conf.
worker_processes auto;
events {
    worker_connections 1024;
}
http {
    upstream backend {
        server 127.0.0.1:8000;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}

I was misled by Locust (a load testing tool): it showed cached output for the setup with only Rserve and its built-in HTTP server.
Manual investigation shows that Rserve + Nginx actually returns a slightly better result.
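
To make the idea of capping concurrent requests concrete, here is a minimal Python sketch of an admission gate that could sit in a client layer such as the hug service mentioned above. It is only an illustration under assumptions: ConcurrencyGate, max_in_flight, and fake_rserve_call are hypothetical names, not part of Rserve, hug, or Nginx, and the real call to Rserve is replaced by a stand-in function.

```python
import threading


class ConcurrencyGate:
    """Admit at most `max_in_flight` requests at a time; reject the rest.

    Hypothetical helper for illustration; not part of Rserve, hug, or nginx.
    """

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, forward, *args):
        # Non-blocking acquire: fail fast instead of letting the backend
        # fork one more child when memory is already exhausted.
        if not self._slots.acquire(blocking=False):
            return 503, "busy, retry later"
        try:
            return 200, forward(*args)
        finally:
            self._slots.release()


def fake_rserve_call(x):
    # Stand-in for the real HTTP call to Rserve on port 8000.
    return x * 2


gate = ConcurrencyGate(max_in_flight=2)
print(gate.handle(fake_rserve_call, 21))  # (200, 42)
```

Rejecting with a 503 once all slots are taken keeps the number of forked R processes bounded, at the cost of pushing the retry decision to the caller.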

Related

How can I make nginx handle fastcgi requests concurrently?

Using a minimal fastcgi/nginx configuration on Ubuntu 18.04, it looks like nginx only handles one fastcgi request at a time.
# nginx configuration
location ~ \.cgi$ {
    # Fastcgi socket
    fastcgi_pass unix:/var/run/fcgiwrap.socket;
    # Fastcgi parameters, include the standard ones
    include /etc/nginx/fastcgi_params;
}
I demonstrate this by using a cgi script like this:
#!/bin/bash
echo "Content-Type: text/plain";
echo;
echo;
sleep 5;
echo Hello world
Use curl to access the script from two side-by-side command prompts, and you will see that the server handles the requests sequentially.
How can I ensure nginx handles fastcgi requests in parallel?

Nginx is a non-blocking server, even when working with fcgiwrap as a backend. So the number of nginx processes should not be the cause of the problem. The real solution is to increase the number of fcgiwrap processes using the -c option.
If fcgiwrap is launched with -c2, even with one Nginx worker process, you can run 2 cgi scripts in parallel.
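
The effect of the worker count can be illustrated with a small Python analogy. This is only a sketch: it uses a thread pool rather than fcgiwrap's processes, and slow_cgi_script and serve are hypothetical names standing in for the CGI script and the FastCGI backend.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_cgi_script(name):
    # Stand-in for the CGI script in the question (with a shortened sleep).
    time.sleep(0.2)
    return f"Hello from {name}"


def serve(requests, workers):
    """Run all requests on a pool of `workers`; return (results, seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(slow_cgi_script, requests))
    return results, time.monotonic() - start


_, one_worker = serve(["a", "b"], workers=1)   # requests run back to back
_, two_workers = serve(["a", "b"], workers=2)  # requests overlap
print(f"1 worker: {one_worker:.2f}s, 2 workers: {two_workers:.2f}s")
```

With one worker the two requests take roughly twice as long as with two, which mirrors the sequential behaviour seen with the two curl prompts.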

To have Nginx handle fastcgi requests in parallel, you'll need several things:
Nginx >= 1.7.11 for thread pools, and this configuration:
worker_processes N; # N is an integer or auto
where N is the number of worker processes. With auto, the number of processes equals the number of cores; if you have a lot of IO, you might want to go beyond this number (having as many processes/threads as cores is no guarantee that the CPU will be saturated).
In terms of NGINX, the thread pool is performing the functions of the delivery service. It consists of a task queue and a number of threads that handle the queue. When a worker process needs to do a potentially long operation, instead of processing the operation by itself it puts a task in the pool’s queue, from which it can be taken and processed by any free thread.
Consequently, you want to choose N bigger than the maximum number of parallel requests. Hence you can pick, say, 1000 even if you have only 4 cores; for IO, threads will only take some memory, not much CPU.
When you have many IO requests with large latencies, you'll also need aio threads in the 'http', 'server', or 'location' context, which is short for:
# in the 'main' context
thread_pool default threads=32 max_queue=65536;
# in the 'http', 'server', or 'location' context
aio threads=default;
You might see that switching from Linux to FreeBSD can be an alternative when dealing with slow IO. See the reference blog for deeper understanding.
Thread Pools in NGINX Boost Performance 9x! (www.nginx.com/blog)

Reverse Proxy Locking feature

I want to call a Microsoft API that is not protected against concurrent access (pull the data with a GET, then push it back with a POST).
To prevent any weird behavior, I want to use a lock when I'm accessing this API.
The easiest way I've found (without messing up the code) is to create a middleware service that is targeted instead of the original one.
When a request arrives, it would store a lock in Redis and forward the request to Microsoft.
When the request is done, the lock is released.
If another request reaches the server in the meantime, it is denied, and I can perform an exponential backoff until the lock is free.
My question is: do I have to code this myself, or is this a feature that can be found in an existing reverse proxy?
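
The lock-then-back-off flow described above could look roughly like the following Python sketch. It makes several assumptions: an in-process threading.Lock stands in for the lock stored in Redis, and call_with_lock, get_then_post, max_attempts, and base_delay are hypothetical names chosen for illustration.

```python
import random
import threading
import time

api_lock = threading.Lock()  # stand-in for a lock stored in Redis


def call_with_lock(action, max_attempts=5, base_delay=0.05):
    """Run `action` while holding the lock, backing off exponentially if busy."""
    for attempt in range(max_attempts):
        if api_lock.acquire(blocking=False):
            try:
                return action()
            finally:
                api_lock.release()
        # Lock is held elsewhere: wait base_delay * 2^attempt plus some jitter.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise TimeoutError("API still locked after all attempts")


def get_then_post():
    # Hypothetical placeholder for the GET-modify-POST sequence.
    return "updated"


print(call_with_lock(get_then_post))  # updated
```

The jitter is there so that several clients that were denied at the same moment do not all retry at the same moment as well.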

To do this in a highly available way, I believe you would need to do what you wrote and use some sort of distributed lock to determine if the API was in use.
However, if high availability is not required, you could use a single HAProxy instance with maxconn set to 1 for that server. You would also want to set timeout queue to something short so that you could handle the 503 response and do the exponential backoff you mention.
backend microsoft_api_backend
    timeout queue 2s
    server microsoft_api 1.1.1.1:80 check maxconn 1
In Nginx, you could do something equivalent:
upstream microsoft_api {
    server 1.1.1.1:80 max_conns=1;
    # the queue directive is only available in the commercial NGINX Plus
    queue 1 timeout=2s;
}
server {
    location / {
        proxy_pass http://microsoft_api;
    }
}

Nginx/uwsgi request buffer or queue

Our web servers run a Python app behind nginx + uwsgi.
Sometimes we have short spikes (2-5x the average number of requests) lasting about a second, which result in some requests getting a 502 if there are no workers available to handle them.
Is there a way for nginx or uwsgi to queue these requests up and serve them when workers become available?
A short increase in response time is better than getting an error ;-)

Add delay between retries when origin is down (Nginx, 502)

Pretty much the title. I have a node app behind nginx, and when I restart the app I would like nginx to delay the response and retry the request a couple of times with some delay in between. Everything I found only retries N times instantly, which is obviously not useful when the app is down for a restart, which is my use case. Is there some way? I don't even care how hacky it is; I just need a solution that does not involve starting a second instance of the app and killing the first one once the second has started.
Thanks!

You can add the same server to the upstream multiple times and configure the proxy_next_upstream, proxy_next_upstream_timeout and proxy_next_upstream_tries options as well.
upstream node_servers {
    server 127.0.0.1:12005;
    server 127.0.0.1:12005;
}
...
proxy_next_upstream http_502;
proxy_next_upstream_timeout 60;
proxy_next_upstream_tries 3;
However, I would recommend using a process manager like pm2, which supports graceful reload/restart. This applies if you are running your nodejs server on more than one CPU in clustered mode.
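
For comparison, the "retry with a delay in between" behaviour the question asks for can be sketched on the client side in a few lines of Python. This is not an nginx feature; retry_with_delay and its parameters are hypothetical names, and the origin is simulated with a fixed list of responses.

```python
import time


def retry_with_delay(request, tries=3, delay=0.1, retry_on=(502,)):
    """Call `request()` up to `tries` times, sleeping `delay` seconds between attempts."""
    status, body = request()
    for _ in range(tries - 1):
        if status not in retry_on:
            break
        time.sleep(delay)
        status, body = request()
    return status, body


# Simulated origin that is down for the first two attempts.
responses = iter([(502, "bad gateway"), (502, "bad gateway"), (200, "ok")])
print(retry_with_delay(lambda: next(responses)))  # (200, 'ok')
```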

Nginx: Call all upstreams at once

How could I call all upstreams at once and return the result from the first one that responds, provided the response is not a 404?
Example:
Call to load balancer at "serverX.org/some-resource.png" creates two requests to:
srv1.serverX.org/some-resource.png
srv2.serverX.org/some-resource.png
srv2 responds faster and the response is shown to the user.
Is this possible at all? :)
Thanks!

Short answer: no. You can't do exactly what you described with nginx. Thinking about it a bit more, this operation can't be called load balancing, since every back-end server receives the total amount of traffic.
A good question is: what do you think you could accomplish with that? Better performance?
You can be sure that you will get better results with simple load balancing between your servers, since each of them will have to handle only half of the traffic.
If you have a more complex architecture, e.g. different loads from different paths to your backend servers, we could discuss a more sophisticated load balancing method.
So if your purpose is something other than performance, there are some things that you can do:
1) After you sent the request to first server you can send it using the post_action to another one.
location ~ \.png$ {
    proxy_pass http://srv1.serverX.org;
    ...
    post_action @mirror_to_srv2;
    ...
}
location @mirror_to_srv2 {
    proxy_ignore_client_abort on;
    ...
    proxy_pass http://srv2.serverX.org;
}
2) The request is available to you in nginx as a variable so with some lua scripting you can send it where ever you want.
Note that the above methods are not useful to tackle performance issues but to enable you to do things like mirroring live traffic to dev servers for test/debug purposes.
Lastly, this one seems to provide the functionality you want, but remember that it isn't built for the use that you seem to have in mind.
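
Outside nginx, the "ask every server and take the first answer that is not a 404" behaviour from the question can be sketched in application code. The following Python example is only an illustration: fake_upstream simulates the servers with delays instead of making HTTP requests, and first_non_404 is a hypothetical helper name.

```python
import asyncio


async def fake_upstream(name, delay, status):
    # Stand-in for an HTTP request to one upstream server.
    await asyncio.sleep(delay)
    return name, status


async def first_non_404(calls):
    """Start every call at once; return the first finished result that is not a 404."""
    tasks = [asyncio.create_task(c) for c in calls]
    try:
        for finished in asyncio.as_completed(tasks):
            name, status = await finished
            if status != 404:
                return name, status
        return None  # every upstream answered 404
    finally:
        for t in tasks:
            t.cancel()  # stop the slower requests we no longer need


winner = asyncio.run(first_non_404([
    fake_upstream("srv1", 0.2, 200),
    fake_upstream("srv2", 0.05, 200),
    fake_upstream("srv3", 0.01, 404),
]))
print(winner)  # ('srv2', 200)
```

As the answer points out, every server still receives every request, so this trades total backend load for latency.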
