I have a service running on a low-end machine (behind Nginx), and its CPU performance is rather weak. One of the APIs needs a lot of CPU time, so I need to limit the maximum number of concurrent requests to it. However, if a request is cached, it can be answered much faster.
What I want to do is limit the maximum number of concurrent connections sent to the backend service for that particular API. I researched limit_req and limit_conn, but neither satisfies my case. With limit_req it is not easy to determine the value: it may cause high load (too many cache misses) or low load (when most of the requests are cached). limit_conn, on the other hand, drops the excess requests (I want them to be queued).
Currently, I'm using the Apache2 MPM module, but it limits all requests.
Is it possible to make Nginx enforce a maximum number of connections to the backend and make the remaining requests wait?
Nginx Cache Based
If many of the requests try to access the exact same data, you can use the cache locking mechanism to at least prevent overloading the server when it is not really useful:
proxy_cache_lock on;
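For context, a minimal sketch of where the directive goes (the cache zone name, path, port, and timings below are placeholders, not values from your setup):

proxy_cache_path /var/cache/nginx keys_zone=api_cache:10m max_size=1g;

server {
    location /slow-api/ {
        proxy_cache              api_cache;
        proxy_cache_valid        200 1m;   # example freshness; adjust to your data
        proxy_cache_lock         on;       # only one request populates a missing cache entry
        proxy_cache_lock_timeout 5s;       # the others wait up to this long for it
        proxy_pass               http://127.0.0.1:8080;
    }
}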
I do not know of another solution for your situation. Holding requests back once N of them have already been sent to the service does not seem to be an option by default. If you had multiple such servers, you could set up nginx as a load balancer, but that's quite a different concept.
Apache2 Request Based
With Apache you can specify the maximum number of client connections that can be served at once. By setting this value to a very small number, even 1, Apache will automatically queue additional requests.
MaxRequestWorkers 1
(In older versions, before 2.3.13, use MaxClients)
This is a server-wide setting, so all connections to that Apache instance are affected. Therefore it is important that you run a separate instance for that specific service and route all access to the slow API through that one specific Apache2 server, as in the diagram below; a minimal configuration sketch follows the diagram.
O Internet
|
v
+-------------------+
| | proxy based on URL or such
| Main HTTP Server +---------------+
| | |
+---------+---------+ |
| v
| +-----------------------------------+
v | |
+-------------------+ | Apache with MaxRequestWorkers=1 |
| | | |
| Main Services | +---------+-------------------------+
| | |
+-------------------+ |
v
+--------------------------+
| |
| Slow Service Here |
| |
+--------------------------+
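A minimal configuration sketch for that dedicated Apache2 instance (the port and backend address are placeholders; the prefork MPM and the mod_proxy / mod_proxy_http modules are assumed):

Listen 8081
MaxRequestWorkers 1                 # at most one request is processed at a time; the rest queue

<VirtualHost *:8081>
    # Forward everything this instance receives to the slow backend service.
    ProxyPass        "/" "http://127.0.0.1:9000/"
    ProxyPassReverse "/" "http://127.0.0.1:9000/"
</VirtualHost>

The main HTTP server then proxies only the expensive API to this instance (for example, a location /slow-api/ block in Nginx with proxy_pass http://127.0.0.1:8081;), while everything else goes straight to the main services.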
I want to write a proxy server for SMB2 based on Asio and am considering using a cumulative buffer to receive a full message before doing the business logic, and introducing a queue for multiple messages, which will force me to synchronize the following resource accesses:
the read and write operations on the queue, because the two upstream/downstream queues are shared between the frontend client and the backend server,
the backend connection state, because reads on the frontend won't wait for the completion of a connect or of writes on the backend server before the next read, and
the resource release when an error occurs or a connection is closed normally, because the read and write handlers on the same socket registered with the EventLoop may not yet have completed, and an asynchronous connect operation can be initiated in worker threads while its partner socket has been closed, and those may run concurrently.
If I do not use the two queues, only one (read, write, or connect) handler is registered with the EventLoop on the proxy flow for a request, so there is no need to synchronize.
From the application level:
I think a cumulative buffer is generally a must in order to process a full message packet (e.g. a message in the format | length (4 bytes) | body (variable) |) across multiple related API calls (system APIs: recv or read, or library APIs: asio::sync_read).
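As an illustration of the accumulation step, here is a minimal Python sketch (not Asio); it assumes the 4-byte length prefix is big-endian and counts only the body:

import struct

def extract_messages(buffer: bytearray):
    # Pop every complete "| length (4 bytes, big-endian) | body |" message from the buffer.
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack_from(">I", buffer, 0)
        if len(buffer) < 4 + length:
            break                        # body not fully received yet; keep accumulating
        messages.append(bytes(buffer[4:4 + length]))
        del buffer[:4 + length]          # consume the parsed message
    return messages

# Usage: append every received chunk to the same bytearray, then drain it.
buf = bytearray()
buf += b"\x00\x00\x00\x05hello\x00\x00\x00\x02hi"   # two complete messages in one chunk
print(extract_messages(buf))                        # [b'hello', b'hi']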
And then, is it necessary to use a queue to save messages received from clients that are pending to be forwarded to the backend server?
I use the following diagram from http://www.partow.net/programming/tcpproxy/index.html; it turned out to express ideas similar to mine (the upstream concept as in NGINX upstream servers).
---> upstream ---> +---------------+
+---->------> |
+-----------+ | | Remote Server |
+---------> [x]--->----+ +---<---[x] |
| | TCP Proxy | | +---------------+
+-----------+ | +--<--[x] Server <-----<------+
| [x]--->--+ | +-----------+
| Client | |
| <-----<----+
+-----------+
<--- downstream <---
Frontend Backend
For a request-response protocol without a message ID field (useful for matching each reply message to the corresponding request message), such as HTTP, I can use one single buffer per connection for the two downstream and upstream flows and then continue processing the next request (note that for the first request a connection to the server is attempted, so it is slower than the subsequent ones), because clients always wait (they may block or be notified by an asynchronous callback function) for the response after sending a request.
However, for a protocol in which clients don't wait for the response before sending the next request, a message ID field can be used to uniquely identify or distinguish request-reply pairs, for example JSON-RPC 2.0, SMB2, etc. If I strictly complete the two above flows regardless of the next read (without calling read, letting TCP data accumulate in the kernel), the subsequent requests from the same connection cannot be processed in a timely manner. After reading "What happens if one doesn't call POSIX's recv fast enough?" I think it can be done.
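To make the matching idea concrete, here is a minimal, protocol-agnostic sketch in Python (the "id" field name is borrowed from JSON-RPC 2.0; for SMB2 the equivalent would be the MessageId field):

import json

pending = {}                        # id -> original request, still waiting for its reply

def forward_request(request: dict, send_to_backend):
    pending[request["id"]] = request                 # remember who asked
    send_to_backend(json.dumps(request).encode())    # and keep reading further requests

def handle_reply(raw_reply: bytes, send_to_client):
    reply = json.loads(raw_reply)
    request = pending.pop(reply["id"])               # replies may arrive in any order;
    send_to_client(json.dumps(reply).encode())       # the matched request tells us who to answer

# Usage with dummy transports:
wire = []
forward_request({"jsonrpc": "2.0", "id": 1, "method": "ping"}, wire.append)
handle_reply(b'{"jsonrpc": "2.0", "id": 1, "result": "pong"}', wire.append)
print(len(pending))                                  # 0: the reply was matched to its request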
I also did an SMB2 proxy test using one single buffer for the two downstream and upstream flows on Windows and Linux, using the Asio networking library (also included in Boost.Asio). I used smbclient as the client on Linux to create 251 connections (see the following command):
ft=$(date '+%Y%m%d_%H%M%S.%N%z'); for ((i = 2000; i <= 2250; ++i)); do smbclient //10.23.57.158/fromw19 user_password -d 5 -U user$i -t 100 -c "get 1.96M.docx 1.96M-$i.docx" >>smbclient_${i}_${ft}_out.txt 2>>smbclient_${i}_${ft}_err.txt & done
Occasionally, it printed several errors: "Connection to 10.23.57.158 failed (Error NT_STATUS_IO_TIMEOUT)". If I increased the number of connections, the number of errors increased, so is there some threshold? In fact, those connections were completed within 30 seconds, and I also set the timeout for smbclient to 100. What's wrong?
Now, I know those problems need to be resolved. But here, I just want to know "Is it necessary to use a queue to save messages received from clients that are pending to be forwarded to the backend server?" so I can determine my goal, because it makes a great deal of difference.
Maybe because they do not need to care about the application message format, the following examples request the next read only after completing the write operation to the peer:
HexDumpProxyFrontendHandler.java or tcpproxy based on C++ Asio.
Other References
[Computer Networks: A Systems Approach] 5.3 Remote Procedure Call - Overcoming Network Limitations
[Computer Networks: A Systems Approach] 5.3 Remote Procedure Call - Overcoming Network Limitations at github
JSON RPC at wikipedia
I have 8 verticles in my application. Each verticle runs on a separate thread. Each verticle has a WebClient (the Vert.x HTTP client).
I am setting the MaxPoolSize to 10.
WebClientOptions webClientOptions = new WebClientOptions().setMaxPoolSize(10);
However, when I checked with
/usr/sbin/ss -o state established -tn | tail -n +2 | awk '{ print $4 }' | sort |uniq -c | sort -n
On a production host, I can see that there are more than 10 connections per IP:Port.
Question 1:
Is MaxPoolSize global for the entire application or per verticle?
So for X.X.X.X:Y, can my application create 10 connections, or 80?
Question 2:
When I send a request to a host that has more than one IP in its DNS, would the connection pool be per host, or per IP?
For example, gogo.com resolves to 2 IP addresses. Can I create 10 connections to gogo.com, or 20?
To understand how it works, let's look at the actual code of HttpClientImpl.
You would be most interested in this part:
https://github.com/eclipse/vert.x/blob/master/src/main/java/io/vertx/core/http/impl/HttpClientImpl.java#L161
As you can see, each WebClient/HttpClient has its own connection pool. So, 8 clients with maxPool of 10 will result in 80 connections.
As to your second question, the connection pool is per host, not per IP, as far as I know and can see from the code. So you'll always be able to establish up to 10 connections:
https://github.com/eclipse/vert.x/blob/39c22d657d2daf640cfbdd8c63e5110fc73474fb/src/main/java/io/vertx/core/http/impl/ConnectionManager.java#L56
Footnote: this is all true only if you don't touch http2MaxPoolSize. If you do, the math is a bit different.
I have a backend process that does work on my database. It runs on a separate computer so that the frontend works miracles (in terms of speed, at least). That backend process creates a UDP server and listens for packets on it.
On the frontend computer, I create child processes from a server. Each child may create data in the database that requires the backend to do some more work. To let the backend know, I send a PING using a UDP client connection.
Front End / Backend Setup Processing
+-------+ +---------+ +----------+
| | | | | Internet |
| Front | PING | Backend | | Client |
| End |-------->| | +----------+
| | | | HTTP Request |
+-------+ +---------+ v
^ ^ +----------+
| | | FrontEnd |--------+
| | +----------+ PING |
v v HTTP Response | v
+---------------------------+ v +---------+
| | +----------+ | Backend |
| Cassandra Database | | Internet | +---------+
| | | Client |
+---------------------------+ +----------+
Without the PING, the backend ends its work and falls asleep until the next PING wakes it up. There is a failsafe, though: I put a timeout of 5 minutes on the wait, so the backend wakes up once in a while no matter what.
My question here is about the UDP stack. I understand it is a FIFO, but I am wondering about two parameters:
How many PING can I receive before the FIFO gets full?
May I receive a PING and lose it if I don't read it soon enough?
The answer to these questions can help me adjust the current waiting loop of the backend server. So far I have assumed that the FIFO had a limit and that I may lose some packets, but I have not implemented a way to allow for packets disappearing (i.e. someone sends a PING, but the backend takes too long before checking the UDP stack again and thus the network decides that the packet has now timed out and removes it from under my feet.)
Update: I added a simple processing diagram above to show what happens and when (it is time-based from top to bottom).
How many PING can I receive before the FIFO gets full?
It depends on the size of your socket receive buffer.
May I receive a PING and lose it if I don't read it soon enough?
Yes and no. Datagrams which have been received and which fit into the socket receive buffer remain there until they have been read or the socket is closed. You can tune the size of the socket receive buffer within limits. However, a datagram that arrives when the socket receive buffer is full is dropped.
You can set the default buffer size on your system with sysctl, or set it per socket using setsockopt with the SO_RCVBUF option.
int n = 512 * 1024; // 512K
if (setsockopt(my_socket, SOL_SOCKET, SO_RCVBUF, &n, sizeof(n)) == -1) {
perror("Failed to set buffer size, using default");
}
There is also a system-wide maximum that you can't go over. On my machine the default receive buffer size is 208K and the maximum is 4M:
# sysctl net.core.rmem_max
net.core.rmem_max = 4194304
# sysctl net.core.rmem_default
net.core.rmem_default = 212992
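To raise those limits before requesting a larger per-socket buffer, you can write the same keys (the value below is only an illustration; the change does not survive a reboot unless you also add it to /etc/sysctl.conf):
# sysctl -w net.core.rmem_max=8388608
# sysctl -w net.core.rmem_default=8388608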
I've successfully written a UDP client-server chat application, but my way of handling requests and responses is hacky and not very scalable. The server basically listens for messages coming in and then runs some code depending on the message type:
if command == "CONN":
# handle new connection from client then send "OK"
if command == "MSG":
# send message to other connected clients
...
I'm happy with the design of the server but the client is really fiddly.
Here's a sample of the commands the client can send to the server:
Command Name | Argument | Outcome/Description
------------------------------------------------------------------------------
CONN | username | OK, ERR, or timeout if server isn't running
MSG | message | -
USRS | - | ["username1", "username2"]
QUIT | - | -
And receive from the server:
USRC | username | new user connected
USRD | username | user disconnected
MSG | username, message | print message from user
SHDW | - | server shut down
Basically I'm having trouble building a system that will handle these different sets of commands and responses. I know I have a state machine of sorts and can conceptualize a solution in my head; I just don't seem to be able to translate it into anything other than:
socket.send("CONN username")
if response == "OK":
# connected to the server ok
if response == "ERR":
# oops, there was a problem of sorts
# otherwise handle timeout
socket.send("USRS")
if response == "":
# no other users connected
else:
# print users
# start main listening loop
while True:
# send typed text as MSG
# handle any messages received from the server on separate thread
Any help appreciated and apologies for the weird python-esqe pseudocode.
To make things more scalable, your application could benefit from using multiple threads on both the client side and the server side. Be sure to use locks when handling shared data.
First, the client side could certainly benefit from using three threads. The first thread can listen for input from the server (the recvfrom() call). The second thread can listen for input from the user and put these messages in a queue. The third thread can process messages from the queue and call socket.send() to send them to the server.
Since the server is handling multiple clients, it could also benefit from having threads to listen for messages from the clients and to process them. Once again, you could use one thread to get messages from the clients and queue them, and a second thread to process the received messages (make sure to store client information) and call sendto() to send responses; by the way, recvfrom() does provide the client's address information.
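A minimal sketch of that client-side layout in Python (the server address is a placeholder, and the CONN/MSG wire format is taken from your question; error handling and the USRS/QUIT commands are omitted):

import queue
import socket
import threading

SERVER = ("127.0.0.1", 9999)          # placeholder server address
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
outgoing = queue.Queue()              # messages typed by the user, waiting to be sent

def listen_to_server():
    # Thread 1: print everything the server pushes to us (MSG, USRC, USRD, SHDW, ...).
    while True:
        data, _ = sock.recvfrom(4096)
        print(data.decode())

def read_user_input():
    # Thread 2: turn typed text into MSG commands and queue them.
    while True:
        outgoing.put("MSG " + input())

sock.sendto(b"CONN username", SERVER)  # handshake first, as in the question
threading.Thread(target=listen_to_server, daemon=True).start()
threading.Thread(target=read_user_input, daemon=True).start()

# Main thread acts as the third thread: drain the queue and send to the server.
while True:
    sock.sendto(outgoing.get().encode(), SERVER)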
I am working on a WebSocket implementation and do not understand the purpose of the mask in a frame.
Could somebody explain to me what it does and why it is recommended?
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len | Extended payload length |
|I|S|S|S| (4) |A| (7) | (16/64) |
|N|V|V|V| |S| | (if payload len==126/127) |
| |1|2|3| |K| | |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
| Extended payload length continued, if payload len == 127 |
+ - - - - - - - - - - - - - - - +-------------------------------+
| |Masking-key, if MASK set to 1 |
+-------------------------------+-------------------------------+
| Masking-key (continued) | Payload Data |
+-------------------------------- - - - - - - - - - - - - - - - +
: Payload Data continued ... :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| Payload Data continued ... |
+---------------------------------------------------------------+
Websockets are defined in RFC6455, which states in Section 5.3:
The unpredictability of the masking key is
essential to prevent authors of malicious applications from selecting
the bytes that appear on the wire.
In a blog entry about Websockets I found the following explanation:
masking-key (32 bits): if the mask bit is set (and trust me, it is if you write for the server side) you can read for unsigned bytes here which are used to xor the payload with. It's used to ensure that shitty proxies cannot be abused by attackers from the client side.
But the clearest answer I found was in a mailing list archive, where John Tamplin states:
Basically, WebSockets is unique in that you need to protect the network
infrastructure, even if you have hostile code running in the client, full
hostile control of the server, and the only piece you can trust is the
client browser. By having the browser generate a random mask for each
frame, the hostile client code cannot choose the byte patterns that appear
on the wire and use that to attack vulnerable network infrastructure.
As kmkaplan stated, the attack vector is described in Section 10.3 of the RFC.
This is a measure to prevent proxy cache poisoning attacks [1].
What it does is create some randomness: you have to XOR the payload with the random masking key.
By the way: it isn't just recommended. It is obligatory.
[1] See Huang, Lin-Shung, et al. "Talking to yourself for fun and profit." Proceedings of W2SP (2011).
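For illustration, the transform defined in Section 5.3 of the RFC is a per-octet XOR with the 4-byte masking key. A minimal Python sketch (the key and payload are the "Hello" example from Section 5.7 of the RFC):

def mask_payload(payload: bytes, masking_key: bytes) -> bytes:
    # RFC 6455: transformed octet i = original octet i XOR masking-key octet (i mod 4).
    # XOR is its own inverse, so the same function both masks and unmasks.
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))

masked = mask_payload(b"Hello", bytes.fromhex("37fa213d"))
print(masked.hex())                                      # 7f9f4d5158, as in the RFC example
print(mask_payload(masked, bytes.fromhex("37fa213d")))   # b'Hello'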
From this article:
Masking of WebSocket traffic from client to server is required because of the unlikely chance that malicious code could cause some broken proxies to do the wrong thing and use this as an attack of some kind. Nobody has proved that this could actually happen, but since the fact that it could happen was reason enough for browser vendors to get twitchy, masking was added to remove the possibility of it being used as an attack.
So, assuming attackers were able to compromise both the JavaScript code executed in a browser as well as the backend server, masking is designed to prevent the sequence of bytes sent between these two endpoints from being crafted in a special way that could disrupt any broken proxies between them (by broken, this means proxies that might attempt to interpret a WebSocket stream as HTTP when in fact they shouldn't).
The browser (and not the JavaScript code in the browser) has the final say on the randomly generated mask used to send the message which is why it's impossible for the attackers to know what the final stream of bytes the proxy might see will be.
Note that the mask is redundant if your WebSocket stream is encrypted (as it should be). From an article by the author of Python's Flask:
Why is there masking at all? Because apparently there is enough broken infrastructure out there that lets the upgrade header go through and then handles the rest of the connection as a second HTTP request which it then stuffs into the cache. I have no words for this. In any case, the defense against that is basically a strong 32bit random number as masking key. Or you know… use TLS and don't use shitty proxies.
I have struggled to understand the purpose of the WebSocket mask until I encountered the following two resources which summarize it clearly.
From the book High Performance Browser Networking:
The payload of all client-initiated frames is masked using the value specified in the frame header: this prevents malicious scripts executing on the client from performing a cache poisoning attack against intermediaries that may not understand the WebSocket protocol.
Since the WebSocket protocol is not always understood by intermediaries (e.g. transparent proxies), a malicious script can take advantage of it and create traffic that causes cache poisoning in these intermediaries.
But how?
The article Talking to Yourself for Fun and Profit (http://www.adambarth.com/papers/2011/huang-chen-barth-rescorla-jackson.pdf) further explains how a cache poisoning attack works:
1. The attacker's Java applet opens a raw socket connection to attacker.com:80 (as before, the attacker can also use a SWF to mount a similar attack by hosting an appropriate policy file to authorize this request).
2. The attacker's Java applet sends a sequence of bytes over the socket crafted with a forged Host header as follows:
   GET /script.js HTTP/1.1
   Host: target.com
3. The transparent proxy treats the sequence of bytes as an HTTP request and routes the request based on the original destination IP, that is, to the attacker's server.
4. The attacker's server replies with a malicious script file with an HTTP Expires header far in the future (to instruct the proxy to cache the response for as long as possible).
5. Because the proxy caches based on the Host header, the proxy stores the malicious script file in its cache as http://target.com/script.js, not as http://attacker.com/script.js.
6. In the future, whenever any client requests http://target.com/script.js via the proxy, the proxy will serve the cached copy of the malicious script.
The article also further explains how WebSockets come into the picture in a cache-poisoning attack:
Consider an intermediary examining packets exchanged between the browser and the attacker’s server. As above, the client requests
WebSockets and the server agrees. At this point, the client can send
any traffic it wants on the channel. Unfortunately, the intermediary
does not know about WebSockets, so the initial WebSockets handshake
just looks like a standard HTTP request/response pair, with the
request being terminated, as usual, by an empty line. Thus, the client
program can inject new data which looks like an HTTP request and the
proxy may treat it as such. So, for instance, he might inject the
following sequence of bytes: GET /sensitive-document HTTP/1.1 Host: target.com
When the intermediary examines these bytes, it might conclude that
these bytes represent a second HTTP request over the same socket. If
the intermediary is a transparent proxy, the intermediary might route
the request or cache the response according to the forged Host header.
In the above example, the malicious script took advantage of the WebSocket protocol not being understood by the intermediary and "poisoned" its cache. The next time someone asks for sensitive-document from target.com, they will receive the attacker's version of it. Imagine the scale of the attack if that document is the google-analytics script.
To conclude: by forcing a mask on the payload, this poisoning is no longer possible, because the bytes the intermediary sees will be different and unpredictable every time.