If I create a server application with a POST or GET API that returns a large response, which system parameters does that affect? That is, will the server's CPU or RAM usage increase as the response size grows, or will it only increase latency, depending on network speed, whenever the server returns the response to the client?
I have an application server which does nothing but send requests to an upstream service, wait, and then respond to the client with the data received from the upstream service. The microservice takes Xms to respond, or sometimes Yms, where X << Y. The client response time is (in steady state) essentially equal to the time the upstream microservice takes to process the request; any additional latency is negligible, as the client, application server, and upstream microservice are all located in the same datacenter and communicate over private IPs with very large network bandwidth.
When the client starts sending requests at a rate of N, the application server becomes overloaded and response times spike dramatically as the server becomes unsteady. The client and the microservice have minimal CPU usage, while the application server is at maximum CPU usage. (The application server runs on a much weaker bare-metal machine than the other two services; this is a testing environment used to monitor the application server's behavior under stress.)
Intuitively, I would expect N to be the same value regardless of how long the microservice takes to respond, but I'm finding that the maximum throughput in steady state is significantly lower when the microservice takes Yms than when it takes only Xms. The number of ephemeral ports in use when this happens is also significantly below the limit. Since the amount of reading and writing being done is the same, and memory usage is the same, I can't figure out why N is a function of the microservice's execution time. Also, no, the input/output of the services is the same regardless of the execution time, so the number of bytes being written is the same regardless. Since the only difference is the execution time, which merely requires more TCP connections to be in use while responses are outstanding, I'm not sure why maximum throughput is affected. From my understanding, the cost of a TCP connection is negligible once it has been established.
Am I missing something?
Thanks,
Additional details:
The services use HTTP/1.1 with keepalive, with no pipelining.
Also should've mentioned that I'm using an IO-Thread model. If I were using a thread per request I could understand this behavior, but with only a thread per core it's confusing.
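For what it's worth, the steady-state relationship I'm trying to reconcile this with, written out (Little's Law; L is the number of requests in flight on the application server, λ the throughput, W the per-request time, all shorthand rather than measured values):

$$L = \lambda W \quad\Longrightarrow\quad \lambda_{\max} = \frac{L_{\max}}{W}$$

So if something caps the number of in-flight requests at L_max, the maximum throughput would fall in proportion as W rises from Xms to Yms; what I can't see is what that cap would be in an IO-thread model.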
I've been building a NodeJS application with a team. A team member limited the server request size to 8Kb: if a request is bigger than that, it is rejected by the server. The idea is that we don't want to process requests that are too big, to avoid a potential DoS.
This raises the issue of what to do if we want to make larger requests in general (batching a couple of small requests together, since according to this, it's better than sending a bunch of small requests). An example would be a TODO list: if I edit 100 TODOs at the same time (I send the UUIDs of each of the TODO items back to the server along with the updates), this request could exceed 8Kb in size. I couldn't find whether there are standards for maximum HTTP request sizes.
What would be the right approach for sending larger HTTP requests from the client to the server? Should I:
Increase the HTTP request size limit on the server? What's the standard? I could 100x it and that would solve much of the problem (a sketch of this option follows the list).
Limit the request size on the client? For example, limit it so the user can only edit 100 TODOs at most; any more, and the request won't be sent.
A combination of 1 and 2?
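For option 1, here is a minimal sketch of the kind of change I'm picturing, assuming the 8Kb cap is enforced through Express's JSON body parser (our actual setup may differ):

```typescript
// Sketch only: assumes the 8Kb limit lives in express.json()'s "limit" option.
import express from "express";

const app = express();

// Raise the request-body cap from 8kb to roughly 100x that.
app.use(express.json({ limit: "800kb" }));

// Hypothetical batch endpoint: each item carries a UUID plus the edited fields.
app.put("/todos/batch", (req, res) => {
  const updates: Array<{ id: string; [field: string]: unknown }> = req.body;
  // ...apply the updates...
  res.status(204).end();
});
```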
Thank you!
I am developing an SNMP poller which will poll around 40K devices every hour for CPU-, memory-, bandwidth-, and connection-count-related information. I am currently using the snmp4j API. I am performing a separate snmpwalk for CPU, memory, bandwidth, and connection count, but given the number of devices, this takes a huge amount of time. I am thinking of using an SNMP GETBULK request to get all the information at once, but this is restricted by the maximum response PDU size of the queried device. I want to know whether there is a way to determine the maximum response PDU size of the remote system so that I can break up my request PDUs accordingly. I have around 2500 OIDs to poll in one request. Also, I am not allowed to modify the response packet size of the remote system.
This has been a problem for 30 years (SNMP is that old): part of device discovery is to determine the maximum response size (in addition to response time, supported versions, etc.) of each device.
It's basically a trade-off of discovery time vs. just assuming some minimal capabilities.
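Once discovery has produced a per-device cap, however rough, the polling side reduces to splitting the OID list into batches that fit under it. A minimal sketch of that splitting logic only; `snmpGetBulk` is a hypothetical stand-in for the real snmp4j call, and the cap would come from discovery or from shrinking the batch on a tooBig error:

```typescript
// Sketch: split ~2500 OIDs into batches sized to a per-device limit.
async function pollDevice(
  snmpGetBulk: (host: string, oids: string[]) => Promise<Map<string, string>>, // hypothetical transport
  host: string,
  oids: string[],                 // e.g. the CPU/memory/bandwidth/connection-count OIDs
  maxVarBindsPerRequest: number,  // learned during discovery, not a known constant
): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  for (let i = 0; i < oids.length; i += maxVarBindsPerRequest) {
    const batch = oids.slice(i, i + maxVarBindsPerRequest);
    // A tooBig error or timeout here is the usual signal to halve the batch
    // size and retry; that fallback is omitted to keep the sketch short.
    const partial = await snmpGetBulk(host, batch);
    partial.forEach((value, oid) => results.set(oid, value));
  }
  return results;
}
```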
I have several clients that constantly POST data to a REST service. The REST service sits behind a network load balancer. Each client sends 100-500 MB a day, and I need to support 500+ clients.
I can either POST very large packets; this will reduce the overhead of TCP/IP session setup and HTTP headers, but it will firmly tie one client to a particular server and limit my scalability options. Alternatively, I can send small HTTP packets, which I can load balance well, but I will incur more overhead for TCP/IP session setup and HTTP headers.
What is the recommended packet size for HTTP POST? Or how can I calculate one for my environment?
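For a sense of scale, here is the rough arithmetic behind the trade-off, with assumed (not measured) numbers: take the fixed per-request overhead as roughly $h \approx 1\,\text{KB}$ of headers plus amortized connection setup, and $P$ as the payload per POST:

$$\text{overhead fraction} \approx \frac{h}{P + h} \approx \begin{cases} 0.1\% & P = 1\,\text{MB} \\ 9\% & P = 10\,\text{KB} \end{cases}$$

so I suspect the overhead argument mostly stops mattering once payloads reach the hundreds of kilobytes, which leaves the load-balancing trade-off.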
There is no recommended size.
While HTTP POST size is not constrained by the RFCs, HTTP is a commodity protocol implementing request/response-style messaging, so most of the infrastructure is configured around the idea that TCP connections are not particularly long-lasting and do not carry significant amounts of data. That is, there will be factors outside your control which may impact the service; although HTTP supports range requests for responses, there is no counterpart for requests.
You can get around a lot of these (although not all) by using HTTPS. However, you still need to think about how you detect and manage outages: are you happy to wait for a TCP timeout?
With 500+ clients presumably using the system quite heavily, the congestion avoidance limits shouldn't be a problem - whether TCP window scaling is likely to be an issue depends on how the system is used. HTTP handshakes should not be an issue unless you restrict the request size to something silly.
If the service is highly dependent on clients pushing lots of data to your server, then I'd encourage you to look at parsing the data on the client (given the volume, presumably it's coming from files, implying a signed Java applet or JavaScript with the UniversalBrowserRead privilege) and then sending it over a bi-directional communication channel (e.g. a websocket).
Leaving that aside for now, the only way you can find out what the route between your clients and your server will support is to measure it - and monitor it. I would expect that a 2Mb upload size would work pretty much anywhere, while a 10Mb size would work most of the time within the US or Europe - and that you could probably increase this to 50Mb as long as there's no mobile clients.
But if you want to maintain the effectiveness of the service you'll need to monitor bandwidth, packet loss and lost connections.
How can I measure server utilization in terms of requests per unit of time (let's say one hour), assuming the server's maximum capacity is known (for example, 1000 requests per hour)?
I know the equation will be:
utilization = number of requests executed by the server / server capacity
But how can I measure the requests sent from a client to a server?
I need a valid definition of a request, please.
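For example, with the capacity above and a hypothetical count of 700 requests observed during the hour:

$$\text{utilization} = \frac{700}{1000} = 0.7 = 70\%$$

but that still leaves the question of how to count the 700 in the first place.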
This cannot be answered, as a "request" in a client/server model cannot be identified without knowing the protocol. To illustrate: multiple HTTP requests can be sent over one connection, and UDP-based protocols do not use connections at all.
The most general description I can come up with to define a request in an unidentified client/server protocol is: a message initiated by the client that requires a response from the server. The number of such requests is an observed variable, not a derived one.
In a program, you would obtain this variable via a callback or RPC to the server in question, or from a program that can provide it by inspecting log files.
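As a concrete illustration, here is a minimal sketch of exposing the counter over an RPC-style endpoint, assuming (purely for the example) an HTTP service built on Express; the route name and port are made up:

```typescript
// Sketch: count client-initiated messages that this server answers, and let a
// monitoring program read the total via a simple endpoint.
import express from "express";

const app = express();
let requestCount = 0;

// Register the stats route first so polling it does not inflate the counter.
app.get("/stats/requests", (req, res) => {
  res.json({ requestsSinceStart: requestCount });
});

// Every other request that reaches the server is counted here.
app.use((req, res, next) => {
  requestCount += 1;
  next();
});

// ...the application's real routes would be registered after the counter...

app.listen(3000);
```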
For utilisation you can get all you need from the "sar" utility.
However "request" will be very specific to the software you are running. For instance if you are running Apache web server, by default, it will log each request and you can scan these logs to extract you request data.
However, be aware that these are "technical" requests and may not match your users' idea of a request. Think of Amazon: I may think of my book order as one "request", but Amazon's servers will log it as 50 or so HTTP requests.
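To make the log-scanning suggestion concrete, a small sketch that buckets an Apache access log by hour, assuming the default common/combined log format and a log path that will differ per installation:

```typescript
// Sketch: count requests per hour from an Apache access log.
import { createReadStream } from "fs";
import { createInterface } from "readline";

async function requestsPerHour(logPath: string): Promise<Map<string, number>> {
  const counts = new Map<string, number>();
  const lines = createInterface({ input: createReadStream(logPath) });
  for await (const line of lines) {
    // Timestamps look like [10/Oct/2000:13:55:36 -0700]; bucket on day + hour.
    const match = line.match(/\[(\d{2}\/\w{3}\/\d{4}:\d{2})/);
    if (match) counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
  }
  return counts;
}

// Utilization for a given hour would then be counts.get(hour) divided by the
// server's assumed capacity for that hour (e.g. 1000).
```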