How can I measure the server utilization?

How can I measure the server utilization in terms of requests per unit of time (let's say one hour), assuming the server's maximum capacity is known (for example, 1000 requests per hour)?
I know the equation will be:
utilization = Number of executed requests by server / server capacity
But how can I measure the requests sent from a client to a server?
I need a valid equation to define a request please.

This cannot be answered as-is, because a "request" in a client/server model cannot be identified without knowing the protocol. To illustrate: multiple HTTP requests can be sent over one connection, and UDP-based protocols do not use connections at all.
The most general definition I can come up with for a request in an unspecified client/server protocol is: a message initiated by the client that requires a response from the server. The request count is an observed variable, not a derived one.
In a program you would obtain this variable via a callback or RPC to the server in question, or from a program that can provide it by inspecting log files.
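Once you have that observed count, the utilization itself is trivial to compute. A minimal sketch in Python, with made-up numbers:

    # Utilization from the question's formula, assuming the request count
    # for the last hour was already obtained (via RPC, callback, or logs).
    requests_executed = 750    # observed requests in the last hour
    server_capacity = 1000     # known maximum requests per hour

    utilization = requests_executed / server_capacity
    print(f"utilization: {utilization:.0%}")   # -> 75%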

For utilisation you can get all you need from the "sar" utility.
However "request" will be very specific to the software you are running. For instance if you are running Apache web server, by default, it will log each request and you can scan these logs to extract you request data.
However be aware that these are "technical" requests and may not conform to your users idea of a request. Think Amazon, I may think of my book order as one "request", Amazons server will log this as 50 or so http requests.
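If the server does happen to be Apache, a rough sketch of counting log lines per hour might look like this (the log path and the default common/combined timestamp format are assumptions about your setup):

    import re
    from collections import Counter

    per_hour = Counter()
    # Matches the hour bucket of a default Apache timestamp, e.g. [10/Oct/2000:13
    timestamp = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2})")

    with open("/var/log/apache2/access.log") as log:
        for line in log:
            match = timestamp.search(line)
            if match:
                per_hour[match.group(1)] += 1   # one log line = one "technical" request

    for hour, count in sorted(per_hour.items()):
        print(hour, count)

Dividing each hourly count by the known capacity (1000 requests per hour in the question) then gives the utilization figure.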

Related

Are HTTP responses to the same resource returned in sequence?

Say I want to request some resource on a given domain, e.g. example.com/image.jpg.
If I do two requests to this particular resource, can I expect the first response I get back to map to the first request I sent, and so on (or does this depend on the client/server implementation)?
I'm asking because if I want to debug request-response pairs, I need to know exactly which response belongs to which request (for timing purposes etc.). So, are there any sure-fire ways to achieve this mapping?
I'm assuming that you are talking about HTTP/1.x using persistent connections (i.e. HTTP keep-alive) when sending a new request without having the response for the previous one yet, i.e. using HTTP pipelining. In this case the order of the responses matches the order of the requests.
If you are instead talking about HTTP/2, the situation is different, because multiple requests can receive their responses in parallel inside the same TCP connection, which also means that a later request might receive its response before an earlier one. The order in which requests to the same resource are handled by the server depends entirely on the server implementation.
The same is true when the requests are made over independent TCP connections. The order might be even more unpredictable than with HTTP/2, because the requests might be handled in different threads or processes, so the order also depends on how the operating system schedules them.
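To make the HTTP/1.1 pipelining case concrete, here is a minimal raw-socket sketch (it assumes the server actually supports pipelining, which many servers and intermediaries do not): both requests are written before any response is read, and the protocol guarantees the responses come back in request order.

    import socket

    HOST = "example.com"   # placeholder host

    with socket.create_connection((HOST, 80)) as sock:
        pipelined = (
            f"GET /image.jpg HTTP/1.1\r\nHost: {HOST}\r\n\r\n"
            f"GET /image.jpg HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n"
        )
        sock.sendall(pipelined.encode("ascii"))

        # Read until the server closes; the first status line belongs to the
        # first request, the second to the second, by protocol definition.
        data = b""
        while chunk := sock.recv(4096):
            data += chunk

    print(data.split(b"\r\n")[0])   # status line of the first response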

HTTP proxy server - need to handle dynamic client

I have developed a server-client model based on UDP. Clients connect to the server on a random basis; the number of clients alive at any time is not fixed.
Any new client can communicate at any time. There could be 1 live client, 100 clients, or any other number.
Now, in such a model, I need to add HTTP requests. A browser could send a request to the server, and the server would then forward it to one of the clients based on some identification.
Is there any method or ready-made server (like nginx or lighttpd) that I can use for this requirement?
My big worry is that the destination clients are not fixed; they keep changing. Most servers (nginx or lighttpd) have static entries for the destination address.
I visualize your scenario as multiple sensors that connect to the server when they have something to say, send a request, and wait for the answer.
I imagine you also want to administer these modules somehow, which is why you want to access them via HTTP.
You could keep the new configuration items on the regular server, so that upon any update connection the response would include (in a piggy-backed fashion) the changes for the node. A sketch of this idea follows below.
Alternatively, the server could record your interest in accessing a certain node and then, when that node connects, notify the interested client. The sensor would have to accept connections from such clients during a time window.
Certainly, more information would help us help you.
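Here is a minimal sketch of the piggy-backing idea mentioned above; the message format, port, and sensor ids are all hypothetical:

    import socket

    # Commands queued by the HTTP/admin side, keyed by sensor id (hypothetical).
    PENDING = {"sensor-42": ["set-interval 60"]}

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 9999))

    while True:
        data, addr = sock.recvfrom(4096)
        sensor_id = data.decode().split()[0]    # assumes e.g. "sensor-42 status=ok"
        extras = PENDING.pop(sensor_id, [])     # piggy-back any pending commands
        reply = "ACK" + "".join("\n" + command for command in extras)
        sock.sendto(reply.encode(), addr)

The key point is that the server never initiates contact with a sensor; it simply attaches whatever is pending to the reply of the sensor's next regular request.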

How does a http client associate an http response with a request (with Netty) or in general?

Is an HTTP endpoint supposed to respond to requests from a particular client in the order they are received?
What if that doesn't make sense, for example when requests are handled by a cluster behind a proxy, or with NIO where one request finishes faster than another?
Is there a standard way of associating a unique id with each HTTP request in order to associate it with the response? How is this handled by clients like Apache HttpComponents HttpClient or curl?
The question comes down to the following case:
Suppose, I am downloading a file from a server and the request is not finished. Is a client capable of completing other requests on the same keep-alive connection?
Whenever a TCP connection is opened, the connection is recognized by the source and destination ports and IP addresses. So if I connect to www.google.com on destination port 80 (default for HTTP), I need a free source port which the OS will generate.
The reply of the web server is then sent to the source port (and IP). This is also how NAT works, remembering which source port belongs to which internal IP address (and vice versa for incoming connections).
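You can see that 4-tuple directly from a connected socket; a small Python illustration (www.google.com is just the example host from above):

    import socket

    with socket.create_connection(("www.google.com", 80)) as sock:
        print("local  (source IP, source port):", sock.getsockname()[:2])
        print("remote (destination IP, port):  ", sock.getpeername()[:2])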
As for your edit: no, a single HTTP connection can execute only one command (GET/POST/etc.) at a time. If you send another command while you are retrieving data from a previously issued command, the results may vary per client and server implementation. I guess that Apache, for example, will transmit the result of the second request after the data of the first request has been sent.
I won't re-write CodeCaster's answer because it is very well worded.
In response to your edit - no, it is not. A single persistent HTTP connection can only be used for one request at a time, or it would get very confusing. Because HTTP does not define any form of request/response tracking mechanism, it simply would not be possible.
It should be noted that there are other protocols which use a similar message format (conforming to RFC 822) that do allow this (using mechanisms such as SIP's CSeq header), and it would be possible to implement something similar in a custom HTTP application, but HTTP does not define any standard mechanism for doing so, and therefore nothing can be done that could be assumed to work everywhere. It would also present a problem with the response to the second message - do you wait for the first response to finish before sending the second, or try to pause the first response while you send the second? How would you communicate this in a way that guarantees messages won't become corrupted?
Note also that SIP (usually) operates over UDP, which does not guarantee packet ordering, making the CSeq system more of a necessity.
If you want to send a request to a server while another transaction is still in progress, you will need to create a new connection to the server, and hence a new TCP stream.
Facebook did some research into this while they were building their CDN, and they concluded that you can efficiently have 2 or 3 open HTTP streams at any one time, but any more than that reduces overall transfer time because of the extra packet overhead cost. I would link to the blog entry if I could find the link...
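A minimal sketch of the "separate connection per in-flight request" approach, using a couple of worker threads (the URLs are placeholders):

    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    URLS = [
        "http://example.com/large-file-1",
        "http://example.com/large-file-2",
    ]

    def fetch(url: str) -> int:
        # Each call opens its own TCP connection, so transfers can overlap.
        with urlopen(url) as response:
            return len(response.read())

    with ThreadPoolExecutor(max_workers=2) as pool:
        for url, size in zip(URLS, pool.map(fetch, URLS)):
            print(url, size, "bytes")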

What does concurrent requests really mean?

When we talk about capacity of a web application, we often mention the concurrent requests it could handle.
As another question of mine discussed, Ethernet uses TDM (Time Division Multiplexing) and no two signals can pass along the wire simultaneously. So if the web server is connected to the outside world through an Ethernet connection, there will be literally no concurrent requests at all; all requests will come in one after another.
But if the web server is connected to the outside world through something like a wireless network card, I believe multiple signals could arrive at the same time as electromagnetic waves. Only in that situation would there be real concurrent requests to talk about.
Am I right on this?
Thanks.
I imagine "concurrent requests" for a web application doesn't get down to the link level. It's more a question of the processing of a request by the application and how many requests arrive during that processing.
For example, if a request takes on average 2 seconds to fulfill (from receiving it at the web server to processing it through the application to sending back the response) then it could need to handle a lot of concurrent requests if it gets many requests per second.
The requests need to overlap and be handled concurrently, otherwise the queue of requests would just fill up indefinitely. This may seem like common sense, but for a lot of web applications it's a real concern because the flood of requests can bog down a resource for the application, such as a database. Thus, if the application has poor database interactions (overly complex procedures, poor indexing/optimization, a slow link to a database shared by many other applications, etc.) then that creates a bottleneck which limits the number of concurrent requests the application can handle, even though the application itself should be able to handle them.
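As a back-of-the-envelope illustration (the numbers are made up, not from the answer), Little's law relates these quantities: average concurrency equals the arrival rate times the average time spent per request.

    arrival_rate = 50      # requests arriving per second
    service_time = 2.0     # average seconds to fully process one request

    concurrent_requests = arrival_rate * service_time
    print(concurrent_requests)   # -> 100.0 requests in flight on average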
Imagining an HTTP server listening on port 80, what happens is:
a client connects to the server to request some page; it is connecting from some origin IP address, using some origin local port.
the OS (actually the network stack) looks at the incoming request's destination IP (since the server may have more than one NIC) and destination port (80) and verifies that some application is registered to handle data on that port (the HTTP server). The combination of 4 numbers (origin IP, origin port, destination IP, port 80) uniquely identifies a connection. If such a connection does not exist yet, a new one is added to the network stack's internal table and a connection request is passed on to the HTTP server's listening socket. From then on the network stack just passes on data for that connection to the application.
Multiple clients can be sending requests, and for each one the above happens. So from the network perspective, everything happens serially, since data arrives one packet at a time.
From the software perspective, the http server is listening to incoming requests. The number of requests it can have queued before the clients start getting errors is determined by the programmer based on the hardware capacity (this is the first bit of concurrency: there can be multiple requests waiting to be processed). For each one it will create a new socket (as fast as possible in order to continue emptying the request queue) and let the actual processing of the request be done by another part of the application (different threads). These processing routines will (ideally) spend most of their time waiting for data to arrive and react (ideally) quickly to it.
Since processing data is usually many times faster than the network I/O, the server can handle many requests while processing network traffic, even if the hardware consists of only one processor. Multiple processors increase this capability. So from the software perspective, everything happens concurrently.
How the actual processing of the data is implemented is where the key to performance lies (you want it to be as efficient as possible). Several possibilities exist (async socket operations as provided by the Socket class, threadpool, unique threads, the new parallel features from .NET 4).
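A stripped-down sketch of the listen-queue-plus-worker-threads pattern described above (plain sockets, no real HTTP parsing; the port and backlog are chosen arbitrarily):

    import socket
    import threading

    def handle(conn: socket.socket) -> None:
        with conn:
            conn.recv(4096)   # read the request (ignored in this sketch)
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", 8080))
    server.listen(128)        # the queue of requests waiting to be accepted

    while True:
        conn, addr = server.accept()                            # empty the queue quickly...
        threading.Thread(target=handle, args=(conn,)).start()   # ...process elsewhere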
It's true that no two packets can arrive at exactly the same time (unless multiple network cards are in use, per Gabe's comment). However, a web request usually requires a number of packets. The arrival of these packets is interleaved when multiple requests come in at nearly the same time (whether over wired or wireless access). Also, the processing of these requests can overlap.
Add multi-threading (or multiple processors / cores) to the picture, and you can see how lengthy operations such as reading from a database (which requires a lot of waiting around for a response) can easily overlap even though the individual packets are arriving in a serial fashion.
Edit: Added note above to incorporate Gabe's feedback.

Detecting missing responses to long running HTTP (SOAP) requests

I need a way to detect a missing response to a long running HTTP POST request. This problem arises when the network infrastructure (firewalls, proxies, unplugged cables, etc.) drops the response packets. The server may detect this failure, but the client cannot send additional bytes after the POST to probe the state of the TCP connection. The failure may be limited to a single TCP connection. For example I may be able to subsequently open a new TCP connection to the server.
I'm looking for a solution that still uses HTTP POST and does not change the duration of the server side processing.
Some solutions that I can think of are:
Provide a side-channel interface to retrieve the request & response history. If the history lists the response as having been sent (presumably resulting in a TCP error) but I have not received it within a reasonable time, I can generate a local error.
Use an X- header to ask the server to deliver "spurious" 100 Continue provisional responses at a regular interval. If I fail to see an expected 100 Continue or a non-provisional response, I can generate a local error.
Is there a state of the art solution for this problem?
It sounds to me like you are using SOAP for something that would be much better done with a stateful connection or a server-side push technology.
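For what it's worth, the side-channel idea from the question could look roughly like this on the client; the /history endpoint, its JSON shape, and the request id are all hypothetical:

    import json
    import socket
    import urllib.error
    import urllib.request

    REQUEST_ID = "job-1234"   # id the client attached to the long-running POST

    def response_already_sent() -> bool:
        # Hypothetical side channel exposing the request/response history.
        url = "http://example.com/history?id=" + REQUEST_ID
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp).get("response_sent", False)

    try:
        with urllib.request.urlopen("http://example.com/soap-endpoint",
                                    data=b"<soap:Envelope/>", timeout=300) as resp:
            body = resp.read()
    except (socket.timeout, urllib.error.URLError):
        # No response within the window: ask the side channel whether the
        # server believes it already sent one; if so, the reply was lost.
        if response_already_sent():
            raise RuntimeError("server sent a response that never arrived")
        raise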
