I am maxing out my CPU on my gRPC client doing unary RPC. I wonder if it makes any sense to try replacing unary RPC with a bidirectional streaming RPC (or set of them?) that basically lasts throughout the life of the application? I can't tell if bidirectional streaming RPC is intended for standard 1 request/1 response communication like this. The motivation would be to avoid creating new TCP connections.
I can't tell if bidirectional streaming RPC is intended for standard 1 request/1 response communication
If you are going to use a request-reply message pattern, just use unary request-reply (RPC). It is designed for that pattern, and the semantics, e.g. for retries, are well known.
The motivation would be to avoid creating new TCP connections.
gRPC uses HTTP/2, which multiplexes all requests over a single TCP connection, so all unary RPC requests already share the same connection.
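To make that concrete, here is a minimal Java sketch; the Greeter service, HelloRequest/HelloReply messages and the generated GreeterGrpc stub are hypothetical placeholders for whatever your .proto actually defines. The point is that one ManagedChannel is created once and reused, so each unary call is just a new HTTP/2 stream on the existing TCP connection:

    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;

    public final class UnaryOverOneConnection {
        public static void main(String[] args) {
            // One channel, created once; gRPC multiplexes all calls over it via HTTP/2.
            ManagedChannel channel = ManagedChannelBuilder
                    .forAddress("service.example.com", 443)   // hypothetical host
                    .useTransportSecurity()
                    .build();

            // Hypothetical blocking stub generated from your .proto.
            GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);

            for (int i = 0; i < 1_000; i++) {
                // Each unary call opens a new HTTP/2 stream, not a new TCP/TLS connection.
                HelloReply reply = stub.sayHello(
                        HelloRequest.newBuilder().setName("request-" + i).build());
            }

            channel.shutdown();
        }
    }

Creating a new channel per request, by contrast, would pay the TCP/TLS handshake cost every time and could easily show up as client CPU load.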
I am maxing out my CPU on my gRPC client doing unary RPC.
This sounds a bit unusual. Can you change your communication pattern, e.g. stream multiple requests before waiting for a response? Alternatively, batch data and send bigger requests less often? Or can you elaborate on your message pattern?
Like @Jonas suggested, using a bidirectional stream just for higher throughput would be a bad idea.
Google's gRPC team advises against it; nevertheless, the common intuition is that streams should theoretically have lower overhead. In practice, that does not seem to be true.
This is probably because streams guarantee that messages are delivered in the order they were sent, which creates a bottleneck when there are concurrent messages.
For lower concurrent requests, both have comparable latencies. However, for higher loads, unary calls are much more performant.
A detailed analysis here: https://nshnt.medium.com/using-grpc-streams-for-unary-calls-cd64a1638c8a.
How is using asynchronous HTTP Requests different from using Messages when it comes to sending data in ZeroMQ?
An HTTP request is simply a use of the Hypertext Transfer Protocol over IP between two machines, client and server. It can be used for moving data in either direction, and there are no particular restrictions on what that data can be. An asynchronous request is simply one where the requester doesn't wait for the reply after making the request; it uses some mechanism to rendezvous with the reply later, whenever it happens to come in.
Sending a message through ZeroMQ can be somewhat similar, specifically with the REQ/REP pattern (request, reply). As with an HTTP request, the requester sends some sort of message and the replier replies in some way, and strictly in that pattern.
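For illustration, here is a minimal REQ/REP sketch using the JeroMQ Java bindings (the endpoint and message contents are made up; the SocketType-based API assumes a recent JeroMQ version). The REQ socket must strictly alternate send and receive, mirroring the request/reply exchange just described:

    import org.zeromq.SocketType;
    import org.zeromq.ZContext;
    import org.zeromq.ZMQ;

    public final class ReqRepSketch {
        public static void main(String[] args) {
            try (ZContext context = new ZContext()) {
                // REQ side: send a request, then wait for the reply before sending again.
                ZMQ.Socket req = context.createSocket(SocketType.REQ);
                req.connect("tcp://localhost:5555");   // made-up endpoint

                req.send("hello");
                String reply = req.recvStr();
                System.out.println("got reply: " + reply);
            }
        }
    }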
ZeroMQ uses its own protocol, ZMTP, to move messages around. Again, there's nothing really limiting what data is in a message. ZeroMQ is inherently asynchronous - it implements the Actor programming model (though I notice that some implementations in some languages have eroded ZeroMQ's simplicity in that respect, fitting into the language's own way of being asynchronous rather than using the poll function provided by ZeroMQ).
However, ZeroMQ builds many more data distribution patterns than REQ/REP on top of ZMTP, like PUB/SUB and DEALER/ROUTER, which HTTP simply has no equivalent of. A further difference is that ZeroMQ can use IP, interprocess comms, or in-memory transports; this makes it highly suited both for in-application use and for inter-machine distributed applications. I guess that a web server could be contacted over IPC too, but I've never heard of anyone bothering to do that. HTTP is expected to be used over specific ports (e.g. port 80), whereas ZMQ gets used on whatever ports the developer wants (obeying the normal port allocation rules if they want a quiet life).
I'm taking a Google video course about the HTTP protocol. HTTP/1.1 introduced the so-called pipelining technique to reduce the time between requests and responses. Head-of-line (HOL) blocking can occur, so browsers use parallel connections to avoid it.
I wonder how browsers send parallel network packets. I had never thought about the possibility of multiple packets being sent simultaneously; is it even possible to send parallel requests through a "cable"? How does it work?
Another thing is HTTP/2.0: do browsers implement parallel connections in this protocol? HTTP/2.0 uses streams, but I'm not sure how browsers handle them.
Nothing in HTTP is truly parallel. If multiple resources are to be transferred at once, clients have to establish multiple connections. For HTTP/1.1, it is not uncommon to see three to five of these per host.
HTTP/2 is a bit different in that it can engage in interleaving: the smallest entity in HTTP/1.x is a message, whereas in HTTP/2 it is a frame of a message. This allows HTTP/2 to transmit multiple messages "at once" (really: one frame of a given message at a time), while HTTP/1.1 could only start pipelining and possibly suffer from the HOL blocking you mentioned.
As for your question regarding multiple packets being sent simultaneously: yes, that is possible and is also regularly done. That would concern wave physics, Fourier transforms, and electrical engineering, and thus be a bit off-topic for SO ;)
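To see that interleaving from the client side (outside a browser), here is a minimal sketch using Java 11's built-in java.net.http client; the URLs are made up. Several requests are in flight at once, and over HTTP/2 their frames share a single connection instead of requiring one connection each:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    public final class Http2MultiplexingSketch {
        public static void main(String[] args) {
            // Ask for HTTP/2; the client falls back to HTTP/1.1 if the server lacks it.
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_2)
                    .build();

            List<CompletableFuture<HttpResponse<String>>> inFlight = new ArrayList<>();
            for (String path : List.of("/a", "/b", "/c")) {   // made-up resources
                HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://example.com" + path)).build();
                // All requests are issued without waiting; over HTTP/2 their frames
                // are interleaved on one connection rather than one connection each.
                inFlight.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofString()));
            }

            for (CompletableFuture<HttpResponse<String>> f : inFlight) {
                HttpResponse<String> response = f.join();
                System.out.println(response.version() + " " + response.statusCode());
            }
        }
    }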
The problem I'm trying to solve is latency in microservice-to-microservice communication on the backend. Scenario: the client makes a request to service A, which then calls service B, which calls service C, before the response returns to B, then to A, and back to the client.
Request: Client -> A -> B -> C
Response: C -> B -> A -> Client
The microservices expose a REST interface that is accessed using HTTP, where each new HTTP connection between services adds overhead. I'm looking for ways to reduce this overhead without bringing another transport mechanism into the mix (i.e. sticking to HTTP and REST as much as possible). Some answers suggest using Apache Thrift, but I'd like to avoid that. Other possible solutions are messaging queues, which I'd also like to avoid (to keep operational complexity down).
Does anyone have experience with microservice communication using HTTP connection pooling or HTTP/2? The system is deployed on AWS, where service groups are fronted by an ELB.
HTTP/1.0's working mode was to open a connection for each request and close the connection after each response.
Using HTTP/1.0 from remote clients and from clients inside microservices (e.g. those in A that call B, and those in B that call C) should be avoided, because the cost of opening a connection for each request can account for most of the latency.
HTTP/1.1's working mode is to open a connection and then leave it open until either peer explicitly requests to close it. This allows the connection to be reused for multiple requests, and it's a big win because it reduces latency, uses fewer resources, and in general is more efficient.
Fortunately nowadays both remote clients (e.g. browsers) and clients inside microservices support HTTP/1.1 well, or even HTTP/2.
Surely browsers have connection pooling, and any decent HTTP client that you may use inside your microservices does also have connection pooling.
Remote clients and microservices clients should be using at least HTTP/1.1 with connection pooling.
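As a sketch of what that pooling looks like inside a microservice, here is a client using Apache HttpClient 4.x; the service-b host name and the pool sizes are made up. Persistent HTTP/1.1 connections are kept in the pool and reused across requests, so there is no new TCP handshake per call:

    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

    public final class PooledServiceClient {
        // One shared client per application, not one per request.
        private static final CloseableHttpClient CLIENT;

        static {
            PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
            pool.setMaxTotal(100);            // total pooled connections (made-up numbers)
            pool.setDefaultMaxPerRoute(20);   // connections per target host
            CLIENT = HttpClients.custom().setConnectionManager(pool).build();
        }

        static String callServiceB(String path) throws Exception {
            // The pooled, persistent connection to service-b is reused across calls.
            try (CloseableHttpResponse response = CLIENT.execute(new HttpGet("http://service-b" + path))) {
                return response.getStatusLine().toString();
            }
        }
    }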
Regarding HTTP/2, while I am a big promoter of HTTP/2 for browser-to-server usage, for REST microservices calls inside data centers I would benchmark the parameters you are interested in for both HTTP/1.1 and HTTP/2, and then see how they fare. I expect HTTP/2 to be on par with HTTP/1.1 for most cases, if not slightly better.
The way I would do it using HTTP/2 (disclaimer, I'm a Jetty committer) would be to offload TLS from remote clients using HAProxy, and then use clear-text HTTP/2 between microservices A, B and C using Jetty's HttpClient with HTTP/2 transport.
I'm not sure AWS ELB already supports HTTP/2 at the time of this writing, but if it does not please be sure to drop a message to Amazon asking to support it (many others already did that). As I said, alternatively you can use HAProxy.
For communication between microservices, you can use HTTP/2 no matter what is the protocol used by remote clients.
By using Jetty's HttpClient, you can very easily switch between the HTTP/1.1 and HTTP/2 transports, giving you maximum flexibility.
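A minimal sketch of that setup, using the Jetty 9.4/10-era API (class names and constructors may differ in newer Jetty versions; the service-b URL is made up), talking clear-text HTTP/2 between services while TLS is offloaded at the edge:

    import org.eclipse.jetty.client.HttpClient;
    import org.eclipse.jetty.client.api.ContentResponse;
    import org.eclipse.jetty.http2.client.HTTP2Client;
    import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

    public final class JettyHttp2ServiceClient {
        public static void main(String[] args) throws Exception {
            // HttpClient with the HTTP/2 transport; swapping the transport back to
            // HTTP/1.1 leaves the rest of the client code unchanged.
            HTTP2Client http2 = new HTTP2Client();
            HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2), null);
            client.start();

            // Requests to the same service share one multiplexed connection.
            ContentResponse response = client.GET("http://service-b:8080/resource");
            System.out.println(response.getStatus());

            client.stop();
        }
    }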
If latency is really an issue to you then you should probably not be using service calls between your components. Rather you should minimize the number of times control passes to an out-of-band resource and be making the calls in-process, which is much faster.
However, in most cases, the overheads incurred by the service "wrappers" (channel construction, serialisation, marshalling, etc), are negligible enough and still well within adequate latency tolerances for the business process being supported.
So you should ask yourself:
Is latency really an issue for you, in respect to the business process? In my experience only engineers care about latency. Your business customers do not.
If latency is an issue, then can the latency definitively be attributed to the cost of making the service calls? Could there be another reason the calls are taking a long time?
If it is the services, then you should look at consuming the service code as an assembly, rather than out-of-band.
For the benefit of others running into this problem: apart from using HTTP/2, SSL/TLS offloading, and co-location, consider using caching where you can. This not only improves performance but also reduces dependency on downstream services. Also, consider data formats that perform well.
Latency in microservice-to-microservice communication is an issue for low-latency applications; however, the number of calls can be minimized with a hybrid between microservices and a monolith.
An emerging C++ microservices framework aimed at low-latency applications:
https://github.com/CppMicroServices/CppMicroServices
Apparently, I don't get truly parallel reads of different URLs on the same server, even when issuing truly concurrent requests on multiple physical interfaces (NICs).
I think the problem could be that the HTTP protocol is connection-oriented, so requests are serialized at a lower level in the TCP/IP stack (is this the correct wording?).
Does it make sense to attempt to 're-implement' a high-level HTTP request with a connectionless scheme, like UDP, and handle packet addressing myself, to speed up streaming?
HTTP requests are independent. They can be issued over arbitrarily many independent connections. HTTP does not impose any limits regarding concurrency.
You hit some resource limit. Maybe your client library restricts the number of concurrent calls. Maybe the server does. Maybe the network is fully utilized. Maybe back-end resources that the server uses are maxed out.
Find the bottleneck and eliminate it. The transport protocol is not the problem. Changing it can't help.
different URLs
Whether the URL is different or not makes no difference, except if the server implements some special throttling. Highly unlikely.
on multiple physical interfaces (NICs).
You are probably not network-bound.
requests are serialized at a lower level in the TCP/IP stack
No. Connection management is not part of HTTP. The client decides how many connections to use. Reconfigure the client.
Does it make sense to attempt to 're-implement' a high-level HTTP request with a connectionless scheme, like UDP, and handle packet addressing myself, to speed up streaming?
You would have to re-implement flow control, segment fragmentation, re-transmission and other features of the TCP protocol yourself. And then your HTTP implementation would not be compatible with the standard one.
So no, it does not make much sense.
For streaming you may like to use protocols designed for streaming, like WebRTC.
I've been working with websockets lately in detail. Created my own server and there's a public demo. I don't have such detailed experience or knowledge re: http. (Although since websocket requests are upgraded http requests, I have some.)
On my end, the server reports details of each hit. Among them are a bunch of HTTP keep-alive requests. My server doesn't handle them because they're not WebSocket requests. But it got my curiosity up.
The whole big thing about websockets is that the connection stays alive. Then you can pass messages in both directions (simultaneously even). I've read that the Keep-Alive HTTP connection is a relatively new development (I don't know how many years in people time, just that it's only included in the latest standard - 1.1 - is that actually old now?)
I guess I can assume that there's a behavioral difference between the two or there would have been no reason for a websocket standard? What's the difference?
The Keep-Alive HTTP header has existed since HTTP/1.0, and it is used to indicate that an HTTP client would like to maintain a persistent connection with the HTTP server. The main objective is to eliminate the need to open a TCP connection for each HTTP request. However, while a persistent connection is open, the communication between client and server still follows the basic HTTP request/response pattern. In other words, the server side can't push data to the client.
WebSocket is a completely different mechanism, used to set up a persistent, full-duplex connection. With this full-duplex connection, the server side can push data to the client, and the client should be expected to process data from the server side at any time.
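To illustrate that server-initiated push, here is a minimal sketch using the standard Java WebSocket API (JSR 356); the /push path and the 5-second interval are made up. Once the socket is open, the server writes to the client whenever it wants, with no pending HTTP request from the client:

    import javax.websocket.OnOpen;
    import javax.websocket.Session;
    import javax.websocket.server.ServerEndpoint;
    import java.io.IOException;
    import java.util.Timer;
    import java.util.TimerTask;

    // Deploy in any JSR 356 capable container (Tomcat, Jetty, etc.).
    @ServerEndpoint("/push")
    public class PushEndpoint {

        @OnOpen
        public void onOpen(Session session) {
            // The server decides when to send; the client just has to be listening.
            new Timer(true).schedule(new TimerTask() {
                @Override
                public void run() {
                    try {
                        if (session.isOpen()) {
                            session.getBasicRemote().sendText("server-initiated update");
                        }
                    } catch (IOException e) {
                        cancel();
                    }
                }
            }, 0, 5_000);   // push every 5 seconds (made-up interval)
        }
    }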
Quoting corresponding entries on Wikipedia for reference:
1) http://en.wikipedia.org/wiki/HTTP_persistent_connection
2) http://en.wikipedia.org/wiki/WebSocket
You should read up on COMET, a design pattern which shows the limits of HTTP Keep-Alive. Keep-Alive is over 12 years old now, so it's not a new feature of HTTP. The problem is that it's not sufficient; the client and server cannot communicate in a truly asynchronous manner. The client must always use a "hanging" request in order to get a message back from the server; the server may not just send a message to the client at any time it wants.
HTTP vs WebSockets
REST (HTTP)
Resources benefit from caching when the representation of a resource changes rarely or multiple clients are expected to retrieve the resource.
HTTP methods have well-known idempotency and safety properties. A request is “idempotent” if it can be issued multiple times without resulting in unique outcomes.
The HTTP design allows for responses to describe errors with the request, with the resource, or to provide nuanced status information to differentiate between success scenarios.
Have request and response functionality.
While HTTP/1.1 may allow multiple requests to reuse a single connection, there will generally be small timeout periods intended to control resource consumption.
You might be using HTTP incorrectly if…
Your design relies on a client polling the service often, without the user taking action.
Your design requires frequent service calls to send small messages.
The client needs to quickly react to a change to a resource, and it cannot predict when the change will occur.
The resulting design is cost-prohibitive. Ask yourself: Is a WebSocket solution substantially less effort to design, implement, test, and operate?
WebSockets
WebSocket design does not allow explicit or transparent proxies to cache messages, which can degrade client performance.
The WebSocket protocol offers support only for error scenarios affecting the establishment of the connection. Once the connection is established and messages are being exchanged, any additional error scenarios must be addressed in the messaging layer design. However, WebSockets allow for greater efficiency than REST because they do not require the HTTP request/response overhead for each message sent and received.
When a client needs to react quickly to a change (especially one it cannot predict), a WebSocket may be best.
This makes the protocol well suited to “fire and forget” messaging scenarios and poorly suited for transactional requirements.
WebSockets were designed specifically for long-lived connection scenarios; they avoid the overhead of establishing connections and sending HTTP request/response headers, resulting in a significant performance boost.
You might be using WebSockets incorrectly if…
The connection is used only for a very small number of events, or a very small amount of time, and the client does not need to react quickly to those events.
Your feature requires multiple WebSockets to be open to the same service at once.
Your feature opens a WebSocket, sends messages, then closes it—then repeats the process later.
You’re re-implementing a request/response pattern within the messaging layer.
The resulting design is cost-prohibitive. Ask yourself: Is an HTTP solution substantially less effort to design, implement, test, and operate?
Ref: https://blogs.windows.com/buildingapps/2016/03/14/when-to-use-a-http-call-instead-of-a-websocket-or-http-2-0/