I want to use gRPC to expose an interface for bidirectional transfer of large data sets (~100 MB) between two services. Because gRPC imposes a 4 MB message size limit by default, it appears that the preferred way to do this is to manually code streaming of chunks, and re-assemble them at the receiving end [1][2].
However, gRPC also allows increasing the message size limit via grpc.max_receive_message_length and grpc.max_send_message_length, making it possible to directly transmit a message of up to ~2 GB in size, without any manual chunking or streaming. Quick testing indicates that this simpler approach works equally well in terms of performance and throughput, making it seem more desirable for this use case. Assume that the entire data set is needed in memory.
Is one of these approaches inherently better than the other? Are there any potential side effects of the simpler non-chunked approach? Can I rely on the MTU-dependent fragmentation at lower layers to be sufficient to avoid network delay and other handicaps?
References:
[1] Chunking large messages with gRPC
[2] Sending files via gRPC
The 4 MB limit is there to protect clients and servers who haven't thought about message size constraints. gRPC itself is fine with going much higher (hundreds of MBs), but most applications could be trivially attacked, or accidentally run out of memory, if they allowed messages of that size.
If you're willing to receive a 100 MB message all-at-once, then increasing the limit is fine.
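For reference, here is a minimal Python sketch of raising both limits on the client and the server; the 128 MB cap and the address are placeholders, not values from the question, and you should size the cap to your largest expected message rather than to the 2 GB maximum:

    import grpc
    from concurrent import futures

    # Placeholder cap: size this to your largest expected message.
    MAX_MESSAGE_BYTES = 128 * 1024 * 1024

    OPTIONS = [
        ("grpc.max_send_message_length", MAX_MESSAGE_BYTES),
        ("grpc.max_receive_message_length", MAX_MESSAGE_BYTES),
    ]

    # Client side: raise the limits on the channel.
    channel = grpc.insecure_channel("localhost:50051", options=OPTIONS)

    # Server side: raise the limits on the server.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4), options=OPTIONS)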
I have a way to control message size when I stream the data through gRPC. Unfortunately I am not able to find info on what the optimal message size would be. I found this but it is not resolved.
Is keeping it under the 4 MB threshold good enough, or are there some guidelines?
It depends a lot on your application needs, network configuration, and language. Messages around 16-64 KB are perhaps best suited for the widest variety of configurations, including mobile. For pure throughput-oriented workloads in data centers we regularly see GB-sized messages, but 1 MB messages are perhaps pretty close to the ideal tradeoff between minimal computational overhead and immediate memory capacity needs for the amount of network pipelining they provide.
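As a concrete illustration of chunked streaming at a chosen message size, here is a minimal Python sketch; the `DataChunk` message and the client-streaming `Upload` RPC are assumptions standing in for whatever is in your .proto, and the 1 MB size follows the tradeoff described above:

    import data_pb2  # generated from your (hypothetical) .proto

    CHUNK_SIZE = 1024 * 1024  # ~1 MB per message, per the tradeoff above

    def chunks(data: bytes):
        # Yield the payload as a stream of fixed-size messages.
        for offset in range(0, len(data), CHUNK_SIZE):
            yield data_pb2.DataChunk(payload=data[offset:offset + CHUNK_SIZE])

    # response = stub.Upload(chunks(big_blob))  # client-streaming call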
I'm looking for guidelines to maximize throughput and minimize latency for gRPC unary calls. I need to achieve about 20,000 QPS at under 50 ms each. On moderate hardware (a 4-core CPU) I could only achieve about 15K QPS with an average latency of 200 ms. I'm using a Java client and server. The server does nothing except return a response. The client sends multiple concurrent requests using an async stub; the number of concurrent requests is limited. CPU remains in the ~80% range.
In comparison, using Apache Kafka I can achieve much higher throughput (hundreds of thousands of QPS), as well as latency in the 10 ms range.
If you are using grpc-java 1.21 or later with grpc-netty-shaded, you should already be using the Netty Epoll transport. If you are using grpc-netty, add a runtime dependency on io.netty:netty-transport-native-epoll (the correct version can be found in grpc-netty's pom.xml or in the version table in SECURITY.md).
The default executor for callbacks is a "cached thread pool." If you do not block (or know the limits of your blocking), specifying a fixed-size thread pool can increase performance. You can try both Executors.newFixedThreadPool and ForkJoinPool; we've seen the "optimal" choice vary depending on the work load. You specify your own executor via ServerBuilder.executor() and ManagedChannelBuilder.executor().
If you have high throughput (~Gbps+ per client with TLS; higher if plaintext), using multiple Channels can improve performance by using multiple TCP connections. Each TCP connection is pinned to a Thread, so having more TCP connections allows using more Threads. You can create the multiple Channels and then round-robin over them, selecting a different one for each RPC. Note that you can easily implement the Channel interface to "hide" this complexity from the rest of your application. This looks like it would provide you specifically with a large gain, but I put it last because it's commonly not necessary.
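The answer above is about grpc-java, but the round-robin-over-channels idea is language-neutral. A minimal Python sketch might look like the following; the address, the `DataServiceStub` class, and the `GetData` RPC are assumptions, and the `grpc.use_local_subchannel_pool` option is set so that each channel should get its own TCP connection rather than sharing a pooled subchannel:

    import itertools
    import grpc
    import data_pb2_grpc  # generated from your (hypothetical) .proto

    ADDRESS = "localhost:50051"
    NUM_CHANNELS = 4  # one TCP connection each

    channels = [
        grpc.insecure_channel(ADDRESS, options=[("grpc.use_local_subchannel_pool", 1)])
        for _ in range(NUM_CHANNELS)
    ]
    stubs = itertools.cycle([data_pb2_grpc.DataServiceStub(ch) for ch in channels])

    def call(request):
        # Each RPC goes out on the next connection in the rotation.
        return next(stubs).GetData(request)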
I am trying to make a simple general-purpose multi-threaded async downloader in Python. How many parallel connections can generally be made to a server with minimal risk of being banned or rate limited?
I am aware that the network will be the limiting factor in some cases, but let's assume for the sake of discussion that it isn't an issue here. I/O is also done asynchronously.
According to Browserscope, browsers make a maximum of 17 connections at a time.
However, according to my research, most download managers download files in multiple parts and make 8+ connections per file.
1. How many files can be downloaded at a time?
2. How many chunks of a single file can be downloaded at one time?
3. What should be the minimum size of those chunks to make the overhead of creating parallel connections worthwhile?
It depends.
While some servers tolerate a high number of connections, others don't. General web servers might be more on the high side (low two digits); file hosters might be more sensitive.
There's little to say in advance unless you can check the server's configuration; otherwise, just try, and remember the result for next time once your ban has timed out.
You should, however, watch your bandwidth: once you max out your access line, there's no gain in further increasing the number of connections.
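To make the "cap your connections" advice concrete, here is a minimal asyncio sketch assuming the aiohttp library; the URLs and the limit of 8 are placeholders:

    import asyncio
    import aiohttp

    URLS = ["https://example.com/a.bin", "https://example.com/b.bin"]  # placeholders

    async def fetch(session: aiohttp.ClientSession, url: str) -> bytes:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.read()

    async def main() -> None:
        # TCPConnector(limit=8) caps concurrent connections across all requests.
        connector = aiohttp.TCPConnector(limit=8)
        async with aiohttp.ClientSession(connector=connector) as session:
            bodies = await asyncio.gather(*(fetch(session, u) for u in URLS))
            print([len(b) for b in bodies])

    asyncio.run(main())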
We are making an application involving a server (Tomcat, Apache, Linux) and multiple mobile clients (Android, iPhone, Windows, Nokia J2ME).
Normally the clients and the server will communicate using HTTP.
I would like to know the download and upload speeds of the client from the HTTP request that it made.
Ideally I would not like to upload and then download a file just to come up with these speeds. I am assuming there might be something at the HTTP protocol level, or at some lower layer of the network, that can give me this.
If only it were that simple.
Even where the bandwidth and latency of a network are very well defined, the actual throughput will be limited by the congestion window and by where the endpoints are in establishing the slow-start threshold. These can affect throughput by a factor of 20 or more.
There's nothing in HTTP which will provide metrics for these. Some TCP stacks will expose limited information about throughput (as used by iftop and iptraf).
However, if you really want to gather useful metrics on HTTP throughput, then you need to start shoving data across the network - have a look at Yahoo's Boomerang for an implementation.
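As a rough active-measurement sketch in Python, you can time a known-size download; the test URL is a placeholder, and short transfers will understate throughput because of TCP slow start:

    import time
    import requests

    URL = "https://example.com/1mb-test-file.bin"  # placeholder test payload

    start = time.monotonic()
    body = requests.get(URL, timeout=30).content
    elapsed = time.monotonic() - start

    # Bytes -> megabits; larger payloads give more representative numbers.
    print(f"{len(body)} bytes in {elapsed:.2f} s "
          f"= {len(body) * 8 / elapsed / 1e6:.1f} Mbit/s")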
If the HTTP connection goes to the Apache server first, you can use Apache Bench to do all sorts of load testing. It comes with Apache and can be invoked with something like the following.
Suppose we want to see how fast Yahoo can handle 100 requests, with a maximum of 10 requests running concurrently:
    ab -n 100 -c 10 http://www.yahoo.com/
HTTP does not deal with connection speeds, although I could imagine a solution involving an HTTP (reverse) proxy that estimates speeds on a connection and sets custom headers to pass this info along. You would also need to associate the stats of different connections with the particular client. I have not yet seen a readily available solution for this.
Also note that:
- Network traffic can be buffered or shaped, so download speed may depend on the amount of data transferred or on the previous load on the network. Even downloading a file would therefore not be accurate.
- The amount of data transferred depends on the protocol level (payload wrapped in HTTP, wrapped in gzip, wrapped in TLS, wrapped in TCP). Which one do you want to measure? And what do you want to achieve with this measured speed?
I've seen some Real User Monitoring (RUM) tools that can do this passively (they get a feed from a SPAN port or a network TAP in front of the servers at the data centre).
There are probably ways of integrating the data they produce into your applications, but I'm not sure it would be easy, or, given the way latency and bandwidth can change dynamically on a mobile network, particularly accurate.
I guess the real thing to focus on is the design of the app: how much data is travelling across the network, how you can minimise it, etc.
Another thing to consider is whether you could offer a solution that allows some of the application to be hosted in the telcos' POPs (some telcos route all their towers back to a central POP, others have multiple POPs).
Using PL/SQL, what are good options for sending large amounts of data to client-side code?
To elaborate: server-side PL/SQL operates on a request and generates a response with a large amount of data that must be sent to the client-side code. Are there "good options" for sending down large amounts of data? What Oracle-specific pros and cons are important to consider here?
The two problems you have when you want to return large amounts of data are:
- bandwidth issues
- memory issues (both at the server and the client)
If in any way possible, you should attempt to stream the data instead of returning it all at once. You will occupy the same total bandwidth, but there is less peak usage and you prevent memory issues (at least at the server; how memory is used at the client depends on your client implementation).
Oracle provides streaming support through pipelined functions. You can find examples here and here.
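For illustration, here is a hedged Python sketch of consuming such a pipelined function in batches rather than materialising everything at once; the connection details and the `big_data_pkg.stream_rows` function are hypothetical, and the python-oracledb driver is assumed:

    import oracledb

    conn = oracledb.connect(user="app", password="secret", dsn="db-host/orclpdb1")

    with conn.cursor() as cur:
        cur.arraysize = 500  # fetch in modest batches to bound client memory
        cur.execute("SELECT * FROM TABLE(big_data_pkg.stream_rows())")
        for row in cur:  # rows arrive incrementally rather than all at once
            print(row)   # stand-in for real per-row processing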
There are no good options; always try to send the smallest amount of data to the client. Your database and network will thank you!
If you can send small chunks spread over time, that would be better than dumping everything at once.