rapache + rplumber/ jug - concurrent requests? - r

I understand that R is single-threaded and that it does not support concurrent requests. The same issue applies when we use rplumber:
R is a single-threaded programming language, meaning that it can only do one task at a time. This is still true when serving APIs using Plumber, so if you have a single endpoint that takes two seconds to generate a response, then every time that endpoint is requested, your R process will be unable to respond to any additional incoming requests for those two seconds.
What about rapache? Does it support concurrent requests? Can I use rapache as a server for rplumber or jug?

Related

How to send 50,000 HTTP requests in a few seconds?

I want to create a load test for a feature of my app. It uses Google App Engine and a VM. The user sends HTTP requests to the App Engine. It is realistic that this engine gets thousands of requests in a few seconds. So I want to create a load test where I send 20,000-50,000 requests in a timeframe of 1-10 seconds.
How would you solve this problem?
I started by trying Google Cloud Tasks, because it seems perfect for this: you schedule HTTP requests for a specific point in time. The docs say there is a limit of 500 tasks per second per queue; if you need more tasks per second, you can split the tasks across multiple queues. I did this, but Google Cloud Tasks does not execute all the scheduled tasks at the given time. One queue needs 2-5 minutes to execute 500 requests that are all scheduled for the same second.
I also tried a TypeScript script running asynchronous node-fetch requests, but it needs 77 seconds for 5,000 requests on my MacBook.
I don't think you can get 50,000 HTTP requests "in a few seconds" from "your macbook"; it's better to consider a dedicated load-testing tool (which can be deployed onto a GCP virtual machine in order to minimize network latency and traffic costs).
The tool choice is up to you: either you need a machine type powerful enough to generate 50k requests "in a few seconds" from a single virtual machine, or the tool needs to support running in clustered mode so you can kick off several machines and have them send the requests together at the same moment.
Given that you mention TypeScript, you might want to try the k6 tool (it doesn't scale across machines, though) or check out Open Source Load Testing Tools: Which One Should You Use? to see what the other options are; none of them provides a JavaScript API, but several don't require programming knowledge at all.
A tool you could consider using is siege.
It is Linux based, and running it from inside GCP avoids the additional cost of testing from an outside system.
You could deploy siege on a relatively large machine or a few machines inside GCP.
It is fairly simple to set up, but since you mention that you need 20-50k requests in a span of a few seconds, note that siege by default only allows 255 concurrent users. You can raise this limit, though, so it fits your needs.
You would need to play around with how many connections a machine can establish, since each machine will have a certain limit based on CPU, memory, and the number of network sockets. You could keep increasing the -c number until the machine gives an "Error: system resources exhausted" error or something similar. Experiment with what your virtual machine on GCP can handle.
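For reference, siege stores that cap as the limit value in its configuration file (siegerc); raise it there and then pick the concurrency and duration on the command line. The target URL and numbers below are purely illustrative:

siege -c 1000 -t 10S http://your-test-target/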

Guidelines for high-throughput low-latency unary calls in gRPC

I'm looking for guidelines to maximize throughput and minimize latency for gRPC unary calls. I need to achieve about 20,000 QPS at < 50 ms each. On moderate hardware (a 4-core CPU) I could only achieve about 15K QPS with an average latency of 200 ms. I'm using a Java client and server. The server does nothing except return a response. The client sends multiple concurrent requests using an async stub, and the number of concurrent requests is limited. CPU remains in the ~80% range.
In comparison, using Apache Kafka I can achieve much higher throughput (hundreds of thousands of QPS), as well as latency in the 10 ms range.
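For reference, the "async stub with a bounded number of concurrent requests" pattern described in the question usually looks roughly like the sketch below; EchoGrpc, EchoRequest, and EchoReply stand in for whatever your .proto generates, and the host, port, and limits are illustrative:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.stub.StreamObserver;
import java.util.concurrent.Semaphore;

public class BoundedAsyncClient {
    public static void main(String[] args) throws InterruptedException {
        ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 50051)
                .usePlaintext()
                .build();
        // EchoGrpc/EchoRequest/EchoReply are hypothetical generated classes.
        EchoGrpc.EchoStub stub = EchoGrpc.newStub(channel);

        int maxInFlight = 200;                 // cap on outstanding RPCs
        Semaphore permits = new Semaphore(maxInFlight);

        for (int i = 0; i < 100_000; i++) {
            permits.acquire();                 // wait for a free slot
            stub.echo(EchoRequest.newBuilder().setPayload("ping").build(),
                    new StreamObserver<EchoReply>() {
                        @Override public void onNext(EchoReply reply) { }
                        @Override public void onError(Throwable t) { permits.release(); }
                        @Override public void onCompleted() { permits.release(); }
                    });
        }
        permits.acquire(maxInFlight);          // drain: wait for the last responses
        channel.shutdown();
    }
}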
If you are using grpc-java 1.21 or later and grpc-netty-shaded, you should already be using the Netty Epoll transport. If you are using grpc-netty, add a runtime dependency on io.netty:netty-transport-native-epoll (the correct version can be found in grpc-netty's pom.xml or in the version table in SECURITY.md).
The default executor for callbacks is a "cached thread pool." If you do not block (or know the limits of your blocking), specifying a fixed-size thread pool can increase performance. You can try both Executors.newFixedThreadPool and ForkJoinPool; we've seen the "optimal" choice vary depending on the workload. You specify your own executor via ServerBuilder.executor() and ManagedChannelBuilder.executor().
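A minimal sketch of wiring a fixed-size pool into both sides; the port, pool sizes, and MyServiceImpl are illustrative placeholders:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.Server;
import io.grpc.ServerBuilder;
import java.util.concurrent.Executors;

public class FixedPoolSetup {
    public static void main(String[] args) throws Exception {
        // MyServiceImpl stands in for your generated service implementation.
        Server server = ServerBuilder.forPort(50051)
                .addService(new MyServiceImpl())
                .executor(Executors.newFixedThreadPool(8)) // replaces the default cached pool
                .build()
                .start();

        // The client-side callback executor is set the same way on the channel builder.
        ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 50051)
                .usePlaintext()
                .executor(Executors.newFixedThreadPool(8))
                .build();

        server.awaitTermination();
    }
}

The right pool size depends on the workload, which is why the advice above is to try both a fixed pool and ForkJoinPool and measure.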
If you have high throughput (~Gbps+ per client with TLS; higher if plaintext), using multiple Channels can improve performance by using multiple TCP connections. Each TCP connection is pinned to a Thread, so having more TCP connections allows using more Threads. You can create the multiple Channels and then round-robin over them, selecting a different one for each RPC. Note that you can easily implement the Channel interface to "hide" this complexity from the rest of your application. This looks like it would provide you specifically with a large gain, but I put it last because it's commonly not necessary.
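One hedged way to sketch the multi-channel idea, with the host, port, and pool size as placeholders; stubs are then created against whatever channel next() returns:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative round-robin pool: one channel (and therefore one TCP connection) per slot.
public class ChannelPool {
    private final List<ManagedChannel> channels = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    public ChannelPool(String host, int port, int size) {
        for (int i = 0; i < size; i++) {
            channels.add(ManagedChannelBuilder.forAddress(host, port)
                    .usePlaintext()
                    .build());
        }
    }

    // Hand out a different channel for each RPC.
    public ManagedChannel next() {
        int i = Math.floorMod(counter.getAndIncrement(), channels.size());
        return channels.get(i);
    }

    public void shutdown() {
        channels.forEach(ManagedChannel::shutdown);
    }
}

Wrapping this behind your own implementation of the Channel interface, as suggested above, keeps the rest of the application unaware that several connections are in play.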

How many requests can be processed simultaneously by R OpenCPU

I am new to OpenCPU. I looked at the documentation at https://www.opencpu.org/, and it looks like OpenCPU can process HTTP requests concurrently? I ask because R itself only has a single-threaded mode, so how many requests can it process concurrently?
Thanks.
If you run the Apache-based opencpu-server, there is no limit to the number of concurrent requests. You can tweak the number of workers in the prefork settings.
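For example, on a stock Apache 2.4 installation the worker counts are set with prefork MPM directives like these (values are illustrative; on Debian/Ubuntu they typically live in /etc/apache2/mods-available/mpm_prefork.conf):

<IfModule mpm_prefork_module>
    StartServers             5
    MinSpareServers          5
    MaxSpareServers         10
    MaxRequestWorkers      150
    MaxConnectionsPerChild   0
</IfModule>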
The local single-user server in R on the other hand only uses a single R process. You can still make concurrent requests, but they will automatically be queued and processed one after the other.
One way or another, you shouldn't worry about it in the client.

Does Wookie Common Lisp server process requests in parallel?

I have a Wookie-based app accepting requests behind nginx. The app works in general, but I'm running into some issues with parallel requests. For instance, when the app accepts a long-running request (R1) to generate a report from a dataset in the database (mongodb, via cl-mongo), it would appear unresponsive to any following request (R2) that comes in before the response to R1 starts being sent over the network.
The client reports an error in communicating with the server for R2, but after the server finishes with R1 and finally sends the response, it tries to process R2 (as evident from debugging output) -- it performs proper routing, etc., only too late.
Putting blackbird promises around the request processing routines didn't help (and was probably excessive anyway as Wookie is designed to be async).
So what's the proper way to handle this? I'm probably okay with clients waiting for a long time for their responses (via very long timeouts), but it would be much better to process short requests in parallel.
The idea of the libraries underlying cl-async (libevent2, libuv) is to use the I/O wait time of one task (request) as CPU time for another task (request). So it is just a mechanism to avoid wasting I/O wait time. The only thing happening in parallel is I/O, with at most one task using the CPU at a time (per thread/process, depending on the implementation).
If your requests need on average x ms of CPU time, then as soon as you have n requests in parallel, where n is the number of cores, your (n+1)st request has to wait at least x ms, regardless of whether you use a threaded or an event-based server.
You can of course spawn more server processes and use load balancing to make use of all available cores.
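Since the app already sits behind nginx, one way to sketch that is an upstream block that balances across several Wookie processes, one per core; the upstream name and ports here are only illustrative:

upstream wookie_workers {
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
    server 127.0.0.1:8083;
    server 127.0.0.1:8084;
}

server {
    listen 80;
    location / {
        proxy_pass http://wookie_workers;
    }
}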

How to test the performance of an http JSON server?

How do you test the performance of an HTTP server that serves and accepts only JSON requests (POST and GET)? I'm new to web testing, so tell me if I'm trying to do it the wrong way.
I want to test if:
the server is capable of handling hundreds of simultaneous connections.
the server is capable of serving thousands of requests per second.
the server does not crash or get stuck when the number of requests exceeds its capabilities, and continues to run normally when the number of requests drops back below average.
One way is to write some logic that repeats certain actions per run, and then run multiple copies of it in parallel (see the sketch below).
PS: Ideally, the tool/method should support compression like gzip as an option.
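To make the "repeat certain actions per run" idea above concrete, here is a minimal, purely illustrative Java sketch that fires a batch of JSON POSTs concurrently and reports throughput; the URL, payload, and request count are placeholders, and the Accept-Encoding header merely asks the server for gzip:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class JsonLoadSketch {
    public static void main(String[] args) {
        URI target = URI.create("http://localhost:8080/api"); // placeholder endpoint
        String payload = "{\"ping\":true}";                   // placeholder JSON body
        int requests = 1000;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(target)
                .header("Content-Type", "application/json")
                .header("Accept-Encoding", "gzip")            // request compressed responses
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        long start = System.nanoTime();
        List<CompletableFuture<HttpResponse<String>>> inFlight = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            inFlight.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofString()));
        }
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d requests in %.2fs (%.0f req/s)%n", requests, seconds, requests / seconds);
    }
}

Dedicated tools such as JMeter or ab (mentioned in the answers below) do the same job with far less effort, but a sketch like this is handy for checking one specific JSON endpoint end to end.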
You can try JMeter and its HTTPSampler.
As for gzip: I've never used it in JMeter, but it seems it can handle it:
How to get JMeter to request gzipped content?
Apache Bench (ab) is a command line tool that's great for these kinds of things. http://en.wikipedia.org/wiki/ApacheBench
ab -n 100 -c 10 http://www.yahoo.com/
If you are new to web testing then there are a lot of factors that you need to take into account. At the most basic level you want to do the things you have outlined.
Beyond this, you need to think about how poorly performing clients might impact your service, e.g. keeping connections alive, sending malformed requests, etc. These may translate into exceptions on the server, which might in turn have additional impact (due to logging or slower execution). This means that you have to think of ways to break the service and to monitor events that have an impact at higher scales.
Microsoft have a fairly good introduction to performance testing for web applications.

Resources