Would R Plumber served through Apache be able to handle multiple requests?

I'm pretty new to R Plumber. I'm trying to deploy an R function as an API so I can do live calculations for a web app. I understand R is single-threaded by default and, hence, Plumber inherits the same limitation when handling requests. The R function I'm trying to deploy is not costly, but it will probably be called multiple times in a single session.
I'm also quite a newbie in terms of serving/deploying web applications, but I do know how to set up an Apache server. I've noticed that Apache can receive and process multiple requests by opening new threads (I honestly consider this a black-box, magical thing and have zero knowledge of how Apache does it). Would serving the Plumber API through Apache allow me to bypass the single-thread limitations?
Alternatively, would it be possible to bypass the single-thread limitations by using doParallel (or something similar)?

Plumber is single-threaded and, as such, can only work on a single request at a time. I found out that, with enough resources, you can deploy several Plumber servers listening on different ports (see the sketch below). It's hacky and nasty, but it gets the job done.
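As an illustration of that workaround, here is a minimal sketch. It assumes your endpoints live in a file called api.R and uses the callr package to launch the background processes; both the file name and the choice of callr are assumptions for the example, not part of the original answer.

    ports <- 8001:8004

    # Launch one background R process per port, each serving the same API.
    workers <- lapply(ports, function(p) {
      callr::r_bg(
        function(port) plumber::plumb("api.R")$run(host = "0.0.0.0", port = port),
        args = list(port = p)
      )
    })

Apache can then fan requests out across those ports with mod_proxy_balancer. A hypothetical config fragment (ports match the sketch above):

    # Requires mod_proxy, mod_proxy_http, mod_proxy_balancer (and one of
    # the mod_lbmethod_* modules). Each BalancerMember is one Plumber
    # instance from the R sketch above.
    <Proxy "balancer://plumber">
        BalancerMember "http://127.0.0.1:8001"
        BalancerMember "http://127.0.0.1:8002"
        BalancerMember "http://127.0.0.1:8003"
        BalancerMember "http://127.0.0.1:8004"
    </Proxy>
    ProxyPass        "/api/" "balancer://plumber/"
    ProxyPassReverse "/api/" "balancer://plumber/"

Note that Apache itself does not make R concurrent: each Plumber process still handles one request at a time, so the number of truly simultaneous requests equals the number of instances you run.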

Related

HTTP response times GUI

I'm looking for an application, available on CentOS, that lets me check periodic connectivity response times between that server and a specific port on a remote server (in this case one serving a SOAP API).
Ideally it would send periodic API calls but, failing that, just telnet to that remote port, and it should show the results in a graph.
Does anyone know of an application that allows this, without my having to write a script that logs results to a file, which is harder to read from a time perspective?
After digging and testing a bit more, I ended up using netdata:
https://www.netdata.cloud/
Awesome tool, extremely simple to use and install.

Will Docker suffice for a Shiny app with ~100 connections or do I need Shiny Proxy?

I'm looking for a free and open-source option for serving a Shiny app to ~100 of my students simultaneously. I tried to do this with Shiny Server Open Source and it throttled. Users got a message like
Too Many Users
Sorry, but this application has exceeded its quota of concurrent users. Please try again later.
After searching on that error message, I now know that I can increase the number of concurrent connections, but I'm afraid of bottlenecks due to R's single-threadedness. I'm aware of ShinyProxy and I've been experimenting with it, but it seems to add an extra layer of complexity that I don't need.
I've served Shiny apps before with Docker (but not to this large of an audience), so I'm wondering if it will be sufficient.
My question is this: if I don't need authentication (user logins), will Docker suffice for a single page application for ~100 simultaneous connections? Or do I really need Shiny Proxy?
Corollary: how can I test this and ensure that it will work (outside of getting in front of a 100-student class and testing on the fly)?
Do you care if they all share the same underlying R process?
The open-source version of Shiny Server allows you to serve apps, but all users of an app share a single R process. So if your app has a long-running simulation, then while one user runs it, it will tie up the R process and block the other users until it finishes.
I don't know that there is a limit on concurrent connections if you don't mind them sharing the R process as described above. You can try increasing the simple_scheduler setting; see Section 3.1.2 (Simple Scheduler) in the documentation for shiny-server.conf (typically at /etc/shiny-server/).
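For reference, a sketch of what that looks like in shiny-server.conf; the paths are the defaults from the Shiny Server docs, and the limit of 100 is just an example value:

    run_as shiny;

    server {
      listen 3838;

      location / {
        site_dir /srv/shiny-server;
        log_dir  /var/log/shiny-server;
        # Cap on concurrent sessions for each application served from
        # this location; exceeding it produces the "Too Many Users" page.
        simple_scheduler 100;
      }
    }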
If you don't care about them all being at the same URL, you could just use multiple instances of the open-source shiny-server, for instance in docker containers hosted on your machine at different ports.
If you want to do something like load-balance between instances of your application (horizontally scaling behind a single URL), you'll need either Shiny Server Pro, ShinyProxy, or a load balancer with sticky sessions. This is because Shiny apps hold state in memory in an R session, so if you send your students to a URL backed by n instances of your app with no stickiness guarantee, then an individual student's actions will not necessarily land on the same instance each time, and the app won't work as expected.
Shiny Server Pro and ShinyProxy handle this stickiness for you with cookies and headers. Depending on your cloud-services provider, it probably supports a browser cookie, which would work as long as you don't need your students to be able to open multiple tabs of your app on different instances.
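On the corollary about testing ahead of class: one option (not mentioned above, so treat it as an editor's suggestion) is RStudio's shinyloadtest package together with its shinycannon replay tool. A sketch, with placeholder URLs and file names:

    # 1. Record a typical user session against the running app:
    shinyloadtest::record_session("http://localhost:3838/myapp/")

    # 2. Replay the recording with ~100 simulated users. shinycannon is
    #    a separate command-line tool, run from a shell:
    #      shinycannon recording.log http://localhost:3838/myapp/ \
    #        --workers 100 --loaded-duration-minutes 5 --output-dir run100

    # 3. Analyse the latencies back in R:
    df <- shinyloadtest::load_runs("run100")
    shinyloadtest::shinyloadtest_report(df, "report.html")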

R web server that handles sessions

I am not sure if this is the right place to ask this question; if not, please point me to where it belongs.
I must build a multi-user, stateful (sessions; object persistence) web application that uses .NET in the backend and must connect to R in order to perform calculations on data that lives in a SQL Server 2016 DB. Basically, I need to connect a Microsoft-based backend with R.
Everything is clear, except for the problem that I need to find an R server that handles sessions. I know shiny but I can't use it (long story).
rApache and OpenCPU do not handle sessions.
Rserve for Windows is very limited: parallel connections are not supported, subsequent connections share the same namespace, and sessions are not supported (a consequence of the lack of parallel connections).
Finally, I have seen Rook (i.e. Run R/Rook as a web server on startup), but I can't find anywhere, even in the docs, whether it is able to deal with sessions. My question is: is there a non-stateless R web server, or does anyone know whether Rook is stateless?
EDIT:
Apparently, this question has been around for longer: http://jeffreyhorner.tumblr.com/about#comment-789093732
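To make the stateful/stateless distinction concrete, below is a minimal sketch (illustrative only, not a recommendation of any particular server) of per-client sessions layered on httpuv: state lives server-side in the R process, keyed by a cookie. All names here are made up for the example.

    library(httpuv)

    sessions <- new.env(parent = emptyenv())  # session id -> state

    new_sid <- function() {
      paste(sample(c(0:9, letters[1:6]), 32, replace = TRUE), collapse = "")
    }

    app <- list(
      call = function(req) {
        # Recover the session id from the Cookie header, if present
        cookie <- if (is.null(req$HTTP_COOKIE)) "" else req$HTTP_COOKIE
        m <- regmatches(cookie, regexpr("sid=[0-9a-f]{32}", cookie))
        sid <- if (length(m) == 1) sub("^sid=", "", m) else new_sid()

        # Look up (or initialise) this session's server-side state
        state <- if (exists(sid, envir = sessions)) {
          get(sid, envir = sessions)
        } else {
          list(hits = 0)
        }
        state$hits <- state$hits + 1
        assign(sid, state, envir = sessions)

        list(
          status  = 200L,
          headers = list(
            "Content-Type" = "text/plain",
            "Set-Cookie"   = sprintf("sid=%s; HttpOnly", sid)
          ),
          body = sprintf("requests in this session: %d", state$hits)
        )
      }
    )

    runServer("0.0.0.0", 8080, app)  # blocks and serves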

Sails.js distribution across multiple Google Compute Engine instances

Sails.js requires setup to handle horizontal scaling, and there are multiple ways to do this. I'm not sure I have done it correctly, given the poor performance I see during load testing. Please confirm whether I understand and have done the setup correctly.
I've created a load balancer on the Google platform to distribute requests across the instances. Much is said about Nginx for this, but I understand Google's load balancer does all I need in this regard. Note that I use session affinity: Client IP.
I've set up config/session.js to use express-mysql-session, so MemoryStore is not used.
I haven't set up anything in config/sockets.js. My project doesn't use live chat etc. with socket.io; all requests go to Waterline for data from the DB. But if this is an issue, please point me to a way to do this with a MySQL DB, not Redis (or memory).
I use pm2 to keep the app alive and to distribute processing on an instance (see the sketch after this list).
Those are the main factors I've found regarding horizontal scaling with Sails.js.
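On the pm2 point: cluster mode is what spreads requests across the cores of a single instance, and it is exactly why the session store must be shared (hence express-mysql-session) rather than in-memory. A hypothetical invocation, assuming app.js is the Sails entry point:

    # One worker per core; requests may land on any worker, so session
    # state cannot live in a single process's memory.
    pm2 start app.js -i max --name sails-app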

How to send a response directly from the worker to the client

When Nginx is used as a reverse proxy, so that the client connects to Nginx and Nginx load-balances or otherwise redirects the request to a backend worker (via CGI, etc.), what is it called, and how is it implemented, when the worker responds directly to the client, bypassing Nginx?
The source of my question is twofold: a) erlangonxen uses Nginx and a "spawner" app to launch a huge volume of instant-on workers; however, the response still passes through the spawner (an expensive step). b) I recently scanned an article that described this solution, but I can no longer find it.
You've got your jargon mixed up, I believe, so I'm going to ignore the proxy bit and assume this is about CGI. In that case you should be looking at FastCGI solutions. Nginx has support for FastCGI built in.
This spawner, as you call it, is meant to provide concurrency, so that multiple CGI requests can be handled in parallel without spawning an interpreter for each request. Instead, the workers are spawned once and ideally live forever.
If the selection of an available worker really is a performance bottleneck, then the implementation of this FastCGI daemon is severely lacking and you should look for a better solution. Worker selection should take only a tiny fraction of the time of the worker's job.
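A minimal sketch of the built-in FastCGI support the answer refers to; the upstream name and worker addresses are placeholders:

    # nginx.conf fragment: long-lived FastCGI workers behind one URL.
    upstream fastcgi_workers {
        server 127.0.0.1:9000;
        server 127.0.0.1:9001;
    }

    server {
        listen 80;

        location / {
            include fastcgi_params;        # standard FastCGI variables
            fastcgi_pass fastcgi_workers;  # nginx picks a worker per request
        }
    }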
I'm not sure it's a jargon thing. The good news (for me, anyway) is that I had read the articles and seen the diagrams; I just could not remember where. So, reverse proxy notwithstanding, I was looking for "direct server return" (DSR) and the spawner from the erlangonxen project.
I'm not certain whether or not these two technologies will work together. DSR seems to have fallen out of favor, and I'll probably not use it at all, although in the given architecture it would seem to make sense to try: a) it limits the total number of trips and sockets, and b) it allows some functions, like gzip, to be distributed nicely.
Anyway, "found it".
