R web server that handles sessions

I am not sure if this is the right place to ask this question. If it isn't, please point me to the right place.
I must build a multi-user, stateful (sessions; object persistence) web application that will use .NET in the backend and must connect to R in order to perform calculations on data that lives in a SQL Server 2016 database. Basically, I need to connect a Microsoft-based backend with R.
Everything is clear except that I need to find an R server that handles sessions. I know Shiny, but I can't use it (long story).
rApache and OpenCPU do not handle sessions.
Rserve for Windows is very limited: parallel connections are not supported, subsequent connections share the same namespace, and sessions are not supported (a consequence of the lack of parallel connections).
Finally, I have seen Rook (i.e. Run R/Rook as a web server on startup), but I can't find anywhere, not even in the docs, whether it is able to deal with sessions. My question is: is there a stateful R web server, or does anyone know whether Rook is stateless?
EDIT:
Apparently, this question has been around for a while: http://jeffreyhorner.tumblr.com/about#comment-789093732
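What the question asks for boils down to a server that keeps one isolated workspace per session and finds it again on every request. The sketch below shows only that bookkeeping, in Python, assuming a hypothetical random session ID (the kind a server would send back as a cookie) and a plain dict standing in for an R environment. `SessionStore` is an illustrative name, not part of Rook, Rserve, or any R package.

```python
import secrets
import threading


class SessionStore:
    """Keeps one isolated workspace (a dict standing in for an R environment) per session."""

    def __init__(self):
        self._workspaces = {}
        self._lock = threading.Lock()  # the mapping may be touched by concurrent requests

    def create(self):
        # Unguessable ID, e.g. returned to the browser as a session cookie.
        session_id = secrets.token_hex(16)
        with self._lock:
            self._workspaces[session_id] = {}
        return session_id

    def workspace(self, session_id):
        # Raises KeyError for unknown or already destroyed sessions.
        with self._lock:
            return self._workspaces[session_id]

    def destroy(self, session_id):
        with self._lock:
            self._workspaces.pop(session_id, None)


store = SessionStore()
alice = store.create()
bob = store.create()
store.workspace(alice)["x"] = 42      # kept until Alice's next request
print(store.workspace(alice)["x"])    # 42
print("x" in store.workspace(bob))    # False: Bob's workspace is separate
```

A real server would also expire idle sessions; that is left out here.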

Related

Will Docker suffice for a Shiny app with ~100 connections or do I need Shiny Proxy?

I'm looking for a free and open-source option for serving a Shiny app to ~100 of my students simultaneously. I tried to do this with the open-source Shiny Server and it throttled. Users got a message like
Too Many Users
Sorry, but this application has exceeded its quota of concurrent users. Please try again later.
After searching for that error message, I now know that I can increase the number of concurrent connections, but I'm afraid of bottlenecks due to R being single-threaded. I'm aware of ShinyProxy and I've been experimenting with it, but it seems to add an extra layer of complexity that I don't need.
I've served Shiny apps with Docker before (but not to an audience this large), so I'm wondering whether it will be sufficient.
My question is this: if I don't need authentication (user logins), will Docker suffice for a single-page application with ~100 simultaneous connections? Or do I really need ShinyProxy?
Corollary: how can I test this and ensure that it will work (outside of getting in front of a 100 student class and testing on the fly)?
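One way to rehearse ~100 simultaneous users before class is to fire that many concurrent requests at the app and look at the latencies. A rough sketch using only the Python standard library; `fetch` is a placeholder you would replace with a real request, and the URL in its comment is hypothetical (3838 is Shiny Server's default port). Shiny apps talk over WebSockets after the first page load, so plain HTTP requests only exercise the initial load; RStudio's shinyloadtest package, which records and replays full sessions, would give a more faithful test.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(fetch, n_users=100):
    """Call `fetch` once per simulated user, all at the same time.

    Returns the per-request latencies in seconds, sorted ascending.
    """
    def timed_call(_):
        start = time.perf_counter()
        fetch()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=n_users) as pool:
        latencies = list(pool.map(timed_call, range(n_users)))
    return sorted(latencies)


def fake_fetch():
    # Placeholder: swap in something like
    # urllib.request.urlopen("http://localhost:3838/myapp/").read()
    time.sleep(0.01)


latencies = load_test(fake_fetch, n_users=100)
print(f"median={latencies[len(latencies) // 2]:.3f}s  max={latencies[-1]:.3f}s")
```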
Do you care if they all share the same underlying R process?
The open-source version of shiny-server lets you serve apps, but all users of an app share a single R process. So if your app has a long-running simulation, while one user runs it, it ties up the R thread and blocks the other users until it finishes.
I don't know that there is a limit on the concurrent connections, if you don't mind them sharing the R process as described above. You can try increasing the simple_scheduler setting, see Section 3.1.2 Simple Scheduler, in the documentation for shiny-server.conf (typically at /etc/shiny-server/).
If you don't care about them all being at the same URL, you could just use multiple instances of the open-source shiny-server, for instance in docker containers hosted on your machine at different ports.
If you want to load-balance between instances of your application (scaling horizontally behind a single URL), you'll need Shiny Server Pro, ShinyProxy, or a load balancer with sticky sessions. This is because Shiny apps keep state in memory in an R session: if your students go to a URL backed by n instances of your app and there is no stickiness guarantee, a given student's actions will not necessarily reach the same instance each time, and the app won't work as you expect.
Shiny Server Pro and ShinyProxy handle this stickiness for you with cookies and headers. Depending on your cloud-services provider, its load balancer probably supports cookie-based stickiness, which would work as long as your students don't need to open multiple tabs of your app backed by different instances.
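A load balancer with sticky sessions essentially maps a per-browser cookie to one backend and keeps that mapping stable. A minimal Python sketch, assuming three hypothetical shiny-server containers on ports 3838-3840 and a cookie named `session` (both made up for illustration):

```python
import hashlib
import secrets

# Hypothetical backends: several shiny-server containers on different ports.
BACKENDS = ["http://127.0.0.1:3838", "http://127.0.0.1:3839", "http://127.0.0.1:3840"]


def pick_backend(session_cookie, backends=BACKENDS):
    """Map a session cookie to the same backend every time (sticky routing)."""
    digest = hashlib.sha256(session_cookie.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def route(cookies):
    """Return (backend, cookies), issuing a session cookie on the first visit."""
    cookies = dict(cookies)
    if "session" not in cookies:
        cookies["session"] = secrets.token_hex(16)
    return pick_backend(cookies["session"]), cookies


first_backend, cookies = route({})        # new student: gets a cookie and a backend
second_backend, _ = route(cookies)        # later request with the same cookie
print(first_backend == second_backend)    # True
```

Hashing modulo the number of backends reshuffles users whenever an instance is added or removed, which is why production load balancers commonly store the chosen backend in the cookie itself instead.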

Would R Plumber served through Apache be able to handle multiple requests?

I'm pretty new to R Plumber. I'm trying to deploy an R function as an API to do live calculations for a web app. I understand that R is single-threaded by default and, hence, Plumber inherits the same limitation when handling requests. The R function I'm trying to deploy is not costly, but it will probably be called multiple times in a single session.
I'm also quite new to serving/deploying web applications, but I do know how to set up an Apache server. I've noticed that Apache can receive and process multiple requests by opening new threads (honestly, I consider this a magical black box and know nothing about how Apache does it). Would serving the Plumber API through Apache allow me to bypass the single-thread limitation?
Alternatively, would it be possible to bypass the single-thread limitation by using doParallel (or something similar)?
Plumber is single-threaded and, as such, can only handle a single request at a time. I found out that, with enough resources, you can deploy several Plumber servers listening on different ports. It's hacky and nasty, but it got the job done.
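With several Plumber processes on different ports (each started along the lines of `plumb("api.R")$run(port = 8001)`), something has to spread requests across them. Because each Plumber call here is independent, plain round-robin is enough. Apache's own mod_proxy_balancer can play this role in front of the ports; the Python below only sketches the rotation itself, and the ports are hypothetical.

```python
import itertools
import threading


class RoundRobin:
    """Hand out backend URLs in rotation; safe to call from many threads."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)
        self._lock = threading.Lock()

    def next_backend(self):
        with self._lock:
            return next(self._cycle)


# Hypothetical: four Plumber processes, one R process per port.
pool = RoundRobin([f"http://127.0.0.1:{port}" for port in range(8001, 8005)])
for _ in range(5):
    print(pool.next_backend())
# http://127.0.0.1:8001, :8002, :8003, :8004, then :8001 again
```

Each R process still handles one request at a time, so throughput scales with the number of processes, not threads.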

How does the Realm Mobile Platform scale?

You could say I am a fan of the Realm Mobile Platform. I'm using it and it seems to be working well.
However, I am confused about how to operate it in production. It seems to be deployed to only one server, and even the Professional and Enterprise Editions run on my single server.
Assuming Realm has thought of this (the Enterprise Edition supports 'enterprise scaling'), how does this work if all clients point to my own server URL?
Another question is how to monitor the load on that server.
Thanks!
The Professional Edition and the Enterprise Edition emit statsd compatible metrics which allow you to track the usage and load on each node in a Realm Object Server cluster. These metrics are also used internally inside the cluster in order to display statistics about the health of the cluster.
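statsd metrics travel as short text lines of the form `<name>:<value>|<type>` (for example `c` for counters, `g` for gauges, `ms` for timers), usually over UDP port 8125. In practice you would point the server at an existing statsd/Graphite stack; this Python sketch only shows what arrives on the wire. The metric name in the example is hypothetical, not one the Realm Object Server is known to emit.

```python
import socket


def parse_statsd(line):
    """Split one plain statsd line '<name>:<value>|<type>[|@rate]' into its parts."""
    name, rest = line.split(":", 1)
    fields = rest.split("|")
    value, metric_type = fields[0], fields[1]  # the optional sample rate is ignored
    return name, float(value), metric_type


def listen(host="127.0.0.1", port=8125, max_packets=10):
    """Print metrics arriving on statsd's conventional UDP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        for _ in range(max_packets):
            data, _addr = sock.recvfrom(65535)
            # A single packet may carry several newline-separated metrics.
            for line in data.decode("utf-8").splitlines():
                if line:
                    print(parse_statsd(line))


# Hypothetical metric name.
print(parse_statsd("ros.node1.connections:57|g"))  # ('ros.node1.connections', 57.0, 'g')
```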
We are obviously still adding metrics as we understand more about our customers' use cases, and fine-tuning the ones we have.
With regards to the way the clustering works, we are currently implementing this according to an iterative process, where we add more and more features, and more and more resilience to the system with every passing day.
Basically, we have a logical load balancer process, which receives the incoming client connections and then dispatches each one to a node inside the cluster. This logical load balancer can itself be made highly available and load-balanced, just like any regular WebSocket connection handler. Handling many connections is easy these days; it's the quadratic merge algorithms that are expensive on the Realm Object Server, which is why clustering is required for deployments at scale.
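The answer doesn't say which policy the logical load balancer uses to pick a node. Least-connections is one common choice. It is sketched here in Python with hypothetical node names, only to make the "receive, then dispatch to a node" step concrete.

```python
class Dispatcher:
    """Hands each new client connection to the node with the fewest open connections."""

    def __init__(self, nodes):
        self.connections = {node: 0 for node in nodes}

    def assign(self):
        # min() returns the first minimum, so ties go to the earliest-listed node.
        node = min(self.connections, key=self.connections.get)
        self.connections[node] += 1
        return node

    def release(self, node):
        # Call when a client disconnects.
        self.connections[node] -= 1


lb = Dispatcher(["node-a", "node-b", "node-c"])
print([lb.assign() for _ in range(4)])  # ['node-a', 'node-b', 'node-c', 'node-a']
lb.release("node-b")
print(lb.assign())                      # node-b
```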

Is there any way for one Rserve client to share a workspace with another?

I'd like to replace RExcel with Excelsi-R. Excelsi-R talks to R via Rserve, and Rserve has a feature that makes each client work in an independent workspace.
What I want is to share a single workspace between at least 2 simultaneously connected clients. One client would be run by Excelsi-R, and the other by a manually launched interactive R session. That would allow me to interact with the Excelsi-R session in the traditional way (say, in RStudio).
I don't need asynchronous computation; I'm perfectly happy if Excelsi-R would have to wait, until a command issued by the other connection finishes, and vice versa; just like in the RExcel "foreground mode".
Is it possible?
Not currently, since each process has exactly one connection. There are a few hacks - such as you can "switch" sessions by starting a listener for another connection in an existing session - but that may be a bit too limited.
That said, it is technically possible (Rserve supports looping over multiple connections; it is used in RCloud to support two separate processes on one connection). The challenge is how to link two independent connections to a single process. The rsio communication was added in Rserve 1.8 specifically to allow passing descriptors between Rserve instances, but it has not been used so far. If there is interest in that kind of functionality, I can see how it could be added.
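The behaviour the question asks for (one workspace, two clients, each command waiting for the other's to finish) amounts to a single shared environment guarded by a lock. This Python sketch only models those semantics. It is not how Rserve works today, and `SharedWorkspace` is a hypothetical name.

```python
import threading


class SharedWorkspace:
    """One workspace used by several clients; commands run strictly one at a time."""

    def __init__(self):
        self.env = {}
        self._lock = threading.Lock()

    def run(self, command):
        # A caller blocks here until any other client's command has finished,
        # mirroring the "foreground mode" behaviour described in the question.
        with self._lock:
            return command(self.env)


ws = SharedWorkspace()

# Client 1 (standing in for Excelsi-R) defines a variable from another thread.
excel_client = threading.Thread(
    target=ws.run, args=(lambda env: env.update(prices=[1.0, 2.0, 3.0]),)
)
excel_client.start()
excel_client.join()

# Client 2 (standing in for the interactive session) sees the same variable.
print(ws.run(lambda env: sum(env["prices"])))  # 6.0
```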

Should I use a standalone Coherence server for a Java web service that uses cached data?

I am new to Oracle Coherence.
Basically, we have some data, and we want a Java/BPEL web service to get that data from the Coherence cache instead of the database. [We are planning to load all of that data into the cache server.]
So we have the following questions before we start on this solution.
The web service we are planning to start with will be plain Java.
All operations are read-only.
Questions
1. Does Coherence need to be a standalone server (downloaded from Oracle, installed separately, running the default cache server)?
2. If so, we are planning to preload the data from the database into the cache server with code. I hope that's possible? Any pointers would be helpful.
3. How does the web service connect to the Coherence server if the web service runs on a different machine from the Coherence server?
(OR)
Is it mandatory that the web service and Coherence run on the same machine?
If the web service can run on a different machine, how does the web service code connect to the Coherence server (any code sample or URL would be helpful)?
Also, what is the Coherence that comes with WebLogic? I assume it is not a fit for our application's design? If so, what type of solution calls for WebLogic with Coherence?
FYI: our goal is simple. We want to store the data in a cache server and have our new web service retrieve it from the cache server instead of the database (because we want to avoid database round trips).
Well, your questions are very open and probably have more than one correct answer. I'll try to answer all of them.
First, please keep in mind that Coherence is not a free tool and you have to pay for a license.
Now, to the answers:
Basically, Coherence has 2 parts: proxy and server. The first is responsible for routing your requests and the second for hosting the data. You can run both together in the same service, but this has pros and cons. One con is that your services will be heavily loaded and memory will be shared between the two kinds of processes. A pro is that it is very simple to run.
You can preload all the data from the DB. For that, you have to write your own code: define your own CacheStore (look for that keyword in the Coherence docs) and override the loadAll method.
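In Coherence itself this logic would live in a Java class implementing its CacheStore (or CacheLoader) interface, whose load and loadAll methods fetch values by key. The Python below only mirrors that shape to show the preload-then-read-through idea, with an in-memory SQLite database standing in for the real one; `DbLoader`, `ReadThroughCache`, and the `items` table are hypothetical.

```python
import sqlite3


class DbLoader:
    """Stand-in for a CacheLoader: fetches values from the database by key."""

    def __init__(self, conn):
        self.conn = conn

    def load(self, key):
        row = self.conn.execute("SELECT value FROM items WHERE id = ?", (key,)).fetchone()
        return row[0] if row else None

    def load_all(self, keys):
        keys = list(keys)
        if not keys:
            return {}
        placeholders = ",".join("?" * len(keys))
        rows = self.conn.execute(
            f"SELECT id, value FROM items WHERE id IN ({placeholders})", keys
        ).fetchall()
        return dict(rows)


class ReadThroughCache:
    def __init__(self, loader):
        self.loader = loader
        self.data = {}

    def preload(self, keys):
        # Warm the cache with one bulk query instead of one query per key.
        self.data.update(self.loader.load_all(keys))

    def get(self, key):
        if key not in self.data:  # cache miss: fall back to the database
            value = self.loader.load(key)
            if value is not None:
                self.data[key] = value
        return self.data.get(key)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

cache = ReadThroughCache(DbLoader(conn))
cache.preload([row[0] for row in conn.execute("SELECT id FROM items")])
print(cache.get(2))  # b (served from the cache, no database trip)
```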
As far as I remember, Coherence comes together with WebLogic. That is, the license for one is the license for the other, and they ship in the same product. I'm not familiar with WebLogic, but I suppose Coherence is a service within the package. In any case, for connecting to Coherence you can refer to Configuring and Managing Coherence Clusters.
The Coherence services can run on different machines, on different networks, and even in different parts of the world if you want. Each of the proxy, server, consumer and DB could be on a different network. Everything can be configured: you have to tell your WebLogic server where the Coherence proxy will be, you'll set the proxy and server addresses in the Coherence configuration, and you'll configure your Coherence server to find its database. It is a bit complicated to explain everything here.
I think I answered this above.
Just keep in mind that Coherence is a very powerful tool, but very complicated to operate and troubleshoot. Weigh the pros and cons of accessing your DB directly, and think about whether you really need it.
If you have specific questions, please don't hesitate to ask. It is a bit complicated to explain everything here, since you're trying to set up one of the most complicated systems I have ever seen. But it is excellent, and I really recommend it.
EDIT:
Basically, Coherence is composed of 2 main parts: proxy and server. The names are a bit confusing, since both are servers, but the proxy serves the clients trying to perform cache operations (CRUD), while the "servers" serve the proxies. The proxy is responsible for receiving all the requests, processing them, and routing them, according to their keys, to the respective server that holds the data (or that would be responsible for holding it, if the operation requires loading it). So the answer to your question is: YES, you need at least one proxy active in your cluster, otherwise you'll be unable to operate correctly. It can run on one of your machines or on a third one. It is recommended to run more than 1 proxy for HA purposes, and proxies can act as servers as well (by setting the localstorage flag to true). I know, it is a bit complicated, and I recommend following the Oracle docs.
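Routing "according to their keys" means each key hashes to a partition, and each partition is owned by one storage server, so the proxy always knows where a key lives. A Python sketch of that idea with a made-up partition count and node names; it does not reproduce Coherence's actual hashing or partition-assignment strategy.

```python
import zlib


class Proxy:
    """Routes each key to the storage node that owns the key's partition."""

    def __init__(self, storage_nodes, partition_count=7):
        self.partition_count = partition_count
        # Spread partitions over the storage-enabled nodes in turn.
        self.owner = {
            p: storage_nodes[p % len(storage_nodes)] for p in range(partition_count)
        }
        self.stores = {node: {} for node in storage_nodes}

    def partition_of(self, key):
        # crc32 gives the same answer in every process, unlike Python's salted str hash().
        return zlib.crc32(str(key).encode("utf-8")) % self.partition_count

    def route(self, key):
        return self.owner[self.partition_of(key)]

    def put(self, key, value):
        self.stores[self.route(key)][key] = value

    def get(self, key):
        return self.stores[self.route(key)].get(key)


proxy = Proxy(["storage-1", "storage-2", "storage-3"])
proxy.put("customer:42", {"name": "Ada"})
print(proxy.route("customer:42"))  # always the same storage node for this key
print(proxy.get("customer:42"))    # {'name': 'Ada'}
```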
Essentially, there are 2 types of Coherence installation.
1) Stand-alone installation (without a WebLogic Server in the mix)
2) Managed installation (with Weblogic Server in the mix)
Here are a few characteristics of each of the above:
Stand-alone installation (without a WebLogic Server in the mix)
Download the Coherence installation package and install (without any dependency on existing WebLogic or FMW installations)
Set up and configure the Coherence Servers from the command line
Administer and Maintain the Coherence Servers from the command-line
Managed installation (with Weblogic Server in the mix)
Utilize the existing installation of Coherence that was installed when WebLogic or FMW was installed
Set up and configure the Managed Coherence Servers to work with WebLogic Server
Administer and Maintain the Managed Coherence Servers via the WebLogic console
Note the key difference in terminology, Coherence Servers (no WL dependency) vs. Managed Coherence Servers (with WL dependency)
