Does opencpu support asynchronous calls for time-consuming R functions? - r

I recently created an R package that makes use of sparklyr. I invoke the package's main function from OpenCPU and pass as an argument a URL that provides all my data as a stream. The data stream is successfully analysed in a distributed way via Spark and produces some results.
My only problem is that the execution takes a long time to complete. I tried invoking my package via both opencpu.call and opencpu.rpc, but both make me wait until the end of the process.
Since OpenCPU is an excellent fit for a microservice architecture, it would be extremely useful to have truly asynchronous calls.
Is either of the following supported, or planned to be supported in the near future?
Option A: instantly receive a session ID (even though the process is still running). The client is then responsible for asking for the status of the process using that session ID.
Option B: define a callback URL that the OpenCPU server triggers, passing the session ID upon completion of the analytic process.
Thank you very much for your help!

No, OpenCPU does not currently support background jobs. You'll have to create a middle layer yourself that performs the request and does the waiting on behalf of the user.
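If you want something like Option A today, one way to build that middle layer is a small plumber API that launches the long-running call in a background R process with callr and immediately hands back a job id the client can poll. The sketch below is only an illustration of that idea, not an OpenCPU feature: the endpoint paths, the in-memory jobs store and mypackage::analyse_stream() are all assumptions.

# Minimal sketch of a polling "middle layer" (assumed names throughout)
library(plumber)
library(callr)

jobs <- new.env()  # in-memory registry of background processes

#* Start the long-running analysis; returns a job id immediately
#* @post /jobs
function(url) {
  id <- basename(tempfile("job"))               # cheap unique id
  proc <- callr::r_bg(
    function(u) mypackage::analyse_stream(u),   # hypothetical package function
    args = list(u = url)
  )
  assign(id, proc, envir = jobs)
  list(id = id)
}

#* Poll a job: reports its status and, once finished, the result
#* @get /jobs/<id>
function(id) {
  p <- get0(id, envir = jobs)
  if (is.null(p))   return(list(status = "unknown"))
  if (p$is_alive()) return(list(status = "running"))
  list(status = "done", result = p$get_result())
}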

Related

Is there a way to supply a clean R session for every call to a plumber API in R?

I have access to a plumber API which handles some automatic pipelining workflows for me. However, every job runs in the same R session, and I'm a bit concerned about different calls interfering with each other.
Off the top of my head, the wish is that every call starts in a fresh R session. When programming in R normally I would just restart the session and have a clean setup, but with the plumber server this seems to be "not so easy". Some reading also revealed that "cleaning up" an R session without a restart isn't a straightforward thing.
Of course I'm also open to alternative approaches; any suggestions and insights are welcome. In the end I would simply like the API calls to return the same thing if the inputs are the same.
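One approach that might get close to this (a sketch, not from the original thread): keep plumber itself long-running but evaluate each request's work in a throwaway R session via callr::r(), so no state can leak between calls. run_pipeline() below is a hypothetical stand-in for the real workflow.

# Sketch: run each request's work in a fresh R process
library(plumber)
library(callr)

#* @post /run
function(input) {
  # callr::r() starts a clean R session, evaluates the function there,
  # returns its value and shuts the session down again
  callr::r(
    function(x) {
      # anything the pipeline needs must be loaded inside this fresh session
      run_pipeline(x)   # hypothetical user function
    },
    args = list(x = input)
  )
}

The trade-off is per-call cost: starting a new R process and reloading packages for every request adds noticeable latency.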

What is the best practice for handling asynchronous API calls that take time?

Suppose I have an API to create a cloud instance asynchronously. After I make the API call it just returns a success response, but the cloud instance has not been initialized yet. It takes 1-2 minutes to create the cloud instance, and only after that is the cloud instance information (e.g. IP, hostname, OS) saved to the DB, which means I have to wait 1-2 minutes before I can fetch the data again to show the cloud information. At first I tried making a loading component, but the problem is that I don't know when the cloud instance is initialized (each instance takes a different amount of time to create). I'm considering using WebSockets or a cron job, or should I redesign my API? Has anyone designed an asynchronous system before, and how do you handle such a case?
If the API that you call gives you no information on when it's done with its asynchronous processing, it seems to me that you'll have to check at intervals until you find that the resource is ready; i.e. to poll it.
This seems to me to roughly fit the description and intent of the Polling Consumer pattern. In general, for asynchronous systems design, I can't recommend Enterprise Integration Patterns enough.
As others noted, you can either have a notification channel using WebSockets or poll the backend. Personally I'd probably go with the latter in this case and would actually create several APIs: one for initiating the work, which returns a URL containing a "job id" at which the status of the job can be polled.
RESTfully that would look something like: POST /instances to initiate a job, GET /instances to see all the instances that are running/created/stopped, and GET /instances/<id> to see the status of a particular instance (initiating, failed, running or whatever).
WebSockets would work, but might be overkill for this use case. I would probably display a status of 'creating' or something similar after receiving the success response from the API call, and then start polling the API to see if the creation process has finished.
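For what it's worth, here is what the poll-a-job-URL approach described above could look like from an R client using httr; the base URL, endpoints and status values are assumptions, not any real provider's API.

library(httr)

base <- "https://api.example.com"   # hypothetical service

# Initiate the job; assume the response body contains the new instance id
resp <- POST(paste0(base, "/instances"), body = list(os = "ubuntu"), encode = "json")
id <- content(resp)$id

# Poll until the instance leaves the "initiating" state
repeat {
  status <- content(GET(paste0(base, "/instances/", id)))$status
  if (!identical(status, "initiating")) break
  Sys.sleep(5)   # back off between polls
}
message("Instance ", id, " is now: ", status)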

Serve a web request in Python that spawns a new long-running subprocess

I currently have a Python command-line application that uses the invoke package to organise, list and execute tasks. There are many task files (controlled and created by users, not me). Execution time for some task files can be more than an hour. Each task is actually a test script/program. invoke is useful for listing/executing all the tasks in a task file (we call it a testsuite), or only a bunch of them (a task collection), or a single task. (Having a ton of loose scripts and organising, listing and running them the way users want would be quite a task, hence invoke.)
However, invoke cannot be used as a library. It does not offer an API that can be leveraged to list and run test tasks, so I am forced to run invoke as a shell command via subprocess from the command-line program. I replace the current process with invoke (via execl()) because once control passes to invoke, there is no need to come back to the parent process. So far so good.
Now there is a requirement that this command-line program be callable from a web application, so I need to wrap it in a RESTful HTTP API. I've decided to use bottle.py to keep things simple.
I understand that the long-running testsuites (tasks) will have to run outside the HTTP request/response cycle, but I'm unable to finalise exactly how to go about it (I may be overthinking it). Here is what I want:
Tasks are written by users. They are always synchronous; they may sleep or execute shell commands via subprocess.run().
The application is internal; it will not be bombarded with a huge number of requests. Number of users: at most 10.
Each request (of the type that runs a task) will take minutes, and in some cases more than an hour, to complete. New requests during this time should not block.
The calling application (running on a different host) will need to report the progress of the running task to the browser UI (a 'progress bar').
The ability to communicate with a running task and 'cancel' it from the browser UI.
Given the above situation, am I correct in saying:
Because a new 'process' must be spawned for each request (due to the use of subprocess and execl in the current code), it rules out using 'threads' of any type (OS threads, greenlets, gevent)?
Using any async libraries (web framework, web/HTTP server or in-app code) won't be of much help, because every run request will have to be a new process anyway?
How will the process be spawned when a request comes in? Should the web/HTTP server (gunicorn?) do it, or does my application have to take care of forking itself?
Is 'gunicorn' a good choice for this situation?
I have a feeling that users may also ask for the ability to schedule tasks/tests, so I might end up using some sort of task queue. I have read about 'huey' and feel that it is light and simple enough for my needs (no Redis/Celery). But any task queue also means a separate consumer process to administer, which adds more moving parts to the mix.
The 'progress bar' functionality means the subprocess has to keep updating its progress somewhere, and the calling application has to read it from there. Does this necessitate a 'task queue' anyway?
There is a lot of material on all of this and I have read quite a bit of it, but it has still left me unclear on exactly how to implement my requirements. Any direction/pointers would be appreciated. I'd also appreciate advice on what 'not to use'.
If you need something really simple, you could write a wrapper around task spooler (a Linux tool for queueing tasks): https://vicerveza.homeunix.net/~viric/soft/ts/ (see https://vicerveza.homeunix.net/~viric/soft/ts/article_linux_com.html for more details).
Otherwise it's probably better to switch to the uWSGI spooler, RQ with Redis, or Celery with RabbitMQ (with Redis it only works to a certain extent).

MQSeries: Is syncpoint/rollback possible when getting asynchronously with MCB?

I want to pull messages off an MQSeries queue in a C client, and would love to do so asynchronously so I don't have to start (explicit) multithreading. The messages will be forwarded to another system that acts "transactionally" but is completely incompatible with XA. So I'd like a way to explicitly commit (and thereby remove) a message that has been successfully handed off to the other system, and not commit if this failed, so that the last message is retained for a later, more successful attempt.
I've read about the SYNCPOINT option and understand how I'd use it around a regular GET, but I haven't seen any hints on how to give asynchronous message retrieval this kind of transactional behavior. Any hints, please?
I think you are describing the asynchronous callback capability, i.e. you register a routine to be called when a message arrives and ask for any get to be under syncpoint. An explanation of how some of it works is here: https://share.confex.com/share/117/webprogram/Handout/Session9513/share_advanced_mqi.pdf, page 4 onwards.
Effectively you get called with the MQ message under syncpoint, do your processing with the other system, then commit or roll back the message before returning.
Be aware that without something like XA two-phase commit there is always a window where, for example, you commit to the external system and then a power outage causes the message under the unit of work to be rolled back inside MQ, because you didn't have time to perform the MQ commit.
Edit: my misunderstanding, I didn't realise that the application was using a callback to retrieve messages, which is indeed fully asynchronous behavior. Disregard the answer below.
Do MQGET with MQGMO_SYNCPOINT, then issue either MQCMIT or MQBACK.
"Asynchronous" and "synchronous" may be misnomers here: these are your patterns of using MQ (whether or not you wait for a reply message), and they do not affect how MQ processes your calls. Transaction management (unit-of-work management) works across any MQI calls that use SYNCPOINT, whether or not they are part of a request/reply pattern.

Can the R console support background tasks or interrupts (event-handling)?

While working in an R console, I'd like to set up a background task that monitors a particular connection and when an event occurs, another function (an alert) is executed. Alternatively, I can set things up so that an external function simply sends an alert to R, but this seems to be the same problem: it is necessary to set up a listener.
I can do this in a dedicated R process, but I don't know if this is feasible from within a console. Also, I'm not interested in interrupting R if it is in the middle of a calculation, only in alerting or interrupting when the console is merely waiting on input.
Here are three use cases:
The simplest possible example is watching a file. Suppose that I have a file called "latestData.csv" and I want to monitor it for changes; when it changes, myAlert() is executed. (One can extend it to do different things, but just popping up with a note that a file has changed is useful.)
A different kind of monitor would watch for whether a given machine is running low on RAM and might execute a save.image() and terminate. Again, this could be a simple issue of watching a file produced by an external monitor that saves the output of top or some other command.
A different example is like another recent SO question about having R halt the EC2 machine it's running on. If an alert from another machine or process tells the program to save and terminate, then being able to listen for that alert would be great.
At the moment, I suspect there are two ways of handling this: via Rserve and possibly via fork. If anyone has examples of how to do this with either package or via another method, that would be great. I think that solving any of these three use cases would solve all of them, modulo a little bit of external code.
Note 1: I realize, per this answer to another SO question, that R is single-threaded, which is why I suspect fork and Rserve may work. However, I'm not sure about feasibility if one is interfacing with an R terminal. Although R's REPL is attached to the input from the console, I am trying to either get around this or mimic it, which is where fork or Rserve may be the answer.
Note 2: For those familiar with event handling / eventing methods, that would solve everything, too. I've just not found anything about this in R.
Update 1: I've found that the manual for writing R extensions has a section referencing event handling, which mentions the use of R_PolledEvents. This looks promising.
One more option is the svSocket package. It is non-blocking.
Here is an 8-minute video on using it, which has over 3,000 views. It shows how to turn an R session into a server and how to send commands to it and receive data back. It demonstrates doing that even while the server is busy; e.g., if you start a long-running process and forget to save intermediate results, you can connect to the server and fetch them from it before it has finished.
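A rough sketch of how that workflow might look (the port number and object names are arbitrary; both sides are ordinary interactive R sessions):

# In the session you want to serve/monitor:
library(svSocket)
startSocketServer(port = 8888)   # returns immediately; this console stays usable

# In a second R session (the client):
library(svSocket)
con <- socketConnection(host = "localhost", port = 8888, blocking = FALSE)
evalServer(con, ls())                  # evaluate code in the server session
partial <- evalServer(con, results)    # fetch an object, e.g. intermediate results
close(con)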
It depends on whether you want to interrupt an idling or a working R. For the former, you can think about bypassing R's default REPL loop with an event listener that queues incoming events and evaluates them. The common option is to use the Tcl/Tk or GTK event loop; I have made something like this around libev in my triggr package, which makes R digest requests coming from a socket.
The latter case is mostly hopeless, unless you manually make the computational code execute an if (eventOccurred) processIt() check periodically (see the sketch after this answer).
Multithreading is not a real option because, as you know, two interpreters in one process would break each other by using the same global variables, while forked processes have independent memory contents.
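As a trivial illustration of that periodic check, tied to the save-and-terminate use case from the question: the long-running loop below looks for a flag file dropped by an external monitor and, if it appears, saves the workspace and stops. The file names and loop body are placeholders.

flag <- "shutdown.requested"        # written by an external monitor (assumption)

for (i in seq_len(1000)) {
  # ... one chunk of the real long-running computation goes here ...
  Sys.sleep(0.01)
  if (file.exists(flag)) {          # the periodic if (eventOccurred) check
    save.image("emergency.RData")
    stop("external monitor requested shutdown; workspace saved")
  }
}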
It turns out that the package Rdsm supports this as well.
With this package one can set up a server/client relationship between different instances of R, each of which is a basic R terminal, and the server can send messages, including functions, to the clients.
Transformed to the use case I described, the server process can do whatever monitoring is necessary, and then send messages to the clients. The documentation is a little terse, unfortunately, but the functionality seems to be straightforward.
If the server process is, say, monitoring a connection (a file, a pipe, a URL, etc.) on a regular basis and a trigger is encountered, it can then send a message to the clients.
Although the primary purpose of the package is shared memory (which is how I came across it), this messaging works pretty well for other purposes, too.
Update 1: Of course, for message passing one can't ignore MPI and the Rmpi package. That may do the trick, but the Rdsm package launches and works with R consoles, which is the kind of interface I'd sought. I'm not yet sure what Rmpi supports.
A few ideas:
Run R from within another language's script (this is possible, for example, in Perl using RSPerl) and use the wrapping script to launch the listener.
Another option may be to run an external (non-R) command (using system()) from within R that will launch a listener in the background.
Run R in batch mode in the background, either before launching your interactive R session or in a separate window.
For example:
R --no-save < listener.R > output.out &
The listener can send an appropriate email when the event occurs.
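For the file-watching use case, listener.R could be as simple as a polling loop over the file's modification time; the file name, polling interval and alert action below are placeholders.

# listener.R (sketch): watch latestData.csv and alert when it changes
watched <- "latestData.csv"
last_mtime <- file.info(watched)$mtime

myAlert <- function() {
  # replace with sending an email, a desktop notification via system(), etc.
  cat(format(Sys.time()), "-", watched, "has changed\n")
}

repeat {
  Sys.sleep(10)                          # poll every 10 seconds
  m <- file.info(watched)$mtime
  if (!is.na(m) && !identical(m, last_mtime)) {
    last_mtime <- m
    myAlert()
  }
}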
