Hosting a Shiny app on EC2 with background processing capability - r

I want to host a Shiny app on Amazon EC2 which takes an Excel sheet via fileInput(). Then I need to make some API calls for each row in the sheet, which I expect to take 1-2 hours on average for my purposes. So I figured this is what I should do:
Host a Shiny app where one can upload an Excel sheet.
On receiving an Excel sheet from a user, store it on the Amazon servers, notify the user that an email will be sent once the processing is complete, and trigger another R script (I'm not sure how to do that) which will keep running in the background even if the user closes the browser window, and collect all the information by making the slow API calls.
Once I have all the data, store it in another Excel sheet and email it back to the user.
If it is possible and reasonable to do it this way, or if you have other ideas for accomplishing this task, please help me with how to do it.
Edit: I've found that I could do it this way instead:
Get the Excel sheet data and store it in a file.
Call a bash script from the R Shiny app like this: ./<my-script> & disown
The bash script will call a Python file which makes all the API calls, decodes the relevant data from the JSON output, and stores it in another file on the server.
It finally sends an email to the user with the processed data attached.
I wanted to know if this is an appropriate way to do the job. Thanks a lot.
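For reference, here is a minimal sketch of the Python worker that bash script would launch; the file paths, the API endpoint, the column handling, and the mail setup are all placeholders, not a finished implementation:

    # worker.py - hypothetical background worker launched by the bash script
    # usage: python worker.py input.xlsx user@example.com
    import sys
    import smtplib
    from email.message import EmailMessage

    import pandas as pd   # assumes openpyxl is installed for .xlsx support
    import requests

    API_URL = "https://api.example.com/lookup"   # placeholder endpoint

    def process(input_path, output_path):
        df = pd.read_excel(input_path)
        results = []
        for _, row in df.iterrows():
            # one slow API call per row; pull whatever fields you need from the JSON
            resp = requests.get(API_URL, params={"q": row.iloc[0]}, timeout=60)
            results.append(resp.json().get("value"))
        df["result"] = results
        df.to_excel(output_path, index=False)

    def email_result(recipient, attachment_path):
        msg = EmailMessage()
        msg["Subject"] = "Your processed sheet"
        msg["From"] = "noreply@example.com"
        msg["To"] = recipient
        msg.set_content("Processing finished; the result is attached.")
        with open(attachment_path, "rb") as f:
            msg.add_attachment(f.read(), maintype="application",
                               subtype="octet-stream", filename="processed.xlsx")
        with smtplib.SMTP("localhost") as s:   # assumes a local mail relay
            s.send_message(msg)

    if __name__ == "__main__":
        input_path, recipient = sys.argv[1], sys.argv[2]
        output_path = input_path.replace(".xlsx", "_processed.xlsx")
        process(input_path, output_path)
        email_result(recipient, output_path)

The & disown in the bash script is what detaches this worker from the Shiny session, so closing the browser window does not kill it.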

Try implementing a simple web framework like Django, since you are using Python. Flask may come in handy for creating simple routes. Please comment if you find any issues.
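If you do go with Flask, a minimal sketch of an upload route that hands the file off to a detached worker process could look like this; worker.py, the form field names, and the upload directory are assumptions:

    # app.py - minimal Flask sketch: accept an upload, launch a detached worker
    import subprocess
    from pathlib import Path

    from flask import Flask, request

    app = Flask(__name__)
    UPLOAD_DIR = Path("/srv/uploads")   # assumed writable directory

    @app.route("/upload", methods=["POST"])
    def upload():
        sheet = request.files["sheet"]
        email = request.form["email"]
        dest = UPLOAD_DIR / sheet.filename
        sheet.save(str(dest))
        # start_new_session detaches the worker so it survives this request
        subprocess.Popen(["python", "worker.py", str(dest), email],
                         start_new_session=True)
        return "Upload received - you will get an email when processing finishes."

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)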

Related

Symfony: call command in controller for lengthy action

I have an application in which I can generate a raw export in xls.
The problem is that the xls generation can take very long, longer than the timeout duration.
I've checked, and my query isn't the culprit (it takes <2s for a regular query), but the xls generation is very slow (for several thousand lines, I put different colors in cells, conditionally display data...).
I was thinking about using a command, which runs in the CLI without the timeout problem.
I can't use it directly, because the data to be generated has to be requested by users (who have no CLI access).
So I thought about calling the command in my controller.
The user would choose the parameters in a form, send the form, and then in the controller, the parameters would be passed to the command that would do the heavy lifting.
My question is: in this case, is the command called in the CLI context (with CLI timeout = 0) or is it called in the application (web) context (with timeout <50s)? In the latter case, this would be useless, and I would be grateful for any advice on an alternative method to resolve my problem.
This is a textbook case for a message queue.
RabbitMQ is recommended, and it is easy to use with Symfony.
You will have a producer, which will generate a message and put it in a queue. This will be done in your controller.
The db query and the sheet generation should be placed in the consumer (the command running in the background, picking messages from the queue and processing them).
When the sheet is ready, save it as a file, and perhaps log it in the database with a unique ID.
This might sound difficult, but it is very simple, and you should learn it anyway :)
A problem is showing the result to the user. The simplest way is to refresh the browser every X seconds. Other choices include polling with AJAX, and websocket-based notifications from the server.
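The answer above targets Symfony on the PHP side, but the producer/consumer split it describes is the same in any language. Purely to illustrate the shape of it, here is a rough sketch using Python's pika client; the queue name and the message fields are made up and this is not the Symfony implementation:

    # producer side (what the controller does): publish an export request and return immediately
    import json
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="xls_export", durable=True)
    channel.basic_publish(exchange="",
                          routing_key="xls_export",
                          body=json.dumps({"user_id": 42, "filters": {"year": 2015}}))
    connection.close()

    # consumer side (the long-running command): pick up messages and build the sheet
    def on_message(ch, method, properties, body):
        job = json.loads(body)
        # run the query and generate the xls here, then store the file
        # and record its unique ID so the user can fetch it later
        ch.basic_ack(delivery_tag=method.delivery_tag)

    consumer = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = consumer.channel()
    ch.queue_declare(queue="xls_export", durable=True)
    ch.basic_consume(queue="xls_export", on_message_callback=on_message)
    ch.start_consuming()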

R-Shiny two-way communication

We have a separate process which provides data to our R-Shiny application. I know I can provide data to R-Shiny via a file or a database and observe the data source via reactivePoll. This works fine, and I understand it's more or less the recommended way.
What I don't like about this approach is:
It's hard to send various types of input (like data and metadata) to Shiny.
There is no feedback from the Shiny application to the data-providing process. I just write a file and hope that the Shiny app will pick it up and process it successfully. The data-sourcing process cannot be notified about a failure, for example (invalid data).
I would love to have some two-way protocol. For example, send the data through a websocket (this would have to be a different websocket than the one Shiny has with the UI, obviously) or a raw socket, and be able to send a response back.
Surely I can implement some file-based API where I store files under different names, observe them with Shiny, have Shiny write other files back, and observe those with the application which provided the data. But this basically sucks :)
Any tips much appreciated!
Edit: it may or may not be obvious from saying that the Java and R applications are writing files for each other... but the apps are running on the same host, and I can live with this limitation.
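If the file-based API is what you end up with after all, a small sketch of the provider side could look like this; Python is used here only for illustration (the actual provider is a Java app), and the exchange directory and the .data/.meta/.ack naming convention are assumptions:

    # provider side of a file-based two-way protocol:
    # write <id>.meta plus <id>.data, then wait for Shiny to write <id>.ack
    import json
    import time
    import uuid
    from pathlib import Path

    EXCHANGE_DIR = Path("/srv/shiny-exchange")   # assumed shared directory on the same host

    def send_and_wait(payload, metadata, timeout=60):
        msg_id = uuid.uuid4().hex
        (EXCHANGE_DIR / ("%s.meta" % msg_id)).write_text(json.dumps(metadata))
        # write the data file last so Shiny never sees a half-delivered message
        (EXCHANGE_DIR / ("%s.data" % msg_id)).write_text(payload)

        ack_path = EXCHANGE_DIR / ("%s.ack" % msg_id)
        deadline = time.time() + timeout
        while time.time() < deadline:
            if ack_path.exists():
                return json.loads(ack_path.read_text())   # e.g. {"status": "ok"} or an error
            time.sleep(1)
        raise TimeoutError("Shiny never acknowledged message %s" % msg_id)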

Is there a way to change the MONGO_URL in code?

I'm searching for a way to change the way Meteor loads the Mongo database. Right now, I know I can set an environment variable when I launch Meteor (or export it), but I was hoping there was a way to do this in code. This way, I could dynamically connect to different instances based on conditions.
An example test case would be for the code to parse the URL 'testxx.site.com', look up a URL based on the 'testxx' subdomain, and then connect to that particular instance.
I've tried setting the process.env.MONGO_URL in the server code, but when things execute on the client, it's not picking up the new values.
Any help would be greatly appreciated.
Meteor connects to Mongo right when it starts (using this code), so any changes to process.env.MONGO_URL won't affect the database connection.
It sounds like you are trying to run one Meteor server on several domains and have it connect to several databases at the same time depending on the client's request. This might be possible with traditional server-side scripting languages, but it's not possible with Meteor because the server and database are pretty tightly tied together, and the server basically attaches to one main database when it starts up.
The *.meteor.com hosting is doing something similar to this right now, and in the future Meteor's Galaxy commercial product will allow you to do this - all by starting up separate Meteor servers per subdomain.

Programmatic Remote Access to the Datastore

I have a requirement to implement a batch processing system that will run outside of Google App Engine (GAE) to batch process data from an RDBMS and insert it into GAE.
appcfg.py does this from various input files, but I would like to do it "by hand" using some API so I can fully control the lifecycle of the process. Is there a public API that is used internally by appcfg.py?
I would write a daemon in Python that runs on my internal server and monitors certain MySQL tables. Under the correct conditions, it would grab data from MySQL, process it, and post it using the GAE RemoteAPI to the GAE application.
Sounds like you already know what to do. In your own words: "grab data from MySQL, process it, and post it using the GAE RemoteAPI." The Remote API docs even have examples that write to the datastore.
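A rough sketch of such a daemon; the table name, the 'ready' condition, and push_to_gae() standing in for the Remote API call from the docs are all assumptions:

    # daemon that polls MySQL and pushes finished rows to the GAE app
    import time
    import mysql.connector   # or any other DB-API driver

    POLL_SECONDS = 30

    def push_to_gae(row):
        # stand-in for the Remote API call: configure remote_api_stub per the
        # GAE docs, then put() a datastore entity built from `row`
        pass

    def main():
        conn = mysql.connector.connect(host="localhost", user="batch",
                                       password="secret", database="erp")
        while True:
            cur = conn.cursor(dictionary=True)
            cur.execute("SELECT * FROM exports WHERE status = 'ready'")
            for row in cur.fetchall():
                push_to_gae(row)
                upd = conn.cursor()
                upd.execute("UPDATE exports SET status = 'sent' WHERE id = %s",
                            (row["id"],))
                upd.close()
            conn.commit()
            cur.close()
            time.sleep(POLL_SECONDS)

    if __name__ == "__main__":
        main()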
What you could probably do (if I understand your problem correctly) is use the Task Queue. With that you could define a task that does what you expect it to do.
Let's say you want to insert something into the GAE datastore. Prepare the insert file on some server, then go to your application and trigger a "Start Insert Task". By clicking on that, a background task will start, read that file, and insert it into the datastore.
Furthermore, if that task is performed daily, you could invoke the task creation with a cron job.
However, if you could say more about the work you have to perform, it would be easier :-P
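A small sketch of that idea against the classic (Python 2) App Engine SDK; the handler paths, the Record model, and the idea of fetching the prepared file by URL are made up for illustration:

    # sketch against the classic Python App Engine SDK
    import webapp2
    from google.appengine.api import taskqueue, urlfetch
    from google.appengine.ext import ndb

    class Record(ndb.Model):
        name = ndb.StringProperty()
        value = ndb.StringProperty()

    class StartInsertTask(webapp2.RequestHandler):
        # the "Start Insert Task" button posts here; a daily cron job could hit it too
        def post(self):
            taskqueue.add(url='/tasks/insert',
                          params={'file_url': self.request.get('file_url')})
            self.response.write('Insert task queued.')

    class InsertWorker(webapp2.RequestHandler):
        # runs in the background under the task queue's generous deadline
        def post(self):
            result = urlfetch.fetch(self.request.get('file_url'))
            for line in result.content.splitlines():
                name, value = line.split(',', 1)
                Record(name=name, value=value).put()

    app = webapp2.WSGIApplication([
        ('/start-insert', StartInsertTask),
        ('/tasks/insert', InsertWorker),
    ])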

Long-running HTTP process - how to put it in a separate process?

I know that similar questions have been asked all over the place, but I'm having trouble finding one that relates directly to what I'm after.
I have a website where a user uploads a data file, then that file is transformed and imported into SQL. The file could be up to 50 MB in size, and sometimes this process can take 30 minutes or even longer.
I realise I need to palm off the actual work to another process, and poll that process on the web page. I'm wondering what the best approach would be though? Being a web developer by trade, I'm finding all this new Windows Service stuff a bit confusing, and I just wanted somewhere to start.
So:
Can I / should I be doing this with a Windows service? If so, how?
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
clarifications
The data is imported into SQL; there's no file distribution taking place.
If there is a failure, it absolutely MUST be reported to the user. The web page will poll every, let's say, 5 seconds from the time the async task begins, to get the 'status' of the import. Once it's finished, another response will tell the page to stop polling for status updates.
queries on final decision
OK, so as I thought, it seems that a Windows service is the best idea. As to HOW to get it to work, it seems the 'put the file there and wait for the service to pick it up' idea is the generally accepted way. Is there a way I can start a process run by the service without it having to constantly check a database table / folder? As I said earlier, I don't have any experience with Windows services - I wondered, if I put a public method in the service, can I call it somehow?
well ...
var thread = new Thread(() => {
    // your action
});
thread.Start();
but you will have problems with that:
What if the import to SQL fails? Should there be any response to the client?
If it fails, how do you ensure the file is still available for a later request?
What if the application shuts down? This newly created and started thread will be killed as well.
...
It's not always a good idea to store everything in SQL (especially files...). If you want to make the file available to several servers, why not distribute them via FTP...?
I believe that your whole concept is a bit messed up (sorry for assuming this), and it might be helpful if you elaborate and give us more information about your intentions!
edit:
Can I / should I be doing this with a Windows service? If so, how?
You can :) I advise you to create a simple console program and convert it with srvany and sc. You can get a rough overview of how to do it here (note: insert blanks after =... that's a silly pitfall).
The term 'should' is relative, because you did not answer the most important question:
What if a record is persisted to the database, telling a consumer that the file test.img should be persisted, but your service hasn't picked it up or hasn't transformed it yet?
so ... next on
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
You probably could create a WCF service which receives some binary data and then stores it in a database. This request could be async, yes. But what for?
Once again:
Please give us more insight into your workflow: what exactly are you trying to achieve? Which "environmental conditions" do you have (e.g. app A polls the DB and expects file records which are referenced in table x to be persisted)...
edit:
So you want to import a .csv file. Well, that changes everything :)
But I won't advise you to use a WCF service (there could be a use for it: e.g. a WCF service which has a method to insert a single row, with your iteration through the file implemented in another app... not that good, though).
I would suggest the following:
At first, do everything in your webapp (as you've already done), but use some sort of bulk insert and do your transformation/logic in the database.
If you then have some sort of bottleneck, I would suggest something like a minor job service, e.g.:
The webapp will upload the file and insert a row into a job table. The job service continuously polls the table (or gets informed via WCF by the webapp - hey, hey, finally some use for WCF in your scenario... :) ) and then does the import job, writing a finish note to a table or setting the state of the job to finished...
But this is a bit overkill :)
Please see if my comments below help you to resolve your issue:
• Can I / should I be doing this with a Windows service? If so, how?
Yes, you can do this with a Windows service, and I think that is the way you should be doing it. You can implement your own service to process your request, or you can use the open-source Job Processor code.
Basically the idea is:
You submit a request for processing the csv file in a database table, with some status like 'not started'.
Then your Windows service picks up the requests from the database table which are not started and updates them to 'in progress' status.
Once the processing is complete, successfully or unsuccessfully, your service updates the database table with the status 'Completed' / 'Failed'.
And your ASP.NET page can poll the database table for the current status every 5 seconds or so.
• Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
You should not be using WCF for this purpose.
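To make the job-table handshake described above concrete, here is a language-neutral sketch of the service-side polling loop; the answers assume a .NET Windows service, and Python is used here only for brevity, with table and column names invented:

    # service-side loop: claim 'NotStarted' jobs, import them, record the outcome
    import time
    import sqlite3   # stand-in for the real SQL Server connection

    def run_import(file_path):
        # transform the uploaded file and bulk-insert it here
        pass

    def service_loop(db_path="jobs.db", poll_seconds=5):
        conn = sqlite3.connect(db_path)
        while True:
            cur = conn.execute(
                "SELECT id, file_path FROM import_jobs WHERE status = 'NotStarted'")
            for job_id, file_path in cur.fetchall():
                conn.execute("UPDATE import_jobs SET status = 'InProgress' WHERE id = ?",
                             (job_id,))
                conn.commit()
                try:
                    run_import(file_path)
                    status = 'Completed'
                except Exception as exc:
                    status = 'Failed: %s' % exc   # the web page polls this and shows it to the user
                conn.execute("UPDATE import_jobs SET status = ? WHERE id = ?",
                             (status, job_id))
                conn.commit()
            time.sleep(poll_seconds)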
