Understanding the scalability of RShiny apps hosted on ShinyServer - r

I am building a series of interactive shiny web apps for a project that I am considering turning into a company. My background is in data science and I don't have a lot of experience on the web app / server side of things, but these are important aspects for me to consider with my project. I currently have an Amazon Linux AMI EC2 instance with ShinyServer (free, open-source) installed, and I am currently hosting early versions of my web apps there. So far everything works fine, but I haven't made the links public yet.
My first question is whether anyone knows if there are certain limitations (scalability limitations, integration with database limitations, security / authentication limitations, etc.) that I will inevitably run into using RShiny apps and ShinyServer? I haven't heard of many successful, super-popular web apps being shiny apps hosted on ShinyServer; rather, my feeling is that ShinyServer is mainly used for hosting RShiny apps that are shared amongst only a small number of people (e.g. shared amongst team members at a company). Per this thread - Does R-Server or Shiny Server create a new R process/instance for each user? - I am particularly concerned that my app won't be able to handle thousands of users simultaneously, since only 1 R process is created for the app regardless of the # of concurrent users of the app. Having 10-20 processes through ShinyServer Pro probably doesn't solve the issue either if I ever intend to scale beyond hundreds or thousands of users. I also noticed that ShinyServer Pro would run me a not-so-negligible $10K per year.
My second question is whether RShiny apps can be deployed using other server technologies, such as Heroku. I came across this GitHub page (https://github.com/virtualstaticvoid/heroku-buildpack-r/tree/heroku-16) but haven't dug too deep into it yet. I've been told that Heroku makes it easy to update releases to apps whose code is on GitHub (git push heroku master), amongst other things.
My third question involves certain specific considerations of mine. In particular, I am currently working on a script that queries data from an API and writes that data to a (not-yet-setup) database of mine. This is the data my apps use, and I'd be interested in having the apps update in real time as the database updates, without requiring the user to refresh the webpage. A buddy of mine suggested AJAX for this type of asynchronous behavior, and it looks like this may be possible in R with something like this (https://github.com/daattali/advanced-shiny/tree/master/api-ajax).
Sorry that this is such a loaded question, but I hope it doesn't get closed down as I think it is fairly educational. Any suggestions / sources / pointing me in the right direction would be greatly appreciated on this.

Canovice,
I'd recommend you take a look at the following RStudio / AWS support articles. To scale a shiny server you'll need to look at using a load balancer:
RStudio
https://shiny.rstudio.com/articles/scaling-and-tuning.html
https://support.rstudio.com/hc/en-us/articles/220546267-Scaling-and-Performance-Tuning-Applications-in-Shiny-Server-Pro
https://support.rstudio.com/hc/en-us/articles/217801438-Can-I-load-balance-across-multiple-nodes-running-Shiny-Server-Pro-
AWS
https://aws.amazon.com/blogs/big-data/running-r-on-aws/
Blog Article:
http://mgritts.github.io/2016/07/08/shiny-aws/
Shiny is a great platform, their support is fabulous. I'd recommend you ring them up - they'll be sure to help answer your questions.
That said if your plan is to create a scalable website that will support thousands or hundreds of thousands of people then my sense would be to recommend you also review and consider using D3.js in conjunction with react.js or Angular.js, not forgetting to mention node.js.
My sense is that you are looking at a backend database connected to a logic engine and visualisation front end. If you are looking for a good overview of usage take a look at the following web page and git repo [A little dated but useful]:
https://anmolkoul.wordpress.com/2015/06/05/interactive-data-visualization-using-d3-js-dc-js-nodejs-and-mongodb/
https://github.com/anmolkoul/node-dc-mongo
I hope the above points you in the right direction.

I'd like to provide some notes related to your second question: Yes, you can use the mentioned buildpack to deploy shiny applications on Heroku.
I was in a situation similar to yours (asking myself about possible ways of serving Shiny applications in a scalable manner) and decided to go the "Heroku way".
You may find these hints helpful when deploying your app to heroku using the buildpack mentioned above:
Heroku tries to "guess" how to execute your application. But you can also add a special file, named Procfile, to your application to control the process commands you want to execute for your application. In my case I used web: R -f ~/run.R --gui-none --no-save, which passes a file named run.R to the R executable for the web server process.
The stack on Heroku is based on Ubuntu. If you need additional deb packages, you can create another special file named Aptfile and add the package names to it; Heroku will then automatically install them for you (I needed this for RPostgreSQL).
You can add another special file named init.R and install all R packages as necessary just as you are used to, i.e. with install.packages etc. You can also add initial configuration material within this file.
As a running example, here is a toy application that I wrote for myself to remember what a "full-stack" shiny app may look like, including compatibility with Heroku.

For a large number of concurrent users, use a load balancer like nginx and enable the autoscaling of your app, e.g. through Kubernetes.
You can deploy your app on Heroku. On the paid tiers it includes NoOps autoscaling of your app. See this tutorial on how to deploy a Shiny app in a Docker container on Heroku: https://medium.com/analytics-vidhya/deploying-an-r-shiny-app-on-heroku-free-tier-b31003858b68
You can query the table last update timestamp in the Shiny server logic with reactivePoll() and rerun your db query if it changed. It is not "real-time" but depending on your application close enough if you set the time interval small.
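The polling pattern behind reactivePoll() can be illustrated outside Shiny. What follows is a minimal Python sketch, not Shiny code: a cheap check query runs on every poll, and the expensive query is re-run only when the check value changes. SQLite stands in for the real database, and the tables `readings` and `meta`, the column `last_update`, and the class `PolledQuery` are all hypothetical names chosen for illustration.

```python
import sqlite3


class PolledQuery:
    """Re-run an expensive query only when a cheap check value changes."""

    def __init__(self, conn, check_sql, value_sql):
        self.conn = conn
        self.check_sql = check_sql
        self.value_sql = value_sql
        self._last_check = None
        self._cached = None
        self.runs = 0  # how many times the expensive query actually ran

    def poll(self):
        """Call this on a timer; returns cached rows unless the check changed."""
        check = self.conn.execute(self.check_sql).fetchone()
        if self.runs == 0 or check != self._last_check:
            self._cached = self.conn.execute(self.value_sql).fetchall()
            self._last_check = check
            self.runs += 1
        return self._cached


# Hypothetical schema: a data table plus a one-row table with the last update time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")
conn.execute("CREATE TABLE meta (last_update INTEGER)")
conn.execute("INSERT INTO meta VALUES (1)")
conn.execute("INSERT INTO readings VALUES (10.0)")

query = PolledQuery(
    conn,
    "SELECT last_update FROM meta",
    "SELECT value FROM readings ORDER BY rowid",
)
first = query.poll()   # the first poll always runs the query
second = query.poll()  # timestamp unchanged, cached rows are reused

conn.execute("INSERT INTO readings VALUES (20.0)")
conn.execute("UPDATE meta SET last_update = 2")
third = query.poll()   # timestamp changed, so the query runs again
```

The shorter the interval between calls to `poll()`, the closer this gets to "real time", at the cost of one cheap query per interval.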

Related

"Session-Security" in R-Shiny and AWS Fargate

I am currently thinking about the best way to deploy my RShiny app. I tried hosting my app on a dedicated server via ShinyProxy, Docker and Nginx, but this solution was (surprise!) not really scalable. The RAM requirement per user was too high for that.
I'm currently considering hosting the app via a Docker image in AWS Fargate, where RAM resources scale up and down as needed.
I'm now wondering about security, though.
Brief background:
My goal is to add my app as a tool to an online store. Here it can and will (hopefully) happen that several users will use the tool at the same time. It's important that users can't mess with each other's data - that's why I thought of ShinyProxy, so that each user gets their "own R session".
Now I am wondering what this looks like with AWS Fargate. Could it be that if multiple users are active in the tool at the same time, there can be mutual interference?
If so, does anyone have any ideas on how to prevent this? Unfortunately, publishing ShinyProxy via Fargate is not possible as far as I know.
I hope I could formulate my question understandably and someone of you can help me.
Thank you and have a nice day!
Brief background: My goal is to add my app as a tool to an online
store. Here it can and will (hopefully) happen that several users will
use the tool at the same time. It's important that users can't mess
with each other's data - that's why I thought of ShinyProxy, so that
each user gets their "own R session".
Probably depends on what you need for your use case.
Shiny actually has no user management per default - in the sense of limiting access to your application for certain groups and requiring authentication (can be done by hosting with Shinyapps.io and others).
But you probably do not really need this anyway - your problem sounds more like a scoping issue.
(you should read this information about it)
Sure, there might only be one R process, but it actually supports multiple client connections (sessions). You can define which objects these sessions share. This is totally independent of where you host your app.
Everything you put into the shinyServer() function in the server.R file will only be visible within the user session (every user has their own session).
If you need to share variables between sessions, you have to put them in the server.R file, but outside of the shinyServer() function.
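As a rough analogy only, the same scoping rule can be shown in plain Python (this is not Shiny code, and the names `shared_counter` and `start_session` are made up for illustration). Objects created at module level exist once per process and are seen by everyone, while objects created inside the function that runs per connection are private to that connection.

```python
# Analogy only: plain Python, not Shiny.

# Defined once per process, like objects placed outside the server function.
shared_counter = {"visits": 0}


def start_session(user_name):
    """Plays the role of the server function: runs once per connecting user."""
    session_data = {"user": user_name, "rows": []}  # private to this session
    shared_counter["visits"] += 1                   # visible to every session
    return session_data


alice = start_session("alice")
bob = start_session("bob")
alice["rows"].append("alice's upload")  # does not leak into bob's session
```

Here `bob["rows"]` stays empty after Alice's upload, while `shared_counter` reflects both sessions. User data that must stay isolated therefore belongs inside the per-session scope.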

Is there a way to prevent directory traversal attacks in a Shiny App running in Windows?

I'm trying to develop an internal Shiny app for my organization as a test run. The IT department is requiring the app to be safe from Directory Traversal Attacks. Unfortunately, I have to deploy the Shiny app in a Windows machine. (currently using runApp).
I have searched but not found a way to implement the different recommendations of avoiding Directory Traversal Attacks. Can anyone help me out?
Protecting against a traversal attack is twofold: once in the application and once in the system.
For the application, you will need to make sure that you are cleaning any inputs that point to a hosted file. For example, if your application allows a user to request images/supercool.png, you'll need to verify that the path is not being changed to something like ../../../../etc/passwd.
For the system, it is a matter of separating privileges. The accounts given access to the runApp files should not also have access to system files (beyond what is absolutely needed).
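The application-side check can be sketched as follows. This is a minimal Python illustration of the idea rather than Shiny code: resolve the requested path against an allowed base directory and reject anything that ends up outside it. The function name `resolve_inside` and the base directory `/srv/app/www` are hypothetical.

```python
from pathlib import Path


def resolve_inside(base_dir, requested):
    """Return the resolved path if it stays inside base_dir, else raise ValueError."""
    base = Path(base_dir).resolve()
    candidate = (base / requested).resolve()
    # relative_to() raises ValueError when candidate is not located under base.
    candidate.relative_to(base)
    return candidate


base = "/srv/app/www"  # hypothetical directory holding the files the app may serve
safe = resolve_inside(base, "images/supercool.png")

try:
    resolve_inside(base, "../../../../etc/passwd")
    blocked = False
except ValueError:
    blocked = True  # the traversal attempt escaped the base directory
```

Resolving before comparing matters: a plain string check on the raw input would miss `..` segments, and it would also miss absolute paths supplied by the user.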
I would recommend using Shiny Server or Connect to host the files for you, especially if you do not feel prepared to implement the needed security.
RStudio has done a lot of work to make a good product and is continuing to add new features, including enhancements around security/access.

Accessing a console application from web page

I've recently created two C# console applications. The first transforms a bunch of command outputs into an XML, and the second transforms the XML into a Word document using a template.
I'd like to know how I could get this onto the web, i.e having a web page where the command output can be uploaded, the two step conversion executed, and finally the Word document made available for download.
Should the web page be created in ASP.NET or are there other (better) options? Do I need to rewrite the console applications in some other format?
This question is fairly broad, with plenty of room for novel-sized explanations, but here's a brief high-level walkthrough of what likely needs to happen to achieve the proposed results (language agnostic):
Get a hosting provider that allows users to spin up their own machine (e.g. AWS).
Spin up a machine that is compatible with the "console" programs in question.
Install the "console" programs on the machine.
Install a programming language (e.g. Node.js, PHP, ASP.NET, even C# could do) on the machine.
Install a web server (e.g. NGINX, Apache) on the machine, and configure it to serve public requests and run with the chosen language.
On server request, execute the appropriate commands from within the chosen language. Languages typically come with an exec method (e.g. in Node.js: require('child_process').exec(command,options,callback)).
Get the results of said commands and send them back to the client. Alternatively (for downloads), write the result to a path on the system that is publicly available to the internet and redirect the user to that URL (additional configuration might be required to make sure the browser downloads the file as opposed to just displaying it).
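The "execute a command and return its output" step above might look like the following in Python. This is a hedged sketch under stated assumptions: the helper name `run_tool` is made up, and a tiny Python one-liner stands in for the real console application, since the actual executables are not shown in the question.

```python
import subprocess
import sys


def run_tool(args, input_text):
    """Run a command-line program, feed it input_text on stdin, return its stdout."""
    result = subprocess.run(
        args,                 # list form avoids shell interpretation of user input
        input=input_text,
        capture_output=True,
        text=True,
        timeout=60,           # do not let a stuck conversion block the request forever
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Stand-in for the real console application: upper-cases whatever arrives on stdin.
fake_converter = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
output = run_tool(fake_converter, "show version")
```

In a web handler, `input_text` would be the uploaded command output, and the returned text (or a file written by the tool) would be sent back to the client.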
The steps above should get you pretty close to what you want. As for your questions:
Should the web page be created in ASP.NET or are there other (better)
options?
The "better" option is whatever you feel most comfortable with at the moment; you could always change it later with reasonable effort (assuming that your "console" apps are not unsuspecting unicorns).
Do I need to rewrite the console applications in some other format?
No, unless you have strong reasons to do so (e.g. multi-environment compatibility). You could also rewrite to significantly simplify (e.g. bypass working with a CLI and do everything in C#).
Try thinking through these high-level steps, begin working on an implementation, and post more specific questions here on StackOverflow when you get stuck.
I hope that helps!

Installing MeteorJS in Amazon S3 Bucket

I currently manage a web app on a LAMP stack hosted with GreenGeeks. As it has scaled up, I have started learning MeteorJS on my local machine and am thinking about redeveloping the app in Meteor to support more concurrent connections. My questions are:
Can Meteor simply be hosted in a Simple Amazon S3 Bucket with no need for a stack of any kind? Is this smart? When something seems this simple, I get nervous.
Is Meteor as portable as it feels? Migrating a LAMP app from one server to another can be a real pain. This "feels" like it's as simple as zipping up the whole thing and simply dragging it anywhere. Again, feels too simple = nervous.
Is meteor the right choice if I am looking to maximize concurrent connections and reduce the number of times I need to go to the server for information? My app loads about 2 MB of data per user and I'd love a situation where this can be loaded once and the user has it available to interact with without going to the server (unless it changes).
OK, answers to your questions:
Well, actually you can deploy your Meteor app to an Amazon EC2 instance. The process is pretty easy; take a look at this video.
Meteor is incredibly portable. It is built on Node.js and therefore inherits its portability.
You are on the right track. Meteor is reactive and acts in real time. It also uses MongoDB, which can be much faster than a regular SQL database for some workloads, so in general Meteor's performance is very good. There are also lots of packages that improve the performance of your app even further, like this one and many others.

How to avoid chaotic ASP.NET web application deployment?

Ok, so here's the thing.
I'm developing an existing (it started being an ASP classic app, so you can imagine :P) web application under ASP.NET 4.0 and SQL Server 2005. We are 4 developers using local instances of SQL Server 2005 Express, each with the source code and the Visual Studio database project.
This webapp has several "universes" (that's what we call them). Every universe has its own database (currently on the same server) but they all share the same schema (tables, sprocs, etc.) and the same source/site code.
So manually deploying is really annoying, because I have to deploy the source code and then run the sql scripts manually on each database. I know that manual deploying can cause problems, so I'm looking for a way of automating it.
We've recently created a Visual Studio Database Project to manage the schema and generate the diff-schema scripts with different targets.
I have no idea how to put the pieces together.
I would like to:
Have a way to make a "sync" deploy to a target server (thankfully I have full RDC access to the servers so I can install things if required). By "sync" deploy I mean that I don't want to fully deploy the whole application, because it has lots of files and I just want to deploy those that are new or changed.
Generate diff-sql update scripts for every database target and combine them into just one script. For this I should have a list of the database names somewhere.
Copy the site files and execute the generated sql script in an easy and automated way.
I've read about MSBuild, MS WebDeploy, NAnt, etc. But I don't really know where to start and I really want to get rid of this manual deploy.
If there is a better and easier way of doing it than what I enumerated, I'll be pleased to read your options.
I know this is not a very specific question but I've googled a lot about it and it seems I cannot figure out how to do it. I've never used any automation tool to deploy.
Any help will be really appreciated,
Thank you all,
Regards
Have you heard of the term Multi-Tenancy? It might be worth looking it up to see if it applies to your "Multiverse", especially if one universe is never accessed by another...
See:
http://en.wikipedia.org/wiki/Multitenancy
http://msdn.microsoft.com/en-us/library/aa479086.aspx
UPDATE:
If the application and database are the same for each client (or tenant), I believe there are applications that may help in providing the same code/db as a SaaS application, i.e. another application/configuration layer on top that can handle the deployments etc.
I think these are called Platform as a Service (PaaS) applications:
see: http://en.wikipedia.org/wiki/Platform_as_a_service
Multi-Tenancy in your case may be possible, depending on client security requirements, with a bit of work (or a lot of work):
Option 1:
You could use one instance of the application, i.e. deploy the site once and connect to a different database for each client. You would need to differentiate each client by URL to isolate content/data by setting a connection string for each, etc. (This would reduce your site deployments to one deployment.)
Option 2:
You could use both a single instance of the application and a single database. You would need to add a "TenantID" to each table and adjust all your code to accept a TenantID to ensure data security/isolation. Again, you would need to detect/differentiate the tenant based on the URL to set the TenantID for the session, which is then used for every database call. (This would reduce your site and database deployments to one of each.)
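Option 2 can be sketched roughly as follows. This is a minimal Python illustration with assumptions stated up front: SQLite stands in for SQL Server, and the hostnames, the `orders` table, and the helper names `tenant_for_host` and `orders_for` are all hypothetical. The point is only that the tenant is resolved from the URL host once and then passed to every query.

```python
import sqlite3

# Hypothetical mapping from request hostname to tenant ("universe") id.
TENANTS = {"alpha.example.com": 1, "beta.example.com": 2}


def tenant_for_host(host):
    """Resolve the tenant from the URL host; unknown hosts are rejected."""
    try:
        return TENANTS[host.lower()]
    except KeyError:
        raise LookupError(f"unknown tenant host: {host}") from None


def orders_for(conn, tenant_id):
    """Every query takes the tenant id and filters on it."""
    rows = conn.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY item",
        (tenant_id,),
    )
    return [item for (item,) in rows]


# One shared database; each row carries the id of the tenant that owns it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id INTEGER NOT NULL, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "sword"), (1, "shield"), (2, "potion")],
)

alpha_orders = orders_for(conn, tenant_for_host("alpha.example.com"))
beta_orders = orders_for(conn, tenant_for_host("beta.example.com"))
```

For Option 1 the same `tenant_for_host` lookup would return a connection string instead of an id, and the queries themselves would stay unchanged.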

Resources