Do Shiny apps' data-scoping rules apply to ShinyProxy?

As far as I understand, ShinyProxy launches a separate container for every connected user. Is it possible to share data among user sessions by using the documented Shiny scoping rules (see "Objects visible across all sessions")?
My use case involves loading a big static dataset in memory that is the same for every app user, so the correct approach here is to have a single copy of the dataset in memory and share it among all user sessions (= load it before the 'server' function). Does this work with ShinyProxy as explained in the above Shiny documentation?
Thanks in advance,
Juanje.
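For reference, the pattern the question describes (loading the data before the server function) can be sketched roughly as below. This is a minimal illustration, not taken from the Shiny documentation: `mtcars` stands in for the real dataset, and `make_app` and the output id `preview` are hypothetical names. Objects shared this way live inside one R process, so if each user really gets a separate container, each container would presumably load its own copy.

```r
# Loaded once per R process, before any session starts.
# `mtcars` is a stand-in for the real dataset (assumption).
big_data <- mtcars

# Called once per user session; it only reads `big_data`, so every
# session served by this process sees the same object.
server <- function(input, output, session) {
  output$preview <- shiny::renderTable(head(big_data))
}

# Hypothetical helper that assembles the app object.
make_app <- function() {
  ui <- shiny::fluidPage(shiny::tableOutput("preview"))
  shiny::shinyApp(ui = ui, server = server)
}
```

The app could then be started with `shiny::runApp(make_app())`.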

Related

Is it possible to run Shiny as a subprocess of a main app?

I'm facing the usual problem of scaling a computationally intensive Shiny app to multiple users without them blocking each other.
I am aware of promises for outsourcing intense computations, but I'd like single-user operations to be synchronous (so as to easily add progress bars for long computations), while each user is independent from each other. That is, an independent process per user session, not per operation.
So my idea was to use a main Shiny server (or other R server) to spawn independent child processes for each user, each with its own Shiny app, which then sends the results back to the front end.
Do you have any guidance on this?

R Shiny and Spark: how to free Spark resources?

Say we have a Shiny app which is deployed on a Shiny Server. We expect that the app will be used by several users via their web browsers, as usual.
The Shiny app's server.R includes some sparklyr package code which connects to a Spark cluster for classic filter, select, mutate, and arrange operations on data located on HDFS.
Is it mandatory to disconnect from Spark, i.e. to include a spark_disconnect at the end of the server.R code to free resources? I think we should never disconnect and let Spark handle the load for each arriving and leaving user. Can somebody please confirm this?
TL;DR SparkSession and SparkContext are not lightweight resources which can be started on demand.
Putting aside all security considerations related to starting a Spark session directly from a user-facing application, maintaining a SparkSession inside server (starting the session on entry, stopping it on exit) is simply not a viable option.
The server function is executed for every new session, effectively restarting a whole Spark application and rendering the project unusable. And this is only the tip of the iceberg. Since Spark reuses existing sessions (only one context is allowed for a single JVM), multiuser access could lead to random failures if a reused session has been stopped from another server call.
One possible solution is to register onSessionEnded with spark_disconnect, but I am pretty sure it will be useful only in a single user environment.
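A rough sketch of that first option is shown below, only to make the idea concrete. The `master = "local"` URL is a placeholder assumption, and the connection is opened per session here purely for illustration, which is exactly the pattern the answer warns against for multi-user deployments.

```r
# Sketch only: per-session Spark connection with cleanup on session end.
# The master URL is a placeholder (assumption), not from the original post.
server <- function(input, output, session) {
  sc <- sparklyr::spark_connect(master = "local")

  # Run the disconnect when this browser session ends.
  session$onSessionEnded(function() {
    sparklyr::spark_disconnect(sc)
  })
}
```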
Another possible approach is to use global connection, and wrap runApp with function calling spark_disconnect_all on exit:
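    runApp <- function() {
      # Register the cleanup before starting the app, so it also runs
      # if shiny::runApp() exits with an error or an interrupt.
      on.exit(sparklyr::spark_disconnect_all())
      shiny::runApp()
    }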
In practice, though, the resource manager should free resources when the driver disassociates, even if the session is not stopped explicitly.

Persistent R session in R shiny App

I have a Shiny app with around 5 GB of data loaded in global.R. For the first app user the page load time is around 3 to 4 minutes, as all the global data has to be read from disk during app initiation.
But for subsequent users (the second or third user) the page load is immediate, as the app uses the previously loaded global data in memory.
Is there a way to make the Shiny app's R process persist in memory even if all the users log out, so that whenever a new user accesses the app it loads immediately?
I am guessing you are using the .RData binary representation of the data. This is much faster to read than most file formats, but still slow.
Therefore, have you tried running an Rserve session? (https://www.rforge.net/Rserve/). It could keep your data available in memory, and you would then pass it the relevant queries/commands to retrieve the data.
The alternative is a dataset format that instantiates faster, maybe an ffdf file.

Do I need to create a new SQLite database every time an application is updated?

I have a Xamarin Forms application I would like to develop. It will have a SQLite database and I wish to make this available on iOS and Android. The database will be populated with data from a SQL Server database on the cloud with initial seed data. I'm thinking this will be about 500 rows of data with each row about 1Kb.
What I don't understand is when and how to populate this. Should I try to put the data into a CSV file and have this populate the database when the application is installed, or when it first starts? What's the normal way to populate seed data, other than lines inside the code with a huge number of insert statements?
Any help or advice on how this is normally done (I'm thinking most people do it the same way) would be much appreciated.
Thanks
Let's break the problem down.
Is the initial data that you wish to use in your app going to change over time?
If you include any pre-populated data (a SQLite, Realm, or CSV-based file, ...) and the data that you are including goes stale and you have to update it on a routine basis, you will need to publish an application update (.apk/.ipa) so your new user installs receive the updated data (more on this below).
Note: This assumes that your current users get the updated data by actually running your app, and that it handles the local data updates on a routine basis (background service, push notifications, data polling, etc.).
Is this a Line of Business (LoB) application published via Ad-Hoc, private Store, and/or iOS Enterprise publishing?
If you control the user base, then having to force an update install so your users get your new/updated pre-populated data might be an acceptable approach, but it is not a great user experience if they are forced to update the application all the time... but it works...
Is this application going to be distributed via the public Apple and Google App Stores?
This is where you need to be very careful on what pre-populated data you include within your application.
If the data goes stale and you need to push an updated app version to the Stores for your new installs, beware that it could take days (or weeks, or even a month or more) to get that new app into the store.
The Play Store usually takes less than 24 hours to publish app updates, and while the Apple Store can be the same, do not bet on it.
We routinely see 48-72 hour delays and randomly get rejected, so it can take a week or more to get an updated app into the Apple Store. We have had rejections delaying an app update for over a month, and have gone into the appeal process and even removed already existing features to get re-published.
Note: Every app update to the Apple store resets your user reviews... :-(
Bottom line: You want to publish to the Stores when you are bug fixing and/or adding features, not to update some "static" data that is stored within your app bundle...
What does this data cost your end-user and you?
Negative costs to you as an app developer are bad reviews and uninstalls. Look at how this "data" affects the end-user's access to your application and how they react. Longer download time: usually acceptable. Longer initial app startup times: less acceptable... etc....
What markets will your app be used in? Network speeds and the cost of data transfer in many markets across the world are slow and costly...
What really is the true size of the data?
I "pre-populate" a Realm data instance with thousands of rows from 5MB of JSON data in under a second. SQLite takes longer, but it is still not bad. The data itself is stored in a zip and accessed as a static file (HTTPS-based GET). At an 80% compression factor, the 1MB of compressed data is pulled from a server (AWS S3) in under one second at LTE cellular data speeds, and uncompressing it as a stream while deserializing the JSON on the fly to update the Realm instance adds another second...
So, the user impact is very small and I "hide" this initial pre-populate update via a first-time welcome screen and some text that the user hopefully reads before getting to the first "real" app screen...
Note: This does assume that the user will have network data access the first time they open the app... In many markets around the world, this is not true, so factor this into your app design.
I also architect the app so its data can be updated on background threads during its launch (the initial one or not). That way the user does not stand there watching a spinning busy indicator; they can at least interact with the data that they do have.
So should you include any pre-populated data in your app bundle?
Sure, when that data is absolutely required to get the user up and running as fast as possible to enhance the user experience. Games are a great example of this in bundling 100s of megabytes or even gigabytes via .obb... with the various levels, media files, etc... into the app so the user does not experience a 10+ min. wait time upon opening the app the first time.
Now this does mean that their initial download time for the install was longer, as that data was bundled within the app, but the overall user experience was better: users accept the download/install times and view them as a carrier/phone/service plan issue, as opposed to the time it takes to open your app the first time and actually get to a functional screen.
So what to do?
Personally I look at this issue on a case-by-case basis. I look at the data, and if it is not going to change and will only get added to and possibly pruned over time, I include it as a pre-populated SQLite or Realm store or... Why make the user wait for the web requests and database updates, and incur the additional network data usage and associated costs? If the data is going to go stale, do not bundle it in your app.
As for the mechanics of installing pre-populated data:
See my answer on this SO Question about "Bundle prebuilt Realm files"
You don't have to create your sqlite database every time the app is updated.
Actually SQLiteOpenHelper provides the following two methods:
OnCreate(): you should implement this method and create your SQLite database, populated with data from the server. It is called when the database is created for the first time, typically the first time the app is started.
OnUpgrade(): you should implement this method if you want to modify the database (add a new table or column in a table) or populate additional data.
The database is preserved between app updates and you don't need to create it each time.
Check the following examples which explain how to use sqlite database with Xamarin:
Using Sqlite in a Xamarin.Android Application Developed using Visual Studio
and
An Introduction to Xamarin.Forms and SQLite

Shiny app unstable at many simultaneous requests

I have built a quiz system using Shiny Server on Amazon Web Services. The system runs reliably when I tested it on one or two devices at home. However when I used it in the classroom with more than 10 students the system broke down. The questions and widgets loaded correctly, but when the students tried to submit their answers (after 30 - 40 minutes looking at them) the data was not handled correctly (results are saved in a csv file so I could see that).
I understand that there can be many causes for this, but I would like to know whether one might be that Shiny server is just not designed to handle many simultaneous requests. This would mean I can just forget about using Shiny for my purposes and look elsewhere. For those who are interested in the system, here is the code:
https://github.com/witusj/CFA-2/tree/master/WK4
Many thanks!
It depends on the complexity of your app and the server you host it on. There is an explanation by one of their developers here, although there are no clear guidelines.
Since you have students you can test on, you may be able to get an estimate of how many users the application will be able to handle correctly, and use this number to set a limit on the number of people who can join. If you look at the manual you will find the "Simple Scheduler" to do this. To use the example out of the manual, if you want to limit the number of connected students to 5, you would add simple_scheduler to your configuration:
    location / {
      # Define the scheduler to use for this location
      simple_scheduler 5;
      ...
    }
Since you have more than 5 students, set up multiple copies of the application under a number of different locations. You can extend this using the load balancing idea of Huidong Tang, or an implementation of that idea by sjewo.
What @FvD said. But additionally, bear in mind that there's shinyapps.io if you want someone else to host your application in a scalable way, or Shiny Server Pro if you want to back a Shiny application with multiple R processes.
Shiny Server itself can certainly handle plenty of requests (we've seen a single Shiny Server instance gracefully handle up to a thousand concurrent users, and it had plenty of room for more), but as @FvD described, it all comes down to how well your R application scales.
One caveat here: there is a bit of complexity to think through in an application like yours. If you write all your data out to a single .csv file, then you can't safely run multiple instances of the application simultaneously (the processes would be overwriting each other's file). Instead, you could consider writing out the results into a bunch of distinct CSV files which can be aggregated together later, or you could look at using something like a relational database to really do this right. This problem is described in more detail here.
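The "bunch of distinct CSV files" idea could look roughly like the base R sketch below. The helper names `save_response` and `load_responses`, and the one-file-per-submission layout, are illustrative assumptions rather than anything from the original quiz app.

```r
# Hypothetical helpers: one CSV file per submission, aggregated later.
save_response <- function(response, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # tempfile() picks a name that is very likely unique, even across
  # simultaneous R processes, so writers do not share a file.
  path <- tempfile(pattern = "response_", tmpdir = dir, fileext = ".csv")
  write.csv(response, path, row.names = FALSE)
  invisible(path)
}

load_responses <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) return(data.frame())
  # Read every per-submission file and stack the rows.
  do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
}
```

Each submit handler would call `save_response()` with a one-row data frame, and `load_responses()` would be run afterwards to collect the results. This assumes every submission has the same columns; a database remains the more robust option mentioned above.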
