Persistent R session in R shiny App - r

I have a shiny App with around 5 GB data load in global.R. For the first App user the app page load time is around 3 to 4 mins as all the global data have to be read from the disk during app initiation.
But for subsequent users (second or third user) the page load is immediate as the app uses the previously loaded global data in memory.
Is there a way to make the Shiny App's R process to be persistent in Memory even if all the users log-out. So that whenever a new user access the app it will load immediately ?

I am guessing you are using the .RData binary representaton of the data, this is much faster than most files to read, but still slow.
Therefore, have you tried running an RServe session? (https://www.rforge.net/Rserve/). This could have your data avaliable within it, and then passing the relevant queries/commands to retrieve the data.
The alternative is a faster instantiating dataset, maybe as an ffdf file.

Related

Do shiny apps data-scoping rules apply to ShinyProxy?

as far as I understand, ShinyProxy launches a separate container for every connected user, is it possible to share data among user sessions by using these documented shiny scoping rules (see Objects visible across all sessions)?
My use case involves loading a big static dataset in memory that is the same for every app user, so the correct approach here is to have a single copy of the dataset in memory and share it among all user sessions (= load it before the 'server' function). Does this work with ShinyProxy as explained in the above Shiny documentation?
Thanks in advance,
Juanje.

Hosting shinyApp on EC2 with background running capability

I want to host a shiny app on amazon EC2 which takes a excelsheet using fileinput(). Then I need to make some API calls for each row in the excelsheet which is expected to take 1-2 hours on average for my purposes. So I figured out that this is what I should do:
Host a shiny app where one can upload an excelsheet.
On receiving an excelsheet from a user, store it on the amazon servers, notify the user that an email will be sent once the processing is complete, and trigger run another R script (I'm not sure how to do that) which will keep running in background even if the user closes the browser window and collect all the information by making the slow API calls.
Once I have all the data, store it in another excelsheet and email back to the user.
If it is possible and reasonable to do it this way or you have some other ideas to do my task, please help me with how to do it.
Edit: I've found this is what I can do otherwise:
Get the excelsheet data and store it in a file.
Call a bash script from the R shiny like this: ./<my-script> &; disown
The bash script will call a python file which makes all API calls, decodes the relevant data from JSON output and stores it in another file on the server.
It finally sends an email to the user with he processed data attached.
I wanted to know if this is an appropriate way to do the job. Thanks a lot.
Try implementing simple web framework like Django since you are using python. Flask may come in handy for creating simple routes. Please comment if you find any issues.

How can I keep objects stored in firebase cloud function RAM?

My application needs to build a couple of large hashmaps before processing a user's request. Ideally I want to store these hashmaps in-memory on the machine, which means it never has to do any expensive processing and can process any incoming requests quickly.
But this doesn't work for firebase because there's a chance a user triggers a new instance which sets off the very time-consuming preprocessing step.
So, I tried designing my application to use the firebase database, and get only the data it needs from the database each time instead of holding all the data in-memory. But, since the cloud functions are downloading loads of data from the database, I have now triggered over 1.7 GB in download for this month, just by myself from testing. This goes over the quota.
There must be something I'm missing; all I want is a permanent memory storage of some hashmaps. All I want is for those hashmaps to be ready by the time the function is called with a request. It seems like such a simple requirement; how come there is no way to do this?
If you want to store data in the container that runs your Cloud Functions, you can use its local tempfs, which is actually kept in memory. But this will disappear when the container is recycled, which happens when your function hasn't been access for a while. So this local file system will have to be rebuilt whenever the container spins up.
If you want permanent storage of values you generate, consider using Google Cloud Storage. It is probably a more cost effective option, and definitively the most scalable one.

R Shiny and Spark: how to free Spark resources?

Say we have a Shiny app which is deployed on a Shiny Server. We expect that the app will be used several users via their web browser, as usual.
The Shiny app's server.R includes some sparklyr package code which connects to a Spark cluster for classic filter, select, mutate, and arrange operations on data located on HDFS.
Is it mandatory to disconnect from Spark: to include a spark_disconnect at the end of the server.R code to free resources ? I think we should never disconnect at let Spark handle the load for each arriving and leaving user. Can somebody please help me to confirm this ?
TL;DR SparkSession and SparkContext are not lightweight resources which can be started on demand.
Putting aside all security considerations related to starting Spark session directly from a user-facing application, maintaining SparkSession inside server (starting session on entry, stopping on exit) is simply not a viable option.
server function will be executed every time there is an upcoming event effectively restarting a whole Spark application, and rendering project unusable. And this only the tip of the iceberg. Since Spark reuses existing sessions (only one context is allowed for a single JVM), multiuser access could lead to random failures if reused session has been stopped from another server call.
One possible solution is to register onSessionEnded with spark_disconnect, but I am pretty sure it will be useful only in a single user environment.
Another possible approach is to use global connection, and wrap runApp with function calling spark_disconnect_all on exit:
runApp <- function() {
shiny::runApp()
on.exit({
spark_disconnect_all()
})
}
although in practice resource manager should free resources when driver disassociates, without stopping session explicitly.

Do I need to create a new SQLite database every time an application is updated?

I have a Xamarin Forms application I would like to develop. It will have a SQLite database and I wish to make this available on iOS and Android. The database will be populated with data from a SQL Server database on the cloud with initial seed data. I'm thinking this will be about 500 rows of data with each row about 1Kb.
What I don't understand is when and how to populate this. Should I try to put the data into a CSV file and have this populate the database when the application is installed, or when it first starts? What's the normal way to populate seed data other than lines inside of the code with a huge number of insert statements.
Any help or advice on how this is normally done (I'm thinking most people do it the same way) would be much appreciated.
Thanks
Lets break the problem down.
Is the initial data that you wish to use in your app going to change over time?
If you include any pre-populated data (a SQLite, Realm, or CSV-based file, ...) and the data that you are including goes stale and you have to update it on a routine basis, you will need to publish an application update (.apk/.ipa) so your new user installs receive the updated data (more on this below).
Note: This assumes that your current users get the updated data via actually running your app and it is handling the local data updates on routine basis (background service, push notifications, data polling, etc..)
Is this a Line of Business (LoB) application published via Ad-Hoc, private Store, and/or iOS Enterprise publishing?
If you control the user base, than having to force an update install so your users get your new/updated pre-populated data might be an acceptable approach, but not a great user experience if they forced to update the application all the time... but it works...
Is this application going to be distributed via the public Apple and Google App Stores?
This is where you need to be very careful on what pre-populated data you include within your application.
If the data goes stale and you need to push an updated app version to the Stores for your new install installs, beware that it could be days (or weeks or even month+) to get that new app into the store.
The Play Store usually is less then 24 hours on publishing app updates, and while the Apple Store can be the same, do not bet on it.
We routinely see 48-72 hour delays and randomly get rejected and thus it can take a week or more to get an update app into the Apple Store. We have had rejections delaying an app update for over a month and have gone into the appeal process and even removed already existing features to get re-published
Note: Every app update to the Apple store resets your user reviews... :-(
Bottom line: You want to want to publish to the Stores when you are bug fixing and/or adding features, not to update some "static" data that is stored within your app bundle...
What does this data cost your end-user and you?
Negative costs to you as an app developer are bad reviews and uninstalls. Look at how this "data" effects the end-users access to your application and how they react. Longer download time, usually acceptable. Longer initial app startup times, less acceptable... etc....
What markets will your app be used in? Network speeds and the cost of data transfer in many markets across the world are slow and costly...
What really is the true size of the data?
I "pre-populate" a Realm data instance with thousand of rows with 5MB of JSON data in under a second. SQLite takes longer, but it is still not bad. The data itself is stored in a zip and accessed as a static file (https-based get) and at a 80% compression factor, the 1MB of compressed data is pulled from a server (AWS S3) in under one second using LTE cellular data speeds and uncompressing it as stream while deserializing the JSON on-fly to update the Realm instance adds another second...
So, the user impact is very small and I "hide" this initial pre-populate update via a first-time welcome screen and some text that the user hopefully reads before getting to the first "real" app screen...
Note: This does assume that the user will have network data access the first time they open the app... In many markets around the world, this is not true, so factor this into your app design.
I also architect the app so its data can be update on background threads during its launch (the initial one or not) and thus the user does not stand there watching a spinning busy indictor, they can at least interact with the data that they do have.
So should you include any pre-populated data in your app bundle?
Sure, when that data is absolutely required to get the user up and running as fast as possible to enhance the user experience. Games are a great example of this in bundling 100s of megabytes or even gigabytes via .obb... with the various levels, media files, etc... into the app so the user does not experience a 10+ min. wait time upon opening the app the first time.
Now this does mean that their initial download time for the install was longer as that data was bundled within the app, the overall user experience was better as users accept the download/install times and view that as a carrier/phone/service plan issue vs. the time to open your app the first time to actually get to a functional screen.
So what do?
Personally I look at this issue on a case by case basis. I look at the data and if it is not going to change and only get added to and possibly pruned over time, include it as a pre-populated SQLite or Realm store or... Why cause the user to wait for the web requests, database updates and the additional network data usage and associated costs. If the data is going to go stale, do not bundle it in your app.
As for the mechanics of installing pre-populated data:
See my answer on this SO Question about "Bundle prebuilt Realm files"
You don't have to create your sqlite database every time the app is updated.
Actually SQLiteOpenHelper provides the following two methods:
OnCreate() : you should implement this method and create your sqlite database with populated data from the server. It is called when you the app is started for the first time.
OnUpgrade(): you should implement this method if you want to modify the database (add a new table or column in a table) or populate additional data.
The database is preserved between app updates and you don't need to create it each time.
Check the following examples which explain how to use sqlite database with Xamarin:
Using Sqlite in a Xamarin.Android Application Developed using Visual Studio
and
An Introduction to Xamarin.Forms and SQLite

Resources