Is it possible to manage R sessions?

Is it possible to manage R sessions, as in:
Connect your R console to an existing R session process?
Can two R sessions transfer data to one another?
One might desire this in the following likely scenario:
You're happily working on your R project and have generated data that took 3 hours to compute.
You decide to save your workspace in case of a technical issue.
Upon saving, however, RStudio hangs indefinitely, leaving the underlying R session unaffected.
In this scenario, you would want to
Connect to the R session with a terminal to retrieve your data anyway.
Set up a new R session that continuously synchronizes with the existing session, as a backup.
Is it possible?

Connect your R console to an existing R session process?
Not possible.
Can two R sessions transfer data to one another?
Yes, there are multiple ways to do this. The general keyword here is “inter-process communication”. You can use files, named pipes, or sockets, for example. To serialise the data you can use either built-in functions (saveRDS, readRDS) or packages (e.g. feather).
But for your given use-case, there’s a much simpler solution:
Never rely on RStudio to save your R session. Instead, do so explicitly by calling saveRDS (or, to save the whole workspace, which I don’t generally recommend, save.image). In fact, the general recommendation is to disable the RStudio options for saving and restoring the session!
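A minimal sketch of the explicit approach (the object and file names here are made up for illustration): one session serialises the result with saveRDS, and any session, current or future, can restore it with readRDS.

```r
# Session A: persist the expensive object explicitly
results <- data.frame(x = 1:3, y = (1:3)^2)   # stand-in for the 3-hour result
saveRDS(results, "results.rds")

# Session B (or a fresh session after a crash): restore it
restored <- readRDS("results.rds")
```

Because the file is written by R itself rather than by the IDE, a hung RStudio front end cannot take your saved data down with it.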
Make sure that, in the RStudio options, “Restore .RData into workspace at startup” is unticked and “Save workspace to .RData on exit” is set to “Never”.

Related

How to speed up loading data at start of shiny app

I'm pretty new to using Shiny apps to visualize data. We plan to host our Shiny app on our own server, so we used Docker to deploy it. However, the app is very slow to load, since we have to load a lot of big data frames (up to 10,000,000 rows × 10 columns) that are saved within an RData object.
My first question is: will the data be loaded each time a user visits/reloads the website?
I was looking into ways to speed up loading the data. One possibility might be to use the feather package, which seems to be faster at loading data tables.
Another option would be to put the data into a database. However, I have no experience with that. I saw there are some nice packages like DBI and RMariaDB that seem to work well with Shiny apps. However, I only find examples where an external database is queried. Is it possible to pack a MySQL database within the Docker container and access it from within the Shiny app? Or is the normal procedure to host the database externally?
I'm really new to all this, so I'm not even sure I'm asking the right questions. These are the conditions: we have a lot of data in the form of multiple data tables. Those need to be read into our app quickly and queried quickly through interactive user input. We need to dockerize our app in order to deploy it. What is the best approach here?
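On the first question: code at the top level of app.R (or in global.R) runs once per R process, and objects it creates are shared by every Shiny session that process serves, so the data is not re-loaded on each visit. Independently of that, splitting the single .RData into one .rds file per table lets the app load only what it needs. A hedged sketch, with made-up object and file names standing in for the real data:

```r
# Demo setup: a stand-in .RData holding two tables (your real file already exists)
big_a <- data.frame(x = 1:5)
big_b <- data.frame(y = letters[1:3])
save(big_a, big_b, file = "tables.RData")

# One-off conversion: unpack the .RData into one .rds file per object,
# so each table can later be loaded individually with readRDS()
e <- new.env()
load("tables.RData", envir = e)
dir.create("data", showWarnings = FALSE)
for (nm in ls(e)) saveRDS(e[[nm]], file.path("data", paste0(nm, ".rds")))

# In app.R / global.R, a top-level line like this runs ONCE per R process,
# and the object is then shared by all Shiny sessions that process serves:
big_a <- readRDS("data/big_a.rds")
```

Note that each Docker container is its own R process, so scaling out to several containers means each one loads its own copy.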

Is there a persistent location that is always writable which can be used as data cache by a package?

Is there a predefined location where an R package could store cached data? The data should persist across sessions. I was thinking about creating a subdirectory of ${R_LIBS_USER}/package_name, but I'm not sure if this is portable and if this is "allowed" if my package is installed systemwide.
The idea is the following: Create an R script mydata.R in the data subdirectory of the package which would be executed by calling data(mydata) (according to the documentation of data()). This script would load the data from the internet and cache it, if it hasn't been cached before. (If the data has been cached already, the cache will be used.) In addition, a function will be provided to invalidate the cache and/or to check if a newer version of the data is available online.
This is from the documentation of data():
Currently, four formats of data files are supported:
files ending ‘.R’ or ‘.r’ are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
...
Indeed, creating a file fortytwo.R in the data subdirectory of a package with the following contents:
fortytwo <- data.frame(answer = 42)
and then executing data(fortytwo) creates a data frame variable fortytwo. Now the question is: Where would fortytwo.R cache the data if it were difficult to compute?
EDIT: I am thinking about creating two packages: A "data" package that provides the data, and a "code" package that operates on it. The question concerns the "data" package: Where can it store files in a per-user storage so that it is persistent across R sessions and is accessible from different R projects?
Related: Package that downloads data from the internet during installation.
There is no absolutely defined location for package-specific persistent caching in R. However, the R.cache package provides an interface for creating and managing cached data. It looks like it could be useful for your scenario.
When users load R.cache (library(R.cache)), they get the following prompt:
The R.cache package needs to create a directory that will hold cache files.
It is convenient to use one in the user's home directory, because it remains
also after restarting R. Do you wish to create the '~/.Rcache/' directory? If
not, a temporary directory (/tmp/RtmpqdUcbP/.Rcache) that is specific to this
R session will be used. [Y/n]:
They can then choose to create the cache directory in their home directory, which is presumably persistent, or to create a session-specific directory. If you make your data package depend on R.cache, you could check for the existence of the cached object(s) in its .onLoad() hook function and download the data if it isn't there. Alternatively, you could do this in the way suggested in your own question.
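As an alternative to R.cache: since R 4.0.0, base R's tools::R_user_dir() returns a conventional per-user, per-package directory that persists across sessions and projects. A hedged sketch of a cache-or-compute helper; the package name "mypkg" and the data.frame are stand-ins for your real package and its download step:

```r
# Return the cached data if present; otherwise fetch it, cache it, and return it.
get_mydata <- function(cache_dir = tools::R_user_dir("mypkg", which = "cache")) {
  cache_file <- file.path(cache_dir, "mydata.rds")
  if (file.exists(cache_file)) return(readRDS(cache_file))   # cache hit
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  mydata <- data.frame(answer = 42)  # stand-in for the real internet download
  saveRDS(mydata, cache_file)        # persists across R sessions
  mydata
}
```

A companion function that simply deletes cache_file would serve as the cache invalidator you describe.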
Have you looked at in-memory databases? H2 and Redis have R bindings via RH2 and rredis; both let you share data across R sessions, as long as the creating session is alive. To have the data persist across non-concurrent sessions, you need to write it to disk (assuming you can't re-create it on the fly, which would defeat the purpose of this question), and I believe a data package would be a good option. That way, you could add an update function that runs every time you load either package (i.e. if the code package has the right dependencies).
An example is the RWeka and RWekajars packages. Look them up on CRAN, and it should be fairly easy to understand how they work.

Using MySQL as cache for quantmod getSymbols

I've been using quantmod's getSymbols function a lot, and would like to reduce the load on the external data providers and cut the time longer code loops take to execute because of network latency.
Ideal would be a function that takes a list of symbols (like getSymbols), downloads them from the provider configured in 'setSymbolLookup' and saves them in a MySQL database for easy retrieval later using getSymbols.MySQL.
A major bonus would be if another function (or the same one) allowed only downloading the difference since the last update.
Alternatively a type of proxy where the symbol is downloaded if it doesn't already exist in a local MySQL database/cache would also be good.
Has anyone developed something like this, or come across any documentation on how to do it? I've searched around, but the closest I can get is some questions about how to use MySQL as an input source.
Thanks in advance!
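The "proxy" variant can be sketched as a read-through cache around getSymbols; swapping the .rds files below for DBI writes would give the MySQL version (incremental updates would additionally need a date-range check). All names here are hypothetical, and the fetch argument is injectable so the wrapper can be exercised without network access:

```r
# Hedged sketch: return a cached copy of `symbol` if present,
# otherwise fetch it once, cache it, and return it.
get_cached_symbol <- function(symbol, cache_dir = "symbol_cache",
                              fetch = function(s) quantmod::getSymbols(s, auto.assign = FALSE)) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  cache_file <- file.path(cache_dir, paste0(symbol, ".rds"))
  if (file.exists(cache_file)) return(readRDS(cache_file))   # local hit: no network
  x <- fetch(symbol)
  saveRDS(x, cache_file)
  x
}
```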

Can my CGI call R?

I know barely more than zero about R: until yesterday I didn't know how to spell it. But I'm suicidal: for my web site, I'm thinking about letting a visitor type in an R "program" (is it even called a "program"?) and then, at submit time, blindly calling the R interpreter from my CGI. I'd then return the interpreter's output to the visitor.
Does this make sense? Or does it amount to useless noise?
If it's workable, what are the pitfalls in this approach? For example, what are the security issues, if any? Is it possible to make R crash, killing my CGI program? Do I have to clean up the R code before calling the interpreter? And the like.
You could take a look at Rserve, which allows executing R scripts over TCP/IP, and which is usable from PHP, for example, if I'm not mistaken.
It's just asking for trouble to let people run arbitrary R code on your server. You could try running it in a chroot jail, but these things can be broken out of. Even in a chroot, the R process could delete or alter files, spawn a long-running process, download a file to your server, and all manner of other nastiness.
You might look at Rweb, which has exactly this behavior: http://www.math.montana.edu/Rweb/
Since you can read and write files from R, it would not be safe to let people run arbitrary R code on your server. I would check whether R has something like PHP's safe mode... If not, and if you are root, you can try running R as the user nobody in a chroot (you must also place packages and libraries there for read-only access, plus a temporary directory for read-write access).
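If you do try this despite the warnings above, at minimum run the submitted code in a fresh, vanilla interpreter with a hard timeout, so a hang or crash cannot take your CGI process down with it. A hedged sketch (it assumes Rscript is on the PATH; system2's timeout argument requires R >= 3.5):

```r
# Run untrusted R code in a separate, throwaway interpreter.
run_untrusted <- function(code, timeout = 5) {
  # --vanilla: no profile, no site files, no saved workspace;
  # timeout: kill the child process if it runs too long
  system2("Rscript", c("--vanilla", "-e", shQuote(code)),
          stdout = TRUE, stderr = TRUE, timeout = timeout)
}
```

This only addresses hangs and crashes; it does nothing about file or network access, which still needs OS-level sandboxing (a chroot, a dedicated unprivileged user, or a container), as the answers above point out.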

Pass commands to a running R-Runtime

Is there a way to pass commands (from a shell) to an already running R runtime/R GUI, without copy and paste?
So far I only know how to call R via the shell with the -f or -e options, but in both cases a new R runtime processes the R script or R command I pass to it.
I would rather have an open R runtime waiting for commands passed to it via whatever connection is possible.
What you ask for cannot be done. R is single-threaded and has a single REPL (read-eval-print loop), which is attached to a single input: the console in the GUI, say, or stdin if you pipe into R. But never two.
Unless you use something else, e.g. the most excellent Rserve, which (when hosted on an OS other than Windows) can handle multiple concurrent requests over TCP/IP. You may however have to write your own connection client. Examples for Java, C++ and R exist in the Rserve documentation.
You can use Rterm (under C:\Program Files\R\R-2.10.1\bin on Windows, for R version 2.10.1). Or you can start R from the shell by typing "R" (if the shell does not recognize the command, you need to modify your PATH).
You could try simply saving the workspace from one session and manually loading it into the other (or any variation on this theme, like saving only the objects you share between the two sessions with saveRDS or similar). That would require some extra load and save commands, but you could automate this further by adding some lines to your .Rprofile file, which is executed at the beginning of every R session. Here is some more detailed information about R on startup. But I guess it all highly depends on what you are doing inside the R sessions. hth
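A hedged sketch of that theme, with made-up helper names and a made-up file path: each session gets a push/pull pair of helpers (e.g. defined in .Rprofile), so objects can be handed from one session to another through a shared file.

```r
# Hypothetical helpers to define in ~/.Rprofile so every session has them.
# tempdir() is used here only so the sketch is self-contained; for real
# cross-session use, pick a stable path such as "~/.r_shared_state.rds".
.shared_file <- file.path(tempdir(), "r_shared_state.rds")

# Push named objects to the shared file, e.g. push_shared(results = df)
push_shared <- function(...) {
  saveRDS(list(...), .shared_file)
}

# Pull previously pushed objects into an environment (the global one by default)
pull_shared <- function(envir = globalenv()) {
  if (!file.exists(.shared_file)) return(invisible(NULL))
  objs <- readRDS(.shared_file)
  for (nm in names(objs)) assign(nm, objs[[nm]], envir = envir)
  invisible(names(objs))
}
```

The receiving session still has to call pull_shared() itself; as the accepted answer explains, there is no way to push code into a running REPL from outside.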
