Workflow for using command line R?

I am used to using R in RStudio. For a new project, I have to use R on the command line, because the data storage and analysis are only allowed to be on a specific server that I connect to using ssh. This server doesn't have rstudio-server to support remote RStudio sessions.
The project involves an extremely large dataset, and some pre-written code to load/format the data that I have been told to run using "source()" before I do anything else. This takes several minutes to run and load the data each time.
What would a good workflow be for something like this? Editing my code in a .r file, saving, then running it would require taking several minutes to load the data each time. But just running R in an interactive session would make it hard to keep track of what I am doing and repeat things if necessary.
Is there some command-line equivalent to RStudio where you can have an interactive session but be editing/saving a file of your code as you go?

Sounds like Jupyter might be your friend here.
The R kernel works great.
You can use it on a remote server either by exposing an open port (and setting up Jupyter login credentials) or via port forwarding over SSH.
It is a lot like an interactive REPL, except that it holds state, and you can go back and rerun cells.
(Of course, state can be dangerous for reproducibility.)

With RStudio, you can launch a console and ssh to your remote servers, even if those servers don't run the expensive RStudio server platform. You can then send commands from RStudio directly into the ssh session with the default shortcut key. This might allow you to keep using RStudio, keep track of what you're doing in the R script, and execute code interactively.

Related

RStudio Server on Microsoft Azure instance

I am currently running R on a Microsoft Azure instance (Ubuntu virtual machine) using RStudio as my IDE, to which I connect simply through my browser. I am trying to run some commands that take quite some time to complete from within RStudio and figured that I could simply close my tab with RStudio open and the process would keep running. However, when I try to reconnect to see how the process is doing, the page keeps loading but I am unable to see RStudio.
I have a few questions regarding running RStudio on a server:
First, am I correct in thinking that I can close my tab and keep the process running?
Second, is it normal behaviour that I am unable to connect to the server while the process is running?
Third, am I going about this the correct way or are there better ways?
Yes, you can close your tab and keep it running.
RStudio Server waits on updates from the R process to update the UI. This means that if you have a long-running computation, your tab may not fully reload until it's finished. You may also have seen this in the middle of a session: when R is busy, you can have problems saving scripts that are open in the editor pane.
Logging out in the middle of a computation should be safe, but be aware that RStudio will save your workspace and shut R down after a period of inactivity. It then reloads everything when you log back in. But this only extends to objects in memory; if you have any files saved in your temp directory, they'll have disappeared when you come back. They're probably still on the disk, but since your new R session has a new temp directory, you'll have to do a manual search for them.
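A minimal sketch of the temp-directory point above, assuming you control where your script writes its intermediate files. The directory name `results` and the helper `save_result()` are hypothetical; the idea is only to write to a fixed path instead of `tempdir()`, whose location changes with every new R session.

```r
# tempdir() is created per R session, so a restarted session gets a new one
# and will not look in the old location.
session_tmp <- tempdir()

# Hypothetical helper: save an object under a fixed directory so it can be
# found again from any later session.
save_result <- function(object, name, dir = "results") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(dir, paste0(name, ".rds"))
  saveRDS(object, path)
  path
}

# Example with made-up values: store an intermediate result, then read it back.
fit_path <- save_result(list(coef = c(1.5, 2)), "model_fit")
restored <- readRDS(fit_path)
```

Anything saved this way survives RStudio Server suspending and restarting the R session, because the path does not depend on the session.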

Code secure protection

I wrote an R script on an institutional computer. I saved it on an external device (a USB pen drive) and tested it on a computer outside my institution. I ran it using R both from the Terminal and from the GUI, but I never saved it on the host computer.
This is because it has to remain secret until it is published. I would simply like to know whether R or Unix itself saved it elsewhere while it was running, even though I loaded it directly by giving the path under Volumes and so on.
In other words, the question is: does running an R script off a USB drive leave any trace of it on the host computer?
P.S.: I checked the R history to look for it but it seems not to be saved.

is it possible to run R as a daemon

I have a script in R that is frequently called during the day (by other scripts). I call R in a terminal using
Rscript code.R
I notice it takes a lot of time to load packages and set up R.
Is it possible to run R as a background service which I hit using a port or something?
Yes, look into Rserve, which has been available for over a dozen years for this reason. There are a couple of fairly high-profile applications too.
You can also check out this add-in for RStudio. It is not a port-like solution, but maybe it can help you: https://github.com/bnosac/taskscheduleR
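As a rough sketch of what this could look like, assuming the Rserve and RSclient packages are installed and a server has already been started on the default port 6311: the helper name `mean_via_rserve()` and the expression it evaluates are made up for illustration.

```r
# Sketch only: assumes the Rserve and RSclient packages are installed.
# Start the server once (it keeps running in the background), e.g.:
#   Rserve::Rserve(args = "--no-save")

# Hypothetical helper: connect to an already running Rserve instance,
# evaluate one expression there, and return the result.
mean_via_rserve <- function(host = "localhost", port = 6311L) {
  conn <- RSclient::RS.connect(host = host, port = port)
  on.exit(RSclient::RS.close(conn), add = TRUE)
  # The expression is evaluated by the server, so the caller does not pay
  # for starting a fresh R process as it would with Rscript.
  RSclient::RS.eval(conn, mean(c(1, 2, 3)))
}
```

Other scripts would then call the helper (or an equivalent client in another language) instead of launching `Rscript code.R` each time.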

Using RStudio with R backend on cluster via SSH

I have access to (not authority over) a computing cluster which has R installed. Is there a way for me to use RStudio on my local computer but have the code running on the cluster via SSH?
To clarify: no, I don't really have non-SSH access, and no, I can't install RStudio (server or desktop) on the cluster.
In line with the hackish options @hrbrmstr mentioned...
If your aim is to run mostly non-interactive code, then you can probably establish an n-node parallel::makePSOCKcluster() on the remote machines and run each of your commands via the parallel package's cluster functions. Similarly, you could use the svSocket package; see this neat demo on YouTube for more details than fit in a reasonable answer.
But, given that you said RStudio, I suspect you are thinking of interactive use, for which the above would be doable (but painful). Nothing I know of will let you just pretend that the remote machine is the local machine (which is a pity, to be sure). However, you might be able to hack something together with sink() etc. and a server- and client-side loop; see, e.g., "How to connect two computers using R?".
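A minimal sketch of the parallel::makePSOCKcluster() option, under some assumptions: the helper `run_on_workers()` is hypothetical, and for real remote use `hosts` would be node names you can reach over passwordless SSH with Rscript available on them. Here "localhost" stands in for the remote nodes so the sketch runs anywhere.

```r
library(parallel)

# Hypothetical helper: run a function once on every worker and collect the
# results in a list, one element per worker.
run_on_workers <- function(hosts, fun, ...) {
  cl <- makePSOCKcluster(hosts)
  on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down
  clusterCall(cl, fun, ...)
}

# Two local workers stand in for remote cluster nodes here.
results <- run_on_workers(c("localhost", "localhost"),
                          function(x) sum(x^2), 1:10)
```

This only covers the non-interactive case: each call ships a function and its arguments to the workers and waits for the return values.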

Running GUI analysis packages from RStudio server

RStudio server uses a headless R session and seems to pass all of the I/O operations encoded to save bandwidth. This works for everything except for packages like Rattle or Latticist, which work through their own GUI. Is there a way to use these packages through RStudio server or otherwise access the RStudio server R session to run these packages remotely?
Bonus if there's an efficient way to run these packages remotely without forwarding an X session over SSH.
I'm not sure this is possible over the RStudio interface because of the way these graphical programs work. It's easy enough for RStudio to capture textual input and output for R. Capturing normal graphical output is pretty impressive, but that's done "natively" in R. Even packages like ggplot2 and lattice use the builtin R plotting capabilities -- they do some rendering and data processing on their own, pass that on to grid, and then grid renders the plots via R builtins when plot() or print() is called (including implicitly in the REPL for interactive sessions). RCommander, RGL and the like use external libraries (Tcl/Tk, OpenGL), which render their interfaces directly over operating system services and not via R. R doesn't even see the output from these programs -- it only knows that the R wrapper function for these services hasn't returned yet. For local RStudio, this isn't a problem because the services are forwarded directly to the local display, but for RStudio server, there is no display!
Another consideration: assuming R could capture and forward X, that would imply having an X Server (in X, Server is the display/keyboard/etc, Client is the program that needs I/O) running in your browser. Modern JavaScript is pretty amazing at times, but X is a very complicated codebase and very sensitive to latency. Running X over the Internet is much slower than over the local network -- the protocol just wasn't designed for such things and most operations involve far too many roundtrips.
On a more practical side, you can still do most of your work via RStudio and only do the graphical commands via X forwarding:
1. Do everything that doesn't involve an external graphics interface.
2. Save your R session (in the Environment tab or via the command line) as .RData in your project directory. (You can actually do this elsewhere, but it's generally more convenient if your workspace is saved in the working directory.)
3. Log in via SSH with X forwarding and cd to the project directory.
4. Start R -- R will automatically load any existing workspace saved as .RData. (You can disable this behavior with --vanilla.) Depending on the size of your workspace, R may take a few seconds to a few minutes to load.
5. Have fun with Rattle, Latticist, RCommander, RGL, etc.! Be ready for massive lag if you're doing this over the Internet and not the local network (see above).
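The save and reload steps above can be sketched in R as follows. This is only an illustration: `analysis_data` is made-up data, and the explicit load() mimics what R does automatically when it is started in a directory that contains .RData.

```r
# In the RStudio Server session: save the objects of the current session to
# .RData in the working directory. save.image() does the same for everything
# in the global environment.
analysis_data <- data.frame(x = 1:3, y = c(2.5, 4.1, 6.3))  # made-up data
save(list = ls(), file = ".RData")

# In the SSH session with X forwarding: R started in the same directory
# loads .RData on its own. The explicit equivalent, loading into a separate
# environment so nothing in the current session is overwritten:
restored <- new.env()
load(".RData", envir = restored)
ls(restored)
```

After the reload, the restored objects can be passed to Rattle, Latticist and similar packages in the X-forwarded session.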
