Using RStudio with R backend on cluster via SSH - r

I have access to (not authority over) a computing cluster which has R installed. Is there a way for me to use R-Studio on my local computer -- but have the code running on the cluster via SSH?
To clarify -- No I don't really have non-SSH access, no I can't install R-Studio (server or desktop) on the cluster.

In line with the hackish options @hrbrmstr mentioned...
If your aim is to run mostly non-interactive code, then you can probably establish an n-node parallel::makePSOCKcluster() on the remote machines and run each of your commands via the parallel package's functions. Similarly, you could use the svSocket package; see this neat demo on YouTube for more details than fit in a reasonable answer.
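A minimal sketch of the makePSOCKcluster() idea, assuming passwordless SSH to the nodes and that Rscript is on their PATH. The host names are placeholders: "localhost" workers are started without SSH, so the sketch also runs on a single machine, and the commented-out call shows the kind of arguments a real cluster would probably need (the user name and node names there are hypothetical).

```r
library(parallel)

# Placeholder hosts. Replace with the cluster nodes you can reach over SSH.
hosts <- c("localhost", "localhost")

# Start one R worker per entry in `hosts`.
cl <- makePSOCKcluster(hosts)

# For real remote nodes, something along these lines may be needed
# (hypothetical node names and user):
# cl <- makePSOCKcluster(c("node1", "node2"), user = "me",
#                        homogeneous = FALSE, rscript = "Rscript")

# Run a function on the workers and collect the results locally.
squares <- parLapply(cl, 1:4, function(x) x^2)

# Evaluate an expression on every worker, e.g. to check where it runs.
nodes <- clusterEvalQ(cl, Sys.info()[["nodename"]])

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(squares))
```

Every command has to be wrapped in a call such as parLapply() or clusterEvalQ(), which is why this suits batch-style work better than interactive use.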
But, given that you said RStudio, I suspect you are thinking of interactive use, and the above would be doable (but painful). Nothing I know of will let you just pretend that the remote machine is the local machine (which is a pity to be sure). However you might be able to hack something together, with sink() etc and a server and client side loop, e.g. How to connect two computers using R?.

Related

Point RStudio to remote R instance

I would like to have RStudio running on my local machine (os x) and the R executable on a remote computer.
I'm aware that I could run RStudio Server on the remote machine and connect to it using a web interface, however I abhor using web interfaces for things like this due to the delays and limited ability to move windows and use shortcuts.
Since RStudio isn't stand alone but points to an R executable elsewhere on the local machine (in a location that can be varied), it seems like in theory this pointer could be to a remote location. (Is there some reason this is incorrect?)
How might I accomplish this?

Workflow for using command line R?

I am used to using R in RStudio. For a new project, I have to use R on the command line, because the data storage and analysis are only allowed to be on a specific server that I connect to using ssh. This server doesn't have rstudio-server to support remote RStudio sessions.
The project involves an extremely large dataset, and some pre-written code to load/format the data that I have been told to run using "source()" before I do anything else. This takes several minutes to run and load the data each time.
What would a good workflow be for something like this? Editing my code in a .r file, saving, then running it would require taking several minutes to load the data each time. But just running R in an interactive session would make it hard to keep track of what I am doing and repeat things if necessary.
Is there some command-line equivalent to RStudio where you can have an interactive session but be editing/saving a file of your code as you go?
Sounds like JuPyteR might be your friend here.
The R kernel works great.
You can use it on a remote server either by exposing an open port (and setting up JuPyteR login credentials) or via port forwarding over SSH.
It is a lot like an interactive REPL, except it holds state.
And you can go back and rerun cells.
(Of course, state can be dangerous for reproducibility.)
In RStudio, you can open a terminal and ssh to your remote server, even if the server doesn't run the expensive RStudio Server platform. You can then send commands from RStudio directly into the SSH session with the default shortcut key. This lets you keep using RStudio, track what you're doing in the R script, and execute code interactively.

Best way to execute code on remote machine

I am looking for the best way to execute code on a remote machine. Ideally, I am looking for a solution like CUDA, which lets you allocate execution to the GPU or the CPU, but across distinct machines.
I have tried several ways to do that:
I connect to my machines with ssh, export my script, and execute it. No particular issue, but not very handy. Maybe this could be optimized, since I open my ssh connection from the terminal or Termius.
I tried another way with mosh: same outcome, but quicker.
Currently, I am working on a Spyder kernel to have a direct link to the place of execution.
I've seen there is also a possibility with nohup, but I still have to work on this solution to understand what it can do.
Everything works well, but I am looking for a more convenient solution.
Thank you in advance for your answers !
You could use sshfs alongside ssh to mount the remote filesystem on your machine; that is easier than always copying the code by hand. If you do, I would recommend using screen or something similar, so that a broken connection causes no problems.
Personally, I like to work with Visual Studio Code and the SSH FS extension for this purpose.
Another alternative is X2Go, which lets you access a graphical desktop of a computer over a low-bandwidth (or high-bandwidth) connection.

Difference between using RStudio on a virtual machine and Rstudio on RServer

I am new to R and I am working with a dataset that has more than 5 million observations. So I thought that it would be a good idea to use RStudio on a virtual machine instead of on my local machine.
I am reading the documentation about virtual machines and R Server, but it is still not clear to me whether I have to use Microsoft R Server to create a VM and then install RStudio as I would on my local machine, or whether I can create a generic VM and then install RStudio. Which is the correct way? Why?
If both of these options are possible, which one is the best?
Please help me. Sorry for my confusion.
You can do either. If you are using Azure (which I think you are given that you mention Microsoft R Server), there is also the Data Science VM, which will come preinstalled with RStudio and many other useful programs.
R Server is more for production workloads with R, so unless you are planning for that, you could probably stick with the Data Science VM. If you end up choosing this option, you can connect directly to an RStudio instance on the R Server from the Azure portal.

Running GUI analysis packages from RStudio server

RStudio server uses a headless R session and seems to pass all of the I/O operations encoded to save bandwidth. This works for everything except for packages like Rattle or Latticist, which work through their own GUI. Is there a way to use these packages through RStudio server or otherwise access the RStudio server R session to run these packages remotely?
Bonus if there's an efficient way to run these packages remotely without forwarding an X session over SSH.
I'm not sure this is possible over the RStudio interface because of the way these graphical programs work. It's easy enough for RStudio to capture textual input and output for R. Capturing normal graphical output is pretty impressive, but that's done "natively" in R. Even packages like ggplot2 and lattice use the builtin R plotting capabilities -- they do some rendering and data processing on their own, pass that onto grid and then grid renders the plots via R builtins when plot() or print is called (including implicitly in the REPL for interactive sessions). RCommander, RGL and the like use external libraries (Tcl/Tk, OpenGL), which render their interfaces directly over operating system services and not via R. R doesn't even see the output from these programs -- it only knows that the R wrapper function for these services hasn't returned yet. For local RStudio, this isn't a problem because the services are forwarded directly to the local display, but for RStudio server, there is no display!
Another consideration: assuming R could capture and forward X, that would imply having an X Server (in X, Server is the display/keyboard/etc, Client is the program that needs I/O) running in your browser. Modern JavaScript is pretty amazing at times, but X is a very complicated codebase and very sensitive to latency. Running X over the Internet is much slower than over the local network -- the protocol just wasn't designed for such things and most operations involve far too many roundtrips.
On a more practical side, you can still do most of your work via RStudio and only do the graphical commands via X forwarding:
Do everything that doesn't involve an external graphics interface.
Save your R Session (in the Environment tab or via the command line) as .RData in your project directory. (You can actually do this elsewhere, but it's generally more convenient if your workspace is saved in the working directory.)
Log in via SSH with X forwarding and cd to the project directory.
Start R -- R will automatically load any existing workspace saved as .RData. (You can disable this behavior with --vanilla.) Depending on the size of your workspace, R may take a few seconds to a few minutes to load.
Have fun with Rattle, Latticist, RCommander, RGL, etc! Be ready for massive lag if you're doing this over the Internet and not the local network (see above).
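The save-and-reload part of the steps above can be sketched as follows. This is only an illustration: `project_dir` and `results` are hypothetical names, and tempdir() stands in for the real project directory so the snippet runs anywhere.

```r
# Step 1 (in the RStudio Server session): save the whole workspace.
# tempdir() is a stand-in for the project directory.
project_dir <- tempdir()
results <- data.frame(x = 1:3, y = c(2, 4, 6))
save.image(file = file.path(project_dir, ".RData"))

# Step 2 (in an R session started after `ssh -X` and `cd` to the project):
# R loads ./.RData on its own at startup. The explicit load() below does
# the same thing by hand, into a separate environment for inspection.
restored <- new.env()
load(file.path(project_dir, ".RData"), envir = restored)
print(ls(restored))
```

Loading into a separate environment is optional; calling load() without `envir` restores the objects straight into the current workspace, which is what R's automatic startup load does.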

Resources