Is there a way to send data frames from server to client in R?

I write my R scripts on my laptop. When I am done, I FTP the script to my university cluster and run it there (in parallel if needed). Most of my functions return data frames that I'd like to plot using ggplot. That works fine; however, I'd like to use tikzDevice to generate TikZ (LaTeX code) for my plots so that they have the same font and style as my thesis.
The problem:
I can't run tikzDevice on the university cluster because the required LaTeX packages are missing, and I can't install them because I have no sudo access. Essentially, this route is a dead end for me.
Solution:
I can run tikzDevice on my own laptop. Since I am writing my LaTeX document (thesis) on my laptop, it's a seamless \include.
The problem is that the data (as data frames) live on the university cluster. I COULD save the data frames as text files, download them to my laptop, and read.table them, but this is going to kill my productivity.
Are there any packages, tools, software, anything that will let me "extract" my data from the university server?
A possible solution is https://gist.github.com/SachaEpskamp/5796467
but I have no idea how to use this.
Note: I also don't know which part of the SE network this could go on.

I've found a workaround solution to this.
For those looking to transfer data back and forth between server and client: you can send and receive objects by serializing them.
On the server, use saveRDS to write the object to a file; on the client, use readRDS to read it back. To read from a URL, wrap the connection in gzcon, like so:
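# gzcon() decompresses the stream, since saveRDS() writes gzip-compressed data by default
con <- gzcon(url("http://path.com/to/your/object/serialized"))
a <- readRDS(file = con)
close(con)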
Obviously, this depends on the server exposing the file over some protocol (such as HTTP).
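A minimal sketch of the full round trip, assuming a local temporary file stands in for the file the server would expose. The data frame `results` and the path `rds_path` are made-up names for illustration; on a real setup the client would wrap url(...) in gzcon as shown above.

```r
# Server side: serialize a data frame to an .rds file.
# `results` and `rds_path` are illustrative names, not from the original post.
results <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
rds_path <- tempfile(fileext = ".rds")
saveRDS(results, file = rds_path)

# Client side: read the object back through a gzcon-wrapped connection.
# A local file connection stands in here for url("http://...").
con <- gzcon(file(rds_path, open = "rb"))
restored <- readRDS(con)
close(con)

# The restored object should match the original.
print(identical(results, restored))
```

Unlike a text file read with read.table, the .rds file keeps column types and attributes, so the object can go straight into ggplot on the client.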

Related

Workflow for using command line R?

I am used to using R in RStudio. For a new project, I have to use R on the command line, because the data storage and analysis are only allowed to be on a specific server that I connect to using ssh. This server doesn't have rstudio-server to support remote RStudio sessions.
The project involves an extremely large dataset, and some pre-written code to load/format the data that I have been told to run using "source()" before I do anything else. This takes several minutes to run and load the data each time.
What would a good workflow be for something like this? Editing my code in a .r file, saving, then running it would require taking several minutes to load the data each time. But just running R in an interactive session would make it hard to keep track of what I am doing and repeat things if necessary.
Is there some command-line equivalent to RStudio where you can have an interactive session but be editing/saving a file of your code as you go?
Sounds like Jupyter might be your friend here.
The R kernel works great.
You can use it on a remote server either by exposing an open port (and setting up Jupyter login credentials) or via port forwarding over SSH.
It is a lot like an interactive REPL, except that it holds state, and you can go back and rerun cells.
(Of course, state can be dangerous for reproducibility.)
With RStudio, you can launch a console and ssh to your remote server even if the server doesn't run the expensive RStudio Server platform. You can then send commands from RStudio directly into the ssh session with the default shortcut key. This lets you keep using RStudio, track what you're doing in an R script, and still execute interactively.

Link Project and R Version

I have two different versions of R installed, one which is up to date and which I use for all my regular R coding (needs to be up to date so that I can use various updated and new packages) and one which I use to access OLAP cubes (needs to be the R Client from Microsoft, because this is the only one which supports the olapR package, and which currently uses R version 3.4.3).
Since, in theory, I only have to access the OLAP cube once a month, I "outsourced" this task to a different RStudio project, in which I download and save the required data for all other projects. Hence, all other projects never require the olapR package to be installed and can and will be run in the up to date R version.
Now, ideally I would like to link my R version to my projects, so that I do not have to change my global R version and restart RStudio every time I access the OLAP cube or work on this data retrieval project (and then switch it back). However, I could not find any options in RStudio to achieve this result.
There are a few threads out there describing the same problem, but with no satisfactory answer in my opinion:
https://support.rstudio.com/hc/en-us/community/posts/200657296-Link-Project-and-R-Version
Rstudio project using different version of R
I also tried looking for a different package than olapR but with similar functionality, but could not find anything except X4R, which seems outdated and does not work for me (https://github.com/overcoil/X4R). Sadly, I am also unable to directly access the databases which the OLAP cube uses for its results, so I cannot go "around" it.
I am happy for any help or suggestions you can offer, whether it is a general workaround to link a project to a specific R version or the (less helpful for the community) solution of accessing the OLAP cube in a different way.
Thanks in advance!
Using the answer from MrGumble I created a .bat file that will execute my .R file using the desired R installation. Even though it is not the answer I thought I would get, I think it is an even better solution to the problem.
For all facing a similar issue, here is the .bat file (never created one before, so also had to google how to do it and I guess some might be in the same position):
@echo off
title Getting data for further processing in R
echo Retrieving OLAP data
echo.
"C:\Program Files\Microsoft\R Client\R_SERVER\bin\Rscript.exe" "C:\Users\me\Documents\Projects\!Data\script.R"
echo.
echo Saved data
echo.
pause
Thanks again to MrGumble for his help.
Skip RStudio.
RStudio is really just an editor (albeit a powerful and useful one), which starts an R console for you (and sets up the surrounding PATH variables, library locations, etc.).
If your monthly task only requires you to run the R-script (or a bit of interactive work), you can simply execute your preferred version of R from the command line and have it run your R script. E.g.
C:\Users\me>"C:\Program Files (x64)\Microsoft R\bin\Rscript" myscript.R
You might have to define some PATH variables so that the older R doesn't look for packages in the newer R's libraries, but that depends entirely on your current setup.
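The same idea can be driven from inside an R session with system2(). This is only a sketch: the current installation's Rscript is used as a stand-in so that it runs anywhere, and `rscript` would need to point at the other installation's Rscript instead.

```r
# Path to the Rscript binary of the R installation that should do the work.
# Assumption: the current installation is a stand-in; replace this with the
# path to the other installation's Rscript.
rscript <- file.path(R.home("bin"), "Rscript")

# Ask that installation to report its version.
# --vanilla skips user profiles and saved workspaces; stdout = TRUE captures output.
version_line <- system2(
  rscript,
  args = c("--vanilla", "-e", shQuote("writeLines(R.version.string)")),
  stdout = TRUE
)
print(version_line)
```

To run a script file rather than an expression, pass its quoted path in args instead of the -e pair.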

Using RStudio with R backend on cluster via SSH

I have access to (not authority over) a computing cluster which has R installed. Is there a way for me to use R-Studio on my local computer -- but have the code running on the cluster via SSH?
To clarify -- No I don't really have non-SSH access, no I can't install R-Studio (server or desktop) on the cluster.
In line with the hackish options @hrbrmstr mentioned...
If your aim is to run mostly non-interactive code, then you can probably establish an n-node parallel::makePSOCKcluster() on the remote machines and run each of your commands via the parallel package's cluster functions. Similarly, you could use the svSocket package; see this neat demo on YouTube for more details than fit in a reasonable answer.
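A rough sketch of the makePSOCKcluster() route. It assumes two workers on the local machine so that it runs as is; for the cluster you would pass host names instead, which requires SSH access to them and a way for the workers to connect back to your machine.

```r
library(parallel)

# Assumption: two local workers so the sketch is runnable anywhere.
# For remote nodes, pass a character vector of host names instead,
# e.g. makePSOCKcluster(c("node1", "node2")) (hypothetical names).
cl <- makePSOCKcluster(2)

# Run a function on the workers and collect the results locally.
squares <- parLapply(cl, 1:4, function(x) x^2)

# Ask each worker which machine it is running on.
hosts <- clusterCall(cl, function() Sys.info()[["nodename"]])

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(squares))
```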
But, given that you said RStudio, I suspect you are thinking of interactive use, and the above would be doable (but painful). Nothing I know of will let you just pretend that the remote machine is the local machine (which is a pity to be sure). However you might be able to hack something together, with sink() etc and a server and client side loop, e.g. How to connect two computers using R?.

Efficient switching between 32bit and 64bit R versions

I am working with large datasets that are available in *.mdb (i.e. Access database) format. I am using the RODBC package to extract data from the Access database. I found out that I have 32-bit Office installed on my machine, and because of that it seems I can only use 32-bit R to connect to the Access database through RODBC. After reading the data with 32-bit R and doing some exploratory analysis (plotting data, summary/regression), I ran into memory issues that I didn't get with 64-bit R.
Currently, I am using RStudio to run all my code, and I can change the version of R that I use from Options >> Global Options >> R version.
However, I don't want to switch to 32-bit to read the Access database using RODBC and then go back into the RStudio options to revert to 64-bit for the analysis. Is there an automatic solution that allows me to specify 32-bit or 64-bit? Can we do that using a batch file? If anyone could shed some light on this, that would be great.
Write your code that extracts the data as one R script. Have that script save the output data that you need for your analysis to an .RData file.
Write your analysis code as a second script, to be run in 64-bit R. At the top of it, using the answer found here, launch the extraction script with 32-bit R. The next line can then read the data in from the .RData file. If the file needs time to finish writing, use Sys.sleep to have the analysis script wait a few seconds before loading it.
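A sketch of that handoff, under stated assumptions: the extraction script is generated in a temporary file, a made-up data frame stands in for the RODBC query result, and the current Rscript stands in for the 32-bit one (`rscript32` is a hypothetical name you would point at your 32-bit Rscript).

```r
# Step 1 (normally its own file, run under 32-bit R): extract and save the data.
# The data frame is a made-up placeholder for the RODBC query result.
extract_script <- tempfile(fileext = ".R")
rdata_path <- tempfile(fileext = ".RData")
writeLines(c(
  "extracted <- data.frame(id = 1:3, value = c(10, 20, 30))",
  sprintf("save(extracted, file = %s)", deparse(rdata_path))
), extract_script)

# Step 2 (64-bit session): launch the extraction script with the other R.
# Assumption: the current Rscript is a stand-in so the sketch runs anywhere.
rscript32 <- file.path(R.home("bin"), "Rscript")
status <- system2(rscript32, args = c("--vanilla", shQuote(extract_script)))

# Step 3: load the saved object into the 64-bit session and carry on.
load(rdata_path)
print(extracted)
```

system2() waits for the child process to finish by default, so with this approach the .RData file should already be complete when load() runs.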

Running GUI analysis packages from RStudio server

RStudio server uses a headless R session and seems to pass all of the I/O operations encoded to save bandwidth. This works for everything except for packages like Rattle or Latticist, which work through their own GUI. Is there a way to use these packages through RStudio server or otherwise access the RStudio server R session to run these packages remotely?
Bonus if there's an efficient way to run these packages remotely without forwarding an X session over SSH.
I'm not sure this is possible over the RStudio interface because of the way these graphical programs work. It's easy enough for RStudio to capture textual input and output for R. Capturing normal graphical output is pretty impressive, but that's done "natively" in R. Even packages like ggplot2 and lattice use the builtin R plotting capabilities -- they do some rendering and data processing on their own, pass that onto grid and then grid renders the plots via R builtins when plot() or print is called (including implicitly in the REPL for interactive sessions). RCommander, RGL and the like use external libraries (Tcl/Tk, OpenGL), which render their interfaces directly over operating system services and not via R. R doesn't even see the output from these programs -- it only knows that the R wrapper function for these services hasn't returned yet. For local RStudio, this isn't a problem because the services are forwarded directly to the local display, but for RStudio server, there is no display!
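One way to see the "no display" situation from inside R is to check for a display before loading a GUI package. This is only a sketch and assumes a Unix-alike server, where X11 clients locate the display through the DISPLAY environment variable.

```r
# Is a display reachable? DISPLAY is typically set when logging in with X
# forwarding and empty in a headless session (assumption: Unix-alike system).
has_display <- nzchar(Sys.getenv("DISPLAY"))

# Was this R build compiled with Tcl/Tk support at all?
has_tcltk <- capabilities("tcltk")[["tcltk"]]

if (has_display && has_tcltk) {
  message("Display found; Tcl/Tk-based GUIs should be able to open a window.")
} else {
  message("No usable display; Tcl/Tk-based GUIs will likely fail to start here.")
}
```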
Another consideration: assuming R could capture and forward X, that would imply having an X Server (in X, Server is the display/keyboard/etc, Client is the program that needs I/O) running in your browser. Modern JavaScript is pretty amazing at times, but X is a very complicated codebase and very sensitive to latency. Running X over the Internet is much slower than over the local network -- the protocol just wasn't designed for such things and most operations involve far too many roundtrips.
On a more practical side, you can still do most of your work via RStudio and only do the graphical commands via X forwarding:
1. Do everything that doesn't involve an external graphics interface.
2. Save your R session (in the Environment tab or via the command line) as .RData in your project directory. (You can actually do this elsewhere, but it's generally more convenient if your workspace is saved in the working directory.)
3. Log in via SSH with X forwarding and cd to the project directory.
4. Start R -- R will automatically load an existing workspace saved as .RData. (You can disable this behavior with --vanilla.) Depending on the size of your workspace, R may take a few seconds to a few minutes to load.
5. Have fun with Rattle, Latticist, RCommander, RGL, etc.! Be ready for massive lag if you're doing this over the Internet and not the local network (see above).
