Build system for R

I've got a large data analysis project containing dozens of R scripts that depend on each other in complicated ways, so I thought it would be a good idea to formalize all these dependencies and put the project in a build system that runs things in the correct order and re-runs anything that changes, along with anything downstream of it.
But even after some hours' worth of googling I haven't found any build systems that are custom-made for R (though there are plenty for more general purposes). I've previously worked with waf to organize data analysis projects in Python and know I could use waf to run R scripts as well. But having to manage a whole Python environment just to run some R scripts seems clunky.
What are other people using to solve this problem?

There is a package called "GRANBase" that is capable of doing something similar to what you're referring to. It's an R package management & build tool, so if your scripts are put into R package(s), you could probably make use of it.
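For a sense of what such a tool has to do, here is a minimal sketch in base R of the core rule described in the question: re-run a script only when its outputs are missing or older than its inputs. The helper names is_stale and build_step are hypothetical, and this is not how GRANBase or waf work internally; it only illustrates timestamp-based rebuilding and assumes every listed dependency exists.

```r
# Hypothetical helper: TRUE if any target is missing or older than any dependency.
# Assumes all files in `deps` exist.
is_stale <- function(targets, deps) {
  if (!all(file.exists(targets))) return(TRUE)
  max(file.mtime(deps)) > min(file.mtime(targets))
}

# Hypothetical build step: re-run `script` only when its outputs are out of date.
# The script itself counts as a dependency, so editing it triggers a re-run.
build_step <- function(script, inputs, outputs) {
  if (is_stale(outputs, c(script, inputs))) {
    message("Running ", script)
    source(script, local = new.env())
    invisible(TRUE)
  } else {
    message("Up to date: ", script)
    invisible(FALSE)
  }
}
```

A pipeline would then be a sequence of calls such as build_step("01_clean.R", "raw.csv", "clean.rds") (placeholder file names) listed in dependency order. Because each step's outputs are the next step's inputs, a change upstream makes everything downstream stale as well.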

Related

Link Project and R Version

I have two different versions of R installed, one which is up to date and which I use for all my regular R coding (needs to be up to date so that I can use various updated and new packages) and one which I use to access OLAP cubes (needs to be the R Client from Microsoft, because this is the only one which supports the olapR package, and which currently uses R version 3.4.3).
Since, in theory, I only have to access the OLAP cube once a month, I "outsourced" this task to a different RStudio project, in which I download and save the required data for all other projects. Hence, all other projects never require the olapR package to be installed and can and will be run in the up to date R version.
Now, ideally I would like to link my R version to my projects, so that I do not have to change my global R version and restart RStudio every time I access the OLAP cube or work on this data retrieval project (and then switch it back). However, I could not find any options in RStudio to achieve this result.
There are a few threads out there describing the same problem, but with no satisfactory answer in my opinion:
https://support.rstudio.com/hc/en-us/community/posts/200657296-Link-Project-and-R-Version
Rstudio project using different version of R
I also tried looking for a different package than olapR but with similar functionality, but could not find anything except X4R, which seems outdated and does not work for me (https://github.com/overcoil/X4R). Sadly, I am also unable to directly access the databases which the OLAP cube uses for its results, so I cannot go "around" it.
I am happy for any help or suggestions you can offer, whether it is a general workaround to link a project to a specific R version or the (less helpful for the community) solution of accessing the OLAP cube in a different way.
Thanks in advance!
Using the answer from MrGumble I created a .bat file that will execute my .R file using the desired R installation. Even though it is not the answer I thought I would get, I think it is an even better solution to the problem.
For all facing a similar issue, here is the .bat file (never created one before, so also had to google how to do it and I guess some might be in the same position):
@echo off
title Getting data for further processing in R
echo Retrieving OLAP data
echo.
"C:\Program Files\Microsoft\R Client\R_SERVER\bin\Rscript.exe" "C:\Users\me\Documents\Projects\!Data\script.R"
echo.
echo Saved data
echo.
pause
Thanks again to MrGumble for his help.
Skip RStudio.
RStudio is really just an editor (albeit a powerful and useful one), which starts an R console for you (and sets up the surrounding PATH variables, library locations, etc.).
If your monthly task only requires you to run the R-script (or a bit of interactive work), you can simply execute your preferred version of R from the command line and have it run your R script. E.g.
C:\Users\me>"C:\Program Files (x64)\Microsoft R\bin\Rscript" myscript.R
You might have to define some PATH variables so that the older R doesn't look for packages in the newer R's libraries, but that depends entirely on your current setup.
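To confirm that a script really runs under the intended installation and library locations, a small guard at the top of the script can help. The following is a sketch only; the function names check_r_version and report_session are hypothetical, and the version string is whatever your setup requires (3.4.3 in the question).

```r
# Hypothetical guard: returns TRUE when the running R matches the required version,
# and warns otherwise.
check_r_version <- function(required, running = getRversion()) {
  running <- numeric_version(as.character(running))
  ok <- running == numeric_version(required)
  if (!ok) {
    warning("Expected R ", required, " but running R ", as.character(running))
  }
  ok
}

# Print the interpreter and library locations this session actually uses.
report_session <- function() {
  message("R version: ", R.version.string)
  message("R home:    ", R.home())
  message("Libraries: ", paste(.libPaths(), collapse = "; "))
}
```

Calling report_session() followed by check_r_version("3.4.3") at the start of the script shows immediately whether the older R is picking up the newer R's libraries.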

How to switch R architectures dynamically in RStudio

In RStudio there's a Tools menu which allows you to select an installed version/architecture of R under Global Options.
That's great, but my issue with that is that, as the name implies, it is a Global option, so once you select a different architecture (or version number) you then have to restart RStudio and it applies to all of your RStudio instances and projects.
This is a problem for me because:
I have some scripts within a given project that strictly require 32-bit R due to the fact that they're interfacing with 32-bit databases, such as Hortonworks' Hadoop
I have other scripts within the same project which strictly require 64-bit R, due to (a) availability of certain packages and (b) memory limits being prohibitively small in 32-bit R on my OS
We can call this "Issue #1". It's also a problem because I have certain projects which require a specific architecture, though all the scripts within each such project use the same architecture (this should theoretically be an easier problem to solve; we can call it "Issue #2").
If we can solve Issue #1 then Issue #2 is solved as well. If we can solve Issue #2 I'll still be better off, even if Issue #1 is unsolved.
I'm basically asking if anyone has a hack, work-around, or better workflow to address this need for frequently switching architectures and/or needing to run different architectures in different R/RStudio sessions simultaneously for different projects on a regular basis.
I know that this functionality would probably represent a feature request for RStudio and if this question is not appropriate for StackOverflow for that reason then let me know and I'll delete it. I just figured that a lot of other people probably have this issue, so maybe someone has found a work-around/hack?
There's no simple way to do this, but there are some workarounds. One you might consider is launching the correct bit-flavor of R from the current bit-flavor of R via system2 invoking Rscript.exe, e.g. (untested code):
source32 <- function(file) {
  # Run the script with the 32-bit Rscript; quote the path in case it contains spaces
  system2("C:\\Program Files\\R\\R-3.1.0\\bin\\i386\\Rscript.exe", shQuote(normalizePath(file)))
}
...
# Run a 64 bit script
source("my64.R")
# Run a 32 bit script
source32("my32.R")
Of course that doesn't really give you a 32 bit interactive session so much as the ability to run code as 32 bit.
One other tip: If you hold down CTRL while launching RStudio, you can pick the R flavor and bitness to launch on startup. This will save you some time if you're switching a lot.
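When scripts with different architecture requirements live in the same project, it can also help to have each script check the architecture it is running under and fail early. This is a sketch with hypothetical function names (r_bitness, require_bitness); it relies on .Machine$sizeof.pointer, which is 4 on 32-bit builds of R and 8 on 64-bit builds.

```r
# Pointer size in bytes is 4 on 32-bit builds of R and 8 on 64-bit builds.
r_bitness <- function() 8L * .Machine$sizeof.pointer

# Hypothetical guard: stop early if the script runs under the wrong architecture.
require_bitness <- function(bits) {
  if (r_bitness() != bits) {
    stop("This script needs ", bits, "-bit R but is running under ",
         r_bitness(), "-bit R (", R.version$arch, ")")
  }
  invisible(TRUE)
}
```

Putting require_bitness(32) at the top of a script that talks to a 32-bit database driver turns a confusing driver error into a clear message.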

Interfacing R with other non-Java languages / Compiling R to executable

I've developed a .R script that works with a DB, does a bunch of processing and outputs graphs and tables. I can output that data as comma-separated values and pictures and later import them into my software; I have no issue with that part.
The problem is how to distribute my application without having to make a complete install of R on the client. I've seen things like rJava, but my app is in VB6 (yeah...) and I don't see any libraries, or ways to compile to exe. The compiler package only makes compiled versions of any function you define, like what psyco used to do for Python (before PyPy).
Does anyone have some insight on compiling R so that the user does not have to install an entire additional piece of software?
EDIT: Does an R compiler exist? This question relates deeply to mine, but I haven't seen how it can be used to make a full script an exe. Can you just compile a main function and cat it to a file? Is that even possible?
The short answer is "no, that will not work".
There simply is no compiler that allows you to shrink-wrap your app. So your best bet may be either
using the headless Rserve over the network, or
using the R (D)COM server used by RExcel et al

Tools Commonly used to Program in R

I apologize if this has already been asked a different way but I couldn't find anything getting at what I wanted.
I am really getting into R from other packages (SPSS). As I learn about what truly can be done, I realize that there are additional "tools" that I need. This gets me to my question.
What setup do you have for developing R code? I can't see myself actually developing R packages anywhere in the near future, but I do see myself wanting to manage my R projects efficiently, as well as create reports and presentations in LaTeX.
For context, I develop my R code in Eclipse for Windows, but I have had a real hard time successfully setting up Latex/Sweave and Github plugins.
Lastly, do you develop code using Windows or something else?
Many thanks in advance for any insight you can lend.
Emacs has everything I commonly need:
ESS (for R),
AucTeX (for Latex),
similarly rich 'modes' for other languages I use (C++, make, shell, ...),
plus a lot of other modes you get quite used to as e.g. dired for directory/file browsing or org-mode as planner/to-do list,
the SVN integration is very good too
and there are probably a number of tools within Emacs I am now forgetting.
Works in text mode as well as graphical mode, and works essentially the same (incl ESS and AucTeX) on several operating systems (Linux mostly and Windows when I must). On Debian/Ubuntu all this is prepackaged and tends to work out of the box as well. For both Windows and OS X, Vincent Goulet has packaged very handy bundles, see here.
The 'daemon mode' is outstanding too -- I keep the same main Emacs session running and just connect and re-connect to it even when accessing the machine (via ssh or directly) from different computers.
Also see the EmacsWiki for more tips around Emacs.
Back to Emacs and R in particular. The R FAQ says it pretty well:
6.1 Is there Emacs support for R?
6.2 Should I run R from within Emacs?
and I like the affirmative and resounding answer to the second question: "Yes, definitely". I fully concur.
I'll second the suggestion that Emacs complements R nicely, but let me share what the "killer feature" is for me.
Using Org-mode with Org-babel, I can write whole reports with inline graphs produced from R in raster and vector format, which compile seamlessly into a PDF report via LaTeX. I can also view the graphs while editing, similar to a WYSIWYG editor.
I just wrapped up a major report with over 70 inline graphs with little effort, no editing external files, no issues maintaining naming between figures in my report and external files, or forgetting to recompile the latest version of a figure. Org & Babel does it all.
Org-mode:
http://orgmode.org/
Org-Babel:
http://orgmode.org/worg/org-contrib/babel/index.php
Example of inline R with Babel and PDF output, see the first example in multiple formats:
http://orgmode.org/worg/org-contrib/babel/languages/ob-doc-R.php
Enjoy!
This is probably more relevant for package development, but it is also worth mentioning the roxygen R package that allows in-source documentation of your code. Note that even though you can't see yourself developing R packages anywhere in the near future, a package can be a very handy way of grouping related functions you develop and maintain, consistently documenting the code and keeping track of updates, even if you do not plan to distribute it.
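As an illustration of what in-source documentation looks like, here is a small made-up function (standardize is a hypothetical example, not taken from any package) documented with roxygen-style comments. The #' lines use standard roxygen tags (@param, @return, @export), which the package turns into a help page when the function lives inside an R package.

```r
#' Standardize a numeric vector
#'
#' Centers a vector on its mean and scales it by its standard deviation.
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be ignored when computing the mean
#'   and standard deviation?
#' @return A numeric vector the same length as `x`.
#' @export
standardize <- function(x, na.rm = TRUE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}
```

Outside a package the comments are simply ignored by R, so the same file still works when sourced directly.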
I use a mac, and my most important tools are:
the command line, for running R
git, for keeping track of changes
github for publishing my code, bug tracking and collaboration
textmate for writing R code
Has anyone tried RStudio? It's the shiny new editor for R.
I use windows... (don't say it).
I like Notepad++ and NPPtoR. Makes it pretty easy to send things back and forth.
I use Eclipse on Windows and Linux. I compile LaTeX code (with Sweave) on Linux and I haven't bothered yet to set up the whole process in Eclipse. I need to pdflatex and bibtex files several times anyway, so I just have a terminal window with the specific string of commands handy. I tried ESS and Eclipse and they're very similar in functionality (and in my opinion the best two editors out there).
I use Eclipse / StatET on Windows, and it rocks! For LaTeX/Sweave I use MiKTeX, which works well for me. For help setting things up check out this document and this post.
Other tools you may find useful include:
If you want to build R packages on Windows, then get Rtools.
For creating documents, you may want to check out odfWeave, LibreOffice (was OpenOffice) and the MSOffice ODF plugin.
I have also dabbled with Git but didn't get very far on Windows, though that was a while ago.
For presentations in LaTeX I recommend Beamer.
I use Eclipse for both R and LaTeX while working on research papers. The plugins for both are very mature now. The nice thing is that you don't have to switch applications while writing papers. I used different combinations before but I found this to be the best.
I just got home from our local R User meeting (find one near you here) and of the 20 or so people there, all of us used a different program or tool to write R code in. I think that goes to show that the tools used to write and edit R code are just as diverse as the R community itself.

How do you use multiple versions of the same R package?

In order to be able to compare two versions of a package, I need to be able to choose which version of the package I load. R's package system is set by default to overwrite existing packages, so that you always have the latest version. How do I override this behaviour?
My thoughts so far are:
I could get the package sources, edit the descriptions to give different names and build, in effect, two different packages. I'd rather be able to work directly with the binaries though, as it is much less hassle.
I don't necessarily need to have both versions of the packages loaded at the same time (just installed somewhere at the same time). I could perhaps mess about with Sys.getenv('R_HOME') to change the place where R installs the packages, and then .libPaths() to change the place where R looks for them. This seems hacky though, so does anyone have any better ideas?
You could selectively alter the library path. For complete transparency, keep both out of your usual path and then do
library(foo, lib.loc="~/dev/foo/v1") ## loads v1
and
library(foo, lib.loc="~/dev/foo/v2") ## loads v2
The same works for install.packages(), of course. All these commands have a number of arguments, so the hooks you aim for may already be present. So don't look at changing R_HOME, rather look at help(install.packages) (assuming you install from source).
But AFAIK you cannot load the same package twice under the same name.
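Putting the pieces together, a workflow along these lines might look as follows. This is a sketch under assumptions: the package name foo, the tarball names and the helper names version_in and switch_version are all placeholders, and unloading a namespace can fail if another loaded package imports it.

```r
# Hypothetical library directories, one per version of the package.
lib_v1 <- "~/dev/foo/v1"
lib_v2 <- "~/dev/foo/v2"

# Install each source tarball into its own library (file names are placeholders):
# install.packages("foo_1.0.tar.gz", repos = NULL, type = "source", lib = lib_v1)
# install.packages("foo_2.0.tar.gz", repos = NULL, type = "source", lib = lib_v2)

# Report which version a given library holds, without attaching the package.
version_in <- function(pkg, lib) {
  as.character(packageVersion(pkg, lib.loc = lib))
}

# Swap versions within one session: unload the current one, then attach
# the copy found in the requested library.
switch_version <- function(pkg, lib) {
  if (pkg %in% loadedNamespaces()) unloadNamespace(pkg)
  library(pkg, lib.loc = lib, character.only = TRUE)
}
```

With both versions installed, version_in("foo", lib_v1) and version_in("foo", lib_v2) confirm what is where, and switch_version("foo", lib_v2) moves the session from one to the other. For a clean comparison, running each version in a separate R session is the safer option.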
Many years have passed since the accepted answer, which is of course still valid. It might however be worthwhile to mention a few new options that have arisen in the meantime:
Managing multiple versions of packages
For managing multiple versions of packages on a project (directory) level, the packrat tool can be useful: https://rstudio.github.io/packrat/. In short
Packrat enhances your project directory by storing your package dependencies inside it, rather than relying on your personal R library that is shared across all of your other R sessions.
This basically means that each of your projects can have its own "private library", isolated from the user and system libraries. If you are using RStudio, packrat is very neatly integrated and easy to use.
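The idea of a project-private library can be illustrated in base R. The sketch below is not packrat's implementation and does none of its bookkeeping (no snapshots, no lockfile); the function name use_project_library and the "library" subdirectory are assumptions chosen for the example. It only shows the underlying mechanism of putting a project-specific directory first on the library search path.

```r
# Hypothetical helper: create a library directory inside the project and put it
# first on the search path, so installs and loads in this session prefer it.
use_project_library <- function(project_dir = ".") {
  lib <- file.path(project_dir, "library")
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(lib, .libPaths()))
  invisible(lib)
}
```

After calling use_project_library(), install.packages() installs into the project's own directory by default, and library() looks there before the user and system libraries.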
Installing custom package versions
There are many ways to install a custom version of a package. Perhaps the most convenient is the devtools package, for example:
devtools::install_version("ggplot2", version = "0.9.1")
Alternatively, as suggested by Richie, there is now a more lightweight package called remotes that is a result of the decomposition of devtools into smaller packages, with very similar usage:
remotes::install_version("ggplot2", version = "0.9.1")
More info on the topic can be found:
https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
I have worked with R for a long time now and it's only today that I thought about this. The idea came from the fact that I started dabbling with Python and the first step I had to take was to manage what they (pythonistas) call "virtual environments". They even have dedicated tools for this seemingly important task. I read up on this aspect and on why they take it so seriously. I finally realized that this is a neat and important way to manage different projects with conflicting dependencies. I wanted to know why R doesn't have this feature and found that the concept of "environments" actually exists in R but is not introduced to newbies the way it is in Python. So you need to check the documentation about this and it will solve your issue.
Sorry for rambling but I thought it would help.
