R package development best practices: using system() command? - r

I'm developing a new R package to release to CRAN and would like to invoke the system() command directly within its source code. For example, I would like to use the gzip utility directly within my R package:
write.csv(mydat, "mydat.csv")
system("gzip mydat.csv", wait=FALSE)
Even more importantly, I would like to leverage other existing command-line utilities directly within my R package. And by command-line utilities, I mean actual large command-line software programs that are not trivial to rewrite in R.
So my question is: What are some best practices for specifying the usage of external (not R) command-line libraries during the development of an R package?
For example, the Imports and Depends fields in an R package DESCRIPTION file are only good for specifying the usage of existing R libraries within your R package. It would be a nuisance for users to have to manually install some existing non-R command-line library by using a package manager (e.g., brew), and this would go against best practices of self-contained work within an R Studio IDE. Besides, there is no guarantee that such a roundabout approach would work in a reproducible fashion, due to the difficulty of properly matching full paths to the command-line executable, coordinating with the R Studio IDE, etc.
Likewise, using tools such as https://cran.r-project.org/web/packages/ssh.utils/index.html will only serve basic command-line needs within the R environment, and hence does not apply to the needs of using large command-line software programs.
Note: The R package that I'm developing is not for personal use. It is intended for public release to CRAN and, hence, should comply with their checks. However, I could not find any specification from CRAN regarding the use of the system() command, particularly in the context of leveraging actual large command-line software programs that are not trivial to rewrite in R.

I would like to use the gzip utility directly within my R package
That is a code smell. Your package then needs to determine by means of configure (or similar) if such programs exist. So why bother? In this example, and on my box:
edd#don:~$ grep GZIP /etc/R/Renviron
R_GZIPCMD=${R_GZIPCMD-'/bin/gzip -n'}
edd#don:~$
You have access to it via most file-saving commands such as saveRDS(), the gzcon() and gzfile() functions and so on. See this older answer of mine.
For truly external programs you can rely on system(). See Christoph's seasonal package relying on our underlying x13binary binary package.

Related

Downloading R on Linux for multiple clients

I've created a program that runs in R that I plan on distributing among a lot of other people. Currently the R script is ran completely automatically and behind the scenes with one .sh script which is exactly how it is intended to be. I'm trying to make it so theres no need for client intervention. The R script itself loads the packages and installs them if they aren't present which takes away the task of them installing the packages themselves.
Is there a way I can provide a folder within my Application's folder that they already download that contains R-script and its dependencies so the code can use that location of Rscript to compile and run the R-program I have created. The goal is to be able to download it and run without the need of internet connection to download R and maybe even the programs required packages if possible.
Any help or ideas is appreciated.
I assume that process you want called "creating binary package". Binary is programs (like EXE files) which can run directly on target CPU without any interpreter software (like Python interpreter for python scripts, or Java VM for java applications). I'm not so familiar with packaging of R programs but I found some materials regarding this issue:
1 - Building binary package R
2 - https://seandavi.github.io/post/build-linux-r-binary-packages/
3 - https://support.rstudio.com/hc/en-us/articles/200486508-Building-Testing-and-Distributing-Packages
Second link assumes Linux as target system. Opposite to interpreted languages, binary files often OS dependent (Linux, Windows, or Mac). I, personally, don't know how compatible are packages between Linux systems with different library sets.
Please comment if you find some information misleading, I'll correct the answer.

Link Project and R Version

I have two different versions of R installed, one which is up to date and which I use for all my regular R coding (needs to be up to date so that I can use various updated and new packages) and one which I use to access OLAP cubes (needs to be the R Client from Microsoft, because this is the only one which supports the olapR package, and which currently uses R version 3.4.3).
Since, in theory, I only have to access the OLAP cube once a month, I "outsourced" this task to a different RStudio project, in which I download and save the required data for all other projects. Hence, all other projects never require the olapR package to be installed and can and will be run in the up to date R version.
Now, ideally I would like to link my R version to my projects, so that I do not have to change my global R version and restart RStudio every time I access the OLAP cube or work on this data retrieval project (and then switch it back). However, I could not find any options in RStudio to achieve this result.
There are a few threads out there describing the same problem, but with no satisfactory answer in my opinion:
https://support.rstudio.com/hc/en-us/community/posts/200657296-Link-Project-and-R-Version
Rstudio project using different version of R
I also tried looking for a different package than olapR but with similar functionality, but could not find anything except X4R, which seems outdated and does not work for me (https://github.com/overcoil/X4R). Sadly, I am also unable to directly access the databases which the OLAP cube uses for its results, so I cannot go "around" it.
I am happy for any help or suggestions you can offer, whether it is a general workaround to link a project to a specific R version or the (less helpful for the community) solution of accessing the OLAP cube in a different way.
Thanks in advance!
Using the answer from MrGumble I created a .bat file that will execute my .R file using the desired R installation. Even though it is not the answer I thought I would get, I think it is an even better solution to the problem.
For all facing a similar issue, here is the .bat file (never created one before, so also had to google how to do it and I guess some might be in the same position):
#echo off
title Getting data for further processing in R
echo Retrieving OLAP data
echo.
"C:\Program Files\Microsoft\R Client\R_SERVER\bin\Rscript.exe" "C:\Users\me\Documents\Projects\!Data\script.R"
echo.
echo Saved data
echo.
pause
Thanks again to MrGumble for his help.
Skip RStudio.
RStudio is really just an editor (albeit powerful and useful) editor, which starts an R console for you (and the surrounding PATH variables, library locations, etc.).
If your monthly task only requires you to run the R-script (or a bit of interactive work), you can simply execute your preferred version of R from the command line and have it run your R script. E.g.
C:\Users\me>"C:\Program Files (x64)\Microsoft R\bin\Rscript" myscript.R
You might have to define some PATH variables so that the older R doesn't look for packages in the newer R's libraries, but that depends entirely on your current setup.

How do I revert to an earlier version of a package?

I'm trying to write some SPARQL queries in R using the rrdf package. However, I get this error every time I try to load the library.
Error: package 'rrdflibs' 1.1.2 was found, but == 1.1.0 is required by 'rrdf'
Not sure why they didn't write it as >= 1.1.0. Is what they did a good programming practice?
Go to http://cran.r-project.org/src/contrib/Archive/rrdflibs/ to retrieve an older version. This is a source archive, so you will have to be able to build from source (typically easy on Linux, pretty easy on MacOS, and hard on Windows; you can use the http://win-builder.r-project.org/ service to build a Windows binary if necessary).
Actually, based on a quick look at the package, I think you should be able to install in this case (even on Windows without Rtools) via
download.file("http://cran.r-project.org/src/contrib/Archive/rrdflibs/rrdflibs_1.1.0.tar.gz",
dest="rrfdlibs_1.1.0.tar.gz")
install.packages("rrfdlibs_1.1.0.tar.gz",repos=NULL,type="source")
because the package doesn't actually contain anything that needs to be compiled.
Don't know about programming practice, you'd have to ask the authors if they had some particular reason to do it that way. (See maintainer("rrdf").) Maybe they knew the versions would not be backward/forward compatible?

Interfacing R with other non-Java languages / Compiling R to executable

I've developed a .R script that works with a DB, does a bunch of processing and outputs graphs and tables. I can output that data as comma-separated values and pictures, to later import them on my software, that I have no issue.
The problem is how can I distribute my application without having to make a complete install of R on the client. I've seen things like RJava, but my app is on VB6 (yeah...) and I don't see any libraries, or ways to compile to exe. The compile package only makes compiled versions of any function you define, like what psyco used to do for Python (before Pypy).
Does anyone have some insight on compiling R to avoid having the user to install an entire additional software?
EDIT: Does an R compiler exist? This question relates deeply to mine, but I haven't seen how it can be used to make a full script an exe. You can just compile a main function and cat it to a file? Is that even possible?
The short answer is "no, that will not work".
There simply is no compiler that allows you to shrink-wrap your app. So your best best may be either
using the headless Rserve over the network, or
using the R (D)COM server used by RExcel et al

R package and execution time

I have developed a big library of functions in R.
For the moment I just load ("source") the functions at the beginning of all my scripts.
I have seen that I can create packages.
My question is: Will that improve the execution time of my functions? (by transforming interpreter code into machine language?)
What does the package creation does? Does it creates binaries?
Thanks
fred
There isn't an R compiler yet Packaging your R code won't improve its execution time massively. It also won't create binaries for you - you need to build those from the package tarball (or get CRAN or similar to build them for you). There is now a byte compiler for R and R's packages are now by default byte compiled. Speed improvements are in general modest - don't expect C-like speed.
Packaging R code just does exactly that; it packages the R code, code to be compiled (C Fortran etc), man pages, documentation, tests etc into a standard format that can be distributed to users and installed/built on multiple architectures.
Packages can take advantage of things like lazy loading such that R objects (your functions say) are only loaded when needed, whereas source loads them all into the global environment (by default).
If you don't intend to distribute your code then there are few benefits of packaging just for your own use, but if you do package and write documentation and examples/tests, you might be alerted to changes in the package code that break examples or cause tests to fail. That way you are better informed as to the reliability of your code, even if it is only you using it!

Resources