How to avoid parsing command line arguments when building R package? - r

I'm developing an R package that includes a command line interface. When building the package, I would like to avoid parsing the command line arguments (build flags) since my command line argument parser doesn't recognize these build-related arguments and produces an error.
To overcome this issue, I'm currently using an approach where my build flags are hard coded to an if statement before trying to parse the arguments:
if (length(commandArgs(trailingOnly = TRUE)) > 0 &&
!(grepl("--no-multiarch", commandArgs(trailingOnly = TRUE)))) {
argv <- GetCmdlineArguments()
DoStuff(argv$parsed.argument)
}
Another approach which I haven't experimented with yet includes putting the argument parsing in a different R file which is ignored by the build via .Rbuildignore. This would, however, lead to an unfavorable situation where an extra file is needed for each R file that has a command line interface.
Is there more elegant and robust way of detecting if the package is being built instead of actually executed from the command line by the user?

I do not fully understand what you are trying to achieve, but allow me to provide some context:
first off, all R use is generally from the R prompt, not the command-line
as such, a package will only every contain R functions etc
that said, command-line work is very powerful and popular
the R ecosystem has both Rscript and r from littler both of which facilitate command-line use
typically, we ship scripts for use these frontends in inst/scripts/ or inst/examples
all those scripts can use one of the many packages to parse command-line options -- I favour docopt
consequently, my littler package has lots of examples using docopt
Could you not just do the same?

Related

R check for a package: optimize the way DLLs are built and checked

I want to optimize my process for building a package. I have in pckgname/src some fortran code (f90):
pckgname/src/FortranFile1.f90
pckgname/src/FortranFile2.f90
I am under RStudio. When I build the package, it creates the src-i386 and src-x64 folders, inside which executable files in .o are produced
pckgname/src-i386/FortranFile1.o
pckgname/src-i386/FortranFile2.o
pckgname/src-x64/FortranFile1.o
pckgname/src-x64/FortranFile2.o
then dll files are produced into each of these folders from the .o files:
pckgname/src-i386/dllname.dll
pckgname/src-x64/dllname.dll
thereafter if I want to check the code successfully, I need to manually copy paste the dll into these two folders (in the previous version of the question i wrote code instead of dll which might have led to misunderstandings)
pckgname/inst/libs/x64/dllname.dll
pckgname/libs/X64/dllname.dll
My question is: is it normal that I have to do this or is there a shorter way without having to copy paste by hand dllname.dll
into these two folders? It could be indeed a source of error.
NB: If i don't copy the dlls into the said folders I get the following error messages (translated from the French):
Error in inDL(x, as.logical(local), as.logical(now), ...) :
impossible to load shared object 'C:/Users/username/Documents/pckgname/inst/libs/x64/dllname.dll':
LoadLibrary failure: The specified module can't be found
Error in inDL(x, as.logical(local), as.logical(now), ...) :
impossible to load shared object 'C:/Users/username/Documents/pckgname/libs/x64/dllname.dll':
LoadLibrary failure: The specified module can't be found.
[...]
`cleanup` is deprecated
The short answer
Is it normal that I have to do this?
No. If path/to/package is the directory you are developing your package in, and you have everything set up for your package to call your Fortran subroutines correctly (see "The long answer"), you can run
R CMD build path/to/package
at the command prompt, and a tarball will be constructed for you with everything in the right place (note you will need Rtools for this). Then you should be able to run
R CMD check packagename_versionnumber.tar.gz
from the command prompt to check your package, without any problems (stemming from the .dll files being in the wrong place -- you may have other problems, in which case I would suggest asking a new question with the ERROR, WARNING, or NOTE listed in the question).
If you prefer to work just from R, you can even
devtools::check("path/to/package")
without having to run devtools::build() or R CMD build ("devtools::check()... [b]undles the package before checking it" -- Hadley's chapter on checking; see also Karl Broman's chapter on checking).
The long answer
I think your question has to do with three issues potentially:
The difference between directory structure of packages before and after they're installed. (You may want to read the "What is a package?" section of Hadley's Package structure chapter -- luckily R CMD build at the command prompt or devtools::build() in R will take care of that for you)
Using INSTALL vs. BUILD (from the comments to the original version of this answer)
The proper way to set up a package to call Fortran subroutines.
You may need quite a bit of advice on the process of developing R packages itself. Some good guides include (in increasing order of detail):
Karl Broman's R package primer
Hadley Wickham's R packages
The Writing R Extensions manual
In particular, there are some details about having compiled code in an R package that you may want to be aware of. You may want to first read Hadley's chapter on compiled code (Broman doesn't have one), but then you honestly need to read most of the Writing R Extensions manual, in particular sections 1.1, 1.2, 1.5.4, and 1.6, and all of chapters 5 and 6.
In the mean time, I've setup a GitHub repository here that demonstrates a toy example R package FortranExample that shows how to correctly setup a package with Fortran code. The steps I took were:
Create the basic package structure using devtools::create("FortranExample").
Eliminate the "Depends" line in the DESCRIPTION, as it set a dependence on R >= 3.5.1, which will throw a warning in check (I have now also revised the "License" field to eliminate a warning about not specifying a proper license).
Make a src/ directory and add toy Fortran code there (it just doubles a double value).
Use tools::package_native_routine_registration_skeleton("FortranExample") to generate the symbol registration code that I placed in src/init.c (See Writing R Extensions, section 5.4).
Create a nice R wrapper for using .Fortran() to call the Fortran code (placed in R/example_function.R).
In that same file use the #' #useDynLib FortranExample Roxygen tag to add useDynLib(FortranExample) to the NAMESPACE file; if you don't use Roxygen, you can put it there manually (See Writing R Extensions 1.5.4 and 5.2).
Now we have a package that's properly set up to deal with the Fortran code. I have tested on a Windows machine (running Windows 8.1 and R 3.5.1) both the paths of running
R CMD build FortranExample
R CMD check FortranExample_0.0.0.9000.tar.gz
from the command prompt, and of running
devtools::check("FortranExample")
from R. There were no errors, and the only warning was the "License" issue mentioned above.
After cleaning up the after-effects of running devtools::check("FortranExample") (for some reason the cleanup option is now deprecated; see below for an R function to handle this for you inspired by devtools::clean_dll()), I used
devtools::install("FortranExample")
to successfully install the package and tested its function, getting:
FortranExample::example_function(2.0)
# [1] 4
The cleanup function I mentioned is
clean_source_dirs <- function(path) {
paths <- file.path(path, paste0("src", c("", "-i386", "-x64")))
file_pattern <- "\\.o|so|dll|a|sl|dyl"
unlink(list.files(path = paths, pattern = file_pattern, full.names = TRUE))
}
No, it is not normal and there is a solution to this problem. Make use of Makevars.win. The reason for your problem is that .dlls are looking for dependencies in places defined by environment variable PATH and relative paths defined during the linking. Linking is being done when running the command R CMD INSTALL as it is stated in Mingw preferences plus some custom parameters defined in the file Makevars.win (Windows platform dependent). As soon as the resulting library is copied, the binding to the places where dependent .dlls were situated may become broken, so if you put dlls in a place where typically dependent libraries reside, such as, for instance, $(R_HOME)/bin/$(ARCH)/,
cp -f <your library relative path>.dll $(R_HOME)/bin/$(ARCH)/<your library>.dll
during the check R will be looking for your dependencies specifically there too, so you will not miss the dependencies. Very crude solution, but it worked in my case.

How does a typical Rcpp edit-compile-test cycle look like?

I can only find information on how to install a ready-made R extension package, but it is nowhere mentioned which commands a developer of an extension package has to use during daily development. I am using Rcpp and I am on Windows.
If this were a typical C++ project, it would go like this:
edit
make # oops, typo
edit # fix typo
make # oops, forgot an #include
edit
make # good; updates header dependencies for subsequent 'make' automatically
./fooreader # test it
make install # only now I'm ready
Which commands do I need for daily development of an Rcpp package project?
I've allocated a skeleton project using these commands from the R command line:
library(Rcpp)
Rcpp.package.skeleton("FooReader", example_code=FALSE,
author="My Name", email="my.email#example.com")
This allocated 3 files:
DESCRIPTION
NAMESPACE
man/FooReader-package.Rd
Now I dropped source code into
src/readfoo.cpp
with these contents:
#include <Rcpp.h>
#error here
I know I can run this from the R command line:
Rcpp::sourceCpp("D:/Projects/FooReader/src/readfoo.cpp")
(this does run the compiler and indicates the #error).
But I want to develop a package ultimately.
There is no universal answer for everybody, I guess.
For some people, RStudio is everything, and with some reason. One can use the package creation facility to create an Rcpp package, then edit and just hit the buttons (or keyboard shortcuts) to compile and re-load and test.
I also work a lot on a shell, so I do a fair amount of editing in Emacs/ESS along with R CMD INSTALL (where thanks to ccache recompilation of unchanged code is immediate) with command-line use via r of the littler package -- this allows me to write compact expressions loading the new package and evaluating: r -lnewpackage -esomeFunc(somearg) to test newpackage::someFunc() with somearg.
You can also launch the build and test from Emacs. As I said, it all depends.
Both those answers are for package, where I do real work. When I just test something in a single file, I do that in one Emacs buffer and sourceCpp() in an R session in another buffer of the same Emacs. Or sometimes I edit in Emacs and run sourceCpp() in RStudio.
There is no one answer. Find what works for you.
Also, the first part of your question describes the initial setup of a package. That is not part of the edit/compile/link/test cycle as it is a one off. And for that too do we have different approaches many of which have been discussed here.
Edit: The other main misunderstanding of your question is that once you have package you generally do not use sourceCpp() anymore.
In order to test an R package, it has to be installed into a (temporary) library such that it can be attached to a running R process. So you will typically need:
R CMD build . to build package_version.tar.gz
R CMD check <package_version.tar.gz> to test your package, including tests placed into the testsfolder
R CMD INSTALL <package_version.tar.gz> to install it into a library
After that you can attach the package and test it. Quite often I try to use a more TTD approach, which means I do not have to INSTALL the package. Running the unit tests (e.g. via R CMD check) is enough.
All that is independent of Rcpp. For a package using Rcpp you need to call Rcpp::compileAttributes() before these steps, e.g. with Rscript -e 'Rcpp::compileAttributes()'.
If you use RStudio for package development, it offers a lot of automation via the devtools package. I still find it useful to know what has to go on under the hood and it is by no means required.

Run R command without entering R and without a script

I want to run an R command from command line (actually, from within a Makefile). The command is roxygen2::roxygenise(), if it is relevant. I don't want to create a new file and run that as a script - that will just clutter my directory.
In python, this is simple - you write python -c "import antigravity".
I use the Makefile to build, install and test a (Rcpp) package I'm working on.
This is generally done with so 'shebang scripts'.
Historically, littler was there first, about a decade or so ago. It is still widely used, and contains a number of helper scripts as for example roxy.r which does just what you desire: run roxygen2::roxygenize(). I use this all the time.
Next, Rscript started to ship with R. It is similar to littler but automatically available whereever R is which is a plus. On the minus side, it starts slower, and fails to load the methods package which is a source of a number of bug reports and SO questions.
Much more recently, R itself added the ability to run expressions following the -e ... switch.
So you have plenty of choices. You can also study plenty of src/Makevars files many of which use Rscript.

calling Qiime with system call from R

Hej,
When I try to call QIIME with a system call from R, i.e
system2("macqiime")
R stops responding. It's no problem with other command line programs though.
can certain programs not be called from R via system2() ?
MacQIIME version:
MacQIIME 1.8.0-20140103
Sourcing MacQIIME environment variables...
This is the same as a normal terminal shell, except your default
python is DIFFERENT (/macqiime/bin/python) and there are other new
QIIME-related things in your PATH.
(note that I am primarily interested to call QIIME from R Markdown with engine = "sh" which fails, too. But I strongly suspect the problems are related)
In my experience, when you call Qiime from unix command line, it usually creates a virtual shell of it`s own to run its commands which is different from regular system commands like ls or mv. I suspect you may not be able to run Qiime from within R unless you emulate that same shell or configuration Qiime requires. I tried to run it from a python script and was not successful.

Profiling an installed R package with source line numbers?

I'd like to profile functions in an installed R package (data.table) using Rprof() with line.profiling=TRUE. Normally, installed package are byte compiled, and line numbers are not available for byte compiled packages. The usual instructions for line profiling with Rprof() require using source() or eval(parse()) so that srcref attributes are present.
How can I load data.table so that line numbers are active? My naive attempts to first load the package with library(data.table) and then source('data.table.R') fails because some of the compiled C functions are not found when I attempt to use the package, presumably because library() is using a different namespace. Maybe there is some way to source() into the correct namespace?
Alternatively, perhaps I can build a modified version of data.table that is not byte compiled, and then load that in a way that keeps line numbers? What alterations would I have to make, and how would I then load it? I started by setting ByteCompile: FALSE and then trying R CMD INSTALL -l ~/R/lib --build data.table, but this still seems to be byte compiled.
I'm eager to make this work and will pursue any suggestions. I'm running R 3.2.1 on Linux, have full control over the machine, and can install anything else that is required.
Edit:
A more complete description of the problem I was trying to solve (and the solution for it) is here: https://github.com/Rdatatable/data.table/issues/1249
I ended up doing essentially what Joshua suggested: recompile the package with "KeepSource: TRUE" in the DESCRIPTION. For my purposes, I also found "ByteCompile: FALSE" to be helpful, although this might not apply generally. I also changed the version number so I could see that I was using my modified version.
Then I installed to a different location with "R CMD INSTALL data.table -l ~/R/lib", and loaded with "library(data.table, lib='~/R/lib')". When used with the patches given in the link, I got the line numbers of the allocations as I desired. But if anyone knows a solution that doesn't require recompilation, I'm sure that others would appreciate if you shared.
You should be able to get line numbers even if the package is byte-compiled. But, as it says in ?Rprof (emphasis added):
Individual statements will be recorded in the profile log if
line.profiling is TRUE, and if the code being executed was
parsed with source references. See parse for a discussion of
source references. By default the statement locations are not
shown in summaryRprof, but see that help page for options to
enable the display.
That means you need to set KeepSource: TRUE either in the DESCRIPTION file or via the --with-keep.source argument to R CMD INSTALL.

Resources