Alternative way to use compiled C++ code in Rcpp - r

I have an application that calls some R code, which in turn calls supporting C++ code via Rcpp.
Currently, I call sourceCpp() when I start my R session (e.g., sourceCpp('path/code.cpp')), and this works fine for now. However, it compiles the C++ code each time the session starts, and that overhead makes the app slower to start. Of course, I could create an R package that precompiles the C++ code and load the package each time.
However, I'm curious whether there is a way to load precompiled C++ code into an R session other than by creating an R package.
The benefit of an R package is that loading already-compiled code is faster, but it requires the work of creating the package. sourceCpp() avoids having to create a package, but is slow because it compiles the code when sourcing it. So I'm looking to learn whether there is an option that offers the convenience of sourceCpp() but loads code that has already been compiled.
Thank you
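One option that sits between the two approaches described above is the cacheDir argument of sourceCpp(). By default the build cache lives in tempdir() and is lost when the session ends; pointing it at a persistent directory lets a later session reuse the shared library as long as the source file has not changed. The sketch below is only an illustration under assumptions: the helper name load_cpp_cached and the ".rcpp-cache" directory are hypothetical, and it assumes Rcpp and a C++ toolchain are installed.

```r
# Hypothetical helper (not part of Rcpp): compile a C++ file once and
# reuse the cached shared library in later R sessions.
load_cpp_cached <- function(path,
                            cache_dir = file.path(dirname(path), ".rcpp-cache")) {
  if (!requireNamespace("Rcpp", quietly = TRUE)) {
    stop("The Rcpp package is required")
  }
  # Create the persistent cache directory on first use.
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  # sourceCpp() recompiles only if the file changed since the cached build.
  Rcpp::sourceCpp(path, cacheDir = cache_dir)
}

# Usage (the path is a placeholder taken from the question):
# load_cpp_cached("path/code.cpp")
```

The first call still pays the compilation cost; subsequent sessions should only pay for loading the cached library.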

Related

Workaround for clusterExport issue when using C++ compiled functions / use .onLoad in R?

I have a question re: the issue that Rcpp modules are not easily exported as regular R objects. In particular, my code is object-oriented and uses an instance of a class exported to R. The issue has been documented here.
A possible workaround is to load a few things into the environment when the package loads. However, I don't quite understand how to delay my zzz.R, which is executed prematurely (at build time rather than load time). This results in an error because the modules called in zzz.R only become available AFTER compilation and installation.

GNU+Intel openmp dynamic loading

I have been trying to create an R library that dynamically loads a dependency built with Intel OpenMP. When the library is loaded, there is a clash with the OpenMP library.
Using KMP_DUPLICATE_LIB_OK=TRUE gets me past the loading error but the program crashes once it is in a parallel section.
Unfortunately, neither compiling R with Intel OpenMP nor compiling the dependency with GNU OpenMP is an option (because I want it to work with the standard R distribution, and some external dependencies that are linked statically have to use Intel OpenMP).
However, I can recompile the dependency with some compatibility flags or modify how linking is done (but in the end, it has to be loaded dynamically from the R library). Setting environment variables is also an option (I am thinking about https://software.intel.com/en-us/node/522775 but none of the options seems to help so far).
The R library is written in C, and I doubt that the fact that it is R which ultimately loads it really matters.
Any idea how to handle this?

How to edit R library source files in-place

This is a follow-up to How to edit and debug R library sources. I'm wondering if there's an easy way to edit an R library source file and have the edited file loaded by library() without reinstalling the code. I'm asking this in the context of a library that I'm developing, and I am looking for an easy way to incrementally edit and test my code. I know about source() and other ways of loading code into an R session, but I want to test scripts that do the usual library thing.
Thanks!
It sounds like you're developing a package. If that's the case, then the devtools package is probably what you want. Its load_all() function reloads all of your code so you can make changes and test everything out.

R package development best practices: using system() command?

I'm developing a new R package to release to CRAN and would like to invoke the system() command directly within its source code. For example, I would like to use the gzip utility directly within my R package:
write.csv(mydat, "mydat.csv")
system("gzip mydat.csv", wait=FALSE)
Even more importantly, I would like to leverage other existing command-line utilities directly within my R package. And by command-line utilities, I mean actual large command-line software programs that are not trivial to rewrite in R.
So my question is: What are some best practices for specifying the usage of external (not R) command-line libraries during the development of an R package?
For example, the Imports and Depends fields in an R package DESCRIPTION file are only good for specifying the usage of existing R libraries within your R package. It would be a nuisance for users to have to manually install some existing non-R command-line library by using a package manager (e.g., brew), and this would go against best practices of self-contained work within the RStudio IDE. Besides, there is no guarantee that such a roundabout approach would work in a reproducible fashion, due to the difficulty of properly matching full paths to the command-line executable, coordinating with the RStudio IDE, etc.
Likewise, using tools such as https://cran.r-project.org/web/packages/ssh.utils/index.html will only serve basic command-line needs within the R environment, and hence does not apply to the needs of using large command-line software programs.
Note: The R package that I'm developing is not for personal use. It is intended for public release to CRAN and, hence, should comply with their checks. However, I could not find any specification from CRAN regarding the use of the system() command, particularly in the context of leveraging actual large command-line software programs that are not trivial to rewrite in R.
"I would like to use the gzip utility directly within my R package"
That is a code smell. Your package then needs to determine by means of configure (or similar) if such programs exist. So why bother? In this example, and on my box:
edd@don:~$ grep GZIP /etc/R/Renviron
R_GZIPCMD=${R_GZIPCMD-'/bin/gzip -n'}
edd@don:~$
You have access to it via most file-saving commands such as saveRDS(), the gzcon() and gzfile() functions and so on. See this older answer of mine.
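As a rough illustration of the point above, the write.csv() plus system("gzip ...") pair from the question can be replaced by writing through a gzfile() connection, which needs no external program. The data frame and file name below are placeholders chosen for the example.

```r
# Illustrative data standing in for the question's `mydat`
mydat <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

out_path <- file.path(tempdir(), "mydat.csv.gz")

# Write the CSV through a gzip-compressing connection (base R only)
con <- gzfile(out_path, "w")
write.csv(mydat, con, row.names = FALSE)
close(con)

# Read it back through the same kind of connection to check the round trip
roundtrip <- read.csv(gzfile(out_path))
```

Unlike system("gzip mydat.csv", wait=FALSE), this runs synchronously and does not depend on a gzip binary being installed.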
For truly external programs you can rely on system(). See Christoph's seasonal package relying on our underlying x13binary binary package.
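When a package does call a truly external program, one common pattern is to check that the program can be found before invoking it, so users get a clear error instead of a silent failure. The following is a minimal sketch using base R's Sys.which() and system2(); the helper name run_external is hypothetical and is not taken from the seasonal or x13binary packages.

```r
# Hypothetical helper: run an external tool only if it is on the PATH.
run_external <- function(cmd, args = character()) {
  exe <- unname(Sys.which(cmd))   # "" when the program cannot be found
  if (!nzchar(exe)) {
    stop(sprintf("External program '%s' was not found on the PATH", cmd))
  }
  # system2() takes arguments as a character vector;
  # stdout = TRUE returns the program's output as a character vector.
  system2(exe, args, stdout = TRUE)
}

# Usage (assumes a gzip binary is installed):
# run_external("gzip", "--version")
```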

R package and execution time

I have developed a big library of functions in R.
For the moment I just load ("source") the functions at the beginning of all my scripts.
I have seen that I can create packages.
My question is: Will that improve the execution time of my functions (e.g., by transforming interpreted code into machine code)?
What does package creation do? Does it create binaries?
Thanks
fred
Packaging your R code won't improve its execution time massively. It also won't create binaries for you; you need to build those from the package tarball (or get CRAN or similar to build them for you). There is now a byte compiler for R, and R packages are byte-compiled by default. Speed improvements are in general modest, so don't expect C-like speed.
Packaging R code does exactly that: it packages the R code, code to be compiled (C, Fortran, etc.), man pages, documentation, tests, etc. into a standard format that can be distributed to users and installed or built on multiple architectures.
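To make the byte-compiler remark concrete, here is a small sketch using the compiler package that ships with R. The function sum_squares is an invented example. Note that since R 3.4.0 the JIT byte-compiles functions automatically after a few calls, so an explicit cmpfun() may show little or no difference on a recent R.

```r
library(compiler)

# A deliberately loop-heavy function (illustrative only)
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# cmpfun() returns a byte-compiled copy of the function
sum_squares_bc <- cmpfun(sum_squares)

# Both versions compute the same value; only how they are evaluated differs
stopifnot(isTRUE(all.equal(sum_squares(1000), sum_squares_bc(1000))))

# To compare timings on your own machine:
# system.time(for (k in 1:200) sum_squares(1e5))
# system.time(for (k in 1:200) sum_squares_bc(1e5))
```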
Packages can take advantage of things like lazy loading such that R objects (your functions say) are only loaded when needed, whereas source loads them all into the global environment (by default).
If you don't intend to distribute your code then there are few benefits of packaging just for your own use, but if you do package and write documentation and examples/tests, you might be alerted to changes in the package code that break examples or cause tests to fail. That way you are better informed as to the reliability of your code, even if it is only you using it!

Resources