R package and execution time

I have developed a big library of functions in R.
For the moment I just load ("source") the functions at the beginning of all my scripts.
I have seen that I can create packages.
My question is: Will that improve the execution time of my functions? (by transforming interpreted code into machine language?)
What does package creation do? Does it create binaries?
Thanks
fred

There isn't a compiler that turns R into machine code, so packaging your R code won't improve its execution time much. It also won't create binaries for you; you need to build those from the package tarball (or get CRAN or similar to build them for you). There is now a byte compiler for R, and R packages are byte-compiled by default, but the speed improvements are in general modest: don't expect C-like speed.
Packaging R code does exactly that: it packages the R code, code to be compiled (C, Fortran, etc.), man pages, documentation, tests and so on into a standard format that can be distributed to users and installed/built on multiple architectures.
Packages can take advantage of things like lazy loading, so that R objects (your functions, say) are only loaded when needed, whereas source() loads them all into the global environment (by default).
If you don't intend to distribute your code, there are fewer benefits to packaging it just for your own use; but if you do package it and write documentation, examples and tests, you will be alerted to changes in the package code that break examples or cause tests to fail. That way you are better informed about the reliability of your code, even if you are the only one using it!
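A minimal sketch of the two mechanisms touched on above: byte-compiling a function by hand with the compiler package (packages get this by default in recent R), and turning a file of sourced functions into a package skeleton. The file name "my_functions.R" and the package name "myutils" are made up for illustration.

library(compiler)

## A deliberately interpreter-heavy function...
slow_sum <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi
  total
}

## ...and a byte-compiled copy of it. Expect only modest speed-ups,
## not machine-code performance.
fast_sum <- cmpfun(slow_sum)

## Turn a directory of sourced functions into a package skeleton.
source("my_functions.R")
utils::package.skeleton(name = "myutils", list = ls())
## Then edit the generated myutils/ directory and run
## R CMD build myutils, followed by R CMD INSTALL myutils_*.tar.gz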

Related

Alternative way to use compiled C++ code in Rcpp

I have an application that calls some R code along with supporting C++ code that is called from my R code via Rcpp.
Currently I am using sourceCpp() when I start my R session, and this is fine for now (e.g., sourceCpp('path/code.cpp')). But this compiles the C++ each time the session starts, and the overhead of doing so makes the app slower to start. Of course, I could create an R package that precompiles the C++ code and load that package each time.
However, I'm curious whether there is a way to source pre-compiled C++ code into an R session other than by creating an R package.
The benefit of creating an R package is that it is faster to load code that is already compiled, but it requires the work associated with creating that package. sourceCpp() avoids having to create a package, but is slow when sourcing the code. So I'm looking to learn whether there is an option that provides the convenience of sourcing the C++ code like sourceCpp(), but sources code that has already been compiled.
Thank you
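One avenue worth checking, not mentioned in the question above and worth verifying against your Rcpp version: sourceCpp() takes a cacheDir argument, and pointing it at a persistent directory lets later sessions reuse the compiled shared library instead of rebuilding it. A sketch (the cache path is invented):

library(Rcpp)

## On the first call this compiles code.cpp as usual; on later calls with an
## unchanged source file, the cached .so/.dll under cacheDir is reused
## instead of being recompiled.
sourceCpp("path/code.cpp", cacheDir = "~/.rcpp_cache")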

R package development best practices: using system() command?

I'm developing a new R package to release to CRAN and would like to invoke the system() command directly within its source code. For example, I would like to use the gzip utility directly within my R package:
write.csv(mydat, "mydat.csv")
system("gzip mydat.csv", wait=FALSE)
Even more importantly, I would like to leverage other existing command-line utilities directly within my R package. And by command-line utilities, I mean actual large command-line software programs that are not trivial to rewrite in R.
So my question is: What are some best practices for specifying the usage of external (not R) command-line libraries during the development of an R package?
For example, the Imports and Depends fields in an R package DESCRIPTION file are only good for specifying the usage of existing R libraries within your R package. It would be a nuisance for users to have to manually install some existing non-R command-line library by using a package manager (e.g., brew), and this would go against best practices of self-contained work within the RStudio IDE. Besides, there is no guarantee that such a roundabout approach would work in a reproducible fashion, due to the difficulty of properly matching full paths to the command-line executable, coordinating with the RStudio IDE, etc.
Likewise, using tools such as https://cran.r-project.org/web/packages/ssh.utils/index.html will only serve basic command-line needs within the R environment, and hence does not apply to the needs of using large command-line software programs.
Note: The R package that I'm developing is not for personal use. It is intended for public release to CRAN and, hence, should comply with their checks. However, I could not find any specification from CRAN regarding the use of the system() command, particularly in the context of leveraging actual large command-line software programs that are not trivial to rewrite in R.
I would like to use the gzip utility directly within my R package
That is a code smell. Your package then needs to determine by means of configure (or similar) if such programs exist. So why bother? In this example, and on my box:
edd@don:~$ grep GZIP /etc/R/Renviron
R_GZIPCMD=${R_GZIPCMD-'/bin/gzip -n'}
edd@don:~$
You have access to it via most file-saving commands such as saveRDS(), the gzcon() and gzfile() functions and so on. See this older answer of mine.
For truly external programs you can rely on system(). See Christoph's seasonal package relying on our underlying x13binary binary package.
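A minimal sketch of the two points above: compressing with R's own connections instead of shelling out, and checking that a genuinely external program exists before relying on it (the data and file names are invented):

## The question's example, using R's own compression instead of gzip:
mydat <- data.frame(x = 1:3, y = letters[1:3])
con <- gzfile("mydat.csv.gz", "w")     # gzip-compressing connection
write.csv(mydat, con)
close(con)

## R also records its configured gzip command, as the Renviron entry above shows.
Sys.getenv("R_GZIPCMD")

## For a truly external program, check for it before calling it.
gz <- Sys.which("gzip")                # "" if gzip is not on the PATH
if (nzchar(gz)) {
  write.csv(mydat, "mydat.csv")
  system2(gz, args = "mydat.csv", wait = FALSE)
} else {
  message("gzip not found; fall back to gzfile() as above")
}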

R Package with Large Size External Assets

This is a followup to a question I posted earlier. To summarize, I am writing an R package called Slidify, which makes use of several external non-R libraries. My earlier question was about how to manage dependencies.
Several solutions were proposed, of which the most attractive solution was to package the external libraries as a different R package, and make it a dependency for Slidify. This is the strategy followed by the package xlsx, which packages the java dependencies as a different package xlsxjars.
An alternative is for me to provide the external libraries as a download and to package an install_libraries function within Slidify, which would automatically fetch the required files and download them into the package directory. I could also add an update_libraries function which would update them if things change.
My question is: are there any specific advantages to doing the CRAN dance for external libraries which are not R-based? Am I missing something here?
As discussed in the comment-thread, for a package like slidify with a number of large, (mostly) fixed, and portable files, a "resource" package makes more sense:
you will know the path where it installed (as the package itself will tell you; see the sketch after this list)
users can't accidentally put it somewhere else
you get CRAN tests
you get CRAN distribution, mirrors, ...
users already know install.packages() etc
the more nimble development of your package using these fixed parts is not held back by the large support files
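A minimal sketch of the first point in the list above, assuming a hypothetical resource package named slidifyLibraries that ships its assets under inst/libraries/:

## Inside slidify, locate the assets installed by the resource package;
## system.file() returns "" if that package is not installed.
lib_dir <- system.file("libraries", package = "slidifyLibraries")
if (lib_dir == "") {
  stop("Please install the slidifyLibraries package to use these frameworks.")
}
list.files(lib_dir)   # the bundled JavaScript/CSS frameworks, for example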

Julia compiles the script every time?

The Julia language compiles the script every time it runs; can't we compile binaries with Julia instead?
I tried a small hello-world script with the println function, and it took 2-3 seconds for Julia to show the output! It would be better if we could make binaries instead of compiling every time.
Update: There have been some changes in Julia since I asked this question. I'm not following Julia's updates anymore, so if you're looking for something similar, look into the answers and comments below by people who are following Julia.
Also, it's good to know that it now takes around 150 ms to load a script.
Keno's answer is spot on, but maybe I can give a little more detail on what's going on and what we're planning to do about it.
Currently there is only an LLVM JIT mode:
There's a very trivial interpreter for some simple top-level statements.
All other code is jitted into machine code before execution. The code is aggressively specialized using the run-time types of the values that the code is being applied to, propagated through the program using dynamic type inference.
This is how Julia gets good performance even when code is written without type annotations: if you call f(1) you get code specialized for Int64 — the type of 1 on 64-bit systems; if you call f(1.0) you get a newly jitted version that is specialized for Float64 — the type of 1.0 on all systems. Since each compiled version of the function knows what types it will be getting, it can run at C-like speed. You can sabotage this by writing and using "type-unstable" functions whose return type depends on run-time data, rather than just types, but we've taken great care not to do that in designing the core language and standard library.
Most of Julia is written in itself, then parsed, type-inferred and jitted, so bootstrapping the entire system from scratch takes some 15-20 seconds. To make it faster, we have a staged system where we parse, type-infer, and then cache a serialized version of the type-inferred AST in the file sys.ji. This file is then loaded and used to run the system when you run julia. No LLVM code or machine code is cached in sys.ji, however, so all the LLVM jitting still needs to be done every time julia starts up, which therefore takes about 2 seconds.
This 2-second startup delay is quite annoying and we have a plan for fixing it. The basic plan is to be able to compile whole Julia programs to binaries: either executables that can be run or .so/.dylib shared libraries that can be called from other programs as though they were simply shared C libraries. The startup time for a binary will be like any other C program, so the 2-second startup delay will vanish.
Addendum 1: Since November 2013, the development version of Julia no longer has a 2-second startup delay since it precompiles the standard library as binary code. The startup time is still 10x slower than Python and Ruby, so there's room for improvement, but it's pretty fast. The next step will be to allow precompilation of packages and scripts so that those can startup just as fast as Julia itself already does.
Addendum 2: Since June 2015, the development version of Julia precompiles many packages automatically, allowing them to load quickly. The next step is static compilation of entire Julia programs.
At the moment Julia JIT compiles its entire standard library on startup. We are aware of the situation and are currently working on caching the LLVM JIT output to remedy the situation, but until then, there's no way around it (except for using the REPL).

Interfacing R with other non-Java languages / Compiling R to executable

I've developed a .R script that works with a DB, does a bunch of processing, and outputs graphs and tables. I can output that data as comma-separated values and pictures to later import into my software; that part is no problem.
The problem is how to distribute my application without having to do a complete install of R on the client. I've seen things like rJava, but my app is in VB6 (yeah...) and I don't see any libraries or ways to compile to an exe. The compiler package only makes byte-compiled versions of functions you define, like what Psyco used to do for Python (before PyPy).
Does anyone have any insight on compiling R to avoid having the user install an entire additional piece of software?
EDIT: The question "Does an R compiler exist?" relates closely to mine, but I haven't seen how it can be used to turn a full script into an exe. Can you just compile a main function and cat it to a file? Is that even possible?
The short answer is "no, that will not work".
There simply is no compiler that allows you to shrink-wrap your app. So your best bet may be either
using the headless Rserve over the network (sketched below), or
using the R (D)COM server used by RExcel et al.
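A minimal sketch of the first option, to be run on the machine that will host R (it assumes the Rserve package can be installed there; 6311 is Rserve's default port):

## Start a headless Rserve instance that non-R clients (VB6, C#, ...) can
## talk to over TCP on port 6311.
install.packages("Rserve")
library(Rserve)
Rserve(args = "--no-save")

The client side then connects to that socket through whatever Rserve client library is available for your environment; the client code itself is beyond this sketch.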

Resources