Using/Distributing pre-compiled files - julia

I already asked this in the Julia community discourse but asking it here as expect to find different audience.
I created a simple function as below:
#MyFunction.jl
__precompile__()
function MyFunction(x)
y = x * 5
y * 5
end
And found the pre-compiled files saved as:
/Users/hasan/.julia/compiled/v1.0/MyFunction.jl
Can I use/distribute this pre-compiled file with my main function without using the original file source code itself?

These "compiled" files are only the lowering pass of Julia to byte code, which is not sufficient for stand-alone distribution. You might find this StackOverflow answer from Stefan Karspinski, one of the creators of Julia, useful for more details on the various layers of compilation inside Julia: https://stackoverflow.com/a/43456211/5504925
If you really want compiled code, your current best bet would be https://github.com/JuliaLang/PackageCompiler.jl. I'm not sure whether the package currently supports creating stand-alone binaries, or only intermediate forms, see also the introducting blog post by the author: https://medium.com/#sdanisch/compiling-julia-binaries-ddd6d4e0caf4.

Related

Should I write `.registration = true` for an "external" DLL in a R package?

When I do a package including some C code or using Rcpp, I type the roxygen code:
#' #useDynLib TheDLL, .registration=true
I did a package in which I included some DLLs created with Haskell, that I put in the inst/libs folder. I didn't type .registration=true in the roxygen code and the package works fine. Should I type it nevertheless? If so, what is the role of .registration=true?
I think you almost certainly shouldn't use it in a general-purpose DLL, but if the DLL was written specifically for R, maybe you should. It indicates that the dll calls R_registerRoutines from its R_init_DLLNAME function, so entry points can be saved into variables. For example, you might have a function named "foo". You can call it using
.Call("foo", ...)
without registering it, and R will need to search symbol tables for it at run time. Or you can register it and call it as
.Call(foo, ...)
and the search is unnecessary. This is discussed mainly in section 5.4.2 of "Writing R Extensions". I believe that if you specify .registration=true then R will use the registration information to find entry points, otherwise it needs to search through all the exports of the DLL, which is probably slower.

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" an R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow even for people who have not worked on the project before and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best-practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others :
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within R Studio. Combined with roxygen2, it enables you to document precisely every function, and makes the code clearer since you can avoid commenting with inline comments (but please do so anyway if needed)
You can specify quite easily which dependancies your package will need, so that every one knows what to install for your project to work. You can also use packrat if you want to mimic python's virtualenv
R also provide a long format documentation system, which are called vignettes and are similar to a printed notebook : you can display code, text, code results, etc. This is were you will write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed they are automatically included and available for all users.
The only downside is the following : since R is a functional programming language, a package consists of mainly functions, and some other relevant objects (data, for instance), but not really scripts.
More details about the last point if your project consists in a script that calls a set of functions to do something, it cannot directly appear within the package. Two options here : a) you make a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance) ; b) you make the whole script appear in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this :
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole work from a terminal or a distant machine with Rscript, with the following (using argparse to add arguments)
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally if you write a bash file calling Rscript, you can automate everything !
Feel free to read Hadley Wickham's book about R packages, it is super clear, full of best practices and of great help in writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set up the random seed, so the outputs should be reproducible.
Documentation is important: you can use the Roxygen skeleton in rstudio (default ctrl+alt+shift+r).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script, that uses the others.
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.

How can I create a library in julia?

I need to know how to create a library in Julia and where I must keep it in order to call it later. I come from C and matlab, it seems there is no documentation about pratical programming in Julia.
Thanks
If you are new to Julia, you will find it helpful to realize that Julia has two mechanisms for loading code. Stating you "need to know how to create a library in Julia" would imply you most likely will want to create a Julia module docs and possibly a packagedocs. But the first method listed below may also be useful to you.
The two methods to load code in Julia are:
1. Code inclusion via the include("file_path_relative_to_call_or_pwd.jl")docs
The expression include("source.jl") causes the contents of the file source.jl to be evaluated in the global scope of the module where the include call occurs.
Regarding where the "source.jl" file is searched for:
The included path, source.jl, is interpreted relative to the file where the include call occurs. This makes it simple to relocate a subtree of source files. In the REPL, included paths are interpreted relative to the current working directory, pwd().
Including a file is an easy way to pull code from one file into another one. However, the variables, functions, etc. defined in the included file become part of the current namespace. On the other hand, a module provides its own distinct namespace.
2. Package loading via import X or using Xdocs
The import mechanism allows you to load a package—i.e. an independent, reusable collection of Julia code, wrapped in a module—and makes the resulting module available by the name X inside of the importing module.
Regarding the difference between these two methods of code loading:
Code inclusion is quite straightforward: it simply parses and evaluates a source file in the context of the caller. Package loading is built on top of code inclusion and is quite a bit more complex.
Regarding where Julia searches for module files, see docs summary:
The global variable LOAD_PATH contains the directories Julia searches for modules when calling require. It can be extended using push!:
push!(LOAD_PATH, "/Path/To/My/Module/")
Putting this statement in the file ~/.julia/config/startup.jl will extend LOAD_PATH on every Julia startup. Alternatively, the module load path can be extended by defining the environment variable JULIA_LOAD_PATH.
For one of the simplest examples of a Julia module, see Example.jl
module Example
export hello, domath
hello(who::String) = "Hello, $who"
domath(x::Number) = x + 5
end
and for the Example package, see here.
Side Note There is also a planned (future) library capability similar to what you may have used with other languages. See docs:
Library (future work): a compiled binary dependency (not written in Julia) packaged to be used by a Julia project. These are currently typically built in- place by a deps/build.jl script in a project’s source tree, but in the future we plan to make libraries first-class entities directly installed and upgraded by the package manager.

Sourcing So many R scripts

I have lots of .r scripts that I want to source all. I have written a function like the one below to source.
sourcer=function(){
source("wil.r")
source("k.r")
source("l.r")
}
Please can any one tell me how to get this codes activated and how to call each one any time I want to use it?
In addition to the answer by #user2885462, if the amount of R code you need to source becomes bigger, you might want to wrap the code into an R package. This provides a convenient way of loading the code, and allows you to add tests, documentation, etc. Reading the official package writing tutorial is a good place to start for that.
For an individual project, I like to have all (or most) of my R functions in separate .r files, all in the same folder: e.g., AllFunctions
Then at the beginning of my main code I run the following line of code, which sources all .r (and other extensions if they exist - which they usually don't) in the AllFunctions folder:
for (nm in list.files("AllFunctions", pattern = ".[RrSsQq]$")) source(file.path("AllFunctions", nm))

How to manage R extension / package documentation with grace (or at least without pain)

Now and then I embrace project specific code in R packages. I use the documentation files as suggested by Writing R Extensions to document the application of the code.
So once you set up your project and did all the editing to the .Rd files,
how do you manage a painless and clean versioning without rewriting or intense copy-pasting of all the documentation files in case of code or, even worse, code structure changes?
To be more verbose, my current workflow is that I issue package.skeleton(), do the editing on the .Rd-files followed by R CMD check and R CMD build. When I do changes to my code I need to redo the above maybe appending '.2.0.1' or whatever in order to preserve the precursor version. Before running the R CMD check command I need to repopulate all the .Rd-files with great care in order to get a clean check and succsessful compilation of Tex-files. This is really silly and sometimes a real pain, e.g. if you want to address all the warnings or latex has a bad day.
What tricks do you use? Please share your workflow.
The solution you're looking for is roxygen2.
RStudio provides a handy guide, but briefly you document your function in-line with the function definition:
#' Function doing something
#' Extended description goes here
#' #param x Input foo blah
#' #return A numeric vector length one containing foo
myFunc <- function(x) NULL
If you're using RStudio (and maybe ESS also?) the Build Package command automagically creates the .Rd files for you. If not, you can read the roxygen2 documentation for the commands to generate the docs.

Resources