R: How to enable the JIT compiler via script, not environment?

I work on several R projects at the same time. One of them involves a simulation with a for-loop, which I hope to speed up with a JIT compiler. To do so, following this recommendation, I added the lines below to the file Rcmd_environ in my R directory's etc folder:
R_COMPILE_PKGS=TRUE
R_ENABLE_JIT=3
Now I wonder whether it is possible to turn this on and off via a script. That way, I wouldn't have JIT compilation in my other projects. Any ideas?

You can load the compiler package and then set the JIT level by calling the `enableJIT` function.
For example, you can do
require(compiler)
enableJIT(3)
to get full JIT compilation.
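Because `enableJIT` returns the previous JIT level, you can enable it just for the simulation and restore the old setting afterwards. A minimal sketch (`run_simulation` is a hypothetical stand-in for your for-loop):
library(compiler)
old_level <- enableJIT(3)  # enable full JIT; remember the previous level
run_simulation()           # hypothetical: your simulation for-loop
enableJIT(old_level)       # restore the previous JIT setting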

Load packages automatically without `using` in Julia?

When looking at files like this: https://github.com/simon-lc/Silico.jl/blob/main/examples/demo/peg_in_hole_planning.jl
The author does not call `using Silico` or `using Mehrotra` anywhere, yet uses those packages many times throughout the file. As someone coming from Python, I don't understand this. How does Julia know where to look for Silico without a statement like `using Silico`?
For this, you can customize Julia's startup configuration file.
For example, on Windows you can go to the following path:
C:\Users\<username>\.julia\config\startup.jl
Open the file and write the import command(s) you want, e.g. `using Term` or `using OhMyREPL`, and `using Statistics: mean, std` (then those functions will be available by default). Then every time you run Julia, those packages will be imported automatically.
Note that if this file doesn't exist at that path, you can create a file with the same name.
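For instance, a minimal startup.jl sketch using the packages mentioned above (they must be installed first; pick whichever you actually use):
# ~/.julia/config/startup.jl: runs at every Julia start
using OhMyREPL               # nicer REPL with syntax highlighting
using Statistics: mean, std  # make mean and std available by default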
You can also compile your preferred packages into the Julia system image; the Julia REPL will then start a bit quicker, since it does not have to parse and compile the packages when they are loaded. The way to do this is with PackageCompiler.jl. [1]
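A minimal sketch of that approach, assuming PackageCompiler.jl is installed (the sysimage file name is arbitrary):
using PackageCompiler
# bake OhMyREPL into a custom system image
create_sysimage(["OhMyREPL"]; sysimage_path="custom_sysimage.so")
Then start Julia with julia --sysimage custom_sysimage.so to use it.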

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" the R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow, even for people who have not worked on the project before, and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources, and best practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others:
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within RStudio. Combined with roxygen2, it enables you to document every function precisely, and it makes the code clearer since you can avoid excessive inline comments (but please do comment anyway where needed)
You can specify quite easily which dependencies your package needs, so that everyone knows what to install for your project to work. You can also use packrat if you want to mimic Python's virtualenv (see the short sketch after this list)
R also provides a long-format documentation system called vignettes, which are similar to a printed notebook: you can display code, text, code results, etc. This is where you will write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed, they are automatically included and available to all users.
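A minimal packrat sketch, run from the project's root directory (the calls below are real packrat functions, but treat this as an illustration, not a full setup guide):
packrat::init()      # create a private, project-specific library
packrat::snapshot()  # record the exact package versions in use
packrat::restore()   # reinstall those versions on another machine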
The only downside is the following: since R is a functional programming language, a package consists mainly of functions, plus some other relevant objects (data, for instance), but not really scripts.
More details about that last point: if your project consists of a script that calls a set of functions to do something, the script cannot directly appear within the package. There are two options here: a) you make a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance); b) you make the whole script appear in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this:
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole workflow from a terminal or a remote machine with Rscript, with the following (using argparse to add arguments):
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
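For illustration, a minimal sketch of what myautomatedtask.R might contain, assuming the argparse package is installed (dothis, dothat, and finishwork are the hypothetical functions from above):
library(argparse)
library(mydatascienceproject)  # the hypothetical package built above

# declare the command-line arguments
parser <- ArgumentParser()
parser$add_argument("--arg1", help = "first argument")
parser$add_argument("--arg2", help = "second argument")
args <- parser$parse_args()

# run the whole workflow
dothis()
dothat()
finishwork()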
And finally, if you write a bash file calling Rscript, you can automate everything!
Feel free to read Hadley Wickham's book about R packages; it is super clear, full of best practices, and a great help when writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set the random seed so that the outputs are reproducible.
Documentation is important: you can use the roxygen skeleton in RStudio (default Ctrl+Alt+Shift+R).
I usually separate the code into smaller, logically cohesive scripts and use a main.R script that sources the others (see the sketch after this list).
If you use a special set of libraries, consider using packrat. Once you set it up, you can manage the installed project-specific libraries.
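A minimal sketch of that layout (all file and function names here are hypothetical):
# main.R: the single entry point for the whole workflow
set.seed(42)                 # fix the random seed for reproducibility

source("load_data.R")        # defines load_data()
source("fit_model.R")        # defines fit_model()

data  <- load_data("fresh_data.csv")
model <- fit_model(data)
saveRDS(model, "model.rds")  # persist the result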

What is the difference between library()/require() and source() in r?

Looked around and still not sure what the difference is between library()/require() and source() in R. According to this SO question: What is the difference between require() and library()? it looks like library() and require() are the same thing, and maybe one is legacy. Is source() for lazy developers that don't want to create a library? When do you use each of these constructs?
The differences between library and require are already well documented in What is the difference between require() and library()?.
So I will focus on how source differs from these. In fact, they are fundamentally quite different commands. Neither library nor require actually executes any code. They simply attach a namespace, in a lazy fashion, meaning that individual functions in the package are not run unless they are actually called later. source, on the other hand, does something quite different: it executes all of the code in the file at that time.
A small caveat: packages can be made to actually run some code at the time of package loading or attaching, via the .onLoad and .onAttach functions. Have a look here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/ns-hooks.html
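For example, a package can define hooks like these in one of its R files (a minimal sketch):
# runs when the package namespace is loaded, e.g. to set options
.onLoad <- function(libname, pkgname) {
  options(mypkg.verbose = FALSE)  # hypothetical package option
}
# runs when the package is attached via library()/require()
.onAttach <- function(libname, pkgname) {
  packageStartupMessage("Loaded ", pkgname)
}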
source runs the code in a .R file, line by line.
library and require load and attach R packages.
Is source() for lazy developers that don't want to create a library?
You're correct that source is for the cases when you don't have a package. Laziness is not the only reason; sometimes packages are not appropriate. Packages provide functionality, but they don't do things. Perhaps I have a script that pulls data from a database, fits a model, and makes some predictions. A package may provide functions to help me do that, but it does not actually do it. A script saved in a .R file and run with source() can run the commands and complete the task.
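A minimal sketch of such a task script (hypothetical file name; it uses only base R and the built-in mtcars data):
# fit_and_predict.R: a script that does things, not a package
model <- lm(mpg ~ wt, data = mtcars)  # fit a model
preds <- predict(model)               # make predictions
saveRDS(preds, "predictions.rds")     # complete the task
Running source("fit_and_predict.R") executes all of those lines immediately; library() and require() never execute a script this way.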
I do want to address this:
it looks like library() and require() are the same thing, maybe one is legacy.
They both do the same thing (load and attach a package). The main difference is that library() will throw an error and stop the script if the package is not available, whereas require() will return TRUE or FALSE depending on its success. The general consensus is that library is better, so that your script stops with a nice clear error and you can install the missing package before proceeding. The linked question has a more thorough discussion, which I won't try to replicate here.
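For instance (a sketch with a hypothetical package name):
# library(somepkg) would stop the script here if somepkg is missing.
# require() instead returns FALSE with a warning, so you can react:
if (!require(somepkg)) {
  install.packages("somepkg")
  library(somepkg)
}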

Running a R console in RInside

Is it possible to run something similar to a Linux R console (which uses GNU Readline) from within a C++ program using RInside? Ideally, such a console would have all the nice features, like autocomplete.
The background:
I have a big solver with an RInside-based plugin for running small chunks of R code during a simulation. It would be nice if the user were able to switch it to "interactive" mode and check things out as they go.
Notice:
1. I cannot just run R as a separate program, as I need it to see my objects and pointers from the main code.
2. I know about callbacks in RInside, but they do not provide any console-like capabilities.
Code: I doubt it will help, but here is my current code: https://github.com/llaniewski/TCLB/blob/RInside/src/Handlers/cbRunR.cpp.Rt

Is there a way to use Expect-Lite variables inside of a spawned command?

I've been working on automating the complicated process of building source code on a build machine and then transferring the compiled image files over to my embedded ARMv7 device to be flashed. Each step by itself is easy to automate with a standard Linux shell script, but when trying to do everything in one giant script, things get complicated. Thus far I've been using expect-lite to do the work, which is working, except that now I've run into a problem. When transferring the images over, I have expect-lite code that looks like the following:
$imageDestination="/the/destination"
$imageSource="/the/source/"
>sftp $userName'@'$buildMachine
>$password
>get $imageSource'/'x-load_sdcard.bin.ift $imageDestination'/'MLO
>echo "Finished"
>bye
If you know a thing or two about expect-lite, then you'll know that the above variables will be read as "shell" variables. The problem is that, as far as I know, SFTP doesn't allow the use of variables. Is there a way to tell expect-lite to use the predefined variables instead of trying to use "shell" variables? Or is there some clever way to get around this limitation without removing the variables?
All help is greatly appreciated.
Dreligor,
There is no scope issue; expect-lite variables are all global in scope (as stated in the documentation). I think the problem is that you are using quotes, which is making things more difficult. You should try:
$imageDestination=/the/destination
$imageSource=/the/source
>sftp $userName'@'$buildMachine
>$password
>get $imageSource/x-load_sdcard.bin.ift $imageDestination/MLO
>echo "Finished"
>bye
Craig Miller - author and maintainer of expect-lite
After some experimentation, it turns out that this is a scope issue. The solution is simply to move the variable declarations down: they need to be declared after the script has connected to the remote machine via sftp. The fixed code is as follows:
>sftp $userName'@'$buildMachine
>$password
$imageDestination="/the/destination"
$imageSource="/the/source/"
>get $imageSource'/'x-load_sdcard.bin.ift $imageDestination'/'MLO
>echo "Finished"
>bye
Hopefully this will help others.
