I am looking at a couple of complex models that seem to need a lot of computational power. I am currently using the R package "glmmTMB" to account for spatio-temporal autocorrelation and random effects. In theory, glmmTMB should be able to run much faster using parallelization: https://cran.r-project.org/web/packages/glmmTMB/vignettes/parallel.html
If your OS supports OpenMP parallelization and R was installed using
OpenMP, glmmTMB will automatically pick up the OpenMP flags from R’s
Makevars and compile the C++ model with OpenMP support. If the flag is
not available, then the model will be compiled with serial
optimization only.
Instead of running these models on my personal machine, I decided to set up a virtual machine in an HPC environment. How can I install R with OpenMP support on Ubuntu 20.04? I couldn't find anything on this topic.
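One way to see whether an existing R installation already provides the OpenMP flags the quoted passage refers to is to ask R's build configuration directly with `R CMD config`. The sketch below assumes the `R` launcher lives in `R.home("bin")`. The glmmTMB call in the final comment uses a hypothetical formula and data set, with `glmmTMBControl(parallel = ...)` as described in the linked vignette.

```r
# Ask R's build configuration for the OpenMP flag used when compiling C++ code
# (the Makevars value glmmTMB picks up). An empty result means no OpenMP.
r_bin <- file.path(R.home("bin"), "R")
openmp_flag <- system2(r_bin, c("CMD", "config", "SHLIB_OPENMP_CXXFLAGS"),
                       stdout = TRUE)
openmp_flag <- trimws(paste(openmp_flag, collapse = " "))
has_openmp <- nzchar(openmp_flag)

if (has_openmp) {
  message("R was built with OpenMP support: ", openmp_flag)
} else {
  message("No OpenMP flag found; glmmTMB would compile its model serially")
}

# If OpenMP is available, the thread count is requested per model fit,
# e.g. (hypothetical formula and data):
# library(glmmTMB)
# fit <- glmmTMB(y ~ x + (1 | site), data = d,
#                control = glmmTMBControl(parallel = 4))
```

On Ubuntu, R built with GCC (as the distribution's r-base packages typically are) usually reports `-fopenmp` here. Because glmmTMB compiles its C++ model when the package is installed, it would need to be reinstalled after R's configuration changes.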
Related
I am using R for reproducible scientific machine learning and hyperparameter optimization. I found that other BLAS implementations, such as OpenBLAS, ATLAS, or MKL, can speed up this costly optimization. However, the results differ slightly with each BLAS: even when the optimization is forced onto a single thread, the results deviate from those of the default R BLAS.
So I want to try using Docker to contain the experiment. I have multiple questions.
Is it better to compile R from source instead of using binaries?
If I compile from source, will I get the same configuration as the Debian binaries?
Since results differ for each BLAS, is it a good idea to use ReproBLAS, a tool from Berkeley, with R?
When you compile R using "--with-blas=-lopenblas", is OpenBLAS single-threaded or multithreaded?
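A minimal sketch for the Docker experiment, assuming a recent R where `sessionInfo()` reports the BLAS and LAPACK libraries in use: print which libraries the container's R is linked against, then record a full-precision fingerprint of one BLAS-heavy result so runs under different BLAS builds can be compared exactly. The matrix size, the seed, and the script name `check_blas.R` are arbitrary choices for illustration.

```r
# Which BLAS/LAPACK libraries is this R session actually using?
si <- sessionInfo()
cat("BLAS:  ", si$BLAS, "\n")
cat("LAPACK:", si$LAPACK, "\n")

# OpenBLAS threading depends on how the library was built and on environment
# variables such as OPENBLAS_NUM_THREADS, which must be set before R starts,
# e.g.  OPENBLAS_NUM_THREADS=1 Rscript check_blas.R   (hypothetical script name)

# Fingerprint a BLAS-heavy computation; 17 significant digits are enough to
# expose last-bit differences between BLAS builds or containers.
set.seed(2024)
m <- matrix(rnorm(300 * 300), nrow = 300)
res <- crossprod(m)
fingerprint <- sprintf("%.17g", sum(res))
cat("fingerprint:", fingerprint, "\n")
```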
I use R, but lately I have been trying to apply Model Predictive Control (MPC), as used in control engineering, to my models. This area seems almost nonexistent in R compared with MATLAB, where it is quite easy to do system identification and create transfer functions that can be deployed inside the Model Predictive Control module. Does anybody know where to look, or which R packages to use for MPC and transfer functions?
There is a growing number of Python packages for control engineering. One option is to use the reticulate R package to call Python functions from R. Here are some control engineering packages in Python:
SciPy.signal for signal processing and system modeling
SymPy for Laplace transforms and differential equation analytic solutions
Control Systems Library
Chemical Process Control from Jeff Kantor, Notre Dame
Process Dynamics and Control in Python at BYU (my course)
Advanced Control and Machine Learning in Python at BYU (also my course)
Here is an example of running MPC in Python on the Temperature Control Lab.
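As a rough illustration of the reticulate route, the sketch below builds a transfer function with the Python Control Systems Library (imported as the `control` module) and reads back its steady-state gain. It assumes reticulate and the Python `control` package are installed, and is guarded so it simply returns `NA` otherwise. The process model G(s) = 2 / (10s + 1) is hypothetical.

```r
# Call the Python Control Systems Library ("control" module) from R via
# reticulate. Guarded so the script still runs where Python/control is missing.
gain <- NA_real_
if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_module_available("control")) {
  ctl <- reticulate::import("control")
  # Hypothetical first-order process: G(s) = 2 / (10 s + 1)
  G <- ctl$tf(2, c(10, 1))
  print(G)
  # Steady-state (DC) gain of the transfer function; expected to be 2
  gain <- as.numeric(ctl$dcgain(G))
}
gain
```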
I am trying to test the multi-threading advantages of using Oracle R Distribution. I have a workstation with a 12 core CPU and 32 GB of RAM available that I'd really like to exploit.
I've downloaded the latest Oracle R Distribution and the 30-day trial of Intel MKL 11.1. I've set my PATH as described in the Oracle documentation, and when I run Sys.BlasLapack() in RStudio, it reports Intel Math Kernel Library (Intel MKL).
However my jobs aren't running any faster. Do I need to run one of the .bat files to actually compile and set parameters for the MKL? I don't have Visual Studio and I can't find anything on the web telling me how to do this. Any pointers? I am using Windows 7 Professional.
Short Answer: Run the benchmark from here under standard BLAS and Intel MKL to see if the MKL is working. MKL will only improve performance for some operations.
To actually get the full power of the Oracle R implementation you would have to use the embedded R functions. These are the ones that start with ore.
In Oracle R Enterprise, embedded R execution is the ability to store R
scripts in Oracle Database and to invoke such scripts, which then
execute in one or more R engines that run in the database and that are
dynamically started and managed by the database.
We have tried out ORE in the office with Oracle running on an Exadata box; we began to see performance lift only when the datasets were extremely large.
If your goal is to take advantage of a faster BLAS, you don't actually need Oracle R to do that. On a Unix distribution you can build open source R using the --with-blas configure option (see this link). I believe the same approach can be used on Windows, although I have never compiled R from source on Windows.
Not all R functions run faster with a different BLAS; in particular, most modeling functions such as glm make little use of the BLAS. To check the performance of your system with different BLAS libraries, I have used scripts from this site. They run much faster when the Intel MKL is in use. You could run one on your Oracle R Distribution and compare it with your open source install to confirm that Oracle R is using the Intel BLAS.
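The benchmark scripts mentioned above are not reproduced here, but the same idea can be sketched in a few lines: time a BLAS-bound matrix product and a glm fit, then run the identical script under the reference BLAS and under MKL. The sizes and the simulated logistic data are hypothetical. If the optimised BLAS is active, the crossprod timing would be expected to drop noticeably while the glm timing changes little.

```r
# Compare a BLAS-bound operation with a model fit that gains little from BLAS.
# Run the same script under the reference BLAS and under MKL/OpenBLAS.
set.seed(1)
n <- 1000
a <- matrix(rnorm(n * n), nrow = n)
t_blas <- system.time(crossprod(a))[["elapsed"]]

# glm() fits by iteratively reweighted least squares using a QR decomposition
# that does not benefit much from an optimised multithreaded BLAS.
d <- data.frame(x1 = rnorm(1e5), x2 = rnorm(1e5))
d$y <- rbinom(1e5, 1, plogis(0.5 * d$x1 - 0.25 * d$x2))
t_glm <- system.time(glm(y ~ x1 + x2, family = binomial, data = d))[["elapsed"]]

cat(sprintf("crossprod %dx%d: %.2f s\nglm (1e5 rows): %.2f s\n",
            n, n, t_blas, t_glm))
```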
Overall, I did not get much day-to-day performance improvement from installing the Intel BLAS when I tried it. Revolution Analytics makes a big deal of how its non-free distribution of R leverages the Intel MKL, but they had to rewrite many R functions to take advantage of the increased speed.
The makeCluster function in the snow package supports the cluster types "SOCK", "PVM", "MPI", and "NWS", but I'm not very clear on the differences among them, and more specifically which would be best for my program.
Currently I have a queue of tasks of different length going into a load balancing cluster with clusterApplyLB and am using a 64bit 32-core Windows machine.
I am looking for a brief description of the differences among the four cluster types, which would be best for my use and why.
Welcome to parallel programming. You may want to peruse the vignette of the excellent parallel package that comes with R, as it gives a general introduction. It also gives you an idea of what you can or cannot do on Windows. In short, PVM and MPI are standard parallel programming approaches supported by libraries of the same name. These exist on Windows, but are used less frequently there and are often not as mature as their Unix counterparts.
If you want to stick with snow, your options are essentially limited to SOCK-type clusters. Again, the package documentation will have pointers.
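For the load-balanced queue described in the question, a socket cluster is the straightforward choice on Windows. The sketch below uses the parallel package, where `type = "PSOCK"` is the socket-based counterpart of snow's `"SOCK"`. The task durations and the use of `Sys.sleep()` as stand-in work are hypothetical.

```r
library(parallel)

# Hypothetical tasks of very different length (seconds of simulated work)
task_secs <- c(0.4, 0.1, 0.3, 0.05, 0.2, 0.1)

run_task <- function(secs) {
  Sys.sleep(secs)                      # stand-in for real work
  c(secs = secs, pid = Sys.getpid())   # report which worker process ran it
}

# Socket cluster: works on Windows as well as Unix
cl <- makeCluster(2, type = "PSOCK")
res <- tryCatch(
  clusterApplyLB(cl, task_secs, run_task),  # next task goes to the first free worker
  finally = stopCluster(cl)                 # always shut the workers down
)

# One row per task, in input order
out <- do.call(rbind, res)
print(out)
```

`clusterApplyLB()` hands the next task to whichever worker becomes free, which suits tasks of uneven length, and the results still come back in input order. On the 32-core machine, the worker count would typically be set to something like `detectCores() - 1`.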
Is it possible to do concurrent programming in R?
For example, running 2 functions with while(TRUE) loops concurrently.
The snow, Rmpi, and pvm packages have supported this for almost a decade, initially across computers, and also on multi-CPU or multi-core machines.
The multicore package added the ability to do this on multi-core machines.
Since R 2.14.0, the parallel package has bundled parts of snow and multicore in the basic R distribution. This may be your best starting point now.
A few parts of R itself also use multi-threaded programming, but that approach is limited due to some architectural constraints that are unlikely to be lifted.
We wrote a survey paper on parallel programming with R a few years ago, which is still relevant.
Yes. As of version 2.14.0, the parallel package is included with R, so you can run code concurrently from a single R session. Note that it does this with separate R worker processes rather than threads. See: http://cran.r-project.org/web/views/HighPerformanceComputing.html
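A minimal sketch of running two loops at the same time with the parallel package, assuming the loops need to finish eventually so their results can be collected (a literal `while(TRUE)` never returns). Each function is sent to its own worker process of a socket cluster. The loop bodies and stopping conditions are hypothetical.

```r
library(parallel)

# Two independent loops; each stops on its own (hypothetical) condition.
loop_a <- function() {
  i <- 0
  while (i < 2e5) i <- i + 1
  list(name = "a", iterations = i, pid = Sys.getpid())
}
loop_b <- function() {
  total <- 0
  k <- 0
  while (total < 1e4) {
    k <- k + 1
    total <- total + sqrt(k)
  }
  list(name = "b", iterations = k, pid = Sys.getpid())
}

# clusterApply() sends the first function to worker 1 and the second to
# worker 2, so both loops run at the same time in separate R processes.
cl <- makeCluster(2)
res <- tryCatch(
  clusterApply(cl, list(loop_a, loop_b), function(f) f()),
  finally = stopCluster(cl)
)
str(res)
```

On Linux and macOS, `mcparallel()` and `mccollect()` from the same package can fork the current session instead; forking is not available on Windows.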