I am using R for reproducible scientific machine learning and hyperparameter optimization. I stumbled upon the fact that other BLAS implementations such as OpenBLAS, ATLAS, or MKL can speed up this costly optimization. But the results are slightly different under each BLAS: even when the optimization is forced onto a single thread, the results deviate from those of default R.
So I want to try using Docker to containerize the experiment. I have multiple questions:
Is it better to compile R from source instead of using binaries?
If I compile from source, will it lead to the same configuration as the Debian binaries?
Since results differ for each BLAS, there is a tool called ReproBLAS from Berkeley; is it a good idea to use it with R?
When R is compiled using "--with-blas=-lopenblas", is OpenBLAS single-threaded or multithreaded in that case?
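For the last question in particular, it helps to check from inside R (and inside the container) which BLAS/LAPACK the build actually linked, and to pin the BLAS thread count explicitly instead of relying on defaults. A minimal sketch; the paths in the comments are only examples, and RhpcBLASctl is an optional CRAN package rather than part of base R:

    # Report the BLAS/LAPACK shared libraries this R build uses (R >= 3.4)
    si <- sessionInfo()
    si$BLAS     # e.g. ".../openblas-pthread/libblas.so.3" for a threaded OpenBLAS
    si$LAPACK

    # Pin the BLAS to one thread at run time for reproducibility
    # (the OPENBLAS_NUM_THREADS environment variable works as well)
    RhpcBLASctl::blas_set_num_threads(1)
    RhpcBLASctl::blas_get_num_procs()

Whether "-lopenblas" gives you a serial or a threaded OpenBLAS depends on which OpenBLAS build the linker finds (Debian, for instance, ships both pthread and serial variants), so checking at run time is the safer route.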
I am looking at a couple of complex models that seem to need a lot of computational power. I am currently using the R package "glmmTMB" to account for spatio-temporal autocorrelation and random effects. In theory, glmmTMB should be able to run much faster using parallelization: https://cran.r-project.org/web/packages/glmmTMB/vignettes/parallel.html
If your OS supports OpenMP parallelization and R was installed using
OpenMP, glmmTMB will automatically pick up the OpenMP flags from R’s
Makevars and compile the C++ model with OpenMP support. If the flag is
not available, then the model will be compiled with serial
optimization only.
Instead of running these models on my personal machine, I decided to set up a virtual machine in an HPC environment. How can I install R with OpenMP support on Ubuntu 20.04? I couldn't find anything on this topic.
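On Ubuntu, the r-base/r-base-dev packages are normally built with OpenMP support already, so before rebuilding anything it is worth checking whether the flags the vignette refers to are present in R's Makeconf. A small sketch; the expected output line is only an example:

    # Look for the OpenMP flags in R's Makeconf
    makeconf <- file.path(R.home("etc"), "Makeconf")
    grep("OPENMP", readLines(makeconf), value = TRUE)
    # Expect entries such as "SHLIB_OPENMP_CXXFLAGS = -fopenmp";
    # if these are empty, glmmTMB's C++ model will be compiled without OpenMP.

If the flags are missing, installing r-base-dev (which pulls in the toolchain R uses to compile packages) or rebuilding R with an OpenMP-capable compiler is the usual fix.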
I recently looked into GPU computation in Julia, where the choice of packages seemed confusing.
For example, CuArrays and ArrayFire seemed to be doing the same thing, and ArrayFire seemed to be the "official" package on the Nvidia developers' webpage (https://devblogs.nvidia.com/gpu-computing-julia-programming-language).
Also, there were the CUDAdrv and CUDAnative packages, which seemed confusing, as their functionality seemed less straightforward than that of the others.
What do these packages do? Is there any difference between CuArrays and ArrayFire?
As explained in the blog post you shared, it is quite simple, as quoted below:
The Julia package ecosystem already contains quite a few GPU-related
packages, targeting different levels of abstraction as Figure 1 shows.
At the highest abstraction level, domain-specific packages like
MXNet.jl and TensorFlow.jl can transparently use the GPUs in your
system. More generic development is possible with ArrayFire.jl, and if
you need a specialized CUDA implementation of a linear algebra or deep
neural network algorithm you can use vendor-specific packages like
cuBLAS.jl or cuDNN.jl. All these packages are essentially wrappers
around native libraries, making use of Julia’s foreign function
interfaces (FFI) to call into the library’s API with minimal overhead.
The CUDAdrv and CUDAnative packages are meant for working with the CUDA API directly and writing kernels from Julia itself. I believe that is where CuArrays comes in handy - wrapping native Julia objects into a CUDA-accessible format, roughly speaking.
ArrayFire, on the other hand, is a generic library that wraps all of CUDA's domain-specific libraries (cuBLAS, cuSPARSE, cuSOLVER, cuFFT) in a nice interface (functions). Apart from the interface to CUDA's domain-specific libraries, ArrayFire itself provides a lot of other functions in the areas of statistics, image processing, computer vision, and so on. It also has a nice JIT feature where the user's code is compiled to a runtime kernel, simply put. ArrayFire.jl is a language binding with some extra Julia-specific improvements at the wrapper level.
That's the general difference. From a developer's perspective, using a library (like ArrayFire) basically takes away the burden of keeping up with the CUDA API and maintaining/tweaking the kernels for optimum performance, which I think takes a lot of time.
PS. I am a member of the ArrayFire development team.
I am trying to test the multi-threading advantages of using Oracle R Distribution. I have a workstation with a 12 core CPU and 32 GB of RAM available that I'd really like to exploit.
I've downloaded the latest Oracle R distribution and the 30-day trial of Intel MKL 11.1. I've specified my PATH per the Oracle documentation, and in RStudio, when I run Sys.BlasLapack(), I get Intel Math Kernel Library (Intel MKL).
However my jobs aren't running any faster. Do I need to run one of the .bat files to actually compile and set parameters for the MKL? I don't have Visual Studio and I can't find anything on the web telling me how to do this. Any pointers? I am using Windows 7 Professional.
Short Answer: Run the benchmark from here under standard BLAS and Intel MKL to see if the MKL is working. MKL will only improve performance for some operations.
To actually get the full power of the Oracle R implementation you would have to use the embedded R functions. These are the ones that start with ore.
In Oracle R Enterprise, embedded R execution is the ability to store R
scripts in Oracle Database and to invoke such scripts, which then
execute in one or more R engines that run in the database and that are
dynamically started and managed by the database.
We have tried out ORE in the office with Oracle running on an Exadata box; we began to see performance lift only when the datasets were extremely large.
If your goal is to take advantage of a more powerful BLAS, you don't actually need Oracle R to do that. On a Unix distribution you can build open-source R using the --with-blas option (see this link). I believe the same approach can be used on Windows, although I've never compiled R from source on Windows.
Not all R functions run faster with a different BLAS; in particular, most modeling functions like glm make little use of the BLAS. To check the performance of your system with different BLAS libraries I have used scripts from this site. They will run much faster if the Intel MKL is being used. Maybe you should try one on your Oracle R distribution and compare with your open-source install to confirm that ORE is using the Intel BLAS.
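As a rough illustration of that split, a BLAS-bound operation such as a large crossproduct should speed up markedly under MKL, while a plain glm fit will barely move. A hedged timing sketch; the sizes are arbitrary:

    # BLAS-bound: expect a large speed-up with MKL/OpenBLAS
    n <- 2000
    m <- matrix(rnorm(n * n), n, n)
    system.time(crossprod(m))

    # Mostly interpreter and non-BLAS C code: largely unaffected by the BLAS
    d <- data.frame(y = rnorm(1e5), x = rnorm(1e5))
    system.time(glm(y ~ x, data = d))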
Overall I did not get much day to day performance improvement out of installing the Intel BLAS when I tried it. Revolution Analytics makes a big deal over how their non-free distribution of R leverages the Intel MKL. But they had to rewrite many R functions to take advantage of the increased speed.
The makeCluster function in the snow package offers the cluster types "SOCK", "PVM", "MPI", and "NWS", but I'm not very clear on the differences among them, and more specifically which would be best for my program.
Currently I have a queue of tasks of different length going into a load balancing cluster with clusterApplyLB and am using a 64bit 32-core Windows machine.
I am looking for a brief description of the differences among the four cluster types, which would be best for my use and why.
Welcome to parallel programming. You may want to peruse the vignette of the excellent parallel package that comes with R, as it gives a general introduction. It also gives you an idea of what you can or cannot do on Windows -- in short, PVM and MPI are standard parallel programming approaches supported by the namesake libraries. These exist on Windows, but are less frequently used and often not as mature as their Unix counterparts.
If you want to stick with snow, your options are essentially limited to SOCK-type clusters. Again, the package documentation will have pointers.
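A minimal sketch of a load-balanced SOCK cluster with snow on Windows; the worker count and the toy task list are placeholders for your own job queue:

    library(snow)

    cl <- makeCluster(8, type = "SOCK")   # SOCK clusters work out of the box on Windows
    tasks <- as.list(1:100)               # in practice, tasks of varying length
    res <- clusterApplyLB(cl, tasks, function(i) sum(rnorm(i * 1e4)))
    stopCluster(cl)

The same pattern carries over to parallel::makeCluster(type = "PSOCK"), the successor of snow's SOCK clusters that ships with base R.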
I can't find it anywhere on the web (and I don't want to install it). Is the R language a compiled language? How fast does it run a pre-written script? Does it do any kind of compilation, or just execute instructions line by line?
In most cases R is an interpreted language that runs in a read-evaluate-print loop. There are numerous extensions to R that are written in other languages like C and Fortran where speed or interfacing with native libraries is helpful.
I've often rewritten R code in C++ and made it run 100x faster. Looping is especially inefficient in R.
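As an illustration of that kind of rewrite, the Rcpp package lets you inline a C++ loop from R. A hedged sketch; the speed-up you see depends entirely on the loop body and the machine:

    library(Rcpp)

    # Compile a small C++ loop and expose it to R as sum_sq()
    cppFunction('
    double sum_sq(NumericVector x) {
      double s = 0;
      for (int i = 0; i < x.size(); ++i) s += x[i] * x[i];
      return s;
    }')

    x <- rnorm(1e6)
    sum_sq(x)   # the C++ loop
    sum(x^2)    # vectorised base R, also fast; an explicit R for-loop is the slow case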
R is generally an interpreted language. However, the compiler package offers bytecode compilation that can improve performance. You can also call compiled code from R.
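For completeness, this is what explicit byte-compilation looks like; note that since R 3.4 the JIT byte-compiles functions automatically, so the explicit call often changes little on a modern installation:

    library(compiler)

    f  <- function(n) { s <- 0; for (i in 1:n) s <- s + i; s }
    fc <- cmpfun(f)   # byte-compiled copy of f

    system.time(f(1e6))
    system.time(fc(1e6))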
In terms of how fast, it depends on what you are trying to do and how you are trying to do it. Some looping operations can be very slow. However, in many cases, with well written code, the performance of R scripting is determined by the speed of the underlying internal C-based libraries and system memory read-write speeds, and so R is about as fast as anything else.