Using nvBLAS in R on Windows?

Using nvBLAS in R on Windows? - r

I am having trouble getting nvBLAS to work in R. I'm using RStudio on a Windows 10 machine, and I have no idea how to link nvBLAS and the original Rblas together for R to start up with both. From the nvBLAS documentation:
To use the NVBLAS Library, the user application must be relinked
against NVBLAS in addition to the original CPU Blas (technically only
NVBLAS is needed unless some BLAS routines not supported by NVBLAS are
used by the application). To be sure that the linker links against the
exposed symbols of NVBLAS and not the ones from the CPU Blas, the
NVBLAS Library needs to be put before the CPU Blas on the linkage
command line.
How exactly do I do this in Windows? Caveat, I am a pretty solid R user, but I am by no means an R expert or a computer scientist. I would ideally like to avoid using an Ubuntu-build for this as well.

Related

Downloading R on Linux for multiple clients

I've created a program that runs in R that I plan on distributing among a lot of other people. Currently the R script is ran completely automatically and behind the scenes with one .sh script which is exactly how it is intended to be. I'm trying to make it so theres no need for client intervention. The R script itself loads the packages and installs them if they aren't present which takes away the task of them installing the packages themselves.
Is there a way I can provide a folder within my Application's folder that they already download that contains R-script and its dependencies so the code can use that location of Rscript to compile and run the R-program I have created. The goal is to be able to download it and run without the need of internet connection to download R and maybe even the programs required packages if possible.
Any help or ideas is appreciated.

I assume that process you want called "creating binary package". Binary is programs (like EXE files) which can run directly on target CPU without any interpreter software (like Python interpreter for python scripts, or Java VM for java applications). I'm not so familiar with packaging of R programs but I found some materials regarding this issue:
1 - Building binary package R
2 - https://seandavi.github.io/post/build-linux-r-binary-packages/
3 - https://support.rstudio.com/hc/en-us/articles/200486508-Building-Testing-and-Distributing-Packages
Second link assumes Linux as target system. Opposite to interpreted languages, binary files often OS dependent (Linux, Windows, or Mac). I, personally, don't know how compatible are packages between Linux systems with different library sets.
Please comment if you find some information misleading, I'll correct the answer.

Julia wrapper for Cuba and fork on Windows?

According to its developer, the Cuba library (for numerical integration) will not run(?) on a Windows machine since the OS is missing the function fork(2).
Windows users: Cuba 3 and up uses fork(2) to parallelize the execution threads. This POSIX function is not part of the Windows API, however, and is furthermore used in an essential way such that it cannot be worked around simply with CreateProcess etc. The only feasible emulation seems to be available through Cygwin.
See [http://www.feynarts.de/cuba/][1].
Since Cuba.jl is just a wrapper for the Cuba library, does this mean that Windows users must install Cygwin in order to use this Julia package?

You don't have to install Cygwin to use Cuba.jl, a normal Windows system is sufficient. The package is tested on AppVeyor without any problem

GNU+Intel openmp dynamic loading

I have been trying to create an R library dynamically loading a dependency using Intel OpenMP. When the library is loaded, there is a clash with OpenMP library.
Using KMP_DUPLICATE_LIB_OK=TRUE gets me past the loading error but the program crashes once it is in a parallel section.
Unfortunately, neither compiling R using intel OpenMP or the dependency using GNU OpenMP is an option (because I want it to work with the standard R distribution and some external dependencies linked statically have to use Intel OpenMP).
However, I can recompile the dependency with some compatibility flags or modify how linking is done (but in the end, it has to be loaded dynamically from the R library). Setting environment variables is also an option (I am thinking about https://software.intel.com/en-us/node/522775 but none of the options seems to help so far).
The R library is written in C and I doubt that the fact it is R which will load it in the end really matters.
Any idea how to handle this?

R Parallel Processing with Xeon Phi, minimal code changes?

Looking at buying a couple Xeon Phi 5110P, but trying to estimate how much code I have to change or other software needed.
Currently I make good use of R on a multi-core Windows machine (24 cores) by using the foreach package, passing it other packages forecast, glmnet, etc. to do my parallel processing.
Having a Xeon Phi I understand I would want to compile R
https://software.intel.com/en-us/articles/running-r-with-support-for-intel-xeon-phi-coprocessors And I understand this could be done with a trail version of Parallel Studio XE.
Then do I then need to edit R's Makeconf file, adding the C/C++ flags and for the Phi? Compile all the needed packages before the trail on Parallel Studio expires? Or do I not need to edit the Makeconf to get the benefits of foreach on the Phi?
Seems like some of this will be handled automatically once R is compiled, with offloading done by the Math Kernel Library (MKL), but I'm not totally sure of this.
Somewhat related question: Is the Intel Xeon Phi usable without a costly Intel Compiler?
Also revolutionanalytics.com seems to have a few related blog posts but not entirely conclusive for me: http://blog.revolutionanalytics.com/2015/05/behold-the-power-of-parallel.html

If all you need is matrix operations, you can compile it with MKL libraries per here: [Running R with Support for Intel® Xeon Phi™ Coprocessors][1] , which requires the Intel Complier. Microsoft R comes pre compiled with MKL but I was not able to use the auto offload, I had to compile R with the Intel compiler for it to work properly.
You could use the trial version compiler and compile it during the trial period to see if it fits your purpose.
If you want to use things like foreach package by setting up a cluster,since each node is a linux computer, I'm afraid you're out of luck. On page 3 of [R-Admin][1] it says
Cross-building is not possible: installing R builds a minimal version of R and then runs many
R scripts to complete the build.
You have to cross compile from xeon host for xeon phi node with the intel compiler, and it's just not feasible.
The last way to utilize the Phi is to rewrite your code to call it directly. Rcpp provides an easy interface to C and C++ routines. If you found a C routine that runs well on xeon you can call the nodes within your code. I have done this with CUDA and the Rcpp is a thin layer and there are good examples of how to use it, and if you join it with examples of calling the phi card nodes you can probably achieve your goal with less overhead.
BUt, if all you need is matrix ops, there is no quicker route to supercomputing than a good double precision nvidea card and pre loading nvBlas during R startup.

Running a loop in parallel using multiple cores in R [duplicate]

I have a quad-core laptop running Windows XP, but looking at Task Manager R only ever seems to use one processor at a time. How can I make R use all four processors and speed up my R programs?

I have a basic system I use where I parallelize my programs on the "for" loops. This method is simple once you understand what needs to be done. It only works for local computing, but that seems to be what you're after.
You'll need these libraries installed:
library("parallel")
library("foreach")
library("doParallel")
First you need to create your computing cluster. I usually do other stuff while running parallel programs, so I like to leave one open. The "detectCores" function will return the number of cores in your computer.
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl, cores = detectCores() - 1)
Next, call your for loop with the "foreach" command, along with the %dopar% operator. I always use a "try" wrapper to make sure that any iterations where the operations fail are discarded, and don't disrupt the otherwise good data. You will need to specify the ".combine" parameter, and pass any necessary packages into the loop. Note that "i" is defined with an equals sign, not an "in" operator!
data = foreach(i = 1:length(filenames), .packages = c("ncdf","chron","stats"),
.combine = rbind) %dopar% {
try({
# your operations; line 1...
# your operations; line 2...
# your output
})
}
Once you're done, clean up with:
stopCluster(cl)

The CRAN Task View on High-Performance Compting with R lists several options. XP is a restriction, but you still get something like snow to work using sockets within minutes.

As of version 2.15, R now comes with native support for multi-core computations. Just load the parallel package
library("parallel")
and check out the associated vignette
vignette("parallel")

I hear tell that REvolution R supports better multi-threading then the typical CRAN version of R and REvolution also supports 64 bit R in windows. I have been considering buying a copy but I found their pricing opaque. There's no price list on their web site. Very odd.

I believe the multicore package works on XP. It gives some basic multi-process capability, especially through offering a drop-in replacement for lapply() and a simple way to evaluate an expression in a new thread (mcparallel()).

On Windows I believe the best way to do this would probably be with foreach and snow as David Smith said.
However, Unix/Linux based systems can compute using multiple processes with the 'multicore' package. It provides a high-level function, 'mclapply', that performs a list comprehension across multiple cores. An advantage of the 'multicore' package is that each processor gets a private copy of the Global Environment that it may modify. Initially, this copy is just a pointer to the Global Environment, making the sharing of variable extremely quick if the Global Environment is treated as read-only.
Rmpi requires that the data be explicitly transferred between R processes instead of working with the 'multicore' closure approach.
-- Dan

If you do a lot of matrix operations and you are using Windows you can install revolutionanalytics.com/revolution-r-open for free, and this one comes with the intel MKL libraries which allow you to do multithreaded matrix operations. On Windows if you take the libiomp5md.dll, Rblas.dll and Rlapack.dll files from that install and overwrite the ones in whatever R version you like to use you'll have multithreaded matrix operations (typically you get a 10-20 x speedup for matrix operations). Or you can use the Atlas Rblas.dll from prs.ism.ac.jp/~nakama/SurviveGotoBLAS2/binary/windows/x64 which also work on 64 bit R and are almost as fast as the MKL ones. I found this the single easiest thing to do to drastically increase R's performance on Windows systems. Not sure why they don't come as standard in fact on R Windows installs.
On Windows, multithreading unfortunately is not well supported in R (unless you use OpenMP via Rcpp) and the available SOCKET-based parallelization on Windows systems, e.g. via package parallel, is very inefficient. On POSIX systems things are better as you can use forking there. (package multicore there is I believe the most efficient one). You could also try to use package Rdsm for multithreading within a shared memory model - I've got a version on my github that has unflagged -unix only flag and should work also on Windows (earlier Windows wasn't supported as dependency bigmemory supposedly didn't work on Windows, but now it seems it does) :
library(devtools)
devtools::install_github('tomwenseleers/Rdsm')
library(Rdsm)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Using nvBLAS in R on Windows? - r

Related

Downloading R on Linux for multiple clients

Julia wrapper for Cuba and fork on Windows?

GNU+Intel openmp dynamic loading

R Parallel Processing with Xeon Phi, minimal code changes?

Running a loop in parallel using multiple cores in R [duplicate]

Categories

Resources