I have a question about BLAS-parallelized matrix products in R (the default for matrix products since at least R 3.4, maybe earlier).
The default behavior (at least on my machine) is now for the matrix product (see the example below) to use all the cores available on the machine, which can be a problem.
Do you know how to control the number of cores used for standard matrix product in R?
Thanks in advance
Example:
n <- 10000
p <- 1000
q <- 5000
A <- matrix(runif(n * p), nrow = n, ncol = p)
B <- matrix(runif(p * q), nrow = p, ncol = q)
C <- A %*% B  # multi-threaded matrix product
Session info:
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=fr_FR.utf8 LC_NUMERIC=C
[3] LC_TIME=fr_FR.utf8 LC_COLLATE=fr_FR.utf8
[5] LC_MONETARY=fr_FR.utf8 LC_MESSAGES=fr_FR.utf8
[7] LC_PAPER=fr_FR.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.1
The package RhpcBLASctl does just that.
From its DESCRIPTION:
Control the number of threads on 'BLAS' (Aka 'GotoBLAS', 'ACML' and 'MKL'). and possible to control the number of threads in 'OpenMP'. get a number of logical cores and physical cores if feasible.
We mention it in the CRAN Task View on HPC.
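A minimal sketch of how RhpcBLASctl could be used with the example from the question. It assumes R is linked against a threaded BLAS such as OpenBLAS or MKL. The thread count of 2 is an arbitrary example value, and the matrices are shrunk so the snippet runs quickly. The `requireNamespace()` guard just keeps the snippet running when the package is not installed.

```r
# Sketch: cap BLAS and OpenMP threads with RhpcBLASctl before a matrix
# product. Assumes a threaded BLAS (e.g. OpenBLAS or MKL) is in use;
# 2 threads is an arbitrary example value.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  print(RhpcBLASctl::get_num_cores())       # physical cores, if detectable
  print(RhpcBLASctl::blas_get_num_procs())  # threads the BLAS currently reports
  RhpcBLASctl::blas_set_num_threads(2)      # limit BLAS to 2 threads
  RhpcBLASctl::omp_set_num_threads(2)       # limit OpenMP code to 2 threads
} else {
  message("RhpcBLASctl is not installed: install.packages(\"RhpcBLASctl\")")
}

# Smaller versions of the matrices from the question.
n <- 1000
p <- 100
q <- 500
A <- matrix(runif(n * p), nrow = n, ncol = p)
B <- matrix(runif(p * q), nrow = p, ncol = q)
C <- A %*% B  # uses at most 2 BLAS threads if the BLAS honours the setting
```

Setting the limit from inside the session keeps the rest of the R code unchanged. The thread count can also be raised again before a step that benefits from all cores.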
I was waiting for R 4.1 and native Apple silicon support to do some benchmarks against other platforms. The results on my MacBook Pro with the M1 chip look disturbing to me. Let's start with the Mac:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.0 tools_4.1.0
The results from the benchmark are:
> N <- 20000
> M <- 2000
> X <- matrix(rnorm(N*M),N)
> system.time(crossprod(X))
user system elapsed
49.954 0.109 50.056
Interestingly, sessionInfo() gives different output in R Console, but the results are the same:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.0
Clearly R uses the Accelerate framework's BLAS libraries, but the benchmark timings are similar:
> system.time(crossprod(X))
user system elapsed
49.909 0.117 50.015
On Windows, using my ThinkPad E580, it is a whole different story:
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] microbenchmark_1.4-7 RevoUtils_11.0.2 RevoUtilsMath_11.0.0
loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2 grid_4.0.2 lattice_0.20-41
The computations are much quicker:
> system.time(crossprod(X))
user system elapsed
2.60 0.03 0.70
On Windows I use Microsoft R Open, which may explain the difference. On Ubuntu or Fedora, using OpenBLAS on the same laptop, the results are similar to Windows. I don't know whether this is to be expected. To me, R on macOS is inexplicably slow.
I was wrong that R uses the faster vecLib libraries in my case. From the R for Mac OS FAQ:
Currently the default is to use the R BLAS: this is recommended for precision
This is the case if one installs the regular R from CRAN. The solution is to switch to Apple's Accelerate BLAS:
cd /Library/Frameworks/R.framework/Resources/lib
ln -sf /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Versions/Current/libBLAS.dylib libRblas.dylib
as explained in many sources, for example here.
With the Accelerate BLAS, the MacBook is much faster than the ThinkPad:
> system.time(crossprod(X))
user system elapsed
0.491 0.059 0.553
I guess I fooled myself because OpenBLAS is enabled by default on Fedora, and I expected the same on macOS. Also, one must read the documentation.
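This answer comes down to checking which BLAS R actually loaded. The sketch below prints the BLAS path that `sessionInfo()` reports and then re-runs the `crossprod()` benchmark on a smaller matrix so it finishes quickly. The `BLAS` component of `sessionInfo()` is only available from R 3.4.0 onward, and the matrix sizes here are arbitrary. No timing is claimed, because it depends entirely on the machine and the BLAS.

```r
# Sketch: show which BLAS this session is linked against, then time the
# crossprod() benchmark from above on a smaller matrix.
# sessionInfo()$BLAS exists from R 3.4.0 onward (NULL on older versions).
blas_path <- sessionInfo()$BLAS
cat("BLAS in use:", if (length(blas_path)) blas_path else "(not reported)", "\n")

set.seed(1)
N <- 2000
M <- 200
X <- matrix(rnorm(N * M), N)
timing <- system.time(XtX <- crossprod(X))  # crossprod(X) equals t(X) %*% X
print(timing)
```

If the path still points at R's own `libRblas`, the relinking did not take effect.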
I am trying to run some unconstrained optimization on a large problem on a supercomputer, so I am trying to use ucminf (the problem also occurs if I use the ucminf option in optimx). When I run any simple optimization, I get the following message:
Error in .Call("mfopt", fnstr, grstr, rho, PACKAGE = "ucminf") :
"mfopt" not available for .Call() for package "ucminf"
Here is a minimal bit of test code that gives me the error:
library(ucminf)
test <- function(x) {
  (x - 3)^2
}
ucminf(0, test)
All of this works fine on my personal computer, but not on the supercomputer. I have tried R/3.5.0 and R/3.3.0 on the supercomputer, and both give me the same error.
Here is my session info:
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.9 (Final)
Matrix products: default
BLAS: /usr/public/R/3.5.0.gnu/lib64/R/lib/libRblas.so
LAPACK: /usr/public/R/3.5.0.gnu/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
[3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
[5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915
[7] LC_PAPER=en_US.iso885915 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ucminf_1.0-5 numDeriv_2016.8-1
loaded via a namespace (and not attached):
[1] compiler_3.5.0
I have uninstalled and reinstalled the package using Rinstall('ucminf'). Has anyone run into this, and does anyone know how to fix it?
I get the following error every time I run a Cox model with the survival package in R. The error appeared within the last few days. To illustrate it, I am using a standard example from https://stat.ethz.ch/R-manual/R-devel/library/survival/html/coxph.html:
# Fit a stratified model, clustered on patients
library(survival)
bladder1 <- bladder[bladder$enum < 5, ]
coxph(Surv(stop, event) ~ (rx + size + number) * strata(enum) +
cluster(id), bladder1)
The error I get is as follows:
Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
object 'Ccoxmart' not found
I am using the latest version of R [3.4.0 (2017-04-21) -- "You Stupid Darkness"].
I have consulted the survival package manual and searched online. I would be grateful for any resource or solution you can recommend.
I can confirm this error. It is definitely related to the update from R 3.3.3 (Another Canoe) to R 3.4.0 (You Stupid Darkness). All unit tests on my system were working on Friday and broken on Monday.
I'm also getting an error that "Ccoxph_wtest" is not found, which must be a similar issue.
I'll start debugging later today and let you know what I find, but for now, if you need to get back up and running, I'd suggest reverting to R 3.3.3 (Another Canoe). I have rerun all my unit tests under 3.3.3 and everything passes.
Here is the sessionInfo() output, first under R 3.3.3 and then under R 3.4.0:
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
loaded via a namespace (and not attached):
[1] compiler_3.4.0
My solution was to reinstall the survival package right on top of the original: install.packages("survival").
You have to reinstall packages that use C or Fortran code after upgrading to R 3.4.0:
update.packages(checkBuilt = TRUE)
See this post:
Packages which register native routines for .C or .Fortran need to be re-installed for this version (unless installed with R-devel SVN revision r72375 or later)
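To see which packages `update.packages(checkBuilt = TRUE)` would treat as old before reinstalling anything, you can compare each package's `Built` field with the running R version. Its documentation describes the rule as a package built under an earlier major.minor version of R. The sketch below only lists the candidates and does not contact CRAN. The variable names are illustrative.

```r
# Sketch: list installed packages built under an older major.minor version
# of R than the one running, i.e. the packages that
# update.packages(checkBuilt = TRUE) would consider "old".
ip <- installed.packages()
major_minor <- function(v) sub("^([0-9]+\\.[0-9]+).*", "\\1", v)
built_minor <- major_minor(ip[, "Built"])           # e.g. "3.3" from "3.3.3"
this_minor <- major_minor(as.character(getRversion()))
stale <- unname(ip[which(built_minor != this_minor), "Package"])
print(stale)

# Reinstalling them (needs a CRAN mirror) would replace them with builds
# for the running R:
# install.packages(stale)
```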
I have a case where foreach with doMC as the backend behaves differently on different machines.
On a Linux server running Ubuntu 12.04.4 LTS, the following code (adapted from the foreach vignette) runs 5 jobs simultaneously on a single core, which is not the desired behavior.
library(foreach)
library(doMC)
registerDoMC(cores = 5)
getDoParWorkers()
x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 10000
r <- foreach(icount(trials), .combine = cbind) %dopar% {
  ind <- sample(100, 100, replace = TRUE)
  result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
  coefficients(result1)
}
Session info:
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C
[6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doMC_1.3.3 iterators_1.0.7 foreach_1.4.2
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.1.0 tools_3.1.0
The same code run on a Mac running OS X 10.7.5 produces the desired and expected behavior: 5 jobs running on 5 different cores.
Session info:
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doMC_1.3.2 iterators_1.0.6 foreach_1.4.1
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.1 tools_3.0.1
I have also observed the same behavior using other parallel backends. Both machines have 20+ cores. Any ideas on what's going on?
The issue was caused by OpenBLAS. Switching to ATLAS solved the problem. The recipe for switching between BLAS libraries on Linux is on Nathan VanHoudnos's blog:
Switching between BLAS libraries
Now we can switch between the different BLAS options that are installed:
sudo update-alternatives --config libblas.so.3gf
There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).
  Selection    Path                                       Priority   Status
------------------------------------------------------------
* 0            /usr/lib/openblas-base/libopenblas.so.0    40         auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so.3gf   35         manual mode
  2            /usr/lib/libblas/libblas.so.3gf            10         manual mode
  3            /usr/lib/openblas-base/libopenblas.so.0    40         manual mode
Press enter to keep the current choice[*], or type selection number:
Side note: If the above returned:
update-alternatives: error: no alternatives for libblas.so.3gf
Try
sudo update-alternatives --config libblas.so.3