According to this link, merely installing the following package on Ubuntu will speed up R significantly for certain calculations:
libatlas3gf-base
Do I have to compile from source to get this benefit? If not, do I have to reinstall R after installing this package?
Are there any other packages that are similar to this in that they can speed things up by just installing them?
The libatlas3gf-base package will already help over the default "reference BLAS", but you can (if you care) do better by building Atlas locally. That is in a way the whole point of Atlas, as the A and T stand for Automatically Tuned.
Now, keep in mind that a) rebuilding the package is not as trivial as just installing the base package, and b) you were quite right in pointing to certain calculations: your net time spent in R will only rarely be bound by the linear algebra operations that you can accelerate here. So for me, just installing atlas-base is usually good enough on my Ubuntu and Debian systems.
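To see whether your own workload falls into that category, a quick before-and-after timing of a BLAS-bound operation is enough. A minimal sketch (the matrix size is an arbitrary choice):

#--time a BLAS-bound operation; run once before and once after
#--installing libatlas3gf-base, then compare
n <- 2000                          # arbitrary, but large enough to be BLAS-bound
X <- matrix(rnorm(n * n), n, n)
system.time(X %*% X)               # dense matrix product dispatches to BLAS dgemm

If the timings barely move, your code is not linear-algebra bound and an optimised BLAS will not help much.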
In Debian, there are some compiled R packages in the official repositories, but one could also install an R package from source. I am interested to know why a user would prefer one method of installation over the other.
It is sometimes preferable to compile the sources on your own machine rather than use a prebuilt binary. The compiler builds the binary specifically for your machine, so it may run faster and work better: for instance, the compiler knows which processor you have and can optimise for it.
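In R's case, such processor-specific optimisation is usually steered through compiler flags in ~/.R/Makevars, which R CMD INSTALL picks up when building source packages. A minimal sketch; the exact flags are illustrative, not a recommendation:

#--in ~/.R/Makevars (flags are illustrative)
#---march=native lets the compiler optimise for the local processor
CFLAGS = -O2 -march=native
CXXFLAGS = -O2 -march=native
FFLAGS = -O2 -march=native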
I already provided a somewhat detailed answer in response to this SO question.
As an update, these days you even have lots of packages prebuilt thanks to updated cran2deb initiatives:
On Ubuntu you now have almost all CRAN packages prebuilt via Michael Rutter's 'cran2deb for ubuntu' PPA on Launchpad.
For Debian, Don Armstrong now provides a similar service (also covering Bioconductor and OmegaHat) at debian-r.debian.net.
The idea of pre-compiled R packages for Debian/Ubuntu is borrowed from Windows and macOS. Those OSes have pre-compiled packages since they typically don't have the standard tools in standard locations for building packages from source (C and Fortran compilers, LaTeX, Perl, etc.).
If there is a new release of a package on CRAN, is the pre-compiled package in the Debian repos automatically updated? Not necessarily, so I believe you are better off syncing with CRAN directly. Check out the package ctv to help you manage large collections of R packages ("CRAN task views"), both for installing and updating; a short usage sketch follows below.
You need root privileges to install a pre-compiled package from the OS repos, while any regular user may install packages using install.packages() in R (though I recommend running sudo R, if you are the sysadmin, when installing CRAN task views, so as to make them available system-wide instead of inflating your ~/).
One inconvenience of source packages is that if you fetch many of them, compilation adds noticeable installation time (depending on your machine). You might gain performance from compiling, but the gain is not guaranteed to be noticeable.
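Following the ctv suggestion above, a minimal sketch (the "Spatial" view name is just an example; run under sudo R for a system-wide install):

#--install a whole CRAN task view; the view name is an example
install.packages("ctv")
library(ctv)
install.views("Spatial")     # installs every package in the view
update.views("Spatial")      # later: adds what is missing, updates the rest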
I have R installed on one Linux computer where a number of packages are installed. Now I am setting up R on another Linux computer. Installing R is easy from the distribution's repository, but I will have to install many packages using
install.packages("pkgname")
which will involve repeated downloading as well. Is there any way I can copy all the installed packages from the first computer to the second one? Thanks for your help.
I would recommend against this approach. Some of those packages will have been installed from source, which involves compile-time checks based on what is installed on "computer one", and that will not necessarily hold on the other computer.
You have two basic choices:
Use binary packages (i.e. r-cran-pkgname for various packages). These will work, but a) not all of CRAN exists that way and b) they may lag the current release.
Install from source. Save the list of installed packages on the first computer, transfer the file, and reinstall on the second:
#--run on the first computer
saveRDS(installed.packages(), file="/tmp/pkgs.rds")
#--run on the second computer, after transferring the file
pkgs <- readRDS("/tmp/pkgs.rds")
install.packages(rownames(pkgs))
I found that using one of ATLAS/MKL/OpenBLAS in place of the reference BLAS gives a speed improvement in R. However, will it also improve R packages written in C or C++?
For example, the R package glmnet is implemented in Fortran and the R package rpart is implemented in C++. Will just installing an optimised BLAS improve their execution time, or do we have to rebuild the packages (building new C code) against it?
It is frequently stated, including in a comment here, that "you have to recompile R" to use a different BLAS or LAPACK library. That is wrong.
You do not have to recompile R, provided it was built against the shared-library versions of BLAS and LAPACK.
I have a package and vignette on CRAN which uses this fact to provide a benchmarking framework in which different BLAS and LAPACK versions are timed against each other simply by installing different ones (one command on Debian/Ubuntu) and running the benchmarks -- this is so straightforward that it can be automated in a package such as that one.
The results in that package will give you an idea of the possible speed differences. Exactly how they pan out depends on your computer, your data (size), your problem, etc. But if, say, your problem uses LAPACK functions which benefit from running multithreaded, then installing OpenBLAS may help. That is true for any R package using LAPACK, as they all use the same LAPACK installation accessed through R, and that installation can be changed.
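As a quick check that a swap actually took effect, recent R versions report which shared libraries they loaded, and an LAPACK-bound timing makes comparisons easy. A minimal sketch (assuming R >= 3.4, which added the BLAS/LAPACK lines in sessionInfo() and La_library()):

sessionInfo()        # the "BLAS:" and "LAPACK:" lines name the loaded libraries
La_library()         # path of the LAPACK shared library (R >= 3.4)
#--an LAPACK-bound operation to time under each library
X <- matrix(rnorm(1000 * 1000), 1000, 1000)
system.time(svd(X))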
For over a year now, I've been afraid to update my version of R for fear of losing the "rgdal" package, which, when I first started working with R on my Mac (and maybe still?), had to be installed from source and could not be installed via the package installer from within R. That was complicated, and I had to seek help from a more experienced R user in order to get this package, which is critical for me.
But I've finally decided that I need to take the risk and update R. I used the following instructions:
#--run in the old version of R
setwd("C:/Temp/")
packages <- installed.packages()[,"Package"]
save(packages, file="Rpackages")
Followed by this in the new version:
#--run in the new version
setwd("C:/Temp/")
load("Rpackages")
for (p in setdiff(packages, installed.packages()[,"Package"]))
install.packages(p)
found in this post:
Painless way to install a new version of R?
This seems to have worked (e.g. I can open up libraries that aren't included in the base install...including rgdal) but I have the following questions in order to better understand this whole process:
1) Is it the case that in following this approach, I essentially saved a list of all the R packages I had previously installed in my old version, then (from within the new version) determined the SET of packages that differed from the set of base libraries in the new version and told R to install packages belonging to this SET?
2) If the above is true, then does this approach negate the need to update my packages after (e.g. by installing them from within the new version, the newest versions are installed)?
3) The other approach that seems to be common (and that is recommended in one of the responses to the post above) is to set things up so that all packages get saved to a directory outside of R, and then change the settings (in the .Renviron file, or whatever the appropriate file is) to always look for packages in this external directory. I'm wondering why this approach is favoured by some people. Is it because after updating R everything is just ready to go (if one is willing to work with un-updated packages)? I'm confused because, if one still has to run update.packages() after installing a new version of R, doesn't this more or less amount to re-installing them? What are the advantages?
4) Are there packages that I need to worry about if I do go the install route (and not the save-to-external-directory-then-update route)? R did give me a warning indicating that four of my packages are not available for the latest version (R 3.0.3). I'm assuming that if I need to use these packages, I must temporarily revert to an older version of R. Is this all correct?
Thank you in advance for the help!
1) Yes.
2) The newly installed packages will be at the latest version, but there is no harm in running update.packages(ask=FALSE, checkBuilt=TRUE) afterward just to be sure everything is up to date.
3) I've not seen anyone suggest this particular approach, though a similar process is suggested in the R for Windows FAQ: 1) install the new version, 2) copy the packages from the library folder of the old version to the library folder of the new version, and 3) run update.packages(checkBuilt=TRUE, ask=FALSE). I don't think there are any clear advantages to doing it this way versus the installed.packages()... way (see the sketch below for what the external-directory setup amounts to).
4) All methods under discussion are limited to CRAN packages, so things that you installed from e.g. Bioconductor will not be covered. Packages removed from CRAN will not be covered either; for those you will have to use an older version of R or build and install them manually.
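For completeness, the external-directory setup from question 3 boils down to one line in ~/.Renviron pointing every R version at the same shared library tree (the path here is illustrative), plus the update.packages() call above after each upgrade:

#--in ~/.Renviron; path is illustrative
R_LIBS_USER=~/R/library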
The new R 3.0.0 requires that all contributed packages be reinstalled. Two questions:
Does this also mean that software that calls R, e.g. rApache, needs to be recompiled after R has been upgraded?
Are the new builds backward compatible? E.g. if an r-cran-xxx package has been compiled using r-base-dev 3.0.0, can this package be loaded in R 2.15? Or do we need to distribute separate binary packages for R 2.15 and R 3.0.0?
This is really a question for r-devel or, as you use our Debian/Ubuntu package terminology, r-sig-debian.
In short:
Question 1 is AFAICT a no. The R C API did not change. The design of rApache heavily influenced our much smaller littler r, and r runs just fine on my Ubuntu box after installing R 3.0.0 from Michael's builds based on my packages, even though littler (see r --version) was built against R 2.15.2.
Question 2 is a no, and that is no change: R always moves "forward in time", not backwards, just as we needed package rebuilds when namespaces were added and when the help format changed. So if you have N different R versions with M different package ABIs, you may need M library trees and have to manage your .libPaths(). Nothing new here.
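A minimal sketch of keeping one library tree per R series and prepending it at startup, e.g. from ~/.Rprofile (the directory layout is a hypothetical convention, not a standard):

#--one library tree per R major.minor series, e.g. ~/Rlibs/3.0
ver <- sub("\\.\\d+$", "", as.character(getRversion()))   # "3.0.0" -> "3.0"
libdir <- file.path(path.expand("~"), "Rlibs", ver)
dir.create(libdir, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(libdir, .libPaths()))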