When I install packages in R, sometimes devtools::install_github() is used and other times install.packages().
What is the essential difference between them?
R's official package repository is CRAN (Comprehensive R Archive Network). Publishing a package there is subject to strict checks, and packages hosted there are installed via install.packages(). For the most part, binary packages (as opposed to source code, which has not been compiled yet) are available, so no additional tools need to be present for installation (see next paragraph).
GitHub is one of many web services that host code repositories, including R code. An author can upload a package there and, if everything is in place, users can install it from source via devtools::install_github(). Installing from source means you need a proper toolchain installed (and also a LaTeX distribution). On Windows, this means Rtools. Linux-based operating systems usually ship with most of the necessary tools.
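As a rough illustration, the R sketch below reports which package type install.packages() fetches by default on the current machine and whether common build tools are visible on the PATH, which is what matters for a source install such as install_github(). The two install calls are commented out because they need internet access; zoo and klmr/modules are only example names, and checking for make and gcc is an assumption about what a typical toolchain provides.

```r
# Package type that install.packages() fetches by default on this machine:
# "source" on Linux, a binary type (e.g. "win.binary") on Windows.
pkg_type <- getOption("pkgType")
print(pkg_type)

# Build tools needed for source installs; an empty string means "not found".
build_tools <- Sys.which(c("make", "gcc"))
print(build_tools)

# The two install routes (commented out: both need internet access).
# install.packages("zoo")                    # from CRAN
# devtools::install_github("klmr/modules")   # from GitHub, built from source
```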
We have an Ubuntu Linux server in our office that sits in an air-gapped environment. It has no access to any external network.
However, I would like to install a few R packages such as ggplot2, DatabaseConnector, dplyr, tidyverse, etc. I have more than 10-15 packages to download.
Since I cannot run the usual command install.packages("DatabaseConnector"), I have to download the zipped files from CRAN as shown here.
I am new to R. So, can you help me with my questions given below?
a) Why are there no files for Linux systems? I only see Windows binaries and macOS binaries. Which one should I download?
b) Should I download the binaries or the package source? Which one is easier to install?
c) When I download packages as zipped files from CRAN as shown here, will the dependencies be downloaded automatically as well? Or should I look at the error messages and keep downloading them one by one?
d) Since I work in an air-gapped environment, what would be the best way to do this efficiently?
Under Linux, packages are always installed from source. CRAN offers no official binary packages for Linux. However, your distro might offer some of them in its official repositories. Ubuntu does. However, these tend to be quite old versions and are usually limited to a handful of the most important packages. So, for Linux you have to download the source packages. The zip files are for Windows and will not work.
You will also need to download all of the dependencies of the packages. For something like tidyverse this will be a huge number. Tracking those by hand is a lot of work. The easiest way is probably to use a package like miniCRAN outside of your air-gapped system to build a selective copy of CRAN. You specify the packages you want and miniCRAN downloads all of their dependencies. You can then copy the downloaded directories to your server, point install.packages at the copied repository and install as usual using install.packages. For details see https://andrie.github.io/miniCRAN/articles/miniCRAN-introduction.html.
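A minimal sketch of that workflow, assuming miniCRAN is installed on the online machine. The helper names build_local_repo and install_from_local_repo, the mirror URL and the paths are illustrative choices, not something the miniCRAN documentation prescribes.

```r
# Run on a machine WITH internet access. Requires the miniCRAN package.
build_local_repo <- function(pkgs, path, cran = "https://cloud.r-project.org") {
  # Resolve the requested packages plus all of their dependencies.
  pkg_list <- miniCRAN::pkgDep(pkgs, repos = cran, type = "source",
                               suggests = FALSE)
  # Download the source tarballs into a CRAN-like folder structure.
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  miniCRAN::makeRepo(pkg_list, path = path, repos = cran, type = "source")
  invisible(pkg_list)
}

# Run on the air-gapped server after copying the folder across.
install_from_local_repo <- function(pkgs, path) {
  repo_url <- paste0("file://", normalizePath(path))
  install.packages(pkgs, repos = repo_url, type = "source")
}

# Example calls (package names and paths are placeholders):
# build_local_repo(c("ggplot2", "dplyr"), "~/miniCRAN")
# install_from_local_repo(c("ggplot2", "dplyr"), "/opt/miniCRAN")
```

The file:// URL form shown here is meant for Linux paths; system libraries needed to compile the packages still have to be present on the server.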
You might also run into the problem that your system does not have all of the system dependencies needed to build all of the packages. Under Ubuntu, for example, you need to install libxml2-dev to be able to install the XML package. For that you need to use Ubuntu's package manager. How to do that on an air-gapped system is another issue.
So the thing is my IT department is scared of R and therefore only allows us to use it on a laptop that can't go online. Therefore they are the ones to install packages on it.
But due to covid I'm at home, discovering that they didn't install the requested packages. So I'm looking for a way to download these packages on a laptop with internet access, copy them to a USB stick and then install them from the USB stick in R.
Quite easy, just download the package from the CRAN site e.g. https://cran.r-project.org/web/packages/imputeTS/
You can download either the "Package source" or the "Binaries" (which must match your operating system).
If you are using RStudio, there is even a menu item for installing the package: just select the file you downloaded and you are done.
If you are not using RStudio, just provide the path to your downloaded archive in the install.packages command.
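For the non-RStudio route, a small sketch, assuming the downloaded file is a source tarball (.tar.gz). The helper name install_local_archive and the path in the example call are placeholders. A Windows binary (.zip) would need a different type, and source packages with compiled code need a build toolchain.

```r
install_local_archive <- function(path) {
  # Fail early if the archive is not where we expect it.
  stopifnot(file.exists(path))
  # repos = NULL tells R to treat `path` as a local file, not a package name.
  install.packages(path, repos = NULL, type = "source")
}

# Example call (placeholder path on the USB stick):
# install_local_archive("E:/imputeTS.tar.gz")
```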
The only problem is that you also need all the dependencies ... ;)
So ideally you already have them and just one package is missing. Otherwise downloading all the dependencies can get quite time-consuming, because the dependencies usually have dependencies of their own.
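To see how quickly such a chain grows, base R's tools::package_dependencies() can resolve dependencies recursively. The sketch below runs offline on a toy index with made-up package names (pkgA to pkgD); with internet access the same call works on the real index from available.packages().

```r
# Toy index with hypothetical package names, shaped like a slice of the
# matrix returned by available.packages().
db <- matrix(
  c("pkgA", "pkgB, pkgC",
    "pkgB", "pkgD",
    "pkgC", NA,
    "pkgD", NA),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("Package", "Imports"))
)

# recursive = TRUE follows the dependencies of dependencies.
deps <- tools::package_dependencies("pkgA", db = db, which = "Imports",
                                    recursive = TRUE)
print(deps)

# With internet access, the real CRAN index can be used instead, for example:
# tools::package_dependencies("imputeTS", db = available.packages(),
#                             recursive = TRUE)
```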
In this case the miniCRAN solution Roland linked in the comments might be an idea (Offline installation of a list of packages: getting dependencies in order). I haven't tried miniCRAN myself yet, though, and would be interested to hear how well it actually works.
I am facing a particularly vexing problem with R package development. My own package, called ggstatsplot (https://github.com/IndrajeetPatil/ggstatsplot), depends on userfriendlyscience, which depends on another package called MBESS, which itself ultimately depends on another package called gsl. There is no problem at all for installation of ggstatsplot on a Windows machine (as assessed by AppVeyor continuous integration platform: https://ci.appveyor.com/project/IndrajeetPatil/ggstatsplot).
But whenever the package is to be installed on Unix machines, it throws the error that ggstatsplot can't be downloaded because userfriendlyscience and MBESS can't be downloaded because gsl can't be downloaded. The same thing is also revealed on Travis continuous integration platform with virtual Unix machines, where the package build fails (https://travis-ci.org/IndrajeetPatil/ggstatsplot).
Now one way to solve this problem for the user on the Unix machine is to configure GSL (as described here: installing R gsl package on Mac), but I can't possibly expect every user of ggstatsplot to go through the arduous process of configuring GSL. I want them to just run install.packages("ggstatsplot") and be done with it.
So I would really appreciate if anyone can offer me any helpful advice as to how I can make my package user's life simpler by removing this problem at its source. Is there something I should include in the package itself that will take care of this on behalf of the user?
This may not have a satisfying solution via changes to your R package (I'm not sure either way). If the gsl package authors (which include a former R Core member) didn't configure it to avoid a pre-req installation of a linux package, there's probably a good reason not to.
But it may be some consolation that most R+Linux users understand that some R packages first require installing the underlying Linux libraries (eg, through apt or dnf/yum).
Primary Issue: making it easy for the users to install
Try to be super clear on the GitHub readme and the CRAN INSTALL file. The gsl package has decent CRAN directions. This leads to the following bash code:
sudo apt-get install libgsl0-dev
The best example of clear (linux pre-req package) documentation I've seen is from the curl and sf packages. sf's CRAN page lists only the human names of the 3 libraries, but the GitHub page provides the exact bash commands for three major distribution branches. The curl package does this very well too (eg, CRAN and GitHub). For example, it provides the following explanation and bash code:
Installation from source on Linux requires libcurl. On Debian or Ubuntu use libcurl4-openssl-dev:
sudo apt-get install -y libcurl4-openssl-dev
Ideally your documentation would describe how to install the gsl Linux package on multiple distributions.
Disclaimer: I've never developed a package that directly requires a Linux package, but I use them a lot. In case more examples would help, this doc includes a script I use to install stuff on new Ubuntu machines. Some commands were stated explicitly in the package documentation; some had little or no documentation, and required research.
edit 2018-04-07:
I encountered my new favorite example: the sys package uses a config file to produce the following message in the R console. While installing 100+ packages on a new computer, it was nice to see this direct message, and not have to track down the R package and the documentation about its dependencies.
On Debian/Ubuntu this package requires AppArmor.
Please run: sudo apt-get install libapparmor-dev
Another good one is pdftools, that also uses a config file (and is also developed by Jeroen Ooms).
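Along the same lines, a README could offer users a small check to run before installing. This is only a sketch: the function name check_gsl is made up, and it assumes that finding the gsl-config helper on the PATH is a good enough sign that the GSL development files are installed.

```r
# Hypothetical pre-install check for the GSL system library.
check_gsl <- function() {
  # gsl-config is installed together with the GSL development files.
  found <- nzchar(unname(Sys.which("gsl-config")))
  if (!found) {
    message("GSL development files were not found.\n",
            "On Debian/Ubuntu, try: sudo apt-get install libgsl0-dev")
  }
  invisible(found)
}

check_gsl()
```

Unlike a config file that runs during installation, this has to be run by the user, so it complements clear documentation rather than replacing it.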
Secondary Issue: installing on Travis
The userfriendlyscience Travis config file apparently installs a lot of binaries directly (including gsl), unlike the current ggstatsplot version.
Alternatively, I'm more familiar with telling travis to install the linux package, as demonstrated by curl's config file. As a bonus, this probably more closely replicates what typical users do on their own machines.
addons:
  apt:
    packages:
      - libcurl4-openssl-dev
Follow up 2018-03-13: Indrajeet and I tweaked the travis file so it's working. Two sections were changed in the yaml file:
The libgsl0-dev entry was added under the packages section (similar to the libcurl4-openssl-dev entry above).
Packages were listed in the r_binary_packages section so they install as binaries. The build was timing out after 50 minutes, and now it's under 10 min. In this particular package, the r_binary_packages section was nested in the Linux part of the Travis matrix so it wouldn't interfere with his two OS X jobs on Travis.
I am currently running R on mac osx but am looking to purchase a linux server for more power. Is there any way that I can check for specific R packages whether they will also work on linux? (before, of course, I actually buy the server and try to install and run the given packages). Also, is there any way to determine if a given package would run on certain linux distributions but not others (e.g. Ubuntu vs. Debian)?
Assuming the package is on CRAN, go to the package's CRAN page, e.g. https://cran.r-project.org/package=zoo and then click on the link to the right of CRAN checks which in this example would be labelled zoo results. It would take you to this page: https://cran.r-project.org/web/checks/check_results_zoo.html showing the results of checking that package on various different platforms.
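The check-results address follows the pattern visible in the zoo example, so it can be built for any CRAN package name. A tiny sketch (the helper name cran_check_url is made up):

```r
# Build the URL of a package's CRAN check results page.
cran_check_url <- function(pkg) {
  sprintf("https://cran.r-project.org/web/checks/check_results_%s.html", pkg)
}

zoo_url <- cran_check_url("zoo")
print(zoo_url)

# browseURL(zoo_url)  # opens the page in a browser (needs internet access)
```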
If the package is not on CRAN but is on GitHub and the developer checks it with Travis-CI, you can view the checks by clicking on the Travis-CI icon. For example, the klmr modules package is not on CRAN (there is a CRAN package of the same name, but it is a different package); however, if you look at its GitHub home page at https://github.com/klmr/modules and click on the icon, which currently is black and green and reads build passing (but could read something else if changes to the package or to R break the tests), you will be taken to the Travis-CI tests at https://travis-ci.org/klmr/modules .
tl;dr Slightly opinion/personal experience based, but I would be surprised if there were any CRAN package that you couldn't get running on Linux.
In general Unix users tend to install packages from source: CRAN doesn't provide binaries, but source installation is usually painless. The package binaries that are available (the CRAN Linux page has links for Debian, Ubuntu, SUSE, and Red Hat) tend to focus on packages that have extra system-level dependencies (e.g. FFT libraries, or spatial data analysis libraries) where it's more of a nuisance to assemble the needed dependencies for a particular system.
From the CRAN repository policy:
Package authors should make all reasonable efforts to provide cross-platform portable code. Packages will not normally be accepted that do not run on at least two of the major R platforms [i.e. Windows, MacOS, Linux]. Cases for Windows-only packages will be considered, but CRAN may not be the most appropriate place to host them.
When a package fails to run on one of the three platforms, it's usually Windows. The only package I've ever had real trouble installing on Linux is R2OpenBUGS on 64-bit systems, because it requires installing a 32-bit toolchain.
I was trying to run code that required the R packages ‘pkgDepTools’ and ‘Rgraphviz’. I received error messages saying that neither package is available for R version 2.15.0.
A Google search turned up the following webpage RPM Pbone that seems to have the packages:
http://rpm.pbone.net/index.php3/stat/4/idpl/17802118/dir/mandrake_other/com/R-pkgDepTools-1.20.0-1-mdv2012.0.i586.rpm.html
and
http://rpm.pbone.net/index.php3/stat/4/idpl/17802080/dir/mandrake_other/com/R-Rgraphviz-1.32.0-2-mdv2012.0.i586.rpm.html
However, the files have an *.rpm extension rather than the *.tar.gz or *.zip extensions I am used to.
I am using Windows 7 and R version 2.15.0. Can I install an R package from an *.rpm file?
From Wikipedia *.rpm seems like maybe it is more for Linux:
http://en.wikipedia.org/wiki/RPM_Package_Manager
Regarding other possible solutions, I have found several earlier posts here with similar questions about installing R packages that are not available for the most recent version of R:
Bivariate Poisson Regression in R?
Package ‘GeneR’ is not available
R Venn Diagram package Venerable unavailable - alternative package?
I have installed the latest version of Rtools and the package 'devtools', although I know nothing about them.
There is an archived version of 'Rgraphviz' here:
http://cran.r-project.org/src/contrib/Archive/Rgraphviz/
but I cannot locate an archived version of 'pkgDepTools'.
If I can install the packages on a Windows machine using the above *.rpm files could someone please provide instructions?
If I must use Rtools to build them I might ask more questions because the instructions at the link below are challenging for me:
http://cran.r-project.org/doc/manuals/R-admin.html#Building-from-source
To be completely transparent I am hoping someone might build them for me, if that is possible. Although I recognize the experience and knowledge gained from doing it myself would probably pay off in the long run.
Thank you for any advice.
pkgDepTools and Rgraphviz are Bioconductor packages, not packages hosted on CRAN. Unless you configure R to download packages from the Bioconductor repositories, R will report that they are not available; it can only install from repositories it has been configured to use.
To install those Bioconductor packages, a lite installation method is provided:
source("http://bioconductor.org/biocLite.R")
biocLite(c("pkgDepTools", "Rgraphviz"))
Further details are provided on the Install page of the Bioconductor website.
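Note that Bioconductor has since retired the biocLite script in favour of the BiocManager package (for R 3.5 and later). The sketch below first prints the repositories the current session is configured to search, then wraps the BiocManager route in a helper; the name install_from_bioc is made up, and the example call is commented out because it needs internet access.

```r
# Repositories install.packages() searches in this session.
print(getOption("repos"))

# Install Bioconductor packages through BiocManager (needs internet access).
install_from_bioc <- function(pkgs) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}

# Example call:
# install_from_bioc("Rgraphviz")
```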
In general you can't use rpm packages on Windows; RPMs are the equivalent of a binary package for Linux. Any C/C++/Fortran/etc. code will have been compiled for Linux, not Windows. If a package really isn't available for your version of R, check whether a reason is stated on CRAN (usually Windows binaries take a few days longer to produce, or there may be requirements for software not available on the CRAN Windows build machines). You can try the WinBuilder service run by Uwe Ligges to build Windows binaries of packages for you, but if the package was on CRAN and now isn't, that suggests it no longer works with current R and cannot be built.
In general try a wider search for packages; the first hit in my Google search results under the search string "pkgDepTools" is the Bioconductor page for the package which includes a link to the Windows binary and instructions on how to install the package from within R.
I think this merits an answer rather than a comment.
A gentleman at Bioconductor helped me get Rgraphviz installed. The primary problem was that the version of Rgraphviz I had downloaded only seems to work with the 32-bit version of R and I was running a 64-bit version of R. I was able to install Rgraphviz in the 32-bit version of R.
I had also made an error or two in the PATH statement during some of my attempts to install Rgraphviz. However, the post above in my second comment provides the instructions for installation.
You just, it seems, cannot install the normal download version of Rgraphviz in the 64-bit version of R.
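To check which flavour of R a session is running before picking a download, something like the following can be used (a small sketch using base R only):

```r
# Architecture R was built for, e.g. "x86_64" or "i386".
arch <- R.version$arch

# Pointer size in bytes times 8 gives 32 or 64.
pointer_bits <- .Machine$sizeof.pointer * 8

cat("R architecture:", arch, "\n")
cat("Pointer width:", pointer_bits, "bit\n")
```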
I think many of our emails back and forth are now posted on the Bioconductor forum.
I might edit this answer with more detailed instructions in the next 24 hours.