We manage the R packages on our cluster via Puppet, and we have created a file with commands like the one below. We have an internal mirror of the R package repository.
install.packages("BH", repos=NULL, dependencies=TRUE, contriburl="http://our_internal_repo.com")
This is in rPackages.txt.
Via Puppet we execute it with Rscript rPackages.txt.
Next week we will get 3 more packages, so we will modify rPackages.txt to include additional lines for the new packages.
Since the script is read from start to end, it will try to reinstall all the packages.
My question is: how do I install a package only if the installed version is not the same as the one present in our internal repo?
How do I do those checks in the Rscript and have Puppet execute accordingly?
What are the best practices for managing R installations?
Thanks
Change the Rscript to
if("BH" %in% rownames(installed.packages()) == FALSE) {install.packages("BH")}
as for the version, you could probably use packageVersion somehow.
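For example, a minimal sketch of such a check, assuming the internal mirror exposes a PACKAGES index so that available.packages() can read it (the URL is the placeholder from the question):
repo <- "http://our_internal_repo.com"              # placeholder internal contrib URL
avail <- available.packages(contriburl = repo)      # versions offered by the mirror
pkg <- "BH"
needs_install <- !(pkg %in% rownames(installed.packages())) ||
  packageVersion(pkg) != package_version(avail[pkg, "Version"])
if (needs_install) {
  install.packages(pkg, repos = NULL, contriburl = repo, dependencies = TRUE)
}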
I am trying to run R code in Jupyter, and the R kernel has been added. Most of the time, packages can be installed successfully. However, some packages, such as RCurl and ggmap, throw an error while installing.
Example:
install.packages("RCurl")
Warning message in install.packages("RCurl"):
“installation of package ‘RCurl’ had non-zero exit status”
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
What should I do?
Try specifying CRAN as the repository in your install.packages() statement when installing RCurl and ggmap. For example:
install.packages("RCurl", repos='http://cran.us.r-project.org')
This Stack Overflow post on installing R packages through Anaconda/Jupyter beyond those included in r-essentials provides more detail.
(Side note: I had encountered the same issue when trying to install R packages on computer clusters. This solution worked for me.)
Use the conda command:
conda install r-RCurl
I kept getting the non-zero exit status when trying to install packages in a Jupyter notebook with the R kernel, and installation kept failing because of the multiple dependencies of the package I wanted. I am not an expert in any of these, so please forgive me if I make an error in explaining or if this is a non-issue for you, but feel free to comment to clear things up. I just want to share my success story in the hope that it can help someone else. I am working on a MacBook Pro. Here is the information I get when I print R.version in my Jupyter notebook with the R kernel:
$platform 'x86_64-conda_cos6-linux-gnu'
$arch 'x86_64'
$os 'linux-gnu'
$system 'x86_64, linux-gnu'
$language 'R'
$version.string 'R version 3.6.1 (2019-07-05)'
These are the steps to take to fix the issue:
Go to https://anaconda.org/
Search the package name that you are trying to install
Copy the one line that is given to install the package; it should be something like:
conda install -c r r-caret    # conda install -c r r-package_name
NOTE: sometimes during package installation you are asked whether or not you want to continue, so add -y at the end of the above statement to answer yes automatically, like this:
conda install -c r r-caret -y
(I will always add it just to be on the safe side)
Click on the new launcher (+ icon) to create a new notebook with PySpark (once opened it has the .ipynb extension)
In the first cell, paste the line you copied above and run it
Once done, restart the kernel on the current notebook
Restart the kernel on your other notebook with R kernel
Run library(package_name) in your notebook with the R kernel (e.g. library(caret))
install.packages("Hmisc", .libPaths(), repos='http://cran.us.r-project.org')
This command will install the package into the conda R library
"/home/user/anaconda3/lib/R/library" and use the CRAN repository as the source.
Add path in Anaconda
As per this answer,
one can also add additional paths in Anaconda to load libraries from (e.g., the location where RStudio saves user-installed packages) with
.libPaths( c( .libPaths(), "~/userLibrary") )
For example, the following worked for me:
In Anaconda :
.libPaths( c( .libPaths(), "C:\\Users\\name\\Documents\\R\\win-library\\3.5") )
When I tried to add Anaconda's library path to RStudio, it resulted in errors after installing a package ("The procedure entry point MARK_NOT_MUTABLE could not be located in the dynamic link library", which came up 4 times in succession), though the package seemed to load.
Replace name with your local user folder name
Add/change path in RStudio
A useful link for making changes to the default user-installed library path:
https://www.accelebrate.com/library/how-to-articles/r-rstudio-library
To find out where a package has been installed:
find.package('package_name')
None of the directions anyone else supplied worked for me, but I found this guide, and it did. I spent way too much time trying all of these when I just needed a few simple commands: https://developers.refinitiv.com/en/article-catalog/article/setup-jupyter-notebook-r
I already had R and Python installed, so I skipped to step 3. The guide only seems to mention Windows, but it worked for me on Mac as well. After following the steps I was able to install the packages using install.packages("dplyr", repos = "http://cran.us.r-project.org") in a cell in Jupyter.
You have to create a directory in which your packages will live and then do, for example:
install.packages('ggplot2', lib='your directory')
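For instance, a small sketch (the directory path is just a placeholder):
mylib <- "/path/to/my_r_library"                          # placeholder directory
dir.create(mylib, recursive = TRUE, showWarnings = FALSE)
install.packages("ggplot2", lib = mylib, repos = "http://cran.us.r-project.org")
library(ggplot2, lib.loc = mylib)                         # load it from that directory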
First step: install the IRkernel package by running the following command in an R console: install.packages('IRkernel')
Second step: make Jupyter see the newly installed R kernel by installing a kernel spec. To install system-wide, set user = FALSE in the installspec() call: IRkernel::installspec(user = FALSE)
Setup Jupyter Notebook for R
I have to perform a remote R installation on an Ubuntu 16.10 system. As part of this, I have to install specific packages on that host. I want to install these packages: Rcmdr, list, ggplot2, afex, lsmeans. Since I am doing this remotely, I cannot use
sudo -i R
to first enter the R CLI and then install with install.packages(). Instead I must somehow install the packages from the Ubuntu CLI.
I found these links:
multiple R package installation with install.packages()
R CMD INSTALL -l usage syntax to install multiple packages (section 6.3)
use of the repos parameter inside install.packages()
However, some packages have dependencies:
The list package depends on utils and sandwich.
The Rcmdr package depends on grDevices, utils, splines, RcmdrMisc, car.
The ggplot2 package also has dependencies.
I would like to install only the packages Rcmdr,list,ggplot2 with all their dependencies. Normally, I would do it this way:
install.packages(c('Rcmdr','list','ggplot2'), dependencies=TRUE)
QUESTIONS
How do I specify the dependencies option in R CMD INSTALL for one package only? Is this the way to install them:
R CMD INSTALL -l Rcmdr dependencies=TRUE, list dependencies=TRUE, \
ggplot2 dependencies=TRUE, afex, lsmeans
or is this incorrect?
Also, how do I specify the repos parameter with R CMD INSTALL -l?
EDIT
As per the first comment below, sudo is not needed above, i.e. sudo -i R can be replaced by R.
Regarding your questions:
Question 1
This may not be the best approach. Consider instead Rscript -e 'install.packages(...)' which is what R CMD INSTALL ... calls anyway. You have better control over options here. And read on...
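For example, a single non-interactive call along these lines installs all five packages from the question together with their dependencies (a sketch; use whichever mirror you prefer):
Rscript -e 'install.packages(c("Rcmdr", "list", "ggplot2", "afex", "lsmeans"), dependencies=TRUE, repos="https://cloud.r-project.org")'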
Question 2
On all Ubuntu machines at work and at home I do this in /etc/R/Rprofile.site via something like
## Example of Rprofile.site
local({
r <- getOption("repos")
r["CRAN"] <- "https://cloud.r-project.org"
r["ghrr"] <- "https://ghrr.github.io/drat"
options(repos = r)
})
where we usually add a third and network-local repo. You may just want CRAN here -- it uses the 'always-close to you' CDN administered by RStudio for the R Project and R Consortium. The ghrr drat is a helper repo I set up.
Question 3
sudo is not needed per something I add to the official Debian/Ubuntu package for R -- but you need to be a member of the group that owns /usr/local/lib/R/site-library.
Now, if I may, two more suggestions:
Littler
The r executable is available to you via sudo apt-get install r-cran-littler. I use it on the command-line; and you probably want to look into the basic install.r script and the extended install2.r. I tend to create a softlink from /usr/local/bin to the package directory for these and other (such as update.r). I have been running many (Ubuntu and Debian) machines like that for many years.
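For illustration, one way to locate the scripts shipped with the littler package before symlinking them (a sketch; the exact path on your system may differ):
# in R: print where littler's example scripts, such as install2.r, were installed
system.file("examples", "install2.r", package = "littler")
# then, in a shell, symlink that file (and install.r, update.r, ...) into /usr/local/bin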
Michael Rutter repos, and Docker
We actually have about 3000 CRAN packages as binaries for Ubuntu, so you could just do sudo apt-get install ... and all dependencies would get resolved. Look, e.g., at this script of mine (which uses them on Travis) or some of the Docker files I maintain, such as this one.
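For instance, with those repositories enabled, the packages from the question could be installed directly with apt (a sketch; the r-cran-* names follow the Debian/Ubuntu naming convention):
sudo apt-get install r-cran-rcmdr r-cran-ggplot2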
I currently have testthat installed on my machine. I want to install the package and all its dependencies (recursively) into a separate library. The problem is that when I try to do this using install.packages("testthat", lib = "newdir"), its dependencies, such as xml2, are not installed along with it. How can I install a package and all its dependencies into a new library?
I would do the following:
use install2.r from littler with its -l argument for the target library (I do things like this all the time for reverse-dependency checks)
maybe use a properly set/reset .libPaths() so that the current installation you are doing does not "see" the existing installations; worst case, you make a copy of install2.r and set/reset .libPaths() there; you may need to experiment with Rscript vs r to launch it, as r gets some values "baked in" during its compilation (a small sketch of this follows below)
Taken together it is basically what we do when we keep a separate R-devel on a box as well.
Edit: You can of course script this with install2.r -- it is just a wrapper to install.packages(). But it happens to be setting the relevant arguments.
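For illustration, a minimal sketch of the .libPaths() idea in plain R (newdir is a placeholder; R's own base library remains visible, which is what you want):
old <- .libPaths()                                  # remember the current search path
dir.create("newdir", showWarnings = FALSE)
.libPaths("newdir")                                 # drop the existing user library so deps are not "seen"
install.packages("testthat", lib = "newdir",
                 repos = "https://cloud.r-project.org")
.libPaths(old)                                      # restore the original search path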
I am trying to implement a reducer for Hadoop Streaming using R. However, I need to figure out a way to access certain libraries that are not built into R, such as dplyr. Based on my research, it seems there are two approaches:
(1) In the reducer code, install the required libraries into a temporary folder that will be disposed of when the session is done, like this:
.libPaths(c(.libPaths(), temp <- tempdir()))
install.packages("dplyr", lib=temp, repos='http://cran.us.r-project.org')
library(dplyr)
...
However, this approach has a dramatic overhead depending on how many libraries you are trying to install, so most of the time will be wasted on installing libraries (sophisticated libraries like dplyr have tons of dependencies, which take minutes to install in a vanilla R session).
So it sounds like I need to install them beforehand, which leads us to approach (2).
(2) My cluster is fairly big, and I have to use some tool like Ansible to make it work, so I prefer to have one Linux shell command to install the libraries. I have seen R CMD INSTALL... before; however, it feels like it will only install packages from a source file instead of doing what install.packages() does in the R console: figure out the mirror, pull the source file, and install it in one command.
Can anyone show me how to use one command line in the shell to non-interactively install an R package?
(Sorry for this much background; if anyone thinks I am not even following the right philosophy, feel free to comment on how R packages on a cluster like this should be managed.)
tl;dr
Rscript -e 'install.packages("drat", repos="https://cloud.r-project.org")'
You mentioned you are trying to install dplyr into a custom lib location on your disk. Be aware that the dplyr package does not support that. You can read more in dplyr#4641.
Moreover, if you are installing a private package published in an internal CRAN-like repository (created by drat or tools::write_PACKAGES), you can easily combine repositories in the repos argument and resolve dependencies from CRAN automatically.
Rscript -e 'install.packages("priv.pkg", repos=c("cran.priv","https://cloud.r-project.org"))'
This is a very handy feature of R repositories, although for production use I would recommend caching packages from CRAN locally and using those, so you will never be surprised by breaking changes in your dependencies. For quality information about handling R in production, I suggest the talk by Wit Jakuczun at WhyR2019, "How to make R great for machine learning in (not only) Enterprise": slides, video.
You may find littler useful. It is a command-line front-end / variant of R (which uses the R-embedding interface).
I use the install.r script all the time to install packages from the shell. There is a second variant with more command-line argument parsing, but it has an added dependency.
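For instance (a sketch, assuming littler's example scripts are symlinked onto your PATH):
install.r dplyr    # non-interactive install of dplyr from the configured repos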
I have built an R package, i.e. I have the mypackage.tar.gz file. This package depends on several other packages, all downloadable and installable from any CRAN mirror.
Now I want to install this package on a system where the dependencies are not yet installed, and I would like the dependencies to be downloaded and installed automatically when I install my package.
I tried:
install.packages("mypackage.tar.gz",type="source",dependencies=TRUE,repos="http://a.cran.mirror")
but it searches for mypackage.tar.gz on the mirror (and obviously does not find it), while if I set repos=NULL it correctly tries to install the local package file (as documented), but then it does not find the dependency packages.
So my question is: is there a way to perform a 'mixed' installation (local package with online dependencies), or is the only way to manually install all the dependencies first?
You could use install from the devtools package. Just run install("<directory of your package>", dependencies = TRUE). Its help states:
Uses R CMD INSTALL to install the package. Will also try to install dependencies of the package from CRAN, if they're not already installed.
Here, I'm using untar() with devtools::install() and passing in a directory to which the source tarball has been extracted.
d <- tempdir()
untar("mypackage.tar.gz", compressed="gzip", exdir=d)
devtools::install(file.path(d, "mypackage"), dependencies=TRUE,
repos="https://cloud.r-project.org/")
If you want to install from multiple repos, you can provide a list of them. For example, to use both Bioconductor and CRAN, you could run:
devtools::install(file.path(d, "mypackage"), dependencies=TRUE,
repos=BiocManager::repositories())
NOTE: I can't figure out how to directly pass the tarball to install(), but this solution works in the meantime and leaves no clutter because we extract to a temp directory. It seems install_local() should be able to take a tarball, but I am getting an error when attempting to do so.
If you already have installed your local package, you should be able to use a couple functions in tools to install the dependencies from CRAN:
library('tools')
installFoundDepends(pkgDepends('mypackage', local = FALSE)$Found)
Note: You can pass args (like repos) through installFoundDepends to install.packages.
You can also use the Depends element from the pkgDepends output to pass directly to install.packages:
install.packages(pkgDepends('mypackage')$Depends)
UPDATE: Apparently it is not possible to install a local package with dependencies=TRUE. This seems odd, since you can do that for a remote package from a repository. The reason (looking at the source code) is that if(is.null(repos) & missing(contriburl)), installation is handled via system calls to R CMD INSTALL, which has no dependency-related arguments.
Such an old question and so many answers, but unfortunately none of them presents the canonical way to address the problem.
R was designed to handle situations like this; no extra packages are needed. One has to create a local repository and then use it, together with the CRAN URL, as a repository source when installing.
Below is code that presents the complete process.
## double check our dependency is not yet installed
## remove.packages("data.table")
"data.table" %in% rownames(installed.packages())
#[1] FALSE
## create our pkg
hello = function() "world"
package.skeleton(name="pkg", list="hello")
#...
cat("Imports: data.table\n", file="pkg/DESCRIPTION", append=TRUE)
unlink(c("pkg/Read-and-delete-me", "pkg/man"), recursive=TRUE)
rm(hello)
## publish our pkg in current working directory
system("R CMD build pkg")
#...
dir.create("src/contrib", recursive=TRUE)
file.rename("pkg_1.0.tar.gz", "src/contrib/pkg_1.0.tar.gz")
#[1] TRUE
tools::write_PACKAGES("src/contrib")
## install pkg and its dependencies automatically
install.packages("pkg", repos=c(
paste0("file://", getwd()),
"https://cloud.r-project.org"
))
#Installing package into '/home/jan/R/x86_64-pc-linux-gnu-library/4.2'
#(as 'lib' is unspecified)
#also installing the dependency 'data.table'
#...
## test
library(pkg)
hello()
#[1] "world
"data.table" %in% rownames(installed.packages())
#[1] TRUE
On Windows one may need to specify type="source" and amend the paths.
If you are not opposed to using another package that manages this for you, this can nowadays be easily achieved with the {remotes} package.
install.packages("remotes")
remotes::install_local("mypackage.tar.gz")
You can specify further options for which dependencies you want (e.g. also those in 'Suggests'), etc.:
?remotes::install_local
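For example, a sketch that also pulls in the 'Suggests' dependencies from CRAN (the repos value here is an assumption; point it at whatever mirror you use):
remotes::install_local("mypackage.tar.gz", dependencies = TRUE,
                       repos = "https://cloud.r-project.org")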
{remotes} itself does not have dependencies afaik, so it does not add too much clutter to your environment.