I'm setting up a git workflow for my R project using packrat. Every time I packrat::snapshot() my workspace, the packrat.lock file changes with the new packages/versions, etc., but it also changes the Hash line for each package, which is a bit annoying when checking file diffs to see what changed from one commit to another.
Is this Hash really necessary? If not, is there any way to disable it?
The hash is generated by the hidden hash() function in the packrat library, and it serves as a package consistency check.
The algorithm generates an md5sum based on the DESCRIPTION file included in the package tarball, but there is additional logic involved; see lines #103-#107 in the packrat/R/cache.R source on GitHub.
To obtain the Hash value that packrat expects to find in the packrat.lock file, a direct call to the hash() function must be made. This function is not exposed in the compiled package, so the only way to access it is to use the packrat source.
Obtain a copy of the source of the packrat library from CRAN, matching your installed version
Extract it into a folder (in my example it is packrat-0.5.0)
Start an R session
The following lines demonstrate how to generate the hash for the package BH-1.66.0-1 (4cc8883584b955ed01f38f68bc03af6d):
# md5sum() function is needed
library(tools)
# relevant source code files are loaded
source('packrat-0.5.0/R/utils.R') # readDcf() function
source('packrat-0.5.0/R/cache.R') # packrat's hash() function
# execute the hash() function on the DESCRIPTION file in the package
print(hash('/usr/local/lib/R/site-library/BH/DESCRIPTION'))
This should return the expected hash, 4cc8883584b955ed01f38f68bc03af6d.
I am not aware of any option in packrat that would allow you to disable hash checking. If your goal is to manually modify the packrat.lock file to alter a package version, it is certainly possible by performing this trick; see the sketch after the list below.
This could help overcome some minor dependency issues. However, there are two dangers:
such a package version change may start a cascade of dependency upgrade requirements
errors may appear in your app because of compatibility issues
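For illustration, each package in packrat.lock is recorded as a plain DCF block; the entry for the BH example above would look roughly like this (field layout as written by packrat 0.5.0). If you bump Version by hand, the Hash field must be replaced with the value computed by hash() as shown earlier:
Package: BH
Source: CRAN
Version: 1.66.0-1
Hash: 4cc8883584b955ed01f38f68bc03af6d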
I cannot fully restore one package from an renv lock file, but I am able to install a different version of this package. So I wonder if I can manually overwrite the package version in the lock file. Do I just need to replace the version number? Should I change the hash as well? What are the consequences?
You can -- renv.lock is just JSON, so you can modify it as needed if you need to tweak a particular entry. (Or, you can use renv::record("<package>@<version>") to explicitly update the lockfile using renv APIs.)
If you're changing entries in renv.lock, you should normally remove the Hash component for modified entries. The hash is used for caching; it allows renv::restore() to restore a package from the global renv cache if available, thereby avoiding a retrieve + build + install of the package.
If it is not set, then renv will not use the cache and instead always try to retrieve the package from the declared source (which seems appropriate for your case).
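For instance (the package name and version below are only placeholders), you can record the pin through the renv API instead of hand-editing the JSON:
# Pin a specific version in renv.lock via the renv API (placeholder package/version)
renv::record("somepkg@1.2.3")

# If you edit renv.lock by hand instead, the modified entry with its Hash field
# removed might look like this (structure sketched from a typical renv.lock):
# "somepkg": {
#   "Package": "somepkg",
#   "Version": "1.2.3",
#   "Source": "Repository",
#   "Repository": "CRAN"
# }
# With no Hash recorded, renv::restore() will skip the cache and fetch the
# package from the declared source.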
Is there any documentation on manually installing a package into a user library when the R.home() path is locked down and incomplete (no etc, no bin, just library)? The system does NOT support shelling out to execute R CMD, which I believe standard R does.
I would like to build existing source packages (from CRAN) and install into a user library directory, so that I can use the library() function and get all the usual namespace and *.Rdx and *.Rdb files.
At the moment, I'm plodding through the install.packages, tools::.build_package, and tools:::.install.packages source, using a standard macOS R and the R source. Hopefully this has been documented in a more user-friendly fashion and my Google searches have missed it.
Thanks.
You don't need to use a different install.packages method; rather, you only need to specify a writable location for storing packages and give it precedence over the system default one. A simple way to accomplish this is to set an R_LIBS environment variable. For instance, in my .bashrc I have
export R_LIBS='/home/username/.local/lib/R-3.3.3'
Then, by default, all packages are installed there. Further, when a package is installed both there and in the system-wide location, the copy in the user library takes priority when loading.
You can verify that the location is being used by checking .libPaths() in your R session.
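A quick check from an R session that the user library is in effect (the path shown is just the example value from above and will differ on your system):
# The user library should be listed first
.libPaths()
# e.g. "/home/username/.local/lib/R-3.3.3" followed by the system library

# install.packages() then defaults to .libPaths()[1], i.e. the user library
install.packages("data.table")   # any package; it lands in the user library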
I have a few older R projects I'm working with, which depend on several currently deprecated (or heavily modified) packages. In order for everything to work smoothly I use older versions of those packages, which I have saved in another folder and copy manually into %userprofile%\documents\R\win-library\3.3 when necessary. However, this is not convenient, especially if I want to run multiple projects simultaneously, some of which require the new and updated versions of the packages.
My question - is there a way to specify a custom directory per .Rproj from which it would take and load the packages?
You can solve this much more simply:
Have a top-level directory for each project; call them projA, projB, ...
Within each of these, create a directory libs/, say.
And within each of these directories have a file .Rprofile with a single assignment such as .libPaths("./libs")
Now when you start R in the different project directories, each will have a separate library directory preceding the path, allowing you to place per-project overrides there.
In a nutshell, the approach outlined here allows you to keep the local and modified packages around as you please. (You can even assign common directories via .libPaths() if you so choose.)
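A minimal sketch of that setup from within R, run once from the project's top-level directory (the package name is only a placeholder):
# One-time setup inside, say, projA/
dir.create("libs", showWarnings = FALSE)
writeLines('.libPaths("./libs")', ".Rprofile")

# In a fresh R session started in projA/, the project library now comes first
.libPaths()
install.packages("somepackage", lib = "./libs")   # placeholder package name
library(somepackage)                              # found in ./libs before the system library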
The nice thing is that this:
works with any R invocation, batch or GUI or RStudio or Shiny or ...
does not depend on any other tools, and hence
does not rely on RStudio or .Rproj files -- though you are free to use RStudio as well.
As so often, Base R is there for you.
One option is to use the checkpoint package by Revolution Analytics.
You can indicate, for each main R file in a project, the date for which you wish to load a set of packages. You can read a bit more about it here.
To list the package snapshot dates available on the mirror, use getValidSnapshots(mranRootUrl = mranUrl()).
To create a checkpoint:
# Create temporary project and set working directory
example_project <- paste0("~/checkpoint_example_project_", Sys.Date())
dir.create(example_project, recursive = TRUE)
oldwd <- setwd(example_project)
# Write dummy code file to project
cat("library(MASS)", "library(foreach)",
sep="\n",
file="checkpoint_example_code.R")
# Create a checkpoint by specifying a snapshot date
library(checkpoint)
checkpoint("2014-09-17")
# Check that CRAN mirror is set to MRAN snapshot
getOption("repos")
# Check that library path is set to ~/.checkpoint
.libPaths()
# Check which packages are installed in checkpoint library
installed.packages()
# cleanup
unlink(example_project, recursive = TRUE)
setwd(oldwd)
I'd like to profile functions in an installed R package (data.table) using Rprof() with line.profiling=TRUE. Normally, installed packages are byte compiled, and line numbers are not available for byte compiled packages. The usual instructions for line profiling with Rprof() require using source() or eval(parse()) so that srcref attributes are present.
How can I load data.table so that line numbers are active? My naive attempt to first load the package with library(data.table) and then source('data.table.R') fails because some of the compiled C functions are not found when I attempt to use the package, presumably because library() is using a different namespace. Maybe there is some way to source() into the correct namespace?
Alternatively, perhaps I can build a modified version of data.table that is not byte compiled, and then load that in a way that keeps line numbers? What alterations would I have to make, and how would I then load it? I started by setting ByteCompile: FALSE and then trying R CMD INSTALL -l ~/R/lib --build data.table, but this still seems to be byte compiled.
I'm eager to make this work and will pursue any suggestions. I'm running R 3.2.1 on Linux, have full control over the machine, and can install anything else that is required.
Edit:
A more complete description of the problem I was trying to solve (and the solution for it) is here: https://github.com/Rdatatable/data.table/issues/1249
I ended up doing essentially what Joshua suggested: recompile the package with "KeepSource: TRUE" in the DESCRIPTION. For my purposes, I also found "ByteCompile: FALSE" to be helpful, although this might not apply generally. I also changed the version number so I could see that I was using my modified version.
Then I installed to a different location with "R CMD INSTALL data.table -l ~/R/lib", and loaded with "library(data.table, lib='~/R/lib')". When used with the patches given in the link, I got the line numbers of the allocations as I desired. But if anyone knows a solution that doesn't require recompilation, I'm sure that others would appreciate if you shared.
You should be able to get line numbers even if the package is byte-compiled. But, as it says in ?Rprof (emphasis added):
Individual statements will be recorded in the profile log if
line.profiling is TRUE, and if the code being executed was
parsed with source references. See parse for a discussion of
source references. By default the statement locations are not
shown in summaryRprof, but see that help page for options to
enable the display.
That means you need to set KeepSource: TRUE either in the DESCRIPTION file or via the --with-keep.source argument to R CMD INSTALL.
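With the package reinstalled that way, collecting a line-level profile looks roughly like this (the output file name and the profiled expression are only placeholders):
library(data.table)   # the build installed with source references kept
Rprof("profile.out", line.profiling = TRUE)
DT <- data.table(x = 1:1e6)   # placeholder workload
DT[, y := x * 2]
Rprof(NULL)
summaryRprof("profile.out", lines = "show")   # include statement locations in the summary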
Probably a pretty basic question, but a friend and I tried to run str(package_name) and R threw us an error. Now that I'm looking at it, I'm wondering if an R package is like a .zip file in that it is a collection of objects, say pictures and songs, but not a picture or song itself.
If I tried to open a zip of pictures with an image viewer, it wouldn't know what to do until I unzipped it - just like I can't call str(forecast) but I can call str(ts) once I've loaded the forecast package into my library...
Can anyone set me straight?
R packages are generally distributed as compressed bundles of files. They can either be in "binary" form, preprocessed at a repository to compile any C or Fortran source and create the proper headers, or in source form, where the various required files are available to be used in the installation process; the latter requires that users have the necessary compilers and tools installed at locations where the R build process, using OS system resources, can get at them.
If you read the documentation for a package at CRAN you see they are distributed in set of compressed formats that vary depending on the OS-targets:
Package source: Rcpp_0.11.3.tar.gz # the Linux/UNIX targets
Windows binaries: r-devel: Rcpp_0.11.3.zip, r-release: Rcpp_0.11.3.zip, r-oldrel: Rcpp_0.11.3.zip
OS X Snow Leopard binaries: r-release: Rcpp_0.11.3.tgz, r-oldrel: Rcpp_0.11.3.tgz
OS X Mavericks binaries: r-release: Rcpp_0.11.3.tgz
Old sources: Rcpp archive # not really a file but a web link
Once installed an R package will have a specified directory structure. The DESCRIPTION file is a text file with specific entries for components that determine whether the local installation meets the dependencies of the package. There are NAMESPACE, LICENSE, and INDEX files. There are directories named '/help', '/html', '/Meta', '/R', and possibly '/libs', '/demo', '/data', '/unitTests', and others.
This is the tree at the top of the ../library/Rcpp package directory:
$ ls
CITATION NAMESPACE THANKS examples libs
DESCRIPTION NEWS.Rd announce help prompt
INDEX R discovery html skeleton
Meta README doc include unitTests
So in the "life-cycle" of a package, there will be initially a series of required and optional files, which then get processed by the BUILD and CHECK mechanisms into an installed package, which than then get compressed for distribution, and later unpacked into a specified directory tree on the users machine. See these help pages:
?.libPaths # also describes .Library
?package.skeleton
?install.packages
?INSTALL
And of course read Writing R Extensions, a document that ships with every installation of R.
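If you want to poke at that structure yourself, something along these lines works for any installed package (Rcpp is used only because it is the example above):
# Locate an installed package and list its top-level files and directories
pkg_dir <- find.package("Rcpp")
list.files(pkg_dir)

# DESCRIPTION is plain text in DCF format and can be read directly
read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = c("Package", "Version", "Depends"))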
Your question is:
What type of object is an R package?
Somehow, I’m still missing an answer to this exact question. So here goes:
As far as R is concerned, an R package is not an object. That is, it’s not an object in R’s type system. R is being a bit difficult, because it allows you to write
library(pkg_name)
without requiring you to define pkg_name anywhere beforehand. In contrast, other objects which you use in R have to be defined somewhere – either by you, or by some package that's loaded either explicitly or implicitly.
This is unfortunate, and confuses people. Therefore, when you see library(pkg_name), think
library('pkg_name')
That is, imagine the package name in quotes. This does in fact work just as expected. The fact that the code also works without quotes is a peculiarity of the library function, known as non-standard evaluation. In this case, it’s mostly an unfortunate design decision (but there are reasons).
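A small illustration of the point, using a base package so it is always available:
library(stats)                        # bare name: library() quotes it for you (non-standard evaluation)
library("stats")                      # quoted string: exactly equivalent
pkg <- "stats"
library(pkg, character.only = TRUE)   # required if the package name is stored in a variable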
So, to repeat the answer: a package isn't a type of R object.¹ For R, it's simply a name which refers to a known location in the file system, similar to what you've assumed. BondedDust's answer goes into detail to explain that structure, so I shan't repeat it here.
¹ For super technical details, see Joshua's and Richard's comments below.
From R's own documentation:
Packages provide a mechanism for loading optional code, data and
documentation as needed.…A package is a directory of files which
extend R, a source package (the master files of a package), or a
tarball containing the files of a source package, or an installed
package, the result of running R CMD INSTALL on a source package. On
some platforms (notably OS X and Windows) there are also binary
packages, a zip file or tarball containing the files of an installed
package which can be unpacked rather than installing from sources. A
package is not a library.
So yes, a package is not the functions within it; it is a mechanism that lets R use the functions or data which comprise the package. Thus, it needs to be loaded first.
I am reading Hadley's book Advanced R (Chapter 6.3, Functions, p. 79), and I think this quote covers it:
Every operation is a function call
“To understand computations in R, two slogans are helpful:
Everything that exists is an object.
Everything that happens is a function call."
— John Chambers
According to that, using library(name_of_library) is a function call that will load the package. Every bit that has been loaded, i.e. functions or data sets, is an object which you can use by calling other functions. In that sense, a package is not an object in any of R's environments until it is loaded; then you can say that it is a collection of the objects it contains, which are now loaded.
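To see this in a session (assuming the forecast package mentioned in the question is installed):
library(forecast)                    # loading attaches the package's environment to the search path
"package:forecast" %in% search()     # TRUE once the package is attached
head(ls("package:forecast"))         # some of the objects (functions, data) it provides
str(forecast::ets)                   # str() works on one of those objects, not on the package itself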