getting lazy data without attaching package - r

Background:
I have a CRAN R package which has a dependency on lazy-loaded data in another CRAN package of a specific version. I need to avoid using :: to refer to the data, because it causes CRAN check to fail.
I've read:
Evaluate function within package environment without attaching package
and
See if a variable/function exists in a package?
I've tried (using nycflights13 for this example):
# this works, but I can't use ::
nycflights13::airlines
find("airlines")
# character(0)
get("airlines", envir = asNamespace("nycflights13"), mode = "list")
#Error in get("airlines", envir = asNamespace("nycflights13"), mode = "list") : object 'airlines' of mode 'list' was not found
# attach
library(nycflights13)
get("airlines", envir = asNamespace("nycflights13"), mode = "list")
# works
find("airlines")
# [1] "package:nycflights13"
This may make it even more complicated, but I actually want to refer to an active binding, which returns data which may or may not be available.
What I would like:
A CRAN compatible way of referring to lazy-loaded data in another package without using :: or Imports in DESCRIPTION.

My workaround was to export a getter function for the external package, for which I am also the author. This works because functions are visible, but lazy data and active bindings (which are set, in my case, in .onLoad()) are not.
Another possibility is to use the fact that :: is a command, so something like this is valid R, and with variable naming on the RHS, it would enable flexibility to query presence or absence of data in namespaces (not just environments on the search() path)
`::`(nycflights13, airlines)
:: just substitutes the given symbols for strings, and calls getExportedValue in base.
So, better still, and I think this is my final answer:
base::getExportedValue(asNamespace("nycflights13"), "airlines")
This works without any requireNamespace() or library().

Related

Why/how some packages define their functions in nameless environment?

In my code, I needed to check which package the function is defined from (in my case it was exprs(): I needed it from Biobase but it turned out to be overriden by rlang).
From this SO question, I thought I could use simply environmentName(environment(functionname)). But for exprs from Biobase that expression returned empty string:
environmentName(environment(exprs))
# [1] ""
After checking the structure of environment(exprs) I noticed that it has .Generic member which contains package name as an attribute:
environment(exprs)$.Generic
# [1] "exprs"
# attr(,"package")
# [1] "Biobase"
So, for now I made this helper function:
pkgparent <- function(functionObj) {
functionEnv <- environment(functionObj)
envName <- environmentName(functionEnv)
if (envName!="")
return(envName) else
return(attr(functionEnv$.Generic,'package'))
}
It does the job and correctly returns package name for the function if it is loaded, for example:
pkgparent(exprs)
# Error in environment(functionObj) : object 'exprs' not found
library(Biobase)
pkgparent(exprs)
# [1] "Biobase"
library(rlang)
# The following object is masked from ‘package:Biobase’:
# exprs
pkgparent(exprs)
# [1] "rlang"
But I still would like to learn how does it happen that for some packages their functions are defined in "unnamed" environment while others will look like <environment: namespace:packagename>.
What you’re seeing here is part of how S4 method dispatch works. In fact, .Generic is part of the R method dispatch mechanism.
The rlang package is a red herring, by the way: the issue presents itself purely due to Biobase’s use of S4.
But more generally your resolution strategy might fail in other situations, because there are other reasons (albeit rarely) why packages might define functions inside a separate environment. The reason for this is generally to define a closure over some variable.
For example, it’s generally impossible to modify variables defined inside a package at the namespace level, because the namespace gets locked when loaded. There are multiple ways to work around this. A simple way, if a package needs a stateful function, is to define this function inside an environment. For example, you could define a counter function that increases its count on each invocation as follows:
counter = local({
current = 0L
function () {
current <<- current + 1L
current
}
})
local defines an environment in which the function is wrapped.
To cope with this kind of situation, what you should do instead is to iterate over parent environments until you find a namespace environment. But there’s a simpler solution, because R already provides a function to find a namespace environment for a given environment (by performing said iteration):
pkgparent = function (fun) {
nsenv = topenv(environment(fun))
environmentName(nsenv)
}

loading multiple R packages stored as an object using pacman's p_load

I'd like to use pacman's p_load function. I've read through the documentation and understand how to pass multiple packages into the function directly. However, I'd like to store the package names separately and feed them into pacman so that I can use this same 'list' to later test that the packages have been loaded into the environment.
Based on the documentation, pacman is expecting a character vector. my first attempt was:
pkg_list <- as.vector(c("tidyverse", "forecast")
or
pkg_list <- "tidyverse, forecast"
followed by:
pacman::p_load(pkg_list)
or
pacman::p_load(pkg_list, character.only = FALSE)
Which all return the same error stating:
package ‘pkg_list’ is not available
Fine, it's obviously looking for a package called pkg_list instead of the contents of the object so I also tried using a list and unlisting it in the p_load statement, using eval, etc. but p_load always seems to evaluate whatever is input as a literal.
Just trying to better understand why and how to escape that literal evaluation if possible.

Function imported from dependency not found, requires library(dependency)

I am trying to create an R package that uses functions from another package (gamlss.tr).
The function I need from the dependency is gamlss.dist::TF (gamlss.dist is loaded alongside gamlss.tr), but it is referenced in my code as simply TF within a call to gamlss.tr::gen.trun.
When I load gamlss.tr manually with library(), this works. However, when I rely on the functions of the dependency automatically being imported by my package through #import, I get an "object not found" error as soon as TF is accessed.
My attempt to be more explicit and reference the function I need as gamlss.dist::TF resulted in a different error ("unexpected '::'").
Any tips on how to use this function in my package would be much appreciated!
The code below reproduces the problem if incorporated into a clean R package (as done in this .zip), built and loaded with document("/path/to/package"):
#' #import gamlss gamlss.tr gamlss.dist
NULL
#' Use GAMLSS
#'
#' Generate a truncated distribution and use it.
#' #export
use_gamlss <- function() {
print("gen.trun():")
gamlss.tr::gen.trun(par=0,family=TF)
#Error in inherits(object, "gamlss.family") : object 'TF' not found
#gamlss.tr::gen.trun(par=0,family=gamlss.dist::TF)
#Error in parse(text = fname) : <text>:1:1: unexpected '::'
y = rTFtr(1000,mu=10,sigma=5, nu=5)
print("trun():")
truncated_dist = gamlss.tr::trun(par=0,family=TF, local=TRUE)
model = gamlss(y~1, family=truncated_dist)
print(model)
}
use_gamlss() will only start working once a user calls library(gamlss.tr).
This is due to bad design of gamlss.tr in particular the trun.x functions (they take character vectors instead of family objects / they evaluate everything in the function environment instead of the calling environment).
To work around this, you have to make sure that gamlss.distr is in the search path of the execution environment of gamlss.tr functions (This is why ## import-ing it in your package does not help: it would need to be #' #import-ed in gamlss.tr).
This can be achieved by adding it to Depends: of your package.
If you want to avoid that attaching your package also attaches gamlss.distr, you could also add the following at the top of use_gamlss:
nsname <-"gamlss.dist"
attname <- paste0("package:", nsname)
if (!(attname %in% search())) {
attachNamespace(nsname)
on.exit(detach(attname, character.only = TRUE))
}
This would temporarily attach gamlss.dist if it is not attached already.
You can read more on namespaces in R in Hadley Wickham's "Advanced R"

How does the PACKAGE argument to .Call work?

.Call seems rather poorly documented; ?.Call gives an explanation of the PACKAGE argument:
PACKAGE: if supplied, confine the search for a character string .NAME to the DLL given by this argument (plus the conventional extension, ‘.so’, ‘.dll’, ...).
This argument follows ... and so its name cannot be abbreviated.
This is intended to add safety for packages, which can ensure by using this argument that no other package can override their external symbols, and also speeds up the search (see ‘Note’).
And in the Note:
If one of these functions is to be used frequently, do specify PACKAGE (to confine the search to a single DLL) or pass .NAME as one of the native symbol objects. Searching for symbols can take a long time, especially when many namespaces are loaded.
You may see PACKAGE = "base" for symbols linked into R. Do not use this in your own code: such symbols are not part of the API and may be changed without warning.
PACKAGE = "" used to be accepted (but was undocumented): it is now an error.
But there are no usage examples.
It's unclear how the PACKAGE argument works. For example, in answering this question, I thought the following should have worked, but it doesn't:
.Call(C_BinCount, x, breaks, TRUE, TRUE, PACKAGE = "graphics")
Instead this works:
.Call(graphics:::C_BinCount, x, breaks, TRUE, TRUE)
Is this simply because C_BinCount is unexported? I.e., if the internal code of hist.default had added PACKAGE = "graphics", this would have worked?
This seems simple but is really rare to find usage of this argument; none of the sources I found give more than passing mention (1, 2, 3, 4, 5)... Examples of this actually working would be appreciated (even if it's just citing code found in an existing package)
(for self-containment purposes, if you don't want to copy-paste code from the other question, here are x and breaks):
x = runif(100000000, 2.5, 2.6)
nB <- 99
delt <- 3/nB
fuzz <- 1e-7 * c(-delt, rep.int(delt, nB))
breaks <- seq(0, 3, by = delt) + fuzz
C_BinCount is an object of class "NativeSymbolInfo", rather than a character string naming a C-level function, hence PACKAGE (which "confine(s) the search for a character string .NAME") is not relevant. C_BinCount is made a symbol by its mention in useDynLib() in the graphics package NAMESPACE.
As an R symbol, C_BinCount's resolution is subject to the same rules as other symbols -- it's not exported from the NAMESPACE, so only accessible via graphics:::C_BinCount. And also, for that reason, off-limits for robust code development. Since the C entry point is imported as a symbol, it is not available as a character string, so .Call("C_BinCount", ...) will not work.
Using a NativeSymbolInfo object tells R where the C code is located, so there is no need to do so again via PACKAGE; the choice to use the symbol rather than character string is made by the package developer, and I think would generally be considered good practice. Many packages developed before the invention of NativeSymbolInfo use the PACKAGE argument, if I grep the Bioconductor source tree there are 4379 lines with .Call.*PACKAGE, e.g., here.
Additional information, including examples, is in Writing R Extensions section 1.5.4.

knowing functions and dataset available in a package without any manual and information (blackbox) R

I have a package which is beta version, I do not have manual neither helppage have a manual.
using following command, I can say that the package is loaded in current session.
(.packages())
When I search
data()
I can not see any data associated with. Is that mean that there is no dataset associate with it ? How can I know whether there are any functions?
function() # do not work.
To list all stuff,
ls("package:MASS", all = TRUE)
all = TRUE shows hidden objects (i.e., variable name beginning with ".")
To list all functions with formals,
lsf.str("package:MASS", all = TRUE)
To list all datasets with brief description
data(package = "MASS")$results
Just in case, to list all imports and exports of the namespace,
getNamespaceInfo("MASS", "imports")
From the help page for ?data
try(data(package = "rpart") ) # list the data sets in the rpart package
And this is one of the reasons why the uses of the name 'data' is deprecated.
try(fortunes::fortune("dog") )

Resources