How does the PACKAGE argument to .Call work?

.Call seems rather poorly documented; ?.Call gives an explanation of the PACKAGE argument:
PACKAGE: if supplied, confine the search for a character string .NAME to the DLL given by this argument (plus the conventional extension, ‘.so’, ‘.dll’, ...).
This argument follows ... and so its name cannot be abbreviated.
This is intended to add safety for packages, which can ensure by using this argument that no other package can override their external symbols, and also speeds up the search (see ‘Note’).
And in the Note:
If one of these functions is to be used frequently, do specify PACKAGE (to confine the search to a single DLL) or pass .NAME as one of the native symbol objects. Searching for symbols can take a long time, especially when many namespaces are loaded.
You may see PACKAGE = "base" for symbols linked into R. Do not use this in your own code: such symbols are not part of the API and may be changed without warning.
PACKAGE = "" used to be accepted (but was undocumented): it is now an error.
But there are no usage examples.
It's unclear how the PACKAGE argument works. For example, in answering this question, I thought the following should have worked, but it doesn't:
.Call(C_BinCount, x, breaks, TRUE, TRUE, PACKAGE = "graphics")
Instead this works:
.Call(graphics:::C_BinCount, x, breaks, TRUE, TRUE)
Is this simply because C_BinCount is unexported? I.e., if the internal code of hist.default had added PACKAGE = "graphics", this would have worked?
This seems simple, but usage of this argument is rare to find; none of the sources I found give more than a passing mention (1, 2, 3, 4, 5). Examples of this actually working would be appreciated (even if it's just citing code found in an existing package).
(for self-containment purposes, if you don't want to copy-paste code from the other question, here are x and breaks):
x = runif(100000000, 2.5, 2.6)
nB <- 99
delt <- 3/nB
fuzz <- 1e-7 * c(-delt, rep.int(delt, nB))
breaks <- seq(0, 3, by = delt) + fuzz

C_BinCount is an object of class "NativeSymbolInfo", rather than a character string naming a C-level function, hence PACKAGE (which "confine(s) the search for a character string .NAME") is not relevant. C_BinCount is made a symbol by its mention in useDynLib() in the graphics package NAMESPACE.
As an R symbol, C_BinCount's resolution is subject to the same rules as other symbols: it's not exported from the NAMESPACE, so it is only accessible via graphics:::C_BinCount, and for that reason it is off-limits for robust code development. Since the C entry point is imported as a symbol, it is not available as a character string, so .Call("C_BinCount", ...) will not work.
Using a NativeSymbolInfo object tells R where the C code is located, so there is no need to do so again via PACKAGE; the choice to use the symbol rather than a character string is made by the package developer, and I think it would generally be considered good practice. Many packages developed before the invention of NativeSymbolInfo use the PACKAGE argument; if I grep the Bioconductor source tree, there are 4379 lines matching .Call.*PACKAGE.
Additional information, including examples, is in Writing R Extensions section 1.5.4.
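A minimal sketch of the distinction described above, assuming an R installation in which the graphics namespace still defines the internal object C_BinCount (it is not part of the API and may change without notice). The small x and breaks vectors are made up for illustration and are not the ones from the question.

```r
# C_BinCount is an R object living in the graphics namespace, not a string.
sym <- graphics:::C_BinCount
is_native_symbol <- inherits(sym, "NativeSymbolInfo")
dll_name <- sym$dll[["name"]]  # the DLL the symbol was resolved in

# Illustrative data (an assumption, not from the question):
# four values falling into three bins.
x <- c(0.5, 1.5, 1.5, 2.5)
breaks <- c(0, 1, 2, 3)

# The symbol object already carries its DLL, so PACKAGE is not needed.
counts <- .Call(sym, x, breaks, TRUE, TRUE)

print(is_native_symbol)
print(dll_name)
print(counts)
```

For a package that passes .NAME as a character string instead, the call would take the form .Call("some_routine", arg, PACKAGE = "somepkg"), where both names are hypothetical placeholders.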

Related

How to see available parameters and documentation for a class method in R?

How can we see all available parameters (or view documentation more generally) for a class method?
For example, if we look at arguments for print()
?print
x
an object used to select a method.
...
further arguments passed to or from other methods.
quote
logical, indicating whether or not strings should be printed with surrounding quotes.
-- leaving others out for brevity --
useSource
logical indicating if internally stored source should be used for printing when present, e.g., if options(keep.source = TRUE) has been in use.
Note that we do not see any documentation for the parameter max_n.
Now suppose we call print() on something of class xml_nodes, e.g:
library(rvest)
library(dplyr)
# Generate an object of class xml_nodes
a <- rep("<p></p>", 30) %>%
paste0(collapse="") %>%
read_html %>%
html_nodes("p")
class(a)
# [1] "xml_nodeset"
a is of class xml_nodeset, and if we call print(a) it prints only 20 results, (I think) because the print method for the xml_nodeset class is configured to show only 20 results (the number can be changed via the max_n parameter).
How do we find the specific documentation for how print will behave when called on the object of class xml_nodeset? (preferably via RStudio/manuals)
Note that the example above is just a random example; I would like to find a general way of finding documentation for all class methods.
You can see all the "special" versions of print by running methods(print). These versions are typically of the form <function name>.<class name>. Many listed there have asterisks, which means they are not directly exported from the packages where they are defined. If they have documentation, you can access it via ?print.rle, for example. In this case there is no documentation for the print.xml_nodeset function. But you can look at its source with getAnywhere(print.xml_nodeset) or, if you happen to know it is from the xml2 namespace, with xml2:::print.xml_nodeset (with three colons).
There's also the sloop package which can tell you which S3 method will be called for a given invocation. For example
sloop::s3_dispatch(print(a))
=> print.xml_nodeset
* print.default
You could file an issue with the package maintainer asking to provide documentation for the function, but otherwise R can't really give you documentation if the author did not include it.
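As a rough sketch of the lookup described above, the following uses print.rle only because it ships with base R; the variable names are illustrative. Inspecting a method's formals is one way to see parameters that ?print does not list.

```r
# List the S3 methods registered for the print generic.
print_methods <- methods("print")

# Retrieve one method, whether or not it is exported.
m <- utils::getS3method("print", "rle")

# The method's own formals show parameters beyond those of the generic.
method_args <- names(formals(m))
print(method_args)
```

With xml2 installed, the same call with "xml_nodeset" in place of "rle" would let you inspect that method's arguments in the same way.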

getting lazy data without attaching package

Background:
I have a CRAN R package which has a dependency on lazy-loaded data in another CRAN package of a specific version. I need to avoid using :: to refer to the data, because it causes CRAN check to fail.
I've read:
Evaluate function within package environment without attaching package
and
See if a variable/function exists in a package?
I've tried (using nycflights13 for this example):
# this works, but I can't use ::
nycflights13::airlines
find("airlines")
# character(0)
get("airlines", envir = asNamespace("nycflights13"), mode = "list")
#Error in get("airlines", envir = asNamespace("nycflights13"), mode = "list") : object 'airlines' of mode 'list' was not found
# attach
library(nycflights13)
get("airlines", envir = asNamespace("nycflights13"), mode = "list")
# works
find("airlines")
# [1] "package:nycflights13"
This may make it even more complicated, but I actually want to refer to an active binding, which returns data which may or may not be available.
What I would like:
A CRAN compatible way of referring to lazy-loaded data in another package without using :: or Imports in DESCRIPTION.
My workaround was to export a getter function for the external package, for which I am also the author. This works because functions are visible, but lazy data and active bindings (which are set, in my case, in .onLoad()) are not.
Another possibility is to use the fact that :: is itself a function, so something like the following is valid R. With variable naming on the RHS, it would allow querying the presence or absence of data in namespaces (not just environments on the search() path):
`::`(nycflights13, airlines)
:: just substitutes the given symbols for strings, and calls getExportedValue in base.
So, better still, and I think this is my final answer:
base::getExportedValue(asNamespace("nycflights13"), "airlines")
This works without any requireNamespace() or library().
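A small sketch of how this could be wrapped up, using the datasets package (which ships with R) instead of nycflights13 so that it runs anywhere. The helper name get_pkg_data is hypothetical, and the requireNamespace() guard is an added design choice for the case where the other package is not installed.

```r
# Hypothetical helper: fetch an exported object or lazy-loaded dataset
# from a package namespace without attaching the package.
get_pkg_data <- function(pkg, name) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NULL)  # package not installed
  }
  tryCatch(
    getExportedValue(pkg, name),
    error = function(e) NULL  # no such export or dataset
  )
}

iris_df <- get_pkg_data("datasets", "iris")
missing_obj <- get_pkg_data("datasets", "no_such_dataset")
```

Returning NULL rather than raising an error makes it easy to test whether the data is available before using it.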

Using datasets in an R package

I am trying to get the latest version of my package (https://github.com/jmcurran/relSim) on CRAN. This has been rejected because of the use of a data set that is included in the package in a function which is not exported (i.e. the user cannot use it unless they use the ::: operator). A code snippet:
testIS = function(nc = c(3, 2), locus = 1, seed = 123456){
set.seed(seed)
np = 2 * nc[2]
freqs = USCaucs$freqs
The dataset is included in the package, and as per Hadley's advice I have LazyData: true in my DESCRIPTION file. However I get this note from https://win-builder.r-project.org which I don't know how to resolve.
* checking R code for possible problems ... [11s] NOTE
testIS: no visible binding for global variable 'USCaucs'
Undefined global functions or variables:
USCaucs
I find this especially frustrating, since, as I said, this function is not even exported (it also works without complaint because the package loads this dataset). All help appreciated
The solution appears to involve a little duplication. At the suggestion of Thomas Lumley, I placed the object in R/sysdata.rda as well as having it in data/USCaucs.rda. I followed Hadley Wickham's suggestion to use devtools::use_data with the argument internal set to TRUE so that it was saved in the correct manner for a package.
As noted, this solution involves duplicating the data. This isn't an issue for a small object such as the one I have here, but I'd like to think there is a more elegant solution out there.
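The duplication can be sketched in base R as follows. This is only an illustration under stated assumptions: the package directory is a throwaway folder under tempdir(), and the USCaucs object is a placeholder, not the real data set.

```r
# Throwaway package skeleton (hypothetical path) with R/ and data/ folders.
pkg_dir <- file.path(tempdir(), "relSimSketch")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "data"), showWarnings = FALSE)

# Placeholder object standing in for the real data set.
USCaucs <- list(freqs = list(c(0.2, 0.8)))

# Copy for users, made available through LazyData.
user_copy <- file.path(pkg_dir, "data", "USCaucs.rda")
save(USCaucs, file = user_copy)

# Internal copy, visible to the package's own functions.
internal_copy <- file.path(pkg_dir, "R", "sysdata.rda")
save(USCaucs, file = internal_copy)

# Check that the internal copy restores the same object.
e <- new.env()
load(internal_copy, envir = e)
```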

Documentation and code for functions that require a backtick ` to access in function form

For example, the math operators +, -, *, / are all defined as infix operators so that 1 + 3 can also be written as `+`(1, 3). (Further reading).
I know that you can sometimes get the documentation for these functions by using ?`+` or help(`+`).
However, this is not working for the distr package, which defines the above mathematical operators on random variables which are class objects. E.g.
library(distr)
Norm() * Norm()
I have tried things like help(`distr::*`) and help(distr::`*`). Interestingly if I try
library(dplyr)
help(`%>%`)
I get two links in the help window, one to the dplyr package and one to the magrittr package. I also do not know what syntax to use to access the help of dplyr::`%>%` directly.
Try the following:
library(distr)
?operators
This may also be of interest:
?"Math-methods"
methods?Math # same
and also try this to browse the distr package "-class" help files, keyword math help files and keyword arith help files:
help.search("class", package = "distr")
help.search("math", fields = "keyword", package = "distr")
help.search("arith", fields = "keyword", package = "distr")
If you want to browse all the help files for the distr package:
help(package = "distr")
You can also browse the source at https://github.com/cran/distr or http://distr.r-forge.r-project.org or download it from its CRAN home page https://cran.r-project.org/package=distr .
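For the general question of operator help, a brief sketch using only base R operators: passing the operator as a quoted string to help() avoids parsing problems, and the package argument restricts the lookup to one package.

```r
# Operators are help topics like any other when given as strings.
h_plus <- help("+")

# Restrict the lookup to a single package.
h_in <- help("%in%", package = "base")

# help() returns an object that is displayed when printed.
print(class(h_plus))
```

By the same pattern, help("%>%", package = "magrittr") should open the magrittr page directly, assuming that package is installed.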

R: How far does it go? (Plus venting)

I have an object called defaultPacks, containing the names of packages installed on all the computers I use. Much abbreviated:
defaultPacks <- c(
"AER",
"plyr",
"dplyr"
)
I want to save this object to file in a shared directory all of them can reach. I am using Dropbox for this, with sync always paused when R is running.
save(defaultPacks,
file = file.path("C:","Users","andrewH","Dropbox","R_PROJ","sharedSettings.rdata"))
Then I want to load the object and install the packages the names of which are in the object defaultPacks.
SyncPacks <- function(fileString){
defaultPacks <- load(file=fileString)
install.packages(defaultPacks, repos="http://cran.us.r-project.org")
}
SyncPacks(file.path("C:","Users","andrewH","Dropbox","R_PROJ","sharedSettings.rdata"))
If I do this, I get a warning:
Warning in install.packages: package ‘defaultPacks’ is not available (for R version 3.2.1)
I look at what is in defaultPacks immediately after I load and assign it: the string "defaultPacks". So it seems to be loading just a string rather than an object.
So I go back to my save, and try
save(get(defaultPacks), file.path(etc.))
This gives me a different error:
Error in save(get("defaultPacks"), file = file.path("C:", "Users", "andrewH", :
object ‘get("defaultPacks")’ not found.
Then I tried dynGet() with the same result.
So where before it was treating a symbol as a string, now it is treating a function as a string.
So I try the list option for save:
save(list = defaultPacks, file = file.path(etc))
And get yet another error:
Error in save(list = defaultPacks, file = file.path("C:", "Users", "andrewH", :
objects ‘AER’, ‘plyr’, ‘dplyr’, (etc.) not found
So where before I couldn't get to my character vector, now I am shooting right past it, evaluating defaultPacks to find the strings, and then treating each string as a symbol, and evaluating it to its (nonexistent) object.
So, I want to know how to make this work. But I am asking for something more than that. I have this problem, or an analogous problem, all the time. After several years of using R, I still have it a couple of times a week. I don't know how many steps of evaluation R is going to take on any given occasion. I hand a function an object name, and the function treats it as a string. I hand a function a string, and the R function converts it to a symbol and tries to evaluate it. Here, I don't understand why the save function does not save the object I gave it, and then give it back with load.
I've read the discussions on scoping in ten different R books, from Chambers "Software for Data Analysis" to Wickham's "Advanced R." Twice. Four times in some cases. I know about the four environments of a function, and the difference between the call stack and the chain of environmental parents. And yet, it is clear that I am missing something basic. It is not just that I don't know why save does not take a name in its ... argument and save it as an object (unless the problem is at the load end). I don't know how I can know. The function description says, of the ...s, "the names of the objects to be saved (as symbols or character strings)." So why is it saving a name as a string? Or why is load returning a string, if save saved an object? And how could I predict that?
Experienced R programmers, I know you can tell in advance how a given R function is going to treat one of its arguments. You know how far it will be evaluated. You can make it go as far as you want it to, and then STOP. You don't have to write str()'s into your functions every time you want to figure out what the heck it thinks its arguments mean. How do you do it?
Bloody "R Inferno". It's an understatement.
One way of seeing the problem is to note that the value of defaultPacks changes from before to after these operations.
> fname = tempfile()
> orig = defaultPacks = c("AER", "plyr", "dplyr")
> save(defaultPacks, file=fname)
> defaultPacks = load(fname)
> identical(orig, defaultPacks)
[1] FALSE
The problem starts with an understanding of what save() does. From ?save, the object that is saved is named defaultPacks and it has value c("AER", "plyr", "dplyr"). save() could save multiple objects, each with a name and associated value, so it somehow has to save the name of each object.
load() restores the objects that save() has written, and returns (from ?load) a "character vector of the names of objects created". In this case load() restores (creates in the global environment) the symbol defaultPacks, populates it with the character vector of default packages, and returns the name (i.e., character vector of length 1 "defaultPacks") of the object(s) it has restored. The return value then overwrites the restored value, and we have defaultPacks = "defaultPacks".
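The return value of load() can be seen without clobbering anything by loading into a fresh environment. A minimal sketch, using a temporary file rather than the Dropbox path from the question:

```r
fname <- tempfile(fileext = ".rdata")
defaultPacks <- c("AER", "plyr", "dplyr")
save(defaultPacks, file = fname)

# Load into a fresh environment so nothing in the workspace is overwritten.
e <- new.env()
loaded_names <- load(fname, envir = e)

# load() returned the object's name; the value sits in the environment.
restored <- get(loaded_names, envir = e)

print(loaded_names)
print(restored)
```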
install.packages doesn't do anything fancy with its first argument, which from ?install.packages is a "character vector of the names of packages whose current versions should be downloaded". The character vector happens to be the symbol defaultPacks, but the error comes from the value of the symbol, which is the character vector "defaultPacks".
save() and load() more or less have to work the way they do to support multiple objects. On the other hand saveRDS() and readRDS() (ok, why read instead of load?) have a contract to save a single object. The name of the saved object does not need to be stored to be able to recover the values associated with it. So saveRDS(defaultPacks, fname); defaultPacks = readRDS(fname) works, and in particular the value of defaultPacks before and after this series of operations remains unchanged.
> orig = defaultPacks = c("AER", "plyr", "dplyr")
> saveRDS(defaultPacks, fname)
> defaultPacks = readRDS(fname)
> identical(orig, defaultPacks)
[1] TRUE
Without meaning to be too much of a jerk, the answer to the question "Experienced R programmers...how do you do it?" is implied by the ? above: by carefully reading the manual. Also, there are not that many places in base R code where evaluation is non-standard (formulas and library are the main culprits), so recognizing what the problem is not can help to focus on what is actually going on.
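To make the non-standard evaluation concrete, here is a toy function (capture_names is a made-up name, not part of R) that captures its ... unevaluated and turns each argument into a string. This is a sketch of the general technique; printing save at the console shows how your R version actually handles its arguments.

```r
# Toy function: capture the arguments in ... without evaluating them.
capture_names <- function(...) {
  as.character(substitute(list(...)))[-1L]
}

defaultPacks <- c("AER", "plyr", "dplyr")

sym_name <- capture_names(defaultPacks)          # a symbol
str_name <- capture_names("defaultPacks")        # a string
call_name <- capture_names(get("defaultPacks"))  # a call, never evaluated

print(sym_name)
print(str_name)
print(call_name)
```

A function written this way never sees the value of defaultPacks, which is consistent with the error message in the question quoting get("defaultPacks") as the name of an object that was not found.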
