Changing the load order of files in an R package - r

I'm writing a package for R in which the exported functions are decorated by a higher-order function that adds error checking and some other boilerplate code.
However, because this code is at the top-level it is evaluated after parsing. These means that
the load order of the package files is important.
To give an equivalent but simplified example, suppose I have a package with two files (Negate2 and Utils), and I require Negate2.R to be loaded first for the function 'isfalse( )' to be defined without throwing an error.
# /Negate2.R
Negate2 <- Negate
# -------------------
# /Utils.R
istrue <- isTRUE
isfalse <- Negate2(istrue)
Is it possible to structure NAMESPACE, DESCRIPTION (collate) or another package file in order to change the load order of files? The internal working of the R package structure and CRAN are still black magic to me.
It is possible to get around this problem using awkward hacks, but the least repetitive way of solving this problem. The wrapper function must be a higher-order function, since it also changes the function call semantics of its input function. The package is code heavy (~6000 lines, 100 functions) so repetition would be...problematic.
Solution
As #Manetheran points out, to change the load order you just change the order of the file names in the DESCRIPTION file.
# /DESCRIPTION
Collate:
'Negate2.R'
'Utils.R'

The Collate: field of the DESCRIPTION file allows you to change the order files are loaded when the package is built.
I stumbled across the answer to this question yesterday while reading up on Roxygen. If you've been documenting your functions with Roxygen, it can try to intelligently order your R source files in the Collate: field (based on where S4 class and method definitions are). This can be done by adding "collate" to the roclets argument of roxygenize. Alternatively if you're developing in RStudio there is a simple box that can be checked under Build->Configure Build Tools->Configure... (Button next to "Generate documentation with Roxygen").

R loads files in alphabetical order. To change the order, Collate field could be used from the DESCRIPTION file.
roxygen2 provides an explicit way of saying that one file must be loaded before another: #include. The #include tag gives a space separated list of file names that should be loaded before the current file:
#' #include class-a.r
setClass("B", contains = "A")
If any #include tags are present in the package, roxygen2 will set the Collate field in the DESCRIPTION.
You need to run generation of roxygen2 documentation in order to changes to take effect.

Related

Read a package NAMESPACE file

I am looking for a fast R solution to read a package NAMESPACE file. The solution should contain already preprocessed (and aggregated) records and separated imports and exports.
Unfortunately I can’t use getNamespaceExports("dplyr")/getNamespaceImports("dplyr") as they need the package to be loaded to the R session, which is too slow.
I need a solution which simply process a text from the NAMESPACE file. Any solutions using external packages as well as partial solutions would still be welcome.
The raw data we could grabbed with a call like readLines("https://raw.githubusercontent.com/cran/dplyr/master/NAMESPACE"). roxygen2 generated files are formatted properly, but this will be not true for all manually generated files.
EDIT:
Thanks to Konrad R. answer I could develop such functionality in my new CRAN package - pacs. I recommended to check pacs::pac_namespace function. There is even one function which goes one step further, comparing NAMESPACE files between different package versions pacs::pac_comapre_namespace.
The function is included in R as base::parseNamespaceFile. Unfortunately the function does not directly take a path as an argument. Instead it constructs the path from a package name and the library location. However, armed with this knowledge you should be able to call it; e.g.:
parseNamespaceFile('dplyr', .libPaths()[1L])
EDIT
Somebody has to remember that the whole packages imports (like import(rlang)) have to be still invoked with the same function and the exports for them extracted. Two core elements are using parse on NAMESPACE code and then using the recursive extract function parseDirective.

how to use utils::globalVariables

Following your recommendations (or trying to do it, at least), I have tried some options, but the problem remains, so there must be something I am missing.
I have included a more complete code
setwd("C:/naapp")
#' #import utils
#' #import devtools
I have tried with and without using suppressForeignCheck
if(getRversion() >= "2.15.1"){
utils::globalVariables(c("eleven"))
utils::suppressForeignCheck(c("eleven"))
}
myFunctionSum <- function(X){print(X+eleven)}
myFunctionMul <- function(X){print(X*eleven)}
myFunction11 <- function(X){
assign("eleven",11,envir=environment(myFunctionMul))
}
maybe I should use a particular environment?
package.skeleton(name = "myPack11", list=ls(),
path = "C:/naapp", force = TRUE,
code_files = character())
I remove the "man" directory from the directory myPack11,
otherwise I would get an error because the help files are empty.
I add the imports utils, and devtools to the descrption
Then I run check
devtools::check("myPack11")
And I still get this note
#checking R code for possible problems ... NOTE
#myFunctionMul: no visible binding for global variable 'eleven'
#myFunctionSum: no visible binding for global variable 'eleven'
#Undefined global functions or variables:eleven
I have tried also to make an enviroment, combining Tomas Kalibera's suggetion and an example I found in the Internet.
myEnvir <- new.env()
myEnvir$eleven <- 11
etc
In this case, I get the same note, but with "myEnvir", instead of "eleven"
First version of the question
I trying to use "globalVariables" from the package utils. I am building an interface in R and I am planning to submit to CRAN. This is my first time, so, sorry if the question is very basic.I have read the help and I have tried to find examples, but I still don't know how to use it.
I have made a little silly example to ilustrate my question, which is:
Where do I have to place this line exactly?:
if(getRversion() >= "2.15.1"){utils::globalVariables("eleven")}
My example has three functions. myFunction11 creates the global variable "eleven" and the other two functions manipulate it. In my real code, I cannot use arguments in the functions that are called by means of a button. Consider that this is just a silly example to learn how to use globalVariables (to avoid binding notes).
myFunction11 <- function(){
assign("eleven",11,envir=environment(myFunctionSum))
}
myFunctionSum <- function(X){
print(X+eleven)
}
myFunctionMul <- function(X){
print(X*eleven)
}
Thank you in advance
I thought that the file globals.R would be automatically generated when using globalsVariables. The problem was that I needed to create the package skeleton, then create the file globals.R, add it to the R directory in the package and check the package.
So, I needed to place this in a different file:
#' #import utils
utils::globalVariables(c("eleven"))
and save it
The documentation clearly says:
## In the same source file (to remind you that you did it) add:
if(getRversion() >= "2.15.1") utils::globalVariables(c(".obj1", "obj2"))
so put it in the same source file as your functions. It can go in any of your R source files, but the comment above recommends you put it close to your code. Looking at a bunch of github packages reveals another common pattern is to have a globals.R function with it in, but this is probably a bad idea. If you later remove the global from your package but neglect to update globals.R you could mask a problem. Putting it right close to the functions that use it will hopefully remind you when you edit those functions.
Make sure you put it outside any function definitions in the file, or it won't get seen.
You cannot modify bindings in a package namespace once the package is loaded (and namespace sealed, and bindings locked). The check tool helps you to spot violations of this restriction, so you find out about the problem when checking the package rather than while running it. globalVariables is just a call to silence check when looking for these violations, which is undesirable in almost all cases. If you really need mutable state in a package, you can create a new environment (using new.env) and bind it to an (unexported) "global" variable in your namespace. This binding will be locked, but this is ok, because in R you can change an environment in place (add/remove elements, effectively modifying the elements).
The best situation is however when you can keep all mutable state in user objects (passed in as arguments into functions, and their modified versions returned as output values of functions).

How to call R script from another R script, both in same package?

I'm building a package that uses two main functions. One of the functions model.R requires a special type of simulation sim.R and a way to set up the results in a table table.R
In a sharable package, how do I call both the sim.R and table.R files from within model.R? I've tried source("sim.R") and source("R/sim.R") but that call doesn't work from within the package. Any ideas?
Should I just copy and paste the codes from sim.R and table.R into the model.R script instead?
Edit:
I have all the scripts in the R directory, the DESCRIPTION and NAMESPACE files are all set. I just have multiple scripts in the R directory. ~R/ has premodel.R model.R sim.R and table.R. I need the model.R script to use both sim.R and table.R functions... located in the same directory in the package (e.g. ~R/).
To elaborate on joran's point, when you build a package you don't need to source functions.
For example, imagine I want to make a package named TEST. I will begin by generating a directory (i.e. folder) named TEST. Within TEST I will create another folder name R, in that folder I will include all R script(s) containing the different functions in the package.
At a minimum you need to also include a DESCRIPTION and NAMESPACE file. A man (for help files) and tests (for unit tests) are also nice to include.
Making a package is pretty easy. Here is a blog with a straightforward introduction: http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
As others have pointed out you don't have to source R files in a package. The package loading mechanism will take care of losing the namespace and making all exported functions available. So usually you don't have to worry about any of this.
There are exceptions however. If you have multiple files with R code situations can arise where the order in which these files are processed matters. Often it doesn't matter or the default order used by R happens to be fine. If you find that there are some dependencies within your package that aren't resolved properly you may be faced with a situation where a custom processing order for the R files is required. The DESCRIPTION file offers the optional Collate field for this purpose. Simply list all your R files in the order they should be processed to satisfy the dependencies.
If all your files are in R directory, any function will be in memory after you do a package build or Load_All.
You may have issues if you have code in files that is not in a function tho.
R loads files in alphabetical order.
Usually, this is not a problem, because functions are evaluated when they are called for execution, not at loading time (id. a function can refer another function not yet defined, even in the same file).
But, if you have code outside a function in model.R, this code will be executed immediately at time of file loading, and your package build will fail usually with a
ERROR: lazy loading failed for package 'yourPackageName'
If this is the case, wrap the sparse code of model.R into a function so you can call it later, when the package has fully loaded, external library too.
If this piece of code is there for initialize some value, consider to use_data() to have R take care of load data into the environment for you.
If this piece of code is just interactive code written to test and implement the package itself, you should consider to put it elsewhere or wrap it to a function anyway.
if you really need that code to be executed at loading time or really have dependency to solve, then you must add the collate line into DESCRIPTION file, as already stated by Peter Humburg, to force R to load files order.
Roxygen2 can help you, put before your code
#' #include sim.R table.R
call roxygenize(), and collate line will be generate for you into the DESCRIPTION file.
But even doing that, external library you may depend are not yet loaded by the package, leading to failure again at build time.
In conclusion, you'd better don't leave code outside functions in a .R file if it's located inside a package.
Since you're building a package, the reason why you're having trouble accessing the other functions in your /R directory is because you need to first:
library(devtools)
document()
from within the working directory of your package. Now each function in your package should be accessible to any other function. Then, to finish up, do:
build()
install()
although it should be noted that a simple document() call will already be sufficient to solve your problem.
Make your functions global by defining them with <<- instead of <- and they will become available to any other script running in that environment.

Linking multiple files while creating a package in R

I am trying to create a package in R wherein I have created lots of new custom Classes. Each class is in a different file. The Classes inherit from a parent class and inherit to other classes.
While running my codes I call each of them like this
source("package/father.R")
source("package/son.R")
source("package/grandson.R")
Definition for Some of the methods needed by the grandson Class in defined in Son class. I use package.skeleton() to call each of them and create a package and it seems to work fine. But when running R CMD Check(and when trying install into R), it throws an error because the function tries to call the files in the alphabetical order and so the file grandson.R is called before son.R and it shows and error saying that the methods has not been defined. If I change the names to zgrandson.R, R called that file the last, and everything seems to work fine, but this is evidently not a solution for the problem.
I have read tutorials for creating packages, but all of them seem to deal with simple cases where there is no inheritance/calling other files in R. Hope I have made myself clear.
As far as I understand, you can use the Collate field in the DESCRIPTION file to control this.
Quoting from the Writing R Extensions manual:
An ‘Collate’ field can be used for controlling the collation order for
the R code files in a package when these are processed for package
installation. The default is to collate according to the ‘C’ locale.
If present, the collate specification must list all R code files in
the package (taking possible OS-specific subdirectories into account,
see Package subdirectories) as a whitespace separated list of file
paths relative to the R subdirectory. Paths containing white space or
quotes need to be quoted. An OS-specific collation field
(‘Collate.unix’ or ‘Collate.windows’) will be used instead of
‘Collate’.
So, you could specify:
Collate:
father.r
son.R
grandson.r
Or simply rename the files in such a way that lexicographic sorting order will result in the correct collation order, as you indicated in your question.
But also see this answer by #DirkEddelbuettel on a similar question.

Exporting data in Roxygen2 so that they are available without requiring data()

After reading questions such as this SO question on documenting a data set with Roxygen I have managed to document a dataset (which I will refer to as cells) and it now appears in the list generated by data(package="mypackage") and is loaded if I run the command data(cells). After this, cells will appear when ls() is run.
However, in many packages the data is immediately available without requiring a data() call. Also, the data names do not appear when ls() is run. An example is the baseball data set that comes with plyr. I have looked at the source for plyr and I cannot see how this is done.
In the DESCRIPTION file of your package make sure that there is a field called LazyData that is set to TRUE.
From the "Writing R Extensions" guide:
The ‘data’ subdirectory is for data files, either to be made available
via lazy-loading or for loading using data(). (The choice is made by
the ‘LazyData’ field in the ‘DESCRIPTION’ file: the default is not to
do so.)
I think the exact syntax changed with R version 2.14; before that it was LazyLoad not LazyData.

Resources