I am looking for a fast R solution to read a package NAMESPACE file. The solution should return already preprocessed (and aggregated) records, with imports and exports separated.
Unfortunately I can't use getNamespaceExports("dplyr")/getNamespaceImports("dplyr"), as they need the package to be loaded into the R session, which is too slow.
I need a solution which simply processes the text of the NAMESPACE file. Any solutions using external packages, as well as partial solutions, would still be welcome.
The raw data could be grabbed with a call like readLines("https://raw.githubusercontent.com/cran/dplyr/master/NAMESPACE"). roxygen2-generated files are formatted properly, but this will not be true for all manually edited files.
EDIT:
Thanks to Konrad R.'s answer I was able to develop such functionality in my new CRAN package, pacs. I recommend checking the pacs::pac_namespace function. There is even a function which goes one step further, comparing NAMESPACE files between different package versions: pacs::pac_compare_namespace.
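A quick usage sketch (argument defaults assumed; check the package documentation for the version/date options):

ns <- pacs::pac_namespace("dplyr")  # parsed NAMESPACE records, no package loading needed
str(ns)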
The function is included in R as base::parseNamespaceFile. Unfortunately the function does not directly take a path as an argument. Instead it constructs the path from a package name and the library location. However, armed with this knowledge you should be able to call it; e.g.:
parseNamespaceFile('dplyr', .libPaths()[1L])
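The return value is a list with one component per directive type; exports and imports are the two used below (the exact layout is worth confirming with str() on your own installation):

ns <- parseNamespaceFile('dplyr', .libPaths()[1L])
ns$exports  # character vector of exported names
ns$imports  # whole-package and selective imports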
EDIT
One has to remember that whole-package imports (like import(rlang)) still have to be resolved with the same function, and the exports extracted for them. The two core elements are running parse on the NAMESPACE code and then applying the recursive extraction function parseDirective.
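For a plain-text route without parseNamespaceFile, here is a minimal sketch (my own helper, not from base R) that handles only simple export()/import() directives; real NAMESPACE files can also contain exportPattern(), importFrom(), S3method() and conditional blocks, which is why base R's parseDirective is recursive:

read_ns <- function(path) {
  exprs <- parse(file = path, keep.source = FALSE)   # NAMESPACE is valid R code
  tag <- vapply(exprs, function(e) as.character(e[[1L]]), character(1))
  args <- function(e) vapply(as.list(e)[-1L], as.character, character(1))
  list(
    exports = unlist(lapply(exprs[tag == "export"], args)),
    imports = unlist(lapply(exprs[tag == "import"], args))
  )
}
read_ns("NAMESPACE")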
Related
I created a local package with personal functions to be used easily within R. One of these is meant to be used in the lidR package within a wrapper function (i.e. grid_metrics). For this reason I took the scheme of this script as a reference, exporting both the long name (e.g. my_metrics(param1, param2, ...)) and the lazy one (e.g. .my_metrics), because I really like its ease of use.
Nevertheless, if I load my package and then call the lazy function
library(mypackage)
test = grid_metrics(las, .my_metrics, 20)
it does not work, so I have to load the function into memory by running its code from the file. At that stage, I can use it in both forms.
Within the NAMESPACE file I can see that both forms are exported, so my last guess is that this might be related somehow to lazyeval, but I don't get how.
It turned out that the problem was related to the DESCRIPTION field in which the lidR package was listed: once I moved it from Imports to Depends, the issue was solved.
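The fix, sketched as a DESCRIPTION fragment (other fields omitted). Before, lidR sat under Imports:

Imports: lidR

After, it sits under Depends, so loading my package also attaches lidR on the search path:

Depends: lidR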
I am trying to follow closely @hadley's book to learn best practices in writing R packages. And I was thrilled to read these lines about the philosophy of the book:
anything that can be automated, should be automated. Do as little as
possible by hand. Do as much as possible with functions.
So when I was reading about dependencies and the (sort of) confusing differences between import directives in the NAMESPACE file and the "Imports:" field in the DESCRIPTION file, I was hoping that roxygen2 would automatically handle both of them. After all
Every package mentioned in NAMESPACE must also be present in the
Imports or Depends fields.
I was hoping that roxygen2 would take every @import in my functions and make sure it is included in the DESCRIPTION file. But it does not do that automatically.
So I either have to add it manually to the DESCRIPTION file or almost manually using devtools::use_package.
Looking around for an answer, I found this question on SO, where @hadley confirms in the comments that
Currently, the namespace roclet will modify NAMESPACE but not
DESCRIPTION
and other posts (e.g. here or here) where collate_roclet is discussed, but "This only matters if your code has side-effects; most commonly because you’re using S4".
I wonder:
(1) the reason that DESCRIPTION is not automatically updated (which sort of contradicts the aforementioned philosophy, presumably shared by roxygen2), and
(2) whether someone has already crafted a way to do it.
I have written a little R package for that task:
https://github.com/markusdumke/pkghelper
It scans the R code and the NAMESPACE for packages in use and adds them to the Imports section.
The namespace_roclet edits the NAMESPACE file based on the tags added in the script before each function. As there are three types of dependencies (Depends, Imports, and Suggests), a similar method to the one used by the namespace_roclet would require three different tags (note that Imports would need a tag of its own, to differentiate it from the packages imported in NAMESPACE).
If you are willing to take a semi-automated process, you could identify the packages you have used and add the missing ones to DESCRIPTION, in the adequate sections.
library(reinstallr)
package.dir <- getwd()                    # assumes the package root is the working directory
base_path <- normalizePath(package.dir)
files <- list.files(file.path(base_path, "R"), full.names = TRUE)   # all scripts under R/
packages <- unique(reinstallr:::scan_for_packages(files)$package)   # packages referenced in the code
packages
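From there, a semi-automated follow-up could hand the detected packages to devtools::use_package (mentioned in the question), which writes them into the Imports field; treating every hit as an import is an assumption on my part:

for (p in packages) devtools::use_package(p)  # adds each package to Imports in DESCRIPTION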
Regarding the two bullets you wonder about at the bottom:
Updates to the DESCRIPTION file could be further automated with additional roclets; however, such a pull request was already deferred more than four years ago:
https://github.com/klutometis/roxygen/pull/76
I have to assume that the maintainers would indeed rather have you use the devtools package for updating the DESCRIPTION file, instead of adding this to roxygen2. So in that sense, devtools would be the first available choice.
I am working on a family of R packages, all of which share substantial common code that is housed in an internal package; let's call it myPackageUtilities. So I have several packages
myPackage1, myPackage2, etc...
All of these packages depend on every method in myPackageUtilities. For a real-world example, please see statnet on CRAN. Each dependent package, of course, has
Depends: myPackageUtilities
in its DESCRIPTION file.
My question is: In the R code for myPackage1, which of the following two techniques for accessing methods from myPackageUtilities is preferable:
Use ::: to access the methods in myPackageUtilities, or
Export everything from myPackageUtilities (e.g. by including exportPattern("^[^\\.]") in the NAMESPACE)?
Option 2 clutters the end-user's search path, but the R gurus recommend against using :::.
Follow-up question: If (2) is the better choice, is there a way to export everything using roxygen2?
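For the follow-up: one possibility, assuming a reasonably recent roxygen2, is the @rawNamespace tag, which copies its argument verbatim into NAMESPACE; this sketch reuses the pattern from option 2:

#' @rawNamespace exportPattern("^[^\\.]")
NULL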
Suppose we have a package called randomUtils and this package has a function called sd that calculates the Slytherin Defiance Quotient.
Now I write a package called spellbound. If spellbound Depends on randomUtils, then randomUtils's sd will be found on the search path and can conflict with the usual sd for calculating standard deviation.
If spellbound Imports randomUtils, however, then R will install randomUtils but will not attach it when spellbound is loaded. Thus, the new version of sd can't be found on the search path, but can still be accessed as randomUtils::sd.
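A small illustration of that inside spellbound's own code (house_scores is hypothetical data, and randomUtils is of course the fictional package from above):

house_scores <- c(10, 50, 90)              # hypothetical data
defiance <- randomUtils::sd(house_scores)  # the Slytherin Defiance Quotient
spread <- stats::sd(house_scores)          # the ordinary standard deviation, unambiguously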
With an ever growing body of contributed work on CRAN, it is becoming very important that we use Imports as much as possible so that we don't introduce unexpected behaviors by having conflicting function definitions.
An example of when I have used Depends: when writing the HydeNet package, I wanted the end user to be able to use the rjags package in concert with HydeNet. So I put rjags in Depends so that library(HydeNet) would load both packages (in other words, put rjags on the search path).
Moral of the story, if you don't intend for the user to directly access the functions, it should go in Imports.
I'm building a package that uses two main functions. One of the functions, in model.R, requires a special type of simulation in sim.R and a way to set up the results in a table in table.R.
In a sharable package, how do I call both the sim.R and table.R files from within model.R? I've tried source("sim.R") and source("R/sim.R") but that call doesn't work from within the package. Any ideas?
Should I just copy and paste the codes from sim.R and table.R into the model.R script instead?
Edit:
I have all the scripts in the R directory, and the DESCRIPTION and NAMESPACE files are all set. I just have multiple scripts in the R directory: R/ has premodel.R, model.R, sim.R and table.R. I need the model.R script to use the functions from both sim.R and table.R, located in the same directory in the package (i.e. R/).
To elaborate on joran's point, when you build a package you don't need to source functions.
For example, imagine I want to make a package named TEST. I will begin by generating a directory (i.e. folder) named TEST. Within TEST I will create another folder named R, and in that folder I will include all R script(s) containing the different functions in the package.
At a minimum you also need to include a DESCRIPTION and NAMESPACE file. A man directory (for help files) and a tests directory (for unit tests) are also nice to include.
Making a package is pretty easy. Here is a blog with a straightforward introduction: http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
As others have pointed out, you don't have to source R files in a package. The package loading mechanism will take care of loading the namespace and making all exported functions available. So usually you don't have to worry about any of this.
There are exceptions however. If you have multiple files with R code situations can arise where the order in which these files are processed matters. Often it doesn't matter or the default order used by R happens to be fine. If you find that there are some dependencies within your package that aren't resolved properly you may be faced with a situation where a custom processing order for the R files is required. The DESCRIPTION file offers the optional Collate field for this purpose. Simply list all your R files in the order they should be processed to satisfy the dependencies.
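A sketch of such a Collate field in DESCRIPTION, assuming (for illustration only) that sim.R and table.R must be processed before model.R:

Collate:
    'sim.R'
    'table.R'
    'premodel.R'
    'model.R'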
If all your files are in the R directory, every function will be in memory after you build the package or run devtools::load_all().
You may have issues if you have code in files that is not inside a function, though.
R loads files in alphabetical order.
Usually this is not a problem, because functions are evaluated when they are called for execution, not at loading time (i.e. a function can refer to another function not yet defined, even in the same file).
But if you have code outside a function in model.R, this code will be executed immediately when the file is loaded, and your package build will fail, usually with a
ERROR: lazy loading failed for package 'yourPackageName'
If this is the case, wrap the loose code of model.R in a function so you can call it later, when the package (and any external libraries) has fully loaded.
If this piece of code is there to initialize some value, consider use_data() to have R take care of loading the data into the environment for you.
If this piece of code is just interactive code written to test and develop the package itself, you should consider putting it elsewhere or wrapping it in a function anyway.
If you really need that code to be executed at loading time, or really have a dependency to solve, then you must add the Collate line to the DESCRIPTION file, as already stated by Peter Humburg, to force the order in which R loads the files.
Roxygen2 can help you: put this before your code
#' @include sim.R table.R
call roxygenize(), and the Collate line will be generated for you in the DESCRIPTION file.
But even after doing that, the external libraries you may depend on are not yet loaded by the package, leading to failure again at build time.
In conclusion, you'd better not leave code outside of functions in a .R file if it's located inside a package.
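A minimal sketch of the "wrap it in a function" fix described above (every name here is a placeholder for whatever top-level code model.R contained):

# in model.R: deferred instead of executed at load time
run_model_setup <- function() {
  sim <- run_sim()     # assumed to live in sim.R
  make_table(sim)      # assumed to live in table.R
}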
Since you're building a package, the reason why you're having trouble accessing the other functions in your /R directory is because you need to first:
library(devtools)
document()   # regenerates NAMESPACE and the man/ files
from within the working directory of your package. Now each function in your package should be accessible to any other function. Then, to finish up, do:
build()      # builds the source package
install()    # installs it into your library
although it should be noted that a simple document() call will already be sufficient to solve your problem.
Make your functions global by defining them with <<- instead of <- and they will become available to any other script running in that environment.
The title should be pretty clear, I hope. I'm writing a package called forecasting, with imports for dplyr among other packages. With the imports written into the DESCRIPTION file, I am able to force these other packages to be installed along with forecasting. Is there an equivalent way to do this for the loading of the package? In other words, is there a way that when I load my package with library(forecasting), it automatically also loads dplyr and the other packages?
Thanks
Yes.
Re-read "Writing R Extensions". The Depends: forces both the initial installation as well as the loading of the depended-upon packages.
But these days you want Imports: along with importFrom() in the NAMESPACE file, which is more fine-grained.
But first things first: get it working with Depends.
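A sketch of both routes for the forecasting package (all other fields omitted; mutate is just an illustrative dplyr function). The quick route, in DESCRIPTION, attaches dplyr whenever library(forecasting) is called:

Depends: dplyr

The finer-grained route uses, in DESCRIPTION:

Imports: dplyr

together with, in NAMESPACE:

importFrom(dplyr, mutate)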
Edit:
Oops, you're correct, the documentation I referenced is not a primary source. Perhaps this is better:
From the R documentation:
The ‘Depends’ field gives a comma-separated list of package names which this package depends on. Those packages will be attached before the current package when library or require is called.
and
The ‘Imports’ field lists packages whose namespaces are imported from (as specified in the NAMESPACE file) but which do not need to be attached. Namespaces accessed by the ‘::’ and ‘:::’ operators must be listed here, or in ‘Suggests’ or ‘Enhances’
Original:
From the R packages documentation:
Adding a package dependency here [the DESCRIPTION file] ensures that it’ll be installed. However, it does not mean that it will be attached along with your package (i.e., library(x)). The best practice is to explicitly refer to external functions using the syntax package::function(). This makes it very easy to identify which functions live outside of your package. This is especially useful when you read your code in the future.
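A one-line illustration of that style, as it might appear inside forecasting's code (the data frame and column are hypothetical):

df <- data.frame(x = c(-1, 2, 3))          # hypothetical data
positive_rows <- dplyr::filter(df, x > 0)  # dplyr:: makes the external call explicit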