I'm developing a package that uses ggmap as a dependency.
ggmap: https://github.com/dkahle/ggmap
Within my package, I'm calling ggmap functions using the recommended approach of listing ggmap in the Imports field of the DESCRIPTION file and calling functions with the :: operator (e.g. ggmap::get_map()). My issue is that ggmap assumes that some options are set upon initialization in .onAttach().
https://github.com/dkahle/ggmap/blob/master/R/attach.R
I believe that, since I'm not calling library() or require(), .onAttach() never gets called, and thus these options never get set. I can't call .onAttach() within my package, because it is not exported.
What is the best practice for initializing a dependent package?
This seems like a general problem in R package development, but I can't find the answer anywhere. And my apologies, this doesn't seem like the kind of question that can have a reproducible example.
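To make the load/attach distinction concrete, here is a minimal sketch using the base package tools as a stand-in for ggmap. It shows that a :: call loads a namespace without attaching it, followed by one possible workaround: setting missing options from your own package's .onLoad(). The option name mypkg.example_option is hypothetical; the options ggmap actually expects are not shown here and would need to be looked up in its source.

```r
# Calling a function with :: loads the namespace (which runs .onLoad),
# but does not attach the package, so .onAttach is not run.
ext <- tools::file_ext("report.txt")
loaded <- isNamespaceLoaded("tools")          # namespace is loaded
attached <- "package:tools" %in% search()     # FALSE unless attached elsewhere

# Hypothetical .onLoad() for your own package: set defaults only for
# options the user has not already set.
.onLoad <- function(libname, pkgname) {
  defaults <- list(mypkg.example_option = "default value")  # assumed name
  missing_opts <- !(names(defaults) %in% names(options()))
  if (any(missing_opts)) options(defaults[missing_opts])
  invisible(NULL)
}
```

Because .onLoad() runs whenever the namespace is loaded, this kind of initialization does not depend on anyone calling library().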
I am writing a package for the first time and I am trying to understand how it works in principle.
I need a variable to be available to many functions within the package. The common suggestion is to use a new environment that you specifically create for this purpose from within the package. For example, here:
How to define "hidden global variables" inside R packages?
or in this snippet of code:
https://github.com/eddelbuettel/rcppgsl/blob/master/R/inline.R#L19-L34
Based on this suggestion, right now my code starts with:
.pkg_env <- new.env(parent=emptyenv())
Yet, I cannot reconcile this with the principle that "the code in a package is run when the package is built" as mentioned here (https://r-pkgs.org/r.html#understand-when-code-is-executed). Perhaps, I am missing something in the way environments are managed in R, but I would guess you need that line to be run at load to create the environment.
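A minimal sketch of the pattern, assuming hypothetical helper names set_setting and get_setting. As far as I understand it, the top-level line does run at build time, but the resulting (empty) environment object is saved with the rest of the namespace and restored when the package is loaded, so each session gets its own copy. The functions below then read and write it at run time.

```r
# Created once when the package code is evaluated; the object itself is
# stored in the package namespace.
.pkg_env <- new.env(parent = emptyenv())

# Hypothetical setter: stores a value in the package-level environment.
set_setting <- function(name, value) {
  assign(name, value, envir = .pkg_env)
  invisible(value)
}

# Hypothetical getter: returns the stored value, or a default if unset.
get_setting <- function(name, default = NULL) {
  if (exists(name, envir = .pkg_env, inherits = FALSE)) {
    get(name, envir = .pkg_env, inherits = FALSE)
  } else {
    default
  }
}

set_setting("verbose", TRUE)
get_setting("verbose")
```

Environments have reference semantics, so modifying .pkg_env from inside a function does not require <<- or reassigning the binding.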
When making your own package for R, one often wants to make use of functions from a different package.
Maybe it's a plotting library like ggplot2, dplyr, or some niche function.
However, when making a function that depends on functions in other packages, what is the appropriate way to call them? In particular, I am looking for examples of when to use
myFunction <- function(x) {
  example_package::some_function(x)
}
or
require(example_package)
myFunction <- function(x) {
  some_function(x)
}
When should I use one over the other?
If you're actually creating an R package (as opposed to a script to source, an R Project, or some other method), you should NEVER use library() or require(). They are not an alternative to using package::function(). You are essentially choosing between package::function() and function() and, as highlighted by @Bernhard, explicitly naming the package ensures consistency if two or more packages have conflicting names.
Rather than require(package), you need to worry about properly defining your DESCRIPTION and NAMESPACE files. There are many posts about that on SO and elsewhere, so I won't go into details; see here for example.
Using package::function() can help with the above if you are using roxygen2 to generate your package documentation (it will automatically generate a proper NAMESPACE file).
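As a rough illustration of the two styles inside a package, here is a sketch of a hypothetical function, center_spread, with roxygen2 tags. It assumes the DESCRIPTION file lists stats under Imports:. With these tags, roxygen2 would write importFrom(stats,median) and export(center_spread) into NAMESPACE.

```r
#' Median absolute deviation around the median (illustrative helper)
#'
#' @param x A numeric vector.
#' @return A single numeric value.
#' @importFrom stats median
#' @export
center_spread <- function(x) {
  m <- median(x)                           # found via importFrom in NAMESPACE
  stats::mad(x, center = m, constant = 1)  # found via an explicit :: call
}

center_spread(c(1, 2, 3, 4, 100))
```

Either style works without library() or require(); the :: form needs only the Imports: entry, while the bare name additionally needs the @importFrom tag.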
The double-colon variant :: has a clear advantage in the rare situations where the same function name is used by two packages. There is a function psych::alpha to calculate Cronbach's alpha as a measure of internal consistency and a function scales::alpha to modify color transparency. There are not that many examples, but then again, there are examples. dplyr even masks functions from the stats and base packages! (And the tidyverse continues to add more and more entries to our namespaces. If you use dplyr, you do not know whether the base function you use today will be masked by a future version of dplyr, leading to an unexpected runtime problem in your package.)
None of that is a problem if you use the :: variant. Nor is it a problem if the last package attached in your package is the one you mean.
The require (or library) variant leads to overall shorter code, and it makes obvious at what time and place in the code a missing package will lead to an error and thus become visible.
In general, both work well and you are free to choose, which of these admittedly small differences appears more important to you.
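The masking issue can be sketched at script level without installing anything, using a deliberately conflicting definition of sd as a stand-in for a second package that exports the same name. The function names spread_unqualified and spread_qualified are made up for this illustration.

```r
# A definition that shares its name with stats::sd, standing in for a
# conflicting function from another attached package.
sd <- function(x) "not the standard deviation"

spread_unqualified <- function(x) sd(x)         # uses whichever sd is found first
spread_qualified   <- function(x) stats::sd(x)  # always the stats version

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
res_unqualified <- spread_unqualified(x)  # result of the masking definition
res_qualified   <- spread_qualified(x)    # the real standard deviation

rm(sd)  # remove the masking definition again
```

Inside a real package the lookup order differs (the package namespace and its imports are searched before the search path), so this only mimics what an unqualified call in a script is exposed to.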
Sometimes I use attach with some subset terms to work with odd dimensions of study data. To prevent "masking" variables in the environment (really the warning message itself) I simply call detach() to just remove whatever dataset I was working with from the R search path. When I get muddled in scripting, I may end up calling detach a few times. Well, interestingly if I call it enough, R removes functions that are loaded at start-up as part of packages like from utils, stats, and graphics. Why does "detach" remove these functions?
R removes base functions from the search path, like plot and ? and so on.
These functions that were removed are often called “base” functions but they are not part of the actual ‹base› package. Rather, plot is from the package ‹graphics›, and ? is from the package ‹utils›, both of which are part of the R default packages, and are therefore attached by default. Both packages are attached after package:base, and you’re accidentally detaching them by calling detach too many times (package:base itself cannot be detached; this is important because if it were detached, you couldn’t reattach it: the functions necessary for that are inside package:base).
To expand on this, attach and detach are usually used in conjunction with package environments rather than data sets: to enable the use of functions from a package without explicitly typing the package name (e.g. graphics::plot), the library function attaches these packages. When loading R, some packages are attached by default. You can find more information about this in Hadley Wickham’s Advanced R.
As you noticed, you can also attach and detach data sets. However, this is generally discouraged (quite strongly, in fact). Instead, you can use data transformation functions from the base package (e.g. with and transform, as noted by Moody_Mudskipper in a comment) or from a data manipulation package (‹dplyr› is state of the art; an alternative is ‹data.table›).
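A small sketch of the mechanics, using the built-in mtcars data as an example. A bare detach() removes whatever sits at position 2 of the search path, which is why repeated calls eventually reach the default packages; naming the object avoids that, and with() avoids touching the search path at all.

```r
before <- search()

attach(mtcars)                        # mtcars is now at position 2 of search()
attached_ok <- "mtcars" %in% search()

detach(mtcars)                        # detach by name, not a bare detach()
restored <- identical(search(), before)

# The same computation without modifying the search path
avg_mpg_4cyl <- with(mtcars, mean(mpg[cyl == 4]))
```

Printing search() before and after each call is a quick way to see exactly which entry a given detach() removed.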
Q. How does one write an R package function that references an R package dataset in a way that is simple/friendly/efficient for an R user? For example, how does one handle afunction() that calls adataset from a package?
What I think may not be simple/friendly/efficient:
The user is required to run data(adataset) before running afunction(...), or else receives an Error: ... object 'adataset' not found. I have noticed some packages have built-in datasets that can be called anytime the package is loaded, for example, iris, which one can call without bringing it to the Global Environment.
Possible options which I have entertained:
Write data(NamedDataSet) directly into the function. Is this a bad idea? I thought perhaps it could be, considering memory and given my limited understanding of function environments.
Code the structure of the dataset directly into the function. I think this works depending on the size of the data but it makes me wonder about how to go about proper documentation in the package.
Change Nothing. Given a large enough dataset, maybe it does not make sense to implement a way different from reading it before calling the function.
Any comments are appreciated.
You might find these resources about writing R data packages useful:
the "External Data" section of R Packages
Creating an R data package
Creating an R data package (overview)
In particular, take note of the DESCRIPTION file and usage of the line LazyData: true. This is how datasets are made available without having to use data(), as in the iris example that you mention.
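As a sketch of what this looks like from inside a function, the example below uses datasets::iris as a stand-in for a package's own lazy-loaded dataset. In your own package the reference would be something like yourpkg::adataset, where yourpkg is a placeholder name. The helper mean_sepal_length is hypothetical.

```r
# The dataset is referenced with ::, so the user never has to call data()
# and the function does not depend on the package being attached.
mean_sepal_length <- function(species) {
  d <- datasets::iris  # stand-in for yourpkg::adataset (assumed name)
  mean(d$Sepal.Length[d$Species == species])
}

mean_sepal_length("setosa")
```

For data that only the package's own functions need, storing it as internal data in R/sysdata.rda is another commonly described option, since those objects live in the package namespace.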
I'm writing my own R package and would like to plot a SpatialPolygonsDataFrame object. If I were writing it as a script I would simply load the necessary packages (maptools, rgdal, and rgeos) with library() and plot with plot(x).
When writing a package, using library() is not advised; instead, it is usual to declare the dependency by adding it to Imports: in the DESCRIPTION file. If I do this I receive the following error:
Error in as.double(y) :
cannot coerce type 'S4' to vector of type 'double'
This is corrected by loading the maptools package with library() if writing a script.
I know you can import individual methods with importMethodsFrom in the NAMESPACE, so I have tried to import a plot method from maptools using this approach but have had no luck. When I looked in the NAMESPACE of the maptools package I couldn't find a plot method exported. I've seen there is a plot.Spatial function, which I have tried to import into my NAMESPACE without success:
No methods found in "maptools" for requests: plot.Spatial
Finally, I have tried adding maptools to Depends: instead of Imports: in my DESCRIPTION file, and this does work. Is this the canonical way to do this? It seems overkill to attach a whole package for one method (plus I don't know what functions have been masked, etc.). What is the best way to load the necessary tools to plot maps within a self-authored function?
Edit 1: In response to @Hack-R's question, I don't know if plot.Spatial is the only method I need, or even if it's the correct one. It's my educated guess that this will enable me to plot spatial objects.
plot.Spatial is internal and is in sp and not maptools, which I think is the answer here. You are looking at the wrong package.
As discussed in the comments, you can simply use sp::plot.
For developing a package, there's a bit more to it.
You can import the methods for plot so that your functions can use them internally, but they won't be available to users unless they call library(sp). You could re-export plot so your users don't have to attach sp, but you'll need to document it and perhaps explain why, and also check there are no issues if sp is attached.
This is a bit of a challenging topic that is well explained here: http://r-pkgs.had.co.nz/namespace.html
I was pretty comfortable with namespaces but only recently realized you could re-export a function that you import from another package, so you could provide sp's plot.Spatial without Depends: sp.
I override the print methods for Spatial in a package I use, and that in turn overrides the overrides that raster provides. There's no stopping you from doing this; it's a matter of managing user expectations and hopefully not making things harder. You probably don't want to override a generic like plot for normal use; it's clearer if you have a myPlot that does that specifically, or add your own classes.
It's another level of complication though, since plot.Spatial is internal, and its source is used to define an S4 method for plot. You can see the methods with showMethods("plot") and then get the internal functions that provide those with findMethods("plot")[["Spatial#missing"]] or findMethods("plot")[["SpatialPolygons#missing"]].
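Since sp may not be installed everywhere, here is a self-contained sketch of the same inspection tools on a made-up S4 class and generic (Square and area are hypothetical and have nothing to do with sp). It shows how a method is registered against a generic and how the function behind a given signature can be retrieved.

```r
library(methods)

# Hypothetical class and generic, standing in for Spatial and plot
setClass("Square", representation(side = "numeric"))
setGeneric("area", function(shape, ...) standardGeneric("area"))
setMethod("area", "Square", function(shape, ...) shape@side^2)

sq <- new("Square", side = 2)
sq_area <- area(sq)                          # dispatches to the Square method

has_method <- existsMethod("area", "Square") # is a method registered?
impl <- selectMethod("area", "Square")       # the function that would be called
showMethods("area")                          # lists the registered signatures
```

With sp loaded, the same calls applied to plot are what reveal that the Spatial method is defined inside sp rather than maptools.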
@mdsumner's answer pointed me in the right direction and was a useful discussion in its own right.
The answer to my specific query, plotting SpatialPolygonsDataFrame objects, was to add sp to Imports: and call sp::plot().