Package design: How to prevent R from loading the sp package unnecessarily?

I've made a package that exports spatial polygons objects. When using such objects, R automatically loads the sp package. Due to lazy loading of data, I was hoping that as long as the user does not use these spatial polygons objects, the sp package would not be loaded yet.
What I want
That the sp package is not loaded until the user uses / loads one of the spatial polygons objects in my package.
What actually happens
When I attach my package with library(myPackage), it does indeed not load sp. However, all functions and objects in my package start with mir_, and as soon as I type mir_ into my console, R loads sp. (I don't even need to execute any code, or even select any function or object. Just typing mir_ is enough.)
My questions
Why does R load sp even though I haven't used a spatial polygons object?
Is it possible to solve this issue, so that sp is only loaded when a spatial polygons object is used / loaded?
Extra info
I'm using LazyData: true in my DESCRIPTION file.
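For reference, a small sketch of how the behaviour above can be observed; loadedNamespaces() lists namespaces that are loaded, whether or not they are attached, and myPackage / mir_ are the names from the question:

library(myPackage)
"sp" %in% loadedNamespaces()   # FALSE right after attaching: sp is not loaded yet
# ...type mir_ at the console (no need to run anything), then check again:
"sp" %in% loadedNamespaces()   # TRUE: sp has been loaded, as described above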

Related

R package building: what is the best solution for using large, external data that needs to be regularly updated?

We are creating an R package whose main function is to geocode addresses in Belgium (i.e. transform a "number-street-postcode" string into X-Y spatial coordinates). To do this, we need several data files, namely all buildings in Belgium with their geographical coordinates, as well as municipal boundary data containing geometries.
We face two problems in creating this package:
The files take up space: about 300-400 MB in total. This size is a problem, because we want to eventually put this package on CRAN. A solution we found on Stack Overflow is to create another package only for the data, and to host this package elsewhere. But then a second problem arises (see next point).
Some of the files we use are files produced by public authorities. These files are publicly available for download and they are updated weekly. We have created a function that downloads the data if the data on the computer is more than one week old, and transforms it for the geocoding function (we have created a specific structure to optimize the processing). We are new to package creation, but from what we understand, it is not possible to update data every week if it is originally contained in the package (maybe we are wrong?). It would be possible to create a weekly update of the package, but this would be very tedious for us. Instead, we want the user to be able to update the data whenever he wants, and for the data to persist.
So we are wondering what is the best solution regarding this data issue for our package. In summary, here is what we want:
Find a user-friendly way to download the data and use it with the package.
That the user can update the data whenever he wants with a function of the package, and that this data persists on the computer.
We found an example that could work: the Rpostal package, which also relies on large external data (https://github.com/Datactuariat/Rpostal). The author's solution was to install the data outside the package and to specify the directory where it is located each time a function is used. It is then necessary to pass libpostal_path as an argument to the functions so that they work.
However, we wonder if there is a solution to store the files in a directory internal to R or to our package, so we don't have to define this directory in the functions. Would it be possible, for example, to download these files into the package directory, without the user having the choice, so that we can know their path in any case and without the user having to mention it?
Are we on the right track or do you think there is a better solution for us?
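A minimal sketch of the download-and-cache idea described above, using a persistent per-user directory (tools::R_user_dir, available since R 4.0) rather than the package installation directory. The function name, file name, and URL are placeholders, not part of any existing package:

get_building_data <- function(force = FALSE) {
  # per-user, per-package data directory that persists across sessions
  dir <- tools::R_user_dir("beGeocode", which = "data")
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(dir, "buildings.csv")

  # re-download if the local copy is missing or more than a week old
  stale <- !file.exists(dest) ||
    difftime(Sys.time(), file.mtime(dest), units = "days") > 7
  if (force || stale) {
    download.file("https://example.org/belgium-buildings.csv", dest, mode = "wb")
  }
  read.csv(dest)
}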

Using large data in an R package

I am writing an R package with very large internal data consisting of several models created with caret, which add up to almost 2 GB. The idea is that this package will live on my computer exclusively, and several other packages I built will be able to use it to make predictions. My computer has plenty of memory, but I can't figure out how to set up the package so that the models work efficiently.
I can install the package successfully if I store the large models in inst/extdata, set lazy loading to false in the DESCRIPTION file, and load the models inside the function that uses them. (I think I could also do this by putting the models in the data directory, turning off lazy loading, and loading them inside the function.) But this is very slow, since my other packages call the prediction function repeatedly and it has to load the models every time. It would work much better if the models were loaded along with the package and just stayed in memory.
Other things I have tried made it so that the package couldn't be installed at all. When I try I get the error "long vectors not supported yet." These include:
storing the models in inst/extdata with lazy loading
storing the models in R/sysdata.rda (with or without lazy loading)
storing the models in the data directory (so they're exported) with lazy loading
Is there a better way to do this, that keeps the models loaded when the package is loaded? Or is there some better alternative to using an R package?
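One possible pattern for keeping the models in memory after the first call, assuming the fitted models are saved as .rds files under inst/extdata; the package, file, and object names here are placeholders, not the ones from the question:

.model_cache <- new.env(parent = emptyenv())

get_model <- function(name) {
  # load the model from inst/extdata the first time it is requested,
  # then keep it in the package-level cache for subsequent calls
  if (!exists(name, envir = .model_cache, inherits = FALSE)) {
    path <- system.file("extdata", paste0(name, ".rds"), package = "myModels")
    assign(name, readRDS(path), envir = .model_cache)
  }
  get(name, envir = .model_cache, inherits = FALSE)
}

predict_caret <- function(newdata) {
  predict(get_model("rf_fit"), newdata = newdata)
}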

Why does calling `detach` cause R to "forget" functions?

Sometimes I use attach with some subset terms to work with odd dimensions of study data. To prevent "masking" variables in the environment (really the warning message itself) I simply call detach() to remove whatever dataset I was working with from the R search path. When I get muddled in scripting, I may end up calling detach a few times. Interestingly, if I call it enough times, R removes functions that are loaded at start-up as part of packages such as utils, stats, and graphics. Why does detach remove these functions?
R removes "base" functions, such as plot and ?, from the search path.
These removed functions are often called "base" functions, but they are not part of the actual ‹base› package. Rather, plot is from the package ‹graphics› and ? is from the package ‹utils›, both of which are among R's default packages and are therefore attached by default. Both packages are attached after package:base, and you are accidentally detaching them with your repeated detach calls. (package:base itself cannot be detached; this is important because if it were detached, you couldn't reattach it: the functions necessary for that are inside package:base.)
To expand on this, attach and detach are usually used with package environments rather than data sets: to let users call functions from a package without explicitly typing the package name (e.g. graphics::plot), the library function attaches these packages. When R starts, some packages are attached by default. You can find more information about this in Hadley Wickham's Advanced R.
As you noticed, you can also attach and detach data sets. However, this is generally discouraged (quite strongly, in fact). Instead, you can use data transformation functions from the base package (e.g. with and transform, as noted by Moody_Mudskipper in a comment) or from a data manipulation package (‹dplyr› is state of the art; an alternative is ‹data.table›).
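A short sketch of the mechanism described above, assuming a default R session: detach() with no arguments removes whatever sits at position 2 of the search path, so repeated calls eventually peel off the default packages.

attach(mtcars)
head(search())   # ".GlobalEnv" "mtcars" "package:stats" "package:graphics" ...
detach()         # removes "mtcars" (position 2 of the search path)
detach()         # now removes "package:stats", the next item on the search path
# a few more calls and "package:graphics" is gone, so plot() can no longer be found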

Self-authored package: load plot method for SpatialPolygonsDataFrame

I'm writing my own R package and would like to plot a SpatialPolygonsDataFrame object. If I were writing it as a script I would simply load the necessary packages (maptools, rgdal, and rgeos) with library() and plot with plot(x).
When writing a package, using library() is not advised; instead it is usual to add the package to Imports: in the DESCRIPTION file. If I do this I receive the following error:
Error in as.double(y) :
cannot coerce type 'S4' to vector of type 'double'
This is corrected by loading the maptools package with library() when writing a script.
I know you can import individual methods with importMethodsFrom in the NAMESPACE, so I have tried to import a plot method from maptools using this approach, but have had no luck. When I looked in the NAMESPACE of the maptools package I couldn't find a plot method exported. I've seen there is a plot.Spatial function which I have tried to import into my NAMESPACE without success:
No methods found in "maptools" for requests: plot.Spatial
Finally, I have tried adding maptools to Depends: instead of Imports: in my DESCRIPTION, and this does work. Is this the canonical way to do it? It seems overkill to attach a whole package for one method (plus I don't know what functions have been masked, etc.). What is the best way to load the necessary tools to plot maps within a self-authored function?
Edit 1: In response to @Hack-R's question, I don't know if plot.Spatial is the only method I need, or even if it's the correct one. It's my educated guess that this will enable me to plot spatial objects.
plot.Spatial is internal and is in sp and not maptools, which I think is the answer here. You are looking at the wrong package.
As discussed in the comments, you can simply use sp::plot.
For developing a package, there's a bit more to it.
You can import the methods for plot so that your functions can use them internally, but they won't be available to users unless they call library(sp). You could re-export them so your users don't have to attach sp, but you'll need to document that and perhaps explain why, and also check that there are no issues if sp is attached.
This is a bit of a challenging topic that is well explained here: http://r-pkgs.had.co.nz/namespace.html. I was pretty comfortable with namespaces but only recently realized you could re-export a function that you import from another package, so you could provide sp's plot.Spatial without Depends: sp.
I override the print methods for Spatial in a package I use, and that in turn overrides the overrides that raster provides. There's no stopping you doing this; it's a matter of managing user expectations and hopefully not making things harder. You probably don't want to override a generic like plot for normal use; it's clearer to have a myPlot that does that specifically, or to add your own classes.
It's another level of complicated, though, since plot.Spatial is internal, and its source is used to define an S4 method for plot. You can see the methods with showMethods("plot") and then get the internal functions that provide them with findMethods("plot")[["Spatial#missing"]] or findMethods("plot")[["SpatialPolygons#missing"]].
@mdsumner's answer pointed me in the right direction and was a useful discussion in its own right.
The answer to my specific query about plotting SpatialPolygonsDataFrame objects was to add sp to Imports: and call sp::plot().
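A minimal sketch of that arrangement; the wrapper name plot_spdf is hypothetical, and sp is assumed to be listed under Imports: in the DESCRIPTION:

# DESCRIPTION contains:  Imports: sp

plot_spdf <- function(x, ...) {
  # calling plot through the sp namespace uses sp's plot methods for
  # Spatial* classes without requiring the user to attach sp
  sp::plot(x, ...)
}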

Converting spatial polygon to regular data frame without use of gpclib tools

I'm working with spatial data in R for a commercial application and would like to use ggplot2 for data visualization. If you run Hadley's example at https://github.com/hadley/ggplot2/wiki/plotting-polygon-shapefiles you find that in order to run the fortify command you need to enable the use of gpclib tools using gpclibPermit().
I'm looking for an efficient way (that doesn't involve manually hacking into the S4 object) to perform the same operation that fortify does here, i.e. take a spatial polygon object and turn it into a regular data frame where row entries contain latitudinal and longitudinal coordinates along with a polygon id.
Has anyone else solved this one?
You also need to install the rgeos package. When maptools is loaded and rgeos is not installed, the following message is shown:
> require("maptools")
Loading required package: maptools
Checking rgeos availability: FALSE
Note: when rgeos is not available, polygon geometry
computations in maptools depend on gpclib,
which has a restricted licence. It is disabled by default;
to enable gpclib, type gpclibPermit()
When fortify is called with a region argument (as it is in the example you linked to), some "polygon geometry computations" need to be done. If rgeos is not available and gpclib is not permitted, it will fail.
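A sketch of the fix this answer describes; spdf stands in for whatever SpatialPolygonsDataFrame you are working with, and "NAME" for whichever attribute column identifies the regions:

install.packages("rgeos")   # once; maptools then reports "Checking rgeos availability: TRUE"
library(maptools)
library(ggplot2)

df <- fortify(spdf, region = "NAME")   # no gpclibPermit() needed once rgeos is installed
ggplot(df, aes(long, lat, group = group)) + geom_polygon()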

Resources