Creating a data.frame for an R package

I am making an R package, and there is a need to keep track of files that were opened using the functions in the package.
What is the recommended procedure for creating R objects (in this case, a data.frame) upon loading the package in a way that is (sufficiently) hidden from the user? I do not want the user to manually edit the data.frame.
One idea I had was to create a data.frame in the options settings inside an .onLoad call (similar to what Hadley does in his devtools package here), but the list of opened files is not really a configurable "option" in my package. Is there another way?

When you create an R package, unless you're exporting all objects, you have to list which objects are exported in the NAMESPACE file. If you need to maintain a data frame within your package but you don't want it made available to the user, you can choose not to export it by excluding it from the list.
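For example, one way to keep such a tracking data frame both internal and updatable is to store it in an unexported package-level environment. This is only a sketch of that idea; the names .pkg_state and record_opened_file() are illustrative, not taken from the answer above.

# R/zzz.R (sketch): an unexported environment holding the tracking data frame.
# Neither object is listed in NAMESPACE, so users only reach them via
# mypackage:::.pkg_state.
.pkg_state <- new.env(parent = emptyenv())
.pkg_state$opened_files <- data.frame(path = character(),
                                      opened_at = as.POSIXct(character()),
                                      stringsAsFactors = FALSE)

# Internal helper called by the package functions that open files.
record_opened_file <- function(path) {
  .pkg_state$opened_files <- rbind(
    .pkg_state$opened_files,
    data.frame(path = path, opened_at = Sys.time(), stringsAsFactors = FALSE)
  )
  invisible(.pkg_state$opened_files)
}

Using an environment rather than a bare data frame binding matters because namespace bindings are locked once the package is loaded, while the contents of an environment can still be modified.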

Related

Generate a call to a package function programmatically given a vector of package names

In my work I develop R packages that export R data objects (.RData). The name of these .RData files is always the same (e.g. files.RData). These packages also define and export a function that uploads the data to my database, say upload_data(). Inside upload_data() I first load the data using data(files, package = "PACKAGE NAME") and then push it into my database.
Let's say I have two packages, package1 and package2, which live on my file system. Given a vector of the package names (c("package1", "package2")), how would I go about calling upload_data() programmatically? Specifically, inside a script, how would I construct and evaluate a call using "::" notation, like package1::upload_data()? I tried call() but couldn't get it right.
You could go the route of constructing the call using :: notation and evaluating that - but it's probably just easier to directly use get and specify the package you want to grab from.
get("upload_data", envir = asNamespace("package1"))
will return the function just as package1::upload_data would, but is much easier to deal with programmatically.
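For instance, a sketch applying that to the vector of package names from the question (assuming each package defines upload_data()):

pkgs <- c("package1", "package2")
for (pkg in pkgs) {
  f <- get("upload_data", envir = asNamespace(pkg))
  f()  # upload that package's files.RData contents
}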

Temporarily loading and unloading packages in an R function

I am writing a function that will take the name of an installed package and return a data frame listing all the data frames available in that package along with the number and types of variables in those data frames.
In order to do this, I need to require the package temporarily so I can access its data sets. The problem I have is that requiring a package also introduces a whole lot of extra stuff into the search path and the loaded namespaces beyond just the package in question. I want my function to tidy up after itself, but I can't find a good way to detach everything that got imported when the package was required. In particular, detach seems to detach only the package, but not any of the other imported stuff.
Any advice?
I'm not sure what IDE you are working with, but many of them have tab-completion. If I type ?unload at my console and hit <tab>, I immediately see unloadNamespace, so that would be a reasonable function to investigate. You should first look at:
?unloadNamespace
... and then decide if that is sufficient. There is also the detach function, which is linked from that help page.
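A minimal sketch of the tidy-up those two functions allow ("somepackage" is a placeholder; note that namespaces the package itself imported may still stay loaded afterwards):

library("somepackage", character.only = TRUE)
# ... inspect the package's data sets ...
detach("package:somepackage", unload = TRUE, character.only = TRUE)
# or, if only the namespace was loaded (no attach):
unloadNamespace("somepackage")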

List objects in sysdata.rda from within an R package

I'm building a package that contains a number of lookup tables, hidden from the user by storing them in R/sysdata.rda. This works fine, and I am able to reference them from internal package functions directly or via get().
Is there a way to get a vector of object names contained in sysdata.rda from within a function inside the package? What about as a user?
The behavior I am looking for would be similar to how ls lists the objects in an environment.
The way I do it is to have an internal function that generates sysdata.rda. It can then also build the vector of object names within the limited scope of that function, and you can add that vector of names to sysdata.rda itself.
Or, if it is more complicated, have the function save the tables into a new environment: you can then ls() that environment to get the list of names, and save its contents into sysdata.rda.
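A sketch of such a generator function, combining both ideas (the table names, sysdata_names, and make_sysdata() are all illustrative; the developer runs this when building the package):

make_sysdata <- function() {
  e <- new.env(parent = emptyenv())
  e$lookup_a <- data.frame(key = letters[1:3], value = 1:3)
  e$lookup_b <- data.frame(key = LETTERS[1:3], value = 4:6)

  # record the table names before anything else is added, so package
  # functions (or a user, via mypkg:::sysdata_names) can list the contents
  e$sysdata_names <- ls(e)

  save(list = ls(e), envir = e, file = "R/sysdata.rda")
}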

Load data object when package is loaded

Is there a way to automatically load a data object from a package into memory when the package is loaded (but not yet attached)? I.e. the opposite of lazy loading? The object is used in one of the package functions, so it needs to be available at all times.
When the package is set to LazyData: false, the data object is not made available by the package at all and needs to be loaded manually with data(). We could use something like:
.onLoad <- function(lib, pkg){
  data(mydata, package = pkg)
}
However, data() loads the object into the global environment. I strongly prefer to load it into the package environment (which is what LazyData does) to prevent masking conflicts.
A workaround is to bypass the data mechanics completely and simply hard-code the object in the package. So the package file myscore.R would look like:

mymodel <- readRDS("inst/mymodel.rds")

myscore <- function(newdata){
  predict(mymodel, newdata)
}
But this will lead to a huge packagedb for large data objects, and I am not sure what the consequences of that are.
As you say
The object is used in one of the package functions, so it needs to be available at all times.
I think the author of that package should really NOT use data(.) for that.
Instead, he should define the object inside the package's R/ directory, either by simple R code in an R/*.R file or by using the sysdata.rda approach that is explained in the famous first reference for all these questions, "Writing R Extensions". In both cases the package author can also export the object, which is often desirable for other users, as in your case.
Of course this needs a polite conversation between you and the package author, and will only apply to the next version of that package.
I'm going to post this since it seems to work for my use case.
.onLoad() is:
function(lib, pkg)
  data(mydata, package = pkg,
       envir = parent.env(environment()))
You also need Imports: utils in DESCRIPTION and importFrom(utils, data) in NAMESPACE in order to pass R CMD check.
In my case I don't need the data object to be visible to the user; I need it to be visible to one of the functions in the package. If you need it visible to the user, that's going to be even harder (I think) because, as far as I can tell, you can't export data, just functions. The only way I've thought of to export data is to export a wrapper function for the data, as sketched below.
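A sketch of such a wrapper (get_mydata() is an illustrative name; mydata is the object that .onLoad() above loads into the package environment):

#' @export
get_mydata <- function() {
  mydata
}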

Exporting data in Roxygen2 so that they are available without requiring data()

After reading questions such as this SO question on documenting a data set with Roxygen I have managed to document a dataset (which I will refer to as cells) and it now appears in the list generated by data(package="mypackage") and is loaded if I run the command data(cells). After this, cells will appear when ls() is run.
However, in many packages the data is immediately available without requiring a data() call. Also, the data names do not appear when ls() is run. An example is the baseball data set that comes with plyr. I have looked at the source for plyr and I cannot see how this is done.
In the DESCRIPTION file of your package make sure that there is a field called LazyData that is set to TRUE.
From the "Writing R Extensions" guide:
The ‘data’ subdirectory is for data files, either to be made available via lazy-loading or for loading using data(). (The choice is made by the ‘LazyData’ field in the ‘DESCRIPTION’ file: the default is not to do so.)
I think the exact syntax changed with R version 2.14; before that it was LazyLoad not LazyData.
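For completeness, a minimal sketch of what this looks like for the cells data set from the question (the documentation text is made up; the key pieces are the LazyData field in DESCRIPTION and the usual roxygen2 block in an R/*.R file):

# In DESCRIPTION:
#   LazyData: true

# In R/data.R:
#' Example cell data set.
#'
#' @format A data frame of cell measurements.
"cells"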
