R package dev: should I make data files internal or external? - r

(Trying again with this question to make it clearer.)
I am attempting to write a package that makes it easier to access data from a web API, and deciding whether to make lookup tables and query defaults internal or external data, as outlined in the Data chapter of R Packages.
As I understand it, each approach has drawbacks. Internal data is meant for data used only by the package and is invisible to users. It is added with devtools::use_data(x, mtcars, internal = TRUE), which creates sysdata.rda in the package's R/ folder. However, although the package "needs" the data tables, I also want the data to be visible to users, so they can correct errors and perhaps extend the package's capability by adding data files via pull request. Furthermore, since I'm dealing with multiple files, not all of which are available yet, rebundling everything into R/sysdata.rda every time something changes seems inconvenient.
An alternative would be to make the lookup tables and query defaults external data, added with the default internal = FALSE flag: devtools::use_data(x, mtcars), which puts mtcars.rda in the package's data/ folder. The advantage is that such data is clearly visible to the user; the downside is that I don't know how to access it from within the package functions without devtools::check() raising an error: object 'querydefaults' not found. What is the proper way to do this?

You can add the dataset both as external and internal, and it resolves the issue with devtools::check(). See the RIC package as an example.
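A minimal sketch of what that looks like on disk, assuming the usethis/devtools file conventions (the querydefaults table and the mypkg path are hypothetical, and a temporary directory stands in for a real package):

```r
# Hypothetical lookup table; in the real package this would be the API metadata
querydefaults <- data.frame(key = "timeout", value = "30", stringsAsFactors = FALSE)

# Simulate a package skeleton in a temporary directory
pkg <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

# Roughly what devtools::use_data(querydefaults) does: a user-visible copy
save(querydefaults, file = file.path(pkg, "data", "querydefaults.rda"))

# Roughly what devtools::use_data(querydefaults, internal = TRUE) does:
# a second copy that package functions can reference directly
save(querydefaults, file = file.path(pkg, "R", "sysdata.rda"))
```

The duplication costs a little disk space, but it keeps the user-facing copy in data/ while giving package code an object that R CMD check can resolve.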

Related

Is there a "correct" way to use exported data internally in an R package?

I know that exported data (accessible to users) belongs in the data/ folder and that internal data (used by package functions) belongs in R/sysdata.rda. However, what about data I wish both to export to the user AND to have available internally for use by package functions?
Currently, presumably due to the order in which objects/data are added to the NAMESPACE, my exported data is not available during devtools::check() and I am getting a NOTE: no visible binding for global variable 'data_x'.
There are probably a half dozen ways to get around this issue, many of which appear to me as rather hacky, so I was wondering if there was a "correct" way to have BOTH external and internal data (and avoid the NOTE from R CMD check).
So far I see these options:
Write an internal function that retrieves the data and use it everywhere internally
Use ::: to access the data, which seems odd and triggers a different warning
Keep a copy of data_x in BOTH data/ and R/sysdata.rda (super hacky)
Get over it and ignore the NOTE
Any suggestions greatly appreciated,
Thx.
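A rough sketch of the first option (the names are made up; a real package would call asNamespace() on itself, which is simulated here with a plain environment):

```r
# Simulated package namespace holding the exported dataset
ns <- new.env()
assign("data_x", data.frame(id = 1:3, label = c("a", "b", "c")), envir = ns)

# Internal accessor: package code calls get_data_x() instead of naming
# data_x directly, so the checker sees no unbound global variable
get_data_x <- function() {
  get("data_x", envir = ns)
}
```

Inside a real package, `envir = ns` would become `envir = asNamespace("mypackage")`.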

Using built-in data in an R package [duplicate]

This is likely an easy answer that's been answered numerous times. I just lack the terminology to search for the answer.
I'm creating a package. When the package loads I want to have instant access to the data sets.
So let's say DICTIONARY is a data set.
I want to be able to do DICTIONARY rather than data(DICTIONARY) to get it to load. How do I go about doing this?
From R-exts.pdf (online source):
The ‘data’ subdirectory is for data files, either to be made available
via lazy-loading or for loading using data(). (The choice is made by
the ‘LazyData’ field in the ‘DESCRIPTION’ file: the default is not to
do so.)
Adding the following to the DESCRIPTION file should do it:
LazyData: true

R: Adding configuration options on package install

I am developing a simple R package in which I would like to specify some local configuration options on package install. In particular, I would like the user to input a location on their computer on which to store package output data.
The package is most basically a random number generator. When used, it will generate a new random number, check the user-specified data file to make sure that number has not been generated, then either generate a new number, or append the new number to the data file.
I believe I could come up with a way to do this, but I'm wondering if there is a formal way of specifying something like configuration options that are particular to the local installation.
I've scoured the internet for information about this but not found much. This page may be somewhat related but in the end I didn't think this was exactly what I would want.
Thank you for any advice you can give!
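One hedged sketch of per-user configuration, assuming R >= 4.0, where tools::R_user_dir() gives a standard per-user directory (the mypkg name and the setting keys are hypothetical, not an established API for this):

```r
# Path to a per-user settings file for a hypothetical package "mypkg"
cfg_path <- function() {
  file.path(tools::R_user_dir("mypkg", which = "config"), "settings.rds")
}

# Persist one named setting, merging with anything stored earlier
save_setting <- function(key, value) {
  p <- cfg_path()
  dir.create(dirname(p), recursive = TRUE, showWarnings = FALSE)
  cfg <- if (file.exists(p)) readRDS(p) else list()
  cfg[[key]] <- value
  saveRDS(cfg, p)
}

# Read a setting back, or NULL if it was never stored
read_setting <- function(key) {
  p <- cfg_path()
  if (file.exists(p)) readRDS(p)[[key]] else NULL
}
```

Rather than prompting at install time (installs may be non-interactive), the package could ask for the output location on first use and call save_setting() once.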

Include library calls in functions?

Is it good practice to include every library I need to execute a function within that function?
For example, my file global.r contains several functions I need for a Shiny app. Currently I load all needed packages at the top of the file, so when I switch projects or copy these functions I also have to carry the library calls over into the new code. If instead each function loaded the packages it needs, the functions would be self-contained. Of course I'd have to check every function in a fresh R session, but I think this could help in the long run.
When I tried loading a package twice, R didn't load it again; library() just checks that it's already attached. My main question is whether restructuring things this way would slow my functions down.
I've only seen this practice of library calls inside functions once, so I'm not sure.
As one of the commenters suggests, you should avoid loading packages within a function, since:
The function now has a global effect - as a general rule, this is something to avoid.
There is a very small performance hit.
The first point is the big one. As with most optimisation, only worry about the second point if it's an issue.
Now that we've established the principle, what are the possible solutions?
In small projects, I have a file called packages.R that includes all the library calls I need. This is sourced at the top of my analysis script. BTW, all my functions are in a file called func.R. This workflow was stolen/adapted from a previous SO question.
If you're only importing a single function, you could use the :: trick, e.g. package::funcA(...). That way you avoid attaching the package.
For larger projects, I create an R package that handles all necessary imports. The benefit of creating a package is detailed in this answer on structuring large R projects.
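A small sketch of the :: approach combined with a guard (the function name is made up; stats and datasets ship with base R, so the example runs as-is):

```r
# Instead of library(stats) inside the function, check that the namespace is
# available and call the function with an explicit package prefix
mean_mpg_by_cyl <- function() {
  if (!requireNamespace("stats", quietly = TRUE)) {
    stop("Package 'stats' is required for mean_mpg_by_cyl().")
  }
  # aggregate() is resolved via stats:: without attaching the package
  stats::aggregate(mpg ~ cyl, data = datasets::mtcars, FUN = mean)
}
```

The requireNamespace() guard gives a clear error if the dependency is missing, without changing the user's search path the way library() would.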

