Exporting data in Roxygen2 so that they are available without requiring data() - r

After reading questions such as this SO question on documenting a data set with Roxygen I have managed to document a dataset (which I will refer to as cells) and it now appears in the list generated by data(package="mypackage") and is loaded if I run the command data(cells). After this, cells will appear when ls() is run.
However, in many packages the data is immediately available without requiring a data() call. Also, the data names do not appear when ls() is run. An example is the baseball data set that comes with plyr. I have looked at the source for plyr and I cannot see how this is done.

In the DESCRIPTION file of your package make sure that there is a field called LazyData that is set to TRUE.
From the "Writing R Extensions" guide:
The ‘data’ subdirectory is for data files, either to be made available
via lazy-loading or for loading using data(). (The choice is made by
the ‘LazyData’ field in the ‘DESCRIPTION’ file: the default is not to
do so.)
I think the exact syntax changed with R version 2.14; before that it was LazyLoad not LazyData.

Related

Datasets not exported/available from my R package

Following advice about NAMESPACE and External Data formatting/setup, I have:
A. My data files in mypackage/data/datafilename.RData
B. The data script as mypackage/R/data.R with data files individually named and described within that one file, having just changed "itemize" to "describe" and changing the format of those item lines:
C. I've document()-ed this, commit-pushed to github, and install_github reinstalled locally.
Help for the data files works:
But I can't access those data, whereas I can access data in other packages using the same method:
Can anyone think why this would be? NAMESPACE doesn't include these as exports:
But it's autogenerated by document() so that's arguably out of my control. By comparison, mapplots' NAMESPACE has exportPattern(".")
Environment for the package also doesn't include them, but I don't know if this is expected or not, based on lazy loading (which is true):
Any ideas welcome. I've tried data(gbm.auto:grids) with 1, 2 & 3 colons, to no avail. Based on the answer to this related question (also by me), I get the suspicion that there might be some issue whereby only the last named object in data.R is important/accessible?
usethis has been created since I've been updating this package and has use_data and create_package but I'm reluctant to try these out since ostensibly everything in my package should already be in order and I don't want to make things worse.
Thanks in advance. Reprex would be
library(devtools)
install_github("SimonDedman/gbm.auto")
Edit: to add to this, the datasets available in the installed package are a combo of the full list, some individual, some named in datalist:
Which contrasts against what's in the working folder and github:
As far as I can see, all the data files are the same format, e.g. when doubleclicked in file explorer they open in RStudio with the right name and same format. gbm.auto/R/data.R file is here. Per the last image, the three data files listed in datalist can be loaded in R with library(gbm.auto) data(Juveniles), but the other three data files can't. If I delete/rename the existing datalist from /data and generate a new one with add_datalist(pkgname = getwd()), a new file is generated but again it only lists those 3 files, not all 6.
Ugh, goddamn it. Found the issue. The 3 'bad' files had "Rdata" extensions while the 3 good ones had "RData" extensions. Lower case vs capital D. How unbelievably annoying.
Data files in data must have .RData extensions, not .Rdata
Bug filed here.

Global variable on load in R package

I'm writing a package in R and I'd like to include some sample objects in it that would be easily accessible for users to play with. The problem is they contain non-ASCII characters, and R CMD check won't allow this in .rda files in data. It will, however, allow Unicode in inst/extdata. I could just have these datasets read and wrapped in objects when the package is loaded. I tried assign and <<-, but I couldn't make either work.
Alternately, they could be loaded and saved as .rda files during the installation of the package. This would be preferable, in fact, but from what I read this seemed to be less possible.
Probably irrelevant but possibly interesting bit of history: I started the package on Debian unstable. I saved those datasets as .rda and they passed the check just fine. At one point I made a little correction, resaved them, and got a warning. I saved them again, and the warning disappeared. Then I moved to Debian stable, added some new datasets, resaved them all, and now I can't get rid of the warning in any way. When I save them from r-devel, however, I only get a note, not a warning.
The answer is embarrassingly simple: read the data and prepare the variables in one of the files in the R folder, and #' #export them. No need to assign or anything.

Creating R package using code from script file

I’ve written some R functions and dropped them into a script file using RStudio. These are bits of code that I use over and over, so I’m wondering how I might most easily create an R package out of them (for my own private use).
I’ve read various “how to” guides online but they’re quite complicated. Can anyone suggest an “idiot’s guide” to doing this please?
I've been involved in creating R packages recently, so I can help you with that. Before proceeding to the steps to be followed, there are some pre-requisites, which include:
RStudio
devtools package (for most of the functions involved in creation of a package)
roxygen2 package (for roxygen documentation)
In case you don't have the aforementioned packages, you can install them with these commands respectively:
install.packages("devtools")
install.packages("roxygen2")
Steps:
(1) Import devtools in RStudio by using library(devtools).
(devtools is a core package that makes creating R packages easier with its tools)
(2) Create your package by using:
create_package("~/directory/package_name") for a custom directory.
or
create_package("package_name") if you want your package to be created in current workspace directory.
(3) Soon after you execute this function, it will open a new RStudio session. You will observe that in the old session some lines will be auto-generated which basically tells R to create a new package with required components in the specified directory.
After this, we are done with this old instance of RStudio. We will continue our work on the new RStudio session window.
By far the package creation part is already over (yes, that simple) however, a package isn't directly functionable just by its creation plus the fact that you need to include a function in it requires some additional aspects of a package such as its documentation (where the function's title, parameters, return types, examples etc as mentioned using #param, #return etc - you would be familiar if you see roxygen documentation like in some github repositories) and R CMD checks to get it working.
I'll get to that in the subsequent steps, but just in case you want to verify that your package is created, you can look at:
The top right corner of the new RStudio session, where you can see the package name that you created.
The console, where you will see that R created a new directory/folder in the path that we specified in create_package() function.
The files panel of RStudio session, where you'll notice a bunch of new files and directories within your directory.
(4) As you mentioned in your words, you drop your functions in a script file - hence you will need to create the script first, which can be done using:
use_r("function_name")
A new R script will pop up in your working session, ready to be used.
Now go ahead and write your function(s) in it.
(5) After your done, you need to load the function(s) you have written for your package. This is accomplished by using the devtools::load_all() function.
When you execute load_all() in the console, you'll get to know that the functions have been loaded into your package when you'll see Loading package_name displayed in console.
You can try calling your functions after that in the console to verify that they work as a part of the package.
(6) Now that your function has been written and loaded into your package, it is time to move onto checks. It is a good practice to check the whole package as we make changes to our package. The function devtools::check() offers an easy way to do this.
Try executing check() in the console, it will go through a number of procedures checking your package for warnings/errors and give details for the same as messages on the screen (pertaining to what are the errors/warnings/notes). The R CMD check results at the end will contain the vital logs for you to see what are the errors and warnings you got along with their frequency.
If the functions in your package are written well, (with additional package dependencies taken care of) it will give you two warnings upon execution of check:
The first warning will be regarding the license that your package uses, which is not specified for a new pacakge.
The second should be the one for documentation, warning us that our code is not documented.
To resolve the first issue which is the license, use the use_mit_license("license_holder_name") command (or any other license which suits your package - but then for private use as you mentioned, it doesn't really matter what you specify if only your going to use it or not its to be distributed) with your name as in place of license_holder_name or anything which suits a license name.
This will add the license field in the .DESCRIPTION file (in your files panel) plus create additional files adding the license information.
Also you'll need to edit the .DESCRIPTION file, which have self-explanatory fields to fill-in or edit. Here is an example of how you can have it:
Package: Your_package_name
Title: Give a brief title
Version: 1.0.0.0
Authors#R:
person(given = "Your_first_name",
family = "Your_surname/family_name",
role = c("package_creator", "author"),
email = "youremailaddress#gmail.com",
comment = c(ORCID = "YOUR-ORCID-ID"))
Description: Give a brief description considering your package functionality.
License: will be updated with whatever license you provide, the above step will take care of this line.
Encoding: UTF-8
LazyData: true
To resolve the documentation warning, you'll need to document your function using roxygen documentation. An example:
#' #param a parameter one
#' #param b parameter two
#' #return sum of a and b
#' #export
#'
#' #examples
#' yourfunction(1,2)
yourfunction <- function(a,b)
{
sum <- a+b
return(sum)
}
Follow the roxygen syntax and add attributes as you desire, some may be optional such as #title for specifying title, while others such as #import are required (must) if your importing from other packages other than base R.
After your done documenting your function(s) using the Roxygen skeleton, we can tell our package that we have documented our functions by running devtools::document(). After you execute the document() command, perform check() again to see if you get any warnings. If you don't, then that means you're good to go. (you won't if you follow the steps)
Lastly, you'll need to install the package, for it to be accessible by R. Simply use the install() command (yes the same one you used at the beginning, except you don't need to specify the package here like install("package") since you are currently working in an instance where the package is loaded and is ready to be deployed/installed) and you'll see after a few lines of installation a statement like "Done (package_name)", which indicates the installation of our package is complete.
Now you can try your function by first importing your package using library("package_name") and then calling your desired function from the package. Thats it, congrats you did it!
I've tried to include the procedure in a lucid way (the way I create my R packages), but if you have any doubts feel free to ask.

Automatic loading of data from sysdata.rda in package

I have spent a lot of time searching for an answer to what is probably a very basic question, but I just can't find the solution to my issue. The closest that I found was this exchange from a few years ago.
In that case, the issue was the location of the sysdata.rda file in the correct directory within the package. That is not my issue.
I have some variables that store things like color palettes that I amusing inside a package. These variables are only used inside my functions so I storing them in R/sysdata.rda. However, when I load the packages, the variables are not loading into the package environment. If I load the data manually from sysdata.rda then everything works fine.
My impression from reading everything that I could find on internal data in R packages was that the data in R/sysdata.rda would load automatically.
Here is the code that I am using to store my data.
devtools::use_data(tmpBrks, tmpColors, prcpBrks, prcpChgBrks,
prcpChgBrkLabels, prcpColors, prcpChgColors,
internal = TRUE, overwrite = TRUE)
That successfully creates the data file at R/sysdata.rda and the data is in the file when I load it manually.
What do I need to do to have the data load automatically so the functions in my package can use them?
As usual, this was a bad combination of user ignorance and poor R documentation. The data was being loaded and was available to the functions. Where I went wrong was in assuming that the data would be visible in the package environment. That is not the case.
As far as I can tell, internal data in the R\sysdata.rda file is available to the functions within the package, but not visible in any way. After I created the internal data file I was looking for the data in the package environment. When I didn't see it I assumed that it wasn't loaded. When I kept pushing forward with my package development I finally realized that the data was loading silently and accessible to the functions in the package.
As evidenced by the two up votes that my question got, I am not the only one who didn't understand the behavior of the R\sysdata.rda internal data. Hopefully this explanation will save someone else a bunch of time searching for an answer to this issue that doesn't really exist.

Changing the load order of files in an R package

I'm writing a package for R in which the exported functions are decorated by a higher-order function that adds error checking and some other boilerplate code.
However, because this code is at the top-level it is evaluated after parsing. These means that
the load order of the package files is important.
To give an equivalent but simplified example, suppose I have a package with two files (Negate2 and Utils), and I require Negate2.R to be loaded first for the function 'isfalse( )' to be defined without throwing an error.
# /Negate2.R
Negate2 <- Negate
# -------------------
# /Utils.R
istrue <- isTRUE
isfalse <- Negate2(istrue)
Is it possible to structure NAMESPACE, DESCRIPTION (collate) or another package file in order to change the load order of files? The internal working of the R package structure and CRAN are still black magic to me.
It is possible to get around this problem using awkward hacks, but the least repetitive way of solving this problem. The wrapper function must be a higher-order function, since it also changes the function call semantics of its input function. The package is code heavy (~6000 lines, 100 functions) so repetition would be...problematic.
Solution
As #Manetheran points out, to change the load order you just change the order of the file names in the DESCRIPTION file.
# /DESCRIPTION
Collate:
'Negate2.R'
'Utils.R'
The Collate: field of the DESCRIPTION file allows you to change the order files are loaded when the package is built.
I stumbled across the answer to this question yesterday while reading up on Roxygen. If you've been documenting your functions with Roxygen, it can try to intelligently order your R source files in the Collate: field (based on where S4 class and method definitions are). This can be done by adding "collate" to the roclets argument of roxygenize. Alternatively if you're developing in RStudio there is a simple box that can be checked under Build->Configure Build Tools->Configure... (Button next to "Generate documentation with Roxygen").
R loads files in alphabetical order. To change the order, Collate field could be used from the DESCRIPTION file.
roxygen2 provides an explicit way of saying that one file must be loaded before another: #include. The #include tag gives a space separated list of file names that should be loaded before the current file:
#' #include class-a.r
setClass("B", contains = "A")
If any #include tags are present in the package, roxygen2 will set the Collate field in the DESCRIPTION.
You need to run generation of roxygen2 documentation in order to changes to take effect.

Resources