Datasets not exported/available from my R package

Following advice about NAMESPACE and External Data formatting/setup, I have:
A. My data files in mypackage/data/datafilename.RData
B. The data script as mypackage/R/data.R, with the data files individually named and described within that one file. I have just changed "itemize" to "describe" and changed the format of those item lines:
C. I've document()-ed this, commit-pushed to github, and install_github reinstalled locally.
Help for the data files works:
But I can't access those data, whereas I can access data in other packages using the same method:
Can anyone think why this would be? NAMESPACE doesn't include these as exports:
But it's autogenerated by document() so that's arguably out of my control. By comparison, mapplots' NAMESPACE has exportPattern(".")
Environment for the package also doesn't include them, but I don't know if this is expected or not, based on lazy loading (which is true):
Any ideas welcome. I've tried data(gbm.auto:grids) with 1, 2 & 3 colons, to no avail. Based on the answer to this related question (also by me), I get the suspicion that there might be some issue whereby only the last named object in data.R is important/accessible?
usethis has been created since I've been updating this package and has use_data and create_package but I'm reluctant to try these out since ostensibly everything in my package should already be in order and I don't want to make things worse.
Thanks in advance. Reprex would be
library(devtools)
install_github("SimonDedman/gbm.auto")
Edit: to add to this, the datasets available in the installed package are a combo of the full list, some individual, some named in datalist:
Which contrasts against what's in the working folder and github:
As far as I can see, all the data files are the same format, e.g. when double-clicked in file explorer they open in RStudio with the right name and same format. gbm.auto/R/data.R file is here. Per the last image, the three data files listed in datalist can be loaded in R with library(gbm.auto); data(Juveniles), but the other three data files can't. If I delete/rename the existing datalist from /data and generate a new one with add_datalist(pkgname = getwd()), a new file is generated but again it only lists those 3 files, not all 6.
Ugh, goddamn it. Found the issue. The 3 'bad' files had "Rdata" extensions while the 3 good ones had "RData" extensions. Lower case vs capital D. How unbelievably annoying.

Data files in data/ must use an extension that data() recognises, such as .RData (or .rda). The spelling .Rdata, with a lowercase d, is not recognised.
Bug filed here.
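A quick way to catch this kind of mismatch before installing is to list the files in data/ and flag any whose extension data() does not handle. The sketch below builds a throwaway directory to stand in for a package's data/ folder. The file names are made up, and the `recognised` vector is my reading of ?data, so treat it as an assumption and check it against your own R version.

```r
# Throwaway stand-in for mypackage/data (hypothetical paths and file names)
data_dir <- file.path(tempdir(), "demo_pkg", "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

x <- 1:3
save(x, file = file.path(data_dir, "good.RData"))  # capital D
save(x, file = file.path(data_dir, "bad.Rdata"))   # lowercase d

# Assumption: extensions handled by data(), as listed in ?data
recognised <- c("R", "r", "RData", "rdata", "rda",
                "tab", "txt", "TXT", "csv", "CSV")

# Flag every file whose extension is not in the recognised set
files   <- list.files(data_dir)
suspect <- files[!tools::file_ext(files) %in% recognised]
print(suspect)
```

Pointing `data_dir` at a real package's data/ folder gives the same check; a non-empty `suspect` is a hint to rename those files before rebuilding.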

Related

How to include raw data in an R package

I'm working on the final assignment of the course Building R Packages.
In this assignment, we need to create an R package based on some example functions provided by the instructors. We need to organize and document the package, then make it available on GitHub. My package is called FARS and is already available in this GitHub repo.
I'm having trouble with making raw data available with the package. After following the instructions provided in the course's readings and also in chapter 14.3 of the book Building R Packages, the files are still not being recognized.
What did I do so far?
Prepared all the package's documentation, including roxygen2 tags, DESCRIPTION, README.Md, and vignette, following these steps in addition to instructions provided in the readings and book mentioned;
Created a subdirectory named inst/extdata in the package's directory;
Copied all three example files (.csv.bz2) with raw data to inst/extdata;
Tested the functions using testthat;
Installed my FARS package.
Now I'm trying to check if one of the files is available after installing the package:
system.file("extdata", "accident_2013.csv.bz2",
package = "FARS",
mustWork = TRUE)
I get an error message:
Error in system.file("extdata", "accident_2013.csv.bz2", package = "FARS", :
no file found
These data files need to be available with the package, so the examples provided in the vignette work properly.
Here's a "real-life" example, using a simple package I wrote recently.
I have a "data" directory in the build directory.
EDIT To clarify the comments found in R-exts, the directory tree packagename/inst/extdata is intended for data that your functions call directly, by specifying that directory path. Since you want to load data into your workspace, use the data directory.
My "data" directory contains one file named preciseNumbersAsChar.r . That file contains assignments such as
charE <- {long number string}
If you read the help page for the command data, it explains that files ending in .r are sourced when called.
library(FunWithNumbers)
data('preciseNumbersAsChar') #works
Which is to say, the defined objects are now in my environment.
It's worth reading the help page for data in detail as different file types are handled slightly differently.
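For files that do belong under inst/extdata, it can help to see how system.file() behaves when a path is or is not present in the installed package. This is a minimal sketch that uses the stats package only because it is always installed; the missing file name is invented for illustration.

```r
# A file that exists in an installed package: the full path is returned
found <- system.file("DESCRIPTION", package = "stats")

# A file that does not exist: the result is an empty string, not an error
not_found <- system.file("extdata", "no_such_file.csv.bz2", package = "stats")

# With mustWork = TRUE the same lookup raises an error instead
msg <- tryCatch(
  system.file("extdata", "no_such_file.csv.bz2",
              package = "stats", mustWork = TRUE),
  error = function(e) conditionMessage(e)
)

print(nzchar(found))
print(not_found)
print(msg)
```

An empty string or an error here means the file is not in the installed copy of the package, so the next thing to inspect is what actually ended up in the installed package's directory.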

Where to put R files that generate package data

I am currently developing an R package and want it to be as clean as possible, so I try to resolve all WARNINGs and NOTEs displayed by devtools::check().
One of these notes is related to some code I use for generating sample data to go with the package:
checking top-level files ... NOTE
Non-standard file/directory found at top level:
'generate_sample_data.R'
It's an R script currently placed in the package root directory and not meant to be distributed with the package (because it doesn't really seem useful to include)
So here's my question:
Where should I put such a file or how do I tell R to leave it be?
Is .Rbuildignore the right way to go?
Currently devtools::build() puts the R script in the final package, so I shouldn't just ignore the NOTE.
As suggested in http://r-pkgs.had.co.nz/data.html, it makes sense to use ./data-raw/ for scripts/functions that are necessary for creating/updating data but that you do not need in the package itself. After adding ./data-raw/ to ./.Rbuildignore, the package build should ignore anything within that directory. (And, as you commented, there is a helper function devtools::use_data_raw(), which now lives in usethis as usethis::use_data_raw().)
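Entries in .Rbuildignore are regular expressions matched case-insensitively against paths relative to the package root. The sketch below only simulates that matching on a made-up list of top-level names; it does not run the real build, and the pattern `^data-raw$` is the conventional entry rather than something taken from the question.

```r
# One line of a hypothetical .Rbuildignore file
pattern <- "^data-raw$"

# Made-up top-level contents of a package source directory
top_level <- c("R", "data", "data-raw", "DESCRIPTION", "generate_sample_data.R")

# Keep everything the pattern does not match
ignored <- grepl(pattern, top_level, perl = TRUE, ignore.case = TRUE)
kept    <- top_level[!ignored]
print(kept)
```

In this simulation the stray script at the top level is still kept, which is why moving it into data-raw/ (or adding its own pattern) is needed to silence the NOTE.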

Automatic loading of data from sysdata.rda in package

I have spent a lot of time searching for an answer to what is probably a very basic question, but I just can't find the solution to my issue. The closest that I found was this exchange from a few years ago.
In that case, the issue was the location of the sysdata.rda file in the correct directory within the package. That is not my issue.
I have some variables that store things like color palettes that I am using inside a package. These variables are only used inside my functions, so I am storing them in R/sysdata.rda. However, when I load the package, the variables are not loading into the package environment. If I load the data manually from sysdata.rda then everything works fine.
My impression from reading everything that I could find on internal data in R packages was that the data in R/sysdata.rda would load automatically.
Here is the code that I am using to store my data.
devtools::use_data(tmpBrks, tmpColors, prcpBrks, prcpChgBrks,
prcpChgBrkLabels, prcpColors, prcpChgColors,
internal = TRUE, overwrite = TRUE)
That successfully creates the data file at R/sysdata.rda and the data is in the file when I load it manually.
What do I need to do to have the data load automatically so the functions in my package can use them?
As usual, this was a bad combination of user ignorance and poor R documentation. The data was being loaded and was available to the functions. Where I went wrong was in assuming that the data would be visible in the package environment. That is not the case.
As far as I can tell, internal data in the R/sysdata.rda file is available to the functions within the package, but it is not visible in the attached package environment. After I created the internal data file I was looking for the data in the package environment. When I didn't see it I assumed that it wasn't loaded. When I kept pushing forward with my package development I finally realized that the data was loading silently and was accessible to the functions in the package.
As evidenced by the two up votes that my question got, I am not the only one who didn't understand the behavior of the R/sysdata.rda internal data. Hopefully this explanation will save someone else a bunch of time searching for an answer to this issue that doesn't really exist.
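The distinction between a package's namespace and its attached environment can be seen with any unexported object. This sketch uses print.lm from stats as a stand-in, on the assumption that objects from R/sysdata.rda sit in the namespace in the same way; it does not load a real sysdata.rda file.

```r
library(stats)

ns  <- asNamespace("stats")             # where the package's own functions look
pkg <- as.environment("package:stats")  # what is attached to the search path

# print.lm is defined in stats but not exported
in_namespace <- exists("print.lm", envir = ns,  inherits = FALSE)
in_attached  <- exists("print.lm", envir = pkg, inherits = FALSE)

print(c(namespace = in_namespace, attached = in_attached))
```

Looking in the attached environment alone would suggest the object is missing, even though code inside the package can use it.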

How to take data files info to index and how to update a data file in an existing R package properly?

I have written an R package in which the names of the functions are in Turkish.
I want to take that package to CRAN with internationalization. I changed all of the Turkish names (of functions, of data sets) to English so that everybody can easily use the package. After that, I followed the regular "library(roxygen2); library(devtools); library(digest); roxygenize("causfinder"); build("causfinder"); install("causfinder"); library(causfinder)" process.
At the end, all of the functions appear with their English names this time in the Object Browser of Revolution R (version 7.1.0 Academic License). So, for the conversion of the names of the functions, all are OK.
Problem:
But, interestingly, not all of the names of the data files in the package were converted to English.
What I did to solve the problem till now:
I tried every sort of trick I know:
1. I deleted the package from the library location (I have only 1 such location: "C:\Revolution\R-Enterprise-7.1\R-3.0.2\library") completely, then rebuilt the package and installed it again.
2. I deleted the package from the working directory "C:\Users\erdogan\Documents\Revolution", and triggered the "library(roxygen2); library(devtools); library(digest); roxygenize("causfinder"); build("causfinder"); install("causfinder"); library(causfinder)" process.
3. To rule out caching effects, I deleted "C:\Users\erdogan\Documents\Revolution\32_7.1" so that "PackageXMLs\causfinder.xml" in that folder does not interfere. I had R recreate the "32_7.1" folder by restarting Revo R.
4. I applied tricks suggested by "Dirk Eddelbuettel" here:
Update the dataset in an installed package
"updating the source and re-installing with a new distinctive version number": Did not work.
"by forcefully overwriting it, possibly. Not the proper way to do it.": How to apply that force?
My findings that may perhaps give an idea to professionals to solve the problem:
Only one of the data files was renamed correctly, and at the end of that data file in the Object Browser "[Package causfinder version 1.0 Index]" appears. The names of all the other data files are still in Turkish, and at the end of those data files the phrase "[Package causfinder version 1.0 Index]" does not appear! I did not do anything special to the one data file whose renaming worked correctly.
Any help will be greatly appreciated.
Step by Step Solution: (Notebook with 32-bit Windows; GUI: Revolution R Enterprise (32-bit))
1. Prepare the environment by cleaning related folders:
1a. Delete the package folder in R's library location via Windows Explorer:
(I have only 1 such location: I deleted "C:\Revolution\R-Enterprise-7.1\R-3.0.2\library\causfinder" folder)
(Run ".libPaths()" to see R's library locations and delete the package's folder from all R's library locations)
1b. Delete the package folder in R's working directory via Windows Explorer:
(I have only 1 such location: I deleted "C:\Users\erdogan\Documents\Revolution\causfinder" folder)
(Run "getwd()" to see R's working directory and delete the package's folder from all R's working directories)
1c. Delete the "32_7.1" or "64_7.1" folder (32bit, 64 bit R, whichever you use) from R's working directory via Windows Explorer:
The package's .xml file in this folder may sometimes interfere with R commands and give unexpected results.
Delete the "C:\Users\erdogan\Documents\Revolution\32_7.1" folder, where "PackageXMLs\causfinder.xml" is located.
(When Revolution R is restarted, the 32_7.1 (or 64_7.1) folder is created automatically if it doesn't exist.)
Restart Revolution R now.
2. Create .rda and .Rd files via R and put them in relevant location.
(For the English ones, I created the .rda and .Rd files like this:
V6Stationary43Obs.df <-
read.csv("C:/Users/erdogan/Documents/Revolution/V6Stationary43ObsWithoutX.csv", header = TRUE, stringsAsFactors = FALSE)
# create V6Stationary43Obs.df.rda dataset file; df to denote data frame
save(V6Stationary43Obs.df, file="V6Stationary43Obs.df.rda")
prompt(V6Stationary43Obs.df) # creates V6Stationary43Obs.df.Rd help file
)
(For the Turkish ones, I had done the following earlier. The object name must be the same in the assignment, save() and prompt():
D6Duragan43Gozlem.vc <- read.csv("C:/Users/erdogan/Documents/Revolution/D6Duragan43GozlemXsiz.csv", header = TRUE, stringsAsFactors = FALSE)
# create D6Duragan43Gozlem.vc.rda dataset file
save(D6Duragan43Gozlem.vc, file="D6Duragan43Gozlem.vc.rda")
prompt(D6Duragan43Gozlem.vc) # creates D6Duragan43Gozlem.vc.Rd help file
)
3. Take the .rda and .Rd files (created in Step2) to the "data" and "man" folder in R's working directory via Windows Explorer:
V6Stationary43Obs.df.rda dataset file --> C:\Users\erdogan\Documents\Revolution\causfinder\data
V6Stationary43Obs.df.Rd help file --> C:\Users\erdogan\Documents\Revolution\causfinder\man
4. Fill at least the "Title" and "Description" tags of .Rd files (created in Step3) via R:
"File - Open - File... - V6Stationary43Obs.df.Rd"
\title{
V6Stationary43Obs is..... .
}
\description{
V6Stationary43Obs does..... .
}
5. Apply roxygenization:
library(roxygen2)
library(devtools)
library(digest)
roxygenize("causfinder")
build("causfinder")
install("causfinder")
library(causfinder)
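Step 2 above can be tried in isolation before touching the package. The sketch below is a self-contained round trip with an invented data frame and temporary paths (nothing here comes from causfinder): it writes a CSV, reads it back, saves it as .rda, and then checks which object name the file actually contains.

```r
# Made-up CSV standing in for the real source file
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c("x", "y", "z")),
          csv_path, row.names = FALSE)

# Read it and save it as an .rda file, as in step 2
Example.df <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
rda_path <- file.path(tempdir(), "Example.df.rda")
save(Example.df, file = rda_path)

# Load into a fresh environment to see the object name stored in the file
check_env <- new.env()
loaded <- load(rda_path, envir = check_env)
print(loaded)
```

The name returned by load() is the name the dataset will have once it is in the package, regardless of what the .rda file itself is called, so it is worth confirming that it is the English name.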
Solution: (Notebook with 64-bit Windows; GUI: Revolution R Enterprise (32-bit))
The above process is performed with the following additional actions:
1. The datasets are created as usual (for example; ".df" to denote data frame):
X.df <- read.csv("C:/Users/erdogan/Documents/Revolution/X.csv", header = TRUE, stringsAsFactors = FALSE)
save(X.df, file="X.df.rda") # X.df.rda dataset is created
prompt(X.df) #X.df.Rd help file is created.
2. Close Revolution R and delete the folder "32_7.3" in the working directory. We delete this folder because the information about our package (here: causfinder), such as its functions and data sets, is stored as an .xml file in 32_7.3, and this .xml file does not update itself when we add data sets one by one.
3. Open Revolution R (the folder "32_7.3" is re-created). At this point, perform Step 5 above (Apply roxygenization). Once this is done, the datasets are visible in Revolution R's Object Browser.
4. To check that everything worked: go to the "data" folder in the package's library location. Here, you should see only ".rdb, .rds, .rdx" files.
Solution: (Notebook with 64-bit Windows; GUI: Revolution R Enterprise (64-bit))
Apply the steps for 32-bit. If everything is OK, you are done. If not (i.e. the datasets are not visible in the Object Browser of Revo R, and the data folder of the package in the library location does not contain only the .rdb, .rds, .rdx files), do not panic:
You can still work with the datasets you created, which appear as .rda files in the package's library location, by using the "data" command (note that the package name is a quoted string):
data(YourDatasetName, package = "causfinder", lib.loc = YourLibraryLocation)
See ?data for more on this command. After running it, the dataset appears in the Object Browser as an object in the Global Environment.
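The same data() call can be tried on a package that ships with R. This sketch uses the women dataset from the datasets package as a stand-in for a causfinder dataset, and loads it into a separate environment (`demo_env`, a name chosen here for illustration) rather than the Global Environment so that the effect is easy to inspect.

```r
# A scratch environment to receive the dataset
demo_env <- new.env()

# package is given as a character string; envir controls where the object lands
data("women", package = "datasets", envir = demo_env)

print(ls(demo_env))
print(head(demo_env$women, 3))
```

Leaving out `envir` puts the object in the Global Environment, which is the behaviour described above.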

Exporting data in Roxygen2 so that they are available without requiring data()

After reading questions such as this SO question on documenting a data set with Roxygen I have managed to document a dataset (which I will refer to as cells) and it now appears in the list generated by data(package="mypackage") and is loaded if I run the command data(cells). After this, cells will appear when ls() is run.
However, in many packages the data is immediately available without requiring a data() call. Also, the data names do not appear when ls() is run. An example is the baseball data set that comes with plyr. I have looked at the source for plyr and I cannot see how this is done.
In the DESCRIPTION file of your package make sure that there is a field called LazyData that is set to TRUE.
From the "Writing R Extensions" guide:
The ‘data’ subdirectory is for data files, either to be made available
via lazy-loading or for loading using data(). (The choice is made by
the ‘LazyData’ field in the ‘DESCRIPTION’ file: the default is not to
do so.)
I think the exact syntax changed with R version 2.14; before that it was LazyLoad not LazyData.
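As a rough illustration, the field can be checked by reading DESCRIPTION with read.dcf(), and the effect of lazy loading can be seen with a dataset from the datasets package. The DESCRIPTION written below is a made-up three-line file in a temporary directory, not a complete or real one, and women stands in for the cells dataset from the question.

```r
# Hypothetical minimal DESCRIPTION with the LazyData field set
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c("Package: mypackage",
             "Version: 0.0.1",
             "LazyData: true"), desc_path)

# Read the field back
lazy <- unname(read.dcf(desc_path, fields = "LazyData")[1, "LazyData"])
print(lazy)

# Lazy-loaded data is reachable without data() and is not listed by ls()
library(datasets)
visible_without_data_call <- exists("women")
listed_by_ls <- "women" %in% ls(globalenv())
print(c(visible = visible_without_data_call, in_ls = listed_by_ls))
```

After changing the field in a real package, the package has to be rebuilt and reinstalled before the datasets become available this way.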

Resources