How to share objects between vignettes in an R package?

I'm writing documentation for my R package using vignettes (which will be embedded in a pkgdown website).
My question is:
If I create an R object in a chunk within a first vignette, "aa":
myobject <- mypkg::myfct()
How can I reuse this object in a second vignette called "bb"?
verif <- myobject[myfilters,]
I get this error: myobject not found

Sharing an object directly between vignettes doesn't seem like a good idea: each vignette is built in its own R session, so objects created in one are not visible in another.
Instead, you can save the data in an internal .rda file and load it in the second vignette. See Chapter 14, "External data", in R Packages.
So you'd have to:
Create a data-raw folder containing a script that creates this object and then saves it. This is facilitated by usethis::use_data_raw(), called with a name such as "myfct_example".
Then, at the bottom of this script in data-raw, ensure that the usethis::use_data() call has internal = TRUE. This saves the object in R/sysdata.rda instead of data/myfct_example.rda (if that is what you call the object).
Ensure that you re-run the data-raw/myfct_example.R script every time you change myfct_example, so that the new version gets stored in R/sysdata.rda.
In both vignettes, this data object is then available as mypkg:::myfct_example.
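The steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame stands in for the result of mypkg::myfct(), a temporary directory plays the role of the package's R/ folder, and save() is used in place of the usethis::use_data() call shown in the comment.

```r
# data-raw/myfct_example.R (hypothetical script name)

# Stand-in for mypkg::myfct(); replace with the real call.
myfct_example <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# In a real package the script would end with:
#   usethis::use_data(myfct_example, internal = TRUE, overwrite = TRUE)
# Here a temporary directory stands in for the package's R/ folder.
r_dir <- file.path(tempdir(), "mypkg", "R")
dir.create(r_dir, recursive = TRUE, showWarnings = FALSE)
sysdata_path <- file.path(r_dir, "sysdata.rda")
save(myfct_example, file = sysdata_path)

# Reading the object back, as either vignette could do after installation
# via mypkg:::myfct_example.
restored <- new.env()
load(sysdata_path, envir = restored)
verif <- restored$myfct_example[restored$myfct_example$value > 3, ]
```

Because the object is stored with the package rather than created in a vignette, neither vignette depends on the other having been built first.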

Related

How to include raw data in an R package

I'm working on the final assignment of the course Building R Packages.
In this assignment, we need to create an R package based on some example functions provided by the instructors. We need to organize and document the package, then make it available on GitHub. My package is called FARS and is already available in this GitHub repo.
I'm having trouble with making raw data available with the package. After following the instructions provided in the course's readings and also in chapter 14.3 of the book Building R Packages, the files are still not being recognized.
What did I do so far?
Prepared all the package's documentation, including roxygen2 tags, DESCRIPTION, README.Md, and vignette, following these steps in addition to instructions provided in the readings and book mentioned;
Created a subdirectory named inst/extdata in the package's directory;
Copied all three example files (.csv.bz2) with raw data to inst/extdata;
Tested the functions using testthat;
Installed my FARS package.
Now I'm trying to check if one of the files is available after installing the package:
system.file("extdata", "accident_2013.csv.bz2",
package = "FARS",
mustWork = TRUE)
I get an error message:
Error in system.file("extdata", "accident_2013.csv.bz2", package = "FARS", :
no file found
These data files need to be available with the package, so the examples provided in the vignette work properly.
Here's a "real-life" example, using a simple package I wrote recently.
I have a "data" directory in the build directory.
EDIT: To clarify the comments found in R-exts, the directory tree packagename/inst/extdata is intended for data that your functions call directly, by specifying that directory path. Since you want to load data into your workspace, use the data directory.
My "data" directory contains one file named preciseNumbersAsChar.r. That file contains assignments such as
charE <- {long number string}
If you read the help page for the command data, it explains that files ending in .r are sourced when called.
library(FunWithNumbers)
data('preciseNumbersAsChar') #works
Which is to say, the defined objects are now in my environment.
It's worth reading the help page for data in detail as different file types are handled slightly differently.
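For the system.file() check in the question, it can help to list what the installed copy of a package actually contains. The helper below is a hypothetical sketch (list_extdata is not part of any package); it does not diagnose why a file is missing, it only shows which files ended up under extdata after installation.

```r
# Hypothetical helper: list the files installed under a package's extdata
# directory, or return character(0) if the package has no such directory.
list_extdata <- function(pkg) {
  # With the default mustWork = FALSE, system.file() returns "" instead of
  # raising an error when nothing is found.
  ext_dir <- system.file("extdata", package = pkg)
  if (!nzchar(ext_dir)) {
    return(character(0))
  }
  list.files(ext_dir)
}

# The stats package ships no extdata directory, so this is empty.
stats_files <- list_extdata("stats")

# For the package in the question one would call list_extdata("FARS").
```

If the list comes back empty for a package whose source tree has files in inst/extdata, the installed copy and the source tree are out of sync, and reinstalling is a reasonable first thing to try.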

How to use an R package's own data when writing a vignette

I have written most of an R package and now wish to write a vignette that uses my own data, that is already in the package. The data is correctly stored as my_data.Rda in the Data folder, and when the package is loaded I can access it in the Console, for instance by using data(my_data).
My problem comes when, using usethis::use_vignette("my_vignette") , I want to include something like this (much more complex in practice, of course) in the vignette:
The mean of my_data is given by
```{r}
data(my_data)
mean(my_data)
```
When I knit the vignette I get the message
"Error in assert_engine(is_numeric, x, .xname = get_name_in_parent(x),
: object 'my_data' not found"
I have looked at this post: How to add external data file into developing R package? but that addresses external data.
What am I doing wrong?
I have created a minimal R package with the relevant Rmd file in the vignettes folder. link to Github
I think you are supposed to use
data(my_dataset, package = "my_package")
to load your package's data into the session where your vignette is built.
Could you confirm that your datasets are stored inside the ./data directory of your package as *.rda files?
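As a small illustration of the data(..., package = ...) form, the sketch below loads the swiss dataset from the datasets package, which stands in for my_dataset and my_package. Loading into a separate environment is an optional choice made here to keep the example tidy; in a vignette chunk the default environment is usually fine.

```r
# Load a dataset by naming the package that provides it.
# "swiss" and "datasets" stand in for your own dataset and package names.
vignette_env <- new.env()
data("swiss", package = "datasets", envir = vignette_env)

# The object is now available under its own name in that environment.
mean_fertility <- mean(vignette_env$swiss$Fertility)
```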

R Package Build/Install Error: "object not found" even though I have it in R/sysdata.rda

Similar Question
accessing sysdata.rda within package functions
Why This Similar Question Does Not Apply To Me
They were able to actually build it, and apparently it was a Github error for them (not related)
R VERSION
3.4.2 (I tried using 3.4.3 as well but the same problem occurred)
EDIT: I am on Windows 10
Context
I have fully read the following tutorial on R packages and how to include .Rda files in them. I have LazyData in my DESCRIPTION file set as true as well. I have tried both the data/ folder implementation and the R/sysdata.rda implementation using the function devtools::use_data() with the respective options of internal = FALSE and internal = TRUE.
However, when I try to build the package, or use devtools::install (which builds as well I assume), it fails and gives me the following error message:
Error in predict(finalModel, newInput) : object 'finalModel' not found
Where finalModel is stored within my .rda file.
Does anyone know any possible reasons why this might occur?
I also asked a coworker to install the package on his machine, but unfortunately he got the exact same error.
I made another test package as a 'sanity-check' by creating a simple linear model using the lm() function on datasets::swiss, and then made a test package with this newly created model as a .rda file. When I referenced this test model in a function within this test package, it eerily worked, despite the fact that (to the best of my knowledge) I used the exact same steps to create this new R package.
Also, I unfortunately cannot share the code for the package I am creating, but I am willing to share the code for the test package that uses the swiss dataset.
Thank you in advance.
EDIT: My .rda file I am putting in the package was created last year, if that has anything to do with it.
I just solved a similar issue of having object 'objectName' not found that arose during package management. In my case, the issue related to losing the context of variables when working with parallelization.
When using parallel::clusterExport(cl, varlist=c("function-name")), clusterExport looks in .GlobalEnv for variable definitions by default. This didn't come up during debugging, as I always had the variables defined in .GlobalEnv. The solution was to state the environment explicitly: parallel::clusterExport(cl, varlist=c("function-name"), envir=environment()). This ensures that the parallel processes can see the variables from the data/ folder and R/sysdata.rda.
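A minimal sketch of the envir = environment() fix, assuming a one-worker local cluster. The names run_with_local_export and local_scale are hypothetical and only serve the illustration.

```r
library(parallel)

run_with_local_export <- function() {
  # This variable exists only inside the function, not in .GlobalEnv.
  local_scale <- 10

  cl <- makeCluster(1)
  on.exit(stopCluster(cl), add = TRUE)

  # envir = environment() tells clusterExport to look the name up in this
  # function's frame instead of the default .GlobalEnv.
  clusterExport(cl, varlist = "local_scale", envir = environment())

  # The expression is evaluated on the worker, which now has local_scale.
  unlist(clusterEvalQ(cl, local_scale * 2))
}

result <- run_with_local_export()
```

Without the envir argument, the clusterExport call would fail here with an "object not found" error, because local_scale is not defined in the global environment.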
If you have more than one internal object, you must save them together in a single call, because they all go into the same R/sysdata.rda file:
usethis::use_data(file_1,
file_2,
file_3,
internal = TRUE,
overwrite = TRUE)
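The reason for saving everything in one call can be illustrated with base R's save(), used here as a stand-in for usethis::use_data(..., internal = TRUE, overwrite = TRUE). The assumption is that both write a single .rda file and replace its previous contents; the object values are made up for the example.

```r
# A temporary file stands in for R/sysdata.rda.
rda_path <- tempfile(fileext = ".rda")

file_1 <- 1:3
file_2 <- letters[1:3]
file_3 <- c(TRUE, FALSE)

# Saving one object at a time: each call replaces the file's contents.
save(file_1, file = rda_path)
save(file_2, file = rda_path)
only_last <- load(rda_path, envir = new.env())  # names of objects in the file

# Saving all objects in one call keeps all of them.
save(file_1, file_2, file_3, file = rda_path)
all_three <- load(rda_path, envir = new.env())
```

After the two separate calls only file_2 is in the file; after the combined call all three objects are.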

Generate a call to a package function programmatically given vector of package names

In my work I develop R packages that export R data objects (.RData). The name of these .RData files is always the same (e.g. files.RData). These packages also define and export a function that uploads the data to my database, say upload_data(). Inside upload_data() I first load the data using data(files, package = "PACKAGE NAME") and then push it into my database.
Let's say I have two packages, package1 and package2, which live on my file system. Given a vector of the package names (c("package1", "package2")), how would I go about calling upload_data() programmatically? Specifically, inside a script, how would I construct and evaluate a call using "::" notation, like package1::upload_data()? I tried call() but couldn't get it right.
You could go the route of constructing the call using :: notation and evaluating that - but it's probably just easier to directly use get and specify the package you want to grab from.
get("upload_data", envir = asNamespace("package1"))
will return the same function as package1::upload_data would, but is much easier to deal with programmatically.
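Extending this to a vector of package names might look like the sketch below. Since package1 and package2 are not available here, stats and stats4 with the function name coef are used as stand-ins; both ship with R and both provide a function of that name.

```r
# Stand-ins for c("package1", "package2") and "upload_data".
pkgs <- c("stats", "stats4")
fn_name <- "coef"

# Look the function up in each package's namespace by name.
# mode = "function" makes get() skip non-function objects of the same name.
fns <- lapply(pkgs, function(pkg) {
  get(fn_name, envir = asNamespace(pkg), mode = "function")
})
names(fns) <- pkgs

# Each element can then be called like pkg::fn_name(). For the question's
# packages, which take no arguments, that would be:
#   for (f in fns) f()
```

Note that asNamespace() also gives access to functions that are not exported; getExportedValue(pkg, fn_name) is an alternative if only exported functions should be found.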

accessing sysdata.rda within package functions

I thought that putting an internal dataset for a package into R/sysdata.rda would make the data accessible to my functions. But I can't seem to figure out how to actually access this dataframe. None of the documentation actually says how to access the data, but my guess was that I could simply refer to the dataframe by name. However, this does not seem to work.
I used devtools::use_data() with internal = TRUE and sysdata.rda was created. Lazy-loading is set to TRUE.
To test it, I manually loaded it just to make sure it was the right file. The file is called nhanes_files. Within my function, I simply refer to the nhanes_files object and extract the necessary data. When I tested my function in my package project, it seemed to work. When I build and load the package, upload to GitHub, and then install the package into a new project, I get an error: Error in find_data() : object 'nhanes_files' not found
Do I need to do something else to make this internal data accessible to my functions?
Below is the most basic function, which is not working:
#' Print NHANES file listing
#'
#' Provides access to the internal data listing all NHANES files
#'
#' @return A data frame with the list of files that can be accessed through the NHANES website. Should not generally be used. Present for debugging purposes and transparency.
#' @export
find_data <- function() {
  nhanes_files
}
If your package name is somepackage and the object was saved as nhanes_files with devtools::use_data(nhanes_files, internal = TRUE), then you can access it in your functions by calling somepackage:::nhanes_files.
Note that there are three colons (:::) here, not two.
I use myobject <- get0("myobject", envir = asNamespace("mypackage")).
This formulation passes R CMD check. The name on the left-hand side can be changed, and the same approach works for accessing objects in other loaded packages.
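A short sketch of the get0() approach, using the stats namespace because mypackage is not available here. print.lm is just a convenient example of an object that lives in that namespace, and the second name is deliberately made up to show what happens when nothing is found.

```r
# Substitute your own package and object names for "stats" and "print.lm".
ns <- asNamespace("stats")

# inherits = FALSE restricts the lookup to the namespace itself.
obj <- get0("print.lm", envir = ns, inherits = FALSE)

# get0() returns NULL (its default ifnotfound value) instead of raising an
# error when the object does not exist.
missing_obj <- get0("no_such_object_xyz", envir = ns, inherits = FALSE)
```

Checking the result with is.null() gives a function the chance to fail with a clearer message than "object not found".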
