Including data and a vignette in an R package? Problems with size

I have been working on an R package. The package deals with large data: the data file used in the vignette example is 200-300 MB, although the whole package is only about 20-30 MB when compressed. If I used smaller data, the vignette example would not make any sense.
Since I plan to submit this to CRAN, what should I do? I read that you need to "justify the size of your package" and "If it’s still too large, consider moving data into its own package." (this is from R Packages by Hadley Wickham). What does this mean?
When running the check as CRAN would, I get a NOTE about package size (0 errors and 0 warnings). The package does not seem that big to me, and it is impossible to include less data because the vignette example would not make sense. Is this a valid reason for a package larger than 5 MB, or will it be rejected by CRAN?
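Before shrinking the example, it may be worth ruling out that the .rda files in data/ are simply not compressed as well as they could be. The sketch below uses the base functions tools::checkRdaFiles() and tools::resaveRdaFiles(); the helper name shrink_data_dir() and the package path are hypothetical, and recompression alone will not necessarily bring a 200-300 MB dataset under CRAN's limit.

```r
# Hypothetical helper: recompress the .rda/.RData files in <pkg_dir>/data
# and report their sizes (in bytes) before and after.
shrink_data_dir <- function(pkg_dir) {
  files <- list.files(file.path(pkg_dir, "data"),
                      pattern = "\\.(rda|RData)$", full.names = TRUE)
  if (length(files) == 0) stop("no .rda/.RData files found in data/")
  before <- tools::checkRdaFiles(files)
  # "auto" tries several compression methods and keeps the smallest result
  tools::resaveRdaFiles(files, compress = "auto")
  after <- tools::checkRdaFiles(files)
  data.frame(file = basename(files),
             before = before$size,
             after = after$size,
             compress = after$compress,
             row.names = NULL)
}
```

Run it from the package root as shrink_data_dir("."); the returned table shows which files still dominate the size.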
I thought about putting the data on my own server and loading it over the internet in the vignette Rmd (for example read.table(file = "https://link_to_file....")). However, I found out that all the data needed to build a vignette has to be in the vignettes folder.
Then I thought about excluding the vignette, the pkgdown site and the data (along with the .R file in the R folder that documents the data) from the build, and submitting the package to CRAN as it is, without the vignette and data. I would then put the vignette, the pkgdown site and the data in a separate, public GitHub repository and link to it from the README file.
Does this make sense to you? This way the check would give 0 errors, warnings and notes, but the package itself would lack complete documentation (which I would provide through the GitHub repository instead).
What do you think? Does submitting a package without a vignette increase the chances of rejection (even with 0 errors/warnings/notes in the check)? Or should I include the vignette and accept a package larger than 5 MB (which produces 1 NOTE in the check)? Is there a way to submit a package with a vignette but without the data, and still provide the data separately?
I have no idea how to proceed. Any suggestions are welcome, thank you...
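One way to act on the "move data into its own package" advice, sketched here under assumptions: put the large dataset in a separate data package hosted outside CRAN (for example in a drat repository), list it under Suggests in the main package, and let the vignette evaluate its chunks only when that package is installed. The package name mybigdata and the helper has_data_pkg() are hypothetical.

```r
# Hypothetical setup, not an official CRAN recipe:
#   DESCRIPTION of the main package:
#     Suggests: mybigdata, knitr, rmarkdown
#     Additional_repositories: <URL of the repository hosting mybigdata>

# TRUE if the (hypothetical) data package can be loaded, FALSE otherwise.
has_data_pkg <- function(pkg = "mybigdata") {
  requireNamespace(pkg, quietly = TRUE)
}

# In the vignette's first (setup) chunk, define or import has_data_pkg()
# and evaluate the remaining chunks only when the data is available:
#   knitr::opts_chunk$set(eval = has_data_pkg())
# The vignette text can then tell readers how to install mybigdata.
```

With this design the vignette still builds on CRAN's machines (its chunks are simply not run there), while users who install the data package get the full example.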

Related

Non-standard file "data-raw" note on building/checking a package in R

I get this note
Non-standard file/directory found at top level:
‘data-raw’
when building my package, even though the book recommends creating this folder for package data: http://r-pkgs.had.co.nz/data.html#data-sysdata
Any comments on that, or do I need a specific setting to get rid of this message?
When used, data-raw should be added to .Rbuildignore, as explained in the Data section of Hadley's R Packages book (also linked in the question):
Often, the data you include in data/ is a cleaned up version of raw data you’ve gathered from elsewhere. I highly recommend taking the time to include the code used to do this in the source version of your package. This will make it easy for you to update or reproduce your version of the data. I suggest that you put this code in data-raw/. You don’t need it in the bundled version of your package, so also add it to .Rbuildignore. Do all this in one step with:
usethis::use_data_raw()
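usethis::use_data_raw() takes care of the .Rbuildignore entry. For readers not using usethis, here is a minimal base-R sketch of that one step. .Rbuildignore holds one Perl-compatible regular expression per line, matched case-insensitively against paths relative to the package root, so the anchored pattern ^data-raw$ excludes only that top-level directory. The helper name add_build_ignore() is hypothetical.

```r
# Hypothetical helper: append an anchored pattern for `path` to
# <pkg_dir>/.Rbuildignore unless that exact pattern is already present.
add_build_ignore <- function(path, pkg_dir = ".") {
  ignore_file <- file.path(pkg_dir, ".Rbuildignore")
  # escape regex metacharacters such as "." and anchor the whole path
  escaped <- gsub("([.|()^${}+*?\\[\\]\\\\])", "\\\\\\1", path, perl = TRUE)
  pattern <- paste0("^", escaped, "$")
  existing <- if (file.exists(ignore_file)) readLines(ignore_file) else character()
  if (!pattern %in% existing) {
    writeLines(c(existing, pattern), ignore_file)
  }
  invisible(pattern)
}
```

From the package root, add_build_ignore("data-raw") writes ^data-raw$; usethis::use_build_ignore("data-raw") is the usethis equivalent.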

best way to link to a vignette from manual in an R package

I'm developing an R package, and I'm trying to link from the package's manual to its vignette (a PDF). I've written this in the documentation of the R function, and it works:
\link[=../doc/package.pdf]{package's User Manual}
The problem is that devtools::check() complains with a warning, which also delays the review when submitting to CRAN:
* checking Rd cross-references ... WARNING
Missing link or links in documentation object 'package.Rd':
'../doc/package.pdf'
Is there a better way of linking from the manual to a vignette, or is it not correct to do so at all? As the PDF can contain more graphical information, it seems desirable to be able to link to it.
If you use pkgdown to build a website for your package, you can link directly to the URL of the specific vignette.
Or you can just write
Run \code{vignette("NAME_OF_YOUR_VIGNETTE", package = "NAME_OF_YOUR_PACKAGE")} to see the corresponding vignette.
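A small sketch combining both suggestions, written as a raw Rd file so it can be parsed with base tools: the vignette is mentioned as code the reader can run, and the pkgdown page is given with \url{}, so there is no \link to a target that R CMD check cannot resolve. The package name mypkg, the vignette name intro and the URL are placeholders. Note that tools::checkRd() only runs the basic Rd checks, not the cross-reference check that produced the warning above.

```r
# Placeholder Rd for a hypothetical package "mypkg" with a vignette "intro".
rd_lines <- c(
  "\\name{mypkg-package}",
  "\\alias{mypkg-package}",
  "\\docType{package}",
  "\\title{Tools for Working with Large Data}",
  "\\description{",
  "  Run \\code{vignette(\"intro\", package = \"mypkg\")} to see the",
  "  corresponding vignette, or read it online at",
  "  \\url{https://example.org/mypkg/articles/intro.html}.",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# Parse the Rd source and run the basic Rd checks
# (an empty result means no problems were found).
rd <- tools::parse_Rd(rd_file)
problems <- tools::checkRd(rd)
print(problems)
```

With roxygen2 the same text would go into the #' comments of the package-level documentation.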

What is the difference between a manual and a vignette?

I've been reading through R's Affy manual, and it refers to other vignettes. Does the difference between these two terms simply relate to quantity of content, or is there more to it?
Reference Manuals
The reference manual of a package is a single document beginning with the package description and containing all of the content from the package's .Rd help files. Generally, this means it has the help files for the (exported) functions in the package, any documented data sets, and package-level documentation (if included). It is generated automatically from the .Rd sources.
Every package has a manual. Even a package with no exported functions or documentation would still have a manual (when built) consisting of the text from the DESCRIPTION file.
Vignettes
Vignettes are free-form documents. Generally, package authors use them to demonstrate how to use their package. They are optional: some packages have several (as I write this, dplyr has 8 vignettes) and many packages have none.
As mentioned in the comments, the Writing R Extensions manual is the definitive source for all things package-related. Here is Josh's link to the Vignettes section, and section 2.15, Processing documentation files, describes how reference manuals are built.
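To see the difference on an installed package, one can count its help pages (the pieces the reference manual is built from) and its vignettes. This sketch uses tools::Rd_db() and utils::vignette(); doc_summary() is a hypothetical helper name, and the vignette count depends on whether vignettes were installed with the package.

```r
# Hypothetical helper: summarise the two kinds of documentation of an installed package.
doc_summary <- function(pkg) {
  # Parsed .Rd files: these make up the reference manual
  rd <- tools::Rd_db(package = pkg)
  # Vignettes installed with the package (one row per vignette, possibly none)
  vig <- utils::vignette(package = pkg)$results
  list(help_pages = length(rd),
       vignettes = NROW(vig),
       vignette_titles = if (NROW(vig) > 0) unname(vig[, "Title"]) else character())
}

str(doc_summary("stats"))
```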

What methods exist for distributing a semi-live dataset with an R package?

I am building a package for internal use with devtools. I would like the package to load data from a file/connection (which differs depending on the date the package is built). The data is large-ish, so paying a one-time cost of parsing and loading the data when the package is built is preferable.
Currently, I have a data.R file under R/ that assigns the data to package-level variables; the values are assigned during package installation (or at least that's what appears to be happening). This less-than-ideal setup mostly works. However, to make all instances of the package have the same data, I have to distribute the data file with the package (currently a helper script copies it to inst/ before the package is built) instead of having everything packaged together. There must be a better way.
Such as:
Generate .rda files during package building (but this requires not running the same code during package install)
I can do this with a Makefile but that seems like overkill
Can I have R code that is only run during package building and not during install?
Run R code in data/
But the data is munged using code in the package in question. I can fix that with Collate (I think) but then I have to maintain the order of all of the .R files (but with that added complexity I might as well use a Makefile?)
Build two packages, one with all of the code I want, one with the data.
Obvious, clever things I've not thought of.
tl;dr: What are some methods for adding a snapshot of dynamically changing data to an R package frozen for deployment?
As Ben Bolker points out in the comments above, splitting the dataset out into a separate package has precedent in the community (most notably the core datasets package) and has additional benefits.
The separation of functions from data also makes working on historic versions of the data easier to do with the up to date functions.
I currently have a tools-to-munge package and a things-to-munge package. Using a helper script, I build tools-to-munge and set up a Suggests (or Depends) field in the DESCRIPTION of both packages pointing to the appropriate, incrementing version of the other package. After the new tools-to-munge package has been built, I can build the things-to-munge package as needed using the functions in tools-to-munge.
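One way the data package could freeze a snapshot, sketched in base R: run the munging code once at build time and save the result to data/ as an xz-compressed .rda, so installing the package only loads the stored object. save_snapshot() and its munge argument are hypothetical; in the setup above, munge would be a function exported by tools-to-munge.

```r
# Hypothetical helper: munge `x` once and store it as <pkg_dir>/data/<name>.rda
save_snapshot <- function(x, name, pkg_dir, munge = identity) {
  data_dir <- file.path(pkg_dir, "data")
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  # save() stores objects by name, so bind the munged value to `name` first
  env <- new.env()
  assign(name, munge(x), envir = env)
  path <- file.path(data_dir, paste0(name, ".rda"))
  save(list = name, envir = env, file = path, compress = "xz")
  invisible(path)
}
```

Because the object is saved at build time, no munging code runs during installation, which is what the question was after.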

R: Are vignettes not made by Sweave possible?

Can I include a PDF in the pkg/doc folder so that the vignette function works, even though no corresponding Rnw, Rtex, etc. source exists?
I am thinking of slides or documents containing markdown text weaved with R chunks, which have a different build process and hence different file extensions.
The Writing R Extensions guide suggests that it should be possible to include documents that cannot be built at installation time, but the vignette function seems to look for files with special extensions (Rnw, Rtex, etc.) and also for a file called vignette.rds.
Any hints are appreciated.
I asked about this several years ago, and while Fritz Leisch is amenable to the idea, he hasn't had the time to implement it.
(Cross-posted from a response I just left on R-help:)
As a workaround, you could include your own xvignette function in your package: see below.
It won't show you indices, but it will pick up any appropriately named file that you include in the inst/doc directory of your package ...
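xvignette <- function(vname, pkg, ext = "pdf") {
  # build the file name (e.g. "slides.pdf") and look for it in the installed doc/ directory
  vname <- paste(vname, ext, sep = ".")
  fn <- system.file("doc", vname, package = pkg)
  # system.file() returns "" when the file does not exist
  if (!nzchar(fn)) stop("file not found")
  # open with the system viewer; the unexported utils:::print.vignette expects a different object in newer R versions
  utils::browseURL(fn)
  invisible(fn)
}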
You'll have to somehow alert your package users to the fact that this alternative documentation exists -- perhaps in the help file for the package itself.
You might fill in the default value of pkg above with your package name to make it easier on the user: I thought about using some variant of getPackageName(environment(xvignette)) to do it automatically, but that seems too complicated ...
Brian Ripley also mentioned in his response to the question that:
At present vignette() means Sweave documents, as only they have metadata like titles. This is planned to be changed soon.
... but I don't know what "soon" means (it will be about 6 months until 2.14.0 comes out, I think)
edit: http://article.gmane.org/gmane.comp.lang.r.devel/28449 details another workaround (creating a dummy vignette that incorporates the existing PDF file)
edit 2: And here's what Yihui Xie has to say about including knitr-based vignettes in packages (essentially another "dummy vignette" strategy), as well as a vignette about non-Sweave vignettes from the R.rsp package.
This is supported natively as of R 3.0.0, see http://yihui.name/knitr/demo/vignette/.
Instructions for using knitr as the vignette engine boil down to:
add %\VignetteEngine{knitr::knitr} to the Rnw source document (note you still need %\VignetteIndexEntry{} as before)
specify VignetteBuilder: knitr in the package DESCRIPTION file
add Suggests: knitr in DESCRIPTION if knitr is needed only for vignettes
See also the official R documentation on that topic.
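Put together, the steps above might look like this for a fresh package. This base-R sketch writes a minimal Rnw vignette carrying both required headers; write_knitr_rnw() and the file name intro.Rnw are hypothetical, and the DESCRIPTION fields are shown only as comments.

```r
# Hypothetical helper: write vignettes/<name>.Rnw with the knitr engine headers.
# The package's DESCRIPTION would additionally need:
#   VignetteBuilder: knitr
#   Suggests: knitr
write_knitr_rnw <- function(pkg_dir, name = "intro", title = "Introduction") {
  vig_dir <- file.path(pkg_dir, "vignettes")
  dir.create(vig_dir, showWarnings = FALSE, recursive = TRUE)
  rnw <- c(
    "\\documentclass{article}",
    sprintf("%%\\VignetteIndexEntry{%s}", title),  # still required, as before
    "%\\VignetteEngine{knitr::knitr}",             # tells R to build this file with knitr
    "\\begin{document}",
    "<<example>>=",
    "summary(cars)",
    "@",
    "\\end{document}"
  )
  path <- file.path(vig_dir, paste0(name, ".Rnw"))
  writeLines(rnw, path)
  invisible(path)
}
```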

Resources