I am currently working on a GitHub-based R package and I would like to add markdown text containing coding style guidelines. The problem is that I can't find the best place to put that information. According to Writing R Extensions, I can't just add a STYLE.md file without having the package fail basic build checks. They say:
The sources of an R package consists of a subdirectory containing a
files DESCRIPTION and NAMESPACE, and the subdirectories R, data, demo,
exec, inst, man, po, src, tests, tools and vignettes (some of which
can be missing, but which should not be empty). The package
subdirectory may also contain files INDEX, configure, cleanup,
LICENSE, LICENCE and NEWS. Other files such as INSTALL (for
non-standard installation instructions), README/README.md, or
ChangeLog will be ignored by R, but may be useful to end users.
So my current solution is to integrate the style guide into README.md file, but I don't like to have it there, because the guide is a secondary thing written for the package contributors and not the users; it should not be visible for everybody visiting the package's homepage (which is what happens for anything written in README.md and pushed to GitHub).
Another thing I thought about doing was to create a vignette for that; the developers of the rockchalk package did just that. However, my guide is much shorter, I don't think it fits well with the spotlight and printing-friendly format of vignettes.
Where else could/should I put the style guide?
Related
I need to "industrialize" an R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow even for people who have not worked on the project before and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best-practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others :
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within R Studio. Combined with roxygen2, it enables you to document precisely every function, and makes the code clearer since you can avoid commenting with inline comments (but please do so anyway if needed)
You can specify quite easily which dependancies your package will need, so that every one knows what to install for your project to work. You can also use packrat if you want to mimic python's virtualenv
R also provide a long format documentation system, which are called vignettes and are similar to a printed notebook : you can display code, text, code results, etc. This is were you will write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed they are automatically included and available for all users.
The only downside is the following : since R is a functional programming language, a package consists of mainly functions, and some other relevant objects (data, for instance), but not really scripts.
More details about the last point if your project consists in a script that calls a set of functions to do something, it cannot directly appear within the package. Two options here : a) you make a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance) ; b) you make the whole script appear in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this :
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole work from a terminal or a distant machine with Rscript, with the following (using argparse to add arguments)
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally if you write a bash file calling Rscript, you can automate everything !
Feel free to read Hadley Wickham's book about R packages, it is super clear, full of best practices and of great help in writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set up the random seed, so the outputs should be reproducible.
Documentation is important: you can use the Roxygen skeleton in rstudio (default ctrl+alt+shift+r).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script, that uses the others.
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.
I am currently developing an R package and want it to be as clean as possible, so I try to resolve all WARNINGs and NOTEs displayed by devtools::check().
One of these notes is related to some code I use for generating sample data to go with the package:
checking top-level files ... NOTE
Non-standard file/directory found at top level:
'generate_sample_data.R'
It's an R script currently placed in the package root directory and not meant to be distributed with the package (because it doesn't really seem useful to include)
So here's my question:
Where should I put such a file or how do I tell R to leave it be?
Is .Rbuildignore the right way to go?
Currently devtools::build() puts the R script in the final package, so I shouldn't just ignore the NOTE.
As suggested in http://r-pkgs.had.co.nz/data.html, it makes sense to use ./data-raw/ for scripts/functions that are necessary for creating/updating data but not something you need in the package itself. After adding ./data-raw/ to ./.Rbuildignore, the package generation should ignore anything within that directory. (And, as you commented, there is a helper-function devtools::use_data_raw().)
I get this warning
Non-standard file/directory found at top level:
‘data-raw’
when building my package, even there is the recommendation of creating this folder to create package data http://r-pkgs.had.co.nz/data.html#data-sysdata
Any comments on that or do I need a specific setting to get rid of this message.
When used, data-raw should be added to .Rbuildignore. As explained in the Data section of Hadley's R-Packages book (also linked in the question)
Often, the data you include in data/ is a cleaned up version of raw data you’ve gathered from elsewhere. I highly recommend taking the time to include the code used to do this in the source version of your package. This will make it easy for you to update or reproduce your version of the data. I suggest that you put this code in data-raw/. You don’t need it in the bundled version of your package, so also add it to .Rbuildignore. Do all this in one step with:
usethis::use_data_raw()
I'm planning to condense some of my code into a package, and was looking at the source of a few published packages on CRAN as a guide. I notice many packages include the file R\zzz.R, so I presume there must be some convention surrounding this.
However, I cannot find any mention of zzz.R in the official Writing R Extensions guide. What is this file for, and do I need to include one in my package? Why is it named the way it is - why not zzzz.R?
It's a file where one usually puts actions on load of the package. It is tradition/convention that it's called zzz.R and could be called anything.R
You only need to include this if you want you package to do something out of the ordinary when it loads. Keep looking at what people put in there and you'll begin to get a sense of what they're used for.
This zzz.R file was also mentioned by Hadley Wickham in his book "R packages", at the bottom of "When you do need side-effects" section.
https://r-pkgs.org/Code.html#when-you-do-need-side-effects
If you use .onLoad(), consider using .onUnload() to clean up any side effects. By convention, .onLoad() and friends are usually saved in a file called R/zzz.R. (Note that .First.lib() and .Last.lib() are old versions of .onLoad() and .onUnload() and should no longer be used.)
I'm creating an R package and I need it to include a couple of non R script files which get called by one of my functions. I need these script files to be distributed with the package, naturally. So that leaves me with two questions:
a) In which directory of the package
tree should I place these files? b) Is that location mandatory or just convention?
Do I need to change any other
settings or configurations or will
they just get copied to the
directory mentioned in #1 and then I
can figure out the path using
system.file()?
I've tried to find the answer in the Writing R Extensions document, but it didn't jump out at me. And, of course, I didn't read the whole thing. Am I being too honest here?
I think you want either exec/ at the top-level (even though that is labeled 'still experimental, or subdirectory of inst as everything in inst/ gets copied verbatim into the package.
A quick example from the packages I have expanded in source is gdata which has inst/perl, inst/xls and inst/bin. These you could then call from R itself by computing the path of the installed package using system.file().