R testthat: use external package only in test file - not in DESCRIPTION - r

I'm trying to run a testthat script using GitHub Actions.
I would like to test a functionality of my function that allows it to be combined with (many) external packages. Now I want to test these external packages for the R CMD Check but I don't want to load the external packages generally (i.e. putting them into the Description) - after all, most people will not use these external packages.
Any ideas how to just include an external package in the testing files but not in the DESCRIPTION?
Thanks!

I think you describe a very standard use of Suggests.
I see two related but separable issues:
You want to test something using CI, in this case GHA. That is fine. Because you control the execution of the code, you could move your code from the test runner to, say, inst/examples and call it explicitly. That way the standard check of 'is the package using undeclared code' passes as inst/examples is not checked
You want to not force other people to have to load these packages. That is fine too, and we have Suggests: for this! Read Section 1.1 of Writing R Extensions about all the detailed semantics. If your package invokes other packages via tests, the every R CMD check touches this (and the external packages) so they must be declared. But you already know that only "some" people will want to use this "some of the time": that is precisely what Suggests: does, and you bracket the use with if (requireNamespace(pkgHere, quietly=TRUE)).
You can go either way, or even combine both. But you cannot call packages from tests and not declare them.

Related

Run-time vs develop-time dependencies in R

I'm developing a package (golem) in R, and it returns a NOTE about excess package in an Import (DESCRIPTION):
checking package dependencies … NOTE
Imports includes 34 non-default packages.
Importing from so many packages makes the package vulnerable to any of
them becoming unavailable. Move as many as possible to Suggests and
use conditionally.
I have allocated some packages in Suggests (DESCRIPTION), like this:
usethis::use_package(package = "ggplot2", type = "Suggests")
usethis::use_package(package = "MASS", type = "Suggests")
I would like to know :
What is the difference between Imports (run-time) vs Suggests (develop-time) and if the latter has anything to do with the term "compile time" of other programming languages.
How do I know a package is needed by the user at runtime? Is there any universal rule for this (like a phrase to help you know)? And for Suggests?
In R, packages listed in the Imports clause of the DESCRIPTION file must be available or your package won't load. Normally they will all be loaded when your package is loaded, though it's possible to delay that by not importing anything, just using :: notation to access them.
Packages listed in the Suggests clause don't need to be available, and won't be automatically loaded. To access their functions, you normally call requireNamespace() to find out if the package is available, and if so use :: for access. If it is not available, your package should fail gracefully in whatever the user was trying to do, letting them know that they need to install the missing package if they want the task to succeed.
These aren't really "run-time" versus "develop-time" differences. It's all run-time.
There are two things in R that might be called "compile-time" in other languages. The best match is installing your package. That configures it to the particular R version and platform it is running on. R also has a "just-in-time" compiler that optimizes functions, but other than a bit of a speed increase that is pretty much invisible to the user.
I think #r2evans answered your second question clearly in a comment: the user needs a package to use functions that use that package. If some of your functions that use it are unlikely to be used by most users, use Suggests, and add the test.

add and Rcpp file to an existing r Package?

I have already made a simple R package (pure R) to solve a problem with brute force then I tried to faster the code by writing the Rcpp script. I wrote a script to compare the running time with the "bench" library. now, how can I add this script to my package? I tried to add
#'#importFrom Rcpp cppFunction
on top of my R script and inserting the Rcpp file in the scr folder but didn't work. Is there a way to add it to my r package without creating the package from scratch? sorry if it has already been asked but I am new to all this and completely lost.
That conversion is actually (still) surprisingly difficult (in the sense of requiring more than just one file). It is easy to overlook details. Let me walk you through why.
Let us assume for a second that you started a working package using the R package package.skeleton(). That is the simplest and most general case. The package will work (yet have warning, see my pkgKitten package for a wrapper than cleans up, and a dozen other package helping functions and packages on CRAN). Note in particular that I have said nothing about roxygen2 which at this point is a just an added complication so let's focus on just .Rd files.
You can now contrast your simplest package with one built by and for Rcpp, namely by using Rcpp.package.skeleton(). You will see at least these differences in
DESCRIPTION for LinkingTo: and Imports
NAMESPACE for importFrom as well as the useDynLib line
a new src directory and a possible need for src/Makevars
All of which make it easier to (basically) start a new package via Rcpp.package.skeleton() and copy your existing package code into that package. We simply do not have a conversion helper. I still do the "manual conversion" you tried every now and then, and even I need a try or two and I have seen all the error messages a few times over...
So even if you don't want to "copy everything over" I think the simplest way is to
create two packages with and without Rcpp
do a recursive diff
ensure the difference is applied in your original package.
PS And remember that when you use roxygen2 and have documentation in the src/ directory to always first run Rcpp::compileAttributes() before running roxygen2::roxygenize(). RStudio and other helpers do that for you but it is still easy to forget...

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" an R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow even for people who have not worked on the project before and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best-practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others :
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within R Studio. Combined with roxygen2, it enables you to document precisely every function, and makes the code clearer since you can avoid commenting with inline comments (but please do so anyway if needed)
You can specify quite easily which dependancies your package will need, so that every one knows what to install for your project to work. You can also use packrat if you want to mimic python's virtualenv
R also provide a long format documentation system, which are called vignettes and are similar to a printed notebook : you can display code, text, code results, etc. This is were you will write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed they are automatically included and available for all users.
The only downside is the following : since R is a functional programming language, a package consists of mainly functions, and some other relevant objects (data, for instance), but not really scripts.
More details about the last point if your project consists in a script that calls a set of functions to do something, it cannot directly appear within the package. Two options here : a) you make a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance) ; b) you make the whole script appear in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this :
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole work from a terminal or a distant machine with Rscript, with the following (using argparse to add arguments)
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally if you write a bash file calling Rscript, you can automate everything !
Feel free to read Hadley Wickham's book about R packages, it is super clear, full of best practices and of great help in writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set up the random seed, so the outputs should be reproducible.
Documentation is important: you can use the Roxygen skeleton in rstudio (default ctrl+alt+shift+r).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script, that uses the others.
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.

What is the difference between library()/require() and source() in r?

Looked around and still not sure what is the difference between library()/require() and source() in r? According to this SO question: What is the difference between require() and library()? it looks like library() and require() are the same thing, maybe one is legacy. Is source() for lazy developers that don't want to create a library? When do you use each of these constructs?
The differences between library and require are already well documented in What is the difference between require() and library()?.
So, I will focus on how source differs from these. In fact they are fundamentally quite different commands. Neither library nor require actually execute any code. They simply attach a namespace, in a lazy fashion, meaning that individual functions in the package are not run unless they are actually called later. Source on the other hand does something quite different which is to execute all of the code in the file at that time.
A small caveat: packages can be made to actually run some code at the time of package loading or attaching, via the .onLoad and .onAttach functions. Have a look here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/ns-hooks.html
source runs the code in a .R file, line by line.
library and require load and attach R packages.
Is source() for lazy developers that don't want to create a library?
You're correct that source is for the cases when you don't have a package. Laziness is not the only reason, sometimes packages are not appropriate---packages provide functionality, but don't do things. Perhaps I have a script that pulls data from a database, fits a model, and makes some predictions. A package may provide functions to help me do that, but it does not actually do it. A script saved in a .R file and run with source() could run the commands and complete the task.
I do want to address this:
it looks like library() and require() are the same thing, maybe one is legacy.
They both do the same thing (load and attach a package). The main difference is that library() will throw an error and stop the script if the package is not available, whereas require() will return TRUE or FALSE depending on its success. The general consensus is that library is better so that your script stops with a nice clear error and you can install the missing package before proceeeding. The question linked has a more thorough discussion which I won't try to replicate here.

Where to put R files that generate package data

I am currently developing an R package and want it to be as clean as possible, so I try to resolve all WARNINGs and NOTEs displayed by devtools::check().
One of these notes is related to some code I use for generating sample data to go with the package:
checking top-level files ... NOTE
Non-standard file/directory found at top level:
'generate_sample_data.R'
It's an R script currently placed in the package root directory and not meant to be distributed with the package (because it doesn't really seem useful to include)
So here's my question:
Where should I put such a file or how do I tell R to leave it be?
Is .Rbuildignore the right way to go?
Currently devtools::build() puts the R script in the final package, so I shouldn't just ignore the NOTE.
As suggested in http://r-pkgs.had.co.nz/data.html, it makes sense to use ./data-raw/ for scripts/functions that are necessary for creating/updating data but not something you need in the package itself. After adding ./data-raw/ to ./.Rbuildignore, the package generation should ignore anything within that directory. (And, as you commented, there is a helper-function devtools::use_data_raw().)

Resources