Using 'require' in package code to obtain data packages on the fly in R

I am writing an R package that uses a variety of bioconductor annotation data packages. The specific data packages vary with the use-case. As such, I have a function which does something like this:
if (!require(biocpack_name, character.only=T)) {
source("https://bioconductor.org/biocLite.R")
BiocInstaller::biocLite(biocpack_name)
require(biocpack_name , character.only=T)
}
biocpack_name can be any of 30+ annotation data packages, looked up based on the particular data being analysed. As such, I don't want to have to add each one to 'Suggests' (I'm not even sure that would work, because the complaint is not about a package but about a string specifying the package). R CMD check gives me this error:
'library' or 'require' call not declared from: ‘biocpack_name’
'library' or 'require' call to ‘biocpack_name’ in package code.
How do I get around this?

It's not an error, but a warning. It goes away if you use character.only = TRUE rather than T (I guess because the value of TRUE is known and cannot be re-assigned, but T is unknown and can be anything, including FALSE). But in addition follow the advice in the warning to use requireNamespace() (and not pollute the user search path); maybe db = get(biocpack_name, getNamespace(biocpack_name)) will allow you to use the annotation package the way you'd like, e.g., mapIds(db, ...).
If one were pedantic, adding the packages to the Enhances: field of the DESCRIPTION file would communicate that your package somehow works with the annotation packages, but does not result in installation of the package (e.g., for building the vignette) unless explicitly requested.
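The requireNamespace() advice above can be sketched as a small helper. This is only an illustration: the name load_annotation_db and its object_name argument are made up here, and it assumes (as in the get() call above) that the annotation package exports an object named after itself. The error message points to BiocManager::install(), which replaced biocLite() in current Bioconductor releases; nothing is installed automatically.

```r
# Hypothetical helper: fetch an object from a package whose name is only
# known at run time, without attaching the package to the search path.
load_annotation_db <- function(pkg_name, object_name = pkg_name) {
  if (!requireNamespace(pkg_name, quietly = TRUE)) {
    stop("Package '", pkg_name, "' is required but not installed. ",
         "Try BiocManager::install(\"", pkg_name, "\").",
         call. = FALSE)
  }
  # Look the object up inside the package namespace.
  get(object_name, envir = getNamespace(pkg_name))
}
```

With an annotation package such as org.Hs.eg.db installed, db <- load_annotation_db("org.Hs.eg.db") followed by mapIds(db, ...) should then work without a library() or require() call in package code.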

Related

How to declare a dependency on an R package from which you only use S3/S4 methods, but no exports?

Currently I have in my package DESCRIPTION, a dependency on dbplyr:
Imports:
dbplyr,
dplyr
dbplyr is useful almost solely because of the S3 methods it defines: https://github.com/tidyverse/dbplyr/blob/main/NAMESPACE. The actual functions you call to use dbplyr are almost entirely from dplyr.
By putting dbplyr in my Imports, it should automatically get loaded, but not attached, which should be enough to register its S3 methods: https://r-pkgs.org/dependencies-mindset-background.html#sec-dependencies-attach-vs-load.
This seems to work fine, but whenever I R CMD check, it tells me:
N checking dependencies in R code (10.8s)
Namespace in Imports field not imported from: ‘dbplyr’
All declared Imports should be used.
Firstly, why does R CMD check even check this, considering that it often makes sense to load packages without importing them. Secondly, how am I supposed to satisfy R CMD check without loading things into my namespace that I don't want or need?
I am pretty sure two of your assumptions are false.
First, putting Imports: dbplyr into your DESCRIPTION file won't load it, so its methods won't be loaded from that alone. Basically the Imports field in the DESCRIPTION file just guarantees that dbplyr is available to be loaded when requested. If you import something via the NAMESPACE file, that will cause it to be loaded. If you evaluate dbplyr::something that will cause it to be loaded. Executing loadNamespace("dbplyr") is another way, and there are a few others. You may also load some other package that loads it.
Second, I think you have misinterpreted the error message. It isn't saying that you loaded it without importing it (though it would complain about that too), it is saying that it can't detect any use of it in your package, so maybe it shouldn't be a requirement for installing your package.
Unfortunately, the code to detect uses is fallible, so it sometimes misses uses. Examples I've heard about are:
if the package is only used in the default value for a function argument. This has been fixed in R-devel.
if the package is only used during the build to construct some object, e.g. code like someclass <- R6::R6Class( ... ) needs R6, but the check code won't see it because it looks at someclass, not at the source code that created it.
if the use of the package is hidden by specifying the name of the package in a character variable.
if the need for the package is indirect, e.g. you need to use ggplot2::geom_hex. That needs the hexbin package, but ggplot2 only declares it as "Suggested".
These examples come from this discussion: https://github.com/hadley/r-pkgs/issues/828#issuecomment-1421353457 .
The recommended workaround there is to create an object that refers to the imported package explicitly, e.g. putting the line
dummy_r6 <- function() R6::R6Class
into your package is enough to suppress the note without actually loading R6. (It will be loaded if you ever call this function.)
However, your requirement is stronger: you do need to make sure dbplyr is loaded if you want its methods to be used. I'd put something in your .onLoad() function that triggers the load. For example,
.onLoad <- function(lib, pkg) {
# Make sure the dbplyr methods are loaded
loadNamespace("dbplyr")
}
EDITED TO ADD: As pointed out in the comments, there's a bug in the check code that means it won't detect this as being a use of dbplyr. You really need to do both things, e.g.
.onLoad <- function(lib, pkg) {
# Make sure the dbplyr methods are loaded
loadNamespace("dbplyr")
# Work around bug in code checking in R 4.2.2 for use of packages
dummy <- function() dbplyr::across_apply_fns
}
The function used in the dummy construction is arbitrary; it probably doesn't even need to exist, but I chose one that does.
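The difference between loading and attaching that this answer relies on can be checked directly. The sketch below uses the base package tools purely as a stand-in for dbplyr, so that it runs without extra installs; the variable names are illustrative.

```r
pkg <- "tools"  # stand-in for dbplyr; any installed package works
search_entry <- paste0("package:", pkg)

# Record whether the package was attached before we touch it.
was_attached <- search_entry %in% search()

# loadNamespace() loads the namespace (which registers its S3 methods)
# but does not put the package on the search path.
loadNamespace(pkg)

is_loaded <- isNamespaceLoaded(pkg)
is_attached <- search_entry %in% search()

# is_loaded is now TRUE, while is_attached is unchanged from was_attached.
```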

Creating R package using code from script file

I’ve written some R functions and dropped them into a script file using RStudio. These are bits of code that I use over and over, so I’m wondering how I might most easily create an R package out of them (for my own private use).
I’ve read various “how to” guides online but they’re quite complicated. Can anyone suggest an “idiot’s guide” to doing this please?
I've been involved in creating R packages recently, so I can help you with that. Before proceeding to the steps to be followed, there are some pre-requisites, which include:
RStudio
devtools package (for most of the functions involved in creation of a package)
roxygen2 package (for roxygen documentation)
In case you don't have the aforementioned packages, you can install them with these commands respectively:
install.packages("devtools")
install.packages("roxygen2")
Steps:
(1) Load devtools in RStudio by using library(devtools).
(devtools is a core package that makes creating R packages easier with its tools)
(2) Create your package by using:
create_package("~/directory/package_name") for a custom directory.
or
create_package("package_name") if you want your package to be created in current workspace directory.
(3) Soon after you execute this function, it will open a new RStudio session. In the old session you will see some auto-generated lines, which report that R is creating a new package with the required components in the specified directory.
After this, we are done with the old instance of RStudio. We will continue our work in the new RStudio session window.
At this point the package itself has been created (yes, it's that simple). However, a freshly created package isn't usable yet: each function you add needs documentation (its title, parameters, return value, examples and so on, written with tags such as @param and @return, which you'll recognise if you've seen roxygen documentation in GitHub repositories), and the package needs to pass R CMD check.
I'll get to that in the subsequent steps, but just in case you want to verify that your package is created, you can look at:
The top right corner of the new RStudio session, where you can see the package name that you created.
The console, where you will see that R created a new directory/folder in the path that we specified in create_package() function.
The files panel of RStudio session, where you'll notice a bunch of new files and directories within your directory.
(4) You mentioned that you keep your functions in a script file, so you first need to create a script inside the package, which can be done using:
use_r("function_name")
A new R script will pop up in your working session, ready to be used.
Now go ahead and write your function(s) in it.
(5) After you're done, you need to load the function(s) you have written for your package. This is accomplished with the devtools::load_all() function.
When you execute load_all() in the console, you'll know that the functions have been loaded when you see Loading package_name displayed in the console.
You can try calling your functions after that in the console to verify that they work as a part of the package.
(6) Now that your function has been written and loaded into your package, it is time to move onto checks. It is a good practice to check the whole package as we make changes to our package. The function devtools::check() offers an easy way to do this.
Try executing check() in the console. It runs a series of checks on your package and prints any errors, warnings and notes as it goes. The R CMD check results at the end summarise which errors, warnings and notes you got and how many of each.
If the functions in your package are written well, (with additional package dependencies taken care of) it will give you two warnings upon execution of check:
The first warning will be regarding the license that your package uses, which is not specified for a new package.
The second should be the one for documentation, warning us that our code is not documented.
To resolve the first issue, the license, use the use_mit_license("license_holder_name") command with your name in place of license_holder_name (or use any other license that suits your package; for private use, as you mentioned, it doesn't really matter what you specify if only you are going to use it and it is not going to be distributed).
This will add the License field to the DESCRIPTION file (in your files panel) and create additional files containing the license information.
You'll also need to edit the DESCRIPTION file, which has self-explanatory fields to fill in or edit. Here is an example of how it can look:
Package: yourpackagename
Title: Give a brief title
Version: 1.0.0.0
Authors@R:
    person(given = "Your_first_name",
           family = "Your_surname/family_name",
           role = c("aut", "cre"),
           email = "youremailaddress@gmail.com",
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: Give a brief description considering your package functionality.
License: will be updated with whatever license you provide, the above step will take care of this line.
Encoding: UTF-8
LazyData: true
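As a quick sanity check on a DESCRIPTION file like the one above, base R can read it back and evaluate the Authors@R field. The following is a minimal sketch, assuming a throwaway file in a temporary directory; the package name, author name and email address are placeholders. person() roles are short codes such as "aut" (author) and "cre" (maintainer).

```r
# Write a small, hypothetical DESCRIPTION to a temporary file.
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: mypackage",
  "Title: Give a Brief Title",
  "Version: 0.1.0",
  "Authors@R:",
  "    person(given = \"Jane\",",
  "           family = \"Doe\",",
  "           role = c(\"aut\", \"cre\"),",
  "           email = \"jane.doe@example.com\")",
  "Description: Give a brief description of what the package does.",
  "License: MIT + file LICENSE",
  "Encoding: UTF-8"
), desc_path)

# DESCRIPTION uses the Debian control file format, which read.dcf() parses
# into a one-row character matrix with one column per field.
desc <- read.dcf(desc_path)
fields <- colnames(desc)

# Authors@R holds R code; evaluating it should yield a 'person' object.
author <- eval(parse(text = desc[1, "Authors@R"]))
```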
To resolve the documentation warning, you'll need to document your function using roxygen documentation. An example:
#' @param a parameter one
#' @param b parameter two
#' @return sum of a and b
#' @export
#'
#' @examples
#' yourfunction(1,2)
yourfunction <- function(a,b)
{
  sum <- a+b
  return(sum)
}
Follow the roxygen syntax and add tags as you need them. Some are optional, such as @title for specifying the title, while others, such as @import, are required if you're importing from packages other than base R.
After you're done documenting your function(s) using the roxygen skeleton, tell the package about the documentation by running devtools::document(). After you execute the document() command, run check() again to see if you get any warnings. If you don't, you're good to go (and you won't if you follow the steps).
Lastly, you'll need to install the package for it to be accessible by R. Simply run the install() command from devtools (not the install.packages() you used at the beginning; you don't need to name the package, as in install("package"), since you are already working in the project where the package is loaded and ready to be installed). After a few lines of installation output you'll see a statement like "Done (package_name)", which indicates that the installation of the package is complete.
Now you can try your function by first attaching your package with library("package_name") and then calling your desired function from the package. That's it, congrats, you did it!
I've tried to include the procedure in a lucid way (the way I create my R packages), but if you have any doubts feel free to ask.
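For comparison, base R can also build a package skeleton directly from an existing script file, which matches the situation in the question. This is a different technique from the devtools workflow described above, not part of it. The sketch below assumes a throwaway script in a temporary directory; the file name, function and package name are hypothetical.

```r
# Hypothetical script file standing in for your existing functions.
script_path <- file.path(tempdir(), "myfunctions.R")
writeLines(c(
  "add_two <- function(a, b) {",
  "  a + b",
  "}"
), script_path)

# Directory in which the package folder will be created.
pkg_parent <- file.path(tempdir(), "skeleton_demo")
dir.create(pkg_parent, showWarnings = FALSE)

# Build the skeleton around the script: this creates DESCRIPTION, NAMESPACE,
# an R/ directory with the code and a man/ directory with help templates.
utils::package.skeleton(
  name = "myprivatepkg",
  code_files = script_path,
  path = pkg_parent,
  force = TRUE
)

pkg_dir <- file.path(pkg_parent, "myprivatepkg")
list.files(pkg_dir)
```

The generated help files are only templates and need editing before the package will pass a check, which is one reason the roxygen-based workflow above is usually more comfortable.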

Suppress mFilter onLoad message

I'm creating an R package and it cannot show any kind of message from imported packages when it loads. I'm having a problem with a specific package, mFilter. If I import it, I always get
‘mFilter’ version: 0.1-3
‘mFilter’ is a package for time
series filtering
See ‘library(help="mFilter")’ for
details
Author: Mehmet Balcilar,
mbalcilar@yahoo.com
when the user loads my package, regardless of adding suppressMessages('mFilter') in the .onLoad file.
I really need to use mFilter. So removing it from Imports list doesn't help. Does anyone know what should I do?
I don't think you can. In the mFilter package, instead of using message() in .onLoad(), the authors have incorrectly used
if(interactive() || getOption("verbose"))
writeLines(strwrap(txt, indent = 4, exdent = 4))
If you are using the package interactively this will always execute and won't be suppressed.
If you can limit your use of mFilter to just a few functions, you could Suggest mFilter rather than Import or Depend on it. Then, in the functions that need it, you could capture.output(require(mFilter, quietly = TRUE)) to load the package (and stop with a message that mFilter needs to be installed if the load is unsuccessful).
Alternately, you could take the same approach, but have the loading of mFilter take place in your package's .onLoad.
You may even be able to do something tricky where mFilter is listed in DESCRIPTION Imports (to guarantee it gets installed) but isn't imported in the NAMESPACE file. It would probably (at least) throw a warning during check, but it would probably work just fine.
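The capture.output() idea can be sketched as a small on-demand loader. Two assumptions to note: the helper name load_quietly is made up, and it uses requireNamespace() instead of the require() call mentioned above, so the package is loaded but not attached. Text written to standard output while the namespace loads, as mFilter does with writeLines(), is swallowed by capture.output().

```r
# Hypothetical helper: load a namespace while discarding anything it
# prints to standard output during loading.
load_quietly <- function(pkg) {
  ok <- FALSE
  # capture.output() evaluates the expression in this function's frame,
  # so the assignment to 'ok' is visible afterwards.
  utils::capture.output(
    ok <- requireNamespace(pkg, quietly = TRUE)
  )
  if (!ok) {
    stop("Package '", pkg, "' needs to be installed for this function.",
         call. = FALSE)
  }
  invisible(TRUE)
}
```

A function that needs the filter would then call load_quietly("mFilter") first and refer to its functions with the mFilter:: prefix.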

Load data object when package is loaded

Is there a way to automatically load a data object from a package in memory when the package is loaded (but not yet attached)? I.e. the opposite of lazy loading? The object is used in one of the package functions, so it needs to be available at all time.
When the package is set to lazydata=false, the data object is not exported by the package at all, and needs to be loaded manually with data(). We could use something like:
.onLoad <- function(lib, pkg){
data(mydata, package = pkg)
}
However, data() loads the object in the global environment. I strongly prefer to load it in the package environment (which is what lazydata does) to prevent masking conflicts.
A workaround is to bypass the data mechanics completely, and simply hardcode the object in the package. So the package myscore.R would look like
mymodel <- readRDS("inst/mymodel.rds")
myscore <- function(newdata){
predict(mymodel, newdata)
}
But this will lead to a huge packagedb for large data objects, and I am not sure what are the consequences of that.
As you say
The object is used in one of the package functions, so it needs to be available at all time.
I think the author of that package should really NOT use data(.) for that.
Instead he should define the object inside his /R/ either by simple R code in an R/*.R file,
or by using the sysdata.rda approach that is explained in the famous first reference for all these questions,
"Writing R Extensions". In both cases the package author can also export the object which is often desirable for other users as in your case.
Of course this needs a polite conversation between you and the package author, and will only apply to the next version of that package.
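The sysdata.rda approach amounts to saving the object into the package's R/ directory in the package sources. Here is a minimal sketch, assuming a hypothetical package source directory under tempdir() and a small lm fit standing in for mymodel; the final load() only verifies the file and is not part of the mechanism.

```r
# Hypothetical package source directory (a real one would be your package root).
pkg_root <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_root, "R"), recursive = TRUE, showWarnings = FALSE)

# Stand-in for the real model object.
mymodel <- lm(mpg ~ wt, data = mtcars)

# Objects saved in R/sysdata.rda become available inside the package
# namespace once the package is installed; they are not exported.
sysdata_path <- file.path(pkg_root, "R", "sysdata.rda")
save(mymodel, file = sysdata_path)

# Verification only: load the file into a scratch environment.
check_env <- new.env()
loaded_names <- load(sysdata_path, envir = check_env)
```

If you use usethis, use_data(mymodel, internal = TRUE) writes the same file for you.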
I'm going to post this since it seems to work for my use case.
.onLoad() is:
function(lib, pkg)
  data(mydata, package = pkg,
       envir = parent.env(environment()))
Also need Imports: utils in DESCRIPTION and importFrom(utils, data) in NAMESPACE in order to pass R CMD check.
In my case I don't need the data object to be visible to the user, I need it to be visible to one of the functions in the package. If you need it visible to the user, that's going to be even harder (I think) because as far as I can tell you can't export data, just functions. The only way I've thought of to export data is to export a wrapper function for the data.
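That data() can load into a chosen environment instead of the global one is easy to check outside a package. data() takes its target through the envir argument. The sketch below uses the mtcars data set from the datasets package and a scratch environment as stand-ins for mydata and the package namespace.

```r
# Scratch environment standing in for the package namespace.
target_env <- new.env()

# Load the data set into target_env rather than the global environment.
data("mtcars", package = "datasets", envir = target_env)

# The object now exists in target_env itself (no inheritance involved).
in_target <- exists("mtcars", envir = target_env, inherits = FALSE)
```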

R devel: Warning: multiple methods tables found for ‘append’

I am maintaining an R package that recently started throwing the following warning during R CMD check packagename:
** testing if installed package can be loaded
Warning: multiple methods tables found for ‘append’
(The package is called phyloseq, and the branch that is currently causing me this problem is here)
Refined subquestions:
So the "multiple methods tables" part, this seems to imply that I have two dependent packages with a collision over dispatch for the append method. Right?
I don't have a function/method named "append" in this package, and don't import any.
I was able to reproduce the warning message in a new R session by simply loading two of the packages in R at the same time, one of which (RJSONIO) is a second-level dependency -- by which I mean one of my dependencies (biom) depends on it, but not mine:
library("RJSONIO");library("Biostrings")
Which throws the warning in the R session:
multiple methods tables found for ‘append’
And naturally, append is exported in the NAMESPACE file of both RJSONIO and Biostrings. What I don't understand is why this should cause a problem when loading my package. The packages I directly depend on (Biostrings-2.28.0, biom-0.3.8) are not fully imported -- certainly not importing any append methods. How else could this conflict arise?
Workaround:
If I update Biostrings to the "devel" version, 2.29.2, then the warning appears to go away. Most users will not do this, however, and I'd still like to understand how this collision is even possible, given the way I specifically imported functions and classes from these packages rather than full Import or Depends.
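One way to narrow this down is to list which loaded namespaces export a given name. The helper below is only a diagnostic sketch with a made-up name, namespaces_exporting; it looks at exports alone and does not inspect S4 methods tables, so it shows where append comes from but not why the warning is raised.

```r
# Hypothetical helper: which currently loaded namespaces export 'name'?
namespaces_exporting <- function(name) {
  loaded <- loadedNamespaces()
  has_export <- vapply(
    loaded,
    function(ns) name %in% getNamespaceExports(ns),
    logical(1)
  )
  sort(loaded[has_export])
}

# In a fresh session this includes "base"; after loading RJSONIO and
# Biostrings it would be expected to list those as well.
exporters <- namespaces_exporting("append")
```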

Resources