R package CRAN note for package dependencies and warnings in tests

I'm planning to submit my first package to CRAN. I've heard that you should not have any errors, warnings or notes. However, I get the Note stating that there are too many package dependencies:
"Imports includes 24 non-default packages.
Importing from so many packages makes the package vulnerable to any of
them becoming unavailable. Move as many as possible to Suggests and
use conditionally."
Is this note something I have to address in regard to a CRAN submission?
Does it make a difference if I state that all or most of the packages used are OK to include because they are well maintained?
Is it possible to use tidyverse as a single dependency instead of each individual package? (I understand that this to some extent defeats the purpose of the limit, although a 20-package limit feels rather arbitrary anyway; the focus should also be on using well-maintained packages.)
Warnings in tests
I have created test cases for the package; however, in order to stay within the size limit I need to use fewer cases than I normally would, and this produces various warnings when the tests run. Is it OK to have these test-related warnings when submitting to CRAN?
Thanks in advance!
John

In most cases, Notes won't automatically cause a reviewer to reject your submission, assuming you otherwise pass R CMD check --as-cran [yourpackage]. In this case, though, I would take the advice to heart.
First, decide whether you really, truly need all those packages at all, let alone as imports. That does seem like a very large collection. Make sure you can't, for example, call functions from packages A, B, C, and D rather than similar functions from packages K, Q, and T (listing your dependencies from A to X). If you're only using one standalone function from a package, i.e. a function which doesn't depend on anything else in that package, copy its source code, with attribution, into your package's source directory.
Second, only import packages if they're needed for your functions to execute regardless of their argument lists. Packages which only support specific "modes" or options should be moved to Suggests.
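For illustration, here is a minimal sketch of what "use conditionally" looks like in practice, assuming you move an optional plotting dependency from Imports to Suggests in DESCRIPTION (ggplot2 and the function name are placeholders, not from the original question):
# ggplot2 listed under Suggests, so test for it at run time before using it
plot_results <- function(df) {
  if (requireNamespace("ggplot2", quietly = TRUE)) {
    ggplot2::ggplot(df, ggplot2::aes(x = x, y = y)) + ggplot2::geom_point()
  } else {
    plot(df$x, df$y)  # fall back to base graphics if ggplot2 is absent
  }
}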
The relevant portion of the "Writing R Extensions" manual (R-exts), which I hope you've read, is quoted below.
All packages that are needed to successfully run R CMD check on the
package must be listed in one of ‘Depends’ or ‘Suggests’ or ‘Imports’.
Packages used to run examples or tests conditionally (e.g. via
if(require(pkgname))) should be listed in ‘Suggests’ or ‘Enhances’.
(This allows checkers to ensure that all the packages needed for a
complete check are installed.) In particular, packages providing
“only” data for examples or vignettes should be listed in ‘Suggests’
rather than ‘Depends’ in order to make lean installations possible.
Version dependencies in the ‘Depends’ and ‘Imports’ fields are used by
library when it loads the package, and install.packages checks
versions for the ‘Depends’, ‘Imports’ and (for dependencies = TRUE)
‘Suggests’ fields. It is increasingly important that the information
in these fields is complete and accurate: it is for example used to
compute which packages depend on an updated package and which packages
can safely be installed in parallel. This scheme was developed before
all packages had namespaces (R 2.14.0 in October 2011), and good
practice changed once that was in place. Field ‘Depends’ should
nowadays be used rarely, only for packages which are intended to be
put on the search path to make their facilities available to the end
user (and not to the package itself): for example it makes sense that
a user of package latticeExtra would want the functions of package
lattice made available. Almost always packages mentioned in ‘Depends’
should also be imported from in the NAMESPACE file: this ensures that
any needed parts of those packages are available when some other
package imports the current package. The ‘Imports’ field should not
contain packages which are not imported from (via the NAMESPACE file
or :: or ::: operators), as all the packages listed in that field need
to be installed for the current package to be installed. (This is
checked by R CMD check.) R code in the package should call library or
require only exceptionally. Such calls are never needed for packages
listed in ‘Depends’ as they will already be on the search path. It
used to be common practice to use require calls for packages listed in
‘suggests’ in functions which used their functionality, but nowadays
it is better to access such functionality via :: calls.

Related

Julia package available from a registry

I added the package Knet with Pkg.add("Knet") and noticed that several packages were installed including CUDA. However, after the installation finished when I try:
using CUDA
it says that this package is not found but that it is available from a registry. It seems that this package is a requirement for Knet and it is installed, but then one cannot access it right away. Do you know what is happening behind the scenes? Thanks.
The underlying mechanism is a bit complex, and is described in detail here.
But the general logic is as follows: you can use (with using or import) the packages that you have explicitly installed. However, such packages might depend on other packages. Julia will automatically decide which other packages need to be installed, but they will not be visible in your project unless you explicitly add them.
In fact, typically, on one computer you will have hundreds of packages installed in one place (to avoid having to download and precompile them each time), but each individual project will have access only to the packages that you explicitly specify you want to use in that project. The information about which packages should be visible in an individual project is typically contained in the Project.toml file, as described here.
You can find more information on how to manage projects in Julia here.
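If you actually want to call CUDA directly, the fix is simply to make it an explicit dependency of your active project as well; a minimal Julia sketch (illustrative, not from the original answer):
using Pkg
Pkg.add("CUDA")   # records CUDA in this project's Project.toml
using CUDA        # now loads, because it is a direct dependency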

How to submit an R package to CRAN via GitHub Actions?

I've created an R package and I'd like to upload it to CRAN via GitHub Actions whenever I merge changes into the master branch. I've found plenty of examples of R actions and have looked at how some of the most popular packages like dplyr handle releases. Although I've found the devtools::release() helper function, I still haven't seen a workflow that submits a package to CRAN on merge into master. Do package developers do this manually? Is there a reason why this hasn't been automated?
CRAN works quite differently from other language repositories, as uploads are not fully automated like in e.g. PyPI.
When you upload a new package, it is subject to verification from an actual human. When you update a package, if it triggers certain checks it will also be subject to a new review from a human. When a package uploads successfully and passes the first verification, many automated checks are run for it over the course of weeks (e.g. different OSes, compilers, compiler options, architectures, sanitizers, valgrind, etc.), and precompiled binaries are automatically generated for some platforms and R versions from your source code.
The CRAN policies explicitly state that frequent updates are not allowed; you're not supposed to submit updates more often than roughly once every couple of months, so I don't think this level of automation would be worth it.
Even if you do want to automate the process, there is an email confirmation step in the middle, so you'd perhaps have to do something with Selenium plus other scripts.
BTW if you are worried about complicated building processes and are using RStudio, you can configure on a per-project basis what arguments to use when building source or binary distributions of your package.
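In practice, then, most maintainers automate the checks and keep the final submission manual; a minimal sketch of that local flow using devtools (the functions exist as named, but treat the calls as illustrative of the workflow, not a prescribed release process):
devtools::check()    # runs R CMD check with CRAN-like settings by default
devtools::release()  # interactive checklist, then uploads the tarball and triggers the maintainer confirmation e-mail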

Can CRAN packages be modified and re-uploaded by others?

I've been working with R for a long time, but I'm a complete newbie at writing (and publishing) my own packages via CRAN. I'm currently creating a new package for educational purposes (university) and I want to upload it to CRAN so my students (and, of course, others) can download and use it.
After I uploaded my package (let's call it “JohnnyStat”), is it possible that another person (let's call “Mark Miller”) modifies it and adds his name as another “co-author” (“author”/“contributor” etc.)?
So, as a result, the package “JohnnyStat” would be registered as written by “Johnny” AND “Mark Miller”?
No. Only the maintainer can upload package updates, not any co-author. An acceptance mail is automatically sent to the mail address of the maintainer. And no-one can become maintainer without the explicit consent of the previous maintainer (for obvious reasons).
If you want the possibility of various people modifying the package, maybe CRAN is not the best option. It is possible to install from other repositories. Why not host the package at e.g. R-Forge or GitHub?
You may get a more complete answer if you ask this on the CRAN mailing list R-package-devel.

R Package with Large Size External Assets

This is a follow-up to a question I posted earlier. To summarize, I am writing an R package called Slidify, which makes use of several external, non-R-based libraries. My earlier question was about how to manage these dependencies.
Several solutions were proposed, of which the most attractive was to bundle the external libraries as a separate R package and make it a dependency of Slidify. This is the strategy followed by the xlsx package, which packages its Java dependencies as a separate package, xlsxjars.
An alternative is for me to provide the external libraries as a download and include an install_libraries function within Slidify, which would automatically fetch the required files and place them in the package directory. I could also add an update_libraries function to update them if things change.
My question is: are there any specific advantages to doing the CRAN dance for external libraries which are not R based? Am I missing something here?
As discussed in the comment thread, for a package like Slidify with a number of large, (mostly) fixed, and portable files, a "resource" package makes more sense:
you will know the path where it is installed (as the package itself can tell you; see the sketch after this list)
users can't accidentally put it somewhere else
you get CRAN tests
you get CRAN distribution, mirrors, ...
users already know install.packages() etc
the more nimble development of your package using these fixed parts is not held back by the large support files
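As a minimal sketch of the first point, assume a hypothetical resource package called slidifyLibraries that ships the support files under inst/libraries (all names here are illustrative, not taken from the question):
# exported helper inside the resource package: system.file() returns the
# path to the installed copy of an asset, wherever the user's library lives
lib_path <- function(lib = "") {
  system.file("libraries", lib, package = "slidifyLibraries")
}
Slidify itself could then call slidifyLibraries::lib_path("jquery") (or similar) and always locate the bundled files reliably.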

How do you use multiple versions of the same R package?

In order to compare two versions of a package, I need to be able to choose which version of the package I load. R's package system is set by default to overwrite existing packages, so that you always have the latest version. How do I override this behaviour?
My thoughts so far are:
I could get the package sources, edit the descriptions to give different names and build, in effect, two different packages. I'd rather be able to work directly with the binaries though, as it is much less hassle.
I don't necessarily need to have both versions of the packages loaded at the same time (just installed somewhere at the same time). I could perhaps mess about with Sys.getenv('R_HOME') to change the place where R installs the packages, and then .libPaths() to change the place where R looks for them. This seems hacky though, so does anyone have any better ideas?
You could selectively alter the library path. For complete transparency, keep both out of your usual path and then do
library(foo, lib.loc="~/dev/foo/v1") ## loads v1
and
library(foo, lib.loc="~/dev/foo/v2") ## loads v2
The same works for install.packages(), of course. All these commands take a number of arguments, so the hooks you are looking for may already be present. So don't look at changing R_HOME; rather, look at help(install.packages) (assuming you install from source).
But AFAIK you cannot load the same package twice under the same name.
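For completeness, a minimal sketch of getting the two versions into those separate library paths in the first place, installing from local source tarballs (file names and paths are illustrative):
dir.create("~/dev/foo/v1", recursive = TRUE)
dir.create("~/dev/foo/v2", recursive = TRUE)
# install one version of the package into each private library
install.packages("foo_1.0.0.tar.gz", repos = NULL, type = "source", lib = "~/dev/foo/v1")
install.packages("foo_2.0.0.tar.gz", repos = NULL, type = "source", lib = "~/dev/foo/v2")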
Many years have passed since the accepted answer, which is of course still valid. It might, however, be worthwhile to mention a few newer options that have arisen in the meantime:
Managing multiple versions of packages
For managing multiple versions of packages on a project (directory) level, the packrat tool can be useful: https://rstudio.github.io/packrat/. In short
Packrat enhances your project directory by storing your package dependencies inside it, rather than relying on your personal R library that is shared across all of your other R sessions.
This basically means that each of your projects can have its own "private library", isolated from the user and system libraries. If you are using RStudio, packrat is very neatly integrated and easy to use.
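A minimal sketch of setting this up outside RStudio (packrat::init() is the package's documented entry point; the project path is illustrative):
install.packages("packrat")
packrat::init("~/projects/my-analysis")  # creates the project's private library and snapshots its dependencies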
Installing custom package versions
In terms of installing a specific version of a package, there are many ways; perhaps the most convenient is using the devtools package, for example:
devtools::install_version("ggplot2", version = "0.9.1")
Alternatively, as suggested by Richie, there is now a more lightweight package called remotes that is a result of the decomposition of devtools into smaller packages, with very similar usage:
remotes::install_version("ggplot2", version = "0.9.1")
More info on the topic can be found here:
https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
I have worked with R for a long time now, and it's only today that I thought about this. The idea came from dabbling with Python, where the first step I had to take was to manage what Pythonistas call "virtual environments". They even have dedicated tools for this seemingly important task. I read more about why they take it so seriously and finally realized that it is a neat and important way to manage different projects with conflicting dependencies. I wanted to know why R doesn't seem to have this feature and found that the concept of "environments" does exist in R, it just isn't introduced to newcomers the way it is in Python. So check the documentation on this and it should solve your issue.
Sorry for rambling but I thought it would help.
