How to make an R package recommend a package hosted on GitHub?

I'm developing an R package called ctools that works as a wrapper for functions from the parallel and Rhpc packages. I know that if I want my package to require these packages, I need to include them in the Imports section of the DESCRIPTION file. When installing my package, these packages will be installed from CRAN. Similarly, I can put them in the Suggests section if they aren't required but are useful. These won't be installed with my package.
But I've forked the Rhpc package and added a function that I use in my ctools package. How do I get my package to Suggest/Import this package from my GitHub repo, so that instead of installing the CRAN version of Rhpc, it executes devtools::install_github("bamonroe/Rhpc")?

From the manual (and quoting source here):
@c DESCRIPTION field Additional_repositories
The @samp{Additional_repositories} field is a comma-separated list of
repository URLs where the packages named in the other fields may be
found. It is currently used by @command{R CMD check} to check that the
packages can be found, at least as source packages (which can be
installed on any platform).
You can add the package to Suggests: and point to additional repositories -- possibly created using drat. There used to be a package doing that, and IIRC there is another one doing it now, but its name escapes me.
Edit: Found it! See here in the source DESCRIPTION file of RNeXML -- and note how the line disappears in the posted DESCRIPTION on CRAN. Better still, note how two of the packages in Suggests: are not listed as hyperlinks on CRAN. I think those come from the additional repos. And yes, rOpenSci uses drat to manage that.
Edit 2: And just to close the loop, you can (easily) use drat to host such an additional repo on GitHub -- the prime use case for drat.
Edit 3: RNeXML has dropped the additional repository, but the GitHub history still has it.
Edit 4: Currently (i.e. on 2020-03-13), the CRAN packages EMC, bcmaps, blkbox, broom.mixed, epikit, grattan, gtsummary, hurricaneexposure, memoise, multinomialeq, noaastormevents, pointblank, provSummarize, provViz, spData, swephR, tashu, taxadb, waveformbildar all list a field Additional_repositories containing a URL pointing to a drat repo.
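For the ctools case from the question, the relevant DESCRIPTION fields might look roughly like the sketch below. It writes a hypothetical DESCRIPTION to a temporary file and reads it back with read.dcf() to confirm that it parses. The version number, title, and the drat URL https://bamonroe.github.io/drat are assumptions made for illustration; that repository would still have to be created and populated with the forked Rhpc.

```r
# Hypothetical DESCRIPTION for ctools. The drat URL is an assumption, not an
# existing repository; version and title are placeholders.
desc_lines <- c(
  "Package: ctools",
  "Version: 0.1.0",
  "Title: Wrappers Around 'parallel' and 'Rhpc'",
  "Imports: parallel",
  "Suggests: Rhpc",
  "Additional_repositories: https://bamonroe.github.io/drat"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)

# read.dcf() is the base R reader for this file format.
desc <- read.dcf(desc_file)
desc[, c("Suggests", "Additional_repositories")]
```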

Related

Make CRAN R package suggest GitHub R package

I want to use the R package BOLTSSIRR available on GitHub in my R package, which I want to upload to CRAN.
I listed BOLTSSIRR under Suggests: in the DESCRIPTION file and made the link to GitHub available using Additional_repositories: https://github.com/daviddaigithub/BOLTSSIRR.
However, running R CMD check --as-cran I get:
Suggests or Enhances not in mainstream repositories:
BOLTSSIRR
Availability using Additional_repositories specification:
BOLTSSIRR no ?
? ? https://github.com/daviddaigithub/BOLTSSIRR
Additional repositories with no packages:
https://github.com/daviddaigithub/BOLTSSIRR
So the GitHub link does not seem to get recognized in the check. Might I have to change something here?
As you found, you can't use Remotes in a CRAN package. What you need to do is to make sure the .tar.gz file for the package you are depending on is available somewhere. Github doesn't do that automatically, because https://github.com/daviddaigithub/BOLTSSIRR isn't set up as a package repository.
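One way to see the difference between a plain web location and a package repository is to ask R for the package index at that location. The helper below is a hypothetical sketch (has_package_index is not part of any package); it uses an empty local directory as a stand-in for a location that has no PACKAGES file, and assumes a Unix-like path when building the file:// URL.

```r
# TRUE if `contriburl` serves a package index listing at least one package.
has_package_index <- function(contriburl) {
  pkgs <- tryCatch(
    suppressWarnings(utils::available.packages(contriburl = contriburl)),
    error = function(e) NULL
  )
  !is.null(pkgs) && nrow(pkgs) > 0
}

# An empty local directory stands in for a location without a PACKAGES file.
empty_dir <- tempfile("not-a-repo")
dir.create(empty_dir)
result <- has_package_index(
  paste0("file://", normalizePath(empty_dir, winslash = "/"))
)
result
```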
The solution is to create your own small repository, and keep copies of non-CRAN packages there. The drat package (available here: https://github.com/eddelbuettel/drat) makes this easy as long as you have a Github account. Follow the instructions here: https://github.com/drat-base/drat. In summary:
Fork https://github.com/drat-base/drat into your account, and clone it to your own computer.
Enable Github Pages with the docs/ folder in the main branch.
Install the drat package into R using remotes::install_github("eddelbuettel/drat"). (I assume this version will make it to CRAN eventually; if you use the current CRAN version instructions are slightly more complicated.)
Build the package you want to insert. You need the source version; you might want binaries too, if those are hard for your users to build.
Run options(dratBranch="docs"); drat::insertPackage(...) to insert those files into your repository.
Commit the changes, and push them to Github.
In the package that needs to use this non-CRAN package, add
Additional_repositories: https://yourname.github.io/drat
to the DESCRIPTION.
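To make the outcome of these steps concrete, here is a base R approximation of the layout that ends up under docs/ in the repository. It is a sketch only: drat::insertPackage() is the tool the steps actually use, and this code does not call it. The package dummypkg and all paths are placeholders, and a tiny tarball is built by hand so the example is self-contained; in practice the tarball would come from R CMD build.

```r
# Base R approximation of the repository layout; not drat's implementation.
repo_root <- tempfile("drat")                # stands in for your drat clone
contrib   <- file.path(repo_root, "docs", "src", "contrib")
dir.create(contrib, recursive = TRUE)

# Build a dummy source tarball (placeholder for the real non-CRAN package).
build_dir <- tempfile("build")
dir.create(file.path(build_dir, "dummypkg"), recursive = TRUE)
writeLines(
  c("Package: dummypkg", "Version: 0.1.0", "Title: Placeholder Package"),
  file.path(build_dir, "dummypkg", "DESCRIPTION")
)
old_wd <- setwd(build_dir)
utils::tar("dummypkg_0.1.0.tar.gz", files = "dummypkg",
           compression = "gzip", tar = "internal")
setwd(old_wd)

# Copy the tarball into src/contrib and rebuild the PACKAGES index.
file.copy(file.path(build_dir, "dummypkg_0.1.0.tar.gz"), contrib)
n_indexed <- tools::write_PACKAGES(contrib, type = "source")

# The index should now list the package.
index <- read.dcf(file.path(contrib, "PACKAGES"))
index[, c("Package", "Version")]
```

Once docs/ is served by Github Pages, the URL in Additional_repositories corresponds to the docs/ directory, and R looks for the index under src/contrib below it.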
You will be responsible for updating your repository if BOLTSSIRR is updated. This is good because upstream updates might break your package: after all, BOLTSSIRR is still in development. It's also bad because your users won't automatically get bug fixes.
That's it, if I haven't missed anything!

Questions about R package publishing and code visibility

When I use a package in R, I install it and then use it by loading it. Now what if I add a package which uses another package? Is this package automatically downloaded and loaded too? Or is it in general forbidden for an R package to use another package? I don't think so.
Suppose I want to publish an R package. Within my code, can I use functions from other packages and install and load these packages? Or how does this work when I need functions from other packages? Do I have to implement a message that this and that package is needed and that the user has to install and load it beforehand, and do I need to implement error-catching functions in case the package cannot be found on the system?
When I want to publish an R package, can I use/call Java code within my package/code?
For a package which was already published (take the fGarch package as an example), I would like to see the complete code. How can I see this? I know that R is open source, and I think it is more or less possible to enter a function name without parentheses and get the code displayed, but sometimes this does not work. In particular: is there a way I can look at the whole code of the package?
For a package which was already published, is it possible to see and look into all files which were submitted? Something like a git repository where all files are submitted (the code itself and further files which are needed, like description files), so that I can see these files and look into them?
Furthermore, regarding this post here and hiding functions: is there code in an R package which I cannot see as an end user? This also relates to my previous question: how can I see the whole code in an R package?
I guess you have a few different questions here. Let's take them in the order you asked them:
What if I add a package which uses another package? Is this package automatically downloaded and loaded too? Or is it in general forbidden for an R package to use another package?
It is certainly not forbidden for an R package to use another R package. In fact, the majority of R packages rely on other packages.
The source code for each R package must include a text-based DESCRIPTION file in the root directory. In this file you will find (among other things) a "Depends" field, and an "Imports" field. Together, these two fields list all the other packages required to use this package. If a user doesn't already have these other packages installed in their local library, R will install them automatically when it installs the requested package.
If your package lists a dependency in "Depends", then the dependency package is attached whenever your package is attached. Thus if you looked at the source code for a package called "foo" and you see that its DESCRIPTION file contains the line
Depends: bar,
you know that when you call library(foo) in your R console, you have effectively done library(bar); library(foo)
This isn't always ideal. The package foo might only need a couple of functions from package bar, and bar might contain some other functions whose names could clash with other commonly used functions. Therefore, in general, if you are writing a package and you only want to use a few functions from another package, it would be better to use "Imports" rather than "Depends" to limit the number of unnecessary symbols being added to your user's search path.
Suppose I want to publish an R package. Within my code, can I use functions from other packages and install and load these packages?
Yes, you can use functions from other packages. The simplest way to do this is to include the name of the package in the Depends field of your DESCRIPTION file.
However, when using just a few functions from another package inside your own package, best practice is to use the "Imports" field in the DESCRIPTION file, and use a namespace qualifier for the imported function in your actual R code. For example, if you wanted to use ggplot from the ggplot2 package, then inside your function you would call it ggplot2::ggplot rather than just ggplot.
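A minimal sketch of the namespace-qualified style follows. It uses stats::sd rather than ggplot2::ggplot so that it runs without extra installs; the function name standardise is hypothetical, and the package's DESCRIPTION is assumed to contain "Imports: stats".

```r
# Inside a package whose DESCRIPTION lists "Imports: stats":
# call the function with an explicit namespace instead of attaching stats.
standardise <- function(x) {
  (x - mean(x)) / stats::sd(x)
}

standardise(c(2, 4, 6))
```

The same pattern applies to any imported package: the qualifier makes it clear where the function comes from and keeps it off the user's search path.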
If you publish your package for others to use, the dependencies will be installed automatically along with your package if the user calls install.packages with the default settings. For example, when I did:
install.packages("fGarch")
I got the associated message:
#> also installing the dependencies ‘timeSeries’, ‘fBasics’, ‘fastICA’
Do I have to implement a message that this and that package is needed and that the user has to install and load it prior to it and I need to implement error catching functions in case the package cannot be found on the pc system?
No, not in general. R will take care of this as long as you have listed the correct packages in your DESCRIPTION file.
When I want to publish an R package, can I use/call Java code within my package/code?
R does not have a native Java API, but you can use your own Java code via the rJava package, which you can list as a dependency for your package. However, there are some users who have difficulty getting Java to run, for example business and academic users who may use R but do not have Java installed and do not have admin rights to install it, so this is something to bear in mind when writing a package.
For a package which was already published - so let's take just as an example the fGarch package - I would like to see the complete code. How can I see this?
Every package available for download from CRAN has its source code available. In the case of fGarch, its CRAN page contains a link to the gzipped tarball of the source code. You can download this and use untar in R to review all the source code. Alternatively, many packages will have an easily-found repository on Github or other source-control sites where you can examine the source code via a browser. For example, you can browse the fGarch source on Github here.
For a package which was already published, is it possible to see and look into all files which were submitted? So like a repository as git where all files are submitted - the code itself and further files which are needed like description files or whatever - and I can see these files and look into them?
Yes, you can look at all the source files for all the packages uploaded to CRAN on Github at the unofficial Github CRAN mirror here.
Is there code in an R package which I cannot see as an end user? This refers also to my previous question, how can I or which way can I see the whole code in an R package?
As above, you can get the source code for any package via CRAN or Github. As you said, you can look at the source code for exported functions just by typing the name of that function into R. For unexported functions, you can do the same with a triple colon. For example, ggplot2:::adjust_breaks allows you to see the function body of the unexported function adjust_breaks from ggplot2. There are some complexities when an object-oriented system like S4, ggproto or R6 is used, or when the source code includes compiled C or C++ code, but I haven't come across a situation yet in which I was not able to find the relevant source code after a minute or two with an R console and a good search engine.
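As a small illustration of the double-colon versus triple-colon distinction, the sketch below uses the stats package instead of ggplot2 so that it runs on a plain R installation. t.test.default is an S3 method that lives in the stats namespace but is not exported; the variable names are made up for the example.

```r
# Exported function: reachable with `::` (or by bare name once attached).
exported_src <- deparse(stats::sd)

# Unexported function: not in the export list, but reachable with `:::`.
is_exported    <- "t.test.default" %in% getNamespaceExports("stats")
unexported_src <- deparse(stats:::t.test.default)

is_exported
head(unexported_src, 3)
```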

`subgraphMining` package not available

Was unable to install the package ‘subgraphMining’. The error says "is not available (for R version 3.4.3)". What steps are needed?
There is a package named subgraphMining which you can find by searching with Google using that term. (It's not a CRAN or Github hosted package.) It's found at a book website, http://www.csc.ncsu.edu/faculty/samatova/practical-graph-mining-with-R/PracticalGraphMiningWithR.html . It does require an additional package named igraph0 and there is an answered question on SO describing that issue. The igraph0 package is likewise not "available" for the current version of R but it is in the CRAN archives. So you also need the development tools for your OS (Mac in my case, so XCode and the Command Line Tools).
siteURL <- "https://www.csc2.ncsu.edu/faculty/nfsamato/practical-graph-mining-with-R/R-code/FrequentSubgraphMining.zip"
# Since I'm not a windoze user I unzip to a local disk.
# igraph0 is no longer on CRAN, so install it from the CRAN archive.
install.packages("https://cran.r-project.org/src/contrib/Archive/igraph0/igraph0_0.5.7.tar.gz",
                 repos = NULL, type = "source")
# I downloaded the package from the chapter: Frequent Subgraph Mining
install.packages("~/Downloads/FrequentSubgraphMining/subgraphMining_1.0.tar.gz",
                 repos = NULL, type = "source")
library(subgraphMining)
It's not written particularly well. It doesn't, for instance, list any Imports or Depends in the package DESCRIPTION file, and the author of igraph thinks that the package authors should have rewritten it to use the igraph package, which is currently maintained. But I think that there is quite a bit of potential value in making its installation possible, in support of what appears to be a rather interesting set of methods described in the book.

How can I use dependency packages that are not located on usual repositories while building a new package?

My package depends on another package which is not on CRAN, MRAN, or even GitHub, and which is distributed as a .zip file. When I submit my package to CRAN, the checks can't find that package and return errors. How can I use a package that is not in any public repository as a dependency of a package that I want to submit to CRAN?
Some ideas:
Ask the authors of the original package to submit it to CRAN.
If the package is open source, add it into your package and attribute the original authors (should probably add them as authors on the combined package; also would be a good idea to contact them first)
Create a drat repository for the dependent package and then add this repository in the field Additional_repositories in the DESCRIPTION file
The 3rd option is the only purely technical solution. See the drat documentation, this SO answer from the drat package author, and this thread on R-pkg-devel in which an R package author successfully submits to CRAN following this strategy.
Update: The CRAN package discussed above that used option 3 was wikipediatrend. This line in the DESCRIPTION file sets the Additional_repositories field.

Include non-CRAN package in CRAN package

The question is pretty simple. First:
Is it possible to include a non-CRAN (or Bioconductor, or Omegahat) package in a CRAN package and actually use tools from that package in examples?
If yes how does one set up the DESCRIPTION file etc. to make it legit and pass CRAN checks?
Specifically I'm asking about openNLPmodels.en, which used to be a CRAN package. It's pretty useful and I want to include functionality from it. I could do a workaround and not actually use openNLPmodels.en in the examples or create unit tests for it, and have it install when a function gets used (similar to how the gender package installs the data sets it needs), but I'd prefer an approach that allows me to run checks, tests, and examples.
This is how one downloads and installs openNLPmodels.en
install.packages(
  "http://datacube.wu.ac.at/src/contrib/openNLPmodels.en_1.5-1.tar.gz",
  repos=NULL,
  type="source"
)
The existing answer is good but doesn't explain the whole process in detail, so I'm posting this one.
Is it possible to include a non-CRAN (or Bioconductor, or Omegahat) package in a CRAN package and actually use tools from that package in examples?
Yes, it is possible. Any use (package code, examples, tests, vignettes) of such a non-CRAN package has to be guarded, as with any other package in Suggests, ideally using
if (requireNamespace("non.cran.pkg", quietly=TRUE)) {
  non.cran.pkg::fun()
} else {
  cat("skipping functionality due to missing Suggested dependency")
}
If yes how does one set up the DESCRIPTION file etc. to make it legit and pass CRAN checks?
You need to use the Additional_repositories field in the DESCRIPTION file. The location provided in that field has to contain the expected directory structure and a PACKAGES file in the appropriate directory, and that PACKAGES file has to list the non-CRAN package.
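To see which directory R treats as "appropriate" for source packages, contrib.url() can be applied to a repository root. The sketch below uses the datacube URL from the question; the index_url variable is only there to illustrate where the PACKAGES file is expected.

```r
# contrib.url() maps a repository root to the directory that holds the index.
root    <- "http://datacube.wu.ac.at"
src_dir <- utils::contrib.url(root, type = "source")
src_dir

# The source package index is expected at this URL.
index_url <- paste0(src_dir, "/PACKAGES")
index_url
```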
Now going to your particular example of openNLPmodels.en package.
Given the way you download and install this package (directly from a tarball URL), it cannot be used as a dependency that passes CRAN checks. openNLPmodels.en has to be published in the structure expected of an R repository. Otherwise you don't have a valid location to put into the Additional_repositories field.
What you can do is download the non-CRAN package and publish it in your own R repository, and then use that location in the Additional_repositories field of your CRAN package.
Here is an example of how to do it:
dir.create("src/contrib", recursive=TRUE)
download.file("http://datacube.wu.ac.at/src/contrib/openNLPmodels.en_1.5-1.tar.gz", "src/contrib/openNLPmodels.en_1.5-1.tar.gz")
tools::write_PACKAGES("src/contrib")
We just put the package sources in the expected directory, src/contrib, and the rest is handled by the write_PACKAGES function. To check that the repository was created properly, you can list the packages that are available in it:
available.packages(repos=file.path("file:/",getwd()), type="source")
It should list your non-CRAN package there.
Then, with the non-CRAN package published in an R repository, you should put the location of the repository into the Additional_repositories field of your CRAN package. In this case the location will be the one returned by the file.path("file:/",getwd()) expression.
Note that this uses a location on your local machine. You will probably want to put it online, so that the URL can be accessed by any machine checking your CRAN package, including the checks on CRAN itself. For that, just move your src directory to a public directory that is hosted somewhere online and use the location of that server.
Now looking at your non-CRAN package again, we can see it has src/contrib in its url, thus we can assume that proper R repository already exists for it and we don't have to create and publish new one.
Therefore your installation instruction could look like
install.packages(
  "openNLPmodels.en",
  repos="http://datacube.wu.ac.at",
  type="source"
)
And then all you need for your CRAN package is to use the existing repository where it is available
Additional_repositories: http://datacube.wu.ac.at
It's possible, but! ...
There is a field in the DESCRIPTION file that you can use:
Additional_repositories: http://ghrr.github.io/drat
BUT!
Everything that depends on the functionality from the package from the additional repository has to be absolutely optional.
So packages from this repo should be placed under Suggests.
Example
I am not 100% sure whether BioConductor and OmegaHat are considered mainstream.
The usethis::use_dev_package function has solved this problem.
As an example, running this line:
usethis::use_dev_package(package = "h3", type = "Imports", remote = "crazycapivara/h3-r")
will automatically write the following lines to your DESCRIPTION file:
Imports:
    h3 (>= 3.7.1)
Remotes:
    crazycapivara/h3-r
Note that because github is the most commonly-used unofficial package distribution in R, it is the default. As such, make sure there is no github:: prefix to the entry in the Remotes section of the DESCRIPTION file.

Resources