Rcpp beginner's question:
I want to speed up my R code, so I wrote some of it in C++ and use Rcpp to compile it.
The problem is that my .cpp files use other R packages, and I want those packages to be installed and imported automatically when a user installs my package.
For example, if I use the R package 'gtools' in my files, I don't want this error:
* installing to library 'C:/Program Files/R/R-3.4.1/library'
* installing *source* package 'pkgname' ...
make: Nothing to be done for `all`.
** libs
installing to C:/Program Files/R/R-3.4.1/library/pkgname/libs/i386
** R
** preparing package for lazy loading
Error in library(gtools) : there is no package called 'gtools'
Error : unable to load R code in package 'pkgname'
ERROR: lazy loading failed for package 'pkgname'
* removing 'C:/Program Files/R/R-3.4.1/library/pkgname'
Exited with status 1.
I tried adding the name of the package I depend on to the DESCRIPTION file, i.e.
Imports: Rcpp (>= 0.12.12),gtools
LinkingTo: Rcpp, gtools
But it gives me the following error:
ERROR: dependency 'gtools' is not available for package 'pkgname'
I couldn't find any similar questions; please point me to them if there are any.
First, you should probably make sure gtools is installed on your system. I say this because of the following error:
Error in library(gtools) : there is no package called 'gtools'
That said, the main issue you are running into is confusion between the LinkingTo: and Imports: fields in the DESCRIPTION file. This is covered in Section 1.1.3: Package Dependencies of Writing R Extensions.
Specifically, we have:
The ‘Imports’ field lists packages whose namespaces are imported from (as specified in the NAMESPACE file) but which do not need to be attached. Namespaces accessed by the ‘::’ and ‘:::’ operators must be listed here, or in ‘Suggests’ or ‘Enhances’ (see below). Ideally this field will include all the standard packages that are used, and it is important to include S4-using packages (as their class definitions can change and the DESCRIPTION file is used to decide which packages to re-install when this happens). Packages declared in the ‘Depends’ field should not also be in the ‘Imports’ field. Version requirements can be specified and are checked when the namespace is loaded (since R >= 3.0.0).
And the LinkingTo field:
A package that wishes to make use of header files in other packages needs to
declare them as a comma-separated list in the field ‘LinkingTo’ in the
DESCRIPTION file. For example
LinkingTo: link1, link2
The ‘LinkingTo’ field can have a version requirement which is checked at installation.
Specifying a package in ‘LinkingTo’ suffices if these are C++ headers containing source code or static linking is done at installation: the packages do not need to be (and usually should not be) listed in the ‘Depends’ or ‘Imports’ fields. This includes CRAN package BH and almost all users of RcppArmadillo and RcppEigen.
For another use of ‘LinkingTo’ see Linking to native routines in other packages.
So, Imports: is meant for packages whose R functions you wish to import. In particular, either the individual functions or the entire package must also be specified in the NAMESPACE file. For packages that use Rcpp, you can typically expect an R function to be available if the author has exported the routine from C++.
LinkingTo: is a bit more specific. If an author wishes to make a C++ API available via header files, they must explicitly declare it as described under native routines in Writing R Extensions. Generally, packages that proceed in this manner are "header-only". These packages place the header definitions under inst/include, e.g.
|- pkgname
   |- inst/
      |- include/
         |- pkgname.h
   |- R/
   |- man/
   |- DESCRIPTION
   |- NAMESPACE
However, another trend is to allow for "non-header" packages. This is a more complicated topic, as you have to understand shared objects and dynamic libraries. Section 5.8: Linking to other packages of Writing R Extensions gives an overview of how to "link" packages.
If the author does not make available a C++ API, then there are four options:
Ask the author nicely to support calling the C++ API or submit a patch that enables access to the C++ API.
Call an R function from C++. (This negates any performance gain from writing your code in C++ though.)
Copy the implementation from the author's package while respecting their intellectual property.
Implement the desired functionality from scratch to avoid licensing issues.
Unfortunately, this is the case for gtools: the author(s) do not provide a means to "link" to the C++ version of the package's code.
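For the DESCRIPTION in the question, this suggests listing gtools under Imports: only and keeping LinkingTo: for packages that ship headers, such as Rcpp. A minimal sketch in R, assuming a hypothetical DESCRIPTION for 'pkgname' (the version number and exact field values are illustrative, not taken from the asker's package); it parses the text with base R's read.dcf() to make the split between the two fields explicit:

```r
# Hypothetical DESCRIPTION text for 'pkgname'; field values are illustrative.
desc_text <- c(
  "Package: pkgname",
  "Version: 0.1.0",
  "Imports: Rcpp (>= 0.12.12), gtools",
  "LinkingTo: Rcpp"
)

# read.dcf() parses the "Field: value" format used by DESCRIPTION files.
con <- textConnection(desc_text)
desc <- read.dcf(con)
close(con)

# Split a comma-separated field into bare package names,
# dropping any version requirement in parentheses.
field_pkgs <- function(desc, field) {
  entries <- strsplit(desc[1, field], ",", fixed = TRUE)[[1]]
  trimws(sub("\\(.*\\)", "", entries))
}

imports    <- field_pkgs(desc, "Imports")    # R-level dependencies
linking_to <- field_pkgs(desc, "LinkingTo")  # header-level dependencies

print(imports)
print(linking_to)
```

With this layout gtools is installed alongside the package as an R-level dependency, while nothing tries to find C++ headers in it.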
When I use a package in R I install it and use it with loading it. Now what if I add a package which uses another package? Is this package automatically downloaded and loaded too? Or is it in general forbidden for a R package to use another package? I don't think that.
Suppose I want to publish a R package. Within my code, can I use functions from other packages and install and load these packages? Or how does this work when I need functions from other packages? Do I have to implement a message that this and that package is needed and that the user has to install and load it prior to it and I need to implement error catching functions in case the package cannot be found on the pc system?
When I want to publish a R package, can I use/call Java code within my package/code?
For a package which was already published - so let's take just as an example the fGarch package - I would like to see the complete code. How can I see this? I know that R is open source and I think it is more or less possible to just enter a function empty and get the code displayed, but sometimes this does not work and especially my question is: Is there a way I can look into the whole code of the package?
For a package which was already published, is it possible to see and look into all files which were submitted? So like a repository as git where all files are submitted - the code itself and further files which are needed like description files or whatever - and I can see these files and look into them?
Furthermore regarding this post here and hiding functions: Is there code in a R package which I cannot see as an end user? This refers also to my previous question, how can I or which way can I see the whole code in a R package?
I guess you have a few different questions here. Let's take them in the order you asked them:
What if I add a package which uses another package? Is this package automatically downloaded and loaded too? Or is it in general forbidden for a R package to use another package?
It is certainly not forbidden for an R package to use another R package. In fact, the majority of R packages rely on other packages.
The source code for each R package must include a text-based DESCRIPTION file in the root directory. In this file you will find (among other things) a "Depends" field, and an "Imports" field. Together, these two fields list all the other packages required to use this package. If a user doesn't already have these other packages installed in their local library, R will install them automatically when it installs the requested package.
If your package lists a dependency in "Depends", then the dependency package is attached whenever your package is attached. Thus if you looked at the source code for a package called "foo" and you see that its DESCRIPTION file contains the line
Depends: bar,
you know that when you call library(foo) in your R console, you have effectively done library(bar); library(foo).
This isn't always ideal. The package foo might only need a couple of functions from package bar, and bar might contain some other functions whose names could clash with other commonly used functions. Therefore, in general, if you are writing a package and you only want to use a few functions from another package, it would be better to use "Imports" rather than "Depends" to limit the number of unnecessary symbols being added to your user's search path.
Suppose I want to publish a R package. Within my code, can I use functions from other packages and install and load these packages?
Yes, you can use functions from other packages. The simplest way to do this is to include the name of the package in the Depends field of your DESCRIPTION file.
However, when using just a few functions from another package inside your own package, best practice is to use the "Imports" field in the DESCRIPTION file, and use a namespace qualifier for the imported function in your actual R code. For example, if you wanted to use ggplot from the ggplot2 package, then inside your function you would call it ggplot2::ggplot rather than just ggplot.
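As a small illustration of the namespace-qualifier style, here is a sketch that uses stats::median in place of ggplot2::ggplot so it runs without extra packages. The function name middle_value and the assumption that the package's DESCRIPTION lists stats under Imports: are hypothetical:

```r
# Hypothetical function from a package whose DESCRIPTION lists `Imports: stats`.
# median() is qualified with its namespace, so no library(stats) call is
# needed inside the package and nothing extra is attached for the user.
middle_value <- function(x) {
  stats::median(x, na.rm = TRUE)
}

print(middle_value(c(3, 1, NA, 2)))  # 2
```

The same pattern applies to any imported package: the qualifier makes it obvious at the call site where each function comes from.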
If you publish your package for others to use, the dependencies will be installed automatically along with your package if the user calls install.packages with the default settings. For example, when I did:
install.packages("fGarch")
I got the associated message:
#> also installing the dependencies ‘timeSeries’, ‘fBasics’, ‘fastICA’
Do I have to implement a message that this and that package is needed and that the user has to install and load it prior to it and I need to implement error catching functions in case the package cannot be found on the pc system?
No, not in general. R will take care of this as long as you have listed the correct packages in your DESCRIPTION file.
When I want to publish a R package, can I use/call Java code within my package/code?
R does not have a native Java API, but you can use your own Java code via the rJava package, which you can list as a dependency for your package. However, there are some users who have difficulty getting Java to run, for example business and academic users who may use R but do not have Java installed and do not have admin rights to install it, so this is something to bear in mind when writing a package.
For a package which was already published - so let's take just as an example the fGarch package - I would like to see the complete code. How can I see this?
Every package available for download from CRAN has its source code available. In the case of fGarch, its CRAN page contains a link to the gzipped tarball of the source code. You can download this and use untar in R to review all the source code. Alternatively, many packages will have an easily-found repository on Github or other source-control sites where you can examine the source code via a browser. For example, you can browse the fGarch source on Github here.
For a package which was already published, is it possible to see and look into all files which were submitted? So like a repository as git where all files are submitted - the code itself and further files which are needed like description files or whatever - and I can see these files and look into them?
Yes, you can look at all the source files for all the packages uploaded to CRAN on GitHub at the unofficial GitHub CRAN mirror here.
Is there code in a R package which I cannot see as an end user? This refers also to my previous question, how can I or which way can I see the whole code in a R package?
As above, you can get the source code for any package via CRAN or Github. As you said, you can look at the source code for exported functions just by typing the name of that function into R. For unexported functions, you can do the same with a triple colon. For example, ggplot2:::adjust_breaks allows you to see the function body of the unexported function adjust_breaks from ggplot2. There are some complexities when an object-oriented system like S4, ggproto or R6 is used, or when the source code includes compiled C or C++ code, but I haven't come across a situation yet in which I was not able to find the relevant source code after a minute or two with an R console and a good search engine.
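The ways of inspecting source mentioned above can be sketched in R roughly as follows. The example uses the stats package instead of ggplot2 so that it runs on a plain R installation; fetch_source is a hypothetical helper, defined but not called here because it needs network access to a CRAN mirror:

```r
# Exported function: deparse() returns the same source lines that typing
# the function name at the console would print.
median_src <- deparse(stats::median)

# Unexported object: use ':::'. plot.lm is an S3 method that stats
# registers but does not export.
unexported <- stats:::plot.lm

# getAnywhere() also searches loaded namespaces and registered S3 methods.
found <- utils::getAnywhere("plot.lm")
print(found$where)

# Hypothetical helper: download a CRAN source tarball and unpack it.
fetch_source <- function(pkg, dest = tempdir()) {
  info <- utils::download.packages(pkg, destdir = dest, type = "source")
  utils::untar(info[1, 2], exdir = dest)  # second column holds the file path
  file.path(dest, pkg)
}
```

Calling fetch_source("fGarch") should, assuming a reachable CRAN mirror, leave the full package tree (R/, src/, DESCRIPTION and so on) under the returned directory.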
I am building a package for the first time.
The package needs a couple of libraries to work. Should I include those libraries in the package for every function, or should I include them in my main script?
In the DESCRIPTION file of your package you can list the packages your package depends on. This allows code from those packages to be used anywhere in your package, so there is no need for explicit use of library or require. When your package is loaded, the other packages will also be loaded. In addition, when setting dependencies = TRUE in install.packages, the packages your package depends on will also be installed (if available on CRAN).
In R what is the difference between a library and a package?
I have come across posts where people refer to packages within a library. Based on this idea I interpret it that a package lives in a library (i.e I store my packages with a designated library). However I get confused when I want to use package 'x'.
I am under the impression that I need to call the library function to get package 'x' to be in use?
And once I have called upon package 'x', do the functions of package 'x' then become available to me?
In R, a package is a collection of R functions, data and compiled code. The location where the packages are stored is called the library. If there is a particular functionality that you require, you can download the package from the appropriate site and it will be stored in your library. To actually use the package use the command "library(package)" which makes that package available to you. Then just call the appropriate package functions etc.
1. Package
Package extends basic R functionality and standardizes the distribution of code. For example, a package can contain a set of functions relating to a specific topic or tasks.
Packages can be distributed as SOURCE (a directory with all package components), BINARIES (contains files in OS-specific format) or as a BUNDLE (compressed file containing package components, similar to source).
The most basic package, for example created with,
library(devtools)
create("C:/Users/Documents/R-dev/MyPackage")
contains:
R/ directory where all the R code goes to, and DESCRIPTION and NAMESPACE metadata files.
2. Library
Library is a directory where the packages are stored. You can have multiple libraries on your hard drive.
To see which libraries are available (which paths are searched for packages):
.libPaths()
And to see which packages are there:
lapply(.libPaths(), dir)
To use package ‘x’, it first has to be installed in a package library. This can be done for example, with:
install.packages("x") # to install packages from CRAN
or
R CMD INSTALL Xpackagename.tar.gz #to install directly from source
Once installed it has to be loaded into memory with library(x) or require(x).
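To make the distinction between "installed in a library" and "attached in the session" concrete, here is a minimal sketch using only base R. The helper names is_installed and is_attached, and the made-up package name, are assumptions for illustration:

```r
# Library: the directories R searches for installed packages.
libs <- .libPaths()
print(libs)

# Installed? find.package() returns the package's path, or character(0)
# when quiet = TRUE and the package cannot be found in any library.
is_installed <- function(pkg) length(find.package(pkg, quiet = TRUE)) > 0

# Attached? Attached packages appear on the search path as "package:<name>".
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

print(is_installed("stats"))                     # stats ships with R
print(is_installed("surelyNotARealPackage123"))  # made-up name
print(is_attached("base"))
```

A package can be installed without being attached; library(x) is what moves it onto the search path for the current session.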
Hi I am following the tutorial here from Hilary and here from Hadley Wickham trying to create a dummy package.
However, my package needs some external dependencies, XML and RCurl in this case. When I run the command document, it complains:
> setwd('/home/datafireball/projects/Rprojects/rgetout/rgetout')
> document()
Error: could not find function "document"
> library(devtools)
> document()
Updating rgetout documentation
Loading rgetout
Loading required namespace: XML
Error in (function (dep_name, dep_ver = NA, dep_compare = NA) :
Dependency package XML not available.
>
Here is my DESCRIPTION file.
Package: rgetout
Title: A R package to get all the outlinks for a given URL
Version: 0.1
Authors@R: "Eric Cartman <Eric.Cartman@gmail.com> [aut, cre]"
Description: This package is intended to include as much web extraction functionality as much as possible. It starts with one function. getout will extract
    all the outlinks for a given URL with a user-agent that you can customize.
Depends: R (>= 3.0.2)
Imports:
    XML,
    RCurl
License: MIT
LazyData: true
Here is the source code github repo if you want to get more info.
If you are having problems with this even when you have the packages installed and loaded, I suggest you do the following.
Delete the Imports: and Suggests: entries of your DESCRIPTION file.
Make sure usethis is working by running library(usethis).
Now add the packages back to your DESCRIPTION file by running usethis::use_package("dplyr") in your console for each Imports: entry you need. Repeat this step for every package that is required.
In my case, dplyr was the one refusing to load. You can choose which field the package goes in, for example: usethis::use_package("dplyr", "Suggests").
It is assumed that you have the required tools and dependencies for developing a package when you are doing so.
utils::install.packages has a dependencies argument that will attempt to install any uninstalled packages on which a package depends, in whichever way they are dependent (Suggests, Depends, LinkingTo).
devtools::install_github behaves similarly.
Installing a package and documenting it as part of development are quite different activities.
Although there are quite a few postings on similar topics, none of them helped me understand how to set up the DESCRIPTION file of an R package.
My questions are:
1.) Is my description file correct now? Did I use "depends" and "imports" correctly? (maybe duplicate question...)
2.) Are required packages (dependencies?) automatically installed along with my package when needed, or "loaded" when one of my package function needs to refer to a function of an imported package? (didn't find anything on this issue yet...)
I tried to submit a package to CRAN and got following feedback:
checking package dependencies ... NOTE
Depends: includes the non-default packages:
‘MASS’ ‘car’ ‘foreign’ ‘ggplot2’ ‘lmtest’ ‘plyr’ ‘reshape2’ ‘scales’
Adding so many packages to the search path is excessive and importing selectively is preferable.
I originally had listed all above mentioned packages in the depends section of the DESCRIPTION file. In the NAMESPACE file, I used import(pkgName) for all packages listed above.
After that, I updated my files using importFrom(pkgName, function) in the NAMESPACE file and moved most of the packages to the imports section of my DESCRIPTION file. The package check with the current R-devel-version no longer gives this note. Here's an extract of my DESCRIPTION file:
License: GPL-3
Depends:
    ggplot2
Imports:
    MASS,
    car,
    foreign,
    lmtest,
    plyr,
    reshape2,
    scales
Collate:
    'sjImportSPSS.R'
and the NAMESPACE file:
import(ggplot2)
importFrom(MASS,lda)
importFrom(MASS,loglm)
importFrom(car,crPlots)
importFrom(car,durbinWatsonTest)
importFrom(car,influencePlot)
importFrom(car,leveragePlots)
importFrom(car,ncvTest)
importFrom(car,outlierTest)
importFrom(car,spreadLevelPlot)
importFrom(car,vif)
importFrom(foreign,read.spss)
importFrom(lmtest,bptest)
importFrom(plyr,adply)
importFrom(plyr,ddply)
importFrom(reshape2,melt)
importFrom(scales,brewer_pal)
importFrom(scales,percent)
I'm unsure whether this approach addresses the issue given in the check note above. Furthermore, when I load my package with library(sjPlot), ggplot2 is also attached, but none of the other packages. Does my package still work for other users? What if they don't have all needed packages installed?
From ?install.packages, the default behavior is that Depends:, Imports: and LinkingTo: packages are installed if not already installed. Check out sessionInfo() and you'll see your Imports: are loaded (resident in memory) but not attached (not on the search path). If your importFrom statements cover the symbols used in your package code, then your code will work for others (if there were missing imports, R CMD check would warn you about undefined global functions or variables).
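The difference between loaded and attached can be checked directly. This sketch uses the tools package, which ships with R, as a stand-in for an imported package; the variable names are illustrative, and whether tools is already attached depends on the session:

```r
# Load a namespace without attaching it, which is what Imports: does
# for a package's dependencies.
loadNamespace("tools")

tools_loaded   <- "tools" %in% loadedNamespaces()   # resident in memory
tools_attached <- "package:tools" %in% search()     # on the search path

print(c(loaded = tools_loaded, attached = tools_attached))

# Functions from a loaded-only namespace are still reachable with '::'.
print(tools::file_ext("report.tar.gz"))  # "gz"
```

In a fresh session this typically reports the namespace as loaded but not attached, mirroring what sessionInfo() shows under "loaded via a namespace (and not attached)".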