I often use Rcpp code to incorporate C++ code into R. Through the BH-package I am also able to use the Boost-library. However, the Boost library lacks a function that I would like to use (to be precise, it only has Bessel function but I would like to get Log-Bessel immediately because of overflow). I know that Alglib does have this feature.
Would it be possible to use Alglib with Rcpp, that is, use the log-bessel function from Alglib somehow?
I do not see a clear difference in functionality between the
AlgLib documentation on Bessel functions, and
Boost documentation on Bessel functions.
As such, I think you can just use the BH package giving you all of Boost Math and then some.
Last but not least there is a package bessel on CRAN written by the R Core member focusing on special functions so you could start from there too.
Related
I'm writing some R functions that employ some useful functions in other packages like stringr and base64enc. Is it good not to call library(...) or require(...) to load these packages first but to use :: to directly refer to the function I need, like stringr::str_match(...)?
Is it a good practice in general case? Or what problem might it induce?
It all depends on context.
:: is primarily necessary if there are namespace collisions, functions from different packages with the same name. When I load the dplyr package, it provides a function filter, which collides with (and masks) the filter function loaded by default in the stats package. So if I want to use the stats version of the function after loading dplyr, I'll need to call it with stats::filter.
This also gives motivation for not loading lots of packages. If you really only want one function from a package, it can be better to use :: than load the whole package, especially if you know the package will mask other functions you want to use.
Not in code, but in text, I do find :: very useful. It's much more concise to type stats::filter than "the filter function from the stats package".
From a performance perspective, there is a (very) small price for using ::. Long-time R-Core development team member Martin Maechler wrote (on the r-devel mailing list (Sept 2017))
Many people seem to forget that every use of :: is an R
function call and using it is inefficient compared to just using
the already imported name.
The performance penalty is very small, on the order of a few microseconds, so it's only a concern when you need highly optimized code. Running a line of code that uses :: one million times will take a second or two longer than code that doesn't use ::.
As far as portability goes, it's nice to explicitly load packages at the top of a script because it makes it easy to glance at the first few lines and see what packages are needed, installing them if necessary before getting too deep in anything else, i.e., getting halfway through a long process that now can't be completed without starting over.
Aside: a similar argument can be made to prefer library() over require(). Library will cause an error and stop if the package isn't there, whereas require will warn but continue. If your code has a contingency plan in case the package isn't there, then by all means use if (require(package)) ..., but if your code will fail without a package you should use library(package) at the top so it fails early and clearly.
Within your own package
The general solution is to make your own package that imports the other packages you need to use in the DESCRIPTION file. Those packages will be automatically installed when your package is installed, so you can use pkg::fun internally. Or, by also importing them in the NAMESPACE file, you can import an entire package or selectively importFrom specific functions and not need ::. Opinions differ on this. Martin Maechler (same r-devel source as above) says:
Personally I've got the impression that :: is
much "overused" nowadays, notably in packages where I'd strongly
advocate using importFrom() in NAMESPACE, so all this happens
at package load time, and then not using :: in the package
sources itself.
On the other hand, RStudio Chief Scientist Hadley Wickham says in his R Packages book:
It's common for packages to be listed in Imports in DESCRIPTION, but not in NAMESPACE. In fact, this is what I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun(). Unless there is a strong reason not to, it's better to be explicit.
With two esteemed R experts giving opposite recommendations, I think it's fair to say that you should pick whichever style suits you best and meets your needs for clarity, efficiency, and maintainability.
If you frequently find yourself using just one function from another package, you can copy the code and add it to your own package. For example, I have a package for personal use that borrows %nin% from the Hmisc package because I think it's a great function, but I don't often use anything else from Hmisc. With roxygen2, it's easy to add #author and #references to properly attribute the code for a borrowed function. Also make sure the package licenses are compatible when doing this.
When making your own package for R, one often wants to make use of functions from a different package.
Maybe it's a plotting library like ggplot2, dplyr, or some niche function.
However, when making a function that depends on functions in other packages, what is the appropriate way to call them? In particular, I am looking for examples of when to use
myFunction <- function(x) {
example_package::function(x)
}
or
require(example_package)
myFunction <- function(x) {
function(x)
}
When should I use one over the other?
If you're actually creating an R package (as opposed to a script to source, R Project, or other method), you should NEVER use library() or require(). This is not an alternative to using package::function(). You are essentially choosing between package::function() and function(), which as highlighted by #Bernhard, explicitly calling the package ensures consistency if there are conflicting names in two or more packages.
Rather than require(package), you need to worry about properly defining your DESCRIPTION and NAMESPACE files. There's many posts about that on SO and elsewhere, so won't go into details, see here for example.
Using package::function() can help with above if you are using roxygen2 to generate your package documentation (it will automatically generate a proper NAMESPACE file.
The douple-colon variant :: has a clear advantage in the rare situations, when the same function name is used by two packages. There is a function psych::alpha to calculate Cronbach's alpha as a measure of internal consistency and a function scales::alpha to modify color transparency. There are not that many examples but then again, there are examples. dplyr even masks functions from the stats and base package! (And the tidyverse is continuing to produce more and more entries in our namespaces. Should you use dyplr you do not know, if the base function you use today will be masked by a future version of dplyr thus leading to an unexpected runtime problem of your package in the future.)
All of that is no problem if you use the :: variant. All of that is not a problem if in your package the last package opened is the one you mean.
The require (or library) variant leads to overall shorter code and it is obvious, at what time and place in the code the problem of a not-available package will lead to an error and thus become visible.
In general, both work well and you are free to choose, which of these admittedly small differences appears more important to you.
When RNGScope scope is needed when using Rcpp? In general, you need it when using C RNG functions. In some places you can read that it is not needed when using Rcpp, examples in Rcpp documentation use it, some examples by Dirk Eddelbuettel use it, while others do not like this one or this do not. So in the end I'm confused...
When exactly is it needed and when not? Does it make a difference if I use Rcpp::runif(), R::runif(), or R::unif_rand()? I'm interested mostly in using Rcpp within R package rather then calling standalone code.
In short:
whenever you call the RNGs of the C API of R, you need to save and later re-set state
RNGScope automates this for you as it is so darn useful (and generally inexpensive)
whenever you use Rcpp Attributes, it inserts RNGScope (as you can see when you turn on verbose=TRUE
So in general you don't need to anything manual -- unless you go really old school and write all code directly, foregoing the glue provided by Rcpp.
And if you use Rcpp, it doesn't matter whether you use scalar interfaces from the R:: namespace or the vectorized Rcpp Sugar onces via Rcpp::. RNGScope will be there for you.
I'm writing some R functions that employ some useful functions in other packages like stringr and base64enc. Is it good not to call library(...) or require(...) to load these packages first but to use :: to directly refer to the function I need, like stringr::str_match(...)?
Is it a good practice in general case? Or what problem might it induce?
It all depends on context.
:: is primarily necessary if there are namespace collisions, functions from different packages with the same name. When I load the dplyr package, it provides a function filter, which collides with (and masks) the filter function loaded by default in the stats package. So if I want to use the stats version of the function after loading dplyr, I'll need to call it with stats::filter.
This also gives motivation for not loading lots of packages. If you really only want one function from a package, it can be better to use :: than load the whole package, especially if you know the package will mask other functions you want to use.
Not in code, but in text, I do find :: very useful. It's much more concise to type stats::filter than "the filter function from the stats package".
From a performance perspective, there is a (very) small price for using ::. Long-time R-Core development team member Martin Maechler wrote (on the r-devel mailing list (Sept 2017))
Many people seem to forget that every use of :: is an R
function call and using it is inefficient compared to just using
the already imported name.
The performance penalty is very small, on the order of a few microseconds, so it's only a concern when you need highly optimized code. Running a line of code that uses :: one million times will take a second or two longer than code that doesn't use ::.
As far as portability goes, it's nice to explicitly load packages at the top of a script because it makes it easy to glance at the first few lines and see what packages are needed, installing them if necessary before getting too deep in anything else, i.e., getting halfway through a long process that now can't be completed without starting over.
Aside: a similar argument can be made to prefer library() over require(). Library will cause an error and stop if the package isn't there, whereas require will warn but continue. If your code has a contingency plan in case the package isn't there, then by all means use if (require(package)) ..., but if your code will fail without a package you should use library(package) at the top so it fails early and clearly.
Within your own package
The general solution is to make your own package that imports the other packages you need to use in the DESCRIPTION file. Those packages will be automatically installed when your package is installed, so you can use pkg::fun internally. Or, by also importing them in the NAMESPACE file, you can import an entire package or selectively importFrom specific functions and not need ::. Opinions differ on this. Martin Maechler (same r-devel source as above) says:
Personally I've got the impression that :: is
much "overused" nowadays, notably in packages where I'd strongly
advocate using importFrom() in NAMESPACE, so all this happens
at package load time, and then not using :: in the package
sources itself.
On the other hand, RStudio Chief Scientist Hadley Wickham says in his R Packages book:
It's common for packages to be listed in Imports in DESCRIPTION, but not in NAMESPACE. In fact, this is what I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun(). Unless there is a strong reason not to, it's better to be explicit.
With two esteemed R experts giving opposite recommendations, I think it's fair to say that you should pick whichever style suits you best and meets your needs for clarity, efficiency, and maintainability.
If you frequently find yourself using just one function from another package, you can copy the code and add it to your own package. For example, I have a package for personal use that borrows %nin% from the Hmisc package because I think it's a great function, but I don't often use anything else from Hmisc. With roxygen2, it's easy to add #author and #references to properly attribute the code for a borrowed function. Also make sure the package licenses are compatible when doing this.
I'm writing some R functions that employ some useful functions in other packages like stringr and base64enc. Is it good not to call library(...) or require(...) to load these packages first but to use :: to directly refer to the function I need, like stringr::str_match(...)?
Is it a good practice in general case? Or what problem might it induce?
It all depends on context.
:: is primarily necessary if there are namespace collisions, functions from different packages with the same name. When I load the dplyr package, it provides a function filter, which collides with (and masks) the filter function loaded by default in the stats package. So if I want to use the stats version of the function after loading dplyr, I'll need to call it with stats::filter.
This also gives motivation for not loading lots of packages. If you really only want one function from a package, it can be better to use :: than load the whole package, especially if you know the package will mask other functions you want to use.
Not in code, but in text, I do find :: very useful. It's much more concise to type stats::filter than "the filter function from the stats package".
From a performance perspective, there is a (very) small price for using ::. Long-time R-Core development team member Martin Maechler wrote (on the r-devel mailing list (Sept 2017))
Many people seem to forget that every use of :: is an R
function call and using it is inefficient compared to just using
the already imported name.
The performance penalty is very small, on the order of a few microseconds, so it's only a concern when you need highly optimized code. Running a line of code that uses :: one million times will take a second or two longer than code that doesn't use ::.
As far as portability goes, it's nice to explicitly load packages at the top of a script because it makes it easy to glance at the first few lines and see what packages are needed, installing them if necessary before getting too deep in anything else, i.e., getting halfway through a long process that now can't be completed without starting over.
Aside: a similar argument can be made to prefer library() over require(). Library will cause an error and stop if the package isn't there, whereas require will warn but continue. If your code has a contingency plan in case the package isn't there, then by all means use if (require(package)) ..., but if your code will fail without a package you should use library(package) at the top so it fails early and clearly.
Within your own package
The general solution is to make your own package that imports the other packages you need to use in the DESCRIPTION file. Those packages will be automatically installed when your package is installed, so you can use pkg::fun internally. Or, by also importing them in the NAMESPACE file, you can import an entire package or selectively importFrom specific functions and not need ::. Opinions differ on this. Martin Maechler (same r-devel source as above) says:
Personally I've got the impression that :: is
much "overused" nowadays, notably in packages where I'd strongly
advocate using importFrom() in NAMESPACE, so all this happens
at package load time, and then not using :: in the package
sources itself.
On the other hand, RStudio Chief Scientist Hadley Wickham says in his R Packages book:
It's common for packages to be listed in Imports in DESCRIPTION, but not in NAMESPACE. In fact, this is what I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun(). Unless there is a strong reason not to, it's better to be explicit.
With two esteemed R experts giving opposite recommendations, I think it's fair to say that you should pick whichever style suits you best and meets your needs for clarity, efficiency, and maintainability.
If you frequently find yourself using just one function from another package, you can copy the code and add it to your own package. For example, I have a package for personal use that borrows %nin% from the Hmisc package because I think it's a great function, but I don't often use anything else from Hmisc. With roxygen2, it's easy to add #author and #references to properly attribute the code for a borrowed function. Also make sure the package licenses are compatible when doing this.