R: use %within% operator without calling library(lubridate)

I'm writing a function that uses some lubridate functions.
However, I'm not sure how to import the function %within% from lubridate. Usually this is easy, as I just use lubridate::function. When I tried this with the %within% operator, this did not work. Is there any way to use operators from packages without loading the entire package itself?

Yes. The issue here is that %within% is a special symbol that is also a function. In R, pretty much everything is a function. For example, when you do 2 + 2, R actually interprets that as
`+`(2, 2)
See "Advanced R" by Hadley Wickham to understand these details.
So the most direct answer to your question is to apply the same logic to %within%: quote the operator's name with backticks and pass the arguments a and b directly:
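The same prefix-call trick works for any infix operator; a quick sketch:

```r
# Infix operators are ordinary functions with special names;
# wrapping the name in backticks lets you call them in prefix form.
`+`(2, 2)        # identical to 2 + 2, returns 4
`%in%`(3, 1:5)   # identical to 3 %in% 1:5, returns TRUE
```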
lubridate::`%within%`(a, b)
If you're developing a package yourself, you can document your function that uses %within% using roxygen2. All you need to do is:
#' @title Your Nice Function
#' @description This is a description...
#' @param a Your first param
#' @param b Your second param
#' @importFrom lubridate %within%
your_nice_function <- function(a, b) {
...
}
That does it, because you're saying that this function imports the specific function %within% from lubridate, without loading the entire lubridate package.
However, if you just want to use the function inside a script the most "familiar" way possible, maybe the best way is to:
`%within%` <- lubridate::`%within%`
By doing so, you're essentially copying the function to a local variable of the same name.
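To make this concrete, here is a minimal sketch using lubridate's interval() helper (assuming lubridate is installed):

```r
# Prefix call: backticks quote the operator's name so :: can reach it.
int <- lubridate::interval(as.Date("2024-01-01"), as.Date("2024-12-31"))
lubridate::`%within%`(as.Date("2024-06-15"), int)  # TRUE

# Or bind the operator locally and use the usual infix syntax:
`%within%` <- lubridate::`%within%`
as.Date("2024-06-15") %within% int                 # TRUE
```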

Related

R Package: how "import" works when my exported function does not explicitly call a function from other packages, but a subroutine does

I am developing my first R package and there is something that is not clear to me about Imports in the DESCRIPTION file. I went through quite a few guides that explain package structure, but I did not find an answer to my question, so here is my situation.
I define a function f that I will export, so its definition will have the proper @export roxygen comment on top.
Now, my function f calls a subroutine hidden that I do not want to export. Function hidden uses other packages too, say package X.
Because the call to X is inside function hidden, there is no tag @import X in my function f. Thus, I added package X to the Imports in my DESCRIPTION file, hoping to specify the relevant dependency there.
When I use devtools::document(), however, the generated NAMESPACE does not contain an entry for X. I can see why that happens: the parser just does not find the flag in the roxygen comments for f, and at runtime a call to f crashes because X is missing.
Now, I can probably fix everything by specifying X in the imports of f. But why is the mechanism this tricky? Or, similarly, why do my imports in DESCRIPTION not match the ones in NAMESPACE?
My understanding is that there are three "correct" ways to do the import. By "correct," I mean that they will pass CRAN checks and function properly. Which option you choose is a matter of balancing various advantages and is largely subjective.
I'll review these options below using the following terminology:
primary_function: the function in your package that you wish to export
hidden: the unexported function in your package used by primary_function
thirdpartypkg::blackbox: an exported function blackbox from the thirdpartypkg package
Option 1 (no direct import / explicit function call)
I think this is the most common approach. thirdpartypkg is declared in the DESCRIPTION file, but nothing is imported from thirdpartypkg in the NAMESPACE file. In this option, it is necessary to use the thirdpartypkg::blackbox construct to get the desired behavior.
# DESCRIPTION
Imports: thirdpartypkg
# NAMESPACE
export(primary_function)
#' @name primary_function
#' @export
primary_function <- function(x, y, z){
# do something here
hidden(a = y, b = x, c = z)
}
# Unexported function
#' @name hidden
hidden <- function(a, b, c){
# do something here
thirdpartypkg::blackbox(a, c)
}
Option 2 (direct import / no explicit function call)
In this option, you directly import the blackbox function. Having done so, it is no longer necessary to use thirdpartypkg::blackbox; you may simply call blackbox as if it were part of your package. (Technically it is: you imported it into your namespace, so there's no need to reach into another namespace to get it.)
# DESCRIPTION
Imports: thirdpartypkg
# NAMESPACE
export(primary_function)
importFrom(thirdpartypkg, blackbox)
#' @name primary_function
#' @export
primary_function <- function(x, y, z){
# do something here
hidden(a = y, b = x, c = z)
}
# Unexported function
#' @name hidden
#' @importFrom thirdpartypkg blackbox
hidden <- function(a, b, c){
# do something here
# I CAN USE blackbox HERE AS IF IT WERE PART OF MY PACKAGE
blackbox(a, c)
}
Option 3 (direct import / explicit function call)
Your last option combines the previous two options: it imports blackbox into your namespace, but then uses the thirdpartypkg::blackbox construct to utilize it. This is "correct" in the sense that it works, but it can be argued to be wasteful and redundant.
The reason I say it is wasteful and redundant is that, having imported blackbox into your namespace, you never use it. Instead, you use the blackbox in the thirdpartypkg namespace. Essentially, blackbox now exists in two namespaces, but only one of them is ever used, which raises the question of why make the copy at all.
# DESCRIPTION
Imports: thirdpartypkg
# NAMESPACE
export(primary_function)
importFrom(thirdpartypkg, blackbox)
#' @name primary_function
#' @export
primary_function <- function(x, y, z){
# do something here
hidden(a = y, b = x, c = z)
}
# Unexported function
#' @name hidden
#' @importFrom thirdpartypkg blackbox
hidden <- function(a, b, c){
# do something here
# I CAN USE blackbox HERE AS IF IT WERE PART OF MY PACKAGE
# EVEN THOUGH I DIDN'T. CONSEQUENTLY, THE blackbox I IMPORTED
# ISN'T BEING USED.
thirdpartypkg::blackbox(a, c)
}
Considerations
So which is the best approach to use? There isn't really an easy answer to that. I will say that Option 3 is probably not the approach to take. I can tell you that Wickham advises against Option 3 (I had been developing under that framework and he advised me against it).
If we make the choice between Option 1 and Option 2, the considerations we have to make are 1) efficiency of writing code, 2) efficiency of reading code, and 3) efficiency of executing code.
When it comes to efficiency of writing code, it's generally easier to @importFrom thirdpartypkg blackbox and avoid having to use the :: operator. It just saves a few keystrokes. This adversely affects the readability of the code, however, because it is no longer immediately apparent where blackbox comes from.
When it comes to efficiency of reading code, it's superior to omit @importFrom and use thirdpartypkg::blackbox. This makes it obvious where blackbox comes from.
When it comes to efficiency of executing code, it's better to @importFrom. Calling thirdpartypkg::blackbox is about 0.1 milliseconds slower than importing blackbox and calling it directly. That isn't a lot of time, so it probably isn't much of a consideration. But if your package uses hundreds of :: constructs and then gets thrown into looping or resampling processes, those milliseconds can start to add up.
Ultimately, the best guidance I've read (and I don't know where) is that if you are going to call blackbox more than a handful of times, it's worth using @importFrom. If you will only call it three or four times in a package, go ahead and use the :: construct.
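The lookup overhead is easy to observe with base R timing; a rough sketch (exact timings vary by machine, and the helper names here are made up for the demo):

```r
# `::` performs a namespace lookup on every call;
# binding the function once avoids the repeated lookup.
f_ns    <- function(n) { s <- 0; for (i in seq_len(n)) s <- base::sum(s, i); s }
f_local <- function(n) { s <- 0; f <- base::sum; for (i in seq_len(n)) s <- f(s, i); s }

n <- 1e5
t_ns    <- system.time(f_ns(n))["elapsed"]
t_local <- system.time(f_local(n))["elapsed"]
# t_ns is typically a bit larger than t_local.
```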

Defining custom dplyr methods in an R package

I have a package with custom summary(), print() methods for objects that have a particular class. This package also uses the wonderful dplyr package for data manipulation - and I expect my users to write scripts that use both my package and dplyr.
One roadblock, which has been noted by others here and here, is that dplyr verbs don't preserve custom classes: an ungroup command can strip my data.frames of their custom classes, and thus break method dispatch for summary, etc.
Hadley says "doing this correctly is up to you - you need to define a method for your class for each dplyr method that correctly restores all the classes and attributes" and I'm trying to take the advice - but I can't figure out how to correctly wrap the dplyr verbs.
Here's a simple toy example. Let's say I've defined a cars class, and I have a custom summary for it.
this works
library(tidyverse)
class(mtcars) <- c('cars', class(mtcars))
summary.cars <- function(x, ...) {
#gather some summary stats
df_dim <- dim(x)
quantile_sum <- map(x, quantile)
cat("A cars object with:\n")
cat(df_dim[[1]], 'rows and ', df_dim[[2]], 'columns.\n')
print(quantile_sum)
}
summary(mtcars)
here's the problem
small_cars <- mtcars %>% filter(cyl < 6)
summary(small_cars)
class(small_cars)
that summary call for small_cars just gives me the generic summary, not my custom method, because small_cars no longer retains the cars class after dplyr filtering.
what I tried
First I tried writing a custom method around filter (filter.cars). That didn't work, because filter is actually a wrapper around filter_ that allows for non-standard evaluation.
So I wrote a custom filter_ method for cars objects, attempting to implement @jwdink's advice:
filter_.cars <- function(df, ...) {
old_classes <- class(df)
out <- dplyr::filter_(df, ...)
new_classes <- class(out)
class(out) <- c(new_classes, old_classes) %>% unique()
out
}
That doesn't work - I get an infinite recursion error:
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
All I want to do is grab the classes on the incoming df, hand off to dplyr, then return the object with the same classnames as it had before the dplyr call. How do I change my filter_ wrapper to accomplish that? Thanks!
UPDATE:
Some things have changed since my original answer:
Many dplyr verbs no longer remove custom classes; for example, dplyr::filter keeps the class. However, some — like dplyr::group_by — still remove the class, so this question lives on.
With R 3.5 and beyond, method lookup changed its scoping rules
The trailing-underscore versions of the verbs are deprecated
I recently ran into a hard-to-diagnose issue due to the second bullet, so I just wanted to give a fuller example. Let's say you're using a custom class named custom_class, and you want to add a group_by method. Assuming you're using roxygen:
#' group_by.custom_class
#'
#' @description Preserve the class of a `custom_class` object.
#' @inheritParams dplyr::group_by
#'
#' @importFrom dplyr group_by
#'
#' @export
#' @method group_by custom_class
group_by.custom_class <- function(.data, ...) {
result <- NextMethod()
return(reclass(.data, result))
}
(see original answer for definition of reclass function)
Highlights:
You need @method group_by custom_class to add S3method(group_by,custom_class) to your NAMESPACE
You need @importFrom dplyr group_by to add importFrom(dplyr,group_by) to your NAMESPACE
I believe in R < 3.5 you could get away with just that second one, but now you need both.
OLD ANSWER:
Further suggestions were offered in the thread so I thought I'd update with what seems to be best practice, which is to use NextMethod().
filter_.cars <- function(.data, ...) {
result <- NextMethod()
reclass(.data, result)
}
Where reclass is written by you; it's just a generic that (at least) adds the original class back on:
reclass <- function(x, result) {
UseMethod('reclass')
}
reclass.default <- function(x, result) {
class(result) <- unique(c(class(x)[[1]], class(result)))
result
}
Your new filter_ method dispatches on the cars class again from within its own definition, hence the infinite recursion.
Following the advice in the issue you linked, remove the custom class before calling filter_ in your updated method:
class(out) <- class(out)[-1]
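Put together, the strip-and-restore pattern looks like this. It is sketched with a hypothetical stand-in verb so it runs without dplyr; in a real package the body would call dplyr::filter_ (or, in current dplyr, NextMethod() inside a filter method) instead of the stand-in:

```r
# Stand-in for a dplyr verb that drops custom classes:
strip_verb <- function(df) as.data.frame(df)

# Drop the custom class before delegating, then restore it on the result,
# so dispatch never re-enters this wrapper.
with_class_restored <- function(df, verb) {
  old_classes <- class(df)
  class(df) <- class(df)[-1]            # remove "cars" to avoid recursion
  out <- verb(df)
  class(out) <- unique(c(old_classes, class(out)))
  out
}

cars_df <- mtcars
class(cars_df) <- c("cars", class(cars_df))
res <- with_class_restored(cars_df, strip_verb)
class(res)[1]  # "cars"
```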

Why is there no lubridate:::update function?

As said in the title: Why is there no such function? Or in a different way: What is the type of the function? When I type ?update I get something from the stats package, but there is a lubridate function as described here on page 7. There also seems to be a lubridate:::update.Date function, but I can't find any explanation of that function.
Background: I use the function in a package and I only got it to work after I used Depends: in the DESCRIPTION file. Initially I wanted to use lubridate::update()...
The lubridate package provides the methods lubridate:::update.Date() and lubridate:::update.POSIXt(). Those functions are not exported from the namespace, but by means of S3 method dispatch they are invoked when update() is applied to a POSIXt or Date object while the lubridate package is loaded.
The help page ?lubridate:::update.POSIXt provides some information concerning the update function within the lubridate package:
Description
update.Date and update.POSIXt return a date with the specified
elements updated. Elements not specified will be left unaltered.
update.Date and update.POSIXt do not add the specified values to the
existing date, they substitute them for the appropriate parts of the
existing date.
Usage
## S3 method for class 'POSIXt'
update(object, ..., simple = FALSE)
The usage section and the examples in the help page indicate that these functions don't need to be addressed individually, as they are called by simply using update() when the lubridate library is loaded.
To inspect these functions one can type, e.g., lubridate:::update.POSIXt in the console (without passing arguments, and without the parentheses).
You need to load the lubridate package:
library(lubridate)
date <- now()
print(date)
new_date <- update(date, year = 2010, month = 1, day = 1)
print(new_date)
Outputs:
"2016-08-04 08:58:08 CEST"
"2010-01-01 08:58:08 CET"

When does a package need to use ::: for its own objects

Consider this R package with two functions, one exported and the other internal
hello.R
#' @export
hello <- function() {
internalFunctions:::hello_internal()
}
hello_internal.R
hello_internal <- function(x){
print("hello world")
}
NAMESPACE
# Generated by roxygen2 (4.1.1): do not edit by hand
export(hello)
When this is checked (devtools::check()) it returns the NOTE
There are ::: calls to the package's namespace in its code. A package
almost never needs to use ::: for its own objects:
‘hello_internal’
Question
Given the NOTE says almost never, under what circumstances will a package need to use ::: for its own objects?
Extra
I have a very similar related question where I do require the ::: for an internal function, but I don't know why it's required. Hopefully having an answer to this one will solve that one. I have a suspicion that unlocking the environment is doing something I'm not expecting, and thus having to use ::: on an internal function.
If they are considered duplicates of each other I'll delete the other one.
You should never need this in ordinary circumstances. You may need it if you are calling the parent function in an unusual way (for example, you've manually changed its environment, or you're calling it from another process where the package isn't attached).
Here is a pseudo-code example, where I think using ::: is the only viable solution:
# R-package with an internal function FInternal() that is called in a foreach loop
FInternal <- function(i) {...}
#' Exported function containing a foreach loop
#' @export
ParallelLoop <- function(is, <other-variables>) {
foreach(i = is) %dopar% {
# This fails, because it cannot locate FInternal, unless it is exported.
FInternal(i)
# This works but causes a note:
PackageName:::FInternal(i)
}
}
I think the problem here is that the body of the foreach loop is not defined as a function of the package. Hence, when executed on a worker process, it is not treated as code belonging to the package and does not have access to the internal objects of the package. I would be glad if someone could suggest an elegant solution for this specific case.

Choose function to load from an R package

I like using the reshape function from the matlab package, but then I need to specify base::sum(m) each time I want to sum the elements of my matrix; otherwise matlab::sum is called, which only sums by columns.
I need to load the gtools package to use the rdirichlet function, but then gtools::logit masks pracma::logit, which I like better.
I guess there are no such things like:
library(loadOnly = "rdirichlet", from = "gtools")
or
library(loadEverythingFrom = "matlab", except = "sum")
... because functions from the matlab package may internally rely on matlab::sum, so the latter must be loaded. But is there no way to get this behavior from the user's point of view? Something that would feel like:
library(pracma)
library(matlab)
library(gtools)
sum <- base::sum
logit <- pracma::logit
... but that would not clutter your ls() with all these small utility functions?
Maybe I need defining my own default namespace?
To avoid spoiling your ls, you can do something like this:
.ns <- new.env()
.ns$sum <- base::sum
.ns$logit <- pracma::logit
attach(.ns)
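For illustration, here's how the attached environment keeps the global workspace clean (sum2 is a hypothetical name, chosen to avoid masking in this demo):

```r
# Put preferred functions in a separate environment and attach it:
.ns <- new.env()
.ns$sum2 <- base::sum
attach(.ns, name = "myOverrides")

exists("sum2")                        # TRUE: found via the search path
"sum2" %in% ls(envir = globalenv())   # FALSE: the global workspace stays clean
detach("myOverrides")
```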
To my knowledge there is no easy answer to what you want to achieve. The only dirty hack I can think of is to download the source of the packages "matlab", "gtools", "pracma" and delete the offending functions from their NAMESPACE file prior to installation from source (with R CMD INSTALL package).
However, I would recommend using the explicit notation pracma::logit, because it improves readability of your code for other people and yourself in the future.
This site gives a good overview of package namespaces:
http://r-pkgs.had.co.nz/namespace.html