How is J() function implemented in data.table? - r

I recently learned about the elegant R package data.table. I am very curious to know how the J function is implemented there. This function is bound to the function [.data.table, it doesn't exist in the global environment.
I downloaded the source code but I cannot find the definition for this J function anywhere there. I found lockBind(".SD", ...), but not J. Any idea how this function is implemented?
Many thanks.

J() used to be exported before, but not since 1.8.8. Here's the note from 1.8.8:
o The J() alias is now removed outside DT[...], but will still work inside DT[...]; i.e., DT[J(...)] is fine. As warned in v1.8.2 (see below in this file) and deprecated with warning() in v1.8.4. This resolves the conflict with function J() in package XLConnect (#1747) and rJava (#2045). Please use data.table() directly instead of J(), outside DT[...].
Using R's lazy evaluation, J(.) is detected and simply replaced with list(.) using the (invisible) non-exported function .massagei.
That is, when you do:
require(data.table)
DT = data.table(x=rep(1:5, each=2L), y=1:10, key="x")
DT[J(1L)]
i (= J(1L)) is checked for its type and this line gets executed:
i = eval(.massagei(isub), x, parent.frame())
where isub = substitute(i) and .massagei is simply:
.massagei = function(x) {
if (is.call(x) && as.character(x[[1L]]) %chin% c("J","."))
x[[1L]] = quote(list)
x
}
Basically, data.table:::.massagei(quote(J(1L))) gets executed which returns list(1L), which is then converted to data.table. And from there, it's clear that a join has to happen.

Related

How to declare S3 method to default to loaded environment?

In a package, I would like to call an S3 method "compact" for object foobar.
There would therefore be a compact.foobar function in my package, along with the compact function itself:
compact = function(x, ...){
UseMethod("compact", x)
}
However, this latter would be conflicting with purrr::compact.
I could default the method to use purrr (compact.default = purrr::compact, or maybe
compact.list = purrr::compact), but that would make little sense if the user does not have purrr loaded.
How can I default my method to the loaded version of compact, in the user environment? (so that it uses purrr::compact, any other declared compact function, or fails of missing function)
Unfortunately S3 does not deal with this situation well. You have to search for suitable functions manually. The following works, more or less:
get_defined_function = function (name) {
matches = getAnywhere(name)
# Filter out invisible objects and duplicates
objs = matches$objs[matches$visible & ! matches$dups]
# Filter out non-function objects
funs = objs[vapply(objs, is.function, logical(1L))]
# Filter out function defined in own package.
envs = lapply(funs, environment)
funs = funs[! vapply(envs, identical, logical(1L), topenv())]
funs[1L][[1L]] # Return `NULL` if no function exists.
}
compact.default = function (...) {
# Maybe add error handling for functions not found.
get_defined_function('compact')(...)
}
This uses getAnywhere to find all objects named compact that R knows about. It then filters out those that are not visible because they’re not inside attached packages, and those that are duplicate (this is probably redundant, but we do it anyway).
Next, it filters out anything that’s not a function. And finally it filters out the compact S3 generic that our own package defines. To do this, it compares each function’s environment to the package environment (given by topenv()).
This should work, but it has no rule for which function to prefer if there are multiple functions with the same name defined in different locations (it just picks an arbitrary one first), and it also doesn’t check whether the function signature matches (doing this is hard in R, since function calling and parameter matching is very flexible).

How to extract functions used in R package?

I have inadvertently over-ridden the package I created with a different function, saved it, and closed R Studio. Now, my R package contains an unintended function.
Thankfully, I did not install the package, so I still have the old package contents stored in my computer.
Is there a way to extract the function from the installed package? It's one long function. Not more than one function.
And, no, I do not have a backup, at least not the updated version.
View(package::function)
Where package is the package you mentioned you had installed and function is the function you're looking to to inspect.
The important thing is to forego the parenthesis where you would normally have the function argument. This will open the function code for inspection.
You can view the structure of a function by typing it's name in the console.
> sum
function (..., na.rm = FALSE) .Primitive("sum")
To get the function from a package, you can use the :: operator
> dplyr::coalesce
function (x, ...)
{
values <- list(...)
for (i in seq_along(values)) {
x <- replace_with(x, is.na(x), values[[i]], paste0("Vector ",
i))
}
x
}
<environment: namespace:dplyr>

I cannot find unique function from data.table package [duplicate]

help(unique) shows that unique function is present in two packages - base and data.table. I would like to use this function from data.table package. I thought that the following syntax - data <- data.table::unique(data) indicates the package to be used. But I get the following error -
'unique' is not an exported object from 'namespace:data.table'
But data <- unique(data) works well.
What is wrong here?
The function in question is really unique.data.table, an S3 method defined in the data.table package. That method is not really intended to be called directly, so it isn't exported. This is typically the case with S3 methods. Instead, the package registers the method as an S3 method, which then allows the S3 generic, base::unique in this case, to dispatch on it. So the right way to call the function is:
library(data.table)
irisDT <- data.table(iris)
unique(irisDT)
We use base::unique, which is exported, and it dispatches data.table:::unique.data.table, which is not exported. The function data.table:::unique does not actually exist (or does it need to).
As eddi points out, base::unique dispatches based on the class of the object called. So base::unique will call data.table:::unique.data.table only if the object is a data.table. You can force a call to that method directly with something like data.table:::unique.data.table(iris), but internally that will mostly likely result in the next method getting called unless your object is actually a data.table.
There are actually two infix operators in R that pull functions from particular package namespaces. You used :: but there is also a ::: that retrieves "unexported" functions. The unique-function is actually a family of functions and its behavior will depend on both the class of its argument and the particular packages that have been loaded. The R term of this is "generic". Try:
data <- data.table:::unique(data) # assuming 'data' is a data.table
The other tool that lets you peek behind the curtain that the lack of "exportation" is creating is the getAnywhere-function. It lets you see the code at the console:
> unique.data.table
Error: object 'unique.data.table' not found
> getAnywhere(unique.data.table)
A single object matching ‘unique.data.table’ was found
It was found in the following places
registered S3 method for unique from namespace data.table
namespace:data.table
with value
function (x, incomparables = FALSE, fromLast = FALSE, by = key(x),
...)
{
if (!cedta())
return(NextMethod("unique"))
dups <- duplicated.data.table(x, incomparables, fromLast,
by, ...)
.Call(CsubsetDT, x, which_(dups, FALSE), seq_len(ncol(x)))
}
<bytecode: 0x2ff645950>
<environment: namespace:data.table>

Manipulate data.table objects within user defined function

I'm trying to write functions that use data.table methods to add and edit columns by reference. However, the same code that work in the console does not work when called from within a function. Here is a simple examples:
> dt <- data.table(alpha = c("a","b","c"), numeric = 1:3)
> foo <- function(x) {
print(is.data.table(x))
x[,"uppercase":=toupper(alpha)]
}
When I call this function, I get the following error.
> test = foo(dt)
[1] TRUE
Error in `:=`("uppercase", toupper(alpha)) :
Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are
defined for use in j, once only and in particular ways. See help(":=").
Yet if I type the same code directly into the console, it works perfectly fine.
> dt[,"uppercase":=toupper(alpha)]
> dt
alpha numeric uppercase
1: a 1 A
2: b 2 B
3: c 3 C
I've scoured stackoverflow and the only clues I could find suggest that the function might be looking for alpha in a different environment or parent frame.
EDIT: More information to reproduce the error. The function is not within a package, but declared directly in the global environment. I didn't realize this before, but in order to reproduce the error I need to dump the function to a file, then load it. I've saved all of my personal functions via dput and load them into R with dget, so that's why I get this error often.
> dput(foo, "foo.R")
> foo <- dget("foo.R")
> foo(dt)
Error in `:=` etc...
This problem is a different flavor of the one described in the post Function on data.table environment errors. It's not exactly a problem, just how dget is designed. But for those curious, this happens because dget assigns the object to parent environment base, and the namespace base isn't data.table aware.
If x is a function the associated environment is stripped. Hence scoping information can be lost.
One workaround is to assign the function to the global enviornment:
> environment(foo) <- .GlobalEnv
But I think the best solution here is to use saveRDS to transfer R objects, which is what ?dget recommends.

`data.table::unique` errors: is not an exported object from namespace

help(unique) shows that unique function is present in two packages - base and data.table. I would like to use this function from data.table package. I thought that the following syntax - data <- data.table::unique(data) indicates the package to be used. But I get the following error -
'unique' is not an exported object from 'namespace:data.table'
But data <- unique(data) works well.
What is wrong here?
The function in question is really unique.data.table, an S3 method defined in the data.table package. That method is not really intended to be called directly, so it isn't exported. This is typically the case with S3 methods. Instead, the package registers the method as an S3 method, which then allows the S3 generic, base::unique in this case, to dispatch on it. So the right way to call the function is:
library(data.table)
irisDT <- data.table(iris)
unique(irisDT)
We use base::unique, which is exported, and it dispatches data.table:::unique.data.table, which is not exported. The function data.table:::unique does not actually exist (or does it need to).
As eddi points out, base::unique dispatches based on the class of the object called. So base::unique will call data.table:::unique.data.table only if the object is a data.table. You can force a call to that method directly with something like data.table:::unique.data.table(iris), but internally that will mostly likely result in the next method getting called unless your object is actually a data.table.
There are actually two infix operators in R that pull functions from particular package namespaces. You used :: but there is also a ::: that retrieves "unexported" functions. The unique-function is actually a family of functions and its behavior will depend on both the class of its argument and the particular packages that have been loaded. The R term of this is "generic". Try:
data <- data.table:::unique(data) # assuming 'data' is a data.table
The other tool that lets you peek behind the curtain that the lack of "exportation" is creating is the getAnywhere-function. It lets you see the code at the console:
> unique.data.table
Error: object 'unique.data.table' not found
> getAnywhere(unique.data.table)
A single object matching ‘unique.data.table’ was found
It was found in the following places
registered S3 method for unique from namespace data.table
namespace:data.table
with value
function (x, incomparables = FALSE, fromLast = FALSE, by = key(x),
...)
{
if (!cedta())
return(NextMethod("unique"))
dups <- duplicated.data.table(x, incomparables, fromLast,
by, ...)
.Call(CsubsetDT, x, which_(dups, FALSE), seq_len(ncol(x)))
}
<bytecode: 0x2ff645950>
<environment: namespace:data.table>

Resources