call `[.data.table` explicitly - r

I'm seeing weird behavior when running my R program on another machine.
When I try to run a data.table join df1[df2], I get the error message
Error in `[.default`(x, i) : invalid subscript type 'list'
I assume that for some reason the R environment on the other machine does not find the data.table bracket function (although I have loaded the library there).
To force R to use the bracket from data.table, I would like to call the bracket function explicitly, but I can't find out how.
Here is what I've tried:
library(data.table)
df1 <- data.frame(a = c("a1","a2","a3"), n = c(1,2,3), b = c(T,T,T))
df2 <- data.frame(a = c("a1","a2","a3"), n = c(1,2,3), b = c(F,T,F))
df1 <- data.table(df1)
df2 <- data.table(df2)
setkey(df1,a,n,b)
setkey(df2,a,n,b)
df1[df2] # produces `[.default`(x, i) : invalid subscript type 'list'
# my attempts to call `[.data.table` explicitly all produce errors
`[.data.table`(df1, df2)
data.table::`[.data.table`(df1, df2)
data.table::`[`(df1, df2)
How can I use the bracket function from the data.table package explicitly?
EDIT:
OK, I'm trying to find the root cause of the error.
I'm using R version 3.2.1,
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.2 mypackage_1.0 ROracle_1.1-10 DBI_0.2-7
loaded via a namespace (and not attached):
[1] plyr_1.8.1 reshape2_1.4 Rcpp_0.11.2 stringr_0.6.2
is.data.table() gives TRUE for both df1 and df2 just before the call to df1[df2] (I'm stepping through the code).
The function that contains the line df1[df2] is called inside mypackage_1.0 (a package I'm developing). I have noticed that if I run the code line by line, instead of calling my package function and debugging it, it works as expected. So I assume something is wrong with the package. In the DESCRIPTION file I only list data.table under "Suggests". Might it be related to that?

Too long for a comment, so posting as an answer.
General comments related to your case:
You can call [.data.table explicitly by accessing the non-exported function with the ::: operator:
data.table:::`[.data.table`(x, i)
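With the objects from the question, that would be the following (note ::: rather than ::, since the method is not exported):
data.table:::`[.data.table`(df1, df2)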
Using ::: is not best practice, as it makes you responsible for a function that the package author decided not to expose to users directly. You should keep that in mind; still, R CMD check will not raise an error or warning. According to Writing R Extensions:
Using foo:::f instead of foo::f allows access to unexported objects. This is generally not recommended, as the semantics of unexported objects may be changed by the package author in routine maintenance.
In my opinion, if you develop an internal package that will be deployed with explicitly stated versions of its dependencies, it is reasonably safe to use :::.
Update your data.table version; 1.9.2 is a pretty old release already.
In your DESCRIPTION file, list data.table under Imports, and don't forget to declare the import in your NAMESPACE file, as sketched below.
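A minimal sketch of what the two files would then contain (assuming you maintain NAMESPACE by hand rather than generating it with roxygen2):
In DESCRIPTION:
Imports: data.table
In NAMESPACE:
import(data.table)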
Debug the problematic machine with the following check:
if(is.data.table(df1) && is.data.table(df2)) df1[df2] else stop("not a data.table")
Use sessionInfo() as one of your first steps when debugging cross-package issues, to track the attached packages.

Related

Can R differentiate between a manually loaded library and a dependency

I have written a function to get the name and version of all of my loaded packages:
my_lib <- function() {
  tmp <- (.packages())
  tmp_base <- sessionInfo()$basePkgs
  tmp <- setdiff(tmp, tmp_base)
  tmp <- sort(tmp)
  tmp <- sapply(tmp, function(x) {
    paste(x, utils::packageVersion(x), sep = " v")
  })
  tmp <- paste(tmp, collapse = ", ")
  return(tmp)
}
This also returns all packages loaded as dependencies of other packages (e.g. I load car and carData is loaded as a dependency).
I am wondering if there is a way to only return the packages I manually loaded (e.g. just car)? Can R tell the difference between manually loaded and loaded as a dependency?
Edit:
Added line to remove base packages using sessionInfo()
R has a subtle difference between a loaded package and an attached package. A package is attached when you use the library function, which makes its exported functions "visible" to the user's global environment. If a package is attached, its namespace has been loaded, but the opposite is not necessarily true.
Each package can define two main types of dependencies: Depends and Imports. The packages in the former get attached as soon as the dependent package is attached, but the packages in the latter only get loaded. This means you can't completely differentiate, because you may call library for a specific package, but any packages it Depends on will also be attached.
Nevertheless, you can differentiate between loaded and attached packages with loadedNamespaces() and search().
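For example, a sketch of the distinction (the exact package sets depend on your installed versions; dplyr is assumed here to list rlang under Imports):
library(dplyr)
"package:dplyr" %in% search()        # TRUE  -- attached by library()
"dplyr" %in% loadedNamespaces()      # TRUE  -- attached implies loaded
"package:rlang" %in% search()        # FALSE -- rlang was only imported, ...
"rlang" %in% loadedNamespaces()      # TRUE  -- ... so it is loaded, not attached
# Attached non-base packages, which is close to what the question asks for:
setdiff(sub("^package:", "", grep("^package:", search(), value = TRUE)),
        sessionInfo()$basePkgs)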
EDIT: It just occurred to me that if you want to track usage of library (ignoring require), you could write a custom tracker:
library_tracker <- with(new.env(), {
  packages <- character()
  function(flag) {
    # Calls made by trace() pass no arguments, so record the package name;
    # calls that do pass a flag simply return the names collected so far.
    if (missing(flag)) {
      packages <<- union(packages,
                         as.character(substitute(package, parent.frame())))
    }
    packages
  }
})
trace("library", library_tracker, print = FALSE)
library("dplyr")
library(data.table)
# retrieve packages loaded so far
library_tracker(TRUE)
[1] "dplyr" "data.table"
The flag parameter is only used to distinguish the calls made by trace, which invoke the function without arguments, from calls made outside of it, so that you can easily retrieve the packages loaded so far. You could also use environment(library_tracker)$packages.
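If you later want to stop tracking, you can remove the hook; the recorded names stay available in the tracker's environment:
untrace("library")
environment(library_tracker)$packages   # names recorded so far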

getting lazy data without attaching package

Background:
I have a CRAN R package which has a dependency on lazy-loaded data in another CRAN package of a specific version. I need to avoid using :: to refer to the data, because it causes CRAN check to fail.
I've read:
Evaluate function within package environment without attaching package
and
See if a variable/function exists in a package?
I've tried (using nycflights13 for this example):
# this works, but I can't use ::
nycflights13::airlines
find("airlines")
# character(0)
get("airlines", envir = asNamespace("nycflights13"), mode = "list")
#Error in get("airlines", envir = asNamespace("nycflights13"), mode = "list") : object 'airlines' of mode 'list' was not found
# attach
library(nycflights13)
get("airlines", envir = asNamespace("nycflights13"), mode = "list")
# works
find("airlines")
# [1] "package:nycflights13"
This may make it even more complicated, but I actually want to refer to an active binding, which returns data that may or may not be available.
What I would like:
A CRAN-compatible way of referring to lazy-loaded data in another package without using :: or Imports in DESCRIPTION.
My workaround was to export a getter function for the external package, for which I am also the author. This works because functions are visible, but lazy data and active bindings (which are set, in my case, in .onLoad()) are not.
Another possibility is to use the fact that :: is itself a function, so a call like the following is valid R; and because such calls can be constructed programmatically, this gives the flexibility to query the presence or absence of data in namespaces (not just in environments on the search() path):
`::`(nycflights13, airlines)
:: just converts the given symbols to strings and calls getExportedValue() in base.
So, better still, and I think this is my final answer:
base::getExportedValue(asNamespace("nycflights13"), "airlines")
This works without any requireNamespace() or library().
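A sketch of putting this to use programmatically, e.g. to test whether a dataset is exported by an installed (but unattached) package before touching it; the helper name has_export is illustrative, and it only checks exportedness, so evaluating an active binding may still need a tryCatch():
has_export <- function(pkg, name) {
  nzchar(system.file(package = pkg)) &&          # is the package installed at all?
    name %in% getNamespaceExports(pkg)           # loads (but does not attach) pkg
}
if (has_export("nycflights13", "airlines")) {
  head(base::getExportedValue(asNamespace("nycflights13"), "airlines"))
}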

R: Patching a package function and reloading base libraries

Occasionally one wants to patch a function in a package, without recompiling the whole package.
For example, in Emacs ESS, the function install.packages() might get stuck if tcltk is not loaded. One might want to patch install.packages() in order to require tcltk before installation and unload it after the package setup.
A patched version of install.packages(), stored in temp, might be built like this:
## Get original args without ending NULL
temp <- rev(rev(deparse(args(install.packages)))[-1])
temp <- paste(paste(temp, collapse = "\n"),
              ## Add code to load tcltk
              "{",
              " wasloaded= 'package:tcltk' %in% search()",
              " require(tcltk)",
              ## Add original body without braces
              paste(rev(rev(deparse(body(install.packages))[-1])[-1]), collapse = "\n"),
              ## Unload tcltk if it was not loaded before by the user
              " if(!wasloaded) detach('package:tcltk', unload=TRUE)",
              "}\n",
              sep = "\n")
## Eval patched function
temp <- eval(parse(text = temp))
# temp
Now we want to replace the original install.packages() and perhaps put the code in .Rprofile.
To this end, it is worth noting that:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... install.packages() source follows (quite lengthy)
That is, the function is stored inside both the package and the namespace of utils. These environments are sealed, and therefore the binding for install.packages() must be unlocked before it can be replaced:
## Override original function
unlockBinding("install.packages", as.environment("package:utils"))
assign("install.packages", temp, envir=as.environment("package:utils"))
unlockBinding("install.packages", asNamespace("utils"))
assign("install.packages", temp, envir=asNamespace("utils"))
rm(temp)
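Optionally, the bindings can be locked again afterwards (a detail the code above skips):
lockBinding("install.packages", as.environment("package:utils"))
lockBinding("install.packages", asNamespace("utils"))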
Using getAnywhere() again, we get:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... the *new* install.packages() source follows
It seems that the patched function is placed in the right place.
Unfortunately, running it gives:
Error in install.packages(xxxxx) :
could not find function "getDependencies"
getDependencies() is a function inside the same utils package, but not exported; therefore it is not accessible outside its namespace.
Despite the output of getAnywhere("install.packages"), the patched install.packages() is still misplaced.
The problem is that we need to reload the utils library to obtain the desired effect, which also requires unloading other libraries importing it.
detach("package:stats", unload=TRUE)
detach("package:graphics", unload=TRUE)
detach("package:grDevices", unload=TRUE)
detach("package:utils", unload=TRUE)
library(utils)
install.packages() works now.
Of course, we need to reload the other libraries too. Given the dependencies, using
library(stats)
should reload everything. But there is a problem when reloading the graphics library, at least on Windows:
library(graphics)
# Error in FUN(X[[i]], ...) :
# no such symbol C_contour in package path/to/library/graphics/libs/x64/graphics.dll
Which is the correct way of (re)loading the graphics library?
Patching functions in packages is a low-level operation that should be avoided, because it may break internal assumptions of the execution environment and lead to unpredictable behavior or crashes. If there is a problem with tcltk/ESS (I did not try to reproduce it), perhaps it should be fixed, or there may be a workaround. Changing locked bindings in particular is something to avoid.
If you really want to run some code at the start or end of, say, install.packages, you can use trace. It performs some of the low-level operations mentioned in the question, but the good part is that you don't have to worry about fixing this whenever some internals of R change.
trace(install.packages,
      tracer = quote(cat("Starting install.packages\n")),
      exit   = quote(cat("Ending install.packages.\n"))
)
Replace tracer and exit accordingly; maybe exit is not needed anyway, or maybe you don't need to unload the package at all. Still, trace is a very useful tool for debugging.
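For the tcltk goal from the question, the hooks might look like this (a sketch; the exit step unconditionally detaches tcltk, which is a simplification of the "only unload if it wasn't loaded before" logic above):
trace(install.packages,
      tracer = quote(require(tcltk)),
      exit   = quote(detach("package:tcltk", unload = TRUE)),
      print  = FALSE)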
I am not sure whether that will solve your problem (whether it would work with ESS), but in general you can also wrap install.packages in a function you define, say, in your workspace:
install.packages <- function(...) {
  cat("Entry.\n")
  on.exit(cat("Exit.\n"))
  utils::install.packages(...)
}
This is the cleanest option indeed.
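For the tcltk use case from the question, such a wrapper might look like this (a sketch that mirrors the load/unload logic of the original patch):
install.packages <- function(...) {
  wasloaded <- "package:tcltk" %in% search()
  require(tcltk)
  on.exit(if (!wasloaded) detach("package:tcltk", unload = TRUE))
  utils::install.packages(...)
}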

Why does this simple test with data.table fail? How can I fix it? [duplicate]

I am trying to use the data.table package inside my own package. MWE is as follows:
I create a function, test.fun, that simply creates a small data.table object and then aggregates the "Val" column grouped by the "A" column. The code is:
test.fun <- function() {
  library(data.table)
  testdata <- data.table(A = rep(seq(1, 5), 5), Val = rnorm(25))
  setkey(testdata, A)
  res <- testdata[, {list(Ct = length(Val), Total = sum(Val), Avg = mean(Val))}, "A"]
  return(res)
}
When I create this function in a regular R session, and then run the function, it works as expected.
> res<-test.fun()
data.table 1.8.0 For help type: help("data.table")
> res
A Ct Total Avg
[1,] 1 5 -0.5326444 -0.1065289
[2,] 2 5 -4.0832062 -0.8166412
[3,] 3 5 0.9458251 0.1891650
[4,] 4 5 2.0474791 0.4094958
[5,] 5 5 2.3609443 0.4721889
When I put this function into a package, install the package, load the package, and then run the function, I get an error message.
> library(testpackage)
> res<-test.fun()
data.table 1.8.0 For help type: help("data.table")
Error in `[.data.frame`(x, i, j) : object 'Val' not found
Can anybody explain to me why this is happening and what I can do to fix it? Any help is very much appreciated.
Andrie's guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a new vignette on importing data.table:
FAQ 6.9: I have created a package that depends on data.table. How do I
ensure my package is data.table-aware so that inheritance from
data.frame works?
Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
Further background ... at the top of [.data.table (and other data.table functions), you'll see a switch depending on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and on that namespace Import'ing or Depend'ing on data.table. This is how a data.table can be passed to non-data.table-aware packages (such as functions in base), and those packages can use absolutely standard [.data.frame syntax on the data.table, blissfully unaware that the data.frame is() a data.table, too.
This is also why data.table inheritance used not to be compatible with namespaceless packages, and why, upon user request, we had to ask the authors of such packages to add a namespace to be compatible. Happily, now that R adds a default namespace for packages missing one (from v2.14.0), that problem has gone away:
CHANGES IN R VERSION 2.14.0
* All packages must have a namespace, and one is created on installation if not supplied in the sources.
Here is the complete recipe:
Add data.table to Imports in your DESCRIPTION file.
Add a #' @import data.table roxygen tag to the relevant .R file (i.e., the .R file that houses the function that's throwing Error in [.data.frame(x, i, j) : object 'Val' not found).
Type library(devtools) and set your working directory to point at the main directory of your R package.
Type document(). This will ensure that your NAMESPACE file includes an import(data.table) line.
Type build()
Type install()
For a nice primer on what build() and install() do, see: http://kbroman.org/pkg_primer/.
Then, once you close your R session and login next time, you can immediately jump right in with:
Type library("my_R_package")
Type the name of your function that's housed in the .R file mentioned above.
Enjoy! You should no longer receive the dreaded Error in [.data.frame(x, i, j) : object 'Val' not found
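As a sketch, the recipe applied to the question's own function (assuming roxygen2 is used via devtools::document(); the Imports: entry in DESCRIPTION still has to be added by hand):
#' @import data.table
test.fun <- function() {
  # library(data.table) is no longer needed inside the function
  testdata <- data.table(A = rep(seq(1, 5), 5), Val = rnorm(25))
  setkey(testdata, A)
  testdata[, list(Ct = length(Val), Total = sum(Val), Avg = mean(Val)), by = "A"]
}
# After document(), NAMESPACE should contain the line: import(data.table)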

find all functions (including private) in a package

I know about ls("package:grid") and find.funs("package:grid") from mvbutils, but apparently neither of them can find non-exported functions and methods that are only accessible internally, via :::, or through getAnywhere().
I've had to source the files in the /R directory of the source package and use ls() on a clean global environment, but there must be a better way, no?
You can use asNamespace():
> methods(cbind)
[1] cbind.data.frame cbind.grobGrid cbind.ts*
Non-visible functions are asterisked
> r <- unclass(lsf.str(envir = asNamespace("stats"), all = T))
> r[grep("cbind.ts", r)]
[1] ".cbind.ts" "cbind.ts"
cbind.ts in the stats package is not visible from the search path, but it can be found with envir = asNamespace("stats").
This appears to be something of a perennial here.
If it's one-liners you're after, then this should be a contender (credit @Joshua):
ls(getNamespace("grid"), all.names=TRUE)
(Link is to a question that was asked after the above, but closely related).
As grid is a base package and I haven't yet moved up to R 3, I'm getting 756 functions with version 2.15.1, vs. 503 from the unclass() solution.
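A small helper sketching how to keep only the functions a package does not export (the name hidden_funs is illustrative):
hidden_funs <- function(pkg) {
  ns   <- asNamespace(pkg)
  objs <- mget(ls(ns, all.names = TRUE), envir = ns)
  setdiff(names(Filter(is.function, objs)), getNamespaceExports(pkg))
}
head(hidden_funs("stats"))   # ".cbind.ts" should be among these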
