Automatic redirection of functions in R

The language is R.
I have a couple of files:
utilities.non.foo.R
utilities.foo.R
utilities.R
foo is an in-house package that has been cobbled together (for image processing, although this is irrelevant). It works great, but only on Linux machines, and it is a huge pain to try and compile it even on those.
Basically, utilities.foo.R contains a whole lot of functions that require package foo.
The functions in here are called functionname.foo.
I'm about to start sharing this code with external collaborators who don't have this package or Linux, so I've written a file utilities.non.foo.R, which contains all the functions in utilities.foo.R, except the dependency on package foo has been removed.
These functions are all called functionname.non.foo.
The file utilities.R has a whole heap of this, for each function:
functionname <- function(...) {
  if ( fooIsLoaded() ) {
    functionname.foo(...)
  } else {
    functionname.non.foo(...)
  }
}
The idea is that one only needs to load utilities.R: if you happen to have package foo (e.g. my internal collaborators), you will use that backend, and if you don't (external collaborators), you'll use the non-foo backend.
My question is: is there some way to do the redirection for each function name without explicitly writing the above bit of code for every single function name?
This reminds me of how (e.g.) there is a print generic with methods print.table, print.data.frame, etc., but the user only ever calls print and the right method is chosen automatically.
I'd like to have that, except the method.class would be more like method.depends_on_which_package_is_loaded.
Is there any way to avoid writing a redirection function per function in my utilities.R file?

As Dirk says, just use a package. In this case, put all your new *.non.foo functions in a new package, which is also called foo. Distribute this foo to your collaborators, instead of your in-house version. That way your utilities code can just be
functionname <- function(...) functionname.foo(...)
without having to make any checks at all.

Here is an idea: write a function that sets f to either f.foo or f.non.foo. It could be called in a loop, over all functions in a given namespace (or all functions whose name ends in .foo).
dispatch <- function(s) {
  if ( fooIsLoaded() ) {
    f <- get( paste(s, "foo", sep=".") )
  } else {
    f <- get( paste(s, "non.foo", sep=".") )
  }
  assign( s, f, envir=.GlobalEnv ) # You may want to use a namespace
}
f.foo <- function() cat("foo\n")
f.non.foo <- function() cat("non-foo\n")
fooIsLoaded <- function() TRUE
dispatch("f")
f()
fooIsLoaded <- function() FALSE
dispatch("f")
f()
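Following the loop idea, here is a sketch (assuming the *.foo/*.non.foo naming convention from the question and that everything lives in the global environment; dispatch_all is a made-up name) that redirects every such function in one go:
dispatch_all <- function() {
  # all names ending in ".foo", minus the ".non.foo" variants
  foo_names <- ls(envir = .GlobalEnv, pattern = "\\.foo$")
  base_names <- sub("\\.foo$", "", foo_names[!grepl("\\.non\\.foo$", foo_names)])
  for (s in base_names) dispatch(s)
}
dispatch_all()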
A simpler solution would be to give the same name to both functions, but put them in different namespaces/packages.

This sounds quite inefficient and inelegant, but how about
funify = function(f, g, package="ggplot2") {
  if (paste("package:", package, sep="") %in% search()) {
    f
  } else {
    message("how do you intend to work without ", package)
    g
  }
}
detach(package:ggplot2)
foo = funify(paste, function(x) letters[x])
foo(1:10)
library(ggplot2)
foo = funify(paste, function(x) letters[x])
foo(1:10)

Related

How can I use get0 in my R package to search only within package namespace?

Let's say I have an internal function in my package, call it is_(), which can be thought of as a more generalized version of is(). In particular, it searches for specialized functions that are used to tell whether the supplied object is of the given class. For example, is_() might be programmed as:
is_ <- function(obj, class) {
  if (exists(paste0("is.", class))) {
    get0(paste0("is.", class))(obj)
  } else {
    inherits(obj, class)
  }
}
That way, if I'm doing argument checking in my package, I can run something like
is_(x, "numeric_vector")
where I've defined
is.numeric_vector <- function(x) is.numeric(x) && is.null(dim(x))
within my package.
A problem arises when is.numeric_vector() is defined outside my package, e.g., by another package the user has loaded. In that case, exists() and get0() both find the function wherever they can, but I want to restrict the search to functions defined in my package and included in my package's namespace (i.e., including all imported packages). The envir and inherits arguments seem to get at what I want, but I don't know how to supply them to get the result I want. How can I restrict get0() to only search for its argument within my package's namespace?
The problem is that your package namespace inherits from the base namespace, which inherits from the global environment. For a more detailed explanation, see here: https://adv-r.hadley.nz/environments.html#namespaces. If you want more control over the symbol lookup, you'll need to do the work yourself. You could include your own get function in your package:
# These are private helpers that do not need to be exported from your package.
.pkgenv <- environment()

get1 <- function(name, env = .pkgenv) {
  if (identical(env, emptyenv())) {
    NULL
  } else if (identical(env, globalenv())) {
    # stop at the global environment
    NULL
  } else if (exists(name, envir = env, inherits = FALSE)) {
    env[[name]]
  } else {
    # try the parent environment
    get1(name, parent.env(env))
  }
}
This will recursively search for the symbol in environments but stops at the global environment. You could use it with your is_ function like
is_ <- function(obj, class) {
  if (!is.null(fn <- get1(paste0("is.", class)))) {
    fn(obj)
  } else {
    inherits(obj, class)
  }
}
Here we just check for NULL rather than separately verifying the name and then retrieving the value. If get1 will be called a lot, you might want to cache the result so you don't have to walk the environment chain every time.
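For example, a possible caching layer (a sketch; .get1_cache and get1_cached are made-up names, and it assumes lookups are stable for the lifetime of the package):
.get1_cache <- new.env(parent = emptyenv())

get1_cached <- function(name) {
  if (!exists(name, envir = .get1_cache, inherits = FALSE)) {
    # store the result of the full walk, even when it is NULL
    assign(name, get1(name), envir = .get1_cache)
  }
  .get1_cache[[name]]
}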

Memoisation with RStudio's "Go to function"

I frequently use RStudio's "Go To Function Definition" (shortcut: F2) to navigate between many files and quickly access a function's definition or make changes (printing the function's definition is usually not enough).
To make my analysis quicker, many of my functions are memoised with the 'memoise' package. This is all well and good, but when I use the "Go To Function Definition" button (or F2), it takes me to the memoised wrapper. This is the result:
function (Date = Sys.Date(), Symbol)
{
    hash <- `_digest`(c(list(Date, Symbol), lapply(`_additional`,
        function(x) eval(x[[2L]], environment(x)))), algo = "sha512")
    if (`_cache`$has_key(hash)) {
        res <- `_cache`$get(hash)
    }
    else {
        res <- withVisible(`_f`(Date = Date, Symbol = Symbol))
        `_cache`$set(hash, res)
    }
    if (res$visible) {
        res$value
    }
    else {
        invisible(res$value)
    }
}
One note: I tried defining the function and then memoising it under the same name, as follows:
foo <- function(x) { return(x) }
foo <- memoise::memoise(foo)
But when I run this on Linux, any time I call foo I get an infinite loop. Oddly, it works well on Windows (and F2 works on Windows with this method!). I need something that works on a Linux system while keeping the F2 functionality.
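A plausible cause (an assumption, not verified against the poster's package versions) is lazy evaluation: if memoise does not force its function argument, the wrapper's captured function later resolves the name foo to the rebound value, i.e. to the wrapper itself. A sketch of the usual workaround is to bind the memoised wrapper to a different name (foo_impl is a made-up name):
# keep the real implementation under its own name...
foo_impl <- function(x) { return(x) }
# ...so the memoised wrapper can never resolve back to itself
foo <- memoise::memoise(foo_impl)
As a bonus, F2 on foo_impl then jumps straight to the actual definition.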

R, dplyr and snow: how to parallelize functions which use dplyr

Let's suppose that I want to apply, in a parallel fashion, myfunction to each row of myDataFrame. Suppose that otherDataFrame is a dataframe with two columns, COLUMN1_odf and COLUMN2_odf, which are used inside myfunction. So I would like to write code using parApply like this:
clus <- makeCluster(4)
clusterExport(clus, c("myfunction", "%>%"))
myfunction <- function(fst, snd) {
  # otherFunction and aGlobalDataFrame are defined in the global env
  otherFunction(aGlobalDataFrame)
  # some code to create otherDataFrame **INTERNALLY** to this function
  otherDataFrame %>% filter(COLUMN1_odf==fst & COLUMN2_odf==snd)
  return(otherDataFrame)
}
do.call(bind_rows, parApply(clus, myDataFrame, 1, function(r) { myfunction(r[1], r[2]) }))
The problem here is that R doesn't recognize COLUMN1_odf and COLUMN2_odf even if I insert them in clusterExport. How can I solve this problem? Is there a way to "export" all the objects that snow needs, so that I don't have to enumerate each of them?
EDIT 1: I've added a comment (in the code above) in order to specify that the otherDataFrame is created interally to myfunction.
EDIT 2: I've added some pseudo-code in order to generalize myfunction: it now uses a global dataframe (aGlobalDataFrame and another function otherFunction)
After some experiments, I solved my problem (with Benjamin's suggestion, and taking into account the 'edit' I added to the question) with:
myfunction <- function(fst, snd) {
  # otherFunction and aGlobalDataFrame are defined in the global env
  otherFunction(aGlobalDataFrame)
  # some code to create otherDataFrame **INTERNALLY** to this function
  otherDataFrame %>% dplyr::filter(COLUMN1_odf==fst & COLUMN2_odf==snd)
  return(otherDataFrame)
}
clus <- makeCluster(4)
clusterEvalQ(clus, {library(dplyr); library(magrittr)})
# clusterExport takes a character vector of names; define the objects first
clusterExport(clus, c("myfunction", "otherFunction", "aGlobalDataFrame"))
do.call(bind_rows, parApply(clus, myDataFrame, 1,
                            function(r) { myfunction(r[1], r[2]) }))
In this way I've exported aGlobalDataFrame, myfunction and otherFunction: in short, all the functions and data used by the function being parallelized (myfunction itself).
Now that I'm not looking at this on my phone, I can see a couple of issues.
First, you are not actually creating otherDataFrame in your function. You are trying to pipe an existing otherDataFrame into filter, and if otherDataFrame doesn't exist in the environment, the function will fail.
Second, unless you have already loaded the dplyr package into your cluster environments, you will be calling the wrong filter function.
Lastly, when you've called parApply, you haven't specified anywhere what fst and snd are supposed to be. Give the following a try:
myfunction <- function(otherDataFrame, fst, snd) {
  dplyr::filter(otherDataFrame, COLUMN1_odf==fst & COLUMN2_odf==snd)
}
clus <- makeCluster(4)
clusterEvalQ(clus, {library(dplyr); library(magrittr)})
clusterExport(clus, "myfunction")
# "[fst]" and "[snd]" are placeholders for the columns of myDataFrame that
# hold fst and snd; otherDataFrame must also be available on the workers
do.call(bind_rows, parApply(clus, myDataFrame, 1,
                            function(r, fst, snd) { myfunction(otherDataFrame, r[fst], r[snd]) },
                            "[fst]", "[snd]"))
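For completeness, a self-contained sketch along these lines (the data, column values and cluster size are made up; here otherDataFrame is exported to the workers rather than created inside myfunction):
library(parallel)
library(dplyr)

# hypothetical data, just to make the sketch runnable
otherDataFrame <- data.frame(COLUMN1_odf = rep(1:3, each = 2),
                             COLUMN2_odf = rep(c("a", "b"), 3))
myDataFrame <- data.frame(fst = 1:3, snd = c("a", "b", "a"))

myfunction <- function(otherDataFrame, fst, snd) {
  dplyr::filter(otherDataFrame, COLUMN1_odf == fst & COLUMN2_odf == snd)
}

clus <- makeCluster(2)
clusterEvalQ(clus, library(dplyr))
clusterExport(clus, c("myfunction", "otherDataFrame"))

# apply() coerces each row to a character vector, hence the as.integer()
res <- do.call(bind_rows, parApply(clus, myDataFrame, 1, function(r) {
  myfunction(otherDataFrame, as.integer(r["fst"]), r["snd"])
}))
stopCluster(clus)
res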

Find parent environment within call stack by function name

I'm working in a call stack of variable depth that looks like
TopLevelFunction
-> <SomeOtherFunction(s), 1 or more>
-> AssignmentFunction
Now, my goal is to assign a variable created in AssignmentFunction, to the environment of TopLevelFunction. I know I can extract the stack with sys.calls, so my current approach is
# get the call stack and search for TopLevelFunction
depth <- which(stringr::str_detect(as.character(sys.calls()), "TopLevelFunction"))
# assign in TopLevelFunction's environment
assign(varName, varValue, envir = sys.frame(depth))
I'm more or less fine with that, though I'm not sure it's a good idea to convert call objects to character vectors. Is that approach error-prone? More generally, how would you search for a specific parent environment, knowing only the name of the function?
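As an aside (a sketch, not from the original post; find_frame_by_fname is a made-up name), the same search can be done on the call objects themselves, avoiding the regex on deparsed calls; like the which() above, it assumes TopLevelFunction is on the stack, and it returns the first (outermost) match:
find_frame_by_fname <- function(fname) {
  calls <- sys.calls()
  for (i in seq_along(calls)) {
    fn <- calls[[i]][[1L]]
    # match only calls of the form fname(...), i.e. functions called by name
    if (is.name(fn) && identical(as.character(fn), fname)) return(sys.frame(i))
  }
  stop("no call to ", fname, " found on the stack")
}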
A function like this:
get_toplevel_env <- function(env) {
  if (identical(parent.env(env), globalenv())) {
    env
  } else {
    get_toplevel_env(parent.env(env))
  }
}
And then call it from any level of your nested functions like this:
get_toplevel_env(as.environment(-1))
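A hypothetical demo (note that this walks the lexical parent chain, so it assumes the functions are defined inside one another, not merely called; environment() here plays the role of as.environment(-1)):
TopLevelFunction <- function() {
  AssignmentFunction <- function() {
    # assign into the outermost function's evaluation environment
    assign("x", 42, envir = get_toplevel_env(environment()))
  }
  AssignmentFunction()
  x
}
TopLevelFunction()  # 42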
I'm not sure if I understood correctly what you want to do, but wouldn't it work to use parent.env(as.environment(-1))?
In this example it seems to work.
fn1 <- function() {
  fn1.1 <- function() {
    assign("parentvar", "PARENT",
           envir = parent.env(as.environment(-1)))
  }
  fn1.1()
  print(parentvar)
}
fn1()
Another possibility may be <<-, but note that it assigns in the enclosing (lexical) environment, falling through to the global environment only if the name is not found along the way, so that may not be what you want.

How to list all the function signatures in an R file?

Is there an R function that lists all the functions in an R script file along with their arguments?
i.e. an output of the form:
func1(var1, var2)
func2(var4, var10)
.
.
.
func10(varA, varB)
Using [sys.]source has the very undesirable side-effect of executing the code inside the file. At worst this has security implications, but even “benign” code may simply have unintended side-effects when executed. At best it just takes unnecessary time (and potentially a lot of it).
It’s actually unnecessary to execute the code, though: it is enough to parse it, and then do some syntactical analysis.
The actual code is trivial:
file_parsed = parse(filename)
functions = Filter(is_function, file_parsed)
function_names = unlist(Map(function_name, functions))
And there you go, function_names contains a vector of function names. Extending this to also list the function arguments is left as an exercise to the reader. Hint: there are two approaches. One is to eval the function definition (now that we know it’s a function definition, this is safe); the other is to “cheat” and just get the list of arguments to the function call.
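For completeness, here is one way to do that exercise, following the second hint (a sketch; it reads the formal argument names straight off the unevaluated function call, so default values are omitted):
function_args = function (expr) {
  # expr[[3L]] is the `function(…)` call; its second element is the
  # pairlist of formal arguments
  names(expr[[3L]][[2L]])
}

signatures = Map(function (expr) {
  paste0(function_name(expr), '(', paste(function_args(expr), collapse = ', '), ')')
}, functions)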
The implementation of the functions used above is also not particularly hard. There’s probably even something already in R core packages (‘utils’ has a lot of stuff) but since I’m not very familiar with this, I’ve just written them myself:
is_function = function (expr) {
  if (! is_assign(expr)) return(FALSE)
  value = expr[[3L]]
  is.call(value) && as.character(value[[1L]]) == 'function'
}

function_name = function (expr) {
  as.character(expr[[2L]])
}

is_assign = function (expr) {
  is.call(expr) && as.character(expr[[1L]]) %in% c('=', '<-', 'assign')
}
This correctly recognises function declarations of the forms
f = function (…) …
f <- function (…) …
assign('f', function (…) …)
It won’t work for more complex code, since assignments can be arbitrarily complex and in general are only resolvable by actually executing the code. However, the three forms above probably account for ≫ 99% of all named function definitions in practice.
UPDATE: Please refer to the answer by @Konrad Rudolph instead.
You can create a new environment, source your file in that environment and then list the functions in it using lsf.str() e.g.
test.env <- new.env()
sys.source("myfile.R", envir = test.env)
lsf.str(envir=test.env)
rm(test.env)
or if you want to wrap it as a function:
listFunctions <- function(filename) {
  temp.env <- new.env()
  sys.source(filename, envir = temp.env)
  functions <- lsf.str(envir = temp.env)
  rm(temp.env)
  return(functions)
}
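Usage, e.g. with the file from above:
listFunctions("myfile.R")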
