Many great R packages exist. However, they often use slightly different names for the same behaviour. As I often use several packages, the differing names get in my way. Thus, I would like to extend the original package by adding local functions. E.g.
in the package "rethinking" we use the function "extract.samples()" to obtain the samples from the posterior distribution.
in the package "rstanarm" we use the function "as.matrix()" instead.
It would be nice to add the function "extract.samples()" to my local repository, and to define that it is called only if the input parameter is an "rstanarm" object. Thus, I really would like to extend the package: if I load "rethinking", "rethinking::extract.samples()" is used, and if I load "rstanarm", my local "extract.samples()" for "rstanarm" objects is used.
What I currently do is the following:
extract.samples = function(object, n = 1000, ...) {
  if (inherits(object, "stanreg")) {
    # rstanarm object: posterior draws via as.matrix()
    SIMULATIONS = as.matrix(object)
  } else if (identical(attr(class(object), "package"), "rethinking")) {
    # rethinking object: delegate to the original function
    SIMULATIONS = rethinking::extract.samples(object, n = n, ...)
  }
  return(invisible(SIMULATIONS))
}
Thus, I explicitly have to take care of all the possible objects and parameter settings. This becomes messy if a third package defines the function "extract.samples", or if the two packages use different parameters. I wonder if there is a more robust method.
The proper way to do this is to create your own package that exports a generic function and several methods for it. If you can edit rethinking, then just do that; otherwise create a third package. I'll assume you're doing the latter.
Here's what the code could look like:
# The generic: dispatches on the class of its first argument.
extract.samples <- function(x, ...) {
  UseMethod("extract.samples")
}

# Method for rstanarm fits (class "stanreg").
extract.samples.stanreg <- function(x, ...) {
  as.matrix(x, ...)
}

# Fallback: delegate to the original rethinking implementation.
extract.samples.default <- function(x, ...) {
  rethinking::extract.samples(x, ...)
}
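For illustration, here is how dispatch would then look in a session. This is only a sketch: it assumes rstanarm is installed, and the model itself is invented.
fit <- rstanarm::stan_glm(mpg ~ wt, data = mtcars)  # a "stanreg" object
draws <- extract.samples(fit)  # dispatches to extract.samples.stanreg()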
Let's say I have an internal function in my package, call it is_(), which can be thought of as a more generalized version of is(). In particular, it searches for specialized functions that are used to tell whether the supplied object is of the given class. For example, is_() might be programmed as:
is_ <- function(obj, class) {
  if (exists(paste0("is.", class))) {
    get0(paste0("is.", class))(obj)
  } else {
    inherits(obj, class)
  }
}
That way, if I'm doing argument checking in my package, I can run something like
is_(x, "numeric_vector")
where I've defined
is.numeric_vector <- function(x) is.numeric(x) && is.null(dim(x))
within my package.
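For illustration, a few calls showing how is_() behaves with the definitions above (note that exists() also finds base functions like is.integer()):
is_(1:3, "numeric_vector")             # TRUE: dispatches to is.numeric_vector()
is_(matrix(1:4, 2), "numeric_vector")  # FALSE: a matrix has a dim attribute
is_(1:3, "integer")                    # TRUE: base's is.integer() is found and used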
A problem arises when is.numeric_vector() is defined outside my package, e.g., by another package the user has loaded. In that case, exists() and get0() both find the function wherever they can, but I want to restrict the search to functions defined in my package and included in my package's namespace (i.e., including all imported packages). The envir and inherits arguments seem to get at what I want, but I don't know how to supply them to get the result I want. How can I restrict get0() to only search for its argument within my package's namespace?
The problem is that your package namespace will inherit from the base namespace, which in turn inherits from the global environment. For a more detailed explanation, see here: https://adv-r.hadley.nz/environments.html#namespaces. If you want more control over the symbol lookup, you'll need to do the work yourself. You could include your own get function in your package:
# These are private helpers that do not need to be exported from your package.
.pkgenv <- environment()

get1 <- function(name, env = .pkgenv) {
  if (identical(env, emptyenv())) {
    NULL
  } else if (identical(env, globalenv())) {
    # stop the search at the global environment
    NULL
  } else if (exists(name, envir = env, inherits = FALSE)) {
    env[[name]]
  } else {
    # not found here; try the parent environment
    get1(name, parent.env(env))
  }
}
This will recursively search for the symbol through the chain of environments, but stops at the global environment. You could use it with your is_ function like this:
is_ <- function(obj, class) {
  if (!is.null(fn <- get1(paste0("is.", class)))) {
    fn(obj)
  } else {
    inherits(obj, class)
  }
}
Here we just check for NULL rather than separately testing for the name and then retrieving the value. If get1 will be called frequently, you might want to consider caching the results so you don't have to walk the environment chain every time.
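A minimal caching sketch could look like the following; the helper and cache names are invented, and it assumes the set of definitions visible to get1() doesn't change at run time:
.get1_cache <- new.env(parent = emptyenv())

get1_cached <- function(name) {
  # compute once, then reuse the stored result (including NULL misses)
  if (!exists(name, envir = .get1_cache, inherits = FALSE)) {
    assign(name, get1(name), envir = .get1_cache)
  }
  .get1_cache[[name]]
}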
I am attempting to take a function and apply it in a slightly different way in my own function. However, I keep running into dependency issues: "such and such function not found".
This is in spite of the fact that the function itself works just fine when only rlist is loaded.
This is select.list in the rlist package:
function (.data, ...) {
  args <- set_argnames(dots(...))
  quote <- as.call(c(quote(list), args))
  list.map.internal(.data, quote, parent.frame())
}
If I attempt to run this function I get an error that the function set_argnames was not found.
When I go into that function here:
function (args, data = args) {
  argnames <- getnames(args, character(length(args)))
  indices <- !nzchar(argnames) & vapply(args, is.name, logical(1L))
  argnames[indices] <- as.character(args[indices])
  setnames(data, argnames)
}
I run into errors of a similar nature.
I have installed and loaded all package dependencies: LinkingTo, Imports, and even Suggests. Nothing seems to solve the issue.
Is there a way to load all the dependent functions of a function?
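There is no built-in way to pull in all of a function's internal dependencies at once, but two common workarounds exist: bind the unexported helpers locally via :::, or, more robustly, set the environment of your modified copy to the package's namespace so every internal call resolves exactly as it does inside rlist. A sketch (my_select_list is an invented name for your modified copy):
# Option 1: grab the unexported helpers directly (fragile, fine for experiments).
set_argnames <- rlist:::set_argnames
dots         <- rlist:::dots

# Option 2: run your modified copy inside the rlist namespace, so that
# set_argnames(), dots(), list.map.internal(), etc. all resolve internally.
my_select_list <- function(.data, ...) {
  args <- set_argnames(dots(...))
  quote <- as.call(c(quote(list), args))
  list.map.internal(.data, quote, parent.frame())
}
environment(my_select_list) <- asNamespace("rlist")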
Let's suppose that I want to apply, in a parallel fashion, myfunction to each row of myDataFrame. Suppose that otherDataFrame is a dataframe with two columns, COLUMN1_odf and COLUMN2_odf, used for some reason in myfunction. So I would like to write code using parApply like this:
clus <- makeCluster(4)
clusterExport(clus, list("myfunction", "%>%"))

myfunction <- function(fst, snd) {
  # otherFunction and aGlobalDataFrame are defined in the global env
  otherFunction(aGlobalDataFrame)
  # some code to create otherDataFrame **INTERNALLY** to this function
  otherDataFrame %>% filter(COLUMN1_odf == fst & COLUMN2_odf == snd)
  return(otherDataFrame)
}

do.call(bind_rows, parApply(clus, myDataFrame, 1, function(r) { myfunction(r[1], r[2]) }))
The problem here is that R doesn't recognize COLUMN1_odf and COLUMN2_odf even if I add them to clusterExport. How can I solve this problem? Is there a way to "export" all the objects that snow needs, so that I don't have to enumerate each of them?
EDIT 1: I've added a comment (in the code above) to specify that otherDataFrame is created internally to myfunction.
EDIT 2: I've added some pseudo-code to generalize myfunction: it now uses a global dataframe (aGlobalDataFrame) and another function (otherFunction).
After some experiments, I solved my problem (with Benjamin's suggestion, and considering the 'edit' that I've added to the question) with:
clus <- makeCluster(4)
clusterEvalQ(clus, {library(dplyr); library(magrittr)})
clusterExport(clus, c("myfunction", "otherFunction", "aGlobalDataFrame"))

myfunction <- function(fst, snd) {
  # otherFunction and aGlobalDataFrame are defined in the global env
  otherFunction(aGlobalDataFrame)
  # some code to create otherDataFrame **INTERNALLY** to this function
  otherDataFrame %>% dplyr::filter(COLUMN1_odf == fst & COLUMN2_odf == snd)
  return(otherDataFrame)
}

do.call(bind_rows, parApply(clus, myDataFrame, 1,
                            function(r) { myfunction(r[1], r[2]) }))
In this way I've registered aGlobalDataFrame, myfunction, and otherFunction: in short, all the functions and data used by the function that does the parallelized work (myfunction itself).
Now that I'm not looking at this on my phone, I can see a couple of issues.
First, you are not actually creating otherDataFrame in your function. You are trying to pipe an existing otherDataFrame into filter, and if otherDataFrame doesn't exist in the environment, the function will fail.
Second, unless you have already loaded the dplyr package into your cluster environments, you will be calling the wrong filter function.
Lastly, when you've called parApply, you haven't specified anywhere what fst and snd are supposed to be. Give the following a try:
clus <- makeCluster(4)
clusterEvalQ(clus, {library(dplyr); library(magrittr)})
clusterExport(clus, "myfunction")

myfunction <- function(otherDataFrame, fst, snd) {
  dplyr::filter(otherDataFrame, COLUMN1_odf == fst & COLUMN2_odf == snd)
}

# "[fst]" and "[snd]" are placeholders for the names of the columns of
# myDataFrame that hold the fst and snd values; otherDataFrame is passed
# through parApply's ... so the workers receive it.
do.call(bind_rows, parApply(clus, myDataFrame, 1,
                            function(r, df, fst, snd) { myfunction(df, r[fst], r[snd]) },
                            otherDataFrame, "[fst]", "[snd]"))
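For completeness, here is a self-contained toy version of this pattern that can be run as-is; all the data and names are invented for illustration:
library(parallel)
library(dplyr)

lookup <- data.frame(COLUMN1_odf = rep(1:3, each = 2),
                     COLUMN2_odf = rep(1:2, 3),
                     value       = rnorm(6))
myDataFrame <- data.frame(fst = c(1, 2), snd = c(2, 1))

myfunction <- function(df, fst, snd) {
  dplyr::filter(df, COLUMN1_odf == fst & COLUMN2_odf == snd)
}

clus <- makeCluster(2)
clusterEvalQ(clus, library(dplyr))
clusterExport(clus, c("myfunction", "lookup"))
res <- do.call(bind_rows,
               parApply(clus, myDataFrame, 1,
                        function(r) myfunction(lookup, r["fst"], r["snd"])))
stopCluster(clus)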
Is there an R function that lists all the functions in an R script file along with their arguments?
i.e. an output of the form:
func1(var1, var2)
func2(var4, var10)
.
.
.
func10(varA, varB)
Using [sys.]source has the very undesirable side-effect of executing the code inside the file. At worst this has security implications, but even “benign” code may simply have unintended side-effects when executed. At best it just takes unnecessary time (and potentially a lot of it).
It’s actually unnecessary to execute the code, though: it is enough to parse it, and then do some syntactical analysis.
The actual code is trivial:
file_parsed = parse(filename)
functions = Filter(is_function, file_parsed)
function_names = unlist(Map(function_name, functions))
And there you go: function_names contains a vector of function names. Extending this to also list the function arguments is left as an exercise for the reader. Hint: there are two approaches. One is to eval the function definition (now that we know it’s a function definition, this is safe); the other is to “cheat” and just get the list of arguments to the function call (a sketch of the latter appears after the helper definitions below).
The implementation of the functions used above is also not particularly hard. There’s probably even something already in R core packages (‘utils’ has a lot of stuff) but since I’m not very familiar with this, I’ve just written them myself:
is_function = function (expr) {
  if (! is_assign(expr)) return(FALSE)
  value = expr[[3L]]
  is.call(value) && as.character(value[[1L]]) == 'function'
}

function_name = function (expr) {
  as.character(expr[[2L]])
}

is_assign = function (expr) {
  is.call(expr) && as.character(expr[[1L]]) %in% c('=', '<-', 'assign')
}
This correctly recognises function declarations of the forms
f = function (…) …
f <- function (…) …
assign('f', function (…) …)
It won’t work for more complex code, since assignments can be arbitrarily complex and in general are only resolvable by actually executing the code. However, the three forms above probably account for ≫ 99% of all named function definitions in practice.
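As promised, here is a sketch of the “cheat” approach for the arguments; function_args is an invented name, and it reuses the parse tree without evaluating anything:
function_args = function (expr) {
  # For `f = function (a, b = 1) …`, expr[[3L]] is the unevaluated `function`
  # call; its second element is the pairlist of formals, whose names are the
  # argument names.
  names(expr[[3L]][[2L]])
}

# Pairs up element-wise with function_names from above:
function_args_list = Map(function_args, functions)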
UPDATE: Please refer to the answer by @Konrad Rudolph instead.
You can create a new environment, source your file in that environment, and then list the functions in it using lsf.str(), e.g.
test.env <- new.env()
sys.source("myfile.R", envir = test.env)
lsf.str(envir=test.env)
rm(test.env)
or if you want to wrap it as a function:
listFunctions <- function(filename) {
  temp.env <- new.env()
  sys.source(filename, envir = temp.env)
  functions <- lsf.str(envir = temp.env)
  rm(temp.env)
  return(functions)
}
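For example, given a file containing the definitions from the question, the output would look roughly like this (lsf.str() prints each function with its formals; the file name is invented):
listFunctions("myfile.R")
## func1 : function (var1, var2)
## func2 : function (var4, var10)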
Does R support function overloading?
I want to do something along the lines of:
g <- function(X, Y) {
  # do something and return something
}
g <- function(X) {
  # do something and return something
}
EDIT, following clarification of the question in comments above:
From a quick glance at this page, it looks like Erlang allows you to define functions that will dispatch completely different methods depending on the arity of their argument list (up to a ..., following which the arguments are optional/don't affect the dispatched method).
To do something like that in R, you'll probably want to use S4 classes and methods. In the S3 system, the method that is dispatched depends solely on the class of the first argument. In the S4 system, the method that's called can depend on the classes of an arbitrary number of arguments.
For one example of what's possible, try running the following. It requires you to have installed both the raster package and the sp package. Between them, they provide a large number of functions for plotting both raster and vector spatial data, and both of them use the S4 system to perform method dispatch. Each of the lines returned by the call to showMethods() corresponds to a separate function, which will be dispatched when plot() is passed x and y arguments that have the indicated classes (which can include being entirely "missing").
> library(raster)
> showMethods("plot")
Function: plot (package graphics)
x="ANY", y="ANY"
x="Extent", y="ANY"
x="Raster", y="Raster"
x="RasterLayer", y="missing"
x="RasterStackBrick", y="ANY"
x="Spatial", y="missing"
x="SpatialGrid", y="missing"
x="SpatialLines", y="missing"
x="SpatialPoints", y="missing"
x="SpatialPolygons", y="missing"
R sure does. Try, for example:
plot(x = 1:10)
plot(x = 1:10, y = 10:1)
And then go have a look at how the function accomplishes that, by typing plot.default.
In general, the best way to learn how implement this kind of thing yourself will be to spend some time poking around in the code used to define functions whose behavior is already familiar to you.
Then, if you want to explore more sophisticated forms of method dispatch, you'll want to look into both the S3 and S4 class systems provided by R.
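As a small taste of S4, here is a minimal sketch in which dispatch depends on both the class of X and whether Y is supplied at all; the generic g and its methods are invented for illustration:
setGeneric("g", function(X, Y) standardGeneric("g"))

setMethod("g", signature(X = "numeric", Y = "numeric"),
          function(X, Y) X + Y)

setMethod("g", signature(X = "numeric", Y = "missing"),
          function(X, Y) X * 2)

g(1, 2)  # 3, dispatches on (numeric, numeric)
g(1)     # 2, dispatches on (numeric, missing)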
This is usually best done through optional arguments. For example:
g <- function(X, Y = FALSE) {
  if (Y == FALSE) {
    # do something
  } else {
    # do something else
  }
}
Check out the missing() function in R. For the function to still run, you need to reassign the missing variables before running the rest of the function. For example, this code:
overload = function(x, y) {
  if (missing(y)) {
    y = FALSE
  }
  if (y == FALSE) {
    print("One variable provided")
  } else {
    print("Two variables provided")
  }
}
overload(1)
overload(1, 2)
Will return:
> overload(1)
[1] "One variable provided"
> overload(1, 2)
[1] "Two variables provided"
Lastly, missing() is only reliable if you haven't altered the variable in question inside the function.
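A quick sketch of that caveat, following the note in ?missing (f is an invented example):
f <- function(x) {
  x <- 1       # alter the argument before checking
  missing(x)   # no longer reliable: returns FALSE even when x was omitted
}
f()  # FALSE, despite x never being supplied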