find all functions in a package that use a function - r

I would like to find all functions in a package that use a function. By functionB "using" functionA I mean that there exists a set of parameters such that functionA is called when functionB is given those parameters.
Also, it would be nice to be able to control the level at which the results are reported. For example, if I have the following:
outer_fn <- function(a,b,c) {
  inner_fn <- function(a,b) {
    my_arg <- function(a) {
      a^2
    }
    my_arg(a)
  }
  inner_fn(a,b)
}
I might or might not care to have inner_fn reported. Probably in most cases not, but I think this might be difficult to do.
Can someone give me some direction on this?
Thanks

A small step to find uses of functions is to find where the function name is used. Here's a small example of how to do that:
findRefs <- function(pkg, fn) {
  ns <- getNamespace(pkg)
  found <- vapply(ls(ns, all.names=TRUE), function(n) {
    f <- get(n, ns)
    is.function(f) && fn %in% all.names(body(f))
  }, logical(1))
  names(found[found])
}
findRefs('stats', 'lm.fit')
#[1] "add1.lm" "aov" "drop1.lm" "lm" "promax"
...To go further you'd need to analyze the body to ensure it is a function call or the FUN argument to an apply-like function or the f argument to Map etc... - so in the general case, it is nearly impossible to find all legal references...
Then you should really also check that getting the name from that function's environment returns the same function you are looking for (it might use a different function with the same name)... This would actually handle your "inner function" case.
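As a rough sketch of the "go further" step, one option is to let the codetools package (a recommended package that ships with R) separate names used as function calls from names used as plain variables. The helper name find_callers and the toy environment e are made up for illustration. This only catches direct calls such as mean(x); a function passed as FUN to an apply-like function is reported by codetools as a variable, so those uses would still be missed.

```r
library(codetools)  # recommended package bundled with R

# Hypothetical helper: names of closures in `env` whose body contains a
# direct call to a global function named `fn`.
find_callers <- function(env, fn) {
  nms <- ls(env, all.names = TRUE)
  hit <- vapply(nms, function(n) {
    f <- get(n, envir = env)
    # findGlobals() only works on closures, so skip primitives and non-functions
    typeof(f) == "closure" &&
      fn %in% findGlobals(f, merge = FALSE)$functions
  }, logical(1))
  names(hit)[hit]
}

# Toy environment standing in for a package namespace
e <- new.env()
e$calls_it <- function(x) mean(x) + 1            # really calls mean()
e$mentions <- function(x) { mean <- 3; x + mean } # `mean` is only a local variable
e$neither  <- function(x) x

find_callers(e, "mean")
```

For a real package one could try find_callers(getNamespace("stats"), "lm.fit"). Because findGlobals() ignores names that are defined locally inside the function, a locally defined inner function with the same name is not counted as a use of the global one.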

(Upgraded from a comment.) There is a very nice foodweb function in Mark Bravington's mvbutils package with a lot of this capability, including graphical representations of the resulting call graphs. This blog post gives a brief description.

Related

rlang: Error: Can't convert a function to a string

I created a function to convert a function name to string. Version 1 func_to_string1 works well, but version 2 func_to_string2 doesn't work.
func_to_string1 <- function(fun){
  print(rlang::as_string(rlang::enexpr(fun)))
}
func_to_string2 <- function(fun){
  is.function(fun)
  print(rlang::as_string(rlang::enexpr(fun)))
}
func_to_string1 works:
> func_to_string1(sum)
[1] "sum"
func_to_string2 doesn't work.
> func_to_string2(sum)
Error: Can't convert a primitive function to a string
Call `rlang::last_error()` to see a backtrace
My guess is that because fun is used before it is converted to a string, it gets evaluated inside the function, which triggers the error. But why does this happen, since I didn't do any assignments?
My questions are why does it happen and is there a better way to convert function name to string?
Any help is appreciated, thanks!
This isn't a complete answer, but I don't think it fits in a comment.
R has a mechanism called pass-by-promise,
whereby a function's formal arguments are lazy objects (promises) that only get evaluated when they are used.
Even if you didn't perform any assignment,
the call to is.function uses the argument,
so the promise is "replaced" by the result of evaluating it.
Nevertheless, in my opinion, this seems like an inconsistency in rlang*,
especially given cory's answer,
which implies that R can still find the promise object even after a given parameter has been used;
the mechanism to do so might not be part of R's public API though.
*EDIT: see comments.
Regardless, you could treat enexpr/enquo/ensym like base::missing,
in the sense that you should only use them with parameters you haven't used at all in the function's body.
Maybe use this instead?
func_to_string2 <- function(fun){
  is.function(fun)
  deparse(substitute(fun))
  #print(rlang::as_string(rlang::enexpr(fun)))
}
> func_to_string2(sum)
[1] "sum"
This question brings up an interesting point about lazy evaluation.
R arguments are lazily evaluated, meaning an argument is not evaluated until it is required.
This is best understood from the Advanced R book, which has the following example:
f <- function(x) {
  10
}
f(stop("This is an error!"))
The result is 10 and no error is raised, which may be surprising: x is never used in the body, so it is never evaluated. We can force x to be evaluated by using force():
f <- function(x) {
  force(x)
  10
}
f(stop("This is an error!"))
Now the call raises the error, as expected. In fact we don't even need force() (although it is good to be explicit):
f <- function(x) {
  x
  10
}
f(stop("This is an error!"))
This is what is happening with your call here. The argument fun initially refers to the symbol sum, but the promise is forced as soon as is.function() uses it, and from then on enexpr() sees the function itself rather than its name. In fact, even this will fail:
func_to_string2 <- function(fun){
  fun
  print(rlang::as_string(rlang::ensym(fun)))
}
Overall, I think it's best to call enexpr() at the very beginning of the function, before the argument is used.
Source:
http://adv-r.had.co.nz/Functions.html
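A minimal sketch of the "capture first, use later" ordering suggested above. The name func_to_string3 is made up for this example, and it assumes the rlang package is installed and that the caller passes a bare function name such as sum.

```r
# Hypothetical variant: capture the expression before anything forces `fun`
func_to_string3 <- function(fun) {
  fun_expr <- rlang::enexpr(fun)  # still an unevaluated symbol at this point
  stopifnot(is.function(fun))     # forcing the promise is harmless from here on
  rlang::as_string(fun_expr)
}

func_to_string3(sum)
```

Because the expression is stored in fun_expr before is.function() touches the argument, later checks on fun no longer affect the conversion to a string.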

Check if a function is called inside another function

Let's say I have the function
mean_wrapper <- function(x) {
  mean(x)
}
How can I check if the mean function is called?
A use case is, for instance, checking this behavior in a unit test.
EDIT:
Here is another example to make this clearer. Consider this function:
library(readr)
library(magrittr)
read_data <- function(file_name) {
  read_csv(file_name) %>%
    validate_data()
}
The aim of read_data is to read a CSV file and validate it. validate_data performs some checks on the data. It raises an error if one of them fails; otherwise it returns the input object.
I want to test both functions, but I don't want to replicate in the tests for read_data the same tests I wrote for validate_data. Still, I have to check that validate_data has been called in read_data, so I would like to write a test that does this for me.
You could trace mean:
trace(mean, tracer = quote(message("mean was called")))
mean_wrapper(3)
#Tracing mean(x) on entry
#mean was called
#[1] 3
untrace(mean)
#Untracing function "mean" in package "base"
Instead of a message you can use anything (e.g., assignment to a variable in the enclosing environment) as tracer.
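For the unit-test use case, a sketch of a tracer that counts calls instead of printing a message might look like this. The environment name calls is an arbitrary choice for this example; trace() also accepts a function as tracer, which is what is used here.

```r
mean_wrapper <- function(x) {
  mean(x)
}

# Environment used as a mutable counter (hypothetical name)
calls <- new.env()
calls$n <- 0L

# The tracer runs on entry to mean(); print = FALSE silences the
# "Tracing mean(x) on entry" line
trace(mean, tracer = function() calls$n <- calls$n + 1L, print = FALSE)
result <- mean_wrapper(c(1, 2, 3))
untrace(mean)  # always restore the original function

calls$n > 0  # TRUE if mean() was reached through mean_wrapper()
```

In a test, the final comparison could be wrapped in an expectation, and the untrace() call could be placed in on.exit() so the function is restored even if the test fails.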

R: Storing data within a function and retrieving without using "return"

The following simple example will help me address a problem in my program implementation.
fun2<-function(j)
{
  x<-rnorm(10)
  y<-runif(10)
  Sum<-sum(x,y)
  Prod<-prod(x,y)
  return(Sum)
}
j=1:10
Try<-lapply(j,fun2)
I want to store "Prod" at each iteration so I can access it after running the function fun2. I tried using assign() to allocate space, assign("Prod",numeric(10),pos=1), and then assigning Prod at the j-th iteration to Prod[j], but it does not work.
Any idea how this can be done?
Thank you
You can add anything you like in the return() command. You could return a list return(list(Sum,Prod)) or a data frame return(data.frame("In"=j,"Sum"=Sum,"Prod"=Prod))
I would then convert that list of data.frames into a single data.frame
Try2 <- do.call(rbind,Try)
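Put together, the data-frame variant of this suggestion could look roughly like the sketch below. It reuses the names from the question; the column names In, Sum and Prod follow the return() call quoted above.

```r
fun2 <- function(j) {
  x <- rnorm(10)
  y <- runif(10)
  # Return both values (plus the index) as a one-row data frame
  data.frame(In = j, Sum = sum(x, y), Prod = prod(x, y))
}

j <- 1:10
Try <- lapply(j, fun2)        # list of ten one-row data frames
Try2 <- do.call(rbind, Try)   # stack them into a single data frame
Try2$Prod                     # all ten products, no global state needed
```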
Maybe re-think the problem in a more vectorized way, taking advantage of the implied symmetry to represent intermediate values as a matrix and operating on that
ni = 10; nj = 20
x = matrix(rnorm(ni * nj), ni)
y = matrix(runif(ni * nj), ni)
sums = colSums(x + y)
prods = apply(x * y, 2, prod)
Thinking about a vectorized version applies to whatever your 'real' problem is just as much as it does to the sum / prod example. In practice, when thinking in terms of vectors fails, I've never used the environment or concatenation approaches from the other answers, but rather the simple solution of returning a list or vector.
I have done this before, and it works. Good for a quick fix, but it's kind of sloppy. The <<- operator assigns outside the function: it modifies the first existing binding found in the enclosing environments, which here is Prod in the global environment.
fun2<-function(j){
  x<-rnorm(10)
  y<-runif(10)
  Sum<-sum(x,y)
  Prod[j]<<-prod(x,y)
}
j=1:10
Prod <- numeric(length(j))
Try<-lapply(j,fun2)
Prod
thelatemail and JeremyS's solutions are probably what you want. Using lists is the normal way to pass back a bunch of different data items and I would encourage you to use it. Quoted here so no one thinks I'm advocating the direct option.
return(list(Sum,Prod))
Having said that, suppose that you really don't want to pass them back, you could also put them directly in the parent environment from within the function using either assign or the superassignment operator. This practice can be looked down on by functional programming purists, but it does work. This is basically what you were originally trying to do.
Here's the superassignment version
fun2<-function(j)
{
  x<-rnorm(10)
  y<-runif(10)
  Sum<-sum(x,y)
  Prod[j] <<- prod(x,y)
  return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2)
Note that the superassignment searches back for the first environment in which the variable exists and modifies it there. It's not appropriate for creating new variables above where you are.
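A small illustration of that search rule, using made-up names outer_fun and inner_fun: the superassignment changes the nearest enclosing x and leaves the one further out untouched.

```r
x <- "top level"

outer_fun <- function() {
  x <- "outer"
  inner_fun <- function() {
    x <<- "changed by inner"  # finds x in outer_fun's environment first
  }
  inner_fun()
  x
}

outer_fun()  # "changed by inner"
x            # still "top level"
```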
And an example version using the environment directly
fun2<-function(j,env)
{
  x<-rnorm(10)
  y<-runif(10)
  Sum<-sum(x,y)
  env$Prod[j] <- prod(x,y)
  return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2,env=parent.frame())
Notice that if you had called parent.frame() from within the function you would need to go back two frames because lapply() creates its own. This approach has the advantage that you could pass it any environment you want instead of parent.frame() and the value would be modified there. This is the seldom-used R implementation of writeable passing by reference. It's safer than superassignment because you know where the variable is that is being modified.
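To illustrate the point about passing any environment, here is a sketch that uses a dedicated environment instead of parent.frame(). The name store is an assumption made for this example.

```r
fun2 <- function(j, env) {
  x <- rnorm(10)
  y <- runif(10)
  env$Prod[j] <- prod(x, y)  # environments are modified in place
  sum(x, y)
}

store <- new.env()           # hypothetical container for side results
store$Prod <- numeric(10)

Try <- lapply(1:10, fun2, env = store)
store$Prod                   # the ten products, kept out of the global workspace
```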

How does local() differ from other approaches to closure in R?

Yesterday I learned from Bill Venables how local() can help create static functions and variables, e.g.,
example <- local({
  hidden.x <- "You can't see me!"
  hidden.fn <- function(){
    cat("\"hidden.fn()\"")
  }
  function(){
    cat("You can see and call example()\n")
    cat("but you can't see hidden.x\n")
    cat("and you can't call ")
    hidden.fn()
    cat("\n")
  }
})
which behaves as follows from the command prompt:
> ls()
[1] "example"
> example()
You can see and call example()
but you can't see hidden.x
and you can't call "hidden.fn()"
> hidden.x
Error: object 'hidden.x' not found
> hidden.fn()
Error: could not find function "hidden.fn"
I've seen this kind of thing discussed in Static Variables in R where a different approach was employed.
What are the pros and cons of these two methods?
Encapsulation
The advantage of this style of programming is that the hidden objects are unlikely to be overwritten by anything else, so you can be more confident that they contain what you think. They won't be used by mistake since they can't readily be accessed. In the linked-to post in the question there is a global variable, count, which could be accessed and overwritten from anywhere, so if we are debugging code, look at count, and see it has changed, we cannot really be sure what part of the code changed it. In contrast, in the example code of the question we have greater assurance that no other part of the code is involved.
Note that we actually can access the hidden function, although it's not that easy:
# run hidden.fn
environment(example)$hidden.fn()
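The same route can be used to inspect everything a local() closure hides. A short sketch with a cut-down stand-in for the example function from the question:

```r
# Cut-down stand-in: one hidden variable and a function that uses it
example <- local({
  hidden.x <- "You can't see me!"
  function() nchar(hidden.x)
})

env <- environment(example)     # the environment created by local()
ls(env)                         # "hidden.x"
get("hidden.x", envir = env)    # "You can't see me!"
```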
Object Oriented Programming
Also note that this is very close to object oriented programming where example and hidden.fn are methods and hidden.x is a property. We could do it like this to make it explicit:
library(proto)
p <- proto(x = "x",
  fn = function(.) cat(' "fn()"\n '),
  example = function(.) .$fn()
)
p$example() # prints "fn()"
proto does not hide x and fn, but it's not that easy to access them by mistake, since you must use p$x and p$fn() to access them, which is not really that different from being able to write e <- environment(example); e$hidden.fn()
EDIT:
The object oriented approach does add the possibility of inheritance, e.g. one could define a child of p which acts like p except that it overrides fn.
ch <- p$proto(fn = function(.) cat("Hello from ch\n")) # child
ch$example() # prints: Hello from ch
local() can implement a singleton pattern -- e.g., the snow package uses this to track the single Rmpi instance that the user might create.
getMPIcluster <- NULL
setMPIcluster <- NULL
local({
  cl <- NULL
  getMPIcluster <<- function() cl
  setMPIcluster <<- function(new) cl <<- new
})
local() might also be used to manage memory in a script, e.g., allocating large intermediate objects required to create a final object on the last line of the clause. The large intermediate objects are available for garbage collection when local returns.
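A minimal sketch of that memory-management use, with invented names (big_tmp_matrix, col_means): only the value of the last expression survives the local() block.

```r
col_means <- local({
  # Large intermediate object, visible only inside this block
  big_tmp_matrix <- matrix(rnorm(1e5), ncol = 10)
  colMeans(big_tmp_matrix)  # last expression becomes the value of local()
})

length(col_means)           # 10
exists("big_tmp_matrix")    # the intermediate is not left in the workspace
```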
Using a function to create a closure is a factory pattern -- the bank account example in the Introduction To R documentation, where each time open.account is invoked, a new account is created.
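A simplified sketch in the spirit of that bank account example (this is not the exact code from the R documentation, and the name open_account is chosen for this illustration): every call to the factory creates a fresh environment holding its own total.

```r
open_account <- function(total) {
  list(
    deposit = function(x) {
      total <<- total + x   # updates this account's own `total`
      invisible(total)
    },
    balance = function() total
  )
}

ross   <- open_account(100)
robert <- open_account(200)

ross$deposit(50)
ross$balance()    # 150
robert$balance()  # 200, unaffected by ross
```

In contrast to the local() singleton, each call here yields an independent closure, which is the factory behaviour described above.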
As @otsaw mentions, memoization might be implemented using local, e.g., to cache web sites in a crawler
library(XML)
crawler <- local({
  seen <- new.env(parent=emptyenv())
  .do_crawl <- function(url, base, pattern) {
    if (!exists(url, seen)) {
      message(url)
      xml <- htmlTreeParse(url, useInternalNodes=TRUE)
      hrefs <- unlist(getNodeSet(xml, "//a/@href"))
      urls <-
        sprintf("%s%s", base, grep(pattern, hrefs, value=TRUE))
      seen[[url]] <- length(urls)
      for (url in urls)
        .do_crawl(url, base, pattern)
    }
  }
  .do_report <- function(url) {
    urls <- as.list(seen)
    data.frame(Url=names(urls), Links=unlist(unname(urls)),
               stringsAsFactors=FALSE)
  }
  list(crawl=function(base, pattern="^/.*html$") {
    .do_crawl(base, base, pattern)
  }, report=.do_report)
})
crawler$crawl(favorite_url)
dim(crawler$report())
(The usual example of memoization, Fibonacci numbers, is not satisfying -- the range of numbers that don't overflow R's numeric representation is small, so one would probably use a look-up table of efficiently pre-calculated values.) It is interesting that crawler here is a singleton; it could as easily have followed a factory pattern, with one crawler per base URL.

attach() inside function

I'd like to give a params argument to a function and then attach it so that I can use a instead of params$a every time I refer to the list element a.
run.simulation<-function(model,params){
  attach(params)
  #
  # Use elements of params as parameters in a simulation
  detach(params)
}
Is there a problem with this? If I have defined a global variable named c and have also defined an element named c of the list "params" , whose value would be used after the attach command?
Noah has already pointed out that using attach is a bad idea, even though you see it in some examples and books. There is a way around it: you can use a "local attach", which is called with. In Noah's dummy example, this would look like
with(params, print(a))
which prints the a stored in params (unlike attach, with looks in params before the global environment), and is tidier.
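As a sketch of how with() could replace attach() inside the simulation function: the element names rate and size and the body are invented for illustration. Inside with(), elements of the list are found before global variables of the same name.

```r
a <- 1                 # global variable
params <- list(a = 2)  # list element with the same name
with(params, a)        # 2: the list element wins

run_simulation <- function(params) {
  with(params, {
    # `rate` and `size` are hypothetical elements of params
    rate * size
  })
}

run_simulation(list(rate = 0.5, size = 10))  # 5
```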
Another possibility is:
run.simulation <- function(model, params){
  # Assume params is a list of parameters from
  # "params <- list(name1=value1, name2=value2, etc.)"
  # seq_along() is safe even when params is empty (1:length(params) is not)
  for (v in seq_along(params)) assign(names(params)[v], params[[v]])
  # Use elements of params as parameters in a simulation
}
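A compact alternative to the assign() loop is base R's list2env(), which copies every named element of a list into an environment in one call. The sketch below uses a made-up body (a + b) to show the effect; the variables exist only inside the function call.

```r
run_simulation <- function(params) {
  # Copy all named elements of params into this function's own environment
  list2env(params, envir = environment())
  a + b  # `a` and `b` are hypothetical elements of params
}

run_simulation(list(a = 1, b = 2))  # 3
```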
Easiest way to solve scope problems like this is usually to try something simple out:
a = 1
params = c()
params$a = 2
myfun <- function(params) {
  attach(params)
  print(a)
  detach(params)
}
myfun(params)
The following object(s) are masked _by_ .GlobalEnv:
a
# [1] 1
As you can see, R is picking up the global variable a here.
It's almost always a good idea to avoid using attach and detach wherever possible -- scope ends up being tricky to handle (incidentally, it's also best to avoid naming variables c -- R will often figure out what you're referring to, but there are so many other letters out there, why risk it?). In addition, I find code using attach/detach almost impossible to decipher.
Jean-Luc's answer helped me immensely for a case that I had a data.frame Dat instead of the list as specified in the OP:
for (v in seq_len(ncol(Dat))) assign(names(Dat)[v], Dat[,v])
