Getting the parse tree for a predefined function in R


I feel as if this is a fairly basic question, but I can't figure it out.
If I define a function in R, how do I later use the name of the function to get its parse tree? I can't just use substitute, as that will just return the parse tree of its argument, in this case just the function name.
For example,
> f <- function(x){ x^2 }
> substitute(f)
f
How should I access the parse tree of the function using its name? For example, how would I get the value of substitute(function(x){ x^2 }) without explicitly writing out the whole function?

I'm not exactly sure which of these meets your needs:
eval(f)
#function(x){ x^2 }
identical(eval(f), get("f"))
#[1] TRUE
identical(eval(f), substitute( function(x){ x^2 }) )
#[1] FALSE
deparse(f)
#[1] "function (x) " "{" " x^2" "}"
body(f)
#------
{
x^2
}
#---------
eval(parse(text=deparse(f)))
#---------
function (x)
{
x^2
}
#-----------
parse(text=deparse(f))
#--------
expression(function (x)
{
x^2
})
#--------
get("f")
# function(x){ x^2 }
The print representation may not display the full features of the values returned.
class(substitute(function(x){ x^2 }) )
#[1] "call"
class( eval(f) )
#[1] "function"
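If the goal is the unevaluated function(...) call itself rather than the function object, one possibility is to rebuild it from the pieces R exposes. This is only a minimal sketch, not necessarily what the asker had in mind; the names args_f, body_f, fcall and g are chosen here for illustration.

```r
f <- function(x){ x^2 }

# formals() and body() return the argument list and the body as language objects
args_f <- formals(f)
body_f <- body(f)

# Reassemble an unevaluated call to `function`, similar in shape to
# substitute(function(x){ x^2 }) but without a source reference
fcall <- as.call(list(as.name("function"), args_f, body_f))

class(fcall)       # "call"
g <- eval(fcall)   # evaluating the call yields a function again
g(3)               # 9
```

The rebuilt call can be inspected element by element, for example fcall[[2]] for the formals and fcall[[3]] for the body.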

The function substitute can substitute in values bound in an environment. The odd thing is that its env argument has no explicit default value; when omitted, it defaults to the evaluation environment. This behavior seems to make substitution fail when the evaluation environment is the global environment, although it works fine otherwise.
Here is an example:
> a = new.env()
> a$f = function(x){x^2}
> substitute(f, a)
function(x){x^2}
> f = function(x){x^2}
> environment()
<environment: R_GlobalEnv>
> substitute(f, environment())
f
> substitute(f, globalenv())
f
As demonstrated, no substitution happens when the global environment is used as the second argument.
A further demonstration that it works correctly using a but not the global environment:
> evalq(substitute(f), a)
function(x){x^2}
> evalq(substitute(f), environment())
f
Quite puzzling.

Apparently that is indeed a quirk of substitute, and it is mentioned in this comment on do_substitute:
/* do_substitute has two arguments, an expression and an
environment (optional). Symbols found in the expression are
substituted with their values as found in the environment. There is
no inheritance so only the supplied environment is searched. If no
environment is specified the environment in which substitute was
called is used. If the specified environment is R_GlobalEnv it is
converted to R_NilValue, for historical reasons. In substitute(),
R_NilValue signals that no substitution should be done, only
extraction of promise expressions. Arguments to do_substitute
should not be evaluated.
*/
And you have already found a way of circumventing it:
e = new.env()
e$fn = f
substitute(fn, e)
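A small sketch of the behaviour described in that comment, under the assumption that it is run in a session where f is defined as in the question. It also uses a plain list as the second argument, which substitute accepts as well. Note that what comes back is the function object bound to the name, not a function(...) call.

```r
f <- function(x){ x^2 }

# With the global environment, no substitution happens: the symbol comes back
in_global <- substitute(f, globalenv())
is.name(in_global)            # TRUE

# With a fresh environment (or a plain list) the bound value is substituted
e <- new.env()
e$f <- f
in_env  <- substitute(f, e)
in_list <- substitute(f, list(f = f))
is.function(in_env)           # TRUE
is.function(in_list)          # TRUE
```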

Related

Why is this simple function not working?

I first defined new variable x, then created function that require x within its body (not as argument). See code below
x <- c(1,2,3)
f1 <- function() {
x^2
}
rm(x)
f2 <- function() {
x <- c(1,2,3)
f1()
}
f2()
Error in f1() : object 'x' not found
After I removed x and defined a new function f2 that first defines x and then executes f1, it reports that object x is not found.
I just want to know why this does not work and how I can overcome the problem. I do not want x to be an argument of f1.
Please suggest an appropriate title, because I do not know what kind of problem this is.

You could use a closure to make an f1 with the desired properties:
makeF <- function(){
x <- c(1,2,3)
f1 <- function() {
x^2
}
f1
}
f1 <- makeF()
f1() #returns 1 4 9
There is no x in the global scope but f1 still knows about the x in the environment that it was defined in.
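To see where that x lives, the enclosing environment of the returned function can be inspected. This is a sketch only; enc is a name introduced here for illustration, and makeF is written slightly more compactly than above.

```r
makeF <- function(){
  x <- c(1,2,3)
  function() x^2
}
f1 <- makeF()

# The enclosing environment of f1 is the frame of the makeF() call
enc <- environment(f1)
ls(enc)                   # "x"
get("x", envir = enc)     # 1 2 3
f1()                      # 1 4 9
```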

In short: you are expecting dynamic scoping but are a victim of R's lexical scoping:
dynamic scoping = the enclosing environment of a function is determined at run time (by where it is called)
lexical scoping = the enclosing environment of a function is determined at "compile time" (by where it is defined)
To understand the lookup path of your variable x in the current and parent environments, try this code.
It shows that f1 has no access to the environment in which x is defined inside f2, so x can never be found:
# list all parent environments of an environment to show the "search path"
parents <- function(env) {
while (TRUE) {
name <- environmentName(env)
txt <- if (nzchar(name)) name else format(env)
cat(txt, "\n")
if (txt == "R_EmptyEnv") break
env <- parent.env(env)
}
}
x <- c(1,2,3)
f1 <- function() {
print("f1:")
parents(environment())
x^2
}
f1() # works
# [1] "f1:"
# <environment: 0x4ebb8b8>
# R_GlobalEnv
# ...
rm(x)
f2 <- function() {
print("f2:")
parents(environment())
x <- c(1,2,3)
f1()
}
f2() # does not find "x"
# [1] "f2:"
# <environment: 0x47b2d18>
# R_GlobalEnv
# ...
# [1] "f1:"
# <environment: 0x4765828>
# R_GlobalEnv
# ...
Possible solutions:
Declare x in the global environment (bad programming style due to lack of encapsulation)
Use function parameters (this is what functions are made for)
Use a closure if x always has the same value for each call of f1 (not for beginners). See the other answer from @JohnColeman...
I strongly propose using 2. (add x as parameter - why do you want to avoid this?).
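A minimal sketch of option 2, assuming it is acceptable for f1 to take x as an argument after all:

```r
f1 <- function(x) {
  x^2
}
f2 <- function() {
  x <- c(1,2,3)
  f1(x)   # hand the local x to f1 explicitly
}
f2()      # 1 4 9
```

Here nothing depends on which environments happen to contain an x, so removing x from the global environment has no effect on the result.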

Unevaluated argument in R

I am still a novice in R and am still trying to understand lazy evaluation. I read quite a few threads on SO (R functions that pass on unevaluated arguments to other functions), but I am still not sure.
Question 1:
Here's my code:
f <- function(x = ls()) {
a<-1
#x ##without x
}
f(x=ls())
When I execute this code, i.e. f(), nothing is returned. Specifically, I don't see the value of a. Why is that?
Question 2:
Moreover, I do see the value of a in this code:
f <- function(x = ls()) {
a<-1
x ##with x
}
f(x=ls())
When I execute the function by f() I get :
[1] "a" "x"
Why is it so? Can someone please help me?

Question 1
This has nothing to do with lazy evaluation.
A function returns the result of the last statement it executed. In this case the last statement was a <- 1. The result of a <- 1 is one. You could for example do b <- a <- 1, which would result in b being equal to 1. So, in this case your function returns 1.
> f <- function(x = ls()) {
+ a<-1
+ }
> b <- f(x=ls())
> print(b)
[1] 1
The argument x is not used anywhere, and so doesn't play any role.
Functions can return values visibly (the default) or invisibly. In order to return invisibly the function invisible can be used. An example:
> f1 <- function() {
+ 1
+ }
> f1()
[1] 1
>
> f2 <- function() {
+ invisible(1)
+ }
> f2()
>
In this case f2 doesn't seem to return anything. However, it still returns the value 1. What invisible does is suppress printing when the function is called and the result is not assigned to anything. The relevance to your example is that a <- 1 also returns invisibly. That is the reason your function doesn't seem to return anything. But when assigned to b above, b still gets the value 1.
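One way to check this without relying on what gets printed is withVisible, which reports both the value and the visibility flag. A minimal sketch (res is a name introduced here):

```r
f <- function() {
  a <- 1
}
res <- withVisible(f())
res$value    # 1
res$visible  # FALSE
```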
Question 2
First, I'll explain why you see the results you see. The a you see in your result was caused by some previous code. If we first clean the workspace, we only see f. This makes sense, as we create a variable f (a function is also a variable in R) and then do an ls().
> rm(list = ls())
>
> f <- function(x = ls()) {
+ a<-1
+ x
+ }
> f(x=ls())
[1] "f"
What the function does (at least what you would expect) is first list all variables with ls(), then pass the result to the function as x. This function then returns x, which is the list of all variables, which then gets printed.
How this can be modified to show lazy evaluation at work
> rm(list = ls())
>
> f <- function(x) {
+ a <<- 1
+ x
+ }
>
> f(x = ls())
[1] "a" "f"
>
In this case the global assignment is used (a <<- 1), which creates a new variable a in the global workspace (not something you normally want to do).
In this case, one would still expect the result of the function call to be just f. The fact that it also shows a is caused by lazy evaluation.
Without lazy evaluation, it would first evaluate ls() (at that time only f exists in the workspace), copy that into the function with the name x. The function then returns x. In this case the ls() is evaluated before a is created.
However, with lazy evaluation, the expression ls() is only evaluated when the result of the expression is needed. In this case that is when the function returns and the result is printed. At that time the global environment has changed (a is created), which means that ls() also shows a.
(This is also one of the reasons why you don't want functions to change the global workspace using <<-.)
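The question also mentions calling f() with no argument, which gives "a" "x". A sketch of that case, assuming the difference comes from where the ls() expression is evaluated: a default argument is evaluated inside the function's own frame, while a supplied argument is evaluated in the caller's frame, in both cases only when x is first used. The names f_default, caller and only_here are hypothetical.

```r
f_default <- function(x = ls()) {
  a <- 1
  x               # the default ls() runs here, inside the function's frame
}
f_default()       # "a" "x"

caller <- function() {
  only_here <- TRUE
  f_default(x = ls())   # this ls() runs in caller()'s frame when x is used
}
caller()          # "only_here"
```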

Get the attribute of a packaged function from within itself

Suppose we have these functions in an R package.
prova <- function() {
print(attr(prova, 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
'myattr<-' <- function(x, value) {
attr(x, 'myattr') <- value
x
}
myattr <- function(x) attr(x, 'myattr')
So, I install the package and then I test it. This is the result:
prova()
# NULL
# NULL
myattr(prova) <- 'ciao' # setting 'ciao' for 'myattr' attribute
prova()
# NULL
# NULL # Why NULL here ?
myattr(prova)
# [1] "ciao"
attr(prova, 'myattr')
# [1] "ciao"
The question is: how to get the attribute of the function from within itself?
Inside the function itself I cannot get its attribute, as demonstrated by the example.
I suppose that the solution will be of the "computing on the language" kind (match.call()[[1L]], substitute, environments and friends). Am I wrong?
I think that the important point here is that this function is in a package (so, it has its environment and namespace) and I need its attribute inside itself, in the package, not outside.

You can use get with the envir argument.
prova <- function() {
print(attr(get("prova", envir=envir.prova), 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
For example:
envir.prova <- environment()
prova()
# NULL
# NULL
myattr(prova) <- 'ciao'
prova()
# [1] "ciao"
# [1] "ciao"
Where envir.prova is a variable whose value you set to the environment in which prova is defined.
Alternatively you can use get(.. envir=parent.frame()), but that is less reliable as then you have to track the calls too, and ensure against another object with the same name between the target environment and the calling environment.
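A sketch of why the original prova() printed NULL, using plain environments as stand-ins: ns for the package namespace and user for the workspace in which the attribute is set. Both names are hypothetical, and this only imitates the package situation under that assumption.

```r
# 'ns' plays the role of the package namespace (hypothetical stand-in)
ns <- new.env()
local({
  prova <- function() attr(prova, "myattr")
}, envir = ns)

# 'user' plays the role of the workspace where the attribute is set
user <- new.env(parent = ns)
evalq(attr(prova, "myattr") <- "ciao", user)

# The modified copy lives in 'user'; the original in 'ns' is untouched
attr(get("prova", envir = user), "myattr")   # "ciao"
attr(get("prova", envir = ns), "myattr")     # NULL

# Calling the modified copy still looks up 'prova' in its enclosure, 'ns'
evalq(prova(), user)                         # NULL
```

The replacement call creates a modified copy where it is run, while the name prova inside the function body keeps resolving to the unmodified original. That is why get needs to be pointed at the environment holding the copy.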
Update regarding question in the comments:
Regarding parent.frame() versus an explicit environment name: parent.frame, as the name suggests, goes "up one level." Often, that is exactly where you want to go, so that works fine. And even when your goal is to get an object in an environment further up, R searches up the call stack until it finds the object with the matching name. So very often, parent.frame() is just fine.
HOWEVER, if there are multiple calls between where you are invoking parent.frame() and where the object is located AND one of the intermediary environments contains another object with the same name, then R will stop at that intermediary environment and return its object, which is not the object you were looking for.
Therefore, parent.frame() has an argument n (which defaults to 1), so that you can tell R to begin its search n levels back.
This is the "keeping track" that I refer to, where the developer has to be mindful of the number of calls in between. The straightforward way to go about this is to have an n argument in every function that is calling the function in question, and have that value default to 1. Then for the envir argument, you use: get/assign/eval/etc (.. , envir=parent.frame(n=n) )
Then if you call Func2 from Func1, (both Func1 and Func2 have an n argument), and Func2 is calling prova, you use:
Func1 <- function(x, y, ..., n=1) {
... some stuff ...
Func2( <some, parameters, etc,> n=n+1)
}
Func2 <- function(a, b, c, ..., n=1) {
.... some stuff....
eval(quote(prova()), envir=parent.frame(n=n) )
}
As you can see, it is not complicated, but it is tedious, and sometimes what seems like a bug creeps in, which is simply forgetting to carry the n over.
Therefore, I prefer to use a fixed variable with the environment name.
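The effect of the n argument can be sketched with three nested calls that each define a variable of the same name. The function names top, middle, inner and the variable tag are made up for this illustration.

```r
inner <- function(n) {
  env <- parent.frame(n)                      # frame n levels up the call stack
  get("tag", envir = env, inherits = FALSE)   # look only in that frame
}
middle <- function() {
  tag <- "middle"
  one_up <- inner(1)   # looks in middle()'s frame
  two_up <- inner(2)   # looks in top()'s frame
  c(one_up, two_up)
}
top <- function() {
  tag <- "top"
  middle()
}
top()   # "middle" "top"
```

With inherits = FALSE the lookup is restricted to the chosen frame, which makes it visible which level n actually selects.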

The solution that I found is:
myattr <- function(x) attr(x, 'myattr')
'myattr<-' <- function(x, value) {
# check that x is a function (e.g. the prova function)
# checks on value (e.g. also value is a function with a given precise signature)
attr(x, 'myattr') <- value
x
}
prova <- function(..., env = parent.frame()) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], env)
# print(eval(as.call(c(myattr, this)), env)) # alternative
print(myattr(this))
# print(attr(this, 'myattr'))
invisible(TRUE)
}
I want to thank @RicardoSaporta for the help and the clarification about keeping track of the calls.
This solution doesn't work when, e.g., myattr(prova) <- function() TRUE is nested in func1 while prova is called in func2 (which is called by func1), unless you update its env parameter accordingly.
For completeness, following the suggestion of @RicardoSaporta, I slightly modified the prova function:
prova <- function(..., pos = 1L) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], parent.frame(n = pos))
print(myattr(this))
# ...
}
This way, it also works when nested, if the correct pos parameter is passed in.
With this modification it is easier to fish out the environment in which you set the attribute on the function prova.
myfun1 <- function() {
myattr(prova) <- function() print(FALSE)
myfun2(n = 2)
}
myfun2 <- function(n) {
prova(pos = n)
}
myfun1()
# function() print(FALSE)
# <environment: 0x22e8208>

How to prevent namespace pollution in R [duplicate]

This is probably not correct terminology, but hopefully I can get my point across.
I frequently end up doing something like:
myVar = 1
f <- function(myvar) { return(myVar); }
# f(2) = 1 now
R happily uses the variable outside of the function's scope, which leaves me scratching my head, wondering how I could possibly be getting the results I am.
Is there any option which says "force me to only use variables which have previously been assigned values in this function's scope"? Perl's use strict does something like this, for example. But I don't know that R has an equivalent of my.
EDIT: Thank you, I am aware that I capitalized them differently. Indeed, the example was created specifically to illustrate this problem!
I want to know if there is a way that R can automatically warn me when I do this.
EDIT 2: Also, if Rkward or another IDE offers this functionality I'd like to know that too.

As far as I know, R does not provide a "use strict" mode. So you are left with two options:
1 - Ensure all your "strict" functions don't have globalenv as environment. You could define a nice wrapper function for this, but the simplest is to call local:
# Use "local" directly to control the function environment
f <- local( function(myvar) { return(myVar); }, as.environment(2))
f(3) # Error in f(3) : object 'myVar' not found
# Create a wrapper function "strict" to do it for you...
strict <- function(f, pos=2) eval(substitute(f), as.environment(pos))
f <- strict( function(myvar) { return(myVar); } )
f(3) # Error in f(3) : object 'myVar' not found
2 - Do a code analysis that warns you of "bad" usage.
Here's a function checkStrict that hopefully does what you want. It uses the excellent codetools package.
# Checks a function for use of global variables
# Returns TRUE if ok, FALSE if globals were found.
checkStrict <- function(f, silent=FALSE) {
vars <- codetools::findGlobals(f)
found <- !vapply(vars, exists, logical(1), envir=as.environment(2))
if (!silent && any(found)) {
warning("global variables used: ", paste(names(found)[found], collapse=', '))
return(invisible(FALSE))
}
!any(found)
}
And trying it out:
> myVar = 1
> f <- function(myvar) { return(myVar); }
> checkStrict(f)
Warning message:
In checkStrict(f) : global variables used: myVar

checkUsage in the codetools package is helpful, but doesn't get you all the way there.
In a clean session where myVar is not defined,
f <- function(myvar) { return(myVar); }
codetools::checkUsage(f)
gives
<anonymous>: no visible binding for global variable ‘myVar’
but once you define myVar, checkUsage is happy.
See ?codetools in the codetools package: it's possible that something there is useful:
> findGlobals(f)
[1] "{" "myVar" "return"
> findLocals(f)
character(0)
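findGlobals can also report functions and variables separately through its merge argument, which makes the stray variable easier to spot among entries such as "{" and "return". A short sketch, assuming the codetools package is installed; globals is a name introduced here.

```r
f <- function(myvar) { return(myVar) }

# merge = FALSE returns a list with separate 'functions' and 'variables' entries
globals <- codetools::findGlobals(f, merge = FALSE)
globals$variables   # "myVar"
globals$functions
```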

You need to fix the typo: myvar != myVar. Then it will all work...
Scope resolution is 'from the inside out' starting from the current one, then the enclosing and so on.
Edit Now that you clarified your question, look at the package codetools (which is part of the R Base set):
R> library(codetools)
R> f <- function(myVAR) { return(myvar) }
R> checkUsage(f)
<anonymous>: no visible binding for global variable 'myvar'
R>

Using get(x, inherits=FALSE) will force local scope.
myVar = 1
f2 <- function(myvar) get("myVar", inherits=FALSE)
f3 <- function(myvar){
myVar <- myvar
get("myVar", inherits=FALSE)
}
output:
> f2(8)
Error in get("myVar", inherits = FALSE) : object 'myVar' not found
> f3(8)
[1] 8

You are of course doing it wrong. Don't expect static code checking tools to find all your mistakes. Check your code with tests. And more tests. Any decent test written to run in a clean environment will spot this kind of mistake. Write tests for your functions, and use them. Look at the glory that is the testthat package on CRAN.

There is a new package modules on CRAN which addresses this common issue (see the vignette here). With modules, the function raises an error instead of silently returning the wrong result.
# without modules
myVar <- 1
f <- function(myvar) { return(myVar) }
f(2)
[1] 1
# with modules
library(modules)
m <- module({
f <- function(myvar) { return(myVar) }
})
m$f(2)
Error in m$f(2) : object 'myVar' not found
This is the first time I use it. It seems to be straightforward so I might include it in my regular workflow to prevent time consuming mishaps.

you can dynamically change the environment tree like this:
a <- 1
f <- function(){
b <- 1
print(b)
print(a)
}
environment(f) <- new.env(parent = baseenv())
f()
Inside f, b can be found, while a cannot.
But probably it will do more harm than good.
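The same idea, wrapped in tryCatch so the failed lookup can be observed without stopping a script. This is only a sketch; outcome and the replacement message are made up for the example.

```r
a <- 1
f <- function(){
  b <- 1
  b + a
}
# Base functions such as `+` stay reachable, the global 'a' does not
environment(f) <- new.env(parent = baseenv())

outcome <- tryCatch(f(), error = function(e) "lookup of a failed")
outcome   # "lookup of a failed"
```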

You can test to see if the variable is defined locally:
myVar = 1
f <- function(myvar) {
if( exists('myVar', environment(), inherits = FALSE) ) return( myVar) else cat("myVar was not found locally\n")
}
> f(2)
myVar was not found locally
But I find it very artificial if the only thing you are trying to do is to protect yourself from spelling mistakes.
The exists function searches for the variable name in the particular environment. inherits = FALSE tells it not to look into the enclosing frames.

environment(fun) = parent.env(environment(fun))
will remove the 'workspace' from your search path, leave everything else. This is probably closest to what you want.

@Tommy gave a very good answer and I used it to create 3 functions that I think are more convenient in practice.
strict
to make a function strict, you just have to call
strict(f,x,y)
instead of
f(x,y)
example:
my_fun1 <- function(a,b,c){a+b+c}
my_fun2 <- function(a,b,c){a+B+c}
B <- 1
my_fun1(1,2,3) # 6
strict(my_fun1,1,2,3) # 6
my_fun2(1,2,3) # 5
strict(my_fun2,1,2,3) # Error in (function (a, b, c) : object 'B' not found
checkStrict1
To get a diagnosis, execute checkStrict1(f) with optional Boolean parameters to show more or less.
checkStrict1("my_fun1") # nothing
checkStrict1("my_fun2") # my_fun2 : B
A more complicated case:
A <- 1 # unambiguous variable defined OUTSIDE AND INSIDE my_fun3
# B unambiguous variable defined only INSIDE my_fun3
C <- 1 # defined OUTSIDE AND INSIDE with ambiguous name (C is also a base function)
D <- 1 # defined only OUTSIDE my_fun3 (D is also a base function)
E <- 1 # unambiguous variable defined only OUTSIDE my_fun3
# G unambiguous variable defined only INSIDE my_fun3
# H is undeclared and doesn't exist at all
# I is undeclared (though I is also base function)
# v defined only INSIDE (v is also a base function)
my_fun3 <- function(a,b,c){
A<-1;B<-1;C<-1;G<-1
a+b+A+B+C+D+E+G+H+I+v+ my_fun1(1,2,3)
}
checkStrict1("my_fun3",show_global_functions = TRUE ,show_ambiguous = TRUE , show_inexistent = TRUE)
# my_fun3 : E
# my_fun3 Ambiguous : D
# my_fun3 Inexistent : H
# my_fun3 Global functions : my_fun1
I chose to show only inexistent by default out of the 3 optional additions. You can change it easily in the function definition.
checkStrictAll
Get a diagnostic of all your potentially problematic functions, with the same parameters.
checkStrictAll()
my_fun2 : B
my_fun3 : E
my_fun3 Inexistent : H
sources
strict <- function(f1,...){
function_text <- deparse(f1)
function_text <- paste(function_text[1],function_text[2],paste(function_text[c(-1,-2,-length(function_text))],collapse=";"),"}",collapse="")
strict0 <- function(f1, pos=2) eval(substitute(f1), as.environment(pos))
f1 <- eval(parse(text=paste0("strict0(",function_text,")")))
do.call(f1,list(...))
}
checkStrict1 <- function(f_str,exceptions = NULL,n_char = nchar(f_str),show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
f <- try(eval(parse(text=f_str)),silent=TRUE)
if(inherits(f, "try-error")) {return(NULL)}
vars <- codetools::findGlobals(f)
vars <- vars[!vars %in% exceptions]
global_functions <- vars %in% functions
in_global_env <- vapply(vars, exists, logical(1), envir=globalenv())
in_local_env <- vapply(vars, exists, logical(1), envir=as.environment(2))
in_global_env_but_not_function <- rep(FALSE,length(vars))
for (my_mode in c("logical", "integer", "double", "complex", "character", "raw","list", "NULL")){
in_global_env_but_not_function <- in_global_env_but_not_function | vapply(vars, exists, logical(1), envir=globalenv(),mode = my_mode)
}
found <- in_global_env_but_not_function & !in_local_env
ambiguous <- in_global_env_but_not_function & in_local_env
inexistent <- (!in_local_env) & (!in_global_env)
if(typeof(f)=="closure"){
if(any(found)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),":", paste(names(found)[found], collapse=', '),"\n"))}
if(show_ambiguous & any(ambiguous)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Ambiguous :", paste(names(found)[ambiguous], collapse=', '),"\n"))}
if(show_inexistent & any(inexistent)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Inexistent :", paste(names(found)[inexistent], collapse=', '),"\n"))}
if(show_global_functions & any(global_functions)){cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Global functions :", paste(names(found)[global_functions], collapse=', '),"\n"))}
return(invisible(FALSE))
} else {return(invisible(TRUE))}
}
checkStrictAll <- function(exceptions = NULL,show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
n_char <- max(nchar(functions))
invisible(sapply(functions,checkStrict1,exceptions,n_char = n_char,show_global_functions,show_ambiguous, show_inexistent))
}

What works for me, based on @c-urchin's answer, is to define a script which reads all my functions and then excludes the global environment:
filenames <- Sys.glob('fun/*.R')
for (filename in filenames) {
source(filename, local=T)
funname <- sub('^fun/(.*)\\.R$', "\\1", filename)
eval(parse(text=paste('environment(',funname,') <- parent.env(globalenv())',sep='')))
}
I assume that
all functions and nothing else are contained in the relative directory ./fun and
every .R file contains exactly one function with an identical name as the file.
The catch is that if one of my functions calls another one of my functions, then the outer function has to also call this script first, and it is essential to call it with local=T:
source('readfun.R', local=T)
assuming of course that the script file is called readfun.R.

Finding the names of all functions in an R expression

I'm trying to find the names of all the functions used in an arbitrary legal R expression, but I can't find a function that will flag the below example as a function instead of a name.
test <- expression(
this_is_a_function <- function(var1, var2){
this_is_a_function(var1-1, var2)
})
all.vars(test, functions = FALSE)
[1] "this_is_a_function" "var1" "var2"
all.vars(expr, functions = FALSE) seems to return function declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...).
Is there any function - in the core libraries or elsewhere - that will flag 'this_is_a_function' as a function, not a name? It needs to work on arbitrary expressions, that are syntactically legal but might not evaluate correctly (e.g '+'(1, 'duck'))
I've found similar questions, but they don't seem to contain the solution.
If clarification is needed, leave a comment below. I'm using the parser package to parse the expressions.
Edit: @Hadley
I have expressions which contain entire scripts, which usually consist of a main function containing nested function definitions, with a call to the main function at the end of the script.
Functions are all defined inside the expressions, and I don't mind if I have to include '<-' and '{', since I can easily filter them out myself.
The motivation is to take all my R scripts and gather basic statistics about how my use of functions has changed over time.
Edit: Current Solution
A Regex-based approach grabs the function definitions, combined with the method in James' comment to grab function calls. Usually works, since I never use right-hand assignment.
function_usage <- function(code_string){
# takes a script, extracts function definitions
require(stringr)
code_string <- str_replace(code_string, 'expression\\(', '')
# pattern names match the operator they look for: '<-' (arrow) and '=' (equal)
arrow_assign <- '.+[ \n]+<-[ \n]+function'
equal_assign <- '.+[ \n]+=[ \n]+function'
function_names <- sapply(
strsplit(
str_match(code_string, arrow_assign), split = '[ \n]+<-'),
function(x) x[1])
function_names <- c(function_names, sapply(
strsplit(
str_match(code_string, equal_assign), split = '[ \n]+='),
function(x) x[1]))
return(table(function_names))
}

Short answer: is.function checks whether a variable actually holds a function. This does not work on (unevaluated) calls because they are calls. You also need to take care of masking:
mean <- mean (x)
Longer answer:
IMHO there is a big difference between the two occurrences of this_is_a_function.
In the first case you'll assign a function to the variable with name this_is_a_function once you evaluate the expression. The difference is the same difference as between 2+2 and 4.
However, just finding <- function () does not guarantee that the result is a function:
f <- function (x) {x + 1} (2)
The second occurrence is syntactically a function call. You can determine from the expression that a variable called this_is_a_function which holds a function needs to exist in order for the call to evaluate properly. BUT: you don't know whether it exists from that statement alone. However, you can check whether such a variable exists, and whether it is a function.
The fact that functions are stored in variables like other types of data means that in the first case you can know that the result of function () will be a function, and from that conclude that immediately after this expression is evaluated, the variable with name this_is_a_function will hold a function.
However, R is full of names and functions: "->" is the name of the assignment function (a variable holding the assignment function) ...
After evaluating the expression, you can verify this by is.function (this_is_a_function).
However, this is by no means the only expression that returns a function: Think of
f <- function () {g <- function (){}}
> body (f)[[2]][[3]]
function() {
}
> class (body (f)[[2]][[3]])
[1] "call"
> class (eval (body (f)[[2]][[3]]))
[1] "function"
all.vars(expr, functions = FALSE) seems to return function declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...).
I'd say it is the other way round: in that expression f is the variable (name) which will be assigned the function (once the call is evaluated). + (1, 2) evaluates to a numeric, unless you keep it from doing so.
> e <- expression (1 + 2)
> e [[1]]
1 + 2
> e [[1]][[1]]
`+`
> class (e [[1]][[1]])
[1] "name"
> eval (e [[1]][[1]])
function (e1, e2) .Primitive("+")
> class (eval (e [[1]][[1]]))
[1] "function"

Instead of looking for function definitions, which is going to be effectively impossible to do correctly without actually evaluating the functions, it will be easier to look for function calls.
The following function recursively spiders the expression/call tree returning the names of all objects that are called like a function:
find_calls <- function(x) {
# Base case
if (!is.recursive(x)) return()
recurse <- function(x) {
sort(unique(as.character(unlist(lapply(x, find_calls)))))
}
if (is.call(x)) {
f_name <- as.character(x[[1]])
c(f_name, recurse(x[-1]))
} else {
recurse(x)
}
}
It works as expected for a simple test case:
x <- expression({
f(3, g())
h <- function(x, y) {
i()
j()
k(l())
}
})
find_calls(x)
# [1] "{" "<-" "f" "function" "g" "i" "j"
# [8] "k" "l"
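For the stated goal of gathering usage statistics, a similar walk can keep duplicates and feed them to table. The following is a rough sketch, not the answer's function: walk_calls and the sample script are made up, and default values inside function argument lists are deliberately not visited.

```r
# Collect the name of every called function, keeping duplicates for counting
walk_calls <- function(x) {
  if (is.expression(x)) {
    return(unlist(lapply(x, walk_calls)))
  }
  if (!is.call(x)) {
    return(character(0))   # constants, symbols, argument lists: nothing to count
  }
  head <- x[[1]]
  own <- if (is.name(head)) as.character(head) else walk_calls(head)
  c(own, unlist(lapply(as.list(x)[-1], walk_calls)))
}

script <- "
total <- sum(sqrt(1:10))
avg <- mean(c(total, sum(1:3)))
"
usage <- table(walk_calls(parse(text = script)))
usage[["sum"]]    # 2
usage[["mean"]]   # 1
```

Because the script is only parsed and never evaluated, this also works on code that would fail at run time.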

Just to follow up here as I have also been dealing with this problem: I have now created a C-level function to do this using code very similar to the C implementation of all.names and all.vars in base R. It however only works with objects of type "language" i.e. function calls, not type "expression". Demonstration:
ex = quote(sum(x) + mean(y) / z)
all.names(ex)
#> [1] "+" "sum" "x" "/" "mean" "y" "z"
all.vars(ex)
#> [1] "x" "y" "z"
collapse::all_funs(ex)
#> [1] "+" "sum" "/" "mean"
Created on 2022-08-17 by the reprex package (v2.0.1)
This generalizes to arbitrarily complex nested calls.
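Without an extra package, a rough base R approximation is to subtract the variable names from all names. This is a sketch with one caveat: a name that is used both as a function and as a variable in the same expression would be dropped. fun_names is a name introduced here.

```r
ex <- quote(sum(x) + mean(y) / z)

# all.names() includes function names, all.vars() only variables
fun_names <- setdiff(all.names(ex), all.vars(ex))
fun_names   # "+" "sum" "/" "mean"
```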
