R errors indirect calling of argument - r

This is one of examples when '[' error occurs :
> libs=.packages(TRUE)
> library(help=libs[1])
Błąd w poleceniu 'find.package(pkgName, lib.loc, verbose = verbose)':
nie ma pakietu o nazwie ‘[’
R behaves differently when I use argument directly library(help="base") versus indirect use : x="base"; library(help=x), why R thinks I ask about x packages, what mechanism is used ? I think solution is somewhere here : http://adv-r.had.co.nz/

Looking at the source of library, you will find the following code
if (!character.only)
help <- as.character(substitute(help))
The help to substitute says that
substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env
where env means the current evaluation environment. But libs is bound in .GlobalEnv and not in the environment of the function library.
A simple example of what you are doing
x="a"
test_fun=function(x) {
as.character(substitute(x))
}
test_fun(x)
#"x"
However, if x was defined within the function body
#delete previous definition of x, if necessary
#rm(list="x")
test_fun=function(x) {
x="a"
as.character(substitute(x))
}
test_fun(x)
#"a"
One interesting aspect can be read further down the help of substitute
Substitution takes place by examining each component of the parse tree as follows: If it is not a bound symbol in env, it is unchanged. If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol. If it is an ordinary variable, its value is substituted, unless env is .GlobalEnv in which case the symbol is left unchanged.
The last part "unless env is .GlobalEnv in which case the symbol is left unchanged" is interesting because
x="a"
as.character(substitute(x))
"x"
This is because env is .GlobalEnv in this case. Then, substitution does not take place. I am sure there is a good reason for this - still it is somewhat surprising for me.

Related

How does R evaluate these weird expressions?

I was trying to make Python 3-style assignment unpacking possible in R (e.g., a, *b, c = [1,2,3], "C"), and although I got so close (you can check out my code here), I ultimately ran into a few (weird) problems.
My code is meant to work like this:
a %,*% b %,% c <- c(1,2,3,4,5)
and will assign a = 1, b = c(2,3,4) and c = 5 (my code actually does do this, but with one small snag I will get to later).
In order for this to do anything, I have to define:
`%,%` <- function(lhs, rhs) {
...
}
and
`%,%<-` <- function(lhs, rhs, value) {
...
}
(as well as %,*% and %,*%<-, which are slight variants of the previous functions).
First issue: why R substitutes *tmp* for the lhs argument
As far as I can tell, R evaluates this code from left to right at first (i.e., going from a to c, until it reaches the last %,%, where upon, it goes back from right to left, assigning values along the way. But the first weird thing I noticed is that when I do match.call() or substitute(lhs) in something like x %infix% y <- z, it says that the input into the lhs argument in %infix% is *tmp*, instead of say, a or x.
This is bizarre to me, and I couldn't find any mention of it in the R manual or docs. I actually make use of this weird convention in my code (i.e., it doesn't show this behavior on the righthand side of the assignment, so I can use the presence of the *tmp* input to make %,% behave differently on this side of the assignment), but I don't know why it does this.
Second issue: why R checks for object existence before anything else
My second problem is what makes my code ultimately not work. I noticed that if you start with a variable name on the lefthand side of any assignment, R doesn't seem to even start evaluating the expression---it returns the error object '<variable name>' not found. I.e., if x is not defined, x %infix% y <- z won't evaluate, even if %infix% doesn't actually use or evaluate x.
Why does R behave like this, and can I change it or get around it? If I could to run the code in %,% before R checks to see if x exists, I could probably hack it so that I wouldn't be a problem, and my Python unpacking code would be useful enough to actually share. But as it is now, the first variable needs to already exist, which is just too limiting in my opinion. I know that I could probably do something by changing the <- to a custom infix operator like %<-%, but then my code would be so similar to the zeallot package, that I wouldn't consider it worth it. (It's already very close in what it does, but I like my style better.)
Edit:
Following Ben Bolker's excellent advice, I was able to find a way around the problem... by overwriting <-.
`<-` <- function(x, value) {
base::`<-`(`=`, base::`=`)
find_and_assign(match.call(), parent.frame())
do.call(base::`<-`, list(x = substitute(x), value = substitute(value)),
quote = FALSE, envir = parent.frame())
}
find_and_assign <- function(expr, envir) {
base::`<-`(`<-`, base::`<-`)
base::`<-`(`=`, base::`=`)
while (is.call(expr)) expr <- expr[[2]]
if (!rlang::is_symbol(expr)) return()
var <- rlang::as_string(expr) # A little safer than `as.character()`
if (!exists(var, envir = envir)) {
assign(var, NULL, envir = envir)
}
}
I'm pretty sure that this would be a mortal sin though, right? I can't exactly see how it would mess anything up, but the tingling of my programmer senses tells me this would not be appropriate to share in something like a package... How bad would this be?
For your first question, about *tmp* (and maybe related to your second question):
From Section 3.4.4 of the R Language definition:
Assignment to subsets of a structure is a special case of a general mechanism for complex assignment:
x[3:5] <- 13:15
The result of this command is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
Note that the index is first converted to a numeric index and then the elements are replaced sequentially along the numeric index, as if a for loop had been used. Any existing variable called *tmp* will be overwritten and deleted, and this variable name should not be used in code.
The same mechanism can be applied to functions other than [. The replacement function has the same name with <- pasted on. Its last argument, which must be called value, is the new value to be assigned.
I can imagine that your second problem has to do with the first step of the "as if" code: if R is internally trying to evaluate *tmp* <- x, it may be impossible to prevent from trying to evaluate x at this point ...
If you want to dig farther, I think the internal evaluation code used to deal with "complex assignment" (as it seems to be called in the internal comments) is around here ...

Where are function constants stored if a function is created inside another function?

I am using a parent function to generate a child function by returning the function in the parent function call. The purpose of the parent function is to set a constant (y) in the child function. Below is a MWE. When I try to debug the child function I cannot figure out in which environment the variable is stored in.
power=function(y){
return(function(x){return(x^y)})
}
square=power(2)
debug(square)
square(3)
debugging in: square(3)
debug at #2: {
return(x^y)
}
Browse[2]> x
[1] 3
Browse[2]> y
[1] 2
Browse[2]> ls()
[1] "x"
Browse[2]> find('y')
character(0)
If you inspect the type of an R function, you’ll observe the following:
> typeof(square)
[1] "closure"
And that is, in fact, exactly the answer to your question: a closure is a function that carries an environment around.
R also tells you which environment this is (albeit not in a terribly useful way):
> square
function(x){return(x^y)}
<environment: 0x7ffd9218e578>
(The exact number will differ with each run — it’s just a memory address.)
Now, which environment does this correspond to? It corresponds to a local environment that was created when we executed power(2) (a “stack frame”). As the other answer says, it’s now the parent environment of the square function (in fact, in R every function, except for certain builtins, is associated with a parent environment):
> ls(environment(square))
[1] "y"
> environment(square)$y
[1] 2
You can read more about environments in the chapter in Hadley’s Advanced R book.
Incidentally, closures are a core feature of functional programming languages. Another core feature of functional languages is that every expression is a value — and, by implication, a function’s (return) value is the value of its last expression. This means that using the return function in R is both unnecessary and misleading!1 You should therefore leave it out: this results in shorter, more readable code:
power = function (y) {
function (x) x ^ y
}
There’s another R specific subtlety here: since arguments are evaluated lazily, your function definition is error-prone:
> two = 2
> square = power(two)
> two = 10
> square(5)
[1] 9765625
Oops! Subsequent modifications of the variable two are reflected inside square (but only the first time! Further redefinitions won’t change anything). To guard against this, use the force function:
power = function (y) {
force(y)
function (x) x ^ y
}
force simply forces the evaluation of an argument name, nothing more.
1 Misleading, because return is a function in R and carries a slightly different meaning compared to procedural languages: it aborts the current function exectuion.
The variable y is stored in the parent environment of the function. The environment() function returns the current environment, and we use parent.env() to get the parent environment of a particular environment.
ls(envir=parent.env(environment())) #when using the browser
The find() function doesn't seem helpful in this case because it seems to only search objects that have been attached to the global search path (search()). It doesn't try to resolve variable names in the current scope.

Call Arguments of Function inside Function / R language

I have a function:
func <- function (x)
{
arguments <- match.call()
return(arguments)
}
1) If I call my function with specifying argument in the call:
func("value")
I get:
func(x = "value")
2) If I call my function by passing a variable:
my_variable <-"value"
func(my_variable)
I get:
func(x = my_variable)
Why is the first and the second result different?
Can I somehow get in the second call "func(x = "value")"?
I'm thinking my problem is that the Environment inside a function simply doesn't contain values if they were passed by variables. The Environment contains only names of variables for further lookup. Is there a way to follow such reference and get value from inside a function?
In R, when you pass my_variable as formal argument x into a function, the value of my_variable will only be retrieved when the function tries to read x (if it does not use x, my_variable will not be read at all). The same applies when you pass more complicated arguments, such as func(x = compute_my_variable()) -- the call to compute_my_variable will take place when func tries to read x (this is referred to as lazy evaluation).
Given lazy evaluation, what you are trying to do is not well defined because of side effects - in which order would you like to evaluate the arguments? Which arguments would you like to evaluate at all? (note a function can just take an expression for its argument using substitute, but not evaluate it). As a side effect, compute_my_variable could modify something that would impact the result of another argument of func. This can happen even when you only passed variables and constants as arguments (function func could modify some of the variables that will be later read, or even reading a variable such as my_variable could trigger code that would modify some of the variables that will be read later, e.g. with active bindings or delayed assignment).
So, if all you want to do is to log how a function was called, you can use sys.call (or match.call but that indeed expands argument names, etc). If you wanted a more complete stacktrace, you can use e.g. traceback(1).
If for some reason you really wanted values of all arguments, say as if they were all read in the order of match.call, which is the order in which they are declared, you can do it using eval (returns them as list):
lapply(as.list(match.call())[-1], eval)
can't you simply
return paste('func(x =', x, ')')

Why does rm inside a function not delete objects?

rel.mem <- function(nm) {
rm(nm)
}
I defined the above function rel.mem -- takes a single argument and passes it to rm
> ls()
[1] "rel.mem"
> x<-1:10
> ls()
[1] "rel.mem" "x"
> rel.mem(x)
> ls()
[1] "rel.mem" "x"
Now you can see what I call rel.mem x is not deleted -- I know this is due to the incorrect environment on which rm is being attempted.
What is a good fix for this?
Criteria for a good fix:
The caller should not have to pass the environment
The callee (rel.mem) should be able to determine the environment by using an R language facility (call stack inspection, aspects, etc.)
The interface of the function rel.mem should be kept simple -- idiot proof: call rel.mem -- then rel.mem takes it from there -- no need to pass environments.
NOTES:
As many commenters have pointed out that one easy fix is to pass the environment.
What I meant by a good fix [and I should have clarified it] is that the callee function (in this case rel.mem) is able to calculate/find out the environment when the caller was referring to and then remove the object from the right environment.
The type of reasoning in "2" can be done in other languages by inspecting the call stack -- for example in Java I would throw a dummy exception -- catch it and then parse the call stack. In other languages still I could use Aspect Oriented techniques. The question is can something like that be done in R?
As one commenter has suggested that there may be multiple objects with the same name and thus the "right" environment is meaningless -- as I've stated above that in other languages it is possible (sometimes with some creative trickery) to interpret the call-stack -- this may not be possible in R
As one commenter has suggested that rm(list=nm, envir = parent.frame()) will remove this from the parent environment. This is correct -- however I'm looking for something that will work for an arbitrary call depth.
The quick answer is that you're in a different environment - essentially picture the variables in a box: you have a box for the function and one for the Global Environment. You just need to tell rm where to find that box.
So
rel_mem <- function(nm) {
# State the environment
rm(list=nm, envir = .GlobalEnv )
}
x = 10
rel_mem("x")
Alternatively, you can use the pos argument, e.g.
rel_mem <- function(nm) {
rm(list=nm, pos=1 )
}
If you type search() you will see a vector of environments, the global is number 1.
Another two options are
envir = parent.frame() if you want to go one level up the call stack
Use inherits = TRUE to go up the call stack until you find something
In the above code, notice that I'm passing the object as a character - I'm passing the "x" not x. We can be clever and avoid this using the substitute function
rel_mem <- function(nm) {
rm(list = as.character(substitute(nm)), envir = .GlobalEnv )
}
To finish I'll just add that deleting things in the .GlobalEnv from a function is generally a bad idea.
Further resources:
Environments:http://adv-r.had.co.nz/Environments.html
Substitute function: http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions
If you are using another function to find the global objects within your function such as ls(), you must state the environment in it explicitly too:
rel_mem <- function(nm) {
# State the environment in both functions
rm(list = ls(envir = .GlobalEnv) %>% .[startsWith(., "plot_")], envir = .GlobalEnv)
}

What is the mechanism that makes `+` work when defined by + in empty environment?

Here I create an unevaluated expression:
e2 <- expression(x+10)
If I supply an environment in which x is defined like
env <- as.environment(list(x=20))
eval(e2,env)
R will report an error:
Error in eval(expr, envir, enclos) : could not find function "+"
It is understandable since env is an environment created from scratch, that is, it has no parent environment where + is defined.
However, if I supply + in the list to be converted to an environment like this
env <- as.environment(list(x=20,`+`=function(a,b) {a+b}))
eval(e2,env)
The evaluation works correctly and yields 30.
However, when I define + in the list, it is a binary function whose body also uses + which is defined in {base}. I know that function returns are lazily evaluated in R, but why this could work? If a+b in the function body is lazily evaluated, when I call eval for e2 within env, even though + is defined in this environment which has no parent environment, it should still calls + in itself, which should end up in endless loop. Why is does not happen like this? What is the mechanism here?
When you define the environment here:
env <- as.environment(list(x=20,`+`=function(a,b) {a+b}))
then the function definition is actually defined in the .GlobalEnv (namely, where the definition is executed. You can verify this:
$ environment(env$`+`)
<environment: R_GlobalEnv>
This observation is worth being pondered a little: a function can be a member of environment x, yet belong to y (where “belong to” means that its object lookup uses y rather than x).
And .GlobalEnv knows about +, since it is defined somewhere in its parents (accessible via search()).
Incidentally, if you had used list2env instead of as.environment, your initial code would have worked:
$ env = list2env(list(x = 20))
$ eval(e2, env)
30
The reason is that, unlike as.environment, list2env by default uses the current environment as the new environment’s parent (this can be controlled via the parent argument). as.environment by contrast uses the empty environment (when creating an environment from a list).

Resources