Strange behavior in default argument enclos = parent.frame() of eval function (R)

I'm having trouble understanding the behavior of the eval function, specifically its third argument, enclos, when no value is supplied for it and the default parent.frame() is used.
name <- function(x){
print(substitute(x))
t <- substitute(x)
eval(t, list(a=7), parent.frame())
}
z <-5
name(a+z)
# returns 12, which makes sense because this amounts to
# eval(a+z, list(a=7), globalenv())
# however, the result here makes no sense to me
name2 <- function(x){
print(substitute(x))
t <- substitute(x)
eval(t, list(a=7)) # third/enclosure argument is left missing
}
z <-5
name2(a+z)
# Also returns 12
I'm having trouble understanding why the second call returns 12. According to my understanding of R, the second call should result in an error because
1) eval's third argument, enclos, defaults to parent.frame(), and it is not specified here.
2) Therefore, parent.frame() is evaluated in the local environment of eval. This is confirmed by Hadley in "When/how/where is parent.frame in a default argument interpreted?"
3) Thus, the last expression ought to resolve to eval(a+z, list(a=7), <execution environment of name2>).
4) This should return an error because z is defined neither in the execution environment of name2 nor in list(a=7).
Can someone please explain what's wrong with this logic?

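Steps 1 to 3 are fine; step 4 is where the logic breaks. enclos does end up being the execution environment of name2, but variable lookup does not stop there: if z is not found in that environment, R continues into its enclosing environment, which is the environment where name2 was defined (here .GlobalEnv). So z will be available inside the function, since it's defined in .GlobalEnv.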
Simply put,
name <- function(x) {
print(z)
}
z <- 5
name(z)
# [1] 5
So while a is unknown until eval(t, list(a=7)) supplies it, z is already visible from inside the function. If z is not defined inside name2, it is looked up in .GlobalEnv. What might be counterintuitive is that a+z cannot be evaluated unless you supply an environment (or list) that contains a. For z, there is no need to do so.
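A minimal sketch of the lookup described above, assuming a fresh session. name2 is the question's function; name3, name4 and name5 are hypothetical helpers added only for illustration. Passing environment() explicitly as enclos behaves like the default, the enclosure of that execution environment is .GlobalEnv, and re-enclosing the function so that .GlobalEnv is off the chain makes z unreachable:

```r
z <- 5

# name2 from the question: enclos is left at its default, parent.frame(),
# which (evaluated inside eval) is name2's own execution environment.
name2 <- function(x) {
  t <- substitute(x)
  eval(t, list(a = 7))
}

# Hypothetical name3: the same call, with enclos passed explicitly as
# the function's own execution environment.
name3 <- function(x) {
  t <- substitute(x)
  eval(t, list(a = 7), environment())
}

# Hypothetical name4: returns the enclosure of its execution environment,
# i.e. where lookup continues when a name is not found locally.
name4 <- function() parent.env(environment())

r2 <- name2(a + z)
r3 <- name3(a + z)
enclosed_by_global <- identical(name4(), globalenv())

# Hypothetical name5: a copy of name2 re-enclosed so that .GlobalEnv is
# no longer on its lookup chain; z can no longer be found.
name5 <- name2
environment(name5) <- new.env(parent = baseenv())
err <- tryCatch(name5(a + z), error = conditionMessage)

print(c(r2 = r2, r3 = r3))
print(enclosed_by_global)
print(err)
```

The last call produces the error the question expected, but only because .GlobalEnv was removed from the chain of enclosing environments.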

Related

subvert external function's `deparse(substitute())` without `eval`

I'd like to wrap the checkmate package's qassert() so that I can check the specifications of several variables at once. Importantly, I'd like assertion errors to still report the name of the variable that is out of spec.
So I created checkargs to loop through the input arguments. But to pass each variable on to qassert, I use the same code in every iteration, so that ambiguous code string ends up in the error message instead of the name of the offending variable.
qassert() (via vname()) gets what to display in the assertion error with something like deparse(eval.parent(substitute(substitute(x)))). Is there any way to box up get(var) so that R will see, e.g., 'x' on deparse instead?
At least one workaround is eval(parse()). But something like checkargs(x="n', system('echo malicious',intern=T),'") has me hoping for an alternative.
checkargs <- function(...) {
args<-list(...)
for(var in names(args))
checkmate::qassert(get(var,envir=parent.frame()),args[[var]])
# scary string interpolation alternative
#eval(parse(text=paste0("qassert(",var,",'",args[[var]], "')")),parent.frame())
}
test_checkargs <- function(x, y) {checkargs(x='b',y='n'); print(y)}
# checkargs is working!
test_checkargs(T, 1)
# [1] 1
# but the error message isn't helpful.
test_checkargs(1, 1)
# Error in checkargs(x = "b", y = "n") :
# Assertion on 'get(var, envir = parent.frame())' failed. Must be of class 'logical', not 'double'.
#
# want:
# Assertion on 'x' failed. ...
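Using substitute() with as.name() seems to do the trick. substitute() splices the variable's symbol (e.g. x) and the spec string into the call as a language object, so vname() deparses the variable's name, and because the spec is inserted as a string constant rather than parsed, it cannot inject code. This still uses eval, but without string interpolation:
checkargs <- function(...) {
  args <- list(...)
  for (var in names(args))
    eval(substitute(checkmate::qassert(x, spec),
                    list(x = as.name(var), spec = args[[var]])),
         envir = parent.frame())
}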
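A base-R sketch of the same trick that runs without checkmate. assert_numeric() and check_all() are hypothetical stand-ins: assert_numeric() builds its message from deparse(substitute(x)), which is assumed to roughly approximate what qassert() does through vname(), and check_all() splices each variable's symbol into the call with substitute() and as.name() before evaluating it in the caller's frame:

```r
# Hypothetical stand-in for checkmate::qassert(): the error message
# reports the deparsed expression the function was called with.
assert_numeric <- function(x) {
  if (!is.numeric(x))
    stop(sprintf("Assertion on '%s' failed.", deparse(substitute(x))),
         call. = FALSE)
  invisible(TRUE)
}

# Hypothetical wrapper: checks several variables of the caller by name.
# substitute() puts the *symbol* into the call, so deparse() inside
# assert_numeric() sees `x`, not get(var, ...).
check_all <- function(...) {
  vars <- c(...)
  caller <- parent.frame()
  for (var in vars)
    eval(substitute(assert_numeric(v), list(v = as.name(var))),
         envir = caller)
  invisible(TRUE)
}

f <- function(x, y) {
  check_all("x", "y")
  x + y
}

print(f(1, 2))
msg <- tryCatch(f("a", 2), error = conditionMessage)
print(msg)
```

No string is ever parsed here, so a malicious value can at worst fail the assertion; it is never executed as code.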

Why does my R function have knowledge of variables that are not given as arguments? [duplicate]

Is there any way to throw a warning (and fail) if a global variable is used within an R function? I think that would be much safer and would prevent unintended behavior, e.g.:
sUm <- 10
sum <- function(x,y){
sum = x+y
return(sUm)
}
Due to the "typo" in return(), the function will always return 10. Instead of returning the value of sUm, it should fail.
My other answer is more about what approach you can take inside your function. Now I'll provide some insight on what to do once your function is defined.
To ensure that your function is not using global variables when it shouldn't be, use the codetools package.
library(codetools)
sUm <- 10
f <- function(x, y) {
sum = x + y
return(sUm)
}
checkUsage(f)
This will print the message:
<anonymous> local variable ‘sum’ assigned but may not be used (:1)
To see if any global variables were used in your function, you can compare the output of the findGlobals() function with the variables in the global environment.
> findGlobals(f)
[1] "{" "+" "=" "return" "sUm"
> intersect(findGlobals(f), ls(envir=.GlobalEnv))
[1] "sUm"
That tells you that the global variable sUm was used inside f() when it probably shouldn't have been.
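The findGlobals() comparison can be wrapped into a check that fails outright. A minimal sketch; assert_no_globals() is a hypothetical helper, and it relies on findGlobals(merge = FALSE) splitting its result into functions and variables, so that global helper functions are not flagged:

```r
library(codetools)

# Hypothetical helper: stop if `f` reads any variable (not a function)
# that currently exists in `env` (the global environment by default).
assert_no_globals <- function(f, env = globalenv()) {
  # merge = FALSE returns list(functions = ..., variables = ...).
  used <- findGlobals(f, merge = FALSE)$variables
  bad <- intersect(used, ls(envir = env))
  if (length(bad) > 0)
    stop("function uses global variable(s): ",
         paste(bad, collapse = ", "), call. = FALSE)
  invisible(f)
}

sUm <- 10
f <- function(x, y) {
  sum = x + y
  return(sUm)   # the "typo" from the question
}
g <- function(x, y) {
  sum = x + y
  return(sum)
}

msg <- tryCatch(assert_no_globals(f), error = conditionMessage)
print(msg)
assert_no_globals(g)   # passes silently
```

Because it only reports names that exist in the global environment at the time of the call, this is a lint-style check you run after defining a function, not a change to how R resolves variables.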
There is no way to permanently change how variables are resolved because that would break a lot of functions. The behavior you don't like is actually very useful in many cases.
If a variable is not found in a function, R will look for it in the environment where the function was defined. You can change this environment with the environment() function (more precisely, its replacement form environment<-). For example:
environment(sum) <- baseenv()
sum(4,5)
# Error in sum(4, 5) : object 'sUm' not found
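This works because baseenv() is the environment of the base package, whose only enclosure is the empty environment, so the global environment is never searched. However, note that with this method you also lose access to your other functions (only base functions remain visible):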
myfun<-function(x,y) {x+y}
sum <- function(x,y){sum = myfun(x, y); return(sUm)}
environment(sum)<-baseenv()
sum(4,5)
# Error in sum(4, 5) : could not find function "myfun"
This is because in a functional language such as R, functions are just regular variables: they are scoped like any other object, so myfun, which is defined in the global environment, is not available from the base environment.
You would manually have to change the environment for each function you write. Again, there is no way to change this default behavior because many of the base R functions and functions defined in packages rely on this behavior.
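The search order described above can be inspected directly. A small sketch (enclosure_chain() is a hypothetical helper, not part of base R) that follows parent.env() from a function's environment until it reaches the empty environment, before and after re-enclosing the function in baseenv():

```r
# Hypothetical helper: list the names of the environments R searches,
# starting at `env` and following parent.env() to the empty environment.
enclosure_chain <- function(env) {
  out <- character()
  while (!identical(env, emptyenv())) {
    nm <- environmentName(env)
    out <- c(out, if (nzchar(nm)) nm else "<anonymous>")
    env <- parent.env(env)
  }
  c(out, "R_EmptyEnv")
}

f <- function(x, y) x + y

# Defined at top level: global env, then attached packages, then base.
print(enclosure_chain(environment(f)))

# After re-enclosing in baseenv(), only base is searched, which is why
# globals (and your own functions) become invisible to f.
environment(f) <- baseenv()
print(enclosure_chain(environment(f)))
print(f(4, 5))   # still works: `+` lives in base
```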
Using get() is another way:
sUm <- 10
sum <- function(x,y){
sum <- x+y
#with inherits = FALSE below the variable is only searched
#in the specified environment in the envir argument below
get('sUm', envir = environment(), inherits=FALSE)
}
Output:
> sum(1,6)
Error in get("sUm", envir = environment(), inherits = FALSE) :
object 'sUm' not found
Using the correct name in get() would still only look inside the function's environment. So if there were two variables with the same name, one inside the function and one in the global environment, the function would always use the one inside its own environment and never the global one:
sum <- 10
sum2 <- function(x,y){
sum <- x+y
get('sum', envir = environment(), inherits=FALSE)
}
> sum2(1,7)
[1] 8
You can check whether the variable's name appears in the list of global variables. Note that this is imperfect if the global variable in question has the same name as an argument to your function.
if (deparse(substitute(var)) %in% ls(envir=.GlobalEnv))
stop("Do not use a global variable!")
The stop() function will halt execution of the function and display the given error message.
Another way (or style) is to keep all global variables in a special environment:
with( globals <- new.env(), {
# here define all "global variables"
sUm <- 10
mEan <- 5
})
# or add a variable by using $
globals$another_one <- 42
Then the function won't be able to get them:
sum <- function(x,y){
sum = x+y
return(sUm)
}
sum(1,2)
# Error in sum(1, 2) : object 'sUm' not found
But you can always use them with globals$:
globals$sUm
[1] 10
To enforce this discipline, you can check whether there is any global variable (other than functions) outside of globals:
setdiff(ls(), union(lsf.str(), "globals"))

R: Global assignment of vector element works only inside a function

I'm working on a project where there are some global assignments, and I ran into something sort of odd. I was hoping someone could help me with it.
I wrote this toy example to demonstrate the problem:
x <- 1:3 ; x <- c(1, 2, 5) # this works fine
x <- 1:3 ; x[3] <- 5 # this works fine
x <<- 1:3 ; x <<- c(1, 2, 5) # this works fine
x <<- 1:3 ; x[3] <<- 5 # this does not work
# Error in x[3] <<- 5 : object 'x' not found
same.thing.but.in.a.function = function() {
x <<- 1:3
x[3] <<- 5
}
same.thing.but.in.a.function(); x
# works just fine
So, it seems it's not possible to change part of a vector using a global assignment -- unless that assignment is contained within a function. Can anyone explain why this is the case?
I figured out the problem.
Basically, in this form of <<- (which is more accurately called the "superassignment operator" than the "global assignment operator"), the current value of the variable is looked up starting in the parent of the current environment, so at top level the global environment itself is skipped when trying to access the variable.
On page 19 of the R Language Definition, it states the following:
x <<- data.frame(0, 0, 0) # (I added this so the code can be run)
names(x)[3] <<- "Three"
is equivalent to
x <<- data.frame(0, 0, 0) # (I added this so the code can be run)
`*tmp*` <<- get(x, envir=parent.env(), inherits=TRUE)
names(`*tmp*`)[3] <- "Three"
x <<- `*tmp*`
rm(`*tmp*`)
When I tried to run those four lines, it threw an error: parent.env requires an argument and has no default. I can only assume that the documentation was written at a time when parent.env() had a default value for its first argument, and I can safely guess that the default would have been environment(), which returns the current environment. It then threw another error: x needs to be in quotes, so I fixed that too. Now, when I run the first of those lines, it throws the same error message I encountered originally, but with more detail:
# Error in get("x", envir = parent.env(environment()), inherits = TRUE) :
# object 'x' not found
This makes sense: at top level, environment() returns .GlobalEnv, so parent.env(.GlobalEnv) skips the global environment entirely and instead returns the most recently attached package environment. Then, since inherits is TRUE, get() keeps going up the search path through each attached package environment until it reaches the empty environment, and at that point it still has not found x. Hence the error.
Since parent.env(environment()) returns .GlobalEnv (or an environment from which .GlobalEnv is still reachable) as long as you start inside a local environment, the same problem does not occur when these lines are run inside a local environment:*
local({
x <<- data.frame(0, 0, 0) # (I added this so the code can be run)
`tmp` <<- get("x", envir=parent.env(environment()), inherits=TRUE)
names(`tmp`)[3] <- "Three"
x <<- `tmp`
rm(`tmp`)
})
x
# X0 X0.1 Three
# 1 0 0 0
# so, it works properly
In contrast, a plain x <<- value does not need to read x first. It searches the enclosing environments for an existing binding of x and, if it finds none, creates x in the global environment. So in that situation it never fails for lack of an existing x.
* I had to change the variable from `*tmp*` to tmp because one of the behind-the-scenes operations of the subset assignment uses its own `*tmp*` variable and then removes it, so `*tmp*` disappears in the middle of line 3 and the next line throws an error when it tries to access it.
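A short sketch reproducing the explanation above in a fresh session (set_third() is a hypothetical name): the top-level subset superassignment fails because the read of x starts in parent.env(globalenv()), while the same statement inside a function starts in the function's enclosure, which is the global environment:

```r
x <- 1:3

# At top level, x[3] <<- 5 must first read x, and that read starts in
# parent.env(globalenv()) (the attached packages), so x is not found.
top_level <- tryCatch({ x[3] <<- 5; "ok" }, error = conditionMessage)
print(top_level)

# The same lookup the Language Definition describes, done explicitly.
print(exists("x", envir = parent.env(globalenv()), inherits = TRUE))

# Inside a function, the read starts in the function's enclosure,
# which is the global environment, so the global x is found and updated.
set_third <- function() x[3] <<- 5
set_third()
print(x)
```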
If you change to single-arrow assignment, then it works:
x <<- 1:3 ; x[3] <- 5
By the way, I would suggest these discussions for a better understanding and proper use of the <<- operator:
How do you use "<<-" (scoping assignment) in R?
What is the difference between assign() and <<- in R?

R: Passing parameters from a wrapper function to internal functions

I am not surprised that this function doesn't work, but I cannot quite understand why.
computeMeans <- function(data,dv,fun) {
x <- with(data,aggregate(dv,
list(
method=method,
hypo=hypothesis,
pre.group=pre.group,
pre.smooth=pre.smooth
),
fun ) )
return(x)
}
computeMeans(df.basic,dprime,mean)
Here df.basic is a data frame with factors method, hypothesis, etc., and several dependent variables (I specify one of them, dprime, with the dv parameter).
I have multiple dependent variables and several dataframes all of the same form, so I wanted to write this little function to keep things "simple". The error I get is:
Error in aggregate(dv, list(method = method, hypo = hypothesis,
pre.group = pre.group, :
object 'dprime' not found
But dprime does exist in df.basic, which is referenced with with(). Can anyone explain the problem? Thank you!
EDIT: This is the R programming language. http://www.r-project.org/
Although dprime exists in df.basic, when you pass it to computeMeans, R looks for dprime in the environment you called from, not in df.basic, unless you reference the column explicitly:
computeMeans(df.basic,df.basic$dprime,mean)
will work.
Alternatively
computeMeans <- function(data,dv,fun) {
dv <- eval(substitute(dv), envir=data)
x <- with(data,aggregate(dv,
list(
method=method,
hypo=hypothesis,
pre.group=pre.group,
pre.smooth=pre.smooth
),
fun ) )
return(x)
}
You might think that since dv is in the with(data, (.)) call, it gets evaluated within the environment of data. It does not.
When a function is called the arguments are matched and then each of the formal arguments is bound to a promise. The expression that was given for that formal argument and a pointer to the environment the function was called from are stored in the promise.
Until that argument is accessed there is no value associated with the promise. When the argument is accessed, the stored expression is evaluated in the stored environment, and the result is returned. The result is also saved by the promise.
source
A promise is therefore evaluated within the environment in which it was created (i.e., the environment where the function was called), regardless of the environment in which the promise is first accessed. Observe:
delayedAssign("x", y)
local({
y <- 10
x
})
Error in eval(expr, envir, enclos) : object 'y' not found
w <- 10
delayedAssign("z", w)
local({
w <- 11
z
})
[1] 10
Note that delayedAssign creates a promise. In the first example, x is assigned the value of y via a promise in the global environment, but y has not been defined in the global environment. x is then accessed in an environment where y has been defined, yet accessing x still results in an error indicating that y does not exist. This demonstrates that x is evaluated in the environment in which the promise was created, not in the environment where it is accessed.
In the second example, z is assigned the value of w via a promise in the global environment, and w is defined in the global environment. z is then accessed in an environment where w has been assigned a different value, yet z still returns the value of w from the environment where the promise was created.
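A small self-contained sketch of the same point applied to the question's situation. dat, mean_by_g_promise() and mean_by_g_nse() are hypothetical, and tapply() stands in for aggregate() to keep it short: inside with(), dv is a promise whose expression (dprime) is evaluated in the caller's environment, so the data frame's columns are never consulted, whereas eval(substitute(dv), data, ...) evaluates the expression with the columns in scope:

```r
# Hypothetical toy data standing in for df.basic.
dat <- data.frame(g = c("a", "a", "b"), dprime = c(1, 3, 5))

# dv is a promise: its expression `dprime` is evaluated in the caller's
# environment (global here), so with() cannot redirect it into `data`.
mean_by_g_promise <- function(data, dv) with(data, tapply(dv, g, mean))

# substitute() captures the unevaluated expression; eval() then looks it
# up among the columns of `data`, falling back to the caller's frame.
mean_by_g_nse <- function(data, dv) {
  vals <- eval(substitute(dv), data, parent.frame())
  tapply(vals, data$g, mean)
}

err <- tryCatch(mean_by_g_promise(dat, dprime), error = conditionMessage)
print(err)
res <- mean_by_g_nse(dat, dprime)
print(res)
```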
Passing in the dprime argument as a character string would allow you to sidestep the scoping and evaluation rules discussed in @Michael's answer entirely:
computeMeans <- function(data, dv, fun) {
x <- aggregate(data[[dv]],
list(
method = data[["method"]],
hypo = data[["hypothesis"]],
pre.group = data[["pre.group"]],
pre.smooth = data[["pre.smooth"]]
),
fun )
return(x)
}
computeMeans(df.basic, "dprime", mean)
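A toy, self-contained version of the character-string approach. The data frame dat and the by parameter are hypothetical (only two grouping columns are used to keep it short); passing the grouping columns as data[by] gives aggregate() the list of grouping vectors it expects:

```r
# Hypothetical toy data standing in for df.basic.
dat <- data.frame(
  method     = c("m1", "m1", "m2", "m2"),
  hypothesis = c("h1", "h2", "h1", "h1"),
  dprime     = c(1, 2, 3, 5)
)

# Variant of computeMeans() that takes the dependent variable and the
# grouping columns as character vectors (`by` is a hypothetical parameter).
compute_means <- function(data, dv, fun, by = c("method", "hypothesis")) {
  # data[by] is a data frame, i.e. a named list of grouping vectors,
  # which is what aggregate()'s `by` argument requires.
  aggregate(data[[dv]], by = data[by], FUN = fun)
}

res <- compute_means(dat, "dprime", mean)
print(res)
```

Since every column is selected by name with [[ or [, nothing depends on the environment in which the arguments' expressions are evaluated.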

Resources