Make sure that R functions don't use global variables - r

I'm writing some code in R and have around 600 lines of functions right now and want to know if there is an easy way to check, if any of my functions is using global variables (which I DON'T want).
For example it could give me an error if sourcing this code:
example_fun<-function(x){
y=x*c
return(y)
}
x=2
c=2
y=example_fun(x)
WARNING: Variable c is accessed from global workspace!
Solution to the problem with the help of #Hugh:
install.packages("codetools")
library("codetools")
x = as.character(lsf.str())
which_global=list()
for (i in 1:length(x)){
which_global[[x[i]]] = codetools::findGlobals(get(x[i]), merge = FALSE)$variables
}
Results will look like this:
> which_global
$arrange_vars
character(0)
$cal_flood_curve
[1] "..count.." "FI" "FI_new"
$create_Flood_CuRve
[1] "y"
$dens_GEV
character(0)
...

For a given function like example_function, you can use package codetools:
codetools::findGlobals(example_fun, merge = FALSE)$variables
#> [1] "c"
To collect all functions see Is there a way to get a vector with the name of all functions that one could use in R?

What about emptying your global environment and running the function? If an object from the global environment were to be used in the function, you would get an error, e.g.
V <- 100
my.fct <- function(x){return(x*V)}
> my.fct(1)
[1] 100
#### clearing global environment & re-running my.fct <- function... ####
> my.fct(1)
Error in my.fct(1) : object 'V' not found

Related

make R function return a locked/immutable list

I have a function that returns a list.
I would like this list to be immutable, similar to the way that lockBinding prevents overwriting or editing an object.
This would look something like the following:
myfun <- function(x){
out <- list(a = 1, val = x)
make_read_only(out)
out
}
test <- myfun(9)
test$a
[1] 1
test$val
[1] 9
test
$a
[1] 1
$val
[1] 9
test$newval <- 7
Error:
Where make_read_only() is just a standin for a function or some code that accomplishes this task.
I have tried using lockBinding which would work perfectly, however the 'lock' doesn't survive being passed upward into the parent environment after the function returns its output.
I have also looked into trying to lock symbol in the parent environment, but there doesn't seem to be a way to learn what the output will be assigned to from inside the function, which is needed as an argument of lockBinding.
It seems like there might be a way to do this via returning a refernce to an environment and locking the environment using lockEnvironment() or doing something similar, but I would like to hear what options are out there to accomplish this before beginnig.
Similary, it seems like it might be achievable using R6 but I would prefer to avoid using R6 since it is not required for any other part of this codebase and again, I would just like to hear what options are availble to acheive this behaviour.
In summary, the function will return an arbitrary list. This list will be at least a few levels deep.
mylist <- myfun(list(b = list(c = 3), foo = "bar"))
The user should be able to access the sublits/elements in a straightforward way similar to current dollar-sign access
mylist$a$b$c
[1] 3
It should not be possible to edit this object.
mylist$a <- 5
Error:
It should be possible to remove the object via
rm(mylist)
mylist
Error: object 'mylist' not found
So my question is, what available options are there in R to accomplish this?
One way of achieving this could be by creating a new class (locked) along with methods for [<-, [[<-, and $<- for this new class, which will return an error.
For example:
`[[<-.locked` <- function(...) {stop("Can't assign into locked object")}
a<-list(a="a",b=2)
a
$a
[1] "a"
$b
[1] 2
class(a)<-"locked"
a[[1]]<-"moose"
Error in `[[<-.locked`(`*tmp*`, 1, value = "moose") :
Can't assign into locked object
a
$a
[1] "a"
$b
[1] 2
attr(,"class")
[1] "locked"
In that case, all your "make read only" function needs to do is redefine the class for the object as locked:
class(out)<-c("locked",class(out))

R- creating a function that creates names of variables

I want to create a function who create a name depending of its argument.
I tried:
a <- function(x){ assign(paste("train",x,sep=""),4]) }
But when i do a(3) for example, nothing happens. what's wrong?
Thanks for your help.
edit: I will be more specific as requested.
I want to do a feature selection: the idea is to use a function to generate different subsets of features, generate a training set for each subset, then use the output of this function in another function (let's say lm() ) to test each training set. the number of subsets/training set is variable and I don't know how to store them in order to re-use them later.
You need to assign the variable within the global environment (or whichever environment you want the variable to live in).
> a <- function(x) { assign(paste('train', x, sep = ''), 4, envir = .GlobalEnv) }
> ls()
[1] "a"
> a(1)
> ls()
[1] "a" "train1"

Why does my R function have knowledge of variables that are not given as arguments? [duplicate]

Is there any way to throw a warning (and fail..) if a global variable is used within a R function? I think that is much saver and prevents unintended behaviours...e.g.
sUm <- 10
sum <- function(x,y){
sum = x+y
return(sUm)
}
due to the "typo" in return the function will always return 10. Instead of returning the value of sUm it should fail.
My other answer is more about what approach you can take inside your function. Now I'll provide some insight on what to do once your function is defined.
To ensure that your function is not using global variables when it shouldn't be, use the codetools package.
library(codetools)
sUm <- 10
f <- function(x, y) {
sum = x + y
return(sUm)
}
checkUsage(f)
This will print the message:
<anonymous> local variable ‘sum’ assigned but may not be used (:1)
To see if any global variables were used in your function, you can compare the output of the findGlobals() function with the variables in the global environment.
> findGlobals(f)
[1] "{" "+" "=" "return" "sUm"
> intersect(findGlobals(f), ls(envir=.GlobalEnv))
[1] "sUm"
That tells you that the global variable sUm was used inside f() when it probably shouldn't have been.
There is no way to permanently change how variables are resolved because that would break a lot of functions. The behavior you don't like is actually very useful in many cases.
If a variable is not found in a function, R will check the environment where the function was defined for such a variable. You can change this environment with the environment() function. For example
environment(sum) <- baseenv()
sum(4,5)
# Error in sum(4, 5) : object 'sUm' not found
This works because baseenv() points to the "base" environment which is empty. However, note that you don't have access to other functions with this method
myfun<-function(x,y) {x+y}
sum <- function(x,y){sum = myfun(x+y); return(sUm)}
environment(sum)<-baseenv()
sum(4,5)
# Error in sum(4, 5) : could not find function "myfun"
because in a functional language such as R, functions are just regular variables that are also scoped in the environment in which they are defined and would not be available in the base environment.
You would manually have to change the environment for each function you write. Again, there is no way to change this default behavior because many of the base R functions and functions defined in packages rely on this behavior.
Using get is a way:
sUm <- 10
sum <- function(x,y){
sum <- x+y
#with inherits = FALSE below the variable is only searched
#in the specified environment in the envir argument below
get('sUm', envir = environment(), inherits=FALSE)
}
Output:
> sum(1,6)
Error in get("sUm", envir = environment(), inherits = FALSE) :
object 'sUm' not found
Having the right sum in the get function would still only look inside the function's environment for the variable, meaning that if there were two variables, one inside the function and one in the global environment with the same name, the function would always look for the variable inside the function's environment and never at the global environment:
sum <- 10
sum2 <- function(x,y){
sum <- x+y
get('sum', envir = environment(), inherits=FALSE)
}
> sum2(1,7)
[1] 8
You can check whether the variable's name appears in the list of global variables. Note that this is imperfect if the global variable in question has the same name as an argument to your function.
if (deparse(substitute(var)) %in% ls(envir=.GlobalEnv))
stop("Do not use a global variable!")
The stop() function will halt execution of the function and display the given error message.
Another way (or style) is to keep all global variables in a special environment:
with( globals <- new.env(), {
# here define all "global variables"
sUm <- 10
mEan <- 5
})
# or add a variable by using $
globals$another_one <- 42
Then the function won't be able to get them:
sum <- function(x,y){
sum = x+y
return(sUm)
}
sum(1,2)
# Error in sum(1, 2) : object 'sUm' not found
But you can always use them with globals$:
globals$sUm
[1] 10
To manage the discipline, you can check if there is any global variable (except functions) outside of globals:
setdiff(ls(), union(lsf.str(), "globals")))

Which function will identify the name of an R variable's enclosing environment?

I've been reading about R environments, and I'm trying to test my understanding with a simple example:
> f <- function() {
+ x <- 1
+ environment(x)
+ }
>
> f()
NULL
I'm assuming this means that the object x is enclosed by the environment named NULL, but when I try to list all the objects in that environment, R displays an error message:
> ls(NULL)
Error in as.environment(pos) : using 'as.environment(NULL)' is defunct
So I'm wondering if there's a built-in function I can use on the command line that will return the environment name given the object name. I tried this:
> environment(x)
Error in environment(x) : object 'x' not found
but that returned an error as well. Any help will be greatly appreciated.
Variables created in function calls are destroyed when the function finishes executing (unless you specifically create them in other persistent environments). As #joran pointed out, when a function is called, a temporary environment is created where local variables are defined, and is destroyed when the function is done executing (that memory is freed). However, as #MrFlick pointed out, if the function returns a function, the returned function maintains a reference to the environment it was created in. You can read more about 'scope', 'stack', and 'heap'. In R there are various ways you can define your variables into specified environments.
f <- function() {
x <<- 1 # create x in the global environment (or change it if it's there)
## or `assign` x to a value
## assign(x, value=1, envir=.GlobalEnv)
}
environment(f) # where was f defined?
exists("x", envir=.GlobalEnv)
# [1] TRUE
The package pryr has some nice functions to do these kind of things. For example, there is a function called where which will give you the environment of an object:
library(pryr)
f <- function() {
x <- 1
where("x")
}
f()
<environment: 0x0000000013356f50>
So the environment of x was the temporary enviroment created by function f(). As people have said before, this enviroment is detroyed after you run the function, so it will give you a different result each time you run f().

R scoping: disallow global variables in function

Is there any way to throw a warning (and fail..) if a global variable is used within a R function? I think that is much saver and prevents unintended behaviours...e.g.
sUm <- 10
sum <- function(x,y){
sum = x+y
return(sUm)
}
due to the "typo" in return the function will always return 10. Instead of returning the value of sUm it should fail.
My other answer is more about what approach you can take inside your function. Now I'll provide some insight on what to do once your function is defined.
To ensure that your function is not using global variables when it shouldn't be, use the codetools package.
library(codetools)
sUm <- 10
f <- function(x, y) {
sum = x + y
return(sUm)
}
checkUsage(f)
This will print the message:
<anonymous> local variable ‘sum’ assigned but may not be used (:1)
To see if any global variables were used in your function, you can compare the output of the findGlobals() function with the variables in the global environment.
> findGlobals(f)
[1] "{" "+" "=" "return" "sUm"
> intersect(findGlobals(f), ls(envir=.GlobalEnv))
[1] "sUm"
That tells you that the global variable sUm was used inside f() when it probably shouldn't have been.
There is no way to permanently change how variables are resolved because that would break a lot of functions. The behavior you don't like is actually very useful in many cases.
If a variable is not found in a function, R will check the environment where the function was defined for such a variable. You can change this environment with the environment() function. For example
environment(sum) <- baseenv()
sum(4,5)
# Error in sum(4, 5) : object 'sUm' not found
This works because baseenv() points to the "base" environment which is empty. However, note that you don't have access to other functions with this method
myfun<-function(x,y) {x+y}
sum <- function(x,y){sum = myfun(x+y); return(sUm)}
environment(sum)<-baseenv()
sum(4,5)
# Error in sum(4, 5) : could not find function "myfun"
because in a functional language such as R, functions are just regular variables that are also scoped in the environment in which they are defined and would not be available in the base environment.
You would manually have to change the environment for each function you write. Again, there is no way to change this default behavior because many of the base R functions and functions defined in packages rely on this behavior.
Using get is a way:
sUm <- 10
sum <- function(x,y){
sum <- x+y
#with inherits = FALSE below the variable is only searched
#in the specified environment in the envir argument below
get('sUm', envir = environment(), inherits=FALSE)
}
Output:
> sum(1,6)
Error in get("sUm", envir = environment(), inherits = FALSE) :
object 'sUm' not found
Having the right sum in the get function would still only look inside the function's environment for the variable, meaning that if there were two variables, one inside the function and one in the global environment with the same name, the function would always look for the variable inside the function's environment and never at the global environment:
sum <- 10
sum2 <- function(x,y){
sum <- x+y
get('sum', envir = environment(), inherits=FALSE)
}
> sum2(1,7)
[1] 8
You can check whether the variable's name appears in the list of global variables. Note that this is imperfect if the global variable in question has the same name as an argument to your function.
if (deparse(substitute(var)) %in% ls(envir=.GlobalEnv))
stop("Do not use a global variable!")
The stop() function will halt execution of the function and display the given error message.
Another way (or style) is to keep all global variables in a special environment:
with( globals <- new.env(), {
# here define all "global variables"
sUm <- 10
mEan <- 5
})
# or add a variable by using $
globals$another_one <- 42
Then the function won't be able to get them:
sum <- function(x,y){
sum = x+y
return(sUm)
}
sum(1,2)
# Error in sum(1, 2) : object 'sUm' not found
But you can always use them with globals$:
globals$sUm
[1] 10
To manage the discipline, you can check if there is any global variable (except functions) outside of globals:
setdiff(ls(), union(lsf.str(), "globals")))

Resources