how to isolate a function - r

How can I ensure that when a function is called it is not allowed to grab variables from the global environment?
I would like the following code to give me an error. The reason is because I might have mistyped z (I wanted to type y).
z <- 10
temp <- function(x,y) {
y <- y + 2
return(x+z)
}
> temp(2,1)
[1] 12
I'm guessing the answer has to do with environments, but I haven't understood those yet.
Is there a way to make my desired behavior default (e.g. by setting an option)?

> library(codetools)
> checkUsage(temp)
<anonymous>: no visible binding for global variable 'z'
The function doesn't change, so no need to check it each time it's used. findGlobals is more general, and a little more cryptic. Something like
Filter(Negate(is.null), eapply(.GlobalEnv, function(elt) {
if (is.function(elt))
findGlobals(elt)
}))
could visit all functions in an environment, but if there are several functions then maybe it's time to think about writing a package (it's not that hard).

environment(temp) = baseenv()
See also http://cran.r-project.org/doc/manuals/R-lang.html#Scope-of-variables and ?environment.

environment(fun) = parent.env(environment(fun))
(I'm using 'fun' in place of your function name 'temp' for clarity)
This will remove the "workspace" environment (.GlobalEnv) from the search path and leave everything else (eg all packages).

Related

Why does rm inside a function not delete objects?

rel.mem <- function(nm) {
rm(nm)
}
I defined the above function rel.mem -- takes a single argument and passes it to rm
> ls()
[1] "rel.mem"
> x<-1:10
> ls()
[1] "rel.mem" "x"
> rel.mem(x)
> ls()
[1] "rel.mem" "x"
Now you can see what I call rel.mem x is not deleted -- I know this is due to the incorrect environment on which rm is being attempted.
What is a good fix for this?
Criteria for a good fix:
The caller should not have to pass the environment
The callee (rel.mem) should be able to determine the environment by using an R language facility (call stack inspection, aspects, etc.)
The interface of the function rel.mem should be kept simple -- idiot proof: call rel.mem -- then rel.mem takes it from there -- no need to pass environments.
NOTES:
As many commenters have pointed out that one easy fix is to pass the environment.
What I meant by a good fix [and I should have clarified it] is that the callee function (in this case rel.mem) is able to calculate/find out the environment when the caller was referring to and then remove the object from the right environment.
The type of reasoning in "2" can be done in other languages by inspecting the call stack -- for example in Java I would throw a dummy exception -- catch it and then parse the call stack. In other languages still I could use Aspect Oriented techniques. The question is can something like that be done in R?
As one commenter has suggested that there may be multiple objects with the same name and thus the "right" environment is meaningless -- as I've stated above that in other languages it is possible (sometimes with some creative trickery) to interpret the call-stack -- this may not be possible in R
As one commenter has suggested that rm(list=nm, envir = parent.frame()) will remove this from the parent environment. This is correct -- however I'm looking for something that will work for an arbitrary call depth.
The quick answer is that you're in a different environment - essentially picture the variables in a box: you have a box for the function and one for the Global Environment. You just need to tell rm where to find that box.
So
rel_mem <- function(nm) {
# State the environment
rm(list=nm, envir = .GlobalEnv )
}
x = 10
rel_mem("x")
Alternatively, you can use the pos argument, e.g.
rel_mem <- function(nm) {
rm(list=nm, pos=1 )
}
If you type search() you will see a vector of environments, the global is number 1.
Another two options are
envir = parent.frame() if you want to go one level up the call stack
Use inherits = TRUE to go up the call stack until you find something
In the above code, notice that I'm passing the object as a character - I'm passing the "x" not x. We can be clever and avoid this using the substitute function
rel_mem <- function(nm) {
rm(list = as.character(substitute(nm)), envir = .GlobalEnv )
}
To finish I'll just add that deleting things in the .GlobalEnv from a function is generally a bad idea.
Further resources:
Environments:http://adv-r.had.co.nz/Environments.html
Substitute function: http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions
If you are using another function to find the global objects within your function such as ls(), you must state the environment in it explicitly too:
rel_mem <- function(nm) {
# State the environment in both functions
rm(list = ls(envir = .GlobalEnv) %>% .[startsWith(., "plot_")], envir = .GlobalEnv)
}

R: IF object is TRUE then assign object NOT WORKING

I am trying to write a very basic IF statement in R and am stuck. I thought I'd find someone with the same problem, but I cant. Im sorry if this has been solved before.
I want to check if a variable/object has been assigned, IF TRUE I want to execute a function that is part of a R-package. First I wrote
FileAssignment <- function(x){
if(exists("x")==TRUE){
print("yes!")
x <- parse.vdjtools(x)
} else { print("Nope!")}
}
I assign a filename as x
FILENAME <- "FILENAME.txt"
I run the function
FileAssignment(FILENAME)
I use print("yes!") and print("Nope!") to check if the IF-Statement works, and it does. However, the parse.vdjtools(x) part is not assigned. Now I tested the same IF-statement outside of the function:
if(exists("FILENAME1")==TRUE){
FILENAME1 <- parse.vdjtools(FILENAME1)
}
This works. I read here that it might be because the function uses {} and the if-statement does too. So I should remove the brackets from the if-statement.
FileAssignment <- function(x){
if(exists("x")==TRUE)
x <- parse.vdjtools(x)
else { print("Nope!")
}
Did not work either.
I thought it might be related to the specific parse.vdjtools(x) function, so I just tried assigning a normal value to x with x <- 20. Also did not work inside the function, however, it does outside.
I dont really know what you are trying to acheive, but I wpuld say that the use of exists in this context is wrong. There is no way that the x cannot exist inside the function. See this example
# All this does is report if x exists
f <- function(x){
if(exists("x"))
cat("Found x!", fill = TRUE)
}
f()
f("a")
f(iris)
# All will be found!
Investigate file.exists instead? This is vectorised, so a vector of files can be investigated at the same time.
The question that you are asking is less trivial than you seem to believe. There are two points that should be addressed to obtain the desired behavior, and especially the first one is somewhat tricky:
As pointed out by #NJBurgo and #KonradRudolph the variable x will always exist within the function since it is an argument of the function. In your case the function exists() should therefore not check whether the variable x is defined. Instead, it should be used to verify whether a variable with a name corresponding to the character string stored in x exists.
This is achieved by using a combination of deparse() and
substitute():
if (exists(deparse(substitute(x)))) { …
Since x is defined only within the scope of the function, the superassignment operator <<- would be required to make a value assigned to x visible outside the function, as suggested by #thothai. However, functions should not have such side effects. Problems with this kind of programming include possible conflicts with another variable named x that could be defined in a different context outside the function body, as well as a lack of clarity concerning the operations performed by the function.
A better way is to return the value instead of assigning it to a variable.
Combining these two aspects, the function could be rewritten like this:
FileAssignment <- function(x){
if (exists(deparse(substitute(x)))) {
print("yes!")
return(parse.vdjtools(x))
} else {
print("Nope!")
return(NULL)}
}
In this version of the function, the scope of x is limited to the function body and the function has no side effects. The return value of FileAssignment(a) is either parse.vdjtools(a) or NULL, depending on whether a exists or not.
Outside the function, this value can be assigned to x with
x <- FileAssignment(a)

R: Storing data within a function and retrieving without using "return"

The following simple example will help me address a problem in my program implementation.
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod<-prod(x,y)
return(Sum)
}
j=1:10
Try<-lapply(j,fun2)
#
I want to store "Prod" at each iteration so I can access it after running the function fun2. I tried using assign() to create space assign("Prod",numeric(10),pos=1)
and then assigning Prod at j-th iteration to Prod[j] but it does not work.
#
Any idea how this can be done?
Thank you
You can add anything you like in the return() command. You could return a list return(list(Sum,Prod)) or a data frame return(data.frame("In"=j,"Sum"=Sum,"Prod"=Prod))
I would then convert that list of data.frames into a single data.frame
Try2 <- do.call(rbind,Try)
Maybe re-think the problem in a more vectorized way, taking advantage of the implied symmetry to represent intermediate values as a matrix and operating on that
ni = 10; nj = 20
x = matrix(rnorm(ni * nj), ni)
y = matrix(runif(ni * nj), ni)
sums = colSums(x + y)
prods = apply(x * y, 2, prod)
Thinking about the vectorized version is as applicable to whatever your 'real' problem is as it is to the sum / prod example; in practice and when thinking in terms of vectors fails I've never used the environment or concatenation approaches in other answers, but rather the simple solution of returning a list or vector.
I have done this before, and it works. Good for a quick fix, but its kind of sloppy. The <<- operator assigns outside the function to the global environment.
fun2<-function(j){
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j]<<-prod(x,y)
}
j=1:10
Prod <- numeric(length(j))
Try<-lapply(j,fun2)
Prod
thelatemail and JeremyS's solutions are probably what you want. Using lists is the normal way to pass back a bunch of different data items and I would encourage you to use it. Quoted here so no one thinks I'm advocating the direct option.
return(list(Sum,Prod))
Having said that, suppose that you really don't want to pass them back, you could also put them directly in the parent environment from within the function using either assign or the superassignment operator. This practice can be looked down on by functional programming purists, but it does work. This is basically what you were originally trying to do.
Here's the superassignment version
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j] <<- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2)
Note that the superassignment searches back for the first environment in which the variable exists and modifies it there. It's not appropriate for creating new variables above where you are.
And an example version using the environment directly
fun2<-function(j,env)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
env$Prod[j] <- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2,env=parent.frame())
Notice that if you had called parent.frame() from within the function you would need to go back two frames because lapply() creates its own. This approach has the advantage that you could pass it any environment you want instead of parent.frame() and the value would be modified there. This is the seldom-used R implementation of writeable passing by reference. It's safer than superassignment because you know where the variable is that is being modified.

Get variables that have been created inside a function

I have created a function (which is quite long) that I have saved in a .txt file.
It works well (I use source(< >) to access it).
My problem is that I have created a few variables in that function
ie:
myfun<-function(a,b) {
Var1=....
Var2=Var1 + ..
}
Now I want to get those variables.
When I include return() inside the function, its fine: the value comes up on the screen, but when I type Var1 outside the function, I have an error message "the object cannot be found".
I am new to R, but I was thinking it might be because "myfun" operates in a different envrionment than the global one, but when I did
environment()
environment: R_GlobalEnv>
environment(myfun1)
environment: R_GlobalEnv>
It seems to me the problem is elsewhere...
Any idea?
Thanks
I realize this answer is more than 3 years old but I believe the option you are looking for is as follows:
myfun <- function(a,b) {
Var1 = (a + b) / 2 # do whatever logic you have to do here...
Var2 <<- Var1 + a # then output result to Global Environment with the "<<-" object.
}
The double "<<-" assignment operator will output "Var2" to the global environment and you can then use or reference it however you like without having to use "return()" inside your function.
If you want to do it in a nice way, write a class and than provide a print method. Within this class it is possible to return variables invisible. A nice book which covers such topics is "The Art of R programming".
An easy fix would be save each variable you need later on an list and than return a list
(as Peter pointed out):
return(list(VAR1=VAR1, .....))

attach() inside function

I'd like to give a params argument to a function and then attach it so that I can use a instead of params$a everytime I refer to the list element a.
run.simulation<-function(model,params){
attach(params)
#
# Use elements of params as parameters in a simulation
detach(params)
}
Is there a problem with this? If I have defined a global variable named c and have also defined an element named c of the list "params" , whose value would be used after the attach command?
Noah has already pointed out that using attach is a bad idea, even though you see it in some examples and books. There is a way around. You can use "local attach" that's called with. In Noah's dummy example, this would look like
with(params, print(a))
which will yield identical result, but is tidier.
Another possibility is:
run.simulation <- function(model, params){
# Assume params is a list of parameters from
# "params <- list(name1=value1, name2=value2, etc.)"
for (v in 1:length(params)) assign(names(params)[v], params[[v]])
# Use elements of params as parameters in a simulation
}
Easiest way to solve scope problems like this is usually to try something simple out:
a = 1
params = c()
params$a = 2
myfun <- function(params) {
attach(params)
print(a)
detach(params)
}
myfun(params)
The following object(s) are masked _by_ .GlobalEnv:
a
# [1] 1
As you can see, R is picking up the global attribute a here.
It's almost always a good idea to avoid using attach and detach wherever possible -- scope ends up being tricky to handle (incidentally, it's also best to avoid naming variables c -- R will often figure out what you're referring to, but there are so many other letters out there, why risk it?). In addition, I find code using attach/detach almost impossible to decipher.
Jean-Luc's answer helped me immensely for a case that I had a data.frame Dat instead of the list as specified in the OP:
for (v in 1:ncol(Dat)) assign(names(Dat)[v], Dat[,v])

Resources