global variable - r

I have a question about using global variables in R. I wrote two examples.
First version:
a <- 1
fun <- function(b){
return(a+b)
}
fun(b)
Second version:
a <- 1
fun <- function(a,b){
return(a+b)
}
fun(a,b)
I want to know which version is correct or recommended.

Relying on global state inside functions is frowned upon for various reasons (which can be roughly grouped under the caption encapsulation). So the second version would be superior in most scenarios.
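To make the difference concrete, here is a minimal sketch. The names fun_global and fun_args are made up for illustration. The first function silently changes its result when the global a changes, while the second depends only on what is passed in.

```r
a <- 1

# Version 1: reads `a` from the global environment (hidden dependency).
fun_global <- function(b) {
  a + b
}

# Version 2: everything the function needs is passed in explicitly.
fun_args <- function(a, b) {
  a + b
}

before <- fun_global(2)    # uses a = 1
a <- 100                   # the global is changed later on
after <- fun_global(2)     # same call, different result

explicit <- fun_args(1, 2) # unaffected by the global `a`
```

The same call to fun_global gives two different answers, which is the kind of surprise that explicit arguments avoid.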
However, the situation changes once the variable is defined inside a non-global environment. In that case, you’ve encapsulated your state into something that’s put away neatly. This is sometimes useful, because it allows you to create functions based on some input.
The classical example is something like this:
adder = function (value_to_add) {
function (x) {
x + value_to_add
}
}
This may look obscure but it’s simply a function that returns another function: you can use it to create functions. Here, for example, we create a function that takes one argument and adds the value 5 to it:
add5 = adder(5)
And here’s one that adds π to its argument:
add_pi = adder(pi)
Both of these are normal functions:
> add5(10)
[1] 15
> add_pi(10)
[1] 13.14159
Both of these functions, add5 and add_pi, access a variable, value_to_add, that’s outside of the function itself, in a separate environment. And it’s important to realise that these are different environments from each other: add5’s value_to_add is a different value, in a different environment, from add_pi’s value_to_add:
> environment(add5)$value_to_add
[1] 5
> environment(add_pi)$value_to_add
[1] 3.141593
(environment(f) allows you to inspect the environment to which a function f belongs. $ is used to access names inside that environment.)
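A small sketch that checks the point about separate environments, reusing the adder function from above (the variable names same_env and is_global are only illustrative):

```r
adder <- function(value_to_add) {
  function(x) {
    x + value_to_add
  }
}

add5 <- adder(5)
add_pi <- adder(pi)

# Each call to adder() created a fresh environment, so the two
# closures do not share their value_to_add.
same_env <- identical(environment(add5), environment(add_pi))

# Neither closure lives in the global environment.
is_global <- identical(environment(add5), globalenv())
```

Both same_env and is_global come out FALSE, which is what keeps the state of each generated function private.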

Related

Where are function constants stored if a function is created inside another function?

I am using a parent function to generate a child function by returning the function in the parent function call. The purpose of the parent function is to set a constant (y) in the child function. Below is a MWE. When I try to debug the child function, I cannot figure out which environment the variable is stored in.
power=function(y){
return(function(x){return(x^y)})
}
square=power(2)
debug(square)
square(3)
debugging in: square(3)
debug at #2: {
return(x^y)
}
Browse[2]> x
[1] 3
Browse[2]> y
[1] 2
Browse[2]> ls()
[1] "x"
Browse[2]> find('y')
character(0)
If you inspect the type of an R function, you’ll observe the following:
> typeof(square)
[1] "closure"
And that is, in fact, exactly the answer to your question: a closure is a function that carries an environment around.
R also tells you which environment this is (albeit not in a terribly useful way):
> square
function(x){return(x^y)}
<environment: 0x7ffd9218e578>
(The exact number will differ with each run — it’s just a memory address.)
Now, which environment does this correspond to? It corresponds to a local environment that was created when we executed power(2) (a “stack frame”). As the other answer says, it’s now the parent environment of the square function (in fact, in R every function, except for certain builtins, is associated with a parent environment):
> ls(environment(square))
[1] "y"
> environment(square)$y
[1] 2
You can read more about environments in the chapter in Hadley’s Advanced R book.
Incidentally, closures are a core feature of functional programming languages. Another core feature of functional languages is that every expression is a value and, by implication, a function’s (return) value is the value of its last expression. This means that using the return function in R is both unnecessary and misleading!¹ You should therefore leave it out; this results in shorter, more readable code:
power = function (y) {
function (x) x ^ y
}
There’s another R-specific subtlety here: since arguments are evaluated lazily, your function definition is error-prone:
> two = 2
> square = power(two)
> two = 10
> square(5)
[1] 9765625
Oops! Because y is not evaluated until square is first called, a modification of the variable two made before that first call is reflected inside square (but only then! Once y has been evaluated, further redefinitions won’t change anything). To guard against this, use the force function:
power = function (y) {
force(y)
function (x) x ^ y
}
force simply forces the evaluation of an argument name, nothing more.
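A quick check of the forced version, as a sketch: repeating the experiment from above with force(y) in place, the later reassignment of two should no longer affect square. The variable name result is only there for the illustration.

```r
power <- function(y) {
  force(y)            # evaluate the argument now, not at first use
  function(x) x ^ y
}

two <- 2
square <- power(two)
two <- 10             # this later change no longer leaks into square
result <- square(5)   # 5 ^ 2
```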
¹ Misleading, because return is a function in R and carries a slightly different meaning compared to procedural languages: it aborts the current function execution.
The variable y is stored in the parent environment of the function. The environment() function returns the current environment, and we use parent.env() to get the parent environment of a particular environment.
ls(envir=parent.env(environment())) #when using the browser
The find() function doesn't seem helpful in this case because it seems to only search objects that have been attached to the global search path (search()). It doesn't try to resolve variable names in the current scope.
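As a rough sketch of how one might locate y by hand, the loop below walks up the chain of enclosing environments until it finds the name. The helper where_is is made up for this example; it is not a base R function.

```r
power <- function(y) {
  force(y)
  function(x) x ^ y
}
square <- power(2)

# Hypothetical helper: return the first environment, starting at `env`
# and moving through its parents, in which `name` is defined.
where_is <- function(name, env) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(env)
    }
    env <- parent.env(env)
  }
  NULL  # not found anywhere on the chain
}

found <- where_is("y", environment(square))
y_value <- get("y", envir = found)
```

Here found is the environment attached to square, which is the same place that ls(environment(square)) inspects.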

Global vs Local within userdefined function

I am making changes to a global dataframe within my user defined function. The dataframe is created outside of the function.
However, my changes to the dataframe are not visible outside of the function. Only if I return the dataframe from the function do I end up with the modified version.
Is there a way to change this?
Whether you should do "call by reference" functionality in R is one question (addressed in the comments - generally the answer is no).
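The conventional approach, sketched here with an invented function name sort_by_mpg, is to pass the data frame in, return the modified copy, and let the caller reassign it:

```r
# mtcars ships with base R (package datasets).
sort_by_mpg <- function(df) {
  df[order(df$mpg), ]
}

my_cars <- mtcars                # not sorted
my_cars <- sort_by_mpg(my_cars)  # the caller decides what gets overwritten
```

Nothing outside the function changes unless the caller assigns the result, which keeps the data flow visible at the call site.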
However, you asked whether you can do it. The answer is yes, you can modify your global dataframe in the local scope of your function. Here is how you do it: 1) Use eval.parent() (set the evaluation scope to the calling scope, which, presumably, is the global scope) and 2) substitute() (to replace the variable reference instead of destroying one and creating a new one).
Here's an example:
> attach(mtcars)
> my_cars <- mtcars #not sorted
> pointless_sort <- function() {
+ eval.parent(substitute(my_cars<-mtcars[order(mpg),]))
+ }
> pointless_sort()
> #here the global my_cars is ordered/sorted by mpg
Important points: 1) You can do it; 2) Good programming generally means not doing it (but we've all been lazy, wanted a convenient way to split up code). Now you have the power.
"With Great Power Comes Great Responsibility."

Assign to an environment by reference id (i.e. without passing env. to child functions)

Programmers often use multiple small functions inside of larger functions. Along the way we may want to collect things in an environment for later reference. We could create an environment with new.env(hash=FALSE), pass it along to the smaller functions, and assign with assign. All well and good. I was wondering whether we could instead use the reference id of the environment: not pass the environment along to the child functions, but still assign to it by referring to its id.
So here I make an environment:
myenv <- new.env(hash=FALSE)
## <environment: 0x00000000588cc918>
And, as usual, I could assign like this if I passed the environment along to the child functions:
assign("elem1", 35, myenv)
myenv[["elem1"]]
# 35
What I want is to make the environment in the parent function and pass the reference id along instead so I want to do something like:
assign("elem2", 123, "0x00000000588cc918")
But this predictably results in:
## Error in as.environment(pos) :
## no item called "0x00000000588cc918" on the search list
Is it possible to pass along just the environment id and use that instead? This seems cleaner than passing the environment from function to function and returning as a list and then operating on the environment in that list...and maybe more memory efficient too.
I would also want to access this environment by reference.
Environments are not like lists. Passing an environment to a function does not copy its contents, even if the contents are modified within the function, so you don't have to worry about inefficiency. Also, when a function modifies the contents of an environment passed to it, the changes persist after the function completes, so unlike with lists there is no need to pass the environment back.
For example, the code below passes environment e to function f and f modifies the contents of it but does not pass it back. After f completes the caller sees the change.
f <- function(x, env) {
env$X <- x
TRUE
}
e <- new.env()
f(1, e)
## [1] TRUE
e$X
## [1] 1
More about environments in Hadley's book: http://adv-r.had.co.nz/Environments.html
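If the goal is mainly to avoid threading the environment through every call, one possible sketch is to define the small helpers inside the larger function. They then find the environment through lexical scoping and nothing has to be passed around. All names below (parent_fun, store, record_first, record_second) are hypothetical.

```r
parent_fun <- function() {
  store <- new.env()

  # Helpers defined here can see `store` without receiving it as an argument.
  record_first <- function() {
    assign("elem1", 35, envir = store)
  }
  record_second <- function() {
    store$elem2 <- 123
  }

  record_first()
  record_second()

  # Hand the collected values back as a plain list.
  as.list(store)
}

collected <- parent_fun()
```

This only works when the helpers can be defined inside the parent; helpers that live elsewhere still need the environment passed in, as in the example above.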

R: Storing data within a function and retrieving without using "return"

The following simple example will help me address a problem in my program implementation.
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod<-prod(x,y)
return(Sum)
}
j=1:10
Try<-lapply(j,fun2)
#
I want to store "Prod" at each iteration so I can access it after running the function fun2. I tried using assign() to create space, assign("Prod",numeric(10),pos=1), and then assigning Prod at the j-th iteration to Prod[j], but it does not work.
#
Any idea how this can be done?
Thank you
You can add anything you like in the return() command. You could return a list, return(list(Sum,Prod)), or a data frame, return(data.frame("In"=j,"Sum"=Sum,"Prod"=Prod)).
I would then convert that list of data.frames into a single data.frame:
Try2 <- do.call(rbind,Try)
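Putting the data frame suggestion together, a complete sketch could look like this. The set.seed() call is an addition that only makes the run reproducible.

```r
fun2 <- function(j) {
  x <- rnorm(10)
  y <- runif(10)
  # One row per iteration, carrying both results.
  data.frame(In = j, Sum = sum(x, y), Prod = prod(x, y))
}

set.seed(1)  # only so the sketch is reproducible
Try <- lapply(1:10, fun2)

# Stack the ten one-row data frames into a single data frame.
Try2 <- do.call(rbind, Try)
```

Try2 then has one row per iteration, so Try2$Prod holds the ten products that the question asks for.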
Maybe re-think the problem in a more vectorized way, taking advantage of the implied symmetry to represent intermediate values as a matrix and operating on that:
ni = 10; nj = 20
x = matrix(rnorm(ni * nj), ni)
y = matrix(runif(ni * nj), ni)
sums = colSums(x + y)
prods = apply(x * y, 2, prod)
Thinking about the vectorized version is as applicable to whatever your 'real' problem is as it is to the sum / prod example. In practice, when thinking in terms of vectors fails, I've never used the environment or concatenation approaches in other answers, but rather the simple solution of returning a list or vector.
I have done this before, and it works. Good for a quick fix, but it's kind of sloppy. The <<- operator assigns outside the function: it modifies the variable in the nearest enclosing environment where it already exists, and falls back to the global environment if it finds none.
fun2<-function(j){
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j]<<-prod(x,y)
}
j=1:10
Prod <- numeric(length(j))
Try<-lapply(j,fun2)
Prod
thelatemail and JeremyS's solutions are probably what you want. Using lists is the normal way to pass back a bunch of different data items and I would encourage you to use it. Quoted here so no one thinks I'm advocating the direct option.
return(list(Sum,Prod))
Having said that, suppose that you really don't want to pass them back, you could also put them directly in the parent environment from within the function using either assign or the superassignment operator. This practice can be looked down on by functional programming purists, but it does work. This is basically what you were originally trying to do.
Here's the superassignment version
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j] <<- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2)
Note that the superassignment searches back for the first environment in which the variable exists and modifies it there. It's not appropriate for creating new variables above where you are.
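A small sketch of that search behaviour, using made-up names: the variable count lives in the environment created by make_counter(), so <<- finds and updates it there rather than in the global environment.

```r
make_counter <- function() {
  count <- 0
  function() {
    # `count` is not local to this inner function, so <<- walks up and
    # finds it in the environment created by make_counter().
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()

# The updated value is stored in the closure's environment.
stored <- environment(counter)$count
```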
And an example version using the environment directly
fun2<-function(j,env)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
env$Prod[j] <- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2,env=parent.frame())
Notice that if you had called parent.frame() from within the function you would need to go back two frames because lapply() creates its own. This approach has the advantage that you could pass it any environment you want instead of parent.frame() and the value would be modified there. This is the seldom-used R implementation of writeable passing by reference. It's safer than superassignment because you know where the variable is that is being modified.

Get variables that have been created inside a function

I have created a function (which is quite long) that I have saved in a .txt file.
It works well (I use source(< >) to access it).
My problem is that I have created a few variables in that function
i.e.:
myfun<-function(a,b) {
Var1=....
Var2=Var1 + ..
}
Now I want to get those variables.
When I include return() inside the function, it's fine: the value comes up on the screen, but when I type Var1 outside the function, I have an error message "the object cannot be found".
I am new to R, but I was thinking it might be because "myfun" operates in a different environment than the global one, but when I did
environment()
<environment: R_GlobalEnv>
environment(myfun1)
<environment: R_GlobalEnv>
It seems to me the problem is elsewhere...
Any idea?
Thanks
I realize this answer is more than 3 years old but I believe the option you are looking for is as follows:
myfun <- function(a,b) {
Var1 = (a + b) / 2 # do whatever logic you have to do here...
Var2 <<- Var1 + a # then output result to Global Environment with the "<<-" operator.
}
The double "<<-" assignment operator will output "Var2" to the global environment and you can then use or reference it however you like without having to use "return()" inside your function.
If you want to do it in a nice way, write a class and then provide a print method. Within this class it is possible to return variables invisibly. A nice book which covers such topics is "The Art of R Programming".
An easy fix would be to save each variable you need later in a list and then return that list
(as Peter pointed out):
return(list(VAR1=VAR1, .....))
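As a sketch of the list approach, the function below returns both intermediate values under their own names. The formulas are just the placeholder logic from the earlier answer, not the asker's real computation, and res is an illustrative variable name.

```r
myfun <- function(a, b) {
  Var1 <- (a + b) / 2
  Var2 <- Var1 + a
  # Return both intermediate values as a named list.
  list(Var1 = Var1, Var2 = Var2)
}

res <- myfun(2, 4)
res$Var1  # (2 + 4) / 2
res$Var2  # Var1 + 2
```

The caller picks out what it needs with $, and nothing is written to the global environment behind its back.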

Resources