"Capturing" the global environment - r

I'm using functions from an external package (that I cannot modify). These functions put a lot of stuff in the global environment; for instance, the package does things like:
the.data <<- data.frame(A=rnorm(10),B=rnorm(10),C=rnorm(10)) ## A sample dataset
package.plot <- function(){
x.coords <<- the.data$A/the.data$B
y.coords <<- the.data$C
plot(x.coords, y.coords)
}
(obviously a hyper-simplified example... the key here is that x.coords and y.coords are rather complex derivations, complex enough that I do not want to recode them, so I find it advantageous to reuse the existing code)
I want to use these functions in my own scripts, namely to make the same graph with ggplot. Of course, a first, obvious solution is
my.better.plot <- function(){
package.plot()
tibble(x.coords,y.coords) %>% ggplot(aes(x=x.coords,y=y.coords))+geom_point() # etc.
}
However, this has two issues:
1. I end up plotting twice (a minor issue, it is sufficiently fast to be unnoticeable);
2. I "pollute" the global environment with global x.coords and y.coords.
Hence, I would like to run package.plot() in a "pseudo-global" environment to avoid ending up with global variables that may be modified in an "uncontrolled" way.
A workaround, of course, is
my.better.plot <- function(){
package.plot()
p <- tibble(x.coords,y.coords) %>% ggplot(aes(x=x.coords,y=y.coords))+geom_point() # etc.
rm(x.coords, y.coords, envir=.GlobalEnv) # remove both globals created by package.plot()
p # return the plot object
}
However, I'd prefer to do something like
my.better.and.cleaner.plot <- function(){
within.envir(dummy_env,my.better.plot)
}
...assuming that there is, indeed, a function "within.envir" that runs its second argument in a mock global environment.
Is something like this possible at all? I did read http://adv-r.had.co.nz/Environments.html, but could not find the answer... (not one that I understood, at any rate). Bonus question: if this is possible, how can I extract the return value of ggplot from dummy_env and return it?

This function avoids the side effects as much as possible:
library(ggplot2)
library(magrittr)
library(tibble)
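my.better.plot <- function(){
# local placeholders: the <<- inside package.plot will now stop here
x.coords <- 1
y.coords <- 1
# local copy of package.plot whose enclosing environment is this function's frame
environment(package.plot) <- environment()
# send the base-graphics plot to a throwaway bitmap file
bmp(tempfile())
package.plot()
dev.off()
print(tibble(x.coords,y.coords) %>% ggplot(aes(x=x.coords,y=y.coords))+geom_point()) # etc.
}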
my.better.plot()
#creates only the ggplot in the current device
ls(globalenv())
#[1] "my.better.plot" "package.plot" "the.data"

So this is unfortunately very hacky, since the <<- operator traverses the environment tree upwards until it finds the variable name, and creates a global variable if it never does (hence you should basically never use it).
The one workaround is to give the function an enclosing environment that already has the variables in question initialized. <<- will then assign into those variables and not traverse further up into the global environment. You need to know the variable names beforehand, though.
f <- function(x) a <<- x
f(5)
# a = 5 in GlobalEnv
rm(a)
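CapturedCall <- function(fun, CapturedVars,...)
{
stopifnot(is.function(fun))
# keep fun's original enclosure as the parent, so its free variables still resolve
SandBox <- new.env(parent = environment(fun))
# pre-create the names so that <<- stops in SandBox
for(varName in CapturedVars) assign(varName, NA,SandBox)
environment(fun) <- SandBox
fun(...)
}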
CapturedCall(f,"a",1)
#Nothing in GlobalEnv
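For the bonus question, one option is a variant of CapturedCall that also hands back the sandbox, so the captured values (and the function's return value) can be pulled out afterwards. This is only a sketch: within_sandbox() and package.calc() are hypothetical names (package.calc() stands in for package.plot() without the plotting call), and it only redirects <<- assignments made directly by fun. Helper functions that fun calls keep their own enclosures, so their <<- assignments still go wherever they normally would.

```r
# Hypothetical helper (not from any package): run `fun` with the variables
# it assigns via `<<-` redirected into a sandbox, and return both the
# function's value and the sandbox itself.
within_sandbox <- function(fun, captured_vars, ...) {
  stopifnot(is.function(fun))
  # Parent the sandbox on fun's original enclosure so free variables
  # (package internals, the.data, ...) still resolve as before.
  sandbox <- new.env(parent = environment(fun))
  # Pre-create the names so `<<-` stops here instead of reaching globalenv().
  for (v in captured_vars) assign(v, NA, envir = sandbox)
  environment(fun) <- sandbox   # changes only this local copy of `fun`
  result <- fun(...)
  list(value = result, env = sandbox)
}

# Hypothetical stand-in for package.plot(), without the plotting call
the.data <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))
package.calc <- function() {
  x.coords <<- the.data$A / the.data$B
  y.coords <<- the.data$C
  length(x.coords)
}

out <- within_sandbox(package.calc, c("x.coords", "y.coords"))
out$value                  # 10
head(out$env$x.coords)     # the captured values live in the sandbox
exists("x.coords", envir = globalenv(), inherits = FALSE)   # FALSE
```

With the real package.plot(), you would build the ggplot from out$env$x.coords and out$env$y.coords inside your own function and return that plot object.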

Related

R functions using data.frame$column not working

The following two functions don't work, but the same code does work when I write it out in full. I'm not sure why. Any suggestions for fixes would be great.
change_specific_column_name <- function(data.frame,old_column_name,new_column_name){
names(data.frame)[names(data.frame) == old_column_name] <- new_column_name
}
change_specific_observations_name <- function(data.frame, column_name, old_obseration, new_observation){
data.frame$column_name[which(data.frame$column_name == old_obseration)] <- new_observation
}
test_frame <- data.frame(Does=1,This=2,Work=3)
change_specific_column_name(test_frame,"Work","Happen") # this doesn't change the name of the column
names(test_frame)[names(test_frame) == "Work"] <- "Happen" # writing out the function does change the name
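Although not exact, you can think of the function's arguments as passed by value, so it is clear that changes made to the function's formal parameter don't affect the actual argument: inside the function, names(data.frame) <- ... modifies a local copy, which is discarded when the function returns.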
Any suggestions for fixes would be great.
If you really want a function to modify its argument, you could use a technique described, e.g., in "Call by reference in R"; essentially, just wrap your assignment in eval.parent(substitute(…)).
change_specific_column_name <- function(data.frame, old_column_name, new_column_name)
eval.parent(substitute(names(data.frame)[names(data.frame) == old_column_name] <- new_column_name))
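The more idiomatic fix is to have the functions return the modified data frame and reassign the result. A sketch follows; the argument names dat and old_observation are my own, and the second function also swaps $ for [[, since data.frame$column_name looks for a column literally named "column_name" rather than the column whose name is stored in that variable.

```r
# Rewrites that return the modified copy instead of trying to change
# the caller's object in place
change_specific_column_name <- function(dat, old_column_name, new_column_name) {
  names(dat)[names(dat) == old_column_name] <- new_column_name
  dat   # the caller must reassign the result
}

change_specific_observations_name <- function(dat, column_name, old_observation, new_observation) {
  # `$` takes its argument literally (it would look for a column called
  # "column_name"), so index with `[[` when the name is held in a variable
  dat[[column_name]][dat[[column_name]] == old_observation] <- new_observation
  dat
}

test_frame <- data.frame(Does = 1, This = 2, Work = 3)
test_frame <- change_specific_column_name(test_frame, "Work", "Happen")
test_frame <- change_specific_observations_name(test_frame, "This", 2, 20)
names(test_frame)   # "Does"   "This"   "Happen"
test_frame$This     # 20
```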

How does lazy evaluation bind variables (in R)?

I'm fairly new to R and I just noticed that the first call to a function seems to bind its environment parameters. How does this work? (Or what is it called, so I can look it up in the docs?)
E.g.:
make.power <- function(n)
{
pow <- function(x)
{
x^n
}
}
i <- 3
cube <- make.power(i)
# print(cube(3)) # uncommenting this line changes the value below
i <- 2
square <- make.power(i)
print(cube(3)) # this value changes depending on whether cube(3) was called before.
print(square(3))
I'm looking for a simple explanation of what's going on, or just the name of this feature, so I can look it up.
Thanks!
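The feature is lazy evaluation: each argument is passed as a promise that is evaluated the first time its value is needed, not when the function is called. In make.power(i), n is only evaluated when pow() first runs, so whatever value i has at that moment is what cube() keeps; calling cube(3) early forces the promise while i is still 3. A minimal sketch of the behavior and the usual fix, base R's force() (make.power.lazy and lazy.cube are names chosen for illustration):

```r
# Without force(): `n` stays an unevaluated promise until the inner function first runs
make.power.lazy <- function(n) function(x) x^n
i <- 3
lazy.cube <- make.power.lazy(i)
i <- 2
lazy.cube(3)   # 9, because `i` was already 2 when `n` was first evaluated

# With force(): the promise is evaluated as soon as the factory is called
make.power <- function(n) {
  force(n)
  function(x) x^n
}
i <- 3
cube <- make.power(i)
i <- 2
square <- make.power(i)
cube(3)     # 27
square(3)   # 9
```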

Using global variable in function

I have a very big dataset and I analyze it with R.
The problem is that I want to add some columns with different treatments to my dataset AND I need some recursive functions which use some global variables. Each function modifies some global variables and creates some variables. So duplicating my dataset in memory is a big problem...
I read some documentation: if I understood it correctly, neither <<- nor assign() can help me...
What I want:
mydata <- list(read.table(), ...)
myfunction <- function(var1, var2) {
#modification of global mydata
mydata = ...
#definition of another variable with the new mydata
var3 <- ...
#recursive function
mydata = myfunction(var2, var3)
}
Do you have some suggestions for my problem?
Both <<- and assign will work:
myfunction <- function(var1, var2) {
# Modification of global mydata
mydata <<- ...
# Alternatively:
#assign('mydata', ..., globalenv())
# Assign locally as well
mydata <- mydata
# Definition of another variable with the new mydata
var3 <- ...
# Recursive function
mydata = myfunction(var2, var3)
}
That said, it’s almost always a bad idea to want to modify global data from a function, and there’s almost certainly a more elegant solution to this.
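Furthermore, note that <<- is actually not the same as assigning to a variable in globalenv(). Rather, it searches the enclosing (parent) environments, starting with the one in which the function was defined, and assigns to the first existing variable of that name; only if none is found does it create the variable in the global environment. For functions defined in the global environment, that search starts at the global environment. For functions defined elsewhere (e.g. inside another function), <<- may modify a variable in that enclosing function instead of a global one.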
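A small illustration of that point, with hypothetical names: the counter's <<- updates count in the frame of make.counter(), where the inner function was defined, while set.global() finds no existing binding and therefore creates a global variable.

```r
make.counter <- function() {
  count <- 0
  function() {
    # `<<-` finds `count` in make.counter()'s frame and updates it there
    count <<- count + 1
    count
  }
}

counter <- make.counter()
counter()
counter()   # 2
exists("count", envir = globalenv(), inherits = FALSE)   # FALSE

# No binding exists anywhere up the chain, so `<<-` creates a global one
set.global <- function() never.defined.before <<- "now global"
set.global()
exists("never.defined.before", envir = globalenv(), inherits = FALSE)   # TRUE
```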

R - pass a global variable to a function, modify it and save

I'm trying to build a dynamic function using eval, parse, or whatever works.
Intention of a function: a value setter.
Parameter input: list, name of list item, value
Return: don't really care
Current code
#call fun_lsSetValue(state_list,selected,"dropdown")
fun_lsSetValue <- function(ls,name,value){
pars <- as.list(match.call()[-1])
element <- as.character(eval(expression(pars$name)))
if(is.null(value))
eval(parse(text="ls[[element]] <- ''"))
else
eval(parse(text="ls[[element]] <- value"))
#part that I need help, I need to assign ls to "state_list" without
#having to hard coded it in this function
#I have tried everything I can think of like
#assign(deparse(substitute(ls)),ls,.GlobalEnv)
#state_list <<- ls works, but I want to be dynamic
}
The problem is that I need to pass the value of the local variable "ls" back to where it came from (state_list), dynamically.
I know a <- function(a,name,value) {... return(a)} works, but this syntax is really not my preference.
I'm trying to learn whether the same thing can be done without the assignment outside of the function.
Any advice would be helpful.
Even though this is a terrible idea in general, something like
fun_lsSetValue <- function(ls,name,value){
lsname <- deparse(substitute(ls))
name <- deparse(substitute(name))
ls <- get(lsname, envir=globalenv())
if(is.null(value)) {
value<-''
}
ls[[name]]<-value
assign(lsname, ls,envir=globalenv())
}
should work
a <- list(x=1)
fun_lsSetValue(a,x,3)
a
# $x
# [1] 3
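If the goal is to avoid both return values and assign() to a deparsed name, one alternative is to keep the state in an environment instead of a list: environments are not copied when modified, so a function can change one in place. This is a sketch under that assumption; state_env and fun_envSetValue are hypothetical names, and the item name is passed as a string rather than unquoted.

```r
# Environments have reference semantics: changes made inside the
# function are visible to the caller without assign() on a global name or <<-
state_env <- new.env()
state_env$selected <- "dropdown"

fun_envSetValue <- function(env, name, value) {
  if (is.null(value)) value <- ""
  assign(name, value, envir = env)   # modifies the caller's environment in place
  invisible(env)
}

fun_envSetValue(state_env, "selected", NULL)
state_env$selected   # ""
fun_envSetValue(state_env, "other", 3)
state_env$other      # 3
```

as.list(state_env) converts the state back to a list when one is needed.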

Iterating over separate lists in R

I have lots of variables in R, all of type list
a100 = list()
a200 = list()
# ...
p700 = list()
Each variable is a complicated data structure:
a200$time$data # returns 1000 x 1000 matrix
Now, I want to apply code to each variable in turn. However, since R doesn't support pass-by-reference, I'm not sure what to do.
One idea I had was to create a big list of all these lists, i.e.,
biglist = list()
biglist[[1]] = a100
...
And then I could iterate over biglist:
for (i in 1:length(biglist)){
biglist[[i]]$newstuff = "profit"
# more code here
}
And finally, after the loop, go backwards so that existing code (that uses variable names) still works:
a100 = biglist[[1]]
# ...
The question is: is there a better way to iterate over a set of named lists? I have a feeling that I'm doing things horribly wrong. Is there something easier, like:
# FAKE, Idealized code:
foreach x in (a100, a200, ....){
x$newstuff = "profit"
}
a100$newstuff # "profit"
To walk over several lists in parallel you can use mapply, which takes parallel lists and walks over them in lock-step. Furthermore, in a functional language you should emit the object that you want rather than modify the data structure within a function call.
You should use the sapply, apply, lapply, ... family of functions.
jim
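A sketch of the mapply idea for the original variables, assuming they all live in the current (global) environment: mget() collects them into a named list, mapply() walks that list and a second vector in lock-step, and list2env() writes the results back under the original names. The two toy lists and the "profit"/"loss" tags are made up for illustration.

```r
a100 <- list(time = list(data = matrix(1, 2, 2)))   # toy stand-ins for the real lists
a200 <- list(time = list(data = matrix(2, 2, 2)))

# Named list: list(a100 = ..., a200 = ...)
vars <- mget(c("a100", "a200"), envir = environment())

# mapply walks `vars` and the tag vector in lock-step; SIMPLIFY = FALSE keeps a list
vars <- mapply(function(x, tag) {
  x$newstuff <- tag
  x                  # emit the modified object instead of mutating in place
}, vars, c("profit", "loss"), SIMPLIFY = FALSE)

# Write the results back under the original names
invisible(list2env(vars, envir = environment()))
a100$newstuff   # "profit"
a200$newstuff   # "loss"
```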
jimmyb is quite right. lapply and sapply are specifically designed to work on lists, so they would work with your biglist as well. Don't forget to return the object from the nested function, though. An example:
X <- list(A=list(A1=1:2,A2=3:4),B=list(B1=5:6,B2=7:8))
lapply(X,function(i){
i$newstuff = "profit"
return(i)
})
Now, as you said, R passes by value, so you have multiple copies of the data roaming around. If you work with really big lists, you might want to try toning the memory usage down by working on each variable separately, using assign and get. The following is considered bad coding, but can sometimes be necessary to avoid memory trouble:
A <- X[[1]] ; B <- X[[2]] #make the data
list.names <- c("A","B")
for (i in list.names){
tmp <- get(i)
tmp$newstuff <- "profit"
assign(i,tmp)
rm(tmp)
}
Make sure you are well aware of the implications of this code, as you're working within the global environment. If you need to do this more often, you might want to work with environments instead:
my.env <- new.env() # make the environment
my.env$A <- X[[1]];my.env$B <- X[[2]] # put vars in environment
for (i in list.names){
tmp <- get(i,envir=my.env)
tmp$newstuff <- "profit"
assign(i,tmp,envir=my.env)
rm(tmp)
}
my.env$A
my.env$B
