Reading global environment from within a function in R

I need to access (i.e., read and save) the items of the environment I'm working in. I have written the following function to save all objects in my (global) environment:
save_vars <- function(list.of.vars = NULL,
prefix = "StatusQuo",
path = "data") {
if(is.null(list.of.vars)) list.of.vars <- ls()
date_time <- Sys.time()
if (!is.null(path))
path <- paste0(path, "/")
file_name <- paste0(path, prefix, "_", date_time, ".RData")
save(list = list.of.vars, file = file_name)
}
The idea was that if no list.of.vars argument is passed to the function, using ls(), the function accesses the variables of the environment calling save_vars. However, it only saves the variables within the scope of the function itself. I know I can call the function as save_vars(ls()) to do the job, but is there a neater way around it?

Probably cleanest to pass the environment:
fun <- function(envir = parent.frame()) ls(envir = envir)
fun()
This lists the objects in the caller but also lets the user change which environment is used. For example, they could force the global environment to be used:
fun(.GlobalEnv)
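Applied to the function from the question, the same idea might look like the sketch below. The envir argument and the colon-free timestamp format are additions made here for illustration, not part of the original code (":" is not allowed in Windows file names, so the raw Sys.time() output is formatted first).

```r
save_vars <- function(list.of.vars = NULL,
                      prefix = "StatusQuo",
                      path = "data",
                      envir = parent.frame()) {  # assumption: new argument
  # Default to every object visible in the caller's environment
  if (is.null(list.of.vars)) list.of.vars <- ls(envir = envir)
  # Build a file name with a timestamp that contains no colons or spaces
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  file_name <- paste0(prefix, "_", stamp, ".RData")
  if (!is.null(path)) file_name <- file.path(path, file_name)
  # save() must be told where to look up the names as well
  save(list = list.of.vars, file = file_name, envir = envir)
  invisible(file_name)
}

# Usage: the variables of the calling function are saved, not those of save_vars
demo <- function() {
  x <- 1
  y <- "two"
  save_vars(path = tempdir())
}
saved <- demo()
loaded <- load(saved, envir = new.env())
```

Here loaded holds the names of the objects found in the file, which should be the two variables defined inside demo().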

Related

How do I avoid eval and parse?

I have written a function that sources files that contain scripts for other functions and stores these functions in an alternative environment so that they aren't cluttering up the global environment. The code works, but contains three instances of eval(parse(...)):
# sourceFunctionHidden ---------------------------
# source a function and hide the function from the global environment
sourceFunctionHidden <- function(functions, environment = "env", ...) {
if (environment %in% search()) {
while (environment %in% search()) {
if (!exists("counter", inherits = F)) counter <- 0
eval(parse(text = paste0("detach(", environment, ")")))
counter <- counter + 1
}
cat("detached", counter, environment, "s\n")
} else {cat("no", environment, "attached\n")}
if (!environment %in% ls(.GlobalEnv, all.names = T)) {
assign(environment, new.env(), pos = .GlobalEnv)
cat("created", environment, "\n")
} else {cat(environment, "already exists\n")}
sapply(functions, function(func) {
source(paste0("C:/Users/JT/R/Functions/", func, ".R"))
eval(parse(text = paste0(environment, "$", func," <- ", func)))
cat(func, "created in", environment, "\n")
})
eval(parse(text = paste0("attach(", environment, ")")))
cat("attached", environment, "\n\n")
}
Much has been written about the sub-optimality of the eval(parse(...)) construction (see here and here). However, the discussions that I've found mostly deal with alternate strategies for subsetting. The first and third instances of eval(parse(...)) in my code don't involve subsetting (the second instance might be related to subsetting).
Is there a way to call new.env(...), [environment name]$[function name] <- [function name], and attach(...) without resorting to eval(parse(...))? Thanks.
N.B.: I don't want to change the names of my functions to .name to hide them in the global environment
For what it's worth, the function source actually uses eval(parse(...)), albeit in a somewhat subtle way. First, .Internal(parse(...)) is used to create expressions, which after more processing are later passed to eval. So eval(parse(...)) seems to be good enough for the R core team in this instance.
That said, you don't need to jump through hoops to source functions into a new environment. source provides an argument local that can be used for precisely this.
local: TRUE, FALSE or an environment, determining where the parsed expressions are evaluated.
An example:
env = new.env()
source('test.r', local = env)
Testing that it works:
env$test('hello', 'world')
# [1] "hello world"
ls(pattern = 'test')
# character(0)
And an example test.r file to use this on:
test = function(a,b) paste(a,b)
If you want to keep it out of the global environment, put it into a package. It's common for people in the R community to put a bunch of frequently used helper functions into their own personal package.
tl;dr: The right way to convert quoted strings to object names is to use assign() and get(). See this post.
The long answer: The answer from @dww about being able to source() directly to a specific environment led me to change the second instance of eval(parse(...)) as follows:
# old version
source(paste0("C:/Users/JT/R/Functions/", func, ".R"))
eval(parse(text = paste0(environment, "$", func," <- ", func)))
# new version
source(
paste0("C:/Users/JT/R/Functions/", func, ".R"),
local = get(environment)
)
The answer from @dww also got me exploring attach(). attach() accepts an environment object as its first argument, and its name argument sets the label shown on the search path. This led me to change the third instance of eval(parse(...)) (below). Note the use of get() to convert the "env" that comes from environment to the unquoted env that attach() requires.
# old version
eval(parse(text = paste0("attach(", environment, ")")))
# new version
attach(get(environment), name = environment)
Finally, at some point in this process I was reminded that rm() has a character.only argument. detach() accepts the same argument, so I changed the first instance of eval(parse(...)) as below:
# old version
eval(parse(text = paste0("detach(", environment, ")")))
# new version
detach(environment, character.only = T)
So my new code is:
# sourceFunctionHidden ---------------------------
# source a function and hide the function from the global environment
sourceFunctionHidden <- function(functions, environment = "env", ...) {
if (environment %in% search()) {
while (environment %in% search()) {
if (!exists("counter", inherits = F)) counter <- 0
detach(environment, character.only = T)
counter <- counter + 1
}
cat("detached", counter, environment, "s\n")
} else {cat("no", environment, "attached\n")}
if (!environment %in% ls(.GlobalEnv, all.names = T)) {
assign(environment, new.env(), pos = .GlobalEnv)
cat("created", environment, "\n")
} else {cat(environment, "already exists\n")}
sapply(functions, function(func) {
source(
paste0("C:/Users/JT/R/Functions/", func, ".R"),
local = get(environment)
)
cat(func, "created in", environment, "\n")
})
attach(get(environment), name = environment)
cat("attached", environment, "\n\n")
}
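A minimal, self-contained sketch of the name-to-object step used above, for anyone who wants to try it without touching the search path. All names here (helper_env, holder, add_one) are made up for illustration, and holder merely stands in for .GlobalEnv.

```r
env_name <- "helper_env"                 # hypothetical environment name (a string)
holder <- new.env()                      # stand-in for .GlobalEnv in this sketch
assign(env_name, new.env(), envir = holder)

# Write a throwaway script so the example does not depend on any real file
script <- tempfile(fileext = ".R")
writeLines("add_one <- function(x) x + 1", script)

# get() turns the string into the environment object it names
target <- get(env_name, envir = holder)

# source() evaluates the script inside that environment
source(script, local = target)

in_target <- exists("add_one", envir = target, inherits = FALSE)
result <- target$add_one(1)
```

No eval(parse(...)) is needed at any point: assign() and get() handle the string, and source() takes the environment object directly.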

function to clean current workspace apart from some variables

How can I write a generic R function that cleans the current workspace apart from some self-defined variables? For sure, I can achieve this in a single script with the following code:
prj = '/path/to/project'
src = 'string'
data_to_clean = head(iris)
rm(list = ls()[ !ls() %in% c('prj', 'src') ] )
# only prj and src remain
However I want this to be a function, so that it's applicable for multiple scripts and I can change the variables which should not be cleaned, in one place. Is this possible?
If you wrap this in a function, keep in mind that a function creates its own environment when it is executed. Without an explicit envir argument, ls and rm would therefore operate on the function's own environment, so you need to specify the environment every time (in each ls as well as in rm). You probably want to remove the objects from .GlobalEnv.
clean_workspace <- function(not_to_be_removed) {
rm(list =
setdiff(ls(envir = .GlobalEnv), c("clean_workspace", not_to_be_removed)),
envir = .GlobalEnv)
}
prj = '/path/to/project'
src = 'string'
data_to_clean = head(iris)
clean_workspace(c('prj', 'src'))
In order not to remove the function itself, it should be added to the values not to be removed.
If you want to read more about environments, have a look at this overview.
I think you want to remove the function itself. The important bit is to tell rm the environment where to remove these objects from:
clean_workspace <- function(not_to_be_removed, envir = globalenv()) {
objs <- ls(envir = envir)
rm(list = objs[ !objs %in% not_to_be_removed], envir = envir)
}
prj = '/path/to/project'
src = 'string'
data_to_clean = head(iris)
clean_workspace(c('prj', 'src'))
ls()
#> [1] "prj" "src"
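To try this without deleting anything from the real workspace, the same logic can be pointed at a scratch environment first. This is only a sketch; clean_env and scratch are hypothetical names introduced here.

```r
clean_env <- function(keep, envir = globalenv()) {
  # Everything in envir that is not on the keep list
  doomed <- setdiff(ls(envir = envir), keep)
  rm(list = doomed, envir = envir)
  invisible(doomed)   # return the removed names, invisibly
}

# A throwaway environment standing in for the global one
scratch <- new.env()
assign("prj", "/path/to/project", envir = scratch)
assign("src", "string", envir = scratch)
assign("data_to_clean", head(iris), envir = scratch)

removed <- clean_env(c("prj", "src"), envir = scratch)
remaining <- ls(scratch)
```

Returning the removed names invisibly is a design choice made for this sketch; it makes it easy to log what was deleted.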

Restrict which functions can modify an object

I have a variable in my global environment called myList. I have a function called myFunction that modifies myList and re-assigns it in the global environment. I only want myList to be modified by myFunction. Is there a way to prevent any other function from modifying myList?
For background, I am building a general tool for R users. I don't want users of the tool to be able to define their own function to modify myList. I also don't want to be able to modify myList myself with a function I may write in the future.
I have a potential solution, but I don't like it. When the tool is executed, I could examine the text of every function defined by a user and search for the text that will assign myList to the global environment. I don't like the fact that I need to search over all functions.
Does anyone know if what I am looking for is implementable in R? Thanks for any help that can be provided.
As a reproducible example, I need code that will make the following possible:
assign('myList', list(), envir = globalenv())
myFunction <- function() {
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
}
userFunction <- function() {
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
}
myFunction() # I need some code that will allow this function to run successfully
userFunction() # and cause an error when this function runs
Sounds like you need the modules package.
Basically, each unit of code has its own scope.
e.g.
# install.packages("modules")
# Load library
library("modules")
# Create a basic module
m <- module({
.myList <- list()
myFunction <- function() {
.myList <<- c(.myList, 'test')
}
get <- function() .myList
})
# Accessor
m$get()
# list()
# Your function
m$myFunction()
# Modification
m$get()
# [[1]]
# [1] "test"
Note that we tweaked the example slightly by changing the variable name from myList to .myList. So, we'll need to update that in userFunction():
userFunction <- function() {
.myList <- c(.myList, 'test')
}
Running this, we now get:
userFunction()
# Error in userFunction() : object '.myList' not found
As desired.
For more detailed examples, see the modules vignette.
The alternative is you can define an environment (new.env()) and then lock it after you have loaded myList.
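A rough sketch of that alternative, using base R's lockEnvironment(). The name store is hypothetical. Note that this blocks every writer, so a function that is meant to modify myList would first have to call unlockBinding() on it.

```r
# "store" is a hypothetical name for the private environment
store <- new.env()
assign("myList", list(), envir = store)

# Lock the environment and, with bindings = TRUE, every binding already in it
lockEnvironment(store, bindings = TRUE)

# Any later attempt to overwrite myList now raises an error,
# which tryCatch() turns into the string "blocked"
outcome <- tryCatch({
  assign("myList", list("changed"), envir = store)
  "modified"
}, error = function(e) "blocked")
```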
This is an all-around bad idea, from the assignment into the global environment (I'd never use a package that does this) to surprising your users. You should probably just use S4 or reference classes.
Anyway, you can lock the bindings (or environment if you followed better practices). You wouldn't stop an advanced user with that, but they would at least know that you don't want them to change the object.
createLocked <- function(x, name, env) {
assign(name, x, envir = env)
lockBinding(name, env)
invisible(NULL)
}
createLocked(list(), "myList", globalenv())
myFunction <- function() {
unlockBinding("myList", globalenv())
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
lockBinding("myList", globalenv())
invisible(NULL)
}
userFunction <- function() {
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
}
myFunction() # runs successfully
userFunction()
#Error in assign("myList", myList, envir = globalenv()) :
# cannot change value of locked binding for 'myList'
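One possible refinement, offered as a sketch rather than as part of the answer above: if anything fails between unlockBinding() and lockBinding(), the binding stays unlocked. Registering the re-lock with on.exit() avoids that. The names store and append_locked are hypothetical, and store stands in for globalenv().

```r
store <- new.env()                       # hypothetical stand-in for globalenv()
assign("myList", list(), envir = store)
lockBinding("myList", store)

append_locked <- function(value, env = store) {
  unlockBinding("myList", env)
  # Re-lock when the function exits, whether normally or through an error
  on.exit(lockBinding("myList", env), add = TRUE)
  assign("myList", c(get("myList", envir = env), value), envir = env)
  invisible(NULL)
}

append_locked("test")
still_locked <- bindingIsLocked("myList", store)
n_items <- length(store$myList)
```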

add a data frame to an existing rdata file

I am fairly new to R and will try my best to make myself understood.
Suppose I have an existing rdata file with multiple objects.
Now I want to add a data frame to it. How do I do that?
I tried the following:
write.data.loc <- 'users/Jim/Objects'
rdataPath <- 'users/Jim/Objects.Rda'
myFile<- read.csv("myFile.csv")
loadObjects <- load(rdataPath)
save(loadObjects,myFile,file=paste(write.data.loc,".Rda",sep=""))
But this does not seem to work?
I'm not certain of your actual use-case, but if you must "append" a new object to an rda file, here is one method. This tries to be clever by loading all of the objects from the rda file into a new environment. (There are many tutorials and guides that discuss the use and relevance of environments; Hadley's "Advanced R" is one that does a good job, I think.)
This first step loads all of the objects into a new (empty) environment. It's useful to use an otherwise-empty environment so that we can get all of the objects from it rather easily using ls.
e <- new.env(parent = emptyenv())
load("path/to/.rda", envir = e)
The object you want to add should be loaded into a variable within the environment. Note that the dollar-sign access looks the same as lists, which makes it both (1) easy to confuse the two, and (2) easy to understand the named indexing that $ provides.
e$myFile <- read.csv("yourFile.csv")
This last piece, re-saving the rda file, is an indirect method. The ls(envir = e) returns the variable names of all objects within the environment. This is good, because save can deal with objects or with their names.
do.call("save", c(ls(envir = e), list(envir = e, file = "newsave.rda")))
Realize that this is not technically appending the data.frame to the rda file; it is overwriting the rda file with a new one that happens to contain all the previous objects and the new data.frame.
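The whole round trip can be tried end to end with temporary files. This is a sketch under assumptions: the objects a and b and the data frame contents are invented for the demonstration, and save(list = ...) is used here as an equivalent, slightly shorter spelling of the do.call() form.

```r
# Create a small rda file to start from (contents are made up)
rda <- tempfile(fileext = ".rda")
src <- new.env()
src$a <- 1:3
src$b <- "text"
save(list = ls(envir = src), envir = src, file = rda)

# Load everything into an otherwise-empty environment
e <- new.env(parent = emptyenv())
load(rda, envir = e)

# Add the new data frame under the name it should have in the file
e$myFile <- data.frame(x = 1:2)

# Re-save all objects, old and new, over the original file
save(list = ls(envir = e), envir = e, file = rda)

# load() returns the names of the objects it restored
check <- new.env()
restored <- load(rda, envir = check)
```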
I wrote this solution, which can add data frames, matrices or lists. By default, an object that already exists in the file under the same name is kept; set overwrite = TRUE to replace it with the new one.
add_object_to_rda <- function(obj, rda_file, overwrite = FALSE) {
.dummy <- NULL
if (!file.exists(rda_file)) save(.dummy, file = rda_file)
old_e <- new.env()
new_e <- new.env()
load(file = rda_file, envir = old_e)
name_obj <- deparse(substitute(obj)) # get the name of the object
# new_e[[name_obj]] <- get(name_obj) # use this only outside a function
new_e[[name_obj]] <- obj
# merge object from old environment with the new environment
# ls(old_e) is a character vector of the object names
if (overwrite) {
# the new object replaces any old object with the same name
invisible(sapply(ls(new_e), function(x)
assign(x, get(x, envir = new_e), envir = old_e)))
# And finally we save the variables in the environment
save(list = ls(old_e), file = rda_file, envir = old_e)
}
else {
# the old variables take precedence over the new ones
invisible(sapply(ls(old_e), function(x)
assign(x, get(x, envir = old_e), envir = new_e)))
# And finally we save the variables in the environment
save(list = ls(new_e), file = rda_file, envir = new_e)
}
}

R Snowfall Environments issues

I am trying to get my head around the Snowfall library and its usage.
Having written a simulation that makes use of environments, I encountered the following issue. If I source a file to load functions within the parallel mode, the function seems to use a different environment than when I declare the function within parallel mode directly.
To make things a little clearer, let's consider the following two scripts:
q_func.R declares the function
foo.bar <- function(x, envname) assign("val", x, envir = get(envname))
# assigns the value x to the variable "val" in the environment envname
q_snowfall.R main function that uses snowfall
library(snowfall)
SnowFunc <- function(envname) {
# load the functions
# Option 1 not working
source("q_func.R")
# Option 2 working...
# foo.bar <- function(x, envname) assign("val", x, envir = get(envname))
# create the new environment
assign(envname, new.env())
# use the function as declared in q_func.R
# to assign random numbers to the new env
foo.bar(x = rnorm(1), envname = envname)
# return the environment including the random values
return(get("val", envir = get(envname)))
}
sfInit(parallel = TRUE, cpus = 2)
# create environment 'a' and 'b' that each will get a new variable
# called 'val' that gets assigned a random value
envs <- c("a", "b")
result <- sfClusterApplyLB(envs, SnowFunc)
sfStop()
If I execute the script "q_snowfall.R" I get the error
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: object 'a' not found
However, if I use the second option (declaring the function within the SnowFunc function), the error disappears.
Do you know how Snowfall handles the different environments? Or do you even have a solution for the issue? (Note that 'q_func.R' actually contains some 100 lines of code, so I would prefer to have it in a separate file; "keep option 2" is therefore not a solution!)
Thank you very much!
Edit
If I change all get(envname) to get(envname, envir = globalenv()) it seems to work. But it seems to me that this is more or less a workaround and not a very snowfall-like solution.
I think the issue is not with snowfall but with the fact that you're passing the environment by name (as a character string). You don't need to change all occurrences of get, and having it look in the global environment may indeed be unsafe.
It is sufficient to change the get call in foo.bar to look in parent.frame() instead (i.e., the environment from which foo.bar was called). The following worked on my machine.
new q_func.R
foo.bar <- function(x, envname) assign("val", x, envir=get(envname,
pos=parent.frame()))
(not so) new q_snowfall.R
library(snowfall)
SnowFunc <- function(envname) {
assign(envname, new.env())
foo.bar(x = rnorm(1), envname = envname)
return(get("val", envir = get(envname)))
}
source("q_func.R")
sfInit(parallel = TRUE, cpus = 2)
sfExport("foo.bar")
envs <- c("a", "b")
result <- sfClusterApplyLB(envs, SnowFunc)
sfStop()
Note also that I source'd before starting the cluster and used sfExport to export foo.bar to each node.
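The lookup itself can be checked without a cluster. The sketch below is a reduced version of the two functions (the name caller and the value 42 are made up here); it shows that foo.bar finds the environment created inside its caller because the search starts in parent.frame().

```r
foo.bar <- function(x, envname) {
  # Resolve the name in the frame that called foo.bar, not in foo.bar's own scope
  target <- get(envname, envir = parent.frame())
  assign("val", x, envir = target)
}

caller <- function(envname) {
  # The new environment lives in caller's frame under the given name
  assign(envname, new.env())
  foo.bar(x = 42, envname = envname)
  get("val", envir = get(envname))
}

value <- caller("a")
```

Because the environment named "a" exists only inside the call to caller, a lookup that started from the global environment would not find it, which matches the "object 'a' not found" error in the question.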
