Clearing environment from within script - r

I'd like to clear the environment during a script run.
I am using the (slightly modified) function which I've seen here:
clearEnv <- function(env = parent.frame())
{
  rm(list = setdiff(ls(all.names = TRUE, envir = env),
                    lsf.str(all.names = TRUE, envir = env)),
     envir = env)
}
But this only works outside of a script. I want to call this function at the start of a script in my personal R package, before anything else is saved as a variable.
Is there a way of doing this? Maybe nesting the function somewhere?
Thank you!
EDIT:
Thought it was clear enough - obviously not, so here is a minimal example:
script.R has two functions
clearEnv <- function(env = parent.frame())
{
  rm(list = setdiff(ls(all.names = TRUE, envir = env),
                    lsf.str(all.names = TRUE, envir = env)),
     envir = env)
}
myScript <- function(){
  clearEnv()
  # do work
}
This does not work. I can see that it is not clearing the environment at all, because when I save a variable, say thisShouldVanish <- 0, before running myScript(), it is still there afterwards.
If I run
clearEnv()
myScript()
it works as expected, but I want to run only myScript() instead of separating the two function calls...
I mentioned that these functions are part of a package, so neither these functions nor anything else needs to be in the environment -> it should be okay to delete mid-run.
I hope I was able to explain it better.
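For what it's worth, the likely reason the in-script call has no effect: when clearEnv() is called from inside myScript(), parent.frame() refers to myScript()'s own evaluation frame, not the global environment, so the global variables are never touched. A minimal sketch of one way around this, passing the target environment explicitly (assuming the goal is to wipe the global environment):

```r
clearEnv <- function(env = globalenv()) {
  # remove every non-function object from `env`, keeping functions
  rm(list = setdiff(ls(all.names = TRUE, envir = env),
                    lsf.str(all.names = TRUE, envir = env)),
     envir = env)
}

myScript <- function() {
  clearEnv(globalenv())  # explicit target, so the calling frame is irrelevant
  # do work
}
```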

Related

knitr: Create environment to share variables across chunks which are deleted when finished

I'm developing an R package (https://github.com/rcst/rmaxima) that provides an interface to the computer algebra system Maxima. I want to include a knitr engine so that it can be used directly with knitr. The package has functions to start (spawn) a child process and then send commands to and fetch results from Maxima. This way, it should be possible to carry results over between different chunks.
The interface works by creating a new object of an Rcpp class. Creating the object spawns a child process; deleting the object stops that process.
I want the engine to start a new child process each time the document is knit()-ed, so that the result is reproducible. I'm thinking that I could create an extra environment that binds the interface object. The engine checks whether this object exists in that environment: if it doesn't exist, it will be created; otherwise the engine can proceed directly to send code to the interface. When knit() exits, its environment goes out of scope and all variables within that environment are deleted automatically. This way, I need not stop the child process explicitly, because the interface object gets deleted and the process is stopped automatically.
But I have no clue how to go about it. Any hints very much appreciated.
yihui provides an answer here.
In essence, (a) one keeps a flag in a local environment to track whether the engine is running, and (b) one inspects the list of chunk labels to determine whether the current chunk is the last, triggering deletion and tear-down after it has been processed:
last_label <- function(label = knitr::opts_current$get('label')) {
  if (knitr:::child_mode()) return(FALSE)
  labels <- names(knitr::knit_code$get())
  tail(labels, 1) == label
}
knitr::knit_engines$set(maxima = local({
  mx <- FALSE
  function(options) {
    if (!mx) {
      maxima$startMaxima()
      mx <<- TRUE
    }
    ... # run maxima code
    if (last_label(options$label)) {
      maxima$stopMaxima()
      mx <<- FALSE
    }
  }
}))
For completeness, I also came up with a solution which works, but is less robust.
Objects that go out of scope are automatically removed in R. However, actual deletion happens during R's garbage collection gc(), which cannot be controlled directly. So to get an object removed when knit() finishes, that object needs to be created in the environment of knit(), which is some levels above the engine call.
In principle one can register a function that does the actual clean-up via on.exit() of knit(), whose environment can be retrieved by parent.frame(n = ...). Note that all objects of that scope are still present the moment the expressions registered via on.exit() get called.
maxima.engine <- function(options) {
  e <- parent.frame(n = sys.parent() - 2)
  if (!exists("mx", envir = e)) {
    message("starting maxima on first chunk")
    assign(x = "mx", value = new(rmaxima:::RMaxima), envir = e)
    assign(x = "execute", value = get("mx", envir = e)$execute, envir = e)
    assign(x = "stopMaxima", value = get("mx", envir = e)$stopMaxima, envir = e)
    # quit maxima "on.exit"ing of knit()
    # eval() doesn't work on special primitive functions,
    # do.call() does ... this may break in the future
    # see https://stat.ethz.ch/pipermail/r-devel/2013-November/067874.html
    do.call("on.exit", list(quote(stopMaxima()), add = TRUE), envir = e)
  }
  code <- options$code
  out <- character(0)
  for (i in seq_along(code))
    out <- c(out, eval(call("execute", code[i]), envir = e))
  engine_output(options, code, out)
}

Saving R objects to global environment from inside a nested function called by a parent function using mcmapply

I am trying to write an R script that uses nested functions to save multiple data.frames (in parallel) to the global environment. The code below works fine on Windows, but when I moved it to a Linux server, the objects that prepare_output() saves to the global environment are not captured by the save() operation in get_output().
Am I missing something fundamentally different about how mcmapply affects scoping on Linux vs Windows?
library(data.table)
library(parallel)

# Function definitions
default_case <- function(flag){
  if(flag == 1){
    create_input()
    get_output()
  }else{
    print("select a proper flag!")
  }
}

create_input <- function(){
  dt_initial <<- data.table('col1' = c(1:20), 'col2' = c(21:40)) # Assignment to global envir
}

get_output <- function(){
  list1 <- c(5, 6, 7, 8)
  dt1 <- data.table(dt_initial[1:15, ])
  prepare_output <- function(cnt){
    dt_new <- data.table(dt1)
    dt_new <- dt_new[col1 <= cnt, ]
    assign(paste0('dt_final_', cnt), dt_new, envir = .GlobalEnv)
    # eval(call("<<-", paste0('dt_final_', cnt), dt_new))
    print('contents in global envir inside:')
    print(ls(name = .GlobalEnv)) # This prints all object names dt_final_5 through dt_final_8 correctly
  }
  mcmapply(FUN = prepare_output, list1, mc.cores = globalenv()$numCores)
  print('contents in global envir outside:')
  print(ls(name = .GlobalEnv)) # this does NOT print the data.tables assigned inside prepare_output
  save(list = ls(name = .GlobalEnv)[ls(name = .GlobalEnv) %like% 'dt_final_'], file = 'dt_final.Rdata')
}

if(Sys.info()['sysname'] == "Windows"){numCores <- 1}else{numCores <- parallel::detectCores()}
print('numCores:')
print(numCores)

# Function call
default_case(1)
The reason I am using a nested structure is that the preparation of dt1 is time-consuming, and I do not want to increase the execution time by re-executing it on every iteration of the apply call.
(Sorry, I'll write this as an 'Answer' because the comment box is too brief)
The best solution to your problem is to make sure you return the objects you produce, rather than trying to assign them from inside a function to an external environment [edit 2020-01-26], which never works in parallel processing because parallel workers do not have access to the environments of the main R process.
A very good rule of thumb in R that will help you achieve this: Never use assign() or <<- in code - neither for sequential nor for parallel processing. At best, you can get such code to work in sequential mode but, in general, you will end up with hard to maintain and error-prone code.
By focusing on returning values (y <- mclapply(...) in your example), you'll get it right. It also fits in much better with the overall functional design of R and parallelizes more naturally.
I've got a blog post 'Parallelize a For-Loop by Rewriting it as an Lapply Call' from 2019-01-11 that might help you transition to this functional style.
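To make the return-based style concrete, here is a sketch of how the example above could be rewritten (using a plain data.frame instead of data.table, and mc.cores = 1 so it also runs on Windows; the function shapes are illustrative, not from the original post):

```r
library(parallel)

# Build the filtered tables and *return* them, instead of assign()ing
# them to the global environment from inside the workers.
get_output <- function(counts, dt1) {
  prepare_output <- function(cnt) dt1[dt1$col1 <= cnt, ]
  results <- mcmapply(FUN = prepare_output, counts,
                      SIMPLIFY = FALSE, mc.cores = 1)
  names(results) <- paste0("dt_final_", counts)
  results
}

dt1 <- data.frame(col1 = 1:15, col2 = 21:35)
out <- get_output(c(5, 6, 7, 8), dt1)

# save() all results without ever touching the global environment
save(list = names(out), envir = list2env(out),
     file = tempfile(fileext = ".Rdata"))
```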

How does r clear all global environment quickly [duplicate]

I'm trying to clear my R workspace. Nothing I've found in any thread seems to work - and I've been googling and trying solutions for hours now :(
When I open R and type ls, the console displays all the code from a previous session:
function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
    pattern)
{
    if (!missing(name)) {
        nameValue <- try(name, silent = TRUE)
        if (identical(class(nameValue), "try-error")) {
            name <- substitute(name)
            if (!is.character(name))
                name <- deparse(name)
            warning(gettextf("%s converted to character string",
                sQuote(name)), domain = NA)
            pos <- name
        }
        else pos <- nameValue
    }
    all.names <- .Internal(ls(envir, all.names))
    if (!missing(pattern)) {
        if ((ll <- length(grep("[", pattern, fixed = TRUE))) &&
            ll != length(grep("]", pattern, fixed = TRUE))) {
            if (pattern == "[") {
                pattern <- "\\["
                warning("replaced regular expression pattern '[' by '\\\\['")
            }
            else if (length(grep("[^\\\\]\\[<-", pattern))) {
                pattern <- sub("\\[<-", "\\\\\\[<-", pattern)
                warning("replaced '[<-' by '\\\\[<-' in regular expression pattern")
            }
        }
        grep(pattern, all.names, value = TRUE)
    }
    else all.names
}
<bytecode: 0x2974f38>
<environment: namespace:base>
If I type rm(list=ls()) and then type ls again, I get the exact same response - i.e., the code from the previous session hasn't been removed.
By the way, I'm typing ls without the parentheses. Typing ls() with parentheses returns character(0).
I've also tried clearing the environment via RStudio, and even deleting the ~/.Rdata file. Nothing will clear this workspace. Every time I restart R and type ls, all the old code is still there.
I've already tried the tips in this thread, and they don't work for me.
Any idea why this might be happening? Thanks!
What you are seeing is the source code for the ls function. When you enter a function name without the parentheses, you'll see the complete source code for that function (provided that function is in one of the packages attached to the search path, or in the global environment).
When you see character(0) as the result of calling ls(), that means that there are no objects in the global environment. The base package, where ls calls home, is different from the global environment, and objects there cannot be removed.
When character(0) is the result of ls() after you call rm(list=ls()), you have successfully cleared the objects in the global environment.
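A quick way to see the difference between the two spellings in a fresh session:

```r
# Typing the bare name yields the function object itself (which the
# console then prints as source code); calling it lists objects.
is.function(ls)          # TRUE: `ls` is a function living in base
ls(envir = new.env())    # character(0): a brand-new environment is empty
```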

Function to remove all variables

From remove all variables except functions, I got the command to remove all variables without removing functions. I don't want to type it in every time, so I tried to turn it into a function defined in ~/.Rprofile. I'm new to R, but I've read about the environment frame scheme and have a shaky understanding of it. The following attempt doesn't seem to erase a time series object defined in the main environment (at the command-line prompt when I first start R):
# In ~/.Rprofile
# In ~/.Rprofile
clVar <- function()
{
  rm(
    list = setdiff(ls(all.names = TRUE), lsf.str(all.names = TRUE)),
    envir = parent.frame()
  )
}
The following code shows that it doesn't work:
(x <- ts(1:100, frequency = 12))
clVar()
ls()
Thanks for any help in fixing the environment framing.
You need to pass the parent.frame() environment to ls, not just to rm. Otherwise ls won't find the variables to remove.
clVar <- function()
{
  env <- parent.frame()
  rm(
    list = setdiff(ls(all.names = TRUE, envir = env),
                   lsf.str(all.names = TRUE, envir = env)),
    envir = env
  )
}
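A quick self-contained check (repeating the corrected definition so this snippet runs on its own, and using local() to stand in for the calling environment):

```r
clVar <- function()
{
  env <- parent.frame()
  rm(list = setdiff(ls(all.names = TRUE, envir = env),
                    lsf.str(all.names = TRUE, envir = env)),
     envir = env)
}

local({
  x <- ts(1:100, frequency = 12)   # a plain variable ...
  f <- function() 1                # ... and a function
  clVar()
  exists("x", inherits = FALSE)    # FALSE: the variable is gone
  exists("f", inherits = FALSE)    # TRUE: functions survive
})
```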

What's similar to an #ifdef DEBUG in R?

I'm writing R code where I would like to be able to run it in either "non-debug" or "debug" mode. In debug mode, I would like the code to print out runtime information.
In other languages, I would typically have some sort of print function that does nothing unless a flag is turned on (either for compilation or runtime).
For example, I can use #ifdef DEBUG (in compilation time), or set a debug level in run time.
What would be the equivalent way of doing this in R?
Same thing, minus the preprocessor:
Define a global variable (or use an options() value)
Insert conditional code that tests for the variable
Works with your functions (adding ..., verbose = getOption("myVerbose")), in your packages, etc pp
I have also used it in R scripts (driven by littler) using the CRAN package getopt to pick up a command-line option --verbose or --debug.
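A minimal sketch of the options()-based variant described above (the option name myVerbose is illustrative):

```r
# Set once, e.g. in ~/.Rprofile or at the top of a script
options(myVerbose = FALSE)

my_fun <- function(x, verbose = getOption("myVerbose", default = FALSE)) {
  if (isTRUE(verbose)) message("my_fun called with x = ", x)
  x * 2
}

my_fun(3)                  # silent
my_fun(3, verbose = TRUE)  # prints the diagnostic, same result
```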
A slightly fancier version of Dirk's answer:
is_debug_mode <- function()
{
  exists(".DEBUG", envir = globalenv()) &&
    get(".DEBUG", envir = globalenv())
}

set_debug_mode <- function(on = FALSE)
{
  old_value <- is_debug_mode()
  .DEBUG <<- on
  invisible(old_value)
}
Usage is, e.g.,
if(is_debug_mode())
{
  # do some logging or whatever
}
and
set_debug_mode(TRUE)  # turn debug mode on
set_debug_mode(FALSE) # turn debug mode off
It might also be worth looking at the Verbose class in the R.utils package, which allows you very fine control for printing run-time information of various sorts.
Extending Richie's code:
also, you can check for the system environment variable DEBUG for initialization:
isdbg <- function()
{
  if(exists(".DEBUG", envir = globalenv()))
  {
    return(get(".DEBUG", envir = globalenv()))
  } else # initialise from shell environment variable
  {
    debugmode <- !(toupper(Sys.getenv("DEBUG")) %in% c("", "FALSE", "0"))
    assign(".DEBUG", debugmode, envir = globalenv())
    return(debugmode)
  }
}

setdbg <- function(on = FALSE)
{
  old_value <- isdbg()
  .DEBUG <<- on
  invisible(old_value)
}

ifdbg <- function(x)
{
  if(isdbg()) x
}
usage:
setdbg(TRUE)  # turn debug mode on
setdbg(FALSE) # turn debug mode off
if(isdbg())
{
  # do some logging or whatever
}
or
ifdbg(...do something here...)
Another possibility is log4r
To quote this page:
log4r: A simple logging system for R, based on log4j
log4r provides an object-oriented logging system that uses an API roughly equivalent to log4j and its related variants.
