data.table's tables() function runs some of my .Rprofile functions - r

In my .Rprofile I have the following two lines defined in my .First:
makeActiveBinding(".refresh", function() { system("R"); q("no") }, .GlobalEnv)
makeActiveBinding('.rm', function() {rm(list=ls(envir = .GlobalEnv),envir=.GlobalEnv); gc()}, .GlobalEnv)
They're usually harmless, unless I type them by accident! The first creates a .refresh binding that quits and restarts the R session. The second creates a .rm binding that empties the global environment. However, calling the tables() function from data.table triggers both bindings, which isn't exactly desirable.
For the moment, I've removed them from my .First but I'm curious if there is a way to avoid this. The offending lines in the tables() function are:
tt = objects(envir = env, all.names = TRUE)
ss = which(as.logical(sapply(tt, function(x) is.data.table(get(x,
envir = env)))))

I think you just discovered a downside to using active bindings in that way. Why don't you instead create ordinary functions .rm and .refresh, that you call in the usual way (i.e. .rm() and .refresh()), and which won't be executed upon simple inspection?
Here's what part of your .First might then look like:
.First <- function() {
assign(".rm",
function() {rm(list=ls(envir=.GlobalEnv), envir=.GlobalEnv)},
pos = .GlobalEnv)
}
## Try it out
j <- 1:10
ls()
.First()
.rm()
ls()
Edit, with solution:
On further thought, this seems to work: it executes the core bits only when .rm is 'called' directly. It works by inspecting the length of the call stack and running rm(...) only if there is just one call in it (representing the current call to .rm). If .rm is touched by a call to some other function (e.g. tables()), the call stack will be longer, and rm() won't be executed:
makeActiveBinding('.rm',
function() {
if(length(sys.calls())==1) {
rm(list=ls(envir = .GlobalEnv),envir=.GlobalEnv); gc()
}
},
.GlobalEnv)
## Try _it_ out
library(data.table)
j <- 100
.rm
ls()
j <- 100
tables()
ls()
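To see why tables() fires the bindings in the first place, here is a minimal sketch. The environment demo_env, the binding probe and the counter hits are hypothetical names used only for illustration. It suggests that merely listing names leaves an active binding alone, while get() (which is what the sapply() loop in tables() calls) evaluates it:

```r
# Hypothetical demo environment and hit counter; not part of data.table.
demo_env <- new.env()
hits <- 0

# The binding's function runs every time the value of 'probe' is requested.
makeActiveBinding("probe", function() {
  hits <<- hits + 1
  "value"
}, demo_env)

# Listing the names does not request the value ...
objects(envir = demo_env, all.names = TRUE)
hits_after_listing <- hits

# ... but get() does, so the binding's function is executed.
invisible(get("probe", envir = demo_env))
hits_after_get <- hits
```

So any helper that walks an environment with get() will run such bindings as a side effect.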

Related

R pass argument when sourcing another file under foreach [duplicate]

Here is a toy example to illustrate my problem.
library(foreach)
library(doMC)
registerDoMC(cores=2)
foreach(i = 1:2) %dopar%{
i + 2
}
[[1]]
[1] 3
[[2]]
[1] 4
So far so good...
But if the code i + 2 is saved in the file addition.R and I run that file using source(), then
> foreach(i = 1:2) %dopar%{
+ source("addition.R")
+ }
Error in { : task 1 failed - "object 'i' not found"
I cannot fully reproduce your toy example, but I had a similar problem, which I was able to solve with:
source(file, local = TRUE)
which should evaluate the sourced code in the local environment, i.e. recognizing i.
The comment by NiceE and the answer by Sosel already address this: calling source(file) defaults to source(file, local = FALSE), which means that the code in the sourced file is evaluated in the global environment (the "user's workspace"), cf. ?source. Note that there is no variable i in the global environment. The solution is to make sure the file is sourced in the environment that calls it, i.e. to use source(file, local = TRUE).
Solution:
library("foreach")
y <- foreach(i = 1:2) %dopar% {
i + 2
}
str(y)
doMC::registerDoMC(cores = 2L)
y <- foreach(i = 1:2) %dopar% {
source("addition.R", local = TRUE)
}
str(y)
Example of the same problem with a for() loop:
The fact that source() evaluates in the global environment, which differs from the calling environment where i lives, can also be illustrated with a regular for loop by running the loop in an environment other than the global one, e.g. inside a function or with local():
local({
for(i in 1:2) {
source("addition.R")
}
})
which gives:
Error in eval(ei, envir) : object 'i' not found
Now, the reason why the above foreach(i = 1:2) %dopar% { source("addition.R") } works with registerDoSEQ() if and only if it is called from the global environment is that the foreach iteration is then evaluated in the calling environment, which is the global environment, which is also the environment that source() uses. However, if one uses local(foreach(i = 1:2) %dopar% { ... }), this also fails, analogously to the above local(for(i in 1:2) { ... }) call.
In conclusion: nothing magic happens, but to understand it is a bit tedious.
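A minimal sketch of that difference without any parallel backend, assuming a temporary one-line script and a hypothetical variable idx_demo that does not exist in the global environment:

```r
# Write a hypothetical one-line script to a temporary file.
script <- tempfile(fileext = ".R")
writeLines("idx_demo + 2", script)

# local = TRUE: the script is evaluated in the calling function's frame,
# where idx_demo is defined.
run_local <- function() {
  idx_demo <- 1
  source(script, local = TRUE)$value
}

# Default (local = FALSE): the script is evaluated in the global environment,
# so the function's idx_demo is invisible; the error message is captured.
run_global <- function() {
  idx_demo <- 1
  tryCatch(source(script)$value, error = function(e) conditionMessage(e))
}

res_local  <- run_local()
res_global <- run_global()
```

Here res_local holds the computed value, while res_global holds the "object not found" error message, provided no idx_demo exists globally.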
I finally solved the problem by converting the contents of addition.R into a function and simply passing the variables into it. I don't know why, but the suggested solutions based on source(file, local = TRUE) did not work for me.
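A minimal sketch of that function-based approach. The name add_two is hypothetical, and lapply() stands in for foreach(...) %dopar% only so that the snippet runs without extra packages:

```r
# Hypothetical replacement for addition.R: the script body becomes a function
# that receives the loop variable as an explicit argument.
add_two <- function(i) {
  i + 2
}

# With foreach this would read: foreach(i = 1:2) %dopar% add_two(i)
# lapply() is used here as a dependency-free stand-in.
results <- lapply(1:2, add_two)
str(results)
```

Because i is passed explicitly, it no longer matters in which environment the code is evaluated.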

Scoping -- Counter function works normally but not in custom package

I've got a counter function that I like to wrap around another function ("fun") to help keep track of how many times I've called it. I keep track of the calls by creating a new environment "counter.env" if it doesn't already exist and storing the count there.
counter <- function(fun) {
if (!exists("counter.env", envir = .GlobalEnv)) {
counter.env <<- new.env(parent = globalenv())
assign("i", 0, envir = counter.env)
}
function(...) {
local(i <- i+1, env = counter.env)
fun(...)
}
}
I also have a function "get_calls", which simply gets the count from the environment. I'd like it to return 0 in case the user calls it before the actual function they're counting, for whatever reason they'd do this.
get_calls <- function() {
if (!exists("counter.env", envir = .GlobalEnv)) {
counter.env <<- new.env(parent = .GlobalEnv)
assign("i", 0, envir = counter.env)
}
get("i", envir = counter.env)
}
Finally, let's say the function I'm wrapping is a function with its own argument, "fun(arg1)". So I wrap it.
count.and.call <- counter(fun)
And I call it like this:
count.and.call(arg1)
Immediately "counter.env" is created in my global environment and I can return the number of calls with get_calls.
Now, drum roll: when I put these functions in a package, build the package, and run
count.and.call(arg1)
counter.env is not created in the global environment, and I get
error in eval(quote(i <- i + 1), counter.env) :
object 'counter.env' not found
My immediate concern is to fix my counter, which is probably something to do with the environment scoping.
However I am also not sure if I have used the best practices for my counter function, if so, could I get some advice?
The best practice is that your package should not meddle with the global environment. If you want to store state, create an environment for it in your package's namespace. You don't even have to specify the location yourself, it happens automatically by default.
In a source file:
counter.env <- new.env()
# this gets run every time your package is loaded
.onLoad <- function(libname, pkgname)
{
counter.env$i <- 0
}
counter <- function(fun)
{
# do stuff...
counter.env$i <- counter.env$i + 1
}
reset_counter <- function()
{
counter.env$i <- 0
}
# necessary if you want the user to see the counter and you don't export counter.env
get_counter <- function()
{
counter.env$i
}
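For completeness, a minimal sketch of how the wrapper from the question might look on top of such an environment. It assumes counter.env lives at the top level of the package source (here simply the current environment), and counted_sqrt is a hypothetical example function:

```r
# State lives in an environment owned by the package, not in .GlobalEnv.
counter.env <- new.env()
counter.env$i <- 0

# Wrap 'fun' so that every call increments the shared counter.
counter <- function(fun) {
  force(fun)  # evaluate the argument now rather than at first call
  function(...) {
    counter.env$i <- counter.env$i + 1
    fun(...)
  }
}

# Accessor, so that counter.env itself need not be exported.
get_calls <- function() {
  counter.env$i
}

counted_sqrt <- counter(sqrt)
counted_sqrt(4)
counted_sqrt(9)
get_calls()
```

Since the counter is initialised when the environment is created, get_calls() can be called before any wrapped function without special-casing.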
Another very R-ish way to do this is using closures. For example:
countingFun <- function(fun) {
count <- 0
function(x) {
count <<- count + 1
fun(x)
}
}
count <- function(fun) {
environment(fun)$count
}
This keeps the count in the environment of the function, which is created automatically, containing all the variables that are local to the call to countingFun. Then you can do
myMean <- countingFun(mean)
mySd <- countingFun(sd)
myMean(x)
mySd(x)
myMean(x)
count(myMean) # 2
count(mySd) # 1
You might want to add some error checking to count, to make sure it isn't being called on a function that isn't being counted.
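One possible form of that check, as a sketch: it assumes that a counted function can be recognised by a numeric variable named count in its own enclosing environment, which is a heuristic rather than a guarantee.

```r
countingFun <- function(fun) {
  count <- 0
  function(x) {
    count <<- count + 1
    fun(x)
  }
}

# Return the call count, or fail if 'fun' does not look like a counted function.
count <- function(fun) {
  env <- environment(fun)  # NULL for primitive functions such as sum
  if (is.null(env) ||
      !exists("count", envir = env, mode = "numeric", inherits = FALSE)) {
    stop("not a function created by countingFun()")
  }
  get("count", envir = env, mode = "numeric", inherits = FALSE)
}

myMean <- countingFun(mean)
myMean(1:3)
count(myMean)
```

Calling count(mean) on the unwrapped function now raises an error instead of silently returning NULL.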

Restrict which functions can modify an object

I have a variable in my global environment called myList. I have a function that modifies myList and re-assigns it to the global environment called myFunction. I only want myList to be modified by myFunction. Is there a way to prevent any other function from modifying myList?
For background, I am building a general tool for R users. I don't want users of the tool to be able to define their own function to modify myList. I also don't want to be able to modify myList myself with a function I may write in the future.
I have a potential solution, but I don't like it. When the tool is executed, I could examine the text of every function defined by a user and search for the text that will assign myList to the global environment. I don't like the fact that I need to search over all functions.
Does anyone know if what I am looking for is implementable in R? Thanks for any help that can be provided.
For a reproducible example, I need code that will make the following possible:
assign('myList', list(), envir = globalenv())
myFunction <- function() {
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
}
userFunction <- function() {
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
}
myFunction() # I need some code that will allow this function to run successfully
userFunction() # and cause an error when this function runs
Sounds like you need the modules package.
Basically, each unit of code has its own scope.
e.g.
# install.packages("modules")
# Load library
library("modules")
# Create a basic module
m <- module({
.myList <- list()
myFunction <- function() {
.myList <<- c(.myList, 'test')
}
get <- function() .myList
})
# Accessor
m$get()
# list()
# Your function
m$myFunction()
# Modification
m$get()
# [[1]]
# [1] "test"
Note that we tweaked the example slightly by changing the variable name from myList to .myList, so we'll need to update that in userFunction():
userFunction <- function() {
.myList <- c(.myList, 'test')
}
Running this, we now get:
userFunction()
# Error in userFunction() : object '.myList' not found
As desired.
For more detailed examples see modules vignette.
The alternative is you can define an environment (new.env()) and then lock it after you have loaded myList.
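A minimal sketch of that alternative, assuming a hypothetical environment called store that holds myList. As with any locking in R, a determined user can still unlock the binding themselves:

```r
# Hypothetical container environment for the protected object.
store <- new.env()
store$myList <- list()

# Lock the environment and all of its bindings.
lockEnvironment(store, bindings = TRUE)

# The sanctioned modifier unlocks the binding, changes it, and re-locks it.
myFunction <- function() {
  unlockBinding("myList", store)
  on.exit(lockBinding("myList", store))
  store$myList <- c(store$myList, "test")
  invisible(NULL)
}

# Any other function that tries a plain assignment hits the locked binding.
userFunction <- function() {
  store$myList <- c(store$myList, "test")
}

myFunction()
res <- tryCatch(userFunction(), error = function(e) conditionMessage(e))
```

After this, store$myList has one element and res holds the error message raised by the locked binding.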
This is a bad idea all around, from assigning into the global environment (I'd never use a package that does this) to surprising your users. You should probably just use S4 or reference classes.
Anyway, you can lock the bindings (or the environment, if you followed better practices). That wouldn't stop an advanced user, but they would at least know that you don't want them to change the object.
createLocked <- function(x, name, env) {
assign(name, x, envir = env)
lockBinding(name, env)
invisible(NULL)
}
createLocked(list(), "myList", globalenv())
myFunction <- function() {
unlockBinding("myList", globalenv())
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
lockBinding("myList", globalenv())
invisible(NULL)
}
userFunction <- function() {
myList <- c(myList, 'test')
assign('myList', myList, envir = globalenv())
}
myFunction() # runs successfully
userFunction()
#Error in assign("myList", myList, envir = globalenv()) :
# cannot change value of locked binding for 'myList'

Convenience function for exporting objects to the global environment

UPDATE: I have added a variant of Roland's implementation to the kimisc package.
Is there a convenience function for exporting objects to the global environment, which can be called from a function to make objects available globally?
I'm looking for something like
export(obj.a, obj.b)
which would behave like
assign("obj.a", obj.a, .GlobalEnv)
assign("obj.b", obj.b, .GlobalEnv)
Rationale
I am aware of <<- and assign. I need this to refactor oldish code which is simply a concatenation of scripts:
input("script1.R")
input("script2.R")
input("script3.R")
script2.R uses results from script1.R, and script3.R potentially uses results from both 1 and 2. This creates a heavily polluted namespace, and I wanted to change each script
pollute <- the(namespace)
useful <- result
to
(function() {
pollute <- the(namespace)
useful <- result
export(useful)
})()
as a first cheap countermeasure.
Simply write a wrapper:
myexport <- function(...) {
arg.list <- list(...)
names <- all.names(match.call())[-1]
for (i in seq_along(names)) assign(names[i],arg.list[[i]],.GlobalEnv)
}
fun <- function(a) {
ttt <- a+1
ttt2 <- a+2
myexport(ttt,ttt2)
return(a)
}
print(ttt)
#object not found error
fun(2)
#[1] 2
print(ttt)
#[1] 3
print(ttt2)
#[1] 4
Not tested thoroughly and not sure how "safe" that is.
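A slightly more defensive variant, as a sketch: it assumes that only bare variable names should be accepted, and it adds a hypothetical envir argument (defaulting to the global environment) so that the target can be swapped out, e.g. for testing.

```r
# Export the named objects from the caller into 'envir'.
export <- function(..., envir = globalenv()) {
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated arguments
  if (!all(vapply(exprs, is.name, logical(1)))) {
    stop("export() expects bare variable names")
  }
  values <- list(...)
  for (k in seq_along(exprs)) {
    assign(as.character(exprs[[k]]), values[[k]], envir = envir)
  }
  invisible(NULL)
}

# Usage with a hypothetical target environment instead of .GlobalEnv.
target <- new.env()
fun <- function(a) {
  ttt <- a + 1
  ttt2 <- a + 2
  export(ttt, ttt2, envir = target)
  a
}
fun(2)
ls(target)
```

Rejecting anything that is not a plain name avoids surprises such as export(a + 1), where there is no sensible name to assign to.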
You can store an environment in a variable and use it within your export function. For example:
env <- .GlobalEnv ## better to create a new one here: new.env()
exportx <- function(x)
{
x <- x+1
env$y <- x
}
exportx(3)
y
[1] 4
For example, if you want to define global options (emulating the classic R options) in your package:
my.options <- new.env()
setOption1 <- function(value) my.options$Option1 <- value
EDIT after OP clarification:
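You can use evalq, which evaluates an expression in a given environment. It takes these arguments:
envir: the environment in which expr is to be evaluated.
enclos: where R looks for objects not found in envir. Note that enclos is only used when envir is a list or data frame; when envir is an environment, lookup continues in that environment's parent, so set the parent when you create it.
Here is an example:
env.script1 <- new.env()
env.script2 <- new.env(parent = env.script1)  # lets script2 see script1's results
evalq({
x <- 2
p <- 3
z <- 5
}, envir = env.script1)
evalq({
h <- x + 2
}, envir = env.script2)
You can see that all variables are created within the environments (like local):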
env.script2$h
[1] 4
env.script1$p
[1] 3
> env.script1$x
[1] 2
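The same separation can be had for whole script files by passing an environment to source(). The sketch below uses temporary files as hypothetical stand-ins for script1.R and script2.R, and makes the first environment the parent of the second so that later scripts can read earlier results:

```r
# Hypothetical stand-ins for script1.R and script2.R.
script1 <- tempfile(fileext = ".R")
script2 <- tempfile(fileext = ".R")
writeLines(c("x <- 2", "p <- 3"), script1)
writeLines("h <- x + 2", script2)

# env2 inherits from env1, so script2 can read script1's variables.
env1 <- new.env()
env2 <- new.env(parent = env1)

source(script1, local = env1)
source(script2, local = env2)

env2$h
ls(env1)
```

Nothing is written to the global environment; each script's variables stay in its own environment.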
First, given your use case, I don't see how an export function is any better than using good (?) old-fashioned <<-. You could just do
(function() {
pollute <- the(namespace)
useful <<- result
})()
which will give the same result as what's in your example.
Second, rather than anonymous functions, it seems better form to use local, which allows you to run involved computations without littering your workspace with various temporary objects.
local({
pollute <- the(namespace)
useful <<- result
})
ETA: If it's important for whatever reason to avoid modifying an existing variable called useful, put an exists check in there. The same applies to the other solutions presented.
local({
.....
useful <- result
if(!exists("useful", globalenv())) useful <<- useful
})
