I am currently working on user-defined functions for modelling empirical data, and I have problems with objects / parameters passed to the function:
bestModel <- function(k=4L, R2=0.994){
print(k) # here, everything is still fine
lmX <- mixlm::lm(getLinearModelFunction(k), data)
best <- mixlm::best.subsets(lmX, nbest=1)
.
.
.
}
At first, everything works as expected, but as soon as I pass the parameter k to another user-defined function, getLinearModelFunction(), an error is thrown:
Error in getLinearModelFunction(k) : object 'k' not found
It doesn't help to assign a new variable, e.g. l <- k, and pass that on instead. The parameter doesn't seem to be available to the other function. I ran into this problem not only with primitive data types but also with complex structures. On the command line, everything works as long as the objects are in my workspace.
To sum up: passing parameters works only within that function, but calls to other functions from there onwards result in an error. Why? And what can I do about it?
EDIT:
While trying to resolve the problem, it gets really weird. I stripped down all functions:
functionA <- function(data, k){
  lmX <- mixlm::lm(functionB(k), data)
  summary(lmX)
  # best <- mixlm::best.subsets(lmX,nbest=1)
}
functionB <- function(k=4){
  if(k==1){
    return(formula("raw ~ L1"))
  }else if(k==2){
    return(formula("raw ~ L1 + L2"))
  }else if(k==3){
    return(formula("raw ~ L1 + L2 + L3 "))
  }else if(k==4){
    return(formula("raw ~ L1 + L2 + L3 + L4"))
  }
}
Let's say we have a data.frame d with the variables raw, L1, L2, L3, L4 ... As long as the line starting with best is commented out with #, it works. As soon as the # is removed, calling functionA(d, 3) results in
Error in functionB(k) : object 'k' not found
This happens even though k doesn't play a role in that function, and before that, it worked.
Ok, indeed, this was an environment thing. The solution is to get the current environment and to take the object from there:
functionA <- function(data, k){
  e <- environment()
  lmX <- mixlm::lm(functionB(e$k), e$data)
  summary(lmX)
  best <- mixlm::best.subsets(lmX,nbest=1)
}
This is usually not a problem when working directly with R packages, because the objects are then usually in the global environment. When working with functions, each function has its own environment. I managed to solve this while starting to learn about packaging the code: http://adv-r.had.co.nz/Environments.html
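The failure can be reproduced without mixlm. The sketch below is only an illustration with hypothetical names (reevaluate, outer_fn): a stored, unevaluated call that mentions k can only be evaluated where k is visible. I am assuming that best.subsets() re-evaluates the model call somewhere that cannot see the local variables of functionA; that matches the symptoms, but I have not checked it against the mixlm source.

```r
# Hypothetical helper: re-evaluates a stored call in a given environment,
# much as some modelling functions re-run the original call later on.
reevaluate <- function(cl, env) eval(cl, envir = env)

outer_fn <- function(k) {
  cl <- quote(k * 2)                        # unevaluated, refers to local k
  elsewhere <- new.env(parent = baseenv())  # cannot see outer_fn's variables
  list(
    elsewhere = tryCatch(reevaluate(cl, elsewhere),
                         error = function(e) conditionMessage(e)),
    local     = reevaluate(cl, environment())  # k is visible here
  )
}

res <- outer_fn(3)
res$elsewhere  # an error message saying that 'k' was not found
res$local      # 6
```

The expression itself is identical in both cases; only the environment in which it is evaluated differs.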
I want to add an environment to a search path and modify the values of variables within that environment, in a limited chunk of code, without having to specify the name of the environment every time I refer to a variable: for example, given the environment
ee <- list2env(list(x=1,y=2))
Now I would like to do stuff like
ee$x <- ee$x+1
ee$y <- ee$y*2
ee$z <- 6
but without prepending ee$ to everything (or using assign("x", ee$x+1, ee) ... etc.): something like
in_environment(ee, {
x <- x+1
y <- y+2
z <- 6
})
Most of the solutions I can think of are explicitly designed not to modify the environment, e.g.
?attach: "The database is not actually attached. Rather, a new environment
is created on the search path ..."
within(): takes lists or data frames (not environments) "... and makes the corresponding modifications to a copy of ‘data’"
There are two problems with <<-: (1) using it will cause NOTEs in CRAN checks (I think? can't find direct evidence of this, but e.g. see here — maybe this only happens because of the appearance of assigning to a locally undefined symbol? I guess I could put this in a package and test it with --as-cran to confirm ...); (2) it will try to assign in the parent environment, which in a package context [which this is] will be locked ...
I suppose I could use a closure as described in section 10.7 of the Introduction to R by doing
clfun <- function() {
x <- 1
y <- 2
function(...) {
x <<- x + 1
y <<- y * 2
}
}
myfun <- clfun()
This seems convoluted (but I guess not too bad?) but:
will still incur problem #1 (CRAN check?).
I think (??) it won't work with variables that don't already exist in the environment (would need an explicit assign() for that ...)
doesn't allow a choice of which environment to operate in - it's necessarily going to work in the enclosing environment, not with arbitrary environment ee
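For what it's worth, here is a quick sketch of how the closure version behaves. It uses the same clfun as above; enc and state are just names I picked for this check.

```r
clfun <- function() {
  x <- 1
  y <- 2
  function(...) {
    x <<- x + 1   # modifies x in the enclosing environment
    y <<- y * 2   # modifies y in the enclosing environment
    invisible(NULL)
  }
}
myfun <- clfun()
myfun()
myfun()

enc <- environment(myfun)         # the enclosing environment created by clfun()
state <- c(x = enc$x, y = enc$y)
state                             # x = 3, y = 8

# A name that does not exist there yet needs an explicit assign()
assign("z", 6, envir = enc)
```

So the state lives in the environment of the one call to clfun(), which is exactly the limitation in the last point: there is no way to point myfun at an arbitrary ee.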
Am I missing something obvious and idiomatic?
Thanks to @Nuclear03020704! I think with() was what I wanted all along; I was incorrectly assuming that it would also create a local copy of the environment, but it only does this if the data argument is not already an environment.
ee <- list2env(list(x=1,y=2))
with(ee, {
x <- x+1
y <- y+2
z <- 6
})
does exactly what I want.
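A quick check of that difference, as a sketch (ll is a hypothetical plain list used only for comparison, and env_values is just a name for the result):

```r
# With an environment, assignments inside with() change it in place
ee <- list2env(list(x = 1, y = 2))
with(ee, {
  x <- x + 1
  y <- y * 2
  z <- 6
})
env_values <- mget(c("x", "y", "z"), envir = ee)
env_values   # x = 2, y = 4, z = 6

# With a list, with() works on a temporary environment built from the list
ll <- list(x = 1, y = 2)
with(ll, {
  x <- x + 1
})
ll$x         # still 1
```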
Just had another idea, which also seems to have some drawbacks: using a big eval clause. Rather than make my question a long laundry list of unsatisfactory solutions, I'll add it here.
myfun <- function() {
eval(quote( {
x <- x+1
y <- y*2
z <- 3
}), envir=ee)
}
This does seem to work, but also seems very weird/mysterious! I hate to think about explaining it to someone who's been using R for less than 10 years ... I suppose I could write an in_environment() based on this, but I'd have to be very careful to capture the expression properly without evaluating it ...
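A minimal sketch of what such an in_environment() might look like, assuming substitute() is enough to capture the expression unevaluated. The function is hypothetical (it is just the name from my wish above), not something that exists in base R.

```r
# Hypothetical helper: evaluate a block of code inside a given environment
in_environment <- function(env, expr) {
  # substitute() returns the caller's expression without evaluating it
  eval(substitute(expr), envir = env)
  invisible(env)
}

ee <- list2env(list(x = 1, y = 2))
in_environment(ee, {
  x <- x + 1
  y <- y * 2
  z <- 3
})
sort(ls(ee))   # "x" "y" "z"
```

One caveat with this sketch: symbols that are not found in env are looked up through the parents of env, not in the frame of whoever called in_environment().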
What about with()? From the documentation:
with(data, expr)
data is the data to use for constructing an environment. For the default with method this may be an environment, a list, a data frame, or an integer.
expr is the expression to evaluate.
with is a generic function that evaluates expr in a local environment constructed from data. The environment has the caller's environment as its parent. This is useful for simplifying calls to modeling functions. (Note: if data is already an environment then this is used with its existing parent.)
Note that assignments within expr take place in the constructed environment and not in the user's workspace.
with() returns value of the evaluated expr.
ee <- list2env(list(x=1,y=2))
with(ee, {
x <- x+1
y <- y+2
z <- 6
})
I would like to use the plotMA function of limma.
The example of the documentation works fine:
A <- runif(1000,4,16)
y <- A + matrix(rnorm(1000*3,sd=0.2),1000,3)
status <- rep(c(0,-1,1),c(950,40,10))
y[,1] <- y[,1] + status
plotMA(y, array=1, status=status, values=c(-1,1), hl.col=c("blue","red"))
Now I would like to access the underlying data that is used for the plot as I would like to use the data in a different context, not just the plot. I currently don't see a way to access the data; of course I could implement the method myself and only use the data, but that feels wrong.
Is there a way to access the underlying data used for the MA plot?
Looking at the code of plotMA we see that several variables are created and used for plotting. These variables are not returned however.
You could now copy and paste the function to write your own version, which plots and returns the data. This is, however, error-prone: if a new version of the function is released, you may still be relying on old code.
So what you can do instead is use trace to insert arbitrary code into plotMA, notably code that stores the data in your global environment. I illustrate the idea with a toy example:
f <- function(x) {
y <- x + rnorm(length(x))
plot(x, y)
invisible()
}
If we would like to use the y computed inside this function, we could do something like this:
trace(f, exit = quote(my_y <<- y))
# [1] "f"
ls()
# [1] "f"
f(1:10)
# Tracing f(1:10) on exit
ls()
# [1] "f" "my_y"
And now we can access my_y.
What you should do:
Look at the code of plotMA
Identify which part of the data you need (e.g. x, y and sel)
Use trace(plotMA, exit = quote({my_data <<- list(x, y, sel)}), where = asNamespace("limma"))
Run plotMA
Access the data via my_data
Note. Check out ?trace to fully understand its possibilities. In particular, if you want to inject your code not at the end (exit) but at another position (maybe because intermediate variables are overwritten and you need the first results), you would need to use the at parameter of trace.
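As a rough sketch of the at variant, again on a toy function rather than plotMA (g, first_y and result are hypothetical names chosen for this illustration):

```r
g <- function(x) {
  y <- x + 1   # step 2 of the body
  y <- y * 2   # step 3: overwrites the intermediate value
  y
}

as.list(body(g))  # shows the step numbers that `at` refers to

# Run the tracer just before step 3, i.e. while y still holds x + 1
trace(g, tracer = quote(first_y <<- y), at = 3, print = FALSE)
result <- g(1)
untrace(g)

first_y  # 2, the intermediate value
result   # 4, the returned value
```

With exit instead of at, the stored value would be the final y. For a real function, inspect as.list(body(...)) first to find the right step number, and remember that the numbering may change between package versions.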
Update
Maybe the easiest is to get a full dump of all local variables defined in the function:
trace("plotMA", exit = quote(var_dump <<- mget(ls())), where = asNamespace("limma"))
Background
I’m in the process of creating a shortcut for lambdas, since the repeated use of function (…) … clutters my code considerably. As a remedy, I’m trying out alternative syntaxes inspired by other languages such as Haskell, as far as this is possible in R. Simplified, my code looks like this:
f <- function (...) {
    args <- match.call(expand.dots = FALSE)$...
    last <- length(args)
    params <- c(args[-last], names(args)[[last]])
    function (...)
        eval(args[[length(args)]],
             envir = setNames(list(...), params),
             enclos = parent.frame())
}
This allows the following code:
f(x = x * 2)(5) # => 10
f(x, y = x + y)(1, 2) # => 3
etc.
Of course the real purpose is to use this with higher-order functions [1]:
Map(f(x = x * 2), 1 : 10)
The problem
Unfortunately, I sometimes have to nest higher-order functions and then it stops working:
f(x = Map(f(y = x + y), 1:2))(10)
yields “Error in eval(expr, envir, enclos): object x not found”. The conceptually equivalent code using function instead of f works. Furthermore, other nesting scenarios also work:
f(x = f(y = x + y)(2))(3) # => 5
I’m suspecting that the culprit is the parent environment of the nested f inside the map: it’s the top-level environment rather than the outer f’s. But I have no idea how to fix this, and it also leaves me puzzled that the second scenario above works. Related questions (such as this one) suggest workarounds which are not applicable in my case.
Clearly I have a gap in my understanding of environments in R. Is what I want possible at all?
[1] Of course this example could simply be written as (1 : 10) * 2. The real application is with more complex objects / operations.
The answer is to attach parent.frame() to the output function's environment:
f <- function (...) {
    args <- match.call(expand.dots = FALSE)$...
    last <- length(args)
    params <- c(args[-last], names(args)[[last]])
    e <- parent.frame()
    function (...)
        eval(args[[length(args)]],
             envir = setNames(list(...), params),
             enclos = e)
}
Hopefully someone can explain why this works and yours does not. Feel free to edit.
Great question.
Why your code fails
Your code fails because eval()'s supplied enclos= argument does not point far enough up the call stack to reach the environment in which you are wanting it to next search for unresolved symbols.
Here is a partial diagram of the call stack, at the bottom of which your call to parent.frame() occurs. (To make sense of this, it's important to keep in mind that the function call from which parent.frame() is being called here is not f(), but a call to the anonymous function returned by f(); let's call it fval.)
## Note: E.F. = "Evaluation Frame"
##       fval = anonymous function returned as value of nested call to f()
f(                       <---------------- ## E.F. you want, ptd to by parent.frame(n=3)
  Map(
      mapply(            <---------------- ## E.F. pointed to by parent.frame(n=1)
             fval(                       |
                  parent.frame(n=1       |
In this particular case, redefining the function returned by f() to call parent.frame(n=3) rather than parent.frame(n=1) produces working code, but that's not a good general solution. For instance, if you wanted to call f(x = mapply(f(y = x + y), 1:2))(10), the call stack would then be one step shorter, and you'd instead need parent.frame(n=2).
Why flodel's code works
flodel's code provides a more robust solution by calling parent.frame() during evaluation of the inner call to f in the nested chain f(Map(f(), ...)) (rather than during the subsequent evaluation of the anonymous function fval returned by f()).
To understand why his parent.frame(n=1) points to the appropriate environment, it's important to recall that in R, supplied arguments are evaluated in the evaluation frame of the calling function. In the OP's example of nested code, the inner f() is evaluated during the processing of Map()'s supplied arguments, so its evaluation environment is that of the function calling Map(). Here, the function calling Map() is the outer call to f(), and its evaluation frame is exactly where you want eval() to next be looking for symbols:
f(                       <---------------- ## Evaluation frame of the nested call to f()
  Map(f(                                 |
        parent.frame(n=1                 |
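The rule about supplied arguments can be checked in isolation. The following is a minimal sketch with hypothetical names (where_called_from, wrapper, caller, found); it is not the OP's code, just a demonstration that an argument expression sees the frame it was written in, even when it is only forced deeper down the call stack.

```r
where_called_from <- function() parent.frame()

wrapper <- function(arg) {
  marker <- "wrapper frame"
  # arg is forced here, but the promise was created in the caller's frame
  get("marker", envir = arg, inherits = FALSE)
}

caller <- function() {
  marker <- "caller frame"
  wrapper(where_called_from())
}

found <- caller()
found  # "caller frame"
```

Although where_called_from() only runs once wrapper() touches arg, its parent.frame() is the frame of caller(), which is the same mechanism that makes e <- parent.frame() in flodel's version pick up the right environment.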
I am trying to write an R function that takes a data set and returns a plotting function with the data set available in its environment. This means you don't have to use attach() anymore, which is good practice. Here's my example:
mydata <- data.frame(a = rnorm(100), b = rnorm(100,0,.2))
plot(mydata$a, mydata$b) # works just fine
scatter_plot <- function(ds) { # function I'm trying to create
ifelse(exists(deparse(quote(ds))),
function(x,y) plot(ds$x, ds$y),
sprintf("The dataset %s does not exist.", ds))
}
scatter_plot(mydata)(a, b) # not working
Here's the error I'm getting:
Error in rep(yes, length.out = length(ans)) :
attempt to replicate an object of type 'closure'
I tried several other versions, but they all give me the same error. What am I doing wrong?
EDIT: I realize the code is not too practical. My goal is to understand functional programming better. I wrote a similar macro in SAS, and I was just trying to write its counterpart in R, but I'm failing. I just picked this as an example. I think it's a pretty simple example and yet it's not working.
There are a few small issues. ifelse is a vectorized function, but you just need a simple if. In fact, you don't really need an else -- you could just throw an error immediately if the data set does not exist. Note that your error message is not using the name of the object, so it will create its own error.
You are passing a and b instead of "a" and "b". Instead of the ds$x syntax, you should use the ds[[x]] syntax when you are programming (fortunes::fortune(312)). If that's the way you want to call the function, then you'll have to deparse those arguments as well. Finally, I think you want deparse(substitute()) instead of deparse(quote()):
scatter_plot <- function(ds) {
  ds.name <- deparse(substitute(ds))
  if (!exists(ds.name))
    stop(sprintf("The dataset %s does not exist.", ds.name))
  function(x, y) {
    x <- deparse(substitute(x))
    y <- deparse(substitute(y))
    plot(ds[[x]], ds[[y]])
  }
}
scatter_plot(mydata)(a, b)
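To see why quote() was the wrong tool here, a small sketch may help (show_names and name_check are hypothetical names used only for this comparison): quote() returns the argument's own symbol, while substitute() returns the expression the caller supplied.

```r
show_names <- function(ds) {
  c(quoted      = deparse(quote(ds)),        # always the literal symbol "ds"
    substituted = deparse(substitute(ds)))   # whatever the caller passed in
}

mydata <- data.frame(a = rnorm(5), b = rnorm(5))
name_check <- show_names(mydata)
name_check
#      quoted substituted
#        "ds"    "mydata"
```

In the original version, exists(deparse(quote(ds))) was therefore checking for an object called "ds" rather than "mydata".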
This is something I find difficult to understand:
cl = makeCluster(rep("localhost", 8), "SOCK")
# This will not work, error: dat not found in the nodes
pmult = function(cl, a, x)
{
mult = function(s) s*x
parLapply(cl, a, mult)
}
scalars = 1:4
dat = rnorm(4)
pmult(cl, scalars, dat)
# This will work
pmult = function(cl, a, x)
{
x
mult = function(s) s*x
parLapply(cl, a, mult)
}
scalars = 1:4
dat = rnorm(4)
pmult(cl, scalars, dat)
# This will work
pmult = function(cl, a, x)
{
mult = function(s, x) s*x
parLapply(cl, a, mult, x)
}
scalars = 1:4
dat = rnorm(4)
pmult(cl, scalars, dat)
The first function doesn't work because of lazy evaluation of arguments. But what is lazy evaluation? When mult() is executed, does it not require x to be evaluated? The second one works because it forces x to be evaluated. Now the strangest thing happens in the third function: nothing is done except making mult() receive x as an extra argument, and suddenly everything works!
Another thing is, what should I do if I don't want to define all the variables and functions inside the function calling parLapply()? The following definitely will not work:
pmult = function(cl)
{
source("a_x_mult.r")
parLapply(cl, a, mult, x)
}
scalars = 1:4
dat = rnorm(4)
pmult(cl, scalars, dat)
I can pass all these variables and functions as arguments:
f1 = function(i)
{
return(rnorm(i))
}
f2 = function(y)
{
return(f1(y)^2)
}
f3 = function(v)
{
return(v- floor(v) + 100)
}
test = function(cl, f1, f2, f3)
{
x = f2(15)
parLapply(cl, x, f3)
}
test(cl, f1, f2, f3)
Or I can use clusterExport(), but it'll be cumbersome when there are lots of objects to be exported. Is there a better way?
To understand this, you have to realize that there is an environment associated with every function, and what that environment is depends on how the function was created. A function that is simply created in a script is associated with the global environment, but a function that is created by another function is associated with the local environment of the creating function. In your example, pmult creates mult, so the environment associated with mult contains the formal arguments cl, a, and x.
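You can inspect this yourself without a cluster. The following is a small sketch with hypothetical names (top_level, maker, inner_fn, inner_names) that mirrors the shape of pmult and mult:

```r
top_level <- function() NULL          # created directly in a script

maker <- function(cl, a, x) {
  inner <- function(s) s * x          # created inside maker()
  inner
}

inner_fn <- maker(cl = NULL, a = 1:4, x = 2)

# The environment of inner_fn is the local environment of that maker() call
inner_names <- sort(ls(environment(inner_fn)))
inner_names   # "a" "cl" "inner" "x"

# It is not the environment that top_level is associated with
identical(environment(inner_fn), environment(top_level))  # FALSE
```

Everything listed in inner_names travels with the function when it is serialized, which is why the formal arguments of the creating function matter.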
The problem with the first case is that parLapply doesn't know anything about x: it is just an unevaluated formal argument that is serialized as part of the environment of mult by parLapply. Since x isn't evaluated when mult is serialized and sent to the cluster workers, it causes an error when the workers execute mult, since dat isn't available in that context. In other words, by the time mult evaluates x, it's too late.
The second case works because x is evaluated before mult is serialized, so the actual value of x is serialized along with the environment of mult. It does what you would expect if you knew about closures but not lazy argument evaluation.
The third case works because you're having parLapply handle x for you. There's no trickery going on at all.
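The timing of argument evaluation can also be seen without a cluster. This is only a sketch with hypothetical names (make_mult_lazy, make_mult_forced); it shows when x gets evaluated, not how parLapply serializes anything.

```r
make_mult_lazy <- function(x) {
  function(s) s * x            # x is still an unevaluated promise here
}

make_mult_forced <- function(x) {
  force(x)                     # evaluate x now, like the bare `x` line in case two
  function(s) s * x
}

dat <- 2
lazy   <- make_mult_lazy(dat)
forced <- make_mult_forced(dat)
dat <- 10                      # change dat before either function is used

lazy_result   <- lazy(1)       # 10: x is evaluated only now, using the new dat
forced_result <- forced(1)     # 2: x was fixed when forced was created
```

On a worker there is no dat at all when the promise is finally evaluated, so instead of picking up a newer value, the lazy version fails.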
I should warn you that in all of these cases, a is being evaluated (by parLapply) and serialized along with the environment of mult. parLapply is also splitting a into chunks and sending those chunks to each worker, so the copy of a in the environment of mult is completely unnecessary. It doesn't cause an error, but it could hurt performance, since mult is sent to the workers in every task object. Fortunately, this is much less of a problem with parLapply, since there is only one task per worker. It would be a much worse problem with clusterApply or clusterApplyLB where the number of tasks is equal to the length of a.
I talk about a number of issues relating to functions and environments in the "snow" chapter of my book. There are some subtle issues involved, and it's easy to get burned, sometimes without realizing that it happened.
As for your second question, there are various strategies for exporting functions to the workers, but some people do use source to define functions on the workers rather than using clusterExport. Keep in mind that source has a local argument that controls where the parsed expressions are evaluated, and you may need to specify the absolute path to the script. Finally, if you're using remote cluster workers, you may need to scp the script to the workers if you don't have a distributed file system.
Here is a simple method of exporting all of the functions in your global environment to the cluster workers:
ex <- Filter(function(x) is.function(get(x, .GlobalEnv)), ls(.GlobalEnv))
clusterExport(cl, ex)