Debug a function when `safely` from purrr has been used?

I have a function that has been saved as an Rds object and I'm wondering if there is any possibility after reading the function to debug() it or to see the code inside the function?
Example
library(purrr)
some_function <- function(x){
  avg <- mean(x)
  std <- sd(x)
  return(c(avg, std))
}
safe_function <- safely(some_function)
saveRDS(safe_function, 'safe_function.rds')
rm(safe_function)
# How can I debug the function or make changes to it after I've loaded it?
safe_function <- readRDS('safe_function.rds')

Here's one way to do it:
Execute debug(safe_function) in the console and then call your function, say, safe_function(c(1, 2)).
At this point you will be in debug mode.
In the browser, execute debugonce(.f) and then step forward with 'next' or 'continue' (alternatively, type n or c in the console).
You will now be inside the body of your some_function and will be able to see its code.
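To make those steps concrete, here is a minimal sketch of the same workflow (the Browse[2]> lines are typed at the debugger prompt, not run as a script):
debug(safe_function)       # flag the safely() wrapper for debugging
safe_function(c(1, 2))     # this call now drops into the browser inside the wrapper
# Browse[2]> debugonce(.f)   # flag the wrapped function from inside the browser
# Browse[2]> c               # continue; execution stops again inside some_function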

The short answer is that it's easy to extract the underlying function, with a single line of code:
extracted_function <- environment(safe_function)$.f
extracted_function
#> function(x){
#>   avg <- mean(x)
#>   std <- sd(x)
#>   return(c(avg, std))
#> }
#> <bytecode: 0x1951c3e0>
You can debug this function however you like. Then, if you want to modify it but keep it within its safely wrapper, you can overwrite it like this:
debugged_function <- function(x) c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
environment(safe_function)$.f <- debugged_function
safe_function(1:10)
#> $result
#> [1] 5.50000 3.02765
#>
#> $error
#> NULL
I'll give a quick explanation of why this works:
If you examine your safe_function, you will notice a couple of unusual things about it. Although you have loaded it into the global environment, it is actually wrapped in its own unnamed environment (in this case <environment: 0x095704c0>), which is integral to the way that it works:
safe_function
#> function (...)
#> capture_error(.f(...), otherwise, quiet)
#> <bytecode: 0x0956fd50>
#> <environment: 0x095704c0>
The other odd thing you'll notice is that safe_function calls the function .f, which is neither a built-in function, nor a function exported from purrr. That's because .f is a copy of your original function that is kept inside this special environment.
We can look at the complete contents of the unnamed environment by doing:
ls(environment(safe_function), all.names = TRUE)
#> [1] ".f" "otherwise" "quiet"
Now, if you look at what .f is, you will find it is just a copy of your original some_function:
#> function(x){
#>   avg <- mean(x)
#>   std <- sd(x)
#>   return(c(avg, std))
#> }
#> <bytecode: 0x1951c3e0>
So this is where your wrapped function is "hiding". It remains accessible as a member of this unnamed environment, which can be accessed via environment(safe_function), so it is easy to modify if desired.
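As a small addition (my own illustration, not part of the original answer): you can also debug the wrapped function in place without replacing it, by flagging .f directly:
debug(environment(safe_function)$.f)
safe_function(c(1, 2))    # drops into the browser inside some_function
undebug(environment(safe_function)$.f)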

Related

Passing arguments to furrr::future_map using ellipsis (...)

I am trying to use furrr::future_pmap in R to replace purrr::pmap in a function call within another function.
Presently I have it set up so that pmap passes other arguments using the ellipsis (...); however, when I try to do this using future_pmap I get unused argument errors (see example below). I know from the comments in "passing ellipsis arguments to map function purrr package, R" and other previous research that for the ellipsis to work with pmap you need to use function(x,y,z) blah(x,y,z,...) instead of ~blah(..1,..2,..3), but the same approach doesn't seem to work for future_pmap. Is there some other secret to making this work?
I've created a very simple reprex; obviously my real functions make a lot more sense to run in future_pmap.
library(purrr)
library(furrr)
#> Loading required package: future
plan(multiprocess)
xd <- list(1, 10, 100)
yd <- list(1, 2, 3)
zd <- list(5, 50, 500)
sumfun <- function(indata, otherdata){
  out <- sum(c(indata, otherdata))
  return(out)
}
test_fun_pmap_tilde <- function(ind, ...){
  return( pmap(ind, ~sumfun(c(..1,..2,..3), ...)))
}
test_fun_pmap <- function(ind, ...){
  return( pmap(ind, function(x,y,z) sumfun(c(x,y,z), ...)))
}
test_fun_future_pmap <- function(ind, ...){
  return( future_pmap(ind, function(x,y,z) sumfun(c(x,y,z), ...)))
}
#doesn't work as need to use function(x,y,z) instead of tildes
test_fun_pmap_tilde(list(xd, yd, zd), otherdata = c(100,1000))
#> Error in sumfun(c(..1, ..2, ..3), ...): unused arguments (.l[[2]][[i]], .l[[3]][[i]])
#this one works
test_fun_pmap(list(xd, yd, zd), otherdata = c(100,1000))
#> [[1]]
#> [1] 1107
#>
#> [[2]]
#> [1] 1162
#>
#> [[3]]
#> [1] 1703
#but using future_pmap it doesn't work
test_fun_future_pmap(list(xd, yd, zd), otherdata = c(100,1000))
#> Error in (function (x, y, z) : unused argument (otherdata = c(100, 1000))
Created on 2020-08-31 by the reprex package (v0.3.0)
Okay, I have found a way to make it work. Apparently I need three sets of ellipses instead of just one.
test_fun_future_pmap <- function(ind, ...){
  return( future_pmap(ind, function(x,y,z,...) sumfun(c(x,y,z), ...), ...))
}
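For illustration (assuming the objects from the reprex above are still in scope), the corrected wrapper forwards otherdata through future_pmap's own dots to each worker, so the call should now reproduce the pmap results:
test_fun_future_pmap(list(xd, yd, zd), otherdata = c(100,1000))
#> [[1]]
#> [1] 1107
#>
#> [[2]]
#> [1] 1162
#>
#> [[3]]
#> [1] 1703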

Getting the name of the object passed to `print` when calling the object directly (not calling the `print` function explicitly)

I'm trying to define the print method for my new object and use the name of the object passed to print using deparse(substitute(y)). This works perfectly using the print function explicitly:
obj <- structure(list(x = 1),
                 class = "new_obj")
print.new_obj <- function(y){
  cat("New object name:\n")
  print(deparse(substitute(y)))
}
print(obj)
# New object name:
# [1] "obj"
But when the object is called by name on its own, the resulting print call doesn't detect the name:
obj
# New object name:
# [1] "x"
Is there a standard way to change the behaviour of the implicit call to print when passing an object name on its own?
EDIT: I have changed the function argument to y to represent the object being passed, to demonstrate that "x" is returned no matter what in the second call.
It is easier to explain what's going on than it is to fix it. If we start by looking at the generic print we can see it simply dispatches the class-appropriate print method via UseMethod("print"):
print
#> function (x, ...)
#> UseMethod("print")
So when you call print(obj), you are calling the generic function print(obj) first, which then calls print.new_obj(obj). We can confirm this by adding print(sys.calls()) to your print method:
print.new_obj <- function(y){
  print(sys.calls())
  cat("New object name:\n")
  cat(deparse(substitute(y)))
}
print(obj)
#> [[1]]
#> print(obj)
#>
#> [[2]]
#> print.new_obj(obj)
#>
#> New object name:
#> obj
So far, so good, and I suspect you already knew all this.
What happens now, when you just type obj into the console?
obj
#> [[1]]
#> (function (x, ...)
#> UseMethod("print"))(x)
#>
#> [[2]]
#> print.new_obj(x)
#>
#> New object name:
#> x
Now we can see where the x comes from. It is taken from a behind-the-scenes call to the generic print, which is actually invoked as an unnamed function, so the name of the variable is not included in the call stack. There are other questions on SO that say this makes the problem insoluble. That isn't true; it just means you will need to look outside of the call stack for your object:
print.new_obj <- function(y){
  obj_name <- deparse(substitute(x, parent.frame()))
  if (obj_name != "x")
  {
    obj_name <- names(which(sapply(ls(envir = parent.frame(2)), function(v)
      identical(y, get(v, envir = parent.frame(2))))))[1]
    cat("New object name:\n", obj_name)
  }
  else cat("New object name:\n", deparse(substitute(y)))
}
print(obj)
#> New object name:
#> obj
obj
#> New object name:
#> obj
Of course, you wouldn't want to use this in production code, for all sorts of reasons. It is not particularly useful or logical for a data structure to know what name it has been assigned in a particular environment, and this would not be an idiomatic way to write a package for other users.
Still, nice to know it is possible.

Partial functions keeping their signature

We can use purrr::partial to create partial functions:
f <- function(x, y) {
  print(x)
  print(y)
  return(invisible())
}
ff <- purrr::partial(f, y = 1)
ff(2)
#> [1] 2
#> [1] 1
Created on 2020-02-19 by the reprex package (v0.3.0)
This can often be quite useful, but has the unfortunate side-effect that the partialized function loses its signature, which is replaced with an ellipsis:
ff
#> <partialised>
#> function (...)
#> f(y = 1, ...)
While programmatically irrelevant, this leads to worse code legibility during development, where RStudio's "intellisense" can no longer aid us in remembering the names and/or order of arguments. So is there some other means of partializing which keeps the original signature (minus the partialized-away arguments), as below?
ff
#> <partialised>
#> function (x)
#> f(y = 1, x)
Now, obviously this can be done manually, by defining a new function ff which is simply a wrapper around f with the desired arguments.
ff <- function(x) f(x, y = 1)
But this means any modifications to the signature of f need to be replicated to ff. So is there a "cleaner" way of partializing while keeping the signature?
One option is to use rlang::fn_fmls() (or base::formals() equivalent) to explicitly give default values to the function arguments:
# If desired, create a copy of the function first: ff <- f
rlang::fn_fmls(f) <- purrr::list_modify(rlang::fn_fmls(f), y = 1)
args(f)
# function (x, y = 1)
f(2)
# [1] 2
# [1] 1
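If you prefer to keep the original f unchanged, the copy-first variant mentioned in the comment above looks like this (a minimal sketch using the same f as in the question):
ff <- f
rlang::fn_fmls(ff) <- purrr::list_modify(rlang::fn_fmls(ff), y = 1)
args(ff)
# function (x, y = 1)
ff(2)
# [1] 2
# [1] 1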

When is it worth using `remove` in R functions?

What factors should I consider when deciding whether or not to remove a variable that will not be used again in a function?
Here's a noddy example:
DivideByLower <- function (a, b) {
  if (a > b) {
    tmp <- a
    a <- b
    b <- tmp
    remove(tmp) # When should I include this line?
  }
  # Return:
  a / b
}
I understand that tmp will be removed when the function finishes executing, but should I ever be concerned about removing it earlier?
From Hadley Wickham's Advanced R:
In some languages, you have to explicitly delete unused objects for their memory to be returned. R uses an alternative approach: garbage collection (or GC for short). GC automatically releases memory when an object is no longer used. It does this by tracking how many names point to each object, and when there are no names pointing to an object, it deletes that object.
In the case you're describing garbage collection will release the memory.
If the output of your function is another function (Hadley calls these the function factory and the manufactured function, respectively), the variables created in the body of the function factory remain available in the enclosing environment of the manufactured function, and the memory won't be freed.
More info, still in Hadley's book, can be found in the chapter about function factories.
function_factory <- function(x){
  force(x)
  y <- "bar"
  fun <- function(z){
    sprintf("x, y, and z are all accessible and their values are '%s', '%s', and '%s'",
            x, y, z)
  }
  fun
}
manufactured_function <- function_factory("foo")
manufactured_function("baz")
#> [1] "x, y, and z are all accessible and their values are 'foo', 'bar', and 'baz'"
Created on 2019-07-08 by the reprex package (v0.3.0)
In this case, if you want to control which variables are available in the enclosing environment, or to be sure you don't clutter your memory, you might want to remove unnecessary objects, either by using rm/remove as you did or, as I tend to prefer, wrapped in an on.exit() statement.
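Here is a minimal sketch (my own example, not from Advanced R) of the on.exit() approach applied to a function factory, so that a helper variable is not kept alive in the manufactured function's enclosing environment:
function_factory2 <- function(x){
  force(x)
  y <- "bar"           # helper only needed while building the function
  on.exit(rm(y))       # drop it from the enclosing environment when the factory returns
  fun <- function(z) sprintf("x is '%s', z is '%s'", x, z)
  fun
}
f2 <- function_factory2("foo")
ls(environment(f2))  # y should no longer appear here, only fun and x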
Another case in which I might use rm is if I want to access variables from a parent environment without the risk of them being overridden inside the function, but in that case it's often possible and cleaner to use eval.parent.
y <- 2
z <- 3
test0 <- function(x, var){
  y <- 1
  x + eval(substitute(var))
}
# oops, the value of y is the one defined in the body
test0(0, y)
#> [1] 1
test0(0, z)
#> [1] 3
# but it will work using eval.parent:
test1 <- function(x, var){
  y <- 1
  x + eval.parent(substitute(var))
}
test1(0, y)
#> [1] 2
test1(0, z)
#> [1] 3
# in some cases (better avoided), it can be easier/quick and dirty to do something like:
test2 <- function(x, var){
  y <- 1
  # whatever code using y
  rm(y)
  x + eval(substitute(var))
}
test2(0, y)
#> [1] 2
test2(0, z)
#> [1] 3
Created on 2019-07-08 by the reprex package (v0.3.0)

Writing a decorator for R functions

A colleague recently was looking at call graphs and wanted to see what called what. We sorted that with foodweb from mvbutils, but I was wondering about how best to create a decorator (in python speak) in R. So I did this:
instrument = function(z){
  force(z)
  n = deparse(substitute(z)) # get the name
  f = function(...){
    cat("calling ", n, "\n")
    x = z(...)
    cat("done\n")
    return(x)
  }
  return(f)
}
This lets me do:
> foo=function(x,y){x+y}
> foo(1,2)
[1] 3
and now I can make the function log itself by wrapping it:
> foo=instrument(foo)
> foo(1,2)
calling foo
done
[1] 3
Has this been done before, in a package say, and have I missed any gotchas that will break my way of doing this?
The trace function in R does that. See ?trace.
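For example, a minimal sketch of the same kind of logging with trace(), using the foo from the question (the exact console output may differ slightly):
foo <- function(x, y){ x + y }
trace(foo,
      tracer = quote(cat("calling foo\n")),  # runs on entry
      exit   = quote(cat("done\n")),         # runs on exit
      print  = FALSE)                        # suppress the default "Tracing ..." lines
foo(1, 2)   # prints "calling foo", then "done", then returns 3
untrace(foo)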
My GitHub package tag attempts to tackle this issue.
Your example could be solved as follows :
# remotes::install_github("moodymudskipper/tag")
library(tag)
deco <- tag(args = list(.first = NULL, .last = NULL), pattern = {
  t_args <- T_ARGS()                        # fetch arguments fed to tag
  eval.parent(t_args[[".first"]])           # run .first arg
  on.exit(eval.parent(t_args[[".last"]]))   # run .last arg on exit
  CALL()                                    # run main call
})
foo <- function(x, y) {Sys.sleep(1); x + y} # sleep 1 sec to highlight expected behavior
deco(quote(message("calling foo")), quote(message("done")))$foo(1, 2)
#> calling foo
#> done
#> [1] 3
foo2 <- deco(quote(message("calling foo")), quote(message("done")))$foo
foo2(1, 2)
#> calling foo
#> done
#> [1] 3
deco2 <- deco(quote(message("calling foo")), quote(message("done")))
deco2$foo(1, 2)
#> calling foo
#> done
#> [1] 3
Created on 2020-01-30 by the reprex package (v0.3.0)
Tags are function operator factories (or adverb factories): here deco is a tag, and deco(quote(message("calling foo")), quote(message("done"))) is an adverb with a method for $. This means you could run deco(quote(message("calling foo")), quote(message("done")))(foo)(1, 2), but the dollar notation makes it friendlier.
The tag definition features default arguments (a default value is mandatory, dots aren't supported) and a pattern, which is a bit like your new body, using the special functions T_ARGS(), F_ARGS(), F_FORMALS() and CALL() to access the tag's or function's arguments or formals and the call itself (see ?tag::CALL).
Some more magic is implemented so the tag's arguments can be given to the tagged function itself, so the following can be done too:
deco$foo(1, 2, quote(message("calling foo")), quote(message("done")))
#> calling foo
#> done
#> [1] 3
foo2 <- deco$foo
foo2(1, 2, quote(message("calling foo")), quote(message("done")))
#> calling foo
#> done
#> [1] 3
In these cases you can also enjoy the autocomplete in RStudio.
More info: https://github.com/moodymudskipper/tag
The package tags contains a collection of such "decorators".
