R debug() "could not find function" even though it exists - r

When I try to debug a certain function (itself defined within the function NbCluster), I get a could not find function error. I have checked and the function in question is definitely loaded when debug is called.
> data("USArrests")
> arrests <- scale(USArrests)
> source("NbCluster_copy.R")
> NbCluster_copy(data = arrests, diss = NULL, distance = "euclidean", min.nc = 2, max.nc = 12,
+ method = "ward.D2", index = "gap", alphaBeale = 0.1)
[1] "Indice.Gap exists"
Error in debug(fun = "Indice.Gap") : could not find function "Indice.Gap"
And the issue does not happen if I manually step through the function (by selecting and running lines instead of calling the function).
I tried making a minimal example, but was unable to, so I don't think it is the nested functions that are the problem.
###This works as expected, when I run "wrapper", debug is called from within the function:
wrapper <- function(x){
wrapper <- function(x){
fun1 <- function(x){
fun0 <- function(x){
y = x + 1
return(y)
}
debug(fun0)
y = fun0(x) * 2
return(y)
}
fun1(x)
}
> wrapper(2)
debugging in: fun0(x)
debug at #3: {
y = x + 1
return(y)
}
Browse[2]>
debug at #4: y = x + 1
Browse[2]>
debug at #5: return(y)
Browse[2]>
exiting from: fun0(x)
[1] 6
This is the part I added into the NbClust function.
if(exists("Indice.Gap")){
print("Indice.Gap exists")
}
debug(fun = "Indice.Gap")
right before the first call of Indice.Gap:
resultSGAP <- Indice.Gap(x = jeu, clall = clall,
reference.distribution = "unif", B = 10, method = "ward.D2",
d = NULL, centrotypes = "centroids")
I only made very minor changes besides the one shown above, but if you want to look at the whole function, my copy is here: https://pastebin.com/wxKKDbHy

Just remove the quotes in debug and it should work:
debug(Indice.Gap)
should do the trick.
outer_fun <- function() {
inner_fun <- function() 1
## does not work
# debug("inner_fun")
## works
debug(inner_fun)
inner_fun()
}
outer_fun()
Funny enough on the top level you can provide the function name as string:
debug("outer_fun") # works
debug(outer_fun) # works

Related

Exporting function inputs in parallel processing in R

I am trying to write a function which has a parallel computation option.
To make it work in both windows, and mac or Linux environments, I am using a PSOCK system, which I believe is a default configuration in makeCluster(). My question is whether I should, or it is more desirable, to pass all arguments to the clusters using a clusterExport function. If I do this, I think I need to evaluate all input arguments-- instead of the default lazy evaluation. If some variables are used only in some special cases, this does not seem desirable.
For example, in the following code, I am wondering whether I should add
clusterExport(varlist = c("a","b","c"),cl = cl,envir = environment()) in the function. The following code works fine in my computer, but a similar code failed in other's computer.
I would be very interested to hear about the best practice as well. Thank you!
library(pbapply)
foo = function(a=3, b=4, c=5, B = 8, parallel = FALSE){
if(parallel) {cl = makeCluster(4) } else{cl = NULL}
# default a,b,c values are desired to be used
if(a>5){
# c is used only in this case
v= pbsapply(1:B,FUN = function(i) {Sys.sleep(.5); a+b+c+i},cl = cl)
}else{
v= pbsapply(1:B,FUN = function(i) {Sys.sleep(.5); a+b+i},cl = cl)
}
if(parallel) stopCluster(cl)
return(v)
}
system.time(foo())
system.time(foo(parallel = T))
You could try to set defaults to NULL and do a case handling using sapply. I'm not sure, though, if this really works, because I can't reproduce your error.
foo <- function(a=NULL, b=NULL, c=NULL, B=NULL, parallel=FALSE) {
if (parallel) {
cl <- makeCluster(detectCores() - 4) ## safer to code this more dynamically
## case handling:
sapply(c("a", "b", "c", "B"), function(x) {
if (!is.null(get(x))) clusterExport(cl, x, environment())
})
} else {
cl <- NULL
}
# default a,b,c values are desired to be used
if (a > 5) {
# c is used only in this case
v <- pbsapply(1:B, FUN=function(i) {
Sys.sleep(.2)
a + b + c + i
}, cl=cl)
} else {
v <- pbsapply(1:B, FUN=function(i) {
Sys.sleep(.2)
a + b + i
}, cl=cl)
}
if (parallel) stopCluster(cl)
return(v)
}
foo(a=3, b=4, c=5, B=8, parallel=TRUE)
# |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
# [1] 8 9 10 11 12 13 14 15

Nested use of call_modify

I am trying to create a call to a function f whose first argument is a call to another function (for which I've chosen dbinom as an example). The call to dbinom (passed on to f) does not include values for all the arguments as these should be finalised within f, and the completed call is returned by f. Here is my failed minimal attempt:
f <- function(a_call) {
call_modify(a_call, x=1)
}
a_call <- call2(dbinom, size=1, prob=0.5)
y <- call2(f, a_call)
The output for y is:
(function(a_call) {
call_modify(a_call, x=1)
})((function (x, size, prob, log = FALSE)
.Call(C_dbinom, x, size, prob, log))(size = 1, prob = 0.5))
This call will
call a_call without any arguments, and then;
pass this result on to f.
If I evaluate y, it errors because dinom's first argument is missing.
I similar-but-related construct:
> call2(call2(dbinom, x=1, size=1, prob=0.5))
((function (x, size, prob, log = FALSE)
.Call(C_dbinom, x, size, prob, log))(x = 1, size = 1, prob = 0.5))()
(function (x, size, prob, log = FALSE)
I get the sense there is something 'not even wrong' with what I'm trying here, and nesting a call modification is best done another way.
It seems that what you are trying to do is handled more naturally by purrr::partial(), which fills in one or more arguments of a function:
f <- function( a_fun ) {purrr::partial( a_fun, x=1 )}
a_fun <- purrr::partial( dbinom, size=1, prob=0.5 )
y <- f(a_fun)
y(...) is now effectively dbinom( x=1, size=1, prob=0.5, ... )
y() # 0.5
y(log=TRUE) # -0.6931472
The great thing about partial() is that it can be naturally chained with the %>% pipe:
z <- partial(dbinom, size=1) %>% partial(prob=0.5) %>% partial(x=1)
z(log=TRUE) # -0.6931472
If I understand correctly what you're trying to do,
then maybe this works better:
f <- function(a_call) {
call_modify(call_standardise(call2(ensym(a_call)),
caller_env()),
x=1)
}
Which you can use with or without characters:
f(print)
# print(x = 1)
f("print")
# print(x = 1)
eval(f(print))
# 1
Or with more indirection:
a_call <- expr(print)
eval(call2(f, a_call))
# print(x = 1)
eval(expr(f(!!a_call)))
# print(x = 1)
Since we do a bit of non-standard evaluation here,
things get a bit tricky.
call_standardise needs to be able to find the function you specify,
and it's very probable that it will be found in the environment that calls f,
and not necessarily in the environment that calls call_standardise,
which would be f's execution environment in this case.
That's why caller_env() is explicitly specified when calling call_standardise even though that's the default for the latter's env,
because default arguments are evaluated in the function's execution environment,
whereas explicit arguments are evaluated in the caller's environment.
Here's a contrived-looking example for this problem:
f2 <- function(a_call) {
call_modify(call_standardise(call2(ensym(a_call))),
x=1)
}
e <- new.env()
e$foo <- function(x) { x + 1 }
with(e, f(foo))
# foo(x = 1)
with(e, f2(foo))
# Error in eval_bare(node_car(expr), env) : object 'foo' not found
However, if you were to develop a package that provides f,
the example is no longer contrived:
f would live in your package's environment,
and other packages could call it for functions that are only available in their respective namespaces.
For more specifics and depictions,
check this reference,
and maybe try drawing the call tree for my example.
call2 constructs a call by passing evaluated ... arguments on to the callable object (the first argument). For example, the command below outputs to the console "y" as the second argument passed to call2 is evaluated,
> A <- call2(print, x=print('y'))
[1] "y"
and constructs a call to print which takes x="y" as its argument (not x=print("y")):
> A
(function (x, ...)
UseMethod("print"))(x = "y")
In order to get around a_call being evaluated and then passed (to f) in the constructed call, it can be quoted, e.g.
f <- function(a_call) {
call_modify(a_call, x=1)
}
a_call <- call2(dbinom, size=1, prob=0.5)
y <- call2(f, quote(a_call))
Now:
> y
(function(a_call) {
call_modify(a_call, x=1)
})(a_call)

RStudio and Lazy Evaluation

I am mainting the package "hdm" and I encountered the following problem.
The following code runs in plain R and used to run in RStudio, but not anymore:
library(hdm)
attach(GrowthData)
fmla= "Outcome ~ ."
fmla.y= "Outcome ~ . - gdpsh465 "
rY= rlasso(fmla.y, data =GrowthData)
Error message:
Error in exists("homoscedastic", where = penalty) : object 'n' not
found
If no penalty in the function rlasso is specified it is set by default containing the variable "n", the sample size of x, which is evaluated later.
n is gotten by lazy evaluation and it seems that in RStudio the correct environment is not found anymore.
The error occurs here, but the problem is that penalty contains n which is not know
if (!exists("homoscedastic", where = penalty)) penalty$homoscedastic = "FALSE"
Somehow I am not sure to solve this and would like to ask if you have any idea.
Thanks a lot for your efforts in advance!
Best,
Martin
When x is a character object, the problem arises because n is not defined in the environment from which rlasso.formula is called, i.e. rlasso.character(), or its parents. This is roughly what's happening:
test <- function(x, ...) {
UseMethod("test")
}
test.character <- function(x, pen = list(alpha = n)) {
test.formula(x, pen = pen)
}
test.formula <- function(x, pen = list(alpha = n)) {
n <- 2
test.default(x, pen)
}
test.default <- function(x, pen = list(alpha = n)) {
n <- 3
exists("alpha", where = pen)
}
test("y ~ x")
# Error in exists("alpha", where = pen) : object 'n' not found
test(y ~ x)
# [1] TRUE
test(123)
# [1] TRUE
A workaround is to not specify pen in the call to the formula method if it's not defined when the character method is called:
test.character <- function(x, pen = list(alpha = n)) {
if (missing(pen))
test.formula(x)
else
test.formula(x, pen = pen)
}
test("y ~ x")
# [1] TRUE

pretty log scale breaks with ggplot hex plot

I want to draw a hexbin plot with ggplot, but with log scale "pretty" breaks for the frequency. Consider
df = data.frame(a=rnorm(1000)); df$b <- df$a+rnorm(1000);
I used this answer to get pretty breaks on linear scale
ggplot(df, aes(a,b)) +
geom_hex(aes(fill=cut(..value..,breaks=pretty(..value.., n=10)))) +
scale_fill_discrete("Frequency")
This works. Now say I want to use log scale pretty breaks. So I used the idea from another answer to define
base_breaks <- function(n = 10){
function(x) {
axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
}
}
and try to do
ggplot(df, aes(a,b)) +
geom_hex(aes(fill=cut(..value..,breaks=base_breaks(n=10)(..value..))))
but it is not able to find the function. It says:
Error in cut.default(value, breaks = base_breaks(n = 10)(value)) :
could not find function "base_breaks"
Even though base_breaks is defined.
> base_breaks(n=10)(c(1:1000))
[1] 1 5 10 50 100 500 1000
How can I make my function visible in whatever environment ggplot is calling it? I even defined it as a global variable with
base_breaks <<- function(n = 10){
function(x) {
axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
}
}
but I still get the same error.
I am not sure about it, but you could try simplifying the function like this:
base_breaks <<- function(n = 10, x){
axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
}
Maybe the problem is that you have a function whose result is another function, and that could be causing the error. With this aproach you would have the values more directly. Check it out!
I can't check it myself, since I get an error object 'value' not found...

tryCatch - namespace?

I am quite new to R and I am confused by the correct usage of tryCatch. My goal is to make a prediction for a large data set. If the predictions cannot fit into memory, I want to circumvent the problem by splitting my data.
Right now, my code looks roughly as follows:
tryCatch({
large_vector = predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
for (i in seq(from = 1, to = dim(large_data_frame)[1], by = 1000)) {
small_vector = predict(model, large_data_frame[i:(i+step-1), ])
save(small_vector, tmpfile)
}
rm(large_data_frame) # free memory
large_vector = NULL
for (i in seq(from = 1, to = dim(large_data_frame)[1], by = 1000)) {
load(tmpfile)
unlink(tmpfile)
large_vector = c(large_vector, small_vector)
}
})
The point is that if no error occurs, large_vector is filled with my predictions as expected. If an error occurs, large_vector seems to exist only in the namespace of the error code - which makes sense because I declared it as a function. For the same reason, I get a warning saying that large_data_frame cannot be removed.
Unfortunately, this behavior is not what I want. I would want to assign the variable large_vector from within my error function. I figured that one possibility is to specify the environment and use assign. Thus, I would use the following statements in my error code:
rm(large_data_frame, envir = parent.env(environment()))
[...]
assign('large_vector', large_vector, parent.env(environment()))
However, this solution seems rather dirty to me. I wonder whether there is any possibility to achieve my goal with "clean" code?
[EDIT]
There seems to be some confusion because I put the code above mainly to illustrate the problem, not to give a working example. Here's a minimal example that shows the namespace issue:
# Example 1 : large_vector fits into memory
rm(large_vector)
tryCatch({
large_vector = rep(5, 1000)
}, error = function(e) {
# do stuff to build the vector
large_vector = rep(3, 1000)
})
print(large_vector) # all 5
# Example 2 : pretend large_vector does not fit into memory; solution using parent environment
rm(large_vector)
tryCatch({
stop(); # simulate error
}, error = function(e) {
# do stuff to build the vector
large_vector = rep(3, 1000)
assign('large_vector', large_vector, parent.env(environment()))
})
print(large_vector) # all 3
# Example 3 : pretend large_vector does not fit into memory; namespace issue
rm(large_vector)
tryCatch({
stop(); # simulate error
}, error = function(e) {
# do stuff to build the vector
large_vector = rep(3, 1000)
})
print(large_vector) # does not exist
I would do something like this :
res <- tryCatch({
large_vector = predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
ll <- lapply(split(data,seq(1,nrow(large_data_frame),1000)),
function(x)
small_vector = predict(model, x))
return(ll)
})
rm(large_data_frame)
if(is.list(ll))
res <- do.call(rbind,res)
The idea is to return a list of predictions results if you run out of the memory.
NOTE, i am not sure of the result here, because we don't have a reproducible example.
EDIT: Let's try again:
You can use finally argument of tryCatch:
step<-1000
n<-dim(large_data_frame)[1]
large_vector <- NULL
tryCatch({
large_vector <- predict(model, large_data_frame)
}, error = function(e) { # ran out of memory
for (i in seq(from = 1, to = n, by = step)) {
small_vector <- predict(model, large_data_frame[i:(i+step-1),]) #predict in pieces
save(small_vector,file=paste0("tmpfile",i)) #same pieces
}
rm(large_data_frame) #free memory
},finally={if(is.null(large_vector)){ #if we run out of memory
large_vector<-numeric(n) #make vector
for (i in seq(from = 1, to = n, by = step)){
#collect pieces
load(paste0("tmpfile",i))
large_vector[i:(i+step-1)] <- small_vector
}
}})
Here's a simplified version to see what is going on:
large_vector<-NULL
rm(y)
tryCatch({
large_vector <- y
}, error = function(e) {# y is not found
print("error")
},finally={if(is.null(large_vector)){
large_vector<-1
}})
> large_vector
[1] 1
EDIT2: Another tip regarding the scope which could be useful for you (although maybe not in this situation as you didn't want to declare large_vector beforehand): The <<- operator, from R-help:
The operators <<- and ->> are normally only used in functions, and
cause a search to made through parent environments for an existing
definition of the variable being assigned...
Therefore you could use above example code like this:
large_vector<-NULL
rm(y)
tryCatch({
large_vector <- y
}, error = function(e) {# y is not found
large_vector <<- 1
print("error")
})
> large_vector
[1] 1
The code below is quite self explanatory. Indeed the problem is that anything inside the error function is not by default applied to the parent environment.
b=0
as explained, this doesn't work:
tryCatch(expr = {stop("error1")}, error=function(e) {b=1})
b
SOLUTION 1: assign to the parent environment
tryCatch(expr = {stop("error2")}, error=function(e) {assign(x = "b", value =
2, envir = parent.env(env = environment()))})
b
SOLUTION 2: the most simple (only works if you are assigning to b in both expr and error)
b = tryCatch(expr = {stop("error3")}, error=function(e) {b=3;return(b)})
b

Resources