I am quite new to R and I am confused by the correct usage of tryCatch. My goal is to make a prediction for a large data set. If the predictions cannot fit into memory, I want to circumvent the problem by splitting my data.
Right now, my code looks roughly as follows:
step <- 1000
n <- dim(large_data_frame)[1]
tryCatch({
large_vector = predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
for (i in seq(from = 1, to = n, by = step)) {
small_vector = predict(model, large_data_frame[i:(i + step - 1), ])
save(small_vector, file = tmpfile)
}
rm(large_data_frame) # free memory
large_vector = NULL
for (i in seq(from = 1, to = n, by = step)) {
load(tmpfile)
unlink(tmpfile)
large_vector = c(large_vector, small_vector)
}
})
The point is that if no error occurs, large_vector is filled with my predictions as expected. If an error occurs, large_vector seems to exist only in the environment of the error handler - which makes sense, because the handler is a function with its own scope. For the same reason, I get a warning saying that large_data_frame cannot be removed.
Unfortunately, this behavior is not what I want. I would want to assign the variable large_vector from within my error function. I figured that one possibility is to specify the environment and use assign. Thus, I would use the following statements in my error code:
rm(large_data_frame, envir = parent.env(environment()))
[...]
assign('large_vector', large_vector, parent.env(environment()))
However, this solution seems rather dirty to me. I wonder whether there is any possibility to achieve my goal with "clean" code?
[EDIT]
There seems to be some confusion because I put the code above mainly to illustrate the problem, not to give a working example. Here's a minimal example that shows the namespace issue:
# Example 1 : large_vector fits into memory
rm(large_vector)
tryCatch({
large_vector = rep(5, 1000)
}, error = function(e) {
# do stuff to build the vector
large_vector = rep(3, 1000)
})
print(large_vector) # all 5
# Example 2 : pretend large_vector does not fit into memory; solution using parent environment
rm(large_vector)
tryCatch({
stop(); # simulate error
}, error = function(e) {
# do stuff to build the vector
large_vector = rep(3, 1000)
assign('large_vector', large_vector, parent.env(environment()))
})
print(large_vector) # all 3
# Example 3 : pretend large_vector does not fit into memory; namespace issue
rm(large_vector)
tryCatch({
stop(); # simulate error
}, error = function(e) {
# do stuff to build the vector
large_vector = rep(3, 1000)
})
print(large_vector) # does not exist
I would do something like this:
res <- tryCatch({
large_vector = predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
chunks <- split(large_data_frame, (seq_len(nrow(large_data_frame)) - 1) %/% 1000)
ll <- lapply(chunks, function(x) predict(model, x))
return(ll)
})
rm(large_data_frame)
if (is.list(res))
res <- do.call(rbind, res)
The idea is to return a list of prediction results if you run out of memory.
Note: I am not sure of the exact result here, because we don't have a reproducible example.
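A minimal sketch (with made-up values) of the mechanism this relies on: tryCatch returns the value of its error handler, so the chunked results can be captured in res without any assign trickery:
res <- tryCatch({
  stop("out of memory")               # simulate the failure
}, error = function(e) {
  lapply(1:3, function(i) rep(i, 2))  # stand-ins for chunked predictions
})
is.list(res)   # TRUE
unlist(res)    # 1 1 2 2 3 3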
EDIT: Let's try again:
You can use the finally argument of tryCatch:
step <- 1000
n <- dim(large_data_frame)[1]
large_vector <- NULL
tryCatch({
  large_vector <- predict(model, large_data_frame)
}, error = function(e) { # ran out of memory
  for (i in seq(from = 1, to = n, by = step)) {
    small_vector <- predict(model, large_data_frame[i:(i + step - 1), ]) # predict in pieces
    save(small_vector, file = paste0("tmpfile", i)) # save pieces
  }
  rm(large_data_frame) # free memory
}, finally = {
  if (is.null(large_vector)) { # if we ran out of memory
    large_vector <- numeric(n) # make vector
    for (i in seq(from = 1, to = n, by = step)) {
      # collect pieces
      load(paste0("tmpfile", i))
      large_vector[i:(i + step - 1)] <- small_vector
    }
  }
})
Here's a simplified version to see what is going on:
large_vector <- NULL
rm(y)
tryCatch({
  large_vector <- y
}, error = function(e) { # y is not found
  print("error")
}, finally = {
  if (is.null(large_vector)) {
    large_vector <- 1
  }
})
> large_vector
[1] 1
EDIT2: Another tip regarding scope that could be useful for you (although maybe not in this situation, since you didn't want to declare large_vector beforehand): the <<- operator. From R-help:
The operators <<- and ->> are normally only used in functions, and
cause a search to be made through parent environments for an existing
definition of the variable being assigned...
Therefore you could use the above example code like this:
large_vector <- NULL
rm(y)
tryCatch({
  large_vector <- y
}, error = function(e) { # y is not found
  large_vector <<- 1
  print("error")
})
> large_vector
[1] 1
The code below is quite self-explanatory. Indeed, the problem is that assignments made inside the error function do not, by default, affect the parent environment.
b=0
As explained, this doesn't work:
tryCatch(expr = {stop("error1")}, error=function(e) {b=1})
b
SOLUTION 1: assign to the parent environment
tryCatch(expr = {stop("error2")}, error=function(e) {assign(x = "b", value =
2, envir = parent.env(env = environment()))})
b
SOLUTION 2: the simplest (it only works if you are assigning to b in both expr and error)
b = tryCatch(expr = {stop("error3")}, error=function(e) {b=3;return(b)})
b
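As a quick check of why this works (arbitrary values): tryCatch returns the value of expr when it succeeds and the value of the handler when it fails, so assigning its result to b covers both cases.
b = tryCatch(expr = {4}, error = function(e) {3})
b # 4: expr succeeded, its value is returned
b = tryCatch(expr = {stop("error4"); 4}, error = function(e) {3})
b # 3: expr failed, the handler's value is returned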
I am trying something pretty simple: I want to run a bunch of regressions in parallel. When I use the following data generator (PART 1), the parallel part does not work and gives the error listed below.
# PART 1
library(MASS) # for mvrnorm
p <- 20; rho <- 0.7
cdc <- diag(p)
for (i in 1:(p-1)) { for (j in (i+1):p) {
  cdc[i,j] <- cdc[j,i] <- rho^abs(i-j)
}}
my.data <- mvrnorm(n = 100, mu = rep(0, p), Sigma = cdc)
The parallel part below does work, though, if I generate the data as in PART 2:
# PART 2
my.data<-matrix(rnorm(1000,0,1),nrow=100,ncol=10)
I configured the function that I want to run in parallel as:
parallel_fun<-function(obj,my.data){
p1 <- nrow(cov(my.data));store.beta<-matrix(0,p1,length(obj))
count<-1
for (itration in obj) {
my_df<-data.frame(my.data)
colnames(my_df)[itration] <- "y"
my.model<-bas.lm(y ~ ., data= my_df, alpha=3,
prior="ZS-null", force.heredity = FALSE, pivot = TRUE)
cf<-coef(my.model, estimator="MPM")
betas<-cf$postmean[-1]
store.beta[ -itration, count]<- betas
count<-count+1
}
result<-list('Beta'=store.beta)
}
So I run parLapply in the following way:
{
library(parallel)    # detectCores, makeCluster, splitIndices, clusterExport, clusterEvalQ, parLapply
library(doParallel)  # registerDoParallel
no_cores <- detectCores(logical = TRUE)
myclusternumber<-(no_cores-1)
cl <- makeCluster(myclusternumber)
registerDoParallel(cl)
p1 <- ncol(my.data)
obj<-splitIndices(p1, myclusternumber)
clusterExport(cl,list('parallel_fun','my.data','obj'),envir=environment())
clusterEvalQ(cl, {
library(MASS)
library(Matrix)
library(BAS)
})
newresult<-parallel::parLapply(cl,obj,fun = parallel_fun,my.data)
stopCluster(cl)
}
But whenever I am using PART 1, I get the following error:
Error in checkForRemoteErrors(val) :
7 nodes produced errors; first error: object 'my_df' not found
But this should not happen: the data frame should be created, and I have no idea why this is happening. Any help is appreciated.
Posting this as one possible workaround, see if it works:
parallel_fun<-function(obj,my.data){
p1 <- nrow(cov(my.data));store.beta<-matrix(0,p1,length(obj))
count<-1
for (itration in obj) {
my_df<-data.frame(my.data)
colnames(my_df)[itration] <- "y"
my_df <<- my_df
my.model<-bas.lm(y ~ ., data= my_df, alpha=3,
prior="ZS-null", force.heredity = FALSE, pivot = TRUE)
cf<-BAS:::coef.bas(my.model, estimator="MPM")
betas<-cf$postmean[-1]
store.beta[ -itration, count]<- betas
count<-count+1
}
result<-list('Beta'=store.beta)
}
The issue seems to be with the BAS:::coef.bas function, which calls eval in order to get my_df and fails to find it when called in parallel. The "hack" here is to force my_df out into the global environment of each worker by calling my_df <<- my_df.
There should be a better way to do this, but <<- might be the quickest one. In general, <<- may cause unwanted behaviour, especially when used in loops. Assigning a unique variable name before exporting (and not forgetting to remove it after use) is one way to tackle that, as sketched below.
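A rough sketch of that suggestion (the unique name and the data frame here are hypothetical; for it to help with coef.bas, the model-fitting call would have to refer to the same unique name):
# export the object under a per-call unique name so parallel workers don't clash
df_name <- paste0("my_df_", Sys.getpid())                  # unlikely to collide
assign(df_name, data.frame(x = 1:3), envir = globalenv())  # stands in for my_df
# ... fit the model / extract coefficients using get(df_name, envir = globalenv()) ...
rm(list = df_name, envir = globalenv())                    # don't forget the cleanup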
When I try to debug a certain function (itself defined within the function NbCluster), I get a "could not find function" error. I have checked, and the function in question is definitely loaded when debug is called.
> data("USArrests")
> arrests <- scale(USArrests)
> source("NbCluster_copy.R")
> NbCluster_copy(data = arrests, diss = NULL, distance = "euclidean", min.nc = 2, max.nc = 12,
+ method = "ward.D2", index = "gap", alphaBeale = 0.1)
[1] "Indice.Gap exists"
Error in debug(fun = "Indice.Gap") : could not find function "Indice.Gap"
And the issue does not happen if I manually step through the function (by selecting and running lines instead of calling the function).
I tried making a minimal example, but was unable to, so I don't think it is the nested functions that are the problem.
### This works as expected; when I run "wrapper", debug is called from within the function:
wrapper <- function(x){
fun1 <- function(x){
fun0 <- function(x){
y = x + 1
return(y)
}
debug(fun0)
y = fun0(x) * 2
return(y)
}
fun1(x)
}
> wrapper(2)
debugging in: fun0(x)
debug at #3: {
y = x + 1
return(y)
}
Browse[2]>
debug at #4: y = x + 1
Browse[2]>
debug at #5: return(y)
Browse[2]>
exiting from: fun0(x)
[1] 6
This is the part I added into the NbClust function.
if(exists("Indice.Gap")){
print("Indice.Gap exists")
}
debug(fun = "Indice.Gap")
right before the first call of Indice.Gap:
resultSGAP <- Indice.Gap(x = jeu, clall = clall,
reference.distribution = "unif", B = 10, method = "ward.D2",
d = NULL, centrotypes = "centroids")
I only made very minor changes besides the one shown above, but if you want to look at the whole function, my copy is here: https://pastebin.com/wxKKDbHy
Just remove the quotes in debug and it should work:
debug(Indice.Gap)
should do the trick.
outer_fun <- function() {
inner_fun <- function() 1
## does not work
# debug("inner_fun")
## works
debug(inner_fun)
inner_fun()
}
outer_fun()
Funnily enough, at the top level you can provide the function name as a string:
debug("outer_fun") # works
debug(outer_fun) # works
I have a strange environment/scoping dynamic that I've been trying to figure out, and I'm looking for the right or recommended method for achieving this.
I've made a toy example of my problem below purely for illustration. (I'm aware this particular problem can be solved much more simply, but it illustrates the dynamic I'm trying to implement).
Current functioning code:
master_function <-
function(x, iter = 100){
x_p1 <- function(){ x <<- x + 1 }
x_m1 <- function(){ x <<- x - 1 }
path <- numeric(iter)
for(i in 1:iter){
next_step <- sample(c('p', 'm'), 1)
if(next_step == 'p'){
x_p1()
} else {
x_m1()
}
path[i] <- x
}
path
}
The issue with this code (especially for an actually difficult problem) is that it makes debugging the contents of x_p1 and x_m1 with the RStudio debug utility impossible.
Hoping to restructure the code to look something like:
master_function <-
function(x, iter = 100){
master_env <- environment()
path <- numeric(iter)
for(i in 1:iter){
next_step <- sample(c('p', 'm'), 1)
if(next_step == 'p'){
x_p1(master_env)
} else {
x_m1(master_env)
}
path[i] <- x
}
path
}
x_p1 <- function(env){ assign('x', get('x', envir = env) + 1, envir = env) }
x_m1 <- function(env){ assign('x', get('x', envir = env) - 1, envir = env) }
But this is also quite ugly. Is there a way to augment the search path, for example, such that access to the master_env is cleaner?
Edit: More information as requested by #MrFlick
Essentially I have simulation with a lot of moving pieces. As it progresses, different events (the sub-functions being referenced) are triggered modifying the state of the simulation. These functions currently modify many different state objects for each function call. Since the functions are made within the master function call, I can take advantage of lexical scoping and the <<- operator, but I lose the ability to debug within those functions.
Trying to figure out how to create those functions outside of the master simulation. If I understand correctly, if I make the functions such that they consume the simulation state and return a modified version, it comes at a large memory cost.
1) trace Use trace to insert debug statements after the definitions of x_p1 and x_m1 and then one can step through them when master_function is run.
trace(master_function, at = 4, quote({debug(x_p1); debug(x_m1) }))
untrace(master_function) turns this off. Use body(master_function)[4] to see which line corresponds to 4. See ?trace for more.
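For example, a hypothetical interactive session using that call might look like this (the starting value and iteration count are arbitrary):
trace(master_function, at = 4, quote({ debug(x_p1); debug(x_m1) }))
master_function(x = 0, iter = 3)  # each call to x_p1()/x_m1() now opens the browser
untrace(master_function)          # switch the tracing off again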
2) instrument Another possibility is to instrument your function like this and then call it with master_function(x, DEBUG = TRUE) to turn on debugging.
master_function <-
function(x, iter = 100, DEBUG = FALSE){
x_p1 <- function(){ x <<- x + 1 }
x_m1 <- function(){ x <<- x - 1 }
if (DEBUG) {
debug(x_p1)
debug(x_m1)
}
path <- numeric(iter)
for(i in 1:iter){
next_step <- sample(c('p', 'm'), 1)
if(next_step == 'p'){
x_p1()
} else {
x_m1()
}
path[i] <- x
}
path
}
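For example, a short hypothetical run that drops into the browser inside x_p1 and x_m1:
path <- master_function(x = 0, iter = 3, DEBUG = TRUE)  # arbitrary start value, few steps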
Why does x need to reside in an alternative environment at all? The following internalizes the state and avoids the multiple environments entirely.
x_p1 <- function(z){ z + 1 }
x_m1 <- function(z){ z - 1 }
master_function <-
function(x, iter = 100){
new_x <- x
path <- numeric(iter)
for(i in 1:iter){
next_step <- sample(c('p', 'm'), 1)
if(next_step == 'p'){
new_x <- x_p1(new_x)
} else {
new_x <- x_m1(new_x)
}
path[i] <- new_x
}
path
}
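A quick sanity check of this version (the seed and arguments are arbitrary):
set.seed(42)                      # just for reproducibility
master_function(x = 0, iter = 10) # a length-10 path of +1/-1 steps starting from 0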
I'm trying to do a thing "the right way". Sometimes "the right way" takes too long, depending on the inputs, and I can't really know a priori when this will be. When "the right way" is taking too long, I want to fall back to "the hackish way". How do I make R monitor how long a particular task has taken, and give it something else to do if a threshold has passed? I'd imagine that this will be part of the try family, but I'm not quite sure what to call it or google for.
Dummy example below. When slow.func takes too long, I want interuptor to stop it and call fast.func instead.
slow.func <- function(x){
Sys.sleep(x)
print('good morning')
}
fast.func <- function(x){
Sys.sleep(x/10)
print('hit snooze')
}
interuptor = function(FUN,args, time.limit, ALTFUN){
# START MONITORING TIME HERE
do.call(FUN,args)
# IF FUN TAKES TOO LONG, STOP IT, CALL ALTFUN
do.call(ALTFUN,args)
}
interuptor(slow.func, list(x = 2), time.limit = 1, fast.func)
The R package R.utils has a function evalWithTimeout that's pretty much exactly what you're describing. If you don't want to install a package, evalWithTimeout relies on the less user-friendly base R function setTimeLimit.
Your code would look something like this:
library(R.utils)
slow.func <- function(x){
Sys.sleep(10)
return(x^2)
}
fast.func <- function(x){
Sys.sleep(2)
return(x*x)
}
interruptor = function(FUN,args, time.limit, ALTFUN){
results <- NULL
results <- evalWithTimeout({FUN(args)},timeout=time.limit,onTimeout="warning")
if(is.null(results)){
results <- ALTFUN(args)
}
return(results)
}
interruptor(slow.func,args=2,time.limit=3,fast.func)
For anyone who wants a lighter weight solution that does not depend on the R.utils package, I ended up using a minimal solution based on the withTimeout() code.
foo <- function() {
time_limit <- 10
setTimeLimit(cpu = time_limit, elapsed = time_limit, transient = TRUE)
on.exit({
setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)
})
tryCatch({
# do some stuff
}, error = function(e) {
if (grepl("reached elapsed time limit|reached CPU time limit", e$message)) {
# we reached timeout, apply some alternative method or do something else
} else {
# error not related to timeout
stop(e)
}
})
}
The initial version I posted worked with "R.utils v2.5.0 (2016-11-07)" but it does not with "R.utils v2.9.2". Below is a version with some modifications that works with "R.utils v2.9.2".
version with "R.utils v2.5.0"
The answer of "nwknoblauch" does not work for me unless I change "warning" to "silent" inside the interruptor function.
library(R.utils)
slow.func <- function(x){
Sys.sleep(10)
return(x^2)
}
fast.func <- function(x){
Sys.sleep(2)
return(x*x)
}
interruptor = function(FUN,args, time.limit, ALTFUN){
results <- NULL
results <- evalWithTimeout({FUN(args)},timeout=time.limit,onTimeout="silent")
if(is.null(results)){
results <- ALTFUN(args)
}
return(results)
}
interruptor(FUN = slow.func,args=2,time.limit=3,ALTFUN = fast.func)
version with "R.utils v2.9.2"
library(R.utils)
slow.func <- function(x){
Sys.sleep(4)
return(x^2)
}
fast.func <- function(x){
Sys.sleep(2)
return(x)
}
interruptor <- function(FUN,args, time.limit, ALTFUN){
results <-
tryCatch({
withTimeout({FUN(args)}, timeout=time.limit)
}, error = function(e){
if(grepl("reached elapsed time limit",e$message))
ALTFUN(args) else
paste(e$message,"EXTRACTERROR")
})
if(grepl("EXTRACTERROR",results)){
print(gsub("EXTRACTERROR","",results))
results <- NULL
}
return(results)
}
Depending on the selected time.limit, it executes the first function or the alternative. It returns NULL when there is an error not related to the time limit, and prints the error message.
EXAMPLE:
test_obj <- interruptor(FUN = slow.func, args=5, time.limit= 6, ALTFUN = fast.func)
test_obj
test_obj <- interruptor(FUN = slow.func, args=5, time.limit= 3, ALTFUN = fast.func)
test_obj
test_obj <- interruptor(FUN = slow.func, args="A", time.limit= 6, ALTFUN = fast.func)
test_obj
test_obj <- interruptor(FUN = slow.func, args="A", time.limit= 3, ALTFUN = fast.func)
test_obj
Thanks to andybega for the idea of how to improve the handling of error messages.
The following:
install.packages("quantreg")
require(quantreg)
y=rnorm(10)
x=rnorm(10)
summary(rq(y~x,tau=0.01),se="ker")
Generates the error Error in summary.rq(rq(y ~ x, tau = 0.01), se = "ker") :
tau - h < 0: error in summary.rq.
Say I loop over different y and x 1000 times. I want to be able to know when the error occurs and implement a fix mid-loop.
However, all my attempts to work with summary(rq(y~x,tau=0.01),se="ker") using is() etc. don't get anywhere. I've never worked with this object type before (and Google/SE searches haven't revealed the answer yet).
I want something like is.error(summary(rq(y~x,tau=0.01),se="ker")), which doesn't actually exist.
The following command will return a logical value indicating whether an error occurred:
class(tryCatch(summary(rq(y ~ x,tau = 0.01),se = "ker"),
error = function(e) e))[1] == "simpleError"
You can use replicate instead of a for loop. It is more efficient. In the following example, a list including x, y, and the logical error value is returned. The procedure is replicated two times. You could use n = 1000 to replicate it 1000 times.
replicate(n = 2,
expr = {y <- rnorm(10);
x <- rnorm(10);
error <- class(tryCatch(summary(rq(y ~ x,tau = 0.01),se = "ker"), error = function(e) e))[1] == "simpleError";
return(list(x = x, y = y, error = error))},
simplify = FALSE)
Elaborating on the answer from @SvenHohenstein: one would like to return the result on successful evaluation, not just whether an error occurred. We'd likely also want to return the reason for the error using conditionMessage. We'd like to catch errors of class simpleError, so we write a handler specific to that type of condition. So
error <- FALSE # no error yet
result <- tryCatch({ # result from summary(), or from the error handler
summary(rq(y ~ x,tau = 0.01),se = "ker")
}, simpleError = function(e) { # only catch simpleErrors
error <<- TRUE # yes, error occurred
conditionMessage(e) # 'result' gets error message
})
we'd then return list(x = x, y = y, error=error, result=result).
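Putting the pieces together inside replicate, the whole thing would look roughly like this (a sketch; it assumes quantreg is loaded as in the question):
library(quantreg)
results <- replicate(n = 2, simplify = FALSE, expr = {
  y <- rnorm(10)
  x <- rnorm(10)
  error <- FALSE                  # no error yet
  result <- tryCatch({
    summary(rq(y ~ x, tau = 0.01), se = "ker")
  }, simpleError = function(e) {
    error <<- TRUE                # flag the failure in the enclosing expr
    conditionMessage(e)           # store the message instead of the summary
  })
  list(x = x, y = y, error = error, result = result)
})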