foreach-loop (R/doParallel package) fails with big number of iterations

I have the following R-code:
library(doParallel)
cl <- makeCluster(detectCores()-4, outfile = "")
registerDoParallel(cl)
calc <- function(i){
...
#returns a dataframe
}
system.time(
res<- foreach( i = 1:106800, .verbose = TRUE) %dopar% calc(i)
)
stopCluster(cl)
If I run that code over 1:5, it finishes successfully.
The same happens if I run it over 106000:106800.
But it fails if I run the full range 1:106800, or even 100000:106800 (these are not the exact numbers I am working with, but they are easier to read), with this error message:
...
got results for task 6813
numValues: 6814, numResults: 6813, stopped: TRUE
returning status FALSE
got results for task 6814
numValues: 6814, numResults: 6814, stopped: TRUE
calling combine function
evaluating call object to combine results:
fun(accum, result.6733, result.6734, result.6735, result.6736,
result.6737, result.6738, result.6739, result.6740, result.6741,
result.6742, result.6743, result.6744, result.6745, result.6746,
result.6747, result.6748, result.6749, result.6750, result.6751,
result.6752, result.6753, result.6754, result.6755, result.6756,
result.6757, result.6758, result.6759, result.6760, result.6761,
result.6762, result.6763, result.6764, result.6765, result.6766,
result.6767, result.6768, result.6769, result.6770, result.6771,
result.6772, result.6773, result.6774, result.6775, result.6776,
result.6777, result.6778, result.6779, result.6780, result.6781,
result.6782, result.6783, result.6784, result.6785, result.6786,
result.6787, result.6788, result.6789, result.6790, result.6791,
result.6792, result.6793, result.6794, result.6795, result.6796,
result.6797, result.6798, result.6799, result.6800, result.6801,
result.6802, result.6803, result.6804, result.6805, result.6806,
result.6807, result.6808, result.6809, result.6810, result.6811,
result.6812, result.6813, result.6814)
returning status TRUE
Error in calc(i) :
task 1 failed - "object of type 'S4' is not subsettable"
I have no clue why I get this error message. Unfortunately, I cannot provide a running example as I cannot reproduce it with some simple code. Is a single job failing? If yes, how can I find which one fails? Or any other ideas how to troubleshoot?
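One way to check whether a single task is failing, sketched under the assumption that calc() signals an ordinary R error: foreach() accepts .errorhandling = "pass", which puts the condition object in the result list in place of the failed task's value instead of aborting the whole loop. The calc() below is a hypothetical stand-in that fails on purpose, and the range 1:20 and the two workers are for illustration only.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical stand-in for the real calc(): fails for some inputs.
calc <- function(i) {
  if (i %% 7 == 0) stop("bad input at i = ", i)
  data.frame(i = i, value = i^2)
}

cl <- makeCluster(2)
registerDoParallel(cl)

# "pass" keeps the loop running after a failure and returns the
# condition object as that task's result.
res <- foreach(i = 1:20, .errorhandling = "pass") %dopar% calc(i)

stopCluster(cl)

# Positions in `res` whose value is an error object.
failed <- which(vapply(res, inherits, logical(1), what = "error"))
print(failed)

# Inspect the message of the first failure, if there is one.
if (length(failed) > 0) print(conditionMessage(res[[failed[1]]]))
```

The positions in failed are positions in the result list, so for a range that does not start at 1 the offending value of i is the start of the range plus the position minus one.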

Related

RcallMethod Error While Writing Data Frame to Oracle DB Using Parallel Approach

I am trying to write my data frame to an Oracle DB using an RJDBC connection. I am trying to implement a parallel approach using foreach / parLapply. Here is my code:
Sys.setenv(JAVA_HOME='C:/Program Files/Java/jre1.8.0_181')
library(rJava)
library(RJDBC)
library(DBI)
jdbcDriver =JDBC("oracle.jdbc.OracleDriver",classPath="C:/Program Files/directory/ojdbc6.jar", identifier.quote = "\"")
jdbcConnection =dbConnect(jdbcDriver, "jdbc:oracle:thin:@//XXXX/YYY", "ZZZ", "TTT")
# connected to DB
After this step I get some data from DB. After processing it I want to write the obtained data frame (brand3.merge.u) to another table in Oracle DB. My code is
library(foreach)
library(doParallel)
#setup clusters
cl<-makeCluster(7)
registerDoParallel(cl)
clusterExport(cl, varlist = list("jdbcConnection", "brand3.merge.u"))
foreach(x = 1:length(brand3.merge.u$CELL_PH_NUM), .packages=c( "rJava", "RJDBC", "DBI", "data.table"), .combine = 'c') %dopar% {
rJava::.jinit()
RJDBC::dbSendUpdate(jdbcConnection, "INSERT INTO xxnvdw.an_cust_analytics VALUES(?,?,?,?,?,?,?,?)", brand3.merge.u[x, 1], brand3.merge.u[x,2], brand3.merge.u[x,3],brand3.merge.u[x,4],brand3.merge.u[x,5],brand3.merge.u[x,6],brand3.merge.u[x,7],brand3.merge.u[x,8])
}
I use rJava::.jinit() to avoid JVM error. But now I am getting
Error in { :
task 1 failed - "RcallMethod: attempt to call a method of a NULL object."
error. How can I avoid this error? When I use print function and print my dataframe inside foreach I can get result but dbSendUpdate function yields an error. How can I fix my "do stuff" part of the foreach loop?
NOTE : I have already seen the similar question Error in { : task 3 failed - "RcallMethod: attempt to call a method of a NULL object." but in this question "do stuff" part of the foreach is not given and I already used clusterExport function. So my question is not a duplicated question.
SOLUTION: Thanks to @HenrikB and @F. Privé I solved the problem. For anyone who faces the same problem, my solution is:
foreach(x = 1:iters, .packages=c( "rJava", "RJDBC", "DBI"), .combine = 'c') %dopar% {
rJava::.jinit()
jdbcDriver =JDBC("oracle.jdbc.OracleDriver",classPath="C:/Program Files/directory/ojdbc6.jar", identifier.quote = "\"") # IDENTIFIER.QUOTE!!!!!
jdbcConnection =dbConnect(jdbcDriver, "jdbc:oracle:thin:@//XXXX/YYY", "ZZZ", "TTT")
RJDBC::dbSendUpdate(jdbcConnection, "INSERT INTO xxnvdw.an_cust_analytics VALUES(?,?,?,?,?,?,?,?)", brand3.merge.u[x, 1], brand3.merge.u[x,2], brand3.merge.u[x,3],brand3.merge.u[x,4],brand3.merge.u[x,5],brand3.merge.u[x,6],brand3.merge.u[x,7],brand3.merge.u[x,8])
dbDisconnect(jdbcConnection)
}
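The working version opens and closes a connection for every row. A possible refinement, which is not part of the solution above, is to set the connection up once per worker with parallel::clusterEvalQ() and reuse it in every task. Below is a minimal sketch of that pattern. It uses a plain list as a hypothetical stand-in for the object dbConnect() would return, so that it runs without a database; the name conn is an assumption too.

```r
library(parallel)

cl <- makeCluster(2)

# Runs once on each worker. `conn` is a hypothetical stand-in for the
# connection object; in real code this is where .jinit(), JDBC() and
# dbConnect() would be called.
invisible(clusterEvalQ(cl, {
  conn <- list(pid = Sys.getpid())
  NULL  # return nothing, so the object is not shipped back to the master
}))

# Each task looks up the worker-local `conn` instead of reconnecting.
used <- parLapply(cl, 1:8, function(x) {
  get("conn", envir = globalenv())$pid
})

stopCluster(cl)

# At most two distinct worker processes served all eight tasks.
print(unique(unlist(used)))
```

The point of the design is that the connection object never crosses a process boundary: it is created and used inside the same worker, which is what the original failing code did not do.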

How to show error location in tryCatch?

Displaying error locations with options(show.error.locations = TRUE) doesn't seem to work when handling exceptions with tryCatch. I am trying to display location of the error but I don't know how:
options(show.error.locations = TRUE)
tryCatch({
some_function(...)
}, error = function (e, f, g) {
e <<- e
cat("ERROR: ", e$message, "\nin ")
print(e$call)
})
If I then look at the variable e, the location doesn't seem to be there:
> str(e)
List of 2
$ message: chr "missing value where TRUE/FALSE needed"
$ call : language if (index_smooth == "INDEX") { rescale <- 100/meanMSI[plotbaseyear] ...
- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
If I don't trap the error, it is printed on the console along with source file and line number. How to do it with tryCatch?

Context
As noted by Willem van Doesburg, it is not possible to use the traceback() function to display where the error occurred with tryCatch(), and to my knowledge there is currently no practical way to store the position of the error with base functions in R while using tryCatch.
The idea of a separate error handler
The possible solution I found consists of two parts. The main one is writing an error handler, similar to that of Chrispy from "printing stack trace and continuing after error occurs in R", which produces a log with the position of the error.
The second part is capturing this output into a variable, similarly to what was suggested by Ben Bolker in "is it possible to redirect console output to a variable".
The call stack in R seems to be purged when an error is raised and then handled (I might be wrong, so any information is welcome), hence we need to capture the error while it is occurring.
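The core of the idea fits in a few lines. This is a minimal sketch, not the full handler from this answer: a calling handler installed with withCallingHandlers() runs before the stack unwinds, so sys.calls() still contains the failing calls, while the surrounding tryCatch() handles the error afterwards. The variable name trace_store is made up for the illustration.

```r
f2 <- function(x) if (x == 1) "foo"
f  <- function(x) f2(x)

trace_store <- NULL

result <- tryCatch(
  withCallingHandlers(
    f(NULL),
    error = function(e) {
      # Still inside the failing call chain here, so f() and f2()
      # are both visible in the call stack.
      trace_store <<- limitedLabels(sys.calls())
    }
  ),
  # By the time this handler runs the stack has been unwound.
  error = function(e) conditionMessage(e)
)

print(result)
print(trace_store)
```

When the code is run via source() with options(keep.source = TRUE), limitedLabels() prefixes each entry with the file name and line number, which is where the location information comes from.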
Script with an error
I used an example from one of your previous questions regarding where an R error occurred, with the following function stored in a file called "TestError.R", which I call in my example below:
# TestError.R
f2 <- function(x)
{
if (is.null(x)) "x is Null"
if (x==1) "foo"
}
f <- function(x)
{
f2(x)
}
# The following line will raise an error if executed
f(NULL)
Error tracing function
This is the function I adapted from Chrispy's code, as mentioned above.
Upon execution, if an error is raised, the code underneath prints where the error occurred. For the script above, it prints:
"Error occuring: Test.R#9: f2(x)" and "Error occuring: Test.R#14: f(NULL)", meaning the error results from the f(NULL) call at line 14, which calls the f2() function at line 9.
# Error tracing function
withErrorTracing = function(expr, silentSuccess=FALSE) {
hasFailed = FALSE
messages = list()
warnings = list()
errorTracer = function(obj) {
# Storing the call stack
calls = sys.calls()
calls = calls[seq_len(length(calls) - 1)]
# Keeping the calls only
trace = limitedLabels(c(calls, attr(obj, "calls")))
# Printing the 2nd and 3rd traces that contain the line where the error occurred
# This is the part you might want to edit to suit your needs
print(paste0("Error occuring: ", trace[length(trace):1][2:3]))
# Muffle any redundant output of the same message
optionalRestart = function(r) { res = findRestart(r); if (!is.null(res)) invokeRestart(res) }
optionalRestart("muffleMessage")
optionalRestart("muffleWarning")
}
vexpr = withCallingHandlers(withVisible(expr), error=errorTracer)
if (silentSuccess && !hasFailed) {
cat(paste(warnings, collapse=""))
}
if (vexpr$visible) vexpr$value else invisible(vexpr$value)
}
Storing the error position and the message
We call the script TestError.R above and capture the printed output in a variable, here called errorStorage, which we can process later or simply display.
errorStorage <- capture.output(tryCatch({
withErrorTracing({source("TestError.R")})
}, error = function(e){
e <<- e
cat("ERROR: ", e$message, "\nin ")
print(e$call)
}))
Hence we keep the value of e, with the call and message, as well as the location of the error.
The errorStorage output should be as follows:
[1] "[1] \"Error occuring: Test.R#9: f2(x)\" \"Error occuring: Test.R#14: f(NULL)\""
[2] "ERROR: argument is of length zero "
[3] "in if (x == 1) \"foo\""
Hoping this might help.

You can use traceback() in the error handler to show the call stack. Errors in a tryCatch don't produce line numbers. See also the help on traceback. If you use your tryCatch statements defensively, this will help you narrow down the location of the error.
Here is a working example:
## Example of Showing line-number in Try Catch
# set this variable to "error", "warning" or empty ('') to see the different scenarios
case <- "error"
result <- "init value"
tryCatch({
if( case == "error") {
stop( simpleError("Whoops: error") )
}
if( case == "warning") {
stop( simpleWarning("Whoops: warning") )
}
result <- "My result"
},
warning = function (e) {
print(sprintf("caught Warning: %s", e))
traceback(1, max.lines = 1)
},
error = function(e) {
print(sprintf("caught Error: %s", e))
traceback(1, max.lines = 1)
},
finally = {
print(sprintf("And the result is: %s", result))
})

pmap bounds error: parallel julia

I get a bounds error when running a function in parallel that runs fine normally (sequentially) e.g. when I run:
parallelsol = @time pmap(i -> findividual(x,y,z), 1:50)
It gives me an error:
exception on 2: exception on exception on 16: 20exception on 5: : ERROR: BoundsError()
in getindex at array.jl:246 (repeats 2 times)
But when I run:
parallelsol = @time map(i -> findividual(prodexcint,firstrun,q,r,unempinc,VUnempperm,Indunempperm,i,VUnemp,poachedwagevec, mw,k,Vp,Vnp,reswage), 1:50)
It runs fine. Any ideas as to why this might be happening?

How to continue function when error is thrown in withCallingHandlers in R

I'm writing a test case for an R function that tests whether an error is being thrown and caught correctly at a certain point in the function and I'm having some trouble getting the test to continue when an error is thrown during execution in withCallingHandlers(...). I'm using this approach:
counter <- 0
withCallingHandlers({
testingFunction(df0, df1)
testingFunction(df2, df3)
testingFunction(df4, df5)
}, warning=function(war){
print(paste(war$message))
}, error=function(err){
print(paste(err$message))
if(err$message == paste("The function should throw this error message",
"at the right time.")){
counter <<- counter + 1
}
})
stopifnot(counter == 2)
The problem I'm running into is that the script is exiting after the first error is (successfully) caught and I'm not sure how to handle the error so that after it's caught, withCallingHandlers simply continues onto the next part of its execution. I understand that it has something to do with a restart object but I'm not sure how to use them correctly. Does anyone know how I could manipulate the above code so that execution of withCallingHandlers(...) continues even when an error is caught?

For a test function
fun1 = function() stop("something specific went wrong")
the idiom
obs = tryCatch(fun1(), error=conditionMessage)
exp = "something specific went wrong"
stopifnot(identical(exp, obs))
is maybe a tidier version of Ryan's, and like his avoids the unfortunate case where an error is thrown but for the wrong reason. The same paradigm works for warnings
fun2 = function(x) as.integer(x)
obs = tryCatch(fun2(c("1", "two")), warning=conditionMessage)
stopifnot(identical("NAs introduced by coercion", obs))
and to check for 'clean' evaluation
obs = tryCatch(fun2(c("1", "2")), warning=conditionMessage,
error=conditionMessage)
stopifnot(identical(1:2, obs))
This is ok, provided Sys.getlocale() is "C" or another encoding that doesn't change the translation of the condition messages.

You can just wrap each call to testingFunction with a call to tryCatch:
counter <- 0
testForExpectedError <- function(expr) {
tryCatch(expr, error=function(err) {
print(paste(err$message))
if(err$message == paste("The function should throw this error message",
"at the right time.")){
counter <<- counter + 1
}
})
}
testForExpectedError(testingFunction(df0, df1))
testForExpectedError(testingFunction(df2, df3))
testForExpectedError(testingFunction(df4, df5))
stopifnot(counter == 2)
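Since the question asks about restarts: a calling handler can only resume execution if a restart has been established somewhere below it on the stack. Here is a minimal sketch of that approach, assuming a made-up testingFunction() that errors for negative input; the restart name skipCall is also just an illustration.

```r
expected <- "The function should throw this error message at the right time."

# Hypothetical function under test: errors for negative input.
testingFunction <- function(x) {
  if (x < 0) stop(expected)
  x
}

counter <- 0

withCallingHandlers({
  for (x in c(1, -1, 2, -2)) {
    # Each call gets its own restart point, so the handler below can
    # jump back here and let the loop carry on.
    withRestarts(testingFunction(x), skipCall = function() NULL)
  }
}, error = function(err) {
  if (identical(conditionMessage(err), expected)) counter <<- counter + 1
  # Abandon the failing call and resume after withRestarts().
  invokeRestart("skipCall")
})

stopifnot(counter == 2)
```

Without the withRestarts() wrapper the handler still runs, but once it returns the error continues to propagate and terminates the block, which is the behaviour described in the question.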

overriding R's incomplete error messages

When a call spans multiple lines, the error message only includes the first line of match.call(), resulting in some lost information and an incomplete sentence. A simple example:
#proper error message:
runif(n=1, k=5)
#incomplete error message:
runif(n=1, k={5})
What would be a way to get R to include the full call to the error message (maybe by collapsing the multiple lines or so)? I am mostly interested in using this in a tryCatch setting.

I had a go at investigating the error object in a tryCatch setting via:
tryCatch( runif(n=1,k={5}),
error = function(e) recover() )
And then selected the 4th environment (value[[3]](cond)) to examine e.
I noticed that e$call was:
Browse[1]> e$call
runif(n = 1, k = {
5
})
So it seems that the error message just uses that first line.
You can collapse all the lines together with:
Browse[1]> paste(deparse(e$call),collapse='')
[1] "runif(n = 1, k = { 5})"
So you could try something like:
tryCatch( runif(n=1,k={5}),
error = function(e) {
cat(sprintf('Error in %s: %s\n',
paste(deparse(e$call),collapse=''),
e$message))
} )
But this doesn't fix up the error message itself, just the call leading up to it:
Error in runif(n = 1, k = { 5}): unused argument(s) (k = {
So the 'Error in xxx' is complete, but the 'unused argument(s) xxx' is still not. It's a start, but not all the way there.
I'm not sure how to improve on this (and am also interested to know if it's possible).
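The same idea can be packaged as a reusable handler. This is a sketch only, and the helper name format_condition is made up: it joins the deparsed call into one line and also squeezes any line breaks inside the message text. If R has already truncated the message itself, as in the output above, this cannot restore the missing part.

```r
# Hypothetical helper: format a condition as a single line.
format_condition <- function(e) {
  # deparse() returns one string per line of the call; trim and join them.
  call_text <- paste(trimws(deparse(conditionCall(e))), collapse = " ")
  # The message may contain line breaks as well; squeeze them to spaces.
  msg_text <- gsub("\\s+", " ", conditionMessage(e))
  sprintf("Error in %s: %s", call_text, msg_text)
}

msg <- tryCatch(runif(n = 1, k = {5}), error = format_condition)
cat(msg, "\n")
```

Because the function takes the condition as its only argument, it can be passed directly as the error handler of any tryCatch() call.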
