TryCatch with parLapply (Parallel package) in R - r

I am trying to run something on a very large dataset. Basically, I want to loop through all files in a folder and run the function fromJSON on it. However, I want it to skip over files that produce an error. I have built a function using tryCatch however, that only works when i use the function lappy and not parLapply.
Here is my code for my exception handling function:
readJson <- function (file) {
require(jsonlite)
dat <- tryCatch(
{
fromJSON(file, flatten=TRUE)
},
error = function(cond) {
message(cond)
return(NA)
},
warning = function(cond) {
message(cond)
return(NULL)
}
)
return(dat)
}
and then I call parLapply on a character vector files which contains the full paths to the JSON files:
dat<- parLapply(cl,files,readJson)
that produces an error when it reaches a file that doesn't end properly and does not create the list 'dat' by skipping over the problematic file. Which is what the readJson function was supposed to mitigate.
When I use regular lapply, however it works perfectly fine. It generates the errors, however, it still creates the list by skipping over the erroneous file.
any ideas on how I could use exception handling with parLappy parallel such that it will skip over the problematic files and generate the list?

In your error handler function cond is an error condition. message(cond) signals this condition, which is caught on the workers and transmitted as an error to the master. Either remove the message calls or replace them with something like
message(conditionMessage(cond))
You won't see anything on the master though, so removing is probably best.

What you could do is something like this (with another example, reproducible):
test1 <- function(i) {
dat <- NA
try({
if (runif(1) < 0.8) {
dat <- rnorm(i)
} else {
stop("Error!")
}
})
return(dat)
}
cl <- parallel::makeCluster(3)
dat <- parallel::parLapply(cl, 1:100, test1)
See this related question for other solutions. I think using foreach with .errorhandling = "pass" would be another good solution.

Related

How to redo tryCatch after error in for loop

I am trying to implement tryCatch in a for loop.
The loop is built to download data from a remote server. Sometimes the server no more responds (when the query is big).
I have implemented tryCatch in order to make the loop keeping.
I also have added a sys.sleep() pause if an error occurs in order to wait some minutes before sending next query to the remote server (it works).
The problem is that I don't figure out how to ask the loop to redo the query that failed and lead to a tryCatch error (and to sys.sleep()).
for(i in 1:1000){
tmp <- tryCatch({download_data(list$tool[i])},
error = function(e) {Sys.sleep(800)})
}
Could you give me some hints?
You can do something like this:
for(i in 1:1000){
download_finished <- FALSE
while(!download_finished) {
tmp <- tryCatch({
download_data(list$tool[i])
download_finished <- TRUE
},
error = function(e) {Sys.sleep(800)})
}
}
If you are certain that waiting for 800 seconds always fixes the issue this change should do it.
for(i in 1:1000) {
tmp <- tryCatch({
download_data(list$tool[i])
},
error = function(e) {
Sys.sleep(800)
download_data(list$tool[i])
})
}
A more sophisticated approach could be, to collect the information of which request failed and then rerun the script until all requests succeed.
One way to do this is to use the possibly() function from the purrr package. It would look something like this:
todo <- rep(TRUE, length(list$tool))
res <- list()
while (any(todo)) {
res[todo] <- map(list$tool[todo],
possibly(download_data, otherwise = NA))
todo <- map_lgl(res, ~ is.na(.))
}

How can I use error handling when reading xlsx files in R?

I am trying to read 23 excel files, store each in a list, and then rbind them to one csv. Some of these file are csv and some of them are xlsx. However, I got the following message:
Error: Can't establish that the input is either xls or xlsx.
So I want to identify which ones are giving error and then append it manually.
My function is the following:
make_df<-function(filename){
library(readxl)
library(foreign)
if (str_sub(filename,-3,-1) == "csv"){
df<-read.csv(filename,fileEncoding="latin1")
}
else{
df<-read_excel(filename)
}
return(df)
}
filenames_vector<-list.files(# directory)
datalist = list()
for (i in 1:23){
datalist[[i]] <- make_df(filenames_vector[i])
}
mega_data = do.call(rbind,datalist)
How can I add something in make_df to print out the names of files that are causing the error message? Also, is there another work around, when the the error message is on not being able to distinguish xlsx from xls?
This can be done with a tryCatch block. Without example data it's a little hard to recreate. I'm not sure what you mean in your second question.
Try the code below to catch errors and print out the filename if there's an error, otherwise return the df object.
make_df<-function(filename){
library(readxl)
library(foreign)
df = tryCatch(
{ # try block
if (str_sub(filename,-3,-1) == "csv"){
df<-read.csv(filename,fileEncoding="latin1")
}
else{
df<-read_excel(filename)
}
},
error=function(cond){return(filename)} # grab the filename if there was an error
)
if (class(df) == 'character') {
print(df)
} else{return(df)}
}

failed - "cannot open the connection"

I am trying to read several url files. Does anyone know how to check first if it can open the url and then do something? Sometimes I am getting error (failed="cannot open the connection"). I just want this to skip if it cannot open the connection.
urlAdd=paste0(server,siteID,'.dly')
# Reading the whole data in the page
if(url(urlAdd)) {
tmp <- read.fwf(urlAdd,widths=c(11,4,2,4,rep(c(5,1,1,1),31)))
}
But this condition fails.
You can use tryCatch which returns the value of the expression if it succeeds, and the value of the error argument if there is an error. See ?tryCatch.
This example looks up a bunch of URLs and downloads them. The tryCatch will return the result of readlines if it is successful, and NULL if not. If the result is NULL we just next() to the next part of the loop.
urls <- c('http://google.com', 'http://nonexistent.jfkldasf', 'http://stackoverflow.com')
for (u in urls) {
# I only put warn=F to avoid the "incomplete final line" warning
# you put read.fwf or whatever here.
tmp <- tryCatch(readLines(url(u), warn=F),
error = function (e) NULL)
if (is.null(tmp)) {
# you might want to put some informative message here.
next() # skip to the next url.
}
}
Note this will do so on any error, not just a "404 not found"-type error. If I typo'd and wrote tryCatch(raedlines(url(u), warn=F) (typo on readLines) it'd just skip everything as this would also through an error.
edit re: comments (lapply is being used, where to put data-processing code). Instead of next(), just only do your processing if the read succeeds. Put data-processing code after reading-data code. Try something like:
lapply(urls,
function (u) {
tmp <- tryCatch(read.fwf(...), error = function (e) NULL)
if (is.null(tmp)) {
# read failed
return() # or return whatever you want the failure value to be
}
# data processing code goes here.
})
The above returns out of the function (only affects the current element of lapply) if the read fails.
Or you could invert it and do something like:
lapply(urls,
function (u) {
tmp <- tryCatch(read.fwf(...), error = function (e) NULL)
if (!is.null(tmp)) {
# read succeeded!
# data processing code goes here.
}
})
which will do the same (it only does your data processing code if the read succeeded, and otherwise skips that whole block of code and returns NULL).

Stopping an R script quietly and return control to the terminal [duplicate]

Is there any way to stop an R program without error?
For example I have a big source, defining several functions and after it there are some calls to the functions. It happens that I edit some function, and want the function definitions to be updated in R environment, but they are not actually called.
I defined a variable justUpdate and when it is TRUE want to stop the program just after function definitions.
ReadInput <- function(...) ...
Analyze <- function(...) ...
WriteOutput <- function(...) ...
if (justUpdate)
stop()
# main body
x <- ReadInput()
y <- Analyze(x)
WriteOutput(y)
I have called stop() function, but the problem is that it prints an error message.
ctrl+c is another option, but I want to stop the source in specific line.
The problem with q() or quit() is that it terminates R session, but I would like to have the R session still open.
As #JoshuaUlrich proposed browser() can be another option, but still not perfect, because the source terminates in a new environment (i.e. the R prompt will change to Browser[1]> rather than >). Still we can press Q to quit it, but I am looking for the straightforward way.
Another option is to use if (! justUpdate) { main body } but it's clearing the problem, not solving it.
Is there any better option?
I found a rather neat solution here. The trick is to turn off all error messages just before calling stop(). The function on.exit() is used to make sure that error messages are turned on again afterwards. The function looks like this:
stop_quietly <- function() {
opt <- options(show.error.messages = FALSE)
on.exit(options(opt))
stop()
}
The first line turns off error messages and stores the old setting to the variable opt. After this line, any error that occurs will not output a message and therfore, also stop() will not cause any message to be printed.
According to the R help,
on.exit records the expression given as its argument as needing to be executed when the current function exits.
The current function is stop_quietly() and it exits when stop() is called. So the last thing that the program does is call options(opt) which will set show.error.messages to the value it had, before stop_quietly() was called (presumably, but not necessarily, TRUE).
There is a nice solution in a mailing list here that defines a stopQuietly function that basically hides the error shown from the stop function:
stopQuietly <- function(...) {
blankMsg <- sprintf("\r%s\r", paste(rep(" ", getOption("width")-1L), collapse=" "));
stop(simpleError(blankMsg));
} # stopQuietly()
> stopQuietly()
I have a similar problem and, based on #VangelisTasoulas answer, I got a simple solution.
Inside functions, I have to check if DB is updated. If it is not, stop the execution.
r=readline(prompt="Is DB updated?(y/n)")
Is DB updated?(y/n)n
if(r != 'y') stop('\r Update DB')
Update DB
Just putting \r in the beginning of the message, overwrite Error: in the message.
You're looking for the function browser.
You can use the following solution to stop an R program without error:
if (justUpdate)
return(cat(".. Your Message .. "))
Just return something at the line you want to quit the function:
f <- function(x, dry=F) {
message("hi1")
if (dry) return(x)
message("hi2")
x <- 2*x
}
y1 <- f(2) # = 4 hi1 hi2
y2 <- f(2, dry=T) # = 2 hi1
In addition to answer from Stibu on Mar 22 '17 at 7:29, if you want to write a message as a part of stop(), this message is not written.
I perceive strange that following two lines have to be used meaning on.exit(options(options(show....))) doesn't work.
opt <- options(show.error.messages = F)
on.exit(options(opt))
I had forgotten the answer to this and needed to look it up and landed here... You posted the hint to the answer in your question...
ctrl+c is another option, but I want to stop the source in specific line.
Signal an error, warning, or message
rlang::inform("Updated Only")
rlang::interrupt()
I've found it good to write a script and run it with source(). In the script, a write exit statements as a special class of error that a tryCatch() can pick up and send back as just a message:
exit <- function(..., .cl = NULL) {
# Use to capture acceptable stop
cond <- structure(
list(.makeMessage(...), .cl),
class = c("exitError", "error", "condition"),
names = c("message", "call")
)
stop(cond)
}
foo <- function() {
exit("quit here")
1
}
tryCatch(
# rather than foo(), you might use source(filename)
foo(),
exitError = function(e) message(e$message)
)
#> quit here
Created on 2022-01-24 by the reprex package (v2.0.1)
You can use with_options() in the withr package to temporarily disable error messages and then you can call stop() directly.
Here is an example:
weird_math <- function(x, y, z) {
if (x > z) {
withr::with_options(
list(show.error.messages = FALSE),
{
print("You can execute other code here if you want")
stop()
}
)
}
# only runs if x <= z
x + y ^ z
}
weird_math(1, 2, 3)
[1] 9
weird_math(3, 2, 1)
[1] "You can execute other code here if you want"
why not just use an if () {} else {}? It's only a couple of characters...
f1 <- function(){}
f2 <- function(){}
if (justUpdate) {
} else {
# main body
}
or even
f1 <- function(){}
f2 <- function(){}
if (!justUpdate) {
# main body
}
The below code work for me stopped without error messages.
opt <- options(show.error.messages = FALSE)
on.exit(options(opt))
break

Stop an R program without error

Is there any way to stop an R program without error?
For example I have a big source, defining several functions and after it there are some calls to the functions. It happens that I edit some function, and want the function definitions to be updated in R environment, but they are not actually called.
I defined a variable justUpdate and when it is TRUE want to stop the program just after function definitions.
ReadInput <- function(...) ...
Analyze <- function(...) ...
WriteOutput <- function(...) ...
if (justUpdate)
stop()
# main body
x <- ReadInput()
y <- Analyze(x)
WriteOutput(y)
I have called stop() function, but the problem is that it prints an error message.
ctrl+c is another option, but I want to stop the source in specific line.
The problem with q() or quit() is that it terminates R session, but I would like to have the R session still open.
As #JoshuaUlrich proposed browser() can be another option, but still not perfect, because the source terminates in a new environment (i.e. the R prompt will change to Browser[1]> rather than >). Still we can press Q to quit it, but I am looking for the straightforward way.
Another option is to use if (! justUpdate) { main body } but it's clearing the problem, not solving it.
Is there any better option?
I found a rather neat solution here. The trick is to turn off all error messages just before calling stop(). The function on.exit() is used to make sure that error messages are turned on again afterwards. The function looks like this:
stop_quietly <- function() {
opt <- options(show.error.messages = FALSE)
on.exit(options(opt))
stop()
}
The first line turns off error messages and stores the old setting to the variable opt. After this line, any error that occurs will not output a message and therfore, also stop() will not cause any message to be printed.
According to the R help,
on.exit records the expression given as its argument as needing to be executed when the current function exits.
The current function is stop_quietly() and it exits when stop() is called. So the last thing that the program does is call options(opt) which will set show.error.messages to the value it had, before stop_quietly() was called (presumably, but not necessarily, TRUE).
There is a nice solution in a mailing list here that defines a stopQuietly function that basically hides the error shown from the stop function:
stopQuietly <- function(...) {
blankMsg <- sprintf("\r%s\r", paste(rep(" ", getOption("width")-1L), collapse=" "));
stop(simpleError(blankMsg));
} # stopQuietly()
> stopQuietly()
I have a similar problem and, based on #VangelisTasoulas answer, I got a simple solution.
Inside functions, I have to check if DB is updated. If it is not, stop the execution.
r=readline(prompt="Is DB updated?(y/n)")
Is DB updated?(y/n)n
if(r != 'y') stop('\r Update DB')
Update DB
Just putting \r in the beginning of the message, overwrite Error: in the message.
You're looking for the function browser.
You can use the following solution to stop an R program without error:
if (justUpdate)
return(cat(".. Your Message .. "))
Just return something at the line you want to quit the function:
f <- function(x, dry=F) {
message("hi1")
if (dry) return(x)
message("hi2")
x <- 2*x
}
y1 <- f(2) # = 4 hi1 hi2
y2 <- f(2, dry=T) # = 2 hi1
In addition to answer from Stibu on Mar 22 '17 at 7:29, if you want to write a message as a part of stop(), this message is not written.
I perceive strange that following two lines have to be used meaning on.exit(options(options(show....))) doesn't work.
opt <- options(show.error.messages = F)
on.exit(options(opt))
I had forgotten the answer to this and needed to look it up and landed here... You posted the hint to the answer in your question...
ctrl+c is another option, but I want to stop the source in specific line.
Signal an error, warning, or message
rlang::inform("Updated Only")
rlang::interrupt()
I've found it good to write a script and run it with source(). In the script, a write exit statements as a special class of error that a tryCatch() can pick up and send back as just a message:
exit <- function(..., .cl = NULL) {
# Use to capture acceptable stop
cond <- structure(
list(.makeMessage(...), .cl),
class = c("exitError", "error", "condition"),
names = c("message", "call")
)
stop(cond)
}
foo <- function() {
exit("quit here")
1
}
tryCatch(
# rather than foo(), you might use source(filename)
foo(),
exitError = function(e) message(e$message)
)
#> quit here
Created on 2022-01-24 by the reprex package (v2.0.1)
You can use with_options() in the withr package to temporarily disable error messages and then you can call stop() directly.
Here is an example:
weird_math <- function(x, y, z) {
if (x > z) {
withr::with_options(
list(show.error.messages = FALSE),
{
print("You can execute other code here if you want")
stop()
}
)
}
# only runs if x <= z
x + y ^ z
}
weird_math(1, 2, 3)
[1] 9
weird_math(3, 2, 1)
[1] "You can execute other code here if you want"
why not just use an if () {} else {}? It's only a couple of characters...
f1 <- function(){}
f2 <- function(){}
if (justUpdate) {
} else {
# main body
}
or even
f1 <- function(){}
f2 <- function(){}
if (!justUpdate) {
# main body
}
The below code work for me stopped without error messages.
opt <- options(show.error.messages = FALSE)
on.exit(options(opt))
break

Resources