How to redo tryCatch after error in for loop - r

I am trying to implement tryCatch in a for loop.
The loop is built to download data from a remote server. Sometimes the server stops responding (when the query is big).
I have implemented tryCatch in order to keep the loop going.
I have also added a Sys.sleep() pause so that, if an error occurs, the loop waits a few minutes before sending the next query to the remote server (that part works).
The problem is that I can't figure out how to make the loop redo the query that failed and triggered the tryCatch error (and the Sys.sleep()).
for (i in 1:1000) {
  tmp <- tryCatch({
    download_data(list$tool[i])
  },
  error = function(e) { Sys.sleep(800) })
}
Could you give me some hints?

You can do something like this:
for (i in 1:1000) {
  download_finished <- FALSE
  while (!download_finished) {
    tmp <- tryCatch({
      download_data(list$tool[i])
      download_finished <- TRUE
    },
    error = function(e) { Sys.sleep(800) })
  }
}

If you are certain that waiting 800 seconds always fixes the issue, this change should do it (note that if the single retry fails again, the error will propagate):
for (i in 1:1000) {
  tmp <- tryCatch({
    download_data(list$tool[i])
  },
  error = function(e) {
    Sys.sleep(800)
    download_data(list$tool[i])
  })
}
A more sophisticated approach would be to collect the information about which requests failed and then rerun the script until all requests succeed.
One way to do this is to use the possibly() function from the purrr package. It would look something like this:
library(purrr)

todo <- rep(TRUE, length(list$tool))
res <- list()
while (any(todo)) {
  # possibly() returns NA instead of signalling an error
  res[todo] <- map(list$tool[todo],
                   possibly(download_data, otherwise = NA))
  # redo every element whose result is still the NA placeholder
  todo <- map_lgl(res, ~ isTRUE(is.na(.x)))
}
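If you would rather cap the number of retries instead of looping until success, a small wrapper along these lines could work (a sketch: download_data, list$tool and the 800-second wait come from the question; retry_download and max_attempts are hypothetical names, and the sketch assumes download_data never legitimately returns NULL):

retry_download <- function(tool, max_attempts = 5, wait = 800) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      download_data(tool),
      error = function(e) {
        message("Attempt ", attempt, " failed: ", conditionMessage(e))
        Sys.sleep(wait)  # pause before the next attempt
        NULL
      }
    )
    if (!is.null(result)) return(result)  # success: stop retrying
  }
  stop("all ", max_attempts, " attempts failed")
}

for (i in 1:1000) {
  tmp <- retry_download(list$tool[i])
}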

Related

Execution of future package in R results in an endless waiting time

I have a problem with the future package. In my task I am trying to set up an asynchronous process using futures. If I run my script for the first time (in a clean R session) everything works as expected. Running the same function a second time, within the same R session, ends up in an endless wait: execution stops at the line where the futures are started, no errors are thrown, and the code just runs forever. If I interrupt the code manually, the browser is called from the line
Sys.sleep(interval)
Interrupting a little bit earlier, the call is made from
Called from: socketSelect(list(con), write = FALSE, timeout = timeout)
I have written a small program which has basically the same structure as my script, and the same problem occurs. While not obvious in this little example, this structure has some advantages in my original code:
library(future)
library(parallel)

asynchronousfunction <- function() {
  Threads.2.start <- availableCores()
  cl <- parallel::makePSOCKcluster(Threads.2.start)
  plan(cluster, workers = cl)
  threads <- lapply(1:Threads.2.start, function(index) {
    future::cluster({ Sys.getpid() }, persistent = TRUE, workers = cl[[index]])
  })
  while (!any(resolved(threads))) {
    Sys.sleep(0.1)
  }
  threads <- lapply(1:Threads.2.start, function(index) {
    future::cluster({ Sys.getpid() }, persistent = TRUE, workers = cl[[index]])
  })
  stopCluster(cl = cl)
}

asynchronousfunction() # First call to the function. Everything works fine.
asynchronousfunction() # Second call to the function. Endless execution.
I am working on Windows 10 with R version 3.4.2 and future version 1.6.2.
I hope you guys can help me.
Thanks in advance.
Best regards,
Harvard
Author of future here. It looks like you've tried to overdo it a bit and I am not 100% sure what you're trying to achieve. Things that look suspicious to me are your use of:
cluster() - call future() instead.
cluster(..., workers = cl[[index]]) - don't specify workers when you set up a future.
Is there a reason why you want to use persistent = TRUE?
resolve(threads) basically does the same as your while() loop.
You are not collecting the values of the futures, i.e. you're not calling value() or values().
For troubleshooting, you can get more details on what's going on under the hood by setting options(future.debug = TRUE).
If I rewrite your example staying as close as possible to what you have now, a working version would look like this:
library("future")
asynchronousfunction <- function() {
n <- availableCores()
cl <- makeClusterPSOCK(n)
plan(cluster, workers = cl)
fs <- lapply(1:n, function(index) {
future({ Sys.getpid() }, persistent = TRUE)
})
## Can be replaced by resolve(fs)
while(!any(resolved(fs))) {
Sys.sleep(0.1)
}
fs <- lapply(1:n, function(index) {
future({ Sys.getpid() }, persistent = TRUE)
})
parallel::stopCluster(cl = cl)
}
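Since the answer points out that the values are never collected, here is a minimal sketch of doing that with value(), which is the future API for retrieving a future's result:

fs <- lapply(1:n, function(index) {
  future({ Sys.getpid() })
})
pids <- lapply(fs, value)  # value() blocks until each future is resolved
str(pids)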
Instead of rolling your own lapply() + future(), would it be sufficient for you to use future_lapply()? For example,
asynchronousfunction <- function() {
  n <- availableCores()
  cl <- makeClusterPSOCK(n)
  plan(cluster, workers = cl)
  pids <- future_lapply(1:n, function(ii) {
    Sys.getpid()
  })
  str(pids)
  pids <- future_lapply(1:n, function(ii) {
    Sys.getpid()
  })
  str(pids)
  parallel::stopCluster(cl = cl)
}

TryCatch with parLapply (Parallel package) in R

I am trying to run something on a very large dataset. Basically, I want to loop through all files in a folder and run the function fromJSON on each of them. However, I want it to skip over files that produce an error. I have built a function using tryCatch; however, it only works when I use lapply and not parLapply.
Here is my code for my exception handling function:
readJson <- function(file) {
  require(jsonlite)
  dat <- tryCatch(
    {
      fromJSON(file, flatten = TRUE)
    },
    error = function(cond) {
      message(cond)
      return(NA)
    },
    warning = function(cond) {
      message(cond)
      return(NULL)
    }
  )
  return(dat)
}
and then I call parLapply on a character vector files which contains the full paths to the JSON files:
dat <- parLapply(cl, files, readJson)
This produces an error when it reaches a file that doesn't end properly, and it does not create the list dat by skipping over the problematic file, which is what the readJson function was supposed to mitigate.
When I use regular lapply, however, it works perfectly fine: it prints the error messages but still creates the list by skipping over the erroneous files.
Any ideas on how I could use exception handling with parLapply (parallel package) so that it skips over the problematic files and generates the list?
In your error handler, cond is an error condition. message(cond) signals this condition, which is caught on the workers and transmitted as an error to the master. Either remove the message() calls or replace them with something like
message(conditionMessage(cond))
You won't see anything on the master though, so removing them is probably best.
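Applied to the question's readJson(), the handlers would then just return their fallback value without re-signalling the condition; a sketch:

readJson <- function(file) {
  require(jsonlite)
  tryCatch(
    fromJSON(file, flatten = TRUE),
    error = function(cond) NA,      # return NA for unreadable files
    warning = function(cond) NULL   # return NULL on warnings
  )
}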
What you could do is something like this (with another, reproducible example):
test1 <- function(i) {
  dat <- NA
  try({
    if (runif(1) < 0.8) {
      dat <- rnorm(i)
    } else {
      stop("Error!")
    }
  })
  return(dat)
}

cl <- parallel::makeCluster(3)
dat <- parallel::parLapply(cl, 1:100, test1)
See this related question for other solutions. I think using foreach with .errorhandling = "pass" would be another good solution.
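For completeness, a sketch of that foreach variant; with .errorhandling = "pass", the error object is stored in the corresponding slot of the result list instead of aborting the whole call (the toy task mirrors test1 above):

library(foreach)
library(doParallel)

cl <- parallel::makeCluster(3)
registerDoParallel(cl)

dat <- foreach(i = 1:100, .errorhandling = "pass") %dopar% {
  if (runif(1) < 0.8) rnorm(i) else stop("Error!")
}

parallel::stopCluster(cl)

# elements that failed hold the error condition rather than a value
failed <- vapply(dat, inherits, logical(1), what = "error")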

Handling 404 and other bad responses with looping twitteR queries

I would like my loop not to break when I get an error; it should just move on to the next iteration. This is a minimal example of the error I'm getting and the loop breaking. In my real application I'm iterating through some of the followers I've generated from another script.
library(twitteR)
# set oauth...
for (i in 1:10) {
  x <- getUser("nalegezx")
}
Error in twInterfaceObj$doAPICall(paste("users", "show", sep = "/"), params = params, :
  client error: (404) Not Found
I understand that this loop would simply rewrite the same response to x. I'm just interested in not breaking the loop.
I'm not an expert in the R Twitter API, but I can suggest placing your call to getUser() inside a try block, like this:
for (i in 1:10) {
  x <- try(getUser("sdlfkja"))
}
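If you go with try(), you will usually also want to detect whether the call failed before using x; a sketch:

for (i in 1:10) {
  x <- try(getUser("sdlfkja"), silent = TRUE)
  if (inherits(x, "try-error")) next  # skip this user and move on
  # ... work with x here ...
}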
This should stop your code from crashing in the middle of the loop. If you want to also have separate logic when a warning or error occurs in the loop, you can use tryCatch:
for (i in 1:10) {
  x <- tryCatch(getUser("sdlfkja"),
    warning = function(w) {
      print("warning")
      # handle warning here
    },
    error = function(e) {
      print("error")
      # handle error here
    })
}
I accepted Tim's answer because it resolved the problem I had, but for the specific case of getting profiles for many Twitter users I used lookupUsers, which does the job for me without eating into my request limit.

Clean way to wrap-up and handle RMySQL connections?

I'm fairly new to R, so forgive me if this is an amateur question. I still don't get parts of how the R language works, and I haven't used closures enough to really build intuition on how to approach this problem.
I want to wrap up opening and closing a database connection in my R project in a clean way. I have a variety of scripts set aside that all use a common DB connection configuration file (a local file only, not in my repo), all of which need to connect to the same MySQL database.
The end goal is to do something like:
query <- db_open()
out <- query("select * from example limit 10")
db_close()
This is what I wrote so far (all my scripts load these functions from another .R file):
db_open <- function() {
  db_close()
  db_conn <<- dbConnect(MySQL(), user = db_user, password = db_pass, host = db_host)
  query <- function(...) { dbGetQuery(db_conn, ...) }
  return(query)
}

db_close <- function() {
  result <- tryCatch({
    dbDisconnect(db_conn)
  }, warning = function(w) {
    # ignore
  }, error = function(e) {
    return(FALSE)
  })
  return(result)
}
I'm probably thinking of this in an OOP way when I shouldn't be, but sticking db_conn in the global environment feels unnecessary or even wrong.
Is this a reasonable way to accomplish what I want? Is there a better way that I'm missing here?
Any advice is appreciated.
You basically had it; you just need to move the query function out into its own function. Regarding keeping db_conn, there really is no reason not to have it in the global environment.
db_open <- function() {
  db_close()
  db_conn <<- dbConnect(MySQL(), user = 'root', password = 'Use14Characters!',
                        dbname = 'msdb_complex', host = 'localhost')
}

db_close <- function() {
  result <- tryCatch({
    dbDisconnect(db_conn)
  }, warning = function(w) {
    # ignore
  }, error = function(e) {
    return(FALSE)
  })
  return(result)
}

query <- function(x, num = -1) {
  q <- dbSendQuery(db_conn, x)
  fetch(q, num)
}
Then you should be able to do something like:
query <- db_open()
results <- query("SELECT * FROM msenrollmentlog", 10)
db_close()
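If the global db_conn still bothers you, one alternative is to keep the connection inside a closure and return the query and close functions together; a sketch (assuming the same RMySQL setup and the db_user/db_pass/db_host variables from the question):

db_open <- function() {
  conn <- dbConnect(MySQL(), user = db_user, password = db_pass, host = db_host)
  list(
    query = function(x, num = -1) {
      q <- dbSendQuery(conn, x)
      on.exit(dbClearResult(q))  # release the result set when done
      fetch(q, num)
    },
    close = function() dbDisconnect(conn)
  )
}

db <- db_open()
results <- db$query("SELECT * FROM msenrollmentlog", 10)
db$close()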

Stopping an R script quietly and return control to the terminal [duplicate]

Is there any way to stop an R program without an error?
For example, I have a big source file defining several functions, followed by some calls to those functions. It happens that I edit some function and want the function definitions to be updated in the R environment, but without actually calling them.
I defined a variable justUpdate and, when it is TRUE, I want to stop the program just after the function definitions.
ReadInput <- function(...) ...
Analyze <- function(...) ...
WriteOutput <- function(...) ...

if (justUpdate)
  stop()

# main body
x <- ReadInput()
y <- Analyze(x)
WriteOutput(y)
I have called the stop() function, but the problem is that it prints an error message.
Ctrl+C is another option, but I want to stop the source at a specific line.
The problem with q() or quit() is that it terminates the whole R session, whereas I would like to keep the session open.
As @JoshuaUlrich proposed, browser() can be another option, but it is still not perfect, because the source stops in a new environment (i.e. the R prompt changes to Browser[1]> rather than >). We can press Q to quit it, but I am looking for a more straightforward way.
Another option is if (!justUpdate) { main body }, but that merely sidesteps the problem rather than solving it.
Is there any better option?
I found a rather neat solution here. The trick is to turn off all error messages just before calling stop(). The function on.exit() is used to make sure that error messages are turned on again afterwards. The function looks like this:
stop_quietly <- function() {
  opt <- options(show.error.messages = FALSE)
  on.exit(options(opt))
  stop()
}
The first line turns off error messages and stores the old setting in the variable opt. After this line, any error that occurs will not output a message, and therefore stop() will not cause any message to be printed either.
According to the R help,
on.exit records the expression given as its argument as needing to be executed when the current function exits.
The current function is stop_quietly() and it exits when stop() is called. So the last thing the program does is call options(opt), which resets show.error.messages to the value it had before stop_quietly() was called (presumably, but not necessarily, TRUE).
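Applied to the original question, the guard then becomes:

if (justUpdate)
  stop_quietly()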
There is a nice solution in a mailing list here that defines a stopQuietly function that basically hides the error raised by stop:
stopQuietly <- function(...) {
  blankMsg <- sprintf("\r%s\r", paste(rep(" ", getOption("width") - 1L), collapse = " "))
  stop(simpleError(blankMsg))
} # stopQuietly()

> stopQuietly()
I had a similar problem and, based on @VangelisTasoulas's answer, I got a simple solution.
Inside functions, I have to check whether the DB is updated. If it is not, stop the execution:
r <- readline(prompt = "Is DB updated? (y/n) ")
if (r != 'y') stop('\r Update DB')
Putting \r at the beginning of the message overwrites the Error: prefix in the output, so all you see is Update DB.
You're looking for the function browser.
You can use the following to stop an R program without an error (note that return() only works inside a function, so this assumes the script body is wrapped in one):
if (justUpdate)
  return(cat(".. Your Message .. "))
Just return something at the line where you want to quit the function:
f <- function(x, dry = FALSE) {
  message("hi1")
  if (dry) return(x)
  message("hi2")
  x <- 2 * x
}
y1 <- f(2)              # = 4, prints hi1 hi2
y2 <- f(2, dry = TRUE)  # = 2, prints only hi1
In addition to Stibu's answer from Mar 22 '17: if you want to pass a message as part of stop(), that message is not written. I find it strange that the following two lines have to be used separately, meaning that nesting them as on.exit(options(options(show.error.messages = FALSE))) does not work:
opt <- options(show.error.messages = FALSE)
on.exit(options(opt))
I had forgotten the answer to this, needed to look it up, and landed here... You posted the hint to the answer in your question:
ctrl+c is another option, but I want to stop the source in specific line.
Signal an error, warning, or message:
rlang::inform("Updated Only")
rlang::interrupt()
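Applied to the question, this could look like (a sketch):

if (justUpdate) {
  rlang::inform("Updated Only")  # prints a message without signalling an error
  rlang::interrupt()             # stops execution like Ctrl+C, with no error message
}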
I've found it useful to write a script and run it with source(). In the script, I write exit statements as a special class of error that a tryCatch() can pick up and send back as just a message:
exit <- function(..., .cl = NULL) {
  # Use to capture an acceptable stop
  cond <- structure(
    list(.makeMessage(...), .cl),
    class = c("exitError", "error", "condition"),
    names = c("message", "call")
  )
  stop(cond)
}

foo <- function() {
  exit("quit here")
  1
}

tryCatch(
  # rather than foo(), you might use source(filename)
  foo(),
  exitError = function(e) message(e$message)
)
#> quit here

Created on 2022-01-24 by the reprex package (v2.0.1)
You can use with_options() from the withr package to temporarily disable error messages and then call stop() directly.
Here is an example:
weird_math <- function(x, y, z) {
  if (x > z) {
    withr::with_options(
      list(show.error.messages = FALSE),
      {
        print("You can execute other code here if you want")
        stop()
      }
    )
  }
  # only runs if x <= z
  x + y ^ z
}

weird_math(1, 2, 3)
[1] 9
weird_math(3, 2, 1)
[1] "You can execute other code here if you want"
Why not just use an if () {} else {}? It's only a couple of characters...
f1 <- function() {}
f2 <- function() {}

if (justUpdate) {
} else {
  # main body
}

or even

f1 <- function() {}
f2 <- function() {}

if (!justUpdate) {
  # main body
}
The code below works for me; it stops without an error message (the silenced error signalled by break outside a loop is what halts execution):
opt <- options(show.error.messages = FALSE)
on.exit(options(opt))
break
