How to handle API errors in a foreach loop in R?

FYI, based on some comments I added more information.
I created the following function that is making a call to an API:
library(rjson)
keyword_checker <- function(keyword, domain, loc, lang){
  keyword_to_check <- as.character(keyword)
  # build the request URL from the query parameters
  api_request <- paste("https://script.google.com/blabalbalba",
                       "?kw=", keyword_to_check,
                       "&domain=", domain,
                       "&loc=", loc,
                       "&lang=", lang, sep = "")
  api_request <- URLencode(api_request, repeated = TRUE)
  source <- fromJSON(file = api_request) # parse the JSON response
  # keep only the "result" data and bind it into a data frame
  return(data.frame(do.call("rbind", source$data$result)))
}
I am using the R package foreach() with %dopar% and doSNOW to do many API calls (more than 120k calls).
Unfortunately, some calls fail (usually with a connection timeout), which stops the script. To avoid this I set .errorhandling = 'pass'. Now the script no longer stops, but is there a way to repeat an API call until I get an answer?
Here is my script:
cl <- makeCluster(9)
registerDoSNOW(cl)
final_urls_checker <- foreach(i = 1:length(mes_urls_to_check), .combine=rbind, .errorhandling = 'pass', .packages='rjson') %dopar% {
test_keyword <- as.character(mes_urls_to_check[i])
results <- indexed_url(test_keyword)} ##name of my function
##Stop cluster
stopCluster(cl)
I basically want my script to continue (without stopping the whole process) until I get the answer from the API call.
Do I need to use tryCatch inside the foreach loop, or is it better to "upgrade" the function I created by adding something like "if the API doesn't answer, wait until it does"?
I hope this is clearer.

Try using tryCatch inside the foreach function to catch the expected error messages (here failed API call due to time out). Below is a sample code snippet for the given function keyword_checker, based on my understanding.
library(foreach)
library(doSNOW)
cl <- makeCluster(9)
registerDoSNOW(cl)
final_urls_checker <- foreach(i = seq_along(mes_urls_to_check), .combine = rbind, .errorhandling = 'pass',
                              .packages = 'rjson') %dopar% {
  test_keyword <- as.character(mes_urls_to_check[i])
  # the value of tryCatch() is the value of this iteration
  tryCatch(
    {
      keyword_checker(test_keyword)
    },
    error = function(cond) {
      message("Timeout error! Calling again...")
      # retry once; a second failure is passed on by .errorhandling = 'pass'
      keyword_checker(test_keyword)
    },
    warning = function(cond) {
      message("Warning message:")
      message(conditionMessage(cond))
      NULL
    },
    finally = {
      # runs whether or not the call succeeded
      message(paste("Finished API call for", test_keyword))
    }
  )
}
## Stop cluster
stopCluster(cl)
Here's a link which explains how to write tryCatch. Note that this snippet may not work exactly as written, since I didn't run the code block, but calling the API again when it fails should do the job.
Check this link for a discussion of a similar issue.

Here is an updated script that includes tryCatch directly in the function.
indexed_url <- function(url){
  url_to_check <- as.character(url)
  api_request <- paste("https://script.google.com/macros/blablabalbalbaexec",
                       "?page=", url_to_check, sep = "")
  api_request <- URLencode(api_request, repeated = TRUE)
  # Parsing and conversion both sit inside tryCatch(), so a retried call
  # returns its finished data frame instead of being converted a second time.
  tryCatch({
    source <- fromJSON(file = api_request) # parse the JSON response
    data.frame(do.call("rbind", source))
  }, error = function(e) {
    cat(paste0("An error occurred: ", conditionMessage(e), "\n"))
    Sys.sleep(1)
    indexed_url(url) # retry after a short pause
  })
}
Then running the foreach just the way it was is working perfectly. No more errors, and I have the full analysis.
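The recursive retry in indexed_url keeps calling itself for as long as the API keeps failing. A bounded variant can be sketched as below. This is only an illustration under assumptions: with_retry, max_tries, wait and the flaky stand-in are hypothetical names that do not appear in the original thread, and the stand-in simulates the API instead of calling it.

```r
# Hypothetical helper: call f() up to max_tries times, pausing between attempts.
with_retry <- function(f, max_tries = 5, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    # Turn an error into a value so the loop can inspect it
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message(sprintf("Attempt %d of %d failed: %s",
                    attempt, max_tries, conditionMessage(result)))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("all ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Stand-in for the API call (an assumption): fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated timeout")
  data.frame(url = "example", indexed = TRUE)
}

res <- with_retry(flaky, max_tries = 5, wait = 0)
```

Inside indexed_url the wrapped call would presumably look like with_retry(function() fromJSON(file = api_request)). Because foreach only exports the variables it finds in the loop body, the helper may also need to be listed in the .export argument so the workers can see it.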

Related

Assign variables to the global environment in a parallel loop

I am doing some heavy computations which I would like to speed up by performing it in a parallel loop. Moreover, I want the result of each calculation to be assigned to the global environment based on the name of the data currently processed:
fun <- function(arg) {
assign(arg, arg, envir = .GlobalEnv)
}
For loop
In a simple for loop, that would be the following and this works just fine:
for_fun <- function() {
data <- letters[1:10]
for(i in 1:length(data)) {
dat <- quote(data[i])
call <- call("fun", dat)
eval(call)
}
}
# Works as expected
for_fun()
In this function, I first get some data, loop over it, quote it (although not necessary) to be used in a function call. In reality, this function name is also dynamic which is why I am doing it this way.
Foreach
Now, I want to speed this up. My first thought was to use the foreach package (with a doParallel backend):
foreach_fun <- function() {
# Set up parallel backend
cl <- parallel::makeCluster(parallel::detectCores())
doParallel::registerDoParallel(cl)
data <- letters[1:10]
foreach(i = 1:length(data)) %dopar% {
dat <- quote(data[i])
call <- call("fun", dat)
eval(call)
}
# Stop the parallel backend
parallel::stopCluster(cl)
doParallel::stopImplicitCluster()
}
# Error in { : task 1 failed - "could not find function "fun""
foreach_fun()
Replacing the whole quote-call-eval procedure with simply fun(data[i]) resolves the error but still nothing gets assigned.
Future
To ensure it wasn't a problem with the foreach package, I also tried the future package (although I am not familiar with it).
future_fun <- function() {
# Plan a parallel future
cl <- parallel::makeCluster(parallel::detectCores())
future::plan(cluster, workers = cl)
data <- letters[1:10]
# Create an explicit future
future(expr = {
for(i in 1:length(data)) {
dat <- quote(data[i])
call <- call("fun", dat)
eval(call)
}
})
# Stop the parallel future
parallel::stopCluster(cl)
future::plan(sequential)
}
# No errors but nothing assigned
# probably the future was never evaluated
future_fun()
Forcing the future to be evaluated (f <- future(...); value(f)) triggers the same error as by using foreach: Error in { : task 1 failed - "could not find function "fun""
Summary
In short, my questions are:
How do you assign variables to the global environment in a parallel loop?
Why does the function lookup fail?
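A likely explanation for both points: with a cluster backend each worker is a separate R process, so assign(..., envir = .GlobalEnv) writes into the worker's global environment, not the master's, and a function that is only named in a string, as in call("fun", dat), is not shipped to the workers automatically. One common workaround is to return the values from the workers and assign them on the master. The sketch below uses the base parallel package instead of foreach or future; compute_one is a hypothetical stand-in for the real computation.

```r
library(parallel)

# Hypothetical worker function: returns a value instead of assigning it.
compute_one <- function(arg) toupper(arg)

data <- letters[1:10]

cl <- makeCluster(2)
# The function object itself is sent to the workers, so no lookup by name is needed
results <- parLapply(cl, data, compute_one)
stopCluster(cl)

# Assign on the master, under the name of the item that was processed
names(results) <- data
list2env(results, envir = .GlobalEnv)
```

After this runs, the master's global environment holds one variable per element of data (a, b, ..., j). With foreach the same idea applies: let the loop body return the value and do the assignment after the loop has finished.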

tryRetry function in R?

I am looking for a tryCatch function in R that would retry n times instead of just once. One of my web requests occasionally fails to return a value when the server is busy, but after one or two retries it usually works fine.
The excellent page How to write trycatch in R does not touch on this topic. I found the function TryRetry in C (originally discussed in TryRetry - Try, Catch, then Retry), which accomplishes what I was looking for, and I thought a similar function might exist in some R package too.
Unfortunately, I don't have the skills to abstract an R code structure from the C example. I could just call my function again in the error-handling portion of the tryCatch, but somehow this seems the wrong way to go, especially once you deal with more than one retry.
Any suggestions on how to approach a tryRetry-code structure in R would be appreciated.
You can implement a retry logic by relying on the RETRY method from the httr package and parsing the response in a second step.
In order to apply it to file download I would go down the following path (using this hosted .csv file as an example):
library(httr)
library(dplyr)
df <- RETRY(
"GET",
url = "https://www.stats.govt.nz/assets/Uploads/Business-operations-survey/Business-operations-survey-2018/Download-data/business-operations-survey-2018-business-finance-csv.csv",
times = 3) %>% # max retry attempts
content(., "parsed")
Here is a way of having a web read request tried several times before failing. It's an adaptation of the post linked to in the question, called in a loop a number of times chosen by the user. Between tries there is a Sys.sleep, defaulting to 3 seconds.
I repost the function readUrl with changes and with many comments deleted; they are in the original code.
readUrl <- function(url) {
out <- tryCatch(
{
message("This is the 'try' part")
text <- readLines(con=url, warn=FALSE)
return(list(ok = TRUE, contents = text))
},
error=function(cond) {
message(paste("URL does not seem to exist:", url))
message("Here's the original error message:")
message(paste(cond, "\n"))
# Choose a return value in case of error
return(list(ok = FALSE, contents = cond))
},
warning=function(cond) {
message(paste("URL caused a warning:", url))
message("Here's the original warning message:")
message(paste(cond, "\n"))
# Choose a return value in case of warning
return(list(ok = FALSE, contents = cond))
},
finally={
message(paste("Processed URL:", url))
message("Some other message at the end")
}
)
return(out)
}
readUrlRetry <- function(url, times = 1, secs = 3){
count <- 0L
while(count < times){
res <- readUrl(url)
count <- count + 1L
OK <- res$ok
if(OK) break
Sys.sleep(time = secs)
}
res
}
url <- c(
"http://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html",
"http://en.wikipedia.org/wiki/Xz",
"xxxxx")
res <- lapply(url, readUrlRetry, times = 3)
res[[3]]
inherits(res[[3]]$contents, "warning")

Using tryCatch to skip execution upon error without exiting lapply()

I am trying to write a function that cleans spreadsheets. However, some of the spreadsheets are corrupted and will not open. I want the function to recognize this, print an error message, skip execution of the rest of the function, and continue with the next file (I am using lapply() to iterate across files). My current attempt looks like this:
candidate.cleaner <- function(filename){
#this function cleans candidate data spreadsheets into an R dataframe
#dependency check
library(readxl)
#read in
cand_df <- tryCatch(read_xls(filename, col_names = F),
error = function (e){
warning(paste(filename, "cannot be opened; corrupted or does not exist"))
})
print(filename)
#rest of function
cand_df[1,1]
}
test_vec <- c("test.xls", "test2.xls", "test3.xls")
lapply(FUN = candidate.cleaner, X = test_vec)
However, this still executes the line of the function after the tryCatch statement when given a .xls file that does not exist, which throws a stop since I'm attempting to index a dataframe that doesn't exist. This exits the lapply call. How can I write the tryCatch call to make it skip execution of the rest of the function without exiting lapply?
One could set a semaphore at the start of the tryCatch() indicating that things have gone OK so far, then handle the error and signal that things have gone wrong, and finally check the semaphore and return from the function with an appropriate value.
lapply(1:5, function(i) {
value <- tryCatch({
OK <- TRUE
if (i == 2)
stop("stopping...")
i
}, error = function(e) {
warning("oops: ", conditionMessage(e))
OK <<- FALSE # assign in parent environment
}, finally = {
## return NA on error
OK || return(NA)
})
## proceed
value * value
})
This allows one to continue using the tryCatch() infrastructure, e.g., to translate warnings into errors. The tryCatch() block encapsulates all the relevant code.
Turns out, this can be accomplished in a simple way with try() and an additional helper function.
candidate.cleaner <- function(filename){
#this function cleans candidate data spreadsheets into an R dataframe
#dependency check
library(readxl)
#read in
cand_df <- try(read_xls(filename, col_names = F))
if(is.error(cand_df) == T){
return(list("Corrupted: rescrape", filename))
} else {
#storing election name for later matching
election_name <- cand_df[1,1]
}
}
Where is.error() is taken from Hadley Wickham's Advanced R chapter on debugging. It's defined as:
is.error <- function(x) inherits(x, "try-error")
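A third variant, shown here only as a sketch, keeps tryCatch() but lets the error handler return a sentinel value that is checked right afterwards. read_sheet and clean_one are hypothetical stand-ins for read_xls() and candidate.cleaner(), so the example runs without readxl or real files.

```r
# Hypothetical stand-in for read_xls(): fails for one "corrupted" file name.
read_sheet <- function(filename) {
  if (filename == "test2.xls") stop("file is corrupted")
  data.frame(election = paste("election from", filename))
}

clean_one <- function(filename) {
  # The value returned by the error handler becomes the value of tryCatch()
  cand_df <- tryCatch(read_sheet(filename), error = function(e) {
    message(filename, " cannot be opened: ", conditionMessage(e))
    NULL
  })
  # Leave the function early; lapply() simply moves on to the next file
  if (is.null(cand_df)) return(NA)
  # rest of function
  cand_df[1, 1]
}

cleaned <- lapply(c("test.xls", "test2.xls", "test3.xls"), clean_one)
```

The second element of cleaned is NA, while the other two hold the value read from the first cell. Using message() instead of warning() in the handler is a design choice for this sketch: the note is printed immediately rather than collected until lapply() returns.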

Building a function for .combine in foreach

I have a process I want to run in parallel, but it fails due to some strange error. Now I am considering using the combine step to recompute the failing task on the master CPU. However, I don't know how to write such a function for .combine.
How should it be written?
I know how to write combine functions in general (this answer provides an example), but it doesn't show how to handle failing tasks or how to repeat a task on the master.
I would do something like:
foreach(i=1:100, .combine = function(x, y){tryCatch(?)}) %dopar% {
long_process_which_fails_randomly(i)
}
However, how do I use the input of that task in the .combine function (if it can be done)? Or should the code inside %dopar% return a flag or a list so that the task can be recomputed?
To execute tasks in the combine function, you need to include extra information in the result object returned by the body of the foreach loop. In this case, that would be an error flag and the value of i. There are many ways to do this, but here's an example:
comb <- function(results, x) {
i <- x$i
result <- x$result
if (x$error) {
cat(sprintf('master computing failed task %d\n', i))
# Could call function repeatedly until it succeeds,
# but that could hang the master
result <- try(fails_randomly(i))
}
results[i] <- list(result) # guard against a NULL result
results
}
r <- foreach(i=1:100, .combine='comb',
.init=vector('list', 100)) %dopar% {
tryCatch({
list(error=FALSE, i=i, result=fails_randomly(i))
},
error=function(e) {
list(error=TRUE, i=i, result=e)
})
}
I'd be tempted to deal with this problem by executing the parallel loop repeatedly until all the tasks have been computed:
x <- rnorm(100)
results <- lapply(x, function(i) simpleError(''))
# Might want to put a limit on the number of retries
repeat {
ix <- which(sapply(results, function(x) inherits(x, 'error')))
if (length(ix) == 0)
break
cat(sprintf('computing tasks %s\n', paste(ix, collapse=',')))
r <- foreach(i=x[ix], .errorhandling='pass') %dopar% {
fails_randomly(i)
}
results[ix] <- r
}
Note that this solution uses the .errorhandling option which is very useful if errors can occur. For more information on this option, see the foreach man page.
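The comment in the repeat loop suggests limiting the number of retries. A minimal sketch of that limit follows, under stated assumptions: max_rounds and rounds are hypothetical names, fails_randomly is a stand-in that fails about half of the time, and %do% replaces %dopar% so the example runs without a registered parallel backend.

```r
library(foreach)

# Hypothetical stand-in for the flaky task: fails about half of the time.
set.seed(1)
fails_randomly <- function(i) {
  if (runif(1) < 0.5) stop("random failure")
  i^2
}

x <- 1:20
results <- lapply(x, function(i) simpleError("not yet computed"))
max_rounds <- 10  # assumed cap on the number of passes
rounds <- 0

repeat {
  # Indices of tasks whose current result is still an error object
  ix <- which(vapply(results, inherits, logical(1), what = "error"))
  if (length(ix) == 0 || rounds >= max_rounds) break
  rounds <- rounds + 1
  # %do% runs sequentially; switch to %dopar% once a backend is registered
  r <- foreach(i = x[ix], .errorhandling = "pass") %do% {
    fails_randomly(i)
  }
  results[ix] <- r
}

# Tasks that never succeeded within the cap, if any
still_failed <- which(vapply(results, inherits, logical(1), what = "error"))
```

When the cap is reached, still_failed lists the tasks that remain unresolved, so the caller can decide whether to report them or compute them some other way.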

Run until no error occurred

I want to execute a function which uses an internet connection to fetch some online data. Because the connection is not very stable, it takes several attempts to run the function successfully.
Therefore I want to repeat the function in a loop until it works, and also save the result.
tryCatch seems to be a suitable function, but so far I have not found a way to solve the problem.
This is the function:
annotations(snp = 'rs1049434', output = 'snpedia')
and sometimes this error occurs:
Error in annotations(snp = "rs1049434", output = "snpedia") :
server error: (502) Bad Gateway
The basic code schematically:
while( tmp == F){
ifelse(result <- function worked, tmp <- T, tmp <- F)}
And I need the output result which is a data.frame.
ANSWER (see the link in the comment from nicola):
bo <- 0
while (bo != 10) {  # give up after 10 failed attempts
  x <- try(annotations(snp = 'rs1049434', output = 'snpedia'), silent = TRUE)
  if (inherits(x, "try-error")) {
    cat("ERROR1: ", x, "\n")
    Sys.sleep(1)
    print("reconnecting...")
    bo <- bo + 1
    print(bo)
  } else {
    break
  }
}
