Using tryCatch to skip execution upon error without exiting lapply() - r

I am trying to write a function that cleans spreadsheets. However, some of the spreadsheets are corrupted and will not open. I want the function to recognize this, print an error message, skip execution of the rest of the function, and continue with the next file (I am using lapply() to iterate across files). My current attempt looks like this:
candidate.cleaner <- function(filename){
#this function cleans candidate data spreadsheets into an R dataframe
#dependency check
library(readxl)
#read in
cand_df <- tryCatch(read_xls(filename, col_names = F),
error = function (e){
warning(paste(filename, "cannot be opened; corrupted or does not exist"))
})
print(filename)
#rest of function
cand_df[1,1]
}
test_vec <- c("test.xls", "test2.xls", "test3.xls")
lapply(FUN = candidate.cleaner, X = test_vec)
However, when given a .xls file that does not exist, this still executes the lines of the function after the tryCatch statement, which throws an error because I'm attempting to index a data frame that doesn't exist. This exits the lapply call. How can I write the tryCatch call to make it skip execution of the rest of the function without exiting lapply?

One could set a semaphore at the start of the tryCatch() indicating that things have gone OK so far, then handle the error and signal that things have gone wrong, and finally check the semaphore and return from the function with an appropriate value.
lapply(1:5, function(i) {
value <- tryCatch({
OK <- TRUE
if (i == 2)
stop("stopping...")
i
}, error = function(e) {
warning("oops: ", conditionMessage(e))
OK <<- FALSE # assign in parent environment
}, finally = {
## return NA on error
OK || return(NA)
})
## proceed
value * value
})
This allows one to continue using the tryCatch() infrastructure, e.g., to translate warnings into errors. The tryCatch() block encapsulates all the relevant code.

Turns out, this can be accomplished in a simple way with try() and an additional helper function.
candidate.cleaner <- function(filename){
#this function cleans candidate data spreadsheets into an R dataframe
#dependency check
library(readxl)
#read in
cand_df <- try(read_xls(filename, col_names = FALSE))
if(is.error(cand_df)){
return(list("Corrupted: rescrape", filename))
} else {
#storing election name for later matching
election_name <- cand_df[1,1]
}
}
Here, is.error() is taken from the debugging chapter of Hadley Wickham's Advanced R. It is defined as:
is.error <- function(x) inherits(x, "try-error")
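The same skip-and-continue behaviour can also be had from tryCatch() itself, because the value returned by the error handler becomes the value of the whole tryCatch() call. The following is a minimal sketch under stated assumptions: read_sheet() is a hypothetical stand-in for readxl::read_xls() so the example runs without any spreadsheet files, and clean_one() is a hypothetical, cut-down version of candidate.cleaner().

```r
# Hypothetical stand-in for readxl::read_xls(): fails for names containing "bad"
read_sheet <- function(filename) {
  if (grepl("bad", filename)) stop("cannot open ", filename)
  data.frame(election = paste("election in", filename), stringsAsFactors = FALSE)
}

# Hypothetical, cut-down cleaner
clean_one <- function(filename) {
  cand_df <- tryCatch(
    read_sheet(filename),
    error = function(e) {
      message(filename, " skipped: ", conditionMessage(e))
      NULL  # sentinel: the handler's value becomes the value of tryCatch()
    }
  )
  if (is.null(cand_df)) return(NULL)  # skip the rest of the function
  # rest of function
  cand_df[1, 1]
}

files <- c("a.xls", "bad.xls", "c.xls")
results <- lapply(files, clean_one)
names(results) <- files

# Keep only the files that were read successfully
cleaned <- Filter(Negate(is.null), results)
```

The lapply() call runs to completion, and the NULL entries mark the files that need to be rescraped.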

Related

Redirect warnings & error messages produced by assertions into the logfile [R]

I am writing a function that creates a logfile alongside its output. The logfile is supposed to record whether the data processing finished successfully and, if not, why.
I know how to display custom error/warning messages using tryCatch (and I use this function IRL). However, I do not know how to deal with the messages produced by assertions. I use assertthat & assertive to validate the arguments passed to the function.
I would like to divert (sink?) the assertthat output to the logfile if an argument is missing or does not meet the requirements (so the logfile would explain why the function finished unsuccessfully).
For instance, I would like the logfile to contain the following information: "Function finished unsuccessfully because (assertion msg)". Does anyone know how to do it?
Here is a dummy function which does nothing spectacular, it serves just as a simple reprex:
example_function <- function(input_vec, input_num, save_dir){
cat(paste0('[', as.character(Sys.time()), '] ', 'Pipeline initialized','\n','\n'))
# Create a log file
if (dir.exists(file.path(save_dir))) {
log_filename <- paste0(format(Sys.time(), "%Y-%m-%d_%H-%M-%S"), "_example_function.log", sep = "")
log_filepath <- file.path(save_dir, log_filename, fsep = .Platform$file.sep)
log_file <- file(log_filepath, open = "a")
sink(log_file, append=TRUE, split = TRUE, type='output')
on.exit(sink(file=NULL, type = 'output'))
}
#Show console message
cat(paste0('Hello there!','\n', '\n', sep = ""))
# Handle if save dir does not exist
if (!dir.exists(file.path(save_dir))) {
cat(paste0('[', as.character(Sys.time()), '] ', 'Defined save directory does not exist. Creating...','\n', sep=''))
tryCatch({dir.create(file.path(save_dir, fsep = .Platform$file.sep))
cat('Done!\n')
},
error=function(e){
cat(paste0('[', as.character(Sys.time()), '] ', 'Failed to create the save dir. Results will be stored in the current working directory.\n', sep=''))
save_dir <<- getwd() # `<<-` updates save_dir in the enclosing function, not only inside the handler
})
log_filename <- paste0(format(Sys.time(), "%Y-%m-%d_%H-%M-%S"), "_example_function.log", sep = "")
log_filepath <- file.path(save_dir, log_filename, fsep = .Platform$file.sep)
log_file <- file(log_filepath, open = "a")
sink(log_file, append=TRUE, split = TRUE, type='output')
on.exit(sink(file=NULL, type = 'output'))
}
# Assertions
if (missing(input_vec)) {
stop("An input_vec is missing. ", call. =FALSE)
}
if (missing(input_num)) {
stop("An input_num is missing.", call. =FALSE)
}
if (missing(save_dir)) {
stop("A save dir is missing. ", call. =FALSE)
}
assertthat::assert_that(assertive::is_numeric(input_vec),
msg=paste0("Input vec must be numeric."))
assertthat::assert_that(assertive::is_numeric(input_num),
msg=paste0("Input num must be numeric."))
assertthat::assert_that(assertive::is_character(save_dir),
msg = paste0("Path to output files is not a character string."))
#just a dummy thing for reprex
output <- input_vec*input_num
#display console messages
cat(paste0('[', as.character(Sys.time()), '] ','Function finished','\n'))
cat(paste0('[', as.character(Sys.time()), '] ','A logfile is stored in: ','\n'))
cat(paste0(' ', log_filepath, '\n'))
# close logfile connection on exit; add = TRUE keeps the earlier on.exit(sink(...)) handler
on.exit(close(log_file), add = TRUE)
return(output)
}
And here is some dummy input:
input_vec <- c(1:100)
input_num <- 14
test <- example_function(input_vec = input_vec, input_num = input_num, save_dir =getwd())
Currently, the example above does not produce a logfile containing the error messages raised by the assertions.
Solution:
I am posting this solution (maybe not the neatest, but it works) because I think it might be useful (there are other questions about logging to a file in R, and some of them remain unanswered).
Assertions created with the stop() function:
Using the reprex code from the question above, one needs to switch this:
if (missing(input_vec)) {
stop("An input_vec is missing. ", call. =FALSE)
}
To this:
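if (missing(input_vec)) {
sink(log_file, type='message') # also divert messages (stderr), including the error text, to the logfile
cat(paste0('[', as.character(Sys.time()), '] ','Function encountered an error. Aborted.\n'))
on.exit(sink(type='message'), add = TRUE) # restore the message stream when the function exits
stop("An input_vec is missing. ", call. =FALSE)
}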
Use similar syntax for the other similar assertions.
Assertions created with assertthat & assertive:
Going back to the reprex once more: in this scenario, enclosing the assertions between a sink() call that opens the message diversion and one that closes it works perfectly fine.
sink(log_file, append=TRUE, split = F, type='message')
assertthat::assert_that(assertive::is_numeric(input_vec),
msg=paste0("Input vec must be numeric."))
assertthat::assert_that(assertive::is_numeric(input_num),
msg=paste0("Input num must be numeric."))
assertthat::assert_that(assertive::is_character(save_dir),
msg = paste0("Path to output files is not a character string."))
sink(type='message')
I made two big mistakes, which prompted me to ask the question on SO. First, I was trying to sink() the assertion built with stop() outside the braces of the if condition, which resulted in an empty log (without the error message). Second, regarding assertthat, I tried to put the sink() call inside the assert_that() call, which was also inappropriate. Hadley Wickham's book Advanced R gave me a hint on how to solve this.
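An alternative that avoids diverting the message stream altogether is to catch the error, write its message to the logfile, and then re-signal it. This is only a sketch under stated assumptions: run_with_log() and log_line() are hypothetical helpers, not part of assertthat or assertive, and plain stop() calls stand in for the assertions. Since assert_that() signals an ordinary error condition, the same handler should apply to it as well.

```r
# Hypothetical helper: evaluate `expr`, log success or the failure reason
run_with_log <- function(log_path, expr) {
  log_line <- function(...) {
    cat("[", format(Sys.time()), "] ", ..., "\n",
        sep = "", file = log_path, append = TRUE)
  }
  tryCatch(
    {
      result <- expr  # forces the lazily evaluated argument inside tryCatch()
      log_line("Function finished successfully.")
      result
    },
    error = function(e) {
      log_line("Function finished unsuccessfully because: ", conditionMessage(e))
      stop(e)  # re-signal so the caller still sees the failure
    }
  )
}

log_path <- tempfile(fileext = ".log")

# Successful run
ok <- run_with_log(log_path, (1:3) * 14)

# Failing run; try() keeps this example script going after the error
failed <- try(
  run_with_log(log_path, stop("Input vec must be numeric.", call. = FALSE)),
  silent = TRUE
)

log_lines <- readLines(log_path)
```

After both calls the logfile holds one line per run, and the second line contains the assertion message.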

How to avoid stop of the code execution in R

I am writing code in R to read a list of CSV files from 6000 URLs.
data <- read.csv(url)
If R cannot access a URL, the code execution stops. Does anyone know how to keep this error from stopping R?
I have been looking for an argument to the read.csv function, but there is probably a separate function for this.
Simply use tryCatch to catch the error inside the loop but continue on to other iterations:
# DEFINED METHOD, ON ERROR PRINTS MESSAGE AND RETURNS NULL
read_data_from_url <- function(url) {
tryCatch({
read.csv(url)
}, error = function(e) {
print(e)
return(NULL)
})
}
# NAMED LIST OF DATA FRAMES
df_list <- sapply(
list_of_6000_urls, read_data_from_url, simplify = FALSE
)
# FILTER OUT NULLS (PROBLEMATIC URLS)
df_list <- Filter(NROW, df_list)
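If the reason for each failure matters, a variant is to return the error object itself instead of NULL and split the list afterwards. The sketch below assumes a hypothetical fetch_csv() that stands in for read.csv(url) so it runs without network access; the URLs are placeholders.

```r
# Hypothetical stand-in for read.csv(url): fails for URLs containing "down"
fetch_csv <- function(url) {
  if (grepl("down", url)) stop("cannot open URL '", url, "'")
  data.frame(source = url, stringsAsFactors = FALSE)
}

urls <- c("http://example.com/a.csv",
          "http://example.com/down.csv",
          "http://example.com/c.csv")

# On error, keep the condition object so the reason is not lost
fetched <- sapply(
  urls,
  function(u) tryCatch(fetch_csv(u), error = function(e) e),
  simplify = FALSE
)

# Split successes from failures
failed <- vapply(fetched, inherits, logical(1), what = "error")
df_list <- fetched[!failed]
failure_reasons <- vapply(fetched[failed], conditionMessage, character(1))
```

failure_reasons is a character vector named by URL, which makes it easy to retry or report only the problematic URLs.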

How do you force evaluation of a value in R within a for loop?

For complicated reasons I'm trying to spy on often-used I/O methods in R.
I've got working code but there's something going on I don't understand about R's evaluation which leads to a lengthy comment.
.spy.on.methods <- function() {
attached.packages <- .packages()
pkg.watch <- list(
"data.table" = "fread",
"maptools" = "readShapePoly",
"utils" = "read.csv"
)
for (pkg in names(pkg.watch)) {
methods <- pkg.watch[[pkg]]
if (pkg %in% attached.packages) {
# replace now
.replace.methods.with.spy(methods)
} else {
# replace them right after they're loaded
setHook(
packageEvent(pkg, "attach"),
# parameters included for reader comprehension - we don't need or use them
function(pkg.name, pkg.path) {
# WARNING: you cannot use `methods` here. The reason is the value is
# mutated within this loop and R's lazy evaluation means that ALL
# hook functions will end up having a `methods` value from the final
# iteration of the loop
.replace.methods.with.spy(pkg.watch[[pkg.name]])
},
action = "append"
)
}
}
}
It seems to me that I should be able to replace
.replace.methods.with.spy(pkg.watch[[pkg.name]])
with
.replace.methods.with.spy(methods)
but it doesn't work. Diagnostic print statements indicate that in all cases methods has the value associated with pkg.watch[["utils"]].
I feel as if some combination of force, eval, and substitute are the solution here but I have not yet found it. Can anyone enlighten me?
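A likely explanation, offered as a sketch rather than a verified diagnosis of the code above: a for loop in R does not create a new scope, so every hook function closes over the same methods variable and sees whatever value it holds when the hook finally runs. Passing the value through a factory function that calls force() gives each closure its own copy. The names make_hook, hooks_shared and hooks_fixed are hypothetical.

```r
# Closures created in a for loop all share the loop's single variable
hooks_shared <- list()
for (name in c("a", "b", "c")) {
  hooks_shared[[name]] <- function() name
}
shared_values <- vapply(hooks_shared, function(h) h(), character(1))

# A factory function gives each closure its own environment;
# force() evaluates the argument now instead of when the hook is called
make_hook <- function(value) {
  force(value)
  function() value
}

hooks_fixed <- list()
for (name in c("a", "b", "c")) {
  hooks_fixed[[name]] <- make_hook(name)
}
fixed_values <- vapply(hooks_fixed, function(h) h(), character(1))
```

Every element of shared_values is the last loop value, while fixed_values keeps one value per iteration. In the question's loop, the equivalent would be a factory that receives methods and returns the function passed to setHook().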

How to handle API error in a foreach loop R?

FYI, based on some comments I added more information.
I created the following function that is making a call to an API:
keyword_checker <- function(keyword, domain, loc, lang){
keyword_to_check <- as.character(keyword)
api_request <- paste("https://script.google.com/blabalbalba",
"?kw=",keyword,
"&domain=",domain,
"&loc=",loc,
"&lang=",lang,sep="")
api_request <- URLencode(api_request, repeated = TRUE)
source <-fromJSON(file = api_request)#Json file into Data Frame
return(data.frame(do.call("rbind", source$data$result))) ##in order to extract only the "results" data
}
I am using the R package foreach with %dopar% and doSNOW to make many API calls (more than 120k calls).
Unfortunately, some errors occur (usually connection timeouts), which make the script stop. To avoid this problem I used .errorhandling = 'pass'. Now the script doesn't stop, but I would like to know whether there is a way to repeat the API call until I get an answer.
Here is my script:
cl <- makeCluster(9)
registerDoSNOW(cl)
final_urls_checker <- foreach(i = 1:length(mes_urls_to_check), .combine=rbind, .errorhandling = 'pass', .packages='rjson') %dopar% {
test_keyword <- as.character(mes_urls_to_check[i])
results <- indexed_url(test_keyword)} ##name of my function
##Stop cluster
stopCluster(cl)
I basically want my script to continue (without stopping the whole process) until I get the answer from the API call.
Do I need to incorporate the tryCatch function within the foreach, OR is it better to "upgrade" the function that I created by adding something like "if the API doesn't give the answer, then wait until it gets it"?
I hope this is clearer.
Try using tryCatch inside the foreach function to catch the expected error messages (here failed API call due to time out). Below is a sample code snippet for the given function keyword_checker, based on my understanding.
library(foreach)
library(doSNOW)
cl <- makeCluster(9)
registerDoSNOW(cl)
final_urls_checker <- foreach(i = seq_along(mes_urls_to_check), .combine = rbind, .errorhandling = 'pass',
                              .packages = 'rjson') %dopar% {
  test_keyword <- as.character(mes_urls_to_check[i])
  # The value of tryCatch() is the value of this iteration
  tryCatch(
    {
      keyword_checker(test_keyword)
    },
    error = function(cond) {
      # Retry once; if the second call fails too, .errorhandling = 'pass' applies
      message("Timeout error! Calling again...")
      keyword_checker(test_keyword)
    },
    warning = function(cond) {
      message("Warning message:")
      message(conditionMessage(cond))
      NULL
    },
    finally = {
      # Runs whether or not the call succeeded
      message(paste("Finished API call for", test_keyword))
    }
  )
}
##Stop cluster
stopCluster(cl)
Here's a link which explains how to write tryCatch. Note, this snippet may not exactly work since I didn't run the code block. But calling the API caller again, when it fails should do the job.
Check this link, for a discussion on similar issue.
Here is an updated script including the TryCatch directly in the function.
indexed_url <- function(url){
url_to_check <- as.character(url)
api_request <- paste("https://script.google.com/macros/blablabalbalbaexec",
"?page=",url_to_check,sep="")
api_request <- URLencode(api_request, repeated = TRUE)
source <- tryCatch({
fromJSON(file = api_request) # parse the JSON response
}, error = function(e) {
cat(paste0("An error occurred: ", conditionMessage(e), "\n"))
Sys.sleep(1)
indexed_url(url)
})
return(data.frame(do.call("rbind", source)))
}
Then running the foreach just the way it was is working perfectly. No more errors, and I have the full analysis.
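Two details of the recursive retry above are worth keeping in mind: it has no upper bound on the number of attempts, and a retried call returns an already converted data frame that is then passed through do.call("rbind", ...) a second time. A minimal sketch of a bounded alternative that retries only the fetch step follows. The names with_retry, max_attempts and wait_seconds are hypothetical, and flaky_fetch() is a stand-in for the fromJSON() call so the example runs offline.

```r
# Hypothetical helper: call fetch() until it succeeds or attempts run out
with_retry <- function(fetch, max_attempts = 5, wait_seconds = 0) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    Sys.sleep(wait_seconds)
  }
  stop("all ", max_attempts, " attempts failed")
}

# Stand-in for the API call: fails twice, then succeeds
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("server error: (502) Bad Gateway")
  list(list(url = "a", indexed = TRUE))
}

raw <- with_retry(flaky_fetch)

# A call that never succeeds ends in an ordinary error after max_attempts
gave_up <- try(with_retry(function() stop("still down"), max_attempts = 2),
               silent = TRUE)
```

Inside indexed_url(), the equivalent would be to wrap only the fromJSON() call in the helper and convert the result to a data frame once, after the retry loop has returned.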

Run until no error occurred

I want to execute a function that uses an internet connection to fetch some online data. Because the connection is not very stable, it takes several attempts to run the function successfully.
Therefore I want to repeat the function call in a loop until it works, and also save the result.
tryCatch seems to be a suitable function, but so far I have not found a way to solve the problem.
This is the function:
annotations(snp = 'rs1049434', output = 'snpedia')
and sometimes this error occurs:
Error in annotations(snp = "rs1049434", output = "snpedia") :
server error: (502) Bad Gateway
The basic code schematically:
while( tmp == F){
ifelse(result <- function worked, tmp <- T, tmp <- F)}
And I need the output result which is a data.frame.
ANSWER (see the link in the comment from nicola):
bo <- 0 # number of failed attempts so far
while (bo != 10) {
  x <- try(annotations(snp = 'rs1049434', output = 'snpedia'), silent = TRUE)
  if (inherits(x, "try-error")) {
    cat("ERROR1: ", x, "\n")
    Sys.sleep(1)
    print("reconnecting...")
    bo <- bo + 1
    print(bo)
  } else {
    # success: x holds the resulting data.frame
    break
  }
}

Resources