I want to execute a function which uses an internet connection to grep some online data. Because the connection is not very stable, it needs several attempts to run the function successfully.
Therfore I want to repeat or loop the function until it worked and also save the results.
tryCatch seems to be a suitable function but so far I did not find a way to solve the problem.
This is the function:
annotations(snp = 'rs1049434', output = 'snpedia')
and sometimes this error occur:
Error in annotations(snp = "rs1049434", output = "snpedia") :
server error: (502) Bad Gateway
The basic code schematically:
while( tmp == F){
ifelse(result <- function worked, tmp <- T, tmp <- F)}
And I need the output result which is a data.frame.
ANSWER (see the link in the comment from nicola):
bo=0
while(bo!=10){
x = try(annotations(snp = 'rs1049434', output = 'snpedia'),silent=TRUE)
if (class(x)=="try-error") {
cat("ERROR1: ", x, "\n")
Sys.sleep(1)
print("reconntecting...")
bo <- bo+1
print(bo)
} else
break
}
Related
I have a series of functions which goes to a website and collects data. Sometimes the website returns a 404 error and my code breaks. It could take 10 minutes of processing until I get a 404 error, or the code (more often then not) runs without the 404 error.
I have the following code:
linkToStopAt = as.character(unique(currentData$linkURL)[1])
myLinksToSearchOver = as.character(unique(currentData$page))
tmp = NULL
i <- 1
out_lst = list()
while(i <= length(myLinksToSearchOver)){
print(paste("Processing page: ", i))
tmp <- possible_collectPageData(myLinksToSearchOver[i]) %>%
add_column(page = myLinksToSearchOver[i])
if(linkToStopAt %in% tmp$linkURL)
{
print(paste("We stopped at: ", i))
break
}
out_lst[[i]] <- tmp
i <- i + 1
}
Broken down as:
linkToStopAt = as.character(unique(currentData$linkURL)[1]) gives me a single URL where the while loops will break if it see this URL
myLinksToSearchOver = as.character(unique(currentData$page)) gives me multiple links in which the while loop will search over, once it finds the linkToStopAt on one of these links, the while loop breaks.
tmp <- possible_collectPageData(myLinksToSearchOver[i]) %>% add_column(page = myLinksToSearchOver[i]) This is a big function, which relies on many other functions...
######################################################
So, the while loop runs until it finds a link linkToStopAt on one of the pages from myLinksToSearchOver. The function possible_collectPageData just does all my scraping/data processing etc. Each page from myLinksToSearchOver is stored in out_lst[[i]] <- tmp.
I recieve a specific error "Error in if (nrow(df) != nrow(.data)) { : argumento tiene longitud cero" in the console sometimes.
What I want to do, is something like:
repeat {
tmpCollectData <- try(while("ALL-MY-WHILE-LOOP-HERE??")) #try(execute(f))
if (!(inherits(tmpCollectData, "Error in if (nrow(df) != nrow(.data)) { : argumento tiene longitud cero")))
break
}
Where, if the while loop breaks with that error, just run it all again, setting tmp = NULL, i = 1, out_list = list() etc. (Basically start again, I can do this manually by just re-executing the code again)
You could create a function that does your work, and then wrap the call to that function in try(), with silent=TRUE. Then place that in a while(TRUE) loop, breaking out if get_data() does NOT return an error:
Function to do your work
get_data <- function(links, stoplink) {
i=1
out_lst=list()
while(i <= length(links)){
print(paste("Processing page: ", i))
tmp = possible_collectPageData(links[i]) %>% add_column(page = links[i])
if(stoplink %in% tmp$linkURL) {
print(paste("We stopped at: ", i))
break
}
out_lst[[i]] <- tmp
i <- i + 1
}
return(out_lst)
}
Infinite loop that gets broken if result does not have any error.
while(TRUE) {
result = try(get_data(myLinksToSearchOver, linkToStopAt), silent=T)
if(!"try-error" %in% class(result)) break
}
I am writing a function which would create a logfile along the output. The logfile is supposed to contain the info whether the data processing was finished successfully or not (and why is that be).
I know how to display custom error/warning messages using tryCatch (and I use this function IRL). However I do not know how to deal with the messages produced by assertions. I use assertthat & assertive for validation of arguments passed to the function.
I would like to divert (sink?) the assertthat output to the logfile if the argument is missing or does not meet the requirements (so the logfile would inform why the function finished unsuccessfully).
For instance, I would like to have a following info within the logfile: "Function finished unsuccessfully because (assertion msg)". Does anyone know how to do it?
Here is a dummy function which does nothing spectacular, it serves just as a simple reprex:
example_function <- function(input_vec, input_num, save_dir){
cat(paste0('[', as.character(Sys.time()), '] ', 'Pipeline initialized','\n','\n'))
# Create a log file
if (dir.exists(file.path(save_dir))) {
log_filename <- paste0(format(Sys.time(), "%Y-%m-%d_%H-%M-%S"), "_example_function.log", sep = "")
log_filepath <- file.path(save_dir, log_filename, fsep = .Platform$file.sep)
log_file <- file(log_filepath, open = "a")
sink(log_file, append=TRUE, split = TRUE, type='output')
on.exit(sink(file=NULL, type = 'output'))
}
#Show console message
cat(paste0('Hello there!','\n', '\n', sep = ""))
# Handle if save dir does not exist
if (!dir.exists(file.path(save_dir))) {
cat(paste0('[', as.character(Sys.time()), '] ', 'Defined save directory does not exist. Creating...','\n', sep=''))
tryCatch({dir.create(file.path(save_dir, fsep = .Platform$file.sep))
cat('Done!\n')
},
error=function(e){
cat(paste0('[', as.character(Sys.time()), '] ', 'Failed to create the save dir. Results will be stored in the current working directory.\n', sep=''))
save_dir <- getwd()
})
log_filename <- paste0(format(Sys.time(), "%Y-%m-%d_%H-%M-%S"), "_example_function.log", sep = "")
log_filepath <- file.path(save_dir, log_filename, fsep = .Platform$file.sep)
log_file <- file(log_filepath, open = "a")
sink(log_file, append=TRUE, split = TRUE, type='output')
on.exit(sink(file=NULL, type = 'output'))
}
# Assertions
if (missing(input_vec)) {
stop("An input_vec is missing. ", call. =FALSE)
}
if (missing(input_num)) {
stop("An input_num is missing.", call. =FALSE)
}
if (missing(save_dir)) {
stop("A save dir is missing. ", call. =FALSE)
}
assertthat::assert_that(assertive::is_numeric(input_vec),
msg=paste0("Input vec must be numeric."))
assertthat::assert_that(assertive::is_numeric(input_num),
msg=paste0("Input vec must be numeric."))
assertthat::assert_that(assertive::is_character(save_dir),
msg = paste0("Path to output files is not a character string."))
#just a dummy thing for reprex
output <- input_vec*input_num
#display console messages
cat(paste0('[', as.character(Sys.time()), '] ','Function finished','\n'))
cat(paste0('[', as.character(Sys.time()), '] ','A logfile is stored in: ','\n'))
cat(paste0(' ', log_filepath, '\n'))
# close logfile connection
on.exit(close(log_file))
return(output)
}
And here is some dummy input:
input_vec <- c(1:100)
input_num <- 14
test <- example_function(input_vec = input_vec, input_num = input_num, save_dir =getwd())
Currently, the given example does not allow to produce a logfile containing error info produced by assertions.
Solution:
I am posting this solution (may be not the neattiest, but it works) because I think it might be useful (there are some other questions regarding logging to a file in R, some of them remain unanswered etc.).
assertions created with stop() function:
Using the reprex code from the above question. One needs to switch this:
if (missing(input_vec)) {
stop("An input_vec is missing. ", call. =FALSE)
}
To this:
if (missing(input_vec)) {
sink(log_file, type='message')
cat(paste0('[', as.character(Sys.time()), '] ','Function encountered an error. Aborted.\n'))
on.exit(sink())
stop("An input_vec is missing. ", call. =FALSE)
}
And use similar syntax in the case of the other similar assertions.
assertions created with assertthat & assertive
Once more going back to reprex. In this scenario, enclosing conditions within sink() on/off connection works perfectly fine.
sink(log_file, append=TRUE, split = F, type='message')
assertthat::assert_that(assertive::is_numeric(input_vec),
msg=paste0("Input vec must be numeric."))
assertthat::assert_that(assertive::is_numeric(input_num),
msg=paste0("Input num must be numeric."))
assertthat::assert_that(assertive::is_character(save_dir),
msg = paste0("Path to output files is not a character string."))
sink(type='message')
I made two big mistakes, which prompted me to ask a question on SO. First, I was trying to sink() the assertion with stop() function outside the condition brackets, which resulted in an empty log (without the error message). Second, regarding assertthat, I tried to put the sink() function inside the assert_that function call, which also was inappropriate. Hadley Wickham's book on advanced R gave me hint how to solve this.
FYI, based on some comments I added more information.
I created the following function that is making a call to an API:
keyword_checker <- function(keyword, domain, loc, lang){
keyword_to_check <- as.character(keyword)
api_request <- paste("https://script.google.com/blabalbalba",
"?kw=",keyword,
"&domain=",domain,
"&loc=",loc,
"&lang=",lang,sep="")
api_request <- URLencode(api_request, repeated = TRUE)
source <-fromJSON(file = api_request)#Json file into Data Frame
return(data.frame(do.call("rbind", source$data$result))) ##in order to extract only the "results" data
I am using the R package foreach() with %dopar% and doSNOW to do many API calls (more than 120k calls).
Unfortunately, it happens that there are some errors (usually time out connection), so it makes the script stops. In order to avoid this problem I used the .errorhandling = 'pass'. Now, the script doesn't stop but I would like to know if there is a way to make the API call until I get an answer?
Here is my script:
cl <- makeCluster(9)
registerDoSNOW(cl)
final_urls_checker <- foreach(i = 1:length(mes_urls_to_check), .combine=rbind, .errorhandling = 'pass', .packages='rjson') %dopar% {
test_keyword <- as.character(mes_urls_to_check[i])
results <- indexed_url(test_keyword)} ##name of my function
##Stop cluster
stopCluster(cl)
I basically want my script to continue (without stopping the whole process) until I get the answer from the API call
Do I need to incorporate the TryCatch function within the foreach, OR is that better to "upgrade" the function that I created by adding something like "if the API doesn't give the answer, then wait until it gets it?"
I hope this is clearer.
Try using tryCatch inside the foreach function to catch the expected error messages (here failed API call due to time out). Below is a sample code snippet for the given function keyword_checker, based on my understanding.
library(foreach)
cl <- makeCluster(9)
registerDoSNOW(cl)
final_urls_checker <- foreach(i = 1:length(mes_urls_to_check), .combine=rbind, .errorhandling = 'pass',
.packages='rjson') %dopar% {
test_keyword <- as.character(mes_urls_to_check[i])
#results <- keyword_checker(test_keyword)} ##name of my function
results <- function(test_keyword){
dmy <- tryCatch(
{
keyword_checker(test_keyword)
},
error = function(cond){
message = "Timeout error! Calling again..."
dmy2 <- keyword_checker(test_keyword)
return(dmy2)
}
warning = function(cond){
message("Warning message:")
message(cond)
return(NULL)
}
finally = {
message(paste("Succesfully called API ", test_keyword))
}
)
return(dmy)
}
##Stop cluster
stopCluster(cl)
Here's a link which explains how to write tryCatch. Note, this snippet may not exactly work since I didn't run the code block. But calling the API caller again, when it fails should do the job.
Check this link, for a discussion on similar issue.
Here is an updated script including the TryCatch directly in the function.
indexed_url <- function(url){
url_to_check <- as.character(url)
api_request <- paste("https://script.google.com/macros/blablabalbalbaexec",
"?page=",url_to_check,sep="")
api_request <- URLencode(api_request, repeated = TRUE)
source <- tryCatch({
fromJSON(file = api_request)#Convertir un Json file en Data Frame
}, error = function(e) {
cat(paste0("Une erreur a eu lieu :",e))
Sys.sleep(1)
indexed_url(url)
})
return(data.frame(do.call("rbind", source)))
}
Then running the foreach just the way it was is working perfectly. No more errors, and I have the full analysis.
I want to process a own designed function on every cell using the calc function of the "raster" package.
Everything works perfectly when I try to print the "final" result of the function (value I want to return), but when I try to use return statement, I got an error :
Error in .local(x, values, ...) :
values must be numeric, integer or logical.
Here is the code leading to that error
inR <- 'D://test/TS_combined_clipped.tif'
outR <- 'D://test/R_test3.tif'
rasterB <- brick(inR)
fun1 <-function(x){
years = seq(1, 345)
na_idx = which(is.na(x))
years = years[-na_idx]
x <- na.omit(x)
idx = detectChangePoint(x, cpmType='Student', ARL0=500)$changePoint
return(years[idx]) # this raises error
# print(years[idx]) # This does *not* raises any error
}
r <- calc(rasterB, fun=fun1, filename=outR, overwrite=TRUE)
How is it possible to have a return statement to make it fails ?
Some of my tests leads to the fact that it seems that the process fails just after the execution of the calc function on the very last cell of the rasterBrick.
But I have no clue of where to start to try to fix this.
Input image is available here
[EDIT]
I just noticed that if I use return(idx) instead of return(year[idx]) the process works without error raised.
So it seems that the problem is more at fetching the value of the year variable.
Is therefore any particular thing that I missed in the use of indexes with R ?
Comment of user2554330 put me on the good track, issue was that calc cannot handle a "numeric(0)" result.
Updated code is then
inR <- 'D://test/TS_combined_clipped.tif'
outR <- 'D://test/R_test3.tif'
rasterB <- brick(inR)
fun1 <-function(x){
years = seq(1, 345)
na_idx = which(is.na(x))
years = years[-na_idx]
x <- na.omit(x)
idx = detectChangePoint(x, cpmType='Student', ARL0=500)$changePoint
if (idx==0){
return(0)
} else {
return(as.integer(years[idx]))
}
}
r <- calc(rasterB, fun=fun1, filename=outR, overwrite=TRUE)
I am trying to write a function that cleans spreadsheets. However, some of the spreadsheets are corrupted and will not open. I want the function to recognize this, print an error message, and skip execution of the rest of the function (since I am using lapply() to iterate across files), and continues. My current attempt looks like this:
candidate.cleaner <- function(filename){
#this function cleans candidate data spreadsheets into an R dataframe
#dependency check
library(readxl)
#read in
cand_df <- tryCatch(read_xls(filename, col_names = F),
error = function (e){
warning(paste(filename, "cannot be opened; corrupted or does not exist"))
})
print(filename)
#rest of function
cand_df[1,1]
}
test_vec <- c("test.xls", "test2.xls", "test3.xls")
lapply(FUN = candidate.cleaner, X = test_vec)
However, this still executes the line of the function after the tryCatch statement when given a .xls file that does not exist, which throws a stop since I'm attempting to index a dataframe that doesn't exist. This exits the lapply call. How can I write the tryCatch call to make it skip execution of the rest of the function without exiting lapply?
One could set a semaphore at the start of the tryCatch() indicating that things have gone OK so far, then handle the error and signal that things have gone wrong, and finally check the semaphore and return from the function with an appropriate value.
lapply(1:5, function(i) {
value <- tryCatch({
OK <- TRUE
if (i == 2)
stop("stopping...")
i
}, error = function(e) {
warning("oops: ", conditionMessage(e))
OK <<- FALSE # assign in parent environment
}, finally = {
## return NA on error
OK || return(NA)
})
## proceed
value * value
})
This allows one to continue using the tryCatch() infrastructure, e.g., to translate warnings into errors. The tryCatch() block encapsulates all the relevant code.
Turns out, this can be accomplished in a simple way with try() and an additional help function.
candidate.cleaner <- function(filename){
#this function cleans candidate data spreadsheets into an R dataframe
#dependency check
library(readxl)
#read in
cand_df <- try(read_xls(filename, col_names = F))
if(is.error(cand_df) == T){
return(list("Corrupted: rescrape", filename))
} else {
#storing election name for later matching
election_name <- cand_df[1,1]
}
}
Where is.error() is taken from Hadley Wickham's Advanced R chapter on debugging. It's defined as:
is.error <- function(x) inherits(x, "try-error")