about reading files and using tryCatch() in R

about reading files and using tryCatch() in R - r

What I want to do is to read output files and extract some value in files. But, in fact, some files don't exist, so I use tryCatch() in my program to catch those "errors", then my program will return a NA value and continue reading the next file. But when I execute my program, it reports errors("all connections are in use"), I have tried to find answers online, but there is no good answer for my questions. So, if you can solve my problem, please give your advice ! Thank you very much !
prop.protec <- 0.5
pat <- read.csv("~/par.csv",header = FALSE) # parameter file
pat <- as.matrix(pat)
tpw <- matrix(NA, nrow = 36, ncol = num.rep) # used to store p-values
num.rep <- 500
for (i in 1:36){
# following are 4 parameters
dis.mod <- pat[i,1]
herit.tot <- pat[i,2]
bin <- pat[i,3]
op <- pat[i,4]
for(reps in 1:num.rep){
tryCatch(
{
res.file <- paste("~/z-out-",op, "-", dis.mod, "-",bin,"-",herit.tot,"-",prop.protec,"-",reps, ".extended.qls.res", sep = "")
res.dat <- read.table(file = res.file, header = TRUE)
tpw[i,reps] <- res.dat$P_MFQLS
},
warning=function(cond) {
message("Here's the original warning message:")
message(cond)
},
error = function(e){
message("Here's the original error message:")
message(e)} )
}
}

I have found the answer, a very simple and wonderful answer, we just need to add closeAllConnections() at the end of inner loop. That works very well ! I have spent much time on it, when I know this answer and succeed, I believe you can understand my feeling.

Related

Redirect warnings & error messages produced by assertions into the logfile [R]

I am writing a function which would create a logfile along the output. The logfile is supposed to contain the info whether the data processing was finished successfully or not (and why is that be).
I know how to display custom error/warning messages using tryCatch (and I use this function IRL). However I do not know how to deal with the messages produced by assertions. I use assertthat & assertive for validation of arguments passed to the function.
I would like to divert (sink?) the assertthat output to the logfile if the argument is missing or does not meet the requirements (so the logfile would inform why the function finished unsuccessfully).
For instance, I would like to have a following info within the logfile: "Function finished unsuccessfully because (assertion msg)". Does anyone know how to do it?
Here is a dummy function which does nothing spectacular, it serves just as a simple reprex:
example_function <- function(input_vec, input_num, save_dir){
cat(paste0('[', as.character(Sys.time()), '] ', 'Pipeline initialized','\n','\n'))
# Create a log file
if (dir.exists(file.path(save_dir))) {
log_filename <- paste0(format(Sys.time(), "%Y-%m-%d_%H-%M-%S"), "_example_function.log", sep = "")
log_filepath <- file.path(save_dir, log_filename, fsep = .Platform$file.sep)
log_file <- file(log_filepath, open = "a")
sink(log_file, append=TRUE, split = TRUE, type='output')
on.exit(sink(file=NULL, type = 'output'))
}
#Show console message
cat(paste0('Hello there!','\n', '\n', sep = ""))
# Handle if save dir does not exist
if (!dir.exists(file.path(save_dir))) {
cat(paste0('[', as.character(Sys.time()), '] ', 'Defined save directory does not exist. Creating...','\n', sep=''))
tryCatch({dir.create(file.path(save_dir, fsep = .Platform$file.sep))
cat('Done!\n')
},
error=function(e){
cat(paste0('[', as.character(Sys.time()), '] ', 'Failed to create the save dir. Results will be stored in the current working directory.\n', sep=''))
save_dir <- getwd()
})
log_filename <- paste0(format(Sys.time(), "%Y-%m-%d_%H-%M-%S"), "_example_function.log", sep = "")
log_filepath <- file.path(save_dir, log_filename, fsep = .Platform$file.sep)
log_file <- file(log_filepath, open = "a")
sink(log_file, append=TRUE, split = TRUE, type='output')
on.exit(sink(file=NULL, type = 'output'))
}
# Assertions
if (missing(input_vec)) {
stop("An input_vec is missing. ", call. =FALSE)
}
if (missing(input_num)) {
stop("An input_num is missing.", call. =FALSE)
}
if (missing(save_dir)) {
stop("A save dir is missing. ", call. =FALSE)
}
assertthat::assert_that(assertive::is_numeric(input_vec),
msg=paste0("Input vec must be numeric."))
assertthat::assert_that(assertive::is_numeric(input_num),
msg=paste0("Input vec must be numeric."))
assertthat::assert_that(assertive::is_character(save_dir),
msg = paste0("Path to output files is not a character string."))
#just a dummy thing for reprex
output <- input_vec*input_num
#display console messages
cat(paste0('[', as.character(Sys.time()), '] ','Function finished','\n'))
cat(paste0('[', as.character(Sys.time()), '] ','A logfile is stored in: ','\n'))
cat(paste0(' ', log_filepath, '\n'))
# close logfile connection
on.exit(close(log_file))
return(output)
}
And here is some dummy input:
input_vec <- c(1:100)
input_num <- 14
test <- example_function(input_vec = input_vec, input_num = input_num, save_dir =getwd())
Currently, the given example does not allow to produce a logfile containing error info produced by assertions.

Solution:
I am posting this solution (may be not the neattiest, but it works) because I think it might be useful (there are some other questions regarding logging to a file in R, some of them remain unanswered etc.).
assertions created with stop() function:
Using the reprex code from the above question. One needs to switch this:
if (missing(input_vec)) {
stop("An input_vec is missing. ", call. =FALSE)
}
To this:
if (missing(input_vec)) {
sink(log_file, type='message')
cat(paste0('[', as.character(Sys.time()), '] ','Function encountered an error. Aborted.\n'))
on.exit(sink())
stop("An input_vec is missing. ", call. =FALSE)
}
And use similar syntax in the case of the other similar assertions.
assertions created with assertthat & assertive
Once more going back to reprex. In this scenario, enclosing conditions within sink() on/off connection works perfectly fine.
sink(log_file, append=TRUE, split = F, type='message')
assertthat::assert_that(assertive::is_numeric(input_vec),
msg=paste0("Input vec must be numeric."))
assertthat::assert_that(assertive::is_numeric(input_num),
msg=paste0("Input num must be numeric."))
assertthat::assert_that(assertive::is_character(save_dir),
msg = paste0("Path to output files is not a character string."))
sink(type='message')
I made two big mistakes, which prompted me to ask a question on SO. First, I was trying to sink() the assertion with stop() function outside the condition brackets, which resulted in an empty log (without the error message). Second, regarding assertthat, I tried to put the sink() function inside the assert_that function call, which also was inappropriate. Hadley Wickham's book on advanced R gave me hint how to solve this.

how to properly close connection so I won't get "Error in file(con, "r") : all connections are in use" when using "readlines" and "tryCatch"

I have a list of URLs (more than 4000) from a specific domain (pixilink.com) and what I want to do is to figure out if the provided domain is a picture or a video. To do this, I used the solutions provided here: How to write trycatch in R and Check whether a website provides photo or video based on a pattern in its URL and wrote the code shown below:
#Function to get the value of initial_mode from the URL
urlmode <- function(x){
mycontent <- readLines(x)
mypos <- grep("initial_mode = ", mycontent)
if(grepl("0", mycontent[mypos])){
return("picture")
} else if(grepl("tour", mycontent[mypos])){
return("video")
} else{
return(NA)
}
}
Also, in order to prevent having error for URLs that don't exist, I used the code below:
readUrl <- function(url) {
out <- tryCatch(
{
readLines(con=url, warn=FALSE)
return(1)
},
error=function(cond) {
return(NA)
},
warning=function(cond) {
return(NA)
},
finally={
message( url)
}
)
return(out)
}
Finally, I separated the list of URLs and pass it into the functions (here for instance, I used 1000 values from URL list) described above:
a <- subset(new_df, new_df$host=="www.pixilink.com")
vec <- a[['V']]
vec <- vec[1:1000] # only chose first 1000 rows
tt <- numeric(length(vec)) # checking validity of url
for (i in 1:length(vec)){
tt[i] <- readUrl(vec[i])
print(i)
}
g <- data.frame(vec,tt)
g2 <- g[which(!is.na(g$tt)),] #only valid url
dd <- numeric(nrow(g2))
for (j in 1:nrow(g2)){
dd[j] <- urlmode(g2[j,1])
}
Final <- cbind(g2,dd)
Final <- left_join(g, Final, by = c("vec" = "vec"))
I ran this code on a sample list of URLs with 100, URLs and it worked; however, after I ran it on whole list of URLs, it returned an error. Here is the error : Error in textConnection("rval", "w", local = TRUE) : all connections are in use Error in textConnection("rval", "w", local = TRUE) : all connections are in use
And after this even for sample URLs (100 samples that I tested before) I ran the code and got this error message : Error in file(con, "r") : all connections are in use
I also tried closeAllConnection after each recalling each function in the loop, but it didn't work.
Can anyone explain what this error is about? is it related to the number of requests we can have from the website? what's the solution?

So, my guess as to why this is happening is because you're not closing the connections that you're opening via tryCatch() and via urlmode() through the use of readLines(). I was unsure of how urlmode() was going to be used in your previous post so it had made it as simple as I could (and in hindsight, that was badly done, my apologies). So I took the liberty of rewriting urlmode() to try and make it a little bit more robust for what appears to be a more expansive task at hand.
I think the comments in the code should help, so take a look below:
#Updated URL mode function with better
#URL checking, connection handling,
#and "mode" investigation
urlmode <- function(x){
#Check if URL is good to go
if(!httr::http_error(x)){
#Test cases
#x <- "www.pixilink.com/3"
#x <- "https://www.pixilink.com/93320"
#x <- "https://www.pixilink.com/93313"
#Then since there are redirect shenanigans
#Get the actual URL the input points to
#It should just be the input URL if there is
#no redirection
#This is important as this also takes care of
#checking whether http or https need to be prefixed
#in case the input URL is supplied without those
#(this can cause problems for url() below)
myx <- httr::HEAD(x)$url
#Then check for what the default mode is
mycon <- url(myx)
open(mycon, "r")
mycontent <- readLines(mycon)
mypos <- grep("initial_mode = ", mycontent)
#Close the connection since it's no longer
#necessary
close(mycon)
#Some URLs with weird formats can return
#empty on this one since they don't
#follow the expected format.
#See for example: "https://www.pixilink.com/clients/899/#3"
#which is actually
#redirected from "https://www.pixilink.com/3"
#After that, evaluate what's at mypos, and always
#return the actual URL
#along with the result
if(!purrr::is_empty(mypos)){
#mystr<- stringr::str_extract(mycontent[mypos], "(?<=initial_mode\\s\\=).*")
mystr <- stringr::str_extract(mycontent[mypos], "(?<=\').*(?=\')")
return(c(myx, mystr))
#return(mystr)
#So once all that is done, check if the line at mypos
#contains a 0 (picture), tour (video)
#if(grepl("0", mycontent[mypos])){
# return(c(myx, "picture"))
#return("picture")
#} else if(grepl("tour", mycontent[mypos])){
# return(c(myx, "video"))
#return("video")
#}
} else{
#Valid URL but not interpretable
return(c(myx, "uninterpretable"))
#return("uninterpretable")
}
} else{
#Straight up invalid URL
#No myx variable to return here
#Just x
return(c(x, "invalid"))
#return("invalid")
}
}
#--------
#Sample code execution
library(purrr)
library(parallel)
library(future.apply)
library(httr)
library(stringr)
library(progressr)
library(progress)
#All future + progressr related stuff
#learned courtesy
#https://stackoverflow.com/a/62946400/9494044
#Setting up parallelized execution
no_cores <- parallel::detectCores()
#The above setup will ensure ALL cores
#are put to use
clust <- parallel::makeCluster(no_cores)
future::plan(cluster, workers = clust)
#Progress bar for sanity checking
progressr::handlers(progressr::handler_progress(format="[:bar] :percent :eta :message"))
#Website's base URL
baseurl <- "https://www.pixilink.com"
#Using future_lapply() to recursively apply urlmode()
#to a sequence of the URLs on pixilink in parallel
#and storing the results in sitetype
#Using a future chunk size of 10
#Everything is wrapped in with_progress() to enable the
#progress bar
#
range <- 93310:93350
#range <- 1:10000
progressr::with_progress({
myprog <- progressr::progressor(along = range)
sitetype <- do.call(rbind, future_lapply(range, function(b, x){
myprog() ##Progress bar signaller
myurl <- paste0(b, "/", x)
cat("\n", myurl, " ")
myret <- urlmode(myurl)
cat(myret, "\n")
return(c(myurl, myret))
}, b = baseurl, future.chunk.size = 10))
})
#Converting into a proper data.frame
#and assigning column names
sitetype <- data.frame(sitetype)
names(sitetype) <- c("given_url", "actual_url", "mode")
#A bit of wrangling to tidy up the mode column
sitetype$mode <- stringr::str_replace(sitetype$mode, "0", "picture")
head(sitetype)
# given_url actual_url mode
# 1 https://www.pixilink.com/93310 https://www.pixilink.com/93310 invalid
# 2 https://www.pixilink.com/93311 https://www.pixilink.com/93311 invalid
# 3 https://www.pixilink.com/93312 https://www.pixilink.com/93312 floorplan2d
# 4 https://www.pixilink.com/93313 https://www.pixilink.com/93313 picture
# 5 https://www.pixilink.com/93314 https://www.pixilink.com/93314 floorplan2d
# 6 https://www.pixilink.com/93315 https://www.pixilink.com/93315 tour
unique(sitetype$mode)
# [1] "invalid" "floorplan2d" "picture" "tour"
#--------
Basically, urlmode() now opens and closes connections only when necessary, checks for URL validity, URL redirection, and also "intelligently" extracts the value assigned to initial_mode. With the help of future.lapply(), and the progress bar from the progressr package, this can now be applied quite conveniently in parallel to as many pixilink.com/<integer> URLs as desired. With a bit of wrangling thereafter, the results can be presented very tidily as a data.frame as shown.
As an example, I've demonstrated this for a small range in the code above. Note the commented out 1:10000 range in the code in this context: I let this code run the last couple of hours over this (hopefully sufficiently) large range of URLs to check for errors and problems. I can attest that I encountered no errors (only the regular warnings In readLines(mycon) : incomplete final line found on 'https://www.pixilink.com/93334'). For proof, I have the data from all 10000 URLs written to a CSV file that I can provide upon request (I don't fancy uploading that to pastebin or elsewhere unnecessarily). Due to oversight on my part, I forgot to benchmark that run, but I suppose I could do that later if performance metrics are desired/would be considered interesting.
For your purposes, I believe you can simply take the entire code snippet below and run it verbatim (or with modifications) by just changing the range assignment right before the with_progress(do.call(...)) step to a range of your liking. I believe this approach is simpler and does away with having to deal with multiple functions and such (and no tryCatch() messes to deal with).

tryRetry function in R?

I am looking for a tryCatch function in R that would retry n times instead of just once. One of my web request fails occasionally to return a value when the server is busy, but after one or two retries it usually works fine.
The excellent page How to write trycatch in R does not touch on this topic. I found the function TryRetry in C (orginally discussed in TryRetry - Try, Catch, then Retry) which accomplishes what I was looking for and I thought maybe a similar function exist in R in some package too?
Unfortunately, I don't have the skills to abstract an R code structure from the C example. I could just recall my function in the error handling portion of the tryCatch, but somehow this seems the wrong way to go, especially once you deal with more than one retry.
Any suggestions on how to approach a tryRetry-code structure in R would be appreciated.

You can implement a retry logic by relying on the RETRY method from the httr package and parsing the response in a second step.
In order to apply it to file download I would go down the following path (using this hosted .csv file as an example):
library(httr)
library(dplyr)
df <- RETRY(
"GET",
url = "https://www.stats.govt.nz/assets/Uploads/Business-operations-survey/Business-operations-survey-2018/Download-data/business-operations-survey-2018-business-finance-csv.csv",
times = 3) %>% # max retry attempts
content(., "parsed")

Here is a way of having a web read request tried several times before failing. It's an adaptation of the post linked to in the question, called in a loop a number of times chosen by the user. Between each try there is a Sys.sleep defaulting to 3 seconds.
I repost the function readUrl, changed. And with many comments deleted, they are in the original code.
readUrl <- function(url) {
out <- tryCatch(
{
message("This is the 'try' part")
text <- readLines(con=url, warn=FALSE)
return(list(ok = TRUE, contents = text))
},
error=function(cond) {
message(paste("URL does not seem to exist:", url))
message("Here's the original error message:")
message(paste(cond, "\n"))
# Choose a return value in case of error
return(list(ok = FALSE, contents = cond))
},
warning=function(cond) {
message(paste("URL caused a warning:", url))
message("Here's the original warning message:")
message(paste(cond, "\n"))
# Choose a return value in case of warning
return(list(ok = FALSE, contents = cond))
},
finally={
message(paste("Processed URL:", url))
message("Some other message at the end")
}
)
return(out)
}
readUrlRetry <- function(url, times = 1, secs = 3){
count <- 0L
while(count < times){
res <- readUrl(url)
count <- count + 1L
OK <- res$ok
if(OK) break
Sys.sleep(time = secs)
}
res
}
url <- c(
"http://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html",
"http://en.wikipedia.org/wiki/Xz",
"xxxxx")
res <- lapply(url, readUrlRetry, times = 3)
res[[3]]
inherits(res[[3]]$contents, "warning")

quantmod getsymbols error handling

I am trying to cycle through a list of symbols and download the data and save it to a csv file. individual stocks work perfectly fine if it is there, but it stops if there is an error and I do not know how to handle errors (new to R) I used part of an answer here, but I am unable to find answer on error handling while looping it to save it to file.
quantmod omitting tickers in getSymbols
startDate = Sys.Date()- 365
pth = "C:\\"
tickers <- c("LMT","AAPL","AMT", "GOOG")
#the sapply method works by not stopping when it has issues with LMT and still it goes not to dwld AAPL,
library(quantmod)
WoW <- new.env()
sapply(tickers, function(x){
try(
getSymbols(
x,
src ="google",
from =startDate,
env=WoW),
silent=TRUE)
})
#Now for the looping to save to file, somehow it does not go althe way till GOOG. it stops at AAPL
#Error in data.frame(sym) : row names contain missing values.
for (i in 1:length(tickers) ) {
col <- c( "Open","High","Low","Close","Volume")
sym <- eval(parse(text=paste("WoW$",tickers[i],sep="")))
if (!is.null(nrow(sym))){
colnames(sym) <- col
sym <- data.frame(sym)
sym <- cbind(BizDay = 0, sym)
sym$BizDay <- rownames(sym)
op <- paste0(pth,tickers[i],".csv")
print(op)
write.table(sym, file=op, na="", sep=",", row.names = FALSE)
}
}
Any pointers on how to handle basic errors? I have to run through full security list, and have to make sure that i handle those. but right now stuck on this.
Thanks

Got it to work with nrow(sym) > 1 check.

Run until no error occurred

I want to execute a function which uses an internet connection to grep some online data. Because the connection is not very stable, it needs several attempts to run the function successfully.
Therfore I want to repeat or loop the function until it worked and also save the results.
tryCatch seems to be a suitable function but so far I did not find a way to solve the problem.
This is the function:
annotations(snp = 'rs1049434', output = 'snpedia')
and sometimes this error occur:
Error in annotations(snp = "rs1049434", output = "snpedia") :
server error: (502) Bad Gateway
The basic code schematically:
while( tmp == F){
ifelse(result <- function worked, tmp <- T, tmp <- F)}
And I need the output result which is a data.frame.
ANSWER (see the link in the comment from nicola):
bo=0
while(bo!=10){
x = try(annotations(snp = 'rs1049434', output = 'snpedia'),silent=TRUE)
if (class(x)=="try-error") {
cat("ERROR1: ", x, "\n")
Sys.sleep(1)
print("reconntecting...")
bo <- bo+1
print(bo)
} else
break
}

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

about reading files and using tryCatch() in R - r

I have found the answer, a very simple and wonderful answer, we just need to add closeAllConnections() at the end of inner loop. That works very well ! I have spent much time on it, when I know this answer and succeed, I believe you can understand my feeling.

Related

Redirect warnings & error messages produced by assertions into the logfile [R]

how to properly close connection so I won't get "Error in file(con, "r") : all connections are in use" when using "readlines" and "tryCatch"

tryRetry function in R?

quantmod getsymbols error handling

Run until no error occurred

Categories

Resources