failed - "cannot open the connection" - r

I am trying to read several URL files. Does anyone know how to first check whether a URL can be opened, and only then do something? Sometimes I get an error (failed = "cannot open the connection"). I just want to skip the file if the connection cannot be opened.
urlAdd <- paste0(server, siteID, '.dly')
# Reading the whole data in the page
if (url(urlAdd)) {
  tmp <- read.fwf(urlAdd, widths = c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31)))
}
But this condition fails.

You can use tryCatch, which returns the value of the expression if it succeeds, and the value of the error argument if there is an error. See ?tryCatch.
This example looks up a bunch of URLs and downloads them. The tryCatch will return the result of readLines if it is successful, and NULL if not. If the result is NULL, we just call next to move on to the next iteration of the loop.
urls <- c('http://google.com', 'http://nonexistent.jfkldasf', 'http://stackoverflow.com')
for (u in urls) {
  # warn = FALSE just avoids the "incomplete final line" warning;
  # you would put read.fwf or whatever here.
  tmp <- tryCatch(readLines(url(u), warn = FALSE),
                  error = function(e) NULL)
  if (is.null(tmp)) {
    # you might want to put some informative message here.
    next  # skip to the next url.
  }
}
Note this will do so on any error, not just a "404 Not Found"-type error. If I typo'd and wrote tryCatch(raedlines(url(u), warn = FALSE), ...) (a typo on readLines), it would just skip everything, as the typo would also throw an error.
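A minimal sketch of that caveat, using the first URL from the example above: the misspelled call hits the same error handler, so the iteration is skipped even though the problem is in the code, not the connection.
tmp <- tryCatch(raedlines(url("http://google.com"), warn = FALSE),  # typo for readLines
                error = function(e) NULL)
is.null(tmp)  # TRUE: the "could not find function" error was swallowed too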
Edit, re: comments (lapply is being used; where does the data-processing code go?): instead of next, only do your processing if the read succeeds, i.e. put the data-processing code after the data-reading code. Try something like:
lapply(urls,
       function(u) {
         tmp <- tryCatch(read.fwf(...), error = function(e) NULL)
         if (is.null(tmp)) {
           # read failed
           return()  # or return whatever you want the failure value to be
         }
         # data processing code goes here.
       })
The above returns out of the function (affecting only the current element of lapply) if the read fails.
Or you could invert it and do something like:
lapply(urls,
       function(u) {
         tmp <- tryCatch(read.fwf(...), error = function(e) NULL)
         if (!is.null(tmp)) {
           # read succeeded!
           # data processing code goes here.
         }
       })
which does the same thing (it runs your data-processing code only if the read succeeded, and otherwise skips that whole block and returns NULL).
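Tying this back to the original read.fwf question, here is a sketch of the whole pattern; server and siteIDs are hypothetical placeholders, and the widths vector is taken from the question:
server  <- "https://example.com/data/"  # hypothetical placeholder
siteIDs <- c("site001", "site002")      # hypothetical placeholders
results <- lapply(siteIDs, function(siteID) {
  urlAdd <- paste0(server, siteID, ".dly")
  tmp <- tryCatch(
    read.fwf(urlAdd, widths = c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31))),
    error = function(e) NULL
  )
  if (is.null(tmp)) return(NULL)  # read failed; this site is skipped
  tmp  # data-processing code would go here
})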

Related

How to check if a URL is reachable or not using tryCatch in R

I have the following URL objects and need to check whether they are reachable before downloading and processing the CSV files. I can't use the URLs directly, as they keep changing based on previous steps.
My requirement is: read the link if it is reachable, otherwise throw an error and go on to the next link.
url1= "https://s3.mydata.csv"
url2="https://s4.mydata.csv"
url3="https://s5.mydata.csv"
(The code below is repeated for the other two URLs as well.)
readUrl <- function(url1) {
  out <- tryCatch(
    {
      readLines(con = url1, warn = FALSE)
    },
    error = function(cond) {
      message(cond)
      return(NA)
    },
    finally = {
      dataread <- data.table::fread(url1, sep = ",", header = TRUE, verbose = TRUE,
                                    fill = TRUE, skip = 2)
    }
  )
  return(out)
}
y <- lapply(urls, readUrl)
Why not use the function url.exists directly from the RCurl package?
From the documentation:
This function is analogous to file.exists and determines whether a request for a specific URL responds without error.
Using the boolean result of this function, you can easily adapt your starting code without tryCatch.
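A sketch of that approach using the three URLs from the question; url.exists() returns TRUE when a request for the URL responds without error:
library(RCurl)
urls <- c("https://s3.mydata.csv", "https://s4.mydata.csv", "https://s5.mydata.csv")
for (u in urls) {
  if (url.exists(u)) {
    # same fread call as in the question
    dataread <- data.table::fread(u, sep = ",", header = TRUE, fill = TRUE, skip = 2)
    # process dataread here
  } else {
    message("Skipping unreachable URL: ", u)
  }
}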

R: Continue loop to next iteration if function used in loop has a stop() clause

I have created a function that reads in a dataset but calls stop() when the specific file does not exist on the drive. This function is called sondeprofile(), but the only important part is this:
if (file.exists(sonde)) {
  dfs <- read.table(sonde, header = TRUE, sep = ",", skip = idx, fill = TRUE)
} else {
  stop("No sonde data available for this day")
}
This function has then been used within a for-loop to loop over specific days and stations to do calculations on each day. Extremely simplified problem:
for (name in stations) {
  sonde <- sondeprofile(date)
  # Continue with loop if sonde exists, skip this if not
  if (exists("sonde")) {
    ## rest of code ##
  }
}
But my issue is that whenever sondeprofile() finds there is no file for a specific date, the stop("No sonde data available for this day") halts the whole for loop above. I thought checking whether the file exists would be enough to skip the iteration, but alas, I can't get this to work properly.
I want that, whenever sondeprofile() finds no data available for a specific date, the loop skips that iteration and does not execute the rest of the code, but rather goes on to the next one.
How can I make this happen? sondeprofile() is also used as a standalone function in other parts of the code, so I need the skipping to happen in the for loop.
When the function sondeprofile() throws an error, it will stop your whole loop. However, you can avoid that with try(), which attempts to run "an expression that might fail and allow[s] the user's code to handle error-recovery" (from help("try")).
So, if you replace
sonde <- sondeprofile(date)
with
sonde <- try(sondeprofile(date), silent = TRUE)
you can avoid the problem of it stopping your loop. But then how do you deal with the if() condition?
Well, if a try() call encounters an error, what it returns will be of class try-error. So, you can just make sure that sonde isn't of that class, changing
if(exists("sonde")) {
to
if ( !inherits(sonde, "try-error") ) {
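Putting both changes together, a sketch of the loop from the question (sondeprofile(), stations, and date are as defined there):
for (name in stations) {
  sonde <- try(sondeprofile(date), silent = TRUE)
  if (!inherits(sonde, "try-error")) {
    ## rest of code ##
  }
}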

TryCatch with parLapply (Parallel package) in R

I am trying to run something on a very large dataset. Basically, I want to loop through all files in a folder and run the function fromJSON on each of them. However, I want it to skip over files that produce an error. I have built a function using tryCatch; however, it only works when I use the function lapply and not parLapply.
Here is my code for my exception handling function:
readJson <- function(file) {
  require(jsonlite)
  dat <- tryCatch(
    {
      fromJSON(file, flatten = TRUE)
    },
    error = function(cond) {
      message(cond)
      return(NA)
    },
    warning = function(cond) {
      message(cond)
      return(NULL)
    }
  )
  return(dat)
}
and then I call parLapply on a character vector files which contains the full paths to the JSON files:
dat <- parLapply(cl, files, readJson)
This produces an error when it reaches a file that doesn't end properly, and it does not create the list dat by skipping over the problematic file, which is what the readJson function was supposed to mitigate.
When I use regular lapply, however, it works perfectly fine: it prints the errors, but it still creates the list by skipping over the erroneous files.
Any ideas on how I could use exception handling with parLapply so that it skips over the problematic files and still generates the list?
In your error handler function, cond is an error condition. message(cond) signals this condition, which is caught on the workers and transmitted as an error to the master. Either remove the message() calls or replace them with something like
message(conditionMessage(cond))
You won't see the messages on the master either way, so removing the calls is probably best.
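For instance, a sketch of readJson with the handlers stripped down as suggested, so that nothing is re-signalled on the workers:
readJson <- function(file) {
  require(jsonlite)
  tryCatch(
    fromJSON(file, flatten = TRUE),
    error   = function(cond) NA,    # fall back to NA without re-signalling
    warning = function(cond) NULL   # fall back to NULL without re-signalling
  )
}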
What you could do is something like this (with another, reproducible example):
test1 <- function(i) {
  dat <- NA
  try({
    if (runif(1) < 0.8) {
      dat <- rnorm(i)
    } else {
      stop("Error!")
    }
  })
  return(dat)
}

cl <- parallel::makeCluster(3)
dat <- parallel::parLapply(cl, 1:100, test1)
parallel::stopCluster(cl)  # clean up the cluster when done
See this related question for other solutions. I think using foreach with .errorhandling = "pass" would be another good solution.
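A sketch of that foreach alternative, reusing the files vector from the question; with .errorhandling = "pass", an error object is stored in the corresponding element of the result instead of aborting the whole parallel loop:
library(foreach)
library(doParallel)
cl <- parallel::makeCluster(3)
registerDoParallel(cl)
dat <- foreach(f = files, .errorhandling = "pass", .packages = "jsonlite") %dopar% {
  fromJSON(f, flatten = TRUE)
}
parallel::stopCluster(cl)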

Logging console history with errors in R or RStudio

For educational purposes we are logging all commands that students type in the RStudio console during labs. In addition, we would like to record whether a call was successful or raised an error, to identify students who are struggling to get the syntax right.
The best I can come up with is something like this:
options(error = function() {
  timestamp("USER ERROR", quiet = TRUE)
})
This adds a "USER ERROR" timestamp comment to the history log when an exception occurs, so we can analyze history files to see which commands were followed by such a comment.
However, R's internal history system is not well suited for logging: it is in-memory, limited in size, and has to be saved manually with savehistory(). I would also prefer to store the log one line per call, i.e. escape line breaks for multi-line commands.
Is there perhaps a hook in R or the RStudio console for logging the actually executed commands? That would allow me to insert each evaluated expression (and error) into a database along with a username and timestamp.
A possible solution would be to use addTaskCallback or the taskCallbackManager with a function that writes each top-level command to your database. The callback will only fire on the successful completion of a command, so you would still need to call a logging function on an error.
# error handler
logErr <- function() {
  # turn the logging callback off while we process errors separately
  tcbm$suspend(TRUE)
  # turn it back on when we're done
  on.exit(tcbm$suspend(FALSE))
  sc <- sys.calls()
  sclen <- length(sc)  # the last call is this function call
  if (sclen > 1L) {
    cat("myError:\n", do.call(paste, c(lapply(sc[-sclen], deparse), sep = "\n")), "\n")
  } else {
    # syntax error, so no call stack;
    # show the last line entered
    # (this won't be helpful if it's a parse error in a function)
    file1 <- tempfile("Rrawhist")
    savehistory(file1)
    rawhist <- readLines(file1)
    unlink(file1)
    cat("myError:\n", rawhist[length(rawhist)], "\n")
  }
}
options(error = logErr)

# top-level callback handler
log <- function(expr, value, ok, visible) {
  cat(deparse(expr), "\n")
  TRUE
}
tcbm <- taskCallbackManager()
tcbm$add(log, name = "log")
This isn't a complete solution, but I hope it gives you enough to get started. Here's an example of what the output looks like.
> f <- function() stop("error")
f <- function() stop("error")
> hi
Error: object 'hi' not found
myError:
hi
> f()
Error in f() : error
myError:
f()
stop("error")

Handling 404 and other bad responses with looping twitteR queries

I would like my loop not to break when I get an error; I want it to just move on to the next iteration. This example is a minimal reproduction of the error I'm getting and of the loop breaking. In my real application I'm iterating through some of the followers I've generated with another script.
library(twitteR)
# set oauth ...
for (i in 1:10) {
  x <- getUser("nalegezx")
}
Error in twInterfaceObj$doAPICall(paste("users", "show", sep = "/"), params = params, :
  client error: (404) Not Found
I understand that this loop would simply overwrite x with the same response each time; I'm just interested in not breaking the loop.
I'm not an expert in the R Twitter API, but I can suggest that you consider placing your call to getUser() inside a try block like this:
for (i in 1:10) {
  x <- try(getUser("sdlfkja"))
}
This should stop your code from crashing in the middle of the loop. If you want to also have separate logic when a warning or error occurs in the loop, you can use tryCatch:
for (i in 1:10) {
  x <- tryCatch(getUser("sdlfkja"),
                warning = function(w) {
                  print("warning")
                  # handle warning here
                },
                error = function(e) {
                  print("error")
                  # handle error here
                })
}
I accepted Tim's answer because it resolved the problem I had, but for the specific case of fetching profile results for many Twitter users I used lookupUsers, which does the job for me without eating into my request limit.
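A sketch of that batch approach; screen_names is a hypothetical vector standing in for the followers collected earlier:
library(twitteR)
screen_names <- c("nalegezx", "sdlfkja")  # hypothetical examples from this thread
# one bulk request for up to 100 users instead of one request per user;
# names that don't resolve are dropped rather than raising a 404 each
users <- lookupUsers(screen_names)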
