Handling 404 and other bad responses with looping twitteR queries - r

I would like my loop not to break when I get an error; I want it to just move on to the next iteration. Below is a minimal example of the error I'm getting and of the loop breaking. In my real application I'm iterating through some of the followers I've generated from another script.
library(twitteR)
# set up OAuth...
for (i in 1:10) {
  x <- getUser("nalegezx")
}
Error in twInterfaceObj$doAPICall(paste("users", "show", sep = "/"), params = params, :
  client error: (404) Not Found
I understand that this loop would simply rewrite the same response to x. I'm just interested in not breaking the loop.

I'm not an expert in the R Twitter API, but I can suggest that you consider placing your call to getUser() inside a try block like this:
for (i in 1:10) {
  x <- try(getUser("sdlfkja"))
}
This should stop your code from crashing in the middle of the loop. If you want to also have separate logic when a warning or error occurs in the loop, you can use tryCatch:
for (i in 1:10) {
  x <- tryCatch(getUser("sdlfkja"),
                warning = function(w) {
                  print("warning")
                  # handle warning here
                },
                error = function(e) {
                  print("error")
                  # handle error here
                })
}
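If you go with the plain try() version, you can also detect which calls failed afterwards, because try() returns an object of class "try-error" on failure. A minimal sketch of skipping the failed iterations (the followers vector is a hypothetical stand-in for your own list of screen names):
library(twitteR)
followers <- c("nalegezx", "some_other_user")  # hypothetical screen names
users <- list()
for (i in seq_along(followers)) {
  x <- try(getUser(followers[i]), silent = TRUE)
  if (inherits(x, "try-error")) next  # skip users that trigger a 404 or other error
  users[[followers[i]]] <- x
}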

I accepted Tim's answer because it resolved my problem, but for the specific case of fetching profile information for many users I used lookupUsers(), which does the job for me without eating into my request limit.
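For reference, a minimal sketch of that approach, assuming the followers vector holds screen names (the vector itself is hypothetical):
library(twitteR)
followers <- c("nalegezx", "some_other_user")  # hypothetical screen names
# lookupUsers() fetches many profiles in one batched request,
# so it needs far fewer API calls than calling getUser() in a loop
user_info <- lookupUsers(followers)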

Related

Handle "500" server error type during API request using rentrez

I am trying to recover some IDs linked to names using the rentrez package, which is an R wrapper over the Entrez API, with this code (a short list of queries as an example):
vect_names <- c("Theileria sergenti","Dipodascus ambrosiae","Dipodascus armillariae","Dipodascus macrosporus")
idseq <- lapply(vect_names, function(x) {
  query <- entrez_search(db = "taxonomy", term = x)
  return(query$ids)
})
Now, this code works for me as long as I get no server errors (type 500), which stop my requests. For small numbers of queries this is not a problem, but I have around 40k queries to send, so the error is bound to occur.
This is the error :
Erreur : HTTP failure: 500
{"error":"error forwarding request","api-key":"xxx.xx.xx.xxx","type":"ip",
"status":"ok"
I did some research and I think I need to wrap this code in a try/except construct. However, the documentation is pretty scary to me, and I don't see how I can replicate the server error so I could build a reproducible example. Also, because my full request will run for several hours, testing multiple versions of a try/except until I am sure my code handles the error could take a long time.
So what I am looking for is a version of this first piece of code that keeps requesting the same vector element until it gets the result (until the HTTP failure is resolved, which should take a matter of seconds).
Thanks!
The sample list you provided doesn't give an error for me, but you can use tryCatch like this
idseq <- lapply(vect_names, function(x) {
  query <- tryCatch(entrez_search(db = "taxonomy", term = x), error = function(e) NA)
  return(ifelse(is.na(query), NA, query$ids))
})
If there's an error, query will be NA, so I changed the returned value to check for that.
After some research, I found I needed to use tryCatch coupled with Sys.sleep():
idseq <- lapply(vect_names, function(x) {
  tryCatch(
    {
      query <- entrez_search(db = "taxonomy", term = x)
      return(ifelse(is.na(query), NA, query$ids))
    },
    error = function(e) {
      Sys.sleep(60)  # if an error occurs (most likely a 500 server error), sleep 60 s, then retry
      query <- entrez_search(db = "taxonomy", term = x)
      return(ifelse(is.na(query), NA, query$ids))
    }
  )
})
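If you want the loop to keep retrying the same name until the server answers, as asked above, one option is to wrap the call in a small retry helper. A minimal sketch with a capped number of attempts; the max_tries cap and the 60-second wait are illustrative choices, not part of the original answers:
library(rentrez)
search_with_retry <- function(term, max_tries = 5, wait = 60) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(entrez_search(db = "taxonomy", term = term),
                       error = function(e) NULL)
    if (!is.null(result)) return(result$ids)
    Sys.sleep(wait)  # most likely a transient 500, so wait and try again
  }
  NA  # give up after max_tries failed attempts
}
idseq <- lapply(vect_names, search_with_retry)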

How to redo tryCatch after error in for loop

I am trying to implement tryCatch in a for loop.
The loop is built to download data from a remote server. Sometimes the server stops responding (when the query is big).
I have implemented tryCatch to keep the loop going.
I have also added a Sys.sleep() pause so that, if an error occurs, the loop waits a few minutes before sending the next query to the remote server (this works).
The problem is that I can't figure out how to make the loop redo the query that failed and triggered the tryCatch error (and the Sys.sleep()).
for (i in 1:1000) {
  tmp <- tryCatch({download_data(list$tool[i])},
                  error = function(e) {Sys.sleep(800)})
}
Could you give me some hints?
You can do something like this:
for (i in 1:1000) {
  download_finished <- FALSE
  while (!download_finished) {
    tryCatch({
      tmp <- download_data(list$tool[i])
      download_finished <- TRUE
    },
    error = function(e) { Sys.sleep(800) })
  }
}
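One caveat with the while loop above: if an element keeps failing, it never terminates. A minimal sketch of the same idea with a retry cap; the max_tries limit is an illustrative addition, not part of the original answer:
max_tries <- 5  # hypothetical upper bound on retries per element
for (i in 1:1000) {
  tries <- 0
  download_finished <- FALSE
  while (!download_finished && tries < max_tries) {
    tries <- tries + 1
    tryCatch({
      tmp <- download_data(list$tool[i])
      download_finished <- TRUE
    },
    error = function(e) Sys.sleep(800))
  }
}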
If you are certain that waiting for 800 seconds always fixes the issue, this change should do it.
for (i in 1:1000) {
  tmp <- tryCatch({
    download_data(list$tool[i])
  },
  error = function(e) {
    Sys.sleep(800)
    download_data(list$tool[i])
  })
}
A more sophisticated approach could be to collect the information about which requests failed and then rerun the script until all requests succeed.
One way to do this is to use the possibly() function from the purrr package. It would look something like this:
library(purrr)

todo <- rep(TRUE, length(list$tool))
res <- list()
while (any(todo)) {
  res[todo] <- map(list$tool[todo],
                   possibly(download_data, otherwise = NA))
  todo <- map_lgl(res, ~ is.na(.))
}

How to catch the error, save it and then remove the error in foreach in R?

I am running some code and it is really important for me to catch any error and save it for later, but not include it in the final result of the foreach. I have used tryCatch and even tried forcing an error using stop(). Here is a snippet of my code:
## options: "stop", "remove" or "pass"
error_handle <- "remove"
cores <- round(detectCores() * percent)
cl <- makeCluster(cores)
registerDoParallel(cl)
predict_load_all <- foreach(i = 1:length(id), .export = func,
                            .packages = (.packages()),
                            .errorhandling = error_handle) %dopar% {
  possibleError <- tryCatch({
    weather_cast <- data.frame(udata$date, j, coeff_i,
                               predict(hour_fits[[j]], newdata = udata))
  }, error = function(e) return(paste0("The hour '", j, "' caused the error: '", e, "'")))
  if (!exists("weather_cast")) {
    # possibleError <- data.frame('coeff_item' = coeff_i, 'Error' = possibleError)
    possibleError <- data.frame('Error' = possibleError)
    write_csv(possibleError,
              file.path(path_predict, 'Error_weather_cast.csv'), append = T)
    stop('error')
  }
  colnames(weather_cast) <- c("Date", "Hour", "coeff_item", "Predicted_Load")
  ifelse(j == 1,
         predict_load <- weather_cast,
         predict_load <- rbind(predict_load, weather_cast))
  predict_load <- spread(predict_load, Hour, Predicted_Load)
  predict_load
}
I am running the foreach to produce predict_load_all. possibleError is the error that needs to be saved; it is captured by a tryCatch. The idea is to save the error object (so the exists() condition is satisfied) and then, using stop(), induce an error that gets discarded by .errorhandling = "remove", so that iteration of the foreach is skipped. This way I get the error saved and a result list without the errors.
This doesn't seem to be saving an error file.
Any ideas?
In my experience (and I have a lot more "learning experiences" with parallel processing than actual success stories, unfortunately), this is not an easy task. The best solution I found was to add outfile = '' to the makeCluster command. For me, in a non-interactive mode, this caused (some of?) the error messages to get written to the output file instead of being discarded completely.
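For reference, the change amounts to something like this sketch (cores is assumed to be defined as in the question):
library(doParallel)
# outfile = "" sends worker output and messages to the master process's console
# (or to the log of a non-interactive session) instead of discarding them
cl <- makeCluster(cores, outfile = "")
registerDoParallel(cl)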

Logging console history with errors in R or Rstudio

For educational purposes we are logging all commands that students type in the RStudio console during labs. In addition, we would like to store whether a call was successful or raised an error, to identify students who are struggling to get the syntax right.
The best I can come up with is something like this:
options(error = function() {
  timestamp("USER ERROR", quiet = TRUE)
})
This adds an ## ERROR comment to the history log when an exception occurs, so we can analyze the history files to see which commands were followed by an ## ERROR comment.
However, R's internal history system is not well suited for logging because it is in-memory, limited in size, and needs to be stored manually with savehistory(). Also, I would prefer to store the log one line per call, i.e. escape line breaks for multi-line commands.
Is there perhaps a hook in the R or RStudio console for logging the actual executed commands? That would allow me to insert each evaluated expression (and error) into a database along with a username and timestamp.
A possible solution would be to use addTaskCallback or the taskCallbackManager with a function that writes each top-level command to your database. The callback will only fire on the successful completion of a command, so you would still need to call a logging function on an error.
# error handler
logErr <- function() {
# turn logging callback off while we process errors separately
tcbm$suspend(TRUE)
# turn them back on when we're done
on.exit(tcbm$suspend(FALSE))
sc <- sys.calls()
sclen <- length(sc) # last call is this function call
if(sclen > 1L) {
cat("myError:\n", do.call(paste, c(lapply(sc[-sclen], deparse), sep="\n")), "\n")
} else {
# syntax error, so no call stack
# show the last line entered
# (this won't be helpful if it's a parse error in a function)
file1 <- tempfile("Rrawhist")
savehistory(file1)
rawhist <- readLines(file1)
unlink(file1)
cat("myError:\n", rawhist[length(rawhist)], "\n")
}
}
options(error=logErr)
# top-level callback handler
log <- function(expr, value, ok, visible) {
cat(deparse(expr), "\n")
TRUE
}
tcbm <- taskCallbackManager()
tcbm$add(log, name = "log")
This isn't a complete solution, but I hope it gives you enough to get started. Here's an example of what the output looks like.
> f <- function() stop("error")
f <- function() stop("error")
> hi
Error: object 'hi' not found
myError:
hi
> f()
Error in f() : error
myError:
f()
stop("error")

failed - "cannot open the connection"

I am trying to read several URL files. Does anyone know how to check first whether the URL can be opened and then do something? Sometimes I get an error (failed = "cannot open the connection"). I just want to skip the file if the connection cannot be opened.
urlAdd <- paste0(server, siteID, '.dly')
# Reading the whole data in the page
if (url(urlAdd)) {
  tmp <- read.fwf(urlAdd, widths = c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31)))
}
But this condition fails.
You can use tryCatch which returns the value of the expression if it succeeds, and the value of the error argument if there is an error. See ?tryCatch.
This example looks up a bunch of URLs and downloads them. The tryCatch will return the result of readLines() if it is successful, and NULL if not. If the result is NULL we just call next to move on to the next iteration of the loop.
urls <- c('http://google.com', 'http://nonexistent.jfkldasf', 'http://stackoverflow.com')
for (u in urls) {
  # warn = FALSE just avoids the "incomplete final line" warning;
  # put read.fwf or whatever you need here.
  tmp <- tryCatch(readLines(url(u), warn = FALSE),
                  error = function(e) NULL)
  if (is.null(tmp)) {
    # you might want to put some informative message here.
    next  # skip to the next url.
  }
}
Note this will do so on any error, not just a "404 not found"-type error. If I had typo'd and written tryCatch(raedlines(url(u), warn = FALSE), ...) (a typo on readLines), it would just skip everything, as that would also throw an error.
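If you only want to swallow connection failures and still surface other errors, one option is to inspect the condition inside the handler. A minimal sketch; matching on the "cannot open" message text is an illustrative heuristic, not a documented error class:
tmp <- tryCatch(readLines(url(u), warn = FALSE),
                error = function(e) {
                  if (grepl("cannot open", conditionMessage(e))) {
                    NULL      # treat unreachable URLs as "skip this one"
                  } else {
                    stop(e)   # re-signal anything unexpected
                  }
                })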
Edit, re: comments (lapply is being used; where should the data-processing code go?): instead of next, just only do your processing if the read succeeds. Put the data-processing code after the data-reading code. Try something like:
lapply(urls,
       function(u) {
         tmp <- tryCatch(read.fwf(...), error = function(e) NULL)
         if (is.null(tmp)) {
           # read failed
           return()  # or return whatever you want the failure value to be
         }
         # data processing code goes here.
       })
The above returns out of the function (only affects the current element of lapply) if the read fails.
Or you could invert it and do something like:
lapply(urls,
       function(u) {
         tmp <- tryCatch(read.fwf(...), error = function(e) NULL)
         if (!is.null(tmp)) {
           # read succeeded!
           # data processing code goes here.
         }
       })
which will do the same (it only does your data processing code if the read succeeded, and otherwise skips that whole block of code and returns NULL).
