Cheddar Node Removal Errors in R

I've been trying to run some species deletion simulations using the cheddar package and have come across an error:
Error in RemoveNodes(new.community, new.remove, title = title, method = "cascade") :
Removing these nodes would result in an empty community
You can reproduce the error as follows:
library(cheddar)
data(SkipwithPond)
a<-RemoveNodes(SkipwithPond,c('Detritus','Corixidae nymphs','Agabus / Ilybius larvae'),method='cascade')
I was wondering if it is possible to disable this check so the removal can still occur. If not, is there a way to return a particular value (the number of nodes in the web, in this case) when this error occurs?

I don't know much about the cheddar package, but the second option you mention would be to "catch" the error after trying to evaluate the expression. Enter tryCatch. See the documentation for this function, but generally, when you save the result of tryCatch to a variable, you can redirect your flow to accommodate the error. Something along these lines:
# spaces possibly make code easier to read
nodes <- c('Detritus', 'Corixidae nymphs', 'Agabus / Ilybius larvae')
a <- tryCatch(RemoveNodes(SkipwithPond, nodes, method = 'cascade'),
              error = function(e) e)
# str(a) to see what the error is (message, class...) and act on that message
# or if you want a custom value to mark the error
a <- tryCatch(RemoveNodes(SkipwithPond, nodes, method = 'cascade'),
              error = function(e) "empty community?")
# the handler returns a plain string, not a condition object, so compare the value itself
if (identical(a, "empty community?")) {
  # ...do something
}
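As for returning the number of nodes when this error occurs: a minimal sketch, assuming cheddar's NumberOfNodes() accessor is the count you want (swap in whatever summary value your simulation needs):
# hypothetical wrapper: on failure, fall back to the node count of the intact web
safe_remove <- function(community, nodes) {
  tryCatch(RemoveNodes(community, nodes, method = 'cascade'),
           error = function(e) NumberOfNodes(community))
}
a <- safe_remove(SkipwithPond,
                 c('Detritus', 'Corixidae nymphs', 'Agabus / Ilybius larvae'))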

Related

In R, what do I need to do when my Radio API is on commercial break

Right now, I am trying to get my function to run every so often to check whether there's a new song and add it to a data frame, so I can figure out which songs are played most. My code runs great until the API goes to a commercial break. I read the JSON by doing this:
jsonlite::read_json(
"https://us.api.iheart.com/api/v3/live-meta/stream/2501/currentTrackMeta?defaultMetadata=true")
And when it goes to commercial break I get this error
Error in open.connection(con, "rb") : cannot open the connection
In addition: Warning message:
In open.connection(con, "rb") :
cannot open URL 'https://us.api.iheart.com/api/v3/live-meta/stream/2501/currentTrackMeta?defaultMetadata=true': HTTP status was '204 No Content'
Does anyone know how I can still run my code every minute or so even through commercial breaks?
You appear to have options(warn=2) or higher, since for me that is a Warning, not an error. However, we can resolve that too.
nowplaying <- tryCatch(
  jsonlite::read_json("https://us.api.iheart.com/api/v3/live-meta/stream/2501/currentTrackMeta?defaultMetadata=true"),
  warning = function(w) w,
  error = function(e) e)
if (inherits(nowplaying, c("error", "warning"))) {
  msg <- conditionMessage(nowplaying)
  if (grepl("204 No Content", msg)) {
    message("Stand by for a few words from our sponsors ...")
    # sleep or do something else before re-querying
  } else {
    stop("Oops! Something else is wrong and I should not continue blindly: ",
         msg, call. = FALSE)
    # STOP! if you continue, you might trigger some FW-defense or
    # over-use and get yourself throttled or banned
  }
} else {
  print(nowplaying$album)
  # allegedly sleep a little here before starting all over again
}
I put in some extra logic so as to not blindly assume all warnings and errors are transient. If you hit some limit or something else is breaking, this should stop (and you'll be in the "Oops" branch above).
It is feasible to use withCallingHandlers and invokeRestart("muffleWarning") conditioned on the warning/error message, but (1) that's fairly advanced, and (2) since that method allows processing to continue, it isn't really necessary here, so the added complication brings no real gain.
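To cover the "every minute or so" part of the question, a sketch of the polling loop, assuming a fixed 60-second interval is acceptable (tune it to the station's rate limits):
# poll the endpoint once a minute, tolerating commercial breaks
repeat {
  nowplaying <- tryCatch(
    jsonlite::read_json("https://us.api.iheart.com/api/v3/live-meta/stream/2501/currentTrackMeta?defaultMetadata=true"),
    warning = function(w) w,
    error = function(e) e)
  if (!inherits(nowplaying, c("error", "warning"))) {
    print(nowplaying$album)  # or append the track to your data frame here
  } else if (!grepl("204 No Content", conditionMessage(nowplaying))) {
    stop("Unexpected failure: ", conditionMessage(nowplaying), call. = FALSE)
  }
  Sys.sleep(60)  # assumed interval; adjust as needed
}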

How to force a for loop or lapply() to keep running despite an error message in R

With this code, when I use a for loop or the function lapply, I get the following error:
"Error in get_entrypoint (debug_port):
Cannot connect R to Chrome. Please retry. "
library(rvest)
library(xml2) #pull html data
library(selectr) #for xpath element
url_stackoverflow_rmarkdown <-
'https://stackoverflow.com/questions/tagged/r-markdown?tab=votes&pagesize=50'
web_page <- read_html(url_stackoverflow_rmarkdown)
questions_per_page <- html_text(html_nodes(web_page, ".page-numbers.current"))[1]
link_questions <- html_attr(html_nodes(web_page, ".question-hyperlink")[1:questions_per_page],
"href")
setwd("~/WebScraping_chrome_print_to_pdf")
for (i in 1:length(link_questions)) {
  question_to_pdf <- paste0("https://stackoverflow.com",
                            link_questions[i])
  pagedown::chrome_print(question_to_pdf)
}
Is it possible to build a for loop or use lapply to resume the code from where it breaks? That is, from the last i value, without stopping the script?
Many thanks
I edited @Rui Barradas's idea of using tryCatch().
You can try to do something like the following. IsValues will hold either the link value or the bad i's.
IsValues <- list()
for (i in 1:length(link_questions)) {
  question_to_pdf <- paste0("https://stackoverflow.com",
                            link_questions[i])
  IsValues[[i]] <- tryCatch(
    {
      message(paste("Converting", i))
      pagedown::chrome_print(question_to_pdf)
    },
    error = function(cond) {
      message(paste("Cannot convert", i))
      # Choose a return value in case of error
      return(i)
    })
}
Then you can rbind your values and extract the bad i's:
do.call(rbind, IsValues)[!grepl("\\.pdf$", do.call(rbind, IsValues))]
[1] "3" "5" "19" "31"
You can read more about tryCatch() in this answer.
Based on your example, it looks like you have two errors to contend with. The first error is the one you mention in your question. It is also the most frequent error:
Error in get_entrypoint (debug_port): Cannot connect R to Chrome. Please retry.
The second error arises when there are links in the HTML that return 404:
Failed to generate output. Reason: Failed to open https://lh3.googleusercontent.com/-bwcos_zylKg/AAAAAAAAAAI/AAAAAAAAAAA/AAnnY7o18NuEdWnDEck_qPpn-lu21VTdfw/mo/photo.jpg?sz=32 (HTTP status code: 404)
The key phrase in the first error is "Please retry". As far as I can tell, chrome_print sometimes has issues connecting to Chrome. It seems to be fairly random, i.e. failed connections in one run will be fine in the next, and vice versa. The easiest way to get around this issue is to just keep trying until it connects.
I can't come up with any fix for the second error. However, it doesn't seem to come up very often, so it might make sense to just record it and skip to the next URL.
Using the following code I'm able to print 48 of 50 pages. The only two I can't get to work have the 404 issue I describe above. Note that I use purrr::safely to catch errors. Base R's tryCatch will also work fine, but I find safely to be a little more convenient. That said, in the end it's really just a matter of preference.
Also note that I've dealt with the connection error by utilizing repeat within the for loop. R will keep trying to connect to Chrome and print until it is either successful, or some other error pops up. I didn't need it, but you might want to include a counter to set an upper threshold for the number of connection attempts:
quest_urls <- paste0("https://stackoverflow.com", link_questions)
errors <- NULL
safe_print <- purrr::safely(pagedown::chrome_print)
for (qurl in quest_urls) {
  repeat {
    output <- safe_print(qurl)
    if (is.null(output$error)) break
    else if (grepl("retry", output$error$message)) next
    else {errors <- c(errors, `names<-`(output$error$message, qurl)); break}
  }
}
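If you do want the counter mentioned above, here is a sketch with an assumed cap of five attempts per URL (pick whatever threshold suits you):
quest_urls <- paste0("https://stackoverflow.com", link_questions)
errors <- NULL
safe_print <- purrr::safely(pagedown::chrome_print)
max_attempts <- 5  # assumed upper bound on connection retries
for (qurl in quest_urls) {
  attempts <- 0
  repeat {
    attempts <- attempts + 1
    output <- safe_print(qurl)
    if (is.null(output$error)) break
    retriable <- grepl("retry", output$error$message)
    if (!retriable || attempts >= max_attempts) {
      errors <- c(errors, `names<-`(output$error$message, qurl))
      break
    }
  }
}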

Loop to wait for result or timeout in R

I've written a very quick blast script in R to enable interfacing with the NCBI blast API. Sometimes, however, the result URL takes a while to load and my script throws an error until the URL is ready. Is there an elegant way (e.g. a tryCatch option) to handle the error until the result is returned, or to time out after a specified time?
library(rvest)
## Definitive set of blast API instructions can be found here: https://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/BLAST_URLAPI.html
## Generate query URL
query_url <- function(QUERY, PROGRAM = "blastp", DATABASE = "nr", ...) {
  put_url_stem <- 'https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put'
  arguments <- list(...)
  paste0(put_url_stem,
         "&QUERY=", QUERY,
         "&PROGRAM=", PROGRAM,
         "&DATABASE=", DATABASE,
         arguments)
}
blast_url <- query_url(QUERY = "NP_001117.2") ## test query
blast_session <- html_session(blast_url) ## create session
blast_form <- html_form(blast_session)[[1]] ## pull form from session
RID <- blast_form$fields$RID$value ## extract RID identifier
get_url <- function(RID, ...) {
  get_url_stem <- "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get"
  arguments <- list(...)
  paste0(get_url_stem, "&RID=", RID, "&FORMAT_TYPE=XML", arguments)
}
hits_xml <- read_xml(get_url(RID)) ## this is the sticky part
Sometimes it takes several minutes for the get_url to go live, so what I would like to do is keep trying, say every 20-30 seconds, until it either produces the URL or times out after a pre-specified time.
I think you may find this answer about the use of tryCatch useful
Regarding the 'keep trying until timeout' part. I imagine you can work on top of this other answer about a tryCatch loop on error
Hope it helps.
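Putting those two pieces together, a minimal sketch, assuming read_xml() (from xml2) simply errors until the result page is ready; the 20-second poll interval and 10-minute timeout are assumptions to tune:
hits_xml <- NULL
deadline <- Sys.time() + 10 * 60  # give up after ten minutes
while (is.null(hits_xml) && Sys.time() < deadline) {
  hits_xml <- tryCatch(read_xml(get_url(RID)),
                       error = function(e) NULL)  # not ready yet
  if (is.null(hits_xml)) Sys.sleep(20)  # wait before polling again
}
if (is.null(hits_xml)) stop("BLAST result did not appear before the timeout")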

Repeatedly running a function in R till an error is not produced

I apologize that I cannot tell you what these functions are from the start.
I have a function CheckOutCell. It takes one argument, and that is the number 764. So every time I run the function it looks like this in its entirety: CheckOutCell(764).
Now many times the function will give me an error:
Error in checkInCell(764) :
The function is currently locked; try again in a minute.
Which is a custom error message and the details are not important to this question.
Now this function could be locked from anywhere from 30 seconds to an hour. I want to be able to automatically run CheckOutCell(764) till it goes through, and then stop running it. That is, run it till I do not get an error, then stop.
I think a start would be using
while(capture.output(checkInCell(764)) == "Error in checkInCell(764) :
The function is currently locked; try again in a minute."){
do something}
However this just produces
Error in checkInCell(764) :
The function is currently locked; try again in a minute.
because the function is still locked, so no output can be captured.
How would I test for something like while(error == TRUE)?
Assume the source code of the function cannot be modified.
Even is.error(CheckInCell(764)) will just produce the same error message
So it seems that this code works in a way
wrapcheck <- function(x) {
  repeat {
    result <- tryCatch(checkOutCell(x),
                       error = function(cond) "skip")
    if (!identical(result, "skip")) return(result)
    # still locked: wait a little, then try again
    Sys.sleep(60)
  }
}
wrapcheck(764)
Basically this checks for an error and then keeps running the function till the error is not produced. In fact, I am fairly confident that this would work with any function you wanted to put in place of CheckOutCell.
The main problem is that when the function is locked, that is not really an error; it is a lock. Therefore the block above will not work in that case. It will work when errors other than a lock are produced.
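If the lock really is signalled as an ordinary R error (the printed "Error in checkInCell(764)" suggests it is), here is a sketch that retries only on the lock message and re-raises anything else; the 60-second wait is an assumption:
retry_if_locked <- function(x, wait = 60) {
  repeat {
    result <- tryCatch(checkOutCell(x), error = function(cond) cond)
    if (!inherits(result, "error")) return(result)
    if (!grepl("currently locked", conditionMessage(result)))
      stop(result)  # re-raise any error that is not the lock
    Sys.sleep(wait)  # locked: wait, then try again
  }
}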

Check if HTML is available

I would like to know how I can check whether an HTML page is available. If it is not, I would like to control the return value so that the script is not stopped by the error.
Ex:
arq <- readLines("www.pageerror.com.br")
print(arq)
An alternative is try() - it is simpler to work with than tryCatch() but isn't as featureful. You might also need to suppress warnings, as R will report that it can't resolve the address.
You want something like this in your script:
URL <- "http://www.pageerror.com.br"
arq <- try(suppressWarnings(readLines(con <- url(URL))), silent = TRUE)
close(con) ## close the connection
if (inherits(arq, "try-error")) {
  writeLines(strwrap(paste("Page", URL, "is not available")))
} else {
  print(arq)
}
The silent = TRUE bit suppresses the reporting of errors (if you leave this at the default FALSE, then R will report the error but not abort the script). We wrap the potentially error-raising function call in try(..., silent = TRUE), with suppressWarnings() being used to suppress the warnings. Then we test the class of the returned object arq and if it inherits from class "try-error" we know the page could not be retrieved and issue a message indicating so. Otherwise we can print arq.
?tryCatch - 'Nuff said; the help page covers exactly this situation.
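For completeness, the same check written with tryCatch() rather than try() - a sketch using the asker's placeholder URL:
URL <- "http://www.pageerror.com.br"
arq <- tryCatch(suppressWarnings(readLines(URL)),
                error = function(e) NULL)  # NULL marks an unreachable page
if (is.null(arq)) {
  message("Page ", URL, " is not available")
} else {
  print(arq)
}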
