If-statement in RSelenium

I have a long list of chemicals for which I need to extract the CAS number. I have written a for loop that works as intended. However, when a chemical name is not found on the website, my code obviously stops.
Is there a way to account for this in the for loop, so that when a search query is not found, the loop goes back to the start page and searches for the next item in the list?
Below is my code for the for loop, with a short list of names to search for:
library(RSelenium)
library(netstat)
# start the server
rs_driver_object <- rsDriver(browser = "firefox",
                             verbose = FALSE,
                             port = 4847L) # change number if port is not open
# create a client object
remDrCh <- rs_driver_object$client
items <- c("MCPA", "DEET", "apple")
numbers <- list()
for (i in items) {
  Sys.sleep(2)
  remDrCh$navigate("https://commonchemistry.cas.org/")
  search_box <- remDrCh$findElement(using = 'class', 'search-input')
  search_box$sendKeysToElement(list(paste(i), key = 'enter'))
  Sys.sleep(2)
  result <- remDrCh$findElement(using = "class", "result-content")
  result$clickElement()
  Sys.sleep(2)
  cas <- remDrCh$findElements(using = 'class', 'cas-registry-number')
  cas_n <- lapply(cas, function(x) x$getElementText())
  numbers[[i]] <- unlist(cas_n)
  Sys.sleep(2)
  remDrCh$navigate("https://commonchemistry.cas.org/")
  Sys.sleep(2)
}
The problem lies in the result <- remDrCh$findElement(using = "class", "result-content") part. For "apple" there is no result, and thus no element that R could use.
I tried to write a separate if/else check for that specific part, but to no avail. The following still only works for queries that yield a result; I also tried findElements(), but that only helps when no result is found.
result <- remDrCh$findElement(using = "class", "result-content")
if (length(result) > 0) {
  result$clickElement()
} else {
  remDrCh$navigate("https://commonchemistry.cas.org/")
}
I also tried the approach from "How to check if an object is visible in a webpage by using its xpath?", but I cannot get it to work on my example.
Any help would be much appreciated!

This should work
items <- c("MCPA", "apple", "DEET")
numbers <- list()
for (i in items) {
Sys.sleep(2)
remDrCh$navigate("https://commonchemistry.cas.org/")
search_box <- remDrCh$findElement(using = 'class', 'search-input')
search_box$sendKeysToElement(list(paste(i), key = 'enter'))
Sys.sleep(2)
result <- try(remDrCh$findElement(using = "class", "result-content"))
if(!inherits(result, "try-error")){
result$clickElement()
Sys.sleep(2)
cas <- remDrCh$findElements(using = 'class', 'cas-registry-number')
cas_n <- lapply(cas, function (x) x$getElementText())
numbers[[i]] <- unlist(cas_n)
}else{
numbers[[i]] <- NA
}
Sys.sleep(2)
remDrCh$navigate("https://commonchemistry.cas.org/")
Sys.sleep(2)
}
Note the try() wrapper around the problematic code:
result <- try(remDrCh$findElement(using = "class", "result-content"))
This will capture the error if there is one, but allow the loop to continue. An if statement then checks whether the output of try() inherits from class "try-error": if it does not, the result is clicked and the CAS numbers are collected; otherwise, the entry is recorded as NA.
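Since the question also mentions findElements(), here is an alternative sketch of the loop body that avoids error handling altogether: findElements() returns an empty list instead of throwing an error when nothing matches, so its length can be tested directly (same remDrCh client and class selectors as above):
# Alternative sketch: findElements() returns an empty list when nothing
# matches, so a plain length check replaces try()/tryCatch().
results <- remDrCh$findElements(using = "class", "result-content")
if (length(results) > 0) {
  results[[1]]$clickElement()
  Sys.sleep(2)
  cas <- remDrCh$findElements(using = 'class', 'cas-registry-number')
  numbers[[i]] <- unlist(lapply(cas, function(x) x$getElementText()))
} else {
  numbers[[i]] <- NA
}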

Related

How to use tryCatch() to ignore the error in while loop in R

I have code that reads each line of my data frame's first column, visits the website, and then downloads the photo of each deputy. But it doesn't work properly, because some deputies don't have a photo yet.
That's why my code breaks and stops working. I tried to use "next" and if clauses, but it still didn't work. So a friend recommended tryCatch(). I couldn't find enough information online, and the code still doesn't work.
The file is here:
https://gist.github.com/gabrielacaesar/940f3ef14eaf29d18c3780a66053bbee
deputados <- fread("dep-legislatura56-14jan2019.csv")
i <- 1
while (i <= 514) {
  this.could.go.wrong <- tryCatch(
    attemptsomething(),
    error = function(e) next
  )
  url <- deputados$uri[i]
  api_content <- rawToChar(GET(url)$content)
  pessoa_info <- jsonlite::fromJSON(api_content)
  pessoa_foto <- pessoa_info$dados$ultimoStatus$urlFoto
  download.file(pessoa_foto, basename(pessoa_foto), mode = "wb")
  Sys.sleep(0.5)
  i <- i + 1
}
Here is a solution using purrr:
library(purrr)
download_picture <- function(url) {
  api_content <- rawToChar(httr::GET(url)$content)
  pessoa_info <- jsonlite::fromJSON(api_content)
  pessoa_foto <- pessoa_info$dados$ultimoStatus$urlFoto
  download.file(pessoa_foto, basename(pessoa_foto), mode = "wb")
}
walk(deputados$uri, possibly(download_picture, NULL))
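If you also want a record of which URLs failed, purrr::safely() is a close relative of possibly() that returns both the result and the error for each call; a minimal sketch along the same lines, reusing download_picture from above:
# Sketch: safely() wraps download_picture so each call returns a list
# with $result and $error instead of stopping on failure.
safe_download <- purrr::safely(download_picture)
res <- purrr::map(deputados$uri, safe_download)
failed <- deputados$uri[purrr::map_lgl(res, ~ !is.null(.x$error))]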
Simply wrap tryCatch() around the lines that can potentially raise errors and have it return NULL or NA in the error block:
i <- 1
while (i <= 514) {
  tryCatch({
    url <- deputados$uri[i]
    api_content <- rawToChar(GET(url)$content)
    pessoa_info <- jsonlite::fromJSON(api_content)
    pessoa_foto <- pessoa_info$dados$ultimoStatus$urlFoto
    download.file(pessoa_foto, basename(pessoa_foto), mode = "wb")
    Sys.sleep(0.5)
  }, error = function(e) return(NULL)
  )
  i <- i + 1
}

How to create a data frame with Rblpapi subscribe function

I'm sorry this example won't be reproducible by those who aren't Bloomberg users.
For the others, I'm using Rblpapi and its subscribe function. I would like to create something like a data frame, a matrix or an array and fill it with values that are streamed by the subscription.
Assuming your BBComm component is up and running, my example is:
require(Rblpapi)
con <- blpConnect()
securities <- c('SX5E 07/20/18 C3400 Index',
                'SX5E 07/20/18 C3450 Index',
                'SX5E 07/20/18 C3500 Index')
I would like to fill a 3 x 2 matrix with these fields:
fields <- c('BID', 'ASK')
I guess I can create a matrix like this with almost no performance overhead:
mat <- matrix(data = NA,
              nrow = 3,
              ncol = 2)
Now I use subscribe and its fun argument for filling purposes, so something like this (albeit ugly and likely inefficient):
i <- 1
subscribe(securities = securities,
          fields = fields,
          fun = function(x) {
            if (i > length(securities))
              i <<- 1
            tryCatch(
              expr = {
                mat[i, 1] <<- x$data$BID
                mat[i, 2] <<- x$data$ASK
                i <<- i + 1
              },
              error = function(e) {
                message(e)
              },
              finally = {}
            )
          })
Result:
Error in subscribe_Impl(con, securities, fields, fun, options, identity) :
Evaluation error: number of items to replace is not a multiple of replacement length.
Of course, this doesn't work because I don't really know how to use indexing on streamed data. The $ operator seems fine for retrieving data points by name, as I did with BID and ASK, but I cannot find a way to figure out which values refer to, say, securities[1] or securities[2]. It seems that I get a stream of numeric values that are indistinguishable from one another, because I cannot tell which security each value belongs to.
Using an index on x$data$BID[1] throws the same error.
OK, your code looks fine; the only thing that does not work is x$data$BID. Change it to x$data["BID"] and then you can store it. I have been working with your code, and this is my result.
fields=c("TIME","LAST_PRICE", "BID", "ASK")
blpConnect()
blpConnect()
i <- 1
subscribe(securities = securities,
fields = fields,"interval=60",
fun = function(x){
if (i > length(securities))
i <<- 1
tryCatch(
expr = {
tim <- x$data["TIME"]
last <<- x$data["LAST_PRICE"]
ask <<- x$data["ASK"]
bid <<- x$data["BID"]
i <<- i + 1
},
error = function(e){
message(e)
},
finally = {}
)
print(cbind(tim$TIME,last$LAST_PRICE,ask$ASK, bid$BID))
})
A good way to take a look at the result object from the subscribe function is:
subscribe(securities=c("AAPL US Equity"),
fields=c("LAST_PRICE"),
fun=function(x) print(str(x)))
From there you can work your way into the data:
subscribe(securities=c("AAPL US Equity", "INTC US Equity"),
fields=c("LAST_PRICE","BID","ASK"),
fun=function(x) {
if (!is.null(x$data$MKTDATA_EVENT_TYPE) && x$data$MKTDATA_EVENT_TYPE == "TRADE" && exists("LAST_PRICE", where = x$data)) {
print(data.frame(Ticker = x$topic, DateTime = x$data$TRADE_UPDATE_STAMP_RT, Trade = x$data$LAST_PRICE))
}
})
I only printed the data.frame here; the data can be processed or stored directly inside the fun argument of subscribe.
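To build an actual data frame from the stream, one option (not the only one) is to accumulate rows in a variable outside the callback with <<-; a minimal sketch, assuming the connection and securities from above and that each event carries BID and ASK in x$data:
# Sketch: grow a data frame outside the callback via <<-.
# x$topic identifies which security the event belongs to,
# so rows can be matched back to the securities vector.
quotes <- data.frame()
subscribe(securities = securities,
          fields = c("BID", "ASK"),
          fun = function(x) {
            if (exists("BID", where = x$data) && exists("ASK", where = x$data)) {
              quotes <<- rbind(quotes,
                               data.frame(security = x$topic,
                                          bid = x$data$BID,
                                          ask = x$data$ASK))
            }
          })
Growing a data frame row by row is slow for long-running subscriptions; appending each event to a list and calling do.call(rbind, ...) afterwards scales better.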

Constructing a 'while' function that identifies when an object from JSON has an error

I'm trying to construct a loop that fetches JSON from an API in R, like this:
for (j in 1:700) {
  tx_i <- paste0("https://example.com/api/", bloco_i_final$tx[j])
  txi <- GET(tx_i, add_headers(Authorization = full_token, Accept = header_type),
             timeout(120), verbose())
  conteudo <- content(txi, type = 'text', encoding = "UTF-8")
  tx_i_final <- rjson::fromJSON(getURL(tx_i))
  # (some functions that bind these data frames)
}
But sometimes, in the loop, this call ends with an error message:
Error in fromJSON(conteudo) : unexpected character 'B'
I want to build a while loop that detects this error and repeats the process.
Example:
for (j in 1:700) {
  # THIS PART
  while (fromJSON returns an error) {
    tx_i <- paste0("https://example.com/api/tx/", bloco_i_final$tx[j])
    txi <- GET(tx_i, add_headers(Authorization = full_token, Accept = header_type),
               timeout(120), verbose())
    conteudo <- content(txi, type = 'text', encoding = "UTF-8")
    tx_i_final <- rjson::fromJSON(getURL(tx_i))
  } # repeat the whole process while the error exists
  # (some functions that bind these data frames)
}
One trick you can use is to wrap the body in tryCatch() inside your loop and then decrease your counter. For example:
for (j in 1:100) {
  tryCatch({
    # Define the link with all positions
    ### Implement your method here!! ###
  }, error = function(err) {
    # Print the error:
    print(paste("MY_ERROR: ", err))
    # Decrease your counter to try again
    j <- (j - 1)
    # Wait some time before trying again...
    Sys.sleep(10)
  })
}
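Note that decrementing j will not actually rerun the iteration: the assignment is local to the error handler, and a for loop takes its next value from the sequence regardless. A pattern that does retry is a repeat loop around the request with a cap on attempts; a minimal sketch reusing the placeholder names from the question (bloco_i_final, full_token, header_type):
library(httr)  # for GET(), add_headers(), timeout(), content()

for (j in 1:700) {
  attempt <- 1
  max_tries <- 5
  repeat {
    tx_i_final <- tryCatch({
      tx_i <- paste0("https://example.com/api/tx/", bloco_i_final$tx[j])
      txi <- GET(tx_i, add_headers(Authorization = full_token, Accept = header_type),
                 timeout(120))
      conteudo <- content(txi, type = "text", encoding = "UTF-8")
      rjson::fromJSON(conteudo)
    }, error = function(e) NULL)           # NULL signals "try again"
    if (!is.null(tx_i_final) || attempt >= max_tries) break
    attempt <- attempt + 1
    Sys.sleep(10)                           # wait a bit before retrying
  }
  # (bind tx_i_final into the accumulated data frames here)
}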

Implementing tryCatch in R

I'm trying to use tryCatch. What I want is to run through a list of URLs that I have stored in page1URLs and, if there is a problem with one of them (using readHTMLTable()), keep a record of which ones failed and then have the code go on to the next URL without crashing.
I think I don't have the right idea here at all. Can anyone suggest how I can do this?
Here is the beginning of the code:
baddy <- rep(NA, 10000)
badURLs <- function(url) { baddy = c(baddy, url) }
writeURLsToCsvExtrema(38.361042, 35.465144, 141.410522, 139.564819)
writeURLsToCsvExtrema <- function(maxlat, minlat, maxlong, minlong) {
  urlsFuku <- page1URLs
  allFuku <- data.frame()  # need to initialize it with column names
  for (url in urlsFuku) {
    tryCatch(temp.tables = readHTMLTable(url), finally = badURLs(url))
    temp.df <- temp.tables[[3]]
    lastrow <- nrow(temp.df)
    temp.df <- temp.df[-c(lastrow - 1, lastrow), ]
  }
One general approach is to write a function that fully processes one URL, returning either the computed value or NULL to indicate failure
FUN <- function(url) {
  tryCatch({
    xx <- readHTMLTable(url)  ## will sometimes fail, invoking 'error' below
    ## more calculations
    xx                        ## final value
  }, error = function(err) {
    ## what to do on error? could return conditionMessage(err) or other...
    NULL
  })
}
and then use this, e.g., with a named vector
urls <- c("http://cran.r-project.org", "http://stackoverflow.com",
"http://foo.bar")
names(urls) <- urls # add names to urls, so 'result' elements are named
result <- lapply(urls, FUN)
These guys failed (returned NULL)
> names(result)[sapply(result, is.null)]
[1] "http://foo.bar"
And these are the results for further processing
final <- Filter(Negate(is.null), result)
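Since the question also asks for a record of which URLs failed, the error handler can return conditionMessage(err) instead of NULL, as hinted in the comment above; a small variation on the same idea (hypothetical FUN2, same urls vector as above):
# Sketch: return the error message instead of NULL so the reason is kept.
# readHTMLTable() returns a list on success, so failures are the character results.
FUN2 <- function(url) {
  tryCatch({
    readHTMLTable(url)
  }, error = function(err) {
    conditionMessage(err)
  })
}
result2 <- lapply(urls, FUN2)
failed <- result2[sapply(result2, is.character)]  # named list of error messages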

Understanding how the device list is read

I ran into a rather annoying issue with the list of open devices while trying to construct a function that saves a number of graphs for a list. Say we have the following data:
Alist <- list(
  X1 = data.frame(X = rnorm(10), Y = 1:10),
  X2 = data.frame(X = rnorm(10), Y = 1:10),
  X3 = data.frame(X = rnorm(10), Y = 1:10)
)
and the following function:
myPlotFunc <- function(x, save = F) {
  fnames <- paste(names(x), "pdf", sep = ".")
  for (i in 1:length(x)) {
    if (save) {
      pdf(fnames[i])
      on.exit(dev.off(), add = T)
    }
    plot(x[[i]])
  }
  fnames
}
If I run fnames <- myPlotFunc(Alist, save = T), everything behaves normally and I get 3 pdf files named X1.pdf to X3.pdf. That is, if there's no graphics window open. If there is, then one of the pdfs is not closed, and all subsequent plots are added to that pdf until I explicitly call dev.off() in the console. Like this:
plot(Alist[[1]])
fnames <- myPlotFunc(Alist,save=T)
myPlotFunc(Alist,save=F)
> dev.list()
pdf
4
If I add on.exit({print(dev.cur()); dev.off()}, add = T), I get the following output:
> fnames <- myPlotFunc(Alist,save=T)
pdf
5
windows
2
pdf
3
So apparently, after each device is closed, R picks the next "current" device from the bottom of the list, and if a graphics window is open, that window becomes the current device instead of one of the pdfs. As a result, the second-to-last opened pdf connection is no longer closed by dev.off(), because the on.exit calls come up one device short.
I got around it by changing my function to:
myPlotFunc <- function(x, save = F) {
  fnames <- paste(names(x), "pdf", sep = ".")
  devs <- NULL
  on.exit(for (i in devs) dev.off(i), add = T)
  for (i in 1:length(x)) {
    if (save) {
      pdf(fnames[i])
      devs <- c(devs, dev.cur())
    }
    plot(x[[i]])
  }
  fnames
}
but this feels rather awkward. Is there something I'm missing here, or a better way of getting around this?
Disclaimer: in case you're not aware, run dev.off() after running the third code block. You can clean up easily by running unlink(fnames) when you're done.
How about making a helper function that does one plot:
myPlotFunc <- function(x, save = FALSE) {
  fnames <- paste(names(x), "pdf", sep = ".")
  plot_one <- function(xx, fname, save = save) {
    if (save) {
      pdf(fname)
      on.exit(dev.off())
    }
    plot(xx)
  }
  for (i in 1:length(x)) plot_one(x[[i]], fnames[i], save)
  fnames
}
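Usage is the same as before; each pdf device is now closed inside plot_one before it returns, so an open plot window is left alone. For example:
plot(Alist[[1]])                          # open a screen device first
fnames <- myPlotFunc(Alist, save = TRUE)
dev.list()                                # only the screen device should remain open
unlink(fnames)                            # clean up the pdf files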
One drastic solution might be to use graphics.off() instead of trying to close only the devices your script opens. If this is just user code, then perhaps it doesn't matter if all graphics devices are closed upon exit?
Using this brutal approach seems to work:
myPlotFunc <- function(x, save = FALSE) {
  fnames <- paste(names(x), "pdf", sep = ".")
  if (save)
    on.exit(graphics.off(), add = TRUE)
  for (i in 1:length(x)) {
    if (save) {
      pdf(fnames[i])
    }
    plot(x[[i]])
  }
  fnames
}
An alternative is to list all devices when on.exit() is triggered, pick out the pdf ones, and close them. The following function implements this and seems to have the desired behaviour:
myPlotFunc2 <- function(x, save = FALSE) {
  fnames <- paste(names(x), "pdf", sep = ".")
  if (save) {
    on.exit(foo <- lapply(dev.list()[grepl("pdf", names(dev.list()))],
                          dev.off),
            add = TRUE)
  }
  for (i in 1:length(x)) {
    if (save) {
      pdf(fnames[i])
    }
    plot(x[[i]])
  }
  fnames
}
It seems the lowest-numbered device is the one that R activates after a call to dev.off(), and that will be the on-screen device in the setting you describe, hence the behaviour you report.
