I have a few scrapes via RSelenium scheduled. Sometimes the scraping fails and I would like to know the reason.
I note that the error messages (in red) are quite informative, but I don't know how to log them.
Let's say I provided a "non well-formed URL":
tryCatch(
expr = remDr$navigate("i.am.not.an.url"),
error = function(error){
print(error)
# write.table(error, file = ...)
}
)
This is what I get, but it doesn't say much about what triggered the error:
<simpleError: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: org.openqa.selenium.WebDriverException
Further Details: run errorDetails method>
This is more informative, but I can't manage to log it:
Selenium message:Target URL i.am.not.an.url is not well-formed.
Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03'
System info: host: '9bc48e7a4511', ip: '172.17.0.4', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1087-aws', java.version: '1.8.0_91'
Driver info: driver.version: unknown
What I tried:
Using the error handling class. It includes a very detailed specification of the error messages and their meanings, but I can't manage to extract them for my current error.
errHandle = errorHandler(remDr)
errHandle$checkStatus(remDr)
errHandle$checkError(res = remDr)
Using a message handler from another SO question:
messageHandler <- function(fun, ...) {
zz <- textConnection("foo", "w", local = TRUE)
sink(zz, type = "message")
res <- fun(...)
sink(type = "message")
close(zz)
#handle messages
list(res, messages = foo)
}
wrongURL <- function() {
remDr$navigate("mistake")
}
messageHandler(fun = wrongURL)
I found a way via errorDetails():
tryCatch(
expr = remDr$navigate("i.am.not.an.url"),
error = function(error){
return(remDr$errorDetails()$localizedMessage)
}
)
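To actually write the message to a log file, the handler can append a timestamped line instead of returning the string. The following is a minimal sketch in base R, not RSelenium-specific: `log_error` and its `detail` argument are hypothetical names, and a plain `stop()` stands in for the failing `remDr$navigate()` call. With RSelenium one would presumably pass `detail = function(e) remDr$errorDetails()$localizedMessage`.

```r
# Hypothetical helper: evaluate `expr`; on error, append one timestamped line to `log_file`.
# `detail` turns the condition into the text to log (default: the condition message).
log_error <- function(expr, log_file, detail = function(e) conditionMessage(e)) {
  tryCatch(
    expr,
    error = function(e) {
      line <- paste(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), detail(e))
      cat(line, "\n", sep = "", file = log_file, append = TRUE)
      invisible(NULL)
    }
  )
}

# Stand-in for a failing navigation (assumption: no Selenium server is needed here)
log_file <- tempfile(fileext = ".log")
log_error(stop("Target URL i.am.not.an.url is not well-formed."), log_file)
logged <- readLines(log_file)
```

Because the file is opened with `append = TRUE`, repeated scheduled runs accumulate one line per failure.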
I feel this is supposed to be simple, but I have been struggling to get it right. I'm trying to extract the number of employees ("2,300,000") from this webpage: https://fortune.com/company/walmart/
I used Chrome's SelectorGadget extension to locate the number: "info__row--7f9lE:nth-child(13) .info__value--2AHH7"
```
library(RSelenium)
library(rvest)
library(netstat)
rs_driver_object<-rsDriver(browser='chrome',chromever='103.0.5060.53',verbose=FALSE, port=free_port())
remDr<-rs_driver_object$client
remDr$navigate('https://fortune.com/company/walmart/')
Employees<-remDr$findElement(using = 'xpath','//h3[@class="info__row--7f9lE:nth-child(13) .info__value--2AHH7"]')
Employees
```
An error says
> "Selenium message:no such element: Unable to locate element".
I have also tried:
```
Employees<-remDr$findElement(using = 'class name','info__value--2AHH7')
```
But it does not return the data I want.
Can someone point out the problem? I'd really appreciate it!
Updated
I modified the code as suggested by Frodo below in the comment to apply to multiple webpages to save the statistics as a dataframe. But I still encountered an error.
library(RSelenium)
library(rvest)
library(netstat)
rs_driver_object<-rsDriver(browser='chrome',chromever='103.0.5060.53',verbose=FALSE, port=netstat::free_port())
remDr<-rs_driver_object$client
Data<-data.frame("url" = c("https://fortune.com/company/walmart/", "https://fortune.com/company/amazon-com/"
,"https://fortune.com/company/apple/"
,"https://fortune.com/company/cvs-health/"
,"https://fortune.com/company/jpmorgan-chase/"
,"https://fortune.com/company/verizon/"
,"https://fortune.com/company/ford-motor/"
, "https://fortune.com/company/general-motors/"
,"https://fortune.com/company/anthem/"
, "https://fortune.com/company/centene/"
,"https://fortune.com/company/fannie-mae/"
, "https://fortune.com/company/comcast/"
, "https://fortune.com/company/chevron/"
,"https://fortune.com/company/dell-technologies/"
,"https://fortune.com/company/bank-of-america-corp/"
,"https://fortune.com/company/target/") )
Data$numEmp<-"NA"
Data$numEmp <- numeric()
for (i in 1:length(Data$url))
{
remDr$navigate(url = Data$url[i])
pgSrc <- remDr$getPageSource()
pgCnt <- read_html(pgSrc[[1]])
Data$numEmp[i] <- pgCnt %>%
html_nodes(xpath = "//div[text()='Employees']/following-sibling::div") %>%
html_text(trim = TRUE)
}
Data$numEmp
Selenium message:unknown error: unexpected command response
(Session info: chrome=103.0.5060.114)
Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10'
System info: host: 'DESKTOP-VCCIL8P', ip: '192.168.1.249', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_311'
Driver info: driver.version: unknown
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: org.openqa.selenium.WebDriverException
Further Details: run errorDetails method
Can someone please take another look?
Use RSelenium to load the webpage and get the page source
remDr$navigate(url = 'https://fortune.com/company/walmart/')
pgSrc <- remDr$getPageSource()
Use rvest to read the contents of the webpage
pgCnt <- read_html(pgSrc[[1]])
Then use the rvest::html_nodes and rvest::html_text functions to extract the text with a suitable XPath selector. (this Chrome extension should help)
reqTxt <- pgCnt %>%
html_nodes(xpath = "//div[text()='Employees']/following-sibling::div") %>%
html_text(trim = TRUE)
Output of reqTxt
> reqTxt
[1] "2,300,000"
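The XPath above selects the div that follows the div whose text is exactly "Employees". A small sketch on a hand-written snippet shows the idea without a browser; the markup here is a hypothetical stand-in for a label/value row, and the real page's markup may differ.

```r
library(rvest)

# Hypothetical markup: each row holds a label div followed by a value div
page <- minimal_html('
  <div class="row"><div>Label A</div><div>Value A</div></div>
  <div class="row"><div>Employees</div><div> 2,300,000 </div></div>
')

# Same XPath as in the answer: match the label, then step to its sibling
employees <- page %>%
  html_elements(xpath = "//div[text()='Employees']/following-sibling::div") %>%
  html_text(trim = TRUE)
```

Matching on the visible label rather than on generated class names such as info__value--2AHH7 should keep the selector working if those class names change.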
UPDATE
The error Selenium message:unknown error: unexpected command response seems to occur specifically with version 103 of Chromedriver. More info here. One of the answers there suggested a simple wait of 5 seconds before and after the driver navigates to the URL. I have also used tryCatch inside a while loop so that the code keeps retrying until the page loads. This seems to work.
# Function to fetch employee count
getEmployees <- function(myURL) {
  pagestatus <- 0
  # Keep retrying until navigation completes without an error
  while (pagestatus == 0) {
    pagestatus <- tryCatch(
      expr = {
        remDr$navigate(url = myURL)
        1  # navigation succeeded
      },
      error = function(error) {
        0  # navigation failed; try again
      }
    )
  }
  pgSrc <- remDr$getPageSource()
  pgCnt <- read_html(pgSrc[[1]])
  return(pgCnt %>% html_nodes(xpath = "//div[text()='Employees']/following-sibling::div") %>% html_text(trim = TRUE))
}
Apply this function to every URL in your data frame.
for(i in 1:nrow(Data)) {
Sys.sleep(5)
Data[i, 2] <- getEmployees(Data[i, 1])
Sys.sleep(5)
}
The second column now contains:
> Data[, 2]
[1] "2,300,000" "1,608,000" "154,000" "258,000" "271,025" "118,400"
[7] "183,000" "157,000" "98,200" "72,500" "7,400" "189,000"
[13] "42,595" "133,000" "208,248" "450,000"
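A retry loop without an upper bound can spin forever if a page never loads. As a hedged variation on the idea above, the sketch below caps the number of attempts; `retry` and `flaky` are hypothetical names, and `flaky` merely simulates a navigation that fails twice before succeeding.

```r
# Hypothetical helper: call `fun` until it succeeds or `max_attempts` is reached
retry <- function(fun, max_attempts = 5, wait = 0) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = fun()),
      error = function(e) list(ok = FALSE, error = conditionMessage(e))
    )
    if (result$ok) return(result$value)
    message("Attempt ", attempt, " failed: ", result$error)
    Sys.sleep(wait)  # pause between attempts
  }
  stop("All ", max_attempts, " attempts failed; last error: ", result$error)
}

# Stand-in for remDr$navigate(): fails on the first two calls, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("unexpected command response")
  "page loaded"
}

outcome <- retry(flaky)
```

In the scraping loop one would presumably wrap the navigation, for example `retry(function() remDr$navigate(url = myURL), wait = 5)`.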
Does it have to be with RSelenium only? In my experience, the most flexible approach is to use RSelenium to navigate to the required pages (where findElement helps you find boxes to enter text into or buttons to click) and then use rvest to extract what you need from the page.
Start with
rs_driver_object<-rsDriver(browser='chrome',chromever='103.0.5060.53',verbose=FALSE, port=netstat::free_port())
remDr<-rs_driver_object$client
remDr$navigate('https://fortune.com/company/walmart/')
page_source <- remDr$getPageSource()
pg <- xml2::read_html(page_source[[1]])
How you then go about it depends on how specific you want the solution to be with respect to this exact page. Here is one way:
rvest::html_elements(pg, "div.info__row--7f9lE") |>
rvest::html_text2()
or
rvest::html_elements(pg, "div:nth-child(13) > div.info__value--2AHH7") |>
rvest::html_text2()
or
rvest::html_elements(pg, "div.info__row--7f9lE")[11] |>
rvest::html_children()
or
rvest::html_elements(pg, '.info__row--7f9lE:nth-child(13) .info__value--2AHH7') |>
rvest::html_text2()
et cetera. What you do in the rvest part would depend on how general you want the selection/extraction process to be.
In summary, I'm finding that when running httr::POST against a plumber api, within an R.utils::withTimeout, the post is throwing an error message which can't be suppressed. The error message is:
Error in .Call(R_curl_fetch_memory, enc2utf8(url), handle, nonblocking) : reached elapsed time limit
Things I've tried:
suppressMessages/suppressWarnings
tryCatch with no error response
Using sinks
I can't think of any other way of stopping this error. The code carries on running but it's resulting in users of the api getting spammed with error messages when checking for an available port to call. Any ideas welcome.
Reproducible example (note this uses the RStudio job package and a separate plumber file, but I've tried without the job package and the issue persists, so it's not connected to that):
plumber.R:
#* quick ping
#* @post /ping
function() {
list(msg = "you got here!")
}
#* slow 10s call
#* @post /slowfxn
function() {
Sys.sleep(10)
1
}
code which calls and produces errors:
#run the plumber file in a separate job:
r <- plumber::plumb("errortest/plumber.R")
job::job({r$run(swagger = F,port = 1111)})
#this function will throw the error if no response comes within 3s:
call_with_timeout = function() {
R.utils::withTimeout(
rawToChar(httr::POST("http://127.0.0.1:1111/echo")$content)
,timeout = 3,onTimeout = "silent")
}
#now call the fxn (again in a job) which takes 10 s to run:
job::job({httr::POST(url = "http://127.0.0.1:1111/slowfxn")})
#wait a second for that job to be initiated:
Sys.sleep(1)
#while that's running, try and call the quick fxn, with a 3 second timeout:
#try wrap it in a trycatch to suppress:
tryCatch( {
call_with_timeout()
},
error = function(x) {},
TimeoutException = function(x){},
warning = function(x) {}
)
suppressMessages(suppressWarnings(call_with_timeout()))
#not even a sink can stop it!
sink("delete.txt")
call_with_timeout()
sink(NULL)
I have code that includes several initial checks of different parameter values. The code is part of a larger project involving several R scripts as well as calls from other environments. If a parameter value does not pass one of the checks, I want to:
1. Generate a customizable result code
2. Skip the remaining code (which is not going to work anyhow if the parameters are wrong)
3. Create a log entry with the line where the error was thrown (which tells me which test was not passed by the parameters)
4. Print my customizable result code to the console (without a more detailed explanation / traceback from the error)
Otherwise, the remaining code should be run. If there are other errors (not thrown by me), I also need error handling that results in a customizable general result code (signalling that there was an error, but that it was not one thrown by me) and a more detailed log.
The result codes are part of the communication with a larger environment and just distinguish between wrong parameter values (i.e., errors thrown by me) and other internal problems (that might occur later in the script).
I would like to use tryCatchLog because it allows me to log a detailed traceback including the script name (I am sourcing my own code) and the line number. I have not figured out, however, how to generate my own error code (currently I am doing this via the base function stop()) and pass this along using tryCatchLog (while also writing a log).
Example
In the following example, my parameter_check() throws an error via stop() with my result code "400". Using tryCatchLog I can catch the error and get a detailed error message including a traceback. However, I want to separate my own error code (just "400"), which should be printed to the console, from a more detailed error message, which should go to a log file.
library(tryCatchLog)
parameter_check <- function(error) {
if (error){
stop("400")
print("This line should not appear")
}
}
print("Beginning")
tryCatchLog(parameter_check(error = TRUE),
error = function(e) {print(e)}
)
print("End")
Currently, the result is:
[1] "Beginn"
ERROR [2021-12-08 11:43:38] 400
Compact call stack:
1 tryCatchLog(parameter_check(0), error = function(e) {
2 #3: stop("400")
Full call stack:
1 tryCatchLog(parameter_check(0), error = function(e) {
print(e)
2 tryCatch(withCallingHandlers(expr, condition =
cond.handler), ..., finall
3 tryCatchList(expr, classes, parentenv, handlers)
4 tryCatchOne(expr, names, parentenv, handlers[[1]])
5 doTryCatch(return(expr), name, parentenv, handler)
6 withCallingHandlers(expr, condition = cond.handler)
7 parameter_check(0)
8 #3: stop("400")
9 .handleSimpleError(function (c)
{
if (inherits(c, "condition")
<simpleError in parameter_check(0): 400>
I would like to get my own result code ("400") so that I can print it to the console while logging the complete error message in a file. Is there a way of doing it without writing code parsing the error message, etc.?
Solution with tryCatch
Based on the hint by R Yoda and this answer, here is a solution with tryCatch and calling handlers.
### Parameters
log_file_location <- "./logs/log.txt"
### Defining functions
parameter_check_1 <- function(error) {
if (error){
stop("400")
}
}
parameter_check_2 <- function(error) {
if (error){
stop("400")
}
}
write_to_log <- function(file_location, message) {
if (file.exists(file_location))
{write(message, file_location, append = TRUE)}
else
{write(message, file_location, append = FALSE)}
}
parameter_check <- function(){
print("Beginning of parameter check")
print("First check")
parameter_check_1(error = TRUE)
print("Second check")
parameter_check_2(error = FALSE)
print("End of parameter check")
}
main<- function() {
print("Beginning of main function")
log(-1) # throws warning
log("error") # throws error
print("End of main function")
}
### Setting parameters
result_code_no_error <- "200"
result_code_bad_request <- "400"
result_code_internal_error <- "500"
# initial value for result_code
result_code <- result_code_no_error
print("Beginning of program")
### Execute parameter check with tryCatch and calling handlers
# Error in parameter checking functions should result in result_code_bad_request
tryCatch(withCallingHandlers(parameter_check(),
error = function(condition){},
warning = function(condition){
write_to_log(log_file_location, condition$message)
invokeRestart("muffleWarning")
}
),
error = function(condition) {
write_to_log(log_file_location, condition$message)
result_code <<- result_code_bad_request
}
)
### Execute main section with tryCatch and calling handlers
# Error in main section should result in result_code_internal_error
# main section should only be executed if there is no error (internal or bad request) in the previous section
if (result_code == result_code_no_error) {
tryCatch(withCallingHandlers(main(),
error = function(condition){},
warning = function(condition){
write_to_log(log_file_location, condition$message)
invokeRestart("muffleWarning")
}
),
error = function(condition) {
write_to_log(log_file_location, condition$message)
result_code <<- result_code_internal_error
}
)
}
print("End of program")
print(result_code)
As explained in the vignette for tryCatchLog this has the disadvantage of not logging the precise location of the error. I am not passing on the error message from stop("400"), because all parameter checking functions are in one function call now, but this could be done using condition$message.
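Part of the location information can be recovered in base R, because a calling handler runs before the stack unwinds and `sys.calls()` still contains the failing call at that point. The following is a rough sketch of that idea, not how tryCatchLog itself works: `run_with_trace` is a hypothetical helper, and it records call text only, without file names or line numbers.

```r
# Hypothetical helper: return the error message plus the call stack at the point of failure
run_with_trace <- function(expr) {
  trace <- NULL
  msg <- tryCatch(
    withCallingHandlers(
      { expr; NULL },
      error = function(e) {
        # Still inside the failing call chain here, so the stack is intact
        calls <- as.list(sys.calls())
        trace <<- vapply(calls, function(cl) paste(deparse(cl), collapse = " "), character(1))
      }
    ),
    error = function(e) conditionMessage(e)
  )
  list(message = msg, trace = trace)
}

parameter_check_1 <- function() stop("400")
out <- run_with_trace(parameter_check_1())
out$message  # the result code to print
out$trace    # the calls to write to the log
```

The `trace` vector could then be passed to `write_to_log()` alongside the message.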
The solution is (totally independent of using tryCatchLog or standard R tryCatch):
...
error = function(e) {print(e$message)}
..
Background (how R errors work): They create an object of type (error) condition:
e <- simpleError("400") # same "condition" object as created by stop("400")
str(e)
# List of 2
# $ message: chr "400"
# $ call : NULL
# - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
print(e$message)
[1] "400"
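Since a condition is just a classed list, one way to separate "my" errors from other errors without parsing messages is to give them their own class and an extra field. The sketch below assumes hypothetical names (`bad_request`, the `code` field, `run`); it relies only on base R's `structure()`, `stop()` and `tryCatch()`.

```r
# Hypothetical constructor for a custom error condition carrying a result code
bad_request <- function(message, code = "400") {
  structure(
    class = c("bad_request", "error", "condition"),
    list(message = message, call = NULL, code = code)
  )
}

parameter_check <- function(x) {
  if (!is.numeric(x)) stop(bad_request("x must be numeric"))
  x
}

# Map outcomes to result codes: own errors -> their code, any other error -> "500"
run <- function(expr) {
  tryCatch(
    { expr; "200" },
    bad_request = function(e) e$code,
    error = function(e) "500"
  )
}

code_bad   <- run(parameter_check("a"))  # custom error
code_other <- run(log("error"))          # ordinary R error
code_ok    <- run(parameter_check(1))    # no error
```

The `bad_request` handler is listed before the generic `error` handler so that it is tried first; the same handlers should also work inside tryCatchLog, which forwards them to tryCatch.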
I am building a function to connect to a specific password-protected ODBC data source that will be used by many members of a team, possibly in multiple environments. In the event that the connection is rejected, I would like to display the warning messages but mask the password that's displayed. If I use suppressWarnings() nothing gets captured as far as I can tell, and if I don't, then the message is displayed in the standard output with the password. Here's the function so far:
connectToData <- function(uid, pswd, dsn='myDSN') {
# Function to connect to myDSN data
#
# Args:
# uid: The user's ID for connecting to the database
# pswd: The user's password for connecting to the database.
# dsn: The DSN for the (already existing) ODBC connection to the 5G
# data. It must be set up on an individual Windows user's machine,
# and they could use any name for it. The default is 'myDSN'
#
# Returns:
# The 'RODBC' class object returned by the RODBC::odbcConnect() function.
#
# TODO: 1) See if you can specify the connection using odbcDriverConnect()
# so as to not rely on user's ODBC connections
# 2) Capture warnings from odbcConnect() and print them while
# disguising password using gsub, as I've attempted to do below.
library('RODBC')
db.conn <- odbcConnect(dsn,
uid=uid,
pwd=pswd)
if(class(db.conn) != 'RODBC') { # Error handling for connections that don't make it
print(gsub(pswd,'******',warnings())) # This doesn't work like I want it to
stop("ODBC connection could not be opened. See warnings()")
} else {
return(db.conn)
}
}
When I run it with the right username/password, I get the right result but when I run it with a bad password, I get this:
> db.conn <- connectTo5G(uid='myID',pswd='badpassword', dsn='myDSN')
[1] "RODBC::odbcDriverConnect(\"DSN=myDSN;UID=myID;PWD=******\")"
[2] "RODBC::odbcDriverConnect(\"DSN=myDSN;UID=myID;PWD=******\")"
Error in connectTo5G(uid = "myID", pswd = "badpassword", dsn = "myDSN") :
ODBC connection could not be opened. See warnings()
In addition: Warning messages:
1: In RODBC::odbcDriverConnect("DSN=myDSN;UID=myID;PWD=badpassword") :
[RODBC] ERROR: state 28000, code 1017, message [Oracle][ODBC][Ora]ORA-01017: invalid username/password; logon denied
2: In RODBC::odbcDriverConnect("DSN=myDSN;UID=myID;PWD=badpassword") :
ODBC connection failed
The print(gsub(...)) appears to work on the most recent warnings from before the function was invoked, and it also only prints the function call that produced the warning, not the text of the warning.
What I would like to do is capture everything after "In addition: Warning messages:" so that I can use gsub() on it, but avoid printing it before the gsub() gets a chance to work on it. I think I need to use withCallingHandlers() but I've looked through the documentation and examples and I cannot figure it out.
Some extra background: This is an Oracle database that locks users out after three attempts to connect so I want to use stop() in case someone writes code that calls this function multiple times. Different users in my group work in both Windows and Linux (sometimes going back and forth) so any solution needs to be flexible.
Catching error messages
I do not fully understand what you want to accomplish with ODBC, but in terms of converting the error message, you can use tryCatch as @joran suggested
pswd = 'badpassword'
# Just as a reproducible example, a function which fails and outputs badpassword
failing <- function(){
badpassword == 1
}
# This would be the error handling part
tryCatch(failing(),
error = function(e) gsub(pswd, '******', e))
[1] "Error in failing(): object '******' not found\n"
Here e is the error condition object, which gsub coerces to the error message text. You could think of other ways to manipulate what is printed to the screen, so that passwords are harder to guess from what was replaced. Note, for example, that the word 'object' would also have been replaced if the password had been 'object' for some reason, and so would parts of words. At the very least, it makes sense to include word boundaries in the gsub command:
pswd = 'ling'
failing <- function(){
ling == 1
}
tryCatch(failing(),
error = function(e) gsub(paste0("\\b", pswd, "\\b"), '******', e))
[1] "Error in failing(): object '******' not found\n"
For other improvements you should look closely at the specific error messages.
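One further pitfall: gsub treats the password as a regular expression by default, so a password containing characters such as `.` or `(` can match the wrong text or raise a regex error. A minimal sketch of a literal replacement follows; `mask_secret` is a hypothetical helper, and the password is made up for illustration. Note that `fixed = TRUE` cannot be combined with the `\\b` word boundaries shown above, so this is a trade-off rather than a strict improvement.

```r
# Hypothetical helper: replace every literal occurrence of `secret` in `text`
mask_secret <- function(text, secret, mask = "******") {
  gsub(secret, mask, text, fixed = TRUE)
}

# A made-up password containing regex metacharacters
msg <- "DSN=myDSN;UID=myID;PWD=p.ss(w0rd"
masked <- mask_secret(msg, "p.ss(w0rd")
masked
```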
Warnings
tryCatch can also manipulate warnings:
pswd = 'ling'
failing <- function(){
warning("ling")
ling == 1
}
tryCatch(failing(),
warning = function(w) gsub(paste0("\\b", pswd, "\\b"), '******', w),
error = function(e) gsub(paste0("\\b", pswd, "\\b"), '******', e))
[1] "simpleWarning in failing(): ******\n"
This will not show the error then, however.
withCallingHandlers
If you really want to catch all output from errors and warnings, you do indeed need withCallingHandlers, which works mostly in the same way, except that it does not terminate the rest of the evaluation.
pswd = 'ling'
failing <- function(pswd){
warning(pswd)
warning("asd")
stop(pswd)
}
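withCallingHandlers(failing(pswd),
  warning = function(w) {
    # Re-signal the warning with the password masked, then muffle the original
    warning(gsub(paste0("\\b", pswd, "\\b"), '******', conditionMessage(w)), call. = FALSE)
    invokeRestart("muffleWarning")
  },
  error = function(e) {
    # Replace the error with a masked copy; this ends the evaluation
    stop(gsub(paste0("\\b", pswd, "\\b"), '******', conditionMessage(e)), call. = FALSE)
  })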
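If the goal is to collect the warnings for later printing rather than re-signal them, the calling handler can store each masked message and muffle the original. This is a sketch under assumptions: `capture_masked` is a hypothetical helper, the warnings are simulated rather than coming from RODBC, and only the warning message (not the call, which may also contain the password) is recorded.

```r
# Hypothetical helper: evaluate `expr`, collecting masked warning messages
capture_masked <- function(expr, secret) {
  warnings_seen <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      masked <- gsub(secret, "******", conditionMessage(w), fixed = TRUE)
      warnings_seen <<- c(warnings_seen, masked)
      invokeRestart("muffleWarning")  # stop the unmasked warning from being printed
    }
  )
  list(value = value, warnings = warnings_seen)
}

# Simulated connection attempt that warns twice and returns -1
res <- capture_masked({
  warning("PWD=badpassword rejected")
  warning("ODBC connection failed")
  -1
}, secret = "badpassword")
res$warnings
```

Because the handler muffles each warning, evaluation continues to the end of the expression and nothing containing the password reaches the console.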
While using the elastic package in R, I'm getting a message when calling connect("172.28.6.5").
Message details :
Found http or https on es_host, stripping off, see the docs .
After this, when I run the command:
res <- Search(index = 'abc_20*', fields = c("Seq_Num"),scroll="5m",search_type = "scan")
It gives me this error message:
Error: 404 - IndexMissingException[[abc_20%2A] missing]
This error only appears on my laptop.
How can I resolve this issue?
Is that exactly what you did? I get no issues running that command.
library(elastic)
connect("172.28.6.5")
#> transport: http
#> host: 172.28.6.5
#> port: 9200
#> path: NULL
#> username: NULL
#> password: <secret>
#> errors: simple
#> headers (names): NULL
Looking at the source for elastic::connect(), assuming you've tried updating the package (this block traces back to April 2016)
# strip off transport if found
if (grepl("^http[s]?://", es_host)) {
message("Found http or https on es_host, stripping off, see the docs")
es_host <- sub("^http[s]?://", "", es_host)
}
(note: this is a message, not an error as in your question) suggests you're passing in a host string that matches the regex ^http[s]?://