RSelenium Server Issue

I am trying to use RSelenium. So far I have used the code below to start the Selenium server on my local computer (in RStudio) without any problem.
But now I am working in RStudio on an AWS EC2 instance, and the same code no longer works, so I am a bit puzzled. This is the output:
[1] "Connecting to remote server"
Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to localhost port 4567: Connection refused
Check server log for further details.
Warning message:
In rsDriver() : Could not determine server status.
devtools::install_github("johndharrison/binman")
devtools::install_github("johndharrison/wdman")
devtools::install_github("ropensci/RSelenium")
library(RSelenium)
library(wdman)
rD <- rsDriver()  # starts a Selenium server (default port 4567L) and tries to open a Chrome client
Anyone? Thanks!
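"Connection refused" on localhost port 4567 means that nothing was accepting connections on that port when the client tried to connect: the Selenium server that rsDriver() should have started either never came up or exited again. On a fresh EC2 instance, a missing Java runtime is one plausible reason, but that is an assumption, not something the post confirms. A minimal base-R sketch for checking the port before opening a session; `port_is_open()` and `wait_for_port()` are hypothetical helper names, not part of RSelenium:

```r
# Hypothetical helpers (not part of RSelenium): check whether anything is
# listening on a TCP port, using only base R sockets.
port_is_open <- function(host = "localhost", port, timeout = 2) {
  con <- tryCatch(
    # socketConnection() warns and then errors when the connection is refused
    suppressWarnings(socketConnection(host = host, port = port, blocking = TRUE,
                                      open = "r+", timeout = timeout)),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# Poll the port until it accepts connections or max_wait seconds have passed.
wait_for_port <- function(host = "localhost", port, max_wait = 30, interval = 1) {
  deadline <- Sys.time() + max_wait
  repeat {
    if (port_is_open(host, port)) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Usage (sketch; assumes a Selenium server should be running on port 4567):
# if (!wait_for_port("localhost", 4567L)) stop("No Selenium server on port 4567")
```

If the port stays closed, the server log (`rD$server$log()`, as used in the related question) is the place to look for why the server did not start.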

Related

How can I reliably open a server/client connection with RSelenium under Linux?

Update: As of RSelenium 1.7.9, the problems described below have disappeared.
I know similar questions have been asked, but their solutions didn't work for me.
Summary:
I would like to open a Selenium server and a client under Linux via R's package RSelenium.
But even though I have tried the two ways described in the documentation (I want to avoid Docker),
neither works reliably.
My system:
Linux 5.19, R 4.2.1,
RSelenium 1.7.7, selenium-server-standalone-4.0.0-alpha-2,
chromedriver 104.0.5112.79-2.1, geckodriver 0.31.0 (binman),
I have tested with OpenJDK 11 and with OpenJDK 18 (the one currently installed)
I. Selenium via Java
In the Linux console:
#localhost:~/Documents/selenium> java -jar selenium-server-standalone-4.0.0-alpha-2.jar
20:04:49.470 INFO [GridLauncherV3.parse] - Selenium server version: 4.0.0-alpha-2, revision: f148142cf8
20:04:49.526 INFO [GridLauncherV3.lambda$buildLaunchers$3] - Launching a standalone Selenium Server on port 4444
20:04:49.730 INFO [WebDriverServlet.<init>] - Initialising WebDriverServlet
20:04:49.793 INFO [SeleniumServer.boot] - Selenium Server is up and running on port 4444
In R I type:
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L, browserName = "chrome", version = "104.0.5112.79")
and the Linux console then shows:
20:07:49.463 INFO [ActiveSessionFactory.apply] - Capabilities are: {
"browserName": "chrome",
"javascriptEnabled": true,
"nativeEvents": true,
"version": "104.0.5112.79"
}
20:07:49.465 INFO [ActiveSessionFactory.lambda$apply$11] - Matched factory org.openqa.selenium.grid.session.remote.ServicedSession$Factory (provider: org.openqa.selenium.chrome.ChromeDriverService)
Starting ChromeDriver 104.0.5112.79 (3cf3e8c8a07d104b9e1260c910efb8f383285dc5-refs/branch-heads/5112#{#1307}) on port 15987
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
20:07:50.023 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
20:07:50.044 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 732d7c7ddfeaed42fc80fac54f91fcb5 (org.openqa.selenium.chrome.ChromeDriverService)
The Chrome browser opens, but the R console gives me the kiss of death:
Error in checkError(res) :
Undefined error in httr call. httr output: Failed initialization
This means I cannot use the R console for navigation. The other approach:
II. Selenium via RSelenium::rsDriver
rD <- RSelenium::rsDriver(browser="firefox", port = 4567L, verbose = FALSE)
mostly yields (while a browser window opens):
Could not open firefox browser.
Client error message:
Undefined error in httr call. httr output: Failed initialization
Check server log for further details.
BUT: the very same code can work! Randomly, or after R has been open for a long time?! Endless testing?!
Suddenly I get several working server/client connections, including navigation on web pages:
$acceptInsecureCerts
[1] FALSE
$browserName
[1] "firefox"
$browserVersion
[1] "103.0.2"
$`moz:accessibilityChecks`
[1] FALSE
$`moz:buildID`
[1] "20220815180539"
$`moz:geckodriverVersion`
[1] "0.31.0"
etc.
But after a reboot at the latest, I get the same error message again. It can also work after deleting the four drivers in ./local/share and reinstalling them via RSelenium. Or, when I try the same again, it simply doesn't.
I have never run into this kind of random behavior before. Where could it come from?
PS: When it doesn't work, the server log can contain additional lines, which I include here:
> rD$server$log()
$stderr
[26] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.jsm"
[27] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.sys.mjs"
[28] "console.error: \"Error during quit-application-granted: [Exception... \\\"File error: Not found\\\" nsresult: \\\"0x80520012 (NS_ERROR_FILE_NOT_FOUND)\\\" location: \\\"JS frame :: resource:///modules/BrowserGlue.jsm :: _onQuitApplicationGranted/tasks< :: line 2006\\\" data: no]\""
[29] "1661020441351\tMarionette\tINFO\tStopped listening on port 42425"
[30] "JavaScript error: chrome://remote/content/marionette/cert.js, line 57: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]"
$stdout
character(0)
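One thing worth ruling out with errors that come and go like this (an assumption, the post does not confirm it) is that a server from an earlier rsDriver() call is still holding the port, so the next call on the same port fails. A minimal base-R sketch that picks a port nothing is listening on by trying to bind a server socket to it; `find_free_port()` is a hypothetical helper, not part of RSelenium:

```r
# Hypothetical helper (not part of RSelenium): return the first port in
# `candidates` that no other process is listening on, or NA if none is free.
find_free_port <- function(candidates = 4567:4600) {
  for (p in as.integer(candidates)) {
    free <- tryCatch({
      s <- suppressWarnings(serverSocket(p))  # errors if the port is already taken
      close(s)                                # release it again immediately
      TRUE
    }, error = function(e) FALSE)
    if (free) return(p)
  }
  NA_integer_
}

# Usage (sketch): start the server on a port that is free right now,
# and stop client and server when done so the port is released.
# rD <- RSelenium::rsDriver(browser = "firefox", port = find_free_port(), verbose = FALSE)
# rD$client$close(); rD$server$stop()
```

The check is inherently racy (another process could take the port between the check and the rsDriver() call), so it narrows down the cause rather than guaranteeing success.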
Maybe you can try the following approach, which relies on Docker:
library(RSelenium)
url <- "https://www.hubs.com/3d-printing/#/?place=New%20York&latitude=40.7144&longitude=-74.006&distanceLimit=250&distanceUnit=miles&shipsToCountry=US&shipsToState=NY"
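system('docker run -d -p 4445:4444 selenium/standalone-firefox')  # shell() exists only on Windows; system() also works on Linux and macOS
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")  # host port 4445 is mapped to the container's 4444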
remDr$open()
remDr$navigate(url)
remDr$getPageSource()[[1]]
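The port passed to remoteDriver() has to be the host side (left of the colon) of Docker's `-p host:container` mapping, because the Selenium server inside the container listens on 4444. A small sketch that builds the command and keeps both numbers in one place; `selenium_docker_cmd()` is a hypothetical helper, and it is meant to be run with system(), since shell() exists only in R for Windows:

```r
# Hypothetical helper: build the `docker run` command for a Selenium
# standalone image, publishing the container's port 4444 on `host_port`.
selenium_docker_cmd <- function(host_port = 4445L, browser = c("firefox", "chrome")) {
  browser <- match.arg(browser)
  sprintf("docker run -d -p %d:4444 selenium/standalone-%s",
          as.integer(host_port), browser)
}

# Usage (sketch; requires Docker and RSelenium):
# host_port <- 4445L
# system(selenium_docker_cmd(host_port, "firefox"))
# remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
#                                  port = host_port, browserName = "firefox")
# remDr$open()
```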

Error while running Jupyter Notebook: The requested URL could not be retrieved

I am using Ubuntu 20.04.
I get the following error while running Jupyter Notebook:
ERROR
The requested URL could not be retrieved
The following error was encountered while trying to retrieve the URL: http://localhost:8888/tree?
Connection to 127.0.0.1 failed.
The system returned: (111) Connection refused
The remote host or network may be down. Please try the request again.

RSelenium could not open Chrome browser problem

I've been trying many times to connect to Chrome through Selenium, but it always shows the error message below.
I've already checked that my Chrome is the latest version; I'm not sure what's going on :( Thanks!
[1] "Connecting to remote server"
Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to localhost port 4567 after 0 ms: Connection refused
Check server log for further details.
Warning message:
In rsDriver(browser = c("chrome"), chromever = "99.0.4844.51", verbose = T, :
Could not determine server status.
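One common cause worth checking (not confirmed by the post) is the `chromever` argument: it has to name a ChromeDriver build that binman has downloaded, and for Chrome releases of this era its major version should match the installed Chrome's. A sketch that picks the newest matching build from a vector of available versions; `pick_chromever()` is a hypothetical helper, the installed-Chrome version in the usage comment is a placeholder, and the shape of `binman::list_versions()` output may differ between binman versions:

```r
# Hypothetical helper: choose the newest ChromeDriver version whose major
# version matches the installed Chrome; returns NA if there is none.
pick_chromever <- function(chrome_version, available) {
  major <- function(v) sub("\\..*$", "", v)   # "99.0.4844.82" -> "99"
  candidates <- available[major(available) == major(chrome_version)]
  if (length(candidates) == 0) return(NA_character_)
  as.character(max(numeric_version(candidates)))
}

# Usage (sketch; on Linux the installed version can be read with
# `google-chrome --version`):
# drivers <- unlist(binman::list_versions("chromedriver"))
# ver <- pick_chromever("99.0.4844.82", drivers)
# rD <- RSelenium::rsDriver(browser = "chrome", chromever = ver, verbose = TRUE)
```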

RSelenium with RSDriver. Error: httr output: Failed to connect to localhost port 4445: Connection refused

I am trying to use RSelenium for web scraping. I am following the basics tutorial as explained on CRAN. The recommended approach is to install Docker (see the tutorial as well as this Stack Overflow answer). If I understand correctly, this is not an option for me, as I am on Windows 7, for which Docker does not seem to be available (see the Docker forum).
So I am trying option 2, using rsDriver(). I run:
RSelenium::rsDriver()
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4445L,
browserName = "firefox"
)
remDr$open()
and get the error
> remDr$open()
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4445: Connection refused
This question has been asked and answered before here, here, here and here, but those answers concern the same error when using Docker, and their solutions did not work for me.
Is there any way to get this running with rsDriver()? Is there any option for me as a Windows 7 user?
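With RSelenium version 1.7.7, this is a workaround:
library(RSelenium)
rD <- rsDriver(
port = 4445L,
browser = "firefox"
)
remDr <- rD$client  # rsDriver() returns a list with $server and $client
This command combines the server setup and the driver initialization: rsDriver() starts the server on the given port and opens a client connected to it, so no separate remoteDriver() call is needed.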
My issue (on a Mac) was solved by updating Java:
https://www.oracle.com/java/technologies/downloads/#jdk19-mac
It worked after that.

How to resolve RSelenium error message "Failed to connect to localhost port 4444: Connection refused"?

I am trying to use RSelenium with Docker to crawl a website. However, I have some issues getting RSelenium/Docker to work.
Specifically, I installed Docker on my computer, and it appears to be running fine (I see the Docker whale when I open it).
In R, I then run the following code with no problems and see the expected output.
shell('docker run -d -p 4445:4444 selenium/standalone-chrome')
shell('docker ps')
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d7de815ce644 selenium/standalone-chrome "/opt/bin/entry_poin…" 13 minutes ago Up 13 minutes 0.0.0.0:4445->4444/tcp zen_mclean
But when I then run the following code, I always receive the following error message:
remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
port = 4444,
browserName = "chrome")
remDr$open()
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
I am not sure what is going on here (I'm new to scraping). Can anybody help me figure out what to do here?
If it helps, I am running Windows 10.
In Docker, you've bound your host's port 4445 to the Selenium server's port 4444 inside the container.
This means that if you run R on your host, you need to specify port = 4445L in remoteDriver().
Does that solve it?
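In `docker ps`, the PORTS column reads host->container, so in `0.0.0.0:4445->4444/tcp` the number to give remoteDriver() is 4445. A sketch that extracts it from that text; `host_port_for()` is a hypothetical helper, and the `--format` template in the usage comment assumes a Docker version that supports `docker ps --format`:

```r
# Hypothetical helper: from the PORTS text shown by `docker ps`, return the
# host port that is mapped to `container_port` (NA if there is no mapping).
host_port_for <- function(ports, container_port = 4444L) {
  pattern <- sprintf("^.*:([0-9]+)->%d/tcp$", as.integer(container_port))
  mappings <- trimws(strsplit(ports, ",", fixed = TRUE)[[1]])
  hit <- grep(pattern, mappings, value = TRUE)
  if (length(hit) == 0) return(NA_integer_)
  as.integer(sub(pattern, "\\1", hit[1]))
}

# Usage (sketch; requires Docker and RSelenium):
# ports <- system('docker ps --filter ancestor=selenium/standalone-chrome --format "{{.Ports}}"',
#                 intern = TRUE)
# remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
#                                  port = host_port_for(ports[1]),
#                                  browserName = "chrome")
```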
I managed to solve the problem by uninstalling Docker Toolbox and VMBox, which I was using, and installing the latest version of Docker from their website instead.
