RSelenium Connecting to a running server - r

I am trying to use RSelenium to automate some of my more tedious reporting tasks.
I have downloaded the Java virtual machine as per the instructions.
I got the server running using the commands below:
# Run the Command at the command line
cd selenium
java -jar selenium-server-standalone-3.0.1.jar
In R I then add the following code:
require(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4445L
, browserName = "firefox"
)
remDr <- remoteDriver(port = 4445L)
remDr$open()
When I run the last line I get the error:
[1] "Connecting to remote server"
Error in checkError(res) :
Couldnt connect to host on http://localhost:4445/wd/hub.
Please ensure a Selenium server is running.
I can see in the command line window that the server is running, as I am getting the message "Selenium Server is up and running".
Can anyone see what I'm doing incorrectly?
Update
I have tried switching the port to 4444 based on the advice below, but I get the following errors.
From the command prompt:
Selenium message:The path to the driver executable must be set by the webdriver.gecko.driver system property; for more information, see https://github.com/mozilla/geckodriver. The latest version can be downloaded from https://github.com/mozilla/geckodriver/releases
From R
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method

From Firefox version 48 onwards, geckodriver is also required to drive a Firefox browser with Selenium Server. geckodriver can be downloaded at https://github.com/mozilla/geckodriver/releases. If you wish to run the Selenium server manually, you should then either:
add the geckodriver path to PATH, or
set the webdriver.gecko.driver system property on the JVM.
The second method would be done as:
java -Dwebdriver.gecko.driver="path-to-geckodriver" -jar selenium-server-standalone-3.0.1.jar
If you are running Windows and have downloaded the Selenium standalone jar to C:\Selenium and geckodriver to the same location, then this would look like:
C:\Users\john>cd C:\Selenium
C:\Selenium>java -Dwebdriver.gecko.driver="C:\Selenium\geckodriver.exe" -jar selenium-server-standalone-3.0.1.jar
NOTE: on a 32-bit Windows machine you will need the 32-bit geckodriver, and on a 64-bit machine the corresponding 64-bit geckodriver.
Alternatively, the recommended way to run a Selenium server with RSelenium is to run a Docker container which includes the Selenium Server, geckodriver and an appropriate Firefox browser:
docker run -d -p 5901:5900 -p 127.0.0.1:4444:4444 --link http-server selenium/standalone-firefox-debug:3.0.1-barium
See the vignette at http://rpubs.com/johndharrison/RSelenium-Docker
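The manual launch described above could also be driven from R. The following is a minimal sketch, not part of RSelenium: `selenium_args` is a hypothetical helper, the paths are the example Windows locations used earlier in this answer, and the `system2` call is left commented out because it assumes `java` is on the PATH and both files exist.

```r
# Hypothetical helper: build the argument vector for `java` so that the
# webdriver.gecko.driver system property comes before -jar.
selenium_args <- function(jar, geckodriver) {
  c(
    paste0("-Dwebdriver.gecko.driver=", shQuote(geckodriver)),
    "-jar",
    shQuote(jar)
  )
}

# Example paths taken from the Windows example above (assumed locations).
args <- selenium_args(
  jar = "C:/Selenium/selenium-server-standalone-3.0.1.jar",
  geckodriver = "C:/Selenium/geckodriver.exe"
)
print(args)

# To start the server in the background without blocking the R session:
# system2("java", args, wait = FALSE)
```

Building the arguments separately makes it easy to print and check the exact command before starting the server.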

Related

How can I reliably open a server/client connection with RSelenium under Linux?

Update: As of RSelenium 1.7.9 the described problems have disappeared.
I know, similar questions have been asked, but their solutions didn't work for me.
Summary:
I would like to open a Selenium-server and a client under Linux via R's package RSelenium.
But even though I try two ways described in the documentation (while I want to avoid docker)
it doesn't work reliably.
My system:
Linux 5.19, R 4.2.1,
RSelenium 1.7.7, selenium-server-standalone-4.0.0-alpha-2,
chromedriver 104.0.5112.79-2.1, geckodriver 0.31.0 (binman),
I have tested with OpenJDK 11 and OpenJDK 18 (currently)
I. Selenium via JAVA
In the Linux-console
#localhost:~/Documents/selenium> java -jar selenium-server-standalone-4.0.0-alpha-2.jar
20:04:49.470 INFO [GridLauncherV3.parse] - Selenium server version: 4.0.0-alpha-2, revision: f148142cf8
20:04:49.526 INFO [GridLauncherV3.lambda$buildLaunchers$3] - Launching a standalone Selenium Server on port 4444
20:04:49.730 INFO [WebDriverServlet.<init>] - Initialising WebDriverServlet
20:04:49.793 INFO [SeleniumServer.boot] - Selenium Server is up and running on port 4444
In R I type:
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L, browserName = "chrome", version = "104.0.5112.79")
and get in the Linux console the output:
20:07:49.463 INFO [ActiveSessionFactory.apply] - Capabilities are: {
"browserName": "chrome",
"javascriptEnabled": true,
"nativeEvents": true,
"version": "104.0.5112.79"
}
20:07:49.465 INFO [ActiveSessionFactory.lambda$apply$11] - Matched factory org.openqa.selenium.grid.session.remote.ServicedSession$Factory (provider: org.openqa.selenium.chrome.ChromeDriverService)
Starting ChromeDriver 104.0.5112.79 (3cf3e8c8a07d104b9e1260c910efb8f383285dc5-refs/branch-heads/5112#{#1307}) on port 15987
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
20:07:50.023 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
20:07:50.044 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 732d7c7ddfeaed42fc80fac54f91fcb5 (org.openqa.selenium.chrome.ChromeDriverService)
The Chrome-Browser opens and the R console gives me the kiss of death:
Error in checkError(res) :
Undefined error in httr call. httr output: Failed initialization
That means, I cannot use the R-console for navigation. The other approach:
II. Selenium via RSelenium::rsDriver
rD <- RSelenium::rsDriver(browser="firefox", port = 4567L, verbose = FALSE)
mostly yields (with a browser window opening)
Could not open firefox browser.
Client error message:
Undefined error in httr call. httr output: Failed initialization
Check server log for further details.
BUT: The very same code can work! Randomly. Or after a long time having R open?!? Endless testing?!?
Suddenly I get several running server/client connections including navigation on web-pages:
$acceptInsecureCerts
[1] FALSE
$browserName
[1] "firefox"
$browserVersion
[1] "103.0.2"
$`moz:accessibilityChecks`
[1] FALSE
$`moz:buildID`
[1] "20220815180539"
$`moz:geckodriverVersion`
[1] "0.31.0"
etc.pp.
But at the latest when I reboot my PC, I get the same error message again. It can also work after deleting and reinstalling the four drivers via RSelenium in ./local/share. Or, when I try the same thing again, it simply doesn't.
I have never run into a problem with this kind of randomness. Where can it come from?
PS: The server log, if it doesn't work, can have additional lines, which I add:
> rD$server$log()
$stderr
[26] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.jsm"
[27] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.sys.mjs"
[28] "console.error: \"Error during quit-application-granted: [Exception... \\\"File error: Not found\\\" nsresult: \\\"0x80520012 (NS_ERROR_FILE_NOT_FOUND)\\\" location: \\\"JS frame :: resource:///modules/BrowserGlue.jsm :: _onQuitApplicationGranted/tasks< :: line 2006\\\" data: no]\""
[29] "1661020441351\tMarionette\tINFO\tStopped listening on port 42425"
[30] "JavaScript error: chrome://remote/content/marionette/cert.js, line 57: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]"
$stdout
character(0)
Maybe you can try the following approach, which relies on Docker:
library(RSelenium)
url <- "https://www.hubs.com/3d-printing/#/?place=New%20York&latitude=40.7144&longitude=-74.006&distanceLimit=250&distanceUnit=miles&shipsToCountry=US&shipsToState=NY"
# shell() exists only in R for Windows; system() also works on Linux
system('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)
remDr$getPageSource()[[1]]
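Because `docker run -d` returns immediately, the server inside the container may not yet be ready when `remDr$open()` is called. The following is a rough sketch of a polling helper. Both `wait_until` and `server_ready` are hypothetical names, and the status URL assumes the container is mapped to host port 4445 and serves the `/wd/hub/status` endpoint.

```r
# Poll `probe` until it returns TRUE or `timeout` seconds have elapsed.
wait_until <- function(probe, timeout = 30, interval = 1) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(probe())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Probe (assumed endpoint): TRUE once the status URL returns any content.
server_ready <- function(url = "http://localhost:4445/wd/hub/status") {
  res <- tryCatch(
    suppressWarnings(readLines(url, warn = FALSE)),
    error = function(e) NULL
  )
  !is.null(res) && length(res) > 0
}

# Usage, after starting the container:
# if (wait_until(server_ready)) remDr$open()
```

Passing the probe as a function keeps the waiting logic independent of how readiness is checked, so a different check can be swapped in if the status endpoint differs for a given image.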

How to resolve RSelenium error message "Failed to connect to localhost port 4444: Connection refused"?

I am trying to use RSelenium with Docker to crawl a website. However, I have some issues trying to get RSelenium/Docker to work.
Specifically, I installed Docker on my computer, which appears to be running fine (I see the image of the whale below when I open it).
In R, I then run the following code with no problems and see the expected output.
shell('docker run -d -p 4445:4444 selenium/standalone-chrome')
shell('docker ps')
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d7de815ce644 selenium/standalone-chrome "/opt/bin/entry_poin…" 13 minutes ago Up 13 minutes 0.0.0.0:4445->4444/tcp zen_mclean
But when I then run the following code, I always receive the following error message:
remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
port = 4444,
browserName = "chrome")
remDr$open()
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
I am not sure what is going on here (I'm new to scraping). Can anybody help me figure out what to do here?
If it helps, I am running Windows 10.
In Docker, you've bound your host's port 4445 to the Selenium container's port 4444.
This means that if you run R on your host, you need to specify port = 4445.
Does that solve it?
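The PORTS column of `docker ps` reads as host address and port, then an arrow, then the container port. As an illustration only, the sketch below pulls both numbers out of such an entry. `host_port` is a hypothetical helper and the input string is the one from the question's `docker ps` output.

```r
# Split a `docker ps` PORTS entry such as "0.0.0.0:4445->4444/tcp"
# into the host-side port and the container-side port.
host_port <- function(mapping) {
  m <- regmatches(mapping, regexec(":([0-9]+)->([0-9]+)/", mapping))[[1]]
  if (length(m) != 3) stop("unrecognised port mapping: ", mapping)
  c(host = as.integer(m[2]), container = as.integer(m[3]))
}

ports <- host_port("0.0.0.0:4445->4444/tcp")
print(ports)

# The host value is the one to pass to remoteDriver(port = ...),
# because R runs on the host, not inside the container.
```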
I managed to solve the problem by uninstalling Docker Toolbox and VMBox, which I was using, and installing the latest version of Docker from their website instead.

Rselenium executable does not exist

I try to run RSelenium using the following:
library("RSelenium")
#start RSelenium server
rD <- rsDriver(verbose = FALSE)
remDr <- rD$client
remDr$open()
However, in rsDriver(), I receive this error:
Selenium message:The driver executable does not exist: C:\Users\kira\Documents
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
I have downloaded the standalone Selenium jar and put it on the path, but the error does not disappear. Are there any other workarounds?
From the docs, it looks like you should be starting the server from a terminal. Of course, you can do this from R with the system2 command, but it is probably easier to start the jar from a terminal first for debugging. The docs say:
Alternatively you can run the binary manually. Open a console in your OS and navigate to where the binary is located and run:
java -jar selenium-server-standalone-x.xx.x.jar
By default the Selenium Server listens for connections on port 4444.
Note for Mac OSX: The default port 4444 is sometimes used by other programs such as kerberos. In our examples we use port 4445 in respect of this and for consistency with the Docker vignette.
Afterwards, connecting from R:
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4444L
, browserName = "firefox"
)
remDr$open()
remDr$getStatus()

r - learning RSelenium, a few basic beginner technical issues

I've looked at https://github.com/ropensci/RSelenium/issues/94 and https://github.com/ropensci/RSelenium/issues/82 but was not able to solve my problem. It didn't help that this person was on Windows, and I am on Mac (El Capitan, version 10.11.6)
I am trying to learn data scraping with RSelenium, but some of the technical aspects of it are giving me issues early on. I have a few questions first and then will share my code:
(1) Right away, it says that startServer() is deprecated. Specifically:
startServer()
# output
Warning message:
startServer is deprecated.
Users in future can find the function in
file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see
vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see
vignette("RSelenium-docker", package = "RSelenium")
.
What should I use in place of startServer(), or what do I need to change on my computer? I'm confused as to what this warning message is saying.
(2) Since it's just a warning, I continue by trying to open a browser in chrome. I quickly run into another error:
remDr = remoteDriver$new(browserName = 'chrome')
remDr$open()
# output
[1] "Connecting to remote server"
$webdriver.remote.sessionid
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"
$locationContextEnabled
[1] TRUE
$webStorageEnabled
[1] TRUE
$takesScreenshot
[1] TRUE
$javascriptEnabled
[1] TRUE
$message
[1] "session not created exception\nfrom unknown error: Runtime.executionContextCreated has invalid 'context': {\"auxData\":{\"frameId\":\"34144.1\",\"isDefault\":true},\"id\":1,\"name\":\"\",\"origin\":\"://\"}\n (Session info: chrome=54.0.2840.71)\n (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Mac OS X 10.11.6 x86_64)"
$hasTouchScreen
[1] TRUE
$platform
[1] "ANY"
$cssSelectorsEnabled
[1] TRUE
$id
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"
The $message line in the output mentions that the session was not created. On my desktop, what I see is that Chrome opens for a split second and then closes or crashes. I try again with Firefox and get:
remDr = remoteDriver$new(browserName = 'firefox')
remDr$open()
# output
[1] "Connecting to remote server"
Selenium message:The path to the driver executable must be set by the webdriver.gecko.driver system property; for more information, see https://github.com/mozilla/geckodriver. The latest version can be downloaded from https://github.com/mozilla/geckodriver/releases
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
It is frustrating to try to learn this and not even be able to get past the very first step of opening a browser. Any help is greatly appreciated!
As noted, checkForServer and startServer are deprecated, but you may still be able to use them as follows:
unlink(file.path(find.package("RSelenium"), "bin"), recursive = TRUE, force = TRUE)
RSelenium::checkForServer()
For Firefox:
In a terminal, run the following command:
brew install geckodriver
Running Selenium on the default port on a Mac is problematic, because Kerberos is often already running on the default port 4444. Run the following commands in the R console:
selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(extraCapabilities = list(marionette = TRUE), port=5556)
remDr$open()
......
# when finished
selServ$stop()
For chrome:
brew install chromedriver
As with Firefox, running Selenium on the default port on a Mac has an issue. Run the following commands in the R console:
selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(browserName = "chrome",
extraCapabilities = list(marionette = TRUE),
port=5556)
remDr$open()
......
# when finished
selServ$stop()
If the above doesn't help, then look at running a Docker container; see
http://rpubs.com/johndharrison/RSelenium-Docker and https://github.com/SeleniumHQ/docker-selenium. This basically involves running a Docker container using something like:
$ docker run -d -p 5556:4444 selenium/standalone-chrome:3.0.1-aluminum
A Selenium server and Chrome browser should then be accessible on port 5556, which you can connect to by passing the appropriate arguments to remoteDriver.

Dealing with "GConf Error: No D-BUS daemon running" error in Ubuntu 12.04 LTS

I would like to run some R scripts in a terminal on an Ubuntu 12.04 LTS server (no GUI) that is hosted as a virtual server inside a Windows 2008 Server. I log into the server through ssh, and whenever I run R scripts or the gnome-open command in the terminal I get the following error:
(gnome-open:10138): GConf-WARNING **: Client failed to connect to the D-BUS daemon:
Unable to autolaunch a dbus-daemon without a $DISPLAY for X11
GConf Error: No D-BUS daemon running
I have tried everything including:
1. Installing Xvfb and configuring it.
2. Exporting the display variable set in /etc/environment and ~/.bashrc.
3. Trying to export dbus-launch, to no avail.
4. Getting and loading the session dbus id from file.
X11 forwarding has been enabled.
I need help dealing with this issue. Any ideas?
Try:
apt-get install dbus-x11
This helped me on an Ubuntu 12.04 system with X installed and the same error message.
