Learning RSelenium: a few basic beginner technical issues

I've looked at https://github.com/ropensci/RSelenium/issues/94 and https://github.com/ropensci/RSelenium/issues/82 but was not able to solve my problem. It didn't help that the person there was on Windows, while I am on Mac (El Capitan, version 10.11.6).
I am trying to learn data scraping with RSelenium, but some of the technical aspects are giving me trouble early on. I have a few questions first and then will share my code:
(1) Right away, it says that startServer() is deprecated. Specifically:
startServer()
# output
Warning message:
startServer is deprecated.
Users in future can find the function in
file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see
vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see
vignette("RSelenium-docker", package = "RSelenium")
.
What should I use in place of startServer(), or what do I need to change on my computer? I'm confused about what this warning message is saying.
(2) Since it's just a warning, I continue by trying to open a browser in Chrome. I quickly run into another error:
remDr = remoteDriver$new(browserName = 'chrome')
remDr$open()
# output
[1] "Connecting to remote server"
$webdriver.remote.sessionid
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"
$locationContextEnabled
[1] TRUE
$webStorageEnabled
[1] TRUE
$takesScreenshot
[1] TRUE
$javascriptEnabled
[1] TRUE
$message
[1] "session not created exception\nfrom unknown error: Runtime.executionContextCreated has invalid 'context': {\"auxData\":{\"frameId\":\"34144.1\",\"isDefault\":true},\"id\":1,\"name\":\"\",\"origin\":\"://\"}\n (Session info: chrome=54.0.2840.71)\n (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Mac OS X 10.11.6 x86_64)"
$hasTouchScreen
[1] TRUE
$platform
[1] "ANY"
$cssSelectorsEnabled
[1] TRUE
$id
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"
The $message line of the output says that the session was not created. On my desktop, Chrome opens for a split second and then closes or crashes. I try again with Firefox and get:
remDr = remoteDriver$new(browserName = 'firefox')
remDr$open()
# output
[1] "Connecting to remote server"
Selenium message:The path to the driver executable must be set by the webdriver.gecko.driver system property; for more information, see https://github.com/mozilla/geckodriver. The latest version can be downloaded from https://github.com/mozilla/geckodriver/releases
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
It is frustrating to try to learn this and not even be able to get past the very first step of opening a browser. Any help is greatly appreciated!

As noted, checkForServer and startServer are deprecated, but you may still be able to use them as follows:
unlink(file.path(find.package("RSelenium"), "bin"), recursive = TRUE, force = TRUE)
RSelenium::checkForServer()
For Firefox:
In a terminal, run the following command:
brew install geckodriver
Running Selenium on the default port on a Mac is often a problem because Kerberos is already using port 4444. Run the following in the R console:
selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(extraCapabilities = list(marionette = TRUE), port=5556)
remDr$open()
......
# when finished
selServ$stop()
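Whether port 4444 is really taken on a given machine can be checked from R before choosing a port. The following is a minimal base-R sketch; port_is_open is a hypothetical helper name, not part of RSelenium, and an open port only shows that something is listening there, not what it is.

```r
# Hypothetical helper: TRUE if something accepts TCP connections on host:port.
port_is_open <- function(host = "localhost", port = 4444L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL   # connection refused or timed out
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# If this prints TRUE before any Selenium server has been started,
# another program already owns the port, so pick a different one (e.g. 5556).
print(port_is_open("localhost", 4444L))
```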
For chrome:
brew install chromedriver
Running Selenium on the default port on a Mac has the same issue. Run the following in the R console:
selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(browserName = "chrome",
extraCapabilities = list(marionette = TRUE),
port=5556)
remDr$open()
......
# when finished
selServ$stop()
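The $message in the question reports chrome=54.0.2840.71 being driven by chromedriver=2.20.353124. An old driver paired with a newer browser is a likely cause of the "session not created" failure, which is why updating chromedriver is suggested here. The sketch below pulls both versions out of such a message so the pairing is easy to see; extract_versions is a hypothetical helper, not an RSelenium function.

```r
# Hypothetical helper: extract browser and driver versions from the
# "$message" string returned by a failed remDr$open().
extract_versions <- function(msg) {
  grab <- function(key) {
    m <- regmatches(msg, regexpr(paste0(key, "=[0-9.]+"), msg))
    if (length(m) == 0) NA_character_ else sub(paste0(key, "="), "", m)
  }
  c(chrome = grab("chrome"), chromedriver = grab("chromedriver"))
}

# Abbreviated copy of the message from the question.
msg <- paste0(
  "session not created exception\n",
  " (Session info: chrome=54.0.2840.71)\n",
  " (Driver info: chromedriver=2.20.353124 (0353...),platform=Mac OS X 10.11.6 x86_64)"
)
print(extract_versions(msg))
```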
If the above doesn't help, consider running a Docker container; see
http://rpubs.com/johndharrison/RSelenium-Docker and https://github.com/SeleniumHQ/docker-selenium . This basically involves running a Docker container using something like:
$ docker run -d -p 5556:4444 selenium/standalone-chrome:3.0.1-aluminum
A Selenium server and Chrome browser should then be accessible on port 5556, which you can connect to by passing the appropriate arguments to remoteDriver.

Related

How can I reliably open a server/client connection with RSelenium under Linux?

Update: As of RSelenium 1.7.9 the described problems have disappeared.
I know, similar questions have been asked, but their solutions didn't work for me.
Summary:
I would like to open a Selenium-server and a client under Linux via R's package RSelenium.
But even though I try two ways described in the documentation (while I want to avoid docker)
it doesn't work reliably.
My system:
Linux 5.19, R 4.2.1,
RSelenium 1.7.7, selenium-server-standalone-4.0.0-alpha-2,
chromedriver 104.0.5112.79-2.1, geckodriver 0.31.0 (binman),
I have tested with OpenJDK 11 and OpenJDK 18 (currently)
I. Selenium via JAVA
In the Linux-console
#localhost:~/Documents/selenium> java -jar selenium-server-standalone-4.0.0-alpha-2.jar
20:04:49.470 INFO [GridLauncherV3.parse] - Selenium server version: 4.0.0-alpha-2, revision: f148142cf8
20:04:49.526 INFO [GridLauncherV3.lambda$buildLaunchers$3] - Launching a standalone Selenium Server on port 4444
20:04:49.730 INFO [WebDriverServlet.<init>] - Initialising WebDriverServlet
20:04:49.793 INFO [SeleniumServer.boot] - Selenium Server is up and running on port 4444
In R I type:
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L, browserName = "chrome", version = "104.0.5112.79")
and get in the Linux console the output:
20:07:49.463 INFO [ActiveSessionFactory.apply] - Capabilities are: {
"browserName": "chrome",
"javascriptEnabled": true,
"nativeEvents": true,
"version": "104.0.5112.79"
}
20:07:49.465 INFO [ActiveSessionFactory.lambda$apply$11] - Matched factory org.openqa.selenium.grid.session.remote.ServicedSession$Factory (provider: org.openqa.selenium.chrome.ChromeDriverService)
Starting ChromeDriver 104.0.5112.79 (3cf3e8c8a07d104b9e1260c910efb8f383285dc5-refs/branch-heads/5112#{#1307}) on port 15987
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
20:07:50.023 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
20:07:50.044 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 732d7c7ddfeaed42fc80fac54f91fcb5 (org.openqa.selenium.chrome.ChromeDriverService)
The Chrome-Browser opens and the R console gives me the kiss of death:
Error in checkError(res) :
Undefined error in httr call. httr output: Failed initialization
That means, I cannot use the R-console for navigation. The other approach:
II. Selenium via RSelenium::rsDriver
rD <- RSelenium::rsDriver(browser="firefox", port = 4567L, verbose = FALSE)
mostly yields (with a browser window opening)
Could not open firefox browser.
Client error message:
Undefined error in httr call. httr output: Failed initialization
Check server log for further details.
BUT: The very same code can work! Randomly. Or after a long time having R open?!? Endless testing?!?
Suddenly I get several running server/client connections including navigation on web-pages:
$acceptInsecureCerts
[1] FALSE
$browserName
[1] "firefox"
$browserVersion
[1] "103.0.2"
$`moz:accessibilityChecks`
[1] FALSE
$`moz:buildID`
[1] "20220815180539"
$`moz:geckodriverVersion`
[1] "0.31.0"
etc.
But at the latest when I reboot my PC, I get the same error message again. It can also work after deleting and reinstalling the four drivers via RSelenium in ./local/share. Or, when I try the same thing again, it simply doesn't.
I have never run into a problem with this kind of randomness. Where can it come from?
PS: When it doesn't work, the server log can contain additional lines, which I add here:
> rD$server$log()
$stderr
[26] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.jsm"
[27] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.sys.mjs"
[28] "console.error: \"Error during quit-application-granted: [Exception... \\\"File error: Not found\\\" nsresult: \\\"0x80520012 (NS_ERROR_FILE_NOT_FOUND)\\\" location: \\\"JS frame :: resource:///modules/BrowserGlue.jsm :: _onQuitApplicationGranted/tasks< :: line 2006\\\" data: no]\""
[29] "1661020441351\tMarionette\tINFO\tStopped listening on port 42425"
[30] "JavaScript error: chrome://remote/content/marionette/cert.js, line 57: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]"
$stdout
character(0)
Maybe you can try the following approach, which relies on Docker:
library(RSelenium)
url <- "https://www.hubs.com/3d-printing/#/?place=New%20York&latitude=40.7144&longitude=-74.006&distanceLimit=250&distanceUnit=miles&shipsToCountry=US&shipsToState=NY"
# shell() exists only on Windows; system() also works on Linux and macOS
system('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)
remDr$getPageSource()[[1]]
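Because docker run -d returns as soon as the container is created, the Selenium server inside it may not yet accept sessions when remDr$open() is called. Retrying the call a few times is one way to cope with that. This is a sketch under that assumption, not a confirmed explanation of the random failures described in the question; retry is a hypothetical helper.

```r
# Hypothetical helper: call f() until it succeeds, up to `tries` attempts,
# pausing `wait` seconds between attempts. Re-raises the last error.
retry <- function(f, tries = 5L, wait = 2) {
  stopifnot(tries >= 1)
  for (i in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i < tries) Sys.sleep(wait)
  }
  stop(result)
}

# Usage with the remDr object created above (needs the running container):
# retry(function() remDr$open(silent = TRUE), tries = 10L, wait = 3)
# remDr$navigate(url)
```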

RSelenium with RSDriver. Error: httr output: Failed to connect to localhost port 4445: Connection refused

I am trying to use RSelenium for web scraping. I am following the basics tutorial as explained on CRAN. The recommended approach is to install Docker (see tutorial as well as this stackoverflow answer). If I understand correctly, this is not an option for me as I am operating on Windows 7, for which Docker seems not to be available (see docker forum).
Thus, I am trying option 2 using rsDriver. I run
RSelenium::rsDriver()
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4445L,
browserName = "firefox"
)
remDr$open()
and get the error
> remDr$open()
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4445: Connection refused
This question has been asked and answered before here, here, here and here, though these are about the same error when using Docker and their solutions did not work for me.
Is there any way to get this running with rsDriver? Is there any option for me as a Windows 7 user?
With RSelenium version 1.7.7 this is a workaround:
library(RSelenium)
remDr <- rsDriver(
port = 4445L,
browser = "firefox"
)
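This command combines the server setup and driver initiation. The returned object is a list: the browser client is its client element (remDr$client) and the server handle is its server element (remDr$server).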
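The error in the question comes from starting rsDriver() on its default port while pointing a separate remoteDriver at port 4445. Since rsDriver() already returns a connected client, one way to avoid the mismatch is to use that client directly. A minimal sketch follows; get_page_title is a hypothetical wrapper and the URL in the usage line is only an example.

```r
# Hypothetical wrapper: start server and browser with rsDriver(), read a
# page title, and always shut both down again.
get_page_title <- function(url, port = 4445L, browser = "firefox") {
  rD <- RSelenium::rsDriver(port = port, browser = browser, verbose = FALSE)
  on.exit({
    try(rD$client$close(), silent = TRUE)  # close the browser window
    rD$server$stop()                       # stop the Selenium server
  }, add = TRUE)
  remDr <- rD$client   # already connected; no remoteDriver() or open() needed
  remDr$navigate(url)
  remDr$getTitle()[[1]]
}

# Usage (requires RSelenium, Java and Firefox to be installed):
# get_page_title("https://www.r-project.org")
```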
My issue (on Mac) was updating Java:
https://www.oracle.com/java/technologies/downloads/#jdk19-mac
Worked after this.

RSelenium Installation on MacBook, with Chrome

I have RSelenium (the package) installed, to do some scraping of NHL statistics from hockeyreference.com
It was working, all fine, but recently stopped working, giving this error:
[1] "Connecting to remote server"
Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to
localhost port 4567: Connection refused
Check server log for further details.
$client
[1] "No sessionInfo. Client browser is mostly likely not opened."
$server
Process Handle
command : /private/var/folders/dk/kf4tf83n1lg40687w6fmq5wh0000gn/T/Rtmpiy1cOY/file1d1856ef53ae.sh
system id : 18786
state : exited
Warning message:
In rsDriver(port = 4567L, geckover = NULL, phantomver = NULL) :
Could not determine server status.
I've tried reinstalling, but could not get it working. My original install is outlined in this question (using homebrew, with the latest chromedriver installed):
Css selector issue with rvest and NHL statistics
Any help would be great. The code I'm running with RSelenium is here:
https://github.com/papelr/nhldatar/blob/master/nhldatar/R/nhldatar-phase-2.R
TL;DR, I can't get the rsDriver call to work, and it gives the error posted above:
rsDriver(port = 4567L, geckover = NULL, phantomver = NULL)
remDr <- remoteDriver(browserName = "chrome")
remDr$open()
If RSelenium works (opening a chrome browser), then the rest of this will run! Thanks!
I recommend that you:
install Docker for Mac,
pull the image for chrome, firefox (I recommend version 3.5.1) or phantom,
run the image in Docker: docker run...
create the remote driver:
remDr <- remoteDriver(remoteServerAddr = "your IP here", port = 4445L,
browserName = "firefox")
If you use a debug image, you can watch what the browser is doing over VNC.

Rselenium executable does not exist

I try to run RSelenium using the following:
library("RSelenium")
#start RSelenium server
rD <- rsDriver(verbose = FALSE)
remDr <- rD$client
remDr$open()
However, in rsDriver(), I receive this error:
Selenium message:The driver executable does not exist: C:\Users\kira\Documents
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
I have downloaded the standalone jar of Selenium and put it on the path, but the error does not disappear. Are there any other workarounds?
From the docs, it looks like you should be starting the server from the command terminal. Of course, you can do this from R with the system2 command, but it is probably easier to start the jar from a terminal first for debugging.
Alternatively you can run the binary manually. Open a console in your OS and navigate to where the binary is located and run:
java -jar selenium-server-standalone-x.xx.x.jar
By default the Selenium Server listens for connections on port 4444.
Note for Mac OSX: The default port 4444 is sometimes used by other programs such as kerberos. In our examples we use port 4445 in respect of this and for consistency with the Docker vignette.
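Starting the jar from R with system2 could look roughly like the sketch below. It assumes a 2.x/3.x standalone jar that accepts the -port flag and a java executable on the PATH; selenium_args and start_selenium are hypothetical helper names, and the jar file name is left as the placeholder from the docs.

```r
# Hypothetical helper: arguments for `java` to launch the standalone server.
selenium_args <- function(jar, port = 4444L) {
  c("-jar", jar, "-port", as.character(port))
}

# Hypothetical helper: launch the server in the background.
# wait = FALSE makes system2() return immediately instead of blocking R.
# Paths containing spaces would additionally need shQuote().
start_selenium <- function(jar, port = 4444L) {
  system2("java", selenium_args(jar, port), wait = FALSE)
}

# Usage (replace the placeholder with the actual jar file name):
# start_selenium("selenium-server-standalone-x.xx.x.jar", port = 4444L)
```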
Afterwards, connecting from R:
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4444L
, browserName = "firefox"
)
remDr$open()
remDr$getStatus()

RSelenium Connecting to a running server

I am trying to use RSelenium to automate some of my more tedious reporting tasks.
I have downloaded the Java virtual machine as per the instructions.
I have gotten the server running by using the commands below:
# Run the Command at the command line
cd selenium
java -jar selenium-server-standalone-3.0.1.jar
In R I then add the following code:
require(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4445L
, browserName = "firefox"
)
remDr <- remoteDriver(port = 4445L)
remDr$open()
When I run the last line I get the error:
[1] "Connecting to remote server"
Error in checkError(res) :
Couldnt connect to host on http://localhost:4445/wd/hub.
Please ensure a Selenium server is running.
I can see in the command line window that the server is running, as I am getting the message "Selenium Server is up and running".
Can anyone see what I'm doing incorrectly?
Update
I have tried switching the port to 4444 based on the advice below, but I get the following errors.
From the Cmd Prompt
Selenium message:The path to the driver executable must be set by the webdriver.gecko.driver system property; for more information, see https://github.com/mozilla/geckodriver. The latest version can be downloaded from https://github.com/mozilla/geckodriver/releases
From R
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
From Firefox version 48 onward, geckodriver is also required to drive a Firefox browser with Selenium Server. It can be downloaded at https://github.com/mozilla/geckodriver/releases. If you wish to run the Selenium server manually, you should then either
add the geckodriver path to PATH,
or set the webdriver.gecko.driver system property on the JVM.
The second method would be done as:
java -Dwebdriver.gecko.driver="path-to-geckodriver" -jar selenium-server-standalone-3.0.1.jar
If you are running Windows and have downloaded the Selenium standalone to C:\Selenium and the geckodriver to the same location, then this would look like:
C:\Users\john>cd C:\Selenium
C:\Selenium>java -Dwebdriver.gecko.driver="C:\Selenium\geckodriver.exe" -jar selenium-server-standalone-3.0.1.jar
NOTE: on a 32-bit Windows machine you will need the 32-bit geckodriver, and on a 64-bit machine the corresponding 64-bit geckodriver.
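The same java command line can be assembled from R. The sketch below looks for geckodriver on the PATH with Sys.which() and builds the -Dwebdriver.gecko.driver property shown above; gecko_java_args is a hypothetical helper, and the paths in the usage lines are taken from the Windows example.

```r
# Hypothetical helper: java arguments that point Selenium at geckodriver.
# By default the driver is looked up on the PATH.
gecko_java_args <- function(jar, gecko = Sys.which("geckodriver")) {
  gecko <- unname(gecko)
  if (!nzchar(gecko)) stop("geckodriver not found; pass its path explicitly")
  c(paste0("-Dwebdriver.gecko.driver=", gecko), "-jar", jar)
}

args <- gecko_java_args("selenium-server-standalone-3.0.1.jar",
                        gecko = "C:/Selenium/geckodriver.exe")
print(args)
# system2("java", args, wait = FALSE)  # would start the server (needs Java)
```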
Alternatively, the recommended way to run a Selenium server with RSelenium is to run a Docker container that includes the Selenium Server, geckodriver and an appropriate Firefox browser:
docker run -d -p 5901:5900 -p 127.0.0.1:4444:4444 --link http-server selenium/standalone-firefox-debug:3.0.1-barium
See the vignette at http://rpubs.com/johndharrison/RSelenium-Docker
