RSelenium Installation on MacBook, with Chrome - r

I have the RSelenium package installed to scrape NHL statistics from hockeyreference.com.
It was working fine, but it recently stopped working and now gives this error:
[1] "Connecting to remote server"
Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to
localhost port 4567: Connection refused
Check server log for further details.
$client
[1] "No sessionInfo. Client browser is mostly likely not opened."
$server
Process Handle
command : /private/var/folders/dk/kf4tf83n1lg40687w6fmq5wh0000gn/T/Rtmpiy1cOY/file1d1856ef53ae.sh
system id : 18786
state : exited
Warning message:
In rsDriver(port = 4567L, geckover = NULL, phantomver = NULL) :
Could not determine server status.
I've tried reinstalling, but could not get it working. My original install is outlined in this question (using homebrew, with the latest chromedriver installed):
Css selector issue with rvest and NHL statistics
Any help would be great. The code I'm running with RSelenium is here:
https://github.com/papelr/nhldatar/blob/master/nhldatar/R/nhldatar-phase-2.R
TL;DR: I can't get the rsDriver call to work; it gives the error posted above:
rsDriver(port = 4567L, geckover = NULL, phantomver = NULL)
remDr <- remoteDriver(browserName = "chrome")
remDr$open()
If RSelenium works (opening a chrome browser), then the rest of this will run! Thanks!

I recommend the following:
install Docker for Mac,
pull the image for chrome, firefox (version 3.5.1 recommended) or phantom,
run the image in Docker: docker run...
create the remote driver:
remDr <- remoteDriver(remoteServerAddr = "your IP here", port = 4445L,
browserName = "firefox")
If you use the debug version of the image, you can watch what the browser is doing over VNC.

How can I reliably open a server/client connection with RSelenium under Linux?

Update: As of RSelenium 1.7.9 the described problems have disappeared.
I know, similar questions have been asked, but their solutions didn't work for me.
Summary:
I would like to open a Selenium server and a client under Linux via the R package RSelenium.
I have tried two ways described in the documentation (I want to avoid Docker),
but neither works reliably.
My system:
Linux 5.19, R 4.2.1,
RSelenium 1.7.7, selenium-server-standalone-4.0.0-alpha-2,
chromedriver 104.0.5112.79-2.1, geckodriver 0.31.0 (binman),
I have tested with OpenJDK 11 and OpenJDK 18 (currently)
I. Selenium via JAVA
In the Linux-console
#localhost:~/Documents/selenium> java -jar selenium-server-standalone-4.0.0-alpha-2.jar
20:04:49.470 INFO [GridLauncherV3.parse] - Selenium server version: 4.0.0-alpha-2, revision: f148142cf8
20:04:49.526 INFO [GridLauncherV3.lambda$buildLaunchers$3] - Launching a standalone Selenium Server on port 4444
20:04:49.730 INFO [WebDriverServlet.<init>] - Initialising WebDriverServlet
20:04:49.793 INFO [SeleniumServer.boot] - Selenium Server is up and running on port 4444
In R I type:
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L, browserName = "chrome", version = "104.0.5112.79")
and get in the Linux console the output:
20:07:49.463 INFO [ActiveSessionFactory.apply] - Capabilities are: {
"browserName": "chrome",
"javascriptEnabled": true,
"nativeEvents": true,
"version": "104.0.5112.79"
}
20:07:49.465 INFO [ActiveSessionFactory.lambda$apply$11] - Matched factory org.openqa.selenium.grid.session.remote.ServicedSession$Factory (provider: org.openqa.selenium.chrome.ChromeDriverService)
Starting ChromeDriver 104.0.5112.79 (3cf3e8c8a07d104b9e1260c910efb8f383285dc5-refs/branch-heads/5112#{#1307}) on port 15987
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
20:07:50.023 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
20:07:50.044 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 732d7c7ddfeaed42fc80fac54f91fcb5 (org.openqa.selenium.chrome.ChromeDriverService)
The Chrome-Browser opens and the R console gives me the kiss of death:
Error in checkError(res) :
Undefined error in httr call. httr output: Failed initialization
That means I cannot use the R console for navigation. The other approach:
II. Selenium via RSelenium::rsDriver
rD <- RSelenium::rsDriver(browser="firefox", port = 4567L, verbose = FALSE)
mostly yields (with a browser window opening)
Could not open firefox browser.
Client error message:
Undefined error in httr call. httr output: Failed initialization
Check server log for further details.
BUT: The very same code can work! Randomly. Or after a long time having R open?!? Endless testing?!?
Suddenly I get several running server/client connections including navigation on web-pages:
$acceptInsecureCerts
[1] FALSE
$browserName
[1] "firefox"
$browserVersion
[1] "103.0.2"
$`moz:accessibilityChecks`
[1] FALSE
$`moz:buildID`
[1] "20220815180539"
$`moz:geckodriverVersion`
[1] "0.31.0"
etc.pp.
But at the latest when I reboot my PC, I get the same error message again. It can also work after deleting and reinstalling the four drivers via RSelenium in ./local/share, yet when I try the same thing again, it simply doesn't.
I have never run into this kind of randomness. Where could it come from?
PS: When it doesn't work, the server log can contain additional lines, which I add here:
> rD$server$log()
$stderr
[26] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.jsm"
[27] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.sys.mjs"
[28] "console.error: \"Error during quit-application-granted: [Exception... \\\"File error: Not found\\\" nsresult: \\\"0x80520012 (NS_ERROR_FILE_NOT_FOUND)\\\" location: \\\"JS frame :: resource:///modules/BrowserGlue.jsm :: _onQuitApplicationGranted/tasks< :: line 2006\\\" data: no]\""
[29] "1661020441351\tMarionette\tINFO\tStopped listening on port 42425"
[30] "JavaScript error: chrome://remote/content/marionette/cert.js, line 57: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]"
$stdout
character(0)
You could try the following approach, which relies on Docker:
library(RSelenium)
url <- "https://www.hubs.com/3d-printing/#/?place=New%20York&latitude=40.7144&longitude=-74.006&distanceLimit=250&distanceUnit=miles&shipsToCountry=US&shipsToState=NY"
# system() is used here because shell() exists only in R for Windows
system('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)
remDr$getPageSource()[[1]]
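One caveat with this approach: docker run -d returns as soon as the container has been created, so the Selenium server inside it may not be accepting connections yet when remDr$open() runs. A minimal sketch of a retry wrapper in base R follows. The name retry_call and its arguments are made up for illustration and are not part of RSelenium.

```r
# Hypothetical helper (not part of RSelenium): call `f` until it succeeds
# or `attempts` tries have been used, sleeping `wait` seconds in between.
retry_call <- function(f, attempts = 5L, wait = 2) {
  stopifnot(attempts >= 1)
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$error
    # Do not sleep after the final failed attempt.
    if (i < attempts) Sys.sleep(wait)
  }
  stop("all ", attempts, " attempts failed; last error: ",
       conditionMessage(last_error), call. = FALSE)
}
```

With the remDr object created above, this could be used as retry_call(function() remDr$open()) in place of the bare remDr$open() call.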

How to run RSelenium script in Heroku

I've been struggling to run an R script that uses RSelenium on Heroku. I have other scripts running on the platform, but I can't get a Selenium server to start. I'm following advice I found here on Stack Overflow. I have Google Chrome and ChromeDriver added to my Heroku slug, and I managed to put together the following code:
eCaps <- list(chromeOptions = list(args = c('--headless', '--disable-gpu', '--blink-settings=imagesEnabled=false',
'--disable-dev-shm-usage', '--no-sandbox', '-Dwebdriver.chrome.driver=/app/.chromedriver/bin/chromedriver'),
binary = Sys.getenv("GOOGLE_CHROME_BIN")))
remDr <- remoteDriver(
browser = "chrome",
extraCapabilities = eCaps
)
remDr$open()
where I have an environment variable GOOGLE_CHROME_BIN storing the path of the Chrome binary as /app/.apt/usr/bin/google-chrome.
When I run the script I'm getting the following error:
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
Calls: <Anonymous> -> queryRD -> checkError

RSelenium with RSDriver. Error: httr output: Failed to connect to localhost port 4445: Connection refused

I am trying to use RSelenium for web scraping. I am following the basics tutorial as explained on CRAN. The recommended approach is to install Docker (see the tutorial as well as this Stack Overflow answer). If I understand correctly, this is not an option for me, as I am on Windows 7, for which Docker seems not to be available (see the Docker forum).
Thus, I am trying option 2, using rsDriver. I run
RSelenium::rsDriver()
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4445L,
browserName = "firefox"
)
remDr$open()
and get the error
> remDr$open()
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4445: Connection refused
This question has been asked and answered before here, here, here and here, though these are about the same error when using Docker and their solutions did not work for me.
Is there any way to get this running with rsDriver? Is there any option for me as a Windows 7 user?
With RSelenium version 1.7.7 this is a workaround:
library(RSelenium)
remDr <- rsDriver(
port = 4445L,
browser = "firefox"
)
This single command combines the server setup and the driver initialisation.
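Note that rsDriver() returns a list holding both the server process and the client, so the object assigned above is not itself the remote driver. A minimal sketch of how the two pieces might be used, assuming Firefox is installed and port 4445 is free (the function name open_and_get_title is only illustrative):

```r
# Sketch only: needs the RSelenium package, Firefox and a free port.
# open_and_get_title is a made-up name for illustration.
open_and_get_title <- function(url, port = 4445L) {
  rD <- RSelenium::rsDriver(port = port, browser = "firefox", verbose = FALSE)
  # Stop the Selenium server when the function exits, even on error.
  on.exit(rD$server$stop(), add = TRUE)

  # The client is already opened by rsDriver(); no remDr$open() is needed.
  remDr <- rD$client
  # Close the browser first, before the server is stopped.
  on.exit(remDr$close(), add = TRUE, after = FALSE)

  remDr$navigate(url)
  remDr$getTitle()[[1]]
}
```

Calling open_and_get_title with a page URL would then start the server, open the browser, return the page title and clean up afterwards.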
My issue (on Mac) was resolved by updating Java:
https://www.oracle.com/java/technologies/downloads/#jdk19-mac
It worked after this.

Rselenium executable does not exist

I try to run RSelenium using the following:
library("RSelenium")
#start RSelenium server
rD <- rsDriver(verbose = FALSE)
remDr <- rD$client
remDr$open()
However, in rsDriver(), I receive this error:
Selenium message:The driver executable does not exist: C:\Users\kira\Documents
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
I have downloaded the standalone jar of Selenium and put it into the path, but the error does not disappear. Any other workarounds?
From the docs, it looks like you should be starting the server from the command terminal. Of course, you can do this from R with the system2 command, but probably easier to start the jar from a terminal first for debugging.
Alternatively you can run the binary manually. Open a console in your OS and navigate to where the binary is located and run:
java -jar selenium-server-standalone-x.xx.x.jar
By default the Selenium Server listens for connections on port 4444.
Note for Mac OSX: The default port 4444 is sometimes used by other programs such as kerberos. In our examples we use port 4445 in respect of this and for consistency with the Docker vignette.
Afterwards, connecting from R:
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4444L
, browserName = "firefox"
)
remDr$open()
remDr$getStatus()
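A "Connection refused" error from remDr$open() usually just means that nothing is listening on the port yet. Before connecting, it can help to check this directly. The following is a minimal base R sketch; port_is_open is a hypothetical helper name, not an RSelenium function.

```r
# Hypothetical helper (base R only): TRUE if a TCP connection to host:port
# can be opened within `timeout` seconds, FALSE otherwise.
port_is_open <- function(host = "localhost", port = 4444L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}
```

If port_is_open("localhost", 4444L) returns FALSE after the jar has been started, the server is either still starting, listening on a different port, or has exited, and the terminal output of the java command is the place to look.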

r - learning RSelenium, a few basic beginner technical issues

I've looked at https://github.com/ropensci/RSelenium/issues/94 and https://github.com/ropensci/RSelenium/issues/82 but was not able to solve my problem. It didn't help that this person was on Windows, and I am on Mac (El Capitan, version 10.11.6)
I am trying to learn data scraping with RSelenium, but some of the technical aspects of it are giving me issues early on. I have a few questions first and then will share my code:
(1) Right away, it says that startServer() is deprecated. Specifically:
startServer()
# output
Warning message:
startServer is deprecated.
Users in future can find the function in
file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see
vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see
vignette("RSelenium-docker", package = "RSelenium")
.
What should I use in place of startServer(), or what do I need to change on my computer? I'm confused as to what this warning message is saying.
(2) Since it's just a warning, I continue by trying to open a browser in chrome. I quickly run into another error:
remDr = remoteDriver$new(browserName = 'chrome')
remDr$open()
# output
[1] "Connecting to remote server"
$webdriver.remote.sessionid
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"
$locationContextEnabled
[1] TRUE
$webStorageEnabled
[1] TRUE
$takesScreenshot
[1] TRUE
$javascriptEnabled
[1] TRUE
$message
[1] "session not created exception\nfrom unknown error: Runtime.executionContextCreated has invalid 'context': {\"auxData\":{\"frameId\":\"34144.1\",\"isDefault\":true},\"id\":1,\"name\":\"\",\"origin\":\"://\"}\n (Session info: chrome=54.0.2840.71)\n (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Mac OS X 10.11.6 x86_64)"
$hasTouchScreen
[1] TRUE
$platform
[1] "ANY"
$cssSelectorsEnabled
[1] TRUE
$id
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"
The $message line of the output mentions that the session was not created. On my desktop, what I see is that Chrome opens for a split second and then closes or crashes. I try again with Firefox, and get:
remDr = remoteDriver$new(browserName = 'firefox')
remDr$open()
# output
[1] "Connecting to remote server"
Selenium message:The path to the driver executable must be set by the webdriver.gecko.driver system property; for more information, see https://github.com/mozilla/geckodriver. The latest version can be downloaded from https://github.com/mozilla/geckodriver/releases
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
It is frustrating to try to learn this and not even be able to get past the very first step of opening a browser. Any help is greatly appreciated!
As noted, checkForServer and startServer are deprecated, but you may still be able to use them as follows:
unlink(file.path(find.package("RSelenium"), "bin"), recursive = TRUE, force = TRUE)
RSelenium::checkForServer()
For Firefox:
In terminal, run the following command
brew install geckodriver
Running Selenium on the default port is a problem on Mac, as Kerberos is often already using the default port 4444. Run the following commands in the R console:
selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(extraCapabilities = list(marionette = TRUE), port=5556)
remDr$open()
......
# when finished
selServ$stop()
For chrome:
brew install chromedriver
Running selenium at the default port on Mac has an issue. Run the following command in R console
selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(browserName = "chrome",
extraCapabilities = list(marionette = TRUE),
port=5556)
remDr$open()
......
# when finished
selServ$stop()
If the above doesn't help, then look at running a Docker container; see
http://rpubs.com/johndharrison/RSelenium-Docker and https://github.com/SeleniumHQ/docker-selenium . This basically involves running a Docker container using something like:
$ docker run -d -p 5556:4444 selenium/standalone-chrome:3.0.1-aluminum
A Selenium server and Chrome browser should then be accessible on port 5556, which you can connect to by giving the appropriate arguments to remoteDriver.
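The Docker route can also be driven from R. The sketch below assumes Docker is installed and on the PATH; docker_run_args and start_chrome_session are made-up helper names, and the five-second pause is an arbitrary guess at how long the server needs to start.

```r
# Hypothetical helper: build the arguments for `docker run`, mapping
# `host_port` on the host to port 4444 inside the container.
docker_run_args <- function(host_port, image) {
  c("run", "-d", "-p", paste0(host_port, ":4444"), image)
}

# Hypothetical helper: start the container, wait, then connect with RSelenium.
start_chrome_session <- function(host_port = 5556L,
                                 image = "selenium/standalone-chrome:3.0.1-aluminum") {
  system2("docker", docker_run_args(host_port, image))
  Sys.sleep(5)  # crude wait (assumption); the server needs a moment to start
  remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                                   port = host_port, browserName = "chrome")
  remDr$open()
  remDr
}
```

Here remoteServerAddr, port and browserName are the remoteDriver arguments that have to match the container: the host port chosen in the -p mapping and the browser shipped in the image.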
