How to run an RSelenium script on Heroku

I've been struggling to run an R script that uses RSelenium on Heroku. I have other scripts running on the platform, but I can't get a Selenium server to start. I'm following the help I found here on Stack Overflow. I have added Google Chrome and ChromeDriver to my Heroku slug, and I put together the following code:
eCaps <- list(chromeOptions = list(args = c('--headless', '--disable-gpu', '--blink-settings=imagesEnabled=false',
'--disable-dev-shm-usage', '--no-sandbox', '-Dwebdriver.chrome.driver=/app/.chromedriver/bin/chromedriver'),
binary = Sys.getenv("GOOGLE_CHROME_BIN")))
remDr <- remoteDriver(
browser = "chrome",
extraCapabilities = eCaps
)
remDr$open()
where I have an environment variable GOOGLE_CHROME_BIN storing the path of the Chrome binary as /app/.apt/usr/bin/google-chrome.
When I run the script I'm getting the following error:
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
Calls: <Anonymous> -> queryRD -> checkError
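A note on the error above: "Connection refused" on port 4444 usually means that nothing is listening there. remoteDriver() only creates a client; it does not start a Selenium server. A minimal sketch for checking this from R before calling remDr$open(), using base R only. The helper name port_is_open is hypothetical and not part of RSelenium:

```r
# Hypothetical helper (not part of RSelenium): TRUE if something accepts
# TCP connections on host:port, FALSE otherwise.
port_is_open <- function(host = "localhost", port = 4444L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

if (!port_is_open("localhost", 4444L)) {
  message("Nothing is listening on localhost:4444; start a Selenium server first.")
}
```

If this reports that the port is closed, the problem is on the server side (no server process, or a different port), not in the capabilities passed to remoteDriver().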

Related

Undefined error in httr call. httr output: Failed to connect to localhost port 14415 after 0 ms: Connection refused

After updating to RStudio version 2022.12.0+353, I can no longer start RSelenium on a free port found with netstat.
I have never had this problem before. How do I fix it?
library(RSelenium)
library(netstat)
remote_driver <- rsDriver(browser = 'firefox',
verbose = F,
port = free_port())
Error message:
Could not open firefox browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to localhost port 14415 after 0 ms: Connection refused
Check server log for further details.
Warning message:
In rsDriver(browser = "firefox", verbose = F, netstat::free_port()) :
Could not determine server status.
I have tried:
netstat::free_port() which failed.
I've switched the browser to chrome with no success.
Docker has never worked with my macbook.

How can I reliably open a server/client connection with RSelenium under Linux?

Update: As of RSelenium 1.7.9 the described problems have disappeared.
I know, similar questions have been asked, but their solutions didn't work for me.
Summary:
I would like to open a Selenium-server and a client under Linux via R's package RSelenium.
But even though I try two ways described in the documentation (while I want to avoid docker)
it doesn't work reliably.
My system:
Linux 5.19, R 4.2.1,
RSelenium 1.7.7, selenium-server-standalone-4.0.0-alpha-2,
chromedriver 104.0.5112.79-2.1, geckodriver 0.31.0 (binman),
I have tested with OpenJDK 11 and OpenJDK 18 (currently)
I. Selenium via JAVA
In the Linux-console
#localhost:~/Documents/selenium> java -jar selenium-server-standalone-4.0.0-alpha-2.jar
20:04:49.470 INFO [GridLauncherV3.parse] - Selenium server version: 4.0.0-alpha-2, revision: f148142cf8
20:04:49.526 INFO [GridLauncherV3.lambda$buildLaunchers$3] - Launching a standalone Selenium Server on port 4444
20:04:49.730 INFO [WebDriverServlet.<init>] - Initialising WebDriverServlet
20:04:49.793 INFO [SeleniumServer.boot] - Selenium Server is up and running on port 4444
In R I type:
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L, browserName = "chrome", version = "104.0.5112.79")
and get in the Linux console the output:
20:07:49.463 INFO [ActiveSessionFactory.apply] - Capabilities are: {
"browserName": "chrome",
"javascriptEnabled": true,
"nativeEvents": true,
"version": "104.0.5112.79"
}
20:07:49.465 INFO [ActiveSessionFactory.lambda$apply$11] - Matched factory org.openqa.selenium.grid.session.remote.ServicedSession$Factory (provider: org.openqa.selenium.chrome.ChromeDriverService)
Starting ChromeDriver 104.0.5112.79 (3cf3e8c8a07d104b9e1260c910efb8f383285dc5-refs/branch-heads/5112#{#1307}) on port 15987
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
20:07:50.023 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
20:07:50.044 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 732d7c7ddfeaed42fc80fac54f91fcb5 (org.openqa.selenium.chrome.ChromeDriverService)
The Chrome-Browser opens and the R console gives me the kiss of death:
Error in checkError(res) :
Undefined error in httr call. httr output: Failed initialization
That means, I cannot use the R-console for navigation. The other approach:
II. Selenium via RSelenium::rsDriver
rD <- RSelenium::rsDriver(browser="firefox", port = 4567L, verbose = FALSE)
mostly yields (with a browser window opening)
Could not open firefox browser.
Client error message:
Undefined error in httr call. httr output: Failed initialization
Check server log for further details.
BUT: The very same code can work! Randomly. Or after a long time having R open?!? Endless testing?!?
Suddenly I get several running server/client connections including navigation on web-pages:
$acceptInsecureCerts
[1] FALSE
$browserName
[1] "firefox"
$browserVersion
[1] "103.0.2"
$`moz:accessibilityChecks`
[1] FALSE
$`moz:buildID`
[1] "20220815180539"
$`moz:geckodriverVersion`
[1] "0.31.0"
etc.pp.
But at the latest when I reboot my PC, I get the same error message again. Sometimes it works after deleting and reinstalling the four drivers via RSelenium in ./local/share, and sometimes the very same steps simply don't work.
I have never run into a problem this random before. Where can it come from?
PS: The server log, if it doesn't work, can have additional lines, which I add:
> rD$server$log()
$stderr
[26] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.jsm"
[27] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.sys.mjs"
[28] "console.error: \"Error during quit-application-granted: [Exception... \\\"File error: Not found\\\" nsresult: \\\"0x80520012 (NS_ERROR_FILE_NOT_FOUND)\\\" location: \\\"JS frame :: resource:///modules/BrowserGlue.jsm :: _onQuitApplicationGranted/tasks< :: line 2006\\\" data: no]\""
[29] "1661020441351\tMarionette\tINFO\tStopped listening on port 42425"
[30] "JavaScript error: chrome://remote/content/marionette/cert.js, line 57: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]"
$stdout
character(0)
Maybe you can try the following approach, which relies on Docker:
library(RSelenium)
url <- "https://www.hubs.com/3d-printing/#/?place=New%20York&latitude=40.7144&longitude=-74.006&distanceLimit=250&distanceUnit=miles&shipsToCountry=US&shipsToState=NY"
system('docker run -d -p 4445:4444 selenium/standalone-firefox')  # system() works on Linux; shell() exists only on Windows
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)
remDr$getPageSource()[[1]]
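One caveat with the Docker route: docker run -d returns as soon as the container is created, so the server inside may not be accepting connections yet when remDr$open() runs, which again shows up as "Connection refused". A minimal sketch of a retry loop, assuming that simply waiting is enough. The helper name retry is hypothetical and not part of RSelenium:

```r
# Hypothetical helper: call `attempt()` until it succeeds or `tries` runs out.
retry <- function(attempt, tries = 10L, wait = 1) {
  for (i in seq_len(tries)) {
    # Turn an error into a value so the loop can inspect it.
    result <- tryCatch(attempt(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (i < tries) Sys.sleep(wait)
  }
  stop("Gave up after ", tries, " attempts: ", conditionMessage(result))
}

# Intended use with the client from the answer above (needs a running container):
# retry(function() remDr$open(silent = TRUE), tries = 20L, wait = 1)
```

The number of tries and the wait time are guesses; how long the container needs depends on the machine.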

RSelenium with RSDriver. Error: httr output: Failed to connect to localhost port 4445: Connection refused

I am trying to use RSelenium for webscraping. I am following the basics tutorial as explained on cran. The recommended approach is to install Docker (see tutorial as well as this stackoverflow answer). If I understand correctly, this is not an option for me as I am operating on Windows 7 for which Docker seems not to be available (see docker forum).
Thus, I am trying option 2 using the RSDriver. I run
RSelenium::rsDriver()
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4445L,
browserName = "firefox"
)
remDr$open()
and get the error
> remDr$open()
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4445: Connection refused
This question has been asked and answered before here, here, here and here, though these are about the same error when using Docker and their solutions did not work for me.
Is there any way to get this running with rsDriver? Is there any option for me as a Windows 7 user?
With RSelenium version 1.7.7 this is a workaround:
library(RSelenium)
remDr <- rsDriver(
port = 4445L,
browser = "firefox"
)
This command combines the server setup and the driver initiation.
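To my understanding, the object returned by rsDriver() is a list with a server and a client element, and the client is normally already opened, so a separate remoteDriver() call is not needed. A sketch of how that might be wrapped so the server is always stopped again. The wrapper name with_firefox is hypothetical, and the sketch assumes RSelenium, Java and Firefox are installed and port 4445 is free:

```r
# Hypothetical wrapper (not part of RSelenium): start server + browser,
# run `action(client)`, and always clean up afterwards.
with_firefox <- function(action, port = 4445L) {
  rD <- RSelenium::rsDriver(port = port, browser = "firefox", verbose = FALSE)
  on.exit({
    try(rD$client$close(), silent = TRUE)  # close the browser session
    rD$server$stop()                       # stop the server, releasing the port
  }, add = TRUE)
  action(rD$client)
}

# Example call (requires RSelenium, Java and Firefox to be installed):
# with_firefox(function(remDr) {
#   remDr$navigate("https://www.r-project.org")
#   remDr$getTitle()
# })
```

Stopping the server on exit matters because a server left over from an earlier attempt keeps the port occupied for the next rsDriver() call.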
My issue (on Mac) was fixed by updating Java:
https://www.oracle.com/java/technologies/downloads/#jdk19-mac
It worked after that.

RSelenium Installation on MacBook, with Chrome

I have RSelenium (the package) installed, to do some scraping of NHL statistics from hockeyreference.com
It was working, all fine, but recently stopped working, giving this error:
[1] "Connecting to remote server"
Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to
localhost port 4567: Connection refused
Check server log for further details.
$client
[1] "No sessionInfo. Client browser is mostly likely not opened."
$server
Process Handle
command : /private/var/folders/dk/kf4tf83n1lg40687w6fmq5wh0000gn/T/Rtmpiy1cOY/file1d1856ef53ae.sh
system id : 18786
state : exited
Warning message:
In rsDriver(port = 4567L, geckover = NULL, phantomver = NULL) :
Could not determine server status.
I've tried reinstalling, but could not get it working. My original install is outlined in this question (using homebrew, with the latest chromedriver installed):
Css selector issue with rvest and NHL statistics
Any help would be great. The code I'm running with RSelenium is here:
https://github.com/papelr/nhldatar/blob/master/nhldatar/R/nhldatar-phase-2.R
TL;DR, I can't get the rsDriver argument to work, and it gives the error posted above:
rsDriver(port = 4567L, geckover = NULL, phantomver = NULL)
remDr <- remoteDriver(browserName = "chrome")
remDr$open()
If RSelenium works (opening a chrome browser), then the rest of this will run! Thanks!
I recommend the following:
install Docker for Mac,
pull the image for chrome, firefox (I recommend version 3.5.1) or phantom,
run the image in Docker: docker run...
create the remote driver:
remDr <- remoteDriver(remoteServerAddr = "here your IP", port = 4445L,
browserName = "firefox")
If you use the debug version of the image, you can watch what you are doing over VNC.

Rselenium executable does not exist

I try to run RSelenium using the following:
library("RSelenium")
#start RSelenium server
rD <- rsDriver(verbose = FALSE)
remDr <- rD$client
remDr$open()
However, in rsDriver(), I receive this error:
Selenium message:The driver executable does not exist: C:\Users\kira\Documents
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
I have downloaded the standalone jar of Selenium and put it on the path, but the error does not disappear. Are there any other workarounds?
From the docs, it looks like you should be starting the server from the command terminal. Of course, you can do this from R with the system2 command, but probably easier to start the jar from a terminal first for debugging.
Alternatively you can run the binary manually. Open a console in your
OS and navigate to where the binary is located and run:
java -jar selenium-server-standalone-x.xx.x.jar
By default the
Selenium Server listens for connections on port 4444.
Note for Mac OSX: The default port 4444 is sometimes used by other
programs such as kerberos. In our examples we use port 4445 in respect
of this and for consistency with the Docker vignette.
Afterwards, connecting from R:
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4444L
, browserName = "firefox"
)
remDr$open()
remDr$getStatus()
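For completeness, the system2 route mentioned at the start of this answer could look roughly like the sketch below. The helper names selenium_args and start_selenium are hypothetical, and the jar file name is the same placeholder as in the quoted docs:

```r
# Hypothetical helper: build the arguments for `java -jar <jar>`.
selenium_args <- function(jar_path) {
  c("-jar", shQuote(jar_path))
}

# Hypothetical helper: launch the server in the background (wait = FALSE),
# after checking that the jar and a java executable can be found.
start_selenium <- function(jar_path) {
  if (!file.exists(jar_path)) stop("Jar not found: ", jar_path)
  if (!nzchar(Sys.which("java"))) stop("java is not on the PATH")
  system2("java", selenium_args(jar_path), wait = FALSE)
}

# Example (placeholder file name, as in the docs quoted above):
# start_selenium("selenium-server-standalone-x.xx.x.jar")
```

Starting the jar in a terminal first is still the easier way to debug, because the server log is printed right there.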
