Chromedriver vs. Chrome update incompatibility - r

I was working on setting up RSelenium in R to interact with Chrome; however, I keep receiving an error message that the Chrome driver can't work with my version of Chrome even though I already specified the version of the Chromedriver to match Chrome on my desktop.
Below is the code producing an error: (MacOS Mojave Version 10.14.5)
library(RSelenium)
library(xml2)
library(rvest)
library(tidyverse)
library(wdman)
library(binman)
remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                                 port = 4445L,
                                 browserName = "chrome")
remDr$open()
binman::list_versions("chromedriver")
rD <- rsDriver(browser = "chrome", chromever="75.0.3770.90")
The versions listed by binman are: "75.0.3770.90" "76.0.3809.12" "76.0.3809.25"
The error that I kept receiving is as follows:
Selenium message:session not created: This version of ChromeDriver only supports Chrome version 76
(Driver info: chromedriver=76.0.3809.25 (a0c95f440512e06df1c9c206f2d79cc20be18bb1-refs/branch-heads/3809#{#271}),platform=Mac OS X 10.14.5 x86_64)
However, I checked the version that Chrome is updated to and it is 75.0.3770.100, so I assumed that the chromedriver version I specified would suffice.
I tried a couple different methods such as adding the following functions; however, I keep receiving the same error.
eCaps <- list(chromeOptions = list(
  args = c('--no-sandbox', '--headless', '--disable-gpu', '--window-size=1280,800')
))
cDrv <- chrome()
I was wondering if there is any way to remove the higher versions of the Chrome driver so that there is only one Chrome driver the code can possibly use. Any other solutions would also be very much appreciated!
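As a minimal sketch of the version matching involved here, the helper below picks, from a vector of installed chromedriver versions, the newest one whose major version equals that of the installed Chrome. The name pick_matching_driver is hypothetical and is not part of RSelenium, wdman, or binman; it only assumes that versions are plain dotted strings like the ones binman prints.

```r
# Hypothetical helper (not part of RSelenium/wdman/binman):
# choose the newest chromedriver version sharing Chrome's major version.
pick_matching_driver <- function(chrome_version, driver_versions) {
  # The major version is everything before the first dot, e.g. "75".
  major <- function(v) sub("\\..*$", "", v)
  candidates <- driver_versions[major(driver_versions) == major(chrome_version)]
  if (length(candidates) == 0) {
    stop("No chromedriver found for Chrome major version ", major(chrome_version))
  }
  # numeric_version compares component by component rather than as text.
  as.character(max(numeric_version(candidates)))
}

installed <- c("75.0.3770.90", "76.0.3809.12", "76.0.3809.25")
chosen <- pick_matching_driver("75.0.3770.100", installed)
print(chosen)
```

With the versions listed in this question, the result could then be passed as the chromever argument of rsDriver(). Whether the driver actually launched is the one requested is a separate matter that this sketch does not check.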

Related

RSelenium:: Connection refused

I am trying (for the first time) to scrape content from a dynamic webpage, for which RSelenium appears to be the go-to. I cannot however get past the first step of calling rsDriver.
My code:
rdriver <- rsDriver(browser = "chrome",
                    port = free_port(),
                    chromever = "109.0.5414.25")
The rsDriver() function started throwing an error every time I tried to open it:
[1] "Connecting to remote server"
Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to localhost port 14415: Connection refused
Check server log for further details.
Warning message:
In rsDriver(browser = "chrome", port = free_port(), chromever = "109.0.5414.25") :
Could not determine server status.
Version:
R 4.2.2
Java(TM) SE Development Kit 19.0.2 (64 bit)
> binman::list_versions("chromedriver")
$win32
[1] "109.0.5414.25" "109.0.5414.74" "110.0.5481.30"
> binman::list_versions("seleniumserver")
$generic
[1] "3.141.59" "4.0.0-alpha-1" "4.0.0-alpha-2"
Any recommendations are much appreciated.
I installed all the necessary programs from scratch.
I searched for help on the internet and couldn't find a solution.
There is an outstanding issue with how the wdman package reads the latest versions of Chrome. This is causing issues for lots of users (examples here https://stackoverflow.com/a/75176907/15363011 and here https://github.com/ropensci/RSelenium/issues/264).
You can specify a version of Chrome before the issue took place and binman/wdman will download and start using it:
rdriver <- rsDriver(browser = "chrome",
                    port = free_port(),
                    chromever = "108.0.5359.71")
If you'd like to use the newest versions, the fix is to delete the LICENSE.chromedriver file found in the same directory as the driver. You can find out how to do that in the other issues that I linked. If you want to use the latest version of Chrome you will have to do this any time a new chrome driver is released.

Problems with RSelenium and ChromeDriver - "Could not open chrome browser"

I have been using RSelenium for years and have never had this issue. I recently updated my google chrome to the latest version available 110.0.5481.78. I am now getting the following error when I go to use rsDriver
require(RSelenium)
rD <- rsDriver(browser = "chrome", port = 9537L, chromever = "110.0.5481.77")
Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to localhost port 9537: Connection refused
Check server log for further details.
Warning message:
In rsDriver(browser = "chrome", port = 9537L, chromever = "110.0.5481.77") :
Could not determine server status.
R Console
I have tried different values of chromever from binman::list_versions("chromedriver"), as well as leaving the rsDriver() arguments blank altogether. In the past, when Chrome updated, a simple change to chromever made everything work perfectly. I am not sure whether anything, or what, has changed with this latest update.
Thanks in advance.
I just fixed this same problem by removing a file LICENSE.chromedriver as per this thread: https://github.com/ropensci/RSelenium/issues/264
Use
wdman::selenium(retcommand = TRUE)
to find the file location of binman_chromedriver files.
Navigate to this file location, go to the driver version you're using, and delete the LICENSE.chromedriver file. Mine worked immediately after this action, but note that I had also tried downgrading wdman to version 0.2.5 (I was on 0.2.6) first:
remotes::install_version('wdman',version = '0.2.5')
I'm not sure if it was both actions that fixed it or just the file delete!
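For anyone who would rather script the file removal than click through folders, here is a minimal sketch. The helper name remove_chromedriver_license and its dry_run argument are hypothetical, and the root directory is an assumption: it stands for whatever binman_chromedriver location wdman::selenium(retcommand = TRUE) reports on your machine. The demonstration below runs against a throwaway directory so that nothing real is touched.

```r
# Hypothetical helper: find (and optionally delete) LICENSE.chromedriver files
# below `root`. `root` is assumed to be your binman_chromedriver directory.
remove_chromedriver_license <- function(root, dry_run = TRUE) {
  stopifnot(dir.exists(root))
  # `pattern` is matched against file names, so this finds the file in
  # every version subdirectory.
  hits <- list.files(root, pattern = "^LICENSE\\.chromedriver$",
                     recursive = TRUE, full.names = TRUE)
  if (!dry_run && length(hits) > 0) {
    file.remove(hits)
  }
  hits  # paths that were found (and removed when dry_run = FALSE)
}

# Demonstration on a temporary directory that mimics the layout.
demo_root <- file.path(tempdir(), "binman_chromedriver_demo")
demo_dir <- file.path(demo_root, "mac64", "110.0.5481.77")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
license_file <- file.path(demo_dir, "LICENSE.chromedriver")
driver_file <- file.path(demo_dir, "chromedriver")
invisible(file.create(license_file, driver_file))

found <- remove_chromedriver_license(demo_root, dry_run = TRUE)
removed <- remove_chromedriver_license(demo_root, dry_run = FALSE)
```

Running it with dry_run = TRUE first shows which files would go, which seems the safer default. Only the license file is matched, so the driver binary itself is left in place.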

Can no longer run RSelenium Chrome Driver

I'll start by saying I have run into this issue many times in the past, and all that was required was updating my Google Chrome version and then updating the chromever = param in rsDriver(). These steps no longer work for me.
I've tried the solutions in many posts (this one in particular) but I still can't get it to work.
Here are some details:
Computer/browser/R info:
Chrome Version: 89.0.4389.90
Mac Version: 10.15.4
RStudio Version: 1.3.959
For the longest time, I've been able to use chromever = "87.0.4280.20" even though my browser wasn't on that version. I could open up a remote driver with remDrall <- rsDriver(port = 4445L, browser = "chrome", chromever = "87.0.4280.20")
When I try this now, I get an error saying
Selenium message:session not created: This version of ChromeDriver only supports Chrome version 87
Current browser version is 89.0.4389.90 with binary path /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
This prompted me to try updating my ChromeDriver if it was stuck on 87. I updated it by manually downloading and moving to /usr/local/bin/chromedriver as well as updating via brew upgrade chromedriver. As far as I know it worked, testing via:
ls /usr/local/Caskroom/chromedriver/ gives me 89.0.4389.23
/usr/local/bin/chromedriver starts a chromedriver session with 89.0.4389.23
I've tried using chromever = "89.0.4389.23" but I get an unknown server-side error. I know my chrome version is 89.0.4389.90, but that version isn't available to use in rsDriver.
I'm really not sure where to go from here. I just don't get why it says my ChromeDriver only supports Chrome version 87 when I clearly have it updated to 89. Could it be that my rsDriver function is still picking up some old version of ChromeDriver? Can I direct it specifically to the one in /usr/local/bin/?
Any thoughts on what I can try next?
Try replacing the chromedriver.exe file in your local directory with the updated version from here:
https://chromedriver.storage.googleapis.com/index.html?path=89.0.4389.23/
I ran into a similar problem last week and was able to get the command working again by making the following change to the chromever option:
Previous version that has stopped working:
rsDriver(chromever = "87.0.4280.88", browser = "chrome", extraCapabilities = eCaps)
New version that works:
rsDriver(chromever = "89.0.4389.23", browser = "chrome", extraCapabilities = eCaps)
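To make the mismatch explicit, the sketch below pulls the two relevant numbers out of the "session not created" message quoted in the question: the Chrome major version the driver supports, and the browser version actually installed. The function name parse_driver_mismatch is hypothetical, and the regular expressions assume the message wording shown in this thread, which may differ in other ChromeDriver releases.

```r
# Hypothetical helper: extract version details from a ChromeDriver
# "session not created" message. The wording is assumed to match the
# error quoted in this thread.
parse_driver_mismatch <- function(msg) {
  supported <- regmatches(
    msg, regexec("only supports Chrome version ([0-9]+)", msg))[[1]]
  current <- regmatches(
    msg, regexec("Current browser version is ([0-9.]+)", msg))[[1]]
  list(
    # Element 1 is the full match, element 2 the captured group.
    supported_major = if (length(supported) == 2) as.integer(supported[2]) else NA_integer_,
    browser_version = if (length(current) == 2) current[2] else NA_character_
  )
}

msg <- paste(
  "Selenium message:session not created: This version of ChromeDriver only supports Chrome version 87",
  "Current browser version is 89.0.4389.90 with binary path /Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  sep = "\n"
)
info <- parse_driver_mismatch(msg)
str(info)
```

If the supported major version differs from the major part of the browser version, chromever needs to point at a driver from the browser's major release, as in the working call above.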

Latest version of RSelenium and Firefox

When I try to open the RSelenium I receive this error
[1] "Connecting to remote server"
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: org.openqa.selenium.firefox.NotConnectedException
The version of Firefox I have is
Firefox version: 48.0b10
According to this I tried to update the server version
library("RSelenium")
startServer()
unlink(system.file("bin", package = "RSelenium"), recursive = TRUE)
checkForServer(update = TRUE)
remDr <- remoteDriver()
Sys.sleep(5)
remDr$open()
Sys.sleep(5)
but the problem still exists. Has anyone else faced this problem? Is there any possible solution?
From Firefox 48 onwards, the gecko driver (Marionette) is needed to run Firefox with Selenium.
If you have Firefox 48 you can run the gecko driver as follows:
Refer to the guidelines
https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette/WebDriver
Download the relevant gecko driver from https://github.com/mozilla/geckodriver/releases
Add it to your PATH or refer to the location when starting binary (see below)
# get beta selenium standalone
RSelenium::checkForServer(beta = TRUE)
# assume gecko driver is not in our path (assume windows and we downloaded to docs folder)
# if the driver is in your PATH the javaargs call is not needed
selServ <- RSelenium::startServer(javaargs = c("-Dwebdriver.gecko.driver=\"C:/Users/john/Documents/geckodriver.exe\""))
remDr <- remoteDriver(extraCapabilities = list(marionette = TRUE))
remDr$open()
....
....
remDr$close()
selServ$stop()
The above currently requires the dev version of RSelenium. Alternatively, you can download the Selenium binary from http://selenium-release.storage.googleapis.com/index.html . Pick the 3.0 beta 2 binary, which currently runs with Firefox 48, and run it:
java -Dwebdriver.gecko.driver=C:/Users/john/Documents/geckodriver.exe -jar selenium-server-standalone-3.0.0-beta2.jar
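As a small sketch of the version rule stated at the top of this answer, the helper below returns the extraCapabilities list to use for a given Firefox version: marionette = TRUE from Firefox 48 onwards, and an empty list before that. The name firefox_capabilities is hypothetical, and reading the major version from the leading digits of the string is an assumption about how the version is written (for example "48.0b10" or "47.0.1").

```r
# Hypothetical helper: decide whether the marionette capability is needed,
# based on the rule "Firefox 48 onwards requires the gecko driver".
firefox_capabilities <- function(firefox_version) {
  # Take the leading run of digits as the major version.
  major <- suppressWarnings(
    as.integer(sub("^([0-9]+).*$", "\\1", firefox_version)))
  if (is.na(major)) {
    stop("Could not read a major version from: ", firefox_version)
  }
  if (major >= 48L) list(marionette = TRUE) else list()
}

caps_new <- firefox_capabilities("48.0b10")
caps_old <- firefox_capabilities("47.0.1")
str(caps_new)
```

The result would be passed along as remoteDriver(extraCapabilities = firefox_capabilities("48.0b10")), mirroring the call shown earlier in this answer. The gecko driver still has to be downloaded and made available separately.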

rselenium | get youtube page source

Why is the page source of youtube.com not scrapeable?
I tried the following (using phantomjs as well as chrome with a selenium server)
library(RSelenium)
pJS <- phantom(pjs_cmd = ...)
Sys.sleep(5) # give the binary a moment
remDr <- remoteDriver(browserName = 'phantomjs')
remDr$open()
remDr$navigate("https://www.youtube.com/")
remDr$getTitle()[[1]] # [1] "YouTube"
remDr$getPageSource()
Returns:
Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input
It's an issue with encoding. Use the dev version for now until the next version is released to CRAN:
devtools::install_github("ropensci/RSelenium")
I would agree that the problem is most probably with encoding.
For instance, the problem seems to appear on the nasa.gov website only on topic pages related to American-Russian space collaboration, which suggests that it is caused by Cyrillic characters in the page content.
I solved the problem by using the deprecated Relenium package where RSelenium fails. To make Relenium run smoothly on Ubuntu 16.04, I had to install Firefox 25.0 and configure it to prevent any updates. The other issue during setup was installing rJava properly, which can fail if the environment variables holding the paths to the Java libraries are missing.
System configuration is as follows:
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
relenium_0.3.0; seleniumJars_2.41.0; rJava_0.9-8; RSelenium_1.3.5
Below is an example of a page that can be scraped with Relenium but not with the release version of RSelenium:
link = "http://www.nasa.gov/mission_pages/station/expeditions/expedition14/index.html"
The RSelenium solution fails (with either Firefox 34.0.5 or 25.0):
startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(link)
doc = unlist(remDr$getPageSource())
Result: "Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input"
While Relenium is ok with it:
relenium_browser <- firefoxClass$new()
relenium_browser$get(link)
doc = unlist(relenium_browser$getPageSource())
doc = read_html(doc)

Resources