rselenium | get youtube page source - r

Why is the page source of youtube.com not scrapeable?
I tried the following (using PhantomJS, and also Chrome with a Selenium server):
library(RSelenium)
pJS <- phantom(pjs_cmd = ...)
Sys.sleep(5) # give the binary a moment
remDr <- remoteDriver(browserName = 'phantomjs')
remDr$open()
remDr$navigate("https://www.youtube.com/")
remDr$getTitle()[[1]] # [1] "YouTube"
remDr$getPageSource()
Returns:
Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input

It's an encoding issue. Until the next version is released to CRAN, use the development version:
devtools::install_github("ropensci/RSelenium")

I agree that the problem is most likely encoding.
For instance, on the nasa.gov website the problem seems to appear only on topic pages related to American-Russian space collaboration, which suggests that it is caused by Cyrillic characters in the page content.
I worked around the problem by using the deprecated Relenium package wherever RSelenium fails. To make Relenium run smoothly on Ubuntu 16.04, I had to install Firefox 25.0 and configure it to prevent any updates. The other setup issue was installing rJava properly, which can fail if the environment variables holding the paths to the Java libraries are missing.
System configuration is as follows:
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
relenium_0.3.0; seleniumJars_2.41.0; rJava_0.9-8; RSelenium_1.3.5
Below is an example of a page that can be scraped with Relenium but not with the release version of RSelenium:
link = "http://www.nasa.gov/mission_pages/station/expeditions/expedition14/index.html"
The RSelenium attempt fails (with either Firefox 34.0.5 or 25.0):
library(RSelenium)
startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(link)
doc = unlist(remDr$getPageSource())
Result: "Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input"
While Relenium handles it fine:
library(relenium)
library(xml2)  # provides read_html()
relenium_browser <- firefoxClass$new()
relenium_browser$get(link)
doc = unlist(relenium_browser$getPageSource())
doc = read_html(doc)

Related

Can no longer run RSelenium Chrome Driver

I'll start by saying I have run into this issue many times in the past, and all that was required was updating my Google Chrome version and then updating the chromever = parameter in rsDriver(). That no longer works for me.
I've tried the solutions in many posts (this one in particular) but I still can't get it to work.
Here are some details:
Computer/browser/R info:
Chrome Version: 89.0.4389.90
Mac Version: 10.15.4
RStudio Version: 1.3.959
For the longest time, I've been able to use chromever = "87.0.4280.20" even though my browser wasn't on that version. I could open a remote driver with:
remDrall <- rsDriver(port = 4445L, browser = "chrome", chromever = "87.0.4280.20")
Now when I try this, I get the error:
Selenium message:session not created: This version of ChromeDriver only supports Chrome version 87
Current browser version is 89.0.4389.90 with binary path /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
This prompted me to try updating my ChromeDriver in case it was stuck on 87. I updated it by manually downloading it and moving it to /usr/local/bin/chromedriver, and also via brew upgrade chromedriver. As far as I can tell it worked; testing via:
ls /usr/local/Caskroom/chromedriver/ gives me 89.0.4389.23
/usr/local/bin/chromedriver starts a chromedriver session with 89.0.4389.23
I've tried using chromever = "89.0.4389.23" but I get an unknown server-side error. I know my chrome version is 89.0.4389.90, but that version isn't available to use in rsDriver.
I'm really not sure where to go from here. I just don't get why it says my ChromeDriver only supports Chrome version 87 when I clearly have it updated to 89. Could it be that my rsDriver function is still picking up some old version of ChromeDriver? Can I direct it specifically to the one in /usr/local/bin/?
Any thoughts on what I can try next?
Try replacing the chromedriver.exe file in your local directory with the updated version from here:
https://chromedriver.storage.googleapis.com/index.html?path=89.0.4389.23/
I ran into a similar problem last week and was able to get the command working again by making the following change to the chromever option:
Previous version that has stopped working:
rsDriver(chromever = "87.0.4280.88", browser = "chrome", extraCapabilities = eCaps)
New version that works:
rsDriver(chromever = "89.0.4389.23", browser = "chrome", extraCapabilities = eCaps)
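The fix above amounts to choosing a `chromever` whose major version matches the installed Chrome (89 here). As a sketch, that choice can be made in base R; `pick_chromedriver()` is a hypothetical helper, not part of RSelenium or binman, and it assumes you pass the Chrome version string plus a character vector of the ChromeDriver versions binman knows about.

```r
# Hypothetical helper (not part of RSelenium/binman): return the newest
# ChromeDriver version whose major version matches the installed Chrome.
pick_chromedriver <- function(chrome_version, available) {
  # Major version = everything before the first dot, e.g. 89 in "89.0.4389.90"
  major <- function(v) as.integer(sub("\\..*$", "", v))
  candidates <- available[major(available) == major(chrome_version)]
  if (length(candidates) == 0) {
    stop("no ChromeDriver available for Chrome ", major(chrome_version))
  }
  # Compare as versions, not strings: as text, "89.0.4389.114" would sort
  # below "89.0.4389.23"
  candidates[order(numeric_version(candidates), decreasing = TRUE)][1]
}

# Versions taken from the question above; returns "89.0.4389.23"
pick_chromedriver("89.0.4389.90", c("87.0.4280.20", "87.0.4280.88", "89.0.4389.23"))

# In a real session the vector would come from binman (assumed to return a
# list with one element per platform), e.g.:
# rD <- RSelenium::rsDriver(browser = "chrome",
#         chromever = pick_chromedriver("89.0.4389.90",
#                                       binman::list_versions("chromedriver")[[1]]))
```

If no listed driver matches, the helper stops with an error instead of silently falling back to a mismatched version.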

Why do I still have Selenium chromedriver mismatch problem when I have the correct chromedriver downloaded and saved to the path

My problem is related to this post: session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium. Basically there is a mismatch between the version of chrome and chromedriver that's sourced by the code.
I'm running chrome 73.0.3683.86 (Official Build) (32-bit) on a corporate computer (so can't be upgraded) and have downloaded chromedriver (v73.0.3683.68) which has been saved to the path (saved to users path as I can't access system path). R version is 3.6.2. RSelenium version is 1.7.7.
cprof <- getChromeProfile("C:/Users/sizhu/AppData/Local/Google/Chrome/UserData/Default","Default")
rD1 <- rsDriver(browser = "chrome",chromever = "73.0.3683.68",extraCapabilities = cprof)
When I ran the above lines, it gives me error:
version requested doesn't match versions available = 80.0.3987.106,80.0.3987.16,81.0.4044.20,81.0.4044.69
I have run binman::list_versions("chromedriver") to see which ChromeDriver versions are sourced; it shows the versions listed above, not the one I saved to the path. Is there a way to force the program to use the ChromeDriver I downloaded? (Sorry, I'm a newbie to programming in general, so this might be very trivial...)
Thanks very much in advance!
Update: this is not really a solution, but I made some changes so that the code can now open the Chrome browser:
1) Go into wdman > yaml > chromedriver.
2) Change history to 20. It was 3, so every time I ran this line, only the 3 latest ChromeDriver versions (v80-81) were downloaded to binman; since I need v73, I have to go back 20 versions.
3) Save, and specify chromever = "73.0.3683.68", which can now be found.
The problem with this approach is obvious, and it still doesn't explain the puzzling fact that the v73 ChromeDriver I saved to the path is not found.
If you are using Chrome version 81, please download ChromeDriver 81.0.4044.69
If you are using Chrome version 80, please download ChromeDriver 80.0.3987.106
If you are using Chrome version 79, please download ChromeDriver 79.0.3945.36
https://chromedriver.chromium.org/downloads
😀😀😀
Had a similar issue, and this worked for me.
Check the available Selenium server versions with binman::list_versions("seleniumserver"),
then in rsDriver() specify one of those versions explicitly instead of relying on the default "latest":
rD1 <- rsDriver(browser = "chrome", chromever = "73.0.3683.68", version = "the version number you got", extraCapabilities = cprof)
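One way to pick the value for `version` can be sketched in base R, assuming `binman::list_versions("seleniumserver")` returns a list with one character vector per platform. `newest_release()` is a hypothetical helper; it skips tags that are not plain dotted numbers (the pre-release string in the example input is illustrative, not from the answer) and returns the newest remaining version. Whether skipping pre-releases is the right rule for your setup is an assumption, not something the answer states.

```r
# Hypothetical helper: newest Selenium server version made only of dotted
# numbers (pre-release tags such as "-alpha-2" are skipped).
newest_release <- function(versions) {
  plain <- versions[grepl("^[0-9]+(\\.[0-9]+)*$", versions)]
  if (length(plain) == 0) stop("no plain numeric versions found")
  # Compare as versions: as strings, "3.9.1" would sort above "3.141.59"
  plain[order(numeric_version(plain), decreasing = TRUE)][1]
}

# Illustrative input; returns "3.141.59"
newest_release(c("3.9.1", "3.141.59", "4.0.0-alpha-2"))

# Then pin the result instead of relying on the default "latest":
# rD1 <- RSelenium::rsDriver(browser = "chrome", chromever = "73.0.3683.68",
#          version = newest_release(binman::list_versions("seleniumserver")[[1]]),
#          extraCapabilities = cprof)
```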

Chromedriver vs. Chrome update incompatibility

I was working on setting up RSelenium in R to interact with Chrome; however, I keep receiving an error message that the Chrome driver can't work with my version of Chrome even though I already specified the version of the Chromedriver to match Chrome on my desktop.
Below is the code that produces the error (macOS Mojave 10.14.5):
library(RSelenium)
library(xml2)
library(rvest)
library(tidyverse)
library(wdman)
library(binman)
remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
port = 4445L,
browserName = "chrome")
remDr$open()
binman::list_versions("chromedriver")
rD <- rsDriver(browser = "chrome", chromever="75.0.3770.90")
binman::list_versions("chromedriver") lists these versions: "75.0.3770.90" "76.0.3809.12" "76.0.3809.25"
The error that I kept receiving is as follows:
Selenium message:session not created: This version of ChromeDriver only supports Chrome version 76
(Driver info: chromedriver=76.0.3809.25 (a0c95f440512e06df1c9c206f2d79cc20be18bb1-refs/branch-heads/3809@{#271}),platform=Mac OS X 10.14.5 x86_64)
However, I checked the version Chrome is updated to, and it is 75.0.3770.100, so I assumed the ChromeDriver version I specified would suffice.
I tried a couple of different methods, such as adding the following code; however, I keep receiving the same error.
eCaps <- list(chromeOptions = list(
args = c('--no-sandbox','--headless', '--disable-gpu', '--window-size=1280,800')
))
cDrv <- chrome()
I was wondering if there is any way to remove the higher ChromeDriver versions so that there is only one ChromeDriver the code could possibly use. Any other solutions would also be very much appreciated!
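The driver info in the error shows which ChromeDriver actually served the session (76.0.3809.25), even though chromever = "75.0.3770.90" was requested. A small base-R helper can pull that version out of a Selenium error message and compare its major version with Chrome's. `driver_vs_chrome()` is hypothetical, not part of RSelenium, and it assumes the message contains a `chromedriver=<version>` field as in the log above.

```r
# Hypothetical helper: extract "chromedriver=<version>" from a Selenium error
# message and check whether its major version matches the installed Chrome.
driver_vs_chrome <- function(msg, chrome_version) {
  hit <- regmatches(msg, regexpr("chromedriver=[0-9.]+", msg))
  if (length(hit) == 0) stop("no 'chromedriver=' field in the message")
  driver <- sub("chromedriver=", "", hit, fixed = TRUE)
  major <- function(v) as.integer(sub("\\..*$", "", v))
  list(driver = driver,
       chrome = chrome_version,
       compatible = major(driver) == major(chrome_version))
}

# Driver info abridged from the error above
msg <- "Driver info: chromedriver=76.0.3809.25, platform=Mac OS X 10.14.5 x86_64"
res <- driver_vs_chrome(msg, "75.0.3770.100")
res$driver      # "76.0.3809.25": not the driver that was requested
res$compatible  # FALSE: driver major 76 vs Chrome major 75
```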

RSelenium could not navigate with Chrome

I have problems getting the RSelenium package to work in R. I started the setup: I downloaded Selenium (selenium-server-standalone-3.5.3.jar) and Google Chrome's web driver (chromedriver.exe). I also added the location of both files to the PATH environment variable, and created a variable pointing to the Java location.
Then I ran this code:
> require(RSelenium)
> remDr <- remoteDriver(browserName = "chrome")
> remDr$open()
This opens a browser window.
However, when I try to navigate to a page, I get the following error:
> remDr$navigate("http://www.la14.com")
Selenium message:unknown error: Runtime.executionContextCreated has invalid 'context': {"auxData":{"frameId":"8112.1","isDefault":true},"id":1,"name":"","origin":"://"}
(Session info: chrome=60.0.3112.113)
(Driver info: chromedriver=2.9.248315,platform=Windows NT 6.1 SP1 x86_64)
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
Further Details: run errorDetails method
I don't know if it is related to browser permissions. I would be grateful for your help.
Your issue is that you are using a ChromeDriver that is too old: you are using 2.9, and the latest one is 2.32.
Download the latest ChromeDriver from the link below:
https://chromedriver.storage.googleapis.com/index.html?path=2.32/
Then replace the old one with it. This should work.
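The point that 2.9 is older than 2.32 is easy to misread, because as text "2.9" sorts after "2.32". A base-R sketch compares them as versions instead; `driver_is_at_least()` is a hypothetical helper, and it assumes the `chromedriver --version` output begins with "ChromeDriver <version>".

```r
# Hypothetical helper: parse a `chromedriver --version` line (assumed to look
# like "ChromeDriver 2.9.248315 ...") and compare it with a minimum version.
driver_is_at_least <- function(version_line, minimum) {
  v <- sub("^ChromeDriver ([0-9.]+).*$", "\\1", version_line)
  numeric_version(v) >= numeric_version(minimum)
}

"2.9" > "2.32"                                         # TRUE: text comparison misleads
driver_is_at_least("ChromeDriver 2.9.248315", "2.32")  # FALSE: 2.9 is older than 2.32

# With chromedriver on the PATH (assumption), the line can be read with:
# driver_is_at_least(system2("chromedriver", "--version", stdout = TRUE)[1], "2.32")
```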

Latest version of RSelenium and Firefox

When I try to open an RSelenium session, I receive this error:
[1] "Connecting to remote server"
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: org.openqa.selenium.firefox.NotConnectedException
The version of Firefox I have is
Firefox version: 48.0b10
According to this, I tried to update the server version:
library("RSelenium")
startServer()
unlink(system.file("bin", package = "RSelenium"), recursive = TRUE)
checkForServer(update = TRUE)
remDr <- remoteDriver()
Sys.sleep(5)
remDr$open()
Sys.sleep(5)
but the problem still exists. Has anyone else faced this problem? Any possible solution?
From Firefox 48 onwards, geckodriver (Marionette) is needed to run Firefox with Selenium.
If you have Firefox 48, you can run geckodriver as follows:
1) Refer to the guidelines at https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette/WebDriver
2) Download the relevant geckodriver from https://github.com/mozilla/geckodriver/releases
3) Add it to your PATH, or pass its location when starting the Selenium binary (see below).
# get beta selenium standalone
RSelenium::checkForServer(beta = TRUE)
# assume gecko driver is not in our path (assume windows and we downloaded to docs folder)
# if the driver is in your PATH the javaargs call is not needed
selServ <- RSelenium::startServer(javaargs = c("-Dwebdriver.gecko.driver=\"C:/Users/john/Documents/geckodriver.exe\""))
remDr <- remoteDriver(extraCapabilities = list(marionette = TRUE))
remDr$open()
....
....
remDr$close()
selServ$stop()
The above currently requires the dev version of RSelenium. Alternatively, you can download the Selenium binary from http://selenium-release.storage.googleapis.com/index.html . Pick the 3.0 beta 2 binary, which currently runs with Firefox 48, and run it:
java -Dwebdriver.gecko.driver=C:/Users/john/Documents/geckodriver.exe -jar selenium-server-standalone-3.0.0-beta2.jar
