I recently got a new MacBook which seems to have changed several RSelenium settings. I've gotten my scraping to work properly again, except when I start a server with rsDriver, an actual window pops up and I don't know how to stop it.
I do like having this feature because it makes it so much easier to debug, but I run several scripts in the background while working and whenever rsDriver gets used it becomes the desktop focus. This is very frustrating since it disrupts whatever I'm currently working on.
To start a driver, I run:
remDrall <- rsDriver(port = 4446L, browser = "chrome", chromever = "84.0.4147.30", verbose = F)
When it's completed, a chrome window pops up with the url = "data:,". I feel like it should be an easy fix (like a silent = TRUE param), but I've looked through the documentation and can't find anything.
Running Catalina v 10.15.4
RSelenium package is v 1.7.1
Side note, but I'm not really sure how my rsDriver statement is working, since the port isn't the same as my docker image. I open up my docker with $ docker run -d -p 4567:4444 selenium/standalone-chrome, but if I try to use rsDriver(port = 4567L) then it doesn't work because it says the port is already in use. So I don't even have a docker image with port 4446, but my rsDriver statement still works...
Try adding:
extraCapabilities = list("chromeOptions" = list(args = list('--headless')))
as an extra argument to rsDriver.
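For what it's worth, here is a minimal sketch of how that argument could be built and passed, assuming the same port and chromever as in the question. The names headless_caps and start_headless_chrome and the extra --window-size flag are illustrative only and are not part of RSelenium. Newer ChromeDriver releases may expect the key "goog:chromeOptions" rather than "chromeOptions", so the key is left as a parameter to check against your version.

```r
# Hypothetical helper (not part of RSelenium): builds the extraCapabilities
# list that asks Chrome to start without a visible window.
headless_caps <- function(extra_args = character(), key = "chromeOptions") {
  caps <- list(list(args = as.list(c("--headless", extra_args))))
  names(caps) <- key
  caps
}

caps <- headless_caps("--window-size=1280,800")
str(caps)

# Assumed usage, mirroring the call from the question. It needs RSelenium and
# a matching ChromeDriver, so it is only defined here and not called.
start_headless_chrome <- function() {
  RSelenium::rsDriver(
    port = 4446L, browser = "chrome", chromever = "84.0.4147.30",
    verbose = FALSE, extraCapabilities = caps
  )
}
```

When you need to watch the browser for debugging, call rsDriver without extraCapabilities for that one run.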
I'll start by saying I have run into this issue many times in the past, and all that was required was updating my Google Chrome version and then updating the chromever = param in rsDriver(). Neither of these is working for me anymore.
I've tried the solutions in many posts (this one in particular) but I still can't get it to work.
Here are some details:
Computer/browser/R info:
Chrome Version: 89.0.4389.90
Mac Version: 10.15.4
RStudio Version: 1.3.959
For the longest time, I've been able to use chromever = "87.0.4280.20" even though my browser wasn't on that version. I could open up a remote driver with remDrall <- rsDriver(port = 4445L, browser = "chrome", chromever = "87.0.4280.20")
When I try this now, I get an error saying
Selenium message:session not created: This version of ChromeDriver only supports Chrome version 87
Current browser version is 89.0.4389.90 with binary path /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
This prompted me to try updating my ChromeDriver if it was stuck on 87. I updated it by manually downloading and moving to /usr/local/bin/chromedriver as well as updating via brew upgrade chromedriver. As far as I know it worked, testing via:
ls /usr/local/Caskroom/chromedriver/ gives me 89.0.4389.23
/usr/local/bin/chromedriver starts a chromedriver session with 89.0.4389.23
I've tried using chromever = "89.0.4389.23" but I get an unknown server-side error. I know my chrome version is 89.0.4389.90, but that version isn't available to use in rsDriver.
I'm really not sure where to go from here. I just don't get why it says my ChromeDriver only supports Chrome version 87 when I clearly have it updated to 89. Could it be that my rsDriver function is still picking up some old version of ChromeDriver? Can I direct it specifically to the one in /usr/local/bin/?
Any thoughts on what I can try next?
Try replacing the chromedriver.exe file in your local directory with the updated version from here:
https://chromedriver.storage.googleapis.com/index.html?path=89.0.4389.23/
I ran into a similar problem last week and was able to get the command working again by making the following change to the chromever option:
Previous version that has stopped working:
rsDriver(chromever = "87.0.4280.88", browser = "chrome", extraCapabilities = eCaps)
New version that works:
rsDriver(chromever = "89.0.4389.23", browser = "chrome", extraCapabilities = eCaps)
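The pattern in both posts is that the chromever string must be a ChromeDriver release whose major version matches the installed Chrome, not the exact browser version. A rough sketch of that selection, assuming you already have a character vector of the ChromeDriver versions available locally (binman::list_versions("chromedriver") should report them, grouped by platform). The function name pick_chromever is made up for illustration.

```r
# Hypothetical helper: choose the newest available ChromeDriver version that
# shares its major version with the installed browser.
pick_chromever <- function(available, browser_version) {
  major <- function(v) sub("\\..*$", "", v)
  same_major <- available[major(available) == major(browser_version)]
  if (length(same_major) == 0) {
    return(NA_character_)  # nothing compatible has been downloaded yet
  }
  # numeric_version compares component by component, not as plain text
  as.character(max(numeric_version(same_major)))
}

# Version strings taken from the posts above
available <- c("87.0.4280.20", "87.0.4280.88", "89.0.4389.23")
chosen <- pick_chromever(available, "89.0.4389.90")
print(chosen)
```

The result can then be passed as rsDriver(chromever = chosen, ...). An NA result suggests no driver for that Chrome major version is available locally.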
MacOS Sierra 10.12.4. Chrome 63 (most recent). R 1.1.383.
I'm using RSelenium to scrape web data. I'm able to pull data using the remote driver, but the actual web page browser doesn't pop up for me to view. This makes it difficult to debug some of my trickier web pulls. This is an example video of what I want to happen. The user can visually see the changes he's making in the browser. The goal of this post is to find out why I cannot see the browser as I run the code.
Here's an example of my process to pull from RSelenium.
From the Terminal:
(name)$ docker run -d -p 4567:4444 selenium/standalone-chrome
(name)$ docker ps
output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8de3a1cbc777 selenium/standalone-chrome "/opt/bin/entry_po..." 5 minutes ago Up 5 minutes 0.0.0.0:4567->4444/tcp wizardly_einstein
In R
library(RSelenium)
library(magrittr)
library(stringr)
library(stringi)
library(XML)
remDr <- rsDriver(port = 4567L, browser = "chrome")
remDr$client$open()
remDr$client$navigate("https://shiny.rstudio.com/gallery/datatables-options.html")
webElems <- remDr$client$findElements("css selector", "iframe")
remDr$client$switchToFrame(webElems[[1]])
elems <- remDr$client$findElements("css selector", "#showcase-app-container > nav > div > ul li")
unlist(lapply(elems, function(x) x$getElementText()))
[1] "Display length" "Length menu" "No pagination" "No filtering" "Function callback"
This is my confirmation that RSelenium is working properly. However, this is all happening "blindly" - I can't see what is going on. In a complicated web pull I'm trying to perform (hidden behind credentials so I can't give an example), certain elements cannot be found after iterations even though I know they are on the page. Being able to see the browser would allow me to easily debug the code.
Not sure if this means anything, but it doesn't look like the driver is attached to an IP address:
(name)$ docker-machine ip
Error: No machine name(s) specified and no "default" machine exists
Is there something else I need to download to be able to visually see the webdriving process? Thanks in advance.
I'm not sure about the exact behavior in that video, but I always use a phantomjs headless browser and look at screenshots as I go. This code would produce what I'm talking about:
library(RSelenium)
#this sets up the phantomjs driver
pjs <- wdman::phantomjs()
#open a connection to it
dr <- rsDriver(browser = 'phantomjs')
remdr <- dr[['client']]
#go to the site
remdr$navigate("https://stackoverflow.com/")
#show browser screenshot in viewer
remdr$screenshot(TRUE)
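On the "elements cannot be found after iterations" part of the question: one common cause is that the page has not finished rendering when findElement runs, and findElement raises an R error in that case. Below is a small sketch of a retry wrapper, under the assumption that timing is the problem. The helper name retry and its arguments are my own and not from RSelenium.

```r
# Hypothetical helper: call `fn` until it stops raising an error, pausing
# between attempts. Returns the first successful result.
retry <- function(fn, attempts = 5, wait = 1) {
  stopifnot(attempts >= 1)
  for (i in seq_len(attempts)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i < attempts) Sys.sleep(wait)
  }
  stop("still failing after ", attempts, " attempts: ", conditionMessage(result))
}

# Stand-in for a lookup that only succeeds on the third call
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("element not found yet")
  "found"
}
out <- retry(flaky, attempts = 5, wait = 0)
print(out)

# Assumed usage with the driver from the answer above (not run here):
# elem <- retry(function() remdr$findElement("css selector", "#some-id"))
```

If the lookup still fails after the retries, taking a screenshot at that point usually shows what the page actually looked like.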
"everything was better back then"...
Since Firefox 49 (?), using the RSelenium package is no longer straightforward. I have searched the whole internet for a SIMPLE how-to manual for setting up RSelenium but did not find anything relevant and up to date.
Can someone provide me and all the others out there who have no clue a simple How To manual? Like:
download XY
open AB
so I can run code like the following
require(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L,
browserName = "firefox")
remDr$open()
Download latest version of RSelenium >= 1.7.1. Run the following:
library(RSelenium)
rD <- rsDriver() # runs a chrome browser, wait for necessary files to download
remDr <- rD$client
# no need for remDr$open() browser should already be open
If you want a firefox browser use rsDriver(browser = "firefox").
This is detailed in http://rpubs.com/johndharrison/RSelenium-Basics appendix. The recommended way to run RSelenium is via Docker containers however. Instructions for use of Docker with RSelenium can be found at http://rpubs.com/johndharrison/RSelenium-Docker
ISSUES:
If you have issues which may occur due to admin rights or other variables such as anti-virus software you can run a Selenium server manually. The easiest way to do this is via the wdman package:
selCommand<-
wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"),
retcommand = TRUE)
> cat(selCommand)
C:\PROGRA~3\Oracle\Java\javapath\java.exe -Dwebdriver.chrome.verboseLogging=true -Dwebdriver.chrome.driver="C:\Users\john\AppData\Local\binman\binman_chromedriver\win32\2.27/chromedriver.exe" -Dwebdriver.gecko.driver="C:\Users\john\AppData\Local\binman\binman_geckodriver\win64\0.14.0/geckodriver.exe" -Dphantomjs.binary.path="C:\Users\john\AppData\Local\binman\binman_phantomjs\windows\2.1.1/phantomjs-2.1.1-windows/bin/phantomjs.exe" -jar "C:\Users\john\AppData\Local\binman\binman_seleniumserver\generic\3.0.1/selenium-server-standalone-3.0.1.jar" -port 4567
Using one of the wdman functions with the retcommand option enabled will return the command-line call that would have been run.
Now you can run the output of cat(selCommand) in a terminal
C:\Users\john>C:\PROGRA~3\Oracle\Java\javapath\java.exe -Dwebdriver.chrome.verboseLogging=true -Dwebdriver.chrome.driver="C:\Users\john\AppData\Local\binman\binman_chromedriver\win32\2.27/chromedriver.exe" -Dwebdriver.gecko.driver="C:\Users\john\AppData\Local\binman\binman_geckodriver\win64\0.14.0/geckodriver.exe" -Dphantomjs.binary.path="C:\Users\john\AppData\Local\binman\binman_phantomjs\windows\2.1.1/phantomjs-2.1.1-windows/bin/phantomjs.exe" -jar "C:\Users\john\AppData\Local\binman\binman_seleniumserver\generic\3.0.1/selenium-server-standalone-3.0.1.jar" -port 4567
12:15:29.206 INFO - Selenium build info: version: '3.0.1', revision: '1969d75'
12:15:29.206 INFO - Launching a standalone Selenium Server
2017-02-08 12:15:29.223:INFO::main: Logging initialized @146ms
12:15:29.265 INFO - Driver class not found: com.opera.core.systems.OperaDriver
12:15:29.265 INFO - Driver provider com.opera.core.systems.OperaDriver registration is skipped:
Unable to create new instances on this machine.
12:15:29.265 INFO - Driver class not found: com.opera.core.systems.OperaDriver
12:15:29.266 INFO - Driver provider com.opera.core.systems.OperaDriver is not registered
12:15:29.271 INFO - Driver provider org.openqa.selenium.safari.SafariDriver registration is skipped:
registration capabilities Capabilities [{browserName=safari, version=, platform=MAC}] does not match the current platform WIN10
2017-02-08 12:15:29.302:INFO:osjs.Server:main: jetty-9.2.15.v20160210
2017-02-08 12:15:29.317:INFO:osjsh.ContextHandler:main: Started o.s.j.s.ServletContextHandler@c4c815{/,null,AVAILABLE}
2017-02-08 12:15:29.332:INFO:osjs.ServerConnector:main: Started ServerConnector@4af044{HTTP/1.1}{0.0.0.0:4567}
2017-02-08 12:15:29.333:INFO:osjs.Server:main: Started @257ms
12:15:29.334 INFO - Selenium Server is up and running
Now try and run a browser
remDr <- remoteDriver(port = 4567L, browserName = "chrome")
remDr$open()
If you are unable to manually run a Selenium Server then you will need to address your issues (including relevant log files) to the Selenium project or the appropriate driver project (chromedriver/geckodriver/ghostdriver etc.)
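One detail that is easy to miss in the manual route: the port given to remoteDriver has to be the one the server was started on, which appears at the end of the command as -port 4567. A small sketch that reads it back out of the command string, so the two cannot drift apart. The helper name selenium_port is made up, and the command below is a shortened stand-in for the real output of wdman::selenium(retcommand = TRUE).

```r
# Hypothetical helper: extract the value of the "-port" flag from a
# Selenium server command line; NA if the flag is absent.
selenium_port <- function(cmd) {
  m <- regmatches(cmd, regexpr("-port\\s+[0-9]+", cmd))
  if (length(m) == 0) return(NA_integer_)
  as.integer(sub("-port\\s+", "", m))
}

# Shortened stand-in for cat(selCommand)
selCommand <- 'java -Dwebdriver.chrome.verboseLogging=true -jar "selenium-server-standalone-3.0.1.jar" -port 4567'
port <- selenium_port(selCommand)
print(port)

# Assumed usage once the server is running in a terminal (not run here):
# remDr <- RSelenium::remoteDriver(port = port, browserName = "chrome")
# remDr$open()
```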
Download Docker at https://www.docker.com/products/docker-desktop
Run docker pull selenium/standalone-chrome-debug in terminal (or cmd for windows)
In Docker Desktop's Dashboard, go to the "images" tab on the left. After that, you should see something like this:
Click Run
A popup will appear. There, click on "Optional Settings"
Type 4445 on Ports. Click on the "plus" sign, type 5901 on the other input that will be created on Ports. It should look like the image below. After that, click Run.
Now, if you click on the Containers / Apps tab on the left, there should be something like this:
In R's console, run:
install.packages("RSelenium")
library(RSelenium)
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4445L,
browserName = "chrome"
)
remDr$open()
Every time you want RSelenium to work you will need to run the Docker container as you did in steps 3 and 5 above.
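If you would rather not click through Docker Desktop each time, the same container can probably be started from a terminal. The sketch below only builds the command string. It assumes the selenium/standalone-chrome-debug image listens on container port 4444 for Selenium and 5900 for VNC, mapped to the host ports 4445 and 5901 used in the steps above. The function name docker_run_command is illustrative.

```r
# Hypothetical helper: assemble a `docker run` command equivalent to the
# Docker Desktop steps (host port -> container port for Selenium and VNC).
docker_run_command <- function(image = "selenium/standalone-chrome-debug",
                               selenium_port = 4445L, vnc_port = 5901L) {
  sprintf("docker run -d -p %d:4444 -p %d:5900 %s",
          selenium_port, vnc_port, image)
}

cmd <- docker_run_command()
cat(cmd, "\n")

# Assumed usage (not run here): start the container from R, then connect
# with remoteDriver on the same host port.
# system(cmd)
```

A VNC viewer pointed at localhost:5901 should then show the browser while your script runs.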
The steps also allow you to use VNC to watch what happens and debug. If you need to learn a bit about it go to https://www.realvnc.com/pt/connect/download/viewer/ More details are out of the scope of this topic.
Well, I think this can take you to a point where you can now follow these instructions of RSelenium's basic usage vignette: https://cran.r-project.org/web/packages/RSelenium/vignettes/basics.html
You should also read about security related to exposed ports and how to handle it.
These videos from R Consortium may help you out from here on:
https://www.youtube.com/watch?v=OxbvFiYxEzI and https://www.youtube.com/watch?v=JcIeWiljQG4
I hope it may help you as you would have helped me some time ago.
I've been learning R programming for the last few months and really enjoying the language. I wanted to start using it to automate a few things at work. However for the life of me no matter how much I Google or experiment I can't seem to start the browser.
I followed the steps from this article
https://www.r-bloggers.com/rselenium-a-wonderful-tool-for-web-scraping/
and got the server started from the command line. This is the code I ran in the console and the error message I'm getting.
> library(RSelenium)
> checkForServer()
Warning message:
checkForServer is deprecated.
Users in future can find the function in
file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see
vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see
vignette("RSelenium-docker", package = "RSelenium")
I'm running on Windows 10 64-bit and have installed the latest Firefox.
Any help or pointers on this would be greatly appreciated.
Thanks,
Shan
Okay, I just went through this. So you can skip the whole Selenium Server entirely by just using phantomjs, which RSelenium can call directly.
Steps:
Download phantomjs for your platform here
Put this binary file in the system PATH or anywhere else you have access to from R
Now try this:
library(RSelenium)
pJS <- phantom(pjs_cmd = "<YOUR BINARY LOCATION>") # no arg if it's in PATH
Sys.sleep(5)
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open(silent = T)
url <- "http://www.google.com"
remDr$navigate(url)
remDr$screenshot(display = TRUE)
NOTE: When I run this I get an error after the first step, but it still works and pulls up the page. Not sure why that happens.
Recently, I've been having trouble driving phantomjs under RSelenium. It seems that the browser is unable to locate anything on the page using findElement(). If I pass something as simple as:
library("RSelenium")
RSelenium::checkForServer()
RSelenium::startServer()
rd <- remoteDriver(browserName = "phantomjs")
rd$open()
Sys.sleep(5)
rd$navigate("https://www.Facebook.com")
searchBar <- rd$findElement(using = "id", "email")
I get the error below:
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Any thoughts on what is causing this? It doesn't seem to matter what page I navigate to; it simply fails anytime I try to locate an element on the webpage. This issue started recently and I noticed it when my cron jobs began failing.
I'm working in Ubuntu 14.04 LTS with R 3.3.1 and phantomjs 2.1.1. I don't suspect some type of compatibility issue as this has worked very recently and I haven't updated anything.
The version of phantomjs you installed may be limited. See here
Disabled Ghostdriver due to pre-built source-less Selenium blobs.
Added README.Debian explaining differences from upstream "phantomjs".
If you installed recently using apt-get then this is most likely the case. You can download from the phantomjs website and place the bin location in your PATH.
Alternatively use npm to install a version for you
npm install phantomjs-prebuilt
This will then put a link to the bin in node_modules/.bin/phantomjs.
For the reasons behind the limitations in apt-get you can read the README.Debian file contained here.
Limitations
Unlike original "phantomjs" binary that is statically linked with
modified QT+WebKit, Debian package is built with system libqt5webkit5.
Unfortunately the latter do not have webSecurity extensions therefore
"--web-security=no" is expected to fail.
https://github.com/ariya/phantomjs/issues/13727#issuecomment-155609276
Ghostdriver is crippled due to removed source-less pre-built blobs:
src/ghostdriver/third_party/webdriver-atoms/*
Therefore all PDF functionality is broken.
PhantomJS cannot run in headless mode (if there is no X server
available).
Unfortunately it can not be fixed in Debian. To achieve headless-ness
upstream statically link with customised QT + Webkit. We don't want to
ship forks of those projects. It would be great to eventually convince
upstream to use standard libraries. Meanwhile one can use "xvfb-run"
from "xvfb" package:
xvfb-run --server-args="-screen 0 640x480x16" phantomjs
If you don't want to set your path for phantomjs then you can pass its location as an extra capability:
library(RSelenium)
selServ <- startServer()
pBin <- list(phantomjs.binary.path = "/home/john/node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs")
rd <- remoteDriver(browserName = "phantomjs"
, extraCapabilities = pBin)
Sys.sleep(5)
rd$open()
rd$navigate("https://www.Facebook.com")
searchBar <- rd$findElement(using = "id", "email")
rd$close()
selServ$stop()
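Since a wrong or missing binary path tends to surface only later as a confusing driver error, it may help to check the path before building the capability list. A minimal sketch, assuming the binary is either passed explicitly or discoverable on the PATH through Sys.which. The helper name phantom_caps is not from RSelenium.

```r
# Hypothetical helper: verify the phantomjs binary exists and return the
# extraCapabilities list used in the example above.
phantom_caps <- function(path = Sys.which("phantomjs")) {
  path <- unname(path)
  if (!nzchar(path) || !file.exists(path)) {
    stop("phantomjs binary not found: '", path, "'")
  }
  list(phantomjs.binary.path = normalizePath(path))
}

# Demonstration with a temporary file standing in for the real binary
fake_bin <- tempfile("phantomjs")
file.create(fake_bin)
pBin <- phantom_caps(fake_bin)
str(pBin)

# Assumed usage (not run here):
# rd <- RSelenium::remoteDriver(browserName = "phantomjs",
#                               extraCapabilities = phantom_caps("/path/to/phantomjs"))
```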