RSelenium issue - R

I'm trying to scrape a website using RSelenium. However, I'm getting this error:
Error: checkForServer is now defunct. Users in future can find the function in
file.path(find.package("RSelenium"), "examples/serverUtils"). The
recommended way to run a selenium server is via Docker. Alternatively
see the RSelenium::rsDriver function.
My Chrome is updated to version 58 and Mozilla Firefox to version 45. RSelenium used to work earlier, but I'm not sure what happened. Please help.

The following script works for me with the new RSelenium version:
library(RSelenium)
rD <- rsDriver(port = 4444L, browser = "chrome")
remDr <- rD$client
remDr$navigate(url) # url: the address of the page you want to scrape

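Note that rsDriver() itself does not require Docker: it downloads the Selenium server and browser driver binaries through the wdman package. You only need Docker installed (no Docker account is required) if you prefer to run the Selenium server in a container.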
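rsDriver() returns a list with a `client` element (the remoteDriver used above) and a `server` element (the Selenium server process started via wdman). If only the browser is closed, the server keeps holding the port and the next rsDriver() call on that port fails. The following is a minimal sketch, assuming you want both shut down even when the scraping code errors; `with_driver()` is a hypothetical helper, not part of RSelenium, and its `driver_fun` argument exists only so it can be exercised without a real browser:

```r
# Hypothetical helper (not part of RSelenium): run some code against an
# rsDriver() session and always shut the browser and server down afterwards.
with_driver <- function(code_fun, driver_fun = RSelenium::rsDriver, ...) {
  rD <- driver_fun(...)  # list with $client (remoteDriver) and $server
  on.exit({
    try(rD$client$close(), silent = TRUE)  # close the browser window
    try(rD$server$stop(), silent = TRUE)   # stop the Selenium server process
  }, add = TRUE)
  code_fun(rD$client)
}

# Usage (needs RSelenium and a local Chrome):
# title <- with_driver(function(remDr) {
#   remDr$navigate("https://www.r-project.org")
#   remDr$getTitle()[[1]]
# }, port = 4444L, browser = "chrome")
```

Because the cleanup is registered with on.exit(), it also runs when `code_fun` stops with an error.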
Try this:
library(RSelenium)
rD <- rsDriver()
remDr <- rD[["client"]]
remDr$navigate("https://www.vinmonopolet.no/vmp/Land/Chile/Gato-Negro-Cabernet-Sauvignon-2017/p/295301")
# XPath selects attributes with @ (not #)
webElement <- remDr$findElement(using = "xpath", value = '//*[@id="product_2953010"]/span[2]')
webElement$clickElement()
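If the element only appears on some pages, findElement() stops the script with a NoSuchElement error. A minimal sketch, assuming you would rather skip the click than fail: RSelenium's findElements() (plural) returns an empty list when nothing matches, so the hypothetical helper `click_if_present()` checks the length first:

```r
# Hypothetical helper: click the first element matching a locator if it exists.
# findElements() returns an empty list when nothing matches, whereas
# findElement() throws a NoSuchElement error.
click_if_present <- function(remDr, using, value) {
  els <- remDr$findElements(using = using, value = value)
  if (length(els) == 0) {
    return(FALSE)  # nothing to click
  }
  els[[1]]$clickElement()
  TRUE
}

# Usage with the session above:
# click_if_present(remDr, "xpath", '//*[@id="product_2953010"]/span[2]')
```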

Related

R / Rvest / RSelenium: scrape data from JS Sites

I am new to web scraping with R and rvest. With rvest you can scrape static HTML, but I have found that rvest struggles to scrape data from JavaScript-heavy sites.
I found some articles and blog posts, but they seem deprecated, for example:
https://awesomeopensource.com/project/yusuzech/r-web-scraping-cheat-sheet
In my case I want to scrape odds from sports betting sites, but in my opinion this isn't possible with rvest and SelectorGadget because of the JavaScript.
There is an article from 2018 about scraping odds from PaddyPower (https://www.r-bloggers.com/how-to-scrape-data-from-a-javascript-website-with-r/), but it is outdated too, because PhantomJS isn't available anymore.
RSelenium seems to be an option, but the repo has many open issues: https://github.com/ropensci/RSelenium
So is it possible to work with RSelenium in its current state, or what other options do I have?
Kind regards
I've had no problems using RSelenium with the help of the wdman package, which allowed me to just not bother with Docker. wdman also fetches all binaries you need if they aren't already available. It's nice magic.
Here's a simple script to spin up a Selenium instance with Chrome, open a site, get the contents as xml and then close it all down again.
library(wdman)
library(RSelenium)
library(xml2)
# start a selenium server with wdman, running the latest chrome version
selServ <- wdman::selenium(
port = 4444L,
version = 'latest',
chromever = 'latest'
)
# start your chrome Driver on the selenium server
remDr <- remoteDriver(
remoteServerAddr = 'localhost',
port = 4444L,
browserName = 'chrome'
)
# open a selenium browser tab
remDr$open()
# navigate to your site (placeholder URL; replace with the page you want)
some_url <- "https://www.r-project.org"
remDr$navigate(some_url)
# get the html contents of that site as xml tree
page_xml <- xml2::read_html(remDr$getPageSource()[[1]])
# do your magic
# ... check doc at `?remoteDriver` to see what your remDr object can help you do.
# clean up after you
remDr$close()
selServ$stop()
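On JavaScript-heavy pages such as the betting sites mentioned in the question, getPageSource() may run before the content has been rendered. A minimal sketch, assuming polling is acceptable for your use: the hypothetical helper `wait_for_element()` repeatedly calls remDr$findElements() until something matches or a timeout passes. The CSS selector in the usage comment is made up:

```r
# Hypothetical helper: poll until at least one element matching the locator
# exists (useful when JavaScript renders content after the initial load).
# Returns the list of matching webElements, or stops with an error on timeout.
wait_for_element <- function(remDr, using, value, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    els <- remDr$findElements(using = using, value = value)
    if (length(els) > 0) {
      return(els)
    }
    if (Sys.time() > deadline) {
      stop(sprintf("No element found for %s = '%s' within %s seconds",
                   using, value, timeout))
    }
    Sys.sleep(interval)  # give the page's JavaScript more time
  }
}

# Usage, between navigate() and getPageSource() in the script above:
# remDr$navigate(some_url)
# wait_for_element(remDr, "css selector", "table.odds")  # made-up selector
# page_xml <- xml2::read_html(remDr$getPageSource()[[1]])
```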

RSelenium through Docker

My OS is Windows 8.1 and I have R version 3.3.3.
I have installed the RSelenium package and I try to run it using this:
library("RSelenium")
#start RSelenium server
startServer()
checkForServer()
and I receive this error:
Error: checkForServer is now defunct. Users in future can find the function in
file.path(find.package("RSelenium"), "examples/serverUtils"). The
recommended way to run a selenium server is via Docker. Alternatively
see the RSelenium::rsDriver function.
Has anything changed in the way RSelenium starts? I searched for the error and found only this, but it doesn't help me. What can I do?
As an alternative, I also tried downloading chromedriver from here: 'https://sites.google.com/a/chromium.org/chromedriver/downloads'
and using this script:
require(RSelenium)
cprof <- getChromeProfile("C:/Users/Peri/Desktop/chromedriver/chromedriver.exe", "Profile 1")
require(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4444
, browserName = "chrome", extraCapabilities = cprof
)
remDr$open()
and I receive this error:
Error in checkError(res) :
Couldnt connect to host on http://localhost:4444/wd/hub.
Please ensure a Selenium server is running.
What can I do to run Chrome instead of the default browser, Firefox?
You need to use the rsDriver function. The current RSelenium version wants you to use Docker (which I would also recommend), but if you are not familiar with Docker you can go this way.
rsDriver() manages the binaries needed to run a Selenium server; it is a wrapper around the wdman::selenium function.
Here is what you have to do to start a Chrome browser:
library(RSelenium)
driver <- rsDriver(browser = "chrome") # chrome is also the default
remDr <- driver[["client"]]
And then you can work with it:
remDr$navigate("http://www.google.de")
remDr$navigate("http://www.spiegel.de")
And stop it (close the browser, then shut down the Selenium server so the port is freed):
remDr$close()
driver$server$stop()
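The answer above recommends Docker but does not show it. A minimal sketch of connecting RSelenium to a Selenium server running in a container, assuming a container from the `selenium/standalone-chrome` image is already running with its port 4444 mapped to 4445 on the host (the RSelenium Docker vignette uses the same 4445:4444 mapping with a Firefox image). Which image tag works with your RSelenium version is an assumption you may need to adjust; `connect_docker_chrome()` is a hypothetical helper name:

```r
# Sketch: connect to a Selenium server running in Docker instead of rsDriver().
# Assumes a container was started beforehand from a shell, e.g.
#   docker run -d -p 4445:4444 selenium/standalone-chrome
# (the image tag is an assumption; older tags may suit older RSelenium).
connect_docker_chrome <- function(host = "localhost", port = 4445L) {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = host,   # where the container's port is exposed
    port = port,               # host side of the -p mapping
    browserName = "chrome"
  )
  remDr$open()
  remDr
}

# Usage:
# remDr <- connect_docker_chrome()
# remDr$navigate("http://www.google.de")
# remDr$close()  # the container keeps running until you stop it with docker
```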

Having trouble connecting RSelenium to Server

I've been learning R programming for the last few months and really enjoying the language. I wanted to start using it to automate a few things at work. However, for the life of me, no matter how much I google or experiment, I can't get the browser to start.
I followed the steps from this article
https://www.r-bloggers.com/rselenium-a-wonderful-tool-for-web-scraping/
and got the server started from the command line. This is the code I ran in the console and the error message I'm getting.
> library(RSelenium)
> checkForServer()
Warning message:
checkForServer is deprecated.
Users in future can find the function in
file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see
vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see
vignette("RSelenium-docker", package = "RSelenium")
I'm running on Windows 10 64-bit and have installed the latest Firefox.
Any help or pointers on this would be greatly appreciated.
Thanks,
Shan
Okay, I just went through this. You can skip the Selenium server entirely by using PhantomJS, which RSelenium can drive directly.
Steps:
1. Download PhantomJS for your platform here.
2. Put the binary in the system PATH or anywhere else you have access to from R.
3. Now try this:
library(RSelenium)
pJS <- phantom(pjs_cmd = "<YOUR BINARY LOCATION>") # no arg if it's in PATH
Sys.sleep(5) # give PhantomJS time to start
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open(silent = TRUE)
url <- "http://www.google.com"
remDr$navigate(url)
remDr$screenshot(display = TRUE)
# clean up when finished
remDr$close()
pJS$stop()
NOTE: When I run this I get an error after the first step, but it still works and pulls up the page. Not sure why that happens.
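The fixed Sys.sleep(5) above guesses how long PhantomJS needs to start. A sketch of an alternative, assuming that a failed remDr$open() simply means the server is not ready yet: the hypothetical helper `open_when_ready()` retries the connection a few times before giving up.

```r
# Hypothetical helper: instead of a fixed Sys.sleep(), retry remDr$open()
# until the freshly started server accepts connections, or give up.
open_when_ready <- function(remDr, tries = 10, wait = 1) {
  for (i in seq_len(tries)) {
    ok <- tryCatch({
      remDr$open(silent = TRUE)
      TRUE
    }, error = function(e) FALSE)
    if (ok) return(invisible(remDr))
    Sys.sleep(wait)  # server not up yet; wait and try again
  }
  stop("Could not connect to the server after ", tries, " attempts")
}

# Usage with the PhantomJS example above:
# pJS <- phantom()
# remDr <- remoteDriver(browserName = "phantomjs")
# open_when_ready(remDr)
```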

phantomjs unable to find element on page

Recently, I've been having trouble driving phantomjs under RSelenium. It seems that the browser is unable to locate anything on the page using findElement(). If I pass something as simple as:
library("RSelenium")
RSelenium::checkForServer()
RSelenium::startServer()
rd <- remoteDriver(browserName = "phantomjs")
rd$open()
Sys.sleep(5)
rd$navigate("https://www.Facebook.com")
searchBar <- rd$findElement(using = "id", "email")
I get the error below:
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Any thoughts on what is causing this? It doesn't seem to matter what page I navigate to; it simply fails anytime I try to locate an element on the webpage. This issue started recently and I noticed it when my cron jobs began failing.
I'm working in Ubuntu 14.04 LTS with R 3.3.1 and phantomjs 2.1.1. I don't suspect some type of compatibility issue as this has worked very recently and I haven't updated anything.
The version of phantomjs you installed may be limited. See here
Disabled Ghostdriver due to pre-built source-less Selenium blobs.
Added README.Debian explaining differences from upstream "phantomjs".
If you installed it recently using apt-get, this is most likely the case. You can download the binary from the PhantomJS website and add its location to your PATH.
Alternatively, use npm to install a version for you:
npm install phantomjs-prebuilt
This will then put a link to the binary in node_modules/.bin/phantomjs.
For the reasons behind the limitations of the apt-get package, you can read the README.Debian file contained here:
Limitations
Unlike original "phantomjs" binary that is statically linked with
modified QT+WebKit, Debian package is built with system libqt5webkit5.
Unfortunately the latter do not have webSecurity extensions therefore
"--web-security=no" is expected to fail.
https://github.com/ariya/phantomjs/issues/13727#issuecomment-155609276
Ghostdriver is crippled due to removed source-less pre-built blobs:
src/ghostdriver/third_party/webdriver-atoms/*
Therefore all PDF functionality is broken.
PhantomJS cannot run in headless mode (if there is no X server
available).
Unfortunately it can not be fixed in Debian. To achieve headless-ness
upstream statically link with customised QT + Webkit. We don't want to
ship forks of those projects. It would be great to eventually convince
upstream to use standard libraries. Meanwhile one can use "xvfb-run"
from "xvfb" package:
xvfb-run --server-args="-screen 0 640x480x16" phantomjs
If you don't want to add phantomjs to your PATH, you can pass its location as an extra capability instead:
library(RSelenium)
selServ <- startServer()
pBin <- list(phantomjs.binary.path = "/home/john/node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs")
rd <- remoteDriver(browserName = "phantomjs"
, extraCapabilities = pBin)
Sys.sleep(5)
rd$open()
rd$navigate("https://www.Facebook.com")
searchBar <- rd$findElement(using = "id", "email")
rd$close()
selServ$stop()
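When every findElement() call fails regardless of the page, as in the question, the page itself may not have loaded, which would be consistent with the limited Debian build described above. A minimal diagnostic sketch using the remoteDriver methods getCurrentUrl(), getTitle() and getPageSource(); `page_status()` is a hypothetical helper name, and what counts as a suspiciously empty page is your call:

```r
# Hypothetical helper: before blaming the locator for a NoSuchElement error,
# check whether the browser actually loaded anything.
page_status <- function(remDr) {
  src <- remDr$getPageSource()[[1]]
  list(
    url = remDr$getCurrentUrl()[[1]],
    title = remDr$getTitle()[[1]],
    source_chars = nchar(src)  # a tiny source suggests nothing was rendered
  )
}

# Usage:
# rd$navigate("https://www.Facebook.com")
# str(page_status(rd))
```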

RSelenium, Can't start server

I'm trying to use RSelenium for web-scraping purposes behind a login and I can't get the server to run.
Current result:
library(RSelenium)
startServer()
remDr <- remoteDriver(port = 4444,
browserName = "firefox")
remDr$open()
# [1] "Connecting to remote server"
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: org.openqa.selenium.firefox.NotConnectedException
I've tried running the server myself by downloading and trying to open it (nothing happens).
This was a tough one and held me up for a couple of days of searching. In the end I uninstalled Firefox and installed version 37.0 while also disabling the update service. That fixed it for me, and RSelenium works fine again.
Run the following code first, then it should work:
RSelenium::checkForServer()
This line downloads the standalone Selenium server, which you need for running RSelenium commands. (In newer RSelenium versions checkForServer() is defunct; use rsDriver() instead.)
Try the following:
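library(RSelenium)
rD <- rsDriver(port = 4444L, browser = "firefox")
# rsDriver() already opens a Firefox session in rD$client; this opens a second one on the same server
mybrowser <- remoteDriver(port = 4444L, browserName = "firefox")
mybrowser$open()
RSelenium sometimes has problems establishing the server on the respective port at the beginning, so rsDriver() is used to start it on an explicit port. Subsequently, we tell remoteDriver() which driver should be used.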
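If the port problem described above comes from an earlier server still holding port 4444, one workaround is to try a few ports in turn. A sketch, assuming rsDriver() signals an error when it cannot start on a port; `start_driver_any_port()` is a hypothetical helper, and `driver_fun` is there only so it can be tested without a browser:

```r
# Hypothetical helper: try rsDriver() on a sequence of ports and return the
# first session that starts (e.g. when 4444 is held by an old server).
start_driver_any_port <- function(ports = 4444:4450,
                                  driver_fun = RSelenium::rsDriver, ...) {
  for (p in ports) {
    rD <- tryCatch(driver_fun(port = as.integer(p), ...),
                   error = function(e) NULL)  # port busy or start failed
    if (!is.null(rD)) {
      message("Selenium server started on port ", p)
      return(rD)
    }
  }
  stop("rsDriver() failed on all ports: ", paste(ports, collapse = ", "))
}

# Usage:
# rD <- start_driver_any_port(browser = "firefox")
# mybrowser <- rD$client
```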
