RSelenium in MAC - r

I am using R 3.1.1 on OS X Yosemite (10.10.4). I recently installed RSelenium and I constantly receive an unknown error. The code I use is as follows:
require(RSelenium)
checkForServer()
startServer()
Sys.sleep(5)
remDr <- remoteDriver()
remDr$open()
The error is as follows:
remDr$open()
[1] "Connecting to remote server"
Undefined error in RCurl call.
Error in queryRD(paste0(serverURL, "/session"), "POST", qdata = toJSON(serverOpts)) :
I tried downloading selenium-java-2.41.0 from the official website and put the file in Library/Java/Extension. Then I tried this line of code:
system("java -jar ~/Library/Java/Extension/selenium-2.47-2.1/selenium-java-2.47.1.jar")
But it did not work and I kept receiving the same error.
Then I used the terminal to install the package like this:
sudo java -jar selenium-server-standalone-2.47.1.jar
It installed something, but the problem was still not solved.
I have no idea what else to do.

It's a security issue on Macs. You need to download the standalone Selenium server from http://www.seleniumhq.org/download/, put it in the same directory as the script you are trying to run, and then run it. Your security settings might block it because it is "not authenticated," in which case you will have to go to your security settings and manually override the block. After that, it works fine.
Source:
http://www.computerworld.com/article/2971265/application-development/how-to-drive-a-web-browser-with-r-and-rselenium.html

I don't know if you are still interested, but I struggled with this for days! Here is what works for my installation (RSelenium 1.3.5, phantom for Mac OS X 2.0.0, R 3.2.2, OS X Yosemite 10.10.4):
library("RSelenium")
message("Starting Phantom JS ...")
pJS <- phantom() # starts PhantomJS in webdriver mode on port 4444
Sys.sleep(5) # give binary time to run and open port
eCap <- list(phantomjs.page.settings.userAgent
= "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36")
remDr <- remoteDriver(browserName = "phantomjs", extraCapabilities = eCap)
message("Opening headless browser session ...")
remDr$open(silent = TRUE)
Sys.sleep(5) # give it a moment
Phantom is in the usr/bin/ directory. Notice there is no "startServer()" statement or finding the selenium jar and running it. If you run the Selenium server directly it opens port 4444, and then Phantom JS will not start on that port. Use the command "lsof -i :4444" in the Mac terminal window to see what is happening on port 4444.
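If you would rather check the port from R than from the terminal, something along these lines may work. It is only a sketch: port_is_open is a made-up helper name (not part of RSelenium), and it only tells you whether something accepts connections on the port, not which program it is (lsof is still the tool for that).

```r
# Hypothetical helper (not an RSelenium function): returns TRUE if something
# accepts TCP connections on host:port, FALSE otherwise.
port_is_open <- function(host = "localhost", port = 4444L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

# Check before starting PhantomJS: if this is already TRUE, another
# process (for example a Selenium server) is holding the port.
print(port_is_open("localhost", 4444L))
```

If it prints TRUE before you call phantom(), the port is already taken and PhantomJS will probably not be able to bind to it.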
Having done all this, the operation is still not satisfactory - I can only execute a handful of RSelenium commands before I get the spinning color wheel and have to go to the terminal window and issue a "kill PID" command to get control of R again. I have tried putting in delays all over the place in case the problem is slow website response time, but it doesn't make any difference.
Good luck.

Related

R Rselenium ".... Failed to connect to localhost port 4444: Connection refused"

Seeking guidance on how to resolve the subject line error.
The many previous posts and solutions referenced here have already been reviewed/tried.
In the past this same error has been resolved by updating R, RSelenium, Selenium Server (selenium-server-4.1.3.jar), Java, the Chrome browser, Chromedriver and/or Gecko Driver (when using Firefox). All were updated to the latest versions. I also tried Firefox. The error remains.
Windows 10 was updated/computer rebooted. No joy.
The code, which has worked for years and as recently as two weeks ago:
remDr <- remoteDriver(browserName = "chrome")
remDr$open(silent = TRUE)
Error message and parameters:
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
remDr
$remoteServerAddr
[1] "localhost"
$port
[1] 4444
$browserName
[1] "chrome"
$version
[1] ""
$platform
[1] "ANY"
$javascript
[1] TRUE
$nativeEvents
[1] TRUE
$extraCapabilities
list()
What else should I examine or try?
The solution was to revert back to selenium-server-standalone-3.9.1.jar.
For folks trying to set this up for the first time, the steps that work for me are to run a batch file (.cmd) with the following two lines before running the R file.
java -jar selenium-server-standalone-3.9.1.jar
pause
Of course, edit the first line to match the file name as new selenium server versions release. Place the .jar file and browser drivers in a folder that's in the system search path (I edit the system path to include a custom folder that's dedicated to RSelenium related files).
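A quick way to confirm from R that Java and the browser drivers really are on the search path is sketched below. find_on_path is a hypothetical helper name, and the default tool names are assumptions; adjust them to whatever you actually installed. Note that this only checks executables, not the .jar file itself.

```r
# Hypothetical helper: report which command-line tools R can find on PATH.
# Sys.which() returns "" for tools it cannot locate.
find_on_path <- function(tools = c("java", "chromedriver", "geckodriver")) {
  paths <- Sys.which(tools)
  data.frame(
    tool = tools,
    path = unname(paths),
    found = nzchar(paths),
    stringsAsFactors = FALSE
  )
}

print(find_on_path())
```

Any row with found = FALSE points at something that still needs to be added to the system path.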
When the command box pops up, you should see the following line:
07:35:53.054 INFO - Selenium Server is up and running on port 4444
My big mistake was not double-checking for that line. Once I returned to this with fresh eyes, I realized that I should look for it, and then the solution was obvious.
Then these RSelenium commands work:
remDr <- remoteDriver(browserName = "chrome")
remDr$open(silent = TRUE)
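Instead of eyeballing the command box, you could let R wait until the server answers before opening the session. This is a rough sketch, not something from the RSelenium package: wait_until and status_answers are made-up names, and the status URL assumes a Selenium 3 standalone server on the default port.

```r
# Hypothetical helper: call check() up to `tries` times, pausing `delay`
# seconds between attempts. Returns TRUE as soon as check() returns TRUE.
wait_until <- function(check, tries = 10L, delay = 1) {
  for (i in seq_len(tries)) {
    ok <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ok) {
      return(TRUE)
    }
    Sys.sleep(delay)
  }
  FALSE
}

# Hypothetical check: TRUE if the status URL returns any content at all.
# The URL is an assumption (Selenium 3 standalone, default port 4444).
status_answers <- function(status_url = "http://localhost:4444/wd/hub/status") {
  out <- tryCatch(
    suppressWarnings(readLines(status_url, warn = FALSE)),
    error = function(e) NULL  # connection refused, server not up yet
  )
  length(out) > 0
}

if (interactive()) {
  if (wait_until(status_answers)) {
    library(RSelenium)
    remDr <- remoteDriver(browserName = "chrome")
    remDr$open(silent = TRUE)
  } else {
    message("No Selenium server answered on port 4444; start the .cmd file first.")
  }
}
```

This replaces guessing with a fixed Sys.sleep() and fails with a readable message when the server was never started.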

RSelenium through docker

My OS is Windows 8.1 and I have R version 3.3.3.
I have installed the RSelenium package and I try to run it using this:
library("RSelenium")
#start RSelenium server
startServer()
checkForServer()
and I receive this error:
Error: checkForServer is now defunct. Users in future can find the function in
file.path(find.package("RSelenium"), "examples/serverUtils"). The
recommended way to run a selenium server is via Docker. Alternatively
see the RSelenium::rsDriver function.
Has anything changed in the way RSelenium starts? I searched for the error and found only this, but it doesn't help me. What can I do?
As an alternative, I also tried downloading chromedriver from here 'https://sites.google.com/a/chromium.org/chromedriver/downloads'
and using this script:
require(RSelenium)
cprof <- getChromeProfile("C:/Users/Peri/Desktop/chromedriver/chromedriver.exe", "Profile 1")
require(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "localhost"
, port = 4444
, browserName = "chrome", extraCapabilities = cprof
)
remDr$open()
and I receive this error:
Error in checkError(res) :
Couldnt connect to host on http://localhost:4444/wd/hub.
Please ensure a Selenium server is running.
What can I do to run Chrome instead of the default browser, Firefox?
You need to use the function rsDriver. The package recommends running Selenium via Docker (which I would also recommend), but if you are not familiar with Docker you can go this way.
rsDriver will manage the binaries needed for running a Selenium server. It provides a wrapper around the wdman::selenium function.
Here is what you have to do to start a Chrome Browser:
driver <- rsDriver()
remDr <- driver[["client"]]
And then you can work with it:
remDr$navigate("http://www.google.de")
remDr$navigate("http://www.spiegel.de")
And stop it:
remDr$close()
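Note that remDr$close() only closes the browser window; the Selenium server that rsDriver started keeps running until you call driver[["server"]]$stop(). One way to make sure both get cleaned up is a small wrapper like the sketch below. with_browser is a made-up name, not part of RSelenium, and the start_driver argument exists only so a different driver (or a stand-in) can be plugged in.

```r
# Hypothetical wrapper: start a driver, run `action` with the client,
# then always close the browser and stop the server, even on error.
with_browser <- function(action,
                         start_driver = function() RSelenium::rsDriver(browser = "chrome")) {
  driver <- start_driver()
  on.exit({
    try(driver[["client"]]$close(), silent = TRUE)  # close the browser
    driver[["server"]]$stop()                       # stop the Selenium server
  }, add = TRUE)
  action(driver[["client"]])
}

if (interactive()) {
  with_browser(function(remDr) {
    remDr$navigate("http://www.google.de")
    remDr$getTitle()
  })
}
```

Stopping the server also frees the port, so the next rsDriver() call does not complain that it is already in use.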

How to set up rselenium for R?

"Everything was better back then"...
Since Firefox 49 (?), the RSelenium package can no longer be used in a straightforward way. I have searched the whole internet for a SIMPLE how-to manual for setting up RSelenium but did not find anything relevant and up to date.
Can someone provide me and all the others out there who have no clue a simple How To manual? Like:
download XY
open AB
so I can run code like the following
require(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L,
browserName = "firefox")
remDr$open()
Download latest version of RSelenium >= 1.7.1. Run the following:
library(RSelenium)
rD <- rsDriver() # runs a chrome browser, wait for necessary files to download
remDr <- rD$client
# no need for remDr$open() browser should already be open
If you want a firefox browser use rsDriver(browser = "firefox").
This is detailed in http://rpubs.com/johndharrison/RSelenium-Basics appendix. The recommended way to run RSelenium is via Docker containers however. Instructions for use of Docker with RSelenium can be found at http://rpubs.com/johndharrison/RSelenium-Docker
ISSUES:
If you have issues which may occur due to admin rights or other variables such as anti-virus software you can run a Selenium server manually. The easiest way to do this is via the wdman package:
selCommand<-
wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"),
retcommand = TRUE)
> cat(selCommand)
C:\PROGRA~3\Oracle\Java\javapath\java.exe -Dwebdriver.chrome.verboseLogging=true -Dwebdriver.chrome.driver="C:\Users\john\AppData\Local\binman\binman_chromedriver\win32\2.27/chromedriver.exe" -Dwebdriver.gecko.driver="C:\Users\john\AppData\Local\binman\binman_geckodriver\win64\0.14.0/geckodriver.exe" -Dphantomjs.binary.path="C:\Users\john\AppData\Local\binman\binman_phantomjs\windows\2.1.1/phantomjs-2.1.1-windows/bin/phantomjs.exe" -jar "C:\Users\john\AppData\Local\binman\binman_seleniumserver\generic\3.0.1/selenium-server-standalone-3.0.1.jar" -port 4567
Using one of the wdman functions with the retcommand option enabled will return the command-line call that would have been run.
Now you can run the output of cat(selCommand) in a terminal
C:\Users\john>C:\PROGRA~3\Oracle\Java\javapath\java.exe -Dwebdriver.chrome.verboseLogging=true -Dwebdriver.chrome.driver="C:\Users\john\AppData\Local\binman\binman_chromedriver\win32\2.27/chromedriver.exe" -Dwebdriver.gecko.driver="C:\Users\john\AppData\Local\binman\binman_geckodriver\win64\0.14.0/geckodriver.exe" -Dphantomjs.binary.path="C:\Users\john\AppData\Local\binman\binman_phantomjs\windows\2.1.1/phantomjs-2.1.1-windows/bin/phantomjs.exe" -jar "C:\Users\john\AppData\Local\binman\binman_seleniumserver\generic\3.0.1/selenium-server-standalone-3.0.1.jar" -port 4567
12:15:29.206 INFO - Selenium build info: version: '3.0.1', revision: '1969d75'
12:15:29.206 INFO - Launching a standalone Selenium Server
2017-02-08 12:15:29.223:INFO::main: Logging initialized @146ms
12:15:29.265 INFO - Driver class not found: com.opera.core.systems.OperaDriver
12:15:29.265 INFO - Driver provider com.opera.core.systems.OperaDriver registration is skipped:
Unable to create new instances on this machine.
12:15:29.265 INFO - Driver class not found: com.opera.core.systems.OperaDriver
12:15:29.266 INFO - Driver provider com.opera.core.systems.OperaDriver is not registered
12:15:29.271 INFO - Driver provider org.openqa.selenium.safari.SafariDriver registration is skipped:
registration capabilities Capabilities [{browserName=safari, version=, platform=MAC}] does not match the current platform WIN10
2017-02-08 12:15:29.302:INFO:osjs.Server:main: jetty-9.2.15.v20160210
2017-02-08 12:15:29.317:INFO:osjsh.ContextHandler:main: Started o.s.j.s.ServletContextHandler@c4c815{/,null,AVAILABLE}
2017-02-08 12:15:29.332:INFO:osjs.ServerConnector:main: Started ServerConnector@4af044{HTTP/1.1}{0.0.0.0:4567}
2017-02-08 12:15:29.333:INFO:osjs.Server:main: Started @257ms
12:15:29.334 INFO - Selenium Server is up and running
Now try and run a browser
remDr <- remoteDriver(port = 4567L, browserName = "chrome")
remDr$open()
If you are unable to manually run a Selenium Server then you will need to address your issues (including relevant log files) to the Selenium project or the appropriate driver project (chromedriver/geckodriver/ghostdriver etc.)
1. Download Docker at https://www.docker.com/products/docker-desktop
2. Run docker pull selenium/standalone-chrome-debug in a terminal (or cmd on Windows).
3. In Docker Desktop's Dashboard, go to the "Images" tab on the left, find the image you just pulled, and click Run.
4. A popup will appear. There, click on "Optional Settings".
5. Under Ports, type 4445. Click the "plus" sign and type 5901 in the new Ports input that appears. Then click Run.
6. Now, if you click on the Containers / Apps tab on the left, the running container should be listed.
7. In R's console, run:
install.packages("RSelenium")
library(RSelenium)
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4445L,
  browserName = "chrome"
)
remDr$open()
Every time you want RSelenium to work you will need to run the Docker container as you did in steps 3 and 5 above.
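If clicking through Docker Desktop every time gets tedious, the same container can probably be started from R. The sketch below only builds the docker run arguments; docker_run_args is a made-up helper, and the container-side ports 4444 (Selenium) and 5900 (VNC) are assumptions based on the usual layout of the selenium debug images, so check them against the image you pulled.

```r
# Hypothetical helper: build the arguments for `docker run` that map
# host ports 4445/5901 to the container's Selenium and VNC ports.
# Container ports 4444 and 5900 are assumed, not taken from the steps above.
docker_run_args <- function(image = "selenium/standalone-chrome-debug",
                            selenium_port = 4445L,
                            vnc_port = 5901L) {
  c("run", "-d",
    "-p", sprintf("%d:4444", selenium_port),
    "-p", sprintf("%d:5900", vnc_port),
    image)
}

args <- docker_run_args()
message("Command: ", paste(c("docker", args), collapse = " "))

# Uncomment to actually start the container (requires Docker to be running):
# system2("docker", args)
```

After the container is up, the remoteDriver(...) call with port = 4445L works the same way as when the container is started from the Dashboard.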
The steps also allow you to use VNC to watch what happens and debug. If you need to learn a bit about it go to https://www.realvnc.com/pt/connect/download/viewer/ More details are out of the scope of this topic.
Well, I think this can take you to a point where you can now follow these instructions of RSelenium's basic usage vignette: https://cran.r-project.org/web/packages/RSelenium/vignettes/basics.html
You should also read about security related to exposed ports and how to handle it.
These videos from R Consortium may help you out from here on:
https://www.youtube.com/watch?v=OxbvFiYxEzI and https://www.youtube.com/watch?v=JcIeWiljQG4
I hope this helps you as it would have helped me some time ago.

Having trouble connecting RSelenium to Server

I've been learning R programming for the last few months and really enjoying the language. I wanted to start using it to automate a few things at work. However for the life of me no matter how much I Google or experiment I can't seem to start the browser.
I followed the steps from this article
https://www.r-bloggers.com/rselenium-a-wonderful-tool-for-web-scraping/
and got the server started from the command line. This is the code I ran in the console and the error message I'm getting.
> library(RSelenium)
> checkForServer()
Warning message:
checkForServer is deprecated.
Users in future can find the function in
file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see
vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see
vignette("RSelenium-docker", package = "RSelenium")
I'm running on Windows 10 64-bit and have installed the latest Firefox.
Any help or pointers on this would be greatly appreciated.
Thanks,
Shan
Okay, I just went through this. So you can skip the whole Selenium Server entirely by just using phantomjs, which RSelenium can call directly.
Steps:
Download phantomjs for your platform here
Put this binary file in the system PATH or anywhere else you have access to from R
Now try this:
library(RSelenium)
pJS <- phantom(pjs_cmd = "<YOUR BINARY LOCATION>") # no arg if it's in PATH
Sys.sleep(5)
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open(silent = TRUE)
url <- "http://www.google.com"
remDr$navigate(url)
remDr$screenshot(display = TRUE)
NOTE: When I run this I get an error after the first step, but it still works and pulls up the page. Not sure why that happens.

RSelenium, Can't start server

I'm trying to use RSelenium for web-scraping purposes behind a login and I can't get the server to run.
Current result:
library(RSelenium)
startServer()
remDr <- remoteDriver(port = 4444,
browserName = "firefox")
remDr$open()
# [1] "Connecting to remote server"
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: org.openqa.selenium.firefox.NotConnectedException
I've tried running the server myself by downloading and trying to open it (nothing happens).
This was a tough one and held me up for a couple of days while I searched for a fix. In the end I uninstalled Firefox and installed version 37.0 while also disabling the update service. That fixed it for me and RSelenium works fine again.
Run the following code first then it should work:
RSelenium::checkForServer()
This line of code downloads the Selenium server, which you need for running RSelenium commands.
Try below.
rD <- rsDriver(port=4444L,browser="firefox")
mybrowser <- remoteDriver(browserName = "firefox")
mybrowser$open()
RSelenium has trouble establishing a server on the given port at the beginning, so we start one with rsDriver first. After that, we tell it which driver to use.

Resources