phantomjs unable to find element on page - r

Recently, I've been having trouble driving phantomjs under RSelenium. It seems that the browser is unable to locate anything on the page using findElement(). If I pass something as simple as:
library("RSelenium")
RSelenium::checkForServer()
RSelenium::startServer()
rd <- remoteDriver(browserName = "phantomjs")
rd$open()
Sys.sleep(5)
rd$navigate("https://www.Facebook.com")
searchBar <- rd$findElement(using = "id", "email")
I get the error below:
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Any thoughts on what is causing this? It doesn't seem to matter what page I navigate to; it simply fails anytime I try to locate an element on the webpage. This issue started recently and I noticed it when my cron jobs began failing.
I'm working on Ubuntu 14.04 LTS with R 3.3.1 and phantomjs 2.1.1. I don't suspect a compatibility issue, as this worked until very recently and I haven't updated anything.

The version of PhantomJS you installed may be a limited build. See here:
Disabled Ghostdriver due to pre-built source-less Selenium blobs.
Added README.Debian explaining differences from upstream "phantomjs".
If you installed recently using apt-get, this is most likely the case. You can download PhantomJS from the PhantomJS website and add its bin directory to your PATH.
Alternatively, use npm to install a version for you:
npm install phantomjs-prebuilt
This will then put a link to the binary in node_modules/.bin/phantomjs.
For the reasons behind the limitations in apt-get you can read the README.Debian file contained here.
Limitations
Unlike original "phantomjs" binary that is statically linked with modified QT+WebKit, Debian package is built with system libqt5webkit5. Unfortunately the latter do not have webSecurity extensions therefore "--web-security=no" is expected to fail.
https://github.com/ariya/phantomjs/issues/13727#issuecomment-155609276
Ghostdriver is crippled due to removed source-less pre-built blobs:
src/ghostdriver/third_party/webdriver-atoms/*
Therefore all PDF functionality is broken.
PhantomJS cannot run in headless mode (if there is no X server available).
Unfortunately it can not be fixed in Debian. To achieve headless-ness upstream statically link with customised QT + Webkit. We don't want to ship forks of those projects. It would be great to eventually convince upstream to use standard libraries. Meanwhile one can use "xvfb-run" from "xvfb" package:
xvfb-run --server-args="-screen 0 640x480x16" phantomjs
If you don't want to add phantomjs to your PATH, you can pass the binary location as an extra capability:
library(RSelenium)
selServ <- startServer()
pBin <- list(phantomjs.binary.path = "/home/john/node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs")
rd <- remoteDriver(browserName = "phantomjs", extraCapabilities = pBin)
Sys.sleep(5)
rd$open()
rd$navigate("https://www.Facebook.com")
searchBar <- rd$findElement(using = "id", "email")
rd$close()
selServ$stop()
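Rather than hard-coding the binary location as above, a small lookup helper could prefer a known upstream or npm-installed binary over whatever phantomjs comes first on the PATH (which, per the README above, may be the limited Debian build). This is only a sketch: `find_phantomjs()` is a hypothetical helper, and its default candidate paths simply mirror the node_modules layout used in the example above; adjust them for your installation.

```r
# Hypothetical helper: locate a phantomjs binary, preferring explicit
# candidates (e.g. an npm or upstream download) over the PATH, because a
# Debian-packaged /usr/bin/phantomjs may ship without GhostDriver.
find_phantomjs <- function(candidates = c(
  file.path(getwd(), "node_modules", "phantomjs-prebuilt",
            "lib", "phantom", "bin", "phantomjs"),
  file.path(path.expand("~"), "node_modules", "phantomjs-prebuilt",
            "lib", "phantom", "bin", "phantomjs")
)) {
  found <- candidates[file.exists(candidates)]
  if (length(found) > 0) return(normalizePath(found[[1]]))
  on_path <- unname(Sys.which("phantomjs"))  # "" when not on the PATH
  if (nzchar(on_path)) return(normalizePath(on_path))
  NA_character_
}

# Usage with RSelenium, e.g.:
# pjs <- find_phantomjs()
# if (is.na(pjs)) stop("phantomjs not found")
# rd <- remoteDriver(browserName = "phantomjs",
#                    extraCapabilities = list(phantomjs.binary.path = pjs))
```

Checking the explicit candidates first is the deliberate choice here: on a machine where apt has installed phantomjs, `Sys.which()` alone would find the crippled build.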

Related

GitHub Actions 'SSL connect error' on R package check on Ubuntu but not macOS or Windows

I'm hitting an SSL error on the Ubuntu runner of GitHub Actions that I don't understand. These are the three lines of R code in the GitHub Action; they work on Windows and macOS but fail on Ubuntu:
tf <- tempfile()
this_url <- "https://webfs.oecd.org/piaac/puf-data/SAS/SAS7BDAT/prgusap1_2012.sas7bdat"
download.file( this_url , tf , mode = 'wb' )
You can see the Windows/macOS successes and Ubuntu failures at: https://github.com/asdfree/piaac/actions/runs/4130891484
From what I've found elsewhere, it looks like I need to use OpenSSL on the Linux runner of the GitHub Actions.
I found some examples of how to use OpenSSL within an r.yml file here: https://github.com/jeroen/testbug/blob/929151a0f19ed029d0ae34a7b3661d453974de53/.github/workflows/test.yaml
But my attempt to modify the r.yml file didn't seem to solve the issue: https://github.com/asdfree/piaac/commit/9eb93f8ac8c2eae61300364337b6a357ba56d9e7
I've also attempted three alternatives to download.file(); Ubuntu still fails where Windows and macOS succeed. Here the error shows up as "unsafe legacy renegotiation disabled", even when I use ssl.verifypeer=FALSE:
https://github.com/asdfree/piaac/commit/07cb4808a4d027f28d6e8cefdf5af09493dcb309
https://github.com/asdfree/piaac/actions/runs/4131647361
https://github.com/asdfree/piaac/commit/468588f6813cfe734d7b11b875070b331189cbc0
https://github.com/asdfree/piaac/actions/runs/4131695976
https://github.com/asdfree/piaac/commit/77e228ef72fc3a781dba51c23ceb9f54f0fac286
https://github.com/asdfree/piaac/actions/runs/4131784227
Any advice about how to work around this issue would be appreciated. Thanks!

R RSelenium rsDriver chrome browser error on Mac

I am using a Mac (OS 10.13.6) and am trying to learn how to use RSelenium.
I have installed RSelenium but am having trouble with the rsDriver command:
rD <- rsDriver(browser="chrome",chromever="80.0.3987.106")
I get this error:
Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to localhost port 4567: Connection refused
Check server log for further details.
Warning message:
In rsDriver(browser = "chrome", chromever = "80.0.3987.106") :
Could not determine server status.
I've been poking around for help for a couple of days now but am not clear on the appropriate solution here. I've tried the command with chromever="latest" and following the suggested workaround found here: stackoverflow.com/questions/55201226/. Furthermore, I don't know where to find the "server log" mentioned in the error.
Having never used this package before, or done this type of thing, I can't tell if I just don't have things set up on my machine correctly (non-R requirements of RSelenium that I need to install, and where), whether this is strictly a Chrome browser setting/version issue, or whether it is a general Mac compatibility issue.
Does anyone have an updated set of steps (i.e. not involving the defunct checkForServer() command), for absolute Selenium beginners, for getting RSelenium set up and rsDriver working on a Mac?
After a lot of trial and error, I managed to solve the same issue by installing Java SE Development Kit 14 on my Mac.
I hope this solves your issue.
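Since rsDriver() starts a Selenium Server, which is a Java program, a quick check that a working Java runtime is visible from R can rule this cause in or out before digging further. This is a sketch; `java_available()` is a hypothetical helper. It checks the exit status rather than just the path because on macOS /usr/bin/java can exist as a stub that exits with an error when no JDK is installed.

```r
# Hypothetical helper: TRUE if `java -version` runs successfully from R.
java_available <- function() {
  java <- unname(Sys.which("java"))
  if (!nzchar(java)) return(FALSE)           # not on the PATH at all
  out <- suppressWarnings(tryCatch(
    system2(java, "-version", stdout = TRUE, stderr = TRUE),
    error = function(e) structure(conditionMessage(e), status = 1L)
  ))
  message(paste(out, collapse = "\n"))       # version banner or error text
  status <- attr(out, "status")              # NULL when the exit code was 0
  is.null(status) || identical(as.integer(status), 0L)
}

# Usage:
# if (!java_available()) message("Install a JDK before calling rsDriver()")
```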

Having trouble connecting RSelenium to Server

I've been learning R programming for the last few months and really enjoying the language. I wanted to start using it to automate a few things at work. However, for the life of me, no matter how much I Google or experiment, I can't seem to start the browser.
I followed the steps from this article
https://www.r-bloggers.com/rselenium-a-wonderful-tool-for-web-scraping/
and got the server started from the command line. This is the code I ran in the console and the error message I'm getting.
> library(RSelenium)
> checkForServer()
Warning message:
checkForServer is deprecated.
Users in future can find the function in
file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see
vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see
vignette("RSelenium-docker", package = "RSelenium")
I'm running on Windows 10 64-bit and have installed the latest Firefox.
Any help or pointers on this would be greatly appreciated.
Thanks,
Shan
Okay, I just went through this. You can skip the Selenium Server entirely by using PhantomJS, which RSelenium can call directly.
Steps:
Download phantomjs for your platform here
Put this binary file in the system PATH, or anywhere else you have access to from R.
Now try this:
library(RSelenium)
pJS <- phantom(pjs_cmd = "<YOUR BINARY LOCATION>") # no arg if it's in PATH
Sys.sleep(5)
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open(silent = TRUE)
url <- "http://www.google.com"
remDr$navigate(url)
remDr$screenshot(display = TRUE)
NOTE: When I run this I get an error after the first step, but it still works and pulls up the page. Not sure why that happens.
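Roughly speaking, this approach amounts to running PhantomJS in WebDriver (GhostDriver) mode on a local port and pointing remoteDriver() at it. If phantom() is not available in your RSelenium version, a stand-in might look like the following. This is a sketch under assumptions: `phantom_args()` and `start_phantom()` are hypothetical helpers, and port 4444 is an assumed default.

```r
# Hypothetical helpers: build the command-line arguments and launch PhantomJS
# in WebDriver (GhostDriver) mode in the background.
phantom_args <- function(port = 4444L, extras = character()) {
  c(sprintf("--webdriver=%d", as.integer(port)), extras)
}

start_phantom <- function(bin = Sys.which("phantomjs"), port = 4444L,
                          extras = character()) {
  bin <- unname(bin)
  if (!nzchar(bin) || !file.exists(bin)) {
    stop("phantomjs binary not found; pass `bin` or add it to the PATH",
         call. = FALSE)
  }
  # wait = FALSE returns immediately and leaves PhantomJS running; no
  # process handle is kept, so stop it from the OS when you are done.
  system2(bin, phantom_args(port, extras), wait = FALSE,
          stdout = FALSE, stderr = FALSE)
  invisible(bin)
}

# Usage (needs a phantomjs build that includes GhostDriver):
# start_phantom(extras = "--ssl-protocol=tlsv1")
# Sys.sleep(5)
# remDr <- remoteDriver(browserName = "phantomjs", port = 4444L)
# remDr$open()
```

Using `system2(wait = FALSE)` keeps the sketch dependency-free; if you need to stop the server from R, a package that returns a process object (such as processx) would be the more robust choice.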

Can a browser called from RSelenium run in the background?

I am working on a Windows 7 machine. Is it possible to run remoteDriver()$open() from the RSelenium library and have the browser run in the background (i.e. not visible)?
Thanks
Yes, that is possible. The default browser for RSelenium is Firefox. However, RSelenium even supports headless browsing using PhantomJS which is described in the respective vignette in detail.
In general, to use PhantomJS under Windows 7 you just need to:
download PhantomJS and add the folder containing phantomjs.exe as an additional entry to the user or system PATH variable in the Environment Variables menu on your system (e.g. C:\Program Files\phantomjs-1.9.7-windows). Note: phantomjs.exe itself is not part of the path specification.
replace the code snippets at the beginning and at the end of your code as described next.
Default browsing:
checkForServer()
startServer()
remDrv <- remoteDriver()
remDrv$open()
...
remDrv$quit()
remDrv$closeServer()
Headless browsing:
pJS <- phantom()
remDrv <- remoteDriver(browserName = 'phantomjs')
remDrv$open()
...
remDrv$close()
pJS$stop()
Additional advice
Command line arguments and POODLE
Pay attention to the command line arguments which you can pass to phantom.
For instance, PhantomJS uses SSLv3 by default, which most servers have disabled since the POODLE vulnerability.
The workaround is to call phantom with --ssl-protocol=tlsv1:
pJS <- phantom(extras = c('--ssl-protocol=tlsv1'))
Timing issues
One thing that often happens with PhantomJS is timing issues. Code that works with browsers such as Firefox and Chrome breaks with PhantomJS because PhantomJS is too fast: it can try to locate an element before the page has finished loading.
You can solve this issue by placing Sys.sleep calls between the different remoteDriver calls.
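A fixed Sys.sleep() either wastes time or turns out to be too short. An alternative is to poll: retry the lookup until it succeeds or a timeout expires. This is a sketch; `wait_for()` is a hypothetical helper, not part of RSelenium. It retries any function that signals an R error, as findElement() does when it reports NoSuchElement.

```r
# Hypothetical helper (not part of RSelenium): call f() repeatedly until it
# returns without error or `timeout` seconds have elapsed.
wait_for <- function(f, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    res <- tryCatch(f(), error = function(e) e)
    if (!inherits(res, "error")) return(res)   # success: return the value
    if (Sys.time() >= deadline) {
      stop("Timed out after ", timeout, " s: ", conditionMessage(res),
           call. = FALSE)
    }
    Sys.sleep(interval)                        # back off before retrying
  }
}

# Usage with RSelenium, e.g.:
# searchBar <- wait_for(function() remDrv$findElement(using = "id", "email"))
```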

R Gist script gives error in RGui console but works fine in RStudio console - Windows 8 R3.1.2(64 bit) [duplicate]

Is there some way to source an R script from the web?
e.g. source('http://github.com/project/R/file.r')
Reason: I currently have a project that I'd like to make available for use but isn't ready to be packaged yet. So it would be great to give people a single file to source from the web (that will then source all the individual function files).
On closer inspection, the problem appears to be https. How would I source this file?
https://raw.github.com/hadley/stringr/master/R/c.r
You can use the source_url() function from the devtools package:
library(devtools)
source_url('https://raw.github.com/hadley/stringr/master/R/c.r')
This is a wrapper for the RCurl method by @ROLO.
Yes you can, try running this R tutorial:
source("http://www.mayin.org/ajayshah/KB/R/tutorial.R")
(Source)
HTTPS is only supported on Windows when R is started with the --internet2 command-line option (see the R FAQ):
> source("https://pastebin.com/raw.php?i=zdBYP5Ft")
> test()
[1] "passed"
Without this option, or on Linux, you will get the error "unsupported URL scheme". In that case, resort to the solution suggested by @ulidtko, or:
Here is a way to do it using RCurl, which also supports https:
library(RCurl)
eval(expr = parse(text = getURL("http://www.mayin.org/ajayshah/KB/R/tutorial.R",
                                ssl.verifypeer = FALSE)))
(You can remove the ssl.verifypeer if the ssl certificate is valid)
Yes, it is possible and worked for me right away.
R> source("http://pastebin.com/raw.php?i=zdBYP5Ft")
R> test()
[1] "passed"
Regarding the HTTPS part, it isn't supported by internal R code. However, R can use external utilities like wget or curl to fetch https:// URLs. One will need to write additional code to be able to source the files.
Sample code might be like this:
wget.and.source <- function(url) {
  fname <- tempfile()
  on.exit(unlink(fname))  # clean up even if source() fails
  download.file(url, fname, method = "wget")
  source(fname)
}
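In more recent R versions, download.file() can fetch https:// URLs itself through its "libcurl" method, so the same download-then-source pattern can work without an external wget. This is a sketch; `source_url_tmp()` is a hypothetical name, and the `method` argument is passed straight through to download.file().

```r
# Hypothetical helper: download a script to a temporary file and source it.
# method = "auto" lets R pick its default downloader (libcurl for https://
# URLs in recent R versions); pass method = "wget" to mimic wget.and.source().
source_url_tmp <- function(url, envir = globalenv(), method = "auto") {
  fname <- tempfile(fileext = ".R")
  on.exit(unlink(fname), add = TRUE)  # removed even if source() errors
  utils::download.file(url, fname, method = method, mode = "wb", quiet = TRUE)
  source(fname, local = envir)        # evaluate the script in `envir`
  invisible(url)
}

# Usage:
# source_url_tmp("https://raw.github.com/hadley/stringr/master/R/c.r")
```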
There is a Windows-only solution too: start R with the --internet2 command-line option. This will switch all the internet code in R to using IE, and consequently HTTPS will work.
Windows:
If Internet Explorer is configured to access the web using your organization's proxy, you can direct R to use these IE settings instead of the default R settings. This change can be made once by the following steps:
Save your work and close all R sessions you may have open.
Edit the following file. (Note: Your exact path will differ based on your R installation)
C:\Program Files\R\R-2.15.2\etc\Rprofile.site
Open this "Rprofile.site" file in Notepad and add the following line on a new line at the end of the file:
utils::setInternet2(TRUE)
You may now open a new R session and retry your "source" command.
Linux and other Unix-alikes:
Use G. Grothendieck's suggestion. At the command prompt within R type:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url/here.r")))
You may get an error saying:
cannot verify certificate - - - - Self-signed certificate encountered.
At this point it is up to you to decide whether you trust the person issuing the self-signed certificate and proceed or to stop.
If you decide to proceed, you can connect insecurely as follows:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url.r", "--no-check-certificate")))
For more details, see the following:
See section 2.19
CRAN R Documentation 2.19
wget documentation section 2.8 for "no-check-certificate"
Similar questions here:
Stackoverflow setInternet2 discussion
Stackoverflow Proxy configuration discussion
The methods here were giving me the following error from github:
OpenSSL: error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)
I used the following function to resolve it:
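github.download = function(url) {
  fname <- tempfile()
  # -3 asks curl for SSLv3; shQuote() protects URLs containing shell metacharacters
  system(sprintf("curl -3 %s > %s", shQuote(url), shQuote(fname)))
  return(fname)
}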
source(github.download('http://github.com/project/R/file.r'))
Hope that helps!
This is working for me on windows:
library(RCurl)
# load functions and scripts from github ----------------------------
fn1 <- getURL("https://raw.githubusercontent.com/SanjitNarwekar/Advanced-R-Programming/master/fn_factorial_loop.R", ssl.verifypeer = FALSE)
eval(parse(text = fn1))
