RSelenium: Find link with XPath

I want to find all links to PDF files in a page with RSelenium and Xpath.
Please consider
require(RSelenium)
RSelenium::checkForServer()
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate("https://cran.r-project.org/manuals.html")
In the page there are multiple links to PDF files, each displayed with the link text PDF.
But my first try
remDr$findElement(using = "xpath", "//a[contains(@href,'.pdf')/@href")
produces the following error
Error: Summary: InvalidSelector
Detail: Argument was an invalid selector (e.g. XPath/CSS).
class: org.openqa.selenium.InvalidSelectorException
Am I getting the syntax wrong?

There is a syntax error inside your expression, a missing closing ]:
//a[contains(@href,'.pdf')]/@href
                          ^ HERE
But even if you fix it, you'll get an error - a different one this time. This is because XPath expressions in Selenium have to point to web elements, not element attributes. In other words, use //a[contains(@href,'.pdf')] to find the element and then the getElementAttribute() method to read the href attribute value.
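Putting that together, a minimal sketch (assuming the remDr session from the question is still open) that collects the href of every PDF link on the page:
# find all matching anchors, then read the href attribute from each web element
pdf_links <- remDr$findElements(using = "xpath", "//a[contains(@href,'.pdf')]")
hrefs <- unlist(lapply(pdf_links, function(el) el$getElementAttribute("href")))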
Note that you may also find the link by link text:
remDr$findElement(using = "link text", "PDF")

Related

Rselenium: ‘sendKeystoElement’ is not a valid field or method name for reference class “webElement”

I am trying to insert a keyword into a search box on a website by using RSelenium:
library(RSelenium)
#Opens the browser and navigates to SHAB
driver <- rsDriver(browser="firefox", port=4545L, verbose=F)
remoteDriver <- driver[["client"]]
remoteDriver$navigate("https://shab.ch/#!/search/publications")
#Find Form Elements and fill insert the parameters
keyword <- "(nouveau+capital-actions)-reduction|(aktienkapital+neu)-herabgesetzt|(nuovo+capitale+azionario)-riduzione)"
element<- remoteDriver$findElement("id", "keyword")
element$sendKeystoElement(list(keyword))
However, when I try to execute the code in RStudio, I always get this error:
> element$sendKeystoElement(list(keyword))
Error in envRefInferField(x, what, getClass(class(x)), selfEnv) :
‘sendKeystoElement’ is not a valid field or method name for reference class “webElement”
Any suggestions?
You simply made a typo: it's not sendKeystoElement but sendKeysToElement.
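Applied to your code:
element <- remoteDriver$findElement("id", "keyword")
element$sendKeysToElement(list(keyword))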

Error in findElement function in RSelenium

I am trying to run this code:
library(RSelenium)
pJS <- phantom()
remDr <- remoteDriver(browserName = "phantomjs")
url <- "http://www.magicbricks.com/property-for-rent/residential-real-estate?proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Service-Apartment,Residential-House,Villa&cityName=Mumbai"
remDr$open()
remDr$navigate(url)
webElem1 <- remDr$findElement("name", ">5 BHK")
webElem2 <- remDr$findElement("css", "#refinebedrooms li:nth-child(6)")
webElem3 <- remDr$findElement("css", "#viewMoreButton a")
But I keep getting the following error:
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Further Details: run errorDetails method
What does this mean? And how can I overcome it? I am new to R and a first-time user of RSelenium, so any kind of help would be much appreciated. TIA
Firstly, if you are new, I would strongly recommend going over the RSelenium help file and then trying the package.
The element with name >5 BHK does not exist, and that is the reason you are getting the error. Note that webElem2 points at the same filter webElem1 was meant to find (had that worked).
So, to answer your question: you have to identify where the error occurs, and the error is pretty self-explanatory. NoSuchElement means one of your three web elements (1, 2, 3) is not seen on the page by the webdriver. If you want to identify elements using CSS, and assuming you are new to HTML too, I would suggest using SelectorGadget to find an element's CSS selector or XPath.
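For instance, a minimal sketch (assuming the remDr session from the question) that tries each locator in turn and reports which ones the driver cannot find:
locators <- list(c("name", ">5 BHK"),
                 c("css", "#refinebedrooms li:nth-child(6)"),
                 c("css", "#viewMoreButton a"))
for (loc in locators) {
  found <- tryCatch(remDr$findElement(loc[1], loc[2]), error = function(e) NULL)
  if (is.null(found)) message("Not found: ", loc[2])
}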

blank value captures while scraping using Rselenium

I am trying to scrape a textbox value from the URL in the code. I picked the CSS using SelectorGadget, but it is not able to capture the content of the text box. I tested several other CSS selectors too, but the textbox value is not captured.
The text box is: construction year
Please help. Below is the code for reference.
url = "https://www.ncspo.com/FIS/dbBldgAsset_public.aspx?BldgAssetID=8848"
values = list()
remDr$navigate(url)
page_source<-remDr$getPageSource()
a = read_html(page_source[[1]])
= html_nodes(a,"#ctl00_mainContentPlaceholder_txtConstructionYear_iu")
values = html_text(html_main_node)
values
Thanks in advance
Why RSelenium? It scrapes fine with rvest (though it is a horrible SharePoint site, which may cause problems down the line with maintaining the proper view state cookies).
library(rvest)
pg <- html_session("https://www.ncspo.com/FIS/dbBldgAsset_public.aspx?BldgAssetID=8848")
html_attr(html_nodes(pg, "input#ctl00_mainContentPlaceholder_txtConstructionYear_iu"), "value")
## [1] 1987
You should be grabbing the value attribute vs the node text. This should work in your Selenium code, too.
The above answer also works. But if you are trying to use only RSelenium, here is the code:
library(RSelenium)
checkForServer()
startServer()
Sys.sleep(5)
re <- remoteDriver()
re$open()
re$navigate("https://www.ncspo.com/FIS/dbBldgAsset_public.aspx?BldgAssetID=8848")
re$findElement(using = "css selector", "#ctl00_mainContentPlaceholder_txtConstructionYear_iu")$clickElement()
text <- unlist(re$findElement(using = "css selector", "#ctl00_mainContentPlaceholder_txtConstructionYear_iu")$getElementAttribute("value"))
This works.

Get Google Chrome's Inspect Element into R

This question is based on another that I saw closed, which generated curiosity as I learned something new about Google Chrome's Inspect Element while creating the HTML parsing path for XML::getNodeSet. While that question was closed (I think it may have been too broad), I'll ask a smaller, more focused question that may get at the root of the problem.
I tried to help the poster by writing code I typically use for scraping, but ran into a wall immediately, as the poster wanted elements from Google Chrome's Inspect Element. This is not the same as the HTML from htmlTreeParse, as demonstrated here:
url <- "http://collegecost.ed.gov/scorecard/UniversityProfile.aspx?org=s&id=198969"
doc <- htmlTreeParse(url, useInternalNodes = TRUE)
m <- capture.output(doc)
any(grepl("258.12", m))
## FALSE
But in Google Chrome's Inspect Element we can see that this information is provided (it was highlighted in a screenshot accompanying the original question):
How can we get the information from Google Chrome's Inspect Element into R? The poster could obviously copy and paste the code into a text editor and parse it that way, but they are looking to scrape, and that workflow does not scale. Once the poster can get this info into R, they can use typical HTML parsing techniques (XML and RCurl-fu).
You should be able to scrape the page using something like the following RSelenium code. You need to have Java installed and available on your path for the startServer() line to work (and thus for you to be able to do anything).
library("RSelenium")
checkForServer()
startServer()
remDr <- remoteDriver(remoteServerAddr = "localhost",
                      port = 4444,
                      browserName = "firefox")
url <- "http://collegecost.ed.gov/scorecard/UniversityProfile.aspx?org=s&id=198969"
remDr$open()
remDr$navigate(url)
source <- remDr$getPageSource()[[1]]
Check to make sure it worked according to your test:
> grepl("258.12", source)
[1] TRUE
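From there, the rendered source can go through the XML workflow the original question was aiming for. A minimal sketch (the XPath here is just an illustrative way to locate the test value):
library(XML)
doc <- htmlParse(source)  # parse the JavaScript-rendered page source
# getNodeSet now sees content that htmlTreeParse(url) missed
nodes <- getNodeSet(doc, "//*[contains(text(), '258.12')]")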

How do I click a javascript "link"? Is it my xpath or my relenium/selenium usage?

As with the beginning of any problem before I post it on Stack Overflow, I think I have tried everything. This is a learning experience for me in how to work with JavaScript and XML, so I'm guessing my problem is there.
My question is: how do I get the results of clicking on the parcel number links, which are JavaScript links? I've tried getting the XPath of the link and using the $click method, which followed my intuition, but this wasn't right or is at least not working for me.
Firefox 26.0
R 3.0.2
require(relenium)
library(XML)
library(stringr)
initializing_parcel_number <- "00000000000"
firefox <- firefoxClass$new()
firefox$get("http://www.muni.org/pw/public.html")
inputElement <- firefox$findElementByXPath("/html/body/form[2]/table/tbody/tr[2]/td/table[1]/tbody/tr[3]/td[4]/input[1]")
inputElement$sendKeys(initializing_parcel_number)
inputElement$sendKeys(key = "ENTER")
##xpath to the first link. Or is it?
first_link <- "/html/body/table/tbody/tr[2]/td/table[5]/tbody/tr[2]/td[1]/a"
##How I'm trying to click the thing.
linkElement <- firefox$findElementByXPath("/html/body/table/tbody/tr[2]/td/table[5]/tbody/tr[2]/td[1]/a")
linkElement$click()
You can do this using RSelenium. See http://johndharrison.github.io/RSelenium/ . DISCLAIMER: I am the author of the RSelenium package. A basic vignette on operation can be viewed at RSelenium basics and RSelenium: Testing Shiny apps.
If you are unsure of what element is selected, you can use the highlightElement utility method in the webElement class; see the commented-out code.
The element click event won't work in this case. You need to simulate a click using JavaScript:
require(RSelenium)
# RSelenium::startServer() # if needed
initializing_parcel_number <- "00000000000"
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://www.muni.org/pw/public.html")
webElem <- remDr$findElement(using = "name", "PAR1")
# webElem$highlightElement() # to visually check which element is selected
webElem$sendKeysToElement(list(initializing_parcel_number, key = "enter"))
# get first link containing javascript:getParcel
webElem <- remDr$findElement(using = "css selector", '[href*="javascript:getParcel"]')
# webElem$highlightElement() # to visually check which element is selected
# send a webElement as an argument.
remDr$executeScript("arguments[0].click();", list(webElem))
