I am trying to use RSelenium to navigate this page: https://championsleague.len.eu/calendar/
For some reason, I can't find any element on the page. Firstly, SelectorGadget does not work on it.
In addition, when I use the developer tools to grab the class or xpath of an object that I want to click (for example, the DAY 10 button of the calendar), the findElements function always returns an empty list.
remDr$navigate("https://championsleague.len.eu/calendar")
#using CSS selector
remDr$findElements("css selector", '#tblBottoniDay > div')
#using xpath
remDr$findElements("xpath", '//*[@id="tblBottoniDay"]/div')
Does anyone have an idea of what I can do to solve this problem?
Thank you very much.
You are missing a delay here.
Before accessing elements on the page, you need to wait for them to be completely loaded.
The simplest way is to add a delay there, like this:
remDr$navigate("https://championsleague.len.eu/calendar")
Sys.sleep(5)
remDr$findElements("css selector", '#tblBottoniDay > div')
The preferred way is to use Expected Conditions to wait for the elements to become visible.
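As far as I know, RSelenium does not ship an Expected Conditions class like other Selenium bindings do, so an explicit wait usually has to be written by hand. A minimal sketch follows; `wait_until` is a hypothetical helper name, and the timeout and interval defaults are arbitrary assumptions:

```r
# Poll `condition` until it returns TRUE or `timeout` seconds elapse.
# `wait_until` is a hypothetical helper, not part of RSelenium.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat errors (e.g. element not found yet) as "not ready"
    ok <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Intended use with an open RSelenium session `remDr` (not run here):
# found <- wait_until(function() {
#   length(remDr$findElements("css selector", "#tblBottoniDay > div")) > 0
# })
```

Unlike a fixed Sys.sleep(5), this returns as soon as the condition holds and reports FALSE when it never does.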
The item (DAY) you want to click is inside an iframe. First switch to the iframe, then use the full xpath to click the item.
#launch browser
library(RSelenium)
driver <- rsDriver(browser = "chrome")
remDr<-driver[["client"]]
url = "https://championsleague.len.eu/calendar"
#navigate
remDr$navigate(url)
#accept cookie
remDr$findElement(using = "xpath", '//*[@id="cmplz-cookiebanner-container"]/div/div[6]/button[1]')$clickElement()
#switch to iframe
webElem <- remDr$findElements(using = "xpath", value = '//*[@id="advanced_iframe"]')
remDr$switchToFrame(webElem[[1]])
#now we shall click on DAY12
remDr$findElement(using = "xpath",'/html/body/div[1]/div[1]/div/div/div/div[2]/div/div[2]/div[12]/div/span')$clickElement()
#note that last number in xpath represents the day
The element with the text DAY 10 is inside an <iframe>, so you have to call switchToFrame first. You can then use the following locator strategy:
Using xpath:
webElem <- remDr$findElement(using = "id", value = "advanced_iframe")
remDr$switchToFrame(webElem)
remDr$findElement("xpath", "//span[text()='DAY 10']")
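After working inside the frame, the driver stays there until it is switched back, so later lookups on the outer page fail in the same way. A small sketch of one way to keep that tidy; `in_frame` is a hypothetical helper, and the commented usage assumes the `advanced_iframe` id from above:

```r
# `in_frame` is a hypothetical helper: switch into a frame, run `action`,
# then always switch back to the top-level document.
in_frame <- function(remDr, frame, action) {
  remDr$switchToFrame(frame)
  # switchToFrame(NULL) returns to the page's default content in RSelenium
  on.exit(remDr$switchToFrame(NULL), add = TRUE)
  action(remDr)
}

# Intended use with an open session (not run here):
# frame <- remDr$findElement(using = "id", value = "advanced_iframe")
# in_frame(remDr, frame, function(d) {
#   d$findElement("xpath", "//span[text()='DAY 10']")$clickElement()
# })
```

Because the switch back is registered with on.exit, it also happens when the action throws an error.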
I am trying to scrape a web page with a JavaScript drop down menu in R. I can follow the directions listed here, but nothing happens and no errors are shown. Instead, it gives an empty list:
dropdown <- remDr$findElement(using = "id", "s2id_autogen4_search")
remDr$executeScript("arguments[0].setAttribute('class','select2-input select2-focused');", list(dropdown))
> list()
Also, nothing happens (and no console output) with dropdown$clickElement().
This is somewhat related to this post, but I need to click first to activate the drop down.
There was a mask over it, so I needed to find the mask, click on that, then supply arguments to the dropdown itself:
dropdown <- remDr$findElement(using = "id", "s2id_autogen4_search")
mask <- remDr$findElement(using = "xpath", "//*[@id='select2-drop-mask']")
mask$clickElement()
dropdown$sendKeysToElement(list("l"))
I'm using R (and RSelenium) to scrape data from ESPN. It's not the first time I've used it, but in this case I'm getting an error that I can't sort out.
Consider this page: http://en.espn.co.uk/premiership-2011-12/rugby/match/142562.html
Let's try to scrape the timeline. If I inspect the page I get the css selector
#liveLeft
As usual, I go with
checkForServer()
remDr <- remoteDriver()
remDr$open()
matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"
url <- paste0("http://en.espn.co.uk/",leagueString,"-",seasonString,"/rugby/match/",matchId,".html")
remDr$navigate(url)
and the page correctly loads. So far so good. Now when I try to get the nodes with
div<- remDr$findElement(using = 'css selector','#liveLeft')
I get back
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
I'm puzzled. I also tried with XPath and it doesn't work. I also tried to get different elements of the page, with no luck. The only selector that gives something back is
#scrumContent
From the comments.
The element resides in an iframe, so it isn't available to select from the top-level document. You can see this by running document.getElementById('liveLeft') in the Chrome console: on the full page it returns null, i.e. the element doesn't exist there, even though it is clearly visible. To get around this, simply load the iframe's page instead.
If you inspect the page you will see that the src of the iframe is /premiership-2011-12/rugby/current/match/142562.html?view=scorecard in the example provided. Navigating to this page instead of the 'full' page makes the element 'visible' to RSelenium and therefore selectable.
checkForServer()
remDr <- remoteDriver()
remDr$open()
matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"
url <- paste0("http://en.espn.co.uk/",leagueString,"-",seasonString,"/rugby/current/match/",matchId,".html?view=scorecard")
# Amend url to return iframe
remDr$navigate(url)
div<- remDr$findElement(using = 'css selector','#liveLeft')
UPDATE
If it is more convenient to load the iframe contents into a variable and then traverse that, the following JavaScript example shows how.
document.getElementById('liveLeft') // returns null because the iframe has a separate DOM
var doc = document.getElementById('win_old').contentDocument // loads the iframe's DOM into the variable doc
doc.getElementById('liveLeft') // now returns the desired element
Generally with Selenium when you have a webpage with frames/iframes you need to use the switchToFrame method of the remoteDriver class:
library(RSelenium)
library(XML) # provides htmlParse, xmlGetAttr and readHTMLTable used below
selServ <- startServer()
remDr <- remoteDriver()
remDr$open()
matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"
url <- paste0("http://en.espn.co.uk/",leagueString,"-",seasonString,"/rugby/match/",matchId,".html")
remDr$navigate(url)
# check the iframes
iframes <- htmlParse(remDr$getPageSource()[[1]])["//iframe", fun = function(x){xmlGetAttr(x, "id")}]
# iframes[[3]] == "win_old" contains the data; switch to this frame
remDr$switchToFrame(iframes[[3]])
# check you can access the element
div<- remDr$findElement(using = 'css selector','#liveLeft')
div$highlightElement()
# get data
ifSource <- htmlParse(remDr$getPageSource()[[1]])
out <- readHTMLTable(ifSource["//div[@id = 'liveLeft']"][[1]], header = TRUE)
I am trying to scrape a textbox value from the URL in the code below. I picked the CSS selector using SelectorGadget, but it does not capture the content of the text box. I tested several other CSS selectors too, but the textbox value is still not captured.
The text box is: construction year
Please help. Below is the code for reference.
url = "https://www.ncspo.com/FIS/dbBldgAsset_public.aspx?BldgAssetID=8848"
values = list()
remDr$navigate(url)
page_source<-remDr$getPageSource()
a = read_html(page_source[[1]])
html_main_node = html_nodes(a,"#ctl00_mainContentPlaceholder_txtConstructionYear_iu")
values = html_text(html_main_node)
values
Thanks in advance
Why RSelenium? It scrapes fine with rvest (though it is a horrible SharePoint site, which may cause problems down the line with maintaining the proper view state cookies).
library(rvest)
pg <- html_session("https://www.ncspo.com/FIS/dbBldgAsset_public.aspx?BldgAssetID=8848")
html_attr(html_nodes(pg, "input#ctl00_mainContentPlaceholder_txtConstructionYear_iu"), "value")
## [1] 1987
You should be grabbing the value attribute rather than the node text. This should work in your Selenium code, too.
The above answer also works, but if you want to use only RSelenium, here is the code:
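The difference between node text and the value attribute can be seen offline on a tiny inline document. This is only an illustrative sketch: the HTML, the `year` id and the value are made up, not taken from the real page.

```r
library(rvest)

# Inline HTML standing in for the real page; the id and value are made up.
page <- read_html('<form><input id="year" type="text" value="1987"></form>')
node <- html_node(page, "#year")

node_text  <- html_text(node)           # empty: an <input> has no text content
node_value <- html_attr(node, "value")  # the displayed value lives in the attribute
```

Here `node_text` is an empty string while `node_value` holds the year, which is why html_text comes back blank for text boxes.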
library(RSelenium)
checkForServer()
startServer()
Sys.sleep(5)
re<-remoteDriver()
re$open()
re$navigate("https://www.ncspo.com/FIS/dbBldgAsset_public.aspx?BldgAssetID=8848")
re$findElement(using = "css selector", "#ctl00_mainContentPlaceholder_txtConstructionYear_iu")$clickElement()
text<-unlist(re$findElement(using = "css selector", "#ctl00_mainContentPlaceholder_txtConstructionYear_iu")$getElementAttribute("value"))
This works
I'm trying to download a spreadsheet from this website using RSelenium. The first code I made was:
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://observatorios.dieese.org.br/ws/tabela/porto-alegre/bairros/numero-de-estabelecimentos-formais-por-grande-setor-de-atividade-economica")
remDr$executeScript("return baixarArquivo(1)")
And it works! But I want to download the entire data (i.e. all years), so I need to check the years checkboxes (Filtros -> Anos). I can do this in 2 ways:
Selecting the checkbox 'Ano(s)', and automatically select all years
Selecting all years manually
I've tried both ways, but neither worked. The 'best' result I got was:
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://observatorios.dieese.org.br/ws/tabela/porto-alegre/bairros/numero-de-estabelecimentos-formais-por-grande-setor-de-atividade-economica")
webElem <- remDr$findElement(using = 'id', value = 'anos')
remDr$executeScript("visualizar('filtros', true)")
remDr$executeScript("visualizarAnos()")
chkbox <- remDr$findElement(using = 'xpath', "//input[@name='inputAno'][@type='checkbox']")
chkbox$clickElement()
remDr$executeScript("return submeter()")
remDr$executeScript("return baixarArquivo(1)")
But this unchecks the first year (2012). (It's my best result because it's the only one that does anything :( )
So, the question is: how can I solve this problem?
In your best attempt, you are trying to get all the checkboxes within anos but are calling findElement. That is why only 2012 is being clicked: findElement returns the first element that satisfies your xpath, //input[@name='inputAno'][@type='checkbox'].
You could fix your solution by using findElements like so:
sapply(
  remDr$findElements(using = 'xpath', "//input[@name='inputAno'][@type='checkbox']"),
  function(element){ element$clickElement() }
)
Alternatively, you could search for the select all checkbox with a css selector and click just that:
selectAll <- remDr$findElement(using = 'css selector', '#anos > #alternar')
selectAll$clickElement()
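Clicking every checkbox toggles each one, so a box that is already ticked (as 2012 appears to be in the question) would end up unticked. A hedged sketch of one way around that, checking the state first; `click_unselected` is a hypothetical helper name:

```r
# `click_unselected` is a hypothetical helper: tick only the boxes that are
# not already selected, so pre-checked boxes are left alone.
click_unselected <- function(elements) {
  for (el in elements) {
    # isElementSelected() returns a one-element list in RSelenium
    if (!isTRUE(el$isElementSelected()[[1]])) {
      el$clickElement()
    }
  }
  invisible(NULL)
}

# Intended use with an open session (not run here):
# boxes <- remDr$findElements(using = "xpath",
#                             "//input[@name='inputAno'][@type='checkbox']")
# click_unselected(boxes)
```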
As with any problem, before posting it on Stack Overflow I think I have tried everything. This is a learning experience for me on how to work with JavaScript and XML, so I'm guessing my problem lies there.
My question is: how do I get the results of clicking on the parcel number links, which are JavaScript links? I've tried getting the xpath of the link and using the $click method, which follows my intuition, but this wasn't right or is at least not working for me.
Firefox 26.0
R 3.0.2
require(relenium)
library(XML)
library(stringr)
initializing_parcel_number <- "00000000000"
firefox <- firefoxClass$new()
firefox$get("http://www.muni.org/pw/public.html")
inputElement <- firefox$findElementByXPath("/html/body/form[2]/table/tbody/tr[2]/td/table[1]/tbody/tr[3]/td[4]/input[1]")
inputElement$sendKeys(initializing_parcel_number)
inputElement$sendKeys(key = "ENTER")
##xpath to the first link. Or is it?
first_link <- "/html/body/table/tbody/tr[2]/td/table[5]/tbody/tr[2]/td[1]/a"
##How I'm trying to click the thing.
linkElement <- firefox$findElementByXPath("/html/body/table/tbody/tr[2]/td/table[5]/tbody/tr[2]/td[1]/a")
linkElement$click()
You can do this using RSelenium; see http://johndharrison.github.io/RSelenium/. DISCLAIMER: I am the author of the RSelenium package. Basic vignettes on its operation are available: RSelenium basics and RSelenium: Testing Shiny apps.
If you are unsure which element is selected, you can use the highlightElement utility method of the webElement class; see the commented-out code.
The element click event won't work in this case. You need to simulate a click using JavaScript:
require(RSelenium)
# RSelenium::startServer # if needed
initializing_parcel_number <- "00000000000"
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://www.muni.org/pw/public.html")
webElem <- remDr$findElement(using = "name", "PAR1")
# webElem$highlightElement() # to visually check which element is selected
webElem$sendKeysToElement(list(initializing_parcel_number, key = "enter"))
# get first link containing javascript:getParcel
webElem <- remDr$findElement(using = "css selector", '[href*="javascript:getParcel"]')
# webElem$highlightElement() # to visually check which element is selected
# send a webElement as an argument.
remDr$executeScript("arguments[0].click();", list(webElem))