NoSuchElementException scraping ESPN with RSelenium - css

I'm using R (and RSelenium) to scrape data from ESPN. It's not the first time I've used it, but in this case I'm getting an error and I can't sort it out.
Consider this page: http://en.espn.co.uk/premiership-2011-12/rugby/match/142562.html
Let's try to scrape the timeline. If I inspect the page, I get the CSS selector
#liveLeft
As usual, I go with
checkForServer()
remDr <- remoteDriver()
remDr$open()
matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"
url <- paste0("http://en.espn.co.uk/",leagueString,"-",seasonString,"/rugby/match/",matchId,".html")
remDr$navigate(url)
and the page correctly loads. So far so good. Now when I try to get the nodes with
div <- remDr$findElement(using = 'css selector', '#liveLeft')
I get back
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
I'm puzzled. I also tried with XPath, and it doesn't work. I also tried to get different elements of the page, with no luck. The only selector that gives something back is
#scrumContent

From the comments.
The element resides in an iframe, and as such it isn't available to select. You can see this by using JS in the Chrome console: document.getElementById('liveLeft') on the full page returns null, i.e. the element doesn't exist there, even though it is clearly visible. To get around this, simply load the iframe instead.
If you inspect the page you will see the src of the iframe is /premiership-2011-12/rugby/current/match/142562.html?view=scorecard, from the example provided. Navigating to this page instead of the 'full' page will allow the element to be 'visible', and as such selectable, to RSelenium.
checkForServer()
remDr <- remoteDriver()
remDr$open()
matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"
url <- paste0("http://en.espn.co.uk/",leagueString,"-",seasonString,"/rugby/current/match/",matchId,".html?view=scorecard")
# Amend url to return iframe
remDr$navigate(url)
div <- remDr$findElement(using = 'css selector', '#liveLeft')
UPDATE
If it is more applicable to load the iframe's contents into a variable and then traverse through that, the following example shows this:
document.getElementById('liveLeft') // returns null, as the iframe has a separate DOM
var doc = document.getElementById('win_old').contentDocument // loads the iframe's DOM elements into the variable doc
doc.getElementById('liveLeft') // now returns the desired element
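If you are driving the browser with RSelenium, the same traversal can be run from R via executeScript. A minimal sketch, assuming the win_old iframe id shown above and a remDr session as in the earlier snippets:
# run the iframe traversal in the browser and return the element's HTML
html <- remDr$executeScript(
  "return document.getElementById('win_old').contentDocument.getElementById('liveLeft').outerHTML;"
)[[1]]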

Generally with Selenium, when you have a webpage with frames/iframes, you need to use the switchToFrame method of the remoteDriver class:
library(RSelenium)
library(XML) # for htmlParse and readHTMLTable below
selServ <- startServer()
remDr <- remoteDriver()
remDr$open()
matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"
url <- paste0("http://en.espn.co.uk/",leagueString,"-",seasonString,"/rugby/match/",matchId,".html")
remDr$navigate(url)
# check the iframes
iframes <- htmlParse(remDr$getPageSource()[[1]])["//iframe", fun = function(x){xmlGetAttr(x, "id")}]
# iframes[[3]] == "win_old" contains the data switch to this frame
remDr$switchToFrame(iframes[[3]])
# check you can access the element
div <- remDr$findElement(using = 'css selector', '#liveLeft')
div$highlightElement()
# get data
ifSource <- htmlParse(remDr$getPageSource()[[1]])
out <- readHTMLTable(ifSource["//div[@id = 'liveLeft']"][[1]], header = TRUE)
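A follow-up worth knowing: once you are done inside the frame, switch back before touching elements of the top-level page. Depending on your RSelenium version, passing NULL (or NA) as the frame id returns to the default content:
# return to the page's default content
remDr$switchToFrame(NULL)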

Related

Cannot find elements with onclick (R Selenium)

I am trying to use RSelenium to navigate this page: https://championsleague.len.eu/calendar/
For some reason, I can't seem to find any element on the page. Firstly, SelectorGadget does not work on it.
In addition, when I use the developer tools to grab the class or XPath of an object that I want to click (for example, let's say I want to click the DAY 10 button of the calendar), the findElements function always returns an empty list.
remDr$navigate("https://championsleague.len.eu/calendar")
#using CSS selector
remDr$findElements("css selector", '#tblBottoniDay > div')
#using xpath
remDr$findElements("xpath", '//*[#id="tblBottoniDay"]/div')
Does anyone have an idea of what I can do to solve this problem?
Thank you very much.
You are missing a delay here.
Before accessing elements on the page, you need to wait for them to be completely loaded.
The simplest way is to add a fixed delay, like this:
remDr$navigate("https://championsleague.len.eu/calendar")
Sys.sleep(5)
remDr$findElements("css selector", '#tblBottoniDay > div')
The preferred way is to use an explicit wait for element visibility, as sketched below.
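RSelenium has no built-in Expected Conditions (unlike the Python bindings), but you can approximate one with a polling loop. wait_for below is a hypothetical helper, a minimal sketch: it retries findElements until the selector matches or the timeout elapses, instead of relying on a fixed sleep.
# hypothetical helper: poll for a CSS selector instead of sleeping blindly
wait_for <- function(remDr, selector, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  while (Sys.time() < deadline) {
    elems <- remDr$findElements("css selector", selector)
    if (length(elems) > 0) return(elems)
    Sys.sleep(interval)
  }
  stop("Timed out waiting for: ", selector)
}
elems <- wait_for(remDr, '#tblBottoniDay > div')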
The item (DAY) you want to click is inside an iframe: first we switch to the iframe, then use the full XPath to click the item.
#launch browser
library(RSelenium)
driver <- rsDriver(browser = "chrome")
remDr<-driver[["client"]]
url = "https://championsleague.len.eu/calendar"
#navigate
remDr$navigate(url)
# accept the cookie banner
remDr$findElement(using = "xpath", '//*[@id="cmplz-cookiebanner-container"]/div/div[6]/button[1]')$clickElement()
# switch to the iframe
webElem <- remDr$findElements(using = "xpath", value = '//*[@id="advanced_iframe"]')
remDr$switchToFrame(webElem[[1]])
# now we shall click on DAY 12
remDr$findElement(using = "xpath", '/html/body/div[1]/div[1]/div/div/div/div[2]/div/div[2]/div[12]/div/span')$clickElement()
# note that the last div index in the xpath represents the day
The element with the text DAY 10 is within an <iframe>, so you have to switchToFrame; then you can use the following locator strategy:
Using xpath:
webElem <- remDr$findElement(using = "id", value = "advanced_iframe")
remDr$switchToFrame(webElem$elementId)
remDr$findElement("xpath", "//span[text()='DAY 10']")
References
Package ‘RSelenium’

Nodes from a website are not scraping the content

I have tried to scrape the content of a news website ('titles', 'content', etc.), but the nodes I am using do not return the content.
I have tried different nodes/tags, but none of them seem to be working. I have also used the SelectorGadget without any result. I have used the same strategy for scraping other websites and it has worked with no issues.
Here is an example of trying to get the 'content'
library(rvest)
url_test <- read_html('https://lasillavacia.com/silla-llena/red-de-la-paz/historia/las-disidencias-son-fruto-de-anos-de-division-interna-de-las-farc')
content_test <- html_text(html_nodes(url_test, ".article-body-mt-5"))
I have also tried using the xpath instead of the css class with no results.
Here is an example of trying to get the 'date'
content_test <- html_text(html_nodes(url_test, ".article-date"))
Even if I try to scrape all the <h> tags from the page, for example, I also get character(0).
What can be the problem? Thanks for any help!
Since the content is loaded into the page by JavaScript, I used RSelenium to scrape the data, and it worked:
library(RSelenium)
#Setting the remote browser
remDr <- RSelenium::remoteDriver(remoteServerAddr = "192.168.99.100",
                                 port = 4444L,
                                 browserName = "chrome")
remDr$open()
url_test <- 'https://lasillavacia.com/silla-llena/red-de-la-paz/historia/las-disidencias-son-fruto-de-anos-de-division-interna-de-las-farc'
remDr$navigate(url_test)
#Checking if the website page is loaded
remDr$screenshot(display = TRUE)
#Getting the content
content_test <- remDr$findElements(using = "css selector", value = '.article-date')
content_test <- sapply(content_test, function(x){x$getElementText()})
> content_test
[[1]]
[1] "22 de Septiembre de 2018"
Two things.
Your css selector is wrong. It should have been:
".article-body.mt-5"
The data is dynamically loaded and returned as JSON. You can find the endpoint in the browser's network tab, so there is no need for the overhead of Selenium.
library(jsonlite)
data <- jsonlite::read_json('https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/68077&_format=hal_json')
The body is HTML, so you can use an HTML parser. The following is a simple text dump; you would refine it with node selection, as in the sketch after this example.
library(rvest)
read_html(data[[1]]$body) %>% html_text()
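For instance, you might keep only the paragraph nodes; which nodes are worth keeping depends on the article's markup:
library(rvest)
# pull just the paragraph text instead of a raw text dump
read_html(data[[1]]$body) %>%
  html_nodes("p") %>%
  html_text(trim = TRUE)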

Scrolling to an element and clicking on it

I am trying to web scrape the data from the Flipkart site. The link for the webpage is as follows:
https://www.flipkart.com/mi-a1-black-64-gb/product-reviews/itmexnsrtzhbbneg?aid=overall&pid=MOBEX9WXUSZVYHET
I need to automate navigation to the NEXT page by clicking the NEXT button on the webpage. Below is the code I'm using:
nextButton <- remDr$findElement(value = '//div[@class="_2kUstJ"]')$clickElement()
Error
Selenium message:Element is not clickable at point
I even tried scrolling the webpage, as suggested by many Stack Overflow answers, using the code below:
remDr$executeScript("arguments[0].scrollIntoView(true);", nextButton)
But this code also gives an error:
Error in checkError(res) : Undefined error in httr call. httr output: No method for S4 class:webElement
Kindly suggest a solution. I'm using the Firefox browser and Selenium, automated from R.
If you do not mind using Chrome driver, the following code worked:
eCaps <- list(chromeOptions = list(
  args = c('--headless', '--disable-gpu', '--window-size=1880,1000',
           '--no-sandbox', '--disable-dev-shm-usage')
))
remDr <- rsDriver(port = 4565L, browser = "chrome", extraCapabilities = eCaps)
remCl <- remDr[["client"]]
remCl$navigate("https://www.flipkart.com/mi-a1-black-64-gb/product-reviews/itmexnsrtzhbbneg?aid=overall&pid=MOBEX9WXUSZVYHET")
remCl$findElement(using = "css selector", "._3fVaIS > span:nth-child(1)")$clickElement()
We shall first scroll to the end of the page and then click Next.
#Navigate to webpage
remDr$navigate("https://www.flipkart.com/mi-a1-black-64-gb/product-reviews/itmexnsrtzhbbneg?aid=overall&pid=MOBEX9WXUSZVYHET")
#Scroll to the end
webElem <- remDr$findElement("css", "html")
webElem$sendKeysToElement(list(key="end"))
#click on Next
remDr$findElement(using = "xpath", '//*[#id="container"]/div/div[3]/div/div/div[2]/div[13]/div/div/nav/a[11]/span')$clickElement()

Using R as my browser how can I log into http://games.espn.go.com/ffl/signin and scrape my FFL Team HTML tables?

I have been trying everything I can find online to log in, setting cookies and certificates, but I can't seem to get past the redirect to a login screen.
Here is what I am trying to do:
##################################################
library("RCurl")
library("XML")
loginURL <- "http://games.espn.go.com/ffl/signin"
dataURL <- "http://games.espn.go.com/ffl/clubhouse?leagueId=123456&teamId=8&seasonId=2014"
# ESPN Fantasy Football Login Screen
userID <- dQuote("myUsername")
pword <-dQuote("myPassword")
pushbutton <- dQuote("OK")
# concatenate the url and log in options
FFLsigninURL <- paste(loginURL ,
"&username=",userID,
"&password=",pword,
"&submit=",pushbutton)
page <- getURL(loginURL, verbose = TRUE)
This seems to lead me to a redirect for logging in. So, Problem 1: the login isn't working.
Part 2: once logged in, how can I proceed to the dataURL to scrape the tables? I tried the login parameters on the data page as well, but I still get redirected to a login screen.
I'm sure I am missing something simple; I'm just not seeing it.
It should be possible to follow the redirects using RCurl (see the sketch at the end of this answer); alternatively, you can use Selenium to drive a browser:
library(RSelenium)
loginURL <- "http://games.espn.go.com/ffl/signin"
user <- 'myUser'
pass <- 'myPass'
RSelenium::checkForServer()
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(loginURL)
webElem <- remDr$findElement('name', 'username')
webElem$sendKeysToElement(list(user))
webElem <- remDr$findElement('name', 'password')
webElem$sendKeysToElement(list(pass))
remDr$findElement('name', 'submit')$clickElement()
dataURL <- "http://games.espn.go.com/ffl/clubhouse?leagueId=123456&teamId=8&seasonId=2014"
remDr$navigate(dataURL)
# YOU can get the page source for example
pageSrc <- remDr$getPageSource()[[1]]
# now operate on pageSrc using for example library(XML) etc
# readHTMLTable(pageSrc) # for example
remDr$close()
remDr$closeServer()
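For completeness, the RCurl route mentioned at the top of this answer would look roughly like the sketch below: a cookie-enabled handle, a POST of the form fields, then a request for the data page on the same handle. The form-field names here are assumptions; inspect the actual sign-in form for the real <input> names:
library(RCurl)
# one shared handle so the session cookies persist between requests
ch <- getCurlHandle(cookiejar = "", followlocation = TRUE)
# field names are assumptions -- check the <input> names on the form
postForm(loginURL, username = user, password = pass, style = "POST", curl = ch)
page <- getURL(dataURL, curl = ch)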

How do I click a javascript "link"? Is it my xpath or my relenium/selenium usage?

As with the beginning of any problem before I post it on Stack Overflow, I think I have tried everything. This is a learning experience for me in working with JavaScript and XML, so I'm guessing my problem is there.
My question is: how do I get the results of clicking on the parcel-number links, which are javascript: links? I tried getting the XPath of the link and using the $click method, following my intuition, but this wasn't right, or at least isn't working for me.
Firefox 26.0
R 3.0.2
require(relenium)
library(XML)
library(stringr)
initializing_parcel_number <- "00000000000"
firefox <- firefoxClass$new()
firefox$get("http://www.muni.org/pw/public.html")
inputElement <- firefox$findElementByXPath("/html/body/form[2]/table/tbody/tr[2]/td/table[1]/tbody/tr[3]/td[4]/input[1]")
inputElement$sendKeys(initializing_parcel_number)
inputElement$sendKeys(key = "ENTER")
##xpath to the first link. Or is it?
first_link <- "/html/body/table/tbody/tr[2]/td/table[5]/tbody/tr[2]/td[1]/a"
##How I'm trying to click the thing.
linkElement <- firefox$findElementByXPath("/html/body/table/tbody/tr[2]/td/table[5]/tbody/tr[2]/td[1]/a")
linkElement$click()
You can do this using RSelenium; see http://johndharrison.github.io/RSelenium/. DISCLAIMER: I am the author of the RSelenium package. Basic vignettes on operation are "RSelenium basics" and "RSelenium: Testing Shiny apps".
If you are unsure which element is selected, you can use the highlightElement utility method in the webElement class; see the commented-out code.
The element click event won't work in this case. You need to simulate a click using JavaScript:
require(RSelenium)
# RSelenium::startServer() # if needed
initializing_parcel_number <- "00000000000"
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://www.muni.org/pw/public.html")
webElem <- remDr$findElement(using = "name", "PAR1")
# webElem$highlightElement() # to visually check which element is selected
webElem$sendKeysToElement(list(initializing_parcel_number, key = "enter"))
# get first link containing javascript:getParcel
webElem <- remDr$findElement(using = "css selector", '[href*="javascript:getParcel"]')
# webElem$highlightElement() # to visually check which element is selected
# send a webElement as an argument.
remDr$executeScript("arguments[0].click();", list(webElem))
