Get current url in RSelenium - r

I have code in R where I am using RSelenium to get URLs from Google Maps, but it doesn't work:
for (i in 1:dim(Coord_t)[1]) {
  remDr$navigate("https://www.google.ro/maps") # navigate to the page
  # select the search field
  option <- remDr$findElement(using = 'xpath', "//*[@id='searchboxinput']")
  option$highlightElement()
  option$clickElement()
  option$sendKeysToElement(list(Coord_t$Conc[i]))
  # press the search button
  webElem <- remDr$findElement(using = 'css', "#searchbox-searchbutton")
  webElem$highlightElement()
  webElem$clickElement()
  remDr$setTimeout(type = "page load", milliseconds = 10000)
  Coord_t$Url[i] <- remDr$getCurrentUrl()[[1]] # getCurrentUrl() returns a list
}
Coord_t$Conc[i] is simply the name of a restaurant, hotel, etc.
I get the URL on the first iteration, but after that I only get "https://www.google.ro/maps".
I also tried webElem$getCurrentUrl() and option$getCurrentUrl(), and neither works.
I just can't figure out what is wrong here.
Thank you!
EDIT: I found that the problem is with the page load. It seems that I have to wait for the page to load before reading the URL. I tried to do this with setTimeout, but it doesn't work.
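The likely cause is that setTimeout(type = "page load") only caps how long navigate() is allowed to block; it does not wait for Google Maps to rewrite the URL via JavaScript after a search. One workaround is to poll the URL until it changes, as in this sketch (untested against the live site; get_new_url is a made-up helper name):
get_new_url <- function(remDr, old_url, timeout = 10, interval = 0.5) {
  # poll until the browser URL differs from old_url, or give up after timeout seconds
  waited <- 0
  repeat {
    current <- remDr$getCurrentUrl()[[1]]
    if (current != old_url || waited >= timeout) return(current)
    Sys.sleep(interval)
    waited <- waited + interval
  }
}
# inside the loop, after clicking the search button:
# Coord_t$Url[i] <- get_new_url(remDr, "https://www.google.ro/maps")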

Related

Rselenium - click on search button gives error "element is not attached to the page document"

I want to enter a zip code and click on search on a website using RSelenium.
library(RSelenium)
library(netstat)
zip <- "601"
rs_driver_object <- rsDriver(browser = 'chrome',
                             chromever = "101.0.4951.41",
                             verbose = F,
                             port = free_port())
remDr <- rs_driver_object$client
remDr$open()
remDr$maxWindowSize()
# go to website
remDr$navigate("https://pizza.dominos.com/")
# locate the search box
address_element <- remDr$findElement(using = 'id', value = 'geo-search')
# locate the search button
button_element <- remDr$findElement(using = 'class', value = "btn-search")
# send the zip code
address_element$sendKeysToElement(list(zip))
# click on search
button_element$clickElement()
However, when I do the last step of clicking, it shows me the error:
Selenium message:stale element reference: element is not attached to the page document
(Session info: chrome=101.0.4951.67)
Error: Summary: StaleElementReference
Detail: An element command failed because the referenced element is no longer attached to the DOM.
class: org.openqa.selenium.StaleElementReferenceException
Further Details: run errorDetails method
I was able to achieve this using the following code. The error stale element reference: element is not attached to the page document usually occurs when the page's HTML has changed and the element reference you obtained earlier has become outdated.
# go to website
remDr$navigate("https://pizza.dominos.com/")
# locate the search box and enter the zip
zip <- "601"
remDr$findElement(using = 'id', value = 'geo-search')$sendKeysToElement(list(zip))
# click on search
remDr$findElement(using = 'xpath', value = "(//button[@class='btn-search'])[2]")$clickElement()
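More generally, a StaleElementReference can be handled by re-locating the element immediately before acting on it and retrying on failure. A minimal sketch (click_with_retry is a hypothetical helper, not part of RSelenium):
click_with_retry <- function(remDr, using, value, attempts = 3) {
  # re-find the element on every attempt so a stale reference is refreshed
  for (k in seq_len(attempts)) {
    ok <- tryCatch({
      remDr$findElement(using = using, value = value)$clickElement()
      TRUE
    }, error = function(e) FALSE)
    if (ok) return(invisible(TRUE))
    Sys.sleep(1)
  }
  stop("Could not click element after ", attempts, " attempts")
}
# click_with_retry(remDr, 'xpath', "(//button[@class='btn-search'])[2]")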

Using purrr:map to loop through web pages for scraping with Rselenium

I have a basic R script, cobbled together using RSelenium, which allows me to log into a website. Once authenticated, the script goes to the first page of interest and pulls three pieces of text from the page.
Luckily for me, the URL is constructed in such a way that I can substitute a number into it to reach the next page of interest, hence the use of map().
While on each page I want to scrape the same 3 elements off the page and store them in a master data frame for later analysis.
I wish to use the map family of functions so that I can become more familiar with them, but I am really struggling to get them to work. Could anyone kindly tell me where I am going wrong?
Here is the main part of my code (go to the website and log in):
library(RSelenium)
# https://stackoverflow.com/questions/55201226/session-not-created-this-version-of-chromedriver-only-supports-chrome-version-7/56173984
rd <- rsDriver(browser = "chrome",
               chromever = "88.0.4324.27",
               port = netstat::free_port())
remdr <- rd[["client"]]
# url of the site's login page
url <- "https://www.myWebsite.com/"
# Navigating to the page
remdr$navigate(url)
# Wait 5 secs for the page to load
Sys.sleep(5)
# Find the initial login button to bring up the username and password fields
loginbutton <- remdr$findElement(using = 'css selector','.plain')
# Click the initial login button to bring up the username and password fields
loginbutton$clickElement()
# Find the username box
username <- remdr$findElement(using = 'css selector','#username')
# Find the password box
password <- remdr$findElement(using = 'css selector','#password')
# Find the final login button
login <- remdr$findElement(using = 'css selector','#btnLoginSubmit1')
# Input the username
username$sendKeysToElement(list("myUsername"))
# Input the password
password$sendKeysToElement(list("myPassword"))
# Click login
login$clickElement()
And hey presto we're in!
Now my code takes me to the initial page of interest (index = 1)
Above, I mentioned that I am looking to increment through each page; I can do this by substituting an integer into the URL at the rcId parameter, see below:
#remdr$navigate("https://myWebsite.com/rc_redesign/#/layout/jcard/drugCard?accountId=XXXXXX&rcId=1&searchType=R&reimbCode=&searchTerm=&searchTexts=*") # Navigating to the page
For each rcId in 1:9999, I wish to grab the following three elements and store them in a data frame:
hcpcs_info <- remdr$findElement(using = 'class','is-jcard-heading')
hcpcs <- hcpcs_info$getElementText()[[1]]
hcpcs_description <- remdr$findElement(using = 'class','is-jcard-desc')
hcpcs_desc <- hcpcs_description$getElementText()[[1]]
tc_info <- remdr$findElement(using = 'css selector','.col-12.ng-star-inserted')
therapeutic_class <- tc_info$getElementText()[[1]]
I have tried creating a separate function and passing it to map(), but I am not advanced enough to piece this together. Below is what I have tried.
my_function <- function(index) {
  remdr$navigate(sprintf("https://rc2.reimbursementcodes.com/rc_redesign/#/layout/jcard/drugCard?accountId=113479&rcId=%d&searchType=R&reimbCode=*&searchTerm=*&searchTexts=*", index)
  Sys.sleep(5)
  hcpcs_info[index] <- remdr$findElement(using = 'class','is-jcard-heading')
  hcpcs[index] <- hcpcs_info$getElementText()[index][[1]])
}
x <- 1:10 %>%
  map(~ my_function(.x))
Any help would be greatly appreciated
Try the following:
library(RSelenium)
library(tibble) # for tibble(); not attached in the original snippet

purrr::map_df(1:10, ~ {
  remdr$navigate(sprintf("https://rc2.reimbursementcodes.com/rc_redesign/#/layout/jcard/drugCard?accountId=113479&rcId=%d&searchType=R&reimbCode=*&searchTerm=*&searchTexts=*", .x))
  Sys.sleep(5)
  hcpcs_info <- remdr$findElement(using = 'class', 'is-jcard-heading')
  hcpcs <- hcpcs_info$getElementText()[[1]]
  hcpcs_description <- remdr$findElement(using = 'class', 'is-jcard-desc')
  hcpcs_desc <- hcpcs_description$getElementText()[[1]]
  tc_info <- remdr$findElement(using = 'css selector', '.col-12.ng-star-inserted')
  therapeutic_class <- tc_info$getElementText()[[1]]
  tibble(hcpcs, hcpcs_desc, therapeutic_class)
}) -> result
result
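If some rcId pages are missing or slow, a failed findElement() will abort the whole map_df() call. A fault-tolerant variant (a sketch under the same assumptions as the answer above) wraps each iteration in tryCatch() and returns an NA row on failure:
safe_scrape <- function(rcId) {
  tryCatch({
    remdr$navigate(sprintf("https://rc2.reimbursementcodes.com/rc_redesign/#/layout/jcard/drugCard?accountId=113479&rcId=%d&searchType=R&reimbCode=*&searchTerm=*&searchTexts=*", rcId))
    Sys.sleep(5)
    tibble(
      hcpcs = remdr$findElement(using = 'class', 'is-jcard-heading')$getElementText()[[1]],
      hcpcs_desc = remdr$findElement(using = 'class', 'is-jcard-desc')$getElementText()[[1]],
      therapeutic_class = remdr$findElement(using = 'css selector', '.col-12.ng-star-inserted')$getElementText()[[1]]
    )
  }, error = function(e) {
    # keep a placeholder row so failures stay visible in the result
    tibble(hcpcs = NA_character_, hcpcs_desc = NA_character_, therapeutic_class = NA_character_)
  })
}
result <- purrr::map_df(1:10, safe_scrape)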

RSelenium: clicking on subsequent links in for loop from a Google search

I'm using RSelenium to do some simple Google searches. Setup:
library(tidyverse)
library(RSelenium) # running docker to do this
library(rvest)
library(httr)
remDr <- remoteDriver(port = 4445L, browserName = "chrome")
remDr$open()
remDr$navigate("https://books.google.com/")
books <- remDr$findElement(using = "css", "[name = 'q']")
books$sendKeysToElement(list("NHL teams", key = "enter"))
bookElem <- remDr$findElements(using = "css", "h3.LC20lb")
That's the easy part. Now, there are 10 links on that first page, and I want to click on every link, back out, and then click the next link. What's the most efficient way to do that? I've tried the following:
bookElem$clickElement()
This returns Error: attempt to apply non-function. I expected it to click on the first link, but no good. (It works if I take the s off of findElements() - in the line above, not the lapply below.)
clack <- lapply(bookElem, function(y) {
  y$clickElement()
  y$goBack()
})
Begets an error, kind of like this question:
Error: Summary: StaleElementReference
Detail: An element command failed because the referenced element is no longer attached to the DOM.
Further Details: run errorDetails method
Would it be easier to use rvest within RSelenium?
I think you could consider grabbing the links and looping through them without going back to the main page.
In order to achieve that, you will have to grab the link elements ("a tag").
bookElems <- remDr$findElements(using = "xpath",
                                "//h3[@class = 'LC20lb']//parent::a")
And then extracting the "href" attribute and navigate to that:
links <- sapply(bookElems, function(bookElem) {
  bookElem$getElementAttribute("href")
})
for (link in links) {
  remDr$navigate(link)
  # DO SOMETHING
}
Full code would read:
remDr$open()
remDr$navigate("https://books.google.com/")
books <- remDr$findElement(using = "css", "[name = 'q']")
books$sendKeysToElement(list("NHL teams", key = "enter"))
bookElems <- remDr$findElements(using = "xpath",
                                "//h3[@class = 'LC20lb']//parent::a")
links <- sapply(bookElems, function(bookElem) {
  bookElem$getElementAttribute("href")
})
for (link in links) {
  remDr$navigate(link)
  # DO SOMETHING
}
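As an illustration of what could go in the "DO SOMETHING" slot, the loop below collects each destination page's title (a sketch; adjust the sleep to the site's speed):
titles <- character(0)
for (link in links) {
  remDr$navigate(link)
  Sys.sleep(2) # give the page a moment to load
  titles <- c(titles, remDr$getTitle()[[1]])
}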

RSelenium: Scraping a dynamically loaded page that loads slowly

I'm not sure if it is because my internet is slow, but I'm trying to scrape a website that loads information as you scroll down the page. I'm executing a script that goes to the end of the page and waits for the Selenium/Chrome server to load the additional content. The server does update and load the new content, because I am able to scrape information that wasn't on the page originally, and the new content shows up in the Chrome viewer, but it only updates once. I set a Sys.sleep() call to wait for a minute each time so that the content has plenty of time to load, but it still doesn't update more than once. Am I using RSelenium incorrectly? Are there other ways of scraping a site that loads dynamically?
Anyway, any kind of advice or help you can provide would be awesome.
Below is what I think is the relevant portion of my code with regards to loading the new content at the end of the page:
for (i in 1:3) {
  webElem <- remDr$findElement('css', 'body')
  remDr$executeScript('window.scrollTo(0, document.body.scrollHeight);')
  Sys.sleep(60)
}
Below is the full code:
library(RSelenium)
library(rvest)
library(stringr)
rsDriver(port = 4444L, browser = 'chrome')
remDr <- remoteDriver(browser = 'chrome')
remDr$open()
remDr$navigate('http://www.codewars.com/kata')
# find the total number of recorded katas
tot_kata <- remDr$findElement(using = 'css', '.is-gray-text')$getElementText() %>%
  unlist() %>%
  str_extract('\\d+') %>%
  as.numeric()
# there are about 30 katas per page reload
tot_pages <- (tot_kata/30) %>%
  ceiling()
# will be 1:tot_pages once I know the below code works
for (i in 1:3) {
  webElem <- remDr$findElement('css', 'body')
  remDr$executeScript('window.scrollTo(0, document.body.scrollHeight);')
  Sys.sleep(60)
}
page_source <- remDr$getPageSource()
kata_vector <- read_html(page_source[[1]]) %>%
  html_nodes('.item-title a') %>%
  html_attr('href') %>%
  str_replace('/kata/', '')
remDr$close() # close() is a method, so it needs parentheses
The website provides an API, which should be the first port of call. Failing that, you can access individual pages using, for example:
http://www.codewars.com/kata?page=21
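A sketch of that paged approach (assuming the ?page=N URLs serve the same listing and that the selectors from the question still apply; untested):
library(RSelenium)
library(rvest)

kata_links <- character(0)
for (page in 1:3) { # 1:tot_pages in practice
  remDr$navigate(sprintf('http://www.codewars.com/kata?page=%d', page))
  Sys.sleep(5) # allow the page to render
  hrefs <- read_html(remDr$getPageSource()[[1]]) %>%
    html_nodes('.item-title a') %>%
    html_attr('href')
  kata_links <- c(kata_links, hrefs)
}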
If you want to scroll to the bottom of the page until there is no more content with RSelenium, you can use the "Loading..." element, which has class=js-infinite-marker. While this element is still on the page, we attempt to scroll down to it every second (with some error catching for any issues). If the element is not present, we assume all content is loaded:
library(RSelenium)
rD <- rsDriver(port = 4444L, browser = 'chrome')
remDr <- rD$client # you don't need to use the open method
remDr$navigate('http://www.codewars.com/kata')
chk <- FALSE
while (!chk) {
  webElem <- remDr$findElements("css", ".js-infinite-marker")
  if (length(webElem) > 0L) {
    tryCatch(
      remDr$executeScript("elem = arguments[0];
                           elem.scrollIntoView();
                           return true;", list(webElem[[1]])),
      error = function(e) {}
    )
    Sys.sleep(1L)
  } else {
    chk <- TRUE
  }
}
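Once the loop exits, all content should be present in the DOM, and the kata list can be parsed from the page source as in the question (with rvest and stringr loaded). A quick sanity check, as a sketch, is to compare the number of scraped links against tot_kata:
kata_vector <- read_html(remDr$getPageSource()[[1]]) %>%
  html_nodes('.item-title a') %>%
  html_attr('href') %>%
  str_replace('/kata/', '')
length(kata_vector) # should be close to tot_kata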

Press the next button in google search result

I am trying to click through to the next page of Google search results using the following code:
library("RSelenium")
startServer()
checkForServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate("https://www.google.com/")
webElem <- remDr$findElement(using = "xpath", "//*/input[#id = 'lst-ib']")
webElem$sendKeysToElement(list("R Cran", "\uE007"))
webElem <- remDr$findElement(using = 'css selector', "#pnnext")
click <- webElem$getElementAttribute("href")
remDr$clickElement(click)
However I receive the following error:
Error in envRefInferField(x, what, getClass(class(x)), selfEnv) :
‘clickElement’ is not a valid field or method name for reference class “remoteDriver”
Does clicking the next button in Google search results require different code?
Using Inspect, I can see that the source code for the button is:
<a id="pnnext" class="pn" style="text-align:left" href="/search?q=R+Cran&biw=1366&bih=657&ei=szhxVv_NMaHMygPW4pLQDg&start=10&sa=N">
Finally, here is what worked for me:
library("RSelenium")
startServer()
checkForServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate("https://www.google.com/")
webElem <- remDr$findElement(using = "xpath", "//*/input[#id = 'lst-ib']")
Sys.sleep(5)
webElem$sendKeysToElement(list("R Cran", "\uE007"))
Sys.sleep(5)
link <- remDr$executeScript("return document.getElementById('pnnext').href;")
remDr$navigate(link[[1]])
You are trying to "click" an attribute/string, which does not work the way you are trying it.
On this line you are grabbing a link as a string (which is not a WebElement for Selenium!):
click <- webElem$getElementAttribute("href")
and then in the next line you are trying to click this link/string via a method that actually needs a WebElement:
remDr$clickElement(click)
So here is what you can try:
1) you could try to click your last WebElement directly (not doing the getAttribute):
webElem$clickElement()
or
2) you could try to navigate to the link you just got through getAttribute:
click <- webElem$getElementAttribute("href")
# change your last line to this
remDr$navigate(click)
Not sure what client you are using, but it might be that you need to wait until the AJAX request finishes, e.g. an explicit wait for visibilityOfElementLocated on #pnnext.
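RSelenium has no direct equivalent of Java's explicit waits such as visibilityOfElementLocated, but the same idea can be approximated by polling, as in this sketch (wait_for is a made-up helper name):
wait_for <- function(remDr, css, timeout = 10) {
  # poll every half second until the element appears or we time out
  for (k in seq_len(timeout * 2)) {
    found <- remDr$findElements(using = 'css selector', value = css)
    if (length(found) > 0) return(found[[1]])
    Sys.sleep(0.5)
  }
  stop("Timed out waiting for ", css)
}
webElem <- wait_for(remDr, "#pnnext")
webElem$clickElement()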
