Click on a cross-domain iframe element using RSelenium

I am using R version 3.3.2. With the RSelenium package, I am trying to scrape some data from this website: http://www.dziv.hr/en/e-services/on-line-database-search/patents/
My code looks like this:
selServ <- RSelenium::startServer(javaargs = c("-Dwebdriver.gecko.driver=\"C:/Users/Mislav/Documents/geckodriver.exe\""))
remDr <- remoteDriver(extraCapabilities = list(marionette = TRUE))
remDr$open()
Sys.sleep(2)
# Simulate browser session and fill out form
remDr$navigate("http://www.dziv.hr/hr/e-usluge/pretrazivanje-baza-podataka/patent/")
This doesn't work:
webel <- remDr$findElement(using = "xpath", "/input[@id = 'TB1']")
Then I wanted to switch to the iframe using the switchToFrame() function, but the iframe does not have an id.
Then I tried to use an index, webel <- remDr$switchToFrame(1), but this just returns NULL.
Also, I noticed the iframe is on a different domain.
Is it possible to scrape data from this website?

You can just select the first iframe and pass it to the switchToFrame method:
webElem <- remDr$findElements("css", "iframe")
remDr$switchToFrame(webElem[[1]])
webel <- remDr$findElement(using = "xpath", "//input[@id = 'TB1']")
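Putting it together, a minimal sketch (the input id TB1 comes from the question, the search text is just a placeholder, and Sys.sleep() is a crude wait for the cross-domain iframe to load):
remDr$navigate("http://www.dziv.hr/hr/e-usluge/pretrazivanje-baza-podataka/patent/")
Sys.sleep(2)
# switch into the first iframe on the page
frames <- remDr$findElements("css", "iframe")
remDr$switchToFrame(frames[[1]])
# elements inside the iframe are now reachable
webel <- remDr$findElement(using = "xpath", "//input[@id = 'TB1']")
webel$sendKeysToElement(list("example search term"))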

Related

How to get the newly generated URL after using the clickElement function from the RSelenium package

I would like to know how to get a URL after using the clickElement function from R's RSelenium package.
Here's an example:
library(RSelenium)
rD <- rsDriver(browser = c("chrome"),
               chromever = "98.0.4758.102",
               # extraCapabilities = list(chromeOptions = list(args = list("--headless"))),
               port = 4580L)
driver <- rD[["client"]]
urll <- "https://www.zapimoveis.com.br/venda/fazendas-sitios-chacaras/ms+campo-grande/?pagina=1"
driver$navigate(urll)
linkimovdescr <- driver$findElement(using = "xpath",
                                    "/html/body/main/section[1]/div[2]/div[3]/section/div/div[1]")
linkimovdescr$clickElement()
Here's the question: how do I get the address "https://www.zapimoveis.com.br/imovel/venda-fazenda-sitio-chacara-parque-do-sol-campo-grande-ms-240m2-id-2531139106/"?
Note: The linkimovdescr$getCurrentUrl() or driver$getCurrentUrl() command does not answer my question, as it keeps pointing to the home page.
Thanks for any help.
A new tab opens when you click the item, so we need to switch tabs in order to get the URL.
1. Click the item:
linkimovdescr <- driver$findElement(using = "xpath",
                                    "/html/body/main/section[1]/div[2]/div[3]/section/div/div[1]")
linkimovdescr$clickElement()
2. Get the list of all tabs with getWindowHandles():
df = driver$getWindowHandles()
3. Switch to the second tab:
driver$switchToWindow(df[[2]])
4. Get the URL:
driver$getCurrentUrl()
[[1]]
[1] "https://www.zapimoveis.com.br/imovel/venda-fazenda-sitio-chacara-zona-rural-campo-grande-ms-30000m2-id-2552129433/"

Using purrr::map to loop through web pages for scraping with RSelenium

I have a basic R script, cobbled together using RSelenium, which allows me to log into a website. Once authenticated, my script goes to the first page of interest and pulls 3 pieces of text from the page.
Luckily for me, the URL has been created in such a way that I can pass a vector of numbers into it to take me to the next page of interest, hence the use of map().
While on each page I want to scrape the same 3 elements off the page and store them in a master data frame for later analysis.
I wish to use the map family of functions so that I can become more familiar with them, but I am really struggling to get this to work. Could anyone kindly tell me where I am going wrong?
Here is the main part of my code (go to the website and log in):
library(RSelenium)
# https://stackoverflow.com/questions/55201226/session-not-created-this-version-of-chromedriver-only-supports-chrome-version-7/56173984
rd <- rsDriver(browser = "chrome",
               chromever = "88.0.4324.27",
               port = netstat::free_port())
remdr <- rd[["client"]]
# url of the site's login page
url <- "https://www.myWebsite.com/"
# Navigating to the page
remdr$navigate(url)
# Wait 5 secs for the page to load
Sys.sleep(5)
# Find the initial login button to bring up the username and password fields
loginbutton <- remdr$findElement(using = 'css selector','.plain')
# Click the initial login button to bring up the username and password fields
loginbutton$clickElement()
# Find the username box
username <- remdr$findElement(using = 'css selector','#username')
# Find the password box
password <- remdr$findElement(using = 'css selector','#password')
# Find the final login button
login <- remdr$findElement(using = 'css selector','#btnLoginSubmit1')
# Input the username
username$sendKeysToElement(list("myUsername"))
# Input the password
password$sendKeysToElement(list("myPassword"))
# Click login
login$clickElement()
And hey presto we're in!
Now my code takes me to the initial page of interest (index = 1)
Above I mentioned that I am looking to increment through each page; I can do this by substituting an integer into the URL as the rcId parameter, see below:
#remdr$navigate("https://myWebsite.com/rc_redesign/#/layout/jcard/drugCard?accountId=XXXXXX&rcId=1&searchType=R&reimbCode=&searchTerm=&searchTexts=*") # Navigating to the page
For each rcId in 1:9999 I wish to grab the following 3 elements and store them in a data frame
hcpcs_info <- remdr$findElement(using = 'class','is-jcard-heading')
hcpcs <- hcpcs_info$getElementText()[[1]]
hcpcs_description <- remdr$findElement(using = 'class','is-jcard-desc')
hcpcs_desc <- hcpcs_description$getElementText()[[1]]
tc_info <- remdr$findElement(using = 'css selector','.col-12.ng-star-inserted')
therapeutic_class <- tc_info$getElementText()[[1]]
I have tried creating a separate function and passing it to map(), but I am not advanced enough to piece this together; below is what I have tried:
my_function <- function(index) {
  remdr$navigate(sprintf("https://rc2.reimbursementcodes.com/rc_redesign/#/layout/jcard/drugCard?accountId=113479&rcId=%d&searchType=R&reimbCode=*&searchTerm=*&searchTexts=*", index))
  Sys.sleep(5)
  hcpcs_info[index] <- remdr$findElement(using = 'class', 'is-jcard-heading')
  hcpcs[index] <- hcpcs_info$getElementText()[index][[1]]
}
x <- 1:10 %>%
  map(~ my_function(.x))
Any help would be greatly appreciated
Try the following:
library(RSelenium)
library(tibble)

purrr::map_df(1:10, ~{
  remdr$navigate(sprintf("https://rc2.reimbursementcodes.com/rc_redesign/#/layout/jcard/drugCard?accountId=113479&rcId=%d&searchType=R&reimbCode=*&searchTerm=*&searchTexts=*", .x))
  Sys.sleep(5)
  hcpcs_info <- remdr$findElement(using = 'class', 'is-jcard-heading')
  hcpcs <- hcpcs_info$getElementText()[[1]]
  hcpcs_description <- remdr$findElement(using = 'class', 'is-jcard-desc')
  hcpcs_desc <- hcpcs_description$getElementText()[[1]]
  tc_info <- remdr$findElement(using = 'css selector', '.col-12.ng-star-inserted')
  therapeutic_class <- tc_info$getElementText()[[1]]
  tibble(hcpcs, hcpcs_desc, therapeutic_class)
}) -> result
result
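If some rcId pages fail to load (missing element, timeout), a single error will abort the whole map_df() call. A hedged variant using purrr::possibly(), assuming the same open remdr client as above, so a failing page yields an empty row instead of stopping the loop:
library(RSelenium)
library(purrr)
library(tibble)

# scrape one page; `remdr` is the client opened earlier
scrape_page <- function(rc_id) {
  remdr$navigate(sprintf("https://rc2.reimbursementcodes.com/rc_redesign/#/layout/jcard/drugCard?accountId=113479&rcId=%d&searchType=R&reimbCode=*&searchTerm=*&searchTexts=*", rc_id))
  Sys.sleep(5)
  tibble(
    hcpcs             = remdr$findElement(using = 'class', 'is-jcard-heading')$getElementText()[[1]],
    hcpcs_desc        = remdr$findElement(using = 'class', 'is-jcard-desc')$getElementText()[[1]],
    therapeutic_class = remdr$findElement(using = 'css selector', '.col-12.ng-star-inserted')$getElementText()[[1]]
  )
}

# possibly() swallows errors and returns `otherwise` instead
safe_scrape <- possibly(scrape_page, otherwise = tibble())

result <- map_df(1:10, safe_scrape)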

Zoom out of website when using RSelenium without changing page size/resolution

I'd like to zoom out in the RSelenium remote browser, but a solution is surprisingly difficult to find.
I am aware of
How to zoom out page using RSelenium library in R?, but I would like not to adjust the page size or the resolution; simply zoom out.
I have considered sending control + subtract simultaneously, but this is also not quite working. I have taken a look at How to press two keys simultaneously (i.e., control-s) in a webpage using RSelenium?, where, as the OP noted, control + a worked but not control + s, and subsequently control + subtract didn't work either.
I also tried the Unicode method specified in How to send simultaneous keys in RSelenium ALT+S to web driver?, which did not work either.
library(RSelenium)
driver <- rsDriver()
remDr <- driver[["client"]]
remDr$navigate("https://www.google.com/")
webElem <- remDr$findElement("css", "html")
webElem$sendKeysToElement(list(key = "control", "-")) ## Does not work
webElem$sendKeysToElement(list(key = "control", key = "subtract")) ## Does not work
The browser is Chrome.
I don't think the problem is with sending the keys to the browser, since as noted in the linked posts, it is possible to send control + a to the browser window to select elements. It seems rather that the keys are not being recognised as commands to the browser application.
There are other ways round this problem however.
As @Muzzamil suggests, you can get a similar effect by changing the CSS of the document body in Chrome, though this doesn't work in Firefox.
If you want to natively change the browser zoom in a way that persists throughout the session, I can demonstrate solutions for Firefox and Chrome, since in both cases one can navigate to the HTML-based options page and interact with it to set the browser zoom level.
Here's how you do it with Firefox:
library(RSelenium)
zoom_firefox <- function(client, percent)
{
  store_page <- client$getCurrentUrl()[[1]]
  client$navigate("about:preferences")
  webElem <- client$findElement("css", "#defaultZoom")
  webElem$clickElement()
  webElem$sendKeysToElement(list(as.character(percent)))
  webElem$sendKeysToElement(list(key = "return"))
  client$navigate(store_page)
}
This allows the following:
driver <- rsDriver(browser = "firefox")
client <- driver$client
client$navigate("https://www.google.com")
client$screenshot(display = TRUE)
We can see that the default zoom (100%) is in effect.
Now we zoom out to 50% like this:
zoom_firefox(client, 50)
client$screenshot(display = TRUE)
And zoom back in like this:
zoom_firefox(client, 100)
client$screenshot(display = TRUE)
It's harder with Chrome because its options page uses a complex, nested shadow DOM. Since we can't use XPath or CSS selectors to navigate a shadow DOM, we need to extract the element's WebDriver id using JavaScript and then force this id onto another web element, which we can then control.
zoom_chrome <- function(client, percent)
{
  store_page <- client$getCurrentUrl()[[1]]
  client$navigate("chrome://settings/")
  webElemId <- client$executeScript(paste0("return document.querySelector",
                                           "(\"body > settings-ui\").",
                                           "shadowRoot.querySelector(\"#main\")",
                                           ".shadowRoot.querySelector",
                                           "(\"settings-basic-page\")",
                                           ".shadowRoot.querySelector",
                                           "(\"#basicPage > ",
                                           "settings-section:nth-child(8)",
                                           "> settings-appearance-page\")",
                                           ".shadowRoot.querySelector",
                                           "(\"#zoomLevel\");"),
                                    args = list("dummy"))
  webElem <- client$findElement("css", "html")
  webElem@.xData$elementId <- as.character(webElemId)
  webElem$clickElement()
  webElem$sendKeysToElement(list("3"))
  zooms <- c(25, 33, 50, 67, 75, 8:11 * 10, 125, 150, 175, 200, 250, 3:5 * 100)
  desired_zoom <- which.min(abs(percent - zooms))
  current_zoom <- which(zooms == 300)
  n_keys <- desired_zoom - current_zoom
  if (n_keys > 0)
    for (i in seq(n_keys))
      webElem$sendKeysToElement(list(key = "down_arrow"))
  if (n_keys < 0)
    for (i in seq(abs(n_keys)))
      webElem$sendKeysToElement(list(key = "up_arrow"))
  webElem$sendKeysToElement(list(as.character(percent)))
  webElem$sendKeysToElement(list(key = "return"))
  client$navigate(store_page)
}
But it works in the same way:
driver <- rsDriver(browser = "chrome", chromever = "80.0.3987.106")
client <- driver$client
client$navigate("https://www.google.com")
client$screenshot(display = TRUE)
zoom_chrome(client, 50)
client$screenshot(display = TRUE)
zoom_chrome(client, 100)
client$screenshot(display = TRUE)
This gives exactly the same results as with Firefox.
Of course, you could easily write a simple wrapper function that selects which "zoom" function to call based on the current browser.
I have not looked into implementing this in Internet Explorer or PhantomJS, since they do not have HTML-based options pages.
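For illustration, a hedged sketch of such a wrapper, assuming the zoom_firefox() and zoom_chrome() functions above and that the client's browserName field reports the browser in use:
zoom <- function(client, percent) {
  browser <- tolower(client$browserName)
  if (browser == "firefox") {
    zoom_firefox(client, percent)
  } else if (browser == "chrome") {
    zoom_chrome(client, percent)
  } else {
    stop("No zoom helper implemented for browser: ", browser)
  }
}

zoom(client, 50)   # zoom out to 50%
zoom(client, 100)  # back to the default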
You can try zooming out with JavaScript. The code below zooms out to 90%:
library(RSelenium)
driver <- rsDriver()
remDr <- driver[["client"]]
remDr$navigate("https://www.google.com/")
webElem <- remDr$findElement("css", "html")
script <- "document.body.style.zoom='90%'"
remDr$executeScript(script, args = list())
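Note that this only scales the document body via CSS, so it resets on every navigation; to undo it you can run the same kind of script with '100%' (a hedged one-line variant of the snippet above):
remDr$executeScript("document.body.style.zoom='100%'", args = list())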

How to send ctrl+shift+i in RSelenium

Basically what's in the title: I would like to use the keyboard shortcut for Inspect Element within the webdriver. I have tried:
remDr <- driver$client
webEl <- remDr$findElement(using = 'xpath',"/html/body/div[1]/div/div/div[2]/button[2]")
webEl$sendKeysToElement(sendKeys = list("\ue008\ue009","i"))
and also with
webEl$sendKeysToElement(sendKeys = list("\ue008\ue009i"))
But nothing at all happens.
Thanks.
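For reference, key chords in RSelenium are usually written with the named keys from RSelenium::selKeys rather than raw Unicode strings. A hedged sketch only; browser-level shortcuts such as opening DevTools may not be reachable this way at all, since WebDriver sends keys to the page rather than to the browser UI:
# "control" and "shift" are names in RSelenium::selKeys
webEl$sendKeysToElement(list(key = "control", key = "shift", "i"))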

RSelenium: clicking on subsequent links in for loop from a Google search

I'm using RSelenium to do some simple Google searches. Setup:
library(tidyverse)
library(RSelenium) # running docker to do this
library(rvest)
library(httr)
remDr <- remoteDriver(port = 4445L, browserName = "chrome")
remDr$open()
remDr$navigate("https://books.google.com/")
books <- remDr$findElement(using = "css", "[name = 'q']")
books$sendKeysToElement(list("NHL teams", key = "enter"))
bookElem <- remDr$findElements(using = "css", "h3.LC20lb")
That's the easy part. Now, there are 10 links on that first page, and I want to click on every link, back out, and then click the next link. What's the most efficient way to do that? I've tried the following:
bookElem$clickElement()
Returns Error: attempt to apply non-function. I expected this to click on the first link, but no good. (This works if I take the s off of findElements(), i.e. for the single element above, not in the lapply below.)
clack <- lapply(bookElem, function(y) {
  y$clickElement()
  y$goBack()
})
Begets an error, kind of like this question:
Error: Summary: StaleElementReference
Detail: An element command failed because the referenced element is no longer attached to the DOM.
Further Details: run errorDetails method
Would it be easier to use rvest within RSelenium?
I think you could consider grabbing the links and looping through them without going back to the main page.
In order to achieve that, you will have to grab the link elements (the a tags).
bookElems <- remDr$findElements(using = "xpath",
                                "//h3[@class = 'LC20lb']//parent::a")
Then extract the "href" attribute and navigate to each link:
links <- sapply(bookElems, function(bookElem){
  bookElem$getElementAttribute("href")[[1]]
})
for(link in links){
  remDr$navigate(link)
  # DO SOMETHING
}
Full code would read:
remDr$open()
remDr$navigate("https://books.google.com/")
books <- remDr$findElement(using = "css", "[name = 'q']")
books$sendKeysToElement(list("NHL teams", key = "enter"))
bookElems <- remDr$findElements(using = "xpath",
                                "//h3[@class = 'LC20lb']//parent::a")
links <- sapply(bookElems, function(bookElem){
  bookElem$getElementAttribute("href")[[1]]
})
for(link in links){
  remDr$navigate(link)
  # DO SOMETHING
}
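Alternatively, since rvest is already loaded in the setup above, a hedged sketch of the rvest route mentioned in the question: parse the rendered page source once and pull all the hrefs without extra Selenium round trips (the XPath mirrors the answer's selector and will break when Google changes its class names):
page <- read_html(remDr$getPageSource()[[1]])
links <- page %>%
  html_nodes(xpath = "//h3[@class = 'LC20lb']/parent::a") %>%
  html_attr("href")
for(link in links){
  remDr$navigate(link)
  # DO SOMETHING
}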
