RSelenium message: no such element: Unable to locate element

I intend to download and clean databases using RSelenium. I am able to open the link, but I am having trouble downloading and opening the database. I believe the XPath is right, but when I try to open it I receive the following error:
Selenium message:no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="ESTBAN_AGENCIA"]"}
My code is the following:
library(RSelenium)
library(tidyverse)  # provides %>% and str_replace_all()
dir <- getwd()
file_path <- paste0(dir,"\\DataBase") %>% str_replace_all("/", "\\\\\\")
eCaps <- list(
chromeOptions =
list(prefs = list('download.default_directory' = file_path))
)
# kill Java (Selenium server) processes left over from a previous run (Windows only)
system("taskkill /im java.exe /f", intern=FALSE, ignore.stdout=FALSE)
#Creating server
rD <- rsDriver(browser = "chrome",
chromever = "101.0.4951.15",
port = 4812L,
extraCapabilities = eCaps)
#Reusing the client that rsDriver() already opened with eCaps
#(a separate remoteDriver() session would not get the download settings)
remDr <- rD$client
#Navigating to the ESTBAN webpage
remDr$navigate("https://www.bcb.gov.br/acessoinformacao/legado?url=https:%2F%2Fwww4.bcb.gov.br%2Ffis%2Fcosif%2Festban.asp")
##Download
remDr$findElement(using ="xpath", '//*[@id="ESTBAN_AGENCIA"]/option[1]')

The element you are trying to access is inside an iframe, so you need to switch to that iframe first in order to access the element.
remDr$navigate("https://www.bcb.gov.br/acessoinformacao/legado?url=https:%2F%2Fwww4.bcb.gov.br%2Ffis%2Fcosif%2Festban.asp")
#Switch to Iframe
webElem <- remDr$findElement("css", "iframe#framelegado")
remDr$switchToFrame(webElem)
##Download
remDr$findElement(using ="xpath", '//*[@id="ESTBAN_AGENCIA"]/option[1]')
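Because every lookup after switchToFrame() runs inside the iframe, it can help to wrap the switch in a small helper that always returns focus to the top-level page. The sketch below is hypothetical (with_iframe is not an RSelenium function); it relies on RSelenium's documented behaviour that switchToFrame(NULL) switches back to the page's default content. It returns whatever the action returns rather than a web element, since an element found inside the frame is generally not usable once focus has left that frame.

```r
# Hypothetical helper (not part of RSelenium): run `action` while focused on an
# iframe, then switch back to the top-level document even if `action` fails.
# `remDr` is an open RSelenium remoteDriver, e.g. rD$client.
with_iframe <- function(remDr, frame_css, action) {
  frame <- remDr$findElement(using = "css", value = frame_css)
  remDr$switchToFrame(frame)
  # on.exit() runs both on normal return and when an error is raised
  on.exit(remDr$switchToFrame(NULL), add = TRUE)
  action(remDr)
}

# Usage with the page from the question (requires a running browser):
# first_value <- with_iframe(remDr, "iframe#framelegado", function(rd) {
#   opt <- rd$findElement(using = "xpath", '//*[@id="ESTBAN_AGENCIA"]/option[1]')
#   opt$getElementAttribute("value")[[1]]
# })
```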

Related

Web scraping with R: findElement doesn't recognise drop down menu

I'm trying to scrape a government database with multiple dropdown menus. Using RSelenium, I've managed to click on the button taking me to the interactive database ("Sistema de Catastros de superficie frutícola regional"), and I'm now trying to click on the drop-down menus (e.g. region, year), but I keep getting NoSuchElement errors. I've tried multiple XPaths based on Inspect Element and the Selector Gadget Chrome extension, to no avail. It looks like each of the dropdown menus is a combobox.
If helpful, my end goal is to go through each of the regions, years, and crops; scraping the table generated by each one.
library(RSelenium)
library(tidyverse)
rdriver = rsDriver(browser = "chrome", port = 9515L, chromever = "106.0.5249.61")
obj = rdriver$client
obj$navigate("https://www.odepa.gob.cl/estadisticas-del-sector/catastros-fruticolas")
link = obj$findElement(using = 'xpath', value = '//*[@id="content"]/div/div/div/div/div[1]/div[2]/div/div[2]/div[1]/div/div/div[3]/div/p[2]/a')$clickElement()
When you click the button, the database opens in a new tab.
You have to switch to that tab with
remDr$switchToWindow(remDr$getWindowHandles()[[2]])
Here is an example.
library(rvest)
library(RSelenium)
# start a Selenium Firefox container (shell() is Windows-only; use system() on other OSes)
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
Sys.sleep(15)
url <- "https://www.odepa.gob.cl/estadisticas-del-sector/catastros-fruticolas"
remDr$navigate(url)
Sys.sleep(15)
web_Obj_Button <- remDr$findElement("xpath", '//*[@id="content"]/div/div/div/div/div[1]/div[2]/div/div[2]/div[1]/div/div/div[3]/div/p[2]/a')
web_Obj_Button$clickElement()
remDr$switchToWindow(remDr$getWindowHandles()[[2]])
web_Obj_Date <- remDr$findElement("css selector", "#mat-select-value-3 > span > span")
web_Obj_Date$clickElement()
remDr$screenshot(TRUE)
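Indexing getWindowHandles()[[2]] works when exactly two windows are open, but WebDriver does not promise any particular order for the handles, and the new tab may not exist yet right after the click. A sketch of a more defensive version, using a hypothetical helper switch_to_new_window (not an RSelenium function) that records the handles before the click and switches to whichever handle is new:

```r
# Hypothetical helper (not part of RSelenium): perform a click that opens a new
# tab/window, wait until its handle appears, and switch to it.
switch_to_new_window <- function(remDr, click_action, timeout = 10, poll = 0.25) {
  before <- unlist(remDr$getWindowHandles())
  click_action()
  deadline <- Sys.time() + timeout
  repeat {
    new_handles <- setdiff(unlist(remDr$getWindowHandles()), before)
    if (length(new_handles) > 0) break
    if (Sys.time() > deadline) stop("no new window appeared within ", timeout, " seconds")
    Sys.sleep(poll)
  }
  remDr$switchToWindow(new_handles[[1]])
  invisible(new_handles[[1]])
}

# Usage with the button from the answer (requires a running browser):
# switch_to_new_window(remDr, function() web_Obj_Button$clickElement())
```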

RSelenium: configure firefox remotedriver to use Tor network

I am trying to use RSelenium with Firefox through a local proxy (Tor) on a Linux machine.
I had no problem installing Tor following this tutorial, and the command line wget -qO - https://api.ipify.org; echo does give me a new IP.
Now I want to use Firefox with RSelenium, going through the local Tor proxy on port 9050:
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 127.0.0.1:9050 *:*
LISTEN 0 128 127.0.0.1:9051 *:*
I use a standalone Selenium Java server (selenium-server-standalone-2.53.0.jar), which works fine with regular RSelenium. Here is an example that gets the IP displayed on ipchicken:
library(RSelenium)
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4444L,
browserName = "firefox"
)
remDr$open()
remDr$navigate("https://ipchicken.com/")
ip <- remDr$findElements(using = "css", value ='b')
print(ip[[1]]$getElementText())
And I do get my IP. Now I want to see it happen with Tor, so I tried adding the proxy option when connecting the remoteDriver with Firefox:
eCaps <- list("moz:firefoxOptions" = list(
args = c('--proxy-server=localhost:9050'
)))
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4444L,
browserName = "firefox",
extraCapabilities = eCaps
)
I tried '--proxy-server=localhost:9050', '--proxy-server=http://localhost:9050', '--proxy-server=socks5://localhost:9050' and '--proxy-server=127.0.0.1:9050'. None of them produced an error, but all of them gave me my initial IP, so it is not working. The standalone server log says it does execute with the options, for example:
22:59:10.288 INFO - Executing: [new session: Capabilities [{nativeEvents=true, browserName=firefox, javascriptEnabled=true, moz:firefoxOptions={args=--proxy-server= 127.0.0.1:9050}, version=, platform=ANY}]])
22:59:10.297 INFO - Creating a new session for Capabilities [{nativeEvents=true, browserName=firefox, javascriptEnabled=true, moz:firefoxOptions={args=--proxy-server= 127.0.0.1:9050}, version=, platform=ANY}]
22:59:30.323 INFO - Done: [new session: Capabilities [{nativeEvents=true, browserName=firefox, javascriptEnabled=true, moz:firefoxOptions={args=--proxy-server= 127.0.0.1:9050}, version=, platform=ANY}]]
What am I doing wrong?
Edit
After user1207289's answer, and after realizing that you can directly create a Firefox profile in RSelenium, I tried:
eCaps <- makeFirefoxProfile(list(network.proxy.type = 1,
network.proxy.socks = "127.0.0.1",
network.proxy.socks_port = 9050,
network.proxy.socks_version = 5))
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4444L,
browserName = "firefox",
extraCapabilities = eCaps
)
I used integers for network.proxy.socks_port, network.proxy.socks_version and network.proxy.type because of this question, but I also tried characters, without any success. I tried with and without network.proxy.socks_version = 5, and it did not work (I get my normal IP). I also tried network.proxy.socks_port = 9150, but it did not work either.
I also tried
eCaps <- list("moz:firefoxOptions" = list(
args = c('network.proxy.socks=127.0.0.1:9050' ,'network.proxy.type=1' )
)
)
but that did not work either.
I could connect to Tor using WebDriver and Firefox with the code below. Just make sure Tor is installed and running. I used it on macOS (Catalina). Check the port settings for your OS in case they are different.
It is in C#, but you can do the same with any binding:
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;
FirefoxOptions firefoxOptions = new FirefoxOptions();
firefoxOptions.SetPreference("network.proxy.type", 1);
firefoxOptions.SetPreference("network.proxy.socks", "127.0.0.1");
firefoxOptions.SetPreference("network.proxy.socks_port", 9150);
FirefoxDriverService service = FirefoxDriverService.CreateDefaultService();
IWebDriver driver = new FirefoxDriver(service, firefoxOptions);
When this opens a Firefox instance, visit https://check.torproject.org/ in that same instance. It will confirm whether you are connected to Tor and also show your new IP.
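For reference, the C# preferences above can be expressed in R as a capabilities list. This is a sketch under assumptions, not a tested fix: it assumes a geckodriver-based setup (Selenium 3 or later), where Firefox preferences are read from the prefs entry of the moz:firefoxOptions capability; the selenium-server-standalone-2.53.0 used in the question predates that, so the capability may simply be ignored there. tor_firefox_caps is a hypothetical helper name, and network.proxy.socks_remote_dns is an extra preference (not from the answer) that sends DNS lookups through the proxy as well. It may also explain why the earlier attempts changed nothing: --proxy-server is a Chromium command-line switch that Firefox does not recognize.

```r
# Sketch (assumes geckodriver, i.e. Selenium 3+ / Firefox 48+): pass the same
# preferences as the C# answer through "moz:firefoxOptions"$prefs.
# Default port 9050 matches the Tor daemon in the question; Tor Browser
# listens on 9150 instead, as in the C# answer.
tor_firefox_caps <- function(host = "127.0.0.1", port = 9050L) {
  list("moz:firefoxOptions" = list(prefs = list(
    "network.proxy.type" = 1L,               # 1 = manual proxy configuration
    "network.proxy.socks" = host,
    "network.proxy.socks_port" = as.integer(port),
    "network.proxy.socks_version" = 5L,
    "network.proxy.socks_remote_dns" = TRUE  # extra: resolve DNS through Tor too
  )))
}

# Usage (requires a running Selenium server and Tor):
# remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L,
#                       browserName = "firefox",
#                       extraCapabilities = tor_firefox_caps())
# remDr$open()
# remDr$navigate("https://check.torproject.org/")
```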
After a lot of searching, I found a way: RSelenium has a getFirefoxProfile function, which loads an existing Firefox profile.
So I first configured the profile directly in Firefox, following the same tutorial, and copied it to my R folder. Using
fprof <- getFirefoxProfile("myprofile.default")
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4444L,
browserName = "firefox",
extraCapabilities = fprof
)
Did work.
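Because this solution depends entirely on the copied profile carrying the proxy settings, it may be worth checking that before calling getFirefoxProfile(). A minimal sketch, assuming Firefox saved the settings to the profile's prefs.js (lines of the form user_pref("network.proxy.type", 1);); read_proxy_prefs is a hypothetical helper, not part of RSelenium:

```r
# Hypothetical helper: list the network.proxy.* preferences stored in a
# Firefox profile directory's prefs.js, as a named character vector.
read_proxy_prefs <- function(prof_dir) {
  prefs_file <- file.path(prof_dir, "prefs.js")
  if (!file.exists(prefs_file)) stop("no prefs.js found in ", prof_dir)
  lines <- readLines(prefs_file, warn = FALSE)
  hits <- grep('^user_pref\\("network\\.proxy\\.', lines, value = TRUE)
  keys <- sub('^user_pref\\("([^"]+)".*$', "\\1", hits)
  vals <- sub('^user_pref\\("[^"]+",[[:space:]]*(.*)\\);[[:space:]]*$', "\\1", hits)
  vals <- gsub('^"|"$', "", vals)  # drop the quotes around string values
  setNames(vals, keys)
}

# Usage before loading the profile:
# read_proxy_prefs("myprofile.default")   # expect network.proxy.type = "1"
# fprof <- getFirefoxProfile("myprofile.default")
```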

download.file Issue with xls Excel Workbook

I'm trying to download an Excel workbook (.xls) using R's download.file function (Windows 10, R version 3.4.4 (2018-03-15)).
When I download the file manually (using Internet Explorer or Chrome) then the file downloads and I can then open it in Excel without any problems.
When I use download.file in R, the file downloads, but its size is smaller than that of the correct file: it is an HTML file with a note that my browser is not supported. I tried different modes with no luck.
My code:
download.file(
url = "https://www.atsenergo.ru/nreport?fid=696C3DB7A3F6019EE053AC103C8C8733",
destfile = "C:/MyExcel.xls",
mode = "wb",
method = "auto"
)
I solved this problem with the RSelenium library. The ATS site rejects any direct request to download the file (it returns an .html file with a "JavaScript required" message), so in this case only the Selenium approach works. My code is below (urlList is a data frame with the files' download links):
library(RSelenium)
# urlList: data frame with columns url (page address) and FileName (link text)
rD <- rsDriver(port = 4444L,
browser = "chrome",
check = FALSE,
geckover = NULL,
iedrver = NULL,
phantomver = NULL)
remDr <- rD$client
for (i in seq_len(nrow(urlList))) {
row <- urlList[i,]
tryCatch({
remDr$navigate(row$url)
# locate the download link by its visible text and click it
webElem <-
remDr$findElement(using =
'link text', row$FileName)
webElem$clickElement()
},
# on failure, log a short message; the loop then continues with the next row
error = function(e)
message(paste(
substr(conditionMessage(e), 1, 50),
row$url,
row$FileName,
sep = "\t"
)))
}
remDr$close()
# stop the selenium server
rD[["server"]]$stop()
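clickElement() returns once the click has been sent, not when the file has finished downloading, so closing the browser right after the loop can cut off the last file. A hedged sketch of a wait step, assuming you know the download directory (download_dir below is a placeholder) and that unfinished downloads carry a .crdownload (Chrome) or .part (Firefox) suffix; wait_for_downloads is a hypothetical helper:

```r
# Hypothetical helper: wait until at least one new, fully written file shows up
# in `dir`. Files already present are passed in `before`; names ending in
# .crdownload (Chrome) or .part (Firefox) are treated as still in progress.
wait_for_downloads <- function(dir, before = character(), timeout = 60, poll = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    new_files <- setdiff(list.files(dir), before)
    in_progress <- grepl("\\.(crdownload|part)$", new_files)
    if (length(new_files) > 0 && !any(in_progress)) {
      return(file.path(dir, new_files))
    }
    if (Sys.time() > deadline) stop("download did not finish within ", timeout, " seconds")
    Sys.sleep(poll)
  }
}

# Usage inside the loop, around the click:
# before <- list.files(download_dir)
# webElem$clickElement()
# wait_for_downloads(download_dir, before)
```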

RSelenium makeFirefoxProfile not implemented

I am trying to run an RSelenium instance that downloads some PDF files for me without my having to click through dialog boxes (or having them open in pdf.js).
But even though I set my configuration, the Firefox instance still loads the default profile.
RSelenium version: 1.73
Firefox version: 56.0 (32-bit)
Windows: 7 Ultimate
Create profile and start server:
library(RSelenium)
library(rvest)
library(XML)
library(stringi)
cprof <- makeFirefoxProfile(list(
pdfjs.disabled = TRUE,
plugin.scan.plid.all = FALSE,
plugin.scan.Acrobat = "99.0",
browser.helperApps.neverAsk.saveToDisk = 'application/pdf',
browser.download.dir = "C:\\temp")
)
remDr <- rsDriver(port = 4477L, browser = "firefox", check = FALSE, extraCapabilities = cprof)
remDr <- remDr[["client"]]
After Firefox launches, I checked the configuration: the settings have remained in their default state.
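Two things may be worth trying, offered as assumptions rather than a confirmed diagnosis. First, Firefox 48 and later are driven through geckodriver, which reads preferences from the moz:firefoxOptions capability and may ignore a profile built the older way. Second, the preference list above does not set browser.download.folderList; Firefox only honors browser.download.dir when that preference is 2 (0 means the desktop, 1 the default Downloads folder). The sketch below builds such a capability from the same preferences; pdf_download_caps is a hypothetical helper name.

```r
# Sketch (assumes geckodriver, which reads Firefox preferences from
# "moz:firefoxOptions"): prefs for saving PDFs to `download_dir` without a dialog.
pdf_download_caps <- function(download_dir) {
  list("moz:firefoxOptions" = list(prefs = list(
    "pdfjs.disabled" = TRUE,                     # don't render PDFs in pdf.js
    "browser.download.folderList" = 2L,          # 2 = use browser.download.dir
    "browser.download.dir" = normalizePath(download_dir, mustWork = FALSE),
    "browser.helperApps.neverAsk.saveToDisk" = "application/pdf"
  )))
}

# Usage (requires Firefox and geckodriver):
# rD <- rsDriver(port = 4477L, browser = "firefox", check = FALSE,
#                extraCapabilities = pdf_download_caps("C:\\temp"))
# remDr <- rD[["client"]]
```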

Open new tab in RSelenium

I have the following code, with which I try to open each URL taken from the for loop in a new tab. What I have so far is this:
library("RSelenium")
checkForServer()  # download the Selenium server jar if needed (older RSelenium API)
startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://www.google.com/")
Sys.sleep(5)
myurllist <- c("https://cran.r-project.org/", "http://edition.cnn.com/", "https://cran.r-project.org/web/packages/")
for (i in seq_along(myurllist)) {
url <- myurllist[i]
webElem <- remDr$findElement("css", "urlLink")
webElem$sendKeysToElement(list(key = "t"))
remDr$navigate(url)
Sys.sleep(5)
}
I found this answer for Selenium:
A new tab is opened by pressing CTRL+T, not T:
library("RSelenium")
checkForServer()  # download the Selenium server jar if needed (older RSelenium API)
startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://www.google.com/")
url_list <- c("http://edition.cnn.com/", "https://cran.r-project.org/web/packages/")
for (url in url_list) {
webElem <- remDr$findElement("css", "html")
webElem$sendKeysToElement(list(key="control", "t"))
remDr$navigate(url)
}
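Keys sent with sendKeysToElement() are delivered to the page, and recent browsers generally do not treat them as browser shortcuts, so Ctrl+T may not open a tab at all. A commonly used alternative is to open the tab with JavaScript's window.open() through executeScript() and then switch to the new window handle. A sketch under that assumption (open_in_new_tabs is a hypothetical helper, and it assumes the browser allows window.open() from injected scripts):

```r
# Hypothetical helper: open each URL in its own new tab via JavaScript and
# switch to it. Passing the URL as a script argument avoids quoting problems.
open_in_new_tabs <- function(remDr, urls) {
  for (url in urls) {
    before <- unlist(remDr$getWindowHandles())
    remDr$executeScript("window.open(arguments[0], '_blank');", args = list(url))
    new_handle <- setdiff(unlist(remDr$getWindowHandles()), before)
    # switch only if exactly one new handle appeared
    if (length(new_handle) == 1) remDr$switchToWindow(new_handle)
  }
  invisible(unlist(remDr$getWindowHandles()))
}

# Usage with the URLs from the answer (requires a running browser):
# open_in_new_tabs(remDr, c("http://edition.cnn.com/",
#                           "https://cran.r-project.org/web/packages/"))
```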
