How to upload files when sendKeysToElement does not work (R)

Using RSelenium, I am trying to upload several image files on this page, which uses OCR to create Excel files from images.
The CSS selector for the "choose new files" button is "button.btn:nth-child(2)", so I try to upload my image files with the following code:
library(RSelenium)
# image_path is the folder that contains the image files
images <- as.list(list.files(path = image_path, full.names = TRUE))
remDr <- remoteDriver(remoteServerAddr = "localhost",
                      port = 4444L,
                      browserName = "firefox")
remDr$open()
remDr$navigate("https://www.table-reader.com/image-to-excel")
fileupload <- remDr$findElement(using = "css selector", "button.btn:nth-child(2)")
fileupload$sendKeysToElement(images)
remDr$close()
But I only hear a "pop" sound from Firefox, and nothing happens.
Could this site be blocking the use of Selenium?
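A likely explanation, not verified against this particular site: WebDriver can only type file paths into an `<input type="file">` element, so sending keys to a styled button does not select any files. In addition, passing `images` as a list of several paths most likely sends them back-to-back with no separator, which is not a valid path. The sketch below assumes the page contains such an input (often hidden behind the button) that accepts multiple files, and that the browser runs on the same machine as R, since the paths must exist where the browser runs. `upload_value()`, `upload_files()` and the default selector are hypothetical.

```r
# Hypothetical helper: build the value for sendKeysToElement() so that several
# files reach an <input type="file" multiple> in a single call.
# Selenium expects absolute paths, separated by "\n", in one string.
upload_value <- function(paths) {
  abs_paths <- normalizePath(paths, winslash = "/", mustWork = TRUE)
  list(paste(abs_paths, collapse = "\n"))
}

# Hypothetical helper: send the files to the file input, not to the visible button.
# Assumes `remDr` is an open RSelenium remoteDriver; the CSS selector is an assumption.
upload_files <- function(remDr, paths, selector = "input[type='file']") {
  input <- remDr$findElement(using = "css selector", selector)
  input$sendKeysToElement(upload_value(paths))
  invisible(input)
}
```

Usage would be `upload_files(remDr, unlist(images))`. If the input is hidden, the driver may report it as not interactable; in that case it may need to be made visible first (for example via `remDr$executeScript()`), which is not shown here.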

Related

problem finding element on web page using RSelenium

I have been trying to get the element "Excel CSV" on a web page using remDrv$findElements in R, but have not managed to. How can I locate the element using the xpath, css, etc. arguments?
I tried:
library(RSelenium)
test_link="https://sinca.mma.gob.cl/cgi-bin/APUB-MMA/apub.htmlindico2.cgi?page=pageFrame&header=Talagante&macropath=./RM/D28/Cal/PM25&macro=PM25.horario.horario&from=080522&to=210909&"
rD <- rsDriver(port = 4446L, browser = "firefox", chromever = "92.0.4515.107") # starts a Selenium server and opens Firefox; wait for the necessary files to download
remDrv <- rD$client
#remDrv$open(silent = TRUE)
url<-test_link
remDrv$navigate(url)
remDrv$findElements(using = "xpath", "/html/body/table/tbody/tr/td/table[2]/tbody/tr[1]/td/label/span[3]/a")
link: https://sinca.mma.gob.cl/cgi-bin/APUB-MMA/apub.htmlindico2.cgi?page=pageFrame&header=Talagante&macropath=./RM/D28/Cal/PM25&macro=PM25.horario.horario&from=080522&to=210909&
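The `page=pageFrame` parameter in the URL suggests (unverified) that the page is assembled from frames; elements inside a frame are invisible to `findElements()` until the driver switches into that frame, and long absolute XPaths are brittle in any case. The sketch below locates the link by its visible text, assuming that text contains "Excel CSV", and searches the top-level document and then each top-level frame. `xpath_by_text()` and `find_in_frames()` are hypothetical helpers.

```r
# Hypothetical helper: XPath that matches an element by its visible text
# (assumes `text` contains no single quote).
xpath_by_text <- function(text, tag = "a") {
  sprintf("//%s[contains(normalize-space(.), '%s')]", tag, text)
}

# Hypothetical helper: search the top-level document first, then each
# top-level <frame>/<iframe>. Nested frames are not searched.
# On success the driver is left inside the frame where the match was found.
find_in_frames <- function(remDr, xpath) {
  remDr$switchToFrame(NULL)  # start from the top-level document
  hits <- remDr$findElements(using = "xpath", xpath)
  if (length(hits) > 0) return(hits)
  frames <- remDr$findElements(using = "css selector", "frame, iframe")
  for (fr in frames) {
    remDr$switchToFrame(NULL)
    remDr$switchToFrame(fr)
    hits <- remDr$findElements(using = "xpath", xpath)
    if (length(hits) > 0) return(hits)
  }
  remDr$switchToFrame(NULL)  # nothing found: return to the top level
  list()
}
```

With the session above, this would be used as `links <- find_in_frames(remDrv, xpath_by_text("Excel CSV"))` followed by `links[[1]]$clickElement()`.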

Rselenium - Save page as

My goal is to download an image from a URL. In my case I can't use download.file because the picture is on a web page that requires login, and some JavaScript runs in the background before the real image becomes visible. This is why I need to use the RSelenium package.
As suggested here, I've built a Docker container with the standalone-chrome tag. Output from the Docker terminal:
$ docker-machine ip
192.168.99.100
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c651dab3a948 selenium/standalone-chrome:3.4.0 "/opt/bin/entry_po..." 24 hours ago Up 24 hours 0.0.0.0:4445->4444/tcp cranky_kalam
Here's what I've tried:
require(RSelenium)
# Prevent the download prompt from popping up and set the default download folder
eCaps <- list(
chromeOptions =
list(prefs = list(
"profile.default_content_settings.popups" = 0L,
"download.prompt_for_download" = FALSE,
"download.default_directory" = "C:/temp/Pictures"
)
)
)
# Open connection
remDr <- remoteDriver(remoteServerAddr = "192.168.99.100",port = 4445L,browserName="chrome",extraCapabilities = eCaps)
remDr$open()
# Navigate to desired URL with picture
url <- "https://www.google.be/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"
remDr$navigate(url)
remDr$screenshot(display = TRUE) # Everything looks fine here
# Move mouse to the page's center
webElem <- remDr$findElement(using = 'xpath',value = '/html/body')
remDr$mouseMoveToLocation(webElement = webElem)
# Right-click to open the context menu
remDr$click(2)
remDr$screenshot(display = TRUE) # I don't see the right-click dialog!
# Try to move through the context menu to 'Save as' or 'Save image as'
remDr$sendKeysToActiveElement(list(key = 'down_arrow',
key = 'down_arrow',
key = 'enter'))
### NOTHING HAPPENS
I've tried varying the number of key = 'down_arrow' entries, and every time I look in C:/temp/Pictures nothing has been saved.
Please note that this is just an example and I know I could have downloaded this picture with download.file. I need a solution with RSelenium for my real case.
I tried using remDr$click(buttonId = 2) to perform a right-click, but to no avail: the browser's native context menu is not part of the page, so WebDriver key events do not reach it. One workaround is to extract the image link from the page and download it with download.file.
library(rvest)  # read_html(), html_nodes(), html_attr(); also re-exports %>%
# navigate
url <- "https://www.google.be/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"
remDr$navigate(url)
# get the link of the image
link <- remDr$getPageSource()[[1]] %>%
  read_html() %>% html_nodes('img') %>%
  html_attr('src')
link
# [1] "https://www.google.be/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"
# download with download.file into the current working directory
download.file(link, basename(url), method = 'curl')
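Because the real image sits behind a login, a plain `download.file()` call would not carry the browser's session. One option, sketched here under the assumption that the site authenticates with cookies, is to copy the cookies from the RSelenium session into a `Cookie` request header. The `headers` argument of `download.file()` needs R >= 3.6.0 and the `"libcurl"` method; `cookie_header()` and `download_with_session()` are hypothetical helpers.

```r
# Hypothetical helper: turn the list returned by remDr$getAllCookies()
# into a single "name=value; name=value" Cookie header value.
cookie_header <- function(cookies) {
  pairs <- vapply(cookies, function(ck) paste0(ck$name, "=", ck$value), character(1))
  paste(pairs, collapse = "; ")
}

# Hypothetical helper: download `url` reusing the browser session's cookies.
# Assumes `remDr` is an open RSelenium remoteDriver that is already logged in.
download_with_session <- function(remDr, url, destfile) {
  hdr <- cookie_header(remDr$getAllCookies())
  headers <- if (nzchar(hdr)) c(Cookie = hdr) else NULL
  download.file(url, destfile, method = "libcurl", mode = "wb", headers = headers)
}
```

Some sites also check the user agent or a token in the page, which this sketch does not handle.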

Rselenium - How to disable images in Firefox profile

How can image downloading be disabled when using Firefox in RSelenium? I want to see whether doing so makes a scraping script faster.
I've read the RSelenium package manual, including the sections on getFirefoxProfile and makeFirefoxProfile.
I've found this link that shows how to handle chromedriver.
I can disable images for a Firefox instance that I open manually in Windows 10, but RSelenium does not appear to use that same profile.
Previously you would only need to set the appropriate preference (in this case permissions.default.image); however, there is now an issue with Firefox resetting this value, see:
https://github.com/seleniumhq/selenium/issues/2171
A workaround is given here:
https://github.com/gempesaw/Selenium-Remote-Driver/issues/248
Implementing this in RSelenium:
library(RSelenium)
fprof <- makeFirefoxProfile(list(permissions.default.image = 2L,
browser.migration.version = 9999L))
rD <- rsDriver(browser = "firefox", extraCapabilities = fprof)
remDr <- rD$client
remDr$navigate("http://www.google.com/ncr")
remDr$screenshot(display = TRUE)
# clean up
rm(rD)
gc()
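With geckodriver-based setups, the same preference can also be passed through the `moz:firefoxOptions` capability instead of a profile. This is a sketch assuming geckodriver honours the `prefs` entry; whether Firefox still resets the value, as in the issue linked above, has not been checked here. `no_image_caps()` is a hypothetical helper.

```r
# Hypothetical helper: capabilities asking geckodriver to start Firefox with
# images blocked (permissions.default.image: 1 = allow all, 2 = block all).
no_image_caps <- function() {
  list(`moz:firefoxOptions` = list(
    prefs = list(`permissions.default.image` = 2L)
  ))
}

# Usage (needs RSelenium and a working Firefox/geckodriver setup):
# rD <- RSelenium::rsDriver(browser = "firefox", extraCapabilities = no_image_caps())
```

A screenshot after navigating, as in the answer above, is a quick way to confirm whether images were actually blocked.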

Webscrape w/ Rselenium and Rvest from dropdown box where id changes

I am looking to scrape some NBA data from the website numberFire at: https://www.numberfire.com/nba/daily-fantasy/daily-basketball-projections
I am trying to use a drop-down box to switch the displayed data from FanDuel to DraftKings. The first problem I encountered is that the web page does not change with changes to that pull-down menu. I installed Selenium and am running it successfully to get around this. However, the next problem is that the id for this pull-down menu (and the id for every pull-down menu) on this site changes with each refresh. This causes a "NoSuchElement" error in R, because Selenium cannot locate the menu box when it loads the page.
Is there a way to fix this with RSelenium or another package?
Here is my code in R:
require(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "192.168.99.100", port = 4445, browserName = "chrome")
remDr$open()
remDr$navigate("https://www.numberfire.com/nba/daily-fantasy/daily-basketball-projections")
iframe <- remDr$findElement(using='id', value="select2-dy8e-container")
remDr$switchToFrame(iframe)
option <- remDr$findElement(using = 'xpath', "//*/option[@value = 'DraftKings']")
option$clickElement()
option
Update: after a lot of searching on non-static IDs, I came up with this, and it worked:
remDr <- remoteDriver(remoteServerAddr = "192.168.99.100", port = 4445, browserName = "chrome")
remDr$open()
remDr$navigate("https://www.numberfire.com/nba/daily-fantasy/daily-basketball-projections")
webElem <- remDr$findElement('xpath', '//*[(@class = "dropdown-custom dfs-option select2-hidden-accessible")]/option[@value = "4"]')
webElem$clickElement()
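The working XPath avoids the generated id and matches the element's full `class` string instead. A slightly more robust variant matches a single class token, so the selector keeps working if other classes are added or reordered. This is a sketch: `option_xpath()` is a hypothetical helper, and it assumes the classed element is a native `<select>`.

```r
# Hypothetical helper: XPath for an <option> inside a <select> identified by one
# CSS class token instead of its regenerated id. The concat() trick matches the
# whole token, so "dfs-option" does not also match e.g. "dfs-option-extra".
option_xpath <- function(select_class, option_value) {
  sprintf(paste0("//select[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
                 "/option[@value = '%s']"),
          select_class, option_value)
}

# Usage with an open remoteDriver `remDr`:
# webElem <- remDr$findElement("xpath", option_xpath("dfs-option", "4"))
# webElem$clickElement()
```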

Retrieve data from a web page table using RSelenium

I am trying to scrape the annual maximum flow data from this National River Flow Archive (UK) website:
http://nrfa.ceh.ac.uk/data/station/info/69032
using RSelenium.
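I can't find a way to handle the drop-down menu. At present I can semi-automate the process using:
library(RSelenium)
library(XML)  # htmlParse(), readHTMLTable()
# checkForServer() and startServer() come from older RSelenium releases
checkForServer()
startServer()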
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444, browserName = "firefox", platform = "LINUX")
remDr$open()
i <- "69032"
remDr$navigate(paste0("http://nrfa.ceh.ac.uk/data/station/peakflow/", i))
# read the raw html and parse
doc<-htmlParse(remDr$getPageSource()[[1]])
peak.flows <- as.numeric(readHTMLTable(doc)$tablesorter[, "Flow (m3/s)"])
This is a bit of a hack, and it involves me clicking a few buttons on the page myself rather than having RSelenium do it. Any suggestions as to how RSelenium can select the "Peak flow data" tab and then the "Maximum Annual (AMAX) data" option from the drop-down menu?
library(RSelenium)
checkForServer()
startServer()
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444, browserName = "firefox", platform = "LINUX")
remDr$open()
i <- "69032"
remDr$navigate(paste0("http://nrfa.ceh.ac.uk/data/station/peakflow/", i))
remDr$findElement(using="css selector",'.selected a')$clickElement()
Sys.sleep(5)
remDr$findElement(using = "css selector", "#selectDataType")$clickElement()
remDr$findElement(using = "css selector", "#selectDataType")$sendKeysToElement(list(key="down_arrow", key="enter"))
Sys.sleep(2)
To find the CSS selector of the element of interest, install the SelectorGadget extension in Chrome, highlight the element you want RSelenium to click, and copy the CSS selector it shows.
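The fixed `Sys.sleep()` calls either waste time or fail when the page loads slowly. A polling wait retries the lookup until it succeeds or a timeout expires. `wait_for()` is a hypothetical helper; it relies on RSelenium's `findElement()` raising an R error when nothing matches.

```r
# Hypothetical helper: call `f()` repeatedly until it returns without error,
# or stop once `timeout` seconds have passed.
wait_for <- function(f, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (Sys.time() >= deadline) {
      stop("Timed out after ", timeout, " s: ", conditionMessage(result))
    }
    Sys.sleep(interval)
  }
}
```

For the drop-down above, `sel <- wait_for(function() remDr$findElement(using = "css selector", "#selectDataType"))` would replace the `Sys.sleep(5)` before the element is used.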
