How can image downloading be disabled when using Firefox in RSelenium? I want to see whether doing so makes a scraping script faster.
I've read the RSelenium package manual, including the sections on getFirefoxProfile and makeFirefoxProfile.
I've found this link that shows how to handle chromedriver.
I can disable images for a Firefox instance that I open manually in Windows 10, but RSelenium does not appear to use that same profile.
Previously you only needed to set the appropriate preference (in this case permissions.default.image). However, there is now an issue with Firefox resetting this value; see:
https://github.com/seleniumhq/selenium/issues/2171
A workaround is given here:
https://github.com/gempesaw/Selenium-Remote-Driver/issues/248
Implementing this in RSelenium:
library(RSelenium)
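# permissions.default.image = 2 blocks image loading; the migration version
# setting is the workaround that stops Firefox from resetting the preference
fprof <- makeFirefoxProfile(list(permissions.default.image = 2L,
                                 browser.migration.version = 9999L))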
rD <- rsDriver(browser = "firefox", extraCapabilities = fprof)
remDr <- rD$client
remDr$navigate("http://www.google.com/ncr")
remDr$screenshot(display = TRUE)
# clean up
rm(rD)
gc()
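To check whether blocking images actually speeds things up, one option is to time the same page load in a session started with the profile above and in one started without it. The following is only a rough sketch: time_navigation is a hypothetical helper (not part of RSelenium), and it assumes that the navigate call returns once the page has finished loading.

```r
# Hypothetical helper: median elapsed seconds over several loads of `url`.
# `navigate` is any function that takes a URL, e.g. remDr$navigate.
time_navigation <- function(navigate, url, reps = 3L) {
  elapsed <- vapply(seq_len(reps), function(i) {
    system.time(navigate(url))[["elapsed"]]
  }, numeric(1))
  median(elapsed)
}
```

Calling time_navigation(remDr$navigate, "http://www.google.com/ncr") once per session gives two numbers to compare. Network noise and caching can easily dominate on a small page, so a heavier, image-rich page is probably a better test.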
Related
I want to take a screenshot of an entire webpage using RSelenium. I have this code working:
library(RSelenium)
driver <- rsDriver(browser = "firefox")
remdriv <- driver$client
remdriv$navigate("https://stackoverflow.com/questions/73115385/counting-all-elements-row-wise-in-list-column")
remdriv$screenshot(file = "post.png")
But when I run this I get a screenshot of exactly what the driver's browser is showing, like this:
What I want is the full-length screenshot of the entire webpage. What can I do to capture that within RSelenium or another R tool?
In the end I want it to look like this:
I think you have to scroll down, take multiple screenshots, and then combine them into a single image. I haven't managed to zoom out yet, which would be another option.
The scrolling doesn't work perfectly, and you still need to add the loop and finish the script, but I hope this is a useful starting point:
# Load packages
if (!require(pacman)) {install.packages("pacman")}
pacman::p_load(here, RSelenium, stringr)
# Stop currently running server
if(exists("rD")) suppressWarnings(rD$server$stop())
# Load RSelenium
rD <- rsDriver(browser = "chrome", chromever = "106.0.5249.61", port = 4567L)
remDr <- rD[["client"]]
link <- "https://stackoverflow.com/questions/73115385/counting-all-elements-row-wise-in-list-column"
# rsDriver() already opens a browser session, so no extra remDr$open() is needed
remDr$navigate(link)
# Get browser height and width
browser_height <- remDr$executeScript("return document.body.offsetHeight;")[[1]]
browser_width <- remDr$executeScript("return document.body.offsetWidth;")[[1]]
remDr$getWindowSize()
remDr$setWindowSize(browser_width, browser_height)
remDr$getWindowSize()
# This is what actually can be displayed
browser_final_height <- remDr$getWindowSize()$height # 1175
# Get inner window height and width
inner_window_height <- remDr$executeScript("return window.innerHeight")[[1]]
inner_window_width <- remDr$executeScript("return window.innerWidth")[[1]]
# Check how many times the inner window fits into the full document height
num_screen <- (browser_height / inner_window_height)
# Move to top of window (scrollTo works regardless of the current scroll position)
remDr$executeScript("window.scrollTo(0, 0);")
# Scroll down (loop from here to end)
remDr$executeScript(str_c("window.scrollBy(0, ", inner_window_height, ");"))
# Take screenshot
remDr$screenshot(file = here("results", "screenshots", "ex2.png"))
# Close server
remDr$close()
rD$server$stop()
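The missing loop could look roughly like the sketch below. Both scroll_offsets and capture_tiles are hypothetical helpers, not RSelenium functions. The sketch assumes remDr is an open client that has already navigated to the page, and that document.body.scrollHeight is a reasonable measure of the page height, which is not true for every site.

```r
# Pure helper: vertical scroll positions needed to cover a page that is
# `total_height` pixels tall with a viewport of `viewport_height` pixels.
scroll_offsets <- function(total_height, viewport_height) {
  stopifnot(total_height > 0, viewport_height > 0)
  seq(0, total_height - 1, by = viewport_height)
}

# Hypothetical wrapper: scroll through the page and save one PNG per position.
# Returns the file paths in top-to-bottom order.
capture_tiles <- function(remDr, out_dir = tempdir()) {
  total <- remDr$executeScript("return document.body.scrollHeight;")[[1]]
  view  <- remDr$executeScript("return window.innerHeight;")[[1]]
  offsets <- scroll_offsets(total, view)
  files <- file.path(out_dir, sprintf("tile_%03d.png", seq_along(offsets)))
  for (i in seq_along(offsets)) {
    remDr$executeScript(sprintf("window.scrollTo(0, %d);", as.integer(offsets[i])))
    Sys.sleep(0.5)  # give lazily loaded content a moment to render
    remDr$screenshot(file = files[i])
  }
  files
}
```

If the magick package is installed, the tiles can then be stacked with magick::image_append(magick::image_read(files), stack = TRUE). Note that the last tile usually overlaps the previous one, because the browser cannot scroll past the bottom of the page, so it may need cropping before stitching.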
According to this answer https://stackoverflow.com/a/72793082/2554330, there are some bugs in the latest version of chromedriver that have been fixed in the version that works with Google Chrome Beta, so I'd like to try the beta.
This answer https://stackoverflow.com/a/65975577/2554330 shows how to run Google Chrome Beta from Javascript. I'd like to do the same from RSelenium, but I can't spot an equivalent of chrome_options.binary_location.
How do I specify the Chrome location when using RSelenium?
Try the following code:
cPath <- "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe"
ecap <- list(chromeOptions = list("binary" = cPath))
remDr <- remoteDriver(browserName = "chrome", extraCapabilities = ecap)
remDr$open()
Note that the startServer() function is now defunct.
This code was taken from this comment by the author of RSelenium.
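One caveat, as far as I understand it: newer chromedriver releases expect the vendor-prefixed key "goog:chromeOptions" rather than "chromeOptions". A small helper makes it easy to try either key. chrome_binary_caps is a hypothetical name used only for illustration, and which key is needed depends on the chromedriver version in use.

```r
# Hypothetical helper: build an extraCapabilities list that points Chrome
# at a custom binary. `key` is "chromeOptions" (as in the code above) or
# "goog:chromeOptions" (assumed to be what newer chromedriver versions expect).
chrome_binary_caps <- function(binary, key = "chromeOptions") {
  caps <- list(list(binary = binary))
  names(caps) <- key
  caps
}
```

It would be used as remoteDriver(browserName = "chrome", extraCapabilities = chrome_binary_caps(cPath)), or with key = "goog:chromeOptions" if the first form has no effect.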
Try this. The chromedriver is wherever you placed it, and the beta browser is wherever it was installed. It's been a long time since I used R/Selenium, so the slashes may be the wrong way round.
require(RSelenium)
RSelenium::startServer(args = c("-Dwebdriver.chrome.driver=C:\\Users\\me\\Documents\\chromedriver.exe")
, log = FALSE, invisible = FALSE)
remDr <- remoteDriver(
browserName = "chrome",
extraCapabilities = list("chrome.binary" = "C:\\Program Files\\ChromeBeta\\chrome.exe")
)
remDr$open()
head(remDr$sessionInfo)
I have been trying to get the "Excel CSV" element on a web page using remDrv$findElements in R, but have not been able to. How can I locate the element using the xpath, css, or other arguments?
I tried:
library(RSelenium)
test_link="https://sinca.mma.gob.cl/cgi-bin/APUB-MMA/apub.htmlindico2.cgi?page=pageFrame&header=Talagante&macropath=./RM/D28/Cal/PM25&macro=PM25.horario.horario&from=080522&to=210909&"
rD <- rsDriver(port=4446L, browser = "firefox", chromever = "92.0.4515.107") # starts a Firefox browser; wait for the necessary files to download
remDrv <- rD$client
#remDrv$open(silent = TRUE)
url<-test_link
remDrv$navigate(url)
remDrv$findElements(using = "xpath", "/html/body/table/tbody/tr/td/table[2]/tbody/tr[1]/td/label/span[3]/a")
link: https://sinca.mma.gob.cl/cgi-bin/APUB-MMA/apub.htmlindico2.cgi?page=pageFrame&header=Talagante&macropath=./RM/D28/Cal/PM25&macro=PM25.horario.horario&from=080522&to=210909&
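One thing that may be worth trying is locating the link by its visible text instead of by an absolute path, which breaks as soon as the table layout changes. The sketch below is untested against this page. link_xpath and find_link are hypothetical helpers, and it assumes that the link text is exactly "Excel CSV" and that the link sits in the top-level document. If it is inside a frame, remDrv$switchToFrame would have to be called first.

```r
# Hypothetical helper: XPath matching an <a> element by its visible text.
# Assumes `text` contains no single quotes.
link_xpath <- function(text) {
  sprintf("//a[normalize-space(.) = '%s']", text)
}

# Hypothetical wrapper: `remDrv` is the client from the question.
# Returns a list of matching elements (empty if nothing was found).
find_link <- function(remDrv, text) {
  remDrv$findElements(using = "xpath", value = link_xpath(text))
}
```

An empty list from find_link(remDrv, "Excel CSV") would suggest that the element is in a frame or has not loaded yet. RSelenium also accepts using = "link text" and using = "partial link text", which avoid XPath altogether.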
I use RSelenium to run a scraping loop which sometimes (infrequently) meets an error and then stops.
The problem for me is that when this happens and I don't check in on the RSelenium session for a while (for like half an hour or so..?), the RSelenium session closes automatically, which removes logs from the session that I want to check.
How can I stop this from happening -- or more precisely, how can I prevent the RSelenium session (and the Firefox browser opened from RSelenium) from closing when left idle for an extended time period?
The following is how I start the scraping -- I open the Firefox browser like this, then go to the URL that I want and then start scraping.
library(RSelenium)
# Running with the browser open ------------------------------------------------
rD <- RSelenium::rsDriver(port = 4454L, browser = "firefox")
remDr <- rD$client
remDr$open()
P.S. Just to clarify, it's okay that the scraping stops once in a while -- that's how I can check for loopholes that I am missing. What I need is a way for me to stop the RSelenium session from closing when left idle. Thank you in advance for any help you can give!
I found a similar issue: https://github.com/ropensci/RSelenium/issues/241
chrome_prefs =
list(
# chrome prefs
"profile.default_content_settings.popups" = 0L,
"download.prompt_for_download" = FALSE
)
chrome_args =
c(
# chrome command arguments
'--headless',
'--window-size=1200,1800',
'-sessionTimeout 57868143'
)
eCaps_notimeout =
list(chromeOptions =
list(
prefs = chrome_prefs,
args = chrome_args
))
remDr <- remoteDriver(
browserName = "chrome",
extraCapabilities = eCaps_notimeout
)
For further reference, see: Is there a way to prevent Selenium from automatically terminating idle sessions?
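Independently of the timeout, it may help to write errors to a file on disk as they happen, so that the information survives even if the session is closed later. This is a minimal sketch rather than a fix for the idle timeout itself. log_step is a hypothetical helper, and the log file path is whatever you choose.

```r
# Hypothetical helper: run one scraping step; if it fails, append the error
# message with a timestamp to `log_file`, then re-raise the error so the
# loop still stops as before.
log_step <- function(step, log_file) {
  tryCatch(
    step(),
    error = function(e) {
      line <- sprintf("%s ERROR: %s",
                      format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                      conditionMessage(e))
      cat(line, "\n", sep = "", file = log_file, append = TRUE)
      stop(e)
    }
  )
}
```

Inside the loop, each iteration would be wrapped as log_step(function() scrape_one(url), "scrape_errors.log"), where scrape_one stands in for your own scraping function. The script still halts on the first error, but the reason is already saved to disk.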
So I'm not 100% sure this is possible, but I found good solutions in Ruby and in Python, so I was wondering whether something similar might work in R.
Basically, given a URL, I want to render that URL, take a screenshot of the rendering as a .png, and save the screenshot to a specified folder. I'd like to do all of this on a headless linux server.
Is my best solution here going to be running system calls to a tool like CutyCapt, or does there exist an R-based toolset that will help me solve this problem?
You can take screenshots using Selenium:
library(RSelenium)
rD <- rsDriver(browser = "phantomjs")
remDr <- rD[['client']]
remDr$navigate("http://www.r-project.org")
remDr$screenshot(file = tf <- tempfile(fileext = ".png"))
shell.exec(tf) # on windows
remDr$close()
rD$server$stop()
In earlier versions, you were able to do:
library(RSelenium)
startServer()
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("http://www.r-project.org")
remDr$screenshot(file = tf <- tempfile(fileext = ".png"))
shell.exec(tf) # on windows
I haven't tested it, but this open source project seems to do exactly that: https://github.com/wch/webshot
It is as easy as:
library(webshot)
webshot("https://www.r-project.org/", "r.png")