I'm not 100% sure this is possible, but I found a good solution in Ruby and in Python, so I was wondering if something similar might work in R.
Basically, given a URL, I want to render that URL, take a screenshot of the rendering as a .png, and save the screenshot to a specified folder. I'd like to do all of this on a headless Linux server.
Is my best solution here going to be running system calls to a tool like CutyCapt, or does there exist an R-based toolset that will help me solve this problem?
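The system-call fallback I have in mind looks roughly like this (an untested sketch; it assumes CutyCapt and xvfb-run are installed and that the binary is named cutycapt, which may differ depending on how it was packaged):
save_page_png <- function(url, out_dir, name = "capture.png") {
  out_file <- file.path(out_dir, name)
  # xvfb-run provides a virtual display so CutyCapt can render on a headless box
  system2("xvfb-run", args = c("cutycapt",
                               paste0("--url=", url),
                               paste0("--out=", out_file)))
  out_file
}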
You can take screenshots using Selenium:
library(RSelenium)

# PhantomJS is a headless browser, so this also works on a headless Linux server
rD <- rsDriver(browser = "phantomjs")
remDr <- rD[['client']]

remDr$navigate("http://www.r-project.org")
remDr$screenshot(file = tf <- tempfile(fileext = ".png"))
shell.exec(tf) # on windows

# clean up
remDr$close()
rD$server$stop()
In earlier versions, you were able to do:
library(RSelenium)
startServer()
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("http://www.r-project.org")
remDr$screenshot(file = tf <- tempfile(fileext = ".png"))
shell.exec(tf) # on windows
I haven't tested it, but this open-source project seems to do exactly that: https://github.com/wch/webshot
It is as easy as:
library(webshot)
webshot("https://www.r-project.org/", "r.png")
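webshot uses PhantomJS under the hood, so it also works on a headless server. A couple of optional arguments are worth knowing; a small sketch, where the delay and viewport values are just examples:
# install the PhantomJS binary once, if it isn't already on the PATH
# webshot::install_phantomjs()

library(webshot)

# wait 2 seconds for the page to render, and set the viewport size
webshot("https://www.r-project.org/", "r.png",
        delay = 2, vwidth = 1200, vheight = 900)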
Related
I want to scrape US apps from the Play Store, but I am in Brazil.
How can I fake my location using R? I am using Firefox.
This is my code:
urls <- c('https://play.google.com/store/apps/collection/cluster?clp=0g4jCiEKG3RvcHNlbGxpbmdfZnJlZV9BUFBMSUNBVElPThAHGAM%3D:S:ANO1ljKs-KA&gsr=CibSDiMKIQobdG9wc2VsbGluZ19mcmVlX0FQUExJQ0FUSU9OEAcYAw%3D%3D:S:ANO1ljL40zU',
'https://play.google.com/store/apps/collection/cluster?clp=0g4jCiEKG3RvcHNlbGxpbmdfcGFpZF9BUFBMSUNBVElPThAHGAM%3D:S:ANO1ljLdnoU&gsr=CibSDiMKIQobdG9wc2VsbGluZ19wYWlkX0FQUExJQ0FUSU9OEAcYAw%3D%3D:S:ANO1ljIKVpg',
'https://play.google.com/store/apps/collection/cluster?clp=0g4fCh0KF3RvcGdyb3NzaW5nX0FQUExJQ0FUSU9OEAcYAw%3D%3D:S:ANO1ljLe6QA&gsr=CiLSDh8KHQoXdG9wZ3Jvc3NpbmdfQVBQTElDQVRJT04QBxgD:S:ANO1ljKx5Ik',
'https://play.google.com/store/apps/collection/cluster?clp=0g4cChoKFHRvcHNlbGxpbmdfZnJlZV9HQU1FEAcYAw%3D%3D:S:ANO1ljJ_Y5U&gsr=Ch_SDhwKGgoUdG9wc2VsbGluZ19mcmVlX0dBTUUQBxgD:S:ANO1ljL4b8c',
'https://play.google.com/store/apps/collection/cluster?clp=0g4cChoKFHRvcHNlbGxpbmdfcGFpZF9HQU1FEAcYAw%3D%3D:S:ANO1ljLtt38&gsr=Ch_SDhwKGgoUdG9wc2VsbGluZ19wYWlkX0dBTUUQBxgD:S:ANO1ljJCqyI',
'https://play.google.com/store/apps/collection/cluster?clp=0g4YChYKEHRvcGdyb3NzaW5nX0dBTUUQBxgD:S:ANO1ljLhYwQ&gsr=ChvSDhgKFgoQdG9wZ3Jvc3NpbmdfR0FNRRAHGAM%3D:S:ANO1ljIKta8')
library(RSelenium)  # remoteDriver()
library(rvest)      # read_html(), html_nodes(), html_text(), %>%

flw_rk <- vector("list", length(urls))
df_total_rk <- data.frame()

selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.firefox.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE)

remDr <- remoteDriver(port = 4567L, browserName = "firefox")
remDr$open()

for (i in urls) {
  remDr$navigate(i)
  for (j in 1:5) {
    remDr$executeScript(paste("scroll(0,", j * 10000, ");"))
    Sys.sleep(3)
  }
  html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()
  names <- html_obj %>% html_nodes(".WsMG1c.nnK0zc") %>% html_text()
  flw_rk[[i]] <- data.frame(names = names, stringsAsFactors = FALSE)
}
Just use a Virtual Private Network (VPN). No need for over-complicated solutions. I found one that is free and works best for me. Here's the link to the Google Play Store app:
https://play.google.com/store/apps/details?id=free.vpn.unblock.proxy.turbovpn
Also, you could try downloading a VPN extension from the Mozilla Add-ons store. Here's the link:
https://addons.mozilla.org/en-US/firefox/addon/setupvpn/
EDIT
This add-on will work for an unlimited amount of time. This is what I think will be the best choice for you now.
https://addons.mozilla.org/en-US/firefox/addon/touch-vpn/?src=search
You could just add gl=us at the end of the URL:
https://play.google.com/store/apps/collection/cluster?clp=0g4YChYKEHRvcGdyb3NzaW5nX0dBTUUQBxgD:S:ANO1ljLhYwQ&gsr=ChvSDhgKFgoQdG9wZ3Jvc3NpbmdfR0FNRRAHGAM%3D:S:ANO1ljIKta8&gl=us
This is how we solved the location issue when scraping the Play Store at SerpApi.
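In R that is just a string append; a minimal sketch using the urls vector from the question:
# append the country parameter to every URL before navigating
urls_us <- paste0(urls, "&gl=us")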
If you are using Linux, you can spoof your location by using a proxy.
To use a proxy on Linux (Debian/Ubuntu), do the following steps:
1. Type sudo apt-get install proxychains
2. Type proxychains <path to code>
Please note these steps are specific to Debian and Ubuntu, but the same can be done on other Linux distributions using their package managers.
If you are using Windows, try the Tor Browser, which is based on Firefox. The Tor Browser automatically sets up multiple proxies for you. However, Tor is better suited to browsing than to technical (code) solutions.
Another, more flexible Windows alternative for technical (code) solutions is Proxifier.
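If you would rather keep everything inside R, you can also point RSelenium's Firefox instance at a proxy through a profile. A rough sketch, assuming you have access to a US HTTP proxy at proxy.example.com:8080 (hypothetical host and port); the network.proxy.* names are standard Firefox preferences:
library(RSelenium)

# 1 = manual proxy configuration
fprof <- makeFirefoxProfile(list(
  "network.proxy.type"      = 1L,
  "network.proxy.http"      = "proxy.example.com",  # hypothetical US proxy host
  "network.proxy.http_port" = 8080L,
  "network.proxy.ssl"       = "proxy.example.com",
  "network.proxy.ssl_port"  = 8080L
))

rD <- rsDriver(browser = "firefox", extraCapabilities = fprof)
remDr <- rD$client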
I'm trying to get a file by web scraping, but I can't find the "a" tag and "href" attribute. The link is here. Maybe the Data/File Paths above the button can help, but I don't know how to use it. What should I do to achieve my aim?
If anyone is wondering how to do this with RSelenium, they can run the following code:
library(RSelenium)

driver <- rsDriver(browser = "chrome", port = 80L, chromever = "83.0.4103.39")
rmDr <- driver[["client"]]
rmDr$navigate("https://www.borsaistanbul.com/en/data/data/debt-securities-market-data/market-data")

# find the download link by its id and click it
download <- rmDr$findElement(using = "id", value = "TextContent_C001_lbtnIslemGorenTahvilVeBonolaraIliskinBilgiler")
download$clickElement()
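The click sends the file to the browser's default download folder. To send it somewhere specific, you can pass Chrome preferences through extraCapabilities; a sketch assuming the standard download.default_directory Chrome preference (the path is just an example):
eCaps <- list(chromeOptions = list(
  prefs = list(
    "download.default_directory"   = "C:/temp/borsa",  # example target folder
    "download.prompt_for_download" = FALSE
  )
))

driver <- rsDriver(browser = "chrome", port = 80L, chromever = "83.0.4103.39",
                   extraCapabilities = eCaps)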
How can image downloading be disabled when using Firefox in RSelenium? I want to see if doing so makes a scraping script faster.
I've read the RSelenium package manual, including the sections on getFirefoxProfile and makeFirefoxProfile.
I've found this link that shows how to handle chromedriver.
I can disable images for a Firefox instance that I manually open in Windows 10, but RSelenium does not appear to use that same profile.
Previously you would need to set the appropriate preference (in this case permissions.default.image); however, there is now an issue with Firefox resetting this value, see:
https://github.com/seleniumhq/selenium/issues/2171
A workaround is given at:
https://github.com/gempesaw/Selenium-Remote-Driver/issues/248
Implementing this in RSelenium:
library(RSelenium)
# permissions.default.image = 2 blocks all image loading
fprof <- makeFirefoxProfile(list(permissions.default.image = 2L,
                                 browser.migration.version = 9999L))
rD <- rsDriver(browser = "firefox", extraCapabilities = fprof)
remDr <- rD$client
remDr$navigate("http://www.google.com/ncr")
remDr$screenshot(display = TRUE)
# clean up
rm(rD)
gc()
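To check whether disabling images actually speeds things up, you can time the same navigation once with this profile and once with a default one; a quick sketch:
# rough timing check: compare this against a run without fprof
elapsed <- system.time(remDr$navigate("http://www.google.com/ncr"))["elapsed"]
elapsed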
I'm trying to automate browsing on a site with RSelenium in order to retrieve the latest planned release dates. My problem is that an age check pops up when I visit the URL. The age-check page consists of two buttons, which I haven't succeeded in clicking through RSelenium. The code that I have used so far is appended below; what is the solution to this problem?
#Variable and URL
s4 <- "https://www.systembolaget.se"
#Start Server
rd <- rsDriver()
remDr <- rd[["client"]]
#Load Page
remDr$navigate(s4)
webE <- remDr$findElements("class name", "action")
webE$isElementEnabled()
webE$clickElement()
You need to target the selector more accurately:
#Variable and URL
s4 <- "https://www.systembolaget.se"
#Start Server
rd <- rsDriver()
remDr <- rd[["client"]]
#Load Page
remDr$navigate(s4)
webE <- remDr$findElement("css", "#modal-agecheck .action.primary")
webE$clickElement()
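If the modal is injected a moment after the page loads, the findElement call can fire too early. Giving the driver an implicit wait (or a short Sys.sleep) before looking for the button usually helps; a small sketch:
# wait up to 5 seconds for elements to appear before failing
remDr$setImplicitWaitTimeout(milliseconds = 5000)

webE <- remDr$findElement("css", "#modal-agecheck .action.primary")
webE$clickElement()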
I think this can be done, but I do not know if the functionality exists. I have searched the internet and Stack Overflow high and low and cannot find anything. I'd like to save www.espn.com as an image to a certain folder on my computer at a certain time of day. Is this possible? Any help would be very much appreciated.
Selenium allows you to do this. See http://johndharrison.github.io/RSelenium/ . DISCLAIMER: I am the author of the RSelenium package. The image can be exported as a base64-encoded PNG. As an example:
# RSelenium::startServer() # start a selenium server if required
require(RSelenium)
require(RCurl) # base64Decode() comes from RCurl

remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://espn.go.com/")
# remDr$screenshot(display = TRUE) # to display image

tmp <- paste0(tempdir(), "/tmpScreenShot.png")
base64png <- remDr$screenshot()
writeBin(base64Decode(base64png, "raw"), tmp)
The png will be saved to the file given at tmp.
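In current versions you can also write the PNG straight to a file with the file argument (as in the first answer above); combined with a timestamped name, this covers the "certain folder at a certain time of day" part, with the scheduling itself done outside R (e.g. cron or Task Scheduler). A sketch, where the folder path is just an example:
out_dir <- "C:/screenshots"  # example folder
out_file <- file.path(out_dir, paste0("espn_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".png"))
remDr$screenshot(file = out_file)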
A basic vignette on operation can be viewed in the "RSelenium basics" vignette, and there is also "RSelenium: Testing Shiny apps".