How to download a file from a website using web scraping in R?

I'm trying to download a file via web scraping, but I can't find an "a" tag with an "href" attribute. The link is here. Maybe the Data/File Paths button above can help, but I don't know how to use it. What should I do to achieve my aim?

If anyone is wondering how to run this with RSelenium, they can use the following code:
library(RSelenium)

# start a Chrome session on the given port
driver <- rsDriver(browser = "chrome", port = 80L, chromever = "83.0.4103.39")
rmDr <- driver[["client"]]

# open the market data page and click the download link by its id
rmDr$navigate("https://www.borsaistanbul.com/en/data/data/debt-securities-market-data/market-data")
download <- rmDr$findElement(using = "id", value = "TextContent_C001_lbtnIslemGorenTahvilVeBonolaraIliskinBilgiler")
download$clickElement()
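If clicking the link opens a save dialog instead of writing the file to disk, one option is to preconfigure the browser's download behaviour when starting the driver. A minimal sketch, assuming Chrome is used; the download folder below is a placeholder, not taken from the original question:
library(RSelenium)

# sketch: tell Chrome where to save files and not to prompt
# (the folder path is a placeholder)
eCaps <- list(chromeOptions = list(
  prefs = list(
    "download.default_directory" = "C:/temp/downloads",
    "download.prompt_for_download" = FALSE
  )
))

driver <- rsDriver(browser = "chrome", port = 80L, chromever = "83.0.4103.39",
                   extraCapabilities = eCaps)
rmDr <- driver[["client"]]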

Related

Extract reviews from Free Tours websites

My intention is to extract the reviews of the free tours that appear on these pages:
Guruwalks (https://www.guruwalk.com/es/walks/39405-free-tour-malaga-con-guias-profesionales)
Freetour.com (https://www.freetour.com/es/budapest/free-tour-budapest-imperial)
I'm working with R on Windows, but when using RSelenium it gives me an error.
My initial code is:
# Loading the required packages
library(rvest)
library(magrittr)  # for the '%>%' pipe operator
library(RSelenium) # to get the fully loaded html of the page
library(purrr)     # for 'map_chr' to get the replies

df_0 <- data.frame(tour = character(),
                   dates = character(),
                   names = character(),
                   starts = character(),
                   reviews = character())

url_google <- list("https://www.guruwalk.com/es/walks/39405-free-tour-malaga-con-guias-profesionales")

for (apps in url_google) {
  # Specifying the url of the website to be scraped
  url <- apps

  # starting local RSelenium (this is the only way to start RSelenium that works for me at the moment)
  selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
  shell(selCommand, wait = FALSE, minimized = TRUE)
  remDr <- remoteDriver(port = 4567L, browserName = "firefox")
  remDr$open()

  # go to the website
  remDr$navigate(url)
The error is:
Error: Summary: SessionNotCreatedException
Detail: A new session could not be created.
Further Details: run errorDetails method
How can I solve it? Thank you
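SessionNotCreatedException usually points to a browser/driver version mismatch or to a Selenium server already bound to the port. A minimal sketch of an alternative startup, assuming Firefox and Java are installed and the port is free, is to let rsDriver fetch matching driver binaries and start the server itself:
library(RSelenium)

# let rsDriver download a matching geckodriver/Selenium jar and start the server
rD <- rsDriver(browser = "firefox", port = 4568L, verbose = FALSE)
remDr <- rD$client
remDr$navigate("https://www.guruwalk.com/es/walks/39405-free-tour-malaga-con-guias-profesionales")

# clean up when finished
remDr$close()
rD$server$stop()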

problem finding element on web page using RSelenium

I have been trying to get the "Excel CSV" element on a web page using remDrv$findElements in R, but have not been able to. How could I locate the element using the xpath, css, or other selector arguments?
I tried:
library(RSelenium)
test_link="https://sinca.mma.gob.cl/cgi-bin/APUB-MMA/apub.htmlindico2.cgi?page=pageFrame&header=Talagante&macropath=./RM/D28/Cal/PM25&macro=PM25.horario.horario&from=080522&to=210909&"
rD <- rsDriver(port = 4446L, browser = "firefox", chromever = "92.0.4515.107") # starts a browser session; wait for the necessary driver files to download
remDrv <- rD$client
#remDrv$open(silent = TRUE)
url<-test_link
remDrv$navigate(url)
remDrv$findElements(using = "xpath", "/html/body/table/tbody/tr/td/table[2]/tbody/tr[1]/td/label/span[3]/a")
link: https://sinca.mma.gob.cl/cgi-bin/APUB-MMA/apub.htmlindico2.cgi?page=pageFrame&header=Talagante&macropath=./RM/D28/Cal/PM25&macro=PM25.horario.horario&from=080522&to=210909&
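Since the page is assembled from frames (the URL itself carries page=pageFrame), the export link may not be reachable from the top-level document, and an absolute XPath is brittle. A sketch of one way to look for it; the frame index and the "Excel" link text are assumptions, not verified against the live page:
# the export link may sit inside a frame, so switch into it first
frames <- remDrv$findElements(using = "tag name", "frame")
if (length(frames) > 0) remDrv$switchToFrame(frames[[1]])

# then search by partial link text instead of a long absolute xpath
csv_link <- remDrv$findElements(using = "partial link text", "Excel")
if (length(csv_link) > 0) csv_link[[1]]$clickElement()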

How can I fake my location in Firefox using R?

I want to scrape US apps from the Play Store, but I am in Brazil.
How can I fake my location using R? I am using Firefox.
This is my code:
library(RSelenium) # remoteDriver
library(rvest)     # read_html, html_nodes, html_text
library(magrittr)  # the '%>%' pipe
urls <- c('https://play.google.com/store/apps/collection/cluster?clp=0g4jCiEKG3RvcHNlbGxpbmdfZnJlZV9BUFBMSUNBVElPThAHGAM%3D:S:ANO1ljKs-KA&gsr=CibSDiMKIQobdG9wc2VsbGluZ19mcmVlX0FQUExJQ0FUSU9OEAcYAw%3D%3D:S:ANO1ljL40zU',
'https://play.google.com/store/apps/collection/cluster?clp=0g4jCiEKG3RvcHNlbGxpbmdfcGFpZF9BUFBMSUNBVElPThAHGAM%3D:S:ANO1ljLdnoU&gsr=CibSDiMKIQobdG9wc2VsbGluZ19wYWlkX0FQUExJQ0FUSU9OEAcYAw%3D%3D:S:ANO1ljIKVpg',
'https://play.google.com/store/apps/collection/cluster?clp=0g4fCh0KF3RvcGdyb3NzaW5nX0FQUExJQ0FUSU9OEAcYAw%3D%3D:S:ANO1ljLe6QA&gsr=CiLSDh8KHQoXdG9wZ3Jvc3NpbmdfQVBQTElDQVRJT04QBxgD:S:ANO1ljKx5Ik',
'https://play.google.com/store/apps/collection/cluster?clp=0g4cChoKFHRvcHNlbGxpbmdfZnJlZV9HQU1FEAcYAw%3D%3D:S:ANO1ljJ_Y5U&gsr=Ch_SDhwKGgoUdG9wc2VsbGluZ19mcmVlX0dBTUUQBxgD:S:ANO1ljL4b8c',
'https://play.google.com/store/apps/collection/cluster?clp=0g4cChoKFHRvcHNlbGxpbmdfcGFpZF9HQU1FEAcYAw%3D%3D:S:ANO1ljLtt38&gsr=Ch_SDhwKGgoUdG9wc2VsbGluZ19wYWlkX0dBTUUQBxgD:S:ANO1ljJCqyI',
'https://play.google.com/store/apps/collection/cluster?clp=0g4YChYKEHRvcGdyb3NzaW5nX0dBTUUQBxgD:S:ANO1ljLhYwQ&gsr=ChvSDhgKFgoQdG9wZ3Jvc3NpbmdfR0FNRRAHGAM%3D:S:ANO1ljIKta8')
flw_rk <- vector("list", length(urls))
df_total_rk <- data.frame()

selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.firefox.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE)
remDr <- remoteDriver(port = 4567L, browserName = "firefox")
remDr$open()

for (i in urls) {
  remDr$navigate(i)

  # scroll down in steps so lazily loaded apps get rendered
  for (j in 1:5) {
    remDr$executeScript(paste("scroll(0,", j * 10000, ");"))
    Sys.sleep(3)
  }

  # parse the rendered page and collect the app names
  html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()
  names <- html_obj %>% html_nodes(".WsMG1c.nnK0zc") %>% html_text()
  flw_rk[[i]] <- data.frame(names = names, stringsAsFactors = FALSE)
}
Just use a Virtual Private Network (VPN). No need for over-complicated solutions. I found one that is free and works best for me. Here's the link to the Google Play Store app:
https://play.google.com/store/apps/details?id=free.vpn.unblock.proxy.turbovpn
Also, you could try downloading a VPN extension from the Mozilla Add-ons store. Here's the link:
https://addons.mozilla.org/en-US/firefox/addon/setupvpn/
EDIT
This add-on will work for an unlimited amount of time, so I think it will be the best choice for you now.
https://addons.mozilla.org/en-US/firefox/addon/touch-vpn/?src=search
You could just add gl=us at the end of the URL:
https://play.google.com/store/apps/collection/cluster?clp=0g4YChYKEHRvcGdyb3NzaW5nX0dBTUUQBxgD:S:ANO1ljLhYwQ&gsr=ChvSDhgKFgoQdG9wZ3Jvc3NpbmdfR0FNRRAHGAM%3D:S:ANO1ljIKta8&gl=us
This is how we solved the location issue when scraping the Play Store at SerpApi.
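Following that suggestion in R is a one-liner over the urls vector from the question (a sketch; it simply appends the parameter to each URL):
# append the country override to every Play Store URL
urls_us <- paste0(urls, "&gl=us")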
If you are using Linux you can spoof your location by using a proxy. To use a proxy on Linux (Debian/Ubuntu), do the following:
1. Type sudo apt-get install proxychains
2. Type proxychains <path to code>
Please note these steps are specific to Debian and Ubuntu, but the same can be done on other Linux distributions with their package managers.
If you are using Windows, try the Tor Browser, which is based on Firefox. Tor Browser automatically sets up multiple proxies for you. However, Tor is better suited to browsing than to technical (code) solutions.
Another, more flexible Windows alternative for technical (code) solutions is Proxifier.
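If you already have a proxy endpoint, it can also be wired directly into the RSelenium Firefox profile rather than at the operating-system level. A minimal sketch; the proxy host and port below are placeholders, not a working service:
library(RSelenium)

# placeholder proxy endpoint -- replace with a real host/port
proxy_host <- "my.proxy.example.com"
proxy_port <- 8080L

# build a Firefox profile that routes http/https traffic through the proxy
fprof <- makeFirefoxProfile(list(
  "network.proxy.type"      = 1L,         # 1 = manual proxy configuration
  "network.proxy.http"      = proxy_host,
  "network.proxy.http_port" = proxy_port,
  "network.proxy.ssl"       = proxy_host,
  "network.proxy.ssl_port"  = proxy_port
))

rD <- rsDriver(browser = "firefox", extraCapabilities = fprof)
remDr <- rD$client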

RSelenium - How to disable images in Firefox profile

How can image downloading be disabled when using Firefox in Rselenium? I want to see if doing so makes a scraping script faster.
I've read the RSelenium package manual, including the sections on getFirefoxProfile & makeFirefoxProfile.
I've found this link that shows how to handle chromedriver.
I can disable images for a Firefox instance that I manually open in Windows 10 but Rselenium does not appear to use that same profile.
Previously you would need to set the appropriate preference (in this case permissions.default.image); however, there is now an issue with Firefox resetting this value, see:
https://github.com/seleniumhq/selenium/issues/2171
A workaround is given here:
https://github.com/gempesaw/Selenium-Remote-Driver/issues/248
Implementing this in RSelenium:
library(RSelenium)
fprof <- makeFirefoxProfile(list(permissions.default.image = 2L,
                                 browser.migration.version = 9999L))
rD <- rsDriver(browser = "firefox", extraCapabilities = fprof)
remDr <- rD$client
remDr$navigate("http://www.google.com/ncr")
remDr$screenshot(display = TRUE)
# clean up
rm(rD)
gc()
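The question also links to a chromedriver approach; a rough Chrome equivalent would be to disable images through Chrome's content settings (a sketch, assuming the chromeOptions prefs are accepted by your chromedriver version):
library(RSelenium)

# ask Chrome not to load images via its content settings
eCaps <- list(chromeOptions = list(
  prefs = list("profile.managed_default_content_settings.images" = 2L)
))

rD <- rsDriver(browser = "chrome", extraCapabilities = eCaps)
remDr <- rD$client
remDr$navigate("http://www.google.com/ncr")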

Automatically take and download a screenshot in Shiny [duplicate]

So I'm not 100% sure this is possible, but I found a good solution in Ruby and in Python, so I was wondering if something similar might work in R.
Basically, given a URL, I want to render that URL, take a screenshot of the rendering as a .png, and save the screenshot to a specified folder. I'd like to do all of this on a headless linux server.
Is my best solution here going to be running system calls to a tool like CutyCapt, or does there exist an R-based toolset that will help me solve this problem?
You can take screenshots using Selenium:
library(RSelenium)
rD <- rsDriver(browser = "phantomjs")
remDr <- rD[['client']]
remDr$navigate("http://www.r-project.org")
remDr$screenshot(file = tf <- tempfile(fileext = ".png"))
shell.exec(tf) # on windows
remDr$close()
rD$server$stop()
In earlier versions, you were able to do:
library(RSelenium)
startServer()
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("http://www.r-project.org")
remDr$screenshot(file = tf <- tempfile(fileext = ".png"))
shell.exec(tf) # on windows
I haven't tested it, but this open source project seems to do exactly that: https://github.com/wch/webshot
It is as easy as:
library(webshot)
webshot("https://www.r-project.org/", "r.png")
