Specify download folder in RSelenium - r

I am using RSelenium to navigate to a webpage that contains a button to download a file. I use RSelenium to click this button, which downloads the file. However, the file is by default saved to my 'Downloads' folder, whereas I want it to be downloaded to my working directory. I tried specifying a Chrome profile as below, but this did not seem to do the job:
wd <- getwd()
cprof <- getChromeProfile(wd, "Profile 1")
remDr <- remoteDriver(browserName= "chrome", extraCapabilities = cprof)
The file is still downloaded in the folder 'downloads', rather than my working directory. How can this be solved?

The solution involves setting the appropriate chromeOptions outlined at https://sites.google.com/a/chromium.org/chromedriver/capabilities. Here is an example on a Windows 10 box:
library(RSelenium)
eCaps <- list(
  chromeOptions = list(
    prefs = list(
      "profile.default_content_settings.popups" = 0L,
      "download.prompt_for_download" = FALSE,
      "download.default_directory" = "C:/temp/chromeDL"
    )
  )
)
rD <- rsDriver(extraCapabilities = eCaps)
remDr <- rD$client
remDr$navigate("http://www.colorado.edu/conflict/peace/download/")
firstzip <- remDr$findElement("xpath", "//a[contains(@href, 'zip')]")
firstzip$clickElement()
> list.files("C:/temp/chromeDL")
[1] "peace.zip"

I've been trying the alternatives, and it seems that @Bharath's first comment is right: give up on fiddling with the prefs (it doesn't seem possible to do that) and instead move the file from the default download folder to the desired folder. The trick to making this a portable solution is finding where the default download directory is, which of course varies by OS (you can get it as shown below), and you also need the user's username:
desired_dir <- "~/Desktop/cool_downloads"
file_name <- "whatever_I_downloaded.zip"
# build path to chrome's default download directory
if (Sys.info()[["sysname"]] == "Linux") {
  default_dir <- file.path("", "home", Sys.info()[["user"]], "Downloads")   # /home/<user>/Downloads
} else {
  default_dir <- file.path("", "Users", Sys.info()[["user"]], "Downloads")  # /Users/<user>/Downloads (macOS)
}
# move the file to the desired directory
file.rename(file.path(default_dir, file_name), file.path(desired_dir, file_name))
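One caveat: clicking a download link in Selenium returns immediately, so the file may not exist yet when file.rename() runs. A minimal sketch of a polling wait (the 30-second timeout is an arbitrary assumption):
# wait for the browser to finish writing the file before moving it
wait_for_download <- function(path, timeout = 30) {
  for (i in seq_len(timeout)) {
    if (file.exists(path)) return(TRUE)
    Sys.sleep(1)
  }
  FALSE
}
if (wait_for_download(file.path(default_dir, file_name))) {
  file.rename(file.path(default_dir, file_name), file.path(desired_dir, file_name))
}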

Consider this alternative approach. Note that it assumes your download folder is empty, so that everything in it gets moved:
# List the files inside the folder
down.list <- list.files(path = "E:/Downloads/", all.files = TRUE, recursive = FALSE)
# Move all files to a specific folder
file.rename(from = paste0("E:/Downloads/", down.list), to = paste0("E:/1/scrape/", down.list))
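If the download folder cannot be emptied first, one option is to move only the files that appeared after the download started. A sketch, assuming you record a timestamp just before clicking the download button:
# record the time just before triggering the download
started_at <- Sys.time()
# ... trigger the download with RSelenium here ...
# move only files created or modified after that point
down.list <- list.files("E:/Downloads/", full.names = TRUE)
new.files <- down.list[file.mtime(down.list) > started_at]
file.rename(from = new.files, to = file.path("E:/1/scrape", basename(new.files)))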

Related

R pagedown/shinyapps error - Google Chrome cannot be found

I've been trying to use the pagedown package in a Shiny app to create a PDF from an HTML report, since the PDF is better formatted and more presentable. It tested fine locally on my machine. After deployment, the document failed to download with the following error in the logs:
Warning: Error in find_chrome: Cannot find Chromium or Google Chrome
[No stack trace available]
I am using Google Chrome to view the app but I am guessing there's more to it than this?
The code for the download is as follows:
output$downloadDrug <- downloadHandler(
  filename = function() {"drug-instructions.pdf"},
  content = function(file) {
    src <- normalizePath('report_drugHTML.Rmd')
    src2 <- normalizePath("printout.css")
    label <- normalizePath("pxLabel.png")
    logo <- normalizePath("logo.png")
    fasting <- normalizePath("fasting.png")
    owd <- setwd(tempdir())
    on.exit(setwd(owd))
    file.copy(src, 'report_drugHTML.Rmd', overwrite = TRUE)
    file.copy(src2, "printout.css", overwrite = TRUE)
    file.copy(label, "pxLabel.png", overwrite = TRUE)
    file.copy(logo, "logo.png", overwrite = TRUE)
    file.copy(fasting, "fasting.png", overwrite = TRUE)
    library(rmarkdown)
    out <- render('report_drugHTML.Rmd',
                  params = list(name = input$px_name, dob = input$dob),
                  'html_document')
    library(pagedown)
    out <- pagedown::chrome_print(out, "drug-instructions.pdf")
    file.rename(out, file)
  }
)
Any suggestions would be greatly appreciated. I am still a novice so please feel free to point out the obvious.
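One thing that may be worth checking (an assumption on my part, not a confirmed fix): pagedown::chrome_print() needs a Chrome or Chromium executable on the machine running the app, and a hosted server may not have one it can find automatically. If a browser binary is available on the server, chrome_print() can be pointed at it explicitly via its browser argument; the path below is only a placeholder:
library(pagedown)
# sketch: tell chrome_print() where Chrome/Chromium lives on the server
out <- pagedown::chrome_print(out, "drug-instructions.pdf",
                              browser = "/usr/bin/chromium-browser")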

Web scraping PDF files from a map

I've been trying to download PDFs embedded in a map following this code (the original can be found here). Each PDF refers to a Brazilian municipality (5,570 files).
library(XML)
library(RCurl)
url <- "http://simec.mec.gov.br/sase/sase_mapas.php?uf=RJ&tipoinfo=1"
page <- getURL(url)
parsed <- htmlParse(page)
links <- xpathSApply(parsed, path="//a", xmlGetAttr, "href")
inds <- grep("*.pdf", links)
links <- links[inds]
regex_match <- regexpr("[^/]+$", links, perl=TRUE)
destination <- regmatches(links, regex_match)
for (i in seq_along(links)) {
  download.file(links[i], destfile = destination[i])
  Sys.sleep(runif(1, 1, 5))
}
I have already used this code in a few other projects and it worked; for this specific case, it doesn't. In fact, I've tried many things to scrape these files, but it seems impossible to me. Recently, I got the following link, which makes it possible to combine uf (state) and muncod (municipal code) to download the file, but I don't know how to incorporate this into the code:
http://simec.mec.gov.br/sase/sase_mapas.php?uf=MT&muncod=5100102&acao=download
Thanks in advance!
devtools::install_github("ropensci/RSelenium")
library(rvest)
library(httr)
library(RSelenium)
# connect to selenium server from within r (REPLACE SERVER ADDRESS)
rem_dr <- remoteDriver(
  remoteServerAddr = "192.168.50.25", port = 4445L, browserName = "firefox"
)
rem_dr$open()
# get the two-digit state codes for brazil by scraping the below webpage
tables <- "https://en.wikipedia.org/wiki/States_of_Brazil" %>%
read_html() %>%
html_table(fill = T)
states <- tables[[4]]$Abbreviation
# for each state, we are going to go navigate to the map of that state using
# selenium, then scrape the list of possible municipality codes from the drop
# down menu present in the map
get_munip_codes <- function(state) {
  url <- paste0("http://simec.mec.gov.br/sase/sase_mapas.php?uf=", state)
  rem_dr$navigate(url)
  # have to wait until the drop down menu loads. 8 seconds will be enough time
  # for each state
  Sys.sleep(8)
  src <- rem_dr$getPageSource()
  out <- read_html(src[[1]]) %>%
    html_nodes(xpath = "//select[@id='muncod']/option[boolean(@value)]") %>%
    xml_attrs("value") %>%
    unlist(use.names = F)
  print(state)
  out
}
state_munip <- sapply(
  states, get_munip_codes, USE.NAMES = TRUE, simplify = FALSE
)
# now you can download each pdf. first create a directory for each state, where
# the pdfs for that state will go:
lapply(names(state_munip), function(x) dir.create(file.path("brazil-pdfs", x)))
# ...then loop over each state/municipality code and download the pdf
lapply(
  names(state_munip), function(state) {
    lapply(state_munip[[state]], function(munip) {
      url <- sprintf(
        "http://simec.mec.gov.br/sase/sase_mapas.php?uf=%s&muncod=%s&acao=download",
        state, munip
      )
      file <- file.path("brazil-pdfs", state, paste0(munip, ".pdf"))
      this_one <- paste0("state ", state, ", munip ", munip)
      tryCatch({
        GET(url, write_disk(file, overwrite = TRUE))
        print(paste0(this_one, " downloaded"))
      },
      error = function(e) {
        print(paste0("couldn't download ", this_one))
        try(unlink(file, force = TRUE))
      })
    })
  }
)
STEPS:
1. Get the IP address of your Windows machine (see https://www.digitalcitizen.life/find-ip-address-windows).
2. Start the Selenium server docker container by running this:
docker run -d -p 4445:4444 selenium/standalone-firefox:2.53.1
3. Start the rocker/tidyverse docker container by running this:
docker run -v `pwd`/brazil-pdfs:/home/rstudio/brazil-pdfs -dp 8787:8787 rocker/tidyverse
4. Go into your preferred browser and enter this address: http://localhost:8787 ...This will take you to the login screen for RStudio Server. Log in using the username "rstudio" and password "rstudio".
5. Copy/paste the code shown above into a new RStudio .R document. Replace the value for remoteServerAddr with the IP address you found in step 1.
6. Run the code...this should write the pdfs to a directory "brazil-pdfs" that is both inside the container and mapped to your Windows machine (in other words, the pdfs will show up in the brazil-pdfs dir on your local machine as well). Note that it takes a while to run because there are a lot of pdfs.

Download.File Issue with xls Excel Workbook

I'm trying to download an Excel workbook (.xls) using R's download.file function (Windows 10, R version 3.4.4 (2018-03-15)).
When I download the file manually (using Internet Explorer or Chrome), the file downloads and I can open it in Excel without any problems.
When I use download.file in R, the file downloads but is smaller than the correctly downloaded file - it is an HTML file with a note that my browser is not supported. I tried different modes with no luck.
My code:
download.file(
  url = "https://www.atsenergo.ru/nreport?fid=696C3DB7A3F6019EE053AC103C8C8733",
  destfile = "C:/MyExcel.xls",
  mode = "wb",
  method = "auto"
)
I solved this problem with the RSelenium library. The ATS site rejects any direct download request (it returns an .html file with a 'Required javascript enabled' message), so in this case only the Selenium approach works. My code is below (where urlList is a data frame with the file download links):
rD <- rsDriver(port = 4444L,
               browser = "chrome",
               check = FALSE,
               geckover = NULL,
               iedrver = NULL,
               phantomver = NULL)
remDr <- rD$client
for (i in 1:nrow(urlList)) {
  tryCatch({
    row <- urlList[i, ]
    remDr$navigate(row$url)
    webElem <- remDr$findElement(using = 'link text', row$FileName)
    webElem$clickElement()
  },
  error = function(e)
    logerror(paste(substr(e, 1, 50),
                   atsCode,
                   dateFileName,
                   sep = "\t"),
             logger = loggerName),
  finally = next)
}
remDr$close()
# stop the selenium server
rD[["server"]]$stop()

Always set working directory to Dropbox folder on any machine

I wanted to have a simple code at the beginning of my scripts to set the working directory to my Dropbox folder, regardless of which machine I run my code on:
setdir <- function() {
  wandir <- paste(path.expand("~"), "/Dropbox/_R", sep = "")
  curdir <- getwd()
  if (curdir != wandir) {
    setwd(wandir)
  }
}
setdir()
The trick with the path.expand("~") works on Linux machines, but it doesn't on Windows machines, because it leads to C:/Users/username/Documents instead of C:/Users/username/. Is there a function that would work globally?
Here is a hacky workaround, which is far from a global one:
setdir <- function() {
  wandir <- paste(path.expand("~"), "/Dropbox/_R", sep = "")
  wandir <- sub("/Documents", "", wandir)
  curdir <- getwd()
  if (curdir != wandir) {
    setwd(wandir)
  }
}
setdir()
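A somewhat more explicit alternative is to branch on the operating system and build the path from the Windows USERPROFILE environment variable instead of stripping "/Documents". A sketch, assuming Dropbox sits directly under the user's home/profile folder on every machine:
# build the Dropbox path per OS instead of patching path.expand("~")
dropbox_dir <- function() {
  home <- if (.Platform$OS.type == "windows") {
    Sys.getenv("USERPROFILE")   # C:/Users/<username> on Windows
  } else {
    path.expand("~")            # /home/<username> or /Users/<username>
  }
  file.path(home, "Dropbox", "_R")
}
setwd(dropbox_dir())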

R & figshare: corrupt zip when downloaded in R

I'm trying to make my research reproducible storing the data at figshare.
Something strange happens when I download and unzip the data in R.
here is the zip
If I download it manually, it opens fine; but when I try to get it with an R script, the downloaded archive is corrupt. Any ideas where the problem is?
the code to reproduce my error
url1 <- 'https://ndownloader.figshare.com/files/4797355'
path1 <- 'test/missing_data_raw.zip'
ifelse(file.exists(path1), yes = 'file already exists', no = download.file(url1, path1))
unzip(zipfile = path1, exdir = 'test')
Try setting the download mode to binary explicitly:
url <- 'https://ndownloader.figshare.com/files/4797217'
path1 <- tempfile(fileext = ".zip")
if (file.exists(path1)) 'file already exists' else download.file(url, path1, mode = "wb")
unzip(zipfile = path1, exdir = tempdir())
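As an optional sanity check (not part of the original answer), you can list the archive's contents before extracting; a corrupt download will typically fail at this step rather than during extraction:
# list the archive contents; an error or warning here indicates a corrupt download
unzip(zipfile = path1, list = TRUE)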
