I'm trying to download an Excel workbook (.xls) using R's download.file function (Windows 10, R version 3.4.4 (2018-03-15)).
When I download the file manually (using Internet Explorer or Chrome), it downloads and I can open it in Excel without any problems.
When I use download.file in R, the file downloads, but it is smaller than the correct file: it is actually an HTML file with a note that my browser is not supported. I tried different modes with no luck.
My code:
download.file(
  url = "https://www.atsenergo.ru/nreport?fid=696C3DB7A3F6019EE053AC103C8C8733",
  destfile = "C:/MyExcel.xls",
  mode = "wb",
  method = "auto"
)
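One way to confirm that the saved file is an HTML page rather than a workbook is to look at its first bytes. The sketch below is only an illustration: the helper names looks_like_html and looks_like_xls are made up for this example, and the check relies on legacy .xls files starting with the OLE2 compound-document signature.

```r
# Return TRUE when the first bytes of a file look like an HTML page.
looks_like_html <- function(path, n = 512L) {
  bytes <- readBin(path, what = "raw", n = n)
  bytes <- bytes[bytes != as.raw(0)]  # rawToChar() rejects embedded NULs
  head_text <- rawToChar(bytes)
  grepl("<!doctype html|<html", head_text, ignore.case = TRUE, useBytes = TRUE)
}

# Return TRUE when the file starts with the OLE2 signature used by legacy .xls files.
looks_like_xls <- function(path) {
  ole2 <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  identical(readBin(path, what = "raw", n = length(ole2)), ole2)
}
```

Calling looks_like_html("C:/MyExcel.xls") after the download would show whether the server sent a web page instead of the workbook.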
I solved this problem with the RSelenium library. The ATS site rejects any direct request to download the file (it returns an .html file with a message that JavaScript must be enabled), so in this case only the Selenium approach works. My code is below (where urlList is a data frame with the file download links):
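library(RSelenium)
library(logging)  # assumed source of logerror()

rD <- rsDriver(port = 4444L,
               browser = "chrome",
               check = FALSE,
               geckover = NULL,
               iedrver = NULL,
               phantomver = NULL)
remDr <- rD$client
for (i in seq_len(nrow(urlList))) {
  tryCatch({
    row <- urlList[i, ]
    remDr$navigate(row$url)
    # find the download link by its visible text and click it
    webElem <- remDr$findElement(using = 'link text', row$FileName)
    webElem$clickElement()
  },
  # log the error and carry on with the next link
  # (atsCode, dateFileName and loggerName are not defined in this snippet)
  error = function(e)
    logerror(paste(
      substr(conditionMessage(e), 1, 50),
      atsCode,
      dateFileName,
      sep = "\t"
    ), logger = loggerName))
}
remDr$close()
# stop the selenium server
rD[["server"]]$stop()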
Related
I am trying to run an RSelenium instance to download some pdf files for me without having to click on the dialog boxes (or having them open in pdfjs).
But even if I set my configurations, the Firefox instance still loads the default profile.
RSelenium version: 1.73
Firefox version: 56.0 (32-bit)
Windows: 7 Ultimate
Create profile and start server:
library(RSelenium)
library(rvest)
library(XML)
library(stringi)
cprof <- makeFirefoxProfile(list(
pdfjs.disabled = TRUE,
plugin.scan.plid.all = FALSE,
plugin.scan.Acrobat = "99.0",
browser.helperApps.neverAsk.saveToDisk = 'application/pdf',
browser.download.dir = "C:\\temp")
)
remDr <- rsDriver(port = 4477L, browser = "firefox", check = FALSE, extraCapabilities = cprof)
remDr <- remDr[["client"]]
After Firefox launches I check the configs, but the settings have remained in their default state.
I need to download some csv files from "http://www.elections.state.md.us".
And here is my code.
library(XML)      # getHTMLLinks()
library(stringr)  # str_detect(), str_c()
library(plyr)     # l_ply()

url <- "http://www.elections.state.md.us/elections/2012/election_data/index.html"
# recognize the links
links <- getHTMLLinks(url)
filenames <- links[str_detect(links, "_General.csv")]
filenames_list <- as.list(filenames)
filenames
# create a function
downloadcsv <- function(filename, baseurl, folder) {
  dir.create(folder, showWarnings = FALSE)
  fileurl <- str_c(baseurl, filename)
  if (!file.exists(str_c(folder, "/", filename))) {
    download.file(fileurl,
                  destfile = str_c(folder, "/", filename))
    # 1 sec delay between files
    Sys.sleep(1)
  }
}
l_ply(filenames_list, downloadcsv,
      baseurl = "www.elections.state.md.us/elections/2012/election_data/",
      folder = "elec12_maryland")
The error comes out as :
Error in download.file(fileurl, destfile = str_c(folder, "/",
filename)) : scheme not supported in URL 'www.elections.state.md.us/elections/2012/election_data/State_Congressional_Districts_2012_General.csv'
However, when I paste the url into IE, it works. So what is the problem with my code?
Any ideas would be helpful, thanks.
It turns out that the url must start with a scheme such as http://, https://, ftp:// or file://. So in the last line, I changed the code to
l_ply(filenames_list, downloadcsv,
      baseurl = "http://www.elections.state.md.us/elections/2012/election_data/",
      folder = "elec12_maryland")
And it works.
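To guard against this kind of mistake, the URL can be checked for a scheme before it reaches download.file. This is a minimal sketch; ensure_scheme is a hypothetical helper, and falling back to http:// is an assumption that happens to suit this site.

```r
# Prepend a default scheme to any URL that does not already start with one.
ensure_scheme <- function(url, default = "http://") {
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*://", url)
  ifelse(has_scheme, url, paste0(default, url))
}

baseurl <- ensure_scheme("www.elections.state.md.us/elections/2012/election_data/")
```

URLs that already carry http://, https://, ftp:// or file:// are passed through unchanged.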
My download code stopped working because it no longer passes "extraCapabilities" properly.
This is what used to work:
require(RSelenium)
require(XML)
require(data.table)
source(file.path(find.package("RSelenium"), "examples/serverUtils/checkForServer.r"))
source(file.path(find.package("RSelenium"), "examples/serverUtils/startServer.r"))
checkForServer();
server<-startServer()
referencedirectory <- "d://temp"
fprof <- makeFirefoxProfile(list(
  browser.download.dir = referencedirectory,
  browser.download.folderList = 2L,
  browser.download.manager.showWhenStarting = FALSE,
  browser.helperApps.neverAsk.saveToDisk = "text/xml",
  browser.tabs.remote.autostart = FALSE,
  browser.tabs.remote.autostart.2 = FALSE,
  browser.tabs.remote.desktopbehavior = FALSE
))
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444, browserName = "firefox",extraCapabilities = fprof)
remDr$open()
Now it throws an error:
Selenium message:Profile has been set on both the capabilities and these options, but they're different. Unable to determine which one you want to use.
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
I have tried an alternative:
rD <- rsDriver(port = 4444L, browser = "firefox", version = "latest", geckover = "0.15.0", iedrver = NULL, phantomver = "2.1.1",
verbose = TRUE, check = TRUE, extraCapabilities = fprof)
That produces the same error in addition to complaining (these complaints do not result in an error by themselves):
Selenium message:wrong number of arguments
If extraCapabilities are removed, the above code executes, but if you then try:
rD <- rsDriver(port = 4446L, browser = "firefox", version = "latest", geckover = "0.15.0", iedrver = NULL, phantomver = "2.1.1",
verbose = TRUE, check = TRUE)
remDr <- rD[["client"]]
fprof <- makeFirefoxProfile(list(browser.download.dir = "D:/temp"))
remDr <- remoteDriver(extraCapabilities = fprof)
remDr$open()
You get the same error after the last line. rsDriver opens a browser, but that browser does not have any of the desired properties. If you close the browser (without closing the server) before trying to assign remDr and open it, you will still get the same error.
I have tried version 13, 14, and 15 of the driver and the Server 3.1.0, with the same result.
I have found the line in Java that is throwing the error, but I cannot figure out how to pass a different Firefox profile than the one that gets automatically generated behind the scenes. I have tried various versions of "Profile"/"requiredProfile"/"FirefoxProfile ", etc., but that does not get recognized as a valid input... I also see some discussion of how it may be done in Java, but not in R.
The code used to work for me until about 36 hours ago, and I have been trying to find a way out ever since. I am now at a complete loss.
UPDATE: the setup is very sensitive to the combination of versions. The brand new Selenium server version (3.3.1) works with Gecko 0.15.0 and Firefox 52. Some other combinations may work, but most do not.
Also, you need to be careful when setting the folder location string. In most contexts within R, the forward slash, /, is OS-neutral, so I use it most of the time on both UNIX and Windows. However, when setting browser.download.dir on Windows, one apparently has to use the (escaped) backslash, \\. Otherwise the directory assignment appears to work, but the files do not actually end up in that directory.
Finally, the recommended approach with rsDriver works AND the approach with the defunct functions (checkForServer() and startServer()) also works again. Lesson to be learned: do not be unlucky like me in choosing the moment to update your Selenium code.
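The backslash requirement for browser.download.dir can be handled once, so the rest of the script keeps using forward slashes. The helper below is only a sketch; to_windows_path is a name made up for this example.

```r
# Convert forward slashes to backslashes for settings that expect Windows-style paths.
to_windows_path <- function(path) chartr("/", "\\", path)

download_dir <- to_windows_path("D:/temp")

# Preference list that could then be handed to RSelenium::makeFirefoxProfile().
firefox_prefs <- list(
  browser.download.dir = download_dir,
  browser.download.folderList = 2L
)
```

Printing download_dir shows the backslash doubled because print() escapes it; cat(download_dir) shows the actual string.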
It appears to be an issue with geckodriver(0.15.0)/selenium(3.3.0). I used the following:
library(RSelenium)
referencedirectory <- "c://temp"
fprof <- makeFirefoxProfile(list(
  browser.download.dir = referencedirectory,
  browser.download.folderList = 2L,
  browser.download.manager.showWhenStarting = FALSE,
  browser.helperApps.neverAsk.saveToDisk = "text/xml",
  browser.tabs.remote.autostart = FALSE,
  browser.tabs.remote.autostart.2 = FALSE,
  browser.tabs.remote.desktopbehavior = FALSE
))
rD <- rsDriver(port = 4444L, browser = "firefox", version = "3.1.0", geckover = "0.14.0", iedrver = NULL, phantomver = "2.1.1",
verbose = TRUE, check = TRUE, extraCapabilities = fprof)
which appeared to function correctly. As noted in the documentation, I would advise using a Docker image to run the Selenium Server if possible, which prevents issues with incompatible browser/driver versions.
Update:
There is an updated version of selenium server which should now address this issue:
rD <- rsDriver(port = 4444L, browser = "firefox", version = "3.3.1", geckover = "0.15.0",
verbose = TRUE, check = TRUE, extraCapabilities = fprof)
It seems that you don't really need makeFirefoxProfile.
The code is indeed very simple:
remDr <- rsDriver(browser = browserName, extraCapabilities = list(acceptInsecureCerts = TRUE, acceptUntrustedCerts = TRUE))
I'm trying to make my research reproducible by storing the data on figshare.
Something strange happens when I download and unzip the data in R.
Here is the zip.
If I download it manually, it opens fine, but when I try to get it with an R script, the downloaded archive is corrupt. Any ideas where the problem is?
The code to reproduce my error:
url <- 'https://ndownloader.figshare.com/files/4797355'
path <- 'test/missing_data_raw.zip'
ifelse(file.exists(path), yes = 'file already exists', no = download.file(url, path))
unzip(zipfile = path, exdir = 'test')
Try setting the download mode to binary explicitly:
url <- 'https://ndownloader.figshare.com/files/4797217'
path1 <- tempfile(fileext = ".zip")
if (file.exists(path1)) 'file already exists' else download.file(url, path1, mode = "wb")
unzip(zipfile = path1,exdir = tempdir())
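The same idea can be wrapped in a small function so that binary mode is never forgotten. This is a sketch rather than a definitive implementation; download_binary is a hypothetical name, and it only relies on download.file returning 0 on success.

```r
# Download a file in binary mode, skipping the download if the target already exists.
download_binary <- function(url, destfile, overwrite = FALSE) {
  if (file.exists(destfile) && !overwrite) {
    return(invisible(destfile))
  }
  # mode = "wb" keeps the bytes untouched, which matters on Windows
  status <- download.file(url, destfile, mode = "wb", quiet = TRUE)
  if (status != 0) {
    stop("download failed with status ", status)
  }
  invisible(destfile)
}
```

For example, download_binary(url, path1) followed by unzip(zipfile = path1, exdir = tempdir()) mirrors the two lines above.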
I am using RSelenium to navigate to a webpage which contains a button to download a file. I use RSelenium to click this button, which downloads the file. However, the files are downloaded to my 'downloads' folder by default, whereas I want the file to be downloaded to my working directory. I tried specifying a chrome profile as below, but this did not seem to do the job:
wd <- getwd()
cprof <- getChromeProfile(wd, "Profile 1")
remDr <- remoteDriver(browserName= "chrome", extraCapabilities = cprof)
The file is still downloaded in the folder 'downloads', rather than my working directory. How can this be solved?
The solution involves setting the appropriate chromeOptions outlined at https://sites.google.com/a/chromium.org/chromedriver/capabilities . Here is an example on a windows 10 box:
library(RSelenium)
eCaps <- list(
  chromeOptions =
    list(prefs = list(
      "profile.default_content_settings.popups" = 0L,
      "download.prompt_for_download" = FALSE,
      "download.default_directory" = "C:/temp/chromeDL"
    )
    )
)
rD <- rsDriver(extraCapabilities = eCaps)
remDr <- rD$client
remDr$navigate("http://www.colorado.edu/conflict/peace/download/")
# click the first link whose href contains 'zip'
firstzip <- remDr$findElement("xpath", "//a[contains(@href, 'zip')]")
firstzip$clickElement()
> list.files("C:/temp/chromeDL")
[1] "peace.zip"
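To point the downloads at the working directory, as the question asks, the capability list can be built from getwd(). The function make_chrome_caps below is a hypothetical helper for illustration. The key argument is an assumption added here because newer ChromeDriver releases expect the name goog:chromeOptions instead of chromeOptions, so it may need changing depending on the driver version.

```r
# Build the extraCapabilities list for Chrome with a custom download directory.
make_chrome_caps <- function(download_dir, key = "chromeOptions") {
  prefs <- list(
    "profile.default_content_settings.popups" = 0L,
    "download.prompt_for_download" = FALSE,
    "download.default_directory" = download_dir
  )
  caps <- list(list(prefs = prefs))
  names(caps) <- key
  caps
}

eCaps <- make_chrome_caps(getwd())
# Then, as in the example above:
# rD <- RSelenium::rsDriver(browser = "chrome", extraCapabilities = eCaps)
```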
I've been trying the alternatives, and it seems that @Bharath's first comment is the way to go: give up on fiddling with the prefs (it doesn't seem possible to do that) and instead move the file from the default download folder to the desired folder. The trick to making this a portable solution is finding where the default download directory is. It varies by OS (which you can get like so), and you need to find the user's username too:
desired_dir <- "~/Desktop/cool_downloads"
file_name <- "whatever_I_downloaded.zip"
# build path to chrome's default download directory
if (Sys.info()[["sysname"]] == "Linux") {
  # leading "" makes the path absolute: /home/<user>/Downloads
  default_dir <- file.path("", "home", Sys.info()[["user"]], "Downloads")
} else {
  default_dir <- file.path("", "Users", Sys.info()[["user"]], "Downloads")
}
# make sure the target directory exists, then move the file there
dir.create(desired_dir, showWarnings = FALSE, recursive = TRUE)
file.rename(file.path(default_dir, file_name), file.path(desired_dir, file_name))
Here is an alternative approach.
Your download folder should be empty before you start, so that only the newly downloaded files are moved.
# List the files inside the folder (no.. = TRUE skips the "." and ".." entries)
down.list <- list.files(path = "E:/Downloads/", all.files = TRUE, recursive = FALSE, no.. = TRUE)
# Move all files to specific folder
file.rename(from = paste0("E:/Downloads/", down.list), to = paste0("E:/1/scrape/", down.list))
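The move step can also be written as a reusable function. The sketch below uses a made-up name, move_downloads, and adds a copy-then-delete fallback on the assumption that file.rename may fail when the two folders are on different drives or file systems.

```r
# Move every file from from_dir into to_dir and return the new paths.
move_downloads <- function(from_dir, to_dir, pattern = NULL) {
  dir.create(to_dir, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(from_dir, pattern = pattern, full.names = TRUE)
  files <- files[!dir.exists(files)]  # skip sub-directories
  if (length(files) == 0) {
    return(character(0))
  }
  targets <- file.path(to_dir, basename(files))
  ok <- suppressWarnings(file.rename(files, targets))
  retry <- !ok
  if (any(retry)) {
    # fallback for failed renames: copy, then remove the source if the copy worked
    ok[retry] <- file.copy(files[retry], targets[retry])
    unlink(files[retry & ok])
  }
  targets[ok]
}
```

With the folders from the example above, the call would be move_downloads("E:/Downloads", "E:/1/scrape").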