Error in download.file : scheme not supported - r

I need to download some csv files from "http://www.elections.state.md.us".
And here is my code.
url <- "http://www.elections.state.md.us/elections/2012/election_data/index.html"
# recognize the links
links <- getHTMLLinks(url)
filenames <- links[str_detect(links,"_General.csv")]
filenames_list <- as.list(filenames)
filenames
# create a function
downloadcsv <- function(filename,baseurl,folder){
dir.create(folder,showWarnings = FALSE)
fileurl <- str_c(baseurl,filename)
if(!file.exists(str_c(folder,"/",filename))){
download.file(fileurl,
destfile = str_c(folder,"/",filename))
# 1 sec delay between files
Sys.sleep(1)
}
}
library(plyr)
l_ply(filenames_list,downloadcsv,
baseurl = "www.elections.state.md.us/elections/2012/election_data/",
folder = "elec12_maryland")
The error comes out as :
Error in download.file(fileurl, destfile = str_c(folder, "/",
filename)) : scheme not supported in URL 'www.elections.state.md.us/elections/2012/election_data/State_Congressional_Districts_2012_General.csv'
However, when I try to paste the url into the IE and it did work. So what is the problem of my code?
Any idea would be helpful,Thx.

It turns out that the url must start with a scheme such as http://, https://, ftp:// or file://. So in the last line, I changed the code to
l_ply(filenames_list,downloadcsv,
baseurl = "http://www.elections.state.md.us/elections/2012/election_data/",
folder = "elec12_maryland")
And it works.

Related

how to pause an R script until a file has been downloaded?

I have the following piece of code which runs a python selenium script that downloads a report
library(reticulate)
source_python("C:/Users/Gunathilakel/Desktop/Dashboard Project/Dashboard/PBK-Report-Automation-for-Dashboard-master/pbk automation.py")
I want to make sure that R waits until the earlier piece of code has downloaded a file into my downloads folder before the script executes this next piece of code
my.file.copy <- function(from, to) {
todir <- dirname(to)
if (!isTRUE(file.info(todir)$isdir)) dir.create(todir, recursive=TRUE)
file.copy(from = from, to = to,overwrite = TRUE)
}
my.file.copy(from = "C:/Users/Gunathilakel/Downloads/Issued and Referral Charge.csv",
to = "C:/Users/Gunathilakel/Desktop/Dashboard Project/Dashboard/RawData/Issued and Referral Charge2020.csv")
I found this question How to make execution pause, sleep, wait for X seconds in R?
But is it possible to wait for execution until a file has been downloaded?
actually found a solution from this question
Wait for file to exist
library(reticulate)
source_python("C:/Users/Gunathilakel/Desktop/Dashboard Project/Dashboard/PBK-Report-Automation-for-Dashboard-master/pbk automation.py")
while (!file.exists("C:/Users/Gunathilakel/Downloads/Issued and Referral Charge.csv")) {
Sys.sleep(1)
}
# Moves Downloaded CSV file of Issued_and_refused from Downloads --------
my.file.copy <- function(from, to) {
todir <- dirname(to)
if (!isTRUE(file.info(todir)$isdir)) dir.create(todir, recursive=TRUE)
file.copy(from = from, to = to,overwrite = TRUE)
}
my.file.copy(from = "C:/Users/Gunathilakel/Downloads/Issued and Referral Charge.csv",
to = "C:/Users/Gunathilakel/Desktop/Dashboard Project/Dashboard/RawData/Issued and Referral Charge2020.csv")

Download files from FTP folder using Loop

I am trying to download all the files inside FTP folder
temp <- tempfile()
destination <- "D:/test"
url <- "ftp://XX.XX.net/"
userpwd <- "USER:Password"
filenames <- getURL(url, userpwd = userpwd,ftp.use.epsv = FALSE,dirlistonly = TRUE)
filenames <- strsplit(filenames, "\r*\n")[[1]]
When I am printing "filenames" I am getting all the file names which are inside the FTP folder - correct output till here
[1] "2018-08-28-00.gz" "2018-08-28-01.gz"
[3] "2018-08-28-02.gz" "2018-08-28-03.gz"
[5] "2018-08-28-04.gz" "2018-08-28-05.gz"
[7] "2018-08-28-08.gz" "2018-08-28-09.gz"
[9] "2018-08-28-10.gz" "2018-08-28-11.gz"
[11] "2018-08-28-12.gz" "2018-08-28-13.gz"
[13] "2018-08-28-14.gz" "2018-08-28-15.gz"
[15] "2018-08-28-16.gz" "2018-08-28-17.gz"
[17] "2018-08-28-18.gz" "2018-08-28-23.gz"
for ( i in filenames ) {
download.file(paste0(url,i), paste0(destination,i), mode="w")
}
I got this error
trying URL 'ftp://XXX.net/2018-08-28-00.gz'
Error in download.file(paste0(url, i), paste0(destination, i), mode = "w") :
cannot open URL 'ftp://XXX.net/2018-08-28-00.gz'
In addition: Warning message:
In download.file(paste0(url, i), paste0(destination, i), mode = "w") :
InternetOpenUrl failed: 'The login request was denied'
I modified the code to
for ( i in filenames )
{
#download.file(paste0(url,i), paste0(destination,i), mode="w")
download.file(getURL(paste(url,filenames[i],sep=""), userpwd =
"USER:PASSWORD"), paste0(destination,i), mode="w")
}
After that, I got this error
Error in function (type, msg, asError = TRUE) : RETR response: 550
Without a minimal, complete, and verifiable example it is a challenge to directly replicate your problem. Assuming the file names don't include the URL, you'll need to combine them to access the files.
download.file() requires a file to be read, an output file, as well as additional flags regarding whether you want a binary download or not.
For example, I have data from Alberto Barradas' Pokémon Stats kaggle.com data set stored on my Github site. To download some of the files to the test subdirectory of my R Working Directory, I can use the following code:
filenames <- c("gen01.csv","gen02.csv","gen03.csv")
fileLocation <- "https://raw.githubusercontent.com/lgreski/pokemonData/master/"
# use ./ for subdirectory of current directory, end with / to work with paste0()
destination <- "./test/"
# note that these are character files, so use mode="w"
for (i in filenames){
download.file(paste0(fileLocation,i),
paste0(destination,i),
mode="w")
}
...and the output:
The paste0() function concatenates text without spaces, which allows the code to generate a fully qualified path name for the url of each source file, as well as the subdirectory where the destination file will be stored.
To illustrate what's happening with paste0() in the for() loop, we can use message() to print to the R console.
> # illustrate what paste0() does
> for (i in filenames){
+ message(paste("Source is: ",paste0(fileLocation,i)))
+ message(paste("Destination is:",paste0(destination,i)))
+ }
Source is: https://raw.githubusercontent.com/lgreski/pokemonData/master/gen01.csv
Destination is: ./test/gen01.csv
Source is: https://raw.githubusercontent.com/lgreski/pokemonData/master/gen02.csv
Destination is: ./test/gen02.csv
Source is: https://raw.githubusercontent.com/lgreski/pokemonData/master/gen03.csv
Destination is: ./test/gen03.csv
>

Download.File Issue with xls Excel Workbook

I'm trying to download an Excel workbook xls using R's download.file function (Windows 10, R version 3.4.4 (2018-03-15)).
When I download the file manually (using Internet Explorer or Chrome) then the file downloads and I can then open it in Excel without any problems.
When I use download.file in R, the file downloads but size is smaller than correct download file - this file is hmtl file with some notes that my browser is not supported. Tyred different modes and no luck.
My code:
download.file(
url = "https://www.atsenergo.ru/nreport?fid=696C3DB7A3F6019EE053AC103C8C8733",
destfile = "C:/MyExcel.xls",
mode = "wb",
method = "auto"
)
Solving this problem with RSelenium library. ATS site reject any query for downloading file (return .hmtl file with Required javascript enabled message) and in this case Selenium method only works. My code below (where urlList data frame with files download links):
rD <- rsDriver(port = 4444L,
browser = "chrome",
check = FALSE,
geckover = NULL,
iedrver = NULL,
phantomver = NULL)
remDr <- rD$client
for (i in 1:nrow(urlList)) {
tryCatch({
row <- urlList[i,]
remDr$navigate(row$url)
webElem <-
remDr$findElement(using =
'link text', row$FileName)
webElem$clickElement()
},
error = function(e)
logerror(paste(
substr(e, 1, 50),
atsCode,
dateFileName,
sep = "\t"
), logger = loggerName),
finally = next)
}
remDr$close()
# stop the selenium server
rD[["server"]]$stop()

R & figshare: corrupt zip when downloaded in R

I'm trying to make my research reproducible storing the data at figshare.
Something strange happens when I download and unzip the data in R.
here is the zip
If I download it manually, it opens ok; but when I try to get it with an R script, the downloaded archive is corrupt. Any ideas where is the problem?
the code to reproduce my error
url <- 'https://ndownloader.figshare.com/files/4797355'
path <- 'test/missing_data_raw.zip'
ifelse(file.exists(path1), yes = 'file alredy exists', no = download.file(url1, path1))
unzip(zipfile = path1,exdir = 'test')
Try setting the download mode to binary explicitly:
url <- 'https://ndownloader.figshare.com/files/4797217'
path1 <- tempfile(fileext = ".zip")
if (file.exists(path1)) 'file alredy exists' else download.file(url, path1, mode="wb")
unzip(zipfile = path1,exdir = tempdir())

Specify download folder in RSelenium

I am using RSelenium to navigate towards a webpage which contains a button to download a file. I use RSelenium to click this button which downloads the file. However, the files are by default downloaded in my folder 'downloads', whereas I want to file to be downloaded in my working directory. I tried specifying a chrome profile as below but this did not seem to do the job:
wd <- getwd()
cprof <- getChromeProfile(wd, "Profile 1")
remDr <- remoteDriver(browserName= "chrome", extraCapabilities = cprof)
The file is still downloaded in the folder 'downloads', rather than my working directory. How can this be solved?
The solution involves setting the appropriate chromeOptions outlined at https://sites.google.com/a/chromium.org/chromedriver/capabilities . Here is an example on a windows 10 box:
library(RSelenium)
eCaps <- list(
chromeOptions =
list(prefs = list(
"profile.default_content_settings.popups" = 0L,
"download.prompt_for_download" = FALSE,
"download.default_directory" = "C:/temp/chromeDL"
)
)
)
rD <- rsDriver(extraCapabilities = eCaps)
remDr <- rD$client
remDr$navigate("http://www.colorado.edu/conflict/peace/download/")
firstzip <- remDr$findElement("xpath", "//a[contains(#href, 'zip')]")
firstzip$clickElement()
> list.files("C:/temp/chromeDL")
[1] "peace.zip"
I've been trying the alternatives, and it seems that #Bharath's first comment about giving up on fiddling with the prefs (it doesn't seem possible to do that) and instead moving the file from the default download folder to the desired folder is the way to go. The trick to making this a portable solution is finding where the default download directory is—of course it varies by os (which you can get like so)—and you need to find the user's username too:
desired_dir <- "~/Desktop/cool_downloads"
file_name <- "whatever_I_downloaded.zip"
# build path to chrome's default download directory
if (Sys.info()[["sysname"]]=="Linux") {
default_dir <- file.path("home", Sys.info()[["user"]], "Downloads")
} else {
default_dir <- file.path("", "Users", Sys.info()[["user"]], "Downloads")
}
# move the file to the desired directory
file.rename(file.path(default_dir, file_name), file.path(desired_dir, file_name))
Look this alternative way.
Your download folder should be empty.
# List the files inside the folder
down.list <- list.files(path = "E:/Downloads/",all.files = T,recursive = F)
# Move all files to specific folder
file.rename(from = paste0("E:/Downloads/",down.list),to = paste0("E:/1/scrape/",down.list))

Resources