R & figshare: corrupt zip when downloaded in R - r

I'm trying to make my research reproducible storing the data at figshare.
Something strange happens when I download and unzip the data in R.
here is the zip
If I download it manually, it opens ok; but when I try to get it with an R script, the downloaded archive is corrupt. Any ideas where is the problem?
the code to reproduce my error
url <- 'https://ndownloader.figshare.com/files/4797355'
path <- 'test/missing_data_raw.zip'
ifelse(file.exists(path1), yes = 'file alredy exists', no = download.file(url1, path1))
unzip(zipfile = path1,exdir = 'test')

Try setting the download mode to binary explicitly:
url <- 'https://ndownloader.figshare.com/files/4797217'
path1 <- tempfile(fileext = ".zip")
if (file.exists(path1)) 'file alredy exists' else download.file(url, path1, mode="wb")
unzip(zipfile = path1,exdir = tempdir())

Related

R Markdown Error: Error in tempfile(pattern = "_rs_rdf_", tmpdir = outputFolder, fileext = ".rdf") : temporary name too long

I am working on RMarkdown and keep getting the following error when I run my code. W
Error in tempfile(pattern = "_rs_rdf_", tmpdir = outputFolder, fileext = ".rdf") :
temporary name too long
However, hen I paste the code in the console, it works. I can even knit the document successfully. What is the problem and how can one resolve this?

Download.File Issue with xls Excel Workbook

I'm trying to download an Excel workbook xls using R's download.file function (Windows 10, R version 3.4.4 (2018-03-15)).
When I download the file manually (using Internet Explorer or Chrome) then the file downloads and I can then open it in Excel without any problems.
When I use download.file in R, the file downloads but size is smaller than correct download file - this file is hmtl file with some notes that my browser is not supported. Tyred different modes and no luck.
My code:
download.file(
url = "https://www.atsenergo.ru/nreport?fid=696C3DB7A3F6019EE053AC103C8C8733",
destfile = "C:/MyExcel.xls",
mode = "wb",
method = "auto"
)
Solving this problem with RSelenium library. ATS site reject any query for downloading file (return .hmtl file with Required javascript enabled message) and in this case Selenium method only works. My code below (where urlList data frame with files download links):
rD <- rsDriver(port = 4444L,
browser = "chrome",
check = FALSE,
geckover = NULL,
iedrver = NULL,
phantomver = NULL)
remDr <- rD$client
for (i in 1:nrow(urlList)) {
tryCatch({
row <- urlList[i,]
remDr$navigate(row$url)
webElem <-
remDr$findElement(using =
'link text', row$FileName)
webElem$clickElement()
},
error = function(e)
logerror(paste(
substr(e, 1, 50),
atsCode,
dateFileName,
sep = "\t"
), logger = loggerName),
finally = next)
}
remDr$close()
# stop the selenium server
rD[["server"]]$stop()

Error in download.file : scheme not supported

I need to download some csv files from "http://www.elections.state.md.us".
And here is my code.
url <- "http://www.elections.state.md.us/elections/2012/election_data/index.html"
# recognize the links
links <- getHTMLLinks(url)
filenames <- links[str_detect(links,"_General.csv")]
filenames_list <- as.list(filenames)
filenames
# create a function
downloadcsv <- function(filename,baseurl,folder){
dir.create(folder,showWarnings = FALSE)
fileurl <- str_c(baseurl,filename)
if(!file.exists(str_c(folder,"/",filename))){
download.file(fileurl,
destfile = str_c(folder,"/",filename))
# 1 sec delay between files
Sys.sleep(1)
}
}
library(plyr)
l_ply(filenames_list,downloadcsv,
baseurl = "www.elections.state.md.us/elections/2012/election_data/",
folder = "elec12_maryland")
The error comes out as :
Error in download.file(fileurl, destfile = str_c(folder, "/",
filename)) : scheme not supported in URL 'www.elections.state.md.us/elections/2012/election_data/State_Congressional_Districts_2012_General.csv'
However, when I try to paste the url into the IE and it did work. So what is the problem of my code?
Any idea would be helpful,Thx.
It turns out that the url must start with a scheme such as http://, https://, ftp:// or file://. So in the last line, I changed the code to
l_ply(filenames_list,downloadcsv,
baseurl = "http://www.elections.state.md.us/elections/2012/election_data/",
folder = "elec12_maryland")
And it works.

Specify download folder in RSelenium

I am using RSelenium to navigate towards a webpage which contains a button to download a file. I use RSelenium to click this button which downloads the file. However, the files are by default downloaded in my folder 'downloads', whereas I want to file to be downloaded in my working directory. I tried specifying a chrome profile as below but this did not seem to do the job:
wd <- getwd()
cprof <- getChromeProfile(wd, "Profile 1")
remDr <- remoteDriver(browserName= "chrome", extraCapabilities = cprof)
The file is still downloaded in the folder 'downloads', rather than my working directory. How can this be solved?
The solution involves setting the appropriate chromeOptions outlined at https://sites.google.com/a/chromium.org/chromedriver/capabilities . Here is an example on a windows 10 box:
library(RSelenium)
eCaps <- list(
chromeOptions =
list(prefs = list(
"profile.default_content_settings.popups" = 0L,
"download.prompt_for_download" = FALSE,
"download.default_directory" = "C:/temp/chromeDL"
)
)
)
rD <- rsDriver(extraCapabilities = eCaps)
remDr <- rD$client
remDr$navigate("http://www.colorado.edu/conflict/peace/download/")
firstzip <- remDr$findElement("xpath", "//a[contains(#href, 'zip')]")
firstzip$clickElement()
> list.files("C:/temp/chromeDL")
[1] "peace.zip"
I've been trying the alternatives, and it seems that #Bharath's first comment about giving up on fiddling with the prefs (it doesn't seem possible to do that) and instead moving the file from the default download folder to the desired folder is the way to go. The trick to making this a portable solution is finding where the default download directory is—of course it varies by os (which you can get like so)—and you need to find the user's username too:
desired_dir <- "~/Desktop/cool_downloads"
file_name <- "whatever_I_downloaded.zip"
# build path to chrome's default download directory
if (Sys.info()[["sysname"]]=="Linux") {
default_dir <- file.path("home", Sys.info()[["user"]], "Downloads")
} else {
default_dir <- file.path("", "Users", Sys.info()[["user"]], "Downloads")
}
# move the file to the desired directory
file.rename(file.path(default_dir, file_name), file.path(desired_dir, file_name))
Look this alternative way.
Your download folder should be empty.
# List the files inside the folder
down.list <- list.files(path = "E:/Downloads/",all.files = T,recursive = F)
# Move all files to specific folder
file.rename(from = paste0("E:/Downloads/",down.list),to = paste0("E:/1/scrape/",down.list))

Default .Options setting for unzip

I was trying to install ramnathv's slidify package and was surprised the by the following error:
# install.packages("devtools")
library(devtools)
install_github("slidify","ramnathv")
Installing github repo slidify/master from ramnathv
Downloading slidify.zip from https://github.com/ramnathv/slidify/archive/master.zip
Installing package from /tmp/RtmpOFEJuD/slidify.zip
Error in unzip(src, exdir = target, unzip = getOption("unzip")) :
'unzip' must be a single character string
I've installed it before with no issues on another system using the same Arch Linux setup and their R package, granted it was an earlier version of R (3.0.0 or 3.0.1).
In googling the error, it comes from this bit in zip.R:
unzip <-
function(zipfile, files = NULL, list = FALSE, overwrite = TRUE,
junkpaths = FALSE, exdir = ".", unzip = "internal",
setTimes = FALSE)
{
if(identical(unzip, "internal")) {
if(!list && !missing(exdir))
dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
res <- .External(C_unzip, zipfile, files, exdir, list, overwrite,
junkpaths, setTimes)
if(list) {
dates <- as.POSIXct(res[[3]], "%Y-%m-%d %H:%M", tz="UTC")
data.frame(Name = res[[1]], Length = res[[2]], Date = dates,
stringsAsFactors = FALSE)
} else invisible(attr(res, "extracted"))
} else {
WINDOWS <- .Platform$OS.type == "windows"
if(!is.character(unzip) || length(unzip) != 1L || !nzchar(unzip))
stop("'unzip' must be a single character string")
...
While the default setting for unzip() is unzip = "internal", it can also be passed getOptions("unzip"):
Usage
unzip(zipfile, files = NULL, list = FALSE, overwrite = TRUE,
junkpaths = FALSE, exdir = ".", unzip = "internal",
setTimes = FALSE)
...
unzip
The method to be used. An alternative is to use getOption("unzip"),
which on a Unix-alike may be set to the path to a unzip program.
In looking at the output of .Options, I see, indeed, that there's nothing set for unzip:
> .Options$unzip
[1] ""
From perusing documentation, I see that I can set this in ~/.Renviron, but my question is whether on a Linux system this should be picked up by default or if it's pretty standard to have to set your unzip command?
If it's not standard that this be un-populated, I'll file a bug report with Arch Linux, as perhaps the package was compiled without this as a default, as documentation seems to suggest that it would be populated with an unzip command if one was found during installation:
unzip
a character string, the path of the command used for unzipping help files, or
"internal". Defaults to the value of R_UNZIPCMD, which is set in ‘etc/Renviron’
if an unzip command was found during configuration.
The Arch package I'm using is a binary and thus was pre-compiled, so intentionally or unintentionally perhaps this was overlooked or should be checked for during installation?
P.S. Just to avoid the obvious...
$ which unzip
/usr/bin/unzip

Resources