Using download.file to download a zip file from URL in R

I am trying to use download.file to download a zip file from a URL and then push the data from each of the files inside it into a MySQL database. I am getting stuck at the first step, where I use download.file to fetch the zip file.
I have tried the following, but to no avail:
myURL = paste("https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip",sep = "")
download.file(url=myURL,destfile=zippedFile, method='auto')
myURL = paste("https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip",sep = "")
download.file(url=myURL,destfile=zippedFile, method='curl')
Please suggest where I am going wrong. Some pointers on how to take one file at a time from the zip archive and push it into a database would also be most helpful.

What finally worked on AWS was using the downloader package:
https://cran.r-project.org/web/packages/downloader/downloader.pdf
It supports https. Hope it helps someone.
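For concreteness, a minimal sketch of the downloader approach might look like this; the local filename is an assumption, and `downloader::download()` simply wraps `download.file()` with an https-capable method chosen for you:

```r
# Sketch: downloader::download() wraps download.file() and picks an
# https-capable method automatically. "data.zip" is a placeholder filename.
library(downloader)

myURL <- "https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip"
zippedFile <- "data.zip"
download(myURL, destfile = zippedFile, mode = "wb")
```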

You can try this:
myURL <- "https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip"
destfile <- "zippedFile.zip"
download.file(myURL, destfile, mode = "wb")
From ?download.file: destfile -- a character string with the name where the downloaded file is saved. Tilde-expansion is performed.
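The second half of the question, loading each file into MySQL, is not covered by the answers. A hedged sketch, assuming the archive contains CSV files and using the DBI and RMySQL packages (all connection parameters below are placeholders, not values from the question), could look like this:

```r
# Sketch: unzip the archive, then load each CSV into MySQL one at a time.
# Host, user, password and dbname are hypothetical placeholders.
library(DBI)
library(RMySQL)

unzip("zippedFile.zip", exdir = "unzipped")   # extract everything
csvs <- list.files("unzipped", pattern = "\\.csv$", full.names = TRUE)

con <- dbConnect(MySQL(), host = "localhost", user = "user",
                 password = "password", dbname = "mydb")
for (f in csvs) {
  dat <- read.csv(f)
  # one table per file, named after the file (sans extension)
  dbWriteTable(con, tools::file_path_sans_ext(basename(f)), dat,
               row.names = FALSE, append = FALSE)
}
dbDisconnect(con)
```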

Related

Passed a filename that is NOT a string of characters! (RMarkdown)

I'm accessing ncdf files directly from a website into my RMarkdown.
When I try to read the file using the nc_open function as in the code below, I get the error 'Passed a filename that is NOT a string of characters!'
Any idea how I can solve this?
PS: I even tried uncompressing the files with the gzcon function, but the result is the same when I try to read the data.
Thanks for your help!
Kami
library(httr)
library(ncdf4)
nc<-GET("https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/pre/cru_ts4.05.2011.2020.pre.dat.nc.gz")
cru_nc<-nc_open(nc)
OK, here is the full answer:
library(httr)
library(ncdf4)
library(R.utils)
url <- "https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/pre/cru_ts4.05.2011.2020.pre.dat.nc.gz"
filename <- "/tmp/file.nc.gz"
# Download the file and store it as a temp file
download.file(url, filename, mode = "wb")
# Unzip the temp file
gunzip(filename)
# The unzipped filename drops the .gz
unzip_filename <- "/tmp/file.nc"
# You can now open the unzipped file with its **filename** rather than the object
cru_nc<-nc_open(unzip_filename)
Is this a mode = "w" vs mode = "wb" issue? I've had this with files before. No experience of ncdf4.
Not sure if you can pass mode = "wb" to GET, but does
download.file(yourUrl, mode = "wb")
work / help?
Edit:
Ah. The other thing is that you are storing the response as an object (nc), but nc_open wants to open a file.
I think you need to save the object locally (unless nc_open can just take the URL) and then open it? Possibly after unzipping.

Reading a gpx file into Shiny from a dropbox account

I have a Shiny app that accesses data from a dropbox account. I used the instructions at https://github.com/karthik/rdrop2/blob/master/README.md and have been able to read in csv data with no problem, i.e. using the drop_read_csv command from the rdrop2 package after the authentication step.
e.g.
my_data<-drop_read_csv("ProjectFolder/DataSI.csv")
My next problem however is that there are going to be a lot of gpx track files uploaded to the dropbox that I want the app to be able to read in. I have tried using:
gpx.files <- drop_search('gpx', path = "ProjectFolder/gpx_files")
trk.tmp <- vector("list", dim(gpx.files)[1])
for (i in 1:dim(gpx.files)[1]) {
  trk.tmp[[i]] <- readOGR(gpx.files$path[i], layer = "tracks")
}
But no luck. At the readOGR step, I get:
Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
Cannot open data source
Hopefully someone can help.
My problem was that I hadn't specified the dropbox path properly. I took the drop_read_csv code and made a drop_readOGR version:
drop_readOGR <- function(my.file, dest = tempdir()) {
  localfile <- paste0(dest, "/", basename(my.file))
  drop_get(my.file, local_file = localfile, overwrite = TRUE)
  readOGR(localfile, layer = "tracks")
}
So now I can just use what I was doing before except I have changed the line in the loop to call the new function.
gpx.files <- drop_search('gpx', path = "ProjectFolder/gpx_files")
trk.tmp <- vector("list", dim(gpx.files)[1])
for (i in 1:dim(gpx.files)[1]) {
  trk.tmp[[i]] <- drop_readOGR(gpx.files$path[i])
}

R download.file() rename the downloaded file, if the filename already exists

In R, I am trying to download files off the internet using the download.file() command in a simple script (I am a complete newbie). The files download properly. However, if a file already exists at the download destination, I wish to rename the downloaded file with an increment, as opposed to the overwrite which seems to be the default behaviour.
nse.url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder = "D:/R/Download files from Internet/"
nse.destfile = paste0(nse.folder,"fo04FEB2016bhav.csv.zip")
download.file(nse.url,nse.destfile,mode = "wb",method = "libcurl")
Problem with respect to this specific code: if "fo04FEB2016bhav.csv.zip" already exists, then save the new download as, say, "fo04FEB2016bhav.csv(2).zip"?
A general answer to the problem (and not just for the code mentioned above) would be appreciated, as this bottleneck could come up in other situations too.
The function below will automatically assign the filename based on the file being downloaded. It will check the folder you are downloading to for the presence of a similarly named file. If it finds a match, it will add an incrementation and download to the new filename.
ekstroem's suggestion to fiddle with the curl settings is probably a much better approach, but I wasn't clever enough to figure out how to make that work.
download_without_overwrite <- function(url, folder) {
  filename <- basename(url)
  base <- tools::file_path_sans_ext(filename)
  ext <- tools::file_ext(filename)

  file_exists <- grepl(base, list.files(folder), fixed = TRUE)
  if (any(file_exists)) {
    filename <- paste0(base, " (", sum(file_exists), ")", ".", ext)
  }

  download.file(url, file.path(folder, filename), mode = "wb", method = "libcurl")
}
download_without_overwrite(
  url = "https://raw.githubusercontent.com/nutterb/redcapAPI/master/README.md",
  folder = "[path_to_folder]")
Try this:
nse.url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder = "D:/R/Download files from Internet/"
# Get file name from url, with file extension
fname.x <- gsub(".*/(.*)", "\\1", nse.url)
# Get file name from url, without file extension
fname <- gsub("(.*)\\.csv.*", "\\1", fname.x)
# Get extension of file from url
xt <- gsub(".*(\\.csv.*)", "\\1", fname.x)
# How many times does the file already exist in the folder
exist.times <- sum(grepl(fname, list.files(path = nse.folder)))
if (exist.times) {
  # if it does, increment by 1
  fname.x <- paste0(fname, "(", exist.times + 1, ")", xt)
}
nse.destfile = paste0(nse.folder, fname.x)
download.file(nse.url, nse.destfile, mode = "wb",method = "libcurl")
Issues
This approach will not work in cases where part of the file name already exists: for example, you have url/test.csv.zip and in the folder you have a file testABC1234blahblah.csv.zip. It will think the file already exists, so it will save the download as test(2).csv.zip.
You will need to change the # How many times does the file already exist in the folder part of the code accordingly.
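One way to tighten the match, an illustrative sketch rather than part of either answer, is to anchor the whole filename in the regular expression, allowing only an optional " (n)" suffix between the base name and the extension:

```r
# Count only files named exactly "base.ext" or "base (n).ext", instead of
# any filename that merely contains "base". Caller is assumed to pass
# regex-escaped base/ext (note the escaped dot in "csv\\.zip" below).
count_exact_matches <- function(base, ext, files) {
  pattern <- paste0("^", base, "( \\(\\d+\\))?\\.", ext, "$")
  sum(grepl(pattern, files))
}

files <- c("test.csv.zip", "test (2).csv.zip", "testABC1234blahblah.csv.zip")
count_exact_matches("test", "csv\\.zip", files)  # 2: the lookalike is excluded
```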
This is not a proper answer and shouldn't be considered as such, but the comment section above was too small to write it all.
I thought the -O -n options to curl could be used, but now that I have looked at it more closely it turns out this isn't implemented yet. wget automatically increments the filename when downloading a file that already exists; however, setting method = "wget" doesn't work with download.file, because you are forced to set the destination file name, and once you do that you override the automatic file incrementing.
I like the solution that @Benjamin provided. Alternatively, you can use
system(paste0("wget ", nse.url))
to get the file through the system (provided that you have wget installed) and let wget handle the increment.

Download URL links using R

I am new to R and would like to seek some advice.
I am trying to download multiple url links (pdf format, not html) and save them as pdf files using R.
The links I have are character strings (taken from the html code of the website).
I tried using the download.file() function, but it requires a specific url link (written in the R script) and can therefore only download one file per link. However, I have many url links and would like some help in doing this.
Thank you.
I believe what you are trying to do is download a list of URLs; you could try something like this approach:
Store all the links in a vector using c(), e.g.:
urls <- c("http://link1", "http://link2", "http://link3")
Iterate through the vector and download each file:
for (url in urls) {
  download.file(url, destfile = basename(url))
}
If you're using Linux/Mac and https you may need to specify method and extra attributes for download.file:
download.file(url, destfile = basename(url), method="curl", extra="-k")
If you want, you can test my proof of concept here: https://gist.github.com/erickthered/7664ec514b0e820a64c8
Hope it helps!
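When downloading many files, one dead link would normally abort the whole loop. A small variation on the loop above (a sketch, not from either answer, with the same placeholder URLs) wraps each download in tryCatch so failures are reported and skipped:

```r
# Sketch: download a vector of URLs, skipping (and reporting) any that fail
# instead of stopping the loop at the first error.
urls <- c("http://link1", "http://link2", "http://link3")

for (url in urls) {
  tryCatch(
    download.file(url, destfile = basename(url), mode = "wb"),
    error = function(e) message("Failed: ", url, " (", conditionMessage(e), ")")
  )
}
```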
URL
url = c('https://cran.r-project.org/doc/manuals/r-release/R-data.pdf',
'https://cran.r-project.org/doc/manuals/r-release/R-exts.pdf',
'http://kenbenoit.net/pdfs/text_analysis_in_R.pdf')
Designated names
names = c('manual1',
'manual2',
'manual3')
Iterate through the vectors and download each file with its corresponding name:
for (i in 1:length(url)) {
  download.file(url[i], destfile = names[i], mode = 'wb')
}

How to download an .xlsx file from a dropbox (https:) location

I'm trying to adopt the Reproducible Research paradigm while meeting people who prefer looking at Excel rather than text data files half way, by using Dropbox to host Excel files which I can then access using the xlsx package.
Rather like downloading and unpacking a zipped file I assumed something like the following would work:
# Prerequisites
require("xlsx")
require("ggplot2")
require("repmis")
require("devtools")
require("RCurl")
# Downloading data from Dropbox location
link <- paste0(
"https://www.dropbox.com/s/",
"{THE SHA-1 KEY}",
"{THE FILE NAME}"
)
url <- getURL(link)
temp <- tempfile()
download.file(url, temp)
However, I get Error in download.file(url, temp) : unsupported URL scheme
Is there an alternative to download.file that will accept this URL scheme?
Thanks,
Jon
You have the wrong URL - the one you are using just goes to the landing page. I think the actual download URL is different, I managed to get it sort of working using the below.
I actually don't think you need to use RCurl or the getURL() function, and I think you were leaving out some relatively important /'s in your previous formulation.
Try the following:
link <- paste("https://dl.dropboxusercontent.com/s",
              "{THE SHA-1 KEY}",
              "{THE FILE NAME}",
              sep = "/")
download.file(url = link, destfile = "your.destination.xlsx", mode = "wb")
closeAllConnections()
UPDATE:
I just realised there is a source_XlsxData function in the repmis package, which in theory should do the job perfectly.
Also, the function below works some of the time but not always, and appears to get stuck at the GET line, so a better solution would be very welcome.
I decided to try taking a step back and figure out how to download a raw file from a secure (https) url. I adapted (butchered?) the source_url function in devtools to produce the following:
download_file_url <- function(url, outfile, ..., sha1 = NULL) {
  # of the packages loaded originally, only httr is actually used here
  require(httr)
  stopifnot(is.character(url), length(url) == 1)
  filetag <- file(outfile, "wb")
  request <- GET(url)
  stop_for_status(request)
  writeBin(content(request, type = "raw"), filetag)
  close(filetag)
}
This seems to work for producing local versions of binary files - Excel included. Nicer, neater, smarter improvements in this gratefully received.
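Once the file is local, the original goal of reading it with the xlsx package is one more line. A hedged sketch using download_file_url from above; the local filename and sheetIndex = 1 are assumptions, and the Dropbox placeholders are kept from the question:

```r
# Download the workbook locally, then read it with the xlsx package.
# "local.xlsx" and sheetIndex = 1 are illustrative assumptions.
library(xlsx)

download_file_url(
  "https://dl.dropboxusercontent.com/s/{THE SHA-1 KEY}/{THE FILE NAME}",
  outfile = "local.xlsx"
)
dat <- read.xlsx("local.xlsx", sheetIndex = 1)
```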
