I'm trying to download a spreadsheet from the Australian Bureau of Statistics using download.file, but I'm getting a corrupted file back, and when I try to open it with readxl my session crashes.
target = "http://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.xls&5206.0&Time%20Series%20Spreadsheet&24FF946FB10A10CDCA258192001DAC4B&0&Jun%202017&06.09.2017&Latest"
dest = 'downloaded_file.xlsx'
download.file(url = target, destfile = dest)
Any pointers would be great.
Looks like that file is an xls file, not the newer xlsx format. Remove the 'x' at the end of the destination filename so readxl knows which format to use. Note also that xls is a binary format, so you should download in binary mode (mode = 'wb').
target = "http://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.xls&5206.0&Time%20Series%20Spreadsheet&24FF946FB10A10CDCA258192001DAC4B&0&Jun%202017&06.09.2017&Latest"
dest = 'downloaded_file.xls'
download.file(url = target, destfile = dest, mode='wb')
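As a quick check of the result, a sketch only; this particular ABS workbook has several sheets and metadata rows, so you may need the sheet and skip arguments of read_excel:
library(readxl)
excel_sheets(dest)                  # see which sheets the workbook contains
dat <- read_excel(dest, sheet = 1)  # first sheet; adjust sheet/skip as needed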
I'm reading ncdf files directly from a website into my RMarkdown document (the URL is in the code below).
When I try to read a file using the nc_open function as in the code below, I get the error 'Passed a filename that is NOT a string of characters!'
Any idea how I can solve this?
ps: I even tried uncompressing the files with the gzcon function but the result is the same when I try to read the data.
Thanks for your help!
Kami
library(httr)
library(ncdf4)
nc<-GET("https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/pre/cru_ts4.05.2011.2020.pre.dat.nc.gz")
cru_nc<-nc_open(nc)
OK, here is the full answer:
library(httr)
library(ncdf4)
library(R.utils)
url <- "https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/pre/cru_ts4.05.2011.2020.pre.dat.nc.gz"
filename <- "/tmp/file.nc.gz"
# Download the file and store it as a temp file
download.file(url, filename, mode = "wb")
# Unzip the temp file
gunzip(filename)
# The unzipped filename drops the .gz
unzip_filename <- "/tmp/file.nc"
# You can now open the unzipped file with its **filename** rather than the object
cru_nc<-nc_open(unzip_filename)
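From here you can pull data out as usual with ncdf4; a small hedged example (the variable name "pre" is my assumption for this CRU precipitation file, so check names(cru_nc$var) first):
names(cru_nc$var)                  # list the variables in the file
pre <- ncvar_get(cru_nc, "pre")    # "pre" is assumed; use a name from the list above
dim(pre)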
Is this a mode="w" vs mode="wb" issue? I've had this with files before. No experience of ncdf4.
Not sure if you can pass mode="wb" to GET, but does
download.file(yourUrl, mode="wb")
work / help?
Edit:
Ah. The other thing is that you are storing the response as an object (nc), but nc_open wants to open a file.
I think you need to save the object locally (unless nc_open can just take the URL) and then open it, possibly after unzipping.
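For what it's worth, here is a sketch of that suggestion which keeps httr but writes the response straight to disk instead of holding it in the nc object; the temp-file paths are just illustrative:
library(httr)
library(ncdf4)
library(R.utils)
url <- "https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/pre/cru_ts4.05.2011.2020.pre.dat.nc.gz"
gz_file <- file.path(tempdir(), "cru_pre.nc.gz")
nc_file <- file.path(tempdir(), "cru_pre.nc")
# write_disk() streams the body to a file, so nothing is kept as an R object
GET(url, write_disk(gz_file, overwrite = TRUE))
# Decompress, keeping the original .gz around
gunzip(gz_file, destname = nc_file, remove = FALSE)
# nc_open() wants a filename, not a response object
cru_nc <- nc_open(nc_file)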
readxl_1.1.0
I'm trying to read the file from this link (US gov website)
https://www.cftc.gov/files/dea/history/dea_com_xls_2018.zip
When I unzip the xls file inside and read it with readxl::read_excel, it fails with the error message "failed to open C:\path to file".
I can open the file in Excel, save it to csv, and read it into R with fread, but there are a lot of those files, so that's tedious. By the way, some other xls files downloaded from the same webpage can be read by read_excel.
There's something odd about the xls file. I think it's because it contains some VBA code.
If you are happy to use XLConnect here is an alternative that reads the file.
library(XLConnect)
extdir = tempdir()
unzip("dea_com_xls_2018.zip", exdir = extdir)
file = list.files(extdir, pattern = 'xls', full.names = TRUE)
wb = loadWorkbook(file)
ws = readWorksheet(wb, sheet = 1)
dim(ws)
#[1] 11131 126
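If you are starting from the URL in the question rather than a zip you already have locally, the download step (a sketch; mode = "wb" matters on Windows, as elsewhere in this thread) would come before the unzip() call above:
# Download the archive in binary mode, then the XLConnect code above applies
download.file("https://www.cftc.gov/files/dea/history/dea_com_xls_2018.zip",
              destfile = "dea_com_xls_2018.zip", mode = "wb")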
I'm working on an R script that is supposed to open an Excel file from a folder on the current user's computer using read_excel from the readxl library.
The path will have a personal folder (C:/Users/Username....).
I'm trying to accomplish that as follows:
string <- getwd()
name <- strsplit(strsplit(x = string, split = "C:/Users/")[[1]][2], split = "/")[[1]][1]
path_crivo <- paste0("C:/Users/", name, "/some_folders/excel_file.xlsx")
So path_crivo stores the string "C:/Users/João Anselmo/some_folders/excel_file.xlsx".
When I run the read_excel function with this path I get the error:
read_excel(path_crivo)
"Error in read_fun(path = path, sheet_i = sheet, limits = limits, shim = shim, :
Evaluation error: zip file 'C:/Users/João Anselmo/some_folders/excel_file.xlsx' cannot be opened."
If I set path_crivo directly as follows:
path_crivo <- "C:/Users/João Anselmo/some_folders/excel_file.xlsx"
It works perfectly.
Has anyone faced a similar problem?
I can't rename the folders or set path_crivo directly; it is supposed to be a personal path.
Thanks for your help
Try
Encoding(path_crivo) <- "latin1"
(or, possibly, change the encoding of string before you create path_crivo)
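Putting that together with the question's own path-building code, a minimal sketch (the folder names are the question's placeholders):
string <- getwd()
name <- strsplit(strsplit(x = string, split = "C:/Users/")[[1]][2], split = "/")[[1]][1]
path_crivo <- paste0("C:/Users/", name, "/some_folders/excel_file.xlsx")
# Declare the encoding so the accented characters in the username are interpreted correctly
Encoding(path_crivo) <- "latin1"
readxl::read_excel(path_crivo)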
I am trying to use download.file to fetch a zip file from a URL and then push the data in each of the contained files into a MySQL database. I am getting stuck at the first step, where I use download.file to get the zip file.
I have tried the following but to no avail
myURL = paste("https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip",sep = "")
download.file(url=myURL,destfile=zippedFile, method='auto')
myURL = paste("https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip",sep = "")
download.file(url=myURL,destfile=zippedFile, method='curl')
Please suggest where I am going wrong. Also, some pointers on how to take one file at a time from the zip archive and push it into a DB would be most helpful.
What finally worked in AWS was using the downloader package:
https://cran.r-project.org/web/packages/downloader/downloader.pdf
It has features to support https. Hope it will help someone.
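A minimal sketch of that, with an illustrative destination filename:
library(downloader)
myURL <- "https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip"
# downloader::download() wraps download.file() and handles https across platforms
download(myURL, destfile = "zippedFile.zip", mode = "wb")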
You can try this:
myURL = paste("https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip",sep = "")
dir = "zippedFile.zip"
download.file(myURL, dir, mode="wb")
destfile -- a character string with the name where the downloaded file
is saved. Tilde-expansion is performed.
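For the second part of the question (taking one file at a time out of the zip and pushing it into a DB), a rough sketch; the assumption that the files are CSVs and the commented-out DBI call are mine, not part of the answer above:
# See what the archive contains without extracting everything
contents <- unzip(dir, list = TRUE)
contents$Name
extdir <- tempdir()
for (f in contents$Name) {
  # Extract and read one file at a time (assuming CSV; adjust the reader to the real format)
  path <- unzip(dir, files = f, exdir = extdir)
  dat <- read.csv(path)
  # then e.g. DBI::dbWriteTable(con, tools::file_path_sans_ext(basename(f)), dat)
}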
A follow-up to this question: How can I download and uncompress a gzipped file using R? For example (from the UCI Machine Learning Repository), I have a file of insurance data. How can I download it using R?
Here is the data URL: http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz
I like Ramnath's approach, but I would use temp files like so:
tmpdir <- tempdir()
url <- 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
file <- basename(url)   # "tic.tar.gz"
download.file(url, file)
untar(file, compressed = 'gzip', exdir = tmpdir)   # extract into the temp directory
list.files(tmpdir)
The list.files() should produce something like this:
[1] "TicDataDescr.txt" "dictionary.txt" "ticdata2000.txt" "ticeval2000.txt" "tictgts2000.txt"
which you could parse if you needed to automate this process for a lot of files.
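For example, a hedged sketch of that automation, assuming the tic*.txt data files are tab-delimited (TicDataDescr.txt describes the actual layout, so check it first):
data_files <- list.files(tmpdir, pattern = "^tic.*\\.txt$", full.names = TRUE)
# Read each data file into a named list; sep/header are assumptions to verify
tables <- lapply(data_files, read.table, header = FALSE, sep = "\t")
names(tables) <- basename(data_files)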
Here is a quick way to do it.
# create download directory and set it
.exdir = '~/Desktop/tmp'
dir.create(.exdir)
.file = file.path(.exdir, 'tic.tar.gz')
# download file
url = 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
download.file(url, .file)
# untar it
untar(.file, compressed = 'gzip', exdir = path.expand(.exdir))
Please see help(download.file) for that. If the file in question is merely a gzipped but otherwise readable file, you can feed the complete URL to read.table() et al. too.
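For that plain-gzip case (not the tar.gz above), a small sketch with a hypothetical URL; gzcon() decompresses the stream on the fly:
con <- gzcon(url("http://example.com/some_table.txt.gz"))   # hypothetical gzipped text table
txt <- readLines(con)
close(con)
dat <- read.table(textConnection(txt), header = TRUE)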
Using library(archive) one can also read a particular csv file within an archive without having to unzip it first:
read_csv(archive_read("http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz", file = 1), col_types = cols())
This is quite a bit faster.
To unzip everything one can do archive_extract("http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz", dir = XXX).
That worked very well for me and is faster than the built-in untar(). It also works on all platforms. It supports 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats.