Downloading NetCDF files with R: manual download works, download.file produces error

I am trying to download a set of NetCDF files from: ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/nwm.20180425/medium_range/
When I manually download the files I have no issues connecting, but when I use download.file and attempt to connect I get the following error:
Assertion failed!
Program: C:\Program Files\Rstudio\bin\rsession.exe
File: nc4file.c, Line 2771
Expression: 0
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
I have attempted to run the code in R without RStudio and got the same result.
My abbreviated code is as follows:
library("ncdf4")
library("ncdf4.helpers")
download.file("ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/nwm.20180425/medium_range/nwm.t00z.medium_range.channel_rt.f006.conus.nc","c:/users/nt/desktop/nwm.t00z.medium_range.channel_rt.f006.conus.nc")
temp = nc_open("c:/users/nt/desktop/nwm.t00z.medium_range.channel_rt.f006.conus.nc")

Adding mode = 'wb' to the download.file arguments solves the issue for me (I've had the same problem when downloading PDFs):
download.file("ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/nwm.20180425/medium_range/nwm.t00z.medium_range.channel_rt.f006.conus.nc","C:/teste/teste.nc", mode = 'wb')
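A quick way to tell whether a download came through intact is to look at the first few bytes of the file before handing it to nc_open. The sketch below is a hypothetical helper (looks_like_netcdf is not part of ncdf4) and assumes the file signature sits at offset 0: classic NetCDF files start with "CDF" followed by a version byte, while NetCDF-4 files are HDF5 containers that start with an 8-byte HDF5 signature.

```r
# Hypothetical helper, not part of ncdf4: inspect the leading bytes of a file.
looks_like_netcdf <- function(path) {
  sig <- readBin(path, what = "raw", n = 8L)

  # HDF5 signature used by NetCDF-4 files: \x89 H D F \r \n \x1a \n
  hdf5_sig <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))
  is_hdf5 <- length(sig) == 8L && identical(sig, hdf5_sig)

  # Classic NetCDF: "CDF" followed by version byte 1, 2 or 5
  is_classic <- length(sig) >= 4L &&
    identical(sig[1:3], charToRaw("CDF")) &&
    as.integer(sig[4]) %in% c(1L, 2L, 5L)

  is_hdf5 || is_classic
}
```

If this returns FALSE for a file that was just downloaded, re-downloading with mode = 'wb' is probably the first thing to try. The HDF5 signature deliberately contains line-ending bytes, so a text-mode transfer on Windows tends to break it.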

Related

Downloading and unzipping GitHub zipped files directly in R

I am trying to download and unzip a folder of files from GitHub into R. I can manually download the file at https://github.com/dylangomes/SO/blob/main/Shape.zip and then extract all files in working directory, but I'd like to work directly from R.
utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip")
# Warning message:
# In utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip", :
# error 1 in extracting from zip file
It says it is a warning message, although nothing has been downloaded or unzipped into my wd.
I can download the file to my machine:
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip", "Shape.zip")
But I get the same message with the unzip function:
utils::unzip("Shape.zip")
And the downloaded file cannot manually be extracted. Here, I get the error that the compressed folder is empty. The unzip line works on the manually downloaded .zip file, which tells me something is wrong with the download.file line.
So if I add ?raw=TRUE to the end of the URL (which can make a difference when downloading data from GitHub):
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE","Shape.zip")
utils::unzip("Shape.zip")
I get a different warning with, similarly, nothing being executed:
Warning message:
In utils::unzip("Shape.zip") : internal error in 'unz' code
I have tried most of the answers at "Using R to download zipped data file, extract, and import data", but they appear to be for single files that are zipped and aren't helping here. I've tried the answers at "r function unzip error 1 in extracting from zip file", which mentions the same warning message I am getting, but none of the solutions work in this case.
Any idea of what I am doing wrong?
You need to use:
download.file(
"https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
"Shape.zip",
mode = "wb"
)
Without the query string ?raw=TRUE you are downloading the webpage and not the file.
(For Windows) R will use mode = "wb" by default when it detects from the end of the URL that certain file formats, including .zip, are being downloaded. However, the URL finishing with a query string instead of a file format means the check fails so you need to set the mode explicitly.
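The suffix check described above can be approximated in a few lines. This is only a sketch: the function name is hypothetical, and the suffix list is an assumption based on the download.file help page, so it may differ between R versions.

```r
# Assumed list of suffixes for which download.file picks mode = "wb"
# on Windows when mode is not supplied (check ?download.file for your version).
binary_suffixes <- c(".gz", ".bz2", ".xz", ".tgz", ".zip",
                     ".jar", ".rda", ".rds", ".RData")

# Hypothetical helper: does the URL end in one of those suffixes?
url_ends_in_binary_suffix <- function(url) {
  any(endsWith(url, binary_suffixes))
}

url_ends_in_binary_suffix("https://github.com/dylangomes/SO/blob/main/Shape.zip")
url_ends_in_binary_suffix("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE")
```

The second URL ends in "?raw=TRUE" rather than ".zip", so the check does not match, which is why mode = "wb" has to be passed explicitly in that case.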

Error reading a .nc4 file in R (ncdf4 package)

I am trying to use a data set of .nc4 files downloaded from NASA.
The format NCDF4 is confirmed by this source.
I used download.file in R to get the database and then a simple nc_open (ncdf4 package) to test the file. Unfortunately the result is an "Unknown file format" error.
Here is my replication file and my script:
download.file(url = "http://hydro1.gesdisc.eosdis.nasa.gov/.../url", destfile = "destination_folder/file.nc4")
All fine till this point, but when testing the files:
library(ncdf4)
setwd('destination_folder')
data <- nc_open('file.nc4')
Error in R_nc4_open: NetCDF: Unknown file format
Error in nc_open("file.nc4") :
Error in nc_open trying to open file file.nc4
Am I missing something?
Thank you.
I do not know what is wrong, but I can add that the problem seems to reside in the Windows implementation of the ncdf4 package. With the following statement:
catlg<-nc_open("http://opendap.deltares.nl/thredds/dodsC/opendap/rijkswaterstaat/waterbase/concentration_of_suspended_matter_in_water/catalog.nc")
I have the same problem as described in the question. However, it works perfectly in R under Linux.
The file server is an OpenDAP server strictly following netcdf 4 conventions, but maybe some features are not correctly implemented in the ncdf4 package under Windows.
For some reason I get the same error using [64-bit] C:\Program Files\R\R-3.4.2, but when using [64-bit] C:\Program Files\R\R-3.3.3 the ncdf4 package works fine.
Not that this solves the problem, but it provides an easy workaround for the time being.

R XBRL IO Error when attempting to read from SEC web site and local file

I am having trouble with the XBRL library examples for reading XBRL documents, both from the SEC website and from my local hard drive.
This code first attempts to do the read from the SEC site as written in the example in the pdf file for the XBRL library, and second tries to read a file saved locally:
# Following example from XBRL pdf doc - read xml file directly from sec web site
library(XBRL)
inst <- "http://www.sec.gov/Archives/edgar/data/1223389/000122338914000023/conn-20141031.xml"
options(stringsAsFactors = FALSE)
xbrl.vars <- xbrlDoAll(inst)
# attempt 2 - save the xml file to a local directory - so no web I/O
localdoc <- "~/R/StockTickers/XBRLdocs/aapl-20160326.xml"
xbrl.vars <- xbrlDoAll(localdoc)
Both of these throw an IO error. The first attempt to read from the SEC site results in this and crashes my RStudio instance:
error : Unknown IO error
I/O warning : failed to load external entity "http://www.sec.gov/Archives/edgar/data/1223389/000122338914000023/conn-20141031.xml"
So I restart RStudio, re-load the XBRL library and try the second attempt. Reading from a local file gives this error:
I/O warning : failed to load external entity "~/R/StockTickers/XBRLdocs/aapl-20160326.xml"
I am using R version 3.3.0 (2016-05-03)
I hope I am missing something obvious to somebody, I am just not seeing it. Any help would be appreciated.

Warning message In download.file: download had nonzero exit status

I am downloading data from the data.gov website and I get the following two types of errors in the process:
fileUrl <- "http://catalog.data.gov/dataset/expenditures-on-children-by-families"
download.file(fileUrl,destfile=".data/studentdata.csv",method="curl")
Warning message:
In download.file(fileUrl, destfile = ".data/studentdata.csv", method = "curl") :
download had nonzero exit status
I tried removing method="curl" as suggested in another forum, but then I get this new error:
download.file(fileUrl,destfile=".data/studentdata.csv")
Error in download.file(fileUrl, destfile = ".data/studentdata.csv") :
cannot open destfile '.data/studentdata.csv', reason 'No such file or directory'
I think there are two main reasons why your curl download does not work.
First, the problem is your URL. fileUrl <- "http://catalog.data.gov/dataset/expenditures-on-children-by-families" does not point to a csv file, so you will not get a csv even if you set the destination to a csv file such as destfile = ".data/studentdata.csv".
Here is an example of getting a csv dataset using the same code (different dataset):
DataURL <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD" # this link refers to a rows.csv file
download.file(DataURL, destfile = "./data/rows.csv", method = "curl") # same method, using curl
Second, I previously had the same problem: curl did not work even when I used a proper URL that refers to a csv file. When I looked a bit deeper, I found that the cause was my R session. I was using 32-bit R, in which the error occurred. After I switched to a 64-bit R session, the download ran. To see your R session architecture (32-bit or 64-bit), type in R:
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
You have to switch R from 32-bit to 64-bit to avoid the 'curl' call had nonzero exit status warning. Go to your R installation folder and run the 64-bit R.
If you are using Windows and installed R in the default path, you can run C:\Program Files\R\R-3.5.3\bin\x64\R.exe. (I used version 3.5.3, so the path may differ for your version.)
If you are using RStudio, you can switch the R session from the menu bar: Tools -> Global Options -> R version -> Change -> Use your machine's default version of R64 (64-bit) -> OK. Then restart RStudio.
However, this depends on your OS architecture. If you are using a 32-bit OS, you will have to find another way to solve this.
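Instead of reading the sessionInfo() output by eye, the architecture of the running session can also be checked programmatically. A minimal sketch (the function name is hypothetical), relying on the pointer size that base R reports in .Machine:

```r
# Hypothetical helper: TRUE when the running R session is a 64-bit build.
# .Machine$sizeof.pointer is 8 on 64-bit builds and 4 on 32-bit builds.
is_64bit_session <- function() {
  .Machine$sizeof.pointer == 8L
}

if (!is_64bit_session()) {
  message("This is a 32-bit R session; consider switching to 64-bit R.")
}
```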
So looking at the code for download.file(...), if you specify method="curl" the function tries to use the curl shell command. If this command does not exist on your system, you will get the error above.
If you do not specify a method, the default is to use an internal R method to download, which evidently works on your system. In that case, the function is trying to put the file in .data/studentdata.csv, but evidently there is no .data directory. Try taking out the "." at the start of the path.
When this download works, you will get a text/html file, not a csv file. Your url points to a web page, not a download link. That page does have a download link, but unfortunately it is a pdf, not a csv.
Finally, if your goal is to have the data in R (is it?), and if the link actually produces a csv file, you could more easily use
df <- read.csv(fileUrl)
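The first point in this answer (method="curl" needs a curl command on the system) can be checked up front. The helper below is a hypothetical sketch that falls back to method = "auto" when no curl executable is found on the PATH:

```r
# Hypothetical helper: use "curl" only when a curl executable is on the PATH.
# Sys.which() returns an empty string for commands it cannot find.
pick_download_method <- function() {
  if (nzchar(Sys.which("curl"))) "curl" else "auto"
}

# Usage (not run here, since it needs network access):
# download.file(fileUrl, destfile = "./data/studentdata.csv",
#               method = pick_download_method())
```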
If I'm not very much mistaken you just have a simple typo here. I suspect you have a "data" directory, not a ".data" directory - in which case your only problem is that your destfile string needs to begin "./data", not ".data".
I was having the same problem.
Then I realized that I forgot to create the "data" directory!
So try adding this above your fileURL line to create the directory first.
if(!file.exists("data")){
dir.create("data")
}
Also, if you are running a Mac, then you want to keep method="curl" when downloading a file over https. I don't believe Windows has that problem, hence the suggestions to remove it.
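The directory check above can be generalized so that it works for whatever destfile is passed in. This is a sketch with a hypothetical helper name, assuming the parent directory of destfile is the one that should exist:

```r
# Hypothetical helper: create the parent directory of destfile if needed.
ensure_parent_dir <- function(destfile) {
  dir_path <- dirname(destfile)
  if (!dir.exists(dir_path)) {
    # recursive = TRUE also creates any missing intermediate directories
    dir.create(dir_path, recursive = TRUE)
  }
  invisible(dir.exists(dir_path))
}

# Usage (the download itself is not run here):
# destfile <- "./data/studentdata.csv"
# ensure_parent_dir(destfile)
# download.file(fileUrl, destfile = destfile)
```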
Try this:
file<-'http://catalog.data.gov/dataset/expenditures-on-children-by-families'
file<- read.csv(file)

download.file() fails when appending a random suffix to the filename

I'm trying to download a file in R on a remote server which sits behind a number of proxies. Something - I can't figure out what - is causing the file to be returned cached whenever I try and access it on that server, whether I do so through R or just through a Web Browser.
I've tried using cacheOK=FALSE in my download.file call and this has had no effect.
Per "Is there a way to force browsers to refresh/download images?", I have tried adding a random suffix to the end of the URL:
download.file(url = paste("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip?",
format(Sys.time(), "%d%m%Y"),sep=""),
destfile = "F-F_Research_Data_Factors_daily.zip", cacheOK=FALSE)
This produces, e.g., the following URL:
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip?17092012
Which when accessed from a Web Browser on the remote server, indeed returns the latest version of the file. However, when accessed using download.file in R, this returns a corrupted zip archive. Both WinRAR and R's unzip function complain that the zip file is corrupt.
unzip("F-F_Research_Data_Factors_daily.zip")
1: In unzip("F-F_Research_Data_Factors_daily.zip") :
internal error in unz code
I can't see why downloading this file via R would cause a corrupted file to be returned, whereas downloading it via a Web Browser gives no problem.
Can anyone suggest either a way to beat the cache from R (about which I'm not hopeful), or a reason why download.file doesn't like my URL with ?someRandomString tacked onto the end of it?
It will work if you use mode="wb":
download.file(url = paste("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip?",format(Sys.time(),"%d%m%Y"),sep=""),
destfile = "F-F_Research_Data_Factors_daily.zip", mode='wb', cacheOK=FALSE)
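The cache-busting suffix can be wrapped in a small helper so the URL stays well-formed even if it already has a query string. This is a sketch; add_cache_buster is a hypothetical name, and the timestamp format is just one possible choice of token:

```r
# Hypothetical helper: append a throwaway token to a URL to defeat caches.
add_cache_buster <- function(url, token = format(Sys.time(), "%Y%m%d%H%M%S")) {
  # Use "&" if the URL already has a query string, "?" otherwise
  sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
  paste0(url, sep, token)
}

# Usage (not run here, since it needs network access):
# download.file(add_cache_buster("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip"),
#               destfile = "F-F_Research_Data_Factors_daily.zip",
#               mode = "wb", cacheOK = FALSE)
```

Because the resulting URL no longer ends in .zip, mode = "wb" has to be given explicitly, as in the answer above.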
