download.file() fails when appending a random suffix to the URL

I'm trying to download a file in R on a remote server which sits behind a number of proxies. Something (I can't figure out what) is causing a cached copy of the file to be returned whenever I try to access it on that server, whether through R or through a web browser.
I've tried using cacheOK=FALSE in my download.file call and this has had no effect.
Following the suggestion in "Is there a way to force browsers to refresh/download images?", I have tried adding a random suffix to the end of the URL:
download.file(url = paste("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip?",
                          format(Sys.time(), "%d%m%Y"), sep = ""),
              destfile = "F-F_Research_Data_Factors_daily.zip", cacheOK = FALSE)
This produces, e.g., the following URL:
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip?17092012
which, when accessed from a web browser on the remote server, does return the latest version of the file. However, when accessed using download.file in R, it returns a corrupted zip archive: both WinRAR and R's unzip function complain that the file is corrupt.
unzip("F-F_Research_Data_Factors_daily.zip")
1: In unzip("F-F_Research_Data_Factors_daily.zip") :
internal error in unz code
I can't see why downloading this file via R would return a corrupted file, whereas downloading it via a web browser causes no problem.
Can anyone suggest either a way to beat the cache from R (about which I'm not hopeful), or a reason why download.file doesn't like my URL with ?someRandomString tacked onto the end?

It will work if you use mode = "wb":
download.file(url = paste("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip?",
                          format(Sys.time(), "%d%m%Y"), sep = ""),
              destfile = "F-F_Research_Data_Factors_daily.zip", mode = "wb", cacheOK = FALSE)

Related

Error when trying to read excel file from web site

I'm trying to download the xlsx file that is available at the following URL. If you go to the website and click the link, it will download as a file on your computer. However, I want to automate this process. I have tried the following:
library(RCurl)
download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx", "temp.xlsx")
library(readxl)
tmp <- read_xlsx("temp.xlsx")
# Error: Evaluation error: error reading from the connection.
This method does download a temp.xlsx file to my drive. However, if I manually click on it to open it, Excel fails to open it: it knows the file's size but is unable to read it. Trying the URL directly with readxl instead:
readxl::read_xlsx("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx")
# Error: `path` does not exist: ‘https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx’
Both of these methods are my go-to approaches for downloading Excel files from websites. Is there some specific reason why they don't work here?
When downloading certain file formats on Windows, you need to specify a binary transfer rather than the (usual) default of a text transfer. From the download.file() documentation:
The choice of binary transfer (mode = "wb" or "ab") is important on
Windows, since unlike Unix-alikes it does distinguish between text and
binary files and for text transfers changes \n line endings to \r\n
(aka ‘CRLF’).
On Windows, if mode is not supplied (missing()) and url ends in one of
.gz, .bz2, .xz, .tgz, .zip, .rda, .rds or .RData, mode = "wb" is set
such that a binary transfer is done to help unwary users.
Code written to download binary files must use mode = "wb" (or "ab"),
but the problems incurred by a text transfer will only be seen on
Windows.
In this case, so that the file is written correctly, use:
download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx",
"temp.xlsx", mode = "wb")

R - Cannot Download gz file from FTP Server

I have been trying for three days now to download a file from an FTP server with R, without success. I have tried everything and read all the related questions, but still cannot manage it.
The URL is:
u <- "ftp://user:password@109.2.160.55/AGLO/2020/10/AGLO_00001_03-0_GDBX_1000077_202010032206_860101.CSV.gz"
When I copy-paste this link into Firefox I can download the file, but with R I cannot. I have tried download.file, GET, writeBin, and getURL; all failed. getURL gives the following error:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  embedded nul in string: '\037‹\b\bøÙx_\0\003AGLO_00001_03-0_GDBX_1000077_202010032206_860101.CSV\0¬\\M³£¸’ [binary data truncated]'
There is no proxy problem whatsoever, since I found the URL u by browsing the FTP directory itself.
How can I download this file?
Another way I could eventually work around this is using:
browseURL(u, browser = "C:/Program Files (x86)/Mozilla Firefox/firefox.exe")
The issue with this is that it will open a Firefox browser that will:
1. ask me if I am sure I want to go to this site, and then
2. ask me what I want to do with the file (less of a problem, since I can set a default to always download, but still not what I want).
I simply do not want a browser to open or to be prompted at all. There are many files on this server, so I want to do all of this automatically, and I will need to work in parallel, so having a browser pop up is not great; but if all else fails I can accept it.
I am so desperate that I can give you the user name and password in private.
Apparently downloading the file to disk using httr solved the problem. It is possible to combine write_disk and httr::GET to download files to disk in the following way:
library(httr)
to_download <- "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
# Download pdf to disk
GET(to_download, write_disk("dummy.pdf"))
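The same pattern should carry over to the original FTP URL; a sketch, assuming httr's libcurl backend has FTP support enabled, with the placeholder credentials passed via authenticate() instead of being embedded in the URL:
library(httr)
u <- "ftp://109.2.160.55/AGLO/2020/10/AGLO_00001_03-0_GDBX_1000077_202010032206_860101.CSV.gz"
# write_disk() streams the body straight to a file, avoiding the
# embedded-nul error that comes from forcing binary data into a string
GET(u, authenticate("user", "password"), write_disk("AGLO.csv.gz", overwrite = TRUE))
# the gzipped CSV can then be read directly:
df <- read.csv(gzfile("AGLO.csv.gz"))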

Downloading NetCDF files with R: Manually works, download.file produces error

I am trying to download a set of NetCDF files from: ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/nwm.20180425/medium_range/
When I manually download the files I have no issues connecting, but when I use download.file and attempt to connect I get the following error:
Assertion failed!
Program: C:\Program Files\Rstudio\bin\rsession.exe
File: nc4file.c, Line 2771
Expression: 0
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
I have attempted to run the code in R without RStudio and got the same result.
My abbreviated code is as follows:
library("ncdf4")
library("ncdf4.helpers")
download.file("ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/nwm.20180425/medium_range/nwm.t00z.medium_range.channel_rt.f006.conus.nc","c:/users/nt/desktop/nwm.t00z.medium_range.channel_rt.f006.conus.nc")
temp = nc_open("c:/users/nt/desktop/nwm.t00z.medium_range.channel_rt.f006.conus.nc")
Adding mode = 'wb' to the download.file arguments solves the issue for me; I've had the same problem when downloading PDFs:
download.file("ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/nwm.20180425/medium_range/nwm.t00z.medium_range.channel_rt.f006.conus.nc",
              "C:/teste/teste.nc", mode = 'wb')

R: error downloading shapefiles with download.file() and opening them with readRDS() [duplicate]

I was getting hung up on Shiny Apps Tutorial Lesson 5 because I was unable to open the counties.rds file: readRDS() threw "error reading from connection".
I figured out I could open the .rds fine if I downloaded it with download.file(URL, dest, mode = "wb") or simply used my browser to download the file to my local directory.
Outstanding Question: Why does the counties.rds file not open properly if I use download.file() without setting mode = "wb"? I expect the answer will be something obvious like: "Duh, counties.rds is a binary file." However, before I try to answer my own question, I'd like confirmation from someone with more experience.
Repro steps:
download.file("http://shiny.rstudio.com/tutorial/lesson5/census-app/data/counties.rds",
"counties.rds")
counties <- readRDS("counties.rds")
Error in readRDS("counties.rds") : error reading from connection
Resolution: Download via browser or use binary mode (wb).
download.file("http://shiny.rstudio.com/tutorial/lesson5/census-app/data/counties.rds",
"counties.rds", mode = "wb")
counties <- readRDS("counties.rds") # Success!
My suggestion is to always specify 'mode' regardless, and it is pretty safe to always use mode = "wb". I would argue that the latter should be the default, and that the automatic recognition by file extension is faulty and should not be relied upon; cf. https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html
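In that spirit, a minimal sketch of a wrapper that always requests a binary transfer instead of relying on the extension-based detection (the name download_binary is hypothetical):
download_binary <- function(url, destfile, ...) {
  # mode = "wb" is harmless on Unix-alikes and prevents CRLF
  # mangling of binary files on Windows
  download.file(url, destfile, mode = "wb", ...)
}
download_binary("http://shiny.rstudio.com/tutorial/lesson5/census-app/data/counties.rds", "counties.rds")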

Importing a csv.gz file from an FTP server with Port and Directory Credentials into R

I want to import datasets into R that come from an FTP server. I am using FileZilla to view the files manually. Currently my data is in an xxxx.csv.gz file on the FTP server, and a new file gets added once a day.
My issue is that I have tried using the following question as guidance and it doesn't seem to work well in my case:
Using R to download newest files from ftp-server
When I attempt the following code, an error message comes up:
library(RCurl)
url <- "ftp://yourServer"
userpwd <- "yourUser:yourPass"
filenames <- getURL(url, userpwd = userpwd,
                    ftp.use.epsv = FALSE, dirlistonly = TRUE)
Error:
Error in function (type, msg, asError = TRUE) :
  Failed to connect to ftp.xxxx.com port 21: Timed out
This happens because the credentials state that I should use port 22, the secure port.
How do I modify my getURL call so that it connects on port 22?
Also, after making this call, there is a directory I need to navigate to in order to access the files.
For example purposes: let's say the directory is:
Directory: /xxx/xxxxx/xxxxxx
(I've also tried appending this directory to the original URL, and the same error message comes up.)
Basically I want to get access to this directory, load individual csv.gz files into R, and then automatically fetch the following day's data.
The file names are:
XXXXXX_20160205.csv.gz
(The file names are just dates and each file will correspond to the previous day)
I guess the first step is just to make a connection to the files and download them; later down the road I can automatically fetch the previous day's csv.gz file.
Any help would be great, thanks!
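Port 22 is SSH, so the server is almost certainly speaking SFTP rather than plain FTP, which is why getURL times out trying port 21. A sketch of one possible approach using the curl package, assuming its libcurl build has SFTP support (the host, directory, and credentials below are placeholders from the question):
library(curl)
h <- new_handle(userpwd = "yourUser:yourPass")
# build yesterday's file name from the XXXXXX_YYYYMMDD.csv.gz pattern
fname <- sprintf("XXXXXX_%s.csv.gz", format(Sys.Date() - 1, "%Y%m%d"))
u <- paste0("sftp://ftp.xxxx.com:22/xxx/xxxxx/xxxxxx/", fname)
curl_download(u, fname, handle = h)  # binary mode is the default here
df <- read.csv(gzfile(fname))  # the gzipped CSV can be read directly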
