downloading csv file works via libcurl yet does not via curl method - r

OS: Win 7 64 bit
RStudio Version 1.1.463
As per Getting and Cleaning Data course, I attempted to download a csv file with method = curl:
fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(fileUrl, destfile = "./cameras.csv", method = "curl")
Error in download.file(fileUrl, destfile = "./cameras.csv", method =
"curl") : 'curl' call had nonzero exit status
However, method = libcurl resulted a successful download:
download.file(fileUrl, destfile = "./cameras.csv", method = "libcurl")
trying URL
'https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD'
downloaded 9443 bytes
changing from *http***s** to http produced exactly the same results for curl and libcurl, respectively.
Is there anyway to make this download work via method = curl as per the course?
Thanks

As you can see from ?download.file:
For methods "wget" and "curl" a system call is made to the tool given
by method, and the respective program must be installed on your system
and be in the search path for executables. They will block all other
activity on the R process until they complete: this may make a GUI
unresponsive.
Therefore, you should install curlfirst.
See this How do I install and use curl on Windows? to learn how.
Best!

I believe there were a few issues here:
Followed the steps in the link quoted by #JonnyCrunch
a) Reinstalled Git for windows;
b) added C:\Program Files\Git\mingw64\bin\ to the 'PATH' variable;
c) Disabled Use Internet Explorer library/proxy for HTTP in RStudio in: Tools > Options > Packages
d) Attempted steps in 'e)' below and added data.baltimorecity.gov
website to exclusions as per Kaspersky anti-virus' prompt;
e) Then in RStudio:
options(download.file.method = "curl")
download.file(fileUrl, destfile="./data/cameras.csv")
Success!
Thank you

Related

R CMD check fails with ubuntu when trying to download file, but function works within R

I am writing an R package and one of its functions download and unzips a file from a link (it is not exported to the user, though):
download_f <- function(download_dir) {
utils::download.file(
url = "https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php",
destfile = file.path(download_dir, "fines.rar"),
mode = 'wb',
method = 'libcurl'
)
utils::unzip(
zipfile = file.path(download_dir, "fines.rar"),
exdir = file.path(download_dir)
)
}
This function works fine with me when I run it within some other function to compile an example in a vignette.
However, with R CMD check in github action, it fails consistently on ubuntu 16.04, release and devel. It [says][1]:
Error: Error: processing vignette 'IBAMA.Rmd' failed with diagnostics:
cannot open URL 'https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php'
--- failed re-building ‘IBAMA.Rmd’
SUMMARY: processing the following file failed:
‘IBAMA.Rmd’
Error: Error: Vignette re-building failed.
Execution halted
Error: Error in proc$get_built_file() : Build process failed
Calls: <Anonymous> ... build_package -> with_envvar -> force -> <Anonymous>
Execution halted
Error: Process completed with exit code 1.
When I run devtools::check() it never finishes running it, staying in "creating vignettes" forever. I don't know if these problems are related though because there are other vignettes on the package.
I pass the R CMD checks with mac os and windows. I've tried switching the "mode" and "method" arguments on utils::download.file, but to no avail.
Any suggestions?
[1]: https://github.com/datazoompuc/datazoom.amazonia/pull/16/checks?check_run_id=2026865974
The download fails because libcurl tries to verify the webservers certificate, but can't.
I can reproduce this on my system:
trying URL 'https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php'
Error in utils::download.file(url = "https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php", :
cannot open URL 'https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php'
In addition: Warning message:
In utils::download.file(url = "https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php", :
URL 'https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php': status was 'SSL peer certificate or SSH remote key was not OK'
The server does not allow you to download from http but redirects to https, so the only thing to do now is to tell libcurl to not check the certificate and accept what it is getting.
You can do this by specifying the argument -k to curl
download_f <- function(download_dir) {
utils::download.file(
url = "https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php",
destfile = file.path(download_dir, "fines.rar"),
mode = 'wb',
method = 'curl',
extra = '-k'
)
utils::unzip(
zipfile = file.path(download_dir, "fines.rar"),
exdir = file.path(download_dir)
)
}
This also produces some download progress bar, you can silence this by setting extra to -k -s
This now opens you up to a Machine In The Middle Attack. (You possibly already are attacked this way, there is no way to check without verifying the current certificate with someone you know at the other side)
So you could implement an extra check, e.g. check the sha256sum of the downloaded file and see if it matches what you expect to receive before proceeding.
myfile <- system.file("fines.rar")
hash <- sha256(file(myfile))

Error downloading GPL file with getGEO

Using OSX 10.11 and R 3.3.0 I get this error using GEOQuery package:
library(GEOquery)
GSE56045 <- getGEO("GSE56045")
It downloads the GSE file but not the GPL:
Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL10558&form=text&view=full'
It looks like the GPL file was redirected and the download method auto set in GEOquery fails to follow the redirect: setting options('download.file.method.GEOquery'='auto')
I was able to get it working by running this in R: options('download.file.method.GEOquery' = 'libcurl')
Also, I had to delete the old downloaded GPL file - which was just the redirect message. It's easier to just set a download directory instead of finding the temp file, using destdir = for the getGEO command.

R Gist script gives error in RGui console but works fine in RStudio console - Windows 8 R3.1.2(64 bit) [duplicate]

Is there some way to source an R script from the web?
e.g. source('http://github.com/project/R/file.r')
Reason: I currently have a project that I'd like to make available for use but isn't ready to be packaged yet. So it would be great to give people a single file to source from the web (that will then source all the individual function files).
On closer inspection, the problem appears to be https. How would I source this file?
https://raw.github.com/hadley/stringr/master/R/c.r
You can use the source_url in the devtools library
library(devtools)
source_url('https://raw.github.com/hadley/stringr/master/R/c.r')
This is a wrapper for the RCurl method by #ROLO
Yes you can, try running this R tutorial:
source("http://www.mayin.org/ajayshah/KB/R/tutorial.R")
(Source)
Https is only supported on Windows, when R is started with the --internet2 command line option (see FAQ):
> source("https://pastebin.com/raw.php?i=zdBYP5Ft")
> test()
[1] "passed"
Without this option, or on linux, you will get the error "unsupported URL scheme". In that case resort to the solution suggested by #ulidtko, or:
Here is a way to do it using RCurl, which also supports https:
library(RCurl)
eval( expr =
parse( text = getURL("http://www.mayin.org/ajayshah/KB/R/tutorial.R",
ssl.verifypeer=FALSE) ))
(You can remove the ssl.verifypeer if the ssl certificate is valid)
Yes, it is possible and worked for me right away.
R> source("http://pastebin.com/raw.php?i=zdBYP5Ft")
R> test()
[1] "passed"
Regarding the HTTPS part, it isn't supported by internal R code. However, R can use external utilities like wget or curl to fetch https:// URLs. One will need to write additional code to be able to source the files.
Sample code might be like this:
wget.and.source <- function(url) {
fname <- tempfile()
download.file(url, fname, method="wget")
source(fname)
unlink(fname)
}
There is a Windows-only solution too: start R with --internet2 commandline option. This will switch all the internet code in R to using IE, and consequently HTTPS will work.
Windows:
If Internet Explorer is configured to access the web using your organization's proxy, you can direct R to use these IE settings instead of the default R settings. This change can be made once by the following steps:
Save your work and close all R sessions you may have open.
Edit the following file. (Note: Your exact path will differ based on your R installation)
C:\Program Files\R\R-2.15.2\etc\Rprofile.site
Open this "Rprofile.site" file in Notepad and add the following line on a new line at the end of the file:
utils::setInternet2(TRUE)
You may now open a new R session and retry your "source" command.
Linux alikes:
Use G. Grothendieck's suggestion. At the command prompt within R type:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url/here.r")))
You may get an error saying:
cannot verify certificate - - - - Self-signed certificate encountered.
At this point it is up to you to decide whether you trust the person issuing the self-signed certificate and proceed or to stop.
If you decide to proceed, you can connect insecurely as follows:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url.r", "--no-check-certificate")))
For more details, see the following:
See section 2.19
CRAN R Documentation 2.19
wget documentation section 2.8 for "no-check-certificate"
Similar questions here:
Stackoverflow setInternet2 discussion
Stackoverflow Proxy configuration discussion
The methods here were giving me the following error from github:
OpenSSL: error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)
I used the following function to resolve it:
github.download = function(url) {
fname <- tempfile()
system(sprintf("curl -3 %s > %s", url, fname))
return(fname)
}
source(github.download('http://github.com/project/R/file.r'))
Hope that helps!
This is working for me on windows:
library(RCurl)
# load functions and scripts from github ----------------------------
fn1 <- getURL("https://raw.githubusercontent.com/SanjitNarwekar/Advanced-R-Programming/master/fn_factorial_loop.R", ssl.verifypeer = FALSE)
eval(parse(text = fn1))

Unable to downlad file to load data on R

OS- Windows 7
R version 3.0.3
What I typed on R console :
if(!file.exists("data")){dir.create("data")}
fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.xlsx?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.xlsx",method="curl")
dateDownloaded <- date()
What I got on R console
Warning messages:
1: running command 'curl "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.xlsx?accessType=DOWNLOAD" -o "./data/cameras.xlsx"' had status 127
2: In download.file(fileUrl, destfile = "./data/cameras.xlsx", method = "curl") :
download had nonzero exit status
How do I make it right ?
Your code works for me. You probably don't have cURL installed on your system. See ?download.file:
For methods "wget", "curl" and "lynx" a system call is made to the tool given by method, and the respective program must be installed on your system and be in the search path for executables.
Install cURL and make sure it is in your path.
After removing the method="curl" parameter, it works well on Windows.

Warning message In download.file: download had nonzero exit status

I am downloading data from data.gov website and I get following two types of errors in the process:
fileUrl <- "http://catalog.data.gov/dataset/expenditures-on-children-by-families"
download.file(fileUrl,destfile=".data/studentdata.csv",method="curl")
Warning message:
In download.file(fileUrl, destfile = ".data/studentdata.csv", method = "curl") :
download had nonzero exit status
I tried to remove the method="curl" as suggested in other forum, but again I get this new error
download.file(fileUrl,destfile=".data/studentdata.csv")
Error in download.file(fileUrl, destfile = ".data/studentdata.csv") :
cannot open destfile '.data/studentdata.csv', reason 'No such file or directory'
I think there are two major factors why your curl doesn't work well.
First, the problem is on your URL. fileUrl <- "http://catalog.data.gov/dataset/expenditures-on-children-by-families". In your URL, it is not referred to a csv file. So, they won't work even if you set the destination into a csv file such as destfile = ".data/studentdata.csv"
I have an example of getting a csv dataset using the same code (different dataset):
DataURL<- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD" (This link refers to a rows.csv file)
download.file(DataURL, destfile="./data/rows.csv", method="curl") (The method is quite same, using curl)
Second, previously I had the same problem that the curl does not work, even I used a proper URL that refers to a csv file. However, when I diagnosed a bit deeper, I found something interesting fact about why my curl method cannot work properly. It was my R session program. I used a 32-bit R, in which the error occurs. Later then, I tried to change the session into a 64-bit R. Amazingly, and the download status was running at that time. To see your R session architecture (whether you are using 32-bit or 64-bit), type in your R:
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-ming32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
You have to switch your R, from 32-bit to 64-bit to avoid 'curl' call had nonzero exit status. You go to your R directory folder, and then you run a 64-bit R.
If you are using a Windows OS and installing the R in a default path folder, you can run this C:\Program Files\R\R-3.5.3\bin\x64\R.exe. (I used a version of 3.5.3, so it may be different with your version)
If you are using R-studio, you can switch the R session on the menubar Tools -> Global Options -> R version -> Change -> Use your machine's default version of R64 (64-bit) -> OK. Then restart your R-studio.
However, it depends on your OS architecture. If you are using a 32-bit OS, hence you have to find another way to solve this.
So looking at the code for download.file(...), if you specify method="curl" the function tries to use the curl shell command. If this command does not exist on your system, you will get the error above.
If you do not specify a method, the default is to use an internal R method to download, which evidently works on your system. In that case, the function is trying to put the file in .data/studentdata.csv but evidently there is not .data directory. Try taking out the ..
When this download works, you will get a text/html file, not a csv file. Your url points to a web page, not a download link. That page does have a download link, but unfortunately it is a pdf, not a csv.
Finally, if your goal is to have the data in R (is it?), and if the link actually produces a csv file, you could more easily use
df <- read.csv(fileUrl)
If I'm not very much mistaken you just have a simple typo here. I suspect you have a "data" directory, not a ".data" directory - in which case your only problem is that your destfile string needs to begin "./data", not ".data".
I was having the same problem.
Then I realized that I forget to create the "data" directory!
So try adding this above your fileURL line to create the directory first.
if(!file.exists("data")){
dir.create("data")
}
Also, if you are running a Mac, then you want to keep method="curl" when downloading a https file. I don't believe Windows has that problem hence the suggestions to remove it.
Try this:
file<-'http://catalog.data.gov/dataset/expenditures-on-children-by-families'
file<- read.csv(file)

Resources