Reading an online xlsx file into R - r

I am trying to download spreadsheets from AQR data library into R directly.
I have this link: http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx which prompts a download. However, when trying the following code:
> url1<-"http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx"
> download.file(url1,destfile="example.xlsx")
I get this error
trying URL 'http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx'
Error in download.file(url1, destfile = "example.xlsx") : cannot open URL 'http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx'
https://www.aqr.com/library/data-sets/value-and-momentum-everywhere-portfolios-monthly is the page from which I am trying to download data(under full set data link).
Could you provide some guidance?

It looks like that link redirects to https, which download.file does not support by default. If you have wget or curl installed you can use
download.file("https://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx",
"example.xlsx",
method = "wget")
or
download.file("https://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx",
"example.xlsx",
method = "curl")
These and other options are discussed at Download a file from HTTPS using download.file()

I'm not quite sure what is causing the problem for you, but the following worked for me:
library(XLConnect)
##
con <- "http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx"
download.file(con,"xlsxFile.xlsx",mode="wb")
##
newWB <- loadWorkbook(
file="xlsxFile.xlsx",
create=F)
##
R> getSheets(newWB)
[1] "VME Portfolios" "Definitions" "Data Sources" "Disclosures"
and here's a screenshot of the downloaded file:

Related

Certificate required for download.file() R

I am trying to download a data package from KNB and keep getting an "cannot open URL" and "InternetOpenUrl failed: 'A certificate is required to complete client authentication'". I have checked my github credentials and everything seems to be in order, but I just updated git and setup a PAT. Code below, but note that you will have to set your own directory.
This worked literally two weeks ago. Not sure what changed.
download.file("https://knb.ecoinformatics.org/knb/d1/mn/v2/packages/application%2Fbagit-097/resource_map_urn%3Auuid%3A14644b19-6e53-4063-aad9-fc823a45ac50", destfile = #your dir#, method = "wininet")
The suggested adding 'mode = "wb"' to download.file() works. I have been able to download, unzip and access the various data types in the downloaded folder.
Thank you!
download.file("https://knb.ecoinformatics.org/knb/d1/mn/v2/packages/application%2Fbagit-097/resource_map_urn%3Auuid%3A14644b19-6e53-4063-aad9-fc823a45ac50", destfile = #your dir#, method = "libcurl", mode = "wb")
unzip(zipfile = "./data.zip", exdir = "./data")

R Gist script gives error in RGui console but works fine in RStudio console - Windows 8 R3.1.2(64 bit) [duplicate]

Is there some way to source an R script from the web?
e.g. source('http://github.com/project/R/file.r')
Reason: I currently have a project that I'd like to make available for use but isn't ready to be packaged yet. So it would be great to give people a single file to source from the web (that will then source all the individual function files).
On closer inspection, the problem appears to be https. How would I source this file?
https://raw.github.com/hadley/stringr/master/R/c.r
You can use the source_url in the devtools library
library(devtools)
source_url('https://raw.github.com/hadley/stringr/master/R/c.r')
This is a wrapper for the RCurl method by #ROLO
Yes you can, try running this R tutorial:
source("http://www.mayin.org/ajayshah/KB/R/tutorial.R")
(Source)
Https is only supported on Windows, when R is started with the --internet2 command line option (see FAQ):
> source("https://pastebin.com/raw.php?i=zdBYP5Ft")
> test()
[1] "passed"
Without this option, or on linux, you will get the error "unsupported URL scheme". In that case resort to the solution suggested by #ulidtko, or:
Here is a way to do it using RCurl, which also supports https:
library(RCurl)
eval( expr =
parse( text = getURL("http://www.mayin.org/ajayshah/KB/R/tutorial.R",
ssl.verifypeer=FALSE) ))
(You can remove the ssl.verifypeer if the ssl certificate is valid)
Yes, it is possible and worked for me right away.
R> source("http://pastebin.com/raw.php?i=zdBYP5Ft")
R> test()
[1] "passed"
Regarding the HTTPS part, it isn't supported by internal R code. However, R can use external utilities like wget or curl to fetch https:// URLs. One will need to write additional code to be able to source the files.
Sample code might be like this:
wget.and.source <- function(url) {
fname <- tempfile()
download.file(url, fname, method="wget")
source(fname)
unlink(fname)
}
There is a Windows-only solution too: start R with --internet2 commandline option. This will switch all the internet code in R to using IE, and consequently HTTPS will work.
Windows:
If Internet Explorer is configured to access the web using your organization's proxy, you can direct R to use these IE settings instead of the default R settings. This change can be made once by the following steps:
Save your work and close all R sessions you may have open.
Edit the following file. (Note: Your exact path will differ based on your R installation)
C:\Program Files\R\R-2.15.2\etc\Rprofile.site
Open this "Rprofile.site" file in Notepad and add the following line on a new line at the end of the file:
utils::setInternet2(TRUE)
You may now open a new R session and retry your "source" command.
Linux alikes:
Use G. Grothendieck's suggestion. At the command prompt within R type:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url/here.r")))
You may get an error saying:
cannot verify certificate - - - - Self-signed certificate encountered.
At this point it is up to you to decide whether you trust the person issuing the self-signed certificate and proceed or to stop.
If you decide to proceed, you can connect insecurely as follows:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url.r", "--no-check-certificate")))
For more details, see the following:
See section 2.19
CRAN R Documentation 2.19
wget documentation section 2.8 for "no-check-certificate"
Similar questions here:
Stackoverflow setInternet2 discussion
Stackoverflow Proxy configuration discussion
The methods here were giving me the following error from github:
OpenSSL: error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)
I used the following function to resolve it:
github.download = function(url) {
fname <- tempfile()
system(sprintf("curl -3 %s > %s", url, fname))
return(fname)
}
source(github.download('http://github.com/project/R/file.r'))
Hope that helps!
This is working for me on windows:
library(RCurl)
# load functions and scripts from github ----------------------------
fn1 <- getURL("https://raw.githubusercontent.com/SanjitNarwekar/Advanced-R-Programming/master/fn_factorial_loop.R", ssl.verifypeer = FALSE)
eval(parse(text = fn1))

How to set proxy for XML package in R

How to setup proxy for XML package?
The Rcurl package is working fine if I set:
options(RCurlOptions = list(proxy = "111.22.33.44.333", proxyport = 0000))
But id doesn't work for XML package functions.
If I set the explorer setup still nothing:
setInternet2(TRUE)
I have also added the setInternet2(TRUE) to .Rprofile but still is failing to take on the proxy. So how to assure the proxy is set globally or rather how to make it work for XML package function such as readHTMLTable etc.
As RCurl is working
try to download the HTML-Page via getURL and then parse it with readHTMLTable:
require(RCurl)
require(XML)
url <- "yourURL.com"
doc_raw <- getURL(url)
tab <- readHTMLTable(doc_raw)

Importing Excel file using url using read.xls

I'm trying to use read.xls from gdata to import an Excel file directly into R. I'm on a Windows machine running 64 bit R.
I have checked my PATH variable for perl and I appear to have that set correctly, so that doesn't appear to be a problem. Here's my code, and I've attached my error below. Does anyone have any pointers on how I can get this done?
require(RCurl)
require(gdata)
url <- "https://dl.dropboxusercontent.com/u/27644144/NADAC%2020140101.xls"
test <- read.xls(url)
The error I'm getting is:
Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
Intermediate file 'C:\Users\Me\AppData\Local\Temp\RtmpeoJNxP\file338c26156d7.csv' missing!
In addition: Warning message:
running command '"C:\STRAWB~1\perl\bin\perl.exe" "C:/Users/Me/Documents/R/win-library/3.0/gdata/perl/xls2csv.pl" "https://dl.dropboxusercontent.com/u/27644144/NADAC%2020140101.xls" "C:\Users\Me\AppData\Local\Temp\RtmpeoJNxP\file338c26156d7.csv" "1"' had status 22
Error in file.exists(tfn) : invalid 'file' argument
#G.G is correct that read.xls does not support https. However, if you simply replace the https with http in the url you should be able to download the file.
Give this a try:
require(RCurl)
require(gdata)
url <- "http://dl.dropboxusercontent.com/u/27644144/NADAC%2020140101.xls"
test <- read.xls(url)
read.xls supports http and ftp but does not support https. Download it first and then use read.xls with the downloaded file.

Sourcing R script over HTTPS

Is there some way to source an R script from the web?
e.g. source('http://github.com/project/R/file.r')
Reason: I currently have a project that I'd like to make available for use but isn't ready to be packaged yet. So it would be great to give people a single file to source from the web (that will then source all the individual function files).
On closer inspection, the problem appears to be https. How would I source this file?
https://raw.github.com/hadley/stringr/master/R/c.r
You can use the source_url in the devtools library
library(devtools)
source_url('https://raw.github.com/hadley/stringr/master/R/c.r')
This is a wrapper for the RCurl method by #ROLO
Yes you can, try running this R tutorial:
source("http://www.mayin.org/ajayshah/KB/R/tutorial.R")
(Source)
Https is only supported on Windows, when R is started with the --internet2 command line option (see FAQ):
> source("https://pastebin.com/raw.php?i=zdBYP5Ft")
> test()
[1] "passed"
Without this option, or on linux, you will get the error "unsupported URL scheme". In that case resort to the solution suggested by #ulidtko, or:
Here is a way to do it using RCurl, which also supports https:
library(RCurl)
eval( expr =
parse( text = getURL("http://www.mayin.org/ajayshah/KB/R/tutorial.R",
ssl.verifypeer=FALSE) ))
(You can remove the ssl.verifypeer if the ssl certificate is valid)
Yes, it is possible and worked for me right away.
R> source("http://pastebin.com/raw.php?i=zdBYP5Ft")
R> test()
[1] "passed"
Regarding the HTTPS part, it isn't supported by internal R code. However, R can use external utilities like wget or curl to fetch https:// URLs. One will need to write additional code to be able to source the files.
Sample code might be like this:
wget.and.source <- function(url) {
fname <- tempfile()
download.file(url, fname, method="wget")
source(fname)
unlink(fname)
}
There is a Windows-only solution too: start R with --internet2 commandline option. This will switch all the internet code in R to using IE, and consequently HTTPS will work.
Windows:
If Internet Explorer is configured to access the web using your organization's proxy, you can direct R to use these IE settings instead of the default R settings. This change can be made once by the following steps:
Save your work and close all R sessions you may have open.
Edit the following file. (Note: Your exact path will differ based on your R installation)
C:\Program Files\R\R-2.15.2\etc\Rprofile.site
Open this "Rprofile.site" file in Notepad and add the following line on a new line at the end of the file:
utils::setInternet2(TRUE)
You may now open a new R session and retry your "source" command.
Linux alikes:
Use G. Grothendieck's suggestion. At the command prompt within R type:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url/here.r")))
You may get an error saying:
cannot verify certificate - - - - Self-signed certificate encountered.
At this point it is up to you to decide whether you trust the person issuing the self-signed certificate and proceed or to stop.
If you decide to proceed, you can connect insecurely as follows:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url.r", "--no-check-certificate")))
For more details, see the following:
See section 2.19
CRAN R Documentation 2.19
wget documentation section 2.8 for "no-check-certificate"
Similar questions here:
Stackoverflow setInternet2 discussion
Stackoverflow Proxy configuration discussion
The methods here were giving me the following error from github:
OpenSSL: error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)
I used the following function to resolve it:
github.download = function(url) {
fname <- tempfile()
system(sprintf("curl -3 %s > %s", url, fname))
return(fname)
}
source(github.download('http://github.com/project/R/file.r'))
Hope that helps!
This is working for me on windows:
library(RCurl)
# load functions and scripts from github ----------------------------
fn1 <- getURL("https://raw.githubusercontent.com/SanjitNarwekar/Advanced-R-Programming/master/fn_factorial_loop.R", ssl.verifypeer = FALSE)
eval(parse(text = fn1))

Resources