I just started playing around with the Twitter Streaming API and, from the command line, redirect the raw JSON responses to a file using the command below:
curl https://stream.twitter.com/1/statuses/sample.json -u USER:PASSWORD -o "somefile.txt"
Is it possible to stay completely within R and leverage RCurl to do the same thing? Instead of just saving the output to a file, I would like to parse each response as it is returned. I have parsed Twitter search results in the past, but I would like to do this per response, essentially applying a function to each JSON response.
Thanks in advance.
EDIT: Here is the code that I have tried in R (I am on Windows, unfortunately). I need to include the reference to the .pem file to avoid the certificate error. However, the code just "runs" and I cannot see what is returned. I have tried print(), cat(), etc.
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
getURL("https://stream.twitter.com/1/statuses/sample.json",
userpwd="USER:PWD",
cainfo = "cacert.pem")
I was able to figure out the basics; hopefully this helps.
#==============================================================================
# Streaming twitter using RCURL
#==============================================================================
library(RCurl)
library(rjson)
# set the directory
setwd("C:\\")
#### redirects output to a file
WRITE_TO_FILE <- function(x) {
  if (nchar(x) > 0) {
    write.table(x, file = "Twitter Stream Capture.txt",
                append = TRUE, row.names = FALSE, col.names = FALSE)
  }
}
### windows users will need to get this certificate to authenticate
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
### write the raw JSON data from the Twitter firehose to a text file
getURL("https://stream.twitter.com/1/statuses/sample.json",
userpwd=USER:PASSWORD,
cainfo = "cacert.pem",
write=WRITE_TO_FILE)
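To parse each response as it arrives instead of writing raw text, you can swap in a callback that splits each chunk into lines and hands them to rjson. This is only a sketch: libcurl does not guarantee that a chunk ends on a line boundary, so a robust version would buffer partial lines.
### sketch: apply a function to each JSON response as it arrives
### (assumes each chunk holds whole lines, which is not guaranteed)
PARSE_CHUNK <- function(x) {
  for (ln in strsplit(x, "\r\n", fixed = TRUE)[[1]]) {
    if (nchar(ln) > 0) {
      tweet <- try(rjson::fromJSON(ln), silent = TRUE)
      if (!inherits(tweet, "try-error") && !is.null(tweet$text)) {
        cat(tweet$text, "\n")   # replace with any per-tweet function
      }
    }
  }
}
getURL("https://stream.twitter.com/1/statuses/sample.json",
       userpwd = "USER:PASSWORD",
       cainfo = "cacert.pem",
       write = PARSE_CHUNK)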
Try the twitteR package, an R interface to the Twitter API.
install.packages('twitteR')
library(twitteR)
I think this is what you need.
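For example, a minimal sketch (the OAuth credentials below are placeholders you must create on the Twitter developer site):
library(twitteR)
# Placeholder credentials -- substitute your own app's keys
setup_twitter_oauth("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
tweets <- searchTwitter("rstats", n = 25)  # returns a list of status objects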
I'm trying to implement R in the workplace and save a bit of time from all the data churning we do.
A lot of files we receive are sent to us via SFTP as they contain sensitive information.
I've looked around on StackOverflow and Google but nothing seems to work for me. I tried using the RCurl library, following an example I found online, but it doesn't let me include the port (22) as part of the login details.
library(RCurl)
protocol <- "sftp"
server <- "hostname"
userpwd <- "user:password"
tsfrFilename <- "Reports/Excelfile.xlsx"
ouptFilename <- "~/Test.xlsx"
url <- paste0(protocol, "://", server, tsfrFilename)
data <- getURL(url = url, userpwd=userpwd)
I end up getting the error code
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string:
Any help would be greatly appreciated as this will save us loads of time!
Thanks,
Shan
Looks like a similar situation here: Using R to download SAS file from ftp-server
I'm no expert in R, but it looks like getBinaryURL() worked there instead of getURL() in the example given.
Hope that helps
M
Note that there are two similarly named packages, RCurl and curl. With RCurl, I successfully used keyfiles to connect via SFTP:
opts <- list(
  ssh.public.keyfile = pubkey,        # file name
  ssh.private.keyfile = privatekey,   # file name
  keypasswd = keypasswd               # optional password
)
RCurl::getURL(url=uri, .opts = opts, curl = RCurl::getCurlHandle())
For this to work, you need to create the keyfiles first, e.g. via PuTTY (puttygen) or similar.
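For completeness, the variables in the list above are just file paths plus an optional passphrase, e.g. (hypothetical values):
pubkey     <- "~/.ssh/id_rsa.pub"  # hypothetical key paths
privatekey <- "~/.ssh/id_rsa"
keypasswd  <- "my-passphrase"      # only needed if the key is protected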
I too was having problems specifying the port options when using the getURI() and getURL() functions.
In order to specify the port, you simply add the port as port = #### instead of port(####). For example:
data <- getURI(url = url,
               userpwd = userpwd,
               port = 22)
Now, as @MarkThomas pointed out, whenever you get an encoding error, try getBinaryURL() instead of getURI(). In most cases, this will let you download SAS files as well as .csv files encoded in UTF-8 or Latin-1.
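Putting those pieces together, here is a minimal sketch (untested; the host, credentials, and paths are the placeholders from the question) that fetches the file as raw bytes, avoiding the embedded-nul error, and writes it to disk:
library(RCurl)
# Fetch the spreadsheet as raw bytes and write it out unchanged.
# Host, credentials, and paths are placeholders from the question.
url <- "sftp://hostname/Reports/Excelfile.xlsx"
bin <- getBinaryURL(url = url, userpwd = "user:password", port = 22)
writeBin(bin, "Test.xlsx")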
I am trying to get a large dataset (3+ GB) from the following server:
ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/L4/GLOB/JPL/MUR
I know RCurl is a good package for getting data from FTP. The file is a netCDF file compressed with bz2, and I need to uncompress it to read it into R using ncdf4.
Importantly, the file is larger than I want on my hard drive, so saving a copy locally is not an ideal option. How can I access data on the file without saving a copy to my disk first?
Here's my attempt so far:
library(RCurl); library(ncdf4)
d = getURL('ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/L4/GLOB/JPL/MUR/2015/144/20150524-JPL-L4UHfnd-GLOB-v01-fv04-MUR.nc.bz2')
d = bzfile(d, open = 'r')
d = nc_open(d)
But I'm stuck at this cryptic error after the first line:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string: 'BZh91AY&SY...' [binary bzip2 data truncated]
It seems to be an encoding issue based on other similar problems, but I tried both .encoding = 'UTF-8' and .encoding = 'ISO-8859-1' as shown in the getURL() documentation and neither works.
I've seen other answers to problems like this but they all seem to involve editing the source file. I don't have write access to this file, however. Any help?
I'd use httr for this
library("httr")
library("ncdf4")
url <- 'ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/L4/GLOB/JPL/MUR/2015/144/20150524-JPL-L4UHfnd-GLOB-v01-fv04-MUR.nc.bz2'
res <- GET(url, write_disk(basename(url)))
# uncompress - I used OSX's default compression tool
nc_open(sub("\\.bz2", "", res$request$output$path))
The only step I didn't sort out programmatically is uncompressing the bz2 file; I just did that with OS X's default tool.
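If you want that step inside R as well, one option (an assumption on my part; it requires the R.utils package) is:
# Decompress within R instead of with an external tool (R.utils assumed installed)
R.utils::bunzip2(res$request$output$path, remove = FALSE)
nc_open(sub("\\.bz2$", "", res$request$output$path))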
I don't know much at all about R, but you should be able to do this with curl in FTP mode by sending the output to stdout rather than a local file and then piping it through bunzip2 to uncompress it from standard input.
So, for example, I can do this:
curl --output - --user user:password 'ftp://127.0.0.1/somefile.bz2' | bunzip2 ...
Maybe you can start that from within R? Or make a fifo with:
mkfifo fifo
curl ....
and then read from the file called fifo in R.
Or, since R has a system() command, you could do:
system('mkfifo fifo; curl ..... | bunzip2 .... > fifo &')
and then read from the file called fifo in R.
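For what it's worth, here is a minimal sketch of that idea from within R using pipe() (my own addition; it assumes curl and bunzip2 are on the PATH, and note that nc_open() itself still needs a real file):
# Stream the download through bunzip2 and read decompressed bytes in R
cmd <- paste0("curl -s 'ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/",
              "L4/GLOB/JPL/MUR/2015/144/",
              "20150524-JPL-L4UHfnd-GLOB-v01-fv04-MUR.nc.bz2' | bunzip2 -c")
con <- pipe(cmd, open = "rb")
chunk <- readBin(con, what = "raw", n = 1e6)  # first ~1 MB of decompressed data
close(con)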
I am trying to download spreadsheets from AQR data library into R directly.
I have this link: http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx which prompts a download. However, when trying the following code:
> url1<-"http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx"
> download.file(url1,destfile="example.xlsx")
I get this error
trying URL 'http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx'
Error in download.file(url1, destfile = "example.xlsx") : cannot open URL 'http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx'
https://www.aqr.com/library/data-sets/value-and-momentum-everywhere-portfolios-monthly is the page from which I am trying to download data (under the full set data link).
Could you provide some guidance?
It looks like that link redirects to https, which download.file does not support by default (in older versions of R). If you have wget or curl installed you can use
download.file("https://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx",
"example.xlsx",
method = "wget")
or
download.file("https://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx",
"example.xlsx",
method = "curl")
These and other options are discussed at Download a file from HTTPS using download.file()
I'm not quite sure what is causing the problem for you, but the following worked for me:
library(XLConnect)
##
con <- "http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx"
download.file(con, "xlsxFile.xlsx", mode = "wb")
##
newWB <- loadWorkbook(
  file = "xlsxFile.xlsx",
  create = FALSE)
##
R> getSheets(newWB)
[1] "VME Portfolios" "Definitions" "Data Sources" "Disclosures"
Is there some way to source an R script from the web?
e.g. source('http://github.com/project/R/file.r')
Reason: I currently have a project that I'd like to make available for use but isn't ready to be packaged yet. So it would be great to give people a single file to source from the web (that will then source all the individual function files).
On closer inspection, the problem appears to be https. How would I source this file?
https://raw.github.com/hadley/stringr/master/R/c.r
You can use source_url() from the devtools package:
library(devtools)
source_url('https://raw.github.com/hadley/stringr/master/R/c.r')
This is a wrapper for the RCurl method by @ROLO.
Yes you can; try running this R tutorial:
source("http://www.mayin.org/ajayshah/KB/R/tutorial.R")
(Source)
HTTPS is only supported on Windows when R is started with the --internet2 command-line option (see the R FAQ):
> source("https://pastebin.com/raw.php?i=zdBYP5Ft")
> test()
[1] "passed"
Without this option, or on Linux, you will get the error "unsupported URL scheme". In that case, resort to the solution suggested by @ulidtko, or:
Here is a way to do it using RCurl, which also supports https:
library(RCurl)
eval(expr = parse(text = getURL("http://www.mayin.org/ajayshah/KB/R/tutorial.R",
                                ssl.verifypeer = FALSE)))
(You can drop ssl.verifypeer if the SSL certificate is valid.)
Yes, it is possible and worked for me right away.
R> source("http://pastebin.com/raw.php?i=zdBYP5Ft")
R> test()
[1] "passed"
Regarding the HTTPS part, it isn't supported by internal R code. However, R can use external utilities like wget or curl to fetch https:// URLs. One will need to write additional code to be able to source the files.
Sample code might be like this:
wget.and.source <- function(url) {
  fname <- tempfile()                         # temporary local copy
  download.file(url, fname, method = "wget")  # wget handles the https:// URL
  source(fname)
  unlink(fname)                               # clean up the temp file
}
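Usage is then, e.g. (with the stringr file from the question):
wget.and.source("https://raw.github.com/hadley/stringr/master/R/c.r")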
There is a Windows-only solution too: start R with the --internet2 command-line option. This switches all the internet code in R to using IE, and consequently HTTPS will work.
Windows:
If Internet Explorer is configured to access the web using your organization's proxy, you can direct R to use these IE settings instead of the default R settings. This change can be made once by the following steps:
Save your work and close all R sessions you may have open.
Edit the following file. (Note: Your exact path will differ based on your R installation)
C:\Program Files\R\R-2.15.2\etc\Rprofile.site
Open this "Rprofile.site" file in Notepad and add the following line on a new line at the end of the file:
utils::setInternet2(TRUE)
You may now open a new R session and retry your "source" command.
Linux and the like:
Use G. Grothendieck's suggestion. At the command prompt within R type:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url/here.r")))
You may get an error saying:
cannot verify certificate ... Self-signed certificate encountered.
At this point it is up to you to decide whether you trust the person issuing the self-signed certificate and proceed or to stop.
If you decide to proceed, you can connect insecurely as follows:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url.r", "--no-check-certificate")))
For more details, see the following:
CRAN R documentation, section 2.19
wget documentation, section 2.8, for "no-check-certificate"
Similar questions here:
Stackoverflow setInternet2 discussion
Stackoverflow Proxy configuration discussion
The methods here were giving me the following error from GitHub:
OpenSSL: error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)
I used the following function to resolve it:
github.download <- function(url) {
  fname <- tempfile()
  # curl -3 forces SSLv3, which avoids the handshake error above
  system(sprintf("curl -3 %s > %s", url, fname))
  return(fname)
}
source(github.download('http://github.com/project/R/file.r'))
Hope that helps!
This is working for me on Windows:
library(RCurl)
# load functions and scripts from github ----------------------------
fn1 <- getURL("https://raw.githubusercontent.com/SanjitNarwekar/Advanced-R-Programming/master/fn_factorial_loop.R", ssl.verifypeer = FALSE)
eval(parse(text = fn1))