How to make a GET request in base R?

I have seen several methods of making a GET request in R.
Using httr
myurl <- "https://stackoverflow.com"
library(httr)
httr::GET(myurl)
Using rvest/xml2
library(rvest)
xml2::read_html(myurl)
Using curl
I have tested using curl and can confirm that the following works from a standard MacBook and from a Windows 10 device:
command <- paste("curl", myurl)
system(command)
Question
The third method above (using curl) seems to work fairly universally.
Is there any better way of making a GET request (or similarly a HEAD, POST, etc.) than the method above using curl?
'Better' in this case means one that works universally across operating systems, with minimal coding and minimal external programs/libraries to install. Or is using curl (through a call to system) the best way?

From base R there is url with method = "libcurl":
con <- url("https://www.stackoverflow.com", method = "libcurl")
tmp <- readLines(con)
close(con)  # close the connection once the response has been read
Also, this is not strictly base R, but from utils there is url.show:
utils::url.show("https://stackoverflow.com", method = "curl")
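For the other verbs mentioned in the question, the same system-call approach extends naturally. A minimal sketch, assuming the curl binary is on the PATH (httpbin.org is only an illustrative test endpoint, not part of the original question):
# HEAD request: -I asks curl to fetch headers only
system2("curl", c("-I", "https://httpbin.org/get"))
# POST request: -d supplies the request body
system2("curl", c("-d", "key=value", "https://httpbin.org/post"))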


How to use proxies with urls in R?

I am trying to use proxies with my request URLs in R, but the requested URL gets changed from "www.abc.com/games" to "www.abc.com/unsupportedbrowser".
The proxies are working, since I tested them in Python; however, I would like to implement the same in R.
I tried using the "httr" and "crul" libraries in R:
# Using the httr library
library(httr)
r <- GET(url, use_proxy("proxy_url", port = 12345, username = "abc", password = "xyz"))
text <- content(r, "text")
#using "crul"
res <- HttpClient$new(
url,
proxies = proxy(proxy_url:12345,"abc","xyz")
)
out <-res$get()
text <-out$parse("UTF-8")
Is there some other way to implement the above using proxies, or how can I avoid the request URL changing from "www.abc.com/games" to "www.abc.com/unsupportedbrowser"?
I also tried using the "requestsR" package. However, when I try something like this:
library(dplyr)
library(reticulate)
library(jsonlite)
library(requestsR)
library(rvest)
library(listviewer)
proxies <-
  "{'http': 'http://abc:xyz#proxy_url:12345',
    'https': 'https://abc:xyz#proxy_url:12345'}" %>%
  convert_dictionary_to_list()
res <- Get(url, proxy = proxies)
It gives an error: "r$get : $ operator is invalid for atomic vectors"
I don't understand why it raises such an error. Please let me know if this can be resolved.
Thanks!
I was able to solve the above problem by using the "user_agent" argument with my GET().
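A minimal sketch of that fix combined with the proxy settings, assuming hypothetical proxy details; the user-agent string is illustrative, not from the original post:
library(httr)
r <- GET(
  "http://www.abc.com/games",
  use_proxy("proxy_url", port = 12345, username = "abc", password = "xyz"),
  user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")  # present a browser-like user agent
)
text <- content(r, "text")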

How to distribute a plumber API over several files using mounts?

I am dealing with a large API and I'd like to distribute its definition over several files.
As far as I understood from reading the documentation, this is where the mount() method of a plumber router comes into play.
I have tried the following:
iris.R:
#* Return a bit of iris
#* @get /iris
function() {
  head(iris)
}
In a new R session, running:
irisAPI <- plumber::plumb("iris.R")
server <- plumber::plumber$new()
server$mount("/server", irisAPI)
server$run(host = "0.0.0.0", port = 8080, swagger = TRUE)
Curling the endpoint returns nothing and the swagger JSON is empty.
Cancelling and then running the exact same thing on the irisAPI plumb directly makes it work.
Is this a bug, or am I missing something?
Thanks,
I had the same problem.
The problem was the plumber version. CRAN has 0.4.6; you need to download the 0.5.0 version from GitHub using the devtools library in R (the docs say 0.5.0, but the downloaded version reports itself as 0.4.7.9000).
https://github.com/trestletech/plumber/blob/master/NEWS.md
https://cran.r-project.org/web/packages/plumber/index.html
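A one-line sketch of that install, assuming devtools is already available:
# install the development version of plumber straight from GitHub
devtools::install_github("trestletech/plumber")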
The following code ran successfully for me:
library(plumber)
root <- plumber$new()
a <- plumber$new("controllers/a.R")
root$mount("/a", a)
b <- plumber$new("controllers/b.R")
root$mount("/b", b)
root$run(port = 8080, swagger = TRUE, debug = TRUE)
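For completeness, each mounted file is just an ordinary plumber endpoint file; a hypothetical controllers/a.R mirroring the iris.R example above:
# controllers/a.R (hypothetical contents)
#* Return a bit of iris
#* @get /iris
function() {
  head(iris)
}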
Regards!

R. RCurl 400 Bad Request

I am trying to send a request to an API using the RCurl library.
My code:
start = "2018-07-30"
end = "2018-08-15"
api_request <- paste("https://api-metrika.yandex.ru/stat/v1/data.csv?id=34904255&date1=",
                     start,
                     "&date2=",
                     end,
                     "&dimensions=ym:s:searchEngine&metrics=ym:s:visits&dimensions=ym:s:<attribution>SearchPhrase&filters=ym:s:<attribution>SearchPhrase!~'some|phrases|here'&limit=100000&oauth_token=OAuth_token_here",
                     sep = "")
s <- getURL(api_request)
Every time I try it, I get the response "Error 400", or "Bad Request" if I use getURLContent instead. When I just open this URL in my browser, it works correctly.
I still couldn't find any solution for this problem, so if somebody knows something about it, please help.
There are several approaches you can use, assuming the URL is right. First, you can add the following option to the getURL function: setting followlocation to TRUE allows libcurl to follow any "Location:" header that the server returns as part of the HTTP headers. Processing is recursive, so any chain of redirects will be followed.
> s <- getURL(url1, .opts=curlOptions(followlocation = TRUE))
If this is not working, an alternative is to use the XML package and call the htmlParse method instead of getURL:
> library(XML)
> s <- htmlParse(api_request)
Another approach would be to use the httr package and call the GET function:
> library(httr)
> s <- GET(api_request)
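One more thing worth checking (an assumption on my part, not from the original answers): the request contains characters such as <, > and quotes that are not legal in a raw URL. A browser encodes them silently, which may explain why the same address works there. Percent-encoding the request before sending it is a cheap experiment:
# encode unsafe characters (<, >, spaces, quotes) before the request goes out
s <- getURL(URLencode(api_request))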

R produces "unsupported URL scheme" error when getting data from https sites

R version 3.0.1 (2013-05-16) on Windows 8, knitr version 1.5, RStudio 0.97.551.
I am using knitr to do the markdown of my R code.
As part of my analysis I downloaded various data sets from the web. knitr is totally fine with getting data from http sites, but https ones generate an "unsupported URL scheme" message.
I know that when using the download.file function on a Mac, the method parameter has to be set to "curl" to get data from an https URL; however, this doesn't help when using knitr.
What do I need to do so that knitr will gather data from Https websites?
Edit:
Here is the code chunk that returns an error in knitr but runs without error in R:
```{r}
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy")
```
You can use https with the download.file() function by passing "curl" to method:
download.file(url, destination, method = "curl")
Edit (May 2016): As of R 3.3.0, download.file() should handle SSL websites automatically on all platforms, making the rest of this answer moot.
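A minimal illustration of that note, assuming R >= 3.3.0 (the destination filename is mine):
# on a current R, https is handled out of the box; no method argument needed
download.file("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv",
              destfile = "ss06hid.csv")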
You want something like this:
library(RCurl)
data <- getURL("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv",
               ssl.verifypeer = 0L, followlocation = 1L)
That reads the data into memory as a single string. You'll still have to parse it into a dataset in some way. One strategy is:
writeLines(data, 'temp.csv')
read.csv('temp.csv')
You can also separate out the data directly without writing to file:
read.csv(text=data)
Edit: A much easier option is actually to use the rio package:
library("rio")
import("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv")
This will read directly from the HTTPS URL and return a data.frame.
Use setInternet2(use = TRUE) before using the download.file() function. It works on Windows 7.
setInternet2(use = TRUE)
download.file(url, destfile = "test.csv")
I am sure you have already found a solution to your problem by now.
I was working on an assignment just now and ended up getting the same error. I tried some of the tricks, but they did not work for me, maybe because I am working on a Windows machine.
Anyhow, I changed the link to http: rather than https: and that did the trick.
Following is chunk of my code:
if (!file.exists("./PeerAssessment2")) { dir.create("./PeerAssessment2") }
fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile = "./PeerAssessment2/Data.zip")
install.packages("R.utils")
library(R.utils)
if (!file.exists("./PeerAssessment2/Data")) {
  bunzip2("./PeerAssessment2/Data.zip", destname = "./PeerAssessment2/Data")
}
list.files("./PeerAssessment2")
noaaData <- read.csv("./PeerAssessment2/Data")
Hope this helps.
I had the same issue with knitr and download.file() with an https URL, on Windows 8.
You could try setInternet2(TRUE) before using the download.file() function. However, I'm not sure whether this fix works on Unix-like systems.
setInternet2(TRUE) # set the R_WIN_INTERNET2 to TRUE
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy") # now it should work
Source: R documentation (?download.file):
Note that https:// URLs are only supported if --internet2 or environment variable R_WIN_INTERNET2 was set or setInternet2(TRUE) was used (to make use of Internet Explorer internals), and then only if the certificate is considered to be valid.
I had the same problem with an https URL: the following code ran perfectly in R but produced "unsupported URL scheme" when knitting to HTML:
temp = tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip", temp)
data = read.csv(unz(temp, "activity.csv"), colClasses = c("numeric", "Date", "numeric"))
I tried all the solutions posted here and nothing worked; in my absolute desperation I just eliminated the "s" in "https" in the URL and everything went fine...
Using the R download package takes care of the quirky details typically associated with file downloads. For your example, all you needed to do would have been:
```{r}
library(download)
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download(fileurl, destfile = "C:/Users/xxx/yyy")
```

POST request using RCurl

As a way of exploring how to make a package in R for the Denver RUG, I decided that it would be a fun little project to write an R wrapper around the datasciencetoolkit API. The basic R tools come from the RCurl package as you might imagine. I am stuck on a seemingly simple problem and I'm hoping that somebody in this forum might be able to point me in the right direction. The basic problem is that I can't seem to use postForm() to pass an un-keyed string as part of the data option in curl, i.e. curl -d "string" "address_to_api".
For example, from the command line I might do
$ curl -d "Tim O'Reilly, Archbishop Huxley" "http://www.datasciencetoolkit.org/text2people"
with success. However, it seems that postForm() requires an explicit key when passing additional arguments into the POST request. I've looked through the datasciencetoolkit code and developer docs for a possible key, but can't seem to find anything.
As an aside, it's pretty straightforward to pass inputs via a GET request to other parts of the DSTK API. For example,
library(RCurl)

ip2coordinates <- function(ip) {
  api <- "http://www.datasciencetoolkit.org/ip2coordinates/"
  result <- getURL(paste(api, URLencode(ip), sep = ""))
  names(result) <- "ip"
  return(result)
}
ip2coordinates('67.169.73.113')
will produce the desired results.
To be clear, I've read through the RCurl docs on DTL's omegahat site, the RCurl docs shipped with the package, and the curl man page. However, I'm missing something fundamental with respect to curl (or perhaps the .opts argument of the postForm() function) and I can't seem to get it.
In Python, I could basically make a 'raw' POST request using httplib.HTTPConnection -- is something like that available in R? I've looked at the simplePostToHost function in the httpRequest package as well, and it just seemed to lock up my R session (it seems to require a key too).
FWIW, I'm using R 2.13.0 on Mac OS X 10.6.7.
Any help is much appreciated. All of the code will soon be available on github if you're interested in playing around with the data science toolkit.
Cheers.
With httr, this is just:
library(httr)
r <- POST("http://www.datasciencetoolkit.org/text2people",
          body = "Tim O'Reilly, Archbishop Huxley")
stop_for_status(r)
content(r, "parsed", "application/json")
Generally, in those cases where you're trying to POST something that isn't keyed, you can just assign a dummy key to that value. For example:
> postForm("http://www.datasciencetoolkit.org/text2people", a="Archbishop Huxley")
[1] "[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":44,\"end_index\":61,\"matched_string\":\"Archbishop Huxley\"},{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":88,\"end_index\":105,\"matched_string\":\"Archbishop Huxley\"}]"
attr(,"Content-Type")
charset
"text/html" "utf-8"
It would work the same if I'd used b="Archbishop Huxley", etc.
Enjoy RCurl - it's probably my favorite R package. If you get adventurous, upgrading to libcurl ~7.21 exposes some new protocols via curl (including SMTP, etc.).
From Duncan Temple Lang on the R-help list:
postForm() is using a different style (or specifically, Content-Type) of submitting the form than the curl -d command.
Switching to style = 'POST' uses the same type, but at a quick guess the parameter name 'a' is causing confusion, and the result is the empty JSON array - "[]".
A quick workaround is to use curlPerform() directly rather than postForm()
r = dynCurlReader()
curlPerform(postfields = 'Archbishop Huxley',
            url = 'http://www.datasciencetoolkit.org/text2people',
            verbose = TRUE, post = 1L, writefunction = r$update)
r$value()
This yields
[1] "[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":0,\"end_index\":17,\"matched_string\":\"Archbishop Huxley\"}]"
and you can use fromJSON() to transform it into data in R.
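Putting the two steps together; a short sketch, assuming the RJSONIO package (any JSON parser exposing fromJSON() would do):
library(RCurl)
library(RJSONIO)

r <- dynCurlReader()
curlPerform(postfields = 'Archbishop Huxley',
            url = 'http://www.datasciencetoolkit.org/text2people',
            post = 1L, writefunction = r$update)
people <- fromJSON(r$value())  # list with one record per matched name
people[[1]][["surnames"]]      # "Huxley"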
I just wanted to point out that there must be an issue with passing a raw string via the postForm function. For example, if I use curl from the command line, I get the following:
$ curl -d "Archbishop Huxley" "http://www.datasciencetoolkit.org/text2people"
[{"gender":"u","first_name":"","title":"archbishop","surnames":"Huxley","start_index":0,"end_index":17,"matched_string":"Archbishop Huxley"}]
and in R I get
> api <- "http://www.datasciencetoolkit.org/text2people"
> postForm(api, a="Archbishop Huxley")
[1] "[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":44,\"end_index\":61,\"matched_string\":\"Archbishop Huxley\"},{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":88,\"end_index\":105,\"matched_string\":\"Archbishop Huxley\"}]"
attr(,"Content-Type")
charset
"text/html" "utf-8"
Note that it returns two elements in the JSON string and neither one matches on the start_index or end_index. Is this a problem with encoding or something?
The simplePostToHost function in the httpRequest package might do what you are looking for here.
