I think I'm following the instructions in the documentation exactly (https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html), but I can't get the add_headers functionality to work. A simple example is:
library(httr)
res <- GET('http://www.google.com', httr::add_headers(Referer= 'https://www.google.com/'), user_agent('Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0'))
str(content(res)$headers)
The last line is supposed to print the headers of the request, but I am getting NULL.
It's because google.com returns HTML, and content() by default parses HTML with xml2 into an xml_document, which you can't index with $headers. headers is a field in the JSON returned by httpbin.org (the site used in the vignette), but google.com returns no such field. The response headers from google.com, as from most sites, are available via res$headers (or headers(res)).
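Here is a minimal sketch of where the different headers live in httr, assuming httr is installed. The request config returned by add_headers() keeps the headers you asked for in its headers field. After a request, res$request$headers holds the request headers httr sent, and headers(res) holds the server's response headers. Only a JSON echo service such as httpbin.org gives you a parsed headers field via content(). inspect_headers() is a hypothetical helper name, and calling it needs network access.

```r
library(httr)

# add_headers() builds a request config; the headers to send sit in $headers
cfg <- add_headers(Referer = "https://www.google.com/")
str(cfg$headers)

# Hypothetical helper: collect the three places headers can show up
inspect_headers <- function(url = "https://httpbin.org/headers") {
  res <- GET(url, cfg)
  list(
    sent     = res$request$headers,  # request headers httr sent
    received = headers(res),         # response headers from the server
    # Only a JSON body can be indexed like a list; httpbin.org echoes the
    # request headers back, so content(res)$headers works there
    echoed   = if (identical(http_type(res), "application/json")) content(res)$headers
  )
}

# Usage (needs network access):
# inspect_headers()$echoed$Referer                         # httpbin.org echo
# str(inspect_headers("http://www.google.com")$received)  # google.com headers
```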
I have an issue using an R script as a data source in Microsoft Power BI. I think this is fundamentally an issue with Power BI, but in the short term I need to find a solution in R.
Essentially, Power BI doesn't appear to be able to handle the messages that would be printed to the console if I were using RStudio.
Within the R script I'm using a REST API to request data from a URL. The JSON message that is received is converted into an R data frame. When the script is used as a data source in Power BI, this only works if I set the verbose settings to FALSE, i.e. so that no messages (in particular "data in") would be sent to the console in RStudio:
response <- GET(<url>,
                body = list(),
                add_headers(.headers = c('<identity token>' = ID_to_use)),
                verbose(data_out = FALSE,
                        data_in = FALSE,
                        info = FALSE,
                        ssl = FALSE),
                encode = "json")
However, verbose() gives me no option to switch off the incoming/outgoing header messages (which is going to come back to bite!). This is the console output:
<< {"identity":" <token>"}
* Connection #54 to <host> left intact
No encoding supplied: defaulting to UTF-8.
-> GET <URL request> HTTP/1.1
-> Host: <host>
-> User-Agent: libcurl/7.64.1 r-curl/4.3 httr/1.4.1
-> Accept-Encoding: deflate, gzip
-> Accept: application/json, text/xml, application/xml, */*
-> <Identity>: <Identity>
->
<- HTTP/1.1 200 OK
<- X-Session-Expiry: 3599
<- Content-Type: application/json
<- Transfer-Encoding: chunked
<- Date: Thu, 06 Aug 2020 16:14:26 GMT
<- Server: <Server>
<-
No encoding supplied: defaulting to UTF-8.
No encoding supplied: defaulting to UTF-8.
No encoding supplied: defaulting to UTF-8.
From the R help for verbose():
...
verbose() uses the following prefixes to distinguish between different components of the http messages:
* informative curl messages
-> headers sent (out)
>> data sent (out)
*> ssl data sent (out)
<- headers received (in)
<< data received (in)
<* ssl data received (in)
...
Switching the verbose settings to FALSE works for a single request. However, I need to put the request into a loop and keep requesting more data until the API gateway indicates there is no more data to be received, and Power BI appears to fail when the script sends and receives five or more requests/replies.
Just from observation, I assume this is due to the header messages piling up.
I've tried a number of approaches but nothing seems to work: sink('NUL'), invisible(), capture.output().
Any help would be appreciated.
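Here is one approach not among those listed above, sketched under assumptions. First, drop verbose() entirely. As far as I can tell, without it httr installs no curl debug function, so the -> / <- header lines are not printed. Second, pass an explicit encoding to content() so the "No encoding supplied" message is never emitted. Third, wrap each call in suppressMessages() to catch anything still sent through message(). The loop keeps requesting until the API reports no more data. The header name, the `page` query parameter and the `data`/`more` JSON fields are hypothetical placeholders for whatever the gateway actually uses. Whether this is enough for Power BI is untested.

```r
# Hypothetical page fetcher built on httr; needs a real endpoint to run.
# Assumes the API takes a `page` query parameter and returns JSON with a
# `data` array and a `more` flag; adapt both to what the gateway really sends.
fetch_page_httr <- function(page, url, token) {
  res <- httr::GET(
    url,
    httr::add_headers(.headers = c("X-Identity-Token" = token)),  # hypothetical header name
    query = list(page = page)
  )
  httr::stop_for_status(res)
  # An explicit encoding means content() never prints "No encoding supplied"
  txt <- httr::content(res, as = "text", encoding = "UTF-8")
  parsed <- jsonlite::fromJSON(txt)
  list(data = as.data.frame(parsed$data), more = isTRUE(parsed$more))
}

# Keep requesting pages until the fetcher reports no more data.
# suppressMessages() swallows anything a call still emits through message().
fetch_all <- function(fetch_page, max_pages = 1000) {
  pages <- list()
  for (page in seq_len(max_pages)) {
    res <- suppressMessages(fetch_page(page))
    pages[[page]] <- res$data
    if (!isTRUE(res$more)) break
  }
  do.call(rbind, pages)
}

# Usage with the placeholders from the question:
# all_data <- fetch_all(function(p) fetch_page_httr(p, "<url>", ID_to_use))
```

Keeping the loop separate from the HTTP call makes it testable with a fake fetcher and lets you swap in whatever "no more data" signal the gateway really uses.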
I found a hacky solution, which at least solved the problem in R, but not in Power BI.
I wrote a "wrapper" R script (see below) that calls my main script, THE_SCRIPT.R, using a shell command. THE_SCRIPT.R writes out a CSV file, which the wrapper script then reads:
#Required by PowerBi
library(mice)
#set the directory, between R and the shell it's a pain to deal with spaces in the directories and quotes
setwd("C:/Program Files/R/R-3.6.2/bin/")
system("Rscript.exe C:\\Users\\<USER>\\Documents\\THE_SCRIPT.R > Nul 2>&1")
A_DATA_TABLE <- read.csv("C:\\Users\\<USER>\\Documents\\THE_FILE.csv")
However, this still didn't resolve the issue when running it in Power BI.
Note: I also tried sink('Nul 2>&1') in R; it didn't work.
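Here is a variant of the wrapper that avoids setwd() and shell redirection. system2() always shQuote()s its command, so Rscript can be called by its full path even with spaces in it. With stdout = FALSE and stderr = FALSE it discards the child process's output itself. R.home("bin") locates the bin directory of the running R installation. This is a sketch: run_quietly() is a hypothetical helper name, the paths are the same placeholders as in the wrapper, and whether it changes Power BI's behaviour is untested.

```r
# Hypothetical helper: run an R script in a separate Rscript process,
# discard everything it prints, then read the CSV it writes.
run_quietly <- function(script, csv) {
  exe <- if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript"
  rscript <- file.path(R.home("bin"), exe)  # Rscript of the running R installation
  # stdout = FALSE / stderr = FALSE discard the child's output;
  # system2() quotes the command itself, so spaces in the path are fine
  status <- system2(rscript, args = shQuote(script), stdout = FALSE, stderr = FALSE)
  if (status != 0) stop("script failed with exit status ", status)
  read.csv(csv)
}

# Usage with the placeholders from the answer:
# A_DATA_TABLE <- run_quietly("C:/Users/<USER>/Documents/THE_SCRIPT.R",
#                             "C:/Users/<USER>/Documents/THE_FILE.csv")
```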
When using httr::GET, in some queries % is replaced with its percent-encoded form %25, but in others it isn't. I cannot find any rule that explains this.
I'm using httr 1.4.1
Sample query where % is replaced (notice the error status, and that the URL entered is not the same as the one in the returned response object):
> httr::GET("jira.spring.io/rest/api/latest/search?jql=project=Spring%20Framework&startAt=0")
Response [https://jira.spring.io/rest/api/latest/search?jql=project=Spring%2520Framework&startAt=0]
Date: 2020-01-16 22:57
Status: 400
Content-Type: application/json;charset=UTF-8
Size: 196 B
Query where it is not replaced (no error; the URL in the response is the same as the one entered):
> httr::GET("issues.jenkins-ci.org/rest/api/latest/search?jql=project='WEBSITE'%20OR%20project='Infrastructure'&startAt=0")
Response [https://issues.jenkins-ci.org/rest/api/latest/search?jql=project='WEBSITE'%20OR%20project='Infrastructure'&startAt=0]
Date: 2020-01-16 23:02
Status: 200
Content-Type: application/json;charset=UTF-8
Size: 430 kB
What is going on? Is it a bug in httr? Or should I change some parameters in GET() call?
tl;dr: use HTTPS requests with jira.spring.io to avoid a broken protocol upgrade.
It's not an R/httr issue; it's the website. Compare the results over HTTP ("failing with mystery %25") and HTTPS ("succeeding"):
http://jira.spring.io/rest/api/latest/search?jql=project=Spring%20Framework&startAt=0
{"errorMessages":["Error in the JQL Query: The character '%' is a reserved JQL character. You must enclose it in a string or use the escape '\u0025' instead. (line 1, character 15)"],"errors":{}}
https://jira.spring.io/rest/api/latest/search?jql=project=Spring%20Framework&startAt=0
{"errorMessages":["Error in the JQL Query: Expecting either 'OR' or 'AND' but got 'Framework'. (line 1, character 16)"],"errors":{}}
There appears to be a 'malfunction' in the HTTP -> HTTPS redirect protocol upgrade, which has this response header:
Status Code: 301 Moved Permanently
Location: https://jira.spring.io/rest/api/latest/search?jql=project=Spring%252520Framework&startAt=0
(note the %252520 where the request had %20)
Thus a solution is to use the HTTPS endpoint and avoid the strange target Location.
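The %25 is what you get when an already percent-encoded string is encoded again: the % itself becomes %25. The Location above has been through that twice (%20 → %2520 → %252520). Base R's URLencode() reproduces the effect; repeated = TRUE forces it to re-encode input that already contains %xx sequences. The last part is a sketch, assuming you let httr encode the query once via modify_url()'s query argument against the HTTPS endpoint. build_search_url() is a hypothetical name, and quoting the project name in the JQL is an assumption prompted by the "Expecting either 'OR' or 'AND'" error.

```r
# Encoding once turns the space into %20
once <- URLencode("Spring Framework")
# Encoding the result again turns the "%" of "%20" into "%25";
# repeated = TRUE forces URLencode() to re-encode already-encoded input
twice  <- URLencode(once, repeated = TRUE)
thrice <- URLencode(twice, repeated = TRUE)
print(c(once, twice, thrice))
# values: "Spring%20Framework", "Spring%2520Framework", "Spring%252520Framework"

# Each URLdecode() call removes one layer
print(URLdecode(twice))

# Hypothetical helper: pass unencoded values and let httr encode them once,
# against the HTTPS endpoint so no redirect is involved
build_search_url <- function(project) {
  httr::modify_url(
    "https://jira.spring.io/rest/api/latest/search",
    query = list(jql = sprintf('project = "%s"', project), startAt = 0)
  )
}
# Usage: httr::GET(build_search_url("Spring Framework"))
```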
I'm trying to set up monitoring (http-ecv) with an authorization header, but I'm getting an "illegal character" or "header folding" error (Jetty 9.3).
Example 1:
GET /somepath/somepage.html HTTP/1.1
Server Running
\r\nAuthorization: Basic somestring=\r\n
Response "HTTP1.1 400 Illegal character SPACE=''\r\n"
Example 2:
GET /somepath/somepage.html
Server Running
HTTP/1.1\r\nAuthorization: Basic somestring=\r\n
Response HTTP/1.1 400 Illegal character VCHAR='/'\r\n
This example worked on an older Jetty version.
Example 3:
GET /somepath/somepage.html
Server Running
\r\nHTTP/1.1\r\nAuthorization: Basic somestring=\r\n
Response HTTP1.1 400 Header Folding\r\n
Any ideas?
Try the solution in the article https://support.citrix.com/article/CTX117142
Edit to add more context: the article describes how to create a monitor for a back-end server that requires basic authentication with a user name and password.
Summarized:
add lb monitor test_login_tcp TCP-ECV -send "GET / HTTP/1.1\r\nAuthorization: Basic YOURBASE64USERPW\r\nHost: IP_or_FQDN\r\n\r\n" -recv 200 -LRTM ENABLED
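The send string in that command follows HTTP/1.1 request syntax. It has the request line (method, path and HTTP/1.1 on one line), then each header on its own line ending in \r\n, then an empty line to finish. The sketch below assembles such a string, writing \r\n as literal escape text the way it appears in the CLI command. It also base64-encodes user:password for the Basic header in base R. build_ecv_send(), base64_encode_str() and the credentials are hypothetical; the monitor name and options are copied from the command above.

```r
# Minimal base64 encoder in base R for the Basic credentials
base64_encode_str <- function(s) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  bytes <- as.integer(charToRaw(s))
  if (length(bytes) == 0) return("")
  pad <- (3 - length(bytes) %% 3) %% 3
  groups <- matrix(c(bytes, rep(0L, pad)), nrow = 3)  # one 3-byte group per column
  n <- groups[1, ] * 65536 + groups[2, ] * 256 + groups[3, ]
  idx <- rbind(n %/% 262144, (n %/% 4096) %% 64, (n %/% 64) %% 64, n %% 64)
  out <- alphabet[as.vector(idx) + 1]
  if (pad > 0) out[(length(out) - pad + 1):length(out)] <- "="
  paste(out, collapse = "")
}

# Hypothetical helper: request line, headers, then an empty line,
# each terminated by a literal \r\n as typed on the NetScaler CLI
build_ecv_send <- function(path, host, user, password) {
  crlf <- "\\r\\n"  # backslash-r backslash-n as text, not control characters
  lines <- c(
    paste("GET", path, "HTTP/1.1"),
    paste("Authorization: Basic", base64_encode_str(paste0(user, ":", password))),
    paste("Host:", host)
  )
  paste0(paste0(lines, crlf, collapse = ""), crlf)
}

send <- build_ecv_send("/somepath/somepage.html", "IP_or_FQDN", "user", "pass")
cat(sprintf('add lb monitor test_login_tcp TCP-ECV -send "%s" -recv 200 -LRTM ENABLED\n', send))
```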
I've just started working with the Quectel MC60 and I am having some issues:
For the HTTP GET method, I send the following commands:
AT+QIFGCNT=0
AT+QICSGP=1,"my_apn"
AT+QIREGAPP
AT+QIACT
AT+QSSLCFG="https",1
AT+QHTTPURL=39,40
my_url_39_bytes_long
AT+QHTTPGET=60
AT+QHTTPREAD=30
AT+QIDEACT
Using the QCOM software, I wrote a script that runs all the above commands in sequence. When it reaches the AT+QHTTPREAD command, the response is always "+CME ERROR: 3822" (HTTP response failed). What could it be? I'm sure the HTTP server is working properly.
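The answer is that you need to configure the request header yourself: enable custom request headers with AT+QHTTPCFG="requestheader",1 and send the complete request (request line plus headers) as the data for AT+QHTTPPOST: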
AT+QIFGCNT=0
AT+QICSGP=1,"my_apn"
AT+QIREGAPP
AT+QIACT
AT+QHTTPURL=39,40
my_url_39_bytes_long
AT+QHTTPCFG="requestheader",1
AT+QHTTPPOST=77
GET path HTTP/1.1
User-Agent: Fiddler
Host: www.my_host.com
AT+QHTTPREAD=30
AT+QIDEACT
NOTE: in AT+QHTTPPOST=77, 77 is the size of the POST message in bytes (the last two \r\n are required and count towards it).
NOTE2: after GET you should write the path of the URL set with AT+QHTTPURL. For example, if you specified your URL as https://www.my_host.com/debug/main/port, your AT+QHTTPPOST request should look like this (don't forget the last two \r\n):
GET /debug/main/port HTTP/1.1
User-Agent: Fiddler
Host: www.my_host.com
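The byte counts in these commands can be computed rather than counted by hand. This sketch derives the AT+QHTTPURL length and the AT+QHTTPPOST size from the URL. For the example URL above it gives 39 and 77, matching the numbers used in the answer. build_mc60_get() is a hypothetical helper. The 40 s timeout and the Fiddler user agent are copied from the answer, and the \r\n here are real CR/LF characters, as the modem receives them.

```r
# Hypothetical helper: derive the MC60 AT commands and payload from a URL
build_mc60_get <- function(url, user_agent = "Fiddler", timeout = 40L) {
  host <- sub("^[A-Za-z]+://([^/]+).*$", "\\1", url)  # e.g. www.my_host.com
  path <- sub("^[A-Za-z]+://[^/]+", "", url)          # e.g. /debug/main/port
  if (path == "") path <- "/"
  # Raw request; the final empty line (the last \r\n) is part of the count
  payload <- paste0(
    "GET ", path, " HTTP/1.1\r\n",
    "User-Agent: ", user_agent, "\r\n",
    "Host: ", host, "\r\n",
    "\r\n"
  )
  list(
    url_cmd  = sprintf("AT+QHTTPURL=%d,%d", nchar(url, type = "bytes"), timeout),
    cfg_cmd  = 'AT+QHTTPCFG="requestheader",1',
    post_cmd = sprintf("AT+QHTTPPOST=%d", nchar(payload, type = "bytes")),
    payload  = payload
  )
}

cmds <- build_mc60_get("https://www.my_host.com/debug/main/port")
cat(cmds$url_cmd, cmds$cfg_cmd, cmds$post_cmd, sep = "\n")
```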
I am trying to scrape data from an API using the getURL function of the RCurl package in R. My problem is that I can't replicate, from R, the response I get when I open the URL in Chrome. When I open the API page (URL below) in Chrome it works fine, but if I request it using getURL in R (or open it in Chrome's incognito mode) I get a '500 Internal Server Error' response instead of the pretty JSON I'm looking for.
URL/API in question:
http://www.bluenile.com/api/public/loose-diamond/diamond-details/panel?country=USA&currency=USD&language=en-us&productSet=BN&sku=LD04077082
Here is my (failed) request in R:
test2 <- fromJSON(getURL("http://www.bluenile.com/api/public/loose-diamond/diamond-details/panel?country=USA&currency=USD&language=en-us&productSet=BN&sku=LD04077082", ssl.verifypeer = FALSE, useragent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36"))
My Research so Far
First I looked at this prior question on Stack Overflow and added my user agent to the request (this did not solve the problem but may still be necessary):
ViralHeat API issues with getURL() command in RCurl package
Next I looked at this helpful post which guides my rationale:
R Disparity between browser and GET / getURL
My Ideas About the Solution
This is not my area of expertise, but my guess is that the request is lacking a cookie needed to complete it (which would explain why it also fails in incognito mode). I compared the requests and responses of the successful and unsuccessful requests:
[Screenshots of the successful and unsuccessful requests are not reproduced here.]
Does anyone have any ideas? Should I try the RSelenium package that MrFlick suggested in the second post I made?
This is a courteous site. It would like to know where you come from, what currency you use, etc. to give you a better user experience, and it does this by setting a multitude of cookies on the landing page. So we follow suit: navigate to the landing page first to get the cookies, then go to the page we want:
library(RCurl)
myURL <- "http://www.bluenile.com/api/public/loose-diamond/diamond-details/panel?country=USA&currency=USD&language=en-us&productSet=BN&sku=LD04077082"
agent="Mozilla/5.0 (Windows NT 6.3; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0"
#Set RCurl pars
curl = getCurlHandle()
curlSetOpt(cookiejar="cookies.txt", useragent = agent, followlocation = TRUE, curl=curl)
firstPage <- getURL("http://www.bluenile.com", curl=curl)
myPage <- getURL(myURL, curl = curl)
library(RJSONIO)
> names(fromJSON(myPage))
[1] "diamondDetailsHeader" "diamondDetailsBodies" "pageMetadata" "expandedUrl"
[5] "newVersion" "multiDiamond"
and the cookies:
> getCurlInfo(curl)$cookielist
[1] ".bluenile.com\tTRUE\t/\tFALSE\t2412270275\tGUID\tDA5C11F5_E468_46B5_B4E8_D551D4D6EA4D"
[2] ".bluenile.com\tTRUE\t/\tFALSE\t1475342275\tsplit\tver~3&presetFilters~TEST"
[3] ".bluenile.com\tTRUE\t/\tFALSE\t1727630275\tsitetrack\tver~2&jse~0"
[4] ".bluenile.com\tTRUE\t/\tFALSE\t1425230275\tpop\tver~2&china~false&french~false&ie~false&internationalSelect~false&iphoneApp~false&survey~false&uae~false"
[5] ".bluenile.com\tTRUE\t/\tFALSE\t1475342275\tdsearch\tver~6&newUser~true"
[6] ".bluenile.com\tTRUE\t/\tFALSE\t1443806275\tlocale\tver~1&country~IRL&currency~EUR&language~en-gb&productSet~BNUK"
[7] ".bluenile.com\tTRUE\t/\tFALSE\t0\tbnses\tver~1&ace~false&isbml~false&fbcs~false&ss~0&mbpop~false&sswpu~false&deo~false"
[8] ".bluenile.com\tTRUE\t/\tFALSE\t1727630275\tbnper\tver~5&NIB~0&DM~-&GUID~DA5C11F5_E468_46B5_B4E8_D551D4D6EA4D&SESS-CT~1&STC~32RPVK&FB_MINI~false&SUB~false"
[9] "#HttpOnly_www.bluenile.com\tFALSE\t/\tFALSE\t0\tJSESSIONID\tB8475C3AEC08205E5AC6252C94E4B858"
[10] ".bluenile.com\tTRUE\t/\tFALSE\t1727630278\tmigrationstatus\tver~1&redirected~false"
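Each cookielist entry is a line in the Netscape cookie-file format that libcurl uses. The tab-separated fields are domain, include-subdomains flag, path, secure flag, expiry in Unix time (0 for a session cookie), name and value. A #HttpOnly_ prefix on the domain marks an HTTP-only cookie. A small parser (parse_cookielist(), a hypothetical name) turns the list into a data frame. The second function sketches the same landing-page-first idea with httr instead of RCurl, assuming one httr handle keeps its cookies across requests. It needs network access and has not been tried against this site.

```r
# Parse cookielist entries (Netscape cookie-file format, tab-separated):
# domain, include-subdomains, path, secure, expiry (Unix time), name, value.
# A "#HttpOnly_" prefix on the domain marks an HTTP-only cookie.
parse_cookielist <- function(lines) {
  httponly <- startsWith(lines, "#HttpOnly_")
  fields <- strsplit(sub("^#HttpOnly_", "", lines), "\t", fixed = TRUE)
  field <- function(i) vapply(fields, `[`, "", i)
  expiry <- as.numeric(field(5))
  expiry[expiry == 0] <- NA  # 0 means a session cookie
  data.frame(
    domain   = field(1),
    path     = field(3),
    secure   = field(4) == "TRUE",
    expires  = as.POSIXct(expiry, origin = "1970-01-01", tz = "UTC"),
    name     = field(6),
    value    = field(7),
    httponly = httponly,
    stringsAsFactors = FALSE
  )
}
# Usage: parse_cookielist(getCurlInfo(curl)$cookielist)

# Hypothetical httr version of the same idea: share one handle (and so one
# cookie store) between the landing page and the API request
fetch_with_cookies <- function(landing, target, agent) {
  h <- httr::handle(landing)
  httr::GET(landing, httr::user_agent(agent), handle = h)  # collects the cookies
  res <- httr::GET(target, httr::user_agent(agent), handle = h)
  httr::content(res, as = "text", encoding = "UTF-8")
}
# Usage: jsonlite::fromJSON(fetch_with_cookies("http://www.bluenile.com", myURL, agent))
```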