Failing to set the user agent via httr::user_agent - r

Is there anything special I need to consider when trying to change the user agent via httr::user_agent in a httr::GET() call on MS Windows? I'm using R-3.1.0 and httr 0.3.
Following the example at ?user_agent, I'm getting these results:
url_this <- "http://httpbin.org/user-agent"
Standard user agent:
GET(url_this)
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "curl/7.19.6 Rcurl/1.95.4.1 httr/0.3"
}
Modified user agent:
GET(url_this, user_agent("Mozilla/5.0"))
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "curl/7.19.6 Rcurl/1.95.4.1 httr/0.3"
}
I had expected the second call to return something closer to what I get when visiting url_this in my browser:
{
"user-agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0"
}
What am I missing here? I also ran setInternet2(TRUE) first, but got identical results.

Very curious: the help page ?user_agent suggests it should work. Setting the header explicitly does work:
> GET("http://httpbin.org/user-agent", add_headers("user-agent" = "Mozilla/5.0"))
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "Mozilla/5.0"
}
but the example given in ?user_agent appears not to.
> GET("http://httpbin.org/user-agent", user_agent("Mozilla/5.0") )
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "curl/7.19.6 Rcurl/1.95.4.1 httr/0.3"
}
>
It is returning the default user agent:
> httr:::default_ua()
[1] "curl/7.19.7 Rcurl/1.95.4.1 httr/0.3"
My ISP was also doing something funky, so you may need to add a no-cache header as well:
GET("http://httpbin.org/user-agent", add_headers("user-agent" = "Mozilla/5.0", "Cache-Control" = "no-cache"))
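The working call above sets the header by name instead of going through a helper. As a language-neutral illustration of that idea, here is a minimal sketch using Python's standard library. It is not httr's implementation, and the function name build_request is made up for this example. It only builds the request and inspects its headers, so nothing is sent over the network.

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Return a GET request carrying an explicit User-Agent header."""
    headers = {
        "User-Agent": user_agent,
        # Optional: ask intermediaries not to answer from a cache.
        "Cache-Control": "no-cache",
    }
    return urllib.request.Request(url, headers=headers)


req = build_request("http://httpbin.org/user-agent", "Mozilla/5.0")

# urllib stores header names capitalized as "User-agent" internally.
print(req.get_header("User-agent"))
```

Checking the header on the request object before sending it is a cheap way to tell a client-side problem (the header never gets set) from a network-side one (a proxy or cache rewrites the response).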

Related

Can't simulate an XHR request to a PowerBI dashboard

I'm trying to scrape some data from a PowerBI dashboard but for some reason I'm not able to replicate an XHR request successfully. Here are the details of the original request taken from Chrome web inspector:
Request
Request URL: https://wabi-west-europe-api.analysis.windows.net/public/reports/querydata?synchronous=true
Request Method: POST
Status Code: 200 OK
Remote Address: 51.144.73.151:443
Referrer Policy: no-referrer-when-downgrade
Headers
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate, br
Accept-Language: it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7,ar;q=0.6,fr;q=0.5,sl;q=0.4
ActivityId: b3b20ea3-8f93-1848-b4be-ebf1a5c0952f
Connection: keep-alive
Content-Length: 1176
Content-Type: application/json;charset=UTF-8
Host: wabi-west-europe-api.analysis.windows.net
Origin: https://app.powerbi.com
Referer: https://app.powerbi.com/view?r=eyJrIjoiM2MxY2RkMTQtOTA3Mi00MDIxLWE1NDktZjlmYTdlNDg0MTdkIiwidCI6IjhkZDFlNmI0LThkYWMtNDA4ZS04ZDhkLTY3NTNlOTgwMDUzMCIsImMiOjl9
RequestId: 70c90610-a020-7191-a0fe-91b74d0407b9
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: cross-site
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
X-PowerBI-ResourceKey: 3c1cdd14-9072-4021-a549-f9fa7e48417d
Request body
{"version":"1.0.0","queries":[{"Query":{"Commands":[{"SemanticQueryDataShapeCommand":{"Query":{"Version":2,"From":[{"Name":"q","Entity":"LastRefresh","Type":0}],"Select":[{"Aggregation":{"Expression":{"Column":{"Expression":{"SourceRef":{"Source":"q"}},"Property":"Date Last Refreshed"}},"Function":3},"Name":"Min(Query1.Date Last Refreshed)"}]},"Binding":{"Primary":{"Groupings":[{"Projections":[0]}]},"DataReduction":{"DataVolume":3,"Primary":{"Top":{}}},"Version":1}}}]},"CacheKey":"{\"Commands\":[{\"SemanticQueryDataShapeCommand\":{\"Query\":{\"Version\":2,\"From\":[{\"Name\":\"q\",\"Entity\":\"LastRefresh\",\"Type\":0}],\"Select\":[{\"Aggregation\":{\"Expression\":{\"Column\":{\"Expression\":{\"SourceRef\":{\"Source\":\"q\"}},\"Property\":\"Date Last Refreshed\"}},\"Function\":3},\"Name\":\"Min(Query1.Date Last Refreshed)\"}]},\"Binding\":{\"Primary\":{\"Groupings\":[{\"Projections\":[0]}]},\"DataReduction\":{\"DataVolume\":3,\"Primary\":{\"Top\":{}}},\"Version\":1}}}]}","QueryId":"","ApplicationContext":{"DatasetId":"ec162a68-e319-4018-8364-d2a74d3ed429","Sources":[{"ReportId":"8ef2e9f7-0417-4e8f-bd02-f7a3ee0fedd2"}]}}],"cancelQueries":[],"modelId":3563760}
For my simulated request I use:
httr::POST("https://wabi-west-europe-api.analysis.windows.net/public/reports/querydata?synchronous=true", content_type_json(), add_headers(.headers = heads), body = payload) %>% content()
to perform the request. As headers I only used: 'X-PowerBI-ResourceKey', 'RequestId', 'ActivityId', 'Referer'. Payload is the json copied from the Request body. I get this response:
$error
$error$code
[1] "BadRequest"
$error$message
[1] "Bad Request"
$error$details
$error$details[[1]]
$error$details[[1]]$message
[1] "After parsing a value an unexpected character was encountered: C. Path 'queries[0].CacheKey', line 1, position 488."
$error$details[[1]]$target
[1] "request.queries[0].CacheKey"
$error$details[[2]]
$error$details[[2]]$message
[1] "'request' is a required parameter"
$error$details[[2]]$target
[1] "request"
I can't understand what I'm doing wrong.
UPDATE:
Solved with a change of approach; see "Correct way to get response body of XHR requests generated by a page with RStudio Chromote".
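The error message points inside CacheKey, whose value is itself a JSON document stored as a string. One plausible cause, which the thread does not confirm, is that the backslash escapes (\") were lost when the body was pasted into a string literal, leaving invalid JSON. The sketch below shows in Python how such a nested value can be produced by serializing twice instead of hand-escaping. The commands structure is a shortened, hypothetical stand-in for the real query, and fields such as ApplicationContext and modelId are left out.

```python
import json

# Hypothetical, shortened stand-in for the "Commands" part of the real request body.
commands = {
    "Commands": [
        {"SemanticQueryDataShapeCommand": {"Query": {"Version": 2}}}
    ]
}


def build_payload(commands: dict) -> str:
    """Serialize the body so that CacheKey holds the commands as a JSON string."""
    query = {
        "Query": commands,
        # Inner serialization: CacheKey is a string containing JSON text.
        "CacheKey": json.dumps(commands, separators=(",", ":")),
        "QueryId": "",
    }
    body = {"version": "1.0.0", "queries": [query], "cancelQueries": []}
    # Outer serialization: this step adds the \" escapes inside CacheKey.
    return json.dumps(body, separators=(",", ":"))


payload = build_payload(commands)
print(payload)
```

Parsing the payload back and then parsing CacheKey again should round-trip to the original structure. If it does not, the escaping was damaged somewhere along the way.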

Are browsers supposed to handle 304 responses automagically?

Might be a silly question, but I haven't found any clear answer yet.
My server handles ETag caching for some quite big JSON responses we have, returning 304 NOT MODIFIED with an empty body if the If-None-Match header contains the same hash as the one newly generated (shallow ETags).
Are browsers supposed to handle this automagically, or do the in-browser client apps consuming the API asynchronously need to implement some logic to handle such responses (i.e. use the cached version if 304 is responded, create/update the cached version otherwise)?
Because so far, I've manually implemented this logic client-side, but I'm wondering whether I just reinvented a square wheel...
In other words, with the Cache-Control header for example, the in-browser client apps don't need to parse the value, check for max-age, store it somehow, set up a timeout, etc.: everything is handled upfront by the browser. The question is: are browsers supposed to behave the same way when they receive a 304?
Here is how I wrote my client so far (built with AngularJS, running in browsers):
myModule
.factory("MyRepository", ($http) => {
return {
fetch: (etag) => {
return $http.get(
"/api/endpoint",
etag ? { headers: { "If-None-Match": etag } } : undefined
);
}
};
})
.factory("MyService", (MyRepository, $q) => {
let latestEtag = null;
let latestVersion = null;
return {
fetch: () => {
return MyRepository
.fetch(latestEtag)
.then((response) => {
latestEtag = response.headers("ETag");
latestVersion = response.data;
return angular.copy(latestVersion);
})
.catch((response) => {
return 304 === response.status
? angular.copy(latestVersion)
: $q.reject(response)
});
}
};
});
So basically, is the above logic effectively needed, or am I supposed to be able to simply use $http.get("/api/endpoint") directly?
The code above works fine, which seems to mean that the 304 needs to be handled programmatically, although I have never seen such "custom" implementations in the articles I have read.
304 responses are handled automatically by the browser.
To check this, I created a simple page:
<html>
<head>
<script src="./axios.min.js"></script>
<script src="./jquery-3.3.1.js"></script>
</head>
<body>
<h1>this is a test</h1>
</body>
</html>
and then added a test.json file:
root#vagrant:/var/www/html# cat test.json
{
"name": "tarun"
}
Then I added the following to the nginx configuration:
location ~* \.(jpg|jpeg|png|gif|ico|css|js|json)$ {
expires 365d;
}
Now the results.
AXIOS
The first request returns 200 and the second one 304, but there is no impact on the JS code.
jQuery
Same thing with jQuery.
The curl output shows that the server sent no body with the 304 response:
$ curl -v 'http://vm/test.json' -H 'If-None-Match: "5ad71064-17"' -H 'DNT: 1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.9' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36' -H 'Accept: */*' -H 'Referer: http://vm/' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive' -H 'If-Modified-Since: Wed, 18 Apr 2018 09:31:16 GMT' --compressed
* Trying 192.168.33.100...
* TCP_NODELAY set
* Connected to vm (192.168.33.100) port 80 (#0)
> GET /test.json HTTP/1.1
> Host: vm
> If-None-Match: "5ad71064-17"
> DNT: 1
> Accept-Encoding: gzip, deflate
> Accept-Language: en-US,en;q=0.9
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
> Accept: */*
> Referer: http://vm/
> X-Requested-With: XMLHttpRequest
> Connection: keep-alive
> If-Modified-Since: Wed, 18 Apr 2018 09:31:16 GMT
>
< HTTP/1.1 304 Not Modified
< Server: nginx
< Date: Wed, 18 Apr 2018 09:42:45 GMT
< Last-Modified: Wed, 18 Apr 2018 09:31:16 GMT
< Connection: keep-alive
< ETag: "5ad71064-17"
<
* Connection #0 to host vm left intact
So you don't need to handle a 304 yourself; the browser does that work for you.
Yes, probably all major modern browsers handle response validation using conditional requests well. Relevant excerpt from the article "The State of Browser Caching, Revisited" by Mark Nottingham:
Validation allows a cache to check with the server to see if a stale stored response can be reused.
All of the tested browsers support validation based upon ETag and Last-Modified. The tricky part is making sure that the 304 Not Modified response is correctly combined with the stored response; specifically, the headers in the 304 update the stored response headers.
All of the tested browsers do update stored headers upon a 304, both in the immediate response and subsequent ones served from cache.
This is good news; updating headers with a 304 is an important mechanism, and when they get out of sync it can cause problems.
For more information, check the HTTP Caching article by Ilya Grigorik.
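The validation flow described in the excerpt can be sketched as a toy model. This is only an illustration in Python of what a cache conceptually does with a 304, not how any browser is implemented. The names Response and ValidatingCache are made up for this example, and the cache holds a single entry for a single URL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int
    headers: dict
    body: bytes = b""


class ValidatingCache:
    """Toy single-entry cache that revalidates with ETag / If-None-Match."""

    def __init__(self) -> None:
        self.stored: Optional[Response] = None

    def request_headers(self) -> dict:
        # Send the stored validator so the server can answer 304.
        if self.stored is not None and "ETag" in self.stored.headers:
            return {"If-None-Match": self.stored.headers["ETag"]}
        return {}

    def receive(self, response: Response) -> Response:
        if response.status == 304 and self.stored is not None:
            # Reuse the stored body; headers from the 304 update the stored ones.
            merged = {**self.stored.headers, **response.headers}
            self.stored = Response(200, merged, self.stored.body)
        else:
            self.stored = response
        return self.stored


cache = ValidatingCache()
first = cache.receive(Response(
    200,
    {"ETag": '"5ad71064-17"', "Date": "Wed, 18 Apr 2018 09:31:16 GMT"},
    b'{"name": "tarun"}',
))
conditional = cache.request_headers()
second = cache.receive(Response(
    304,
    {"ETag": '"5ad71064-17"', "Date": "Wed, 18 Apr 2018 09:42:45 GMT"},
))
print(conditional, second.status, second.body)
```

The caller of receive() always gets a complete response with a body, which mirrors why page scripts normally never see the 304 itself.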

Why is my AJAX result not ETag-cached (no If-None-Match)?

Here is my AJAX function:
function ajax(url, data) {
return new Promise((resolve, reject) => {
$.ajax({
url: "https://xxx",
data: data,
method: 'POST',
timeout: 50000,
cache: true,
ifModified: true,
crossDomain: true,
success: (data, textStatus, jqXHR) => {
if (data == '#fail#') reject(data);
else {resolve(data);}
},
error: (jqXHR, textStatus, errorThrown) => {
reject(errorThrown);
}
});
});
}
As observed in Chrome -> Network(F12), this is the response header from the server:
HTTP/1.1 200 OK
X-Powered-By: Express
Access-Control-Allow-Origin: *
Content-Type: text/html; charset=utf-8
Content-Length: 3
ETag: W/"3-R7zlx09Yn0hn29V+nKn4CA"
Date: Fri, 06 Apr 2018 11:39:41 GMT
Connection: keep-alive
The request header is always identical, even in subsequent calls:
POST /register HTTP/1.1
Host: xxx:60001
Connection: keep-alive
Content-Length: 0
Accept: */*
Origin: http://localhost:8000
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Referer: http://localhost:8000/index.html
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Shouldn't Chrome, upon receiving an ETag header, cache the resource and set the 'If-None-Match' header on subsequent calls to the same URL? Shouldn't I obtain a status code of 304 instead of 200 as the returned content is the same?
The calls to the resources in other servers such as the Google Map server do return 304 sometimes though.
HTTP caching is generally limited to GET requests, which is why the POST request here is never revalidated with If-None-Match:
However, common HTTP caches are typically limited to caching responses to GET and may decline other methods. The primary cache key consists of the request method and target URI (oftentimes only the URI is used as only GET requests are caching targets)
This is also confirmed in another Stack Overflow post.

How to get http headers on a service in Jolie Programming Language

I have a service written in Jolie, and I want to extract the HTTP headers of the request. In the same way that request.id can be printed, I would like to print the headers. My attempt is the line marked in bold (between ** markers) in the code below:
execution { concurrent }
inputPort UserDB_Service {
Location: "socket://localhost:8002/"
Protocol: http { .format = "json"}
Interfaces: Users, ShutdownInterface, ConnectionPool
}
outputPort DB_Connector {
Location: "socket://localhost:1000/"
Protocol: sodep
Interfaces: ConnectionPool
}
init
{
connectionConfigInfo#DB_Connector()(connectionInfo);
connect#Database(connectionInfo)()
}
main
{
//Example: http://localhost:8002/retrieve?id=1
[ retrieve(request)(response) {
query#Database(
"select * from users where user_id=:id" {
.id = request.id
}
)(sqlResponse);
println#Console( "You have requested the user_id: " + request.id)();
**println#Console( "Request Headers: " + response.format)();**
if (#sqlResponse.row == 1) {
response -> sqlResponse.row[0]
}
} ]
}
Thanks for the help.
I am not sure whether you already know which headers you expect in the inbound request or whether you just want to print the whole HTTP message for debugging purposes. Both cases are quick, so I report both solutions :)
In the first case, you can set the headers parameter of the http protocol on the inputPort so that the content of a specific header is also included in the request message, e.g.,
http {
.headers.format = "format";
}
and then you can inspect the value in the usual way
println#Console( request.format )()
In the second case, you can use
http {
.debug = true;
.debug.showContent = true
}
to see the log of all http requests and responses and their bodies.
This and further information on protocols, in particular the http protocol, can be found in the documentation on the Jolie site.
I put the output here again. I wonder if it is possible to extract the "iv-user: g47257" header, which I have injected by using Fiddler. Thanks again for the help.
The headers are like this (better format).
INFO: [UserDB_crud.ol] [HTTP debug] Receiving:
HTTP Code: 0
Resource: /retrieve?id=1
--> Header properties
iv-user: g47257
accept-language: en-US,en;q=0.8,da;q=0.6,es;q=0.4
host: localhost:8002
upgrade-insecure-requests: 1
connection: keep-alive
cache-control: max-age=0
accept-encoding: gzip, deflate, sdch
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
You have requested the user_id: 1
mar. 10, 2016 2:30:44 PM jolie.Interpreter logInfo
INFO: [UserDB_crud.ol] [HTTP debug] Sending:
HTTP/1.1 200 OK
Server: Jolie
X-Jolie-MessageID: 0
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Content-Length: 72
(gzip-compressed response body, not printable)
mar. 10, 2016 2:30:44 PM jolie.Interpreter logInfo
INFO: [UserDB_crud.ol] [HTTP debug] Receiving:
HTTP Code: 0
Resource: /favicon.ico
--> Header properties
iv-user: g47257
referer: http://localhost:8002/retrieve?id=1
accept-language: en-US,en;q=0.8,da;q=0.6,es;q=0.4
host: localhost:8002
connection: keep-alive
cache-control: no-cache
pragma: no-cache
accept-encoding: gzip, deflate, sdch
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
accept: */*
mar. 10, 2016 2:30:44 PM jolie.Interpreter logWarning
WARNING: [UserDB_crud.ol] Received a message for operation favicon.ico, not specified in the input port at the receiving service. Sending IOException to the caller.
mar. 10, 2016 2:30:44 PM jolie.Interpreter logInfo
INFO: [UserDB_crud.ol] [HTTP debug] Sending:
HTTP/1.1 200 OK
Server: Jolie
X-Jolie-MessageID: 0
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Content-Length: 102
(gzip-compressed response body, not printable)
I re-post my last comment here since other people faced the same difficulties found by Efrin but might miss the solution I posted as a comment.
You can inspect the headers of an HTTP request as shown in the code below:
include "console.iol"
inputPort Me {
Location: "socket://localhost:8000"
Protocol: http { .headers.iv_user = "ivUser" }
RequestResponse: myRequest
}
main {
myRequest( request )(){ println#Console( request.ivUser )() }
}
Remember that, as reported in the documentation, the http.headers parameter maps the character - in header names to _. In your case, the header iv-user becomes iv_user in the Jolie HTTP protocol parameters.
Besides the description and code found in the Jolie documentation, you can find further examples and a more thorough explanation of how the HTTP protocol works in Jolie in its presentation paper written by Montesi: https://doi.org/10.1016/j.scico.2016.05.002.

POST raw to server Processing

I have an Intel Edison running a Node.JS server that is printing everything I post to it into the console. I can successfully post to it using Postman and see the sent raw data in the console.
Now I'm using Processing to POST to it, which will fire off different events on the Node.JS server.
My problem is that I can't seem to POST the raw body to the server successfully; I've been trying to get this working for several hours already.
import processing.net.*;
String url = "192.168.0.107:3000";
Client myClient;
void setup(){
myClient = new Client(this, "192.168.0.107", 3000);
myClient.write("POST / HTTP/1.1\n");
myClient.write("Cache-Control: no-cache\n");
myClient.write("Content-Type: text/plain\n");
//Attempting to write the raw post body
myClient.write("test");
//2 newlines tells the server that we're done sending
myClient.write("\n\n");
}
The console shows that the server received the POST, and the correct headers, but it doesn't show any data in it.
How do I specify that "test" is the raw POST data?
The HTTP code from Postman:
POST HTTP/1.1
Host: 192.168.0.107:3000
Content-Type: text/plain
Cache-Control: no-cache
Postman-Token: 6cab79ad-b43b-b4d3-963f-fad11523ec0b
test
The server output from a POST from Postman:
{ host: '192.168.0.107:3000',
connection: 'keep-alive',
'content-length': '4',
'cache-control': 'no-cache',
origin: 'chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop',
'content-type': 'text/plain',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36',
'postman-token': 'd17676a6-98f4-917c-955c-7d8ef01bb024',
accept: '*/*',
'accept-encoding': 'gzip, deflate',
'accept-language': 'en-US,en;q=0.8' }
test
The server output from my POST from Processing:
{ host: '192.168.0.107:3000',
'cache-control': 'no-cache',
'content-type': 'text/plain' }
{}
I just figured out what was wrong: I needed to add the Content-Length header to tell the server how much data to expect, followed by a blank line before the data.
Final code:
import processing.net.*;
String url = "192.168.0.107:3000";
Client myClient;
void setup(){
myClient = new Client(this, "192.168.0.107", 3000);
myClient.write("POST / HTTP/1.1\n");
myClient.write("Cache-Control: no-cache\n");
myClient.write("Content-Type: text/plain\n");
myClient.write("content-length: 4\n");
myClient.write("\n");
myClient.write("test");
myClient.write("\n\n");
}
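The fix comes down to message framing: the headers end with a blank line, and Content-Length tells the server how many body bytes follow. As a minimal sketch of that framing (in Python, not Processing, and with a made-up helper name build_raw_post), the function below computes the length from the body instead of hard-coding it. It uses CRLF line endings, which is what HTTP/1.1 specifies; the Node.js server in this thread evidently also accepted bare \n. Only the bytes are built here, so nothing is sent.

```python
def build_raw_post(host: str, path: str, body: str) -> bytes:
    """Frame a plain-text HTTP/1.1 POST: headers, blank line, then the body."""
    payload = body.encode("utf-8")
    head = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/plain",
        "Cache-Control: no-cache",
        # Length in bytes, not characters, so non-ASCII bodies are framed correctly.
        f"Content-Length: {len(payload)}",
    ]
    # The empty line (CRLF CRLF) separates the headers from the body.
    return ("\r\n".join(head) + "\r\n\r\n").encode("ascii") + payload


raw = build_raw_post("192.168.0.107:3000", "/", "test")
print(raw.decode("utf-8"))
```

Because the server reads exactly Content-Length bytes, no trailing newlines are needed after the body to signal the end of the request.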

Resources