I'm trying to scrape some data from a PowerBI dashboard but for some reason I'm not able to replicate an XHR request successfully. Here are the details of the original request taken from Chrome web inspector:
Request
Request URL: https://wabi-west-europe-api.analysis.windows.net/public/reports/querydata?synchronous=true
Request Method: POST
Status Code: 200 OK
Remote Address: 51.144.73.151:443
Referrer Policy: no-referrer-when-downgrade
Headers
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate, br
Accept-Language: it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7,ar;q=0.6,fr;q=0.5,sl;q=0.4
ActivityId: b3b20ea3-8f93-1848-b4be-ebf1a5c0952f
Connection: keep-alive
Content-Length: 1176
Content-Type: application/json;charset=UTF-8
Host: wabi-west-europe-api.analysis.windows.net
Origin: https://app.powerbi.com
Referer: https://app.powerbi.com/view?r=eyJrIjoiM2MxY2RkMTQtOTA3Mi00MDIxLWE1NDktZjlmYTdlNDg0MTdkIiwidCI6IjhkZDFlNmI0LThkYWMtNDA4ZS04ZDhkLTY3NTNlOTgwMDUzMCIsImMiOjl9
RequestId: 70c90610-a020-7191-a0fe-91b74d0407b9
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: cross-site
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
X-PowerBI-ResourceKey: 3c1cdd14-9072-4021-a549-f9fa7e48417d
Request body
{"version":"1.0.0","queries":[{"Query":{"Commands":[{"SemanticQueryDataShapeCommand":{"Query":{"Version":2,"From":[{"Name":"q","Entity":"LastRefresh","Type":0}],"Select":[{"Aggregation":{"Expression":{"Column":{"Expression":{"SourceRef":{"Source":"q"}},"Property":"Date Last Refreshed"}},"Function":3},"Name":"Min(Query1.Date Last Refreshed)"}]},"Binding":{"Primary":{"Groupings":[{"Projections":[0]}]},"DataReduction":{"DataVolume":3,"Primary":{"Top":{}}},"Version":1}}}]},"CacheKey":"{\"Commands\":[{\"SemanticQueryDataShapeCommand\":{\"Query\":{\"Version\":2,\"From\":[{\"Name\":\"q\",\"Entity\":\"LastRefresh\",\"Type\":0}],\"Select\":[{\"Aggregation\":{\"Expression\":{\"Column\":{\"Expression\":{\"SourceRef\":{\"Source\":\"q\"}},\"Property\":\"Date Last Refreshed\"}},\"Function\":3},\"Name\":\"Min(Query1.Date Last Refreshed)\"}]},\"Binding\":{\"Primary\":{\"Groupings\":[{\"Projections\":[0]}]},\"DataReduction\":{\"DataVolume\":3,\"Primary\":{\"Top\":{}}},\"Version\":1}}}]}","QueryId":"","ApplicationContext":{"DatasetId":"ec162a68-e319-4018-8364-d2a74d3ed429","Sources":[{"ReportId":"8ef2e9f7-0417-4e8f-bd02-f7a3ee0fedd2"}]}}],"cancelQueries":[],"modelId":3563760}
For my simulated request I use:
httr::POST("https://wabi-west-europe-api.analysis.windows.net/public/reports/querydata?synchronous=true", content_type_json(), add_headers(.headers = heads), body = payload) %>% content()
to perform the request. The only headers I set were 'X-PowerBI-ResourceKey', 'RequestId', 'ActivityId' and 'Referer'. The payload is the JSON copied from the request body. I get this response:
$error
$error$code
[1] "BadRequest"
$error$message
[1] "Bad Request"
$error$details
$error$details[[1]]
$error$details[[1]]$message
[1] "After parsing a value an unexpected character was encountered: C. Path 'queries[0].CacheKey', line 1, position 488."
$error$details[[1]]$target
[1] "request.queries[0].CacheKey"
$error$details[[2]]
$error$details[[2]]$message
[1] "'request' is a required parameter"
$error$details[[2]]$target
[1] "request"
I can't understand what I'm doing wrong.
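One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: CacheKey is itself a JSON document stored as a string, so its inner quotes must stay escaped as \" in the body that goes over the wire. If the copied body is pasted into an ordinary string literal, the language consumes those backslashes and the server sees the string end right after "{", which would explain the complaint about an unexpected character C (the start of Commands). The sketch below is in Python rather than R and uses a shortened, hypothetical command object; it builds the payload from data structures so the serializer handles the escaping.

```python
import json

# Shortened stand-in for the real SemanticQueryDataShapeCommand (hypothetical content).
command = {"Commands": [{"SemanticQueryDataShapeCommand": {"Query": {"Version": 2}}}]}

payload = {
    "version": "1.0.0",
    "queries": [{
        "Query": command,
        # CacheKey is the same command serialized to a string; json.dumps on the
        # outer payload then escapes its inner quotes as \" automatically.
        "CacheKey": json.dumps(command, separators=(",", ":")),
        "QueryId": "",
    }],
    "cancelQueries": [],
    "modelId": 3563760,
}
body = json.dumps(payload, separators=(",", ":"))

# What the body looks like when the backslashes are lost: the CacheKey string
# closes right after "{" and the parser trips over the C of Commands.
broken_body = body.replace('\\"', '"')


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(body))         # the escaped body parses
print(is_valid_json(broken_body))  # the unescaped one does not
```

In R, a raw string literal (r"(...)", available from R 4.0) or doubling each backslash in the pasted text should keep the escapes intact.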
UPDATE:
Solved with a change of approach, described in "Correct way to get response body of XHR requests generated by a page with RStudio Chromote".
How is it possible to make a request by HttpClient with the HTTP request header Sec-Fetch-Mode: no-cors in Blazor Webassembly?
My current code is:
var hc = new HttpClient();
var responseHTTP = await hc.GetAsync("https://www.somedomain.com/api/");
But this produces the following HTTP request headers:
:authority: www.somedomain.com
:method: GET
:path: /api/json?input=test&key=AIzaSyDqWvsxxxxxxxxxxxxxxxxx1R7x2qoSkc&sessiontoken=136db14b-88bd-4730-a0b2-9b6c1861d9c7
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
origin: http://localhost:5000
referer: http://localhost:5000/places
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36
x-client-data: CJS2yQxxxxxxxxxxxxxxxxxxxxxxxxygEI7bXKAQiOusoBCObGygE=
To answer your question specifically, you need to create an HttpRequestMessage first.
e.g.
// Extension methods such as SetBrowserRequestMode live in this namespace.
using Microsoft.AspNetCore.Components.WebAssembly.Http;

var request = new HttpRequestMessage(HttpMethod.Get, "https://www.somedomain.com/api/");
request.SetBrowserRequestMode(BrowserRequestMode.NoCors);    // sets sec-fetch-mode: no-cors
request.SetBrowserRequestCache(BrowserRequestCache.NoStore); // optional
using (var httpClient = new HttpClient())
{
    var response = await httpClient.SendAsync(request);
    var content = await response.Content.ReadAsStringAsync();
}
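This will correctly set the sec-fetch-mode header to no-cors.
I've found, however, that the response comes back empty, even though inspecting the traffic in Fiddler shows that the response is there.
The closest I got to understanding the problem was through an issue report, but unfortunately the bug was closed. The behavior is consistent with how browsers treat no-cors requests: a cross-origin response to such a request is opaque, so its body is not exposed to the calling code.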
I'm trying to set up a web server that receives and handles a couple of HTTP requests. Unfortunately, I'm stuck on "POST" requests. Hope someone can help me here.
I have a form that I use to send a file (an image) plus some text fields from the browser to my server.
When I receive the request on the server and analyze the request body (which is below all "header" fields), I notice that it does not always look the same.
The body always starts with:
--- webkitformboundry .....
Content-Disposition .....
Content-Type: image .....
and then lots of unreadable data...
After the unreadable data, it continues the same way it started, this time for the remaining fields:
--- webkitformboundry .....
Content-Disposition .....
So now I wonder. Why is the information about the other input fields only visible sometimes, and sometimes not?
Best regards
UPDATE: This is the form I use to send the request
<form method="POST" action="process_upload.php" enctype="multipart/form-data">
    <input type="file" name="user_upload">
    <input type="hidden" name="duration">
    <input type="hidden" name="test" value="test test">
    <button type="submit">Send</button>
</form>
Server receives:
POST /process_upload.php HTTP/1.1
Host: localhost:27015
Connection: keep-alive
Content-Length: 129985
Cache-Control: max-age=0
Origin: http://localhost:27015
Upgrade-Insecure-Requests: 1
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary1JhkT5GiE89P2SBs
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
Sec-Fetch-User: ?1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: navigate
Referer: http://localhost:27015/index
Accept-Encoding: gzip, deflate, br
Accept-Language: sv-SE,sv;q=0.9,en-US;q=0.8,en;q=0.7
Cookie: PHPSESSID=5motqqf425llf53j5hutj0a0bn
------WebKitFormBoundarymRYGcz75Ct8BV6bh
Content-Disposition: form-data; name="user_upload"; filename="myfile.jpg"
Content-Type: image/jpeg
ÿØÿáɧ¦¨ä"Ö]2üà©–=/D‰…Ë7]^ÞXªë§¬–-zƉx›Ž;?•?üƒî«û³o±Vôì°%²úŸà£õ'õ·òùÿmá±E~ßÝ°Ú׫Ç{NÉò·ƒLûúÌHøûs–ŸsÅ“µï²gÃi½î¢©ÇR»ŽE¼S^]þÏÿ
//End of message here.
As you can see, information about 2 input fields is missing (duration and test).
The full content length is received and no supplementary request is sent.
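A common cause of this symptom, stated here as an assumption since the server code is not shown, is handling the body as text: image data can contain zero bytes, and string functions that stop at the first zero byte will silently drop everything after it, including the later form fields. A minimal Python sketch of splitting a multipart body as raw bytes, using the boundary from the Content-Type header (parse_multipart and the sample body are hypothetical):

```python
import re


def parse_multipart(body: bytes, boundary: str) -> dict:
    """Split a multipart/form-data body into {field name: raw bytes}."""
    delimiter = b"--" + boundary.encode("ascii")
    fields = {}
    for part in body.split(delimiter):
        # Real parts start with CRLF; the preamble and the closing "--" do not.
        if not part.startswith(b"\r\n"):
            continue
        part = part[2:]
        if part.endswith(b"\r\n"):
            part = part[:-2]  # CRLF that precedes the next delimiter
        headers, _, content = part.partition(b"\r\n\r\n")
        # Match name="..." but not filename="...".
        match = re.search(rb'[; ]name="([^"]*)"', headers)
        if match:
            fields[match.group(1).decode("ascii")] = content
    return fields


# Hypothetical request body: a "file" containing a zero byte, then a text field.
boundary = "----WebKitFormBoundaryExample"
body = (
    b"------WebKitFormBoundaryExample\r\n"
    b'Content-Disposition: form-data; name="user_upload"; filename="myfile.jpg"\r\n'
    b"Content-Type: image/jpeg\r\n\r\n"
    b"\xff\xd8\x00\x01binary\r\n"
    b"------WebKitFormBoundaryExample\r\n"
    b'Content-Disposition: form-data; name="test"\r\n\r\n'
    b"test test\r\n"
    b"------WebKitFormBoundaryExample--\r\n"
)
fields = parse_multipart(body, boundary)
print(sorted(fields))
```

Because the body is never converted to text, the zero byte inside the file part does not hide the field that follows it. The same check can be applied to the server: log the number of bytes read and search for the boundary in the raw buffer rather than in a printed string.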
Here is my AJAX function:
function ajax(url, data) {
    return new Promise((resolve, reject) => {
        $.ajax({
            url: "https://xxx",
            data: data,
            method: 'POST',
            timeout: 50000,
            cache: true,
            ifModified: true,
            crossDomain: true,
            success: (data, textStatus, jqXHR) => {
                if (data == '#fail#') reject(data);
                else {resolve(data);}
            },
            error: (jqXHR, textStatus, errorThrown) => {
                reject(errorThrown);
            }
        });
    });
}
As observed in Chrome -> Network(F12), this is the response header from the server:
HTTP/1.1 200 OK
X-Powered-By: Express
Access-Control-Allow-Origin: *
Content-Type: text/html; charset=utf-8
Content-Length: 3
ETag: W/"3-R7zlx09Yn0hn29V+nKn4CA"
Date: Fri, 06 Apr 2018 11:39:41 GMT
Connection: keep-alive
The request header is always identical, even in subsequent calls:
POST /register HTTP/1.1
Host: xxx:60001
Connection: keep-alive
Content-Length: 0
Accept: */*
Origin: http://localhost:8000
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Referer: http://localhost:8000/index.html
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Shouldn't Chrome, upon receiving an ETag header, cache the resource and set the 'If-None-Match' header on subsequent calls to the same URL? Shouldn't I obtain a status code of 304 instead of 200 as the returned content is the same?
The calls to the resources in other servers such as the Google Map server do return 304 sometimes though.
This confirms that caching is generally limited to GET request methods only:
However, common HTTP caches are typically limited to caching responses to GET and may decline other methods. The primary cache key consists of the request method and target URI (oftentimes only the URI is used as only GET requests are caching targets)
This is also confirmed in a post on Stack Overflow.
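To make the distinction concrete, here is a minimal Python sketch of how a server might revalidate with ETags for GET while always answering POST in full. The respond and make_etag helpers are hypothetical and do not mirror Express internals; the If-None-Match comparison is simplified to an exact match.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the response body (illustrative scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(method: str, request_headers: dict, body: bytes):
    """Return (status, headers, body) for a resource whose current content is body."""
    etag = make_etag(body)
    # Only GET and HEAD are revalidated here; browsers typically do not cache
    # POST responses, so a POST always receives the full response.
    if method in ("GET", "HEAD") and request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


status, headers, _ = respond("GET", {}, b"abc")
print(status)  # first request carries no validator
status_again, _, _ = respond("GET", {"If-None-Match": headers["ETag"]}, b"abc")
print(status_again)  # repeat request presents the stored ETag
```

If the /register call only reads data, switching it to GET would let the browser store the ETag and send If-None-Match on later calls.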
I have a service written in Jolie, and I want to extract the HTTP headers of an incoming request. In the same way that request.id can be printed out, I would like to print the headers. My attempt is the line marked in bold in the code below. Here is the code:
execution { concurrent }
inputPort UserDB_Service {
    Location: "socket://localhost:8002/"
    Protocol: http { .format = "json" }
    Interfaces: Users, ShutdownInterface, ConnectionPool
}
outputPort DB_Connector {
    Location: "socket://localhost:1000/"
    Protocol: sodep
    Interfaces: ConnectionPool
}
init
{
    connectionConfigInfo#DB_Connector()(connectionInfo);
    connect#Database(connectionInfo)()
}
main
{
    //Example: http://localhost:8002/retrieve?id=1
    [ retrieve(request)(response) {
        query#Database(
            "select * from users where user_id=:id" {
                .id = request.id
            }
        )(sqlResponse);
        println#Console( "You have requested the user_id: " + request.id)();
        **println#Console( "Request Headers: " + response.format)();**
        if (#sqlResponse.row == 1) {
            response -> sqlResponse.row[0]
        }
    } ]
}
Thanks for the help.
I am not sure whether you already know which headers you want to read from the inbound request or whether you just want to print the whole HTTP message for debugging purposes. Both are quick, so I report both solutions :)
In the first case, you can set the headers parameter of the http protocol on the inputPort so that the content of a specific header is also included in the request message, e.g.,
http {
    .headers.format = "format";
}
and then you can inspect the value in the usual way
println#Console( request.format )()
In the second case, you can use
http {
    .debug = true;
    .debug.showContent = true
}
to see the log of all http requests and responses and their bodies.
These and further details on protocols, and on the http protocol in particular, are in the documentation on the Jolie site.
I put the output here again. I wonder if it is possible to extract the "iv-user: g47257" header, which I have injected by using Fiddler. Thanks again for the help.
The headers are like this (better format).
INFO: [UserDB_crud.ol] [HTTP debug] Receiving:
HTTP Code: 0
Resource: /retrieve?id=1
--> Header properties
iv-user: g47257
accept-language: en-US,en;q=0.8,da;q=0.6,es;q=0.4
host: localhost:8002
upgrade-insecure-requests: 1
connection: keep-alive
cache-control: max-age=0
accept-encoding: gzip, deflate, sdch
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
You have requested the user_id: 1
mar. 10, 2016 2:30:44 PM jolie.Interpreter logInfo
INFO: [UserDB_crud.ol] [HTTP debug] Sending:
HTTP/1.1 200 OK
Server: Jolie
X-Jolie-MessageID: 0
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Content-Length: 72
?V*H,..?/JQ?R*I-.Q?Q*-N-??♦
↑?(?%?"dRs‼3s?\►?????T♂ %??WE
mar. 10, 2016 2:30:44 PM jolie.Interpreter logInfo
INFO: [UserDB_crud.ol] [HTTP debug] Receiving:
HTTP Code: 0
Resource: /favicon.ico
--> Header properties
iv-user: g47257
referer: http://localhost:8002/retrieve?id=1
accept-language: en-US,en;q=0.8,da;q=0.6,es;q=0.4
host: localhost:8002
connection: keep-alive
cache-control: no-cache
pragma: no-cache
accept-encoding: gzip, deflate, sdch
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
accept: */*
mar. 10, 2016 2:30:44 PM jolie.Interpreter logWarning
WARNING: [UserDB_crud.ol] Received a message for operation favicon.ico, not specified in the input port at the receiving service. Sending IOException to the caller.
mar. 10, 2016 2:30:44 PM jolie.Interpreter logInfo
INFO: [UserDB_crud.ol] [HTTP debug] Sending:
HTTP/1.1 200 OK
Server: Jolie
X-Jolie-MessageID: 0
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Content-Length: 102
?VJ-*?/R??V?M-.NLOU?R??w?HN-(???S?QJ?O☺?→←↓↑↑?(?$?$???%?d?(?↨?▬%?¶Z)?%?e&???☺ ??Z ?yd?Y
I am re-posting my last comment here, since other people have faced the same difficulties as Efrin but might miss the solution I posted as a comment.
You can inspect the headers of an HTTP request as shown in the code below:
include "console.iol"
inputPort Me {
    Location: "socket://localhost:8000"
    Protocol: http { .headers.iv_user = "ivUser" }
    RequestResponse: myRequest
}
main {
    myRequest( request )(){ println#Console( request.ivUser )() }
}
Remember that, as reported in the documentation, the Jolie http.headers parameters map the character - in header names to _; e.g., in your case, the header iv-user becomes iv_user in the Jolie HTTP protocol parameters.
Besides the description and code found in the Jolie documentation, you can find further examples and a more thorough explanation of how the HTTP protocol works in Jolie in its presentation paper, written by Montesi: https://doi.org/10.1016/j.scico.2016.05.002.
Is there anything special I need to consider when trying to change the user agent via httr::user_agent in a httr::GET() call on MS Windows? I'm using R-3.1.0 and httr 0.3.
Following the example at ?user_agent, I'm getting these results:
url_this <- "http://httpbin.org/user-agent"
Standard user agent:
GET(url_this)
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "curl/7.19.6 Rcurl/1.95.4.1 httr/0.3"
}
Modified user agent:
GET(url_this, user_agent("Mozilla/5.0"))
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "curl/7.19.6 Rcurl/1.95.4.1 httr/0.3"
}
I had expected that the second call returns something closer to what I'm getting when visiting url_this in my browser:
{
"user-agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0"
}
What am I missing here? Also ran setInternet2(TRUE) first, but got identical results.
Very curious; the help page ?user_agent suggests it should work. You can set the header explicitly, and that does work:
> GET("http://httpbin.org/user-agent", add_headers("user-agent" = "Mozilla/5.0"))
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "Mozilla/5.0"
}
but the example given in ?user_agent appears not to.
> GET("http://httpbin.org/user-agent", user_agent("Mozilla/5.0") )
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "curl/7.19.6 Rcurl/1.95.4.1 httr/0.3"
}
>
It is returning
> httr:::default_ua()
[1] "curl/7.19.7 Rcurl/1.95.4.1 httr/0.3"
My ISP was also doing something funky so you may need:
GET("http://httpbin.org/user-agent", add_headers("user-agent" = "Mozilla/5.0", "Cache-Control" = "no-cache"))
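When a helper such as user_agent() appears to be ignored, it helps to check which header the request object actually carries before it goes out. The sketch below shows that check with Python's standard library rather than httr, so it illustrates the idea only and says nothing about httr's internals; build_request is a hypothetical helper.

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request with an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://httpbin.org/user-agent", "Mozilla/5.0")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Nothing is sent over the network here; the request is only constructed and inspected, which separates "the header was never set" from "something between client and server changed it".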