I am trying to turn an XHR (XMLHttpRequest) request into an R command.
I am using the following code:
library(httr)
x <- POST("https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/getDataTableDetailData/?name=&defaultValue=false&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=17.03.2017+00%3A00%7CUTC%7CDAYTIMERANGE&dateTime.endDateTime=17.03.2017+00%3A00%7CUTC%7CDAYTIMERANGE&area.values=CTY%7C10YBE----------2!BZN%7C10YBE----------2&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&dateTime.timezone=UTC&dateTime.timezone_input=UTC&dv-datatable-detail_22WAMERCO000010Y_22WAMERCO000008L_length=10&dv-datatable_length=50&detailId=22WAMERCO000010Y_22WAMERCO000008L",
user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.50 Safari/537.36"),
add_headers(`Referer`="https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/show?name=&defaultValue=true&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&dateTime.endDateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&area.values=CTY|10YBE----------2!BZN|10YBE----------2&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&productionType.values=B20&dateTime.timezone=UTC&dateTime.timezone_input=UTC&dv-datatable_length=100",
Connection = "keep-alive",
Host = "https://transparency.entsoe.eu/",
Accept = "application/json, text/javascript, */*; q=0.01",
`Accept-Encoding` = "gzip, deflate, br",
Origin = "https://transparency.entsoe.eu",
`X-Requested-With` = "XMLHttpRequest",
`Content-Type` = "application/json;charset=UTF-8",
`Accept-Language`= "en-US,en;q=0.8,nl;q=0.6,fr-FR;q=0.4,fr;q=0.2"))
But I keep getting a 400 error (bad request) instead of the 200 that would mark a successful response.
I've extracted the values via the Chrome network monitor from this website. The XHR request is sent when the plus button is clicked. I can send it repeatedly from my browser, but it doesn't seem to work from R.
What am I doing wrong in creating the Post request?
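One detail in the request above that may be worth checking: the Host value contains the scheme and a trailing slash, whereas a Host header normally carries only the host name (plus an optional port), and HTTP clients usually set it themselves. Whether this is what triggers the 400 here is an assumption, not something confirmed in this thread. The sketch below, in Python for illustration, derives the expected value from the URL; `host_header` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def host_header(url: str) -> str:
    """Return the host[:port] part of a URL, which is what Host normally contains."""
    return urlsplit(url).netloc


# Endpoint taken from the question, without its query string.
url = "https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/getDataTableDetailData/"
expected_host = host_header(url)
print(expected_host)
```

If the value differs from what is passed in add_headers, dropping the manual Host entry and letting the client fill it in is a cheap first experiment.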
I'm trying to scrape some data from a PowerBI dashboard but for some reason I'm not able to replicate an XHR request successfully. Here are the details of the original request taken from Chrome web inspector:
Request
Request URL: https://wabi-west-europe-api.analysis.windows.net/public/reports/querydata?synchronous=true
Request Method: POST
Status Code: 200 OK
Remote Address: 51.144.73.151:443
Referrer Policy: no-referrer-when-downgrade
Headers
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate, br
Accept-Language: it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7,ar;q=0.6,fr;q=0.5,sl;q=0.4
ActivityId: b3b20ea3-8f93-1848-b4be-ebf1a5c0952f
Connection: keep-alive
Content-Length: 1176
Content-Type: application/json;charset=UTF-8
Host: wabi-west-europe-api.analysis.windows.net
Origin: https://app.powerbi.com
Referer: https://app.powerbi.com/view?r=eyJrIjoiM2MxY2RkMTQtOTA3Mi00MDIxLWE1NDktZjlmYTdlNDg0MTdkIiwidCI6IjhkZDFlNmI0LThkYWMtNDA4ZS04ZDhkLTY3NTNlOTgwMDUzMCIsImMiOjl9
RequestId: 70c90610-a020-7191-a0fe-91b74d0407b9
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: cross-site
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
X-PowerBI-ResourceKey: 3c1cdd14-9072-4021-a549-f9fa7e48417d
Request body
{"version":"1.0.0","queries":[{"Query":{"Commands":[{"SemanticQueryDataShapeCommand":{"Query":{"Version":2,"From":[{"Name":"q","Entity":"LastRefresh","Type":0}],"Select":[{"Aggregation":{"Expression":{"Column":{"Expression":{"SourceRef":{"Source":"q"}},"Property":"Date Last Refreshed"}},"Function":3},"Name":"Min(Query1.Date Last Refreshed)"}]},"Binding":{"Primary":{"Groupings":[{"Projections":[0]}]},"DataReduction":{"DataVolume":3,"Primary":{"Top":{}}},"Version":1}}}]},"CacheKey":"{\"Commands\":[{\"SemanticQueryDataShapeCommand\":{\"Query\":{\"Version\":2,\"From\":[{\"Name\":\"q\",\"Entity\":\"LastRefresh\",\"Type\":0}],\"Select\":[{\"Aggregation\":{\"Expression\":{\"Column\":{\"Expression\":{\"SourceRef\":{\"Source\":\"q\"}},\"Property\":\"Date Last Refreshed\"}},\"Function\":3},\"Name\":\"Min(Query1.Date Last Refreshed)\"}]},\"Binding\":{\"Primary\":{\"Groupings\":[{\"Projections\":[0]}]},\"DataReduction\":{\"DataVolume\":3,\"Primary\":{\"Top\":{}}},\"Version\":1}}}]}","QueryId":"","ApplicationContext":{"DatasetId":"ec162a68-e319-4018-8364-d2a74d3ed429","Sources":[{"ReportId":"8ef2e9f7-0417-4e8f-bd02-f7a3ee0fedd2"}]}}],"cancelQueries":[],"modelId":3563760}
For my simulated request I use:
httr::POST("https://wabi-west-europe-api.analysis.windows.net/public/reports/querydata?synchronous=true", content_type_json(), add_headers(.headers = heads), body = payload) %>% content()
to perform the request. As headers I only used 'X-PowerBI-ResourceKey', 'RequestId', 'ActivityId' and 'Referer'. The payload is the JSON copied from the request body. I get this response:
$error
$error$code
[1] "BadRequest"
$error$message
[1] "Bad Request"
$error$details
$error$details[[1]]
$error$details[[1]]$message
[1] "After parsing a value an unexpected character was encountered: C. Path 'queries[0].CacheKey', line 1, position 488."
$error$details[[1]]$target
[1] "request.queries[0].CacheKey"
$error$details[[2]]
$error$details[[2]]$message
[1] "'request' is a required parameter"
$error$details[[2]]$target
[1] "request"
I can't understand what I'm doing wrong.
UPDATE:
Solved with a change of approach; see "Correct way to get response body of XHR requests generated by a page with RStudio Chromote".
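For reference, the parser error quoted above is consistent with the escaped quotes inside CacheKey being lost when the body was pasted into a string literal, since CacheKey is itself a JSON document stored as a string. That diagnosis is an assumption based on the error message only. The Python sketch below shows the double encoding on an abbreviated stand-in for the real query (the `command` object is shortened for illustration) and what happens when the backslashes disappear.

```python
import json

# Abbreviated stand-in for the real query object from the request body.
command = {"Commands": [{"SemanticQueryDataShapeCommand": {"Query": {"Version": 2}}}]}

payload = {
    "version": "1.0.0",
    "queries": [
        {
            "Query": command,
            # CacheKey holds the same object, serialized to a string first.
            "CacheKey": json.dumps(command, separators=(",", ":")),
            "QueryId": "",
        }
    ],
    "cancelQueries": [],
    "modelId": 3563760,
}

# Serializing the outer object escapes the quotes of the inner string.
body = json.dumps(payload)

# Round trip: decode the outer document, then decode CacheKey again.
inner = json.loads(json.loads(body)["queries"][0]["CacheKey"])

# Simulate losing the backslashes, as can happen when pasting into a string literal.
stripped = body.replace('\\"', '"')
try:
    json.loads(stripped)
    stripped_is_valid = True
except json.JSONDecodeError:
    stripped_is_valid = False

print(inner == command, stripped_is_valid)
```

Building the body from a native structure and serializing it once, rather than pasting the raw text, avoids having to escape the inner quotes by hand.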
How is it possible to make a request by HttpClient with the HTTP request header Sec-Fetch-Mode: no-cors in Blazor Webassembly?
My current code is:
var hc = new HttpClient();
var responseHTTP = await hc.GetAsync("https://www.somedomain.com/api/");
But this produces the following HTTP request headers:
:authority: www.somedomain.com
:method: GET
:path: /api/json?input=test&key=AIzaSyDqWvsxxxxxxxxxxxxxxxxx1R7x2qoSkc&sessiontoken=136db14b-88bd-4730-a0b2-9b6c1861d9c7
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
origin: http://localhost:5000
referer: http://localhost:5000/places
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36
x-client-data: CJS2yQxxxxxxxxxxxxxxxxxxxxxxxxygEI7bXKAQiOusoBCObGygE=
To specifically answer your question, you need to create a HttpRequestMessage first.
e.g.
var request = new HttpRequestMessage(HttpMethod.Get, "https://www.somedomain.com/api/");
request.SetBrowserRequestMode(BrowserRequestMode.NoCors);
request.SetBrowserRequestCache(BrowserRequestCache.NoStore); //optional
using (var httpClient = new HttpClient())
{
var response = await httpClient.SendAsync(request);
var content = await response.Content.ReadAsStringAsync();
}
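This will correctly set the sec-fetch-mode header to no-cors.
I've found, however, that the response comes back empty even though inspection in Fiddler shows the response is there. That matches how browsers treat no-cors requests: a cross-origin response is opaque, so its body is not exposed to the calling code.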
The closest I got to understanding the problem is through this issue here but unfortunately the bug was closed.
I have a simple node server. All it does is log the req.headers and res (I am learning!).
let http = require('http');
function handleIncomingRequest(req, res) {
console.log('---------------------------------------------------');
console.log(req.headers);
console.log('---------------------------------------------------');
console.log();
console.log('---------------------------------------------------');
res.writeHead(200, {'Content-Type': 'application/json'});
res.end(JSON.stringify( {error: null}) + '\n');
}
let s = http.createServer(handleIncomingRequest);
s.listen(8080);
When I use curl to test the server, it sends one request. When I use Chrome, it sends two different requests.
{ host: 'localhost:8080',
connection: 'keep-alive',
'cache-control': 'max-age=0',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, sdch, br',
'accept-language': 'en-GB,en;q=0.8' }
and
{ host: 'localhost:8080',
connection: 'keep-alive',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
accept: 'image/webp,image/*,*/*;q=0.8',
referer: 'http://localhost:8080/',
'accept-encoding': 'gzip, deflate, sdch, br',
'accept-language': 'en-GB,en;q=0.8' }
This is in incognito mode as in normal mode there are 3 requests!
What is the browser doing and why?
Hard to tell without seeing the full transaction data (for example, what was the request, i.e. what came after GET or POST - and what were the answers from the server).
But it could be caused by the 'upgrade-insecure-requests': '1' header:
When a server encounters this preference in an HTTP request’s headers,
it SHOULD redirect the user to a potentially secure representation of
the resource being requested.
See this.
accept: 'image/webp,image/*,*/*;q=0.8'
On the other hand, the second request is probably for an image only, most likely the favicon.ico or a (bigger) icon for iPad/iPhone maybe (that could explain the 3 requests). You should check out the full request data to be sure.
You can press F12 and select the Network tab in the browser to see what's really happening.
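To see which resource each request is for, logging the request path next to the headers is usually enough. The sketch below is a Python analogue of the Node server above, not the original code: it records the path of every GET, and the two client calls stand in for what a browser would plausibly send (the page itself, then /favicon.ico, which is an assumption about this particular case).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_paths = []  # request paths in arrival order


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the path before answering, so the log shows what was asked for.
        seen_paths.append(self.path)
        body = b'{"error": null}\n'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request stderr logging


def run_demo():
    seen_paths.clear()
    server = HTTPServer(("127.0.0.1", 0), LoggingHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Simulate the page request followed by an icon request.
        for path in ("/", "/favicon.ico"):
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", path)
            conn.getresponse().read()
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return list(seen_paths)


paths = run_demo()
print(paths)
```

In the Node version the equivalent change would be to log req.url alongside req.headers.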
I'm trying to work with requests (Python 3.4) to create a session where I log into gamefaqs.com and navigate to a board page so that I can scrape the content to get relevant information for what I'm trying to accomplish. I directly copied the header and payload information from the developer console in Firefox.
import requests
import urllib3
url = 'http://www.gamefaqs.com/user/login'
url2 = 'http://www.gamefaqs.com/user/Leight_Weight/boards'
header = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.5',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0',
'Referer': 'http://www.gamefaqs.com/',
'Connection': 'keep-alive',
'Host': 'www.gamefaqs.com',
}
payload = {
'path': "http://www.gamefaqs.com/",
'key': "71548de4",
'EMAILADDR': "username",
'PASSWORD': "password",
}
with requests.Session() as s:
    p = s.get(url, headers=header)
    p = s.post(url, headers=header, data=payload, cookies = s.cookies)
The problem I'm having is that I'm not receiving the authentication cookie passed from the website to my session. I'm using Fiddler to track the POST request from Python. Despite the request header information being identical to the request header information in Firefox, the response header information is very different.
The response header from Firefox (as seen by Fiddler):
Firefox Response Header
The response header from Python (as seen by Fiddler):
Python Response Header
At this point I'm at a bit of a loss. As far as I can tell my code is sound and the request headers are correct, however not receiving the authentication cookie proves something is wrong. If you look in the response header the codes are different (302 vs 200). I'm not sure what the error is.
As it turns out, the payload item 'key' changes depending on your session. I didn't catch this initially because I didn't consider that browsers keep persistent cookies across open/close, which this solution does not do.
I took a somewhat heavy-handed approach to finding the right key value using BeautifulSoup, but the result is the same. Once I had the appropriate key value, I added it to the payload before doing the POST and voilà: successful login.
For posterity's sake, the code is below.
import requests
from bs4 import BeautifulSoup as bs
url = 'http://www.gamefaqs.com/user/login'
url2 = 'http://www.gamefaqs.com/user/Leight_Weight/boards'
header = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.5',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0',
'Referer': 'http://www.gamefaqs.com/',
'Connection': 'keep-alive',
'Host': 'www.gamefaqs.com',
}
payload = {
'PASSWORD': "password",
'path': "http://www.gamefaqs.com/",
'EMAILADDR': "username",
}
with requests.Session() as s:
    resp = s.get(url, headers=header)
    parse = bs(resp.text)
    keyval = parse.find_all('form')[1].contents[1]['value']
    payload['key'] = keyval
    p = s.post(url, headers=header, data=payload)
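The positional lookup above (second form, second child) depends on the exact page layout. A slightly more defensive variant is to collect hidden inputs by name. The sketch below uses only the standard library; the sample markup is made up for illustration, and the assumption that the token lives in a hidden input named key is inferred from the payload, not verified against the site. `HiddenInputParser` and `hidden_fields` are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attribute values can be None when written without a value.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


def hidden_fields(html: str) -> dict:
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.hidden


# Made-up login form, standing in for the real page.
sample = """
<form action="/user/login" method="post">
  <input type="hidden" name="key" value="abc123">
  <input type="text" name="EMAILADDR">
  <input type="password" name="PASSWORD">
</form>
"""

fields = hidden_fields(sample)
print(fields["key"])
```

With a live session, the result of hidden_fields(resp.text) could be merged into the payload before the POST, so any other hidden fields the form carries are sent along too.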
I have an Intel Edison running a Node.JS server that is printing everything I post to it into the console. I can successfully post to it using Postman and see the sent raw data in the console.
Now I'm using Processing to POST to it, which will fire off different events on the Node.JS server.
My problem is that I can't seem to POST the raw body to the server successfully; I've been trying to get this working for several hours already.
import processing.net.*;
String url = "192.168.0.107:3000";
Client myClient;
void setup(){
myClient = new Client(this, "192.168.0.107", 3000);
myClient.write("POST / HTTP/1.1\n");
myClient.write("Cache-Control: no-cache\n");
myClient.write("Content-Type: text/plain\n");
//Attempting to write the raw post body
myClient.write("test");
//2 newlines tells the server that we're done sending
myClient.write("\n\n");
}
The console shows that the server received the POST, and the correct headers, but it doesn't show any data in it.
How do I specify that "test" is the raw POST data?
The HTTP code from Postman:
POST HTTP/1.1
Host: 192.168.0.107:3000
Content-Type: text/plain
Cache-Control: no-cache
Postman-Token: 6cab79ad-b43b-b4d3-963f-fad11523ec0b
test
The server output from a POST from Postman:
{ host: '192.168.0.107:3000',
connection: 'keep-alive',
'content-length': '4',
'cache-control': 'no-cache',
origin: 'chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop',
'content-type': 'text/plain',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36',
'postman-token': 'd17676a6-98f4-917c-955c-7d8ef01bb024',
accept: '*/*',
'accept-encoding': 'gzip, deflate',
'accept-language': 'en-US,en;q=0.8' }
test
The server output from my POST from Processing:
{ host: '192.168.0.107:3000',
'cache-control': 'no-cache',
'content-type': 'text/plain' }
{}
I just figured out what was wrong: I needed to add the Content-Length header to tell the server how much data to expect, followed by a blank line before the data.
Final code:
import processing.net.*;
String url = "192.168.0.107:3000";
Client myClient;
void setup(){
myClient = new Client(this, "192.168.0.107", 3000);
myClient.write("POST / HTTP/1.1\n");
myClient.write("Cache-Control: no-cache\n");
myClient.write("Content-Type: text/plain\n");
myClient.write("content-length: 4\n");
myClient.write("\n");
myClient.write("test");
myClient.write("\n\n");
}
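The same framing can be sketched in Python to make the layout explicit: header lines, a Content-Length computed from the body, one empty line, then the body. HTTP/1.1 specifies CRLF line endings, so this sketch uses "\r\n"; the server above evidently also accepted bare "\n". `build_post` is a hypothetical helper, and the Host and Connection headers are additions not present in the Processing code.

```python
def build_post(host: str, path: str, body: str) -> bytes:
    """Assemble a minimal HTTP/1.1 POST request with a plain-text body."""
    payload = body.encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/plain\r\n"
        # Length in bytes, computed instead of hard-coded.
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n"
        # The empty line marks the end of the headers.
        "\r\n"
    )
    return head.encode("ascii") + payload


request = build_post("192.168.0.107:3000", "/", "test")
print(request.decode("ascii"))
```

Computing the length from the encoded body avoids the mismatch that a hard-coded value causes as soon as the message changes. The resulting bytes could be sent over a plain TCP socket, for example one opened with socket.create_connection.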