MSXML client-side XSLT does not send Accept-Language header

When using client-side XSLT in IE9, I noticed that IE sends different headers for the request that fetches the XSL (and for subsequent requests triggered via the document() method) than for the request for the original XML file. In particular, the Accept-Language header is missing completely.
The bootstrap XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="transform.xsl"?>
<root/>
and the XSLT looks like this:
...
<body>
<xsl:apply-templates select="document('section.xml')"/>
</body>
...
What I notice is that both the XSLT and the section.xml file are loaded with HTTP requests that lack an Accept-Language header.
The request headers to fetch the XML file look like this:
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US,de-DE;q=0.5
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
whereas the other resources are loaded with
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)
Connection: Keep-Alive
Is this a feature or a bug? Other browsers such as Firefox or Chrome send identical headers for all of these requests.
A working example can be found on my test server
This effect causes problems in a real-life project, because the XML files are generated dynamically and contain end-user-facing content that is negotiated based on the Accept-Language header. This fails because the transformer sends no such header.
Any insight or suggestions for workarounds are welcome!
Thanks!
Carsten

I vote for "bug", since it seems more logical to repeat the Accept-Language header for the dependent requests (though I'm not sure whether that is specified anywhere). Could you carry the language preference information as a query parameter on the requests that fetch the XSL and the document() resources?
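A minimal sketch of that workaround, assuming the server can inject the negotiated language into the bootstrap XML when it generates it; the lang attribute and the ?lang= query parameter are hypothetical, not part of the original setup:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="transform.xsl?lang=en-US"?>
<root lang="en-US"/>
The stylesheet can then forward the value on its own requests, so the server sees the language even though the transformer sends no Accept-Language header:
<xsl:apply-templates select="document(concat('section.xml?lang=', /root/@lang))"/>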

Related

POST command missing BODY parameters in SSL (https:), but not in http:

Working on some code that I inherited from a non-responsive initial developer. My ASP.NET and web.config are at best "dated", so I'm turning to the community for some help. One of the first things I had to change was to force this website to operate over SSL (https:), as it deals with sensitive data. The program immediately stopped working, and I had to make some undesirable changes to code that "already worked". It still seems broken, and the changes won't make the client happy.
This is an ASP.NET project that seems hand-rolled.
The page sends a POST command with some body text that (I think) is JSON, setting additional parameters on the POST, such as "indexID=8379fcd1-5083-4d1c-a6ee-5812f134a505".
As far as I can tell, this works as intended on non-SSL (i.e. http:) requests. However, when running in SSL (i.e. https:) requests, it appears that the BODY (JSON text) isn't getting decoded into the HttpContext.Current.Request parameters (which does seem to happen over http:).
However, the post_data that I can read from the input stream has the JSON body text (as clear text?) with the parameters, which my 'fix' adds to the incoming HttpContext.Current.Request parameters as a combined dictionary (see the sketch after the captures below).
[Here is the RAW command intercepted with Fiddler]
POST https://vmdev-xpp/BuilderQC/Services/Data.svc/QueryGridResults?typename=ImportReadyForDownload HTTP/1.1
Host: vmdev-xpp
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:92.0) Gecko/20100101 Firefox/92.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/x-www-form-urlencoded
X-Requested-With: XMLHttpRequest
Content-Length: 378
Origin: https://vmdev-xpp
Connection: keep-alive
Referer: https://vmdev-xpp/BuilderQC/BREDFileManagement.aspx
Cookie: ASP.NET_SessionId=ehi2ccsdmekvkgegmfh11n1v; .ASPXLMPTest=2226D5725FBC10FBCCD606108CE5A4E32990EEA8FF8A1864496F69874F116D7E1ABF48A8BFD05EE683FE3F456D4475E88A61B19B299CB557209129BD25E87AC38CECA5303C7E2035E64C1F5A4AD2605D8581181A9C7E48680371F83BC7A93D7A63D8748EA4761A608F424578C20127D01DE0E2FBFBD5F079575E86FD506925D541026B7C8713FDEFE108BCEADBFC1DA0
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
_search=true&nd=1629052554858&rows=40&page=1&sidx=&sord=asc&QC_ProjectID=14&FileReadyForDownload=1&filters=%7B%22fields%22%3A%5B%7B%22field%22%3A%22QC_ProjectID%22%2C+%22op%22%3A+%22cn%22%2C+%22value%22%3A%2214%22%7D%2C%7B%22field%22%3A%22FileReadyForDownload%22%2C+%22op%22%3A+%22cn%22%2C+%22value%22%3A%221%22%7D%5D%7D&indexID=8379fcd1-5083-4d1c-a6ee-5812f134a505&entity=false
[Here is the post_data I obtained from the input stream; I think this being in clear text is suspect]
Post_data = _search=true&nd=1629052767566&rows=40&page=1&sidx=&sord=asc&QC_ProjectID=14&FileReadyForDownload=1&filters=%7B%22fields%22%3A%5B%7B%22field%22%3A%22FileName%22%2C+%22op%22%3A+%22cn%22%2C+%22value%22%3A%22th%22%7D%2C%7B%22field%22%3A%22QC_ProjectID%22%2C+%22op%22%3A+%22cn%22%2C+%22value%22%3A%2214%22%7D%2C%7B%22field%22%3A%22FileReadyForDownload%22%2C+%22op%22%3A+%22cn%22%2C+%22value%22%3A%221%22%7D%5D%7D&indexID=8379fcd1-5083-4d1c-a6ee-5812f134a505&entity=false&FileName=th
Here are the incoming HttpContext.Current.Request.Params.AllKeys; notice that "indexID" is missing, among other parameters.
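For reference, a minimal sketch of what such a 'combined dictionary' fix could look like; the helper name and the merging details here are hypothetical, not the asker's actual code:
using System.Collections.Specialized;
using System.IO;
using System.Web;

public static class RequestParamsFix
{
    // Re-reads the raw url-encoded body and merges its fields with
    // Request.Params, compensating for parameters that never made it
    // into the collection. (Hypothetical sketch, not the original code.)
    public static NameValueCollection CombinedParams(HttpContext context)
    {
        context.Request.InputStream.Position = 0; // rewind in case the body was already read
        string postData;
        using (var reader = new StreamReader(context.Request.InputStream))
        {
            postData = reader.ReadToEnd();
        }
        NameValueCollection fromBody = HttpUtility.ParseQueryString(postData);
        var combined = new NameValueCollection(context.Request.Params);
        combined.Add(fromBody); // copies every key from the body into the merged collection
        return combined;
    }
}
As an aside, seeing the body in clear text at this point is expected: both Fiddler and Request.InputStream sit after TLS decryption, so an https body is already decrypted by the time server code reads it.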

Jsoup times out and cURL only works with the '--compressed' flag - how do I emulate this in Jsoup?

I am trying to use JSoup to parse content from URLs like https://www.tesco.com/groceries/en-GB/products/300595003
Jsoup.connect(url).get() simply times out; however, I can access the website fine in a web browser.
Through trial and error, the simplest working curl command I found was:
curl 'https://www.tesco.com/groceries/en-GB/products/300595003' \
-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0' \
-H 'Accept-Language: en-GB,en;q=0.5' --compressed
I am able to translate the User-Agent and Accept-Language into Jsoup; however, I still get timeouts. Is there an equivalent to the --compressed flag for Jsoup? The curl command will not work without it.
To find out what the --compressed option does, try using curl with the --verbose parameter. It will display the full request headers.
Without --compressed:
> GET /groceries/en-GB/products/300595003 HTTP/2
> Host: www.tesco.com
> Accept: */*
> User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0
> Accept-Language: en-GB,en;q=0.5
With --compressed:
> GET /groceries/en-GB/products/300595003 HTTP/2
> Host: www.tesco.com
> Accept: */*
> Accept-Encoding: deflate, gzip
> User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0
> Accept-Language: en-GB,en;q=0.5
The difference is the new Accept-Encoding header, so adding .header("Accept-Encoding", "deflate, gzip") should solve your problem.
By the way, for me both jsoup and curl are able to download the page source without this header and without --compressed, and I'm not getting timeouts, so there's a chance your requests are being limited by the server for making too many requests.
EDIT:
It works for me using your original command with --http1.1, so there has to be a way to make it work for you as well. I'd start by using the Chrome developer tools to take a look at what headers your browser sends and try to pass all of them using .header(...). You can also copy the request as a curl command to see all headers and simulate exactly what Chrome is sending:
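A minimal sketch of that approach, using the header values from the working curl command above (recent jsoup versions already send Accept-Encoding: gzip by default, so that line may be redundant):
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TescoFetch {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://www.tesco.com/groceries/en-GB/products/300595003")
                // headers copied from the curl command that worked
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0")
                .header("Accept-Language", "en-GB,en;q=0.5")
                .header("Accept-Encoding", "deflate, gzip") // equivalent of curl's --compressed
                .timeout(10000) // fail fast instead of hanging indefinitely
                .get();
        System.out.println(doc.title());
    }
}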

WebClient.DownloadString seems to change some of the html from an external site

I have an ASP.NET website (.aspx) that I call from within an ASP.NET MVC 4 mobile website (.cshtml) to get its html response string. Both sites are hosted on a Windows Server 2008 R2 system. They are created and published with VS2010 Professional.
-If I go directly to the external site and view source then it is correct.
-If I use any of the below ways of getting the external html:
using (WebClient client = new WebClient())
{
    html = client.DownloadString(strUrl);
}
or
using (WebClient client = new WebClient())
{
    byte[] DataBuffer = client.DownloadData(strUrl);
    html = Encoding.ASCII.GetString(DataBuffer);
}
or
WebResponse objResponse;
WebRequest objRequest = System.Net.HttpWebRequest.Create(strUrl);
objResponse = objRequest.GetResponse();
using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
{
    html = sr.ReadToEnd();
    sr.Close();
}
then the html is changed from this (where the font-family is set on a parent table):
<td align="right" style="color:Red;background-color:White;width:4.375em;border-bottom:1px solid black;border-right:1px solid black;">-27.0%</td>
to this:
<td align="right" bgcolor="White" style="border-bottom:1px solid black;border-right:1px solid black;"><font face="Arial,sans-serif" color="Red">-27.0%</font></td>
It doesn't look like anything else has changed, other than the font style being changed to a <font> tag, the background color being moved from a style to a tag attribute, and the width style being completely removed. This happens on the entire page.
If I put a breakpoint on the html variable and view it, the html has already been changed by the time DownloadString returns.
Anyone know why this is happening?
Thanks in advance.
edit:
this link: WebClient.DownloadString() Not Producing Exact HTML
is not quite the same thing, as I am not using Ajax or JavaScript on the external page.
edit:
here are the request headers from Fiddler for the site that calls the other site (I used Chrome):
GET / HTTP/1.1
Connection: keep-alive
Accept: */*
User-Agent: Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: .ASPXBrowserOverride=Mozilla%2f4.0+(compatible%3b+MSIE+6.0%3b+Windows+CE%3b+IEMobile+8.12%3b+MSIEMobile+6.0);
Going to the site directly, I get these request headers:
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: .ASPXBrowserOverride=Mozilla%2f4.0+(compatible%3b+MSIE+6.0%3b+Windows+CE%3b+IEMobile+8.12%3b+MSIEMobile+6.0);
edit:
If I look at the client object in debug mode, client.Headers is empty before and after DownloadString is called.
Also, after DownloadString is called here are the client.ResponseHeaders:
{Content-Length: 267123
Cache-Control: private
Content-Type: text/html; charset=utf-8
Date: Tue, 27 Nov 2012 18:37:27 GMT
Set-Cookie: ASP.NET_SessionId=******; path=/; HttpOnly
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
}
Solution:
Unfortunately I cannot accept two answers. Both Icarus's and James Lawruk's answers helped me solve the problem. I am picking an answer based on what most recently led me to the final solution. So thanks to you both!
So here is the solution in a nutshell:
Use fiddler to view the request headers and find the user-agent.
Modify the code as follows:
using (WebClient client = new WebClient())
{
    client.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11");
    html = client.DownloadString(strUrl);
}
Try setting the user-agent value and experiment with different browsers. This may prove the Web site is switching the HTML response based on the user-agent header.
webClient.Headers.Add("user-agent", "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_2 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5");
var iphoneHtml = webClient.DownloadString("http://www.yoursite.com");
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11");
var safariHtml = webClient.DownloadString("http://www.yoursite.com");
This is most likely browser sniffing, as Dour pointed out in his comment, because WebClient does not change the resulting HTML at all.
You can probably verify this if you use Fiddler and set up the request headers in exactly the same way WebClient does. I bet you get the same HTML output.

Why would a browser make two separate requests for the same file?

I'm debugging a program I wrote and noticed something strange. I set up an HTTP server on port 12345 that serves a simple OGG video file, and attempted to access it from Firefox.
Upon sniffing the network requests, I found these two requests were made:
GET /video.ogv HTTP/1.1
Host: 127.0.0.1:12345
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
GET /video.ogv HTTP/1.1
Host: 127.0.0.1:12345
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Range: bytes=8122368-
The video is almost 8 MB in size, so the fact that the second request specifies 8122368 bytes, which is 7932 KB, suggests it is requesting the very end of the file for some reason. Anyone have ideas?
In order to support seeking and playing back regions of the media that aren't yet downloaded, Gecko uses HTTP 1.1 byte-range requests to retrieve the media from the seek target position. Because Ogg files don't contain their duration, the initial download connection is terminated; Gecko then seeks to the end of the Ogg file and reads a bit of data to extract the time duration of the media. Info from here and here.
Some media formats have metadata at the end of the file, and this data is usually required to allow proper seeking of the video.
It's actually requesting everything from byte offset 8122368 onward, i.e. from about 7.74 MB (if I did my calcs correctly) to the end of the file.
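For illustration, a minimal sketch of issuing the same kind of byte-range request by hand, against the host and port from the question; a server that honors ranges answers 206 Partial Content:
import java.net.HttpURLConnection;
import java.net.URL;

public class RangeProbe {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://127.0.0.1:12345/video.ogv");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Everything from byte offset 8122368 to the end, as in the captured request.
        // "bytes=-8122368" would instead mean the LAST 8122368 bytes of the file.
        conn.setRequestProperty("Range", "bytes=8122368-");
        System.out.println(conn.getResponseCode());               // expect 206 if ranges are supported
        System.out.println(conn.getHeaderField("Content-Range")); // the satisfied range and total size
        conn.disconnect();
    }
}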
It might be something in how the buffering for that file type is done.

Http get request packet size in bytes

How many bytes of data does a typical HTTP GET request consume?
For instance, if I request a page from the server through a browser, how many bytes of data would be sent?
Pretty typical request, 430 bytes:
GET /ga.js HTTP/1.1\r\n
Host: www.google-analytics.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)\r\n
Accept: */*\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
Referer: http://stackoverflow.com/\r\n
If-Modified-Since: Mon, 31 Aug 2009 17:13:58 GMT\r\n
\r\n
\r\n
Request with a long query string and a small cookie (657 bytes):
GET /pixel;r=978178957;fpan=0;fpa=1241112640-44259546-69321280;ns=0;url=http%3A%2F%2Fstackoverflow.com%2F;ref=;ce=1;je=1;sr=1920x1200x32;dg=E5912-W-MO-5;dst=1;et=1252061014745;tzo=-120;a=p-c1rF4kxgLUzNc HTTP/1.1\r\n
Host: pixel.quantserve.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)\r\n
Accept: image/png,image/*;q=0.8,*/*;q=0.5\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
Referer: http://stackoverflow.com/\r\n
Cookie: uid=1274108650-45267447-66848880; mc=1137458542-57565784-88898864\r\n
\r\n
\r\n
Use Fiddler to intercept the request and see for yourself.
It varies, especially when it comes to GET queries or POST requests, but I'd estimate it at about 0.5 to 1 KB.
Requesting a page from the browser, though, may also result in requests for pictures, stylesheets and other referenced content.
Edit: originally I put in the estimation for request+reply.
I would suggest you use a full packet sniffer like Wireshark. You would love it :)
Get it here:
http://www.wireshark.org/
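If you just want to see the exact bytes without installing a sniffer, a minimal sketch of a throwaway listener (the port is arbitrary) that prints one request and its size:
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class RequestSizer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(12345);
             Socket client = server.accept()) {
            InputStream in = client.getInputStream();
            byte[] buf = new byte[8192];
            int n = in.read(buf); // a small GET request usually arrives in a single read
            System.out.println(new String(buf, 0, n, "ISO-8859-1"));
            System.out.println("Request size: " + n + " bytes");
        }
    }
}
Point a browser at http://localhost:12345/ and the full request, byte count included, is printed to the console.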
