ParseError on Node.js http request

Hello StackOverflow community!
I started learning Node.js recently and decided to implement a reverse HTTP proxy as an exercise. There were a couple of rough spots that I managed to get through on my own, but now I'm a bit stuck and need your help. I got redirects and relative URLs working, and it was while implementing relative URL support that I ran into the problem described below.
You can find my code at http://pastebin.com/vZfEfk8r. It's not very big, but it still wouldn't fit nicely on this page.
Now to the problems (there are two of them). I'm using http.request to forward the client's request to the target server, then waiting for the response and sending it back to the client. This works for some requests but not for others. That's the first problem: on the site I'm using to test the proxy ( http://ixbt.com, a cool Russian tech site) I can always get the main page /index.html, but when the browser starts fetching the other files referenced from that page (css, images, etc.), most of those requests end with a ParseError ({"bytesParsed":0}).
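For context, the forwarding part boils down to something like the sketch below (heavily simplified, with the target host hard-coded; see the pastebin link above for the real code):
var http = require('http');

http.createServer(function (clientReq, clientRes) {
    var options = {
        host: 'www.ixbt.com',
        port: 80,
        path: clientReq.url,          // path as received from the browser
        method: clientReq.method
    };

    var proxyReq = http.request(options, function (targetRes) {
        // relay status, headers and body back to the browser
        clientRes.writeHead(targetRes.statusCode, targetRes.headers);
        targetRes.pipe(clientRes);
    });

    proxyReq.on('error', function (err) {
        // this is where the ParseError ({"bytesParsed":0}) shows up
        clientRes.writeHead(502);
        clientRes.end('proxy error: ' + err.message);
    });

    clientReq.pipe(proxyReq);
}).listen(8080);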
While debugging (with Wireshark) I noticed that at least some of the failing requests (if not all of them) produce this error when the following HTTP exchange between the proxy and the target server takes place:
Request:
GET articles/pics2/201206/coolermaster-computex2012_70x70.jpg HTTP/1.1
Host: www.ixbt.com
Connection: keep-alive
Response:
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
It looks like the server sends no status line and no headers at all. So the question is: can this be the reason for the failure (the ParseError)?
My other concern is that when I fetch the same file as a standalone request, there's no problem at all. Just look:
Request:
GET /articles/pics2/201206/coolermaster-computex2012_70x70.jpg HTTP/1.1
Host: www.ixbt.com
Connection: keep-alive
Response:
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 25 Jun 2012 17:09:51 GMT
Content-Type: image/jpeg
Content-Length: 3046
Last-Modified: Fri, 22 Jun 2012 00:06:27 GMT
Connection: keep-alive
Expires: Wed, 25 Jul 2012 17:09:51 GMT
Cache-Control: max-age=2592000
Accept-Ranges: bytes
... and here goes the body ...
So at the end of the day there may be some mistake in how I make the proxy requests. Maybe it's because I actually make a lot of them when the main page is loaded? It references many images, etc.
I hope I was clear enough, but please ask for details if I missed something. The full source code is available (again, at http://pastebin.com/vZfEfk8r), so if somebody wanted to try it out, that would be great. :)
Much thanks in advance!
P.S. As I said, I'm just learning, so if you see any bad practices in my code (even unrelated to the question), it would be nice to know about them.
UPDATE: As was mentioned in a comment, I wasn't proxying the original request's headers, which in theory could cause problems with the subsequent requests. I changed that, but unfortunately the behavior remains the same. Here's an example of a new request and response:
Request
GET css/main_fixed.css HTTP/1.1
Host: www.ixbt.com
connection: keep-alive
cache-control: no-cache
pragma: no-cache
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.56 Safari/536.5
accept: text/css,*/*;q=0.1
accept-encoding: gzip,deflate,sdch
accept-language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4
accept-charset: windows-1251,utf-8;q=0.7,*;q=0.3
referer: http://www.ixbt.com/
Response
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
I had to craft the 'referer' header by hand, since the browser sends it with the reverse proxy's URL. Still, the behavior is the same, as you can see. Any other ideas?
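For reference, the header forwarding now looks roughly like this (a sketch, where clientReq is the incoming browser request; host and referer are overridden by hand as described above):
var headers = {};
Object.keys(clientReq.headers).forEach(function (name) {
    headers[name] = clientReq.headers[name];
});
headers['host'] = 'www.ixbt.com';             // target host instead of the proxy's
headers['referer'] = 'http://www.ixbt.com/';  // hand-crafted, see note above

var options = {
    host: 'www.ixbt.com',
    port: 80,
    path: clientReq.url,
    method: clientReq.method,
    headers: headers
};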

Thanks to the valuable comments, I was able to find the answer to this problem. It had nothing to do with Node or the target web servers; it was just a coding error.
The answer is that the path component of the URL was wrong for relative URLs. It's already visible in the logs in the question body. I'll repeat them here to reiterate:
Wrong request:
GET articles/pics2/201206/coolermaster-computex2012_70x70.jpg HTTP/1.1
Right request:
GET /articles/pics2/201206/coolermaster-computex2012_70x70.jpg HTTP/1.1
See the difference? The leading slash. It turns out I was dropping it on requests for relative URLs, due to my own awkward handling of the client's URL. With a quick-and-dirty fix it's working now, well enough until I do proper URL handling.
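The quick-and-dirty fix is essentially just this (a sketch; normalizePath is a helper name I made up here, and the proper fix would be to resolve relative URLs against the referring page's URL):
function normalizePath(path) {
    // make sure the path sent to the target server always starts with '/'
    return path.charAt(0) === '/' ? path : '/' + path;
}

// normalizePath('articles/pics2/201206/coolermaster-computex2012_70x70.jpg')
//   -> '/articles/pics2/201206/coolermaster-computex2012_70x70.jpg'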
Much thanks for the comments, they were insightful!

If the solutions above do not work, try removing the Content-Length header. A Content-Length mismatch causes body parsers to throw this error.
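For example, if you are copying the incoming headers before forwarding the request, something like this sketch drops the header (Node lower-cases incoming header names):
var headers = {};
Object.keys(clientReq.headers).forEach(function (name) {
    headers[name] = clientReq.headers[name];
});
delete headers['content-length'];   // avoid a mismatch between declared and actual body length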

Related

SignalR longPolling with GET method and application/json Content-Type causes security warnings

Our third-party security software is being triggered by an apparent mismatch between the GET method and a Content-Type of application/json.
Payload not allowed (Content-Type header not allowed for this method)
/signalr/poll
transport=longPolling&messageId=...&clientProtocol=1.4&etc
application/json; charset=UTF-8
Mozilla/5.0 (Windows NT 6.1;Trident/7.0; rv:11.0) like Gecko
Is this a known issue or have I done something silly?
Thanks,
James
Although largely superfluous, it is the default behaviour of SignalR to send a Content-Type header with its GET HTTP requests.
Content-Type: application/json; charset=UTF-8
I have confirmed this with a small SignalR test program and Fiddler.
As far as I can tell, our Third Party Security software is just being a little overeager.

Windows 8 built-in WebDAV client ignores 401 Unauthorized

I create a webdav connection with the Windows 8 built-in WebDAV client (Microsoft-WebDAV-MiniRedir).
I have only read permission for the files, and I try to delete one.
I can open the context menu with a right-click and delete the file, even though my WebDAV server returns 401 Unauthorized. The file disappears in Explorer as if it had been deleted.
If I close the Explorer window and open it again, the file is back, which is fine.
Why isn't the deletion refused, and why don't I get an error message like "401 unauthorized access" from the WebDAV client?
Here are the request and response.
Request:
DELETE https://xxx.yyy.zz/webdav/mysharedfolder/file1.txt HTTP/1.1
Connection: Keep-Alive
User-Agent: Microsoft-WebDAV-MiniRedir/6.3.9600
translate: f
Host: xxx.yyy.zz
Authorization: Basic dlk7uXNvcmt1QHdlYi5kZTpRd2VyMTIzNA==
Cookie: JSESSIONID=A7497F42472ECC676E44A90E3C5D5E7
Response:
HTTP/1.1 401 Unauthorized
Date: Thu, 13 Nov 2014 23:21:43 GMT
Server: Apache-Coyote/1.1
WWW-Authenticate: Basic realm="https://xxx.yyy.zz/webdav/mysharedfolder/file1.txt"
Content-Length: 0
Connection: close
Content-Type: text/plain; charset=UTF-8
A redirect on an OPTIONS request (or on any WebDAV request, actually) is suspicious, and I wouldn't assume Windows will handle it correctly, so that might be something to look at. But I also vaguely remember encountering something similar with Win7 years ago. A workaround might be to return a different 4xx error code for the mini-redirector agent, as sketched below.
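A rough illustration of that workaround (just a sketch of the idea in JavaScript; the server in the question is Apache-Coyote/Java, and statusForAuthFailure is a hypothetical helper):
function statusForAuthFailure(req) {
    // the Windows built-in client identifies itself as Microsoft-WebDAV-MiniRedir
    var ua = req.headers['user-agent'] || '';
    return ua.indexOf('Microsoft-WebDAV-MiniRedir') !== -1 ? 403 : 401;
}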

Amazon CloudFront not consistently returning 304 (Not Modified) for unchanged static content?

A grid of EC2 web servers is running behind an ELB load balancer. The ELB is behind Amazon's CloudFront content delivery network. Content Delivery Networks are very new to me. My understanding is that CloudFront is supposed to speed up performance by caching static content at its "edges". But this isn't what's happening.
Consider my EC2 instances whose content should always have a lifetime of five minutes. For static content this usually means declaring the following in my web.config file:
<staticContent>
<clientCache cacheControlCustom="public" cacheControlMode="UseMaxAge" cacheControlMaxAge="00.00:05:00"/>
</staticContent>
...and for the dynamic stuff, it usually means executing the following commands against an HttpResponse object:
resp.Cache.SetCacheability(HttpCacheability.Public);
resp.Cache.SetMaxAge(TimeSpan.FromMinutes(5));
With that as background...
When my browser hits the ELB directly, everything works as expected. Firebug consistently shows that 304 (Not Modified) is returned for content that exists in the browser's cache, has passed its five minute expiration, but has not been changed on the server. Here are the response headers for a download of defs.js, for example:
HTTP/1.1 304 Not Modified
Accept-Ranges: bytes
Cache-Control: public,max-age=300
Date: Tue, 22 Apr 2014 13:54:16 GMT
Etag: "0152435d158cf1:0"
Last-Modified: Tue, 15 Apr 2014 17:36:18 GMT
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Connection: keep-alive
IIS correctly sees that the file hasn't been changed since April 15th and returns 304.
But look what happens when the file is fetched through CloudFront:
HTTP/1.1 200 OK
Content-Type: application/x-javascript
Content-Length: 205
Connection: keep-alive
Accept-Ranges: bytes
Cache-Control: public,max-age=300
Date: Tue, 22 Apr 2014 14:07:33 GMT
Etag: "0152435d158cf1:0"
Last-Modified: Tue, 15 Apr 2014 17:36:18 GMT
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Age: 16
X-Cache: Hit from cloudfront
Via: 1.1 0f140ef1be762325ad24a7167aa57e65.cloudfront.net (CloudFront)
X-Amz-Cf-Id: Evfdhs-pxFojnzkQWuG-Ubp6B2TC5xbunhavG8ivXURdp2fw_noXjw==
In this case CloudFront forces the browser to download the entire file again even though, as you can see:
(a) it knows the file hasn't been modified since April 15th (see Last-Modified header), and
(b) CloudFront does have a cached copy of the file on hand (see X-Cache header)
Perhaps you're wondering if my browser is sending a valid If-Modified-Since header. Indeed it is. Here are the request headers:
GET /code/shared/defs.js HTTP/1.1
Host: d2fn6fv5a0cu3b.cloudfront.net
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://d2fn6fv5a0cu3b.cloudfront.net/
Connection: keep-alive
If-Modified-Since: Tue, 15 Apr 2014 17:36:18 GMT
If-None-Match: "0152435d158cf1:0"
Cache-Control: max-age=0
It's an odd situation. If I just sit in front of my browser and keep doing page Reloads (Cmd-R), maybe about half the time CloudFront will correctly return a 304 and the other half of the time it'll incorrectly return 200 along with all of the content. Waiting for the five minute expiration before interacting with the page yields primarily 200's and only a few 304's. This odd behavior applies to all of the files (.css, .js, .png, etc.) referenced on the HTML page as well as for the containing HTML page itself. I know my app is coded properly because as mentioned above, hitting the ELB directly without going through CloudFront results in the expected 304 result. Any ideas?
The answer was found in an obscure sentence written in a seemingly unrelated piece of Amazon documentation:
When you configure CloudFront to forward cookies to your origin [...] If-Modified-Since and If-None-Match conditional requests are not supported.
Strange, but the reality of the situation is actually far worse: it's not that forwarding cookies to your origin servers disables conditional requests, it's that it disables them sometimes, to the point where the HTTP result code (304 vs 200) is virtually random.
It's important to note that you'll be bitten by this bizarre behavior even if you're not using cookies at all. It's still absolutely essential that the Forward Cookies drop-down in the distribution's cache behavior settings be set to "None".
Switching the setting to "None" fixes the errant behavior described in my original post.
This solution presents you with another problem though. You're telling CloudFront to totally strip out all cookies prior to forwarding the request to your origin. But your origin server might need those cookies. Further, if you're using the ELB (load balancer) as your origin, a critical cookie that the ELB depends upon to maintain sticky sessions will be totally dropped. Not good.
The solution to the cookie-stripping problem will depend on how your site is organized. In my case, transmission of cookies (session-related or otherwise) is only necessary when posting AJAX data to myDomain.com/ajax/. Because all cookie-dependent URLs fall under ajax/*, I created a new behavior rule for that path, and in that rule, and that rule only, the Forward Cookies drop-down is set to "All" instead of "None."
So there it is. Hope this helps someone.

ASP.NET postback has empty request body, but only sometimes

I'm working on a high-traffic ASP.NET website, and around once every 30 minutes we get a POST request that triggers an HttpException: Request timed out. In our debugging we found that ASP.NET is receiving the request, but not the request body. Here are the headers:
Connection: Keep-Alive
Content-Length: 49476
Content-Type: application/x-www-form-urlencoded
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en
Host: abc.123.com
Referer: https://abc.123.com/page.aspx
And the request body is empty. We are behind a load balancer, if that helps with anything.
My question is, how would I debug and fix an issue like this? According to the accepted answer in this post:
Diagnosing "Request timed out" HttpExceptions
It looks like what may be happening is that the request is being split into two TCP segments, one for the header and one for the body. Since we're behind a load balancer with a common virtual IP, it may be entirely possible that one of the TCP segments is being sent to one server and the other is being sent to another. Would this be a plausible case? Or can something else be causing it?

is an HTTP/1.1 request implicitly keep-alive by default?

Solved: pasting the bytes here made me realise that I was missing empty lines between chunks...
Does an HTTP/1.1 request need to specify a Connection: keep-alive header, or is it always keep-alive by default?
This guide made me think it is: when my HTTP server gets a 1.1 request, the connection is keep-alive unless a Connection: close header is explicitly received.
I ask because the different client behaviour of ab and httperf is driving me mad enough to question my sanity on this one...
Here's what httperf --hog --port 42042 --print-reply body sends:
GET / HTTP/1.1
User-Agent: httperf/0.9.0
Host: localhost
And here's my server's response:
HTTP/1.1 200 OK
Connection: keep-alive
Transfer-Encoding: chunked
Content-Length: 18
12
Hello World 1
0
httperf promptly prints out the response, but then just sits there, with neither side closing the connection and httperf never exiting.
Where's my bug?
From RFC 2616, section 8.1.2:
A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server.
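In other words, the connection staying open is expected; the real problem was the response body framing. The hand-rolled response above both keeps a Content-Length alongside Transfer-Encoding: chunked and is missing the blank line that terminates the chunk stream, so httperf presumably has no way to tell when the response is complete. A sketch of what the same response looks like when Node's http module does the chunked framing for you (assuming the server were rewritten on top of it):
var http = require('http');

http.createServer(function (req, res) {
    // no Content-Length together with chunked encoding
    res.writeHead(200, { 'Connection': 'keep-alive' });
    res.write('Hello World 1');   // emitted as a properly framed chunk
    res.end();                    // emits the terminating zero-length chunk
}).listen(42042);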
