Chunked Transfer Encoding problem with Apache Abdera - http

I'm using Apache Abdera to POST atom multipart data to my server, and am having some odd problems that I can't pin down.
It looks like an issue with chunked transfer encoding, but I'm insufficiently experienced to be certain. The problem manifests as the server throwing an error indicating that the request I sent it contains only one mime part, not two as required. I attached Wireshark to the interface and captured the conversation, and it went like this:
POST /sss/col-uri/2ee98ea1-f9ad-4f01-9b1c-cfa3c4a6dc3c HTTP/1.1
Host: localhost
Expect: 100-continue
Transfer-Encoding: chunked
Content-Type: multipart/related; boundary="1306399868259";type="application/atom+xml;type=entry"
The server's response:
HTTP/1.1 100 Continue
My client continues:
198
--1306399868259
Content-Type: application/atom+xml;type=entry
Content-Disposition: attachment; name="atom"
<entry xmlns="http://www.w3.org/2005/Atom"><title xmlns="http://purl.org/dc/terms/">Richard Woz Ere</title><bibliographicCitation xmlns="http://purl.org/dc/terms/">this is my citation</bibliographicCitation><content type="application/zip" src="cid:48bd9436-e8b6-4f68-aa83-5c88eda52fd4" /></entry>
0
b0e9
--1306399868259
Content-Type: application/zip
Content-Disposition: attachment; name="payload"; filename="example.zip"
Content-ID: <48bd9436-e8b6-4f68-aa83-5c88eda52fd4>
Packaging: http://purl.org/net/sword/package/SimpleZip
And at this point the server responds with:
HTTP/1.1 400 Bad Request
Date: Thu, 26 May 2011 08:51:08 GMT
Server: Apache/2.2.17 (Unix) mod_ssl/2.2.17 OpenSSL/0.9.8l DAV/2 mod_wsgi/3.3 Python/2.6.1
Connection: close
Transfer-Encoding: chunked
Content-Type: text/xml
This indicates the error (which is well understood). My client goes on to stream a pile of base64-encoded bits onto the output stream, but in the meantime the server is no longer listening; it has already decided that the request was erroneous.
Unfortunately, I'm not in charge of the HTTP layer - this is all handled by Abdera using Apache httpclient. My code that does this looks like this:
client.execute("POST", url.toString(), new SWORDMultipartRequestEntity(deposit), options);
Here, the SWORDMultipartRequestEntity is a copy of the standard Abdera MultipartRequestEntity class, with a few extra headers thrown in (see, for example, Packaging in the above snippet); the "deposit" argument is just an object holding the atom part and the inputstream.
When attaching a debugger I get to this line of code fine, and then it disappears into a rat hole and then I get this error back.
Any hints or tips? I've pretty much exhausted my angles of attack!
The only thing that stands out for me is that immediately after the atom:entry document, there is a newline with "0" on it alone, which appears to be chunked transfer encoding speak for "I'm finished". Not sure how it got there, or whether it really has any effect. Help much appreciated.
Cheers,
Richard

The lonely 0 may indeed be the problem: in chunked transfer encoding a zero-length chunk marks the end of the body, so the server stops reading after the first MIME part. My uninformed guess is that it results from some call to flush(), which writes the whole buffer as another HTTP chunk. Unfortunately, at the point where flush is called, the buffer had already been flushed and its size is therefore zero. So the HttpChunkedOutputFilter (or whatever it is called) should be taught that an empty buffer does not need to be flushed.
[update:] You should set a breakpoint in the ChunkedOutputStream class, especially the flush method. I just looked at its code and it seems to be ok, but maybe I missed something.
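To make the effect of an empty flush concrete, here is a minimal Python sketch of a chunked writer. It is not Abdera's or HttpClient's actual code; the class name `ChunkedWriter` and its methods are hypothetical. Each chunk is written as a hex size, CRLF, the data, CRLF, and only `finish()` is allowed to emit the zero-length chunk that ends the body.

```python
import io


class ChunkedWriter:
    """Hypothetical chunked-encoding writer, for illustration only."""

    def __init__(self, sink):
        self.sink = sink          # any object with a write(bytes) method
        self.buffer = bytearray()

    def write(self, data: bytes) -> None:
        self.buffer += data

    def flush(self) -> None:
        # An empty buffer must not be emitted: a "0" chunk means "end of body".
        if not self.buffer:
            return
        self.sink.write(b"%x\r\n" % len(self.buffer))
        self.sink.write(bytes(self.buffer))
        self.sink.write(b"\r\n")
        self.buffer.clear()

    def finish(self) -> None:
        self.flush()
        # The terminating zero-length chunk, followed by an empty trailer section.
        self.sink.write(b"0\r\n\r\n")


out = io.BytesIO()
w = ChunkedWriter(out)
w.write(b"first part")
w.flush()
w.flush()                 # second flush on an empty buffer writes nothing
w.write(b"second part")
w.finish()
wire = out.getvalue()
```

If `flush()` lacked the empty-buffer check, the second call would write `0\r\n` between the two parts, which matches the capture in the question.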

Related

How to use TCP send out a HTTP response?

I am trying to develop an HTTP server in C++ on Windows. I respond to a request by using WSASend to send out:
char response[] =
"HTTP/1.1 200 OK\r\n\
Date: Mon, 27 Jul 2009 12:28:53 GMT\n\r\
Server: Apache/2.2.14 (Win32)\n\r\
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT\n\r\
Content-Length: 88\n\r\
Content-Type: text/html\n\r\
Connection: Closed\n\r\n\r\
<html><body><h1>Hello, World!</h1></body></html>"
Although the browser does show Hello, World! when I type in 127.0.0.1, it keeps showing the loading sign as if the page has not finished loading, and the browser's console never shows the response message. Why?
Is there some format issue with my response message?
Content-Length: 88\n\r\
....
Connection: Closed\n\r\n\r\
There are several problems with your code. Throughout it you use \n\r instead of \r\n, so the response is not valid HTTP. The Content-Length header must reflect the actual length of the body: <html><body><h1>Hello, World!</h1></body></html> is 48 bytes, not 88 as your code claims, which is likely why the browser keeps waiting for more data. Apart from that, it must be Connection: close instead of Connection: Closed.
Note that HTTP is way more complex than you think. If you really need to implement it yourself instead of using established libraries please study the actual standard (that's what standards are for!) instead of fiddling around until it seems to work. Otherwise it might work only within your specific environment and with a specific browser and you'll get strange problems later.
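As a small illustration of those fixes, here is a Python sketch (the helper name `build_response` is made up for this example) that joins the header lines with CRLF and derives Content-Length from the body instead of hard-coding it:

```python
body = b"<html><body><h1>Hello, World!</h1></body></html>"


def build_response(body: bytes) -> bytes:
    """Build a minimal HTTP/1.1 response around the given body."""
    headers = [
        b"HTTP/1.1 200 OK",
        b"Content-Type: text/html",
        # Computed from the body, so it cannot drift out of sync.
        b"Content-Length: " + str(len(body)).encode("ascii"),
        b"Connection: close",
    ]
    # Every header line ends with CRLF; one extra CRLF (an empty line)
    # separates the headers from the body.
    return b"\r\n".join(headers) + b"\r\n\r\n" + body


response = build_response(body)
```

The same idea carries over to the C++ code: compute the length from the body buffer and write \r\n, in that order, after every header line.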

HTTPListener response - how to define packet boundaries

I am using the HTTPListener class to implement a basic web server. My response uses a Content-Type of "multipart/x-mixed-replace" to return a stream of JPEG images. I am able to construct my response correctly, however my web client does not properly interpret the response due to the way in which the response is broken across IP packet boundaries.
Using a separate server implemented in python, I am able to generate a good, working case. The response to the client's HTTP GET request looks like this:
packet 1:
HTTP/1.1 200 OK
packet 2:
Server: (myServer)
Date: (the date)
Connection: close
(other header info)
Content-Type: multipart/x-mixed-replace; boundary=myboundary
packet 3:
--myboundary
Content-Length: 1042
Content-Type: image/jpeg
(jpeg data)
In the failed case, using the HTTPListener, everything gets sent in a single packet
packet 1:
HTTP/1.1 200 OK
Server: (myServer)
Date: (the date)
Connection: close
(other header info)
Content-Type: multipart/x-mixed-replace; boundary=myboundary
--myboundary
Content-Length: 1042
Content-Type: image/jpeg
(jpeg data)
So my question is, how do I manipulate the HTTPListenerResponse's OutputStream to force a packet boundary? I want to be able to specify some data, manually tell the OutputStream to push out a packet, then specify some more data and push out another packet. Does the HTTPListener offer this level of control, or do I need to instead use a TCPListener? I've not been able to find a solution; please help!
If your client doesn't work because of IP packet boundaries it is severely broken. Fix the client, don't add crutches for it in a place where there isn't a problem. HTTP is defined over TCP, and TCP is a byte-stream protocol. Period. Any correctly written TCP program, let alone an HTTP client, doesn't care where the packet boundaries are. If your client misbehaves in this way it will misbehave in other ways as well. Fix it.
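To show what "doesn't care where the packet boundaries are" means in practice, here is a minimal Python sketch of a reader that buffers incoming bytes until the blank line that ends the header block. The names `HeaderReader` and `parse` are hypothetical, and the sample response is a shortened version of the one in the question.

```python
class HeaderReader:
    """Collects bytes until the blank line that ends the HTTP header block."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, data: bytes):
        """Add received bytes; return (headers, leftover) once complete, else None."""
        self.buffer += data
        end = self.buffer.find(b"\r\n\r\n")
        if end == -1:
            return None   # need more bytes; where the pieces were cut is irrelevant
        return bytes(self.buffer[:end]), bytes(self.buffer[end + 4:])


def parse(pieces):
    """Feed every piece in order and return the final result."""
    reader = HeaderReader()
    result = None
    for piece in pieces:
        result = reader.feed(piece)
    return result


raw = (b"HTTP/1.1 200 OK\r\nConnection: close\r\n"
       b"Content-Type: multipart/x-mixed-replace; boundary=myboundary\r\n\r\n"
       b"--myboundary\r\n")

one_packet = parse([raw])
three_packets = parse([raw[:17], raw[17:60], raw[60:]])
byte_by_byte = parse([raw[i:i + 1] for i in range(len(raw))])
```

All three calls produce the same headers and the same leftover body bytes, however the stream was sliced.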

How does HTTP download work?

Let's say I want to download a file called example.pdf from http://www.xxx.yyy/example.pdf
Presumably I send a GET request like this:
GET /example.pdf HTTP/1.1␍␊
Host: www.xxx.yyy␍␊
␍␊
But what's next?
What does the exchange of HTTP headers look like?
I'm assuming you've read the Wikipedia article on the HTTP protocol. If you just need more examples I'd highly recommend you download Wireshark. Wireshark is an extremely powerful packet sniffer which will allow you to watch packet communications between you and any website. In addition it will actually break down the packets and tell you a little bit about their meanings in more "human terms". It has a bit of a learning curve but it can teach you a lot about a number of different protocols including HTTP.
http://www.wireshark.org/
I'm not sure what your ultimate goal is, but you can view real-time http header interaction with the Live HTTP Headers Firefox add-on. It's also possible in Chrome, but it's a little more work.
Check the HTTP 1.1 RFC.
You might want to look at http://www.w3.org/Protocols/rfc2616/rfc2616.html . But also, there is rarely a need to recreate the protocol.
To answer such a GET request, the server should send a response with headers like the following:
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 6475593
Content-Type: application/x-msdownload
Etag: "qwfw473usll"
Last-Modified: Sun, 18 Jul 2021 12:02:31 GMT
Server: Caddy
Date: Sun, 18 Jul 2021 12:03:47 GMT
After the last header line, send two CRLFs (one ending that line, one forming the empty line) followed by the raw bytes of the file being transmitted.
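Seen from the client side, the layout can be taken apart as in the following Python sketch. The function name `parse_response` and the tiny sample response are made up for illustration, and the sketch assumes the whole response is already in memory and is not chunked.

```python
def parse_response(raw: bytes):
    """Split a complete HTTP response into status line, headers, and body."""
    head, _, rest = raw.partition(b"\r\n\r\n")   # the blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many body bytes belong to this response.
    length = int(headers.get("content-length", len(rest)))
    return status_line, headers, rest[:length]


sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Length: 5\r\n"
          b"Content-Type: application/pdf\r\n"
          b"\r\n"
          b"%PDF-")
status, headers, body = parse_response(sample)
```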

Why does my HTTP/1.0 GET request return OK but no body content?

I'm making a simple http page-requester in C. It uses sockets to send HTTP/1.0 GET requests to hosts, and parses the answer to effectively download the html file.
However, when I send a request like this:
GET http://stackoverflow.com/questions HTTP/1.0
User-Agent: myRequester/1.0
It returns this
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Date: Mon, 19 Dec 2011 15:28:08 GMT
Content-Length: 54362
Connection: close
But no body content.
Yes, I've put CRLF on the end of every line and a blank line at the end.
I use only one socket over a single connection, and I have to stick to HTTP/1.0.
The most likely explanation is that the server is actually sending a body but you are not reading all of it. Most networking APIs do not necessarily return the whole response in one call: whatever data is available is returned immediately, even if more is expected.
The Unix system call recv returns zero when the connection has ended. You should keep calling it until you get zero or an error.
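The read loop looks roughly like this. The sketch is in Python rather than C for brevity; Python's `socket.recv` wraps the same system call and returns an empty bytes object when the peer has closed the connection. The helper name `recv_all` is made up, and a local socket pair stands in for the real server.

```python
import socket


def recv_all(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)   # may return fewer bytes than requested
        if not data:                # empty result: the peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


# Demonstration with a local socket pair instead of a real server.
a, b = socket.socketpair()
a.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 4\r\n\r\n")
a.sendall(b"body")
a.close()
received = recv_all(b, bufsize=8)
b.close()
```

With the small buffer size, the headers and body arrive over several `recv` calls, yet the loop still collects the full response.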

Web service response encoding issue

I am developing a web service based on ASP.NET asmx. The server returns a byte[] encoded in UTF-8 to the client, and the client converts the byte[] to a string.
My confusion is that the pound sterling character is correct on the server side (I dump it just before the HTTP response is written), but the client receives it as ??.
Any ideas what is wrong? I suspect an encoding issue, but I have no idea how to debug further, or which settings (perhaps in the client web service proxy?) could have an effect.
Here is the header part which I got from Fiddler.
HTTP/1.1 200 OK
Date: Fri, 20 Feb 2009 16:51:30 GMT
Server: Microsoft-IIS/6.0
cache-control: no-cache
pragma: no-cache
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Cache-Control: private
Content-Type: text/xml
Content-Length: 22752
<?xml version="1.0" encoding="utf-8"?>
The first thing to do is to sniff what's actually being sent, in terms of the headers, the XML declaration and the bytes forming the text itself.
Fiddler is good as an HTTP proxy, or you could use WireShark to sniff at the network level.
Once you've got those three bits of information (the Content-Type header, the XML declaration and the bytes making up the pound sign), update your question and we'll see what we can do. It does sound odd, as ASP.NET usually just gets all of this right.
What does your client side code look like? Is that just the normal .NET web service client code as well?
EDIT: Try to find a binary (hex dump) display in Fiddler so you can find the bytes.
However, I strongly suspect that the problem is merely with dumping the result to the console. Here's a bit of code to use to dump the unicode code points:
static void DumpString (string value)
{
    foreach (char c in value)
    {
        Console.Write ("{0:x4} ", (int)c);
    }
    Console.WriteLine();
}
I suspect you'll see an 00A3 in the output, which is the Unicode for the pound sign. That means the string has actually reached your client fine - but writing it out to the console is failing.
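For comparison, here is a Python analogue of the dump above (the function name `dump_code_points` is made up). It also sketches one possible way, which is an assumption and not confirmed by the question, for a single pound sign to become two placeholder characters: its UTF-8 encoding is two bytes, so decoding those bytes with the wrong charset yields two characters.

```python
pound = "\u00a3"
utf8_bytes = pound.encode("utf-8")          # two bytes: C2 A3


def dump_code_points(value: str) -> str:
    """Return the code points of value as space-separated 4-digit hex."""
    return " ".join(f"{ord(c):04x}" for c in value)


# Decoding with the right charset restores the single character.
correct = utf8_bytes.decode("utf-8")

# Decoding with a charset that cannot represent the bytes replaces each one.
mangled = utf8_bytes.decode("ascii", errors="replace")
```

If the dump on the client shows 00a3, the string arrived intact and only the console output is at fault; two replacement code points would instead point at a decoding step.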
