What to do with errors when streaming the body of an Http request - http

How do I handle a server error in the middle of an HTTP message?
Assuming I have already sent the header of the message and I am streaming the
body of the message, what do I do when I encounter an unexpected error?
I am also assuming this error occurred while generating the content and is not a connection error.
(Greatly) Simplified Code:
// I can define any transfer encoding or header fields I need to.
send(header); // Sends the header to the HTTP client.
// Using an iterable instead of a stream for code simplicity's sake.
Iterable<String> stream = getBodyStream();
Iterator<String> iterator = stream.iterator();
while (iterator.hasNext()) {
    String string;
    try {
        string = iterator.next();
    } catch (Throwable error) { // Oops! An error generating the content.
        // What do I do here? (In regards to the HTTP protocol)
        break; // Nothing valid to send; without this, `string` would be unassigned below.
    }
    send(string);
}
Is there a way to tell the client that the server failed and that it should either retry or abandon the connection, or am I out of luck?
The code is greatly simplified, but I am only asking about the protocol, not the exact code.
Thank you.

One of the following should do it:
Close the connection (reset or normal close).
Write a malformed chunk (and close the connection), which will trigger a client error.
Add an HTTP trailer telling your client that something went wrong.
Change your higher-level protocol: the last piece of data you send is a hash or a length, and the client knows how to deal with it.
If you can generate a hash or a length of your content before you start sending, you can send it in a header (a custom header if you are using HTTP chunks) so your client knows what to expect.
It depends on what you want your client to do with the data (keep it or throw it away). You may not be able to make changes on the client side, in which case the last option, for example, will not work.
For an explanation of the different ways to close a connection, see: TCP option SO_LINGER (zero) - when it's required.
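As a rough illustration of the chunked-encoding options above, here is a minimal sketch in Java. It only builds the characters that would go on the wire; the class name ChunkedBody, its method names, and the X-Stream-Status trailer field are made up for this example. A real server should also announce the field in a Trailer header, and many clients ignore trailers, so the trailer variant assumes a client you control.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class ChunkedBody {
    /** Frames one chunk: size in hex, CRLF, the data, CRLF. */
    static String chunk(String data) {
        int size = data.getBytes(StandardCharsets.UTF_8).length;
        return Integer.toHexString(size) + "\r\n" + data + "\r\n";
    }

    /** Ends the body: zero-length chunk, one (hypothetical) trailer field, blank line. */
    static String lastChunk(String status) {
        return "0\r\n" + "X-Stream-Status: " + status + "\r\n\r\n";
    }

    /**
     * Frames every part. If generating a part fails:
     *  - useTrailer == true: end the message cleanly and report the failure in the trailer;
     *  - useTrailer == false: stop without the zero-length last chunk, so that closing
     *    the connection leaves the client with a visibly incomplete message.
     */
    static String stream(Iterator<String> parts, boolean useTrailer) {
        StringBuilder wire = new StringBuilder();
        try {
            while (parts.hasNext()) {
                String part = parts.next();
                if (!part.isEmpty()) { // an empty chunk would be read as the end of the body
                    wire.append(chunk(part));
                }
            }
            wire.append(lastChunk("ok"));
        } catch (RuntimeException error) {
            if (useTrailer) {
                wire.append(lastChunk("failed"));
            }
            // Otherwise the caller closes the connection right after these bytes.
        }
        return wire.toString();
    }
}
```

With useTrailer set to false the client never sees the terminating zero-length chunk, which is what lets it tell a truncated body from a complete one.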

I think the server should return a response code starting with 5 (5xx), as per RFC 2616:
Server Error 5xx
Response status codes beginning with the digit "5" indicate cases in which the server is aware that it has erred or is incapable of performing the request. Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. User agents SHOULD display any included entity to the user. These response codes are applicable to any request method.

Related

Aspects from HttpServletResponse's sendError method explained by the official documentation

I am referring to HttpServletResponse's sendError method:
void sendError(int sc,
java.lang.String msg)
throws java.io.IOException
... and the official documentation provided:
Sends an error response to the client using the specified status and
clears the buffer. The server defaults to creating the response to
look like an HTML-formatted server error page containing the specified
message, setting the content type to "text/html". The server will
preserve cookies and may clear or update any headers needed to serve
the error page as a valid response. If an error-page declaration has
been made for the web application corresponding to the status code
passed in, it will be served back in preference to the suggested msg
parameter and the msg parameter will be ignored.
If the response has
already been committed, this method throws an IllegalStateException.
After using this method, the response should be considered to be
committed and should not be written to.
Can anyone please explain what is meant by "clears the buffer" and "If the response has already been committed"?
what is meant by "clears the buffer"
The code will call response.resetBuffer(), which discards any data that has been written to the response body but not yet flushed.
and "If the response has already been committed"?
It means the response headers have already been sent to the client. This is a point of no return: the server cannot take back data it has already sent and send a different response instead.
An example of the normal flow is as follows:
1. user requests JSP
2. JSP writes some HTML to response body
3. some awkward code in the midst of the JSP file throws an exception
4. server calls response.sendError(500) so that the HTML of the HTTP 500 error page will be written
5. user sees HTTP 500 error page
However, if between steps 2 and 3 the response buffer is flushed (i.e. any data written so far actually gets sent from server to client), then the response is in a "committed" state. This cannot be reset. The end user basically ends up getting half-baked HTML output representing the part up to the point where the exception occurred.
That's also one of the reasons why doing business logic in a JSP file is a bad practice. See also, among others, How to avoid Java code in JSP files?
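To make "buffer" and "committed" concrete, here is a toy model in Java. It is not the servlet API and not how any particular container is implemented; BufferedResponse and its methods are hypothetical names that merely imitate the documented behaviour of sendError.

```java
final class BufferedResponse {
    private final StringBuilder buffer = new StringBuilder(); // written, not yet sent
    private final StringBuilder sent = new StringBuilder();   // already on the wire
    private int status = 200;
    private boolean committed = false;

    void write(String body) {
        buffer.append(body);
    }

    /** Sends the status line (once) and everything buffered so far. */
    void flush() {
        if (!committed) {
            sent.append("HTTP/1.1 ").append(status).append("\r\n\r\n");
            committed = true; // point of no return: the status is on the wire
        }
        sent.append(buffer);
        buffer.setLength(0);
    }

    void sendError(int sc, String msg) {
        if (committed) {
            throw new IllegalStateException("response already committed");
        }
        buffer.setLength(0); // "clears the buffer": drop the unflushed partial page
        status = sc;
        buffer.append(msg);
        flush();
    }

    boolean isCommitted() {
        return committed;
    }

    String sentToClient() {
        return sent.toString();
    }
}
```

If sendError is called before any flush, the client only ever sees the error page. If flush ran first, the 200 status and the partial HTML are already out, and sendError can only throw.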

Returning http 200 OK with error within response body

I'm wondering if it is correct to return HTTP 200 OK when an error occurred on the server side (the error details would be contained inside the response body).
Example:
We're sending HTTP GET
Something unexpected happened on the server side.
Server returns HTTP 200 OK status code with error inside a response (e.g. {"status":"some error occurred"})
Is this the correct behavior or not? Should we change the status code to something else than 200?
No, it's very incorrect to send 200 with an error body.
HTTP is an application protocol. 200 implies that the response contains a payload that represents the status of the requested resource. An error message usually is not a representation of that resource.
If something goes wrong while processing GET, the right status code is 4xx ("you messed up") or 5xx ("I messed up").
HTTP status codes say something about the HTTP protocol. HTTP 200 means transmission is OK on the HTTP level (i.e request was technically OK and server was able to respond properly). See this wiki page for a list of all codes and their meaning.
HTTP 200 has nothing to do with success or failure of your "business code". In your example, HTTP 200 is an acceptable status to indicate that your "business code error message" was successfully transferred, provided that no technical issues prevented the business logic from running properly.
Alternatively you could let your server respond with HTTP 5xx if technical or unrecoverable problems happened on the server. Or HTTP 4xx if the incoming request had issues (e.g. wrong parameters, unexpected HTTP method...) Again, these all indicate technical errors, whereas HTTP 200 indicates NO technical errors, but makes no guarantee about business logic errors.
To summarize: YES it is valid to send error messages (for non-technical issues) in your http response together with HTTP status 200. Whether this applies to your case is up to you. If for instance the client is asking for a file that isn't there, that would be more like a 404. If there is a misconfiguration on the server that might be a 500. If client asks for a seat on a plane that is booked full, that would be 200 and your "implementation" will dictate how to recognise/handle this (e.g. JSON block with a { "booking": "failed" })
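The three cases in this summary can be sketched as a small mapping in Java. This follows the convention of this particular answer (a business-level failure still travels with 200), which other answers here dispute; the class, the enum, and the "confirmed" and "error" bodies are invented for the illustration.

```java
final class BookingApi {
    enum Outcome { BOOKED, FLIGHT_FULL, FLIGHT_NOT_FOUND, SERVER_MISCONFIGURED }

    /** Chooses the HTTP status: only technical problems leave the 2xx range. */
    static int statusFor(Outcome outcome) {
        switch (outcome) {
            case BOOKED:
            case FLIGHT_FULL:          // request was processed; the answer is just "no"
                return 200;
            case FLIGHT_NOT_FOUND:     // the client asked for something that is not there
                return 404;
            case SERVER_MISCONFIGURED: // the server could not do its job
                return 500;
            default:
                throw new IllegalArgumentException("unknown outcome: " + outcome);
        }
    }

    /** Chooses the body: the business result lives here, not in the status code. */
    static String bodyFor(Outcome outcome) {
        switch (outcome) {
            case BOOKED:
                return "{ \"booking\": \"confirmed\" }";
            case FLIGHT_FULL:
                return "{ \"booking\": \"failed\" }";
            default:
                return "{ \"error\": \"" + outcome + "\" }";
        }
    }
}
```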
I think these kinds of problems are solved if we think about real life.
Bad Practice:
Example 1:
Darling everything is FINE/OK (HTTP CODE 200) - (Success):
{
...but I don't want us to be together anymore!!!... (Error)
// Then everything isn't OK???
}
Example 2:
You are the best employee (HTTP CODE 200) - (Success):
{
...But we cannot continue your contract!!!... (Error)
// Then everything isn't OK???
}
Good Practices:
Darling I don't feel good (HTTP CODE 400) - (Error):
{
...I no longer feel anything for you, I think the best thing is to separate... (Error)
// In this case, you are alerting me from the beginning that something is wrong ...
}
This is only my personal opinion, each one can implement it as it is most comfortable or needs.
Note: The idea for this explanation was drawn from a great friend #diosney
Even if I wanted to return a business logic error as an HTTP code, there is no suitable HTTP error code for such errors, and using one in place of HTTP 200 would misrepresent the actual error.
So HTTP 200 is fine for business logic errors, but all errors that are covered by HTTP error codes should use them.
Basically, HTTP 200 means that the server correctly processed the user's request. If there are no seats on the plane, that does not matter, because the request was still processed correctly; the server could even just return the number of seats available, so that there are no business logic errors at all, or so that the business logic lives on the client side. "Business logic error" is an abstract notion, whereas an HTTP error is more definite.
To clarify, you should use HTTP error codes where they fit with the protocol, and not use HTTP status codes to send business logic errors.
Errors like insufficient balance, no cabs available, bad user/password qualify for HTTP status 200 with application specific error handling in the response body.
See this software engineering answer:
I would say it is better to be explicit about the separation of protocols. Let the HTTP server and the web browser do their own thing, and let the app do its own thing. The app needs to be able to make requests, and it needs the responses--and its logic as to how to request, how to interpret the responses, can be more (or less) complex than the HTTP perspective.
I think people have put too much weight into the application logic versus protocol matter. The important thing is that the response should make sense. What if you have an API that serves a dynamic resource and a request is made for X which is derived from template Y with data Z and either Y or Z isn't currently available? Is that a business logic error or a technical error? The correct answer is, "who cares?"
Your API and your responses need to be intelligible and consistent. It should conform to some kind of spec, and that spec should define what a valid response is. Something that conforms to a valid response should yield a 200 code. Something that does not conform to a valid response should yield a 4xx or 5xx code indicative of why a valid response couldn't be generated.
If your spec's definition of a valid response permits { "error": "invalid ID" }, then it's a successful response. If your spec doesn't make that accommodation, it would be a poor decision to return that response with a 200 code.
I'd draw an analogy to calling a function parseFoo. What happens when you call parseFoo("invalid data")? Does it return an error result (maybe null)? Or does it throw an exception? Many will take a near-religious position on whether one approach or the other is correct, but ultimately it's up to the API specification.
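The two styles in the parseFoo analogy might look like this in Java. It is only a sketch: it assumes, purely for illustration, that a "Foo" is an integer, and the method names are hypothetical.

```java
import java.util.Optional;

final class FooParser {
    /** Error-as-result style: invalid input is a normal, documented outcome. */
    static Optional<Integer> tryParseFoo(String input) {
        try {
            return Optional.of(Integer.parseInt(input.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Exception style: invalid input is treated as a failure of the call itself. */
    static int parseFoo(String input) {
        return Integer.parseInt(input.trim()); // throws NumberFormatException on bad input
    }
}
```

The first style corresponds to a 200 response whose body says "invalid", the second to a 4xx or 5xx status. Either can be right as long as the specification says which one callers get.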
"The status-code element is a three-digit integer code giving the result of the attempt to understand and satisfy the request"
Obviously there's a difference of opinion with regards to whether "successfully returning an error" constitutes an HTTP success or error. I see different people interpreting the same specs different ways. So pick a side, sure, but also accept that either way the whole world isn't going to agree with you. Me? I find myself somewhere in the middle, but I'll offer some commonsense considerations.
1. If your server-side code catches an unexpected exception when dispatching a request, that sounds like the very definition of a 500 Internal Server Error. This seems to be OP's situation. The application should not return a 200 for unexpected errors, but also see point 3.
2. If your server-side code should be able to gracefully handle a given invalid input, and it doesn't constitute an "exceptional" error condition, your spec should accommodate HTTP 200 responses that provide meaningful diagnostic information.
3. Above all: Have a spec. Make it consistent. Stick to it.
In OP's situation, it sounds like you have a de-facto standard that unhandled exceptions yield a 200 with a distinguishable response body. It's not ideal, but if it's not breaking things and actively causing problems, you probably have bigger, more important problems to solve.
HTTP is the protocol handling the transmission of data over the internet.
If that transmission breaks for whatever reason, the HTTP error codes tell you why the data can't be sent to you.
The data being transmitted is not covered by HTTP error codes, only the method of transmission.
HTTP can't say 'OK, this answer is gobbledygook, but here it is'. It just says 200 OK,
i.e. I've completed my job of getting it to you; the rest is up to you.
I know this has been answered already, but I put it in words I can understand. Sorry for any repetition.

Does 200 mean the request successfully started or successfully finished?

I realize this might sound like an odd question, but I'm seeing some odd results in my network calls so I'm trying to figure out if perhaps I'm misunderstanding something.
I see situations where in isolated incidents, when I'm uploading data, even though the response is 200, the data doesn't appear on the server. My guess is that the 200 response arrives during the initial HTTP handshake and then something goes wrong after the fact. Is that a correct interpretation of the order of events? Or is the 200 delivered once the server has collected whatever data the sending Header tells it is in the request? (cause if so then I'm back to the drawing board as to how I'm seeing what I'm seeing).
It means it has been successfully finished. From the HTTP/1.1 spec:
10.2.1 200 OK
The request has succeeded. The information returned with the response is dependent on the method used in the request, for example:
GET an entity corresponding to the requested resource is sent in the response;
HEAD the entity-header fields corresponding to the requested resource are sent in the response without any message-body;
POST an entity describing or containing the result of the action;
TRACE an entity containing the request message as received by the end server.
Finished. It doesn't make sense otherwise. Something might go wrong and the code could throw an error. How could the server send that error code when it has already sent 200 OK?
What you are experiencing might be lag, caching, a detached thread still running after the server sent the 200, etc.
A success reply, like 200, is not sent until after server has received and processed the full request. Error replies, on the other hand, may be sent before the request is fully received. If that happens, stop sending your request.

How do server-sent events actually work?

So I understand the concept of server-sent events (EventSource):
A client connects to an endpoint via EventSource
Client just listens to messages sent from the endpoint
The thing I'm confused about is how it works on the server. I've had a look at different examples, but the one that comes to mind is Mozilla's: http://hacks.mozilla.org/2011/06/a-wall-powered-by-eventsource-and-server-sent-events/
Now this may be just a bad example, but it kinda makes sense how the server side would work, as I understand it:
Something changes in a datastore, such as a database
A server-side script polls the datastore every Nth second
If the polling script notices a change, a server-sent event is fired to the clients
Does that make sense? Is that really how it works from a barebones perspective?
The HTML5 doctor site has a great write-up on server-sent events, but I'll try to provide a (reasonably) short summary here as well.
Server-sent events are, at their core, a long-running HTTP connection, a special MIME type (text/event-stream) and a user agent that provides the EventSource API. Together, these make the foundation of a unidirectional connection between a server and a client, where messages can be sent from server to client.
On the server side, it's rather simple. All you really need to do is set the following http headers:
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
Be sure to respond with the code 200 and not 204 or any other code, as this will cause compliant user agents to disconnect. Also, make sure to not end the connection on the server side. You are now free to start pushing messages down that connection. In nodejs (using express), this might look something like the following:
app.get("/my-stream", function(req, res) {
  res.status(200)
    .set({ "content-type"  : "text/event-stream"
         , "cache-control" : "no-cache"
         , "connection"    : "keep-alive"
         })
  res.write("data: Hello, world!\n\n")
})
On the client, you just use the EventSource API, as you noted:
var source = new EventSource("/my-stream")
source.addEventListener("message", function(message) {
  console.log(message.data)
})
And that's it, basically.
Now, in practice, what actually happens here is that the connection is kept alive by the server and the client by means of a mutual contract. The server will keep the connection alive for as long as it sees fit. Should it want to, it may terminate the connection and respond with a 204 No Content next time the client tries to connect. This will cause the client to stop trying to reconnect. I'm not sure if there's a way to end the connection in a way that the client is told not to reconnect at all, thereby skipping the client trying to reconnect once.
As mentioned, the client will keep the connection alive as well, and try to reconnect if it is dropped. The algorithm to reconnect is specified in the spec, and is fairly straightforward.
One super important bit that I've so far barely touched on, however, is the MIME type. The MIME type defines the format of the messages coming down the connection. Note, however, that it doesn't dictate the format of the contents of the messages, just the structure of the messages themselves. The MIME type is extremely straightforward. Messages are essentially key/value pairs of information. The key must be one of a predefined set:
id - the id of the message
data - the actual data
event - the event type
retry - milliseconds the user agent should wait before retrying a failed connection
Any other keys should be ignored. Messages are then delimited by the use of two newline characters: \n\n
The following is a valid message: (last new line characters added for verbosity)
data: Hello, world!
\n
The client will see this as: Hello, world!.
As is this:
data: Hello,
data: world!
\n
The client will see this as: Hello,\nworld!.
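The message structure described above is small enough to sketch in a few lines of Java. The SseMessage class and its method names are hypothetical; the sketch only covers the id, event and data keys and is not a complete event-stream parser.

```java
final class SseMessage {
    /** Formats one message; id and event may be null, in which case the key is left out. */
    static String format(String id, String event, String data) {
        StringBuilder out = new StringBuilder();
        if (id != null) {
            out.append("id: ").append(id).append('\n');
        }
        if (event != null) {
            out.append("event: ").append(event).append('\n');
        }
        // Every line of the payload gets its own "data:" key.
        for (String line : data.split("\n", -1)) {
            out.append("data: ").append(line).append('\n');
        }
        return out.append('\n').toString(); // the extra newline ends the message
    }

    /** Rebuilds what the client sees as message.data from one raw message. */
    static String dataOf(String rawMessage) {
        StringBuilder data = new StringBuilder();
        boolean first = true;
        for (String line : rawMessage.split("\n")) {
            if (!line.startsWith("data:")) {
                continue; // other keys do not contribute to the data
            }
            String value = line.substring("data:".length());
            if (value.startsWith(" ")) {
                value = value.substring(1); // one space after the colon is dropped
            }
            if (!first) {
                data.append('\n'); // several data lines are joined with a newline
            }
            data.append(value);
            first = false;
        }
        return data.toString();
    }
}
```

Formatting the two-line payload from the second example and reading it back yields the original text again, newline included.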
That pretty much sums up what server-sent events are: a long running non-cached http connection, a mime type and a simple javascript API.
For more information, I strongly suggest reading the specification. It's small and describes things very well (although the requirements of the server side could possibly be summarized a bit better.) I highly suggest reading it for the expected behavior with certain http status codes, for instance.
If you do not write any data right away, you also need to call res.flushHeaders(); otherwise Node.js holds the HTTP headers back until the first res.write() or res.end(). See this tutorial for a complete example.

Does sending POST data to a server that doesn't accept POST data receive the data?

I am setting up a back end API in a script of mine that contacts one of my sites by sending XML to my web server in the form of POST data. This script will be used by many and I want to limit the bandwidth waste for people that accidentally turn the feature on without a proper access key.
I will be denying requests that do not have the correct access key, probably by generating a 403 status code.
Let's say the POST data is ~500kb of data. Does the server receive all 500kb of data when this attempt is made, regardless of the status code?
How about if I made the URL contain the key, mydomain/api/123456789, and generated a 403 status on all bad access keys?
Does the POST data still get sent/received regardless, or is it negotiated before the data is finally sent?
Thanks in advance!
Generally speaking, the entire request will be sent, including post data. There is often no way for the application layer to return a response like a 403 until it has received the entire request.
In reality, it will depend on the language/framework used and how closely it is linked to the HTTP server. Section 8.2.2 of the RFC 2616 HTTP/1.1 specification has this to say:
An HTTP/1.1 (or later) client sending
a message-body SHOULD monitor the
network connection for an error status
while it is transmitting the request.
If the client sees an error status, it
SHOULD immediately cease transmitting
the body. If the body is being sent
using a "chunked" encoding (section
3.6), a zero length chunk and empty trailer MAY be used to prematurely
mark the end of the message. If the
body was preceded by a Content-Length
header, the client MUST close the
connection.
So, if you can find a language environment closely linked with the HTTP server (for example, mod_perl), you could do this in a way that complies with the standard.
An alternative approach you could take is to make an initial, smaller request to obtain a URL to use for the larger POST. The application can then deny providing the URL to clients without an appropriate key.
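The client-side behaviour in the RFC 2616 quote (watch for an error status while transmitting, then stop sending) can be sketched in Java as follows. This is not a real HTTP client: AbortableUpload is a hypothetical helper, and the errorStatusSeen callback stands in for whatever mechanism would actually notice an early response on the connection.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.BooleanSupplier;

final class AbortableUpload {
    /**
     * Writes the request body piece by piece and, before each piece, asks whether the
     * server has already answered with an error status. Returns the bytes actually sent.
     */
    static int send(List<byte[]> pieces, OutputStream out, BooleanSupplier errorStatusSeen) {
        int sent = 0;
        try {
            for (byte[] piece : pieces) {
                if (errorStatusSeen.getAsBoolean()) {
                    // Stop transmitting. If a Content-Length header was sent,
                    // the caller has to close the connection at this point.
                    break;
                }
                out.write(piece);
                sent += piece.length;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sent;
    }
}
```

The smaller the pieces, the less bandwidth is wasted after the server has rejected the request, at the cost of more frequent checks.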
Here is great book about RESTful Web Services, where it's explained how HTTP works: http://oreilly.com/catalog/9780596529260
You can think of any request as an envelope: the address (URL) and some properties (HTTP headers) are written on the outside, and inside there is some data (if the request is made with the POST method). So, as you might guess, you can't receive an envelope partially.
Oh, I forgot: that applies when you are using HTTP POST with the standard content type "application/x-www-form-urlencoded". If you are uploading files (and therefore using "multipart/form-data"), Django gives you control over streamed chunks of files using middleware classes: http://docs.djangoproject.com/en/dev/topics/http/middleware/

Resources