How to deal with failure in the middle of giving an HTTP response? - http

I've recently stumbled upon multiple talks claiming that you should start responding to any incoming HTTP request ASAP, which seems reasonable, but I'm not clear on how to communicate failure when employing this strategy.
Let's imagine the situation where we send a response code of 200 and start rendering an HTML page. As we construct the body, the data flows in from various DB queries and suddenly an error occurs. It's a bit too late to change our mind at this point.
Or maybe a more practical example:
We're providing an API that potentially delivers a lot of data. To keep it fast, we stream the data off the DB connection, through some projection function, and then into a streaming JSON encoder that writes straight to the socket. And poof, something goes wrong: the DB connection drops, and reconnection attempts time out. We've just flushed 100K JSON objects, but the result set was actually bigger than that.
Is there any good way to die gracefully halfway into an HTTP response?
In the HTML case, one could always print some human-readable information. And in the API case, one can respond with { "results": [ /* payload goes here */ ], "error": { /* error information */ } }, which works, since the error is written after the payload. But ideally I would like to use something built into the HTTP protocol. It seems weird to send a 200 and then deliver an error. Is there a better way?
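For concreteness, a sketch of the envelope approach I mean, in Go (nextRow and the field names are hypothetical):

    package main

    import (
        "encoding/json"
        "io"
        "net/http"
    )

    // nextRow stands in for a streaming DB cursor; it returns io.EOF
    // when the result set is exhausted.
    func nextRow() (map[string]any, error) { return nil, io.EOF }

    // apiHandler streams the payload array first and appends an "error"
    // member only if something breaks mid-stream. The status line is
    // already 200 by then, but the client can still detect the failure.
    func apiHandler(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        io.WriteString(w, `{"results":[`)
        enc := json.NewEncoder(w)
        first := true
        for {
            row, err := nextRow()
            if err == io.EOF {
                break
            }
            if err != nil {
                io.WriteString(w, `],"error":{"message":"data source failed mid-stream"}}`)
                return
            }
            if !first {
                io.WriteString(w, ",")
            }
            first = false
            enc.Encode(row) // the trailing newline is legal JSON whitespace
        }
        io.WriteString(w, `]}`)
    }

    func main() {
        http.HandleFunc("/api/things", apiHandler)
        http.ListenAndServe(":8080", nil)
    }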

Once the status code is sent, the only option you have is closing the connection.
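In Go's net/http, for instance, that looks roughly like this (streamBody is a placeholder):

    package main

    import "net/http"

    // streamBody stands in for whatever writes the response incrementally.
    func streamBody(w http.ResponseWriter) error { return nil }

    func handler(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        if err := streamBody(w); err != nil {
            // Panicking with http.ErrAbortHandler makes the server drop
            // the connection without logging a stack trace. With chunked
            // encoding the client never sees the terminating zero-length
            // chunk, so it can at least detect that the body was truncated.
            panic(http.ErrAbortHandler)
        }
    }

    func main() {
        http.HandleFunc("/", handler)
        http.ListenAndServe(":8080", nil)
    }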

Is there any good way to die gracefully halfway into an HTTP response?
No. Once the HTTP status has been flushed to the client, there is no way to change it, so the only approach is to generate the output fully on the server and only then stream it to the client with the proper HTTP code.
Note: check out gzip compression for big JSON data.
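A sketch of that buffer-first pattern in Go (render is a placeholder for the real page/query logic):

    package main

    import (
        "bytes"
        "io"
        "net/http"
        "strconv"
    )

    // render runs every query and builds the whole body before a single
    // byte reaches the client, so errors can still become a clean 500.
    func render(w io.Writer) error {
        _, err := io.WriteString(w, "<html><body>ok</body></html>")
        return err
    }

    func handler(w http.ResponseWriter, r *http.Request) {
        var buf bytes.Buffer
        if err := render(&buf); err != nil {
            http.Error(w, "internal error", http.StatusInternalServerError)
            return
        }
        w.Header().Set("Content-Length", strconv.Itoa(buf.Len()))
        w.WriteHeader(http.StatusOK)
        buf.WriteTo(w)
    }

    func main() {
        http.HandleFunc("/", handler)
        http.ListenAndServe(":8080", nil)
    }

The cost, of course, is memory and time-to-first-byte, which is the trade-off the question started from.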

Related

Efficiently handling HTTP uploads of many large files in Go

There is probably an answer within reach, but most of the search results are "handling large file uploads", where the user does not know what they're doing, or "handling many uploads", where the answer is consistently just an explanation of how to work with multipart requests and/or Flash uploader widgets.
I haven't had time to sift through Go's HTTP implementation yet, but when does the application have the first chance to see the incoming body? Not until it has been completely received?
If I were to [poorly] decide to use HTTP to transfer a large amount of data and posted a single request with several 10-gigabyte parts, would I have to wait for the whole thing to be received before processing it, or can I process it iteratively through the body's io.Reader?
This is only tangentially related, but I also haven't been able to get a clear answer about whether I can forcibly close the connection in the middle, and whether, even if I close it, the machine will just keep receiving the data on the port.
Thanks so much.
An application's handler is called after the headers are parsed and before the request body is read. The handler can read the request body as soon as the handler is called. The server does not buffer the entire request body.
An application can read file uploads without buffering the entire request by getting a multipart reader and iterating through the parts.
An application can replace the request body with a MaxBytesReader to force close the connection after a specified limit is breached.
The above comments are about the net/http server included in the standard library. The comments may not apply to other servers.
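A sketch combining both points with the standard net/http server (the size limit and part handling are illustrative):

    package main

    import (
        "io"
        "log"
        "net/http"
    )

    func uploadHandler(w http.ResponseWriter, r *http.Request) {
        // Cap the request body; MaxBytesReader closes the connection
        // once the limit is breached.
        r.Body = http.MaxBytesReader(w, r.Body, 64<<30) // 64 GiB, arbitrary

        mr, err := r.MultipartReader()
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        for {
            part, err := mr.NextPart()
            if err == io.EOF {
                break
            }
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            // Each part is streamed as it arrives; nothing is buffered
            // in full. Replace io.Discard with real processing.
            n, err := io.Copy(io.Discard, part)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            log.Printf("part %q: %d bytes", part.FormName(), n)
        }
        w.WriteHeader(http.StatusOK)
    }

    func main() {
        http.HandleFunc("/upload", uploadHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }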
While I haven't done this with GB-size files, my strategy with file processing (mostly stuff I read from and write to S3) is to use https://golang.org/pkg/os/exec/ with a command-line utility that handles chunking the way you like, then read and process by tailing the file as explained here: Reading log files as they're updated in Go
In my situations, network utilities can download the data far faster than my code can process it, so it makes sense to send it to disk and pick it up as fast as I can; that way I'm not holding a connection open while I process.
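A rough sketch of that download-then-tail pattern (the curl command and file name are stand-ins for whatever utility and path you actually use):

    package main

    import (
        "io"
        "log"
        "os"
        "os/exec"
        "time"
    )

    func process(b []byte) {} // placeholder for the real per-chunk work

    func main() {
        // Any CLI that writes the file incrementally will do here.
        cmd := exec.Command("curl", "-sSo", "data.csv", "https://example.com/big.csv")
        if err := cmd.Start(); err != nil {
            log.Fatal(err)
        }
        done := make(chan error, 1)
        go func() { done <- cmd.Wait() }()

        // Wait for the downloader to create the file.
        var f *os.File
        for {
            var err error
            if f, err = os.Open("data.csv"); err == nil {
                break
            }
            time.Sleep(100 * time.Millisecond)
        }
        defer f.Close()

        // Tail the file: on EOF, keep polling until the downloader has
        // exited and the remainder has been drained.
        buf := make([]byte, 64<<10)
        finished := false
        for {
            n, err := f.Read(buf)
            if n > 0 {
                process(buf[:n])
            }
            if err == io.EOF {
                if finished {
                    return
                }
                select {
                case err := <-done:
                    if err != nil {
                        log.Fatal(err)
                    }
                    finished = true
                case <-time.After(100 * time.Millisecond):
                }
            } else if err != nil {
                log.Fatal(err)
            }
        }
    }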

Can a web server begin responding before the client has sent the full request?

I am writing a web application for an academic research group. The researchers need to be able to upload large data sets (100MB - 1GB) in CSV format. I've written the server to process the data as it comes in. This means that if there is an error in the first row of the CSV, we can return an error straight away.
However, when this happens, the browser reports that "The connection was reset" or similar. Clearly, my web server is responding in a way that doesn't make sense.
If I explicitly close the HTTP request stream (this is Kotlin on the JVM by the way) before returning the error to the browser, then the problem goes away. However, it turns out that the close implementation of the request stream first goes and reads the whole stream to its end. So at that point the user still has to wait 30mins+ to find out that there is an error in the first row of their CSV.
Is what I am trying to do possible? Does the HTTP protocol permit a web server, in any circumstances, to begin responding before the full request body has been sent? If not, can you suggest a workaround that would allow me to deliver a user experience where the user doesn't have to wait for the whole file to be uploaded before finding out if there are any problems?
The answer is yes: according to the HTTP spec, servers may send responses early and the client should stop sending the request body. Most browsers, however, don't implement this correctly.
In theory, your HTTP server needs to return a 4xx error code with a response body, then reset the connection to prevent the upload from continuing in the background. See the answers linked below for a more detailed description of the issue. There are a couple of browser versions that do support this, so if you're working in lab conditions where you can control the client being used, the links below will help.
https://stackoverflow.com/a/14483857/2274303
https://stackoverflow.com/a/18370751/2274303
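On the server side, responding early is straightforward; for example, in Go a handler can reply before the body is fully read (the validation here is a placeholder):

    package main

    import (
        "bufio"
        "net/http"
        "strings"
    )

    func csvHandler(w http.ResponseWriter, r *http.Request) {
        br := bufio.NewReader(r.Body)
        first, err := br.ReadString('\n')
        // Placeholder check: the header row must contain a comma.
        if err != nil || !strings.Contains(first, ",") {
            // Ask the server to close the connection instead of draining
            // the rest of a multi-gigabyte body for keep-alive.
            w.Header().Set("Connection", "close")
            http.Error(w, "first CSV row is invalid", http.StatusUnprocessableEntity)
            return
        }
        // ... stream and validate the remaining rows here ...
        w.WriteHeader(http.StatusOK)
    }

    func main() {
        http.HandleFunc("/upload", csvHandler)
        http.ListenAndServe(":8080", nil)
    }

Whether a given browser ever surfaces that early response is exactly the caveat discussed above.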
[edit]
To answer your question about a workaround: chunking the uploads with JavaScript is a good way to mitigate connectivity issues, but if you want to parse the file in real time it's not as simple as arbitrarily breaking it up into pieces. You need to make sure you're not splitting the file in the middle of a line, otherwise parsing will fail even if the data is valid. That also raises the issue of handling a 1 GB file in JavaScript, which isn't a good idea imo.
If you want to use JavaScript, continue uploading the entire file at once via an AJAX request, so you can get the response outside of the main DOM and force a redirect or cancel the upload. Depending on which JS libraries you're using, there are different ways of doing this.
None of this solves the reverse scenario. What if the file is 95% uploaded before there's an error? The researcher will need to either upload the whole thing again or edit the file to only include the rows from the error going forward. That means your application needs to support partial uploads and know to pick up where it left off. All these things are possible, but you're probably not going to find a simple workaround to get this working well.
Without understanding the dataset and what kind of validation you are doing it's hard to come up with a full solution. If parsing each row doesn't depend on the previous rows being valid, you could always upload the whole file, then display the rows with errors at the end and ask them to upload a second file with just the corrections.
The normal request/response flow of an HTTP web server looks like this:
Server listens for request
Client creates request
Client sends request to server
Server processes request
Server creates response
Server sends response to client
Client processes response
The client starts the connection, and the server is able to respond on that same connection; however, if you close the connection, the server would need a new connection to reach the client, and the browser will not accept a connection the client didn't initiate.
You may be able to respond by reading the first line and creating an error quickly, but the client will not read the response until it is done sending the request.
By sending the file in chunks or asynchronously sending lines of the file, you will be able to give feedback more immediately. You will be sending many smaller requests with the ability to respond in between.
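A server-side sketch of that per-chunk validation in Go (the endpoint, framing, and three-field rule are all assumptions):

    package main

    import (
        "bufio"
        "fmt"
        "net/http"
        "strings"
    )

    // Each request carries one chunk of whole CSV lines; the server
    // validates it and answers immediately, so an error surfaces after
    // at most one chunk rather than after the whole file.
    func chunkHandler(w http.ResponseWriter, r *http.Request) {
        sc := bufio.NewScanner(r.Body)
        line := 0
        for sc.Scan() {
            line++
            if len(strings.Split(sc.Text(), ",")) != 3 { // hypothetical rule
                http.Error(w, fmt.Sprintf("bad row %d in this chunk", line),
                    http.StatusUnprocessableEntity)
                return
            }
        }
        if err := sc.Err(); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        w.WriteHeader(http.StatusNoContent) // chunk accepted; send the next one
    }

    func main() {
        http.HandleFunc("/upload-chunk", chunkHandler)
        http.ListenAndServe(":8080", nil)
    }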
The question was about the HTTP protocol. I feel this would be allowed by the protocol if you wrote a custom client and a custom web app, but if you are using browsers then you must use HTTP as the vendors have implemented it. In a custom app you could check for interruptions, whereas most browsers will send the full request before listening for a response, which is also one reason AJAX took off 20 years ago.

Correct handling of long-running HTTP processes

I am building a REST-style Back End web service which serves up pre-generated blobs of data to a Front End site. The blobs themselves are not large and can easily be satisfied in a single HTTP response/request. The Back End is written in PHP.
It all works just fine. The difficulty is the blob regeneration, which takes considerably longer when there are many blobs. The regeneration can take longer than the response timeout on the (separately hosted) server.
I would like to proceed as follows:
– the initial request sent is "regenerate all blobs"
– processing starts, with no response until either all blobs are complete (HTTP 200, all hunky-dory) or an internal time limit is reached
– if the time limit is reached, I want to send a response which indicates that the processing was incomplete (which HTTP status is appropriate, since the processing was successful but incomplete? – 206 does not apply without Range headers...), so that the client can request continuation. I can imagine returning data that indicates what the "please continue" request should be (is this best done in a Link header?)
– the client then requests continuation and the loop continues until full success is signalled.
What is the best way to signal such things in an HTTP client-server exchange? I am prepared to write a short piece of Javascript to handle the client loop.
Thanks for any good ideas!
I did quite a bit of research about Range headers and came to the conclusion that there was not enough consensus about the effect of using a range unit other than "bytes", despite the fact that the HTTP statuses 200, 206 and 416 carry useful meanings. See http://otac0n.com/blog/2012/11/21/range-header-i-choose-you.html for a good summary.
So I ended up writing a special-case solution in which the result in each response signals whether to rerun the query with a "resume" value that allows already-processed blobs to be skipped.
It would be just great if the Range machinery also allowed "items" as a unit – this would enable simple collections to be handled via header parameters without additional overloading of the URI.
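A compressed sketch of that resume loop (in Go rather than the PHP back end; the JSON field names and time budget are assumptions):

    package main

    import (
        "encoding/json"
        "net/http"
        "strconv"
        "time"
    )

    var blobs = make([]int, 1000) // stand-in for the real blob list

    func regenerate(blob int) { time.Sleep(50 * time.Millisecond) } // placeholder

    // regenerateHandler works until a time budget below the host's
    // response timeout is spent, then tells the client where to resume.
    func regenerateHandler(w http.ResponseWriter, r *http.Request) {
        start, _ := strconv.Atoi(r.URL.Query().Get("resume")) // 0 if absent
        deadline := time.Now().Add(25 * time.Second)
        i := start
        for ; i < len(blobs) && time.Now().Before(deadline); i++ {
            regenerate(blobs[i])
        }
        w.Header().Set("Content-Type", "application/json")
        if i < len(blobs) {
            json.NewEncoder(w).Encode(map[string]any{"done": false, "resume": i})
            return
        }
        json.NewEncoder(w).Encode(map[string]any{"done": true})
    }

    func main() {
        http.HandleFunc("/regenerate", regenerateHandler)
        http.ListenAndServe(":8080", nil)
    }

The client just keeps requesting /regenerate?resume=N until "done" comes back true.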

Is there a way using HTTP to let the server update the content in a client's browser without the client requesting it?

It is quite easy to update the interface by sending a jQuery AJAX request and updating the page with the new content. But I need something more specific.
I want the server to send a response to clients without their having requested it, updating the content whenever the server has found something new, with no need to send an AJAX request every time. When the server has new data, it sends a response to every client.
Is there any way to do this using HTTP or some specific functionality inside the browser?
Websockets, Comet, HTTP long polling.
This is called server push (you can also find it under the name Comet). Search using these keywords and you will find a bunch of examples, tools and so on. No special protocol is required for it.
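For illustration, a minimal long-polling endpoint in Go (the update channel is a stand-in for the real event source):

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    var updates = make(chan string) // fed by whatever produces new data

    // pollHandler holds the request open until there is something to
    // say or the client gives up; the client re-requests immediately
    // after each response, so it always has one request "parked".
    func pollHandler(w http.ResponseWriter, r *http.Request) {
        select {
        case msg := <-updates:
            fmt.Fprintln(w, msg)
        case <-r.Context().Done(): // client disconnected
        }
    }

    func main() {
        go func() { // demo producer
            for {
                time.Sleep(5 * time.Second)
                updates <- "something new happened"
            }
        }()
        http.HandleFunc("/poll", pollHandler)
        http.ListenAndServe(":8080", nil)
    }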
Aaah! You are trying to break the principles of the web :) You see, if the web were pure MVC (model-view-controller), the server could actually send messages to the client(s) and ask them to update. The issue is that the server could be load balanced and the same request could be sent to different servers. Now if you were to send a message back to the client, you'd have to know who all is connected to the server. Let's say the site is quite popular and you have about 100,000 people connecting to it every day. You'd actually have to store the IP of each of them to know where on the internet they are located and to be able to "push" them a message.
Caveats:
What if they are no longer browsing your website? Currently there is no way to log out automatically when you close your browser; the server needs to check after a fixed timeout whether you have logged out (or you send a new nonce with every response to spare the server that check).
What about a system restart, crash, etc.? You'd lose all the IPs you were keeping track of and be back to square one: you have people connected to you, but until you receive new requests you can't really "send" them data when they may be expecting it under your model.
Take the example of Facebook's news feed or the "Most recent" link near the top right: sometimes while you are browsing your wall you see the number next to "Most recent" go up, or a new feed item come to the top of your wall! It's the client sending periodic requests to the server to find out what was updated, rather than the other way round.
You see, it keeps things simple and RESTful. You may feel it's inefficient for the client to "poll" the server to pull the data and you'd prefer push, but the design of the server gets simplified :)
I suggest AJAX polling is the best way to go - you are distributing computation to the client and keeping it simple (KISS principle :)
Of course you can get around it, the question is, is it worth it?
Hope this helps :)
RFC 6202 might be a good read.

What response time overhead would you expect for a webservice request compared to a simple file request?

I'm developing an ASP.NET web service application to provide JSON-formatted data to a widget that uses jQuery.ajax to make the request. I've been using Firebug's Net view to check how long the requests for data take.
In my initial prototype I was simply requesting static JSON data files, which on my dev machine were obviously returned very quickly by IIS - in around 2 to 5 ms, even when not present in the browser's cache.
Now that I've connected to the web service, I'm concerned that the data requests are way too slow, as they consistently take around 200 ms to return. (This is even after the first request, which is obviously compiling stuff and takes around six whole seconds.) I have removed all database/processing overhead from the web request, so it should take very little time to process, and this is still on the local dev machine, so there's no network latency. The overhead is no better with a release build or on a production server.
My question is this:
Is this response time of around 200ms the best I can expect from a .net web service that is simply returning 'Hello World'? If it is possible to do much better, then what on earth might I be doing wrong? If it isn't possible, what would you do instead?
If it's really doing nothing in terms of connecting to a database etc., then you should be able to get a much better response time than 200 ms.
If you measure the time on the server side instead of the client side, what do you see? Have you tried using Wireshark to see what's happening on the network?
Basically you want to be able to create a timeline as accurately as possible, showing when the client sent the request, when the request hit the server, when your server-side code received the request, when your server-side code finished processing the request, when the server actually sent the response, and when the client actually received the response.
At that point you can work out where the bottleneck is.
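One easy data point in that timeline is server-side processing time, e.g. with a small timing wrapper (shown in Go for consistency with the rest of this page; the same idea applies in ASP.NET):

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "time"
    )

    // timed logs how long the handler took, i.e. the segment between
    // "server-side code received the request" and "finished processing".
    func timed(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            start := time.Now()
            next.ServeHTTP(w, r)
            log.Printf("%s %s took %s", r.Method, r.URL.Path, time.Since(start))
        })
    }

    func main() {
        hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "Hello World")
        })
        http.Handle("/", timed(hello))
        http.ListenAndServe(":8080", nil)
    }

If this logs microseconds while the browser sees 200 ms, the overhead is in the stack or on the wire rather than in your code.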
