I am writing an upload handler (asp.net) to handle image uploads.
The aim is to check the image type and content size before the entire file is uploaded. So I cannot use the Request object directly as doing so loads the entire file input stream. I therefore use the HttpWorkerRequest.
However, I keep getting "The connection to the server was reset while the page was loading".
After quite a bit of investigation, it has become apparent that, when posting the file, the call only works if the entire input stream is read.
This, of course, is exactly what I do not want to do :)
Can someone please tell me how I can close off the request without causing the "connection reset" issue and having the browser process the response?
There is no way to do this, as this is how HTTP functions. The best you can do is slurp the data from the client (i.e. read it in chunks) and immediately forget about it. This should keep your memory requirements from being hammered, though it will still cost you the bandwidth.
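For example, here is a minimal sketch of that rejection path in a generic handler, assuming .NET 4+ where Request.GetBufferlessInputStream() is available; the handler name, buffer size, and error message are illustrative:

using System.Web;

public class ImageUploadHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // Read the request body in small chunks without buffering the whole upload.
        using (var input = context.Request.GetBufferlessInputStream())
        {
            var buffer = new byte[8192];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Inspect the first chunk(s) here (e.g. image magic bytes);
                // after deciding to reject, keep reading and discarding so the
                // browser finishes the POST and will display the response.
            }
        }

        context.Response.StatusCode = 400;
        context.Response.ContentType = "text/plain";
        context.Response.Write("Upload rejected: wrong image type or too large.");
    }

    public bool IsReusable { get { return true; } }
}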
Related
There is probably an answer within reach, but most of the search results are "handling large file uploads", where the user does not know what they're doing, or "handling many uploads", where the answer is consistently just an explanation of how to work with multipart requests and/or Flash uploader widgets.
I haven't had time to sift through Go's HTTP implementation, yet, but when does the application have the first chance to see the incoming body? Not until it has been completely received?
If I were to [poorly] decide to use HTTP to transfer a large amount of data and posted a single request with several 10-gigabyte parts, would I have to wait for the whole thing to be received before processing it, or does the io.Reader for the body let me process it incrementally?
This is only tangentially related, but I also haven't been able to get a clear answer about whether I can forcibly close the connection partway through, and whether, even if I close it, the data will just keep arriving on the port anyway.
Thanks so much.
An application's handler is called after the headers are parsed and before the request body is read. The handler can read the request body as soon as the handler is called. The server does not buffer the entire request body.
An application can read file uploads without buffering the entire request by getting a multipart reader and iterating through the parts.
An application can replace the request body with a MaxBytesReader to force close the connection after a specified limit is breached.
The above comments are about the net/http server included in the standard library. The comments may not apply to other servers.
While I haven't done this with GB-size files, my strategy with file processing (mostly stuff I read from and write to S3) is to use https://golang.org/pkg/os/exec/ with a command-line utility that handles chunking the way you like. Then read and process by tailing the file as explained here: Reading log files as they're updated in Go
In my situations, network utilities can download the data far faster than my code can process it, so it makes sense to send it to disk and pick it up as fast as I can, that way I'm not holding some connection open while I process.
I am writing a web application for an academic research group. The researchers need to be able to upload large data sets (100MB - 1GB) in CSV format. I've written the server to process the data as it comes in. This means that if there is an error in the first row of the CSV, we can return an error straight away.
However, when this happens, the browser reports that "The connection was reset" or similar. Clearly, my web server is responding in a way that doesn't make sense.
If I explicitly close the HTTP request stream (this is Kotlin on the JVM by the way) before returning the error to the browser, then the problem goes away. However, it turns out that the close implementation of the request stream first goes and reads the whole stream to its end. So at that point the user still has to wait 30mins+ to find out that there is an error in the first row of their CSV.
Is what I am trying to do possible? Does the HTTP protocol permit a web server, in any circumstances, to begin responding before the full request body has been sent? If not, can you suggest a workaround that would allow me to deliver a user experience where the user doesn't have to wait for the whole file to be uploaded before finding out if there are any problems?
The answer is yes: according to the HTTP spec, servers are allowed to send a response early and the client should then stop sending the request body. Most browsers, however, don't implement this correctly.
In theory, your http server needs to return a 4xx error code with a response body, then reset the connection to prevent the upload continuing in the background. See the answers below for a more detailed description of the issue. There are a couple of browser versions that do support this, so if you're doing this in lab conditions where you can control the client being used the links below will help.
https://stackoverflow.com/a/14483857/2274303
https://stackoverflow.com/a/18370751/2274303
[edit]
To answer your question about using a workaround, chunking the uploads using javascript is a good way to mitigate internet connectivity issues, but if you want to parse it in real time it's not as simple as arbitrarily breaking up the file into pieces. You need to make sure you're not splitting the file in the middle of a line, otherwise it will fail even if the data is valid. That brings up the issue of parsing a 1GB file in javascript, which isn't a good idea imo.
If you want to use javascript, continue uploading the entire file at once via an ajax request, so you can get the response outside of the main dom and force a redirect or cancel the upload. Depending on which js libraries you're using there are different ways of doing this.
None of this solves the reverse scenario. What if the file is 95% uploaded before there's an error? The researcher will need to either upload the whole thing again or edit the file to only include the rows from the error going forward. That means your application needs to support partial uploads and know to pick up where it left off. All these things are possible, but you're probably not going to find a simple workaround to get this working well.
Without understanding the dataset and what kind of validation you are doing it's hard to come up with a full solution. If parsing each row doesn't depend on the previous rows being valid, you could always upload the whole file, then display the rows with errors at the end and ask them to upload a second file with just the corrections.
The normal process of a HTTP web server happens like:
Server listens for request
Client creates request
Client sends request to server
Server processes request
Server creates response
Server sends response to client
Client processes response
The client opens the connection and the server can only respond on that same connection; if the connection is closed mid-request, there is no channel left for the response, since the server cannot open a new connection back to the browser that the client didn't request.
You may be able to respond by reading the first line and creating an error quickly, but the client will not read the response until it is done sending the request.
By sending the file in chunks or asynchronously sending lines of the file, you will be able to give feedback more immediately. You will be sending many smaller requests with the ability to respond in between.
The question was about the HTTP protocol. I feel like this would be allowed by the protocol if you wrote a custom client and web app; however, if you are using browsers, then you must use HTTP as the vendors have implemented it. In a custom client you could check for interruptions, but most browsers will probably fire the full request before listening for a response, which is also a reason AJAX took off 20 years ago.
I have a 5MB MemoryStream generated on the server that needs to be served to users as an Excel file.
I used Response.Close to make it downloadable, but that aborts all other requests/responses on the page.
I know using a download page may help, but how do I pass the MemoryStream to the download page? Normally you would pass a file URL to that page.
Any ideas?
More comments:
1. First, I want to stream the file to the client so that it downloads properly, using something other than Response.Close().
2. Second, during the download I want to show a processing bar (just an image). Response.Close() stops the JavaScript function that hides the bar.
So how can I achieve both requirements? Thanks
Thanks anyway. The difficulty is that after Response.End or CompleteRequest the HTTP headers have already been sent, so I can't update anything in the front end. I should really use a separate page that handles the processing logic and also serves the file download.
Your question is a little unclear. You ask how to end the response without ending the response. Do you mean you want the rest of the code to run after the response is flushed to the client? Or are you having a problem with the actual response not being right?
Using Response.Close() could be problematic as it basically resets the HTTP connection to the client. See This MSDN Blog Post and MSDN Response.Close() Reference.
If you can describe the problem you are having with more detail I can update my answer.
Look at Response.Flush();
Also, see this: http://blogs.msdn.com/b/aspnetue/archive/2010/05/25/response-end-response-close-and-how-customer-feedback-helps-us-improve-msdn-documentation.aspx
Should the logic of generating your Excel file be on its own separate page or handler?
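A rough sketch of such a handler is below; BuildExcelStream and the file name are purely illustrative placeholders. Rather than calling Response.Close(), it lets the request finish normally via CompleteRequest so the connection is not reset:

using System.IO;
using System.Web;

public class ExcelDownloadHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        using (MemoryStream ms = BuildExcelStream())   // hypothetical generator of your 5MB stream
        {
            context.Response.ContentType = "application/vnd.ms-excel";
            context.Response.AppendHeader("Content-Disposition", "attachment; filename=report.xls");
            context.Response.AppendHeader("Content-Length", ms.Length.ToString());
            ms.WriteTo(context.Response.OutputStream);  // stream the in-memory file to the client
        }
        // End the request without Response.Close(), which resets the connection.
        context.ApplicationInstance.CompleteRequest();
    }

    public bool IsReusable { get { return false; } }

    private static MemoryStream BuildExcelStream()
    {
        // Placeholder for whatever generates the Excel MemoryStream.
        return new MemoryStream();
    }
}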
In order to achieve your second objective, it would require a session-based object (holding the size of the file generated so far) that is updated periodically and asynchronously while your download file is being generated. That object can then be read by an AJAX request sent periodically from the page that holds the progress bar, and you can use that value to update the progress bar in the front end.
Just an idea, not sure if it works. If you google AJAX progress bar you can find a couple of examples.
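A rough sketch of that idea, using a static dictionary keyed by a download id instead of Session (to avoid session locking while the download request is still running); all names here are illustrative:

using System.Collections.Concurrent;
using System.Web;

public static class DownloadProgress
{
    // Keyed by a download id that the page generates and passes to both requests.
    public static readonly ConcurrentDictionary<string, long> BytesWritten =
        new ConcurrentDictionary<string, long>();
}

// In the download/generation handler, update the entry as you write:
//   DownloadProgress.BytesWritten[id] = totalBytesSoFar;

public class ProgressHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        string id = context.Request.QueryString["id"] ?? "";
        long bytes;
        DownloadProgress.BytesWritten.TryGetValue(id, out bytes);

        // The page polls this handler with AJAX and uses the value to update the progress bar.
        context.Response.ContentType = "text/plain";
        context.Response.Write(bytes.ToString());
    }

    public bool IsReusable { get { return true; } }
}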
Hope it helps
Ok, so here's the problem: I'm reading the stream from a FileUpload control, reading in chunks of n bytes and writing the array in a loop until I reach the stream's end.
Now the reason I do this is because I need to check several things while the upload is still going on (rather than doing a Save(), which does the whole thing in one go). Here's the problem: when doing this from the local machine, I can see the file just fine as it's uploading and its size increases (I had to add a Sleep() call in the loop to actually see the file being written).
However, when I upload the file from a remote machine, I don't get to see it until the file has completed uploading. Also, I've added another call to write the progress to a text file as the upload goes on, and I get the same thing. Local: the file updates as the upload goes on; remote: the token file only appears after the upload is done (which is somewhat useless, since I need it while the upload is still happening).
Is there some sort of security setting in IIS (or ASP.NET) that maybe saves files in a temporary location for remote machines, as opposed to the local machine, and then moves them to the specified destination? I would liken this to ASP.NET displaying detailed error messages when browsing from the local machine (even on the public hostname), as opposed to the generic compilation error page/generic exception page shown when browsing from a remote machine (when customErrors is not off).
Any clues on this?
Thanks in advance.
The FileUpload control renders as an <input type="file"> HTML element; your browser will open that file, read ALL of its content, encode it, and send it.
Your ASP.NET request only starts after IIS receives all of the browser's data.
So you'll need to code a client component (Flash, Java applet, Silverlight) to send the file in small chunks and rebuild it on the server side.
EDIT: Some information on MSDN:
To control whether the file to upload is temporarily stored in memory or on the server while the request is being processed, set the requestLengthDiskThreshold attribute of the httpRuntime element. This attribute enables you to manage the size of the input stream buffer. The default is 256 bytes. The value that you specify should not exceed the value that you specify for the maxRequestLength attribute.
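For example, in web.config (the values are illustrative; both attributes are specified in KB):

<system.web>
  <!-- Allow uploads up to ~10 MB; buffer at most ~256 KB of the input stream
       in memory before the rest is stored on disk while the request is processed. -->
  <httpRuntime maxRequestLength="10240" requestLengthDiskThreshold="256" />
</system.web>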
I understand that you want to check the file that is being uploaded for its content.
If this is your requirement, then why not add a textbox and populate it while you are reading the file from HttpPostedFile?
I am trying to design a system for something like this with ASP.net/C#.
The users pay for downloading some content (files: MP3s, PDFs, DOCs, etc.). I should be able to track the number of bytes downloaded by the user. If the number of bytes downloaded matches the number of bytes on the server, I should set a flag in the DB (indicating that the download was successful, and preventing them from downloading the file again / asking them to pay for the download again). If the download was incomplete, they should be able to download the file again without paying for it (since the flag will not be set).
Is there any way to keep track of the number of bytes successfully downloaded by the client ?
Also, when I look at a file size on my WinXP machine, I see two sizes (size, size on disk). Which one should I consider? And will it differ from one OS to another?
You can easily measure data passed to the client in ASP.NET assuming you replace a direct IIS-controlled download with your own, which would go something like this:
// Declarations implied by the original snippet; ReadFileChunkAsByteArrayWithOffsetOrWhatever
// stands in for however you read the next chunk of the file.
int bufferSize = 64 * 1024;
byte[] buffer = new byte[bufferSize];
long offset = 0;
int bytesRead;

while (context.Response.IsClientConnected) {
    bytesRead = ReadFileChunkAsByteArrayWithOffsetOrWhatever(buffer, offset);
    context.Response.OutputStream.Write(buffer, 0, bytesRead);
    context.Response.Flush();
    offset += bytesRead;
    if (bytesRead != bufferSize)
        break;
}
It's complicated to make this 100% reliable from within ASP, but it can be done. You pretty much have to account for every possible failure point and react accordingly.
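For instance, one way to account for the end state after the send loop above; fileLength, userId, fileId, and MarkDownloadComplete are hypothetical placeholders for your own file info and data access:

// Only flag the purchase as consumed if every byte went out and the client
// was still connected when we finished.
bool sentEverything = (offset == fileLength) && context.Response.IsClientConnected;
if (sentEverything)
{
    MarkDownloadComplete(userId, fileId);   // hypothetical DB call to set the flag
}
// Otherwise leave the flag unset so the user can retry without paying again.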
The problem though is still - as someone mentioned above - that it's impossible to know that the client received the data. If money is involved in this transaction, that can get to be a problem really quickly.
For that reason, the best approach would be to use a custom downloader client, like the one Amazon uses for MP3 file purchases. That way you're not subjecting either yourself or your customers to the vagaries of moving monetized bits over something as unreliable as HTTP.
You can create an ASP.NET handler that serves the file (for ASP.NET MVC you can use an action result instead; this is what I'm using). Make sure it supports resumable downloads.
From there you can track the bytes served.
P.S. This incurs a performance overhead vs. letting IIS serve it.
Update 1: I used something pretty similar to this http://dotnetslackers.com/articles/aspnet/Range-Specific-Requests-in-ASP-NET.aspx ... and the article has a pretty clear explanation of what's inside it. You can probably use that one as is; see the example in that post.
You could try looking into HTTP response codes (e.g. 200, 404, etc.) - the client and server will be exchanging HTTP headers so that they know what's going on - you should be able to monitor these to see if the responses were successful (not sure, but you should be able to).
With regards to file size - I would try experiments on files with 'known' sizes and compare what the HTTP logs tell you with what the file explorer tells you.
Also, I've seen tools/widgets that report file upload progress - so you're right, you should be able to do the same in reverse, I guess. You could try looking at file upload code examples and tutorials - you might get some hints. I can't think of any off the top of my head - sorry.
To do custom byte serving like this, you will need to implement your own http handler.
This handler should do the following:
Implement some kind of authentication on the http handler, so you know who you are dealing with.
Then you will need to implement some kind of logging for files requested and files allowed to be downloaded.
Implement etags and expires headers for client side caching.
Server side caching
Deflate, gzip compression
If you want to support resumable downloads, you will need to implement 206 partial responses. This is essential for any kind of streaming and serving PDFs (a rough sketch follows the header list below).
So you should be handling the following http headers:
ETag
Expires
Accept-Ranges
Range
If-Range
Last-Modified
If-Match
If-None-Match
If-Modified-Since
If-Unmodified-Since
Unless-Modified-Since
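As a rough illustration of the 206/Range part only (single range, no If-Range or ETag validation, and error handling omitted; the file path is a placeholder):

using System;
using System.IO;
using System.Web;

public class RangeFileHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        var file = new FileInfo(context.Server.MapPath("~/files/sample.pdf")); // placeholder path
        long start = 0, end = file.Length - 1;

        string range = context.Request.Headers["Range"]; // e.g. "bytes=0-1023"
        if (!string.IsNullOrEmpty(range) && range.StartsWith("bytes="))
        {
            string[] parts = range.Substring(6).Split('-');
            if (parts[0].Length > 0) start = long.Parse(parts[0]);
            if (parts.Length > 1 && parts[1].Length > 0) end = long.Parse(parts[1]);

            context.Response.StatusCode = 206; // Partial Content
            context.Response.AppendHeader("Content-Range",
                string.Format("bytes {0}-{1}/{2}", start, end, file.Length));
        }

        context.Response.AppendHeader("Accept-Ranges", "bytes");
        context.Response.AppendHeader("Content-Length", (end - start + 1).ToString());
        context.Response.ContentType = "application/octet-stream";

        using (FileStream fs = file.OpenRead())
        {
            fs.Seek(start, SeekOrigin.Begin);
            byte[] buffer = new byte[64 * 1024];
            long remaining = end - start + 1;
            while (remaining > 0 && context.Response.IsClientConnected)
            {
                int read = fs.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (read <= 0) break;
                context.Response.OutputStream.Write(buffer, 0, read);
                context.Response.Flush();
                remaining -= read;
            }
        }
    }

    public bool IsReusable { get { return true; } }
}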
If you are looking for sample implementations of HTTP handlers, check out:
http://code.google.com/p/talifun-web/wiki
It has a static file handler that implements all the above http headers, client side and server side caching and even compression.
There is also a log module and an authorization module that should go a long way into how to implement authentication and logging.
The size you want is the size (not the size on disk). Size on disk includes extra space that is taken up by fitting into the 4K block size of the partition. The size is the exact number of bytes in the file.
I don't believe there is a good way to tell that a download has been completed. Response.TransmitFile is probably the best method for sending the file securely, but I don't believe it has anything that will tell you whether the user actually received the file.
I don't know about the business this is supporting, but I can't think of a legitimate business where users would tolerate a single-download-per-purchase model, and the ambiguity of the standard HTTP request/response model does not lend itself to building an accurate client-side receiver. Not to mention this model could be easily defeated by sending a failed response on receipt of the last packet.
I think using something like a download window (e.g. 2 hours after purchase) and then locking it to an IP after the first request would accomplish the same result with a lot fewer user issues and support calls. Also, unless the file has some sort of stringent DRM, allowing the user persistent access based on their login is most likely the appropriate business model, because once they get the file they can copy it as many times as they like.
Look at DVD or Blu-Ray, no amount of copy protection or access controls will save your files from pirates, so make things easy for legitimate users.