Does multipart/form-data send the whole file data in one go or in a stream? - http

I have a requirement to upload a large file over HTTP to a remote server.
I am researching how to send the data using multipart/form-data.
I have gone through How does HTTP file upload work? and understood how it separates the file data using boundaries.
I wanted to know whether all the file data is sent in one go or is streamed with several requests to the remote server.
Because if it is sent in one go, it is not possible to read the whole data at the remote server and write it to a file.
But if it is streamed, how does the remote server parse the streamed data, write it to a file, and repeat until all the data has been streamed?
Sorry if this is a noob question; I am still researching it myself.
Maybe it is outside the scope of multipart/form-data and HTTP itself takes care of it.
Any help is appreciated.

The logistics of the sending are not relevant. What matters is the maximum request size that is set on the server side. How it is set depends on the technology used there: IIS, Apache, nginx? If the browser's POST request exceeds that size (because the file is too large), errors will happen. There is nothing on the browser side you can tweak or change to fix breaking uploads. Unless you are building your own browser :-)
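To add a bit of detail: the whole multipart/form-data body is sent as one HTTP request, but that request travels over TCP as a byte stream, so the receiving side can read and parse it incrementally rather than holding the whole file in memory. Here is a rough sketch of the server side, assuming Flask/Werkzeug with a made-up /upload route, a form field named 'file' and an arbitrary output path; note that Werkzeug may spool a large part to a temporary file before request.files is populated, so this mainly illustrates that the server never needs the whole file in RAM at once:

    from flask import Flask, request

    app = Flask(__name__)

    @app.route('/upload', methods=['POST'])
    def upload():
        f = request.files['file']                  # the parsed multipart part (a FileStorage)
        with open('/tmp/received.bin', 'wb') as out:
            while True:
                chunk = f.stream.read(64 * 1024)   # read the part 64 KB at a time
                if not chunk:
                    break
                out.write(chunk)
        return 'ok', 200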

Related

Does HTTP multipart/form-data provide reliability guarantees?

I have a React front-end and Flask back-end web application. In this web app, I upload large CSV files from the client to the server via HTTP multipart/form-data. To achieve this, I take the file information in a <form encType='multipart/form-data'> element, with <input type='file'>. Then I use axios.post to make a POST request to the server.
On the Flask server side, I access the file using request.files['file'] and save it using file.save. This works as expected. The file is transferred successfully.
I'm thinking of computing an MD5 checksum on both the client and server side in order to make sure that both sides have files with the same MD5 hash. However, this requires reading the file in chunks from disk and computing the MD5 (since I'm dealing with large files, it is not possible to load the entire file in memory). So I think this is a little inefficient. I want to know whether this transfer via HTTP multipart/form-data provides any reliability guarantee. If so, can I skip the MD5 verification?
If reliability is not guaranteed, is there any good approach to make sure that both sides have exact same file copy? Thanks in advance.
HTTP integrity is only as reliable as the underlying transport protocol, be it TCP (HTTP/1 and 2) or UDP (HTTP/3). Bits can flip in transit and still yield a valid transport-level checksum. This does happen.
If you want to make absolutely sure that you've received the same file as the uploader intended, you need to add a checksum yourself, using for example SHA or MD5.
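If you do add a checksum, the chunked reading you're worried about is cheap compared to the transfer itself. A minimal sketch in Python, assuming the file is already on disk at `path`; the same routine can run on the Flask side after file.save, while the browser side would compute the equivalent digest in JavaScript before uploading:

    import hashlib

    def md5_of_file(path, chunk_size=1 << 20):
        h = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                h.update(chunk)        # hash 1 MB at a time, never the whole file in memory
        return h.hexdigest()

The client could then send its digest alongside the upload (for example in an extra form field) and the server compares it against md5_of_file(saved_path) after saving.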

Optimizing file synchronization over HTTP

I'm attempting to synchronize a set of files over HTTP.
For the moment, I'm using HTTP PUT, and sending files that have been altered. However, this is very inefficient when synchronizing large files where the delta is very small.
I'd like to do something closer to what rsync does to transmit the deltas, but I'm wondering what the best approach to do this would be.
I know I could use an rsync library on both ends, and wrap their communication over HTTP, but this sounds more like an antipattern; tunneling a standalone protocol over HTTP. I'd like to do something that's more in line with how HTTP works, and not wrap binary data (except my files, duh) in an HTTP request/response.
I've also failed to find any relevant/useful functionality already implemented in WebDAV.
I have total control over the client and server implementation, since this is a desktop-ish application (meaning "I don't need to worry about browser compatibility").
The HTTP PATCH recommended in a comment requires the client to keep track of local changes. You may not be able to do that due to the size of the file.
Alternatively you could treat "chunks" of the huge file as resources: depending on the nature of the changes and the content of the file it could be by bytes, chapters, whatever.
The client could query the hash of all chunks, calculate the same for the local version, and PUT only the changed ones.
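A rough sketch of that chunks-as-resources idea in Python; the chunk size, the GET .../hashes endpoint and the PUT .../chunks/<i> endpoint are all made up for illustration:

    import hashlib
    import requests

    CHUNK = 4 * 1024 * 1024   # fixed 4 MB chunks

    def local_hashes(path):
        hashes = []
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(CHUNK), b''):
                hashes.append(hashlib.sha256(chunk).hexdigest())
        return hashes

    def sync(path, base_url):
        remote = requests.get(base_url + '/hashes').json()     # server's per-chunk digests
        with open(path, 'rb') as f:
            for i, digest in enumerate(local_hashes(path)):
                if i >= len(remote) or remote[i] != digest:     # new or changed chunk
                    f.seek(i * CHUNK)
                    requests.put(f'{base_url}/chunks/{i}', data=f.read(CHUNK))

Note that fixed-size chunks only catch in-place edits; an insertion shifts every later chunk, which is exactly the case rolling-hash schemes like rsync's are designed for.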

With NodeJS, what's the best way to parse a file upload that does not necessarily end?

Short summary: how do I accept content that may be endless and not uploaded at once (the connection needs to be kept alive), in a scenario where I'm the server and I'd like the clients to make those uploads in a RESTful (or something close) way?
In the same way that I can make an HTTP server that keeps the connection alive with a client and may continue sending content that the client reads and parses instantly (probably using a browser), I need to keep a connection open with a client that will send me data that may not end or may be continuously uploaded.
One (simple) way to do this would be to have a TCP server and let clients write data to a socket.
But how do I do this with an HTTP PUT request? This answers half of the question: "How will I parse a file upload continuously, without the upload finishing?" But how would clients upload something that is not even a file but rather separate blocks of data, as if they were writing those blocks to a socket? Is it even possible?
If your data isn't going to have a discrete end, then you're not really performing an upload; you're doing a streaming scenario. For a streaming scenario, socket handling is much more appropriate.
First I think Sonier is right. But I found this solution by Felix called "Streaming file uploads with node.js" which might be useful.
Furthermore I think node.js might not be the best fit for this, because everything has to be kept in memory, and with big file sizes you can hit a very hard wall. Some other popular node.js file upload solutions are:
https://github.com/felixge/node-formidable
https://github.com/rootslab/formaline
https://github.com/FooBarWidget/multipart-parser
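To illustrate the idea of consuming a never-ending PUT body as it arrives (sketched here in Python for brevity rather than Node, where the request object similarly hands you data as it comes in): the client uses Transfer-Encoding: chunked and the server reads blocks off the request stream without waiting for the end. Whether a truly endless body works in practice depends on the server not buffering the whole request first; the /ingest route and handle_block below are placeholders:

    from flask import Flask, request

    app = Flask(__name__)

    @app.route('/ingest', methods=['PUT'])
    def ingest():
        while True:
            block = request.stream.read(8192)   # returns b'' only once the sender stops
            if not block:
                break
            handle_block(block)                 # hypothetical per-block processing
        return 'done', 200

    def handle_block(block):
        pass   # e.g. append to a file, push onto a queue, ...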

Is there anything in the FTP protocol like the HTTP Range header?

Suppose I want to transfer just a portion of a file over FTP - is it possible using a standard FTP protocol?
In HTTP I could use a Range header in the request to specify the byte range of the remote resource. If it's a 1 MB file, I could ask for the bytes from 600 KB to 700 KB.
Is there anything like that in FTP? I am reading the FTP RFC, don't see anything, but want to make sure I'm not missing anything.
There's a Restart command in FTP - would that work?
Addendum
After getting Brian Bondy's answer below, I wrote a read-only Stream class that wraps FTP. It supports Seek() and Read() operations on a resource that is read via FTP, based on the REST verb.
Find it at http://cheeso.members.winisp.net/srcview.aspx?dir=streams&file=FtpReadStream.cs
It's pretty slow to Seek(), because setting up the data socket takes a long time. Best results come when you wrap that stream in a BufferedStream.
Yes you can use the REST command.
REST sets the point at which a subsequent file transfer should start. It is usually used for restarting interrupted transfers. The command must come right before a RETR or STOR, and so comes after a PORT or PASV.
From FTP's RFC 959:
RESTART (REST) The argument field represents the server marker at which file transfer is to be restarted. This command does not cause file transfer but skips over the file to the specified data checkpoint. This command shall be immediately followed by the appropriate FTP service command which shall cause file transfer to resume.
Read more:
http://www.faqs.org/rfcs/rfc959.html#ixzz0jZp8azux
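For completeness, here is a minimal sketch using Python's standard ftplib, which exposes REST through the rest argument of retrbinary(); the host, credentials and paths are placeholders:

    from ftplib import FTP

    def download_from_offset(host, user, password, remote_path, local_path, start):
        ftp = FTP(host)
        ftp.login(user, password)
        with open(local_path, 'wb') as out:
            # ftplib sends "REST <start>" right before RETR, so the transfer begins
            # at byte offset `start`. FTP has no end offset, so to stop early you
            # would have to close the data connection yourself.
            ftp.retrbinary(f'RETR {remote_path}', out.write, rest=start)
        ftp.quit()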
You should check out how GridFTP does parallel transfers. That's using the sort of techniques that you want (and might actually be code that it is better to borrow rather than implementing from scratch yourself).

Understanding REST: REST as a high volume transport?

I'm designing a system that will need to move multi-GB backup images over TCP, and I'm looking at REST as an alternative to ONC RPC.
For example, I might have
POST http://site/backups/image1
where image1 is a 50 GB file whose data is contained in the HTTP body.
My question: is this within the scope of what REST is meant for? Is it inappropriate to move massive files over HTTP? My preliminary testing shows that the performance isn't too bad, and I like the clean, debuggable protocol, as opposed to a custom ONC RPC server. But is this overloading the role of a webserver?
Thanks,
-Steve
HTTP has about the same overheads as FTP.
An HTTP server is often asked to do more work than an FTP server. But otherwise, using HTTP to send a large file is about the same as using FTP.
The only consideration is making sure your web server and web application framework are configured to do this kind of thing without needlessly expanding the entire 50 GB file inside Apache.
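The same concern applies on the sending side: stream the body from disk instead of reading 50 GB into memory. As a sketch with the Python requests library (the URL is the one from the question; the content type is just a placeholder), passing an open file object as the body makes requests stream it from disk:

    import requests

    with open('image1', 'rb') as f:
        resp = requests.post('http://site/backups/image1', data=f,
                             headers={'Content-Type': 'application/octet-stream'})
    print(resp.status_code)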
Steve,
HTTP has a look-before-you-leap 'feature' that allows the client to ask the server whether it will accept the data submission before it actually sends out the data. I'd look into using this to avoid transferring GBs of data only to find out that the server is currently not willing to handle them. Look at the HTTP Expect header and 100 Continue status codes.
Also, you can use FTP within a RESTful approach, IOW, think along the lines of
<backup-store href="ftp://example.org/site/backup/images/"/>
and make your clients understand the ftp URI scheme.
Finally, the T in HTTP means transfer and not transport - an important distinction to make, because the former is an application semantic (HTTP is an application protocol) and the latter is not.
HTH,
Jan
REST has nothing to do with how large your data is or which method you use to transport it.
