We realized that if we want to build a multipart request containing a 15 GB video file, it is impossible to allocate that much memory; most devices have only 2 or 3 GB of RAM.
It is therefore necessary to switch to the uploadTask method, which pushes the file contents to the server block by block, each block no larger than the maximum size allowed by the packets sent to the server.
This is a POST request. However, it does not carry parameters such as the folder id or the file name, so you need another way to transmit them. The simplest is to encode them in the URL.
I proposed encoding them as a path appended to the API endpoint, but the two parameters can just as well be passed as a classic query string, e.g.:
/api/upload?id=123&filename=video.mp4
From what I read on Stack Overflow, retrieving id and filename with Symfony is trivial. The data received in the body of the POST request can then be written raw directly to a file, without passing through a buffer in server-side memory.
User data must always be streamed, on the mobile side as well as the server side, for uploads as well as downloads. Loading user content into memory is also dangerous from a security standpoint.
How can I do that in Symfony?
This goes way beyond Symfony and depends on the web server you are using.
By default, with Apache/Nginx and PHP you receive an already buffered request, so you cannot stream it to a file.
However, there are solutions; for example, with Apache you can stream requests, see http://hc.apache.org/httpclient-3.x/performance.html#Request_Response_entity_streaming
Nginx probably has options for this as well, but I don't know them.
Another option might be WebSockets, see http://en.wikipedia.org/wiki/WebSocket
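The answer above is PHP-specific. Purely to illustrate the streaming idea described in the question, here is a minimal sketch in Go (the /api/upload route, the id and filename parameters, and the /tmp/uploads directory are taken from the question or invented for the example):

    package main

    import (
        "io"
        "log"
        "net/http"
        "os"
        "path/filepath"
    )

    const uploadDir = "/tmp/uploads" // assumed destination directory

    func uploadHandler(w http.ResponseWriter, r *http.Request) {
        id := r.URL.Query().Get("id")
        filename := r.URL.Query().Get("filename")
        if id == "" || filename == "" {
            http.Error(w, "missing id or filename", http.StatusBadRequest)
            return
        }
        // filepath.Base guards against path traversal in the client-supplied name.
        dst, err := os.Create(filepath.Join(uploadDir, id+"_"+filepath.Base(filename)))
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer dst.Close()
        // io.Copy moves the body to disk through a small fixed-size buffer,
        // so a 15 GB upload never has to fit in memory.
        if _, err := io.Copy(dst, r.Body); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.WriteHeader(http.StatusCreated)
    }

    func main() {
        os.MkdirAll(uploadDir, 0o755)
        http.HandleFunc("/api/upload", uploadHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }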
There is probably an answer within reach, but most of the search results are either "handling large file uploads", where the asker does not know what they're doing, or "handling many uploads", where the answer is consistently just an explanation of how to work with multipart requests and/or Flash uploader widgets.
I haven't had time to sift through Go's HTTP implementation yet, but when does the application first get a chance to see the incoming body? Not until it has been completely received?
If I were to [poorly] decide to use HTTP to transfer a large amount of data and posted a single request with several 10-gigabyte parts, would I have to wait for the whole thing to be received before processing it, or can I process it iteratively through the body's io.Reader?
This is only tangentially related, but I also haven't been able to get a clear answer on whether I can forcibly close the connection midway, or whether, even if I close it, the data will just keep arriving on the port.
Thanks so much.
An application's handler is called after the headers are parsed and before the request body is read. The handler can read the request body as soon as the handler is called. The server does not buffer the entire request body.
An application can read file uploads without buffering the entire request by getting a multipart reader and iterating through the parts.
An application can replace the request body with a MaxBytesReader to force close the connection after a specified limit is breached.
The above comments are about the net/http server included in the standard library. The comments may not apply to other servers.
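A minimal sketch tying these three points together with the net/http server (the 20 GB limit, the /upload route, and writing parts to the temp directory are arbitrary choices for the example):

    package main

    import (
        "io"
        "log"
        "net/http"
        "os"
        "path/filepath"
    )

    func uploadHandler(w http.ResponseWriter, r *http.Request) {
        // Point 3: MaxBytesReader makes the server stop reading (and close the
        // connection) once the body exceeds the limit, 20 GB here, arbitrarily.
        r.Body = http.MaxBytesReader(w, r.Body, 20<<30)

        // Point 2: MultipartReader returns a streaming reader over the parts;
        // unlike r.ParseMultipartForm, nothing is buffered behind your back.
        mr, err := r.MultipartReader()
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        for {
            part, err := mr.NextPart()
            if err == io.EOF {
                break
            }
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            if part.FileName() == "" {
                continue // skip non-file fields in this sketch
            }
            dst, err := os.Create(filepath.Join(os.TempDir(), filepath.Base(part.FileName())))
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            // Point 1: the handler runs while the body is still arriving, so this
            // copy processes each multi-gigabyte part as it streams in.
            if _, err := io.Copy(dst, part); err != nil {
                dst.Close()
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            dst.Close()
        }
        w.WriteHeader(http.StatusOK)
    }

    func main() {
        http.HandleFunc("/upload", uploadHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }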
While I haven't done this with GB-sized files, my strategy for file processing (mostly stuff I read from and write to S3) is to use https://golang.org/pkg/os/exec/ with a command-line utility that handles chunking the way you like, and then read and process by tailing the file, as explained here: Reading log files as they're updated in Go
In my situations, network utilities can download the data far faster than my code can process it, so it makes sense to send it to disk and pick it up as fast as I can; that way I'm not holding a connection open while I process.
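A rough sketch of that approach, assuming a hypothetical fetch-tool command-line downloader that writes to its output file in place; the tail loop simply re-reads after hitting EOF until the downloader exits:

    package main

    import (
        "io"
        "log"
        "os"
        "os/exec"
        "time"
    )

    func process(chunk []byte) {
        // Placeholder for the real per-chunk processing.
        _ = chunk
    }

    func main() {
        const path = "/tmp/data.bin" // hypothetical download target

        // Let an external utility handle the transfer; "fetch-tool" is a placeholder
        // for whatever CLI you prefer (aws s3 cp, curl, ...).
        cmd := exec.Command("fetch-tool", "--output", path, "https://example.com/big-object")
        if err := cmd.Start(); err != nil {
            log.Fatal(err)
        }
        done := make(chan error, 1)
        go func() { done <- cmd.Wait() }()

        // Wait for the downloader to create the file, then tail it.
        var f *os.File
        for {
            var err error
            if f, err = os.Open(path); err == nil {
                break
            } else if !os.IsNotExist(err) {
                log.Fatal(err)
            }
            time.Sleep(100 * time.Millisecond)
        }
        defer f.Close()

        buf := make([]byte, 1<<20) // read in 1 MB chunks
        finished := false
        for {
            n, err := f.Read(buf)
            if n > 0 {
                process(buf[:n])
            }
            if err == io.EOF {
                if finished {
                    break // downloader has exited and the file is drained
                }
                // No new bytes yet: either the downloader just finished or we poll again shortly.
                select {
                case derr := <-done:
                    if derr != nil {
                        log.Fatal(derr)
                    }
                    finished = true
                case <-time.After(200 * time.Millisecond):
                }
                continue
            }
            if err != nil {
                log.Fatal(err)
            }
        }
    }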
We have an application that uses base64-encoded content to transmit attachments to the backend. The backend then moves the content to Storage after some manipulation. This way we get world-class offline support and sync, while still using the much cheaper Storage to hold the files in the end.
Initially we used updateChildren to set the content in one go. This worked fairly well, but then users started to upload more and bigger files at the same time, which silently froze the database on end-user devices.
We then changed the code to write the files one by one using FirebaseDatabase.getInstance().getReference("/full/uri").setValue(base64stuff), and then to set only the metadata with updateChildren.
This allowed a seemingly endless number of files (provided each is chopped into chunks of at most 9 MB), but now we're facing another problem.
Our backend uses a Firebase listener to start working once new content is available. The trigger waits for the metadata and then starts to process the attachments. It seems that even though the client device writes the files before we set the metadata, the backend usually receives the metadata before the content of the files is available. This forced us to change the backend code to stop processing and check again later whether the attachment base64 data is available.
This works, but it is not elegant; it wastes CPU cycles and increases latency.
I haven't found anything in the docs about whether Firebase guarantees anything about the order in which the data is received by the backend. It does seem that everything written in one go (using setValue or updateChildren) becomes available to the backend as one atomic unit.
Is this correct? Can I depend on that as a fact that will not change in the future?
The way I'm going to go about this (if the assumptions above are correct) is to first write the metadata using updateChildren in the client, like this:
"/uri/of/metadata/uid/attachments/attachment_uid1" = "per attachment metadata"
"/uri/of/metadata/uid/attachments/attachment_uid2" = "per attachment metadata"
and then write each base64 chunk using updateChildren with the following payload:
"/uri/of/metadata/uid/uploaded_attachments/attachment_uid2" = true
"/uri/of/base64/content/attachment_uid" = "base64content"
I can't use setValue for any of this data, to prevent accidental overwrites that depend on the order in which the writes eventually happen.
This would allow me to listen to /uri/of/base64/content and try to start handling the metadata package every time a new attachment finishes loading. The only thing needed to determine whether all files have been uploaded is to grab the metadata and check that every attachment uid found under /attachments/ is also present under /uploaded_attachments/.
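If it helps to see the two payloads concretely, here is a rough sketch using the Firebase Admin Go SDK rather than the Android client (the paths and values are the hypothetical ones above; the SDK setup and database URL are assumptions, and whether the writer is a client or a server does not change the payload shape):

    package main

    import (
        "context"
        "log"

        firebase "firebase.google.com/go/v4"
    )

    func main() {
        ctx := context.Background()

        // Assumed setup: credentials come from the environment; the URL is a placeholder.
        app, err := firebase.NewApp(ctx, &firebase.Config{DatabaseURL: "https://example-project.firebaseio.com"})
        if err != nil {
            log.Fatal(err)
        }
        client, err := app.Database(ctx)
        if err != nil {
            log.Fatal(err)
        }
        root := client.NewRef("/")

        // Step 1: write the per-attachment metadata in one multi-path update
        // (the equivalent of updateChildren on the client).
        meta := map[string]interface{}{
            "uri/of/metadata/uid/attachments/attachment_uid1": "per attachment metadata",
            "uri/of/metadata/uid/attachments/attachment_uid2": "per attachment metadata",
        }
        if err := root.Update(ctx, meta); err != nil {
            log.Fatal(err)
        }

        // Step 2: for each attachment, write the base64 content and its "uploaded" flag
        // together, so a listener never sees one without the other.
        chunk := map[string]interface{}{
            "uri/of/metadata/uid/uploaded_attachments/attachment_uid2": true,
            "uri/of/base64/content/attachment_uid2":                    "base64content",
        }
        if err := root.Update(ctx, chunk); err != nil {
            log.Fatal(err)
        }
    }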
Writes from a single Firebase Database client are delivered to the server in the same order as they are executed on the client. They are also broadcast out to any listening clients in the same order.
There is no chance that another client will see the results of write B without seeing the results of write A (unless A was rejected by security rules).
I have an ASP.NET web application and I want my users to be able to upload large files. However, some files are very large and use too much memory.
In principle it should be possible to receive the request stream and write it directly to a FileStream, removing any need to load the entire file into memory first.
I've tried accessing Request.InputStream and writing it directly to a file. It works, but a test using larger files reveals that Request.InputStream is only available after the entire request has already been loaded into memory.
Can someone tell me an approach I can use to receive a normal Request.InputStream in ASP.NET and directly write it to a file without first loading it into memory?
Note, the file is sent through a normal request in a browser by posting a form with a file field.
(I actually use blueimp's jQuery File Upload, but I don't think it's relevant to this question.)
The process is called byte serving.
Byte Serving:
Byte serving is the process of sending only a portion of an HTTP/1.1 message from a server to a client. Byte serving begins when an HTTP server advertises its willingness to serve partial requests using the Accept-Ranges response header. A client then requests a specific part of a file from the server using the Range request header.
It seems that IIS and ASP.NET are capable of handling Range headers. There is a Range controller in Microsoft's git repositories.
Here is an article that may be useful in configuring IIS to handle these requests.
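Setting ASP.NET aside for a moment, the mechanics of byte serving are easy to see in a small sketch; this one is in Go purely for illustration, where http.ServeContent does the Accept-Ranges / Range / If-Range negotiation for you (the file path and route are made up):

    package main

    import (
        "log"
        "net/http"
        "os"
    )

    func download(w http.ResponseWriter, r *http.Request) {
        f, err := os.Open("/var/files/video.mp4") // hypothetical content
        if err != nil {
            http.Error(w, "not found", http.StatusNotFound)
            return
        }
        defer f.Close()
        fi, err := f.Stat()
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        // ServeContent advertises Accept-Ranges, parses the Range and If-Range
        // headers, and answers partial requests with 206 and a Content-Range header.
        http.ServeContent(w, r, fi.Name(), fi.ModTime(), f)
    }

    func main() {
        http.HandleFunc("/download", download)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }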
I am trying to design a system for something like this with ASP.NET/C#.
Users pay to download content (files: MP3s, PDFs, DOCs, etc.). I should be able to track the number of bytes downloaded by the user. If the number of bytes downloaded matches the number of bytes on the server, I should set a flag in the DB (indicating that the download was successful and preventing them from downloading the file again without paying). If the download was incomplete, they should be able to download the file again without paying (since the flag will not be set).
Is there any way to keep track of the number of bytes successfully downloaded by the client ?
Also, when I look at a file's size on my WinXP machine, I see two sizes (size, and size on disk). Which one should I consider? And will it differ from one OS to another?
You can easily measure data passed to the client in ASP.NET assuming you replace a direct IIS-controlled download with your own, which would go something like this:
const int bufferSize = 64 * 1024;   // send the file in 64 KB chunks
var buffer = new byte[bufferSize];
long offset = 0;
int bytesRead;
while (context.Response.IsClientConnected) {
    // Hypothetical helper: reads up to bufferSize bytes of the file at the given offset.
    bytesRead = ReadFileChunkAsByteArrayWithOffsetOrWhatever(buffer, offset);
    context.Response.OutputStream.Write(buffer, 0, bytesRead);
    context.Response.Flush();        // push this chunk to the client before reading the next one
    offset += bytesRead;
    if (bytesRead != bufferSize)     // a short read means the end of the file was reached
        break;
}
It's complicated to make this 100% reliable from within ASP, but it can be done. You pretty much have to account for every possible failure point and react accordingly.
The problem though is still - as someone mentioned above - that it's impossible to know that the client received the data. If money is involved in this transaction, that can get to be a problem really quickly.
For that reason, the best approach would be to use a custom downloader client, like the one Amazon uses for MP3 file purchases. That way you're not subjecting either yourself or your customers to the vagaries of moving monetized bits over something as unreliable as HTTP.
You can create an ASP.NET handler that serves the file (for ASP.NET MVC you can use an action result instead; this is what I'm using). Make sure it supports resumable downloads.
From there you can track the bytes served.
P.S. This incurs a performance overhead compared to letting IIS serve the file.
Update 1: I used something pretty similar to this: http://dotnetslackers.com/articles/aspnet/Range-Specific-Requests-in-ASP-NET.aspx and the article has a pretty clear explanation of what's inside it. You can probably use it as is; see the example in that post.
You could try looking into HTTP response codes (e.g. 200, 404, etc.). The client and server exchange HTTP headers so that they know what's going on, and you should be able to monitor these to see whether the responses were successful (not sure, but you should be able to).
With regard to file size, I would experiment on files with known sizes and compare what the HTTP logs tell you with what the file explorer tells you.
Also, I've seen tools/widgets that report file upload progress, so you're right, you should be able to do the same in reverse, I guess. You could try looking at file upload code examples and tutorials; you might get some hints. I can't think of any off the top of my head, sorry.
To do custom byte serving like this, you will need to implement your own HTTP handler.
This handler should do the following:
Implement some kind of authentication on the HTTP handler, so you know who you are dealing with.
Then you will need to implement some kind of logging for files requested and files allowed to be downloaded.
Implement ETag and Expires headers for client-side caching.
Server-side caching
Deflate/gzip compression
If you want to support resumable downloads, you will need to implement 206 partial responses. This is essential for any kind of streaming and for serving PDFs.
So you should handle the following HTTP headers (a short sketch follows the list):
ETag
Expires
Accept-Ranges
Range
If-Range
Last-Modified
If-Match
If-None-Match
If-Modified-Since
If-Unmodified-Since
Unless-Modified-Since
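As in the earlier sketch, here is a non-ASP.NET illustration in Go of how several of those headers fit together: ETag, Expires, and Cache-Control are set by hand, and http.ServeContent then answers the conditional requests (If-None-Match, If-Modified-Since, If-Range) and Range requests (the ETag scheme, file path, and route are made up for the example):

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "os"
        "time"
    )

    func serveFile(w http.ResponseWriter, r *http.Request) {
        f, err := os.Open("/var/files/report.pdf") // hypothetical static file
        if err != nil {
            http.Error(w, "not found", http.StatusNotFound)
            return
        }
        defer f.Close()
        fi, err := f.Stat()
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        // A simple ETag derived from size and mtime, good enough for a sketch.
        w.Header().Set("ETag", fmt.Sprintf(`"%x-%x"`, fi.Size(), fi.ModTime().UnixNano()))
        w.Header().Set("Expires", time.Now().Add(24*time.Hour).UTC().Format(http.TimeFormat))
        w.Header().Set("Cache-Control", "public, max-age=86400")

        // With the ETag and modification time in place, ServeContent answers
        // If-None-Match / If-Modified-Since with 304, and Range / If-Range with 206.
        http.ServeContent(w, r, fi.Name(), fi.ModTime(), f)
    }

    func main() {
        http.HandleFunc("/files/report.pdf", serveFile)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }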
If you are looking for sample implementations of HTTP handlers, check out:
http://code.google.com/p/talifun-web/wiki
It has a static file handler that implements all of the above HTTP headers, client-side and server-side caching, and even compression.
There is also a log module and an authorization module that should go a long way toward showing how to implement authentication and logging.
The size you want is the size (not the size on disk). Size on disk includes the extra space taken up by fitting the file into the partition's 4 KB block size. The size is the exact number of bytes in the file.
I don't believe there is a good way to tell that a download has completed. Response.TransmitFile is probably the best method for sending the file securely, but I don't believe it has anything that will tell you whether the user actually received the file.
I don't know about the business this is supporting, but I can't think of a legitimate business where users would tolerate a single-download-per-purchase model, and the ambiguity of the standard HTTP request/response model does not lend itself to building an accurate client-side receiver. Not to mention that this model could easily be hacked by sending a failed response on receipt of the last packet.
I think using something like a download window (e.g. 2 hours after purchase) and then locking it to an IP after the first request would accomplish the same result with a lot fewer user issues and support calls. Also, unless the file has some sort of stringent DRM, allowing the user persistent access based on their login is most likely the appropriate business model, because once they get the file they can copy it as many times as they like.
Look at DVD or Blu-Ray, no amount of copy protection or access controls will save your files from pirates, so make things easy for legitimate users.
Scenario:
localhost receives the current HttpRequest with 3 hidden inputs and a posted file. I must then forward this form data to an external image host and get the response.
See the System.Net.WebClient and related classes. You can use them to create a request to the remote server and handle the response. Also get Fiddler to help you replicate what the browser sends.
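For comparison, here is a sketch of the same relay flow in Go instead of System.Net.WebClient; the field names ("key", "album", "title", "image") and the image-host URL are made up, and the rebuilt body is buffered in memory for simplicity (an io.Pipe would avoid that for large files):

    package main

    import (
        "bytes"
        "io"
        "log"
        "mime/multipart"
        "net/http"
    )

    func relay(w http.ResponseWriter, r *http.Request) {
        // Parse the incoming form: three hidden inputs plus one posted file.
        if err := r.ParseMultipartForm(32 << 20); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        file, header, err := r.FormFile("image") // "image" is a hypothetical field name
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        defer file.Close()

        // Rebuild an equivalent multipart body for the external host.
        var body bytes.Buffer
        mw := multipart.NewWriter(&body)
        for _, field := range []string{"key", "album", "title"} { // hypothetical hidden inputs
            mw.WriteField(field, r.FormValue(field))
        }
        part, err := mw.CreateFormFile("image", header.Filename)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        if _, err := io.Copy(part, file); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        mw.Close()

        // Forward to the external image host and relay its response to the caller.
        resp, err := http.Post("https://images.example.com/api/upload", mw.FormDataContentType(), &body)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        defer resp.Body.Close()
        w.WriteHeader(resp.StatusCode)
        io.Copy(w, resp.Body)
    }

    func main() {
        http.HandleFunc("/relay", relay)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }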
I hate doing this. It wastes my server's bandwidth and ties up IIS threads, as well as using my server's CPU. It sucks and is worth avoiding at all costs. Many services (one that comes to mind is fliqz) provide a mechanism whereby the files are uploaded directly from the client to their server (bypassing yours), and they then make a request to your server, passing it various info on the query string.