I've written a Camel (2.10) component to do SFTP, as I needed a bit more control over the connection than the out-of-the-box component offers.
I have a route that looks something like this:
from("direct:start")
.to(startProcessor()) //1. Start processor sets the connection parameters for myCustomSftpComp producer
.to("myCustomSftpComp") //2. Uses Jsch, connects to server, gets the file, add to exchange, closes connection
.to(somePostProcessor()) //3. Does something with the file
.to("file://...."); //4. Write the file
This all works perfectly well.
My problem is at step 2. At the moment my files are quite small, so I buffer them into memory, add the byte array to the Exchange body, and it's passed along and processed until it gets written by the file endpoint.
Of course this won't be sustainable with a large file; I need to put the InputStream reference on the exchange instead. The problem is that I close and clean up the connection to the server inside myCustomSftpComp, so by the time the exchange gets to the post-processor and the file endpoint, the stream can no longer be read.
So basically I need some way to keep the connection open until after the file is written, but triggering the close of the server connection inside the component from the route definition sounds untidy, so I'm open to alternative ways of doing this.
I'm not sure why you've written your own SFTP component, as the regular FTP component handles SFTP out of the box.
Passing just the InputStream will still have you holding the content in memory if you are going to do some processing in step 3. In particular, an InputStream can only be read once (StreamCaching can be enabled, but it is memory-consuming).
What the FTP component can do is download the file to a temporary file on local disk and then pass around a File handle to it. From that, you can easily open streams to process the content, as well as write it to a new file once done.
Check this out:
http://camel.apache.org/ftp2.html#FTP2-UsingLocalWorkDirectory
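For illustration, a minimal sketch of that approach (host, credentials, and paths here are placeholders):

from("sftp://user@somehost//inbox?password=secret&localWorkDirectory=/tmp/work")
    .process(somePostProcessor()) //body is now a java.io.File backed by the temp file
    .to("file://target/out");     //the local temp file is moved/copied to the target

With localWorkDirectory set, the consumer downloads into the work directory first and hands the route a File handle instead of an in-memory body, so the server connection can be closed before the rest of the route runs.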
We realized that if we want to send a multipart request containing a 15 GB video file, it is impossible to allocate in memory the space needed for such a large amount of data; most devices have only 2 or 3 GB of RAM.
It is therefore absolutely necessary to switch to the uploadTask method, which pushes the contents of the file to the server in blocks no larger than the maximum size allowed for the IP packets sent to the server.
This is a POST method. However, it does not carry parameters such as the folder id or the file name, so you need a way to transmit them. The best way is to encode them in the URL.
I proposed an encoding format in the form of a path behind the endpoint of the API, but we could also encode these two parameters in the classic way in the URL, e.g.:
/api/upload?id=123&filename=video.mp4
From what I read on Stack Overflow, retrieving id and filename is trivial with Symfony. Then all the data received in the body of the POST request can be written raw, directly into a file, without passing through a buffer in server-side memory.
User data must imperatively be streamed, on both the mobile side and the server side, and for both upload and download. Loading user content into memory is also very dangerous from a security standpoint.
How can I do that in Symfony?
This goes way beyond Symfony and depends on the web server you are using.
By default with Apache/nginx and PHP you will receive an already-buffered request, so you cannot stream it to a file.
However, there are solutions, for example with Apache you can stream requests, see http://hc.apache.org/httpclient-3.x/performance.html#Request_Response_entity_streaming
Probably nginx also has options for it, but I don't know about those.
Another option might be websockets, see http://en.wikipedia.org/wiki/WebSocket
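If your setup does let you read the body as a stream (e.g. a raw, non-multipart POST read from php://input), the controller the asker describes might look roughly like this; the route wiring, target directory, and class name are assumptions:

<?php
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

class UploadController
{
    public function upload(Request $request): Response
    {
        $id = $request->query->get('id');
        $filename = basename($request->query->get('filename')); // guard against path traversal

        // Copy the raw body to disk chunk by chunk; nothing beyond the
        // internal copy buffer is held in PHP memory.
        $in = fopen('php://input', 'rb');
        $out = fopen('/var/uploads/' . $id . '_' . $filename, 'wb');
        stream_copy_to_stream($in, $out);
        fclose($in);
        fclose($out);

        return new Response('', Response::HTTP_CREATED);
    }
}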
There is probably an answer within reach, but most of the search results are either "handling large file uploads", where the user does not know what they're doing, or "handling many uploads", where the answer is consistently just an explanation of how to work with multipart requests and/or Flash uploader widgets.
I haven't had time to sift through Go's HTTP implementation yet, but when does the application get its first chance to see the incoming body? Not until it has been completely received?
If I were to [poorly] decide to use HTTP to transfer a large amount of data and posted a single request with several 10-gigabyte parts, would I have to wait for the whole thing to be received before processing it, or does the io.Reader for the body let me process it iteratively?
This is only tangentially related, but I also haven't been able to get a clear answer on whether I can forcibly close the connection in the middle, and whether, even if I do close it, the server will just keep receiving the data on the port.
Thanks so much.
An application's handler is called after the headers are parsed and before the request body is read. The handler can read the request body as soon as the handler is called. The server does not buffer the entire request body.
An application can read file uploads without buffering the entire request by getting a multipart reader and iterating through the parts.
An application can replace the request body with a MaxBytesReader to force close the connection after a specified limit is breached.
The above comments are about the net/http server included in the standard library. The comments may not apply to other servers.
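To make those points concrete, here is a minimal sketch of a net/http handler that combines them; the URL path, size cap, and destination directory are placeholders:

package main

import (
	"io"
	"net/http"
	"os"
	"path/filepath"
)

// uploadHandler streams a multipart upload to disk part by part,
// without buffering whole parts in memory.
func uploadHandler(w http.ResponseWriter, r *http.Request) {
	// Cap the body at 1 GiB; the connection is closed once exceeded.
	r.Body = http.MaxBytesReader(w, r.Body, 1<<30)

	mr, err := r.MultipartReader() // parts are read as they arrive
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if part.FileName() == "" {
			continue // skip non-file form fields
		}
		dst, err := os.Create(filepath.Join("/tmp", filepath.Base(part.FileName())))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		_, err = io.Copy(dst, part) // streamed copy; no full buffering
		dst.Close()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
	}
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	http.HandleFunc("/upload", uploadHandler)
	http.ListenAndServe(":8080", nil)
}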
While I haven't done this with GB-size files, my strategy with file processing (mostly stuff I read from and write to S3) is to use https://golang.org/pkg/os/exec/ with a command-line utility that handles chunking the way you like, and then read and process the data by tailing the file as explained here: Reading log files as they're updated in Go
In my situations, network utilities can download the data far faster than my code can process it, so it makes sense to send it to disk and pick it up as fast as I can; that way I'm not holding a connection open while I process. A rough sketch of the pattern follows.
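In this sketch the fetch command, file path, and per-chunk processing are all placeholders:

package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
	"time"
)

func main() {
	const path = "/tmp/big.dat" // placeholder destination

	// Placeholder fetcher: any CLI utility that writes the data to disk.
	cmd := exec.Command("curl", "-sS", "-o", path, "https://example.com/big.dat")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	finished := make(chan error, 1)
	go func() { finished <- cmd.Wait() }()

	// Wait for the downloader to create the file.
	var f *os.File
	for {
		var err error
		if f, err = os.Open(path); err == nil {
			break
		}
		time.Sleep(50 * time.Millisecond)
	}
	defer f.Close()

	// Tail the file: process whatever is there, wait for more on EOF,
	// and stop once the downloader has exited and the file is drained.
	buf := make([]byte, 64*1024)
	done := false
	for {
		n, err := f.Read(buf)
		if n > 0 {
			fmt.Printf("processing %d bytes\n", n) // stand-in for real work
		}
		if err == io.EOF {
			if done {
				break
			}
			select {
			case <-finished:
				done = true
			case <-time.After(100 * time.Millisecond):
			}
			continue
		}
		if err != nil {
			panic(err)
		}
	}
}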
I am writing an upload handler (ASP.NET) to handle image uploads.
The aim is to check the image type and content size before the entire file is uploaded, so I cannot use the Request object directly, as doing so loads the entire file input stream. I therefore use the HttpWorkerRequest.
However, I keep getting "The connection to the server was reset while the page was loading".
After quite a bit of investigation, it has become apparent that when posting the file, the call only works if the entire input stream is read.
This, of course, is exactly what I do not want to do :)
Can someone please tell me how I can close off the request without causing the "connection reset" issue and having the browser process the response?
There is no way to do this, as this is how HTTP functions. The best you can do is slurp the data from the client (i.e. read it in chunks) and immediately forget about it. This should prevent your memory requirements from being hammered, though it will hurt your bandwidth.
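A sketch of that slurp-and-discard approach using HttpWorkerRequest (the buffer size is arbitrary, and the validation of the preloaded bytes is left to you):

using System;
using System.Web;

// Inside your handler:
HttpWorkerRequest worker = (HttpWorkerRequest)((IServiceProvider)HttpContext.Current)
    .GetService(typeof(HttpWorkerRequest));

// The first bytes arrive together with the headers; inspect these to
// reject a bad upload early (image type, declared size, etc.).
byte[] preloaded = worker.GetPreloadedEntityBody();

if (!worker.IsEntireEntityBodyIsPreloaded())
{
    byte[] buffer = new byte[8192];
    int read;
    // Read the rest of the body in chunks and immediately discard it,
    // so the client finishes sending and accepts our response.
    while ((read = worker.ReadEntityBody(buffer, buffer.Length)) > 0)
    {
        // intentionally ignored
    }
}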
I have inherited some code which involves a scheduled task that writes data (obtained from an external source) to XML files, and a website that reads said XML files to get information to be presented to the visitor.
There is no synchronization in place, and needless to say, sometimes the scheduled task fails to write the file because it is currently open for reading.
The heart of the writer code is:
XmlWriter writer = XmlWriter.Create(fileName);
try
{
xmldata.WriteTo(writer);
}
finally
{
writer.Close();
}
And the heart of the reader code is:
XmlDocument theDocument = new XmlDocument();
theDocument.Load(filename);
(yep, no exception handling at either end)
I'm not sure how to best approach trying to synchronize these. As far as I know neither XmlWriter.Create() nor XmlDocument.Load() take any parameters regarding file access modes. Should I manage the underlying FileStreams myself (with appropriate access modes) and use the .Create() and .Load() overloads that take Stream parameters?
Or should I just catch the IOExceptions and do some sort of "catch, wait a few seconds, retry" approach?
Provided that your web site does not need to write back to the XmlDocument that is loaded, I would load it via a FileStream that has FileShare.ReadWrite set. That should allow your XmlWriter in the other thread to write to the file.
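The reader side might then look like this (a sketch, using the same filename variable as the original code):

using (FileStream fs = new FileStream(filename, FileMode.Open,
    FileAccess.Read, FileShare.ReadWrite))
{
    XmlDocument theDocument = new XmlDocument();
    theDocument.Load(fs);
}

XmlDocument.Load has a Stream overload, so only the way the file is opened changes.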
If that does not work, you could also try reading the XML from the FileStream into a MemoryStream and closing the file as quickly as possible. I would still open the file with FileShare.ReadWrite, but this would minimize the amount of time your reader needs to access data in the file.
By using FileShare.ReadWrite (or FileShare.Write for that matter) as the sharing mode, you run the risk that the document is updated while you are still reading it. That could result in invalid XML content, preventing the XmlDocument.Load call from successfully parsing it. If you wish to avoid this, you could try synchronizing with a temporary "locking file". Rather than allowing file sharing, you prevent either thread from concurrently accessing, and when either of them is processing the file, write an empty, temporary file to disk that indicates this. When processing (reading or writing) is done, delete the temporary file. This prevents an exception from being thrown on either end, and allows you to synchronize access to the file.
There are a couple other options you could use as well. You could simply let both ends swallow any exception and wait a short time before trying again, although that isn't really the best design. If you understand the threading options of .NET well enough, you could also use a named system Mutex that both processes (your writing process and your web site process) know about. You could then use the Mutex to lock, and not have to bother with the locking file.
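A minimal sketch of the named-Mutex variant; the mutex name here is arbitrary, but both processes must use the same one:

using (Mutex mutex = new Mutex(false, "MyApp.XmlFileMutex"))
{
    mutex.WaitOne();
    try
    {
        // read (or, in the other process, write) the file here
        XmlDocument theDocument = new XmlDocument();
        theDocument.Load(filename);
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}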
Ok, so here's the problem: I'm reading the stream from a FileUpload control, reading in chunks of n bytes and writing the array in a loop until I reach the stream's end.
Now, the reason I do this is that I need to check several things while the upload is still going on (rather than doing a Save(), which does the whole thing in one go). Here's the problem: when uploading from the local machine, I can see the file just fine as it's uploading, and its size increases (I had to add a Sleep() call in the loop to actually see the file being written).
However, when I upload the file from a remote machine, I don't get to see it until the file has completed uploading. I've also added another call that writes the progress to a text file as the upload goes on, and I get the same thing. Local: the file updates as the upload goes on; remote: the token file only appears after the upload is done (which is somewhat useless, since I need it while the upload is still happening).
Is there some sort of security setting in IIS (or ASP.NET) that saves files to a temporary location for remote machines, as opposed to the local machine, and then moves them to the specified destination? I would liken this to ASP.NET displaying detailed error messages when browsing from the local machine (even on the public hostname), as opposed to the generic compilation error page/generic exception page shown when browsing from a remote machine (when customErrors is not set to Off).
Any clues on this?
Thanks in advance.
The FileUpload control renders as an <input type="file"> HTML element; the browser opens that file, reads ALL of its content, encodes it, and sends it.
Your ASP.NET request only starts after IIS has received all the browser's data.
So you'd need to code a client component (Flash, Java applet, Silverlight) to send the file in small chunks and rebuild it on the server side.
EDIT: Some information on MSDN:
To control whether the file to upload is temporarily stored in memory or on the server while the request is being processed, set the requestLengthDiskThreshold attribute of the httpRuntime element. This attribute enables you to manage the size of the input stream buffer. The default is 256 bytes. The value that you specify should not exceed the value that you specify for the maxRequestLength attribute.
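For reference, a minimal web.config sketch of those settings (the values here are placeholders):

<configuration>
  <system.web>
    <httpRuntime maxRequestLength="1048576" requestLengthDiskThreshold="8192" />
  </system.web>
</configuration>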
I understand that you want to check the content of the file while it is being uploaded.
If this is your requirement, then why not add a textbox and populate it while you are reading the file from HttpPostedFile?