I have inherited some code which involves a scheduled task that writes data (obtained from an external source) to XML files, and a website that reads said XML files to get information to be presented to the visitor.
There is no synchronization in place, and needless to say, sometimes the scheduled task fails to write the file because it is currently open for reading.
The heart of the writer code is:
XmlWriter writer = XmlWriter.Create(fileName);
try
{
    xmldata.WriteTo(writer);
}
finally
{
    writer.Close();
}
And the heart of the reader code is:
XmlDocument theDocument = new XmlDocument();
theDocument.Load(filename);
(yep, no exception handling at either end)
I'm not sure how best to approach synchronizing these. As far as I know, neither XmlWriter.Create() nor XmlDocument.Load() takes any parameters regarding file access modes. Should I manage the underlying FileStreams myself (with appropriate access modes) and use the .Create() and .Load() overloads that take Stream parameters?
Or should I just catch the IOExceptions and do some sort of "catch, wait a few seconds, retry" approach?
Provided that your web site does not need to write back to the XmlDocument that is loaded, I would load it via a FileStream that has FileShare.ReadWrite set. That should allow your XmlWriter in the other thread to write to the file.
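For what it's worth, a minimal sketch of that reader side (reusing the filename variable from the question; the key part is the FileShare.ReadWrite flag):
XmlDocument theDocument = new XmlDocument();
using (FileStream stream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    // FileShare.ReadWrite is what lets the scheduled task open the file for writing at the same time.
    theDocument.Load(stream);
}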
If that does not work, you could also try reading the XML from the FileStream into a MemoryStream, and closing the file as quickly as possible. I would still open the file with FileShare.ReadWrite, but this would minimize the amount of time your reader needs to access data in the file.
By using FileShare.ReadWrite (or FileShare.Write, for that matter) as the sharing mode, you run the risk that the document is updated while you are still reading it. That could result in invalid XML content, preventing the XmlDocument.Load call from successfully parsing it. If you wish to avoid this, you could try synchronizing with a temporary "locking file". Rather than allowing file sharing, you prevent either process from accessing the file concurrently: whenever one of them is processing the file, it writes an empty, temporary file to disk to indicate this, and deletes that file when processing (reading or writing) is done. This prevents an exception from being thrown on either end, and allows you to synchronize access to the file.
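Here is a rough sketch of how that locking-file idea could look on either side (the .lock file name and the 500 ms retry delay are just placeholders, not anything prescribed):
string lockPath = filename + ".lock"; // illustrative name for the temporary locking file

// Acquire the lock: FileMode.CreateNew fails if the lock file already exists,
// so only one process can hold it at a time.
while (true)
{
    try
    {
        using (File.Open(lockPath, FileMode.CreateNew)) { }
        break;
    }
    catch (IOException)
    {
        Thread.Sleep(500); // the other process is busy with the file; wait and retry
    }
}

try
{
    // ... read or write the XML file here ...
}
finally
{
    File.Delete(lockPath); // release the lock
}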
There are a couple other options you could use as well. You could simply let both ends swallow any exception and wait a short time before trying again, although that isn't really the best design. If you understand the threading options of .NET well enough, you could also use a named system Mutex that both processes (your writing process and your web site process) know about. You could then use the Mutex to lock, and not have to bother with the locking file.
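A minimal sketch of the named-Mutex variant, assuming both processes agree on the same name (the name below is made up):
// The mutex name is an arbitrary example; the scheduled task and the web site
// must both use exactly the same name.
using (Mutex mutex = new Mutex(false, @"Global\MyXmlFileMutex"))
{
    mutex.WaitOne();
    try
    {
        // ... read or write the XML file here ...
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}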
I'm developing a program using an SQLite database I access via QSqlDatabase. I'd like to handle the (hopefully rare) case where changes are made to the database that are not caused by the program while it's running (e.g. the user could remove write access, move or delete the file, or modify it manually).
I tried to use a QFileSystemWatcher. I let it watch the database file, and in all functions writing something to it, I blocked its signals, so that only "external" changes would trigger the changed signal.
The problem is that the check by the QFileSystemWatcher and/or the actual writing to disk by QSqlDatabase::commit() does not seem to happen at the exact moment I call commit(), so that, in fact, first the QFileSystemWatcher's signals are blocked, then I change some stuff, then I unblock them, and then it reports the file to be changed.
I then tried to set a bool variable (m_writeInProgress) to true each time a function requests a change. The "changed" slot then checks if a write action has been requested and, if so, sets m_writeInProgress to false again and exits. This way, it would only handle "external" changes.
The problem is still that if the change happens at the exact moment the actual writing is going on, it isn't caught.
So possibly, using a QFileSystemWatcher is the wrong way to implement this.
How could this be done in a safe way?
Thanks for all help!
Edit:
I found a way to solve part of the problem. Taking an exclusive lock on the database file prevents other connections from changing it. It's quite simple; I just have to execute
PRAGMA locking_mode = EXCLUSIVE
BEGIN EXCLUSIVE
COMMIT
and handle the error that emerges if another instance of my program tries to access the database.
What's left is to know if the user (accidentally) deleted the file during runtime ...
First of all, there's no SQLite support for this: SQLite only supports monitoring changes made over a database connection within your direct control. Whatever happens in a separate process concurrently with your process, or when your process is not running, is by design completely out of your control.
The canonical solution to this problem is to encrypt the database with a key specific to your application (and perhaps the user, etc.). Then no third-party process can modify the database using SQLite. Of course any process can still corrupt your database or get rid of it; that's just too bad. You can detect corruption trivially by using cryptographic signatures, perhaps even error-correcting codes so as to be able to restore the data should a certain amount of corruption happen. You don't need notifications of someone moving or deleting the database file: you will know when you attempt to open the database and the "file not found" error is given back to you.
Of course all of the above requires a custom VFS implementation. That's very much par for the course.
I've written a Camel (2.10) component to do SFTP, as I needed a bit more control over the connection than the out-of-the-box component offers.
I have a route that looks something like this:
from("direct:start")
.to(startProcessor()) //1. Start processor sets the connection parameters for myCustomSftpComp producer
.to("myCustomSftpComp") //2. Uses Jsch, connects to server, gets the file, add to exchange, closes connection
.to(somePostProcessor()) //3. Does something with the file
.to("file://...."); //4. Write the file
This all works perfectly well.
My problem is at step 2. At the moment my files are quite small, so I buffer them into memory, add the byte array to the Exchange body, and it's passed along and processed until it gets written by the file endpoint.
Of course this won't be sustainable with a large file; I need to add the InputStream reference to the exchange instead. My problem is that I close and clean up the connection to the server inside myCustomSftpComp, so when the exchange gets to the post processor and the file endpoint, the stream can no longer be accessed.
So basically I need some way to keep the connection open until after the file is written, and closing the component's server connection from the route definition sounds untidy, so I'm open to alternative ways of doing this.
I'm not sure why you've written your own SFTP component, as the regular FTP component handles SFTP out of the box.
Passing just the input stream will still have you passing the content around in memory if you are going to do some processing in step three. In particular, this will be a problem since an InputStream can only be read once (StreamCaching can be enabled, but that consumes memory).
What the FTP component can do is download the file locally to a temporary file on disk, and then pass around the File handle to it. From that, you can easily get streams to work with the content, as well as write it to a new file once you're done.
Check this out:
http://camel.apache.org/ftp2.html#FTP2-UsingLocalWorkDirectory
When is PostedFile.InputStream available when uploading a large file?
I'd like to pass a Stream to another process and I'm hoping that if a large file was being uploaded that I can pass the Stream straight to that new process w/o writing to the file system. Since the process and/or upload could take a while, I'm wondering if I can start reading the InputStream immediately or whether I have to wait for the whole file to be transferred to the server before it can be processed.
I guess a more general question is - what's the lifecycle of a POST request when file upload is involved?
The PostedFile.InputStream isn't available until the entire file has been uploaded. IIS6 caches the file in memory while IIS7 now caches the file to disk before handing off the input stream to your method.
You can use an HttpModule such as NeatUpload, which gives you access to the bits while they're uploading.
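Without such a module, the stream only becomes usable once ASP.NET has received the whole request. A minimal sketch, assuming an upload field named "upload" and a hypothetical ProcessUpload method:
HttpPostedFile posted = Request.Files["upload"]; // "upload" is just an example field name
if (posted != null)
{
    using (Stream input = posted.InputStream)
    {
        // At this point the complete file has already reached the server.
        ProcessUpload(input); // hypothetical hand-off to the next processing stage
    }
}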
Is the WriteFile call properly synchronous, and can I delete the file written immediately after the call?
If you're writing a file to the client with Response.WriteFile(), a call to Response.Flush() will make sure it's been fully output to the client. Once that's done you can delete it off of the webserver.
You may want to come up with a more robust system if the file is mission-critical. Say, a client-side script to validate that the file was received OK and then alerts the webserver that the file can be deleted.
That is the solution: after the call to Response.WriteFile(fileName);, add the following lines:
Response.Flush();
System.IO.File.Delete(fullPathFileName);
Response.End();
It is fully synchronous, as you can see by looking at the implementation of HttpResponse.WriteFile with Lutz Reflector. You can delete the file immediately after the call to Response.WriteFile.
You don't have the guarantee that the response stream has been completely transmitted to the client, but calling Response.Flush doesn't give you that guarantee either. So I don't see a need to call Response.Flush before deleting the file.
Avoid loading the file into a MemoryStream; it brings you no benefit and has a cost in memory usage, especially for large files.
If memory serves, it is synchronous, as are the rest of the Response methods.
TransmitFile
You can also call TransmitFile to let IIS take care of it. The file actually gets sent by IIS outside of your worker process.
Memory Stream
If you are REALLY paranoid, don't send the file. Load it into a memory stream (if the size is reasonable) and transmit that. Then you can delete the file whenever you like. The file on disk will never be touched by IIS.
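A sketch of that memory-stream idea, with path standing in for the file you were going to WriteFile (only reasonable for files small enough to buffer):
byte[] bytes = File.ReadAllBytes(path); // path is a placeholder for your file's location
File.Delete(path);                      // safe now: only the in-memory copy is used below

using (MemoryStream ms = new MemoryStream(bytes))
{
    ms.WriteTo(Response.OutputStream);  // transmit the buffered content instead of the file
}
Response.Flush();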
My web application generates PDF files and either e-mails or faxes them to our customers. Somehow IIS6 is keeping hold of the file and blocking any other requests for it, claiming the old '..the process cannot access the file 'xxx.pdf' because it is being used by another process.'
When I recycle the application pool all is OK. Does anybody know why this is happening and how I can stop it?
Thanks
As everyone else said, do call the Close and Dispose methods on any IO objects you have open when reading/writing the PDF files.
But I suppose you've incorporated a third-party component to do the PDF writing for you? If that's the case, you might want to check with the vendor and/or its documentation to make sure that you are doing things the way the vendor intended. Don't trust the black box you got from someone else unless it has proven itself.
Another place to look might be what happens during multiple web requests to the PDF files. Are you sure the file is not written simultaneously from multiple places, e.g. 2-3 requests generating PDFs simultaneously, or 2-3 pages along the PDF generation process?
And lastly, you might want to check the exception logs to make sure that nothing is crashing or exiting a thread and leaving the file handle open without you noticing it. It happens a lot in multithreading scenarios: sometimes a thread just crashes and exits, which could happen especially if you use third-party components. They might be performing some magic tricks you'd never know about.
Sounds like the files, after being created, are still locked by the worker process. Make sure that you close all the connections to your file.
(Remember, using blocks will take care of that.)
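For example, a hedged sketch of the write side with a using block (GeneratePdf and fileName are hypothetical stand-ins for your own code):
byte[] pdfBytes = GeneratePdf(); // hypothetical: however your PDF bytes are produced
using (FileStream fs = File.Create(fileName))
{
    // The handle is closed (and released back to IIS) even if Write throws.
    fs.Write(pdfBytes, 0, pdfBytes.Length);
}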
I'd look through your code and make sure all handles to open (generated) files have been closed properly. Sometimes you just can't rely on the garbage collector to sort these things out.
Like mentioned before: take care that you close all open handles.
Sometimes the Microsoft indexing service locks files. Exclude your directory from indexing.
Check that all the code writing files to disk properly closes every handle, using .Close() in the finally clause or the "using" block in C#:
byte[] pdfBytes = getPdf(...);
// bw must be declared outside the try block so the finally clause can see it.
BinaryWriter bw = null;
try
{
    bw = new BinaryWriter(File.Create(filename));
    bw.Write(pdfBytes);
}
finally
{
    if (bw != null)
        bw.Close();
}
Use the Response object and the Content-Disposition header to send the file:
Response.ContentType = "application/pdf";
Response.AppendHeader("Content-disposition", "attachment; filename=" + PDID + ".pdf");
Response.WriteFile(filename);
Response.Flush();
The code shown has been creating and sending PDF files to customers for about 18 months, and we've never seen a locked file.