After hearing several “choice words” emanating from his cube, I guessed it was not as straightforward as it sounded. It turns out the files he was processing were fairly large. In fact, due to their size, file transfer latency was causing the FileSystemWatcher to fire notifications before the files had been completely written to disk, resulting in a System.IO.IOException. I was surprised, however, when he told me that his solution was to enter a polling loop when the FileSystemWatcher fires an event, attempting to open the file and eating the IOException until the file had fully arrived. The pseudocode looked something like this:
bool fileIsBusy = true;
while (fileIsBusy)
{
    try
    {
        // If we can open the file for reading, it has finished copying
        using (var file = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read)) { }
        fileIsBusy = false;
    }
    catch (IOException)
    {
        // The file is still arriving; give it time to finish copying and check again
        Thread.Sleep(5000);
    }
}
DoWorkOnFile(path);
My first reaction was: “There must be a better way! Why would you want to incur the overhead of unwinding the stack multiple times just to find out when your file is ready for processing?” These files were taking several minutes to upload. Regardless of the polling interval, there were several exceptions being generated prior to a single file getting processed.
After a little research, my colleague came to the realization that this was a standard pattern when using System.IO.FileSystemWatcher. His (and my) learning experience inspired me to share it with you.
My only previous experience with System.IO.FileSystemWatcher was indirect, through my colleague. Not satisfied with his approach, I went in search of a less brute-force solution that didn’t reinvent the proverbial wheel. I wanted to use the FileSystemWatcher if possible, but I also knew I didn’t care for the exception-handling solution that my colleague ended up with.
After discovering that FileSystemWatcher exposes several notification options that allow you to get notified when a file has arrived or changed in some way, I came up with a solution using the FileSystemWatcher NotifyFilters.LastWrite option. What I found is that the FileSystemWatcher sends multiple events over the course of a file being written to disk. I also noticed that it fires a NotifyFilters.LastWrite event once when the file begins writing, and then again when the file completes writing. That second event is the one I was interested in.
With that in mind I came up with some code that looks like this:
class WatcherService
{
    private readonly System.IO.FileSystemWatcher watcher;

    public WatcherService(string watchPath)
    {
        watcher = new System.IO.FileSystemWatcher();
        watcher.Path = watchPath;
        watcher.Filter = "*.zip";
        watcher.NotifyFilter = System.IO.NotifyFilters.LastWrite;
        watcher.Changed += new System.IO.FileSystemEventHandler(FileChanged);
        watcher.EnableRaisingEvents = true; // enable last, after Path and filters are set
    }

    private void FileChanged(object sender, FileSystemEventArgs e)
    {
        if (!IsFileReady(e.FullPath))
            return; // first notification: the file is still arriving

        // The file has completely arrived, so let's process it
        DoWorkOnFile(e.FullPath);
    }

    private bool IsFileReady(string path)
    {
        // One exception per file rather than several, as in the polling pattern
        try
        {
            // If we can't open the file, it's still copying
            using (var file = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                return true;
            }
        }
        catch (IOException)
        {
            return false;
        }
    }
}
Now we’ve limited the exception handling to one exception per file instead of many per file. Granted, there’s room for improvement: we could maintain a dictionary of files that have started arriving and only process files that already have an entry in the dictionary. But I think you get the idea.
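As a rough sketch of that dictionary idea (the `ArrivalTracker` class and its names are mine, not from the original code), the tracking logic might look like this. It assumes exactly two LastWrite notifications per file, which, as the comments below show, doesn’t always hold:

```csharp
using System.Collections.Generic;

class ArrivalTracker
{
    // Paths whose first LastWrite notification has already been seen.
    private readonly HashSet<string> filesSeen = new HashSet<string>();

    // Returns true only on the second notification for a path,
    // i.e. once the file has presumably finished arriving.
    public bool ShouldProcess(string path)
    {
        if (filesSeen.Add(path))
            return false; // first notification: the file just started arriving

        filesSeen.Remove(path); // reset so a future re-upload is tracked again
        return true;
    }
}
```

The Changed handler would then call `ShouldProcess(e.FullPath)` instead of probing the file with an open attempt, avoiding exceptions entirely.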
What if someone sends a *.tmp file, for example, and then renames it at the end? Your solution won’t work… FileSystemWatcher is just bad for that!!
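For what it’s worth, one way to handle the upload-then-rename pattern this commenter describes is to subscribe to the watcher’s Renamed event as well. A sketch, assuming the uploader writes a *.tmp file and renames it to its final name when the transfer completes (the temp-directory path and `DoWorkOnFile` are placeholders, not from the original post):

```csharp
using System;
using System.IO;

// Hypothetical stand-in for real file processing.
static void DoWorkOnFile(string path) => Console.WriteLine($"Processing {path}");

// Watch a drop folder (the system temp directory here, just for the sketch).
var watcher = new FileSystemWatcher(Path.GetTempPath())
{
    Filter = "*.zip",
    NotifyFilter = NotifyFilters.FileName
};

// The uploader writes upload.tmp and renames it to upload.zip when done,
// so the Renamed event means the file is already complete on disk.
watcher.Renamed += (sender, e) => DoWorkOnFile(e.FullPath);
watcher.EnableRaisingEvents = true;
```

Because the rename only happens after the last byte is written, no open-and-retry probing is needed for uploaders that follow this convention.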
Thanks for your post. As I was implementing your suggestion, it occurred to me that if the LastWrite event is triggered only twice, you could just skip the first call by checking the value of a counter variable:
if (EventTriggeredCount == 1)
{
    this.WriteToLog("File Change Event " + EventTriggeredCount + " Triggered for " + e.FullPath);
    EventTriggeredCount = EventTriggeredCount + 1;
    return;
}
else
{
    // process file and reset the counter
    EventTriggeredCount = 1;
}
Thanks Richard for your comment. It’s good to know my post is still getting some traction… I like your suggestion to use a counter. That pattern avoids having to rely on exceptions to know when the file is available.
File Changed can occur more than twice for the LastWrite event, it seems, especially for larger files. I implemented your code and just wrote “File still writing” or “File ready for processing” instead of returning a bool, and unfortunately it reports state inconsistently.
I like your approach, but it will need some tweaking I think.
8/23/2017 12:31:58 PM File Changed: C:\inetpub\ftproot\LocalUser\BrokerIdre\treesandshrubs.csv
8/23/2017 12:31:58 PM File still writing: C:\inetpub\ftproot\LocalUser\BrokerIdre\treesandshrubs.csv
8/23/2017 12:31:58 PM File Changed: C:\inetpub\ftproot\LocalUser\BrokerIdre\treesandshrubs.csv
8/23/2017 12:31:58 PM File still writing: C:\inetpub\ftproot\LocalUser\BrokerIdre\treesandshrubs.csv
8/23/2017 12:31:58 PM File Changed: C:\inetpub\ftproot\LocalUser\BrokerIdre\treesandshrubs.csv
8/23/2017 12:31:58 PM File still writing: C:\inetpub\ftproot\LocalUser\BrokerIdre\treesandshrubs.csv
8/23/2017 12:31:58 PM File Changed: C:\inetpub\ftproot\LocalUser\BrokerIdre\treesandshrubs.csv
8/23/2017 12:31:58 PM File still writing: C:\inetpub\ftproot\LocalUser\BrokerIdre\treesandshrubs.csv
Thanks Reuben for your feedback. In theory the NotifyFilters.LastWrite filter fires once when the file is created, and again when the file is closed. The purpose of that filter is to request notification when the file’s Modified Date gets updated. I can see that with large files additional notifications might occur when the file system fills a block and needs to allocate additional blocks to continue writing. It could also depend on how the file is being written to disk. For example, one scenario I can think of is a file being appended to in a flow that requires opening and closing the file multiple times.
I hope this helps.
Good input. Yesterday I implemented an algorithm that starts a timer after the first Changed (LastWrite) event occurs. It then checks (every second) whether the last write occurred more than X seconds ago, and declares the file “finished” if so. Not perfect, but no more IOExceptions for now.
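That quiet-period idea can be sketched as a small helper, kept separate from the timer and watcher plumbing so the logic is easy to test (`QuietPeriodTracker` and its names are my own, and the quiet period length is whatever X the commenter picks):

```csharp
using System;
using System.Collections.Generic;

// A file counts as "finished" once no Changed event has been seen
// for it within the quiet period.
class QuietPeriodTracker
{
    private readonly TimeSpan quietPeriod;
    private readonly Dictionary<string, DateTime> lastWrite = new Dictionary<string, DateTime>();

    public QuietPeriodTracker(TimeSpan quietPeriod) => this.quietPeriod = quietPeriod;

    // Call from the watcher's Changed handler on every notification.
    public void Touch(string path, DateTime now) => lastWrite[path] = now;

    // Call from a timer tick (e.g. every second); returns paths ready to process.
    public List<string> TakeFinished(DateTime now)
    {
        var finished = new List<string>();
        foreach (var entry in lastWrite)
            if (now - entry.Value >= quietPeriod)
                finished.Add(entry.Key);

        foreach (var path in finished)
            lastWrite.Remove(path); // only report each finished file once

        return finished;
    }
}
```

Passing the clock in as a `DateTime` argument rather than reading `DateTime.Now` internally is a deliberate choice: it makes the debounce logic deterministic and testable, while the real timer callback simply passes the current time.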