A while back, a colleague of mine was developing a Windows Service that used the .NET System.IO.FileSystemWatcher to process files arriving via FTP. The design was pretty straightforward: monitor a file share for incoming files, parse each file after it arrived, and then persist the information gleaned from each file to a database for downstream processing.
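For context, here is a bare-bones sketch of that design as it might look before any of the problems below show up. It reacts to the Created event and opens the file immediately. The names `NaiveWatcher` and `Process` are hypothetical, and parsing plus persistence are reduced to an empty placeholder.

```csharp
using System;
using System.IO;

// Hypothetical naive watcher: processes each file as soon as it appears.
public sealed class NaiveWatcher : IDisposable
{
    private readonly FileSystemWatcher _watcher;

    public NaiveWatcher(string directory, string filter)
    {
        _watcher = new FileSystemWatcher(directory, filter);
        _watcher.Created += OnCreated;
        // Start raising events only after the path, filter and handler are configured.
        _watcher.EnableRaisingEvents = true;
    }

    public bool IsWatching => _watcher.EnableRaisingEvents;

    private void OnCreated(object sender, FileSystemEventArgs e)
    {
        // Created fires as soon as the file appears, which can be long before a large
        // FTP upload finishes. If the uploader still holds the file open for writing,
        // File.OpenRead throws an IOException here.
        using (var stream = File.OpenRead(e.FullPath))
        {
            Process(stream);
        }
    }

    // Placeholder for parsing the file and persisting its contents to the database.
    private static void Process(Stream stream) { }

    public void Dispose() => _watcher.Dispose();
}
```

Because the handler runs on a thread-pool thread, an exception it does not catch is unhandled, which is exactly the kind of failure described next.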

After hearing several “choice words” emanating from his cube, I guessed it was not as straightforward as it sounded. It turned out that the files he was processing were fairly large, and the transfer latency meant the FileSystemWatcher was firing notifications before the files had been completely written to disk. Trying to open a file at that point resulted in a System.IO.IOException. I was surprised, however, when he told me his solution: when the FileSystemWatcher fires an event, enter a polling loop that repeatedly tries to open the file and swallows the IOException until the file has fully arrived. The pseudo code looked something like this:

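using System.IO;
using System.Threading;

bool fileIsBusy = true;
while (fileIsBusy)
{
    try
    {
        //Opening the file throws an IOException (sharing violation) while the FTP server still has it open for writing
        using (var file = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
        }
        fileIsBusy = false;
    }
    catch (IOException)
    {
        //The file is still arriving, give it time to finish copying and check again
        Thread.Sleep(5000);
    }
}
DoWorkOnFile(path);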
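To make the cost of that pattern easier to see, here is the same loop packaged as a self-contained helper. The `FilePolling` class, the `WaitUntilReadable` name, and the `maxAttempts` cap are my own additions for illustration (the original loop runs until it succeeds); the cap keeps a stalled upload from spinning forever.

```csharp
using System;
using System.IO;
using System.Threading;

public static class FilePolling
{
    // Hypothetical helper: returns true once the file can be opened for reading,
    // or false after maxAttempts failed tries.
    public static bool WaitUntilReadable(string path, TimeSpan interval, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                // FileShare.Read refuses to coexist with a handle that has write access,
                // so this succeeds only after the writer has closed the file.
                using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    return true;
                }
            }
            catch (IOException)
            {
                // Every failed attempt costs one thrown and caught exception.
                // FileNotFoundException derives from IOException, so a missing file is retried too.
                if (attempt < maxAttempts)
                {
                    Thread.Sleep(interval);
                }
            }
        }
        return false;
    }
}
```

Usage would be along the lines of `if (FilePolling.WaitUntilReadable(path, TimeSpan.FromSeconds(5), 120)) DoWorkOnFile(path);`, where the attempt limit is an arbitrary example value.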

My first reaction was: “There must be a better way! Why would you want to incur the overhead of unwinding the stack multiple times just to find out when your file is ready for processing?” These files were taking several minutes to upload, so at any reasonable polling interval several exceptions were thrown before a single file got processed.
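A rough count makes the point concrete. Assuming, purely for illustration, a three-minute upload (the uploads were only described as taking “several minutes”), the five-second sleep from the pseudo code, polling that starts when the upload begins, and negligible time per open attempt, the number of failed attempts, each one a thrown and caught IOException, is about:

```latex
N_{\text{exceptions}} \approx \left\lceil \frac{T_{\text{upload}}}{\Delta t_{\text{poll}}} \right\rceil
= \left\lceil \frac{180\,\text{s}}{5\,\text{s}} \right\rceil = 36
```

Lengthening the interval reduces the count but delays processing by up to one interval after the file is complete.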

After a little research, my colleague realized that this was a standard pattern when using System.IO.FileSystemWatcher. His (and my) learning experience inspired me to share it with you.

My only previous experience with System.IO.FileSystemWatcher had come indirectly through my colleague. Not satisfied with his solution, I went looking for a less brute-force approach that didn’t reinvent the proverbial wheel. I wanted to use the FileSystemWatcher if possible, but I knew I didn’t care for the exception-handling solution my colleague had ended up with.

After discovering that the FileSystemWatcher’s NotifyFilter property accepts several NotifyFilters values that control which kinds of changes (file name, size, last write time, and so on) raise notifications, I came up with a solution using the NotifyFilters.LastWrite option, which raises the Changed event when a file’s last write time changes. What I found is that the FileSystemWatcher sends multiple events over the course of a file being written to disk. In particular, with NotifyFilters.LastWrite it fired one Changed event when the file began writing and another when the file finished writing. That second event is the one I was interested in.
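If you want to check this behavior on your own server before relying on it, a small diagnostic watcher helps. The sketch below (the `LastWriteLogger` name is hypothetical) prints a timestamp for every LastWrite notification and whether the file could be opened at that moment. The number of events a single upload produces can vary with how the writing program flushes the file, so it is worth observing with your actual FTP server.

```csharp
using System;
using System.IO;

public static class LastWriteLogger
{
    // Hypothetical diagnostic: logs each Changed event raised for LastWrite.
    public static FileSystemWatcher Start(string directory, string filter)
    {
        var watcher = new FileSystemWatcher(directory, filter)
        {
            NotifyFilter = NotifyFilters.LastWrite
        };
        watcher.Changed += (sender, e) =>
            Console.WriteLine($"{DateTime.Now:HH:mm:ss.fff} {e.ChangeType} {e.Name} openable={CanOpen(e.FullPath)}");
        // Enable last, after the path, filter and handler are in place.
        watcher.EnableRaisingEvents = true;
        return watcher; // the caller keeps this reference alive and disposes it when done
    }

    private static bool CanOpen(string path)
    {
        try
        {
            using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                return true;
            }
        }
        catch (IOException)
        {
            return false;
        }
    }
}
```

Start it against the incoming directory, upload a large file, and compare the logged timestamps with when the transfer actually completed.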

With that in mind, I came up with some code that looks like this:

using System.IO;

class WatcherService
{
    private readonly FileSystemWatcher watcher;

    //directory: the file share to monitor, supplied by the caller
    public WatcherService(string directory)
    {
        watcher = new FileSystemWatcher(directory);
        watcher.Filter = "*.zip";
        watcher.NotifyFilter = NotifyFilters.LastWrite;
        watcher.Changed += FileChanged;
        //Turn on notifications last, once the path, filter and handler are configured
        watcher.EnableRaisingEvents = true;
    }

    private void FileChanged(object sender, FileSystemEventArgs e)
    {
        if (!FileIsReady(e.FullPath)) return; //first notification: the file is still arriving

        //The file has arrived completely, so let's process it
        DoWorkOnFile(e.FullPath);
    }

    private bool FileIsReady(string path)
    {
        //One exception per file rather than several like in the polling pattern
        try
        {
            //If we can't open the file, it's still copying
            using (var file = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                return true;
            }
        }
        catch (IOException)
        {
            return false;
        }
    }

    private void DoWorkOnFile(string path)
    {
        //Placeholder: parse the file and persist its contents to the database
    }
}

Now we handle one exception per file instead of many. Granted, there’s room for improvement: we could maintain a dictionary of files that have started arriving and only process files that already have an entry in the dictionary, but I think you get the idea.
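The dictionary idea can be sketched as follows. It assumes, as observed above, exactly two LastWrite notifications per upload: the first one records the path and the second one means the upload has finished. The `ArrivalTracker` class and `OnLastWrite` method are hypothetical names, and the case-insensitive key comparison assumes Windows-style paths.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class ArrivalTracker
{
    // Paths whose first notification has been seen, with the time it was seen
    // (which could later be used to purge entries for abandoned uploads).
    // Changed events can arrive on different thread-pool threads, hence the concurrent dictionary.
    private readonly ConcurrentDictionary<string, DateTime> _arriving =
        new ConcurrentDictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);

    // Returns true when the caller should process the file now.
    public bool OnLastWrite(string fullPath)
    {
        // TryAdd succeeds only for the first notification: remember the file and wait.
        if (_arriving.TryAdd(fullPath, DateTime.UtcNow))
        {
            return false;
        }

        // Second notification: remove the entry so a later re-upload starts over.
        return _arriving.TryRemove(fullPath, out _);
    }
}
```

In the Changed handler this would become `if (!tracker.OnLastWrite(e.FullPath)) return;`, with `tracker` a hypothetical ArrivalTracker field. That keeps exceptions out of the normal path, but it trusts the two-notification pattern; if a writer produces more LastWrite events, a file would be processed too early, so keeping FileIsReady as a final check is a reasonable safety net.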