Introducing Data Parallelism using the .NET 4.0 TPL

With the release of .NET 4.0, we are provided with a brand new parallel programming library, the Task Parallel Library (TPL). Using the types of the System.Threading.Tasks namespace, you can build fine-grained, scalable parallel code without having to work directly with threads or the thread pool. Furthermore, you can make use of strongly typed LINQ queries (via "Parallel LINQ", or PLINQ) to divide up your workload.
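To give a feel for the PLINQ side of the story, here is a minimal sketch of a parallel query. The class and method names are made up for illustration; the only PLINQ-specific call is AsParallel(), which opts an ordinary LINQ query into parallel execution.

```csharp
using System;
using System.Linq;

public static class PlinqSketch  // hypothetical demo class
{
  // Sums the multiples of 3 in [1, upperBound] using a parallel query.
  public static int SumMultiplesOfThree(int upperBound)
  {
    return Enumerable.Range(1, upperBound)
                     .AsParallel()            // opt in to PLINQ
                     .Where(n => n % 3 == 0)
                     .Sum();
  }

  public static void Main()
  {
    Console.WriteLine(SumMultiplesOfThree(100));  // prints 1683
  }
}
```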

 

The primary class of the TPL is System.Threading.Tasks.Parallel. This class supports a number of methods that allow you to iterate over a range of indices or a collection of data (specifically, an object implementing IEnumerable<T>) in a parallel fashion. If you look up the Parallel class in the .NET Framework 4.0 SDK documentation, you will see that it supports two primary static methods, Parallel.For() and Parallel.ForEach(), each of which defines numerous overloaded versions.

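The two methods differ in how you describe the data. Parallel.ForEach() requires you to specify an IEnumerable<T> compatible container that holds the data you need to process in a parallel manner, while Parallel.For() takes a starting and an ending integer index, much like a C# for loop. The container could be a simple array, a generic collection (such as List<T>) or the results of a LINQ query. A non-generic collection (such as ArrayList) must first be converted to IEnumerable<T>, for example with the LINQ Cast<T>() method.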
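Here is a small console sketch of both entry points. The class and helper names are invented for the example. It uses Parallel.For() to fill an array by index, Parallel.ForEach() to walk a List<int>, and Cast<int>() to adapt a non-generic ArrayList.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelBasics  // hypothetical demo class
{
  // Parallel.For(): the body receives each index in [0, count).
  public static int[] SquaresWithFor(int count)
  {
    int[] results = new int[count];
    // Each iteration writes to its own slot, so no locking is needed.
    Parallel.For(0, count, i => { results[i] = i * i; });
    return results;
  }

  // Parallel.ForEach(): the body receives each element of an IEnumerable<T>.
  public static int SumWithForEach(IEnumerable<int> values)
  {
    object sync = new object();
    int sum = 0;
    Parallel.ForEach(values, v =>
    {
      // The running total is shared across threads, so guard the update.
      lock (sync) { sum += v; }
    });
    return sum;
  }

  public static void Main()
  {
    Console.WriteLine(string.Join(", ", SquaresWithFor(10)));
    Console.WriteLine(SumWithForEach(new List<int> { 1, 2, 3, 4 }));  // prints 10

    // A non-generic collection must be converted to IEnumerable<T> first.
    ArrayList legacy = new ArrayList { 5, 6, 7 };
    Console.WriteLine(SumWithForEach(legacy.Cast<int>()));            // prints 18
  }
}
```

Note the lock around the shared total: the TPL spreads the iterations over several threads, but it does not make the loop body thread safe for you.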

In addition, you will need to make use of the System.Func<T> and System.Action<T> delegates to specify the target method that will be called to process the data. Recall that Func<T> represents a method that returns a value and takes a varied number of arguments. The Action<T> delegate is very similar to Func<T>, in that it also points to a method taking some number of parameters. However, Action<T> always points to a method that returns void.

While you could call the Parallel.For() and Parallel.ForEach() methods and pass a strongly typed Func<T> or Action<T> delegate object, you can simplify your programming by making use of a fitting C# anonymous method or lambda expression.
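The three styles can be compared side by side. This is only an illustrative sketch (the class, the Record() helper and the shared bag are assumptions made up for the example), but each call hands Parallel.ForEach() the same kind of Action<string> body.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class DelegateForms  // hypothetical demo class
{
  // Thread-safe collection that every loop body writes to.
  static ConcurrentBag<string> seen;

  // A named method matching Action<string>: one parameter, returns void.
  static void Record(string item) { seen.Add(item.ToUpper()); }

  public static int Run()
  {
    seen = new ConcurrentBag<string>();
    string[] items = { "a", "b", "c" };

    // 1. An explicit, strongly typed delegate object.
    Action<string> body = new Action<string>(Record);
    Parallel.ForEach(items, body);

    // 2. A C# anonymous method.
    Parallel.ForEach(items, delegate(string item) { seen.Add(item.ToUpper()); });

    // 3. A lambda expression; the compiler infers Action<string>.
    Parallel.ForEach(items, item => seen.Add(item.ToUpper()));

    return seen.Count;  // three loops over three items
  }

  public static void Main()
  {
    Console.WriteLine(Run());  // prints 9
  }
}
```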

One way to use the TPL is to perform data parallelism. Simply put, this term refers to the task of iterating over an array or collection in a parallel manner using the Parallel.For() or Parallel.ForEach() methods. Assume you need to perform some labor-intensive file I/O operations. Specifically, you need to load a large number of *.jpg files into memory, flip them upside-down, and save the modified image data to a new location. Consider the following code snippet, which is currently *not* using the TPL to process each image file:

private void ProcessFiles()
{
  // Load up all *.jpg files, and make a new folder for the modified data.
  string[] files = Directory.GetFiles
    (@"C:\Users\AndrewTroelsen\Pictures\My Family", "*.jpg",
    SearchOption.AllDirectories);
  string newDir = @"C:\ModifiedPictures";
  Directory.CreateDirectory(newDir);

  //  Process the image data in a blocking manner. 
  foreach (string currentFile in files)
  {
    string filename = Path.GetFileName(currentFile);

    using (Bitmap bitmap = new Bitmap(currentFile))
    {
      bitmap.RotateFlip(RotateFlipType.Rotate180FlipNone);
      bitmap.Save(Path.Combine(newDir, filename));
      this.Text = string.Format("Processing {0} on thread {1}", filename,
        Thread.CurrentThread.ManagedThreadId);
    }
  }
  this.Text = "All done!";
}

If we were to call this method from a Button Click handler, the UI would hang for some time, as the primary thread would be busy with the lengthy processing of the image files. We can replace the C# foreach loop with the following code, which instructs the TPL to iterate over the data in parallel.

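//  Process the image data in a parallel manner! 
Parallel.ForEach(files, currentFile =>
  {
    string filename = Path.GetFileName(currentFile);
    // Capture the ID here; the caption update below runs on the UI thread.
    int threadId = Thread.CurrentThread.ManagedThreadId;

    using (Bitmap bitmap = new Bitmap(currentFile))
    {
      bitmap.RotateFlip(RotateFlipType.Rotate180FlipNone);
      bitmap.Save(Path.Combine(newDir, filename));
    }

    // Controls may only be touched from the UI thread, so post the update to it.
    this.BeginInvoke((Action)(() =>
      this.Text = string.Format("Processing {0} on thread {1}", filename,
                  threadId)));
  }
);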

Now, if you run the program, the TPL will indeed distribute the workload to multiple threads from the thread pool, using as many CPUs as possible. However, you will not see the window's caption update with the ID of each worker thread while the files are being processed! The reason is that the primary UI thread is still blocked: Parallel.ForEach() does not return until every iteration has finished, so in the meantime the window cannot repaint or respond to input.
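The blocking behavior is easy to observe outside of a GUI. In this console sketch (all names are hypothetical, and Thread.Sleep() merely stands in for the image processing), the line after Parallel.ForEach() is only reached once every item has been handled, even though the work itself was spread over several threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingDemo  // hypothetical demo class
{
  // Returns how many items were finished by the time Parallel.ForEach() returned.
  public static int CountCompletedOnReturn(int itemCount)
  {
    int completed = 0;
    ConcurrentDictionary<int, bool> threadIds = new ConcurrentDictionary<int, bool>();

    Parallel.ForEach(Enumerable.Range(0, itemCount), item =>
    {
      Thread.Sleep(20);  // stand-in for loading, flipping and saving an image
      threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
      Interlocked.Increment(ref completed);
    });

    // Execution only reaches this line after every iteration has finished.
    Console.WriteLine("{0} items on {1} thread(s)", completed, threadIds.Count);
    return completed;
  }

  public static void Main()
  {
    Console.WriteLine("Caller thread: {0}", Thread.CurrentThread.ManagedThreadId);
    CountCompletedOnReturn(16);
  }
}
```

The number of threads reported will vary from machine to machine, and the calling thread may well be one of them, since the TPL is free to run some iterations on the caller.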

 

To keep the user interface responsive, you could certainly make use of asynchronous delegates or the members of the System.Threading namespace directly, but the System.Threading.Tasks namespace provides a simpler alternative via the Task class. Task allows you to easily invoke a method on a secondary thread. Here is a Click handler for a button control that will run the parallel code in a non-blocking manner:

private void btnProcessImages_Click(object sender, EventArgs e)
{
  // Start a new "task" to process the files. 
  Task.Factory.StartNew(() =>
  {
    ProcessFiles();
  });
}

 

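The Factory property of Task returns a TaskFactory object. When you call its StartNew() method, you pass in an Action delegate (here, hidden away with a fitting lambda expression) that points to the method to invoke in an asynchronous manner. With this small update, the window's title can show which thread from the thread pool is processing a given file, and better yet, any text area on the form is able to receive input, as the UI thread is no longer blocked. Keep in mind that ProcessFiles() itself now runs on a secondary thread, so every assignment to this.Text (including the final "All done!") should be marshaled back to the UI thread with Control.Invoke() or Control.BeginInvoke(); Windows Forms controls are not safe to touch from other threads.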
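If you also want to react when the work completes, a task can be chained to a continuation. The following console sketch is an assumption-laden illustration rather than part of the image application: DoSlowWork() is a made-up stand-in for ProcessFiles(), and the returned count of 42 is arbitrary.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskDemo  // hypothetical demo class
{
  // Stand-in for ProcessFiles(): slow work that reports how many items it handled.
  static int DoSlowWork()
  {
    Thread.Sleep(200);
    return 42;
  }

  public static Task<string> StartWork()
  {
    // StartNew() returns immediately; DoSlowWork() runs on a pool thread.
    Task<int> work = Task.Factory.StartNew<int>(DoSlowWork);

    // The continuation runs once the work has finished. In a Windows Forms
    // application, passing TaskScheduler.FromCurrentSynchronizationContext()
    // as a second argument would run it on the UI thread instead.
    return work.ContinueWith(t =>
      string.Format("All done! Processed {0} items", t.Result));
  }

  public static void Main()
  {
    Task<string> pending = StartWork();
    Console.WriteLine("Caller is free while the work runs...");
    // Reading Result blocks; it is done here only to keep the console demo alive.
    Console.WriteLine(pending.Result);
  }
}
```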

 

This is just one small example of how the new TPL can simplify the development of multithreaded programs that automatically leverage (where possible) the CPUs on the target machine.

Posted by: Andrew Troelsen
Posted on: 2/23/2010 at 8:31 AM
Categories: .NET