Many of you are probably already familiar with the parallel processing capabilities of the Task Parallel Library (TPL), first introduced in the .NET Framework 4. In .NET 4, TPL provided core building blocks and algorithms for parallel computing in your .NET applications. Now that development of the next version of the .NET platform is well under way, it is no surprise that Microsoft is looking for ways to improve upon a good thing. One update that should make the final cut is yet another way for developers to build highly responsive code using the new Async capabilities of .NET (see Andrew Troelsen's Async CTP Anyone?), currently available as a community technology preview known as the Async CTP. One component of the Async CTP is TPL Dataflow, a library for agent/actor-based data processing using parallel techniques.
The TPL Dataflow library was first released for preview as part of the Microsoft Visual Studio Async CTP, and it is also available in a separate TPL Dataflow CTP, both of which you can download from http://msdn.microsoft.com/vstudio/async. Both the Async CTP and TPL Dataflow CTP bits can be installed and used with Visual Studio 2010 SP1.
Typically, parallel programming with .NET 4 was proactive. You usually had some data and wanted to perform a computation on it. For example, you might have a range of data to process, or you might need to filter your data into a form that is useful to your application. To solve these problems in .NET 4 with the Task Parallel Library you might have used Parallel.For or Parallel.ForEach, or possibly a PLINQ query, to iterate through the data and take advantage of the parallel constructs provided by TPL. In all these cases it was data first, then computation on that data using the task and data parallelism primitives of the Task Parallel Library. What was missing was the ability to be reactive: to set up your computation framework first, and then react to the data as it comes in.
This reactive method of processing data is commonly referred to as dataflow parallelism. Essentially you create computational networks through which data can flow. Agent-based and actor-based message passing patterns, such as the producer/consumer pattern, follow this reactive model. With the introduction of TPL Dataflow you will be able to use these parallel programming paradigms in your own .NET applications to build reactive dataflow networks. This is not to say that you couldn't have solved these problems in .NET 4 using the Task Parallel Library. However, it would have involved quite a bit of code to manage buffering data, scheduling tasks, and handling the communication between tasks required to perform the work. The TPL Dataflow library gives you many of the primitives you need to solve these problems out of the box, without having to worry about the minutiae of implementing such architectures.
So what is TPL Dataflow? Essentially it is a parallel processing library that exposes primitives for in-process message passing and processing. TPL Dataflow provides a set of agents, commonly referred to as dataflow blocks, or simply blocks, that contain the infrastructure required to buffer and process your data in an asynchronous and parallel manner. In short, TPL Dataflow provides the infrastructure for building dataflow parallelism into your applications.
At the core of the TPL Dataflow library is its interface hierarchy, pictured below. These interfaces describe the behavior of a dataflow block. At the very top of the hierarchy is IDataflowBlock, which defines the contract for dealing with the lifetime of a dataflow block (completing it and observing its completion). Below IDataflowBlock are three sub-interfaces that define contracts for blocks that can be a source of data, blocks that can be a target for data, and blocks that can be both. ISourceBlock<TOutput> represents a source of data, defining a contract for linking to targets and offering data to them. ITargetBlock<TInput> represents a target for incoming data, defining a contract for being offered data and accepting it. Finally, IPropagatorBlock<TInput, TOutput> represents a block that is both a target and a source, defining a contract for receiving data from sources, possibly transforming that data, and propagating the result on to other targets.
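To make the hierarchy concrete, here is a minimal sketch that views a single built-in block through each of the four interfaces. It assumes the released System.Threading.Tasks.Dataflow library rather than the CTP bits, whose API surface may differ. The InterfaceDemo class and RoundTrip method are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public static class InterfaceDemo
{
    // Views one TransformBlock through each interface in the hierarchy.
    public static string RoundTrip(int value)
    {
        var block = new TransformBlock<int, string>(n => "item " + n);

        IPropagatorBlock<int, string> propagator = block; // both target and source
        ITargetBlock<int> target = propagator;            // accepts input
        ISourceBlock<string> source = propagator;         // offers output
        IDataflowBlock lifetime = propagator;             // lifetime members only

        target.Post(value);               // Post is an extension method for targets
        string result = source.Receive(); // Receive is an extension method for sources

        lifetime.Complete();              // signal that no more input will arrive
        lifetime.Completion.Wait();       // finishes once the block has drained
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(7));
    }
}
```

Because every block ultimately exposes IDataflowBlock, code that only needs to shut a network down can work against that interface without knowing the message types involved.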

With these interfaces you can develop your own blocks to build custom dataflow networks, but there are also several built-in blocks for the most common scenarios: buffering and propagating data, acting on data, and joining data from multiple sources, then buffering and presenting the result to a target.
For buffering and propagation of data TPL Dataflow provides the following blocks:
BufferBlock<T> - buffers incoming data and delivers that data to a target in an asynchronous and parallel fashion.
WriteOnceBlock<T> - accepts one piece of data and delivers that piece of data in a broadcast fashion to all target blocks interested in that piece of data.
BroadcastBlock<T> - accepts multiple pieces of data over time and broadcasts a copy of each piece of data to all interested target blocks. It holds only the most recent piece of data, so a newer value overwrites an older one.
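The buffering blocks above can be sketched as follows. This is an illustrative example, assuming the released System.Threading.Tasks.Dataflow library (the CTP API may differ). BufferingDemo and its method names are hypothetical. It shows a BufferBlock handing items back in the order they were posted, and a WriteOnceBlock keeping only the first value it is offered.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public static class BufferingDemo
{
    // Posts three values to a BufferBlock and reads them back in FIFO order.
    public static int[] BufferRoundTrip()
    {
        var buffer = new BufferBlock<int>();
        for (int i = 1; i <= 3; i++)
        {
            buffer.Post(i); // returns true when the block accepts the item
        }

        var results = new int[3];
        for (int i = 0; i < results.Length; i++)
        {
            results[i] = buffer.Receive(); // blocks until an item is available
        }
        return results;
    }

    // Offers two values to a WriteOnceBlock; only the first one is kept.
    public static int WriteOnce()
    {
        // A null cloning function means the stored value is handed out as-is.
        var once = new WriteOnceBlock<int>(null);
        once.Post(42); // accepted
        once.Post(99); // declined, the block already holds a value
        return once.Receive();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", BufferRoundTrip()));
        Console.WriteLine(WriteOnce());
    }
}
```

Post and Receive are the synchronous helpers. In asynchronous code you would more likely reach for their SendAsync and ReceiveAsync counterparts.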
For acting on incoming data TPL Dataflow provides the following dataflow blocks:
ActionBlock<Tin> - an ITargetBlock that accepts and buffers incoming data and performs some action on that data.
TransformBlock<Tin, Tout> - an IPropagatorBlock that accepts and buffers incoming data, executes some function on that data and then buffers the output to be consumed by a target.
TransformManyBlock<Tin, Tout> - very similar to TransformBlock, however the relationship between input and output is one-to-many. It buffers incoming data and transforms each piece of data into zero or more outputs, then buffers those outputs to be consumed by a target.
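As a rough sketch of how these execution blocks fit together, the example below links a TransformManyBlock, a TransformBlock, and an ActionBlock into a small pipeline. It assumes the released System.Threading.Tasks.Dataflow library (linking in the CTP may look different). PipelineDemo and Run are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

public static class PipelineDemo
{
    // Splits each line into words, upper-cases them, and collects the results.
    public static List<string> Run(IEnumerable<string> lines)
    {
        var results = new List<string>();

        // One input line produces zero or more output words.
        var split = new TransformManyBlock<string, string>(
            line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

        // One input word produces exactly one output word.
        var upper = new TransformBlock<string, string>(
            word => word.ToUpperInvariant());

        // Blocks process one message at a time by default, so a plain List is safe here.
        var collect = new ActionBlock<string>(word => results.Add(word));

        // Set up the network first; PropagateCompletion forwards completion downstream.
        var options = new DataflowLinkOptions { PropagateCompletion = true };
        split.LinkTo(upper, options);
        upper.LinkTo(collect, options);

        // Then react to data as it arrives.
        foreach (var line in lines)
        {
            split.Post(line);
        }

        split.Complete();          // no more input
        collect.Completion.Wait(); // wait until every word has been handled
        return results;
    }

    public static void Main()
    {
        var words = Run(new[] { "hello world", "", "tpl dataflow" });
        Console.WriteLine(string.Join(" ", words));
    }
}
```

The network is built before any data exists, which is the reactive model described earlier: the blocks sit idle until something is posted to the head of the pipeline.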
For joining and grouping data TPL Dataflow provides the following dataflow blocks:
BatchBlock<T> - An IPropagatorBlock that accepts and buffers incoming data and groups that data into batches (arrays).
JoinBlock<T1, T2> - Accepts and buffers data from two different source blocks, combines data from each source into tuples, then buffers and makes those tuples available to a target.
BatchedJoinBlock<T1, T2> - a combination of BatchBlock and JoinBlock that groups pairs of collections of incoming data into tuples of those collections, buffering and presenting that data as output to a target.
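The grouping blocks can be illustrated with a short sketch, again assuming the released System.Threading.Tasks.Dataflow library rather than the CTP. GroupingDemo and its methods are hypothetical names. The first method groups seven integers into batches of three, and the second pairs one item from each of two inputs into a tuple.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public static class GroupingDemo
{
    // Groups the numbers 1..7 into batches of up to three items.
    public static int[][] Batches()
    {
        var batch = new BatchBlock<int>(3);
        for (int i = 1; i <= 7; i++)
        {
            batch.Post(i);
        }

        // Completing the block releases the final, partially filled batch.
        batch.Complete();

        return new[] { batch.Receive(), batch.Receive(), batch.Receive() };
    }

    // Pairs one item from each target into a single tuple.
    public static Tuple<int, string> Join()
    {
        var join = new JoinBlock<int, string>();
        join.Target1.Post(1);
        join.Target2.Post("one");
        return join.Receive(); // available once both targets have an item
    }

    public static void Main()
    {
        foreach (var group in Batches())
        {
            Console.WriteLine(string.Join(", ", group));
        }

        var pair = Join();
        Console.WriteLine(pair.Item1 + " = " + pair.Item2);
    }
}
```

Note that a JoinBlock exposes one target per input (Target1, Target2), so each source is linked or posted to its own target rather than to the block itself.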
As you can see, TPL Dataflow provides significant capabilities for building parallel dataflow processing into your own applications without having to reinvent the infrastructure to do so. This was a quick look at TPL Dataflow and the interface hierarchy upon which it is built, as well as some of the built-in blocks for building applications that incorporate dataflow parallelism. Watch for future posts where we'll look at some of these built-in mechanisms along with examples demonstrating how you can get started using TPL Dataflow.
If you are interested in looking at the current state of this new technology, here are some helpful links: