Big Ball of Mud


Pipelines (or pipes and filters) are a commonly used architectural pattern for processing large amounts of data, meaning that you have many more data items to process than operations to apply. With a regular procedural approach, it is very easy to end up with a monolithic, inflexible application built from non-reusable components. Implementing a change by adding or removing operations may be difficult. Another problem is boilerplate code mixed up with business logic. If you have ever dealt with batch processing (inevitable in every business system), you know that things like performance and transaction management may become crucial, and it is very hard to implement a processing component which separates these concerns from the logic.

The idea is to break down the processing business logic into small components and connect them using pipes, to form a pipeline. Instead of water or oil, this pipeline will now transport our data:


Basically the operations (sometimes called filters) are chunks of our logic, while pipes connect operations with each other and allow data to flow. This pattern can be seen in different forms, e.g. in command-line pipes:

C:\>dir | more

A very interesting example of using pipelines for processing is Yahoo Pipes.

Linq queries can also be viewed as data pipelines:

            string[] numbers = { "zero", "one", "two", "three",
                                   "four", "five", "six", "seven",
                                   "eight", "nine" };
            var shortNums =
                from n in numbers
                where n.Length < 5
                select n;
            // Nothing is filtered until the foreach starts pulling items.
            foreach (var num in shortNums)
                Console.WriteLine(num);

In this example the shortNums query is a pipeline with one Where operation. Linq pipelines are built by wrapping iterators: here the array iterator is wrapped by the Where iterator. (The trailing select n is a no-op projection, so the compiler drops it instead of adding a Select iterator on top.)
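To make the wrapping idea more tangible, here is a minimal sketch of two hand-written operations built with iterator blocks. The names HandRolledPipeline, Filter and Transform are made up for this illustration and this is not how System.Linq is actually implemented; it only shows the same principle of one iterator wrapping another.

```csharp
using System;
using System.Collections.Generic;

public static class HandRolledPipeline
{
    // Filter operation: passes on only the items that satisfy the predicate.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)
            if (predicate(item)) yield return item;
    }

    // Transform operation: converts every item that flows through it.
    public static IEnumerable<TOut> Transform<TIn, TOut>(IEnumerable<TIn> source, Func<TIn, TOut> selector)
    {
        foreach (var item in source)
            yield return selector(item);
    }

    public static List<string> ShortNumbersUpperCased()
    {
        string[] numbers = { "zero", "one", "two", "three", "four",
                             "five", "six", "seven", "eight", "nine" };

        // The array is wrapped by Filter, which is in turn wrapped by Transform.
        IEnumerable<string> pipeline =
            Transform(Filter(numbers, n => n.Length < 5), n => n.ToUpperInvariant());

        // Copying into a list is what finally pulls the data through the pipe.
        return new List<string>(pipeline);
    }
}
```

Neither operation knows anything about the other; they only share the data stream, so either one can be removed or reused elsewhere without touching the rest.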

Operations are generally stateless components which do processing, transforming and filtering of data. They are easily reusable, modifiable and free of boilerplate code. The only way operations communicate with each other should be through the data stream. Pipes connect operations, but can also perform additional tasks, like data buffering or parallelizing the execution.

Data flow

Now that we know how a basic pipeline looks, let’s see how data can flow through the pipe. Each pipe will have at least one input and output:


Input and output here are operations connected to the pipeline. Operations can be passive, which means each data item just goes through them when provided by the pipes (think chained method calls: each method is such a simple passive operation), or they can be active, i.e. they initiate the data flow themselves.

A push operation occurs when the data is available in the active input, and it initiates the flow:


The picture describes the push operation on the input side.
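A push pipeline can be sketched with plain delegates. This is only an illustration under my own assumptions: PushPipeline, Filter, Transform and Run are hypothetical names, and each stage is simply an Action<T> that receives an item and decides whether to push it further.

```csharp
using System;
using System.Collections.Generic;

public static class PushPipeline
{
    // Passive filter: forwards an item to the next stage only if it matches.
    public static Action<T> Filter<T>(Func<T, bool> predicate, Action<T> next)
    {
        return item => { if (predicate(item)) next(item); };
    }

    // Passive transformation: converts the item and hands it on.
    public static Action<TIn> Transform<TIn, TOut>(Func<TIn, TOut> selector, Action<TOut> next)
    {
        return item => next(selector(item));
    }

    public static List<string> Run()
    {
        var received = new List<string>();

        // Built back to front: every stage needs to know where to push next.
        Action<string> sink = received.Add;
        Action<string> upperCase = Transform<string, string>(n => n.ToUpperInvariant(), sink);
        Action<string> entry = Filter<string>(n => n.Length < 5, upperCase);

        // The active input: it owns the loop and initiates the flow.
        string[] numbers = { "zero", "one", "two", "three", "four",
                             "five", "six", "seven", "eight", "nine" };
        foreach (var number in numbers)
            entry(number);

        return received;
    }
}
```

Here the only active part is the loop at the input side; the filter, the transformation and the sink just react to whatever is pushed into them.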

A pull operation occurs when an active output component needs new data and asks the pipe for the next data item:


Linq queries are usually pull pipelines, due to the lazy execution approach. In the previously presented code, the data does not flow through the pipe until the foreach loop requests the next element from the outermost iterator (here the one created by the where clause). This makes Linq operations active.
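The lazy behaviour is easy to observe by logging from inside the filter. The sketch below uses a hypothetical LazyPullDemo class; the only real API involved is Enumerable.Where.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyPullDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        string[] numbers = { "zero", "one", "two", "three" };

        // Building the query does not touch the data yet.
        IEnumerable<string> shortNums = numbers.Where(n =>
        {
            log.Add("filter saw " + n);
            return n.Length < 5;
        });
        log.Add("query built");

        // Each step of the foreach pulls exactly one item through the filter.
        foreach (var num in shortNums)
            log.Add("consumer got " + num);

        return log;
    }
}
```

The first log entry is "query built"; after that, "filter saw ..." and "consumer got ..." entries alternate, and the last entry is "filter saw three", which the filter rejects.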

Push and pull operations can be mixed in a single pipe: it can buffer data pushed by an active input and then serve it to an active output when requested.

The pipelines presented here showed a simple sequential dataflow:

Sometimes the logic in the flow does not require sequential execution, and the flow can be split and joined:

This approach allows parallel execution of some steps.


The main advantage of pipelines is the possibility to execute the same logic in different ways, e.g. parallel execution. Instead of sequentially executing steps for each element:

we can try to parallelize the execution and spawn a new instance of the pipeline for each incoming item:

If you know something about CPUs, you are probably familiar with a similar technique used in processors.
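As a rough sketch of "same logic, different execution", the two methods below run identical filter and transform steps, once sequentially and once in parallel. I am assuming PLINQ (AsParallel, available from .NET 4) as the parallel engine here; the ExecutionStrategies class and its method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExecutionStrategies
{
    // Sequential: every item goes through the steps one after another.
    public static List<int> RunSequential(IEnumerable<int> items)
    {
        return items
            .Where(x => x % 2 == 0)
            .Select(x => x * x)
            .ToList();
    }

    // Parallel: the same steps, but PLINQ may process several items at once.
    // AsOrdered keeps the results in the order of the input.
    public static List<int> RunParallel(IEnumerable<int> items)
    {
        return items
            .AsParallel()
            .AsOrdered()
            .Where(x => x % 2 == 0)
            .Select(x => x * x)
            .ToList();
    }
}
```

The operations themselves do not change at all; only the way the pipeline is driven does, which is exactly the kind of decision we want to make from outside.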


Pipelines can be very handy and, despite all the advanced theory, we can use them in a simple way (with Linq, for example). The main benefits of this approach are reusability and composability of operations and the possibility to manage execution from outside. There are also some disadvantages, like error processing. In the next posts I will try to show some possible implementations, especially for batch processing.


Written by bigballofmud

2009/04/18 at 11:56 pm

Posted in C#, Patterns, Pipelines

Helpers and managers

Imagine you are working on a system built in a transaction script architecture and have the following talk with the lead dev/architect/some other manager:

You: This tax calculation for each order gets really messy. We have four different taxes to apply, some are for the whole order and some for the specific order items. Now we have to multiply the net value by each tax rate in the specific order, handle rounding properly and then calculate totals for the order. This multiplying is all over our order service!

Lead: Do you have any ideas?

You: Value objects. I want to implement a Tax class. Each created tax object will take your input value and handle the job of calculating tax according to the logic for the tax type. I am pretty sure we can chain these taxes together, so that we can set up tax pipelines for each order item, depending on its category. This is much better than having the tax logic all over the services. What do you think?

Lead: Wait, all domain logic must be in services. You can’t change the architecture!

I highlighted in green the part which he/she really hears. Why did this happen? Because you were specific enough using the value object pattern name, that it rang a bell in his head. The Tax class is a really strong concept, which in his opinion goes against the system’s architecture (whether it is good or bad is another story). Now imagine a different conversation:

Lead: Do you have any ideas?

You: I want to create a TaxHelper class which will be responsible for tax calculations.

Lead: Helper class? I think that’s OK. Go on.

Ever wanted to hide your crazy ideas in the production code? Use helpers 😉

No, really, my point here is that these helper classes do not express any specific concept. They are everywhere; I bet you have some OrderManager, TransactionHelper or FeeCalculator in your code. Usually such a class has lots of static methods, sometimes even state. The problem is that it can encapsulate very important business logic, and instead of properly modelling the domain, you leak some concepts into these helpers.

The main problems I see here:

  1. Violation of SOLID. As Nick Malik proves, they violate each letter of the acronym.
  2. Swelling: No single responsibility, so such a class becomes a bag for unwanted, repetitive code.
  3. Proliferation: Helper classes usually stay close to the consuming class. Let’s say you have a domain service layer and in these services you use your helper class extensively. Now you want to use one method of the helper in the client. The client and server are properly separated, so you can’t just call the method. Instead of thinking in terms of application structure (e.g. creating a new service and calling it from the client), you would probably create a helper in the client project, copy the code and call the method, because, hey, it’s just a helper.
  4. Tight coupling: Imagine you want to take out some part of your logic to another application. Even if your modules are beautifully separated, simple POCO classes, one helper can ruin all your work. You would probably take it along with your module.
  5. Not so helping after all: What tools do you have in your StringUtils class? Do you remember all methods? Are they easily discoverable? Probably like me, you thought this was a good idea to put commonly used string manipulations in one class. The problem is nobody ever uses it and reinvents the wheel by coding the same string parsing routines again and again.

Almost always helper classes are really, really bad decisions. The only counter-examples I can recall are HtmlHelper and UrlHelper from ASP.NET MVC. Here the situation is clear: these classes are starting points, do not use anything else, here is a way to extend it. Following this pattern can give new developers a very helpful way to learn your tools. Otherwise helpers smell.

Written by bigballofmud

2009/04/02 at 7:15 pm

Posted in Antipatterns

Thread Marshalling Part 3 – Automatic Marshalling

The last posts in this series showed some basic ways of marshalling background operations to the UI. Now that we know how the BackgroundWorker really works, we can take advantage of some .NET infrastructure and build a component which will automagically (and this is no April 1st joke) marshal cross-thread calls depending on the execution context. This means we can use it in Windows Forms, WPF, WCF and any other environment which supports the SynchronizationContext model (even our own!).

In the last posts we were using some seriously CPU-wasting empty looping. Let’s take this up one level: what is the more advanced way to create loops in C#? Iterators of course! The component I will show takes any IEnumerable<T> you provide (from an iterator block) and iterates through it asynchronously. Each yield return will jump back to the calling thread to notify about the progress. The code below is a modification of the previous EmptyLoop class:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;

    public class IteratingOperation<T>
    {
        private readonly IEnumerable<T> _loop;
        private AsyncOperation _operation;
        // volatile: set by the calling thread, read by the worker thread
        private volatile bool _cancelled;

        public IteratingOperation(IEnumerable<T> loop)
        {
            _loop = loop;
        }

        public void Begin()
        {
            // Captures the SynchronizationContext of the calling thread.
            _operation = AsyncOperationManager.CreateOperation(null);
            // Runs Iterate asynchronously on a thread-pool thread.
            new Action(Iterate).BeginInvoke(null, null);
        }

        public void Cancel()
        {
            _cancelled = true;
        }

        private void Iterate()
        {
            foreach (var result in _loop)
            {
                if (_cancelled)
                {
                    _operation.PostOperationCompleted(_ => OnCompleted(true), null);
                    return;
                }
                // Copy the loop variable so the lambda captures a value that never changes.
                var resultToPost = result;
                _operation.Post(_ => OnUpdateProgress(resultToPost), null);
            }
            // The loop ran to the end, so report completion without cancellation.
            _operation.PostOperationCompleted(_ => OnCompleted(false), null);
        }

        protected virtual void OnUpdateProgress(T progress)
        {
            if (UpdateProgress != null)
                UpdateProgress(this, new UpdateProgressEventArgs(progress));
        }

        protected virtual void OnCompleted(bool cancelled)
        {
            if (Completed != null)
                Completed(this, new CompletedEventArgs(cancelled));
        }

        public event EventHandler<UpdateProgressEventArgs> UpdateProgress;
        public event EventHandler<CompletedEventArgs> Completed;

        public class UpdateProgressEventArgs : EventArgs
        {
            public T Progress { get; private set; }

            public UpdateProgressEventArgs(T progress)
            {
                Progress = progress;
            }
        }

        public class CompletedEventArgs : EventArgs
        {
            public bool Cancelled { get; private set; }

            public CompletedEventArgs(bool cancelled)
            {
                Cancelled = cancelled;
            }
        }
    }

As you can see, the Begin method creates an AsyncOperation instance using the AsyncOperationManager class and then calls the Iterate method asynchronously. Both internally handle the appropriate SynchronizationContext. The Iterate method runs the provided loop as usual and raises the events, but uses the AsyncOperation instance to raise them on the thread that created the operation. This gives us a safe way to publish these events to the UI. We do not use the parameter-passing features of AsyncOperation, hence lots of nulls in the code.

Because we pass lambdas to Post methods, there is one more interesting line here. If we just pass the result in this line like this:

                _operation.Post(_ => OnUpdateProgress(result), null);

the result variable belongs to the outer scope of the lambda we are passing. This creates a closure, and if the result variable gets modified in some way before the lambda is called, the modified version will be used. As the lambda will be called on another thread, there is a chance that the iterating loop will manage to modify it. This is a case of state shared across threads, a risk I am not willing to take. That’s why I get rid of the modified closure the usual way:

                var resultToPost = result;
                _operation.Post(_ => OnUpdateProgress(resultToPost), null);
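The effect is easiest to reproduce with a plain for loop, where a single variable is shared by every lambda. A small sketch (ClosureCaptureDemo is a made-up name): note that since C# 5 the foreach loop variable is a fresh variable in each iteration, so on newer compilers the copy in the listing above is a harmless precaution, while for loops still behave as shown here.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureCaptureDemo
{
    // Every lambda captures the same variable i, which is 3 when they finally run.
    public static List<int> CaptureLoopVariable()
    {
        var callbacks = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            callbacks.Add(() => i);
        return RunAll(callbacks);
    }

    // Every lambda captures its own copy, frozen at the value of that iteration.
    public static List<int> CaptureCopy()
    {
        var callbacks = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            callbacks.Add(() => copy);
        }
        return RunAll(callbacks);
    }

    // Invokes the callbacks only after the loop has finished.
    private static List<int> RunAll(List<Func<int>> callbacks)
    {
        var results = new List<int>();
        foreach (var callback in callbacks)
            results.Add(callback());
        return results;
    }
}
```

CaptureLoopVariable returns 3, 3, 3, while CaptureCopy returns 0, 1, 2.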

The last thing I want to show is the usage of this component on the form (the form itself is the modified version from earlier examples):

    public partial class FormWithIterator : Form
    {
        private IteratingOperation<int> _emptyLoopOperation;

        public FormWithIterator()
        {
            InitializeComponent();
        }

        private void bStart_Click(object sender, EventArgs e)
        {
            progressBar.Visible = true;
            bCancel.Enabled = true;

            _emptyLoopOperation = new IteratingOperation<int>(DoEmptyLoop());
            _emptyLoopOperation.UpdateProgress += emptyloop_UpdateProgress;
            _emptyLoopOperation.Completed += emptyloop_Completed;
            _emptyLoopOperation.Begin();
        }

        public IEnumerable<int> DoEmptyLoop()
        {
            var step = 10000000;
            for (long i = 0; i < 1000000000; i++)
            {
                if (i % step == 0) yield return (int)(i / step);
            }
        }

        private void ShowProgress(int percent)
        {
            progressBar.Value = percent;
        }

        private void ShowAlgorithmCompleted()
        {
            progressBar.Value = 0;
            progressBar.Visible = false;
            bCancel.Enabled = false;
        }

        private void emptyloop_UpdateProgress(object sender, IteratingOperation<int>.UpdateProgressEventArgs e)
        {
            ShowProgress(e.Progress);
        }

        private void emptyloop_Completed(object sender, IteratingOperation<int>.CompletedEventArgs e)
        {
            ShowAlgorithmCompleted();
        }

        private void bCancel_Click(object sender, EventArgs e)
        {
            _emptyLoopOperation.Cancel();
        }
    }

As you can see the DoEmptyLoop doesn’t need to know anything about threading and thread marshalling, just like in the case of BackgroundWorker.

Written by bigballofmud

2009/04/01 at 10:43 pm

Posted in C#, Multithreading

Thread Marshalling Part 2 – using BackgroundWorker

The pattern described in the previous post is so common that .NET 2.0 introduced a new tool to make it easier: BackgroundWorker. Let’s rewrite the code of our form to take advantage of this component:

    public partial class Main : Form
    {
        private readonly EmptyLoop _emptyLoop = new EmptyLoop(1000000000);
        private readonly BackgroundWorker _bgWorker = new BackgroundWorker();

        public Main()
        {
            InitializeComponent();
        }

        private void bStart_Click(object sender, EventArgs e)
        {
            progressBar.Visible = true;
            bCancel.Enabled = true;

            _emptyLoop.UpdateProgress += algorithm_UpdateProgress;

            _bgWorker.WorkerReportsProgress = true;
            _bgWorker.WorkerSupportsCancellation = true;
            _bgWorker.DoWork += (worker, eventArgs) => _emptyLoop.Go();
            _bgWorker.ProgressChanged += (worker, eventArgs) =>
                ShowProgress(eventArgs.ProgressPercentage);
            _bgWorker.RunWorkerCompleted += (worker, eventArgs) =>
                ShowAlgorithmCompleted();
            _bgWorker.RunWorkerAsync();
        }

        private void ShowProgress(int percent)
        {
            progressBar.Value = percent;
        }

        private void ShowAlgorithmCompleted()
        {
            progressBar.Value = 0;
            progressBar.Visible = false;
            bCancel.Enabled = false;
        }

        private void algorithm_UpdateProgress(object sender,
            EmptyLoop.UpdateProgressEventArgs e)
        {
            if (!_bgWorker.CancellationPending)
            {
                _bgWorker.ReportProgress(e.Percent);
            }
            else
            {
                _emptyLoop.Cancel();
            }
        }

        private void bCancel_Click(object sender, EventArgs e)
        {
            _bgWorker.CancelAsync();
        }
    }

Looks familiar, but there are a number of significant differences from the previous implementation:

  1. The functionality provided by the BackgroundWorker class is quite similar to the design of the EmptyLoop class. This is no accident: operations like reporting progress, cancelling and completing are in fact common to asynchronous operations. We still need to send a message about the progress from our EmptyLoop, and as we would like to keep its implementation free of unnecessary dependencies, events are the best way.
  2. Cancelling the operation is quite different. The Cancel button calls the BackgroundWorker CancelAsync method, which sets the CancellationPending flag to true. Normally the DoWork event handler should check this flag and stop working. In our case the DoWork handler has no way to pass the cancellation request on to the loop, so instead we check the flag in the progress-reporting handler. This stops the algorithm a little later, but it still works.
  3. Finally, the thread marshalling calls (Invoke or BeginInvoke) are gone! This is because BackgroundWorker marshals event calls automatically.

The last information is quite interesting, especially if you take into account that the BackgroundWorker lives in System.ComponentModel namespace and is supposed to be a generic asynchronous component – it can be used in WinForms, WPF and any environment you like. Somehow it knows how to do things like thread marshalling.

If we take a look into its RunWorkerAsync method, we can see that it takes advantage of the AsyncOperationManager static class. This static class has one property, SynchronizationContext, and one method, CreateOperation. These members allow the hosting environment (like Windows Forms) to set its SynchronizationContext – the object which describes the behaviour required for asynchronous operations. What is interesting is that, as this is an extensible model, we can try to use this mechanism for our own purposes.
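To get a feeling for how extensible this model is, here is a minimal sketch of a home-made context. QueueingContext and QueueingContextDemo are hypothetical names of mine; the sketch only assumes the documented SynchronizationContext, AsyncOperationManager and AsyncOperation members. Instead of a message loop, the context keeps posted callbacks in a queue until the owning thread decides to run them.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Threading;

// Hypothetical context: stores posted callbacks until the owner thread pumps them.
public sealed class QueueingContext : SynchronizationContext
{
    private readonly Queue<KeyValuePair<SendOrPostCallback, object>> _queue =
        new Queue<KeyValuePair<SendOrPostCallback, object>>();
    private readonly object _gate = new object();

    public override void Post(SendOrPostCallback d, object state)
    {
        lock (_gate)
            _queue.Enqueue(new KeyValuePair<SendOrPostCallback, object>(d, state));
    }

    // Return the same instance so that copies made by the framework share our queue.
    public override SynchronizationContext CreateCopy()
    {
        return this;
    }

    // Runs all queued callbacks on the calling thread; returns how many were run.
    public int RunPending()
    {
        int executed = 0;
        while (true)
        {
            KeyValuePair<SendOrPostCallback, object> item;
            lock (_gate)
            {
                if (_queue.Count == 0) return executed;
                item = _queue.Dequeue();
            }
            item.Key(item.Value);
            executed++;
        }
    }
}

public static class QueueingContextDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var previous = SynchronizationContext.Current;
        var context = new QueueingContext();
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            int ownerThreadId = Thread.CurrentThread.ManagedThreadId;
            // CreateOperation picks up the context installed on this thread.
            AsyncOperation operation = AsyncOperationManager.CreateOperation(null);

            var worker = new Thread(() =>
            {
                operation.Post(_ => log.Add("progress ran on owner thread: " +
                    (Thread.CurrentThread.ManagedThreadId == ownerThreadId)), null);
                operation.PostOperationCompleted(_ => log.Add("completed"), null);
            });
            worker.Start();
            worker.Join();

            // Nothing has run yet; the callbacks execute here, on the owner thread.
            context.RunPending();
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
        return log;
    }
}
```

The worker thread only posts; the callbacks actually run when the owning thread calls RunPending, which is roughly the job a real UI message loop does for us.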

Written by bigballofmud

2009/03/28 at 9:46 pm

Posted in C#, Multithreading

Thread Marshalling Part 1 – creating a thread in Windows Forms

Note: Even if you know the basics of spawning a new thread from the UI, I recommend working through the following exercise if you have never done it.

Note: The solutions in this post are not how it is done in the 21st century. Stay tuned for the next parts.

A common situation in Windows Forms development: suppose we want to execute a long-running task which takes a lot of CPU. The most CPU-intensive task I know is running an empty loop:

    for (long i = 0; i < 1000000000; i++)
    {
        // DON'T DO THIS AT HOME
    }

Creating empty loops is wasteful for both the CPU and the developer, so let’s create something really fancy, in seasoned OO professional style:

    using System;

    public class EmptyLoop
    {
        private readonly long _count;
        private readonly long _step;
        // volatile: Cancel may be called from a different thread than Go
        private volatile bool _cancelled;

        public EmptyLoop(long countTo)
        {
            _count = countTo;
            // One step per percent; at least 1 to avoid dividing by zero in Go.
            _step = Math.Max(1, countTo / 100);
        }

        public void Go()
        {
            _cancelled = false;
            for (long i = 0; i < _count; i++)
            {
                if (_cancelled) break;
                if (i % _step == 0) OnUpdateProgress((int)(i / _step));
            }

            OnCompleted(_cancelled);
        }

        public void Cancel()
        {
            _cancelled = true;
        }

        protected virtual void OnUpdateProgress(int percent)
        {
            if (UpdateProgress != null)
                UpdateProgress(this, new UpdateProgressEventArgs(percent));
        }

        protected virtual void OnCompleted(bool cancelled)
        {
            if (Completed != null)
                Completed(this, new CompletedEventArgs(cancelled));
        }

        public event EventHandler<UpdateProgressEventArgs> UpdateProgress;
        public event EventHandler<CompletedEventArgs> Completed;

        public class UpdateProgressEventArgs : EventArgs
        {
            public int Percent { get; private set; }

            public UpdateProgressEventArgs(int percent)
            {
                Percent = percent;
            }
        }

        public class CompletedEventArgs : EventArgs
        {
            public bool Cancelled { get; private set; }

            public CompletedEventArgs(bool cancelled)
            {
                Cancelled = cancelled;
            }
        }
    }

Maybe not so empty now, but you’ve got the point. We have an algorithm, what we need is a form for running it:


The code for the form:

    public partial class Main : Form
    {
        private readonly EmptyLoop _emptyLoop = new EmptyLoop(1000000000);

        public Main()
        {
            InitializeComponent();
        }

        private void bStart_Click(object sender, EventArgs e)
        {
            progressBar.Visible = true;
            bCancel.Enabled = true;

            _emptyLoop.UpdateProgress += emptyloop_UpdateProgress;
            _emptyLoop.Completed += emptyloop_Completed;

            _emptyLoop.Go();
        }

        private void ShowProgress(int percent)
        {
            progressBar.Value = percent;
        }

        private void ShowAlgorithmCompleted()
        {
            _emptyLoop.UpdateProgress -= emptyloop_UpdateProgress;
            _emptyLoop.Completed -= emptyloop_Completed;
            progressBar.Value = 0;
            progressBar.Visible = false;
            bCancel.Enabled = false;
        }

        private void emptyloop_UpdateProgress(object sender, EmptyLoop.UpdateProgressEventArgs e)
        {
            ShowProgress(e.Percent);
        }

        private void emptyloop_Completed(object sender, EmptyLoop.CompletedEventArgs e)
        {
            ShowAlgorithmCompleted();
        }

        private void bCancel_Click(object sender, EventArgs e)
        {
            _emptyLoop.Cancel();
        }
    }

When we run this simple program, something weird happens (your results may vary): the program seems to do its job by running the loop, but the progress bar is not updated smoothly and when you try to click the Cancel button, the application hangs.

When we realize that each windows form is something of a loop itself, the problem becomes clear – we do not allow the form to update itself by freezing the message loop. Maybe allowing the form to update from time to time will solve the issue:

        private void ShowProgress(int percent)
        {
            progressBar.Value = percent;
            Application.DoEvents();
        }

Better, but no banana. Now the Cancel button sometimes works, but the UI still does not react properly and usually the operation completes before the progress bar is filled (wait, haven’t I seen this before?). Clearly the UI is still not refreshed properly. If only we had a mechanism which would allow us to switch from the running algorithm to the UI.

Wait, isn’t that what threads are supposed to do? Well, they should: a thread is an abstraction which allows us to run things in parallel, or at least gives us the impression of doing so at the level of the system. Let’s remove the DoEvents call and try to run the loop on another thread:

        private void bStart_Click(object sender, EventArgs e)
        {
            progressBar.Visible = true;
            bCancel.Enabled = true;
            _emptyLoop.UpdateProgress += emptyloop_UpdateProgress;
            _emptyLoop.Completed += emptyloop_Completed;
            var loopThread = new Thread(_emptyLoop.Go);
            loopThread.Start();
        }

Oops! As soon as we click the Start button we get InvalidOperationException in ShowProgress method:

        Cross-thread operation not valid: Control 'progressBar' accessed from a thread other than the thread it was created on.

Let me explain what is happening here. When we start the program, the main thread is created for us, and this is the thread on which the form and all its controls run. When we click the Start button, we spawn a new thread in which our empty loop is running. The EmptyLoop class has two events, UpdateProgress and Completed, to which the form subscribes, so when we run the EmptyLoop.Go method, it occasionally calls the two event handlers emptyloop_UpdateProgress and emptyloop_Completed. As the EmptyLoop.Go method is called on another thread, so are the event handlers. But what the event handlers want to do is access the controls of the form to provide feedback to the user. WinForms controls must only be accessed from one thread – the GUI thread. What we need is to jump to the UI thread for a moment to update the controls. This is called thread marshalling.

Each control has two methods for this: Invoke and BeginInvoke.

        private void emptyloop_UpdateProgress(object sender, EmptyLoop.UpdateProgressEventArgs e)
        {
            Invoke(new Action<int>(ShowProgress), e.Percent);
        }
        private void emptyloop_Completed(object sender, EmptyLoop.CompletedEventArgs e)
        {
            Invoke(new Action(ShowAlgorithmCompleted));
        }

It does exactly what we need: calls the method on the UI thread. Here we use the methods of the form, but any control will do.

If you need to check from which thread the method is accessed, use InvokeRequired property:

        if (InvokeRequired)
            Invoke(new Action<int>(ShowProgress), e.Percent);
        else
            ShowProgress(e.Percent);

The difference between Invoke and BeginInvoke is that the computing thread will stop on Invoke until ShowProgress completes, while BeginInvoke is asynchronous – the computing thread will just fire and forget ShowProgress and continue its worthless loop spinning.

Written by bigballofmud

2009/03/21 at 8:51 pm

Posted in C#, Multithreading

Creating simple state machines using C# 2.0 iterators

State machines can be a very useful way of simplifying complex logic: sometimes they are part of the business domain, sometimes they can help us manage screen logic. Normally you would use the State pattern for this, but when you have a simple case of sequential states with not much behaviour attached, you end up creating lots of small state classes which just change the state to the next one.

Recently, when studying C# iterator blocks in depth, I discovered that we already have a built-in mechanism for state management: iterators.

Let me show you an example with a simple lamp object: lamp can be On or Off and we change the state by switching it, it can also print its state to the console.

    public enum LampState
    {
        On,
        Off
    }

    public class Lamp : StateMachine<LampState>
    {
        public void Switch()
        {
            Console.WriteLine("Switching...");
            Go();
        }

        public void PrintState()
        {
            Console.WriteLine("Current state: " + CurrentState);
        }

        protected override IEnumerable<LampState> ConfigureStates()
        {
            while (true)
            {
                yield return LampState.Off;
                yield return LampState.On;
            }
        }
    }

The combination of the ConfigureStates and Go methods does the trick. ConfigureStates stops its execution at each yield return point, and the yielded value becomes the current state of the lamp. The Go method advances the iterator, which resumes execution from the last point and runs to the next yield return. Clearly this approach lets us think about state changes in a more sequential way. Let’s run it:

    class Program
    {
        static void Main()
        {
            var lamp = new Lamp();
            lamp.Start();

            lamp.PrintState();
            lamp.Switch();
            lamp.PrintState();
            lamp.PrintState();
            lamp.Switch();
            lamp.PrintState();

            Console.ReadLine();
        }
    }

The result is:

        Current state: Off
        Switching...
        Current state: On
        Current state: On
        Switching...
        Current state: Off

And finally the mysterious base class:

    public abstract class StateMachine<TState>
    {
        private IEnumerator<TState> _enumerator;

        public void Start()
        {
            _enumerator = ConfigureStates().GetEnumerator();
            _enumerator.MoveNext();
        }

        public TState CurrentState
        {
            get { return _enumerator.Current; }
        }

        protected void Go()
        {
            _enumerator.MoveNext();
        }

        protected abstract IEnumerable<TState> ConfigureStates();
    }

And that’s it! The rest is done by the compiler. The magic behind this works because iterator blocks are in fact a form of coroutine, and coroutines and state machines are very closely related.
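To see what "done by the compiler" roughly means, here is a hand-written enumerator that behaves like the ConfigureStates iterator of the lamp. It is a simplified approximation written for this post, not the code the C# compiler really emits, and LampStateEnumerator is a made-up name; the LampState enum is repeated only to keep the sketch self-contained.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public enum LampState
{
    On,
    Off
}

// Hand-written stand-in for: while (true) { yield return Off; yield return On; }
public sealed class LampStateEnumerator : IEnumerator<LampState>
{
    // 0 = not started, 1 = paused after yielding Off, 2 = paused after yielding On
    private int _position;
    private LampState _current;

    public LampState Current
    {
        get { return _current; }
    }

    object IEnumerator.Current
    {
        get { return _current; }
    }

    public bool MoveNext()
    {
        switch (_position)
        {
            case 0:
            case 2:
                // Start of the while loop body: first yield return.
                _current = LampState.Off;
                _position = 1;
                return true;
            default:
                // Resume after the first yield: second yield return.
                _current = LampState.On;
                _position = 2;
                return true;
        }
    }

    public void Reset()
    {
        _position = 0;
    }

    public void Dispose()
    {
    }
}
```

The _position field plays the role of the "last point" from which the iterator resumes, so the yield return version and this class describe the same two-state machine.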

I definitely declare iterator blocks the #1 feature in C#.

Written by bigballofmud

2009/03/21 at 12:27 am

Posted in C#

Testing Equality

Equality in .NET is one of the most basic concepts, and usually one of the harder ones to grasp. While much has been written about implementing the Equals and GetHashCode contracts, it is usually quite difficult to test all the cases which can be broken when providing your own implementation.

First let’s see what MSDN says about the requirements for implementing Equals:

The following statements must be true for all implementations of the Equals method. In the list, x, y, and z represent object references that are not null.

  • x.Equals(x) returns true (…)
  • x.Equals(y) returns the same value as y.Equals(x). (…)
  • (x.Equals(y) && y.Equals(z)) returns true if and only if x.Equals(z) returns true.
  • Successive calls to x.Equals(y) return the same value as long as the objects referenced by x and y are not modified.
  • x.Equals(a null reference) returns false.

While it has been a while since I was in school, this sounds somewhat familiar:

Let A be a set and ~ be a binary relation on A. ~ is called an equivalence relation if and only if for all a,b,c in A, all the following holds true:

  • Reflexivity: a ~ a
  • Symmetry: if a ~ b then b ~ a
  • Transitivity: if a ~ b and b ~ c then a ~ c.

Right! The infamous Equals contract is just the definition of an equivalence relation with some additional conditions: consistency and right-side null behaviour. Let’s try to put these ideas into code (and yes, I know the consistency implementation is very naive):

    public class EqualityConditions
    {
        private const int MaxConsistencyChecks = 5;

        public static bool RelationIsReflexive<T>(T x)
            where T : class
        {
            // The static object.Equals(x, x) returns true for identical references
            // without ever calling x.Equals, so call the instance method directly.
            return x == null || x.Equals(x);
        }

        public static bool RelationIsSymmetric<T>(T x, T y)
            where T : class
        {
            return Equals(x, y) == Equals(y, x);
        }

        public static bool RelationIsTransitive<T>(T x, T y, T z)
            where T : class
        {
            if (Equals(x, y) && Equals(y, z))
                return Equals(x, z);

            // The premise does not hold, so there is nothing to violate.
            return true;
        }

        public static bool RelationIsConsistent<T>(T x, T y)
            where T : class
        {
            // Naive check: repeat the comparison a few times and expect the same answer.
            bool result = Equals(x, y);
            for (int i = 1; i < MaxConsistencyChecks; i++)
            {
                if (Equals(x, y) != result) return false;
            }
            return true;
        }

        public static bool RelationIsFalseForRightSideNull<T>(T x)
            where T : class
        {
            if (x == null) return true;
            return !x.Equals(null);
        }
    }

While this can be useful, there is one more interesting thing about an equivalence relation: it defines how a set of values can be divided into so-called equivalence classes, groups of elements which are all in relation with each other. More importantly, it also works the other way round: if we divide any set into disjoint subsets in such a way that each element belongs to exactly one subset, this division generates an equivalence relation for us (see this for a more accurate description). By implementing Equals we define the relation in one way; defining it a second way, with subsets, gives us something to test our implementation against. With the previous conditions defined, we can try to write a little framework to help us. Give it an interesting API, add some generics magic and voila! Let me give you an example.

Let’s assume we have defined a simple Name class, a value object in our domain which encapsulates all name strings. This class has an Equals method implemented in the standard way:
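As an aside before that example, the claim that a partition generates an equivalence relation can be checked by brute force in a few lines. This is only a sketch: the PartitionRelation class and its FromPartition and IsEquivalence methods are names invented for this illustration, not part of the framework described in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original post.
public static class PartitionRelation
{
    // Turns a partition (disjoint subsets) into the relation
    // "a ~ b if and only if a and b belong to the same subset".
    public static Func<T, T, bool> FromPartition<T>(params T[][] subsets)
    {
        var subsetOf = new Dictionary<T, int>();
        for (int i = 0; i < subsets.Length; i++)
        {
            foreach (var element in subsets[i])
            {
                // Add throws on a duplicate key, which enforces that
                // every element belongs to exactly one subset.
                subsetOf.Add(element, i);
            }
        }
        return (a, b) => subsetOf[a] == subsetOf[b];
    }

    // Brute-force check of reflexivity, symmetry and transitivity over a finite set.
    public static bool IsEquivalence<T>(IList<T> set, Func<T, T, bool> related)
    {
        foreach (var a in set)
        {
            if (!related(a, a)) return false;
            foreach (var b in set)
            {
                if (related(a, b) != related(b, a)) return false;
                foreach (var c in set)
                {
                    if (related(a, b) && related(b, c) && !related(a, c)) return false;
                }
            }
        }
        return true;
    }
}
```

For instance, PartitionRelation.FromPartition(new[] { 1, 4, 7 }, new[] { 2, 5 }, new[] { 3 }) gives a relation in which 1 and 7 are related but 1 and 2 are not, and IsEquivalence confirms that all three conditions hold over those six numbers.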

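    using System;

    public class Name
    {
        public string Value { get; private set; }

        public Name(string name)
        {
            if (string.IsNullOrEmpty(name)) throw new ArgumentException();
            Value = name;
        }

        public override bool Equals(object obj)
        {
            if (ReferenceEquals(this, obj)) return true;
            // Comparing runtime types keeps Name and its subclasses apart.
            if (obj == null || GetType() != obj.GetType())
            {
                return false;
            }
            var otherName = (Name)obj;
            return Value == otherName.Value;
        }

        // Not in the original listing: overriding Equals without GetHashCode
        // triggers compiler warning CS0659, so equal names get equal hash codes here.
        public override int GetHashCode()
        {
            return Value.GetHashCode();
        }
    }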

Now let’s test its Equals method by providing some examples (I used NUnit):

    using NUnit.Framework;

    [TestFixture]
    public class NameTests
    {
        [Test]
        public void Verify_Equals_implementation()
        {
            Name nullReference = null;
            Name name1 = new Name("Name");
            Name name1_copy = new Name("Name");
            Name name2 = new Name("other Name");
            Name name1_Derived = new NameDerived("Name");
            Name name1_Derived_copy = new NameDerived("Name");
            Name name2_Derived = new NameDerived("other Name");

            // Each Eq.Class call lists values that should all be equal to each other.
            new EqualityTestRunner<Name>(
                Eq.Class(nullReference),
                Eq.Class(name1, name1_copy),
                Eq.Class(name2),
                Eq.Class(name1_Derived, name1_Derived_copy),
                Eq.Class(name2_Derived)
                ).Run();
        }

        class NameDerived : Name
        {
            public NameDerived(string name) : base(name)
            {
            }
        }
    }

How are the classes defined? Each class is a group of example values which, because they belong to the same group, should all equal each other (in terms of the Equals method). Any pair of elements from different groups should not equal each other.

So what should be tested here? First of all, we should check that our relation really is an equivalence, using the conditions from the EqualityConditions class. Then we can check that the relation defined by Equals matches the examples, by comparing each pair and checking that the pair's equality corresponds to the defined sets. Enough said, let’s see some code:

    using System.Collections.Generic;
    using System.Linq;

    public class EqualityTestRunner<T> where T : class
    {
        private readonly EqualityClass<T>[] _equalityClasses;

        public EqualityTestRunner(params EqualityClass<T>[] equalityClasses)
        {
            _equalityClasses = equalityClasses;
        }

        public void Run()
        {
            foreach (var equalityClass in _equalityClasses)
            {
                equalityClass.AreEqualWithinClass();
                equalityClass.AreNotEqualWithOtherClassesMembers(_equalityClasses);
            }
            TestEqualityContractConditions(_equalityClasses);
        }

        private static void TestEqualityContractConditions(IEnumerable<EqualityClass<T>> equalityClasses)
        {
            // Flatten all classes into one sequence of examples.
            var allExamples = from equalityClass in equalityClasses
                              from example in equalityClass
                              select example;
            TestEqualityContractConditions(allExamples);
        }

        private static void TestEqualityContractConditions(IEnumerable<T> examples)
        {
            // Check the contract for every single example, pair and triple.
            foreach (var first in examples)
            {
                EqualityTests<T>.IsReflective(first);
                EqualityTests<T>.IsFalseForRightSideNull(first);

                foreach (var second in examples)
                {
                    EqualityTests<T>.IsSymmetric(first, second);
                    EqualityTests<T>.IsConsistent(first, second);

                    foreach (var third in examples)
                    {
                        EqualityTests<T>.IsTransitive(first, second, third);
                    }
                }
            }
        }
    }

Now the EqualityClass:

    using System.Collections;
    using System.Collections.Generic;

    public class EqualityClass<T> : IEnumerable<T> where T : class
    {
        private readonly T[] _examples;

        public EqualityClass(T[] examples)
        {
            _examples = examples;
        }

        public void AreEqualWithinClass()
        {
            foreach (var first in _examples)
                foreach (var second in _examples)
                {
                    EqualityTests<T>.AreEqual(first, second);
                }
        }

        public void AreNotEqualWithOtherClassesMembers(EqualityClass<T>[] equalityClasses)
        {
            foreach (var otherClass in equalityClasses)
            {
                if (otherClass == this) continue;

                foreach (var exampleFromEqualityClass in _examples)
                {
                    foreach (var exampleFromOtherClass in otherClass._examples)
                    {
                        EqualityTests<T>.AreNotEqual(exampleFromEqualityClass, exampleFromOtherClass);
                    }
                }
            }
        }

        public IEnumerator<T> GetEnumerator()
        {
            foreach (var example in _examples)
            {
                yield return example;
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

    // Small factory so that T is inferred from the examples.
    public static class Eq
    {
        public static EqualityClass<T> Class<T>(params T[] examples) where T : class
        {
            return new EqualityClass<T>(examples);
        }
    }

And the tests which are performed by the runner:

    using NUnit.Framework;

    public static class EqualityTests<T>
        where T : class
    {
        public static void IsReflective(T x)
        {
            Assert.IsTrue(EqualityConditions.RelationIsReflexive(x),
                          string.Format("Relation is not reflexive. x: {0}",
                                        x == null ? "null" : x.ToString())
                );
        }

        public static void IsSymmetric(T x, T y)
        {
            Assert.IsTrue(EqualityConditions.RelationIsSymmetric(x, y),
                          string.Format("Relation is not symmetric. x: {0} y: {1}",
                                        x == null ? "null" : x.ToString(),
                                        y == null ? "null" : y.ToString())
                );
        }

        public static void IsTransitive(T x, T y, T z)
        {
            Assert.IsTrue(EqualityConditions.RelationIsTransitive(x, y, z),
                          string.Format(
                              "Relation is not transitive. x: {0} y: {1}, z: {2}",
                              x == null ? "null" : x.ToString(),
                              y == null ? "null" : y.ToString(),
                              z == null ? "null" : z.ToString())
                );
        }

        public static void IsConsistent(T x, T y)
        {
            Assert.IsTrue(EqualityConditions.RelationIsConsistent(x, y),
                          string.Format(
                              "Relation is not consistent. x: {0} y: {1}",
                              x == null ? "null" : x.ToString(),
                              y == null ? "null" : y.ToString())
                );
        }

        public static void IsFalseForRightSideNull(T x)
        {
            Assert.IsTrue(EqualityConditions.RelationIsFalseForRightSideNull(x),
                          string.Format(
                              "Relation is not false for null as right side parameter. x: {0}",
                              x == null ? "null" : x.ToString())
                );
        }

        // Checks both directions; a null side is skipped because Equals cannot be called on it.
        public static void AreEqual(T x, T y)
        {
            if (x != null)
            {
                Assert.IsTrue(x.Equals(y),
                              string.Format("Parameters should be equal, while: x != y    x: {0} y: {1}", x, y));
            }

            if (y != null)
            {
                Assert.IsTrue(y.Equals(x),
                              string.Format("Parameters should be equal, while: y != x    x: {0} y: {1}", x, y));
            }
        }

        public static void AreNotEqual(T x, T y)
        {
            if (x != null)
            {
                Assert.IsFalse(x.Equals(y),
                               string.Format("Parameters should not be equal, while: x == y    x: {0} y: {1}", x, y));
            }

            if (y != null)
            {
                Assert.IsFalse(y.Equals(x),
                               string.Format("Parameters should not be equal, while: y == x    x: {0} y: {1}", x, y));
            }
        }
    }

Of course this is just a concept: it does not test the GetHashCode implementation at all, and if the Equals method is broken, it is still somewhat difficult to find the cause.
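One possible direction for the missing GetHashCode part is a condition in the same style as EqualityConditions: whenever two objects are equal, their hash codes must be equal too (the converse is not required). The sketch below is an assumption about how that could look, not something the framework above contains. HashCodeConditions and the deliberately broken BrokenName class are invented for this illustration.

```csharp
// Hypothetical addition, not part of the framework described above.
public static class HashCodeConditions
{
    // Equal objects must report equal hash codes.
    // Unequal objects are allowed to collide, so they always pass.
    public static bool EqualObjectsHaveEqualHashCodes<T>(T x, T y)
        where T : class
    {
        if (x == null || y == null) return true; // nothing to compare
        if (!x.Equals(y)) return true;
        return x.GetHashCode() == y.GetHashCode();
    }
}

// Deliberately broken example: Equals compares values,
// but every instance gets its own hash code.
public class BrokenName
{
    private static int _counter;
    private readonly int _id = ++_counter;

    public string Value { get; private set; }

    public BrokenName(string value)
    {
        Value = value;
    }

    public override bool Equals(object obj)
    {
        var other = obj as BrokenName;
        return other != null && Value == other.Value;
    }

    public override int GetHashCode()
    {
        return _id;
    }
}
```

Two BrokenName instances with the same value are equal but fail this condition, which is exactly the kind of mistake that breaks lookups in Dictionary or HashSet. Such a check could be run over every pair within an equality class, next to the existing Equals checks.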

Written by bigballofmud

2009/03/18 at 10:47 pm

Posted in C#