- Posted by Kris Vandermotten on November 30, 2010
The Visitor design pattern was first documented in 1995 by the Gang of Four. It’s a workaround for the fact that most strongly typed object oriented languages only support single dispatch, even when sometimes double dispatch is required. With C# 4, we no longer need this workaround. We now have something better, more on that below. Let’s look at an example.
The traditional Visitor pattern
The System.Linq.Expressions namespace contains types that enable code expressions to be represented as objects in the form of expression trees. For example, the following C# statement creates an expression tree:
Expression<Func<double, double>> f = x => Math.Sin(1 + 2 * x);
There are many kinds of expressions. The above example creates several objects of different types, including ParameterExpression, ConstantExpression, BinaryExpression and MethodCallExpression, all of which inherit from Expression.
There are many ways to represent expressions as text. For example, we can use an infix notation (as most programming languages do), but we can also use prefix or postfix notation. Anyone who as ever worked with an HP scientific calculator, or a programming language such as Forth, will appreciate postfix notation, also known as reverse polish notation.
As a result, the way to translate a particular expression into a text representation depends on two things: the kind of expression and the kind of notation. More precisely, the method to execute to translate an expression object into a string object depends on the type of the expression, and on the type of the translation algorithm. Using a virtual function would allow the system to choose a method based on one of these dimensions, e.g. the type of expression, but not on both. Virtual functions provide single dispatch, but we need dual dispatch.
In fact, the expression class has a virtual ToString() method, inherited from object. Every type inheriting from Expression has its own version, making the algorithm depend on the type of expression. But it’s a hardcoded implementation, using an infix notation. What if we want a postfix ToString? Or Prefix? Or a C# or F# syntax? This is where the Visitor pattern can help us, and luckily the Expression class has support for it. The class ExpressionVisitor is an abstract base class for algorithms working on expressions. Now when I say algorithms, you probably think of methods with parameters and return values, but that’s not how a Visitor works. A Visitor is an object of some class, and parameters must be passed in, typically via the constructor. The return value, i.e. the result of the algorithm must be read back from a property. Let’s create a base class for our ToString visitors:
public abstract class ToStringVisitor : ExpressionVisitor
{
protected readonly StringBuilder resultAccumulator = new StringBuilder();
public string Result
{
get { return resultAccumulator.ToString(); }
}
}
This provides us with a base class for Visitors that have a string Result property. There are some issues with it, such as when to Clear() the StringBuilder when the Visitor is reused, but let’s not get into those. We can now create a ToPostfixStringVisitor, and encapsulate it behind a static method:
public static class ExpressionExtensions
{
public static string ToPostfixString(this Expression<Func<double, double>> function)
{
var visitor = new ToPostFixStringVisitor();
visitor.Visit(function);
return visitor.Result;
}
private class ToPostFixStringVisitor : ToStringVisitor
{
protected override Expression VisitLambda<T>(Expression<T> node)
{
// enables reusing the visitor – not absolutely required here as the only
// place where an instance can be created is in the ToPostfixString method.
this.resultAccumulator.Clear();
foreach (var parameter in node.Parameters)
{
this.Visit(parameter);
}
this.resultAccumulator.Append("-> ");
this.Visit(node.Body);
return node;
}
protected override Expression VisitParameter(ParameterExpression node)
{
this.resultAccumulator.Append(node.Name);
this.resultAccumulator.Append(' ');
return node;
}
protected override Expression VisitBinary(BinaryExpression node)
{
this.Visit(node.Left);
this.Visit(node.Right);
switch (node.NodeType)
{
case ExpressionType.Add:
case ExpressionType.AddChecked:
this.resultAccumulator.Append('+');
break;
case ExpressionType.Multiply:
case ExpressionType.MultiplyChecked:
this.resultAccumulator.Append('*');
break;
case ExpressionType.Subtract:
case ExpressionType.SubtractChecked:
this.resultAccumulator.Append('-');
break;
case ExpressionType.Divide:
this.resultAccumulator.Append('/');
break;
case ExpressionType.Modulo:
this.resultAccumulator.Append('%');
break;
default:
throw new NotSupportedException();
}
this.resultAccumulator.Append(' ');
return node;
}
protected override Expression VisitMethodCall(MethodCallExpression node)
{
foreach (var arg in node.Arguments)
{
this.Visit(arg);
}
this.resultAccumulator.Append(node.Method.Name);
this.resultAccumulator.Append(' ');
return node;
}
protected override Expression VisitConstant(ConstantExpression node)
{
this.resultAccumulator.Append(node.Value);
this.resultAccumulator.Append(' ');
return node;
}
}
}
For example, the following line outputs x -> 1 2 x * + Sin
Console.WriteLine(ExpressionExtensions.ToPostfixString(x => Math.Sin(1 + 2 * x)));
It works, even though for the sake of example it only supports a very small subset of all expressions. At least in the case of binary operators, it throws a NotSupportedException for operators that are, well, not supported. I really should add a bunch of other methods, for example:
protected override Expression VisitConditional(ConditionalExpression node)
{
throw new NotSupportedException();
}
protected override Expression VisitBlock(BlockExpression node)
{
throw new NotSupportedException();
}
Anyway, how does it work? The Visit() method calls an internal virtual method on Expression called Accept. Being virtual, this chooses what kind of expression to work on an it calls the appropriate VisitX method in the visitor. This one is virtual as well, and it chooses the correct algorithm, our ToPostfixStringVisitor in this case.
So we have double dispatch, via a combination of two single dispatch calls.
Dynamic dispatch to the rescue
As of C# 4, we are not restricted to single dispatch, we now have dynamic dispatch. Let’s see what this example looks like using dynamic:
public static class ExpressionExtensions
{
public static string ToPostfixString(this Expression<Func<double, double>> function)
{
StringBuilder resultAccumulator = new StringBuilder();
Visit(function, resultAccumulator);
return resultAccumulator.ToString();
}
private static void Visit(Expression expression, StringBuilder resultAccumulator)
{
dynamic dynamicExpression = expression;
VisitCore(dynamicExpression, resultAccumulator);
}
private static void VisitCore(LambdaExpression node, StringBuilder resultAccumulator)
{
foreach (var parameter in node.Parameters)
{
Visit(parameter, resultAccumulator);
}
resultAccumulator.Append("-> ");
Visit(node.Body, resultAccumulator);
}
private static void VisitCore(ParameterExpression node, StringBuilder resultAccumulator)
{
resultAccumulator.Append(node.Name);
resultAccumulator.Append(' ');
}
private static void VisitCore(BinaryExpression node, StringBuilder resultAccumulator)
{
Visit(node.Left, resultAccumulator);
Visit(node.Right, resultAccumulator);
switch (node.NodeType)
{
case ExpressionType.Add:
case ExpressionType.AddChecked:
resultAccumulator.Append('+');
break;
case ExpressionType.Multiply:
case ExpressionType.MultiplyChecked:
resultAccumulator.Append('*');
break;
case ExpressionType.Subtract:
case ExpressionType.SubtractChecked:
resultAccumulator.Append('-');
break;
case ExpressionType.Divide:
resultAccumulator.Append('/');
break;
case ExpressionType.Modulo:
resultAccumulator.Append('%');
break;
default:
throw new NotSupportedException();
}
resultAccumulator.Append(' ');
}
private static void VisitCore(MethodCallExpression node, StringBuilder resultAccumulator)
{
foreach (var arg in node.Arguments)
{
Visit(arg, resultAccumulator);
}
resultAccumulator.Append(node.Method.Name);
resultAccumulator.Append(' ');
}
private static void VisitCore(ConstantExpression node, StringBuilder resultAccumulator)
{
resultAccumulator.Append(node.Value);
resultAccumulator.Append(' ');
}
private static void VisitCore(Expression node, StringBuilder resultAccumulator)
{
throw new NotSupportedException();
}
}
The dynamic dispatch is achieved by the Visit method. To learn more about how this works, see http://blogs.msdn.com/b/samng/archive/2008/11/06/dynamic-in-c-iii-a-slight-twist.aspx.
So how is this better?
First of all, this approach works with all classes. The Expression class does have visitor support baked in, but most classes don’t. The dynamic approach also works if the target classes don’t support the visitor pattern. That also means that you don’t have to do anything special with your own classes to enable this technique.
The dynamic approach is also simpler. I’ve noticed that most people don’t immediately “see” the visitor pattern, but the dynamic approach is easier to understand.
Visitor implementations typically have methods that return void, and producing a result must be accomplished via fields and properties (the ExpressionVisitor is a notable exception here, it is optimized for rewriting expressions, i.e. calculating a new expression based on an existing one). With dynamic methods, you choose your parameter and return types (the StringBuilder in this example). Not only is that much simpler to code, the entire thing has no state in fields, only in stack local variables. As a result, it’s completely reentrant and thread-safe.
Note also that the last VisitCore method specifies what to do with expressions that aren’t handled by any of the other methods. Much more convenient than with a Visitor, where you always have to specify a method for each concrete type, unless it just so happens that the behavior you want is the default behavior from the base class.
Conclusion
The dynamic keyword was introduced to facilitate interoperability with dynamic languages and systems, including COM. C# remains primarily a strongly typed language. As such, some people suggested that dynamic has no place in plain C# programs that don’t require such interoperability. However, the above example shows that dynamic dispatch can be very useful in the context of strongly typed C# programs, as an alternative to the Visitor pattern.
- Posted by Kris Vandermotten on October 17, 2009
A few weeks ago, I talked about static reflection and its advantages. You’ll remember that the main advantages, compared to the normal reflection API’s, are the compile time checking of parameters and IntelliSense support.
How does it compare at other levels, performance for example? Before we dive into that question, let me state that performance may or may not be important to you. A program that is fast enough is, well, fast enough. It’s unlikely that a (single) reflection call will have a significant impact on, say, the response time of a graphical user interface, and so performance doesn’t matter. If your algorithm requires millions of reflection operations, I’m sure you can rewrite it somehow to reduce that number significantly, and then performance again probably doesn’t matter anymore. That being said, we still want to know, right?
First of all, let’s compare code.
Take this line (using the Example class from the last post):
PropertyInfo pi = typeof(Example).GetProperty("Description");
This line compiles to the following IL (simplified for readability):
ldtoken Example
call class Type Type::GetTypeFromHandle(valuetype RuntimeTypeHandle)
ldstr "Description"
call instance class PropertyInfo Type::GetProperty(string)
Compare that to the following line:
PropertyInfo pi = StaticReflector.Create<Example>().PropertyInfo(e => e.Description);
Which compiles to:
call class IStaticReflector`1<!!0> StaticReflector::Create<class Example>()
ldtoken Example
call class Type Type::GetTypeFromHandle(valuetype RuntimeTypeHandle)
ldstr "e"
call class ParameterExpression Expression::Parameter(class Type, string)
stloc.0
ldloc.0
ldtoken instance string Example::get_Description()
call class MethodBase MethodBase::GetMethodFromHandle(valuetype RuntimeMethodHandle)
castclass MethodInfo
call class MemberExpression Expression::Property(class Expression, class MethodInfo)
ldc.i4.1
newarr ParameterExpression
stloc.1
ldloc.1
ldc.i4.0
ldloc.0
stelem.ref
ldloc.1
call class Expression`1<!!0> Expression::Lambda<class System.Func`2<class Example, string>>(class Expression, class ParameterExpression[])
call class PropertyInfo StaticReflectorExtensions::PropertyInfo<class Example, string>(class IStaticReflector`1<!!0>, class Expression`1<class System.Func`2<!!0, !!1>>)
As you can see, this code doesn’t load the “Description” string, it uses the ldtoken instruction instead. Some bloggers have suggested that this would make it more efficient. Unfortunately, even if the ldtoken instruction is efficient, it is largely offset by the construction of the lambda expression. I ran a little benchmark, in which I compare execution time (in ticks) and memory usage (in generation 0 garbage collection runs) of both approaches, executing each one a million times. This is the result (on my laptop):
Using Reflection Time: 1089308 Collections: 45
Using StaticReflection Time: 13513777 Collections: 264
As you can see, the Static Reflection approach is about 13.5 times slower than the good old dynamic reflection, and it uses a lot more memory. That should be no surprise either: both cases allocate a PropertyInfo object, but the static case also allocates the expression, which is nothing but food for the garbage collector.
So, one approach seems good at compile time, and the other is good at run time. It seems we’re stuck between a rock and a hard place. But the situation isn’t so bad: we have two options to choose from, each with their pro’s and con’s. What the best one is depends on your requirements, and what you value the most: compile time checking (which may result in productivity and maintainability benefits), or performance.
And who knows, maybe there is a third option, giving the best of both worlds. But that’s for next time.
- Posted by Kris Vandermotten on July 16, 2009
A long time ago, I wrote a blog post about the problems with String.Trim(). I’m happy to see that all three issues have been addressed in the .NET Framework 4.0.
To start with, Trim() will now be consistent with Char.IsWhiteSpace(). Theoretically, this is a breaking change, but I don’t expect many programs to have a problem with this change. Note that the change is very well documented in the online help.
Secondly, the code of Trim() has been cleaned up considerably. A string that consists entirely of whitespace is no longer scanned twice. I haven’t done any benchmarks, but I expect the performance to be at least as good as for the same function in .NET 2.0 – 3.5.
Last but not least, the frequent abuse of the Trim() function to simply validate strings will greatly decrease with the introduction of the static IsNullOrWhitespace(string value) function, which is much faster than calling Trim().
It’s a small detail, compared to all the other goodies .NET 4.0 brings, but a good addition to the toolbox nonetheless.
- Posted by Kris Vandermotten on May 30, 2007
Some questions seem to come and go in waves. Recently, several people asked me about versioning .NET assemblies. There are at least four attributes int the BCL that allow you to specify version information for a .NET assembly: AssemblyVersionAttribute, AssemblyFileVersionAttribute, ComCompatibleVersionAttribute and AssemblyInformationalVersionAttribute. How should you use them?
The Assembly Version (AssemblyVersionAttribute) reflects the version of the specification of the assembly. It changes when the API changes (types or methods added, modified or removed), or when the semantics of the API change (a method now does something functionally different). When neither of these conditions are met, existing clients will be compatible with the new “version”, and the Assembly Version should not change.
The File Version (AssemblyFileVersionAttribute) reflects the distribution. It changes when the binary image of the Assembly changes, even when the Assembly Version does not. Typically, this is the result of bug fixes or internal optimizations.
While in theory the File Version allows any string to be used as a value, it is highly recommended to use a four number version string, according to the same syntax and semantics as the assembly version.
These version numbers consist of four numbers in the range 0 to 65534. The four values indicate:
- Major Version: change when features have been modified or removed.
- Minor Version: change when features have been added or Major version changed.
- Build number: change when bugs have been fixed or Minor Version changed.
- Revision number: change when non-functional improvements were made or the Build number changed.
When using a correct numbering scheme, compatibility between versions is as follows:
- A change in major version: the new version is not compatible with the old version.
- A change in minor version (but not in major version): the new version is backwards compatible with the old version, but not forward compatible. Applications using the new features don’t work with the old version, but old applications do work with the new version.
- A change in build number (but not in major or minor version): the new version is binary compatible with the old version, both forward and backward. A change in behavior may be observed as a result of a bug fix.
- A change in revision number only: the new version is binary compatible with the old version, both forward and backward. Only non-functional changes in behavior, such as changed performance characteristics, may be observed as a result of non-functional changes.
When one of these numbers is modified, typically all lower level numbers are reset to zero.
If an assembly exposes types defined in another assembly in its public API, and the other assembly’s Assembly version changes, then the Assembly version of this assembly should change as well. If the other assembly has increased its major version, increase this assembly’s major version as well. Avoid exposing types defined in third-party assemblies, in order to limit this problem.
Summary:
| Changed version number |
Reason |
Compatibility |
| Major version |
Features changed or removed |
None |
| Minor version |
Features added |
Forward only |
| Build number |
Bugs fixed |
Forward and backward |
| Revision number |
Non-functional changes |
Forward and backward |
Typically, the Major Version and Minor Version are the same in the Assembly Version and in the File Version. The Assembly Version will have the build number and revision number equal to zero, while the File Version is updated with every bug fix or non-functional improvement.
.png)
When the Assembly Version includes the Minor version, strong named assemblies compiled against the previous version will not pick up the new version from the GAC. You need to recompile the client assemblies against the new version, configure the client applications, or create a publisher policy. Likewise, assemblies compiled against the new (minor) version won't accidentally pick up the old version.
When you leave the build and revision numbers equal to zero in the Assembly version, you don't need to do anything when you want to deploy a bug fix or internal optimization.
The VB and C# compilers generate an operating system resource in the assembly file, such that several of these attributes show up in the Version tab of the file properties dialog box. Please note that the C++ compiler does not do this. In C++, a version resource needs to be added manually by the developer, and the developer must manually synchronize its contents with the assembly level attributes.

Typically, there should be no reason to include a ComCompatibleVersionAttribute or AssemblyInformationalVersionAttribute.