Kris Vandermotten | All posts tagged '.net'

Farewell Visitor

01 December 2010 Kris-Vandermotten .NET

The Visitor design pattern was first documented in 1995 by the Gang of Four. It’s a workaround for the fact that most strongly typed object oriented languages only support single dispatch, even when sometimes double dispatch is required. With C# 4, we no longer need this workaround. We now have something better, more on that below. Let’s look at an example. The traditional Visitor pattern The System.Linq.Expressions namespace contains types that enable code expressions to be represented as objects in the form of expression trees. For example, the following C# statement creates an expression tree:Expression<Func<double, double>> f = x => Math.Sin(1 + 2 * x); There are many kinds of expressions. The above example creates several objects of different types, including ParameterExpression, ConstantExpression, BinaryExpression and MethodCallExpression, all of which inherit from Expression. There are many ways to represent expressions as text. For example, we can use an infix notation (as most programming languages do), but we can also use prefix or postfix notation. Anyone who as ever worked with an HP scientific calculator, or a programming language such as Forth, will appreciate postfix notation, also known as reverse polish notation. As a result, the way to translate a particular expression into a text representation depends on two things: the kind of expression and the kind of notation. More precisely, the method to execute to translate an expression object into a string object depends on the type of the expression, and on the type of the translation algorithm. Using a virtual function would allow the system to choose a method based on one of these dimensions, e.g. the type of expression, but not on both. Virtual functions provide single dispatch, but we need dual dispatch. In fact, the expression class has a virtual ToString() method, inherited from object. Every type inheriting from Expression has its own version, making the algorithm depend on the type of expression. But it’s a hardcoded implementation, using an infix notation. What if we want a postfix ToString? Or Prefix? Or a C# or F# syntax? This is where the Visitor pattern can help us, and luckily the Expression class has support for it. The class ExpressionVisitor is an abstract base class for algorithms working on expressions. Now when I say algorithms, you probably think of methods with parameters and return values, but that’s not how a Visitor works. A Visitor is an object of some class, and parameters must be passed in, typically via the constructor. The return value, i.e. the result of the algorithm must be read back from a property. Let’s create a base class for our ToString visitors: public abstract class ToStringVisitor : ExpressionVisitor { protected readonly StringBuilder resultAccumulator = new StringBuilder(); public string Result { get { return resultAccumulator.ToString(); } } } This provides us with a base class for Visitors that have a string Result property. There are some issues with it, such as when to Clear() the StringBuilder when the Visitor is reused, but let’s not get into those. We can now create a ToPostfixStringVisitor, and encapsulate it behind a static method:public static class ExpressionExtensions{ public static string ToPostfixString(this Expression<Func<double, double>> function) { var visitor = new ToPostFixStringVisitor(); visitor.Visit(function); return visitor.Result; } private class ToPostFixStringVisitor : ToStringVisitor { protected override Expression VisitLambda<T>(Expression<T> node) { // enables reusing the visitor – not absolutely required here as the only // place where an instance can be created is in the ToPostfixString method. this.resultAccumulator.Clear(); foreach (var parameter in node.Parameters) { this.Visit(parameter); } this.resultAccumulator.Append("-> "); this.Visit(node.Body); return node; } protected override Expression VisitParameter(ParameterExpression node) { this.resultAccumulator.Append(node.Name); this.resultAccumulator.Append(' '); return node; } protected override Expression VisitBinary(BinaryExpression node) { this.Visit(node.Left); this.Visit(node.Right); switch (node.NodeType) { case ExpressionType.Add: case ExpressionType.AddChecked: this.resultAccumulator.Append('+'); break; case ExpressionType.Multiply: case ExpressionType.MultiplyChecked: this.resultAccumulator.Append('*'); break; case ExpressionType.Subtract: case ExpressionType.SubtractChecked: this.resultAccumulator.Append('-'); break; case ExpressionType.Divide: this.resultAccumulator.Append('/'); break; case ExpressionType.Modulo: this.resultAccumulator.Append('%'); break; default: throw new NotSupportedException(); } this.resultAccumulator.Append(' '); return node; } protected override Expression VisitMethodCall(MethodCallExpression node) { foreach (var arg in node.Arguments) { this.Visit(arg); } this.resultAccumulator.Append(node.Method.Name); this.resultAccumulator.Append(' '); return node; } protected override Expression VisitConstant(ConstantExpression node) { this.resultAccumulator.Append(node.Value); this.resultAccumulator.Append(' '); return node; } }} For example, the following line outputs x -> 1 2 x * + SinConsole.WriteLine(ExpressionExtensions.ToPostfixString(x => Math.Sin(1 + 2 * x))); It works, even though for the sake of example it only supports a very small subset of all expressions. At least in the case of binary operators, it throws a NotSupportedException for operators that are, well, not supported. I really should add a bunch of other methods, for example: protected override Expression VisitConditional(ConditionalExpression node) { throw new NotSupportedException(); } protected override Expression VisitBlock(BlockExpression node) { throw new NotSupportedException(); } Anyway, how does it work? The Visit() method calls an internal virtual method on Expression called Accept. Being virtual, this chooses what kind of expression to work on an it calls the appropriate VisitX method in the visitor. This one is virtual as well, and it chooses the correct algorithm, our ToPostfixStringVisitor in this case. So we have double dispatch, via a combination of two single dispatch calls. Dynamic dispatch to the rescue As of C# 4, we are not restricted to single dispatch, we now have dynamic dispatch. Let’s see what this example looks like using dynamic:public static class ExpressionExtensions{ public static string ToPostfixString(this Expression<Func<double, double>> function) { StringBuilder resultAccumulator = new StringBuilder(); Visit(function, resultAccumulator); return resultAccumulator.ToString(); } private static void Visit(Expression expression, StringBuilder resultAccumulator) { dynamic dynamicExpression = expression; VisitCore(dynamicExpression, resultAccumulator); } private static void VisitCore(LambdaExpression node, StringBuilder resultAccumulator) { foreach (var parameter in node.Parameters) { Visit(parameter, resultAccumulator); } resultAccumulator.Append("-> "); Visit(node.Body, resultAccumulator); } private static void VisitCore(ParameterExpression node, StringBuilder resultAccumulator) { resultAccumulator.Append(node.Name); resultAccumulator.Append(' '); } private static void VisitCore(BinaryExpression node, StringBuilder resultAccumulator) { Visit(node.Left, resultAccumulator); Visit(node.Right, resultAccumulator); switch (node.NodeType) { case ExpressionType.Add: case ExpressionType.AddChecked: resultAccumulator.Append('+'); break; case ExpressionType.Multiply: case ExpressionType.MultiplyChecked: resultAccumulator.Append('*'); break; case ExpressionType.Subtract: case ExpressionType.SubtractChecked: resultAccumulator.Append('-'); break; case ExpressionType.Divide: resultAccumulator.Append('/'); break; case ExpressionType.Modulo: resultAccumulator.Append('%'); break; default: throw new NotSupportedException(); } resultAccumulator.Append(' '); } private static void VisitCore(MethodCallExpression node, StringBuilder resultAccumulator) { foreach (var arg in node.Arguments) { Visit(arg, resultAccumulator); } resultAccumulator.Append(node.Method.Name); resultAccumulator.Append(' '); } private static void VisitCore(ConstantExpression node, StringBuilder resultAccumulator) { resultAccumulator.Append(node.Value); resultAccumulator.Append(' '); } private static void VisitCore(Expression node, StringBuilder resultAccumulator) { throw new NotSupportedException(); }} The dynamic dispatch is achieved by the Visit method. To learn more about how this works, see http://blogs.msdn.com/b/samng/archive/2008/11/06/dynamic-in-c-iii-a-slight-twist.aspx. So how is this better? First of all, this approach works with all classes. The Expression class does have visitor support baked in, but most classes don’t. The dynamic approach also works if the target classes don’t support the visitor pattern. That also means that you don’t have to do anything special with your own classes to enable this technique. The dynamic approach is also simpler. I’ve noticed that most people don’t immediately “see” the visitor pattern, but the dynamic approach is easier to understand. Visitor implementations typically have methods that return void, and producing a result must be accomplished via fields and properties (the ExpressionVisitor is a notable exception here, it is optimized for rewriting expressions, i.e. calculating a new expression based on an existing one). With dynamic methods, you choose your parameter and return types (the StringBuilder in this example). Not only is that much simpler to code, the entire thing has no state in fields, only in stack local variables. As a result, it’s completely reentrant and thread-safe. Note also that the last VisitCore method specifies what to do with expressions that aren’t handled by any of the other methods. Much more convenient than with a Visitor, where you always have to specify a method for each concrete type, unless it just so happens that the behavior you want is the default behavior from the base class. Conclusion The dynamic keyword was introduced to facilitate interoperability with dynamic languages and systems, including COM. C# remains primarily a strongly typed language. As such, some people suggested that dynamic has no place in plain C# programs that don’t require such interoperability. However, the above example shows that dynamic dispatch can be very useful in the context of strongly typed C# programs, as an alternative to the Visitor pattern.

Static Reflection in .NET, part 2

18 October 2009 Kris-Vandermotten .NET

A few weeks ago, I talked about static reflection and its advantages. You’ll remember that the main advantages, compared to the normal reflection API’s, are the compile time checking of parameters and IntelliSense support. How does it compare at other levels, performance for example? Before we dive into that question, let me state that performance may or may not be important to you. A program that is fast enough is, well, fast enough. It’s unlikely that a (single) reflection call will have a significant impact on, say, the response time of a graphical user interface, and so performance doesn’t matter. If your algorithm requires millions of reflection operations, I’m sure you can rewrite it somehow to reduce that number significantly, and then performance again probably doesn’t matter anymore. That being said, we still want to know, right? First of all, let’s compare code. Take this line (using the Example class from the last post): PropertyInfo pi = typeof(Example).GetProperty("Description"); This line compiles to the following IL (simplified for readability): ldtoken Example call class Type Type::GetTypeFromHandle(valuetype RuntimeTypeHandle) ldstr "Description" call instance class PropertyInfo Type::GetProperty(string) Compare that to the following line: PropertyInfo pi = StaticReflector.Create<Example>().PropertyInfo(e => e.Description); Which compiles to: call class IStaticReflector`1<!!0> StaticReflector::Create<class Example>() ldtoken Example call class Type Type::GetTypeFromHandle(valuetype RuntimeTypeHandle) ldstr "e" call class ParameterExpression Expression::Parameter(class Type, string) stloc.0 ldloc.0 ldtoken instance string Example::get_Description() call class MethodBase MethodBase::GetMethodFromHandle(valuetype RuntimeMethodHandle) castclass MethodInfo call class MemberExpression Expression::Property(class Expression, class MethodInfo) ldc.i4.1 newarr ParameterExpression stloc.1 ldloc.1 ldc.i4.0 ldloc.0 stelem.ref ldloc.1 call class Expression`1<!!0> Expression::Lambda<class System.Func`2<class Example, string>>(class Expression, class ParameterExpression[]) call class PropertyInfo StaticReflectorExtensions::PropertyInfo<class Example, string>(class IStaticReflector`1<!!0>, class Expression`1<class System.Func`2<!!0, !!1>>) As you can see, this code doesn’t load the “Description” string, it uses the ldtoken instruction instead. Some bloggers have suggested that this would make it more efficient. Unfortunately, even if the ldtoken instruction is efficient, it is largely offset by the construction of the lambda expression. I ran a little benchmark, in which I compare execution time (in ticks) and memory usage (in generation 0 garbage collection runs) of both approaches, executing each one a million times. This is the result (on my laptop): Using Reflection Time: 1089308 Collections: 45 Using StaticReflection Time: 13513777 Collections: 264 As you can see, the Static Reflection approach is about 13.5 times slower than the good old dynamic reflection, and it uses a lot more memory. That should be no surprise either: both cases allocate a PropertyInfo object, but the static case also allocates the expression, which is nothing but food for the garbage collector. So, one approach seems good at compile time, and the other is good at run time. It seems we’re stuck between a rock and a hard place. But the situation isn’t so bad: we have two options to choose from, each with their pro’s and con’s. What the best one is depends on your requirements, and what you value the most: compile time checking (which may result in productivity and maintainability benefits), or performance. And who knows, maybe there is a third option, giving the best of both worlds. But that’s for next time.

String.Trim() fixed in .NET 4.0

16 July 2009 Kris-Vandermotten .NET

A long time ago, I wrote a blog post about the problems with String.Trim(). I’m happy to see that all three issues have been addressed in the .NET Framework 4.0. To start with, Trim() will now be consistent with Char.IsWhiteSpace(). Theoretically, this is a breaking change, but I don’t expect many programs to have a problem with this change. Note that the change is very well documented in the online help. Secondly, the code of Trim() has been cleaned up considerably. A string that consists entirely of whitespace is no longer scanned twice. I haven’t done any benchmarks, but I expect the performance to be at least as good as for the same function in .NET 2.0 – 3.5. Last but not least, the frequent abuse of the Trim() function to simply validate strings will greatly decrease with the introduction of the static IsNullOrWhitespace(string value) function, which is much faster than calling Trim(). It’s a small detail, compared to all the other goodies .NET 4.0 brings, but a good addition to the toolbox nonetheless.

Farewell Visitor

Static Reflection in .NET, part 2

String.Trim() fixed in .NET 4.0

.NET 3.5 SP1 Beta available

CLR Update this summer

.NET Framework Library Source Code now available

.NET Framework Source Code to be Released

Properties with property changed event, part 3

Orcas: February 2008 it will be

Versioning .NET Assemblies