Kris Vandermotten | All posts tagged 'expressions'

Farewell Visitor

01 December 2010 Kris-Vandermotten .NET

The Visitor design pattern was first documented in 1995 by the Gang of Four. It’s a workaround for the fact that most strongly typed object oriented languages only support single dispatch, even when sometimes double dispatch is required. With C# 4, we no longer need this workaround. We now have something better, more on that below. Let’s look at an example. The traditional Visitor pattern The System.Linq.Expressions namespace contains types that enable code expressions to be represented as objects in the form of expression trees. For example, the following C# statement creates an expression tree:Expression<Func<double, double>> f = x => Math.Sin(1 + 2 * x); There are many kinds of expressions. The above example creates several objects of different types, including ParameterExpression, ConstantExpression, BinaryExpression and MethodCallExpression, all of which inherit from Expression. There are many ways to represent expressions as text. For example, we can use an infix notation (as most programming languages do), but we can also use prefix or postfix notation. Anyone who as ever worked with an HP scientific calculator, or a programming language such as Forth, will appreciate postfix notation, also known as reverse polish notation. As a result, the way to translate a particular expression into a text representation depends on two things: the kind of expression and the kind of notation. More precisely, the method to execute to translate an expression object into a string object depends on the type of the expression, and on the type of the translation algorithm. Using a virtual function would allow the system to choose a method based on one of these dimensions, e.g. the type of expression, but not on both. Virtual functions provide single dispatch, but we need dual dispatch. In fact, the expression class has a virtual ToString() method, inherited from object. Every type inheriting from Expression has its own version, making the algorithm depend on the type of expression. But it’s a hardcoded implementation, using an infix notation. What if we want a postfix ToString? Or Prefix? Or a C# or F# syntax? This is where the Visitor pattern can help us, and luckily the Expression class has support for it. The class ExpressionVisitor is an abstract base class for algorithms working on expressions. Now when I say algorithms, you probably think of methods with parameters and return values, but that’s not how a Visitor works. A Visitor is an object of some class, and parameters must be passed in, typically via the constructor. The return value, i.e. the result of the algorithm must be read back from a property. Let’s create a base class for our ToString visitors: public abstract class ToStringVisitor : ExpressionVisitor { protected readonly StringBuilder resultAccumulator = new StringBuilder(); public string Result { get { return resultAccumulator.ToString(); } } } This provides us with a base class for Visitors that have a string Result property. There are some issues with it, such as when to Clear() the StringBuilder when the Visitor is reused, but let’s not get into those. We can now create a ToPostfixStringVisitor, and encapsulate it behind a static method:public static class ExpressionExtensions{ public static string ToPostfixString(this Expression<Func<double, double>> function) { var visitor = new ToPostFixStringVisitor(); visitor.Visit(function); return visitor.Result; } private class ToPostFixStringVisitor : ToStringVisitor { protected override Expression VisitLambda<T>(Expression<T> node) { // enables reusing the visitor – not absolutely required here as the only // place where an instance can be created is in the ToPostfixString method. this.resultAccumulator.Clear(); foreach (var parameter in node.Parameters) { this.Visit(parameter); } this.resultAccumulator.Append("-> "); this.Visit(node.Body); return node; } protected override Expression VisitParameter(ParameterExpression node) { this.resultAccumulator.Append(node.Name); this.resultAccumulator.Append(' '); return node; } protected override Expression VisitBinary(BinaryExpression node) { this.Visit(node.Left); this.Visit(node.Right); switch (node.NodeType) { case ExpressionType.Add: case ExpressionType.AddChecked: this.resultAccumulator.Append('+'); break; case ExpressionType.Multiply: case ExpressionType.MultiplyChecked: this.resultAccumulator.Append('*'); break; case ExpressionType.Subtract: case ExpressionType.SubtractChecked: this.resultAccumulator.Append('-'); break; case ExpressionType.Divide: this.resultAccumulator.Append('/'); break; case ExpressionType.Modulo: this.resultAccumulator.Append('%'); break; default: throw new NotSupportedException(); } this.resultAccumulator.Append(' '); return node; } protected override Expression VisitMethodCall(MethodCallExpression node) { foreach (var arg in node.Arguments) { this.Visit(arg); } this.resultAccumulator.Append(node.Method.Name); this.resultAccumulator.Append(' '); return node; } protected override Expression VisitConstant(ConstantExpression node) { this.resultAccumulator.Append(node.Value); this.resultAccumulator.Append(' '); return node; } }} For example, the following line outputs x -> 1 2 x * + SinConsole.WriteLine(ExpressionExtensions.ToPostfixString(x => Math.Sin(1 + 2 * x))); It works, even though for the sake of example it only supports a very small subset of all expressions. At least in the case of binary operators, it throws a NotSupportedException for operators that are, well, not supported. I really should add a bunch of other methods, for example: protected override Expression VisitConditional(ConditionalExpression node) { throw new NotSupportedException(); } protected override Expression VisitBlock(BlockExpression node) { throw new NotSupportedException(); } Anyway, how does it work? The Visit() method calls an internal virtual method on Expression called Accept. Being virtual, this chooses what kind of expression to work on an it calls the appropriate VisitX method in the visitor. This one is virtual as well, and it chooses the correct algorithm, our ToPostfixStringVisitor in this case. So we have double dispatch, via a combination of two single dispatch calls. Dynamic dispatch to the rescue As of C# 4, we are not restricted to single dispatch, we now have dynamic dispatch. Let’s see what this example looks like using dynamic:public static class ExpressionExtensions{ public static string ToPostfixString(this Expression<Func<double, double>> function) { StringBuilder resultAccumulator = new StringBuilder(); Visit(function, resultAccumulator); return resultAccumulator.ToString(); } private static void Visit(Expression expression, StringBuilder resultAccumulator) { dynamic dynamicExpression = expression; VisitCore(dynamicExpression, resultAccumulator); } private static void VisitCore(LambdaExpression node, StringBuilder resultAccumulator) { foreach (var parameter in node.Parameters) { Visit(parameter, resultAccumulator); } resultAccumulator.Append("-> "); Visit(node.Body, resultAccumulator); } private static void VisitCore(ParameterExpression node, StringBuilder resultAccumulator) { resultAccumulator.Append(node.Name); resultAccumulator.Append(' '); } private static void VisitCore(BinaryExpression node, StringBuilder resultAccumulator) { Visit(node.Left, resultAccumulator); Visit(node.Right, resultAccumulator); switch (node.NodeType) { case ExpressionType.Add: case ExpressionType.AddChecked: resultAccumulator.Append('+'); break; case ExpressionType.Multiply: case ExpressionType.MultiplyChecked: resultAccumulator.Append('*'); break; case ExpressionType.Subtract: case ExpressionType.SubtractChecked: resultAccumulator.Append('-'); break; case ExpressionType.Divide: resultAccumulator.Append('/'); break; case ExpressionType.Modulo: resultAccumulator.Append('%'); break; default: throw new NotSupportedException(); } resultAccumulator.Append(' '); } private static void VisitCore(MethodCallExpression node, StringBuilder resultAccumulator) { foreach (var arg in node.Arguments) { Visit(arg, resultAccumulator); } resultAccumulator.Append(node.Method.Name); resultAccumulator.Append(' '); } private static void VisitCore(ConstantExpression node, StringBuilder resultAccumulator) { resultAccumulator.Append(node.Value); resultAccumulator.Append(' '); } private static void VisitCore(Expression node, StringBuilder resultAccumulator) { throw new NotSupportedException(); }} The dynamic dispatch is achieved by the Visit method. To learn more about how this works, see http://blogs.msdn.com/b/samng/archive/2008/11/06/dynamic-in-c-iii-a-slight-twist.aspx. So how is this better? First of all, this approach works with all classes. The Expression class does have visitor support baked in, but most classes don’t. The dynamic approach also works if the target classes don’t support the visitor pattern. That also means that you don’t have to do anything special with your own classes to enable this technique. The dynamic approach is also simpler. I’ve noticed that most people don’t immediately “see” the visitor pattern, but the dynamic approach is easier to understand. Visitor implementations typically have methods that return void, and producing a result must be accomplished via fields and properties (the ExpressionVisitor is a notable exception here, it is optimized for rewriting expressions, i.e. calculating a new expression based on an existing one). With dynamic methods, you choose your parameter and return types (the StringBuilder in this example). Not only is that much simpler to code, the entire thing has no state in fields, only in stack local variables. As a result, it’s completely reentrant and thread-safe. Note also that the last VisitCore method specifies what to do with expressions that aren’t handled by any of the other methods. Much more convenient than with a Visitor, where you always have to specify a method for each concrete type, unless it just so happens that the behavior you want is the default behavior from the base class. Conclusion The dynamic keyword was introduced to facilitate interoperability with dynamic languages and systems, including COM. C# remains primarily a strongly typed language. As such, some people suggested that dynamic has no place in plain C# programs that don’t require such interoperability. However, the above example shows that dynamic dispatch can be very useful in the context of strongly typed C# programs, as an alternative to the Visitor pattern.

Static Reflection in .NET, part 2

18 October 2009 Kris-Vandermotten .NET

A few weeks ago, I talked about static reflection and its advantages. You’ll remember that the main advantages, compared to the normal reflection API’s, are the compile time checking of parameters and IntelliSense support. How does it compare at other levels, performance for example? Before we dive into that question, let me state that performance may or may not be important to you. A program that is fast enough is, well, fast enough. It’s unlikely that a (single) reflection call will have a significant impact on, say, the response time of a graphical user interface, and so performance doesn’t matter. If your algorithm requires millions of reflection operations, I’m sure you can rewrite it somehow to reduce that number significantly, and then performance again probably doesn’t matter anymore. That being said, we still want to know, right? First of all, let’s compare code. Take this line (using the Example class from the last post): PropertyInfo pi = typeof(Example).GetProperty("Description"); This line compiles to the following IL (simplified for readability): ldtoken Example call class Type Type::GetTypeFromHandle(valuetype RuntimeTypeHandle) ldstr "Description" call instance class PropertyInfo Type::GetProperty(string) Compare that to the following line: PropertyInfo pi = StaticReflector.Create<Example>().PropertyInfo(e => e.Description); Which compiles to: call class IStaticReflector`1<!!0> StaticReflector::Create<class Example>() ldtoken Example call class Type Type::GetTypeFromHandle(valuetype RuntimeTypeHandle) ldstr "e" call class ParameterExpression Expression::Parameter(class Type, string) stloc.0 ldloc.0 ldtoken instance string Example::get_Description() call class MethodBase MethodBase::GetMethodFromHandle(valuetype RuntimeMethodHandle) castclass MethodInfo call class MemberExpression Expression::Property(class Expression, class MethodInfo) ldc.i4.1 newarr ParameterExpression stloc.1 ldloc.1 ldc.i4.0 ldloc.0 stelem.ref ldloc.1 call class Expression`1<!!0> Expression::Lambda<class System.Func`2<class Example, string>>(class Expression, class ParameterExpression[]) call class PropertyInfo StaticReflectorExtensions::PropertyInfo<class Example, string>(class IStaticReflector`1<!!0>, class Expression`1<class System.Func`2<!!0, !!1>>) As you can see, this code doesn’t load the “Description” string, it uses the ldtoken instruction instead. Some bloggers have suggested that this would make it more efficient. Unfortunately, even if the ldtoken instruction is efficient, it is largely offset by the construction of the lambda expression. I ran a little benchmark, in which I compare execution time (in ticks) and memory usage (in generation 0 garbage collection runs) of both approaches, executing each one a million times. This is the result (on my laptop): Using Reflection Time: 1089308 Collections: 45 Using StaticReflection Time: 13513777 Collections: 264 As you can see, the Static Reflection approach is about 13.5 times slower than the good old dynamic reflection, and it uses a lot more memory. That should be no surprise either: both cases allocate a PropertyInfo object, but the static case also allocates the expression, which is nothing but food for the garbage collector. So, one approach seems good at compile time, and the other is good at run time. It seems we’re stuck between a rock and a hard place. But the situation isn’t so bad: we have two options to choose from, each with their pro’s and con’s. What the best one is depends on your requirements, and what you value the most: compile time checking (which may result in productivity and maintainability benefits), or performance. And who knows, maybe there is a third option, giving the best of both worlds. But that’s for next time.

Static Reflection in .NET

29 June 2009 Kris-Vandermotten .NET

LINQ expressions have proven to be extremely versatile, popping up in all sorts of areas. “Static Reflection” seem to be the latest hype. But what is static reflection anyway, and why is it good or why is it bad? Reflection is used to obtain information about the code you are executing, and to use that information to interact with the code dynamically. Sometimes reflection is used to interact dynamically with code that is statically known by a program already. For example, data binding heavily relies on reflection to dynamically read and write properties. The calling program knows about those properties statically, but the data binding libraries do not. In data binding, object properties are often identified by their name, expressed as a string. That string is then used by the libraries to construct a PropertyInfo object. Time for an example. Given this class: public class Example { public string Description { get; set; } } You can obtain a PropertyInfo object describing the Description property as follows: PropertyInfo pi = typeof(Example).GetProperty("Description"); We may have an issue here. If I make a typing mistake in the GetProperty call, I don’t get a compiler error. At runtime, the call will return null, probably leading to a NullReferenceException down the road. And of course, Visual Studio Intellisense will not help me to type it right. Also, if I rename the property, for example to “Summary”, the GetProperty call will be broken, without a compile-time error. Static Reflection is one technique to avoid these issues. Using LINQ expressions, we could create an API that allows us to do something like the following: PropertyInfo pi = StaticReflector.GetProperty(Example e => e.Description); The downside of this approach is that it doesn’t work with anonymous types. So I propose a different mechanism. What we need is something that statically gives us access to a type. Any generic interface will do. I propose the following: public interface IStaticReflector<T> { } Given this interface, we can define a series of extension methods, for example: public static class StaticReflectorExtensions { public static PropertyInfo PropertyInfo<T, U>(this IStaticReflector<T> obj, Expression<Func<T, U>> selector) { var body = selector.Body as MemberExpression; return body.Member as PropertyInfo; } } Notice how the obj parameter is not really used in the PropertyInfo method. It does serve a purpose however: it allows us to use type inference on the type T, and I get full Intellisense. For example: IStaticReflector<Example> reflector = null; PropertyInfo pi = reflector.PropertyInfo(e => e.Description); Granted, initializing a variable to null and then calling a method on it is a bit weird. We need a more elegant way to create these things: public static class StaticReflector { public static IStaticReflector<T> Create<T>() { return null; } } Now we can write: PropertyInfo pi = StaticReflector.Create<Example>().PropertyInfo(e => e.Description); This still doesn’t work on anonymous types though. For those, we could use the following: public static class ObjectExtensions { public static IStaticReflector<T> GetReflector<T>(this T obj) { return null; } } Now we can write things such as: var anonymous = new { Description = "Example" }; PropertyInfo pi = anonymous.GetReflector().PropertyInfo(e => e.Description); I do prefer the StaticReflector.Create<T>() method is case the type name is known though. Are we done? Not really. Let’s go back to dynamic reflection using string names. Lot’s of things could go wrong there, and we don’t get any warnings. The situation has not gone worse, but still lot’s of things can go wrong. So the PropertyInfo method needs some parameter validation. Also, properties certainly aren’t the only thing we can reflect upon. What about fields, methods and constructors? Here’s a full implementation: using System; using System.Linq.Expressions; using System.Reflection; public static class StaticReflectorExtensions { public static PropertyInfo PropertyInfo<T, U>(this IStaticReflector<T> obj, Expression<Func<T, U>> selector) { if (selector == null) { throw new ArgumentNullException(Strings.Selector); } PropertyInfo pi = obj.MemberInfo(selector) as PropertyInfo; if (pi == null) { throw new ArgumentException(Strings.InvalidPropertySelector, Strings.Selector); } return pi; } public static FieldInfo FieldInfo<T, U>(this IStaticReflector<T> obj, Expression<Func<T, U>> selector) { if (selector == null) { throw new ArgumentNullException(Strings.Selector); } FieldInfo fi = obj.MemberInfo(selector) as FieldInfo; if (fi == null) { throw new ArgumentException(Strings.InvalidFieldSelector, Strings.Selector); } return fi; } public static MemberInfo MemberInfo<T, U>(this IStaticReflector<T> obj, Expression<Func<T, U>> selector) { if (selector == null) { throw new ArgumentNullException(Strings.Selector); } var body = selector.Body as MemberExpression; if (body == null) { throw new ArgumentException(Strings.InvalidMemberSelector, Strings.Selector); } if (body.Expression.NodeType != ExpressionType.Parameter) { throw new ArgumentException(Strings.InvalidMemberSelector, Strings.Selector); } return body.Member; } public static MethodInfo MethodInfo<T, U>(this IStaticReflector<T> obj, Expression<Func<T, U>> selector) { if (selector == null) { throw new ArgumentNullException(Strings.Selector); } var body = selector.Body as MethodCallExpression; if (body == null) { throw new ArgumentException(Strings.InvalidMethodSelector, Strings.Selector); } // instance methods must be called on the parameter if (body.Object != null && body.Object.NodeType != ExpressionType.Parameter) { throw new ArgumentException(Strings.InvalidMethodSelector, Strings.Selector); } // static methods must be defined in the type of the parameter or a base type if (body.Object == null && !body.Method.DeclaringType.IsAssignableFrom(typeof(T))) { throw new ArgumentException(Strings.InvalidMethodSelector, Strings.Selector); } return body.Method; } public static MethodInfo MethodInfo<T>(this IStaticReflector<T> obj, Expression<Action<T>> selector) { if (selector == null) { throw new ArgumentNullException(Strings.Selector); } var body = selector.Body as MethodCallExpression; if (body == null) { throw new ArgumentException(Strings.InvalidMethodSelector, Strings.Selector); } // instance methods must be called on the parameter if (body.Object != null && body.Object.NodeType != ExpressionType.Parameter) { throw new ArgumentException(Strings.InvalidMethodSelector, Strings.Selector); } // static methods must be defined in the type of the parameter or a base type if (body.Object == null && !body.Method.DeclaringType.IsAssignableFrom(typeof(T))) { throw new ArgumentException(Strings.InvalidMethodSelector, Strings.Selector); } return body.Method; } public static ConstructorInfo ConstructorInfo<T>(this IStaticReflector<T> obj, Expression<Func<T>> selector) { if (selector == null) { throw new ArgumentNullException(Strings.Selector); } var body = selector.Body as NewExpression; if (body == null) { throw new ArgumentException(Strings.InvalidConstructorSelector, Strings.Selector); } return body.Constructor; } private static class Strings { internal const string InvalidFieldSelector = "Invalid field selector"; internal const string InvalidPropertySelector = "Invalid property selector"; internal const string InvalidMemberSelector = "Invalid member selector"; internal const string InvalidMethodSelector = "Invalid method selector"; internal const string InvalidConstructorSelector = "Invalid constructor selector"; internal const string Selector = "selector"; } } Next time, we’ll talk about the disadvantages of this approach, and we’ll look at an alternative.

Lightweight Code Generation with Linq Expressions

16 November 2007 Kris-Vandermotten LINQ

Lightweight Code Generation with Linq Expressions [More]