Tools I use for C# development

I'm often asked what tools I recommend for (general) .NET development, or at least which tools I use on a regular basis. Here's a list:

  • Visual Studio Team Suite. If you can't get Team Suite, get the Professional version. If you can't get that one, use the Express version.
  • Code Analysis, which is built into Visual Studio Team Suite. FxCop does the same thing, and is free to download.
  • .NET Reflector. After MSDN Help (and possibly Google), this tool delivers the best documentation on the .NET Framework. Another free download.
  • The unit testing framework that is built into Visual Studio Team Suite. If you don't have Team Suite, NUnit is a very good, free alternative, possibly in combination with NCover. TestDriven.NET integrates them into Visual Studio. All free.
  • If I need to edit graphics such as icons or other bitmaps, I use Paint.NET with some additional plug-ins installed. Free as well.
  • Hardly needed at home, but for team projects at U2U we use Team Foundation Server.

There are definitely some other tools I should take a look at, for example WiX, or SandCastle and the SandCastle Help File Builder.

If you know of any must-have tools I didn't mention, drop a comment.

Compiling regular expressions

Regular expressions are a very powerful tool for text processing, and they're well supported by the .NET Framework. Of course, you want your regular expressions to run as fast as possible. You can do two things to speed up regex processing: optimizing the regular expression itself, and optimizing the execution environment. I won't be talking about regular expression optimization itself; there is plenty of information on the net about that. One very good source is www.regular-expressions.info, but I'm sure there are others. What I do want to talk about is how the .NET Framework executes your regular expressions.

Your regular expression typically enters your program as a string. This string is first decoded into an internal form that is more easily processed. Typically, the regular expression is then interpreted. This interpretation is reasonably fast, especially when processing relatively small texts. When the text is large, or the same expression is executed many times, your program may benefit from compilation to IL. You enable compilation by setting the RegexOptions.Compiled flag. As the docs say, this yields faster execution but increases startup time. Obviously, this IL needs to be further compiled by the just-in-time compiler before your CPU can execute it.

Let's look at the following function:

static bool IsValidEmail(string text)
{ 
    Regex regex = 
        new Regex(@"^(?!\.)[a-zA-Z0-9!#\$%&'\*\+-/=\?\^_`\|~\.]+(<!\.)@(\w+[\-\.])*\w{1,63}\.[a-zA-Z]{2,6}$", 
            RegexOptions.ExplicitCapture | RegexOptions.Compiled); 
    Match m = regex.Match(text); 
    return m.Success; 
}
 

What happens when you call this function?

  1. The pattern is decoded into an internal form
  2. The internal form is compiled into IL
  3. The IL is compiled to machine code
  4. The machine code is executed

What's worse, these four steps are repeated for every single call! That's right: nothing is cached, so decoding, IL generation and JIT compilation all happen again on each invocation of IsValidEmail. Much better is the following:

static readonly Regex regex = 
    new Regex(@"^(?!\.)[a-zA-Z0-9!#\$%&'\*\+-/=\?\^_`\|~\.]+(?<!\.)@(\w+[\-\.])*\w{1,63}\.[a-zA-Z]{2,6}$", 
        RegexOptions.ExplicitCapture | RegexOptions.Compiled); 
 
static bool IsValidEmail(string text) 
{ 
    Match m = regex.Match(text); 
    return m.Success; 
}
 

In this case, steps 1, 2 and 3 are executed just once. Only the actual execution of the machine code (step 4) happens on each call.

To get a feel for the performance impact, I've run a few tests. Obviously, test results depend on your hardware configuration, the actual regular expression and (the length of) the input, so results may vary significantly. Anyway, in my test, interpreted execution took more than twice as long as compiled execution. The decoding step (step 1) took as long as 11 compiled executions, compilation to IL (step 2) took as long as 300 compiled executions, and JIT compilation to machine code (step 3) took as long as 1,000 compiled executions.
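
For what it's worth, here's a minimal sketch of the kind of timing harness I mean. The pattern, input and iteration count are placeholders, not the ones from my test; substitute your own:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexTiming
{
    static void Main()
    {
        // Placeholder pattern, input and iteration count; substitute your own.
        const string pattern = @"^\w+@\w+\.\w+$";
        const string input = "someone@example.com";
        const int iterations = 100000;

        Time("interpreted", new Regex(pattern), input, iterations);
        Time("compiled", new Regex(pattern, RegexOptions.Compiled), input, iterations);
    }

    static void Time(string label, Regex regex, string input, int iterations)
    {
        // Warm-up call: forces decoding, IL generation and JIT compilation
        // to happen outside the timed loop.
        regex.IsMatch(input);

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.IsMatch(input);
        }
        watch.Stop();

        Console.WriteLine("{0}: {1} ms", label, watch.ElapsedMilliseconds);
    }
}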

What does this mean in practice? Compilation speeds up execution significantly, so it's worth doing if you'll execute the compiled regular expression many times (at least about 500 times in my test). It also means that you should avoid repeating steps 1 to 3. You can do so by caching the result of these steps (see above), or by performing them at compile time or deployment time instead of at runtime.

The Regex class has a static method called CompileToAssembly, which allows you to execute steps 1 and 2 at compile time. It generates an assembly on disk (a DLL) containing strongly typed Regex classes. Unfortunately it is just a method: the .NET Framework does not come with a tool that executes this function (unlike sgen.exe, which does a similar thing for XML serialization).
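
Calling it yourself takes only a few lines. A sketch, reusing the pattern from above (the class and assembly names are just examples):

// Needs using System.Reflection; and using System.Text.RegularExpressions;
RegexCompilationInfo info = new RegexCompilationInfo(
    @"^(?!\.)[a-zA-Z0-9!#\$%&'\*\+-/=\?\^_`\|~\.]+(?<!\.)@(\w+[\-\.])*\w{1,63}\.[a-zA-Z]{2,6}$",
    RegexOptions.ExplicitCapture,
    "EmailRegex",                             // name of the generated class
    "Example.RegularExpressions.Validations", // its namespace
    true);                                    // make the generated class public

// Writes Example.RegularExpressions.dll to the current directory.
Regex.CompileToAssembly(
    new RegexCompilationInfo[] { info },
    new AssemblyName("Example.RegularExpressions"));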

I've built a command line tool around this function. It takes an XML file as input, containing a number of settings for the assembly to build and the definitions of all the regular expression classes you want to include in that assembly. The output is the assembly itself and an XML documentation file, which provides IntelliSense in Visual Studio (or can be used to build a help file with SandCastle). To build an assembly for the above regular expression, the minimal input file would contain the following:

<?xml version="1.0" encoding="utf-8" ?>
<project name="Example.RegularExpressions">
  <regex>
    <name>EmailRegex</name>
    <namespace>Example.RegularExpressions.Validations</namespace>
    <pattern>^(?!\.)[a-zA-Z0-9!#\$%&amp;'\*\+-/=\?\^_`\|~\.]+(?&lt;!\.)@(\w+[\-\.])*\w{1,63}\.[a-zA-Z]{2,6}$</pattern>
    <options>ExplicitCapture</options>
  </regex>
</project>  

The root element, project, has one mandatory attribute, name, containing the (simple) name of the assembly to build. Optional elements add more information. For example:

<project name="Example.RegularExpressions">
  <version>1.0.0.1</version>
  <title>Example.RegularExpressions</title>
  <copyright>© 2007 Kris Vandermotten</copyright>
  <strongNameKeyFile>key.snk</strongNameKeyFile>

The following elements are supported: version, title, description, configuration, company, product, copyright, trademark, culture and strongNameKeyFile. Except for the latter, they all translate to standard attributes at the assembly level. Combined with the name attribute, the version, culture and strongNameKeyFile elements allow specifying a strong name for the assembly. The location of the key file is relative to the location of the source file.

The project element then contains any number of regex elements. Each regex element contains at least the name, namespace and pattern elements. Optionally, it contains options, ispublic, and doc elements. In the doc element you can include the very same XML documentation you would use in C#, C++/CLI or VB.NET, for example:

<regex>
  <name>EmailRegex</name>
  <namespace>Example.RegularExpressions.Validations</namespace>
  <pattern>^(?!\.)[a-zA-Z0-9!#\$%&amp;'\*\+-/=\?\^_`\|~\.]+(?&lt;!\.)@(\w+[\-\.])*\w{1,63}\.[a-zA-Z]{2,6}$</pattern>
  <options>ExplicitCapture</options>
  <doc>
    <summary>Regular expression class to validate email addresses</summary>
    <remarks>
      According to RFC 2822, the local-part of the address may use any of these ASCII characters:
      <list>
        <item>Uppercase and lowercase letters (case sensitive)</item>
        <item>The digits 0 through 9</item>
        <item>The characters ! # $ % &amp; ' * + - / = ? ^ _ ` { | } ~</item>
        <item>The character . provided that it is not the first or last character in the local part.</item>
      </list>
      .NET doesn't like { and }
    </remarks>
  </doc>
</regex>

If you don't include a doc element, the pattern is included in a summary element. The result is that Visual Studio IntelliSense will show the pattern.

Since strongly named assemblies can be built, you can deploy them in the GAC and compile them to native machine code at deployment time with ngen.exe if you want to. That way, even step 3 above can be eliminated.
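
For example, assuming the assembly is called Example.RegularExpressions.dll and is strongly named, that deployment step could look like this (from a Visual Studio command prompt):

gacutil /i Example.RegularExpressions.dll
ngen install Example.RegularExpressions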

Download RegexCompiler here. Source code is available upon request.

Properties with property changed event, part 2

Last time I talked about properties with a changed event, I described the traditional pattern of having one event per property. But there is a disadvantage to that approach: since an event needs storage for a delegate, this technique wastes a considerable amount of memory if nobody subscribes to the events.

Say you have 10 properties of reference types with a simple backing variable each. On a 32 bit machine, those will take 40 bytes plus the storage of the actual objects (if any). If each of these properties has an associated event, that's 40 additional bytes (plus storage for the delegate objects, if any).

If you want to monitor whether anything changes on the object, no matter which property actually changes, you need to subscribe to all 10 events, and that creates 10 delegate objects.

The System.ComponentModel.INotifyPropertyChanged interface provides an interesting alternative. It declares just one event:

public interface INotifyPropertyChanged
{
    event PropertyChangedEventHandler PropertyChanged;
}

A PropertyChangedEventHandler is an event handler that takes an object (the sender) and a PropertyChangedEventArgs, an EventArgs with an additional string PropertyName property.
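
In other words, the delegate is declared in System.ComponentModel as:

public delegate void PropertyChangedEventHandler(object sender, PropertyChangedEventArgs e);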

How do we use this interface? Say I have a class Customer with two string properties: Name and City. It would look like this:

using System;
using System.ComponentModel;

class Customer : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    protected virtual void OnPropertyChanged(PropertyChangedEventArgs e)
    {
        if (PropertyChanged != null)
        {
            PropertyChanged(this, e);
        }
    }

    private string name;

    public string Name
    {
        get { return name; }
        set
        {
            if (value != name)
            {
                name = value;
                OnPropertyChanged(new PropertyChangedEventArgs("Name"));
            }
        }
    }

    private string city;

    public string City
    {
        get { return city; }
        set
        {
            if (value != city)
            {
                city = value;
                OnPropertyChanged(new PropertyChangedEventArgs("City"));
            }
        }
    }
}

Even though there are two properties, there is only one event. The advantages: less code, less memory usage. There are disadvantages as well, though. If you're interested only in changes to the Name property, your event handler needs to test the PropertyName in the PropertyChangedEventArgs passed to you. And you may get a lot of calls notifying you of changes to properties you aren't interested in. On the sending side, be careful with rename refactorings, since the strings being passed around won't be updated by Visual Studio.
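
For example, a subscriber that only cares about Name could filter like this (using the Customer class above):

Customer customer = new Customer();

customer.PropertyChanged += delegate(object sender, PropertyChangedEventArgs e)
{
    // One event covers all properties, so filter on the property name.
    if (e.PropertyName == "Name")
    {
        Console.WriteLine("Name changed to {0}", ((Customer)sender).Name);
    }
};

customer.Name = "Contoso";   // raises PropertyChanged; the handler reacts
customer.City = "Brussels";  // raises PropertyChanged; the handler ignores it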

Nevertheless, this approach is becoming the dominant technique for data binding scenarios. In fact, the new Linq to SQL generated classes use this approach as well. So you might just as well get accustomed to it.

In part 3 of this series, we'll have a look at how Windows Forms controls store their event delegates, with a CueBannerTextBox example.

Creating a Data Access Layer with Linq to SQL, part 2

Last time, we looked at how Linq To SQL might impact how we think about what a Data Access Layer (DAL) is, based on the dependencies between assemblies. This time, we'll take a different approach: let's look at typical Linq to SQL code, and try to decide where to put it. I'll use a code sample from the "DLinq Overview for CSharp Developers" document included in the Linq May CTP (in C# 3.0, but the same applies to VB9).

A simple start

Let's take a look at the following code:

Northwind db = new Northwind(@"c:\northwind\northwnd.mdf");

var q = from c in db.Customers
        where c.City == "London"
        select c;

foreach (var cust in q)
    Console.WriteLine("id = {0}, City = {1}", cust.CustomerID, cust.City);

It should be clear that the first line belongs in the DAL. The DataContext encapsulates a database connection, and knows about the physical location of the database. That is not something that higher layers should know about.

Let's say the actual query definition belongs in the DAL too, but clearly the foreach loop sits in some higher layer. That means the first two statements need to be encapsulated in some function in the DAL, for example as follows (sticking with the "Entity Access Layer" terminology introduced before):

public class CustomersEal
{
    private Northwind db = new Northwind(@"c:\northwind\northwnd.mdf");

    public IQueryable<Customer> GetCustomersByCity(string city)
    {
        return from c in db.Customers
               where c.City == city
               select c;
    }
}

The business layer then contains the following code:

CustomersEal customersEal = new CustomersEal();
 
foreach (var cust in customersEal.GetCustomersByCity("London"))
    Console.WriteLine("id = {0}, City = {1}", cust.CustomerID, cust.City);
 

Looks good, doesn't it? All the business layer knows about the database is that it can return Customer objects.

Problems

But wait, what if I write the following in my business layer:

CustomersEal customersEal = new CustomersEal();
 
var q = from c in customersEal.GetCustomersByCity("London")
        orderby c.ContactName
        select new { c.CustomerID, c.City };
 
foreach (var cust in q)
    Console.WriteLine("id = {0}, City = {1}", cust.CustomerID, cust.City);
 

This code highlights a few interesting facts.

First of all, it wasn't the DAL that executed the query, at least not in the traditional sense of the word. The DAL (CustomersEal to be precise) merely supplied the definition for the query. The query got executed when the foreach statement started looping over the result! In a traditional DAL, a call to a method like GetCustomersByCity would have executed the query, but not with Linq, at least not if we implement our code like this.
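
To make that explicit:

// Nothing is sent to the database here; q is only a query definition.
var q = customersEal.GetCustomersByCity("London");

// The SQL is generated and executed only when we start enumerating.
foreach (var cust in q)
    Console.WriteLine(cust.CustomerID);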

Secondly, the business layer can refine the query definition. This definitely has some advantages, but I realize some might argue that this is really bad. Note though, that the business layer cannot redefine the query, or execute just any query it wants. Or can it? You need the DataContext to start the process, and only the DAL has access to that, right? In fact, the Entity Layer generated by SQLMetal is referenced by the business layer too; it needs it to get to the definitions of the entities!

Thirdly, it is absolutely not clear where a developer should draw the line between what's business logic and what belongs in the DAL. I could have moved the orderby into the DAL (especially if I always want customers to be ordered by their ContactName). But likewise, I could have moved the where clause to the business layer! How do I decide what to do?

I hate it when developers have to make choices like that during routine development. Choosing takes time, and that's not likely to improve productivity. But much worse is the fact that different developers will make different choices. Even a single developer may make different choices from one day to the next. That leads to inconsistencies in the code. Developers will spend more time trying to understand the code they're reading, because it doesn't always follow the same pattern. That's bad for productivity. In the worst case scenario, developers start rewriting each other's code, just so it matches their choice of the day. That kills productivity. (Wasn't Linq all about improving productivity?)

The solution?

We need a clear and simple criterion to decide which code goes where.

Note that the absolute minimum for a DAL is the following:

public class CustomersEal  
{
    private Northwind db = new Northwind(@"c:\northwind\northwnd.mdf");
 
    public IQueryable<Customer> GetCustomers()
    {
        return db.Customers;
    }
}
 

It's a bit silly of course: if that's all this layer does, we might just as well skip it (the connection string should be externalized in a configuration file anyway, and a default constructor that reads the connection string from the config file should be added to the Northwind DataContext in a partial class, as sketched below). Silly or not, it is a "lower bound" for an EAL as we have defined it here. I believe there's an "upper bound" too: I think the DAL shouldn't do projections (well, it definitely shouldn't expose anonymous types). But that still leaves us with a very broad range. How to make a choice?
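
That partial class could look like this; a minimal sketch, assuming a connection string named "Northwind" in the config file:

using System.Configuration;

partial class Northwind
{
    // Requires a reference to System.Configuration.dll.
    public Northwind()
        : base(ConfigurationManager.ConnectionStrings["Northwind"].ConnectionString)
    {
    }
}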

I'm inclined to say that the only way to make a clear and simple choice once and for all is to go with the minimalist approach. And indeed, that means we don't need/write/use an Entity Access Layer. The business logic directly accesses the one assembly generated by SQLMetal (one assembly per database, that is).

How's that for a DAL?