A fistful of Entity Framework 4.0 Tips

This article presents some useful tips for building the data access layer of an enterprise application on top of Entity Framework 4.0 (EF40). For those who can't wait, here they are:

1. Only project the columns you really need,
2. Stay away from the Include syntax,
3. Consider alternatives,
4. But then come back and try harder, and
5. Always return custom data tracking objects.

The focus of EF40 lies on developer productivity, not database performance. So there are a couple of caveats you should be aware of when you don't want your data access to become the bottleneck. I'll illustrate my tips by showing you some different ways to issue a left outer join on a small entity model with just the Person and PersonPhone entities:

Tip 1: Only project the columns you really need

You should never return full-fledged entities from your queries. If you're only interested in the FirstName and LastName of Person entities, then the following EF40 query is definitely a bad idea:

var query = from p in model.People

            select p;

A query like this selects all the columns from the underlying table. It most probably has only one covering index in the database: the clustered index. This query will suffer from all kinds of locks on the table. Just execute the following SQL (from Visual Studio or SQL Management Studio), and then start the Linq query:

BEGIN TRANSACTION

 

    UPDATE Person.Person

       SET Title = NULL

     WHERE BusinessEntityID = 1

If default isolation levels are applied to the database, the Linq query will be blocked and eventually time out:

[Don't forget to Rollback the transaction.]

You should only project (that's just a fancy word for 'select') the needed columns, like this:

var query = from p in model.People

            select new PersonDto() { LastName = p.LastName, FirstName = p.FirstName };

With a little luck -and help from your database administrator- there might be a suitable covering index in the database that swiftly produces your result set, unhindered by locks. Here's a screenshot of SQL Server Management Studio displaying the generated query plans, and their corresponding costs:

The second query runs 24 times faster than the first one. I don't know about you, but I would call this a significant improvement!

Tip 2: Stay away from the Include syntax

The Include syntax from EF 4.0 is the successor of the LoadOptions from Linq-to-SQL. It allows you to eagerly load associated entities. Here's a sample query, returning persons and their phones:

var query = from p in model.People.Include("PersonPhones")

            select p;

Although it looks like a declarative outer join, it generates weird T-SQL:

 SELECT [Project1].[BusinessEntityID] AS [BusinessEntityID],

        [Project1].[PersonType] AS [PersonType],

        [Project1].[NameStyle] AS [NameStyle],

        [Project1].[Title] AS [Title],

        [Project1].[FirstName] AS [FirstName],

        [Project1].[MiddleName] AS [MiddleName],

        [Project1].[LastName] AS [LastName],

        [Project1].[Suffix] AS [Suffix],

        [Project1].[EmailPromotion] AS [EmailPromotion],

        [Project1].[AdditionalContactInfo] AS [AdditionalContactInfo],

        [Project1].[Demographics] AS [Demographics],

        [Project1].[rowguid] AS [rowguid],

        [Project1].[ModifiedDate] AS [ModifiedDate],

        [Project1].[C1] AS [C1],

        [Project1].[BusinessEntityID1] AS [BusinessEntityID1],

        [Project1].[PhoneNumber] AS [PhoneNumber],

        [Project1].[PhoneNumberTypeID] AS [PhoneNumberTypeID],

        [Project1].[ModifiedDate1] AS [ModifiedDate1]

   FROM (SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],

                [Extent1].[PersonType] AS [PersonType],

                [Extent1].[NameStyle] AS [NameStyle],

                [Extent1].[Title] AS [Title],

                [Extent1].[FirstName] AS [FirstName],

                [Extent1].[MiddleName] AS [MiddleName],

                [Extent1].[LastName] AS [LastName],

                [Extent1].[Suffix] AS [Suffix],

                [Extent1].[EmailPromotion] AS [EmailPromotion],

                [Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],

                [Extent1].[Demographics] AS [Demographics],

                [Extent1].[rowguid] AS [rowguid],

                [Extent1].[ModifiedDate] AS [ModifiedDate],

                [Extent2].[BusinessEntityID] AS [BusinessEntityID1],

                [Extent2].[PhoneNumber] AS [PhoneNumber],

                [Extent2].[PhoneNumberTypeID] AS [PhoneNumberTypeID],

                [Extent2].[ModifiedDate] AS [ModifiedDate1],

                CASE WHEN ([Extent2].[BusinessEntityID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]

           FROM [Person].[Person] AS [Extent1]

         LEFT OUTER JOIN [Person].[PersonPhone] AS [Extent2]

             ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]

        )  AS [Project1]

ORDER BY [Project1].[BusinessEntityID] ASC, [Project1].[C1] ASC

 

As an alternative, you could explicitly code a Linq outer join, or project the association. The database doesn't care, the two join flavors yield the same T-SQL query:

var query = from p in model.People

            join pp in model.PersonPhones

            on p.BusinessEntityID equals pp.BusinessEntityID

            into phones

            select new PersonWithPhonesDto() { LastName = p.LastName, PersonPhones = phones };

var query = from p in model.People

            select new PersonWithPhonesDto() { LastName = p.LastName, PersonPhones = p.PersonPhones };

This is the resulting T-SQL query:

  SELECT [Project1].[BusinessEntityID] AS [BusinessEntityID],

       [Project1].[LastName] AS [LastName],

       [Project1].[C1] AS [C1],

       [Project1].[BusinessEntityID1] AS [BusinessEntityID1],

       [Project1].[PhoneNumber] AS [PhoneNumber],

       [Project1].[PhoneNumberTypeID] AS [PhoneNumberTypeID],

       [Project1].[ModifiedDate] AS [ModifiedDate]

  FROM (SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],

               [Extent1].[LastName] AS [LastName],

               [Extent2].[BusinessEntityID] AS [BusinessEntityID1],

               [Extent2].[PhoneNumber] AS [PhoneNumber],

               [Extent2].[PhoneNumberTypeID] AS [PhoneNumberTypeID],

               [Extent2].[ModifiedDate] AS [ModifiedDate],

               CASE

                  WHEN ([Extent2].[BusinessEntityID] IS NULL) THEN CAST(NULL AS int)

                  ELSE 1

               END AS [C1]

    FROM  [Person].[Person] AS [Extent1]

    LEFT OUTER JOIN [Person].[PersonPhone] AS [Extent2] ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]

)  AS [Project1]

ORDER BY [Project1].[BusinessEntityID] ASC, [Project1].[C1] ASC

 

It's still a weird query, but thanks to the 'LastName' projection it runs twice as fast. Here's the proof:

Neither the Include nor the standard Linq outer join allow to project selected columns from the PersonPhones table. And by the way: it would be nice if we could get rid of the unwanted sort operation that takes almost 60 % of the processing.

Tip 3: Consider alternatives

In the methods that require the best possible SQL queries, you might be tempted to abandon the Entity Framework and use another option.

Linq to SQL

If you're only targeting SQL Server, then Linq-to-SQL (L2S) provides a nice alternative for EF. According to the rumors, Linq-to-SQL still generally produces a higher quality of T-SQL.

So let's check it out. Here's the outer join in L2S (it's the same as in EF40) :

var query = from p in model.Persons

            select new PersonWithPhonesDto() { LastName = p.LastName, PersonPhones = p.PersonPhones };

Here's the resulting query:

SELECT [t0].[LastName], [t1].[BusinessEntityID], [t1].[PhoneNumber], [t1].[PhoneNumberTypeID], [t1].[ModifiedDate],(

    SELECT COUNT(*)

      FROM [Person].[PersonPhone] AS [t2]

     WHERE [t2].[BusinessEntityID] = [t0].[BusinessEntityID]

      ) AS [value]

    FROM [Person].[Person] AS [t0]

LEFT OUTER JOIN [Person].[PersonPhone] AS [t1]

      ON [t1].[BusinessEntityID] = [t0].[BusinessEntityID]

ORDER BY [t0].[BusinessEntityID], [t1].[PhoneNumber], [t1].[PhoneNumberTypeID]

Here's a comparison of the query plans in SQL Server Management Studio:

As you observe, the L2S query takes more resources than the EF40 version. This is not a real surprise: as opposed to L2S, EF40 is continuously improving. Its bugs get fixed and the Linq provider gets smarter with every iteration. So sticking -or returning- to L2S might give you only a short term advantage.

Brew your own query

Where query-generating technologies fail, you should build the T-SQL queries yourself. It's the only way to get full access to the database syntax: ranking functions, cross apply calls, common table expressions, optimization and locking hints, etc. Fortunately you can do this whilst still standing on the shoulders of EF40. You don't have to programatically instantiate the whole underlying ADO.NET object stack (Connection, Adapter, Command, Datareader) yourself. EF40 will do it for you:

string joinStatement = @"SELECT [t0].[LastName], [t1].[PhoneNumber]

                        FROM [Person].[Person] AS [t0]

            LEFT OUTER JOIN [Person].[PersonPhone] AS [t1]

                            ON [t1].[BusinessEntityID] = [t0].[BusinessEntityID]";

 

var query = model.ExecuteStoreQuery<PersonWithPhoneDto>(joinStatement).ToList();

Here's the comparison between the Linq outer join and the T-SQL outer join:

 The home-made query runs three times faster. This is because we're now able to select only the needed columns from both tables.

Tip 4: But then come back to EF40, and try harder

If an alternative technology produces much better results than EF40, then you must have done something wrong. After all, EF40 is Microsoft's main data access technology.

You observed that simple EF40 Linq queries yield weird T-SQL. Well, the opposite is also true. Consider the following query:

var query = from p in model.People

            from pp in

                (from zz in model.PersonPhones

                    where p.BusinessEntityID == zz.BusinessEntityID

                    select new { zz.PhoneNumber }).DefaultIfEmpty()

            select new PersonWithPhoneDto() { LastName = p.LastName, PhoneNumber = pp.PhoneNumber };

It's an inner join between 'all persons' and 'all person phones or a default value'. In other words: it's an (left) outer join. On top of that, it only fetches the needed columns from both tables.

An indeed this query yields a T-SQL query that is spot on the ideal version.

I forgive EF40 for returning the primary key column. This takes no extra database resources -just bandwidth- and you probably need the value anyway in your business layer:

         SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],

                 [Extent1].[LastName] AS [LastName],

                [Extent2].[PhoneNumber] AS [PhoneNumber]

           FROM [Person].[Person] AS [Extent1]

LEFT OUTER JOIN [Person].[PersonPhone] AS [Extent2]

             ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]

The same query also runs in L2S, but does not return the BusinessEntityId column. And neither of the queries cause an internal sort!

Tip 5: Always return custom data transfer objects

Your should avoid returning Self Tracking Entities from your data access layer methods. Use lightweight custom data transfer objects. Only this technique will help you to put all previous tips in practice.

Source code

Here's the full source code of the sample project. I put the L2S code in a separate assembly to avoid namespace collisions: U2UConsult.EF4.Linq.Tips.zip (29,37 kb)

Enjoy!