Deciding when to do a differential backup

SQL Server allows for differential backups, which only backups extends modified since the last full backup. The advantage of these differential backups is that if only a small portion of your data has changed since the last full backup, you need less time and storage for your backups! The disadvantage however is that the time needed for a restore increases: We must first restore the full backup, then restore the last differential backup.So when few extends have changed since the last full backup, the gain on the backup is huge, and the pain when restoring is limited. But as more and more extends are being modified, the pain grows and the gain shrinks. So to decide whether we go for a full or differential backup we need to know the number of extends modified since the last full backup. But on SQL Server 2016 and earlier the only easy way to figure out was… by taking a differential backup. SQL Server 2017 offers a nice improvement on this. In the sys.dm_db_file_space_usage dynamic management view an extra column modified_extent_page_count has been added which provides this information. So if you’re willing to stick to differential backups till they are 70% of the size of your full database, you could use this script to get the backup advice: SELECT total_page_count, allocated_extent_page_count, modified_extent_page_count, modified_extent_page_count * 100 / allocated_extent_page_count AS [% changed] , CASE WHEN modified_extent_page_count * 100 / allocated_extent_page_count > 70     THEN 'FULL'     ELSE 'DIFFERENTIAL'     END AS AdviceFROM sys.dm_db_file_space_usage An example: So with this we can improve our maintenance plan. Next step: convince Ola to include this in his maintenance script

Building custom reports for SQL Server management

Recently I presented a session at the ITPROceed conference on ‘Administration intelligence’: using Business Intelligence solutions for analyzing log files, generate documentation, do disk quota estimations and other typical administration tasks. You can find the full presentation at SlideShare. In this blog post, I want to focus on one of the techniques explained in this talk: creating Reporting Services reports to run in Management Studio. SQL Server Management Studio reports SQL Server Management Studio contains a bunch of predefined reports. In the Object Explorer, right click any database, select Reports –> Standard Reports and you get a listing of nearly 20 predefined reports:   If we select any of these reports, a new tab open in Management Studio in which the report renders. There are a few special things to notice about these reports: They can be exported in PDF, Word and Excel format, as well as being printed It can be a multi-page report. There is no explicit way to go to the next or previous page, but if you scroll past the end of the current page, you go to the next page. The report is context sensitive: if you open the same report by clicking on a different database, it shows information for that specific database. Before we dive into building our own reports, I just wanted to point out that we have these predefined reports not only at the level of databases, but also at the server instance level, as well as on the Data Collection node (Management), the Integration Services Catalogs (from 2012 on) and the SQL Server Agent node. Just go ahead and try these predefined reports: They provide very useful information without the effort of building reports yourself… plus they might inspire you in building your own reports. From report consumer to report producer These predefined reports can make our life easier, but sometimes you want a report different from these predefined reports.Building a custom report is easy. Step 1: Retrieving the data The first (and often most difficult) step in creating any report is in retrieving the right data. Luckily SQL Server provides tons of information via the many management views. Just start typing SELECT * FROM sys. and the completion tool will show you the huge list of possible completions. If you want to read more on these before you continue, checkout the web, e.g. has some interesting reading material for you. In this example we want to report on log file sizes: what are the biggest log files I have on any server, and home many room for growth (or shrinking) do we still have in there. A query with which we can get started is this one: SELECT *FROM sys.dm_os_performance_countersWHERE counter_name IN ( 'Log File(s) Size (KB)' ,'Log File(s) Used Size (KB)' ) AND instance_name <> '_Total' This returns a table such as this:   But in reporting it is important to present the data as easy digestible as possible. It is easier to build reports if different measures (in our case log size and log used size) are in different columns versus all in one single column (as they are right now). The PIVOT statement in T-SQL might help us out on that. Also, simple resorting and renaming might improve the readability of the result. So after a bit of a rewrite we end up with this query: SELECT [Database Name] ,[Log File(s) Used Size (KB)] AS LogUsedSize ,[Log File(s) Size (KB)] AS LogTotalSizeFROM ( SELECT RTRIM(instance_name) AS [Database Name] ,cntr_value AS [Size in Kb] ,RTRIM(counter_name) AS [Counter Name] FROM sys.dm_os_performance_counters WHERE counter_name IN ( 'Log File(s) Size (KB)' ,'Log File(s) Used Size (KB)' ) AND instance_name <> '_Total' ) AS srcPIVOT(SUM([Size in Kb]) FOR [Counter Name] IN ( [Log File(s) Size (KB)] ,[Log File(s) Used Size (KB)] )) AS pvtORDER BY [Log File(s) Size (KB)] DESC   Just copy the query in the clipboard, we will need it for the next step: Step 2: Building the actual report To build a report which we can run in Management Studio, we need to have SQL Server Data Tools (SSDT) installed (the tools formerly known as Business Intelligence Development Studio, BIDS, so you can use BIDS as well if you’re not yet on 2012). This is part of the regular SQL Server setup, so you can find it on your SQL Server 2012 or later installation media. But you can also just go and download it from the Microsoft Download Center. After downloading, installing and starting this tool (which is just a Visual Studio template), we instruct it to create a new project. In the list of installed templates, you should find back the Business Intelligence group, select the Reporting Services group within, and on the right, click the Report Server Project Wizard and provide a useful project name:   Once you click OK, the wizard will start. Click Next to get past the start screen. On the next screen the wizard asks which type of database connection we want to have, and to which server and database we want to connect. This is the tricky part, because we actually want to connect to whatever server and database we click upon in Management Studio object explorer. But now, we first need to develop this report in SSDT. Luckily, there is an easy way out: If we use the Microsoft SQL Server data source type, whatever server and database we select will be replaced with the ‘current’ server and database whenever we run this from Management Studio. So lets just connect to a server (I’m using localhost over here, but any server to which you can connect from SSDT will do):     Notice that if you want your report to not query the current database but a fixed database (e.g. always master or msdb), you should use three part names in your query (e.g. SELECT * FROM msdb.dbo.suspect_pages).   Next screen asks for the query to retrieve the data for this report. Paste the query we’ve written earlier in this blog into this window. The next screen asks for a report type, just confirm the tabular report. Then we have a screen that asks for the report design. Just drag all three fields into the Details window and click Finish. Then we hit the last wizard screen where we give this report a meaningful name: Log Size. Click Finish. Step 3: fine-tuning the report After we finish the wizard we end up in the editor. Click the preview tab to get an impression of what the report will look once deployed:   This report only looks slightly better than the table we got in Management Studio earlier on. We need to make a couple of improvements. Switch back to the Design tab to widen the database name column (just drag the column header to the right). Finally we want to have a graphical indication on the size of these log files. To achieve this, right-click the rightmost column header in our report and add a column to the right:   Next, from the Toolbox window, drag the Data Bar component into the bottom cell of this newly created column. In the pop-up window, just confirm the default data bar type. Finally, click the blue bar that appears in this cell. On the right another pop-up window appears. Click the green plus next to Values and click LogTotalSize:   Now preview the report again:   Step 4: Run in Management Studio Now that we have a report which is easy to read, let’s make sure we can easily open up this report in Management Studio. In SSDT, in the File menu select Save Log File.rdl As... In the save dialog, navigate to My Documents > SQL Server Management Studio > Custom Reports and save it over there. Notice that this is not required, Management Studio can open reports from any file location; It’s just the default location for these files. Finally, we can now open up Management Studio, right click any object in the object explorer, select Reports > Custom Reports, and select the file we just saved there.   If you can connect to multiple instances of SQL Server, try out this report on all of them, and you will see how the report takes the server context into account.   Right click to print and export the report… you can have all the options we had on the predefined reports.   This is not the end of it, it’s just the start. There is a lot more to learn about building Reporting Services reports, as well as about all the useful information we can extract from SQL Server. So, give it a try!   Download If you just want to download the .rdl file we created in this post, you can find it at

SQL Server 2014 In-memory OLTP (a.k.a Hekaton) @ Techorama

A few days ago I presented a session on the In-Memory OLTP engine in SQL Server 2014 at Techorama, the Belgian developer conference. I talked about the good things In-Memory OLTP does, and the limitations it has. If you want to go through the slides and code examples again, feel free to download these from my OneDrive.

The whole world in a SQL Server database

Some people have the whole world in their hand, but last weekend I tried something slightly less ambitious: Getting (part of) the whole world in a SQL Server database. Loads of spatial data from OpenStreetMap I’m always looking for interesting datasets. I was still lacking a large dataset with spatial data. The biggest public spatial dataset I’m aware of is OpenStreetMap (OSM). It’s a map service comparable with Bing or Google Maps. But the data it uses is collected from public data: Some governments provide public location information, but there are also thousands of volunteers who help OSM in putting their street or village ‘on the map’. The resulting database is public as well. It is described in XML format, and I found on the wiki also an OSM XSD schema. This allows me to load the data with SQL Server Integration Services, using the data flow XML data source. There might be faster solutions than this, but for the time being, SSIS does a rather good job with a very simple setup. Creating the database So, let’s create the database in which we want to store our data. Be sure to create a database with an initial size big enough to host the data you want to load. The OSM xml file with all the data from the whole planet is about 250 Gb at the time of writing, so It will probably take about 150 Gb in SQL Server (XML is not the most compact storage format). In this blog post example I will be loading a subset of the whole planet, so I create a much smaller database (just under 2 Gb): 1: CREATE DATABASE [OSM] 2: ON PRIMARY 3: ( NAME = N'OSM', FILENAME = N'F:\Data\OSM.mdf' , SIZE = 2048000KB , MAXSIZE = UNLIMITED, FILEGROWTH = 512000KB ) 4: LOG ON 5: ( NAME = N'OSM_log', FILENAME = N'F:\Data\OSM_log.ldf' , SIZE = 3170880KB , MAXSIZE = 2048GB , FILEGROWTH = 10%) If you really plan to create a much larger database, don’t forget to enable instant initialization to speed up the process of creating and extending the data file(s). Next I create all the tables to store the different pieces of information in the OSM XML files. Notice that I do not keep all the information in the XML files. I do not store the person who entered the data, and the date the data was entered. This reduces the amount of data to be stored, which speeds up the data loading. This is the code to create the necessary tables: 1: USE [OSM] 2: GO 3: CREATE TABLE [dbo].[Nodes]( 4: [node_Id] [bigint] NOT NULL, 5: [id] [bigint] NULL, 6: [lat] [float] NULL, 7: [lon] [float] NULL, 8: [geog] [geography] NULL, 9: CONSTRAINT [PK_Nodes] PRIMARY KEY CLUSTERED 10: ( 11: [node_Id] ASC 12: )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY] 13: ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] 14:  15: CREATE TABLE [dbo].[NodeTag]( 16: [key] [nvarchar](250) NULL, 17: [value] [nvarchar](3000) NULL, 18: [node_Id] [bigint] NULL 19: ) ON [PRIMARY]; 20: GO 21: CREATE TABLE [dbo].[Relation]( 22: [relation_Id] [bigint] NOT NULL, 23: [id] [bigint] NULL, 24: [visible] [bit] NULL, 25: CONSTRAINT [PK_Relation] PRIMARY KEY CLUSTERED 26: ( 27: [relation_Id] ASC 28: )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY] 29: ) ON [PRIMARY] 30: GO 31: CREATE TABLE [dbo].[RelationMember]( 32: [type] [nvarchar](8) NULL, 33: [ref] [bigint] NULL, 34: [role] [nvarchar](250) NULL, 35: [relation_Id] [bigint] NULL, 36: [RelationPosition] [int] IDENTITY(1,1) NOT NULL 37: ) ON [PRIMARY] 38:  39: GO 40: CREATE TABLE [dbo].[RelationTag]( 41: [key] [nvarchar](250) NULL, 42: [value] [nvarchar](3000) NULL, 43: [relation_Id] [bigint] NULL 44: ) ON [PRIMARY] 45:  46: GO 47: CREATE TABLE [dbo].[Way]( 48: [way_Id] [bigint] NOT NULL, 49: [id] [bigint] NULL, 50: [visible] [bit] NULL, 51: CONSTRAINT [PK_Way] PRIMARY KEY CLUSTERED 52: ( 53: [way_Id] ASC 54: )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY] 55: ) ON [PRIMARY] 56:  57: GO 58: CREATE TABLE [dbo].[WayNode]( 59: [ref] [bigint] NULL, 60: [way_Id] [bigint] NULL, 61: [WaiNodePosition] [bigint] IDENTITY(1,1) NOT NULL 62: ) ON [PRIMARY] 63: GO 64: CREATE TABLE [dbo].[WayTag]( 65: [key] [nvarchar](250) NULL, 66: [value] [nvarchar](3000) NULL, 67: [way_Id] [bigint] NULL 68: ) ON [PRIMARY] 69:  70: GO After the destination tables are created, it’s time to download an OSM xml file. I downloaded the file for Belgium from one of the mirror servers. After unzipping the file, it is about 2.4 Gb. All the remaining timings in this article are based on this file, but of course, feel free to experiment with data from other countries as well. Loading the data with SSIS Now it’s time to create an Integration Services package which uses an XML source to read the OSM xml file, parse it with the XSD file I mentioned earlier, and write the results to the corresponding tables. I used SQL Server destinations, but if you do be careful to set the destination timeout high enough, or your package will fail on this. You can reuse the SSIS package I made on SQL Server 2012, but be sure to check and correct the pointers to the appropriate files. Fingers crossed, run the SSIS package and wait till the data is loaded. On my machine it took about 17 minutes. It loaded about 12 million coordinates (Nodes), and more than 1.5 million road segments and other combined objects (Way). The only table that really contains coordinates is the Nodes table. In order to use SQL Server’s spatial functions on these, we must first create geographical objects out of the latitude and longitude info. This can simply be done with this statement: 1: UPDATE Nodes SET geog = geography::Point(lat, lon, 4326) This took about 5 minutes on my machine. From now on, we can start querying this information is we want, but my Belgian OSM file contains about 12 million coordinates… without a decent index, every query is going to take forever, so let’s build a spatial index on this. In the next code example I use the new type of spatial index they introduced in SQL Server 2012, but feel free to try other types of spatial indexes as well. Spatial Ed has a short ‘getting started’ paper if you want to play with spatial indexes. 1: CREATE SPATIAL INDEX [SpatialIndex-Nodes] ON [dbo].[Nodes] 2: ( 3: [geog] 4: )USING GEOGRAPHY_AUTO_GRID 5: WITH ( 6: CELLS_PER_OBJECT = 16) ON [PRIMARY] 7: GO After this index is created, we can get started querying this interesting data source. More on that (and how to create more advanced spatial objects to represent roads, areas etc.) in a later blog post. Happy querying! Nico

Fuzzy lookup of names with a Dutch Metaphone implementation

Phonetic algorithms are heavily used to provide suggestions and corrections in spell checkers and also for searching and matching names. You could use such algorithms to populate autocomplete controls for the lookup of cities and streets, patients in a hospital, or reservations for a hotel stay. All you need to know is how a part of the name of the street, patient, or guest sound like, transform your search text with the algorithm, and lookup the encoded string in the dictionary. Metaphone is a popular algorithm family in this domain. Metaphone is used for encoding a word so that similar sounding words encode the same. That sounds as Soundex, doesn't it ? Well, that's true, but Metaphone it doesn't suffer from a fixed length output (Soundex just encode to a letter followed by three numbers), it's not specialized in names only, and it's not specialized in the English language only. For the record: because of these same reasons, Metaphone requires more computational power and storage capacity than Soundex. Metaphone and double metaphone (the less English-centric version) are in the public domain and can be freely implemented. So I did some research into the English metaphone rules, and then jumped into the web of Dutch pronunciation rules. I downloaded one of the many available free implementations (in T-SQL), and adapted it towards the Dutch language. And believe me, if you thought that Dutch spelling was difficult, you should check our pronunciation rules . Fortunately Metaphone ignores all vowels (except the initial one) and diphthongs, and it doesn't specify how you should pronounce a 'g' or a 'r', so I think I came up with a pretty region-agnostic encoding (there are many-many-many Dutch dialects). Examples Here are a couple of examples: DECLARE @Test TABLE(Name NVARCHAR(50))   INSERT @Test VALUES('vrede'), ('wrede'), ('luycks'), ('luiks') INSERT @Test VALUES('Vandenbroecke'), ('Vandenberghe'), ('Verbrugghe'), ('Verschueren') INSERT @Test VALUES('baobab'), ('china'), ('shampoo'), ('sjamaan'), ('tjokvol')   SELECT Name, U2UConsult.MetaphoneDutch(name) AS [Dutch Metaphone]   FROM @Test GO And the result: Here's a lookup in the AdventureWorks persons table for last names that contain a fragment that sounds like "kamer": DECLARE @x NVARCHAR(150) = U2UConsult.MetaphoneDutch('kamer')   ;WITH cte AS ( SELECT DISTINCT LastName, U2UConsult.MetaphoneDutch(LastName) AS [Dutch Metaphone]   FROM Person.Person) SELECT * FROM cte WHERE [Dutch Metaphone] LIKE ('%' + @x + '%') GO Here's the result: Of course you should not launch this type of queries continuously against a production database. The encoded lookup value should be stored through a trigger, a persisted calculated column, or an indexed view. Source Code Here's the source code. The function is optimized for readability, not performance: USE [AdventureWorks2008] GO   CREATE Schema [U2UConsult] GO   CREATE FUNCTION [U2UConsult].[MetaphoneDutch](@word AS VARCHAR(150)) RETURNS VARCHAR(150) AS BEGIN     DECLARE @result VARCHAR(150)     DECLARE @nextChar CHAR(2)     DECLARE @currentChar CHAR(1)     DECLARE @previousChar CHAR(1)     DECLARE @length TINYINT     DECLARE @count TINYINT       SET @word = LTRIM(RTRIM(@word))     SET @length = LEN(@word)     SET @count = 1     SET @result = ''     SET @currentChar = SUBSTRING(@word, @count, 1)       IF @currentChar IN ('a', 'e', 'i', 'o', 'u')         SET @result = 'a'       WHILE @count <= @length BEGIN         SET @currentChar = SUBSTRING(@word, @count, 1);         if (@currentChar = ' ') BEGIN             SET @result = @result + ' ' + U2UConsult.MetaphoneDutch(SUBSTRING(@word, @count, @length))             RETURN @result         END         IF @count <> 1             SET @previousChar = SUBSTRING(@word, (@count-1), 1)         ELSE             SET @previousChar = ' '         IF @previousChar <> @currentChar BEGIN             SET @nextChar = SUBSTRING(@word, @count + 1, 1)               -- b --             IF @currentChar = 'b'                 IF @count = @length                     SET @result = @result + 'p'                 ELSE                     SET @result = @result + 'b'               -- c --             IF @currentChar = 'c'                 IF @nextChar = 'h'                     IF @count = 1                         SET @result = @result + 'x'                     ELSE                         SET @result = @result + 'g'                 ELSE IF @nextChar IN ('i', 'e', 'y') AND @previousChar <> 's'                     SET @result = @result + 's'                 ELSE                     SET @result = @result + 'k'               -- d --             IF @currentChar = 'd'                 IF SUBSTRING(@word,@count,3) IN ('dge', 'dgy', 'dgi')                     SET @result = @result + 'j'                 ELSE                     IF @count = @length                         SET @result = @result + 't'                     ELSE                         SET @result = @result + 'd'               -- f --             IF @currentChar = 'f'                 SET @result = @result + 'f'               -- g --             IF @currentChar = 'g'                 SET @result = @result + 'g'               -- h --             IF @currentChar = 'h'                 IF @previousChar NOT IN ('c', 's', 'p', 't', 'g') AND @nextChar IN ('a', 'e', 'i', 'o', 'u', 'y')                     SET @result = @result + 'h'               -- j --             IF @currentChar = 'j'                 IF @previousChar NOT IN ('i', 's')                     IF @previousChar = 't'                         SET @result = @result + 'x'                     ELSE                         SET @result = @result + 'j'               -- k --             IF @currentChar = 'k'                 IF @previousChar <> 'c'                     SET @result = @result + 'k'               -- l --             IF @currentChar = 'l'                 SET @result = @result + 'l'               -- m --             IF @currentChar = 'm'                 SET @result = @result + 'm'               -- n --             IF @currentChar = 'n'                 SET @result = @result + 'n'               -- p --             IF @currentChar = 'p'                 IF @nextChar = 'h'                     SET @result = @result + 'f'                 ELSE                     SET @result = @result + 'p'               -- q --             IF @currentChar = 'q'                 SET @result = @result + 'k'               -- r --             IF @currentChar IN ('r')                 SET @result = @result + 'r'               -- s --             IF @currentChar = 's'                 IF @nextChar IN ('h', 'j')                     SET @result = @result + 'x'                 ELSE                     SET @result = @result + 's'               -- t --             IF @currentChar = 't'                 IF SUBSTRING(@word,@count,3) IN ('tia', 'tio', 'tie', 'tiu')                     SET @result = @result + 's'                 ELSE                     SET @result = @result + 't'               -- v --             IF @currentChar = 'v'                 SET @result = @result + 'v'               -- w --             IF @currentChar = 'w'                 IF @nextChar = 'r'                     SET @result = @result + 'v'                 ELSE                     SET @result = @result + 'w'               -- x --             IF @currentChar = 'x'                 SET @result = @result + 'ks'               -- y --             IF @currentChar = 'y'                 IF @nextChar IN ('a', 'e', 'i', 'o', 'u')                     SET @result = @result + 'j'               -- z --             IF @currentChar = 'z'                 SET @result = @result + 'z'         END         SET @count = @count + 1     END       RETURN @result -- Mandatory END Enjoy!

Building an Entity Framework 4.0 model on views: practical tips

Many development teams and database administrators use views to create an abstraction layer on top of a physical data model. In this article I'll show you how to build an Entity Framework 4.0 (EF4) model upon such a set of views. I'll create a couple of views in SQL Server -some of them indexed-, import them in a model using the Visual Studio designer, decorate them with different types of associations (foreign key and mapped table), and finally attack the model with some Linq queries. This is how the model looks like: Creating the views Views that are directly consumed by your application should be stable. That's why I prefer such views to be declared with the SCHEMABINDING option: CREATE VIEW U2UConsult.Person WITH SCHEMABINDING AS SELECT BusinessEntityID AS PersonId,        Title,        FirstName,        LastName   FROM Person.Person The SCHEMABINDING option protects a view against modifications to the schema of the underlying tables, at least modifications that would invalidate that view. E.g. it becomes impossible to drop a column on which the view relies: Optional step: indexing the views Sometimes it makes sense to persist the view on disk and let SQL Server make sure that its content remains in sync with the underlying tables. This is very useful for complex views (with lots of joins and calculations) on stable tables. All we need to do is creating a clustered index on the view: CREATE UNIQUE CLUSTERED INDEX [IUX_Person] ON [U2UConsult].[Person]([PersonId]) WITH (FILLFACTOR = 100)   Importing the views You import a view into an entity model just like you import a table. But views -even indexed ones- can not have primary or foreign keys in the database, so there's no metadata to import. The visual designer overcompensates this by inferring a key composed of all non-nullable columns. This is not a good idea: the first thing you need to do is define the primary key of the view: before after Now do the same steps for the Address view: CREATE VIEW U2UConsult.Address WITH SCHEMABINDING AS SELECT AddressID AS AddressId,        AddressLine1,        AddressLine2,        PostalCode,        City FROM Person.Address GO   CREATE UNIQUE CLUSTERED INDEX [IUX_Address] ON [U2UConsult].[Address]([AddressId]) WITH (FILLFACTOR = 100) GO   Defining 1-to-1 or 1-to-many relationships In a table, you would express 1-to-1 or 1-to-many relationships by creating a foreign key relationship. In an entity model, you can do the same with views. For starters, define a new association between two views. The model looks good now, and IntelliSense will help you while building Linq queries against it. However, your missing an important ingredient: the physical model doesn't know to solve the association:  When creating the association, don't forget to check the 'add foreign key properties' box: If necessary, you can refine the data store binding by clicking on the ellipsis next to 'referential constraint' in the properties window: After that, you need to remove the view from the designer. The view is used as entity ànd as association, EF4 does not like that: Defining many-to-many relationships Many-to-many relations are generally implemented through an intermediate table. A many-to-many relationship between two views is built exactly the same way. The AdventureWorks2008 database has an intermediate table between Address and BusinesEntity (= Person): BusinessEntityAddress. Unfortunately we can't use this table to carry the association. Strangely enough the entity framework requires that all its physical (SSDL) primary key fields should be mapped. Using that table as the glue between Persons and Addresses yields the following error:   As a work around, you could define the link table as a view: CREATE VIEW U2UConsult.PersonAddress WITH SCHEMABINDING AS SELECT DISTINCT BusinessEntityID AS PersonId,        AddressID AS AddressId FROM Person.BusinessEntityAddress Then add it to the entity model, and map the association to it. The foreign key checkbox will be disabled for many-to-many associations:   Both types of associations were created using the designer. I didn't need to manually tweak the SSDL-part of the model. So when we upgrade it from the database they will remain intact. Querying the views For Linq it doesn't matter where the data comes from, so you use the views like you would use a table: Person person = (from p in entities.Person.Include("Phones"                  where p.PersonId == 1                  select p).FirstOrDefault(); This gives the following result: If EF4 performance matters to you, you might want to (re-)read this article. Source Here's the sample project, all SQL queries are included: (14,33 kb) Enjoy!

Very Spatial Wishes

Last year I encountered this T-SQL script on a MSDN Forum. Run it in SQL Management Studio (not Visual Studio!), and select 'Spatial Results': -- Original Source:   -- Prepare the scene CREATE TABLE #ChristmasScene (   item varchar(32),   shape geometry);   -- Put up the tree and star INSERT INTO #ChristmasScene VALUES ('Tree', 'POLYGON((4 0, 0 0, 3 2, 1 2, 3 4, 1 4, 3 6, 2 6, 4 8, 6 6, 5 6, 7 4, 5 4, 7 2, 5 2, 8 0, 4 0))'), ('Base', 'POLYGON((2.5 0, 3 -1, 5 -1, 5.5 0, 2.5 0))'), ('Star', 'POLYGON((4 7.5, 3.5 7.25, 3.6 7.9, 3.1 8.2, 3.8 8.2, 4 8.9, 4.2 8.2, 4.9 8.2, 4.4 7.9, 4.5 7.25, 4 7.5))')   -- Decorate the tree DECLARE @i int = 0, @x int, @y int; WHILE (@i < 20) BEGIN   INSERT INTO #ChristmasScene VALUES     ('Ball' + CAST(@i AS varchar(8)), geometry::Point(RAND() * 5 +1.5, RAND() * 6, 0).STBuffer(0.3))   SET @i = @i + 1; END   -- Christmas Greeting INSERT INTO #ChristmasScene VALUES ('M', 'POLYGON((0 10, 0 11, 0.25 11, 0.5 10.5, 0.75 11, 1 11, 1 10, 0.75 10, 0.75 10.7, 0.5 10.2, 0.25 10.7, 0.25 10, 0 10))'), ('E', 'POLYGON((1 10, 1 11, 2 11, 2 10.8, 1.25 10.8, 1.25 10.6, 1.75 10.6, 1.75 10.4, 1.25 10.4, 1.25 10.2, 2 10.2, 2 10, 1 10))'), ('R', 'POLYGON((2 10, 2 11, 3 11, 3 10.5, 2.4 10.5, 3 10, 2.7 10, 2.2 10.4, 2.2 10, 2 10),(2.2 10.8, 2.8 10.8, 2.8 10.7, 2.2 10.7, 2.2 10.8))'), ('R', 'POLYGON((3 10, 3 11, 4 11, 4 10.5, 3.4 10.5, 4 10, 3.7 10, 3.2 10.4, 3.2 10, 3 10),(3.2 10.8, 3.8 10.8, 3.8 10.7, 3.2 10.7, 3.2 10.8))'), ('Y', 'POLYGON((4 11, 4.2 11, 4.5 10.6, 4.8 11, 5 11, 4.6 10.5, 4.6 10, 4.4 10, 4.4 10.5, 4 11))'), ('C', 'POLYGON((0 9, 0 10, 1 10, 1 9.8, 0.2 9.8, 0.2 9.2, 1 9.2, 1 9, 0 9))'), ('H', 'POLYGON((1 9, 1 10, 1.2 10, 1.2 9.6, 1.8 9.6, 1.8 10, 2 10, 2 9, 1.8 9, 1.8 9.4, 1.2 9.4, 1.2 9, 1 9))'), ('R', 'POLYGON((2 9, 2 10, 3 10, 3 9.5, 2.4 9.5, 3 9, 2.7 9, 2.2 9.4, 2.2 9, 2 9),(2.2 9.8, 2.8 9.8, 2.8 9.7, 2.2 9.7, 2.2 9.8))'), ('I', 'POLYGON((3.2 9, 3.2 9.2, 3.4 9.2, 3.4 9.8, 3.2 9.8, 3.2 10, 3.8 10, 3.8 9.8, 3.6 9.8, 3.6 9.2, 3.8 9.2, 3.8 9, 3.2 9))'), ('S', 'POLYGON((4 9, 4 9.2, 4.8 9.2, 4.8 9.4, 4 9.4, 4 10, 5 10, 5 9.8, 4.2 9.8, 4.2 9.6, 5 9.6, 5 9, 4 9))'), ('T', 'POLYGON((5 9.8, 5 10, 6 10, 6 9.8, 5.6 9.8, 5.6 9, 5.4 9, 5.4 9.8, 5 9.8))'), ('M', 'POLYGON((6 9, 6 10, 6.25 10, 6.5 9.5, 6.75 10, 7 10, 7 9, 6.75 9, 6.75 9.7, 6.5 9.2, 6.25 9.7, 6.25 9, 6 9))'), ('A', 'POLYGON((7 9, 7 10, 8 10, 8 9, 7.75 9, 7.75 9.3, 7.25 9.3, 7.25 9, 7 9),(7.25 9.5, 7.25 9.8, 7.75 9.8, 7.75 9.5, 7.25 9.5))'), ('S', 'POLYGON((8 9, 8 9.2, 8.8 9.2, 8.8 9.4, 8 9.4, 8 10, 9 10, 9 9.8, 8.2 9.8, 8.2 9.6, 9 9.6, 9 9, 8 9))');   -- Admire the scene SELECT * FROM #ChristmasScene   -- Tidy up the pine needles and put away the decorations DROP TABLE #ChristmasScene   Happy holidays!

A fistful of Entity Framework 4.0 Tips

This article presents some useful tips for building the data access layer of an enterprise application on top of Entity Framework 4.0 (EF40). For those who can't wait, here they are: 1. Only project the columns you really need, 2. Stay away from the Include syntax, 3. Consider alternatives, 4. But then come back and try harder, and 5. Always return custom data tracking objects. The focus of EF40 lies on developer productivity, not database performance. So there are a couple of caveats you should be aware of when you don't want your data access to become the bottleneck. I'll illustrate my tips by showing you some different ways to issue a left outer join on a small entity model with just the Person and PersonPhone entities: Tip 1: Only project the columns you really need You should never return full-fledged entities from your queries. If you're only interested in the FirstName and LastName of Person entities, then the following EF40 query is definitely a bad idea: var query = from p in model.People             select p; A query like this selects all the columns from the underlying table. It most probably has only one covering index in the database: the clustered index. This query will suffer from all kinds of locks on the table. Just execute the following SQL (from Visual Studio or SQL Management Studio), and then start the Linq query: BEGIN TRANSACTION       UPDATE Person.Person        SET Title = NULL      WHERE BusinessEntityID = 1 If default isolation levels are applied to the database, the Linq query will be blocked and eventually time out: [Don't forget to Rollback the transaction.] You should only project (that's just a fancy word for 'select') the needed columns, like this: var query = from p in model.People             select new PersonDto() { LastName = p.LastName, FirstName = p.FirstName }; With a little luck -and help from your database administrator- there might be a suitable covering index in the database that swiftly produces your result set, unhindered by locks. Here's a screenshot of SQL Server Management Studio displaying the generated query plans, and their corresponding costs: The second query runs 24 times faster than the first one. I don't know about you, but I would call this a significant improvement! Tip 2: Stay away from the Include syntax The Include syntax from EF 4.0 is the successor of the LoadOptions from Linq-to-SQL. It allows you to eagerly load associated entities. Here's a sample query, returning persons and their phones: var query = from p in model.People.Include("PersonPhones")             select p; Although it looks like a declarative outer join, it generates weird T-SQL:  SELECT [Project1].[BusinessEntityID] AS [BusinessEntityID],         [Project1].[PersonType] AS [PersonType],         [Project1].[NameStyle] AS [NameStyle],         [Project1].[Title] AS [Title],         [Project1].[FirstName] AS [FirstName],         [Project1].[MiddleName] AS [MiddleName],         [Project1].[LastName] AS [LastName],         [Project1].[Suffix] AS [Suffix],         [Project1].[EmailPromotion] AS [EmailPromotion],         [Project1].[AdditionalContactInfo] AS [AdditionalContactInfo],         [Project1].[Demographics] AS [Demographics],         [Project1].[rowguid] AS [rowguid],         [Project1].[ModifiedDate] AS [ModifiedDate],         [Project1].[C1] AS [C1],         [Project1].[BusinessEntityID1] AS [BusinessEntityID1],         [Project1].[PhoneNumber] AS [PhoneNumber],         [Project1].[PhoneNumberTypeID] AS [PhoneNumberTypeID],         [Project1].[ModifiedDate1] AS [ModifiedDate1]    FROM (SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],                 [Extent1].[PersonType] AS [PersonType],                 [Extent1].[NameStyle] AS [NameStyle],                 [Extent1].[Title] AS [Title],                 [Extent1].[FirstName] AS [FirstName],                 [Extent1].[MiddleName] AS [MiddleName],                 [Extent1].[LastName] AS [LastName],                 [Extent1].[Suffix] AS [Suffix],                 [Extent1].[EmailPromotion] AS [EmailPromotion],                 [Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],                 [Extent1].[Demographics] AS [Demographics],                 [Extent1].[rowguid] AS [rowguid],                 [Extent1].[ModifiedDate] AS [ModifiedDate],                 [Extent2].[BusinessEntityID] AS [BusinessEntityID1],                 [Extent2].[PhoneNumber] AS [PhoneNumber],                 [Extent2].[PhoneNumberTypeID] AS [PhoneNumberTypeID],                 [Extent2].[ModifiedDate] AS [ModifiedDate1],                 CASE WHEN ([Extent2].[BusinessEntityID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]            FROM [Person].[Person] AS [Extent1]          LEFT OUTER JOIN [Person].[PersonPhone] AS [Extent2]              ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]         )  AS [Project1] ORDER BY [Project1].[BusinessEntityID] ASC, [Project1].[C1] ASC   As an alternative, you could explicitly code a Linq outer join, or project the association. The database doesn't care, the two join flavors yield the same T-SQL query: var query = from p in model.People             join pp in model.PersonPhones             on p.BusinessEntityID equals pp.BusinessEntityID             into phones             select new PersonWithPhonesDto() { LastName = p.LastName, PersonPhones = phones }; var query = from p in model.People             select new PersonWithPhonesDto() { LastName = p.LastName, PersonPhones = p.PersonPhones }; This is the resulting T-SQL query:   SELECT [Project1].[BusinessEntityID] AS [BusinessEntityID],        [Project1].[LastName] AS [LastName],        [Project1].[C1] AS [C1],        [Project1].[BusinessEntityID1] AS [BusinessEntityID1],        [Project1].[PhoneNumber] AS [PhoneNumber],        [Project1].[PhoneNumberTypeID] AS [PhoneNumberTypeID],        [Project1].[ModifiedDate] AS [ModifiedDate]   FROM (SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],                [Extent1].[LastName] AS [LastName],                [Extent2].[BusinessEntityID] AS [BusinessEntityID1],                [Extent2].[PhoneNumber] AS [PhoneNumber],                [Extent2].[PhoneNumberTypeID] AS [PhoneNumberTypeID],                [Extent2].[ModifiedDate] AS [ModifiedDate],                CASE                   WHEN ([Extent2].[BusinessEntityID] IS NULL) THEN CAST(NULL AS int)                   ELSE 1                END AS [C1]     FROM  [Person].[Person] AS [Extent1]     LEFT OUTER JOIN [Person].[PersonPhone] AS [Extent2] ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID] )  AS [Project1] ORDER BY [Project1].[BusinessEntityID] ASC, [Project1].[C1] ASC   It's still a weird query, but thanks to the 'LastName' projection it runs twice as fast. Here's the proof: Neither the Include nor the standard Linq outer join allow to project selected columns from the PersonPhones table. And by the way: it would be nice if we could get rid of the unwanted sort operation that takes almost 60 % of the processing. Tip 3: Consider alternatives In the methods that require the best possible SQL queries, you might be tempted to abandon the Entity Framework and use another option. Linq to SQL If you're only targeting SQL Server, then Linq-to-SQL (L2S) provides a nice alternative for EF. According to the rumors, Linq-to-SQL still generally produces a higher quality of T-SQL. So let's check it out. Here's the outer join in L2S (it's the same as in EF40) : var query = from p in model.Persons             select new PersonWithPhonesDto() { LastName = p.LastName, PersonPhones = p.PersonPhones }; Here's the resulting query: SELECT [t0].[LastName], [t1].[BusinessEntityID], [t1].[PhoneNumber], [t1].[PhoneNumberTypeID], [t1].[ModifiedDate],(     SELECT COUNT(*)       FROM [Person].[PersonPhone] AS [t2]      WHERE [t2].[BusinessEntityID] = [t0].[BusinessEntityID]       ) AS [value]     FROM [Person].[Person] AS [t0] LEFT OUTER JOIN [Person].[PersonPhone] AS [t1]       ON [t1].[BusinessEntityID] = [t0].[BusinessEntityID] ORDER BY [t0].[BusinessEntityID], [t1].[PhoneNumber], [t1].[PhoneNumberTypeID] Here's a comparison of the query plans in SQL Server Management Studio: As you observe, the L2S query takes more resources than the EF40 version. This is not a real surprise: as opposed to L2S, EF40 is continuously improving. Its bugs get fixed and the Linq provider gets smarter with every iteration. So sticking -or returning- to L2S might give you only a short term advantage. Brew your own query Where query-generating technologies fail, you should build the T-SQL queries yourself. It's the only way to get full access to the database syntax: ranking functions, cross apply calls, common table expressions, optimization and locking hints, etc. Fortunately you can do this whilst still standing on the shoulders of EF40. You don't have to programatically instantiate the whole underlying ADO.NET object stack (Connection, Adapter, Command, Datareader) yourself. EF40 will do it for you: string joinStatement = @"SELECT [t0].[LastName], [t1].[PhoneNumber]                         FROM [Person].[Person] AS [t0]             LEFT OUTER JOIN [Person].[PersonPhone] AS [t1]                             ON [t1].[BusinessEntityID] = [t0].[BusinessEntityID]";   var query = model.ExecuteStoreQuery<PersonWithPhoneDto>(joinStatement).ToList(); Here's the comparison between the Linq outer join and the T-SQL outer join:  The home-made query runs three times faster. This is because we're now able to select only the needed columns from both tables. Tip 4: But then come back to EF40, and try harder If an alternative technology produces much better results than EF40, then you must have done something wrong. After all, EF40 is Microsoft's main data access technology. You observed that simple EF40 Linq queries yield weird T-SQL. Well, the opposite is also true. Consider the following query: var query = from p in model.People             from pp in                 (from zz in model.PersonPhones                     where p.BusinessEntityID == zz.BusinessEntityID                     select new { zz.PhoneNumber }).DefaultIfEmpty()             select new PersonWithPhoneDto() { LastName = p.LastName, PhoneNumber = pp.PhoneNumber }; It's an inner join between 'all persons' and 'all person phones or a default value'. In other words: it's an (left) outer join. On top of that, it only fetches the needed columns from both tables. An indeed this query yields a T-SQL query that is spot on the ideal version. I forgive EF40 for returning the primary key column. This takes no extra database resources -just bandwidth- and you probably need the value anyway in your business layer:          SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],                  [Extent1].[LastName] AS [LastName],                 [Extent2].[PhoneNumber] AS [PhoneNumber]            FROM [Person].[Person] AS [Extent1] LEFT OUTER JOIN [Person].[PersonPhone] AS [Extent2]              ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID] The same query also runs in L2S, but does not return the BusinessEntityId column. And neither of the queries cause an internal sort! Tip 5: Always return custom data transfer objects Your should avoid returning Self Tracking Entities from your data access layer methods. Use lightweight custom data transfer objects. Only this technique will help you to put all previous tips in practice. Source code Here's the full source code of the sample project. I put the L2S code in a separate assembly to avoid namespace collisions: (29,37 kb) Enjoy!

Getting and setting the Transaction Isolation Level on a SQL Entity Connection

This article explains how to get, set, and reset the transaction isolation level on a SQL and Entity connection. In a previous article I already explained how important it is to explicitly set the appropriate isolation level when using transactions. I'm sure you're not going to wrap each and every database call in an explicit transaction (TransactionScope is simply too heavy to wrap around a simple SELECT statement). Nevertheless you should realize that every single T-SQL statement that you launch from your application will run in a transaction, hence will behave according to a transaction isolation level.  If you don't explicitely start a transaction or use a transaction scope, then SQL Server will run the statement as a transaction on its own. You don't want your SQL commands to read unofficial data (in the READ UNCOMMITTED level) or apply too heavy locks (in the SERIALIZABLE level) on the database, so you want to make sure that you're running with the correct isolation level. This article explains how to do this. Setting the Transaction Isolation Level Setting the appropriate isolation level on a session/connection is done with a standard T-SQL statement, like the following: T-SQL SET TRANSACTION ISOLATION LEVEL REPEATABLE READ It's easy to write an extension method on the SqlConnection and EntityConnection classes that allows you to do this call in C#, like this: C# using (AdventureWorks2008Entities model = new AdventureWorks2008Entities()) {     model.Connection.Open();       // Explicitely set isolation level     model.Connection.SetIsolationLevel(IsolationLevel.ReadUncommitted);       // Your stuff here ... } Here's the whole extension method. I implemented it against IDbConnection to cover all connection types. The attached Visual Studio solution contains the full code: C# public static void SetIsolationLevel(this IDbConnection connection, IsolationLevel isolationLevel) {     if (isolationLevel == IsolationLevel.Unspecified || isolationLevel == IsolationLevel.Chaos)     {         throw new Exception(string.Format("Isolation Level '{0}' can not be set.", isolationLevel.ToString()));     }       if (connection is EntityConnection)     {         SqlConnection sqlConnection = (connection as EntityConnection).StoreConnection as SqlConnection;         sqlConnection.SetIsolationLevel(isolationLevel);     }     else if (connection is SqlConnection)     {         IDbCommand command = connection.CreateCommand();         command.CommandText = string.Format("SET TRANSACTION ISOLATION LEVEL {0}", isolationLevels[isolationLevel]);         command.ExecuteNonQuery();     } }   Getting the Transaction Isolation Level If you want to retrieve the current isolation level on your connection, you have to first figure out how to do this in T-SQL. Unfortunately there is no standard @@ISOLATIONLEVEL function or so. Here's how to do it: T-SQL SELECT CASE transaction_isolation_level           WHEN 0 THEN 'Unspecified'           WHEN 1 THEN 'Read Uncommitted'           WHEN 2 THEN 'Read Committed'           WHEN 3 THEN 'Repeatable Read'           WHEN 4 THEN 'Serializable'           WHEN 5 THEN 'Snapshot'        END AS [Transaction Isolation Level]   FROM sys.dm_exec_sessions  WHERE session_id = @@SPID Altough you're querying a dynamic management view, the code requires no extra SQL Permissions (not even VIEW SERVER STATE). A user can always query his own sessions. Again, you can easily wrap this into an extension method, that can be called like this: C# using (AdventureWorks2008Entities model = new AdventureWorks2008Entities()) {     model.Connection.Open();       // Get isolation level     // Probably returns 'ReadUncommitted' (due to connection pooling)     MessageBox.Show(model.Connection.GetIsolationLevel().ToString()); } If you run the same code from the attached project, you'll indeed notice that the returned isolation level will be -most probably- READ UNCOMMITTED. This is because we -most probably- reuse the connection from the SetIsolationLevel() sample. As I already mentioned in a previous article, the transaction isolation level is NOT reset on pooled connections. So even if you're not explicitly use transactions, you still should still set the appropriate transaction isolation level. There's no default you can rely on. OK, here's the corresponding extension method: C# public static IsolationLevel GetIsolationLevel(this IDbConnection connection) {     string query =         @"SELECT CASE transaction_isolation_level                     WHEN 0 THEN 'Unspecified'                     WHEN 1 THEN 'ReadUncommitted'                     WHEN 2 THEN 'ReadCommitted'                     WHEN 3 THEN 'RepeatableRead'                     WHEN 4 THEN 'Serializable'                     WHEN 5 THEN 'Snapshot'                     END AS [Transaction Isolation Level]             FROM sys.dm_exec_sessions             WHERE session_id = @@SPID";       if (connection is EntityConnection)     {         return (connection as EntityConnection).StoreConnection.GetIsolationLevel();     }     else if (connection is SqlConnection)     {         IDbCommand command = connection.CreateCommand();         command.CommandText = query;         string result = command.ExecuteScalar().ToString();           return (IsolationLevel)Enum.Parse(typeof(IsolationLevel), result);     }       return IsolationLevel.Unspecified; } Simple and powerful, isn't it? Stuff like this should ship with the framework! Temporarily using a Transaction Isolation Level With the new GetIsolationLevel() and SetIsolationLevel() methods it becomes easy to set the isolation level to execute some commands, and then reset the level to its original value. I wrapped these calls in a class implementing IDisposable so you can apply the using statement, like this: C# using (AdventureWorks2008Entities model = new AdventureWorks2008Entities()) {     model.Connection.Open();       // Set and reset isolation level     using (TransactionIsolationLevel inner = new TransactionIsolationLevel(model.Connection, IsolationLevel.Snapshot))     {         // Your stuff here ...     } } Again, the code is very straightforward. All you need is a constructor, a Dispose-method, and a variable to store the original isolation level: C# /// <summary> /// Transaction Isolation Level. /// </summary> public class TransactionIsolationLevel : IDisposable {     /// <summary>     /// The database connection.     /// </summary>     private IDbConnection connection;       /// <summary>     /// Original isolation level of the connection.     /// </summary>     private IsolationLevel originalIsolationLevel;       /// <summary>     /// Initializes a new instance of the TransactionIsolationLevel class.     /// </summary>     /// <param name="connection">Database connection.</param>     /// <param name="isolationLevel">Required isolation level.</param>     public TransactionIsolationLevel(IDbConnection connection, IsolationLevel isolationLevel)     {         this.connection = connection;         this.originalIsolationLevel = this.connection.GetIsolationLevel();         this.connection.SetIsolationLevel(isolationLevel);     }       /// <summary>     /// Resets the isolation level back to the original value.     /// </summary>     public void Dispose()     {         this.connection.SetIsolationLevel(this.originalIsolationLevel);     } }   Source Code The attached project contains the extension methods, the IDisposable class, and some demo calls against a local AdventureWorks database: Here it is: (87,53 kb) Enjoy!