The Missing Linq to SQL Spatial

This article provides hints and hacks on how to use the SQL Server spatial data types -Geography and Geometry- in Linq to SQL. The data provider has a problem with serializing and deserializing the SQL UDT's. If you ever tried to use these data types in a Linq to SQL (or Entity Framework) project then you certainly encountered the following error: “One or more selected items contain a data type that is not supported by the designer”.

  • Setting up a test environment

I created a small database called 'Spatial' with a schema called 'Europe', and a table called 'Countries'. Here's the structure of the table:

CREATE TABLE [Europe].[Countries](

    [CountryID] [int] IDENTITY(1,1) NOT NULL,

    [CountryName] [nvarchar](50) NOT NULL,

    [Shape] [geography] NOT NULL,

 CONSTRAINT [PK_Countries] PRIMARY KEY CLUSTERED

(

    [CountryID] ASC

)


The source code that accompanies this article contains the scripts to create the assets and populate the table. Here's an indication of the table contents:

I created a Visual Studio 2008 Console application, and added a reference to Microsoft.SqlServer.Types. Then I added Linq to SQL classes (Europe.dbml). When I dropped the Europe.Counties table into the designer, the result was the expected error message:

  • Hint 1: Use views that cast the data type

I decided to create a view in the schema: vw_Countries. It exposes the same signature as the table, but with one difference: the shape column is returned as Varbinary(MAX) instead of Geography:

CREATE VIEW [Europe].[vw_Countries]

AS

SELECT CountryID

      ,CountryName

      ,CONVERT(VARBINARY(MAX), Shape) AS Shape

FROM Europe.Countries

 

The Geography UDT is physically stored as Varbinary(MAX) in SQL Server, so the table and the view are basically the same. There's one important difference though: Visual Studio is not allergic to the view. You can easily drag and drop it into the data model designer:

The Linq data provider delivers the varbinary data as System.Data.Linq.Binary, but is -strangly enough- unable to convert this back to SqlGeography. It took me a while to figure out the conversion myself, but here's a working version:

Binary belgiumb = (from c in db.vw_Countries

                   where c.CountryID == 4

                   select c.Shape).FirstOrDefault();

SqlGeography geo = new SqlGeography();

geo.Read(new System.IO.BinaryReader(new System.IO.MemoryStream(belgiumb.ToArray())));

 

  • Hint 2: Package your conversion code

We're going to get stuck with this type of conversions for a while, so it makes sense to wrap the calculations e.g. in extension methods. Here's an example:

/// <summary>

/// Converts a Linq Binary to a SQL Server Geograpy.

/// </summary>

/// <remarks>Throws an Exception if the Binary contains invalid data.</remarks>

public static SqlGeography AsGeography(this Binary binary)

{

    if (binary == null)

    {

        return SqlGeography.Null;

    }

 

    SqlGeography result = new SqlGeography();

 

    result.Read(

        new System.IO.BinaryReader(

            new System.IO.MemoryStream(

                binary.ToArray())));

 

    return result;

}

 

So now I can call it like this:

SqlGeography b = (from c in db.vw_Countries

                  where c.CountryID == 4

                  select c.Shape).FirstOrDefault().AsGeography();

Console.WriteLine(" The area of Belgium is {0} m²", b.STArea().Value);

 

  • Hack 1: Change source mappings

I wanted to execute my queries directly against the table. So I copy/pasted the view in the dbml, and modified the source mapping:

Now I can call the queries like this:

belgium =

    SqlGeography.Parse((from c in db.Countries

                        where c.CountryID == 4

                        select c.Shape.ToString()).FirstOrDefault());

 

  • Hack 2: Change data type mappings

Since I wanted IntelliSense, I copy/pasted the Country table again, and changed the .NET type of the Shape property to SqlGeography:

The good news is: I now have IntelliSense at design time:

The bad news is: I will have exceptions all over the place.

If you return a spatial data type in the select, then a query is successfully constructed and sent to SQL Server, but the return values cannot be deserialized. Here's such a query:

belgium = (from c in db.Countries2

           where c.CountryID == 4

           select c.Shape).FirstOrDefault();

 

It results in a runtime exception:

If you call a method -apart from ToString()- on a spatial data type in the where-clause, then you'll bump into a NotSupportedException already at compile time. Here's such a query:

var query = (from c in db.Countries2

             where (c.Shape.STArea() > 10).Value

             select c.CountryName).FirstOrDefault();

 

And its result at compile time:

  • Intermezzo

I tried to use Varbinary as data type for the _Shape backing variable, and SqlGeography as data type for the Shape property, and call conversions in the getter and setter. It didn't work: the getters and setters seem to be bypassed by the Linq provider.

Conclusion: hack 2 was not a good idea. We'll stop using the Countries2 entity...

  • Hint 3: Use scalar functions

Since STIntersects cannot be used, I created a scalar function that takes two Well-Known Texts, and returns whether or not the shapes intersect:

CREATE FUNCTION [Europe].[Intersects]

(

    @Shape1 NVarchar(MAX),

    @Shape2 NVarchar(MAX)

)

RETURNS integer

AS

BEGIN

 

    DECLARE @Geo1 Geography = geography::Parse(@Shape1)

    DECLARE @Geo2 Geography = geography::Parse(@Shape2)

 

    RETURN @Geo1.STIntersects(@Geo2)

 

END

 

Again, Visual Studio's designer has no problem with this, so you can drag and drop the function into the DataContext:

So you can use it in your queries:

var query = from c in db.Countries

            where db.Intersects(c.Shape.ToString(), belgium.ToString()) == 1

            select c.CountryName;

 

The response time seems to be good enough against such a small table. But I know that it is never a good idea to call functions in the where-clause of a query. It will force a table scan and an evaluation of the function for each row in the table. If you can live with the performance, then I suggest you just stick to this type of functions: they're highly reusable since they don't contain any hard-code table names.

  • Hint 4: Use table valued functions

If your geographical tables get bigger, it makes sense to define table valued functions or stored procedures to do the heavy lifting. The following example doesn't mask the usage of STIntersects in the where-clause, and may let SQL Server decide to use a spatial index (or you may even use a hint):

CREATE FUNCTION [Europe].[Intersectors]

(   

    @Shape NVarchar(MAX)

)

RETURNS TABLE

AS

RETURN

(

    SELECT CountryID, CountryName

    FROM Europe.Countries

    WHERE Shape.STIntersects(geography::Parse(@Shape)) = 1

)

 

And yes: I do realize that Intersectors probable doesn't appear in a regular dictionary...

Here's how a call to it looks like in Linq - we still have IntelliSense:

var query = from c in db.Intersectors(belgium.ToString())

             select c.CountryName;

 

I couldn't keep myself from comparing the performance of the two functions. When I checked the actual query plans,my jaw hit the ground Surprised, ouch! Completely against my expectation, the cost of the table value function was higher than the cost of the scalar function. And not just a little bit, but 5 times higher:

Anyway, you should never blindly base your conclusions on just the query plans, so I started the SQL Profiler to inspect reality. After thorough testing I was relieved Tongue out. I observed that the tabled valued function easily outruns the scalar version: it consumes between 3 and 8 times less CPU and returns the results 5 to 10 times faster. This is actually an impressive difference for such a small table. Here's the result from a test session (scripts are included in the source code):

A negative aspect of this table valued function is its reusability: the function only works against the Europe.Countries table. If you need to query more tables with spatial data, then you need to add complexity to it, or start copy/pasting.

  • Conclusions

The current Linq to Sql provider doesn't support the SQL Server spatial data types properly, so you have to be creative if you want or need to use these. It's not a waste of time to invest some effort in implementing the work arounds I suggested, and in optimizing and performance tuning these. Your solution will last for a while, since there seems to be no Microsoft solution on the horizon. The problems are not solved in SQL 2008 R2, Visual Studio 2010, or .NET 4.0.

  • Source code

Here's the source code for the test program. It also contains the necessary SQL scripts for creating and populating the entities, as well as for testing and performance tuning: U2UConsult.DockOfTheBay.LinqToSpatialSample.zip (153,76 kb)

Here's how its output should look like:

  • Credits

I would like to thank my colleague Kris Vandermotten for his input during the research.


Converting Spatial Coordinates with Proj.NET

In my previous article I expressed some disappointment in the usefulness of the Map Projections in SQL Spatial Tools on CodePlex. There's not much you can do with these in a real-life application. Fortunately there's also Proj.NET on CodePlex, a flexible advanced point-to-point coordinate conversion engine that is used internally by a lot of open source GIS projects.

Here's a small fraction of its object model:

Geographic Coordinate Systems 

The only predefined geographic coordinate system is WGS84:

ICoordinateSystem gcs_WGS84 = GeographicCoordinateSystem.WGS84;

 

But you can easily create your own, from Well-Known Text (WKT) or through the object model:

string wkt_WGS84 =

    "GEOGCS[\"GCS_WGS_1984\"," +

         "DATUM[\"D_WGS_1984\",SPHEROID[\"WGS_1984\",6378137,298.257223563]]," +

         "PRIMEM[\"Greenwich\",0]," +

         "UNIT[\"Degree\",0.0174532925199433]" +

    "]";

ICoordinateSystem gcs_WGS84 = CoordinateSystemWktReader.Parse(wkt_WGS84) as ICoordinateSystem

Projected Coordinate Systems

The only predefined projected coordinate system is UTM:

IProjectedCoordinateSystem pcs_UTM31N = ProjectedCoordinateSystem.WGS84_UTM(31, true);

 

But again, you can easily create your own, from WKT or through an object model. Here's how Lambert 2008 -a local Belgian projection- looks like:

string wkt_Lam08 =

    "PROJCS[\"ETRS89 / Belgian Lambert 2008\"," +

        "GEOGCS[\"ETRS89\"," +

            "DATUM[\"European Terrestrial Reference System 1989\"," +

                "SPHEROID[\"GRS 1980\",6378137.0,298.257222101," +

                    "AUTHORITY[\"EPSG\",\"7019\"]]," +

                "TOWGS84[0.0,0.0,0.0,0.0,0.0,0.0,0.0]," +

                "AUTHORITY[\"EPSG\",\"6258\"]]," +

            "PRIMEM[\"Greenwich\",0.0," +

                "AUTHORITY[\"EPSG\",\"8901\"]]," +

            "UNIT[\"degree\",0.017453292519943295]," +

            "AXIS[\"Geodetic latitude\",NORTH]," +

            "AXIS[\"Geodetic longitude\",EAST]," +

            "AUTHORITY[\"EPSG\",\"4258\"]]," +

        "PROJECTION[\"Lambert Conic Conformal (2SP)\"]," +

        "PARAMETER[\"central_meridian\",4.359215833333335]," +

        "PARAMETER[\"latitude_of_origin\",50.79781500000001]," +

        "PARAMETER[\"standard_parallel_1\",49.833333333333336]," +

        "PARAMETER[\"false_easting\",649328.0]," +

        "PARAMETER[\"false_northing\",665262.0]," +

        "PARAMETER[\"standard_parallel_2\",51.16666666666667]," +

        "UNIT[\"m\",1.0]," +

        "AXIS[\"Easting\",EAST]," +

        "AXIS[\"Northing\",NORTH]," +

        "AUTHORITY[\"EPSG\",\"3812\"]]";

IProjectedCoordinateSystem pcs_Lam08 =

    CoordinateSystemWktReader.Parse(wkt_Lam08) as IProjectedCoordinateSystem;

 

The list of supported projections includes Mercator, Transverse Mercator, Albers, Lambert Conformal, and Krovak.

If you download the project's source code, you'll also discover an SRIDReader class that allows you to instantiate a coordinate system from nothing more than its Spatial Reference ID.

Transformations

A coordinate conversion can be defined and called through the API as follows:

CoordinateTransformationFactory ctfac = new CoordinateTransformationFactory();

ICoordinateTransformation trans = ctfac.CreateFromCoordinateSystems(gcs_WGS84, pcs_UTM31N);

double[] fromPoint = new double[] { 4.296545, 50.880324 };  // U2U Consult Head Office, in degrees

double[] toPoint = trans.MathTransform.Transform(fromPoint);

 

If you're lucky, then you also get the inverse transformation, but it's not always implemented.

try

{

    IMathTransform inversedTransform = trans.MathTransform.Inverse();

    double[] point = inversedTransform.Transform(toPoint);

}

catch (NotImplementedException ex)

{

    // Your exception handling here...

}

 Test Client

Here are U2U Consult's Headquarter's coordinates:

I built a small test client that applies some transformations on these (WGS 84 - UTM - Lambert 1972 - Lambert 2008). Here's how the result looks like:

For the sake of completeness: here's the full source code: ProjNetClient.zip (38,93 kb)


Tuning SQL Server Lookups to a Linked Server

In SQL Server, if you join a local table with a table on a linked server (e.g. a remote Oracle instance) you should be prepared for horrible performance. In a lot of scenarios it makes a lot more sense to tell the remote server exactly what you need, store that data in a temporary table, and join locally with it. A couples of weeks ago I used this technique to bring the response time of some queries from SQL Server to a linked Oracle instance down from 500 seconds to less than one second.

Let's say we want to enumerate the countries that use the Euro as currency, like this (in AdventureWorks2008):

    SELECT c.CurrencyCode, r.Name

      FROM Sales.CountryRegionCurrency  c

INNER JOIN Person.CountryRegion r

        ON c.CountryRegionCode = r.CountryRegionCode

     WHERE c.CurrencyCode = 'EUR'

 

Here's the result:

 

Let's suppose that the names of the countries are in a separate database (e.g. an Oracle on an AS/400 box), and there's no replication in place. From SQL Server, we can get access to that source by defining a linked server. For demonstration and practical purposes -I don't have a portable AS/400 with Oracle on it- I'll create a linked server to the local SQL instance:

execute sp_addlinkedserver '.'

 

The distributed query will now look like this:

    SELECT c.CurrencyCode, r.Name

      FROM Sales.CountryRegionCurrency  c

INNER JOIN [.].AdventureWorks2008.Person.CountryRegion r

        ON c.CountryRegionCode = r.CountryRegionCode

     WHERE c.CurrencyCode = 'EUR'

 

For this particular query in this particular configuration, the response time is actually still nice (it's already five times slower, but you don't really notice that). In real-life queries -and with a real remote Oracle- you'll notice a dramatic decrease in performance. For this demo configuration, you can use Sql Profiler to reveal the query that was sent to the linked server. Instead of performing a selective look-up, SQL Server selected ALL of the rows, and forced even a SORT on it:

    SELECT "Tbl1003"."CountryRegionCode" "Col1011","Tbl1003"."Name" "Col1012"

      FROM "AdventureWorks2008"."Person"."CountryRegion" "Tbl1003"

  ORDER BY "Col1011" ASC

 

Here's a small part of the result for the query:

 

You can imagine what will happen if your lookup target is not a small local table but a large complex view. This is bad for the remote machine, the local machine and the network between the two. All of this happens because SQL Server will try to optimize its own workload, and considers the linked server as a black box (which -in the case of an AS/400- it actually ìs Wink).

What we should send to the linked server is a request for a -limited- number of key-value pairs, such as SELECT id, name FROM blablabla WHERE id in ('id1', 'id2', ...). We should send this query via the OPENQUERY function, so we can use the native SQL syntax of the remote DMBS. A classic way to create a short comma-separated list in T-SQL is with a variable and the COALESCE function. If the key is not numeric, then you need to wrap each value in quotes. OPENQUERY uses OLEDB under the hood, and this doesn't like double quotes. So you have to wrap each value in two single quotes that you have to wrap in single quotes during the concatenation. Oops, you're lost ? Just look at the code:

DECLARE @Countries VARCHAR(MAX)

 

-- Create comma-separated list of lookup values

;WITH Countries AS

(

SELECT CountryRegionCode

  FROM Sales.CountryRegionCurrency

 WHERE CurrencyCode = 'EUR'

)

SELECT @Countries = COALESCE(@Countries + ',', '') + '''''' + CountryRegionCode + ''''''

  FROM Countries

After these calls, the @Countries variable holds a comma-separated list of country codes:

 

Unfortunately OPENQUERY does'nt take parameters, so we need to construct the whole query dynamically, and call it via EXECUTE. To store the result, we need to create a temporary table, because unfortunately table variables disappear from the scope with EXECUTE:

CREATE TABLE #Countries (CountryRegionCode nvarchar(3), Name nvarchar(50))

 

DECLARE @Query VARCHAR(MAX)

 

SET @Query = 'INSERT #Countries ' +

             'SELECT * FROM OPENQUERY ([.], ' +

             '''SELECT CountryRegionCode, Name ' +

               'FROM AdventureWorks2008.Person.CountryRegion ' +

               'WHERE CountryRegionCode IN (' + @Countries + ')'')'

 

EXECUTE (@Query)

 

This is the value of the @Query variable: 

 

After these calls, the #Countries table contains the remote data (at least the fraction we're interested in):

 

So we now can join locally:

    SELECT c.CurrencyCode, r.Name

      FROM Sales.CountryRegionCurrency  c

INNER JOIN #Countries r

        ON c.CountryRegionCode = r.CountryRegionCode

     WHERE c.CurrencyCode = 'EUR'

 

And while the complexity of the code dramatically increased, the response time went down equally dramatically ...

For the sake of completeness, here's the whole demo script:

/***************/

/* Preparation */

/***************/

 

/* Add linked server to local instance */

execute sp_addlinkedserver '.'

 

USE AdventureWorks2008

GO

 

/********/

/* Test */

/********/

 

 -- Local query

    SELECT c.CurrencyCode, r.Name

      FROM Sales.CountryRegionCurrency  c

INNER JOIN Person.CountryRegion r

        ON c.CountryRegionCode = r.CountryRegionCode

     WHERE c.CurrencyCode = 'EUR'

 

 -- Query through linked server

    SELECT c.CurrencyCode, r.Name

      FROM Sales.CountryRegionCurrency  c

INNER JOIN [.].AdventureWorks2008.Person.CountryRegion r

        ON c.CountryRegionCode = r.CountryRegionCode

     WHERE c.CurrencyCode = 'EUR'

 

 -- The query sent to the linked server (from Sql Profiler)  

    SELECT "Tbl1003"."CountryRegionCode" "Col1011","Tbl1003"."Name" "Col1012"

      FROM "AdventureWorks2008"."Person"."CountryRegion" "Tbl1003"

  ORDER BY "Col1011" ASC

 

/**************/

/* Workaround */

/**************/

 

DECLARE @Countries VARCHAR(MAX)

 

-- Create comma-separated list of lookup values

;WITH Countries AS

(

SELECT CountryRegionCode

  FROM Sales.CountryRegionCurrency

 WHERE CurrencyCode = 'EUR'

)

SELECT @Countries = COALESCE(@Countries + ',', '') + '''''' + CountryRegionCode + ''''''

  FROM Countries

-- OLE DB Drivers don't like double quotes, so we have to hexuplicate ;-))

 

-- Uncomment next line for testing

-- SELECT @Countries

 

-- Create temporary table to hold results from query to linked server

CREATE TABLE #Countries (CountryRegionCode nvarchar(3), Name nvarchar(50))

 

DECLARE @Query VARCHAR(MAX)

 

-- Build query to linked server

SET @Query = 'INSERT #Countries ' +

             'SELECT * FROM OPENQUERY ([.], ' +

             '''SELECT CountryRegionCode, Name ' +

               'FROM AdventureWorks2008.Person.CountryRegion ' +

               'WHERE CountryRegionCode IN (' + @Countries + ')'')'

 

-- Uncomment next line for testing

-- SELECT @Query

 

-- Execute query to linked server

EXECUTE (@Query)

 

-- Uncomment next line for testing

-- SELECT * FROM #Countries

 

-- Execute query entirely locally

    SELECT c.CurrencyCode, r.Name

      FROM Sales.CountryRegionCurrency  c

INNER JOIN #Countries r

        ON c.CountryRegionCode = r.CountryRegionCode

     WHERE c.CurrencyCode = 'EUR'

 

DROP TABLE #Countries

 

/************/

/* Teardown */

/************/

 

/* Remove linked server */

execute sp_dropserver '.'


SQL Spatial Tools: Map Projections

SQL Server Spatial Tools on CodePlex contains useful extra functions for the SqlGeometry and SqlGeography data types, as well as a new data type for affine transformations (to scale, translate, and rotate) and a handful of Map Projections. This article describes how to use these projections and visualize the result in Windows Presentation Foundation.

All projections are instantiated from static method calls against the SqlProjection class, with one to five parameters. SQL Spatial Tools contains sample T-Sql scripts, but here's how it looks like in C# (for the inverse projection, don't forget to first assign a Spatial Reference System Identifier to the geometry):

Projection

SqlGeography shape3D = new SqlGeography();

shape3D = SqlGeography.Parse("some valid WKT"); // Or read from DB

SqlProjection proj = SqlProjection.LambertConformalConic(0, 90, 12, 36);

SqlGeometry shape2D = proj.Project(shape3D);

Inverse Projection

SqlGeometry shape2D = SqlGeometry.Parse("some valid WKT");

shape2D.STSrid = 4326; // WGS 84

SqlProjection proj = SqlProjection.AlbersEqualArea(0, 0, 15, 30);

SqlGeography shape3D = proj.Unproject(shape2D);

 

That's all there is to!

 

I added this code to an improved version of my SqlGeometry Extension Methods for WPF Visualization. Here's how the resulting application looks like, displaying a reduced shape of Belgium, and the exact location of my office @ U2U Consult: 

 

OK, I admit: projecting Belgium is not that spectacular. It will have the same shape in virtually every projection: it's a small surface on an average latitude.

So let's apply some transformations on a more representative victim such as the Tropic of Cancer. This will generally be projected as a straight line, but if you stand on the North Pole -e.g. via a Gnomonic Projection- it looks like a circle:

 And with a Transverse Mercator projection it should look like an Ellipse:

 

Unfortunately the SQL Spatial Tools code only implements the Spherical version of the Transverse Mercator projection, and not (yet ?) the Ellipsoidal version. Otherwise SQL Spatial Tools would have all the ingredients for a latitude/longitude WGS84 to UTM conversion. After all, you only need to project, then scale (to convert to meters), and finally translate (for false easting). This is a horrible calculation, but don't worry: Proj.NET on CodePlex should be able to handle this (this feels like a topic for a later articleLaughing).

Anyway, here's the full solution: U2UConsult.DockOfTheBay.SpatialProjectionsSample.zip (144,07 kb).