In SQL Server, if you join a local table with a table on a linked server (e.g. a remote Oracle instance) you should be prepared for horrible performance. In a lot of scenarios it makes a lot more sense to tell the remote server exactly what you need, store that data in a temporary table, and join locally with it. A couples of weeks ago I used this technique to bring the response time of some queries from SQL Server to a linked Oracle instance down from 500 seconds to less than one second.
Let's say we want to enumerate the countries that use the Euro as currency, like this (in AdventureWorks2008):
SELECT c.CurrencyCode, r.Name
FROM Sales.CountryRegionCurrency c
INNER JOIN Person.CountryRegion r
ON c.CountryRegionCode = r.CountryRegionCode
WHERE c.CurrencyCode = 'EUR'
Here's the result:
Let's suppose that the names of the countries are in a separate database (e.g. an Oracle on an AS/400 box), and there's no replication in place. From SQL Server, we can get access to that source by defining a linked server. For demonstration and practical purposes -I don't have a portable AS/400 with Oracle on it- I'll create a linked server to the local SQL instance:
execute sp_addlinkedserver '.'
The distributed query will now look like this:
SELECT c.CurrencyCode, r.Name
FROM Sales.CountryRegionCurrency c
INNER JOIN [.].AdventureWorks2008.Person.CountryRegion r
ON c.CountryRegionCode = r.CountryRegionCode
WHERE c.CurrencyCode = 'EUR'
For this particular query in this particular configuration, the response time is actually still nice (it's already five times slower, but you don't really notice that). In real-life queries -and with a real remote Oracle- you'll notice a dramatic decrease in performance. For this demo configuration, you can use Sql Profiler to reveal the query that was sent to the linked server. Instead of performing a selective look-up, SQL Server selected ALL of the rows, and forced even a SORT on it:
SELECT "Tbl1003"."CountryRegionCode" "Col1011","Tbl1003"."Name" "Col1012"
FROM "AdventureWorks2008"."Person"."CountryRegion" "Tbl1003"
ORDER BY "Col1011" ASC
Here's a small part of the result for the query:
You can imagine what will happen if your lookup target is not a small local table but a large complex view. This is bad for the remote machine, the local machine and the network between the two. All of this happens because SQL Server will try to optimize its own workload, and considers the linked server as a black box (which -in the case of an AS/400- it actually ìs ).
What we should send to the linked server is a request for a -limited- number of key-value pairs, such as SELECT id, name FROM blablabla WHERE id in ('id1', 'id2', ...). We should send this query via the OPENQUERY function, so we can use the native SQL syntax of the remote DMBS. A classic way to create a short comma-separated list in T-SQL is with a variable and the COALESCE function. If the key is not numeric, then you need to wrap each value in quotes. OPENQUERY uses OLEDB under the hood, and this doesn't like double quotes. So you have to wrap each value in two single quotes that you have to wrap in single quotes during the concatenation. Oops, you're lost ? Just look at the code:
DECLARE @Countries VARCHAR(MAX)
-- Create comma-separated list of lookup values
;WITH Countries AS
(
SELECT CountryRegionCode
FROM Sales.CountryRegionCurrency
WHERE CurrencyCode = 'EUR'
)
SELECT @Countries = COALESCE(@Countries + ',', '') + '''''' + CountryRegionCode + ''''''
FROM Countries
After these calls, the @Countries variable holds a comma-separated list of country codes:
Unfortunately OPENQUERY does'nt take parameters, so we need to construct the whole query dynamically, and call it via EXECUTE. To store the result, we need to create a temporary table, because unfortunately table variables disappear from the scope with EXECUTE:
CREATE TABLE #Countries (CountryRegionCode nvarchar(3), Name nvarchar(50))
DECLARE @Query VARCHAR(MAX)
SET @Query = 'INSERT #Countries ' +
'SELECT * FROM OPENQUERY ([.], ' +
'''SELECT CountryRegionCode, Name ' +
'FROM AdventureWorks2008.Person.CountryRegion ' +
'WHERE CountryRegionCode IN (' + @Countries + ')'')'
EXECUTE (@Query)
This is the value of the @Query variable:
After these calls, the #Countries table contains the remote data (at least the fraction we're interested in):
So we now can join locally:
SELECT c.CurrencyCode, r.Name
FROM Sales.CountryRegionCurrency c
INNER JOIN #Countries r
ON c.CountryRegionCode = r.CountryRegionCode
WHERE c.CurrencyCode = 'EUR'
And while the complexity of the code dramatically increased, the response time went down equally dramatically ...
For the sake of completeness, here's the whole demo script:
/***************/
/* Preparation */
/***************/
/* Add linked server to local instance */
execute sp_addlinkedserver '.'
USE AdventureWorks2008
GO
/********/
/* Test */
/********/
-- Local query
SELECT c.CurrencyCode, r.Name
FROM Sales.CountryRegionCurrency c
INNER JOIN Person.CountryRegion r
ON c.CountryRegionCode = r.CountryRegionCode
WHERE c.CurrencyCode = 'EUR'
-- Query through linked server
SELECT c.CurrencyCode, r.Name
FROM Sales.CountryRegionCurrency c
INNER JOIN [.].AdventureWorks2008.Person.CountryRegion r
ON c.CountryRegionCode = r.CountryRegionCode
WHERE c.CurrencyCode = 'EUR'
-- The query sent to the linked server (from Sql Profiler)
SELECT "Tbl1003"."CountryRegionCode" "Col1011","Tbl1003"."Name" "Col1012"
FROM "AdventureWorks2008"."Person"."CountryRegion" "Tbl1003"
ORDER BY "Col1011" ASC
/**************/
/* Workaround */
/**************/
DECLARE @Countries VARCHAR(MAX)
-- Create comma-separated list of lookup values
;WITH Countries AS
(
SELECT CountryRegionCode
FROM Sales.CountryRegionCurrency
WHERE CurrencyCode = 'EUR'
)
SELECT @Countries = COALESCE(@Countries + ',', '') + '''''' + CountryRegionCode + ''''''
FROM Countries
-- OLE DB Drivers don't like double quotes, so we have to hexuplicate ;-))
-- Uncomment next line for testing
-- SELECT @Countries
-- Create temporary table to hold results from query to linked server
CREATE TABLE #Countries (CountryRegionCode nvarchar(3), Name nvarchar(50))
DECLARE @Query VARCHAR(MAX)
-- Build query to linked server
SET @Query = 'INSERT #Countries ' +
'SELECT * FROM OPENQUERY ([.], ' +
'''SELECT CountryRegionCode, Name ' +
'FROM AdventureWorks2008.Person.CountryRegion ' +
'WHERE CountryRegionCode IN (' + @Countries + ')'')'
-- Uncomment next line for testing
-- SELECT @Query
-- Execute query to linked server
EXECUTE (@Query)
-- Uncomment next line for testing
-- SELECT * FROM #Countries
-- Execute query entirely locally
SELECT c.CurrencyCode, r.Name
FROM Sales.CountryRegionCurrency c
INNER JOIN #Countries r
ON c.CountryRegionCode = r.CountryRegionCode
WHERE c.CurrencyCode = 'EUR'
DROP TABLE #Countries
/************/
/* Teardown */
/************/
/* Remove linked server */
execute sp_dropserver '.'