U2U Blog

for developers and other creative minds


Deciding when to do a differential backup

SQL Server allows for differential backups, which only back up the extents modified since the last full backup. The advantage of these differential backups is that if only a small portion of your data has changed since the last full backup, you need less time and storage for your backups! The disadvantage, however, is that the time needed for a restore increases: we must first restore the full backup and then restore the last differential backup.
So when few extents have changed since the last full backup, the gain on the backup is huge and the pain when restoring is limited. But as more and more extents are modified, the pain grows and the gain shrinks. So to decide whether to go for a full or a differential backup, we need to know the number of extents modified since the last full backup. But on SQL Server 2016 and earlier, the only easy way to figure that out was… by taking a differential backup.

SQL Server 2017 offers a nice improvement on this. The sys.dm_db_file_space_usage dynamic management view gained an extra column, modified_extent_page_count, which provides exactly this information. So if you’re willing to stick to differential backups until they reach 70% of the size of your full database, you could use this script to get the backup advice:

SELECT total_page_count, allocated_extent_page_count, modified_extent_page_count
, modified_extent_page_count * 100 / allocated_extent_page_count AS [% changed]
, CASE WHEN modified_extent_page_count * 100 / allocated_extent_page_count > 70
    THEN 'FULL'
    ELSE 'DIFFERENTIAL'
    END AS Advice
FROM sys.dm_db_file_space_usage


So with this we can improve our maintenance plan. Next step: convince Ola to include this in his maintenance script. :-)
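As a rough sketch of how this advice could drive the actual backup command (the database name and backup path are assumptions for this example; the 70% threshold is the one used above):

-- a sketch: database name and backup path are assumptions
DECLARE @advice varchar(12);

SELECT @advice = CASE
        WHEN SUM(modified_extent_page_count) * 100 / SUM(allocated_extent_page_count) > 70
            THEN 'FULL'
        ELSE 'DIFFERENTIAL'
        END
FROM sys.dm_db_file_space_usage;

IF @advice = 'FULL'
    BACKUP DATABASE [MyDatabase] TO DISK = N'F:\Backup\MyDatabase_FULL.bak';
ELSE
    BACKUP DATABASE [MyDatabase] TO DISK = N'F:\Backup\MyDatabase_DIFF.bak' WITH DIFFERENTIAL;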

Building custom reports for SQL Server management

Recently I presented a session at the ITPROceed conference on ‘Administration intelligence’: using Business Intelligence solutions for analyzing log files, generating documentation, doing disk quota estimations and other typical administration tasks. You can find the full presentation at SlideShare.

In this blog post, I want to focus on one of the techniques explained in this talk: creating Reporting Services reports to run in Management Studio.

SQL Server Management Studio reports

SQL Server Management Studio contains a bunch of predefined reports. In the Object Explorer, right-click any database, select Reports > Standard Reports, and you get a listing of nearly 20 predefined reports.

If we select any of these reports, a new tab opens in Management Studio in which the report renders. There are a few special things to notice about these reports:

  1. They can be exported to PDF, Word and Excel format, and they can be printed.
  2. A report can span multiple pages. There is no explicit way to go to the next or previous page, but if you scroll past the end of the current page, you go to the next page.
  3. The report is context sensitive: if you open the same report by clicking on a different database, it shows information for that specific database.

Before we dive into building our own reports, I just wanted to point out that we have these predefined reports not only at the level of databases, but also at the server instance level, as well as on the Data Collection node (Management), the Integration Services Catalogs (from 2012 on) and the SQL Server Agent node. Just go ahead and try these predefined reports: They provide very useful information without the effort of building reports yourself… plus they might inspire you in building your own reports.

From report consumer to report producer

These predefined reports can make our life easier, but sometimes you want a report that differs from these predefined ones. Building a custom report is easy.

Step 1: Retrieving the data

The first (and often most difficult) step in creating any report is retrieving the right data. Luckily SQL Server provides tons of information via the many management views. Just start typing SELECT * FROM sys. and the completion tool will show you the huge list of possible completions. If you want to read more on these before you continue, check out the web; e.g. mssqltips.com has some interesting reading material for you.
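If you want a quick inventory of what is available on your instance before you start, a small query such as this one (just a sketch) lists all dynamic management views and functions by name:

SELECT name, type_desc
FROM sys.system_objects
WHERE name LIKE 'dm\_%' ESCAPE '\'
ORDER BY name;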

In this example we want to report on log file sizes: what are the biggest log files I have on any server, and how much room for growth (or shrinking) do we still have in there?

A query with which we can get started is this one:

SELECT *
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
    'Log File(s) Size (KB)'
    ,'Log File(s) Used Size (KB)'
    )
AND instance_name <> '_Total'

This returns one row per database for each of the two counters.
But in reporting it is important to present the data in a form that is as easily digestible as possible. It is easier to build reports if different measures (in our case log size and log used size) are in different columns rather than all in one single column (as they are right now). The PIVOT statement in T-SQL can help us out with that. Also, simple re-sorting and renaming might improve the readability of the result. So after a bit of a rewrite we end up with this query:
SELECT [Database Name]
    ,[Log File(s) Used Size (KB)] AS LogUsedSize
    ,[Log File(s) Size (KB)] AS LogTotalSize
FROM (
    SELECT RTRIM(instance_name) AS [Database Name]
        ,cntr_value AS [Size in Kb]
        ,RTRIM(counter_name) AS [Counter Name]
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN (
        'Log File(s) Size (KB)'
        ,'Log File(s) Used Size (KB)'
        )
    AND instance_name <> '_Total'
    ) AS src
PIVOT(SUM([Size in Kb]) FOR [Counter Name] IN (
    [Log File(s) Size (KB)]
    ,[Log File(s) Used Size (KB)]
    )) AS pvt
ORDER BY [Log File(s) Size (KB)] DESC

Just copy the query to the clipboard; we will need it in the next step.

Step 2: Building the actual report

To build a report which we can run in Management Studio, we need to have SQL Server Data Tools (SSDT) installed (the tools formerly known as Business Intelligence Development Studio, BIDS, so you can use BIDS as well if you’re not yet on 2012). This is part of the regular SQL Server setup, so you can find it on your SQL Server 2012 or later installation media. But you can also just go and download it from the Microsoft Download Center.
After downloading, installing and starting this tool (which is essentially a set of Visual Studio project templates), we instruct it to create a new project. In the list of installed templates, you should find the Business Intelligence group; select the Reporting Services group within it, click the Report Server Project Wizard on the right, and provide a useful project name.
Once you click OK, the wizard will start. Click Next to get past the start screen.
On the next screen the wizard asks which type of database connection we want, and to which server and database we want to connect. This is the tricky part, because we actually want to connect to whatever server and database we click on in the Management Studio Object Explorer. But for now, we first need to develop this report in SSDT. Luckily, there is an easy way out: if we use the Microsoft SQL Server data source type, whatever server and database we select will be replaced with the ‘current’ server and database whenever we run the report from Management Studio. So let’s just connect to a server (I’m using localhost over here, but any server to which you can connect from SSDT will do).
Notice that if you want your report to query not the current database but a fixed database (e.g. always master or msdb), you should use three-part names in your query (e.g. SELECT * FROM msdb.dbo.suspect_pages).
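For instance, a report that should always list the suspect pages recorded in msdb, no matter which database it was launched from, could use a query like this (just a sketch):

SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;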
 
The next screen asks for the query to retrieve the data for this report. Paste the query we wrote earlier in this post into this window. The next screen asks for a report type; just confirm the tabular report. Then we get a screen that asks for the report design: just drag all three fields into the Details window and click Finish. Then we hit the last wizard screen, where we give this report a meaningful name: Log Size. Click Finish.

Step 3: Fine-tuning the report

After we finish the wizard we end up in the editor. Click the Preview tab to get an impression of what the report will look like once deployed.
This report only looks slightly better than the table we got in Management Studio earlier on. We need to make a couple of improvements.
Switch back to the Design tab to widen the database name column (just drag the column header to the right).
Finally we want a graphical indication of the size of these log files. To achieve this, right-click the rightmost column header in our report and add a column to the right.
Next, from the Toolbox window, drag the Data Bar component into the bottom cell of this newly created column. In the pop-up window, just confirm the default data bar type. Finally, click the blue bar that appears in this cell. On the right another pop-up window appears. Click the green plus next to Values and click LogTotalSize.
Now preview the report again.

Step 4: Run in Management Studio

Now that we have a report that is easy to read, let’s make sure we can easily open it in Management Studio. In SSDT, in the File menu, select Save Log File.rdl As… In the save dialog, navigate to My Documents > SQL Server Management Studio > Custom Reports and save it there. Notice that this is not required: Management Studio can open reports from any file location; it’s just the default location for these files.
Finally, we can now open Management Studio, right-click any object in the Object Explorer, select Reports > Custom Reports, and select the file we just saved there.
If you can connect to multiple instances of SQL Server, try out this report on all of them, and you will see how the report takes the server context into account.
Right-click the report to print or export it… you get all the options we had on the predefined reports.
 
This is not the end of it; it’s just the start. There is a lot more to learn about building Reporting Services reports, as well as about all the useful information we can extract from SQL Server. So give it a try!
 

Download

If you just want to download the .rdl file we created in this post, you can find it at http://1drv.ms/1yQjzMD

The whole world in a SQL Server database

Some people have the whole world in their hands, but last weekend I tried something slightly less ambitious: getting (part of) the whole world into a SQL Server database.

Loads of spatial data from OpenStreetMap

I’m always looking for interesting datasets, and I was still lacking a large one with spatial data. The biggest public spatial dataset I’m aware of is OpenStreetMap (OSM). It’s a map service comparable to Bing or Google Maps, but its data is collected from public sources: some governments provide public location information, and there are also thousands of volunteers who help OSM put their street or village ‘on the map’.

The resulting database is public as well. It is distributed in XML format, and on the wiki I also found an OSM XSD schema. This allows me to load the data with SQL Server Integration Services, using the XML source in a data flow. There might be faster solutions than this, but for the time being, SSIS does a rather good job with a very simple setup.

Creating the database

So, let’s create the database in which we want to store our data. Be sure to create a database with an initial size big enough to host the data you want to load. The OSM XML file with all the data for the whole planet is about 250 GB at the time of writing, so it will probably take about 150 GB in SQL Server (XML is not the most compact storage format). In this blog post example I will be loading a subset of the whole planet, so I create a much smaller database (just under 2 GB):

CREATE DATABASE [OSM]
ON PRIMARY
( NAME = N'OSM', FILENAME = N'F:\Data\OSM.mdf', SIZE = 2048000KB, MAXSIZE = UNLIMITED, FILEGROWTH = 512000KB )
LOG ON
( NAME = N'OSM_log', FILENAME = N'F:\Data\OSM_log.ldf', SIZE = 3170880KB, MAXSIZE = 2048GB, FILEGROWTH = 10% )

If you really plan to create a much larger database, don’t forget to enable instant file initialization to speed up the process of creating and extending the data file(s).
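Instant file initialization is enabled at the Windows level by granting the SQL Server service account the ‘Perform volume maintenance tasks’ right. On recent builds (SQL Server 2016 SP1 and later) you can check from T-SQL whether it is active; a quick sketch:

-- requires SQL Server 2016 SP1 or later, where this column was added
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services;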

Next I create all the tables to store the different pieces of information from the OSM XML files. Notice that I do not keep all the information from the XML files: I do not store the person who entered the data, nor the date it was entered. This reduces the amount of data to be stored, which speeds up the data loading. This is the code to create the necessary tables:

USE [OSM]
GO
CREATE TABLE [dbo].[Nodes](
    [node_Id] [bigint] NOT NULL,
    [id] [bigint] NULL,
    [lat] [float] NULL,
    [lon] [float] NULL,
    [geog] [geography] NULL,
 CONSTRAINT [PK_Nodes] PRIMARY KEY CLUSTERED
(
    [node_Id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

CREATE TABLE [dbo].[NodeTag](
    [key] [nvarchar](250) NULL,
    [value] [nvarchar](3000) NULL,
    [node_Id] [bigint] NULL
) ON [PRIMARY];
GO
CREATE TABLE [dbo].[Relation](
    [relation_Id] [bigint] NOT NULL,
    [id] [bigint] NULL,
    [visible] [bit] NULL,
 CONSTRAINT [PK_Relation] PRIMARY KEY CLUSTERED
(
    [relation_Id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[RelationMember](
    [type] [nvarchar](8) NULL,
    [ref] [bigint] NULL,
    [role] [nvarchar](250) NULL,
    [relation_Id] [bigint] NULL,
    [RelationPosition] [int] IDENTITY(1,1) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[RelationTag](
    [key] [nvarchar](250) NULL,
    [value] [nvarchar](3000) NULL,
    [relation_Id] [bigint] NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Way](
    [way_Id] [bigint] NOT NULL,
    [id] [bigint] NULL,
    [visible] [bit] NULL,
 CONSTRAINT [PK_Way] PRIMARY KEY CLUSTERED
(
    [way_Id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[WayNode](
    [ref] [bigint] NULL,
    [way_Id] [bigint] NULL,
    [WaiNodePosition] [bigint] IDENTITY(1,1) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[WayTag](
    [key] [nvarchar](250) NULL,
    [value] [nvarchar](3000) NULL,
    [way_Id] [bigint] NULL
) ON [PRIMARY]
GO

After the destination tables are created, it’s time to download an OSM XML file. I downloaded the file for Belgium from one of the mirror servers. After unzipping, the file is about 2.4 GB. All the remaining timings in this article are based on this file, but of course, feel free to experiment with data from other countries as well.

Loading the data with SSIS

Now it’s time to create an Integration Services package that uses an XML source to read the OSM XML file, parses it with the XSD file I mentioned earlier, and writes the results to the corresponding tables. I used SQL Server destinations, but if you do, be careful to set the destination timeout high enough, or your package will fail on this. You can reuse the SSIS package I made on SQL Server 2012, but be sure to check and correct the pointers to the appropriate files.

Fingers crossed, run the SSIS package and wait till the data is loaded. On my machine it took about 17 minutes. It loaded about 12 million coordinates (Nodes), and more than 1.5 million road segments and other combined objects (Way).
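A quick sanity check after the load (just a sketch; it simply counts the rows that arrived in the main tables):

SELECT (SELECT COUNT(*) FROM dbo.Nodes) AS NodeCount
     , (SELECT COUNT(*) FROM dbo.Way) AS WayCount
     , (SELECT COUNT(*) FROM dbo.Relation) AS RelationCount;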

The only table that really contains coordinates is the Nodes table. In order to use SQL Server’s spatial functions on these, we must first create geographical objects out of the latitude and longitude info. This can simply be done with this statement:

UPDATE Nodes SET geog = geography::Point(lat, lon, 4326)

This took about 5 minutes on my machine. From now on, we can start querying this information if we want, but my Belgian OSM file contains about 12 million coordinates… without a decent index, every query is going to take forever, so let’s build a spatial index on this. In the next code example I use the new type of spatial index introduced in SQL Server 2012, but feel free to try other types of spatial indexes as well. Spatial Ed has a short ‘getting started’ paper if you want to play with spatial indexes.

CREATE SPATIAL INDEX [SpatialIndex-Nodes] ON [dbo].[Nodes]
(
    [geog]
) USING GEOGRAPHY_AUTO_GRID
WITH (CELLS_PER_OBJECT = 16) ON [PRIMARY]
GO

After this index is created, we can get started querying this interesting data source. More on that (and how to create more advanced spatial objects to represent roads, areas etc.) in a later blog post.
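As a small teaser, here is a sketch of the kind of query that becomes fast once the spatial index is in place (the coordinates are an assumption, roughly the centre of Brussels):

-- the coordinates are an assumption: roughly the centre of Brussels
DECLARE @here geography = geography::Point(50.8467, 4.3525, 4326);

-- the 100 closest nodes within one kilometre of that point
SELECT TOP (100) node_Id, lat, lon, geog.STDistance(@here) AS DistanceInMeters
FROM dbo.Nodes
WHERE geog.STDistance(@here) < 1000
ORDER BY geog.STDistance(@here);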

Happy querying!
Nico

Why is ADO.Net so slow?

That was the question I recently got from a customer: they had two systems that sent comparable queries to the same SQL Server database, and the system using ADO.Net was 'significantly slower' than the other system.

OK, the first thing to do when such a question pops up is to run the profiler to measure the difference. And indeed, we saw two queries retrieving the same data, but the one from ADO.Net was more than a factor of 20 slower than the other. Very striking was the huge number of page requests (page reads in the profiler, or using SET STATISTICS IO ON): the fast query accessed about 100 pages, the ADO.Net one read more than 15,000 pages!

The fast query is

exec sp_executesql N'SELECT * FROM dbo.adonetdemo WHERE trackingstring = ''4E0A-4F89-AE'''

The ADO.Net query is

exec sp_executesql N'SELECT * FROM dbo.adonetdemo WHERE trackingstring = @trackingid',
N'@trackingid NVARCHAR', @trackingid = '4E0A-4F89-AE'

My first guess was: wrong execution plan. Either outdated statistics or a wrong execution plan can cause a lot of unnecessary data access... So we looked at the execution plans (in Management Studio, Ctrl+M and running the query) and noticed... no immediate difference! Both queries used the same indexes in the same order: first the non-clustered index on trackingstring to resolve the WHERE clause, then a lookup in the clustered index to retrieve all the columns of the selected row (SELECT *).


The problem identified

The only indication that something weird was going on was that the cost of the row lookup was estimated at 93% of the total cost in the fast query, and only 9% of the cost in the slow ADO.Net query. Only upon closer investigation of the execution plan did we notice a small but important difference: the slow query used an implicit conversion operator, converting the trackingstring column (of type varchar) to nvarchar. In other words, it picked the smallest index that contained the varchar column and converted every value in it to nvarchar before comparing it with the nvarchar parameter we provided in the query!

The real problem was the ADO.Net code, which looked like this:


cmd.CommandText = "SELECT * FROM dbo.adonetdemo WHERE trackingstring = @tstring";
cmd.Parameters.AddWithValue("tstring","4E0A-4F89-AE");

Since strings in .Net are Unicode, string parameters are mapped onto NVARCHAR, and if the underlying column happens to be of type VARCHAR, SQL Server will convert the full column to NVARCHAR before applying the comparison operator, on every execution again! This causes a lot of unnecessary IO, even though the input string could easily have been passed as VARCHAR. In other words: dropping the N from the NVARCHAR parameter declaration improved the performance of our query by at least a factor of 20!
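You can reproduce the difference purely in T-SQL; a sketch (it reuses the dbo.adonetdemo table from the trace, and the parameter lengths are assumptions):

-- the parameter lengths are assumptions; the table is the one captured in the trace
SET STATISTICS IO ON;

-- parameter type matches the varchar column: a seek on the non-clustered index, few page reads
EXEC sp_executesql N'SELECT * FROM dbo.adonetdemo WHERE trackingstring = @trackingid',
    N'@trackingid VARCHAR(50)', @trackingid = '4E0A-4F89-AE';

-- nvarchar parameter against the varchar column: implicit conversion, many more page reads
EXEC sp_executesql N'SELECT * FROM dbo.adonetdemo WHERE trackingstring = @trackingid',
    N'@trackingid NVARCHAR(50)', @trackingid = N'4E0A-4F89-AE';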

The solution

There is nothing wrong with ADO.Net if you use it in the right way, but as the above example illustrates, it is easy to shoot yourself in the foot if you don't pay attention. For this example, there are three solutions that would have avoided the problem:

  1. Using stored procedures: if our query were embedded in a stored procedure with a VARCHAR parameter, no harm would be done (see the sketch after this list).
  2. Using typed datasets: if we had used typed datasets and/or typed table adapters, these would have queried the table metadata up front and generated the right queries. Be careful though if you later change a column type from NVARCHAR to VARCHAR without regenerating the typed TableAdapters.
  3. Setting the parameter type in .Net: the Parameter objects in a SqlCommand have a DbType property with which we can signal the appropriate type:

     cmd.CommandText = "SELECT * FROM dbo.adonetdemo WHERE trackingstring = @tstring";
     cmd.Parameters.AddWithValue("tstring","4E0A-4F89-AE");
     cmd.Parameters[0].DbType = System.Data.DbType.AnsiString;
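For the first option, a minimal sketch of such a stored procedure (the procedure name and parameter length are assumptions):

-- the procedure name and parameter length are assumptions
CREATE PROCEDURE dbo.GetByTrackingString
    @trackingstring VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;
    -- the parameter type matches the column type, so no implicit conversion occurs
    SELECT *
    FROM dbo.adonetdemo
    WHERE trackingstring = @trackingstring;
END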

 

Nico Jacobs
