U2U Blog

for developers and other creative minds

Deciding when to do a differential backup

SQL Server allows for differential backups, which only back up the extents modified since the last full backup. The advantage of these differential backups is that if only a small portion of your data has changed since the last full backup, you need less time and storage for your backups! The disadvantage, however, is that the time needed for a restore increases: we must first restore the full backup, then restore the last differential backup.
So when few extents have changed since the last full backup, the gain on the backup is huge, and the pain when restoring is limited. But as more and more extents get modified, the pain grows and the gain shrinks. So to decide whether we go for a full or a differential backup, we need to know the number of extents modified since the last full backup. But on SQL Server 2016 and earlier, the only easy way to figure this out was… by taking a differential backup.

SQL Server 2017 offers a nice improvement on this. In the sys.dm_db_file_space_usage dynamic management view an extra column modified_extent_page_count has been added which provides exactly this information. So if you’re willing to stick to differential backups until they reach 70% of the size of your full database, you could use this script to get the backup advice:

SELECT total_page_count, allocated_extent_page_count, modified_extent_page_count
, modified_extent_page_count * 100 / allocated_extent_page_count AS [% changed]
, CASE WHEN modified_extent_page_count * 100 / allocated_extent_page_count > 70
    THEN 'FULL'
    ELSE 'DIFFERENTIAL'
    END AS Advice
FROM sys.dm_db_file_space_usage

An example:

Capture

So with this we can improve our maintenance plan. Next step: convince Ola to include this in his maintenance scripts.
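If you want to take this one step further, the same DMV can drive the backup statement itself. Below is a minimal sketch of that idea; the database name and backup paths are placeholders, and since sys.dm_db_file_space_usage reports on the current database, you should run it in the context of the database you want to back up:

DECLARE @PctChanged int;

-- summed over all data files of the current database
SELECT @PctChanged = SUM(modified_extent_page_count) * 100 / SUM(allocated_extent_page_count)
FROM sys.dm_db_file_space_usage;

IF @PctChanged > 70
BEGIN
    BACKUP DATABASE MyDatabase TO DISK = N'X:\Backup\MyDatabase_FULL.bak';
END
ELSE
BEGIN
    BACKUP DATABASE MyDatabase TO DISK = N'X:\Backup\MyDatabase_DIFF.bak' WITH DIFFERENTIAL;
END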

Using Azure VMs as remote R workspaces in R Tools for Visual Studio

Running R on your local laptop is easy: you just download one of the R distributions (CRAN or Microsoft) and kick off RGui.exe. But if you’re a Microsoft-oriented developer or data scientist, you are probably familiar with Visual Studio. So you download and install R Tools for Visual Studio (RTVS) and you can happily run R code from within Visual Studio, link it to source control, etc.
image

Remote workspaces in R Tools for Visual Studio

But this decentralized approach can lead to issues:

  • What if the volume of data increases? Since R holds its data frames in memory, we need at least 20 GB of memory when working with e.g. a 16 GB data set. The same goes for CPU power: if the R code runs on a more powerful machine, it returns results faster.
  • What if the number of RTVS users increases? Do we copy the data to each user’s laptop? That makes it difficult to manage the different versions of these data sets and increases the risk of data breaches.

This is why RTVS also allows us to write R code in a local instance of Visual Studio but execute it on a remote R server, which also holds a central copy of the data sets. This remote server can be an on-premises server, but if the data scientists do not need permanent access to it, it could be cheaper to just spin up an Azure virtual machine.

Setup

When we click the Setup Remote R… menu in R Tools for Visual Studio, it takes us to this webpage, which explains in detail how to set up a remote machine.
image
Unfortunately this detailed description was not detailed enough for me and I bumped into a few issues. So if you got stuck as well, read on!

Create an Azure VM

Log in to the Azure portal, click the + to create a new object and create a Windows Server 2016 Datacenter virtual machine. Stick to the Resource Manager deployment model and click Create.
When configuring the VM you can choose between SSD and HDD disks. The latter are cheaper, the former are faster if you often need to load large data sets. Also pay attention when you select your region: by putting the VM in the same region as your data you can save on data transfer costs. But be aware that the cost of a VM also differs between regions. At the time of writing, the VM I used is 11% cheaper in West Europe than in North Europe.

In the next tab we must select the machine we want to create. I went for an A8m v2, which delivers 8 cores and 64 GB of RAM at a cost of about 520 euro/month if it runs 24/7.

Azure VM settings

Before I start my machine I change two settings: one essential, one optional.

The essential setting is to give the machine a DNS name, so that we can keep addressing the machine by a fixed name, even if it gets a different IP address after a reboot:
In the essentials window of the Azure virtual machine blade, click the public IP address. This opens a dialog where we can set an optional (but in our case required) DNS name. I used the name of the VM I created; I don’t know whether that’s essential, but it did the job in my case:
image

The optional setting is the auto-shutdown option on your VM, which can save you some costs by shutting down the VM when all data scientists are asleep.
image

Configure the VM

Now we can start the virtual machine, connect via remote desktop and start following the instructions in the manual:

  1. Create a self-signed certificate. Be sure to use the DNS name we created in the previous step. In my example the statement to run from PowerShell would be:
    New-SelfSignedCertificate -CertStoreLocation Cert:\LocalMachine\My -DnsName "njremoter.westeurope.cloudapp.azure.com"
  2. Grant Full Control and Read permissions on this certificate to the NETWORK SERVICE account using certlm.msc:
    image
    image
  3. Install the R service using this installer
  4. Edit the C:\Program Files\R Remote Service for Visual Studio\1.0\Microsoft.R.Host.Broker.Config.json file and point it to the DNS name we used before. Again, on my machine this would be:
    {
      "server.urls": "https://0.0.0.0:5444",
      "security": {
        "X509CertificateName": "CN=njremoter.westeurope.cloudapp.azure.com"
      }
    }
  5. Restart the virtual machine. Verify that the two R related services start up correctly:
    image
    If the services fail to start verify the log file at C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Temp
  6. Last but not least we must open up port 5444. The R service installer already takes care of that in the Windows firewall, but we still need to open up the port in the Azure firewall. In the Azure portal go to the virtual machine, select Network Interfaces and click the default network interface. In the blade of this interface click the default network security group:
    image
    Create a new inbound security rule, allowing access on TCP port 5444. In my example the firewall allows access from all IP addresses; for security reasons you had better limit yours with a CIDR block to just the IP addresses of your data scientists. A PowerShell sketch of this step follows right after this list.
    image
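If you prefer scripting this last step instead of clicking through the portal, here is a rough sketch using the AzureRM PowerShell module; the resource group name, network security group name and source address range below are assumptions you should replace with your own values:

# assumption: the NSG that Azure created with the VM is called njremoter-nsg, in resource group rtvs-rg
$nsg = Get-AzureRmNetworkSecurityGroup -ResourceGroupName "rtvs-rg" -Name "njremoter-nsg"

# allow the remote R service port (5444) from the data scientists' address range only
Add-AzureRmNetworkSecurityRuleConfig -NetworkSecurityGroup $nsg -Name "Allow-RTVS-5444" `
    -Access Allow -Protocol Tcp -Direction Inbound -Priority 1010 `
    -SourceAddressPrefix "203.0.113.0/24" -SourcePortRange "*" `
    -DestinationAddressPrefix "*" -DestinationPortRange "5444"

# push the updated rule set back to Azure
Set-AzureRmNetworkSecurityGroup -NetworkSecurityGroup $nsg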

Configuring RTVS

Our server is now configured; the last (and easiest) step is to configure R Tools for Visual Studio. On the client machines, open the R Tools Workspaces window:
image
Then click Add and configure the server. On my machine this would be:
image

Click Save, then click the blue connect arrow next to our remote connection:
image
Unless you added the VM to your domain, you will now need to provide the credentials of one of the users created on the server. You also get a prompt warning you that a self-signed certificate was used, which is less safe:
SelfSignedRTVS

And from now on we are running our R scripts on this remote machine. You can see this, among other things, by the changed prompt: *> instead of >
image

If you have problems connecting to your server, Visual Studio may claim it couldn’t ‘ping’ your server. Don’t try a regular ping, since Azure VMs don’t respond to it. Use e.g. PsPing or a similar tool to ping a specific port, in our case 5444:
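For example, with the DNS name used above, a TCP ‘ping’ of the remote R port would look like this:

psping njremoter.westeurope.cloudapp.azure.com:5444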
image

I hope this gets you up to speed with remote R workspaces. If you still experience issues you can always check the GitHub repository for further help.

Creating histograms in Power BI was never easier

Since the October 2016 release, Power BI Desktop contains a binning option (under the Group option). With this option it becomes super easy to create a histogram. Let’s demonstrate:

First assume you made a data model containing the DimProduct and FactInternetSales tables of the AdventureWorksDW sample database. If we request the sum of the order quantity grouped by ListPrice, we get numbers like these:
image

But if we plot these numbers in e.g. a bar chart, it becomes hard to read due to the large number of distinct list prices:

image

This is where a histogram comes in handy. With the new release of Power BI Desktop, all we have to do is right-click the ListPrice column in our field well and select Group:

image

In the Groups dialog we select the bin size. Unfortunately we cannot set the number of bins, so we still need to do a bit of math for that. In our case we want to group list prices into $100 buckets, so we set the Group Type to Bin and the bin size to 100. Notice this only works on columns that are numerical or of type date/datetime; other columns require the List group type.
image

As soon as we click OK, we now get a new field in our field well: ListPrice (bins):

image

By using this new field in our charts and reports, we get the data per bin. For a histogram chart, just plug it into a column chart:
image

See, that was easy!
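As an aside: if you are on an older Power BI Desktop release, or you want the bin boundaries under your own control, you can build a similar bin field yourself with a DAX calculated column. A minimal sketch, assuming the table and column names from this example (add it as a calculated column on DimProduct):

ListPrice (manual bins) = FLOOR ( DimProduct[ListPrice], 100 )

Drop this column on the axis of a column chart and you get the same kind of histogram as with the built-in binning.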

Weird localization problem in DAX editor

One of my students (thanks, Stefan!) discovered during the Power BI course a bizarre localization issue in the DAX editor of Excel and SQL Server Data Tools, which is easy to replicate:

Install Office 2013 or SQL Server Data Tools on a machine with a locale that uses a comma as decimal separator (I’m from Belgium; a Belgian locale will do).

Then create a Power Pivot model, add a table to it, and in the Table editor create a calculated column which uses at least one parameterized function. In this function provide as an argument the constant string “,”. For instance you could use as an expression =IF(1=1;”,”).

As soon as I enter this in the editor, the editor replaces the constant string “,” with “;”, but it still shows the correct value “,” in the result.
Something similar happens when I type a dot as constant string.
When this is what I type in Power Pivot:

image

This is what I get when I press enter in the expression editor:

image

Notice how the formula in the editor uses a comma instead of a dot, but the evaluated expression (CalculatedColumn1 in my case) uses the dot I originally typed: what I see is not what I get. And to make things worse: when I edit the formula (e.g. adding an else argument), it really starts to use the comma I never typed, and replaces the comma in the expression with a semicolon I never typed nor see in the evaluated column:
image

So, be careful when using constant strings “.” or “,” in DAX!

Excel Power Query supports Multi-value Lookup columns in SharePoint lists

Power Query is a handy tool to load all sorts of data into Excel or Power BI. One of the supported data sources is SharePoint lists. But Power Query does more than the straightforward loading of data from a single list. If a SharePoint list contains a lookup column referring to another list, Power Query allows you to join these two lists together easily. And the icing on the cake: it works for multi-value lookup columns as well.

Let’s walk through an example. I start by creating a plain vanilla list in SharePoint with some values in it, e.g. a list of Belgian beers:
image

Then I create a second list, e.g. colleagues, and I add a column where I store their favorite beer:
image

I input some data into this list:
image

Next we want to load this into Excel or Power Pivot, so we launch Excel, go to the Power Query ribbon (download, install and configure the Power Query add-in if you don’t see the ribbon) and select From Other Sources –> From SharePoint List

image

In the next dialog specify the URL of your SharePoint site (not the SharePoint list URL!) and click OK. Power Query now loads a list of all SharePoint lists on this site and lets you select the list you want to consult. We right-click our list of colleagues and select Edit, which opens the Power Query editor window (in the next screenshot I moved the Title column further to the right and left out many other columns to improve readability):
image

We see that Power Query loaded the Favorite Beer column as a nested table. By clicking the split arrow icon to the right of the Favorite Beer header, we can choose to either expand the table or calculate aggregated values. Let’s first try an expand, and just select the name of the favorite beer:
image

If we click OK, we get this result: one row for each combination of a colleague with his or her favorite beer:
image

We could also ask Power Query to aggregate the beers per colleague; in our example we count how many favorite beers each colleague has, and what the average alcohol percentage is (our beer list contains the alcohol percentage of each beer):
image

The result of this would be the following data set:
image
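Behind the scenes, Power Query records all of these clicks as an M query. Purely as an illustration, a hand-written sketch of roughly what such a query could look like is shown below; the site URL, list title and column names are the ones from this example, and the navigation step that Power Query actually generates for you may look different (it may, for instance, select the list by Id):

let
    // connect to the SharePoint site (not the list URL) and pick the Colleagues list
    Source = SharePoint.Tables("http://yourserver/sites/demo"),
    Colleagues = Source{[Title="Colleagues"]}[Items],
    // expand the nested Favorite Beer table, keeping only the beer name
    Expanded = Table.ExpandTableColumn(Colleagues, "Favorite Beer", {"Title"}, {"Favorite Beer.Title"})
in
    Expanded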

From this point on, we can continue loading this data in Excel, or via Power Pivot into Power View, Power Map or plain Excel Pivot Tables and charts.

Loading data in Azure Machine Learning

In July 2014 Microsoft made their cloud-based data mining environment (known as Azure Machine Learning, or AzureML) available to the public. With this platform users can analyze large amounts of data without the need to install and configure special software: a browser and a credit card are all you need. With the increasing number of people in number-crunching jobs (data scientists), it is nice to see Microsoft focusing on this.

In a previous blog post (see http://blogs.u2u.be/u2u/post/2014/07/14/First-Steps-in-Azure-Machine-Learning.aspx) I showed how to get started with setting up an AzureML environment. In this blog post we take a look at loading data into AzureML.

Supported data formats

Currently AzureML is focusing on the most common formats used in the world of machine learning:

  • Text files containing comma separated values (CSV), tab-separated values (TSV), the Attribute-Relation File Format (ARFF) which was introduced by the open-source Weka machine learning framework, RData files or the SVMLight format.
  • Database tables: Hive tables (Hadoop), Azure Tables and SQL Databases in Azure

Since AzureML runs in the Azure cloud, all your data must be in the cloud as well. Either you already uploaded your data to Azure (e.g. your data is stored in an Azure SQL Database) or you will upload it explicitly for this project. In both cases be careful to store your data in the same region as where you’re running AzureML: during the preview period, AzureML only runs from the South Central US data center. If you store your data in another data center it will be slower and more expensive to run your experiments.

Let’s first consider the scenario where you upload your data from a local file directly into AzureML (Uploading a DataSet), then we cover the scenario where your data is already somewhere in Azure (Reading data).

Uploading a DataSet

A lot of sample machine learning data sets are already available out of the box in AzureML. But after some experimenting with public data, you probably want to play with your own data. If you don’t have your data anywhere on Azure yet, you can upload it as a new dataset in AzureML directly. But before we start adding data sets, first a warning: in the current preview we cannot delete uploaded datasets. We can overwrite an existing data set with new data, but if you create 1001 data sets, they will be in the list forever (that is: until Microsoft fixes this limitation). Because of this, if your dataset is not yet fixed, consider uploading the data file(s) into a custom Azure blob store and loading them with the Reader from within your experiment.

To add a new dataset, click the +New button at the bottom left of the ML Studio screen, and select DataSet –> From local file. In the next dialog box, we can pick the file to upload, provide a name (choose well, it cannot be altered later on), select the type of data in the file and provide an optional description:
image

If you select the checkbox, you select an existing dataset whose content will be overwritten by the file you upload. It is impossible to delete or rename a dataset, but you can always upload an empty file ‘as a new version’ of a large data set to truncate it.

If we now want to use this data, we create a new experiment by clicking the +New button. In this new experiment we will find our uploaded dataset in the list under Saved Datasets. Just drag it onto the design surface.

image

Also remember the search box at the top: by typing part of an object name (and a data set is one of the many objects we have in AzureML) we get a filtered list which makes it easier to find an object.

Now that we have our data in AzureML we can start interacting with it, such as simply visualizing our data: click in the circle under the data set and select Visualize:
image

This will open up the overview screen, showing basic statistical information on each data field:

image

Reading data

Another way to get data into an AzureML experiment is by first uploading your data into an Azure SQL Database or a Hadoop cluster (such as HDInsight), or by uploading files with data (the same data types as in the previous paragraph) into an Azure blob store.

In this case you do not need to create a data set, but you can immediately create a new experiment.

In this experiment, locate the Reader under Data Input and Output and drag it into the experiment.
image
When we click the Reader, we get on the right-hand side all the configurable properties of this Reader. The most important property is the data source type; this one determines which other properties are needed. Select the location where your data can be found and configure the other properties appropriately:
image

When we now run the experiment, we can visualize the data from this Reader, just as we could with an uploaded data set. But we have an extra option: by clicking Save as dataset, we can permanently store this data in AzureML. This speeds up the runtime of an experiment, but it increases the storage cost (we store another redundant copy of the data).
image

In a next blog post, I will discuss data preprocessing.

First Steps in Azure Machine Learning

Today Microsoft announced the availability of machine learning (data mining) in Azure. As you might expect, you need an Azure account to get started with this, but there are free trial accounts… you can try before you buy.

To get started with the machine learning preview, go to http://manage.windowsazure.com and log in with your Azure account. In the list of options, close to the bottom, you will find Machine Learning:
image

Click the Create an ML workspace link. Currently only a Quick Create option is available. Invent a unique workspace name. The workspace owner must be a valid LiveID account. Location is easy: machine learning is currently only available in South Central US. I guess I, as a European, will just need to be a little more patient.
If you already have a South Central US storage account you can reuse it, but I put all my storage accounts in Europe, so I now need to create one on US soil. Those who need to keep their data within Europe for legal reasons will need to wait, because I assume Microsoft will make this service available in Europe later on as well.

My final configuration looks like this:
image

Now is the time to start reading the tutorial at http://azure.microsoft.com/en-us/documentation/articles/machine-learning-create-experiment/ while Azure is creating your Machine Learning workspace.

Once the workspace is created we can click the right arrow next to it.
image
Then click the Dashboard link at the top, and next click Sign-in to ML Studio under quick glance:
image

And now we arrive in the ML Studio:
image

At this point you can get started following the tutorials (http://azure.microsoft.com/en-us/documentation/articles/machine-learning-create-experiment/), play with the sample data, or build experiments from scratch with your own data. Have fun!

Building custom reports for SQL Server management

Recently I presented a session at the ITPROceed conference on ‘Administration intelligence’: using Business Intelligence solutions for analyzing log files, generating documentation, doing disk quota estimations and other typical administration tasks. You can find the full presentation on SlideShare.

In this blog post, I want to focus on one of the techniques explained in this talk: creating Reporting Services reports to run in Management Studio.

SQL Server Management Studio reports

SQL Server Management Studio contains a bunch of predefined reports. In the Object Explorer, right click any database, select Reports –> Standard Reports and you get a listing of nearly 20 predefined reports:
image

 

If we select any of these reports, a new tab opens in Management Studio in which the report renders. There are a few special things to notice about these reports:

  1. They can be exported in PDF, Word and Excel format, as well as being printed
  2. It can be a multi-page report. There is no explicit way to go to the next or previous page, but if you scroll past the end of the current page, you go to the next page.
  3. The report is context sensitive: if you open the same report by clicking on a different database, it shows information for that specific database.

Before we dive into building our own reports, I just wanted to point out that we have these predefined reports not only at the level of databases, but also at the server instance level, as well as on the Data Collection node (Management), the Integration Services Catalogs (from 2012 on) and the SQL Server Agent node. Just go ahead and try these predefined reports: They provide very useful information without the effort of building reports yourself… plus they might inspire you in building your own reports.

From report consumer to report producer

These predefined reports can make our life easier, but sometimes you want a report different from these predefined ones. Building a custom report is easy.

Step 1: Retrieving the data

The first (and often most difficult) step in creating any report is retrieving the right data. Luckily SQL Server provides tons of information via its many dynamic management views. Just start typing SELECT * FROM sys. and the completion tool will show you the huge list of possible completions. If you want to read more on these before you continue, check out the web; e.g. mssqltips.com has some interesting reading material for you.
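By the way, if you prefer a query over IntelliSense to explore what is available, this little catalog query lists all dynamic management views and functions on your instance (it is not part of the report we are about to build, just a convenient starting point):

SELECT name, type_desc
FROM sys.system_objects
WHERE name LIKE 'dm[_]%'   -- all dynamic management views and functions start with dm_
ORDER BY name;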

In this example we want to report on log file sizes: what are the biggest log files I have on any server, and how much room for growth (or shrinking) do we still have in there?

A query with which we can get started is this one:

SELECT *
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
      'Log File(s) Size (KB)'
    , 'Log File(s) Used Size (KB)'
    )
  AND instance_name <> '_Total'

This returns a table such as this:
image
 
But in reporting it is important to present the data as easily digestible as possible. It is easier to build reports if different measures (in our case log size and log used size) are in different columns rather than all in one single column (as they are right now). The PIVOT statement in T-SQL can help us out with that. Also, simple resorting and renaming improves the readability of the result. So after a bit of a rewrite we end up with this query:
SELECT [Database Name]
     , [Log File(s) Used Size (KB)] AS LogUsedSize
     , [Log File(s) Size (KB)] AS LogTotalSize
FROM (
    SELECT RTRIM(instance_name) AS [Database Name]
         , cntr_value AS [Size in Kb]
         , RTRIM(counter_name) AS [Counter Name]
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN (
          'Log File(s) Size (KB)'
        , 'Log File(s) Used Size (KB)'
        )
      AND instance_name <> '_Total'
) AS src
PIVOT (
    SUM([Size in Kb]) FOR [Counter Name] IN (
          [Log File(s) Size (KB)]
        , [Log File(s) Used Size (KB)]
    )
) AS pvt
ORDER BY [Log File(s) Size (KB)] DESC
image
 
Just copy the query to the clipboard; we will need it in the next step.

Step 2: Building the actual report

To build a report which we can run in Management Studio, we need to have SQL Server Data Tools (SSDT) installed (the tools formerly known as Business Intelligence Development Studio, BIDS, so you can use BIDS as well if you’re not yet on 2012). This is part of the regular SQL Server setup, so you can find it on your SQL Server 2012 or later installation media. But you can also just go and download it from the Microsoft Download Center.
After downloading, installing and starting this tool (which is just a Visual Studio template), we instruct it to create a new project. In the list of installed templates you should find the Business Intelligence group; select the Reporting Services group within it, click the Report Server Project Wizard on the right and provide a useful project name:
image
 
Once you click OK, the wizard will start. Click Next to get past the start screen.
On the next screen the wizard asks which type of database connection we want to have, and to which server and database we want to connect. This is the tricky part, because we actually want to connect to whatever server and database we click upon in the Management Studio Object Explorer. But first we need to develop this report in SSDT. Luckily, there is an easy way out: if we use the Microsoft SQL Server data source type, whatever server and database we select will be replaced with the ‘current’ server and database whenever we run this from Management Studio. So let’s just connect to a server (I’m using localhost over here, but any server to which you can connect from SSDT will do):
image 
 
Notice that if you want your report to query not the current database but a fixed database (e.g. always master or msdb), you should use three-part names in your query (e.g. SELECT * FROM msdb.dbo.suspect_pages).
 
The next screen asks for the query to retrieve the data for this report. Paste the query we wrote earlier in this blog into this window. The next screen asks for a report type; just confirm the tabular report. Then we have a screen that asks for the report design. Just drag all three fields into the Details window and click Finish. Then we hit the last wizard screen, where we give this report a meaningful name: Log Size. Click Finish.

Step 3: fine-tuning the report

After we finish the wizard we end up in the editor. Click the Preview tab to get an impression of what the report will look like once deployed:
image
 
This report only looks slightly better than the table we got in Management Studio earlier on. We need to make a couple of improvements.
Switch back to the Design tab to widen the database name column (just drag the column header to the right).
Finally we want to have a graphical indication of the size of these log files. To achieve this, right-click the rightmost column header in our report and add a column to the right:
image
 
Next, from the Toolbox window, drag the Data Bar component into the bottom cell of this newly created column. In the pop-up window, just confirm the default data bar type. Finally, click the blue bar that appears in this cell. On the right another pop-up window appears. Click the green plus next to Values and click LogTotalSize:
image
 
Now preview the report again:
image
 

Step 4: Run in Management Studio

Now that we have a report which is easy to read, let’s make sure we can easily open this report in Management Studio. In SSDT, in the File menu, select Save Log Size.rdl As…. In the save dialog, navigate to My Documents > SQL Server Management Studio > Custom Reports and save it there. Notice that this is not required; Management Studio can open reports from any file location, it’s just the default location for these files.
Finally, we can now open up Management Studio, right click any object in the object explorer, select Reports > Custom Reports, and select the file we just saved there.
image
 
If you can connect to multiple instances of SQL Server, try out this report on all of them, and you will see how the report takes the server context into account.
image
 
Right-click to print or export the report… you have all the options we had on the predefined reports.
 
This is not the end of it, it’s just the start. There is a lot more to learn about building Reporting Services reports, as well as about all the useful information we can extract from SQL Server. So, give it a try!
 

Download

If you just want to download the .rdl file we created in this post, you can find it at http://1drv.ms/1yQjzMD

Loading GPX waypoints with Excel PowerQuery

Plotting data on maps is becoming popular: we can use the geometry and geography data types in SQL Server, use the Reporting Services maps, Excel Power View allows plotting coordinates on a map, and the recently released Power Map plug-in for Excel offers beautiful visualizations of map data.

image

But how can we easily get coordinates into these tools? Some tools connect with the Bing lookup service: you provide an address, and Bing translates this into latitude-longitude coordinates. But sometimes we want to insert coordinates into the database explicitly. For instance, imagine you collect location data with a smartphone or GPS device. Many of these devices store information in GPX or KML format. In this article we discuss how to load the waypoint coordinates from a GPX file into Excel. From there we can use them in Excel Power Map or Excel Power View reports, or we can use the SQL Server Import/Export wizard to import the Excel file into a SQL Server table, on which we can then use Reporting Services.
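As a side note on that last scenario: once the latitude and longitude columns sit in a SQL Server table, turning them into the geography type mentioned above is a one-liner. A minimal sketch, assuming the Import/Export wizard created a table dbo.Waypoints with float columns lat and lon:

SELECT w.lat
     , w.lon
     , geography::Point(w.lat, w.lon, 4326) AS GeoLocation  -- 4326 = WGS 84, the reference system used by GPS
FROM dbo.Waypoints AS w;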

PowerQuery

Power Query is a free plug-in for Excel 2010 and 2013. If you haven’t done so, download it from the Microsoft download site and run the installer. Once installed, we can start loading different types of files into it, filter and manipulate the data, and in the end store the result in either a regular Excel sheet or a Power Pivot model. Since GPX files are just XML files, we can load the file that way. GPX files can store coordinates in different ways. The two most popular are as a track (a bunch of points that make up a walk or route) or as waypoints: markers or points of interest. In the first example we will load a local GPX file and extract the track points. To save you the trouble of repeating the transformation steps, I have included the M script. To rerun this:

  1. Start Excel
  2. In the PowerQuery ribbon select Get External Data –> From Other Sources –> Blank Query
  3. Go to the View ribbon, and select Advanced Editor
  4. Paste the query you see below into the query window, and replace the path of the gpx file with the actual path on your machine
  5. Click Done
  6. In the lower right corner, select the destination: If you plan on using PowerMap or PowerView on this data, select the Load to Data Model option
  7. In the Home ribbon select Apply and Close
let
    Source = Xml.Tables(File.Contents("c:\MyGPXFile.gpx")),
    #"Expand trk" = Table.ExpandTableColumn(Source, "trk", {"name", "desc", "number", "http://www.topografix.com/GPX/Private/TopoGrafix/0/1", "trkseg"}, {"trk.name", "trk.desc", "trk.number", "trk.http://www.topografix.com/GPX/Private/TopoGrafix/0/1", "trk.trkseg"}),
    #"Expand trk.trkseg" = Table.ExpandTableColumn(#"Expand trk", "trk.trkseg", {"trkpt"}, {"trk.trkseg.trkpt"}),
    #"Expand trk.trkseg.trkpt1" = Table.ExpandTableColumn(#"Expand trk.trkseg", "trk.trkseg.trkpt", {"Attribute:lat", "Attribute:lon"}, {"trk.trkseg.trkpt.Attribute:lat", "trk.trkseg.trkpt.Attribute:lon"}),
    RemovedOtherColumns = Table.SelectColumns(#"Expand trk.trkseg.trkpt1",{"trk.trkseg.trkpt.Attribute:lat", "trk.trkseg.trkpt.Attribute:lon"}),
    ChangedTypeWithLocale = Table.TransformColumnTypes(RemovedOtherColumns, {{"trk.trkseg.trkpt.Attribute:lat", type number}, {"trk.trkseg.trkpt.Attribute:lon", type number}}, "en-US")
in
    ChangedTypeWithLocale

If you want to load the waypoints instead of a track, you can repeat the same steps, except using this script:
let
    Source = Xml.Tables(Web.Contents("http://www.topografix.com/fells_loop.gpx")),
    ChangedType = Table.TransformColumnTypes(Source,{{"time", type datetime}, {"Attribute:version", type number}, {"Attribute:creator", type text}}),
    #"Expand wpt" = Table.ExpandTableColumn(ChangedType, "wpt", {"ele", "time", "Attribute:lat", "Attribute:lon"}, {"wpt.ele", "wpt.time", "wpt.Attribute:lat", "wpt.Attribute:lon"}),
    RemovedOtherColumns = Table.SelectColumns(#"Expand wpt",{"wpt.ele", "wpt.time", "wpt.Attribute:lat", "wpt.Attribute:lon"}),
    ChangedType1 = Table.TransformColumnTypes(RemovedOtherColumns,{{"wpt.time", type datetime}}),
    ChangedTypeWithLocale = Table.TransformColumnTypes(ChangedType1, {{"wpt.ele", type number}, {"wpt.Attribute:lat", type number}, {"wpt.Attribute:lon", type number}}, "en-US")
in
    ChangedTypeWithLocale

Notice that this last script loads the GPX file from the internet; replace Web.Contents with File.Contents if you want to load from a local file.

PowerMap

Once we have loaded the data into an Excel Power Pivot model, it’s easy to get it into Power Map. First, if you haven’t done so, download and install Power Map. Open the Excel workbook in which you loaded the Power Query. In this example we will use the waypoints we loaded with the second script.

  1. Go to the Excel Insert ribbon, and click Map –> Launch Power Map
  2. The Power Map dialog opens (you need to be connected to the internet, since it talks to Bing Maps) and prompts for the geographical information. Select the latitude and longitude fields, and specify explicitly which field is latitude and which is longitude:
    image
  3. Click Next. We now already see a map with the waypoints, but we can also plot extra information from the GPX file. For instance, in this example we also copied the time the waypoint was visited and the elevation. So we can ask Power Map to draw higher data bars for points with higher elevation, and we can put the time on the time axis:
    image
  4. Finally, by clicking the settings icon in the upper right corner, we can make the data bars lower, less opaque and narrower:
    image
  5. This leads now to this visualization:
    image
    You can use this as a starting point to learn and experiment more with Power Map. Or use the model we made earlier on in Power View reports (from the Excel Insert ribbon as well). 

I hope this gives you some inspiration to get started building your own visualizations on top of either GPS information you collect yourself, or information you find on the internet or an intranet. If you want to learn more about using these Excel data processing and reporting options (known as Power BI), subscribe to one of our Power BI classes.