Creating histograms in Power BI was never easier

Power BI Desktop contains since the October 2016 release a binning option (under the group option). With this option it becomes super-easy to create a histogram. Let’s demonstrate: First assume you made a data model containing the DimProduct and the FactInternetSales table of the AdventureworksDW sample database. If we would request the sum of the order quantity grouped by ListPrice we would get numbers like these: But if we would plot these numbers in e.g. a bar chart, it becomes hard to read due to the large number of distinct list prices: This is where a histogram would come in handy. With the new release of Power BI Desktop all we have to do is right click the ListPrice column in our field well and select Group: In the Groups dialog we select the bin size. Unfortunately we cannot set the number of bins, so we still need to do a bit of math for that. In our case we want to group Listprices in 100$ groups, so we set the Group Type to Bin and the bin size to 100. Notice this only works on columns which are numerical or of type date/datetime. Other columns require the list group type. As soon as we click OK, we now get a new field in our field well: ListPrice (bins): By using this new field in our charts and reports, we get the data per bin. For a histogram chart, just plug it into a column chart: See, that was easy!

Weird localization problem in DAX editor

One of my students (thanks, Stefan!) discovered during the PowerBI course a bizarre localization issue in the DAX editor of Excel and SQL Server Data Tools, which is easy to replicate: Install Office 2013 or SQL Server data tools on a machine with a locale which uses a comma as decimal separator (I’m from Belgium, a Belgian locale will do). Then create a Power Pivot model, add a table to it, and in the Table editor create a calculated column which uses at least one parameterized function. In this function provide as an argument the constant string “,”. For instance you could use as an expression =IF(1=1;”,”). As soon as I enter this in the editor, the editor replaces the constant string “,” with “;”, but still shows in the outcome the correct value “,”. Something similar happens when I type a dot as constant string. When this is what I type in Power Pivot: This is what I get when I press enter in the expression editor: Notice how the formula in the editor uses a comma instead of a dot, but the evaluated expression (CalculatedColumn1 in my case) uses the dot I originally typed: what I see is not what I get. And to make things worse: when I edit the formula (e.g. adding an else parameter), it really starts to use the comma I never typed, and replaces in the expression the comma with a semi-column I never typed nor see in the evaluated column: So, be careful when using constant strings “.” or “,” in DAX!

Excel Power Query supports Multi-value Lookup columns in SharePoint lists

Power Query is a handy tool to load all sorts of data in Excel or PowerBI. One of the supported data types is SharePoint lists. But Power Query does more than the straight-forward loading of data in a single list. If a SharePoint list contains a lookup column referring to another list, Power Query will allow you to join these two lists together easily. Ant top of the bill: it works for multi-value lookup columns as well. Let’s walk through an example. I start creating in SharePoint a plain vanilla list with some values in there, e.g. a list of Belgian beers: Then I create a second list, e.g. colleagues, and I add a column where I store their favorite beer: I input some data into this list: Next we want to load this into Excel or PowerPivot, so we launch Excel, go to the Power Query ribbon (download, install and configure the Power Query add-on if you don’t see the ribbon) and select From other sources –> From SharePoint List In the next dialog specify the URL of your SharePoint site (not the SharePoint list URL!) and click OK. Power Query will now load a list of all SharePoint lists on this site and allow you to select the list you want to consult. We right click our list of colleagues and select edit, which opens up the Power Query editor window (in the next screenshot I moved the Title column further to the right and did not include many other columns to improve readability): We see that Power Query loaded the Favorite Beer column as a nested table. By clicking on the split arrow icon to the right of the Favorite Beer header, we can ask to either expand the table of calculate aggregated values. Let’s first try an expand, and just select the name of the favorite beer: If we click OK, we get this result: one row for each combination of a colleague with his or her favorite beer: We could also ask Power Query to aggregate the beers per colleague, in our example we count how many favorite beers each has, and what the average alcohol percentage is (our beer list contains the alcohol percentage of each beer): The result of this would be the following data set: From this point on, we can continue loading this data in Excel, or via Power Pivot into Power View, Power Map or plain Excel Pivot Tables and charts.

Loading GPX waypoints with Excel PowerQuery

Plotting data on maps is becoming popular: We can use the geometry and geography data type in SQL Server, use the Reporting Services maps, Excel PowerView allows mapping coordinates on a map, and the recently released PowerMap plug-in in Excel offers beautiful visualizations of map data. But how can we easily get coordinates into these tools? Some tools connect with the Bing lookup service: you provide an address, Bing translates this into latitude-longitude coordinates. But sometimes we want to explicitly insert coordinates into the database. For instance, imagine you collecting location data with a smart phone or GPS device. Many of these devices store information in GPX or KML format. In this article, we discuss how we can load the waypoint coordinates from a GPX file into Excel. From there we can use it in Excel Power Map or Excel Power View reports, or we can use the SQL Server Import/Export wizard to import the Excel file into a SQL Server table, from which we can then use Reporting Services on this. PowerQuery PowerQuery is a free plugin for Excel 2010 and 2013. If you haven’t done so, download it from the Microsoft download site, and run the installer. Once installed, we can start loading different types of files into it, filter and manipulate the data, and in the end stored the result in either a regular Excel sheet, or in a PowerPivot model. Since GPX files are just XML files, we can load the file that way. GPX files can store coordinates in different ways. The two most popular are as a track (a bunch of points who make up a walk or route) or as waypoints: markers or points of interest. In the first example, we will load a local gpx file extracting the trackpoints. To save you all the trouble of repeating the transformation steps, I have included the M script. To rerun this: Start Excel In the PowerQuery ribbon select Get External Data –> From Other Sources –> Blank Query Go to the View ribbon, and select Advanced Editor Paste the query you see below into the query window, and replace the path of the gpx file with the actual path on your machine Click Done In the lower right corner, select the destination: If you plan on using PowerMap or PowerView on this data, select the Load to Data Model option In the Home ribbon select Apply and Close let Source = Xml.Tables(File.Contents("c:\MyGPXFile.gpx")), #"Expand trk" = Table.ExpandTableColumn(Source, "trk", {"name", "desc", "number", "http://www.topografix.com/GPX/Private/TopoGrafix/0/1", "trkseg"}, {"trk.name", "trk.desc", "trk.number", "trk.http://www.topografix.com/GPX/Private/TopoGrafix/0/1", "trk.trkseg"}), #"Expand trk.trkseg" = Table.ExpandTableColumn(#"Expand trk", "trk.trkseg", {"trkpt"}, {"trk.trkseg.trkpt"}), #"Expand trk.trkseg.trkpt1" = Table.ExpandTableColumn(#"Expand trk.trkseg", "trk.trkseg.trkpt", {"Attribute:lat", "Attribute:lon"}, {"trk.trkseg.trkpt.Attribute:lat", "trk.trkseg.trkpt.Attribute:lon"}), RemovedOtherColumns = Table.SelectColumns(#"Expand trk.trkseg.trkpt1",{"trk.trkseg.trkpt.Attribute:lat", "trk.trkseg.trkpt.Attribute:lon"}), ChangedTypeWithLocale = Table.TransformColumnTypes(RemovedOtherColumns, {{"trk.trkseg.trkpt.Attribute:lat", type number}, {"trk.trkseg.trkpt.Attribute:lon", type number}}, "en-US")in ChangedTypeWithLocale If you want to load the waypoints instead of a track, you can repeat the same steps, except using this script: let Source = Xml.Tables(Web.Contents("http://www.topografix.com/fells_loop.gpx")), ChangedType = Table.TransformColumnTypes(Source,{{"time", type datetime}, {"Attribute:version", type number}, {"Attribute:creator", type text}}), #"Expand wpt" = Table.ExpandTableColumn(ChangedType, "wpt", {"ele", "time", "Attribute:lat", "Attribute:lon"}, {"wpt.ele", "wpt.time", "wpt.Attribute:lat", "wpt.Attribute:lon"}), RemovedOtherColumns = Table.SelectColumns(#"Expand wpt",{"wpt.ele", "wpt.time", "wpt.Attribute:lat", "wpt.Attribute:lon"}), ChangedType1 = Table.TransformColumnTypes(RemovedOtherColumns,{{"wpt.time", type datetime}}), ChangedTypeWithLocale = Table.TransformColumnTypes(ChangedType1, {{"wpt.ele", type number}, {"wpt.Attribute:lat", type number}, {"wpt.Attribute:lon", type number}}, "en-US")in ChangedTypeWithLocale Notice that this last script loads the gpx file from the internet, replace Web.Content with File.Content if you want to load from a local file. PowerMap Once we have loaded the data in an Excel PowerPivot model, its easy to get it into a PowerMap. First, if you haven’t done so, download and install PowerMap. Open the Excel workbook in which you loaded the PowerQuery. We will use in the example the waypoints we loaded with the second example. Go to the Excel Insert ribbon, and click Map –> Launch Power Map The Power Map dialogue opens (you need to be connected to the internet, since it talks to Bing Maps) and prompts for the geographical information. Select the latitude and longitude fields, and specify explicitly which field is latitude and longitude: Click Next. We now already see a map with the waypoints, but we can also plot extra information that is in the GPX file. For instance, we copied in this example also the time the waypoint was visited, and the elevation. So we can ask PowerMap to draw higher data bars for points with higher elevation, and we can put the time into the time axis: Finally, by clicking the settings icon in the upper right corner we can make the data bars lower, less opaque and more narrow: This leads now to this visualization: You can use this as a starting point to learn and experiment more with Power Map. Or use the model we made earlier on in Power View reports (from the Excel Insert ribbon as well).  I hope this gives you some inspiration to get started building your own visualizations on top of either GPS information you collect yourself, or information you find on internet or intranet. If you want to learn more on using these Excel data processing and reporting options (known as PowerBI), subscribe to one of our PowerBI classes.    

Challenges in combining multiple data sets in PowerBI

Relationship between house price and income A few days ago the media reported that Belgium was amongst one of the most expensive European countries to buy a house, if you compare the house price relative to the income. A few days ago I wrote a blog post on how we could analyze the Belgian house market using Microsoft Excel PowerBI and public data from the Belgian federal government (http://statbel.fgov.be). Inspired by the news that the house price relative to the income is an important indicator, let’s try to mimick this research, but then specifically for the Belgian house market. With PowerPivot one cannot only model data coming from a single data source, it is also easy to combine different data sources into a single model. For this blog post, I took the model made in the previous Belgian house market analysis (with house prices per city) and then looked for a dataset with income information per city. On http://statbel.fgov.be/nl/modules/publications/statistiques/arbeidsmarkt_levensomstandigheden/Fiscale_inkomens_-_per_gemeente_-_2008.jsp we can get fiscal income per Belgian city in 2008 (unfortunately I’m not aware of more recent public datasets on this). Using PowerQuery we can filter and convert the data such that we the total fiscal income, number of taxpayers and number of inhabitants per city. Combining the data sets: a risky business After we loaded this data in the model, we need to setup the proper relationships between the two data sources. Since both data sources are using the same city identifier (NIScode), this link is easily made. However, if we would directly link the table with the house prices per city (which contains a row per city per year per house type per measurement) with the table with the incomes per city (which contains a row per city per year), we can get errors. PowerPivot does not allow many-to-many relationships! In our model it is even worse. Since we currently only loaded data for 2008, we do only have one row per city in the income table. But as soon as we load data for multiple years, our model would become invalid! The proper way to link data together Even with a small data set (just 2 tables) as we have in this example, it is already important to stick to some of the core ideas in business intelligence: dimensional modeling. This boils down to splitting up data into dimensions (things you group or filter upon, typically the nouns in the business story) and facts (things you aggregate upon in your reports, typically the verbs in the business story). In our analysis we have two dimensions: Cities and years. So we must create a table with only one row per city (with all the information regarding that city: code, name, county, country), and another with one row per year. Since there is not much more to tell about a year than just its value, we can easily generate the latter table in PowerQuery: 1: let 2: Source = List.Numbers(1990,25), 3: TableFromList = Table.FromList(Source, Splitter.SplitByNothing(), {"Year"}, null, ExtraValues.Error) 4: in 5: TableFromList After these tables get created, we can now link them in PowerPivot in this way: From the perspective of linking the tables, the solution is fine. But we still run the risk that users will for instance grab the Years or NIScode column from either the HousePrice or the income table in one of their reports. This would be problematic, since this would only filter data from the table from which it was choosen, not from the other table. However, if we can force our users to select years only from the Years table and NIScodes only from the Location table, we don’t face this problem, since both our fact tables (HousePrice and Income) are related to both these dimension tables (Years and Location). This can easily be obtained by just hiding these columns in the fact tables: After we made the model, we can now start defining DAX measures which are calculated over both fact tables. A very interesting one is the average house price expressed in terms of the average income: How many years of average income in a city do people need to spend to buy an average house in that city. Why Bruges isn’t the most expensive area after all Now we can start building PowerView reports on top of this data. In this way we can for instance see that Bruges is much more expensive than Brussels, but if we express the houseprice in terms of the income of its inhabitants, Bruges becomes cheaper than Brussels: Download the Excel sheet from my OneDrive, and start building lots of interesting reports on top of this data set! Check out our Excel PowerBI course to learn more about building models and reports in Excel.

Analyze the Belgian house market… yourself

From time to time articles get published about the evolution of the Belgian house market. Interesting to learn about general trends but… sometimes you whished these analyses would be more tailored to your situation. It’s nice to know that prices for villas have stabilized last year, but does this trend also hold in my region? Sometimes we just have to do things our self… PowerBI = Self Service BI Lot’s of data is nowadays publicly available. The same holds for the Belgian house market analysis: http://statbel.fgov.be has a lot of public data sets available for analyzing different aspects of Belgium. One of them contains house sales information per village per year per house type (flat, house, villa). By diving into this data set we can find all sorts of detailed information. But its not easy to navigate this data. We want to massage the data such that it becomes easier to filter and aggregate the data. To do so we can use many tools… but why leave Excel? In 2014 Microsoft released PowerBI, which amongst others contains Excel plugins for loading, cleaning and modeling data. PowerQuery makes data loading easy With the free PowerQuery plugin, we can easily create a script which downloads the data from the statbel website, filters and transforms the data until we become a representation which is easier to filter and aggregate: If you want to give it a try, here is the M code, just copy it and paste it into a blank PowerQuery. 1: let 2: Source = Excel.Workbook(Web.Contents("http://statbel.fgov.be/nl/binaries/NL_immo_statbel_jaar_131202_%20121715_tcm325-34189.xls")), 3: #"Per gemeente1" = Source{[Name="Per gemeente"]}[Data], 4: RemovedFirstRows = Table.Skip(#"Per gemeente1",2), 5: FirstRowAsHeader = Table.PromoteHeaders(RemovedFirstRows), 6: RemovedColumns = Table.RemoveColumns(FirstRowAsHeader,{"localiteit", "jaar_1", "oppervlakteklasse", "Column6", "P10 prijs(€)", "Q25 prijs(€)", "Q50 prijs(€)", "Q75 prijs(€)", "P90 prijs(€)", "Column16", "P10 prijs(€)_6", "Q25 prijs(€)_7", "Q50 prijs(€)_8", "Q75 prijs(€)_9", "P90 prijs(€)_10", "Column26", "P10 prijs(€)_15", "Q25 prijs(€)_16", "Q50 prijs(€)_17", "Q75 prijs(€)_18", "P90 prijs(€)_19", "Column36", "P10 prijs(€/m²)", "Q25 prijs(€/m²)", "Q50 prijs(€/m²)", "Q75 prijs(€/m²)", "P90 prijs(€/m²)", "gemiddelde prijs(€)", "gemiddelde prijs(€)_5", "gemiddelde prijs(€)_14", "gemiddelde prijs(€/m²)"}), 7: RenamedColumns = Table.RenameColumns(RemovedColumns,{{"aantal transacties", "House transaction count"}, {"totale prijs(€)", "House total cost"}, {"totale oppervlakte(m²)", "House building area"}, {"aantal transacties_2", "Villa transaction count"}, {"totale prijs(€)_3", "Villa total cost"}, {"totale oppervlakte(m²)_4", "Villa building area"}, {"aantal transacties_11", "Flat transaction count"}, {"totale prijs(€)_12", "Flat total cost"}}), 8: RemovedColumns1 = Table.RemoveColumns(RenamedColumns,{"totale oppervlakte(m²)_13"}), 9: RenamedColumns1 = Table.RenameColumns(RemovedColumns1,{ {"aantal transacties_20", "Ground transaction count"}, {"totale prijs(€)_21", "Ground total cost"}, {"totale oppervlakte(m²)_22", "Ground building area"}}), 10: Unpivot = Table.UnpivotOtherColumns(RenamedColumns1,{"refnis", "jaar"},"Attribute","Value"), 11: SplitColumnDelimiter = Table.SplitColumn(Unpivot,"Attribute",Splitter.SplitTextByEachDelimiter({" "}, null, false),{"Attribute.1", "Attribute.2"}), 12: ChangedType = Table.TransformColumnTypes(SplitColumnDelimiter,{{"Attribute.1", type text}, {"Attribute.2", type text}}), 13: RenamedColumns2 = Table.RenameColumns(ChangedType,{{"Attribute.1", "HouseType"}, {"Attribute.2", "Measurement"}}), 14: ChangedType1 = Table.TransformColumnTypes(RenamedColumns2,{{"Value", type number}}), 15: TransformedColumn = Table.TransformColumns(ChangedType1,{{"Measurement", Text.Proper}}), 16: RenamedColumns3 = Table.RenameColumns(TransformedColumn,{{"jaar", "Year"}}), 17: ChangedType2 = Table.TransformColumnTypes(RenamedColumns3,{{"Year", type date}}), 18: RenamedColumns4 = Table.RenameColumns(ChangedType2,{{"refnis", "NISCode"}}) 19: in 20: RenamedColumns4 PowerPivot to model the data the way you want it Next we want to define useful calculations: Sums, averages, but also more challenging calculations, such as linking each village with the corresponding county. For this we use PowerPivot, a free addin for Excel 2010, out-of-the-box present in Excel 2013. Amongst others we define an average cost per house and an average cost per square meter calculation: PowerView to build interactive charts As soon as we have loaded and modeled our data, we can finally start building the reports we needed. The PowerView add-in in Excel 2013 allows us to quickly create interactive reports on top of our PowerPivot model. In less than two minutes we can create a report which shows the price evolution over time for the different counties, with a drill down into the actual villages, split by house type. Watch the video below to get an idea, or if you have Excel 2013, just download the Excel file and play with the report itself!   The nice thing about PowerView is that out-of-the-box it usually does what we expect it to do. For instance: when we click on a house type, the bar charts automatically show the average price for the selected type of house relative to the average price for all house types. For instance, in the county Brugge a flat is more expensive than the average price for any type of house, whereas in the other counties it is cheaper:   We can also create more advanced visualization. For instance a bubble chart plotting the average land price against the average size of the building area, and how this evolves over time. When you single click a county, you see the price evolution through time, double clicking will zoom in into the cities within that county. Do it yourself You can download the Excel sheet from my OneDrive at http://1drv.ms/1fcH2u6 and have fun with the data. Be sure to download the Excel file, since OneDrive does not allow you to interact with the data in the browser. But if you want to learn how to build PowerPivot models and load data with PowerQuery, consider attending our three day course Building PowerPivot Models in Excel. If you are also interested in building PowerView and PowerMap reports on top of the PowerPivot models, you can instead attend the course Business Intelligence with Excel. Have fun with the PowerBI self service business intelligence… the world is yours to discover!