U2U Blog

for developers and other creative minds

Validation sets in SQL Server Data Mining

What are validation sets?

Data Mining

Data mining analyses historical data to find patterns that help us better understand how our business works, or help predict how it might evolve in the future. In ‘traditional BI’ we pick some attributes and ask for aggregated data (“show me the sum of sales amount by country per fiscal quarter”). In data mining we instead ask questions such as “what is typical for customers who buy bikes?”, and we get answers (models, as we call them) that contain patterns such as “if the customer is younger than 29 and lives in the Netherlands, they are more likely to buy a bike”. This, however, raises a problem: how do we know whether one model is ‘better’ than another? Is the model “Young people are more frequent bike buyers” better than “People who do not own a car are bike buyers”?

Test set

The typical approach to testing the quality of models is to see how well they behave when we use them to predict the outcome (e.g. whether a customer buys a bike or not) on historical data, for which we already know the outcome. Models whose predicted outcome more frequently corresponds with the actual outcome are better models. However, we need to be careful: if we test on the same data we used to create the models, we run the risk of overfitting. Overfitting means the model is so tuned to the training set that its patterns are not general enough to be useful on new data. E.g. the model “If the customer name is Ben Carlson, Margareta Wuyts, … or Jeremy Frank then it is a bike buyer” might make perfect predictions on your historical data, but it is clear that it will be of little help in making predictions on new customers: it is heavily overfitted. This is why we split the historical data into two sets: training data, in which the system searches for patterns, and test data, which we use to test the quality of the model. This is even built into the SQL Server Analysis Services wizard for constructing mining models: by default it proposes to keep 30% of the data separate for testing.

Validation set

But test data sets raise an issue as well: we often need to test a lot of different mining models with different parameter settings to find a near-optimal result. This is an iterative process, in which we create a few models, test them on the test set, see which data mining techniques and parameters work best, use that knowledge to set up a second iteration of models to be tested, and so on. In this way, however, the data mining developer is introducing knowledge from the test set into the development process: if age happens to be a strong indicator in our test set, then we will favor models which use it. The overall result is that the quality measured on the test set is no longer a good estimate of the expected quality of the predictions on new data. The models are already slightly biased towards our test set, so the test results typically overestimate their predictive quality.

This is where validation sets come in: before we get started with any data mining in the first place, we should set some of our historical data (e.g. 20% of the data) apart in a validation set. The remaining 80% is then split into training and test data. Once we’re finished with our data mining, we test our model one last time, on data it has never seen, neither as training data nor as test data. Our validation set is (from the data mining point of view) truly new data, and is likely to give the best impression of the expected predictive quality of our mining model.

How do we create validation sets?

In contrast to test data sets, the mining wizard does not allow us to set apart a validation set, so we need to do this in the data preparation phase (see the CRISP-DM methodology for more info on the different phases in the data mining process). If you prefer to prepare your data with T-SQL statements, you can use an approach based on NEWID() to randomly select a subset of the data, but be careful: if you rerun the statement, a different subset will be selected.

Another approach is to use SSIS (Integration Services), whose Percentage Sampling transformation is ideal for this job: it assigns each row an n% likelihood of being selected, so it doesn’t need to cache all rows in memory (in contrast to the Row Sampling transformation). An advantage over the NEWID() approach is that we can set the seed of the random number generator, so that results are reproducible if we want.
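To make the idea of seeded percentage sampling concrete, here is a minimal C# sketch. It is not how SSIS implements the transformation; it only mimics the principle described above (each row gets the same chance of being selected, and a fixed seed makes the split repeatable). The class name ValidationSplit, the 20% fraction, the seed value and the list of customer ids are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ValidationSplit
{
    // Gives every row the same chance of landing in the validation set,
    // so rows can be handled one by one without caching the whole input.
    public static (List<T> Validation, List<T> Remaining) Split<T>(
        IEnumerable<T> rows, double validationFraction, int seed)
    {
        var random = new Random(seed);   // fixed seed: same split on every run
        var validation = new List<T>();
        var remaining = new List<T>();

        foreach (var row in rows)
        {
            if (random.NextDouble() < validationFraction)
                validation.Add(row);
            else
                remaining.Add(row);
        }

        return (validation, remaining);
    }

    public static void Main()
    {
        // Stand-in for the historical rows (hypothetical customer ids).
        var customerIds = Enumerable.Range(1, 10000);

        var (validation, remaining) = Split(customerIds, 0.20, seed: 42);

        Console.WriteLine($"validation: {validation.Count}, training + test: {remaining.Count}");
    }
}
```

Because every row is drawn independently, the validation set ends up at roughly 20% of the rows rather than exactly 20%. The remaining rows are what you would hand to the mining wizard, which then takes its own test set out of them.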

How do we use validation sets?

Using validation sets is easy. Just make sure that the table you created with the validation data is in the same data source as the data source you used for the SSAS project. Then in the Mining Accuracy Chart tab of the Mining model in SSAS, you select just the best performing model(s) and below you choose the radio button to use a different data set.

Click the ellipsis button (…) and select the table or view which contains the validation set. Join the proper columns from the validation set with the mining model, and you’re set! Now you can create lift or profit charts and build a classification matrix against the validation set.
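For readers who have not met a classification matrix before, the following C# sketch shows what such a matrix counts. It is independent of SSAS: the class name ClassificationMatrix and the five validation cases are made up for illustration, and in practice the Mining Accuracy Chart tab builds the matrix for you.

```csharp
using System;
using System.Collections.Generic;

public static class ClassificationMatrix
{
    // Counts how often each (actual, predicted) combination occurs.
    public static Dictionary<(bool Actual, bool Predicted), int> Build(
        IEnumerable<(bool Actual, bool Predicted)> cases)
    {
        var matrix = new Dictionary<(bool Actual, bool Predicted), int>
        {
            [(true, true)] = 0,
            [(true, false)] = 0,
            [(false, true)] = 0,
            [(false, false)] = 0
        };

        foreach (var c in cases)
            matrix[c]++;

        return matrix;
    }

    public static void Main()
    {
        // Made-up validation cases: did the customer buy a bike,
        // and what did the model predict?
        var cases = new[]
        {
            (Actual: true,  Predicted: true),
            (Actual: true,  Predicted: false),
            (Actual: false, Predicted: false),
            (Actual: false, Predicted: false),
            (Actual: false, Predicted: true),
        };

        var matrix = Build(cases);
        int correct = matrix[(true, true)] + matrix[(false, false)];

        Console.WriteLine($"Correct: {correct} of {cases.Length}"); // prints "Correct: 3 of 5"
    }
}
```

The two diagonal cells (actual and predicted agree) are the correct predictions; the other two cells show which kind of mistake the model makes on data it has never seen.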

Happy mining!
Nico

Using the Windows 8.1 Hub as an ItemsControl

Diederik Krols

The XAML Brewer

This article presents a Windows 8.1 Hub control with ItemsSource and ItemTemplate properties, making it easily bindable and more MVVM-friendly. The Hub control has become the main host on the startup screen of many Windows Store apps: it’s flexible but still presents a standard look-and-feel with the title and section headers at the appropriate location, it nicely scrolls horizontally, and it comes with semantic zoom capabilities (well, at least with some help). Although it visually presents a list of items, it’s not an ItemsControl: the Hub control expects you to provide its sections and their corresponding data template more or less manually. Let’s solve that issue and create a Hub control that’s closer to an ItemsControl:

I initially started by hooking up attached properties to the existing Hub class, but then I discovered that this class is not sealed. So I created a subclass, named ItemsHub:

public class ItemsHub : Hub
{
	// ...
}

Then I just added some of the missing properties as dependency properties (using the propdp snippet): ItemTemplate as a DataTemplate, and ItemsSource as an IList.

public DataTemplate ItemTemplate
{
    get { return (DataTemplate)GetValue(ItemTemplateProperty); }
    set { SetValue(ItemTemplateProperty, value); }
}

public static readonly DependencyProperty ItemTemplateProperty =
    DependencyProperty.Register("ItemTemplate", typeof(DataTemplate), typeof(ItemsHub), new PropertyMetadata(null, ItemTemplateChanged));

public IList ItemsSource
{
    get { return (IList)GetValue(ItemsSourceProperty); }
    set { SetValue(ItemsSourceProperty, value); }
}

public static readonly DependencyProperty ItemsSourceProperty =
    DependencyProperty.Register("ItemsSource", typeof(IList), typeof(ItemsHub), new PropertyMetadata(null, ItemsSourceChanged));

When ItemTemplate is assigned or changed, we iterate over all Hub sections to apply the template to each of them:

private static void ItemTemplateChanged(DependencyObject d, DependencyPropertyChangedEventArgs e)
{
    ItemsHub hub = d as ItemsHub;
    if (hub != null)
    {
        DataTemplate template = e.NewValue as DataTemplate;
        if (template != null)
        {
            // Apply template
            foreach (var section in hub.Sections)
            {
                section.ContentTemplate = template;
            }
        }
    }
}

When ItemsSource is assigned or changed, we repopulate the sections and their headers from the source IList, and re-apply the data template (you should not make assumptions on the order in which the dependency properties are assigned):

private static void ItemsSourceChanged(DependencyObject d, DependencyPropertyChangedEventArgs e)
{
    ItemsHub hub = d as ItemsHub;
    if (hub != null)
    {
        IList items = e.NewValue as IList;
        if (items != null)
        {
            hub.Sections.Clear();
            foreach (var item in items)
            {
                HubSection section = new HubSection();
                section.DataContext = item;
                section.Header = item;
                DataTemplate template = hub.ItemTemplate;
                section.ContentTemplate = template;
                hub.Sections.Add(section);
            }
        }
    }
}

Instead of defining a HeaderPath property, or creating a HeaderTemplate, I decided to fall back on the default template (a text block) and to assign the whole item to the section header. Now all you need to do to show a decent header is override the ToString method in the (View)Model class:

public override string ToString()
{
    return this.Name;
}

Here’s how to create an ItemsHub control in XAML and define the bindings to its new properties in a light-weight MVVM style:

<Page.DataContext>
    <local:MainPageViewModel />
</Page.DataContext>

<Page.Resources>
    <DataTemplate x:Key="DataTemplate">
        <Image Source="{Binding Image}" />
    </DataTemplate>
</Page.Resources>

<Grid>
    <local:ItemsHub Header="Hub ItemsControl Sample"
                    ItemTemplate="{StaticResource DataTemplate}"
                    ItemsSource="{Binding Manuals}" />
</Grid>
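The XAML above binds to a MainPageViewModel with a Manuals collection, and the data template binds to an Image property. The sample download contains the real classes; the following is only a minimal sketch of what they could look like. The Manual class name, the sample names and the asset paths are assumptions for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical business object: only the Name and Image members
// are implied by the bindings and the ToString override in the post.
public class Manual
{
    public string Name { get; set; }

    // Path or URI that the Image element in the data template binds to.
    public string Image { get; set; }

    // The default section header template displays this text.
    public override string ToString()
    {
        return this.Name;
    }
}

public class MainPageViewModel
{
    // Made-up sample data.
    private readonly List<Manual> manuals = new List<Manual>
    {
        new Manual { Name = "Billy",  Image = "Assets/Billy.png" },
        new Manual { Name = "Kallax", Image = "Assets/Kallax.png" }
    };

    // List<T> implements the non-generic IList that ItemsSource is declared as.
    public List<Manual> Manuals
    {
        get { return this.manuals; }
    }
}
```

Note that ItemsHub, as written in this article, only rebuilds its sections when the ItemsSource property itself changes. Adding or removing items in the list afterwards is not picked up, so a sketch like this one assigns the complete list up front.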

Here’s the result in the attached sample app. Each Hub section represents a business object (an Ikea Instruction Manual) from a collection in the ViewModel:

I focused on the properties that make the most sense in MVVM apps, but, as I mentioned, the framework Hub class is not sealed. So you can use this same technique to add other useful properties like Items and DataTemplateSelector.

Here’s the full code; it was written with Visual Studio 2013 for Windows 8.1: U2UConsult.WinRT.HubItemsControl.zip (759.57 kb)

Enjoy!
Diederik

Comments

  • Arafat

    1/7/2014 7:00:59 PM |

    You guys have been tremendous when it comes to improvement for Windows Store apps.

    I would like to ask you a question that how will we be able to access the items inside Hub Section? Suppose I have got a GridView and ListView instead HubPanel and I want to bind to different data sources. Is it possible to do that?

    Currently, I am following not-so-good approach for that and I have made a separate UserControl to bind them and then that UserControl is placed under Hub Section. The disadvantage of this approach is, whenever I get back to the page, it loads again and again (which I don't want because of performance issues)

    Please let me know if you have any solution to that.

    Thanks a lot
    Arafat

  • Diederik Krols

    1/9/2014 2:39:50 AM |

    Hi Arafat,
    for a Hub that contains different types of sections, it would make sense to let the main viewmodel expose a list of different (view)models, and let a DataTemplateSelector (to be implemented as dependency property) look up the appropriate template. That way you still make maximum use of XAML data binding.

File IO in Windows 8

Let’s leave Windows Phone for a moment and have a look at Windows 8. Recently the U2U team was present at the Microsoft Build conference in Anaheim, California, where Microsoft unveiled Windows 8. What we saw is still just a preview, but it clearly shows the direction in which Windows is evolving. Have a look at this video to see Windows 8 in action:

 

The “new style” applications Windows 8 offers are “Metro” applications (yes, as in the Windows Phone Metro style), and these can be built using XAML and managed code, or even using HTML5 + JavaScript. Metro apps will have to be installed through a marketplace, and are more or less sandboxed (comparable to Silverlight, but different ;-) ). One of the effects of this sandboxing is that your options for working with files are limited. You cannot just access any folder on your machine! Unfortunately you’re also unable to use Isolated Storage. So what can you use? Let’s start by writing info into the “Local Folder”.

Windows.Storage.ApplicationData appData = Windows.Storage.ApplicationData.Current;
 
StorageFile file = await appData.LocalFolder.CreateFileAsync("EmployeeList.u2u");
Windows.Storage.Streams.IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.ReadWrite);

 

No ordinary File here, but a StorageFile, which I have to open before I can write to it. The IRandomAccessStream I get back lets me create an input stream (for reading) or an output stream (for writing).

IOutputStream output = stream.GetOutputStreamAt(0);

 

And then I got stuck. The IOutputStream gives me a WriteAsync function, but it asks me for an IBuffer object. How do I write data? Luckily there’s already some MSDN documentation available. That tells me to use a DataWriter, which takes the IOutputStream as a constructor argument.

DataWriter writer = new DataWriter(output);
writer.WriteString("HERE GOES DATA");
output.FlushAsync();

 

Not there yet! The code I wrote starts from the assumption that file IO in Metro works much like “ordinary” file IO: flush the stream and you’re done. In Metro we have to “commit” the data in the writer, and I have to start the flushing explicitly, so this is the working code:

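DataWriter writer = new DataWriter(output);
writer.WriteString("HERE GOES DATA");

// Commit the data buffered in the writer to the underlying stream.
await writer.StoreAsync();
// In this preview, the flush operation has to be started explicitly.
output.FlushAsync().Start();
statusTxt.Text = "File Saved";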

 

They could have made it simpler :-)
I assumed the local folder would be the bin/debug folder, but there was nothing there. It turns out my file is located in C:\Users\michael\AppData\Local\Packages\d64899f1-9800-470a-9cb3-fa89210f4941_qs0a8q7rnpy8j\LocalState.

How about reading the file? Well, simply reverse your writing logic (writer becomes reader, output becomes input, store becomes load, …):

Windows.Storage.ApplicationData appData = Windows.Storage.ApplicationData.Current;

var file = await appData.LocalFolder.GetFileAsync("EmployeeList.u2u");
Windows.Storage.Streams.IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read);

IInputStream input = stream.GetInputStreamAt(0);

DataReader reader = new DataReader(input);

// Load the whole file into the reader's buffer before reading from it.
var size = stream.Size;
await reader.LoadAsync((uint)size);
var data = reader.ReadString((uint)size);

 

There you go!
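For comparison, here is a shorter sketch of the same write and read round trip. It assumes the released Windows 8 WinRT API rather than the preview this post was written against: the FileIO helper class and the CreationCollisionOption overload of CreateFileAsync belong to that later API. The class name LocalTextFile and its method names are hypothetical, and the code only runs inside a Windows Store app.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Storage;

public static class LocalTextFile
{
    // Writes the text to a file in the app's local folder.
    public static async Task SaveAsync(string fileName, string text)
    {
        StorageFolder folder = ApplicationData.Current.LocalFolder;

        // ReplaceExisting overwrites an earlier file; without it,
        // CreateFileAsync fails when the file already exists.
        StorageFile file = await folder.CreateFileAsync(
            fileName, CreationCollisionOption.ReplaceExisting);

        await FileIO.WriteTextAsync(file, text);
    }

    // Reads the complete file back as a string.
    public static async Task<string> LoadAsync(string fileName)
    {
        StorageFile file = await ApplicationData.Current.LocalFolder.GetFileAsync(fileName);
        return await FileIO.ReadTextAsync(file);
    }
}
```

From an async method in the page you could then call `await LocalTextFile.SaveAsync("EmployeeList.u2u", "HERE GOES DATA");`. The stream, DataWriter and DataReader route shown above remains the way to go when you need finer control over what is written.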

What about folders other than the “local” one? The KnownFolders class gives me access to the following locations:

  • Documents Library
  • Home Group
  • Media Server Devices (DLNA – Digital Living Network Alliance Devices, sounds interesting)
  • Music Library
  • Pictures Library
  • Removable Devices
  • Videos Library

Let’s change the first two lines of my writing code to use the Documents Library:

StorageFolder doclib = Windows.Storage.KnownFolders.DocumentsLibrary;
 
StorageFile file = await doclib.CreateFileAsync("EmployeeList.u2u");

 

As soon as you start running your code, it will fail on the first line (no exception though; it simply stops). That’s because you explicitly have to give the app the capability to access the Documents Library. Double-clicking Package.appxmanifest allows you to do that:

Still, the app will now stop on the second line: you also need to associate your app with the .u2u file extension. This is also done in the appxmanifest file, on the Declarations tab, where we add a file type association declaration:

There you go: the app works and saves my u2u file in the Documents Library.
