U2U Blog

for developers and other creative minds

Loading data in Azure Machine Learning

In July 2014 Microsoft made its cloud-based data mining environment (known as Azure Machine Learning, or AzureML) available to the public. With this platform users can analyze large amounts of data without installing and configuring special software: a browser and a credit card are all you need. With the growing number of people in number-crunching jobs (data scientists), it is nice to see Microsoft focusing on this.

In a previous blog post (see http://blogs.u2u.be/u2u/post/2014/07/14/First-Steps-in-Azure-Machine-Learning.aspx) I showed how to get started with setting up an AzureML environment. In this blog post we take a look at loading data into AzureML.

Supported data formats

Currently AzureML is focusing on the most common formats used in the world of machine learning:

  • Files: text files containing comma-separated values (CSV) or tab-separated values (TSV), the Attribute-Relation File Format (ARFF), which was introduced by the open-source Weka machine learning framework, RData files and the SVMLight format.
  • Database tables: Hive tables (Hadoop), Azure Tables and SQL Databases in Azure.
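
To make the text formats a bit more concrete, here is a minimal Python sketch that writes the same tiny data set as CSV, TSV and SVMLight text. The sample rows, the column names and the helper functions `to_delimited` and `to_svmlight` are made up for this illustration; they are not part of AzureML.

```python
import csv
import io

# Hypothetical sample: each row is (label, feature 1, feature 2, feature 3).
header = ["label", "f1", "f2", "f3"]
rows = [
    (1, 0.5, 0.0, 2.0),
    (-1, 0.0, 1.5, 0.0),
]


def to_delimited(header, rows, delimiter):
    """Serialize rows as delimited text (comma for CSV, tab for TSV)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_svmlight(rows):
    """Serialize rows as SVMLight lines: '<label> <index>:<value> ...'.

    Feature indices start at 1 and zero-valued features are left out,
    which keeps the format compact for sparse data.
    """
    lines = []
    for label, *features in rows:
        pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
        lines.append(" ".join([str(label)] + pairs))
    return "\n".join(lines) + "\n"


csv_text = to_delimited(header, rows, ",")
tsv_text = to_delimited(header, rows, "\t")
svm_text = to_svmlight(rows)

print(csv_text)
print(svm_text)
```

Note that the SVMLight output has no header row and drops the zero values, so it is a good fit for wide, sparse data, while CSV and TSV keep every cell.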

Since AzureML runs in the Azure cloud, all your data must be in the cloud as well. Either you already uploaded your data to Azure (e.g. your data is stored in an Azure SQL Database) or you will upload it explicitly for this project. In both cases, take care to store your data in the same region where AzureML runs: during the preview period, AzureML only runs from the South Central US data center. If you store your data in another data center, running your experiments will be slower and more expensive.

Let’s first consider the scenario where you upload your data from a local file directly into AzureML (Uploading a DataSet). After that we cover the scenario where your data is already somewhere in Azure (Reading data).

Uploading a DataSet

A lot of sample machine learning data sets are already available out-of-the-box in AzureML. But after some experimenting with public data, you probably want to play with your own data. If you don’t have your data anywhere on Azure yet, you can upload it directly as a new dataset in AzureML. But before we start adding datasets, first a warning: in the current preview we cannot delete uploaded datasets. We can overwrite an existing dataset with new data, but if you create 1001 datasets, they will be in the list forever (that is, until Microsoft fixes this limitation). Because of this, if your dataset is not yet final, consider uploading the data file(s) into a custom Azure blob store and then loading them with the Reader from within your experiment.

To add a new dataset, click the +New button at the bottom left of the ML Studio screen, and select DataSet –> From local file. In the next dialog box, we can pick the file to upload, provide a name (choose well, it cannot be altered later on), select the type of data in the file and provide an optional description.

If you select the checkbox, you choose an existing dataset whose content will be overwritten by the file you select. It is impossible to delete or rename a dataset, but you can always upload an empty file ‘as a new version’ of a large dataset to truncate it.

If we now want to use this data, we create a new experiment by clicking the +New button. In this new experiment, we find our uploaded dataset in the list under Saved Datasets. Just drag it to the design surface.


Also remember the search box at the top: by typing part of an object name (and a data set is one of the many objects we have in AzureML) we get a filtered list which makes it easier to find an object.

Now that we have our data in AzureML we can start interacting with it, for example by simply visualizing it: click the circle under the dataset and select Visualize.

This opens the overview screen, which shows basic statistical information on each data field.
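
If you want a rough idea of what such an overview contains before uploading a file, you can compute similar per-column statistics locally. The Python sketch below is only an approximation under stated assumptions: the sample data and the `summarize` helper are hypothetical, and the exact set of statistics that AzureML displays may differ.

```python
import csv
import io
import statistics

# Hypothetical sample data; empty cells stand for missing values.
sample = io.StringIO(
    "age,income,city\n"
    "34,52000,Ghent\n"
    "45,,Brussels\n"
    "29,48000,Ghent\n"
)


def summarize(file_obj):
    """Return per-column statistics for a CSV file with a header row."""
    reader = csv.DictReader(file_obj)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)

    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v != ""]
        stats = {
            "missing": len(values) - len(present),
            "unique": len(set(present)),
        }
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = None  # at least one value is not numeric: treat as text
        if numbers:
            stats["min"] = min(numbers)
            stats["max"] = max(numbers)
            stats["mean"] = statistics.mean(numbers)
        summary[name] = stats
    return summary


summary = summarize(sample)
for name, stats in summary.items():
    print(name, stats)
```

Text columns such as `city` only get the missing and unique counts, while numeric columns also get a minimum, maximum and mean.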

Reading data

Another way to get data into an AzureML experiment is to first upload your data into an Azure SQL Database or a Hadoop cluster (such as HDInsight), or to upload the data files (same file formats as in the previous section) into an Azure blob store.

In this case you do not need to create a data set, but you can immediately create a new experiment.

In this experiment, locate the Reader under Data Input and Output and drag it into the experiment.
When we click the Reader, all of its configurable properties appear on the right-hand side. The most important property is the data source type, since it determines which other properties are needed. Select the location where your data can be found and configure the other properties accordingly.

When we now run the experiment, we can visualize the data from this Reader, just as we could with an uploaded dataset. But we have an extra option: by clicking Save as dataset, we can permanently store this data in AzureML. This speeds up the runtime of an experiment, but it increases the storage cost (we store another, redundant copy of the data).

In the next blog post, I will discuss data preprocessing.
