Getting started with Python in Microsoft Azure - Machine Learning Service

In my last blog post, I tried to convince you, as a Python programmer, that Microsoft Azure has plenty to offer you. Even though more and more ETL and machine learning products in Azure are built around a fully visual experience, there are also quite a few products in which (Python) coding skills are extremely useful or even required.

As an example, in the last blog I showed how to get started with Azure Databricks, a big data and machine-learning platform built on top of Apache Spark. The idea is that you can use Databricks to easily set up a Spark cluster, with which you interact through notebooks. One of the languages that can be used inside a notebook is Python.

Azure Machine Learning Service

In this blog, I will show you a second Azure product in which Python can be useful: Azure Machine Learning Service. This is a cloud service that you can use to train, deploy, automate, and manage machine learning models at scale.

You may wonder what makes Machine Learning Service different from Databricks. Databricks allows you to significantly speed up your machine learning activities by running them on a Spark cluster. But what if you want to use another type of compute resource to train your models? What if you want to keep track of the results of different model comparisons? What if you want to deploy your model? These things may not be straightforward with Databricks, but they are with Machine Learning Service. Another interesting feature is that you can incorporate your Databricks notebook in a Machine Learning Service pipeline.

Three ways of using Machine Learning Service

There are basically three ways in which you can use Azure Machine Learning Service. With the first, Automated ML (AutoML), you just provide the data and specify the type of problem you are trying to solve (classification, regression, or forecasting). Automated ML then uses a trial-and-error approach to solve your problem in a 'reasonably good' way. The second approach consists of a visual interface in which you can create a model by dragging and dropping modules into a machine learning flow (similar to Machine Learning Studio). The third possibility is to use the Azure Machine Learning SDK for Python to build and run your own machine learning workflows. You can interact with this service from any Python environment, including Jupyter Notebooks, and you can train models either locally or on cloud compute resources. Let's dive a bit deeper into this last approach!

Getting started with the Azure Machine Learning SDK for Python

Creating a Machine Learning Workspace

Let's create a new Machine Learning resource. Currently, there are two editions: Basic and Enterprise (in preview). The Basic workspace itself is free; you are only charged for the resources that you create within the workspace. Check this link for more pricing details. For now, let's go for a Basic workspace.

Creating a Notebook VM

In the Notebook VM tab of the compute section, click 'New' to create a new notebook VM. You can choose a VM type that fits your computational needs. Let's just create the default notebook VM for this demonstration.

[Screenshot: creating a new notebook VM]

Creating or Importing a Notebook

When the notebook VM is created, a link to Jupyter notebooks appears next to it. If you click on this link, you will find a lot of useful tutorial and how-to notebooks in the AzureML Samples tab to get you started. For this blog, I created a notebook that you can download from here; upload it to Jupyter Notebooks on your notebook VM.

[Screenshot: importing the notebook into Jupyter]

Training the model and logging the results in an experiment

Once we are in the notebook, we can start writing Python code to train our model. We can keep track of the results of training and testing models in the Machine Learning Workspace that we created before.

  1. Connect to the workspace and create an experiment
    The first thing we need to do is connect to our Machine Learning Workspace on Azure. To do so, we will import the Workspace class and load our subscription information from the config.json file in our current directory using the from_config() function. Next, we will create an experiment in this workspace. In the experiment, we will keep track of our results.

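A minimal sketch of what this could look like (the experiment name 'census-experiment' is just an example; on a notebook VM, a config.json for your workspace is already in place):

```python
from azureml.core import Workspace, Experiment

# Load the workspace details from config.json in the current directory
ws = Workspace.from_config()

# Create (or reuse) an experiment in the workspace to track our runs
exp = Experiment(workspace=ws, name="census-experiment")
```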

  2. Loading and transforming data
    In this demo, we will use the census dataset again. This dataset contains information about adults; the goal is to predict whether a person earns more or less than 50K. You can download the dataset from here. Upload it to Jupyter Notebooks (in the same folder as the current notebook) so that you can easily load it using pandas.

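Loading the data might then look like this (the file name census.csv is an assumption; use whatever name your upload has):

```python
import pandas as pd

# The census file is assumed to sit next to this notebook
df = pd.read_csv("census.csv")
df.head()
```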

Before we can apply a machine learning model to this dataset, we need to do some preprocessing. Let's transform all categorical columns to numerical ones.

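One straightforward way to do this with pandas (assuming every text-typed column is categorical):

```python
# Encode every categorical (object) column as integer codes
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype("category").cat.codes
```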

Next, we will divide the data into a training and a test set.

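For instance, with scikit-learn (the label column name 'income' is an assumption based on the dataset):

```python
from sklearn.model_selection import train_test_split

# Separate features and label, then split off 30% of the rows as a test set
X = df.drop("income", axis=1)
y = df["income"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```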

  3. Training the model
    In this demonstration, we will train a simple scikit-learn model without extensive hyperparameter tuning or comparing many different models. We can simply do this on our notebook VM. It is possible, however, to attach a more powerful compute target and to train your model there. The img-classification-part1-training sample notebook (in the tutorials section of the AzureML Samples tab of Jupyter Notebooks) explains how to attach a compute resource and how to train your model on this remote compute.

Let's train a number of decision trees with different maximum tree depths. We will log the results and save the models as pickle files in the experiment that we created in our Machine Learning Workspace.

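A sketch of such a training loop; the depth values, the file names, and the use of joblib for pickling are all just example choices:

```python
import os
import joblib
from sklearn.tree import DecisionTreeClassifier

os.makedirs("outputs", exist_ok=True)

for depth in [1, 2, 5, 10, 20]:
    # Start an interactive run inside the experiment
    run = exp.start_logging()
    run.log("max_depth", depth)

    model = DecisionTreeClassifier(max_depth=depth)
    model.fit(X_train, y_train)

    # Log the accuracy on the test set
    run.log("accuracy", model.score(X_test, y_test))

    # Pickle the model and attach it to the run
    model_path = f"outputs/model_depth_{depth}.pkl"
    joblib.dump(model, model_path)
    run.upload_file(name=model_path, path_or_stream=model_path)
    run.complete()
```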

The results of these different models are logged in the experiment in your Machine Learning Workspace. Which model has the best accuracy? You can click on a run to get more details and to download the pickle files.

[Screenshot: run results in the experiment]

Let's register the best performing model.

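One possible way to do this is to pick the run with the highest logged accuracy and register its pickled model (the model name 'census-model' is arbitrary):

```python
# Find the run with the highest logged accuracy
best_run = max(exp.get_runs(), key=lambda r: r.get_metrics().get("accuracy", 0))
best_depth = int(best_run.get_metrics()["max_depth"])

# Register the corresponding pickle file in the workspace model registry
model = best_run.register_model(
    model_name="census-model",
    model_path=f"outputs/model_depth_{best_depth}.pkl")
```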

Deploying the best model

Now that we have our model, we can deploy it as a web service hosted in Azure Container Instances (ACI). To do so, we need:

  1. A scoring script to show how to use the model
  2. An environment file to show what packages need to be installed
  3. A configuration file to build the ACI
  4. The model we trained before

Let's start by creating the scoring script:

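The scoring script needs an init() function that loads the model and a run() function that scores incoming data. A sketch, written from the notebook with the %%writefile magic (the JSON input format is an assumption that has to match how we call the service later):

```python
%%writefile score.py
import json
import joblib
import numpy as np
from azureml.core.model import Model

def init():
    global model
    # Resolve the path of the registered model inside the container
    model_path = Model.get_model_path("census-model")
    model = joblib.load(model_path)

def run(raw_data):
    # Expect a JSON payload of the form {"data": [[...], [...]]}
    data = np.array(json.loads(raw_data)["data"])
    return model.predict(data).tolist()
```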

Next, we create an environment file.

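One way to produce the environment file is with the SDK's CondaDependencies helper, listing the packages the scoring script needs:

```python
from azureml.core.conda_dependencies import CondaDependencies

# List the packages the scoring script depends on
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
myenv.add_conda_package("numpy")
myenv.add_pip_package("joblib")

# Write the specification to a conda YAML file
with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())
```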

The final thing we need to do before deploying is to create a configuration file for the ACI container.

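A sketch of the deployment configuration; one CPU core and 1 GB of memory should be plenty for this small model:

```python
from azureml.core.webservice import AciWebservice

# Resource settings and some descriptive metadata for the container
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    tags={"data": "census", "method": "sklearn"},
    description="Predict whether a person earns more than 50K")
```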

Now we can finally deploy our model! This may take a while...

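A sketch of the deployment step, combining the registered model, the scoring script, and the environment file (the service name 'census-service' is just an example):

```python
from azureml.core import Environment
from azureml.core.model import InferenceConfig, Model

# Build an environment from the conda file and combine it with the scoring script
env = Environment.from_conda_specification(name="census-env", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deploy the registered model as an ACI web service
service = Model.deploy(workspace=ws,
                       name="census-service",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aci_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```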

Once the deployment is done, we can test the web service using new data. Let's use the first two rows of our test data for this:

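We can call the service directly from the SDK; the payload has to match the format that the run() function in our scoring script expects:

```python
import json

# Send the first two test rows as a JSON payload
test_sample = json.dumps({"data": X_test[:2].values.tolist()})
predictions = service.run(input_data=test_sample)
print(predictions)
```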

That worked well!

I hope that with this blog post I have shown you that Azure Machine Learning Service can be very useful if you want to create your machine learning models with Python. More about Python and Azure in my next blog post!