Machine Learning Flash

Overview

Teaching: 15 min
Exercises: 0 min

Questions

How can I use scikit-learn to apply machine learning?

Objectives

Run an SVM classifier on scikit-learn's handwritten digits dataset (a small, MNIST-like set of 8x8 images).

Make a prediction for an arbitrary set of images of a digit.

`scikit-learn` is one of the most widely used machine learning libraries in scientific Python. It is commonly called sklearn, which is also its import name.

The Jupyter Notebook will render plots inline if we ask it to using a “magic” command.

%matplotlib inline
import matplotlib.pyplot as plt

# The package is installed as scikit-learn but imported as sklearn
import sklearn

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics

Load the digits dataset that ships with scikit-learn.

digits = datasets.load_digits()

Get the lay of the land

Combine the images and their labels using the zip function for easier handling inside the plotting loop.
- Note: target holds the numerical representation of the labels.

images_and_labels = list(zip(digits.images, digits.target))
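To make the pairing concrete, here is a minimal sketch of what zip does, using made-up stand-ins for the images and labels. The names `images`, `labels`, and `pairs` are hypothetical and not part of the lesson's dataset.

```python
# Hypothetical stand-ins for digits.images and digits.target
images = ["img_a", "img_b", "img_c"]
labels = [0, 1, 2]

# zip pairs the i-th image with the i-th label
pairs = list(zip(images, labels))
print(pairs[:2])

# enumerate adds a running index, as used in the plotting loop
for index, (image, label) in enumerate(pairs):
    print(index, image, label)
```

Each element of the resulting list is an (image, label) tuple, which is why the plotting loop can unpack both values at once.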

Create several subplots to draw the first four items in the dataset together with their actual labels.

for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)

Flatten the input images

The inputs are 8x8 grayscale images. Flatten each one into an array of 64 pixel values so that each pixel becomes a column (feature) for the classifier later on.

n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
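The effect of the reshape call is easier to see on a tiny array. The sketch below assumes a hypothetical stack of three 2x2 "images" rather than the real 8x8 digits.

```python
import numpy as np

# Hypothetical stack of 3 tiny images, each 2x2 pixels
images = np.arange(12).reshape((3, 2, 2))

n_samples = len(images)
# -1 asks NumPy to infer the remaining dimension (2 * 2 = 4 here)
flat = images.reshape((n_samples, -1))

print(images.shape)  # (3, 2, 2)
print(flat.shape)    # (3, 4)
print(flat[0])       # [0 1 2 3]
```

For the digits data the same call turns an array of shape (n_samples, 8, 8) into one of shape (n_samples, 64).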

Train the Classifier

In this example, we use a support vector machine (SVM) classifier provided by sklearn.

classifier = svm.SVC(gamma=0.001)

We train the classifier on the first half of the dataset.

classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
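The lesson splits the data by slicing. As an alternative sketch (not the lesson's own method), scikit-learn's `train_test_split` can produce a similar ordered half-and-half split when shuffling is turned off. The variable names `X_train`, `X_test`, `y_train`, and `y_test` are chosen here for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# shuffle=False keeps the original order, so the first part is used
# for training and the rest for testing, much like the slicing above
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False
)

classifier = svm.SVC(gamma=0.001)
classifier.fit(X_train, y_train)
print(X_train.shape, X_test.shape)
```

In practice a shuffled split is often preferred, because an ordered split can be biased if the samples are sorted in some way.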

Machine Learning in action

Let’s predict the labels for the remaining half of the dataset.

expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])

Compute some metrics on the quality of the prediction, such as per-class precision and recall and a confusion matrix of misclassifications.

print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
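To show how a confusion matrix can be read, here is a small sketch on a made-up 3-class matrix. The values are hypothetical. As in scikit-learn's output, rows are assumed to be the true classes and columns the predicted classes.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted
cm = np.array([[5, 1, 0],
               [0, 4, 2],
               [1, 0, 7]])

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()

# Recall per class: correct predictions divided by the row total
per_class_recall = np.diag(cm) / cm.sum(axis=1)

print(accuracy)
print(per_class_recall)
```

Large off-diagonal entries point to pairs of digits that the classifier tends to confuse.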

Let the machine speak

If we are satisfied with the above, we can plot some of the test images together with their predicted labels.

images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 5)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)

plt.show()
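Predicting a single image works the same way, as long as it is flattened into the shape the classifier was trained on. The following is a minimal, self-contained sketch that repeats the training step and then classifies the last image of the dataset. The names `image` and `prediction` are chosen here for illustration.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])

# Take one 8x8 image from the half that was not used for training
image = digits.images[-1]

# predict expects a 2-D array of shape (n_images, 64),
# so a single image becomes one row with 64 columns
prediction = classifier.predict(image.reshape(1, -1))
print("predicted:", prediction[0], "actual:", digits.target[-1])
```

An image from another source would need the same preprocessing: 8x8 pixels, grayscale, and the same value range as the training data.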

Key Points

Flatten the input dataset, as the SVM is unaware of the idea of an image.

Split your data 50/50 and train on the first half.

Predict the other half.

Produce a confusion matrix to check the quality of the learning.

Plot some images and their predicted values.
