Machine Learning Flash

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • How can I use scikit-learn to apply machine learning?

Objectives
  • Run an SVM classifier on the handwritten digits dataset bundled with scikit-learn (a small 8x8-pixel set, not the full MNIST data).

  • Make a prediction for an arbitrary set of images of a digit.

scikit-learn is one of the most widely used machine learning libraries for scientific work in Python.

%matplotlib inline
import matplotlib.pyplot as plt
# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
digits = datasets.load_digits()

Get the lay of the land

# Pair each 8x8 image with its label and display the first four
images_and_labels = list(zip(digits.images, digits.target))
for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)  # top row of a 2x4 grid
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)

Flatten the input images

n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
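
To see what the reshape does, the following minimal sketch compares the array shapes before and after flattening. It assumes the digits dataset bundled with scikit-learn, which holds 1797 images of 8x8 pixels, and reuses the variable names from the code above.

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each sample is an 8x8 grid of grey levels.
print(digits.images.shape)   # (1797, 8, 8)

# -1 lets NumPy infer the second dimension: 8 * 8 = 64 features per sample.
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
print(data.shape)            # (1797, 64)
```

Each row of data is now one image written out as a vector of 64 numbers, which is the two-dimensional (samples, features) layout that scikit-learn estimators expect.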

Train the Classifier

# Create a support vector classifier and fit it on the first half of the data
classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])

Machine Learning in action

expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])
print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
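
The confusion matrix can also be read numerically. As a rough sketch, the code below repeats the training and prediction steps so that it runs on its own, then derives the overall accuracy from the diagonal of the matrix and looks up the most frequent mistake. The names half, cm, and errors are illustrative and not part of the lesson code.

```python
import numpy as np
from sklearn import datasets, svm, metrics

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
half = n_samples // 2

classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:half], digits.target[:half])

expected = digits.target[half:]
predicted = classifier.predict(data[half:])

# Rows are the true digits, columns are the predicted digits.
cm = metrics.confusion_matrix(expected, predicted)

# Correct predictions sit on the diagonal, so accuracy = trace / total.
accuracy = np.trace(cm) / cm.sum()
print("Accuracy from confusion matrix: %.3f" % accuracy)

# Off-diagonal entries count the mistakes; find the most frequent one.
errors = cm.copy()
np.fill_diagonal(errors, 0)
true_digit, predicted_digit = np.unravel_index(errors.argmax(), errors.shape)
print("Most common mistake: %i predicted as %i" % (true_digit, predicted_digit))
```

The value computed from the diagonal should agree with metrics.accuracy_score(expected, predicted), which is a quick way to check the reading of the matrix.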

Let the machine speak

# Display the first four test images with their predicted labels
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 5)  # bottom row of the 2x4 grid
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)

plt.show()

Key Points

  • Flatten each input image into a one-dimensional feature vector, since the SVM has no notion of an image.

  • Split your data 50/50 and train on the first half.

  • Predict the other half.

  • Produce a confusion matrix to check the quality of the learning.

  • Plot some images and their predicted values.
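
The steps above can be combined to predict an arbitrary set of digit images. The sketch below is one possible way to do this. predict_digits is a hypothetical helper, not a scikit-learn function, and it assumes the images are 8x8 arrays on the same grey scale as the bundled dataset.

```python
import numpy as np
from sklearn import datasets, svm


def predict_digits(classifier, images):
    """Flatten a stack of 8x8 images and return the predicted labels.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    images = np.asarray(images, dtype=float)
    flat = images.reshape((len(images), -1))  # one row of 64 values per image
    return classifier.predict(flat)


digits = datasets.load_digits()
n_samples = len(digits.images)
half = n_samples // 2

# Train on the first half, exactly as in the lesson.
classifier = svm.SVC(gamma=0.001)
classifier.fit(digits.images[:half].reshape((half, -1)), digits.target[:half])

# Pick three arbitrary images from the unseen second half.
chosen = [half, half + 10, n_samples - 1]
print("Predicted:", predict_digits(classifier, digits.images[chosen]))
print("Expected: ", digits.target[chosen])
```

Keeping the flattening inside the helper means callers can pass images in their natural 8x8 form without having to remember the reshape.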