Overview of this Lecture / Week

This week and next week we will cover Classification, one of the most commonly used machine learning techniques in Data Mining. The term is often used loosely to cover two related tasks, and the two names are regularly used interchangeably. The first is the labeling of a binary or multi-class target variable, which is classification in the strict sense. The second is the prediction of a continuous-valued target variable, which is more precisely called regression.

This week we will cover what classification is, what it is typically used for, the typical process, and some data preparation requirements for classification. We will then examine decision trees and how they are created, look at Random Forests, and finally look at how to evaluate classification models (how do you know what is good?).
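To give a feel for how a decision tree is created, here is a minimal Python sketch of one common splitting criterion, Gini impurity, applied to a single numeric attribute. It is an illustration only: the function names (gini, best_split) and the toy data are made up for this example, and the lecture notes or SAS Enterprise Miner may use a different criterion (for example entropy or a chi-square test).

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the threshold t for the rule 'value <= t' that gives the
    lowest weighted Gini impurity across the two resulting groups.

    Returns (threshold, weighted_impurity). The threshold is None if no
    split improves on the impurity of the unsplit data.
    """
    n = len(values)
    best = (None, gini(labels))
    # Try every distinct value except the largest as a candidate threshold.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: customer age and whether they bought a product.
ages = [22, 25, 30, 35, 40, 45, 50, 55]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

threshold, impurity = best_split(ages, bought)
print("Impurity before split:", gini(bought))
print("Best rule: age <=", threshold, "with weighted impurity", impurity)
```

A full decision tree algorithm repeats this search over every attribute, picks the best rule, and then applies the same procedure recursively to each of the two groups until a stopping condition is met.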

Next week we will look at some additional Classification algorithms and also at Regression. Many machine learning algorithms can be used for both classification and regression problems.


Click here to download the notes for the Classification topic. These notes cover this week and next week.


Videos of Notes

Original xkcd Post

Lab Exercises

The lab for this week involves using SAS Enterprise Miner to create a Decision Tree model and to explore the various evaluation metrics.

Follow the SAS Enterprise Miner notes for creating a Decision Tree.
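The evaluation metrics reported in the lab can also be worked out by hand from a confusion matrix. The Python sketch below is one way to do that for a binary target. It is not part of the SAS Enterprise Miner workflow; the function names and the example labels are hypothetical and are only there to show how accuracy, precision, recall and F1 are derived from the four counts.

```python
def confusion_counts(actual, predicted, positive):
    """Count true positives, false positives, false negatives and
    true negatives for a binary target with the given positive label."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Derive common evaluation metrics from the confusion matrix counts.
    Each ratio falls back to 0.0 when its denominator is zero."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical actual vs. predicted labels for eight test records.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="yes")
result = metrics(tp, fp, fn, tn)
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print(result)
```

Accuracy alone can be misleading when one class is much rarer than the other, which is why precision and recall are usually examined alongside it.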


Additional Reading Materials

Tan Book Chapter

24 Evaluation Metrics for Binary Classification (and when to use them)

ODM Article – Decisions Grow on Trees

US Government Report on their use of Data Mining

A systematic analysis of performance measures for classification tasks

Guide to Decision Trees

A simple explanation to Classification Models