Introduction

This week is divided into two topics. The first topic covers some data manipulation functions in SQL, covering numeric, date and string data types. In the second part we will look at how to do Data Mining/Machine Learning using SQL functions. Yes you can do data mining/machine learning in a database.

Notes

Click here to download the SQL Manipulation Function notes

 

Click here to download the Oracle Data Mining notes

L4-2-In-Database_Data_Mining

Lab Exercises

Lab 1 – Data Manipulation functions

 

Lab 2 – In-Database Machine Learning

Check out the book chapter from my Oracle Data Mining book

Task 1 – Loading the data set
Load the required data sets into your schema. You will need to download the following files and save them to your environment. Then load these into your Schema using SQL Developer.

Training Data Set = mining_data_build_v.sql

Create two tables that have the same structure as MINING_DATA_BUILD_V table. These following tables should contain No data.

MINING_DATA_TEST_V
MINING_DATA_APPLY_V

Example SQL to create these tables with no records

create table mining_data_test_v
as select * from mining_data_build_v
where rownum < 1;

Now load the data into each of these tables.

MINING_DATA_TEST_V = mining_data_test_v.sql
MINING_DATA_APPLY_V = mining_data_apply_v.sql

Task 2 – Creating the settings table
Create the Settings table for a Decision Tree model

-- Create the settings table
CREATE TABLE decision_tree_model_settings (
setting_name VARCHAR2(30),
setting_value VARCHAR2(30));
-- Populate the settings table
-- Specify DT. By default, Naive Bayes is used for classification.
-- Specify ADP. By default, ADP is not used.
BEGIN
   INSERT INTO decision_tree_model_settings (setting_name, setting_value)
   VALUES (dbms_data_mining.algo_name,dbms_data_mining.algo_decision_tree);

   INSERT INTO decision_tree_model_settings (setting_name, setting_value)
   VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on);
   COMMIT;
END;

Task 3 – Creating a Decision Tree
Create the model for the decision tree, with AFFINITY_CARD as the target attribute/feature.

BEGIN
DBMS_DATA_MINING.CREATE_MODEL(
   model_name => 'Decision_Tree_Model2',
   mining_function => dbms_data_mining.classification,
   data_table_name => 'mining_data_build_v',
   case_id_column_name => 'cust_id',
   target_column_name => 'affinity_card',
   settings_table_name => 'decision_tree_model_settings');
END;

Task 4 – Exploring the meta-data for the models
Use the following commands to explore the meta-data about the Decision Tree model.

Can you explain what the purpose is for each of these commands.

-- describe the model settings tables
describe user_mining_model_settings
-- List all the ODM models created in your Oracle schema => what machine learning models you have created
SELECT model_name,
   mining_function,
   algorithm,
   build_duration,
   model_size
FROM user_MINING_MODELS;
-- List the algorithm settings used for your machine learning model
SELECT setting_name,
   setting_value,
   setting_type
FROM user_mining_model_settings;
-- WHERE model_name in 'DECISION_TREE_MODEL2';
-- List the attribute the machine learning model uses. It may use a subset of the attributes.
-- This allows you to see what attributes were selected
SELECT attribute_name,
   attribute_type,
   usage_type,
   target
from all_mining_model_attributes
where model_name = 'DECISION_TREE_MODEL2';

You have now explored the machine learning models in your Oracle schema.

NOTE: SQL is not a graphic language. You cannot visualise the decision tree like in other graphical based tools.
But if you explore more of the DB objects created behind the scenes, you might be able to work this out.
This is not necessary for these exercises.

Task 5 – Generating the Confusion Matrix
First we need to apply the model to the test data set

-- create a view that will contain the predicted outcomes => labeled data set
CREATE OR REPLACE VIEW demo_class_dt_test_results
AS
SELECT cust_id,
   prediction(DECISION_TREE_MODEL2 USING *) predicted_value,
   prediction_probability(DECISION_TREE_MODEL2 USING *) probability
FROM mining_data_test_v;
-- Select the data containing the applied/labeled/scored data set
-- This will be used as input to the calculation of the confusion matrix
SELECT *
FROM demo_class_dt_test_results;

Now we generate the confusion matrix.

NB: Make sure to have server output turned on in SQL Developer

DECLARE
   v_accuracy NUMBER;
BEGIN
DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
   accuracy => v_accuracy,
   apply_result_table_name => 'demo_class_dt_test_results',
   target_table_name => 'mining_data_test_v',
   case_id_column_name => 'cust_id',
   target_column_name => 'affinity_card',
   confusion_matrix_table_name => 'demo_class_dt_confusion_matrix',
   score_column_name => 'PREDICTED_VALUE',
   score_criterion_column_name => 'PROBABILITY',
   cost_matrix_table_name => null,
   apply_result_schema_name => null,
   target_schema_name => null,
   cost_matrix_schema_name => null,
   score_criterion_type => 'PROBABILITY');
   DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4));
END;

Then run

— select the results that are now stored in the table created during the Confusion Matrix function

SELECT *
FROM demo_class_dt_confusion_matrix;

Task 6 – Applying the model to new data – Bulk processing
Let us now apply the model to new data that has already been gathered in a table. With bulk processing we want to generate a labeled data set that contains the predicted value by the machine learning model.

BEGIN
   dbms_data_mining.apply(
   model_name => 'Decision_Tree_Model2',
   data_table_name => 'MINING_DATA_APPLY_V',
   case_id_column_name => 'CUST_ID',
   result_table_name => 'NEW_DATA_SCORED');
END;

Inspect the results in the NEW_DATA_SCORED table and write some queries to compare the results.

You will need to refresh the object tree on the left hand side of SQL Developer to get this new table appearing.

Task 7 – Applying the model to new data – Real-time processing
Use the PREDICTION and PREDICTION_PROBABILITY functions to label data in real-time mode.

SELECT cust_id, PREDICTION(Decision_Tree_Model2 using *)
FROM MINING_DATA_APPLY_V;
SELECT cust_id, PREDICTION(Decision_Tree_Model2 using *), , PREDICTION_PROBABILITY(Decision_Tree_Model2 using *)
FROM MINING_DATA_APPLY_V
WHERE rownum <=5;
SELECT cust_id
FROM MINING_DATA_APPLY_V
WHERE PREDICTION(Decision_Tree_Model2 using *) = 1
AND rownum <=10;

Task 8 – Using the model for What-if analysis
Now try out doing some What-If analysis using the machine learning model.

select PREDICTION_PROBABILITY (Decision_Tree_Model2, 0
USING 46 as AGE,
3 as HOUSEHOLD_SIZE,
'Assoc-A' as EDICATION,
'M' as CUST_GENDER,
'MARRIED' as CUST_MARITAL_STATUS,
'Crafts' as OCCUPATION) Pred_Prob
from dual;

 

Additional Reading

Oracle Data Mining – Classification Book chapter