This week is divided into two topics. The first topic covers some data manipulation functions in SQL, covering numeric, date and string data types. In the second part we will look at how to do Data Mining/Machine Learning using SQL functions. Yes you can do data mining/machine learning in a database.
Click here to download the SQL Manipulation Function notes
Click here to download the Oracle Data Mining notesL4-2-In-Database_Data_Mining
Lab 1 – Data Manipulation functions
Lab 2 – In-Database Machine Learning
Check out the book chapter from my Oracle Data Mining book
Task 1 – Loading the data set
Load the required data sets into your schema. You will need to download the following files and save them to your environment. Then load these into your Schema using SQL Developer.
Training Data Set = mining_data_build_v.sql
Create two tables that have the same structure as MINING_DATA_BUILD_V table. These following tables should contain No data.
Example SQL to create these tables with no records
create table mining_data_test_v as select * from mining_data_build_v where rownum < 1;
Now load the data into each of these tables.
Task 2 – Creating the settings table
Create the Settings table for a Decision Tree model
-- Create the settings table CREATE TABLE decision_tree_model_settings ( setting_name VARCHAR2(30), setting_value VARCHAR2(30));
-- Populate the settings table -- Specify DT. By default, Naive Bayes is used for classification. -- Specify ADP. By default, ADP is not used. BEGIN INSERT INTO decision_tree_model_settings (setting_name, setting_value) VALUES (dbms_data_mining.algo_name,dbms_data_mining.algo_decision_tree); INSERT INTO decision_tree_model_settings (setting_name, setting_value) VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on); COMMIT; END;
Task 3 – Creating a Decision Tree
Create the model for the decision tree, with AFFINITY_CARD as the target attribute/feature.
BEGIN DBMS_DATA_MINING.CREATE_MODEL( model_name => 'Decision_Tree_Model2', mining_function => dbms_data_mining.classification, data_table_name => 'mining_data_build_v', case_id_column_name => 'cust_id', target_column_name => 'affinity_card', settings_table_name => 'decision_tree_model_settings'); END;
Task 4 – Exploring the meta-data for the models
Use the following commands to explore the meta-data about the Decision Tree model.
Can you explain what the purpose is for each of these commands.
-- describe the model settings tables describe user_mining_model_settings
-- List all the ODM models created in your Oracle schema => what machine learning models you have created SELECT model_name, mining_function, algorithm, build_duration, model_size FROM user_MINING_MODELS;
-- List the algorithm settings used for your machine learning model SELECT setting_name, setting_value, setting_type FROM user_mining_model_settings; -- WHERE model_name in 'DECISION_TREE_MODEL2';
-- List the attribute the machine learning model uses. It may use a subset of the attributes. -- This allows you to see what attributes were selected SELECT attribute_name, attribute_type, usage_type, target from all_mining_model_attributes where model_name = 'DECISION_TREE_MODEL2';
You have now explored the machine learning models in your Oracle schema.
NOTE: SQL is not a graphic language. You cannot visualise the decision tree like in other graphical based tools.
But if you explore more of the DB objects created behind the scenes, you might be able to work this out.
This is not necessary for these exercises.
Task 5 – Generating the Confusion Matrix
First we need to apply the model to the test data set
-- create a view that will contain the predicted outcomes => labeled data set CREATE OR REPLACE VIEW demo_class_dt_test_results AS SELECT cust_id, prediction(DECISION_TREE_MODEL2 USING *) predicted_value, prediction_probability(DECISION_TREE_MODEL2 USING *) probability FROM mining_data_test_v;
-- Select the data containing the applied/labeled/scored data set -- This will be used as input to the calculation of the confusion matrix SELECT * FROM demo_class_dt_test_results;
Now we generate the confusion matrix.
NB: Make sure to have server output turned on in SQL Developer
DECLARE v_accuracy NUMBER; BEGIN DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX ( accuracy => v_accuracy, apply_result_table_name => 'demo_class_dt_test_results', target_table_name => 'mining_data_test_v', case_id_column_name => 'cust_id', target_column_name => 'affinity_card', confusion_matrix_table_name => 'demo_class_dt_confusion_matrix', score_column_name => 'PREDICTED_VALUE', score_criterion_column_name => 'PROBABILITY', cost_matrix_table_name => null, apply_result_schema_name => null, target_schema_name => null, cost_matrix_schema_name => null, score_criterion_type => 'PROBABILITY'); DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4)); END;
— select the results that are now stored in the table created during the Confusion Matrix function
SELECT * FROM demo_class_dt_confusion_matrix;
Task 6 – Applying the model to new data – Bulk processing
Let us now apply the model to new data that has already been gathered in a table. With bulk processing we want to generate a labeled data set that contains the predicted value by the machine learning model.
BEGIN dbms_data_mining.apply( model_name => 'Decision_Tree_Model2', data_table_name => 'MINING_DATA_APPLY_V', case_id_column_name => 'CUST_ID', result_table_name => 'NEW_DATA_SCORED'); END;
Inspect the results in the NEW_DATA_SCORED table and write some queries to compare the results.
You will need to refresh the object tree on the left hand side of SQL Developer to get this new table appearing.
Task 7 – Applying the model to new data – Real-time processing
Use the PREDICTION and PREDICTION_PROBABILITY functions to label data in real-time mode.
SELECT cust_id, PREDICTION(Decision_Tree_Model2 using *) FROM MINING_DATA_APPLY_V;
SELECT cust_id, PREDICTION(Decision_Tree_Model2 using *), , PREDICTION_PROBABILITY(Decision_Tree_Model2 using *) FROM MINING_DATA_APPLY_V WHERE rownum <=5;
SELECT cust_id FROM MINING_DATA_APPLY_V WHERE PREDICTION(Decision_Tree_Model2 using *) = 1 AND rownum <=10;
Task 8 – Using the model for What-if analysis
Now try out doing some What-If analysis using the machine learning model.
select PREDICTION_PROBABILITY (Decision_Tree_Model2, 0 USING 46 as AGE, 3 as HOUSEHOLD_SIZE, 'Assoc-A' as EDICATION, 'M' as CUST_GENDER, 'MARRIED' as CUST_MARITAL_STATUS, 'Crafts' as OCCUPATION) Pred_Prob from dual;