This week is divided into two topics. The first topic covers some data manipulation functions in SQL, covering numeric, date and string data types. In the second part we will look at how to do Data Mining/Machine Learning using SQL functions. Yes you can do data mining/machine learning in a database.


Click here to download the SQL Manipulation Function notes

L4-1-Manipulating Data using SQL

Click here to download the Oracle Data Mining notes


Lab Exercises

Lab 1 – Data Manipulation functions

[The following tasks should take approx. 20-25mins to complete.]

Task 1 -Some Date processing

Find out the day of the week you were born on, and what days of the week you celebrated your 10th, 18th, 21st, etc birthday on.

Task 2 -Date and Time processing

Write out today’s date in standard format used in EU and in USA. Include the time in standard 12 hour format and in 24 hour format

Task 3 -Basic string processing

Take your full name as a string and separate it out into its individual components

Task 4 -Reformatting

Reformat the following string

Brendan, Tierney, Kevin St, Dublin 8, Ireland

to the following format

brendan   |tierney   |kevin st  |dublin 8. |ireland.  |

– convert each string component to lower case, remove the comma, replace to have fixed width of 10 characters, separated by the pipe (|) symbol

Task 5 – Working with names

Take the following names. Using the SQL functions reformat them into their correct form (example given)

input = ‘mcDonald’.   output = ‘McDonald

input = “o’reilly”.   output = “O’Reilly”

Task 6 – Regular Expressions

Lab 2 – In-Database Machine Learning

[The following tasks should take approx. 35-40mins to complete including some time to RTFM!]

Check out the book chapter from my Oracle Data Mining book

Task 1 – Loading the data set
Load the required data sets into your schema. You will need to download the following files and save them to your environment. Then load these into your Schema using SQL Developer.

Training Data Set = mining_data_build_v.sql

Create two tables that have the same structure as MINING_DATA_BUILD_V table. These following tables should contain No data.


Example SQL to create these tables with no records

create table mining_data_test_v
as select * from mining_data_build_v
where rownum < 1;

Now load the data into each of these tables.

MINING_DATA_TEST_V = mining_data_test_v.sql
MINING_DATA_APPLY_V = mining_data_apply_v.sql

Task 2 – Creating the settings table
Create the Settings table for a Decision Tree model

-- Create the settings table
CREATE TABLE decision_tree_model_settings (
setting_name VARCHAR2(30),
setting_value VARCHAR2(30));
-- Populate the settings table
-- Specify DT. By default, Naive Bayes is used for classification.
-- Specify ADP. By default, ADP is not used.
   INSERT INTO decision_tree_model_settings (setting_name, setting_value)
   VALUES (dbms_data_mining.algo_name,dbms_data_mining.algo_decision_tree);

   INSERT INTO decision_tree_model_settings (setting_name, setting_value)
   VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on);

Task 3 – Creating a Decision Tree
Create the model for the decision tree, with AFFINITY_CARD as the target attribute/feature.

   model_name => 'Decision_Tree_Model2',
   mining_function => dbms_data_mining.classification,
   data_table_name => 'mining_data_build_v',
   case_id_column_name => 'cust_id',
   target_column_name => 'affinity_card',
   settings_table_name => 'decision_tree_model_settings');

Task 4 – Exploring the meta-data for the models
Use the following commands to explore the meta-data about the Decision Tree model.

Can you explain what the purpose is for each of these commands.

-- describe the model settings tables
describe user_mining_model_settings
-- List all the ODM models created in your Oracle schema => what machine learning models you have created
SELECT model_name,
-- List the algorithm settings used for your machine learning model
SELECT setting_name,
FROM user_mining_model_settings;
-- WHERE model_name in 'DECISION_TREE_MODEL2';
-- List the attribute the machine learning model uses. It may use a subset of the attributes.
-- This allows you to see what attributes were selected
SELECT attribute_name,
from all_mining_model_attributes
where model_name = 'DECISION_TREE_MODEL2';

You have now explored the machine learning models in your Oracle schema.

NOTE: SQL is not a graphic language. You cannot visualise the decision tree like in other graphical based tools.
But if you explore more of the DB objects created behind the scenes, you might be able to work this out.
This is not necessary for these exercises.

Task 5 – Generating the Confusion Matrix
First we need to apply the model to the test data set

-- create a view that will contain the predicted outcomes => labeled data set
CREATE OR REPLACE VIEW demo_class_dt_test_results
SELECT cust_id,
   prediction(DECISION_TREE_MODEL2 USING *) predicted_value,
   prediction_probability(DECISION_TREE_MODEL2 USING *) probability
FROM mining_data_test_v;
-- Select the data containing the applied/labeled/scored data set
-- This will be used as input to the calculation of the confusion matrix
FROM demo_class_dt_test_results;

Now we generate the confusion matrix.

NB: Make sure to have server output turned on in SQL Developer

   v_accuracy NUMBER;
   accuracy => v_accuracy,
   apply_result_table_name => 'demo_class_dt_test_results',
   target_table_name => 'mining_data_test_v',
   case_id_column_name => 'cust_id',
   target_column_name => 'affinity_card',
   confusion_matrix_table_name => 'demo_class_dt_confusion_matrix',
   score_column_name => 'PREDICTED_VALUE',
   score_criterion_column_name => 'PROBABILITY',
   cost_matrix_table_name => null,
   apply_result_schema_name => null,
   target_schema_name => null,
   cost_matrix_schema_name => null,
   score_criterion_type => 'PROBABILITY');
   DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4));

Then run

— select the results that are now stored in the table created during the Confusion Matrix function

FROM demo_class_dt_confusion_matrix;

Task 6 – Applying the model to new data – Bulk processing
Let us now apply the model to new data that has already been gathered in a table. With bulk processing we want to generate a labeled data set that contains the predicted value by the machine learning model.

   model_name => 'Decision_Tree_Model2',
   data_table_name => 'MINING_DATA_APPLY_V',
   case_id_column_name => 'CUST_ID',
   result_table_name => 'NEW_DATA_SCORED');

Inspect the results in the NEW_DATA_SCORED table and write some queries to compare the results.

You will need to refresh the object tree on the left hand side of SQL Developer to get this new table appearing.

Task 7 – Applying the model to new data – Real-time processing
Use the PREDICTION and PREDICTION_PROBABILITY functions to label data in real-time mode.

SELECT cust_id, PREDICTION(Decision_Tree_Model2 using *)
SELECT cust_id, PREDICTION(Decision_Tree_Model2 using *), , PREDICTION_PROBABILITY(Decision_Tree_Model2 using *)
WHERE rownum <=5;
SELECT cust_id
WHERE PREDICTION(Decision_Tree_Model2 using *) = 1
AND rownum <=10;

Task 8 – Using the model for What-if analysis
Now try out doing some What-If analysis using the machine learning model.

select PREDICTION_PROBABILITY (Decision_Tree_Model2, 0
USING 46 as AGE,
'Assoc-A' as EDICATION,
'Crafts' as OCCUPATION) Pred_Prob
from dual;

Lab 3 – Work on your SQL assignment

Use whatever time you have available to work on your SQL assignment.

Additional Reading

Oracle Data Mining – Classification Book chapter

Oracle Data Mining User Guide (Oracle Documentation)

Oracle Data Mining Concepts (Oracle Documentation)

PL/SQL Data Mining Packages and Functions (Oracle Documentation)