This week is divided into two topics. The first topic covers some data manipulation functions in SQL, covering numeric, date and string data types. In the second part we will look at how to do Data Mining/Machine Learning using SQL functions. Yes you can do data mining/machine learning in a database.
Click here to download the SQL Manipulation Function notesL4-1-Manipulating Data using SQL
Click here to download the Oracle Data Mining notesL4-2-In-Database_Data_Mining
Lab 1 – Data Manipulation functions
[The following tasks should take approx. 20-25mins to complete.]
Task 1 -Some Date processing
Find out the day of the week you were born on, and what days of the week you celebrated your 10th, 18th, 21st, etc birthday on.
Task 2 -Date and Time processing
Write out today’s date in standard format used in EU and in USA. Include the time in standard 12 hour format and in 24 hour format
Task 3 -Basic string processing
Take your full name as a string and separate it out into its individual components
Task 4 -Reformatting
Reformat the following string
Brendan, Tierney, Kevin St, Dublin 8, Ireland
to the following format
brendan |tierney |kevin st |dublin 8. |ireland. |
– convert each string component to lower case, remove the comma, replace to have fixed width of 10 characters, separated by the pipe (|) symbol
Task 5 – Working with names
Take the following names. Using the SQL functions reformat them into their correct form (example given)
input = ‘mcDonald’. output = ‘McDonald
input = “o’reilly”. output = “O’Reilly”
Task 6 – Regular Expressions
Lab 2 – In-Database Machine Learning
[The following tasks should take approx. 35-40mins to complete including some time to RTFM!]
Check out the book chapter from my Oracle Data Mining book
Task 1 – Loading the data set
Load the required data sets into your schema. You will need to download the following files and save them to your environment. Then load these into your Schema using SQL Developer.
Training Data Set = mining_data_build_v.sql
Create two tables that have the same structure as MINING_DATA_BUILD_V table. These following tables should contain No data.
Example SQL to create these tables with no records
create table mining_data_test_v as select * from mining_data_build_v where rownum < 1;
Now load the data into each of these tables.
Task 2 – Creating the settings table
Create the Settings table for a Decision Tree model
-- Create the settings table CREATE TABLE decision_tree_model_settings ( setting_name VARCHAR2(30), setting_value VARCHAR2(30));
-- Populate the settings table -- Specify DT. By default, Naive Bayes is used for classification. -- Specify ADP. By default, ADP is not used. BEGIN INSERT INTO decision_tree_model_settings (setting_name, setting_value) VALUES (dbms_data_mining.algo_name,dbms_data_mining.algo_decision_tree); INSERT INTO decision_tree_model_settings (setting_name, setting_value) VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on); COMMIT; END;
Task 3 – Creating a Decision Tree
Create the model for the decision tree, with AFFINITY_CARD as the target attribute/feature.
BEGIN DBMS_DATA_MINING.CREATE_MODEL( model_name => 'Decision_Tree_Model2', mining_function => dbms_data_mining.classification, data_table_name => 'mining_data_build_v', case_id_column_name => 'cust_id', target_column_name => 'affinity_card', settings_table_name => 'decision_tree_model_settings'); END;
Task 4 – Exploring the meta-data for the models
Use the following commands to explore the meta-data about the Decision Tree model.
Can you explain what the purpose is for each of these commands.
-- describe the model settings tables describe user_mining_model_settings
-- List all the ODM models created in your Oracle schema => what machine learning models you have created SELECT model_name, mining_function, algorithm, build_duration, model_size FROM user_MINING_MODELS;
-- List the algorithm settings used for your machine learning model SELECT setting_name, setting_value, setting_type FROM user_mining_model_settings; -- WHERE model_name in 'DECISION_TREE_MODEL2';
-- List the attribute the machine learning model uses. It may use a subset of the attributes. -- This allows you to see what attributes were selected SELECT attribute_name, attribute_type, usage_type, target from all_mining_model_attributes where model_name = 'DECISION_TREE_MODEL2';
You have now explored the machine learning models in your Oracle schema.
NOTE: SQL is not a graphic language. You cannot visualise the decision tree like in other graphical based tools.
But if you explore more of the DB objects created behind the scenes, you might be able to work this out.
This is not necessary for these exercises.
Task 5 – Generating the Confusion Matrix
First we need to apply the model to the test data set
-- create a view that will contain the predicted outcomes => labeled data set CREATE OR REPLACE VIEW demo_class_dt_test_results AS SELECT cust_id, prediction(DECISION_TREE_MODEL2 USING *) predicted_value, prediction_probability(DECISION_TREE_MODEL2 USING *) probability FROM mining_data_test_v;
-- Select the data containing the applied/labeled/scored data set -- This will be used as input to the calculation of the confusion matrix SELECT * FROM demo_class_dt_test_results;
Now we generate the confusion matrix.
NB: Make sure to have server output turned on in SQL Developer
DECLARE v_accuracy NUMBER; BEGIN DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX ( accuracy => v_accuracy, apply_result_table_name => 'demo_class_dt_test_results', target_table_name => 'mining_data_test_v', case_id_column_name => 'cust_id', target_column_name => 'affinity_card', confusion_matrix_table_name => 'demo_class_dt_confusion_matrix', score_column_name => 'PREDICTED_VALUE', score_criterion_column_name => 'PROBABILITY', cost_matrix_table_name => null, apply_result_schema_name => null, target_schema_name => null, cost_matrix_schema_name => null, score_criterion_type => 'PROBABILITY'); DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4)); END;
— select the results that are now stored in the table created during the Confusion Matrix function
SELECT * FROM demo_class_dt_confusion_matrix;
Task 6 – Applying the model to new data – Bulk processing
Let us now apply the model to new data that has already been gathered in a table. With bulk processing we want to generate a labeled data set that contains the predicted value by the machine learning model.
BEGIN dbms_data_mining.apply( model_name => 'Decision_Tree_Model2', data_table_name => 'MINING_DATA_APPLY_V', case_id_column_name => 'CUST_ID', result_table_name => 'NEW_DATA_SCORED'); END;
Inspect the results in the NEW_DATA_SCORED table and write some queries to compare the results.
You will need to refresh the object tree on the left hand side of SQL Developer to get this new table appearing.
Task 7 – Applying the model to new data – Real-time processing
Use the PREDICTION and PREDICTION_PROBABILITY functions to label data in real-time mode.
SELECT cust_id, PREDICTION(Decision_Tree_Model2 using *) FROM MINING_DATA_APPLY_V;
SELECT cust_id, PREDICTION(Decision_Tree_Model2 using *), , PREDICTION_PROBABILITY(Decision_Tree_Model2 using *) FROM MINING_DATA_APPLY_V WHERE rownum <=5;
SELECT cust_id FROM MINING_DATA_APPLY_V WHERE PREDICTION(Decision_Tree_Model2 using *) = 1 AND rownum <=10;
Task 8 – Using the model for What-if analysis
Now try out doing some What-If analysis using the machine learning model.
select PREDICTION_PROBABILITY (Decision_Tree_Model2, 0 USING 46 as AGE, 3 as HOUSEHOLD_SIZE, 'Assoc-A' as EDICATION, 'M' as CUST_GENDER, 'MARRIED' as CUST_MARITAL_STATUS, 'Crafts' as OCCUPATION) Pred_Prob from dual;
Lab 3 – Work on your SQL assignment
Use whatever time you have available to work on your SQL assignment.