Overview of this Lecture / week

What happens when you run a SQL query? What does the database do to my SQL query? How is my SQL query executed? Can I make my SQL queries go quicker?

These are many of the questions and challenges that Data Architect will be asked. Somethings the Database Administrator (DBA) will be asked these questions. Sometimes it will be the Data Architect and DBA to figure our the best options and then implement them in the database.

Over the course of this week and next week we will look at all of these questions. Each database will have their own specific way of accessing all of the requirement information and their own why of implementing query and database performance features. The features covered over these two weeks will cover the type of features in most Enterprise level Databases. Some of these feature are available in non-enterprise DBs and a small number in open source databases.

There will be some lab exercises. These will allow you to explore the various features covered in the notes/slides.

You will need the Oracle Pre-built Virtual Machine to complete these exercises. You may need to load some data into the database.

Notes

Click here to download notes.

L3 - Query Optimisation - Part 1

Videos of Notes

None available at this time

Lab Exercises

This lab exercise looks at exploring some of the indexing and performance features.

Task 1 – Explore the tables and objects created for your previous ER Diagrams
Example the tables and associated objects there were created in your Oracle Schema when you ran the DDL script for the ER diagrams you created back in Week 8.

The following database data dictionary views maybe helpful.

desc user_objects
desc user_tables
desc user_tab_columns
desc user_tab_comments
desc user_constraint
desc user_cons_columns
desc user_indexes
desc user_ind_columns

Examine these data dictionary views and use the information available in these to exaime what exists in the database for your objects.

Examine what indexes and other constraints that are missing from your objects and model.

Task 2 – What modifications could you make to the ER Diagram
Using the information gathered in the previous task, update the ER diagram to include the necessary indexing and constraints.

Generate the updated DDL and re-create the objects in your Oracle Schema.

Re-examine the data dictionary views to determine if the additional indexes and constraints have been created in your schema.

Task 3 – Explore what happens when you run the following code
The following code illustrate a scenario where a set of tables have been poorly defined and modeled. Follow the steps below and examine the outputs at Step 3, Step 5 and Step 7. You will need to work out why the changes in the performance have occurred and what particular newly created objects resulted in this improved performance.

Create an ER diagram based on the information provided in the following steps.

Step 1 – Let’s create some tables

set autotrace off;
drop table sales_v2;
drop table customers_v2;
drop table stores_v2;

create table stores_v2
( store_id int,
name varchar2(20),
address varchar2(100),
country varchar2(20)
);

create table customers_v2
( cust_id int,
name varchar2(100),
signup date,
creditlimit int,
vip varchar2(1),
store_id int
);

create table sales_v2
( sales_id int,
cust_id int,
tstamp timestamp,
amount number(10,2),
prod_id int
);

Step 2 – Now create some data in these tables

insert /*+ APPEND */ into stores_v2
select rownum, 'store'||rownum, 'address'||rownum , 'INDIA'
from dual
connect by level <= 50;

insert /*+ APPEND */ into customers_v2
select rownum, 'cust'||rownum, sysdate-720+mod(rownum,500), dbms_random.value(1,100),
case when mod(rownum,10)=0 then 'Y' else 'N' end, mod(rownum,50)+1
from dual
connect by level <= 5000;

insert /*+ APPEND */ into sales_v2
select rownum, 1+mod(rownum,5000), sysdate-720+rownum/(1000000/720), rownum/1000, mod(rownum,100)
from dual
connect by level <= 1000000;

commit;

Step 3 – Now query this data and examine how the query is/was executed

set autotrace on;

select prod_id, max(amount)
from stores_v2 st,
customers_v2 c,
sales_v2 s
where s.cust_id = c.cust_id(+)
and c.store_id = st.store_id
and s.amount > 10
group by prod_id;

set autotrace off;

You should get results like the following in the ‘Script Output’ tab

>>Query Run In:Query Result 1

Explain Plan

-----------------------------------------------------------

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 55184460

-------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 100 | 2200 | 1186 (3)| 00:00:01 |

| 1 | HASH GROUP BY | | 100 | 2200 | 1186 (3)| 00:00:01 |

|* 2 | HASH JOIN | | 990K| 20M| 1164 (1)| 00:00:01 |

|* 3 | HASH JOIN | | 5000 | 50000 | 12 (0)| 00:00:01 |

| 4 | TABLE ACCESS FULL| STORES_V2 | 50 | 150 | 3 (0)| 00:00:01 |

| 5 | TABLE ACCESS FULL| CUSTOMERS_V2 | 5000 | 35000 | 9 (0)| 00:00:01 |

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

|* 6 | TABLE ACCESS FULL | SALES_V2 | 990K| 11M| 1149 (1)| 00:00:01 |

-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------
2 - access("S"."CUST_ID"="C"."CUST_ID")
3 - access("C"."STORE_ID"="ST"."STORE_ID")
6 - filter("S"."AMOUNT">10)

When you look under the Cost column of the above formatted output you will see that the cost associated with executing this query is 1186. Plus you can see that there are 3 full table scans. This is not good.

Step 4 – Let’s fix this by creating some indexes

create unique index store_ix on stores_v2 ( store_id );
create unique index cust_ix on customers_v2 ( cust_id );
create unique index sales_ix on sales_v2 ( sales_id);
create index sales_ix_cust on sales_v2 ( cust_id);
create index sales_ix_prod on sales_v2 ( prod_id , amount);
Step 5 - Now run the query again - Does the query run quicker?
set autotrace on

select prod_id, max(amount)
from stores_v2 st,
customers_v2 c,
sales_v2 s
where s.cust_id = c.cust_id(+)
and c.store_id = st.store_id
and s.amount > 10
group by prod_id;

set autotrace off;

This time the formatted output looks a little different

>>Query Run In:Query Result 2

Explain Plan

-----------------------------------------------------------

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 1243214705

-------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 100 | 2200 | 1184 (3)| 00:00:01 |

| 1 | HASH GROUP BY | | 100 | 2200 | 1184 (3)| 00:00:01 |

|* 2 | HASH JOIN | | 990K| 20M| 1161 (1)| 00:00:01 |

| 3 | NESTED LOOPS SEMI | | 5000 | 50000 | 9 (0)| 00:00:01 |

| 4 | TABLE ACCESS FULL| CUSTOMERS_V2 | 5000 | 35000 | 9 (0)| 00:00:01 |

|* 5 | INDEX UNIQUE SCAN| STORE_IX | 50 | 150 | 0 (0)| 00:00:01 |

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

|* 6 | TABLE ACCESS FULL | SALES_V2 | 990K| 11M| 1149 (1)| 00:00:01 |

-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------
2 - access("S"."CUST_ID"="C"."CUST_ID")
5 - access("C"."STORE_ID"="ST"."STORE_ID")
6 - filter("S"."AMOUNT">10)

We can see the cost of executing the query has reduced to 1184 and we are still doing two full table scans. This is not good!!!

Step 6 – Now let us add some valuable meta-data

-- create the primary keys
alter table stores_v2 add primary key (store_id );

alter table customers_v2 add primary key (cust_id );

alter table sales_v2 add primary key (sales_id );

-- create some NOT NULL constraints on attributes

alter table sales_v2 modify cust_id not null;

alter table sales_v2 modify prod_id not null;

alter table sales_v2 modify amount not null;

alter table customers_v2 modify store_id not null;

-- create the foreign key constraints

alter table customers_v2 add constraint cust_fk
foreign key ( store_id) references stores_v2 ( store_id );

alter table sales_v2 add constraint sales_fk
foreign key ( cust_id) references customers_v2 ( cust_id );

Step 7 – Did adding this meta-data make any difference to our query

set autotrace on;

select prod_id, max(amount)
from stores_v2 st,
customers_v2 c,
sales_v2 s
where s.cust_id = c.cust_id(+)
and c.store_id = st.store_id
and s.amount > 10
group by prod_id;

set autotrace off;

Run this code and find out ?

Additional Reading

Blog post on creating indexes