ADVDB_WK5

Overview of this Lecture / Week

Last week we commenced our exploration of Database Query Optimisation. The focus was on introducing the topic and covering the main indexing techniques provided by enterprise databases and by most open-source databases.

This week we will look at Cost Based Optimisation. This requires a fully implemented data model; the database gathers statistics about the data contained in each of the tables and about the constraints between them. We will also look at some physical structures for tables and at how in-memory data storage and processing can be used.
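
As a quick illustration (a minimal sketch; the table name is only an example), the statistics that the cost-based optimiser relies on are gathered in Oracle with the DBMS_STATS package:

begin
   -- gather row counts, column value distributions and histograms for one table
   dbms_stats.gather_table_stats(ownname => user, tabname => 'SALES',
                                 method_opt => 'for all columns size auto');
   -- or gather statistics for every table in the current schema:
   -- dbms_stats.gather_schema_stats(ownname => user);
end;
/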

Notes

Click here to download the notes.

L4 - Query Optimisation - Part 2

Videos of Notes

Check out the videos by Neil Chandler on various Optimizer topics.

Check out the detailed notes and videos on Partitioning here.

Lab Exercises

Task 1 – How Fast is your Database

In this task you will get to test your Oracle Database (Autonomous, VM, etc.) to see how many transactions per second it can process. This task is based on a blog post by Connor McDonald. Have a read of his blog post before starting the task.

Ideally you would perform the test using both the Autonomous Database and the Database on a VM. If you have other database installs, say a direct install on your computer, then great.

I’ve taken the code provided in Connor’s blog post and made some corrections and minor additions to it.

--
-- Some parent tables
--
create table product as select rownum pid, 'product '||rownum descr
from dual connect by level <= 100000;
alter table product add primary key ( pid) ;

create table customer as select rownum cid, 'customer '||rownum descr
from dual connect by level <= 100000;
alter table customer add primary key ( cid) ;

create table store as select rownum sid, 'store '||rownum descr
from dual connect by level <= 100000;
alter table store add primary key ( sid) ;

create table taxcode as select rownum tid, 'taxcode '||rownum descr
from dual connect by level <= 100000;

alter table taxcode add primary key ( tid) ;

create sequence txnseq;
--
-- And my sales table that refers back to them
--
create table sales
( txn_id int default txnseq.nextval not null ,
pid int not null references product(pid),
cid int not null references customer(cid),
sid int not null references store(sid),
tid int not null references taxcode(tid),
discount varchar2(1) not null,
amount number(10,2) not null,
qty number(10) not null,
constraint chk1 check ( discount in ('Y','N')),
constraint chk2 check ( amount > 0),
constraint chk3 check ( qty > 0)
);

alter table sales add primary key ( txn_id )
using index global partition by hash ( txn_id ) partitions 8;
-------------------------------------------------------------
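
--
-- Optional check (an addition to the original post): the primary key index
-- should have 8 hash partitions. Because txn_id comes from a sequence, hash
-- partitioning the index spreads concurrent inserts across partitions instead
-- of all sessions competing for the same right-hand-edge index blocks.
--
select index_name, partition_name, status
from   user_ind_partitions
where  index_name in ( select index_name from user_indexes
                       where  table_name = 'SALES' );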

create table hammer_log ( job int, ela interval day to second );

create or replace
procedure hammer_time is
   p sys.odcinumberlist := sys.odcinumberlist();
   c sys.odcinumberlist := sys.odcinumberlist();
   s sys.odcinumberlist := sys.odcinumberlist();
   t sys.odcinumberlist := sys.odcinumberlist();

   timer timestamp;
begin
   -- pick 32,000 random parent keys for each of product, customer, store and taxcode
   select trunc(dbms_random.value(1,100000))
   bulk collect into p
   from dual connect by level <= 32000;

   select trunc(dbms_random.value(1,100000))
   bulk collect into c
   from dual connect by level <= 32000;

   select trunc(dbms_random.value(1,100000))
   bulk collect into s
   from dual connect by level <= 32000;

   select trunc(dbms_random.value(1,100000))
   bulk collect into t
   from dual connect by level <= 32000;

   timer := localtimestamp;
   -- 3 x 32,000 = 96,000 single-row inserts, each followed by its own commit
   for j in 1 .. 3 loop
      for i in 1 .. 32000 loop
         insert into sales ( pid,cid,sid,tid,discount,amount,qty)
         values (p(i), c(i), s(i), t(i), 'Y',i,i);
         commit;
      end loop;
   end loop;

   -- record this job's id and its elapsed time
   insert into hammer_log values ( sys_context('USERENV','BG_JOB_ID'), localtimestamp-timer);
   commit;
end;
/

---------------------------------------------------------
delete from hammer_log;

-- submit 12 background jobs that will each run hammer_time concurrently
declare
   j int;
begin
   for i in 1 .. 12
   loop
      dbms_job.submit(j,'hammer_time;');
   end loop;
   commit;
end;
/
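
--
-- Optional (an addition to the original post): check on the background jobs
-- while they run; THIS_DATE is populated for a job that is currently executing.
--
select job, what, this_date, failures from user_jobs;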

select * from hammer_log;

-- transactions per second, assuming the slowest job finished in under a minute
select count(*)*96000 / extract(second from max(ela)) tps
from hammer_log;

select max(ela) from hammer_log;

-- transactions per second allowing for elapsed times of a minute or more:
-- total rows inserted (number of jobs x 96,000) divided by the longest elapsed time in seconds
select count(*)*96000 / (extract(minute from max(ela))*60 + extract(second from max(ela))) tps from hammer_log;
-- 19,228.890954 tps
---------------------------------------------------------

-- drop any foreign key indexes left over from a previous run
-- (these statements will fail harmlessly on the first run, before the indexes exist)
drop index sales_ix1;
drop index sales_ix2;
drop index sales_ix3;
drop index sales_ix4;

create index sales_ix1 on sales ( pid);
create index sales_ix2 on sales ( sid);
create index sales_ix3 on sales ( cid);
create index sales_ix4 on sales ( tid);
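
--
-- Note (an addition to the original post): every insert into SALES must now
-- also maintain these four foreign key indexes, so compare the timing of this
-- second run against the first run without them.
--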
---------------------------------------------------------

delete from hammer_log;
declare
   j int;
begin
   for i in 1 .. 12
   loop
      dbms_job.submit(j,'hammer_time;');
   end loop;
   commit;
end;
/

select * from hammer_log;
select count(*)*96000 / (extract(minute from max(ela))*60 + extract(second from max(ela))) tps from hammer_log;
-- 19,238.647394

Run the above code in each database and take note of the timing for each part.

Compare the timings from each database.

Why are there differences between them? Can you explain why?

Task 2 – Building upon the exercises from last week, try some of the techniques covered this week with the same data set, or use an alternative data set. The techniques to try include those covered in this week's notes, such as partitioning and in-memory storage; a small sketch follows.
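
A minimal sketch (an illustration only; it assumes you reuse the SALES table from Task 1 and that your database edition and licensing allow partitioning and the In-Memory column store):

-- a hash-partitioned copy of the sales table
create table sales_part
partition by hash (txn_id) partitions 8
as select * from sales;

-- mark the table for population into the In-Memory column store
alter table sales inmemory;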

Additional Reading

Jonathan Lewis – Changing the order of the tables in the FROM clause can affect performance