WWD_WK3

Introduction

This week we will continue our exploration knowledge of SQL by looking at some slightly more complex SQL Queries, how to analyze data and explore analytic SQL Functions.

This week we will look at:

  • Analyzing data using basic SQL statistical functions
  • Look at a subset of the analytical and window SQL functions

Notes

Click here to download the Analytical SQL Functions notes – Part 1.

Click here to download the Analytical SQL Functions notes – Part 2.

Lab Exercises

Exercise 1 – Analyzing data

Use the statistical function to answer the following question, using the EMPT, DEPT, etc tables.

[The following exercises will take approx 25-35 minutes to complete]

1-1 – What is the maximum and minimum salary

1-2 – What is the difference between the maximum and the average salary

1-3 – Count the number of employees per grade

1-4 – What is the average salary per department

1-5 – Select the department and number of employees,  who have a department count greater than 2

1-6 – List the employees who have a salary greater than the average salary

1-7- Use the SUMMARY package to gather statistics

Use the in-database package to gather summary statistics about one of the attributes in a table. This package and function is called DBMS_STAT_FUNCS.SUMMARY.  Check out my blog post about this function.

Exercise 2 – Analytical SQL Functions

This lab exercise requires you to practice using some of the hundreds of analytical SQL functions. In addition to the exercises/tasks listed below, you should spend some time exploring and experimenting with the many of these analytical SQL functions

[Open the Oracle SQL Language Reference Manual = RTFM]
[Also see the examples in the slides]

[50-75 minutes to complete Tasks 1-6. Complete remaining Tasks if time or as homework]

Task 1 – Practice using basic SQL Stats functions

Experiment with using the following stats functions in SQL, on the data sets you have in your Oracle Schema

AVG(): returns the mean
COUNT(): returns the population (or sample, depending on the row source)
POWER(a,n): returns the value of a to the nth power
SQRT(n): returns the square root of n
SUM(): returns the sum of the values in a set
STDDEV(): returns the standard deviation of a sample
STDDEV_POP(): returns the standard deviation of a population
VARIANCE(): returns the variance of a sample
VAR_POP(): returns the variance of a population
CORR(): correlation analysis

Task 2 – Having Clause

Take the SQL stats functions used in Task 1 and modify them to include the HAVING Clause.

You will need to think carefully on How and Why you are using the HAVING clause in each case.

What are you trying to show or demonstrate in each case.

Write 1-3 line description of what the benefit is for using the HAVING clause for each example.

Task 3 – Sampling data

For one of your larger tables, containing the largest number of records, compare the results and records generated by using the SAMPLE and ORA_HASH functions, for the following sample sizes.

Sample = 5%
Sample = 10%
Sample = 15%
Sample = 20%

What differences can you find and/or observations that you see. Can you explain these?

Task 4 – PIVOT

Write a query that returns the average salary per department (full name)

Rewirte the query using the PIVOT command to display the display the departments as columns in the output.

Task 5 – LEAD and LAG

Write a query that returns some of the details of each employee, ordered by salary in ascending order

Alter the query to add an additional column that displays the salary of the previous person in the results set

Alter the query to add an additional column that displays the salary of the next person in the results set

Add the following columns
– Difference in salary between the employee and the person who earns just below them
– Difference in salary between the employee and the person who earns just more than them

Task 6 – Ranking Staff

Using your basic query that returns the details of employees and their salary
– Use the RANK function applied to all the data
– Use the PERCENT_RANK function applied to all the data
– Use the CUME_DIST function applied to all the data

Using the PARTITION BY clause change the above queries to rank the results based on each department.

Exercise 3 (Optional)

Task 1
Perform some regression analysis on the Employee salary data using the following function
REGR_SLOPE
– REGR_INTERCEPT
– REGR_COUNT

Task 2
Alter these questions to partition the results by
– job_id
– department

These should be done as separate queries

Task 3
Use the ROLLUP function, on your employee salary query, to calculate aggregated results by
– department
– department and job_id

Modify your previous query to use the CUBE function install of the ROLLUP function.
Examine how the results are different between the usage of the ROLLUP and CUBE functions.
Can you think of other scenarios when you would use one of these functions instead of the other.

Additional Reading/Resources

Oracle SQL Language Reference – Functions chapter
Fundamentals of PL/SQL
ANSI SQL vs Traditional SQL: how to write JOINS => two versions of the same standard

7 Reasons Why Using SELECT * FROM TABLE in SQL Query Is a Bad Idea