Overview of this Lecture / week
This week we will be concentration on non-traditional databases that are typically associated with the term NoSQL Databases. NoSQL is a bit of an odd term as its history, its meaning and what it is used for today are very different. Historically NoSQL referred to databases that did not use SQL. Because the SQL language was evil. Then they realised how powerful SQL was and started to include the SQL language in their databases. In more recent times NoSQL Databases typically relate to non-Relational Databases. We will go through the CAP theorem, and the main categories of NoSQL databases, looking at the advantages and disadvantages of each. We will finish off on NewSQL databases, some of the features they have and what the future holds.
There will be some related lab exercises that require you to use the Oracle Virtual Machine setup and used in previous weeks.
Read before class, and try to ignore the “marketing” side of this article
NotesL3 - NoSQL Databases
This weeks lab consists of 3 parts. You will need to use the Oracle Virtual Machine to try out these features as the require administrator access to the database.
1. Using External Data
To use External Data in your database you can create a link to a file and allow the database to treat this file as a table. This way you can use SQL to query data in a table on the file system (of the database server). Follow these tutorials and then follow the exercise below.
Go to the UCI Machine Learning Repository. Inspect the various data sets that are available.
Select one of these data sets that is in traditional CSV format. (If you are not sure what a CSV format is then Google it)
Download the CSV data file onto the Virtual Machine.
Make the necessary changes to the Database and add the download files as an external table.
Write some queries to explore the data in the file.
2. Columnar Database Features
In this exercise you will explore the in-database columnar features of the database. This is a powerful feature that can dramatically increase query response times. But you are limited by what memory you may have available on the server. Follow the tutorials and then complete the exercises.
Your task now is to test the in-memory column store. In this task you will query a traditional table and time how long it takes to run the query. Then push this table to being in-memory and test the query again to see what the difference is.
Create a table have the same attributes as the external table created in the previous exercise.
Copy all the data from the external table into the new table.
Write several queries that perform some analytics on this data in the new table and time how long the queries take. If the queries finish very quickly, within a second or two, then you need more detail. Make additional copies of the data and insert into the new table (i.e. add the data again and again and again)
When your queries are taking a few seconds (>5 seconds and ideally you have some that take >30 seconds or minutes), you can then reconfigure the table to put it into memory.
Then re-run the queries and time how long they take. What are the differences?
3. In-Database JSON
Check out the following tutorials on using JSON in Oracle. Complete these before considering the exercise below.
Looking back on the first ER diagram exercise. Convert the Employment History entity/table to use a combination of traditional columns as well as a JSON object to story previous employments.
Enter some data into the table containing the JSON object and write some queries on this data.
Look at what indexes can be added to this JSON object to improve performance.
Additional Reading Materials