This week's topic is the ‘Hadoop Eco-System’. The Hadoop ecosystem includes Hive, HBase, Sqoop, Pig, Mahout and a few more products. We will cover these topics, along with a few more relating to data processing languages and how Hadoop is being integrated into other Database and Data Management enterprise architectures.

Discussion – Read the following and discuss in class: Hadoop is Failing (or is it really)?
– Make sure to read the comments.

Also have a read of Hadump, meaning data dumped into Hadoop with no plan.

FAQ: Check out the questions and suggestions from previous students.

Notes

Click here to download the notes.

L4-Hadoop-EcoSystem

Lab Exercises

Lab time this week can be used for the following.

– Complete all lab exercises from previous weeks.

– Work on the assignment.

Additional Materials & Reading

NoSQL keeps rising, but relational databases still dominate big data
The Hadoop Ecosystem – Table summarising products
A Plethora of Data Set Repositories
51 Database terms to know
The Secret Life of SQL and its Longevity
Relational Databases are far from dead — just ask Facebook

Some links on Spark (your next topic/component of the module)

Apache Spark
Installing Scala & Spark on a Mac
Scala Cheat Sheet