This week's topic is the ‘Hadoop Eco-System’. The wider Hadoop environment includes Hive, HBase, Sqoop, Pig, Mahout and a few more products. We will cover these topics, along with a few more relating to data processing languages and how Hadoop is being integrated into other database and data management enterprise architectures.
Discussion – Read the following and discuss in class. Hadoop is Failing (or is it really)?
– Make sure to read the comments
Also have a read of Hadump – meaning data dumped into Hadoop with no plan
Lab time this week can be used for the following.
– Complete all lab exercises from previous weeks.
– Work on the assignment
Additional Materials & Reading
NoSQL keeps rising, but relational databases still dominate big data
The Hadoop Ecosystem – Table summarising products
A Plethora of Data Set Repositories
51 Database terms to know
The Secret Life of SQL and its Longevity
Relational Databases are far from dead — just ask Facebook
Some links on Spark (your next topic/component of the module)