Students are reminded that notes provided on this site are intended to form summary material only and are not intended to be a substitute for attending lectures or further reading on the subject.

My slides are not Lecture Notes

Students should download the notes to their own device. The notes are a living artefact and will evolve from semester to semester. It cannot be guaranteed that the notes will be available after the end of a semester.

This module is 100% continuous assessment. There is a large practical aspect to this module. It is expected that students can work independently and have the necessary programming (Java, etc.) and technical skills (working with Virtual Machines, Linux, etc.) for this module.

Before you decide to take this module, have a read of the Module overview and watch the following videos.
This will let you know what is involved in this module, what is expected of you, and how you can prepare
for the module, especially the first class.

Module Overview video  (& notes)
Module Pre-requisites (for Hadoop part) : video (& notes)
What to do before first class : video (& notes)

Please be mindful of your location and any people nearby before playing any of the videos
Always use headphones

Week of    Lecture Topic    Lab Work    Additional Information
Module Overview (video)

Module Pre-requisites (video)

What to do before first class (video)


Introduction to Hadoop & Overview (Notes)

Managing your VM
Make sure you regularly clear out the temp files and empty the bin.
The VM has a small disk size and careful management of the available space is necessary.

If you need to change the size of the disk for the VM, you can follow the instructions here. Make sure you follow them very carefully.

Week 1 FAQ / Q&As

Week 1 Lab Exercises & Videos



Link to 64-bit VM
Link to 32-bit VM



Video: What is Hadoop & Map-Reduce

Link to the Hadoop VM will be given in class and on WebCourses

Google white paper : Google File System => HDFS

Google white paper :


Hadoop Website

Hadoop Documentation

Hadoop APIs

HDFS Commands Cheat Sheet

Hadoop in Action

Relational databases are far from dead — just ask Facebook

Microsoft CEO Satya Nadella reveals which product he wishes the company had developed first

29th Jan
This week's topic is available online only.
There will be no in-person class this week. All lecture notes and lab exercises are available on this web page. Videos are provided that cover the lecture notes and the lab exercises.

Map-Reduce – Part 1


Week 2 FAQ / Q&As



Week 2 Lab Exercises & Videos

Sample code

Shakespeare data set

Hadoop in Action – Some Case Studies


10 Hadoop Tutorials

5th Feb
Map-Reduce – Part 2


Week 3 FAQ / Q&As

Week 3 Lab Exercises & Videos

Assignment Handout
– see WebCourses for details including
– assignment handout
– details of data set
– any additional instructions

This code might be useful for your assignment!

TextPairs Java code

12th Feb
Hadoop Eco-System & Data Storage
Lots of Hadoop-related products!

Week 4 FAQ / Q&As

Discussion – Read the following and discuss in class
Hadoop is Failing (or is it really)?
– Make sure to read the comments

Hadump – meaning data dumped into Hadoop with no plan

No lab work this week. Use the time to work on your assignment.


NoSQL keeps rising, but relational databases still dominate big data

The Hadoop Ecosystem – Table summarising products

A Plethora of Data Set Repositories

51 Database terms to know

The Secret Life of SQL and its Longevity

26th Feb
Hadoop Assignment Due this week.

Due Date = 28th February at 11pm

Must be submitted on WebCourses.

Results and feedback during weeks 7/8/9 of semester via WebCourses.

If you would like to meet to discuss your results, we can set up an appointment.

Apache Spark

Installing Scala & Spark on a Mac

Scala Cheat Sheet