Welcome to the Programming for Big Data module.

This is an optional module on the MSc in Computing, for the tracks on Advanced Software Development and Data Analytics.

The module is divided into 3 main components and each component will have 4 weeks of lectures and lab work. Each component will have a different lecturer.

1. Hadoop and MapReduce (lecturer = Brendan)

2. Programming with Spark (lecturer = Bojan)

3. NoSQL Databases (Redis, Cassandra, MongoDB, Elasticsearch) (lecturer = Lucas)

Each class will be a mixture of lecture, in-class exercises, research, independent learning, etc.

Students are expected to work independently and to have the necessary programming skills (Java, etc.) and technical skills (working with Virtual Machines, Linux, etc.) for this module.

Make sure to check out WebCourses for links to all the materials and assessments for this module. Only the notes and materials for the Hadoop & MapReduce component are available on this website.

Module Assessments

The module is 100% continuous assessment, so there is no exam. However, the class exercises involve a lot of work, and each component of the module has its own independent assessment, so expect a substantial amount of assignment work.

  • Hadoop & MapReduce assessment = 35%
  • Spark assessment = 35%
  • NoSQL assessment = 30%

Module Overview & Admin Notes

Read and understand these notes.

Video Correction – NoSQL will be covered by Lucas Rizzo

Module Pre-requisites

Notes on the pre-requisites for the module. Be prepared for the module, and check that you have the necessary skills and background.

What to do before the first class

Install VirtualBox software.

Download the pre-built Virtual Machine (VM). I will show you how to install and use this VM during the first week's class.

This is an 8 GB download, plus extra disk space for the VM. You will need a minimum of 2 GB of RAM available to run the VM.

Docker: If you prefer working with Docker, try out the pre-built Docker images on the Docker Hub Store.