Welcome to the Programming for Big Data module.
This is an optional module on the MSc in Computing, for the tracks on Advanced Software Development and Data Analytics.
The module is divided into 3 main components, each with 4 weeks of lectures and lab work. Each component will have a different lecturer.
1. Hadoop and MapReduce (lecturer = Brendan)
2. Programming with Spark (lecturer = Bojan)
3. NoSQL Databases (Redis, Cassandra, MongoDB, Elasticsearch) (lecturer = Lucas)
Each class will be a mixture of lecture, in-class exercises, research, independent learning, etc.
It is expected that students can work independently and have the necessary programming skills (Java, etc.) and technical skills (working with virtual machines, Linux, etc.) for this module.
Make sure to check out WebCourses for links to all the materials and assessments for this module. Only the notes and materials for the Hadoop & MapReduce component are available on this website.
The module is 100% continuous assessment, which means there is no exam. However, there is a lot of work for class exercises, and each component of the module has its own independent assessment, so expect a lot of assignment work.
- Hadoop & MapReduce assessment = 35%
- Spark assessment = 35%
- NoSQL assessment = 30%
Module Overview & Admin Notes
Video Correction – NoSQL will be covered by Lucas Rizzo
What to do before the first class
This is an 8GB download, plus extra space for the VM. You will need a minimum of 2GB of RAM available to run the VM.
Docker: If you like working with Docker, try out the pre-built Docker images on the Docker Hub Store.