At the end of last week you had your first Map-Reduce process running on the VM. Hopefully you explored some of the monitoring pages to track the progress of the MR process and to see some of the results.

Getting MR set up and running can be a little difficult, but once you have it set up, it becomes a simple case of following the same process each time.

Take care over what processing you put into the Mapper and what you put into the Reducer. Think carefully about which data processing and data transformations are needed at each step.
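To make the Mapper/Reducer split concrete, here is a minimal sketch of a word count written in plain Java with no Hadoop dependency. It is not the lab code: the class name `MapReduceSketch` and its methods are hypothetical and only imitate the map, shuffle and reduce steps in memory. The point is that the map step does per-record work only, while the reduce step does the aggregation that needs every value for a key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of a Map-Reduce word count (not Hadoop API code).
public class MapReduceSketch {

    // Map step: per-record work only. Tokenise one line and emit (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Shuffle step: group all emitted values by key (the framework does this for you).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: aggregation that needs every value for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map, shuffle and reduce over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(mapped).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {cat=1, dog=1, sat=2, the=2}
        System.out.println(run(Arrays.asList("the cat sat", "the dog sat")));
    }
}
```

If a transformation can be applied to one record in isolation (cleaning, tokenising, filtering), it most likely belongs in the map step. If it needs all the values for a key (sums, averages, maximums), it belongs in the reduce step.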

This week we will look at some of the more advanced features of the Map-Reduce process. They are not really advanced, but they allow us to create more feature-rich Map-Reduce processes. They include Counters, Combiners, Partitioners, reading and writing different data formats, and creating chained Map-Reduce processes. There will be exercises for some of these features.
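As a rough illustration of three of these features, the sketch below imitates a Counter, a Combiner and a Partitioner in plain Java, again with no Hadoop dependency. The class `FeatureSketch`, its method names and the counter name `EMPTY_LINES` are all hypothetical. The partition formula is assumed to mirror the usual hash-based approach (hash of the key, made non-negative, modulo the number of reducers). The lab exercise uses the real Hadoop classes instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of a Counter, a Combiner and a Partitioner.
public class FeatureSketch {

    // Map + Combiner for one input split: instead of emitting (word, 1) for every
    // token, sum the counts locally so less data is sent to the reducers.
    // The counters map plays the role of a Counter: a named tally for monitoring.
    static Map<String, Integer> mapAndCombine(List<String> split, Map<String, Long> counters) {
        Map<String, Integer> local = new TreeMap<>();
        for (String line : split) {
            if (line.trim().isEmpty()) {
                counters.merge("EMPTY_LINES", 1L, Long::sum); // count skipped records
                continue;
            }
            for (String token : line.trim().toLowerCase().split("\\s+")) {
                local.merge(token, 1, Integer::sum);
            }
        }
        return local;
    }

    // Partitioner: decides which reducer receives a key. The same key always
    // maps to the same reducer, and the result is always in [0, numReducers).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new TreeMap<>();
        Map<String, Integer> combined =
                mapAndCombine(Arrays.asList("the cat the hat", "", "the end"), counters);

        System.out.println(combined);  // prints {cat=1, end=1, hat=1, the=3}
        System.out.println(counters);  // prints {EMPTY_LINES=1}
        for (String key : combined.keySet()) {
            System.out.println(key + " -> reducer " + partition(key, 2));
        }
    }
}
```

A local sum like this is only safe because addition gives the same answer whether it is applied once or in stages. An operation such as an average cannot be combined this way without carrying extra information (for example a sum and a count).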

The assignment for this component of the module will be given out this week. It will be due in Week 6.

FAQ: Check out the questions and suggestions from previous students.

Notes

Click here to download the notes.

L3-Hadoop 2

Lab Exercises

Make sure you have completed all lab exercises, including those from previous weeks, before you commence work on the assignment.

Click here to download the Lab Exercise.

Lab5-More_MR_Process

Assignment

See WebCourses for the assignment details.

The assignment is due during Week 6 of the semester. No extensions are possible, as they would affect the other components of the module and their assignment submission dates.

Additional Material

This code might be useful for your assignment! TextPairs Java code