PBD_wk2

In this class we will look at the basics of the Map-Reduce process, its different components, and how they all work together.

The lab exercises aim to have you build and run your first Map-Reduce process.
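To make the components concrete before the lab, here is a minimal sketch of the three phases (map, shuffle, reduce) applied to a word count. It is plain Java and does not use the Hadoop API; the class name MiniMapReduce and its method names are hypothetical, chosen only for illustration. A real Hadoop job distributes these phases across a cluster, whereas this sketch runs them in a single process.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of map -> shuffle -> reduce.
// This is NOT the Hadoop API; it only mirrors the shape of the process.
class MiniMapReduce {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key.
    // A TreeMap keeps the keys sorted, so the output order is predictable.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the list of values for one key into a total.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run the three phases in order over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "that is the question");
        // prints {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
        System.out.println(wordCount(lines));
    }
}
```

In the lab you will write the map and reduce steps yourself, while the framework handles the shuffle step and the distribution of work.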

FAQ: Check out the questions and suggestions from previous students.

Notes

Click here to download the notes.

Lab Exercises

Click here to download the lab exercises.

Follow the notes very carefully. There are some configuration settings; if you miss one of these or set one incorrectly, the code will not compile and/or will not run. See the sample code.

Complete Exercise 1 and Exercise 2 before the next class.

Exercise 1 is the same as, or very similar to, the tutorial on the Hadoop website. Check out this webpage for more details; it is also an alternative location for the sample code.

WARNING: Be careful about which version of Hadoop each tutorial or example uses. Package and function names, and their functionality, can differ slightly between Hadoop versions (for example, the older org.apache.hadoop.mapred API versus the newer org.apache.hadoop.mapreduce API). RTFM! (Read The Fabulous Manuals), i.e. the documentation!

Other tutorials – Alternative Lab Exercises

Click here to download the lab exercises.

Additional Reading and Materials
Sample code
Shakespeare data set – gdrive link
Shakespeare data set – dropbox link
Shakespeare data set – website link
Hadoop in Action – Some Case Studies
10 Hadoop Tutorials