Overview of this Lecture / week

Part of data preparation is exploring and investigating the data. You can learn a lot about what is happening in your data by applying a variety of visualisation techniques and looking at the data in different ways.

This week we will look at Text Mining and the main steps involved in doing it. You will explore techniques for Text Mining in the Machine Learning module.

Notes

Click here to download notes for Week 4 – Text Mining.

L - Text Mining

Videos of Notes

Video of Text Mining and Word Cloud in R. (Ignore the webpage shown at the beginning of the video; everything you need is on this page.)

See Task 2 for a link to the sample code.

Lab Exercises

Task 1 – Load a data set into SAS Enterprise Miner and explore the data

Find a data set from a data set repository. For example, see A Plethora of Data Set Repositories.
There are lots and lots of these.
Pick one of these repositories, find a data set, download it, load it into SAS EM and analyse the data (same as last week).

How to load your own data sets into SAS OnDemand
SAS Guide for loading your own data using SAS Studio

Task 2 – Text Mining & Word Cloud in R

Demo R script for building a word cloud

Text Analysis 101 : Document Classification
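
A word cloud is essentially a picture of word frequencies: the more often a word appears, the larger it is drawn. The demo script for this task is written in R; the sketch below is a separate, minimal illustration of the same counting step in Python, and is not taken from the demo script. The function name `word_frequencies`, the tiny stop-word list and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on", "for", "are"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased words in text, leaving out stop words."""
    # Lower-case the text, then treat each run of letters as one token.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Made-up sample text, used only to show the counting.
sample = "Text mining is the process of mining text. Text is data, and data is everywhere."
freqs = word_frequencies(sample)

# The most frequent words are the ones a word cloud would draw largest.
for word, count in freqs.most_common(3):
    print(word, count)
```

Lower-casing and stop-word removal are typical cleaning steps applied before counting; without them, words such as "the" and "is" would tend to dominate the cloud.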

Task 3 – Optional

Use the code (expanding it if needed) to analyse 3 or 4 webpages from a company,
or (maybe do both if you can)
use the code (expanding it if needed) to analyse and/or compare some news stories from newspaper websites.

Work individually or in pairs.
Discuss the usefulness of WordClouds and how they can give you interesting insights into the topics covered on those websites.
Does the pattern of words match what you would expect?
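
When comparing two pages or news stories, one simple starting point is to check which words they share and which appear in only one of them. The sketch below is a rough illustration in Python rather than part of the R demo script. The helper `vocabulary`, the minimum word length used as a crude stand-in for stop-word removal, and both sample texts are assumptions made up for this example.

```python
import re

def vocabulary(text, min_length=4):
    """Return the set of distinct lower-cased words with at least min_length letters."""
    # Dropping very short words is a crude substitute for a stop-word list.
    return {t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= min_length}

# Two made-up snippets standing in for text taken from two webpages.
page_a = "Our bank offers savings accounts and mortgage advice."
page_b = "The bank reported record profits from mortgage lending."

vocab_a = vocabulary(page_a)
vocab_b = vocabulary(page_b)

shared = vocab_a & vocab_b   # words used on both pages
only_a = vocab_a - vocab_b   # words used only on the first page
only_b = vocab_b - vocab_a   # words used only on the second page

# Jaccard similarity: shared words as a fraction of all distinct words.
similarity = len(shared) / len(vocab_a | vocab_b)

print("Shared:", sorted(shared))
print("Only in A:", sorted(only_a))
print("Only in B:", sorted(only_b))
print("Similarity:", round(similarity, 2))
```

The words unique to each page are often the most interesting ones to discuss, since they hint at what each source emphasises.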

What to prepare for next week

Make sure you are keeping up with all the lab exercises.

We will be back using SAS Enterprise Miner next week.

Additional Reading Materials

Text Analysis 101 : Document Classification

Sentiment Analysis in R Tutorial – Kaggle

Machine Learning basics for a Newbie