L14-2-Step_Wise_Regression

The previous regression examples covered Linear and Logistic Regression.
This example looks at a simple implementation of Step-Wise Regression.
It is a common method used in many industries,
but it does have some associated issues.

Let’s use the Titanic data set.
The following code loads the data and runs through some data preparation steps.
Check out the previous examples for more details on these steps.

import pandas as pd
import warnings

warnings.filterwarnings("ignore")
titanic = pd.read_csv('/Users/brendan.tierney/Dropbox/titanic/train.csv')
titanic_data = titanic.drop(['PassengerId','Name','Ticket','Cabin'], axis=1)

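def missing_age(cols):
    # Each row arrives as a Series holding the Age and Pclass values
    Age = cols['Age']
    Pclass = cols['Pclass']

    # Fill a missing age with a fixed value chosen by passenger class
    if pd.isnull(Age):
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age

# Now apply this function to the data set and then re-check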
titanic_data['Age'] = titanic_data[['Age', 'Pclass']].apply(missing_age, axis=1)
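# Drop any rows that still contain missing values
titanic_data.dropna(inplace=True)
# One-hot encode the categorical columns
titanic_data2 = pd.get_dummies(titanic_data)
# Sex_female already carries the same information, so drop Sex_male
titanic_data2.drop(['Sex_male'],axis=1,inplace=True)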
titanic_data2.head(10)
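The age imputation above can also be written without a row-by-row apply. The following is a minimal sketch on a small made-up DataFrame (not the Titanic file). The names demo and class_age are illustrative only, and it assumes Pclass only takes the values 1, 2 and 3, so that the final else branch of missing_age corresponds to class 3.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the two Titanic columns used above
demo = pd.DataFrame({
    'Age':    [22.0, np.nan, np.nan, np.nan, 54.0],
    'Pclass': [3, 1, 2, 3, 1],
})

# Fill values per passenger class, copied from missing_age
# (assumption: class 3 is the only class that reaches the final else branch)
class_age = {1: 37, 2: 29, 3: 24}

# Map each row's class to its fill value and use it only where Age is missing
demo['Age'] = demo['Age'].fillna(demo['Pclass'].map(class_age))
print(demo)
```

Rows that already have an age are left unchanged, because fillna only touches the missing entries.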

We can now set up the data.

from sklearn import linear_model
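# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn import model_selection as cross_validation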
from sklearn import preprocessing
import numpy as np

X_train = titanic_data2.drop('Survived', axis = 1)
Y_train = titanic_data2['Survived']

#create a list containing the column names
col_names = X_train.columns

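Now create a loop that adds one variable/column at a time and, at each step, prints the mean and standard deviation of the 5-fold cross-validation scores (R², the default score for LinearRegression).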

consider = []
i=1
for col in col_names:
    consider.append(col)
    print("Iteration "+str(i)+": including : "+str(consider))
    i+=1

    train = X_train[consider]
    # as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
    X = train.to_numpy()
    Y = Y_train.to_numpy()

    lm_model = linear_model.LinearRegression()
    lm_scores = cross_validation.cross_val_score(lm_model, X, Y, cv=5)
    print ("Step-wise Model: mean="+str(np.mean(lm_scores))+" std="+str(np.std(lm_scores)))
    print('')
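
The loop above adds the columns in the order they appear in the data set. A common variation is greedy forward selection, where each step tries every remaining column and keeps the one that gives the best cross-validated score, stopping when nothing improves it. The following is a rough sketch of that idea rather than a definitive implementation. The function forward_select and the synthetic data (X_demo, y_demo) are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, cv=5):
    """Greedy forward selection: return (chosen columns, mean CV score per step)."""
    remaining = list(X.columns)
    chosen, history = [], []
    best_so_far = -np.inf
    while remaining:
        # Score every candidate column when added to the current set
        scores = {
            col: cross_val_score(LinearRegression(), X[chosen + [col]], y, cv=cv).mean()
            for col in remaining
        }
        best_col = max(scores, key=scores.get)
        # Stop when no candidate improves the cross-validated score
        if scores[best_col] <= best_so_far:
            break
        best_so_far = scores[best_col]
        chosen.append(best_col)
        remaining.remove(best_col)
        history.append(best_so_far)
    return chosen, history

# Synthetic data (not the Titanic set): y depends on 'a' and 'b' only
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 4)),
                      columns=['a', 'b', 'noise1', 'noise2'])
y_demo = 3 * X_demo['a'] + X_demo['b'] + rng.normal(scale=0.1, size=200)

chosen, history = forward_select(X_demo, y_demo)
print("Selected columns:", chosen)
print("Mean CV score after each step:", history)
```

To try it on the data prepared above, forward_select(X_train, Y_train) should work as long as every column in X_train is numeric. Because the same cross-validation scores are used to choose the columns and to report the result, the final score is likely to be optimistic.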