
L7 – Pandas

A Pandas DataFrame (called a Panda on this page) is like an Excel spreadsheet or a database table. The data is neatly organised into rows and columns, and the Pandas library provides a large suite of functions and graphs that can be used on the data stored in it.

The following demonstrates how you can use Pandas for storing and analysing data, producing statistics, and producing graphics based on the data in a Panda.
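To make the spreadsheet comparison concrete, here is a minimal sketch that builds a small Panda (a Pandas DataFrame) by hand. The game names and sales figures are made-up values for illustration only; the column names mirror those used later on this page.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the real data set
games = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C'],
    'Platform': ['Wii', 'PS4', 'Wii'],
    'Global_Sales': [1.5, 2.0, 0.5],
})

print(games)        # rows and columns, like a spreadsheet
print(games.shape)  # (number of rows, number of columns)
```

Each dictionary key becomes a column and each list position becomes a row.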

1. Setting up Pandas
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Pandas Website and Documentation

To install the Pandas library, run the following on the command line.

pip3 install pandas

pip will also install any libraries that Pandas requires.

Some quick tutorials for Pandas
10 Minutes to Pandas
Pandas Tutorials

To import the Pandas library:

import pandas as pd
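A quick way to confirm that the install worked is to import the library and print its version. This is only a small sketch; the version string printed depends on your own installation.

```python
import pandas as pd

# Prints the installed Pandas version string
print(pd.__version__)
```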

2. Reading a CSV File into a Panda
videoReview = pd.read_csv('/Users/brendan.tierney/Downloads/Video_Games_Sales_as_at_22_Dec_2016.csv')

Yes, it is that simple!
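read_csv also accepts options that control how the file is parsed. The sketch below uses a small in-memory CSV (made-up rows held in a hypothetical `sample_csv` variable) in place of the real download, so it runs without the file. `usecols` and `na_values` are standard `read_csv` parameters; the 'unknown' marker is an assumption made for this example.

```python
import io
import pandas as pd

# Made-up sample standing in for the real CSV file
sample_csv = io.StringIO(
    "Name,Platform,Year_of_Release,Global_Sales\n"
    "Game A,Wii,2006,1.5\n"
    "Game B,PS4,unknown,2.0\n"
)

sample = pd.read_csv(
    sample_csv,
    usecols=['Name', 'Platform', 'Year_of_Release'],  # load only these columns
    na_values=['unknown'],                            # treat 'unknown' as missing
)
print(sample)
print(sample.dtypes)
```

To read a real file, pass its path as the first argument instead of the in-memory buffer.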

3. Analysing data in a Panda
print('# print first 3 rows')
print(videoReview[:3])

print('----------')
print('# print one column (Name)')
print(videoReview['Name'])

print('----------')
print('# print one column (Name), first 5 rows')
print(videoReview['Name'][:5])

print('----------')
print('# Platform #')
print(videoReview['Platform'].value_counts())

print('----------')
print('#shape')
print('Number of rows = ', videoReview.shape[0])
print('Number of columns = ', videoReview.shape[1])
print('Shape = ', videoReview.shape)

print('----------')
print('# Name #')
print(videoReview['Name'].value_counts())

print('----------')
print('#head')
print(videoReview.head())
print(videoReview.head(8))

print('----------')
print('#tail')
print(videoReview.tail())
print(videoReview.tail(8))

print('----------')
print('#Describe')
print(videoReview.describe()) # summary statistics (count, mean, std, min, quartiles, max) for numeric columns
print(videoReview['Platform'].describe())
print(videoReview['Year_of_Release'].describe())

print('----------')
print('#info')
videoReview.info() # prints datatypes and memory footprint; it returns None, so no print() is needed
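The row and column slicing above can also be written with `.iloc` (select by position) and `.loc` (select by label). The following is a minimal sketch on a made-up DataFrame; the `games` name and its values are illustrative only.

```python
import pandas as pd

# Made-up values for illustration only
games = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C', 'Game D'],
    'Platform': ['Wii', 'PS4', 'Wii', 'X360'],
})

first_two = games.iloc[:2]                 # first 2 rows by position
names = games.loc[:, 'Name']               # one column by label
counts = games['Platform'].value_counts()  # number of rows per platform

print(first_two)
print(names)
print(counts)
```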

# perform some statistics on the items in a Panda
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

videoReview = pd.read_csv('/Users/brendan.tierney/Downloads/Video_Games_Sales_as_at_22_Dec_2016.csv')

print('----------')
print('Calculate column mean values')
print(videoReview.mean(axis=0, numeric_only=True)) # numeric_only skips text columns

print('----------')
print('Calculate column median values')
print(videoReview.median(axis=0, numeric_only=True))

print('----------')
print('Calculate column mode values')
print(videoReview.mode(axis=0))

print('----------')
print('Iterate some rows from DF')
for i, row in videoReview[:9].iterrows():
    print('#### Printing row ####')
    print(row['Name'])
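As a self-contained sketch of the statistics and row iteration above, the following uses a small made-up DataFrame (the `sales` name and its values are illustrative, not from the real data set). It passes `numeric_only=True` so that text columns are skipped, and it loops with `itertuples()`, which is generally faster than `iterrows()` for simple loops.

```python
import pandas as pd

# Made-up values for illustration only
sales = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C', 'Game D'],
    'Platform': ['Wii', 'PS4', 'Wii', 'X360'],
    'Global_Sales': [1.0, 2.0, 3.0, 6.0],
})

# numeric_only=True restricts the calculation to numeric columns
mean_sales = sales.mean(numeric_only=True)
median_sales = sales.median(numeric_only=True)
top_platform = sales['Platform'].mode()  # most frequent value(s)

print(mean_sales)
print(median_sales)
print(top_platform)

# itertuples() yields one lightweight named tuple per row
for row in sales.head(2).itertuples(index=False):
    print(row.Name, row.Global_Sales)
```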

4. Subsetting and ordering Pandas
print('')
print('Group by')
print('----------')
print('Group by Year, Platform by Count : ')
print(videoReview.groupby(['Year_of_Release','Platform']).count())


print('----------')
print('Group by : for Year=2016 group by Platform and sum Global Sales')
print(videoReview[videoReview.Year_of_Release==2016.0].groupby('Platform')['Global_Sales'].sum())
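The filter-then-group pattern above can be tried on a small made-up DataFrame, as sketched below. The `sales` values are illustrative only. The second part uses `agg()` to compute several summaries in one call.

```python
import pandas as pd

# Made-up values for illustration only
sales = pd.DataFrame({
    'Year_of_Release': [2015, 2016, 2016, 2016],
    'Platform': ['Wii', 'PS4', 'PS4', 'XOne'],
    'Global_Sales': [1.0, 2.0, 3.0, 4.0],
})

# Keep only 2016, then total the sales for each platform
totals_2016 = (
    sales[sales['Year_of_Release'] == 2016]
    .groupby('Platform')['Global_Sales']
    .sum()
)
print(totals_2016)

# Several summaries at once with agg()
summary = sales.groupby('Platform')['Global_Sales'].agg(['count', 'sum', 'mean'])
print(summary)
```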

5. More Panda functions
print('----------')
print('Sorting and Ordering')
df = videoReview[(videoReview.Platform=='Wii') & (videoReview.NA_Sales>9)]
print(df.sort_values('Global_Sales', ascending=True))
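Here is a small sketch of the same subset-and-sort steps on made-up data (the `sales` values are illustrative only). It sorts in descending order so that the largest Global_Sales value comes first.

```python
import pandas as pd

# Made-up values for illustration only
sales = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C', 'Game D'],
    'Platform': ['Wii', 'Wii', 'PS4', 'Wii'],
    'NA_Sales': [10.0, 2.0, 12.0, 15.0],
    'Global_Sales': [30.0, 5.0, 25.0, 20.0],
})

# Parentheses around each condition are required when combining with &
wii_hits = sales[(sales['Platform'] == 'Wii') & (sales['NA_Sales'] > 9)]

# Largest Global_Sales first
ranked = wii_hits.sort_values('Global_Sales', ascending=False)
print(ranked)
```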

6. Writing a Panda to a CSV file
df.to_csv('/Users/brendan.tierney/My_Phyton_Files/video_games_wii.csv', sep=',')
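By default `to_csv` also writes the row index as an extra first column. The sketch below, on a made-up DataFrame, passes `index=False` to leave it out and writes to an in-memory buffer rather than a file so that it runs anywhere. The `df_small` and `buffer` names are hypothetical.

```python
import io
import pandas as pd

# Made-up values for illustration only
df_small = pd.DataFrame({
    'Name': ['Game A', 'Game B'],
    'Global_Sales': [1.5, 2.0],
})

buffer = io.StringIO()
df_small.to_csv(buffer, index=False)  # index=False omits the row index column
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back to check that the data survived the round trip
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```

To write to disk, pass a file path in place of the buffer.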

7. Creating Graphs for a Panda
# plotting
import matplotlib.pyplot as plt

print('----------')
print('Plotting - Histogram')
videoReview['Year_of_Release'].plot(kind='hist')
plt.show() # display the chart when running as a script
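When no display is available, the chart can be saved rather than shown. The following is a minimal sketch, assuming matplotlib is installed; the `years` values are made up, and the non-interactive 'Agg' backend and the in-memory `image` buffer are choices made for this example only.

```python
import io
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up release years for illustration only
years = pd.Series([2006, 2006, 2008, 2010, 2016, 2016, 2016],
                  name='Year_of_Release')

ax = years.plot(kind='hist', bins=5, title='Games per release year')
ax.set_xlabel('Year_of_Release')

# Save the figure as PNG data; a file name could be passed instead of the buffer
image = io.BytesIO()
plt.savefig(image, format='png')
plt.close()
```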

 
