
L6 – File I/O in Python

In this tutorial we will look at how to read and write files using Python, and at some of the typical commands you will use to support the reading and writing of files.

1. Reading and Writing File Basics
Reading and writing files is one of the most common tasks you will need to perform in any language. At a basic level you will be reading and writing plain text files. But for Data Science and Machine Learning projects you will mainly be working with CSV files. Why this file format? It is one of the most commonly used formats for exchanging data in businesses.

The example code given in the following sections covers the reading and writing of CSV files: reading in records for you to analyse and then saving your data to CSV files.

In addition to the commands shown on this webpage there are many more file commands that allow you to step through a file in increments of records or by X number of bytes or characters. I’ll leave these particular commands for you to research and work out how to use them.
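
As a starting point for that research, here is a minimal sketch of reading a fixed number of characters and moving the file pointer. The file name sample.txt and its contents are made up for illustration, and the file is created in a temporary directory so nothing of yours is touched.

```python
import os
import tempfile

# Create a small throwaway file to experiment with (hypothetical name and contents)
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(sample_path, "r") as f:
    first_three = f.read(3)      # read the next 3 characters
    rest_of_line = f.readline()  # read up to and including the next newline
    position = f.tell()          # current position of the file pointer
    f.seek(0)                    # move the file pointer back to the start
    first_line = f.readline()    # read the first line again

print(repr(first_three), repr(rest_of_line), position, repr(first_line))
```

The value returned by tell() depends on the platform's line endings, so treat it as a bookmark to pass back to seek() rather than as a character count.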

2. File permissions
As with all other languages, when you open a file you specify an access mode (referred to here as a permission). The mode determines how you are going to work with the file and also protects you from doing something wrong, such as overwriting a file you only meant to read. The following table lists the file modes in Python.

r
Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode.

rb
Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file.

r+
Opens a file for both reading and writing. The file pointer is placed at the beginning of the file.

rb+
Opens a file for both reading and writing in binary format. The file pointer is placed at the beginning of the file.

w
Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.

w+
Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

wb
Opens a file for writing only in binary format. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.

wb+
Opens a file for both writing and reading in binary format. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

a
Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.

ab
Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.

a+
Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.

ab+
Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
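
To see the difference between the most common modes, the following sketch writes, appends to, and then overwrites a scratch file. The file name modes_demo.txt is an assumption made for this illustration, and the file lives in a temporary directory.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration
mode_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# "w" creates the file (or empties it if it already exists)
f = open(mode_path, "w")
f.write("first\n")
f.close()

# "a" keeps the existing contents and adds to the end
f = open(mode_path, "a")
f.write("second\n")
f.close()

# "r" reads the file without being able to change it
f = open(mode_path, "r")
after_append = f.read()
f.close()

# opening with "w" again discards what was there before
f = open(mode_path, "w")
f.write("third\n")
f.close()

f = open(mode_path, "r")
after_overwrite = f.read()
f.close()

print(repr(after_append))
print(repr(after_overwrite))
```

After the append the file holds both lines. After the second "w" only the last line remains.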

3. Reading a File
The basic syntax for opening a file is

file_object = open(file_name [, access_mode][, buffering])

Some of the properties of a file object include

closed = is the file closed
mode = what mode is the file open in
name = name of the file

The following code illustrates the opening of a file in read only mode and then prints some of the properties of the file.

# open file, giving full path to file and filename. Open with r permissions
f = open("/Users/brendan.tierney/My_Phyton_Files/usernames.txt", "r")
print("Name of the file: ", f.name)
print("Closed or not : ", f.closed)
print("Opening mode : ", f.mode)

We can use the simple read function to read all the contents of the file

# print the contents of the file
print(f.read())

If we want to process each line one at a time we can use a for loop

f = open("/Users/brendan.tierney/My_Phyton_Files/usernames.txt", "r")
i = 1
for line in f:
   print("Line", i, ": ", line, end='')
   i=i+1
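
The same loop can be written without a manual counter. This is a sketch only: the file below is a made-up stand-in for usernames.txt, created in a temporary directory, and the with block closes the file once the loop finishes.

```python
import os
import tempfile

# Hypothetical stand-in for usernames.txt
names_path = os.path.join(tempfile.mkdtemp(), "usernames.txt")
with open(names_path, "w") as f:
    f.write("anna\nbrian\nciara\n")

numbered_lines = []
with open(names_path, "r") as f:
    # enumerate supplies the line counter, starting at 1
    for i, line in enumerate(f, start=1):
        # keep a copy of each line without its trailing newline
        numbered_lines.append((i, line.rstrip("\n")))
        print("Line", i, ": ", line, end='')
```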

CSV files
Although CSV stands for Comma Separated Values, these files can come in a variety of different formats, for example with a different delimiter or with quoted fields. The following example uses a SAS organics data set in CSV format.

# import the csv library
import csv

# open the file
f_open = open("/Users/brendan.tierney/My_Phyton_Files/organics.csv")

# read the file as a CSV file with the delimiter set to a comma
f_csv = csv.reader(f_open, delimiter=',')
i = 0
for row in f_csv:
   # count the number of records
   i = i + 1
   # process each row
   if i == 1:
      print(row)
   elif i == 2:
      # print out some of the details of the second record, indexed based on data column
      print(row)
      print("CUST_ID=", row[1])
      print("Gender=", row[2])
      print("DOB=", row[3])

f_open.close()

# print the number of records read from CSV file
print("Number of records =", i)
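
To illustrate how CSV formats can vary, here is a small sketch that reads semicolon-delimited data containing a quoted field. The data is invented for this example and is read from a string using io.StringIO rather than from a file on disk.

```python
import csv
import io

# Made-up sample data using a semicolon delimiter and one quoted field
raw_text = 'CUST_ID;NAME;CITY\n101;"Murphy; Anne";Dublin\n102;Byrne;Cork\n'

# tell the reader which delimiter and quote character the data uses
reader = csv.reader(io.StringIO(raw_text), delimiter=';', quotechar='"')
rows = list(reader)

header = rows[0]    # first row holds the column names in this sample
records = rows[1:]  # remaining rows are the data records

print(header)
print(records)
print("Number of records =", len(records))
```

Note that the semicolon inside the quoted name is kept as part of the field and is not treated as a delimiter.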

4. Writing to a File
To write to a file we use the write function. The following code reads from a file and then writes the output to another file.
Every time we run this code the file being written to gets overwritten, because it is opened in w mode.

f = open("/Users/brendan.tierney/My_Phyton_Files/usernames.txt", "r")
# write to a file
print("Open file for writing to")
f_write = open("/Users/brendan.tierney/My_Phyton_Files/usernames2.txt", "w")
i = 1
for line in f:
   f_write.write(line)
   print("Writing line", i, "to file")
   i=i+1

print("Finished writing file")
f_write.close()
f.close()

CSV files
The following example extends the previous CSV example by opening an output file, defining it as CSV format, and
then writing the records read in from a CSV file out to the new file, also in CSV format.

# import the csv library
import csv

# open the file
f_open = open("/Users/brendan.tierney/My_Phyton_Files/organics.csv")

# read the file as a CSV file with the delimiter set to a comma
f_csv = csv.reader(f_open, delimiter=',')

# create output file in CSV format
f_Out = open("/Users/brendan.tierney/My_Phyton_Files/organics_copy.csv", "w")
f_OutCSV = csv.writer(f_Out, delimiter=',', quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
i = 0

for row in f_csv:
   f_OutCSV.writerow(row)
   # count the number of records
   i = i + 1

# print the number of records written to CSV file
print("Records written to file =", i)

f_Out.close()
f_open.close()
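
One thing to be aware of in the example above: csv.reader returns every field as a string, so with quoting=csv.QUOTE_NONNUMERIC every field in the copy will be wrapped in quotes. The sketch below shows how the same writer settings treat strings and numbers differently. The rows and the file name demo_copy.csv are invented for illustration.

```python
import csv
import os
import tempfile

# Made-up rows; the first row holds the column names
rows_to_write = [["ID", "NAME", "SPEND"], [1, "Anne", 12.5], [2, "Brian", 7.0]]

# Hypothetical output file in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "demo_copy.csv")

# newline='' lets the csv module control the line endings itself
with open(out_path, "w", newline='') as f_out:
    writer = csv.writer(f_out, delimiter=',', quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
    writer.writerows(rows_to_write)

# read the raw text back to see exactly what was written
with open(out_path, "r", newline='') as f_in:
    written_text = f_in.read()

print(written_text)
```

The strings are quoted in the output while the numbers are written as they are.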

5. Reading & Writing Files with Pandas
The following example illustrates the reading of a CSV file into a pandas DataFrame (see section/webpage for more on Pandas), filters this data set to only the Wii games with NA_Sales greater than 9, and then writes this subset (also a DataFrame) out to a CSV file.

import pandas as pd

videoReview = pd.read_csv('/Users/brendan.tierney/Downloads/Video_Games_Sales_as_at_22_Dec_2016.csv')

print('# print first 3 rows')
print(videoReview[:3])

print('Num records:',len(videoReview))
print('unique platforms = ', videoReview['Platform'].unique())

print('')
print('Sorting and Ordering')
df = videoReview[(videoReview.Platform=='Wii') & (videoReview.NA_Sales>9)]
print(df.sort_values('Global_Sales', ascending=True))

df.to_csv('/Users/brendan.tierney/My_Phyton_Files/video_games_wii.csv', sep=',')

6. Working with Directories
There are a number of functions that allow us to work with directories, including getting the current working directory, changing directory, creating a directory, deleting a directory, etc. For these we need to import the os package.

import os

user_cwd = os.getcwd()
print("Current working directory: ", user_cwd)

# Changing the working directory
os.chdir("/Users/brendan.tierney/My_Phyton_Files")
print("Current working directory: ", os.getcwd())

# create a directory
os.mkdir("test")

# remove a directory
os.rmdir("test")
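
Note that os.mkdir raises an error if the directory already exists, and os.rmdir only removes empty directories. A minimal sketch of checking first, run inside a temporary directory (an assumption made here so the example does not depend on your own folders):

```python
import os
import tempfile

# Work inside a temporary directory so nothing of yours is touched
base_dir = tempfile.mkdtemp()
test_dir = os.path.join(base_dir, "test")

# only create the directory if it is not already there
if not os.path.isdir(test_dir):
    os.mkdir(test_dir)
created = os.path.isdir(test_dir)

# list the contents of the parent directory
contents = os.listdir(base_dir)

# remove the (empty) directory again
os.rmdir(test_dir)
removed = not os.path.exists(test_dir)

print(created, contents, removed)
```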

7. Some other file functions
There are a number of other functions to allow us to rename and to delete a file from the OS.

# rename a file
os.rename( "test.txt", "test_rename.txt" )

# delete a file
os.remove("test_rename.txt")
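
Both functions raise an error if the file does not exist, so it can be worth checking first. The sketch below creates an empty file in a temporary directory (the directory is an assumption for this example), renames it, and then deletes it.

```python
import os
import tempfile

# Hypothetical working directory for this example
work_dir = tempfile.mkdtemp()
old_name = os.path.join(work_dir, "test.txt")
new_name = os.path.join(work_dir, "test_rename.txt")

# create an empty file to work with
open(old_name, "w").close()

# rename the file and confirm only the new name exists
os.rename(old_name, new_name)
renamed = os.path.exists(new_name) and not os.path.exists(old_name)

# check the file exists before deleting it
if os.path.exists(new_name):
    os.remove(new_name)
deleted = not os.path.exists(new_name)

print(renamed, deleted)
```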

8. Closing a file
To safeguard the data you have written to a file and to protect the file you should close it after you have finished working with it. The closing of the file shouldn’t happen at the end of your code, but should happen as soon as you are finished writing to it. The time between the opening of the file and the closing of the file should be kept to a minimum.

f.close()
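
An alternative to calling close yourself is to open the file using a with block, which closes the file as soon as the block ends, even if an error occurs inside it. A minimal sketch, using a made-up file name in a temporary directory:

```python
import os
import tempfile

# Hypothetical scratch file for this example
demo_path = os.path.join(tempfile.mkdtemp(), "close_demo.txt")

with open(demo_path, "w") as f:
    f.write("some data\n")
    open_inside = not f.closed   # the file is still open inside the block

closed_after = f.closed          # the file is closed once the block ends
print(open_inside, closed_after)
```

This keeps the time between opening and closing the file to a minimum without you having to remember the close call.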
