File Handling | Data Science Course

Reading & Writing Text Files

Data scientists spend much of their time working with files - reading datasets, saving results, and processing information. Understanding file handling is essential before diving into data analysis libraries.

Why File Handling Matters

Think about your daily work: you download CSV files, save analysis results, read configuration files, and export reports. All of this requires file handling - the ability to read from and write to files on your computer.

In data science, you'll constantly work with files containing datasets, model configurations, and output results. While libraries like Pandas make working with data files easier, understanding Python's built-in file handling gives you the foundation to work with any file type.

Real-world example: Imagine you're analyzing customer feedback. The raw comments are in a text file, customer info is in a CSV, and your company's product data is in JSON format. You need to read all three, process them, and save your analysis. That's file handling in action!

Opening Files: The open() Function

Python's built-in open() function is your gateway to working with files. It takes a filename and a mode that specifies what you want to do with the file.

Built-in Function

open(filename, mode)

Opens a file and returns a file object that you can use to read or write data. The mode determines whether you're reading, writing, or appending to the file.

Always remember: Files must be closed after use to free up system resources. The best practice is the with statement, covered in the Context Managers section later in this lesson.

Here are the most common file modes:

Mode	Description	Creates File?	Overwrites?
`'r'`	Read (default) - opens file for reading	No (error if missing)	No
`'w'`	Write - opens file for writing	Yes	Yes (erases content!)
`'a'`	Append - adds to end of file	Yes	No (keeps content)
`'r+'`	Read and write	No (error if missing)	Depends on operation

Warning: Using 'w' mode on an existing file will erase all its content before writing! If you want to add to a file, use 'a' (append) mode instead.
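To see the modes from the table in action, here is a minimal sketch. It works in a temporary directory so no real files are touched, and the file names (notes.txt, missing.txt) are made up for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    with open(path, "w") as file:   # 'w' creates the file
        file.write("first\n")

    with open(path, "w") as file:   # 'w' again: the previous content is erased
        file.write("second\n")

    with open(path, "a") as file:   # 'a' keeps the content and adds to the end
        file.write("third\n")

    with open(path, "r") as file:
        final_content = file.read()

    # 'r' on a file that does not exist raises FileNotFoundError
    try:
        open(os.path.join(tmp_dir, "missing.txt"), "r")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(final_content)   # "first" is gone; only "second" and "third" remain
print(missing_raised)
```

Notice that the first write is lost: the second 'w' call wiped it before writing.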

Reading Files

There are several ways to read content from a file. Download the sample file below to follow along:

Download sample_text.txt Sample text file for practice (TXT, 1KB)

Method 1: read() - Reads the entire file as one string:

# Read entire file as one string
with open("sample_text.txt", "r") as file:
    content = file.read()
    print(content)

Method 2: readline() - Reads one line at a time:

# Read one line at a time
with open("sample_text.txt", "r") as file:
    first_line = file.readline()
    second_line = file.readline()
    print("First:", first_line.strip())
    print("Second:", second_line.strip())

Method 3: readlines() - Reads all lines into a list:

# Read all lines into a list
with open("sample_text.txt", "r") as file:
    lines = file.readlines()
    print(f"Total lines: {len(lines)}")
    print(lines[:3])  # First 3 lines

Method 4: Loop through lines - Most memory-efficient for large files:

# Loop through file line by line (memory efficient!)
with open("sample_text.txt", "r") as file:
    for line in file:
        print(line.strip())

Pro tip: When working with large files (millions of lines), use the loop method. It reads one line at a time instead of loading the entire file into memory.
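As a rough illustration of the loop method, the sketch below builds a small stand-in file (big.txt is a made-up name) and then gathers two statistics while streaming through it. The same pattern should work unchanged on a file with millions of lines.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "big.txt")  # hypothetical stand-in for a large file

    # Create 1,000 lines of sample data
    with open(path, "w") as file:
        for number in range(1, 1001):
            file.write(f"row {number}\n")

    line_count = 0
    longest = ""
    with open(path, "r") as file:
        for line in file:               # lines are read lazily, not all at once
            line_count += 1
            text = line.rstrip("\n")    # drop the trailing newline
            if len(text) > len(longest):
                longest = text

print(line_count, longest)
```

Only the running count and the longest line seen so far are kept, so memory use stays flat no matter how big the file is.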

Writing Files

Writing to files is just as straightforward. Use 'w' mode to create a new file (or overwrite existing), or 'a' to append to an existing file.

# Writing to a new file (creates it if it doesn't exist)
with open("output.txt", "w") as file:
    file.write("Line 1: Hello!\n")
    file.write("Line 2: This is Python.\n")

# The file now contains:
# Line 1: Hello!
# Line 2: This is Python.

Use writelines() to write multiple lines from a list:

# Writing multiple lines at once
lines = ["Apple\n", "Banana\n", "Cherry\n"]
with open("fruits.txt", "w") as file:
    file.writelines(lines)
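One thing to watch: despite its name, writelines() does not add line endings for you. The sketch below, which uses a temporary directory so it can run anywhere, shows what happens when the strings have no "\n" and one way to add it.

```python
import os
import tempfile

fruits = ["Apple", "Banana", "Cherry"]  # no trailing "\n" this time

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "fruits.txt")

    # writelines() writes the strings back to back, with nothing in between
    with open(path, "w") as file:
        file.writelines(fruits)
    with open(path, "r") as file:
        joined = file.read()

    # Adding the newline yourself gives one item per line
    with open(path, "w") as file:
        file.writelines(fruit + "\n" for fruit in fruits)
    with open(path, "r") as file:
        separated = file.read()

print(repr(joined))
print(repr(separated))
```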

Appending adds content to the end without erasing:

# Append to existing file (doesn't erase!)
with open("fruits.txt", "a") as file:
    file.write("Dragonfruit\n")

# fruits.txt now contains:
# Apple
# Banana
# Cherry
# Dragonfruit

Practice Questions: Text Files

Test your understanding with these hands-on exercises.

Task: Read sample_text.txt and count how many lines it contains.

Expected output: Total lines: X (where X is the actual count)

Solution:

with open("sample_text.txt", "r") as file:
    lines = file.readlines()
    print(f"Total lines: {len(lines)}")

Given:

items = ["Milk", "Bread", "Eggs", "Butter"]

Task: Write each item to a file called shopping_list.txt, one item per line.

Solution:

items = ["Milk", "Bread", "Eggs", "Butter"]

with open("shopping_list.txt", "w") as file:
    for item in items:
        file.write(item + "\n")

Task: Read sample_text.txt and print only the lines that contain the word "data" (case-insensitive).

Hint: Use .lower() for case-insensitive comparison.

Solution:

with open("sample_text.txt", "r") as file:
    for line in file:
        if "data" in line.lower():
            print(line.strip())

Given:

events = ["User login", "File uploaded", "Report generated"]

Task: Write each event to activity_log.txt with a timestamp. Each line should look like: 2025-12-20 10:30:45 - User login

Hint: Use from datetime import datetime

Solution:

from datetime import datetime

events = ["User login", "File uploaded", "Report generated"]

with open("activity_log.txt", "w") as file:
    for event in events:
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        file.write(f"{timestamp} - {event}\n")

CSV File Operations

CSV (Comma-Separated Values) is one of the most common formats for storing tabular data. Every data scientist encounters CSV files daily - from spreadsheet exports to database dumps.

Download sales_data.csv Sample sales dataset for this section (CSV, 1KB)

What is CSV?

A CSV file is simply a text file where each line represents a row of data, and values within each row are separated by commas (or sometimes semicolons or tabs). The first row typically contains column headers.

# Preview of sales_data.csv:
# product,quantity,price,revenue,date
# Laptop,5,45000,225000,2024-01-15
# Mouse,25,500,12500,2024-01-15
# ...

Python's built-in csv module makes it easy to read and write CSV files without worrying about edge cases like commas inside quoted values or different line endings.

# Import the csv module
import csv
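Here is a small sketch of the "commas inside quoted values" edge case. It uses an in-memory string (via io.StringIO) in place of a real file, and the product row is invented for illustration.

```python
import csv
import io

# In-memory stand-in for a CSV file; the product name itself contains a comma
raw = 'product,quantity\n"Cable, USB-C",10\n'

# Naive approach: splitting on commas breaks the quoted value apart
naive = raw.splitlines()[1].split(",")

# csv.reader understands the quotes and keeps the value in one piece
parsed = list(csv.reader(io.StringIO(raw)))[1]

print(naive)
print(parsed)
```

The naive split produces three pieces with stray quote characters, while csv.reader returns the two values you would expect.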

Reading CSV Files

The csv.reader() function reads CSV files row by row, returning each row as a list of values.

import csv

# Read the sales data CSV file
with open("sales_data.csv", "r") as file:
    reader = csv.reader(file)
    
    # Get header row
    header = next(reader)
    print("Columns:", header)
    
    # Read first 3 data rows, then stop reading
    for i, row in enumerate(reader):
        if i >= 3:
            break
        print(row)

You can access individual values by index:

import csv

with open("sales_data.csv", "r") as file:
    reader = csv.reader(file)
    next(reader)  # Skip header
    
    for row in reader:
        product = row[0]
        revenue = int(row[3])
        print(f"{product}: ₹{revenue:,}")

Writing CSV Files

Use csv.writer() to write data to CSV files. The writerow() method writes a single row, while writerows() writes multiple rows at once.

import csv

# Data to write
students = [
    ["name", "age", "city", "score"],
    ["Priya", 22, "Mumbai", 85],
    ["Rahul", 24, "Delhi", 92],
    ["Ankit", 23, "Bangalore", 78]
]

# Write to CSV file
with open("new_students.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(students)  # Write all rows at once

Note: Open CSV files with newline="" whenever you pass them to the csv module, as the module's documentation recommends. When writing on Windows, leaving it out produces blank rows between data rows.

Writing row by row gives you more control:

import csv

with open("scores.csv", "w", newline="") as file:
    writer = csv.writer(file)
    
    # Write header
    writer.writerow(["Student", "Math", "Science"])
    
    # Write data rows one by one
    writer.writerow(["Priya", 95, 88])
    writer.writerow(["Rahul", 87, 92])
    writer.writerow(["Meera", 91, 85])

Working with DictReader and DictWriter

Instead of accessing columns by index (which can be confusing), DictReader lets you access values by column name. This makes your code much more readable!

import csv

# DictReader - access by column name (much cleaner!)
with open("sales_data.csv", "r") as file:
    reader = csv.DictReader(file)
    
    for row in reader:
        # Access by column name - so much clearer!
        product = row["product"]
        revenue = int(row["revenue"])
        date = row["date"]
        print(f"{date}: {product} - ₹{revenue:,}")

Similarly, DictWriter lets you write using column names:

import csv

# Data as list of dictionaries
students = [
    {"name": "Priya", "age": 22, "score": 85},
    {"name": "Rahul", "age": 24, "score": 92},
    {"name": "Ankit", "age": 23, "score": 78}
]

# DictWriter - specify column names (fieldnames)
with open("students_dict.csv", "w", newline="") as file:
    fieldnames = ["name", "age", "score"]
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    
    writer.writeheader()  # Write the header row
    writer.writerows(students)  # Write all data rows

Use DictReader/DictWriter when...

Column order might change
You want readable, self-documenting code
Working with many columns
Data naturally maps to dictionaries

Use reader/writer when...

Simple files with few columns
Processing data sequentially
Performance is critical (slightly faster)
No header row in file
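If a file has no header row but you would still like name-based access, DictReader accepts the column names through its fieldnames argument. A minimal sketch, using an in-memory string in place of a file and column names chosen for illustration:

```python
import csv
import io

# In-memory stand-in for a CSV file that has no header row
raw = "Laptop,5,45000\nMouse,25,500\n"

# Supplying fieldnames stops DictReader from treating the first row as a header
reader = csv.DictReader(io.StringIO(raw), fieldnames=["product", "quantity", "price"])
rows = list(reader)

print(rows[0]["product"], rows[1]["price"])
```

As with csv.reader, every value comes back as a string, so convert numbers with int() or float() before doing arithmetic.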

Practice Questions: CSV Files

Test your understanding with these hands-on exercises.

Task: Read customers.csv and print each row as a dictionary.

Solution:

import csv

with open("customers.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)

Task: Count how many data rows (excluding header) are in customers.csv.

Solution:

import csv

with open("customers.csv", "r") as file:
    reader = csv.DictReader(file)
    count = sum(1 for row in reader)
    print(f"Total rows: {count}")

Task: Read sales_data.csv and print only rows where the revenue is greater than 20000.

Hint: Convert the revenue string to int for comparison.

Solution:

import csv

with open("sales_data.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        if int(row['revenue']) > 20000:
            print(row)

Task: Read sales_data.csv and calculate the average of the 'revenue' column.

Solution:

import csv

with open("sales_data.csv", "r") as file:
    reader = csv.DictReader(file)
    revenues = [int(row['revenue']) for row in reader]
    
average = sum(revenues) / len(revenues)
print(f"Average revenue: ₹{average:,.2f}")

JSON Data Handling

JSON (JavaScript Object Notation) has become the universal language for data exchange on the web. APIs, configuration files, and NoSQL databases all speak JSON - making it essential for data scientists.

Download students.json Sample student records dataset (JSON, 1KB)

What is JSON?

JSON is a lightweight text format that looks remarkably similar to Python dictionaries and lists. It's human-readable, easy to parse, and supported by virtually every programming language.

# Preview of students.json structure:
# {
#     "students": [
#         {"name": "Priya Sharma", "gpa": 3.8, ...},
#         {"name": "Rahul Kumar", "gpa": 3.6, ...}
#     ],
#     "institution": "Data Science Academy"
# }

Notice how JSON maps directly to Python types:

JSON Type	Python Type	Example
object	dict	`{"key": "value"}`
array	list	`[1, 2, 3]`
string	str	`"hello"`
number	int/float	`42` or `3.14`
true/false	True/False	`true` → `True`
null	None	`null` → `None`

# Import the json module
import json
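The mapping in the table is not perfectly symmetric. The sketch below converts a made-up record to JSON text and back again, to show which Python details survive the round trip and which do not.

```python
import json

record = {
    "name": "Priya",
    "scores": (85, 90),    # a tuple
    "graduated": False,
    "advisor": None,
    1: "integer key",      # a non-string key
}

text = json.dumps(record)      # Python -> JSON text
restored = json.loads(text)    # JSON text -> Python

print(text)
print(restored["scores"])   # tuples come back as lists
print(list(restored))       # the key 1 comes back as the string "1"
```

JSON has no tuple type and only allows string keys, so both are converted on the way out and are not converted back on the way in.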

Reading JSON Files

Use json.load() to read JSON from a file. It automatically converts JSON into Python dictionaries and lists.

import json

# Read the students.json file (download above)
with open("students.json", "r") as file:
    data = json.load(file)

# Access top-level data
print(data["institution"])  # Data Science Academy
print(data["semester"])     # Fall 2024

# Access nested student data
students = data["students"]
print(f"Total students: {len(students)}")

Working with JSON arrays (lists of objects) is common when dealing with nested data:

import json

with open("students.json", "r") as file:
    data = json.load(file)

# Loop through the list of student dictionaries
for student in data["students"]:
    name = student["name"]
    gpa = student["gpa"]
    courses = ", ".join(student["courses"])
    print(f"{name} (GPA: {gpa}) - {courses}")

Writing JSON Files

Use json.dump() to write Python data to a JSON file. The indent parameter makes the output human-readable (pretty-printed).

import json

# Python dictionary to save
student = {
    "name": "Rahul Kumar",
    "age": 24,
    "enrolled": True,
    "courses": ["Machine Learning", "Deep Learning"],
    "gpa": 3.8
}

# Write to JSON file (indent=2 for pretty formatting)
with open("output.json", "w") as file:
    json.dump(student, file, indent=2)

The output file will look like this (nicely formatted):

# Contents of output.json:
# {
#   "name": "Rahul Kumar",
#   "age": 24,
#   "enrolled": true,
#   "courses": [
#     "Machine Learning",
#     "Deep Learning"
#   ],
#   "gpa": 3.8
# }

Pretty printing: Use indent=2 or indent=4 for readable output. Omit indent for compact output (saves space but harder to read).
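To get a feel for the size difference, this sketch compares three outputs for the same dictionary using json.dumps(), the string-producing counterpart of json.dump(). The separators argument shown here is optional and removes the spaces that the default output puts after commas and colons.

```python
import json

student = {"name": "Rahul Kumar", "courses": ["Machine Learning", "Deep Learning"]}

pretty = json.dumps(student, indent=2)                  # multi-line, readable
default = json.dumps(student)                           # single line
compact = json.dumps(student, separators=(",", ":"))    # single line, no extra spaces

print(len(pretty), len(default), len(compact))
```

All three strings parse back to the same dictionary; only the whitespace differs.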

Working with JSON Strings

Sometimes you need to convert between JSON strings and Python objects without files. This is common when working with APIs or web services.

json.loads() - Parse a JSON string into Python (the 's' stands for 'string'):

import json

# JSON string (maybe from an API response)
json_string = '{"name": "Meera", "score": 91, "passed": true}'

# Parse string to Python dictionary
data = json.loads(json_string)
print(data["name"])    # Meera
print(data["passed"])  # True (boolean, not string!)

json.dumps() - Convert Python to a JSON string:

import json

# Python dictionary
student = {"name": "Vikram", "scores": [85, 90, 88]}

# Convert to JSON string
json_string = json.dumps(student)
print(json_string)
# Output: {"name": "Vikram", "scores": [85, 90, 88]}

# Pretty-printed string
pretty_json = json.dumps(student, indent=2)
print(pretty_json)

Remember the difference:

json.load() / json.dump() - work with files
json.loads() / json.dumps() - work with strings
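Text that arrives from outside your program is not always valid JSON. A minimal sketch of catching the parse error, using a deliberately broken string:

```python
import json

bad_json = "{'name': 'Meera'}"   # single quotes are not valid JSON

try:
    data = json.loads(bad_json)
except json.JSONDecodeError as error:
    data = None
    print(f"Invalid JSON: {error.msg} (line {error.lineno}, column {error.colno})")
```

json.load() raises the same exception when a file contains invalid JSON, so the same try/except pattern applies there.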

Practice Questions: JSON Data

Test your understanding with these hands-on exercises.

Given:

api_response = '{"status": "success", "count": 42, "data": [1, 2, 3]}'

Task: Parse this JSON string and print the count value.

Solution:

import json

api_response = '{"status": "success", "count": 42, "data": [1, 2, 3]}'
result = json.loads(api_response)
print(result["count"])  # 42

Given:

config = {
    "app_name": "DataAnalyzer",
    "version": "1.0",
    "debug": True
}

Task: Save this config dictionary to a file called config.json with indentation.

Solution:

import json

config = {
    "app_name": "DataAnalyzer",
    "version": "1.0",
    "debug": True
}

with open("config.json", "w") as file:
    json.dump(config, file, indent=2)

Task: Read products.json, add a new product to the list, and save it back.

New product: {"name": "Laptop", "price": 999.99}

Solution:

import json

# Read existing data
with open("products.json", "r") as file:
    products = json.load(file)

# Add new product
products.append({"name": "Laptop", "price": 999.99})

# Save updated data
with open("products.json", "w") as file:
    json.dump(products, file, indent=2)

Given:

json_str = '''
{
  "company": "TechCorp",
  "employees": [
    {"name": "Alice", "department": "Engineering"},
    {"name": "Bob", "department": "Sales"},
    {"name": "Carol", "department": "Engineering"}
  ]
}'''

Task: Parse the JSON and print the names of all employees in the Engineering department.

Solution:

import json

json_str = '''
{
  "company": "TechCorp",
  "employees": [
    {"name": "Alice", "department": "Engineering"},
    {"name": "Bob", "department": "Sales"},
    {"name": "Carol", "department": "Engineering"}
  ]
}'''

data = json.loads(json_str)
for emp in data["employees"]:
    if emp["department"] == "Engineering":
        print(emp["name"])
# Output:
# Alice
# Carol

Context Managers (with Statement)

Throughout this lesson, you've seen the with statement. It's not just syntactic sugar - it's a powerful pattern that prevents bugs and keeps your code clean. Let's understand why it matters.

The Problem with Manual File Handling

When you open a file, your operating system allocates resources to track it. If you forget to close the file, these resources stay locked - potentially causing issues like:

Resource leaks (open file handles piling up) in long-running programs
Data not being written to disk (stuck in buffer)
Other programs unable to access the file
Maximum open file limit reached

Here's the old (problematic) way of handling files:

# The WRONG way - easy to forget close() or miss it on errors
file = open("data.txt", "r")
content = file.read()
# ... do something with content ...
# What if an error occurs here? close() never runs!
file.close()  # Easy to forget this line

Even worse, if an error occurs between open() and close(), the file never gets closed:

# Error prevents close() from running!
file = open("data.txt", "r")
content = file.read()
result = int(content)  # ValueError if content isn't a number!
file.close()  # This line NEVER executes if error above

The with Statement Solution

The with statement is Python's elegant solution. It automatically closes the file when the block ends - even if an error occurs! The object that makes this possible (here, the file returned by open()) is called a "context manager."

# The RIGHT way - file is ALWAYS closed automatically
with open("data.txt", "r") as file:
    content = file.read()
    # Even if an error occurs here...
# File is automatically closed here, guaranteed!

Best Practice

Context Manager (with statement)

A context manager ensures that resources are properly managed - acquired when needed and released when done. The with statement handles setup and cleanup automatically.

Rule: Always use with when working with files. There's no good reason to use manual open/close in modern Python.
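For files, the with statement behaves roughly like a try/finally block that always calls close(). The sketch below puts the two side by side and checks the file's closed attribute afterwards. It runs in a temporary directory, and the file content is chosen to trigger a ValueError on purpose.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.txt")
    with open(path, "w") as file:
        file.write("not a number")

    # Manual version: try/finally guarantees close() runs
    manual_file = open(path, "r")
    try:
        manual_content = manual_file.read()
    finally:
        manual_file.close()

    # with version: same guarantee, less code, even when an error occurs
    try:
        with open(path, "r") as managed_file:
            int(managed_file.read())   # raises ValueError
    except ValueError:
        pass

print(manual_file.closed, managed_file.closed)
```

Both files report closed as True, including the one whose block was interrupted by the exception.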

Here's how it handles errors gracefully:

# Even with errors, file gets closed!
try:
    with open("data.txt", "r") as file:
        content = file.read()
        result = int(content)  # Might raise ValueError
except ValueError:
    print("Could not convert to integer")
# File is closed regardless of error or success

Working with Multiple Files

You can open multiple files in a single with statement - useful for copying data or comparing files:

# Open two files at once
with open("input.txt", "r") as infile, open("output.txt", "w") as outfile:
    for line in infile:
        # Process and write each line
        outfile.write(line.upper())

For many files, use separate lines for readability:

# Multiple files - cleaner with parentheses (Python 3.10+)
with (
    open("source.txt", "r") as source,
    open("backup.txt", "w") as backup,
    open("log.txt", "a") as log
):
    content = source.read()
    backup.write(content)
    log.write("Backup created successfully\n")

File Paths and Common Errors

Understanding file paths prevents many common errors. There are two types:

Relative Path

Path relative to your current working directory:

"data.txt"
"data/input.txt"
"../parent_folder/file.txt"

Absolute Path

Full path from system root:

"/home/priya/data.txt" (Linux/Mac)
"C:/Users/Priya/data.txt" (Windows)

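A quick way to see how the two path types relate is to ask Python to turn a relative path into an absolute one. This sketch uses only os.path functions and does not need the file to exist.

```python
import os

relative_path = os.path.join("data", "input.txt")   # uses the right separator for your OS
absolute_path = os.path.abspath(relative_path)      # resolved against the working directory

print("Working directory:", os.getcwd())
print("Relative:", relative_path, "| absolute?", os.path.isabs(relative_path))
print("Absolute:", absolute_path, "| absolute?", os.path.isabs(absolute_path))
```

If a relative path fails with FileNotFoundError, printing os.getcwd() like this is often the fastest way to find out where Python is actually looking.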
Common file errors and solutions:

# FileNotFoundError - file doesn't exist
# Solution: Check path, use os.path.exists()
import os
if os.path.exists("data.txt"):
    with open("data.txt", "r") as file:
        content = file.read()
else:
    print("File not found!")

# PermissionError - no access rights
# Solution: Check file permissions, run as admin, or use different path

# UnicodeDecodeError - encoding mismatch
# Solution: Specify encoding
with open("data.txt", "r", encoding="utf-8") as file:
    content = file.read()

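Pro tip: Always specify encoding="utf-8" when working with text files that might contain non-English characters. Without it, Python may fall back to a platform-dependent default encoding, so the same file can read fine on one machine and fail on another. UTF-8 handles most international characters correctly.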
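An alternative to checking os.path.exists() first is to simply try opening the file and catch FileNotFoundError. This also covers the case where the file disappears between the check and the open. The helper name read_text_or_none below is made up for this sketch.

```python
import os
import tempfile

def read_text_or_none(path):
    """Return the file's text, or None if the file does not exist."""
    try:
        with open(path, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        return None

with tempfile.TemporaryDirectory() as tmp_dir:
    # This file is never created, so the helper returns None
    missing = read_text_or_none(os.path.join(tmp_dir, "report.txt"))

    # This one exists and contains a non-English character
    existing_path = os.path.join(tmp_dir, "notes.txt")
    with open(existing_path, "w", encoding="utf-8") as file:
        file.write("café\n")
    found = read_text_or_none(existing_path)

print(missing)
print(found)
```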

Practice Questions: Context Managers

Test your understanding with these hands-on exercises.

Given (bad code):

file = open("data.txt", "r")
content = file.read()
print(content)
file.close()

Task: Rewrite this using a context manager (with statement).

Solution:

with open("data.txt", "r") as file:
    content = file.read()
    print(content)
# File automatically closed!

Task: Write code that checks if "report.txt" exists before reading it. Print "File not found" if it doesn't exist.

Solution:

import os

if os.path.exists("report.txt"):
    with open("report.txt", "r") as file:
        print(file.read())
else:
    print("File not found")

Task: Open two files simultaneously - read from "source.txt" and write its contents to "destination.txt".

Solution:

with open("source.txt", "r") as source, open("destination.txt", "w") as dest:
    content = source.read()
    dest.write(content)

Task: Read a large file (for example "server_log.txt") line by line (memory efficient), count how many lines contain the word "error", and save the count to "error_count.txt".

Hint: Don't load entire file into memory - iterate line by line.

Solution:

error_count = 0

with open("server_log.txt", "r") as file:
    for line in file:  # Memory efficient - one line at a time
        if "error" in line.lower():
            error_count += 1

with open("error_count.txt", "w") as output:
    output.write(f"Total errors found: {error_count}")

Key Takeaways

Always Use with Statement

Context managers automatically close files, even if errors occur. Never use manual open/close.

Know Your File Modes

'r' for read, 'w' for write (overwrites!), 'a' for append. Wrong mode = data loss or errors.

CSV for Tabular Data

Use DictReader/DictWriter for cleaner code. Access columns by name instead of index.

JSON for Structured Data

load/dump for files, loads/dumps for strings. JSON maps directly to Python dicts and lists.

Specify Encoding

Use encoding="utf-8" for text files with international characters to avoid decode errors.

Check File Existence

Use os.path.exists() before reading. Handle FileNotFoundError gracefully in your code.

Knowledge Check

Quick Quiz

1 Which file mode creates a new file and overwrites existing content?

2 What is the main advantage of using the with statement for file handling?

3 Which function parses a JSON string into a Python dictionary?

4 What does csv.DictReader return for each row?

5 To add new content to an existing file without erasing it, which mode should you use?

6 What does file.read() return?
