Chapter 6: Advanced Python Topics - Master Python's Advanced Features

Welcome to the comprehensive guide to advanced Python topics! This chapter covers the sophisticated features and techniques that make Python a powerful and flexible programming language. These advanced concepts will help you write more efficient, elegant, and maintainable Python code.

Learning Objectives

By the end of this chapter, you will understand:

  • Generators and Iterators - memory-efficient iteration
  • Decorators - function and class enhancement
  • Context Managers - resource management
  • Metaclasses - class creation and modification
  • Descriptors - attribute access control
  • Concurrency - threading and multiprocessing
  • Regular Expressions - pattern matching and text processing
  • File Handling - reading and writing files
  • Error Handling - exception management
  • Performance Optimization - profiling and optimization techniques

Generators and Iterators

Generator Functions

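A generator function is any function that contains the yield keyword. Calling it returns a generator iterator that runs the function body lazily, pausing at each yield until the next value is requested: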

# Basic generator function
def count_up_to(max_count):
    """Generator that counts up to max_count."""
    count = 1
    while count <= max_count:
        yield count
        count += 1

# Using the generator
counter = count_up_to(5)
print(list(counter)) # [1, 2, 3, 4, 5]

# Generator that yields repeatedly inside a loop
def fibonacci_generator(n):
    """Generate Fibonacci numbers less than n."""
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b

# Using Fibonacci generator
fib = fibonacci_generator(100)
for num in fib:
    print(num, end=" ")
print()

# Generator with send() method
def accumulator():
    """Generator that accumulates values."""
    total = 0
    while True:
        # send(value) resumes the generator and makes "yield total" evaluate to value
        value = yield total
        if value is not None:
            total += value

# Using send() with generator
acc = accumulator()
next(acc) # Prime the generator: run it to the first yield
print(acc.send(10)) # 10
print(acc.send(5)) # 15
print(acc.send(3)) # 18
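
Because generators are lazy, they can be chained into pipelines and can even be infinite, as long as the consumer stops asking for values. The following is a minimal sketch; the helper names naturals and evens are hypothetical and not part of the examples above:

```python
from itertools import islice

def naturals():
    """Infinite generator: 0, 1, 2, ... (hypothetical helper)."""
    n = 0
    while True:
        yield n
        n += 1

def evens(numbers):
    """Lazily keep only the even values from another iterable."""
    for n in numbers:
        if n % 2 == 0:
            yield n

# islice stops the otherwise endless pipeline after five items
first_five_evens = list(islice(evens(naturals()), 5))
print(first_five_evens)  # [0, 2, 4, 6, 8]
```

No stage computes a value until the stage after it asks for one, which is why the infinite naturals() generator is safe here.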

Generator Expressions

Generator expressions are memory-efficient alternatives to list comprehensions:

# Generator expression
squares_gen = (x**2 for x in range(10))
print(list(squares_gen)) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Generator expression with condition
even_squares = (x**2 for x in range(20) if x % 2 == 0)
print(list(even_squares)) # [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]

# Memory efficiency comparison
import sys

# List comprehension (stores all values in memory)
list_comp = [x**2 for x in range(1000000)]
print(f"List comprehension memory: {sys.getsizeof(list_comp)} bytes")

# Generator expression (generates values on demand)
gen_expr = (x**2 for x in range(1000000))
print(f"Generator expression memory: {sys.getsizeof(gen_expr)} bytes")
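
One consequence of on-demand evaluation is easy to miss: a generator can be consumed only once. A short sketch of that behavior:

```python
squares = (x**2 for x in range(3))

first_pass = list(squares)   # consumes every value
second_pass = list(squares)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4]
print(second_pass)  # []
```

If the values are needed more than once, build a list instead or recreate the generator.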

Custom Iterators

class CountDown:
    """Custom iterator class."""

    def __init__(self, start):
        """Initialize countdown."""
        self.start = start

    def __iter__(self):
        """Return iterator object."""
        return self

    def __next__(self):
        """Return next value."""
        if self.start <= 0:
            raise StopIteration
        self.start -= 1
        return self.start + 1

# Using custom iterator
countdown = CountDown(5)
for num in countdown:
    print(num, end=" ")
print()

# Iterable using the legacy __getitem__ protocol
class SquareNumbers:
    """Iterable via __getitem__: iter() calls it with 0, 1, 2, ... until IndexError."""

    def __init__(self, max_num):
        """Initialize square numbers."""
        self.max_num = max_num

    def __getitem__(self, index):
        """Get item at index."""
        if index >= self.max_num:
            raise IndexError("Index out of range")
        return index ** 2

# Iterating over the __getitem__-based iterable
squares = SquareNumbers(10)
for square in squares:
    print(square, end=" ")
print()
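
It can help to see what a for loop does with these protocols. The sketch below spells out the equivalent steps by hand; manual_for is a hypothetical helper written only for illustration:

```python
def manual_for(iterable):
    """Collect items the way a for loop does (illustrative helper)."""
    items = []
    iterator = iter(iterable)      # uses __iter__, or the __getitem__ fallback
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # a for loop ends silently at this point
        items.append(item)
    return items

print(manual_for([3, 2, 1]))  # [3, 2, 1]
print(manual_for("ab"))       # ['a', 'b']
```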

Advanced Decorators

Decorator with Arguments

def repeat(times):
    """Decorator that repeats function execution."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for _ in range(times):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

@repeat(3)
def greet(name):
    print(f"Hello, {name}!")

greet("Alice")

# Decorator with optional arguments
def timing_decorator(func=None, *, print_time=True):
    """Timing decorator with optional arguments."""
    def decorator(f):
        import time
        import functools

        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            start_time = time.time()
            result = f(*args, **kwargs)
            end_time = time.time()

            if print_time:
                print(f"{f.__name__} took {end_time - start_time:.4f} seconds")
            return result
        return wrapper

    # Called as @timing_decorator(...): return the real decorator.
    # Called as bare @timing_decorator: func is the function, so decorate it now.
    if func is None:
        return decorator
    else:
        return decorator(func)

@timing_decorator
def slow_function():
    import time
    time.sleep(1)
    return "Done"

@timing_decorator(print_time=False)
def fast_function():
    return "Quick"

slow_function()
fast_function()
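
The repeat decorator above replaces the decorated function with wrapper, so attributes such as __name__ and __doc__ would normally describe wrapper instead. The sketch below is a variant (not the chapter's original code) that applies functools.wraps, as timing_decorator does, to keep that metadata:

```python
import functools

def repeat(times):
    """Variant of repeat that preserves the wrapped function's metadata."""
    def decorator(func):
        @functools.wraps(func)  # copies __name__, __doc__, etc. onto wrapper
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

@repeat(2)
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}!"

print(greet.__name__)  # greet
print(greet.__doc__)   # Return a greeting.
```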

Class Decorators

def singleton(cls):
    """Singleton decorator for classes."""
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance

@singleton
class Database:
    """Singleton database class."""

    def __init__(self):
        self.connection = "Database connection"

# Test singleton
db1 = Database()
db2 = Database()
print(db1 is db2) # True

# Property decorator for classes
def add_property(cls):
    """Add a property to a class."""
    def get_value(self):
        return getattr(self, '_value', None)

    def set_value(self, value):
        self._value = value

    cls.value = property(get_value, set_value)
    return cls

@add_property
class MyClass:
    pass

obj = MyClass()
obj.value = 42
print(obj.value) # 42

Decorator Classes

class CountCalls:
    """Decorator class that counts function calls."""

    def __init__(self, func):
        """Initialize decorator."""
        self.func = func
        self.count = 0

    def __call__(self, *args, **kwargs):
        """Call the decorated function."""
        self.count += 1
        print(f"Call {self.count} of {self.func.__name__}")
        return self.func(*args, **kwargs)

@CountCalls
def say_hello(name):
    return f"Hello, {name}!"

print(say_hello("Alice"))
print(say_hello("Bob"))
print(f"Total calls: {say_hello.count}")

Context Managers

Using with Statement

# Basic context manager
class FileManager:
    """File manager context manager."""

    def __init__(self, filename, mode):
        """Initialize file manager."""
        self.filename = filename
        self.mode = mode
        self.file = None

    def __enter__(self):
        """Enter context manager."""
        print(f"Opening file: {self.filename}")
        self.file = open(self.filename, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Exit context manager."""
        print(f"Closing file: {self.filename}")
        if self.file:
            self.file.close()

        if exc_type:
            print(f"Exception occurred: {exc_val}")
        return False # Don't suppress exceptions

# Using context manager
with FileManager("test.txt", "w") as f:
    f.write("Hello, World!")

# File is automatically closed
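
The return value of __exit__ decides what happens to an exception raised inside the with block: a false value lets it propagate, a true value suppresses it. The sketch below illustrates the suppressing case with a hypothetical IgnoreErrors class; the standard library offers contextlib.suppress for the same purpose:

```python
class IgnoreErrors:
    """Hypothetical context manager that swallows selected exception types."""

    def __init__(self, *exc_types):
        self.exc_types = exc_types
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None and issubclass(exc_type, self.exc_types):
            self.caught = exc_val
            return True   # suppress: execution continues after the with block
        return False      # anything else propagates to the caller

with IgnoreErrors(ZeroDivisionError) as guard:
    1 / 0

print(f"Suppressed: {guard.caught!r}")
```

Suppressing exceptions hides failures, so it is usually reserved for errors that are expected and safe to ignore.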

Context Manager with contextlib

from contextlib import contextmanager
import time

@contextmanager
def timer():
    """Context manager for timing code execution."""
    start_time = time.time()
    print("Starting timer...")
    try:
        yield
    finally:
        end_time = time.time()
        print(f"Execution time: {end_time - start_time:.4f} seconds")

# Using timer context manager
with timer():
    time.sleep(1)
    print("Doing some work...")

# Context manager for database connections
@contextmanager
def database_connection():
    """Simulate database connection."""
    print("Connecting to database...")
    connection = "Database connection"
    try:
        yield connection
    finally:
        print("Closing database connection...")

with database_connection() as db:
    print(f"Using {db}")

Multiple Context Managers

from contextlib import ExitStack

def process_files(filenames):
    """Process multiple files using ExitStack."""
    with ExitStack() as stack:
        # Every file entered here is closed when the with block exits
        files = [stack.enter_context(open(fname, 'r')) for fname in filenames]
        # Process files
        for file in files:
            print(f"Processing {file.name}")
            content = file.read()
            print(f"Content length: {len(content)}")

# Using multiple context managers
filenames = ["file1.txt", "file2.txt", "file3.txt"]
try:
    process_files(filenames)
except FileNotFoundError as e:
    print(f"File not found: {e}")
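
ExitStack unwinds in reverse order of registration, like nested with statements. The sketch below makes the order visible with ExitStack.callback, which registers a plain function to run on exit; the log list is only there for illustration:

```python
from contextlib import ExitStack

log = []
with ExitStack() as stack:
    stack.callback(log.append, "first registered")
    stack.callback(log.append, "second registered")
    log.append("body")

print(log)  # ['body', 'second registered', 'first registered']
```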

Metaclasses

Basic Metaclass

class SingletonMeta(type):
    """Metaclass for singleton pattern."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Database(metaclass=SingletonMeta):
    """Database class using singleton metaclass."""

    def __init__(self):
        self.connection = "Database connection"

# Test singleton metaclass
db1 = Database()
db2 = Database()
print(db1 is db2) # True

# Metaclass for automatic registration
class PluginMeta(type):
    """Metaclass for plugin registration."""

    registry = {}

    def __new__(cls, name, bases, attrs):
        new_class = super().__new__(cls, name, bases, attrs)
        # Register every class except the Plugin base class itself
        if name != 'Plugin':
            cls.registry[name] = new_class
        return new_class

class Plugin(metaclass=PluginMeta):
    """Base plugin class."""
    pass

class EmailPlugin(Plugin):
    """Email plugin."""
    pass

class SMSPlugin(Plugin):
    """SMS plugin."""
    pass

# Access registered plugins
print("Registered plugins:", list(PluginMeta.registry.keys()))
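
For plain subclass registration, a metaclass is often more machinery than needed. The sketch below shows an alternative technique, the __init_subclass__ hook, which Python calls on the base class each time a subclass is defined. The class names mirror the example above, but this is a separate illustration rather than a drop-in replacement:

```python
class Plugin:
    """Base class that registers every subclass by name."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Plugin.registry[cls.__name__] = cls

class EmailPlugin(Plugin):
    """Email plugin."""

class SMSPlugin(Plugin):
    """SMS plugin."""

print(list(Plugin.registry))  # ['EmailPlugin', 'SMSPlugin']
```

The base class is never registered because the hook runs only for subclasses, so no name check is required.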

Advanced Metaclass

class ValidatedMeta(type):
    """Metaclass for automatic validation."""

    def __new__(cls, name, bases, attrs):
        new_class = super().__new__(cls, name, bases, attrs)

        # Annotations such as "name: str" are stored in __annotations__,
        # not as ordinary entries in attrs, so read them from the new class.
        for key, value in getattr(new_class, '__annotations__', {}).items():
            if isinstance(value, type):
                setattr(new_class, f'_validate_{key}', cls._create_validator(value))

        return new_class

    @staticmethod
    def _create_validator(field_type):
        """Create validator for field type."""
        def validator(self, value):
            if not isinstance(value, field_type):
                raise TypeError(f"Expected {field_type.__name__}, got {type(value).__name__}")
            return value
        return validator

class Person(metaclass=ValidatedMeta):
    """Person class with validation."""

    name: str
    age: int

    def __init__(self, name, age):
        self.name = self._validate_name(name)
        self.age = self._validate_age(age)

# Using validated class
person = Person("Alice", 25)
print(f"Name: {person.name}, Age: {person.age}")

try:
    invalid_person = Person("Bob", "25") # Should raise TypeError
except TypeError as e:
    print(f"Validation error: {e}")

Descriptors

Property Descriptors

class Temperature:
    """Descriptor that validates a temperature in Celsius."""

    def __set_name__(self, owner, name):
        # Called once when the owner class is created; name is "temperature"
        self.private_name = f"_{name}"

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if value < -273.15:
            raise ValueError("Temperature cannot be below absolute zero")
        # Store on the instance so each Weather object keeps its own value
        setattr(instance, self.private_name, value)

class Weather:
    """Weather class using temperature descriptor."""

    temperature = Temperature()

    def __init__(self, temp):
        self.temperature = temp

# Using descriptor
weather = Weather(25)
print(f"Temperature: {weather.temperature}°C")

weather.temperature = 30
print(f"Temperature: {weather.temperature}°C")

try:
    weather.temperature = -300 # Should raise ValueError
except ValueError as e:
    print(f"Error: {e}")

Custom Descriptor

class ValidatedAttribute:
    """Descriptor for validated attributes."""

    def __init__(self, validator_func):
        self.validator_func = validator_func
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if not self.validator_func(value):
            raise ValueError(f"Invalid value for {self.name}: {value}")
        instance.__dict__[self.name] = value

def is_positive(value):
    """Validator for positive numbers."""
    return isinstance(value, (int, float)) and value > 0

def is_non_empty_string(value):
    """Validator for non-empty strings."""
    return isinstance(value, str) and len(value.strip()) > 0

class Product:
    """Product class with validated attributes."""

    name = ValidatedAttribute(is_non_empty_string)
    price = ValidatedAttribute(is_positive)
    quantity = ValidatedAttribute(is_positive)

    def __init__(self, name, price, quantity):
        self.name = name
        self.price = price
        self.quantity = quantity

# Using validated attributes
product = Product("Laptop", 999.99, 10)
print(f"Product: {product.name}, Price: ${product.price}")

try:
    invalid_product = Product("", -100, 5) # Should raise ValueError
except ValueError as e:
    print(f"Validation error: {e}")

Concurrency

Threading

import threading
import time
import queue

def worker(name, work_queue, result_queue):
    """Worker function for threading."""
    while True:
        try:
            item = work_queue.get(timeout=1)
            if item is None:
                break  # Sentinel value: no more work

            print(f"{name} processing {item}")
            time.sleep(0.5) # Simulate work
            result_queue.put(f"{name} processed {item}")
            work_queue.task_done()
        except queue.Empty:
            break

# Using threading
work_queue = queue.Queue()
result_queue = queue.Queue()

# Add work items
for i in range(10):
    work_queue.put(f"Task {i}")

# Create and start threads
threads = []
for i in range(3):
    thread = threading.Thread(target=worker, args=(f"Worker-{i}", work_queue, result_queue))
    thread.start()
    threads.append(thread)

# Wait for all work to be done
work_queue.join()

# Stop workers
for _ in range(3):
    work_queue.put(None)

for thread in threads:
    thread.join()

# Collect results
results = []
while not result_queue.empty():
    results.append(result_queue.get())

print("Results:", results)
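
The same worker-pool pattern is available ready-made in the standard library's concurrent.futures module. The following is a minimal sketch, with a hypothetical process function standing in for real I/O-bound work:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def process(item):
    """Stand-in for I/O-bound work (hypothetical task)."""
    time.sleep(0.1)
    return f"processed {item}"

tasks = [f"Task {i}" for i in range(6)]

# The pool starts the threads, distributes the work, and joins them on exit
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(process, tasks))  # results keep input order

print(results)
```

Compared with the manual version, there are no queues or sentinel values to manage, at the cost of less control over how each worker behaves.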

Multiprocessing

import multiprocessing
import time

def cpu_intensive_task(n):
    """CPU intensive task."""
    result = 0
    for i in range(n):
        result += i ** 2
    return result

def process_data(data):
    """Process data in parallel."""
    with multiprocessing.Pool() as pool:
        results = pool.map(cpu_intensive_task, data)
    return results

# Using multiprocessing
# The __main__ guard is required when worker processes are started with the
# "spawn" method (the default on Windows and macOS); without it, each worker
# would re-run this module's top-level code.
if __name__ == "__main__":
    data = [1000000, 2000000, 3000000, 4000000]

    start_time = time.time()
    results = process_data(data)
    end_time = time.time()

    print(f"Results: {results}")
    print(f"Processing time: {end_time - start_time:.4f} seconds")

Asyncio (Asynchronous Programming)

import asyncio
import aiohttp # Third-party package: pip install aiohttp
import time

async def fetch_url(session, url):
    """Fetch URL asynchronously."""
    async with session.get(url) as response:
        return await response.text()

async def fetch_multiple_urls(urls):
    """Fetch multiple URLs concurrently."""
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
    return results

# Using asyncio
async def main():
    urls = [
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/1"
    ]

    start_time = time.time()
    results = await fetch_multiple_urls(urls)
    end_time = time.time()

    print(f"Fetched {len(results)} URLs in {end_time - start_time:.4f} seconds")

# Run async function
# asyncio.run(main())
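
The example above needs network access and a third-party package. To see the same concurrency pattern with the standard library alone, the sketch below replaces the HTTP request with asyncio.sleep; fake_fetch and run_all are hypothetical names used only here:

```python
import asyncio
import time

async def fake_fetch(name, delay):
    """Pretend to fetch something; asyncio.sleep stands in for network I/O."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def run_all():
    # gather runs the coroutines concurrently and preserves argument order
    return await asyncio.gather(
        fake_fetch("a", 0.2),
        fake_fetch("b", 0.1),
        fake_fetch("c", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(run_all())
elapsed = time.perf_counter() - start

print(results)  # ['a done', 'b done', 'c done']
print(f"Elapsed: {elapsed:.2f}s")
```

The elapsed time should be close to the longest single delay rather than the sum of all three, because the waits overlap.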

Regular Expressions

Basic Pattern Matching

import re

# Basic pattern matching
text = "The quick brown fox jumps over the lazy dog"
pattern = r"fox"
match = re.search(pattern, text)
if match:
    print(f"Found '{match.group()}' at position {match.start()}")

# Find all matches
pattern = r"\b\w{4}\b" # 4-letter words
matches = re.findall(pattern, text)
print("4-letter words:", matches)

# Pattern with groups
pattern = r"(\w+) (\w+)"
matches = re.findall(pattern, text)
print("Word pairs:", matches)

# Substitution
new_text = re.sub(r"fox", "cat", text)
print("After substitution:", new_text)

Advanced Regular Expressions

# Email validation
def validate_email(email):
    """Validate email address."""
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return re.match(pattern, email) is not None

# Phone number validation
def validate_phone(phone):
    """Validate phone number."""
    pattern = r'^\+?1?[-.\s]?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})$'
    return re.match(pattern, phone) is not None

# Extract information from text
def extract_info(text):
    """Extract information from text."""
    # Extract dates
    date_pattern = r'\b(\d{1,2})/(\d{1,2})/(\d{4})\b'
    dates = re.findall(date_pattern, text)

    # Extract phone numbers
    phone_pattern = r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    phones = re.findall(phone_pattern, text)

    # Extract email addresses
    # Inside a character class "|" is a literal pipe, so use [A-Za-z], not [A-Z|a-z]
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    emails = re.findall(email_pattern, text)

    return {
        'dates': dates,
        'phones': phones,
        'emails': emails
    }

# Test functions
test_text = """
Contact John at [email protected] or call (555) 123-4567.
Meeting scheduled for 12/25/2024.
Also reach out to [email protected] at 555.987.6543.
"""

print("Email validation:")
print(validate_email("[email protected]")) # True
print(validate_email("invalid-email")) # False

print("\nPhone validation:")
print(validate_phone("(555) 123-4567")) # True
print(validate_phone("555-123-4567")) # True
print(validate_phone("invalid-phone")) # False

print("\nExtracted information:")
info = extract_info(test_text)
for key, value in info.items():
    print(f"{key}: {value}")
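
The date pattern in extract_info returns bare tuples, which leaves the reader to remember which position holds the month. Named groups make the result self-describing. The sketch below uses the same date format; the name date_re is an assumption for illustration:

```python
import re

# Compile once when a pattern is reused; (?P<name>...) gives a group a name
date_re = re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})")

match = date_re.search("Meeting scheduled for 12/25/2024.")
if match:
    print(match.group("year"))  # 2024
    print(match.groupdict())    # {'month': '12', 'day': '25', 'year': '2024'}
```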

File Handling

Reading and Writing Files

# Writing to files
def write_to_file(filename, content):
    """Write content to file."""
    with open(filename, 'w') as f:
        f.write(content)
    print(f"Content written to {filename}")

# Reading from files
def read_from_file(filename):
    """Read content from file."""
    try:
        with open(filename, 'r') as f:
            content = f.read()
        return content
    except FileNotFoundError:
        print(f"File {filename} not found")
        return None

# Working with CSV files
import csv

def write_csv(filename, data):
    """Write data to CSV file."""
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(data)
    print(f"Data written to {filename}")

def read_csv(filename):
    """Read data from CSV file (every value comes back as a string)."""
    data = []
    # newline='' lets the csv module handle line endings itself
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            data.append(row)
    return data

# Working with JSON files
import json

def write_json(filename, data):
    """Write data to JSON file."""
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)
    print(f"Data written to {filename}")

def read_json(filename):
    """Read data from JSON file."""
    try:
        with open(filename, 'r') as f:
            data = json.load(f)
        return data
    except FileNotFoundError:
        print(f"File {filename} not found")
        return None

# Example usage
sample_data = [
    ["Name", "Age", "City"],
    ["Alice", 25, "New York"],
    ["Bob", 30, "London"],
    ["Charlie", 35, "Tokyo"]
]

write_csv("sample.csv", sample_data)
csv_data = read_csv("sample.csv")
print("CSV data:", csv_data)

json_data = {
    "users": [
        {"name": "Alice", "age": 25, "city": "New York"},
        {"name": "Bob", "age": 30, "city": "London"}
    ]
}

write_json("sample.json", json_data)
json_data_read = read_json("sample.json")
print("JSON data:", json_data_read)
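
For small files, pathlib offers a shorter route than open(). The sketch below writes and reads JSON inside a temporary directory so that it leaves nothing behind; the file name and data are placeholders:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.json"

    # write_text and read_text open and close the file for you
    path.write_text(json.dumps({"name": "Alice"}), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))

    print(path.exists())  # True
    print(loaded)         # {'name': 'Alice'}
```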

Error Handling and Debugging

Advanced Exception Handling

import logging
import traceback

# Custom exception classes
class ValidationError(Exception):
    """Custom validation error."""
    pass

class DatabaseError(Exception):
    """Custom database error."""
    pass

# Exception handling with logging
def setup_logging():
    """Setup logging configuration."""
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.FileHandler('app.log'),
            logging.StreamHandler()
        ]
    )

def validate_data(data):
    """Validate data with custom exceptions."""
    if not isinstance(data, dict):
        raise ValidationError("Data must be a dictionary")

    if 'name' not in data:
        raise ValidationError("Name is required")

    if 'age' not in data:
        raise ValidationError("Age is required")

    if not isinstance(data['age'], int) or data['age'] < 0:
        raise ValidationError("Age must be a non-negative integer")

    return True

def process_data(data):
    """Process data with comprehensive error handling."""
    try:
        validate_data(data)
        logging.info(f"Processing data for {data['name']}")

        # Simulate processing
        if data['age'] > 100:
            raise DatabaseError("Age too high for database")

        return f"Processed {data['name']}, age {data['age']}"

    except ValidationError as e:
        logging.error(f"Validation error: {e}")
        return f"Validation failed: {e}"

    except DatabaseError as e:
        logging.error(f"Database error: {e}")
        return f"Database operation failed: {e}"

    except Exception as e:
        # Catch-all for anything unforeseen; log the full traceback
        logging.error(f"Unexpected error: {e}")
        logging.error(traceback.format_exc())
        return f"Unexpected error: {e}"

# Setup logging and test
setup_logging()

# Test with valid data
valid_data = {"name": "Alice", "age": 25}
result = process_data(valid_data)
print(result)

# Test with invalid data
invalid_data = {"name": "Bob", "age": -5}
result = process_data(invalid_data)
print(result)

# Test with database error
high_age_data = {"name": "Charlie", "age": 150}
result = process_data(high_age_data)
print(result)
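
When a low-level exception is translated into a custom one, the original error is worth keeping for debugging. The sketch below uses raise ... from to chain the two; ConfigError and parse_port are hypothetical names, not part of the example above:

```python
class ConfigError(Exception):
    """Hypothetical application-level error."""

def parse_port(raw):
    """Convert a raw setting to a port number."""
    try:
        return int(raw)
    except ValueError as exc:
        # "from exc" records the original error as __cause__
        raise ConfigError(f"Invalid port: {raw!r}") from exc

try:
    parse_port("eighty")
except ConfigError as err:
    caught = err  # keep a reference; "err" is cleared when the block ends
    print(f"{err} (caused by {type(err.__cause__).__name__})")
```

The traceback of a chained exception shows both errors, so the reader sees the application-level message and the underlying cause together.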

Performance Optimization

Profiling and Optimization

import time
import cProfile
import pstats
from functools import lru_cache

# Function without optimization
def fibonacci_slow(n):
    """Slow Fibonacci implementation."""
    if n <= 1:
        return n
    return fibonacci_slow(n-1) + fibonacci_slow(n-2)

# Function with memoization
@lru_cache(maxsize=None)
def fibonacci_fast(n):
    """Fast Fibonacci implementation with memoization."""
    if n <= 1:
        return n
    return fibonacci_fast(n-1) + fibonacci_fast(n-2)

# Function with iterative approach
def fibonacci_iterative(n):
    """Iterative Fibonacci implementation."""
    if n <= 1:
        return n

    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

# Performance comparison
def compare_performance():
    """Compare performance of different implementations."""
    n = 35

    # time.perf_counter() is a high-resolution clock meant for timing.
    # time.time() can be too coarse here: a very fast call may measure
    # as 0.0 seconds and break the speedup division below.

    # Test slow version
    start_time = time.perf_counter()
    result_slow = fibonacci_slow(n)
    slow_time = time.perf_counter() - start_time

    # Test fast version
    start_time = time.perf_counter()
    result_fast = fibonacci_fast(n)
    fast_time = time.perf_counter() - start_time

    # Test iterative version
    start_time = time.perf_counter()
    result_iterative = fibonacci_iterative(n)
    iterative_time = time.perf_counter() - start_time

    print(f"Fibonacci({n}) = {result_slow}")
    print(f"Slow version: {slow_time:.4f} seconds")
    print(f"Fast version: {fast_time:.4f} seconds")
    print(f"Iterative version: {iterative_time:.4f} seconds")
    print(f"Speedup (fast): {slow_time / fast_time:.2f}x")
    print(f"Speedup (iterative): {slow_time / iterative_time:.2f}x")

# Run performance comparison
compare_performance()

# Profiling example
def profile_function():
    """Profile a function using cProfile."""
    def sample_function():
        """Sample function to profile."""
        total = 0
        for i in range(1000000):
            total += i ** 2
        return total

    # Profile the function
    profiler = cProfile.Profile()
    profiler.enable()
    result = sample_function()
    profiler.disable()

    # Print profiling results
    stats = pstats.Stats(profiler)
    stats.sort_stats('cumulative')
    stats.print_stats(10) # Print top 10 functions

    return result

# Run profiling
print("\nProfiling results:")
profile_function()
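
Timing a single call, as compare_performance does, is sensitive to noise. For small functions the standard library's timeit module runs the code many times and reports totals. The sketch below is one way to use it; fib_iter is a hypothetical stand-in for the iterative implementation above:

```python
import timeit

def fib_iter(n):
    """Iterative Fibonacci (stand-in for fibonacci_iterative)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# repeat() returns one total time per run; the minimum is the least noisy
timings = timeit.repeat(lambda: fib_iter(35), number=10_000, repeat=3)
best_per_call = min(timings) / 10_000

print(f"Best of 3: {best_per_call * 1e6:.2f} microseconds per call")
```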

Practical Examples

Example 1: Web Scraper with Advanced Features

import requests
import re
from urllib.parse import urljoin, urlparse
from collections import deque
import time

class WebScraper:
    """Advanced web scraper with rate limiting and error handling."""

    def __init__(self, delay=1):
        """Initialize scraper."""
        self.delay = delay
        self.visited = set()
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })

    def scrape_page(self, url):
        """Scrape a single page."""
        try:
            response = self.session.get(url, timeout=10)
            response.raise_for_status()

            # Extract links
            links = re.findall(r'href="([^"]*)"', response.text)
            absolute_links = [urljoin(url, link) for link in links]

            # Extract text content
            text_content = re.sub(r'<[^>]+>', '', response.text)
            text_content = re.sub(r'\s+', ' ', text_content).strip()

            return {
                'url': url,
                'title': self.extract_title(response.text),
                'content': text_content[:500], # First 500 characters
                'links': absolute_links,
                'status_code': response.status_code
            }

        except requests.RequestException as e:
            print(f"Error scraping {url}: {e}")
            return None

    def extract_title(self, html):
        """Extract title from HTML."""
        title_match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE)
        return title_match.group(1) if title_match else "No title"

    def crawl(self, start_url, max_pages=10):
        """Crawl multiple pages."""
        queue = deque([start_url])
        results = []

        while queue and len(results) < max_pages:
            url = queue.popleft()

            if url in self.visited:
                continue

            self.visited.add(url)
            result = self.scrape_page(url)

            if result:
                results.append(result)
                # Add new links to queue
                for link in result['links']:
                    if link not in self.visited and len(queue) < max_pages * 2:
                        queue.append(link)

            time.sleep(self.delay) # Rate limiting

        return results

# Using the scraper
scraper = WebScraper(delay=0.5)
results = scraper.crawl("https://httpbin.org/html", max_pages=3)

for result in results:
    print(f"URL: {result['url']}")
    print(f"Title: {result['title']}")
    print(f"Content: {result['content'][:100]}...")
    print(f"Links found: {len(result['links'])}")
    print("-" * 50)

Example 2: Data Processing Pipeline

import json
import csv
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from pathlib import Path

@dataclass
class DataRecord:
    """Data record class."""
    id: str
    name: str
    age: int
    city: str
    salary: float

class DataProcessor:
    """Data processing pipeline."""

    def __init__(self):
        """Initialize processor."""
        self.records = []

    def load_from_json(self, filename: str) -> None:
        """Load data from JSON file."""
        with open(filename, 'r') as f:
            data = json.load(f)

        for item in data:
            record = DataRecord(
                id=item['id'],
                name=item['name'],
                age=item['age'],
                city=item['city'],
                salary=item['salary']
            )
            self.records.append(record)

    def load_from_csv(self, filename: str) -> None:
        """Load data from CSV file."""
        with open(filename, 'r', newline='') as f:
            reader = csv.DictReader(f)
            for row in reader:
                # CSV values are strings, so convert the numeric fields
                record = DataRecord(
                    id=row['id'],
                    name=row['name'],
                    age=int(row['age']),
                    city=row['city'],
                    salary=float(row['salary'])
                )
                self.records.append(record)

    def filter_by_age(self, min_age: int, max_age: int) -> List[DataRecord]:
        """Filter records by age range (inclusive)."""
        return [record for record in self.records
                if min_age <= record.age <= max_age]

    def filter_by_city(self, city: str) -> List[DataRecord]:
        """Filter records by city (case-insensitive)."""
        return [record for record in self.records
                if record.city.lower() == city.lower()]

    def calculate_statistics(self) -> Dict[str, Any]:
        """Calculate statistics for the data."""
        if not self.records:
            return {}

        salaries = [record.salary for record in self.records]
        ages = [record.age for record in self.records]

        return {
            'total_records': len(self.records),
            'average_salary': sum(salaries) / len(salaries),
            'max_salary': max(salaries),
            'min_salary': min(salaries),
            'average_age': sum(ages) / len(ages),
            'cities': list(set(record.city for record in self.records))
        }

    def export_to_csv(self, filename: str, records: Optional[List[DataRecord]] = None) -> None:
        """Export records to CSV file."""
        if records is None:
            records = self.records

        with open(filename, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['id', 'name', 'age', 'city', 'salary'])
            for record in records:
                writer.writerow([record.id, record.name, record.age,
                                 record.city, record.salary])

    def export_to_json(self, filename: str, records: Optional[List[DataRecord]] = None) -> None:
        """Export records to JSON file."""
        if records is None:
            records = self.records

        data = []
        for record in records:
            data.append({
                'id': record.id,
                'name': record.name,
                'age': record.age,
                'city': record.city,
                'salary': record.salary
            })

        with open(filename, 'w') as f:
            json.dump(data, f, indent=2)

# Using the data processor
processor = DataProcessor()

# Create sample data
sample_data = [
    {"id": "1", "name": "Alice", "age": 25, "city": "New York", "salary": 75000},
    {"id": "2", "name": "Bob", "age": 30, "city": "London", "salary": 80000},
    {"id": "3", "name": "Charlie", "age": 35, "city": "Tokyo", "salary": 90000},
    {"id": "4", "name": "Diana", "age": 28, "city": "New York", "salary": 85000}
]

# Save sample data
with open("sample_data.json", "w") as f:
    json.dump(sample_data, f, indent=2)

# Load and process data
processor.load_from_json("sample_data.json")

# Filter data
young_employees = processor.filter_by_age(25, 30)
ny_employees = processor.filter_by_city("New York")

# Calculate statistics
stats = processor.calculate_statistics()
print("Statistics:", stats)

# Export filtered data
processor.export_to_csv("young_employees.csv", young_employees)
processor.export_to_json("ny_employees.json", ny_employees)

print(f"Found {len(young_employees)} young employees")
print(f"Found {len(ny_employees)} New York employees")
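
The export methods above list every field by hand. Because DataRecord is a dataclass, the dataclasses module can derive the same information. The sketch below shows one possible simplification and repeats the DataRecord definition so that it runs on its own:

```python
from dataclasses import dataclass, asdict, fields

@dataclass
class DataRecord:
    """Same fields as the DataRecord defined above."""
    id: str
    name: str
    age: int
    city: str
    salary: float

record = DataRecord("1", "Alice", 25, "New York", 75000.0)

# asdict builds the dictionary that export_to_json assembles manually
row = asdict(record)
print(row)

# fields() yields the field names in declaration order, usable as a CSV header
header = [f.name for f in fields(DataRecord)]
print(header)  # ['id', 'name', 'age', 'city', 'salary']
```

With this approach, adding a field to DataRecord would not require touching the export code.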

Summary

In this chapter, we've covered:

  • Generators and Iterators: Memory-efficient iteration and custom iteration patterns
  • Advanced Decorators: Decorators with arguments, class decorators, and decorator classes
  • Context Managers: Resource management with with statements and contextlib
  • Metaclasses: Class creation and modification at the metaclass level
  • Descriptors: Attribute access control and validation
  • Concurrency: Threading, multiprocessing, and asynchronous programming
  • Regular Expressions: Pattern matching and text processing
  • File Handling: Reading and writing various file formats
  • Error Handling: Advanced exception management and debugging
  • Performance Optimization: Profiling and optimization techniques

These advanced Python topics provide the foundation for building sophisticated, efficient, and maintainable applications. Understanding these concepts will help you write more Pythonic code and leverage Python's full potential.

Next Steps

Now that you've mastered advanced Python topics, you're ready to explore:

  1. Web Development: Django, Flask, or FastAPI frameworks
  2. Data Science: NumPy, Pandas, and Matplotlib
  3. Machine Learning: Scikit-learn, TensorFlow, or PyTorch
  4. DevOps: Docker, CI/CD, and deployment strategies
  5. Testing: Unit testing, integration testing, and test automation

Congratulations! You've completed the comprehensive Python programming tutorial. You now have a solid foundation in Python programming and are ready to tackle real-world projects and advanced frameworks.