Assignment 8-A

Log File Analyzer

Build a powerful log file analyzer that reads, parses, filters, and generates reports from server log files using file streams and I/O operations.

6-8 hours
Intermediate
200 Points
What You'll Practice
  • File input/output streams
  • Text file parsing
  • Binary file operations
  • String stream manipulation
  • Error handling for I/O
01

Assignment Overview

Create a Log File Analyzer that processes Apache/Nginx-style log files, extracts statistics, filters entries, and generates reports. The project draws on every topic in Module 8: file input/output, stream operations, and system programming.

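No External Libraries: You must use ONLY the C++ standard library. The core headers are <fstream>, <sstream>, and <iostream>; the standard containers and utilities that the requirements call for (such as <vector>, <map>, <set>, <string>, and <iomanip>) are also allowed. This tests your understanding of core C++ file I/O.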
Skills Applied: This assignment tests your understanding of File I/O (Topic 8.1), Streams and Formatting (Topic 8.2), and System Programming (Topic 8.3) from Module 8.
File I/O (8.1)

ifstream, ofstream, fstream - Reading and writing files

Streams (8.2)

stringstream parsing, formatting output, stream states

System Programming (8.3)

Command-line arguments, environment variables

02

The Scenario

CloudNet Hosting Services

You have been hired as a C++ Developer at CloudNet Hosting Services, a web hosting company that needs a log analysis tool. The DevOps manager has given you this task:

"We have Apache/Nginx web server logs generating gigabytes of data daily. We need a fast C++ tool that can parse these logs, extract key statistics, filter suspicious traffic, identify performance bottlenecks, and generate detailed reports. Can you build this analyzer for us?"

Your Task

Create a command-line C++ application called LogAnalyzer that reads Apache/Nginx log files, processes millions of entries efficiently using file streams, extracts statistical data, filters based on criteria, and generates both text and binary report formats.

03

The Dataset

You will work with Apache/Nginx access logs in the Combined Log Format (the Common Log Format plus referrer and user-agent fields). Create a sample log file or use the provided format:

File: sample.log (Apache/Nginx Access Log)

192.168.1.10 - - [15/Jan/2026:10:23:14 +0000] "GET /index.html HTTP/1.1" 200 4523 "-" "Mozilla/5.0"
10.0.0.25 - - [15/Jan/2026:10:23:15 +0000] "POST /api/login HTTP/1.1" 200 1024 "-" "curl/7.68.0"
203.0.113.45 - - [15/Jan/2026:10:23:16 +0000] "GET /images/logo.png HTTP/1.1" 200 15234 "https://example.com/" "Chrome/96.0"
192.168.1.10 - - [15/Jan/2026:10:23:18 +0000] "GET /style.css HTTP/1.1" 200 8456 "-" "Mozilla/5.0"
198.51.100.88 - - [15/Jan/2026:10:23:19 +0000] "GET /admin/login HTTP/1.1" 404 512 "-" "Python-requests/2.28.0"
203.0.113.45 - - [15/Jan/2026:10:23:20 +0000] "GET /about.html HTTP/1.1" 200 3421 "-" "Chrome/96.0"
10.0.0.25 - - [15/Jan/2026:10:23:22 +0000] "POST /api/data HTTP/1.1" 500 256 "-" "curl/7.68.0"
192.168.1.10 - - [15/Jan/2026:10:23:25 +0000] "GET /contact.html HTTP/1.1" 200 2134 "-" "Mozilla/5.0"
198.51.100.88 - - [15/Jan/2026:10:23:27 +0000] "GET /admin/dashboard HTTP/1.1" 403 256 "-" "Python-requests/2.28.0"
203.0.113.45 - - [15/Jan/2026:10:23:30 +0000] "GET /products.html HTTP/1.1" 200 12456 "https://example.com/" "Chrome/96.0"
Log Format Explained (Combined Log Format)
IP_ADDRESS - - [TIMESTAMP] "METHOD URL PROTOCOL" STATUS_CODE BYTES "REFERRER" "USER_AGENT"
  • IP_ADDRESS - Client IP address (string)
  • TIMESTAMP - Date and time in format: [DD/MMM/YYYY:HH:MM:SS +ZONE]
  • METHOD - HTTP method (GET, POST, PUT, DELETE, etc.)
  • URL - Requested URL path (string)
  • PROTOCOL - HTTP protocol version (HTTP/1.1, HTTP/2.0)
  • STATUS_CODE - HTTP response status (200, 404, 500, etc.) - integer
  • BYTES - Response size in bytes - integer
  • REFERRER - Referring page URL or "-"
  • USER_AGENT - Client browser/tool identifier
Parsing Tip: Use std::stringstream and std::getline() with custom delimiters (space, quotes, brackets) to extract each field. Handle edge cases like missing fields or malformed lines gracefully.
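The tip above can be sketched as follows. This is one possible approach, not the required solution: parseLine is a hypothetical helper name, and the LogEntry struct is repeated here only so the example compiles on its own. It reads whitespace-separated fields with operator>> and switches to std::getline() with a custom delimiter for the bracketed and quoted fields.

```cpp
#include <sstream>
#include <string>

// Mirrors the LogEntry struct from the requirements so the sketch compiles alone.
struct LogEntry {
    std::string ip, timestamp, method, url, referrer, userAgent;
    int statusCode = 0;
    int bytes = 0;
};

// Hypothetical helper: parses one access-log line into `entry`.
// Returns false if the line is malformed (`entry` may then be partially filled).
bool parseLine(const std::string& line, LogEntry& entry) {
    std::istringstream in(line);
    std::string skip, request, protocol;

    // IP address, then the two "-" placeholder fields
    if (!(in >> entry.ip >> skip >> skip)) return false;

    // Timestamp: discard everything up to '[', then read up to ']'
    if (!std::getline(in, skip, '[')) return false;
    if (!std::getline(in, entry.timestamp, ']')) return false;

    // Request: discard up to the opening quote, then read up to the closing quote
    if (!std::getline(in, skip, '"')) return false;
    if (!std::getline(in, request, '"')) return false;
    std::istringstream req(request);
    if (!(req >> entry.method >> entry.url >> protocol)) return false;

    // Status code and response size; a non-numeric value fails the extraction
    if (!(in >> entry.statusCode >> entry.bytes)) return false;

    // Referrer and user agent are both quoted
    if (!std::getline(in, skip, '"')) return false;
    if (!std::getline(in, entry.referrer, '"')) return false;
    if (!std::getline(in, skip, '"')) return false;
    if (!std::getline(in, entry.userAgent, '"')) return false;
    return true;
}
```

This sketch rejects any line whose byte count is not a number. Real servers may log "-" in that position when no response body is sent, so you may want to treat that case as zero bytes instead of a malformed line.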
04

Requirements

Your LogAnalyzer application must implement ALL of the following features. Every component is mandatory unless it is marked as a bonus, and each will be tested individually.

1
LogEntry Structure

Create a LogEntry struct in LogEntry.h that:

  • Contains all log fields: ip, timestamp, method, url, statusCode, bytes, referrer, userAgent
  • Uses appropriate data types (string, int, etc.)
  • Includes a default constructor
#include <string>

struct LogEntry {
    std::string ip;
    std::string timestamp;
    std::string method;
    std::string url;
    int statusCode;
    int bytes;
    std::string referrer;
    std::string userAgent;
    
    // Numeric fields start at zero; strings default to empty
    LogEntry() : statusCode(0), bytes(0) {}
};
2
Parse Log File

Implement a function that reads and parses the log file:

  • Use std::ifstream with proper error checking
  • Use std::stringstream to parse each line
  • Handle quoted fields correctly (timestamps in brackets, strings in quotes)
  • Skip or report malformed lines without crashing
  • Return a std::vector<LogEntry> containing all valid entries
std::vector<LogEntry> parseLogFile(const std::string& filename) {
    std::vector<LogEntry> entries;
    std::ifstream file(filename);
    
    if (!file.is_open()) {
        std::cerr << "Error: Cannot open file " << filename << std::endl;
        return entries;
    }
    
    std::string line;
    while (std::getline(file, line)) {
        // Parse line using stringstream
        // Handle brackets, quotes, and spaces
        // Add valid entries to vector
    }
    
    return entries;
}
3
Generate Statistics

Calculate and display comprehensive statistics:

  • Total requests: Count of all log entries
  • Unique visitors: Count of unique IP addresses (use std::set)
  • Status code distribution: Count by status code (use std::map<int, int>)
  • Top 10 URLs: Most frequently requested URLs (use map and sorting)
  • Total bandwidth: Sum of all bytes transferred
  • Average request size: Total bytes / total requests
  • Requests by hour: Group by hour extracted from timestamp
struct Statistics {
    int totalRequests;
    int uniqueVisitors;
    std::map<int, int> statusCodeCount;
    std::vector<std::pair<std::string, int>> topUrls;
    long long totalBandwidth;
    double avgRequestSize;
    std::map<int, int> requestsByHour;
    
    void display() const;
};

Statistics generateStatistics(const std::vector<LogEntry>& entries);
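A few of these statistics can be sketched in isolation. The helper names below (extractHour, topUrls, countUniqueVisitors) are hypothetical and take plain strings rather than LogEntry objects so the example stays self-contained; adapt them to your own generateStatistics() as you see fit. The hour extraction assumes the timestamp has the bracket-free form "DD/MMM/YYYY:HH:MM:SS +ZONE".

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: extracts the hour from "15/Jan/2026:10:23:14 +0000".
// Returns -1 if the two characters after the first ':' are not a valid hour.
int extractHour(const std::string& timestamp) {
    std::size_t colon = timestamp.find(':');
    if (colon == std::string::npos || colon + 2 >= timestamp.size()) return -1;
    char tens = timestamp[colon + 1];
    char ones = timestamp[colon + 2];
    if (tens < '0' || tens > '9' || ones < '0' || ones > '9') return -1;
    int hour = (tens - '0') * 10 + (ones - '0');
    return hour < 24 ? hour : -1;
}

// Hypothetical helper: returns the `limit` most frequent URLs, highest count first.
// Ties are broken alphabetically so the result is deterministic.
std::vector<std::pair<std::string, int>> topUrls(
        const std::vector<std::string>& urls, std::size_t limit) {
    std::map<std::string, int> counts;
    for (const std::string& url : urls) ++counts[url];

    // A map is ordered by key, so copy into a vector to sort by count instead
    std::vector<std::pair<std::string, int>> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    if (ranked.size() > limit) ranked.resize(limit);
    return ranked;
}

// Counts distinct IP addresses; std::set silently drops duplicates.
int countUniqueVisitors(const std::vector<std::string>& ips) {
    std::set<std::string> unique(ips.begin(), ips.end());
    return static_cast<int>(unique.size());
}
```

When computing the average request size, remember to guard against an empty entry list so that the division by the total request count never divides by zero.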
4
Filtering Functionality

Implement filters to extract specific log entries:

  • By status code: Filter by HTTP status (e.g., show only 404 or 500 errors)
  • By IP address: Show all requests from a specific IP
  • By URL pattern: Filter URLs containing a substring (e.g., "/api/" or "/admin/")
  • By date/time range: Filter by timestamp (bonus)
  • Write filtered results to a new file using std::ofstream
std::vector<LogEntry> filterByStatusCode(
    const std::vector<LogEntry>& entries, int statusCode);

std::vector<LogEntry> filterByIP(
    const std::vector<LogEntry>& entries, const std::string& ip);

std::vector<LogEntry> filterByURL(
    const std::vector<LogEntry>& entries, const std::string& pattern);

void writeFilteredLog(const std::vector<LogEntry>& entries, 
                      const std::string& outputFile);
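As an illustration, two of the declared filters could be written with std::copy_if and a lambda predicate. This is a minimal sketch under the assumption that entries fit in memory; the trimmed-down LogEntry below carries only the fields the filters need and stands in for the full struct.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Reduced stand-in for the full LogEntry struct (only the fields used here).
struct LogEntry {
    std::string ip;
    std::string url;
    int statusCode = 0;
};

// Keeps entries whose status code matches exactly.
std::vector<LogEntry> filterByStatusCode(
        const std::vector<LogEntry>& entries, int statusCode) {
    std::vector<LogEntry> result;
    std::copy_if(entries.begin(), entries.end(), std::back_inserter(result),
                 [statusCode](const LogEntry& e) {
                     return e.statusCode == statusCode;
                 });
    return result;
}

// Keeps entries whose URL contains `pattern` as a substring.
std::vector<LogEntry> filterByURL(
        const std::vector<LogEntry>& entries, const std::string& pattern) {
    std::vector<LogEntry> result;
    std::copy_if(entries.begin(), entries.end(), std::back_inserter(result),
                 [&pattern](const LogEntry& e) {
                     return e.url.find(pattern) != std::string::npos;
                 });
    return result;
}
```

Because each filter takes and returns a std::vector<LogEntry>, filters can be chained, for example filtering by URL first and then by status code.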
5
Text Report Generation

Generate a formatted text report and save to file:

  • Use std::ofstream to write the report
  • Use std::setw(), std::setprecision(), and other manipulators for formatting
  • Include header with analysis date/time
  • Display all statistics in a readable format
  • Include tables with proper alignment
void generateReport(const Statistics& stats, const std::string& filename) {
    std::ofstream report(filename);
    if (!report.is_open()) {
        std::cerr << "Error: Cannot create report file" << std::endl;
        return;
    }
    
    report << "========================================\n";
    report << "       LOG ANALYSIS REPORT\n";
    report << "========================================\n\n";
    
    report << "Total Requests: " << stats.totalRequests << "\n";
    report << "Unique Visitors: " << stats.uniqueVisitors << "\n";
    
    // Use iomanip for formatting
    report << std::fixed << std::setprecision(2);
    report << "Average Request Size: " << stats.avgRequestSize << " bytes\n";
    
    // More formatted output...
}
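For the aligned tables, one option is a small function that writes to any std::ostream, so the same code serves both the report file and the console. The name writeStatusTable and the column widths are assumptions chosen for this sketch.

```cpp
#include <iomanip>
#include <map>
#include <ostream>
#include <string>

// Hypothetical helper: writes the status code distribution as an aligned table.
// Accepts any std::ostream, so it works with std::ofstream and std::cout alike.
void writeStatusTable(std::ostream& out,
                      const std::map<int, int>& statusCodeCount,
                      int totalRequests) {
    out << std::left << std::setw(8) << "Status"
        << std::right << std::setw(10) << "Count"
        << std::setw(10) << "Share" << "\n";
    out << std::string(28, '-') << "\n";

    // fixed + setprecision(1) prints exactly one digit after the decimal point
    out << std::fixed << std::setprecision(1);
    for (const auto& kv : statusCodeCount) {
        // Guard against division by zero for an empty log
        double share = totalRequests > 0
                           ? 100.0 * kv.second / totalRequests
                           : 0.0;
        out << std::left << std::setw(8) << kv.first
            << std::right << std::setw(10) << kv.second
            << std::setw(9) << share << "%\n";
    }
}
```

Note that std::setw() applies only to the next item written, while std::left, std::right, std::fixed, and std::setprecision() stay in effect until changed.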
6
Binary Cache (Bonus)

Implement binary file operations for caching statistics:

  • Save computed statistics to a binary file using write()
  • Load cached statistics using read()
  • Check file modification times to determine if cache is valid
  • Provides faster loading for repeated analysis of large log files
void saveBinaryCache(const Statistics& stats, const std::string& cacheFile) {
    std::ofstream out(cacheFile, std::ios::binary);
    if (!out.is_open()) return;
    
    out.write(reinterpret_cast<const char*>(&stats.totalRequests), sizeof(int));
    out.write(reinterpret_cast<const char*>(&stats.uniqueVisitors), sizeof(int));
    // Write other fields...
}

bool loadBinaryCache(Statistics& stats, const std::string& cacheFile) {
    std::ifstream in(cacheFile, std::ios::binary);
    if (!in.is_open()) return false;
    
    in.read(reinterpret_cast<char*>(&stats.totalRequests), sizeof(int));
    in.read(reinterpret_cast<char*>(&stats.uniqueVisitors), sizeof(int));
    // Read other fields...
    return !in.fail();  // false if the cache was truncated or a read failed
}
7
Command-Line Interface

Implement command-line argument parsing (Topic 8.3):

  • Accept log filename as required argument
  • Support optional flags: -o output.txt (report file), -f status_code (filter), -v (verbose)
  • Display usage message when arguments are missing or --help is provided
  • Validate arguments before processing
int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cout << "Usage: " << argv[0] << " <logfile> [-o report.txt] [-f 404] [-v]" << std::endl;
        return 1;
    }
    
    std::string logFile = argv[1];
    std::string reportFile = "report.txt";
    bool verbose = false;
    int filterStatus = -1;
    
    // Parse optional arguments...
    
    return 0;
}
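The optional-argument loop might look like the sketch below. The Options struct and parseArgs function are hypothetical names, and the accepted status range of 100 to 599 is an assumption of this example, not a stated requirement. The function returns false on invalid input so that main() can print the usage message.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical container for the parsed command line.
struct Options {
    std::string logFile;
    std::string reportFile = "report.txt";
    int filterStatus = -1;  // -1 means "no filter"
    bool verbose = false;
    bool showHelp = false;
};

// Hypothetical helper: fills `opts` from argv.
// Returns false when a flag is unknown, a flag value is missing or invalid,
// or no log file was given.
bool parseArgs(int argc, const char* const argv[], Options& opts) {
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (arg == "--help") {
            opts.showHelp = true;
            return true;
        } else if (arg == "-v") {
            opts.verbose = true;
        } else if (arg == "-o") {
            if (i + 1 >= argc) return false;  // -o needs a filename
            opts.reportFile = argv[++i];
        } else if (arg == "-f") {
            if (i + 1 >= argc) return false;  // -f needs a status code
            const char* text = argv[++i];
            char* end = nullptr;
            long code = std::strtol(text, &end, 10);
            // Reject empty input, trailing characters, and out-of-range codes
            if (end == text || *end != '\0' || code < 100 || code > 599) return false;
            opts.filterStatus = static_cast<int>(code);
        } else if (!arg.empty() && arg[0] == '-') {
            return false;  // unknown flag
        } else if (opts.logFile.empty()) {
            opts.logFile = arg;
        } else {
            return false;  // more than one positional argument
        }
    }
    return !opts.logFile.empty();
}
```

In main(), a call such as parseArgs(argc, argv, opts) works directly, because char* argv[] converts implicitly to const char* const*.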
8
Error Handling

Handle all potential errors gracefully:

  • File not found: Check if log file exists before opening
  • Permission denied: Handle cases where files cannot be opened for reading/writing
  • Malformed log lines: Skip invalid lines and optionally report them
  • Empty files: Handle empty log files without crashing
  • Stream errors: Check stream states (fail(), bad(), eof())
  • Provide meaningful error messages to stderr
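std::ifstream file(filename);
if (!file.is_open()) {
    std::cerr << "Error: Cannot open file '" << filename << "'" << std::endl;
    return false;
}

std::string line;
while (std::getline(file, line)) {
    // Process each line...
}

// The loop ends at end-of-file (expected) or on a read error.
// bad() distinguishes an I/O failure from a normal EOF.
if (file.bad()) {
    std::cerr << "Error: Failed to read from file" << std::endl;
    return false;
}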
05

Submission

Required Repository Name
cpp-log-analyzer
cpp-log-analyzer/
├── LogEntry.h          # Log entry struct
├── LogAnalyzer.h       # Analyzer class
├── LogAnalyzer.cpp     # Implementation
├── main.cpp            # Main application
├── sample.log          # Sample log file
├── sample_output.txt   # Sample report
└── README.md           # Documentation
06

Grading Rubric

Criteria                          Points
Log parsing with stringstream     40
Statistics calculation            40
Filtering functionality           35
Report generation (text)          30
Binary cache (bonus)              25
Error handling                    15
Code quality                      15
Total                             200