Assignment Overview
Create a Log File Analyzer that processes Apache/Nginx-style log files, extracts statistics, filters entries, and generates reports. This project requires you to apply ALL concepts from Module 8: file input/output, stream operations, and system programming.
Use the standard C++ stream headers (<fstream>, <sstream>, <iostream>). This tests your understanding of core C++ file I/O.
File I/O (8.1)
ifstream, ofstream, fstream - Reading and writing files
Streams (8.2)
stringstream parsing, formatting output, stream states
System Programming (8.3)
Command-line arguments, environment variables
The Scenario
CloudNet Hosting Services
You have been hired as a C++ Developer at CloudNet Hosting Services, a web hosting company that needs a log analysis tool. The DevOps manager has given you this task:
"We have Apache/Nginx web server logs generating gigabytes of data daily. We need a fast C++ tool that can parse these logs, extract key statistics, filter suspicious traffic, identify performance bottlenecks, and generate detailed reports. Can you build this analyzer for us?"
Your Task
Create a command-line C++ application called LogAnalyzer that reads Apache/Nginx
log files, processes millions of entries efficiently using file streams, extracts statistical
data, filters based on criteria, and generates both text and binary report formats.
The Dataset
You will work with access logs in the Apache/Nginx Combined Log Format, which is the Common Log Format extended with referrer and user-agent fields. Create a sample log file or use the provided format:
File: sample.log (Apache/Nginx Access Log)
192.168.1.10 - - [15/Jan/2026:10:23:14 +0000] "GET /index.html HTTP/1.1" 200 4523 "-" "Mozilla/5.0"
10.0.0.25 - - [15/Jan/2026:10:23:15 +0000] "POST /api/login HTTP/1.1" 200 1024 "-" "curl/7.68.0"
203.0.113.45 - - [15/Jan/2026:10:23:16 +0000] "GET /images/logo.png HTTP/1.1" 200 15234 "https://example.com/" "Chrome/96.0"
192.168.1.10 - - [15/Jan/2026:10:23:18 +0000] "GET /style.css HTTP/1.1" 200 8456 "-" "Mozilla/5.0"
198.51.100.88 - - [15/Jan/2026:10:23:19 +0000] "GET /admin/login HTTP/1.1" 404 512 "-" "Python-requests/2.28.0"
203.0.113.45 - - [15/Jan/2026:10:23:20 +0000] "GET /about.html HTTP/1.1" 200 3421 "-" "Chrome/96.0"
10.0.0.25 - - [15/Jan/2026:10:23:22 +0000] "POST /api/data HTTP/1.1" 500 256 "-" "curl/7.68.0"
192.168.1.10 - - [15/Jan/2026:10:23:25 +0000] "GET /contact.html HTTP/1.1" 200 2134 "-" "Mozilla/5.0"
198.51.100.88 - - [15/Jan/2026:10:23:27 +0000] "GET /admin/dashboard HTTP/1.1" 403 256 "-" "Python-requests/2.28.0"
203.0.113.45 - - [15/Jan/2026:10:23:30 +0000] "GET /products.html HTTP/1.1" 200 12456 "https://example.com/" "Chrome/96.0"
Log Format Explained (Combined Log Format)
IP_ADDRESS - - [TIMESTAMP] "METHOD URL PROTOCOL" STATUS_CODE BYTES "REFERRER" "USER_AGENT"
- IP_ADDRESS: Client IP address (string)
- TIMESTAMP: Date and time in the format [DD/MMM/YYYY:HH:MM:SS +ZONE]
- METHOD: HTTP method (GET, POST, PUT, DELETE, etc.)
- URL: Requested URL path (string)
- PROTOCOL: HTTP protocol version (HTTP/1.1, HTTP/2.0)
- STATUS_CODE: HTTP response status (200, 404, 500, etc.), integer
- BYTES: Response size in bytes, integer
- REFERRER: Referring page URL or "-"
- USER_AGENT: Client browser/tool identifier
Use std::stringstream and std::getline() with custom delimiters (space, quotes, brackets) to extract each field. Handle edge cases such as missing fields or malformed lines gracefully.
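The delimiter-based approach described above can be sketched as follows. This is a minimal illustration rather than the required solution: `ParsedLine`, `readUntil`, and `parseLine` are hypothetical names introduced here, and the sketch assumes that quoted fields never contain escaped quote characters.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the assignment's LogEntry, so the sketch compiles alone.
struct ParsedLine {
    std::string ip, timestamp, method, url, protocol, referrer, userAgent;
    int statusCode = 0;
    long long bytes = 0;
};

// Reads up to and including `delim`. Fails if the delimiter never appears,
// because getline() then runs into end of input and sets eofbit.
bool readUntil(std::istream& in, std::string& out, char delim) {
    return static_cast<bool>(std::getline(in, out, delim)) && !in.eof();
}

// Returns false for lines that do not match the expected layout.
bool parseLine(const std::string& line, ParsedLine& e) {
    std::istringstream in(line);
    std::string ident, user, skip, request, size;

    // IP address and the two "-" placeholder fields are space-separated.
    if (!(in >> e.ip >> ident >> user)) return false;

    // Timestamp: discard everything up to '[', then read up to ']'.
    if (!readUntil(in, skip, '[') || !readUntil(in, e.timestamp, ']')) return false;

    // Request: read the quoted text, then split it on spaces.
    if (!readUntil(in, skip, '"') || !readUntil(in, request, '"')) return false;
    std::istringstream req(request);
    if (!(req >> e.method >> e.url >> e.protocol)) return false;

    // Status code and size; servers may log "-" when no body was sent.
    if (!(in >> e.statusCode >> size)) return false;
    if (size != "-") {
        std::istringstream sizeStream(size);
        if (!(sizeStream >> e.bytes)) return false;
    }

    // Referrer and user agent are both quoted and may contain spaces.
    if (!readUntil(in, skip, '"') || !readUntil(in, e.referrer, '"')) return false;
    if (!readUntil(in, skip, '"') || !readUntil(in, e.userAgent, '"')) return false;
    return true;
}
```

Calling `parseLine` on the first line of sample.log should fill `ip` with `192.168.1.10` and `statusCode` with `200`, while a line with a missing bracket or quote returns `false` so the caller can skip or report it.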
Requirements
Your LogAnalyzer application must implement ALL of the following features.
Each component is mandatory and will be tested individually.
LogEntry Structure
Create a LogEntry struct in LogEntry.h that:
- Contains all log fields: ip, timestamp, method, url, statusCode, bytes, referrer, userAgent
- Uses appropriate data types (string, int, etc.)
- Includes a default constructor
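#include <string>

// One parsed line of the access log.
struct LogEntry {
    std::string ip;
    std::string timestamp;
    std::string method;
    std::string url;
    int statusCode;   // HTTP response status, e.g. 200 or 404
    int bytes;        // response size in bytes
    std::string referrer;
    std::string userAgent;

    LogEntry() : statusCode(0), bytes(0) {}
};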
Parse Log File
Implement a function that reads and parses the log file:
- Use std::ifstream with proper error checking
- Use std::stringstream to parse each line
- Handle quoted fields correctly (timestamps in brackets, strings in quotes)
- Skip or report malformed lines without crashing
- Return a std::vector<LogEntry> containing all valid entries
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include "LogEntry.h"

std::vector<LogEntry> parseLogFile(const std::string& filename) {
    std::vector<LogEntry> entries;
    std::ifstream file(filename);
    if (!file.is_open()) {
        std::cerr << "Error: Cannot open file " << filename << std::endl;
        return entries;  // empty vector signals that nothing was read
    }
    std::string line;
    while (std::getline(file, line)) {
        // Parse line using stringstream
        // Handle brackets, quotes, and spaces
        // Add valid entries to vector
    }
    return entries;
}
Generate Statistics
Calculate and display comprehensive statistics:
- Total requests: Count of all log entries
- Unique visitors: Count of unique IP addresses (use std::set)
- Status code distribution: Count by status code (use std::map<int, int>)
- Top 10 URLs: Most frequently requested URLs (use map and sorting)
- Total bandwidth: Sum of all bytes transferred
- Average request size: Total bytes / total requests
- Requests by hour: Group by hour extracted from timestamp
struct Statistics {
    int totalRequests;
    int uniqueVisitors;
    std::map<int, int> statusCodeCount;
    std::vector<std::pair<std::string, int>> topUrls;
    long long totalBandwidth;
    double avgRequestSize;
    std::map<int, int> requestsByHour;

    void display() const;
};

Statistics generateStatistics(const std::vector<LogEntry>& entries);
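Two of the statistics above need a little more than a counter: the hour must be pulled out of the timestamp, and the URL counts must be ranked. The sketch below shows one possible way to do both. `extractHour` and `topN` are hypothetical helper names, and the tie-breaking rule (alphabetical order for equal counts) is an assumption, since the assignment does not specify one.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Extracts the hour (0-23) from a timestamp such as "15/Jan/2026:10:23:14 +0000".
// Returns -1 if the two characters after the first ':' are not a valid hour.
int extractHour(const std::string& timestamp) {
    const std::size_t colon = timestamp.find(':');
    if (colon == std::string::npos || colon + 2 >= timestamp.size()) return -1;
    const char tens = timestamp[colon + 1];
    const char ones = timestamp[colon + 2];
    if (tens < '0' || tens > '9' || ones < '0' || ones > '9') return -1;
    const int hour = (tens - '0') * 10 + (ones - '0');
    return hour <= 23 ? hour : -1;
}

// Returns the n most frequent keys, highest count first.
// Ties are ordered alphabetically (an assumption made for this sketch).
std::vector<std::pair<std::string, int>> topN(
        const std::map<std::string, int>& counts, std::size_t n) {
    std::vector<std::pair<std::string, int>> ranked(counts.begin(), counts.end());
    const std::size_t keep = std::min(n, ranked.size());
    std::partial_sort(
        ranked.begin(), ranked.begin() + static_cast<std::ptrdiff_t>(keep), ranked.end(),
        [](const std::pair<std::string, int>& a, const std::pair<std::string, int>& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;
        });
    ranked.resize(keep);
    return ranked;
}
```

A `generateStatistics` implementation could fill a `std::map<std::string, int>` with one increment per entry's `url`, then call `topN(urlCounts, 10)` to populate `topUrls`, and use `extractHour(entry.timestamp)` as the key into `requestsByHour`.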
Filtering Functionality
Implement filters to extract specific log entries:
- By status code: Filter by HTTP status (e.g., show only 404 or 500 errors)
- By IP address: Show all requests from a specific IP
- By URL pattern: Filter URLs containing a substring (e.g., "/api/" or "/admin/")
- By date/time range: Filter by timestamp (bonus)
- Write filtered results to a new file using std::ofstream
std::vector<LogEntry> filterByStatusCode(
    const std::vector<LogEntry>& entries, int statusCode);

std::vector<LogEntry> filterByIP(
    const std::vector<LogEntry>& entries, const std::string& ip);

std::vector<LogEntry> filterByURL(
    const std::vector<LogEntry>& entries, const std::string& pattern);

void writeFilteredLog(const std::vector<LogEntry>& entries,
                      const std::string& outputFile);
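All of the filters declared above share the same shape: copy the entries for which some condition holds. One way to avoid repeating the loop is a small generic helper built on `std::copy_if`. The helper name `filterEntries` is hypothetical, and the `LogEntry` struct is repeated only so that the sketch compiles on its own.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Copy of the assignment's LogEntry so this sketch is self-contained.
struct LogEntry {
    std::string ip, timestamp, method, url;
    int statusCode;
    int bytes;
    std::string referrer, userAgent;
    LogEntry() : statusCode(0), bytes(0) {}
};

// Hypothetical helper: returns every entry for which keep(entry) is true.
template <typename Predicate>
std::vector<LogEntry> filterEntries(const std::vector<LogEntry>& entries, Predicate keep) {
    std::vector<LogEntry> result;
    std::copy_if(entries.begin(), entries.end(), std::back_inserter(result), keep);
    return result;
}

std::vector<LogEntry> filterByStatusCode(const std::vector<LogEntry>& entries, int statusCode) {
    return filterEntries(entries, [statusCode](const LogEntry& e) {
        return e.statusCode == statusCode;
    });
}

std::vector<LogEntry> filterByURL(const std::vector<LogEntry>& entries,
                                  const std::string& pattern) {
    // Substring match: "/api/" keeps both /api/login and /api/data.
    return filterEntries(entries, [&pattern](const LogEntry& e) {
        return e.url.find(pattern) != std::string::npos;
    });
}
```

`filterByIP` follows the same pattern with an equality test on `e.ip`. Because each filter returns a new vector, filters can be chained, for example `filterByURL(filterByStatusCode(entries, 404), "/admin/")`.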
Text Report Generation
Generate a formatted text report and save to file:
- Use std::ofstream to write the report
- Use std::setw(), std::setprecision(), and other manipulators for formatting
- Include header with analysis date/time
- Display all statistics in a readable format
- Include tables with proper alignment
void generateReport(const Statistics& stats, const std::string& filename) {
    std::ofstream report(filename);
    if (!report.is_open()) {
        std::cerr << "Error: Cannot create report file" << std::endl;
        return;
    }
    report << "========================================\n";
    report << " LOG ANALYSIS REPORT\n";
    report << "========================================\n\n";
    report << "Total Requests: " << stats.totalRequests << "\n";
    report << "Unique Visitors: " << stats.uniqueVisitors << "\n";
    // Use iomanip for formatting
    report << std::fixed << std::setprecision(2);
    report << "Average Request Size: " << stats.avgRequestSize << " bytes\n";
    // More formatted output...
}
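For the aligned tables mentioned above, `std::setw` combined with `std::left` and `std::right` is usually enough. The following sketch prints the status code distribution as a three-column table. The function name `writeStatusTable`, the column widths, and the extra "Share" column are illustrative choices, not requirements of the assignment.

```cpp
#include <iomanip>
#include <map>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for trying this without a file
#include <string>

// Writes one row per status code: code, request count, and percentage of all requests.
// Takes std::ostream& so it works with std::ofstream, std::cout, or std::ostringstream.
void writeStatusTable(std::ostream& out,
                      const std::map<int, int>& statusCodeCount,
                      int totalRequests) {
    out << std::left << std::setw(8) << "Status"
        << std::right << std::setw(10) << "Count"
        << std::setw(10) << "Share" << '\n';
    out << std::string(28, '-') << '\n';

    out << std::fixed << std::setprecision(1);
    for (const auto& kv : statusCodeCount) {
        // Guard against division by zero for empty logs.
        const double share =
            totalRequests > 0 ? 100.0 * kv.second / totalRequests : 0.0;
        out << std::left << std::setw(8) << kv.first
            << std::right << std::setw(10) << kv.second
            << std::setw(9) << share << "%\n";
    }
}
```

Note that `std::setw` applies only to the next inserted value, whereas `std::left`, `std::right`, `std::fixed`, and `std::setprecision` stay in effect until changed, which is why the alignment is set again on every row.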
Binary Cache (Bonus)
Implement binary file operations for caching statistics:
- Save computed statistics to a binary file using write()
- Load cached statistics using read()
- Check file modification times to determine if cache is valid
- Provides faster loading for repeated analysis of large log files
void saveBinaryCache(const Statistics& stats, const std::string& cacheFile) {
    std::ofstream out(cacheFile, std::ios::binary);
    if (!out.is_open()) return;
    out.write(reinterpret_cast<const char*>(&stats.totalRequests), sizeof(int));
    out.write(reinterpret_cast<const char*>(&stats.uniqueVisitors), sizeof(int));
    // Write other fields...
}

bool loadBinaryCache(Statistics& stats, const std::string& cacheFile) {
    std::ifstream in(cacheFile, std::ios::binary);
    if (!in.is_open()) return false;
    in.read(reinterpret_cast<char*>(&stats.totalRequests), sizeof(int));
    in.read(reinterpret_cast<char*>(&stats.uniqueVisitors), sizeof(int));
    // Read other fields...
    return !in.fail();  // false if the cache file was truncated or unreadable
}
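The plain `int` fields above can be written directly, but containers such as `statusCodeCount` cannot be dumped as raw memory because `std::map` stores its elements in separately allocated nodes. A common approach is to write the element count first and then each key/value pair. The sketch below illustrates that idea; `writeMap` and `readMap` are hypothetical names, and the on-disk layout (a 64-bit count followed by 32-bit pairs in native byte order) is an assumption chosen for this example.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>  // std::stringstream, handy for trying this without a file

// Layout: [uint64 count][int32 key][int32 value]... in native byte order.
void writeMap(std::ostream& out, const std::map<int, int>& m) {
    const std::uint64_t count = m.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (const auto& kv : m) {
        const std::int32_t key = kv.first;
        const std::int32_t value = kv.second;
        out.write(reinterpret_cast<const char*>(&key), sizeof(key));
        out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    }
}

// Returns false (and leaves `m` untouched) if the data is truncated.
bool readMap(std::istream& in, std::map<int, int>& m) {
    std::uint64_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count))) return false;

    std::map<int, int> loaded;
    for (std::uint64_t i = 0; i < count; ++i) {
        std::int32_t key = 0;
        std::int32_t value = 0;
        if (!in.read(reinterpret_cast<char*>(&key), sizeof(key)) ||
            !in.read(reinterpret_cast<char*>(&value), sizeof(value))) {
            return false;
        }
        loaded[key] = value;
    }
    m.swap(loaded);
    return true;
}
```

Both functions take generic streams, so they can be called with the `std::ofstream` and `std::ifstream` objects opened in `std::ios::binary` mode. A cache written this way is only meant to be read back on the same machine, since byte order and integer sizes are not normalized.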
Command-Line Interface
Implement command-line argument parsing (Topic 8.3):
- Accept log filename as required argument
- Support optional flags: -o output.txt (report file), -f status_code (filter), -v (verbose)
- Display usage message when arguments are missing or --help is provided
- Validate arguments before processing
int main(int argc, char* argv[]) {
    if (argc < 2) {
        // Missing arguments are an error, so the usage text goes to stderr.
        std::cerr << "Usage: " << argv[0] << " <logfile> [-o report.txt] [-f 404] [-v]" << std::endl;
        return 1;
    }
    std::string logFile = argv[1];
    std::string reportFile = "report.txt";
    bool verbose = false;
    int filterStatus = -1;  // -1 means no status filter was requested
    // Parse optional arguments...
    return 0;
}
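The "Parse optional arguments" step can be handled with a simple loop over the arguments. The sketch below is one possible shape, assuming the flags listed above; the `Options` struct, the `parseArgs` function, and the accepted status range of 100 to 599 are choices made for this example rather than part of the assignment.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical container for the parsed command line.
struct Options {
    std::string logFile;
    std::string reportFile = "report.txt";
    int filterStatus = -1;   // -1 means no filter
    bool verbose = false;
    bool showHelp = false;
};

// `args` holds argv[1..argc-1]. Returns false and sets `error` on invalid input.
bool parseArgs(const std::vector<std::string>& args, Options& opts, std::string& error) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "--help") {
            opts.showHelp = true;
            return true;
        } else if (arg == "-v") {
            opts.verbose = true;
        } else if (arg == "-o" || arg == "-f") {
            if (i + 1 >= args.size()) {
                error = "Missing value after " + arg;
                return false;
            }
            const std::string& value = args[++i];
            if (arg == "-o") {
                opts.reportFile = value;
            } else {
                // strtol reports where parsing stopped, so "40x" is rejected.
                char* end = nullptr;
                const long code = std::strtol(value.c_str(), &end, 10);
                if (end == value.c_str() || *end != '\0' || code < 100 || code > 599) {
                    error = "Invalid status code: " + value;
                    return false;
                }
                opts.filterStatus = static_cast<int>(code);
            }
        } else if (!arg.empty() && arg[0] == '-') {
            error = "Unknown option: " + arg;
            return false;
        } else if (opts.logFile.empty()) {
            opts.logFile = arg;
        } else {
            error = "Unexpected argument: " + arg;
            return false;
        }
    }
    if (opts.logFile.empty()) {
        error = "Missing log file";
        return false;
    }
    return true;
}
```

Inside `main`, the argument vector can be built with `std::vector<std::string> args(argv + 1, argv + argc);`. If `parseArgs` returns `false`, print `error` and the usage message to `std::cerr` and return a non-zero exit code.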
Error Handling
Handle all potential errors gracefully:
- File not found: Check if log file exists before opening
- Permission denied: Handle cases where files cannot be opened for reading/writing
- Malformed log lines: Skip invalid lines and optionally report them
- Empty files: Handle empty log files without crashing
- Stream errors: Check stream states (fail(), bad(), eof())
- Provide meaningful error messages to stderr
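std::ifstream file(filename);
if (!file.is_open()) {
    std::cerr << "Error: Cannot open file '" << filename << "'" << std::endl;
    return false;
}
std::string line;
while (std::getline(file, line)) {
    // Process line...
}
// getline() ends the loop at end of file; bad() signals a real I/O error.
if (file.bad()) {
    std::cerr << "Error: Failed to read from file" << std::endl;
    return false;
}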
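The stream-state checks listed above can be combined into a single reading routine that distinguishes an empty file from a genuine I/O failure. The sketch below is one way to do that; the `ReadStatus` enum and `readLines` function are hypothetical names, and stripping a trailing carriage return is an added assumption for logs that were saved with Windows line endings.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

enum class ReadStatus { Ok, Empty, IoError };

// Reads all non-blank lines from `in` and reports how the read ended.
ReadStatus readLines(std::istream& in, std::vector<std::string>& lines) {
    std::string line;
    while (std::getline(in, line)) {
        // Drop the '\r' left behind by CRLF line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) lines.push_back(line);
    }
    // The loop stops when getline() fails. At a normal end of file that
    // means eofbit (and failbit) are set; badbit means the read itself broke.
    if (in.bad()) return ReadStatus::IoError;
    return lines.empty() ? ReadStatus::Empty : ReadStatus::Ok;
}
```

A caller can then print a specific message to `std::cerr` for each outcome, for example a warning for `ReadStatus::Empty` and an error with a non-zero exit code for `ReadStatus::IoError`.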
Submission
Required Repository Name
cpp-log-analyzer
cpp-log-analyzer/
├── LogEntry.h # Log entry struct
├── LogAnalyzer.h # Analyzer class
├── LogAnalyzer.cpp # Implementation
├── main.cpp # Main application
├── sample.log # Sample log file
├── sample_output.txt # Sample report
└── README.md # Documentation
Grading Rubric
| Criteria | Points |
|---|---|
| Log parsing with stringstream | 40 |
| Statistics calculation | 40 |
| Filtering functionality | 35 |
| Report generation (text) | 30 |
| Binary cache (bonus) | 25 |
| Error handling | 15 |
| Code Quality | 15 |
| Total | 200 |