Project Overview
This capstone project challenges you to build a text editor with complete undo/redo functionality. You will work with real editing operation data containing 100+ editing events including insertions, deletions, undo/redo operations, and cursor movements. Your goal is to implement stack-based command pattern and understand memory management techniques that power professional text editors.
Stack-based Commands
Command pattern with dual stacks
Undo/Redo System
Full operation history management
Memory Management
Efficient memory allocation & cleanup
Optimization
Performance tuning & state compression
Learning Objectives
Technical Skills
- Implement Rope data structure with balanced trees
- Build Gap Buffer for efficient local editing
- Create Piece Table for append-only editing
- Implement undo/redo using Command and Memento patterns
- Apply KMP/Rabin-Karp for efficient text search
Problem-Solving Skills
- Understand why String is inefficient for editors
- Analyze time complexity of edit operations
- Compare trade-offs between different approaches
- Handle edge cases in text manipulation
- Test with real VS Code editing data
Problem Scenario
CodeCraft IDE
You have been hired as a Core Engineer at CodeCraft IDE, a startup building a next-generation code editor to compete with VS Code and Sublime Text. The team has discovered that using Java's String class for document storage causes severe performance issues when editing large files - every keystroke becomes slow because strings are immutable and require O(n) copy operations.
"Our prototype crashes when opening files over 1MB. Simple typing has noticeable lag. We need you to implement proper data structures like VS Code uses. Can you build a text buffer that handles millions of characters with O(log n) edits and efficient undo/redo?"
Problems to Solve
- Efficient insert/delete at any position
- Handle files with millions of characters
- O(log n) operations instead of O(n)
- Memory-efficient for large documents
- Unlimited undo history
- Group related operations together
- Efficient memory usage for history
- Support redo after undo
- Fast pattern matching in large files
- Find all occurrences efficiently
- Replace single or all matches
- Support regex patterns (bonus)
- Track cursor position efficiently
- Support text selection ranges
- Handle multi-cursor editing
- Line/column navigation
The Dataset
You will work with a dataset of real text editor operations. Download the CSV file containing editing events to test your implementation:
Dataset Download
Download the text editor operations dataset. We provide a curated sample, or you can explore the full keystroke dynamics dataset on Kaggle for extended testing.
Original Data Source
This project uses editing patterns inspired by the Keystrokes Dataset from Kaggle. The dataset contains 136 million keystrokes capturing real typing sessions including press/release times, keystroke dynamics, and text entry behaviors from multiple participants.
Dataset Schema
Each row represents an editing operation with the following columns:
| Column | Type | Description |
|---|---|---|
operation_id | String | Unique operation identifier (e.g., OP001) |
timestamp | DateTime | When the operation occurred |
operation_type | String | INSERT, DELETE, UNDO, REDO, FIND, REPLACE, etc. |
position | Integer | Cursor position (0-indexed) |
length | Integer | Number of characters affected |
content | String | Text content (for INSERT/REPLACE operations) |
document_id | String | Document being edited (e.g., DOC001) |
user_id | String | User performing the operation |
session_id | String | Editing session identifier |
undo_group | Integer | Group ID for undo operations |
Operations are distributed across the following types:
- INSERT (45 operations)
- DELETE (15 operations)
- UNDO (12 operations)
- REDO (5 operations)
- FIND (8 operations)
- REPLACE (6 operations)
- CURSOR_MOVE (5 operations)
- SELECT (2 operations)
- CUT/COPY/PASTE (2 operations)
Parse operations and replay them to test your implementation:
- Create an
EditOperationclass with type, position, length, and content fields - Load operations from CSV and iterate through them
- Use switch statement to handle INSERT, DELETE, UNDO, REDO operations
- Verify your text buffer produces correct output after replaying all operations
Sample Data Preview
Here is what the data looks like from text_editor_operations.csv:
| Op ID | Type | Position | Length | Content | Doc ID |
|---|---|---|---|---|---|
| OP001 | INSERT | 0 | 13 | Hello World! | DOC001 |
| OP004 | DELETE | 5 | 6 | - | DOC001 |
| OP006 | UNDO | 0 | 0 | - | DOC001 |
Project Requirements
Your project must include all of the following components. Structure your code with clear organization, documentation, and comprehensive testing.
Text Buffer Implementation (Choose 1+)
Implement at least ONE of the following data structures:
- Rope: Binary tree where leaves contain strings, O(log n) operations
- Gap Buffer: Array with movable gap at cursor, O(1) local edits
- Piece Table: Append-only with piece descriptors, used by VS Code
Undo/Redo System
Implement complete undo/redo functionality:
- Command Pattern: Encapsulate operations as objects
- Undo Stack: Track all operations for reversal
- Redo Stack: Track undone operations for replay
- Group Operations: Batch related edits (e.g., word completion)
Find & Replace
Implement efficient text search:
- KMP Algorithm: O(n+m) pattern matching
- Find All: Return all occurrences of pattern
- Replace: Replace single or all occurrences
- Case Sensitivity: Support case-insensitive search
Documentation and Analysis
Technical documentation:
- Data Structure Analysis: Why you chose Rope/Gap Buffer/Piece Table
- Complexity Analysis: Time/space for each operation
- Comparison: Trade-offs between different approaches
- Performance Tests: Benchmarks with large files
Algorithm Specifications
Implement the following algorithms with optimal time and space complexity. Each algorithm should be thoroughly tested and documented.
Encapsulate each edit operation as a command object.
- Create
Commandinterface withexecute()andundo()methods - Implement
InsertCommand,DeleteCommandclasses - Store command state for reversal (position, text, length)
- Each command knows how to undo itself
Use two stacks to manage operation history.
- Undo Stack: Push commands after execution
- Redo Stack: Push commands after undo
- Clear redo stack on new operation
- Handle stack underflow gracefully
Optimize memory usage for large documents.
- Limit undo history size (e.g., max 100 operations)
- Compress consecutive similar operations
- Use StringBuilder for efficient string manipulation
- Implement lazy evaluation where possible
Group related operations for better UX.
- Batch rapid keystrokes into single undo unit
- Use timestamps to detect typing sessions
- Implement
beginGroup()/endGroup() - Undo entire group as single operation
Data Structure Specifications
Implement the following data structures to support efficient text editing operations. Choose at least ONE text buffer implementation.
Binary tree where leaves store string fragments.
insert(pos, str)- O(log n)delete(pos, len)- O(log n)charAt(index)- O(log n)substring(start, end)- O(log n + k)concat(rope1, rope2)- O(log n)split(pos)- O(log n)
Array with movable gap at cursor position.
insert(char)- O(1) at cursordelete()- O(1) at cursormoveCursor(pos)- O(gap movement)charAt(index)- O(1)toString()- O(n)
Append-only buffers with piece descriptors (VS Code).
insert(pos, str)- O(pieces)delete(pos, len)- O(pieces)charAt(index)- O(pieces)- Original buffer: immutable file content
- Add buffer: append-only new text
- Pieces: (buffer, start, length) descriptors
Stack-based history management.
execute(Command cmd)- O(1)undo()- O(1)redo()- O(1)canUndo()- O(1)canRedo()- O(1)beginGroup() / endGroup()- O(1)
Submission Guidelines
Submit your project via GitHub. Ensure your repository is public and follows the structure below.
Required Folders
src/- Source code filessrc/commands/- Command pattern classessrc/editor/- Main editor classtest/- JUnit test filesdata/- CSV dataset
Required Files
Command.java- Command interfaceInsertCommand.java- Insert operationDeleteCommand.java- Delete operationUndoManager.java- Stack managementREADME.md- Documentation
README Requirements
- Project overview and objectives
- Command Pattern implementation details
- Undo/Redo stack logic explanation
- Memory management strategies used
- Usage examples with sample outputs
- Time and space complexity analysis
Submission Checklist
- Command interface with execute/undo
- InsertCommand and DeleteCommand classes
- UndoManager with dual stacks
- Memory optimization (history limit)
- JUnit test suite with edge cases
- Code properly documented
Grading Rubric
Your project will be evaluated on the following criteria. Maximum score is 500 points.
| Category | Criteria | Points |
|---|---|---|
| Text Buffer |
|
150 |
| Undo/Redo |
|
100 |
| Find & Replace |
|
90 |
| Testing |
|
90 |
| Documentation |
|
70 |
| Total | 500 | |
Excellent
Outstanding implementation
Good
Meets all requirements
Needs Work
Missing key requirements