This is a Rust implementation of Python's difflib.unified_diff function with PyO3 bindings, created as a high-performance alternative to the built-in Python implementation.
# Activate virtual environment (required)
source venv/bin/activate
# Clear build cache (important when changes aren't taking effect!)
cargo clean
# Development build (for testing)
maturin develop
# Production build with release optimizations (recommended for benchmarking)
maturin develop --release
# Build wheel for distribution
maturin build --release# Activate virtual environment first
source venv/bin/activate
# Run all tests
python -m pytest tests/ -v
# Run specific test
python -m pytest tests/test_unified_diff.py::test_identical_sequences -v
# Run tests with more verbose output
python -m pytest tests/ -vv
# Run only basic sanity tests
python -m pytest tests/test_unified_diff.py -k "sanity or basic" -v
# Run benchmark tests
python -m pytest tests/test_benchmark.py -s# Activate virtual environment
source venv/bin/activate
# Install the package in development mode
maturin develop
# Check for Rust compilation errors
cargo checkdifflib-rst/
├── Cargo.toml # Rust dependencies and configuration
├── pyproject.toml # Python packaging with maturin backend
├── src/
│ ├── lib.rs # Main Rust implementation
│ └── lib_old.rs # Backup of previous implementation
├── tests/
│ └── test_unified_diff.py # Comprehensive test suite
├── .gitignore # Git ignore patterns
├── README.md # Project documentation
└── SWEEP.md # This file
- Uses longest common subsequence (LCS) algorithm for diff computation
- Implements sequence matching with grouped opcodes
- Supports context lines (default: 3)
- Handles edge cases like identical sequences and empty inputs
- Identical sequences: Currently returns diff output instead of empty list
- Range formatting: Some hunk header ranges don't match Python's exactly
- Basic functionality tests ✅
- Edge cases (empty, identical sequences)
⚠️ - Random data validation against Python's difflib
⚠️ - Known examples from Python documentation ✅
- File date handling ✅
- Custom line terminators ✅
# Compare outputs between Rust and Python implementations
python3 -c "
from difflib_rs import unified_diff as rust_unified_diff
import difflib
a = ['line1', 'line2', 'line3']
rust_result = rust_unified_diff(a, a, 'a', 'b')
python_result = list(difflib.unified_diff(a, a, 'a', 'b'))
print('Rust result:', rust_result)
print('Python result:', python_result)
"
# Check Rust compilation
cargo check --manifest-path Cargo.tomlBased on comprehensive benchmarking (tests/test_benchmark.py):
Rust Outperforms Python in ALL Scenarios with IDENTICAL Output:
- 5,000 lines, 5 changes: 1.73x faster (was 0.75x slower initially)
- 10,000 lines, 5 changes: 1.81x faster (was 0.54x slower initially)
- 20,000 lines, 5 changes: 2.37x faster (was 0.45x slower initially)
- 5,000 lines, 250 changes: 1.91x faster (was 0.68x slower initially)
- 10,000 lines, 500 changes: 2.21x faster (was 0.42x slower initially)
- 20,000 lines, 1,000 changes: 2.17x faster (was 0.21x slower initially)
- Identical sequences: 5.17x faster than Python
- Small files (100-2000 lines): 1.7x-2.25x faster
- Files with 50% changes: 2.58x-2.90x faster
- HashMap-based sparse representation in
find_longest_match(eliminated O(n²) memory operations) - Queue-based approach in
get_matching_blocks(better cache locality) - Proper memory management (using move semantics instead of swap for HashMaps)
- Fixed
get_grouped_opcodesto correctly limit context lines (ensures identical output to Python)
- Fix identical sequence handling to return empty list
- Improve range formatting to exactly match Python's output
- Add benchmarking suite to measure performance gains
- Consider adding more difflib functions (context_diff, etc.)