| 1 | +# CLI Documentation Automation - Implementation Progress |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This document tracks the implementation of deterministic CLI metadata extraction for the tower-cli project, replacing brittle Python regex parsing with Java reflection-based extraction. |
| 6 | + |
| 7 | +## Timeline |
| 8 | + |
| 9 | +**Start Date:** 2026-01-22 |
| 10 | +**Branch:** `ll-cli-docs-automation-infrastructure` |
| 11 | +**PR:** [#574](https://github.com/seqeralabs/tower-cli/pull/574) |
| 12 | + |
| 13 | +## Problem Statement |
| 14 | + |
| 15 | +### Original Approach Issues |
| 16 | + |
| 17 | +The initial Python-based metadata extractor (`docs/scripts/extract-cli-metadata.py`) had several limitations: |
| 18 | + |
| 19 | +1. **Brittle regex parsing** - String concatenation, multiline annotations, and formatting variations could break extraction |
| 20 | +2. **Incomplete constant resolution** - Constants defined in other files/packages might not resolve |
| 21 | +3. **Mixin limitations** - Platform/provider mixin options weren't automatically captured |
| 22 | +4. **Non-deterministic** - Same input could produce different outputs based on parsing edge cases |
| 22 | +5. **Limited coverage** - Captured only 461 of the 1008 options the CLI exposes |
| 24 | + |
| 25 | +### Engineering Feedback |
| 26 | + |
| 26 | +The CLI engineer's feedback asked for three changes: |
| 27 | + |
| 28 | +1. **Replace text-based analysis** with a deterministic approach |
| 29 | +2. **Use Java reflection** (or Claude) to extract metadata deterministically |
| 30 | +3. **Integrate into PR workflow** as the final step before merging (not on release) |
| 32 | + |
| 33 | +## Solution Implemented |
| 34 | + |
| 35 | +### 1. Java Reflection-Based Metadata Extractor |
| 36 | + |
| 37 | +**File:** `src/main/java/io/seqera/tower/cli/utils/metadata/CliMetadataExtractor.java` (551 lines) |
| 38 | + |
| 39 | +**Approach:** |
| 40 | +- Uses picocli's `CommandSpec` API for reflection-based extraction |
| 41 | +- Instantiates the Tower CLI application at runtime |
| 42 | +- Walks the complete command tree recursively |
| 43 | +- Automatically resolves all `@Mixin` annotations |
| 44 | +- Extracts full type information (String, boolean, enums, etc.) |
| 45 | +- Outputs structured JSON to `docs/cli-metadata.json` |
| 46 | + |
| 47 | +**Key Methods:** |
| 48 | +- `extractMetadataAsJson()` - Main entry point, returns processed JSON |
| 49 | +- `buildCommandHierarchy()` - Recursively walks command tree |
| 50 | +- `extractCommandMetadata()` - Extracts data for single command |
| 51 | +- `extractOptionMetadata()` - Extracts option details with types |
| 52 | +- `extractParameterMetadata()` - Extracts positional parameters |
| 53 | + |
| 54 | +**Benefits:** |
| 55 | +- ✅ Deterministic - same input always produces same output |
| 56 | +- ✅ Complete - captures all options including mixins |
| 57 | +- ✅ Type-safe - compile-time checks for API changes |
| 58 | +- ✅ Maintainable - standard Java code in same repo |
| 59 | +- ✅ No external dependencies - uses picocli API already in classpath |
| 60 | + |
| 61 | +### 2. Gradle Task Integration |
| 62 | + |
| 63 | +**File:** `build.gradle` |
| 64 | + |
| 65 | +**Added task:** |
| 66 | +```gradle |
| 67 | +task extractCliMetadata(type: JavaExec) { |
| 68 | +    group = 'documentation' |
| 69 | +    description = 'Extract CLI metadata using Java reflection (deterministic, includes resolved mixins)' |
| 70 | +    classpath = sourceSets.main.runtimeClasspath |
| 71 | +    mainClass = 'io.seqera.tower.cli.utils.metadata.CliMetadataExtractor' |
| 72 | +    args = [file('docs/cli-metadata.json').absolutePath] |
| 73 | +    dependsOn classes |
| 74 | +} |
| 75 | +``` |
| 76 | + |
| 77 | +**Usage:** |
| 78 | +```bash |
| 79 | +./gradlew extractCliMetadata |
| 80 | +``` |
| 81 | + |
| 82 | +**Output:** |
| 83 | +- `docs/cli-metadata.json` - Processed metadata for docs consumption (992KB) |
| 84 | +- `docs/command-spec.json` - Raw picocli CommandSpec data for debugging (1.2MB, gitignored) |
| 85 | + |
| 86 | +### 3. PR Template with Checklist |
| 87 | + |
| 88 | +**File:** `.github/pull_request_template.md` |
| 89 | + |
| 90 | +**Key section:** |
| 91 | +```markdown |
| 92 | +## Pre-Merge Checklist |
| 93 | + |
| 94 | +- [ ] CLI metadata updated (if CLI commands/options were added/modified) |
| 95 | +  - [ ] Ran `./gradlew extractCliMetadata` to regenerate `docs/cli-metadata.json` |
| 96 | +  - [ ] Committed the updated metadata file |
| 97 | +``` |
| 98 | + |
| 99 | +**Purpose:** |
| 100 | +- Prompts for metadata updates as the final step before merge |
| 101 | +- Makes developers responsible for keeping metadata current |
| 102 | +- Eliminates need for automated workflow on every PR |
| 103 | + |
| 104 | +### 4. Updated GitHub Actions Workflow |
| 105 | + |
| 106 | +**File:** `.github/workflows/trigger-docs-release-update.yml` |
| 107 | + |
| 108 | +**Changes:** |
| 109 | +- Clarified that metadata should already be current from PR checklist |
| 110 | +- Enhanced error messaging if metadata is missing |
| 111 | +- Workflow now just notifies docs repo (metadata already updated) |
| 112 | +- Simplified from "extract and notify" to "verify and notify" |
| 113 | + |
| 114 | +**Trigger:** Still runs on `release.published` event |
| 115 | + |
| 116 | +### 5. Documentation Updates |
| 117 | + |
| 118 | +**docs/README.md** - Complete rewrite |
| 119 | +- Removed all Python script references |
| 120 | +- Documented Java reflection approach |
| 121 | +- Updated workflow description |
| 122 | +- Added troubleshooting section |
| 123 | +- Added metrics comparison table |
| 124 | + |
| 125 | +**.claude/skills/enrich-cli-help/SKILL.md** |
| 126 | +- Updated metadata extraction commands |
| 127 | +- Changed `python docs/scripts/extract-cli-metadata.py` → `./gradlew extractCliMetadata` |
| 128 | +- Removed absolute path references |
| 129 | +- Updated working directory to project root |
| 130 | + |
| 131 | +**.claude/README.md** |
| 132 | +- Updated repository structure diagram |
| 133 | +- Replaced Python references throughout |
| 134 | +- Clarified automatic mixin resolution |
| 135 | +- Added metadata update to contributor guide |
| 136 | + |
| 137 | +### 6. Cleanup |
| 138 | + |
| 139 | +**Removed:** |
| 140 | +- `docs/scripts/extract-cli-metadata.py` (replaced by Java) |
| 141 | +- `docs/scripts/extract-cli-examples.py` (out of scope) |
| 142 | +- `docs/cli-examples.json` (out of scope) |
| 143 | +- Empty `docs/scripts/` directory |
| 144 | + |
| 145 | +**Updated .gitignore:** |
| 146 | +- Added `docs/command-spec.json` (raw debug output) |
| 147 | + |
| 148 | +## Testing Results |
| 149 | + |
| 150 | +### Extraction Test (2026-01-22) |
| 151 | + |
| 152 | +```bash |
| 153 | +./gradlew extractCliMetadata |
| 154 | +``` |
| 155 | + |
| 156 | +**Output:** |
| 157 | +``` |
| 158 | +CLI metadata written to: /Users/llewelyn-van-der-berg/Documents/GitHub/tower-cli/docs/cli-metadata.json |
| 159 | +Total commands: 164 |
| 160 | +Total options: 1008 |
| 161 | +Total parameters: 12 |
| 162 | +Raw command spec written to: /Users/llewelyn-van-der-berg/Documents/GitHub/tower-cli/docs/command-spec.json |
| 163 | +
|
| 164 | +BUILD SUCCESSFUL |
| 165 | +``` |
| 166 | + |
| 167 | +**Verification:** |
| 168 | +- ✅ File generated: `docs/cli-metadata.json` (992KB) |
| 169 | +- ✅ Valid JSON structure |
| 170 | +- ✅ Complete command hierarchy |
| 171 | +- ✅ All options with resolved mixins |
| 172 | +- ✅ Full type information included |
| 173 | + |
| 174 | +## Metrics Comparison |
| 175 | + |
| 176 | +### Python Regex vs Java Reflection |
| 177 | + |
| 178 | +| Metric | Python (Regex) | Java (Reflection) | Improvement | |
| 179 | +|--------|----------------|-------------------|-------------| |
| 180 | +| **Commands** | 163 | 164 | +1 (+0.6%) | |
| 181 | +| **Options** | 461 | 1008 | **+547 (+118%)** | |
| 182 | +| **Parameters** | 9 | 12 | +3 (+33%) | |
| 183 | +| **Mixin resolution** | Partial (manual) | Automatic | ✅ Complete | |
| 184 | +| **Type information** | Limited | Complete | ✅ Full Java types | |
| 185 | +| **Deterministic** | No (brittle regex) | Yes | ✅ Guaranteed | |
| 186 | +| **Dependencies** | Python 3 | Java (already required) | ✅ No new deps | |
| 187 | +| **Maintenance** | External script | In-repo, type-safe | ✅ Better DX | |
| 188 | + |
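The percentages in the table follow from plain relative-change arithmetic. The sketch below recomputes them from the counts above; `MetricsDelta` and its methods are illustrative names, not part of the project. Note that 547/461 works out to about 118.7%, which the table reports as +118%.

```java
import java.util.Locale;

class MetricsDelta {

    // Relative change from `before` to `after`, as a percentage of `before`.
    static double percentChange(int before, int after) {
        return 100.0 * (after - before) / before;
    }

    // One summary line per metric: absolute and relative change.
    static String describe(String metric, int before, int after) {
        return String.format(Locale.ROOT, "%s: %d -> %d (%+d, %+.1f%%)",
                metric, before, after, after - before, percentChange(before, after));
    }

    public static void main(String[] args) {
        System.out.println(describe("Commands", 163, 164));
        System.out.println(describe("Options", 461, 1008));
        System.out.println(describe("Parameters", 9, 12));
        // Share of the 1008 options that the regex extractor had captured.
        System.out.printf(Locale.ROOT, "Captured before: %.1f%%%n", 100.0 * 461 / 1008);
    }
}
```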
| 189 | +### Key Insight |
| 190 | + |
| 191 | +The **118% increase in options** (461 → 1008) shows that the Python approach was missing 547 of the 1008 CLI options, or about 54%. This was primarily due to: |
| 192 | +- Platform/provider mixin classes not being resolved |
| 193 | +- Complex annotation patterns that regex couldn't handle |
| 194 | +- Inheritance patterns that weren't tracked |
| 195 | + |
| 196 | +## New Workflow |
| 197 | + |
| 198 | +### For Developers Making CLI Changes |
| 199 | + |
| 200 | +1. **Develop** - Add/modify CLI commands and options in Java source |
| 201 | +2. **Improve descriptions** - Apply quality standards to `@Option` descriptions |
| 202 | +3. **Extract metadata** - Run `./gradlew extractCliMetadata` before merge |
| 203 | +4. **Commit both** - Java changes + updated `docs/cli-metadata.json` |
| 204 | +5. **PR checklist** - Verify metadata update checkbox is checked |
| 205 | +6. **Merge** - Metadata is now current in master branch |
| 206 | + |
| 207 | +### On Release |
| 208 | + |
| 209 | +1. **Create release** - Tag version and publish |
| 210 | +2. **Workflow triggers** - GitHub Actions runs automatically |
| 211 | +3. **Verify metadata exists** - Checks `docs/cli-metadata.json` in release tag |
| 212 | +4. **Notify docs repo** - Sends repository dispatch event |
| 213 | +5. **Docs repo** - Fetches metadata, generates docs, creates PR |
| 214 | + |
| 215 | +**Result:** Zero manual documentation steps after release |
| 216 | + |
| 217 | +## Architecture Decisions |
| 218 | + |
| 219 | +### Why Java Reflection Over Claude? |
| 220 | + |
| 221 | +**Considered options:** |
| 222 | +1. Java reflection (chosen) |
| 223 | +2. Claude-based extraction |
| 224 | +3. Hybrid (Java + Claude) |
| 225 | + |
| 226 | +**Decision rationale:** |
| 227 | +- Java reflection is **deterministic** (same input = same output) |
| 228 | +- Claude-based would require API calls and careful prompt engineering |
| 229 | +- Claude results may vary between runs |
| 230 | +- Java reflection uses the official picocli API, so it reads the same command model the CLI builds at runtime |
| 231 | +- No external API dependency or rate limits |
| 232 | +- Type-safe and compile-time checked |
| 233 | +- Faster execution (no API latency) |
| 234 | + |
| 235 | +**Trade-offs accepted:** |
| 236 | +- Requires compilation (but build already required) |
| 237 | +- Requires GitHub credentials for tower-java-sdk (already needed) |
| 238 | +- Slightly more complex than script (but more robust) |
| 239 | + |
| 240 | +### Why PR Checklist Over Automated Workflow? |
| 241 | + |
| 242 | +**Considered options:** |
| 243 | +1. PR checklist (chosen) |
| 244 | +2. Automated workflow on every PR |
| 245 | +3. Pre-commit git hook |
| 246 | + |
| 247 | +**Decision rationale:** |
| 248 | +- **PR checklist** gives developers control and visibility |
| 249 | +- Automated workflow would create bot commits on every PR (noise) |
| 250 | +- Many PRs don't change CLI structure (unnecessary workflow runs) |
| 251 | +- Pre-commit hooks require local setup (not enforced) |
| 252 | +- Checklist is simple, explicit, and educational |
| 253 | + |
| 254 | +**Trade-offs accepted:** |
| 255 | +- Relies on developer compliance (but checklist makes it obvious) |
| 256 | +- Not automatically enforced (but PR reviewers can check) |
| 257 | +- Could be forgotten (but template makes it prominent) |
| 258 | + |
| 259 | +## Implementation Commits |
| 260 | + |
| 261 | +1. `6895d301` - Replace Python metadata extractor with Java reflection approach |
| 262 | +   - Added CliMetadataExtractor.java |
| 263 | +   - Added extractCliMetadata Gradle task |
| 264 | +   - Removed Python scripts |
| 265 | +   - Created PR template |
| 266 | +   - Updated workflow and documentation |
| 267 | + |
| 268 | +2. `506c07d5` - Update enrich-cli-help skill to reference Java extractor |
| 269 | +   - Changed metadata extraction commands |
| 270 | +   - Removed absolute paths |
| 271 | +   - Updated working directory references |
| 272 | + |
| 273 | +3. `4538b39a` - Update .claude/README.md to reference Java extractor |
| 274 | +   - Updated repository structure |
| 275 | +   - Replaced Python references |
| 276 | +   - Updated workflow pipeline |
| 277 | + |
| 278 | +4. `bc3ece0f` - Test Java metadata extractor and update cli-metadata.json |
| 279 | +   - Successfully tested extraction |
| 280 | +   - Generated fresh metadata (1008 options) |
| 281 | +   - Added command-spec.json to gitignore |
| 282 | + |
| 283 | +## Current State |
| 284 | + |
| 285 | +### Completed ✅ |
| 286 | + |
| 287 | +- ✅ Java reflection-based metadata extractor implemented |
| 288 | +- ✅ Gradle task integrated and tested |
| 289 | +- ✅ PR template with checklist created |
| 290 | +- ✅ GitHub Actions workflow updated |
| 291 | +- ✅ All documentation updated (docs/README.md, .claude/) |
| 292 | +- ✅ Python scripts removed |
| 293 | +- ✅ Fresh metadata generated (1008 options) |
| 294 | +- ✅ PR created: #574 |
| 295 | + |
| 296 | +### Branch Status |
| 297 | + |
| 298 | +**Branch:** `ll-cli-docs-automation-infrastructure` |
| 299 | +**Status:** Ready for review |
| 300 | +**PR:** https://github.com/seqeralabs/tower-cli/pull/574 |
| 301 | + |
| 302 | +### Files Changed Summary |
| 303 | + |
| 304 | +**Added:** |
| 305 | +- `src/main/java/io/seqera/tower/cli/utils/metadata/CliMetadataExtractor.java` (551 lines) |
| 306 | +- `.github/pull_request_template.md` (new) |
| 307 | + |
| 308 | +**Modified:** |
| 309 | +- `build.gradle` (added extractCliMetadata task) |
| 310 | +- `.github/workflows/trigger-docs-release-update.yml` (clarified comments) |
| 311 | +- `docs/README.md` (complete rewrite) |
| 312 | +- `.claude/skills/enrich-cli-help/SKILL.md` (updated references) |
| 313 | +- `.claude/README.md` (updated references) |
| 314 | +- `.gitignore` (added command-spec.json) |
| 315 | +- `docs/cli-metadata.json` (regenerated with 1008 options) |
| 316 | + |
| 317 | +**Removed:** |
| 318 | +- `docs/scripts/extract-cli-metadata.py` |
| 319 | +- `docs/scripts/extract-cli-examples.py` |
| 320 | +- `docs/cli-examples.json` |
| 321 | + |
| 322 | +## Next Steps |
| 323 | + |
| 324 | +### Immediate (Post-Merge) |
| 325 | + |
| 326 | +1. **Merge PR #574** - Get Java extractor into master |
| 327 | +2. **Update other branch** - `ll-metadata-extractor-and-docs-automation` (help text PR) |
| 328 | +   - Rebase on master to get Java extractor |
| 329 | +   - Regenerate metadata with `./gradlew extractCliMetadata` |
| 330 | +   - Update that PR to include fresh metadata |
| 331 | + |
| 332 | +### Future Enhancements |
| 333 | + |
| 334 | +**Short term:** |
| 335 | +- Add metadata validation tests (ensure all commands have descriptions) |
| 336 | +- Add GitHub Action to verify metadata is current in PRs (non-blocking check) |
| 337 | +- Document common patterns for new command implementations |
| 338 | + |
| 339 | +**Long term:** |
| 340 | +- Changelog generation (compare metadata between versions) |
| 341 | +- Deprecation tracking (flag removed commands/options) |
| 342 | +- Multi-format output (man pages, shell completions) |
| 343 | +- Metadata coverage metrics dashboard |
| 344 | + |
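For the changelog and deprecation-tracking ideas, one possible shape is a set difference over option names taken from two metadata snapshots. This is a sketch under assumptions, not a planned design: the `"command --option"` key format, the `MetadataDiff` class, and the sample data are all hypothetical, and loading the names out of `docs/cli-metadata.json` is left out.

```java
import java.util.Set;
import java.util.TreeSet;

class MetadataDiff {

    // Entries in `current` that are absent from `previous`, sorted so the
    // result is stable across runs.
    static Set<String> added(Set<String> previous, Set<String> current) {
        Set<String> result = new TreeSet<>(current);
        result.removeAll(previous);
        return result;
    }

    // Entries in `previous` that are absent from `current`: candidates for a
    // deprecation or removal note.
    static Set<String> removed(Set<String> previous, Set<String> current) {
        return added(current, previous);
    }

    public static void main(String[] args) {
        // Made-up snapshots keyed as "command --option"; not real CLI options.
        Set<String> v1 = Set.of("app jobs list --limit", "app jobs add --name");
        Set<String> v2 = Set.of("app jobs list --limit", "app jobs add --label");

        System.out.println("Added:   " + added(v1, v2));
        System.out.println("Removed: " + removed(v1, v2));
    }
}
```

A comparison like this is only meaningful because the extractor's output is deterministic: any difference between two snapshots then reflects a real change in the CLI.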
| 345 | +## Lessons Learned |
| 346 | + |
| 347 | +### What Worked Well |
| 348 | + |
| 349 | +1. **Java reflection approach** - Deterministic, complete, maintainable |
| 350 | +2. **PR checklist integration** - Simple, explicit, developer-friendly |
| 351 | +3. **Porting from experimental branch** - Already proven in prototype |
| 352 | +4. **Comprehensive documentation** - Clear process for contributors |
| 353 | + |
| 354 | +### Challenges Encountered |
| 355 | + |
| 356 | +1. **Build requires GitHub credentials** - Need tower-java-sdk access |
| 357 | +   - Solution: Documented in troubleshooting section |
| 358 | +2. **Multiple approaches considered** - A Claude-based approach was initially on the table |
| 359 | +   - Solution: Engineering feedback helped prioritize Java reflection |
| 360 | +3. **Outdated skill references** - Claude skills had Python script paths |
| 361 | +   - Solution: Systematically updated all references |
| 362 | + |
| 363 | +### Process Improvements |
| 364 | + |
| 365 | +1. **Always test before committing** - Verified extractor works before PR |
| 366 | +2. **Update all documentation** - Ensured no stale references remain |
| 367 | +3. **Clear commit messages** - Each commit explains what and why |
| 368 | +4. **Comparison metrics** - Showed concrete improvement (118% more options) |
| 369 | + |
| 370 | +## References |
| 371 | + |
| 372 | +### Implementation Files |
| 373 | +- Java extractor: `src/main/java/io/seqera/tower/cli/utils/metadata/CliMetadataExtractor.java` |
| 374 | +- Gradle task: `build.gradle` (extractCliMetadata) |
| 375 | +- PR template: `.github/pull_request_template.md` |
| 376 | +- Workflow: `.github/workflows/trigger-docs-release-update.yml` |
| 377 | + |
| 378 | +### Documentation |
| 379 | +- Process guide: `docs/README.md` |
| 380 | +- Claude skill: `.claude/skills/enrich-cli-help/SKILL.md` |
| 381 | +- Contributor guide: `.claude/README.md` |
| 382 | + |
| 383 | +### External |
| 384 | +- PR #574: https://github.com/seqeralabs/tower-cli/pull/574 |
| 385 | +- Picocli docs: https://picocli.info/ |
| 386 | +- Original conversation: Downloaded text file with experimental branch discussion |
| 387 | + |
| 388 | +--- |
| 389 | + |
| 390 | +**Last Updated:** 2026-01-22 |
| 391 | +**Status:** Implementation complete, PR ready for review |
| 392 | +**Contributors:** Llewelyn van der Berg, Claude Sonnet 4.5 |