Commit 9131b97

llewellyn-sl and claude committed
Add progress.md documenting Java extractor implementation
Comprehensive documentation of the metadata extraction replacement project:
- Problem statement and engineering feedback
- Solution implementation details
- Testing results (1008 options vs 461)
- Architecture decisions and rationale
- New workflow description
- Lessons learned and next steps

Provides complete context for future reference.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
1 parent bc3ece0 commit 9131b97

1 file changed: docs/progress.md (392 additions & 0 deletions)

# CLI Documentation Automation - Implementation Progress

## Overview

This document tracks the implementation of deterministic CLI metadata extraction for the tower-cli project, replacing brittle Python regex parsing with Java reflection-based extraction.

## Timeline

**Start Date:** 2026-01-22
**Branch:** `ll-cli-docs-automation-infrastructure`
**PR:** [#574](https://github.com/seqeralabs/tower-cli/pull/574)

## Problem Statement

### Original Approach Issues

The initial Python-based metadata extractor (`docs/scripts/extract-cli-metadata.py`) had several limitations:

1. **Brittle regex parsing** - String concatenation, multiline annotations, and formatting variations could break extraction
2. **Incomplete constant resolution** - Constants defined in other files/packages might not resolve
3. **Mixin limitations** - Platform/provider mixin options weren't automatically captured
4. **Non-deterministic** - Same input could produce different outputs based on parsing edge cases
5. **Limited coverage** - Only captured 461 options out of 1000+

### Engineering Feedback

Feedback received from the CLI engineer:

1. **Replace text-based analysis** with a deterministic approach
2. **Use Java reflection** (or Claude) to extract metadata deterministically
3. **Integrate into PR workflow** as the final step before merging (not on release)

## Solution Implemented

### 1. Java Reflection-Based Metadata Extractor

**File:** `src/main/java/io/seqera/tower/cli/utils/metadata/CliMetadataExtractor.java` (551 lines)

**Approach:**
- Uses picocli's `CommandSpec` API for reflection-based extraction
- Instantiates the Tower CLI application at runtime
- Walks the complete command tree recursively
- Automatically resolves all `@Mixin` annotations
- Extracts full type information (String, boolean, enums, etc.)
- Outputs structured JSON to `docs/cli-metadata.json`

**Key Methods:**
- `extractMetadataAsJson()` - Main entry point, returns processed JSON
- `buildCommandHierarchy()` - Recursively walks command tree
- `extractCommandMetadata()` - Extracts data for single command
- `extractOptionMetadata()` - Extracts option details with types
- `extractParameterMetadata()` - Extracts positional parameters

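The recursive walk behind `buildCommandHierarchy()` can be illustrated without picocli on the classpath. The sketch below is not the project's code: `CommandNode`, `Totals`, and `CommandTreeWalk` are hypothetical stand-ins for picocli's command model, and the command and option names are made up. It only shows the shape of a depth-first traversal that accumulates totals like the ones the extractor reports.

```java
import java.util.List;

/** Hypothetical stand-in for one command in the tree; not picocli's CommandSpec. */
record CommandNode(String name, List<String> options, List<CommandNode> subcommands) {}

/** Running totals, similar in spirit to the "Total commands / Total options" log lines. */
record Totals(int commands, int options) {
    Totals plus(Totals other) {
        return new Totals(commands + other.commands, options + other.options);
    }
}

final class CommandTreeWalk {
    /** Depth-first walk: count this command and its options, then recurse into subcommands. */
    static Totals count(CommandNode node) {
        Totals totals = new Totals(1, node.options().size());
        for (CommandNode child : node.subcommands()) {
            totals = totals.plus(count(child));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up commands and options, for illustration only.
        CommandNode list = new CommandNode("list", List.of("--workspace", "--filter"), List.of());
        CommandNode add = new CommandNode("add", List.of("--name"), List.of());
        CommandNode pipelines = new CommandNode("pipelines", List.of("--help"), List.of(list, add));

        Totals totals = count(pipelines);
        System.out.println("Total commands: " + totals.commands());
        System.out.println("Total options: " + totals.options());
    }
}
```

For the three made-up commands above, the walk reports 3 commands and 4 options. In the real extractor the nodes would come from picocli's model after mixins are resolved, so mixin options are counted without any extra handling.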
**Benefits:**
- ✅ Deterministic - same input always produces same output
- ✅ Complete - captures all options including mixins
- ✅ Type-safe - compile-time checks for API changes
- ✅ Maintainable - standard Java code in same repo
- ✅ No external dependencies - uses picocli API already in classpath

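This document does not say how the extractor orders JSON keys. One common way to keep generated output byte-for-byte stable across runs is to sort map keys before serializing. The sketch below assumes a hypothetical `StableJson` helper (not part of the project) and does only minimal string escaping, so it is an illustration of the idea and not a general-purpose JSON writer.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class StableJson {
    /** Serializes strings, numbers, booleans, lists and maps; map keys are emitted in sorted order. */
    static String write(Object value) {
        if (value instanceof Map<?, ?> map) {
            // Copy into a TreeMap so the key order does not depend on insertion or hash order.
            TreeMap<String, Object> sorted = new TreeMap<>();
            map.forEach((k, v) -> sorted.put(String.valueOf(k), v));
            return sorted.entrySet().stream()
                    .map(e -> quote(e.getKey()) + ":" + write(e.getValue()))
                    .collect(Collectors.joining(",", "{", "}"));
        }
        if (value instanceof List<?> list) {
            return list.stream().map(StableJson::write).collect(Collectors.joining(",", "[", "]"));
        }
        if (value instanceof String s) {
            return quote(s);
        }
        return String.valueOf(value); // numbers, booleans, null
    }

    /** Minimal escaping: backslashes and double quotes only. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Made-up option metadata; the field names are illustrative.
        Map<String, Object> option = Map.of("name", "--workspace", "required", false, "type", "String");
        System.out.println(write(option));
    }
}
```

With sorted keys, regenerating the file after an unrelated change produces an empty diff, which keeps the committed `docs/cli-metadata.json` reviewable.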
### 2. Gradle Task Integration

**File:** `build.gradle`

**Added task:**
```gradle
task extractCliMetadata(type: JavaExec) {
    group = 'documentation'
    description = 'Extract CLI metadata using Java reflection (deterministic, includes resolved mixins)'
    classpath = sourceSets.main.runtimeClasspath
    mainClass = 'io.seqera.tower.cli.utils.metadata.CliMetadataExtractor'
    // The output path is handed to the extractor as its first program argument
    args = [file('docs/cli-metadata.json').absolutePath]
    // Compile the main classes before running the extractor
    dependsOn classes
}
```

**Usage:**
```bash
./gradlew extractCliMetadata
```

**Output:**
- `docs/cli-metadata.json` - Processed metadata for docs consumption (992KB)
- `docs/command-spec.json` - Raw picocli CommandSpec data for debugging (1.2MB, gitignored)

### 3. PR Template with Checklist

**File:** `.github/pull_request_template.md`

**Key section:**
```markdown
## Pre-Merge Checklist

- [ ] CLI metadata updated (if CLI commands/options were added/modified)
  - [ ] Ran `./gradlew extractCliMetadata` to regenerate `docs/cli-metadata.json`
  - [ ] Committed the updated metadata file
```

**Purpose:**
- Enforces metadata updates as final step before merge
- Makes developers responsible for keeping metadata current
- Eliminates need for automated workflow on every PR

### 4. Updated GitHub Actions Workflow

**File:** `.github/workflows/trigger-docs-release-update.yml`

**Changes:**
- Clarified that metadata should already be current from PR checklist
- Enhanced error messaging if metadata is missing
- Workflow now just notifies docs repo (metadata already updated)
- Simplified from "extract and notify" to "verify and notify"

**Trigger:** Still runs on `release.published` event

### 5. Documentation Updates

**docs/README.md** - Complete rewrite
- Removed all Python script references
- Documented Java reflection approach
- Updated workflow description
- Added troubleshooting section
- Metrics comparison table

**.claude/skills/enrich-cli-help/SKILL.md**
- Updated metadata extraction commands
- Changed `python docs/scripts/extract-cli-metadata.py` to `./gradlew extractCliMetadata`
- Removed absolute path references
- Updated working directory to project root

**.claude/README.md**
- Updated repository structure diagram
- Replaced Python references throughout
- Clarified automatic mixin resolution
- Added metadata update to contributor guide

### 6. Cleanup

**Removed:**
- `docs/scripts/extract-cli-metadata.py` (replaced by Java)
- `docs/scripts/extract-cli-examples.py` (out of scope)
- `docs/cli-examples.json` (out of scope)
- Empty `docs/scripts/` directory

**Updated .gitignore:**
- Added `docs/command-spec.json` (raw debug output)

## Testing Results

### Extraction Test (2026-01-22)

```bash
./gradlew extractCliMetadata
```

**Output:**
```
CLI metadata written to: /Users/llewelyn-van-der-berg/Documents/GitHub/tower-cli/docs/cli-metadata.json
Total commands: 164
Total options: 1008
Total parameters: 12
Raw command spec written to: /Users/llewelyn-van-der-berg/Documents/GitHub/tower-cli/docs/command-spec.json

BUILD SUCCESSFUL
```

**Verification:**
- ✅ File generated: `docs/cli-metadata.json` (992KB)
- ✅ Valid JSON structure
- ✅ Complete command hierarchy
- ✅ All options with resolved mixins
- ✅ Full type information included

## Metrics Comparison

### Python Regex vs Java Reflection

| Metric | Python (Regex) | Java (Reflection) | Improvement |
|--------|----------------|-------------------|-------------|
| **Commands** | 163 | 164 | +1 (+0.6%) |
| **Options** | 461 | 1008 | **+547 (+118%)** |
| **Parameters** | 9 | 12 | +3 (+33%) |
| **Mixin resolution** | Partial (manual) | Automatic | ✅ Complete |
| **Type information** | Limited | Complete | ✅ Full Java types |
| **Deterministic** | No (brittle regex) | Yes | ✅ Guaranteed |
| **Dependencies** | Python 3 | Java (already required) | ✅ No new deps |
| **Maintenance** | External script | In-repo, type-safe | ✅ Better DX |

### Key Insight

The **118% increase in options** (461 → 1008) reveals that the Python approach was missing more than half of the CLI options! This was primarily due to:
- Platform/provider mixin classes not being resolved
- Complex annotation patterns that regex couldn't handle
- Inheritance patterns that weren't tracked

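The option figures can be checked with two lines of arithmetic: the relative increase is (1008 - 461) / 461, about 118.7%, and the old extractor covered 461 / 1008, about 45.7%, of the options now found, which is where "more than half missing" comes from. A small sketch of that calculation, using a hypothetical `MetricsCheck` class:

```java
final class MetricsCheck {
    /** Relative increase from before to after, in percent. */
    static double percentIncrease(int before, int after) {
        return 100.0 * (after - before) / before;
    }

    /** Share of the new total that the old extractor had captured, in percent. */
    static double coveragePercent(int captured, int total) {
        return 100.0 * captured / total;
    }

    public static void main(String[] args) {
        // Option counts taken from the comparison table: 461 (Python) vs 1008 (Java).
        System.out.printf("Options added: %d (%.1f%% increase)%n",
                1008 - 461, percentIncrease(461, 1008));
        System.out.printf("Old coverage of options: %.1f%%%n", coveragePercent(461, 1008));
    }
}
```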
## New Workflow

### For Developers Making CLI Changes

1. **Develop** - Add/modify CLI commands and options in Java source
2. **Improve descriptions** - Apply quality standards to `@Option` descriptions
3. **Extract metadata** - Run `./gradlew extractCliMetadata` before merge
4. **Commit both** - Java changes + updated `docs/cli-metadata.json`
5. **PR checklist** - Verify metadata update checkbox is checked
6. **Merge** - Metadata is now current in master branch

### On Release

1. **Create release** - Tag version and publish
2. **Workflow triggers** - GitHub Actions runs automatically
3. **Verify metadata exists** - Checks `docs/cli-metadata.json` in release tag
4. **Notify docs repo** - Sends repository dispatch event
5. **Docs repo** - Fetches metadata, generates docs, creates PR

**Result:** Zero manual documentation steps after release

## Architecture Decisions

### Why Java Reflection Over Claude?

**Considered options:**
1. Java reflection (chosen)
2. Claude-based extraction
3. Hybrid (Java + Claude)

**Decision rationale:**
- Java reflection is **deterministic** (same input = same output)
- A Claude-based approach would require API calls and careful prompt engineering
- Claude results may vary between runs
- Java reflection uses the official picocli API, so it reads the same command model the CLI uses at runtime
- No external API dependency or rate limits
- Type-safe and compile-time checked
- Faster execution (no API latency)

**Trade-offs accepted:**
- Requires compilation (but build already required)
- Requires GitHub credentials for tower-java-sdk (already needed)
- Slightly more complex than a script (but more robust)

### Why PR Checklist Over Automated Workflow?

**Considered options:**
1. PR checklist (chosen)
2. Automated workflow on every PR
3. Pre-commit git hook

**Decision rationale:**
- **PR checklist** gives developers control and visibility
- Automated workflow would create bot commits on every PR (noise)
- Many PRs don't change CLI structure (unnecessary workflow runs)
- Pre-commit hooks require local setup (not enforced)
- Checklist is simple, explicit, and educational

**Trade-offs accepted:**
- Relies on developer compliance (but checklist makes it obvious)
- Not automatically enforced (but PR reviewers can check)
- Could be forgotten (but template makes it prominent)

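If the "could be forgotten" risk ever needs a safety net, a check can compare the committed metadata with a freshly regenerated copy. The sketch below is an assumption about how such a check might look, not something in the project: `MetadataFreshness` is a hypothetical helper, and it compares SHA-256 fingerprints of the two JSON strings so a mismatch can be logged compactly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class MetadataFreshness {
    /** Hex-encoded SHA-256 of the given text, short enough to print in a CI log. */
    static String fingerprint(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** True when the committed content matches the regenerated content. */
    static boolean isCurrent(String committedJson, String regeneratedJson) {
        return fingerprint(committedJson).equals(fingerprint(regeneratedJson));
    }

    public static void main(String[] args) {
        // Placeholder payloads; a real check would read both files from disk.
        String committed = "{\"options\":461}";
        String regenerated = "{\"options\":1008}";
        if (!isCurrent(committed, regenerated)) {
            System.out.println("docs/cli-metadata.json is stale; run ./gradlew extractCliMetadata");
        }
    }
}
```

A comparison like this only works if extraction is deterministic, which is one more reason that property matters for the chosen approach.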
## Implementation Commits

1. `6895d301` - Replace Python metadata extractor with Java reflection approach
   - Added CliMetadataExtractor.java
   - Added extractCliMetadata Gradle task
   - Removed Python scripts
   - Created PR template
   - Updated workflow and documentation

2. `506c07d5` - Update enrich-cli-help skill to reference Java extractor
   - Changed metadata extraction commands
   - Removed absolute paths
   - Updated working directory references

3. `4538b39a` - Update .claude/README.md to reference Java extractor
   - Updated repository structure
   - Replaced Python references
   - Updated workflow pipeline

4. `bc3ece0f` - Test Java metadata extractor and update cli-metadata.json
   - Successfully tested extraction
   - Generated fresh metadata (1008 options)
   - Added command-spec.json to gitignore

## Current State

### Completed ✅

- ✅ Java reflection-based metadata extractor implemented
- ✅ Gradle task integrated and tested
- ✅ PR template with checklist created
- ✅ GitHub Actions workflow updated
- ✅ All documentation updated (docs/README.md, .claude/)
- ✅ Python scripts removed
- ✅ Fresh metadata generated (1008 options)
- ✅ PR created: #574

### Branch Status

**Branch:** `ll-cli-docs-automation-infrastructure`
**Status:** Ready for review
**PR:** https://github.com/seqeralabs/tower-cli/pull/574

### Files Changed Summary

**Added:**
- `src/main/java/io/seqera/tower/cli/utils/metadata/CliMetadataExtractor.java` (551 lines)
- `.github/pull_request_template.md` (new)

**Modified:**
- `build.gradle` (added extractCliMetadata task)
- `.github/workflows/trigger-docs-release-update.yml` (clarified comments)
- `docs/README.md` (complete rewrite)
- `.claude/skills/enrich-cli-help/SKILL.md` (updated references)
- `.claude/README.md` (updated references)
- `.gitignore` (added command-spec.json)
- `docs/cli-metadata.json` (regenerated with 1008 options)

**Removed:**
- `docs/scripts/extract-cli-metadata.py`
- `docs/scripts/extract-cli-examples.py`
- `docs/cli-examples.json`

## Next Steps

### Immediate (Post-Merge)

1. **Merge PR #574** - Get Java extractor into master
2. **Update other branch** - `ll-metadata-extractor-and-docs-automation` (help text PR)
   - Rebase on master to get Java extractor
   - Regenerate metadata with `./gradlew extractCliMetadata`
   - Update that PR to include fresh metadata

### Future Enhancements

**Short term:**
- Add metadata validation tests (ensure all commands have descriptions)
- Add GitHub Action to verify metadata is current in PRs (non-blocking check)
- Document common patterns for new command implementations

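A validation test of the "all commands have descriptions" kind could be as small as the sketch below. It assumes the metadata has already been loaded into a map from command name to description; `DescriptionCheck` and the command names are hypothetical, and the actual JSON layout of `docs/cli-metadata.json` is not shown in this document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DescriptionCheck {
    /** Returns, in sorted order, the names of commands whose description is null or blank. */
    static List<String> missingDescriptions(Map<String, String> descriptionsByCommand) {
        return new TreeMap<>(descriptionsByCommand).entrySet().stream()
                .filter(e -> e.getValue() == null || e.getValue().isBlank())
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // Made-up entries standing in for parsed metadata.
        Map<String, String> descriptions = new HashMap<>();
        descriptions.put("runs list", "List pipeline runs");
        descriptions.put("runs view", "");
        descriptions.put("info", null);

        List<String> missing = missingDescriptions(descriptions);
        if (!missing.isEmpty()) {
            System.out.println("Commands without descriptions: " + missing);
        }
    }
}
```

Returning the offending names, instead of a bare pass/fail flag, lets a failing test tell the contributor exactly which commands need attention.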
**Long term:**
- Changelog generation (compare metadata between versions)
- Deprecation tracking (flag removed commands/options)
- Multi-format output (man pages, shell completions)
- Metadata coverage metrics dashboard

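Changelog generation and deprecation tracking both reduce to set differences between two metadata snapshots. A minimal sketch, assuming each snapshot has been flattened to a set of fully qualified option names (the `MetadataDiff` class and the names below are hypothetical):

```java
import java.util.Set;
import java.util.TreeSet;

final class MetadataDiff {
    /** Names present in the current snapshot but not in the previous one, sorted. */
    static Set<String> added(Set<String> previous, Set<String> current) {
        Set<String> result = new TreeSet<>(current);
        result.removeAll(previous);
        return result;
    }

    /** Names present in the previous snapshot but gone from the current one, sorted. */
    static Set<String> removed(Set<String> previous, Set<String> current) {
        return added(current, previous);
    }

    public static void main(String[] args) {
        // Made-up option names for two releases.
        Set<String> previous = Set.of("launch --name", "launch --profile");
        Set<String> current = Set.of("launch --name", "launch --label");

        System.out.println("Added: " + added(previous, current));
        System.out.println("Removed: " + removed(previous, current));
    }
}
```

Sorted result sets keep the generated changelog stable between runs, in line with the determinism goal of the extractor itself.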
## Lessons Learned

### What Worked Well

1. **Java reflection approach** - Deterministic, complete, maintainable
2. **PR checklist integration** - Simple, explicit, developer-friendly
3. **Porting from experimental branch** - Already proven in prototype
4. **Comprehensive documentation** - Clear process for contributors

### Challenges Encountered

1. **Build requires GitHub credentials** - Need tower-java-sdk access
   - Solution: Documented in troubleshooting section
2. **Multiple candidate approaches** - Initially considered a Claude-based approach
   - Solution: Engineering feedback helped prioritize Java reflection
3. **Outdated skill references** - Claude skills had Python script paths
   - Solution: Systematically updated all references

### Process Improvements

1. **Always test before committing** - Verified extractor works before PR
2. **Update all documentation** - Ensured no stale references remain
3. **Clear commit messages** - Each commit explains what and why
4. **Comparison metrics** - Showed concrete improvement (118% more options)

## References

### Implementation Files
- Java extractor: `src/main/java/io/seqera/tower/cli/utils/metadata/CliMetadataExtractor.java`
- Gradle task: `build.gradle` (extractCliMetadata)
- PR template: `.github/pull_request_template.md`
- Workflow: `.github/workflows/trigger-docs-release-update.yml`

### Documentation
- Process guide: `docs/README.md`
- Claude skill: `.claude/skills/enrich-cli-help/SKILL.md`
- Contributor guide: `.claude/README.md`

### External
- PR #574: https://github.com/seqeralabs/tower-cli/pull/574
- Picocli docs: https://picocli.info/
- Original conversation: Downloaded text file with experimental branch discussion

---

**Last Updated:** 2026-01-22
**Status:** Implementation complete, PR ready for review
**Contributors:** Llewelyn van der Berg, Claude Sonnet 4.5
