feat(bundle-analysis): JSONL-only gap analysis with full six-signal attribution #23

Open
colombod wants to merge 4 commits into main from feat/bundle-analysis-library


@colombod
Collaborator

Summary

Implements a complete bundle usage gap analysis system. Answers "which named components of which bundles did I actually use versus what was declared?" across sessions and workspaces — using local JSONL only, no CI graph required.

What changed

Core library (context_intelligence/bundle_analysis/)

  • Inventory scanner — three-tier schema (always_active / agent_level / mode_gated). Reads behaviors/*.yaml for always-active agents/context/recipes, modes/*.md contributes blocks for mode-gated components, agents/*.md frontmatter for agent-level tool declarations. Disk cache keyed on bundle {slug}-{hash} for near-instant re-runs. Supports both bundle.md and bundle.yaml formats.
  • Signals reader — named sets throughout (not integer counts). Inventory used as reverse lookup table, enabling attribution for all six signal types including tools (via tool:pre events) and modes (via mode:activated / mode:changed events). Previously tools and modes were always-zero.
  • Gap arithmetic — four gap types with specific component names: tree-shake, mode-refactor, config-gap, mode-never-activated. Two-tier evaluation: always_active and agent_level components get tree-shake/mode-refactor; mode_gated components get mode-never-activated.
  • Orchestration order — inventory built first (scan_cache), then signals attributed using inventory as reverse lookup, then gap computed. Dropped AsyncCIClient requirement entirely.
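
The inventory-first order described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: the event field names (`event`, `tool_name`, `mode`), the inventory layout, and the helpers `build_reverse_lookup` and `attribute_signals` are all assumptions made for the example, and only the tool and mode signals are shown.

```python
import json
from collections import defaultdict

# Hypothetical inventory: bundle -> signal type -> declared component names.
INVENTORY = {
    "context-intelligence": {
        "tools": {"graph_query", "bash"},
        "modes": {"context-intelligence"},
    },
    "superpowers": {
        "tools": set(),
        "modes": {"brainstorm", "verify"},
    },
}


def build_reverse_lookup(inventory):
    """Map (signal_type, component_name) -> set of bundles declaring it."""
    lookup = defaultdict(set)
    for bundle, signals in inventory.items():
        for signal_type, names in signals.items():
            for name in names:
                lookup[(signal_type, name)].add(bundle)
    return lookup


def attribute_signals(jsonl_lines, inventory):
    """Return bundle -> signal_type -> set of component names seen in events."""
    lookup = build_reverse_lookup(inventory)
    used = defaultdict(lambda: defaultdict(set))
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        kind = record.get("event")
        # Assumed field names; the real event schema may differ.
        if kind == "tool:pre":
            key = ("tools", record.get("tool_name"))
        elif kind in ("mode:activated", "mode:changed"):
            key = ("modes", record.get("mode"))
        else:
            continue
        # Names that no bundle declares are simply left unattributed.
        for bundle in lookup.get(key, ()):
            used[bundle][key[0]].add(key[1])
    return used


events = [
    '{"event": "tool:pre", "tool_name": "graph_query"}',
    '{"event": "mode:activated", "mode": "brainstorm"}',
    '{"event": "tool:pre", "tool_name": "unknown_tool"}',
]
used = attribute_signals(events, INVENTORY)
```

Because the result holds named sets rather than integer counts, the later gap step can report exactly which components went unused instead of only how many.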

Deleted

  • GraphFetcher class and all 18 Cypher query files — no CI graph path remains.

Tooling

  • tool-bundle-usage — removed CI client dependency, works JSONL-only.
  • agents/bundle-usage-analyst.md — new agent contributed by bundle-usage mode. Calls bundle_usage once, interprets gap findings, writes structured markdown report.
  • modes/bundle-usage.md: new mode (advertised: false) that contributes the analyst agent and tool.

Tests

  • 134 tests pass (was 116 before this work — 18 new tests for tool/mode attribution).
  • ruff: all checks passed.
  • pyright: 0 errors, 0 warnings.

Live verification

Against the bundle-use-inspectors workspace (real JSONL data):

context-intelligence:
  ✅ agents:  [graph-analyst, session-navigator, context-intelligence-design-facilitator]
  ✅ context: [context-intelligence-awareness.md]
  ✅ tools:   [bash, delegate, graph_query]

superpowers:
  ✅ modes:   [brainstorm, execute-plan, verify, write-plan]

Gap findings include tree-shake for 7 unused skills, mode-never-activated for the context-intelligence mode (never activated in this workspace), and 24 total mode-never-activated findings across all bundles — all based on real event data.
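
For readers unfamiliar with the gap types, the two-tier evaluation behind findings like these might look like the sketch below. It is an assumption-laden illustration rather than the contents of gap.py: the function `compute_gaps`, the data layout, and the component names are hypothetical, findings are emitted one per mode as a guess at the granularity, and the mode-refactor and config-gap rules are omitted because this description does not spell out their conditions.

```python
def compute_gaps(declared, used, activated_modes):
    """Classify declared-but-unused components into gap findings.

    declared: {"always_active": set, "agent_level": set,
               "mode_gated": {mode_name: set_of_component_names}}
    used: set of component names observed in events
    activated_modes: set of mode names that were activated at least once
    """
    findings = []
    # Tier 1: components that are loaded regardless of mode but never used.
    for tier in ("always_active", "agent_level"):
        for name in sorted(declared.get(tier, set()) - used):
            findings.append(("tree-shake", name))
    # Tier 2: mode-gated components are judged by whether their mode ever ran.
    for mode in sorted(declared.get("mode_gated", {})):
        if mode not in activated_modes:
            findings.append(("mode-never-activated", mode))
    return findings


# Hypothetical data for illustration only.
declared = {
    "always_active": {"graph-analyst", "session-navigator", "unused-skill"},
    "agent_level": {"graph_query"},
    "mode_gated": {"context-intelligence": {"some-agent"}},
}
used = {"graph-analyst", "session-navigator", "graph_query"}
activated_modes = {"brainstorm"}

findings = compute_gaps(declared, used, activated_modes)
# findings == [("tree-shake", "unused-skill"),
#              ("mode-never-activated", "context-intelligence")]
```

Keeping the two tiers separate avoids flagging mode-gated components as tree-shake candidates merely because their mode was never entered.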

Diego Colombo and others added 4 commits May 21, 2026 12:54
…ttribution

Implements a complete bundle usage gap analysis system for the
context-intelligence bundle. Replaces the CI-graph-first approach with
a JSONL-only, inventory-first pipeline that works offline.

Key changes:
- Rewrite inventory scanner — three-tier schema (always_active /
  agent_level / mode_gated), reads behaviors/*.yaml, mode.contributes
  blocks, and agent-level frontmatter. Disk cache keyed on bundle
  slug+hash for near-instant re-runs.
- Rewrite processor.py — named sets throughout (not counts), inventory
  reverse lookup enables attribution of all six signal types: agents,
  skills, recipes, context, tools, and modes.
- Add tool_call event kind — parses tool:pre events for non-recipe
  tools, attributed via agent_level tool declarations in inventory.
- Add mode_activated event kind — parses mode:activated and
  mode:changed events, attributed via inventory modes map.
- Rewrite gap.py — four gap types with specific component names:
  tree-shake, mode-refactor, config-gap, mode-never-activated. Two-tier
  evaluation: always_active and agent_level get tree-shake/mode-refactor;
  mode_gated gets mode-never-activated.
- Update orchestration order — inventory built first, signals attributed
  using inventory as reverse lookup, then gap computed.
- Remove CI client dependency — run_bundle_analysis no longer requires
  AsyncCIClient. tool-bundle-usage works JSONL-only.
- Delete GraphFetcher and 18 Cypher query files — no CI graph path.
- Add bundle-usage mode (advertised: false) — contributes
  bundle-usage-analyst agent and tool-bundle-usage tool.
- Add bundle-usage-analyst agent — calls bundle_usage tool once,
  interprets gap findings, writes structured markdown report to disk.
- Support bundle.md and bundle.yaml — fixes silent drop of python-dev
  and lsp bundles which use bundle.yaml format.
- 134 tests pass, ruff clean, pyright 0 errors.

Co-authored-by: Amplifier <amplifier@microsoft.com>
…h references, fix processor docstring and __all__
@bkrabach
Collaborator

Reviewed for parent_id and Session B merge interaction: zero impact. This PR reads events.jsonl from disk only — doesn't touch the hook module, dispatch path, session-graph write path, or merge_state semantics. Additive surface area is justified by scope; 134 unit tests + live verification output in the PR body is solid coverage.

One mechanical note: this PR and #19 both touch behaviors/context-intelligence.yaml. Whoever merges second needs a manual rebase on that file. Recommended merge order: #22, then #23, then #19, which puts the breaking change (#19) last.

Approved. The deletion of GraphFetcher + 18 Cypher query files is a one-way door — fine and consistent with the stated JSONL-first direction, just flagging.
