Which guide? Modifying `mellea/`, `cli/`, or `test/` → this file. Writing code that imports Mellea → `docs/AGENTS_TEMPLATE.md`.
Code of Conduct: This project adheres to a Code of Conduct. All contributors, including AI assistants, are expected to follow these community standards when generating code, documentation, or interacting with the project.
Use `uv` for Python commands; never use system Python or pip directly.
- Run Python scripts: `uv run python script.py` (not `python script.py`)
- Run tools: `uv run pytest`, `uv run ruff` (not `pytest`, `ruff`)
- Install deps: `uv sync` (not `pip install`)
- The virtual environment is `.venv/`; `uv run` automatically uses it
```bash
pre-commit install                      # Required: install git hooks
uv sync --all-extras --all-groups       # Install all deps (required for tests)
uv sync --extra backends --all-groups   # Install just backend deps (lighter)
ollama serve                            # Start Ollama (required for most tests)
uv run pytest                           # Default: qualitative tests, skip slow tests
uv run pytest -m "not qualitative"      # Fast tests only (~2 min)
uv run pytest -m slow                   # Run only slow tests (>5 min)
uv run pytest --co -q                   # List collected tests without running them (--collect-only)
uv run ruff format .                    # Format code
uv run ruff check .                     # Lint code
uv run mypy .                           # Type check
```

Branches: `feat/topic`, `fix/issue-id`, `docs/topic`
| Path | Contents |
|---|---|
| `mellea/core/` | Core abstractions: Backend, Base, Formatter, Requirement, Sampling |
| `mellea/stdlib/` | Standard library: Sessions, Components, Context |
| `mellea/backends/` | Providers: HF, OpenAI, Ollama, Watsonx, LiteLLM |
| `mellea/formatters/` | Output formatters for different types |
| `mellea/templates/` | Jinja2 templates |
| `mellea/helpers/` | Utilities, logging, model ID tables |
| `cli/` | CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`) |
| `test/` | All tests (run from repo root) |
| `docs/examples/` | Example code (run as tests via pytest) |
| `.agents/skills/` | Agent skills (agentskills.io standard) |
| `scratchpad/` | Experiments (git-ignored) |
Tests use a four-tier granularity system (unit, integration, e2e, qualitative) plus backend and resource markers. The unit marker is auto-applied by conftest — never write it explicitly. The llm marker is deprecated; use e2e instead.
See test/MARKERS_GUIDE.md for the full marker reference (tier definitions, backend markers, resource gates, auto-skip logic, common patterns).
Examples in docs/examples/ are opt-in — unlike test/ files (auto-collected, default unit), examples require an explicit # pytest: comment to be collected. Files without this comment are silently ignored (they won't appear in skip summaries either). This is because examples have variable dependencies and limited setup:
```python
# pytest: e2e, ollama, qualitative
"""Example description..."""
```

- Don't add `qualitative` to trivial tests; keep the fast loop fast.
- Mark long-running tests as `slow`.
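The opt-in rule for `docs/examples/` can be pictured with a small sketch. The helper below is hypothetical: `parse_pytest_markers` is not a Mellea function, and the repository's real collection logic may differ. It assumes the `# pytest:` comment sits among the file's leading comment lines and that markers are comma-separated.

```python
from __future__ import annotations

PREFIX = "# pytest:"


def parse_pytest_markers(source: str) -> list[str] | None:
    """Return markers from a leading ``# pytest:`` comment, or None.

    Hypothetical helper for illustration only. None means the file has
    not opted in and would be ignored during collection.
    """
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(PREFIX):
            markers = stripped[len(PREFIX):].split(",")
            return [m.strip() for m in markers if m.strip()]
        if stripped and not stripped.startswith("#"):
            break  # reached code or a docstring: no opt-in comment
    return None


if __name__ == "__main__":
    example = '# pytest: e2e, ollama, qualitative\n"""Example description..."""\n'
    print(parse_pytest_markers(example))
```

A file whose first non-comment line appears before any `# pytest:` comment yields `None` here, mirroring the "silently ignored" behaviour described above.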
Skills live in .agents/skills/ following the agentskills.io open standard. Each skill is a directory with a SKILL.md file (YAML frontmatter + markdown instructions).
Tool discovery:
| Tool | Project skills | Global skills | Config needed |
|---|---|---|---|
| Claude Code | `.agents/skills/` | `~/.claude/skills/` | `"skillLocations": [".agents/skills"]` in `.claude/settings.json` |
| IBM Bob | `.bob/skills/` | `~/.bob/skills/` | Symlink: `.bob/skills` → `.agents/skills` |
| VS Code / Copilot | `.agents/skills/` | n/a | None (auto-discovered) |
Bob users: create the symlink once per clone:
```bash
mkdir -p .bob && ln -s ../.agents/skills .bob/skills
```

Available skills: `/audit-markers`, `/skill-author`
- Types required on all core functions
- Docstrings are prompts: be specific, the LLM reads them
- Google-style docstrings: `Args:` on the class docstring only; `__init__` gets a single summary sentence. Add `Attributes:` only when a stored value differs in type/behaviour from its constructor input (type transforms, computed values, class constants). See CONTRIBUTING.md for a full example.
- Ruff for linting/formatting
- Use `...` in `@generative` function bodies
- Prefer primitives over classes
- Friendly Dependency Errors: Wrap optional backend imports in `try/except ImportError` with a helpful message (e.g., "Please pip install mellea[hf]"). See `mellea/stdlib/session.py` for examples.
- CLI command docstrings: Typer command functions in `cli/` follow an enriched convention with `Prerequisites:` and `See Also:` sections; these feed the auto-generated CLI reference page. See `docs/docs/guide/CONTRIBUTING.md` for the full pattern. Regenerate after changes: `uv run poe clidocs`. Test the generator: `uv run pytest tooling/docs-autogen/test_cli_reference.py -v`. Full pipeline docs: `tooling/docs-autogen/README.md`.
- Backend telemetry fields: All backends must populate `mot.usage` (dict with `prompt_tokens`, `completion_tokens`, `total_tokens`), `mot.model` (str), and `mot.provider` (str) in their `post_processing()` method. `mot.streaming` (bool) and `mot.ttfb_ms` (`float | None`) are set automatically in `astream()`, so backends do not need to set them. Metrics are automatically recorded by `TokenMetricsPlugin`, `LatencyMetricsPlugin`, and `ErrorMetricsPlugin`; don't add manual `record_token_usage_metrics()`, `record_request_duration()`, or `record_error()` calls.
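As a rough illustration of the friendly-dependency-error convention, the sketch below wraps an optional import and re-raises with an install hint. `import_optional` is a hypothetical helper, not the code in `mellea/stdlib/session.py`, which may use an inline `try/except` instead; the `hf` extra name is taken from the example message above.

```python
import importlib


def import_optional(module_name: str, extra: str):
    """Import an optional dependency or raise a helpful ImportError.

    Hypothetical helper for illustration; ``extra`` is the package extra
    to suggest in the error message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the root cause stays visible.
        raise ImportError(
            f"Could not import '{module_name}'. "
            f"Please pip install mellea[{extra}]"
        ) from exc


if __name__ == "__main__":
    json_module = import_optional("json", "hf")  # stdlib module, always present
    print(json_module.__name__)
```

Chaining with `from exc` keeps the original traceback attached, which helps when the import fails for a reason other than a missing package.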
Angular format: feat:, fix:, docs:, test:, refactor:, release:
Pre-commit runs: ruff, mypy, uv-lock, codespell
For AI attribution trailers, see Section 7 (AI Attribution).
Commits require a Signed-off-by trailer from the human author (added by running git commit -s). AI agents must not add a Signed-off-by in the tool's own name — instead, always add an Assisted-by: trailer to the commit footer:
```text
Assisted-by: Claude Code
Assisted-by: IBM Bob
```
Use the tool's common name (e.g., GitHub Copilot, Cursor, etc.).
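The trailer rules above could be checked mechanically along these lines. This is a minimal sketch, not a hook shipped with the project: `trailer_problems`, the `ai_assisted` flag, and the `KNOWN_AI_TOOLS` list are all assumptions made for illustration, and the author name in the usage example is made up.

```python
from __future__ import annotations

# Illustrative tool names only; extend with the assistants your team uses.
KNOWN_AI_TOOLS = ("Claude Code", "IBM Bob", "GitHub Copilot", "Cursor")


def trailer_problems(message: str, ai_assisted: bool) -> list[str]:
    """Return a list of trailer problems found in a commit message."""
    trailers = [
        line.split(":", 1)
        for line in message.splitlines()
        if line.startswith(("Signed-off-by:", "Assisted-by:"))
    ]
    signers = [value.strip() for key, value in trailers if key == "Signed-off-by"]
    assistants = [value.strip() for key, value in trailers if key == "Assisted-by"]

    problems = []
    if not signers:
        problems.append("missing Signed-off-by (use `git commit -s`)")
    if any(signer.startswith(KNOWN_AI_TOOLS) for signer in signers):
        problems.append("Signed-off-by must name the human author, not a tool")
    if ai_assisted and not assistants:
        problems.append("missing Assisted-by trailer")
    return problems


if __name__ == "__main__":
    message = (
        "fix: handle empty context\n\n"
        "Signed-off-by: Jane Doe <jane@example.com>\n"
        "Assisted-by: Claude Code\n"
    )
    print(trailer_problems(message, ai_assisted=True))
```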
Don't cancel: `pytest` (full) and `pre-commit --all-files` may take minutes. Canceling mid-run can corrupt state.
| Problem | Fix |
|---|---|
| `ComponentParseError` | Add examples to docstring |
| `uv.lock` out of sync | Run `uv sync` |
| Ollama refused | Run `ollama serve` |
| Telemetry import errors | Run `uv sync` to install OpenTelemetry deps |
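When chasing an import error like the last row, it can help to confirm which top-level packages the environment actually sees. The snippet below is a generic diagnostic sketch using the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, and `opentelemetry` is assumed here to be the top-level package name of the OpenTelemetry Python API.

```python
from __future__ import annotations

import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed top-level package name for the OpenTelemetry Python API.
    missing = missing_modules(["opentelemetry"])
    print("missing:", missing or "none")
```

Run it with `uv run python` so the check happens inside `.venv/` rather than against system Python.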
- `uv run pytest test/ -m "not qualitative"` passes?
- `ruff format` and `ruff check` clean?
- New functions typed with concise docstrings?
- Unit tests added for new functionality?
- Avoided over-engineering?
- Place tests in `test/` mirroring source structure
- Name files `test_*.py` (required for pydocstyle)
- Use `gh_run` fixture for CI-aware tests (see `test/conftest.py`)
- Mark tests checking LLM output quality with `@pytest.mark.qualitative`
- If a test fails, fix the code, not the test (unless the test was wrong)
If you are modifying or creating pages under docs/docs/, follow the writing
conventions in docs/docs/guide/CONTRIBUTING.md.
Key rules that differ from typical Markdown habits:
- No H1 in the body: Mintlify renders the frontmatter `title` automatically; a body `# Heading` produces a duplicate title in the published site
- No `.md` extensions in internal links: use `../concepts/requirements-system`, not `../concepts/requirements-system.md`
- Frontmatter required: every page needs `title` and `description`; add `sidebarTitle` if the title is long
- markdownlint gate: run `npx markdownlint-cli2 "docs/docs/**/*.md"` and fix all warnings before committing a doc page
- Verified code only: every code example must be checked against the current mellea source; mark forward-looking content with `> **Coming soon:**`
- No visible TODOs: if content is missing, open a GitHub issue instead
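Three of these rules (frontmatter keys, no body H1, no `.md` in internal links) are simple enough to pre-check before running markdownlint. The sketch below is a hypothetical helper, not part of the project's tooling and not a substitute for the markdownlint gate. It assumes frontmatter is a leading block delimited by `---` lines and does only line-based matching.

```python
from __future__ import annotations

import re

FENCE = "`" * 3  # code-fence delimiter, built this way to keep the sketch embeddable
H1 = re.compile(r"^# \S")
MD_LINK = re.compile(r"\]\((?!https?://)[^)\s]*\.md(#[^)]*)?\)")


def page_problems(text: str) -> list[str]:
    """Return rule violations for one docs page (simplified checks)."""
    problems = []
    lines = text.splitlines()

    # Collect frontmatter keys from a leading '---' ... '---' block.
    body_start = 0
    keys = set()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body_start = i + 1
                break
            if ":" in line:
                keys.add(line.split(":", 1)[0].strip())
    for required in ("title", "description"):
        if required not in keys:
            problems.append(f"frontmatter missing '{required}'")

    in_fence = False
    for line in lines[body_start:]:
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '# comment' lines inside code are not headings
        if H1.match(line):
            problems.append(f"H1 in body: {line.strip()}")
        if MD_LINK.search(line):
            problems.append(f".md extension in internal link: {line.strip()}")
    return problems
```

An empty list means the page passed these three checks only; the remaining rules still need a human or markdownlint.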
Found a bug, workaround, or pattern? Update the docs:
- Issue/workaround? → Add to Section 9 (Common Issues) in this file
- Usage pattern? → Add to `docs/AGENTS_TEMPLATE.md`
- New pitfall? → Add warning near relevant section
Intrinsics are specialized LoRA adapters that add task-specific capabilities (RAG evaluation, safety checks, calibration, etc.) to Granite models. Mellea handles adapter loading and input formatting automatically — you just call the right function.
Prefer the high-level wrappers in mellea/stdlib/components/intrinsic/. These handle adapter loading, context formatting, and output parsing for you:
| Module | Function | Description |
|---|---|---|
| `core` | `check_certainty(context, backend)` | Model certainty about its last response (0–1) |
| `core` | `requirement_check(context, backend, requirement)` | Whether text meets a requirement (0–1) |
| `core` | `find_context_attributions(response, documents, context, backend)` | Sentences that influenced the response |
| `rag` | `check_answerability(question, documents, context, backend)` | Whether documents can answer a question (0–1) |
| `rag` | `rewrite_question(question, context, backend)` | Rewrite question into a retrieval query |
| `rag` | `clarify_query(question, documents, context, backend)` | Generate clarification or return "CLEAR" |
| `rag` | `find_citations(response, documents, context, backend)` | Document sentences supporting the response |
| `rag` | `check_context_relevance(question, document, context, backend)` | Whether a document is relevant (0–1) |
| `rag` | `flag_hallucinated_content(response, documents, context, backend)` | Flag potentially hallucinated sentences |
```python
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Message
from mellea.stdlib.components.intrinsic import core
from mellea.stdlib.context import ChatContext

backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")

# Build a context whose last turn is the assistant response to evaluate.
context = (
    ChatContext()
    .add(Message("user", "What is the square root of 4?"))
    .add(Message("assistant", "The square root of 4 is 2."))
)

# Certainty of the model about its last response (0–1).
score = core.check_certainty(context, backend)
```

For lower-level control (custom adapters, model options), use `mfuncs.act()` with `Intrinsic` directly; see examples in `docs/examples/intrinsics/`.
- Canonical catalog: `mellea/backends/adapters/catalog.py`, the source of truth for intrinsic names, HF repo IDs, and adapter types
- Usage examples: `docs/examples/intrinsics/`, with working code for every intrinsic
- Helper functions: `mellea/stdlib/components/intrinsic/rag.py` and `core.py`
When adding support for a new intrinsic (not just using an existing one), fetch its README from Hugging Face first. Each README contains the authoritative spec for input/output format, intended use, and examples.
Writing examples? The HF READMEs also document intended usage patterns and example inputs — useful reference when writing code in docs/examples/intrinsics/.
| Repo | Purpose | Intrinsics |
|---|---|---|
| `ibm-granite/granitelib-rag-r1.0` | RAG pipeline | answerability, citations, context_relevance, hallucination_detection, query_rewrite, query_clarification |
| `ibm-granite/granitelib-core-r1.0` | Core capabilities | context-attribution, requirement-check, uncertainty |
| `ibm-granite/granitelib-guardian-r1.0` | Safety & compliance | guardian-core, policy-guardrails, factuality-detection, factuality-correction |
README URLs for RAG intrinsics (no model subfolder):

```text
https://huggingface.co/ibm-granite/granitelib-rag-r1.0/blob/main/{intrinsic_name}/README.md
```

README URLs for Core and Guardian intrinsics (include model subfolder):

```text
https://huggingface.co/ibm-granite/granitelib-{core,guardian}-r1.0/blob/main/{intrinsic_name}/granite-4.0-micro/README.md
```
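The two URL patterns can be folded into one small function. This is a sketch that only does string formatting from the templates above; `readme_url` and its `family` parameter are hypothetical names, and nothing here checks that a given README actually exists.

```python
BASE = "https://huggingface.co/ibm-granite"
MODEL_SUBFOLDER = "granite-4.0-micro"


def readme_url(family: str, intrinsic_name: str) -> str:
    """Build the README URL for an intrinsic in the rag, core, or guardian repo."""
    if family not in ("rag", "core", "guardian"):
        raise ValueError(f"unknown family: {family!r}")
    path = intrinsic_name
    if family != "rag":
        # Core and Guardian READMEs live in a model subfolder.
        path = f"{intrinsic_name}/{MODEL_SUBFOLDER}"
    return f"{BASE}/granitelib-{family}-r1.0/blob/main/{path}/README.md"


if __name__ == "__main__":
    print(readme_url("rag", "answerability"))
    print(readme_url("core", "uncertainty"))
```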