diff --git a/examples/README.md b/examples/README.md
index cb92870e..2f80dc5f 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -18,6 +18,7 @@ service keys.
 | [Build a multimodal wine recommender with OCR](./wine-recommender) | Combining preference-based retrieval with OCR-driven label detection in one UI | `encode`, `score`, `extract` | Docker Compose app plus local SIE endpoint; API key optional for unauthenticated SIE | Runnable demo |
 | [Build a multi-modal product classifier with embeddings](./taxonomy-classification) | Evaluating text, image, NLI, and reranking approaches for hierarchical product taxonomy classification | `extract`, `encode`, `score` | SIE endpoint, Shopify dataset prep via `uv run` scripts, standalone `uv` project | Runnable evaluation example |
 | [Swap an OCR model with one identifier change](./document-ocr) | Driving recognition (VLM-OCR), structured extraction (Donut), and zero-shot NER (GLiNER) through the same `extract` call by swapping the model ID | `extract` | Docker Compose plus Node UI, no API key required, hosted version on [Hugging Face Spaces](https://huggingface.co/spaces/superlinked/document-ocr) | Runnable demo |
+| [Vision-first document RAG](./vision-doc-rag) | Retrieving and answering questions over a multi-tenant page corpus by looking at page images, with OCR kept out of the score path | `encode`, `extract`, `score` (optional) | SIE endpoint with a GPU recommended for ColQwen2.5 + Florence-2-DocVQA | Runnable demo |
 
 For docs publishing, lead with the quickest runnable demos, then use the
 benchmark and evaluation examples for deeper technical users.
diff --git a/examples/vision-doc-rag/.gitignore b/examples/vision-doc-rag/.gitignore
new file mode 100644
index 00000000..9a052846
--- /dev/null
+++ b/examples/vision-doc-rag/.gitignore
@@ -0,0 +1,9 @@
+.venv/
+__pycache__/
+data/pages.json
+data/pdfs_manifest.json
+data/pages_manifest.json
+data/pdfs/
+data/pages/
+data/multivectors.npz
+data/metadata.json
diff --git a/examples/vision-doc-rag/README.md b/examples/vision-doc-rag/README.md
new file mode 100644
index 00000000..3f3c586f
--- /dev/null
+++ b/examples/vision-doc-rag/README.md
@@ -0,0 +1,261 @@
+# Vision-first document RAG
+
+Retrieve by image, answer by image. ColQwen2.5 reads each PDF page as a
+picture and ranks pages via late interaction; Florence-2-DocVQA reads the
+winning page and produces the textual answer. OCR never enters the score path,
+so schematics, pinout diagrams, architecture slides, charts, and other layout
+cues still drive ranking. Everything runs on one SIE endpoint.
+
+Each page also carries a `client` tag, so the same corpus serves multiple
+tenants from one index. Queries scoped to `embedded-lab` cannot retrieve
+`ops-eng` or `aerospace` pages.
+
+## Corpus
+
+The demo fetches a small public PDF batch on demand and renders selected pages
+to PNGs. The page selections are deliberately capped so local ingest stays
+fast while still indexing visually rich pages.
+
+| Tenant | Sources | Visual signal |
+|---|---|---|
+| `embedded-lab` | Raspberry Pi Pico datasheet, Arduino UNO R3 datasheet, Arduino UNO R3 schematic | Pinout diagrams, board diagrams, circuit schematics |
+| `ops-eng` | PostgreSQL manual, CNCF Kubernetes / cloud-native architecture material | Architecture diagrams, operational flows, dense technical tables |
+| `aerospace` | NASA NTRS nozzle and booster reports | Engineering drawings, cross-sections, charts, mission technical figures |
+
+Generated files are git-ignored:
+
+```text
+data/pdfs/                # downloaded PDFs
+data/pdfs_manifest.json   # source manifest from fetch_pdfs.py
+data/pages/               # rendered PNG pages
+data/pages_manifest.json  # page-level metadata from render_pages.py
+data/metadata.json        # index metadata from ingest.py
+data/multivectors.npz     # page multivectors from ingest.py
+```
+
+## SIE features used
+
+- `encode` - `vidore/colqwen2.5-v0.2` on page images at ingest and on query
+  text at search time. Output is a `[tokens, 128]` multivector. Late
+  interaction (`sie_sdk.scoring.maxsim`) is the first-stage ranking signal.
+- `extract` - `mynkchaudhry/Florence-2-FT-DocVQA`. Called with
+  `instruction=<your question>` to get a textual answer for the top page, and
+  without `instruction` to OCR the same page for a display snippet. The OCR
+  snippet is UX-only; it never enters ranking.
+- `score` (optional) - `Qwen/Qwen3-VL-Reranker-2B` second-stage rerank over
+  `(query text, page image)`. Off by default while we wait for an upstream
+  adapter fix; flip `search.visual_rerank: true` in `config.yaml` to enable it
+  on a cluster that's ready.
+
+## Run it
+
+You need Python 3.12 and a reachable SIE cluster.
+
+```bash
+# 1. Run SIE locally, or point SIE_CLUSTER_URL / SIE_API_KEY at a managed cluster.
+docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default
+
+# 2. Fetch public PDFs and render selected pages to PNG.
+cd examples/vision-doc-rag
+pip install -r python/requirements.txt
+python data/fetch_pdfs.py
+python data/render_pages.py
+
+# 3. Encode every rendered page with ColQwen2.5 and save the multivectors.
+python python/ingest.py
+
+# 4a. CLI demo.
+python python/search.py
+
+# 4b. Or start the UI.
+uvicorn --app-dir python server:app --port 8888
+open http://localhost:8888
+```
+
+`render_pages.py` uses `pdf2image` when Poppler is available. If Poppler is
+not installed, it falls back to PyMuPDF, which is installed from
+`python/requirements.txt`.
+
+The first run on a cold cluster pays a one-time model-load cost. ColQwen2.5
+and Florence-2 are both several GB, so expect roughly a minute on CPU and a
+few seconds on GPU before the warm path kicks in.
+
+### Managed cluster
+
+```bash
+export SIE_CLUSTER_URL="https://your-cluster-host:8080"
+export SIE_API_KEY="SL-..."
+```
+
+The defaults in `config.yaml` point at `http://localhost:8080`. Set
+`cluster.gpu` to a profile name like `l4-spot` if the cluster needs an
+explicit GPU class.
+
+## Try these queries
+
+Queries are grouped by what they exercise. Each row names the expected target
+page so you can spot regressions at a glance.
+
+### Visual signal — the ranking comes from the page image, not OCR
+
+| Tenant | Query | Expected target | Why it's interesting |
+|---|---|---|---|
+| `embedded-lab` | Raspberry Pi Pico pinout GP21 | Pi Pico datasheet pinout (pp 4-5) | Abbreviated visual label still drives retrieval. |
+| `embedded-lab` | where is the ATmega16U2 on the schematic? | Arduino UNO R3 schematic (pp 1-2) | Circuit schematic retrieval, not prose. |
+| `ops-eng` | cloud native architecture diagram | CNCF AI whitepaper or Kubernetes slides | Visual architecture page instead of OCR text. |
+| `aerospace` | solid rocket motor nozzle design figure | Solid rocket motor nozzles report | Engineering drawing in a figure-heavy report. |
+
+### Table / value lookups — the DocVQA answer is the point
+
+| Tenant | Query | Expected target | Expected answer |
+|---|---|---|---|
+| `embedded-lab` | What is the operating voltage range of the Raspberry Pi Pico? | Pi Pico datasheet electrical characteristics (pp 6-8) | A voltage range, e.g. 1.8-5.5 V |
+| `embedded-lab` | Which Arduino UNO pin is the built-in LED on? | UNO R3 datasheet pinout (pp 5-11) | D13 / PB5 |
+| `ops-eng` | PostgreSQL default listening port | PG 18 manual config section (pp 19-24) | 5432 |
+| `ops-eng` | What is the default value of max_connections in PostgreSQL? | PG 18 manual parameter table (pp 19-24) | 100 |
+| `aerospace` | What is the throat diameter shown in the nozzle drawing? | Nozzle design figure | A labeled dimension off the drawing |
+
+### Disambiguation — two PDFs in one tenant, the right one must win
+
+| Tenant | Query | Should pick | Should beat |
+|---|---|---|---|
+| `aerospace` | solid propellant rocket nozzle cross-section | `solid-rocket-motor-nozzles.pdf` | `liquid-rocket-engine-nozzles.pdf` |
+| `aerospace` | regeneratively cooled nozzle | `liquid-rocket-engine-nozzles.pdf` (regen cooling is liquid-specific) | `solid-rocket-motor-nozzles.pdf` |
+| `embedded-lab` | USB-to-serial interface chip on the schematic | `arduino-uno-r3-schematic.pdf` (ATmega16U2) | `raspberry-pi-pico-datasheet.pdf` |
+| `embedded-lab` | RP2040 GPIO function table | `raspberry-pi-pico-datasheet.pdf` | `arduino-uno-r3-datasheet.pdf` |
+
+### Tenant-leak negatives — the matching content lives in a different tenant
+
+| Scoped to | Query | Pass condition |
+|---|---|---|
+| `ops-eng` | Raspberry Pi Pico pinout GP21 | No embedded-lab pages return. |
+| `ops-eng` | regeneratively cooled nozzle | No aerospace pages return. |
+| `aerospace` | cloud native architecture diagram | No ops-eng pages return. |
+| `embedded-lab` | PostgreSQL connection pool | No ops-eng pages return. |
+
+## API
+
+### `GET /api/search`
+
+| Parameter | Required | Description |
+|---|---|---|
+| `q` | yes | Search query |
+| `client` | no | Tenant filter, for example `embedded-lab`. Omit it to search all tenants. |
+
+```bash
+curl "http://localhost:8888/api/search?q=Raspberry+Pi+Pico+pinout+GP21&client=embedded-lab"
+```
+
+```json
+{
+  "query": "Raspberry Pi Pico pinout GP21",
+  "client": "embedded-lab",
+  "answer": "GP21 can be used for ...",
+  "results": [
+    {
+      "page_id": "embedded-lab__raspberry-pi-pico-datasheet__p005",
+      "client": "embedded-lab",
+      "title": "Raspberry Pi Pico Datasheet",
+      "publisher": "Raspberry Pi Ltd",
+      "source_pdf": "raspberry-pi-pico-datasheet.pdf",
+      "page_number": 5,
+      "citation": "raspberry-pi-pico-datasheet.pdf · p.5",
+      "page_image": "/pages/embedded-lab/raspberry-pi-pico-datasheet_p005.png",
+      "scores": { "maxsim": 14.44, "rerank": null }
+    }
+  ]
+}
+```
+
+### `GET /api/clients`, `GET /api/stats`
+
+`/api/clients` returns the tenant list; `/api/stats` returns the runtime config: active models, rerank on/off, and page count.
+
+## How it works
+
+```text
+        ingest.py  (once per corpus)
+        fetch_pdfs.py -> data/pdfs/{tenant}/*.pdf
+             -> render_pages.py -> data/pages/{tenant}/*.png
+             -> data/pages_manifest.json
+             -> SIE.encode(ColQwen2.5, images, multivector)
+             -> data/multivectors.npz + data/metadata.json
+
+        search.py / server.py  (per query)
+        q -> SIE.encode(ColQwen2.5, text, is_query=True)
+          -> filter metadata by tenant
+          -> sie_sdk.scoring.maxsim -> top_k_candidates
+          -> optional SIE.score(Qwen3-VL-Reranker, q, images)
+          -> SIE.extract(Florence-2-DocVQA, instruction=q, images=[top_page])
+          -> SIE.extract(Florence-2-DocVQA, images=[top_page]) for display OCR
+```
+
+OCR is never on the score path. The visual reranker, when enabled, ranks over
+the same modality as retrieval, so layout cues survive both stages.
+
+The corpus is small enough that MaxSim runs in Python. For thousands of pages,
+hand the multivectors to LanceDB, Vespa, or another multivector store; the SIE
+calls stay the same.
+
+## Customize
+
+`data/fetch_pdfs.py` owns the curated source list. Add a source by appending an entry like this to `SOURCES`:
+
+```python
+{
+    "client": "my-tenant",
+    "slug": "my-manual",
+    "title": "My Manual",
+    "publisher": "Example Publisher",
+    "license": "CC BY 4.0",
+    "url": "https://example.com/my-manual.pdf",
+    "pages": [1, 2, 7, 8],
+}
+```
+
+Then rerun:
+
+```bash
+python data/fetch_pdfs.py
+python data/render_pages.py
+python python/ingest.py
+```
+
+`config.yaml` holds the model, rendering, and search settings:
+
+```yaml
+models:
+  retriever: "vidore/colqwen2.5-v0.2"
+  docvqa: "mynkchaudhry/Florence-2-FT-DocVQA"
+  reranker: "Qwen/Qwen3-VL-Reranker-2B"
+render:
+  backend: "auto"
+  dpi: 160
+search:
+  top_k_candidates: 5
+  top_k_results: 3
+  visual_rerank: false
+  answer: true
+  ocr_snippet: true
+```
+
+## Project layout
+
+```text
+examples/vision-doc-rag/
+├── config.yaml
+├── data/
+│   ├── fetch_pdfs.py          # curated public PDF source list + downloader
+│   ├── render_pages.py        # PDFs -> PNG pages + pages_manifest.json
+│   ├── pdfs/                  # generated
+│   ├── pages/                 # generated PNGs
+│   ├── metadata.json          # generated by ingest
+│   └── multivectors.npz       # generated by ingest
+├── python/
+│   ├── ingest.py
+│   ├── search.py
+│   ├── server.py
+│   └── requirements.txt
+└── static/
+    └── index.html
+```
diff --git a/examples/vision-doc-rag/config.yaml b/examples/vision-doc-rag/config.yaml
new file mode 100644
index 00000000..587548f2
--- /dev/null
+++ b/examples/vision-doc-rag/config.yaml
@@ -0,0 +1,40 @@
+# SIE server (defaults to local Docker: docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default).
+# Override with SIE_CLUSTER_URL / SIE_API_KEY env vars when targeting a managed cluster.
+cluster:
+  url: "http://localhost:8080"
+  api_key: ""
+  gpu: ""                       # only set for managed multi-GPU clusters (e.g. "l4-spot"); ignored locally
+  provision_timeout_s: 600
+
+# Models. The retrieval signal is vision end-to-end: ColQwen2.5 reads each page
+# as an image and we late-interact (MaxSim) against the same model's text-side
+# embedding of the query. No OCR is involved in ranking, so charts, screenshots,
+# tables, and any other layout cue that wouldn't survive an OCR round-trip
+# still contributes to the score.
+#
+# DocVQA produces a textual answer for the top page. The model takes the page
+# image + the user's question (passed via `instruction`) and returns the answer
+# as an entity in the response — no separate LLM call needed.
+models:
+  retriever: "vidore/colqwen2.5-v0.2"
+  docvqa: "mynkchaudhry/Florence-2-FT-DocVQA"
+  # Optional second-stage cross-encoder rerank. Visual model so we don't have to
+  # collapse the page through OCR before reranking. Disabled by default while
+  # we wait for the cluster-side adapter fix to land:
+  #   https://github.com/superlinked/sie-internal/issues/1026
+  # Re-enable with search.visual_rerank: true once that ships.
+  reranker: "Qwen/Qwen3-VL-Reranker-2B"
+
+# Page rendering. `auto` tries pdf2image/Poppler first and falls back to
+# PyMuPDF when Poppler is not installed.
+render:
+  backend: "auto"                # auto | pdf2image | pymupdf
+  dpi: 160
+
+# Retrieval
+search:
+  top_k_candidates: 5           # how many pages survive MaxSim
+  top_k_results: 3              # how many pages return after optional rerank
+  visual_rerank: false          # see models.reranker note above
+  answer: true                  # run DocVQA on the top page for a textual answer
+  ocr_snippet: true             # OCR the top page for a display-only snippet in the UI
diff --git a/examples/vision-doc-rag/data/fetch_pdfs.py b/examples/vision-doc-rag/data/fetch_pdfs.py
new file mode 100644
index 00000000..21ade844
--- /dev/null
+++ b/examples/vision-doc-rag/data/fetch_pdfs.py
@@ -0,0 +1,158 @@
+"""Download the public PDF corpus for the visual document RAG demo.
+
+The corpus is intentionally small and curated. Each source has a tenant, a
+stable slug, source metadata, and a limited page selection so the demo can be
+indexed quickly while still containing diagrams, schematics, screenshots, and
+technical figures that reward visual retrieval.
+"""
+
+from __future__ import annotations
+
+import json
+import shutil
+import sys
+import tempfile
+from pathlib import Path
+from urllib.error import HTTPError, URLError
+from urllib.request import Request, urlopen
+
+
+SOURCES = [
+    {
+        "client": "embedded-lab",
+        "slug": "raspberry-pi-pico-datasheet",
+        "title": "Raspberry Pi Pico Datasheet",
+        "publisher": "Raspberry Pi Ltd",
+        "license": "CC BY-ND 4.0",
+        "url": "https://datasheets.raspberrypi.com/pico/pico-datasheet.pdf",
+        "pages": [4, 5, 6, 7, 8, 9],
+    },
+    {
+        "client": "embedded-lab",
+        "slug": "arduino-uno-r3-datasheet",
+        "title": "Arduino UNO R3 Datasheet",
+        "publisher": "Arduino",
+        "license": "Arduino documentation / open hardware terms",
+        "url": "https://docs.arduino.cc/resources/datasheets/A000066-datasheet.pdf",
+        "pages": [5, 6, 7, 8, 9, 10, 11],
+    },
+    {
+        "client": "embedded-lab",
+        "slug": "arduino-uno-r3-schematic",
+        "title": "Arduino UNO R3 Schematic",
+        "publisher": "Arduino",
+        "license": "CC BY-SA 4.0 hardware reference design",
+        "url": "https://docs.arduino.cc/resources/schematics/A000066-schematics.pdf",
+        "pages": [1, 2],
+    },
+    {
+        "client": "ops-eng",
+        "slug": "postgresql-18-manual",
+        "title": "PostgreSQL 18 Documentation",
+        "publisher": "PostgreSQL Global Development Group",
+        "license": "PostgreSQL License",
+        "url": "https://www.postgresql.org/files/documentation/pdf/18/postgresql-18-A4.pdf",
+        "pages": [19, 20, 21, 22, 23, 24],
+    },
+    {
+        "client": "ops-eng",
+        "slug": "kubernetes-infrastructure-abstraction",
+        "title": "Kubernetes as Infrastructure Abstraction",
+        "publisher": "Cloud Native Computing Foundation",
+        "license": "CNCF public presentation material",
+        "url": "https://www.cncf.io/wp-content/uploads/2020/08/2019-09-Kubernetes-as-Infrastructure-Abstraction.pdf",
+        "pages": [6, 7, 8, 9, 10, 11],
+    },
+    {
+        "client": "ops-eng",
+        "slug": "cloud-native-ai-whitepaper",
+        "title": "Cloud Native Artificial Intelligence Whitepaper",
+        "publisher": "Cloud Native Computing Foundation",
+        "license": "CNCF documentation / report terms",
+        "url": "https://www.cncf.io/wp-content/uploads/2024/03/cloud_native_ai24_031424a-2.pdf",
+        "pages": [11, 12, 13, 14, 15, 16],
+    },
+    {
+        "client": "aerospace",
+        "slug": "solid-rocket-motor-nozzles",
+        "title": "Solid Rocket Motor Nozzles",
+        "publisher": "NASA Technical Reports Server",
+        "license": "NASA STI public release",
+        "url": "https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19760013126.pdf",
+        "pages": [1, 2, 3, 4, 5, 6],
+    },
+    {
+        "client": "aerospace",
+        "slug": "liquid-rocket-engine-nozzles",
+        "title": "Liquid Rocket Engine Nozzles",
+        "publisher": "NASA Technical Reports Server",
+        "license": "NASA STI public release",
+        "url": "https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19770009165.pdf",
+        "pages": [1, 2, 3, 4, 5, 6],
+    },
+    {
+        "client": "aerospace",
+        "slug": "sls-booster-state-machine",
+        "title": "State Machine Modeling of the Space Launch System Solid Rocket Boosters",
+        "publisher": "NASA Technical Reports Server",
+        "license": "NASA STI public release",
+        "url": "https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20160000328.pdf",
+        "pages": [1, 2, 3, 4, 5, 6],
+    },
+]
+
+
+def _download(url: str, out: Path) -> bool:
+    """Download url to out atomically. Return True when a new file was written."""
+    if out.exists() and out.stat().st_size > 0:
+        return False
+
+    out.parent.mkdir(parents=True, exist_ok=True)
+    request = Request(
+        url,
+        headers={
+            "User-Agent": "sie-vision-doc-rag-demo/1.0",
+            "Accept": "application/pdf,*/*",
+        },
+    )
+    with tempfile.NamedTemporaryFile(delete=False, dir=out.parent, suffix=".tmp") as tmp:
+        tmp_path = Path(tmp.name)
+        try:
+            with urlopen(request, timeout=60) as response:
+                shutil.copyfileobj(response, tmp)
+        except (HTTPError, URLError, TimeoutError):
+            tmp_path.unlink(missing_ok=True)
+            raise
+
+    tmp_path.replace(out)
+    return True
+
+
+def main() -> None:
+    here = Path(__file__).resolve().parent
+    pdf_root = here / "pdfs"
+    manifest = []
+
+    for source in SOURCES:
+        pdf_path = pdf_root / source["client"] / f"{source['slug']}.pdf"
+        try:
+            downloaded = _download(source["url"], pdf_path)
+        except Exception as exc:
+            print(f"Failed to download {source['url']}: {type(exc).__name__}: {exc}", file=sys.stderr)
+            raise
+
+        row = dict(source)
+        row["pdf_path"] = str(pdf_path.relative_to(here))
+        row["source_pdf"] = pdf_path.name
+        manifest.append(row)
+
+        status = "downloaded" if downloaded else "cached"
+        print(f"  {status:10s} {source['client']:12s} {source['slug']} -> {row['pdf_path']}")
+
+    out = here / "pdfs_manifest.json"
+    out.write_text(json.dumps({"sources": manifest}, indent=2) + "\n")
+    print(f"\nWrote {len(manifest)} PDF sources to {out}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/vision-doc-rag/data/render_pages.py b/examples/vision-doc-rag/data/render_pages.py
new file mode 100644
index 00000000..9c123305
--- /dev/null
+++ b/examples/vision-doc-rag/data/render_pages.py
@@ -0,0 +1,147 @@
+"""Rasterize the curated PDF corpus to page PNGs.
+
+The script tries pdf2image first because it produces excellent page images
+when Poppler is installed. If Poppler or pdf2image is unavailable, it falls
+back to PyMuPDF so the demo still works with only Python package dependencies.
+"""
+
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+
+import yaml
+
+
+def _selected_pages(source: dict, total_pages: int) -> list[int]:
+    pages = source.get("pages")
+    if pages:
+        selected = [int(p) for p in pages if 1 <= int(p) <= total_pages]
+    else:
+        start = int(source.get("start_page", 1))
+        max_pages = int(source.get("max_pages", 6))
+        selected = list(range(start, min(total_pages, start + max_pages - 1) + 1))
+
+    if not selected:
+        raise ValueError(f"No valid pages selected for {source['slug']} ({total_pages} pages)")
+    return selected
+
+
+def _pdf_page_count_with_pymupdf(pdf_path: Path) -> int:
+    import fitz
+
+    with fitz.open(pdf_path) as doc:
+        return doc.page_count
+
+
+def _render_with_pdf2image(pdf_path: Path, page_number: int, out_path: Path, dpi: int) -> None:
+    from pdf2image import convert_from_path
+
+    images = convert_from_path(
+        str(pdf_path),
+        dpi=dpi,
+        first_page=page_number,
+        last_page=page_number,
+        fmt="png",
+        single_file=True,
+    )
+    if not images:
+        raise RuntimeError(f"pdf2image returned no image for {pdf_path} page {page_number}")
+    images[0].save(out_path)
+
+
+def _render_with_pymupdf(pdf_path: Path, page_number: int, out_path: Path, dpi: int) -> None:
+    import fitz
+
+    zoom = dpi / 72
+    matrix = fitz.Matrix(zoom, zoom)
+    with fitz.open(pdf_path) as doc:
+        page = doc.load_page(page_number - 1)
+        pixmap = page.get_pixmap(matrix=matrix, alpha=False)
+        pixmap.save(out_path)
+
+
+def _render_page(pdf_path: Path, page_number: int, out_path: Path, dpi: int, backend: str) -> str:
+    out_path.parent.mkdir(parents=True, exist_ok=True)
+    if backend in {"auto", "pdf2image"}:
+        try:
+            _render_with_pdf2image(pdf_path, page_number, out_path, dpi)
+            return "pdf2image"
+        except Exception as exc:
+            if backend == "pdf2image":
+                raise
+            print(
+                f"  pdf2image failed for {pdf_path.name} p.{page_number} "
+                f"({type(exc).__name__}); falling back to PyMuPDF",
+                file=sys.stderr,
+            )
+
+    _render_with_pymupdf(pdf_path, page_number, out_path, dpi)
+    return "pymupdf"
+
+
+def main() -> None:
+    here = Path(__file__).resolve().parent
+    root = here.parent
+    manifest_path = here / "pdfs_manifest.json"
+    if not manifest_path.exists():
+        print("pdfs_manifest.json not found; run `python data/fetch_pdfs.py` first", file=sys.stderr)
+        sys.exit(1)
+
+    config = yaml.safe_load((root / "config.yaml").read_text())
+    render_config = config.get("render", {})
+    dpi = int(render_config.get("dpi", 160))
+    backend = render_config.get("backend", "auto")
+    active_backend = backend
+    out_dir = here / "pages"
+
+    pdf_manifest = json.loads(manifest_path.read_text())
+    page_manifest: list[dict] = []
+    backend_counts: dict[str, int] = {}
+
+    for source in pdf_manifest["sources"]:
+        pdf_path = here / source["pdf_path"]
+        if not pdf_path.exists():
+            raise FileNotFoundError(f"Missing PDF: {pdf_path}. Run data/fetch_pdfs.py.")
+
+        total_pages = _pdf_page_count_with_pymupdf(pdf_path)
+        for page_number in _selected_pages(source, total_pages):
+            page_id = f"{source['client']}__{source['slug']}__p{page_number:03d}"
+            image_path = out_dir / source["client"] / f"{source['slug']}_p{page_number:03d}.png"
+            used_backend = _render_page(pdf_path, page_number, image_path, dpi, active_backend)
+            if backend == "auto" and used_backend == "pymupdf":
+                active_backend = "pymupdf"
+            backend_counts[used_backend] = backend_counts.get(used_backend, 0) + 1
+
+            rel_image_path = image_path.relative_to(here)
+            page_manifest.append(
+                {
+                    "page_id": page_id,
+                    "client": source["client"],
+                    "title": source["title"],
+                    "publisher": source["publisher"],
+                    "license": source["license"],
+                    "source_url": source["url"],
+                    "source_pdf": source["source_pdf"],
+                    "source_pdf_path": source["pdf_path"],
+                    "page_number": page_number,
+                    "image_path": str(rel_image_path),
+                }
+            )
+            print(
+                f"  {source['client']:12s} {source['slug']:38s} "
+                f"p.{page_number:<4d} -> data/{rel_image_path}"
+            )
+
+    out = here / "pages_manifest.json"
+    out.write_text(json.dumps(page_manifest, indent=2) + "\n")
+
+    print(f"\nRendered {len(page_manifest)} pages to {out_dir}")
+    print(f"Wrote page manifest to {out}")
+    for name, count in sorted(backend_counts.items()):
+        print(f"  {name}: {count} pages")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/vision-doc-rag/python/ingest.py b/examples/vision-doc-rag/python/ingest.py
new file mode 100644
index 00000000..8b0f8e11
--- /dev/null
+++ b/examples/vision-doc-rag/python/ingest.py
@@ -0,0 +1,124 @@
+"""Build the per-tenant visual index.
+
+For every rendered PDF page PNG we ask SIE to encode the image with
+vidore/colqwen2.5-v0.2, which returns a [tokens, 128] multivector. Each page's
+multivector goes into a single .npz on disk, alongside a metadata.json that
+keeps the client name, source PDF, page number, and source URL for routing,
+filtering, and citation at query time.
+
+There is no vector database here. MaxSim at the scale of one team's wiki
+(hundreds to thousands of pages) is cheap and avoids the indexing step.
+For larger corpora swap the .npz for a multivector store (LanceDB, Vespa,
+Turbopuffer); the encode call is the same.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import time
+from pathlib import Path
+
+import numpy as np
+import yaml
+
+from sie_sdk import SIEClient
+from sie_sdk.types import Item
+
+
+def load_config():
+    return yaml.safe_load((Path(__file__).resolve().parent.parent / "config.yaml").read_text())
+
+
+def load_pages():
+    pages_path = Path(__file__).resolve().parent.parent / "data" / "pages_manifest.json"
+    if not pages_path.exists():
+        raise FileNotFoundError(
+            "data/pages_manifest.json not found. Run `python data/fetch_pdfs.py` "
+            "and `python data/render_pages.py` first."
+        )
+    return json.loads(pages_path.read_text())
+
+
+def encode_pages(client: SIEClient, model: str, pages: list[dict], gpu: str, timeout: float):
+    data_dir = Path(__file__).resolve().parent.parent / "data"
+    multivectors: list[np.ndarray] = []
+    metadata: list[dict] = []
+
+    for i, page in enumerate(pages, 1):
+        image_path = data_dir / page["image_path"]
+        if not image_path.exists():
+            raise FileNotFoundError(f"Missing page image: {image_path}. Run data/render_pages.py.")
+
+        start = time.time()
+        result = client.encode(
+            model,
+            Item(id=page["page_id"], images=[str(image_path)]),
+            output_types=["multivector"],
+            gpu=gpu,
+            wait_for_capacity=True,
+            provision_timeout_s=timeout,
+        )
+        elapsed = time.time() - start
+        mv = result["multivector"].astype(np.float32)
+        multivectors.append(mv)
+        metadata.append(
+            {
+                "page_id": page["page_id"],
+                "client": page["client"],
+                "title": page["title"],
+                "publisher": page["publisher"],
+                "license": page["license"],
+                "source_url": page["source_url"],
+                "source_pdf": page["source_pdf"],
+                "source_pdf_path": page["source_pdf_path"],
+                "page_number": page["page_number"],
+                "image_path": page["image_path"],
+                "num_tokens": int(mv.shape[0]),
+            }
+        )
+        citation = f"{page['source_pdf']} · p.{page['page_number']}"
+        print(f"  [{i}/{len(pages)}] {page['client']:12s} {citation:44s} {mv.shape} in {elapsed:.1f}s")
+
+    return multivectors, metadata
+
+
+def main():
+    config = load_config()
+    pages = load_pages()
+    print(f"Loaded {len(pages)} pages")
+
+    cluster_url = os.environ.get("SIE_CLUSTER_URL", config["cluster"]["url"])
+    api_key = os.environ.get("SIE_API_KEY", config["cluster"]["api_key"])
+    gpu = config["cluster"]["gpu"]
+    timeout = config["cluster"]["provision_timeout_s"]
+    model = config["models"]["retriever"]
+
+    print(f"\n--- Encoding pages with {model} ---")
+    with SIEClient(cluster_url, api_key=api_key) as client:
+        multivectors, metadata = encode_pages(client, model, pages, gpu, timeout)
+
+    data_dir = Path(__file__).resolve().parent.parent / "data"
+    # np.savez stores variable-length multivectors as one entry per array; we
+    # key them by page_id so the search side can reload without an extra index.
+    np.savez(
+        data_dir / "multivectors.npz",
+        **{m["page_id"]: mv for m, mv in zip(metadata, multivectors)},
+    )
+    (data_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
+
+    total_tokens = sum(m["num_tokens"] for m in metadata)
+    by_client: dict[str, int] = {}
+    for m in metadata:
+        by_client[m["client"]] = by_client.get(m["client"], 0) + 1
+
+    print(f"\n  Saved {len(metadata)} multivectors to data/multivectors.npz")
+    print("  Saved metadata to data/metadata.json")
+    print(f"  Total visual tokens: {total_tokens}")
+    print("  Pages per tenant:")
+    for client_name in sorted(by_client):
+        print(f"    {client_name}: {by_client[client_name]}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/vision-doc-rag/python/requirements.txt b/examples/vision-doc-rag/python/requirements.txt
new file mode 100644
index 00000000..1ea77ae9
--- /dev/null
+++ b/examples/vision-doc-rag/python/requirements.txt
@@ -0,0 +1,8 @@
+sie-sdk==0.3.4
+fastapi>=0.115.0
+uvicorn>=0.30.0
+numpy>=1.26.0
+pyyaml>=6.0
+Pillow>=10.3.0
+pdf2image>=1.17.0
+PyMuPDF>=1.24.0
diff --git a/examples/vision-doc-rag/python/search.py b/examples/vision-doc-rag/python/search.py
new file mode 100644
index 00000000..74f86cd1
--- /dev/null
+++ b/examples/vision-doc-rag/python/search.py
@@ -0,0 +1,261 @@
+"""Visual document search + question answering, vision end-to-end.
+
+Pipeline per query:
+  1. encode(ColQwen2.5, text)          — query multivector
+  2. sie_sdk.scoring.maxsim             — late interaction against page images
+  3. score(Qwen3-VL-Reranker, query, images)   — optional, off by default
+  4. extract(Florence-2-FT-DocVQA, instruction=query, images=[top page])
+                                        — textual answer + citation
+  5. extract(Florence-2-FT-DocVQA, images=[top page])
+                                        — OCR snippet for the UI (display only,
+                                          NOT in the ranking path)
+
+The ranking is decided by a vision model looking at the page image, so charts,
+screenshots, tables, and any other visual signal that OCR would erase still
+contributes. OCR runs only on the chosen page, only to provide on-screen text
+the user can read or copy.
+
+Multi-tenant isolation is a Python filter on metadata before MaxSim, so a
+query scoped to one client never sees another client's pages.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import time
+from pathlib import Path
+
+import numpy as np
+import yaml
+
+from sie_sdk import SIEClient
+from sie_sdk.scoring import maxsim
+from sie_sdk.types import Item
+
+
+def load_config():
+    return yaml.safe_load((Path(__file__).resolve().parent.parent / "config.yaml").read_text())
+
+
+def load_index():
+    data_dir = Path(__file__).resolve().parent.parent / "data"
+    if not (data_dir / "multivectors.npz").exists():
+        raise FileNotFoundError("data/multivectors.npz missing. Run `python python/ingest.py` first.")
+    npz = np.load(data_dir / "multivectors.npz")
+    metadata = json.loads((data_dir / "metadata.json").read_text())
+    required = {"page_id", "client", "title", "license", "source_pdf", "page_number", "image_path", "publisher", "source_url"}
+    if metadata:
+        missing = required - set(metadata[0])
+        if missing:
+            raise ValueError(
+                f"data/metadata.json is missing fields {sorted(missing)}; it was generated by an older corpus shape. "
+                "Run `python data/fetch_pdfs.py`, `python data/render_pages.py`, "
+                "then `python python/ingest.py`."
+            )
+    multivectors = {m["page_id"]: npz[m["page_id"]] for m in metadata}
+    return multivectors, metadata
+
+
+def _ocr_snippet(entities: list[dict], max_chars: int = 400) -> str:
+    """Concatenate OCR text regions into a single readable snippet."""
+    pieces = []
+    for e in entities or []:
+        text = (e.get("text") or "").replace("</s>", "").strip()
+        if text:
+            pieces.append(text)
+    joined = " · ".join(pieces)
+    if len(joined) > max_chars:
+        return joined[: max_chars - 1] + "…"
+    return joined
+
+
+def _docvqa_answer(entities: list[dict]) -> str:
+    """Pick the answer string out of a Florence-2 DocVQA response.
+
+    Florence-2 returns the answer as an entity (often the only one when the
+    `<DocVQA>` task token is used). We take the first non-empty text.
+    """
+    for e in entities or []:
+        text = (e.get("text") or "").replace("</s>", "").strip()
+        if text:
+            return text
+    return ""
+
+
+def search(
+    client: SIEClient,
+    config: dict,
+    multivectors: dict[str, np.ndarray],
+    metadata: list[dict],
+    query: str,
+    client_filter: str | None = None,
+) -> dict:
+    gpu = config["cluster"]["gpu"]
+    timeout = config["cluster"]["provision_timeout_s"]
+    top_k_candidates = config["search"]["top_k_candidates"]
+    top_k_results = config["search"]["top_k_results"]
+    do_visual_rerank = config["search"].get("visual_rerank", False)
+    do_answer = config["search"].get("answer", True)
+    do_ocr_snippet = config["search"].get("ocr_snippet", True)
+
+    corpus = [m for m in metadata if not client_filter or m["client"] == client_filter]
+    if not corpus:
+        return {"results": [], "answer": None, "timings": {}}
+
+    timings: dict[str, float | str] = {}  # seconds per stage, or an exception name under a *_error key
+    pages_root = Path(__file__).resolve().parent.parent / "data"
+
+    # 1. Encode query (text side of ColQwen2.5).
+    t0 = time.time()
+    q_result = client.encode(
+        config["models"]["retriever"],
+        Item(text=query),
+        output_types=["multivector"],
+        is_query=True,
+        gpu=gpu,
+        wait_for_capacity=True,
+        provision_timeout_s=timeout,
+    )
+    timings["encode_query_s"] = round(time.time() - t0, 3)
+    query_mv = q_result["multivector"].astype(np.float32)
+
+    # 2. MaxSim against in-memory multivectors.
+    doc_mvs = [multivectors[m["page_id"]] for m in corpus]
+    t0 = time.time()
+    maxsim_scores = maxsim(query_mv, doc_mvs)
+    timings["maxsim_s"] = round(time.time() - t0, 3)
+
+    order = np.argsort(maxsim_scores)[::-1][:top_k_candidates]
+    candidates: list[dict] = []
+    for idx in order:
+        c = dict(corpus[idx])
+        c["_maxsim_score"] = float(maxsim_scores[idx])
+        c["_rerank_score"] = None
+        candidates.append(c)
+
+    # 3. Optional visual rerank. Image-in cross-encoder so OCR never enters the
+    #    ranking path. Disabled by default — see config.yaml for the cluster
+    #    bug we're waiting on.
+    if do_visual_rerank and candidates:
+        try:
+            t0 = time.time()
+            rerank_items = [
+                Item(id=c["page_id"], images=[str(pages_root / c["image_path"])])
+                for c in candidates
+            ]
+            rerank = client.score(
+                config["models"]["reranker"],
+                Item(text=query),
+                rerank_items,
+                gpu=gpu,
+                wait_for_capacity=True,
+                provision_timeout_s=timeout,
+            )
+            timings["visual_rerank_s"] = round(time.time() - t0, 3)
+            rerank_by_id = {s["item_id"]: s for s in rerank["scores"]}
+            for c in candidates:
+                s = rerank_by_id.get(c["page_id"])
+                c["_rerank_score"] = float(s["score"]) if s else 0.0
+            candidates.sort(key=lambda c: c["_rerank_score"] or 0.0, reverse=True)
+        except Exception as exc:
+            # Cluster adapter bug fallback: keep MaxSim ordering, surface the
+            # failure to the caller. See sie-internal#1026.
+            timings["visual_rerank_error"] = type(exc).__name__
+
+    results = candidates[:top_k_results]
+
+    # 4. DocVQA answer from the top page image. instruction= goes in as the
+    #    plain question; the adapter prepends Florence-2's `<DocVQA>` task
+    #    token. See superlinked.com/docs/extract/vision.
+    answer = None
+    if do_answer and results:
+        top = results[0]
+        try:
+            t0 = time.time()
+            qa = client.extract(
+                config["models"]["docvqa"],
+                Item(images=[str(pages_root / top["image_path"])]),
+                instruction=query,
+                gpu=gpu,
+                wait_for_capacity=True,
+                provision_timeout_s=timeout,
+            )
+            timings["docvqa_s"] = round(time.time() - t0, 3)
+            answer = _docvqa_answer(qa["entities"])
+        except Exception as exc:
+            timings["docvqa_error"] = type(exc).__name__
+
+    # 5. OCR snippet for display — only on the top result so users see the
+    #    text on the page they're being shown. Never used as a ranking signal.
+    if do_ocr_snippet and results:
+        top = results[0]
+        try:
+            t0 = time.time()
+            ocr = client.extract(
+                config["models"]["docvqa"],   # same model, no `instruction` ⇒ OCR mode
+                Item(images=[str(pages_root / top["image_path"])]),
+                gpu=gpu,
+                wait_for_capacity=True,
+                provision_timeout_s=timeout,
+            )
+            timings["ocr_snippet_s"] = round(time.time() - t0, 3)
+            top["ocr_snippet"] = _ocr_snippet(ocr["entities"])
+        except Exception as exc:
+            timings["ocr_snippet_error"] = type(exc).__name__
+
+    return {"results": results, "answer": answer, "timings": timings}
+
+
+def print_run(out: dict, query: str, client_filter: str | None):
+    scope = client_filter or "all clients"
+    print(f'\n  Query: "{query}"  ({scope})')
+    print(f"  Timings: {out['timings']}")
+    if out["answer"]:
+        print(f"\n  Answer: {out['answer']}")
+    if not out["results"]:
+        print("  No results.")
+        return
+    for i, r in enumerate(out["results"], 1):
+        rerank = r.get("_rerank_score")
+        rerank_str = f"rerank={rerank:.4f}" if rerank is not None else "rerank=—"
+        print(f"\n  {i}. [{r['client']}] {r['title']}")
+        print(f"     {r['source_pdf']}  ·  p.{r['page_number']}  ·  {r['publisher']}")
+        print(f"     maxsim={r['_maxsim_score']:.3f}  {rerank_str}")
+        if r.get("ocr_snippet"):
+            print(f"     OCR snippet: {r['ocr_snippet'][:200]}")
+        print(f"     url: {r['source_url']}")
+
+
+def main():
+    config = load_config()
+    multivectors, metadata = load_index()
+    print(f"Loaded index: {len(metadata)} pages")
+
+    cluster_url = os.environ.get("SIE_CLUSTER_URL", config["cluster"]["url"])
+    api_key = os.environ.get("SIE_API_KEY", config["cluster"]["api_key"])
+
+    demo = [
+        # Visual signal — ranking is driven by the page image.
+        ("Raspberry Pi Pico pinout GP21", "embedded-lab"),
+        ("cloud native architecture diagram", "ops-eng"),
+        ("solid rocket motor nozzle design figure", "aerospace"),
+        # No tenant filter: shows that an unscoped query searches across all tenants.
+        ("ATmega16U2 power tree diagram", None),
+        # Table / value lookup — DocVQA must return a specific value, not the title.
+        ("What is the operating voltage range of the Raspberry Pi Pico?", "embedded-lab"),
+        ("PostgreSQL default listening port", "ops-eng"),
+        # Disambiguation — two PDFs in one tenant; the right one must win.
+        ("solid propellant rocket nozzle cross-section", "aerospace"),
+        # Tenant-leak negative — the matching content lives in aerospace; scoping
+        # to ops-eng must return no aerospace pages.
+        ("regeneratively cooled nozzle", "ops-eng"),
+    ]
+    with SIEClient(cluster_url, api_key=api_key) as client:
+        for query, tenant in demo:
+            out = search(client, config, multivectors, metadata, query, tenant)
+            print_run(out, query, tenant)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/vision-doc-rag/python/server.py b/examples/vision-doc-rag/python/server.py
new file mode 100644
index 00000000..990857fa
--- /dev/null
+++ b/examples/vision-doc-rag/python/server.py
@@ -0,0 +1,99 @@
+"""FastAPI backend for the multi-tenant visual-document search + QA demo."""
+
+from __future__ import annotations
+
+import os
+from contextlib import asynccontextmanager
+from pathlib import Path
+
+import yaml
+from fastapi import FastAPI, Query
+from fastapi.responses import FileResponse
+from fastapi.staticfiles import StaticFiles
+
+from sie_sdk import SIEClient
+
+from search import load_index, search
+
+config = None
+multivectors = None
+metadata = None
+client = None
+clients_index: list[str] = []
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    global config, multivectors, metadata, client, clients_index
+    root = Path(__file__).resolve().parent.parent
+    config = yaml.safe_load((root / "config.yaml").read_text())
+    multivectors, metadata = load_index()
+    cluster_url = os.environ.get("SIE_CLUSTER_URL", config["cluster"]["url"])
+    api_key = os.environ.get("SIE_API_KEY", config["cluster"]["api_key"])
+    client = SIEClient(cluster_url, api_key=api_key)
+    clients_index = sorted({m["client"] for m in metadata})
+    yield
+    client.close()
+
+
+app = FastAPI(title="SIE Vision-First Document RAG", lifespan=lifespan)
+
+root = Path(__file__).resolve().parent.parent
+static_dir = root / "static"
+app.mount("/static", StaticFiles(directory=str(static_dir)), name="static")
+app.mount("/pages", StaticFiles(directory=str(root / "data" / "pages")), name="pages")
+
+
+@app.get("/")
+def index():
+    return FileResponse(str(static_dir / "index.html"))
+
+
+@app.get("/api/clients")
+def api_clients():
+    return clients_index
+
+
+@app.get("/api/stats")
+def api_stats():
+    return {
+        "total_pages": len(metadata),
+        "clients": clients_index,
+        "models": config["models"],
+        "visual_rerank": config["search"].get("visual_rerank", False),
+        "answer": config["search"].get("answer", True),
+    }
+
+
+@app.get("/api/search")
+def api_search(
+    q: str = Query(..., min_length=1),
+    client_name: str | None = Query(None, alias="client"),
+):
+    out = search(client, config, multivectors, metadata, q, client_name)
+    return {
+        "query": q,
+        "client": client_name,
+        "answer": out["answer"],
+        "timings": out["timings"],
+        "results": [
+            {
+                "page_id": r["page_id"],
+                "client": r["client"],
+                "title": r["title"],
+                "publisher": r["publisher"],
+                "license": r["license"],
+                "source_url": r["source_url"],
+                "source_pdf": r["source_pdf"],
+                "page_number": r["page_number"],
+                "citation": f"{r['source_pdf']} · p.{r['page_number']}",
+                "page_image": f"/{r['image_path']}",
+                "ocr_snippet": r.get("ocr_snippet", ""),
+                "scores": {
+                    "maxsim": round(r["_maxsim_score"], 4),
+                    "rerank": round(r["_rerank_score"], 4) if r.get("_rerank_score") is not None else None,
+                },
+            }
+            for r in out["results"]
+        ],
+    }
diff --git a/examples/vision-doc-rag/static/index.html b/examples/vision-doc-rag/static/index.html
new file mode 100644
index 00000000..2b3eb7c3
--- /dev/null
+++ b/examples/vision-doc-rag/static/index.html
@@ -0,0 +1,199 @@
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1" />
+    <title>Vision-First Document RAG · SIE</title>
+    <style>
+      :root {
+        color-scheme: light;
+        --fg: #0f172a;
+        --muted: #475569;
+        --bg: #f8fafc;
+        --card: #ffffff;
+        --border: #e2e8f0;
+        --accent: #0ea5e9;
+      }
+      * { box-sizing: border-box; }
+      body {
+        font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Inter, system-ui, sans-serif;
+        margin: 0;
+        background: var(--bg);
+        color: var(--fg);
+      }
+      header {
+        padding: 24px 32px;
+        border-bottom: 1px solid var(--border);
+        background: var(--card);
+      }
+      h1 { margin: 0 0 4px 0; font-size: 20px; }
+      header p { margin: 0; color: var(--muted); font-size: 14px; }
+      main { padding: 24px 32px; max-width: 1200px; margin: 0 auto; }
+      form {
+        display: flex;
+        gap: 8px;
+        margin-bottom: 24px;
+        flex-wrap: wrap;
+      }
+      select, input[type=text], button {
+        font: inherit;
+        padding: 10px 14px;
+        border: 1px solid var(--border);
+        border-radius: 8px;
+        background: var(--card);
+      }
+      input[type=text] { flex: 1; min-width: 280px; }
+      button {
+        background: var(--accent);
+        color: white;
+        border-color: var(--accent);
+        cursor: pointer;
+      }
+      button:hover { background: #0284c7; }
+      .stats { color: var(--muted); font-size: 13px; margin-bottom: 16px; }
+      .answer-card {
+        padding: 16px 20px;
+        margin-bottom: 20px;
+        background: #f0fdf4;
+        border: 1px solid #bbf7d0;
+        border-radius: 12px;
+      }
+      .answer-card .label {
+        font-size: 11px;
+        text-transform: uppercase;
+        letter-spacing: 0.08em;
+        color: #15803d;
+        font-weight: 600;
+      }
+      .answer-card .text { font-size: 16px; line-height: 1.5; margin-top: 4px; }
+      .result {
+        display: grid;
+        grid-template-columns: 220px 1fr;
+        gap: 20px;
+        padding: 20px;
+        background: var(--card);
+        border: 1px solid var(--border);
+        border-radius: 12px;
+        margin-bottom: 16px;
+      }
+      .result img {
+        width: 100%;
+        border: 1px solid var(--border);
+        border-radius: 8px;
+        cursor: zoom-in;
+      }
+      .title { font-size: 16px; font-weight: 600; margin: 0 0 4px 0; }
+      .meta { font-size: 13px; color: var(--muted); margin-bottom: 8px; }
+      .scores {
+        font-family: ui-monospace, SFMono-Regular, monospace;
+        font-size: 12px;
+        color: var(--muted);
+        margin-bottom: 10px;
+      }
+      .snippet {
+        font-size: 14px;
+        line-height: 1.5;
+        color: var(--fg);
+        background: var(--bg);
+        padding: 10px 12px;
+        border-radius: 8px;
+        border: 1px solid var(--border);
+      }
+      .empty, .loading { color: var(--muted); padding: 12px 0; }
+      .tag {
+        display: inline-block;
+        padding: 2px 8px;
+        background: #e0f2fe;
+        color: #075985;
+        border-radius: 999px;
+        font-size: 12px;
+        font-weight: 500;
+        margin-right: 6px;
+      }
+    </style>
+  </head>
+  <body>
+    <header>
+      <h1>Multi-Tenant Visual Doc Search + QA</h1>
+      <p>ColQwen2.5 ranks pages by looking at the images. Florence-2-DocVQA reads the top page and answers the question. All on one SIE endpoint.</p>
+    </header>
+    <main>
+      <form id="searchForm">
+        <select id="clientSel"><option value="">All clients</option></select>
+        <input id="q" type="text" placeholder="e.g. Raspberry Pi Pico pinout GP21" autofocus />
+        <button type="submit">Search</button>
+      </form>
+      <div id="stats" class="stats"></div>
+      <div id="answer"></div>
+      <div id="results"></div>
+    </main>
+    <script>
+      const clientSel = document.getElementById("clientSel");
+      const form = document.getElementById("searchForm");
+      const q = document.getElementById("q");
+      const resultsEl = document.getElementById("results");
+      const answerEl = document.getElementById("answer");
+      const statsEl = document.getElementById("stats");
+
+      function escapeHtml(value) {
+        return String(value ?? "")
+          .replace(/&/g, "&amp;")
+          .replace(/</g, "&lt;")
+          .replace(/>/g, "&gt;")
+          .replace(/"/g, "&quot;");
+      }
+
+      async function loadStats() {
+        const r = await fetch("/api/stats").then(r => r.json());
+        for (const c of r.clients) {
+          const opt = document.createElement("option");
+          opt.value = c;
+          opt.textContent = c;
+          clientSel.appendChild(opt);
+        }
+        const rerank = r.visual_rerank ? "on" : "off";
+        statsEl.textContent =
+          `${r.total_pages} pages · ${r.clients.length} clients · ` +
+          `retriever=${r.models.retriever} · docvqa=${r.models.docvqa} · visual rerank=${rerank}`;
+      }
+
+      form.addEventListener("submit", async (e) => {
+        e.preventDefault();
+        const query = q.value.trim();
+        if (!query) return;
+        answerEl.innerHTML = "";
+        resultsEl.innerHTML = `<div class="loading">Searching…</div>`;
+        const params = new URLSearchParams({ q: query });
+        if (clientSel.value) params.set("client", clientSel.value);
+        const res = await fetch(`/api/search?${params}`).then(r => r.json());
+        if (res.answer) {
+          answerEl.innerHTML = `
+            <div class="answer-card">
+              <div class="label">Answer (Florence-2-DocVQA)</div>
+              <div class="text">${escapeHtml(res.answer)}</div>
+            </div>`;
+        }
+        if (!res.results.length) {
+          resultsEl.innerHTML = `<div class="empty">No results.</div>`;
+          return;
+        }
+        resultsEl.innerHTML = res.results.map(r => {
+          const rerank = r.scores.rerank == null ? "—" : r.scores.rerank;
+          return `
+          <div class="result">
+            <a href="${escapeHtml(r.page_image)}" target="_blank" rel="noreferrer"><img src="${escapeHtml(r.page_image)}" alt="${escapeHtml(r.title)}"/></a>
+            <div>
+              <div class="title">${escapeHtml(r.title)}</div>
+              <div class="meta"><span class="tag">${escapeHtml(r.client)}</span> ${escapeHtml(r.citation)} · ${escapeHtml(r.publisher)}</div>
+              <div class="meta">${escapeHtml(r.license)} · <a href="${escapeHtml(r.source_url)}" target="_blank" rel="noreferrer">source PDF</a></div>
+              <div class="scores">maxsim=${r.scores.maxsim}   rerank=${rerank}</div>
+              ${r.ocr_snippet ? `<div class="snippet">${escapeHtml(r.ocr_snippet)}</div>` : ""}
+            </div>
+          </div>`;
+        }).join("");
+      });
+
+      loadStats();
+    </script>
+  </body>
+</html>
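
Review note on `search.py`: the ranking step filters `metadata` by tenant and then hands the remaining multivectors to `sie_sdk.scoring.maxsim`. For readers unfamiliar with late interaction, the sketch below shows the standard MaxSim definition (for each query token, take the best dot product against any page token, then sum) with the tenant filter applied first. It is a dependency-free illustration, not the SDK's implementation, which may differ in batching or normalisation. The names `maxsim_score`, `rank_pages`, and the `vectors` field are hypothetical and exist only for this example.

```python
"""Reference MaxSim (late interaction) with a tenant filter applied first."""


def maxsim_score(query_vecs, page_vecs):
    """Sum over query tokens of the best dot product against any page token."""
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, p)) for p in page_vecs)
    return total


def rank_pages(query_vecs, pages, tenant=None, top_k=3):
    """Drop other tenants' pages, then sort the rest by MaxSim score."""
    # Filtering happens before scoring, so out-of-scope pages are never compared.
    corpus = [p for p in pages if not tenant or p["client"] == tenant]
    scored = [(maxsim_score(query_vecs, p["vectors"]), p["page_id"]) for p in corpus]
    scored.sort(reverse=True)
    return [(page_id, score) for score, page_id in scored[:top_k]]


# Toy 2-dimensional token vectors (hypothetical data); real multivectors
# have many more dimensions and many more tokens per page.
pages = [
    {"page_id": "a1", "client": "aerospace", "vectors": [[1.0, 0.0], [0.0, 1.0]]},
    {"page_id": "o1", "client": "ops-eng", "vectors": [[0.5, 0.5]]},
    {"page_id": "o2", "client": "ops-eng", "vectors": [[0.0, 0.5]]},
]
query = [[1.0, 0.0], [0.0, 1.0]]

print(rank_pages(query, pages))                    # unscoped: all tenants compete
print(rank_pages(query, pages, tenant="ops-eng"))  # scoped: aerospace page excluded
```

Because the filter runs before scoring, a scoped query cannot surface another tenant's page even when that page would score highest, which is the property the tenant-leak demo query in `main()` is meant to exercise.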