[EXPERIMENTAL][ADR-044] feat(grpc-uds): pure tonic handlers — eliminate Axum dispatch overhead #3729

Open
gandhipratik203 wants to merge 5 commits into main from feat/grpc-uds-pure-tonic

gandhipratik203 (Collaborator) commented Mar 18, 2026

Summary

Closes #3730

Experiment to explore the ideas in ADR-044 using a pure tonic dispatch strategy — companion to #3726 which takes the Axum dispatch approach. Together these two PRs represent two different ways of implementing the same gRPC-over-UDS boundary.

This PR's approach:

  • McpRuntimeService holds AppState directly and calls rpc_inner, transport_get_inner, transport_delete_inner without re-encoding to HTTP
  • Removes the proto→http::Request→Tower/Axum→http::Response→proto round-trip from every RPC call
  • Removes explicit tower optional dependency (was redundant via tonic's transitive dep)
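
To make the difference between the two dispatch shapes concrete, here is a minimal language-neutral model in Python. It is not the Rust code from this PR: `ProtoRequest`, `HttpRequest`, `route`, and the field names are hypothetical stand-ins, and only the name `rpc_inner` is borrowed from the description above. The sketch shows that both paths reach the same inner handler, with the direct path skipping the intermediate request object and router lookup.

```python
from dataclasses import dataclass, field


@dataclass
class ProtoRequest:
    """Stand-in for the proto request message (field names are hypothetical)."""
    path: str
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


@dataclass
class HttpRequest:
    """Stand-in for the intermediate HTTP request built by the Axum-dispatch path."""
    uri: str
    headers: dict[str, str]
    body: bytes


def rpc_inner(headers: dict[str, str], uri: str, body: bytes) -> bytes:
    """Stand-in handler: returns the body tagged with the path it was called for."""
    return uri.encode() + b":" + body


def route(request: HttpRequest) -> bytes:
    """Stand-in router: matches the URI, then calls the same inner handler."""
    handlers = {"/rpc": rpc_inner}
    return handlers[request.uri](request.headers, request.uri, request.body)


def invoke_via_router(msg: ProtoRequest) -> bytes:
    # Axum-dispatch shape: proto -> HTTP request -> router -> handler.
    return route(HttpRequest(uri=msg.path, headers=dict(msg.headers), body=msg.body))


def invoke_direct(msg: ProtoRequest) -> bytes:
    # Pure-tonic shape: proto fields go straight to the inner handler.
    return rpc_inner(msg.headers, msg.path, msg.body)
```

Both functions return the same bytes for the same message; the second simply performs fewer allocations and lookups per call.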

ADR-044 Goals — What This Experiment Covers

Which goals this implementation demonstrates, which are partial, and which remain future work.

ADR-044 goal coverage for this approach (pure tonic dispatch)

✅ Demonstrated

Typed contract via protobuf
The .proto file is the single source of truth. Python and Rust both generate stubs from it. If a field is added or a method renamed, both sides fail to compile before anything ships. The HTTP/JSON IPC boundary had no schema — a wrong header name would silently do nothing.

Language-neutral boundary
The McpRuntime proto contract does not care what is on either side. A Go module, a Java module, or a second Rust service could plug into the same contract today without any changes to the Python gateway. The HTTP/JSON IPC was Python-to-Rust only and relied on undocumented internal headers.

Native streaming RPCs
InvokeStream is a proper gRPC server-streaming RPC — not HTTP chunked encoding over a socket. The Python side gets an async iterator, the Rust side yields chunks. Backpressure, framing, and cancellation are handled correctly by construction.
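
The consumer-side shape of a server-streaming call can be sketched without any gRPC dependency. The example below uses a plain async generator as a stand-in for the stream; `invoke_stream` and `collect` are hypothetical names, and a real `grpc.aio` call object would be obtained from the generated stub rather than from a local generator. It only illustrates pull-based iteration and early termination.

```python
import asyncio
from collections.abc import AsyncIterator


async def invoke_stream(chunks: list[bytes]) -> AsyncIterator[bytes]:
    """Stand-in for a server-streaming call: yields one chunk at a time."""
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield chunk


async def collect(limit: int) -> list[bytes]:
    """Consume the stream with `async for`, stopping early after `limit` chunks."""
    received: list[bytes] = []
    stream = invoke_stream([b"a", b"b", b"c", b"d"])
    async for chunk in stream:
        received.append(chunk)
        if len(received) == limit:
            break  # the consumer stops pulling, so no further chunks are produced
    await stream.aclose()  # release the generator, loosely analogous to cancelling the call
    return received


if __name__ == "__main__":
    print(asyncio.run(collect(2)))
```

Because the producer only advances when the consumer asks for the next item, a slow consumer naturally slows the producer in this model.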

Unix Domain Socket — no TCP overhead
All traffic stays in kernel memory. No TCP handshake, no loopback routing, no port allocation. HTTP/2 multiplexing means multiple concurrent RPCs share one connection rather than opening new connections per request.
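
As a small illustration of the addressing model, the sketch below performs one echo round-trip over a Unix domain socket using only the Python standard library. It is not the gRPC channel from this PR: the socket file name and the helper `uds_echo_once` are made up for the example, and it assumes a Unix-like host. The point is that the endpoint is a filesystem path, so no port is allocated.

```python
import os
import socket
import tempfile
import threading


def uds_echo_once(payload: bytes) -> bytes:
    """Send `payload` over a Unix domain socket and return the echoed reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "runtime.sock")  # hypothetical socket path

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)  # the address is a filesystem path, not host:port
        server.listen(1)

        def serve() -> None:
            conn, _ = server.accept()
            with conn:
                conn.sendall(conn.recv(4096))  # echo a single small message

        worker = threading.Thread(target=serve)
        worker.start()

        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            client.sendall(payload)
            reply = client.recv(4096)

        worker.join()
        server.close()
        return reply


if __name__ == "__main__":
    print(uds_echo_once(b"ping"))
```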

Clean process boundary — crash isolation
If the Rust sidecar panics, Python keeps running and the gRPC channel reconnects. The proto contract is the explicit, versioned API surface — not an internal HTTP path that could drift silently.

Aligns with the existing plugin gRPC pattern
The external plugin framework in mcpgateway/plugins/framework/external/grpc/ already uses gRPC. The MCP runtime boundary now uses the same pattern — one mental model for module boundaries across the platform.


⚠️ Partially demonstrated

Independent scaling of modules
The process boundary exists: Rust is a separate binary on a separate socket. In the current container setup, however, the Python gateway and the Rust sidecar are co-located. The groundwork is there; exploiting it requires running the sidecar as a separate container or pod.

Structured auth context propagation
The AuthContext proto message is a better contract than the x-contextforge-auth-context header string used in HTTP/JSON IPC. The current implementation encodes it as base64 JSON inside the proto field — which works, but does not yet use the proto's ability to carry it as proper typed fields.
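
The gap between the two encodings can be sketched as follows. The dataclasses stand in for proto messages; the field names (`encoded`, `user`, `teams`, `is_admin`) and the use of standard (not URL-safe) base64 are assumptions made for illustration, not the actual `AuthContext` schema.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class AuthContextOpaque:
    """Shape described above: one field carrying base64-encoded JSON."""
    encoded: str


@dataclass
class AuthContextTyped:
    """A possible typed shape; these field names are illustrative only."""
    user: str
    teams: list[str]
    is_admin: bool


def encode_opaque(claims: dict) -> AuthContextOpaque:
    raw = json.dumps(claims, sort_keys=True).encode("utf-8")
    return AuthContextOpaque(encoded=base64.b64encode(raw).decode("ascii"))


def decode_opaque(ctx: AuthContextOpaque) -> dict:
    return json.loads(base64.b64decode(ctx.encoded))


def to_typed(ctx: AuthContextOpaque) -> AuthContextTyped:
    claims = decode_opaque(ctx)
    # With the opaque form, a missing or misspelled key surfaces only here,
    # at runtime, as a KeyError. Typed fields move that check to the schema.
    return AuthContextTyped(
        user=claims["user"],
        teams=list(claims["teams"]),
        is_admin=bool(claims["is_admin"]),
    )
```

With the opaque form the receiver must know the JSON layout out of band; with typed fields the layout is part of the generated contract.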


❌ Not yet demonstrated

Catalog change subscriptions and session broadcast
ADR-044 specifically calls out streaming RPCs for catalog change notifications and session broadcast patterns. InvokeStream currently handles only SSE relay. The push patterns the ADR envisioned are not implemented.

Multi-language beyond Python and Rust
The value of language-neutral codegen becomes concrete when a third language plugs into the same contract. Right now it is two languages that could have communicated over HTTP just fine. The payoff is visible only when a Go or Java module joins.

Approach difference vs #3726

| | #3726 (Axum dispatch) | This PR (pure tonic) |
|---|---|---|
| McpRuntimeService holds | Router | AppState |
| Hot path | proto → http::Request → Tower oneshot → Axum → http::Response → proto | proto → HeaderMap + Uri + Bytes → rpc_inner() → proto |
| proto_to_http_request | exists | removed |
| router.clone() per call | yes | gone |
| Tower dispatch | yes | gone |

Benchmark (125 users / 60s, same config as #3726)

| Metric | Python | HTTP/JSON IPC | gRPC-UDS #3726 (Axum dispatch) | gRPC-UDS this PR (pure tonic) |
|---|---|---|---|---|
| RPS | 512 | 1,930 | 1,690 | 1,778 |
| Avg latency | 167ms | 2.84ms | 11.51ms | 8.71ms |
| p99 | 610ms | 11ms | 79ms | 37ms |
| Failures | 24 | 0 ✅ | 0 ✅ | 0 ✅ |

+88 RPS (1,690 → 1,778, about 5%) and p99 more than halved (79ms → 37ms) versus the Axum dispatch approach. HTTP/JSON IPC still leads because Axum speaks HTTP natively; the remaining cost is protobuf serialization on both sides of the socket, which gRPC cannot avoid.
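
The comparison can be re-derived from the benchmark figures with a few lines of arithmetic. The dictionary keys and the `compare` helper below are hypothetical names introduced for this sketch; the numbers are copied from the benchmark results above.

```python
# Figures copied from the benchmark results (125 users / 60s).
results = {
    "http_json_ipc": {"rps": 1930, "avg_ms": 2.84, "p99_ms": 11},
    "grpc_uds_axum": {"rps": 1690, "avg_ms": 11.51, "p99_ms": 79},
    "grpc_uds_tonic": {"rps": 1778, "avg_ms": 8.71, "p99_ms": 37},
}


def compare(candidate: str, baseline: str) -> dict:
    """Return throughput and latency of `candidate` relative to `baseline`."""
    c, b = results[candidate], results[baseline]
    return {
        "rps_delta": c["rps"] - b["rps"],
        "rps_pct": 100.0 * (c["rps"] - b["rps"]) / b["rps"],
        "avg_ratio": c["avg_ms"] / b["avg_ms"],
        "p99_ratio": c["p99_ms"] / b["p99_ms"],
    }


if __name__ == "__main__":
    for baseline in ("grpc_uds_axum", "http_json_ipc"):
        stats = compare("grpc_uds_tonic", baseline)
        print(baseline, {k: round(v, 3) for k, v in stats.items()})
```

A `p99_ratio` below 0.5 against the Axum dispatch baseline corresponds to the "more than halved" reading, while a negative `rps_delta` against HTTP/JSON IPC reflects the remaining gap.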

Why HTTP/JSON IPC still leads

The remaining gap is structural to gRPC — protobuf serialization on Python ingress and Rust egress is unavoidable. The value of this transport is the typed contract, language-neutral codegen, and native streaming RPC — not raw single-request throughput. ADR-044 explicitly acknowledges this trade-off.

Flow diagrams: HTTP/JSON IPC vs gRPC-UDS pure tonic

HTTP/JSON IPC

Python gateway                  Unix socket              Rust sidecar
─────────────────────────────────────────────────────────────────────

  [incoming MCP request]
        │
        │  Build HTTP request
        │  (headers already exist,
        │   body already bytes)
        │
        ▼
  ┌─────────────┐
  │  httpx      │ ──── raw HTTP bytes ──────────────────► Axum router
  │  AsyncClient│                                              │
  └─────────────┘                                              │
                                                         (Axum speaks
                                                          HTTP natively,
        ◄──────────── raw HTTP response bytes ───────────  no conversion)
        │
  [send response to client]

gRPC-UDS IPC (pure tonic — this PR)

Python gateway                  Unix socket              Rust sidecar
─────────────────────────────────────────────────────────────────────

  [incoming MCP request]
        │
        │  1. Serialize to protobuf          ← still needed
        │     (headers map, body bytes,
        │      auth context, session id...)
        │
        ▼
  ┌─────────────┐
  │  grpc.aio   │ ──── protobuf bytes ──────────────────► tonic server
  │  stub       │                                              │
  └─────────────┘                                    2. Deserialize protobuf
                                                        into HeaderMap
                                                        + Uri + Bytes
                                                              │
                                                              │  ← no http::Request
                                                              │  ← no Tower dispatch
                                                              │  ← no router.clone()
                                                              │
                                                              ▼
                                                         rpc_inner()
                                                         directly
                                                              │
                                                     3. Serialize response
                                                        headers + body
                                                        to protobuf
                                                              │
        ◄──────────────── protobuf bytes ─────────────────────
        │
  4. Deserialize protobuf                             ← still needed
        │
  [send response to client]

The previous gRPC path (Axum dispatch) had 6 steps; this one has 4. Removed: http::Request construction, Tower oneshot dispatch, router.clone() per call.

…time (ADR-044)

Replace the HTTP/JSON Python→Rust sidecar boundary with a typed protobuf
contract over a Unix Domain Socket, as described in ADR-044.

Rust side:
- Add proto/mcp_runtime.proto defining the McpRuntime service (Invoke,
  InvokeStream, CloseSession, HealthCheck RPCs)
- Add src/grpc.rs implementing McpRuntimeService using tonic; handlers
  convert proto McpRequest into http::Request and call the existing Axum
  router directly as a Tower service — no additional network hop
- Add build.rs to compile the proto via tonic-build at build time
- Gate all new deps behind the grpc-uds Cargo feature flag
- Add MCP_RUST_GRPC_UDS env var to RuntimeConfig (src/config.rs)
- Wire serve_grpc_uds into run() alongside the existing Axum servers

Python side:
- Add mcpgateway/transports/grpc_gen/ with generated pb2 stubs
- Add rust_mcp_runtime_grpc_proxy.py: async gRPC proxy implementing the
  same ASGI interface as RustMCPRuntimeProxy but using grpc.aio over UDS
- Add experimental_rust_mcp_runtime_grpc_uds config setting
- Wire RustMCPRuntimeGrpcProxy into main.py: selected when
  MCP_RUST_GRPC_UDS is set; HTTP proxy remains the default

Tests:
- Add 15 unit tests covering header stripping, request construction,
  POST/GET/DELETE ASGI flows, fallback paths, and 502 error handling

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
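
The selection rule described in the commit above (gRPC proxy when MCP_RUST_GRPC_UDS is set, HTTP proxy otherwise) might look roughly like the sketch below. The two classes are empty stand-ins for the real proxies, and the `select_proxy` helper, the constructor argument, and the treatment of an empty value as unset are assumptions rather than the code in main.py.

```python
import os
from collections.abc import Mapping


class RustMCPRuntimeProxy:
    """Stand-in for the existing HTTP proxy (the default)."""


class RustMCPRuntimeGrpcProxy:
    """Stand-in for the gRPC-over-UDS proxy; the constructor argument is assumed."""

    def __init__(self, uds_path: str) -> None:
        self.uds_path = uds_path


def select_proxy(env: Mapping[str, str]):
    """Pick the gRPC proxy only when a non-empty socket path is configured."""
    uds_path = env.get("MCP_RUST_GRPC_UDS", "").strip()
    if uds_path:
        return RustMCPRuntimeGrpcProxy(uds_path)
    return RustMCPRuntimeProxy()  # HTTP proxy remains the default


if __name__ == "__main__":
    print(type(select_proxy(os.environ)).__name__)
```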
…uds feature

Extends the Containerfile.lite cargo build logic to conditionally compile
the Rust MCP runtime with the grpc-uds Cargo feature when
--build-arg ENABLE_RUST_MCP_GRPC_UDS=true is passed. Features are combined
so rmcp-upstream-client and grpc-uds can be enabled together.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ture is enabled

tonic-build requires protoc at compile time to generate code from the
proto file. Install the official protobuf release binary from GitHub
before the cargo build step when ENABLE_RUST_MCP_GRPC_UDS=true.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
… and fix socket setup

- Makefile: propagate ENABLE_RUST_MCP_GRPC_UDS_BUILD through
  docker-prod-rust-no-cache → container-build, add docker-prod-rust-grpc-uds
  convenience target, add GRPC_UDS_ARG to container-build
- docker-compose.yml: pass MCP_RUST_GRPC_UDS and
  EXPERIMENTAL_RUST_MCP_RUNTIME_GRPC_UDS into gateway containers
- docker-entrypoint.sh: unset MCP_RUST_GRPC_UDS when empty (prevents
  clap empty-string parse error) and mkdir -p the socket directory
- mcp_runtime_pb2_grpc.py: lower GRPC_GENERATED_VERSION to 1.78.0 to
  match grpcio installed in the image

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ndler calls

Eliminates the proto→http::Request→Tower/Axum→http::Response→proto
round-trip that was the source of gRPC-UDS overhead vs HTTP/JSON IPC.

McpRuntimeService now holds AppState instead of Router. The tonic
service methods (invoke, invoke_stream, close_session) call rpc_inner,
transport_get_inner, and transport_delete_inner directly, passing
HeaderMap + Uri + Bytes constructed from the proto fields — no HTTP
encoding on the hot path.

Changes:
- grpc.rs: remove proto_to_http_request, Tower oneshot dispatch; add
  build_headers/build_uri helpers; serve_grpc_uds takes AppState
- lib.rs: mark _inner fns pub(crate); pass state.clone() to gRPC spawn;
  fix build_public_router to clone state so it remains available
- Cargo.toml: remove explicit tower optional dep (was redundant via tonic)
- tests/runtime.rs + lib.rs test_config: add missing grpc_uds_path field
  (pre-existing compile error when running tests with grpc-uds feature)

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
gandhipratik203 marked this pull request as ready for review March 18, 2026 19:31
gandhipratik203 changed the title from "feat(grpc-uds): pure tonic handlers — eliminate Axum dispatch overhead (ADR-044)" to "[EXPERIMENTAL][ADR-044] feat(grpc-uds): pure tonic handlers — eliminate Axum dispatch overhead" Mar 18, 2026
gandhipratik203 added labels Mar 19, 2026: experimental (Experimental features, test proposed MCP Specification changes), rust (Rust programming), mcp-protocol (Alignment with MCP protocol or specification), performance (Performance related items)
crivetimihai added the COULD label (P3: Nice-to-have features with minimal impact if left out; included if time permits) Mar 20, 2026
crivetimihai added this to the Release 1.3.0 milestone Mar 20, 2026
Development

Successfully merging this pull request may close these issues.

[FEATURE][RUST]: ADR-044 gRPC-over-UDS module communication boundary — POC

2 participants