[EXPERIMENTAL][ADR-044] feat(grpc-uds): pure tonic handlers - eliminate Axum dispatch overhead #3729
Open
gandhipratik203 wants to merge 5 commits into main
…time (ADR-044)

Replace the HTTP/JSON Python→Rust sidecar boundary with a typed protobuf contract over a Unix Domain Socket, as described in ADR-044.

Rust side:
- Add proto/mcp_runtime.proto defining the McpRuntime service (Invoke, InvokeStream, CloseSession, HealthCheck RPCs)
- Add src/grpc.rs implementing McpRuntimeService using tonic; handlers convert proto McpRequest into http::Request and call the existing Axum router directly as a Tower service, with no additional network hop
- Add build.rs to compile the proto via tonic-build at build time
- Gate all new deps behind the grpc-uds Cargo feature flag
- Add MCP_RUST_GRPC_UDS env var to RuntimeConfig (src/config.rs)
- Wire serve_grpc_uds into run() alongside the existing Axum servers

Python side:
- Add mcpgateway/transports/grpc_gen/ with generated pb2 stubs
- Add rust_mcp_runtime_grpc_proxy.py: async gRPC proxy implementing the same ASGI interface as RustMCPRuntimeProxy but using grpc.aio over UDS
- Add experimental_rust_mcp_runtime_grpc_uds config setting
- Wire RustMCPRuntimeGrpcProxy into main.py: selected when MCP_RUST_GRPC_UDS is set; HTTP proxy remains the default

Tests:
- Add 15 unit tests covering header stripping, request construction, POST/GET/DELETE ASGI flows, fallback paths, and 502 error handling

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…uds feature

Extends the Containerfile.lite cargo build logic to conditionally compile the Rust MCP runtime with the grpc-uds Cargo feature when --build-arg ENABLE_RUST_MCP_GRPC_UDS=true is passed. Features are combined so rmcp-upstream-client and grpc-uds can be enabled together.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ture is enabled

tonic-build requires protoc at compile time to generate code from the proto file. Install the official protobuf release binary from GitHub before the cargo build step when ENABLE_RUST_MCP_GRPC_UDS=true.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
… and fix socket setup

- Makefile: propagate ENABLE_RUST_MCP_GRPC_UDS_BUILD through docker-prod-rust-no-cache → container-build, add docker-prod-rust-grpc-uds convenience target, add GRPC_UDS_ARG to container-build
- docker-compose.yml: pass MCP_RUST_GRPC_UDS and EXPERIMENTAL_RUST_MCP_RUNTIME_GRPC_UDS into gateway containers
- docker-entrypoint.sh: unset MCP_RUST_GRPC_UDS when empty (prevents clap empty-string parse error) and mkdir -p the socket directory
- mcp_runtime_pb2_grpc.py: lower GRPC_GENERATED_VERSION to 1.78.0 to match grpcio installed in the image

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ndler calls

Eliminates the proto → http::Request → Tower/Axum → http::Response → proto round-trip that was the source of gRPC-UDS overhead vs HTTP/JSON IPC.

McpRuntimeService now holds AppState instead of Router. The tonic service methods (invoke, invoke_stream, close_session) call rpc_inner, transport_get_inner, and transport_delete_inner directly, passing HeaderMap + Uri + Bytes constructed from the proto fields, with no HTTP encoding on the hot path.

Changes:
- grpc.rs: remove proto_to_http_request, Tower oneshot dispatch; add build_headers/build_uri helpers; serve_grpc_uds takes AppState
- lib.rs: mark _inner fns pub(crate); pass state.clone() to gRPC spawn; fix build_public_router to clone state so it remains available
- Cargo.toml: remove explicit tower optional dep (was redundant via tonic)
- tests/runtime.rs + lib.rs test_config: add missing grpc_uds_path field (pre-existing compile error when running tests with grpc-uds feature)

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
This was referenced Mar 18, 2026
Summary
Closes #3730
Experiment to explore the ideas in ADR-044 using a pure tonic dispatch strategy — companion to #3726 which takes the Axum dispatch approach. Together these two PRs represent two different ways of implementing the same gRPC-over-UDS boundary.
This PR's approach:
- McpRuntimeService holds AppState directly and calls rpc_inner, transport_get_inner, transport_delete_inner without re-encoding to HTTP
- Removes the proto → http::Request → Tower/Axum → http::Response → proto round-trip from every RPC call
- Removes the explicit tower optional dependency (was redundant via tonic's transitive dep)

ADR-044 Goals: What This Experiment Covers
ADR-044 goal coverage for this approach (pure tonic dispatch)
✅ Demonstrated
Typed contract via protobuf
The .proto file is the single source of truth. Python and Rust both generate stubs from it. If a field is added or a method renamed, both sides fail to compile before anything ships. The HTTP/JSON IPC boundary had no schema, so a wrong header name would silently do nothing.
Language-neutral boundary
The McpRuntime proto contract does not care what is on either side. A Go module, a Java module, or a second Rust service could plug into the same contract today without any changes to the Python gateway. The HTTP/JSON IPC was Python-to-Rust only and relied on undocumented internal headers.
Native streaming RPCs
InvokeStream is a proper gRPC server-streaming RPC, not HTTP chunked encoding over a socket. The Python side gets an async iterator, the Rust side yields chunks. Backpressure, framing, and cancellation are handled correctly by construction.
Unix Domain Socket: no TCP overhead
All traffic stays in kernel memory. No TCP handshake, no loopback routing, no port allocation. HTTP/2 multiplexing means multiple concurrent RPCs share one connection rather than opening new connections per request.
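To make the socket-path point concrete, here is a minimal sketch of a request/response round trip over a Unix Domain Socket using only Python's asyncio. It is not the PR's gRPC proxy and involves no HTTP/2 or protobuf; the socket name `runtime.sock` and the `echo:` framing are assumptions made purely for illustration.

```python
import asyncio
import os
import tempfile


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Echo one newline-terminated message back to the caller.
    line = await reader.readline()
    writer.write(b"echo:" + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def round_trip(payload: bytes) -> bytes:
    # The endpoint is a filesystem path, so no TCP port is allocated.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "runtime.sock")  # hypothetical socket name
        server = await asyncio.start_unix_server(handle, path=path)
        async with server:
            reader, writer = await asyncio.open_unix_connection(path)
            writer.write(payload + b"\n")
            await writer.drain()
            reply = await reader.readline()
            writer.close()
            await writer.wait_closed()
        return reply.rstrip(b"\n")


def demo(payload: bytes = b"ping") -> bytes:
    # Synchronous wrapper so the sketch can be called from plain code.
    return asyncio.run(round_trip(payload))
```

Calling `demo(b"ping")` sends one message through the socket file and returns the echoed bytes. A real gRPC channel layers HTTP/2 multiplexing on top of the same kind of socket.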
Clean process boundary — crash isolation
If the Rust sidecar panics, Python keeps running and the gRPC channel reconnects. The proto contract is the explicit, versioned API surface — not an internal HTTP path that could drift silently.
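The reconnect behaviour can be pictured with a small stdlib sketch. A gRPC channel handles reconnection internally, so the helper below is only an illustration of the idea, not the PR's code: `connect_with_retry`, its parameters, and the exponential backoff policy are all assumptions.

```python
import socket
import time
from typing import Callable, Optional


def connect_with_retry(
    path: str,
    attempts: int = 5,
    delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> socket.socket:
    """Connect to a Unix socket, backing off between failed attempts."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except (FileNotFoundError, ConnectionRefusedError) as exc:
            # The sidecar is down or has not created its socket yet.
            sock.close()
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))  # exponential backoff
    raise ConnectionError(f"sidecar unreachable at {path}") from last_error
```

The caller stays alive across failed attempts and only sees an error once every attempt is exhausted, which mirrors the crash-isolation property described above.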
Aligns with the existing plugin gRPC pattern
The external plugin framework in mcpgateway/plugins/framework/external/grpc/ already uses gRPC. The MCP runtime boundary now uses the same pattern: one mental model for module boundaries across the platform.
Independent scaling of modules
The process boundary exists — Rust is a separate binary on a separate socket. But in the current container setup they are co-located. The groundwork is there; exploiting it requires running the sidecar as a separate container or pod.
Structured auth context propagation
The AuthContext proto message is a better contract than the x-contextforge-auth-context header string used in HTTP/JSON IPC. The current implementation encodes it as base64 JSON inside the proto field, which works but does not yet use the proto's ability to carry it as proper typed fields.
❌ Not yet demonstrated
Catalog change subscriptions and session broadcast
ADR-044 specifically calls out streaming RPCs for catalog change notifications and session broadcast patterns. InvokeStream currently handles only SSE relay. The push patterns the ADR envisioned are not implemented.
Multi-language beyond Python and Rust
The value of language-neutral codegen becomes concrete when a third language plugs into the same contract. Right now it is two languages that could have communicated over HTTP just fine. The payoff is visible only when a Go or Java module joins.
Approach difference vs #3726
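| | #3726 (Axum dispatch) | This PR (pure tonic) |
|---|---|---|
| McpRuntimeService holds | Router | AppState |
| Dispatch path | http::Request → Tower oneshot → Axum → http::Response → proto | HeaderMap + Uri + Bytes → rpc_inner() → proto |
| proto_to_http_request | Used | Removed |
| router.clone() per call | Yes | No |

Benchmark (125 users / 60s, same config as #3726)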
+88 RPS and p99 halved (79ms → 37ms) vs the Axum dispatch approach. HTTP/JSON IPC still leads because Axum speaks HTTP natively — the irreducible cost here is protobuf serialization on both sides of the socket, which is unavoidable in gRPC.
Why HTTP/JSON IPC still leads
The remaining gap is structural to gRPC — protobuf serialization on Python ingress and Rust egress is unavoidable. The value of this transport is the typed contract, language-neutral codegen, and native streaming RPC — not raw single-request throughput. ADR-044 explicitly acknowledges this trade-off.
Flow diagrams: HTTP/JSON IPC vs gRPC-UDS pure tonic
HTTP/JSON IPC
gRPC-UDS IPC (pure tonic — this PR)
The old gRPC path had 6 steps with overhead. This one has 4. Removed: http::Request construction, Tower oneshot dispatch, router.clone() per call.
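The difference between the two dispatch styles can be sketched in a few lines. This is a Python analogy rather than the Rust code in grpc.rs: the `McpRequest` fields, the `ROUTES` table, and the handler body are assumptions chosen only to show which intermediate steps disappear.

```python
from dataclasses import dataclass, field


@dataclass
class McpRequest:
    # Hypothetical stand-in for the proto message; field names are assumptions.
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def rpc_inner(headers: dict, uri: str, body: bytes) -> bytes:
    # Stand-in for the shared handler that both dispatch styles end up calling.
    return b"handled:" + uri.encode() + b":" + body


# Router-style dispatch (the #3726 approach): rebuild a request object,
# look up the handler, wrap the result in a response, then unwrap it again.
ROUTES = {("POST", "/rpc"): rpc_inner}


def dispatch_via_router(req: McpRequest) -> bytes:
    http_request = {
        "method": req.method,
        "uri": req.path,
        "headers": dict(req.headers),
        "body": req.body,
    }
    handler = ROUTES[(http_request["method"], http_request["uri"])]
    http_response = {
        "status": 200,
        "body": handler(
            http_request["headers"], http_request["uri"], http_request["body"]
        ),
    }
    return http_response["body"]


# Direct dispatch (this PR's approach): pass headers, URI and body
# straight to the handler with no intermediate request or response object.
def dispatch_direct(req: McpRequest) -> bytes:
    return rpc_inner(dict(req.headers), req.path, req.body)
```

Both functions return the same bytes for the same request. The direct version simply skips the request construction, route lookup, and response unwrapping that the list above names as removed.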