Add path-based asset backend proxy routing #668

Open
ChristianPavilonis wants to merge 6 commits into main from feature/multi-backend-asset-proxy
Collaborator

@ChristianPavilonis ChristianPavilonis commented Apr 28, 2026

Summary

  • Add configurable [[proxy.asset_routes]] path-prefix routing so selected first-party asset paths can proxy to a different backend origin than publisher.origin_url.
  • Route matching is transparent for normal inbound requests, limited to GET/HEAD, uses longest-prefix-wins, and preserves the full incoming path/query while swapping only the origin.
  • Keep built-in and integration routes ahead of asset routes, and bypass the publisher consent/cookie/rewrite pipeline for matched asset requests by serving them as lean raw pass-through proxies.
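
The matching rules in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `AssetRoute`, `match_asset_route`, and `target_url` are hypothetical names, and the real `ProxyAssetRoute` in settings.rs may be shaped differently.

```rust
/// Hypothetical, simplified stand-in for the PR's `ProxyAssetRoute`.
#[derive(Debug, Clone, PartialEq)]
pub struct AssetRoute {
    /// Path prefix to match, e.g. "/static/".
    pub prefix: String,
    /// Alternate origin without a trailing slash, path, or query.
    pub origin_url: String,
}

/// Return the route with the longest matching prefix, for GET/HEAD only.
pub fn match_asset_route<'a>(
    routes: &'a [AssetRoute],
    method: &str,
    path: &str,
) -> Option<&'a AssetRoute> {
    if !matches!(method, "GET" | "HEAD") {
        return None;
    }
    routes
        .iter()
        .filter(|route| path.starts_with(route.prefix.as_str()))
        .max_by_key(|route| route.prefix.len())
}

/// Swap only the origin: the incoming path and query are kept verbatim.
pub fn target_url(route: &AssetRoute, path: &str, query: Option<&str>) -> String {
    match query {
        Some(q) if !q.is_empty() => format!("{}{}?{}", route.origin_url, path, q),
        _ => format!("{}{}", route.origin_url, path),
    }
}
```

With routes for `/static/` and `/static/img/`, a GET for `/static/img/logo.png?w=100` would select the longer prefix and forward the same path and query to that route's origin.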

Changes

| File | Change |
| --- | --- |
| docs/superpowers/specs/2026-04-28-multi-backend-asset-proxy-design.md | Add the detailed design/spec for path-based multi-backend asset proxy routing. |
| crates/trusted-server-core/src/settings.rs | Add ProxyAssetRoute, proxy.asset_routes config support, asset-route normalization/validation, duplicate-prefix warnings, and longest-prefix matching helpers. |
| crates/trusted-server-core/src/proxy.rs | Add the raw asset proxy handler and target URL / Host header helpers for forwarding matched asset requests to alternate origins. |
| crates/trusted-server-adapter-fastly/src/main.rs | Wire asset-route matching into top-level request routing after explicit routes and before publisher fallback. |
| crates/trusted-server-adapter-fastly/src/route_tests.rs | Add route-precedence and consent-bypass tests for asset-route behavior. |
| trusted-server.toml | Document the new [[proxy.asset_routes]] configuration shape with a commented example. |

Closes

Closes #663

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve
  • Other:

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses tracing macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@ChristianPavilonis ChristianPavilonis marked this pull request as ready for review April 29, 2026 20:12
@aram356 aram356 assigned aram356 and ChristianPavilonis and unassigned aram356 Apr 29, 2026
Collaborator

@aram356 aram356 left a comment

Summary

Path-based asset proxy routing closely follows the spec, with a clean separation from the publisher pipeline and a conservative forwarded-header set. Concerns are concentrated around (1) origin_url validation that lets path/query through silently, (2) missing test coverage for spec-stated invariants — most importantly that asset failures must not silently fall back to publisher.origin_url, and (3) one CLAUDE.md doc-comment compliance gap.

Blocking

🔧 wrench

  • Missing doc comments on ProxyAssetRoute (crates/trusted-server-core/src/settings.rs:336-339)
  • origin_url validation accepts URLs with path/query that are silently dropped at runtime (crates/trusted-server-core/src/settings.rs:740-766)
  • No test that asset-origin failures stop at 502 without falling back to publisher.origin_url (crates/trusted-server-adapter-fastly/src/route_tests.rs, around the asset_routes_* tests) — spec §9 forbids the fallback; the impl is correct but unpinned by tests.

Non-blocking

🤔 thinking

  • Strict-Transport-Security from asset origin forwarded unchanged (crates/trusted-server-core/src/proxy.rs:657-663) — same class of risk as upstream Set-Cookie, which is correctly stripped. Either strip HSTS too or document explicitly.
  • Spec-listed test gaps: HEAD passthrough, non-GET/HEAD bypass, redirect pass-through, query-string-ignored-for-matching. All small additions given existing helpers.
  • String-prefix matching can match non-segment boundaries (prefix = "/static" matches /staticfile.js). Spec is explicit, but every example uses a trailing /. Worth a sentence in the toml comment and field doc.
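
To make the non-segment-boundary point concrete, here is a small sketch contrasting plain string-prefix matching with a segment-aware variant. Both function names are hypothetical; the segment-aware version is only one possible alternative and is not what the spec prescribes.

```rust
/// Plain string-prefix matching, as the spec is described in the review.
pub fn matches_string_prefix(prefix: &str, path: &str) -> bool {
    path.starts_with(prefix)
}

/// Hypothetical segment-aware variant: the match must end at a path
/// segment boundary unless the prefix itself already ends with '/'.
pub fn matches_segment_prefix(prefix: &str, path: &str) -> bool {
    match path.strip_prefix(prefix) {
        Some(rest) => prefix.ends_with('/') || rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}
```

Under plain prefix matching, `prefix = "/static"` matches `/staticfile.js`; the segment-aware variant rejects it while still accepting `/static/app.js`. Writing prefixes with a trailing `/` avoids the ambiguity under either rule.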

♻️ refactor

  • Idiomatic method-set check in route_request (crates/trusted-server-adapter-fastly/src/main.rs:153-156) — replace the match &method { &Method::GET | &Method::HEAD => ... } with matches!(...).then(...).flatten().
  • Test-stub duplication: StaticResponseHttpClient could move into platform::test_support by extending StubHttpClient with push_response_with_headers.
  • Validation split between Proxy::normalize and Proxy::prepare_runtime is inconsistent with the rest of Settings, which uses #[validate] attributes. Either consolidate or comment the rationale.
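
The suggested `matches!(...).then(...).flatten()` shape might look like the sketch below. The `Method` enum and both functions are simplified, hypothetical stand-ins for the real HTTP types and the routing code in main.rs.

```rust
/// Hypothetical stand-in for the HTTP method type used in `route_request`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Get,
    Head,
    Post,
}

/// Hypothetical lookup: first configured prefix that matches the path.
pub fn find_route<'a>(prefixes: &'a [&'a str], path: &str) -> Option<&'a str> {
    prefixes
        .iter()
        .copied()
        .find(|prefix| path.starts_with(*prefix))
}

/// `bool::then` yields `Option<Option<_>>` here, so `flatten` collapses it.
pub fn matched_asset_route<'a>(
    method: Method,
    prefixes: &'a [&'a str],
    path: &str,
) -> Option<&'a str> {
    matches!(method, Method::Get | Method::Head)
        .then(|| find_route(prefixes, path))
        .flatten()
}
```

The closure passed to `then` only runs for GET/HEAD, so the lookup is skipped entirely for other methods.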

🌱 seedling

  • WASM heap pressure for large asset bodies (crates/trusted-server-adapter-fastly/src/platform.rs:223-237: fastly_response_to_platform calls take_body_bytes(); crates/trusted-server-core/src/proxy.rs:657 re-buffers via set_body(Vec<u8>)). Same as PublisherResponse::PassThrough, so not a regression, but this PR specifically targets asset traffic, where bodies are routinely larger than HTML. Track as a streaming pass-through follow-up mirroring PublisherResponse::Stream. (Anchored in the body since platform.rs is unchanged in this PR.)

⛏ nitpick

  • Validation error wording (crates/trusted-server-core/src/settings.rs:716, validate_no_trailing_slash): the spec says "must not include a trailing slash"; the error message says "origin_url must not end with '/'". Aligning the wording reads slightly better in operator-facing errors. The function itself is unchanged in this PR but is reached via the new validate_proxy_origin_url. (Anchored in the body since the line is outside the diff.)
  • Redundant to_ascii_lowercase() on target_url.scheme() — already lowercase from URL parsing. See inline comment.
  • Set-Cookie strip test only verifies a single header is removed; should test multiple. See inline comment.

CI Status

  • fmt: PASS
  • clippy: PASS
  • rust tests: PASS (858 tests, 0 failures locally; CI green)
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser & integration tests: PASS
  • CodeQL / Analyze: PASS

@ChristianPavilonis ChristianPavilonis requested a review from aram356 May 5, 2026 12:43
@ChristianPavilonis
Collaborator Author

I tested proxying and it works; however, proxying to an S3 bucket that requires authentication isn't implemented.

Collaborator

@aram356 aram356 left a comment

Summary

Asset-proxy routing closely follows the spec, every prior blocking comment has been addressed with a paired test, and CI is green. Two blockers remain: (1) path_pattern regex is recompiled on every request despite a working OnceLock pattern ten lines up in the same file, and (2) the spec checked in by this PR explicitly lists path rewrite as out of scope but the PR adds path rewrite. Two follow-up question/scope items, plus a handful of non-blockers around header handling and validation tightness.

Blocking

🔧 wrench

  • path_pattern regex recompiled on every request (crates/trusted-server-core/src/settings.rs:435-447) — Handler ten lines up caches via OnceLock. prepare_runtime already compiles for fail-fast and throws the result away. Hot-path WASM perf hole for the Cloudinary-style example shipped in trusted-server.toml.
  • Spec contradicts implementation in the same PR (docs/superpowers/specs/2026-04-28-multi-backend-asset-proxy-design.md:55-58) — spec says path rewrite and regex matching are out of scope, but the PR adds both. Update the spec or split the rewrite into a follow-up.
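
The caching pattern the first blocker refers to can be sketched with `std::sync::OnceLock`. This is an illustration under assumptions: `CompiledPattern` is a hypothetical stand-in for a compiled regex (it does plain substring matching so the example has no external dependencies), and `Route` is not the PR's actual struct.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts compilations so the effect of caching is observable.
pub static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regex.
#[derive(Debug)]
pub struct CompiledPattern {
    literal: String,
}

impl CompiledPattern {
    fn compile(source: &str) -> Self {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        Self { literal: source.to_string() }
    }

    pub fn is_match(&self, path: &str) -> bool {
        path.contains(self.literal.as_str())
    }
}

/// Hypothetical route holding the pattern source plus a lazily filled cache.
pub struct Route {
    pub path_pattern: String,
    compiled: OnceLock<CompiledPattern>,
}

impl Route {
    pub fn new(path_pattern: &str) -> Self {
        Self { path_pattern: path_pattern.to_string(), compiled: OnceLock::new() }
    }

    /// Compile on first use, then reuse the cached value on every later call.
    pub fn pattern(&self) -> &CompiledPattern {
        self.compiled
            .get_or_init(|| CompiledPattern::compile(&self.path_pattern))
    }
}
```

If `prepare_runtime` already compiles the pattern for fail-fast validation, storing that result in the `OnceLock` at that point would avoid even the first request-time compile.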

❓ question

  • Why is the publisher Host-override change in this PR? (crates/trusted-server-core/src/settings.rs:18-20) — separate publisher-pipeline feature stacked on top of asset-routing. Independent surface, independent rollback risk.

Non-blocking

🤔 thinking

  • validate_host_header_value is too permissive (crates/trusted-server-core/src/settings.rs:898-907) — passes spaces, slashes, query strings. Used both as a literal Host: header and to format a URL.
  • Clear-Site-Data not stripped from asset responses (crates/trusted-server-core/src/proxy.rs:660-661) — same threat class as HSTS / Set-Cookie which are correctly stripped.
  • X-Forwarded-For listed but always empty (crates/trusted-server-core/src/proxy.rs:28) — sanitize_forwarded_headers strips XFF at the edge before routing. Asset CDNs see Trusted Server's IP only.
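
One way to tighten the Host value check is sketched below. `is_valid_host_header_value` is a hypothetical function, not the PR's `validate_host_header_value`; it accepts only `hostname` or `hostname:port` and, as a stated simplification, does not handle IPv6 literals or trailing-dot names.

```rust
/// Hypothetical stricter check for a configured Host override.
/// Accepts `hostname` or `hostname:port`; rejects spaces, slashes,
/// query strings, and anything else outside DNS label characters.
pub fn is_valid_host_header_value(value: &str) -> bool {
    let (host, port) = match value.rsplit_once(':') {
        Some((host, port)) => (host, Some(port)),
        None => (value, None),
    };

    if let Some(port) = port {
        // Require plain digits first: `parse::<u16>` alone would accept "+80".
        let digits_only = !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit());
        if !digits_only || port.parse::<u16>().is_err() {
            return false;
        }
    }

    !host.is_empty()
        && host.len() <= 253
        && host.split('.').all(|label| {
            !label.is_empty()
                && label.len() <= 63
                && !label.starts_with('-')
                && !label.ends_with('-')
                && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        })
}
```

Because the same value is used both as a literal Host header and to format a URL, rejecting `/`, `?`, and whitespace at config time closes both paths at once.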

♻️ refactor

  • Dead-code error path in target_path_for (crates/trusted-server-core/src/settings.rs:459-466) — unreachable .ok_or_else(...) with a misleading error message.
  • from_url_with_first_byte_timeout_and_override_host (crates/trusted-server-core/src/backend.rs:303-318) — 51-character method name; the API is asking for a builder. Out of scope for this PR.

🌱 seedling

  • Backend-name collision via replace(['.', ':'], "_") (crates/trusted-server-core/src/backend.rs:144-149): operator-controlled, so practical risk is near-zero, but a hash or non-replacing escape would be more defensive.
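
A short sketch of the collision and of one possible non-replacing escape. Both function names are hypothetical, and the escape scheme (`__`, `_d`, `_c`) is just an illustrative choice, not something the PR or the reviewer proposes.

```rust
/// Mirrors the sanitization quoted in the review: '.' and ':' both become '_'.
pub fn replacing_backend_name(host_and_port: &str) -> String {
    host_and_port.replace(['.', ':'], "_")
}

/// Hypothetical escape that keeps distinct inputs distinct: '_' is escaped
/// too, so every '_' in the output starts an unambiguous two-character code.
pub fn escaped_backend_name(host_and_port: &str) -> String {
    let mut name = String::with_capacity(host_and_port.len());
    for c in host_and_port.chars() {
        match c {
            '_' => name.push_str("__"),
            '.' => name.push_str("_d"),
            ':' => name.push_str("_c"),
            other => name.push(other),
        }
    }
    name
}
```

With the replacing version, `a.b:443` and `a_b.443` both map to `a_b_443`; the escaped version yields `a_db_c443` and `a__b_d443` respectively.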

⛏ nitpick

  • matched_asset_route computed eagerly (crates/trusted-server-adapter-fastly/src/main.rs:153-155) — only used in the catch-all branch. Move inside the _ => arm.
  • Double Host header in publisher.rs (crates/trusted-server-core/src/publisher.rs:528) — BackendConfig::override_host is already authoritative; the manual set_header is redundant.
  • POST-bypass test could be stronger (crates/trusted-server-adapter-fastly/src/route_tests.rs:399-410) — assert_ne!(OK) is true for any non-200; pin BAD_GATEWAY like the other tests.

CI Status

  • fmt: PASS
  • clippy: PASS
  • rust tests: PASS (875 tests on this worktree, 0 failures)
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser & integration tests: PASS
  • CodeQL / Analyze: PASS

@ChristianPavilonis ChristianPavilonis force-pushed the feature/multi-backend-asset-proxy branch 2 times, most recently from 8c8a51a to b4f13f9 Compare May 11, 2026 17:16
@aram356 aram356 marked this pull request as draft May 26, 2026 15:46
@ChristianPavilonis ChristianPavilonis marked this pull request as ready for review May 27, 2026 20:10
@ChristianPavilonis ChristianPavilonis force-pushed the feature/multi-backend-asset-proxy branch from 08b7892 to 62758de Compare May 27, 2026 20:11
Collaborator

@aram356 aram356 left a comment

Summary

Asset-route routing is solid and every prior blocking comment is addressed with paired tests; CI is green. Since the May 8 review the PR has expanded with four new commits adding S3 SigV4 auth, Fastly Image Optimizer, profile-table transformations, and a new /admin/debug/s3-objects endpoint — ~3,500 net lines on top of the original asset-routing change. Concerns are concentrated around (1) PR scope — three features in one PR with two separate design docs; (2) per-request S3 credential hostcalls on the hot path; (3) a hand-rolled XML parser for the new debug endpoint; and (4) one open question about preflighting S3 HEAD before the IO GET.

Blocking

🔧 wrench

  • Massive scope creep — split the PR. Title is "Add path-based asset backend proxy routing" but the PR ships three features and two design docs (2026-04-28-multi-backend-asset-proxy-design.md, 2026-05-19-asset-s3-auth-fastly-io-design.md). Recommend landing the original #663-scoped asset-routing change (commits a4da6a63, 0c5495f6) here, and splitting S3+IO and the s3-objects debug endpoint into separate PRs. This is the same class of concern as the May 8 question about the publisher origin_host_header bundling — just larger.
  • S3 credentials re-read on every asset request. 6 secret-store hostcalls per IO+S3 happy-path image. See inline comment on crates/trusted-server-core/src/proxy.rs:588.
  • Hand-rolled XML parser for S3 ListObjectsV2. Naive find()-based parsing with no attribute/namespace/CDATA handling. See inline comment on crates/trusted-server-core/src/proxy.rs:1264.

❓ question

  • Why preflight S3 with HEAD before the IO GET? Doubles round-trips on every cache miss. See inline comment on crates/trusted-server-core/src/proxy.rs:715.

Non-blocking

🤔 thinking

  • S3 bucket enumeration via debug.s3_list_endpoint_enabled: there is no audit logging, and an admin-credential compromise would make the whole bucket walkable. Inline on crates/trusted-server-core/src/proxy.rs:937.
  • origin_query_policy precedence is undocumented. Inline on crates/trusted-server-core/src/settings.rs:1181.

♻️ refactor

  • Extract S3 list-objects code from proxy.rs to its own module. Inline on crates/trusted-server-core/src/proxy.rs:877.
  • Inline mark_asset_origin_error_uncacheable. Inline on crates/trusted-server-core/src/proxy.rs:698.
  • unwrap_or_default() inconsistent with ?-propagating sibling. Inline on crates/trusted-server-core/src/proxy.rs:949.

🌱 seedling / 📌 out of scope

  • Cache the SigV4 signing key. Inline on crates/trusted-server-core/src/s3_sigv4.rs:232.
  • Asset-proxy observability follow-up — cache hit rate, latency p50/p99 per route prefix, error rates by status code. Track for a follow-up PR/issue; nothing currently distinguishes asset-proxy responses from publisher responses in metrics/logs.
  • Commit e76a38ed (s3 debug) message violates CLAUDE.md conventions (not descriptive, not sentence case, not imperative). Squash before merge or rebase to a proper message such as "Add authenticated S3 list-objects debug endpoint".

⛏ nitpick

  • Dead if path.is_empty() branch in canonical_uri. Inline on crates/trusted-server-core/src/s3_sigv4.rs:160.
  • Rename should_preflight_s3_origin_for_image_optimizer to should_preflight_s3. Inline on crates/trusted-server-core/src/proxy.rs:705.

CI Status

  • fmt: PASS
  • clippy: PASS
  • rust tests: PASS (1,059 tests, 0 failures locally; CI green)
  • wasm32-wasip1 build: PASS (Fastly adapter)
  • vitest / format-typescript / format-docs / browser & integration tests / CodeQL: PASS

})
}

fn apply_asset_origin_auth(
Collaborator

🔧 wrench — S3 credentials are re-read from the secret store on every asset request, and the IO+S3 happy path pays this cost twice (HEAD preflight + GET).

apply_asset_origin_auth runs 2–3 SecretStore::get_string lookups per call: each is a WASI hostcall (open + try_get + try_plaintext). On the IO+S3 happy path both preflight_s3_origin_for_image_optimizer and the main GET call this helper, totaling 6 hostcalls per image asset. Asset routes are exactly where traffic is high-volume.

Fix: cache loaded credentials in a OnceLock<S3Credentials> keyed by (secret_store, access_key_id, secret_access_key, session_token) inside RuntimeServices (or a module-local cache). AWS keys rotate on the order of days, so per-process caching is safe. As a follow-on, cache the derived per-day signing_key() (4 HMAC ops) as well — see the seedling on s3_sigv4.rs:232.
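
A minimal sketch of the per-process credential cache, assuming a single set of credentials. `read_secret` is a hypothetical stand-in for the secret-store lookup, and the `S3Credentials` fields are guesses at its shape; a real version would need to key the cache by store and secret names if routes use different credentials.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical shape of the loaded credentials.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct S3Credentials {
    pub access_key_id: String,
    pub secret_access_key: String,
    pub session_token: Option<String>,
}

/// Counts simulated secret-store reads so the caching is observable.
pub static SECRET_STORE_READS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for one secret-store hostcall.
fn read_secret(name: &str) -> Result<String, String> {
    SECRET_STORE_READS.fetch_add(1, Ordering::SeqCst);
    Ok(format!("value-of-{name}"))
}

static CACHED_CREDENTIALS: OnceLock<S3Credentials> = OnceLock::new();

/// Load once per process; later calls return the cached value without
/// touching the secret store. Load errors are returned and not cached.
pub fn s3_credentials() -> Result<&'static S3Credentials, String> {
    if let Some(credentials) = CACHED_CREDENTIALS.get() {
        return Ok(credentials);
    }
    let loaded = S3Credentials {
        access_key_id: read_secret("access_key_id")?,
        secret_access_key: read_secret("secret_access_key")?,
        session_token: None,
    };
    Ok(CACHED_CREDENTIALS.get_or_init(|| loaded))
}
```

The explicit `get()` check followed by `get_or_init` keeps the loader fallible without relying on unstable `OnceLock` APIs. The trade-off is that rotated credentials are only picked up when a new process starts.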

Ok(Response::from_status(fastly::http::StatusCode::FOUND)
.with_header(header::LOCATION, &redirect_target)
.with_header(header::CACHE_CONTROL, "no-store, private"))
fn xml_element_bodies<'a>(
Collaborator

🔧 wrench — Hand-rolled XML parser for S3 ListObjectsV2 responses is fragile.

xml_element_bodies does naive find("<Tag>") / find("</Tag>"). It does not handle XML attributes (<Tag xmlns="...">), namespaces, CDATA, comments, self-closing tags, or whitespace inside the opening tag. It works for the current S3 response shape but is brittle to AWS XML format changes, and the test only exercises the happy path.

Fix: depend on quick-xml (WASM-compatible) or another battle-tested XML parser instead of hand-rolling. As a cleanup follow-on, the ~150 lines of S3 list/XML logic cohabit with HTTP-proxy code in a file that is already 4000+ lines — see the refactor comment on line 877.
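
The fragility is easy to reproduce with a find-based extractor. `naive_element_body` below is a hypothetical reconstruction of the approach described, not the PR's `xml_element_bodies`.

```rust
/// Naive extraction by literal `<Tag>` / `</Tag>` search.
pub fn naive_element_body<'a>(xml: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{tag}>");
    let close = format!("</{tag}>");
    let start = xml.find(open.as_str())? + open.len();
    let end = xml[start..].find(close.as_str())? + start;
    Some(&xml[start..end])
}
```

This finds `<Name>b</Name>`, but it returns `None` for an element whose opening tag carries an attribute, such as the `xmlns` on `ListBucketResult`, and it hands back CDATA markup verbatim instead of the text inside it.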

&& (request_method == Method::GET || request_method == Method::HEAD)
}

async fn preflight_s3_origin_for_image_optimizer(
Collaborator

question — Why preflight S3 with HEAD before the IO GET on the happy path? This doubles round-trips for every IO+S3 cache miss.

The code/tests motivate this as "return raw S3 errors before image optimization" (see handle_asset_proxy_request_returns_raw_s3_error_before_image_optimizer). But on the common cache-miss path we now pay one extra signed round-trip on every request to improve UX only on the failure path.

Was this validated with a benchmark or product decision? If not, consider alternatives: only preflight on the first request to a given backend, only when Fastly IO would otherwise mask the error, or behind a per-route opt-in flag.

/// be selected, credentials cannot be read, signing/backend setup fails, the
/// upstream `S3` request fails, or the successful `S3` XML response cannot be
/// parsed/serialized.
pub async fn handle_s3_list_objects_debug(
Collaborator

🤔 thinking: debug.s3_list_endpoint_enabled exposes full S3 bucket enumeration to anyone with admin credentials.

The endpoint is gated by basic auth + the flag, but it accepts prefix="" and max_keys=1000, allowing iterated bucket-wide enumeration via continuation tokens. If admin credentials leak, the entire bucket is walkable. There is also no audit log entry for the listing query: we know auth succeeded, but not what prefix or page was requested.

Suggestion: add a log::info! recording the requested route_prefix, prefix, max_keys, and continuation_token, and document the operational risk in docs/guide/asset-routes.md.


/// Return the effective origin query policy for this asset route.
#[must_use]
pub fn origin_query_policy(&self) -> OriginQueryPolicy {
Collaborator

🤔 thinking — Undocumented precedence: auth.origin_query overrides image_optimizer.origin_query.

An operator who sets image_optimizer.origin_query = "strip" may not realize auth.origin_query = "preserve" would override it (or vice versa). The IO enabled + Preserve = error invariant in prepare_runtime saves us at config time, but the precedence is not called out in the doc comments here or in the spec.

Fix: add a sentence to the method-level rustdoc explaining the priority order, and mirror it in docs/guide/asset-routes.md.
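
A sketch of what the documented precedence could look like. The field names, the `Option` wrapping, and the `Preserve` default are all assumptions made for illustration; only the ordering (auth before image optimizer) comes from the comment above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OriginQueryPolicy {
    Preserve,
    Strip,
}

/// Hypothetical, flattened view of the two per-route settings.
pub struct AssetRoute {
    pub auth_origin_query: Option<OriginQueryPolicy>,
    pub image_optimizer_origin_query: Option<OriginQueryPolicy>,
}

impl AssetRoute {
    /// Return the effective origin query policy for this asset route.
    ///
    /// Precedence, highest first:
    /// 1. `auth.origin_query`, when set
    /// 2. `image_optimizer.origin_query`, when set
    /// 3. `Preserve` (an assumed default for this sketch)
    #[must_use]
    pub fn origin_query_policy(&self) -> OriginQueryPolicy {
        self.auth_origin_query
            .or(self.image_optimizer_origin_query)
            .unwrap_or(OriginQueryPolicy::Preserve)
    }
}
```

Spelling the order out in the rustdoc lets an operator see at a glance which setting wins when both are present.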

}
}

async fn proxy_with_redirects(
fn mark_asset_origin_error_uncacheable(response: &mut Response) {
Collaborator

♻️ refactor — One-line helper wrapping a single set_header call.

mark_asset_origin_error_uncacheable is a four-line function whose body is one set_header. Inline at the two call sites, or rename to something more general like apply_no_store_cache_control if you keep it.

response.set_header(
    header::CACHE_CONTROL,
    HeaderValue::from_static("no-store, private"),
);

let query = parse_s3_list_objects_debug_query(&req)?;
let route = select_s3_list_objects_debug_route(settings, query.route_prefix.as_deref())?;
let target_url = build_s3_list_objects_v2_url(route, &query)?;
let origin_host = target_url.host_str().unwrap_or_default().to_string();
Collaborator

♻️ refactor: unwrap_or_default() here is inconsistent with the ?-propagating sibling.

asset_origin_host_header on line 567 errors when host_str() is None; here we silently default to an empty string. The URL has been validated upstream so both paths are unreachable in practice, but the inconsistency makes audit harder.

Fix: propagate the error like the sibling helper, or extract a shared require_host_str(&url) helper used by both call sites.

mac.finalize().into_bytes().to_vec()
}

fn signing_key(secret_access_key: &str, date_stamp: &str, region: &str) -> Vec<u8> {
Collaborator

🌱 seedling — Cache the per-day signing key.

signing_key() runs 4 HMAC-SHA256 operations per request to derive the signing key. The result only changes when (date_stamp, region, secret_access_key) changes — i.e., daily under normal credential rotation. A OnceLock<(date_stamp, region, key_bytes)> (or a RefCell-based per-process cache) would save those 4 ops on every signed call. Pair this with the credential-caching wrench on proxy.rs:588 to remove most of the per-request signing cost.
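
A per-process cache for the derived key might look like the sketch below. It uses a `Mutex<Option<...>>` rather than a `OnceLock`, since a `OnceLock` cannot be overwritten when the date stamp rolls over. `derive_signing_key` is a hypothetical, non-cryptographic stand-in for the real HMAC-SHA256 chain, used only so the example runs without external crates.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Counts derivations so the caching is observable.
pub static DERIVATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the 4-step HMAC derivation. NOT cryptographic.
fn derive_signing_key(secret_access_key: &str, date_stamp: &str, region: &str) -> Vec<u8> {
    DERIVATIONS.fetch_add(1, Ordering::SeqCst);
    format!("{secret_access_key}/{date_stamp}/{region}").into_bytes()
}

struct CachedKey {
    secret_access_key: String,
    date_stamp: String,
    region: String,
    key: Vec<u8>,
}

static SIGNING_KEY_CACHE: Mutex<Option<CachedKey>> = Mutex::new(None);

/// Reuse the derived key while (secret, date, region) are unchanged;
/// re-derive and replace the cached entry when any of them changes.
pub fn cached_signing_key(secret_access_key: &str, date_stamp: &str, region: &str) -> Vec<u8> {
    let mut cache = SIGNING_KEY_CACHE
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    if let Some(entry) = cache.as_ref() {
        if entry.secret_access_key == secret_access_key
            && entry.date_stamp == date_stamp
            && entry.region == region
        {
            return entry.key.clone();
        }
    }
    let key = derive_signing_key(secret_access_key, date_stamp, region);
    *cache = Some(CachedKey {
        secret_access_key: secret_access_key.to_string(),
        date_stamp: date_stamp.to_string(),
        region: region.to_string(),
        key: key.clone(),
    });
    key
}
```

A single-entry cache is enough when one region and one credential set dominate; several routes with different regions would evict each other, so a small map would fit that case better.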


fn canonical_uri(url: &Url) -> String {
let path = url.path();
if path.is_empty() {
Collaborator

⛏ nitpick: the if path.is_empty() branch is unreachable.

url::Url::path() always returns at least / for absolute URLs (which is the only kind we ever construct here, since Url::parse of a relative URL fails earlier). Either drop the branch or add a comment explaining defensive intent.

);
}

fn should_preflight_s3_origin_for_image_optimizer(
Collaborator

⛏ nitpick: should_preflight_s3_origin_for_image_optimizer reads as a sentence; the function arguments already convey the IO/method preconditions. should_preflight_s3 is sufficient.

Development

Successfully merging this pull request may close these issues.

As publisher i want to support proxying assets from different backend from content backend
