Skip to content

Add Phase 5 cross-adapter verification suite (PR 18)#725

Open
prk-Jr wants to merge 29 commits into
feature/edgezero-pr17-cloudflare-adapterfrom
feature/edgezero-pr18-phase5-verification
Open

Add Phase 5 cross-adapter verification suite (PR 18)#725
prk-Jr wants to merge 29 commits into
feature/edgezero-pr17-cloudflare-adapterfrom
feature/edgezero-pr18-phase5-verification

Conversation

@prk-Jr
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr commented May 20, 2026

Summary

  • Implements the Phase 5 verification gate suite from issue Verification gates #499: route parity, auth parity, cross-adapter behavior, auction error-correlation, HTML golden tests, and Criterion benchmarks
  • Adds a dedicated parity test binary in crates/integration-tests that drives the Axum and Cloudflare adapters with identical requests in-process, proving behavioral equivalence before cutover
  • Establishes CI gates for both the parity suite and a benchmark smoke-run so regressions are caught on every PR

Changes

File Change
crates/trusted-server-adapter-cloudflare/tests/routes.rs 10 route smoke tests + 5 basic-auth parity tests + 4 admin key path tests
crates/trusted-server-adapter-axum/tests/routes.rs 5 basic-auth parity tests + 3 admin key path tests
crates/trusted-server-adapter-cloudflare/Cargo.toml Added base64 dev-dependency
crates/trusted-server-adapter-axum/Cargo.toml Added base64 dev-dependency
crates/integration-tests/Cargo.toml New [[test]] parity binary + adapter path deps + edgezero git deps
crates/integration-tests/tests/parity.rs New — 8 cross-adapter in-process parity tests (Axum vs Cloudflare)
crates/trusted-server-core/src/platform/http.rs PlatformResponse::backend_name unit tests (error-correlation, pre-EdgeZero #213)
crates/trusted-server-core/src/html_processor.rs 4 golden regression tests: injection position, URL rewriting, no double-inject, size bounds
crates/trusted-server-core/Cargo.toml Added [[bench]] html_processor_bench entry
crates/trusted-server-core/benches/html_processor_bench.rs New — Criterion benchmarks for 10 KB and 100 KB HTML processing
.github/workflows/test.yml New test-parity CI job + benchmark smoke step in test-axum job
docs/superpowers/plans/2026-05-20-pr18-phase5-verification.md Implementation plan

Closes

Closes #499

Test plan

  • cargo fmt --all -- --check
  • cargo clippy-axum
  • cargo test-axum (16 tests pass)
  • cargo test-cloudflare (22 tests pass)
  • cargo test --manifest-path crates/integration-tests/Cargo.toml --test parity (8 tests pass)
  • cargo test --lib -p trusted-server-core (956 tests pass)
  • cargo bench -p trusted-server-core --bench html_processor_bench -- --test (smoke passes)
  • cargo test-fastly (Viceroy-based — not run locally; covered by existing CI job)
  • JS tests / JS format / Docs format (no JS or docs changes)

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses tracing macros (not println!)
  • New code has tests
  • No secrets or credentials committed

prk-Jr added 15 commits May 20, 2026 20:32
Adds 10 tests to tests/routes.rs covering every explicitly registered
route plus the tsjs catch-all wildcard. Assertions are scoped to routing
only (not 404) for handlers that require live settings or outbound
connections, matching the pattern established by the existing auction test.
Verifies that /admin/* routes return 401 without credentials, include
a WWW-Authenticate: Basic realm=... header, and reject wrong credentials;
also confirms /.well-known and /auction are not gated by admin auth.
…enchmarks

- Add golden_script_tag_injected_at_head_start: verifies script tag is
  the first child of <head> with nothing between the opening tag and the
  injected <script>.
- Add golden_url_rewriting_replaces_origin_in_href: verifies origin host
  is fully replaced by proxy host in href/src attributes.
- Add golden_integration_script_is_not_double_injected: verifies the
  /static/tsjs= script tag appears exactly once.
- Add response_size_does_not_grow_disproportionately: verifies processed
  HTML stays within 2× of input size to catch buffer/double-processing bugs.
- Add Criterion benchmark (html_processor_bench) for process_chunk at
  10 KB and 100 KB payload sizes.
@prk-Jr prk-Jr self-assigned this May 20, 2026
prk-Jr added 13 commits May 20, 2026 21:43
The build script writes trusted-server-out.toml to ../../target/ relative
to crates/trusted-server-core/. When the test-parity CI job builds this
crate as a dependency from crates/integration-tests/ (workspace-excluded),
the workspace-root target/ directory may not yet exist, causing a panic.

Add fs::create_dir_all for the parent path before the write to handle
this case robustly.
The renamed tests duplicated coverage already provided by
admin_route_with_wrong_credentials_returns_401. Auth middleware rejects
any wrong credentials with 401 regardless of body content, so the extra
variants added no unique signal.
The previous comment described the wrong divergence (authenticated path).
For unauthenticated requests both adapters return 401. Add the missing
assert_eq!(axum_status, 401) and assert_eq!(axum_status, cf_status) so
the parity claim is actually verified for both adapters.
crates/integration-tests is workspace-excluded so cargo clippy --workspace
is blind to it. Add an explicit step so lint regressions in parity.rs
are caught on every PR.
2.0 was a magic number. Named constant with comment makes the bound
self-documenting: 2× covers injected script tag plus URL rewrites.
The setup-rust-toolchain action does not guarantee clippy is installed
when restoring a shared cache. Explicitly request the component so the
Clippy (parity test crate) step can find cargo-clippy.
Matches the convention used by the Axum adapter tests and parity tests.
Single-threaded tokio can miss races in middleware that spawns tasks.
Matches the five first-party route tests already present in the
Cloudflare adapter test suite. A silently removed route in the Axum
adapter now fails the test run instead of going undetected.
- Assert Axum also returns WWW-Authenticate header on 401 (was CF-only)
- Add admin_deactivate_unauthenticated_parity covering the deactivate path
- Rename cookie_behavior_note → publisher_proxy_fallback_parity (name
  now reflects what the test actually verifies)
- Fix expect("collect body") → expect("should collect body") per style guide
Adds inline comment so future maintainers know why the version is pinned
with `=` rather than a range constraint.
@prk-Jr prk-Jr requested a review from aram356 May 20, 2026 17:15
@prk-Jr prk-Jr requested a review from ChristianPavilonis May 20, 2026 17:15
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified the three issues locally and opened stacked fix PR #739 with test/CI-only changes.

Blocking summary:

  • route smoke tests need to prove explicit registrations, not just non-404 responses behind wildcard fallbacks
  • cross-adapter parity tests need to compare critical header values, not just presence
  • CI should run clippy for the integration test crate with --all-targets so the test targets are actually linted

Comment thread crates/trusted-server-adapter-cloudflare/tests/routes.rs Outdated
Comment thread crates/integration-tests/tests/parity.rs Outdated
Comment thread crates/integration-tests/tests/parity.rs Outdated
Comment thread .github/workflows/test.yml Outdated
Route smoke tests — false positives via wildcard fallback:
- Add registered_routes() helper using RouterService::routes() for introspection
- Replace 9 individual assert_ne!(404) tests with all_explicit_routes_are_registered()
  which checks the route table directly; a removed explicit route now fails even
  when a wildcard fallback would have returned non-404

Cross-adapter parity tests — value equality not just presence:
- admin_rotate_unauthenticated_parity: extract both WWW-Authenticate header values
  and assert they are equal; assert the shared value starts with Basic
- admin_deactivate_unauthenticated_parity: same fix
- geo_header_parity_on_all_responses: compare X-Geo-Info-Available values across
  adapters rather than only checking presence; catches true vs false divergence

CI clippy:
- Add --all-targets to the integration-tests clippy invocation so test targets
  (tests/parity.rs, tests/routes.rs) are actually linted
@prk-Jr prk-Jr linked an issue May 28, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Phase 5 verification adds route smoke tests, cross-adapter parity tests, HTML golden tests, error-correlation unit tests, Criterion benchmarks, and CI gates. The architecture is right and the tests cover important contracts, but several assertions are weaker than they read — they pass against the current code yet would also pass against several plausible regressions the suite is supposed to catch. CI is green across all 11 checks; findings below address test rigor, not breakage.

A prior review by ChristianPavilonis (CHANGES_REQUESTED) flagged three issues. A stacked fix PR (#739) addresses them but is not merged into this branch yet, so those findings are still live. I confirm all three and add four more.

Blocking

🔧 wrench

  • Route smoke tests pass when explicit registration is removed — both adapters end their router with /{*rest} catch-all GET/POST. Smoke tests asserting != 404 cannot distinguish "explicit handler ran" from "wildcard fallback swallowed the request". Inline detail on cloudflare and axum sides. Same root cause for the basic-auth tests on admin routes: AuthMiddleware short-circuits with 401 regardless of whether the explicit handler is registered, so removing .post("/admin/keys/rotate", ...) would not fail any test in this PR.

  • Parity tests check presence, not value equality — three places use contains_key(...) where parity needs assert_eq!(axum_value, cf_value, ...). Inline detail on geo_header_parity_on_all_responses and the admin 401 parity tests.

  • CI clippy gate is a no-opcargo clippy --manifest-path crates/integration-tests/Cargo.toml -- -D warnings misses every [[test]] target (no --all-targets), and even with --all-targets the crate has no [lints] section so workspace lints (unwrap_used = deny, panic = deny) do not apply. The crate is workspace-excluded so lints.workspace = true may not resolve — verify or inline.

  • auth_middleware_runs_in_chain_for_protected_routes cannot prove what its name claims — the assertions hold even if AuthMiddleware is removed from the chain entirely. Inline detail at cloudflare/tests/routes.rs:43.

Non-blocking

🤔 thinking

  • Tokio exact pin is redundant with the lockfile and silently drifts=1.52.3 adds nothing the workspace-excluded Cargo.lock doesn't already enforce, but it WILL break when workspace tokio bumps and nobody remembers to edit this line.

  • MAX_GROWTH_FACTOR = 2.0 is too loose — observed growth is likely ≤1.05× for the test HTML; a regression that doubles script injection would still pass.

  • Discovery JSON parity is is_some == is_some — both adapters could return completely different JSON and the test passes.

♻️ refactor

  • Bench should add no-rewrite and rewrite-dense baselines to separate fixed pipeline overhead from rewrite cost.

  • Parity helpers rebuild the full router per request — fine now; will dominate test time as the suite grows. Consider a OnceCell<Arc<Router>> per adapter.

📌 out of scope

  • edgezero git rev is hardcoded in integration-tests/Cargo.toml separately from [workspace.dependencies]. Worth a follow-up CI check or docs note for the dual-update.

🌱 seedling

  • Parity coverage gaps worth a follow-up issue: response-body equality for /.well-known/trusted-server.json when both adapters succeed; Set-Cookie attribute parity (HttpOnly, Secure, SameSite, Max-Age); Content-Type header parity on success responses. Today's suite covers status + a couple headers; full parity needs the rest.

⛏ nitpick

  • axum_post and cf_post helpers are asymmetric — chained .expect() vs oneshot() patterns. Optional symmetrize.

CI Status

  • cargo fmt: PASS
  • cargo test (axum native): PASS
  • cargo test (cross-adapter parity): PASS
  • cargo test (workspace): PASS
  • cargo check (cloudflare native + wasm32-unknown-unknown): PASS
  • browser integration tests: PASS
  • integration tests: PASS
  • vitest: PASS
  • format-docs / format-typescript: PASS

All gates green. The findings above are about what the gates would catch next, not what's broken now.

.body(edgezero_core::body::Body::from(r#"{"adUnits":[]}"#))
.expect("should build request");
let resp = router.oneshot(req).await;
assert_ne!(resp.status().as_u16(), 404, "/auction must be routed");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Route smoke test passes even when the explicit route registration is removed.

Both adapters register a /{*rest} catch-all GET/POST at the end of RouterService::builder() (see cloudflare/src/app.rs:366-367). With this asserting only status != 404, removing the explicit .post("/auction", auction_handler) registration would still pass: the request falls through to post_fallbackhandle_publisher_request → 5xx in CI (no origin), which is still != 404. The same false-positive applies to every smoke test in this file that asserts != 404 (verify_signature, admin/keys/, first_party/, /.well-known/trusted-server.json).

Suggested fix: Either introspect the router's registered routes directly, configure settings so the explicit handler returns a distinguishable status (e.g., 200) while the fallback returns 5xx, or inject a per-handler marker header to prove which handler actually ran.

This matches the prior reviewer's finding; PR #739 stacks a fix but is not merged into this branch.

resp.status().as_u16(),
404,
"/first-party/proxy must be routed"
);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Same wildcard-fallback false positive as the Cloudflare smoke tests.

Axum's router registers /{*rest} at lines 365-366 of src/app.rs. If the explicit .get("/first-party/proxy", fp_proxy_handler) registration were removed, this test still passes because the catch-all routes to get_fallbackhandle_publisher_request, returning 5xx (still != 404).

Applies to every != 404 smoke test in this file. Same fix as the Cloudflare side.

assert!(
cf_headers.contains_key("x-geo-info-available"),
"Cloudflare: {method} {path} (status={cf_status}) must have X-Geo-Info-Available"
);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Header presence check is too weak for a parity assertion.

The module-level contract is that critical header values match across adapters. This test only proves both adapters return some X-Geo-Info-Available header. If Axum returns true and Cloudflare returns false, the test still passes — exactly the kind of regression a parity suite must catch.

Suggested fix:

let axum_value = axum_headers.get("x-geo-info-available").and_then(|v| v.to_str().ok());
let cf_value = cf_headers.get("x-geo-info-available").and_then(|v| v.to_str().ok());
assert_eq!(
    axum_value, cf_value,
    "X-Geo-Info-Available value must match across adapters for {method} {path}"
);

Matches the prior reviewer's finding; PR #739 has the fix but is not merged here.

assert!(
cf_www_auth.starts_with("Basic realm="),
"Cloudflare 401 WWW-Authenticate must be Basic scheme: {cf_www_auth:?}"
);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrenchWWW-Authenticate parity is checked unevenly.

Axum side asserts presence only (line 219); Cloudflare side asserts the value starts with "Basic realm=" (line 228). A regression where both adapters return WWW-Authenticate but with different realms (e.g., Basic realm="admin" vs Basic realm="foo") or different schemes (Basic vs Bearer) would not be detected as a parity failure.

Suggested fix: Extract both values, assert axum_www_auth == cf_www_auth to enforce strict parity, OR explicitly document the non-parity dimensions in a comment.

Same issue at admin_deactivate_unauthenticated_parity (both sides presence-only).

Comment thread .github/workflows/test.yml Outdated
run: cargo test --manifest-path crates/integration-tests/Cargo.toml --test parity

- name: Clippy (parity test crate)
run: cargo clippy --manifest-path crates/integration-tests/Cargo.toml -- -D warnings
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Clippy gate runs against nothing.

crates/integration-tests is a publish = false package whose only targets are [[test]] binaries — no [lib], no [[bin]]. Without --all-targets, cargo clippy lints nothing meaningful, so tests/parity.rs and tests/integration.rs are not actually gated.

There is also a second issue: crates/integration-tests/Cargo.toml has no [lints] section. Because the crate is workspace-excluded (root [workspace].exclude), it does not inherit the workspace's unwrap_used = deny / panic = deny / print_stdout = warn lints even with --all-targets. Both gaps must be closed for the gate to mean anything.

Suggested fix:

run: cargo clippy --manifest-path crates/integration-tests/Cargo.toml --all-targets -- -D warnings

Additionally, add [lints] to crates/integration-tests/Cargo.toml (inline the workspace lints — lints.workspace = true may not resolve in an excluded crate).

The prior reviewer flagged the --all-targets half; the [lints] inheritance gap is additional.

use std::sync::Arc;

// 2× accounts for the injected tsjs script tag plus URL attribute rewrites.
const MAX_GROWTH_FACTOR: f64 = 2.0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingMAX_GROWTH_FACTOR = 2.0 is too loose to catch real regressions.

Typical growth on the included html_processor.test.html is probably ≤1.05× (one injected script tag + a few URL rewrites). A regression that doubles script-tag injection or duplicates URL rewriting would land around 1.02-1.10× — still well under 2× — and this test would pass.

Suggested fix: Either tighten the bound to a value closer to observed real growth (e.g., 1.3×), or assert an absolute byte budget tied to the known script-tag length: assert!(output_size <= input_size + EXPECTED_INJECTION_BYTES + URL_REWRITE_DELTA_BYTES).

cf_json.is_some(),
"/.well-known/trusted-server.json body JSON-parsability must match across adapters \
(axum_status={axum_status} cf_status={cf_status})"
);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingis_some parity is weaker than the parity goal.

If both adapters return JSON, this passes — even if the JSON bodies are completely different (one {"error":...}, one {"keys":[]}). The intent is parity of the discovery payload.

Suggested fix: Either compare top-level key sets (when both parse as JSON objects) or, if you control test settings, set up signing keys so both adapters return the success payload and assert payload equality. The current assertion only catches the case where one adapter returns JSON and the other returns HTML/text, which is unlikely to regress.

fn bench_html_processor(c: &mut Criterion) {
let mut group = c.benchmark_group("html_processor");

for size_kb in [10usize, 100] {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactor — Add baselines that isolate fixed vs variable cost.

Current bench measures one input shape. Two additional inputs help regressions stay visible:

  1. HTML with zero occurrences of origin_host — measures fixed pipeline overhead (rewriter init, lol_html parse).
  2. HTML with rewrite-dense content (many origin-host URLs per KB) — measures rewriting cost per occurrence.

A bug that adds per-element overhead or duplicates rewriter passes shows up clearly in the second baseline; one that adds fixed init cost shows up in the first.


/// Send a GET request to the Axum adapter and return (status, headers).
async fn axum_get(uri: &str) -> (u16, HeaderMap) {
let mut svc = EdgeZeroAxumService::new(AxumApp::routes());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactor — Helpers rebuild the full service per request.

Every axum_get / cf_get / axum_post / cf_post invocation calls AxumApp::routes() / CloudflareApp::routes(), which goes through the full settings + integration-registry boot path each time. With 8 tests × ~2 requests each, that's ~16+ router builds.

Not blocking now, but as the parity suite grows this will dominate test time. Consider a OnceCell<Arc<Router>> per adapter, scoped to the test binary.

}

/// Send a POST request to the Cloudflare adapter and return (status, headers, body bytes).
async fn cf_post(uri: &str, body: &str) -> (u16, HeaderMap, bytes::Bytes) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpickcf_post and axum_post are asymmetric.

axum_post unwraps three layers: .ready().await.expect().call().await.expect() then body collect.
cf_post (here) uses .oneshot() + into_bytes() synchronously.

The asymmetry isn't wrong, but reading the two paths side-by-side requires the reader to mentally translate. A pair of small helper traits or thin wrapper functions could symmetrize the call sites. Optional cleanup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Verification gates

3 participants