Add EdgeZero canary routing (rollout_pct flag + FNV-1a bucket assignment) (PR 19)#727
Conversation
…on' into feature/edgezero-pr14-entry-point-dual-path
…dispatch_unregistered_method_returns_405_at_router_level
Incorporates the round-3 review fixes from PR14 (E2E tests, finalize-header coverage for unregistered methods, allowed_domains documentation) and the broader PR14 refactoring (EC ID rename from synthetic_id, kv_ops removal from handle_auction, runtime_services_for_consent_route consent KV approach, async tokio tests in proxy/orchestrator, PublisherResponse streaming). Conflict resolutions: - Take PR14's refactored core files (endpoints, orchestrator, consent, publisher, proxy, platform/mod, testlight) - Keep PR15's deletion of core/compat.rs and storage/mod.rs (Fastly code stays in adapter, not core) - Add from_fastly_response to adapter compat.rs (needed by EdgeZero path for entry-point finalize-header wrap introduced in PR14) - Remove obsolete FastlyConsentKvStore / open_consent_kv / ConsentKvOps references from platform.rs and app.rs (trait removed in PR14) - Fix orchestrator backend_name call to pass services (PR15 trait signature) - Add minimal backend.rs and storage/mod.rs stubs to core for DEFAULT_FIRST_BYTE_TIMEOUT and storage::kv_store module path - Add tokio as dev-dependency to trusted-server-core for async tests
Conflict resolutions: - Cargo.lock: regenerated after adding futures dep - docs/guide/testing.md: kept Axum adapter test command (PR16) and EC ID rename + specific-test example (PR15) - .cargo/config.toml: kept PR16 version (no WASM default target) because trusted-server-adapter-axum is now a workspace member and tokio/reqwest cannot compile to wasm32-wasip1 - AGENTS.md: updated cargo test instruction to document both required commands now that the workspace contains both WASM-only and native-only crates API breakage from PR15 (handle_publisher_request now returns PublisherResponse): - Add resolve_publisher_response() helper mirroring the Fastly adapter pattern (Buffered / Stream / PassThrough -> Response) - Drop the removed None arg from handle_auction and handle_publisher_request call sites
PR15 merge — modifications to existing files not captured in the merge commit: - Fastly adapter: migrate to EdgeZero router/middleware, rewrite management API, update platform/compat/logging/backend to new EdgeZero abstractions - Core: remove KV-backed consent layer (consent/kv.rs deleted), update auction orchestrator and integrations to new API surface, refactor html_processor, cookies, and error types - Delete stale Fastly KV store test fixtures (counter_store.json, opid_store.json) - Update fastly.toml, trusted-server.toml, and integration test config to match PR16 workspace restructuring: - Add both adapters to workspace members; set default-members=[core, axum] so bare `cargo test` runs natively without WASM target conflicts - Replace test-wasm alias with test-fastly (--workspace --exclude axum --target wasm32-wasip1) and keep test-axum (-p trusted-server-adapter-axum) - Update CI: cargo test-fastly in the Fastly job, cargo test-axum in native job - Axum admin routes (/admin/keys/rotate, /admin/keys/deactivate) return 501 instead of falling through to an error handler - Update CLAUDE.md, AGENTS.md, README, and all guide docs to use new aliases
Resolved conflicts: - .cargo/config.toml: extend test-fastly alias to also exclude trusted-server-adapter-cloudflare - .github/workflows/test.yml: use cargo test-fastly alias in CI (from PR16) and keep test-cloudflare job (from PR17) - CLAUDE.md: keep Cloudflare workspace entry and cargo test-axum alias; merge both adapters' commands - crates/trusted-server-core/src/proxy.rs: use #[tokio::test] + std::time pattern (consistent with PR16 tests) - Cargo.lock: regenerated to reflect merged workspace state
Viceroy internally runs `cargo run --bin trusted-server-adapter-fastly` against the default-run packages. With core + axum as defaults, Cargo fails to find the binary. Restricting default-members to the fastly adapter fixes the lookup. Updated stale comments in CLAUDE.md and .cargo/config.toml to reflect the new setup.
Brings in the default-members fix: restricts workspace default-members to `trusted-server-adapter-fastly` so Viceroy can locate the binary via `cargo run --bin` during `cargo test-fastly`.
…feature/edgezero-pr12-handler-layer-types
…/edgezero-pr13-integration-provider-type-migration
Blocking fixes: - Remove debug_assert! in compat::to_fastly_response that contradicted the documented silent-drop fallback and caused to_fastly_response_with_streaming_body_produces_empty_body to panic in debug builds - Restore streaming dispatch for PublisherResponse::Stream, which was regressed by the PR 11 compat shim. Introduce HandlerOutcome enum so route_request can signal streaming intent back to the adapter. main() handles HandlerOutcome::Streaming via to_fastly_response_skeleton + stream_to_client + stream_publisher_body directly, avoiding the Vec<u8> buffer that delayed TTFB and scaled WASM memory with body size. Non-blocking fixes: - Add to_fastly_response_skeleton for headers-only fastly response conversion - Add geo-401 comment: skip geo lookup for unauthenticated callers - Add audited-callers comment on EdgeBody::Stream arms in compat - Add O(N) / PR 15 comment on bytes.to_vec() calls - Deduplicate auction/publisher status extraction in route_tests via outcome_status helper
Resolve conflicts in main.rs, compat.rs, auction/endpoints.rs, cookies.rs, edge_cookie.rs, migration_guards.rs, publisher.rs, and request_signing/endpoints.rs. Includes main's JA4 debug endpoint, Sourcepoint integration, and Prebid bid param override rules. Removed unused try_build_ec_cookie_value and get_ec_id_from_http_request stubs surfaced by clippy after merge.
StreamingBody in fastly 0.11.12 has no Drop impl — dropping it without calling finish() leaves the chunked HTTP stream with no terminal chunk, causing clients to receive "unexpected EOF during chunk size line". Match on stream_publisher_body result and call finish() in both the success and error paths so the response is always properly terminated.
…er' into feature/edgezero-pr19-cutover-canary
EDGEZERO_LOG_LEVEL is not propagated into the Compute guest by fastly compute serve, so the documented local route-decision validation path never raised the level. Detect the Viceroy environment through the guest-visible FASTLY_HOSTNAME=localhost signal and auto-raise to debug there, keeping production at the safe Info default. The explicit env override is retained for harnesses that populate the guest environment.
max_level was applied only to the inner log_fastly Logger, so the outer fern Dispatch still ran timestamp and format construction for records the child logger later dropped. Set the same level on the Dispatch so below-threshold records are filtered before formatting at production QPS.
Update the migration runbook and implementation plan to describe local route-decision logging via the guest-visible FASTLY_HOSTNAME=localhost signal instead of the EDGEZERO_LOG_LEVEL env var, which the Compute guest never sees. Replace the stale Ok(None) => 100 plan snippet with the fail-safe absent-key arm, correct the stickiness invariant to per-client-IP (NAT and IP changes can re-bucket), switch verification commands to the cargo clippy-fastly alias, and drop the tail/head pipes that masked the real exit status of test, fmt, clippy, build, and docs-format checks.
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Automated review:
Review Summary
Reviewed PR #727 against feature/edgezero-pr19-spin-adapter, focusing on the Fastly canary routing implementation, rollout defaults/failure behavior, logging-level changes, local config, migration runbook, duplicated implementation-plan guidance, CI, and existing review feedback. I did not find any blocking correctness, security, data-loss, authorization, or severe compatibility issues. I left one P2 inline comment on stale plan guidance that now contradicts the implemented tests/helper names and could mislead future agent handoffs.
Findings
P0 / Blockers
None.
P1 / High
None.
P2 / Medium
1 inline comment submitted.
P3 / Low
None.
CI / Existing Reviews
All reported PR checks are passing. Existing review feedback about absent-key fail-safe behavior, per-client-IP stickiness, route-decision observability, logging override scope, stale fail-open snippets, and validation command exit-status masking appears addressed in the runtime code, fastly.toml, the canonical migration runbook, and portions of the implementation-plan document. Locally, cargo test -p trusted-server-adapter-fastly --target wasm32-wasip1 rollout_pct --no-run completed successfully.
aram356
left a comment
There was a problem hiding this comment.
Summary
Re-review against the PR base (feature/edgezero-pr19-spin-adapter). All findings from the prior review rounds are resolved:
- Absent
edgezero_rollout_pctnow fails safe to0(legacy), with the doc table, runbook failure-modes table, and setup precondition updated to match. - The runbook stickiness claim is corrected to "sticky per client IP (not per user)".
- The log-level override docs no longer imply it only raises (it can lower too), and the
FASTLY_HOSTNAMEsignal is documented as Fastly-specific and not to be copied into the axum/spin/cloudflare adapters. .level(max_level)on the fernDispatchnow filters route-decision debug logs out of production (without it, fern's default would have overridden the Logger'sInfo).
Approving. Three non-blocking items remain inline: a test-coverage gap on the log-level detection logic (♻️), the two-per-request config-store reads (🤔, carried), and an EDGEZERO_LOG_LEVEL precedence nit (⛏).
Non-blocking
♻️ refactor
- Log-level detection logic is untested — extract a pure helper and pin the production-safety property (real hostname must stay
Info). (logging.rs:76)
🤔 thinking
- Two config-store reads per request — consolidatable into one read. (
main.rs:328)
⛏ nitpick
- Unparseable
EDGEZERO_LOG_LEVELsuppresses local auto-debug — minor dev-surprise. (logging.rs:71)
CI Status
- fmt: PASS
- clippy: PASS
- rust tests: PASS
- js tests: PASS
Resolve conflicts keeping this branch's full fastly-removal architecture while preserving main's independent improvements: - Cargo.toml / core Cargo.toml: keep tower (axum dep); drop the residual fastly dependency from core (this branch removes it entirely via the EcKvStore platform trait + adapter-local rate limiter, superseding main's allowlist approach). - migration_guards.rs: keep the no-fastly guards; drop main's deferred-EC fastly allowlist tests. - fastly app.rs / main.rs: keep this branch's fastly-in-adapter wiring (ec_kv, rate_limiter, use_finalize_kv, core buffer_publisher_response) and adopt main's full ClientInfo capture into request extensions, which fixes the EdgeZero bot-protection metadata loss for integration request filters (DataDome). - edge_cookie.rs / storage/kv_store.rs: take main's identifier-free consent and EC log messages. - proxy.rs: take main's SVG image-optimizer tests (superset of this branch's). - auction/orchestrator.rs: keep the sequential-fan-out rejection test.
Blocking: - Force manual redirect handling on the Cloudflare fetch path (RequestRedirect::Manual). The Workers runtime defaults to Follow and auto-chases 3xx to any host, bypassing core's per-hop allowed_domains enforcement in proxy_with_redirects (SSRF). Surfaces the 3xx back to core unfollowed, matching the Axum adapter. - Return a fixed 501 from /admin/keys/rotate and /admin/keys/deactivate. Cloudflare's config/secret stores reject writes, so the real handlers surfaced an opaque 500; 501 makes the unsupported-runtime contract explicit, matching Axum. Non-blocking parity/safety: - Cap the buffered upstream response body at 10 MiB (Content-Length pre-check + post-buffer check), mirroring the Fastly adapter, to avoid OOMing the isolate on a large/hostile origin. - Reject image_optimizer and stream_response in the fetch path rather than silently dropping the behavior, honoring the PlatformHttpRequest contract. - Strip hop-by-hop response headers (and Connection-listed tokens) before forwarding, matching the Axum adapter's is_hop_by_hop_response_header. - Derive the S3 SigV4 signing time from the wasm-safe web_time clock so it no longer calls std::time::SystemTime::now(), which panics on wasm32-unknown-unknown (latent once asset proxying lands on Cloudflare). Tests: - Add routes_with_settings/build_router seam to the Cloudflare adapter and assert admin routes return 501 and the tsjs catch-all returns < 500, matching the Axum route-test coverage that would have caught the admin status divergence.
The build script only installed worker-build when none was on PATH, so a
globally installed worker-build 0.8.x was used as-is and rejected the worker
0.7 crate pinned in Cargo.toml ("Unsupported version worker@0.7.x, expected at
least worker@0.8.4"), failing `wrangler dev`. Check the installed major.minor
and reinstall ^0.7 when it is absent or mismatched.
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Summary
👍 Approving. The Fastly canary runtime path now fails safe to legacy, the critical rollout helper branches are covered, and reported CI is green. I left non-blocking inline comments using the CREG style from CONTRIBUTING.md.
Folded follow-ups
🔧 wrench — Configuration guide still describes the old one-key rollout flow — docs/guide/configuration.md:1163
This file is not in the PR diff, so I could not leave an inline suggestion. The guide says edgezero_enabled=true routes traffic through EdgeZero and the examples only set that key. With this PR, absent edgezero_rollout_pct defaults to 0, so following the guide produces 0% EdgeZero. Please update the guide with both config keys, staged values, fallback behavior, and rollback via edgezero_rollout_pct = "0".
📝 memo — Spec drift to reconcile later — docs/superpowers/specs/2026-03-19-edgezero-migration-design.md:1512 still says rollout hashes the request ID, while the implementation and runbook hash client IP. Please align that source doc when updating rollout documentation.
Resolve conflicts from main's crate rename (js→trusted-server-js, openrtb→trusted-server-openrtb, integration-tests→trusted-server-integration-tests), toolchain bump (1.95.0), and the edgezero #269 repin. Key resolutions: - Keep the branch's pinned edgezero rev 38198f9 and fastly 0.11.12. That rev's Body::into_bytes returns Bytes (not Option) and the Fastly 0.11 TLS getters return Option directly, so main's #269 glue (.into_bytes().unwrap_or_default(), .ok().flatten() on TLS getters, oneshot().expect()/?) is reverted to the branch's API shape. Forward repin to fastly 0.12 is deferred to its own PR. - Adopt toolchain 1.95.0 but keep both wasm targets (Cloudflare needs wasm32-unknown-unknown). - Take the platform-neutral EC KV rewrite (ec/kv.rs) over main's Fastly-coupled version; main's changes there were cosmetic. - Carry main's new features onto the branch: tester-cookie endpoints (/_ts/set-tester, /_ts/clear-tester) wired into the Fastly router, and the osano integration. - Fix the Cloudflare adapter's trusted-server-js path after the rename, drop the leftover renamed-away crate dirs, and restore repo-wide node_modules/dist ignores (the old crates/js/.gitignore was removed by the rename). All adapters build, lint, and test green (fastly+core, axum, cloudflare).
Resolve PR19 conflicts against the latest PR18 base. Keep PR18's renamed crate layout/toolchain updates while preserving PR19's Spin workspace member, CI checks, route parity, and worker 0.8 compatibility. Combine Fastly TLS header hardening with PR18's client-info extraction helper, and update Spin/integration-test paths for the trusted-server-js rename.
…ezero-pr18-phase5-verification # Conflicts: # .github/workflows/integration-tests.yml
…ezero-pr18-phase5-verification # Conflicts: # crates/trusted-server-integration-tests/Cargo.lock
…/edgezero-pr19-spin-adapter # Conflicts: # crates/trusted-server-integration-tests/Cargo.lock
…ro-pr19-cutover-canary
The cutover-canary gating in the Fastly adapter main() routes to edgezero_main only when edgezero_enabled=true and read_rollout_pct() covers the request. read_rollout_pct fails safe to 0 (legacy) when the edgezero_rollout_pct key is absent, so the EdgeZero canary fixture must set it explicitly. Without it, main() fell back to legacy_main and the test_ec_lifecycle_fastly TRACE canary saw a proxied 200 instead of the EdgeZero router 405. Set the key to 100 (full cutover) so every request routes to EdgeZero deterministically, independent of client-IP bucketing.
Summary
edgezero_rollout_pct(0–100) to thetrusted_server_configstore so EdgeZero traffic can be shifted incrementally without a deployclient_ipfor deterministic, sticky per-user bucket assignment (0–99); routing predicate isbucket < rollout_pctrollout_pct = 0is the immediate no-deploy rollbackChanges
crates/trusted-server-adapter-fastly/src/main.rsparse_rollout_pct,fnv1a_bucket,canary_routes_to_edgezero,read_rollout_pct; updatedmain()with canary dispatch; 11 unit tests (pinned FNV-1a vectors, distribution, boundary routing)fastly.tomledgezero_rollout_pct = "0"to local Viceroy config store with ops commentdocs/internal/EDGEZERO_MIGRATION.mdCloses
Closes #500
Test plan
cargo test-fastly— 65 + 956 + 21 + 3 tests greencargo test-axum— 31 tests greencargo clippy-fastly && cargo clippy-axum— no warningscargo fmt --all -- --check— cleancd crates/js/lib && npx vitest run— 291 tests greencd crates/js/lib && npm run format— cleancd docs && npm run format— cleancargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1fastly compute servewithedgezero_rollout_pct = "1"and"0"(rollback)Checklist
unwrap()in production code — useexpect("should ...")logmacros (notprintln!)