diff --git a/architecture/resilience.md b/architecture/resilience.md index 9b6d8f1..af29c6d 100644 --- a/architecture/resilience.md +++ b/architecture/resilience.md @@ -16,7 +16,7 @@ Both `AsyncCircuitBreaker` and `CircuitBreaker` expose a read-only `state` property that returns a public `CircuitState` enum (`CLOSED`/`OPEN`/`HALF_OPEN`), importable from `httpware`, for health checks and introspection. The property is a raw read of the stored state: because the OPEN→HALF_OPEN transition is lazy (it fires on the next request after `reset_timeout` elapses, not on a clock tick), `state` continues to report `OPEN` until a request is actually admitted as the probe — reading the property never triggers the transition. -The classic consecutive-failure mode is the default and unchanged. An opt-in time-based failure-rate mode is available: set `failure_rate_threshold` (a float in `(0, 1]`) to switch. In rate mode the circuit opens when the observed failure rate over a rolling `window_seconds` window (default `30.0` s) meets or exceeds the threshold, but only once `minimum_calls` outcomes have been observed in that window (default `20`). The presence of `failure_rate_threshold` is the sole mode switch: when it is set, the breaker is in rate mode and `failure_threshold` is ignored (setting both is not an error — rate mode wins). `window_seconds` and `minimum_calls` are validated at construction in both modes even though they are inert in classic mode, so an invalid value is rejected eagerly regardless of mode. Half-open recovery (`reset_timeout`, `success_threshold`, the single-probe admission) is identical to classic mode. The event names (`circuit.opened`, `circuit.rejected`, `circuit.half_open`, `circuit.closed`) are the same in both modes; in rate mode the `circuit.opened` event carries extra attributes — `failure_rate`, `failure_rate_threshold`, `window_seconds`, `observed_calls` — and its message is `"circuit opened — failure rate threshold reached"`. +The classic consecutive-failure mode is the default and unchanged. An opt-in time-based failure-rate mode is available: set `failure_rate_threshold` (a float in `(0, 1]`) to switch. In rate mode the circuit opens when the observed failure rate over a rolling `window_seconds` window (default `30.0` s) meets or exceeds the threshold, but only once `minimum_calls` outcomes have been observed in that window (default `20`). The presence of `failure_rate_threshold` is the sole mode switch: when it is set, the breaker is in rate mode and `failure_threshold` is ignored (setting both is not an error — rate mode wins). `window_seconds` and `minimum_calls` are validated at construction in both modes even though they are inert in classic mode, so an invalid value is rejected eagerly regardless of mode. Half-open recovery (`reset_timeout`, `success_threshold`, the single-probe admission) is identical to classic mode. The event names (`circuit.opened`, `circuit.rejected`, `circuit.half_open`, `circuit.closed`) are the same in both modes; in rate mode the `circuit.opened` event carries extra attributes — `failure_rate`, `failure_rate_threshold`, `window_seconds`, `observed_calls` — and its message is `"circuit opened — failure rate threshold reached"`. The rate-flavored `circuit.opened` is emitted only on the *initial* trip from CLOSED: a HALF_OPEN **probe failure** re-opens the circuit through the shared half-open path (a probe failure is a distinct event from a rate/threshold trip), so that re-open's `circuit.opened` carries the classic `failure_threshold` / `failures` attributes (with `failures = 1`) and the `"circuit opened — failure threshold reached"` message in *both* modes. This is intentional; the event name is the stable contract, and a consumer keys off the present attribute set rather than assuming a fixed shape. `AsyncTimeout` is an async-only middleware that bounds the total wall-clock for the whole inner pipeline (most importantly across an `AsyncRetry` loop, whose attempts and backoff sleeps `httpx2` cannot bound). It is not a per-call timeout — `httpx2`'s connect/read/write/pool timeouts are the right tool for a single outbound call, and `AsyncTimeout` does not duplicate them. It rejects a non-finite or non-positive `timeout` at construction, and on expiry raises httpware `TimeoutError`. There is no sync `Timeout`: a sync total-deadline cannot interrupt a blocking call mid-flight, and `httpx2` already covers sync per-call timeouts. Sync callers configure `httpx2`'s timeouts directly. diff --git a/planning/README.md b/planning/README.md index 894d844..9067087 100644 --- a/planning/README.md +++ b/planning/README.md @@ -74,6 +74,8 @@ _None._ ### Archived (shipped) +- **[delta-audit-followups](changes/archive/2026-06-16.04-delta-audit-followups/change.md)** (#71, 2026-06-16) — Closed the [2026-06-16 delta audit](audits/2026-06-16-delta-audit.md) Low findings: rate-mode HALF_OPEN probe-failure re-open test + document-as-intended note. Tests + doc only, no release. + - **[circuit-breaker-state](changes/archive/2026-06-16.03-circuit-breaker-state/design.md)** (#70, 2026-06-16) — Read-only `state` property + public `CircuitState` enum on the circuit breaker. Shipped 0.14.0; closed the read-only-state half of the deferred CircuitBreaker introspection item. - **[circuit-breaker-rate-mode](changes/archive/2026-06-16.02-circuit-breaker-rate-mode/design.md)** (#69, 2026-06-16) — Added an opt-in time-based failure-rate trip mode to the circuit breaker (classic stays default). Shipped 0.13.0; closed deferred item "CircuitBreaker v2 (a)". diff --git a/planning/audits/2026-06-16-delta-audit.md b/planning/audits/2026-06-16-delta-audit.md new file mode 100644 index 0000000..21ce51e --- /dev/null +++ b/planning/audits/2026-06-16-delta-audit.md @@ -0,0 +1,61 @@ +# httpware delta audit — 2026-06-16 + +**Status:** complete +**Scope:** the four changes shipped since the 2026-06-14 deep audit (baseline `9297ced`): the custom-decoder guide (#67), per-verb `*_with_response` siblings + the `_prepare_request` refactor (#68, 0.12.0), the time-based rate-mode circuit breaker (#69, 0.13.0), and read-only circuit-breaker `state` + the public `CircuitState` enum (#70, 0.14.0). +**Method:** five adversarial finders fanned out across dimensions — correctness/edge-cases, concurrency, public-API/consistency, docs-accuracy, test-quality — each instructed to refute its own candidates and report only survivors. Single synthesis pass (this report); the controller re-verified the one substantive finding against source. + +## Summary + +A clean delta. Four of five dimensions produced **zero** confirmed findings: every correctness, concurrency, public-API, and docs-accuracy candidate was refuted against the actual code. The only residue is in test coverage. + +- Blockers: 0 +- High: 0 +- Medium: 0 +- Low: 2 +- Nits: 1 + +**Headline:** there is no headline defect. The accumulated 0.12–0.14 surface (per-verb siblings, rate-mode breaker, `state` introspection) is correct, concurrency-safe under the documented model, API-consistent and sync/async-symmetric, and its docs match the code — including the rate-mode defaults, the `(0,1]` range, the `failure_threshold`-ignored / both-set-precedence notes, and the raw-read `state` caveat. The two Low findings are a missing rate-mode test path and a minor observability-attribute inconsistency on one re-open path. + +**Not covered:** static analysis only (source/test/doc reading + reproducer reasoning + targeted `ty`/import checks). No fuzzing, live-network, or dependency-CVE scan. The full suite (704 tests, 100% line coverage) and `ty`/`ruff` were green throughout — line coverage is total, so the test-quality finder targeted *semantic* gaps. + +## Findings + +### Low + +#### 1. Rate-mode HALF_OPEN probe-failure re-open emits classic `circuit.opened` attributes +*(consistency / observability — verified against source)* + +`src/httpware/middleware/resilience/circuit_breaker.py` — `on_failure`, HALF_OPEN branch (`self._open(request, failures=1)`) + +When a circuit is in HALF_OPEN, a single probe failure re-opens it via `_open(request, failures=1)` — the *classic* opener, which emits `circuit.opened` with `{failure_threshold, failures}` and the message `"circuit opened — failure threshold reached"`. This path is shared across modes, so a **rate-mode** breaker that re-opens from a failed probe emits classic-shaped attributes rather than the rate shape (`failure_rate`/`failure_rate_threshold`/`window_seconds`/`observed_calls`) it emits when it first trips from CLOSED. An observability consumer keying off `circuit.opened` attributes sees an inconsistent attribute set from the same breaker depending on which transition opened it, and `failure_threshold` — documented as *ignored* in rate mode — appears in the event. + +Capped at Low: reachable only under the non-default rate knob, affects event *attributes* only (no control-flow effect — the re-open itself is correct), and half-open recovery is documented as mode-independent, so this is arguably acceptable. **Suggested direction:** either (a) document that a probe-failure re-open always carries the probe-failure attribute shape regardless of mode, or (b) emit a transition-specific attribute set for the half-open re-open (e.g. a `reason: "probe_failed"` attribute) so the event is self-describing in both modes. Not a functional bug. + +Note: the window is *not* cleared on this re-open (only on HALF_OPEN→CLOSED). That is harmless — while OPEN/HALF_OPEN no outcomes are recorded, and the window is cleared when the circuit eventually closes (confirmed by the correctness finder). + +#### 2. No rate-mode probe-failure re-open test +*(test coverage — verified)* + +`tests/test_circuit_breaker.py` / `tests/test_circuit_breaker_sync.py` (rate-mode section) + +Every rate-mode recovery test drives HALF_OPEN→CLOSED through a *successful* probe. No rate-mode test exercises a probe *failure* re-opening the circuit. The classic equivalent (`test_probe_failure_reopens_circuit`) exists but runs in consecutive-failure mode, so it never touches the rate-mode opener selection or the window. A regression in the rate-mode re-open path (the very behavior in finding 1) would pass the suite. Line coverage is 100% because the classic tests already cover the shared `_open` line — this is a *semantic* gap. **Suggested direction:** add a rate-mode test (async + sync) that opens via rate, advances past `reset_timeout`, fails the probe, and asserts the circuit re-opens; assert whatever attribute behavior finding 1 settles on. + +### Nits + +#### 3. `_RollingWindow` Hypothesis oracle shares the slot formula with the implementation +*(test quality)* + +`tests/test_rolling_window.py` — `test_totals_match_live_events` + +The property's `expected_live` oracle recomputes `int(t // bucket_width)` and the live-cutoff with the same arithmetic as `_RollingWindow`, so a systematic off-by-one in the window-boundary formula would be replicated in the oracle and the property would still pass. Largely neutralized: the example tests (`test_counts_within_window`, `test_stale_buckets_evicted_by_time`) pin the boundary with hand-computed literals the implementation can't co-vary with, and the property still exercises bucket-collision/eviction interleavings. Nit only. + +## Refuted (recorded so the reasoning is auditable) + +- **Correctness:** `_prepare_request` is byte-equivalent to the old inline logic (kwargs order, marker, sync/async streaming-detector split all preserved); all 12 verb delegations use the correct verb string and kwarg subset; rate trip math has no division-by-zero (guarded by `minimum_calls >= 1`); `_RollingWindow` slot aliasing is sound (live window spans exactly `_BUCKET_COUNT` slots → no live-bucket collision mod 10); `state` is a pure read. +- **Concurrency:** no `await` anywhere in the async transition chain (`_record_outcome`→`_open_rate`→`_enter_open`→`_emit`→`_emit_event` all sync); `_now()` read once inside the critical section; sync rate path fully under `self._lock`; lock-free `state` read is a single atomic enum-reference load (no tear). +- **Public API:** `*_with_response` correctly overload-free (single return shape), `response_model` required, exact sync/async parity; new breaker params identical on both wrappers; `CircuitState` is one object exported from both `__all__`s, alphabetical; `ty` clean. +- **Docs accuracy:** every load-bearing claim across `docs/decoders.md`, `docs/resilience.md`, the pagination recipe, and the three `architecture/` files verified against source — defaults, ranges, precedence, the raw-read caveat, import paths, and the CSV example all match and run. + +## Remediation + +Findings 1 + 2 are naturally one small change: add the rate-mode probe-failure re-open test (sync + async) and, in the same pass, settle finding 1 (document-as-intended or add a self-describing re-open attribute). Lightweight lane. Finding 3 is a Nit — optional. diff --git a/planning/changes/archive/2026-06-16.04-delta-audit-followups/change.md b/planning/changes/archive/2026-06-16.04-delta-audit-followups/change.md new file mode 100644 index 0000000..05b10a6 --- /dev/null +++ b/planning/changes/archive/2026-06-16.04-delta-audit-followups/change.md @@ -0,0 +1,64 @@ +--- +status: shipped +date: 2026-06-16 +slug: delta-audit-followups +supersedes: null +superseded_by: null +pr: 71 +outcome: Closed the 2026-06-16 delta-audit Low findings — rate-mode probe-failure re-open test (async+sync) + document-as-intended note. No source behavior change. +--- + +# Change: Close the 2026-06-16 delta-audit Low findings + +**Lane:** lightweight — tests + a one-paragraph doc clarification, no source +behavior change, no public-API change. + +Closes the actionable findings from the +[2026-06-16 delta audit](../../../audits/2026-06-16-delta-audit.md), which gave +the 0.12–0.14 change cluster a clean bill (0 Blocker/High/Medium) with two Low +items and a Nit. + +## Goal + +Lock and document the one behavioral subtlety the audit surfaced: a rate-mode +circuit that re-opens from a **failed HALF_OPEN probe** emits `circuit.opened` +with the *classic* `failure_threshold`/`failures` attributes (not the rate +shape), because the half-open re-open path is shared and calls the classic +`_open`. Per the audit's resolution (Option 1 — **document as intended**): a +probe-failure re-open is a distinct event from a trip, so the classic shape is +correct in both modes. No code behavior changes. + +## Approach + +- **Low #1 (observability, document-as-intended):** append a sentence to the + CircuitBreaker section of `architecture/resilience.md` stating that the + rate-flavored `circuit.opened` fires only on the initial trip from CLOSED, and + that a HALF_OPEN probe-failure re-open carries the classic + `failure_threshold`/`failures` (`failures = 1`) attributes and message in + *both* modes — intentional, event-name is the stable contract. +- **Low #2 (test gap):** add `test_rate_mode_probe_failure_reopens_with_classic_attributes` + (async + sync mirror) — trip via rate, advance past `reset_timeout`, fail the + probe, assert the circuit re-opens AND the re-open `circuit.opened` carries the + classic shape (`failures == 1`, has `failure_threshold`, NOT `failure_rate`). + This locks the Low #1 behavior so a regression can't pass silently. +- **Nit #3 (Hypothesis oracle coupling):** not actioned — already neutralized by + the hand-computed example tests; recorded in the audit as a Nit. + +No `pyproject.toml` change; no release required (tests + doc clarification only, +no user-facing functionality or API change). The audit report itself is added +under `planning/audits/`. + +## Files + +- `planning/audits/2026-06-16-delta-audit.md` — the findings report (added). +- `architecture/resilience.md` — the document-as-intended paragraph. +- `tests/test_circuit_breaker.py` — async probe-failure re-open test. +- `tests/test_circuit_breaker_sync.py` — sync mirror. + +## Verification + +- [x] Both new tests fail-safe (assert the actual emitted attribute set) and pass: + `just test ...::test_rate_mode_probe_failure_reopens_with_classic_attributes`. +- [x] `just test` — 706 passed, 100% coverage. +- [x] `just lint` — clean. +- [x] No source behavior change (only tests + docs). diff --git a/tests/test_circuit_breaker.py b/tests/test_circuit_breaker.py index c343621..1f94f92 100644 --- a/tests/test_circuit_breaker.py +++ b/tests/test_circuit_breaker.py @@ -669,3 +669,34 @@ async def test_state_half_open_while_probing() -> None: assert breaker.state is CircuitState.HALF_OPEN await ok.get("https://example.test/x") # 2nd consecutive success → CLOSED assert breaker.state is CircuitState.CLOSED + + +async def test_rate_mode_probe_failure_reopens_with_classic_attributes(caplog: pytest.LogCaptureFixture) -> None: + """A failed HALF_OPEN probe re-opens a rate-mode circuit via the shared classic path. + + Documented as intended (2026-06-16 delta audit, Low #1): a probe-failure re-open is a + distinct event from a rate trip, so circuit.opened carries the classic + failure_threshold/failures shape (failures=1) in both modes; the rate attributes appear + only on the initial rate trip from CLOSED. + """ + clock = _Clock() + breaker = AsyncCircuitBreaker( + failure_rate_threshold=0.5, window_seconds=100.0, minimum_calls=2, reset_timeout=5.0, _now=clock + ) + fail = _client(_StatusSequence([500, 500]), breaker=breaker) # total=2, rate=1.0 → trip via rate + for _ in range(2): + with pytest.raises(InternalServerError): + await fail.get("https://example.test/x") + assert breaker.state is CircuitState.OPEN + clock.advance(5.0) # past reset_timeout → next request is admitted as the probe + with caplog.at_level(logging.WARNING, logger="httpware.circuit_breaker"): + probe = _client(_StatusSequence([500]), breaker=breaker) + with pytest.raises(InternalServerError): + await probe.get("https://example.test/x") + assert breaker.state is CircuitState.OPEN # the failed probe re-opened it + opened = [r for r in caplog.records if r.event == "circuit.opened"] # ty: ignore[unresolved-attribute] + assert opened + rec = opened[-1] + assert rec.failures == 1 # ty: ignore[unresolved-attribute] # classic probe-failure shape + assert hasattr(rec, "failure_threshold") + assert not hasattr(rec, "failure_rate") # NOT the rate shape diff --git a/tests/test_circuit_breaker_sync.py b/tests/test_circuit_breaker_sync.py index 2509118..a52e6cb 100644 --- a/tests/test_circuit_breaker_sync.py +++ b/tests/test_circuit_breaker_sync.py @@ -560,3 +560,32 @@ def test_state_half_open_while_probing() -> None: assert breaker.state is CircuitState.HALF_OPEN ok.get("https://example.test/x") # 2nd consecutive success → CLOSED assert breaker.state is CircuitState.CLOSED + + +def test_rate_mode_probe_failure_reopens_with_classic_attributes(caplog: pytest.LogCaptureFixture) -> None: + """Sync mirror: a failed HALF_OPEN probe re-opens a rate-mode circuit via the shared classic path. + + Documented as intended (2026-06-16 delta audit, Low #1): the re-open's circuit.opened carries + the classic failure_threshold/failures shape (failures=1) in both modes. + """ + clock = _Clock() + breaker = CircuitBreaker( + failure_rate_threshold=0.5, window_seconds=100.0, minimum_calls=2, reset_timeout=5.0, _now=clock + ) + fail = _client(_StatusSequence([500, 500]), breaker=breaker) # total=2, rate=1.0 → trip via rate + for _ in range(2): + with pytest.raises(InternalServerError): + fail.get("https://example.test/x") + assert breaker.state is CircuitState.OPEN + clock.advance(5.0) # past reset_timeout → next request is admitted as the probe + with caplog.at_level(logging.WARNING, logger="httpware.circuit_breaker"): + probe = _client(_StatusSequence([500]), breaker=breaker) + with pytest.raises(InternalServerError): + probe.get("https://example.test/x") + assert breaker.state is CircuitState.OPEN # the failed probe re-opened it + opened = [r for r in caplog.records if r.event == "circuit.opened"] # ty: ignore[unresolved-attribute] + assert opened + rec = opened[-1] + assert rec.failures == 1 # ty: ignore[unresolved-attribute] # classic probe-failure shape + assert hasattr(rec, "failure_threshold") + assert not hasattr(rec, "failure_rate") # NOT the rate shape