Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion architecture/resilience.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@

Both `AsyncCircuitBreaker` and `CircuitBreaker` expose a read-only `state` property that returns a public `CircuitState` enum (`CLOSED`/`OPEN`/`HALF_OPEN`), importable from `httpware`, for health checks and introspection. The property is a raw read of the stored state: because the OPEN→HALF_OPEN transition is lazy (it fires on the next request after `reset_timeout` elapses, not on a clock tick), `state` continues to report `OPEN` until a request is actually admitted as the probe — reading the property never triggers the transition.

The classic consecutive-failure mode is the default and unchanged. An opt-in time-based failure-rate mode is available: set `failure_rate_threshold` (a float in `(0, 1]`) to switch. In rate mode the circuit opens when the observed failure rate over a rolling `window_seconds` window (default `30.0` s) meets or exceeds the threshold, but only once `minimum_calls` outcomes have been observed in that window (default `20`). The presence of `failure_rate_threshold` is the sole mode switch: when it is set, the breaker is in rate mode and `failure_threshold` is ignored (setting both is not an error — rate mode wins). `window_seconds` and `minimum_calls` are validated at construction in both modes even though they are inert in classic mode, so an invalid value is rejected eagerly regardless of mode. Half-open recovery (`reset_timeout`, `success_threshold`, the single-probe admission) is identical to classic mode. The event names (`circuit.opened`, `circuit.rejected`, `circuit.half_open`, `circuit.closed`) are the same in both modes; in rate mode the `circuit.opened` event carries extra attributes — `failure_rate`, `failure_rate_threshold`, `window_seconds`, `observed_calls` — and its message is `"circuit opened — failure rate threshold reached"`.
The classic consecutive-failure mode is the default and unchanged. An opt-in time-based failure-rate mode is available: set `failure_rate_threshold` (a float in `(0, 1]`) to switch. In rate mode the circuit opens when the observed failure rate over a rolling `window_seconds` window (default `30.0` s) meets or exceeds the threshold, but only once `minimum_calls` outcomes have been observed in that window (default `20`). The presence of `failure_rate_threshold` is the sole mode switch: when it is set, the breaker is in rate mode and `failure_threshold` is ignored (setting both is not an error — rate mode wins). `window_seconds` and `minimum_calls` are validated at construction in both modes even though they are inert in classic mode, so an invalid value is rejected eagerly regardless of mode. Half-open recovery (`reset_timeout`, `success_threshold`, the single-probe admission) is identical to classic mode. The event names (`circuit.opened`, `circuit.rejected`, `circuit.half_open`, `circuit.closed`) are the same in both modes; in rate mode the `circuit.opened` event carries extra attributes — `failure_rate`, `failure_rate_threshold`, `window_seconds`, `observed_calls` — and its message is `"circuit opened — failure rate threshold reached"`. The rate-flavored `circuit.opened` is emitted only on the *initial* trip from CLOSED: a HALF_OPEN **probe failure** re-opens the circuit through the shared half-open path (a probe failure is a distinct event from a rate/threshold trip), so that re-open's `circuit.opened` carries the classic `failure_threshold` / `failures` attributes (with `failures = 1`) and the `"circuit opened — failure threshold reached"` message in *both* modes. This is intentional; the event name is the stable contract, and a consumer keys off the present attribute set rather than assuming a fixed shape.

`AsyncTimeout` is an async-only middleware that bounds the total wall-clock for the whole inner pipeline (most importantly across an `AsyncRetry` loop, whose attempts and backoff sleeps `httpx2` cannot bound). It is not a per-call timeout — `httpx2`'s connect/read/write/pool timeouts are the right tool for a single outbound call, and `AsyncTimeout` does not duplicate them. It rejects a non-finite or non-positive `timeout` at construction, and on expiry raises httpware `TimeoutError`. There is no sync `Timeout`: a sync total-deadline cannot interrupt a blocking call mid-flight, and `httpx2` already covers sync per-call timeouts. Sync callers configure `httpx2`'s timeouts directly.

Expand Down
2 changes: 2 additions & 0 deletions planning/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,8 @@ _None._

### Archived (shipped)

- **[delta-audit-followups](changes/archive/2026-06-16.04-delta-audit-followups/change.md)** (#71, 2026-06-16) — Closed the [2026-06-16 delta audit](audits/2026-06-16-delta-audit.md) Low findings: rate-mode HALF_OPEN probe-failure re-open test + document-as-intended note. Tests + doc only, no release.

- **[circuit-breaker-state](changes/archive/2026-06-16.03-circuit-breaker-state/design.md)** (#70, 2026-06-16) — Read-only `state` property + public `CircuitState` enum on the circuit breaker. Shipped 0.14.0; closed the read-only-state half of the deferred CircuitBreaker introspection item.

- **[circuit-breaker-rate-mode](changes/archive/2026-06-16.02-circuit-breaker-rate-mode/design.md)** (#69, 2026-06-16) — Added an opt-in time-based failure-rate trip mode to the circuit breaker (classic stays default). Shipped 0.13.0; closed deferred item "CircuitBreaker v2 (a)".
Expand Down
61 changes: 61 additions & 0 deletions planning/audits/2026-06-16-delta-audit.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
# httpware delta audit — 2026-06-16

**Status:** complete
**Scope:** the four changes shipped since the 2026-06-14 deep audit (baseline `9297ced`): the custom-decoder guide (#67), per-verb `*_with_response` siblings + the `_prepare_request` refactor (#68, 0.12.0), the time-based rate-mode circuit breaker (#69, 0.13.0), and read-only circuit-breaker `state` + the public `CircuitState` enum (#70, 0.14.0).
**Method:** five adversarial finders fanned out across dimensions — correctness/edge-cases, concurrency, public-API/consistency, docs-accuracy, test-quality — each instructed to refute its own candidates and report only survivors. Single synthesis pass (this report); the controller re-verified the one substantive finding against source.

## Summary

A clean delta. Four of five dimensions produced **zero** confirmed findings: every correctness, concurrency, public-API, and docs-accuracy candidate was refuted against the actual code. The only residue is in test coverage.

- Blockers: 0
- High: 0
- Medium: 0
- Low: 2
- Nits: 1

**Headline:** there is no headline defect. The accumulated 0.12–0.14 surface (per-verb siblings, rate-mode breaker, `state` introspection) is correct, concurrency-safe under the documented model, API-consistent and sync/async-symmetric, and its docs match the code — including the rate-mode defaults, the `(0,1]` range, the `failure_threshold`-ignored / both-set-precedence notes, and the raw-read `state` caveat. The two Low findings are a missing rate-mode test path and a minor observability-attribute inconsistency on one re-open path.

**Not covered:** static analysis only (source/test/doc reading + reproducer reasoning + targeted `ty`/import checks). No fuzzing, live-network, or dependency-CVE scan. The full suite (704 tests, 100% line coverage) and `ty`/`ruff` were green throughout — line coverage is total, so the test-quality finder targeted *semantic* gaps.

## Findings

### Low

#### 1. Rate-mode HALF_OPEN probe-failure re-open emits classic `circuit.opened` attributes
*(consistency / observability — verified against source)*

`src/httpware/middleware/resilience/circuit_breaker.py` — `on_failure`, HALF_OPEN branch (`self._open(request, failures=1)`)

When a circuit is in HALF_OPEN, a single probe failure re-opens it via `_open(request, failures=1)` — the *classic* opener, which emits `circuit.opened` with `{failure_threshold, failures}` and the message `"circuit opened — failure threshold reached"`. This path is shared across modes, so a **rate-mode** breaker that re-opens from a failed probe emits classic-shaped attributes rather than the rate shape (`failure_rate`/`failure_rate_threshold`/`window_seconds`/`observed_calls`) it emits when it first trips from CLOSED. An observability consumer keying off `circuit.opened` attributes sees an inconsistent attribute set from the same breaker depending on which transition opened it, and `failure_threshold` — documented as *ignored* in rate mode — appears in the event.

Capped at Low: reachable only under the non-default rate knob, affects event *attributes* only (no control-flow effect — the re-open itself is correct), and half-open recovery is documented as mode-independent, so this is arguably acceptable. **Suggested direction:** either (a) document that a probe-failure re-open always carries the probe-failure attribute shape regardless of mode, or (b) emit a transition-specific attribute set for the half-open re-open (e.g. a `reason: "probe_failed"` attribute) so the event is self-describing in both modes. Not a functional bug.

Note: the window is *not* cleared on this re-open (only on HALF_OPEN→CLOSED). That is harmless — while OPEN/HALF_OPEN no outcomes are recorded, and the window is cleared when the circuit eventually closes (confirmed by the correctness finder).

#### 2. No rate-mode probe-failure re-open test
*(test coverage — verified)*

`tests/test_circuit_breaker.py` / `tests/test_circuit_breaker_sync.py` (rate-mode section)

Every rate-mode recovery test drives HALF_OPEN→CLOSED through a *successful* probe. No rate-mode test exercises a probe *failure* re-opening the circuit. The classic equivalent (`test_probe_failure_reopens_circuit`) exists but runs in consecutive-failure mode, so it never touches the rate-mode opener selection or the window. A regression in the rate-mode re-open path (the very behavior in finding 1) would pass the suite. Line coverage is 100% because the classic tests already cover the shared `_open` line — this is a *semantic* gap. **Suggested direction:** add a rate-mode test (async + sync) that opens via rate, advances past `reset_timeout`, fails the probe, and asserts the circuit re-opens; assert whatever attribute behavior finding 1 settles on.

### Nits

#### 3. `_RollingWindow` Hypothesis oracle shares the slot formula with the implementation
*(test quality)*

`tests/test_rolling_window.py` — `test_totals_match_live_events`

The property's `expected_live` oracle recomputes `int(t // bucket_width)` and the live-cutoff with the same arithmetic as `_RollingWindow`, so a systematic off-by-one in the window-boundary formula would be replicated in the oracle and the property would still pass. Largely neutralized: the example tests (`test_counts_within_window`, `test_stale_buckets_evicted_by_time`) pin the boundary with hand-computed literals the implementation can't co-vary with, and the property still exercises bucket-collision/eviction interleavings. Nit only.

## Refuted (recorded so the reasoning is auditable)

- **Correctness:** `_prepare_request` is byte-equivalent to the old inline logic (kwargs order, marker, sync/async streaming-detector split all preserved); all 12 verb delegations use the correct verb string and kwarg subset; rate trip math has no division-by-zero (guarded by `minimum_calls >= 1`); `_RollingWindow` slot aliasing is sound (live window spans exactly `_BUCKET_COUNT` slots → no live-bucket collision mod 10); `state` is a pure read.
- **Concurrency:** no `await` anywhere in the async transition chain (`_record_outcome`→`_open_rate`→`_enter_open`→`_emit`→`_emit_event` all sync); `_now()` read once inside the critical section; sync rate path fully under `self._lock`; lock-free `state` read is a single atomic enum-reference load (no tear).
- **Public API:** `*_with_response` correctly overload-free (single return shape), `response_model` required, exact sync/async parity; new breaker params identical on both wrappers; `CircuitState` is one object exported from both `__all__`s, alphabetical; `ty` clean.
- **Docs accuracy:** every load-bearing claim across `docs/decoders.md`, `docs/resilience.md`, the pagination recipe, and the three `architecture/` files verified against source — defaults, ranges, precedence, the raw-read caveat, import paths, and the CSV example all match and run.

## Remediation

Findings 1 + 2 are naturally one small change: add the rate-mode probe-failure re-open test (sync + async) and, in the same pass, settle finding 1 (document-as-intended or add a self-describing re-open attribute). Lightweight lane. Finding 3 is a Nit — optional.
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
---
status: shipped
date: 2026-06-16
slug: delta-audit-followups
supersedes: null
superseded_by: null
pr: 71
outcome: Closed the 2026-06-16 delta-audit Low findings — rate-mode probe-failure re-open test (async+sync) + document-as-intended note. No source behavior change.
---

# Change: Close the 2026-06-16 delta-audit Low findings

**Lane:** lightweight — tests + a one-paragraph doc clarification, no source
behavior change, no public-API change.

Closes the actionable findings from the
[2026-06-16 delta audit](../../../audits/2026-06-16-delta-audit.md), which gave
the 0.12–0.14 change cluster a clean bill (0 Blocker/High/Medium) with two Low
items and a Nit.

## Goal

Lock and document the one behavioral subtlety the audit surfaced: a rate-mode
circuit that re-opens from a **failed HALF_OPEN probe** emits `circuit.opened`
with the *classic* `failure_threshold`/`failures` attributes (not the rate
shape), because the half-open re-open path is shared and calls the classic
`_open`. Per the audit's resolution (Option 1 — **document as intended**): a
probe-failure re-open is a distinct event from a trip, so the classic shape is
correct in both modes. No code behavior changes.

## Approach

- **Low #1 (observability, document-as-intended):** append a sentence to the
CircuitBreaker section of `architecture/resilience.md` stating that the
rate-flavored `circuit.opened` fires only on the initial trip from CLOSED, and
that a HALF_OPEN probe-failure re-open carries the classic
`failure_threshold`/`failures` (`failures = 1`) attributes and message in
*both* modes — intentional, event-name is the stable contract.
- **Low #2 (test gap):** add `test_rate_mode_probe_failure_reopens_with_classic_attributes`
(async + sync mirror) — trip via rate, advance past `reset_timeout`, fail the
probe, assert the circuit re-opens AND the re-open `circuit.opened` carries the
classic shape (`failures == 1`, has `failure_threshold`, NOT `failure_rate`).
This locks the Low #1 behavior so a regression can't pass silently.
- **Nit #3 (Hypothesis oracle coupling):** not actioned — already neutralized by
the hand-computed example tests; recorded in the audit as a Nit.

No `pyproject.toml` change; no release required (tests + doc clarification only,
no user-facing functionality or API change). The audit report itself is added
under `planning/audits/`.

## Files

- `planning/audits/2026-06-16-delta-audit.md` — the findings report (added).
- `architecture/resilience.md` — the document-as-intended paragraph.
- `tests/test_circuit_breaker.py` — async probe-failure re-open test.
- `tests/test_circuit_breaker_sync.py` — sync mirror.

## Verification

- [x] Both new tests fail-safe (assert the actual emitted attribute set) and pass:
`just test ...::test_rate_mode_probe_failure_reopens_with_classic_attributes`.
- [x] `just test` — 706 passed, 100% coverage.
- [x] `just lint` — clean.
- [x] No source behavior change (only tests + docs).
31 changes: 31 additions & 0 deletions tests/test_circuit_breaker.py
Original file line number Diff line number Diff line change
Expand Up @@ -669,3 +669,34 @@ async def test_state_half_open_while_probing() -> None:
assert breaker.state is CircuitState.HALF_OPEN
await ok.get("https://example.test/x") # 2nd consecutive success → CLOSED
assert breaker.state is CircuitState.CLOSED


async def test_rate_mode_probe_failure_reopens_with_classic_attributes(caplog: pytest.LogCaptureFixture) -> None:
"""A failed HALF_OPEN probe re-opens a rate-mode circuit via the shared classic path.

Documented as intended (2026-06-16 delta audit, Low #1): a probe-failure re-open is a
distinct event from a rate trip, so circuit.opened carries the classic
failure_threshold/failures shape (failures=1) in both modes; the rate attributes appear
only on the initial rate trip from CLOSED.
"""
clock = _Clock()
breaker = AsyncCircuitBreaker(
failure_rate_threshold=0.5, window_seconds=100.0, minimum_calls=2, reset_timeout=5.0, _now=clock
)
fail = _client(_StatusSequence([500, 500]), breaker=breaker) # total=2, rate=1.0 → trip via rate
for _ in range(2):
with pytest.raises(InternalServerError):
await fail.get("https://example.test/x")
assert breaker.state is CircuitState.OPEN
clock.advance(5.0) # past reset_timeout → next request is admitted as the probe
with caplog.at_level(logging.WARNING, logger="httpware.circuit_breaker"):
probe = _client(_StatusSequence([500]), breaker=breaker)
with pytest.raises(InternalServerError):
await probe.get("https://example.test/x")
assert breaker.state is CircuitState.OPEN # the failed probe re-opened it
opened = [r for r in caplog.records if r.event == "circuit.opened"] # ty: ignore[unresolved-attribute]
assert opened
rec = opened[-1]
assert rec.failures == 1 # ty: ignore[unresolved-attribute] # classic probe-failure shape
assert hasattr(rec, "failure_threshold")
assert not hasattr(rec, "failure_rate") # NOT the rate shape
29 changes: 29 additions & 0 deletions tests/test_circuit_breaker_sync.py
Original file line number Diff line number Diff line change
Expand Up @@ -560,3 +560,32 @@ def test_state_half_open_while_probing() -> None:
assert breaker.state is CircuitState.HALF_OPEN
ok.get("https://example.test/x") # 2nd consecutive success → CLOSED
assert breaker.state is CircuitState.CLOSED


def test_rate_mode_probe_failure_reopens_with_classic_attributes(caplog: pytest.LogCaptureFixture) -> None:
"""Sync mirror: a failed HALF_OPEN probe re-opens a rate-mode circuit via the shared classic path.

Documented as intended (2026-06-16 delta audit, Low #1): the re-open's circuit.opened carries
the classic failure_threshold/failures shape (failures=1) in both modes.
"""
clock = _Clock()
breaker = CircuitBreaker(
failure_rate_threshold=0.5, window_seconds=100.0, minimum_calls=2, reset_timeout=5.0, _now=clock
)
fail = _client(_StatusSequence([500, 500]), breaker=breaker) # total=2, rate=1.0 → trip via rate
for _ in range(2):
with pytest.raises(InternalServerError):
fail.get("https://example.test/x")
assert breaker.state is CircuitState.OPEN
clock.advance(5.0) # past reset_timeout → next request is admitted as the probe
with caplog.at_level(logging.WARNING, logger="httpware.circuit_breaker"):
probe = _client(_StatusSequence([500]), breaker=breaker)
with pytest.raises(InternalServerError):
probe.get("https://example.test/x")
assert breaker.state is CircuitState.OPEN # the failed probe re-opened it
opened = [r for r in caplog.records if r.event == "circuit.opened"] # ty: ignore[unresolved-attribute]
assert opened
rec = opened[-1]
assert rec.failures == 1 # ty: ignore[unresolved-attribute] # classic probe-failure shape
assert hasattr(rec, "failure_threshold")
assert not hasattr(rec, "failure_rate") # NOT the rate shape