feat(admission): platform-API gate for per-agent quota / cost limits (closes #201) by initializ-mk · Pull Request #203 · initializ/forge

initializ-mk · 2026-06-27T21:21:02Z

Summary

Adds a pre-dispatch middleware that calls a platform admission API once per agent (cached 5s) to decide whether to admit each new tasks/send. Distinct from auth (HTTP 401) and from the per-IP rate limiter (HTTP 429). Off by default; engaged only when both FORGE_ADMISSION_URL and FORGE_PLATFORM_TOKEN are set.
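
The both-or-nothing engagement rule can be sketched roughly as below. This is an illustration only, not the code in admission_loader.go: `resolveAdmission`, `admissionConfig`, and the warning text are hypothetical names invented for this example; only the two environment variable names come from this PR.

```go
package main

import (
	"fmt"
	"os"
)

// admissionConfig is a hypothetical holder for the two values that engage the gate.
type admissionConfig struct {
	URL   string
	Token string
}

// resolveAdmission reads both variables through getenv (os.Getenv in
// production, a map lookup in tests). The gate is engaged only when both are
// non-empty. A partial configuration returns a warning and leaves the gate off.
func resolveAdmission(getenv func(string) string) (cfg admissionConfig, engaged bool, warning string) {
	cfg = admissionConfig{
		URL:   getenv("FORGE_ADMISSION_URL"),
		Token: getenv("FORGE_PLATFORM_TOKEN"),
	}
	switch {
	case cfg.URL != "" && cfg.Token != "":
		return cfg, true, ""
	case cfg.URL == "" && cfg.Token == "":
		return cfg, false, "" // silent no-op: the feature is simply off
	default:
		return cfg, false, "admission: only one of FORGE_ADMISSION_URL / FORGE_PLATFORM_TOKEN is set; gate stays off"
	}
}

func main() {
	cfg, engaged, warning := resolveAdmission(os.Getenv)
	if warning != "" {
		fmt.Fprintln(os.Stderr, "warn:", warning)
	}
	fmt.Println("admission gate engaged:", engaged, "url:", cfg.URL)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the three cases (engaged, silent no-op, partial-config warning) easy to exercise in isolation.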

Closes the platform → agent signal gap: Forge has measured token usage since FWS-3 / #87 so the platform can compute a spend ceiling, but until now there was no clean way for the platform to tell the agent "stop accepting work" when an org/workspace/agent went over budget.

Wire shape

Request (issued at most once per 5s per agent):

GET /v1/admission?agent_id=my-agent HTTP/1.1
Authorization: Bearer <FORGE_PLATFORM_TOKEN>
Org-Id: <FORGE_ORG_ID>           # from #157; omitted when empty
Workspace-Id: <FORGE_WORKSPACE_ID> # from #157; omitted when empty
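
A minimal sketch of how such a request could be assembled with the standard library, assuming the agent ID is appended to whatever query string the configured URL already carries. `newAdmissionRequest` and the example host are hypothetical and are not taken from admission_engine.go.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newAdmissionRequest builds the GET request described above. Tenancy headers
// are added only when their values are non-empty, so an unset org or
// workspace never goes out as an empty header.
func newAdmissionRequest(baseURL, token, agentID, orgID, workspaceID string) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	q := u.Query() // keeps any query parameters already present in baseURL
	q.Set("agent_id", agentID)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	if orgID != "" {
		req.Header.Set("Org-Id", orgID)
	}
	if workspaceID != "" {
		req.Header.Set("Workspace-Id", workspaceID)
	}
	return req, nil
}

func main() {
	// platform.example and the literal values are placeholders.
	req, err := newAdmissionRequest("http://platform.example/v1/admission", "secret-token", "my-agent", "org-1", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	for name, values := range req.Header {
		fmt.Println(name+":", values[0])
	}
}
```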

Response (HTTP 200 when the platform reached a decision):

{
  "decision": "admit" | "deny",
  "reason": "cost_limit_exceeded",
  "scope": "agent" | "workspace" | "org",
  "window": "daily",
  "reset_at": "2026-06-28T14:00:00Z"
}
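
One way to decode this body while treating anything unexpected as a fail-open admit is sketched below. The `platformDecision` type and `parseDecision` function are hypothetical names for this example and need not match the `Decision` struct in forge-core.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// platformDecision mirrors the response fields shown above. reset_at is kept
// as a string here so an absent or empty value does not fail decoding.
type platformDecision struct {
	Decision string `json:"decision"`
	Reason   string `json:"reason"`
	Scope    string `json:"scope"`
	Window   string `json:"window"`
	ResetAt  string `json:"reset_at"`
}

// parseDecision returns the decoded decision. The second return value is true
// when the body could not be trusted (malformed JSON or an unknown decision
// value) and a fail-open admit was substituted.
func parseDecision(body []byte) (platformDecision, bool) {
	var d platformDecision
	if err := json.Unmarshal(body, &d); err != nil {
		return platformDecision{Decision: "admit"}, true
	}
	switch d.Decision {
	case "admit", "deny":
		return d, false
	default:
		return platformDecision{Decision: "admit"}, true
	}
}

func main() {
	body := []byte(`{"decision":"deny","reason":"cost_limit_exceeded","scope":"agent","window":"daily","reset_at":"2026-06-28T14:00:00Z"}`)
	d, fallback := parseDecision(body)
	fmt.Printf("%+v fallback=%v\n", d, fallback)

	d, fallback = parseDecision([]byte(`{"decision":"maybe"}`))
	fmt.Printf("%+v fallback=%v\n", d, fallback)
}
```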

Caller sees on deny: HTTP 402 Payment Required + Retry-After (derived from reset_at, clamped non-negative) + structured JSON body mirroring the platform response.
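
The Retry-After derivation can be illustrated as follows. This sketch assumes the value is expressed in whole seconds and rounded up, which this PR does not specify; `retryAfterSeconds` and `mustParse` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// retryAfterSeconds returns the number of seconds from now until resetAt,
// rounded up, and never negative. A reset_at that is already in the past
// yields 0 rather than a negative header value.
func retryAfterSeconds(resetAt, now time.Time) int {
	secs := int(math.Ceil(resetAt.Sub(now).Seconds()))
	if secs < 0 {
		return 0
	}
	return secs
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
// It exists only to keep the example short.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	resetAt := mustParse("2026-06-28T14:00:00Z")
	fmt.Println(retryAfterSeconds(resetAt, mustParse("2026-06-28T13:00:00Z"))) // one hour before reset
	fmt.Println(retryAfterSeconds(resetAt, mustParse("2026-06-28T15:00:00Z"))) // after reset: clamped
}
```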

Design decisions (per the locked contract under #201)

Two env vars to engage; nothing else. Baked 2s timeout + 5s cache TTL + GET method. No FORGE_ADMISSION_REQUIRED, no FORGE_ADMISSION_FAIL_MODE, no per-request override knob — keeps the operator surface flat.
Fail-open everywhere. Network error, 4xx, 5xx, parse failure, unknown decision value all produce a logged warn + cached fail-open admit for the TTL. The cache key is per-agent, so platform outage = one call per agent per 5s, not one per request. No knob to flip to fail-closed; if you need hard enforcement on platform outage, do it at a different layer.
Tenancy headers: Org-Id / Workspace-Id (no X-Forge- prefix). Deliberately distinct from the inbound X-Forge-Org-ID / X-Forge-Workspace-ID tenancy stamps Forge accepts (#157, "Tenancy stamping: stamp org_id / workspace_id on every audit event from env + headers"): different direction, different convention. Empty value → header omitted entirely, never sent as the literal empty string.
Pipeline placement: between auth_middleware and the dispatcher. Auth runs first so platform calls don't burn on unauthenticated traffic; admission runs before the dispatcher so a denied invocation never reaches the executor / LLM / tool stack.
New audit event task_admission_denied carries fields.cached distinguishing "platform actively denied" from "serving a 4-second-old cached deny" when debugging propagation lag.
New OTel span admission.check, a sibling of auth.verify (#187, "Add three runtime spans: auth.verify, channel.<adapter>.deliver, schedule.fire"), with forge.admission.{decision,reason,scope,window,cached,fallback} attrs. Status=Error on deny. The HTTP call nests under it as http.client, so total admission latency = span duration and platform-side latency = HTTP child.
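
The per-agent TTL cache with fail-open behaviour described in the decisions above might look roughly like this. It is a simplified sketch under stated assumptions: `cachedChecker`, `fakeClock`, and `errPlatformDown` are hypothetical, the platform call is reduced to a `fetch` function, and the lock is held across the fetch for brevity, which serializes callers in a way a real implementation may want to avoid.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errPlatformDown = errors.New("platform unreachable")

// decision is a reduced, hypothetical result type for this example.
type decision struct {
	Admit    bool
	Fallback bool // admitted because the platform could not be consulted
	Cached   bool // served from the cache rather than a fresh call
}

type entry struct {
	d       decision
	expires time.Time
}

// cachedChecker caches one decision per agent ID for ttl. The clock is
// injected so tests can advance time without sleeping.
type cachedChecker struct {
	mu    sync.Mutex
	ttl   time.Duration
	now   func() time.Time
	fetch func(agentID string) (admit bool, err error)
	cache map[string]entry
}

func newCachedChecker(ttl time.Duration, now func() time.Time, fetch func(string) (bool, error)) *cachedChecker {
	return &cachedChecker{ttl: ttl, now: now, fetch: fetch, cache: map[string]entry{}}
}

// Check returns the cached decision while it is fresh. Otherwise it calls
// fetch once and caches the outcome, including a fail-open admit on error,
// so an outage costs one call per agent per TTL instead of one per request.
func (c *cachedChecker) Check(agentID string) decision {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.cache[agentID]; ok && c.now().Before(e.expires) {
		d := e.d
		d.Cached = true
		return d
	}
	admit, err := c.fetch(agentID)
	d := decision{Admit: admit}
	if err != nil {
		d = decision{Admit: true, Fallback: true}
	}
	c.cache[agentID] = entry{d: d, expires: c.now().Add(c.ttl)}
	return d
}

// fakeClock is a manually advanced clock for demonstrations and tests.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

func main() {
	clk := &fakeClock{}
	calls := 0
	c := newCachedChecker(5*time.Second, clk.Now, func(string) (bool, error) {
		calls++
		return false, errPlatformDown // simulate a platform outage
	})
	fmt.Printf("%+v calls=%d\n", c.Check("my-agent"), calls)
	fmt.Printf("%+v calls=%d\n", c.Check("my-agent"), calls)
	clk.Advance(6 * time.Second)
	fmt.Printf("%+v calls=%d\n", c.Check("my-agent"), calls)
}
```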

What the platform owns

Forge stays a dumb yes/no asker. Platform owns: bearer-token verification, hierarchy precedence (agent → workspace → org), window vocabulary, reset-window timing, per-agent overrides + grace periods, aggregating Forge's audit stream into spend totals. The whole platform contract is curl-testable.

Implementation surface

| File | Role |
| --- | --- |
| `forge-core/runtime/admission.go` | `AdmissionChecker` interface, `Decision` struct, `NoopAdmissionChecker` |
| `forge-cli/runtime/admission_engine.go` | `PlatformAdmissionChecker` with TTL cache + injectable clock + fail-open |
| `forge-cli/runtime/admission_loader.go` | `BuildAdmissionChecker` env resolution + partial-config startup warn |
| `forge-cli/server/admission_middleware.go` | HTTP middleware: 402 on deny + structured body + Retry-After + audit emission |
| `forge-cli/runtime/runner.go` | Wired into the server pipeline between auth + dispatcher |
| `forge-core/runtime/audit.go` | New `AuditTaskAdmissionDenied` constant |
| `forge-core/observability/attrs.go` | Six new `forge.admission.*` attribute constants |
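
To show how the middleware row fits between auth and the dispatcher, here is a reduced sketch built on net/http. All names (`denial`, `checker`, `authMiddleware`, `admissionMiddleware`, `buildPipeline`, `probe`) are hypothetical, the auth check is a stand-in that only tests for a header, and the exact shape of the 402 body is an assumption. Audit and span emission are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// denial is a hypothetical 402 body mirroring the platform response.
type denial struct {
	Decision   string `json:"decision"`
	Reason     string `json:"reason"`
	Scope      string `json:"scope"`
	Window     string `json:"window"`
	ResetAt    string `json:"reset_at"`
	RetryAfter int    `json:"-"` // seconds, assumed already clamped to >= 0
}

// checker returns nil to admit, or a denial describing why not.
type checker func(agentID string) *denial

// authMiddleware is a placeholder for real authentication.
func authMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// admissionMiddleware answers 402 with Retry-After and a JSON body on deny,
// and passes the request through on admit or when no checker is configured.
func admissionMiddleware(agentID string, check checker, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if check != nil {
			if d := check(agentID); d != nil {
				w.Header().Set("Content-Type", "application/json")
				w.Header().Set("Retry-After", strconv.Itoa(d.RetryAfter))
				w.WriteHeader(http.StatusPaymentRequired)
				_ = json.NewEncoder(w).Encode(d)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// buildPipeline wires auth -> admission -> dispatcher. The dispatcher here
// only counts how many requests reached it.
func buildPipeline(agentID string, check checker, reached *int) http.Handler {
	dispatcher := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		*reached++
		w.WriteHeader(http.StatusOK)
	})
	return authMiddleware(admissionMiddleware(agentID, check, dispatcher))
}

// probe sends one in-memory request and reports status and Retry-After.
func probe(h http.Handler, withAuth bool) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	if withAuth {
		req.Header.Set("Authorization", "Bearer caller-token")
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Retry-After")
}

func main() {
	reached := 0
	deny := func(string) *denial {
		return &denial{Decision: "deny", Reason: "cost_limit_exceeded", Scope: "agent", Window: "daily", ResetAt: "2026-06-28T14:00:00Z", RetryAfter: 3600}
	}
	status, retryAfter := probe(buildPipeline("my-agent", deny, &reached), true)
	fmt.Println("status:", status, "Retry-After:", retryAfter, "dispatcher hits:", reached)
}
```

Because auth wraps admission, an unauthenticated request is rejected with 401 before the checker is ever consulted, and a denied request never increments the dispatcher counter.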

Docs

docs/security/admission.md — full operator + platform-integrator reference
docs/security/audit-logging.md — task_admission_denied event row
docs/core-concepts/observability-tracing.md — admission.check span hierarchy + attribute table
.claude/skills/forge.md — implicit via sync-docs row
.claude/commands/sync-docs.md — new mapping row
CHANGELOG entry

Test plan

golangci-lint run across all four modules — 0 issues
gofmt -w across all modules
go test ./... in forge-core/ and forge-cli/ — all green
21 new unit tests pin: admit / deny / tenancy-header-send-and-omit / cache hit + expire / fail-open on network error + 5xx + 4xx + malformed JSON + unknown decision / query string preservation / 2s timeout / loader engaged-path + silent-noop + partial-config-warn / middleware admit pass-through + deny 402-with-body + negative Retry-After clamp + audit event field carry + Noop short-circuit + nil-checker guard.
Manual smoke: stand up a local mock platform, set FORGE_ADMISSION_URL + FORGE_PLATFORM_TOKEN, hit tasks/send; verify admission.check span attrs + task_admission_denied audit event surface as expected.
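
For the manual smoke step, a throwaway mock platform could be as small as the sketch below. It is an assumption-laden illustration, not a fixture from this PR: `newMockPlatform` and `askPlatform` are hypothetical, and the mock denies exactly one hard-coded agent ID and admits everything else.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// newMockPlatform starts a local server that denies deniedAgent, admits any
// other agent, and answers 401 when the bearer token does not match.
func newMockPlatform(token, deniedAgent string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+token {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		resp := map[string]string{"decision": "admit"}
		if r.URL.Query().Get("agent_id") == deniedAgent {
			resp = map[string]string{
				"decision": "deny",
				"reason":   "cost_limit_exceeded",
				"scope":    "agent",
				"window":   "daily",
				"reset_at": "2026-06-28T14:00:00Z",
			}
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(resp)
	}))
}

// askPlatform issues the admission GET, much as a curl test would, and
// returns the decision field from the response.
func askPlatform(baseURL, token, agentID string) (string, error) {
	target := baseURL + "/v1/admission?agent_id=" + url.QueryEscape(agentID)
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var out struct {
		Decision string `json:"decision"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Decision, nil
}

func main() {
	srv := newMockPlatform("secret-token", "over-budget-agent")
	defer srv.Close()
	for _, agent := range []string{"my-agent", "over-budget-agent"} {
		decision, err := askPlatform(srv.URL, "secret-token", agent)
		fmt.Println(agent, "->", decision, err)
	}
}
```

The httptest server shuts down when main returns. For a smoke test against a running agent, the same handler would need to be served from a long-lived listener, with FORGE_ADMISSION_URL pointed at it.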

…loses #201)

Forge has measured LLM token usage per call (llm_call audit event) and per invocation (X-Forge-Tokens-* response headers, invocation_complete audit event) since FWS-3 / #87, but once the platform decided "this agent is over budget" there was no clean way to tell the agent process to stop accepting new invocations. tasks/cancel only stops in-flight work; the per-IP rate limiter (FWS-10) measures request-rate not cost; auth-layer rejection doesn't fit OIDC/cloud-native providers because they validate tokens directly against the IdP with no platform round-trip to piggyback on.

This adds a dedicated admission middleware that calls a platform-side API once per agent (cached 5s) to decide whether to admit each new tasks/send. Distinct from auth, distinct from rate limit. Off by default; engaged only when both FORGE_ADMISSION_URL and FORGE_PLATFORM_TOKEN are set.

Contract (matches the locked design discussion under #201):

- Two env vars to engage; existing FORGE_ORG_ID / FORGE_WORKSPACE_ID from #157 forward as outbound Org-Id / Workspace-Id headers when set (empty value = header omitted entirely on the wire).
- GET /admission?agent_id=<id> with bearer + tenancy headers; response {decision, reason, scope, window, reset_at}.
- Baked: 2s HTTP timeout, 5s decision cache. Not env-overridable.
- Fail-open everywhere: any failure (timeout, 4xx, 5xx, parse error, unknown decision) → logged warn + cached fail-open admit for the TTL. No REQUIRED knob; if you need hard enforcement on platform outage, do it at a different layer.
- On deny: HTTP 402 Payment Required + Retry-After (derived from reset_at, clamped non-negative) + structured JSON body carrying reason/scope/window/reset_at.
- Pipeline placement: seq counter → auth → admission → dispatcher. Auth runs first so platform calls don't burn on unauthenticated traffic; admission runs before the dispatcher so denied calls never reach the executor / LLM / tool stack.
- New audit event task_admission_denied with fields.cached flag.
- New OTel span admission.check parallel to auth.verify (#187) with forge.admission.{decision,reason,scope,window,cached,fallback}. Status=Error on deny. HTTP call nests under it as http.client.

Implementation surface:

forge-core/runtime/admission.go - AdmissionChecker, Decision, NoopAdmissionChecker
forge-cli/runtime/admission_engine.go - PlatformAdmissionChecker with TTL cache + fail-open
forge-cli/runtime/admission_loader.go - env-driven build, partial-config warn at startup
forge-cli/server/admission_middleware.go - HTTP middleware, 402, audit + span emission
forge-cli/runtime/runner.go - wired into server pipeline between auth + dispatcher

Pinned by TestPlatformAdmissionChecker_{AdmitFromPlatform, DenyFromPlatform, TenancyHeadersSentAndOmitted, CachesWithinTTL, CacheExpires, FailsOpenOnNetworkError, FailsOpenOnPlatform5xx, FailsOpenOnAuth4xx, FailsOpenOnMalformedJSON, FailsOpenOnUnknownDecision, AppendsAgentIDToExistingQuery, TimeoutHonored}, TestBuildAdmissionChecker_{BothEnvSetReturnsPlatformChecker, NeitherEnvSetSilentNoop, PartialConfigWarnsButReturnsNoop}, TestAdmissionMiddleware_{AdmitPassesThrough, DenyReturns402WithStructuredBody, DenyClampsNegativeRetryAfter, EmitsAuditEventOnDeny, NoopShortCircuits, NilCheckerPasses}.

initializ-mk wants to merge 1 commit into main from feat/issue-201-admission-hook
