feat(admission): platform-API gate for per-agent quota / cost limits (closes #201) #203
Open
initializ-mk wants to merge 1 commit into
…loses #201)

Forge has measured LLM token usage per call (llm_call audit event) and per invocation (X-Forge-Tokens-* response headers, invocation_complete audit event) since FWS-3 / #87, but once the platform decided "this agent is over budget" there was no clean way to tell the agent process to stop accepting new invocations. tasks/cancel only stops in-flight work; the per-IP rate limiter (FWS-10) measures request rate, not cost; auth-layer rejection doesn't fit OIDC/cloud-native providers because they validate tokens directly against the IdP with no platform round-trip to piggyback on.

This adds a dedicated admission middleware that calls a platform-side API once per agent (cached 5s) to decide whether to admit each new tasks/send. Distinct from auth, distinct from rate limit. Off by default; engaged only when both FORGE_ADMISSION_URL and FORGE_PLATFORM_TOKEN are set.

Contract (matches the locked design discussion under #201):
- Two env vars to engage; existing FORGE_ORG_ID / FORGE_WORKSPACE_ID from #157 forward as outbound Org-Id / Workspace-Id headers when set (empty value = header omitted entirely on the wire).
- GET /admission?agent_id=<id> with bearer + tenancy headers; response {decision, reason, scope, window, reset_at}.
- Baked: 2s HTTP timeout, 5s decision cache. Not env-overridable.
- Fail-open everywhere: any failure (timeout, 4xx, 5xx, parse error, unknown decision) → logged warn + cached fail-open admit for the TTL. No REQUIRED knob; if you need hard enforcement on platform outage, do it at a different layer.
- On deny: HTTP 402 Payment Required + Retry-After (derived from reset_at, clamped non-negative) + structured JSON body carrying reason/scope/window/reset_at.
- Pipeline placement: seq counter → auth → admission → dispatcher. Auth runs first so platform calls don't burn on unauthenticated traffic; admission runs before the dispatcher so denied calls never reach the executor / LLM / tool stack.
- New audit event task_admission_denied with fields.cached flag.
- New OTel span admission.check parallel to auth.verify (#187) with forge.admission.{decision,reason,scope,window,cached,fallback}. Status=Error on deny. HTTP call nests under it as http.client.

Implementation surface:
- forge-core/runtime/admission.go - AdmissionChecker, Decision, NoopAdmissionChecker
- forge-cli/runtime/admission_engine.go - PlatformAdmissionChecker with TTL cache + fail-open
- forge-cli/runtime/admission_loader.go - env-driven build, partial-config warn at startup
- forge-cli/server/admission_middleware.go - HTTP middleware, 402, audit + span emission
- forge-cli/runtime/runner.go - wired into server pipeline between auth + dispatcher

Pinned by:
- TestPlatformAdmissionChecker_{AdmitFromPlatform, DenyFromPlatform, TenancyHeadersSentAndOmitted, CachesWithinTTL, CacheExpires, FailsOpenOnNetworkError, FailsOpenOnPlatform5xx, FailsOpenOnAuth4xx, FailsOpenOnMalformedJSON, FailsOpenOnUnknownDecision, AppendsAgentIDToExistingQuery, TimeoutHonored}
- TestBuildAdmissionChecker_{BothEnvSetReturnsPlatformChecker, NeitherEnvSetSilentNoop, PartialConfigWarnsButReturnsNoop}
- TestAdmissionMiddleware_{AdmitPassesThrough, DenyReturns402WithStructuredBody, DenyClampsNegativeRetryAfter, EmitsAuditEventOnDeny, NoopShortCircuits, NilCheckerPasses}
Summary
Adds a pre-dispatch middleware that calls a platform admission API once per agent (cached 5s) to decide whether to admit each new tasks/send. Distinct from auth (HTTP 401) and from the per-IP rate limiter (HTTP 429). Off by default; engaged only when both FORGE_ADMISSION_URL and FORGE_PLATFORM_TOKEN are set.

Closes the platform → agent signal gap: Forge has measured token usage since FWS-3 / #87 so the platform can compute a spend ceiling, but until now there was no clean way for the platform to tell the agent "stop accepting work" when an org/workspace/agent went over budget.
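The "engaged only when both env vars are set" rule can be sketched in Go as follows. The function resolveAdmissionMode and the mode constants are hypothetical and stand in for whatever BuildAdmissionChecker actually does; only the two environment variable names come from this PR. The lookup and warn functions are passed in so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

type mode int

const (
	modeNoop     mode = iota // admission disabled; every call is admitted
	modePlatform             // admission engaged against the platform API
)

// resolveAdmissionMode (hypothetical) applies the gating rule:
// both vars set → engaged; neither set → silent no-op;
// exactly one set → warn once and stay a no-op.
func resolveAdmissionMode(getenv func(string) string, warn func(string)) mode {
	url := getenv("FORGE_ADMISSION_URL")
	token := getenv("FORGE_PLATFORM_TOKEN")
	switch {
	case url != "" && token != "":
		return modePlatform
	case url == "" && token == "":
		return modeNoop
	default:
		warn("admission: FORGE_ADMISSION_URL and FORGE_PLATFORM_TOKEN must both be set; admission disabled")
		return modeNoop
	}
}

func main() {
	m := resolveAdmissionMode(os.Getenv, func(msg string) { log.Println(msg) })
	fmt.Println("platform admission engaged:", m == modePlatform)
}
```

Treating a partial configuration as "off with a warning" rather than a startup error matches the fail-open stance described for the rest of the feature.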
Wire shape
Request (issued at most once per 5s per agent):
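A minimal Go sketch of how such a request might be assembled, going by the contract in the commit message (GET with agent_id in the query, a bearer token, and Org-Id / Workspace-Id omitted when empty). The helper buildAdmissionRequest and the platform.example URL are hypothetical, and the use of an "Authorization: Bearer" header is an assumption about what "bearer" means on the wire.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// buildAdmissionRequest (hypothetical) builds the outbound admission call.
// agent_id is merged into any query string already present on baseURL.
func buildAdmissionRequest(baseURL, token, agentID, orgID, workspaceID string) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	q.Set("agent_id", agentID)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Assumption: the platform token travels as a standard bearer credential.
	req.Header.Set("Authorization", "Bearer "+token)
	// Tenancy headers are only set when non-empty, so an unset value is
	// omitted from the wire instead of being sent as an empty string.
	if orgID != "" {
		req.Header.Set("Org-Id", orgID)
	}
	if workspaceID != "" {
		req.Header.Set("Workspace-Id", workspaceID)
	}
	return req, nil
}

func main() {
	req, err := buildAdmissionRequest(
		"https://platform.example/admission?region=eu", "secret", "agent-1", "org-1", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	_, sent := req.Header["Workspace-Id"]
	fmt.Println("Workspace-Id sent:", sent)
}
```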
Response (HTTP 200 when the platform reached a decision):

    {
      "decision": "admit" | "deny",
      "reason": "cost_limit_exceeded",
      "scope": "agent" | "workspace" | "org",
      "window": "daily",
      "reset_at": "2026-06-28T14:00:00Z"
    }

Caller sees on deny: HTTP 402 Payment Required + Retry-After (derived from reset_at, clamped non-negative) + structured JSON body mirroring the platform response.

Design decisions (per the locked contract under #201)
- … GET method. No FORGE_ADMISSION_REQUIRED, no FORGE_ADMISSION_FAIL_MODE, no per-request override knob; this keeps the operator surface flat.
- … decision value all produce a logged warn + cached fail-open admit for the TTL. The cache key is per-agent, so platform outage = one call per agent per 5s, not one per request. No knob to flip to fail-closed; if you need hard enforcement on platform outage, do it at a different layer.
- … Org-Id / Workspace-Id (no X-Forge- prefix). Deliberately distinct from the inbound X-Forge-Org-ID / X-Forge-Workspace-ID tenancy stamps Forge accepts (Tenancy stamping: stamp org_id / workspace_id on every audit event from env + headers #157): different direction, different convention. Empty value → header omitted entirely, never sent as the literal empty string.
- … auth_middleware and the dispatcher. Auth runs first so platform calls don't burn on unauthenticated traffic; admission runs before the dispatcher so a denied invocation never reaches the executor / LLM / tool stack.
- … task_admission_denied carries fields.cached, distinguishing "platform actively denied" from "serving a 4-second-old cached deny" when debugging propagation lag.
- … admission.check sibling of auth.verify (Add three runtime spans: auth.verify, channel.<adapter>.deliver, schedule.fire #187) with forge.admission.{decision,reason,scope,window,cached,fallback} attrs. Status=Error on deny. HTTP call nests under it as http.client, so total admission latency = span duration and platform-side latency = HTTP child.

What the platform owns
Forge stays a dumb yes/no asker. Platform owns: bearer-token verification, hierarchy precedence (agent → workspace → org), window vocabulary, reset-window timing, per-agent overrides + grace periods, aggregating Forge's audit stream into spend totals. The whole platform contract is curl-testable.

Implementation surface
- forge-core/runtime/admission.go: AdmissionChecker interface, Decision struct, NoopAdmissionChecker
- forge-cli/runtime/admission_engine.go: PlatformAdmissionChecker with TTL cache + injectable clock + fail-open
- forge-cli/runtime/admission_loader.go: BuildAdmissionChecker env resolution + partial-config startup warn
- forge-cli/server/admission_middleware.go: HTTP middleware, 402, audit + span emission
- forge-cli/runtime/runner.go: wired into server pipeline between auth + dispatcher
- forge-core/runtime/audit.go: AuditTaskAdmissionDenied constant
- forge-core/observability/attrs.go: forge.admission.* attribute constants

Docs
- docs/security/admission.md: full operator + platform-integrator reference
- docs/security/audit-logging.md: task_admission_denied event row
- docs/core-concepts/observability-tracing.md: admission.check span hierarchy + attribute table
- .claude/skills/forge.md: implicit via sync-docs row
- .claude/commands/sync-docs.md: new mapping row

Test plan
- golangci-lint run across all four modules: 0 issues
- gofmt -w across all modules
- go test ./... in forge-core/ and forge-cli/: all green
- … FORGE_ADMISSION_URL + FORGE_PLATFORM_TOKEN, hit tasks/send; verify admission.check span attrs + task_admission_denied audit event surface as expected.