
feat(admission): platform-API gate for per-agent quota / cost limits (closes #201) #203

Open
initializ-mk wants to merge 1 commit into main from feat/issue-201-admission-hook


@initializ-mk


Summary

Adds a pre-dispatch middleware that asks a platform admission API whether to admit each new tasks/send. The platform is called at most once per agent per 5s; the decision is cached in between. Distinct from auth (HTTP 401) and from the per-IP rate limiter (HTTP 429). Off by default; engaged only when both FORGE_ADMISSION_URL and FORGE_PLATFORM_TOKEN are set.
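A minimal sketch of the "both env vars or nothing" rule, for reviewers who want to see the three startup outcomes side by side (engaged, silent no-op, partial-config warn). The function name, return shape, and warning text are hypothetical and are not taken from admission_loader.go; only the two env var names come from this PR.

```go
package main

import (
	"fmt"
	"os"
)

// admissionConfig holds the two values that must both be present to engage
// the gate. The type and its fields are hypothetical.
type admissionConfig struct {
	URL   string
	Token string
}

// resolveAdmissionConfig reads the two env vars through getenv, which is
// injected so the logic can be exercised without touching the real
// environment. engaged is true only when both values are set; warning is
// non-empty only when exactly one of them is set.
func resolveAdmissionConfig(getenv func(string) string) (cfg admissionConfig, engaged bool, warning string) {
	url := getenv("FORGE_ADMISSION_URL")
	token := getenv("FORGE_PLATFORM_TOKEN")
	switch {
	case url != "" && token != "":
		return admissionConfig{URL: url, Token: token}, true, ""
	case url == "" && token == "":
		// Default deployment: gate is off and nothing is logged.
		return admissionConfig{}, false, ""
	case url == "":
		return admissionConfig{}, false, "FORGE_PLATFORM_TOKEN is set but FORGE_ADMISSION_URL is empty; admission gate disabled"
	default:
		return admissionConfig{}, false, "FORGE_ADMISSION_URL is set but FORGE_PLATFORM_TOKEN is empty; admission gate disabled"
	}
}

func main() {
	cfg, engaged, warning := resolveAdmissionConfig(os.Getenv)
	if warning != "" {
		fmt.Fprintln(os.Stderr, "warn:", warning)
	}
	fmt.Println("admission engaged:", engaged, "url:", cfg.URL)
}
```

Passing os.Getenv in as a function keeps the resolution logic testable with a plain map, which is one way the loader tests listed in the test plan could be written.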

Closes the platform → agent signal gap: Forge has measured token usage since FWS-3 / #87 so the platform can compute a spend ceiling, but until now there was no clean way for the platform to tell the agent "stop accepting work" when an org/workspace/agent went over budget.

Wire shape

Request (issued at most once per 5s per agent):

GET /v1/admission?agent_id=my-agent HTTP/1.1
Authorization: Bearer <FORGE_PLATFORM_TOKEN>
Org-Id: <FORGE_ORG_ID>           # from #157; omitted when empty
Workspace-Id: <FORGE_WORKSPACE_ID> # from #157; omitted when empty
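To make the wire shape concrete, here is a sketch of how such a request could be assembled with net/http. The helper name and the example URL are assumptions; the header names, the agent_id query parameter, and the omit-when-empty rule are from the description above. An existing query string on the configured URL is preserved rather than overwritten.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newAdmissionRequest builds the outbound GET. Hypothetical helper: the
// real PlatformAdmissionChecker may structure this differently.
func newAdmissionRequest(baseURL, agentID, token, orgID, workspaceID string) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, fmt.Errorf("parse admission URL: %w", err)
	}
	// Merge agent_id into whatever query the operator already configured.
	q := u.Query()
	q.Set("agent_id", agentID)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, fmt.Errorf("build admission request: %w", err)
	}
	req.Header.Set("Authorization", "Bearer "+token)
	// Tenancy headers are omitted entirely when empty, never sent as "".
	if orgID != "" {
		req.Header.Set("Org-Id", orgID)
	}
	if workspaceID != "" {
		req.Header.Set("Workspace-Id", workspaceID)
	}
	return req, nil
}

func main() {
	req, err := newAdmissionRequest("https://platform.example/v1/admission", "my-agent", "example-token", "", "ws-1")
	if err != nil {
		panic(err)
	}
	_, hasOrg := req.Header["Org-Id"]
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("Org-Id present:", hasOrg, "Workspace-Id:", req.Header.Get("Workspace-Id"))
}
```

The 2s timeout would be applied by whoever sends the request, for example through an http.Client with its Timeout field set.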

Response (HTTP 200 when the platform reached a decision):

{
  "decision": "admit" | "deny",
  "reason": "cost_limit_exceeded",
  "scope": "agent" | "workspace" | "org",
  "window": "daily",
  "reset_at": "2026-06-28T14:00:00Z"
}

Caller sees on deny: HTTP 402 Payment Required + Retry-After (derived from reset_at, clamped non-negative) + structured JSON body mirroring the platform response.
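A sketch of the Retry-After derivation, assuming reset_at is RFC 3339 (as in the example response) and that fractional seconds round up so a caller does not retry early. The rounding choice and all names here are assumptions; the PR only states that the value is derived from reset_at and clamped non-negative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strconv"
	"time"
)

// platformDecision mirrors the response fields listed under "Wire shape".
type platformDecision struct {
	Decision string    `json:"decision"`
	Reason   string    `json:"reason"`
	Scope    string    `json:"scope"`
	Window   string    `json:"window"`
	ResetAt  time.Time `json:"reset_at"`
}

// retryAfterSeconds returns the whole seconds from now until resetAt,
// rounded up, and never negative.
func retryAfterSeconds(resetAt, now time.Time) int {
	d := resetAt.Sub(now)
	if d <= 0 {
		return 0 // reset_at already passed (or clock skew): clamp to zero
	}
	return int(math.Ceil(d.Seconds()))
}

// retryAfterFromBody decodes a platform response and renders the header value.
func retryAfterFromBody(body []byte, now time.Time) (string, error) {
	var pd platformDecision
	if err := json.Unmarshal(body, &pd); err != nil {
		return "", fmt.Errorf("decode platform response: %w", err)
	}
	return strconv.Itoa(retryAfterSeconds(pd.ResetAt, now)), nil
}

// mustParse is a small helper for the demo below.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	body := []byte(`{"decision":"deny","reason":"cost_limit_exceeded","scope":"agent","window":"daily","reset_at":"2026-06-28T14:00:00Z"}`)
	for _, now := range []string{"2026-06-28T13:59:30Z", "2026-06-28T14:05:00Z"} {
		v, err := retryAfterFromBody(body, mustParse(now))
		if err != nil {
			panic(err)
		}
		fmt.Printf("now=%s Retry-After: %s\n", now, v)
	}
}
```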

Design decisions (per the locked contract under #201)

  • Two env vars to engage; nothing else. Baked 2s timeout + 5s cache TTL + GET method. No FORGE_ADMISSION_REQUIRED, no FORGE_ADMISSION_FAIL_MODE, no per-request override knob — keeps the operator surface flat.
  • Fail-open everywhere. Network error, 4xx, 5xx, parse failure, unknown decision value all produce a logged warn + cached fail-open admit for the TTL. The cache key is per-agent, so platform outage = one call per agent per 5s, not one per request. No knob to flip to fail-closed; if you need hard enforcement on platform outage, do it at a different layer.
  • Tenancy headers: Org-Id / Workspace-Id (no X-Forge- prefix). Deliberately distinct from the inbound X-Forge-Org-ID / X-Forge-Workspace-ID tenancy stamps Forge accepts (#157, "Tenancy stamping: stamp org_id / workspace_id on every audit event from env + headers"): different direction, different convention. Empty value → header omitted entirely, never sent as the literal empty string.
  • Pipeline placement: between auth_middleware and the dispatcher. Auth runs first so platform calls don't burn on unauthenticated traffic; admission runs before the dispatcher so a denied invocation never reaches the executor / LLM / tool stack.
  • New audit event task_admission_denied carries fields.cached distinguishing "platform actively denied" from "serving a 4-second-old cached deny" when debugging propagation lag.
  • New OTel span admission.check, a sibling of auth.verify (#187, "Add three runtime spans: auth.verify, channel.<adapter>.deliver, schedule.fire"), with forge.admission.{decision,reason,scope,window,cached,fallback} attrs. Status=Error on deny. The HTTP call nests under it as http.client, so total admission latency = span duration and platform-side latency = duration of the HTTP child span.
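The cache and fail-open bullets above can be sketched together as a small per-agent TTL cache with an injectable clock. This is an illustration under stated assumptions, not the code in admission_engine.go: the Decision fields, fetchFunc, and fakeClock are hypothetical, and holding the mutex across the fetch is a simplification that serializes checks.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
	"time"
)

// cacheTTL mirrors the baked 5s decision cache described in this PR.
const cacheTTL = 5 * time.Second

var errPlatformDown = errors.New("platform unreachable")

// Decision is a simplified, hypothetical stand-in for the PR's Decision struct.
type Decision struct {
	Admit    bool
	Reason   string
	Cached   bool // served from the cache rather than a fresh platform call
	Fallback bool // fail-open admit, not a real platform decision
}

type cacheEntry struct {
	decision Decision
	expires  time.Time
}

// fetchFunc asks the platform about one agent. Any transport, status, or
// parse failure is reported as err.
type fetchFunc func(agentID string) (verdict, reason string, err error)

type cachedChecker struct {
	fetch   fetchFunc
	now     func() time.Time // injectable clock
	mu      sync.Mutex
	entries map[string]cacheEntry
}

func newCachedChecker(fetch fetchFunc, now func() time.Time) *cachedChecker {
	return &cachedChecker{fetch: fetch, now: now, entries: map[string]cacheEntry{}}
}

// Check returns the cached decision while it is fresh, else resolves a new
// one and caches it (fail-open results included) for cacheTTL.
func (c *cachedChecker) Check(agentID string) Decision {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := c.now()
	if e, ok := c.entries[agentID]; ok && now.Before(e.expires) {
		d := e.decision
		d.Cached = true
		return d
	}
	d := c.resolve(agentID)
	c.entries[agentID] = cacheEntry{decision: d, expires: now.Add(cacheTTL)}
	return d
}

// resolve maps every failure mode to a logged fail-open admit.
func (c *cachedChecker) resolve(agentID string) Decision {
	verdict, reason, err := c.fetch(agentID)
	switch {
	case err != nil:
		log.Printf("warn: admission check failed for %s, failing open: %v", agentID, err)
		return Decision{Admit: true, Fallback: true}
	case verdict == "admit":
		return Decision{Admit: true, Reason: reason}
	case verdict == "deny":
		return Decision{Admit: false, Reason: reason}
	default:
		log.Printf("warn: unknown admission decision %q for %s, failing open", verdict, agentID)
		return Decision{Admit: true, Fallback: true}
	}
}

// fakeClock is a manually advanced clock for demos and tests.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time       { return f.t }
func (f *fakeClock) AdvanceSeconds(n int) { f.t = f.t.Add(time.Duration(n) * time.Second) }

func main() {
	calls := 0
	clk := &fakeClock{}
	c := newCachedChecker(func(string) (string, string, error) {
		calls++
		return "", "", errPlatformDown
	}, clk.Now)
	for i := 0; i < 10; i++ { // ten requests spread over ten seconds of outage
		c.Check("my-agent")
		clk.AdvanceSeconds(1)
	}
	fmt.Println("platform calls during outage:", calls)
}
```

Because the fail-open result is cached like any other decision, an outage costs one platform call per agent per TTL rather than one per request, which is the property the second bullet relies on.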

What the platform owns

Forge stays a dumb yes/no asker. Platform owns: bearer-token verification, hierarchy precedence (agent → workspace → org), window vocabulary, reset-window timing, per-agent overrides + grace periods, aggregating Forge's audit stream into spend totals. The whole platform contract is curl-testable.

Implementation surface

| File | Role |
| --- | --- |
| forge-core/runtime/admission.go | AdmissionChecker interface, Decision struct, NoopAdmissionChecker |
| forge-cli/runtime/admission_engine.go | PlatformAdmissionChecker with TTL cache + injectable clock + fail-open |
| forge-cli/runtime/admission_loader.go | BuildAdmissionChecker env resolution + partial-config startup warn |
| forge-cli/server/admission_middleware.go | HTTP middleware: 402 on deny + structured body + Retry-After + audit emission |
| forge-cli/runtime/runner.go | Wired into the server pipeline between auth + dispatcher |
| forge-core/runtime/audit.go | New AuditTaskAdmissionDenied constant |
| forge-core/observability/attrs.go | Six new forge.admission.* attribute constants |

Docs

  • docs/security/admission.md: full operator + platform-integrator reference
  • docs/security/audit-logging.md: task_admission_denied event row
  • docs/core-concepts/observability-tracing.md: admission.check span hierarchy + attribute table
  • .claude/skills/forge.md: implicit via sync-docs row
  • .claude/commands/sync-docs.md: new mapping row
  • CHANGELOG entry

Test plan

  • golangci-lint run across all four modules — 0 issues
  • gofmt -w across all modules
  • go test ./... in forge-core/ and forge-cli/ — all green
  • 21 new unit tests pin: admit / deny / tenancy-header-send-and-omit / cache hit + expire / fail-open on network error + 5xx + 4xx + malformed JSON + unknown decision / query string preservation / 2s timeout / loader engaged-path + silent-noop + partial-config-warn / middleware admit pass-through + deny 402-with-body + negative Retry-After clamp + audit event field carry + Noop short-circuit + nil-checker guard.
  • Manual smoke: stand up a local mock platform, set FORGE_ADMISSION_URL + FORGE_PLATFORM_TOKEN, hit tasks/send; verify admission.check span attrs + task_admission_denied audit event surface as expected.
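For the manual smoke step, a mock platform can be as small as the sketch below. It is an assumption-laden stand-in, not part of this PR: the /v1/admission path follows the example request above, the bearer check only tests for presence, and the response values are copied from the sample body. Run with a listen address as the first argument (for example 127.0.0.1:8081) to serve; run with no arguments for an in-memory self-check.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// mockPlatform returns a handler that always answers with the given decision.
func mockPlatform(decision string) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/admission", func(w http.ResponseWriter, r *http.Request) {
		if !strings.HasPrefix(r.Header.Get("Authorization"), "Bearer ") {
			http.Error(w, "missing bearer token", http.StatusUnauthorized)
			return
		}
		// Log what Forge sent so the tenancy headers can be eyeballed.
		log.Printf("admission check agent_id=%q org=%q workspace=%q",
			r.URL.Query().Get("agent_id"), r.Header.Get("Org-Id"), r.Header.Get("Workspace-Id"))
		resp := map[string]string{"decision": decision}
		if decision == "deny" {
			resp["reason"] = "cost_limit_exceeded"
			resp["scope"] = "agent"
			resp["window"] = "daily"
			resp["reset_at"] = "2026-06-28T14:00:00Z"
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(resp)
	})
	return mux
}

// probe sends one in-memory GET to h and returns the status code plus the
// decoded "decision" field (empty when the body is not the JSON shape).
func probe(h http.Handler, target, token string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	var out struct {
		Decision string `json:"decision"`
	}
	_ = json.Unmarshal(rec.Body.Bytes(), &out)
	return rec.Code, out.Decision
}

func main() {
	h := mockPlatform("deny")
	if len(os.Args) > 1 {
		log.Fatal(http.ListenAndServe(os.Args[1], h)) // serve for a real smoke run
	}
	code, decision := probe(h, "/v1/admission?agent_id=my-agent", "example-token")
	fmt.Println("status:", code, "decision:", decision)
}
```

With the server listening, FORGE_ADMISSION_URL would point at http://127.0.0.1:8081/v1/admission and FORGE_PLATFORM_TOKEN at any non-empty value.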

…loses #201)

Forge has measured LLM token usage per call (llm_call audit event) and
per invocation (X-Forge-Tokens-* response headers, invocation_complete
audit event) since FWS-3 / #87, but once the platform decided "this
agent is over budget" there was no clean way to tell the agent process
to stop accepting new invocations. tasks/cancel only stops in-flight
work; the per-IP rate limiter (FWS-10) measures request-rate not cost;
auth-layer rejection doesn't fit OIDC/cloud-native providers because
they validate tokens directly against the IdP with no platform
round-trip to piggyback on.

This adds a dedicated admission middleware that calls a platform-side
API once per agent (cached 5s) to decide whether to admit each new
tasks/send. Distinct from auth, distinct from rate limit. Off by
default; engaged only when both FORGE_ADMISSION_URL and
FORGE_PLATFORM_TOKEN are set.

Contract (matches the locked design discussion under #201):

  - Two env vars to engage; existing FORGE_ORG_ID / FORGE_WORKSPACE_ID
    from #157 forward as outbound Org-Id / Workspace-Id headers when
    set (empty value = header omitted entirely on the wire).
  - GET /admission?agent_id=<id> with bearer + tenancy headers;
    response {decision, reason, scope, window, reset_at}.
  - Baked: 2s HTTP timeout, 5s decision cache. Not env-overridable.
  - Fail-open everywhere: any failure (timeout, 4xx, 5xx, parse error,
    unknown decision) → logged warn + cached fail-open admit for the
    TTL. No REQUIRED knob; if you need hard enforcement on platform
    outage, do it at a different layer.
  - On deny: HTTP 402 Payment Required + Retry-After (derived from
    reset_at, clamped non-negative) + structured JSON body carrying
    reason/scope/window/reset_at.
  - Pipeline placement: seq counter → auth → admission → dispatcher.
    Auth runs first so platform calls don't burn on unauthenticated
    traffic; admission runs before the dispatcher so denied calls
    never reach the executor / LLM / tool stack.
  - New audit event task_admission_denied with fields.cached flag.
  - New OTel span admission.check parallel to auth.verify (#187) with
    forge.admission.{decision,reason,scope,window,cached,fallback}.
    Status=Error on deny. HTTP call nests under it as http.client.
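The pipeline-placement bullet can be illustrated with a stripped-down handler chain. This is a sketch, not the runner.go wiring: the seq counter stage, the structured 402 body, Retry-After, audit and span emission are all left out, and every name below is hypothetical. It only demonstrates the two ordering properties: unauthenticated traffic never triggers an admission check, and a denied call never reaches the dispatcher.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

type middleware func(http.Handler) http.Handler

// chain wraps final so that the first middleware listed runs first.
func chain(final http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		final = mws[i](final)
	}
	return final
}

// authMiddleware is a placeholder: it only checks that a credential is present.
func authMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// admissionMiddleware answers 402 when admit() reports a deny.
func admissionMiddleware(admit func() bool) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if !admit() {
				w.WriteHeader(http.StatusPaymentRequired)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// newPipeline builds auth → admission → dispatcher and exposes counters for
// how often the admission check ran and how often the dispatcher was reached.
func newPipeline(admit bool) (h http.Handler, checks, dispatched *int) {
	checks, dispatched = new(int), new(int)
	dispatcher := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		*dispatched++
		w.WriteHeader(http.StatusOK)
	})
	admission := admissionMiddleware(func() bool { *checks++; return admit })
	return chain(dispatcher, authMiddleware, admission), checks, dispatched
}

// send issues one in-memory request and returns the response status.
func send(h http.Handler, authenticated bool) int {
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	if authenticated {
		req.Header.Set("Authorization", "Bearer example")
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h, checks, dispatched := newPipeline(false) // platform says deny
	unauth := send(h, false)
	denied := send(h, true)
	fmt.Println("unauthenticated:", unauth, "denied:", denied)
	fmt.Println("admission checks:", *checks, "dispatched:", *dispatched)
}
```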

Implementation surface:

  forge-core/runtime/admission.go         - AdmissionChecker, Decision,
                                            NoopAdmissionChecker
  forge-cli/runtime/admission_engine.go   - PlatformAdmissionChecker
                                            with TTL cache + fail-open
  forge-cli/runtime/admission_loader.go   - env-driven build, partial-
                                            config warn at startup
  forge-cli/server/admission_middleware.go - HTTP middleware, 402,
                                             audit + span emission
  forge-cli/runtime/runner.go              - wired into server pipeline
                                             between auth + dispatcher

Pinned by TestPlatformAdmissionChecker_{AdmitFromPlatform,
DenyFromPlatform, TenancyHeadersSentAndOmitted, CachesWithinTTL,
CacheExpires, FailsOpenOnNetworkError, FailsOpenOnPlatform5xx,
FailsOpenOnAuth4xx, FailsOpenOnMalformedJSON,
FailsOpenOnUnknownDecision, AppendsAgentIDToExistingQuery,
TimeoutHonored}, TestBuildAdmissionChecker_{BothEnvSetReturnsPlatformChecker,
NeitherEnvSetSilentNoop, PartialConfigWarnsButReturnsNoop},
TestAdmissionMiddleware_{AdmitPassesThrough,
DenyReturns402WithStructuredBody, DenyClampsNegativeRetryAfter,
EmitsAuditEventOnDeny, NoopShortCircuits, NilCheckerPasses}.