diff --git a/.github/workflows/harness-integration.yml b/.github/workflows/harness-integration.yml index ab6b353b9..11b5239dc 100644 --- a/.github/workflows/harness-integration.yml +++ b/.github/workflows/harness-integration.yml @@ -7,6 +7,7 @@ on: paths: - "src/agentex/lib/core/harness/**" - "src/agentex/lib/adk/_modules/**" + - "tests/lib/core/harness/test_harness_pydantic_ai_*.py" - ".github/workflows/harness-integration.yml" jobs: @@ -31,10 +32,28 @@ jobs: - name: Conformance suite run: ./scripts/test tests/lib/core/harness/ -v - # Live integration matrix (harness x {sync, async, temporal}) is added per-harness - # in the migration plans. Placeholder job keeps the workflow valid until then. + # Offline pydantic-ai integration tests (sync / async / temporal channels). + # These use pydantic-ai TestModel + fake streaming/tracing and require no live + # infrastructure. Enabled here for PR 4 (pydantic-ai migration). Future harness + # migration PRs (5-8) should add their integration-test paths to this matrix. 
live-matrix: runs-on: ubuntu-latest - if: false # enabled once the first harness's test agents land + strategy: + matrix: + channel: [sync, async, temporal] + fail-fast: false + name: pydantic-ai-${{ matrix.channel }} steps: - - run: echo "populated by migration PRs" # TODO(harness-migration): enable per-harness; see docs/superpowers/plans migration PRs 4-8 + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 + + - name: Install uv + uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5.4.2 + with: + version: '0.10.2' + + - name: Bootstrap + run: ./scripts/bootstrap + + - name: pydantic-ai ${{ matrix.channel }} integration tests (offline, TestModel) + run: | + ./scripts/test tests/lib/core/harness/test_harness_pydantic_ai_${{ matrix.channel }}.py -v diff --git a/docs/superpowers/plans/2026-06-18-unified-harness-surface-pr4-pydantic-ai.md b/docs/superpowers/plans/2026-06-18-unified-harness-surface-pr4-pydantic-ai.md new file mode 100644 index 000000000..2fa1892fe --- /dev/null +++ b/docs/superpowers/plans/2026-06-18-unified-harness-surface-pr4-pydantic-ai.md @@ -0,0 +1,246 @@ +# Unified Harness Surface — PR 4: pydantic-ai Migration Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Migrate the pydantic-ai harness onto the unified harness surface so it emits streaming + persisted messages + tracing + turn usage through ONE source of truth, over both delivery channels (yield + auto-send), with no public regression — and ship its 3 integration test agents (sync/async/temporal). + +**Architecture:** Wrap a pydantic-ai run as a `HarnessTurn` (canonical `StreamTaskMessage*` stream + normalized `TurnUsage`). Reuse the existing `convert_pydantic_ai_to_agentex_events` mapping as the tap. 
Reimplement the existing public auto-send helper on top of `UnifiedEmitter.auto_send_turn`, and route sync ACP agents through `UnifiedEmitter.yield_turn`. Retire the bespoke `_pydantic_ai_tracing` handler in favor of the surface's derived spans (keep the old symbol as a deprecated shim). + +**Tech Stack:** Python 3, pydantic-ai (`pydantic_ai`), pydantic v2, pytest + pytest-asyncio, the `agentex.lib.core.harness` package from PRs 1-3. + +**Foundation:** `src/agentex/lib/core/harness/` (`UnifiedEmitter`, `SpanTracer`, `SpanDeriver`, `HarnessTurn`, `TurnUsage`, `TurnResult`, `yield_events`, `auto_send`, conformance scaffold). Design: `docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md`. + +--- + +## Dependencies (must land first) + +- **AGX1-373** — cross-channel conformance equivalence + `Full` wire reconciliation. PR 4's conformance fixtures register into the upgraded cross-channel runner. **Do not start Task 5 until 373 is merged into the foundation branch.** +- **AGX1-375** — public `adk` import path for the harness surface. If merged, import the surface via the public path in this PR; if not, import from `agentex.lib.core.harness` and add a follow-up note. (Tasks below assume `from agentex.lib.core.harness import UnifiedEmitter, TurnUsage, ...`; swap to the public path if 375 landed.) + +This is one PR (target < 1000 lines code, excluding any recorded fixtures). The 3 test agents are the largest chunk; if the diff exceeds budget, split the test agents into a follow-up PR 4b (note in the PR description). + +--- + +## File Structure + +- Modify `src/agentex/lib/adk/_modules/_pydantic_ai_sync.py` — add an optional `on_result` callback to `convert_pydantic_ai_to_agentex_events` (additive) so usage can be captured. Behavior unchanged when omitted. +- Create `src/agentex/lib/adk/_modules/_pydantic_ai_turn.py` — `PydanticAITurn(HarnessTurn)` + `pydantic_ai_usage_to_turn_usage(...)`. 
+- Modify `src/agentex/lib/adk/_modules/_pydantic_ai_async.py` — reimplement `stream_pydantic_ai_events` on `UnifiedEmitter.auto_send_turn`, preserving signature + return. +- Modify `src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py` — mark `create_pydantic_ai_tracing_handler` / `AgentexPydanticAITracingHandler` deprecated (docstring + `DeprecationWarning`); keep importable. +- Create `tests/lib/core/harness/conformance/test_pydantic_ai_conformance.py` — register pydantic-ai fixtures into the cross-channel conformance runner. +- Create `examples/tutorials/harness-pydantic-ai-{sync,async,temporal}/` — 3 test agents (modeled on the `sync-pydantic-ai` / `default-pydantic-ai` / `temporal-pydantic-ai` CLI templates) using the unified surface. +- Modify `.github/workflows/harness-integration.yml` — enable the pydantic-ai rows of the `live-matrix` job. +- Modify `.github/workflows/agentex-tutorials-test.yml` (or its agent list) — include the 3 new test agents if that workflow enumerates agents. + +--- + +## Task 1: Expose the pydantic-ai run result for usage capture + +**Files:** +- Modify: `src/agentex/lib/adk/_modules/_pydantic_ai_sync.py` +- Test: `tests/lib/adk/test_pydantic_ai_sync.py` (create if absent) + +The converter already iterates the pydantic-ai event stream and currently *ignores* `AgentRunResultEvent` (the terminal event carrying the run result + usage). Add an optional callback so a caller can capture it without changing existing behavior. 
+ +- [ ] **Step 1: Write the failing test.** + +```python +import pytest +from agentex.lib.adk._modules._pydantic_ai_sync import convert_pydantic_ai_to_agentex_events + + +class _FakeResultEvent: # stand-in for pydantic_ai.run.AgentRunResultEvent + def __init__(self, result): + self.result = result + + +async def _stream(events): + for e in events: + yield e + + +@pytest.mark.asyncio +async def test_on_result_callback_receives_terminal_event(monkeypatch): + # When the stream ends with an AgentRunResultEvent, on_result is invoked with it, + # and the converter still yields no extra events for it. + captured = {} + # Use a real AgentRunResultEvent if constructable; otherwise patch isinstance check. + # (Implementer: see Step 3 note — match the real terminal event type.) + ... +``` + +Implementer note: the exact terminal event type is `pydantic_ai.run.AgentRunResultEvent` (already imported in `_pydantic_ai_sync.py`). Write the test to feed a stream ending in a real `AgentRunResultEvent` (construct it as the installed pydantic-ai version requires; inspect `python -c "import pydantic_ai.run, inspect; print(inspect.signature(pydantic_ai.run.AgentRunResultEvent))"`). Assert `on_result` is called once with that event and that the converter yields the same `StreamTaskMessage*` sequence as without the callback (no behavior change for the streaming output). + +- [ ] **Step 2: Run** `uv run pytest tests/lib/adk/test_pydantic_ai_sync.py -v` — expect FAIL (no `on_result` param). + +- [ ] **Step 3: Implement.** Add `on_result: Callable[[AgentRunResultEvent], None] | None = None` (and an async-callable variant if needed) to `convert_pydantic_ai_to_agentex_events`. In the existing `elif isinstance(event, (FunctionToolCallEvent, FinalResultEvent, AgentRunResultEvent))` branch, when the event is an `AgentRunResultEvent` and `on_result` is set, call it (await if it's a coroutine). Keep yielding nothing for it. No other change. 
+ +- [ ] **Step 4: Run** the test — expect PASS, plus run the existing `_pydantic_ai_sync` tests if any to confirm no regression. + +- [ ] **Step 5: Commit** `feat(pydantic-ai): optional on_result callback to expose run result for usage capture`. + +--- + +## Task 2: Normalize pydantic-ai usage to `TurnUsage` + +**Files:** +- Create: `src/agentex/lib/adk/_modules/_pydantic_ai_turn.py` +- Test: `tests/lib/adk/test_pydantic_ai_turn.py` + +- [ ] **Step 1: Verify the real usage shape FIRST.** Run `uv run python -c "from pydantic_ai.usage import RunUsage; import inspect; print([f for f in RunUsage.model_fields])"` (the type/name may be `RunUsage` or `Usage` depending on the installed version). Record the exact field names (commonly: `input_tokens`, `output_tokens`, `total_tokens`, `requests`, and a cache/`details` field). The mapping in Step 3 MUST use the real field names. + +- [ ] **Step 2: Write the failing test.** + +```python +from agentex.lib.adk._modules._pydantic_ai_turn import pydantic_ai_usage_to_turn_usage + + +def test_usage_normalization_maps_fields(): + # Build a usage object matching the installed pydantic-ai RunUsage shape + # (see Task 2 Step 1 for the real fields), then assert the mapping. + usage_obj = ... # construct RunUsage(input_tokens=10, output_tokens=20, requests=2, ...) + tu = pydantic_ai_usage_to_turn_usage(usage_obj, model="openai:gpt-4o") + assert tu.model == "openai:gpt-4o" + assert tu.input_tokens == 10 + assert tu.output_tokens == 20 + assert tu.num_llm_calls == 2 +``` + +- [ ] **Step 3: Implement** `pydantic_ai_usage_to_turn_usage(usage, model) -> TurnUsage` mapping the verified RunUsage fields onto `TurnUsage` (`input_tokens`, `output_tokens`, `total_tokens`, `cached_input_tokens` if available, `num_llm_calls` ← `requests`). Use `getattr(usage, "", None)` defensively so a version field rename degrades to `None` rather than crashing. 
Then implement `PydanticAITurn`: + +```python +class PydanticAITurn: + """A pydantic-ai run as a HarnessTurn: canonical event stream + normalized usage.""" + + def __init__(self, stream, model: str | None = None): + self._stream = stream + self._model = model + self._usage = TurnUsage(model=model) + + @property + async def events(self): + def _capture(result_event): + run_result = getattr(result_event, "result", None) + usage_obj = run_result.usage() if run_result is not None else None + if usage_obj is not None: + self._usage = pydantic_ai_usage_to_turn_usage(usage_obj, self._model) + async for ev in convert_pydantic_ai_to_agentex_events(self._stream, on_result=_capture): + yield ev + + def usage(self) -> TurnUsage: + return self._usage +``` + +(Verify `run_result.usage()` is the correct accessor for the installed version; adjust if it's an attribute.) + +- [ ] **Step 4: Add a `PydanticAITurn` test** that feeds a small stream ending in an `AgentRunResultEvent` whose `result.usage()` returns a known usage, drives `turn.events` to exhaustion, then asserts `turn.usage()` reflects the normalized values and that `events` yielded the expected `StreamTaskMessage*`. Confirm `usage()` BEFORE exhaustion returns the default (documented single-pass contract). + +- [ ] **Step 5: Run** the tests — expect PASS. + +- [ ] **Step 6: Commit** `feat(pydantic-ai): PydanticAITurn HarnessTurn + usage normalization`. + +--- + +## Task 3: Reimplement the auto-send helper on the unified surface + +**Files:** +- Modify: `src/agentex/lib/adk/_modules/_pydantic_ai_async.py` +- Test: `tests/lib/adk/test_pydantic_ai_async.py` + +`stream_pydantic_ai_events(stream, task_id, ...)` currently hand-drives `adk.streaming`. Reimplement it to delegate to `UnifiedEmitter.auto_send_turn(PydanticAITurn(stream, model))`, preserving its signature and return value (the accumulated final text). Feature-add: traces by default. 
+ +- [ ] **Step 1: Capture current behavior as a characterization test.** Before changing anything, write a test that runs the CURRENT `stream_pydantic_ai_events` over a fixture stream with a fake `adk.streaming` and records the messages produced (text, tool request/response). This is the backward-compat baseline ("equivalent messages before/after" from the design). + +- [ ] **Step 2: Run** it green against the current implementation. Commit the test alone: `test(pydantic-ai): characterize stream_pydantic_ai_events output`. + +- [ ] **Step 3: Reimplement** `stream_pydantic_ai_events` to build a `PydanticAITurn` and call `UnifiedEmitter(task_id=task_id, trace_id=, parent_span_id=, streaming=).auto_send_turn(turn)`, returning `result.final_text`. Resolve `trace_id`/`parent_span_id` the same way the module does today (from the streaming/tracing context vars it already reads). Preserve the exact public signature and return type. + +- [ ] **Step 4: Run** the characterization test — it must still pass (same messages). Adjust the test only if AGX1-373 deliberately changed the tool-message wire shape; in that case assert the post-373 shape and note it. Confirm tracing now occurs by default (assert spans via a fake tracer). + +- [ ] **Step 5: Commit** `refactor(pydantic-ai): reimplement stream_pydantic_ai_events on UnifiedEmitter (default tracing)`. + +--- + +## Task 4: Route sync ACP delivery through the surface + deprecate the bespoke tracing handler + +**Files:** +- Modify: `src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py` +- (Reference) the sync ACP usage pattern in the pydantic-ai docs/templates. + +- [ ] **Step 1: Deprecate the bespoke tracing handler.** Add a `DeprecationWarning` (via `warnings.warn(...)`) and a docstring note to `create_pydantic_ai_tracing_handler` / `AgentexPydanticAITracingHandler` stating the unified surface (`UnifiedEmitter`, which derives spans from the canonical stream) supersedes it. 
Keep the symbols importable and functional (no removal — backward compat). + +- [ ] **Step 2: Confirm the sync path.** The sync tap remains `convert_pydantic_ai_to_agentex_events`. Document (in the module docstring of `_pydantic_ai_sync.py`) the recommended sync ACP usage: + +```python +turn = PydanticAITurn(agent.run_stream_events(...), model=...) +async for event in emitter.yield_turn(turn): + yield event +``` + +No code change beyond the docstring (the sync converter already yields the canonical stream; `yield_turn` adds tracing). Add a test that `emitter.yield_turn(PydanticAITurn(...))` forwards the same events the bare converter would and derives spans. + +- [ ] **Step 3: Run** tests; **Commit** `refactor(pydantic-ai): deprecate bespoke tracing handler; document unified sync path`. + +--- + +## Task 5: pydantic-ai cross-channel conformance fixtures + +**Files:** +- Create: `tests/lib/core/harness/conformance/test_pydantic_ai_conformance.py` + +**Blocked by AGX1-373** (the cross-channel conformance runner). Once 373 is merged into the foundation branch: + +- [ ] **Step 1: Record canonical fixtures.** For 3-4 representative pydantic-ai runs (text-only; single tool; reasoning/thinking; multi-step text+tool), capture the `StreamTaskMessage*` sequence the tap produces (run `convert_pydantic_ai_to_agentex_events` over recorded `AgentStreamEvent` inputs, or hand-author the canonical sequences). Store as `Fixture(name=..., events=[...])`. + +- [ ] **Step 2: Register** each fixture with the conformance runner and let the cross-channel parametrized test (from AGX1-373) assert yield-vs-auto-send equivalence + span equivalence for each. Register/parametrize within THIS module (per the runner's documented per-module registry semantics). + +- [ ] **Step 3: Run** `./scripts/test tests/lib/core/harness/ -v` — all green. **Commit** `test(pydantic-ai): cross-channel conformance fixtures`. 
+ +--- + +## Task 6: Three integration test agents (sync / async / temporal) + +**Files:** +- Create: `examples/tutorials/harness-pydantic-ai-sync/` , `…-async/` , `…-temporal/` (each a minimal Agentex agent). +- Modify: `.github/workflows/harness-integration.yml` (enable pydantic-ai `live-matrix` rows). +- Modify: `.github/workflows/agentex-tutorials-test.yml` if it enumerates agents. + +Each agent is the smallest agent that exercises one delivery channel through the unified surface with the pydantic-ai harness. + +- [ ] **Step 1: Scaffold from the existing templates.** Base each agent on the corresponding CLI template: `sync-pydantic-ai`, `default-pydantic-ai` (async), `temporal-pydantic-ai` (under `src/agentex/lib/cli/templates/`). In each, the message handler builds `PydanticAITurn(agent.run_stream_events(params.content.content), model=...)` and: + - sync agent: `async for ev in emitter.yield_turn(turn): yield ev` + - async + temporal agents: `await emitter.auto_send_turn(turn)` (temporal: inside the activity, as the template already structures it). + Use a tiny pydantic-ai agent with ONE trivial tool so the run exercises text + a tool call + tool response. + +- [ ] **Step 2: Write an integration test per agent** that drives it with a fixed prompt and asserts: valid ordered messages (text + tool request + tool response) and a well-formed span tree. Use the repo's existing tutorial-agent test harness pattern (see `agentex-tutorials-test.yml` and how current tutorial agents are tested). + +- [ ] **Step 3: Wire CI.** In `.github/workflows/harness-integration.yml`, replace the `if: false` placeholder `live-matrix` job (or add a real matrix) with the pydantic-ai × {sync, async, temporal} entries, each running its agent's integration test. If `agentex-tutorials-test.yml` enumerates agents, add the three there too. `log`/document any agent-type not covered (none expected for pydantic-ai). 
+ +- [ ] **Step 4: Run** the integration tests locally (as far as the env allows) and the conformance + unit suites. **Commit** `test(pydantic-ai): sync/async/temporal integration agents + enable CI live-matrix rows`. + +--- + +## Task 7: Full suite, type check, and backward-compat audit + +- [ ] **Step 1:** `./scripts/test tests/lib/core/harness/ tests/lib/adk/ -v` — all green on 3.12 + 3.13. +- [ ] **Step 2:** `uv run pyright src/agentex/lib/` (or the harness + pydantic modules) — 0 new errors. +- [ ] **Step 3: Backward-compat audit.** Confirm the public signatures are unchanged: `convert_pydantic_ai_to_agentex_events` (only gained an optional kwarg), `stream_pydantic_ai_events` (same signature + return), `create_pydantic_ai_tracing_handler` (still importable, now warns). Grep the repo + templates for callers and confirm none broke. +- [ ] **Step 4:** If any fix was needed, **Commit** `chore(pydantic-ai): type/back-compat fixes`. + +--- + +## Self-Review checklist (run before opening the PR) + +- Every public symbol that existed before still exists with the same signature (additive-only): `convert_pydantic_ai_to_agentex_events`, `stream_pydantic_ai_events`, `create_pydantic_ai_tracing_handler`. +- The auto-send helper returns the same final text as before (characterization test passes, or the post-373 shape is asserted with a note). +- Tracing is now on by default for both channels and is overridable (emitter `tracer=False`). +- Usage normalization uses the REAL pydantic-ai usage field names (verified in Task 2 Step 1), with defensive `getattr`. +- Conformance fixtures register per-module and pass the cross-channel assertion from AGX1-373. +- 3 test agents exist and their CI rows are enabled. +- No `# type: ignore` added without justification. + +## Notes for the PR description + +- Link AGX1-373 (dependency) and AGX1-375 (import path); note AGX1-374 (reasoning/mixed-ordering auto_send tests) is foundation-level and orthogonal. 
+- State the diff size; if test agents pushed it over budget, note the PR 4b split. +- This is the template the langgraph (PR 5) and openai (PR 6) migrations follow. diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/.dockerignore b/examples/tutorials/00_sync/harness_pydantic_ai/.dockerignore new file mode 100644 index 000000000..c49489471 --- /dev/null +++ b/examples/tutorials/00_sync/harness_pydantic_ai/.dockerignore @@ -0,0 +1,43 @@ +# Python +__pycache__/ +*.py[cod] +*$py.class +*.so +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +*.egg-info/ +.installed.cfg +*.egg + +# Environments +.env** +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# IDE +.idea/ +.vscode/ +*.swp +*.swo + +# Git +.git +.gitignore + +# Misc +.DS_Store diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/Dockerfile b/examples/tutorials/00_sync/harness_pydantic_ai/Dockerfile new file mode 100644 index 000000000..3a9412fa9 --- /dev/null +++ b/examples/tutorials/00_sync/harness_pydantic_ai/Dockerfile @@ -0,0 +1,50 @@ +# syntax=docker/dockerfile:1.3 +FROM python:3.12-slim +COPY --from=ghcr.io/astral-sh/uv:0.6.4 /uv /uvx /bin/ + +# Install system dependencies +RUN apt-get update && apt-get install -y \ + htop \ + vim \ + curl \ + tar \ + python3-dev \ + postgresql-client \ + build-essential \ + libpq-dev \ + gcc \ + cmake \ + netcat-openbsd \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +RUN uv pip install --system --upgrade pip setuptools wheel + +ENV UV_HTTP_TIMEOUT=1000 + +# Copy pyproject.toml and README.md to install dependencies +COPY 00_sync/harness_pydantic_ai/pyproject.toml /app/harness_pydantic_ai/pyproject.toml +COPY 00_sync/harness_pydantic_ai/README.md /app/harness_pydantic_ai/README.md + +WORKDIR /app/harness_pydantic_ai + +# Copy the project code +COPY 00_sync/harness_pydantic_ai/project /app/harness_pydantic_ai/project + +# Copy the test files +COPY 00_sync/harness_pydantic_ai/tests 
/app/harness_pydantic_ai/tests + +# Copy shared test utilities +COPY test_utils /app/test_utils + +# Install the required Python packages with dev dependencies +RUN uv pip install --system .[dev] + +# Set environment variables +ENV PYTHONPATH=/app + +# Set test environment variables +ENV AGENT_NAME=s-harness-pydantic-ai + +# Run the agent using uvicorn +CMD ["uvicorn", "project.acp:acp", "--host", "0.0.0.0", "--port", "8000"] diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/README.md b/examples/tutorials/00_sync/harness_pydantic_ai/README.md new file mode 100644 index 000000000..1466bc4e7 --- /dev/null +++ b/examples/tutorials/00_sync/harness_pydantic_ai/README.md @@ -0,0 +1,54 @@ +# Sync Pydantic AI Harness Test Agent + +A minimal **synchronous** Pydantic AI agent that drives the **unified harness +surface** (`UnifiedEmitter.yield_turn` + `PydanticAITurn`) on the sync +(HTTP-yield) channel. + +## Why this agent exists + +The `00_sync/040_pydantic_ai` tutorial streams via the bare +`convert_pydantic_ai_to_agentex_events` converter and does **not** exercise the +unified `yield_turn` path. This harness test agent is the sync coverage for the +unified surface: it proves an agent author can wire the sync channel through +`UnifiedEmitter` and get automatic span derivation (tool spans nested under the +per-turn span) for free, exactly like the async/temporal channels. + +## How it wires the unified surface + +In `project/acp.py`: + +```python +emitter = UnifiedEmitter( + task_id=task_id, + trace_id=task_id, + parent_span_id=turn_span.id if turn_span else None, +) +async with agent.run_stream_events(user_message) as stream: + turn = PydanticAITurn(stream, model=MODEL_NAME) # coalesce off: stream tool-call arg tokens + async for ev in emitter.yield_turn(turn): + yield ev +``` + +- `coalesce_tool_requests=False` (the default) preserves token-by-token + tool-call argument streaming on the sync channel. 
+- The `UnifiedEmitter` is constructed from the ACP/streaming context + (`task_id` + `trace_id` + `parent_span_id`) so tool spans nest under the + per-turn `AGENT_WORKFLOW` span automatically. + +## Files + +- `project/acp.py` — sync ACP handler using `emitter.yield_turn(...)`. +- `project/agent.py` — builds the `pydantic_ai.Agent` with one tool. +- `project/tools.py` — `get_weather(city)` returning a constant. +- `tests/test_agent.py` — live integration test (requires a running agent). + +## Tools + +- `get_weather(city: str) -> str`: returns a fixed "sunny and 72°F" string so a + run deterministically exercises text + a tool call + a tool response. + +## Offline coverage + +Offline integration tests for the same wiring (pydantic-ai `TestModel` + fake +streaming/tracing, no network) live in the SDK repo at +`tests/lib/core/harness/test_harness_pydantic_ai_sync.py`. diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/manifest.yaml b/examples/tutorials/00_sync/harness_pydantic_ai/manifest.yaml new file mode 100644 index 000000000..55d8f5d2b --- /dev/null +++ b/examples/tutorials/00_sync/harness_pydantic_ai/manifest.yaml @@ -0,0 +1,58 @@ +build: + context: + root: ../../ + include_paths: + - 00_sync/harness_pydantic_ai + - test_utils + dockerfile: 00_sync/harness_pydantic_ai/Dockerfile + dockerignore: 00_sync/harness_pydantic_ai/.dockerignore + +local_development: + agent: + port: 8000 + host_address: host.docker.internal + paths: + acp: project/acp.py + +agent: + acp_type: sync + name: s-harness-pydantic-ai + description: A sync Pydantic AI harness test agent using the unified emitter surface + + temporal: + enabled: false + + credentials: + - env_var_name: OPENAI_API_KEY + secret_name: openai-api-key + secret_key: api-key + - env_var_name: REDIS_URL + secret_name: redis-url-secret + secret_key: url + - env_var_name: SGP_API_KEY + secret_name: sgp-api-key + secret_key: api-key + - env_var_name: SGP_ACCOUNT_ID + secret_name: sgp-account-id + secret_key: 
account-id + - env_var_name: SGP_CLIENT_BASE_URL + secret_name: sgp-client-base-url + secret_key: url + +deployment: + image: + repository: "" + tag: "latest" + + global: + agent: + name: "s-harness-pydantic-ai" + description: "A sync Pydantic AI harness test agent using the unified emitter surface" + replicaCount: 1 + resources: + requests: + cpu: "500m" + memory: "1Gi" + limits: + cpu: "1000m" + memory: "2Gi" diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/project/__init__.py b/examples/tutorials/00_sync/harness_pydantic_ai/project/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/project/acp.py b/examples/tutorials/00_sync/harness_pydantic_ai/project/acp.py new file mode 100644 index 000000000..d78921c8e --- /dev/null +++ b/examples/tutorials/00_sync/harness_pydantic_ai/project/acp.py @@ -0,0 +1,92 @@ +"""ACP handler for the sync harness Pydantic AI test agent. + +This agent exercises the UNIFIED HARNESS SURFACE on the sync (HTTP-yield) +channel — ``UnifiedEmitter.yield_turn(PydanticAITurn(...))`` — rather than the +bare ``convert_pydantic_ai_to_agentex_events`` converter used by the +``040_pydantic_ai`` tutorial. The unified surface gives the sync channel the +same tracing (span derivation) the async/temporal channels get for free. + +Flow: +1. Open a per-turn AGENT_WORKFLOW span via ``adk.tracing.span``. +2. Construct a ``UnifiedEmitter`` from the ACP/streaming context (task_id + + trace_id + parent_span_id) so tool spans nest under the turn span. +3. Wrap ``agent.run_stream_events(...)`` in a ``PydanticAITurn`` and forward + events with ``emitter.yield_turn(turn)`` — yielding each to the client. 
+""" + +from __future__ import annotations + +import os +from typing import AsyncGenerator + +from dotenv import load_dotenv + +load_dotenv() + +import agentex.lib.adk as adk +from project.agent import MODEL_NAME, create_agent +from agentex.lib.types.acp import SendMessageParams +from agentex.lib.core.harness import UnifiedEmitter +from agentex.lib.types.tracing import SGPTracingProcessorConfig +from agentex.lib.utils.logging import make_logger +from agentex.lib.sdk.fastacp.fastacp import FastACP +from agentex.types.task_message_update import TaskMessageUpdate +from agentex.types.task_message_content import TaskMessageContent +from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn +from agentex.lib.core.tracing.tracing_processor_manager import add_tracing_processor_config + +logger = make_logger(__name__) + +add_tracing_processor_config( + SGPTracingProcessorConfig( + sgp_api_key=os.environ.get("SGP_API_KEY", ""), + sgp_account_id=os.environ.get("SGP_ACCOUNT_ID", ""), + sgp_base_url=os.environ.get("SGP_CLIENT_BASE_URL", ""), + ) +) + +acp = FastACP.create(acp_type="sync") + +_agent = None + + +def get_agent(): + """Get or create the Pydantic AI agent instance.""" + global _agent + if _agent is None: + _agent = create_agent() + return _agent + + +@acp.on_message_send +async def handle_message_send( + params: SendMessageParams, +) -> TaskMessageContent | list[TaskMessageContent] | AsyncGenerator[TaskMessageUpdate, None]: + """Handle incoming messages, streaming events through the unified surface.""" + agent = get_agent() + task_id = params.task.id + + user_message = params.content.content + logger.info(f"Processing message for task {task_id}") + + async with adk.tracing.span( + trace_id=task_id, + task_id=task_id, + name="message", + input={"message": user_message}, + data={"__span_type__": "AGENT_WORKFLOW"}, + ) as turn_span: + # Construct the UnifiedEmitter from the ACP/streaming context so tracing + # is automatic: tool spans nest under this turn's 
span. + emitter = UnifiedEmitter( + task_id=task_id, + trace_id=task_id, + parent_span_id=turn_span.id if turn_span else None, + ) + + async with agent.run_stream_events(user_message) as stream: + # Default coalesce_tool_requests=False preserves token-by-token + # tool-call argument streaming on the sync/HTTP channel. + turn = PydanticAITurn(stream, model=MODEL_NAME) + async for ev in emitter.yield_turn(turn): + yield ev diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/project/agent.py b/examples/tutorials/00_sync/harness_pydantic_ai/project/agent.py new file mode 100644 index 000000000..72fd74173 --- /dev/null +++ b/examples/tutorials/00_sync/harness_pydantic_ai/project/agent.py @@ -0,0 +1,39 @@ +"""Pydantic AI agent definition for the sync harness test agent. + +The Agent is the boundary between this module and the API layer (acp.py). +Pydantic AI handles its own tool-call loop internally — no graph required. +""" + +from __future__ import annotations + +from datetime import datetime + +from pydantic_ai import Agent + +from project.tools import get_weather + +__all__ = ["create_agent", "MODEL_NAME"] + +MODEL_NAME = "openai:gpt-4o-mini" +SYSTEM_PROMPT = """You are a helpful AI assistant with access to tools. 
+ +Current date and time: {timestamp} + +Guidelines: +- Be concise and helpful +- Use tools when they would help answer the user's question +- If you're unsure, ask clarifying questions +- Always provide accurate information +""" + + +def create_agent() -> Agent: + """Build and return the Pydantic AI agent with tools registered.""" + agent = Agent( + MODEL_NAME, + system_prompt=SYSTEM_PROMPT.format(timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S")), + ) + + agent.tool_plain(get_weather) + + return agent diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/project/tools.py b/examples/tutorials/00_sync/harness_pydantic_ai/project/tools.py new file mode 100644 index 000000000..d649c75f1 --- /dev/null +++ b/examples/tutorials/00_sync/harness_pydantic_ai/project/tools.py @@ -0,0 +1,20 @@ +"""Tool definitions for the sync harness Pydantic AI agent. + +Pydantic AI tools are registered directly on the Agent via decorators +(see project.agent). This module hosts the bare function so it is easy to +unit-test in isolation. +""" + +from __future__ import annotations + + +def get_weather(city: str) -> str: + """Get the current weather for a city. + + Args: + city: The name of the city to get weather for. + + Returns: + A string describing the weather conditions. 
+ """ + return f"The weather in {city} is sunny and 72°F" diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/pyproject.toml b/examples/tutorials/00_sync/harness_pydantic_ai/pyproject.toml new file mode 100644 index 000000000..08f709a4a --- /dev/null +++ b/examples/tutorials/00_sync/harness_pydantic_ai/pyproject.toml @@ -0,0 +1,36 @@ +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "s-harness-pydantic-ai" +version = "0.1.0" +description = "A sync Pydantic AI harness test agent using the unified emitter surface" +readme = "README.md" +requires-python = ">=3.12" +dependencies = [ + "agentex-sdk", + "scale-gp", + "pydantic-ai-slim[openai]>=1.0,<2", +] + +[project.optional-dependencies] +dev = [ + "pytest", + "pytest-asyncio", + "httpx", + "black", + "isort", + "flake8", +] + +[tool.hatch.build.targets.wheel] +packages = ["project"] + +[tool.black] +line-length = 88 +target-version = ['py312'] + +[tool.isort] +profile = "black" +line_length = 88 diff --git a/examples/tutorials/00_sync/harness_pydantic_ai/tests/test_agent.py b/examples/tutorials/00_sync/harness_pydantic_ai/tests/test_agent.py new file mode 100644 index 000000000..96da95fdc --- /dev/null +++ b/examples/tutorials/00_sync/harness_pydantic_ai/tests/test_agent.py @@ -0,0 +1,138 @@ +"""Live tests for the sync harness Pydantic AI agent. + +These tests require a running agent (server + deployed agent) and exercise the +unified-surface sync handler end-to-end over the wire. They mirror the +``040_pydantic_ai`` tutorial tests but target this harness agent. + +Offline coverage of the same wiring (TestModel + fake streaming/tracing) lives +in ``tests/lib/core/harness/test_harness_pydantic_ai_sync.py`` in the SDK repo. + +To run these tests: +1. Make sure the agent is running (via docker-compose or `agentex agents run`) +2. Set the AGENTEX_API_BASE_URL environment variable if not using default +3. 
Run: pytest test_agent.py -v + +Configuration: +- AGENTEX_API_BASE_URL: Base URL for the AgentEx server (default: http://localhost:5003) +- AGENT_NAME: Name of the agent to test (default: s-harness-pydantic-ai) +""" + +import os + +import pytest +from test_utils.sync import validate_text_in_string, collect_streaming_response + +from agentex import Agentex +from agentex.types import TextContentParam +from agentex.types.agent_rpc_params import ParamsSendMessageRequest + +AGENTEX_API_BASE_URL = os.environ.get("AGENTEX_API_BASE_URL", "http://localhost:5003") +AGENT_NAME = os.environ.get("AGENT_NAME", "s-harness-pydantic-ai") + + +@pytest.fixture +def client(): + """Create an AgentEx client instance for testing.""" + return Agentex(base_url=AGENTEX_API_BASE_URL) + + +@pytest.fixture +def agent_name(): + """Return the agent name for testing.""" + return AGENT_NAME + + +@pytest.fixture +def agent_id(client, agent_name): + """Retrieve the agent ID based on the agent name.""" + agents = client.agents.list() + for agent in agents: + if agent.name == agent_name: + return agent.id + raise ValueError(f"Agent with name {agent_name} not found.") + + +class TestNonStreamingMessages: + """Test non-streaming message sending with the unified-surface sync agent.""" + + def test_send_simple_message(self, client: Agentex, agent_name: str): + """Test sending a simple message and receiving a response.""" + response = client.agents.send_message( + agent_name=agent_name, + params=ParamsSendMessageRequest( + content=TextContentParam( + author="user", + content="Hello! 
What can you help me with?", + type="text", + ) + ), + ) + result = response.result + assert result is not None + assert len(result) >= 1 + + def test_tool_calling(self, client: Agentex, agent_name: str): + """Test that the agent can use tools (e.g., weather tool).""" + response = client.agents.send_message( + agent_name=agent_name, + params=ParamsSendMessageRequest( + content=TextContentParam( + author="user", + content="What's the weather in San Francisco?", + type="text", + ) + ), + ) + result = response.result + assert result is not None + assert len(result) >= 1 + + +class TestStreamingMessages: + """Test streaming message sending through the unified yield_turn path.""" + + def test_stream_simple_message(self, client: Agentex, agent_name: str): + """Test streaming a simple message response.""" + stream = client.agents.send_message_stream( + agent_name=agent_name, + params=ParamsSendMessageRequest( + content=TextContentParam( + author="user", + content="Tell me a short joke.", + type="text", + ) + ), + ) + + aggregated_content, chunks = collect_streaming_response(stream) + + assert aggregated_content is not None + assert len(chunks) > 1, "No chunks received in streaming response." + + def test_stream_tool_calling(self, client: Agentex, agent_name: str): + """Test streaming with tool calls through the unified surface. + + Exercises token-by-token tool-call argument streaming (coalesce off), + which the unified yield_turn path preserves on the sync channel. + """ + stream = client.agents.send_message_stream( + agent_name=agent_name, + params=ParamsSendMessageRequest( + content=TextContentParam( + author="user", + content="What's the weather in New York? Respond with the temperature.", + type="text", + ) + ), + ) + + aggregated_content, chunks = collect_streaming_response(stream) + + assert aggregated_content is not None + assert len(chunks) > 0, "No chunks received in streaming response." 
+ # The weather tool always returns "72°F", so the agent's reply should mention it. + validate_text_in_string("72", aggregated_content) + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/.dockerignore b/examples/tutorials/10_async/00_base/harness_pydantic_ai/.dockerignore new file mode 100644 index 000000000..c49489471 --- /dev/null +++ b/examples/tutorials/10_async/00_base/harness_pydantic_ai/.dockerignore @@ -0,0 +1,43 @@ +# Python +__pycache__/ +*.py[cod] +*$py.class +*.so +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +*.egg-info/ +.installed.cfg +*.egg + +# Environments +.env** +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# IDE +.idea/ +.vscode/ +*.swp +*.swo + +# Git +.git +.gitignore + +# Misc +.DS_Store diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/Dockerfile b/examples/tutorials/10_async/00_base/harness_pydantic_ai/Dockerfile new file mode 100644 index 000000000..3c1b9dfea --- /dev/null +++ b/examples/tutorials/10_async/00_base/harness_pydantic_ai/Dockerfile @@ -0,0 +1,50 @@ +# syntax=docker/dockerfile:1.3 +FROM python:3.12-slim +COPY --from=ghcr.io/astral-sh/uv:0.6.4 /uv /uvx /bin/ + +# Install system dependencies +RUN apt-get update && apt-get install -y \ + htop \ + vim \ + curl \ + tar \ + python3-dev \ + postgresql-client \ + build-essential \ + libpq-dev \ + gcc \ + cmake \ + netcat-openbsd \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +RUN uv pip install --system --upgrade pip setuptools wheel + +ENV UV_HTTP_TIMEOUT=1000 + +# Copy pyproject.toml and README.md to install dependencies +COPY 10_async/00_base/harness_pydantic_ai/pyproject.toml /app/harness_pydantic_ai/pyproject.toml +COPY 10_async/00_base/harness_pydantic_ai/README.md /app/harness_pydantic_ai/README.md + +WORKDIR /app/harness_pydantic_ai + +# Copy the project code +COPY 
10_async/00_base/harness_pydantic_ai/project /app/harness_pydantic_ai/project + +# Copy the test files +COPY 10_async/00_base/harness_pydantic_ai/tests /app/harness_pydantic_ai/tests + +# Copy shared test utilities +COPY test_utils /app/test_utils + +# Install the required Python packages with dev dependencies +RUN uv pip install --system .[dev] pytest-asyncio httpx + +# Set environment variables +ENV PYTHONPATH=/app + +# Set test environment variables +ENV AGENT_NAME=ab-harness-pydantic-ai + +# Run the agent using uvicorn +CMD ["uvicorn", "project.acp:acp", "--host", "0.0.0.0", "--port", "8000"] diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/README.md b/examples/tutorials/10_async/00_base/harness_pydantic_ai/README.md new file mode 100644 index 000000000..51acb62bd --- /dev/null +++ b/examples/tutorials/10_async/00_base/harness_pydantic_ai/README.md @@ -0,0 +1,54 @@ +# Async Pydantic AI Harness Test Agent + +A minimal **async** (Redis-streaming) Pydantic AI agent that drives the +**unified harness surface** (`UnifiedEmitter.auto_send_turn` + `PydanticAITurn`) +directly. + +## Why this agent exists + +The `10_async/00_base/110_pydantic_ai` tutorial streams via the +`stream_pydantic_ai_events` helper (which uses the unified surface internally). +This harness test agent calls `emitter.auto_send_turn(...)` **explicitly** at the +agent-author level, making the unified-surface wiring visible and giving the +async channel direct coverage. 
+ +## How it wires the unified surface + +In `project/acp.py`: + +```python +emitter = UnifiedEmitter( + task_id=task_id, + trace_id=task_id, + parent_span_id=turn_span.id if turn_span else None, +) +async with agent.run_stream_events(user_message, message_history=previous_messages) as stream: + turn = PydanticAITurn(tee_messages(stream), model=MODEL_NAME, coalesce_tool_requests=True) + result = await emitter.auto_send_turn(turn) +``` + +- `coalesce_tool_requests=True` is required on the async/auto_send path until + AGX1-377 lands: tool requests are delivered as a single `Full(tool_request)` + rather than streamed `Start + Delta + Done`. +- The `UnifiedEmitter` is constructed from the ACP context (`task_id` + + `trace_id` + `parent_span_id`) so messages auto-send to the task stream + (Redis) and tracing is automatic. +- Multi-turn memory is persisted via `adk.state` (pydantic-ai message history + round-tripped through `ModelMessagesTypeAdapter`). + +## Files + +- `project/acp.py` — async ACP handler using `emitter.auto_send_turn(...)`. +- `project/agent.py` — builds the `pydantic_ai.Agent` with one tool. +- `project/tools.py` — `get_weather(city)` returning a constant. +- `tests/test_agent.py` — live integration test (requires a running agent). + +## Tools + +- `get_weather(city: str) -> str`: returns a fixed "sunny and 72°F" string. + +## Offline coverage + +Offline integration tests for the same wiring (pydantic-ai `TestModel` + fake +streaming/tracing, no network) live in the SDK repo at +`tests/lib/core/harness/test_harness_pydantic_ai_async.py`. 
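## Sketch: the pass-through capture pattern

The `tee_messages` wrapper in `project/acp.py` forwards every event unchanged while recording the message history carried by the terminal event. The following is a minimal sketch of that pattern using only the standard library. `Delta`, `Final`, `fake_stream`, `make_tee`, and `consume` are hypothetical stand-ins introduced for illustration; they are not part of pydantic-ai or the Agentex SDK, and the real handler checks for `AgentRunResultEvent` instead of `Final`.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator


@dataclass
class Delta:
    """Hypothetical stand-in for an incremental stream event."""

    text: str


@dataclass
class Final:
    """Hypothetical stand-in for the terminal result event."""

    messages: list[str]


async def fake_stream() -> AsyncIterator[Any]:
    """Hypothetical upstream: two deltas followed by one terminal event."""
    yield Delta("sun")
    yield Delta("ny")
    yield Final(messages=["user: weather?", "assistant: sunny"])


def make_tee(captured: list[str]):
    """Return a wrapper that re-yields every event and records the final history."""

    async def tee(upstream: AsyncIterator[Any]) -> AsyncIterator[Any]:
        async for event in upstream:
            if isinstance(event, Final):
                # Replace the list contents in place so the caller's reference stays valid.
                captured[:] = list(event.messages)
            yield event

    return tee


async def consume() -> tuple[list[Any], list[str]]:
    """Drain the wrapped stream and return (events seen, captured history)."""
    captured: list[str] = []
    tee = make_tee(captured)
    seen = [event async for event in tee(fake_stream())]
    return seen, captured


seen, captured = asyncio.run(consume())
```

The downstream consumer still sees all three events, so wrapping the stream this way should not change what the emitter delivers. The captured list is only populated once the terminal event has passed through, which is why the handler saves history after the turn completes.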
diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/manifest.yaml b/examples/tutorials/10_async/00_base/harness_pydantic_ai/manifest.yaml new file mode 100644 index 000000000..f9e50f329 --- /dev/null +++ b/examples/tutorials/10_async/00_base/harness_pydantic_ai/manifest.yaml @@ -0,0 +1,58 @@ +build: + context: + root: ../../../ + include_paths: + - 10_async/00_base/harness_pydantic_ai + - test_utils + dockerfile: 10_async/00_base/harness_pydantic_ai/Dockerfile + dockerignore: 10_async/00_base/harness_pydantic_ai/.dockerignore + +local_development: + agent: + port: 8000 + host_address: host.docker.internal + paths: + acp: project/acp.py + +agent: + acp_type: async + name: ab-harness-pydantic-ai + description: An async Pydantic AI harness test agent using the unified emitter surface + + temporal: + enabled: false + + credentials: + - env_var_name: OPENAI_API_KEY + secret_name: openai-api-key + secret_key: api-key + - env_var_name: REDIS_URL + secret_name: redis-url-secret + secret_key: url + - env_var_name: SGP_API_KEY + secret_name: sgp-api-key + secret_key: api-key + - env_var_name: SGP_ACCOUNT_ID + secret_name: sgp-account-id + secret_key: account-id + - env_var_name: SGP_CLIENT_BASE_URL + secret_name: sgp-client-base-url + secret_key: url + +deployment: + image: + repository: "" + tag: "latest" + + global: + agent: + name: "ab-harness-pydantic-ai" + description: "An async Pydantic AI harness test agent using the unified emitter surface" + replicaCount: 1 + resources: + requests: + cpu: "500m" + memory: "1Gi" + limits: + cpu: "1000m" + memory: "2Gi" diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/__init__.py b/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/acp.py b/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/acp.py new file mode 100644 index 
000000000..1b7f77508 --- /dev/null +++ b/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/acp.py @@ -0,0 +1,161 @@ +"""ACP handler for the async harness Pydantic AI test agent. + +This agent exercises the UNIFIED HARNESS SURFACE on the async (Redis-streaming) +channel — ``UnifiedEmitter.auto_send_turn(PydanticAITurn(..., coalesce_tool_requests=True))`` +— calling it directly rather than via the ``stream_pydantic_ai_events`` helper +(which the ``110_pydantic_ai`` tutorial uses). This makes the unified-surface +wiring explicit at the agent-author level. + +Multi-turn memory is persisted via ``adk.state``: on each turn we load the +previous pydantic-ai ``message_history`` from state, run the agent with it, +then save the updated history back. +""" + +from __future__ import annotations + +import os +from typing import Any, AsyncIterator + +from dotenv import load_dotenv + +load_dotenv() + +from pydantic_ai.run import AgentRunResultEvent +from pydantic_ai.messages import ModelMessagesTypeAdapter + +import agentex.lib.adk as adk +from project.agent import MODEL_NAME, create_agent +from agentex.lib.types.acp import SendEventParams, CancelTaskParams, CreateTaskParams +from agentex.lib.core.harness import UnifiedEmitter +from agentex.lib.types.fastacp import AsyncACPConfig +from agentex.lib.types.tracing import SGPTracingProcessorConfig +from agentex.lib.utils.logging import make_logger +from agentex.lib.utils.model_utils import BaseModel +from agentex.lib.sdk.fastacp.fastacp import FastACP +from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn +from agentex.lib.core.tracing.tracing_processor_manager import add_tracing_processor_config + +logger = make_logger(__name__) + +add_tracing_processor_config( + SGPTracingProcessorConfig( + sgp_api_key=os.environ.get("SGP_API_KEY", ""), + sgp_account_id=os.environ.get("SGP_ACCOUNT_ID", ""), + sgp_base_url=os.environ.get("SGP_CLIENT_BASE_URL", ""), + ) +) + +acp = FastACP.create( + acp_type="async", + 
config=AsyncACPConfig(type="base"), +) + +_agent = None + + +def get_agent(): + global _agent + if _agent is None: + _agent = create_agent() + return _agent + + +class ConversationState(BaseModel): + """Per-task conversation state persisted via ``adk.state``. + + ``history_json`` holds the pydantic-ai message history serialized by + ``ModelMessagesTypeAdapter`` — pydantic-ai's official way to round-trip + ``ModelMessage`` objects through JSON. + """ + + history_json: str = "[]" + turn_number: int = 0 + + +@acp.on_task_create +async def handle_task_create(params: CreateTaskParams): + """Initialize per-task state on task creation.""" + logger.info(f"Task created: {params.task.id}") + await adk.state.create( + task_id=params.task.id, + agent_id=params.agent.id, + state=ConversationState(), + ) + + +@acp.on_task_event_send +async def handle_task_event_send(params: SendEventParams): + """Handle each user message through the unified auto_send_turn path.""" + agent = get_agent() + task_id = params.task.id + agent_id = params.agent.id + user_message = params.event.content.content + + logger.info(f"Processing message for thread {task_id}") + + # Echo the user's message into the task history. + await adk.messages.create(task_id=task_id, content=params.event.content) + + # Load the previous conversation history from state (fall back to fresh). 
+ task_state = await adk.state.get_by_task_and_agent(task_id=task_id, agent_id=agent_id) + if task_state is None: + state = ConversationState() + task_state = await adk.state.create(task_id=task_id, agent_id=agent_id, state=state) + else: + state = ConversationState.model_validate(task_state.state) + + state.turn_number += 1 + previous_messages = ModelMessagesTypeAdapter.validate_json(state.history_json) + + async with adk.tracing.span( + trace_id=task_id, + task_id=task_id, + name=f"Turn {state.turn_number}", + input={"message": user_message}, + data={"__span_type__": "AGENT_WORKFLOW"}, + ) as turn_span: + # Construct the UnifiedEmitter from the ACP context so tracing is + # automatic and messages are auto-sent to the task stream (Redis). + emitter = UnifiedEmitter( + task_id=task_id, + trace_id=task_id, + parent_span_id=turn_span.id if turn_span else None, + ) + + # Capture the terminal AgentRunResultEvent to persist message history. + captured_messages: list[Any] = [] + + async def tee_messages(upstream) -> AsyncIterator[Any]: + async for event in upstream: + if isinstance(event, AgentRunResultEvent): + captured_messages[:] = list(event.result.all_messages()) + yield event + + async with agent.run_stream_events(user_message, message_history=previous_messages) as stream: + # coalesce_tool_requests=True is required on the async/auto_send + # path until AGX1-377 lands: tool requests are delivered as a single + # Full(tool_request) rather than streamed Start+Delta+Done. + turn = PydanticAITurn( + tee_messages(stream), + model=MODEL_NAME, + coalesce_tool_requests=True, + ) + result = await emitter.auto_send_turn(turn) + + # Save the updated message history so the next turn picks up here. 
+ if captured_messages: + state.history_json = ModelMessagesTypeAdapter.dump_json(captured_messages).decode() + await adk.state.update( + state_id=task_state.id, + task_id=task_id, + agent_id=agent_id, + state=state, + ) + + if turn_span: + turn_span.output = {"final_output": result.final_text} + + +@acp.on_task_cancel +async def handle_task_canceled(params: CancelTaskParams): + logger.info(f"Task canceled: {params.task.id}") diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/agent.py b/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/agent.py new file mode 100644 index 000000000..e7b764d82 --- /dev/null +++ b/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/agent.py @@ -0,0 +1,39 @@ +"""Pydantic AI agent definition for the async harness test agent. + +The Agent is the boundary between this module and the API layer (acp.py). +Pydantic AI handles its own tool-call loop internally — no graph required. +""" + +from __future__ import annotations + +from datetime import datetime + +from pydantic_ai import Agent + +from project.tools import get_weather + +__all__ = ["create_agent", "MODEL_NAME"] + +MODEL_NAME = "openai:gpt-4o-mini" +SYSTEM_PROMPT = """You are a helpful AI assistant with access to tools. 
+ +Current date and time: {timestamp} + +Guidelines: +- Be concise and helpful +- Use tools when they would help answer the user's question +- If you're unsure, ask clarifying questions +- Always provide accurate information +""" + + +def create_agent() -> Agent: + """Build and return the Pydantic AI agent with tools registered.""" + agent = Agent( + MODEL_NAME, + system_prompt=SYSTEM_PROMPT.format(timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S")), + ) + + agent.tool_plain(get_weather) + + return agent diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/tools.py b/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/tools.py new file mode 100644 index 000000000..0f16a7cb0 --- /dev/null +++ b/examples/tutorials/10_async/00_base/harness_pydantic_ai/project/tools.py @@ -0,0 +1,20 @@ +"""Tool definitions for the async harness Pydantic AI agent. + +Pydantic AI tools are registered directly on the Agent via decorators +(see project.agent). This module hosts the bare function so it is easy to +unit-test in isolation. +""" + +from __future__ import annotations + + +def get_weather(city: str) -> str: + """Get the current weather for a city. + + Args: + city: The name of the city to get weather for. + + Returns: + A string describing the weather conditions. 
+ """ + return f"The weather in {city} is sunny and 72°F" diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/pyproject.toml b/examples/tutorials/10_async/00_base/harness_pydantic_ai/pyproject.toml new file mode 100644 index 000000000..3dc1e0e41 --- /dev/null +++ b/examples/tutorials/10_async/00_base/harness_pydantic_ai/pyproject.toml @@ -0,0 +1,36 @@ +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "ab-harness-pydantic-ai" +version = "0.1.0" +description = "An async Pydantic AI harness test agent using the unified emitter surface" +readme = "README.md" +requires-python = ">=3.12" +dependencies = [ + "agentex-sdk", + "scale-gp", + "pydantic-ai-slim[openai]>=1.0,<2", +] + +[project.optional-dependencies] +dev = [ + "pytest", + "pytest-asyncio", + "httpx", + "black", + "isort", + "flake8", +] + +[tool.hatch.build.targets.wheel] +packages = ["project"] + +[tool.black] +line-length = 88 +target-version = ['py312'] + +[tool.isort] +profile = "black" +line_length = 88 diff --git a/examples/tutorials/10_async/00_base/harness_pydantic_ai/tests/test_agent.py b/examples/tutorials/10_async/00_base/harness_pydantic_ai/tests/test_agent.py new file mode 100644 index 000000000..11098c7d5 --- /dev/null +++ b/examples/tutorials/10_async/00_base/harness_pydantic_ai/tests/test_agent.py @@ -0,0 +1,118 @@ +"""Live tests for the async harness Pydantic AI agent. + +These tests require a running agent (server + deployed agent) and exercise the +unified-surface async handler end-to-end over the wire. They mirror the +``110_pydantic_ai`` async tutorial tests but target this harness agent. + +Offline coverage of the same wiring (TestModel + fake streaming/tracing) lives +in ``tests/lib/core/harness/test_harness_pydantic_ai_async.py`` in the SDK repo. + +To run these tests: +1. Make sure the agent is running (via docker-compose or `agentex agents run`) +2. 
Set the AGENTEX_API_BASE_URL environment variable if not using default +3. Run: pytest test_agent.py -v + +Configuration: +- AGENTEX_API_BASE_URL: Base URL for the AgentEx server (default: http://localhost:5003) +- AGENT_NAME: Name of the agent to test (default: ab-harness-pydantic-ai) +""" + +import os + +import pytest +import pytest_asyncio + +from agentex import AsyncAgentex +from agentex.types import TextContentParam +from agentex.types.agent_rpc_params import ParamsCreateTaskRequest +from agentex.lib.sdk.fastacp.base.base_acp_server import uuid + +AGENTEX_API_BASE_URL = os.environ.get("AGENTEX_API_BASE_URL", "http://localhost:5003") +AGENT_NAME = os.environ.get("AGENT_NAME", "ab-harness-pydantic-ai") + + +@pytest_asyncio.fixture +async def client(): + """Create an AsyncAgentex client instance for testing.""" + client = AsyncAgentex(base_url=AGENTEX_API_BASE_URL) + yield client + await client.close() + + +@pytest.fixture +def agent_name(): + """Return the agent name for testing.""" + return AGENT_NAME + + +@pytest_asyncio.fixture +async def agent_id(client, agent_name): + """Retrieve the agent ID based on the agent name.""" + agents = await client.agents.list() + for agent in agents: + if agent.name == agent_name: + return agent.id + raise ValueError(f"Agent with name {agent_name} not found.") + + +class TestNonStreamingEvents: + """Test non-streaming event sending through the unified auto_send_turn path.""" + + @pytest.mark.asyncio + async def test_send_event(self, client: AsyncAgentex, agent_id: str): + """Test sending an event to the async harness Pydantic AI agent.""" + task_response = await client.agents.create_task(agent_id, params=ParamsCreateTaskRequest(name=uuid.uuid1().hex)) + task = task_response.result + assert task is not None + + event_content = TextContentParam( + type="text", + author="user", + content="Hello! 
What can you help me with?", + ) + await client.agents.send_event( + agent_id=agent_id, + params={"task_id": task.id, "content": event_content}, + ) + + @pytest.mark.asyncio + async def test_tool_calling(self, client: AsyncAgentex, agent_id: str): + """Test that the agent can use tools (e.g., weather tool).""" + task_response = await client.agents.create_task(agent_id, params=ParamsCreateTaskRequest(name=uuid.uuid1().hex)) + task = task_response.result + assert task is not None + + event_content = TextContentParam( + type="text", + author="user", + content="What's the weather in San Francisco?", + ) + await client.agents.send_event( + agent_id=agent_id, + params={"task_id": task.id, "content": event_content}, + ) + + +class TestStreamingEvents: + """Test streaming event sending.""" + + @pytest.mark.asyncio + async def test_send_event_and_stream(self, client: AsyncAgentex, agent_id: str): + """Test sending an event and streaming the response.""" + task_response = await client.agents.create_task(agent_id, params=ParamsCreateTaskRequest(name=uuid.uuid1().hex)) + task = task_response.result + assert task is not None + + event_content = TextContentParam( + type="text", + author="user", + content="Tell me a short joke.", + ) + await client.agents.send_event( + agent_id=agent_id, + params={"task_id": task.id, "content": event_content}, + ) + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/.dockerignore b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/.dockerignore new file mode 100644 index 000000000..c49489471 --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/.dockerignore @@ -0,0 +1,43 @@ +# Python +__pycache__/ +*.py[cod] +*$py.class +*.so +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +*.egg-info/ +.installed.cfg +*.egg + +# Environments +.env** +.venv +env/ +venv/ +ENV/ 
+env.bak/ +venv.bak/ + +# IDE +.idea/ +.vscode/ +*.swp +*.swo + +# Git +.git +.gitignore + +# Misc +.DS_Store diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/Dockerfile b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/Dockerfile new file mode 100644 index 000000000..98c74c6e8 --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/Dockerfile @@ -0,0 +1,43 @@ +# syntax=docker/dockerfile:1.3 +FROM python:3.12-slim +COPY --from=ghcr.io/astral-sh/uv:0.6.4 /uv /uvx /bin/ + +# Install system dependencies +RUN apt-get update && apt-get install -y \ + htop \ + vim \ + curl \ + tar \ + python3-dev \ + postgresql-client \ + build-essential \ + libpq-dev \ + gcc \ + cmake \ + netcat-openbsd \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +RUN uv pip install --system --upgrade pip setuptools wheel + +ENV UV_HTTP_TIMEOUT=1000 + +COPY 10_async/10_temporal/harness_pydantic_ai/pyproject.toml /app/harness_pydantic_ai/pyproject.toml +COPY 10_async/10_temporal/harness_pydantic_ai/README.md /app/harness_pydantic_ai/README.md + +WORKDIR /app/harness_pydantic_ai + +COPY 10_async/10_temporal/harness_pydantic_ai/project /app/harness_pydantic_ai/project +COPY 10_async/10_temporal/harness_pydantic_ai/tests /app/harness_pydantic_ai/tests +COPY test_utils /app/test_utils + +RUN uv pip install --system .[dev] + +ENV PYTHONPATH=/app + +ENV AGENT_NAME=at-harness-pydantic-ai + +CMD ["uvicorn", "project.acp:acp", "--host", "0.0.0.0", "--port", "8000"] + +# When we deploy the worker, we will replace the CMD with the following +# CMD ["python", "-m", "run_worker"] diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/README.md b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/README.md new file mode 100644 index 000000000..3e5fef4c6 --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/README.md @@ -0,0 +1,61 @@ +# Temporal Pydantic AI Harness Test Agent + +A minimal 
**Temporal-backed** Pydantic AI agent that drives the **unified +harness surface** (`UnifiedEmitter.auto_send_turn` + `PydanticAITurn`) from +inside the model activity's `event_stream_handler`. + +## Why this agent exists + +The `10_async/10_temporal/110_pydantic_ai` tutorial streams via the +`stream_pydantic_ai_events` helper (which uses the unified surface internally). +This harness test agent calls `emitter.auto_send_turn(...)` **explicitly** inside +the `event_stream_handler`, making the unified-surface wiring visible and giving +the temporal channel direct coverage. + +## How it wires the unified surface + +In `project/agent.py`, the `event_stream_handler` runs inside the model activity +and constructs a `UnifiedEmitter` from `RunContext.deps`: + +```python +async def event_handler(run_context, events): + emitter = UnifiedEmitter( + task_id=run_context.deps.task_id, + trace_id=run_context.deps.task_id, + parent_span_id=run_context.deps.parent_span_id, + ) + turn = PydanticAITurn(events, model=MODEL_NAME, coalesce_tool_requests=True) + await emitter.auto_send_turn(turn) +``` + +- The handler runs inside a Temporal activity, so it can freely make + non-deterministic Redis + tracing writes. +- `coalesce_tool_requests=True` is required on the auto_send path until + AGX1-377 lands. +- `deps` (set by `project/workflow.py`) threads the `task_id` and the per-turn + `parent_span_id` into the handler so tool spans nest under the workflow's turn + span. + +## Structure + +- `project/acp.py` — thin ACP server; FastACP auto-wires HTTP routes to the + workflow when `TemporalACPConfig` is used. +- `project/agent.py` — base `Agent` + `TemporalAgent` + the unified-surface + `event_stream_handler`. +- `project/workflow.py` — durable workflow; each turn delegates to + `temporal_agent.run(...)`. +- `project/run_worker.py` — Temporal worker entry point. +- `project/tools.py` — async `get_weather(city)` returning a constant. 
+- `tests/test_agent.py` — live integration test (requires Temporal + Redis + + ACP server + worker). + +## Tools + +- `get_weather(city: str) -> str` (async): returns a fixed "sunny and 72°F" + string. Each tool call becomes its own Temporal activity. + +## Offline coverage + +Offline integration tests for the same wiring (pydantic-ai `TestModel` + fake +streaming/tracing, no Temporal server) live in the SDK repo at +`tests/lib/core/harness/test_harness_pydantic_ai_temporal.py`. diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/manifest.yaml b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/manifest.yaml new file mode 100644 index 000000000..9efbff918 --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/manifest.yaml @@ -0,0 +1,62 @@ +build: + context: + root: ../../../ + include_paths: + - 10_async/10_temporal/harness_pydantic_ai + - test_utils + dockerfile: 10_async/10_temporal/harness_pydantic_ai/Dockerfile + dockerignore: 10_async/10_temporal/harness_pydantic_ai/.dockerignore + +local_development: + agent: + port: 8000 + host_address: host.docker.internal + paths: + acp: project/acp.py + worker: project/run_worker.py + +agent: + acp_type: async + name: at-harness-pydantic-ai + description: A Temporal-backed Pydantic AI harness test agent using the unified emitter surface + + temporal: + enabled: true + workflows: + - name: at-harness-pydantic-ai + queue_name: at_harness_pydantic_ai_queue + + credentials: + - env_var_name: REDIS_URL + secret_name: redis-url-secret + secret_key: url + - env_var_name: OPENAI_API_KEY + secret_name: openai-api-key + secret_key: api-key + - env_var_name: SGP_API_KEY + secret_name: sgp-api-key + secret_key: api-key + - env_var_name: SGP_ACCOUNT_ID + secret_name: sgp-account-id + secret_key: account-id + - env_var_name: SGP_CLIENT_BASE_URL + secret_name: sgp-client-base-url + secret_key: url + +deployment: + image: + repository: "" + tag: "latest" + + global: + agent: + 
name: "at-harness-pydantic-ai" + description: "A Temporal-backed Pydantic AI harness test agent using the unified emitter surface" + replicaCount: 1 + resources: + requests: + cpu: "500m" + memory: "1Gi" + limits: + cpu: "1000m" + memory: "2Gi" diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/__init__.py b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/acp.py b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/acp.py new file mode 100644 index 000000000..c142dcf70 --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/acp.py @@ -0,0 +1,35 @@ +"""ACP server for the Temporal harness Pydantic AI test agent. + +This file is intentionally thin. When ``acp_type="async"`` is combined with +``TemporalACPConfig(type="temporal", ...)``, FastACP auto-wires: + + HTTP task/create → @workflow.run on the workflow class + HTTP task/event/send → @workflow.signal(SignalName.RECEIVE_EVENT) + HTTP task/cancel → workflow cancellation via the Temporal client + +so we don't define any handlers here. The actual agent code lives in +``project/workflow.py`` and is executed by the Temporal worker +(``project/run_worker.py``), not by this HTTP process. 
+""" + +from __future__ import annotations + +import os + +from dotenv import load_dotenv + +load_dotenv() + +from pydantic_ai.durable_exec.temporal import PydanticAIPlugin + +from agentex.lib.types.fastacp import TemporalACPConfig +from agentex.lib.sdk.fastacp.fastacp import FastACP + +acp = FastACP.create( + acp_type="async", + config=TemporalACPConfig( + type="temporal", + temporal_address=os.getenv("TEMPORAL_ADDRESS", "localhost:7233"), + plugins=[PydanticAIPlugin()], + ), +) diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/agent.py b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/agent.py new file mode 100644 index 000000000..ae261eedc --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/agent.py @@ -0,0 +1,111 @@ +"""Pydantic AI agent definition for the Temporal harness test agent. + +This module constructs the base ``pydantic_ai.Agent`` once at import time, +registers tools on it, and wraps it in ``TemporalAgent`` from +``pydantic_ai.durable_exec.temporal``. + +The ``TemporalAgent`` wrapper makes every model call and every tool call run as +a Temporal activity automatically. The workflow stays deterministic; the +non-deterministic work (LLM HTTP calls, tool execution) moves into recorded +activities. + +Streaming back to Agentex happens via ``event_stream_handler``, which receives +Pydantic AI ``AgentStreamEvent``s from inside the model activity and forwards +them through the UNIFIED HARNESS SURFACE (``UnifiedEmitter.auto_send_turn`` + +``PydanticAITurn``) — called directly rather than via ``stream_pydantic_ai_events``. +The ``task_id`` and per-turn ``parent_span_id`` are threaded into the handler +via ``deps``. 
+""" + +from __future__ import annotations + +from datetime import datetime +from collections.abc import AsyncIterable + +from pydantic import BaseModel +from pydantic_ai import Agent, RunContext +from pydantic_ai.messages import AgentStreamEvent +from pydantic_ai.durable_exec.temporal import TemporalAgent + +from project.tools import get_weather +from agentex.lib.core.harness import UnifiedEmitter +from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn + +__all__ = ["TaskDeps", "temporal_agent", "base_agent", "MODEL_NAME"] + +MODEL_NAME = "openai:gpt-4o-mini" +SYSTEM_PROMPT = """You are a helpful AI assistant with access to tools. + +Current date and time: {timestamp} + +Guidelines: +- Be concise and helpful +- Use tools when they would help answer the user's question +- If you're unsure, ask clarifying questions +- Always provide accurate information +""" + + +class TaskDeps(BaseModel): + """Per-run dependencies passed into the agent via ``deps=``. + + Pydantic AI's ``RunContext.deps`` is the canonical place to thread + request-scoped data (like the Agentex task_id) into tools and event + handlers — including code that runs inside Temporal activities. + """ + + task_id: str + # When set, the event handler nests per-tool-call spans under this span. + # Typically the ID of the per-turn span opened by the workflow. + parent_span_id: str | None = None + + +def _build_base_agent() -> Agent[TaskDeps, str]: + """Build the underlying Pydantic AI agent with tools registered. + + Tools must be registered BEFORE the agent is wrapped in TemporalAgent; + changes to tool registration after wrapping are not reflected. 
+ """ + agent: Agent[TaskDeps, str] = Agent( + MODEL_NAME, + deps_type=TaskDeps, + system_prompt=SYSTEM_PROMPT.format(timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S")), + ) + agent.tool_plain(get_weather) + return agent + + +async def event_handler( + run_context: RunContext[TaskDeps], + events: AsyncIterable[AgentStreamEvent], +) -> None: + """Stream Pydantic AI events to Agentex via the unified surface. + + Pydantic AI calls this with the live event stream as soon as the model + activity begins emitting parts. Because the handler runs inside the activity + (not the workflow), it can freely make non-deterministic Redis + tracing + writes. + + The UnifiedEmitter is constructed from ``deps`` (task_id + parent_span_id), + so tool spans nest under the workflow's per-turn span and messages auto-send + to the task stream. ``coalesce_tool_requests=True`` is required on the + auto_send path until AGX1-377 lands. + """ + emitter = UnifiedEmitter( + task_id=run_context.deps.task_id, + trace_id=run_context.deps.task_id, + parent_span_id=run_context.deps.parent_span_id, + ) + turn = PydanticAITurn(events, model=MODEL_NAME, coalesce_tool_requests=True) + await emitter.auto_send_turn(turn) + + +# Construct the durable agent at module load time so that the PydanticAIPlugin +# can auto-discover its activities via the workflow's ``__pydantic_ai_agents__`` +# attribute. +base_agent = _build_base_agent() +temporal_agent: TemporalAgent[TaskDeps, str] = TemporalAgent( + base_agent, + name="harness_pydantic_ai_agent", + event_stream_handler=event_handler, +) diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/run_worker.py b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/run_worker.py new file mode 100644 index 000000000..4b4d43d19 --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/run_worker.py @@ -0,0 +1,48 @@ +"""Temporal worker for the harness Pydantic AI test agent. 
+ +Run as a separate long-lived process alongside the ACP HTTP server. The worker +polls Temporal for workflow + activity tasks and executes them. + +The ``PydanticAIPlugin`` reads ``__pydantic_ai_agents__`` off the workflow class +and registers every model/tool activity the TemporalAgent needs — so we don't +have to enumerate activities by hand here. +""" + +import asyncio + +from pydantic_ai.durable_exec.temporal import PydanticAIPlugin + +from project.workflow import HarnessPydanticAiWorkflow +from agentex.lib.utils.debug import setup_debug_if_enabled +from agentex.lib.utils.logging import make_logger +from agentex.lib.environment_variables import EnvironmentVariables +from agentex.lib.core.temporal.activities import get_all_activities +from agentex.lib.core.temporal.workers.worker import AgentexWorker + +environment_variables = EnvironmentVariables.refresh() +logger = make_logger(__name__) + + +async def main(): + setup_debug_if_enabled() + + task_queue_name = environment_variables.WORKFLOW_TASK_QUEUE + if task_queue_name is None: + raise ValueError("WORKFLOW_TASK_QUEUE is not set") + + # get_all_activities() returns the built-in Agentex activities (state, + # messages, streaming, tracing). Pydantic AI's TemporalAgent activities are + # auto-registered by PydanticAIPlugin via __pydantic_ai_agents__. + worker = AgentexWorker( + task_queue=task_queue_name, + plugins=[PydanticAIPlugin()], + ) + + await worker.run( + activities=get_all_activities(), + workflow=HarnessPydanticAiWorkflow, + ) + + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/tools.py b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/tools.py new file mode 100644 index 000000000..bbd6c5200 --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/tools.py @@ -0,0 +1,24 @@ +"""Tool definitions for the Temporal harness Pydantic AI agent. 
+ +These functions are registered on the base Pydantic AI agent. When the agent +is wrapped in ``TemporalAgent``, each tool call becomes its own Temporal +activity automatically — independently retryable and observable. + +Tools must be ``async`` because Pydantic AI's Temporal integration requires +it: non-async tools would run in threads, which is non-deterministic and +unsafe for Temporal replay. +""" + +from __future__ import annotations + + +async def get_weather(city: str) -> str: + """Get the current weather for a city. + + Args: + city: The name of the city to get weather for. + + Returns: + A string describing the weather conditions. + """ + return f"The weather in {city} is sunny and 72°F" diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/workflow.py b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/workflow.py new file mode 100644 index 000000000..9a01be7de --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/project/workflow.py @@ -0,0 +1,137 @@ +"""Temporal workflow for the harness Pydantic AI test agent. + +The workflow holds task state durably across crashes. Its signal handler +delegates the actual agent run to ``temporal_agent.run(...)`` — which internally +schedules model and tool activities, each independently durable. The +``event_stream_handler`` registered on ``temporal_agent`` (see project.agent) +pushes streaming deltas through the unified harness surface while the model +activity runs. + +Multi-turn memory is kept on the workflow instance itself +(``self._message_history``). Temporal's workflow state is already durable and +replay-safe, so unlike the async-base agent we don't need an external +``adk.state`` round-trip. 
+""" + +from __future__ import annotations + +import os +import json +from typing import TYPE_CHECKING + +from temporalio import workflow + +from agentex.lib import adk +from project.agent import TaskDeps, temporal_agent +from agentex.lib.types.acp import SendEventParams, CreateTaskParams +from agentex.lib.types.tracing import SGPTracingProcessorConfig +from agentex.lib.utils.logging import make_logger +from agentex.types.text_content import TextContent +from agentex.lib.environment_variables import EnvironmentVariables +from agentex.lib.core.temporal.types.workflow import SignalName +from agentex.lib.core.temporal.workflows.workflow import BaseWorkflow +from agentex.lib.core.tracing.tracing_processor_manager import ( + add_tracing_processor_config, +) + +if TYPE_CHECKING: + from pydantic_ai.messages import ModelMessage + +add_tracing_processor_config( + SGPTracingProcessorConfig( + sgp_api_key=os.environ.get("SGP_API_KEY", ""), + sgp_account_id=os.environ.get("SGP_ACCOUNT_ID", ""), + sgp_base_url=os.environ.get("SGP_CLIENT_BASE_URL", ""), + ) +) + +environment_variables = EnvironmentVariables.refresh() + +if environment_variables.WORKFLOW_NAME is None: + raise ValueError("Environment variable WORKFLOW_NAME is not set") +if environment_variables.AGENT_NAME is None: + raise ValueError("Environment variable AGENT_NAME is not set") + +logger = make_logger(__name__) + + +@workflow.defn(name=environment_variables.WORKFLOW_NAME) +class HarnessPydanticAiWorkflow(BaseWorkflow): + """Long-running Temporal workflow that delegates each turn to a Pydantic AI TemporalAgent. + + The ``__pydantic_ai_agents__`` attribute is the marker the + ``PydanticAIPlugin`` looks for at worker startup: it pulls + ``temporal_agent.temporal_activities`` off this list and registers them on + the worker automatically — so we don't have to list activities by hand in + ``run_worker.py``. 
+ """ + + __pydantic_ai_agents__ = [temporal_agent] + + def __init__(self): + super().__init__(display_name=environment_variables.AGENT_NAME) + self._complete_task = False + self._turn_number = 0 + # Conversation history accumulated across turns. Each entry is a + # pydantic-ai ``ModelMessage``. On replay, Temporal returns the + # recorded results of the activities that produced these messages, so + # the list is rebuilt deterministically if the workflow ever recovers + # from a crash. + self._message_history: list["ModelMessage"] = [] + + @workflow.signal(name=SignalName.RECEIVE_EVENT) + async def on_task_event_send(self, params: SendEventParams) -> None: + """Handle a new user message: echo it, then run the agent durably.""" + logger.info(f"Received task event: {params.task.id}") + self._turn_number += 1 + + # Echo the user's message so it shows up in the UI as a chat bubble. + await adk.messages.create(task_id=params.task.id, content=params.event.content) + + async with adk.tracing.span( + trace_id=params.task.id, + task_id=params.task.id, + name=f"Turn {self._turn_number}", + input={"message": params.event.content.content}, + ) as span: + # temporal_agent.run() schedules a model activity, per-tool + # activities, and the event_stream_handler activity (which pushes + # deltas through the unified surface). Passing ``message_history`` + # makes the run remember prior turns. + result = await temporal_agent.run( + params.event.content.content, + message_history=self._message_history, + deps=TaskDeps( + task_id=params.task.id, + parent_span_id=span.id if span else None, + ), + ) + # Persist the new full history (user + assistant + any tool rounds) + # so the next turn picks up from here. 
+ self._message_history = list(result.all_messages()) + if span: + span.output = {"final_output": result.output} + + @workflow.run + async def on_task_create(self, params: CreateTaskParams) -> str: + """Workflow entry point — keep the conversation alive for incoming signals.""" + logger.info(f"Task created: {params.task.id}") + + await adk.messages.create( + task_id=params.task.id, + content=TextContent( + author="agent", + content=( + f"Task initialized with params:\n{json.dumps(params.params, indent=2)}\n" + f"Send me a message and I'll respond using a Pydantic AI agent backed by Temporal." + ), + ), + ) + + await workflow.wait_condition(lambda: self._complete_task, timeout=None) + return "Task completed" + + @workflow.signal + async def complete_task_signal(self) -> None: + """Graceful workflow shutdown signal.""" + logger.info("Received complete_task signal") + self._complete_task = True diff --git a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/pyproject.toml b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/pyproject.toml new file mode 100644 index 000000000..4d9039640 --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/pyproject.toml @@ -0,0 +1,38 @@ +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "at-harness-pydantic-ai" +version = "0.1.0" +description = "A Temporal-backed Pydantic AI harness test agent using the unified emitter surface" +readme = "README.md" +requires-python = ">=3.12" +dependencies = [ + "agentex-sdk", + "scale-gp", + "temporalio>=1.18.2", + "pydantic-ai-slim[openai]>=1.0,<2", +] + +[project.optional-dependencies] +dev = [ + "pytest", + "pytest-asyncio", + "httpx", + "black", + "isort", + "flake8", + "debugpy>=1.8.15", +] + +[tool.hatch.build.targets.wheel] +packages = ["project"] + +[tool.black] +line-length = 88 +target-version = ['py312'] + +[tool.isort] +profile = "black" +line_length = 88 diff --git 
a/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/tests/test_agent.py b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/tests/test_agent.py new file mode 100644 index 000000000..a5b90ca34 --- /dev/null +++ b/examples/tutorials/10_async/10_temporal/harness_pydantic_ai/tests/test_agent.py @@ -0,0 +1,114 @@ +"""Live tests for the Temporal harness Pydantic AI agent. + +These tests require a running agent (Temporal + Redis + ACP server + worker) and +exercise the unified-surface event_stream_handler end-to-end over the wire. They +mirror the ``at110`` temporal tutorial tests but target this harness agent. + +Offline coverage of the same wiring (TestModel + fake streaming/tracing) lives +in ``tests/lib/core/harness/test_harness_pydantic_ai_temporal.py`` in the SDK repo. + +To run these tests: +1. Make sure the agent is running (worker + ACP server) +2. Set AGENTEX_API_BASE_URL if not using the default +3. Run: pytest tests/test_agent.py -v +""" + +import os +import uuid + +import pytest +import pytest_asyncio +from test_utils.async_utils import poll_messages, send_event_and_poll_yielding + +from agentex import AsyncAgentex +from agentex.types.task_message import TaskMessage +from agentex.types.agent_rpc_params import ParamsCreateTaskRequest + +AGENTEX_API_BASE_URL = os.environ.get("AGENTEX_API_BASE_URL", "http://localhost:5003") +AGENT_NAME = os.environ.get("AGENT_NAME", "at-harness-pydantic-ai") + + +@pytest_asyncio.fixture +async def client(): + client = AsyncAgentex(base_url=AGENTEX_API_BASE_URL) + yield client + await client.close() + + +@pytest.fixture +def agent_name(): + return AGENT_NAME + + +@pytest_asyncio.fixture +async def agent_id(client, agent_name): + agents = await client.agents.list() + for agent in agents: + if agent.name == agent_name: + return agent.id + raise ValueError(f"Agent with name {agent_name} not found.") + + +class TestNonStreamingEvents: + """Test that the Temporal-backed harness agent responds and uses tools.""" + + 
@pytest.mark.asyncio + async def test_send_event_and_poll(self, client: AsyncAgentex, agent_id: str): + """Drive a full turn: create task, send a weather question, verify tool round-trip.""" + task_response = await client.agents.create_task(agent_id, params=ParamsCreateTaskRequest(name=uuid.uuid1().hex)) + task = task_response.result + assert task is not None + + # Wait for the welcome message from on_task_create + task_creation_found = False + async for message in poll_messages( + client=client, + task_id=task.id, + timeout=30, + sleep_interval=1.0, + ): + assert isinstance(message, TaskMessage) + if message.content and message.content.type == "text" and message.content.author == "agent": + task_creation_found = True + break + assert task_creation_found, "Task creation welcome message not found" + + # Ask about weather — the agent should call get_weather + seen_tool_request = False + seen_tool_response = False + final_message = None + async for message in send_event_and_poll_yielding( + client=client, + agent_id=agent_id, + task_id=task.id, + user_message="What is the weather in San Francisco?", + timeout=60, + sleep_interval=1.0, + ): + assert isinstance(message, TaskMessage) + + if message.content and message.content.type == "tool_request": + seen_tool_request = True + if message.content and message.content.type == "tool_response": + seen_tool_response = True + if final_message and getattr(final_message, "streaming_status", None) == "DONE": + break + + if message.content and message.content.type == "text" and message.content.author == "agent": + final_message = message + content_length = len(getattr(message.content, "content", "") or "") + if message.streaming_status == "DONE" and content_length > 0: + if not seen_tool_request or seen_tool_response: + break + + assert seen_tool_request, "Expected a tool_request (agent calling get_weather)" + assert seen_tool_response, "Expected a tool_response (get_weather result)" + assert final_message is not None, "Expected a 
final agent text message" + final_text = getattr(final_message.content, "content", None) if final_message.content else None + assert isinstance(final_text, str) and len(final_text) > 0 + # The get_weather tool always returns "72°F" — the response should mention it. + assert "72" in final_text, "Expected weather response to mention 72°F" + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/src/agentex/lib/adk/_modules/_pydantic_ai_async.py b/src/agentex/lib/adk/_modules/_pydantic_ai_async.py index 0bbb5b19d..85abfb845 100644 --- a/src/agentex/lib/adk/_modules/_pydantic_ai_async.py +++ b/src/agentex/lib/adk/_modules/_pydantic_ai_async.py @@ -6,11 +6,10 @@ HTTP yields. Text and thinking tokens stream as deltas inside coalesced streaming -contexts. Tool requests and tool results are emitted as full -``adk.messages.create(...)`` calls (Option A — matches the async LangGraph -helper's convention). To stream tool-call argument tokens, see the sync -converter at ``agentex.lib.adk._modules._pydantic_ai_sync`` which yields -``ToolRequestDelta`` events. +contexts. Tool requests and tool results are posted as open+close pairs +on a streaming context (the unified surface persists ``initial_content`` +when a context is closed without deltas). This matches the ``auto_send`` +convention used by all other async/Temporal harnesses. Tracing is opt-in via a ``tracing_handler`` parameter — see ``create_pydantic_ai_tracing_handler`` in @@ -19,7 +18,7 @@ from __future__ import annotations -from typing import TYPE_CHECKING, Any +from typing import TYPE_CHECKING if TYPE_CHECKING: from agentex.lib.adk._modules._pydantic_ai_tracing import ( @@ -49,230 +48,18 @@ async def stream_pydantic_ai_events( more text) return only the final text segment, matching the ``stream_langgraph_events`` convention. """ - # Lazy imports so pydantic-ai isn't required at module load time. 
- import json + from agentex.lib.core.harness.emitter import UnifiedEmitter + from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn - from pydantic_ai.messages import ( - TextPart, - PartEndEvent, - ThinkingPart, - ToolCallPart, - TextPartDelta, - PartDeltaEvent, - PartStartEvent, - ThinkingPartDelta, - FunctionToolResultEvent, + turn = PydanticAITurn( + stream, + model=None, + tracing_handler=tracing_handler, ) - - from agentex.lib import adk - from agentex.types.text_content import TextContent - from agentex.types.reasoning_content import ReasoningContent - from agentex.types.task_message_delta import TextDelta - from agentex.types.task_message_update import StreamTaskMessageDelta - from agentex.types.tool_request_content import ToolRequestContent - from agentex.types.tool_response_content import ToolResponseContent - from agentex.types.reasoning_content_delta import ReasoningContentDelta - - text_context = None - reasoning_context = None - final_text = "" - - # Per Pydantic-AI part-index bookkeeping. Part indices restart at 0 on - # each new model response, so we overwrite on PartStartEvent. 
- part_kind: dict[int, str] = {} - tool_call_info: dict[int, tuple[str, str]] = {} - - async def _close_text(): - nonlocal text_context - if text_context: - await text_context.close() - text_context = None - - async def _close_reasoning(): - nonlocal reasoning_context - if reasoning_context: - await reasoning_context.close() - reasoning_context = None - - try: - async for event in stream: - if isinstance(event, PartStartEvent): - if isinstance(event.part, TextPart): - await _close_reasoning() - await _close_text() - - final_text = "" - text_context = await adk.streaming.streaming_task_message_context( - task_id=task_id, - initial_content=TextContent( - author="agent", - content="", - format="markdown", - ), - ).__aenter__() - part_kind[event.index] = "text" - - # Pydantic AI puts the first streaming chunk in - # PartStartEvent.part.content; surface it as a Delta so it - # actually renders (Start.content is initialization, not body). - if event.part.content: - final_text += event.part.content - await text_context.stream_update( - StreamTaskMessageDelta( - parent_task_message=text_context.task_message, - delta=TextDelta(type="text", text_delta=event.part.content), - type="delta", - ) - ) - - elif isinstance(event.part, ThinkingPart): - await _close_text() - await _close_reasoning() - - reasoning_context = await adk.streaming.streaming_task_message_context( - task_id=task_id, - initial_content=ReasoningContent( - author="agent", - summary=[], - content=[], - type="reasoning", - style="active", - ), - ).__aenter__() - part_kind[event.index] = "reasoning" - - if event.part.content: - await reasoning_context.stream_update( - StreamTaskMessageDelta( - parent_task_message=reasoning_context.task_message, - delta=ReasoningContentDelta( - type="reasoning_content", - content_index=0, - content_delta=event.part.content, - ), - type="delta", - ) - ) - - elif isinstance(event.part, ToolCallPart): - await _close_text() - await _close_reasoning() - tool_call_info[event.index] = ( - 
event.part.tool_call_id, - event.part.tool_name, - ) - part_kind[event.index] = "tool_call" - - elif isinstance(event, PartDeltaEvent): - kind = part_kind.get(event.index) - if kind == "text" and isinstance(event.delta, TextPartDelta) and text_context: - final_text += event.delta.content_delta - await text_context.stream_update( - StreamTaskMessageDelta( - parent_task_message=text_context.task_message, - delta=TextDelta(type="text", text_delta=event.delta.content_delta), - type="delta", - ) - ) - elif ( - kind == "reasoning" - and isinstance(event.delta, ThinkingPartDelta) - and reasoning_context - and event.delta.content_delta - ): - await reasoning_context.stream_update( - StreamTaskMessageDelta( - parent_task_message=reasoning_context.task_message, - delta=ReasoningContentDelta( - type="reasoning_content", - content_index=0, - content_delta=event.delta.content_delta, - ), - type="delta", - ) - ) - # Tool-call arg deltas: Pydantic AI accumulates them; we - # surface the final args on PartEndEvent below (Option A). 
- - elif isinstance(event, PartEndEvent): - kind = part_kind.get(event.index) - if kind == "text": - await _close_text() - elif kind == "reasoning": - await _close_reasoning() - elif kind == "tool_call" and isinstance(event.part, ToolCallPart): - tool_call_id, tool_name = tool_call_info.get(event.index, ("", "")) - args = event.part.args - if isinstance(args, str): - try: - args = json.loads(args) if args else {} - except json.JSONDecodeError: - args = {"_raw": args} - elif args is None: - args = {} - await adk.messages.create( - task_id=task_id, - content=ToolRequestContent( - tool_call_id=tool_call_id, - name=tool_name, - arguments=args, - author="agent", - ), - ) - if tracing_handler is not None and tool_call_id: - await tracing_handler.on_tool_start( - tool_call_id=tool_call_id, - tool_name=tool_name, - arguments=args, - ) - - elif isinstance(event, FunctionToolResultEvent): - await _close_text() - await _close_reasoning() - - result = event.part - tool_call_id = result.tool_call_id - tool_name = getattr(result, "tool_name", "") or "" - # Preserve structure for dicts / lists / Pydantic models so the - # UI can render them as JSON, not as Python repr. Matches the - # sync converter's ``_tool_return_content`` helper exactly — - # ``str(content)`` on a dict produces ``"{'k': 'v'}"`` which is - # invalid JSON and unreadable in the UI. 
- content = getattr(result, "content", None) - content_payload: Any - if content is None: - content_payload = str(result) - elif isinstance(content, (str, int, float, bool, list, dict)): - content_payload = content - elif hasattr(content, "model_dump"): - try: - content_payload = content.model_dump() - except Exception: - content_payload = str(content) - else: - content_payload = str(content) - await adk.messages.create( - task_id=task_id, - content=ToolResponseContent( - tool_call_id=tool_call_id, - name=tool_name, - content=content_payload, - author="agent", - ), - ) - if tracing_handler is not None and tool_call_id: - await tracing_handler.on_tool_end( - tool_call_id=tool_call_id, - result=content_payload, - ) - - # FunctionToolCallEvent / FinalResultEvent / AgentRunResultEvent - # are intentionally ignored — same as the sync converter. - - finally: - if text_context: - await text_context.close() - if reasoning_context: - await reasoning_context.close() - - return final_text + emitter = UnifiedEmitter( + task_id=task_id, + trace_id=None, + parent_span_id=None, + ) + result = await emitter.auto_send_turn(turn) + return result.final_text diff --git a/src/agentex/lib/adk/_modules/_pydantic_ai_sync.py b/src/agentex/lib/adk/_modules/_pydantic_ai_sync.py index d94c0ae12..e4ac31e7e 100644 --- a/src/agentex/lib/adk/_modules/_pydantic_ai_sync.py +++ b/src/agentex/lib/adk/_modules/_pydantic_ai_sync.py @@ -16,12 +16,32 @@ async def handle_message_send(params): async with agent.run_stream_events(params.content.content) as stream: async for event in convert_pydantic_ai_to_agentex_events(stream): yield event + +Recommended: unified surface +----------------------------- +For new handlers, prefer ``UnifiedEmitter`` + ``PydanticAITurn`` over the +bare converter. 
The unified surface wires tracing automatically when a +``trace_id`` is provided, so tool and reasoning spans are derived from the +same event stream with no extra setup: + + from agentex.lib.core.harness import UnifiedEmitter + from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn + + emitter = UnifiedEmitter(task_id=task_id, trace_id=trace_id, parent_span_id=parent_span_id) + turn = PydanticAITurn(agent.run_stream_events(prompt), model="openai:gpt-4o") + async for event in emitter.yield_turn(turn): + yield event # forwarded over the ACP streaming response; spans derived automatically + +``convert_pydantic_ai_to_agentex_events`` remains the low-level tap for +callers that manage their own tracing or need direct access to the raw +converted stream. """ from __future__ import annotations import json -from typing import TYPE_CHECKING, Any, AsyncIterator +import inspect +from typing import TYPE_CHECKING, Any, Callable, AsyncIterator from pydantic_ai.run import AgentRunResultEvent @@ -105,6 +125,7 @@ def _tool_return_content(result: ToolReturnPart | Any) -> Any: async def convert_pydantic_ai_to_agentex_events( stream_response: AsyncIterator[Any], tracing_handler: "AgentexPydanticAITracingHandler | None" = None, + on_result: Callable[[AgentRunResultEvent], Any] | None = None, ) -> AsyncIterator[StreamTaskMessageStart | StreamTaskMessageDelta | StreamTaskMessageFull | StreamTaskMessageDone]: """Convert a Pydantic AI agent event stream into Agentex stream events. @@ -132,6 +153,12 @@ async def convert_pydantic_ai_to_agentex_events( tool call in the run is also recorded as an Agentex child span beneath the handler's configured ``parent_span_id``. Streaming behavior is unchanged when omitted. + on_result: Optional callback invoked with the terminal + ``AgentRunResultEvent`` when the run completes. Both sync and + async callables are accepted. No ``StreamTaskMessage*`` events are + yielded for this terminal event; the callback is the only side + effect. 
Useful for capturing run-level usage without altering the + streaming output. Yields: Agentex ``StreamTaskMessage*`` events suitable for forwarding back over @@ -328,6 +355,10 @@ async def convert_pydantic_ai_to_agentex_events( # Already covered by PartStart/PartDelta/PartEnd events above, or # informational only (FinalResultEvent / AgentRunResultEvent signal # run-level state, not new content to surface). + if isinstance(event, AgentRunResultEvent) and on_result is not None: + ret = on_result(event) + if inspect.iscoroutine(ret): + await ret continue else: diff --git a/src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py b/src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py index aa9d906eb..e199d0a8c 100644 --- a/src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py +++ b/src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py @@ -1,5 +1,29 @@ """Tracing handler that records Agentex spans for tool calls in a pydantic-ai agent run. +.. deprecated:: + ``AgentexPydanticAITracingHandler`` and ``create_pydantic_ai_tracing_handler`` + are superseded by the unified harness surface (``UnifiedEmitter`` in + ``agentex.lib.core.harness``). The unified surface derives tool and + reasoning spans directly from the canonical ``StreamTaskMessage*`` stream, + so no separate handler is required. Both symbols remain fully importable + and functional; they will be removed in a future release. New code should + construct a ``UnifiedEmitter`` with a ``trace_id`` instead: + + from agentex.lib.core.harness import UnifiedEmitter + from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn + + emitter = UnifiedEmitter(task_id=task_id, trace_id=trace_id, parent_span_id=parent_span_id) + turn = PydanticAITurn(agent.run_stream_events(prompt), model="openai:gpt-4o") + async for event in emitter.yield_turn(turn): + yield event + +# NOTE: A runtime ``warnings.warn(..., DeprecationWarning)`` is intentionally +# omitted here. 
The repo's pyproject ``filterwarnings = ["error"]`` would turn +# it into a test/caller failure, and the async helper (``stream_pydantic_ai_events``) +# still threads this handler through for existing callers that lack a ``trace_id`` +# on the async path. The runtime warning and caller migration are deferred until +# ``trace_id`` threading lands on the async helper in a future API-versioning change. + Mirrors the LangGraph tracing handler pattern: the caller creates a handler bound to a ``trace_id`` and a ``parent_span_id``, then hands it to ``stream_pydantic_ai_events(..., tracing_handler=handler)``. The streamer @@ -63,6 +87,14 @@ def _tool_span_id(trace_id: str, tool_call_id: str) -> str: class AgentexPydanticAITracingHandler: """Records Agentex tracing spans for tool calls observed in a pydantic-ai event stream. + .. deprecated:: + Superseded by ``UnifiedEmitter`` (``agentex.lib.core.harness``), which + derives tool and reasoning spans from the canonical ``StreamTaskMessage*`` + stream automatically when ``trace_id`` is provided. This class remains + fully functional but will be removed in a future release. New code should + use ``UnifiedEmitter`` with a trace context instead of constructing this + handler directly. + Pass an instance to ``stream_pydantic_ai_events(..., tracing_handler=...)`` or call ``on_tool_start`` / ``on_tool_end`` yourself if you're consuming the event stream by hand. @@ -165,6 +197,13 @@ def create_pydantic_ai_tracing_handler( ) -> AgentexPydanticAITracingHandler: """Create a tracing handler that records Agentex spans for pydantic-ai tool calls. + .. deprecated:: + Superseded by ``UnifiedEmitter`` (``agentex.lib.core.harness``), which + derives tool and reasoning spans from the canonical ``StreamTaskMessage*`` + stream automatically when ``trace_id`` is provided. This function remains + fully functional but will be removed in a future release. New code should + construct a ``UnifiedEmitter`` with a trace context instead. 
+ Args: trace_id: The trace ID. Typically the Agentex task ID. parent_span_id: Optional parent span ID to nest tool spans under. If diff --git a/src/agentex/lib/adk/_modules/_pydantic_ai_turn.py b/src/agentex/lib/adk/_modules/_pydantic_ai_turn.py new file mode 100644 index 000000000..b06172e7f --- /dev/null +++ b/src/agentex/lib/adk/_modules/_pydantic_ai_turn.py @@ -0,0 +1,134 @@ +"""PydanticAITurn: a HarnessTurn wrapping a pydantic-ai event stream. + +Adapts a pydantic-ai ``AgentStreamEvent`` stream into the canonical +``StreamTaskMessage*`` stream while capturing run-level usage from the +terminal ``AgentRunResultEvent``. + +Typical usage:: + + stream = agent.run_stream_events(user_msg) + turn = PydanticAITurn(stream, model="openai:gpt-4o") + async for event in turn.events: + yield event + span.set_attributes(turn.usage().model_dump()) +""" + +from __future__ import annotations + +from typing import TYPE_CHECKING, Any, AsyncIterator + +from pydantic_ai.run import AgentRunResultEvent + +from agentex.lib.core.harness.types import TurnUsage +from agentex.types.task_message_update import ( + StreamTaskMessageDone, + StreamTaskMessageFull, + StreamTaskMessageDelta, + StreamTaskMessageStart, +) +from agentex.lib.adk._modules._pydantic_ai_sync import convert_pydantic_ai_to_agentex_events + +if TYPE_CHECKING: + from agentex.lib.adk._modules._pydantic_ai_tracing import AgentexPydanticAITracingHandler + +StreamTaskMessage = StreamTaskMessageStart | StreamTaskMessageDelta | StreamTaskMessageFull | StreamTaskMessageDone + + +def pydantic_ai_usage_to_turn_usage(usage: Any, model: str | None) -> TurnUsage: + """Map a pydantic-ai ``RunUsage`` onto ``TurnUsage``. + + Uses defensive ``getattr(..., None)`` so a future field rename in + pydantic-ai degrades to ``None`` rather than raising ``AttributeError``. 
+ + RunUsage fields (verified against pydantic-ai in this repo): + input_tokens, cache_write_tokens, cache_read_tokens, output_tokens, + input_audio_tokens, cache_audio_read_tokens, output_audio_tokens, + details, requests, tool_calls. + ``total_tokens`` is a computed property. + + Mapping: + requests -> num_llm_calls + input_tokens -> input_tokens + output_tokens -> output_tokens + cache_read_tokens -> cached_input_tokens + total_tokens -> total_tokens + + getattr results pass straight through: a MISSING attribute degrades to + None (defensive), while a real 0 stays 0 (a cache-hit with 0 output + tokens is a genuine zero, not "unknown") and a real N stays N. + """ + raw_input = getattr(usage, "input_tokens", None) + raw_output = getattr(usage, "output_tokens", None) + raw_cache_read = getattr(usage, "cache_read_tokens", None) + raw_total = getattr(usage, "total_tokens", None) + raw_requests = getattr(usage, "requests", None) + + return TurnUsage( + model=model, + input_tokens=raw_input, + output_tokens=raw_output, + cached_input_tokens=raw_cache_read, + total_tokens=raw_total, + num_llm_calls=raw_requests if raw_requests is not None else 0, + ) + + +class PydanticAITurn: + """A single harness turn backed by a pydantic-ai event stream. + + Satisfies the ``HarnessTurn`` protocol: ``events`` async-generates the + canonical ``StreamTaskMessage*`` stream; ``usage()`` returns a normalized + ``TurnUsage`` (valid only after ``events`` is exhausted). + + ``events`` is identical to the bare ``convert_pydantic_ai_to_agentex_events`` + output (tool calls stream as ``Start + ToolRequestDelta + Done``, preserving + argument-token streaming on the sync/yield channel). The foundation + ``auto_send`` delivers the streamed tool-request shape natively (AGX1-377), + so no coalescing is needed on either channel. 
+ """ + + def __init__( + self, + stream: AsyncIterator[Any], + model: str | None = None, + tracing_handler: "AgentexPydanticAITracingHandler | None" = None, + ) -> None: + self._stream = stream + self._model = model + self._tracing_handler = tracing_handler + self._usage = TurnUsage(model=model) + + @property + def events(self) -> AsyncIterator[StreamTaskMessage]: + return self._generate_events() + + async def _generate_events(self) -> AsyncIterator[StreamTaskMessage]: + def _capture(result_event: AgentRunResultEvent) -> None: + run_result = getattr(result_event, "result", None) + if run_result is None: + return + usage_attr = getattr(run_result, "usage", None) + if usage_attr is None: + return + # In newer pydantic-ai, .usage is a DeprecatedCallableRunUsage — + # it's both a property value and callable (emitting a deprecation + # warning when called). Access it as a plain attribute to avoid the + # warning; it already IS the RunUsage instance. + usage_obj = usage_attr + self._usage = pydantic_ai_usage_to_turn_usage(usage_obj, self._model) + + raw_stream = convert_pydantic_ai_to_agentex_events( + self._stream, + tracing_handler=self._tracing_handler, + on_result=_capture, + ) + async for ev in raw_stream: + yield ev + + def usage(self) -> TurnUsage: + """Return the normalized usage for this turn. + + Valid only after ``events`` is exhausted (single-pass contract). + Before exhaustion the model field is set but token fields are None. 
+ """ + return self._usage diff --git a/tests/lib/adk/test_pydantic_ai_async.py b/tests/lib/adk/test_pydantic_ai_async.py index dadda5914..49cb6054c 100644 --- a/tests/lib/adk/test_pydantic_ai_async.py +++ b/tests/lib/adk/test_pydantic_ai_async.py @@ -82,7 +82,9 @@ class FakeStreamingModule: def __init__(self) -> None: self.contexts: list[FakeContext] = [] - def streaming_task_message_context(self, *, task_id: str, initial_content: Any) -> FakeContext: + def streaming_task_message_context( + self, *, task_id: str, initial_content: Any, streaming_mode: str = "coalesced", created_at: Any = None + ) -> FakeContext: tm = TaskMessage( id=f"m{len(self.contexts) + 1}", task_id=task_id, @@ -255,16 +257,36 @@ async def test_empty_thinking_delta_is_skipped( class TestToolCallEmission: - async def test_tool_call_emits_full_tool_request_message_on_part_end( + async def test_tool_call_opens_streaming_context_with_identity( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] ) -> None: - """Async helper uses Option A: tool requests are full messages, not delta streams.""" + """Tool requests are delivered as a streaming context (Start+Delta+Done). + + AGX1-377 fix: auto_send now delivers streamed tool-request messages + natively (Start+ToolRequestDelta+Done). The streaming context is opened + at the Start event with the initial ToolRequestContent (tool_call_id + + name + empty arguments), argument tokens are streamed as deltas, and the + context is closed on Done. + + This test uses a realistic pydantic-ai event sequence: args arrive as a + PartDeltaEvent fragment (the way OpenAI/Anthropic actually stream JSON + tool-call arguments). + """ + from pydantic_ai.messages import ToolCallPartDelta + + from agentex.types.tool_request_delta import ToolRequestDelta + streaming, messages = fake_adk events = [ PartStartEvent( index=1, part=ToolCallPart(tool_name="get_weather", args=None, tool_call_id="c1"), ), + # Realistic: args arrive as delta tokens (JSON string fragments). 
+ PartDeltaEvent( + index=1, + delta=ToolCallPartDelta(args_delta='{"city":"Paris"}'), + ), PartEndEvent( index=1, part=ToolCallPart(tool_name="get_weather", args='{"city":"Paris"}', tool_call_id="c1"), @@ -272,21 +294,28 @@ async def test_tool_call_emits_full_tool_request_message_on_part_end( ] await stream_pydantic_ai_events(_aiter(events), TASK_ID) - assert streaming.contexts == [], "Tool calls do not open a streaming context" - assert len(messages.created) == 1 - msg = messages.created[0] - assert msg["task_id"] == TASK_ID - content = msg["content"] + # AGX1-373: tool messages arrive via streaming_task_message_context. + assert messages.created == [], "adk.messages.create must not be called" + assert len(streaming.contexts) == 1, "tool_request opens a streaming context" + ctx = streaming.contexts[0] + assert ctx.closed is True + content = ctx.initial_content assert isinstance(content, ToolRequestContent) assert content.tool_call_id == "c1" assert content.name == "get_weather" - assert content.arguments == {"city": "Paris"} assert content.author == "agent" + # AGX1-377 streamed shape: initial_content has empty args (args come via delta) + assert content.arguments == {} + # The arg delta is delivered as a stream_update + assert len(ctx.updates) == 1 + assert isinstance(ctx.updates[0].delta, ToolRequestDelta) + assert ctx.updates[0].delta.arguments_delta == '{"city":"Paris"}' async def test_tool_call_with_dict_args_passes_through( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] ) -> None: - _, messages = fake_adk + """When args arrive pre-populated as a dict in PartStart, they're in initial_content.""" + streaming, messages = fake_adk events = [ PartStartEvent( index=0, @@ -299,23 +328,40 @@ async def test_tool_call_with_dict_args_passes_through( ] await stream_pydantic_ai_events(_aiter(events), TASK_ID) - assert len(messages.created) == 1 - assert messages.created[0]["content"].arguments == {"q": "weather"} + # AGX1-373: tool messages via 
streaming_task_message_context + assert messages.created == [] + assert len(streaming.contexts) == 1 + # Dict args present at PartStart land directly in initial_content.arguments + assert streaming.contexts[0].initial_content.arguments == {"q": "weather"} + assert streaming.contexts[0].updates == [], "no delta for pre-populated dict args" async def test_tool_call_with_invalid_json_args_surfaces_raw( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] ) -> None: - """Don't drop the tool call when the model emits malformed JSON args. + """Malformed JSON arg delta is surfaced as a ToolRequestDelta with the raw string. + + The argument delta is delivered as-is by auto_send; the client-side + accumulator or the streaming backend handles malformed JSON gracefully. - The arguments field is preserved under ``_raw`` so the failure is - visible to the UI rather than silently truncated. + Parts-manager invariant: PartEnd.part is the accumulated snapshot; real + pydantic-ai conveys args via PartStart + PartDeltaEvent, so a + PartStart(None)+PartEnd(json) with no delta is not realizable. """ - _, messages = fake_adk + from pydantic_ai.messages import ToolCallPartDelta + + from agentex.types.tool_request_delta import ToolRequestDelta + + streaming, messages = fake_adk events = [ PartStartEvent( index=0, part=ToolCallPart(tool_name="t", args=None, tool_call_id="c"), ), + # Malformed JSON arrives as a delta token. 
+ PartDeltaEvent( + index=0, + delta=ToolCallPartDelta(args_delta="not-json{"), + ), PartEndEvent( index=0, part=ToolCallPart(tool_name="t", args="not-json{", tool_call_id="c"), @@ -323,13 +369,21 @@ async def test_tool_call_with_invalid_json_args_surfaces_raw( ] await stream_pydantic_ai_events(_aiter(events), TASK_ID) - assert len(messages.created) == 1 - assert messages.created[0]["content"].arguments == {"_raw": "not-json{"} + # AGX1-373: tool messages via streaming_task_message_context + assert messages.created == [] + assert len(streaming.contexts) == 1 + ctx = streaming.contexts[0] + # Initial content has empty args (args come via delta) + assert ctx.initial_content.arguments == {} + # The malformed JSON is surfaced verbatim in the ToolRequestDelta + assert len(ctx.updates) == 1 + assert isinstance(ctx.updates[0].delta, ToolRequestDelta) + assert ctx.updates[0].delta.arguments_delta == "not-json{" async def test_tool_call_with_none_args_defaults_to_empty_dict( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] ) -> None: - _, messages = fake_adk + streaming, messages = fake_adk events = [ PartStartEvent( index=0, @@ -342,15 +396,20 @@ async def test_tool_call_with_none_args_defaults_to_empty_dict( ] await stream_pydantic_ai_events(_aiter(events), TASK_ID) - assert len(messages.created) == 1 - assert messages.created[0]["content"].arguments == {} + # AGX1-373: tool messages via streaming_task_message_context + assert messages.created == [] + assert len(streaming.contexts) == 1 + assert streaming.contexts[0].initial_content.arguments == {} + assert streaming.contexts[0].updates == [], "no delta when args are absent" class TestToolResult: async def test_tool_return_emits_full_tool_response_message( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] ) -> None: - _, messages = fake_adk + # AGX1-373: tool responses arrive via streaming_task_message_context + # (open+close pair), NOT via adk.messages.create. 
+ streaming, messages = fake_adk events = [ FunctionToolResultEvent( part=ToolReturnPart(tool_name="get_weather", content="Sunny, 72F", tool_call_id="c1"), @@ -358,13 +417,17 @@ async def test_tool_return_emits_full_tool_response_message( ] await stream_pydantic_ai_events(_aiter(events), TASK_ID) - assert len(messages.created) == 1 - content = messages.created[0]["content"] + assert messages.created == [], "adk.messages.create must not be called after reimplementation" + assert len(streaming.contexts) == 1 + ctx = streaming.contexts[0] + assert ctx.closed is True + content = ctx.initial_content assert isinstance(content, ToolResponseContent) assert content.tool_call_id == "c1" assert content.name == "get_weather" assert content.content == "Sunny, 72F" assert content.author == "agent" + assert ctx.updates == [], "open+close only — no deltas for tool messages" async def test_tool_return_with_dict_content_preserves_structure( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] @@ -377,7 +440,7 @@ async def test_tool_return_with_dict_content_preserves_structure( and divergent from the sync converter which uses ``_tool_return_content`` to return dicts as-is. """ - _, messages = fake_adk + streaming, messages = fake_adk events = [ FunctionToolResultEvent( part=ToolReturnPart(tool_name="t", content={"temp": 72, "sky": "clear"}, tool_call_id="c"), @@ -385,7 +448,10 @@ async def test_tool_return_with_dict_content_preserves_structure( ] await stream_pydantic_ai_events(_aiter(events), TASK_ID) - out = messages.created[0]["content"].content + # AGX1-373: tool messages via streaming_task_message_context + assert messages.created == [] + assert len(streaming.contexts) == 1 + out = streaming.contexts[0].initial_content.content assert out == {"temp": 72, "sky": "clear"}, ( f"Expected the dict to survive verbatim; got {out!r}. " "If this is a Python repr string, the helper regressed to str(content)." 
@@ -402,7 +468,7 @@ class WeatherResult(BaseModel): temp: int sky: str - _, messages = fake_adk + streaming, messages = fake_adk events = [ FunctionToolResultEvent( part=ToolReturnPart( @@ -414,13 +480,16 @@ class WeatherResult(BaseModel): ] await stream_pydantic_ai_events(_aiter(events), TASK_ID) - out = messages.created[0]["content"].content + # AGX1-373: tool messages via streaming_task_message_context + assert messages.created == [] + assert len(streaming.contexts) == 1 + out = streaming.contexts[0].initial_content.content assert out == {"temp": 72, "sky": "clear"} async def test_retry_prompt_part_surfaces_as_tool_response( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] ) -> None: - _, messages = fake_adk + streaming, messages = fake_adk events = [ FunctionToolResultEvent( part=RetryPromptPart( @@ -432,8 +501,10 @@ async def test_retry_prompt_part_surfaces_as_tool_response( ] await stream_pydantic_ai_events(_aiter(events), TASK_ID) - assert len(messages.created) == 1 - content = messages.created[0]["content"] + # AGX1-373: tool messages via streaming_task_message_context + assert messages.created == [] + assert len(streaming.contexts) == 1 + content = streaming.contexts[0].initial_content assert isinstance(content, ToolResponseContent) assert content.tool_call_id == "c1" # RetryPromptPart.content stringifies to the error description @@ -446,9 +517,9 @@ async def test_text_then_tool_then_text_uses_separate_contexts_in_order( ) -> None: """End-to-end multi-step shape: text → tool call → tool result → more text. - Each text/reasoning segment must get its own streaming context that is - closed before the next one opens, and tool messages must interleave - correctly via ``adk.messages.create``. + AGX1-373 envelope change: tool messages now arrive via + streaming_task_message_context (open+close pairs) instead of + adk.messages.create. All four message types open streaming contexts. 
""" streaming, messages = fake_adk events = [ @@ -474,18 +545,30 @@ async def test_text_then_tool_then_text_uses_separate_contexts_in_order( ] final = await stream_pydantic_ai_events(_aiter(events), TASK_ID) - assert len(streaming.contexts) == 2, "One context per text part — tool calls don't open streaming contexts" + # AGX1-373: all 4 messages (text, tool_request, tool_response, text) + # arrive via streaming_task_message_context. + assert messages.created == [], "adk.messages.create must not be called after reimplementation" + assert len(streaming.contexts) == 4 assert all(ctx.closed for ctx in streaming.contexts) - assert _text_deltas(streaming.contexts[0]) == ["Looking up..."] - assert _text_deltas(streaming.contexts[1]) == ["It's sunny."] - # Two messages: tool request, then tool response — in that order. - assert [type(m["content"]).__name__ for m in messages.created] == [ - "ToolRequestContent", - "ToolResponseContent", - ] - assert messages.created[0]["content"].tool_call_id == "c1" - assert messages.created[1]["content"].tool_call_id == "c1" + text_ctxs = [ctx for ctx in streaming.contexts if isinstance(ctx.initial_content, TextContent)] + tool_req_ctxs = [ctx for ctx in streaming.contexts if isinstance(ctx.initial_content, ToolRequestContent)] + tool_resp_ctxs = [ctx for ctx in streaming.contexts if isinstance(ctx.initial_content, ToolResponseContent)] + assert len(text_ctxs) == 2 + assert len(tool_req_ctxs) == 1 + assert len(tool_resp_ctxs) == 1 + + assert _text_deltas(text_ctxs[0]) == ["Looking up..."] + assert _text_deltas(text_ctxs[1]) == ["It's sunny."] + + # Tool content is preserved verbatim. + assert tool_req_ctxs[0].initial_content.tool_call_id == "c1" + assert tool_resp_ctxs[0].initial_content.tool_call_id == "c1" + + # Tool contexts carry no deltas (open+close only). + assert tool_req_ctxs[0].updates == [] + assert tool_resp_ctxs[0].updates == [] + assert final == "It's sunny." 
async def test_new_text_part_after_text_closes_previous( @@ -533,7 +616,11 @@ async def test_reasoning_then_text_closes_reasoning_context( async def test_tool_result_closes_any_open_streaming_context( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] ) -> None: - """A tool result arriving while a text context is open must close that context first.""" + """A tool result arriving while a text context is open must close that context first. + + AGX1-373: the tool response itself now also opens a streaming context + (open+close pair) rather than going through adk.messages.create. + """ streaming, messages = fake_adk events = [ PartStartEvent(index=0, part=TextPart(content="")), @@ -548,7 +635,10 @@ async def test_tool_result_closes_any_open_streaming_context( assert streaming.contexts[0].closed is True, ( "Helper must close any open streaming context before emitting a tool result message" ) - assert len(messages.created) == 1 + # AGX1-373: tool response arrives via streaming_task_message_context + assert messages.created == [] + assert len(streaming.contexts) == 2 + assert isinstance(streaming.contexts[1].initial_content, ToolResponseContent) class TestDeltaForOrphanIndexIgnored: @@ -584,7 +674,7 @@ async def on_tool_end(self, tool_call_id: str, result: Any) -> None: async def test_handler_records_start_and_end_for_each_tool_call( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] ) -> None: - _, messages = fake_adk + streaming, messages = fake_adk handler = self._RecordingHandler() events = [ PartStartEvent( @@ -605,11 +695,12 @@ async def test_handler_records_start_and_end_for_each_tool_call( tracing_handler=handler, # type: ignore[arg-type] ) - # Streaming side-effects still happen — tracing is additive. - assert [type(m["content"]).__name__ for m in messages.created] == [ - "ToolRequestContent", - "ToolResponseContent", - ] + # AGX1-373: tool messages arrive via streaming_task_message_context. 
+ # Tracing is still additive — both messages are delivered AND hooks fire. + assert messages.created == [] + assert len(streaming.contexts) == 2 + assert isinstance(streaming.contexts[0].initial_content, ToolRequestContent) + assert isinstance(streaming.contexts[1].initial_content, ToolResponseContent) # And both lifecycle hooks fired exactly once with the right payload. assert handler.starts == [ { @@ -680,8 +771,12 @@ async def test_handler_records_each_tool_in_multi_tool_run( async def test_omitting_handler_is_a_no_op_for_existing_behavior( self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] ) -> None: - """Regression: passing no tracing handler preserves the pre-tracing behavior.""" - _, messages = fake_adk + """Regression: passing no tracing handler preserves streaming behavior. + + AGX1-373: tool messages arrive via streaming_task_message_context + regardless of whether tracing_handler is passed. + """ + streaming, messages = fake_adk events = [ PartStartEvent( index=0, @@ -696,11 +791,11 @@ async def test_omitting_handler_is_a_no_op_for_existing_behavior( ), ] await stream_pydantic_ai_events(_aiter(events), TASK_ID) - # Exact same shape as before tracing existed. - assert [type(m["content"]).__name__ for m in messages.created] == [ - "ToolRequestContent", - "ToolResponseContent", - ] + # AGX1-373: tool messages via streaming_task_message_context. 
+ assert messages.created == [] + assert len(streaming.contexts) == 2 + content_types = [type(ctx.initial_content).__name__ for ctx in streaming.contexts] + assert content_types == ["ToolRequestContent", "ToolResponseContent"] class TestPydanticAITracingHandlerDeterministicIds: @@ -867,3 +962,101 @@ async def boom() -> AsyncIterator[Any]: await stream_pydantic_ai_events(boom(), TASK_ID) assert streaming.contexts[0].closed is True + + +# --------------------------------------------------------------------------- +# Characterization test: lock the wire-level delivery shape for a representative +# pydantic-ai run (text + tool call + tool response + more text). +# +# Step 1 (CURRENT behavior): written against the original implementation. +# - Text/reasoning use adk.streaming.streaming_task_message_context. +# - Tool messages use adk.messages.create (FakeMessagesModule.created list). +# - Final text is the last text segment. +# +# Step 2 (POST-reimplementation on UnifiedEmitter / auto_send): +# The assertions in TestCharacterizeWireShape (below) lock the new shape. +# Tool messages no longer go through adk.messages.create; they arrive via +# streaming_task_message_context open+close pairs (Start+Done envelope). +# This is the AGX1-373 accepted envelope change: logical content is identical. +# --------------------------------------------------------------------------- + + +class TestCharacterizeWireShape: + """Characterization tests: lock the wire-level delivery shape after reimplementation. + + Uses FakeStreamingModule + FakeMessagesModule (the existing fake pair). + + AGX1-373 shape (post-reimplementation on UnifiedEmitter / auto_send): + - Text/reasoning: streaming_task_message_context (open + deltas + close) + - Tool messages: streaming_task_message_context (open+close, no deltas) + - adk.messages.create is NOT called. + - Final text == last text segment only. 
+ + This class was first written to characterize the OLD shape (adk.messages.create + for tool messages) and was updated post-reimplementation to reflect the new + delivery channel. The logical content is identical; only the channel changed. + """ + + async def test_text_tool_text_new_wire_shape( + self, fake_adk: tuple[FakeStreamingModule, FakeMessagesModule] + ) -> None: + """Representative run: text -> tool call -> tool response -> more text. + + Post-AGX1-373 delivery shape: + - Four streaming contexts: text, tool_request, tool_response, text. + - adk.messages.create NOT called. + - Final text == "It's sunny." (last segment only, matching the + multi-step convention). + """ + from pydantic_ai.messages import ToolReturnPart + + streaming, messages = fake_adk + events = [ + PartStartEvent(index=0, part=TextPart(content="")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="Looking up...")), + PartEndEvent(index=0, part=TextPart(content="Looking up...")), + PartStartEvent( + index=1, + part=ToolCallPart(tool_name="get_weather", args=None, tool_call_id="c1"), + ), + PartEndEvent( + index=1, + part=ToolCallPart(tool_name="get_weather", args="{}", tool_call_id="c1"), + ), + FunctionToolResultEvent( + part=ToolReturnPart(tool_name="get_weather", content="Sunny", tool_call_id="c1"), + ), + PartStartEvent(index=0, part=TextPart(content="")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="It's sunny.")), + PartEndEvent(index=0, part=TextPart(content="It's sunny.")), + ] + + final = await stream_pydantic_ai_events(_aiter(events), TASK_ID) + + assert final == "It's sunny.", "multi-step: only the last text segment is returned" + + # AGX1-373: all 4 messages arrive via streaming_task_message_context + assert messages.created == [] + assert len(streaming.contexts) == 4 + assert all(ctx.closed for ctx in streaming.contexts) + + content_types = [type(ctx.initial_content).__name__ for ctx in streaming.contexts] + assert content_types == [ + 
"TextContent", + "ToolRequestContent", + "ToolResponseContent", + "TextContent", + ] + + text_ctxs = [ctx for ctx in streaming.contexts if isinstance(ctx.initial_content, TextContent)] + tool_req_ctxs = [ctx for ctx in streaming.contexts if isinstance(ctx.initial_content, ToolRequestContent)] + tool_resp_ctxs = [ctx for ctx in streaming.contexts if isinstance(ctx.initial_content, ToolResponseContent)] + + assert _text_deltas(text_ctxs[0]) == ["Looking up..."] + assert _text_deltas(text_ctxs[1]) == ["It's sunny."] + assert tool_req_ctxs[0].initial_content.tool_call_id == "c1" + assert tool_req_ctxs[0].initial_content.name == "get_weather" + assert tool_req_ctxs[0].updates == [] + assert tool_resp_ctxs[0].initial_content.tool_call_id == "c1" + assert tool_resp_ctxs[0].initial_content.content == "Sunny" + assert tool_resp_ctxs[0].updates == [] diff --git a/tests/lib/adk/test_pydantic_ai_sync.py b/tests/lib/adk/test_pydantic_ai_sync.py index 36d06200e..080bc5be8 100644 --- a/tests/lib/adk/test_pydantic_ai_sync.py +++ b/tests/lib/adk/test_pydantic_ai_sync.py @@ -3,9 +3,11 @@ from __future__ import annotations import json +import asyncio from typing import Any, AsyncIterator import pytest +from pydantic_ai.run import AgentRunResult, AgentRunResultEvent from pydantic_ai.messages import ( TextPart, PartEndEvent, @@ -481,3 +483,75 @@ async def test_author_is_agent(self, events: list[Any]): content = getattr(e, "content", None) if content is not None and hasattr(content, "author"): assert content.author == "agent" + + +class TestOnResultCallback: + """on_result callback: captures the terminal AgentRunResultEvent without + altering streaming output.""" + + def _make_result_event(self, output: Any = "hello") -> AgentRunResultEvent: + result = AgentRunResult(output=output, _output_tool_name=None) + return AgentRunResultEvent(result=result) + + async def test_callback_invoked_once_with_result_event(self): + """on_result is called exactly once, with the AgentRunResultEvent.""" + 
captured: list[AgentRunResultEvent] = [] + + def on_result(event: AgentRunResultEvent) -> None: + captured.append(event) + + result_event = self._make_result_event("the answer") + events = [result_event] + await _collect(convert_pydantic_ai_to_agentex_events(_aiter(events), on_result=on_result)) + + assert len(captured) == 1 + assert captured[0] is result_event + assert captured[0].result.output == "the answer" + + async def test_streaming_output_unchanged_with_callback(self): + """Yielded StreamTaskMessage* sequence is identical whether on_result is set or not.""" + result_event = self._make_result_event() + events = [ + PartStartEvent(index=0, part=TextPart(content="")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="hi")), + PartEndEvent(index=0, part=TextPart(content="hi")), + result_event, + ] + + captured: list[AgentRunResultEvent] = [] + out_with = await _collect(convert_pydantic_ai_to_agentex_events(_aiter(events), on_result=captured.append)) + out_without = await _collect(convert_pydantic_ai_to_agentex_events(_aiter(events))) + + assert len(out_with) == len(out_without) + for a, b in zip(out_with, out_without): + assert type(a) is type(b) + assert a.model_dump() == b.model_dump() + assert len(captured) == 1 + + async def test_no_callback_no_error(self): + """AgentRunResultEvent is silently ignored when on_result is None.""" + result_event = self._make_result_event() + events = [result_event] + out = await _collect(convert_pydantic_ai_to_agentex_events(_aiter(events))) + assert out == [] + + async def test_async_callback_is_awaited(self): + """An async on_result callable is properly awaited. + + The callback suspends (``await asyncio.sleep(0)``) before recording its + side effect, so ``awaited`` is only populated if the converter actually + awaits the returned coroutine — distinguishing "awaited" from + "called-but-not-awaited." 
+ """ + awaited: list[AgentRunResultEvent] = [] + + async def on_result_async(event: AgentRunResultEvent) -> None: + await asyncio.sleep(0) + awaited.append(event) + + result_event = self._make_result_event("async_output") + events = [result_event] + await _collect(convert_pydantic_ai_to_agentex_events(_aiter(events), on_result=on_result_async)) + + assert len(awaited) == 1 + assert awaited[0].result.output == "async_output" diff --git a/tests/lib/adk/test_pydantic_ai_sync_unified.py b/tests/lib/adk/test_pydantic_ai_sync_unified.py new file mode 100644 index 000000000..f920418de --- /dev/null +++ b/tests/lib/adk/test_pydantic_ai_sync_unified.py @@ -0,0 +1,209 @@ +"""Tests for the unified sync (HTTP ACP) path: PydanticAITurn + UnifiedEmitter. + +Exercises the path documented in _pydantic_ai_sync.py under "Recommended: unified surface": +- events forwarded by yield_turn equal PydanticAITurn(stream).events (passthrough) +- with a trace context + fake tracing backend, tool spans are derived (start_span / end_span called) +- with a trace context + fake tracing backend, reasoning spans are derived +""" + +from __future__ import annotations + +from typing import Any, AsyncIterator + +from pydantic_ai.run import AgentRunResult, AgentRunResultEvent +from pydantic_ai.usage import RunUsage +from pydantic_ai.messages import ( + TextPart, + PartEndEvent, + ThinkingPart, + ToolCallPart, + TextPartDelta, + PartDeltaEvent, + PartStartEvent, + ThinkingPartDelta, + ToolCallPartDelta, +) + +from agentex.lib.core.harness import UnifiedEmitter +from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn + + +async def _aiter(events: list[Any]) -> AsyncIterator[Any]: + for e in events: + yield e + + +async def _collect(stream: AsyncIterator[Any]) -> list[Any]: + return [e async for e in stream] + + +class _FakeSpan: + def __init__(self, name: str): + self.name = name + self.output: Any = None + + +class _FakeTracing: + def __init__(self) -> None: + self.started: 
list[tuple[str, str | None, Any]] = [] + self.ended: list[tuple[str, Any]] = [] + + async def start_span(self, *, trace_id, name, input=None, parent_id=None, data=None, task_id=None): + self.started.append((name, parent_id, input)) + return _FakeSpan(name) + + async def end_span(self, *, trace_id, span): + self.ended.append((span.name, span.output)) + + +def _make_result_event(usage: RunUsage | None = None) -> AgentRunResultEvent: + result = AgentRunResult(output="done", _output_tool_name=None) + if usage is not None: + result._state.usage = usage + return AgentRunResultEvent(result=result) + + +class TestUnifiedSyncPathPassthrough: + """The events forwarded by yield_turn are identical to PydanticAITurn.events.""" + + async def test_text_stream_passthrough(self): + raw_events = [ + PartStartEvent(index=0, part=TextPart(content="")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="hello")), + PartEndEvent(index=0, part=TextPart(content="hello")), + ] + + turn_a = PydanticAITurn(_aiter(raw_events), model="openai:gpt-4o") + direct = await _collect(turn_a.events) + + turn_b = PydanticAITurn(_aiter(raw_events), model="openai:gpt-4o") + emitter = UnifiedEmitter(task_id="t", trace_id=None, parent_span_id=None) + via_emitter = await _collect(emitter.yield_turn(turn_b)) + + assert len(via_emitter) == len(direct) + for a, b in zip(via_emitter, direct): + assert type(a) is type(b) + assert a.model_dump() == b.model_dump() + + async def test_tool_call_stream_passthrough(self): + raw_events = [ + PartStartEvent(index=0, part=ToolCallPart(tool_name="Bash", args=None, tool_call_id="c1")), + PartDeltaEvent(index=0, delta=ToolCallPartDelta(args_delta='{"cmd":"ls"}')), + PartEndEvent( + index=0, + part=ToolCallPart(tool_name="Bash", args='{"cmd":"ls"}', tool_call_id="c1"), + ), + ] + + turn_a = PydanticAITurn(_aiter(raw_events), model="openai:gpt-4o") + direct = await _collect(turn_a.events) + + turn_b = PydanticAITurn(_aiter(raw_events), model="openai:gpt-4o") + 
emitter = UnifiedEmitter(task_id="t", trace_id=None, parent_span_id=None) + via_emitter = await _collect(emitter.yield_turn(turn_b)) + + assert len(via_emitter) == len(direct) + for a, b in zip(via_emitter, direct): + assert type(a) is type(b) + assert a.model_dump() == b.model_dump() + + +class TestUnifiedSyncPathSpanDerivation: + """With trace context + fake tracing, spans are derived from the stream.""" + + async def test_tool_span_opened_and_closed(self): + """A tool call produces start_span + end_span on the fake tracing backend.""" + from pydantic_ai.messages import ToolReturnPart, FunctionToolResultEvent + + tool_events = [ + PartStartEvent( + index=0, + part=ToolCallPart(tool_name="Bash", args={"cmd": "ls"}, tool_call_id="call_1"), + ), + PartEndEvent( + index=0, + part=ToolCallPart(tool_name="Bash", args='{"cmd":"ls"}', tool_call_id="call_1"), + ), + FunctionToolResultEvent( + part=ToolReturnPart(tool_name="Bash", content="files", tool_call_id="call_1"), + ), + ] + + fake = _FakeTracing() + turn = PydanticAITurn(_aiter(tool_events), model="openai:gpt-4o") + emitter = UnifiedEmitter(task_id="t", trace_id="tr", parent_span_id="p", tracing=fake) + + events = await _collect(emitter.yield_turn(turn)) + + assert len(events) >= 2, "at least Start(tool) + Done + Full(response)" + assert len(fake.started) == 1, "one tool span opened" + assert len(fake.ended) == 1, "one tool span closed" + span_name, parent_id, span_input = fake.started[0] + assert span_name == "Bash" + assert parent_id == "p" + closed_name, closed_output = fake.ended[0] + assert closed_name == "Bash" + + async def test_reasoning_span_opened_and_closed(self): + """A thinking/reasoning block produces start_span + end_span.""" + reasoning_events = [ + PartStartEvent(index=0, part=ThinkingPart(content="")), + PartDeltaEvent(index=0, delta=ThinkingPartDelta(content_delta="let me think")), + PartEndEvent(index=0, part=ThinkingPart(content="let me think")), + ] + + fake = _FakeTracing() + turn = 
PydanticAITurn(_aiter(reasoning_events), model="openai:gpt-4o") + emitter = UnifiedEmitter(task_id="t", trace_id="tr", parent_span_id="p", tracing=fake) + + await _collect(emitter.yield_turn(turn)) + + assert len(fake.started) == 1, "one reasoning span opened" + assert len(fake.ended) == 1, "one reasoning span closed" + span_name, parent_id, _ = fake.started[0] + assert span_name == "reasoning" + assert parent_id == "p" + + async def test_no_trace_id_means_no_spans(self): + """When trace_id is None, no spans are derived even with a fake tracing backend.""" + raw_events = [ + PartStartEvent( + index=0, + part=ToolCallPart(tool_name="Bash", args={"cmd": "ls"}, tool_call_id="c2"), + ), + PartEndEvent( + index=0, + part=ToolCallPart(tool_name="Bash", args='{"cmd":"ls"}', tool_call_id="c2"), + ), + ] + + fake = _FakeTracing() + turn = PydanticAITurn(_aiter(raw_events), model="openai:gpt-4o") + emitter = UnifiedEmitter(task_id="t", trace_id=None, parent_span_id=None, tracing=fake) + + await _collect(emitter.yield_turn(turn)) + + assert fake.started == [], "no spans when trace_id is absent" + assert fake.ended == [] + + async def test_tracer_false_suppresses_spans_even_with_trace_id(self): + """tracer=False disables span derivation regardless of trace_id.""" + raw_events = [ + PartStartEvent( + index=0, + part=ToolCallPart(tool_name="Bash", args={"cmd": "ls"}, tool_call_id="c3"), + ), + PartEndEvent( + index=0, + part=ToolCallPart(tool_name="Bash", args='{"cmd":"ls"}', tool_call_id="c3"), + ), + ] + + fake = _FakeTracing() + turn = PydanticAITurn(_aiter(raw_events), model="openai:gpt-4o") + emitter = UnifiedEmitter(task_id="t", trace_id="tr", parent_span_id="p", tracer=False, tracing=fake) + + await _collect(emitter.yield_turn(turn)) + + assert fake.started == [] + assert fake.ended == [] diff --git a/tests/lib/adk/test_pydantic_ai_turn.py b/tests/lib/adk/test_pydantic_ai_turn.py new file mode 100644 index 000000000..0659895d3 --- /dev/null +++ 
b/tests/lib/adk/test_pydantic_ai_turn.py @@ -0,0 +1,276 @@ +"""Tests for PydanticAITurn and pydantic_ai_usage_to_turn_usage.""" + +from __future__ import annotations + +from typing import Any, AsyncIterator + +from pydantic_ai.run import AgentRunResult, AgentRunResultEvent +from pydantic_ai.usage import RunUsage +from pydantic_ai.messages import ( + TextPart, + PartEndEvent, + TextPartDelta, + PartDeltaEvent, + PartStartEvent, +) + +from agentex.lib.core.harness import HarnessTurn +from agentex.lib.adk._modules._pydantic_ai_turn import ( + PydanticAITurn, + pydantic_ai_usage_to_turn_usage, +) + + +async def _aiter(events: list[Any]) -> AsyncIterator[Any]: + for e in events: + yield e + + +async def _collect(stream: AsyncIterator[Any]) -> list[Any]: + return [e async for e in stream] + + +def _make_result_event(output: Any = "done", usage: RunUsage | None = None) -> AgentRunResultEvent: + result = AgentRunResult(output=output, _output_tool_name=None) + if usage is not None: + result._state.usage = usage + return AgentRunResultEvent(result=result) + + +class TestUsageNormalization: + def test_usage_normalization_maps_fields(self): + """Real RunUsage fields map correctly onto TurnUsage.""" + usage = RunUsage( + requests=3, + input_tokens=200, + output_tokens=80, + cache_read_tokens=25, + ) + turn_usage = pydantic_ai_usage_to_turn_usage(usage, model="openai:gpt-4o") + + assert turn_usage.model == "openai:gpt-4o" + assert turn_usage.input_tokens == 200 + assert turn_usage.output_tokens == 80 + assert turn_usage.num_llm_calls == 3 + + def test_total_tokens_is_computed(self): + """RunUsage.total_tokens is a computed property; we surface it correctly.""" + usage = RunUsage(input_tokens=100, output_tokens=50) + turn_usage = pydantic_ai_usage_to_turn_usage(usage, model="openai:gpt-4o") + assert turn_usage.total_tokens == 150 + + def test_cache_read_tokens_mapped_to_cached_input_tokens(self): + usage = RunUsage(input_tokens=100, output_tokens=50, cache_read_tokens=20) + 
turn_usage = pydantic_ai_usage_to_turn_usage(usage, model="openai:gpt-4o") + assert turn_usage.cached_input_tokens == 20 + + def test_none_model(self): + """model=None is preserved.""" + usage = RunUsage() + turn_usage = pydantic_ai_usage_to_turn_usage(usage, model=None) + assert turn_usage.model is None + + def test_all_zero_usage_preserves_real_zeros(self): + """An all-zero RunUsage maps real 0s through (not None). + + RunUsage token fields are ints defaulting to 0. A 0 is a genuine + value (e.g. a cache-hit with 0 output tokens), not "unknown", so it + must survive normalization as 0 rather than being coerced to None. + """ + usage = RunUsage() + turn_usage = pydantic_ai_usage_to_turn_usage(usage, model="openai:gpt-4o") + assert turn_usage.num_llm_calls == 0 + assert turn_usage.input_tokens == 0 + assert turn_usage.output_tokens == 0 + assert turn_usage.cached_input_tokens == 0 + assert turn_usage.total_tokens == 0 + + def test_missing_field_degrades_to_none(self): + """A usage object MISSING a field maps that field to None (defensive getattr). + + Guards the version-rename guarantee: if pydantic-ai renames a field, + the absent attribute degrades to None rather than raising. 
+ """ + + class StubUsage: + requests = 2 + input_tokens = 100 + # no output_tokens / cache_read_tokens / total_tokens attributes + + turn_usage = pydantic_ai_usage_to_turn_usage(StubUsage(), model="openai:gpt-4o") + assert turn_usage.num_llm_calls == 2 + assert turn_usage.input_tokens == 100 + assert turn_usage.output_tokens is None + assert turn_usage.cached_input_tokens is None + assert turn_usage.total_tokens is None + + +class TestPydanticAITurn: + async def test_turn_satisfies_harness_turn_protocol(self): + """PydanticAITurn is structurally compatible with HarnessTurn.""" + turn = PydanticAITurn(_aiter([]), model="openai:gpt-4o") + assert isinstance(turn, HarnessTurn) + + async def test_usage_before_exhaustion_returns_default(self): + """usage() before iterating events returns default TurnUsage (model set, tokens None).""" + result_event = _make_result_event(usage=RunUsage(requests=1, input_tokens=100, output_tokens=40)) + events = [result_event] + turn = PydanticAITurn(_aiter(events), model="openai:gpt-4o") + + # Do NOT exhaust events — check usage pre-run + pre_usage = turn.usage() + assert pre_usage.model == "openai:gpt-4o" + assert pre_usage.input_tokens is None + assert pre_usage.output_tokens is None + assert pre_usage.num_llm_calls == 0 + + async def test_turn_events_and_usage(self): + """Driving events to exhaustion populates usage from the terminal event.""" + known_usage = RunUsage( + requests=2, + input_tokens=300, + output_tokens=120, + cache_read_tokens=30, + ) + result_event = _make_result_event(usage=known_usage) + events = [ + PartStartEvent(index=0, part=TextPart(content="")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="hi")), + PartEndEvent(index=0, part=TextPart(content="hi")), + result_event, + ] + turn = PydanticAITurn(_aiter(events), model="openai:gpt-4o") + + collected = await _collect(turn.events) + + # Events match bare converter output (Start + Delta + Done = 3 events) + assert len(collected) == 3 + + # Usage is 
populated after exhaustion + usage = turn.usage() + assert usage.model == "openai:gpt-4o" + assert usage.input_tokens == 300 + assert usage.output_tokens == 120 + assert usage.cached_input_tokens == 30 + assert usage.num_llm_calls == 2 + assert usage.total_tokens == 420 + + async def test_events_match_bare_converter(self): + """Yielded events are identical to bare convert_pydantic_ai_to_agentex_events output.""" + from agentex.lib.adk._modules._pydantic_ai_sync import convert_pydantic_ai_to_agentex_events + + text_events = [ + PartStartEvent(index=0, part=TextPart(content="")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="Hello")), + PartEndEvent(index=0, part=TextPart(content="Hello")), + ] + + turn = PydanticAITurn(_aiter(text_events), model="openai:gpt-4o") + turn_out = await _collect(turn.events) + + bare_out = await _collect(convert_pydantic_ai_to_agentex_events(_aiter(text_events))) + + assert len(turn_out) == len(bare_out) + for a, b in zip(turn_out, bare_out): + assert type(a) is type(b) + assert a.model_dump() == b.model_dump() + + async def test_usage_captured_via_real_usage_accessor(self): + """Drive the turn through the REAL ``result.usage`` property accessor. + + The production code reads ``getattr(run_result, "usage", None)``, which + on this pydantic-ai version resolves the ``_DeprecatedCallableRunUsage`` + property (NOT ``_state.usage`` directly). This asserts that the real + accessor path the converter uses captures the run usage. The test seeds + ``_state.usage`` only because that is the sole supported way to + populate an ``AgentRunResult``; capture is then asserted through the + public ``.usage`` attribute access (verified below), not through the + private ``_state`` shortcut. 
+ """ + known_usage = RunUsage(requests=4, input_tokens=512, output_tokens=64) + result = AgentRunResult(output="done", _output_tool_name=None) + result._state.usage = known_usage + result_event = AgentRunResultEvent(result=result) + + # Sanity: the value is reachable via the real public accessor the + # production code uses (not just via the private _state). The + # _DeprecatedCallableRunUsage property wraps the value, so compare by + # equality rather than identity. + accessed = getattr(result_event.result, "usage", None) + assert accessed is not None + assert accessed.input_tokens == 512 + assert accessed.requests == 4 + + events = [ + PartStartEvent(index=0, part=TextPart(content="")), + PartEndEvent(index=0, part=TextPart(content="")), + result_event, + ] + turn = PydanticAITurn(_aiter(events), model="anthropic:claude-3-5-sonnet") + await _collect(turn.events) + + usage = turn.usage() + assert usage.model == "anthropic:claude-3-5-sonnet" + assert usage.input_tokens == 512 + assert usage.output_tokens == 64 + assert usage.num_llm_calls == 4 + + async def test_no_usage_event_leaves_default_usage(self): + """If the stream has no AgentRunResultEvent, usage() returns the default (tokens None).""" + events = [ + PartStartEvent(index=0, part=TextPart(content="")), + PartEndEvent(index=0, part=TextPart(content="")), + ] + turn = PydanticAITurn(_aiter(events), model="openai:gpt-4o") + await _collect(turn.events) + + usage = turn.usage() + assert usage.model == "openai:gpt-4o" + assert usage.input_tokens is None + assert usage.num_llm_calls == 0 + + +class TestToolRequestStreaming: + """PydanticAITurn.events equals the bare converter output unconditionally. + + The foundation auto_send delivers Start+ToolRequestDelta+Done natively + (AGX1-377), so no coalescing is needed on either channel. + """ + + async def test_events_match_bare_converter_for_streamed_tool_call(self): + """PydanticAITurn yields a ToolRequestDelta for a streamed-args tool call + — i.e. 
it is byte-for-byte the bare converter output, preserving + argument-token streaming on the sync/yield channel.""" + from pydantic_ai.messages import ToolCallPart, ToolCallPartDelta + + from agentex.types.tool_request_delta import ToolRequestDelta + from agentex.types.task_message_update import StreamTaskMessageDelta + from agentex.lib.adk._modules._pydantic_ai_sync import convert_pydantic_ai_to_agentex_events + + tool_events = [ + PartStartEvent(index=0, part=ToolCallPart(tool_name="get_weather", args=None, tool_call_id="c1")), + PartDeltaEvent(index=0, delta=ToolCallPartDelta(args_delta='{"city":"Paris"}')), + PartEndEvent( + index=0, + part=ToolCallPart(tool_name="get_weather", args='{"city":"Paris"}', tool_call_id="c1"), + ), + ] + + turn = PydanticAITurn(_aiter(tool_events), model="openai:gpt-4o") + turn_out = await _collect(turn.events) + + bare_out = await _collect(convert_pydantic_ai_to_agentex_events(_aiter(tool_events))) + + # Turn is identical to the bare converter. + assert len(turn_out) == len(bare_out) + for a, b in zip(turn_out, bare_out): + assert type(a) is type(b) + assert a.model_dump() == b.model_dump() + + # The arg-streaming delta is present. + deltas = [ + e for e in turn_out if isinstance(e, StreamTaskMessageDelta) and isinstance(e.delta, ToolRequestDelta) + ] + assert len(deltas) == 1, "streamed tool-call args must surface as a ToolRequestDelta" + assert isinstance(deltas[0].delta, ToolRequestDelta) + assert deltas[0].delta.arguments_delta == '{"city":"Paris"}' diff --git a/tests/lib/core/harness/conformance/test_pydantic_ai_conformance.py b/tests/lib/core/harness/conformance/test_pydantic_ai_conformance.py new file mode 100644 index 000000000..5012e9974 --- /dev/null +++ b/tests/lib/core/harness/conformance/test_pydantic_ai_conformance.py @@ -0,0 +1,200 @@ +"""Cross-channel conformance fixtures derived from real pydantic-ai event sequences. 
+ +Each fixture is built by running a pydantic_ai event stream through PydanticAITurn +(default coalesce_tool_requests=False) and collecting the canonical StreamTaskMessage* +output. These canonical event lists are then registered with the conformance runner and +exercised by the cross-channel test (yield_events vs auto_send). + +AGX1-377 NOTE +------------- +The pydantic-ai stream emits a tool REQUEST as Start + ToolRequestDelta + Done (not a +Full event). The runner's current normalization does NOT produce a logical delivery for +Start+Delta+Done(tool_request): _yield_logical_deliveries only produces a delivery for +Full(tool_request) or Full(tool_response), and Start+Done for text/reasoning content. +auto_send likewise drops the Start+Delta+Done(tool_request) shape. Both channels handle +it consistently (both ignore it), so the cross-channel test PASSES, but it does NOT yet +assert that the streamed tool-request is actually delivered. Full delivery-equivalence +coverage for streamed tool requests will land once AGX1-377 fixes the normalization. +The fixtures below retain the ToolRequestDelta events so they become valid test inputs +automatically once AGX1-377 lands. 
+""" + +from __future__ import annotations + +import asyncio +from typing import Any, AsyncIterator + +import pytest +from pydantic_ai.messages import ( + TextPart, + PartEndEvent, + ThinkingPart, + ToolCallPart, + TextPartDelta, + PartDeltaEvent, + PartStartEvent, + ToolReturnPart, + ThinkingPartDelta, + ToolCallPartDelta, + FunctionToolResultEvent, +) + +from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn + +from .runner import ( + Fixture, + register, + derive_all, + run_cross_channel_conformance, +) + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +async def _aiter(events: list[Any]) -> AsyncIterator[Any]: + for e in events: + yield e + + +async def _canonical(pydantic_events: list[Any]) -> list[Any]: + """Run pydantic_ai events through PydanticAITurn and collect the output. + + Default coalesce_tool_requests=False means the output equals the bare + convert_pydantic_ai_to_agentex_events output. + """ + turn = PydanticAITurn(_aiter(pydantic_events), model=None) + return [e async for e in turn.events] + + +def _build_fixtures() -> list[Fixture]: + """Build all pydantic-ai conformance fixtures synchronously via asyncio.run.""" + + # ------------------------------------------------------------------ # + # 1. Text-only run: simple streaming text response. + # ------------------------------------------------------------------ # + text_only_pydantic = [ + PartStartEvent(index=0, part=TextPart(content="")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="Hello, ")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="world!")), + PartEndEvent(index=0, part=TextPart(content="Hello, world!")), + ] + + # ------------------------------------------------------------------ # + # 2. Single tool call + tool response. 
+ # The canonical stream emits Start+ToolRequestDelta+Done for the request + # and Full(ToolResponseContent) for the response. See AGX1-377 note above + # for why the request delivery is not yet asserted cross-channel. + # ------------------------------------------------------------------ # + tool_call_pydantic = [ + PartStartEvent( + index=0, + part=ToolCallPart(tool_name="get_weather", args=None, tool_call_id="call_01"), + ), + PartDeltaEvent( + index=0, + delta=ToolCallPartDelta(args_delta='{"city":"Paris"}', tool_call_id="call_01"), + ), + PartEndEvent( + index=0, + part=ToolCallPart(tool_name="get_weather", args='{"city":"Paris"}', tool_call_id="call_01"), + ), + FunctionToolResultEvent( + part=ToolReturnPart(tool_name="get_weather", content="Sunny, 22C", tool_call_id="call_01"), + ), + ] + + # ------------------------------------------------------------------ # + # 3. Reasoning/thinking block: produces ReasoningContent Start+Delta+Done. + # ------------------------------------------------------------------ # + reasoning_pydantic = [ + PartStartEvent(index=0, part=ThinkingPart(content="")), + PartDeltaEvent(index=0, delta=ThinkingPartDelta(content_delta="First, let me think...")), + PartDeltaEvent(index=0, delta=ThinkingPartDelta(content_delta=" Then conclude.")), + PartEndEvent(index=0, part=ThinkingPart(content="First, let me think... Then conclude.")), + ] + + # ------------------------------------------------------------------ # + # 4. Multi-step run: text -> tool call + response -> text. + # Pydantic AI restarts part indices at 0 for each model response; the + # converter assigns globally-monotonic indices to Agentex messages. 
+ # ------------------------------------------------------------------ # + multi_step_pydantic = [ + # First model turn: text then tool call + PartStartEvent(index=0, part=TextPart(content="")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="Let me check the weather.")), + PartEndEvent(index=0, part=TextPart(content="Let me check the weather.")), + PartStartEvent( + index=1, + part=ToolCallPart(tool_name="get_weather", args=None, tool_call_id="call_ms1"), + ), + PartDeltaEvent( + index=1, + delta=ToolCallPartDelta(args_delta='{"city":"London"}', tool_call_id="call_ms1"), + ), + PartEndEvent( + index=1, + part=ToolCallPart(tool_name="get_weather", args='{"city":"London"}', tool_call_id="call_ms1"), + ), + FunctionToolResultEvent( + part=ToolReturnPart(tool_name="get_weather", content="Cloudy, 15C", tool_call_id="call_ms1"), + ), + # Second model turn: text response (pydantic restarts index at 0) + PartStartEvent(index=0, part=TextPart(content="")), + PartDeltaEvent(index=0, delta=TextPartDelta(content_delta="It's cloudy and 15C in London.")), + PartEndEvent(index=0, part=TextPart(content="It's cloudy and 15C in London.")), + ] + + text_only_events = asyncio.run(_canonical(text_only_pydantic)) + tool_call_events = asyncio.run(_canonical(tool_call_pydantic)) + reasoning_events = asyncio.run(_canonical(reasoning_pydantic)) + multi_step_events = asyncio.run(_canonical(multi_step_pydantic)) + + return [ + Fixture(name="pydantic-ai-text-only", events=text_only_events), + Fixture(name="pydantic-ai-single-tool-call", events=tool_call_events), + Fixture(name="pydantic-ai-reasoning-block", events=reasoning_events), + Fixture(name="pydantic-ai-multi-step", events=multi_step_events), + ] + + +_FIXTURES: list[Fixture] = _build_fixtures() + +for _f in _FIXTURES: + register(_f) + + +# --------------------------------------------------------------------------- +# Cross-channel conformance: logical equivalence + span equivalence +# 
--------------------------------------------------------------------------- + + +@pytest.mark.parametrize("fixture", _FIXTURES, ids=lambda f: f.name) +@pytest.mark.asyncio +async def test_cross_channel_equivalence(fixture: Fixture) -> None: + """Assert that yield_events and auto_send produce equivalent logical + deliveries and identical span signals for each pydantic-ai fixture. + + See runner.py for the full contract. The AGX1-377 note at the top of this + module explains why streamed-tool-request delivery is not yet asserted. + """ + yield_deliveries, auto_deliveries, yield_spans, auto_spans = await run_cross_channel_conformance(fixture) + + assert yield_deliveries == auto_deliveries, ( + f"[{fixture.name}] logical deliveries differ:\n yield: {yield_deliveries}\n auto_send: {auto_deliveries}" + ) + assert yield_spans == auto_spans, ( + f"[{fixture.name}] span signals differ:\n yield: {yield_spans}\n auto_send: {auto_spans}" + ) + + +# --------------------------------------------------------------------------- +# Backward-compatible determinism guard +# --------------------------------------------------------------------------- + + +@pytest.mark.parametrize("fixture", _FIXTURES, ids=lambda f: f.name) +def test_span_derivation_is_deterministic(fixture: Fixture) -> None: + """Span derivation over the same event list is deterministic.""" + assert derive_all(fixture.events) == derive_all(fixture.events) diff --git a/tests/lib/core/harness/test_harness_pydantic_ai_async.py b/tests/lib/core/harness/test_harness_pydantic_ai_async.py new file mode 100644 index 000000000..8bda7d020 --- /dev/null +++ b/tests/lib/core/harness/test_harness_pydantic_ai_async.py @@ -0,0 +1,361 @@ +"""Integration test: async (Redis-streaming) channel with a pydantic-ai agent. + +Exercises the unified harness surface (UnifiedEmitter.auto_send_turn + PydanticAITurn) +with a minimal pydantic-ai agent backed by TestModel so the test runs fully +offline (no API keys, no Redis, no Agentex server). 
+ +Agent description +----------------- +Same single-tool agent as the sync test: ``get_weather(city: str) -> str`` +returning "sunny and 72F". TestModel is configured to call the tool once then +produce a fixed text reply. + +The async path uses the bare PydanticAITurn (no coalescing): the foundation +auto_send delivers streamed tool-request Start+ToolRequestDelta+Done messages +natively (AGX1-377 fix), so no coalescing wrapper is needed. + +What is tested +-------------- +- The async handler pushes the correct sequence of messages to the fake streaming + backend: tool_request + tool_response + text (in that order). +- final_text equals the TestModel custom output. +- With a SpanTracer, tool spans are derived and forwarded to the fake tracing + backend (streamed tool-request delivery now triggers span derivation on the + async path). + +What is NOT covered without live infrastructure +----------------------------------------------- +- Actual Redis streaming (requires a running Redis instance). +- The ACP on_task_event_send / on_task_create / on_task_cancel lifecycle. +- Multi-turn history persistence via adk.state. +- Real LLM calls or production model behaviour. +- The full FastACP async request lifecycle. + +See also: test_harness_pydantic_ai_sync.py (span derivation with sync path) and +test_harness_pydantic_ai_temporal.py (temporal activity path). 
+""" + +from __future__ import annotations + +from typing import Any + +import pytest +from pydantic_ai import Agent +from pydantic_ai.models.test import TestModel + +from agentex.types.task_message import TaskMessage +from agentex.lib.core.harness.types import TurnResult +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.lib.core.harness.emitter import UnifiedEmitter +from agentex.types.tool_request_content import ToolRequestContent +from agentex.types.tool_response_content import ToolResponseContent +from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn + +# --------------------------------------------------------------------------- +# Minimal agent under test +# --------------------------------------------------------------------------- + + +def _make_agent() -> Agent: + """Build a pydantic-ai agent with one weather tool and a TestModel.""" + model = TestModel( + call_tools=["get_weather"], + custom_output_text="The weather in Paris is sunny and 72F.", + ) + agent: Agent = Agent(model) + + @agent.tool_plain + def get_weather(city: str) -> str: + """Get the current weather for a city.""" + return f"The weather in {city} is sunny and 72F" + + return agent + + +# --------------------------------------------------------------------------- +# Fake streaming backend (replaces adk.streaming; no Redis required) +# --------------------------------------------------------------------------- + + +class _FakeCtx: + """Minimal StreamingTaskMessageContext fake.""" + + def __init__(self, sink: list[Any], ctype: str, initial_content: Any) -> None: + self.sink = sink + self.ctype = ctype + self.task_message = TaskMessage(id="msg-1", task_id="task1", content=initial_content) + + async def __aenter__(self) -> "_FakeCtx": + self.sink.append(("open", self.ctype, self.task_message.content)) + return self + + async def __aexit__(self, *args: Any) -> bool: + await self.close() + return False + + async def close(self) -> None: + 
self.sink.append(("close", self.ctype)) + + async def stream_update(self, update: Any) -> Any: + self.sink.append(("delta", self.ctype, update)) + return update + + +class _FakeStreaming: + """Fake streaming backend; records every context lifecycle event.""" + + def __init__(self) -> None: + self.sink: list[Any] = [] + self.messages_opened: list[Any] = [] + + def streaming_task_message_context( + self, + task_id: str, + initial_content: Any, + streaming_mode: str = "coalesced", + created_at: Any = None, + ) -> _FakeCtx: + ctype = getattr(initial_content, "type", None) or "" + self.messages_opened.append(initial_content) + return _FakeCtx(self.sink, ctype, initial_content) + + +# --------------------------------------------------------------------------- +# Fake tracing backend +# --------------------------------------------------------------------------- + + +class _FakeSpan: + def __init__(self, name: str) -> None: + self.name = name + self.output: Any = None + + +class _FakeTracing: + def __init__(self) -> None: + self.started: list[tuple[str, str | None]] = [] + self.ended: list[tuple[str, Any]] = [] + + async def start_span( + self, + *, + trace_id: str, + name: str, + input: Any = None, + parent_id: Any = None, + data: Any = None, + task_id: Any = None, + ) -> _FakeSpan: + self.started.append((name, parent_id)) + return _FakeSpan(name) + + async def end_span(self, *, trace_id: str, span: _FakeSpan) -> None: + self.ended.append((span.name, span.output)) + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +async def _run_auto_send_turn( + agent: Agent, + user_msg: str = "What is the weather in Paris?", + trace_id: str | None = None, + parent_span_id: str | None = None, + fake_tracing: _FakeTracing | None = None, +) -> tuple[TurnResult, _FakeStreaming]: + """Drive the async (auto_send) path and return the TurnResult + fake streaming 
state.""" + fake_streaming = _FakeStreaming() + + tracer: SpanTracer | bool | None = None + if trace_id and fake_tracing is not None: + tracer = SpanTracer( + trace_id=trace_id, + parent_span_id=parent_span_id, + task_id="task1", + tracing=fake_tracing, + ) + + async with agent.run_stream_events(user_msg) as stream: + turn = PydanticAITurn( + stream, + model="test", + ) + emitter = UnifiedEmitter( + task_id="task1", + trace_id=trace_id, + parent_span_id=parent_span_id, + tracer=tracer if tracer is not None else False, + streaming=fake_streaming, + ) + result = await emitter.auto_send_turn(turn) + + return result, fake_streaming + + +# --------------------------------------------------------------------------- +# Tests: message order and content +# --------------------------------------------------------------------------- + + +class TestAsyncAutoSendMessageOrder: + """auto_send pushes messages to the streaming backend in canonical order.""" + + async def test_tool_request_pushed_first(self) -> None: + """tool_request is pushed to the streaming backend before tool_response.""" + agent = _make_agent() + _, fake_streaming = await _run_auto_send_turn(agent) + + message_types = [getattr(m, "type", None) for m in fake_streaming.messages_opened] + assert "tool_request" in message_types + assert message_types.index("tool_request") < message_types.index("tool_response"), ( + "tool_request must be pushed before tool_response" + ) + + async def test_tool_response_pushed_after_tool_request(self) -> None: + """tool_response is present in the pushed messages; order is checked in test_tool_request_pushed_first.""" + agent = _make_agent() + _, fake_streaming = await _run_auto_send_turn(agent) + + message_types = [getattr(m, "type", None) for m in fake_streaming.messages_opened] + assert "tool_response" in message_types + + async def test_text_pushed_last(self) -> None: + """Text content is the last type pushed (after tool round-trip).""" + agent = _make_agent() + _, fake_streaming = await 
_run_auto_send_turn(agent) + + message_types = [getattr(m, "type", None) for m in fake_streaming.messages_opened] + assert message_types[-1] == "text", f"Expected last message type=text, got {message_types}" + + async def test_exactly_three_messages(self) -> None: + """Exactly three message contexts are opened: tool_request, tool_response, text.""" + agent = _make_agent() + _, fake_streaming = await _run_auto_send_turn(agent) + + assert len(fake_streaming.messages_opened) == 3, ( + f"Expected 3 messages (tool_request + tool_response + text), " + f"got {len(fake_streaming.messages_opened)}: " + f"{[getattr(m, 'type', None) for m in fake_streaming.messages_opened]}" + ) + + +class TestAsyncAutoSendContentVerification: + """The content pushed to the streaming backend is correct.""" + + async def test_tool_request_content(self) -> None: + """The pushed tool_request is a ToolRequestContent for get_weather.""" + agent = _make_agent() + _, fake_streaming = await _run_auto_send_turn(agent) + + tool_reqs = [m for m in fake_streaming.messages_opened if isinstance(m, ToolRequestContent)] + assert len(tool_reqs) == 1, "Expected exactly one ToolRequestContent" + assert tool_reqs[0].name == "get_weather" + + async def test_tool_response_content(self) -> None: + """The pushed tool_response is a ToolResponseContent containing the weather result.""" + agent = _make_agent() + _, fake_streaming = await _run_auto_send_turn(agent) + + tool_resps = [m for m in fake_streaming.messages_opened if isinstance(m, ToolResponseContent)] + assert len(tool_resps) == 1, "Expected exactly one ToolResponseContent" + assert isinstance(tool_resps[0].content, str) + assert "72F" in tool_resps[0].content + assert tool_resps[0].name == "get_weather" + + async def test_tool_call_ids_match(self) -> None: + """tool_request and tool_response have the same tool_call_id.""" + agent = _make_agent() + _, fake_streaming = await _run_auto_send_turn(agent) + + tool_req = next(m for m in 
fake_streaming.messages_opened if isinstance(m, ToolRequestContent)) + tool_resp = next(m for m in fake_streaming.messages_opened if isinstance(m, ToolResponseContent)) + assert tool_req.tool_call_id == tool_resp.tool_call_id, ( + "tool_request and tool_response must share the same tool_call_id" + ) + + +class TestAsyncAutoSendFinalText: + """auto_send_turn returns the accumulated text from the last text part.""" + + async def test_final_text_matches_model_output(self) -> None: + """TurnResult.final_text equals the TestModel custom_output_text.""" + agent = _make_agent() + result, _ = await _run_auto_send_turn(agent) + assert result.final_text == "The weather in Paris is sunny and 72F." + + async def test_turn_result_has_usage(self) -> None: + """TurnResult carries a TurnUsage object (may have None tokens from TestModel).""" + agent = _make_agent() + result, _ = await _run_auto_send_turn(agent) + assert result.usage is not None + + async def test_context_lifecycle_open_then_close(self) -> None: + """Every message context is opened then closed (no leak).""" + agent = _make_agent() + _, fake_streaming = await _run_auto_send_turn(agent) + + opens = [e for e in fake_streaming.sink if e[0] == "open"] + closes = [e for e in fake_streaming.sink if e[0] == "close"] + assert len(opens) == len(closes) == 3, "Each of the 3 messages must have exactly one open and one close" + + +class TestAsyncAutoSendSpanDerivation: + """Span derivation on the async path now works for streamed tool requests. + + The foundation auto_send delivers Start+ToolRequestDelta+Done natively + (AGX1-377 fix). The SpanDeriver opens a tool span on Done(tool_request), + so the async path now derives spans just like the sync path. 
+ """ + + async def test_tool_span_derived_on_async_path(self) -> None: + """With the bare PydanticAITurn (no coalescing), a tool span is derived + on the async/auto_send path when auto_send delivers the streamed + Start+ToolRequestDelta+Done sequence.""" + agent = _make_agent() + fake_tracing = _FakeTracing() + tracer = SpanTracer( + trace_id="trace1", + parent_span_id="parent", + task_id="task1", + tracing=fake_tracing, + ) + fake_streaming = _FakeStreaming() + + async with agent.run_stream_events("What is the weather in Paris?") as stream: + turn = PydanticAITurn(stream, model="test") + emitter = UnifiedEmitter( + task_id="task1", + trace_id="trace1", + parent_span_id="parent", + tracer=tracer, + streaming=fake_streaming, + ) + await emitter.auto_send_turn(turn) + + assert len(fake_tracing.started) == 1, ( + "Expected one tool span to be started for the get_weather call." + ) + assert fake_tracing.started[0][0] == "get_weather" + assert len(fake_tracing.ended) == 1 + + +@pytest.mark.parametrize( + "user_msg", + [ + "What is the weather in Paris?", + "Tell me the weather in London.", + ], +) +async def test_async_handler_pushes_messages_for_various_inputs(user_msg: str) -> None: + """auto_send pushes at least tool_request + tool_response + text for any input.""" + agent = _make_agent() + result, fake_streaming = await _run_auto_send_turn(agent, user_msg=user_msg) + + message_types = [getattr(m, "type", None) for m in fake_streaming.messages_opened] + assert "tool_request" in message_types + assert "tool_response" in message_types + assert "text" in message_types + assert isinstance(result.final_text, str) + assert len(result.final_text) > 0 diff --git a/tests/lib/core/harness/test_harness_pydantic_ai_sync.py b/tests/lib/core/harness/test_harness_pydantic_ai_sync.py new file mode 100644 index 000000000..1557d0dd1 --- /dev/null +++ b/tests/lib/core/harness/test_harness_pydantic_ai_sync.py @@ -0,0 +1,388 @@ +"""Integration test: sync (HTTP-yield) channel with a 
pydantic-ai agent. + +Exercises the unified harness surface (UnifiedEmitter.yield_turn + PydanticAITurn) +with a minimal pydantic-ai agent backed by TestModel so the test runs fully +offline (no API keys, no live infrastructure). + +Agent description +----------------- +A single-tool agent with ``get_weather(city: str) -> str`` whose result always +ends in "sunny and 72F". TestModel is configured to call that tool once, then +produce a fixed text reply, giving a deterministic event sequence. + +What is tested +-------------- +- The sync handler correctly yields StreamTaskMessage* events in order: + tool_request (Start+Done) then tool_response (Full) then text (Start+Delta+Done). +- Final accumulated text equals the TestModel custom output. +- With a trace_id + fake tracing, a tool span is opened (OpenSpan) and + closed (CloseSpan), proving the SpanDeriver is wired on the yield path. + +What is NOT covered without live infrastructure +----------------------------------------------- +- Actual HTTP streaming over the ACP sync endpoint (requires a running + Agentex server + deployed agent). +- Real LLM calls or production model behaviour. +- The full FastACP request/response lifecycle. + +See also: tests/lib/core/harness/test_harness_pydantic_ai_async.py and +test_harness_pydantic_ai_temporal.py for the other two channels. 
+""" + +from __future__ import annotations + +from typing import Any, override + +import pytest +from pydantic_ai import Agent +from pydantic_ai.models.test import TestModel + +from agentex.types.text_delta import TextDelta +from agentex.lib.core.harness.types import OpenSpan, CloseSpan +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.lib.core.harness.emitter import UnifiedEmitter +from agentex.types.task_message_update import ( + StreamTaskMessageDone, + StreamTaskMessageFull, + StreamTaskMessageStart, +) +from agentex.types.tool_response_content import ToolResponseContent +from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn + +# --------------------------------------------------------------------------- +# Minimal agent under test +# --------------------------------------------------------------------------- + + +def _make_agent() -> Agent: + """Build a pydantic-ai agent with one weather tool and a TestModel. + + TestModel is instantiated with call_tools=['get_weather'] so it always + invokes the tool once, then emits custom_output_text as the reply. 
+ """ + model = TestModel( + call_tools=["get_weather"], + custom_output_text="The weather in Paris is sunny and 72F.", + ) + agent: Agent = Agent(model) + + @agent.tool_plain + def get_weather(city: str) -> str: + """Get the current weather for a city.""" + return f"The weather in {city} is sunny and 72F" + + return agent + + +# --------------------------------------------------------------------------- +# Fake tracing backend (no network calls) +# --------------------------------------------------------------------------- + + +class _FakeSpan: + def __init__(self, name: str) -> None: + self.name = name + self.output: Any = None + + +class _FakeTracing: + def __init__(self) -> None: + self.started: list[tuple[str, str | None]] = [] + self.ended: list[tuple[str, Any]] = [] + + async def start_span( + self, + *, + trace_id: str, + name: str, + input: Any = None, + parent_id: Any = None, + data: Any = None, + task_id: Any = None, + ) -> _FakeSpan: + self.started.append((name, parent_id)) + return _FakeSpan(name) + + async def end_span(self, *, trace_id: str, span: _FakeSpan) -> None: + self.ended.append((span.name, span.output)) + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +async def _run_yield_turn( + agent: Agent, + user_msg: str = "What is the weather in Paris?", + trace_id: str | None = None, + parent_span_id: str | None = None, + fake_tracing: _FakeTracing | None = None, +) -> list[Any]: + """Drive the sync (yield) path and collect all yielded events.""" + tracer: SpanTracer | bool | None = None + if trace_id and fake_tracing is not None: + tracer = SpanTracer( + trace_id=trace_id, + parent_span_id=parent_span_id, + task_id="task1", + tracing=fake_tracing, + ) + + events: list[Any] = [] + async with agent.run_stream_events(user_msg) as stream: + turn = PydanticAITurn(stream, model="test") + emitter = UnifiedEmitter( + 
task_id="task1", + trace_id=trace_id, + parent_span_id=parent_span_id, + tracer=tracer if tracer is not None else False, + ) + events = [ev async for ev in emitter.yield_turn(turn)] + return events + + +# --------------------------------------------------------------------------- +# Tests +# --------------------------------------------------------------------------- + + +class TestSyncYieldEventOrder: + """The yield channel forwards events in canonical order.""" + + async def test_tool_request_precedes_tool_response(self) -> None: + """tool_request events appear before the tool_response Full event.""" + agent = _make_agent() + events = await _run_yield_turn(agent) + + content_types = [ + getattr(getattr(ev, "content", None), "type", None) + for ev in events + if isinstance(ev, (StreamTaskMessageStart, StreamTaskMessageFull)) + ] + assert "tool_request" in content_types + assert "tool_response" in content_types + tool_req_idx = content_types.index("tool_request") + tool_resp_idx = content_types.index("tool_response") + assert tool_req_idx < tool_resp_idx, "tool_request must appear before tool_response in the event stream" + + async def test_text_appears_after_tool_response(self) -> None: + """Text content (Start/Done) comes after the tool_response Full event.""" + agent = _make_agent() + events = await _run_yield_turn(agent) + + full_types = [ + getattr(getattr(ev, "content", None), "type", None) + for ev in events + if isinstance(ev, StreamTaskMessageFull) + ] + start_types = [ + getattr(getattr(ev, "content", None), "type", None) + for ev in events + if isinstance(ev, StreamTaskMessageStart) + ] + + assert "tool_response" in full_types + assert "text" in start_types + + tool_resp_pos = next( + i + for i, ev in enumerate(events) + if isinstance(ev, StreamTaskMessageFull) + and getattr(getattr(ev, "content", None), "type", None) == "tool_response" + ) + text_start_pos = next( + i + for i, ev in enumerate(events) + if isinstance(ev, StreamTaskMessageStart) and 
getattr(getattr(ev, "content", None), "type", None) == "text" + ) + assert tool_resp_pos < text_start_pos + + async def test_tool_response_carries_weather_result(self) -> None: + """The ToolResponseContent contains the get_weather return value.""" + agent = _make_agent() + events = await _run_yield_turn(agent) + + full_events = [ + ev + for ev in events + if isinstance(ev, StreamTaskMessageFull) and isinstance(getattr(ev, "content", None), ToolResponseContent) + ] + assert len(full_events) >= 1, "Expected at least one tool_response Full event" + tool_response = full_events[0].content + assert isinstance(tool_response, ToolResponseContent) + assert isinstance(tool_response.content, str) + assert "72F" in tool_response.content + assert tool_response.name == "get_weather" + + async def test_accumulated_text_matches_model_output(self) -> None: + """Accumulated text deltas equal the TestModel custom_output_text.""" + from agentex.types.task_message_update import StreamTaskMessageDelta + + agent = _make_agent() + events = await _run_yield_turn(agent) + + accumulated = "".join( + ev.delta.text_delta + for ev in events + if isinstance(ev, StreamTaskMessageDelta) and isinstance(ev.delta, TextDelta) and ev.delta.text_delta + ) + assert accumulated == "The weather in Paris is sunny and 72F." 
+ + async def test_every_start_has_matching_done(self) -> None: + """Every StreamTaskMessageStart has a corresponding StreamTaskMessageDone.""" + agent = _make_agent() + events = await _run_yield_turn(agent) + + starts = {ev.index for ev in events if isinstance(ev, StreamTaskMessageStart)} + dones = {ev.index for ev in events if isinstance(ev, StreamTaskMessageDone)} + assert starts == dones, f"Unmatched Start/Done indices: starts={starts} dones={dones}" + + +class TestSyncYieldSpanDerivation: + """SpanDeriver is wired on the yield path; tool spans are opened/closed.""" + + async def test_tool_span_opened_and_closed(self) -> None: + """One tool span is opened and closed per tool call.""" + agent = _make_agent() + fake_tracing = _FakeTracing() + tracer = SpanTracer( + trace_id="trace1", + parent_span_id="parent-span", + task_id="task1", + tracing=fake_tracing, + ) + + async with agent.run_stream_events("What is the weather in Paris?") as stream: + turn = PydanticAITurn(stream, model="test") + emitter = UnifiedEmitter( + task_id="task1", + trace_id="trace1", + parent_span_id="parent-span", + tracer=tracer, + ) + # Drain the yield channel so the tracer receives the derived span signals. + [_ async for _ in emitter.yield_turn(turn)] + + assert len(fake_tracing.started) == 1, "Expected exactly one tool span opened" + assert len(fake_tracing.ended) == 1, "Expected exactly one tool span closed" + span_name, parent_id = fake_tracing.started[0] + assert span_name == "get_weather" + assert parent_id == "parent-span" + + async def test_tool_span_output_is_tool_result(self) -> None: + """The closed tool span's output equals the tool's return value.""" + agent = _make_agent() + fake_tracing = _FakeTracing() + tracer = SpanTracer( + trace_id="trace1", + parent_span_id="parent-span", + task_id="task1", + tracing=fake_tracing, + ) + + async with agent.run_stream_events("What is the weather in Paris?") as stream: + turn = PydanticAITurn(stream, model="test") + emitter = UnifiedEmitter( + 
task_id="task1", + trace_id="trace1", + parent_span_id="parent-span", + tracer=tracer, + ) + [_ async for _ in emitter.yield_turn(turn)] + + name, output = fake_tracing.ended[0] + assert name == "get_weather" + assert output is not None + assert "72F" in str(output) + + async def test_no_trace_id_means_no_spans(self) -> None: + """With trace_id=None, no spans are derived (emitter disables tracing).""" + agent = _make_agent() + fake_tracing = _FakeTracing() + + async with agent.run_stream_events("What is the weather in Paris?") as stream: + turn = PydanticAITurn(stream, model="test") + emitter = UnifiedEmitter( + task_id="task1", + trace_id=None, + parent_span_id=None, + tracing=fake_tracing, + ) + [_ async for _ in emitter.yield_turn(turn)] + + assert fake_tracing.started == [] + assert fake_tracing.ended == [] + + async def test_tracer_false_suppresses_spans(self) -> None: + """tracer=False disables span derivation regardless of trace_id.""" + agent = _make_agent() + fake_tracing = _FakeTracing() + + async with agent.run_stream_events("What is the weather in Paris?") as stream: + turn = PydanticAITurn(stream, model="test") + emitter = UnifiedEmitter( + task_id="task1", + trace_id="trace1", + parent_span_id="parent-span", + tracer=False, + tracing=fake_tracing, + ) + [_ async for _ in emitter.yield_turn(turn)] + + assert fake_tracing.started == [] + assert fake_tracing.ended == [] + + async def test_span_signal_types(self) -> None: + """The signals received by the tracer are OpenSpan then CloseSpan.""" + from agentex.lib.core.harness.tracer import SpanTracer as RealTracer + + received_signals: list[Any] = [] + + class _RecordingTracer(RealTracer): + @override + async def handle(self, signal: Any) -> None: + received_signals.append(signal) + await super().handle(signal) + + fake_tracing = _FakeTracing() + tracer = _RecordingTracer( + trace_id="trace1", + parent_span_id="parent", + task_id="task1", + tracing=fake_tracing, + ) + + agent = _make_agent() + async with 
agent.run_stream_events("What is the weather in Paris?") as stream: + turn = PydanticAITurn(stream, model="test") + emitter = UnifiedEmitter( + task_id="task1", + trace_id="trace1", + parent_span_id="parent", + tracer=tracer, + ) + [_ async for _ in emitter.yield_turn(turn)] + + assert len(received_signals) == 2 + assert isinstance(received_signals[0], OpenSpan) + assert isinstance(received_signals[1], CloseSpan) + assert received_signals[0].name == "get_weather" + + +@pytest.mark.parametrize( + "user_msg", + [ + "What is the weather in Paris?", + "Tell me the weather in London.", + ], +) +async def test_sync_handler_produces_events_for_various_inputs(user_msg: str) -> None: + """Yield path produces at least a tool_response Full for any user message.""" + agent = _make_agent() + events = await _run_yield_turn(agent, user_msg=user_msg) + + full_event_types = [ + getattr(getattr(ev, "content", None), "type", None) for ev in events if isinstance(ev, StreamTaskMessageFull) + ] + assert "tool_response" in full_event_types diff --git a/tests/lib/core/harness/test_harness_pydantic_ai_temporal.py b/tests/lib/core/harness/test_harness_pydantic_ai_temporal.py new file mode 100644 index 000000000..0ead8e832 --- /dev/null +++ b/tests/lib/core/harness/test_harness_pydantic_ai_temporal.py @@ -0,0 +1,370 @@ +"""Integration test: Temporal-backed pydantic-ai agent, offline. + +Exercises the core of the Temporal pydantic-ai harness path — the +event_stream_handler activity — with a TemporalAgent backed by TestModel so the +test runs fully offline (no Temporal server, no Redis, no API keys). + +Architecture overview +--------------------- +In a real Temporal deployment the pydantic-ai Temporal harness runs like this: + + HTTP POST /task/event/send + -> @workflow.signal on At110PydanticAiWorkflow + -> temporal_agent.run(user_message, deps=TaskDeps(...)) + internally schedules: + 1. request_activity (LLM HTTP call — recorded by Temporal) + 2. 
call_tool_activity (for each tool call — also recorded) + 3. event_stream_handler_activity (streams events to Redis) + +The third activity is what we test here: it receives a +``RunContext[TaskDeps]`` and an ``AsyncIterable[AgentStreamEvent]`` from +pydantic-ai, calls ``stream_pydantic_ai_events`` (which internally constructs +a ``UnifiedEmitter`` + ``PydanticAITurn`` and calls ``auto_send_turn``), and +pushes the resulting messages to Redis. + +What we test +----------- +Since ``TemporalAgent.run_stream_events`` works offline with TestModel (it does +not schedule Temporal activities — it runs in-process), we can: + +1. Build a TemporalAgent with TestModel. +2. Call ``run_stream_events`` on it directly, just as the event_stream_handler + would see the event iterable. +3. Feed that stream into ``stream_pydantic_ai_events`` backed by a fake streaming + backend, and assert the canonical message sequence. + +This covers the full inner harness chain that the Temporal workflow exercises, +minus the Temporal scheduling/durability layer itself. + +What is NOT covered without live infrastructure +----------------------------------------------- +- Temporal scheduling (the workflow.signal -> activity dispatch chain). +- Temporal durability guarantees and replay behaviour. +- Redis streaming (requires a running Redis instance). +- Multi-turn history (pydantic-ai message_history round-tripping via Temporal + workflow state). +- Real LLM calls or production model behaviour. +- The full temporal_agent.run(...) path, which schedules activities and cannot + run without a connected Temporal client. + +To test with live infrastructure: spin up Temporal + Redis + the ACP server + +the Temporal worker, then use the AsyncAgentex client to create a task, send a +message, and poll for messages — exactly as the existing examples/tutorials/ +10_async/10_temporal/110_pydantic_ai/tests/test_agent.py does. 
+""" + +from __future__ import annotations + +from typing import Any + +import pytest +from pydantic import BaseModel +from pydantic_ai import Agent +from pydantic_ai.models.test import TestModel +from pydantic_ai.durable_exec.temporal import TemporalAgent + +from agentex.types.task_message import TaskMessage +from agentex.lib.core.harness.emitter import UnifiedEmitter +from agentex.types.tool_request_content import ToolRequestContent +from agentex.types.tool_response_content import ToolResponseContent +from agentex.lib.adk._modules._pydantic_ai_turn import PydanticAITurn + +# --------------------------------------------------------------------------- +# Agent under test (mirrors examples/tutorials/10_async/10_temporal/110_pydantic_ai) +# --------------------------------------------------------------------------- + + +class TaskDeps(BaseModel): + """Per-run dependencies injected via RunContext.deps.""" + + task_id: str + parent_span_id: str | None = None + + +def _make_temporal_agent() -> TemporalAgent[TaskDeps, str]: + """Build a TemporalAgent with TestModel and one weather tool. + + The underlying pydantic-ai Agent is constructed with TaskDeps as the + deps_type, mirroring the real temporal tutorial agent. TestModel makes + the run deterministic and offline. 
+ """ + model = TestModel( + call_tools=["get_weather"], + custom_output_text="The weather in Paris is sunny and 72F.", + ) + base: Agent[TaskDeps, str] = Agent(model, deps_type=TaskDeps) + + @base.tool_plain + def get_weather(city: str) -> str: + """Get the current weather for a city.""" + return f"The weather in {city} is sunny and 72F" + + return TemporalAgent(base, name="test_temporal_agent") + + +# --------------------------------------------------------------------------- +# Fake streaming backend +# --------------------------------------------------------------------------- + + +class _FakeCtx: + def __init__(self, sink: list[Any], ctype: str, initial_content: Any) -> None: + self.sink = sink + self.ctype = ctype + self.task_message = TaskMessage(id="msg-1", task_id="task1", content=initial_content) + + async def __aenter__(self) -> "_FakeCtx": + self.sink.append(("open", self.ctype, self.task_message.content)) + return self + + async def __aexit__(self, *args: Any) -> bool: + await self.close() + return False + + async def close(self) -> None: + self.sink.append(("close", self.ctype)) + + async def stream_update(self, update: Any) -> Any: + self.sink.append(("delta", self.ctype, update)) + return update + + +class _FakeStreaming: + def __init__(self) -> None: + self.sink: list[Any] = [] + self.messages_opened: list[Any] = [] + + def streaming_task_message_context( + self, + task_id: str, + initial_content: Any, + streaming_mode: str = "coalesced", + created_at: Any = None, + ) -> _FakeCtx: + ctype = getattr(initial_content, "type", None) or "" + self.messages_opened.append(initial_content) + return _FakeCtx(self.sink, ctype, initial_content) + + +# --------------------------------------------------------------------------- +# Helpers: the event_stream_handler pattern tested offline +# --------------------------------------------------------------------------- + + +async def _run_event_stream_handler( + temporal_agent: TemporalAgent[TaskDeps, str], + 
user_msg: str = "What is the weather in Paris?", + task_id: str = "task1", +) -> _FakeStreaming: + """Simulate the event_stream_handler activity offline. + + In production the event_stream_handler receives the event stream from + pydantic-ai's model activity and calls stream_pydantic_ai_events. + Here we obtain the stream directly from run_stream_events (which works + offline with TestModel) and forward it to stream_pydantic_ai_events backed + by a fake streaming backend. + + This is equivalent to: + async def event_handler(ctx: RunContext[TaskDeps], events: AsyncIterable[AgentStreamEvent]) -> None: + await stream_pydantic_ai_events(events, ctx.deps.task_id) + but without requiring a running Temporal server. + """ + fake_streaming = _FakeStreaming() + + async with temporal_agent.run_stream_events(user_msg) as stream: + await _fake_stream_pydantic_ai_events(stream, task_id, fake_streaming) + + return fake_streaming + + +async def _fake_stream_pydantic_ai_events( + stream: Any, + task_id: str, + fake_streaming: _FakeStreaming, +) -> str: + """Like stream_pydantic_ai_events but uses an injected fake streaming backend. + + Mirrors the exact chain that stream_pydantic_ai_events uses internally: + PydanticAITurn(stream) + + UnifiedEmitter.auto_send_turn(turn) + but with the fake backend injected so no Redis is needed. 
+ """ + turn = PydanticAITurn(stream, model=None) + emitter = UnifiedEmitter( + task_id=task_id, + trace_id=None, + parent_span_id=None, + tracer=False, + streaming=fake_streaming, + ) + result = await emitter.auto_send_turn(turn) + return result.final_text + + +# --------------------------------------------------------------------------- +# Tests: TemporalAgent + event_stream_handler pattern +# --------------------------------------------------------------------------- + + +class TestTemporalEventStreamHandlerMessageOrder: + """The event_stream_handler pushes messages in canonical order.""" + + async def test_tool_request_before_tool_response(self) -> None: + """tool_request is pushed before tool_response.""" + temporal_agent = _make_temporal_agent() + fake_streaming = await _run_event_stream_handler(temporal_agent) + + types = [getattr(m, "type", None) for m in fake_streaming.messages_opened] + assert "tool_request" in types + assert "tool_response" in types + assert types.index("tool_request") < types.index("tool_response") + + async def test_text_is_last(self) -> None: + """Text content is pushed last (after the tool round-trip).""" + temporal_agent = _make_temporal_agent() + fake_streaming = await _run_event_stream_handler(temporal_agent) + + types = [getattr(m, "type", None) for m in fake_streaming.messages_opened] + assert types[-1] == "text" + + async def test_exactly_three_messages(self) -> None: + """Exactly tool_request + tool_response + text are pushed.""" + temporal_agent = _make_temporal_agent() + fake_streaming = await _run_event_stream_handler(temporal_agent) + + assert len(fake_streaming.messages_opened) == 3, ( + f"Expected 3 messages, got {len(fake_streaming.messages_opened)}: " + f"{[getattr(m, 'type', None) for m in fake_streaming.messages_opened]}" + ) + + +class TestTemporalEventStreamHandlerContent: + """Content verification for the messages pushed by the event_stream_handler.""" + + async def test_tool_request_is_get_weather(self) -> None: 
+ """The pushed tool_request is for the get_weather function.""" + temporal_agent = _make_temporal_agent() + fake_streaming = await _run_event_stream_handler(temporal_agent) + + tool_reqs = [m for m in fake_streaming.messages_opened if isinstance(m, ToolRequestContent)] + assert len(tool_reqs) == 1 + assert tool_reqs[0].name == "get_weather" + + async def test_tool_response_contains_weather_result(self) -> None: + """The pushed tool_response contains the get_weather return value.""" + temporal_agent = _make_temporal_agent() + fake_streaming = await _run_event_stream_handler(temporal_agent) + + tool_resps = [m for m in fake_streaming.messages_opened if isinstance(m, ToolResponseContent)] + assert len(tool_resps) == 1 + assert isinstance(tool_resps[0].content, str) + assert "72F" in tool_resps[0].content + assert tool_resps[0].name == "get_weather" + + async def test_tool_call_ids_match(self) -> None: + """tool_request and tool_response share the same tool_call_id.""" + temporal_agent = _make_temporal_agent() + fake_streaming = await _run_event_stream_handler(temporal_agent) + + tool_req = next(m for m in fake_streaming.messages_opened if isinstance(m, ToolRequestContent)) + tool_resp = next(m for m in fake_streaming.messages_opened if isinstance(m, ToolResponseContent)) + assert tool_req.tool_call_id == tool_resp.tool_call_id + + +class TestTemporalFinalText: + """stream_pydantic_ai_events returns the correct final text.""" + + async def test_final_text_matches_model_output(self) -> None: + """The returned final text equals the TestModel custom_output_text.""" + temporal_agent = _make_temporal_agent() + fake_streaming = _FakeStreaming() + + async with temporal_agent.run_stream_events("What is the weather in Paris?") as stream: + final = await _fake_stream_pydantic_ai_events(stream, "task1", fake_streaming) + + assert final == "The weather in Paris is sunny and 72F." 
+ + async def test_context_lifecycle_complete(self) -> None: + """Every opened streaming context is also closed.""" + temporal_agent = _make_temporal_agent() + fake_streaming = await _run_event_stream_handler(temporal_agent) + + opens = [e for e in fake_streaming.sink if e[0] == "open"] + closes = [e for e in fake_streaming.sink if e[0] == "close"] + assert len(opens) == len(closes), "Every opened context must be closed" + + +class TestTemporalAgentStreamEventsOffline: + """TemporalAgent.run_stream_events produces the expected raw pydantic-ai events. + + This verifies that the TemporalAgent wrapper does not suppress event stream + delivery when used with TestModel, so the event_stream_handler pattern is + meaningful offline. + """ + + async def test_run_stream_events_yields_tool_call_and_text(self) -> None: + """TemporalAgent.run_stream_events with TestModel yields tool + text events.""" + + temporal_agent = _make_temporal_agent() + collected: list[Any] = [] + + async with temporal_agent.run_stream_events("What is the weather in Paris?") as stream: + async for ev in stream: + collected.append(ev) + + event_types = {type(ev).__name__ for ev in collected} + assert "FunctionToolResultEvent" in event_types, "Expected FunctionToolResultEvent proving tool call ran" + assert "PartDeltaEvent" in event_types or "PartEndEvent" in event_types, ( + "Expected text part events in the stream" + ) + + async def test_run_stream_events_contains_tool_result(self) -> None: + """The raw event stream contains a FunctionToolResultEvent with the tool output.""" + from pydantic_ai.messages import FunctionToolResultEvent + + temporal_agent = _make_temporal_agent() + + async with temporal_agent.run_stream_events("What is the weather in Paris?") as stream: + events = [ev async for ev in stream] + + tool_results = [ev for ev in events if isinstance(ev, FunctionToolResultEvent)] + assert len(tool_results) >= 1 + assert isinstance(tool_results[0].part.content, str) + assert "72F" in 
tool_results[0].part.content + + +class TestTemporalLiveInfraNote: + """Placeholder tests documenting what requires live Temporal infrastructure. + + These tests are skipped by design. They document the gap between what the + offline tests cover and what a full integration test would exercise. + """ + + @pytest.mark.skip( + reason=( + "Requires live Temporal server + Redis + ACP server + worker. " + "See examples/tutorials/10_async/10_temporal/110_pydantic_ai/tests/test_agent.py " + "for the live integration test that exercises this path end-to-end." + ) + ) + async def test_temporal_workflow_full_round_trip(self) -> None: + """Full Temporal workflow: create_task -> send_event -> poll_messages.""" + pass # Covered by the live tutorial test + + +@pytest.mark.parametrize( + "user_msg", + [ + "What is the weather in Paris?", + "Tell me the weather in London.", + ], +) +async def test_temporal_handler_pushes_messages_for_various_inputs(user_msg: str) -> None: + """event_stream_handler pushes tool_request + tool_response + text for any input.""" + temporal_agent = _make_temporal_agent() + fake_streaming = await _run_event_stream_handler(temporal_agent, user_msg=user_msg) + + types = [getattr(m, "type", None) for m in fake_streaming.messages_opened] + assert "tool_request" in types + assert "tool_response" in types + assert "text" in types