Bugfix for Declarative Workflow#6530
Conversation
There was a problem hiding this comment.
Automated Code Review
Reviewers: 5 | Confidence: 86%
✓ Correctness
No actionable issues found in this dimension.
✓ Security Reliability
No actionable issues found in this dimension.
✓ Test Coverage
The PR adds comprehensive tests for the new per-invocation request ID mechanism (concurrent approvals, checkpoint/restore, legacy migration, duplicate delivery, etc.) for both InvokeFunctionToolExecutor and InvokeMcpToolExecutor. However, several pre-existing MCP approval tests now silently exercise the error/rejection path instead of the success path because they construct approval responses with
action.Idbut never callHandleAsyncto register a snapshot. Additionally, there is no test verifying_pendingNonApprovalCallIdssurvive checkpoint/restore cycles. The MCP executor test suite adds comprehensive coverage for the new per-invocation request ID scheme, including unique IDs, concurrent approvals, checkpoint/reset/restore cycles, and legacy migration. However, it is missing two test scenarios that the parallel InvokeFunctionToolExecutorTest covers: duplicate approval delivery idempotency and stale+valid approvals in a single response message. These are important edge cases for a concurrent system.
✓ Failure Modes
No actionable issues found in this dimension.
✓ Design Approach
The new per-invocation snapshot design closes the cross-request mix-up cases, but it still leaves one checkpoint/replay hole: once an approval response is consumed, the executor only removes that snapshot from memory. If the process resets before the next checkpoint, restore will reload the stale snapshot and republish the old pending request, so the same approval can drive the MCP tool a second time.
Suggestions
- Add an MCP-equivalent of
InvokeFunctionToolDuplicateApprovalDeliveryInvokesFunctionOnceAsync— delivering the same approval response twice should invoke the MCP tool exactly once and surface the not-approved error on the second delivery. - Add an MCP-equivalent of
InvokeFunctionToolAprovalMatchPrefersPendingSnapshotAsync— when a response contains multiple ToolApprovalResponseContent items (a stale/unrelated approval plus the valid one), the executor should select the one matching a pending snapshot. The MCPCaptureResponseAsyncalready handles this viaFirstOrDefaultwith a predicate, but an explicit test would guard against regressions.
Automated review by peibekwe's agents
There was a problem hiding this comment.
Pull request overview
This PR fixes correctness gaps in declarative workflow tool invocation by introducing per-invocation identifiers and storing approval-time snapshots per request, ensuring approvals resume with the exact parameters the user reviewed even under concurrent/interleaved execution and across checkpoint/restore cycles.
Changes:
- Switch
InvokeMcpToolExecutorandInvokeFunctionToolExecutorfrom a single approval snapshot to per-invocation snapshot maps keyed by unique request ids, including legacy checkpoint migration. - Add tracking for non-approval
InvokeFunctionToolcall ids soFunctionResultContentcan be matched to the originating invocation rather thanthis.Id. - Expand unit tests to cover: unique request ids, concurrent pending approvals, missing snapshot behavior, legacy checkpoint migration, and checkpoint/reset/restore survivability.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| dotnet/src/Microsoft.Agents.AI.Workflows.Declarative/ObjectModel/InvokeMcpToolExecutor.cs | Implements per-invocation approval snapshots + checkpoint persistence/migration for MCP tool approvals. |
| dotnet/src/Microsoft.Agents.AI.Workflows.Declarative/ObjectModel/InvokeFunctionToolExecutor.cs | Implements per-invocation call ids + approval snapshots + persisted pending state for function-tool invocations. |
| dotnet/src/Microsoft.Agents.AI.Workflows.Declarative/Interpreter/DeclarativeActionExecutor.cs | Makes ResetAsync() virtual so action executors can actually override/reset per-run state. |
| dotnet/tests/Microsoft.Agents.AI.Workflows.Declarative.UnitTests/ObjectModel/InvokeMcpToolExecutorTest.cs | Updates and adds tests for per-invocation MCP approval ids, migration, and checkpoint/reset/restore behavior. |
| dotnet/tests/Microsoft.Agents.AI.Workflows.Declarative.UnitTests/ObjectModel/InvokeFunctionToolExecutorTest.cs | Adds extensive coverage for per-invocation call ids, concurrent pending approvals, migration, and replay/reset scenarios. |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 5 | Confidence: 83%
✓ Correctness
No actionable issues found in this dimension.
✓ Security Reliability
The PR correctly migrates from a single approval snapshot to a per-invocation concurrent dictionary, fixing a significant limitation where a single executor instance could only track one pending approval. The security model is maintained: approval snapshots are consumed atomically via TryRemove preventing replay, per-invocation GUIDs prevent ID prediction, and legacy migration is handled correctly. No high-severity security or reliability issues found. One medium-severity observation about silent error swallowing in InvokeFunctionToolExecutor when a response contains neither a matching FunctionResultContent nor any ToolApprovalResponseContent.
✓ Test Coverage
The MCP test coverage is thorough for the new per-invocation approval snapshot design, covering unique IDs, concurrent pending approvals, missing snapshots, legacy migration, and checkpoint/reset/restore cycles. However, there is an asymmetric coverage gap: InvokeFunctionToolExecutorTest includes InvokeFunctionToolDuplicateApprovalDeliveryInvokesFunctionOnceAsync (verifying idempotency when the same approval is delivered twice), but InvokeMcpToolExecutorTest has no equivalent test for the same atomic TryRemove-based idempotency contract.
✓ Failure Modes
No actionable issues found in this dimension.
✗ Design Approach
Both tool executors now key pending work by per-invocation IDs, but they still only clear those IDs from in-memory dictionaries on the response path. Because checkpointing persists the dictionaries and restore reloads them, a crash after approval/result handling but before the next checkpoint can resurrect an already-consumed invocation and let the same response be processed again after restore. The new MCP tests cover more of the approval-snapshot behavior, but one important failure mode is still untested: successful approval removes the pending snapshot only in memory, while checkpoint/restore persists
_approvalSnapshotsseparately. I also noticed one smaller coverage gap where the unique-request-id test claims to validate both wrapper and inner MCP content ids but only checks the wrapper.
Flagged Issues
- Both InvokeFunctionToolExecutor (CaptureResponseAsync at line 208) and InvokeMcpToolExecutor (CaptureResponseAsync at line 167) remove consumed pending IDs/approval snapshots only from in-memory dictionaries, while OnCheckpointingAsync/OnCheckpointRestoredAsync persist and reload those maps. A crash after approval handling but before the next checkpoint will resurrect an already-consumed invocation and allow the same response to be replayed. A durable state update is needed on the response path, not just an in-memory removal.
- The test at InvokeMcpToolExecutorTest.cs:1426-1428 only asserts the live in-memory dictionary is empty after approval. Since CaptureResponseAsync never persists the updated map back to workflow state while checkpoint/restore serializes and reloads it, this test mises the crash-before-checkpoint replay path. Please assert the persisted stateStore["_approvalSnapshots"] is cleared/updated after CaptureResponseAsync.
Suggestions
- The unique-request-id test at InvokeMcpToolExecutorTest.cs:1126-1134 claims to validate uniqueness on both McpServerToolCallContent and the wrapping ToolApprovalRequestContent, but only checks the wrapper request ids. The analogous function-tool test (InvokeFunctionToolExecutorTest.cs:476-513) also verifies the inner call id. Adding the MCP inner-content assertion would make the test match its stated contract and serve as a more complete regression guard.
Automated review by peibekwe's agents
|
Flagged issue Both InvokeFunctionToolExecutor (CaptureResponseAsync at line 208) and InvokeMcpToolExecutor (CaptureResponseAsync at line 167) remove consumed pending IDs/approval snapshots only from in-memory dictionaries, while OnCheckpointingAsync/OnCheckpointRestoredAsync persist and reload those maps. A crash after approval handling but before the next checkpoint will resurrect an already-consumed invocation and allow the same response to be replayed. A durable state update is needed on the response path, not just an in-memory removal. Source: automated DevFlow PR review |
|
Flagged issue The test at InvokeMcpToolExecutorTest.cs:1426-1428 only asserts the live in-memory dictionary is empty after approval. Since CaptureResponseAsync never persists the updated map back to workflow state while checkpoint/restore serializes and reloads it, this test mises the crash-before-checkpoint replay path. Please assert the persisted stateStore["_approvalSnapshots"] is cleared/updated after CaptureResponseAsync. Source: automated DevFlow PR review |
Description & Review Guide
Bugfix for declarative workflow InvokeFunctionTool and InvokeMcpTool
Contribution Checklist
breaking changelabel (or add "[BREAKING]" to the title prefix, before or after any language prefix) — a workflow keeps the label and title prefix in sync automatically.