feat(eval): id-aware match with sanitised-name fallback (Option C, layer 1/3) by ajay-kesavan · Pull Request #1736 · UiPath/uipath-python

ajay-kesavan · 2026-06-19T17:20:39Z

Summary

SDK layer of the Option C eval-matching architecture. The matcher prefers id-equality when both sides carry an id; otherwise falls back to sanitised-name equality so display-name-keyed eval-sets (the current default) keep working while new id-keyed eval-sets become rename-safe.

Changes

_match_key / _calls_match — id wins first, then sanitised name fallback
tool_calls_count_score — try raw key first, then sanitised key on miss
_normalize_tool_name — pinned reference implementation of the LangChain sanitiser algorithm (split-on-whitespace → strip non [A-Za-z0-9_-] → cap at 64 chars)
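
The sanitiser steps listed above can be sketched roughly as follows. This is a minimal sketch, not the helper from the PR: `sanitize_sketch` and `MAX_TOOL_NAME_LEN` are hypothetical names, and joining the whitespace-split pieces with `_` is an assumption inferred from the "Web Search" vs "Web_Search" example.

```python
import re

MAX_TOOL_NAME_LEN = 64  # cap stated in the PR description


def sanitize_sketch(name: str) -> str:
    """Hypothetical stand-in for the pinned sanitiser; not the SDK's code."""
    # Split on any run of whitespace and join the pieces with underscores.
    joined = "_".join(name.split())
    # Drop every character outside [A-Za-z0-9_-].
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "", joined)
    # Truncate to the length cap.
    return cleaned[:MAX_TOOL_NAME_LEN]


print(sanitize_sketch("Web Search"))
```

Under these assumptions, a display name such as "Web Search" and a span name such as "Web_Search" normalise to the same string, which is what the name fallback relies on.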

Why

Closes the display-vs-sanitised gap (eval-set "Web Search" vs span "Web_Search"). When the producer side later starts emitting tool.id on the span (companion PRs in uipath-agents-python + flow-workbench), this same matcher uses the id-equality path automatically — same code, two backward-compatible match modes.
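
The two match modes described in the summary might look something like the sketch below. The function name, its flat argument list, and the inline `_sanitize` helper are all hypothetical; the real `_match_key` / `_calls_match` signatures are not shown in this PR page, and a later commit in this PR tightens the id path.

```python
import re
from typing import Optional


def _sanitize(name: str) -> str:
    # Assumed sanitiser: whitespace to "_", strip other characters, cap at 64.
    return re.sub(r"[^A-Za-z0-9_-]", "", "_".join(name.split()))[:64]


def keys_match_sketch(
    expected_name: str,
    expected_id: Optional[str],
    actual_name: str,
    actual_id: Optional[str],
) -> bool:
    """Hypothetical id-first match with a sanitised-name fallback."""
    # Mode 1: both sides carry an id, so id equality decides the match.
    if expected_id is not None and actual_id is not None:
        return expected_id == actual_id
    # Mode 2: at least one side has no id, so compare sanitised names.
    return _sanitize(expected_name) == _sanitize(actual_name)


# A renamed tool still matches when both sides carry the same id.
print(keys_match_sketch("Web Search", "node-1", "Internet Lookup", "node-1"))
# A display-name-keyed eval-set matches a sanitised span name.
print(keys_match_sketch("Web Search", None, "Web_Search", None))
```

The design point is that one code path serves both eval-set generations: id-keyed sets become rename-safe, while name-keyed sets keep matching through the sanitiser.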

Companions

uipath-agents-python: inject canvas node id into tool.metadata, instrumentor emits tool.id
flow-workbench: picker captures + stores canvas node id

Tests

pytest tests/evaluators/test_evaluator_helpers.py::TestSanitizedNameMatch — 19 passed
pytest tests/cli/eval/ tests/evaluators/ — 889 passed
ruff check / format / mypy — clean

🤖 Generated with Claude Code

… evaluators

Option C of the eval matching architecture:
- _match_key / _calls_match prefer id-equality when both sides carry an id
- name fallback normalises both sides through the LangChain sanitiser so an editor-persisted display name ("Web Search") matches a runtime span whose tool.name is the sanitised form ("Web_Search")
- tool_calls_count_score's direct dict lookup also tries the sanitised key on miss so the count path matches the same semantics

Backward compatibility:
- Old eval-sets keyed by display name match via the sanitised name path
- New eval-sets keyed by canvas node id match via the id path
- No data migration required

Test coverage:
- Pinned reference sanitiser matching uipath_langchain's algorithm
- _match_key id-wins-first + name-fallback paths
- _calls_match for ToolCall + ToolOutput
- count_score display-name + id-keyed cases

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

This PR updates the SDK-side evaluator matching logic to prefer stable tool id equality when available, and otherwise fall back to LangChain-style sanitised-name equality so display-name-keyed eval sets continue to match runtime spans whose tool.name is sanitised.

Changes:

Added _normalize_tool_name implementing a pinned LangChain sanitiser (whitespace → _, strip non [A-Za-z0-9_-], truncate to 64).
Updated _match_key and _calls_match to use id-first matching with sanitised-name fallback.
Updated tool_calls_count_score to attempt raw expected keys first, then their sanitised form on lookup miss.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
packages/uipath/src/uipath/eval/_helpers/evaluators_helpers.py	Implements sanitised-name normalisation and applies it across id-aware matching and count scoring.
packages/uipath/tests/evaluators/test_evaluator_helpers.py	Adds targeted tests that pin the sanitiser behavior and validate display-vs-sanitised fallback plus id-first semantics.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- count_score: split the two-tier .get() into `is None` short-circuit so the sanitiser regex only runs when the raw key is absent (P2). `or` would have been wrong: a real count of 0 is a hit, not a fallback trigger.
- TestSanitizedNameMatch: hoist nine in-method imports of `_match_key`/`_calls_match`/`_normalize_tool_name` to the module-level import block (shrink).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
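
The difference between the `is None` short-circuit and an `or` chain can be illustrated with a small sketch. Both function names and the `_sanitize` helper are hypothetical, and the sample dict is deliberately contrived so that the raw key holds a genuine count of 0.

```python
import re
from typing import Dict, Optional


def _sanitize(name: str) -> str:
    # Assumed sanitiser: whitespace to "_", strip other characters, cap at 64.
    return re.sub(r"[^A-Za-z0-9_-]", "", "_".join(name.split()))[:64]


def lookup_with_is_none(counts: Dict[str, int], key: str) -> Optional[int]:
    """Raw key first; sanitise only when the raw key is truly absent."""
    count = counts.get(key)
    if count is None:
        # The regex runs only on a miss.
        count = counts.get(_sanitize(key))
    return count


def lookup_with_or(counts: Dict[str, int], key: str) -> Optional[int]:
    """Buggy variant: a stored count of 0 is falsy and triggers the fallback."""
    return counts.get(key) or counts.get(_sanitize(key))


counts = {"Web Search": 0, "Web_Search": 3}
print(lookup_with_is_none(counts, "Web Search"))
print(lookup_with_or(counts, "Web Search"))
```

With this input the `is None` version returns the stored 0 for the raw key, while the `or` version skips past it and returns the count held under the sanitised key.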

The sibling `uipath.eval.mocks._mock_context._normalize_tool_name` does the OPPOSITE transform (`"my_tool"` -> `"my tool"`, snake-case -> words). Keeping the LangChain-mirroring sanitiser under the same name in a neighbouring module is a foot-gun for any future reader who grabs the wrong import. Rename to match the upstream `sanitize_tool_name` and drop the collision.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Per repo convention: comments carry one short line for non-obvious constraints; reasoning lives in commit / PR descriptions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When an actual ToolCall carries an id, the matcher uses id-only mode and does not fall back to sanitised-name. When it has no id, sanitised-name is the only path. This makes the matching behaviour symmetric with the user's mental model: id-vs-id when ids exist, name-vs-name when they don't, never the two crossed.

- _match_key: drop the name fallback when actual_id is present.
- _calls_match: when actual.id is set, compare against expected.id (or expected.name when picker stored the id under that field). No sanitised-name path while id is in play.
- count_tool_calls_by_name_and_id: bucket each call under one key: id when present, name otherwise. The dict no longer mixes kinds, so a later lookup can't cross-match a display name against an id bucket.

Trade: legacy name-keyed eval-sets stop matching against post-Layer-5 id-bearing spans. Acceptable since the picker now always stores the id when one is available, and legacy authors can re-pick to upgrade.

Tests updated to reflect the new contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
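
The single-key bucketing described in this commit could be sketched as below. `CallSketch` and `count_calls_sketch` are hypothetical stand-ins; the real `ToolCall` model and `count_tool_calls_by_name_and_id` signature are not shown on this page.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class CallSketch:
    """Hypothetical stand-in for a tool call with an optional id."""

    name: str
    id: Optional[str] = None


def count_calls_sketch(calls: Iterable[CallSketch]) -> Dict[str, int]:
    """Bucket each call under exactly one key: its id if set, else its name."""
    counts: Counter = Counter()
    for call in calls:
        key = call.id if call.id is not None else call.name
        counts[key] += 1
    return dict(counts)


calls = [
    CallSketch("Web_Search", "node-1"),
    CallSketch("Web_Search", "node-1"),
    CallSketch("Calculator"),
]
print(count_calls_sketch(calls))
```

Because an id-bearing call is never also counted under its name, a lookup by "Web_Search" finds nothing here, which mirrors the trade-off the commit message calls out for legacy name-keyed eval-sets.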

sonarqubecloud · 2026-06-20T01:24:46Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

Copilot AI review requested due to automatic review settings June 19, 2026 17:20

github-actions Bot added the test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-integrations labels Jun 19, 2026

Copilot started reviewing on behalf of ajay-kesavan June 19, 2026 17:21

Copilot AI reviewed Jun 19, 2026

chore: bump uipath to 2.11.6 (2.11.5 already on PyPI)

06d5d20

ajay-kesavan requested a review from Chibionos June 19, 2026 18:42

chore: refresh uipath uv.lock for 2.11.6 bump

8e7af29

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Chibionos approved these changes Jun 19, 2026

ajay-kesavan and others added 4 commits June 19, 2026 14:11

chore: collapse multi-line comments to one line

c83a28b

Per repo convention: comments carry one short line for non-obvious constraints; reasoning lives in commit / PR descriptions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chore: apply ruff format

039b3f5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ajay-kesavan force-pushed the ajay/eval-option-c-sdk branch from 3840732 to 039b3f5 Compare June 19, 2026 21:13

UiPath deleted a comment from github-actions Bot Jun 20, 2026

ajay-kesavan merged commit 6d94685 into main Jun 20, 2026
251 of 255 checks passed

ajay-kesavan deleted the ajay/eval-option-c-sdk branch June 20, 2026 02:21

ajay-kesavan merged 8 commits into main from ajay/eval-option-c-sdk

3 participants
