Skip to content

feat(lint): expose XPath/expression AST to Starlark rules#688

Open
Andries-Smit wants to merge 5 commits into
mendixlabs:mainfrom
Andries-Smit:feat/xpath
Open

feat(lint): expose XPath/expression AST to Starlark rules#688
Andries-Smit wants to merge 5 commits into
mendixlabs:mainfrom
Andries-Smit:feat/xpath

Conversation

@Andries-Smit

@Andries-Smit Andries-Smit commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Add two new builtins available to all .star lint rules:

  • xpath_expressions() — iterates XPathExpressionEntry records from the catalog, covering RETRIEVE, SECURITY, and DATASOURCE usage types across microflows, access rules, and pages.

  • parse_xpath(expr) — parses a raw XPath string into a structured Starlark tree using the existing exprcheck parser. Returns a recursive struct tree with a kind discriminator field and type-specific child fields (left/right for bin, operand for unary, args for call, etc.).

Internal additions:

  • starlark_xpath.go — robustExprToStarlark() converter and xpathExpressionEntryToStarlark() mapper
  • starlark_xpath_test.go — unit tests for the converter
  • LintContext.XPathExpressions() iterator (context.go)

These builtins make it possible to write performance and security lint rules that inspect the parsed expression tree rather than raw strings — for example, flagging negative enumeration XPath conditions that prevent database index usage.

Example Rule

# PERF001: Prefer Positive XPath Conditions
#
# Negative conditions (!=, not()) in database retrieve constraints prevent the
# database from using indices, since it cannot index "what is not there".
# This specifically applies to enumeration attribute comparisons.
#
# Noncompliant: [EnumAttribute != Module.Enumeration.C]
# Compliant:    [EnumAttribute = Module.Enumeration.A or EnumAttribute = Module.Enumeration.B]
#
# Reference: Mendix performance best practices, negative XPath constraints.

RULE_ID = "PERF001"
RULE_NAME = "PositiveXPathCondition"
DESCRIPTION = "Flags negative enumeration XPath conditions (!=, not) in retrieve/security constraints — negative conditions prevent index usage"
CATEGORY = "performance"
SEVERITY = "warning"

def check():
    violations = []

    for e in xpath_expressions():
        if e.usage_type not in ("RETRIEVE", "SECURITY"):
            continue
        if not e.xpath_expression:
            continue

        ast = parse_xpath(e.xpath_expression)
        if _has_enum_negation(ast):
            violations.append(violation(
                message="Negative XPath condition in {} '{}' prevents index usage on '{}'".format(
                    e.document_type.lower(), e.document_qualified_name, e.target_entity
                ),
                location=location(
                    module=e.module_name,
                    document_type=e.document_type.lower(),
                    document_name=e.document_qualified_name,
                    document_id=e.document_id,
                ),
                suggestion="Rewrite '{}' using positive conditions (= and or) so the database can use indices".format(
                    e.xpath_expression
                ),
            ))

    return violations

# ---------------------------------------------------------------------------
# AST helpers
# ---------------------------------------------------------------------------

def _is_attr(node):
    """True for an attribute or association path: bare identifier (variable), attr_path, or 2-part qualified name."""
    return node.kind == "variable" or node.kind == "attr_path" or (node.kind == "qname" and node.sub == "")

def _is_enum_or_string(node):
    """True for an enumeration value (3-part qname) or a string literal."""
    return node.kind == "string" or (node.kind == "qname" and node.sub != "")

def _is_attr_enum_pair(left, right):
    """True when one side is an attribute path and the other is an enum/string literal."""
    return (_is_attr(left) and _is_enum_or_string(right)) or \
           (_is_enum_or_string(left) and _is_attr(right))

def _is_neq_pattern(node):
    """Matches: attr != Enum.Value"""
    return (node.kind == "bin" and node.op == "!=" and
            _is_attr_enum_pair(node.left, node.right))

def _is_not_eq_pattern(node):
    """Matches: not(attr = Enum.Value)"""
    if node.kind != "call" or node.name.lower() != "not":
        return False
    for arg in node.args:
        if (arg.kind == "bin" and arg.op == "=" and
                _is_attr_enum_pair(arg.left, arg.right)):
            return True
    return False

def _has_enum_negation(root):
    """Walks the AST using an explicit stack (no recursion, no while — Starlark restrictions).
    Uses for _ in range(N) as a bounded loop over a mutable work list.
    """
    if root == None:
        return False
    stack = [root]
    for _ in range(256):  # XPath ASTs are shallow in practice; 256 is a safe upper bound
        if not stack:
            break
        node = stack.pop()
        if node == None:
            continue
        k = node.kind

        # Leaf nodes — no children to inspect
        if k in ("null", "string", "number", "bool", "empty", "variable",
                 "attr_path", "qname", "constant", "token", "recovered", "unknown"):
            continue

        # Pattern check at this node
        if _is_neq_pattern(node) or _is_not_eq_pattern(node):
            return True

        # Push children onto the stack
        if k == "bin":
            stack.append(node.left)
            stack.append(node.right)
        elif k == "unary":
            stack.append(node.operand)
        elif k == "call":
            for a in node.args:
                stack.append(a)
        elif k == "paren":
            stack.append(node.inner)
        elif k == "if":
            stack.append(node.cond)
            stack.append(node.then)
            stack.append(node.else_)
    return False

@github-actions

Copy link
Copy Markdown

AI Code Review

Review Summary

Critical Issues: None
Moderate Issues:

  • Missing integration test for new Starlark builtins in starlark_test.go (only conversion function tests exist)

Minor Issues:

  • Example rule in PR description not added to test suite (though acceptable as illustration)
  • Recursive robustExprToStarlark could theoretically stack overflow on extremely deep expressions (low risk in practice)

What Looks Good:

  • Clean implementation following existing linter patterns
  • Proper SQL safety and module exclusion handling
  • Comprehensive unit tests for conversion logic
  • Correct XPath bracket stripping and AST conversion
  • Well-scoped PR focused solely on Starlark lint enhancements
  • No MDL syntax changes (avoids full-stack consistency requirements)
  • Performance category mapping added appropriately

Recommendation: Approve after adding Starlark builtin integration test to starlark_test.go (e.g., test rule using xpath_expressions() and parse_xpath()). Current unit tests are sufficient for core logic but missing end-to-end Starlark environment validation.


Detailed Findings

Overlap & Duplication

No duplication:

  • No existing xpath_expressions() iterator or parse_xpath() builtin found via grep
  • Functionality distinct from existing Starlark helpers (e.g., role_mappings)
  • PR does not re-document shipped features

MDL Syntax Design

Not applicable:

  • PR adds no MDL syntax (only Starlark builtins for linter)
  • Skipped per checklist (only relevant for MDL grammar/AST changes)

Full-Stack Consistency

Not applicable:

  • No MDL feature added (no grammar/AST/visitor/executor changes)
  • Linter internals only (no BSON/UI impact)

Test Coverage

⚠️ Moderate gap:

  • ✅ New test file starlark_xpath_test.go covers conversion logic thoroughly
  • Missing: Integration test for Starlark builtins in starlark_test.go (e.g., test rule invoking xpath_expressions()/parse_xpath())
  • Existing starlark_test.go tests other builtins but not these new ones
  • Example rule in PR description illustrates usage but isn't tested

Security & Robustness

Strong:

  • SQL query uses static string + Scanner (no injection risk)
  • stripXPathBrackets is pure/input-safe
  • No file I/O, temp files, or hot-path concerns
  • Method receivers correct (no mutations)
  • XPath AST conversion handles all exprcheck node types

Scope & Atomicity

Focused:

  • Single concern: exposing XPath AST to Starlark linter rules
  • No unrelated changes (e.g., no version gating, no BSON changes)
  • All changes confined to linter package

Code Quality

High:

  • Follows existing linter patterns (see builtinRoleMappings/builtinProjectSecurity)
  • Clear comments and struct documentation
  • Deterministic iteration (SQL ORDER BY)
  • No magic numbers (256 stack depth in example is illustrative, not in code)
  • Manual lists absent (uses SQL for dynamic data)

Key Observations

  1. Starlark safety: Builtins return immutable structs; no shared state risks
  2. Performance: xpath_expressions() streams via iterator (avoids bulk allocation)
  3. Error handling: parse_xpath gracefully handles invalid expressions via exprcheck

Automated review via OpenRouter (Nemotron Super 120B) — workflow source

Andries-Smit added a commit to Andries-Smit/mxcli that referenced this pull request Jun 24, 2026
…arse_xpath builtins

Addresses the missing integration test flagged in the PR mendixlabs#688 review.
Exercises the full pipeline: catalog seed → LoadStarlarkRule → rule.Check → violation assertions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

AI Code Review

##Review Summary

This PR adds two new Starlark builtins (xpath_expressions() and parse_xpath(expr)) to enable lint rules to analyze XPath expression ASTs rather than raw strings. The changes are confined to the linter subsystem and do not modify MDL syntax or the full MDL pipeline.

Critical Issues

  • None found

Moderate Issues

  • The PR description mentions a \"PERF001\": \"Performance\" mapping in report.go, but this change is not visible in the provided diff. While the example rule sets CATEGORY = \"performance\" directly (so the lint functionality would work), this mapping may be needed for proper rule categorization in reports. However, since the PR focuses on exposing builtins for user-defined rules (not shipping a built-in rule), this is likely not blocking.

Minor Issues

  • In starlark_xpath.go, the robustExprToStarlark function returns a struct with kind: "null" for nil inputs. While functional, returning starlark.None might be more idiomatic for representing null values in Starlark, though the current approach is acceptable.
  • The _has_enum_negation helper function uses for _ in range(256) as a bounded loop workaround for Starlark's lack of while loops. This is a common pattern but worth noting as a Starlark-specific constraint.

What Looks Good

  • Correct implementation: Builtins properly follow existing Starlark patterns in starlark.go (buildPredeclared(), proper arg unpacking, error handling)
  • Robust SQL handling: XpathExpressions() uses proper parameterized queries, handles nullable fields with sql.NullString, and checks exclusions
  • Thorough AST conversion: robustExprToStarlark() comprehensively covers all exprcheck node types with correct field mapping
  • Strong test coverage: Includes unit tests for conversion functions and an integration test using a file-based catalog (avoiding in-memory pooling issues)
  • Clear documentation: The example rule in the PR description demonstrates practical usage for detecting negative XPath conditions
  • Safe XPath parsing: stripXPathBrackets() correctly handles optional outer brackets and whitespace
  • Iterator pattern: Properly implements the iterator protocol with yield and early termination support
  • No MDL syntax changes: Appropriately confines changes to the linter subsystem since this is about Starlark builtins, not MDL language features

Recommendation

Approve the PR. The changes are well-implemented, follow existing code patterns, include comprehensive tests, and provide valuable functionality for writing sophisticated lint rules. The minor concerns about the missing report.go change (if actually required) and Starlark null representation are not blocking issues for this feature's core purpose. The PR enables users to write powerful performance and security lint rules that analyze XPath expression structure rather than raw strings.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

@github-actions

Copy link
Copy Markdown

AI Code Review

What Looks Good

  • Feature completeness: Adds two valuable builtins (xpath_expressions() and parse_xpath()) that enable sophisticated XPath analysis in Starlark lint rules, exactly as requested
  • Test coverage: Includes both unit tests (starlark_xpath_test.go) and an integration test (starlark_xpath_integration_test.go) that verify the builtins work correctly together
  • Code quality: Follows existing patterns in the codebase (similar to other builtins in starlark.go), with clean, readable implementations
  • Integration: Properly wires the new functionality through the linting context and Starlark rule system
  • Safety: Handles edge cases like nil expressions and malformed XPath gracefully (recovered expressions produce safe struct outputs)
  • Documentation: Includes clear example rule demonstrating real-world usage for performance optimization

Recommendation

Approve - This PR is ready to merge. It adds valuable functionality for lint rule writers without violating any applicable review checklist items (it doesn't add MDL syntax, so those specific checks don't apply). The implementation is robust, well-tested, and follows established patterns in the codebase.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

This was referenced Jun 24, 2026
@ako

ako commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Strong PR — verified end-to-end on the branch.

What's solid:

  • Real, populated data path. The query targets the xpath_expressions view, which exists on main (tables.go:385) and is filled by builder_xpath.go. All 13 selected columns exist, and the builder emits the RETRIEVE/SECURITY/DATASOURCE usage types the example rule filters on.
  • No merge-base skew. Unlike feat(lint): expose constants() query function to Starlark lint rules #685, the new XPathExpressions() iterator correctly uses ctx.IsExcluded(e.ModuleName), so it honors --modules and stays consistent with the other 7 iterators.
  • Tests included and passing. Unit tests cover the AST converter + bracket stripping; the integration test drives a real Starlark rule through xpath_expressions() + parse_xpath() against a file-based catalog. go vet clean, go test ./mdl/linter/ green (incl. TestStarlarkXPathBuiltins); the exprcheck field mappings all compile.
  • Graceful failure: parse_xpath ignores the parse error and the converter maps nil → kind:"null", so malformed input degrades to recovered/null rather than panicking.

🟠 Main gap — no skill documentation (should block): all 5 changed files are under mdl/linter/; nothing updates .claude/skills/mendix/write-lint-rules.md. That skill has the canonical "Available Query Functions" table + per-object property tables, and #685 set the precedent of documenting each new builtin there. This PR adds two builtins (xpath_expressions(), parse_xpath()), a new xpath_expression object (13 fields), and an expr AST node schema with 14 kind variants and type-specific fields (left/right/op/operand/args/name/inner/cond/then/else_/path/variable/module/sub/value). Without docs, rule authors can't discover these or learn the field names — and since this skill is synced into user projects via make sync-skills, it's invisible to end users entirely (CLAUDE.md "Skills — new features documented" checklist item).

Minor:

  • stripXPathBrackets mishandles chained predicates. Mendix constraints can be [a = 1][b = 2]; the function strips only the outer pair when both ends are brackets, yielding a = 1][b = 2 → parser recovers, silently skipping a valid constraint. Correct for the common single-predicate case; worth a guard or a noted limitation.
  • Observation (not a defect): the catalog already stores a precomputed XPathAST TEXT column the entry struct doesn't expose; parse_xpath re-parses from the raw string. Fine since parse_xpath accepts arbitrary strings, but worth being aware of.
  • Nit: the example rule guards with if node == None, but the converter never returns Starlark None (it returns kind:"null" structs), so those checks are dead — the walker relies on "null" being in its leaf-kind set instead. Harmless; example only.

Recommendation: approve once the skill docs are added for the two builtins, the xpath_expression object, and the expr AST schema. Implementation, data wiring, and tests are all in good shape — the doc gap is the one thing that shouldn't merge without. The stripXPathBrackets chained-predicate case is a good small follow-up.

Andries-Smit and others added 5 commits June 26, 2026 11:47
Add two new builtins available to all `.star` lint rules:

- `xpath_expressions()` — iterates XPathExpressionEntry records
  from the catalog, covering RETRIEVE, SECURITY, and DATASOURCE
  usage types across microflows, access rules, and pages.

- `parse_xpath(expr)` — parses a raw XPath string into a
  structured Starlark tree using the existing exprcheck parser.
  Returns a recursive struct tree with a `kind` discriminator
  field and type-specific child fields (left/right for bin,
  operand for unary, args for call, etc.).

Internal additions:
- `starlark_xpath.go` — robustExprToStarlark() converter and
  xpathExpressionEntryToStarlark() mapper
- `starlark_xpath_test.go` — unit tests for the converter
- `LintContext.XPathExpressions()` iterator (context.go)
- `"PERF001": "Performance"` mapping in report.go

These builtins make it possible to write performance and security
lint rules that inspect the parsed expression tree rather than
raw strings — for example, flagging negative enumeration XPath
conditions that prevent database index usage (PERF001).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BGEk1wbb3i1kMc6HBAodHQ
…arse_xpath builtins

Addresses the missing integration test flagged in the PR mendixlabs#688 review.
Exercises the full pipeline: catalog seed → LoadStarlarkRule → rule.Check → violation assertions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The function returns a struct with kind="null", not starlark.None.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[a = 1][b = 2] was incorrectly stripped to a = 1][b = 2 because
the check only looked at the first and last characters. The fix
walks bracket depth to confirm the opening [ at position 0 is
matched by the final ] before stripping.

Added two test cases for chained predicates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R1yYeYn9rosf8UdAimdatt
…n skill

Adds to write-lint-rules.md:
- xpath_expressions() in the query functions table
- parse_xpath(s) in the helper functions table
- xpath_expression object (13 fields)
- expr AST node schema (14 kind variants + type-specific fields)
- Leaf-kind list and tree-walking example

Closes the skill-docs gap flagged in PR mendixlabs#688 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01R1yYeYn9rosf8UdAimdatt
@github-actions

Copy link
Copy Markdown

AI Code Review

What Looks Good

  • The PR is well-scoped, focusing solely on extending Starlark lint rule capabilities with two new builtins (xpath_expressions() and parse_xpath())
  • Implementation follows existing patterns in the linter codebase (similar to other builtins like role_mappings())
  • Comprehensive documentation added to .claude/skills/mendix/write-lint-rules.md with clear examples
  • Proper error handling (nil context checks, SQL error handling, graceful degradation)
  • Thorough test coverage including unit tests for conversion functions and an integration test
  • The example rule (PERF001) demonstrates practical usage for performance linting
  • No MDL syntax changes were made (appropriate since this is a lint framework enhancement, not an MDL language feature)
  • Code correctly handles edge cases like chained XPath brackets and nil AST nodes
  • Integration test uses proper temporary directory handling and file-based catalog for isolation

Recommendation

Approve. The PR successfully adds the requested XPath expression analysis capabilities to Starlark lint rules without violating any project guidelines. Since it doesn't introduce or modify MDL syntax, the MDL-specific checklist items (grammar, AST, visitor, executor, etc.) correctly do not apply. The implementation is robust, well-tested, and integrates cleanly with the existing linter framework. The documentation clearly explains the new builtins and their usage.


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants