feat(#1423): Diagram.trace() for upstream restriction propagation#1471
Open
dimitri-yatsenko wants to merge 5 commits into
Open
feat(#1423): Diagram.trace() for upstream restriction propagation#1471dimitri-yatsenko wants to merge 5 commits into
dimitri-yatsenko wants to merge 5 commits into
Conversation
Diagram.cascade(part_integrity="cascade") used to derive each Part's Master restriction by joining `master_ft.proj() & child_ft.proj()` on shared attribute names. This failed when the Part referenced its Master indirectly — through another Part with renamed FK columns (`.proj()` in the definition) or via a Part-of-Part chain that does not directly inherit Master's PK names. The intermediate Parts' restrictions were also skipped. Replace the proj-join shortcut with an upward walk of the actual FK graph from the Part to its Master, applying symmetric (upward) counterparts of the existing propagation rules at each edge. Key changes in src/datajoint/diagram.py: - New Diagram._apply_propagation_rule_upward — mirror of the existing forward propagation method. Same three rules (shared-PK copy, aliased reverse-rename, non-aliased projection) applied in the reverse direction (child → parent). - New Diagram._propagate_part_to_master — walks nx.shortest_path (Master → Part) and applies the upward rules along each real edge, transparently skipping the integer-named alias nodes that the graph inserts for aliased FKs. Restricts intermediate Parts too (the chain case from #1429 Case 2). Materializes the Master's restriction via to_arrays() so the subsequent forward cascade back down to Master's other Parts produces literal `WHERE ... IN (values)` clauses rather than self-referential subqueries (avoiding MySQL error 1093). - New Diagram._find_real_edge_props — looks up edge props for parent → child via the direct edge OR through an alias node. - _propagate_restrictions: seed-is-Part case. When the cascade starts at a Part (e.g. `Master.PartB.delete(part_integrity="cascade")`), the main loop's part_integrity block — nested inside the out_edges iteration — cannot fire because a leaf Part has no out-edges. Trigger the upward propagation explicitly for the seed before the main loop. - Diagram.cascade: expand nodes_to_show to include any node that the part_integrity propagation pulled in (the master and its descendants), so counts() and __iter__ report the full cascade subgraph. Tests in tests/integration/test_cascade_delete.py — three new mysql tests covering both #1429 cases plus an end-to-end delete. Full regression: 8 + 15 + 33 mysql tests pass. Slated for DataJoint 2.3.
- Drop unused `propagated_edges` parameter from `_propagate_part_to_master`
and its call sites. The parameter was vestigial after the design
switched from edge-blocking to materialization at the master.
- Document two limitations in the docstring:
- Single FK path: nx.shortest_path returns one path; non-shortest
paths are not applied.
- Memory cost of materialization: to_arrays() pulls matching master
PKs into Python memory.
- Add test_cascade_three_level_part_chain covering PartC → PartB →
PartA → Master. Confirms intermediate Parts are restricted at every
hop, not just the first.
All 36 mysql tests in cascade_delete + cascading_delete + dependencies
+ semantic_matching pass.
Implements T2.2.a of the provenance trinity (datajoint-docs#183). Upstream mirror of Diagram.cascade(): walks the FK graph from a restricted seed to every ancestor with OR convergence — an ancestor entity is included if reachable through any FK path from the seed. Reuses the upward propagation primitives added by #1468 (_apply_propagation_rule_upward / _find_real_edge_props) applied here in a generalized form (any child → any parent, not just Part → Master). Branch note: stacked on fix/1429-cascade-part-part-renamed-fk (#1468) for the upward primitives. Will rebase onto master after #1468 lands. What's added: - src/datajoint/diagram.py: - New @classmethod Diagram.trace(table_expr) — mirror of cascade(), walks ancestors instead of descendants, trims to ancestor subgraph. - New _propagate_restrictions_upstream(start_node) — multi-pass walk over in_edges, applies the upward rules at each real edge. Alias-node transparent. - New __getitem__(key) — supports both Table subclass/instance (returns pre-restricted QueryExpression) and string (returns pre-restricted FreeTable). Raises DataJointError for tables outside the trace's subgraph. - Bugfix in _apply_propagation_rule_upward Backward Rule 3: previous code projected child to its OWN PK (child_ft.proj()) which excluded non-primary FK columns. Now projects to the FK columns via proj(*attr_map.keys()), correctly carrying them into the parent restriction for non-primary-FK cases. Caught by test_trace_or_convergence_two_paths. - src/datajoint/dependencies.py: - New load_all_upstream() — symmetric to load_all_downstream. Iteratively discovers upstream schemas reachable via reverse FK edges, expanding the graph until convergence. - src/datajoint/adapters/{base,mysql,postgres}.py: - New find_upstream_schemas_sql(schemas_list) on each adapter, symmetric to find_downstream_schemas_sql. - tests/integration/test_trace.py (new, 8 tests covering single-hop, multi-hop, renamed FK, OR convergence across two paths, non-ancestor rejection, string indexing → FreeTable, counts(), leaf-table seed). All 8 trace tests pass on MySQL. Regression: test_cascade_delete + test_cascading_delete + test_dependencies + test_semantic_matching — 36 tests pass, no regressions from the Rule 3 fix. Slated for DataJoint 2.3.
This was referenced Jun 23, 2026
The trace-mode __getitem__ I added shadowed networkx.DiGraph's standard
adjacency-dict lookup for ALL Diagrams, not just trace results. ERD
tests (and any other code that does diagram[node_name] for adjacency)
were getting DataJointError("not in this trace's subgraph") instead of
the adjacency dict.
Fix: short-circuit non-trace diagrams (no _mode attribute or _mode != "trace")
to super().__getitem__(key) before any trace-specific logic runs.
Tests:
- 5 previously-failing erd tests now pass (test_erd, test_diagram_algebra,
test_repr_svg, test_make_image, test_part_table_parsing).
- 8/8 trace tests still pass.
dimitri-yatsenko
added a commit
that referenced
this pull request
Jun 23, 2026
Implements T2.2.c of the provenance trinity, completing the trio (Diagram.trace → self.upstream → strict_provenance). When dj.config["strict_provenance"] = True, runtime gates enforce the upstream-only convention inside make(): - Reads must target a table in the active trace's allowed set (declared ancestors + self + self's Parts). - Writes must target self or self's Parts. - Inserted rows' PK columns that overlap with the current key must equal the key's values (key-consistency rule). Default is False. Existing make() bodies are unaffected. Branch stacked on feat/1424-self-upstream (#1473) → feat/1423-diagram-trace (#1471) → fix/1429-cascade-part-part-renamed-fk (#1468). Will rebase onto master after the chain merges. What's added: - src/datajoint/provenance.py (new): the runtime context module. - `_active_strict_make` ContextVar holding (target, allowed_tables, key) for the currently-executing make() invocation. ContextVar chosen over threading.local to propagate correctly across contextvars-aware concurrency boundaries. - `push_strict_make_context` / `pop_strict_make_context` — context lifecycle managed by `_populate_one`'s try/finally. - `assert_read_allowed(query_expression)` — read gate. Recursively discovers base tables via the QueryExpression's `_support` chain and checks each against the allowed set. - `assert_write_allowed(target_table, rows)` — write gate. Verifies the target is self or one of self's Part tables, and checks the key-consistency rule on each dict row. - src/datajoint/settings.py: new `strict_provenance: bool` field on Config (default False), env-var `DJ_STRICT_PROVENANCE`, ENV_VAR_MAPPING entry. - src/datajoint/autopopulate.py: in `_populate_one`, push the strict context (when the flag is on) just before the make() invocation block. The allowed table set = trace's ancestor nodes ∪ {self.full_table_name} ∪ {self's Parts}. Pop in the existing `finally` block. - src/datajoint/expression.py: `QueryExpression.cursor` now calls `assert_read_allowed(self)` before issuing SQL. No-op outside make(). - src/datajoint/table.py: `Table.insert` calls `assert_write_allowed(self, rows)` after the existing `_allow_insert` check. No-op outside make(). Part-table detection uses class `__dict__` traversal (filtered to Part subclasses) instead of `dir/getattr` to avoid triggering the `_JobsDescriptor` (which would lazy-declare ~~table inside the populate transaction — caught by the first test iteration). Documented limitation (deferred): the read gate does not distinguish reads that came through `self.upstream` from reads of the same ancestor via a direct expression. Both are allowed if the table is in the allowed set. The intent is to catch reads from *undeclared* dependencies; tightening the "must come through self.upstream" path requires propagating an attribution marker through QueryExpression composition and is left for a follow-up release. Tests in tests/integration/test_strict_provenance.py (6 new): - test_strict_compliant_make_passes — make() reading via self.upstream and writing self.insert1 with matching key runs cleanly under strict. - test_strict_blocks_read_from_undeclared_table — read from an unrelated table raises with "strict_provenance ... undeclared" message. - test_strict_blocks_write_to_other_table — insert into a non-self, non-Part target raises "not permitted". - test_strict_blocks_write_with_mismatched_key — row PK that disagrees with the current key raises "does not match the current make() key". - test_strict_writes_to_part_table_pass — self.PartName.insert(...) works. - test_strict_off_by_default_no_change — default-off regression check; the canonical "direct (Ancestor & key).fetch1()" pattern still works when strict_provenance is unset. Regression: 17/17 autopopulate tests pass with strict_provenance unset (default). 6/6 new strict tests pass with strict_provenance=True. 8/8 trace tests + 9/9 cascade tests unaffected. Slated for DataJoint 2.3.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
T2.2.a of the provenance trinity (datajoint-docs#183 spec). Upstream mirror of `Diagram.cascade()`: walks the FK graph from a restricted seed to every ancestor with OR convergence — an ancestor entity is included if reachable through any FK path from the seed.
Closes #1423. Slated for DataJoint 2.3.
Branch dependency
This branch is stacked on `fix/1429-cascade-part-part-renamed-fk` (#1468) for the upward propagation primitives (`_apply_propagation_rule_upward`, `_find_real_edge_props`). After #1468 merges, this branch will be rebased onto master before review.
What's added
Tests
Sequencing
This is T2.2.a. After it merges:
Test plan