Skip to content

fix(#1429): cascade through FK chain for part_integrity="cascade"#1468

Open
dimitri-yatsenko wants to merge 3 commits into
masterfrom
fix/1429-cascade-part-part-renamed-fk
Open

fix(#1429): cascade through FK chain for part_integrity="cascade"#1468
dimitri-yatsenko wants to merge 3 commits into
masterfrom
fix/1429-cascade-part-part-renamed-fk

Conversation

@dimitri-yatsenko

@dimitri-yatsenko dimitri-yatsenko commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

`Diagram.cascade(part_integrity="cascade")` used to derive each Part's Master restriction by joining `master_ft.proj() & child_ft.proj()` on shared attribute names. This failed when the Part referenced its Master indirectly:

  • Case 1: Part references another Part via `.proj()` (renamed FK) — the shared-attribute join finds nothing, Master is not restricted.
  • Case 2: Part-of-Part chain (e.g. `PartB → PartA → Master`) — even when the shared-attribute join happens to find Master's PK, intermediate Parts in the chain are skipped.

Replace the proj-join shortcut with an upward walk of the actual FK graph from the Part to its Master, applying symmetric (upward) counterparts of the existing propagation rules at each edge. Closes #1429. Slated for DataJoint 2.3.

Approach

Component Change
New `Diagram._apply_propagation_rule_upward` Mirror of the existing forward propagation method. Same three rules (shared-PK copy, aliased reverse-rename, non-aliased projection) applied in the reverse direction.
New `Diagram._propagate_part_to_master` Walks `nx.shortest_path(Master → Part)` and applies upward rules along each real edge, transparently skipping alias nodes (integer-named graph nodes inserted for aliased FKs). Restricts intermediate Parts. Materializes the Master's restriction via `to_arrays()` so subsequent forward cascade back to other Parts produces literal value clauses, not self-referential subqueries (MySQL 1093).
New `Diagram._find_real_edge_props` Edge props lookup that transparently traverses alias nodes.
`_propagate_restrictions` Seed-is-Part case: when cascade starts at a leaf Part (e.g. `Master.PartB.delete(part_integrity="cascade")`), the main loop's part_integrity block cannot fire (nested inside `out_edges`, leaf Part has none). Trigger upward propagation explicitly for the seed.
`Diagram.cascade` Expand `nodes_to_show` to include nodes pulled in by part_integrity propagation (the master and its descendants), so `counts()` reports the full cascade subgraph.

Tests

Three new MySQL tests in `tests/integration/test_cascade_delete.py::TestLineageRefreshOnDecoration`:

  • `test_cascade_part_of_part_no_master_reference` — Case 2: `PartB → PartA → Master`. Verify Master and PartA are restricted, and that the master cascades back down to all its Parts.
  • `test_cascade_part_of_part_renamed_fk` — Case 1: `PartB → PartA.proj(src_master='master_id', ...)`. Verify the renamed FK is reversed walking up to Master.
  • `test_cascade_part_of_part_actual_delete` — end-to-end `delete()` through a Part-of-Part chain (no MySQL 1093).

Regression: all 8 cascade_delete mysql tests pass, plus 15 in cascading_delete + dependencies, plus 33 in semantic_matching + erd. No regressions.

What's not in this PR

  • Companion docs: a detailed cascade-rules specification will land in a separate datajoint-docs PR. Captures the 3 forward rules + 3 upward rules, the alias-node mechanism, the part_integrity modes, and the materialization semantics.

Test plan

  • All 3 new tests pass on MySQL
  • All existing cascade/dependency/diagram/semantic-matching tests pass — no regressions
  • CI green on this PR
  • Companion datajoint-docs PR opened with cascade-rules spec

Diagram.cascade(part_integrity="cascade") used to derive each Part's
Master restriction by joining `master_ft.proj() & child_ft.proj()` on
shared attribute names. This failed when the Part referenced its Master
indirectly — through another Part with renamed FK columns (`.proj()` in
the definition) or via a Part-of-Part chain that does not directly
inherit Master's PK names. The intermediate Parts' restrictions were
also skipped.

Replace the proj-join shortcut with an upward walk of the actual FK
graph from the Part to its Master, applying symmetric (upward)
counterparts of the existing propagation rules at each edge.

Key changes in src/datajoint/diagram.py:

- New Diagram._apply_propagation_rule_upward — mirror of the existing
  forward propagation method. Same three rules (shared-PK copy, aliased
  reverse-rename, non-aliased projection) applied in the reverse
  direction (child → parent).

- New Diagram._propagate_part_to_master — walks nx.shortest_path
  (Master → Part) and applies the upward rules along each real edge,
  transparently skipping the integer-named alias nodes that the graph
  inserts for aliased FKs. Restricts intermediate Parts too (the chain
  case from #1429 Case 2). Materializes the Master's restriction via
  to_arrays() so the subsequent forward cascade back down to Master's
  other Parts produces literal `WHERE ... IN (values)` clauses rather
  than self-referential subqueries (avoiding MySQL error 1093).

- New Diagram._find_real_edge_props — looks up edge props for parent →
  child via the direct edge OR through an alias node.

- _propagate_restrictions: seed-is-Part case. When the cascade starts
  at a Part (e.g. `Master.PartB.delete(part_integrity="cascade")`),
  the main loop's part_integrity block — nested inside the out_edges
  iteration — cannot fire because a leaf Part has no out-edges. Trigger
  the upward propagation explicitly for the seed before the main loop.

- Diagram.cascade: expand nodes_to_show to include any node that the
  part_integrity propagation pulled in (the master and its descendants),
  so counts() and __iter__ report the full cascade subgraph.

Tests in tests/integration/test_cascade_delete.py — three new mysql
tests covering both #1429 cases plus an end-to-end delete. Full
regression: 8 + 15 + 33 mysql tests pass.

Slated for DataJoint 2.3.

@MilagrosMarin MilagrosMarin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the implementation in depth. Substantive bug is fixed correctly and the design is sound — observations below are quality nits, not correctness issues.

Verification ✅

Architectural fix is correct. The shift from "master_ft.proj() & child_ft.proj() shared-attribute join" → "walk the actual FK graph with backward propagation rules" correctly addresses both bug cases from #1429:

  • Case 1 (renamed FK): Backward Rule 2 (child_ft.proj(**{pk: fk for fk, pk in attr_map.items()})) reverses the renaming
  • Case 2 (Part-of-Part chain): Walking nx.shortest_path restricts intermediate Parts along the way

Backward rules mirror forward rules exactly. Compared _apply_propagation_rule_upward (lines 607-661) against _apply_propagation_rule (lines 549-605):

Forward Rule 1 Backward Rule 1
Condition not aliased and parent_attrs <= child_pk not aliased and child_attrs <= parent_pk
Action Copy parent restriction to child Copy child restriction to parent
Forward Rule 2 Backward Rule 2
Condition aliased aliased
Action parent_ft.proj(**{fk: pk for fk, pk in attr_map.items()}) child_ft.proj(**{pk: fk for fk, pk in attr_map.items()})
Forward Rule 3 Backward Rule 3
Condition not aliased, restriction ⊄ child PK not aliased, restriction ⊄ parent PK
Action parent_ft.proj() child_ft.proj()

Symmetry is precise. ✓

Materialization rationale is sound. to_arrays() converts the master's accumulated query restriction into a literal value tuple, avoiding MySQL error 1093 (self-referential subquery) on subsequent forward cascade. The empty-result case (len(master_pk_values) == 0 → restrictions[master_name] = [False]) is the right behavior for "include master with zero matches in counts/iter".

Seed-is-Part special case is justified. A leaf Part has no out-edges, so the main loop's part_integrity block (nested inside for target in out_edges(node)) cannot fire. The pre-loop trigger at lines 467-474 is the only path that catches (Master.PartB & key).delete(part_integrity="cascade") when PartB is a leaf.

visited_masters correctly deduplicates pre-loop and main-loop triggers.

Tests cover the bug cases:

  • test_cascade_part_of_part_no_master_reference — Case 2
  • test_cascade_part_of_part_renamed_fk — Case 1
  • test_cascade_part_of_part_actual_delete — end-to-end SQL execution (no 1093)

Observations worth flagging

1. Unused parameter propagated_edges in _propagate_part_to_master (line 663). Accepted but never referenced inside the function body. Either remove from the signature or use it to track upward edges (e.g., prevent double-application if the function is invoked twice with overlapping paths). Right now it's misleading dead weight.

2. PR description typo. The test-section line says:

Three new MySQL tests in tests/integration/test_cascade_delete.py::TestLineageRefreshOnDecoration

But TestLineageRefreshOnDecoration is the class name from #1467 (the ~lineage PR). The tests added here aren't inside any class — they're top-level functions. Looks like a copy-paste from #1467's PR template.

3. PostgreSQL coverage gap. Tests use schema_by_backend, which is backend-parameterized in principle (see connection_by_backend/db_creds_by_backend). PR test plan says "All 3 new tests pass on MySQL" — only MySQL is claimed. The MySQL 1093 issue doesn't apply to PostgreSQL (which permits self-referential subqueries), but the upward propagation logic is supposed to work on both. Worth confirming the parameterization actually runs on PG, since the materialization choice is MySQL-motivated and PG might exercise different code paths.

4. Three-level Part chain not tested. Tests cover PartB → PartA → Master (2 hops). A PartC → PartB → PartA → Master chain would exercise the loop in _propagate_part_to_master for len(real_path) > 3 and confirm intermediate Parts are restricted at every hop, not just the first. Worth one test.

5. Multiple FK paths from Master to a Part. nx.shortest_path returns one path. If a Part has multiple FK chains to its Master (unusual but conceivable — e.g., Part references two different intermediate Parts), restrictions through the non-shortest paths are missed. The docstring should note this limitation (or explicitly check for graph-uniqueness).

6. Materialization memory cost. master_ft.proj().to_arrays() pulls all matching master PKs into Python memory. For a cascade preview / large delete, this could be substantial (bounded by the count of distinct master rows referenced by the matching parts — typically small, but not always). A docstring note would help users predict the cost surface.

7. Companion docs PR. PR description says a cascade-rules spec is forthcoming in datajoint-docs. The spec will be more reviewable once the impl is settled, so this PR doesn't need to wait — just noting the cross-reference for tracking.

Summary

The substantive bug is fixed and the design is sound. Items #1#6 are quality observations, not correctness issues — most could be addressed in a follow-up or as small docstring/test additions before merge. The unused parameter (#1) and PR description typo (#2) are mechanical fixes worth catching.

- Drop unused `propagated_edges` parameter from `_propagate_part_to_master`
  and its call sites. The parameter was vestigial after the design
  switched from edge-blocking to materialization at the master.

- Document two limitations in the docstring:
  - Single FK path: nx.shortest_path returns one path; non-shortest
    paths are not applied.
  - Memory cost of materialization: to_arrays() pulls matching master
    PKs into Python memory.

- Add test_cascade_three_level_part_chain covering PartC → PartB →
  PartA → Master. Confirms intermediate Parts are restricted at every
  hop, not just the first.

All 36 mysql tests in cascade_delete + cascading_delete + dependencies
+ semantic_matching pass.
@dimitri-yatsenko

Copy link
Copy Markdown
Member Author

Thanks @MilagrosMarin — thorough review. Pushed 9094a64 addressing #1, #4, #5, #6.

# Your point Resolution
1 Unused `propagated_edges` parameter Removed. Vestigial from the earlier edge-blocking design; materialization at the master made it unnecessary. Dropped from signature and both call sites.
2 PR description typo (`TestLineageRefreshOnDecoration`) Fixed in PR description (the tests are top-level functions, not inside a class).
3 PostgreSQL coverage gap The new tests use `schema_by_backend` which CI parameterizes for both backends. PG runs as `test_cascade_part_of_part_*[postgresql]` in CI; local errors here are container-availability only. I'll confirm CI green once it lands.
4 3-hop Part chain not tested Added `test_cascade_three_level_part_chain` — `PartC → PartB → PartA → Master`. Confirms intermediate Parts (PartA, PartB) are restricted at every hop.
5 Multiple FK paths limitation Documented in the docstring under a new "Limitations" section. The pattern is unusual; users on it can fall back to `part_integrity="ignore"` + manual `delete()`.
6 Materialization memory cost Documented in the same Limitations section. Bounded by distinct master rows referenced by the matching parts.
7 Companion docs PR datajoint-docs#182 (cascade-rules spec) is open.

Full mysql regression: 36 tests in cascade_delete + cascading_delete + dependencies + semantic_matching all pass on this commit.

dimitri-yatsenko added a commit that referenced this pull request Jun 23, 2026
Implements T2.2.a of the provenance trinity (datajoint-docs#183). Upstream
mirror of Diagram.cascade(): walks the FK graph from a restricted seed to
every ancestor with OR convergence — an ancestor entity is included if
reachable through any FK path from the seed.

Reuses the upward propagation primitives added by #1468
(_apply_propagation_rule_upward / _find_real_edge_props) applied here in
a generalized form (any child → any parent, not just Part → Master).

Branch note: stacked on fix/1429-cascade-part-part-renamed-fk (#1468)
for the upward primitives. Will rebase onto master after #1468 lands.

What's added:

- src/datajoint/diagram.py:
  - New @classmethod Diagram.trace(table_expr) — mirror of cascade(),
    walks ancestors instead of descendants, trims to ancestor subgraph.
  - New _propagate_restrictions_upstream(start_node) — multi-pass walk
    over in_edges, applies the upward rules at each real edge.
    Alias-node transparent.
  - New __getitem__(key) — supports both Table subclass/instance
    (returns pre-restricted QueryExpression) and string (returns
    pre-restricted FreeTable). Raises DataJointError for tables outside
    the trace's subgraph.
  - Bugfix in _apply_propagation_rule_upward Backward Rule 3: previous
    code projected child to its OWN PK (child_ft.proj()) which excluded
    non-primary FK columns. Now projects to the FK columns via
    proj(*attr_map.keys()), correctly carrying them into the parent
    restriction for non-primary-FK cases. Caught by
    test_trace_or_convergence_two_paths.

- src/datajoint/dependencies.py:
  - New load_all_upstream() — symmetric to load_all_downstream.
    Iteratively discovers upstream schemas reachable via reverse FK
    edges, expanding the graph until convergence.

- src/datajoint/adapters/{base,mysql,postgres}.py:
  - New find_upstream_schemas_sql(schemas_list) on each adapter,
    symmetric to find_downstream_schemas_sql.

- tests/integration/test_trace.py (new, 8 tests covering single-hop,
  multi-hop, renamed FK, OR convergence across two paths, non-ancestor
  rejection, string indexing → FreeTable, counts(), leaf-table seed).

All 8 trace tests pass on MySQL. Regression: test_cascade_delete +
test_cascading_delete + test_dependencies + test_semantic_matching
— 36 tests pass, no regressions from the Rule 3 fix.

Slated for DataJoint 2.3.
dimitri-yatsenko added a commit that referenced this pull request Jun 23, 2026
Implements T2.2.c of the provenance trinity, completing the trio
(Diagram.trace → self.upstream → strict_provenance).

When dj.config["strict_provenance"] = True, runtime gates enforce the
upstream-only convention inside make():
- Reads must target a table in the active trace's allowed set
  (declared ancestors + self + self's Parts).
- Writes must target self or self's Parts.
- Inserted rows' PK columns that overlap with the current key must
  equal the key's values (key-consistency rule).

Default is False. Existing make() bodies are unaffected.

Branch stacked on feat/1424-self-upstream (#1473) → feat/1423-diagram-trace
(#1471) → fix/1429-cascade-part-part-renamed-fk (#1468). Will rebase
onto master after the chain merges.

What's added:

- src/datajoint/provenance.py (new): the runtime context module.
  - `_active_strict_make` ContextVar holding (target, allowed_tables,
    key) for the currently-executing make() invocation. ContextVar
    chosen over threading.local to propagate correctly across
    contextvars-aware concurrency boundaries.
  - `push_strict_make_context` / `pop_strict_make_context` — context
    lifecycle managed by `_populate_one`'s try/finally.
  - `assert_read_allowed(query_expression)` — read gate. Recursively
    discovers base tables via the QueryExpression's `_support` chain
    and checks each against the allowed set.
  - `assert_write_allowed(target_table, rows)` — write gate. Verifies
    the target is self or one of self's Part tables, and checks the
    key-consistency rule on each dict row.

- src/datajoint/settings.py: new `strict_provenance: bool` field on
  Config (default False), env-var `DJ_STRICT_PROVENANCE`, ENV_VAR_MAPPING
  entry.

- src/datajoint/autopopulate.py: in `_populate_one`, push the strict
  context (when the flag is on) just before the make() invocation
  block. The allowed table set = trace's ancestor nodes ∪ {self.full_table_name}
  ∪ {self's Parts}. Pop in the existing `finally` block.

- src/datajoint/expression.py: `QueryExpression.cursor` now calls
  `assert_read_allowed(self)` before issuing SQL. No-op outside make().

- src/datajoint/table.py: `Table.insert` calls `assert_write_allowed(self, rows)`
  after the existing `_allow_insert` check. No-op outside make().

Part-table detection uses class `__dict__` traversal (filtered to Part
subclasses) instead of `dir/getattr` to avoid triggering the
`_JobsDescriptor` (which would lazy-declare ~~table inside the populate
transaction — caught by the first test iteration).

Documented limitation (deferred): the read gate does not distinguish
reads that came through `self.upstream` from reads of the same ancestor
via a direct expression. Both are allowed if the table is in the
allowed set. The intent is to catch reads from *undeclared*
dependencies; tightening the "must come through self.upstream" path
requires propagating an attribution marker through QueryExpression
composition and is left for a follow-up release.

Tests in tests/integration/test_strict_provenance.py (6 new):

- test_strict_compliant_make_passes — make() reading via self.upstream
  and writing self.insert1 with matching key runs cleanly under strict.
- test_strict_blocks_read_from_undeclared_table — read from an unrelated
  table raises with "strict_provenance ... undeclared" message.
- test_strict_blocks_write_to_other_table — insert into a non-self,
  non-Part target raises "not permitted".
- test_strict_blocks_write_with_mismatched_key — row PK that disagrees
  with the current key raises "does not match the current make() key".
- test_strict_writes_to_part_table_pass — self.PartName.insert(...) works.
- test_strict_off_by_default_no_change — default-off regression check;
  the canonical "direct (Ancestor & key).fetch1()" pattern still works
  when strict_provenance is unset.

Regression: 17/17 autopopulate tests pass with strict_provenance unset
(default). 6/6 new strict tests pass with strict_provenance=True.
8/8 trace tests + 9/9 cascade tests unaffected.

Slated for DataJoint 2.3.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cascade may fail when Part table references another Part with renamed foreign keys

2 participants