Skip to content

feat(#1424): self.upstream property for pre-restricted ancestor access#1473

Open
dimitri-yatsenko wants to merge 1 commit into
feat/1423-diagram-tracefrom
feat/1424-self-upstream
Open

feat(#1424): self.upstream property for pre-restricted ancestor access#1473
dimitri-yatsenko wants to merge 1 commit into
feat/1423-diagram-tracefrom
feat/1424-self-upstream

Conversation

@dimitri-yatsenko

Copy link
Copy Markdown
Member

Summary

T2.2.b of the provenance trinity. Inside `make()`, `self.upstream` exposes a pre-constructed `Diagram.trace(self & key)` so user code can read declared ancestors with provenance-safe, ergonomic syntax.

Closes #1424. Slated for DataJoint 2.3.

Branch dependency

Stacked on `feat/1423-diagram-trace` (#1471). This PR's base is `feat/1423-diagram-trace`, not `master`. Will rebase onto master once #1471 (and its parent #1468) merge.

What's added

Component File
`AutoPopulate._upstream` (class attr, instance storage) `src/datajoint/autopopulate.py`
`AutoPopulate.upstream` property — returns the trace inside `make()`, raises with a helpful message outside `src/datajoint/autopopulate.py`
Wire `self._upstream = Diagram.trace(self & dict(key))` into `_populate_one()` before `make()`; clear in the existing `finally` block `src/datajoint/autopopulate.py`
5 integration tests covering: basic pre-restricted read, non-ancestor rejection, unset-outside-make error, stale-state cleanup, tripartite-make consistency `tests/integration/test_autopopulate.py`

Design notes

  • Lazy construction: `Diagram.trace(self & dict(key))` only does a graph copy at the `_populate_one` layer. The expensive SQL fetch fires when the user accesses `self.upstream[T].fetch(...)`. Cheap for `make()` bodies that don't use `self.upstream`.
  • No `make()` signature change: `self.upstream` is a new attribute on `self`, not a new parameter. Existing `make()` implementations are unaffected.
  • Tripartite-safe: `self._upstream` is set once before all three invocation sites (lines 616, 619, 625). All three phases see the same upstream view.
  • No stale state: the existing `finally` block (line 716) that resets `_allow_insert` now also resets `_upstream`, so subsequent attribute access raises a clear error rather than silently using a stale trace.

Tests

  • 5/5 new upstream tests pass on MySQL.
  • 17/17 existing autopopulate tests pass — no regression.

Sequencing

Once this PR merges, T2.2.c (`strict_provenance` config flag, the runtime gate) goes next. It branches off this PR.

Test plan

Implements T2.2.b of the provenance trinity. Inside make(), self.upstream
exposes a pre-constructed Diagram.trace(self & key) so users can read
declared ancestors with provenance-safe, ergonomic syntax.

Branch stacked on feat/1423-diagram-trace (#1471) for Diagram.trace().

What's added:

- src/datajoint/autopopulate.py:
  - AutoPopulate._upstream class attribute (default None) — instance
    storage for the per-make() trace.
  - AutoPopulate.upstream property — returns the trace if set, raises
    DataJointError with a clear "only available inside make()" message
    otherwise. The error includes the fallback pattern
    (dj.Diagram.trace(self & key)) so the user knows the escape hatch.
  - In _populate_one, set self._upstream = Diagram.trace(self & dict(key))
    immediately before the make() invocation block. Construction is
    lazy at this layer (graph copy only); the SQL fetch fires when the
    user accesses self.upstream[T].fetch(...).
  - The existing `finally` block (line 716) that resets _allow_insert
    now also resets _upstream to None, so subsequent attribute access
    raises a clear error rather than silently returning a stale trace
    from the previous make() call.

What's not changed:

- make() signature: unchanged — key remains a dict, make_kwargs work
  as before. self.upstream is a new attribute on self, not a new
  parameter.
- Tripartite make pattern: self._upstream is set once before all three
  make() invocations, so all three phases see the same upstream view.

Tests in tests/integration/test_autopopulate.py (5 new):

- test_upstream_provides_pre_restricted_ancestor — basic case: make()
  reads self.upstream[Subject].fetch1("name") and the value is correctly
  pre-restricted to the current key.
- test_upstream_rejects_non_ancestor — self.upstream[Unrelated] raises
  DataJointError ("not in this trace"). Inherited from Diagram.__getitem__.
- test_upstream_unset_outside_make — accessing the property outside of
  make() raises with the helpful "only available inside make()" message.
- test_upstream_cleared_after_make — after populate() completes, accessing
  the property on a fresh instance still raises (verifies the finally
  cleanup; no stale state).
- test_upstream_seen_across_tripartite_make — both make_fetch / make_compute
  / make_insert see the same self.upstream value.

Full regression: 17/17 autopopulate tests pass on MySQL.

Slated for DataJoint 2.3. Blocked on #1471 (Diagram.trace) merging;
T2.2.c (strict_provenance) is stacked on this PR.
dimitri-yatsenko added a commit that referenced this pull request Jun 23, 2026
Implements T2.2.c of the provenance trinity, completing the trio
(Diagram.trace → self.upstream → strict_provenance).

When dj.config["strict_provenance"] = True, runtime gates enforce the
upstream-only convention inside make():
- Reads must target a table in the active trace's allowed set
  (declared ancestors + self + self's Parts).
- Writes must target self or self's Parts.
- Inserted rows' PK columns that overlap with the current key must
  equal the key's values (key-consistency rule).

Default is False. Existing make() bodies are unaffected.

Branch stacked on feat/1424-self-upstream (#1473) → feat/1423-diagram-trace
(#1471) → fix/1429-cascade-part-part-renamed-fk (#1468). Will rebase
onto master after the chain merges.

What's added:

- src/datajoint/provenance.py (new): the runtime context module.
  - `_active_strict_make` ContextVar holding (target, allowed_tables,
    key) for the currently-executing make() invocation. ContextVar
    chosen over threading.local to propagate correctly across
    contextvars-aware concurrency boundaries.
  - `push_strict_make_context` / `pop_strict_make_context` — context
    lifecycle managed by `_populate_one`'s try/finally.
  - `assert_read_allowed(query_expression)` — read gate. Recursively
    discovers base tables via the QueryExpression's `_support` chain
    and checks each against the allowed set.
  - `assert_write_allowed(target_table, rows)` — write gate. Verifies
    the target is self or one of self's Part tables, and checks the
    key-consistency rule on each dict row.

- src/datajoint/settings.py: new `strict_provenance: bool` field on
  Config (default False), env-var `DJ_STRICT_PROVENANCE`, ENV_VAR_MAPPING
  entry.

- src/datajoint/autopopulate.py: in `_populate_one`, push the strict
  context (when the flag is on) just before the make() invocation
  block. The allowed table set = trace's ancestor nodes ∪ {self.full_table_name}
  ∪ {self's Parts}. Pop in the existing `finally` block.

- src/datajoint/expression.py: `QueryExpression.cursor` now calls
  `assert_read_allowed(self)` before issuing SQL. No-op outside make().

- src/datajoint/table.py: `Table.insert` calls `assert_write_allowed(self, rows)`
  after the existing `_allow_insert` check. No-op outside make().

Part-table detection uses class `__dict__` traversal (filtered to Part
subclasses) instead of `dir/getattr` to avoid triggering the
`_JobsDescriptor` (which would lazy-declare ~~table inside the populate
transaction — caught by the first test iteration).

Documented limitation (deferred): the read gate does not distinguish
reads that came through `self.upstream` from reads of the same ancestor
via a direct expression. Both are allowed if the table is in the
allowed set. The intent is to catch reads from *undeclared*
dependencies; tightening the "must come through self.upstream" path
requires propagating an attribution marker through QueryExpression
composition and is left for a follow-up release.

Tests in tests/integration/test_strict_provenance.py (6 new):

- test_strict_compliant_make_passes — make() reading via self.upstream
  and writing self.insert1 with matching key runs cleanly under strict.
- test_strict_blocks_read_from_undeclared_table — read from an unrelated
  table raises with "strict_provenance ... undeclared" message.
- test_strict_blocks_write_to_other_table — insert into a non-self,
  non-Part target raises "not permitted".
- test_strict_blocks_write_with_mismatched_key — row PK that disagrees
  with the current key raises "does not match the current make() key".
- test_strict_writes_to_part_table_pass — self.PartName.insert(...) works.
- test_strict_off_by_default_no_change — default-off regression check;
  the canonical "direct (Ancestor & key).fetch1()" pattern still works
  when strict_provenance is unset.

Regression: 17/17 autopopulate tests pass with strict_provenance unset
(default). 6/6 new strict tests pass with strict_provenance=True.
8/8 trace tests + 9/9 cascade tests unaffected.

Slated for DataJoint 2.3.
@dimitri-yatsenko dimitri-yatsenko force-pushed the feat/1424-self-upstream branch from 86b1ae7 to 7e4130f Compare June 23, 2026 13:23
@dimitri-yatsenko dimitri-yatsenko marked this pull request as ready for review June 23, 2026 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant