[MINOR] Fix class-init deadlock between AOffset and its subclasses #2502

Merged: Baunsgaard merged 1 commit into apache:main from Baunsgaard:fix/offset-clinit-deadlock on Jun 23, 2026
Conversation

@Baunsgaard

Contributor

AOffset initialized a cached empty slice in its static initializer by instantiating its own OffsetEmpty subclass. Since OffsetEmpty (and the other offset subclasses) depend on AOffset being initialized first, this formed a superclass/subclass class-initialization cycle. When two threads first touched the offset classes concurrently (e.g. parallel test execution), each could hold one class's init monitor while waiting for the other, deadlocking on the JVM class-initialization monitors. Such a deadlock is invisible to the JVM deadlock detector and cannot be interrupted, so the affected JVM hangs indefinitely. It only manifests under concurrent first-touch, which is why it never reproduced in single-threaded local runs.
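The shape of the cycle can be sketched as follows. This is a minimal illustration, not the SystemDS source: `CyclicOffset` and `CyclicEmpty` are hypothetical stand-ins for AOffset and OffsetEmpty, and `size()` is an assumed placeholder method.

```java
// Hypothetical stand-ins for AOffset and OffsetEmpty; not the SystemDS source.
abstract class CyclicOffset {
    // Initializing CyclicOffset runs this line, which forces initialization
    // of the subclass CyclicEmpty while CyclicOffset is still in progress.
    static final CyclicOffset EMPTY = new CyclicEmpty();

    abstract int size();
}

final class CyclicEmpty extends CyclicOffset {
    // Initializing CyclicEmpty first requires its superclass CyclicOffset
    // to be initialized, which closes the cycle.
    @Override
    int size() {
        return 0;
    }
}
```

If thread A first touches `CyclicOffset` while thread B first touches `CyclicEmpty`, A waits for `CyclicEmpty` (in progress on B) and B waits for `CyclicOffset` (in progress on A). A single thread is safe because a recursive initialization request from the thread that is already initializing the class returns immediately.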

Defer the empty slice to a lazy holder accessed via emptySlice(), so AOffset's static initializer no longer references any subclass. By the time the holder is touched, AOffset is already initialized, so no cycle exists.
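A minimal sketch of the lazy-holder pattern described above, under the same caveat: `LazyOffset`, `LazyEmpty`, `EmptyHolder`, and `size()` are hypothetical names for illustration. Only `emptySlice()` is taken from the description, and its real signature in AOffset may differ.

```java
// Hypothetical stand-ins for AOffset and OffsetEmpty; not the SystemDS source.
abstract class LazyOffset {
    // No static field of LazyOffset references a subclass, so initializing
    // LazyOffset never triggers initialization of LazyEmpty.

    // The JVM initializes this nested class only on the first call to
    // emptySlice(), at which point LazyOffset is already initialized.
    private static final class EmptyHolder {
        static final LazyOffset EMPTY = new LazyEmpty();
    }

    static LazyOffset emptySlice() {
        return EmptyHolder.EMPTY;
    }

    abstract int size();
}

final class LazyEmpty extends LazyOffset {
    @Override
    int size() {
        return 0;
    }
}
```

Class initialization still guarantees that the holder's field is assigned exactly once and published safely, so every caller of `emptySlice()` sees the same cached instance without explicit synchronization.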

Add a regression test that forces concurrent first-initialization of the offset classes through a dedicated class loader across repeated rounds and fails if it does not complete promptly.
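The core of such a test might look like the sketch below. It is an assumption about the approach, not the test added in this PR: `ConcurrentInitCheck` and `completesPromptly` are hypothetical names, and the caller is assumed to supply a fresh class loader per round so that each round sees uninitialized classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper; not the regression test from the PR.
final class ConcurrentInitCheck {
    static boolean completesPromptly(Supplier<ClassLoader> freshLoader,
            List<String> classNames, int rounds, long timeoutMs) {
        // Daemon threads: a thread stuck in class initialization cannot be
        // interrupted, so it must not keep the JVM alive after a failure.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "clinit-check");
            t.setDaemon(true);
            return t;
        });
        try {
            for (int round = 0; round < rounds; round++) {
                ClassLoader loader = freshLoader.get();
                CountDownLatch start = new CountDownLatch(1);
                List<Future<?>> pending = new ArrayList<>();
                for (String name : classNames) {
                    pending.add(pool.submit(() -> {
                        start.await(); // release all threads together
                        return Class.forName(name, true, loader); // force init
                    }));
                }
                start.countDown();
                for (Future<?> f : pending) {
                    try {
                        f.get(timeoutMs, TimeUnit.MILLISECONDS);
                    } catch (TimeoutException e) {
                        return false; // initialization did not finish in time
                    } catch (InterruptedException | ExecutionException e) {
                        throw new IllegalStateException(e);
                    }
                }
            }
            return true;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For the rounds to be meaningful, `freshLoader` would need to return a new loader each time that defines the classes itself rather than delegating to an already-initialized copy, for example a `URLClassLoader` over the test classpath with a parent that cannot see those classes.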

@codecov

codecov Bot commented Jun 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.42%. Comparing base (e4f0987) to head (a88db67).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2502      +/-   ##
============================================
- Coverage     71.47%   71.42%   -0.06%     
+ Complexity    48883    48848      -35     
============================================
  Files          1573     1573              
  Lines        189238   189239       +1     
  Branches      37128    37128              
============================================
- Hits         135261   135167      -94     
- Misses        43530    43609      +79     
- Partials      10447    10463      +16     

@Baunsgaard Baunsgaard merged commit 3871809 into apache:main Jun 23, 2026
50 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in SystemDS PR Queue Jun 23, 2026