Skip to content

fix(sbom): release lock before sleeping in _rate_limit#1896

Open
mesutoezdil wants to merge 5 commits into
NVIDIA:mainfrom
mesutoezdil:fix/sbom-rate-limit-lock
Open

fix(sbom): release lock before sleeping in _rate_limit#1896
mesutoezdil wants to merge 5 commits into
NVIDIA:mainfrom
mesutoezdil:fix/sbom-rate-limit-lock

Conversation

@mesutoezdil

@mesutoezdil mesutoezdil commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

_rate_limit called time.sleep(wait) inside the _rate_lock block.
While 1 thread slept (0.15s per call), all other threads blocked on the lock, even when querying different domains.

With _MAX_WORKERS = 12 running crates.io, npm, and pypi concurrently, the thread pool was effectively serial.

Move time.sleep(wait) outside the lock.
Update _last_request[domain] before sleeping so subsequent threads see the correct timestamp.

Measured concurrent calls to 2 independent domains before and after:

  • Before: ~0.207s (serial)
  • After: ~0.105s (concurrent)

@mesutoezdil mesutoezdil requested review from a team, derekwaynecarr and mrunalp as code owners June 13, 2026 10:00
@copy-pr-bot

copy-pr-bot Bot commented Jun 13, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

PR Review Status

Validation: This is project-valid small, concentrated SBOM tooling work. The goal of avoiding global lock contention across independent package registries is in scope for OpenShell release/SBOM maintenance.
Head SHA: cc4f1d5947335c961d5c4dad2e15614234647ddd

Review findings:

  • Blocking: deploy/sbom/resolve_licenses.py:214 records _last_request[domain] = now before sleeping. That releases the global lock, but it does not reserve a future slot for same-domain callers. If two same-domain threads enter a few milliseconds apart, they can both sleep roughly interval and then issue requests only a few milliseconds apart, weakening the per-domain rate limit. Please reserve the next allowed request time while holding the lock, for example by calculating next_request = max(now, last + interval), storing that value, and sleeping for next_request - now outside the lock. Using time.monotonic() would also be safer for interval timing than wall-clock time.time().
  • Blocking: Please add focused regression coverage for the rate limiter behavior: same-domain concurrent callers should be spaced by at least the configured interval, while different domains should not block each other on the sleep.

Docs: Not needed; this changes internal SBOM tooling behavior and does not alter user-facing CLI/API/TUI/docs behavior.

Checks: DCO and vouch are passing. Branch Checks and Helm Lint are still waiting for copy-pr validation, but I am not posting /ok to test until the blocking review feedback is addressed.

Next state: gator:in-review

@johntmyers johntmyers added the gator:in-review Gator is reviewing or awaiting PR review feedback label Jun 13, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Author Follow-Up Nudge

This PR has been in gator:in-review for more than 48 business hours with unresolved review feedback on head cc4f1d5947335c961d5c4dad2e15614234647ddd.

@mesutoezdil, please respond to the review comments or push an update. If this is no longer planned, please say so and a maintainer can close it out.

@mesutoezdil mesutoezdil requested a review from maxamillion as a code owner June 18, 2026 14:51
@mesutoezdil

mesutoezdil commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Fixed in e067839. Renamed to resolve_licenses_test.py, added test:sbom task to tasks/test.toml.

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 153abe065f6db4e4e7ac1424526f5fe63a995e4f after @mesutoezdil's 2026-06-18 14:51 UTC update saying the same-domain slot reservation and regression tests were added.

Disposition: partially resolved.

Remaining items:

  • Resolved: _rate_limit now uses time.monotonic() and reserves next_req = max(now, last + interval) while holding _rate_lock, which addresses the prior same-domain wake-together race.
  • Blocking: the new regression test is not wired into the normal test suite. pyproject.toml configures pytest discovery for *_test.py under python/, and tasks/test.toml runs uv run pytest python/. The added file is deploy/sbom/test_resolve_licenses.py, so CI and mise run test will not run it by default. Please either add a focused SBOM test task to tasks/test.toml and include it in test, or otherwise wire deploy/sbom tests into the standard suite. Renaming the file to match the repo's *_test.py convention may also be needed depending on the chosen task.

Docs: not needed; this is release/SBOM tooling behavior, not a direct user-facing UX change.

Next state: gator:in-review

time.sleep was called inside the _rate_lock block, blocking all threads
from checking their own domain rate limit while one thread slept.
With _MAX_WORKERS=12 querying crates.io, npm, and pypi concurrently,
this made the thread pool effectively serial.

Move the sleep outside the lock so threads for different domains
can proceed concurrently.
Use time.monotonic() and store next_req = max(now, last + interval) so
concurrent same-domain callers each get a unique future slot rather than
all sleeping the same amount. Add regression tests for same-domain
spacing and different-domain non-blocking.
Rename test_resolve_licenses.py to resolve_licenses_test.py to match
repo *_test.py convention. Add test:sbom task to tasks/test.toml and
include it in the top-level test depends so mise run test picks it up.
@mesutoezdil mesutoezdil force-pushed the fix/sbom-rate-limit-lock branch from 8eac131 to e067839 Compare June 18, 2026 15:06
@mesutoezdil

Copy link
Copy Markdown
Contributor Author

ready for re-check on e067839

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head e067839174a00278584c76e6e23a55947ff6b08c after @mesutoezdil's 2026-06-18 15:08 UTC comment requesting a re-check.

Disposition: partially resolved.

Remaining items:

  • Resolved: the prior test-suite wiring issue is addressed. tasks/test.toml now adds test:sbom to the top-level test dependency, and deploy/sbom/resolve_licenses_test.py follows the repo's *_test.py discovery convention for that task.
  • Blocking: deploy/sbom/resolve_licenses_test.py is missing the required SPDX header. Branch Checks run mise run license:check, which scans tracked .py files, so this new test file is expected to fail until it includes the standard two-line NVIDIA/Apache-2.0 header.
  • Blocking: test_different_domains_do_not_block_each_other does not currently exercise the original lock-held-sleep regression. Because both tested domains start without a prior timestamp, both calls have zero wait, so the old implementation would also pass. Please pre-seed one domain so it must sleep, start that sleeper, then assert an independent domain returns quickly while the first thread is sleeping.

Docs: not needed; this remains internal SBOM tooling behavior, not a direct user-facing UX change.

Checks: I am keeping this in review and not posting /ok to test until the blocking review feedback is addressed.

Next state: gator:in-review

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 7c210df2499e0cae639eb161290180eae9474f2f after @mesutoezdil's 2026-06-18 15:20 UTC comment requesting a re-check on 7c210df2.

Disposition: resolved.

Remaining items:

  • Resolved: the new test file now has the required SPDX header.
  • Resolved: test_different_domains_do_not_block_each_other now pre-seeds alpha so one thread must sleep while beta should proceed independently, which exercises the original lock-held-sleep regression.
  • No blocking review items remain. The independent reviewer noted a non-blocking robustness concern that the cross-domain test remains somewhat scheduler-dependent because ready.set() fires before the alpha thread is guaranteed to be inside _rate_limit; that can be improved later, but it does not block CI or gator progression.

Docs: not needed; this remains internal SBOM tooling behavior, not a direct user-facing UX change.

Checks: DCO is passing. Branch Checks and Helm Lint are waiting for copy-pr validation, and I have maintainer authority, so I will post /ok to test 7c210df2499e0cae639eb161290180eae9474f2f in a separate command-only comment.

Next state: gator:watch-pipeline

@johntmyers johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status and removed gator:in-review Gator is reviewing or awaiting PR review feedback labels Jun 18, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test 7c210df

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

PR Review Status

Validation: This remains project-valid small, concentrated SBOM tooling work.
Head SHA: 7c210df2499e0cae639eb161290180eae9474f2f

Review findings:

  • Blocking: Branch Checks failed in both Python jobs during mise run python:lint. ruff reports B905 and RUF007 at deploy/sbom/resolve_licenses_test.py:29 because the test uses zip(times, times[1:]) for successive pairs. Please replace that with itertools.pairwise(times) or another lint-clean equivalent.

Docs: not needed; this remains internal SBOM tooling behavior and does not alter user-facing CLI/API/TUI/docs behavior.

Checks: OpenShell / Branch Checks is failing; OpenShell / Helm Lint, OpenShell / E2E, and OpenShell / GPU E2E are green or intentionally skipped because no E2E labels are applied.

Next state: gator:in-review

@johntmyers johntmyers added gator:in-review Gator is reviewing or awaiting PR review feedback and removed gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 18, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 2511ce050ccfafa05c4cee1e34d1d7bc3fa2853e after @mesutoezdil's 2026-06-18 16:25 UTC comment requesting a re-check on 2511ce05.

Disposition: resolved.

Remaining items:

  • Resolved: deploy/sbom/resolve_licenses_test.py now uses itertools.pairwise(times), which addresses the ruff B905/RUF007 failure from zip(times, times[1:]).
  • Resolved: prior gator feedback remains addressed: _rate_limit reserves same-domain slots under _rate_lock, uses monotonic time, has focused SBOM regression tests, wires those tests into mise run test via test:sbom, and includes the required SPDX header.
  • No blocking review items remain. The independent reviewer noted only non-blocking cleanup/robustness suggestions for the test implementation.

Docs: not needed; this remains internal release/SBOM tooling behavior, not a direct user-facing CLI/API/TUI/docs change.

Checks: DCO is passing. Branch Checks and Helm Lint are waiting for copy-pr validation on the new head, and I have maintainer authority, so I will post /ok to test 2511ce050ccfafa05c4cee1e34d1d7bc3fa2853e in a separate command-only comment.

Next state: gator:watch-pipeline

@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test 2511ce0

@johntmyers johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status and removed gator:in-review Gator is reviewing or awaiting PR review feedback gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 18, 2026
@johntmyers johntmyers added the gator:approval-needed Gator completed review; maintainer approval needed label Jun 18, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Maintainer Approval Needed

Gator validation and PR monitoring are complete.

Validation: This is project-valid small, concentrated SBOM tooling work that removes unnecessary cross-domain lock contention in release/SBOM license resolution.
Review: Prior blocking findings are resolved. _rate_limit now reserves same-domain request slots under _rate_lock, uses monotonic timing, adds focused SBOM regression coverage, wires the SBOM tests into mise run test, and keeps the new test file lint/license clean.
Docs: Not needed; this changes internal release/SBOM tooling behavior and does not alter user-facing CLI/API/TUI/docs behavior.
Checks: DCO is passing. OpenShell / Branch Checks and OpenShell / Helm Lint are passing for head 2511ce050ccfafa05c4cee1e34d1d7bc3fa2853e.
E2E: N/A. No test:e2e, test:e2e-gpu, or Kubernetes E2E labels are required for this SBOM-only change; the required E2E status gates report those suites as not applied.

Human maintainer approval or merge decision is now required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gator:approval-needed Gator completed review; maintainer approval needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants