fix(scheduler): honor a pauseJob() that lands while a job is in flight#161
Open
truffle-dev wants to merge 1 commit into
Open
fix(scheduler): honor a pauseJob() that lands while a job is in flight#161truffle-dev wants to merge 1 commit into
truffle-dev wants to merge 1 commit into
Conversation
executeJob captured newStatus from the job snapshot taken at pickup, then wrote it back unconditionally after handleMessage returned. For an agent task that runs for minutes, a pauseJob() landing in that window set status='paused' in the row, but the write-back stomped it back to 'active' and the next fire ran anyway. Re-read the live status just before the write-back and honor an external pause. Only the 'active' continuation case is at risk; 'completed' and 'failed' are this run's terminal outcomes and must win. resumeJob() recomputes next_run_at, so clearing it here is safe. Fixes ghostwright#148
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
executeJobcapturesnewStatusfrom the in-memory job snapshot taken at pickup, then writes it back unconditionally afterhandleMessagereturns. Because an agent task can run for minutes, apauseJob()call landing during that window writesstatus='paused'to the row, but the write-back stomps it back to'active'and the next scheduled fire runs anyway. The pause is silently lost.This honors a concurrent pause by re-reading the live status just before the write-back.
Why this matters
pauseJob()(service.ts) doesUPDATE ... SET status='paused' WHERE id=? AND status='active'. The SQL succeeds, butexecuteJob's finalUPDATE scheduled_jobs SET ... status=?uses the stale captured value and overwrites it. An operator who pauses a long-running recurring job mid-execution sees it keep firing. There is no error, no log, nothing to debug against.The fix
Right before the write-back, re-read the row's current status. If an external writer set it to
'paused'during the run, honor it (and clearnext_run_at). The guard is scoped to the'active'continuation case only:'completed'and'failed'are this run's terminal outcomes and must win over a concurrent pause.resumeJob()recomputesnext_run_atfrom the schedule when the operator un-pauses, so clearing it here is safe.One extra single-row read per fire, only on the continuation path.
Test
Adds a regression test in
scheduler-hardening.test.ts. The mockedhandleMessageperforms the concurrent pause mid-run (UPDATE ... SET status='paused'), then the test asserts the row stays'paused'withnext_run_atnull afterexecuteJob.Mutation-proved: reverting the fix turns the test red (
Expected: "paused"), restoring it turns it green.bun test src/scheduler/-> 116 pass, 0 failbun run lint-> cleanbun run typecheck-> cleanFixes #148