Skip to content

Clean up multiprocessing pools on failure#208

Merged
acdha merged 1 commit into
LibraryOfCongress:masterfrom
artefactual-labs:dev/pool-fix
Jun 18, 2026
Merged

Clean up multiprocessing pools on failure#208
acdha merged 1 commit into
LibraryOfCongress:masterfrom
artefactual-labs:dev/pool-fix

Conversation

@sevein

@sevein sevein commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Ensure bagit-python reliably cleans up multiprocessing pools when parallel manifest generation or validation fails with processes > 1.

Successful make_bag() calls already cleaned up correctly through the normal graceful pool path. The gap was failures inside Pool.map(): worker exceptions skipped close() and join(), allowing BagIt child processes to remain alive after the caller received the error. In long-running services, those leftover children can later become orphaned or defunct when the owning worker exits.

This follows up on c451b24 ("Wait for validation Pool to finish"), the validation-pool cleanup work Douglas and I did, by extending the same reliability expectation to failure paths.

Ensure bagit-python reliably cleans up multiprocessing pools when parallel
manifest generation or validation fails with processes > 1.

Successful make_bag() calls already cleaned up correctly through the normal
graceful pool path. The gap was failures inside Pool.map(): worker
exceptions skipped close() and join(), allowing BagIt child processes to remain
alive after the caller received the error. In long-running services, those
leftover children can later become orphaned or defunct when the owning worker
exits.

This follows up on c451b24 ("Wait for validation Pool to finish"), the
validation-pool cleanup work Douglas and I did, by extending the same
reliability expectation to failure paths.
Comment thread src/bagit/__init__.py
else:
pool = multiprocessing.Pool(
processes if processes else None, initializer=worker_init
hash_results = _multiprocessing_pool_map(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether we might want to refactor this to avoid buffering everything in memory (i.e. using Pool.imap so it could yield from in the utility function but I don't think it's very common that people are using this in a way where they need to worry about total RAM or can meaningfully process the results before they're all ready.

@acdha acdha merged commit 4bd2713 into LibraryOfCongress:master Jun 18, 2026
10 checks passed
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants