source-archive: skip only complete captures on re-runs#298

Merged
probably-jaden merged 1 commit into main from feat/skip-complete-captures on Jun 29, 2026
@probably-jaden probably-jaden commented Jun 29, 2026

ContentStore.lookup previously returned a cache hit for any prior capture within the TTL, even a partial one (e.g. a failed screenshot encode that left screenshot_key=None). It now treats a capture as "done" only when it has every expected format: browser captures need html + markdown + screenshot, while PDFs (which have no screenshot) need only markdown. A re-run therefore retries the missing format instead of skipping it forever. Already-complete sites are still skipped.

Tests added for the incomplete-miss and PDF-complete cases.
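
The completeness rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Capture` dataclass, its `html_key`, `markdown_key`, and `is_pdf` fields, and the `is_complete` helper are all hypothetical names chosen for the example; only `screenshot_key` is named in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capture:
    # Hypothetical record shape; only screenshot_key appears in the PR text.
    html_key: Optional[str] = None
    markdown_key: Optional[str] = None
    screenshot_key: Optional[str] = None
    is_pdf: bool = False


def is_complete(capture: Capture) -> bool:
    """Return True only if the capture has every format expected for its kind."""
    if capture.is_pdf:
        # PDFs have no screenshot, so markdown alone counts as complete.
        return capture.markdown_key is not None
    # Browser captures need html, markdown, and screenshot.
    required = (capture.html_key, capture.markdown_key, capture.screenshot_key)
    return all(key is not None for key in required)
```

Under this sketch, a lookup would report a cache hit only when `is_complete` returns True for a capture still within the TTL; anything else is treated as a miss so the capture is retried.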

ContentStore.lookup now treats a capture as a cache hit only when it has every
expected format (html + markdown + screenshot; PDFs need only markdown). A
partial capture (e.g. a failed screenshot encode) becomes a miss, so a re-run
retries the missing format instead of skipping it forever.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@probably-jaden probably-jaden merged commit 0018956 into main Jun 29, 2026
2 checks passed
@probably-jaden probably-jaden deleted the feat/skip-complete-captures branch June 29, 2026 20:08