feat: Ability to re-include older episodes when max_age is increased by mm503 · Pull Request #872 · mxpv/podsync

mm503 · 2026-06-14T17:51:19Z

Hey @mxpv, I have a somewhat controversial PR for you. On several occasions I needed to raise a channel's max_age setting to reach further back in its history, and as you probably know, a design limitation makes that hard.

Increasing a feed's max_age now pulls in older episodes that were previously out of range, instead of silently ignoring them. This required decoupling discovery depth from page_size and making episode re-inclusion work off the database rather than the API window to save API credits.

Changes:

- Discovery depth is driven by max_age: a normal update still does a shallow scan (page_size most-recent episodes), but when max_age is expanded beyond what was previously scanned, the YouTube builder performs a one-time deep scan that pages back to the max_age cutoff. A per-feed ScannedThrough high-water mark gates this so it costs nothing in steady state, and it is bounded by a 20-page backstop. Brand-new feeds stay shallow.
- Previously cleaned (soft-deleted) episodes that match the filters again are re-queued for download off the database records directly, independent of the page_size API window. Only episodes that would survive the keep_last cleanup are re-queued, to avoid a re-download/re-clean loop.
- Cleanup no longer clears an episode's title/description when soft-deleting it, so a re-included episode keeps its metadata. For episodes cleaned by older versions (which wiped metadata), the title/description are recovered from the current feed query.
- The feed build skips a downloaded episode with an empty title instead of aborting the whole feed, so one malformed item can no longer take down the entire feed.
- A startup warning is logged when clean.keep_last exceeds page_size and no max_age is configured.
Currently implemented for YouTube.

It was a bit of a journey: I went from damaging some DB records on my side, to repairing them, to ending up with a behavior that doesn't burn through API credits unnecessarily.


Copilot

Pull request overview

Adds a mechanism to “catch up” and re-include older episodes when a feed’s max_age is increased, primarily for YouTube feeds, while trying to avoid steady-state API quota costs by persisting a per-feed discovery watermark.

Changes:

- Introduces ScannedThrough and DiscoverSince to support one-time (per expansion) max_age-driven deep discovery for YouTube, capped at 20 pages.
- Re-queues previously cleaned episodes based on DB records (with metadata preservation) and adds tests covering the new discovery/resurrection behavior.
- Makes XML feed generation resilient to malformed downloaded episodes with empty titles; adds startup warning for keep_last > page_size without max_age.
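
As a rough illustration of the "skip instead of abort" behavior in the last point, the sketch below filters out episodes with an empty title and keeps building the rest. The `episode` type and `buildItems` function are hypothetical and stand in for the real feed model and XML builder, whose signatures are not shown in this PR page; treating a whitespace-only title as empty is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// episode is a hypothetical, simplified stand-in for the stored episode record.
type episode struct {
	ID    string
	Title string
}

// buildItems returns the titles that would become feed items and the IDs of
// episodes that were skipped because their title is empty.
func buildItems(episodes []episode) (items []string, skipped []string) {
	for _, ep := range episodes {
		if strings.TrimSpace(ep.Title) == "" {
			// One malformed record is skipped rather than failing the whole feed.
			skipped = append(skipped, ep.ID)
			continue
		}
		items = append(items, ep.Title)
	}
	return items, skipped
}

func main() {
	items, skipped := buildItems([]episode{
		{ID: "a", Title: "Episode one"},
		{ID: "b", Title: ""},
		{ID: "c", Title: "Episode three"},
	})
	fmt.Println(len(items), "items built;", len(skipped), "skipped:", skipped)
}
```

Returning the skipped IDs lets the caller log them, so a malformed record stays visible even though it no longer blocks the feed.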

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.


File	Description
services/update/updater.go	Adds discovery watermarking, deep-scan windowing, resurrection logic, and preserves metadata on cleanup
services/update/updater_test.go	New unit tests for discoveryWindow/nextScannedThrough/resurrection/cleanup behavior
services/update/matcher.go	Splits duration/age matching into helper for cheaper pre-filtering
pkg/model/feed.go	Adds `ScannedThrough` field to feed model
pkg/feed/config.go	Adds runtime-only `DiscoverSince` hint for builders
pkg/builder/youtube.go	Implements deep scan paging to cutoff with quota backstop and keeps in-window episodes
pkg/feed/xml.go	Skips downloaded episodes with empty title instead of failing whole feed build
pkg/feed/xml_test.go	Adds test ensuring empty-title episodes don’t break XML build
cmd/podsync/config.go	Warns on `keep_last > page_size` when no `max_age` is set
cmd/podsync/config_test.go	Adds tests for the new startup warning behavior
CLAUDE.md	Updates repository documentation to describe the new discovery/resurrection behavior
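
The startup warning listed for cmd/podsync/config.go can be pictured with the sketch below. The `feedSettings` struct and `keepLastWarning` function are hypothetical and do not reflect the actual config types; the only thing taken from the PR is the condition itself (keep_last greater than page_size while no max_age is set). A zero value is assumed to mean "not configured".

```go
package main

import "fmt"

// feedSettings is a hypothetical subset of a feed's configuration.
type feedSettings struct {
	PageSize int // page_size
	MaxAge   int // max_age in days; 0 is assumed to mean "not configured"
	KeepLast int // clean.keep_last; 0 is assumed to mean "not configured"
}

// keepLastWarning returns a warning message and true when keep_last exceeds
// page_size and no max_age is configured.
func keepLastWarning(name string, f feedSettings) (string, bool) {
	if f.MaxAge == 0 && f.KeepLast > f.PageSize {
		msg := fmt.Sprintf(
			"feed %q: clean.keep_last (%d) exceeds page_size (%d) and no max_age is set",
			name, f.KeepLast, f.PageSize,
		)
		return msg, true
	}
	return "", false
}

func main() {
	if msg, warn := keepLastWarning("news", feedSettings{PageSize: 50, KeepLast: 100}); warn {
		fmt.Println("WARN:", msg)
	}
}
```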


mm503 · 2026-06-25T19:42:08Z

+	// Carry forward the discovery high-water mark so the deep scan only repeats while catching up.
+	result.ScannedThrough = nextScannedThrough(since, watermark, result, processed, &feedConfig.Filters)
+


Hi @mxpv, I accepted the proposed fix for the second issue, and commit d889025 fixes the first issue. I built and tested locally, and it seems to behave as expected.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

ScannedThrough was computed before resurrectEpisodes ran, counting cleaned episodes as processed. When resurrection re-queued more than page_size of them, the watermark advanced and the next shallow cycle pruned the still-pending records. Resurrection now reports the re-queued IDs and they are excluded from the processed set before the watermark is computed, so the next cycle stays deep until they finish downloading.
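
The core of this fix, removing the re-queued IDs from the processed set before the watermark is computed, can be sketched as a plain set difference. `excludeRequeued` is a hypothetical helper, and representing the processed set as a `map[string]struct{}` is an assumption; the real code passes `processed` into `nextScannedThrough`, whose internals are not shown here.

```go
package main

import "fmt"

// excludeRequeued returns a copy of processed without the IDs that
// resurrection just re-queued, so they do not count as "done" when the
// discovery watermark is computed. The input map is left untouched.
func excludeRequeued(processed map[string]struct{}, requeued []string) map[string]struct{} {
	out := make(map[string]struct{}, len(processed))
	for id := range processed {
		out[id] = struct{}{}
	}
	for _, id := range requeued {
		delete(out, id)
	}
	return out
}

func main() {
	processed := map[string]struct{}{"ep1": {}, "ep2": {}, "ep3": {}}
	remaining := excludeRequeued(processed, []string{"ep2", "ep3"})
	fmt.Println(len(remaining), "episode(s) still count as processed")
}
```

With the pending episodes excluded, the watermark calculation sees them as not yet handled, which matches the described outcome of the next cycle staying deep until they finish downloading.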

mm503 force-pushed the fix/support_expand_max_age branch 2 times, most recently from 6094c01 to f34a9b9, on June 14, 2026 18:01

mm503 force-pushed the fix/support_expand_max_age branch from f34a9b9 to 6d5c2e5 on June 14, 2026 18:06

mxpv requested a review from Copilot June 23, 2026 02:43

Copilot started reviewing on behalf of mxpv June 23, 2026 02:43

Copilot AI reviewed Jun 23, 2026

mm503 and others added 2 commits June 25, 2026 14:06

Potential fix for pull request finding

428f721

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

mm503 wants to merge 3 commits into mxpv:main from mm503:fix/support_expand_max_age
