
feat: Ability to re-include older episodes when max_age is increased #872

Open
mm503 wants to merge 3 commits into mxpv:main from mm503:fix/support_expand_max_age


mm503 commented Jun 14, 2026


Hey @mxpv, I have a somewhat controversial PR for you. On several occasions I needed to raise a channel's max_age setting to go further back in history, and, as you probably know, a design limitation makes that hard: older episodes that come back into range are silently ignored.

Increasing a feed's max_age now pulls in older episodes that were previously out of range, instead of silently ignoring them. This required decoupling discovery depth from page_size and making episode re-inclusion work off the database rather than the API window to save API credits.

Changes:

  • Discovery depth is driven by max_age: a normal update still does a shallow scan (page_size most-recent episodes), but when max_age is expanded beyond what was previously scanned, the YouTube builder performs a one-time deep scan that pages back to the max_age cutoff. A per-feed ScannedThrough high-water mark gates this so it costs nothing in steady state, and it is bounded by a 20-page backstop. Brand-new feeds stay shallow.
  • Previously cleaned (soft-deleted) episodes that match the filters again are re-queued for download off the database records directly, independent of the page_size API window. Only episodes that would survive the keep_last cleanup are re-queued, to avoid a re-download/re-clean loop.
  • Cleanup no longer clears an episode's title/description when soft-deleting it, so a re-included episode keeps its metadata. For episodes cleaned by older versions (which wiped metadata), the title/description are recovered from the current feed query.
  • The feed build skips a downloaded episode with an empty title instead of aborting the whole feed, so one malformed item can no longer take down the entire feed.
  • A startup warning is logged when clean.keep_last exceeds page_size and no max_age is configured.
  • Currently implemented for YouTube.

It was a bit of a journey: along the way I damaged some DB records on my side, repaired them, and ended up with behavior that doesn't burn through API credits unnecessarily.

mm503 force-pushed the fix/support_expand_max_age branch 2 times, most recently from 6094c01 to f34a9b9, June 14, 2026 18:01
mm503 force-pushed the fix/support_expand_max_age branch from f34a9b9 to 6d5c2e5, June 14, 2026 18:06
mxpv requested a review from Copilot June 23, 2026 02:43

Copilot AI left a comment


Pull request overview

Adds a mechanism to “catch up” and re-include older episodes when a feed’s max_age is increased, primarily for YouTube feeds, while trying to avoid steady-state API quota costs by persisting a per-feed discovery watermark.

Changes:

  • Introduces ScannedThrough and DiscoverSince to support one-time (per expansion) max_age-driven deep discovery for YouTube, capped at 20 pages.
  • Re-queues previously cleaned episodes based on DB records (with metadata preservation) and adds tests covering the new discovery/resurrection behavior.
  • Makes XML feed generation resilient to malformed downloaded episodes with empty titles; adds startup warning for keep_last > page_size without max_age.
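A minimal sketch of the last bullet's two behaviors follows. Everything here is hypothetical: `item`, `buildItems`, and `keepLastWarning` are illustrative names, the warning text is invented rather than podsync's actual log line, and a max_age of 0 is assumed to mean "not configured".

```go
package main

import (
	"fmt"
	"log"
)

// item is a hypothetical downloaded-episode record used only for this sketch.
type item struct {
	ID    string
	Title string
}

// buildItems returns the titles that make it into the feed. An episode with an
// empty title is logged and skipped instead of aborting the whole build.
func buildItems(downloaded []item) []string {
	var titles []string
	for _, ep := range downloaded {
		if ep.Title == "" {
			log.Printf("skipping episode %s: empty title", ep.ID)
			continue
		}
		titles = append(titles, ep.Title)
	}
	return titles
}

// keepLastWarning returns the startup warning text, or "" if none applies.
// maxAgeDays == 0 is assumed to mean "no max_age configured".
func keepLastWarning(feedID string, keepLast, pageSize, maxAgeDays int) string {
	if maxAgeDays == 0 && keepLast > pageSize {
		return fmt.Sprintf("feed %s: clean.keep_last (%d) exceeds page_size (%d) and no max_age is configured",
			feedID, keepLast, pageSize)
	}
	return ""
}
```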

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| services/update/updater.go | Adds discovery watermarking, deep-scan windowing, resurrection logic, and preserves metadata on cleanup |
| services/update/updater_test.go | New unit tests for discoveryWindow/nextScannedThrough/resurrection/cleanup behavior |
| services/update/matcher.go | Splits duration/age matching into helper for cheaper pre-filtering |
| pkg/model/feed.go | Adds ScannedThrough field to feed model |
| pkg/feed/config.go | Adds runtime-only DiscoverSince hint for builders |
| pkg/builder/youtube.go | Implements deep scan paging to cutoff with quota backstop and keeps in-window episodes |
| pkg/feed/xml.go | Skips downloaded episodes with empty title instead of failing whole feed build |
| pkg/feed/xml_test.go | Adds test ensuring empty-title episodes don’t break XML build |
| cmd/podsync/config.go | Warns on keep_last > page_size when no max_age is set |
| cmd/podsync/config_test.go | Adds tests for the new startup warning behavior |
| CLAUDE.md | Updates repository documentation to describe the new discovery/resurrection behavior |


Comment thread on services/update/updater.go (outdated)
Comment on lines +169 to +171
```go
// Carry forward the discovery high-water mark so the deep scan only repeats while catching up.
result.ScannedThrough = nextScannedThrough(since, watermark, result, processed, &feedConfig.Filters)
```

mm503 (Contributor, Author) replied:

Hi @mxpv, I accepted the proposed fix for the second issue, and commit d889025 fixes the first one. I built and tested locally, and it seems to behave as expected.

Comment thread on services/update/updater.go
mm503 and others added 2 commits June 25, 2026 14:06
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
ScannedThrough was computed before resurrectEpisodes ran, counting
cleaned episodes as processed. When resurrection re-queued more than
page_size of them, the watermark advanced and the next shallow cycle
pruned the still-pending records. Resurrection now reports the re-queued
IDs and they are excluded from the processed set before the watermark is
computed, so the next cycle stays deep until they finish downloading.
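The fix in this commit amounts to a set difference applied before the watermark is computed. A hedged sketch follows; `excludeRequeued` and `countPending` are hypothetical helpers, and the processed set is assumed to be keyed by episode ID.

```go
package main

// excludeRequeued returns a copy of the processed set with the IDs that
// resurrection just re-queued removed, so those cleaned episodes no longer
// count as processed. The input map is not modified.
func excludeRequeued(processed map[string]bool, requeued []string) map[string]bool {
	out := make(map[string]bool, len(processed))
	for id, ok := range processed {
		out[id] = ok
	}
	for _, id := range requeued {
		delete(out, id)
	}
	return out
}

// countPending returns how many in-window episode IDs are not yet processed.
func countPending(inWindow []string, processed map[string]bool) int {
	n := 0
	for _, id := range inWindow {
		if !processed[id] {
			n++
		}
	}
	return n
}
```

Before the fix, the re-queued episodes still sat in the processed set, so nothing looked pending and the watermark advanced. With them excluded, the pending count stays non-zero, which is what keeps the next cycle deep until those downloads finish.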
