@tw4l (Member) commented Nov 18, 2025

Fixes #2957

Full backend and frontend implementation, with a new email notification to org admins when a crawl is paused because an org quota has been reached.

Backend changes

  • Modify operator to auto-pause crawls when quotas are reached or archiving is disabled rather than stopping the crawls
  • Add new crawl states: paused_storage_quota_reached, paused_time_quota_reached, paused_org_readonly
  • Add uploaded WACZs to org storage totals immediately after upload so that auto-paused crawls will actually put the org's bytesStored above the storage quota
  • When a crawl is auto-paused, send an email (from a new template) to all org admins explaining what to do
  • Fix datetime deprecation in tests
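
To illustrate the new states, here is a minimal sketch of how a pause reason might be chosen and how a "paused" check can cover every paused state. The three state names come from this PR; the function names, argument names, the check order, the convention that a quota of 0 means "no quota", and the plain `paused` state are assumptions for illustration, not the operator's actual code.

```python
from typing import Optional

# State names taken from the PR description.
PAUSED_STORAGE_QUOTA = "paused_storage_quota_reached"
PAUSED_TIME_QUOTA = "paused_time_quota_reached"
PAUSED_ORG_READONLY = "paused_org_readonly"

AUTO_PAUSED_STATES = frozenset(
    {PAUSED_STORAGE_QUOTA, PAUSED_TIME_QUOTA, PAUSED_ORG_READONLY}
)

# Assumption: a plain "paused" state exists for user-initiated pauses.
ALL_PAUSED_STATES = AUTO_PAUSED_STATES | {"paused"}


def auto_pause_state(
    read_only: bool,
    bytes_stored: int,
    storage_quota: int,
    exec_seconds: int,
    time_quota: int,
) -> Optional[str]:
    """Return the auto-paused state to enter, or None to keep crawling.

    Hypothetical helper: the check order and "0 means no quota" are assumptions.
    """
    if read_only:
        return PAUSED_ORG_READONLY
    if storage_quota > 0 and bytes_stored >= storage_quota:
        return PAUSED_STORAGE_QUOTA
    if time_quota > 0 and exec_seconds >= time_quota:
        return PAUSED_TIME_QUOTA
    return None


def is_paused(state: str) -> bool:
    """True for a user pause or any of the auto-paused states."""
    return state in ALL_PAUSED_STATES
```

Keeping the paused states in one set means a caller checks membership rather than comparing against a single `paused` string, which is the kind of check the new states require.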

Frontend changes

  • Add new paused crawl states
  • Update checks throughout frontend for whether crawl is paused to compare against all paused states

Dependencies

Relies on crawler changes introduced in webrecorder/browsertrix-crawler#919

Out of scope

Crawl workflow counts are a bit off, counting all crawls that complete as successful regardless of state and sometimes incrementing workflow storage counts incorrectly. I started trying to address that in this branch but it's a bit involved and may require a migration so best handled separately, I think. Issue: #3011

@tw4l force-pushed the issue-2957-pause-crawl-on-quota-reached branch 2 times, most recently from 217e935 to db80a95, November 24, 2025 19:47
@tw4l force-pushed the issue-2957-pause-crawl-on-quota-reached branch from db80a95 to c15c1ec, November 24, 2025 19:47
@tw4l force-pushed the issue-2957-pause-crawl-on-quota-reached branch from 8298fda to 7a0a515, November 24, 2025 21:01
@tw4l force-pushed the issue-2957-pause-crawl-on-quota-reached branch from 3ae0565 to 6432f42, November 24, 2025 21:50
ikreymer and others added 2 commits November 25, 2025 11:11
#3013)

… pending, un-uploaded size

- use pending size to determine if quota reached
- also request pause to be set before assuming paused state
- also ensure data is actually committed before shutting down pods (in
case of any edge cases)
- clear paused flag in redis after crawler pods shutdown
- add OpCrawlStats to avoid adding unnecessary profile_update to public
API

this assumes crawler changes to support clearing the size after WACZ
upload and ensuring the upload happens if a pod starts while the crawl is paused

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
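
The commit notes above (use the pending size for the quota check, request a pause before assuming the paused state) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the function names, the returned step labels, the idea that pending bytes are added to the stored total, and "0 means no quota" are all hypothetical and do not reflect the operator's real interface.

```python
def storage_quota_reached(
    bytes_stored: int, pending_bytes: int, storage_quota: int
) -> bool:
    """Check the quota against stored bytes plus pending, un-uploaded bytes.

    Assumption: pending size is added to the stored total; 0 means no quota.
    """
    if storage_quota <= 0:
        return False
    return bytes_stored + pending_bytes >= storage_quota


def next_step(quota_reached: bool, pause_requested: bool, crawler_paused: bool) -> str:
    """Pick the next action (hypothetical labels).

    The paused state is only assumed once a pause has been requested
    and the crawler has confirmed it.
    """
    if not quota_reached:
        return "continue"
    if not pause_requested:
        return "request_pause"
    if not crawler_paused:
        return "wait"
    return "mark_paused"
```

For example, with 900 bytes stored, 200 bytes pending, and a 1000-byte quota, the check reports the quota as reached even though the stored total alone is still below it.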
Development

Successfully merging this pull request may close these issues.

[Feature]: When a quota is reached, the crawl should be paused instead of stopped.

3 participants