Joebecher/ccmrg-2042 cap bundle analysis processor at 10 attempts and fix #712
Merged
drazisil-codecov merged 7 commits into main on Feb 18, 2026
Conversation
…ry logic

- Cap total attempts at 10 (not 10 retries + 1) for BundleAnalysisProcessorTask and LockManager so we stop after 10 tries.
- Add a Redis-backed attempt counter in LockManager for lock contention, so broker re-deliveries with unchanged headers do not retry indefinitely.
- BaseCodecovTask._has_exceeded_max_attempts now takes max_attempts and compares it to attempts (retries + 1, or the header value).
- On a generic exception in the bundle processor, return and set the upload to error instead of re-raising, to avoid unbounded retries.
- Update tests: mock the request for _has_exceeded_max_attempts, set mock_redis.incr.return_value where LockManager compares attempts, and adjust the cleanup test to expect a return instead of a raised ValueError.

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
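For reference, a minimal sketch of the attempt-cap check this commit describes, assuming the Celery request exposes `retries` and an optional attempts header; everything beyond the method name `_has_exceeded_max_attempts` is illustrative, not the PR's actual code:

```python
# Hypothetical sketch of the semantics only, not the repo's implementation.
# In a real Celery task, self.request carries `retries`; the attempts header
# name below is an assumption.
class BaseCodecovTaskSketch:
    def _has_exceeded_max_attempts(self, max_attempts: int) -> bool:
        header_attempts = getattr(self.request, "attempts", None)
        # Attempts = retries so far + the current try, unless the broker
        # re-delivered the message with an explicit attempts header.
        attempts = (
            header_attempts
            if header_attempts is not None
            else self.request.retries + 1
        )
        return attempts >= max_attempts
```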
- LockManager: extract _clear_lock_attempt_counter to remove a nested try in locked()
- Upload finisher: log max_attempts as UPLOAD_PROCESSOR_MAX_RETRIES (not +1)
- lock_manager: comment the TTL intent instead of restating 24h
- Tests: remove the hard-coded (10) from comments; use max_attempts wording

Co-authored-by: Cursor <cursoragent@cursor.com>
- BaseCodecovTask: docs and safe_retry use max_retries; drop the max_attempts property
- LockRetry: max_attempts -> max_retries (same semantics: max total attempts)
- LockManager / bundle_analysis_processor / upload_finisher: logs and Sentry use max_retries
- Tests: LockRetry(max_retries=...); comments say max_retries
- celery_config: one-line convention (max_retries = max total attempts)
- Fix duplicate dict keys in lock_manager and upload_finisher

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
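Inferred from how these commits use it (LockRetry(max_retries=...), max_retries_exceeded, countdown), the exception's shape is roughly the following; the real class lives in apps/worker/services/lock_manager.py and may carry more fields:

```python
from dataclasses import dataclass

# Illustrative shape only, inferred from the commit messages in this PR.
@dataclass
class LockRetry(Exception):
    countdown: int                      # seconds the caller should wait before retrying
    max_retries: int | None = None      # maximum total attempts (the PR-wide convention)
    max_retries_exceeded: bool = False  # True when the Redis attempt counter hit the cap
```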
…ssor-at-10-attempts-and-fix

Resolve the lock_manager conflict: keep the attempt counter and max_retries logic; use self.base_retry_countdown (from main) for the countdown calculation.

Co-authored-by: Cursor <cursoragent@cursor.com>
…test

LockManager uses redis incr for the attempt count; the mock must return an int so attempts >= max_retries does not raise a TypeError.

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
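In practice the fixture fix amounts to giving the Redis mock an integer return value (fixture names here are illustrative):

```python
from unittest.mock import MagicMock

mock_redis = MagicMock()
# LockManager compares the result of redis.incr() against max_retries; a bare
# MagicMock does not support ordering comparisons against an int, so
# `attempts >= max_retries` would raise TypeError without this line.
mock_redis.incr.return_value = 1  # any int below the cap exercises the retry path
```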
When LockManager's Redis attempt counter hits max_retries before the task's own attempt count does (e.g. on broker re-deliveries), it raises LockRetry(max_retries_exceeded=True, countdown=0). ManualTriggerTask only checked self._has_exceeded_max_attempts(), so it fell through to self.retry(countdown=0) and caused rapid zero-delay retries. Align it with preprocess_upload and the other callers: check retry.max_retries_exceeded or self._has_exceeded_max_attempts() and return a failure dict when either is true. Add a test for the Redis-counter path.

Co-authored-by: Cursor <cursoragent@cursor.com>
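The aligned caller pattern, sketched under assumptions (the failure dict's keys and the surrounding task code are illustrative, not the repo's exact code):

```python
# Inside a task's run_impl; `lock_manager`, `lock_type`, and `max_retries`
# are assumed to be in scope.
try:
    with lock_manager.locked(lock_type, retry_num=self.request.retries):
        return self.process()  # hypothetical helper for the real work
except LockRetry as retry:
    if retry.max_retries_exceeded or self._has_exceeded_max_attempts(max_retries):
        # Either the Redis attempt counter or the task's own attempt count
        # hit the cap: return a failure result instead of retrying with
        # countdown=0, which previously caused rapid zero-delay retries.
        return {"successful": False, "reason": "too_many_retries"}  # illustrative keys
    self.retry(countdown=retry.countdown)
```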
- Return processing_results in the max-retries-exceeded path for consistent defensive isinstance behavior (bundle_analysis_processor)
- Consolidate redis exception imports; add a blank line between lock_name and attempt_key in LockManager.locked() (lock_manager)

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
apps/worker/services/lock_manager.py
Outdated
Comment on lines 179 to 181
| "max_attempts": max_attempts, | ||
| "max_retries": max_retries, | ||
| "repoid": self.repoid, |
Bug: The Redis INCR and EXPIRE commands for the lock attempt counter are not atomic. A failure between these calls can create a counter with no TTL, permanently blocking the lock.
Severity: HIGH
Suggested Fix
To ensure atomicity, use a Redis transaction (MULTI/EXEC) or a Lua script to perform the INCR and EXPIRE operations together. This guarantees that either both commands succeed or neither does, preventing the creation of a counter key without a TTL.
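A minimal sketch of the transactional variant with redis-py, under the key/TTL described elsewhere in this PR (lock_attempts:* with a roughly one-day TTL); a Lua script registered via register_script gives the same guarantee:

```python
import redis

ATTEMPT_TTL_SECONDS = 86_400  # ~1 day, mirroring the PR's lock_attempts TTL

def increment_attempt_counter(conn: redis.Redis, attempt_key: str) -> int:
    # transaction=True wraps the queued commands in MULTI/EXEC, so INCR and
    # EXPIRE either both execute on the server or neither does; a mid-flight
    # failure can no longer leave a counter key behind with no TTL.
    pipe = conn.pipeline(transaction=True)
    pipe.incr(attempt_key)
    pipe.expire(attempt_key, ATTEMPT_TTL_SECONDS)
    attempts, _ = pipe.execute()  # [new counter value, expire result]
    return attempts
```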
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: apps/worker/services/lock_manager.py#L179-L181

Potential issue: When lock acquisition fails, the code increments a Redis counter (`INCR`) and then sets a TTL on it (`EXPIRE`). These are two separate, non-atomic operations. If the `INCR` call succeeds but the `EXPIRE` call fails (e.g., due to a Redis timeout or connection error), the counter key will be created without a TTL. If that counter reaches the maximum retry limit, it will never be cleared, since clearing only happens on successful lock acquisition. This permanently blocks any future attempts to acquire the same lock, effectively poisoning it.
jason-ford-codecov approved these changes on Feb 18, 2026
bump ci
Note
Medium Risk
Changes core retry/lock-acquisition control flow and error handling for multiple Celery tasks, which can alter when tasks stop retrying and how failures are acknowledged. While well-covered by updated tests, misconfiguration or edge cases could prematurely stop processing or change chain outcomes.
Overview

Aligns retry semantics across workers so max_retries consistently means maximum total attempts (stop when attempts >= max_retries), updating BaseCodecovTask._has_exceeded_max_attempts / safe_retry and the related config comments.

Enhances LockManager to track lock-acquisition attempts in Redis (lock_attempts:* with a 1-day TTL), raise LockRetry(max_retries_exceeded=True) when the counter hits the cap, and clear the counter after a successful lock; tasks (bundle_analysis_processor, manual_trigger, preprocess_upload, upload_finisher, and the notification/finisher tasks) now honor this cap to avoid infinite retry loops on lock contention or message re-delivery.

Refactors the bundle_analysis_processor error paths to log consistently and to set the upload state to error (best-effort commit) while returning previous_result instead of re-raising in several failure cases, preserving Celery chain behavior; updates unit/integration tests accordingly (notably mocking redis.incr and asserting the new attempt-based boundaries/log fields).

Written by Cursor Bugbot for commit 1fe32c5. This will update automatically on new commits.
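Putting the overview's LockManager pieces together, the flow might look roughly like this condensed, hypothetical sketch; lock naming, timeouts, and the backoff formula are assumptions, and the INCR/EXPIRE pair here is the non-atomic pattern flagged in the review comment above:

```python
from contextlib import contextmanager

from redis.exceptions import LockError

ATTEMPT_TTL_SECONDS = 86_400  # TTL so a stale counter cannot block a lock forever

@contextmanager
def locked(self, lock_name: str, retry_num: int = 0, max_retries: int | None = None):
    attempt_key = f"lock_attempts:{lock_name}"
    try:
        with self.redis_connection.lock(lock_name, timeout=300, blocking_timeout=5):
            # Lock acquired: reset the contention counter for this lock.
            self._clear_lock_attempt_counter(attempt_key)
            yield
    except LockError:
        attempts = self.redis_connection.incr(attempt_key)
        self.redis_connection.expire(attempt_key, ATTEMPT_TTL_SECONDS)
        if max_retries is not None and attempts >= max_retries:
            # Counter hit the cap (e.g. broker re-deliveries with unchanged
            # headers): tell the caller to stop retrying rather than loop.
            raise LockRetry(countdown=0, max_retries_exceeded=True)
        # Otherwise back off; the PR derives the delay from self.base_retry_countdown.
        raise LockRetry(countdown=self.base_retry_countdown * (retry_num + 1))
```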