Joebecher/ccmrg 2042 cap bundle analysis processor at 10 attempts and fix #712

Merged
drazisil-codecov merged 7 commits into main from
joebecher/ccmrg-2042-cap-bundle-analysis-processor-at-10-attempts-and-fix
Feb 18, 2026
drazisil-codecov (Contributor) commented Feb 18, 2026

bump ci


Note

Medium Risk
Changes core retry/lock-acquisition control flow and error handling for multiple Celery tasks, which can alter when tasks stop retrying and how failures are acknowledged. While well-covered by updated tests, misconfiguration or edge cases could prematurely stop processing or change chain outcomes.

Overview
Aligns retry semantics across workers so max_retries consistently means maximum total attempts (stop when attempts >= max_retries), updating BaseCodecovTask._has_exceeded_max_attempts/safe_retry and related config comments.
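A minimal sketch of the attempt-based boundary described above. The function name and signature are hypothetical stand-ins, not the actual `BaseCodecovTask._has_exceeded_max_attempts` implementation; it assumes the attempt count is derived from Celery's `request.retries`, which is 0 on the first run.

```python
from typing import Optional


def has_exceeded_max_attempts(retries: int, max_retries: Optional[int]) -> bool:
    """Return True once the task has used up its attempt budget.

    Hypothetical helper: `retries` mirrors Celery's request.retries
    (0 on the first run), so the current attempt number is retries + 1.
    """
    if max_retries is None:  # no cap configured
        return False
    attempts = retries + 1
    return attempts >= max_retries


# With max_retries=10, the 10th attempt (retries == 9) is the last one.
ninth_attempt_stops = has_exceeded_max_attempts(retries=8, max_retries=10)
tenth_attempt_stops = has_exceeded_max_attempts(retries=9, max_retries=10)
```

Under this convention a cap of 10 allows exactly 10 tries, rather than 10 retries plus the initial run.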

Enhances LockManager to track lock-acquisition attempts in Redis (lock_attempts:* keys with a 1-day TTL), raise LockRetry(max_retries_exceeded=True) when the counter hits the cap, and clear the counter after a successful lock. Tasks (bundle_analysis_processor, manual_trigger, preprocess_upload, upload_finisher, and the notification/finisher tasks) now honor this cap to avoid infinite retry loops on lock contention or message re-delivery.
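The counter flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `lock_manager.py`: `FakeRedis`, `record_failed_acquire`, `clear_attempts`, the lock name, and the 30-second default countdown are all hypothetical, and the in-memory class only imitates the `incr`, `expire`, and `delete` commands so the example runs without a Redis server.

```python
class LockRetry(Exception):
    """Illustrative stand-in for the worker's LockRetry exception."""

    def __init__(self, countdown, max_retries_exceeded=False):
        super().__init__(f"retry in {countdown}s")
        self.countdown = countdown
        self.max_retries_exceeded = max_retries_exceeded


class FakeRedis:
    """Tiny in-memory substitute exposing only incr, expire and delete."""

    def __init__(self):
        self.values = {}
        self.ttls = {}

    def incr(self, key):
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        self.ttls[key] = seconds
        return True

    def delete(self, key):
        self.ttls.pop(key, None)
        return 1 if self.values.pop(key, None) is not None else 0


ATTEMPT_TTL_SECONDS = 24 * 60 * 60  # 1-day TTL, as in the PR summary


def record_failed_acquire(redis, lock_name, max_retries, countdown=30):
    """Count one failed lock acquisition and raise the matching LockRetry."""
    attempt_key = f"lock_attempts:{lock_name}"
    attempts = redis.incr(attempt_key)
    redis.expire(attempt_key, ATTEMPT_TTL_SECONDS)
    if attempts >= max_retries:
        # Cap reached: tell the caller to stop instead of retrying.
        raise LockRetry(countdown=0, max_retries_exceeded=True)
    raise LockRetry(countdown=countdown)


def clear_attempts(redis, lock_name):
    """Reset the counter after the lock was acquired successfully."""
    redis.delete(f"lock_attempts:{lock_name}")


redis = FakeRedis()
outcomes = []
for _ in range(3):
    try:
        record_failed_acquire(redis, "example_lock", max_retries=3)
    except LockRetry as retry:
        outcomes.append(retry.max_retries_exceeded)
```

Because the counter lives in Redis rather than in the task's headers, it keeps advancing even when the broker re-delivers a message with an unchanged retry count.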

Refactors bundle_analysis_processor error paths to log consistently and, in several failure cases, to set the upload state to error (best-effort commit) and return previous_result instead of re-raising, which preserves Celery chain behavior. Updates unit/integration tests accordingly (notably mocking redis.incr and asserting the new attempt-based boundaries/log fields).
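A rough sketch of the "set error state, then return previous_result" pattern, assuming a generic exception in the processing step. `FakeUpload`, `FakeSession`, `process_upload`, and the log messages are hypothetical names for illustration only; the real task's models and return shape are not shown in this thread.

```python
import logging

log = logging.getLogger(__name__)


class FakeUpload:
    """Hypothetical upload row with only a state field."""

    def __init__(self):
        self.state = "started"


class FakeSession:
    """Hypothetical DB session whose commit can be made to fail."""

    def __init__(self, fail_commit=False):
        self.fail_commit = fail_commit
        self.commits = 0

    def commit(self):
        if self.fail_commit:
            raise RuntimeError("database unavailable")
        self.commits += 1


def process_upload(previous_result, upload, db_session, do_work):
    """Run do_work(); on failure mark the upload as errored and return
    previous_result so the next task in the chain still receives its input."""
    try:
        do_work()
    except Exception:
        log.exception("Unable to process bundle analysis upload")
        upload.state = "error"
        try:
            db_session.commit()  # best effort: a failed commit is logged, not raised
        except Exception:
            log.exception("Could not persist upload error state")
        return previous_result
    upload.state = "processed"
    db_session.commit()
    return previous_result


def failing_work():
    raise ValueError("corrupt bundle")


upload = FakeUpload()
result = process_upload(
    [{"prior": "ok"}], upload, FakeSession(fail_commit=True), failing_work
)
```

Returning instead of re-raising means the failure does not trigger another retry, which is the unbounded-retry case the PR is trying to close.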

Written by Cursor Bugbot for commit 1fe32c5.

drazisil-codecov and others added 7 commits February 6, 2026 13:04
…ry logic

Cap total attempts at 10 (not 10 retries + 1) for BundleAnalysisProcessorTask
and LockManager so we stop after 10 tries. Add Redis-backed attempt counter
in LockManager for lock contention so broker re-deliveries with unchanged
headers do not retry indefinitely. BaseCodecovTask._has_exceeded_max_attempts
now takes max_attempts and compares to attempts (retries + 1 or header).
On generic exception in bundle processor, return and set upload to error
instead of re-raising to avoid unbounded retries. Update tests: mock request
for _has_exceeded_max_attempts, set mock_redis.incr.return_value where
LockManager compares attempts, and adjust cleanup test to expect return
instead of raised ValueError.

Refs CCMRG-2042

Co-authored-by: Cursor <cursoragent@cursor.com>

- LockManager: extract _clear_lock_attempt_counter to remove nested try in locked()
- Upload finisher: log max_attempts as UPLOAD_PROCESSOR_MAX_RETRIES (not +1)
- Lock_manager: comment TTL intent instead of restating 24h
- Tests: remove hard-coded (10) from comments; use max_attempts wording

Co-authored-by: Cursor <cursoragent@cursor.com>

- BaseCodecovTask: doc and safe_retry use max_retries; drop max_attempts property
- LockRetry: max_attempts -> max_retries (same semantics: max total attempts)
- LockManager/bundle_analysis_processor/upload_finisher: log and Sentry use max_retries
- Tests: LockRetry(max_retries=...), comments say max_retries
- celery_config: one-line convention (max_retries = max total attempts)
- Fix duplicate dict keys in lock_manager and upload_finisher

Refs CCMRG-2042

Co-authored-by: Cursor <cursoragent@cursor.com>

…ssor-at-10-attempts-and-fix

Resolve lock_manager conflict: keep attempt counter and max_retries logic,
use self.base_retry_countdown (from main) for countdown calculation.

Co-authored-by: Cursor <cursoragent@cursor.com>

…test

LockManager uses redis incr for attempt count; mock must return an int
so attempts >= max_retries does not raise TypeError.

Refs CCMRG-2042

Co-authored-by: Cursor <cursoragent@cursor.com>
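
The TypeError mentioned in the commit above can be reproduced with a plain `unittest.mock.MagicMock`. This is a standalone illustration, not the project's test; the key name and the cap of 10 are placeholders.

```python
from unittest.mock import MagicMock

max_retries = 10

# An unconfigured mock returns another MagicMock from incr(), and a
# MagicMock cannot be ordered against an int.
mock_redis = MagicMock()
try:
    mock_redis.incr("lock_attempts:example") >= max_retries
    comparison_failed = False
except TypeError:
    comparison_failed = True

# Giving incr an int return value makes the comparison well-defined.
mock_redis.incr.return_value = 1
attempts = mock_redis.incr("lock_attempts:example")
under_cap = not (attempts >= max_retries)
```

Setting `mock_redis.incr.return_value` to a value at or above the cap is also a convenient way to drive a test down the max-retries-exceeded path.
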
When LockManager's Redis attempt counter hits max_retries before the task's
attempt count (e.g. re-deliveries), it raises LockRetry(max_retries_exceeded=True,
countdown=0). ManualTriggerTask only checked self._has_exceeded_max_attempts(),
so it fell through to self.retry(countdown=0) and caused rapid zero-delay retries.

Align with preprocess_upload and other callers: check retry.max_retries_exceeded
or self._has_exceeded_max_attempts() and return failure dict when either is true.
Add test for the Redis-counter path.

Co-authored-by: Cursor <cursoragent@cursor.com>
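
A sketch of the combined check described in the commit above. `decide_after_lock_retry` and the shape of the failure dict are hypothetical; only the condition (either the LockManager counter or the task's own attempt count has hit the cap) comes from the commit message.

```python
class LockRetry(Exception):
    """Illustrative stand-in for the worker's LockRetry exception."""

    def __init__(self, countdown, max_retries_exceeded=False):
        super().__init__(f"retry in {countdown}s")
        self.countdown = countdown
        self.max_retries_exceeded = max_retries_exceeded


def decide_after_lock_retry(retry, task_attempts_exceeded):
    """Choose between failing for good and scheduling another retry.

    Checking only task_attempts_exceeded would miss the case where the
    Redis counter hit the cap first; that case carries countdown=0 and
    would otherwise be retried immediately.
    """
    if retry.max_retries_exceeded or task_attempts_exceeded:
        # Hypothetical failure payload, for illustration only.
        return {"action": "fail", "reason": "max_retries_exceeded"}
    return {"action": "retry", "countdown": retry.countdown}


redis_cap_hit = decide_after_lock_retry(
    LockRetry(countdown=0, max_retries_exceeded=True), task_attempts_exceeded=False
)
still_retrying = decide_after_lock_retry(
    LockRetry(countdown=30), task_attempts_exceeded=False
)
```
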
- Return processing_results in max-retries-exceeded path for consistent
  defensive isinstance behavior (bundle_analysis_processor)
- Consolidate redis exception imports; add blank line between lock_name
  and attempt_key in LockManager.locked() (lock_manager)

Refs CCMRG-2042

Co-authored-by: Cursor <cursoragent@cursor.com>
linear bot commented Feb 18, 2026

Comment on lines -179 to 181
"max_attempts": max_attempts,
"max_retries": max_retries,
"repoid": self.repoid,

Bug: The Redis INCR and EXPIRE commands for the lock attempt counter are not atomic. A failure between these calls can create a counter with no TTL, permanently blocking the lock.
Severity: HIGH

Suggested Fix

To ensure atomicity, use a Redis transaction (MULTI/EXEC) or a Lua script to perform the INCR and EXPIRE operations together. This guarantees that either both commands succeed or neither does, preventing the creation of a counter key without a TTL.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: apps/worker/services/lock_manager.py#L179-L181

Potential issue: When lock acquisition fails, the code increments a Redis counter
(`INCR`) and then sets a TTL on it (`EXPIRE`). These are two separate, non-atomic
operations. If the `INCR` call succeeds but the `EXPIRE` call fails (e.g., due to a
Redis timeout or connection error), the counter key will be created without a TTL. If
this counter reaches the maximum retry limit, it will never be cleared, as clearing only
happens on successful lock acquisition. This will permanently block any future attempts
to acquire the same lock, effectively poisoning it.
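
One way the suggested MULTI/EXEC fix could look, sketched with the redis-py pipeline API. `incr_with_ttl` is a hypothetical helper, and the thread does not say whether the merged code adopted this. The `FakeRedis` and `_FakePipeline` classes are in-memory stand-ins so the example runs without a server; with a real `redis.Redis` client, `pipeline(transaction=True)` queues the commands and sends them wrapped in MULTI/EXEC when `execute()` is called.

```python
def incr_with_ttl(client, key, ttl_seconds):
    """Increment `key` and set its TTL inside one transaction.

    `client` is expected to behave like redis.Redis from redis-py.
    Both commands are queued and applied together on execute(), so a
    connection failure before EXEC leaves neither applied.
    """
    pipe = client.pipeline(transaction=True)
    pipe.incr(key)
    pipe.expire(key, ttl_seconds)
    attempts, _ttl_was_set = pipe.execute()
    return attempts


class _FakePipeline:
    """In-memory stand-in that applies queued commands on execute()."""

    def __init__(self, store):
        self._store = store
        self._queued = []

    def incr(self, key):
        self._queued.append(("incr", key, None))
        return self

    def expire(self, key, seconds):
        self._queued.append(("expire", key, seconds))
        return self

    def execute(self):
        results = []
        for op, key, arg in self._queued:
            if op == "incr":
                self._store.values[key] = self._store.values.get(key, 0) + 1
                results.append(self._store.values[key])
            else:
                self._store.ttls[key] = arg
                results.append(True)
        self._queued = []
        return results


class FakeRedis:
    """Minimal client exposing only pipeline(), for demonstration."""

    def __init__(self):
        self.values = {}
        self.ttls = {}

    def pipeline(self, transaction=True):
        return _FakePipeline(self)


client = FakeRedis()
first = incr_with_ttl(client, "lock_attempts:example", 86400)
second = incr_with_ttl(client, "lock_attempts:example", 86400)
```

A Lua script run through `EVAL` would give the same all-or-nothing behavior; the pipeline form is shown here only because it needs no script management.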


codspeed-hq bot commented Feb 18, 2026

Merging this PR will not alter performance

✅ 9 untouched benchmarks


Comparing joebecher/ccmrg-2042-cap-bundle-analysis-processor-at-10-attempts-and-fix (1fe32c5) with main (afa4356) [1]

Footnotes

  1. No successful run was found on main (9b95e6b) during the generation of this report, so afa4356 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

drazisil-codecov added this pull request to the merge queue Feb 18, 2026
Merged via the queue into main with commit fe32d2d Feb 18, 2026
102 of 103 checks passed
drazisil-codecov deleted the joebecher/ccmrg-2042-cap-bundle-analysis-processor-at-10-attempts-and-fix branch February 18, 2026 17:47