fix: resolve deployment status blocking without clear reason (#26)

## Overview
Review and fix deployment protection rule processing so that deployments no longer remain blocked in the "waiting" state without a clear reason. The current implementation has several failure modes that leave deployments in an indeterminate state.

## Problem Statement
- Deployments remain blocked even when rules pass
- No clear error messages when deployment approval fails
- Silent failures in callback URL processing
- Missing timeout handling for agent execution
- No retry logic for failed GitHub API calls
- Exception handling doesn't ensure deployment status is set

## Root Causes Identified

### Error Handling Gaps
- Exceptions in the `process()` method return `success=False` but never call the callback URL, so no decision is posted
- Failed `review_deployment_protection_rule` calls don't retry or provide fallback
- Missing validation of callback_url and environment before API calls
- No timeout mechanism for agent execution, which can hang indefinitely
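
A minimal sketch of an "always post a decision" pattern that would close the first gap, assuming an async processor. `process_with_guaranteed_decision`, `evaluate`, and `post_decision` are hypothetical names, not the existing `process()` API; `fail_open` stands in for whichever policy (approve with an error message, or reject) the team settles on.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("deployment_protection_rule")

# Hypothetical callback: posts (state, comment) to the deployment callback URL.
PostDecision = Callable[[str, str], Awaitable[None]]


async def process_with_guaranteed_decision(
    evaluate: Callable[[], Awaitable[tuple[bool, str]]],
    post_decision: PostDecision,
    fail_open: bool = False,
) -> str:
    """Run rule evaluation and always post exactly one decision.

    Returns the state that was posted: "approved" or "rejected".
    """
    try:
        passed, reason = await evaluate()
        state = "approved" if passed else "rejected"
    except Exception as exc:
        # Evaluation crashed: log it, then still decide instead of leaving "waiting".
        logger.exception("Rule evaluation failed")
        state = "approved" if fail_open else "rejected"
        policy = "approving per fail-open policy" if fail_open else "rejecting to fail safe"
        reason = f"Rule evaluation failed ({type(exc).__name__}: {exc}); {policy}"
    await post_decision(state, reason)
    return state
```

If `post_decision` itself raises, nothing is posted, so in practice that call would also need the retry logic listed under API Call Reliability.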

### Deployment Scheduler Issues
- Deployments blocked by time-based rule violations may not be re-evaluated and approved once the time condition is satisfied
- Missing error handling in re-evaluation logic
- No logging that explains why a deployment remains blocked
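
A sketch of re-evaluation that isolates per-deployment failures and logs why each deployment stays blocked. It assumes time-based rules are expressed as a UTC deployment window; `PendingDeployment`, `in_window`, and `reevaluate_pending` are hypothetical and not taken from `deployment_scheduler.py`.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, time, timezone
from typing import Callable

logger = logging.getLogger("deployment_scheduler")


@dataclass
class PendingDeployment:
    # Hypothetical record of a deployment held back by a time-based rule.
    deployment_id: int
    environment: str
    window_start: time  # window bounds are UTC; callers must pass a UTC `now`
    window_end: time


def in_window(now: datetime, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); handles windows that wrap midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def reevaluate_pending(
    pending: list[PendingDeployment],
    approve: Callable[[PendingDeployment, str], None],
    now: datetime,
) -> list[PendingDeployment]:
    """Approve deployments whose window is open; return those still blocked."""
    still_blocked = []
    for dep in pending:
        try:
            if in_window(now, dep.window_start, dep.window_end):
                approve(dep, f"Deployment window {dep.window_start}-{dep.window_end} UTC is open")
                continue
            logger.info(
                "Deployment %s (%s) still blocked: %s UTC is outside window %s-%s",
                dep.deployment_id, dep.environment, now.time(),
                dep.window_start, dep.window_end,
            )
        except Exception:
            # One failing item must not stop re-evaluation of the others.
            logger.exception("Re-evaluation failed for deployment %s", dep.deployment_id)
        still_blocked.append(dep)
    return still_blocked
```

A deployment whose approval call fails stays in the returned list, so the next scheduler tick retries it instead of dropping it silently.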

### API Call Reliability
- No retry logic for transient GitHub API failures
- Missing validation of API response before proceeding
- No fallback mechanism when callback URL is invalid
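
Exponential backoff with full jitter, plus a response check, sketched with plain `asyncio` because the signature of the project's `retry_with_backoff` is not shown here. The set of transient status codes is an assumption; GitHub can also signal rate limiting with HTTP 403, so whether to retry those is a deliberate choice. `TransientAPIError`, `retry_async`, and `check_review_response` are hypothetical names.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Assumed retryable statuses; tune to the API actually being called.
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


class TransientAPIError(Exception):
    """Raised for failures worth retrying."""

    def __init__(self, status: int):
        super().__init__(f"transient HTTP {status}")
        self.status = status


def check_review_response(status: int) -> None:
    """Validate an API response status before proceeding."""
    if status in TRANSIENT_STATUSES:
        raise TransientAPIError(status)
    if not 200 <= status < 300:
        raise RuntimeError(f"review call failed with non-retryable HTTP {status}")


async def retry_async(
    call: Callable[[], Awaitable[T]],
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Retry `call` on TransientAPIError with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return await call()
        except TransientAPIError:
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))
    # Final attempt: any error, transient or not, propagates to the caller.
    return await call()
```

A request function would pass its HTTP status to `check_review_response` before returning, so non-2xx responses never pass silently and only the transient ones are retried; non-retryable errors surface immediately.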

## Requirements

### Error Handling Improvements
- Always call the callback URL, even on exceptions (approve with an error message or reject, as appropriate)
- Add timeout wrapper for agent execution (max 30 seconds)
- Validate callback_url and environment before making API calls
- Implement retry logic for GitHub API calls with exponential backoff
- Add comprehensive error logging with deployment context
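
The 30-second limit can be enforced with `asyncio.wait_for`, which cancels the awaited task when the timeout expires. This assumes agent execution is a coroutine; the existing `execute_with_timeout` helper may wrap the same idea, but its signature is not shown, so `run_agent_with_timeout` is a hypothetical stand-in.

```python
import asyncio
from typing import Awaitable

AGENT_TIMEOUT_SECONDS = 30.0  # limit from the requirement above


async def run_agent_with_timeout(
    agent_call: Awaitable[tuple[bool, str]],
    timeout: float = AGENT_TIMEOUT_SECONDS,
) -> tuple[bool, str]:
    """Return (passed, reason); a hang becomes an explicit, explained failure."""
    try:
        # wait_for cancels the agent task if it exceeds `timeout`.
        return await asyncio.wait_for(agent_call, timeout=timeout)
    except asyncio.TimeoutError:
        return False, f"Rule evaluation timed out after {timeout:g}s"
```

Turning the timeout into a `(False, reason)` result means the caller still reaches the code path that posts a decision, with an explicit reason instead of a hung request.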

### Deployment Status Guarantees
- Ensure deployment status is always set (approved or rejected)
- Never leave deployment in "waiting" state without action
- Add fallback approval mechanism for critical failures
- Log all deployment status changes with reasoning
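
One way to make the "never left waiting" guarantee checkable is to track each deployment's state and require a reason on every transition. `DeploymentStatusTracker` is a hypothetical in-memory sketch; with several workers handling webhooks, the records would need shared storage.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Optional

logger = logging.getLogger("deployment_status")

FINAL_STATES = {"approved", "rejected"}


@dataclass
class DeploymentRecord:
    deployment_id: int
    environment: str
    state: str = "waiting"
    reason: str = ""
    received_at: float = field(default_factory=time.monotonic)


class DeploymentStatusTracker:
    """Hypothetical tracker: every status change is logged with its reason."""

    def __init__(self) -> None:
        self._records: dict[int, DeploymentRecord] = {}

    def received(self, deployment_id: int, environment: str) -> None:
        self._records[deployment_id] = DeploymentRecord(deployment_id, environment)
        logger.info("deployment %s (%s): waiting", deployment_id, environment)

    def resolve(self, deployment_id: int, state: str, reason: str) -> None:
        if state not in FINAL_STATES:
            raise ValueError(f"unexpected final state: {state!r}")
        if not reason:
            raise ValueError("a reason is required for every status change")
        rec = self._records[deployment_id]
        logger.info("deployment %s (%s): %s -> %s: %s",
                    deployment_id, rec.environment, rec.state, state, reason)
        rec.state, rec.reason = state, reason

    def stale_waiting(self, max_age_seconds: float, now: Optional[float] = None) -> list[int]:
        """IDs still 'waiting' for longer than max_age_seconds."""
        now = time.monotonic() if now is None else now
        return [r.deployment_id for r in self._records.values()
                if r.state == "waiting" and now - r.received_at > max_age_seconds]
```

A periodic job could call `stale_waiting` and hand anything it returns to the fallback mechanism.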

### Monitoring and Debugging
- Add structured logging for deployment processing lifecycle
- Include deployment_id, environment, and callback_url in all log messages
- Track deployment processing time and identify bottlenecks
- Add metrics for deployment approval/rejection rates

### Code Changes
- Update `src/event_processors/deployment_protection_rule.py` error handling
- Add timeout handling in agent execution path
- Implement retry logic in `_approve_deployment` and `_reject_deployment`
- Review and fix `src/tasks/scheduler/deployment_scheduler.py` re-evaluation logic
- Add validation utilities for deployment callback URLs
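
A possible validation utility, assuming callbacks target github.com, where the deployment callback URL points at the `/repos/{owner}/{repo}/actions/runs/{run_id}/deployment_protection_rule` endpoint. `ALLOWED_HOSTS`, `CALLBACK_PATH`, and `validate_callback` are hypothetical; GitHub Enterprise Server installations use a different host and an `/api/v3` path prefix, so both values would need to be configurable there.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed github.com values; make these configurable for GitHub Enterprise Server.
ALLOWED_HOSTS = {"api.github.com"}
CALLBACK_PATH = re.compile(
    r"^/repos/[^/]+/[^/]+/actions/runs/\d+/deployment_protection_rule$"
)


def validate_callback(callback_url: Optional[str], environment: Optional[str]) -> list[str]:
    """Return every problem found; an empty list means the API call may proceed."""
    problems = []
    if not environment:
        problems.append("missing environment")
    if not callback_url:
        problems.append("missing callback_url")
        return problems
    parsed = urlparse(callback_url)
    if parsed.scheme != "https":
        problems.append(f"callback_url must use https, got {parsed.scheme!r}")
    if parsed.hostname not in ALLOWED_HOSTS:
        problems.append(f"unexpected callback host {parsed.hostname!r}")
    if not CALLBACK_PATH.match(parsed.path):
        problems.append(f"unexpected callback path {parsed.path!r}")
    return problems
```

Returning all problems rather than the first lets the log message state exactly why a deployment could not be approved or rejected.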

## Implementation Notes
- Use existing `execute_with_timeout` utility from `src/core/utils/timeout.py`
- Implement retry logic using `retry_with_backoff` from `src/core/utils/retry.py`
- Add deployment status tracking for observability
- Ensure backward compatibility with existing deployment flows
- Add unit tests for error scenarios and timeout handling
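
The error-scenario tests might take this shape, using `unittest.IsolatedAsyncioTestCase` and `AsyncMock` from the standard library. `handle_deployment` is a deliberately simplified, hypothetical stand-in for the processor so the tests are self-contained; real tests would exercise the actual code paths in `deployment_protection_rule.py`. Saved as, for example, `test_deployment_errors.py`, the module runs under `python -m unittest`.

```python
import asyncio
import unittest
from unittest.mock import AsyncMock


async def handle_deployment(agent, approve, reject, timeout: float = 30.0) -> None:
    """Hypothetical stand-in: run the agent, then always approve or reject."""
    try:
        passed, reason = await asyncio.wait_for(agent(), timeout=timeout)
    except asyncio.TimeoutError:
        passed, reason = False, f"agent timed out after {timeout}s"
    except Exception as exc:
        passed, reason = False, f"agent failed: {exc}"
    await (approve if passed else reject)(reason)


class HandleDeploymentTests(unittest.IsolatedAsyncioTestCase):
    async def test_timeout_rejects_with_reason(self):
        async def hang():
            await asyncio.sleep(10)

        approve, reject = AsyncMock(), AsyncMock()
        await handle_deployment(hang, approve, reject, timeout=0.05)
        approve.assert_not_awaited()
        reject.assert_awaited_once()
        self.assertIn("timed out", reject.await_args.args[0])

    async def test_exception_still_posts_decision(self):
        agent = AsyncMock(side_effect=RuntimeError("boom"))
        approve, reject = AsyncMock(), AsyncMock()
        await handle_deployment(agent, approve, reject)
        reject.assert_awaited_once_with("agent failed: boom")

    async def test_passing_rules_approve(self):
        agent = AsyncMock(return_value=(True, "rules passed"))
        approve, reject = AsyncMock(), AsyncMock()
        await handle_deployment(agent, approve, reject)
        approve.assert_awaited_once_with("rules passed")
        reject.assert_not_awaited()
```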

## Acceptance Criteria
- Deployments never remain in "waiting" state without action
- All deployment status changes are logged with clear reasoning
- Failed API calls are retried with exponential backoff
- Agent execution has timeout protection
- Error messages clearly indicate why deployment was blocked or approved
- Comprehensive test coverage for error scenarios
- Deployment scheduler properly handles re-evaluation edge cases
