
[ENHANCEMENT] Add retry logic with exponential backoff for LLM API calls #190

@harikapadia999

Description

Problem

LLM API calls can fail due to:

  • Rate limiting (429 errors)
  • Temporary network issues
  • Service unavailability (503 errors)
  • Timeout errors

Currently, these failures cause immediate workflow termination without retry attempts.

Proposed Solution

Implement retry logic with exponential backoff for transient failures:

# flo_ai/llm/retry.py
import logging
import random
import time
from functools import wraps
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar('T')

# Transient error types to retry on. RateLimitError (HTTP 429) and
# ServiceUnavailableError (HTTP 503) are placeholders here; in the real
# integration they would be aliased to the SDK-specific exception types.
# TimeoutError is the Python builtin.
class RateLimitError(Exception):
    """Placeholder for the SDK's 429 rate-limit exception."""

class ServiceUnavailableError(Exception):
    """Placeholder for the SDK's 503 service-unavailable exception."""

class RetryConfig:
    def __init__(
        self,
        max_retries: int = 3,
        initial_delay: float = 1.0,
        max_delay: float = 60.0,
        exponential_base: float = 2.0,
        jitter: bool = True,
    ):
        self.max_retries = max_retries
        self.initial_delay = initial_delay
        self.max_delay = max_delay
        self.exponential_base = exponential_base
        self.jitter = jitter

def with_retry(config: Optional[RetryConfig] = None):
    """Decorator for retrying LLM API calls with exponential backoff."""
    if config is None:
        config = RetryConfig()

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> T:
            last_exception = None

            # max_retries + 1 total attempts: the initial call plus retries
            for attempt in range(config.max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except (RateLimitError, TimeoutError, ServiceUnavailableError) as e:
                    last_exception = e

                    if attempt == config.max_retries:
                        raise

                    # Exponential backoff, capped at max_delay
                    delay = min(
                        config.initial_delay * (config.exponential_base ** attempt),
                        config.max_delay,
                    )

                    # Add jitter to prevent a thundering herd of
                    # simultaneous retries
                    if config.jitter:
                        delay *= 0.5 + random.random() * 0.5

                    logger.warning(
                        f'Attempt {attempt + 1}/{config.max_retries + 1} failed: {e}. '
                        f'Retrying in {delay:.2f}s...'
                    )
                    time.sleep(delay)

            # Unreachable in practice (the final attempt re-raises), kept as
            # a safeguard
            raise last_exception

        return wrapper
    return decorator
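
With the defaults above, the un-jittered delays grow as 1s, 2s, 4s before the final attempt re-raises. A quick sanity check of the backoff arithmetic, using the RetryConfig defaults from the sketch:

# Expected un-jittered delays for RetryConfig() defaults
config = RetryConfig()
delays = [
    min(config.initial_delay * config.exponential_base ** attempt, config.max_delay)
    for attempt in range(config.max_retries)
]
print(delays)  # [1.0, 2.0, 4.0]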

Usage

# In LLM client
class OpenAIClient:
    @with_retry(RetryConfig(max_retries=3, initial_delay=1.0))
    def generate(self, prompt: str) -> str:
        return self.client.chat.completions.create(...)
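
Many of the target SDKs are also used through async APIs. A minimal async counterpart, reusing the RetryConfig, exception types, and random import from retry.py above (with_retry_async is a hypothetical name, not existing flo_ai API; logging omitted for brevity):

import asyncio
from functools import wraps
from typing import Optional

def with_retry_async(config: Optional[RetryConfig] = None):
    """Async counterpart of with_retry, using asyncio.sleep."""
    if config is None:
        config = RetryConfig()

    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(config.max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except (RateLimitError, TimeoutError, ServiceUnavailableError):
                    if attempt == config.max_retries:
                        raise
                    delay = min(
                        config.initial_delay * config.exponential_base ** attempt,
                        config.max_delay,
                    )
                    if config.jitter:
                        delay *= 0.5 + random.random() * 0.5
                    # Non-blocking sleep so other tasks keep running
                    await asyncio.sleep(delay)
        return wrapper
    return decorator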

YAML Configuration

agents:
  - name: "my_agent"
    model: "gpt-4"
    retry:
      enabled: true
      max_retries: 3
      initial_delay: 1.0
      max_delay: 60.0
      exponential_base: 2.0
      jitter: true
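
To wire the YAML block into the decorator, the agent loader could translate the retry mapping into a RetryConfig. A minimal sketch, assuming the agent spec has already been parsed into a dict (retry_config_from_spec is a hypothetical helper, not existing flo_ai API):

from typing import Any, Dict, Optional

def retry_config_from_spec(agent_spec: Dict[str, Any]) -> Optional[RetryConfig]:
    """Build a RetryConfig from the agent's parsed YAML; None disables retries."""
    retry = agent_spec.get('retry') or {}
    if not retry.get('enabled', False):
        return None
    return RetryConfig(
        max_retries=retry.get('max_retries', 3),
        initial_delay=retry.get('initial_delay', 1.0),
        max_delay=retry.get('max_delay', 60.0),
        exponential_base=retry.get('exponential_base', 2.0),
        jitter=retry.get('jitter', True),
    )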

Benefits

  1. ✅ Improved reliability for production workflows
  2. ✅ Automatic recovery from transient failures
  3. ✅ Configurable per-agent
  4. ✅ Prevents cascading failures
  5. ✅ Better user experience (no manual retries)

Implementation Checklist

  • Create retry decorator with exponential backoff
  • Add retry configuration to agent schema
  • Integrate with all LLM clients (OpenAI, Anthropic, Gemini)
  • Add retry metrics/logging
  • Update documentation
  • Add tests for retry logic (a pytest sketch follows below)
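
For the testing item, a minimal pytest sketch of the retry-then-succeed path, with jitter off and zero delay so the test does not sleep:

def test_retries_then_succeeds():
    calls = {'count': 0}

    @with_retry(RetryConfig(max_retries=3, initial_delay=0.0, jitter=False))
    def flaky() -> str:
        calls['count'] += 1
        if calls['count'] < 3:
            raise TimeoutError('transient')  # builtin, in the retryable tuple
        return 'ok'

    # Two failures, then success on the third attempt
    assert flaky() == 'ok'
    assert calls['count'] == 3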

Related Issues

Metadata

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)
