
[ENHANCEMENT] Add retry logic with exponential backoff for LLM API calls #190

@harikapadia999

Description

Problem

LLM API calls can fail due to:

  • Rate limiting (429 errors)
  • Temporary network issues
  • Service unavailability (503 errors)
  • Timeout errors

Currently, these failures cause immediate workflow termination without retry attempts.

Proposed Solution

Implement retry logic with exponential backoff for transient failures:

# flo_ai/llm/retry.py
import logging
import random
import time
from functools import wraps
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar('T')

# Transient error types to retry on. RateLimitError (HTTP 429) and
# ServiceUnavailableError (HTTP 503) are placeholders here; in the real
# integration they would be aliased to the SDK-specific exception types.
# TimeoutError is the Python builtin.
class RateLimitError(Exception):
    """Placeholder for the SDK's 429 rate-limit exception."""

class ServiceUnavailableError(Exception):
    """Placeholder for the SDK's 503 service-unavailable exception."""

class RetryConfig:
    def __init__(
        self,
        max_retries: int = 3,
        initial_delay: float = 1.0,
        max_delay: float = 60.0,
        exponential_base: float = 2.0,
        jitter: bool = True,
    ):
        self.max_retries = max_retries
        self.initial_delay = initial_delay
        self.max_delay = max_delay
        self.exponential_base = exponential_base
        self.jitter = jitter

def with_retry(config: Optional[RetryConfig] = None):
    """Decorator for retrying LLM API calls with exponential backoff."""
    if config is None:
        config = RetryConfig()

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> T:
            last_exception = None

            # max_retries + 1 total attempts: the initial call plus retries
            for attempt in range(config.max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except (RateLimitError, TimeoutError, ServiceUnavailableError) as e:
                    last_exception = e

                    if attempt == config.max_retries:
                        raise

                    # Exponential backoff, capped at max_delay
                    delay = min(
                        config.initial_delay * (config.exponential_base ** attempt),
                        config.max_delay,
                    )

                    # Add jitter to prevent a thundering herd of
                    # simultaneous retries
                    if config.jitter:
                        delay *= 0.5 + random.random() * 0.5

                    logger.warning(
                        f'Attempt {attempt + 1}/{config.max_retries + 1} failed: {e}. '
                        f'Retrying in {delay:.2f}s...'
                    )
                    time.sleep(delay)

            # Unreachable in practice (the final attempt re-raises), kept as
            # a safeguard
            raise last_exception

        return wrapper
    return decorator
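
With the defaults above, the un-jittered delays grow as 1s, 2s, 4s before the final attempt re-raises. A quick sanity check of the backoff arithmetic, using the RetryConfig defaults from the sketch:

# Expected un-jittered delays for RetryConfig() defaults
config = RetryConfig()
delays = [
    min(config.initial_delay * config.exponential_base ** attempt, config.max_delay)
    for attempt in range(config.max_retries)
]
print(delays)  # [1.0, 2.0, 4.0]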

Usage

# In LLM client
class OpenAIClient:
    @with_retry(RetryConfig(max_retries=3, initial_delay=1.0))
    def generate(self, prompt: str) -> str:
        return self.client.chat.completions.create(...)
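
Many of the target SDKs are also used through async APIs. A minimal async counterpart, reusing the RetryConfig, exception types, and random import from retry.py above (with_retry_async is a hypothetical name, not existing flo_ai API; logging omitted for brevity):

import asyncio
from functools import wraps
from typing import Optional

def with_retry_async(config: Optional[RetryConfig] = None):
    """Async counterpart of with_retry, using asyncio.sleep."""
    if config is None:
        config = RetryConfig()

    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(config.max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except (RateLimitError, TimeoutError, ServiceUnavailableError):
                    if attempt == config.max_retries:
                        raise
                    delay = min(
                        config.initial_delay * config.exponential_base ** attempt,
                        config.max_delay,
                    )
                    if config.jitter:
                        delay *= 0.5 + random.random() * 0.5
                    # Non-blocking sleep so other tasks keep running
                    await asyncio.sleep(delay)
        return wrapper
    return decorator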

YAML Configuration

agents:
  - name: "my_agent"
    model: "gpt-4"
    retry:
      enabled: true
      max_retries: 3
      initial_delay: 1.0
      max_delay: 60.0
      exponential_base: 2.0
      jitter: true
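
To wire the YAML block into the decorator, the agent loader could translate the retry mapping into a RetryConfig. A minimal sketch, assuming the agent spec has already been parsed into a dict (retry_config_from_spec is a hypothetical helper, not existing flo_ai API):

from typing import Any, Dict, Optional

def retry_config_from_spec(agent_spec: Dict[str, Any]) -> Optional[RetryConfig]:
    """Build a RetryConfig from the agent's parsed YAML; None disables retries."""
    retry = agent_spec.get('retry') or {}
    if not retry.get('enabled', False):
        return None
    return RetryConfig(
        max_retries=retry.get('max_retries', 3),
        initial_delay=retry.get('initial_delay', 1.0),
        max_delay=retry.get('max_delay', 60.0),
        exponential_base=retry.get('exponential_base', 2.0),
        jitter=retry.get('jitter', True),
    )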

Benefits

  1. ✅ Improved reliability for production workflows
  2. ✅ Automatic recovery from transient failures
  3. ✅ Configurable per-agent
  4. ✅ Prevents cascading failures
  5. ✅ Better user experience (no manual retries)

Implementation Checklist

  • Create retry decorator with exponential backoff
  • Add retry configuration to agent schema
  • Integrate with all LLM clients (OpenAI, Anthropic, Gemini)
  • Add retry metrics/logging
  • Update documentation
  • Add tests for retry logic (a pytest sketch follows below)
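
For the testing item, a minimal pytest sketch of the retry-then-succeed path, with jitter off and zero delay so the test does not sleep:

def test_retries_then_succeeds():
    calls = {'count': 0}

    @with_retry(RetryConfig(max_retries=3, initial_delay=0.0, jitter=False))
    def flaky() -> str:
        calls['count'] += 1
        if calls['count'] < 3:
            raise TimeoutError('transient')  # builtin, in the retryable tuple
        return 'ok'

    # Two failures, then success on the third attempt
    assert flaky() == 'ok'
    assert calls['count'] == 3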

Related Issues

Metadata

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)
