Feature Request: Add async support for Generative AI Inference client #836

@fede-kamel

Feature Request

Description

Add native async/await support for the OCI Generative AI Inference client to enable non-blocking concurrent requests in async applications.

Problem Statement

The current SDK uses synchronous HTTP requests via the requests library. This causes issues in async applications:

  1. Event loop blocking: Sync calls block the event loop in FastAPI, async agents, and other async frameworks
  2. Limited concurrency: Cannot efficiently make concurrent API calls
  3. Performance bottleneck: Sequential requests are significantly slower than concurrent alternatives
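To illustrate the first point: today, an async application has to push each blocking SDK call onto a worker thread to keep the event loop responsive. A minimal sketch of that workaround (sync_chat is a stand-in for a real blocking call such as client.chat; the 0.2 s sleep simulates network latency):

```python
import asyncio
import time

def sync_chat(prompt: str) -> str:
    # Stand-in for a blocking SDK call such as client.chat();
    # time.sleep simulates ~0.2 s of network latency.
    time.sleep(0.2)
    return f"response to {prompt}"

async def main() -> list[str]:
    start = time.perf_counter()
    # asyncio.to_thread moves each blocking call onto a worker thread,
    # so the event loop stays free and the three calls overlap.
    results = await asyncio.gather(
        asyncio.to_thread(sync_chat, "a"),
        asyncio.to_thread(sync_chat, "b"),
        asyncio.to_thread(sync_chat, "c"),
    )
    elapsed = time.perf_counter() - start
    print(f"3 calls in {elapsed:.2f}s")  # overlapped, not 3 x 0.2 s sequential
    return results

results = asyncio.run(main())
```

This works, but it burns a thread per in-flight request and still serializes nothing at the HTTP layer; a native async client avoids both costs.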

Proposed Solution

Add an AsyncGenerativeAiInferenceClient class that:

  • Uses aiohttp for true async HTTP requests
  • Reuses the existing OCI Signer for authentication
  • Provides async versions of all GenAI operations (chat, streaming, embeddings, etc.)
  • Supports async context manager pattern
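One possible shape for such a class (everything beyond the proposed class name is illustrative; the transport is injectable here so the sketch runs without aiohttp or a live endpoint, whereas the real client would create an aiohttp.ClientSession in __aenter__ and sign each request with the existing OCI Signer):

```python
import asyncio
from typing import Any, Awaitable, Callable

class AsyncGenerativeAiInferenceClient:
    """Sketch of the proposed client. The real implementation would sign
    requests with oci.signer.Signer and send them over aiohttp; here the
    transport is a pluggable coroutine so the shape can be exercised offline."""

    def __init__(self, config: dict,
                 transport: Callable[[str, dict], Awaitable[dict]]):
        self.config = config
        self._transport = transport  # stands in for a signed aiohttp request

    async def __aenter__(self):
        # Real version: open the aiohttp.ClientSession here.
        return self

    async def __aexit__(self, *exc):
        # Real version: close the session.
        return None

    async def chat(self, details: dict) -> dict:
        # Real version: serialize `details`, sign, POST to the chat endpoint.
        return await self._transport("/actions/chat", details)

async def fake_transport(path: str, body: dict) -> dict:
    await asyncio.sleep(0)  # yield once, like a real network round trip would
    return {"path": path, "echo": body}

async def demo() -> dict:
    async with AsyncGenerativeAiInferenceClient({}, fake_transport) as client:
        return await client.chat({"prompt": "hi"})

out = asyncio.run(demo())
print(out)
```

The injectable transport is only a device for this sketch; the point is the async context manager lifecycle and per-operation coroutines mirroring the existing sync surface.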

Example Usage

import asyncio

import oci
from oci.generative_ai_inference import AsyncGenerativeAiInferenceClient

config = oci.config.from_file()  # standard OCI config (~/.oci/config)

async def main():
    async with AsyncGenerativeAiInferenceClient(config) as client:
        # details1..details3 are ChatDetails payloads built beforehand.
        # Concurrent requests - 2-3.5x faster than sequential (see table below)
        results = await asyncio.gather(
            client.chat(details1),
            client.chat(details2),
            client.chat(details3),
        )

asyncio.run(main())

Performance Impact

Testing shows 2-3.5x throughput improvement for concurrent workloads:

Scenario                 Sequential   Concurrent   Speedup
3 requests (Llama 3.3)   1.30 s       0.64 s       2.01x
3 requests (Llama 3.2)   1.40 s       0.44 s       3.18x
3 requests (Cohere)      0.50 s       0.14 s       3.54x
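The shape of these numbers follows directly from overlapping I/O waits: sequential latencies add up, concurrent ones overlap. A self-contained sketch of the measurement pattern (fake_chat simulates one model round trip with asyncio.sleep; the real benchmark would call the async client):

```python
import asyncio
import time

async def fake_chat(latency: float = 0.1) -> str:
    # Stand-in for one model round trip; a real run would await client.chat().
    await asyncio.sleep(latency)
    return "ok"

async def benchmark(n: int = 3) -> tuple:
    start = time.perf_counter()
    for _ in range(n):                       # sequential: latencies add up
        await fake_chat()
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    await asyncio.gather(*(fake_chat() for _ in range(n)))  # concurrent: overlap
    concurrent = time.perf_counter() - start
    return sequential, concurrent

seq, conc = asyncio.run(benchmark())
print(f"sequential {seq:.2f}s, concurrent {conc:.2f}s, speedup {seq / conc:.1f}x")
```

With uniform latencies the ideal speedup approaches n; the measured 2-3.5x reflects real per-model variance and request overhead.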

Use Cases

  1. FastAPI/async web frameworks: Non-blocking GenAI calls in async endpoints
  2. LangChain agents: Concurrent tool calls and chain execution
  3. Batch processing: Parallel processing of multiple prompts
  4. Real-time applications: Low-latency streaming responses
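For the batch-processing case, an async client composes naturally with a semaphore to cap in-flight requests. A hedged sketch (process_prompts and fake_chat are illustrative names; `chat` would be e.g. client.chat on the proposed client):

```python
import asyncio

async def process_prompts(prompts, chat, max_concurrency: int = 4):
    """Run `chat` over all prompts with at most `max_concurrency` in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with sem:          # blocks only this task, not the event loop
            return await chat(prompt)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(one(p) for p in prompts))

async def fake_chat(prompt: str) -> str:
    await asyncio.sleep(0.01)    # stand-in for a model round trip
    return prompt.upper()

batch = asyncio.run(process_prompts(["a", "b", "c"], fake_chat))
print(batch)
```

The concurrency cap matters in practice: service rate limits make unbounded gather over large prompt batches a bad idea.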

Implementation

A reference implementation is provided in PR #835 with:

  • Full async client implementation
  • 15 unit tests
  • 7 integration tests
  • Tested on Python 3.9, 3.12, 3.13, 3.14
  • Tested with 6 different models
