feat(plugins): add fallback plugin to automatically fall back between Gemini models #88

@benmizrahi

Description

🔴 Required Information

Is your feature request related to a specific problem?

When using ADK with models that can return transient HTTP errors such as
rate-limit (429) or gateway timeout (504), there is no built-in mechanism
to:

  1. Guarantee that every new request always starts with the intended primary
    model (non-persistent fallback).
  2. Detect those error responses and record structured metadata about the fallback
    event so that the caller or model layer can take remedial action.

Manually wiring this logic in every agent callback is repetitive, error-prone,
and leaks infrastructure concerns into agent business logic.

Describe the Solution You'd Like

A new prebuilt plugin — FallbackPlugin — that:

  1. Resets the model to root_model before every LLM request via
    before_model_callback, so that any fallback state from a previous failed
    turn does not carry over ("non-persistent fallback").

  2. Detects retriable errors in the response via after_model_callback:
    when LlmResponse.error_code matches one of the configured error_status
    codes ([429, 504] by default) and a fallback_model is configured, it
    writes the following keys to LlmResponse.custom_metadata:

    | Key | Type | Description |
    | --- | --- | --- |
    | `fallback_triggered` | `bool` | Always `True`. |
    | `original_model` | `str` | The `root_model` value. |
    | `fallback_model` | `str` | The backup model identifier. |
    | `fallback_attempt` | `int` | Cumulative attempt count for this context. |
    | `error_code` | `str` | String representation of the error code. |
  3. Tracks per-context attempt counts in an internal dictionary, pruned
    automatically to avoid unbounded memory growth.
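
The per-context attempt counter in item 3 could be bounded like the following self-contained sketch. The `AttemptTracker` class, the cap of 1000 contexts, and the least-recently-updated eviction policy are all illustrative assumptions, not details from the proposal:

```python
# Hedged sketch of a bounded per-context attempt counter (item 3 above).
# The class name, cap, and eviction policy are assumptions for illustration.
from collections import OrderedDict


class AttemptTracker:
    """Counts fallback attempts per invocation context, bounded in size."""

    def __init__(self, max_contexts: int = 1000) -> None:
        self._counts: OrderedDict[str, int] = OrderedDict()
        self._max_contexts = max_contexts

    def increment(self, context_id: str) -> int:
        """Record one more attempt for this context and return its total."""
        # pop + reinsert moves the context to the most-recent end.
        self._counts[context_id] = self._counts.pop(context_id, 0) + 1
        # Prune the least-recently-updated contexts to bound memory.
        while len(self._counts) > self._max_contexts:
            self._counts.popitem(last=False)
        return self._counts[context_id]


tracker = AttemptTracker(max_contexts=2)
tracker.increment("ctx-a")  # first attempt for ctx-a
tracker.increment("ctx-b")
tracker.increment("ctx-c")  # exceeds the cap, evicting ctx-a
```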

The plugin deliberately does not re-issue the request itself; the actual
retry is delegated to the underlying model (e.g. LiteLlm's fallbacks
parameter), keeping the plugin focused on a single responsibility.
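
Given that metadata schema, a caller can inspect a response for a fallback event. A minimal sketch, using a plain dict to stand in for `LlmResponse.custom_metadata` (the key names come from the table above; the values and the `describe_fallback` helper are illustrative):

```python
# Plain dict standing in for LlmResponse.custom_metadata after a 429;
# key names follow the table above, values are illustrative.
custom_metadata = {
    "fallback_triggered": True,
    "original_model": "gemini-2.0-flash",
    "fallback_model": "gemini-1.5-pro",
    "fallback_attempt": 1,
    "error_code": "429",
}


def describe_fallback(metadata: dict) -> str:
    """Summarize a fallback event recorded in response metadata, if any."""
    if not metadata.get("fallback_triggered"):
        return "no fallback"
    return (
        f"attempt {metadata['fallback_attempt']}: "
        f"{metadata['original_model']} -> {metadata['fallback_model']} "
        f"(error {metadata['error_code']})"
    )


print(describe_fallback(custom_metadata))
```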

API surface:

```python
from google.adk.plugins.fallback_plugin import FallbackPlugin

plugin = FallbackPlugin(
    root_model="gemini-2.0-flash",    # Primary model, always tried first.
    fallback_model="gemini-1.5-pro",  # Backup model recorded in metadata.
    error_status=[429, 504],          # Default retriable HTTP codes.
)

app = App(
    agent=root_agent,
    plugins=[plugin],
)
```

Impact on your work

Without this plugin, every team using multi-model setups with LiteLlm
fallbacks has to reimplement the same before_model_callback /
after_model_callback boilerplate. This plugin consolidates that pattern
into a single, tested component with sensible defaults that works out of
the box.

This is important for production deployments where models can hit rate limits
and graceful degradation is required.

Willingness to contribute

Yes — implementation, unit tests, sample agent, and README are all
included in the accompanying PR.


🟡 Recommended Information

Describe Alternatives You've Considered

  • Agent-level callbacks: The same logic can be placed in
    before_model_callback / after_model_callback on a single LlmAgent.
    This does not scale — every agent that needs fallback has to duplicate the
    code, and the reset-to-root-model guarantee is easy to forget.
  • LiteLlm fallbacks alone: LiteLlm's native retry handles the actual
    re-request, but provides no per-turn reset guarantee and no structured
    metadata on the LlmResponse. The FallbackPlugin is designed to
    complement, not replace, LiteLlm's built-in mechanism.

Proposed API / Implementation

```python
# src/google/adk/plugins/fallback_plugin.py

class FallbackPlugin(BasePlugin):
    def __init__(
        self,
        name: str = "fallback_plugin",
        root_model: Optional[str] = None,
        fallback_model: Optional[str] = None,
        # None avoids a mutable default argument; treated as [429, 504].
        error_status: Optional[list[int]] = None,
    ) -> None: ...

    async def before_model_callback(
        self,
        *,
        callback_context: CallbackContext,
        llm_request: LlmRequest,
    ) -> Optional[LlmResponse]: ...

    async def after_model_callback(
        self,
        *,
        callback_context: CallbackContext,
        llm_response: LlmResponse,
    ) -> Optional[LlmResponse]: ...
```
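
The reset-then-annotate flow behind those two callbacks could look like the following self-contained sketch. The stub dataclasses stand in for ADK's `LlmRequest`/`LlmResponse`, the single `_attempts` counter simplifies the real per-context tracking, and `callback_context` is omitted for brevity, so this is an assumption-laden illustration rather than the actual implementation:

```python
# Hedged sketch of the proposed callback logic. Stub dataclasses replace
# ADK's LlmRequest/LlmResponse; a single counter replaces per-context
# tracking; callback_context is omitted for brevity.
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LlmRequest:  # stub
    model: Optional[str] = None


@dataclass
class LlmResponse:  # stub
    error_code: Optional[int] = None
    custom_metadata: dict = field(default_factory=dict)


class FallbackPlugin:
    def __init__(self, root_model=None, fallback_model=None,
                 error_status=None):
        self.root_model = root_model
        self.fallback_model = fallback_model
        self.error_status = [429, 504] if error_status is None else error_status
        self._attempts = 0  # the real plugin tracks this per context

    async def before_model_callback(self, *, llm_request):
        # Non-persistent fallback: every turn starts on the primary model.
        if self.root_model:
            llm_request.model = self.root_model
        return None

    async def after_model_callback(self, *, llm_response):
        if llm_response.error_code in self.error_status and self.fallback_model:
            self._attempts += 1
            llm_response.custom_metadata.update({
                "fallback_triggered": True,
                "original_model": self.root_model,
                "fallback_model": self.fallback_model,
                "fallback_attempt": self._attempts,
                "error_code": str(llm_response.error_code),
            })
        return None  # the underlying model layer performs the actual retry


plugin = FallbackPlugin(root_model="gemini-2.0-flash",
                        fallback_model="gemini-1.5-pro")
resp = LlmResponse(error_code=429)
asyncio.run(plugin.after_model_callback(llm_response=resp))
```

Returning `None` from both callbacks keeps the plugin purely observational, consistent with the stated single responsibility: the retry itself stays with the model layer.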

Additional Context

  • Pairs naturally with contributing/samples/litellm_with_fallback_models,
    which already demonstrates LiteLlm's fallbacks parameter.
  • A new sample contributing/samples/plugin_fallback/ is included with a
    README.md documenting configuration, metadata schema, and usage.
  • The plugin should be listed in the
    Prebuilt Plugins section of
    the documentation alongside GlobalInstructionPlugin,
    ReflectAndRetryToolPlugin, and DebugLoggingPlugin.
