Feature Request: Expose configurable timeout parameter for experiment task execution in run_experiment #121

@techno-anthropology

Summary

The async_run_experiment() function in the Arize SDK does not expose a configurable timeout parameter for task execution. Long-running tasks (roughly 60-120 seconds or longer) are requeued when they hit the hardcoded timeout, so each task executes significantly more times than expected.

Problem Description

When running experiments with long-running tasks (e.g., 60+ seconds per record), the executor times out and requeues tasks, resulting in:

  • Tasks executing significantly more times than the dataset size (e.g., 8x more executions for a 4-record dataset)
  • Log messages showing "Worker timeout, requeuing"
  • Wasted compute resources and potential inconsistent results
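
To illustrate why a fixed timeout multiplies executions, here is a minimal sketch of a timeout-and-requeue worker loop. It is a simplified model written for this issue, not the SDK's actual AsyncExecutor code: run_with_requeue, slow_task, max_retries, and workers are hypothetical names, and the real retry policy may differ.

```python
import asyncio


async def slow_task(x):
    # Stand-in for a long-running experiment task (hypothetical).
    await asyncio.sleep(0.05)
    return x * 2


async def run_with_requeue(task, items, timeout, max_retries=2, workers=2):
    """Simplified model of a timeout-and-requeue loop (not the SDK's code)."""
    queue = asyncio.Queue()
    for item in items:
        queue.put_nowait((item, 0))  # (payload, attempts so far)

    executions = 0
    results = {}

    async def worker():
        nonlocal executions
        while True:
            try:
                item, attempt = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            executions += 1  # count every start, including retries
            try:
                results[item] = await asyncio.wait_for(task(item), timeout)
            except asyncio.TimeoutError:
                # The task is cancelled and put back on the queue,
                # so its work is repeated from scratch.
                if attempt < max_retries:
                    queue.put_nowait((item, attempt + 1))

    await asyncio.gather(*(worker() for _ in range(workers)))
    return executions, results


if __name__ == "__main__":
    # Timeout shorter than the task: every record is retried and none finish.
    print(asyncio.run(run_with_requeue(slow_task, [1, 2, 3, 4], timeout=0.01)))
    # Timeout longer than the task: each record runs exactly once.
    print(asyncio.run(run_with_requeue(slow_task, [1, 2, 3, 4], timeout=1.0)))
```

In this model, a task that always exceeds the timeout starts max_retries + 1 times per record and never produces a result, which matches the pattern of execution counts far above the dataset size. A caller-supplied timeout would let the second case apply to legitimately slow tasks.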

Root Cause

The AsyncExecutor used by async_run_experiment() in arize/experimental/datasets/experiments/functions.py has a hardcoded timeout (120s in version 7.51.2) defined in evaluators/executors.py (~line 177). There is currently no way to override this timeout through the public API.

Environment

  • arize SDK version: 7.51.2
  • Python: 3.12

Proposed Solution

Expose a timeout parameter in the run_experiment / async_run_experiment functions, similar to how concurrency is already exposed:

from arize.experimental.datasets import ArizeDatasetsClient

client = ArizeDatasetsClient(developer_key="...", api_key="...")

client.run_experiment(
    space_id="...",
    experiment_name="my-experiment",
    dataset_id="...",
    task=my_long_running_task,
    evaluators=[...],
    concurrency=4,
    timeout=300,  # NEW: Allow users to configure task timeout (e.g., 5 minutes)
)

Additional Context

  • The Phoenix SDK's run_experiment already supports a timeout parameter, but Phoenix lacks space_id support needed for Arize platform integration
  • This is blocking use cases where experiment tasks legitimately require longer execution times (e.g., complex LLM chains, external API calls with latency)
