Feature Request: Expose configurable timeout parameter for experiment task execution in run_experiment
Summary
The async_run_experiment() function in the Arize SDK does not expose a configurable timeout parameter for task execution. Long-running tasks (>60-120 seconds) are being requeued due to hardcoded timeout limits, causing tasks to execute significantly more times than expected.
Problem Description
When running experiments with long-running tasks (e.g., 60+ seconds per record), the executor times out and requeues tasks, resulting in:
- Tasks executing significantly more times than the dataset size (e.g., 8x more executions for a 4-record dataset)
- Log messages showing "Worker timeout, requeuing"
- Wasted compute and potentially inconsistent results
Root Cause
The AsyncExecutor used in arize/experimental/datasets/experiments/functions.py → async_run_experiment() has a hardcoded timeout (120s in version 7.51.2) defined in evaluators/executors.py (~line 177). There is currently no way to override this timeout through the public API.
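A minimal sketch of the requeue mechanism described above (not the SDK's actual executor code): when each attempt is bounded by a hard timeout and timed-out work is put back on the queue, a dataset of 4 slow records produces far more than 4 task executions.

```python
import asyncio

async def run_with_requeue(records, task, timeout, max_retries=2):
    # Pending queue of (record, attempt) pairs, mimicking a worker pool
    # that puts timed-out work back on the queue.
    queue = asyncio.Queue()
    for record in records:
        queue.put_nowait((record, 0))
    executions = 0
    results = {}
    while not queue.empty():
        record, attempts = queue.get_nowait()
        executions += 1
        try:
            # Every attempt is bounded by the same hard timeout.
            results[record] = await asyncio.wait_for(task(record), timeout)
        except asyncio.TimeoutError:
            if attempts < max_retries:
                # "Worker timeout, requeuing": the record goes back for another try.
                queue.put_nowait((record, attempts + 1))
    return executions, results

async def slow_task(record):
    await asyncio.sleep(0.05)  # simulates a task longer than the timeout
    return record

# 4 records, each slower than the 0.01 s timeout: every attempt times out
# and is retried, so the task executes 3x more often than the dataset size.
executions, results = asyncio.run(run_with_requeue(range(4), slow_task, timeout=0.01))
print(executions)  # 12 executions for a 4-record dataset (3 attempts each)
```

With a sufficiently generous (configurable) timeout, each record would execute exactly once instead.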
Environment
- Arize SDK version: 7.51.2
- Python: 3.12
Proposed Solution
Expose a timeout parameter in the run_experiment / async_run_experiment functions, similar to how concurrency is already exposed:
from arize.experimental.datasets import ArizeDatasetsClient

client = ArizeDatasetsClient(developer_key="...", api_key="...")
client.run_experiment(
    space_id="...",
    experiment_name="my-experiment",
    dataset_id="...",
    task=my_long_running_task,
    evaluators=[...],
    concurrency=4,
    timeout=300,  # NEW: allow users to configure the task timeout (e.g., 5 minutes)
)

Additional Context
- The Phoenix SDK's run_experiment already supports a timeout parameter, but Phoenix lacks the space_id support needed for Arize platform integration
- This is blocking use cases where experiment tasks legitimately require longer execution times (e.g., complex LLM chains, external API calls with latency)
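A rough sketch of how the proposed timeout parameter could be threaded through to the executor. The class and function signatures here are illustrative stand-ins, not the SDK's actual internals; defaulting to the current hardcoded value preserves behavior for existing callers.

```python
import asyncio
from typing import Optional

DEFAULT_TASK_TIMEOUT = 120.0  # today's hardcoded limit in executors.py

class AsyncExecutor:
    """Illustrative stand-in for the SDK's executor; real signature may differ."""

    def __init__(self, concurrency: int = 4, timeout: Optional[float] = None):
        self.concurrency = concurrency
        # Fall back to the existing default when no override is given.
        self.timeout = DEFAULT_TASK_TIMEOUT if timeout is None else timeout

    async def execute(self, task, record):
        # Each task invocation is bounded by the (now configurable) timeout.
        return await asyncio.wait_for(task(record), self.timeout)

def run_experiment(task, records, concurrency=4, timeout=None):
    # The public API forwards `timeout` to the executor instead of
    # relying on the hardcoded constant.
    executor = AsyncExecutor(concurrency=concurrency, timeout=timeout)

    async def _run():
        sem = asyncio.Semaphore(executor.concurrency)

        async def bounded(record):
            async with sem:
                return await executor.execute(task, record)

        return await asyncio.gather(*(bounded(r) for r in records))

    return asyncio.run(_run())

async def quick_task(record):
    await asyncio.sleep(0.01)
    return record * 2

# A 300 s ceiling comfortably covers this task, so nothing is requeued.
outputs = run_experiment(quick_task, [1, 2, 3], concurrency=2, timeout=300)
print(outputs)  # [2, 4, 6]
```

The same default-fallback pattern is how concurrency is already exposed, so the change would be additive and backward compatible.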