Description
Problem Statement
Retry behavior in Invoke-IdlePlanObject is static: the host cannot change it. This is too rigid, and a more flexible, host-configurable approach is needed.
Summary
Make IdLE's retry policy configurable by the host (executor parameters / engine options), while keeping retries safe-by-default (retry only for errors explicitly marked as transient).
This is a follow-up to Issue #11: Execution: safe retries for transient failures (fail-fast).
Motivation
Right now the engine uses hardcoded defaults for retry behavior (attempt count, backoff, delay). Different hosts/environments may need different values (e.g., CI vs. production, slow vs. fast providers), but we must avoid exposing timing/DoS knobs to untrusted workflow definitions.
Goals
- Allow the host to configure retry behavior per run.
- Preserve safe-by-default semantics:
  - Retry only when Exception.Data['Idle.IsTransient'] = $true (or equivalent marker).
  - Fail fast for non-transient errors.
- Validate and normalize options deterministically.
- Keep the core headless (no host-specific dependencies).
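To illustrate the marker-based rule in the goals above, here is a minimal sketch of how the transient check could look. The helper name Test-IdleTransientError is hypothetical and not part of the current IdLE API, and treating only a boolean $true as a valid marker is an assumption.

```powershell
# Hypothetical helper (not an existing IdLE function): returns $true only
# when the exception carries the explicit transient marker.
function Test-IdleTransientError {
    param(
        [Parameter(Mandatory)]
        [System.Exception] $Exception
    )

    # Assumption: only a boolean $true counts; a string such as 'true' does not.
    $marker = $Exception.Data['Idle.IsTransient']
    return ($marker -is [bool]) -and $marker
}

# A provider (or a test double) marks the error as transient before throwing it.
$transient = [System.TimeoutException]::new('Provider timed out')
$transient.Data['Idle.IsTransient'] = $true

# An unmarked error is treated as non-transient and should fail fast.
$permanent = [System.InvalidOperationException]::new('Invalid request')

Test-IdleTransientError -Exception $transient   # True
Test-IdleTransientError -Exception $permanent   # False
```

Keeping the check strict means an error is never retried by accident: anything without the marker fails fast.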
Non-goals
- Do not make retry policy configurable via workflow/plan definitions.
- Do not add provider-specific heuristics for transient detection in this issue.
- Do not implement per-step/per-step-type override rules (can be a later enhancement).
Proposal
Public API
Add an optional parameter to the public executor(s), e.g.:
- Invoke-IdlePlan and Invoke-IdlePlanObject: -ExecutionOptions (preferred) or -RetryPolicy
Example shape:
    $executionOptions = @{
        RetryPolicy = @{
            MaxAttempts              = 3
            InitialDelayMilliseconds = 250
            BackoffFactor            = 2.0
            MaxDelayMilliseconds     = 5000
            JitterRatio              = 0.2
        }
    }
    Invoke-IdlePlanObject -Plan $plan -Providers $providers -ExecutionOptions $executionOptions
Validation
- Reject ScriptBlocks in options (same rule as for plan/provider inputs).
- Validate ranges:
  - MaxAttempts: 1..50
  - InitialDelayMilliseconds: 0..600000
  - BackoffFactor: 1.0..100.0
  - MaxDelayMilliseconds: 0..600000
  - JitterRatio: 0.0..1.0
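The validation rules above could be sketched roughly as follows. Assert-IdleRetryPolicy is a hypothetical name, the error messages are illustrative, and rejecting unknown keys and non-numeric values are assumptions beyond what this issue specifies. Whole-number checks for MaxAttempts and the delay fields are left out for brevity.

```powershell
# Hypothetical validator (not an existing IdLE function).
function Assert-IdleRetryPolicy {
    param(
        [Parameter(Mandatory)]
        [System.Collections.IDictionary] $RetryPolicy
    )

    # Inclusive ranges taken from the list above.
    $ranges = [ordered]@{
        MaxAttempts              = @(1, 50)
        InitialDelayMilliseconds = @(0, 600000)
        BackoffFactor            = @(1.0, 100.0)
        MaxDelayMilliseconds     = @(0, 600000)
        JitterRatio              = @(0.0, 1.0)
    }

    foreach ($key in $RetryPolicy.Keys) {
        $value = $RetryPolicy[$key]

        # Same rule as for plan/provider inputs: no executable content.
        if ($value -is [scriptblock]) {
            throw [System.ArgumentException]::new("RetryPolicy.$key must not be a ScriptBlock.")
        }

        # Assumption: unknown keys are rejected rather than silently ignored.
        if (-not $ranges.Contains($key)) {
            throw [System.ArgumentException]::new("Unknown RetryPolicy key '$key'.")
        }

        # Assumption: only plain numeric values are accepted.
        $isNumeric = $value -is [int] -or $value -is [long] -or
                     $value -is [double] -or $value -is [decimal]
        if (-not $isNumeric) {
            throw [System.ArgumentException]::new("RetryPolicy.$key must be numeric.")
        }

        $min, $max = $ranges[$key]
        if ($value -lt $min -or $value -gt $max) {
            throw [System.ArgumentOutOfRangeException]::new(
                "RetryPolicy.$key", "Value must be between $min and $max (got $value).")
        }
    }
}

# Passes silently for a valid policy; throws for out-of-range values.
Assert-IdleRetryPolicy -RetryPolicy @{ MaxAttempts = 3; JitterRatio = 0.2 }
```

Throwing on the first violation keeps the behavior deterministic; collecting all violations into one error would be an equally valid design.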
Defaults
If the host does not supply a policy, use the current engine defaults:
- MaxAttempts = 3
- InitialDelayMilliseconds = 250
- BackoffFactor = 2.0
- MaxDelayMilliseconds = 5000
- JitterRatio = 0.2
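As a rough sketch of how host values could be merged over these defaults, and how the fields might combine into a delay, consider the following. Both function names are hypothetical, and the delay formula (exponential backoff, capped, with symmetric jitter) is an assumption; this issue does not pin down how the engine currently computes it. The RandomUnit parameter exists only so the jitter can be made deterministic in tests.

```powershell
# Hypothetical helper: merge a host-supplied policy over the engine defaults.
function Resolve-IdleRetryPolicy {
    param(
        [System.Collections.IDictionary] $RetryPolicy = @{}
    )

    # Defaults as listed above.
    $resolved = [ordered]@{
        MaxAttempts              = 3
        InitialDelayMilliseconds = 250
        BackoffFactor            = 2.0
        MaxDelayMilliseconds     = 5000
        JitterRatio              = 0.2
    }

    # Host-supplied values override the defaults key by key.
    foreach ($key in $RetryPolicy.Keys) {
        $resolved[$key] = $RetryPolicy[$key]
    }
    return $resolved
}

# Hypothetical helper: delay to wait after the given (1-based) failed attempt.
function Get-IdleRetryDelayMilliseconds {
    param(
        [Parameter(Mandatory)] [System.Collections.IDictionary] $Policy,
        [Parameter(Mandatory)] [ValidateRange(1, 50)] [int] $Attempt,
        # Value in [0, 1]; 0.5 means "no jitter". Random by default.
        [ValidateRange(0.0, 1.0)] [double] $RandomUnit = (Get-Random -Minimum 0.0 -Maximum 1.0)
    )

    # Assumed formula: initial * factor^(attempt - 1), capped at the maximum.
    $base   = $Policy.InitialDelayMilliseconds * [Math]::Pow($Policy.BackoffFactor, $Attempt - 1)
    $capped = [Math]::Min([double]$base, [double]$Policy.MaxDelayMilliseconds)

    # Assumed symmetric jitter of +/- JitterRatio, with the cap re-applied afterwards.
    $offset   = $capped * $Policy.JitterRatio * (2.0 * $RandomUnit - 1.0)
    $jittered = [Math]::Min($capped + $offset, [double]$Policy.MaxDelayMilliseconds)
    return [int][Math]::Max(0.0, [Math]::Round($jittered))
}

$policy = Resolve-IdleRetryPolicy -RetryPolicy @{ MaxAttempts = 5 }
Get-IdleRetryDelayMilliseconds -Policy $policy -Attempt 1 -RandomUnit 0.5   # 250
Get-IdleRetryDelayMilliseconds -Policy $policy -Attempt 2 -RandomUnit 0.5   # 500
Get-IdleRetryDelayMilliseconds -Policy $policy -Attempt 6 -RandomUnit 0.5   # 5000 (capped)
```

Merging key by key lets a host override a single value, such as MaxAttempts, without restating the whole policy.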
Security considerations
- Options are trusted host input, not workflow input.
- Workflow/plan remain data-only and must not control retry timing.
- Transient detection remains marker-based to avoid unsafe retries.
Testing
Add/extend Pester tests:
- Host can set MaxAttempts=1 and transient errors fail without retry.
- Host can set MaxAttempts=3 and a transient error retries and succeeds.
- Invalid values are rejected (range validation).
- Options with ScriptBlocks are rejected.
Mock Start-Sleep to keep tests fast.
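A possible shape for these tests, assuming Pester 5 syntax. Invoke-WithRetry and New-TransientException are hypothetical stand-ins defined inside the test so the sketch is self-contained; the real tests would call Invoke-IdlePlanObject with -ExecutionOptions instead.

```powershell
Describe 'Host-configured retry policy (sketch)' {
    BeforeAll {
        # Hypothetical stand-in for the engine's retry loop.
        function Invoke-WithRetry {
            param(
                [Parameter(Mandatory)] [scriptblock] $Action,
                [int] $MaxAttempts = 3,
                [int] $DelayMilliseconds = 250
            )
            for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
                try {
                    return & $Action
                }
                catch {
                    # Retry only explicitly marked errors, and only while attempts remain.
                    $isTransient = $_.Exception.Data['Idle.IsTransient'] -eq $true
                    if (-not $isTransient -or $attempt -ge $MaxAttempts) { throw }
                    Start-Sleep -Milliseconds $DelayMilliseconds
                }
            }
        }

        # Hypothetical factory for an exception carrying the transient marker.
        function New-TransientException {
            $ex = [System.TimeoutException]::new('simulated transient failure')
            $ex.Data['Idle.IsTransient'] = $true
            return $ex
        }
    }

    BeforeEach {
        Mock Start-Sleep { }     # keep the tests fast
        $script:calls = 0
    }

    It 'fails without retry when MaxAttempts is 1' {
        $action = { $script:calls++; throw (New-TransientException) }
        { Invoke-WithRetry -Action $action -MaxAttempts 1 } | Should -Throw
        $script:calls | Should -Be 1
        Should -Invoke Start-Sleep -Times 0 -Exactly
    }

    It 'retries a transient error and succeeds when MaxAttempts is 3' {
        $action = {
            $script:calls++
            if ($script:calls -lt 2) { throw (New-TransientException) }
            'ok'
        }
        Invoke-WithRetry -Action $action -MaxAttempts 3 | Should -Be 'ok'
        $script:calls | Should -Be 2
        Should -Invoke Start-Sleep -Times 1 -Exactly
    }

    It 'fails fast on a non-transient error' {
        $action = { $script:calls++; throw [System.InvalidOperationException]::new('permanent') }
        { Invoke-WithRetry -Action $action -MaxAttempts 3 } | Should -Throw
        $script:calls | Should -Be 1
    }
}
```

Asserting on the number of Start-Sleep invocations checks the retry count without depending on wall-clock timing.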
Documentation
- Update cmdlet reference (platyPS) after public parameter changes.
- Add a short note in docs about "host-owned execution options" (wherever execution policy is described).
Definition of Done
- Public API updated with optional execution options.
- Input validation added (no script blocks, range checks).
- Unit tests added/updated (Linux + Windows CI green).
- Docs regenerated/updated as needed.
- No changes to workflow/plan schema required.