
Python: flaky integration tests blocking merge queue (Foundry Hosting + Foundry embeddings) #5553

@moonbox3

Description


Two integration tests are flaking in the merge queue and blocking merges into main. Both are already marked @pytest.mark.flaky and have a 3-attempt retry policy, but all three attempts fail back-to-back. They have been temporarily skipped to unblock the queue (see follow-up PR) and need a real fix.

Observed failing on at least PRs #5301 and #5531 — these flakes are systemic, not PR-specific. Example failing run: https://github.com/microsoft/agent-framework/actions/runs/25083550113

Test 1 — test_temperature_and_max_tokens

  • Path: packages/foundry_hosting/tests/test_responses_int.py::TestOptions::test_temperature_and_max_tokens
  • Job: Python Tests - Foundry Hosting Integration

The test posts a /responses request with max_output_tokens=50 and asserts that the response contains exactly one message-type output:

output_messages = [o for o in body["output"] if o["type"] == "message"]
assert len(output_messages) == 1

It fails with assert 0 == 1: the response status is completed (HTTP 200), but the output array contains no type == "message" entries. The captured server log shows output_count=1, so an output item exists; it's just not a message (most likely a lone reasoning item, with the visible message cut off by the 50-token cap). All three retry attempts failed.

Suggested fix: raise max_output_tokens so the model has room to emit a user-visible message after any reasoning output, or relax the assertion to allow any output type while still verifying status/limits were honored.
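A minimal sketch of the relaxed-assertion variant, assuming the Responses API body shape quoted above; the usage-field access is illustrative, not the actual test code:

assert body["status"] == "completed"
assert len(body["output"]) >= 1  # some output item came back (reasoning or message)
output_messages = [o for o in body["output"] if o["type"] == "message"]
if output_messages:
    assert len(output_messages) == 1  # at most one visible message is still expected
# the token cap must be honored regardless of which output types were emitted
assert body["usage"]["output_tokens"] <= 50

Raising max_output_tokens is the lower-risk variant: it keeps the strict one-message assertion and simply gives reasoning models headroom to emit a message after their reasoning item.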

Test 2 — test_text_embedding_live

  • Path: packages/foundry/tests/foundry/test_foundry_embedding_client.py::TestFoundryEmbeddingIntegration::test_text_embedding_live
  • Job: Python Integration Tests - Foundry

Calls FoundryEmbeddingClient().get_embeddings(["Hello, world!"]) against the live endpoint. Fails with:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The Azure SDK got an empty response body from azure.ai.inference and tried to json.loads(''). All three retry attempts failed. This looks like a transient endpoint issue, or a missing-credential / 204 response that the client isn't surfacing as a clearer error.

Suggested fix: investigate why the embeddings endpoint returns an empty body in CI (auth? region? service-side flake?); also consider catching the empty-response case in the client and raising a typed error rather than letting json.loads blow up.
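A hedged sketch of that guard, assuming the client sees the raw body as text before decoding; the helper and exception names here are hypothetical, not part of the agent-framework or Azure SDK API:

import json


class EmptyEmbeddingResponseError(RuntimeError):
    """Raised when the embeddings endpoint returns a success status with no body."""


def decode_embedding_body(raw: str) -> dict:
    # Surface the empty-body case as a typed, actionable error instead of
    # letting json.loads raise a bare JSONDecodeError three retries in a row.
    if not raw.strip():
        raise EmptyEmbeddingResponseError(
            "Embeddings endpoint returned an empty body; "
            "check credentials, deployment region, and service health."
        )
    return json.loads(raw)

This would turn the CI failure into a message that points at auth or service state rather than a JSON parse traceback.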

Why this also keeps biting unrelated PRs

Both tests are gated by the core path filter in .github/workflows/python-merge-tests.yml:

core:
  - 'python/packages/core/agent_framework/_*.py'
  - 'python/packages/core/agent_framework/_workflows/**'
  - 'python/packages/core/agent_framework/exceptions.py'
  - 'python/packages/core/agent_framework/observability.py'

Any change under _workflows/** triggers every provider's live integration suite via coreChanged == 'true', so a workflows-only PR ends up gated on Foundry / Foundry Hosting / OpenAI / etc. flakes. Worth a follow-up to narrow this filter (e.g. split workflows out of core) once the flakes themselves are fixed.
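One possible shape for the narrowed filter, splitting workflows out of core (illustrative; the new key would need matching coreChanged-style logic in the jobs that consume it):

core:
  - 'python/packages/core/agent_framework/_*.py'
  - 'python/packages/core/agent_framework/exceptions.py'
  - 'python/packages/core/agent_framework/observability.py'
workflows:
  - 'python/packages/core/agent_framework/_workflows/**'

With that split, a workflows-only PR could gate on workflow-focused suites instead of fanning out to every provider's live integration job.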

Action items

  • Fix test_temperature_and_max_tokens (Foundry Hosting)
  • Fix test_text_embedding_live (Foundry embeddings)
  • Re-enable both tests (remove @pytest.mark.skip)
  • (Optional follow-up) Narrow the core path filter so workflows-only PRs don't fan out to every provider
