ci(conformance): pin harness to 0.2.0-alpha.3 with expected-failures baseline #2877
Merged
…baseline

Modernizes the conformance CI to match the typescript-sdk pattern:

- Pin @modelcontextprotocol/conformance via a single workflow-level CONFORMANCE_VERSION env var (was 0.1.10/0.1.13, drifted between jobs).
- Add .github/actions/conformance/expected-failures.yml so the runner exits 0 on known-failing scenarios and 1 on regressions or stale entries. Client baseline is 16 scenarios grouped by SEP; server active suite is fully green.
- Drop continue-on-error from both jobs so conformance actually gates.
- Harden run-server.sh with a port-conflict guard, dead-process check, and curl --max-time, mirroring typescript-sdk's wrapper.

The server --suite draft step (2026-07-28 scenarios) is a follow-up; the baseline file already has a placeholder for it.

Supersedes #1921.
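The run-server.sh hardening described above (port-conflict guard, dead-process check, bounded request time) can be pictured with a rough Python rendering of the same three checks. This is only a sketch, not the script from this PR: the function names, the default retry counts, and the choice to treat any HTTP response as "ready" are assumptions made for illustration.

```python
import socket
import subprocess
import time
import urllib.error
import urllib.request


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Port-conflict guard: True if something already accepts connections."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def wait_until_ready(
    proc: subprocess.Popen,
    url: str,
    attempts: int = 30,
    delay: float = 0.5,
    request_timeout: float = 2.0,
) -> None:
    """Poll `url` until the server answers, failing fast if `proc` has died."""
    for _ in range(attempts):
        # Dead-process check: stop polling once the server has exited.
        if proc.poll() is not None:
            raise RuntimeError(f"server exited early with code {proc.returncode}")
        try:
            # The per-request timeout plays the role of `curl --max-time`.
            with urllib.request.urlopen(url, timeout=request_timeout):
                return
        except urllib.error.HTTPError:
            # Assumption: an HTTP error status still proves the server is up.
            return
        except OSError:
            # Connection refused or timed out; wait and retry.
            time.sleep(delay)
    raise TimeoutError(f"server at {url} not ready after {attempts} attempts")
```

A wrapper would call `port_in_use` before starting the server and `wait_until_ready` right after, so a stale process on the port or a crashed server surfaces as an immediate error rather than a hung CI step.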
felixweinberger approved these changes on Jun 15, 2026
run: >-
  npx --yes @modelcontextprotocol/conformance@"$CONFORMANCE_VERSION" client
  --command 'uv run --frozen python .github/actions/conformance/client.py'
  --suite all
Contributor
I'm actually not sure `all` is the correct suite here, counterintuitively. I believe it includes 2 scenarios that aren't actually load-bearing (I think they relate to old client auth cases from a pre-2025-11-25 spec version).
Contributor
Land it for now though; we can fix it if we end up with remaining failures that aren't covered by any SEPs.
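For readers who want to reproduce the pinned invocation from the hunk above outside the workflow, the single-pin idea can be sketched as a small Python helper. This is an illustration under assumptions, not code from the PR: the helper name and its parameters are hypothetical, and it only assembles the argument list shown in the diff, refusing to run when `CONFORMANCE_VERSION` is unset or empty.

```python
import os
from typing import Mapping, Optional, Sequence

PACKAGE = "@modelcontextprotocol/conformance"


def conformance_command(
    role: str,
    suite: str,
    extra: Sequence[str] = (),
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Build the npx argv for the harness, pinned via CONFORMANCE_VERSION."""
    env = os.environ if env is None else env
    version = env.get("CONFORMANCE_VERSION")
    if not version:
        # Fail loudly when the pin is unset or empty, rather than
        # silently falling back to whatever npm considers latest.
        raise RuntimeError("CONFORMANCE_VERSION is not set; see the workflow pin")
    return ["npx", "--yes", f"{PACKAGE}@{version}", role, *extra, "--suite", suite]
```

Keeping the suite as a parameter means a later switch away from `all` stays a one-argument change at the call site.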
- auth/scope-step-up
# SEP-990 (enterprise-managed authorization extension): no fixture handler /
# client support for the token-exchange + JWT bearer flow.
- auth/enterprise-managed-authorization
Contributor
Do we not have EMA in Python at all? Do we need to build this into Python potentially @pcarleton?
Contributor
Author
Yeah, apparently it does not.
Modernizes the conformance CI to match the typescript-sdk pattern so the job actually gates and the failure set burns down per SEP. Supersedes #1921 (the composite-action approach) — pinning the npm package directly is what typescript-sdk settled on.
Motivation and Context

The conformance jobs were pinned to `0.1.10`/`0.1.13` (drifted between server and client), ran with `continue-on-error: true` so they never gated, and had no baseline file: failures were silently ignored. The conformance harness has since gained `--expected-failures` (#113) and the `0.2.0-alpha` line with version-aware scenario selection.

Changes

- Single workflow-level `CONFORMANCE_VERSION: "0.2.0-alpha.3"` env var (one place to bump).
- Add `.github/actions/conformance/expected-failures.yml`: the path the conformance repo's `known-sdks.ts` already expects. Client baseline is 16 scenarios grouped by SEP (same set as typescript-sdk's baseline); server `active` suite is fully green so `server: []`.
- Drop `continue-on-error: true` from both jobs. The runner now exits 0 only when failures match the baseline, 1 on regressions or stale entries.
- Harden `run-server.sh` with a port-conflict guard, dead-process check in the readiness loop, and `curl --max-time`, mirroring typescript-sdk's wrapper.

The server `--suite draft` step (2026-07-28 scenarios) is a follow-up PR; the baseline file already has a `server:` placeholder for it.

How Has This Been Tested?

Run locally against the conformance harness at `0.2.0-alpha.3`:

- `server --suite active --expected-failures ...` → 30/30 scenarios pass, 42 assertions, exit 0.
- `client --suite all --expected-failures ...` → 40 scenarios: 24 pass, 16 expected-fail, 0 unexpected, 0 stale, exit 0.
- Adding a passing scenario (`initialize`) to the baseline → exit 1 with `Stale baseline entries (now passing - remove from baseline): initialize`.
- `bash -n`, YAML parse, and pre-commit all clean.

Breaking Changes

None for SDK users. For contributors: the conformance jobs now fail on regressions instead of silently passing.
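To make "fail on regressions" concrete, here is a minimal sketch of how a baseline comparison of this kind can be decided. It is not the harness's implementation: the function and class names are hypothetical, results are modeled as a plain scenario-to-passed mapping, and treating only scenarios that actually ran and passed as stale is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class BaselineReport:
    expected_failures: list[str] = field(default_factory=list)
    regressions: list[str] = field(default_factory=list)
    stale: list[str] = field(default_factory=list)

    @property
    def exit_code(self) -> int:
        # Gate on both directions of drift: new failures and stale entries.
        return 1 if self.regressions or self.stale else 0


def compare_to_baseline(results: dict[str, bool], baseline: set[str]) -> BaselineReport:
    """Classify scenario results against an expected-failures baseline."""
    failed = {name for name, passed in results.items() if not passed}
    passed = set(results) - failed
    return BaselineReport(
        expected_failures=sorted(failed & baseline),  # known failures, tolerated
        regressions=sorted(failed - baseline),        # failing but not listed
        stale=sorted(passed & baseline),              # listed but now passing
    )
```

With `results = {"initialize": True, "auth/scope-step-up": False}`, a baseline of `{"auth/scope-step-up"}` gives exit code 0, while adding `initialize` to the baseline reports it as stale and gives exit code 1, which mirrors the stale-entry check described in the testing notes.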
`run-server.sh` requires `CONFORMANCE_VERSION` to be set when run locally (the error message points at the workflow pin).

Types of changes
Checklist
Additional context
- Backporting `run-server.sh` to `v1.x` will need the `env: CONFORMANCE_VERSION` block backported in the same change (the script now reads it via `${CONFORMANCE_VERSION:?}`).
- … `python-sdk` (main) entry to `known-sdks.ts` alongside the existing `python-sdk-v1`.

AI Disclaimer