diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 5c0c1c4..1ec752b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -55,6 +55,14 @@ bash -n scripts/doctor.sh bash -n scripts/lint.sh ``` +External interoperability experiments stay outside the default regression baseline. When you need to reproduce current official-tool behavior, run: + +```bash +bash ./scripts/conformance.sh +``` + +Treat that output as investigation input. Do not fold it into `doctor.sh` or the default CI quality gate unless the repository explicitly decides to promote a specific experiment into a maintained policy. + If you change extension methods, extension metadata, or Agent Card/OpenAPI contract surfaces, also run: ```bash diff --git a/docs/conformance-triage.md b/docs/conformance-triage.md new file mode 100644 index 0000000..c55544d --- /dev/null +++ b/docs/conformance-triage.md @@ -0,0 +1,89 @@ +# External Conformance Triage + +This document records the first local `./scripts/conformance.sh mandatory` run against the official `a2aproject/a2a-tck` using the repository's dummy-backed SUT. + +## Standards Used For Triage + +- `a2a-sdk==0.3.25` as installed in this repository: + - `AgentCard` uses `additionalInterfaces`, not `supportedInterfaces`. + - JSON-RPC request models use `message/send`, `tasks/get`, `tasks/cancel`, and `agent/getAuthenticatedExtendedCard`. + - The installed SDK does not expose a JSON-RPC `ListTasks` request model. +- A2A v0.3.0 specification: + - JSON-RPC methods use the `{category}/{action}` pattern such as `message/send` and `tasks/get`. + - Transport declarations use `preferredTransport` plus `additionalInterfaces`. + - The method mapping table lists `tasks/list` as gRPC/REST only. +- Repository compatibility policy: + - `A2A-Version` negotiation supports both `0.3` and `1.0`. + - Payloads still follow the shipped `0.3` SDK baseline. + - `1.0` compatibility is currently documented as partial rather than complete. 
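The naming rule in the standards above can be sketched as a quick shape check, which helps when sorting TCK failures into "PascalCase method sent" versus everything else. This is a minimal sketch, not part of the repository: the regular expression and the helper name `looks_like_v030_method` are assumptions made for illustration, and a name that passes the shape check is not necessarily in the v0.3.0 JSON-RPC mapping (`tasks/list` has the right shape but is listed as gRPC/REST only).

```python
import re

# Assumption: a loose approximation of the v0.3.0 `{category}/{action}` shape.
# Nested segments such as `tasks/pushNotificationConfig/set` are allowed.
_V030_METHOD_SHAPE = re.compile(r"^[a-z][A-Za-z]*(?:/[a-z][A-Za-z]*)+$")


def looks_like_v030_method(name: str) -> bool:
    """Return True when `name` has the slash-separated v0.3.0 shape.

    This only checks the shape of the name; it does not check whether the
    method exists in the SDK or in the JSON-RPC method mapping.
    """
    return bool(_V030_METHOD_SHAPE.match(name))


if __name__ == "__main__":
    for candidate in ("message/send", "tasks/get", "SendMessage", "ListTasks"):
        print(candidate, looks_like_v030_method(candidate))
```

Names taken from this document, such as `message/send` and `agent/getAuthenticatedExtendedCard`, pass the check, while the PascalCase names the TCK sends (`SendMessage`, `GetTask`, `CancelTask`, `ListTasks`) do not.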
+ +## Classification Labels + +- `TCK issue`: the failing expectation conflicts with `a2a-sdk==0.3.25` and the v0.3.0 baseline used by this repository. +- `TCK issue; also a repo v1.0 gap`: the exact failure is caused by a TCK mismatch, but the same area would still need extra work for stronger `1.0` compatibility. +- `TCK issue / local experiment artifact`: the failure comes from an aggressive heuristic or from local dummy-run characteristics and should not be treated as a runtime protocol bug. + +## Per-Test Triage + +- `tests/mandatory/authentication/test_auth_compliance_v030.py::test_security_scheme_structure_compliance`: `TCK issue`. The TCK expects each `securitySchemes` entry to be wrapped as `{httpAuthSecurityScheme: {...}}`, but `a2a-sdk==0.3.25` exposes the flattened OpenAPI-shaped object with fields like `type`, `scheme`, `description`, and `bearerFormat`. +- `tests/mandatory/authentication/test_auth_enforcement.py::test_authentication_scheme_consistency`: `TCK issue`. Same root cause as the previous test: the TCK validates a non-SDK wrapper shape instead of the installed SDK schema. +- `tests/mandatory/jsonrpc/test_a2a_error_codes_enhanced.py::test_push_notification_not_supported_error_32003_enhanced`: `TCK issue`. The failure is a TCK helper bug: `transport_create_task_push_notification_config()` is called with the wrong positional signature before the runtime behavior is even exercised. +- `tests/mandatory/jsonrpc/test_json_rpc_compliance.py::test_rejects_invalid_json_rpc_requests[invalid_request4--32602]`: `TCK issue`. The test sends JSON-RPC method `SendMessage`; under the v0.3.0 / SDK 0.3.25 baseline the correct method is `message/send`, so the runtime correctly returns `-32601` for an unknown method instead of `-32602`. +- `tests/mandatory/jsonrpc/test_json_rpc_compliance.py::test_rejects_invalid_params`: `TCK issue`. Same method-name mismatch as above; with the correct `message/send` method the runtime returns `-32602` for invalid parameters. 
+- `tests/mandatory/jsonrpc/test_protocol_violations.py::test_duplicate_request_ids`: `TCK issue`. The first request already fails because the TCK uses `SendMessage` instead of `message/send`, so the duplicate-ID assertion never reaches the actual duplicate-ID behavior. +- `tests/mandatory/protocol/test_a2a_v030_new_methods.py::TestMethodMappingCompliance::test_core_method_mapping_compliance`: `TCK issue; also a repo v1.0 gap`. The JSON-RPC client uses PascalCase methods (`SendMessage`, `GetTask`, `CancelTask`) that do not match the v0.3.0 JSON-RPC mapping, but the repository also does not currently provide PascalCase aliases even when `A2A-Version: 1.0` is negotiated. +- `tests/mandatory/protocol/test_message_send_method.py::test_message_send_valid_text`: `TCK issue; also a repo v1.0 gap`. The failing request uses `SendMessage` over JSON-RPC; the repository correctly supports `message/send` for the current SDK baseline, but not the PascalCase alias. +- `tests/mandatory/protocol/test_message_send_method.py::test_message_send_invalid_params`: `TCK issue; also a repo v1.0 gap`. Direct cause is the same PascalCase JSON-RPC method mismatch. +- `tests/mandatory/protocol/test_message_send_method.py::test_message_send_continue_task`: `TCK issue; also a repo v1.0 gap`. Direct cause is again `SendMessage` instead of `message/send`. +- `tests/mandatory/protocol/test_state_transitions.py::test_task_history_length`: `TCK issue; also a repo v1.0 gap`. Task creation fails only because the TCK uses `SendMessage` on JSON-RPC. +- `tests/mandatory/protocol/test_tasks_cancel_method.py::test_tasks_cancel_valid`: `TCK issue; also a repo v1.0 gap`. The fixture cannot create a task because the TCK uses `SendMessage`; the runtime's `tasks/cancel` behavior is not the direct failing cause in this run. +- `tests/mandatory/protocol/test_tasks_cancel_method.py::test_tasks_cancel_nonexistent`: `TCK issue; also a repo v1.0 gap`. 
The TCK calls JSON-RPC `CancelTask`; under the v0.3.0 baseline the method is `tasks/cancel`. With the correct method, the runtime returns `Task not found` / `-32001`. +- `tests/mandatory/protocol/test_tasks_get_method.py::test_tasks_get_valid`: `TCK issue; also a repo v1.0 gap`. The task-creation fixture fails first because the TCK uses `SendMessage`. +- `tests/mandatory/protocol/test_tasks_get_method.py::test_tasks_get_with_history_length`: `TCK issue; also a repo v1.0 gap`. Same fixture failure via `SendMessage`. +- `tests/mandatory/protocol/test_tasks_get_method.py::test_tasks_get_nonexistent`: `TCK issue; also a repo v1.0 gap`. The TCK calls JSON-RPC `GetTask`; under the v0.3.0 baseline the method is `tasks/get`. With the correct method, the runtime returns `Task not found` / `-32001`. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestBasicListing::test_list_all_tasks`: `TCK issue; also a repo v1.0 gap`. The test suite uses JSON-RPC `ListTasks`, which is outside the `a2a-sdk==0.3.25` JSON-RPC surface and outside the v0.3.0 JSON-RPC mapping. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestBasicListing::test_list_tasks_empty_when_none_exist`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestBasicListing::test_list_tasks_validates_required_fields`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestBasicListing::test_list_tasks_sorted_by_timestamp_descending`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestFiltering::test_filter_by_context_id`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestFiltering::test_filter_by_status`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. 
+- `tests/mandatory/protocol/test_tasks_list_method.py::TestFiltering::test_filter_by_last_updated_after`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestFiltering::test_combined_filters`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_default_page_size`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_custom_page_size`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_page_token_navigation`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_last_page_detection`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestPagination::test_total_size_accuracy`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestHistoryLimiting::test_history_length_zero`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestHistoryLimiting::test_history_length_custom`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestHistoryLimiting::test_history_length_exceeds_actual`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestArtifactInclusion::test_artifacts_excluded_by_default`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. 
+- `tests/mandatory/protocol/test_tasks_list_method.py::TestArtifactInclusion::test_artifacts_included_when_requested`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_invalid_page_token_error`: `TCK issue; also a repo v1.0 gap`. The assertion expects JSON-RPC param validation on `ListTasks`, but the direct failure is still that `ListTasks` is not a supported JSON-RPC method in the current SDK baseline. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_invalid_status_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_negative_page_size_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_zero_page_size_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_out_of_range_page_size_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_default_page_size_is_50`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_negative_history_length_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/protocol/test_tasks_list_method.py::TestEdgeCasesAndErrors::test_invalid_timestamp_error`: `TCK issue; also a repo v1.0 gap`. Same JSON-RPC `ListTasks` mismatch. +- `tests/mandatory/security/test_agent_card_security.py::test_public_agent_card_access_control`: `TCK issue`. 
The TCK requires `supportedInterfaces`, but `a2a-sdk==0.3.25` and the v0.3.0 specification use `additionalInterfaces`. +- `tests/mandatory/security/test_agent_card_security.py::test_sensitive_information_protection`: `TCK issue / local experiment artifact`. The failure is driven by heuristic keyword scanning (`token`, `private`, `127.0.0.1`, non-standard port) against a local dummy-backed run. That is not a reliable indicator of protocol non-compliance. +- `tests/mandatory/security/test_agent_card_security.py::test_security_scheme_consistency`: `TCK issue`. Same schema mismatch as the earlier authentication tests: the TCK expects wrapped security scheme objects instead of the installed SDK shape. +- `tests/mandatory/transport/test_multi_transport_equivalence.py::test_message_sending_equivalence`: `TCK issue; also a repo v1.0 gap`. The transport client uses JSON-RPC `SendMessage`; under the v0.3.0 baseline the method is `message/send`, but stronger `1.0` compatibility would still require additional alias handling. +- `tests/mandatory/transport/test_multi_transport_equivalence.py::test_concurrent_operation_equivalence`: `TCK issue; also a repo v1.0 gap`. Same direct cause as the previous test: the JSON-RPC client sends `SendMessage`. + +## Adjacent Repository Gaps Found During Triage + +These did not directly cause the exact failed node IDs above, but they are real repository-side gaps revealed during follow-up probes: + +- `A2A-Version: 1.0` still returns `-32601` for JSON-RPC `SendMessage` and `GetExtendedAgentCard`. That means current `1.0` support is still limited to negotiation and error-shaping rather than full method-surface compatibility. +- `GET /v1/tasks` currently returns `500 NotImplementedError` in a local probe, even though the route exists and repository docs describe the SDK-owned REST surface as including task listing. That behavior should be treated as a repository issue independent from the TCK's incorrect JSON-RPC `ListTasks` expectation. 
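When repeating follow-up probes like the ones above, it can help to label the JSON-RPC error code in each response before deciding which side owns the failure. The sketch below is illustrative only: the helper name `describe_jsonrpc_error` and the label strings are assumptions, and the table covers only the codes this triage mentions (`-32601`, `-32602`, and `-32001`).

```python
# Codes referenced in this triage. -32601 and -32602 are standard JSON-RPC 2.0
# codes; -32001 is the "Task not found" code the runtime returns in this run.
_KNOWN_ERROR_CODES = {
    -32601: "method not found",
    -32602: "invalid params",
    -32001: "task not found",
}


def describe_jsonrpc_error(response: dict) -> str:
    """Return a short label for the error carried by a JSON-RPC response.

    Hypothetical helper for triage notes; it does not validate the envelope.
    """
    error = response.get("error")
    if not isinstance(error, dict):
        return "no error"
    code = error.get("code")
    return _KNOWN_ERROR_CODES.get(code, f"unrecognized code {code}")


if __name__ == "__main__":
    probe = {"jsonrpc": "2.0", "id": 1, "error": {"code": -32601, "message": "Method not found"}}
    print(describe_jsonrpc_error(probe))
```

A `-32601` label on a `SendMessage` probe points at the missing PascalCase alias rather than at parameter validation, which matches the classification used in the per-test triage.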
+ +## Summary + +For the exact 47 failed/error cases in the first mandatory run: + +- No failure is a clean `a2a-sdk==0.3.25` / v0.3.0 conformance bug in the current runtime. +- Most failures come from TCK method/schema assumptions that do not match the shipped SDK baseline. +- Several failures also highlight future repository work if stronger `1.0` compatibility becomes a goal. diff --git a/docs/conformance.md b/docs/conformance.md new file mode 100644 index 0000000..87dc3c6 --- /dev/null +++ b/docs/conformance.md @@ -0,0 +1,67 @@ +# External Conformance Experiments + +This repository keeps internal regression and external interoperability experiments separate on purpose. + +## Scope + +- `./scripts/doctor.sh` remains the primary internal regression entrypoint. +- `./scripts/conformance.sh` is a local/manual experiment entrypoint for official external tooling. +- External conformance output should be treated as investigation input, not as an automatic merge gate. + +## Current Experiment Shape + +The default `./scripts/conformance.sh` workflow does the following: + +1. Sync the repository environment unless explicitly skipped. +2. Cache or refresh the official `a2aproject/a2a-tck` checkout. +3. Start a local dummy-backed `opencode-a2a` runtime unless `CONFORMANCE_SUT_URL` points to an existing SUT. +4. Run the requested TCK category, defaulting to `mandatory`. +5. Preserve raw logs and machine-readable reports under `run/conformance//`. + +The default local SUT uses the repository test double `DummyChatOpencodeUpstreamClient`. That keeps the experiment reproducible without requiring a live OpenCode upstream. 
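The machine-readable reports preserved in step 5 can be grouped before triage starts. The following is a minimal sketch, assuming the `failed-tests.json` shape that `scripts/conformance.sh` writes (a JSON list of objects with `nodeid` and `outcome` keys); the function name `summarize_failures` is hypothetical and not part of the repository.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_failures(failed_tests_path: Path) -> Counter:
    """Count failed/error entries per test file.

    Assumes each entry has a pytest-style `nodeid` such as
    "tests/mandatory/x/test_file.py::TestClass::test_name".
    """
    entries = json.loads(failed_tests_path.read_text())
    counts: Counter = Counter()
    for entry in entries:
        nodeid = entry.get("nodeid") or "<unknown>"
        # Everything before the first "::" is the test file path.
        test_file = nodeid.split("::", 1)[0]
        counts[test_file] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <output-dir>/failed-tests.json
    for test_file, count in summarize_failures(Path(sys.argv[1])).most_common():
        print(f"{count:3d}  {test_file}")
```

Grouping by file makes shared root causes easier to spot, for example when many failures in one test module trace back to the same method-name mismatch.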
+ +## Usage + +Run the default mandatory experiment: + +```bash +bash ./scripts/conformance.sh +``` + +Run a different TCK category: + +```bash +bash ./scripts/conformance.sh capabilities +``` + +Target an already running runtime instead of the local dummy-backed SUT: + +```bash +CONFORMANCE_SUT_URL=http://127.0.0.1:8000 \ +A2A_AUTH_TYPE=bearer \ +A2A_AUTH_TOKEN=dev-token \ +bash ./scripts/conformance.sh mandatory +``` + +## Artifacts + +Each run keeps the following artifacts in the selected output directory: + +- `agent-card.json`: fetched public Agent Card +- `health.json`: fetched authenticated health payload when the local SUT is used +- `tck.log`: raw TCK console output +- `pytest-report.json`: pytest-json-report output emitted by the TCK runner +- `failed-tests.json`: compact list of failed/error node IDs for triage +- `metadata.json`: experiment metadata including local repo commit and cached TCK commit + +## Interpretation Guidance + +When a TCK run fails, inspect the raw report before changing the runtime: + +- Some failures may point to real runtime gaps. +- Some failures may come from TCK assumptions that do not match `a2a-sdk==0.3.25`. +- Some failures may come from A2A v0.3 versus v1.0 naming or schema drift. + +The experiment is useful only if those categories stay separate during triage. + +The current first-pass triage is recorded in [`./conformance-triage.md`](./conformance-triage.md). diff --git a/scripts/README.md b/scripts/README.md index 43a4270..f091b1c 100644 --- a/scripts/README.md +++ b/scripts/README.md @@ -12,6 +12,7 @@ Executable scripts live in this directory. 
This file is the entry index for the ## Other Scripts - [`doctor.sh`](./doctor.sh): primary local development regression entrypoint (uv sync + lint + tests + coverage) +- [`conformance.sh`](./conformance.sh): local/manual external A2A conformance experiment entrypoint; caches the official TCK, can launch a dummy-backed local SUT, and preserves raw artifacts under `run/conformance/` - [`dependency_health.sh`](./dependency_health.sh): development dependency review entrypoint (`sync`/`pip check` + outdated + dev audit), while blocking CI/publish audits focus on runtime dependencies - [`check_coverage.py`](./check_coverage.py): enforces the overall coverage floor and per-file minimums for critical modules - [`lint.sh`](./lint.sh): lint helper @@ -20,3 +21,4 @@ Executable scripts live in this directory. This file is the entry index for the ## Notes - `doctor.sh` and `dependency_health.sh` intentionally remain separate entrypoints and share common prerequisites through [`health_common.sh`](./health_common.sh). +- External conformance experiments remain intentionally separate from the default regression path. See [`../docs/conformance.md`](../docs/conformance.md). diff --git a/scripts/conformance.sh b/scripts/conformance.sh new file mode 100644 index 0000000..b16c3dd --- /dev/null +++ b/scripts/conformance.sh @@ -0,0 +1,248 @@ +#!/usr/bin/env bash +# Run a local-only external A2A conformance experiment without changing default repo regression gates. +set -euo pipefail + +# shellcheck source=./health_common.sh +source "$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/health_common.sh" + +usage() { + cat <<'EOF' +Usage: + bash ./scripts/conformance.sh [category] + +Purpose: + Run the official A2A TCK as a local/manual experiment. + This script is intentionally separate from doctor.sh and CI quality gates. + +Category: + Defaults to "mandatory". Any category supported by a2aproject/a2a-tck run_tck.py is accepted. 
+ +Selected environment variables: + CONFORMANCE_OUTPUT_DIR Override artifact directory (default: run/conformance/) + CONFORMANCE_TCK_DIR Override cached TCK checkout path (default: .cache/a2a-tck) + CONFORMANCE_TCK_REPO Override TCK repo URL (default: https://github.com/a2aproject/a2a-tck.git) + CONFORMANCE_TCK_REF Override TCK git ref (default: main) + CONFORMANCE_TRANSPORTS Override requested transports (default: jsonrpc) + CONFORMANCE_TRANSPORT_STRATEGY Override TCK transport strategy (default: agent_preferred) + CONFORMANCE_SUT_URL Use an already running SUT instead of the local dummy-backed runtime + CONFORMANCE_SUT_PORT Override local dummy-backed SUT port (default: 8011) + CONFORMANCE_SKIP_REPO_SYNC=1 Skip uv sync/uv pip check for this repository + CONFORMANCE_SKIP_TCK_SYNC=1 Skip uv sync inside the cached TCK checkout + CONFORMANCE_AUTH_TYPE Default auth type when A2A_AUTH_TYPE is unset (default: bearer) + CONFORMANCE_AUTH_TOKEN Default auth token when A2A_AUTH_TOKEN is unset (default: test-token) + +Advanced authentication: + The script preserves any caller-provided A2A_AUTH_* variables and only sets defaults + for the common bearer-token case used by the local dummy-backed runtime. +EOF +} + +if [[ "${1:-}" == "--help" || "${1:-}" == "-h" ]]; then + usage + exit 0 +fi + +if [[ "$#" -gt 1 ]]; then + echo "Expected at most one positional argument: category" >&2 + exit 1 +fi + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +ROOT_DIR="$(cd "${SCRIPT_DIR}/.." 
&& pwd)" + +category="${1:-${CONFORMANCE_CATEGORY:-mandatory}}" +timestamp="$(date -u +%Y%m%dT%H%M%SZ)" +output_dir="${CONFORMANCE_OUTPUT_DIR:-${ROOT_DIR}/run/conformance/${timestamp}}" +tck_dir="${CONFORMANCE_TCK_DIR:-${ROOT_DIR}/.cache/a2a-tck}" +tck_repo="${CONFORMANCE_TCK_REPO:-https://github.com/a2aproject/a2a-tck.git}" +tck_ref="${CONFORMANCE_TCK_REF:-main}" +transport_strategy="${CONFORMANCE_TRANSPORT_STRATEGY:-agent_preferred}" +transports="${CONFORMANCE_TRANSPORTS:-jsonrpc}" +sut_port="${CONFORMANCE_SUT_PORT:-8011}" +repo_log="${output_dir}/repo-health.log" +tck_sync_log="${output_dir}/tck-sync.log" +sut_log="${output_dir}/sut.log" +tck_log="${output_dir}/tck.log" + +mkdir -p "${output_dir}" + +cleanup() { + local exit_code="$1" + if [[ -n "${sut_pid:-}" ]] && kill -0 "${sut_pid}" >/dev/null 2>&1; then + kill "${sut_pid}" >/dev/null 2>&1 || true + wait "${sut_pid}" >/dev/null 2>&1 || true + fi + exit "${exit_code}" +} + +trap 'cleanup $?' EXIT + +cd "${ROOT_DIR}" + +if [[ "${CONFORMANCE_SKIP_REPO_SYNC:-0}" != "1" ]]; then + run_shared_repo_health_prerequisites "conformance" >"${repo_log}" 2>&1 +fi + +mkdir -p "$(dirname "${tck_dir}")" +if [[ ! 
-d "${tck_dir}/.git" ]]; then + git clone --depth 1 "${tck_repo}" "${tck_dir}" >"${output_dir}/tck-clone.log" 2>&1 +fi + +git -C "${tck_dir}" fetch --depth 1 origin "${tck_ref}" >"${output_dir}/tck-fetch.log" 2>&1 +git -C "${tck_dir}" checkout --quiet FETCH_HEAD + +if [[ "${CONFORMANCE_SKIP_TCK_SYNC:-0}" != "1" ]]; then + ( + cd "${tck_dir}" + uv sync + ) >"${tck_sync_log}" 2>&1 +fi + +if [[ -z "${A2A_AUTH_TYPE:-}" ]]; then + export A2A_AUTH_TYPE="${CONFORMANCE_AUTH_TYPE:-bearer}" +fi +if [[ -z "${A2A_AUTH_TOKEN:-}" && "${A2A_AUTH_TYPE}" == "bearer" ]]; then + export A2A_AUTH_TOKEN="${CONFORMANCE_AUTH_TOKEN:-test-token}" +fi + +sut_url="${CONFORMANCE_SUT_URL:-}" +if [[ -z "${sut_url}" ]]; then + sut_url="http://127.0.0.1:${sut_port}" + export CONFORMANCE_SUT_PORT="${sut_port}" + export CONFORMANCE_SUT_URL="${sut_url}" + uv run python - <<'PY' >"${sut_log}" 2>&1 & +import uvicorn +import opencode_a2a.server.application as app_module + +from tests.support.helpers import DummyChatOpencodeUpstreamClient, make_settings + +app_module.OpencodeUpstreamClient = DummyChatOpencodeUpstreamClient +settings = make_settings( + a2a_host="127.0.0.1", + a2a_port=int(__import__("os").environ["CONFORMANCE_SUT_PORT"]), + a2a_public_url=__import__("os").environ["CONFORMANCE_SUT_URL"], + a2a_bearer_token=__import__("os").environ.get("A2A_AUTH_TOKEN", "test-token"), +) +app = app_module.create_app(settings) +uvicorn.run(app, host="127.0.0.1", port=settings.a2a_port, log_level="warning") +PY + sut_pid="$!" + + for _ in $(seq 1 50); do + if curl -fsS "${sut_url}/.well-known/agent-card.json" >"${output_dir}/agent-card.json"; then + if curl -fsS -H "Authorization: Bearer ${A2A_AUTH_TOKEN:-test-token}" "${sut_url}/health" \ + >"${output_dir}/health.json"; then + break + fi + fi + sleep 0.2 + done + + if [[ ! -f "${output_dir}/agent-card.json" || ! 
-f "${output_dir}/health.json" ]]; then + echo "SUT did not become ready at ${sut_url}" >&2 + cat "${sut_log}" >&2 || true + exit 1 + fi +else + curl -fsS "${sut_url}/.well-known/agent-card.json" >"${output_dir}/agent-card.json" +fi + +json_report_name="pytest-${category}.json" + +set +e +( + cd "${tck_dir}" + CONFORMANCE_CATEGORY="${category}" \ + CONFORMANCE_SUT_URL="${sut_url}" \ + CONFORMANCE_JSON_REPORT_NAME="${json_report_name}" \ + CONFORMANCE_TRANSPORT_STRATEGY="${transport_strategy}" \ + CONFORMANCE_TRANSPORTS="${transports}" \ + uv run python - <<'PY' +from __future__ import annotations + +import os +import run_tck + +raise SystemExit( + run_tck.run_test_category( + category=os.environ["CONFORMANCE_CATEGORY"], + sut_url=os.environ["CONFORMANCE_SUT_URL"], + verbose=False, + verbose_log=True, + generate_report=False, + json_report=os.environ["CONFORMANCE_JSON_REPORT_NAME"], + transport_strategy=os.environ["CONFORMANCE_TRANSPORT_STRATEGY"], + enable_equivalence_testing=None, + transports=os.environ["CONFORMANCE_TRANSPORTS"], + ) +) +PY +) 2>&1 | tee "${tck_log}" +tck_exit="${PIPESTATUS[0]}" +set -e + +report_path="${tck_dir}/reports/${json_report_name}" +if [[ -f "${report_path}" ]]; then + cp "${report_path}" "${output_dir}/pytest-report.json" +fi + +CONFORMANCE_CATEGORY="${category}" \ +CONFORMANCE_OUTPUT_DIR="${output_dir}" \ +CONFORMANCE_SUT_URL="${sut_url}" \ +CONFORMANCE_TCK_DIR="${tck_dir}" \ +CONFORMANCE_TCK_REF="${tck_ref}" \ +CONFORMANCE_TRANSPORTS="${transports}" \ +CONFORMANCE_TRANSPORT_STRATEGY="${transport_strategy}" \ +uv run python - <<'PY' +from __future__ import annotations + +import json +import os +import subprocess +from pathlib import Path + +output_dir = Path(os.environ["CONFORMANCE_OUTPUT_DIR"]) +report_path = output_dir / "pytest-report.json" + +metadata = { + "category": os.environ["CONFORMANCE_CATEGORY"], + "sut_url": os.environ["CONFORMANCE_SUT_URL"], + "tck_dir": os.environ["CONFORMANCE_TCK_DIR"], + "tck_ref": 
os.environ["CONFORMANCE_TCK_REF"], + "transports": os.environ["CONFORMANCE_TRANSPORTS"], + "transport_strategy": os.environ["CONFORMANCE_TRANSPORT_STRATEGY"], + "repo_commit": subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip(), + "tck_commit": subprocess.check_output( + ["git", "-C", os.environ["CONFORMANCE_TCK_DIR"], "rev-parse", "HEAD"], + text=True, + ).strip(), +} + +(output_dir / "metadata.json").write_text(json.dumps(metadata, indent=2) + "\n") + +if report_path.exists(): + report = json.loads(report_path.read_text()) + failures = [] + for test in report.get("tests", []): + outcome = test.get("outcome") + if outcome in {"failed", "error"}: + failures.append( + { + "nodeid": test.get("nodeid"), + "outcome": outcome, + "keywords": sorted(test.get("keywords", [])), + } + ) + (output_dir / "failed-tests.json").write_text(json.dumps(failures, indent=2) + "\n") +PY + +echo "Conformance artifacts: ${output_dir}" +echo "TCK log: ${tck_log}" +if [[ -f "${output_dir}/pytest-report.json" ]]; then + echo "Pytest JSON report: ${output_dir}/pytest-report.json" +fi +if [[ -f "${output_dir}/failed-tests.json" ]]; then + echo "Failed tests index: ${output_dir}/failed-tests.json" +fi + +exit "${tck_exit}" diff --git a/src/opencode_a2a/server/application.py b/src/opencode_a2a/server/application.py index c512446..5e74624 100644 --- a/src/opencode_a2a/server/application.py +++ b/src/opencode_a2a/server/application.py @@ -96,11 +96,13 @@ _looks_like_jsonrpc_envelope, _looks_like_jsonrpc_message_payload, _normalize_content_type, + _normalize_v1_jsonrpc_method_alias, _parse_content_length, _parse_json_body, _request_body_too_large_response, _RequestBodyTooLargeError, ) +from .rest_tasks import build_list_tasks_route from .state_store import ( build_interrupt_request_repository, build_session_state_repository, @@ -148,6 +150,7 @@ "_is_json_content_type", "_looks_like_jsonrpc_envelope", "_looks_like_jsonrpc_message_payload", + 
"_normalize_v1_jsonrpc_method_alias", "_normalize_content_type", "_normalize_log_level", "_parse_content_length", @@ -614,7 +617,12 @@ def create_app(settings: Settings) -> FastAPI: ) app.add_middleware(GZipMiddleware, minimum_size=settings.a2a_http_gzip_minimum_size) jsonrpc_app.add_routes_to_app(app) - for route, callback in rest_adapter.routes().items(): + rest_routes = rest_adapter.routes() + rest_routes[("/v1/tasks", "GET")] = build_list_tasks_route( + task_store=task_store, + default_protocol_version=settings.a2a_protocol_version, + ) + for route, callback in rest_routes.items(): app.add_api_route(route[0], callback, methods=[route[1]]) app.state._jsonrpc_app = jsonrpc_app app.state.task_store = task_store diff --git a/src/opencode_a2a/server/middleware.py b/src/opencode_a2a/server/middleware.py index 7d8a1d6..d3c9db5 100644 --- a/src/opencode_a2a/server/middleware.py +++ b/src/opencode_a2a/server/middleware.py @@ -34,6 +34,7 @@ _looks_like_jsonrpc_envelope, _looks_like_jsonrpc_message_payload, _normalize_content_type, + _normalize_v1_jsonrpc_method_alias, _parse_content_length, _parse_json_body, _request_body_too_large_response, @@ -123,6 +124,19 @@ def _error_protocol_version(request: Request) -> str: return raw_value.strip() return cast(str, settings.a2a_protocol_version) + def _uses_v1_jsonrpc_aliases(request: Request) -> bool: + negotiated = getattr(request.state, "a2a_protocol_version", None) + if negotiated == "1.0": + return True + + raw_value = request.headers.get("A2A-Version") or request.query_params.get("A2A-Version") + if not isinstance(raw_value, str) or not raw_value.strip(): + return False + try: + return normalize_protocol_version(raw_value) == "1.0" + except ValueError: + return False + @app.middleware("http") async def negotiate_a2a_protocol_version(request: Request, call_next): token: Token | None = None @@ -367,6 +381,46 @@ async def guard_rest_payload_shape(request: Request, call_next): if token is not None: 
_REQUEST_BODY_BYTES.reset(token) + @app.middleware("http") + async def normalize_v1_jsonrpc_method_aliases(request: Request, call_next): + token: Token | None = None + rewrite_token: Token | None = None + if ( + request.method != "POST" + or request.url.path != "/" + or not _uses_v1_jsonrpc_aliases(request) + ): + return await call_next(request) + + try: + body, token = await _get_request_body(request) + payload = _parse_json_body(body) + normalized_payload = _normalize_v1_jsonrpc_method_alias( + payload, + protocol_version="1.0", + ) + if normalized_payload is not None and normalized_payload is not payload: + normalized_body = json.dumps( + normalized_payload, + ensure_ascii=False, + separators=(",", ":"), + ).encode("utf-8") + request._body = normalized_body + rewrite_token = _REQUEST_BODY_BYTES.set(normalized_body) + return await call_next(request) + except _RequestBodyTooLargeError as error: + return _request_body_too_large_response( + path=request.url.path, + method=request.method, + error=error, + protocol_version=_error_protocol_version(request), + ) + finally: + if rewrite_token is not None: + _REQUEST_BODY_BYTES.reset(rewrite_token) + if token is not None: + _REQUEST_BODY_BYTES.reset(token) + @app.middleware("http") async def log_payloads(request: Request, call_next): token: Token | None = None diff --git a/src/opencode_a2a/server/request_parsing.py b/src/opencode_a2a/server/request_parsing.py index 4d76559..35717aa 100644 --- a/src/opencode_a2a/server/request_parsing.py +++ b/src/opencode_a2a/server/request_parsing.py @@ -15,6 +15,18 @@ logger = logging.getLogger(__name__) +_V1_JSONRPC_METHOD_ALIASES = { + "CancelTask": "tasks/cancel", + "CreateTaskPushNotificationConfig": "tasks/pushNotificationConfig/set", + "DeleteTaskPushNotificationConfig": "tasks/pushNotificationConfig/delete", + "GetExtendedAgentCard": "agent/getAuthenticatedExtendedCard", + "GetTask": "tasks/get", + "GetTaskPushNotificationConfig": "tasks/pushNotificationConfig/get", + 
"ListTaskPushNotificationConfigs": "tasks/pushNotificationConfig/list", + "SendMessage": "message/send", + "SendStreamingMessage": "message/stream", +} + def _parse_json_body(body_bytes: bytes) -> dict | None: try: @@ -92,6 +104,22 @@ def _looks_like_jsonrpc_envelope(payload: dict | None) -> bool: return isinstance(method, str) and isinstance(version, str) +def _normalize_v1_jsonrpc_method_alias( + payload: dict | None, *, protocol_version: str +) -> dict | None: + if payload is None or protocol_version != "1.0": + return payload + method = payload.get("method") + if not isinstance(method, str): + return payload + canonical_method = _V1_JSONRPC_METHOD_ALIASES.get(method) + if canonical_method is None or canonical_method == method: + return payload + normalized_payload = dict(payload) + normalized_payload["method"] = canonical_method + return normalized_payload + + class _RequestBodyTooLargeError(Exception): def __init__(self, *, limit: int, actual_size: int) -> None: super().__init__("Request body too large") diff --git a/src/opencode_a2a/server/rest_tasks.py b/src/opencode_a2a/server/rest_tasks.py new file mode 100644 index 0000000..278b968 --- /dev/null +++ b/src/opencode_a2a/server/rest_tasks.py @@ -0,0 +1,336 @@ +from __future__ import annotations + +import base64 +import json +import logging +from dataclasses import dataclass +from datetime import UTC, datetime + +from a2a.server.tasks.task_store import TaskStore +from a2a.types import Task, TaskState +from fastapi import Request +from fastapi.responses import JSONResponse + +from ..jsonrpc.error_responses import build_http_error_body +from .task_store import TaskStoreOperationError, list_stored_tasks + +logger = logging.getLogger(__name__) +_DEFAULT_LIST_TASKS_PAGE_SIZE = 50 +_MAX_LIST_TASKS_PAGE_SIZE = 100 +_MIN_LIST_TASKS_PAGE_SIZE = 1 + + +@dataclass(frozen=True) +class _TaskCursor: + task_id: str + timestamp: datetime + + +@dataclass(frozen=True) +class _ListTasksQuery: + cursor: _TaskCursor | None + 
context_id: str | None + include_artifacts: bool + history_length: int + requested_page_size: int + status: TaskState | None + status_timestamp_after: datetime | None + + +class _ListTasksValidationError(ValueError): + def __init__(self, *, field: str, message: str) -> None: + super().__init__(message) + self.field = field + self.message = message + + +def build_list_tasks_route( + *, + task_store: TaskStore, + default_protocol_version: str, +): + async def list_tasks_route(request: Request) -> JSONResponse: + protocol_version = getattr( + request.state, + "a2a_protocol_version", + default_protocol_version, + ) + try: + query = _parse_list_tasks_query(request) + tasks = await list_stored_tasks(task_store) + except _ListTasksValidationError as error: + return _invalid_argument_response( + field=error.field, + message=error.message, + protocol_version=protocol_version, + ) + except TaskStoreOperationError as error: + return JSONResponse( + build_http_error_body( + protocol_version=protocol_version, + status_code=500, + status="INTERNAL", + message="Task store unavailable while listing tasks.", + legacy_payload={ + "error": "Task store unavailable while listing tasks.", + "operation": error.operation, + }, + reason="TASK_STORE_UNAVAILABLE", + metadata={"operation": error.operation}, + ), + status_code=500, + ) + + filtered_tasks = _filter_tasks(tasks, query=query) + total_size = len(filtered_tasks) + paged_tasks = _apply_cursor(filtered_tasks, cursor=query.cursor) + page_tasks = paged_tasks[: query.requested_page_size] + next_page_token = "" + if len(paged_tasks) > len(page_tasks) and page_tasks: + next_page_token = _encode_page_token(page_tasks[-1]) + + return JSONResponse( + { + "tasks": [ + _serialize_task( + task, + history_length=query.history_length, + include_artifacts=query.include_artifacts, + ) + for task in page_tasks + ], + "nextPageToken": next_page_token, + "pageSize": len(page_tasks), + "totalSize": total_size, + } + ) + + return list_tasks_route + + 
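The route above pages with a keyset cursor rather than an offset: tasks are sorted newest first by `(status timestamp, task id)`, rows at or above the cursor are dropped, and a next-page token is emitted only when rows remain past the current page. A minimal sketch of that flow follows, assuming plain tuples in place of `Task` objects; the `page` and `ts` helpers are hypothetical names introduced for illustration and are not part of this change.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Hypothetical stand-in for stored tasks: (status timestamp, task id) pairs.
Row = tuple[datetime, str]


def page(rows: list[Row], cursor: Row | None, page_size: int) -> tuple[list[Row], Row | None]:
    """Return one page plus the cursor for the next page (None when exhausted)."""
    # Newest first, with the task id as a tie-breaker, mirroring _filter_tasks.
    ordered = sorted(rows, reverse=True)
    # Keep only rows strictly below the cursor, mirroring _apply_cursor.
    remaining = ordered if cursor is None else [row for row in ordered if row < cursor]
    current = remaining[:page_size]
    # Emit a cursor only when rows are left beyond this page.
    next_cursor = current[-1] if current and len(remaining) > len(current) else None
    return current, next_cursor


def ts(second: int) -> datetime:
    """Build a UTC timestamp on a fixed illustrative date."""
    return datetime(2024, 1, 1, 0, 0, second, tzinfo=timezone.utc)


rows = [(ts(1), "task-page-3"), (ts(2), "task-page-2"), (ts(3), "task-page-1")]
first, cursor = page(rows, None, 1)
rows.append((ts(4), "task-inserted-later"))  # arrives between the two requests
second, _ = page(rows, cursor, 1)
print(first[0][1], second[0][1])
```

Because the cursor encodes a position in the sort order instead of a row count, a task inserted ahead of the cursor does not shift the second page. That is the behavior `test_list_tasks_route_cursor_stays_stable_when_newer_task_is_inserted` checks.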
+def _filter_tasks(tasks: list[Task], *, query: _ListTasksQuery) -> list[Task]: + filtered = tasks + + if query.context_id is not None: + filtered = [task for task in filtered if task.context_id == query.context_id] + + if query.status is not None: + filtered = [task for task in filtered if task.status.state == query.status] + + if query.status_timestamp_after is not None: + filtered = [ + task for task in filtered if _task_sort_key(task)[0] >= query.status_timestamp_after + ] + + return sorted( + filtered, + key=_task_sort_key, + reverse=True, + ) + + +def _apply_cursor(tasks: list[Task], *, cursor: _TaskCursor | None) -> list[Task]: + if cursor is None: + return tasks + return [task for task in tasks if _task_sort_key(task) < _cursor_sort_key(cursor)] + + +def _serialize_task( + task: Task, + *, + history_length: int, + include_artifacts: bool, +) -> dict: + payload = task.model_dump(mode="json", by_alias=True, exclude_none=True) + + history = payload.get("history") + if history_length <= 0: + payload.pop("history", None) + elif isinstance(history, list): + payload["history"] = history[-history_length:] + + if not include_artifacts: + payload.pop("artifacts", None) + + return payload + + +def _parse_list_tasks_query(request: Request) -> _ListTasksQuery: + page_size_value = request.query_params.get("pageSize") + if page_size_value is None: + requested_page_size = _DEFAULT_LIST_TASKS_PAGE_SIZE + else: + requested_page_size = _parse_int(page_size_value, field="pageSize") + if not (_MIN_LIST_TASKS_PAGE_SIZE <= requested_page_size <= _MAX_LIST_TASKS_PAGE_SIZE): + raise _ListTasksValidationError( + field="pageSize", + message="pageSize must be between 1 and 100.", + ) + + history_length_value = request.query_params.get("historyLength") + if history_length_value is None: + history_length = 0 + else: + history_length = _parse_int(history_length_value, field="historyLength") + if history_length < 0: + raise _ListTasksValidationError( + field="historyLength", + 
message="historyLength must be greater than or equal to 0.", + ) + + include_artifacts = _parse_bool( + request.query_params.get("includeArtifacts"), + field="includeArtifacts", + default=False, + ) + cursor = _decode_page_token(request.query_params.get("pageToken")) + + status_value = request.query_params.get("status") + status = None + if status_value is not None: + try: + status = TaskState(status_value) + except ValueError as exc: + raise _ListTasksValidationError( + field="status", + message=f"Unsupported task status {status_value!r}.", + ) from exc + + status_timestamp_after = None + status_timestamp_after_value = request.query_params.get("statusTimestampAfter") + if status_timestamp_after_value is not None: + status_timestamp_after = _parse_timestamp( + status_timestamp_after_value, + field="statusTimestampAfter", + ) + + return _ListTasksQuery( + cursor=cursor, + context_id=request.query_params.get("contextId"), + include_artifacts=include_artifacts, + history_length=history_length, + requested_page_size=requested_page_size, + status=status, + status_timestamp_after=status_timestamp_after, + ) + + +def _parse_int(raw_value: str, *, field: str) -> int: + try: + return int(raw_value) + except ValueError as exc: + raise _ListTasksValidationError( + field=field, + message=f"{field} must be an integer.", + ) from exc + + +def _parse_bool(raw_value: str | None, *, field: str, default: bool) -> bool: + if raw_value is None: + return default + normalized = raw_value.strip().lower() + if normalized in {"true", "1"}: + return True + if normalized in {"false", "0"}: + return False + raise _ListTasksValidationError( + field=field, + message=f"{field} must be a boolean.", + ) + + +def _parse_timestamp(raw_value: str, *, field: str) -> datetime: + normalized = raw_value.strip() + if normalized.endswith("Z"): + normalized = normalized[:-1] + "+00:00" + try: + parsed = datetime.fromisoformat(normalized) + except ValueError as exc: + raise _ListTasksValidationError( + 
field=field, + message=f"{field} must be a valid ISO 8601 timestamp.", + ) from exc + if parsed.tzinfo is None: + parsed = parsed.replace(tzinfo=UTC) + return parsed.astimezone(UTC) + + +def _task_status_timestamp(task: Task) -> datetime: + timestamp = task.status.timestamp + if not timestamp: + return datetime.min.replace(tzinfo=UTC) + try: + return _parse_timestamp(timestamp, field="status.timestamp") + except _ListTasksValidationError: + logger.warning( + "Ignoring invalid task status timestamp while listing tasks task_id=%s timestamp=%r", + task.id, + timestamp, + ) + return datetime.min.replace(tzinfo=UTC) + + +def _task_sort_key(task: Task) -> tuple[datetime, str]: + return (_task_status_timestamp(task), task.id) + + +def _cursor_sort_key(cursor: _TaskCursor) -> tuple[datetime, str]: + return (cursor.timestamp, cursor.task_id) + + +def _decode_page_token(raw_value: str | None) -> _TaskCursor | None: + if raw_value is None or not raw_value.strip(): + return None + + normalized = raw_value.strip() + padding = "=" * (-len(normalized) % 4) + try: + decoded = base64.urlsafe_b64decode(normalized + padding).decode("utf-8") + payload = json.loads(decoded) + task_id = payload["id"] + timestamp = payload["timestamp"] + if not isinstance(task_id, str) or not task_id.strip(): + raise ValueError("id must be a non-empty string") + if not isinstance(timestamp, str) or not timestamp.strip(): + raise ValueError("timestamp must be a non-empty string") + return _TaskCursor( + task_id=task_id, + timestamp=_parse_timestamp(timestamp, field="pageToken.timestamp"), + ) + except Exception as exc: + raise _ListTasksValidationError( + field="pageToken", + message="pageToken is invalid.", + ) from exc + + +def _encode_page_token(task: Task) -> str: + payload = json.dumps( + { + "id": task.id, + "timestamp": _task_status_timestamp(task).isoformat(), + }, + separators=(",", ":"), + sort_keys=True, + ).encode("utf-8") + return base64.urlsafe_b64encode(payload).decode("utf-8").rstrip("=") 
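The page token produced above is compact JSON encoded as URL-safe base64 with the trailing `=` padding stripped, so the decoder has to restore the padding before decoding. The round trip can be sketched in isolation as below; `encode_token` and `decode_token` are hypothetical names for illustration, and the sketch leaves out the field validation that `_decode_page_token` performs.

```python
import base64
import json


def encode_token(task_id: str, timestamp: str) -> str:
    """Serialize the cursor as compact JSON, then as unpadded URL-safe base64."""
    raw = json.dumps(
        {"id": task_id, "timestamp": timestamp},
        separators=(",", ":"),
        sort_keys=True,
    ).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("utf-8").rstrip("=")


def decode_token(token: str) -> dict:
    """Restore the stripped padding, then reverse the encoding."""
    # Base64 text must be a multiple of 4 characters long; this re-adds 0 to 3 "=".
    padding = "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(token + padding).decode("utf-8"))


token = encode_token("task-new", "2024-01-01T00:00:02+00:00")
assert "=" not in token
assert decode_token(token) == {"id": "task-new", "timestamp": "2024-01-01T00:00:02+00:00"}
```

Stripping the padding keeps the token free of characters that need percent-encoding in a query string. The token is readable by any client, so it should be treated as an untrusted input on the way back in, which is why the real decoder validates it and maps any failure to a `pageToken` validation error.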
+ + +def _invalid_argument_response( + *, + field: str, + message: str, + protocol_version: str, +) -> JSONResponse: + return JSONResponse( + build_http_error_body( + protocol_version=protocol_version, + status_code=400, + status="INVALID_ARGUMENT", + message=message, + legacy_payload={"error": message, "field": field}, + reason="INVALID_LIST_TASKS_REQUEST", + metadata={"field": field}, + ), + status_code=400, + ) diff --git a/src/opencode_a2a/server/task_store.py b/src/opencode_a2a/server/task_store.py index 8d00f56..a0d03c8 100644 --- a/src/opencode_a2a/server/task_store.py +++ b/src/opencode_a2a/server/task_store.py @@ -358,6 +358,35 @@ def unwrap_task_store(task_store: TaskStore) -> TaskStore: return task_store +async def list_stored_tasks( + task_store: TaskStore, + context: ServerCallContext | None = None, +) -> list[Task]: + del context + raw_task_store = unwrap_task_store(task_store) + + try: + if isinstance(raw_task_store, InMemoryTaskStore): + async with raw_task_store.lock: + return list(raw_task_store.tasks.values()) + + if isinstance(raw_task_store, DatabaseTaskStore): + await raw_task_store._ensure_initialized() + async with raw_task_store.async_session_maker() as session: + stmt = select(raw_task_store.task_model) + result = await session.execute(stmt) + task_models = result.scalars().all() + return [raw_task_store._from_orm(task_model) for task_model in task_models] + + raise TypeError( + f"Unsupported task store type for listing tasks: {type(raw_task_store).__name__}" + ) + except TaskStoreOperationError: + raise + except Exception as exc: + raise TaskStoreOperationError("list", None) from exc + + def _configure_sqlite_connection(dbapi_connection: Any, _connection_record: Any) -> None: cursor = dbapi_connection.cursor() try: diff --git a/tests/jsonrpc/test_dispatch_registry.py b/tests/jsonrpc/test_dispatch_registry.py index 9bbb2ce..17f68ff 100644 --- a/tests/jsonrpc/test_dispatch_registry.py +++ b/tests/jsonrpc/test_dispatch_registry.py @@ -84,6 
+84,49 @@ async def _fake_base_handle(self, request): # noqa: ANN001 assert response.json() == {"delegated_method": "tasks/pushNotificationConfig/get"} +@pytest.mark.asyncio +@pytest.mark.parametrize( + ("alias_method", "canonical_method"), + ( + ("SendMessage", "message/send"), + ("SendStreamingMessage", "message/stream"), + ("GetTask", "tasks/get"), + ("CancelTask", "tasks/cancel"), + ("GetExtendedAgentCard", "agent/getAuthenticatedExtendedCard"), + ("GetTaskPushNotificationConfig", "tasks/pushNotificationConfig/get"), + ("ListTaskPushNotificationConfigs", "tasks/pushNotificationConfig/list"), + ("CreateTaskPushNotificationConfig", "tasks/pushNotificationConfig/set"), + ("DeleteTaskPushNotificationConfig", "tasks/pushNotificationConfig/delete"), + ), +) +async def test_v1_pascalcase_jsonrpc_aliases_delegate_to_canonical_methods( + monkeypatch, + alias_method: str, + canonical_method: str, +) -> None: + async def _fake_base_handle(self, request): # noqa: ANN001 + payload = await request.json() + return JSONResponse({"delegated_method": payload["method"]}) + + monkeypatch.setattr(A2AFastAPIApplication, "_handle_requests", _fake_base_handle) + app = app_module.create_app(make_settings(a2a_bearer_token="test-token", **_BASE_SETTINGS)) + + transport = httpx.ASGITransport(app=app) + async with httpx.AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.post( + "/", + headers={ + "Authorization": "Bearer test-token", + "A2A-Version": "1.0", + }, + json={"jsonrpc": "2.0", "id": 1, "method": alias_method, "params": {}}, + ) + + assert response.status_code == 200 + assert response.headers["A2A-Version"] == "1.0" + assert response.json() == {"delegated_method": canonical_method} + + @pytest.mark.asyncio async def test_extension_methods_stay_on_local_registry(monkeypatch) -> None: dummy = DummySessionQueryOpencodeUpstreamClient( diff --git a/tests/jsonrpc/test_jsonrpc_unsupported_method.py 
b/tests/jsonrpc/test_jsonrpc_unsupported_method.py index e7bd411..bc98f96 100644 --- a/tests/jsonrpc/test_jsonrpc_unsupported_method.py +++ b/tests/jsonrpc/test_jsonrpc_unsupported_method.py @@ -64,6 +64,26 @@ async def test_unsupported_method_uses_requested_protocol_version() -> None: assert "message/send" in body["error"]["data"]["supportedMethods"] +@pytest.mark.asyncio +async def test_pascalcase_jsonrpc_aliases_remain_unsupported_on_v03() -> None: + settings = make_settings(a2a_bearer_token="test-token") + app = create_app(settings) + transport = httpx.ASGITransport(app=app) + + async with httpx.AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.post( + "/", + headers={"Authorization": "Bearer test-token"}, + json={"jsonrpc": "2.0", "id": 123, "method": "SendMessage", "params": {}}, + ) + + assert response.status_code == 200 + body = response.json() + assert body["error"]["code"] == -32601 + assert body["error"]["data"]["method"] == "SendMessage" + assert "message/send" in body["error"]["data"]["supported_methods"] + + @pytest.mark.asyncio async def test_unsupported_v1_minor_version_returns_v1_error_details() -> None: settings = make_settings(a2a_bearer_token="test-token") diff --git a/tests/scripts/test_script_health_contract.py b/tests/scripts/test_script_health_contract.py index 238b6e9..eb8382c 100644 --- a/tests/scripts/test_script_health_contract.py +++ b/tests/scripts/test_script_health_contract.py @@ -1,6 +1,7 @@ from pathlib import Path DOCTOR_TEXT = Path("scripts/doctor.sh").read_text() +CONFORMANCE_TEXT = Path("scripts/conformance.sh").read_text() DEPENDENCY_HEALTH_TEXT = Path("scripts/dependency_health.sh").read_text() HEALTH_COMMON_TEXT = Path("scripts/health_common.sh").read_text() SMOKE_TEST_TEXT = Path("scripts/smoke_test_built_cli.sh").read_text() @@ -41,10 +42,19 @@ def test_dependency_health_keeps_dependency_review_scope() -> None: def test_scripts_index_documents_split_health_entrypoints() -> None: 
assert "local development regression entrypoint" in SCRIPTS_INDEX_TEXT + assert "external A2A conformance experiment entrypoint" in SCRIPTS_INDEX_TEXT assert "dependency review entrypoint" in SCRIPTS_INDEX_TEXT assert "health_common.sh" in SCRIPTS_INDEX_TEXT +def test_conformance_script_keeps_external_experiment_scope() -> None: + assert 'run_shared_repo_health_prerequisites "conformance"' in CONFORMANCE_TEXT + assert "Run the official A2A TCK as a local/manual experiment." in CONFORMANCE_TEXT + assert "This script is intentionally separate from doctor.sh" in CONFORMANCE_TEXT + assert "DummyChatOpencodeUpstreamClient" in CONFORMANCE_TEXT + assert "failed-tests.json" in CONFORMANCE_TEXT + + def test_smoke_test_requires_explicit_wheel_selection_when_dist_is_ambiguous() -> None: assert 'if [[ "$#" -gt 1 ]]; then' in SMOKE_TEST_TEXT assert ( diff --git a/tests/server/test_app_behaviors.py b/tests/server/test_app_behaviors.py index 20a1b99..bca7619 100644 --- a/tests/server/test_app_behaviors.py +++ b/tests/server/test_app_behaviors.py @@ -40,6 +40,7 @@ _looks_like_jsonrpc_message_payload, _normalize_content_type, _normalize_log_level, + _normalize_v1_jsonrpc_method_alias, _parse_content_length, _parse_json_body, _request_body_too_large_response, @@ -113,6 +114,28 @@ def test_request_payload_helpers_cover_edge_cases() -> None: assert _looks_like_jsonrpc_envelope(None) is False assert _looks_like_jsonrpc_envelope({"jsonrpc": "2.0", "method": "message/send"}) is True assert _looks_like_jsonrpc_envelope({"jsonrpc": 2, "method": "message/send"}) is False + assert _normalize_v1_jsonrpc_method_alias(None, protocol_version="1.0") is None + assert _normalize_v1_jsonrpc_method_alias( + {"jsonrpc": "2.0", "method": "SendMessage"}, + protocol_version="1.0", + ) == { + "jsonrpc": "2.0", + "method": "message/send", + } + assert _normalize_v1_jsonrpc_method_alias( + {"jsonrpc": "2.0", "method": "SendMessage"}, + protocol_version="0.3", + ) == { + "jsonrpc": "2.0", + "method": 
"SendMessage", + } + assert _normalize_v1_jsonrpc_method_alias( + {"jsonrpc": "2.0", "method": "message/send"}, + protocol_version="1.0", + ) == { + "jsonrpc": "2.0", + "method": "message/send", + } response = _request_body_too_large_response( path="/", diff --git a/tests/server/test_transport_contract.py b/tests/server/test_transport_contract.py index ca4c1e5..36424a1 100644 --- a/tests/server/test_transport_contract.py +++ b/tests/server/test_transport_contract.py @@ -1,11 +1,22 @@ import logging import types +from datetime import UTC, datetime, timedelta from unittest.mock import AsyncMock, MagicMock import httpx import pytest from a2a.server.apps.rest.rest_adapter import RESTAdapter -from a2a.types import TransportProtocol +from a2a.types import ( + Artifact, + Message, + Part, + Role, + Task, + TaskState, + TaskStatus, + TextPart, + TransportProtocol, +) from opencode_a2a.server.application import ( AUTHENTICATED_EXTENDED_CARD_CACHE_CONTROL, @@ -18,6 +29,41 @@ from tests.support.helpers import DummyChatOpencodeUpstreamClient, make_settings +def _task_for_listing( + *, + task_id: str, + context_id: str, + state: TaskState = TaskState.completed, + timestamp: str, + include_artifacts: bool = False, + history_size: int = 0, +) -> Task: + task = Task( + id=task_id, + context_id=context_id, + status=TaskStatus(state=state, timestamp=timestamp), + ) + if include_artifacts: + task.artifacts = [ + Artifact( + artifact_id=f"{task_id}-artifact", + parts=[Part(root=TextPart(text=f"artifact:{task_id}"))], + ) + ] + if history_size > 0: + task.history = [ + Message( + message_id=f"{task_id}-history-{index}", + role=Role.agent, + parts=[Part(root=TextPart(text=f"history:{task_id}:{index}"))], + context_id=context_id, + task_id=task_id, + ) + for index in range(history_size) + ] + return task + + def test_agent_card_declares_dual_stack_with_http_json_preferred() -> None: card = build_agent_card(make_settings(a2a_bearer_token="test-token")) @@ -53,6 +99,296 @@ def 
test_rest_adapter_exposes_sdk_rest_routes() -> None: assert "/v1/tasks/{id}:subscribe" in route_paths +@pytest.mark.asyncio +async def test_list_tasks_route_returns_paginated_results(monkeypatch) -> None: + import opencode_a2a.server.application as app_module + + monkeypatch.setattr(app_module, "OpencodeUpstreamClient", DummyChatOpencodeUpstreamClient) + app = app_module.create_app( + make_settings( + a2a_bearer_token="test-token", + a2a_task_store_backend="memory", + ) + ) + task_store = app.state.task_store + now = datetime.now(UTC) + await task_store.save( + _task_for_listing( + task_id="task-new", + context_id="ctx-list", + timestamp=(now + timedelta(seconds=2)).isoformat(), + include_artifacts=True, + history_size=3, + ) + ) + await task_store.save( + _task_for_listing( + task_id="task-old", + context_id="ctx-list", + timestamp=(now + timedelta(seconds=1)).isoformat(), + include_artifacts=True, + history_size=2, + ) + ) + await task_store.save( + _task_for_listing( + task_id="task-other", + context_id="ctx-other", + state=TaskState.working, + timestamp=now.isoformat(), + history_size=1, + ) + ) + + transport = httpx.ASGITransport(app=app) + headers = {"Authorization": "Bearer test-token"} + async with httpx.AsyncClient(transport=transport, base_url="http://test") as client: + first_page = await client.get( + "/v1/tasks", + headers=headers, + params={"contextId": "ctx-list", "pageSize": "1"}, + ) + + assert first_page.status_code == 200 + first_payload = first_page.json() + assert first_payload["totalSize"] == 2 + assert first_payload["pageSize"] == 1 + assert first_payload["tasks"][0]["id"] == "task-new" + assert "artifacts" not in first_payload["tasks"][0] + assert "history" not in first_payload["tasks"][0] + assert first_payload["nextPageToken"] + + second_page = await client.get( + "/v1/tasks", + headers=headers, + params={ + "contextId": "ctx-list", + "pageSize": "1", + "pageToken": first_payload["nextPageToken"], + }, + ) + + assert 
second_page.status_code == 200 + second_payload = second_page.json() + assert second_payload["totalSize"] == 2 + assert second_payload["pageSize"] == 1 + assert second_payload["tasks"][0]["id"] == "task-old" + assert second_payload["nextPageToken"] == "" + + +@pytest.mark.asyncio +async def test_list_tasks_route_cursor_stays_stable_when_newer_task_is_inserted( + monkeypatch, +) -> None: + import opencode_a2a.server.application as app_module + + monkeypatch.setattr(app_module, "OpencodeUpstreamClient", DummyChatOpencodeUpstreamClient) + app = app_module.create_app( + make_settings( + a2a_bearer_token="test-token", + a2a_task_store_backend="memory", + ) + ) + task_store = app.state.task_store + now = datetime.now(UTC) + await task_store.save( + _task_for_listing( + task_id="task-page-1", + context_id="ctx-cursor", + timestamp=(now + timedelta(seconds=3)).isoformat(), + ) + ) + await task_store.save( + _task_for_listing( + task_id="task-page-2", + context_id="ctx-cursor", + timestamp=(now + timedelta(seconds=2)).isoformat(), + ) + ) + await task_store.save( + _task_for_listing( + task_id="task-page-3", + context_id="ctx-cursor", + timestamp=(now + timedelta(seconds=1)).isoformat(), + ) + ) + + transport = httpx.ASGITransport(app=app) + headers = {"Authorization": "Bearer test-token"} + async with httpx.AsyncClient(transport=transport, base_url="http://test") as client: + first_page = await client.get( + "/v1/tasks", + headers=headers, + params={"contextId": "ctx-cursor", "pageSize": "1"}, + ) + first_payload = first_page.json() + assert first_payload["tasks"][0]["id"] == "task-page-1" + + await task_store.save( + _task_for_listing( + task_id="task-inserted-later", + context_id="ctx-cursor", + timestamp=(now + timedelta(seconds=4)).isoformat(), + ) + ) + + second_page = await client.get( + "/v1/tasks", + headers=headers, + params={ + "contextId": "ctx-cursor", + "pageSize": "1", + "pageToken": first_payload["nextPageToken"], + }, + ) + + assert second_page.status_code 
== 200 + second_payload = second_page.json() + assert second_payload["totalSize"] == 4 + assert second_payload["pageSize"] == 1 + assert second_payload["tasks"][0]["id"] == "task-page-2" + + +@pytest.mark.asyncio +async def test_list_tasks_route_supports_history_artifacts_and_filters(monkeypatch) -> None: + import opencode_a2a.server.application as app_module + + monkeypatch.setattr(app_module, "OpencodeUpstreamClient", DummyChatOpencodeUpstreamClient) + app = app_module.create_app( + make_settings( + a2a_bearer_token="test-token", + a2a_task_store_backend="memory", + ) + ) + task_store = app.state.task_store + now = datetime.now(UTC) + target_task = _task_for_listing( + task_id="task-filtered", + context_id="ctx-filtered", + state=TaskState.completed, + timestamp=(now + timedelta(seconds=1)).isoformat(), + include_artifacts=True, + history_size=4, + ) + await task_store.save(target_task) + await task_store.save( + _task_for_listing( + task_id="task-excluded-status", + context_id="ctx-filtered", + state=TaskState.failed, + timestamp=now.isoformat(), + include_artifacts=True, + history_size=2, + ) + ) + + transport = httpx.ASGITransport(app=app) + headers = {"Authorization": "Bearer test-token"} + async with httpx.AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get( + "/v1/tasks", + headers=headers, + params={ + "contextId": "ctx-filtered", + "status": "completed", + "historyLength": "2", + "includeArtifacts": "true", + "statusTimestampAfter": now.isoformat(), + }, + ) + + assert response.status_code == 200 + payload = response.json() + assert payload["totalSize"] == 1 + assert payload["pageSize"] == 1 + assert payload["nextPageToken"] == "" + returned_task = payload["tasks"][0] + assert returned_task["id"] == "task-filtered" + assert len(returned_task["history"]) == 2 + assert returned_task["artifacts"][0]["artifactId"] == "task-filtered-artifact" + + +@pytest.mark.asyncio +async def 
test_list_tasks_route_tolerates_invalid_stored_status_timestamp(monkeypatch) -> None: + import opencode_a2a.server.application as app_module + + monkeypatch.setattr(app_module, "OpencodeUpstreamClient", DummyChatOpencodeUpstreamClient) + app = app_module.create_app( + make_settings( + a2a_bearer_token="test-token", + a2a_task_store_backend="memory", + ) + ) + task_store = app.state.task_store + now = datetime.now(UTC) + await task_store.save( + _task_for_listing( + task_id="task-valid-ts", + context_id="ctx-invalid-ts", + timestamp=now.isoformat(), + ) + ) + await task_store.save( + _task_for_listing( + task_id="task-invalid-ts", + context_id="ctx-invalid-ts", + timestamp="not-a-timestamp", + ) + ) + + transport = httpx.ASGITransport(app=app) + headers = {"Authorization": "Bearer test-token"} + async with httpx.AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get( + "/v1/tasks", + headers=headers, + params={"contextId": "ctx-invalid-ts"}, + ) + + assert response.status_code == 200 + payload = response.json() + assert payload["totalSize"] == 2 + assert [task["id"] for task in payload["tasks"]] == ["task-valid-ts", "task-invalid-ts"] + + +@pytest.mark.asyncio +async def test_list_tasks_route_validates_query_parameters(monkeypatch) -> None: + import opencode_a2a.server.application as app_module + + monkeypatch.setattr(app_module, "OpencodeUpstreamClient", DummyChatOpencodeUpstreamClient) + app = app_module.create_app( + make_settings( + a2a_bearer_token="test-token", + a2a_task_store_backend="memory", + ) + ) + transport = httpx.ASGITransport(app=app) + headers = {"Authorization": "Bearer test-token"} + + async with httpx.AsyncClient(transport=transport, base_url="http://test") as client: + page_size_error = await client.get( + "/v1/tasks", + headers=headers, + params={"pageSize": "0"}, + ) + page_token_error = await client.get( + "/v1/tasks", + headers=headers, + params={"pageToken": "invalid-token"}, + ) + + assert 
page_size_error.status_code == 400 + assert page_size_error.json() == { + "error": "pageSize must be between 1 and 100.", + "field": "pageSize", + } + assert page_token_error.status_code == 400 + assert page_token_error.json() == { + "error": "pageToken is invalid.", + "field": "pageToken", + } + + def test_openapi_rest_message_routes_include_schema_and_examples() -> None: app = create_app(make_settings(a2a_bearer_token="test-token")) openapi = app.openapi() @@ -377,6 +713,38 @@ async def test_dual_stack_send_accepts_transport_native_payloads(monkeypatch) -> assert rpc_resp.json().get("error") is None +@pytest.mark.asyncio +async def test_v1_pascalcase_sendmessage_alias_is_accepted(monkeypatch) -> None: + import opencode_a2a.server.application as app_module + + monkeypatch.setattr(app_module, "OpencodeUpstreamClient", DummyChatOpencodeUpstreamClient) + app = app_module.create_app(make_settings(a2a_bearer_token="test-token")) + transport = httpx.ASGITransport(app=app) + headers = { + "Authorization": "Bearer test-token", + "A2A-Version": "1.0", + } + alias_payload = { + "jsonrpc": "2.0", + "id": 7, + "method": "SendMessage", + "params": { + "message": { + "messageId": "m-rpc-v1", + "role": "user", + "parts": [{"kind": "text", "text": "hello from v1 alias"}], + } + }, + } + + async with httpx.AsyncClient(transport=transport, base_url="http://test") as client: + rpc_resp = await client.post("/", headers=headers, json=alias_payload) + + assert rpc_resp.status_code == 200 + assert rpc_resp.headers["A2A-Version"] == "1.0" + assert rpc_resp.json().get("error") is None + + @pytest.mark.asyncio async def test_dual_stack_send_rejects_cross_transport_payload_shapes(monkeypatch) -> None: import opencode_a2a.server.application as app_module