Skip to content

fix(cloud): handle unsigned S3 metadata headers in presigned URL uploads#1314

Merged
even-wei merged 4 commits intomainfrom
fix/cloud-smoke-test-staging
Apr 15, 2026
Merged

fix(cloud): handle unsigned S3 metadata headers in presigned URL uploads#1314
even-wei merged 4 commits intomainfrom
fix/cloud-smoke-test-staging

Conversation

@kentwelcome
Copy link
Copy Markdown
Member

@kentwelcome kentwelcome commented Apr 15, 2026

PR checklist

  • Ensure you have added or ran the appropriate tests for your PR.
  • DCO signed

What type of PR is this?
Bug fix

What this PR does / why we need it:

Fixes two S3 upload errors encountered on staging:

  1. AccessDenied – headers not signed — The OSS client sends x-amz-tagging and x-amz-meta-* headers with every S3 PUT, but the cloud backend's presigned URL may not sign them. S3 rejects unsigned headers.

  2. InvalidArgument – must provide an appropriate secret key — A naive fix that filters all unsigned headers also strips SSE-C encryption headers, causing S3 to receive the algorithm header without the key.

Root cause: The OSS client always adds metadata headers (x-amz-tagging, x-amz-meta-commit, x-amz-meta-dbt_version) to S3 uploads regardless of whether the presigned URL was signed for them. The metadata is already sent to the cloud API in the POST body, so these client-side S3 headers are optional.

Fix: Add filter_headers_for_presigned_url() which:

  • Parses X-Amz-SignedHeaders from the presigned URL
  • Drops only unsigned metadata headers (x-amz-tagging, x-amz-meta-*)
  • Never filters SSE-C or other non-metadata headers

This is backward-compatible: if the server signs metadata headers, they're still sent.

Changes:

File Change
recce/state/cloud.py Add get_signed_headers(), filter_headers_for_presigned_url(). Applied in _upload_state_to_url() and _upload_state_to_recce_cloud()
recce/state/__init__.py Export new function
recce/artifact.py Applied filter in upload_dbt_artifacts()
tests/state/test_cloud.py 8 new tests covering signed header parsing, metadata filtering, SSE-C preservation, and fallback

Also includes the CI fix for staging cloud smoke test token (integration-tests-cloud.yaml).

Which issue(s) this PR fixes:

Special notes for your reviewer:
The filter only targets metadata headers by prefix. SSE-C headers are always passed through to avoid partial-stripping errors.

Does this PR introduce a user-facing change?:
NONE

Signed-off-by: Kent Huang <kent@infuseai.io>
Copilot AI review requested due to automatic review settings April 15, 2026 03:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the staging Recce Cloud smoke test workflow to use the correct GitHub token secret and standardizes YAML quoting in the workflow triggers.

Changes:

  • Update the staging smoke test job to use secrets.RECCE_CLOUD_TOKEN instead of secrets.RECCE_CLOUD_TOKEN_STAGING.
  • Normalize YAML string quoting for the scheduled cron expression and workflow dispatch input description.

Comment thread .github/workflows/integration-tests-cloud.yaml Outdated
…loads

The cloud backend may generate presigned URLs that don't sign
x-amz-tagging and x-amz-meta-* headers. Sending unsigned headers
causes S3 to reject with AccessDenied. This adds
filter_headers_for_presigned_url() which parses X-Amz-SignedHeaders
from the presigned URL and drops only unsigned metadata headers.

SSE-C encryption headers are never filtered — stripping them partially
causes 'InvalidArgument: must provide an appropriate secret key'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
@kentwelcome kentwelcome changed the title [Fix] Staging cloud smoke test token issue fix(cloud): handle unsigned S3 metadata headers in presigned URL uploads Apr 15, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines Coverage Δ
recce/artifact.py 26.74% <100.00%> (+15.63%) ⬆️
recce/state/__init__.py 100.00% <ø> (ø)
recce/state/cloud.py 72.42% <100.00%> (+6.73%) ⬆️
tests/state/test_cloud.py 99.84% <100.00%> (+0.03%) ⬆️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

kentwelcome and others added 2 commits April 15, 2026 11:58
Signed-off-by: Kent Huang <kent@infuseai.io>
Add integration-level tests for the two uncovered call sites:
- upload_dbt_artifacts() in artifact.py
- RecceCloudStateManager._upload_state_to_recce_cloud() in cloud.py

Both verify that unsigned metadata headers are dropped while SSE-C
headers are preserved when the presigned URL doesn't sign them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent Huang <kent@infuseai.io>
@kentwelcome
Copy link
Copy Markdown
Member Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

@kentwelcome
Copy link
Copy Markdown
Member Author

Code Review — PR #1314

Summary

Adds filter_headers_for_presigned_url() to drop unsigned S3 metadata headers (x-amz-tagging, x-amz-meta-*) from upload requests while preserving SSE-C encryption headers. Applied at all three upload sites. Well-tested with 10 new tests covering parsing, filtering, SSE-C preservation, fallback, and integration paths. CI workflow token change is straightforward.

Findings

No critical issues found.

Verification results:

  • make flake8 — passed
  • pytest tests/state/test_cloud.py tests/test_cloud_upload.py — 101 passed
  • Filter logic correctly preserves SSE-C headers and only drops metadata-prefix headers
  • Fallback behavior (send all headers when X-Amz-SignedHeaders absent) is safe — matches pre-PR behavior

Verdict

Approved.

@kentwelcome kentwelcome self-assigned this Apr 15, 2026
@kentwelcome kentwelcome requested a review from even-wei April 15, 2026 04:12
@even-wei
Copy link
Copy Markdown
Contributor

Code Review: PR #1314

Files reviewed: 7 (5 in PR diff + 2 valuediff changes on branch)
Categories: Business logic, Config/infra, Tests
Passes run: A, B, C, D, E, F

Validation Results

Pass A: Correctness & Logic — PASS

The header filtering logic is sound:

  • get_signed_headers() correctly parses X-Amz-SignedHeaders from presigned URLs, handling both capitalized and lowercase param names via fallback (qs.get("X-Amz-SignedHeaders", qs.get("x-amz-signedheaders", [""]))).
  • filter_headers_for_presigned_url() uses a clean predicate: only x-amz-tagging and x-amz-meta-* are subject to filtering. SSE-C headers always pass through because they don't match the metadata prefix tuple. This is the correct approach — it avoids the "strip SSE algorithm header but not the key" trap described in the PR.
  • Fallback when signed is empty (no X-Amz-SignedHeaders in URL) returns all headers unchanged — preserves backward compatibility with presigned URLs that don't use SigV4 or older server versions.
  • The valuediff adapter.quote() change correctly wraps column identifiers, preventing SQL parsers from misinterpreting 47_1_TransId as the numeric literal 47.

Pass B: Security — PASS

  • CI token change from RECCE_CLOUD_TOKEN/RECCE_CLOUD_TOKEN_STAGING to DEV_INFUSEAI_GITHUB_TOKEN uses secrets.* references appropriately.
  • No secrets, PII, or credentials exposed.
  • The header filter does not weaken encryption — SSE-C headers are explicitly preserved.

Pass C: Cross-Reference Consistency — PASS

Verified all upload call sites:

Upload path Sends metadata headers? Has filter?
CloudStateLoader._upload_state_to_url Yes (password + metadata) cloud.py:523
RecceCloudStateManager._upload_state_to_recce_cloud Yes (password + metadata) cloud.py:611
artifact.upload_dbt_artifacts Yes (password + metadata) artifact.py:188
artifact.upload_artifacts_to_session No (raw PUT, no x-amz headers) N/A
cli.py CLL map/cache uploads No (Content-Type only) N/A
server.py session manifest/catalog No (raw PUT) N/A

All three paths that construct metadata headers now apply the filter. All paths that don't use metadata headers are correctly left unchanged.

filter_headers_for_presigned_url is properly exported in recce/state/__init__.py (both in imports and __all__) and imported in recce/artifact.py.

Pass D: Error Handling & Edge Cases — PASS

Edge cases handled correctly:

  • URL without X-Amz-SignedHeaders → empty set → fallback returns all headers
  • parse_qs returning multiple values → [0] takes the first (S3 URLs only have one)
  • Empty string in raw.split(";") → filtered out by if h.strip() in the set comprehension

Pass E: Test Coverage & Quality — PASS

Strong test coverage:

  • TestGetSignedHeaders (3 tests): standard URL, missing param, case-insensitive param name
  • TestFilterHeadersForPresignedUrl (5 tests): unsigned metadata filtered, all-signed kept, no-signed-param fallback, SSE-C preserved when unsigned, non-metadata headers preserved
  • TestRecceCloudStateManagerUploadFiltersHeaders (1 test): end-to-end verification that _upload_state_to_recce_cloud drops unsigned metadata
  • TestUploadDbtArtifactsFiltersHeaders (1 test): end-to-end verification for artifact.py path
  • Valuediff regression tests (2 tests): digit-starting single key and composite key

The existing TestUploadIncludesMetaHeaders test continues to work correctly because its mock URL (http://presigned.url) has no X-Amz-SignedHeaders, triggering the fallback that passes all headers through.

Pass F: Diff-Specific Checks — PASS

  • PR body accurately describes the 4 core files and the CI fix
  • The valuediff quoting changes (commit 814afcc4) appear in the branch diff but not in the GitHub PR file list — they may have been merged to main separately
  • No partial renames or stale references

Verification Results

All CI checks pass (21/21):

  • ✅ Check Code Style with Flake8
  • ✅ Test Python Versions (3.11, 3.12, 3.13)
  • ✅ Test DBT Versions
  • ✅ smoke-test (3.11/1.7, 3.12/1.8, 3.13/latest)
  • ✅ smoke-test-cloud (3.11/1.8, 3.13/latest)
  • ✅ CodeQL (Python, JavaScript, Actions)
  • ✅ DCO
  • ✅ codecov/patch + codecov/project

Verdict: GO

Clean, well-designed fix. The filter correctly targets only metadata headers by prefix, preserving SSE-C and other non-metadata headers. The fallback behavior ensures backward compatibility. Test coverage is thorough with both unit tests and call-site integration tests.

Notes (informational only)

  1. NOTE: _METADATA_HEADER_PREFIXES = ("x-amz-tagging", "x-amz-meta-") uses startswith() which means any header starting with "x-amz-tagging" (e.g., hypothetical x-amz-tagging-directive) would also be filtered. S3's x-amz-tagging-directive exists for COPY operations but not PUT uploads, so this is safe in practice. (recce/state/cloud.py:50)

Copy link
Copy Markdown
Contributor

@even-wei even-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review: No critical issues found. Clean fix with thorough test coverage.

@even-wei even-wei merged commit 36bff85 into main Apr 15, 2026
21 checks passed
@even-wei even-wei deleted the fix/cloud-smoke-test-staging branch April 15, 2026 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants