Feat(Storage): Enable full object checksum validation on JSON path #8825

Open
thiyaguk09 wants to merge 10 commits into googleapis:main from thiyaguk09:feat/enable-full-checksum-validation

@thiyaguk09 thiyaguk09 commented Dec 26, 2025

Enhanced Checksum Validation & Header Logic

This PR implements comprehensive MD5 and CRC32c checksum validation for object uploads, verifying data integrity via the X-Goog-Hash header. It refactors the upload architecture to handle hashes dynamically across different upload strategies.

Key Technical Changes

1. Core Library Enhancements (google-cloud-core)

  • ResumableUploader & StreamableUploader: Added type-safe logic (int)($rangeEnd + 1) === (int)$size to ensure X-Goog-Hash is transmitted only on the final chunk/request, preventing intermediate validation errors.
  • MultipartUploader: Standardized header merging to ensure hashes calculated by the connection layer are always included in single-shot uploads.
  • Header Integrity: Refactored restOptions merging to ensure custom metadata and encryption headers are preserved alongside checksums.
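
The final-chunk guard described above can be sketched roughly as follows. `isFinalChunk` is a hypothetical helper written for illustration, not a function in google-cloud-core. It assumes `$rangeEnd` is the inclusive, zero-based offset of the last byte in the current chunk and that a size of `'*'` means the total length is not yet known.

```php
<?php

/**
 * Hypothetical helper (not part of google-cloud-core): returns true when the
 * chunk ending at $rangeEnd (inclusive, zero-based) completes an upload of
 * $size bytes. A size of '*' means "unknown", so no chunk counts as final.
 */
function isFinalChunk(int $rangeEnd, int|string $size): bool
{
    if ($size === '*') {
        return false;
    }

    // Cast before comparing so that a numeric string such as '100' still
    // matches under strict comparison.
    return ($rangeEnd + 1) === (int) $size;
}
```

Under these assumptions, a caller would attach X-Goog-Hash only when this returns true, so intermediate chunks are sent without a checksum.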

2. Storage Package Improvements (google-cloud-storage)

  • Automatic Hashing: Implemented logic to calculate missing MD5 or CRC32c hashes when the validate option is enabled.
  • Validation Logic: Updated Bucket::upload() to honor user-provided checksums and prevent redundant re-calculation.
  • Test Coverage: Added unit tests in BucketTest and RestTest to verify hash behavior in resumable, streamable, and multipart scenarios.
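
As a rough illustration of "calculate what is missing, keep what the user provided", the sketch below fills in absent checksums on a metadata array. `withChecksums` is a hypothetical name, and it hashes an in-memory string with PHP's built-in `hash()` for brevity, whereas the PR operates on streams through its own helpers.

```php
<?php

/**
 * Hypothetical helper: fill in whichever of md5Hash / crc32c the caller did
 * not supply, leaving user-provided values untouched.
 */
function withChecksums(string $data, array $metadata = []): array
{
    // Cloud Storage expects base64 of the raw binary digest, not of the hex
    // string. The ??= operator skips the hash call when a value already exists.
    $metadata['md5Hash'] ??= base64_encode(hash('md5', $data, true));

    // 'crc32c' (Castagnoli) is a built-in hash algorithm as of PHP 7.4.
    $metadata['crc32c'] ??= base64_encode(hash('crc32c', $data, true));

    return $metadata;
}

$computed = withChecksums('');
$provided = withChecksums('hello', ['md5Hash' => 'caller-supplied']);
```

Here `$provided['md5Hash']` keeps the caller's value, and only the CRC32c entry is computed.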

Note

CI "Lowest Dependencies" Failure: This failure occurs because the CI environment pulls the tagged version of google-cloud-core from Packagist instead of using the local changes in this PR. This will resolve once the Core changes are merged.

@product-auto-label product-auto-label bot added the api: storage label Dec 26, 2025
@thiyaguk09 thiyaguk09 force-pushed the feat/enable-full-checksum-validation branch 2 times, most recently from d61822d to 4091fcc March 23, 2026 11:33
Refactor Resumable, Streamable, and Multipart uploaders to ensure
integrity headers (X-Goog-Hash) are only attached to the request
when an upload is being finalized.

- In StreamableUploader, introduced `$isFinalRequest` to track
  intent before writeSize recalculations.
- In ResumableUploader, added a boundary check to only attach
  the hash when the current range matches the total file size.
- Aligns with GCS best practices for resumable upload integrity.
@thiyaguk09 thiyaguk09 force-pushed the feat/enable-full-checksum-validation branch from 4091fcc to 2ac9aad March 23, 2026 11:40
@thiyaguk09 thiyaguk09 marked this pull request as ready for review March 23, 2026 12:54
@thiyaguk09 thiyaguk09 requested review from a team as code owners March 23, 2026 12:54
@Dhriti07 Dhriti07 requested a review from nidhiii-27 March 24, 2026 08:37
Contributor

@bshaffer bshaffer left a comment

Just one question and a few code-cleanup nits! Otherwise this looks great

Comment on lines +503 to +525
$md5Hash = null;
$crc32c = null;

if ($validate !== false) {
    $md5Hash = base64_encode(Utils::hash($args['data'], 'md5', true));
    $crc32c = $this->crcFromStream($args['data']);

    if ($validate === 'md5') {
        $args['metadata']['md5Hash'] = $md5Hash;
    } elseif ($validate === 'crc32') {
        $args['metadata']['crc32c'] = $crc32c;
    }
}

// Prepare the X-Goog-Hash header string
$xGoogHash = [];
if ($crc32c) {
    $xGoogHash[] = 'crc32c=' . $crc32c;
}
if ($md5Hash) {
    $xGoogHash[] = 'md5=' . $md5Hash;
}
$xGoogHashHeader = implode(',', $xGoogHash);

Contributor

@bshaffer bshaffer Mar 31, 2026

This logic is a bit hard to follow. I think it could be improved by 1) placing everything inside the if($validate) block, and 2) forming the $xGoogHash in a single statement

        $xGoogHashHeader = '';
        if ($validate !== false) {
            $md5Hash = base64_encode(Utils::hash($args['data'], 'md5', true));
            $crc32c = $this->crcFromStream($args['data']);

            // Add validation metadata
            if ($validate === 'md5') {
                $args['metadata']['md5Hash'] = $md5Hash;
            } elseif ($validate === 'crc32') {
                $args['metadata']['crc32c'] = $crc32c;
            }

            // Prepare the X-Goog-Hash header string
            $xGoogHashHeader = implode(',', array_filter([
                $md5Hash ? 'md5=' . $md5Hash : null,
                $crc32c ? 'crc32c=' . $crc32c : null,
            ]));
        }

Contributor Author

I've refactored the hash logic to stay within the $validate block and used the array_filter approach for a cleaner implode.

Comment on lines +553 to +554
$args['uploaderOptions']['restOptions'] ??= [];
$args['uploaderOptions']['restOptions']['headers'] ??= [];

Contributor

Fun fact: thanks to "autovivification", these two lines aren't necessary as long as you modify your line below (see my other comment).

Suggested change
$args['uploaderOptions']['restOptions'] ??= [];
$args['uploaderOptions']['restOptions']['headers'] ??= [];

Contributor Author

Good point on autovivification—I've removed the explicit array initializations and updated the array_merge to use the null coalescing operator instead. This definitely cleans up the Rest.php logic.


if (!empty($args['headers'])) {
    $args['uploaderOptions']['restOptions']['headers'] = array_merge(
        $args['uploaderOptions']['restOptions']['headers'],

Contributor

This will work as expected thanks to autovivification:

Suggested change
$args['uploaderOptions']['restOptions']['headers'],
$args['uploaderOptions']['restOptions']['headers'] ?? [],

Contributor Author

Updated—thanks for the tip! I've updated the logic to leverage autovivification and simplified the initialization.
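
For readers unfamiliar with the behavior discussed in this thread, here is a small standalone sketch. The `$args` contents are made up for illustration. It shows why the explicit `??= []` initializations can be dropped: assigning to a nested key creates the intermediate arrays, while reading a missing key still needs the `?? []` fallback.

```php
<?php

// Illustrative input only: no 'uploaderOptions' key exists yet.
$args = ['headers' => ['X-Goog-Hash' => 'crc32c=AAAAAA==']];

// The read on the right-hand side uses ?? [] because the nested key is
// undefined. The write on the left-hand side needs no setup: PHP creates
// 'uploaderOptions' and 'restOptions' automatically on assignment.
$args['uploaderOptions']['restOptions']['headers'] = array_merge(
    $args['uploaderOptions']['restOptions']['headers'] ?? [],
    $args['headers']
);
```

After this runs, `$args['uploaderOptions']['restOptions']['headers']` holds the merged headers without any warning about undefined indexes.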

if ($size !== '*' && ($rangeEnd + 1) == (int) $size) {
    $customHeaders = $this->requestOptions['restOptions']['headers'] ?? [];
    if (isset($customHeaders['X-Goog-Hash'])) {
        $headers['X-Goog-Hash'] = $customHeaders['X-Goog-Hash'];

Contributor

Is there a reason we are only setting this one header instead of supporting all custom headers?

Contributor Author

You're right. The intent was to isolate the X-Goog-Hash to the final chunk to avoid intermediate validation errors, but this implementation accidentally drops other custom headers. I will refactor this to merge all $customHeaders while specifically unsetting X-Goog-Hash only on non-final chunks.
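
One possible shape for the refactor described in this reply is sketched below. `buildChunkHeaders` is a hypothetical function, not code from the PR. It assumes the header key is spelled exactly `X-Goog-Hash` (PHP array keys are case-sensitive) and that headers generated by the uploader, such as Content-Range, should take precedence over custom ones with the same name.

```php
<?php

/**
 * Hypothetical helper: merge all custom headers into a chunk request, but
 * send X-Goog-Hash only on the final chunk.
 *
 * @param array<string, string> $uploaderHeaders Headers the uploader sets itself.
 * @param array<string, string> $customHeaders   Caller-supplied headers.
 */
function buildChunkHeaders(array $uploaderHeaders, array $customHeaders, bool $isFinal): array
{
    if (!$isFinal) {
        // Intermediate chunks must not carry the full-object checksum.
        // $customHeaders is a by-value copy, so the caller's array is unchanged.
        unset($customHeaders['X-Goog-Hash']);
    }

    // Assumption: with string keys, later arrays win in array_merge, so
    // uploader-generated headers override custom ones of the same name.
    return array_merge($customHeaders, $uploaderHeaders);
}

$custom = ['X-Goog-Hash' => 'crc32c=AAAAAA==', 'X-Custom' => 'kept'];
$middle = buildChunkHeaders(['Content-Range' => 'bytes 0-49/100'], $custom, false);
$last = buildChunkHeaders(['Content-Range' => 'bytes 50-99/100'], $custom, true);
```

In this sketch `$middle` keeps `X-Custom` but omits the hash, while `$last` carries both.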

if ($isFinalRequest) {
    $customHeaders = $this->requestOptions['restOptions']['headers'] ?? [];
    if (isset($customHeaders['X-Goog-Hash'])) {
        $headers['X-Goog-Hash'] = $customHeaders['X-Goog-Hash'];

Contributor

Same question here: is there a reason we're only supporting the one custom header instead of supporting all custom headers?

Contributor Author

The intent was to isolate the X-Goog-Hash to the final chunk to avoid intermediate validation errors, but this implementation accidentally drops other custom headers. I will refactor this to merge all $customHeaders while specifically unsetting X-Goog-Hash only on non-final chunks.

- Refactor Rest.php hash calculation to be more concise using
  array_filter.
- Remove redundant array initializations in Rest.php by utilizing PHP
  autovivification.
- Improve readability of X-Goog-Hash header generation.
@thiyaguk09 thiyaguk09 requested a review from bshaffer April 1, 2026 12:44

Labels

api: storage Issues related to the Cloud Storage API.

2 participants