Feat(Storage): Enable full object checksum validation on JSON path#8825
Feat(Storage): Enable full object checksum validation on JSON path#8825thiyaguk09 wants to merge 10 commits intogoogleapis:mainfrom
Conversation
d61822d to
4091fcc
Compare
Refactor Resumable, Streamable, and Multipart uploaders to ensure integrity headers (X-Goog-Hash) are only attached to the request when an upload is being finalized. - In StreamableUploader, introduced `$isFinalRequest` to track intent before writeSize recalculations. - In ResumableUploader, added a boundary check to only attach the hash when the current range matches the total file size. - Aligns with GCS best practices for resumable upload integrity.
4091fcc to
2ac9aad
Compare
bshaffer
left a comment
There was a problem hiding this comment.
Just one question and a few code-cleanup nits! Otherwise this looks great
Storage/src/Connection/Rest.php
Outdated
| $md5Hash = null; | ||
| $crc32c = null; | ||
|
|
||
| if ($validate !== false) { | ||
| $md5Hash = base64_encode(Utils::hash($args['data'], 'md5', true)); | ||
| $crc32c = $this->crcFromStream($args['data']); | ||
|
|
||
| if ($validate === 'md5') { | ||
| $args['metadata']['md5Hash'] = $md5Hash; | ||
| } elseif ($validate === 'crc32') { | ||
| $args['metadata']['crc32c'] = $crc32c; | ||
| } | ||
| } | ||
|
|
||
| // Prepare the X-Goog-Hash header string | ||
| $xGoogHash = []; | ||
| if ($crc32c) { | ||
| $xGoogHash[] = 'crc32c=' . $crc32c; | ||
| } | ||
| if ($md5Hash) { | ||
| $xGoogHash[] = 'md5=' . $md5Hash; | ||
| } | ||
| $xGoogHashHeader = implode(',', $xGoogHash); |
There was a problem hiding this comment.
This logic is a bit hard to follow. I think it could be improved by 1) placing everything inside the if($validate) block, and 2) forming the $xGoogHash in a single statement
$xGoogHashHeader = '';
if ($validate !== false) {
$md5Hash = base64_encode(Utils::hash($args['data'], 'md5', true));
$crc32c = $this->crcFromStream($args['data']);
// Add validation metadata
if ($validate === 'md5') {
$args['metadata']['md5Hash'] = $md5Hash;
} elseif ($validate === 'crc32') {
$args['metadata']['crc32c'] = $crc32c;
}
// Prepare the X-Goog-Hash header string
$xGoogHashHeader = implode(',', array_filter([
$md5Hash ? 'md5=' . $md5Hash : null,
$crc32c ? 'crc32c=' . $crc32c : null,
]));
}There was a problem hiding this comment.
I've refactored the hash logic to stay within the $validate block and used the array_filter approach for a cleaner implode.
Storage/src/Connection/Rest.php
Outdated
| $args['uploaderOptions']['restOptions'] ??= []; | ||
| $args['uploaderOptions']['restOptions']['headers'] ??= []; |
There was a problem hiding this comment.
fun fact - thanks to "autovivification", these two lines aren't necessary as long as you modify your line below (see my other comment)
| $args['uploaderOptions']['restOptions'] ??= []; | |
| $args['uploaderOptions']['restOptions']['headers'] ??= []; |
There was a problem hiding this comment.
Good point on autovivification—I've removed the explicit array initializations and updated the array_merge to use the null coalescing operator instead. This definitely cleans up the Rest.php logic.
Storage/src/Connection/Rest.php
Outdated
|
|
||
| if (!empty($args['headers'])) { | ||
| $args['uploaderOptions']['restOptions']['headers'] = array_merge( | ||
| $args['uploaderOptions']['restOptions']['headers'], |
There was a problem hiding this comment.
This will work as expected thanks to autovivification:
| $args['uploaderOptions']['restOptions']['headers'], | |
| $args['uploaderOptions']['restOptions']['headers'] ?? [], |
There was a problem hiding this comment.
Updated—thanks for the tip! I've updated the logic to leverage autovivification and simplified the initialization.
| if ($size !== '*' && ($rangeEnd + 1) == (int) $size) { | ||
| $customHeaders = $this->requestOptions['restOptions']['headers'] ?? []; | ||
| if (isset($customHeaders['X-Goog-Hash'])) { | ||
| $headers['X-Goog-Hash'] = $customHeaders['X-Goog-Hash']; |
There was a problem hiding this comment.
Is there a reason we are only setting this one header instead of supporting all custom headers?
There was a problem hiding this comment.
You're right. The intent was to isolate the X-Goog-Hash to the final chunk to avoid intermediate validation errors, but this implementation accidentally drops other custom headers. I will refactor this to merge all $customHeaders while specifically unsetting X-Goog-Hash only on non-final chunks.
| if ($isFinalRequest) { | ||
| $customHeaders = $this->requestOptions['restOptions']['headers'] ?? []; | ||
| if (isset($customHeaders['X-Goog-Hash'])) { | ||
| $headers['X-Goog-Hash'] = $customHeaders['X-Goog-Hash']; |
There was a problem hiding this comment.
same question here - is there a reason we're only supporting the one custom header instead of supporting all custom headers?
There was a problem hiding this comment.
The intent was to isolate the X-Goog-Hash to the final chunk to avoid intermediate validation errors, but this implementation accidentally drops other custom headers. I will refactor this to merge all $customHeaders while specifically unsetting X-Goog-Hash only on non-final chunks.
- Refactor Rest.php hash calculation to be more concise using array_filter. - Remove redundant array initializations in Rest.php by utilizing PHP autovivification. - Improve readability of X-Goog-Hash header generation.
Enhanced Checksum Validation & Header Logic
This PR implements comprehensive MD5 and CRC32c checksum validation for object uploads, ensuring data integrity via the
X-Goog-Hashheader, improving data integrity verification. It refactors the upload architecture to handle hashes dynamically across different upload strategies.Key Technical Changes
1. Core Library Enhancements (
google-cloud-core)ResumableUploader&StreamableUploader: Added type-safe logic(int)($rangeEnd + 1) === (int)$sizeto ensureX-Goog-Hashis transmitted only on the final chunk/request, preventing intermediate validation errors.MultipartUploader: Standardized header merging to ensure hashes calculated by the connection layer are always included in single-shot uploads.restOptionsmerging to ensure custom metadata and encryption headers are preserved alongside checksums.2. Storage Package Improvements (
google-cloud-storage)MD5orCRC32chashes when thevalidateoption is enabled.Bucket::upload()to honor user-provided checksums and prevent redundant re-calculation.BucketTestandRestTestto verify hash behavior in resumable, streamable, and multipart scenarios.Note
CI "Lowest Dependencies" Failure: This failure occurs because the CI environment pulls the tagged version of
google-cloud-corefrom Packagist instead of using the local changes in this PR. This will resolve once the Core changes are merged.