
feat(storage): enable default CRC32C checksum validation for object downloads #9210

Open · salilg-eng wants to merge 1 commit into googleapis:main from salilg-eng:feat/default-crc32c

@salilg-eng (Contributor)

Overview

Enables full object checksum validation (CRC32C by default) across GCS JSON read paths, covering all four download methods: downloadAsString, downloadToFile, downloadAsStream, and downloadAsStreamAsync.

It adds a HashValidatingStream decorator that computes the hash incrementally as bytes are read and validates it against the object's stored hash once the end of the stream is reached.
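
A minimal sketch of the decorator idea, under stated assumptions. It is not the PR's HashValidatingStream: the class name HashValidatingReaderSketch is hypothetical, and it wraps a plain PHP stream resource rather than a PSR-7 StreamInterface so that it runs without Composer. Every byte handed to the caller is fed into an incremental hash (hash_init/hash_update; the 'crc32c' algorithm needs PHP 7.4+), and the base64-encoded digest, which is the format GCS uses in X-Goog-Hash, is compared once the underlying stream reports EOF.

```php
<?php
// Hypothetical sketch, not the PR's HashValidatingStream: wraps a plain PHP
// stream resource instead of a PSR-7 stream so it runs without dependencies.
final class HashValidatingReaderSketch
{
    /** @var resource */
    private $inner;
    private \HashContext $ctx;
    private string $expected; // base64-encoded digest, as sent in X-Goog-Hash
    private bool $validated = false;

    /**
     * @param resource $inner    readable PHP stream
     * @param string   $algo     hash_init() algorithm, e.g. 'crc32c' or 'md5'
     * @param string   $expected base64-encoded expected digest
     */
    public function __construct($inner, string $algo, string $expected)
    {
        $this->inner = $inner;
        $this->ctx = hash_init($algo);
        $this->expected = $expected;
    }

    public function read(int $length): string
    {
        $chunk = fread($this->inner, $length);
        if ($chunk === false) {
            throw new RuntimeException('Read from underlying stream failed');
        }
        // Hash each byte exactly once, in the order it is returned to the caller.
        hash_update($this->ctx, $chunk);
        if (feof($this->inner)) {
            $this->validateOnce();
        }
        return $chunk;
    }

    public function eof(): bool
    {
        return feof($this->inner);
    }

    private function validateOnce(): void
    {
        if ($this->validated) {
            return; // hash_final() may only be called once per context
        }
        $this->validated = true;
        $actual = base64_encode(hash_final($this->ctx, true));
        if (!hash_equals($this->expected, $actual)) {
            throw new RuntimeException(
                "Checksum mismatch: expected {$this->expected}, got {$actual}"
            );
        }
    }
}
```

Validating lazily at EOF keeps memory constant regardless of object size; the trade-off is that a caller who stops reading early never gets a verdict.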

Key Features

  • CRC32C by Default: Prioritizes lightweight, hardware-accelerated CRC32C validation.
  • MD5 Fallback: Automatically falls back to validating MD5 if a CRC32C hash is not available on the GCS object.
  • Configurable Options: Supports user overrides through the 'validate' parameter (crc32, md5, or false to disable).
  • Subrange Bypass: Automatically skips validation for partial range downloads (HTTP 206 Partial Content), since a full-object hash cannot be checked against a partial body.
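
The selection rules in the list above can be sketched as two small functions. Both are hypothetical (parseGoogHash and chooseValidation are not names taken from the PR), and the sketch assumes that the default 'validate' value is true, that 'crc32' selects CRC32C, that the MD5 fallback applies only in the default mode, and that the hashes come from the X-Goog-Hash response header. PSR-7's getHeaderLine() joins repeated header values with commas, which is why the parser splits on ','.

```php
<?php
/**
 * Hypothetical helper: turn an X-Goog-Hash header line such as
 * "crc32c=<base64>, md5=<base64>" into ['crc32c' => '<base64>', 'md5' => '<base64>'].
 *
 * @return array<string,string>
 */
function parseGoogHash(string $headerLine): array
{
    $hashes = [];
    foreach (explode(',', $headerLine) as $part) {
        $part = trim($part);
        $pos = strpos($part, '=');
        if ($pos === false) {
            continue;
        }
        // Split on the first '=' only: base64 values may end in '=' padding.
        $hashes[strtolower(substr($part, 0, $pos))] = substr($part, $pos + 1);
    }
    return $hashes;
}

/**
 * Hypothetical sketch of the selection rules. Returns [algorithm, expectedBase64]
 * suitable for hash_init(), or null when validation should be skipped.
 *
 * @param bool|string          $validate true (assumed default), 'crc32', 'md5', or false
 * @param array<string,string> $hashes   output of parseGoogHash()
 */
function chooseValidation($validate, int $statusCode, array $hashes): ?array
{
    if ($validate === false || $statusCode === 206) {
        // Opted out, or a ranged read whose body cannot match a full-object hash.
        return null;
    }
    if ($validate === 'md5') {
        return isset($hashes['md5']) ? ['md5', $hashes['md5']] : null;
    }
    if (isset($hashes['crc32c'])) {
        return ['crc32c', $hashes['crc32c']];
    }
    // Assumption: only the default mode (true) falls back to MD5.
    if ($validate === true && isset($hashes['md5'])) {
        return ['md5', $hashes['md5']];
    }
    return null;
}
```

Returning null when the requested hash is missing keeps the download working without validation; a stricter design could throw instead so that callers who explicitly asked for a particular algorithm learn it was unavailable.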

Buganizer Tasks Resolved

  • Fixes b/514548528 ([PHP] SDK: Full object checksum for all JSON and XML Reads as well)

Testing

  • Added new unit tests in HashValidatingStreamTest.php.
  • Added sync/async integration test cases inside RestTest.php.

salilg-eng requested review from a team as code owners on May 25, 2026 10:11
product-auto-label (bot) added the "api: storage" label (Issues related to the Cloud Storage API.) on May 25, 2026
Comment thread on Storage/src/Connection/Rest.php (outdated)

     // the partial stream, we can just return the stream we fetched.
     if ($transcodedObj) {
    -    return $fetchedStream;
    +    return $this->maybeWrapWithHashValidatingStream($fetchedStream, $args, $response);

IIUC, wrapping the transcoded stream in maybeWrapWithHashValidatingStream here will cause a 100% failure rate for compressed files because GCS serves uncompressed bytes on transcoding, whereas X-Goog-Hash contains the hash of the stored compressed bytes. Can we implement on-the-fly decompression and verification?

cc: @v-pratap what do you recommend?
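
One way to approach the on-the-fly decompression asked about here, sketched under the assumption that the download is requested with Accept-Encoding: gzip so that GCS returns the stored gzip bytes rather than transcoding them: hash the compressed chunks (the bytes X-Goog-Hash describes) and inflate them locally with PHP's incremental zlib API (inflate_init/inflate_add, zlib extension required). The function name inflateAndVerify is hypothetical and not part of the PR.

```php
<?php
/**
 * Hypothetical sketch: verify the stored-bytes CRC32C of a gzip-encoded object
 * while decompressing it locally. Assumes the caller received the compressed
 * bytes (e.g. the download was requested with "Accept-Encoding: gzip").
 *
 * @param iterable<string> $compressedChunks chunks as received from the network
 * @param string           $expectedCrc32c   base64 CRC32C value from X-Goog-Hash
 * @return string the decompressed payload
 */
function inflateAndVerify(iterable $compressedChunks, string $expectedCrc32c): string
{
    $hash = hash_init('crc32c');
    $inflater = inflate_init(ZLIB_ENCODING_GZIP);
    $plain = '';
    foreach ($compressedChunks as $chunk) {
        // The stored hash covers the compressed bytes, so hash before inflating.
        hash_update($hash, $chunk);
        $out = inflate_add($inflater, $chunk, ZLIB_SYNC_FLUSH);
        if ($out === false) {
            throw new RuntimeException('Invalid gzip data');
        }
        $plain .= $out;
    }

    $actual = base64_encode(hash_final($hash, true));
    if (!hash_equals($expectedCrc32c, $actual)) {
        throw new RuntimeException(
            "CRC32C mismatch on stored bytes: expected {$expectedCrc32c}, got {$actual}"
        );
    }
    return $plain;
}
```

The sketch buffers the output into one string for brevity; a streaming version would yield each inflated chunk instead, but the ordering is the point: hash the bytes exactly as stored, then decompress.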

@salilg-eng (Contributor Author)

I have updated the logic to check GCS's X-Goog-Stored-Content-Encoding header. If this header is present (indicating that transcoding occurred), validation is now skipped automatically, which aligns with how GCS client libraries in other languages (such as Java, Go, and Python) handle transcoding. I also added a dedicated unit test, testDownloadObjectWithTranscodedObjectValidationBypassed, to verify this behavior.
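
If GCS also returns X-Goog-Stored-Content-Encoding with the value identity for objects stored without an encoding (an assumption worth confirming against the Cloud Storage transcoding documentation), a presence check alone would skip validation on ordinary downloads too. A stricter check, sketched here as a hypothetical helper named isDecompressivelyTranscoded, treats a response as transcoded only when the object is stored gzip-encoded but was not served with Content-Encoding: gzip.

```php
<?php
/**
 * Hypothetical helper: decide whether a response was decompressively transcoded.
 * Header names are matched case-insensitively.
 *
 * @param array<string,string> $headers response headers (name => single value)
 */
function isDecompressivelyTranscoded(array $headers): bool
{
    $normalized = array_change_key_case($headers, CASE_LOWER);
    $stored = strtolower(trim($normalized['x-goog-stored-content-encoding'] ?? ''));
    $served = strtolower(trim($normalized['content-encoding'] ?? ''));
    // Stored as gzip but served without gzip encoding: GCS decompressed it,
    // so the bytes received no longer match the stored-object hash.
    return $stored === 'gzip' && $served !== 'gzip';
}
```

With this check, a gzip object fetched with Accept-Encoding: gzip (served compressed) would still be validated, since the received bytes are the stored bytes.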


Thanks. I'll let Vaibhav confirm if bypassing this is okay. Let's keep this unresolved for now.

Two more comment threads on Storage/src/Connection/Rest.php (outdated)
kalragauri requested a review from v-pratap on May 26, 2026 06:10
salilg-eng force-pushed the feat/default-crc32c branch from d52e22f to 1ed7da9 on May 26, 2026 06:18
