Skip to content

Conversation

@0verflowme
Copy link

This adds a higher-level blob streaming API that prefers the newer GetRawBlob method (when supported by the negotiated proxy protocol version) and otherwise falls back to GetBlob.

Motivation: bootc-dev/bootc#1424

Signed-off-by: Priyanshu Kumar <priyanshu.kumar@broadcom.com>
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new get_blob_stream API that intelligently prefers the GetRawBlob method when supported by the proxy protocol, falling back to GetBlob otherwise. A significant addition is the VerifiedBlobReader and associated hashing logic, which enables on-the-fly digest and size verification for streamed blobs, enhancing data integrity. The changes also include necessary dependency updates (hex, sha2), new error types, and protocol version checks. The ProxyTooOld error message has been corrected for clarity. Overall, this is a well-implemented feature that improves the robustness and security of blob fetching.

let driver = async move {
err.await.map_err(Error::from)?;
match rx.await {
Ok(r) => verify_blob_bytes_read(&expected, expected_size, r),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In the driver future, if rx.await returns an Err(_) (meaning the oneshot::Sender was dropped without sending a message), the driver currently returns Ok(()). This could potentially mask an issue if the VerifiedBlobReader was dropped unexpectedly (e.g., due to a panic) before its Drop implementation could send VerifiedBlobReadResult::Incomplete. While the Drop impl is designed to send Incomplete, an Err from rx.await indicates an even more unexpected state. It might be safer to propagate this error or at least log it, rather than silently succeeding, to ensure that all potential issues during blob streaming are surfaced.

Suggested change
Ok(r) => verify_blob_bytes_read(&expected, expected_size, r),
Err(e) => Err(Error::Other(format!("Blob stream verification channel error: {e}").into())),

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, this review is incorrect. rx.await returns Err(_) when the sender is dropped without sending (canceled). A panic will still run Drop, and Drop sends Incomplete, so you’d receive Ok(Incomplete), not Err(_).

@0verflowme
Copy link
Author

0verflowme commented Dec 19, 2025

Question on fallback robustness:
Right now get_blob_stream() decides between GetRawBlob and GetBlob purely based on the negotiated protocol version (no string-matching on errors). Can we confirm the protocol contract that Initialize returning >= 0.2.8 guarantees GetRawBlob is implemented?
If that guarantee doesn’t hold, we probably need a machine-readable capability signal so that fallback remains reliable.

cc: @cgwalters

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Just as a first pass I think what would help build confidence here is ensuring in CI we're always testing both the old and new code.

One suggestion: A CI run that tests in quay.io/almalinuxorg/almalinux-bootc:10.0 as that uses skopeo version 1.18.1 which predates https://github.com/containers/skopeo/releases/tag/v1.19.0 when this feature appeared.

Run tests in containers with skopeo 1.18 and >=1.19 to exercise both GetBlob fallback and GetRawBlob paths.

Signed-off-by: Priyanshu Kumar <priyanshu.kumar@broadcom.com>
@0verflowme
Copy link
Author

@cgwalters I've updated the PR Can you please check?

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for working on this! A second pass review

Digest::from_str(&format!("sha256:{}", hex::encode(h.finalize()))).unwrap()
}

fn write_blob(root: &std::path::Path, bytes: &[u8]) -> Result<(Digest, u64)> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an https://docs.rs/ocidir/latest/ocidir/ crate for this

-v "$PWD:/src:ro" \
-v "$PWD/.ci-cargo-home:/root/.cargo" \
-w /src \
quay.io/fedora/fedora:41 \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't thin fedora:41 has the new skopeo, also it's end of life. Let's use quay.io/centos/centos:stream10` instead

Comment on lines +214 to +215
debug_assert!(after >= before);
let delta = after - before;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about let delta = after.checked_sub(before).unwrap()

digest: &Digest,
expected_size: u64,
) -> Result<BlobStream<'a>> {
let fallback_to_get_blob = || async move {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the closure?

return fallback_to_get_blob().await;
}

match self.get_raw_blob(img, digest).await {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just ? here


#[derive(Debug)]
enum VerifiedBlobReadResult {
Complete { nbytes: u64, digest_hex: String },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add type safety to digest_hex using https://docs.rs/oci-spec/latest/oci_spec/image/struct.Digest.html

let delta = after - before;
if delta > 0 {
let chunk = &buf.filled()[before..after];
if let Some(hasher) = self.hasher.as_mut() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always have a hasher right?

debug_assert!(after >= before);
let delta = after - before;
if delta > 0 {
let chunk = &buf.filled()[before..after];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused by this...it seems like we're trying to avoid skipping the start of the buffer, but why do we need to do that? Can't we just consume filled() each time?

Also isn't this missing calling clear()?

(I am not super familiar with implementing AsyncRead this way to be clear)

inner: R,
nbytes: u64,
hasher: Option<Hasher>,
completion: Option<oneshot::Sender<VerifiedBlobReadResult>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on this would be useful, the expected control flow here is not obvious to me.

If there's a way we can avoid this oneshot channel that'd be good

}
let before = buf.filled().len();
match Pin::new(&mut self.inner).poll_read(cx, buf) {
v @ std::task::Poll::Ready(Ok(_)) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use () instead of _ to strengthen the match

@cgwalters cgwalters self-assigned this Jan 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants