@gfusee commented May 19, 2025

Abstract

When writing objects that share logic, developers may need to define generic objects. This is especially true in languages that don't support interfaces, such as Move.

These generic objects are easy to serialize, since the compiler or runtime always knows the exact object type at the time of calling the serialization function. However, this doesn’t apply to deserialization, where we might want to decode the common fields of a generic object without needing to know its generic parameters.

Let’s take an example using the Move language. A developer could declare the following object:

```move
public struct PledgedLoan<T: store> has key, store {
    borrowed: u64,
    fees: u64,
    locked: T
}
```

In BCS, this object is serialized as <borrowed bytes><fees bytes><T bytes>: borrowed and fees are each encoded as 8 little-endian bytes, followed by the encoding of locked.
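To make the layout concrete, the std-only sketch below hand-encodes a PledgedLoan following BCS rules, assuming T = u32 purely for illustration: integers are fixed-width little-endian and struct fields are concatenated in declaration order. The function encode_pledged_loan is hypothetical and not part of the bcs crate.

```rust
/// Hypothetical helper: hand-encodes a `PledgedLoan<u32>` the way BCS lays it out
/// (fixed-width little-endian integers, fields concatenated in declaration order).
pub fn encode_pledged_loan(borrowed: u64, fees: u64, locked: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + 8 + 4);
    out.extend_from_slice(&borrowed.to_le_bytes()); // <borrowed bytes>: 8 bytes
    out.extend_from_slice(&fees.to_le_bytes()); // <fees bytes>: 8 bytes
    out.extend_from_slice(&locked.to_le_bytes()); // <T bytes>: 4 bytes when T = u32
    out
}
```

Only the first 16 bytes are shared by every PledgedLoan<T>; the length and content of the tail depend on T.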

Let's assume we want to build a backend on top of this that finds which account borrowed the most and which one has the highest pending fees. We only have to retrieve the PledgedLoan of every account and deserialize each one into the following Rust struct:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
pub struct CommonPledgeLoan {
    pub borrowed: u64,
    pub fees: u64,
}
```

This is impossible with the current implementation of this crate: the deserializer checks that all input bytes have been consumed, which is not the case here because we never read the locked field.
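The failure can be reproduced without the crate. The sketch below is a hypothetical, std-only stand-in that applies the same rule as the deserializer described above: decode the fields, then reject any unconsumed bytes. The names DecodeError, read_u64, and decode_common_strict are illustrative assumptions, not the crate's API.

```rust
/// Hypothetical error type mirroring the "remaining input" failure mode.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Eof,
    RemainingInput(usize), // number of unconsumed bytes
}

/// Same shape as `CommonPledgeLoan` above.
#[derive(Debug, PartialEq)]
pub struct CommonPledgeLoan {
    pub borrowed: u64,
    pub fees: u64,
}

/// Reads one little-endian u64 and advances the cursor past it.
fn read_u64(input: &mut &[u8]) -> Result<u64, DecodeError> {
    if input.len() < 8 {
        return Err(DecodeError::Eof);
    }
    let (head, tail) = input.split_at(8);
    *input = tail;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(head);
    Ok(u64::from_le_bytes(buf))
}

/// Strict decoding: the final check always runs, like a `from_bytes` that calls `end`.
pub fn decode_common_strict(bytes: &[u8]) -> Result<CommonPledgeLoan, DecodeError> {
    let mut input = bytes;
    let value = CommonPledgeLoan {
        borrowed: read_u64(&mut input)?,
        fees: read_u64(&mut input)?,
    };
    // End-of-input check: every byte must have been consumed.
    if !input.is_empty() {
        return Err(DecodeError::RemainingInput(input.len()));
    }
    Ok(value)
}
```

Decoding the 16-byte common prefix alone succeeds, but as soon as the 4 bytes of a u32 locked field follow, the same call fails with RemainingInput(4).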

Safety concerns

Simply removing this check is not good practice. A field of a struct could have an incorrect Deserialize implementation that consumes more or fewer bytes than its Serialize counterpart produced. In the best case, deserialization then fails; in the worst case, it succeeds and silently returns incorrect values.
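A hypothetical illustration of why the check matters: suppose a hand-written decoder mistakenly reads borrowed as a u32, although it was written as a u64. With the trailing check, the mismatch surfaces as an error; without it, decoding "succeeds" with a corrupted fees value. This is a std-only sketch with illustrative names, not code from the crate.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Eof,
    RemainingInput(usize),
}

/// Consumes `n` bytes from the front of `input`, or fails with `Eof`.
fn take<'a>(input: &mut &'a [u8], n: usize) -> Result<&'a [u8], DecodeError> {
    let whole: &'a [u8] = *input;
    if whole.len() < n {
        return Err(DecodeError::Eof);
    }
    let (head, tail) = whole.split_at(n);
    *input = tail;
    Ok(head)
}

/// BUGGY on purpose: `borrowed` is read as 4 bytes although it was written as 8.
/// Returns `(borrowed, fees)`; `check_end` toggles the remaining-input check.
pub fn decode_buggy(bytes: &[u8], check_end: bool) -> Result<(u64, u64), DecodeError> {
    let mut input = bytes;
    let mut b4 = [0u8; 4];
    b4.copy_from_slice(take(&mut input, 4)?);
    let borrowed = u32::from_le_bytes(b4) as u64;
    // Misaligned from here on: fees starts in the middle of `borrowed`.
    let mut b8 = [0u8; 8];
    b8.copy_from_slice(take(&mut input, 8)?);
    let fees = u64::from_le_bytes(b8);
    if check_end && !input.is_empty() {
        return Err(DecodeError::RemainingInput(input.len()));
    }
    Ok((borrowed, fees))
}
```

For borrowed = 1 and fees = 2, the lenient path returns fees = 2 << 32 = 8589934592: a wrong value rather than a failure.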

Solution implemented

I added a boolean field, discard_remaining_input, to the Deserializer struct. When it is true, the end method skips the remaining-input check. All existing methods set it to false, so the crate's current behavior is unchanged.
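A minimal sketch of the flag, assuming a simplified Deserializer that only tracks its unread input. The field name discard_remaining_input and the end method come from the description above; the struct layout, the Error type, and the method bodies are assumptions, not the crate's actual code.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    Eof,
    RemainingInput,
}

/// Simplified stand-in for the crate's deserializer (assumed layout).
pub struct Deserializer<'de> {
    input: &'de [u8],
    /// When true, `end` tolerates unconsumed bytes instead of failing.
    discard_remaining_input: bool,
}

impl<'de> Deserializer<'de> {
    /// Existing constructor: the flag stays false, so behavior is unchanged.
    pub fn new(input: &'de [u8]) -> Self {
        Deserializer { input, discard_remaining_input: false }
    }

    /// Stand-in for the field-by-field decoding the real deserializer performs.
    pub fn read_u64(&mut self) -> Result<u64, Error> {
        let input: &'de [u8] = self.input;
        if input.len() < 8 {
            return Err(Error::Eof);
        }
        let (head, tail) = input.split_at(8);
        self.input = tail;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(head);
        Ok(u64::from_le_bytes(buf))
    }

    /// Called once a value has been fully deserialized.
    pub fn end(&self) -> Result<(), Error> {
        if self.discard_remaining_input || self.input.is_empty() {
            Ok(())
        } else {
            Err(Error::RemainingInput)
        }
    }
}
```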

Next, I added two unsafe functions:

  • new_discarding_remaining_input: like new, but sets discard_remaining_input to true.
  • from_bytes_discarding_remaining_input: like from_bytes, but uses new_discarding_remaining_input instead of new.

I marked them unsafe to force developers to understand the risks, since misuse can have severe consequences.
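The two entry points could look roughly like the sketch below. Only the names new_discarding_remaining_input and from_bytes_discarding_remaining_input come from the PR; to stay self-contained, the sketch decodes a fixed (u64, u64) instead of a generic T: Deserialize. Neither body performs an unsafe operation: the unsafe marker is purely a contract that forces each call site to opt in through an unsafe block.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    Eof,
    RemainingInput,
}

pub struct Deserializer<'de> {
    input: &'de [u8],
    discard_remaining_input: bool,
}

impl<'de> Deserializer<'de> {
    pub fn new(input: &'de [u8]) -> Self {
        Deserializer { input, discard_remaining_input: false }
    }

    /// Like `new`, but `end` will ignore unconsumed bytes.
    ///
    /// # Safety
    /// Not a memory-safety requirement: the caller promises that the trailing
    /// bytes are skipped on purpose (e.g. a generic tail) and that every field
    /// decoder consumes exactly the bytes its serializer wrote.
    pub unsafe fn new_discarding_remaining_input(input: &'de [u8]) -> Self {
        Deserializer { input, discard_remaining_input: true }
    }

    fn read_u64(&mut self) -> Result<u64, Error> {
        let input: &'de [u8] = self.input;
        if input.len() < 8 {
            return Err(Error::Eof);
        }
        let (head, tail) = input.split_at(8);
        self.input = tail;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(head);
        Ok(u64::from_le_bytes(buf))
    }

    fn end(&self) -> Result<(), Error> {
        if self.discard_remaining_input || self.input.is_empty() {
            Ok(())
        } else {
            Err(Error::RemainingInput)
        }
    }
}

/// Stand-in for a generic decode: always reads `(u64, u64)`, then calls `end`.
fn decode_pair(mut de: Deserializer<'_>) -> Result<(u64, u64), Error> {
    let value = (de.read_u64()?, de.read_u64()?);
    de.end()?;
    Ok(value)
}

pub fn from_bytes(bytes: &[u8]) -> Result<(u64, u64), Error> {
    decode_pair(Deserializer::new(bytes))
}

/// Like `from_bytes`, but built on `new_discarding_remaining_input`.
///
/// # Safety
/// Same contract as `Deserializer::new_discarding_remaining_input`.
pub unsafe fn from_bytes_discarding_remaining_input(bytes: &[u8]) -> Result<(u64, u64), Error> {
    // SAFETY: the contract is forwarded unchanged to our own caller.
    decode_pair(unsafe { Deserializer::new_discarding_remaining_input(bytes) })
}
```

A caller would then write something like `unsafe { from_bytes_discarding_remaining_input(&bytes) }`, making the decision to drop the tail visible at the call site.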

Comment on lines +82 to +83
```rust
#[allow(unsafe_code)]
pub unsafe fn from_bytes_discarding_remaining_input<'a, T>(bytes: &'a [u8]) -> Result<T>
```
Contributor


I wouldn't use unsafe for that. I believe what you're asking for is a variant that doesn't call end(), or, equivalently, exposing the Deserializer struct. Historically, @bmwill and I have pushed back on this idea. I'm not sure there is a stronger argument for it now.
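For comparison, one way to read the reviewer's suggestion is sketched below: a public deserializer whose end check the caller may or may not invoke, plus a safe helper that simply never calls it. All names and bodies are hypothetical simplifications (the decoded type is fixed to two u64 values), not the crate's API.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    Eof,
    RemainingInput,
}

/// Hypothetical public deserializer: the caller drives decoding and decides
/// whether to finish with `end`.
pub struct Deserializer<'de> {
    input: &'de [u8],
}

impl<'de> Deserializer<'de> {
    pub fn new(input: &'de [u8]) -> Self {
        Deserializer { input }
    }

    pub fn read_u64(&mut self) -> Result<u64, Error> {
        let input: &'de [u8] = self.input;
        if input.len() < 8 {
            return Err(Error::Eof);
        }
        let (head, tail) = input.split_at(8);
        self.input = tail;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(head);
        Ok(u64::from_le_bytes(buf))
    }

    /// The strict trailing-input check, now optional for the caller.
    pub fn end(&self) -> Result<(), Error> {
        if self.input.is_empty() {
            Ok(())
        } else {
            Err(Error::RemainingInput)
        }
    }
}

/// Safe helper: decodes the common prefix and never calls `end`.
pub fn from_bytes_without_end(bytes: &[u8]) -> Result<(u64, u64), Error> {
    let mut de = Deserializer::new(bytes);
    Ok((de.read_u64()?, de.read_u64()?))
}
```

In this shape the lenient path needs no unsafe keyword: skipping the check is expressed simply by not calling end.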

Author

@gfusee May 24, 2025


I was inspired by some std methods that are marked unsafe even though they don't use any unsafe code, such as Vec::set_len.

Not calling end would indeed achieve what I'm asking. However, end is defined in a trait (BcsDeserializer), which implies that it will always be called at the... end of something? Even its doc comment says so: "The Deserializer::end method should be called after a type has been fully deserialized." In my opinion, not calling it would be unintuitive.

Exposing the Deserializer is one solution, but it would force developers to write the code that bypasses end themselves. What do you think about the following variant:

  • we remove end from the required methods of the BcsDeserializer trait. This has no side effects, since end is never called in a generic context (at least I didn't find any generic usage; correct me if I'm wrong)
  • we rename the Deserializer struct to RawDeserializer and remove its end method
  • we create a DefaultDeserializer struct whose end method checks that there is no remaining input. It uses RawDeserializer's logic behind the scenes
  • we use DefaultDeserializer where Deserializer was used
  • we expose both DefaultDeserializer and RawDeserializer
  • no from_bytes_discarding_remaining_input, no new_discarding_remaining_input and no discard_remaining_input

This way, existing users would not notice any change, while anyone who wants to skip the check would just have to use RawDeserializer directly.
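A rough sketch of this proposal under simplified assumptions: RawDeserializer carries only the decoding logic, DefaultDeserializer wraps it and adds the strict end check, and the BcsDeserializer trait no longer requires end. The type and trait names come from the comment above; the trait's single read_u64 method, the Error type, and every body are illustrative assumptions.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    Eof,
    RemainingInput,
}

/// Simplified trait: `end` is no longer a required method.
pub trait BcsDeserializer {
    fn read_u64(&mut self) -> Result<u64, Error>;
}

/// Decoding logic only, with no trailing-input check.
pub struct RawDeserializer<'de> {
    input: &'de [u8],
}

impl<'de> RawDeserializer<'de> {
    pub fn new(input: &'de [u8]) -> Self {
        RawDeserializer { input }
    }
}

impl<'de> BcsDeserializer for RawDeserializer<'de> {
    fn read_u64(&mut self) -> Result<u64, Error> {
        let input: &'de [u8] = self.input;
        if input.len() < 8 {
            return Err(Error::Eof);
        }
        let (head, tail) = input.split_at(8);
        self.input = tail;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(head);
        Ok(u64::from_le_bytes(buf))
    }
}

/// Default behavior: delegates decoding to `RawDeserializer`, adds the strict `end`.
pub struct DefaultDeserializer<'de> {
    raw: RawDeserializer<'de>,
}

impl<'de> DefaultDeserializer<'de> {
    pub fn new(input: &'de [u8]) -> Self {
        DefaultDeserializer { raw: RawDeserializer::new(input) }
    }

    pub fn end(&self) -> Result<(), Error> {
        if self.raw.input.is_empty() {
            Ok(())
        } else {
            Err(Error::RemainingInput)
        }
    }
}

impl<'de> BcsDeserializer for DefaultDeserializer<'de> {
    fn read_u64(&mut self) -> Result<u64, Error> {
        self.raw.read_u64()
    }
}

/// Generic decoding code works with either deserializer.
pub fn decode_pair<D: BcsDeserializer>(de: &mut D) -> Result<(u64, u64), Error> {
    Ok((de.read_u64()?, de.read_u64()?))
}
```

Generic code such as decode_pair works with either type, so only callers that explicitly pick RawDeserializer lose the trailing-input check.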

Let me know if this looks good to you 🙏

@ma2bd requested a review from @bmwill May 19, 2025 21:55