Added discard remaining input option when deserializing #13

gfusee · 2025-05-19T01:03:26Z

Abstract

When trying to write objects with shared logic, developers might need to define generic objects. This is especially true in programming languages that don’t support interfaces such as Move.

These generic objects are easy to serialize, since the compiler or runtime always knows the exact object type at the time of calling the serialization function. However, this doesn’t apply to deserialization, where we might want to decode the common fields of a generic object without needing to know its generic parameters.

Let’s take an example using the Move language. A developer could declare the following object:

public struct PledgedLoan<T: store> has key, store {
   borrowed: u64,
   fees: u64,
   locked: T
}

This object is serialized in BCS as <borrowed bytes><fees bytes><T bytes>

Let's assume we want to develop a backend around this that checks what account borrowed the most and the one having the highest pending fees. We only have to retrieve each PledgedLoan for all the accounts and deserialize them into the following Rust struct for each address:

#[derive(Deserialize)]
pub struct CommonPledgeLoan {
   pub borrowed: u64,
   pub fees: u64
}

This is impossible to do in the current implementation of this crate because the deserializer checks that all the bytes input has been used, which is not the case because we don't need the locked field.

Safety concerns

Removing these checks directly is not good practice. Indeed, we can imagine fields of a struct having an incorrect Deserialize implementation, causing them to consume more or fewer bytes than their Serialize counterpart. In the best case, this would make deserialization fail, in the worst case, it could succeed with incorrect or undefined behavior.

Solution implemented

I added a boolean field, discard_remaining_input, to the Deserializer struct. When true, it skips the check in the end method. This field is always set to false in all existing methods, so the current behavior of the crate remains unchanged.

Next, I added two unsafe functions:

new_discarding_remaining_input: like new, but sets discard_remaining_input to true.
from_bytes_discarding_remaining_input: like from_bytes, but uses new_discarding_remaining_input instead of new.

I marked them unsafe to force developers to understand the risks, since misuse can have severe consequences.

ma2bd · 2025-05-19T21:55:19Z

src/de.rs

+#[allow(unsafe_code)]
+pub unsafe fn from_bytes_discarding_remaining_input<'a, T>(bytes: &'a [u8]) -> Result<T>


I wouldn't use unsafe for that. I believe what you're asking is a variant where we don't call end(). Or equivalently exposing the Deserializer struct. Historically, @bmwill and I have pushed back on this idea. Not sure if there is a stronger argument now.

I was inspired by some std's methods that are marked as unsafe even they don't use any unsafe code, such as Vec::set_len

To not call end would indeed achieve what I'm asking. However, end is defined in a trait (BcsDeserializer), which implicitly tells that it will be called anyway at the... end of something? Even the comment states this: The Deserializer::end method should be called after a type has been fully deserialized. It would be unintuitive to not call it in my opinion.

Exposing the Deserializer is a solution, but it would oblige developers to write themself the code to bypass the end function. What do you think about a variant of this where:

we remove the end function to be a requirement of the BcsDeserializer trait, which has no side effect since it is never called in a generic context (at least I didn't find any generic usage, correct me if I'm wrong)

we rename the Deserializer struct to RawDeserializer, and we remove the end method

we create a struct called DefaultDeserializer that has the end method that ensures if there is no remaining input. It uses RawDeserializer logic behind the scene

we use DefaultDeserializer where Deserializer was used

we expose both DefaultDeserializer and RawDeserializer

no from_bytes_discarding_remaining_input, no new_discarding_remaining_input and no discard_remaining_input

This way the developer would not notice any change, but someone wanting to not have the checks would just have to use RawDeserializer directly

Let me know if it looks good for you 🙏

Added discard remaining input option when deserializing

8af07d1

ma2bd reviewed May 19, 2025

View reviewed changes

ma2bd requested a review from bmwill May 19, 2025 21:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Added discard remaining input option when deserializing #13

Added discard remaining input option when deserializing #13

Uh oh!

gfusee commented May 19, 2025 •

edited

Loading

Uh oh!

ma2bd May 19, 2025

Uh oh!

gfusee May 24, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		#[allow(unsafe_code)]
		pub unsafe fn from_bytes_discarding_remaining_input<'a, T>(bytes: &'a [u8]) -> Result<T>

Added discard remaining input option when deserializing #13

Are you sure you want to change the base?

Added discard remaining input option when deserializing #13

Uh oh!

Conversation

gfusee commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Abstract

Safety concerns

Solution implemented

Uh oh!

ma2bd May 19, 2025

Choose a reason for hiding this comment

Uh oh!

gfusee May 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gfusee commented May 19, 2025 •

edited

Loading

gfusee May 24, 2025 •

edited

Loading