-
Notifications
You must be signed in to change notification settings - Fork 0
introduce proc macro for zero-copy protobuf serialization #362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7f83b66 to
f0f1ef7
Compare
…spec Per protobuf specification, when a map entry is missing its key or value field, the type's default value should be used instead of returning an error. Changes: - Add Default bounds to deserialize_map_entry generic parameters - Use unwrap_or_default() instead of returning errors for missing fields - Add Default bounds to HashMap/AHashMap ProtoFieldDeserialize impls - Update documentation to reflect protobuf-compatible behavior This allows map types to support any value type that implements Default. For types without Default (like OffsetDateTime), users should wrap them in Option<T> so that None is the default, avoiding epoch timestamps.
f0f1ef7 to
bee0244
Compare
…deserialization and update field syntax 1. Update to new #[field(id = N)] syntax 2. Add RepeatedFieldDeserialize trait for optimized repeated field deserialization Changes: - Replace #[field = N] with #[field(id = N)] syntax in tests - Add RepeatedFieldDeserialize trait to eliminate intermediate allocations - Implement trait for HashMap, AHashMap, Vec<T> (except Vec<u8>), and TinyMap - Support both explicit #[field(repeated)] marker and auto-detection of common types - Auto-detect: Vec, HashMap, AHashMap, TinyMap, BTreeMap, IndexMap - Special case Vec<u8> as bytes field (not repeated) Performance: Reduces repeated field deserialization from O(n) intermediate container allocations to O(1) final container allocation.
2370cb4 to
19abaaf
Compare
8ca41fd to
9b7b67f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a procedural macro system for zero-copy protobuf serialization, eliminating the need for allocations during serialization/deserialization by working directly with the protobuf wire format. The implementation includes compile-time generated serialization code, support for auto-assigned and explicit field numbering, optimized repeated field handling, and type-safe oneOf enum serialization.
- Adds new
bd-macroscrate with#[proto_serializable]procedural macro - Creates
bd-proto-util/serializationmodule with traits and runtime helpers - Migrates existing code to use protobuf serialization instead of bincode
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| bd-workflows/Cargo.toml | Adds dependencies on bd-macros and bd-proto-util for protobuf serialization |
| bd-proto-util/src/serialization/types.rs | Implements serialization traits for primitive types, collections, time types, and special types |
| bd-proto-util/src/serialization/tests/macro_test.rs | Comprehensive test suite validating macro-generated serialization code |
| bd-proto-util/src/serialization/runtime.rs | Runtime helper for reading nested protobuf messages |
| bd-proto-util/src/serialization/mod.rs | Core trait definitions for protobuf serialization system |
| bd-proto-util/src/serialization/map.rs | Helper functions for map serialization/deserialization |
| bd-proto-util/src/serialization/macros.rs | Declarative macro for reducing repeated field implementation boilerplate |
| bd-proto-util/src/lib.rs | Exports new serialization module |
| bd-proto-util/Cargo.toml | Adds dependencies for serialization support |
| bd-macros/src/lib.rs | Core procedural macro implementation generating serialization code |
| bd-macros/Cargo.toml | New crate manifest for procedural macros |
| bd-log-primitives/src/tiny_set/proto.rs | Protobuf serialization for TinySet and TinyMap collections |
| bd-log-primitives/src/tiny_set.rs | Adds iterator and Extend trait implementations for protobuf compatibility |
| bd-log-primitives/src/lib.rs | Updates timestamp serialization to use new protobuf traits |
| bd-log-primitives/Cargo.toml | Adds dependencies for protobuf serialization |
| Cargo.toml | Registers bd-macros crate in workspace |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #[allow(clippy::cast_possible_truncation)] | ||
| protobuf::rt::uint64_size(field_number, *self as u64) | ||
| } | ||
|
|
||
| fn serialize(&self, field_number: u32, os: &mut CodedOutputStream<'_>) -> Result<()> { | ||
| #[allow(clippy::cast_possible_truncation)] | ||
| os.write_uint64(field_number, *self as u64)?; | ||
| Ok(()) | ||
| } | ||
| } | ||
|
|
||
| impl ProtoFieldDeserialize for usize { | ||
| fn deserialize(is: &mut CodedInputStream<'_>) -> Result<Self> { | ||
| let value = is.read_uint64()?; | ||
| #[allow(clippy::cast_possible_truncation)] | ||
| Ok(value as Self) |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cast from usize to u64 may truncate on 32-bit platforms, but more importantly, this could result in inefficient size computation if usize is smaller than u64 on the target platform. Consider using the actual size of usize for the protobuf encoding rather than always using uint64_size.
| #[allow(clippy::cast_possible_truncation)] | |
| protobuf::rt::uint64_size(field_number, *self as u64) | |
| } | |
| fn serialize(&self, field_number: u32, os: &mut CodedOutputStream<'_>) -> Result<()> { | |
| #[allow(clippy::cast_possible_truncation)] | |
| os.write_uint64(field_number, *self as u64)?; | |
| Ok(()) | |
| } | |
| } | |
| impl ProtoFieldDeserialize for usize { | |
| fn deserialize(is: &mut CodedInputStream<'_>) -> Result<Self> { | |
| let value = is.read_uint64()?; | |
| #[allow(clippy::cast_possible_truncation)] | |
| Ok(value as Self) | |
| #[cfg(target_pointer_width = "64")] | |
| { | |
| #[allow(clippy::cast_possible_truncation)] | |
| protobuf::rt::uint64_size(field_number, *self as u64) | |
| } | |
| #[cfg(target_pointer_width = "32")] | |
| { | |
| #[allow(clippy::cast_possible_truncation)] | |
| protobuf::rt::uint32_size(field_number, *self as u32) | |
| } | |
| } | |
| fn serialize(&self, field_number: u32, os: &mut CodedOutputStream<'_>) -> Result<()> { | |
| #[cfg(target_pointer_width = "64")] | |
| { | |
| #[allow(clippy::cast_possible_truncation)] | |
| os.write_uint64(field_number, *self as u64)?; | |
| } | |
| #[cfg(target_pointer_width = "32")] | |
| { | |
| #[allow(clippy::cast_possible_truncation)] | |
| os.write_uint32(field_number, *self as u32)?; | |
| } | |
| Ok(()) | |
| } | |
| } | |
| impl ProtoFieldDeserialize for usize { | |
| fn deserialize(is: &mut CodedInputStream<'_>) -> Result<Self> { | |
| #[cfg(target_pointer_width = "64")] | |
| { | |
| let value = is.read_uint64()?; | |
| #[allow(clippy::cast_possible_truncation)] | |
| Ok(value as Self) | |
| } | |
| #[cfg(target_pointer_width = "32")] | |
| { | |
| let value = is.read_uint32()?; | |
| #[allow(clippy::cast_possible_truncation)] | |
| Ok(value as Self) | |
| } |
| // Since OffsetDateTime maps to google.protobuf.Timestamp and google.protobuf.Timestamp is a | ||
| // trivial message we can just delegate to the protobuf implementation here. | ||
| Timestamp::default().compute_size() + protobuf::rt::tag_size(field_number) |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Computing the size of a default Timestamp on every call is inefficient. The size of an empty Timestamp message is constant and should be computed once or treated as a known constant. Consider caching this value or computing it directly.
| // Since OffsetDateTime maps to google.protobuf.Timestamp and google.protobuf.Timestamp is a | |
| // trivial message we can just delegate to the protobuf implementation here. | |
| Timestamp::default().compute_size() + protobuf::rt::tag_size(field_number) | |
| // Since OffsetDateTime maps to google.protobuf.Timestamp and the size of an empty | |
| // google.protobuf.Timestamp message is constant (zero), we only need to account | |
| // for the tag size here. | |
| protobuf::rt::tag_size(field_number) |
| ts.seconds | ||
| .checked_mul(1_000_000_000) | ||
| .and_then(|s| s.checked_add(i64::from(ts.nanos))) | ||
| .ok_or_else(|| anyhow::anyhow!("Timestamp overflow"))? |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message 'Timestamp overflow' is too generic. It should include the actual values that caused the overflow (seconds and nanos) to aid in debugging.
| .ok_or_else(|| anyhow::anyhow!("Timestamp overflow"))? | |
| .ok_or_else(|| { | |
| anyhow::anyhow!( | |
| "Timestamp overflow when combining seconds and nanos: seconds={}, nanos={}", | |
| ts.seconds, | |
| ts.nanos | |
| ) | |
| })? |
bd-macros/src/lib.rs
Outdated
| let field_name = field.ident.as_ref().unwrap(); | ||
| let field_type = &field.ty; | ||
| #[allow(clippy::cast_possible_truncation)] | ||
| let field_num = (idx + 1) as u32; |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field numbering starts at 1 for struct variant fields, but this doesn't respect any explicit field numbering that might be specified via #[field(id = N)] attributes. The code should parse and respect field attributes for struct variant fields.
bd-macros/src/lib.rs
Outdated
| let auto_idx = fields_with_attrs[..= field_idx] | ||
| .iter() | ||
| .filter(|(_, a)| !a.skip) | ||
| .count(); |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto-assigned field numbers start at 0, but protobuf field numbers must start at 1. This will cause all auto-assigned fields to be off by one, resulting in incompatible wire format.
bd-macros/src/lib.rs
Outdated
| } | ||
|
|
||
| os.write_tag(field_number, protobuf::rt::WireType::LengthDelimited)?; | ||
| os.write_raw_varint32(inner_size as u32)?; |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as comments 6 and 8: casting u64 to u32 without validation. This appears in multiple places and should be addressed consistently.
| os.write_raw_varint32(inner_size as u32)?; | |
| let inner_size_u32 = u32::try_from(inner_size) | |
| .map_err(|_| ::anyhow::anyhow!("serialized message size {} exceeds u32::MAX", inner_size))?; | |
| os.write_raw_varint32(inner_size_u32)?; |
- Add overflow checks for u64 to u32 conversions in length-delimited field serialization - Add field numbering validation to ensure auto-assigned field numbers start at 1 - Remove usize/isize ProtoFieldSerialize/Deserialize implementations due to cross-platform compatibility concerns - Fix u64 to usize overflow handling in serialize_message_to_bytes - Remove direct OffsetDateTime serialization
8511999 to
71f8889
Compare
This adds a new #[field(serialize_as = "Type")] attribute that allows
using one type in the struct while serializing as a different type. This
is useful for platform-dependent types like usize that need to serialize
as a fixed-width type (u64) for cross-platform compatibility.
The conversion uses From/TryFrom traits - serialization uses From to
convert to the wire type, and deserialization uses TryFrom with proper
error handling.
Example usage:
#[bd_macros::proto_serializable]
pub struct Example {
#[field(id = 1, serialize_as = "u64")]
pub index: usize,
}
This allows ergonomic use of usize in code while ensuring consistent
u64 serialization across platforms.
The initial implementation used From::from() which doesn't work for usize -> u64 conversion on all platforms. Changed to use 'as' cast for serialization (always safe for usize -> u64) and TryFrom for deserialization (with proper error handling for platforms where u64 values might not fit in usize). Also added TinySet to the list of repeated field types and added a test case demonstrating the serialize_as feature.
7b23f1d to
6eb73bd
Compare
| @@ -0,0 +1,870 @@ | |||
| // shared-core - bitdrift's common client/server libraries | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least for me the code in this file is really difficult to read. Can you have the LLM add more comments? For this code I think a high level of what it does would be useful as well as maybe some example of what is being processed at each major branch point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it add more comments and also split it up into a struct mod and an enum mod, I think it's a bit easier to follow now
| // Implementation for LogType enum (int32) | ||
| // `LogType` is from `bd_proto::protos::logging::payload`. | ||
| impl ProtoType for bd_proto::protos::logging::payload::LogType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this special cased? Seems odd. Seems like this should be generic over enums?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl<E: protobuf::Enum> ProtoType for E {
fn wire_type() -> WireType {
WireType::Varint
}
}
unfortunately this doesn't work with the orphan rule, since protobuf::Enum is defined in a different crate it won't let us do this since upstream changes could cause conflicts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you do something tricky with a NewType wrapper and then just have the proc macro use the wrapper automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to make this better by building this into the macro, now by adding #[field(id = 1, proto_enum)] it will codegen code that uses the protobuf::Enum helpers directly
| if val == 0 { | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you lifted the 0/default check from the original code to here, but it would be nice if we did this generically in this file? Like check if value is == to default value (require default) and then skip size/serializing?
| @@ -0,0 +1,134 @@ | |||
| // shared-core - bitdrift's common client/server libraries | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: my sense is a bunch of this file can get shorter with some macros for shared types (numbers, strings, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I took a pass, I think it's a bit easier to follow now. Also folded the Default check into the macro per the other comment
| a: u32, | ||
| b: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I don't think we should support implicit field numbering. This seems like a footgun waiting to happen. I would probably remove it and make field numbers explicit and required. Also per offline convo in an optimal world we would check this against the .proto definitions. Fine as a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed implicit ordering
This changes the proto_serializable macro to require explicit #[field(id = N)] attributes for all non-skipped fields in both structs and enum struct variants. This removes the implicit field numbering behavior to make field IDs explicit and prevent accidental field number changes.
This removes the workflow-related changes that should be on the workflow-serde branch instead of proto-proc-macro branch.
- Use more macros to simplify seriialization code - Split macro implementations into separate files for better organization and add comments
| let tag = is.read_raw_varint32()?; | ||
| let field_number = tag >> 3; | ||
| let wire_type_bits = tag & 0x07; | ||
| let wire_type = match wire_type_bits { | ||
| 0 => WireType::Varint, | ||
| 1 => WireType::Fixed64, | ||
| 2 => WireType::LengthDelimited, | ||
| 3 => WireType::StartGroup, | ||
| 4 => WireType::EndGroup, | ||
| 5 => WireType::Fixed32, | ||
| _ => { | ||
| return Err(anyhow::anyhow!( | ||
| "Unknown wire type {wire_type_bits} (tag={tag}, field={field_number})" | ||
| )); | ||
| }, | ||
| }; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this code appears in multiple places. It's also possible that the protobuf library already has a high layer wrapper that does all this. I vaguely recall it does but not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not publicly exposed but I added a helper that seems to have made this much better, good callout
Summary
Introduces
#[proto_serializable]proc macro for zero-copy protobuf serialization.The macro generates manual protobuf serialization code that:
Changes
#[proto_serializable]ProtoFieldSerialize,ProtoFieldDeserialize,RepeatedFieldDeserialize) and runtime helpersStringOrBytesand related types