Skip to content

Conversation

@snowp
Copy link
Contributor

@snowp snowp commented Dec 23, 2025

Summary

Introduces #[proto_serializable] proc macro for zero-copy protobuf serialization.

The macro generates manual protobuf serialization code that:

  • Eliminates allocations by working directly with protobuf wire format
  • Supports auto-assigned and explicit field numbering
  • Handles repeated fields (Vec, HashMap, etc.) with optimized deserialization paths
  • Provides type-safe oneOf enum serialization

Changes

  • bd-macros: New proc macro crate implementing #[proto_serializable]
  • bd-proto-util: Added serialization traits (ProtoFieldSerialize, ProtoFieldDeserialize, RepeatedFieldDeserialize) and runtime helpers
  • bd-key-value: Migrated from bincode to protobuf serialization while maintaining backward compatibility for strings
  • bd-log-primitives: Applied macro to StringOrBytes and related types
  • bd-crash-handler: Updated global state storage to use protobuf serialization

@snowp snowp changed the title Proto proc macro introduce proc macro for zero-copy protobuf serialization Dec 23, 2025
@snowp snowp force-pushed the proto-proc-macro branch 19 times, most recently from 7f83b66 to f0f1ef7 Compare December 24, 2025 20:44
…spec

Per protobuf specification, when a map entry is missing its key or value field,
the type's default value should be used instead of returning an error.

Changes:
- Add Default bounds to deserialize_map_entry generic parameters
- Use unwrap_or_default() instead of returning errors for missing fields
- Add Default bounds to HashMap/AHashMap ProtoFieldDeserialize impls
- Update documentation to reflect protobuf-compatible behavior

This allows map types to support any value type that implements Default.
For types without Default (like OffsetDateTime), users should wrap them
in Option<T> so that None is the default, avoiding epoch timestamps.
…deserialization and update field syntax

1. Update to new #[field(id = N)] syntax
2. Add RepeatedFieldDeserialize trait for optimized repeated field deserialization

Changes:
- Replace #[field = N] with #[field(id = N)] syntax in tests
- Add RepeatedFieldDeserialize trait to eliminate intermediate allocations
- Implement trait for HashMap, AHashMap, Vec<T> (except Vec<u8>), and TinyMap
- Support both explicit #[field(repeated)] marker and auto-detection of common types
- Auto-detect: Vec, HashMap, AHashMap, TinyMap, BTreeMap, IndexMap
- Special case Vec<u8> as bytes field (not repeated)

Performance: Reduces repeated field deserialization from O(n) intermediate
container allocations to O(1) final container allocation.
@snowp snowp marked this pull request as ready for review December 27, 2025 17:38
@snowp snowp requested a review from Copilot December 27, 2025 20:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a procedural macro system for zero-copy protobuf serialization, eliminating the need for allocations during serialization/deserialization by working directly with the protobuf wire format. The implementation includes compile-time generated serialization code, support for auto-assigned and explicit field numbering, optimized repeated field handling, and type-safe oneOf enum serialization.

  • Adds new bd-macros crate with #[proto_serializable] procedural macro
  • Creates bd-proto-util/serialization module with traits and runtime helpers
  • Migrates existing code to use protobuf serialization instead of bincode

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
bd-workflows/Cargo.toml Adds dependencies on bd-macros and bd-proto-util for protobuf serialization
bd-proto-util/src/serialization/types.rs Implements serialization traits for primitive types, collections, time types, and special types
bd-proto-util/src/serialization/tests/macro_test.rs Comprehensive test suite validating macro-generated serialization code
bd-proto-util/src/serialization/runtime.rs Runtime helper for reading nested protobuf messages
bd-proto-util/src/serialization/mod.rs Core trait definitions for protobuf serialization system
bd-proto-util/src/serialization/map.rs Helper functions for map serialization/deserialization
bd-proto-util/src/serialization/macros.rs Declarative macro for reducing repeated field implementation boilerplate
bd-proto-util/src/lib.rs Exports new serialization module
bd-proto-util/Cargo.toml Adds dependencies for serialization support
bd-macros/src/lib.rs Core procedural macro implementation generating serialization code
bd-macros/Cargo.toml New crate manifest for procedural macros
bd-log-primitives/src/tiny_set/proto.rs Protobuf serialization for TinySet and TinyMap collections
bd-log-primitives/src/tiny_set.rs Adds iterator and Extend trait implementations for protobuf compatibility
bd-log-primitives/src/lib.rs Updates timestamp serialization to use new protobuf traits
bd-log-primitives/Cargo.toml Adds dependencies for protobuf serialization
Cargo.toml Registers bd-macros crate in workspace

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 109 to 124
#[allow(clippy::cast_possible_truncation)]
protobuf::rt::uint64_size(field_number, *self as u64)
}

fn serialize(&self, field_number: u32, os: &mut CodedOutputStream<'_>) -> Result<()> {
#[allow(clippy::cast_possible_truncation)]
os.write_uint64(field_number, *self as u64)?;
Ok(())
}
}

impl ProtoFieldDeserialize for usize {
fn deserialize(is: &mut CodedInputStream<'_>) -> Result<Self> {
let value = is.read_uint64()?;
#[allow(clippy::cast_possible_truncation)]
Ok(value as Self)
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast from usize to u64 may truncate on 32-bit platforms, but more importantly, this could result in inefficient size computation if usize is smaller than u64 on the target platform. Consider using the actual size of usize for the protobuf encoding rather than always using uint64_size.

Suggested change
#[allow(clippy::cast_possible_truncation)]
protobuf::rt::uint64_size(field_number, *self as u64)
}
fn serialize(&self, field_number: u32, os: &mut CodedOutputStream<'_>) -> Result<()> {
#[allow(clippy::cast_possible_truncation)]
os.write_uint64(field_number, *self as u64)?;
Ok(())
}
}
impl ProtoFieldDeserialize for usize {
fn deserialize(is: &mut CodedInputStream<'_>) -> Result<Self> {
let value = is.read_uint64()?;
#[allow(clippy::cast_possible_truncation)]
Ok(value as Self)
#[cfg(target_pointer_width = "64")]
{
#[allow(clippy::cast_possible_truncation)]
protobuf::rt::uint64_size(field_number, *self as u64)
}
#[cfg(target_pointer_width = "32")]
{
#[allow(clippy::cast_possible_truncation)]
protobuf::rt::uint32_size(field_number, *self as u32)
}
}
fn serialize(&self, field_number: u32, os: &mut CodedOutputStream<'_>) -> Result<()> {
#[cfg(target_pointer_width = "64")]
{
#[allow(clippy::cast_possible_truncation)]
os.write_uint64(field_number, *self as u64)?;
}
#[cfg(target_pointer_width = "32")]
{
#[allow(clippy::cast_possible_truncation)]
os.write_uint32(field_number, *self as u32)?;
}
Ok(())
}
}
impl ProtoFieldDeserialize for usize {
fn deserialize(is: &mut CodedInputStream<'_>) -> Result<Self> {
#[cfg(target_pointer_width = "64")]
{
let value = is.read_uint64()?;
#[allow(clippy::cast_possible_truncation)]
Ok(value as Self)
}
#[cfg(target_pointer_width = "32")]
{
let value = is.read_uint32()?;
#[allow(clippy::cast_possible_truncation)]
Ok(value as Self)
}

Copilot uses AI. Check for mistakes.
Comment on lines 404 to 406
// Since OffsetDateTime maps to google.protobuf.Timestamp and google.protobuf.Timestamp is a
// trivial message we can just delegate to the protobuf implementation here.
Timestamp::default().compute_size() + protobuf::rt::tag_size(field_number)
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Computing the size of a default Timestamp on every call is inefficient. The size of an empty Timestamp message is constant and should be computed once or treated as a known constant. Consider caching this value or computing it directly.

Suggested change
// Since OffsetDateTime maps to google.protobuf.Timestamp and google.protobuf.Timestamp is a
// trivial message we can just delegate to the protobuf implementation here.
Timestamp::default().compute_size() + protobuf::rt::tag_size(field_number)
// Since OffsetDateTime maps to google.protobuf.Timestamp and the size of an empty
// google.protobuf.Timestamp message is constant (zero), we only need to account
// for the tag size here.
protobuf::rt::tag_size(field_number)

Copilot uses AI. Check for mistakes.
ts.seconds
.checked_mul(1_000_000_000)
.and_then(|s| s.checked_add(i64::from(ts.nanos)))
.ok_or_else(|| anyhow::anyhow!("Timestamp overflow"))?
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message 'Timestamp overflow' is too generic. It should include the actual values that caused the overflow (seconds and nanos) to aid in debugging.

Suggested change
.ok_or_else(|| anyhow::anyhow!("Timestamp overflow"))?
.ok_or_else(|| {
anyhow::anyhow!(
"Timestamp overflow when combining seconds and nanos: seconds={}, nanos={}",
ts.seconds,
ts.nanos
)
})?

Copilot uses AI. Check for mistakes.
let field_name = field.ident.as_ref().unwrap();
let field_type = &field.ty;
#[allow(clippy::cast_possible_truncation)]
let field_num = (idx + 1) as u32;
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field numbering starts at 1 for struct variant fields, but this doesn't respect any explicit field numbering that might be specified via #[field(id = N)] attributes. The code should parse and respect field attributes for struct variant fields.

Copilot uses AI. Check for mistakes.
Comment on lines 444 to 447
let auto_idx = fields_with_attrs[..= field_idx]
.iter()
.filter(|(_, a)| !a.skip)
.count();
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-assigned field numbers start at 0, but protobuf field numbers must start at 1. This will cause all auto-assigned fields to be off by one, resulting in incompatible wire format.

Copilot uses AI. Check for mistakes.
}

os.write_tag(field_number, protobuf::rt::WireType::LengthDelimited)?;
os.write_raw_varint32(inner_size as u32)?;
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as comments 6 and 8: casting u64 to u32 without validation. This appears in multiple places and should be addressed consistently.

Suggested change
os.write_raw_varint32(inner_size as u32)?;
let inner_size_u32 = u32::try_from(inner_size)
.map_err(|_| ::anyhow::anyhow!("serialized message size {} exceeds u32::MAX", inner_size))?;
os.write_raw_varint32(inner_size_u32)?;

Copilot uses AI. Check for mistakes.
- Add overflow checks for u64 to u32 conversions in length-delimited field serialization
- Add field numbering validation to ensure auto-assigned field numbers start at 1
- Remove usize/isize ProtoFieldSerialize/Deserialize implementations due to cross-platform compatibility concerns
- Fix u64 to usize overflow handling in serialize_message_to_bytes
- Remove direct OffsetDateTime serialization
This adds a new #[field(serialize_as = "Type")] attribute that allows
using one type in the struct while serializing as a different type. This
is useful for platform-dependent types like usize that need to serialize
as a fixed-width type (u64) for cross-platform compatibility.

The conversion uses From/TryFrom traits - serialization uses From to
convert to the wire type, and deserialization uses TryFrom with proper
error handling.

Example usage:
  #[bd_macros::proto_serializable]
  pub struct Example {
    #[field(id = 1, serialize_as = "u64")]
    pub index: usize,
  }

This allows ergonomic use of usize in code while ensuring consistent
u64 serialization across platforms.
The initial implementation used From::from() which doesn't work for
usize -> u64 conversion on all platforms. Changed to use 'as' cast
for serialization (always safe for usize -> u64) and TryFrom for
deserialization (with proper error handling for platforms where u64
values might not fit in usize).

Also added TinySet to the list of repeated field types and added a
test case demonstrating the serialize_as feature.
@@ -0,0 +1,870 @@
// shared-core - bitdrift's common client/server libraries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for me the code in this file is really difficult to read. Can you have the LLM add more comments? For this code I think a high level of what it does would be useful as well as maybe some example of what is being processed at each major branch point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it add more comments and also split it up into a struct mod and an enum mod, I think it's a bit easier to follow now

Comment on lines 619 to 621
// Implementation for LogType enum (int32)
// `LogType` is from `bd_proto::protos::logging::payload`.
impl ProtoType for bd_proto::protos::logging::payload::LogType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this special cased? Seems odd. Seems like this should be generic over enums?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impl<E: protobuf::Enum> ProtoType for E {
  fn wire_type() -> WireType {
    WireType::Varint
  }
}

unfortunately this doesn't work with the orphan rule, since protobuf::Enum is defined in a different crate it won't let us do this since upstream changes could cause conflicts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do something tricky with a NewType wrapper and then just have the proc macro use the wrapper automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to make this better by building this into the macro, now by adding #[field(id = 1, proto_enum)] it will codegen code that uses the protobuf::Enum helpers directly

Comment on lines 631 to 633
if val == 0 {
return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you lifted the 0/default check from the original code to here, but it would be nice if we did this generically in this file? Like check if value is == to default value (require default) and then skip size/serializing?

@@ -0,0 +1,134 @@
// shared-core - bitdrift's common client/server libraries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: my sense is a bunch of this file can get shorter with some macros for shared types (numbers, strings, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I took a pass, I think it's a bit easier to follow now. Also folded the Default check into the macro per the other comment

Comment on lines 20 to 21
a: u32,
b: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't think we should support implicit field numbering. This seems like a footgun waiting to happen. I would probably remove it and make field numbers explicit and required. Also per offline convo in an optimal world we would check this against the .proto definitions. Fine as a TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed implicit ordering

This changes the proto_serializable macro to require explicit #[field(id = N)]
attributes for all non-skipped fields in both structs and enum struct variants.
This removes the implicit field numbering behavior to make field IDs explicit
and prevent accidental field number changes.
This removes the workflow-related changes that should be on the workflow-serde
branch instead of proto-proc-macro branch.
- Use more macros to simplify seriialization code
- Split macro implementations into separate files for better organization and add comments
Comment on lines 27 to 43
let tag = is.read_raw_varint32()?;
let field_number = tag >> 3;
let wire_type_bits = tag & 0x07;
let wire_type = match wire_type_bits {
0 => WireType::Varint,
1 => WireType::Fixed64,
2 => WireType::LengthDelimited,
3 => WireType::StartGroup,
4 => WireType::EndGroup,
5 => WireType::Fixed32,
_ => {
return Err(anyhow::anyhow!(
"Unknown wire type {wire_type_bits} (tag={tag}, field={field_number})"
));
},
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this code appears in multiple places. It's also possible that the protobuf library already has a high layer wrapper that does all this. I vaguely recall it does but not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not publicly exposed but I added a helper that seems to have made this much better, good callout

@snowp snowp enabled auto-merge (squash) January 2, 2026 19:28
@snowp snowp merged commit 58e0655 into main Jan 2, 2026
6 of 7 checks passed
@snowp snowp deleted the proto-proc-macro branch January 2, 2026 19:32
@github-actions github-actions bot locked and limited conversation to collaborators Jan 2, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants