Skip to content

[Driver] Stream the auction JSON body to S3 + solvers#4575

Open
jmg-duarte wants to merge 12 commits into
mainfrom
jmgd/stream-solve-body
Open

[Driver] Stream the auction JSON body to S3 + solvers#4575
jmg-duarte wants to merge 12 commits into
mainfrom
jmgd/stream-solve-body

Conversation

@jmg-duarte

@jmg-duarte jmg-duarte commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Description

Streams the auction JSON through the network, avoiding the large memory spike from generating it upfront and then working with it.

The implementation tee's the JSON to the solvers and S3; in the S3 path it gets gzipped first (if only brotli was properly supported 😮‍💨)

Following numbers are with 8 chunks in memory

Big picture (Arb1 shadow, Arb1 Staging, Avalanche Staging)
image

Avalanche, in yellow, before the patch, in blue after (with a trade)
image

Longer time horizon — note the size of the spikes in orange VS the purple ones
image

These numbers are with 2 chunks

In blue, orders I placed
Screenshot 2026-06-30 at 16 30 00


Production, Base, avg_over_time CPU

image

Production, Base, current + avg_over_time Memory

image

The tests above were carried out without 381aa8a which we've seen reduce the memory spikes

Changes

  • New tee stream
  • New chunking stream

How to test

Graphs above, S3 auctions are intact

@jmg-duarte jmg-duarte force-pushed the jmgd/stream-solve-body branch from 3377af2 to ed8210a Compare June 29, 2026 15:16
Comment thread crates/driver/src/infra/solver/streaming.rs Outdated
Comment thread crates/driver/src/infra/solver/streaming.rs Outdated
Comment thread crates/driver/src/infra/solver/streaming/mod.rs
S: Write + Finalize + Send + 'static,
{
let (tx, rx) = mpsc::channel::<std::io::Result<Bytes>>(CHANNEL_CAPACITY);
let body = reqwest::Body::wrap_stream(ReceiverStream::new(rx));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the ByteStream that currently lives in the autopilot? That would automatically provide metrics about how long each solver needs to stream the bytes which can be very helpful to check if an issue is on our end or on the solver's end.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done but this makes the PR bigger and a bit harder to reason about, i'd extract it to reduce burden on you guys

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Then let's leave this for later to not have too many changes in the same PR.

Comment thread crates/driver/src/infra/solver/streaming/tee_writer.rs
Comment thread crates/driver/src/infra/solver/streaming.rs Outdated
jmg-duarte and others added 6 commits June 29, 2026 18:50
Review fixes that don't depend on the transmission-timing wrapper:
- Reduce CHANNEL_CAPACITY 8 -> 2; double buffering is enough to keep the
  consumer fed without raising the memory ceiling.
- Carry raw `Bytes` over the body channel and wrap them into infallible
  `Ok`s at the `wrap_stream` boundary instead of channeling `Result`s.
- Move the gzip compression policy into the `s3` crate via `GzipWriter`,
  so the streamed archive copy is compressed with the same settings as
  the eager upload path.
- Use the `mod.rs` layout for the `streaming` module.
- Note the UNIX `tee` lineage in `TeeWriter`'s docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract the request-body transmission timing that lived in autopilot's
`ByteStream` into a generic `observe::http_body::Measured` stream wrapper,
and apply it to the driver's streamed `/solve` body so the driver->solver
hop also reports `to_transmission_start_ms` / `transmission_ms`.

autopilot now wraps a plain single-chunk stream with `Measured` and drops
the bespoke `ByteStream` type, which was only a `Bytes`-to-stream adapter
once its timing moved out.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jmg-duarte jmg-duarte marked this pull request as ready for review June 30, 2026 11:05
@jmg-duarte jmg-duarte requested a review from a team as a code owner June 30, 2026 11:05
@claude

claude Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Claude finished @jmg-duarte's task in 3m 11s —— View job


PR Review: Stream the auction JSON body to S3 + solvers

Nice change — the tee + chunked-streaming approach is clean and the memory-spike graphs are convincing. The TeeWriter/Finalize/GzipCapture decomposition reads well and the tests cover the important paths (byte forwarding + gzip round-trip). A few things worth a look before merge, posted inline:

  • Read changed files and understand the streaming/tee implementation
  • Review for correctness, error handling, and concurrency issues
  • Post inline comments
  • Post summary

Findings (inline):

  1. serialize_request overhead metric flatlines — serialization is now deferred into the streaming task, so the measurement right after building the body captures near-zero instead of the actual serialize cost. (solver/mod.rs:410)
  2. Archival is now coupled to solver-request success — if the solver connection drops/times out mid-upload, serialization aborts before finalize() and the S3 archive is silently skipped (regression vs. the eager archive_auction on main). Also flags that the spawn_blocking thread is now held for the whole upload. (solver/mod.rs:400)
  3. Truncated body on serialization error (low risk) — the Infallible stream can't abort the request, so a mid-stream serialize error sends a truncated-but-complete-looking body. Noted as the trade-off behind the Infallible simplification. (streaming/mod.rs:56)

Nothing blocking from a pure-correctness standpoint; #2 is the one I'd most want a deliberate decision on (is dropping the archive on a failed solver request acceptable?). The s3-crate-vs-driver compression placement and ByteStream metrics points were already discussed and deferred, so I didn't re-raise them.

Comment thread crates/driver/src/infra/solver/mod.rs Outdated
Comment on lines 410 to 416
if auction.id().is_some() {
::observe::metrics::metrics().measure_auction_overhead(
start,
"driver",
"serialize_request",
);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The serialize_request overhead metric no longer measures serialization. stream_body/stream_body_and_gzip return immediately with a lazy body handle — the actual serde_json::to_writer work now happens inside the spawn_blocking task, driven by network backpressure. So start..now here captures only the DTO construction + channel setup (near-zero), not the serialization cost it used to measure on the eager path.

If the intent is to keep tracking serialization time, the measurement needs to move into the streaming task (e.g. around serde_json::to_writer); otherwise this metric will silently flatline after this change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's possible to measure this without the stream wrapper we discussed

Comment on lines +400 to +409
let body: reqwest::Body = match archive_id {
// Stream the request body while capturing a gzipped copy for S3, so
// neither the request nor the archive holds the full JSON at once.
Some(id) => {
let (body, compressed) = streaming::stream_body_and_gzip(auction_dto);
self.persistence.archive_auction_gzipped(id, compressed);
body
}
None => streaming::stream_body(auction_dto),
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior change worth a second look: S3 archival is now coupled to the solver request streaming successfully to completion.

On main, the auction was serialized eagerly into Bytes and archive_auction uploaded that buffer independently of the solver request outcome. Here, the gzipped bytes only reach archive_auction_gzipped when finalize() runs, which only happens if serde_json::to_writer completes — and that requires reqwest to pull the entire body. If the solver connection drops or times out mid-upload, ChannelWriter::blocking_send errors → serialization aborts early → GzipCapture is dropped without finalize() → the oneshot sender is dropped → the archive is silently skipped.

Net effect: auctions whose solver request fails partway through transmission no longer get archived — which may be exactly the auctions you'd want to inspect later. Is dropping the archive in that case intended? If not, consider decoupling the archive from the request stream (e.g. tee into an independent buffer rather than gating finalize on full request consumption).

Secondary note: the spawn_blocking thread is now held for the full duration of the body upload (it blocks on blocking_send until reqwest drains each chunk), rather than being released right after producing the buffer. Per-solve that's a blocking-pool thread occupied for the whole solver round-trip.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I solved this with the TeeWriter and the BestEffortSink, which will allow the TeeWriter to continue uploading to S3 when this scenarion happens

Comment thread crates/driver/src/infra/solver/streaming/mod.rs

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces streaming serialization and compression for auction payloads to optimize memory usage. Instead of buffering the entire uncompressed JSON in memory, the request body is streamed to the solver while a gzipped copy is captured on the fly for S3 archival. Feedback on these changes highlights the need to elevate the log level for serialization and flushing errors from debug to warn/error, and to make the TeeWriter::write implementation more robust by properly handling partial writes instead of assuming full writes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread crates/driver/src/infra/solver/streaming/mod.rs
Comment thread crates/driver/src/infra/solver/streaming/mod.rs
Comment thread crates/driver/src/infra/solver/streaming/tee_writer.rs

@MartinquaXD MartinquaXD left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. When I briefly looked at the metrics I remember this incurring some non-trivial CPU overhead. Could you please provide updated metrics for this?
Ideally from some network with reasonably sized auctions. Given that this was already e2e tested thoroughly one should probably just run it in prod for a little bit to get the more accurate and relevant numbers for this.

Comment thread crates/driver/Cargo.toml Outdated
Comment thread crates/driver/src/infra/solver/streaming/mod.rs
Comment thread crates/driver/src/infra/solver/mod.rs Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants