fix(proxy): strip double length-prefix on fetch/produce fan-out re-marshal#157
Open
kamir wants to merge 2 commits into
Open
fix(proxy): strip double length-prefix on fetch/produce fan-out re-marshal#157kamir wants to merge 2 commits into
kamir wants to merge 2 commits into
Conversation
…re-marshal encodeFetchRequest/encodeProduceRequest returned kmsg RequestFormatter.AppendRequest output, which already carries a 4-byte big-endian size prefix. forwardToBackend writes that payload via protocol.WriteFrame, which prepends its own size prefix. On any path that re-marshals (multi-broker fan-out where canUseOriginal is false, or a retry where originalPayload is nil'd) the on-wire frame became [size][size][header][body]. The broker reads the outer size, then parses the inner size bytes as apiKey/apiVersion, producing the observed "decode Produce v###: not enough data" and a dead consume path on the 3-broker azure-sim. The canUseOriginal branch was unaffected because originalPayload is the prefix-less ReadFrame payload. Fix: strip the leading 4 bytes that AppendRequest writes so WriteFrame is the single source of the size prefix. Add a real round-trip regression (encode -> WriteFrame -> ReadFrame -> ParseRequest, the exact broker decode path) for both fetch (v12, carries topic name) and produce (v9); it fails on the pre-fix code with the same family of decode garbage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…prefix regression Strengthen TestEncodeRequestNoDoubleLengthPrefix so it proves the single-length-prefix invariant directly instead of only checking the decoded header. For each case, after protocol.WriteFrame the test now asserts the first 4 bytes equal big-endian(len(rest)) AND that rest equals the encodeFetchRequest/encodeProduceRequest output byte-for-byte (exactly one prefix), then ParseRequest round-trips with the correct apiKey/apiVersion. Cases added: single-partition fetch, multi-partition fetch, a multi-broker fan-out (sub-requests built the way groupFetchPartitionsByBroker builds them, each encoded and asserted to frame singly), and an empty-partition fetch. The fan-out case also asserts no two sub-request encodings share a backing array, confirming no buffer aliasing on the hot path (each encode does a fresh AppendRequest(nil, ...) and returns a sub-slice of that buffer). go test ./cmd/proxy/... passes; negative-tested by reverting the b[4:] strip in both encode funcs, which makes every case fail with the same decode garbage, then restored. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The proxy double-frames forwarded Fetch/Produce requests on the re-marshal path, so the
backend broker reads the inner size prefix as the request's apiKey/apiVersion and rejects
every request with an impossible version. This breaks the consume path whenever a fetch
fans out to more than one broker, or on any retry. One-line cause, one-line fix in two
helpers, plus a strengthened round-trip regression test.
This is the REQUEST re-marshal path (not response assembly). Built on top of #156
(
92458c7); see the consume-path boundary below for how this relates to #156 and #149.Symptom
With a multi-broker cluster, consumers get no data. Broker logs, on essentially every request:
(Produce maxes near v11, so v544 is impossible.) Proxy logs:
Root cause
forwardToBackendsends the request viaprotocol.WriteFrame(conn, payload), which prependsthe 4-byte Kafka message-size prefix. The fetch fan-out has two payload sources:
canUseOriginal(payload = originalPayload):originalPayloadcame fromReadFrameand isheader+body WITHOUT the size prefix, so
WriteFrameadds exactly one prefix. Correct.payload = encodeFetchRequest(header, subReq)):encodeFetchRequestreturnskmsg.RequestFormatter.AppendRequest(...), which ALREADY includes the 4-byte size prefix.WriteFramethen adds a second one, so the wire is[size][size][header][body]. The brokerreads the outer size, then parses the inner 4-byte size as apiKey(2)+apiVersion(2) and gets
garbage.
canUseOriginal := originalPayload != nil && len(groups) == 1, so the double-frame fireswhenever the fetch fans out to more than one broker, or on any retry iteration (where
originalPayloadis set to nil). A single-broker cluster never hit it because it always usedthe original prefix-less payload, which is why this was not seen earlier.
encodeProduceRequesthas the identical
AppendRequest+WriteFramepattern and the same latent double-frame.Fix
Strip the 4-byte prefix that
AppendRequestwrites, in bothencodeFetchRequestandencodeProduceRequest, sinceWriteFramere-adds it. The request body stays framed thesame way it was; we only remove the redundant inner prefix so
WriteFrameis the singlesource of the size prefix:
No buffer aliasing on the fan-out hot path
The fan-out encodes each broker sub-request independently. Each
encodeFetchRequest/encodeProduceRequestcall doesformatter.AppendRequest(nil, ...), i.e. a freshallocation per sub-request, and returns
b[4:], a sub-slice of that same fresh buffer.Concurrent sub-requests therefore never share or mutate one backing buffer. The
strengthened test asserts this directly: across the fan-out sub-requests no two encodings
share a backing array.
Test
TestEncodeRequestNoDoubleLengthPrefix(cmd/proxy/main_test.go) is a real round-trip:encode -> protocol.WriteFrame -> protocol.ReadFrame -> protocol.ParseRequest, the exactbroker decode path. It was strengthened to prove the single-prefix invariant directly
rather than only checking the decoded header. For each case, after
WriteFrameit asserts:rest(the payload the broker reads back) equals theencodeFetchRequest/encodeProduceRequestoutput byte-for-byte,so there is provably exactly one length prefix on the wire. It then asserts
ParseRequestreturns the correct apiKey/apiVersion and that the topics/partitions survive.
Cases covered:
groupFetchPartitionsByBrokerbuilds them (one
*kmsg.FetchRequestper owning broker, settings copied from the parent),each sub-request is encoded and asserted to frame singly and round-trip, and the no-aliasing
check above runs across them,
go build ./...,go vet ./cmd/proxy/...,gofmt, andgo test ./cmd/proxy/...are green.Negative-tested: reverting the
b[4:]strip in both encode funcs makes every case fail withthe same
decode Produce v###/insufficient bytesgarbage from the symptom, then restored.The byte-level round-trip test is the concrete runnable evidence here. A live multi-broker
capture is pending the lab coming back online; the franz-go recipe below is the manual
reproduction to run against a real cluster once it is up.
Consume-path boundary: how this relates to #156 and #149
These three are orthogonal, not redundant:
92458c7): handles a transient backend EOF immediately after a brokerrestart, on the retry path. It is covered by a test that uses a mock backend and asserts
the retry behaviour; that test does not exercise real request framing, which is why Empty eof race edge #156
was green while this double-prefix defect was still live. The two are independent.
config (
KAFSCALE_PRODUCE_SYNC_FLUSH=true), this is the consume fix for the common case.It is the prime suspect for the consume-through-proxy empty-result symptom, because any
multi-broker fan-out or retry double-frames the request and the broker rejects it.
because with sync-flush on the records are already durable before the produce is
acknowledged. It only matters under flush-disabled /
acks=0.So the default-config consume fix is this PR. #156 fixes a different (EOF/restart) failure
on the same forward path, and #149 only changes behaviour once you turn the default
sync-flush off.
franz-go reproduction recipe
Lab is offline, so this is the recipe to run against a real multi-broker cluster rather than
captured output. Produce N records across a multi-partition topic, then consume the same
topic THROUGH the proxy with a franz-go (
kgo) client and observe empty before the fix and Nrecords after, at the same topic and HWM.
A multi-broker cluster (so the fetch fans out to more than one broker) is what makes the
re-marshal path fire. Against a single broker the original prefix-less payload is used and the
bug is masked.