
Fix fragment-induced BAD,Incorrect result rejects: buffer TCP reads per connection #4

Open

gionag wants to merge 1 commit into duino-coin:main from gionag:fix/receivedata-buffering

Conversation

gionag commented Apr 20, 2026

Problem

Under sustained load a fraction of mining shares were rejected as
BAD,Incorrect result even when the submitted nonce was
mathematically correct. Miners saw a rejection rate of roughly
0.3 % with no apparent cause; running the unmodified code in this
repo reproduced the problem at ~0.17 %.

Root cause

receiveData in src/mining.js attached a one-shot 'data'
listener and resolved with whatever single chunk Node.js delivered:

const receiveData = (conn) => new Promise((resolve) => {
    conn.on("data", function listener(data) {
        // One-shot: detach after the first chunk and resolve with it,
        // assuming it holds exactly one complete message.
        conn.removeListener("data", listener);
        resolve(data.trim());
    });
});

TCP does not preserve application-level message boundaries. Under
event-loop pressure a single client message can span two 'data'
events, or two messages can arrive in one event. The one-shot
pattern has no way to recover: answer[0] ends up holding a
fragment of a different message ("JOB", a mining_key suffix,
etc.), parseInt(...) returns NaN, and miner_res !== random
triggers BAD,Incorrect result on a perfectly valid share.
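
To make the failure concrete, here is a hypothetical illustration.
Suppose a miner submits one logical message, roughly in the form
"result,hashrate,software", and the kernel delivers it in two
'data' events (chunk contents are invented for the example):

const first = "2811";                      // first 'data' event
const second = "066,194000,AVR Miner";     // second event, never read by the one-shot listener

// The one-shot listener resolves with the first chunk alone:
const answer = first.trim().split(",");    // ["2811"]
const miner_res = parseInt(answer[0], 10); // 2811 instead of 2811066 -> share rejected

// Worse, the leftover "066,194000,AVR Miner" is consumed by the *next*
// receiveData call, so its answer[0] is a stray fragment and
// parseInt(...) returns a wrong value or NaN.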

The same fragmentation risk exists in connectionHandler.js
mainListener for the very first JOB_REQ on a new connection —
a truncated mining_key can be passed to miningHandler and
propagated into minersStats.pw for the rest of the session.

Fix

Install a single persistent 'data' listener per connection that
appends incoming chunks to a per-connection buffer, then hands out
one message per receiveData call (a sketch follows the framing
rules below).

Framing is dual-mode for backward compatibility:

  • Fast path: if \n is already in the buffer, cut there. Clients
    that terminate their messages with \n get exact boundary
    detection.
  • Legacy path: if no \n arrives within LEGACY_IDLE_FLUSH_MS
    (10 ms) of the last chunk, treat the buffered bytes as a single
    message. Fragments of the same logical message arrive
    sub-millisecond apart on the receiving side, so this idle gap
    cleanly separates "still arriving" from "message complete".

Existing miners that do not append \n (AVR_Miner.py,
PC_Miner.py, and most community clients) are unaffected — they
hit the legacy path with an imperceptible 10 ms tail. New clients
that opt in to \n termination get the zero-latency fast path.
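
A minimal sketch of this buffering scheme, assuming the socket has
setEncoding("utf8") applied; conn._recvBuf and LEGACY_IDLE_FLUSH_MS
are the names used by the patch (see Files changed), while
attachBuffer, _waiters, _flushTimer and _receive are invented here
for illustration and the actual code may differ:

const LEGACY_IDLE_FLUSH_MS = 10;

// Sketch only: installed once per accepted connection.
const attachBuffer = (conn) => {
    conn._recvBuf = "";
    conn._waiters = [];          // resolvers of pending receiveData calls
    conn._flushTimer = null;

    const deliver = (msg) => {
        const resolve = conn._waiters.shift();
        if (resolve) resolve(msg.trim());
    };

    const drain = () => {
        // Fast path: a newline in the buffer marks a complete message.
        let nl;
        while (conn._waiters.length && (nl = conn._recvBuf.indexOf("\n")) !== -1) {
            const msg = conn._recvBuf.slice(0, nl);
            conn._recvBuf = conn._recvBuf.slice(nl + 1);
            deliver(msg);
        }
        // Legacy path: no newline yet, so flush the whole buffer if the
        // connection stays quiet for LEGACY_IDLE_FLUSH_MS.
        if (conn._waiters.length && conn._recvBuf.length) {
            clearTimeout(conn._flushTimer);
            conn._flushTimer = setTimeout(() => {
                const msg = conn._recvBuf;
                conn._recvBuf = "";
                deliver(msg);
            }, LEGACY_IDLE_FLUSH_MS);
        }
    };

    conn.on("data", (chunk) => {
        clearTimeout(conn._flushTimer);  // more bytes arrived: still assembling
        conn._recvBuf += chunk;
        drain();
    });

    conn._receive = () => new Promise((resolve) => {
        conn._waiters.push(resolve);
        drain();                         // a full message may already be buffered
    });
};

With something like this in place, receiveData reduces to a call to
conn._receive(), and the idle flush only ever runs on the legacy
(no-\n) path.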

Validation

Test harness: 47 concurrent virtual-worker TCP sockets against the
patched pool, diff = 10, ~0.5 shares/sec/worker.

                         Unpatched (baseline)    Patched
Duration                 ~15 min                 35 min
Shares processed         ~17,000                 33,947
BAD,Incorrect result     30 (0.17 %)             0
First reject             ~14,192 shares          never

Evidence on the unpatched side included 30 distinct reject cycles
where the client's submitted nonce was the correct answer to the
pool's own (lastBlockhash, newHash) challenge (verified by
client-side SHA-1 comparison), but the pool's answer[0] parsed
as "JOB" or "...destroyer" (the mining_key tail), showing that the
assumption of one whole message per 'data' event did not hold.
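
For reference, the client-side check amounts to recomputing the
share: SHA-1 of the previous block hash concatenated with the nonce
must equal the expected hash. This is a sketch of that comparison,
not the harness code itself; the function and variable names are
illustrative:

const crypto = require("crypto");

// Returns true if `nonce` is the correct answer to the
// (lastBlockhash, newHash) challenge handed out by the pool.
const isCorrectNonce = (lastBlockhash, newHash, nonce) => {
    const digest = crypto
        .createHash("sha1")
        .update(lastBlockhash + String(nonce))
        .digest("hex");
    return digest === newHash;
};

// A share rejected as BAD,Incorrect result even though
// isCorrectNonce(...) returns true points at the pool-side parse,
// not at the miner's math.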

Files changed

  • src/mining.js: receiveData rewritten with a persistent
    per-connection buffer + dual-mode framing.
  • src/connectionHandler.js: mainListener refactored to the
    same pattern so the first JOB_REQ gets the same protection;
    post-\n leftover is handed off to mining.js via the shared
    conn._recvBuf so no bytes are lost at the handler boundary
    (sketched below).
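
Continuing the sketch from the Fix section, the refactored
mainListener could look roughly like this (miningHandler is the
existing handler named above; its signature and the conn._receive
helper are assumptions of the sketch, not the patch itself):

// Rough sketch; the real code in the patch may differ.
const mainListener = async (conn) => {
    attachBuffer(conn);                      // persistent listener + conn._recvBuf
    const firstMsg = await conn._receive();  // e.g. "JOB_REQ,username,mining_key"
    // Anything received after the first \n is still sitting in
    // conn._recvBuf, so mining.js continues from the same buffer and
    // no bytes are lost at the handler boundary.
    miningHandler(conn, firstMsg);           // signature assumed for illustration
};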

No client-side changes required. No API changes. No configuration
changes.

…d share rejects

receiveData attached a one-shot 'data' listener and resolved with
whatever single chunk Node.js delivered. TCP does not preserve
application-level message boundaries — a client message can span
multiple 'data' events, or two messages can coalesce into one —
and this pattern had no way to recover. answer[0] ended up parsing
fragments ("JOB", mining_key tails), parseInt returned NaN,
miner_res !== random, and valid shares got rejected as
BAD,Incorrect result.

Observed rate under 47 concurrent mining sockets over ~80 ms RTT:
roughly 0.34 % of shares rejected. Reproduced locally at ~0.17 %;
first reject consistently appeared after ~14k clean shares.

Fix: install a single persistent 'data' listener per connection
that accumulates into a per-conn buffer and hands out one message
at a time. Framing is dual-mode for backward compatibility:

  - If the buffer contains '\n', cut there (robust path).
  - Otherwise wait LEGACY_IDLE_FLUSH_MS (10 ms) after the last
    chunk; if nothing more arrives, treat the buffer as a single
    message. Fragments of the same logical message arrive
    sub-millisecond apart on the receiving side, so the idle gap
    cleanly separates "still arriving" from "done".

The same pattern is applied to connectionHandler's mainListener so
the first JOB_REQ on a new connection cannot pass a truncated
mining_key to miningHandler.

Legacy miners that do not append '\n' are unaffected. Clients that
opt in to '\n' termination get the zero-latency fast path.

Validation: 47 virtual workers against the patched pool for 35
minutes, 33,947 shares processed, 0 BAD,Incorrect result. Baseline
unpatched: first BAD at ~14,192 shares, persistent thereafter.

gionag commented Apr 20, 2026

Note for follow-up

This PR is fully backward-compatible: clients that don't terminate
their messages with \n still work via the 10 ms idle-flush legacy
path.

Worth noting for future work: if the reference miner code is updated
so that every outgoing message appends \n, every client hits the
fast path with zero added latency and fully deterministic framing.
That's a small, mechanical change on the miner side. Once the
majority of deployed miners send \n, the legacy idle-flush path
here could be removed in a later simplification of receiveData.

Not a prerequisite for merging this PR — it stands on its own — just
a cleanup direction to keep in mind.
